Community Reputation

504 Good

About implicit

  1. Quote: Original post by Nik02
     I believe the effect is accomplished by using a scanline interrupt which shifts the color offset of the grid cells and/or the pointer that is used to look up the grid data. A scanline interrupt can be conceptually thought of as a hardware event handler that is launched when each horizontal line is drawn on the screen.

     But the effect here produces multiple independently-curved lines with perhaps as many as a dozen colour transitions along a single scanline. How could this be achieved by distorting a playfield or altering a palette entry or two on a line-by-line basis?

     Personally I'd say it's just a pre-rendered animation. Judging from a quick YouTube check it seems that Sonic only ever runs along the grid lines and turns 90 degrees at grid corners, and there are perhaps eight frames in either animation. By switching the colours for symmetry, mirroring the field when running, playing the clockwise rotation backward, and masking the sky to store the ground in monochrome you could perhaps squeeze everything into 50k or so.

     The rings and orbs are obviously just sprites pre-scaled at a couple of different sizes. Information about the grid corners could easily be saved along with the ground bitmaps, so you'd know their screen coordinates, zoom level and whether or not they're beneath the horizon without any real-time math.
  2. If you're using Visual Studio then try enabling Edit and Continue. It's rather limited, and Microsoft seem intent on crippling their debugging features lately, but in many cases it can be a real time-saver.
  3. The local function doesn't seem to be declared anywhere in the code sent to TCC, so with no prototype in scope the compiler applies the default argument promotions and passes the float as a double (i.e. the pre-ANSI K&R rules, much as for variable argument lists like in printf). Try adding a "void local(float f);" declaration to the code string.
  4. implicit

    understanding mysterious C code

    Quote: Original post by sindisil
      Quote: Original post by Ftn
      Better yet, why is macro needed?

          inline bool is_8byte_aligned(void const* pointer)
          {
              unsigned char const* char_pointer = reinterpret_cast<unsigned char const*>(pointer);
              return char_pointer == (char_pointer + 7) & ~size_t(7);
          }

    That's C++, not C (which is what the OP says they were asking about).

    You can't bit-mask pointers directly in either language. I suppose your safest bet is something like:

        if(((char *) ptr - (char *) NULL) & 7)
            // Unaligned..

    At any rate this kind of stuff is inherently unportable. Just consider the case of trying to align a near pointer to a 32-byte cache line in 16-bit x86 real mode when the segment is odd.
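
    As an alternative to the pointer-difference trick, here is a minimal sketch that goes through uintptr_t instead. The helper name is_aligned is made up for illustration, uintptr_t is an optional type in C99, and the result of the pointer-to-integer conversion is implementation-defined, so this assumes a flat address space where the integer mirrors the address.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical helper: true if ptr is a multiple of `alignment` bytes.
       `alignment` must be a power of two. Assumes the integer produced by
       the cast mirrors the machine address (implementation-defined). */
    static bool is_aligned(const void *ptr, uintptr_t alignment)
    {
        return ((uintptr_t) ptr & (alignment - 1)) == 0;
    }
    ```

    A call such as is_aligned(ptr, 8) then replaces the "& 7" test above. It is no more portable to segmented targets than the original.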
  5. implicit

    understanding mysterious C code

    Quote: Original post by Storyyeller
    Ok, what about this one. Is there any reason why stride can't be replaced by (1<<6*i) everywhere? Surely there's no point to memoization with such a trivial task, and even if there was, the compiler would just do it anyway, right?

    Is the array large enough for the calculation to wrap? On many processors (including x86 ones since the 286) shift counts of the word size or more are simply masked, so e.g. p << q is equivalent to p << (q & 31) for 32-bit integers.

    As for compiler optimization, keep in mind that there are some spectacularly bad compilers out there (the PIC compiler I've been working with lately comes to mind). Beyond that I can only guess it's a performance thing, though whether or not it's a benefit depends heavily on the processor.
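
    To make the wrapping concrete, here is a small sketch. Both function names are invented for illustration: shl_masked models the x86 behaviour described above, and stride64 evaluates the quoted 1<<6*i expression in 64 bits so the overflow stays visible.

    ```c
    #include <stdint.h>

    /* Models a 32-bit x86 shift: only the low five bits of the count are
       used. (In C itself, shifting a 32-bit value by 32 or more is
       undefined, so the mask is written out explicitly here.) */
    static uint32_t shl_masked(uint32_t p, unsigned int q)
    {
        return p << (q & 31u);
    }

    /* The stride from the quoted question, computed in 64 bits.
       Assumes i <= 10 so that the shift count stays below 64. */
    static uint64_t stride64(unsigned int i)
    {
        return (uint64_t) 1 << (6u * i);
    }
    ```

    With i = 6 the shift count is 36, so stride64(6) no longer fits in 32 bits, while shl_masked(1, 36) yields 16, the value a masked 32-bit shift would produce.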
  6. implicit

    Simple division is too slow

    This is slightly off-topic, but is there a way of linking up multiple 32-bit word divisions (or more precisely 64-by-32 division) to perform multi-word division, as opposed to resorting to the traditional binary method? I've tried my hand at doing this once or twice over the years and it seems like it ought to be possible, for symmetry with multiplication if nothing else. The case of a multi-word dividend and a single-word divisor is straightforward at least, but beyond that I'm thoroughly stumped.

        uint32_t divide(uint32_t *numer /* big-endian */, uint32_t denom, size_t len)
        {
            uint64_t acc = 0;
            while(len--) {
                acc = acc << 32 | *numer;
                *numer++ = acc / denom;
                acc %= denom;
            }
            return acc;
        }
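
    For reference, here is how the single-word-divisor routine above might be exercised. The routine is restated so the snippet stands alone. The divide_demo wrapper and its input (2^32 divided by 10) are only an illustration.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Same routine as in the post: divides a big-endian multi-word number
       by a single word in place and returns the remainder. */
    static uint32_t divide(uint32_t *numer, uint32_t denom, size_t len)
    {
        uint64_t acc = 0;
        while (len--) {
            acc = acc << 32 | *numer;           /* bring down the next word */
            *numer++ = (uint32_t) (acc / denom); /* quotient word, fits since acc < denom * 2^32 */
            acc %= denom;                       /* carry the remainder forward */
        }
        return (uint32_t) acc;
    }

    /* Hypothetical demo: computes 2^32 / 10, leaving the quotient words
       in out[] (most significant first) and returning the remainder. */
    static uint32_t divide_demo(uint32_t out[2])
    {
        out[0] = 1; /* most significant word */
        out[1] = 0; /* least significant word */
        return divide(out, 10, 2);
    }
    ```

    Since 2^32 = 4294967296, the quotient words come out as 0 and 429496729 with a remainder of 6.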
  7. Quote: Original post by Antheus
     Run the application inside a virtual machine and hope the VM is safe enough. Anything else is just a matter of effort. Given enough, anything is possible. As with all such things - one can only confirm presence of an exploit, not prove its absence.

     Perhaps, but you can still try to minimize the damage if the worst happens. At the very least if you're worried about exploits you should give up any access rights and privileges the process doesn't really need, using functions like chroot and cap_set_proc and their Win32 equivalents. Forking off a child process as a sandbox for the libraries might not be such a bad idea either.
  8. implicit

    int pow not float?

    Any particular reason why the regular floating-point power function with arguments converted to/from integers won't do? At any rate an integer implementation usually looks something like this:

        int ipow(int base, unsigned int exp)
        {
            int result = 1;
            for(;;) {
                if(exp & 1)
                    result *= base;
                exp >>= 1;
                if(!exp)
                    break;
                base *= base; /* skipped after the last bit so an unused square can't overflow */
            }
            return result;
        }
  9. implicit

    count bits set in a number

    This kind of bithack is probably more magic than it is computer science, and in all likelihood less efficient than more straightforward algorithms on most architectures, but it is nevertheless an interesting method so I'll try to explain how I believe it works.

    First off let's consider another, easier, way of counting the bits in a 32-bit word:

        int bitcount(unsigned int n)
        {
            n = (n & 0x55555555) + ((n >> 1) & 0x55555555);
            n = (n & 0x33333333) + ((n >> 2) & 0x33333333);
            n = (n & 0x0F0F0F0F) + ((n >> 4) & 0x0F0F0F0F);
            n = (n & 0x00FF00FF) + ((n >> 8) & 0x00FF00FF);
            n = (n & 0x0000FFFF) + ((n >> 16) & 0x0000FFFF);
            return n;
        }

    This parallel divide-and-conquer method works on several smaller bit-fields at once within the same 32-bit integer. First the input is divided into two-bit pairs where the upper bit is added to the lower bit to calculate the total number of bits set within the pair (i.e. zero to two). The same procedure is then repeated to combine the two-bit words into four-bit words, four-bit words into eight-bit words, and so on until we're left with a single bit count for the entire input.

    Now, let's consider the first line of the algorithm you posted:

        tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111);

    This actually divides the input into three-bit fields (octal digits, hence the funky constants) and counts the bits in each, in a similar way to the above solution. To see this consider what happens to a single digit with bits c, b and a (most significant first) in the two subtractions. First we subtract half of the digit from itself, masking off the bit shifted in from the next higher digit so that it doesn't spill over into this one. This yields:

        (4·c + 2·b + 1·a) - (2·c + 1·b) = 2·c + b + a

    We then subtract one quarter of the original digit in a similar fashion to get:

        (2·c + b + a) - (1·c) = c + b + a

    ... and we're left with just the three-bit counts we wanted.

    The next part of the code, ((tmp + (tmp >> 3)) & 030707070707), simply combines pairs of three-bit digits into six-bit words as I described before. Only a single final AND is necessary here since the maximum sum of two three-bit counts is six, which fits neatly within a digit without any risk of overflowing into the next.

    The final modulo trick is then used to sum up each of the separate 6-bit fields into a single count. The idea here is that for any natural number, x, the sum of its digits is congruent to x modulo the base minus one (mod 63 in our particular case). So the remainder will be equal to the total bit count as long as no more than 62 bits are set. In other words if (x_n ... x_1 x_0) are the base-64 digits of x then:

        x ≡ (x_0 + x_1 + ... + x_n) (mod 63)

    You can see that this holds since subtracting RHS from LHS gives us an expression congruent with zero:

        x - (x_0 + x_1 + ... + x_n)
            = (64^0·x_0 + 64^1·x_1 + ... + 64^n·x_n) - (x_0 + x_1 + ... + x_n)
            = (64^0 - 1)·x_0 + (64^1 - 1)·x_1 + ... + (64^n - 1)·x_n

    Where each (64^k - 1) factor is clearly divisible by 63, since 64 ≡ 1 (mod 63).
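
    Putting the three steps together gives something like the sketch below. Only the first line and the pair-combining expression appear in the thread as quoted here. The function name, the pairs variable and the fixed-width uint32_t type are my own choices, and the routine assumes a 32-bit input.

    ```c
    #include <stdint.h>

    /* Sketch of the octal bit count walked through above (32-bit input). */
    static int bitcount_octal(uint32_t n)
    {
        /* Per-digit counts: each three-bit field ends up holding 0..3. */
        uint32_t tmp = n - ((n >> 1) & 033333333333u)
                         - ((n >> 2) & 011111111111u);

        /* Add neighbouring digits to form six-bit fields holding 0..6. */
        uint32_t pairs = (tmp + (tmp >> 3)) & 030707070707u;

        /* The base-64 digits of a number sum to that number modulo 63. */
        return (int) (pairs % 63u);
    }
    ```

    For example, an all-ones input leaves the five full six-bit fields holding 6 each and the top field holding 2, and the remainder is 32.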
  10. Quote: Original post by Nypyren
      There's no EASIER way than ReadProcessMemory. You could also:
      - Redirect stdout (it sounds like you're already doing this)
      - Named pipes
      - Memory mapped file (recommended if you have the source code for both programs)
      - Windows messages
      - Sockets
      - Code Injection (last resort if you don't have the source code for one program)

      Command line arguments and environment variables can also come in handy if you just want to pass on a couple of settings.
  11. implicit

    x86 assembly Q, IDIV problem

    You need to sign-extend the dividend (typically by replacing XOR EDX,EDX with CDQ.) Otherwise IDIV tries to divide 0xFFFFFFFF in EDX:EAX by one and you get an overflow exception since the resulting quotient won't fit in a signed 32-bit integer.
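
    The difference can be modelled in C with 64-bit arithmetic. This is only an illustration of what the hardware sees, not x86 code, and all three function names are invented for the sketch.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* EDX:EAX as IDIV sees it after XOR EDX,EDX (zero extension). */
    static int64_t edx_eax_zeroed(int32_t eax)
    {
        return (int64_t) (uint32_t) eax;
    }

    /* EDX:EAX as IDIV sees it after CDQ (sign extension). */
    static int64_t edx_eax_cdq(int32_t eax)
    {
        return (int64_t) eax;
    }

    /* True if the quotient fits in a signed 32-bit register. IDIV raises
       a divide error when it does not, or when the divisor is zero. */
    static bool idiv_quotient_fits(int64_t dividend, int32_t divisor)
    {
        if (divisor == 0)
            return false;
        if (dividend == INT64_MIN && divisor == -1)
            return false; /* avoid overflow in the C division itself */
        int64_t q = dividend / divisor;
        return q >= INT32_MIN && q <= INT32_MAX;
    }
    ```

    With EAX = -1 and a divisor of one, the zero-extended dividend is 4294967295 and the quotient does not fit, whereas the sign-extended dividend is -1 and the division goes through.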
  12. implicit

    Optimise my asm code to mmx, pls?!

    Not to do all of your work for you but the basic 32-bit version of the inner loop might look something like this in SSE2 assembly:

        ;Load and vertical addition
        movdqa xmm0,[src-0]
        movdqa xmm1,[src-1]
        pavgb xmm0,[src-2]
        pavgb xmm1,[src-3]
        pavgb xmm0,xmm1
        ;Horizontal addition
        pshufd xmm1,xmm0,01001110b
        pavgb xmm0,xmm1
        pshufd xmm1,xmm0,10110001b
        pavgb xmm0,xmm1
        ;Store
        movdqa [dst-0],xmm0
        movdqa [dst-1],xmm0
        movdqa [dst-2],xmm0
        movdqa [dst-3],xmm0

    Where the source and destination expressions are 16-byte aligned pointers to each of the four source and destination lines.
  13. implicit

    Optimise my asm code to mmx, pls?!

    So if I interpret your code correctly it takes a packed 24-bit RGB source bitmap and "pixelizes" it by writing the average of each 4x4 block into the destination. The tricky bit here is dealing with the odd 24-bit packing in the MMX code; things will get a whole lot easier (and faster) if you can settle for a 32-bit format. If not then going for something beyond the basic MMX instruction set would certainly help (particularly the SSE shuffles).
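
    For comparison, a plain scalar version of the 4x4 averaging for the easier 32-bit case might look like the sketch below. This is my reading of the effect rather than the original poster's code. The function name and parameters are hypothetical, and it assumes the width and height are multiples of four with rows stored back to back.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical scalar reference: replaces every 4x4 block of a 32-bit
       image with the rounded per-channel average of its sixteen pixels.
       Assumes width and height are multiples of four and no row padding. */
    static void pixelize4x4(const uint32_t *src, uint32_t *dst,
                            size_t width, size_t height)
    {
        for (size_t by = 0; by < height; by += 4) {
            for (size_t bx = 0; bx < width; bx += 4) {
                uint32_t sum[4] = { 0, 0, 0, 0 };

                /* Accumulate each 8-bit channel over the block. */
                for (size_t y = 0; y < 4; y++) {
                    for (size_t x = 0; x < 4; x++) {
                        uint32_t p = src[(by + y) * width + bx + x];
                        for (int c = 0; c < 4; c++)
                            sum[c] += (p >> (8 * c)) & 0xFFu;
                    }
                }

                /* Divide by 16 with rounding and repack the channels. */
                uint32_t avg = 0;
                for (int c = 0; c < 4; c++)
                    avg |= ((sum[c] + 8u) / 16u) << (8 * c);

                /* Write the average to every pixel of the block. */
                for (size_t y = 0; y < 4; y++)
                    for (size_t x = 0; x < 4; x++)
                        dst[(by + y) * width + bx + x] = avg;
            }
        }
    }
    ```

    A vectorized version built from byte-average instructions rounds at every step, so its output can differ slightly from this single rounded division.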
  14. Be careful when overloading the Win32 functions. Anything dealing with strings will be re-#defined to a specific Unicode or ANSI function, depending on what you're compiling as. So, e.g., CreateWindow would become CreateWindowA or CreateWindowW. Anyway... In practice this usually causes trouble when attempting to overload such a function name in a header, where one module using the function includes <windows.h> and another doesn't, or where <windows.h> is included after your own header.
  15. implicit

    C++ realloc troubles...

    Quote: Original post by Zahlman
      Quote: not to mention need 30% less memory.
    You can "trim" the vector at the end of the insertion process if necessary. If there is no clear "end of the insertion process", then there probably isn't any good reason to want to keep an "exact" allocation.

    Actually I was referring to keeping both the old and new buffers in memory simultaneously while resizing, which makes std::vector somewhat impractical for very large arrays. You do make a good point though. People (myself included) don't seem to bother trimming the excess fat off of vectors nearly as often as they should. Perhaps a three-line helper function to wrap up the swap trick would help to encourage the practice.