Back to General and Gameplay Programming

Problem: turning a bit shift into a division

General and Gameplay Programming Programming

Started by Sik_the_hedgehog April 22, 2012 05:24 PM

24 comments, last by Sik_the_hedgehog 12 years ago

Antheus

2,410

April 23, 2012 08:48 PM

Well, the marked solution here is correct across all values.

* val < 0 ? ((val - 255) / 256) : (val / 256) // performance is not a concern => no bit manipulations in source[/quote]

This one is simplest, but incorrect for values where (val - 255) overflows.

Just about all other answers in SO's thread are also incorrect in similar ways. What's worse is that they may work if int is 64 bits.

Sik_the_hedgehog

3,003

Author

April 23, 2012 09:01 PM

The problem is that such a solution requires comparisons, which is something I'd definitely like to avoid, especially since I don't trust the compiler to actually understand the idiom even when optimization is set to maximum (I suppose that in the worst case I could just do an #ifdef, but whatever).

And I said I don't care about overflow. If the value passed is large enough that overflow happens, then something is seriously wrong with the program and I shouldn't expect it to behave properly at all. So, I don't care about what happens if overflow kicks in.

Also in case this helps: this function is used to turn speeds into pixels. It takes a 24.8 fixed comma value and returns a normal integer. The integer is rounded in such a way as to fake subpixel precision (e.g. if it's 320 (1¼), it'd return 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, etc. as frames pass by). I still can afford to change the algorithm yet, but I'd like the behavior I've described to remain.

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

the_edd

2,109

April 23, 2012 11:15 PM

Have you tried looking at what assembly the various methods generate?

A compiler might be able to optimize something like this in to a very short instruction sequence:



uint32_t signed_rshift(uint32_t number, uint32_t shift)

{

	if (number & 0x80000000) return (~(((~number) + uint32_t(1)) >> shift)) + uint32_t(1); // check this ;)

	else return number >> shift;

}

I know for example that GCC will optimize circular bit-rotations implemented with shifts and bitwise-or in to ROL/ROR on x86, so it's probably worth a go...

http://www.mr-edd.co.uk
http://bitbucket.org/edd

Sik_the_hedgehog

3,003

Author

April 24, 2012 05:59 AM

I know for example that GCC will optimize circular bit-rotations implemented with shifts and bitwise-or in to ROL/ROR on x86, so it's probably worth a go...

That's probably the worst example out there, since it's a very common idiom GCC was explicitly programmed to handle. In fact, if you do it in any way other than you've described (two shifts and an OR), GCC will not generate a rotate instruction, even if the final result is guaranteed to be 100% identical.

I *know* there's a way to turn a signed division into a signed bit shift properly that involves adding some number to the value before dividing it, but I can't recall the details right now. I suppose the solution here would be to do the same, but subtracting the value instead of adding it.

EDIT: OK, I can't find out how that was done, argh. But in the meanwhile I tested some combinations... GCC, -O3, x86-64 (also being GCC means AT&T syntax, watch out).

Bit shift:

void test1(int value) {

   result = value >> 8;

}

sarl $8, %edi

Naive comparison:



void test2(int value) {

   result = value >= 0 ? value / 0x100 : (value + 0xFF) / 0x100;

}

testl %edi, %edi

jns .L5

leal 510(%rdi), %eax

addl $255, %edi

cmovs %eax, %edi

.L5:

sarl $8, %edi

Same as above, trying to fool it to go into the asm optimizer:

void test3(int value) {

   if (value < 0) value += 0xFF;

   result = value / 0x100;

}

leal 255(%rdi), %eax

testl %edi, %edi

cmovs %eax, %edi

leal 255(%rdi), %eax

testl %edi, %edi

cmovs %eax, %edi

sarl $8, %edi

PS: somebody make the editor stop stripping indentation when copying! =/

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

Antheus

2,410

April 24, 2012 03:04 PM

I really don't understand the goal here.

It should duplicate original behavior, yet it doesn't matter if algorithm isn't correct.
It should be portable, yet should be a perfect match for generated assembly opcode.
It should be compiler independent and standard-compliant yet perfectly optimized for any compiler, even those not written and those running on exotic architectures.
Performance is secondary to portability yet performance comes first.

Bitshift is perfectly fine solution if performance matters. Otherwise, there's a so-called reference implementation available on stack overflow.

Implement that, then profile. Is it good enough? Then it's fine. Otherwise, use an ifdef and a bitshift.

Especially since for this type of optimization keyhole optimizations won't matter. 5 lines of assembly is frequently faster than a single instruction.

And if performance is crucial, there's SIMD, for which either approach looks good.

Sik_the_hedgehog

3,003

Author

April 24, 2012 03:18 PM

It should duplicate original behavior, yet it doesn't matter if algorithm isn't correct.

The game isn't out, so the behavior still can change - but it must remain the same across all systems once it's ready (ruling out floating point). The only important thing here is what I mentioned before - faking subpixel accuracy for momentum while using integer coordinates.

It should be portable, yet should be a perfect match for generated assembly opcode.

Well, I don't want it to pull off an entire page of code when a single (common!) opcode does the job. How come there isn't a way to get compilers do something reasonable without relying on undefined behavior, I don't get it, really...

It should be compiler independent and standard-compliant yet perfectly optimized for any compiler, even those not written and those running on exotic architectures.

Or, you know, the major compilers, because those are the ones we're discussing that generate suboptimal code. Personally I'd ditch all non-two's-complement systems altogether but the standard doesn't say so and I don't want to deal with bug reports from tools that complain if I use bit shift on signed integers.

Maybe I should ask GNU to add a specific idiom for this case in GCC (that's the compiler I'm using right now)

EDIT: for the record, this is the best I had gotten (without trying to recreate 1:1 the algorithm), and it doesn't do the job because it skews towards zero (specially noticeable when jumping since gravity goes through 0 when starting to fall):
return (value + comma_offset - 0x80) / 0x100;

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

Problem: turning a bit shift into a division

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Problem: turning a bit shift into a division

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines