Problem: turning a bit shift into a division

Started by
24 comments, last by Sik_the_hedgehog 12 years ago
Well, the marked solution here is correct across all values.

* val < 0 ? ((val - 255) / 256) : (val / 256) // performance is not a concern => no bit manipulations in source[/quote]

This one is simplest, but incorrect for values where (val - 255) overflows.

Just about all other answers in SO's thread are also incorrect in similar ways. What's worse is that they may work if int is 64 bits.
Advertisement
The problem is that such a solution requires comparisons, which is something I'd definitely like to avoid, especially since I don't trust the compiler to actually understand the idiom even when optimization is set to maximum (I suppose that in the worst case I could just do an #ifdef, but whatever).

And I said I don't care about overflow. If the value passed is large enough that overflow happens, then something is seriously wrong with the program and I shouldn't expect it to behave properly at all. So, I don't care about what happens if overflow kicks in.

Also in case this helps: this function is used to turn speeds into pixels. It takes a 24.8 fixed comma value and returns a normal integer. The integer is rounded in such a way as to fake subpixel precision (e.g. if it's 320 (1¼), it'd return 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, etc. as frames pass by). I still can afford to change the algorithm yet, but I'd like the behavior I've described to remain.
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.
Have you tried looking at what assembly the various methods generate?

A compiler might be able to optimize something like this in to a very short instruction sequence:

uint32_t signed_rshift(uint32_t number, uint32_t shift)
{
if (number & 0x80000000) return (~(((~number) + uint32_t(1)) >> shift)) + uint32_t(1); // check this ;)
else return number >> shift;
}


I know for example that GCC will optimize circular bit-rotations implemented with shifts and bitwise-or in to ROL/ROR on x86, so it's probably worth a go...
I know for example that GCC will optimize circular bit-rotations implemented with shifts and bitwise-or in to ROL/ROR on x86, so it's probably worth a go...

That's probably the worst example out there, since it's a very common idiom GCC was explicitly programmed to handle. In fact, if you do it in any way other than you've described (two shifts and an OR), GCC will not generate a rotate instruction, even if the final result is guaranteed to be 100% identical.

I *know* there's a way to turn a signed division into a signed bit shift properly that involves adding some number to the value before dividing it, but I can't recall the details right now. I suppose the solution here would be to do the same, but subtracting the value instead of adding it.

EDIT: OK, I can't find out how that was done, argh. But in the meanwhile I tested some combinations... GCC, -O3, x86-64 (also being GCC means AT&T syntax, watch out).

Bit shift:
void test1(int value) {
result = value >> 8;
}
sarl $8, %edi

Naive comparison:

void test2(int value) {
result = value >= 0 ? value / 0x100 : (value + 0xFF) / 0x100;
}
testl %edi, %edi
jns .L5
leal 510(%rdi), %eax
addl $255, %edi
cmovs %eax, %edi
.L5:
sarl $8, %edi

Same as above, trying to fool it to go into the asm optimizer:
void test3(int value) {
if (value < 0) value += 0xFF;
result = value / 0x100;
}
leal 255(%rdi), %eax
testl %edi, %edi
cmovs %eax, %edi
leal 255(%rdi), %eax
testl %edi, %edi
cmovs %eax, %edi
sarl $8, %edi


PS: somebody make the editor stop stripping indentation when copying! =/
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.
I really don't understand the goal here.

It should duplicate original behavior, yet it doesn't matter if algorithm isn't correct.
It should be portable, yet should be a perfect match for generated assembly opcode.
It should be compiler independent and standard-compliant yet perfectly optimized for any compiler, even those not written and those running on exotic architectures.
Performance is secondary to portability yet performance comes first.

Bitshift is perfectly fine solution if performance matters. Otherwise, there's a so-called reference implementation available on stack overflow.

Implement that, then profile. Is it good enough? Then it's fine. Otherwise, use an ifdef and a bitshift.

Especially since for this type of optimization keyhole optimizations won't matter. 5 lines of assembly is frequently faster than a single instruction.

And if performance is crucial, there's SIMD, for which either approach looks good.
It should duplicate original behavior, yet it doesn't matter if algorithm isn't correct.

The game isn't out, so the behavior still can change - but it must remain the same across all systems once it's ready (ruling out floating point). The only important thing here is what I mentioned before - faking subpixel accuracy for momentum while using integer coordinates.

It should be portable, yet should be a perfect match for generated assembly opcode.

Well, I don't want it to pull off an entire page of code when a single (common!) opcode does the job. How come there isn't a way to get compilers do something reasonable without relying on undefined behavior, I don't get it, really...

It should be compiler independent and standard-compliant yet perfectly optimized for any compiler, even those not written and those running on exotic architectures.

Or, you know, the major compilers, because those are the ones we're discussing that generate suboptimal code. Personally I'd ditch all non-two's-complement systems altogether but the standard doesn't say so and I don't want to deal with bug reports from tools that complain if I use bit shift on signed integers.

Maybe I should ask GNU to add a specific idiom for this case in GCC (that's the compiler I'm using right now)

EDIT: for the record, this is the best I had gotten (without trying to recreate 1:1 the algorithm), and it doesn't do the job because it skews towards zero (specially noticeable when jumping since gravity goes through 0 when starting to fall):
return (value + comma_offset - 0x80) / 0x100;
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

This topic is closed to new replies.

Advertisement