Fast float multiplication by 2 and 0.5
Does anybody know a fast way (asm code) to perform a 32 bit float number multiplication by 2 or 0.5, or any integer power of 2?
Or is there a special asm instruction that does this, and does the compiler use it in release mode?
Would shifting the mantissa work?
Or is it not worth bothering with it?
I always thought shifting was the fastest way, but I also thought compilers can automatically generate a binary shift when they encounter those hard-coded numbers in code. So if you write:
number *= 2;
it generates:
number <<= 1;
and the equivalent asm/fpu stuff (I don't remember the FPU command for binary shift).
@shadow12345 : Shifts only work on (unsigned?) integers and not on floats as he wanted.
@szinkopa : I doubt there is an instruction for this. You might get away with some "hacking" on FP exponent (+1 for *2 and -1 for /2) but I highly doubt it's worth it. And as always: don't optimize until profiler tells you to. :)
To multiply by 2 you could always just do something like x += x so the FPU doesn't have to use its multiply instruction, but chances are the multiply instruction is either optimized for certain conditions on chip or the compiler will optimize that code for you, or both.
Quote:Original post by _DarkWIng_
@shadow12345 : Shifts only work on (unsigned?) integers and not on floats as he wanted.
@szinkopa : I doubt there is an instruction for this. You might get away with some "hacking" on FP exponent (+1 for *2 and -1 for /2) but I highly doubt it's worth it. And as always: don't optimize until profiler tells you to. :)
That's not true. You just have to do some checking and error correcting after doing the bitshift. Of course this kills your speed gains, but a creative programmer can find ways to work around this issue.
I agree: if it's not a problem, don't optimize it.
Quote:Original post by ChaosX2
Thats not true. You just have to do some checking and error correcting after doing the bitshift. Of course this kills your speed gains, but a creative programmer can find ways to work around this issue.
Can you give me an example of how you would do *2 with shifts on floating point numbers? I've never seen that. Or were you talking about signed integers? If the latter, then I know it can be done; I'm just saying I doubt it's worth the trouble.
OK thanks.
Yes, multiplying by 2 is easy and fast: x += x. The reason I posted is that I have a lot of .../2 or ...*0.5, and I thought there might be a way to do it faster via an inline function call like z = Half(x+y); or so. But it's not so crucial.
Once again the IEEE 754 standard might help. If your float is in memory, you can improve speed, since integer addition usually has lower latency than floating-point multiplication.
A single integer subtraction on the representation divides a float in memory by two (or an addition multiplies by 2^n):
float x;
//...
// Dividing by 2: just reduce the exponent by one.
*(int*)&x = *(int*)&x - (1L << 23);
EDIT:
- Note that shifting the mantissa would not work.
- Such a trick could be used to replace a 3DNow mul by a faster MMX add.
Quote:Original post by Charles B
float x;
//...
// Dividing by 2: just reduce the exponent by one.
*(int*)&x = *(int*)&x - (1L << 23);
Impressive!
One of the thousand math hacks I know ;) But really, all you need here is to understand the IEEE 754 format. There is enough on the web to find numerous ideas, such as a quick fabs, etc. Beware of such tricks, though: CPUs do not like store-to-load forwarding dependencies, especially when the store (fstp) and the load (mov) are not of the same type. Prefer normal floating-point code in general; today's FPUs are very fast.
So such advice is mostly relevant if you process float data already stored in memory, such as an array of vertices.
This topic is closed to new replies.