Archived

This topic is now archived and is closed to further replies.

technobot

How to do fast floating-point multiplication by 1, 0, and -1?

Recommended Posts

I have two single-percision floating-point variables, which I need to multiply as quickly as possible (this is done many times in a loop). One of the variables can potentially contain any real number, but the other is known to contain either 1, 0, or -1. I don't know in advance which of the three is contained in that variable. Given this situation, how can I optimize the multiplication? Is it perhaps possible to encode 1, 0, and -1 as some other special values and do the multiplication with bitwise operations (that would be the optimal solution, I think)? Is there some other way? EDIT: Is this supposed to go into the General Programming forum instead?.... Michael K., Co-designer and Graphics Programmer of "The Keepers"
We come in peace... surrender or die! [edited by - technobot on July 27, 2003 3:19:49 PM]

Share this post


Link to post
Share on other sites
if you just had -1 and 1 you could encode -1 as 0x80000000 and 1 as 0, and than you could do an integer xor.

c = a ^ b;

This will work, but I don''t think it will be faster, since fmul costs 1 clock tick on modern cpu''s, and if you want to do other floating point opts on them after this, they can be kept in the floating point registers if you use fmul. I think just using fmul, or an sse equivalent will be the best way.

My Site

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quasar, I''m not as skilled as you, but I know the IEEE format. How would xoring two sign bits which equal 0, give c a negative sign? Say you''re multiplying by -1 (0x80000000), alright?...so xoring that int with a positive or zero float would be bluntly:
0x800000000
XOR
0xMantissa^Exponent
0 XOR 0 is 0, when that sign bit should be 1, I think. Because + * - = negative. Something''s screwy.

Share this post


Link to post
Share on other sites

Share this post


Link to post
Share on other sites
You should probably just use fmul, any tricks you pull otherwise will just eat up an integer register and probably take more cycles. You might be able to get a speedup by ordering the arguments well, but that might already be done internally.

Share this post


Link to post
Share on other sites
Using conditions is not an option, as the multiplications are part of a more complex calculation, and there are several such multiplications per calculation. I.e. something like:

result1 = (bla1 * bla2) + (bla3 * bla4) + (bla5 * bla6);
result2 = (bla7 * bla8) + (bla9 * bla10) + (bla11 * bla12);
result3 = interpolate(result1, result2, factor);

Basically, this is part of the improved version of Ken Perlin's perlin noise, as described in this paper. One of the improvement is to use specially selected gradient vectors whose components can only be 1, 0, or -1.

Part of the algorithm is a series of dot products between those vectors and certain other vectors. In the performance section of the paper, Ken Perlin writes: "... the new algorithm runs approximately 10 percent faster than the original. The cost of the extra multiplies required to compute the three corrected interpolants is apparently outweighted by the savings from the multiplies no longer required to compute the eigth inner products . ".

What bugs me is: how are the dot (inner) products eliminated??

EDIT: Only 1 cycle for an fmul? Hmm... that's interesting...

Michael K.,
Co-designer and Graphics Programmer of "The Keepers"



We come in peace... surrender or die!

[edited by - technobot on July 28, 2003 4:09:15 PM]

Share this post


Link to post
Share on other sites
A couple questions:

1. Will the variables that are -1, 0, or 1 be set to the constants -1/0/1, or will mathematical calculations produce these numbers?

2. In the equations you just showed us, which are the variables that are 0/1/-1?


How appropriate. You fight like a cow.

Share this post


Link to post
Share on other sites
what he (perlin) talks about is actually hw implementation. THERE mults are expensive, THERE replacing it by some simple bitwise logic to "multiply" -1,0,1 together is useful. it replaces a full fpu multiplier for 32bit floats (or similar) by some and|or logic for 2bits.. or even a lookuptable for the actual product is enough..

dotproduct[(a<<4)|(b<<2)|c] where a,b,c = {0,1,2} with mapping 0 := -1.f, 1 := 0.f, 2 = 1.f

every hw implementation of what ever just LOVES to not use any floatingpoint math where needed.

just expensive processors are directly made to be fast on fpu''s.. and slow on switching between fpu and integer unit (passing values there around..).

the only place where i''d see the possible optimisation to gain performance _could_ be in SSE, where you can just and,or,xor a bitmask, and anding/xoring a bitmask is enough to multiply a float by -1,0,1 i think..

"take a look around" - limp bizkit
www.google.com

Share this post


Link to post
Share on other sites
Here''s a suggestion: write nine functions (one for each possible 3D vector with component values of -1, 0, and 1). You might gain enough from the faster functions to overcome the branch overhead.

Share this post


Link to post
Share on other sites
Maybe you could do something with template specializations?


template<float TNum> void DoStuff( float& f )
{
}

template<> void DoStuff<0.0f>( float &f )
{
f = 0;
}

template<> void DoStuff<1.0f>( float& f )
{
f = f;
}

template<> void DoStuff<-1.0f>( float& f )
{
f = -f;
}




:::: [ Triple Buffer V2.0 ] ::::


[edited by - ifoobar on July 29, 2003 10:49:49 AM]

Share this post


Link to post
Share on other sites
quote:
Original post by Sneftel
1. Will the variables that are -1, 0, or 1 be set to the constants -1/0/1, or will mathematical calculations produce these numbers?

2. In the equations you just showed us, which are the variables that are 0/1/-1?


1. constants
2. the bla''s with even indices

davepermen: he was talking about, and I quote, "C implementations on the Intel optimizing compiler running on a Pentium 3". Sounds very much like a software implementation to me.... Nevertheless, what you''re saying does indeed make sence for me.

JohnBolton: calling the functions would probably be more expensive... (yes, I know, I know -- profile first, talk later )

IFooBar: no templates (of this sort, at least) in Delphi.

Michael K.,
Co-designer and Graphics Programmer of "The Keepers"



We come in peace... surrender or die!

Share this post


Link to post
Share on other sites
quote:
Original post by xMcBaiNx
that and... NEVER TEST FLOATS FOR EQUALITY. :D


Why the hell not? I know you could get problems with not being exactly that number you''re testing, but then you could resort to checking a range. An equality test is just a range test with a range size of 0. You do have to worry about the type, float constants might not mix with double variables well. What''s wrong with that?

Share this post


Link to post
Share on other sites
quote:
Original post by Doc
quote:
Original post by xMcBaiNx
that and... NEVER TEST FLOATS FOR EQUALITY. :D


Why the hell not? I know you could get problems with not being exactly that number you''re testing, but then you could resort to checking a range. An equality test is just a range test with a range size of 0. You do have to worry about the type, float constants might not mix with double variables well. What''s wrong with that?


Beacause, basically 0.0 != -0.0. They are different numbers.

Si if you write if( x == 0.0f) then when you get x=-0.0f coming along your if will false which is almost certainly not desired behaviour.

Share this post


Link to post
Share on other sites
Technically, taking the IF route, you''d probably want to do something like the following:



#define SMALL_ASS_NUMBER 0.00001

inline double LameMul(double x, double y)
{
if(x < -SMALL_ASS_NUMBER) return -y;
if(x > SMALL_ASS_NUMBER) return y;
return 0;
}


That''d do the trick, and would account for float screwups.

(Keep in mind I randomly picked the small constant value, because I don''t remember what the standard one should be. It''s a different suggested value for doubles and floats).

Josh

Share this post


Link to post
Share on other sites
That''s also two branches and a function call. That''s awful performance, even if it''s inline.

I''d just do a fmul. Chances are that more than a few bit ops would be slower than one fmul. Though, the right answer is to try different things and profile.

I like pie.

Share this post


Link to post
Share on other sites