A challenge

This topic is 5144 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I have the following expression: (int)(n - ((float)n / x)), where n is of type int and x is of type float. The goal is to get this puppy, or an equivalent, to run as fast as computerly possible. I know that type conversions are costly. If I remove the float type conversion I lose accuracy, but it does run in about 84% of the time. This operation will get run 90,000 times per update per frame. The result must be stored into a huge array of ints. On a P4 2.8 GHz CPU it takes about 2.3 seconds to complete 1000 times, and I would like to get something close to 1.5 or two. Maybe through inline assembly, which I don't know, but the performance benefit may not be worth learning it. Anyone's input?

What's killing you is the float -> int conversion. You need some background to understand why; the short version is that ANSI C requires truncation and IA-32 rounds by default, so _ftol (the function that implements the conversion) has to change the FPU rounding mode.
If possible, make the result float, or use int throughout. Otherwise, there are several methods to speed up the conversion: -QIfist, __asm fistp, SSE cvtps2pi / 3DNow! pf2id, or extracting the mantissa via the '1.5 trick'.
Note that division (the next longest op) is twice as fast if done in the FPU (16 vs. 26..42 clocks), and can be approximated in 7 clocks with SSE or 3DNow!.
[ all timings for Athlon XP ]
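
For reference, here is a minimal sketch of one common form of the '1.5 trick' mentioned above. It is an illustration, not necessarily the exact variant the poster had in mind. The function name round_via_magic is made up for this example, and the code assumes IEEE-754 doubles, the default round-to-nearest mode, and inputs with magnitude below 2^31. Note that it rounds to nearest rather than truncating, so it is not a drop-in replacement for the (int) cast.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert a double to a 32-bit integer without
   going through the compiler's float -> int cast.
   Assumptions: IEEE-754 doubles, default round-to-nearest mode,
   |d| < 2^31. The result is ROUNDED, not truncated. */
static int32_t round_via_magic(double d)
{
    /* 1.5 * 2^52. Doubles in [2^52, 2^53) have a spacing of exactly 1,
       so the addition rounds d to an integer and leaves that integer
       in the low bits of the mantissa. */
    const double magic = 6755399441055744.0;
    double shifted = d + magic;
    uint64_t bits;

    /* Copy the bit pattern out; memcpy avoids aliasing problems. */
    memcpy(&bits, &shifted, sizeof bits);

    /* The low 32 bits are the rounded value in two's complement. */
    return (int32_t)(uint32_t)bits;
}
```

Whether this beats a plain cast depends on the compiler and target; on builds that already convert with SSE instructions the cast may be just as fast, so it is worth measuring before adopting it.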

Further optimizations require knowledge of the values of n and x, how often they change, how accurate the result must be, and what it's used for.
BTW, learning asm is most definitely worth it - understanding what goes on in the CPU makes you automatically write faster code (even in a HLL).

TempusElf: you are correct; if one operand is a float, the other is promoted. Integer division is not correct in this case - the expression is equal to n - ceil(n / x). Strange..
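
The equivalence with n - ceil(n / x) can be checked with a small sketch like the one below. The helper names are made up for this example. It assumes n >= 0 and x >= 1 so that every intermediate value is non-negative, and it ignores float rounding, which could break the identity when n / x lands extremely close to an integer.

```c
/* Hypothetical helper: ceiling of a non-negative float as an int.
   Written without <math.h> to keep the sketch dependency-free. */
static int ceil_nonneg(float v)
{
    int t = (int)v;             /* truncation equals floor for v >= 0 */
    return t + (v > (float)t);  /* add one if a fraction was dropped */
}

/* The expression from the original post. */
static int original_expr(int n, float x)
{
    return (int)(n - ((float)n / x));
}

/* The equivalent form: n - ceil(n / x), with n / x done in float. */
static int ceil_form(int n, float x)
{
    return n - ceil_nonneg((float)n / x);
}
```

The reasoning is that truncating n - q, with n an integer and the difference non-negative, is the same as taking floor(n - q), which equals n - ceil(q).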

quote:
Original post by samgzman
on a p4 2.8ghz cpu it takes about 2.3 seconds to complete 1000 times



I hope you're talking about your whole update/draw cycle, because I just wrote a simple program to see how many times I could execute (int)(n - ((float)n / x)) in two seconds (on a 700 MHz machine) and it executed more than 33000 times...

And if you are talking about your whole update/draw cycle, 1000 times in 2.3 seconds is more than 400 frames per second... I don't see why this needs any streamlining...
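
A timing test along these lines might look like the sketch below. It is not the program from this post; the function names, and the choice of using the array index as n, are assumptions made for illustration.

```c
#include <time.h>

/* Fill out[0..count-1] with the expression from the thread.
   Using the index i as n is an assumption of this sketch; the
   thread does not say where n comes from. */
static void fill_once(int *out, int count, float x)
{
    for (int i = 0; i < count; ++i)
        out[i] = (int)(i - ((float)i / x));
}

/* Run `passes` full passes over the array and return the elapsed
   CPU time in seconds, as measured by clock(). */
static double time_fill(int *out, int count, float x, int passes)
{
    clock_t start = clock();
    for (int p = 0; p < passes; ++p)
        fill_once(out, count, x);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_fill with count set to 90000 and passes set to 1000 would mirror the numbers quoted in the thread. An optimizing compiler may collapse the repeated passes because they write identical values, so the output array should be read afterwards and the measurement treated as approximate.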

This is part of a water surface simulation. I didn't actually time the expression by itself, but rather the program as a whole with and without it being used, so that was an estimate. It seems there are other influences, but this expression is still the most costly. Thanks for all your input, guys!
