
math use fpu or 3dnow?

This topic is 4712 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.


Hello, I have a water demo that does a lot of math: the grid is a projected grid, so each vertex must be transformed twice in software, I project a texture onto it, and I perturb the surface, so I have to recalculate the surface normals every time I update it. It's a real bottleneck. I have tried rewriting my routines with 3DNow! code, but I have never managed to get them to run much faster than plain x87 FPU code. Why is that? Even with the AMD 3D library, its code doesn't turn out to be faster than FPU code. Has anybody managed to really take advantage of these instruction sets, and how? I have an AMD Athlon and I code using Athlon-specific instructions.

The strength of SIMD is just that: Single Instruction, Multiple Data. One instruction performs several calculations in parallel (two floats per instruction with 3DNow!, four with SSE). You need to design your code to take advantage of this, for example by transforming a vector by a matrix with one entire matrix row multiplied per instruction.
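To make the row-per-instruction idea concrete, here is a minimal sketch. It uses SSE intrinsics rather than 3DNow!, since 3DNow! is no longer supported on current CPUs; the same broadcast-multiply-accumulate pattern applies to 3DNow! with two floats per register. The function name `transform_vec4` and the row-major, row-vector convention (`out = v * m`) are assumptions for illustration, not anything from the original poster's code.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical helper: out = v * m, where v is a row vector and m holds
   a 4x4 matrix row by row (m[4*row + col]). */
static void transform_vec4(float out[4], const float v[4], const float m[16])
{
    assert(out != NULL && v != NULL && m != NULL);
#if HAVE_SSE
    /* Broadcast one component of v, multiply it by a whole matrix row,
       and accumulate: 4 packed multiplies and 3 packed adds in total. */
    __m128 r = _mm_mul_ps(_mm_set1_ps(v[0]), _mm_loadu_ps(m + 0));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(v[1]), _mm_loadu_ps(m + 4)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(v[2]), _mm_loadu_ps(m + 8)));
    r = _mm_add_ps(r, _mm_mul_ps(_mm_set1_ps(v[3]), _mm_loadu_ps(m + 12)));
    _mm_storeu_ps(out, r);
#else
    /* Scalar fallback for targets without SSE: 16 multiplies, 12 adds. */
    float t[4];
    for (int c = 0; c < 4; ++c)
        t[c] = v[0] * m[c] + v[1] * m[4 + c] + v[2] * m[8 + c] + v[3] * m[12 + c];
    for (int c = 0; c < 4; ++c)
        out[c] = t[c];
#endif
}
```

Note that the gain only shows up when many vertices are pushed through a loop like this; calling it once per vertex through a non-inlined function tends to eat most of the benefit.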

Guest Anonymous Poster
Are you sure that you are computing normals in the right way for your grid?
(That is, you shouldn't be averaging triangle normals; you should be computing the normals directly from your grid.)

I calculate normals by averaging them: I calculate the normal of each facet, add it to the normal of each of its vertices, and then divide each vertex normal by the number of facets that contributed to it. Why shouldn't it be done like that for a water surface?

You can definitely get huge gains (a speed-up of 100% or more). Be certain to control your compiler settings well. If you use C intrinsics, read the generated asm output files, learn the instruction timings from the AMD/Intel docs, and check whether your benchmarks (use a profiler, either external or embedded in your code with rdtsc or GetTickCount) show the timings you expect.
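For the "embedded in your code with rdtsc" option, a minimal timing harness might look like the sketch below. It assumes the `__rdtsc()` compiler intrinsic (available in GCC, Clang and MSVC on x86) and falls back to `clock()` elsewhere. The names `read_ticks`, `min_ticks`, `work_t` and `sample_work` are hypothetical, and the workload is only a placeholder for the routine you actually want to measure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
static uint64_t read_ticks(void) { return __rdtsc(); }
#elif defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Coarse fallback when no time-stamp counter intrinsic is available. */
static uint64_t read_ticks(void) { return (uint64_t)clock(); }
#endif

/* Run fn(arg) `runs` times and return the smallest tick count seen.
   Taking the minimum filters out interrupts and cache-cold runs. */
static uint64_t min_ticks(void (*fn)(void *), void *arg, int runs)
{
    assert(fn != NULL && runs > 0);
    uint64_t best = UINT64_MAX;
    for (int r = 0; r < runs; ++r) {
        uint64_t t0 = read_ticks();
        fn(arg);
        uint64_t t1 = read_ticks();
        if (t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}

/* Placeholder workload: sum of squares over an array. */
typedef struct { const float *data; int n; float result; } work_t;

static void sample_work(void *arg)
{
    work_t *w = (work_t *)arg;
    float s = 0.0f;
    for (int i = 0; i < w->n; ++i)
        s += w->data[i] * w->data[i];
    w->result = s;
}
```

Time the x87 and the SIMD version of the same routine on the same data with `min_ticks`, then compare the tick counts against what the instruction timings in the vendor docs would predict.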

Q: do you use intrinsics or inline asm?

In my experience with 3DNow!, it's very rare, even in the less favorable cases (like cross products or quaternion multiplications), that I don't get a speed-up of at least 50%. Be certain to avoid anomalies due to bad C/C++ inlining, bad data alignment, function call overhead, etc. That gets rid of one common artificial dead end: you can be sure that you are comparing quality (truly optimized) code in x87 against 3DNow! or SSE.

Beyond that, I leave the prior algorithmic analysis to you. Still, I'd say that you should see considerable performance gains:

a) Cheap renormalization: the cheap (1 cycle throughput) rsqrt, without Newton-Raphson refinement, gives you enough precision for the kind of output you have: RGBA with 8 bits per component in the end.

b) A regular grid brings increased parallelism. It gives you an implicit adjacency graph (mesh) and implicit x, y coords. You can exploit this structure to compute several (2, 4 or 8) rows at once, or else use loop unrolling. Be certain to use the register space fully, to schedule your operations well, and preferably to use a structure-of-arrays layout. A regular grid has constant steps in x and y, assuming that only the z component changes in your surface animation model. Hence the most obvious optimization: do not recompute at run time the elements that are constant.

If you want more tips, then post your current code, preferably starting with the naive x87 version.
