LinaInverse2010

Packet Vector Class (SSE)


I already have a packet float class and am trying to decide how to implement the packet vector class (each packet holding at least 16 vectors). Here are the two main approaches I can think of. Which do you think would be better?

1) Store 4 values (x,y,z,w) per vector, one vector after another. Storing only x,y,z would make the vectors cross the boundaries of the SSE types and would make the cross and dot products much harder to write. But this wastes the w component, since it isn't used much (a 25% loss in efficiency).

2) Store each component as a separate array and keep only x,y,z. This gains efficiency but loses memory coherence, especially in cross products, where the components will be far apart.

Which do you think would be best? Or should I take method 1, use only x,y,z, and do the harder programming? How much do you think I'll gain from this? These classes will be used in a packet tracer (with long render times), so any improvement here will help a ton.

LinaInverse
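
For reference, the two layouts in the question can be sketched as follows. This is a minimal illustration in plain C++ rather than SSE intrinsics, so it stays portable; the names `Vec4`, `PacketAoS`, `PacketSoA`, `kPacketSize` and `cross` are hypothetical and the packet size of 16 is taken from the "at least 16 vectors" in the post.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical packet size: 16 vectors, i.e. four groups of 4 SSE lanes.
constexpr std::size_t kPacketSize = 16;

// Approach 1 (AoS): one x,y,z,w vector per 16-byte slot; w is mostly padding.
struct alignas(16) Vec4 {
    float x, y, z, w;
};
struct PacketAoS {
    Vec4 v[kPacketSize];
};

// Approach 2 (SoA): one array per component, no w stored.
struct PacketSoA {
    alignas(16) float x[kPacketSize];
    alignas(16) float y[kPacketSize];
    alignas(16) float z[kPacketSize];
};

// Cross product of two SoA packets. Every statement reads the same index i
// from each array, so 4 consecutive i values map onto one SSE register per
// component and no shuffles between lanes are needed.
inline void cross(const PacketSoA& a, const PacketSoA& b, PacketSoA& out) {
    assert(&out != &a && &out != &b);  // out must not alias the inputs
    for (std::size_t i = 0; i < kPacketSize; ++i) {
        out.x[i] = a.y[i] * b.z[i] - a.z[i] * b.y[i];
        out.y[i] = a.z[i] * b.x[i] - a.x[i] * b.z[i];
        out.z[i] = a.x[i] * b.y[i] - a.y[i] * b.x[i];
    }
}
```

In this sketch the SoA packet occupies 192 contiguous bytes against 256 for the AoS packet, which is the 25% from the question. Because the three component arrays of one packet sit next to each other, the components are never more than 128 bytes apart.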

quote:

1) lose 25% efficiency.


That's true for pure algebra routines. But in other cases that 25% gets lost in the magma of the instruction pipeline. Also, the w component is useful for several reasons. First, it can distinguish points from vectors. You can then assert a few things so that the coder detects flaws in the algorithm at run time.
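
One way to read the point/vector idea is sketched below. The convention (w = 1 for a point, w = 0 for a direction) and the names `TaggedVec`, `makePoint`, `makeVector`, `difference` and `translate` are assumptions for illustration, not something the post specifies.

```cpp
#include <cassert>

// Hypothetical convention: w == 1 marks a point, w == 0 marks a direction.
struct TaggedVec {
    float x, y, z, w;
};

inline TaggedVec makePoint(float x, float y, float z)  { return {x, y, z, 1.0f}; }
inline TaggedVec makeVector(float x, float y, float z) { return {x, y, z, 0.0f}; }

inline bool isPoint(const TaggedVec& v)  { return v.w == 1.0f; }
inline bool isVector(const TaggedVec& v) { return v.w == 0.0f; }

// point - point = vector: the w components cancel (1 - 1 = 0).
inline TaggedVec difference(const TaggedVec& a, const TaggedVec& b) {
    assert(isPoint(a) && isPoint(b));  // catches a direction passed as a point
    return {a.x - b.x, a.y - b.y, a.z - b.z, a.w - b.w};
}

// point + vector = point: w stays 1 (1 + 0 = 1).
inline TaggedVec translate(const TaggedVec& p, const TaggedVec& d) {
    assert(isPoint(p) && isVector(d));  // catches adding two points
    return {p.x + d.x, p.y + d.y, p.z + d.z, p.w + d.w};
}
```

The asserts compile away when `NDEBUG` is defined, so the checks cost nothing in a release build.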

quote:

Which do you think would be best?



A hybrid solution. The base implementation should be 1). Use 2) (SoA) for high-speed rotation of vertex arrays or for lighting. Of course, these days most of that is done by the T&L hardware on 3D chips, which is why you should stick to 1) as a base. Write a fast array converter to switch between the 1) and 2) formats, and use prefetch.
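
A minimal sketch of such a converter, assuming the x,y,z,w layout for format 1) and three separate float arrays for format 2). The names `Xyzw`, `aosToSoa` and `soaToAos` are hypothetical. It is written as plain scalar loops to stay portable; an SSE version would typically transpose 4x4 blocks with `_MM_TRANSPOSE4_PS` and issue `_mm_prefetch` ahead of the loads, both from `<xmmintrin.h>`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical AoS element: x,y,z,w in one 16-byte slot.
struct alignas(16) Xyzw {
    float x, y, z, w;
};

// Format 1) -> format 2): gather n AoS vectors into three component arrays.
// w is dropped, matching the x,y,z-only SoA layout from the question.
inline void aosToSoa(const Xyzw* in, std::size_t n,
                     float* xs, float* ys, float* zs) {
    assert(in && xs && ys && zs);
    for (std::size_t i = 0; i < n; ++i) {
        xs[i] = in[i].x;
        ys[i] = in[i].y;
        zs[i] = in[i].z;
    }
}

// Format 2) -> format 1): scatter the component arrays back into AoS.
// SoA stores no w, so the caller supplies the value to write into it.
inline void soaToAos(const float* xs, const float* ys, const float* zs,
                     std::size_t n, float w, Xyzw* out) {
    assert(xs && ys && zs && out);
    for (std::size_t i = 0; i < n; ++i) {
        out[i].x = xs[i];
        out[i].y = ys[i];
        out[i].z = zs[i];
        out[i].w = w;
    }
}
```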

You might be interested in my thread "Open Source pro math lib".
Check the math forum and the business forum.



[edited by - Charles B on March 5, 2004 12:23:07 AM]

Just so you know, I'm going to use this for a ray tracing program, not a rasterization program, so this library is going to determine the speed of my program. Would that change what you said?

1) premature optimisation
2) I never used w myself




If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

davepermen.net

I don't think it's a premature optimization, for 2 reasons:

1) It has been shown that using coherent packets gives a great increase in raytracing speed.

2) I intend to port some parts over to GPU shaders and use them for some of the hard parts (for example, only for intersecting very large batches of triangles).

I already have a pretty good normal raytracer that I have been using for the last year and a half, and would like to start over with something much better and faster. I currently render some very complex scenes (with millions of objects, and not just simple triangles at that: I have large polynomials and subdivision surfaces). Some take hours using Global Illumination and supersampling, since I need to trace many rays for things like soft shadows and depth-of-field in order to get a good image (think 1600x1200 * 5x5 initial samples per pixel, and up to 12x12 samples if needed on any given pixel).

LinaInverse

@Dave,
1) premature optimisation

With such principles Doom and Quake would never have existed. Game dev is exactly the field where this rule has to be reversed to some extent. Constant factors of 10-20 often outweigh a log(n) in algorithmic complexity.

This means that in real life, in game tech, n*n can beat k*n*log(n), or n*log(n) can beat k*n, if k is big.

I am writing a math lib and benchmarking everything, so I can assure you that this k means something.
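
To put a number on the constant-factor argument, here is a small sketch that finds where n*n stops beating k*n*log(n). The base-2 logarithm and the function names `simpleWins` and `crossover` are assumptions; the post only writes log(n), and real costs depend on far more than these two formulas.

```cpp
#include <cassert>
#include <cmath>

// Cost models from the post, with a constant factor k on the asymptotically
// better algorithm:
//   simple:  n * n
//   clever:  k * n * log2(n)   (log base 2 is an assumption)
inline bool simpleWins(double n, double k) {
    assert(n >= 2.0 && k > 0.0);
    return n * n < k * n * std::log2(n);
}

// Largest n, scanning upward from 2, for which the n*n algorithm is still
// cheaper. Returns 1 if it never wins.
inline int crossover(double k) {
    int n = 2;
    while (simpleWins(n, k)) ++n;
    return n - 1;
}
```

Under this model, `crossover(20.0)` returns 143: with a constant factor of 20, the quadratic algorithm stays cheaper for every input of up to 143 elements.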

@Lina
Since you are raytracing, I assume that you need some complex functions for lighting, and that you do not necessarily process arrays of vertices. So, to me, x,y,z,w aligned on 128-bit boundaries has to be a base feature. Alignment is the main constraint for SIMD. But I do not know your code. I have never written a full ray tracer, only rasterizers (10 years ago) and FPS 3D engines.
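
The alignment constraint can be expressed directly in C++11 and later. This is a minimal sketch; `AlignedVec4`, `PackedVec3` and `isAligned16` are hypothetical names. It relies on the fact that aligned SSE loads and stores such as `_mm_load_ps` require 16-byte-aligned addresses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical x,y,z,w vector forced onto a 16-byte (128-bit) boundary.
struct alignas(16) AlignedVec4 {
    float x, y, z, w;
};

static_assert(sizeof(AlignedVec4) == 16, "one vector fills one SSE register");
static_assert(alignof(AlignedVec4) == 16, "vector starts on a 128-bit boundary");

// Run-time check for pointers whose origin is not known at compile time.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Without w, a packed x,y,z vector is 12 bytes, so in an array only every
// fourth element starts on a 16-byte boundary.
struct PackedVec3 {
    float x, y, z;
};
static_assert(sizeof(PackedVec3) == 12, "three floats, no padding");
```

Every element of an `AlignedVec4` array passes `isAligned16`, whereas in a 16-byte-aligned `PackedVec3` array elements 1, 2 and 3 do not. That is the boundary-crossing problem raised in the first post.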
