
Lots of vector related questions


No replies to this topic

#1 coder0xff   Members   -  Reputation: 225


Posted 06 November 2010 - 09:58 AM

Essentially, I want to know the fastest way to perform vector operations (add, sub, dot, cross, etc.) in C#. I'm using SlimDX. Using .NET Reflector, I discovered that SlimDX implements vector operations directly rather than passing them to DirectX (which is good). But then I see this in Reflector's disassembler window:

public static Vector4 Add(Vector4 left, Vector4 right)
{
    Vector4 vector;
    vector.X = left.X + right.X;
    vector.Y = left.Y + right.Y;
    vector.Z = left.Z + right.Z;
    vector.W = left.W + right.W;
    return vector;
}


This led me to believe (from experience with VC++/CLI) that the packed-single versions of the SSE instructions aren't used, and that it instead does one scalar addition at a time. So I decided to get an actual x86 disassembly, and I was disappointed to find this:

old-school FPU instructions!


It's not using SSE instructions AT ALL!

I have a library that uses packed-single SSE instructions for the most common ops, written with VC++ intrinsic functions. It is, of course, native. The problem is that switching from .NET to native code incurs overhead (as I've seen in ANTS profiler and in benchmarks).
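For readers unfamiliar with the intrinsics approach, a packed-single add can be sketched roughly as below. This is only an illustration: `Vec4` and `add` are hypothetical names, not the API of SlimDX or of the library mentioned above, and the sketch assumes an x86 target with SSE and 16-byte-aligned data.

```cpp
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_add_ps, ...)

// Hypothetical vector type for illustration; alignas(16) lets us use
// aligned loads and stores (MOVAPS).
struct alignas(16) Vec4 {
    float v[4];  // x, y, z, w
};

// Adds all four components with one packed-single add (ADDPS)
// instead of four scalar additions.
inline Vec4 add(const Vec4& a, const Vec4& b) {
    __m128 va = _mm_load_ps(a.v);  // aligned 16-byte load
    __m128 vb = _mm_load_ps(b.v);
    Vec4 out;
    _mm_store_ps(out.v, _mm_add_ps(va, vb));
    return out;
}
```

Calling `add` on two `Vec4` values produces the component-wise sum; whether this beats the managed scalar version in practice depends on how often the managed/native boundary is crossed.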

I've heard of SlimGen, and I know it was made by some of the same guys who work on SlimDX. Why doesn't SlimDX use SlimGen to inject (or however it works) SSE packed-single versions of the vector operations? I noticed SlimGen is for .NET 2.0. Is it not possible with .NET 4.0?

AND the last question! Unfortunately, most vectors I see in use aren't vec4, but vec3. I can't load/store a whole vec3 into an XMM register with one instruction (or can I?) without going past the end of the data (MOVAPS is going to move 16 bytes, but we only have 12). I'd rather avoid using a bunch of instructions to load and shuffle individual (or pairs of) floating-point values. How can I best solve this problem? Convert everything to vec4? Just use scalar instructions?
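To make the vec3 trade-off concrete, here is a sketch of one way to load and store exactly 12 bytes without touching a fourth float. It is not a single instruction: it takes two loads plus one shuffle-style move, which is the cost being weighed against padding to vec4. `load3` and `store3` are hypothetical helper names, and the sketch assumes an x86 target with SSE.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Loads x, y, z from 12 bytes at p into lanes 0..2; lane 3 is set to 0.
// Never reads p[3], so it is safe at the end of a buffer.
inline __m128 load3(const float* p) {
    // MOVLPS: load 8 bytes (x, y) into the low half of a zeroed register.
    __m128 xy = _mm_loadl_pi(_mm_setzero_ps(),
                             reinterpret_cast<const __m64*>(p));
    // MOVSS: load 4 bytes (z) into lane 0; upper lanes are zeroed.
    __m128 z = _mm_load_ss(p + 2);
    // MOVLHPS: low half of z becomes the high half of the result.
    return _mm_movelh_ps(xy, z);  // (x, y, z, 0)
}

// Stores lanes 0..2 of v to 12 bytes at p; never writes p[3].
inline void store3(float* p, __m128 v) {
    _mm_storel_pi(reinterpret_cast<__m64*>(p), v);  // x, y
    _mm_store_ss(p + 2, _mm_movehl_ps(v, v));       // z (lane 2 moved to lane 0)
}
```

With these helpers a vec3 add becomes `store3(out, _mm_add_ps(load3(a), load3(b)))`. If the data layout can be changed, padding each vec3 to 16 bytes and using the plain aligned load avoids the extra instructions at the cost of 4 bytes per vector.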

Thanks in advance
