
This topic is 4171 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I want to write a 4D vector struct with an SSE-accelerated addition function:

_declspec(align(16)) struct Vector
{
    union
    {
        float x,y,z,w;
        float D[4];
    };
    { ... }
    void Add(Vector* V1)
    {
        Vector* T = this;
        _asm
        {
            mov esi, T
            mov edi, V1
            movaps xmm0, xmmword ptr [esi]
            movaps xmm1, xmmword ptr [edi]
            addps xmm0,xmm1
            movaps xmmword ptr [esi], xmm0
        }
    }
};

There is no syntax problem, but when I debug the asm code above, only the last 4 bytes of the xmm0 and xmm1 registers are filled with correct data, and the first 3*4 bytes are CCCCCCCC. I've tested this code with valid, initialized vectors (and tried movups too). Thanks for any help!

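Your problem is the declaration of the union within the Vector struct. Your code tells the compiler to put x, y, z, w and D[4] all in the same place. That is, x, y, z and w share one address (the address of D[0]), so writing to any of them only ever fills a single float and the other 12 bytes of D stay uninitialized. What you want is: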

union
{
    struct
    {
        float x, y, z, w;
    };
    float D[4];
};


Otherwise your SSE code is OK.
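
To make the layout difference concrete, here is a minimal sketch that checks that the named members line up with the array slots once the anonymous struct is added. PackedVec4 and both helper functions are hypothetical names used only for illustration. Note that an anonymous struct inside a union is a compiler extension in C++ (MSVC, GCC and Clang accept it), as is reading one union member after writing another.

```cpp
#include <cassert>

// Hypothetical stand-in for the Vector struct discussed above.
struct PackedVec4
{
    union
    {
        struct
        {
            float x, y, z, w;   // four distinct floats, back to back
        };
        float D[4];             // the same 16 bytes viewed as an array
    };
};

// True when x..w occupy the same addresses as D[0]..D[3].
inline bool layout_is_as_expected(const PackedVec4& v)
{
    return sizeof(PackedVec4) == 4 * sizeof(float)
        && &v.x == &v.D[0]
        && &v.y == &v.D[1]
        && &v.z == &v.D[2]
        && &v.w == &v.D[3];
}

// Fill through the array view, then read through a named member.
inline float layout_demo_read_w()
{
    PackedVec4 v;
    for (int i = 0; i < 4; ++i)
        v.D[i] = static_cast<float>(i + 1);
    assert(layout_is_as_expected(v));
    return v.w;                 // same storage as D[3]
}
```

With the original union (no inner struct), the same address check would fail for y, z and w, because all four would sit at the address of D[0].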

Would I be correct in recommending compiler intrinsics rather than assembly, as inline assembly format (or even support) varies by compiler, and using the compiler intrinsics allows compiler optimization? Although I guess portability's not so important if you're not making library code for reuse (like I am).
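
For comparison, a rough sketch of what the Add function might look like with intrinsics instead of inline assembly. SseVec4 and sse_demo_add are hypothetical names, alignas assumes a C++11 compiler (the original code uses _declspec(align(16)) instead), and the code assumes an x86 target with SSE enabled.

```cpp
#include <cassert>
#include <xmmintrin.h>   // SSE: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical stand-in for the thread's Vector struct.
struct alignas(16) SseVec4
{
    float D[4];

    // D[i] += other.D[i] for all four lanes.
    void Add(const SseVec4& other)
    {
        __m128 a = _mm_load_ps(D);          // aligned load (MOVAPS)
        __m128 b = _mm_load_ps(other.D);    // aligned load (MOVAPS)
        _mm_store_ps(D, _mm_add_ps(a, b));  // ADDPS, then aligned store
    }
};

// Small usage example: adds two vectors and returns the result.
inline SseVec4 sse_demo_add()
{
    SseVec4 a = {{1.0f, 2.0f, 3.0f, 4.0f}};
    SseVec4 b = {{10.0f, 20.0f, 30.0f, 40.0f}};
    a.Add(b);
    assert(a.D[0] == 11.0f);
    return a;
}
```

The compiler is free to keep a and b in registers and to schedule or merge the loads, which is the optimization benefit mentioned above.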

Which compiler do you want to support? For GCC I used the following approach:

//
// declare types for SSE registers
//

// 4 * float
typedef float v4sf __attribute__ (( vector_size (16) ));

// 2 * double
typedef double v2df __attribute__ (( vector_size (16) ));



//
// declare helper functions
//

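// add a and b
static inline
v4sf _addps(v4sf a, v4sf b)
{
    v4sf result;
    asm (
        "addps %2,%0\n\t"       // AT&T order: %0 = %0 + %2
        : "=x"(result)          // output: any XMM register
        : "0"(a), "xm"(b)       // a shares the output register; b may be XMM or memory
    );
    return result;
}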


//
// encapsulate functions in a class
//

class Vector
{
public:
    Vector(v4sf data) : data(data) {}
    // ... more stuff
    friend Vector operator+ (const Vector& a, const Vector& b)
    {
        return Vector(_addps(a.data, b.data));
    }
    Vector& operator+= (const Vector& other)
    {
        data = _addps(data, other.data);
        return *this;
    }
private:
    // store numbers in a 4-vector of floats
    v4sf data;
};



GCC supports vector types, so the compiler takes care of the alignment of the vector (here 16 bytes). GCC allows inline assembly as a block of assembler code (which may contain more than one instruction) with parameters (%0, %1 and so on). The compiler's register allocator then decides which register each value is kept in and how it gets there (from another register or from memory). Used correctly, this generates assembler code that looks hand-written but is easy to use.

Note that the vector types supported by GCC (here v4sf, for instance) allow basic arithmetic, so the line "data = _addps(data, other.data);" above could also be written as "data += other.data". For a dot product, however, instructions like HADDPS are useful, and to my knowledge GCC does not use them on its own. So here inline assembly can do the trick.
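
As a small illustration of the built-in arithmetic, here is a sketch using the same v4sf typedef. The function names add4, dot4 and vecext_demo are made up for this example. The dot product simply sums the four lanes with scalar adds, which is an assumption chosen for simplicity; it does not use HADDPS, and which instructions the compiler actually emits depends on the target and flags. Subscripting a vector type is itself part of the GCC/Clang extension.

```cpp
#include <cassert>

// 4 * float, same typedef as in the post above (GCC/Clang extension)
typedef float v4sf __attribute__ (( vector_size (16) ));

// Lane-wise sum using the built-in operator; no inline asm needed.
inline v4sf add4(v4sf a, v4sf b)
{
    return a + b;
}

// Dot product: lane-wise multiply, then a scalar sum of the four lanes.
inline float dot4(v4sf a, v4sf b)
{
    v4sf p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}

// Small usage example.
inline float vecext_demo()
{
    v4sf a = {1.0f, 2.0f, 3.0f, 4.0f};
    v4sf b = {5.0f, 6.0f, 7.0f, 8.0f};
    v4sf s = add4(a, b);
    assert(s[0] == 6.0f && s[3] == 12.0f);
    return dot4(a, b);   // 1*5 + 2*6 + 3*7 + 4*8
}
```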

Quote:
Original post by Catafriggm
Would I be correct in recommending compiler intrinsics rather than assembly, as inline assembly format (or even support) varies by compiler, and using the compiler intrinsics allows compiler optimization? Although I guess portability's not so important if you're not making library code for reuse (like I am).


Intrinsics may be better in some cases. The main advantage is optimization, since compilers usually don't optimize across asm blocks. In VS 2005 you will sometimes even have to use intrinsics, since inline asm is not supported in 64-bit mode.
Of course they have some drawbacks. For example, the compiler may not fully support the instruction(s) you need (e.g. __cpuid in Visual Studio 2005), and intrinsics are entirely compiler specific (or at least I don't know of any "standard" for them).
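
One common way to contain the compiler-specific parts is to hide them behind a preprocessor check with a plain scalar fallback. The sketch below is only an illustration: VEC_HAS_SSE, add4_portable and portable_demo are hypothetical names, and the detection relies on the predefined macros __SSE__ (GCC, Clang) and _M_X64 / _M_IX86_FP (MSVC).

```cpp
#include <cassert>

// Compile-time SSE detection via compiler-predefined macros.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define VEC_HAS_SSE 1
#else
  #define VEC_HAS_SSE 0
#endif

// out[i] = a[i] + b[i] for i in 0..3. No alignment is required.
inline void add4_portable(const float* a, const float* b, float* out)
{
#if VEC_HAS_SSE
    // Unaligned loads and store (MOVUPS), so any float pointer is accepted.
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#else
    // Scalar fallback for targets or compilers without SSE support.
    for (int i = 0; i < 4; ++i)
        out[i] = a[i] + b[i];
#endif
}

// Small usage example: returns the last lane of the sum.
inline float portable_demo()
{
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {0.5f, 0.5f, 0.5f, 0.5f};
    float out[4];
    add4_portable(a, b, out);
    assert(out[0] == 1.5f);
    return out[3];
}
```

Callers only ever see add4_portable, so switching the SSE branch to inline asm or to a different compiler's syntax would not touch the rest of the code.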
