
# SSE

This topic is 3958 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I want to write a 4D vector struct with an SSE-accelerated addition function:

```cpp
_declspec(align(16)) struct Vector
{
    union
    {
        float x,y,z,w;
        float D[4];
    };
    { ... }
    void Add(Vector* V1)
    {
        Vector* T = this;
        _asm
        {
            mov esi, T
            mov edi, V1
            movaps xmm0, xmmword ptr [esi]
            movaps xmm1, xmmword ptr [edi]
            addps xmm0,xmm1
            movaps xmmword ptr [esi], xmm0
        }
    }
};
```

There is no syntax problem, but when I debug the asm code above, only the last 4 bytes of the xmm0 and xmm1 registers are filled with correct data; the first 3*4 bytes are CCCCCCCC. I've tested this code with valid, initialized vectors (and tried movups too). Thanks for any help!

##### Share on other sites
Your problem is the declaration of the union within the Vector struct. Your code tells the compiler to put x, y, z, w and D[4] all at the same place; that is, the addresses of x, y, z and w are the same. What you want is:

```cpp
union
{
    struct
    {
        float x, y, z, w;
    };
    float D[4];
};
```

Otherwise your SSE code is OK.
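
To make the layout difference concrete, here is a minimal sketch built on the corrected union from this reply. The names `Vec4` and `components_overlay_array` are hypothetical and not from the thread. Note the assumption that the compiler accepts an anonymous struct inside a union: this is an extension (MSVC, GCC and Clang all support it), not standard C++.

```cpp
// Hypothetical struct; the union layout is the corrected one from the reply.
struct Vec4
{
    union
    {
        struct { float x, y, z, w; };  // four distinct floats, laid out consecutively
        float D[4];                    // the same 16 bytes viewed as an array
    };
};

// Returns true when each named component shares its address with the
// matching array element, i.e. x overlays D[0], y overlays D[1], and so on.
inline bool components_overlay_array(const Vec4& v)
{
    return &v.x == &v.D[0] && &v.y == &v.D[1]
        && &v.z == &v.D[2] && &v.w == &v.D[3];
}
```

With the original declaration (`float x,y,z,w;` directly inside the union), all four components would instead share the address of `D[0]`, so only one float of the 16 bytes is ever written through them.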

##### Share on other sites
Would I be correct in recommending compiler intrinsics rather than assembly, as inline assembly format (or even support) varies by compiler, and using the compiler intrinsics allows compiler optimization? Although I guess portability's not so important if you're not making library code for reuse (like I am).
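
For comparison, a rough sketch of what the same addition might look like with SSE intrinsics instead of inline assembly. The struct name `SseVector` is hypothetical, and `alignas(16)` (C++11) is assumed here in place of the `_declspec(align(16))` used in the original post; the intrinsics themselves (`_mm_load_ps`, `_mm_add_ps`, `_mm_store_ps`) come from `<xmmintrin.h>` and are available in MSVC, GCC and Clang on x86.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical struct. The 16-byte alignment is required by the aligned
// load/store intrinsics used below (the counterparts of movaps).
struct alignas(16) SseVector
{
    float D[4];

    // Adds other to *this, four floats at a time.
    void Add(const SseVector& other)
    {
        __m128 a = _mm_load_ps(D);          // aligned load of this vector
        __m128 b = _mm_load_ps(other.D);    // aligned load of the operand
        _mm_store_ps(D, _mm_add_ps(a, b));  // packed add, then aligned store
    }
};
```

Because there is no asm block, the compiler is free to keep the values in registers and to schedule or inline the call as it sees fit.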

##### Share on other sites
Which compiler do you want to support? For GCC I used the following approach:
```cpp
//
// declare types for SSE registers
//

// 4 * float
typedef float v4sf __attribute__ (( vector_size (16) ));
// 2 * double
typedef double v2df __attribute__ (( vector_size (16) ));

//
// declare helper functions
//

// add a and b
static inline
v4sf _addps(v4sf a, v4sf b)
{
	v4sf result;
	asm (
		"addps %2,%0\n\t"
		: "=x"(result)
		: "0"(a), "xm"(b)
	);
	return result;
}

//
// encapsulate functions in a class
//
class Vector
{
public:
	Vector(v4sf data) : data(data) {}
	// ... more stuff
	friend Vector operator+ (const Vector& a, const Vector& b)
	{
		return Vector(_addps(a.data, b.data));
	}
	Vector& operator+= (const Vector& other)
	{
		data = _addps(data, other.data);
		return *this;
	}
private:
	// store numbers in a 4-vector of floats
	v4sf data;
};
```

GCC supports vector types, so the compiler can take care of the alignment of the vector (here 16 bytes). GCC allows inline assembly as a block of assembler code (possibly more than one instruction) with parameters (%0, %1 and so on). The compiler's register allocator can then decide in which register a value is kept and how it gets there (from another register or from memory). Used correctly, this generates assembler code that looks hand-written but is easy to use.

Note that the vector types supported by GCC (v4sf here, for instance) allow basic arithmetic, so the line "data = _addps(data, other.data);" above could also be written as "data += other.data". For a dot product, however, instructions like HADDPS are useful, and to my knowledge GCC does not use them on its own. So here inline assembly can do the trick.
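
As a small illustration of the built-in arithmetic mentioned above, here is a sketch using only the GCC/Clang vector extension, with no inline assembly. The function names `add` and `dot` are hypothetical. The dot product sums the lanes with plain scalar code and does not use HADDPS explicitly; which instructions the compiler actually emits is up to the compiler and the optimization flags.

```cpp
// GCC/Clang vector extension type, declared as in the post above.
typedef float v4sf __attribute__ (( vector_size (16) ));

// Element-wise addition using the built-in operator on vector types.
inline v4sf add(v4sf a, v4sf b)
{
    return a + b;
}

// Dot product: multiply element-wise, then sum the four lanes.
// Subscripting a vector type is itself part of the extension.
inline float dot(v4sf a, v4sf b)
{
    v4sf p = a * b;
    return p[0] + p[1] + p[2] + p[3];
}
```

This compiles with GCC and Clang only; MSVC has no equivalent vector type, so there the intrinsics route is the usual choice.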

##### Share on other sites
Quote:
 Original post by Catafriggm
 Would I be correct in recommending compiler intrinsics rather than assembly, as inline assembly format (or even support) varies by compiler, and using the compiler intrinsics allows compiler optimization? Although I guess portability's not so important if you're not making library code for reuse (like I am).

Intrinsics may be better in some cases. The main advantage is optimization, since compilers usually don't optimize across asm blocks. In VS 2005 you will sometimes even have to use intrinsics, since inline asm is not supported in 64-bit mode.
Of course, intrinsics have drawbacks too. For example, they may not fully cover the instruction(s) you need (e.g. __cpuid in Visual Studio 2005), and they are entirely compiler-specific (or at least I don't know of any "standard" for intrinsics).