SSE alignment

I'm trying to make a vector class, both to learn the maths involved and to learn SSE (for the heck of it). So, I created a class. However, when I use it in my raytracer (which creates arrays of vertices and stores them in the __m128 datatype), it sometimes segfaults. Not always, but I assume it is because of alignment. I have seemingly tried everything. Please help me understand! (I'm on Linux, using gcc. Probably not the best choice, but I can move to Windows to test stuff if you require it.) Class:
struct Vector3 {
	union {
#ifdef USE_SSE
		__m128 vec;
#endif
		struct {
			float x, y, z, w;
		};
		struct {
			float r, g, b, a;
		};
		float xyzw [4];
	};
	inline Vector3() {}
	inline Vector3 (float newx, float newy, float newz, float neww=0.0f) :
		x(newx), y(newy), z(newz), w(neww) {}

	//inline Vector3 (const Vector3& vector3) : vec (vector3.vec) {}

	inline float magnitude ();
	inline float magnitudeSquared ();
	inline void normalize ();	
	inline void clamp(const Scalar& min, const Scalar& max);
#ifdef USE_SSE	

	 void* operator new[](size_t allocSize)
	{
		void* p = _mm_malloc(allocSize, 16);
		return p;
	}

	void operator delete[](void *p)
	{
		_mm_free(p);
	}

	void* operator new (size_t allocSize)
	{
		void* p = _mm_malloc(allocSize, 16);
		return p;
	}

	void operator delete(void* p)
	{
		_mm_free(p);
	}

#endif

	void print ();
};

If I remove the allocation operators, there is no noticeable difference (in segfault frequency). I am using the xmmintrin.h header. Is the way I'm using anonymous structs/unions wrong? How do other people do it? Thanks.

You should make sure that the data types you use with SSE are 16-byte aligned. Using __declspec(align(16)) should fix that issue.

I'm using gcc, that's a VC++ specific option.


More specifically, the __m128 type is already aligned to boundaries using __attribute__, as specified in the xmmintrin.h file:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

Also, this issue didn't occur as often before, but probably because my code was simpler.

I'm almost certain that it isn't aligned, yet I have no idea how to check.

Is there a way to check in gdb, or something?

I can post the whole class if you want.

Quote:
Original post by solinent
I'm using gcc, that's a VC++ specific option.


More specifically, the __m128 type is already aligned to boundaries using __attribute__, as specified in the xmmintrin.h file:

typedef float __m128 __attribute__ ((__vector_size__ (16), __may_alias__));

Also, this issue didn't occur as often before, but probably because my code was simpler.

I'm almost certain that it isn't aligned, yet I have no idea how to check.

Is there a way to check in gdb, or something?

I can post the whole class if you want.


Compile with -g or -ggdb and in GDB break in the scope of the variable you want to check and do "print &vec", where vec is your actual variable's name. You can also use GDB to catch the segfault and see where it happened. You might want to start there. If you don't know how to do this someone can walk you through the basics.

I have stepped through with GDB, and used commands like "where", "break" and "next" to debug my program.

However, I can't seem to actually access the variables. "print" doesn't do anything: even when I set a breakpoint before the one area in the header file and try to print out the values I gave the function, it gives me nothing. It also doesn't seem to step through the file correctly.

I have used GDB, but I was just wondering if there was a specific command I can use to check alignment. __alignof__ apparently just returns what the alignment should be, so I can't use that.

Thanks!

What does your compile command look like? Either you're not generating appropriate debug info or you have stack corruption.

Uh, this is the command to compile a cpp into an object.

g++ -Wall -g -O -msse -ffast-math -funroll-loops -mfpmath=sse -c src/Vector3.cpp -o bin/Vector3.o


And, this is the command to link:
g++ -Wall -g -O -msse -ffast-math -funroll-loops -mfpmath=sse -lrt -lSDL -lboost_thread -o raytracer bin/main.o bin/Vector3.o src/Mesh.o

Also, profiling using the -pg option works.

I'm using a Makefile btw.

Ok, so you have no problems when USE_SSE is not defined? Are you actually using any intrinsics yet or just including that __m128 in your union at this point?

Yeah, I am actually using intrinsics, and when USE_SSE is not defined, the code works fine.

Is there some extra step I can do to make sure it stays aligned?

Try something like printf("%p",&vec); or similar in the constructors of your Vector3.
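
As a rough sketch of that suggestion, the printed address can be tested directly: a pointer is 16-byte aligned when its low four bits are zero. The helper names `isAligned` and `reportAlignment` below are hypothetical and not part of the posted class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Returns true if p sits on a multiple of `alignment` bytes.
// `alignment` is assumed to be a power of two (16 for __m128).
inline bool isAligned(const void* p, std::size_t alignment = 16) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical logging helper: prints an address and whether it is
// 16-byte aligned. Could be called from a constructor with `this`.
inline void reportAlignment(const char* label, const void* p) {
    std::printf("%s: %p (%s)\n", label, const_cast<void*>(p),
                isAligned(p) ? "16-byte aligned" : "MISALIGNED");
}
```

Calling something like `reportAlignment("Vector3", this);` from each constructor would show which instances, if any, end up off a 16-byte boundary.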

I'm not sure about gcc, but for VC, __declspec(align()) only guarantees that the type will be correctly aligned when allocated on the stack. It doesn't ensure alignment for dynamically allocated memory (and if you think about it, it's not clear how it could while remaining fully standards compliant). The default alignment for dynamically allocated memory is 8 bytes with VC, so you have to do extra work to ensure correct alignment if you're working with SSE. That may be your problem.
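
A minimal sketch of that "extra work" on Linux, assuming POSIX `posix_memalign` is available. It is used here in place of `_mm_malloc` only so the example does not depend on the SSE headers. `Float4`, `allocFloat4Array` and `freeFloat4Array` are hypothetical stand-ins, not names from the posted code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>  // posix_memalign, free (POSIX, not standard C++)

// Hypothetical stand-in for a 4-float vector type.
struct Float4 {
    float v[4];
};

// Allocates storage for `count` Float4 values on a 16-byte boundary.
// Returns nullptr on failure. The memory is uninitialised.
inline Float4* allocFloat4Array(std::size_t count) {
    void* p = nullptr;
    if (posix_memalign(&p, 16, count * sizeof(Float4)) != 0) {
        return nullptr;
    }
    // Sanity check: the block really starts on a 16-byte boundary.
    assert(reinterpret_cast<std::uintptr_t>(p) % 16 == 0);
    return static_cast<Float4*>(p);
}

// Memory from posix_memalign is released with plain free().
inline void freeFloat4Array(Float4* p) {
    free(p);
}
```

Memory obtained this way bypasses `new`, so it only suits types that need no constructor or destructor calls, or code that constructs elements explicitly.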

Man, read my posts, or at least the first post.

Also, I understand that you can't specifically set the alignment.

I'll see what I can do about this. The thing is, I know that it is unaligned; there'd be no other reason for an unaligned memory access. So, I still need to discover how else I can allocate memory. Maybe I shouldn't use _mm_malloc?

Sorry, didn't really look at the code so I didn't notice you already had custom allocators.

It's been a while since I looked at custom operator new[], but from what I remember, the address you return from it may not be the address at which the array actually starts. The compiler needs to store the number of elements in the array so that it knows how many times to run the destructor when the array is deleted, and many compilers will over-allocate and store the element count before the first element. This means that even though you're returning an aligned address from operator new[], I'm not sure that guarantees the array will start on an aligned address. Might be worth looking into that.
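
One way to look into it is to record the address that operator new[] returns and compare it with the address of the first element. The sketch below does that for a hypothetical type `Tracked`, which has a user-provided destructor so that the compiler has a reason to store an element count. It uses `posix_memalign` rather than `_mm_malloc`, purely to avoid the SSE headers. Whether an offset appears, and how large it is, depends on the compiler's ABI. As far as I know, under the Itanium C++ ABI that gcc uses on Linux, a type with a trivial destructor (like the posted Vector3) normally gets no such header, so treat this as an experiment rather than a diagnosis.

```cpp
#include <cassert>
#include <cstddef>
#include <new>       // std::bad_alloc
#include <stdlib.h>  // posix_memalign, free (POSIX)

// Hypothetical type with a non-trivial destructor.
struct Tracked {
    float v[4];
    ~Tracked() {}  // user-provided, so delete[] must know the element count

    // Address most recently returned by operator new[] (for inspection only).
    static void* lastRaw;

    static void* operator new[](std::size_t bytes) {
        void* p = nullptr;
        if (posix_memalign(&p, 16, bytes) != 0) {
            throw std::bad_alloc();
        }
        lastRaw = p;
        return p;
    }

    static void operator delete[](void* p) {
        free(p);
    }
};

void* Tracked::lastRaw = nullptr;

// Returns the number of bytes between the block handed out by
// operator new[] and the first array element.
inline std::ptrdiff_t arrayOffset(std::size_t count) {
    Tracked* arr = new Tracked[count];
    std::ptrdiff_t offset = reinterpret_cast<char*>(arr) -
                            static_cast<char*>(Tracked::lastRaw);
    assert(offset >= 0);  // any count header sits before the elements
    delete[] arr;
    return offset;
}
```

If `arrayOffset` returns a value that is not a multiple of 16, the array itself is misaligned even though the raw allocation was aligned, which is exactly the situation described above.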
