Alignment misunderstandings

Started by
3 comments, last by Finalspace 6 years, 12 months ago

Well yesterday i got a really weird issue where my game crashed reproduceable at the exact same spot: Trying to set the color for the ~407 th line added to my render buffer.
This looks like this is very straightforward (Memory for this is already pre-allocated, so there was no concerns about that):


struct RenderState {
	u8 *commandBufferBase;
	mem_size commandBufferSize;
	mem_size commandBufferUsed;
};

struct RenderCommand {
	RenderCommandType type;
	Vec4f color;
	Transform transform;
	union {
		RenderCommandClear clear;
		RenderCommandLines lines;
		RenderCommandPolygon polygon;
		RenderCommandCircle circle;
		RenderCommandSprite sprite;
	};
};

internal RenderCommand *RenderPush(RenderState *renderState, RenderCommandType type, Transform *transform, const Vec4f &color) {
	mem_size size = sizeof(RenderCommand);
	Assert(renderState->commandBufferUsed + size <= renderState->commandBufferSize);
	void *renderCommandBase = (u8 *)renderState->commandBufferBase + renderState->commandBufferUsed;
	renderState->commandBufferUsed += size;
	RenderCommand *result = (RenderCommand *)renderCommandBase;
	ZeroStruct(result);
	result->type = type;
	result->color = color; // <- The crash was here
	if (transform) {
		result->transform = *transform;
	}
	return(result);
}

Now after searching hours of the culprit (in totally wrong places of course), i finally double checked the actual structure for storing the color:


union Vec4f {
	struct {
		union {
			Vec3f xyz;
			struct {
				r32 x, y, z;
			};
		};
		r32 w;
	};
	struct {
		union {
			Vec3f rgb;
			struct {
				r32 r, g, b;
			};
		};
		r32 a;
	};
	struct {
		Vec3f xyz;
		r32 w;
	};
	struct {
		Vec2f xy;
		r32 ignored0;
		r32 ignored1;
	};
	struct {
		r32 ignored2;
		Vec2f yz;
		r32 ignored3;
	};
	struct {
		r32 ignored4;
		r32 ignored5;
		Vec2f zw;
	};
	struct {
		__m128 simd128;
	};
	r32 m[4];
};

What i missed here, was the __m128 which was used in the SSE implementation for several color corrections on a pixel level.
What i didn´t know about: __m128 must be aligned by a 16 byte boundary, otherwise you get undefined behaviors.

After removing the __m128 from the vector structure with introducing an additional _mm_load_ps() to get the color correction working - the issue was gone.

This was not easy to track down, it took my hours to figure this out... This was truly coding horror!

I really should sleep more...

Advertisement

Sure that is the cause? _mm_load_ps() must be 16-byte aligned, so if it doesn't work otherwise but does work with that intrinsic, this is distinctly weird.

Note by the way that a (aligned) read from an unaligned address into an xmm register is by no means undefined behavior, on the contrary it is exactly defined. It raises a general protection (GP) exception. You will need to use _mm_loadu_ps() which is available as of SSE2 (or, preferrably, align properly). Did you consider adding alignas(16) or alignas(alignof(__m128)) to your union definition?

Oh, another fun thing... doing the color correction in SSE and thus accessing an inactive member after previously writing to a differnt one indeed invokes undefined behavior in the strictest sense, on a language level, at least in C++ (not so in C, which explicitly says type punning will happen). There's the struct-with-common-sequence exception, but that is not the same as allowing type punning. It will probably "work" anyway, but who knows... might as well keep some values in registers and the read from memory into the xmm gives rubbish.

I've had similar crashes (can't remember the exact fault code off the top of my head), from code like:
//library A:
struct Vec4 { float x,y,z,w; }
struct Foo {
  int a;
  Vec4 b;
};

//library B:
/*[alignment declaration]*/ struct Float4 { /*[union/SSE magic]*/ }

//glue code:
void CopyState(Foo* out, Float4* input)
{
  static_assert( sizeof(Float4) == sizeof(Vec4) );//this should be safe enough...
  ((Float4&)out->b) = *input; // crash - uh oh, b isn't an aligned variable
}

Sure that is the cause?


Yes that was the cause, after removing the __m128 field the issue was gone.

Parts which calculates color stuff using SSE, gets a local _m128 and read the original Vec4f floats using _mm_load_ps(&v.m[[0]) and
this works fine as well.

Maybe on x86 it may not work? Dont know, i support x64 only.

Well samoth you was right. It was not fixed - i got that crash again.

After setting the field alignment for the float field it was properly fixed:

union Vec4f {

__declspec(align(16)) r32 m[4];
};

This topic is closed to new replies.

Advertisement