XMVECTOR and XMMATRIX in header?

This topic is 2146 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hello! So I've been struggling with getting XMVECTOR and XMMATRIX working when defined in my header file. If I do:

XMVECTOR camPosition;

in the header file for example, and then do:

camPosition = XMVectorSet( 0.0f, 0.0f, -0.5f, 0.0f );

in my .cpp file, it gives me an "access violation" error. I read that this is because of some kind of alignment thingy? Is there no way to declare the XMVECTOR or XMMATRIX in the header file and then use it in the .cpp file?


If not, what should I do instead? Because defining them in the .cpp file works, but that looks so bad? Any other solutions, what do you guys do?

The above solution works for stack-allocated objects, but might not work with heap-allocated objects (i.e. objects allocated via new/delete). That depends on your OS, though: some already allocate memory on 16-byte boundaries, but you should check that this is the case for yours. Overloading new/delete and getting them to call _mm_malloc/_mm_free might be the solution you are after. It's also worth pointing out that you should be careful with virtual functions when using aligned data types, because they silently add a hidden vtable pointer to the structure, which shifts the members that follow it and can introduce padding.
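To make the virtual-function point concrete, here is a minimal sketch. `Vec4` is a hypothetical stand-in for XMVECTOR (so the snippet compiles without DirectXMath), and `PlainCamera`, `VirtualCamera` and `OffsetOfPosition` are names invented for this illustration. The exact layout is compiler-specific, so treat the numbers as typical rather than guaranteed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for XMVECTOR: four floats that require 16-byte alignment.
struct alignas(16) Vec4 { float f[4]; };

// No virtual functions: the first member sits at the very start of the object.
struct PlainCamera {
    Vec4 position;
};

// One virtual function: the compiler adds a hidden vtable pointer,
// so 'position' can no longer start at offset 0.
struct VirtualCamera {
    virtual ~VirtualCamera() = default;
    Vec4 position;
};

// Byte distance from the start of the object to its 'position' member.
template <typename T>
std::size_t OffsetOfPosition(const T& obj) {
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&obj.position) -
        reinterpret_cast<const char*>(&obj));
}
```

On common compilers the vtable pointer comes first and is padded up to the member's alignment, so `VirtualCamera` ends up larger than `PlainCamera`. The member stays aligned only as long as the object itself starts on a 16-byte boundary.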

For stack allocation, go to the project options and set Struct Member Alignment to 16 bytes, or pass /Zp16 on the command line.


For dynamic allocation you can use _aligned_malloc together with placement new (requires the <new> header).

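#include <new>      // placement new
#include <malloc.h> // _aligned_malloc, _aligned_free (Microsoft-specific)
// Allocate raw storage on a 16-byte boundary, then construct the object in it.
AlignedObj* ptrToAlignedObj = (AlignedObj*)_aligned_malloc(sizeof(AlignedObj), 16);
new (ptrToAlignedObj) AlignedObj();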

To destroy an object allocated with _aligned_malloc, you must call its destructor manually and then release the memory with _aligned_free.


Note that _aligned_malloc and _aligned_free are not standard; they are Microsoft-specific.
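Putting the allocate/construct/destroy/free steps together, a rough sketch could look like the following. `AlignedObj` here is a hypothetical example type, and the helper names (`AllocAligned16`, `FreeAligned16`, `CreateAlignedObj`, `DestroyAlignedObj`) are invented for illustration. The non-MSVC branch using C++17 `std::aligned_alloc` is an assumption added so the sketch also builds on other compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>
#if defined(_MSC_VER)
#include <malloc.h>   // _aligned_malloc / _aligned_free
#endif

// Hypothetical example type; alignas(16) stands in for a class holding XMVECTOR/XMMATRIX members.
struct alignas(16) AlignedObj {
    float values[4] = {0.0f, 0.0f, 0.0f, 0.0f};
};

// Allocate raw storage on a 16-byte boundary.
inline void* AllocAligned16(std::size_t size) {
#if defined(_MSC_VER)
    return _aligned_malloc(size, 16);
#else
    // Assumption for portability: std::aligned_alloc wants a size that is a multiple of the alignment.
    const std::size_t rounded = (size + 15) / 16 * 16;
    return std::aligned_alloc(16, rounded);
#endif
}

// Release storage obtained from AllocAligned16.
inline void FreeAligned16(void* p) {
#if defined(_MSC_VER)
    _aligned_free(p);
#else
    std::free(p);
#endif
}

// Allocate aligned storage and construct the object in it with placement new.
inline AlignedObj* CreateAlignedObj() {
    void* raw = AllocAligned16(sizeof(AlignedObj));
    if (!raw) throw std::bad_alloc();
    return new (raw) AlignedObj();
}

// Call the destructor manually, then free the aligned storage.
inline void DestroyAlignedObj(AlignedObj* obj) {
    if (!obj) return;
    obj->~AlignedObj();
    FreeAligned16(obj);
}
```

The important part is the pairing: memory from the aligned allocator must go back through the matching free function, never through plain `delete`.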


Another, better solution could be to define a custom allocator, or just overload operators new and delete for the classes that need 16-byte alignment.
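As a sketch of the operator new/delete route, the class below forwards its allocations to the C++17 aligned form of the global `operator new`. That is a swapped-in technique chosen for portability, not the _mm_malloc or _aligned_malloc calls mentioned in this thread, and it assumes a C++17 compiler. `Vec4` and `Camera` are hypothetical names, with `Vec4` standing in for XMVECTOR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical stand-in for XMVECTOR.
struct alignas(16) Vec4 { float f[4]; };

class Camera {
public:
    // Class-specific allocation: every `new Camera` gets 16-byte aligned storage.
    static void* operator new(std::size_t size) {
        // The aligned global operator new throws std::bad_alloc on failure.
        return ::operator new(size, std::align_val_t(16));
    }

    // Must release with the same alignment that was used to allocate.
    static void operator delete(void* p) noexcept {
        ::operator delete(p, std::align_val_t(16));
    }

    Vec4 position{};
    int  someOtherMember = 0;
};
```

This only covers single objects; `new Camera[n]` would need matching `operator new[]` and `operator delete[]` overloads as well.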

Edited by Alessio1989

Hey, I think the fastest way to tell if your problem is alignment-related is to build your project with the Debug configuration instead of Release, since I believe Visual Studio's Debug configuration disables SSE intrinsics by default.


Assuming that is the problem, and you don't get the access violation in the Debug configuration, it probably is a memory alignment issue. That's because XMMATRIX and XMVECTOR use (I think) __m128 SSE intrinsic types for storage, and those need 16-byte memory alignment. More info here: http://msdn.microsoft.com/en-us/library/ee418725.aspx
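Another way to confirm an alignment problem is to check the address directly. The helper below is a small sketch; `IsAligned` is a name made up for this example, not part of DirectXMath.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if p sits on an 'alignment'-byte boundary.
// 'alignment' must be a power of two (16 for SSE types).
inline bool IsAligned(const void* p, std::size_t alignment = 16) {
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}
```

For instance, putting `assert(IsAligned(&camPosition));` in the constructor of the class that owns the vector would flag a misaligned heap allocation before the first SSE instruction touches it.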


There are a few ways to work around that issue. The easiest is to just build for an x64 target platform, since heap allocations are 16-byte aligned for x64 processes (instead of 8-byte aligned for x86).


The next simplest is probably to store your XMMATRIX as an XMFLOAT4X4 type instead (and XMVECTOR as XMFLOAT4), then use the XMLoadxxx and XMStorexxx functions with temporary local XMVECTOR/XMMATRIX variables to feed them into the DirectXMath functions that expect 16-byte aligned XMVECTOR/XMMATRIX arguments. More info here: http://msdn.microsoft.com/en-us/library/microsoft.directx_sdk.loading.xmloadfloat4.aspx
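The shape of that load/compute/store pattern can be sketched without DirectXMath. Everything below is a hypothetical stand-in: `Float4` plays the role of XMFLOAT4, `Vec4` the role of XMVECTOR, `LoadFloat4`/`StoreFloat4` mirror XMLoadFloat4/XMStoreFloat4, and `Add` stands in for a math routine such as XMVectorAdd. In real code you would use the DirectXMath types and functions instead.

```cpp
#include <cassert>
#include <cstring>

// Plain storage type with no special alignment requirement (like XMFLOAT4).
struct Float4 { float x, y, z, w; };

// 16-byte aligned working type (like XMVECTOR); only used for locals here.
struct alignas(16) Vec4 { float f[4]; };

// Copy from plain storage into an aligned working value.
inline Vec4 LoadFloat4(const Float4* src) {
    Vec4 v;
    std::memcpy(v.f, src, sizeof(v.f));
    return v;
}

// Copy an aligned working value back into plain storage.
inline void StoreFloat4(Float4* dst, Vec4 v) {
    std::memcpy(dst, v.f, sizeof(v.f));
}

// Component-wise add, standing in for the real vector math.
inline Vec4 Add(Vec4 a, Vec4 b) {
    for (int i = 0; i < 4; ++i) a.f[i] += b.f[i];
    return a;
}

class Camera {
public:
    void Move(const Float4& delta) {
        Vec4 pos = LoadFloat4(&m_position);   // aligned local on the stack
        pos = Add(pos, LoadFloat4(&delta));   // do the math on aligned values
        StoreFloat4(&m_position, pos);        // write back to plain storage
    }
    const Float4& Position() const { return m_position; }

private:
    // Plain floats: valid at any address, so heap-allocated Cameras are fine.
    Float4 m_position{0.0f, 0.0f, -0.5f, 0.0f};
};
```

Because the class only holds plain floats, it has no 16-byte alignment requirement of its own, and the aligned type only ever exists as a local where the compiler takes care of alignment.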


Finally, you can attempt to align the containing class/struct that the XMMATRIX/XMVECTOR is a member of. You essentially declare the class/struct in your header with __declspec(align(16)), and then make sure you declare the member variables with the XMVECTOR/XMMATRIX types first, to ensure the alignment. For example:

__declspec(align(16)) class CMyClass
{
public:
      bool SomeFunc();
      void SomeOtherFunc();
      // Aligned XM types come first, so they start on the class's 16-byte boundary.
      XMMATRIX m_world;
      XMVECTOR m_camPosition;
      int m_someOtherMember;
      bool m_yetAnotherMember;
};

That should align the entire class/struct to 16-byte boundaries, and since the aligned XM types are declared first they begin at the requested alignment for the class/struct. I think that can get a bit messy (with virtual functions and inheritance possibly altering the data layout within a class), so perhaps someone more experienced could provide better information on that. Personally I use the SSE intrinsic DirectXMath types directly, and only compile for the x64 target.

This helped me a lot! I bet there is a more correct way, or even a more efficient one. But I simply solved it with #define _XM_NO_INTRINSICS_

It seemed to be a lot easier for me. Is there anything that would make this way of doing things more complicated or even wrong? It seems to have solved my problem for now, though.

Glad that it helped. Using #define _XM_NO_INTRINSICS_ will just make your Release builds work the same as Debug builds with aligned XM types. It's not wrong per se, but if you're doing a lot of matrix or vector operations, it's going to cost a significant amount of extra CPU time.


If you are doing a fair amount of vector/matrix operations (like 50 or 100+ per frame) and don't want to build for the x64 target, I'd strongly recommend using the XMLoad and XMStore functions with temporary XMVECTOR/XMMATRIX local variables. They're really easy to use and you'll see a noticeable drop in CPU usage versus _XM_NO_INTRINSICS_, despite the Load/Store overhead.


Aligning the allocations on the stack and heap is quite a bit more involved, and I can understand why you'd like to avoid that for now.


On that note I just wanted to say thanks to RobTheBloke and Alessio1989, because I wasn't aware of the details of heap alignment myself either. At some point I want to use AVX instructions and I'll need to align to 32-byte boundaries, so the additional info about _aligned_malloc is appreciated.


quick edit:


Just for example's sake, assuming you store the world matrix for each object/mesh, this is how you'd use XMLoad when transposing a world matrix for setting constant buffers:

// Change your functions to take XMFLOAT4X4 arguments instead of XMMATRIX ones.
void UpdateObjectConstantBuffer(ID3D11DeviceContext* context, const XMFLOAT4X4& worldMatrix)
{
    // XMLoadFloat4x4 takes a pointer and returns an XMMATRIX, so pass its result straight to XMMatrixTranspose.
    XMMATRIX transposedWorld = XMMatrixTranspose(XMLoadFloat4x4(&worldMatrix));
    // Now map the constant buffer, copy over the transposedWorld matrix, unmap the buffer, etc., or however you did it before.
}

So you'd use XMFLOAT4X4 instead of XMMATRIX for the storage type in your header files (e.g. class/struct members), then use function-local XMMATRIX variables only when you need them for DirectXMath library functions. The local transposedWorld is automatically aligned on the stack, so it's pretty simple, and more importantly it'll let you drop your _XM_NO_INTRINSICS_ define and take advantage of SSE intrinsic performance without worrying about manual alignment.

Edited by backstep
