Should I use something like this:
DirectX::XMStoreFloat3A(&Position, DirectX::XMVectorSet(-1.0f, -1.0f, 0.0f, 0.0f));
or:
Position.x = -1.0f;
Position.y = -1.0f;
Position.z = 0.0f;
Well, if you click "Go to Definition" on XMVectorSet, you get:
inline XMVECTOR XMVectorSet
(
    float x,
    float y,
    float z,
    float w
)
{
#if defined(_XM_NO_INTRINSICS_)
    XMVECTORF32 vResult = {x,y,z,w};
    return vResult.v;
#elif defined(_XM_ARM_NEON_INTRINSICS_)
    __n64 V0 = vcreate_f32(((uint64_t)*(const uint32_t *)&x) | ((uint64_t)(*(const uint32_t *)&y) << 32));
    __n64 V1 = vcreate_f32(((uint64_t)*(const uint32_t *)&z) | ((uint64_t)(*(const uint32_t *)&w) << 32));
    return vcombine_f32(V0, V1);
#elif defined(_XM_SSE_INTRINSICS_)
    return _mm_set_ps( w, z, y, x );
#else // _XM_VMX128_INTRINSICS_
#endif // _XM_VMX128_INTRINSICS_
}
So if you use the XMStoreFloat3A approach, XMVectorSet creates a temporary object, which is then passed by value to XMStoreFloat3A (which is OK, because it travels in registers).
So more instructions will be generated. The compiler will try to optimize that away, but don't count on it...
"Position.x = something" won't generate any useless instructions, and I think it is cleaner than the other approach.
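To make the comparison concrete, here is a minimal sketch of the two approaches side by side. It does not use DirectXMath itself: Float3A, SetViaVector, SetDirect and CheckApproachesAgree are hypothetical stand-ins I made up for illustration, and the SSE path only mirrors the general shape of XMVectorSet followed by a store, not the library's real implementation.

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical stand-in for DirectX::XMFLOAT3A: three floats, 16-byte aligned.
struct alignas(16) Float3A {
    float x, y, z;
};

// Approach 1: build a 4-lane vector first, then copy three lanes out.
// This only mirrors the shape of XMVectorSet + XMStoreFloat3A.
inline void SetViaVector(Float3A& out, float x, float y, float z) {
#ifdef SKETCH_HAS_SSE
    __m128 v = _mm_set_ps(0.0f, z, y, x);  // arguments go in as w, z, y, x
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, v);                // spill the temporary to memory
#else
    float lanes[4] = {x, y, z, 0.0f};      // scalar fallback, same layout
#endif
    out.x = lanes[0];
    out.y = lanes[1];
    out.z = lanes[2];
}

// Approach 2: three plain scalar stores, no temporary vector.
inline void SetDirect(Float3A& out, float x, float y, float z) {
    out.x = x;
    out.y = y;
    out.z = z;
}

// Sanity check: both approaches must leave identical values behind.
inline void CheckApproachesAgree() {
    Float3A a{}, b{};
    SetViaVector(a, -1.0f, -1.0f, 0.0f);
    SetDirect(b, -1.0f, -1.0f, 0.0f);
    assert(a.x == b.x && a.y == b.y && a.z == b.z);
}
```

Both functions produce the same result, so the only question is what the compiler emits. If you really care, compare the generated assembly of your own build instead of guessing.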
So is it the better solution if I want my program / game to run as smoothly and quickly as possible?
Well, probably yes.
The point is that loading/storing unaligned values into the aligned vars is slow, but it isn't a big deal. That part of the program won't give you the desired 30 to 60 fps jump.
If you're asking me for advice, I would tell you to use the aligned data, because:
- on the x64 architecture there will be no overhead when 'converting' from XMFLOAT* to XMVECTOR
- the code is more readable
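As a rough illustration of what "aligned data" buys you, the sketch below adds two 4-float values once through an aligned load/store and once through the unaligned variants. Float4, Float4A, AddAligned, AddUnaligned and CheckAlignedMatchesUnaligned are hypothetical names standing in for XMFLOAT4 / XMFLOAT4A style types. This is not DirectXMath code, and it assumes an SSE-capable target (with a scalar fallback otherwise).

```cpp
#include <cassert>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define SKETCH_HAS_SSE 1
#endif

// Hypothetical stand-ins: same members, different alignment guarantees.
struct Float4 { float x, y, z, w; };               // only 4-byte aligned
struct alignas(16) Float4A { float x, y, z, w; };  // 16-byte aligned

static_assert(alignof(Float4A) == 16, "Float4A must be 16-byte aligned");

// Aligned path: the type guarantees alignment, so aligned load/store is safe.
inline Float4A AddAligned(const Float4A& a, const Float4A& b) {
    Float4A r{};
#ifdef SKETCH_HAS_SSE
    __m128 va = _mm_load_ps(reinterpret_cast<const float*>(&a));
    __m128 vb = _mm_load_ps(reinterpret_cast<const float*>(&b));
    _mm_store_ps(reinterpret_cast<float*>(&r), _mm_add_ps(va, vb));
#else
    r = Float4A{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
#endif
    return r;
}

// Unaligned path: no alignment guarantee, so the unaligned variants are needed.
inline Float4 AddUnaligned(const Float4& a, const Float4& b) {
    Float4 r{};
#ifdef SKETCH_HAS_SSE
    __m128 va = _mm_loadu_ps(reinterpret_cast<const float*>(&a));
    __m128 vb = _mm_loadu_ps(reinterpret_cast<const float*>(&b));
    _mm_storeu_ps(reinterpret_cast<float*>(&r), _mm_add_ps(va, vb));
#else
    r = Float4{a.x + b.x, a.y + b.y, a.z + b.z, a.w + b.w};
#endif
    return r;
}

// Sanity check: both paths compute the same sum.
inline void CheckAlignedMatchesUnaligned() {
    Float4A ra = AddAligned(Float4A{1.0f, 2.0f, 3.0f, 4.0f}, Float4A{0.5f, 0.5f, 0.5f, 0.5f});
    Float4 ru = AddUnaligned(Float4{1.0f, 2.0f, 3.0f, 4.0f}, Float4{0.5f, 0.5f, 0.5f, 0.5f});
    assert(ra.x == ru.x && ra.y == ru.y && ra.z == ru.z && ra.w == ru.w);
}
```

The results are identical either way. The aligned type simply lets the load and store use the aligned instructions without any extra care on your side.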