# SimpleMath - a simplified wrapper for DirectXMath

SimpleMath, created by my colleague Chuck Walbourn, is a header file that wraps the DirectXMath SIMD vector/matrix math API with an easier-to-use C++ interface. It provides the following types, with names, methods, and operator overloads similar to those of the XNA Game Studio math API:
• Vector2
• Vector3
• Vector4
• Matrix
• Color
• Plane
• Quaternion
• Ray
• BoundingSphere
• BoundingBox

Why wrap DirectXMath?

DirectXMath provides highly optimized vector and matrix math functions, which take advantage of SSE SIMD intrinsics when compiled for x86/x64, or the ARM NEON instruction set when compiled for an ARM platform such as Windows RT or Windows Phone. The downside of being designed for efficient SIMD usage is that DirectXMath can be somewhat complicated to work with. Developers must be aware of correct type usage (understanding the difference between SIMD register types such as XMVECTOR vs. memory storage types such as XMFLOAT4), must take care to maintain correct alignment for SIMD heap allocations, and must carefully structure their code to avoid accessing individual components from a SIMD register. This complexity is necessary for optimal SIMD performance, but sometimes you just want to get stuff working without so much hassle!

Enter SimpleMath...

These types derive from the equivalent DirectXMath memory storage types (for instance, Vector3 derives from XMFLOAT3), so they can be stored in arbitrary locations without worrying about SIMD alignment, and individual components can be accessed without calling SIMD accessor functions. But unlike XMFLOAT3, Vector3 defines a rich set of methods and overloaded operators, so it can be manipulated directly without first loading its value into an XMVECTOR. Vector3 also defines an operator for automatic conversion to XMVECTOR, so it can be passed directly to functions written against the lower-level DirectXMath types.

If that sounds horribly confusing, the short version is that the SimpleMath types pretty much Just Work™ the way you would expect them to.
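To make that description concrete, here is a minimal sketch of the wrapper pattern. It deliberately avoids DirectXMath so it compiles anywhere: Float3, Reg4, Load, Store, Add, and Vec3 are hypothetical stand-ins for XMFLOAT3, XMVECTOR, XMLoadFloat3, XMStoreFloat3, XMVectorAdd, and Vector3, and the "register" is just four plain floats. It illustrates the idea only; it is not SimpleMath's actual implementation.

```cpp
#include <cassert>

// Hypothetical stand-in for a storage type like XMFLOAT3: plain floats,
// no special alignment, components readable directly.
struct Float3 {
    float x, y, z;
};

// Hypothetical stand-in for a register type like XMVECTOR. A real one would
// wrap __m128 (SSE) or float32x4_t (NEON); here it is four floats, 16-byte aligned.
struct alignas(16) Reg4 {
    float v[4];
};

// Explicit memory <-> register transfers, in the spirit of XMLoadFloat3/XMStoreFloat3.
Reg4 Load(const Float3& f) { return Reg4{{f.x, f.y, f.z, 0.0f}}; }
void Store(Float3& f, const Reg4& r) { f = Float3{r.v[0], r.v[1], r.v[2]}; }

// A "register-level" operation, in the spirit of XMVectorAdd.
Reg4 Add(const Reg4& a, const Reg4& b) {
    return Reg4{{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

// The wrapper (hypothetical): derives from the storage type, so it keeps that
// type's layout and alignment, and hides load/compute/store inside operators.
struct Vec3 : Float3 {
    Vec3(float x_, float y_, float z_) : Float3{x_, y_, z_} {}

    // Convert back from a register result.
    explicit Vec3(const Reg4& r) : Float3{r.v[0], r.v[1], r.v[2]} {}

    // Implicit conversion: a Vec3 can be passed wherever a Reg4 is expected.
    operator Reg4() const { return Load(*this); }

    Vec3& operator+=(const Vec3& o) {
        Store(*this, Add(Load(*this), Load(o)));  // load, add, store
        return *this;
    }
};
```

With this in place, `pos += vel` reads like ordinary math, `pos.x` is a plain member access, and a Vec3 can be handed straight to the register-level `Add` through the implicit conversion, which is the same combination of conveniences described above.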

By now you must be wondering where the catch is. Of course, there is one. SimpleMath hides the complexities of SIMD programming by automatically converting back and forth between memory and SIMD register types, which tends to generate additional load and store instructions. This can add significant overhead compared to the lower-level DirectXMath approach, where SIMD loads and stores are under the explicit control of the programmer.

Who is SimpleMath for?

You should use SimpleMath if you are:
• Looking for a C++ math library with an API similar to the C# Microsoft.Xna.Framework types
• Porting existing XNA code from C# to C++
• Wanting to optimize for programmer efficiency (simplicity, readability, development speed) at the expense of runtime efficiency

You should go straight to the underlying DirectXMath API if you:
• Want to create the fastest possible code
• Enjoy the lateral thinking sometimes needed to express an algorithm in terms of SIMD operations

This need not be a global either/or decision. The SimpleMath types know how to convert themselves to and from the corresponding DirectXMath types, so it is easy to mix and match. You can use SimpleMath for the parts of your program where readability and development time matter most, then drop down to DirectXMath for performance hotspots where runtime efficiency is more important.

Example

Here is a simple object movement calculation, implemented using DirectXMath. Note the skullduggery needed to make sure the PlayerCat instance will always be 16-byte aligned (and I didn't even include the implementation of the AlignedNew helper here!)

<pre class="csharpcode"> #include &lt;DirectXMath.h&gt;

<span class="kwrd">using</span> <span class="kwrd">namespace</span> DirectX;

__declspec(align(16)) <span class="kwrd">class</span> PlayerCat : <span class="kwrd">public</span> AlignedNew&lt;PlayerCat&gt;
{
<span class="kwrd">public</span>:
<span class="kwrd">void</span> Update()
{
<span class="kwrd">const</span> <span class="kwrd">float</span> cFriction = 0.99f;

XMStoreFloat3A(&amp;mPosition, pos + vel);
XMStoreFloat3A(&amp;mVelocity, vel * cFriction);
}

<span class="kwrd">private</span>:
XMFLOAT3A mPosition;
XMFLOAT3A mVelocity;
};</pre>

Using SimpleMath, the same math is, well, a little more simple :-)

```cpp
#include "SimpleMath.h"

using namespace DirectX::SimpleMath;

class PlayerCat
{
public:
    void Update()
    {
        const float cFriction = 0.99f;

        mPosition += mVelocity;
        mVelocity *= cFriction;
    }

private:
    Vector3 mPosition;
    Vector3 mVelocity;
};
```

Here is the x86 SSE code generated for the DirectXMath version of the Update method:

```asm
movaps xmm2,xmmword ptr [ecx+10h]
movaps xmm1,xmmword ptr [ecx]
movaps xmm0,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
addps xmm1,xmm2
mulps xmm0,xmm2
movq mmword ptr [ecx],xmm1
shufps xmm1,xmm1,0AAh
movss dword ptr [ecx+8],xmm1
movq mmword ptr [ecx+10h],xmm0
shufps xmm0,xmm0,0AAh
movss dword ptr [ecx+18h],xmm0
ret
```

The SimpleMath version generates roughly two and a half times as many machine instructions:

```asm
movss xmm2,dword ptr [ecx]
movss xmm0,dword ptr [ecx+4]
movss xmm1,dword ptr [ecx+0Ch]
unpcklps xmm2,xmm0
movss xmm0,dword ptr [ecx+8]
movlhps xmm2,xmm0
movss xmm0,dword ptr [ecx+10h]
unpcklps xmm1,xmm0
movss xmm0,dword ptr [ecx+14h]
movlhps xmm1,xmm0
addps xmm2,xmm1
movss dword ptr [ecx],xmm2
movaps xmm0,xmm2
shufps xmm0,xmm2,55h
movss dword ptr [ecx+4],xmm0
shufps xmm2,xmm2,0AAh
movss dword ptr [ecx+8],xmm2
movss xmm1,dword ptr [ecx+0Ch]
movss xmm0,dword ptr [ecx+10h]
unpcklps xmm1,xmm0
movss xmm0,dword ptr [ecx+14h]
movlhps xmm1,xmm0
mulps xmm1,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
movaps xmm0,xmm1
movss dword ptr [ecx+0Ch],xmm1
shufps xmm0,xmm1,55h
shufps xmm1,xmm1,0AAh
movss dword ptr [ecx+10h],xmm0
movss dword ptr [ecx+14h],xmm1
ret
```

Most of this difference is because I was able to use aligned loads and stores in the DirectXMath version, while the SimpleMath code must do extra work to handle memory locations that might not be properly aligned. Also note how the SimpleMath version loads the mVelocity value from memory into SIMD registers twice, while the extra control offered by DirectXMath allowed me to do this just once.
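The double load of mVelocity can be reproduced with a small instrumented model. In this sketch, Float3 and Reg4 are hypothetical stand-ins for XMFLOAT3 and XMVECTOR, and Load and Store simply count how often they are called; it models the pattern of memory traffic only, not the real instructions, alignment handling, or their cost.

```cpp
#include <cassert>

// Hypothetical stand-ins (not DirectXMath): Float3 ~ XMFLOAT3, Reg4 ~ XMVECTOR.
struct Float3 { float x, y, z; };
struct Reg4 { float v[4]; };

// Count every memory <-> register transfer so the two styles can be compared.
static int g_loads = 0;
static int g_stores = 0;

Reg4 Load(const Float3& f) {
    ++g_loads;
    return Reg4{{f.x, f.y, f.z, 0.0f}};
}

void Store(Float3& f, const Reg4& r) {
    ++g_stores;
    f = Float3{r.v[0], r.v[1], r.v[2]};
}

Reg4 Add(const Reg4& a, const Reg4& b) {
    return Reg4{{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

Reg4 Scale(const Reg4& a, float s) {
    return Reg4{{a.v[0] * s, a.v[1] * s, a.v[2] * s, a.v[3] * s}};
}

// Wrapper style (hypothetical Vec3): every operator hides its own load and store.
struct Vec3 : Float3 {
    Vec3(float x_, float y_, float z_) : Float3{x_, y_, z_} {}
    Vec3& operator+=(const Vec3& o) { Store(*this, Add(Load(*this), Load(o))); return *this; }
    Vec3& operator*=(float s) { Store(*this, Scale(Load(*this), s)); return *this; }
};

// SimpleMath-like update: reads as plain math, but vel is loaded twice.
void UpdateWrapped(Vec3& pos, Vec3& vel, float friction) {
    pos += vel;       // load pos, load vel, store pos
    vel *= friction;  // load vel again, store vel
}

// DirectXMath-like update: each value is loaded exactly once and kept in a "register".
void UpdateExplicit(Float3& pos, Float3& vel, float friction) {
    Reg4 p = Load(pos);
    Reg4 v = Load(vel);
    Store(pos, Add(p, v));
    Store(vel, Scale(v, friction));
}
```

Calling UpdateWrapped performs three loads and two stores, while UpdateExplicit performs two of each and produces the same results, mirroring the extra mVelocity load in the SimpleMath listing.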

But hey, sometimes performance isn't the most important goal. If you care more about optimizing for developer efficiency, SimpleMath could be for you.

Resources

• Introducing DirectXMath (http://blogs.msdn.com/b/chuckw/archive/2012/03/27/introducing-directxmath.aspx)
• DirectXMath: SSE, SSE2, and ARM-NEON (http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-sse-sse2-and-arm-neon.aspx)
• DirectXMath: SSE3 and SSSE3 (http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-sse3-and-ssse3.aspx)
• DirectXMath: SSE4.1 and SSE4.2 (http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-sse4-1-and-sse-4-2.aspx)
• DirectXMath: AVX (http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-avx.aspx)
• DirectXMath: F16C and FMA (http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx)
