# CML + vectormathlibrary (SIMD)?


CML looks great as a math library for an OpenGL program, but it doesn't have SSE operations, and I doubt that automatic vectorization by the compiler will be as good as a library that actually has SSE-specific code. I also looked at vectormathlibrary, which is part of the free Bullet physics library, but it doesn't have the advanced functionality of CML. Now, http://cmldev.net/?p=424 says that CML can be used as an extension of another library through data sharing, but I'm wondering whether there's a simple way to have CML internally use vectormathlibrary operations (since vectormathlibrary overloads the common operators). By simple I mean modifying CML in a way that won't force me to redo lots of changes each time CML is updated.

---
Hi Prune,

I don't have an answer for you per se, but here are a few thoughts...

I can't think of a way offhand to make the CML use another library's vectorized code. It'd be nice if the CML had vectorized implementations, but our first priority was portability, so platform-specific optimizations just weren't in the cards (at least not this time around).

However, although it doesn't sound like it's quite what you're looking for, the 'data sharing' you mention was intended to make it possible to use the CML alongside another math library, kind of like what you describe. Specifically, the purpose is to make it possible to use the CML's comprehensive library of 2-d and 3-d math functions while using another library for the low-level stuff if needed.

Basically, all the CML requires is that the data be contiguous (i.e. the 16 elements of a 4x4 matrix should be stored contiguously in memory). I don't have the API in front of me right now, but usage should look something like this:
```cpp
#include <cml/cml.h>

typedef cml::matrix<float, cml::external<4,4> > cml_matrix;

float m[16];       // Assume this is the data you'll be using with your vectorized math functions
cml_matrix cm(m);  // 'cm' wraps 'm' without copying it
// You can now call CML transform/math functions on 'cm', but it will be the
// data in 'm' that is modified.
```
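To make the "data sharing" idea concrete without depending on CML's headers, here is a minimal sketch of a non-owning 4x4 view over caller-owned, contiguous storage. `Mat4View`, `set_identity` and `set_translation` are hypothetical names chosen for illustration; this is not how CML's `external<>` array type is actually implemented, and the row-major indexing and column-vector translation are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not CML's implementation: a non-owning 4x4 view
// over caller-owned, contiguous float[16] storage. Writes go straight to that storage.
class Mat4View {
public:
    explicit Mat4View(float* data) : data_(data) { assert(data != nullptr); }

    // Row-major indexing is an assumption of this sketch; use whatever layout
    // your other math library (or OpenGL) expects.
    float& operator()(std::size_t row, std::size_t col) { return data_[row * 4 + col]; }
    float operator()(std::size_t row, std::size_t col) const { return data_[row * 4 + col]; }

    void set_identity() {
        for (std::size_t r = 0; r < 4; ++r)
            for (std::size_t c = 0; c < 4; ++c)
                (*this)(r, c) = (r == c) ? 1.0f : 0.0f;
    }

    // Stores a translation in the last column (column-vector convention, assumed).
    void set_translation(float x, float y, float z) {
        (*this)(0, 3) = x;
        (*this)(1, 3) = y;
        (*this)(2, 3) = z;
    }

private:
    float* data_;  // not owned; the array must outlive the view
};
```

Usage mirrors the CML snippet: after `float m[16]; Mat4View v(m); v.set_identity();` the identity matrix lives in `m` itself, so the same array can then be handed to vectorized routines from another library.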
I think it's often worth asking, though, whether assembly-level optimization is really needed. There are certainly cases where it is, but if all you're doing is setting up and modifying transforms and transforming a limited amount of geometry for collision detection or similar purposes (a usage pattern that is common in games, I think), the CML, or any other non-vectorized library, should be plenty fast, I would think.

Please let me know if there's any other info I can provide.

---
vectormathlibrary supports both structure-of-arrays (SoA) and array-of-structures (AoS) storage for vectors, and obviously SoA has higher throughput since it avoids swizzling. I'm wondering how difficult it would be to get CML to work with SoA data, or perhaps I would simply use an SSE intrinsic to do the rearrangement into a temporary vector and provide that to CML, which might be OK if this is not a frequent operation.
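To illustrate the throughput argument, here is a sketch, assuming x86 with SSE and a hypothetical `Vec3x4` SoA block, that computes four dot products at once. Each SSE lane holds one vector's component, so the work is purely vertical multiplies and adds. With AoS data, a single dot product instead needs shuffles for the horizontal sum (or SSE4.1's `_mm_dp_ps`).

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps

// Hypothetical SoA block holding four 3-component vectors.
// alignas(16) keeps each member array 16-byte aligned for _mm_load_ps.
struct alignas(16) Vec3x4 {
    float x[4], y[4], z[4];
};

// Computes out[i] = a[i] . b[i] for all four vectors at once.
// Lane i of each register belongs to vector i, so no shuffles are needed.
inline void dot4(const Vec3x4& a, const Vec3x4& b, float out[4]) {
    __m128 d = _mm_mul_ps(_mm_load_ps(a.x), _mm_load_ps(b.x));
    d = _mm_add_ps(d, _mm_mul_ps(_mm_load_ps(a.y), _mm_load_ps(b.y)));
    d = _mm_add_ps(d, _mm_mul_ps(_mm_load_ps(a.z), _mm_load_ps(b.z)));
    _mm_storeu_ps(out, d);  // unaligned store: 'out' need not be aligned
}
```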

---
If the operations are relatively infrequent, I would go the 'temporary vector' route. (I don't think it would be possible to get the CML to work in an SoA setting without some major revisions.)
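A sketch of the 'temporary vector' route, assuming x86 SSE and a hypothetical `Vec4SoA` block of four 4-component vectors. `_MM_TRANSPOSE4_PS` (a macro in `<xmmintrin.h>`) transposes the four component rows in registers, yielding four AoS vectors that could then be copied into temporaries and passed to CML.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE: _mm_load_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS

// Hypothetical SoA block: four 4-component vectors stored component-wise.
// alignas(16) keeps each member array 16-byte aligned for _mm_load_ps.
struct alignas(16) Vec4SoA {
    float x[4], y[4], z[4], w[4];
};

// Rearranges the block into AoS form: out[i] = {x[i], y[i], z[i], w[i]}.
inline void soa_to_aos(const Vec4SoA& in, float out[4][4]) {
    __m128 r0 = _mm_load_ps(in.x);
    __m128 r1 = _mm_load_ps(in.y);
    __m128 r2 = _mm_load_ps(in.z);
    __m128 r3 = _mm_load_ps(in.w);
    _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  // in-register 4x4 transpose
    _mm_storeu_ps(out[0], r0);          // unaligned stores: 'out' need not be aligned
    _mm_storeu_ps(out[1], r1);
    _mm_storeu_ps(out[2], r2);
    _mm_storeu_ps(out[3], r3);
}
```

The transpose costs a handful of shuffles per four vectors, which is why it is reasonable when the conversion is infrequent but not something to do inside a hot loop.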

---
OK. Thanks for the fast replies :)

---
Hi Prune,

If you set up your compiler options properly (on GCC 4+, VS7, VS8, VS9), you'll find that the compiler generates darn good SSE code. This is why the CML significantly outperformed at least one other library's hand-optimized SSE code in all of the vector tests I ran (http://exmat.sourceforge.net/).

However, I'm no SSE expert, so it very well could be that it's possible to write super-tight SSE code that beats the CML in fair tests. If this is the case, I'd be glad to find out.

Moreover, the CML core is actually flexible enough to be updated to support SSE-enabled operations. If it turns out that hand-coded SSE is faster for certain data types and fixed-length vectors/matrices, it's basically a matter of specializing a couple of classes and structures.
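As a rough illustration of that kind of specialization (the names `VecAdd` and `add` are hypothetical and are not CML's actual internals), a generic loop-based operation can be specialized for four `float` elements to use SSE, assuming an x86 target:

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE: _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

// Generic, portable element-wise addition for any element type and length.
template <typename T, std::size_t N>
struct VecAdd {
    static void apply(const T* a, const T* b, T* out) {
        for (std::size_t i = 0; i < N; ++i) out[i] = a[i] + b[i];
    }
};

// Full specialization for float x 4: one SSE add instead of four scalar adds.
// Unaligned loads/stores so it works with any float[4].
template <>
struct VecAdd<float, 4> {
    static void apply(const float* a, const float* b, float* out) {
        _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
    }
};

// Front end: callers never name the implementation; the specialization is
// picked at compile time from the element type and length.
template <typename T, std::size_t N>
void add(const T (&a)[N], const T (&b)[N], T (&out)[N]) {
    VecAdd<T, N>::apply(a, b, out);
}
```

The appeal of this design is that user code does not change: `add` on `float[4]` takes the SSE path, while any other type or length falls back to the portable loop.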

If you can point me at some good, hand-optimized vec/mat code, I'd be very interested in running some tests against CML.

Thanks,
Demian

---
The code I mentioned in my first post, written by Sony for the Bullet physics open source library, is very well optimized. I haven't done a side-by-side comparison with CML yet. I use Intel's compiler on both Windows and Linux, which has excellent vectorization, but even there I often end up doing lots of manual work: not always intrinsics, but often you have to add crap like #pragma ivdep etc. Not to mention that if you don't align your data, vectorization is almost useless.
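A minimal sketch of the two manual hints mentioned above, assuming the classic Intel compiler (which defines `__INTEL_COMPILER`; other compilers skip the pragma here): 16-byte-aligned arrays declared with `alignas(16)`, and `#pragma ivdep` telling the compiler to ignore assumed dependencies between the two pointers. `integrate` is a hypothetical example function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical example: x[i] += vx[i] * dt for n elements.
inline void integrate(float* x, const float* vx, float dt, std::size_t n) {
    // Debug-build check that callers honor the 16-byte alignment this sketch
    // assumes. The assert itself does not inform the compiler about alignment.
    assert(reinterpret_cast<std::uintptr_t>(x) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(vx) % 16 == 0);
#if defined(__INTEL_COMPILER)
#pragma ivdep  // ICC: ignore assumed loop-carried dependencies (x and vx might alias)
#endif
    for (std::size_t i = 0; i < n; ++i) {
        x[i] += vx[i] * dt;
    }
}
```

Callers would declare the data as, for example, `alignas(16) float x[1024];` so that the vectorized loop can work on aligned 16-byte chunks.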

---
Hi Prune,

I've seen the Sony lib, but haven't had time to try it out. If I can get any spare cycles, I'll check it out. Thanks for the info.

Cheers,
Demian

---
Oops, I jumped the gun. The SSE implementation is AoS only; the SoA option is only implemented for the Cell version.
