CML + vectormathlibrary (SIMD)?
CML looks great as a math library for an OpenGL program, but it doesn't have SSE operations, and I doubt that the compiler's automatic vectorization will perform as well as a library that actually has SSE-specific code.
I also looked at vectormathlibrary, which is part of the free Bullet physics library, but it doesn't have CML's advanced functionality.
Now, http://cmldev.net/?p=424 says that CML can be used as an extension of another library through data sharing, but I'm wondering if there's a simple way to have CML use vectormathlibrary operations internally (since vectormathlibrary overloads the common operators). By simple I mean modifying CML in a way that won't force me to redo lots of changes each time CML is updated.
Hi Prune,
I don't have an answer for you per se, but here are a few thoughts...
I can't think of a way offhand to make the CML use another library's vectorized code. It'd be nice if the CML had vectorized implementations, but our first priority was portability, so platform-specific optimizations just weren't in the cards (at least not this time around).
That said, although it doesn't sound like quite what you're looking for, the 'data sharing' you mention was intended to let the CML be used alongside another math library, much as you describe. Specifically, the purpose is to let you use the CML's comprehensive library of 2-d and 3-d math functions while relying on another library for the low-level operations if needed.
Basically, all the CML requires is that the data be contiguous (e.g. the 16 elements of a 4x4 matrix must be stored contiguously in memory). I don't have the API in front of me right now, but usage should look something like this:
```cpp
typedef cml::matrix<float, external<4,4> > cml_matrix;
float m[16]; // Assume this is the data you'll be using with your vectorized math functions
cml_matrix cm(m);
// You can now call CML transform/math functions on 'cm', but it will be the data in 'm' that is
// modified.
```
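For readers without CML at hand, the idea behind 'data sharing' can be sketched without any library: a non-owning view that wraps a pointer to externally owned, contiguous storage, so that every write lands in the caller's array. This is a minimal sketch, not CML's implementation; `Matrix44View`, `set_identity`, and `set_translation` are hypothetical names, and column-major storage (translation in elements 12 to 14, as OpenGL expects) is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical non-owning view over 16 contiguous floats (column-major).
// Nothing is copied: every write goes straight to the caller's array.
class Matrix44View {
public:
    explicit Matrix44View(float* data) : data_(data) {}

    // Element access by (row, col), mapped to column-major storage.
    float& operator()(std::size_t row, std::size_t col) {
        assert(row < 4 && col < 4);
        return data_[col * 4 + row];
    }

    float* data() { return data_; }

private:
    float* data_;  // owned by the caller, not by the view
};

// Hypothetical "math library" routines that only ever touch the view.
void set_identity(Matrix44View m) {
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
            m(r, c) = (r == c) ? 1.0f : 0.0f;
}

// Writes the translation into the last column (elements 12, 13, 14).
void set_translation(Matrix44View m, float x, float y, float z) {
    m(0, 3) = x;
    m(1, 3) = y;
    m(2, 3) = z;
}
```

Because the view holds only a pointer, the same array can be handed to SIMD routines before or after; the trade-off is that the caller must keep the array alive for as long as the view is in use.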
I think it's often worth asking, though, whether assembly-level optimization is really needed. There are certainly cases where it is, but if all you're doing is setting up and modifying transforms and transforming a limited amount of geometry for collision detection or similar purposes (a usage pattern that is common in games, I believe), the CML, or any other non-vectorized library, should be plenty fast.

Please let me know if there's any other info I can provide.
vectormathlibrary supports both structure-of-arrays (SoA) and array-of-structures (AoS) storage for vectors, and SoA obviously has higher throughput since it avoids swizzling. I'm wondering how difficult it would be to get CML to work with SoA data. Alternatively, I could simply use SSE intrinsics to rearrange the data into a temporary vector and pass that to CML, which might be OK if this is not a frequent operation.
If the operations are relatively infrequent, I would go the 'temporary vector' route. (I don't think it would be possible to get the CML to work in a 'SoA' setting without some major revisions.)
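A minimal sketch of the 'temporary vector' route, assuming four 4-component vectors stored SoA as one `__m128` register per component (x0..x3, y0..y3, and so on). The `_MM_TRANSPOSE4_PS` macro from `<xmmintrin.h>` turns the four component registers into four AoS vectors, which are then stored into a contiguous temporary that a non-SIMD library could consume. `SoA4` and `soa_to_aos` are hypothetical names for illustration, not part of vectormathlibrary or CML.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>

// Hypothetical SoA block holding four 4-component vectors:
// x holds (x0, x1, x2, x3), y holds (y0, y1, y2, y3), and so on.
struct SoA4 {
    __m128 x, y, z, w;
};

// Rearranges the SoA block into four contiguous AoS vectors:
// out[0..3] = (x0, y0, z0, w0), out[4..7] = (x1, y1, z1, w1), ...
// 'out' must point to 16 floats aligned to 16 bytes (_mm_store_ps requires it).
void soa_to_aos(SoA4 in, float* out) {
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    // _MM_TRANSPOSE4_PS is a macro that transposes the 4x4 block held in
    // its four arguments in place, so they must be modifiable lvalues.
    _MM_TRANSPOSE4_PS(in.x, in.y, in.z, in.w);
    _mm_store_ps(out + 0, in.x);
    _mm_store_ps(out + 4, in.y);
    _mm_store_ps(out + 8, in.z);
    _mm_store_ps(out + 12, in.w);
}
```

The transpose costs a handful of shuffles per four vectors, which is why this only makes sense when the hand-off to the AoS library is infrequent.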
Hi Prune,
If you set up your compiler options properly (on GCC4+, VS7, VS8, VS9), you'll find that the compiler generates darn good SSE code. This is why the CML significantly outperformed at least one hand-optimized SSE implementation in all of the vector tests I ran (http://exmat.sourceforge.net/).
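As a rough illustration of what setting up the compiler properly buys you, here is plain C++ written so that an auto-vectorizer has an easy job: contiguous floats and a loop with a fixed trip count of 4. This is a sketch under assumptions; `Vec4` and `translate_all` are hypothetical, and whether packed SSE instructions are actually emitted depends on the compiler, its version, and its flags (for example, GCC at `-O3` enables `-ftree-vectorize`, and recent GCC versions report vectorized loops with `-fopt-info-vec`).

```cpp
#include <cassert>
#include <cstddef>

// Plain 4-float vector with no intrinsics: the layout is contiguous,
// which is what the auto-vectorizer needs.
struct Vec4 {
    float v[4];
};

// Fixed trip count known at compile time: with optimization enabled the
// compiler can unroll this and may map it onto a packed add.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// Adds 'offset' to each of n points stored contiguously in memory.
void translate_all(Vec4* pts, std::size_t n, const Vec4& offset) {
    assert(n == 0 || pts != nullptr);
    for (std::size_t i = 0; i < n; ++i) pts[i] = pts[i] + offset;
}
```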
However, I'm no SSE expert, so it's quite possible that super-tight SSE code could beat the CML in fair tests. If so, I'd be glad to find out.
Moreover, the CML core is flexible enough to be updated to support SSE-enabled operations. If it turns out that hand-coded SSE is faster for certain data types and fixed-length vectors/matrices, it's basically a matter of specializing a couple of classes and structures.
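A minimal sketch of what specializing a couple of classes could look like, assuming a design where arithmetic is routed through a class template: the generic template handles any element type and length, and a full specialization for `float` x 4 uses SSE intrinsics. `FixedVec` and `AddOp` are hypothetical names and do not reflect CML's actual internals.

```cpp
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>

// Hypothetical fixed-length vector; 16-byte alignment lets the
// float x 4 specialization use aligned SSE loads and stores.
template <typename T, std::size_t N>
struct FixedVec {
    alignas(16) T data[N];

    T& operator[](std::size_t i) { assert(i < N); return data[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return data[i]; }
};

// Generic element-wise addition, used for any element type and length.
template <typename T, std::size_t N>
struct AddOp {
    static FixedVec<T, N> apply(const FixedVec<T, N>& a, const FixedVec<T, N>& b) {
        FixedVec<T, N> r;
        for (std::size_t i = 0; i < N; ++i) r.data[i] = a.data[i] + b.data[i];
        return r;
    }
};

// Full specialization for float x 4: a single packed SSE add.
template <>
struct AddOp<float, 4> {
    static FixedVec<float, 4> apply(const FixedVec<float, 4>& a, const FixedVec<float, 4>& b) {
        FixedVec<float, 4> r;
        _mm_store_ps(r.data, _mm_add_ps(_mm_load_ps(a.data), _mm_load_ps(b.data)));
        return r;
    }
};

// User-facing operator: the specialization is picked at compile time.
template <typename T, std::size_t N>
FixedVec<T, N> operator+(const FixedVec<T, N>& a, const FixedVec<T, N>& b) {
    return AddOp<T, N>::apply(a, b);
}
```

Code that calls `operator+` does not change; only the specialization decides which path runs, which keeps the platform-specific part isolated from the portable code.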
If you can point me at some good, hand-optimized vec/mat code, I'd be very interested in running some tests against CML.
Thanks,
Demian
The code I mentioned in my first post, written by Sony for the open-source Bullet physics library, is very well optimized, though I haven't done a side-by-side comparison with CML yet. I use Intel's compiler on both Windows and Linux, and it has excellent vectorization, but even there I often end up doing lots of manual work: not always intrinsics, but you often have to add stuff like #pragma ivdep. Not to mention that if you don't align your data, vectorization is almost useless.
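On the alignment point, a minimal sketch: `_mm_load_ps` and `_mm_store_ps` require 16-byte aligned addresses, so the vector type below declares `alignas(16)` and the loop asserts the precondition. `Vec4a`, `is_aligned16`, and `scale_all` are hypothetical names; `#pragma ivdep` is Intel-compiler specific and is not used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>

// 16-byte aligned 4-float vector, safe for aligned SSE loads/stores.
struct alignas(16) Vec4a {
    float v[4];
};
static_assert(alignof(Vec4a) == 16, "Vec4a must be 16-byte aligned");
static_assert(sizeof(Vec4a) == 16, "Vec4a must pack exactly four floats");

inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Scales n vectors in place using aligned SSE loads and stores.
void scale_all(Vec4a* vs, std::size_t n, float s) {
    assert(n == 0 || is_aligned16(vs));
    const __m128 k = _mm_set1_ps(s);  // broadcast s into all four lanes
    for (std::size_t i = 0; i < n; ++i) {
        __m128 x = _mm_load_ps(vs[i].v);
        _mm_store_ps(vs[i].v, _mm_mul_ps(x, k));
    }
}
```

Stack and static arrays of `Vec4a` get the alignment automatically; for heap allocations, C++17's aligned `new` or `_mm_malloc` are two options.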
Hi Prune,
I've seen the Sony lib, but haven't had time to try it out. If I can get any spare cycles, I'll check it out. Thanks for the info.
Cheers,
Demian