Martin

SSE Plus


Martin    194
I'm thinking of looking into SSE programming, specifically for computing noise functions. SSE2 looks nice, though SSE4 has some instructions which would also be useful (horizontal add).

I'm thinking I may have to write 3 implementations of my noise functions:

C only
SSE2
SSE4

That seems like quite a lot of work. Really I only want to write the SSE4 version and have the code automatically fill in the missing functions if SSE4 or SSE2 is not present.

Rather than reinventing the wheel, I've been looking around for SSE libraries. SSEPlus looks interesting. Does anyone have any experience with it?

Overview: http://sseplus.sourceforge.net/SSEPlus.pdf
Download: http://sourceforge.net/projects/sseplus/

I'm wondering what SSE libraries people have experience with, or does everyone roll their own?

Thanks
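To make the "write it once, fill in the gaps" idea concrete, here is a minimal sketch of a horizontal sum with compile-time fallbacks. It is not taken from SSEPlus. The function name `hsum4` is hypothetical, and the feature checks assume a GCC or Clang style compiler that defines `__SSE3__` and `__SSE2__`. The horizontal add intrinsic used here, `_mm_hadd_ps`, is declared in the SSE3 header `pmmintrin.h`.

```c
#include <stddef.h>

#if defined(__SSE3__)
  #include <pmmintrin.h>   /* SSE3: _mm_hadd_ps */
#elif defined(__SSE2__)
  #include <emmintrin.h>   /* SSE2 (also pulls in the SSE1 intrinsics) */
#endif

/* hsum4 is a hypothetical helper: returns v[0] + v[1] + v[2] + v[3].
   v does not need to be 16-byte aligned (unaligned loads are used). */
static float hsum4(const float *v)
{
#if defined(__SSE3__)
    __m128 x = _mm_loadu_ps(v);
    x = _mm_hadd_ps(x, x);                 /* (v0+v1, v2+v3, v0+v1, v2+v3) */
    x = _mm_hadd_ps(x, x);                 /* every lane holds the total   */
    return _mm_cvtss_f32(x);
#elif defined(__SSE2__)
    __m128 x  = _mm_loadu_ps(v);
    __m128 hi = _mm_movehl_ps(x, x);       /* (v2, v3, v2, v3)             */
    __m128 s  = _mm_add_ps(x, hi);         /* (v0+v2, v1+v3, ...)          */
    __m128 t  = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1)); /* lane 1 everywhere */
    s = _mm_add_ss(s, t);                  /* lane 0 = (v0+v2) + (v1+v3)   */
    return _mm_cvtss_f32(s);
#else
    /* Plain C fallback for targets without SSE. */
    return (v[0] + v[1]) + (v[2] + v[3]);
#endif
}
```

The three paths add the elements in different orders, so results can differ in the last bits for general floating-point input. This sketch selects the path when compiling; choosing at run time would additionally need a CPU feature check, which is not shown.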

frob    44904
Many individuals roll their own. They generally get marginal benefits at best; more often than not the results are buggy and introduce a slight performance penalty overall.

Most companies don't write their own. Some will have very specialized libraries for SSE and processor-specific optimizations, but those are the exception and not the rule. High quality libraries are inexpensive, just a few hundred bucks per developer, or very cheap for bulk licenses. It is much cheaper than retaining employees who are experts in the field.

Look at Intel's Math Kernel Library (MKL) and Integrated Performance Primitives (IPP) Library, both free for certain noncommercial uses. Jumping through some hoops is required for the free Windows versions.

Both libraries are extremely well optimized by experts in the field. They take advantage of multi-CPU and multi-core processors. They perform as well as or better than most similar libraries on Intel's 32-bit and 64-bit processors, and are comparable to libraries on AIX and other systems.


The MKL is oriented toward linear algebra, including dense and sparse matrix operations, FFT, calculus, solvers, and higher mathematics.

The IPP library is oriented toward encoding and processing of data, including fast strong encryption, image processing and computer vision, speech and audio processing, and assorted vector/matrix operations.


If you write your own, there is a lot to know. There is overhead involved with SSE. It is easy to introduce errors through unexpected CPU mode changes. Compilers won't optimize the code you write, and must do extra work to prepare for and recover from your own blocks. It takes a lot of knowledge to optimize code for efficient instruction decoding, microcode generation, processing, reordering, and retirement, and doing it poorly can introduce further performance problems. Etc., etc., etc.

As with any performance-oriented change, make sure you profile your code before and after the changes, since it is very easy to make performance worse instead of better.
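As a rough illustration of the before-and-after measurement, here is a minimal timing sketch using the standard C `clock()` function. The names `sum_scalar`, `sum_fn`, and `time_sum` are hypothetical, and a real profiler will tell you far more than this does.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical baseline to measure: scalar sum of n floats. */
static float sum_scalar(const float *v, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

typedef float (*sum_fn)(const float *, size_t);

/* Calls fn(v, n) reps times and returns the CPU time used, in seconds.
   The last result is written to *out so the caller can also check
   that the candidate implementation is still correct. */
static double time_sum(sum_fn fn, const float *v, size_t n, int reps, float *out)
{
    /* The volatile store keeps each iteration's result observable. It does
       not guarantee the compiler cannot hoist the call, so check the
       generated code if the numbers look too good. */
    volatile float sink = 0.0f;
    clock_t t0 = clock();
    for (int i = 0; i < reps; ++i)
        sink = fn(v, n);
    clock_t t1 = clock();
    *out = sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Run the same harness on the old and the new implementation with identical input and enough repetitions that the measured time is well above the resolution of `clock()`, then compare both the timings and the returned results.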

[Edited by - frob on August 13, 2008 3:43:24 PM]
