# SSE insertion sort?

This topic is 2538 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hey all,

I've looked all over the Internet but can't seem to find any examples of an SSE insertion sort. Maybe it doesn't lend itself well to SIMD? Anyone know of any good resources? Thanks

##### Share on other sites
No, you aren't finding them because it doesn't really make sense to do.

There are some sorting methods that use SIMD operations, you can even achieve O (N log log N) performance on integer sorting with SIMD in limited cases. ... But not with an insertion sort.

##### Share on other sites
Thanks, I was afraid of that. My data sets is almost always going to be mostly sorted so it seems like the insertion sort would be ideal, even vs another algorithm that is SIMD capable.

I'm not sure of any other algorithms that would give as good performance for mostly sorted data sets as the insertion sort.

##### Share on other sites
What type is the data (e.g. unsigned int, float, struct...) and what are typical and expected near maximum values on the number of items?
Is it in an array rather than a linked-list?
How often will the data actually be sorted already, and how often will it be nearly sorted?
It might help if you explain where the data comes from.
If I can get enough info from you then I can point you at the optimal sorting algorithm.

I doubt there's much to be gained by trying to do it using SSE though.

##### Share on other sites
Derive sorting network for insertion sort and implement it as SSE.

Sorting networks get around the lack of branching, but tend to be less efficient.

For general parallelization, something like bitonic sort might be a better fit.

All of these work best on 2^n-sized sub-sections, other sizes may either require padding or differently sized subdivisions, if possible.

For SSE specifically, such methods tend to be best suited for sorting register-sized data (4xint SSE register, for example). Rather than general sorting, they tend to be used for selection or filtering, such as selecting smallest component of a vector, or as preparation pass for next stage. Biggest gain is ability to keep data in SSE registers instead of moving them back and forth for regular processing.

would give as good performance for mostly sorted data sets[/quote]

Heap sort, merge sort or tim sort. Insertion sort has that painful n^2 common case. Quicksort found in various standard libraries may be implemented as introsort, which should avoid typical qsort problems.

##### Share on other sites
Thanks for the responses, to answer:

What type is the data (e.g. unsigned int, float, struct...) and what are typical and expected near maximum values on the number of items?
Is it in an array rather than a linked-list?
How often will the data actually be sorted already, and how often will it be nearly sorted?
It might help if you explain where the data comes from.
If I can get enough info from you then I can point you at the optimal sorting algorithm.

I doubt there's much to be gained by trying to do it using SSE though.

Basically I'm looking to sort SPH fluid particles. Most of the time, they will be at rest inside of a small round container, with only occasional disturbances. I plan on sorting them with Morton Z-order according to what 3D spatial grid cell they are in, so they'll be spatially sorted inside of memory. I'm currently storing them in an array, and the min and max values should be somewhere around 9 and 25 respectively.

Since the fluid will almost always be stable at rest, I'd say it's sorted probably 90% of the time.

Derive sorting network for insertion sort and implement it as SSE.

Sorting networks get around the lack of branching, but tend to be less efficient.

For general parallelization, something like bitonic sort might be a better fit.

All of these work best on 2^n-sized sub-sections, other sizes may either require padding or differently sized subdivisions, if possible.

For SSE specifically, such methods tend to be best suited for sorting register-sized data (4xint SSE register, for example). Rather than general sorting, they tend to be used for selection or filtering, such as selecting smallest component of a vector, or as preparation pass for next stage. Biggest gain is ability to keep data in SSE registers instead of moving them back and forth for regular processing.

would give as good performance for mostly sorted data sets

Heap sort, merge sort or tim sort. Insertion sort has that painful n^2 common case. Quicksort found in various standard libraries may be implemented as introsort, which should avoid typical qsort problems.
[/quote]

Thanks for the info, the Sorting network sounds interesting. Since the fluids will be 90% sorted I'm trying to optimize for the mostly sorted case, would the sorting network slow things down for that?

1. 1
2. 2
Rutin
19
3. 3
khawk
15
4. 4
A4L
13
5. 5

• 13
• 26
• 10
• 11
• 44
• ### Forum Statistics

• Total Topics
633744
• Total Posts
3013657
×

## Important Information

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!