Std vector, sorting and/or filling

Started by
8 comments, last by cozzie 10 years, 1 month ago
Hi.
Currently I'm trying to find managable and efficient way for filling and sorting a std::vector.
The case is as follows:

Typedef struct Thing
{
int value_a;
int value_b;
int value_c;
}

std::vector<Thing> myThings;

Option 1:
- I randomly fill the vector
- sort the order of items in the vector based on value_b lowest to highest combined with value_c highest to lowest

How would I do that?

Option 2:
- fill the vector in the right order (sort the seperate source data earlier)

Option 2 is what I do at the moment.

My questions:
- is sorting existing content of a vector efficient versus refilling the whole vector?
In my case the content of the vector varies a lot, it contains visuble renderables of a frame in my 3d engine.
- do you know any good articles/ documentation on sorting vectors based on sorting items within in this case a struct?
- is a struct in this case the best way to go, or would you prefer making a hashing table and sort only that?

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

Advertisement

Are you sure you certainly need a vector?

If your elements are unique, you could use an std::set or std::map, which both offer sorting, you'd only have to roll your own compare-function.

http://realtimecollisiondetection.net/blog/?p=86

Note that you don't need to sort the structs necessarily. You can have a separate vector mapping keys to indices to the vector of data. Sorting a vector of smaller items is faster than sorting a vector of larger items, so whether that is a win or not will depend on how big and expensive to move your Thing struct actually is.

As _always_, just try it and profile it to see if it's faster or not. It is practically impossible for anyone else to guess how your implementation of a generic algorithm with your specific data set on your specific target hardware is going to perform compared to other approaches with similar asymptotic complexity.

Sean Middleditch – Game Systems Engineer – Join my team!

-make a single linked list (one that uses a preallocated array, instead of just spitting the elements onto the heap)

-loop through your original vector and start adding the elements to the list(or pointers to the elements)

like:

for( vector size)

list_array = &(original_vector);

then start iterating through your list ....now not by using an array index, but normally with a list iterator. When you find the first element that is smaller/larger than your newly added element finish the insertion by setting the pointers.(as you normally insert an element into a linked list)

Now the slow part here is that for every newly added element you have to iterate through your list. You speed this up by saving the pointer(a list iterator basically) to every 50th or 100th element into a samaller array. ....so before you start iterating through the list, you loop through this smaller array to find the closest pointer(list iterator).

Probably Ive explained it in a crappy way but I used this to sort my models for the rendering queue and it was freakin fast.

Now the slow part here is that for every newly added element you have to iterate through your list. You speed this up by saving the pointer(a list iterator basically) to every 50th or 100th element into a samaller array. ....so before you start iterating through the list, you loop through this smaller array to find the closest pointer(list iterator).


It is often faster (profile and test to be sure, of course) to just use a vector and a binary search (std::lower_bound) to insert. Much, much easier, too. A sensible vector of POD types will use a memmove to make room to insert the new item, which is really fast on modern hardware (the vector possibly only does this with optimizations turned on unless you wrote your own vector, so be sure to test with -O2 or equivalent optimizations flags). A vector has the advantage that memory accesses over the sorted data nicely prefetches/streams, and there's no need to bloat up data with list pointers (which reduces the amount of items that fit in each cache line).

The optimization of storing pointers to every Nth list element is called a "skip list" by the way, though they're not always stored in an array (sometimes it's just two linked lists). Using an array, you could also use binary search just on that skip list array rather than iterating over it, but when that's a win will depend in part on how big that skip list array gets (linear search is faster than binary search sometimes for very small values of N).

Sean Middleditch – Game Systems Engineer – Join my team!

Hi.
Currently I'm trying to find managable and efficient way for filling and sorting a std::vector.
The case is as follows:

Typedef struct Thing
{
int value_a;
int value_b;
int value_c;
}

std::vector<Thing> myThings;

Option 1:
- I randomly fill the vector
- sort the order of items in the vector based on value_b lowest to highest combined with value_c highest to lowest

How would I do that?

Option 2:
- fill the vector in the right order (sort the seperate source data earlier)

Option 2 is what I do at the moment.

My questions:
- is sorting existing content of a vector efficient versus refilling the whole vector?
In my case the content of the vector varies a lot, it contains visuble renderables of a frame in my 3d engine.
- do you know any good articles/ documentation on sorting vectors based on sorting items within in this case a struct?
- is a struct in this case the best way to go, or would you prefer making a hashing table and sort only that?

Filling then sorting is just filling, then calling std::sort with a comparison function.

Sorting in an N log N operation, which can jump all around the vector in memory as its sorting.

But if you're sorting the input one way or another anyway, why would it matter if you sort it in the vector, or in some previous container that the data is in?

To your last question, in general it's more efficent to have a struct-of-arrays instead of an array-of-stucts because of how struct data tends to be accessed. But doing that compromises the complexity, as it is easier to think in terms of arrays-of-structs.

Is this a case of pre-mature optimization? Is your currently implementation running too slow, and have you identified it as the actual bottleneck of your rendering, by using profiling tools? Optimizing areas that aren't the bottleneck, while nice, usually isn't as effective as optimizing the actual bottlenecks.

Assuming this is actually actually a bottleneck, does the vector order really change every frame, or can you get by with resorting it only every other frame or every third frame? (Instant 50% or 66% saving off your bottleneck right there).

Is a known portion of the vector at a greater distance from the player, and less critical to be sorted properly?

Could you say, split the vector into two or more vectors, where one vector just contains the stuff that is within 50 feet of the player, and the other contains stuff that is greater than 50 feet away?

You could sort the 'near object' vector every frame, and use a continual bubble sort to gradually sort the 'far object' vector bit by bit every frame. If most your objects are at a distance, and objects at a distance are less likely to need to be resorted anyway, this might get you a decent performance gain without a serious cost to your visuals. If this is actually your bottleneck.

The benefit of bubble sort also means you can just swap the two adjacent elements instead of trying to find and re-insert them, which causes an avalanche of having to move other elements to make room.

(Note: You don't actually have to use two real vectors, that's just for explaining. The data can actually be in a single vector, but have a cutoff point)

Are you sure you certainly need a vector?

If your elements are unique, you could use an std::set or std::map, which both offer sorting, you'd only have to roll your own compare-function.

Well, if he's using the vector to store renderables, the benefits of vector's iteration might outway the benefits of map sort-when-inserting. Especially so if the OP properly leverages the benefits of vector elements being in contiguous memory, which OpenGL and DirectX take advantage of.

What of

std::sort (std::begin(/*vector name*/), std::end(/*vector name*/))
?

UNREAL ENGINE 4:
Total LOC: ~3M Lines
Total Languages: ~32

--
GREAT QUOTES:
I can do ALL things through Christ - Jesus Christ
--
Logic will get you from A-Z, imagination gets you everywhere - Albert Einstein
--
The problems of the world cannot be solved by skeptics or cynics whose horizons are limited by the obvious realities. - John F. Kennedy

What of




std::sort (std::begin(/*vector name*/), std::end(/*vector name*/))
?

He's not asking how to sort an already filled vector. You need to learn to read a question before you attempt to answer it.

Thanks all. In my case the order will change quite often, but the suggestion to reorder each 2 or 3 frames sounds like a good approach. It's expected that the camera moves or rotates that much within 2 or 3 frames.

What I'll do first is fill the array in the right order, so no sorting of the structs in the vector itself. Then update it each 2 frames and do some profiling afterwards.
Thanks for the help and advices, I'll post the results when I'm done.

Crealysm game & engine development: http://www.crealysm.com

Looking for a passionate, disciplined and structured producer? PM me

This topic is closed to new replies.

Advertisement