# Help with optimizations

## Recommended Posts

[size=2]Here is the test: You have an std::vector full of random numbers and some of the numbers are repeated several times randomly. You must find the most [/size]efficient[size=2] way to create another vector with all numbers in the list [/size]referenced[size=2] only once and ordered from the lowest to the highest.[/size]

##### Share on other sites
std::sort followed by std::unique, done. (For better performance you might want to use unique_copy to avoid O(n^2) complexity).

##### Share on other sites
Or just create a set from the data:

std::set<int>(input.begin(), input.end());

The set doesn't allow duplicates, and uses a binary search tree to sort from low to high.

##### Share on other sites
[i]Or [/i]you take Washu's C++ Quiz
I swear, if you get them all straight, I'll buy you a drink

##### Share on other sites
[quote name='Vectorian' timestamp='1313963911' post='4852025']
std::sort followed by std::unique, done. (For better performance you might want to use unique_copy to avoid O(n^2) complexity).
[/quote]

std::unique has O(n) complexity.

##### Share on other sites
[quote name='Mike.Popoloski' timestamp='1313966276' post='4852033']
Or just create a set from the data:

std::set<int>(input.begin(), input.end());

The set doesn't allow duplicates, and uses a binary search tree to sort from low to high.
[/quote]

std::set definitely appears to be the best way to do it. Now I need a way to get a vector of the locations of values within the first vector that match the given set.

Example: {5,3,2,1,2,5}
Search: 5
Result: {0,5}

##### Share on other sites
[quote name='Mike.Popoloski' timestamp='1313966276' post='4852033']
Or just create a set from the data:

std::set<int>(input.begin(), input.end());

The set doesn't allow duplicates, and uses a binary search tree to sort from low to high.
[/quote]

I know this is going to sound stupid but how do I read an item from the set? Tried this but it does not work:

[code]DWORD foo= *(theSet.begin() + n);[/code]

EDIT: This works:

[code]DWORD foo= (DWORD)(&theSet+(n*sizeof(DWORD)));[/code]

However I assume there must be a better way?...

##### Share on other sites
You know, if you actually had a question you'd probably be better off phrasing your thread title as something other than a pissing contest. In any case, how you read elements from a set depend on how you want to use it. If you're iterating through the elements then you'd use a standard iterator loop:
[code]
for (SetType::iterator i = my_set.begin(); i != my_set.end(); ++i) {
// do something with *i
}
[/code]

##### Share on other sites
[quote name='SiCrane' timestamp='1313974331' post='4852073']
You know, if you actually had a question you'd probably be better off phrasing your thread title as something other than a pissing contest. In any case, how you read elements from a set depend on how you want to use it. If you're iterating through the elements then you'd use a standard iterator loop:
[code]
for (SetType::iterator i = my_set.begin(); i != my_set.end(); ++i) {
// do something with *i
}
[/code]
[/quote]

Yeah I already had code that did that and knew there must be a way to optimize it. I should have just tiled it something like "optimize this." Anyway, I was basically trying to create separate index buffers for each subset of my mesh given an attribute buffer.

Here is what I have now:

[code] // Create Index buffers.
std::set<DWORD> usedAttributes(attributes.begin(), attributes.end());
std::vector<std::vector<Ovgl::Face>> index_subsets;
index_subsets.resize(usedAttributes.size());
for( unsigned int i = 0; i < attributes.size(); i++ )
{
unsigned int s = 0;
for( std::set<DWORD>::iterator j = usedAttributes.begin(); j != usedAttributes.end(); ++j)
{
if( attributes[i] == *j )
{
index_subsets[s].push_back(faces[i]);
}
s++;
}
}[/code]

See anything I can do to optimize that?

##### Share on other sites
[quote name='sam_hughes' timestamp='1313966685' post='4852036']
std::unique has O(n) complexity.
[/quote]

Oh yeah, true that. I blame being tired for thinking of such a naive implementation.

##### Share on other sites
[quote]
EDIT: This works:
[code]
DWORD foo= (DWORD)(&theSet+(n*sizeof(DWORD)));
[/code]
[/quote]
You're getting (un)lucky. That behaviour is totally undefined. You're basically taking the address of the set, moving n DWORDS in memory, and casting the ADDRESS as a DWORD. It makes no sense.

[quote]
See anything I can do to optimize that?
[/quote]
You're doing a "linear" search on the set (the set might be implemented as a tree, so you're not necessarily search memory linearly). A simple improvement is to use std::set::find() rather than iterating through it.

You might find that the time taken to build the set (lots of memory allocations!) and search the set (following pointers!) is actually larger than just linearly searching the original vector, duplicates and all. Vector's memory contiguity could win the day, depending on the size of the data.

Consider testing this with the various types of meshes you expect to load (and maybe stress test it with some larger ones). You might be surprised at the performance of an extremely "naive" implementation here.

##### Share on other sites
There is a sweet spot for using a tree structure in most programs, and std::set and std::map are some of the more common tree structures used in C++.

Too few elements, and the overhead of memory allocation per node in the tree will kill you. Too [i]many[/i] elements, and cache coherency becomes a serious problem, unless you want to go to a lot of work writing a custom allocator. (And believe me, writing an allocator that gets good locality of reference for an arbitrary tree is a [b]bitch[/b][i] [/i]of a problem.)

As with all things performance related: profile heavily, or you're basically just pissing into the wind. What its applicable for your very specific case may not be at all the same as the relevant solutions from everyone else's experience. Even algorithmic-level optimization isn't always a clear winner on modern processing architectures.

##### Share on other sites
You guys had the answer in the very first reply. Sort the vector and then use the std::unique function. This will run much, much faster than any implementation using std::set. (But feel free to profile anyway!)

##### Share on other sites
[quote name='taz0010' timestamp='1314093523' post='4852720']
You guys had the answer in the very first reply. Sort the vector and then use the std::unique function. This will run much, much faster than any implementation using std::set. (But feel free to profile anyway!)
[/quote]

For objects with very heavy copy constructors, such as meshes or shaders, it can be faster to use a set which does no copying during the sort/uniqueness-guarantee operations.

##### Share on other sites
The best algorithmic complexity you can achieve is O(n) space, O(n) time (a counting sort). If you want O(1) space you'll have to settle for O(nlogn) time (an inplace O(nlogn) comparison sort).

The fastest way depends on the size of your data set and the implementations of the contained types copy constructors, move constructors and swap functions.

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
628345
• Total Posts
2982187

• 9
• 24
• 9
• 9
• 13