Sign in to follow this  
Endar

STL set balancing

Recommended Posts

Endar    668
Okay, this question might seem to be a little rambling, but that's kind of how it is in my head at the moment. I have an array that I'm referencing as a 2d array, and to more easily keep track of different elements in the 2d array, I've created a simple object:
struct Index2d{
	unsigned int x;
	unsigned int y;

	// useful operator overloads: =, ==, +, +=, etc, etc
};

Each element in the 2d array is considered as a block (I'm writing a puzzle game), and I'm using a std::vector to store the 'Index2d' objects when I'm searching in the 2d array for adjacent blocks that are the same colour. At the moment, the order of the Index2d objects are not important. I don't know that will change, but I believe that as long as I have the set of them, it shouldn't matter about the order. When I'm checking if adjacent blocks have the same colour, and therefore should be in the current set, I need to do a bunch of searches using std::find. And I have a feeling that even though it won't be happening more than a couple of times a second (if that), it will probably will turn out to be less optimized than it could be. Also, at one point, I have a std::vector< std::vector<Index2d> > object and I would likely have to compare a vector to all the other vectors contained there. Anyway, I'm thinking of changing the std::vector to a std::set. My first question is: since I've read that the std::set object is usually implemented as a balanced binary tree, will that speed up the std::finds to a degree where it is worth it? And second: would std::set object be balanced the same if the same objects were inserted into it? Or would the objects have to be inserted in the exact same order as well? Because, when I'm comparing two std::set objects, if they have the exact same 5 objects, they will need to be balanced the same in order to be declared equal, right? Hope you guys understood that.

Share this post


Link to post
Share on other sites
ToohrVyk    1595
Balanced? An std::set has no such property. An std::set is a sorted sequence of unique elements, the state of which is independent of the order in which the elements were inserted. Two std::sets are equal if they contain the same elements, and the operation requires linear time (in the number of elements) to perform if the sets are equal (on average, it's constant time if they are different in almost all practical situations). The std::set<T>::find operation is faster than std::find, as it is guaranteed to perform in logarithmic time (or less), whereas std::find only has a linear time guarantee.

Share this post


Link to post
Share on other sites
Vorpy    869
1. Premature optimization; you should aim for conceptual clarity first
2. Why are you using such a sparse representation for a puzzle game played on a grid?

Share this post


Link to post
Share on other sites
Endar    668
Quote:
Original post by ToohrVyk
Balanced? An std::set has no such property. An std::set is a sorted sequence of unique elements, the state of which is independent of the order in which the elements were inserted. Two std::sets are equal if they contain the same elements, and the operation requires linear time (in the number of elements) to perform if the sets are equal (on average, it's constant time if they are different in almost all practical situations). The std::set<T>::find operation is faster than std::find, as it is guaranteed to perform in logarithmic time (or less), whereas std::find only has a linear time guarantee.

Excellent, that is exactly what I wanted to hear. I was a little confused as a site I visited mentioned that std::set used a balanced binary tree in implementation.

Quote:
Original post by Vorpy
1. Premature optimization; you should aim for conceptual clarity first
2. Why are you using such a sparse representation for a puzzle game played on a grid?

The game is actually a grid wrapped around a cylinder, but that neither here nor there.

What do you mean by sparse representation? I assume that you're using 'sparse' in the same meaning as a sparse matrix, where not all the matrix elements are filled. If so, this doesn't hold for the gameplay grid as for the purpose of what this thread is about, all elements of the grid will be filled at gameplay time.

The grid is kept in a 1d array, which is being referenced as a 2d array. The std::set will just be used when I'm trying to find the biggest group of adjacent blocks that have the same colour, so it's not used to keep all the elements in the grid during gameplay.

Share this post


Link to post
Share on other sites
SunTzu    286
Quote:
Excellent, that is exactly what I wanted to hear. I was a little confused as a site I visited mentioned that std::set used a balanced binary tree in implementation.


Technically it might, I think, but probably won't. The usual implementation of a set is a red-black tree, but this is not compulsory - the library writers can use any structure they like as long as the performance guarantees are obeyed. For example, here is an article about implementing a set in terms of a splay tree. Nevertheless... it's almost certain that just about any implementation of a set you come across will use a red-black tree, which is not balanced. [ Edit: I'm no longer sure the splay tree implementation is necessarily compliant as I think searches are worst case O( k log n ), so it may simply be using the same interface without making the same complexity guarantees... ]

In any case, tests for equality etc. compare only the elements in the set, not the internal arrangement of memory, so for all intents and purposes you can freely ignore internal structure and just worry about the performance guarantees.

And to answer the first question... the performance of std::find on a vector compared to std::map::find can vary quite widely, depending mostly on the size of the container. A small vector may easily do better due to cache coherency, fewer pointer dereferences etc. In a large container, a set will win out; std::find on a vector has O(n) complexity whereas std::set::find has O(log n) complexity, which will be quicker if the container is large enough even though each operation may be more costly. (Although, if you know the vector is sorted you could use std::binary_search, which I believe may be O(log n) as well? Obviously this adds the expense of sorting... but that may not matter especially if you only need to do it once). As always, the only way to be certain is to try both and profile them.

If the cost of the find operation is all you are worried about, you may also be interested in trying std::tr1::unsorted_set or boost::hash_set (basically both the same thing) if you have them available to you as they may be even faster than a std::set. If, on the other hand, you're likely to be doing other operations - insertions, removals, etc - on the containers, then the perforance of find may not be the only thing to consider.

And finally... from the description you've given, it sounds unlikely any of this will make a noticeable (maybe not even measurable) difference. But, it's always appealing to get it "right" so I can understand your curiosity.

Share this post


Link to post
Share on other sites
iMalc    2466
Quote:
Original post by SunTzu
For example, here is an article about implementing a set in terms of a splay tree. Nevertheless... it's almost certain that just about any implementation of a set you come across will use a red-black tree, which is not balanced. [ Edit: I'm no longer sure the splay tree implementation is necessarily compliant as I think searches are worst case O( k log n ), so it may simply be using the same interface without making the same complexity guarantees... ]
Splay-trees guarantee amortised O(log n) time for operations. It probably doesn't fit the criteria for the std::set interface simply because individual operations can take longer (whilst others will be proportionally shorter). Nevertheless the idea is quite attractive due to the decreased overhead of information stored per node, and also the very good adaptiveness of splay trees to improve speed when certain patterns appear, such as inserting a run of items in order etc.

They do of course have worst-case O(n) behaviour, which can be unacceptable in certain situations.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this