STL set balancing

Started by
4 comments, last by iMalc 16 years, 11 months ago
Okay, this question might seem to be a little rambling, but that's kind of how it is in my head at the moment. I have an array that I'm referencing as a 2d array, and to more easily keep track of different elements in the 2d array, I've created a simple object:

struct Index2d{
	unsigned int x;
	unsigned int y;

	// useful operator overloads: =, ==, +, +=, etc, etc
};

Each element in the 2d array is considered as a block (I'm writing a puzzle game), and I'm using a std::vector to store the 'Index2d' objects when I'm searching in the 2d array for adjacent blocks that are the same colour. At the moment, the order of the Index2d objects are not important. I don't know that will change, but I believe that as long as I have the set of them, it shouldn't matter about the order. When I'm checking if adjacent blocks have the same colour, and therefore should be in the current set, I need to do a bunch of searches using std::find. And I have a feeling that even though it won't be happening more than a couple of times a second (if that), it will probably will turn out to be less optimized than it could be. Also, at one point, I have a std::vector< std::vector<Index2d> > object and I would likely have to compare a vector to all the other vectors contained there. Anyway, I'm thinking of changing the std::vector to a std::set. My first question is: since I've read that the std::set object is usually implemented as a balanced binary tree, will that speed up the std::finds to a degree where it is worth it? And second: would std::set object be balanced the same if the same objects were inserted into it? Or would the objects have to be inserted in the exact same order as well? Because, when I'm comparing two std::set objects, if they have the exact same 5 objects, they will need to be balanced the same in order to be declared equal, right? Hope you guys understood that.
[size="2"][size=2]Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper
Advertisement
Balanced? An std::set has no such property. An std::set is a sorted sequence of unique elements, the state of which is independent of the order in which the elements were inserted. Two std::sets are equal if they contain the same elements, and the operation requires linear time (in the number of elements) to perform if the sets are equal (on average, it's constant time if they are different in almost all practical situations). The std::set<T>::find operation is faster than std::find, as it is guaranteed to perform in logarithmic time (or less), whereas std::find only has a linear time guarantee.
1. Premature optimization; you should aim for conceptual clarity first
2. Why are you using such a sparse representation for a puzzle game played on a grid?
Quote:Original post by ToohrVyk
Balanced? An std::set has no such property. An std::set is a sorted sequence of unique elements, the state of which is independent of the order in which the elements were inserted. Two std::sets are equal if they contain the same elements, and the operation requires linear time (in the number of elements) to perform if the sets are equal (on average, it's constant time if they are different in almost all practical situations). The std::set<T>::find operation is faster than std::find, as it is guaranteed to perform in logarithmic time (or less), whereas std::find only has a linear time guarantee.

Excellent, that is exactly what I wanted to hear. I was a little confused as a site I visited mentioned that std::set used a balanced binary tree in implementation.

Quote:Original post by Vorpy
1. Premature optimization; you should aim for conceptual clarity first
2. Why are you using such a sparse representation for a puzzle game played on a grid?

The game is actually a grid wrapped around a cylinder, but that neither here nor there.

What do you mean by sparse representation? I assume that you're using 'sparse' in the same meaning as a sparse matrix, where not all the matrix elements are filled. If so, this doesn't hold for the gameplay grid as for the purpose of what this thread is about, all elements of the grid will be filled at gameplay time.

The grid is kept in a 1d array, which is being referenced as a 2d array. The std::set will just be used when I'm trying to find the biggest group of adjacent blocks that have the same colour, so it's not used to keep all the elements in the grid during gameplay.
[size="2"][size=2]Mort, Duke of Sto Helit: NON TIMETIS MESSOR -- Don't Fear The Reaper
Quote:Excellent, that is exactly what I wanted to hear. I was a little confused as a site I visited mentioned that std::set used a balanced binary tree in implementation.


Technically it might, I think, but probably won't. The usual implementation of a set is a red-black tree, but this is not compulsory - the library writers can use any structure they like as long as the performance guarantees are obeyed. For example, here is an article about implementing a set in terms of a splay tree. Nevertheless... it's almost certain that just about any implementation of a set you come across will use a red-black tree, which is not balanced. [ Edit: I'm no longer sure the splay tree implementation is necessarily compliant as I think searches are worst case O( k log n ), so it may simply be using the same interface without making the same complexity guarantees... ]

In any case, tests for equality etc. compare only the elements in the set, not the internal arrangement of memory, so for all intents and purposes you can freely ignore internal structure and just worry about the performance guarantees.

And to answer the first question... the performance of std::find on a vector compared to std::map::find can vary quite widely, depending mostly on the size of the container. A small vector may easily do better due to cache coherency, fewer pointer dereferences etc. In a large container, a set will win out; std::find on a vector has O(n) complexity whereas std::set::find has O(log n) complexity, which will be quicker if the container is large enough even though each operation may be more costly. (Although, if you know the vector is sorted you could use std::binary_search, which I believe may be O(log n) as well? Obviously this adds the expense of sorting... but that may not matter especially if you only need to do it once). As always, the only way to be certain is to try both and profile them.

If the cost of the find operation is all you are worried about, you may also be interested in trying std::tr1::unsorted_set or boost::hash_set (basically both the same thing) if you have them available to you as they may be even faster than a std::set. If, on the other hand, you're likely to be doing other operations - insertions, removals, etc - on the containers, then the perforance of find may not be the only thing to consider.

And finally... from the description you've given, it sounds unlikely any of this will make a noticeable (maybe not even measurable) difference. But, it's always appealing to get it "right" so I can understand your curiosity.
Quote:Original post by SunTzu
For example, here is an article about implementing a set in terms of a splay tree. Nevertheless... it's almost certain that just about any implementation of a set you come across will use a red-black tree, which is not balanced. [ Edit: I'm no longer sure the splay tree implementation is necessarily compliant as I think searches are worst case O( k log n ), so it may simply be using the same interface without making the same complexity guarantees... ]
Splay-trees guarantee amortised O(log n) time for operations. It probably doesn't fit the criteria for the std::set interface simply because individual operations can take longer (whilst others will be proportionally shorter). Nevertheless the idea is quite attractive due to the decreased overhead of information stored per node, and also the very good adaptiveness of splay trees to improve speed when certain patterns appear, such as inserting a run of items in order etc.

They do of course have worst-case O(n) behaviour, which can be unacceptable in certain situations.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms

This topic is closed to new replies.

Advertisement