Sign in to follow this  
ToohrVyk

[C++] Lightweight container

Recommended Posts

I need a data structure that provides the following:
  • Insert an element into the structure: this returns an integer index, which can then be used to access this element.
  • Remove the element corresponding to an index.
I need the above in constant-time, amortized constant-time works too. I would like it to be as memory-friendly as possible.

Share this post


Link to post
Share on other sites
As soon as you remove an element, all the indices following it will be invalid. I think the closest you'll get is a hash map.

Share this post


Link to post
Share on other sites
How about pairing a std::list<T> with a std::vector< std::list<T>::iterator >?


const index_t insert(const T& item)
{
// iter = insert item into list [constant-time?]
// vector.push_back(iter) [might resize vector...]
// return vector.size() -1 [constant-time?]
}

T& operator[](const index_t index) // plus const version...
{
// check vector[index] is valid [could be simple]
// return *(vector[index]) [constant-time]
}

void remove(const index_t index)
{
// check vector[index] is valid [could be simple]
// list.erase( vector[index] ) [constant-time?]
// invalidate vector[index] [could be simple]
}


...plus some optimizations to pool and re-use invalidated indices (unless you don't care about them). You could use a special value for invalid indices, which would make validation checks and invalidation very simple and fast to boot.

This solution seems a little simple to me though... I haven't applied much thought to the complexity of the individual operations, as I'm rather sleepy. Ah well... I'll post it anyway; at worst I'm being stupid again! [grin]

[Edit: what exactly do you mean by "Remove the element corresponding to an index"? Do all higher indices get shuffled down one? If so, then you'll also need to shuffle the objects at vector[index+1..vector.size()-1] down one place, which could be nasty.]

Share this post


Link to post
Share on other sites
Quote:
Original post by ajones
[Edit: what exactly do you mean by "Remove the element corresponding to an index"? Do all higher indices get shuffled down one? If so, then you'll also need to shuffle the objects at vector[index+1..vector.size()-1] down one place, which could be nasty.]


Thank you for your proposition, this sounds like a good idea (with an additional linked list of indices for pooling, for instance).

As for your edit, an index is intended to be stable, and not change as soon as anothr element is removed. If I could allow indices to change, a single vector would be enough (using swap-and-pop).

Share this post


Link to post
Share on other sites
If you _only_ need the features you mentioned, and not, say, iteration, the answer's easy ;)

Obj* myObj = new Obj; // myObj now holds your index

// Constant time access:
*myObj....

but I guess that's probably not what you're looking for.

Share this post


Link to post
Share on other sites
Quote:
Original post by jtrask
If you _only_ need the features you mentioned, and not, say, iteration, the answer's easy ;)

Obj* myObj = new Obj; // myObj now holds your index

// Constant time access:
*myObj....

but I guess that's probably not what you're looking for.


Duh... It could be, in fact, assuming that I can convert between a pointer and an integer safely. But you're right, I basically described an allocator :)

Share this post


Link to post
Share on other sites
I do not believe what you described can be done without presuming a limit on the number of objects.

Let's assume you have a limit on the number of objects, "K". Say K=2^64.

A std::map takes no more than O(ln(K)) = O(64) = O(1) time to add/remove/delete any element in the map.

...

Yes, that was a smart ass answer.

---

Ok, let's assume you don't want a std::map. You can also use hash-maps (which are "constant" time, yet stop performing well as you add more elements @_@), the standard-C memory allocator, or a number of other techniques. I am not aware of a constant-time search algoritm that doesn't have a hardware or algorithm-based limit to the number of elements it can store.

Remember, operator+ takes O(ln(n)) time for a number of size n. On small numbers (less than 2^32) you can use a hardware implementation that does it in constant time, but for big numbers it takes much longer.

Share this post


Link to post
Share on other sites
What about a std::set?



// say object is the class you want to hold
typedef std::set<object> object_set;
typedef typename object_set::iterator object_iterator;

object_set mySet;
object_iterator index1 = mySet.insert(object()).first;
object_iterator index2 = mySet.insert(object()).first;




The reason for the .first stuff is that std::set::insert returns a std::pair<iterator,bool>, with the iterator being the iterator (i.e: index) to the element in the set, and the bool being whether or not the new element was inserted. If the bool is false, it means that an element with that value already existed in the set.

If you need a set that can contain multiple instances of the same value, use a std::multiset instead.

Share this post


Link to post
Share on other sites
This can be done with a simple std::vector adapter. Insertion will be amortized constant time (It may need to reallocate if it's bigger now that it has ever been before, just like std::vector) but removal will be constant time. Unfortunatly, it won't be able to give you random-access iterators, because of the non continous indexes. Actually, I haven't thought much about how it would supply the iterators at all. But it might give you some ideas.

The code would look something like

template<typename T>
class SparseVector
{
private:
std::vector<T*> data;
int first_free;
public:
SparseVector() : first_free(-1) {}

int insert(const T& t)
{
size_t spot = first_free;
if (spot == -1)
{
spot = data.size();
data.push_back(t);
return spot;
}
first_free = static_cast<int>(data[spot]);
data[spot] = t;
return spot;
}

void remove(size_t spot)
{
static_cast<int>(data[spot]) = first_free;
first_free = spot;
}
};


What I did was maintain a list of free nodes right inside the vector of pointers.

Share this post


Link to post
Share on other sites
Quote:
Original post by daerid
with the iterator being the iterator (i.e: index) to the element in the set


The problem here is that I need an integer index.

I've opted for using ptrdiff_t indices and an allocator for storing my objects.

Share this post


Link to post
Share on other sites
True O(1) 'element access' depends on being able to do some arithmetic on the index value and directly get a memory address. Having an index value remain valid depends on objects not moving, so that repeating the calculation always yields the correct address. These ideas are incompatible with the idea of deallocating memory when you 'remove' something. Even if you wanted to implement it yourself, you can't deallocate *part* of an array. So *some* level of compromise is going to be required.

As it happens, I have been thinking about a solution for the problem that is almost as good as what you're looking for. The general idea is to store a map of vectors, where the 'key' is a 'base index' for the stored vector. To retrieve an element by index, the appropriate base is found using lower_bound(), and then the vector is indexed into. This will remain constant-time until the first deletion, and gradually corrupt over time. Individual removals would be slow, but remove_if() could be specialized to provide amortized constant(best-case) to log-n(worst-case) time.

On second thought, maybe it's not worth it... :\

Share this post


Link to post
Share on other sites
Allocation and Deallocation of memory is not constant time. The amount of time it takes grows as you fragment memory with previous Allocations and Deallocations.

Share this post


Link to post
Share on other sites
Quote:
Original post by NotAYakk
Allocation and Deallocation of memory is not constant time. The amount of time it takes grows as you fragment memory with previous Allocations and Deallocations.


True. However, it doesn't get any better if you use other data structures (which will still need to allocate their memory). My main problem lies not in enlarging the memory area, but in removing elements while keeping fixed indices. A pool allocator allows this with very good performance.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this