Sign in to follow this  
ToohrVyk

[C++] Sparse iteration

Recommended Posts

I'm writing a small (non-resizable, associative, mutable) container. This container may be made plain or sparse, depending on whether a lot of values in the container have the default value, or not. In short, though the default value is conceptually present in the container, it isn't stored (only its key is). This naturally clashes with the possibility to reference elements in the container, since default elements must be stored "just in case" the reference to them is still alive and may be used to change them. Thus, my options are:
  • Leave it be, and warn the user that they should use the const versions of iterators and member access in order not to upset the performance of the sparse version (and mention explicitly that references returned by the const versions are only references because of return-by-value performance reasons, and not because they're real references to actual objects in the container).
  • Aggressively separate reads and writes in the interface: val = iter.get(); and iter.set(val) respectively don't and do create a value in the container. There is no direct access to iterator contents. This is bad because it doesn't work with standard library iterator-based algorithms.
  • Set up a "cleanup" system, which traverses the container (when?) and eliminates the unreferenced (how?) default values from storage.
  • Have the iterator and subscript accesses return a reference type instead of references. The reference type implicitly casts to a constant reference, or may be used as an lvalue in an assignment operation. This prevents users from having write-access to the contents of the container, they may only replace a value with another.
All methods have disadvantages. Are there any others? Which one would you prefer?

Share this post


Link to post
Share on other sites
For similar functionality I used proxies for client access. I didn't implement iterators, but they'd work on the same principle. The reason I went with this was to reduce the overhead of even handlers (one set of handlers per container, opposed to one per each value).


template < template Container >
class Proxy
{
public:
typedef typename Container::KeyType KeyType;
typedef typename Container::ValueType ValueType;
typedef typename Container::ValuePtr ValuePtr;

Proxy( Container &c, KeyType &k )
: container(c)
, key(k)
, cached_value(NULL)
{}

ValueType &get() const {
if (cached_value == NULL ) {
// returns either container's value or default value
// from container's inheritance hierarchy
cached_value = container.get_reference( key );
}
return *cached_value;
}

void set( ValueType *new_value ) {
cached_value = NULL; // clear cache
container.set( key, new_value );
}
private:
Container &container;
KeyType key;
ValuePtr cached_value;
}


My container is somewhat more elaborate, since proxies have different APIs (containers, sets, raw values, ...), but the idea is the same.

The access is single-threaded only, get is obviously much faster if called frequently. Whenever a new value is set for this particular key, it's invalidated, and a new reference is obtained.


template < typename T >
class SparseContainer
{
public:
SparseContainer( SparseContainer * parent );

ValuePtr get_reference( KeyType &k )
{
ValuePtr result = m_storate.find( ... );
if (result == // not found) {
result = parent->get_reference( k );
}
return result;
}
Proxy< SparseContainer<T> > operator( Key &k )
{
return Proxy< SparseContainer<T> >( *this, k );
}
private:
std::map< KeyType, ValuePtr > m_storage;
}



All of the above is obviously missing a lot of syntactic sugar, but that's the approach I'd likely use.

I am obviously assuming that there exists an underlying parent which provides default values. In my case, if even parent doesn't exist, either an exception or key's default value is returned.

Since I use copy-on-write semantics, key's default value can be returned from anywhere, attempts to change it will update the container.

But I don't know all the implementation details you need, so it's somewhat hard to suggest a well-rounded solution.

Share this post


Link to post
Share on other sites
Quote:
Aggressively separate reads and writes in the interface: val = iter.get(); and iter.set(val) respectively don't and do create a value in the container. There is no direct access to iterator contents. This is bad because it doesn't work with standard library iterator-based algorithms.
Is this the same as the solution given by Antheus?


One more idea, assuming I understand your problem correctly; is the following feasable?..

A modification to an element's value must be made through an iterator using either operator-> or operator*, you warn the user that if they want to hold onto a 'raw' reference or pointer to the object then they must use an aggressively explicit get method (which will safely create a new object in the container).

Within in the container you store a defaultly-initialised object, correct?
Well now you store two...

The operator-> of an iterator returns a reference to the second one.

Everytime an operation on either the iterator or the container is called that would return a value to the user (or maybe all operations?) then it must check whether default_instance1 == default_instance2, if this is false then you copy its value into the underlying 'real' container and re-set the value to default.

Does that make sense?
If so, have I overlooked anything?

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this