Criticism needed on my template voodoo

Started by
16 comments, last by King Mir 11 years, 1 month ago

Greetings!

I wrote a small templated piece of code for containing any data type for my entity framework.

The way it works:
I have a State class which is basically a handle (it contains a vector index) to the actual data.
The actual data is in a vector<T> in a templated class DataStore<T>, which inherits from a basic interface to increase and decrease the reference count for a particulair handle (these are used by State objects to allow easy copying and managing the reference count).
Then there is a DataStoreRepository class, which has a private map<size_t, unique_ptr<IDataStore>>, and a templated method for getting a specific DataStore<T>, which it does by using typeid(T).hash_code() to query the map for existance for a pointer to that data store type (if not exist, it news one into a unique_ptr and puts it in).
The code is below for your perusal, so if i missed explaining something, feel free to ask or see if the code is explanatory enough.

My design goals are:
1. able to put any value type into entities using as much by-value stuff, and as little new'ing of the data as possible (State class)
2. keep away from using global states (DataStoreRepository class)

3. pull the actual data from an object pool (DataStore class)


My questions are:
1. am i doing anything that i really shouldn't be doing?
2. is using typeid in this context okay?
3. can anything here be considered a hack/bad design/code smell?

I'd love it if you could take apart my code and point out anything that seems bad or misguided smile.png thanks!

===


I'm most concerned about the concept behind this class. In my hierarchy, the winmain instantiates an Engine class, which itself contains GameContexts, which themselves have one DataStoreRepository each, and when a GameContext is activated, it's DataStoreRepository is bound to State, so every GameConetxt has it's own pool of data. It kinda seems like a hack, but i don't know if i'm just imagining it or if it's justified.


class DataStoreRepository
{
private:
    std::unordered_map<size_t, std::unique_ptr<IDataStore>> _storeMap;
public:
    template<typename T> DataStore<T>& getDataStore()
    {
        auto hash = typeid(T).hash_code();
        auto it = _storeMap.find(hash);
        if(it == _storeMap.end())
        {
            it = _storeMap.insert(std::make_pair(hash, std::unique_ptr<IDataStore>(new DataStore<T>()))).first;
        }
        return *static_cast<DataStore<T>*>(it->second.get());
    }
};

This is the whole class, and it's relatively simple. I'm thinking of replacing the _freeSlots member to be a set, so that data is always at the start of the _data vector.


class IDataStore
{
public:
    virtual ~IDataStore() {}
    virtual uint AcquireDataIndex(int index = NOT_FOUND) = 0;
    virtual void ReleaseDataIndex(uint index) = 0;
};

template<typename T> class DataStore : public IDataStore
{
private:
    typedef std::vector<T> TypeCache;
    typedef std::vector<uint> NumberCache;
    TypeCache _data;
    NumberCache _refs;
    NumberCache _freeSlots;
    uint _defaultSize;
public:
    DataStore(uint size = gDefaultDataStoreSize)
        : _defaultSize(size)
    {
        _data.reserve(size);
        _refs.reserve(size);
        _freeSlots.reserve(size);
    }

    uint AcquireDataIndex(int index = NOT_FOUND)
    {
        if(index == NOT_FOUND)
        {
            //new data index
            if(!_freeSlots.empty())
            {
                index = _freeSlots.back();
                _freeSlots.pop_back();
                _data[index] = T();
            }
            else
            {
                index = _data.size();
                _data.emplace_back();
                _refs.push_back(0);
            }
        }
        ++_refs[index];
        return index;
    }

    void ReleaseDataIndex(uint index)
    {
        if(index != NOT_FOUND && _refs[index] != 0)
        {
            if(--_refs[index] == 0)
            {
                _freeSlots.push_back(index);
            }
        }
    }

    T& getData(uint index)
    {
        return _data[index];
    }
};

Most of the methods in State are non templated, templates here are to make getting the data and setting the data easier. The other methods just use AddRef or Release to point to the same data or get new data.
Before using anything, State::BindStoreRepository is called, and passed in a concrete instance (made on the stack by another class higher up in the call hierarchy, so it's guaranteed to outlive the State objects).

Reason why it's nice to me is because i can do "" State s = 10; "", and it's automatically bound to the right DataStore for the active GameContext, and i can keep them by value in any container i want.


class State
{
private:
    static Util::DataStoreRepository* _storeRepo;

    Util::IDataStore* _dataStore;
    int _dataHandle;
    //this checks _dataStore is not nullptr and calls _dataStore->Acquire(_dataHandle);
    void AddRef(uint handle = NOT_FOUND);
    //this checks _dataStore is not nullptr and calls _dataStore->Release(_dataHandle);
    void Release();

public:
    static void BindStoreRepository(Util::DataStoreRepository& repo);
    State();
    State(const State& rhs);
    State(State&& rhs);
    State& operator=(const State& rhs);
    State& operator=(State&& rhs);
    ~State();
    bool isValid();

    template<typename T> State(const T& value)
        : _dataStore(&_storeRepo->getDataStore<T>()), _dataHandle(_storeRepo->getDataStore<T>().AcquireDataIndex())
    {
        as<T>() = value;
    }

    template<typename T> State& operator=(const T& rhs)
    {
        Util::DataStore<T>& store = _storeRepo->getDataStore<T>();
        if(_dataStore != &store)
        {
            Release();
            _dataStore = &store;
            AddRef();
        }
        as<T>() = rhs;
        return *this;
    }

    template<typename T> T& as()
    {
	Util::DataStore<T>& store = _storeRepo->getDataStore<T>();
	if(&store == _dataStore)
	{
	    return store.getData(_dataHandle);
	}
	throw std::exception("State::as<T>(): Tried to access wrong type!");
    }
};

devstropo.blogspot.com - Random stuff about my gamedev hobby

Advertisement
Problems that I can see:

Access to states would be quite slow; you have to look up the store every time.

All the constructors are called up front, and destructors are never called until the store is shut down.

Once allocated, memory can only be reused for the same type as the original allocation.

No way to break circular references.

I don't believe this can be made thread safe.

Edit: actually it's worse than a lack of thread safety. Get a reference to an object. Call a method. It creates a new object of the same type. The original object moves in memory due to the vector emplace_back. Heap corruption and massive bug hunt follows.

Yeah, you need to use deque instead of vector for your caches, to preserve references on resize.

Also, I don't understand in your design, why do you have State have template assignment and constructor, instead of making State a template class itself?

Thanks!

Regarding references, they are never stored anywhere, only retrieved for modification through the State objects by the as<T>() method, and it's always just that one line ( state.as<int>() = 10; )

The State isn't a template class because i want to be able to store different types of data in the same container. Such as:

State intState = 10; //calls the templated copy ctor

State stringState = std::string("some text");

std::vector<State> states;

states.push_back(intState);

states.push_back(stringState);

The container can actually be any STL container, which stores the State objects by value (and they keep only a handle to the actual data).

Reason why i went for State at all is because if i have 2 entities, with each of them containing say 5 states (all of which are different types for example), and one of the entities gets deleted, the States themselves are automatically destroyed too, and the data they were pointing to is no longer used, and can be overwritten when a new state of that type is created (it just overwrites the previously used location).

Also, one requirement that i have placed on myself is to only use default constructible data as the template type argument. Does that change anything?

devstropo.blogspot.com - Random stuff about my gamedev hobby

Regarding references, they are never stored anywhere, only retrieved for modification through the State objects by the as() method, and it's always just that one line ( state.as() = 10; )

You still have a bug. If two object are ever retrieved at the same time, and the second causes some vector to resize, the first reference to the first object will become invalid. You need to use a std::deque. Or a std::list, but deque is almost always faster.



The State isn't a template class because i want to be able to store different types of data in the same container. Such as:



State intState = 10; //calls the templated copy ctor

State stringState = std::string("some text");

std::vector states;

states.push_back(intState);

states.push_back(stringState);



The
container can actually be any STL container, which stores the State
objects by value (and they keep only a handle to the actual data).

Reason
why i went for State at all is because if i have 2 entities, with each
of them containing say 5 states (all of which are different types for
example), and one of the entities gets deleted, the States themselves
are automatically destroyed too, and the data they were pointing to is
no longer used, and can be overwritten when a new state of that type is
created (it just overwrites the previously used location).

Gotcha. You're using _storeRepo->getDataStore<T>() to check if the stored type matches the accessed or assinged type, by _storeRepo->getDataStore<T>() is an expensive operation, involving a hashmap search. You can find a less expensive way to do this check.

You still have a bug. If two object are ever retrieved at the same time, and the second causes some vector to resize, the first reference to the first object will become invalid. You need to use a std::deque. Or a std::list, but deque is almost always faster.

Could you elaborate on this a bit more? Is this related to multithreading or something else?

Gotcha. You're using _storeRepo->getDataStore<T>() to check if the stored type matches the accessed or assinged type, by _storeRepo->getDataStore<T>() is an expensive operation, involving a hashmap search. You can find a less expensive way to do this check.

I was thinking about just skipping the type check entirely, and just casting down the private interface member down to whatever the as<T>() function was invoked with, but that seems dangerous, and i'm not sure if i'd get errors in case of a bad cast. I was thinking of using static_cast, is trying to avoid dynamic_cast here ok?
I also thought of saving the result of typeid on initialization of State and then checking if the saved type and attempted type are equal, but i dont know whether that's an expensive operation to do every time i call the as<T>() method (which happens a lot, as most of the data is stored in these - physics data, game logic data etc). I've read about it a bit, and some sources say it's bad, some say it's good, and mostly from a design standpoint. Is saving the type_info structure inside State and checking it against the parameter of the as<T>() method an okay design decision? That way i could make the acquisition of the data a bit faster instead of doing the map lookup every time.

devstropo.blogspot.com - Random stuff about my gamedev hobby


You still have a bug. If two object are ever retrieved at the same time, and the second causes some vector to resize, the first reference to the first object will become invalid. You need to use a std::deque. Or a std::list, but deque is almost always faster.

Could you elaborate on this a bit more? Is this related to multithreading or something else?


Nothing to do with multithreading.
gDefaultDataStoreSize = 1; //This is probably a const global, but assume that it's one for this example
const int & from_state( State<int>(10) ).as<int>();
auto state2( State<int>(20) ); //causes Datastore::_data to resize
use(from_state);//uses freed memory. 
That's a quick and dirty example, and you can imagine that generally from_state would be a parameter to a function up the tree, in unrelated code.


Gotcha. You're using _storeRepo->getDataStore<T>() to check if the stored type matches the accessed or assinged type, by _storeRepo->getDataStore<T>() is an expensive operation, involving a hashmap search. You can find a less expensive way to do this check.

I was thinking about just skipping the type check entirely, and just casting down the private interface member down to whatever the as<T>() function was invoked with, but that seems dangerous, and i'm not sure if i'd get errors in case of a bad cast. I was thinking of using static_cast, is trying to avoid dynamic_cast here ok?
I also thought of saving the result of typeid on initialization of State and then checking if the saved type and attempted type are equal, but i dont know whether that's an expensive operation to do every time i call the as<T>() method (which happens a lot, as most of the data is stored in these - physics data, game logic data etc). I've read about it a bit, and some sources say it's bad, some say it's good, and mostly from a design standpoint. Is saving the type_info structure inside State and checking it against the parameter of the as<T>() method an okay design decision? That way i could make the acquisition of the data a bit faster instead of doing the map lookup every time.


If you don't check, then as long as you never do the wrong call, everything is fine. If you ever do use the wrong call, it could be hard to track down the error. I suggest putting the check in an assert.

Yeah saving the typeid seems like a much better approach to a check.

Nothing to do with multithreading.

gDefaultDataStoreSize = 1; //This is probably a const global, but assume that it's one for this example
const int & from_state( State<int>(10) ).as<int>();
auto state2( State<int>(20) ); //causes Datastore::_data to resize
use(from_state);//uses freed memory. 
That's a quick and dirty example, and you can imagine that generally from_state would be a parameter to a function up the tree, in unrelated code.


Ah, i see what you mean.
Thing is, i really don't pass the underlying ints/float/whathaveyou as references, but pass the State object directly, either by reference or by value, as copying it is just a copy of 1 int, 2 pointers, and a call to a virtual method to increase/decrease the reference counts, and when i need to change the underlying data, i use

someState.as<int>() = anotherValue;

So it's always just a one liner. However, seeing as i can't control it (i can control myself, but not some other guy who would do like you wrote), i have 2 options: change the container from vector to deque like you said, or maybe change the access rules for the State object, to it has 2 methods, one for getting a copy of the data, and another to set a new value (T get<T>() and void set<T>(const T& newVal) ), however, then i'm locking myself to using cheaply copyable data types (cant use a class that contains other classes itself, or containers and such). So i guess deque is the best option here. Thanks for the feedback!

If you don't check, then as long as you never do the wrong call, everything is fine. If you ever do use the wrong call, it could be hard to track down the error. I suggest putting the check in an assert.

Yeah saving the typeid seems like a much better approach to a check.

I'm gonna try using typeid then, unless someone can point out why i shouldn't, or proposes a better method. Thinking of using the hash_code it returns (c++11 feature), but i guess it doesn't matter much.

So, the problems mentioned:
Access to states would be quite slow (map query) - using a typeid check is better

All the constructors are called up front, and destructors are never called until the store is shut down - maybe call the destructor manually when the reference count gets to 0 for a slot?

Once allocated, memory can only be reused for the same type as the original allocation - that's the idea, actually, as the purpose for DataStore<T> is to be an object pool of sorts. Am i doing something wrong with the implementation for this to be a problem, and not a feature?

No way to break circular references - not sure what this is referring to sad.png

I don't believe this can be made thread safe - could it be made thread safe by somehow atomizing the retrieval operation of the data store? Or is the problem bigger than that?

Again, thanks for the help!

devstropo.blogspot.com - Random stuff about my gamedev hobby

If you do call the destructor manually, note that you will have to use placement new to reinitialise the object. Or you could use a block of memory instead of a container, and manage creation/destruction yourself. Otherwise, object lifetimes are not well defined, so you are restricting yourself to types with trivial constructors and destructors only.

Reusing memory for different types - not essential unless memory is really tight.

If object A has a reference to B and object B has a reference to A, neither will ever be released. You will have to be careful that that never happens.

Thread safety: You would need atomic operations on the reference count. The containers are more problematic. Even with a deque you would need a lock around every access, which would be expensive. You could store a pointer instead of an index, which would remove the need for a look-up. The same goes for the reference count.

If you follow all of that, you would end up with smart pointers using pools for allocation. Fairly standard stuff, although the typeless aspect is a twist.

If you do call the destructor manually, note that you will have to use placement new to reinitialise the object. Or you could use a block of memory instead of a container, and manage creation/destruction yourself. Otherwise, object lifetimes are not well defined, so you are restricting yourself to types with trivial constructors and destructors only.

Does this hold true even if the container keeps the data by value, and not by pointer?
I was planning on using the stores to keep types that are default constructible. Those would mostly be POD types and a few classes that contain PODs (like math vectors, rects and such). If you consider that to be my intended use for the class, do i still need to keep this in mind?

If object A has a reference to B and object B has a reference to A, neither will ever be released. You will have to be careful that that never happens.

This only applies if i use A and B as the type parameter for the DataStore, right?

Thread safety: You would need atomic operations on the reference count. The containers are more problematic. Even with a deque you would need a lock around every access, which would be expensive. You could store a pointer instead of an index, which would remove the need for a look-up. The same goes for the reference count.

If you follow all of that, you would end up with smart pointers using pools for allocation. Fairly standard stuff, although the typeless aspect is a twist.

Good to know. Do the c++11 concurrency features help with any of these problems? I haven't looked into them, but i remember seeing something like an atomic qualifier for methods, could that be useful?

devstropo.blogspot.com - Random stuff about my gamedev hobby

This topic is closed to new replies.

Advertisement