Jump to content

  • Log In with Google      Sign In   
  • Create Account

Criticism needed on my template voodoo


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
17 replies to this topic

#1 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 20 February 2013 - 04:58 AM

Greetings!

I wrote a small templated piece of code for containing any data type for my entity framework.

The way it works:
I have a State class which is basically a handle (it contains a vector index) to the actual data.
The actual data is in a vector<T> in a templated class DataStore<T>, which inherits from a basic interface to increase and decrease the reference count for a particulair handle (these are used by State objects to allow easy copying and managing the reference count).
Then there is a DataStoreRepository class, which has a private map<size_t, unique_ptr<IDataStore>>, and a templated method for getting a specific DataStore<T>, which it does by using typeid(T).hash_code() to query the map for existance for a pointer to that data store type (if not exist, it news one into a unique_ptr and puts it in).
The code is below for your perusal, so if i missed explaining something, feel free to ask or see if the code is explanatory enough.

My design goals are:
1. able to put any value type into entities using as much by-value stuff, and as little new'ing of the data as possible (State class)
2. keep away from using global states (DataStoreRepository class)

3. pull the actual data from an object pool (DataStore class)


My questions are:
1. am i doing anything that i really shouldn't be doing?
2. is using typeid in this context okay?
3. can anything here be considered a hack/bad design/code smell?

 

I'd love it if you could take apart my code and point out anything that seems bad or misguided smile.png thanks!

===


I'm most concerned about the concept behind this class. In my hierarchy, the winmain instantiates an Engine class, which itself contains GameContexts, which themselves have one DataStoreRepository each, and when a GameContext is activated, it's DataStoreRepository is bound to State, so every GameConetxt has it's own pool of data. It kinda seems like a hack, but i don't know if i'm just imagining it or if it's justified.

class DataStoreRepository
{
private:
    std::unordered_map<size_t, std::unique_ptr<IDataStore>> _storeMap;
public:
    template<typename T> DataStore<T>& getDataStore()
    {
        auto hash = typeid(T).hash_code();
        auto it = _storeMap.find(hash);
        if(it == _storeMap.end())
        {
            it = _storeMap.insert(std::make_pair(hash, std::unique_ptr<IDataStore>(new DataStore<T>()))).first;
        }
        return *static_cast<DataStore<T>*>(it->second.get());
    }
};

This is the whole class, and it's relatively simple. I'm thinking of replacing the _freeSlots member to be a set, so that data is always at the start of the _data vector.

class IDataStore
{
public:
    virtual ~IDataStore() {}
    virtual uint AcquireDataIndex(int index = NOT_FOUND) = 0;
    virtual void ReleaseDataIndex(uint index) = 0;
};

template<typename T> class DataStore : public IDataStore
{
private:
    typedef std::vector<T> TypeCache;
    typedef std::vector<uint> NumberCache;
    TypeCache _data;
    NumberCache _refs;
    NumberCache _freeSlots;
    uint _defaultSize;
public:
    DataStore(uint size = gDefaultDataStoreSize)
        : _defaultSize(size)
    {
        _data.reserve(size);
        _refs.reserve(size);
        _freeSlots.reserve(size);
    }

    uint AcquireDataIndex(int index = NOT_FOUND)
    {
        if(index == NOT_FOUND)
        {
            //new data index
            if(!_freeSlots.empty())
            {
                index = _freeSlots.back();
                _freeSlots.pop_back();
                _data[index] = T();
            }
            else
            {
                index = _data.size();
                _data.emplace_back();
                _refs.push_back(0);
            }
        }
        ++_refs[index];
        return index;
    }

    void ReleaseDataIndex(uint index)
    {
        if(index != NOT_FOUND && _refs[index] != 0)
        {
            if(--_refs[index] == 0)
            {
                _freeSlots.push_back(index);
            }
        }
    }

    T& getData(uint index)
    {
        return _data[index];
    }
};

Most of the methods in State are non templated, templates here are to make getting the data and setting the data easier. The other methods just use AddRef or Release to point to the same data or get new data.
Before using anything, State::BindStoreRepository is called, and passed in a concrete instance (made on the stack by another class higher up in the call hierarchy, so it's guaranteed to outlive the State objects).

Reason why it's nice to me is because i can do "" State s = 10; "", and it's automatically bound to the right DataStore for the active GameContext, and i can keep them by value in any container i want.

class State
{
private:
    static Util::DataStoreRepository* _storeRepo;

    Util::IDataStore* _dataStore;
    int _dataHandle;
    //this checks _dataStore is not nullptr and calls _dataStore->Acquire(_dataHandle);
    void AddRef(uint handle = NOT_FOUND);
    //this checks _dataStore is not nullptr and calls _dataStore->Release(_dataHandle);
    void Release();

public:
    static void BindStoreRepository(Util::DataStoreRepository& repo);
    State();
    State(const State& rhs);
    State(State&& rhs);
    State& operator=(const State& rhs);
    State& operator=(State&& rhs);
    ~State();
    bool isValid();

    template<typename T> State(const T& value)
        : _dataStore(&_storeRepo->getDataStore<T>()), _dataHandle(_storeRepo->getDataStore<T>().AcquireDataIndex())
    {
        as<T>() = value;
    }

    template<typename T> State& operator=(const T& rhs)
    {
        Util::DataStore<T>& store = _storeRepo->getDataStore<T>();
        if(_dataStore != &store)
        {
            Release();
            _dataStore = &store;
            AddRef();
        }
        as<T>() = rhs;
        return *this;
    }

    template<typename T> T& as()
    {
	Util::DataStore<T>& store = _storeRepo->getDataStore<T>();
	if(&store == _dataStore)
	{
	    return store.getData(_dataHandle);
	}
	throw std::exception("State::as<T>(): Tried to access wrong type!");
    }
};

devstropo.blogspot.com - Random stuff about my gamedev hobby


Sponsor:

#2 EWClay   Members   -  Reputation: 659

Like
2Likes
Like

Posted 20 February 2013 - 06:30 AM

Problems that I can see:

Access to states would be quite slow; you have to look up the store every time.

All the constructors are called up front, and destructors are never called until the store is shut down.

Once allocated, memory can only be reused for the same type as the original allocation.

No way to break circular references.

I don't believe this can be made thread safe.

Edit: actually it's worse than a lack of thread safety. Get a reference to an object. Call a method. It creates a new object of the same type. The original object moves in memory due to the vector emplace_back. Heap corruption and massive bug hunt follows.

#3 King Mir   Members   -  Reputation: 2050

Like
0Likes
Like

Posted 20 February 2013 - 08:19 AM

Yeah, you need to use deque instead of vector for your caches, to preserve references on resize.

 

Also, I don't understand in your design, why do you have State have template assignment and constructor, instead of making State a template class itself?



#4 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 20 February 2013 - 08:44 AM

Thanks!

 

Regarding references, they are never stored anywhere, only retrieved for modification through the State objects by the as<T>() method, and it's always just that one line ( state.as<int>() = 10; )

 

The State isn't a template class because i want to be able to store different types of data in the same container. Such as:

 

State intState = 10; //calls the templated copy ctor

State stringState = std::string("some text");

std::vector<State> states;

states.push_back(intState);

states.push_back(stringState);

 

The container can actually be any STL container, which stores the State objects by value (and they keep only a handle to the actual data).

Reason why i went for State at all is because if i have 2 entities, with each of them containing say 5 states (all of which are different types for example), and one of the entities gets deleted, the States themselves are automatically destroyed too, and the data they were pointing to is no longer used, and can be overwritten when a new state of that type is created (it just overwrites the previously used location).

 

Also, one requirement that i have placed on myself is to only use default constructible data as the template type argument. Does that change anything?


devstropo.blogspot.com - Random stuff about my gamedev hobby


#5 King Mir   Members   -  Reputation: 2050

Like
1Likes
Like

Posted 20 February 2013 - 10:03 AM

Regarding references, they are never stored anywhere, only retrieved for modification through the State objects by the as() method, and it's always just that one line ( state.as() = 10; )

You still have a bug. If two object are ever retrieved at the same time, and the second causes some vector to resize, the first reference to the first object will become invalid. You need to use a std::deque. Or a std::list, but deque is almost always faster.

 



The State isn't a template class because i want to be able to store different types of data in the same container. Such as:



State intState = 10; //calls the templated copy ctor

State stringState = std::string("some text");

std::vector states;

states.push_back(intState);

states.push_back(stringState);



The
container can actually be any STL container, which stores the State
objects by value (and they keep only a handle to the actual data).

Reason
why i went for State at all is because if i have 2 entities, with each
of them containing say 5 states (all of which are different types for
example), and one of the entities gets deleted, the States themselves
are automatically destroyed too, and the data they were pointing to is
no longer used, and can be overwritten when a new state of that type is
created (it just overwrites the previously used location).

Gotcha. You're using _storeRepo->getDataStore<T>() to check if the stored type matches the accessed or assinged type, by _storeRepo->getDataStore<T>() is an expensive operation, involving a hashmap search. You can find a less expensive way to do this check.

Edited by King Mir, 20 February 2013 - 10:12 AM.


#6 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 21 February 2013 - 04:02 AM

 

You still have a bug. If two object are ever retrieved at the same time, and the second causes some vector to resize, the first reference to the first object will become invalid. You need to use a std::deque. Or a std::list, but deque is almost always faster.

Could you elaborate on this a bit more? Is this related to multithreading or something else?

Gotcha. You're using _storeRepo->getDataStore<T>() to check if the stored type matches the accessed or assinged type, by _storeRepo->getDataStore<T>() is an expensive operation, involving a hashmap search. You can find a less expensive way to do this check.

I was thinking about just skipping the type check entirely, and just casting down the private interface member down to whatever the as<T>() function was invoked with, but that seems dangerous, and i'm not sure if i'd get errors in case of a bad cast. I was thinking of using static_cast, is trying to avoid dynamic_cast here ok?
I also thought of saving the result of typeid on initialization of State and then checking if the saved type and attempted type are equal, but i dont know whether that's an expensive operation to do every time i call the as<T>() method (which happens a lot, as most of the data is stored in these - physics data, game logic data etc). I've read about it a bit, and some sources say it's bad, some say it's good, and mostly from a design standpoint. Is saving the type_info structure inside State and checking it against the parameter of the as<T>() method an okay design decision? That way i could make the acquisition of the data a bit faster instead of doing the map lookup every time.

devstropo.blogspot.com - Random stuff about my gamedev hobby


#7 King Mir   Members   -  Reputation: 2050

Like
1Likes
Like

Posted 21 February 2013 - 05:04 AM


You still have a bug. If two object are ever retrieved at the same time, and the second causes some vector to resize, the first reference to the first object will become invalid. You need to use a std::deque. Or a std::list, but deque is almost always faster.

Could you elaborate on this a bit more? Is this related to multithreading or something else?


Nothing to do with multithreading.
gDefaultDataStoreSize = 1; //This is probably a const global, but assume that it's one for this example
const int & from_state( State<int>(10) ).as<int>();
auto state2( State<int>(20) ); //causes Datastore::_data to resize
use(from_state);//uses freed memory. 
That's a quick and dirty example, and you can imagine that generally from_state would be a parameter to a function up the tree, in unrelated code.


Gotcha. You're using _storeRepo->getDataStore<T>() to check if the stored type matches the accessed or assinged type, by _storeRepo->getDataStore<T>() is an expensive operation, involving a hashmap search. You can find a less expensive way to do this check.

I was thinking about just skipping the type check entirely, and just casting down the private interface member down to whatever the as<T>() function was invoked with, but that seems dangerous, and i'm not sure if i'd get errors in case of a bad cast. I was thinking of using static_cast, is trying to avoid dynamic_cast here ok?
I also thought of saving the result of typeid on initialization of State and then checking if the saved type and attempted type are equal, but i dont know whether that's an expensive operation to do every time i call the as<T>() method (which happens a lot, as most of the data is stored in these - physics data, game logic data etc). I've read about it a bit, and some sources say it's bad, some say it's good, and mostly from a design standpoint. Is saving the type_info structure inside State and checking it against the parameter of the as<T>() method an okay design decision? That way i could make the acquisition of the data a bit faster instead of doing the map lookup every time.


If you don't check, then as long as you never do the wrong call, everything is fine. If you ever do use the wrong call, it could be hard to track down the error. I suggest putting the check in an assert.

Yeah saving the typeid seems like a much better approach to a check.

Edited by King Mir, 21 February 2013 - 05:05 AM.


#8 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 21 February 2013 - 06:03 AM

Nothing to do with multithreading.

gDefaultDataStoreSize = 1; //This is probably a const global, but assume that it's one for this example
const int & from_state( State<int>(10) ).as<int>();
auto state2( State<int>(20) ); //causes Datastore::_data to resize
use(from_state);//uses freed memory. 
That's a quick and dirty example, and you can imagine that generally from_state would be a parameter to a function up the tree, in unrelated code.


Ah, i see what you mean.
Thing is, i really don't pass the underlying ints/float/whathaveyou as references, but pass the State object directly, either by reference or by value, as copying it is just a copy of 1 int, 2 pointers, and a call to a virtual method to increase/decrease the reference counts, and when i need to change the underlying data, i use

someState.as<int>() = anotherValue;

So it's always just a one liner. However, seeing as i can't control it (i can control myself, but not some other guy who would do like you wrote), i have 2 options: change the container from vector to deque like you said, or maybe change the access rules for the State object, to it has 2 methods, one for getting a copy of the data, and another to set a new value (T get<T>() and void set<T>(const T& newVal) ), however, then i'm locking myself to using cheaply copyable data types (cant use a class that contains other classes itself, or containers and such). So i guess deque is the best option here. Thanks for the feedback!
 

If you don't check, then as long as you never do the wrong call, everything is fine. If you ever do use the wrong call, it could be hard to track down the error. I suggest putting the check in an assert.

Yeah saving the typeid seems like a much better approach to a check.

I'm gonna try using typeid then, unless someone can point out why i shouldn't, or proposes a better method. Thinking of using the hash_code it returns (c++11 feature), but i guess it doesn't matter much.

So, the problems mentioned:
Access to states would be quite slow (map query) - using a typeid check is better

All the constructors are called up front, and destructors are never called until the store is shut down - maybe call the destructor manually when the reference count gets to 0 for a slot?

Once allocated, memory can only be reused for the same type as the original allocation - that's the idea, actually, as the purpose for DataStore<T> is to be an object pool of sorts. Am i doing something wrong with the implementation for this to be a problem, and not a feature?

No way to break circular references - not sure what this is referring to sad.png

I don't believe this can be made thread safe - could it be made thread safe by somehow atomizing the retrieval operation of the data store? Or is the problem bigger than that?

Again, thanks for the help!

Edited by Strewya, 21 February 2013 - 06:03 AM.

devstropo.blogspot.com - Random stuff about my gamedev hobby


#9 EWClay   Members   -  Reputation: 659

Like
1Likes
Like

Posted 21 February 2013 - 07:45 AM

If you do call the destructor manually, note that you will have to use placement new to reinitialise the object. Or you could use a block of memory instead of a container, and manage creation/destruction yourself. Otherwise, object lifetimes are not well defined, so you are restricting yourself to types with trivial constructors and destructors only.

Reusing memory for different types - not essential unless memory is really tight.

If object A has a reference to B and object B has a reference to A, neither will ever be released. You will have to be careful that that never happens.

Thread safety: You would need atomic operations on the reference count. The containers are more problematic. Even with a deque you would need a lock around every access, which would be expensive. You could store a pointer instead of an index, which would remove the need for a look-up. The same goes for the reference count.

If you follow all of that, you would end up with smart pointers using pools for allocation. Fairly standard stuff, although the typeless aspect is a twist.

#10 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 21 February 2013 - 08:36 AM

If you do call the destructor manually, note that you will have to use placement new to reinitialise the object. Or you could use a block of memory instead of a container, and manage creation/destruction yourself. Otherwise, object lifetimes are not well defined, so you are restricting yourself to types with trivial constructors and destructors only.

Does this hold true even if the container keeps the data by value, and not by pointer?
I was planning on using the stores to keep types that are default constructible. Those would mostly be POD types and a few classes that contain PODs (like math vectors, rects and such). If you consider that to be my intended use for the class, do i still need to keep this in mind?

If object A has a reference to B and object B has a reference to A, neither will ever be released. You will have to be careful that that never happens.

This only applies if i use A and B as the type parameter for the DataStore, right?

Thread safety: You would need atomic operations on the reference count. The containers are more problematic. Even with a deque you would need a lock around every access, which would be expensive. You could store a pointer instead of an index, which would remove the need for a look-up. The same goes for the reference count.

If you follow all of that, you would end up with smart pointers using pools for allocation. Fairly standard stuff, although the typeless aspect is a twist.

Good to know. Do the c++11 concurrency features help with any of these problems? I haven't looked into them, but i remember seeing something like an atomic qualifier for methods, could that be useful?

devstropo.blogspot.com - Random stuff about my gamedev hobby


#11 EWClay   Members   -  Reputation: 659

Like
1Likes
Like

Posted 21 February 2013 - 01:27 PM

Default constructible isn't enough, you should only store POD types if you don't manage the object lifetimes. You can check for this at compile time with std::is_pod.

This would exclude the possibility of circular references, since State is not POD.

C++11 has some nice features for multithreading, but as it stands, you would need locks to protect the containers, not atomics. Nothing to stop you storing atomic types though.

#12 King Mir   Members   -  Reputation: 2050

Like
0Likes
Like

Posted 21 February 2013 - 08:06 PM

All the constructors are called up front, and destructors are never called until the store is shut down - maybe call the destructor manually when the reference count gets to 0 for a slot?

If you do that, you have to store a char buffer or union instead of T. You can't call a destructor on most objects, because the destructor will still automatically be called when they go out of scope naturally, even if the destructor has already been called. Using placement new, you can initialize memory to an object, which can then be manually de-constructed. You can use placement new on a char buffer or a union that contains a type T member, initializing that member to T. Using a union is probably better, because you don't need to static cast when you access the buffer, or store a pointer to the object anywhere. But they syntax is ugly.



I'm also sceptical of the general purpose of this construct. I am not sure what problem you are trying to solve, and why this might be a good solution to anything.

#13 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 22 February 2013 - 02:09 AM

The general purpose of this construct is that i'm trying to make an entity component system, but more in the direction of an attribute-behaviour variety.
Instead of having predetermined structs/classes that have specific members (say, a Physics component that has a vec2 for velocity, acceleration and a rect for the bounding box), i was trying to make a system where the components are basic types, such as vec2, vec3, rect, bool int, float, double or color. That way i could have a lot more control over what data i need in an entity. Problem is that standard component systems operate by new'ing the components on the heap, and calling new for basic types just seemed like a bad idea (considering an entity would have at least 20 properties, plus anything required by the game logic, which can be up to a hundred properties or more, depending on the game).
I've read in many places that calling new a lot is expensive, so i tried making a system where calling it was unneccessary. Instinctively, i thought of pulling these basic types from an object pool, and wrapping them with a class that could be saved by value in the entity class.
I presume that new is expensive because it needs to find the memory location to place the object in, so placement new would be cheaper if i already have the location? If i change the store to actually be a memory pool instead of an object pool, would that solve anything? Or is my whole idea just plain wrong to begin with?

devstropo.blogspot.com - Random stuff about my gamedev hobby


#14 King Mir   Members   -  Reputation: 2050

Like
1Likes
Like

Posted 22 February 2013 - 03:03 AM

Your current implementation isn't cheaper than calling new.

Also, a bigger cost stemming from dynamic memory is not calling new, but the loss of cache locality that results in; a new'd object can be located far in memory from where the pointer is used and stored.


Do you need entities to be dynamically typed, or will you still be creating components like "Physics" at compile time? If the latter, could use tuples instead?

#15 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 22 February 2013 - 03:39 AM

Yeah, i'm aware of the cache locality issue, and was trying to minimize it by using vectors in the DataStore class.

I wouldn't create components at compile time, but rather from Lua scripts. I was thinking of exposing the needed functionality to Lua, so i could make an Entity by dynamically giving it data that is needed for it to do something.
Physics, rendering, movement, collision and such would be done by giving it only what it needs to perform that action. If an Entity has velocity which never changes, it just wouldn't have an acceleration property. If another Entity needed acceleration, then i'd give it the acceleration property, and it would work as expected.

Each Entity would be registered to a behaviour that operates on a collection of Entities, but only if they have the required properties. For instance, there would be an Accelerate behaviour, that expects the entities in it's collection to have the Acceleration and Velocity properties, and would just add the Acceleration value to the Velocity value (and if any of the properties aren't of the expected type, the game would complain and either crash or skip that entity).

I was thinking about maybe skipping the State wrapper class, and try using DataStores directly, something like DataStore<T>::attachProperty(Entity& entity, string name, T& value); and have a T& DataStore<T>::getProperty(Entity& entity, string name), but this would probably suffer from the same issues.

devstropo.blogspot.com - Random stuff about my gamedev hobby


#16 King Mir   Members   -  Reputation: 2050

Like
0Likes
Like

Posted 22 February 2013 - 04:14 AM

You have a vector of same type objects. That doesn't help cache locality, because there's no special tendency to access same type objects near each other.

Instead you probably want entities components in the same collection to be near each other. As I see it there are two efficent ways to do this:
1) a container of of union types. You might need a wrapper object on the union.
2) a container of pointers, all pointing to a char buffer, holding the data. There are various ways to achieve this.


Also, since the entire point of this voodoo is to optimise performance, you should build a testbed that actually verifies the performace improvement. Time the baseline, using new all over the place. Time this implementation (fixing the deque issue). And if you attempt something else, time that too.

#17 Strewya   Members   -  Reputation: 1589

Like
0Likes
Like

Posted 22 February 2013 - 05:24 AM

Hm, true about vector locality...
I know i should probably test the performance first, and this is part of my problem, that i'm doing premature optimizations.

At any rate, i'm also willing to scrap part of this concept in favour for something better. I just don't know what that would be biggrin.png

The only thing i would really like to have is a way to hold individual data types in a non-template class, and have the non-template class take care of the cleanup.

I'm also juggling an idea where i ask a data store to make me a state, something like this:
State DataStore<T>::MakeState()
{
    return State(this, &GetAddressFromPool(), hashOfType);
//the GetAddressFromPool would be a dataStore<T> method
}

State(IDataStore* store, void* ptrToData, size_t hash) {}

~State() 
{
    store->Release(ptrToData);
}

T& State::as<T>()
{
    if(hash == typeof(T).hash_code())
    {
	return *static_cast<T*>(ptrToData));
    }
    throw exception;
}
Is this something in the lines of your 2) option?
I'm not really planning to go multithreaded any time soon, so i could get away with no thread safety, but the concept of having any data type in a non-template class is really appealing to me, and would like to pursue it.

devstropo.blogspot.com - Random stuff about my gamedev hobby


#18 King Mir   Members   -  Reputation: 2050

Like
0Likes
Like

Posted 22 February 2013 - 01:47 PM

That looks like it's implementing a global memory pool for the state internals. That's better than your original, because objects of different types are in the same pool. Also, there is less indirection.

But I meant to suggest having a separate buffer for each component -- for each container with State objects in it.


Premature optimisation is one problem, but a bigger one is trying to write code without testing if it works. For performance optimisation, testing requires profiling. The surest way to see if a particular implementation is any good is not to ask here, but to test.




Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS