Game Engine design for many-typed many-copied entities

Started by
58 comments, last by nfries88 8 years, 1 month ago


I wonder how fast a binary search over a vector would be compared to a custom bitset implementation...
Binary search messes up the CPU's branch predictor. It's pretty much one random jump into memory each iteration, hard to predict, and often the branch decides which memory segment needs to be loaded next; if the halves are far enough apart, that hinders the CPU's ability to preload the next access into a cache line. This cost is amortized if you're handling a lot of data.

With bitsets it's a couple of bitwise ops to find the word index in the array, a jump to it, then a couple more bitwise ops. Much more straightforward (although not as memory-efficient if they're very sparse).

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator




A binary search works by halving the search area until the desired thing is found. A bitset with 1,000 bits fits into two cache lines (64 bytes per cache line = 512 bits), and the very first iteration will shrink the search to one cache line (it halved the search area), meaning AT WORST this causes one cache miss. The branch misprediction penalties will be better than for a normal linear search, and once the search area is down to a single word (at the 5th iteration on x64, 6th on x86), we can use masks and shifts as you suggested to optimize the last 5 (4 on x86) iterations.
I was just thinking about how to optimally implement bitset.getNextSetBit on a potentially large, sparsely set bitset for use in a loop. Binary search was the first thing to pop into my head because this is one of those rare cases where cache misses are unlikely. It may well be that a linear mask-and-shift loses to testing for set bits over a wide area: with out-of-order execution, an interleaved popcount over a large number of words would outperform successive TESTs, and best of all, those popcounts can be stored for the length of the call, and even across calls, at the cost of a byte per word.



Binary search can also be a pathological case for cache line aliasing. If your dataset is a power of two (e.g. 1024 items), you actually want to unbalance your search by not splitting right down the middle, even though algorithmically that should be the optimal split location... :o

A bitset with 1,000 bits fits into two cache lines (64 bytes per cacheline = 512 bits per cacheline), and the very first iteration will shrink this to one cache line (it halved the search area), meaning AT WORST this causes one cache miss.

Wait a second, you're mixing a bunch of things: first, you mentioned binary search over a vector, not a bitset. Second, you're suggesting binary search for iteration? That doesn't make much sense; it wouldn't help you iterate over a bitset.

For binary searching through a vector (what you initially suggested) you'd need 1000 ints for the IDs; that's 4 kB of data, and it really would involve a bunch of cache misses.

There are plenty of ways to iterate over the bits of an int, but the loop simply has to check whether each word of the backing array has any bit set before fetching the next one, regardless of the technique you use to iterate over the bits within a single word.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

I learned something today, thanks Hodgman!

Quite accidentally, I think my idea to use cached popcount data would eliminate these issues. If no popcounts need to be calculated, I don't even look at the memory in the bitset until I find the exact memory word the bit is in. If popcounts do need to be calculated, I'll probably do those in order, and get the advantage of sequential accesses we're all used to. I won't be performing a binary search on these, instead adding sequentially to get the popcount for the whole search range, so again I get the advantages of sequential access. I could also software prefetch the final memory word's cache line after the fourth test before finding the exact memory word, so this would offer great cache miss and branch misprediction avoidance.


Wait a second, you're mixing a bunch of things: first, you mentioned binary search over a vector, not a bitset. Second, you're suggesting binary search for iteration? That doesn't make much sense; it wouldn't help you iterate over a bitset.



For binary searching through a vector (what you initially suggested) you'd need 1000 ints for the IDs; that's 4 kB of data, and it really would involve a bunch of cache misses.

I meant iteration (as in step) of the search.
Also, I was talking about performance of a binary search over a vector<bool>, not a generic vector. vector<bool> is the closest thing the STL offers to a dynamic bitset, and the standard gives implementations a great deal of latitude to save memory (the maximum savings come from implementing it as a dynamic bitset).

And specifically, I was pondering whether a binary search over a good implementation of vector<bool> would perform poorly enough to warrant writing my own implementation so I could use the above techniques; Hodgman may well have answered that question.

I do like the point that using a component model doesn't magically solve the problem of game-implementation complexity. Often the complexity just falls onto the communication between the more modular pieces. It's fairly safe to say the composition style does aid in dismantling the inflexibility of inheritance trees; I'm sure we all mostly agree on this.

However, there's often an assumption on the internet that communication/messaging will also magically align itself with the stars. Another assumption is that the entirety of a game engine ought to be forcefully shoved into a component model. As an alternative to these assumptions I would propose:

  1. Architect a couple ways to go about communication across the game. Pointers, function calls, virtual dispatch, polling, listeners/messages, etc. are some options. Each option needs to be considered and weighed against alternatives.
  2. Allow engineers to write components in a more traditional OOP style in terms of API/interface. An OOP-style interface (i.e. a class with member functions) incurs hardly any overhead other than pointer aliasing. Lastly, the component API does not need to strictly reflect how the components are stored in memory.
  3. Clearly define where the component/aggregation will be used, and what the boundaries are. Are the components mostly interfaces to lower-level services (like wrappers to a graphics or physics API)? Are the components strictly for dynamic gameplay objects, or is every triangle in the world represented in the component model?

The point I'm attempting to argue is that game implementation (game logic, AI, state machines, interactions, scripts, etc.) will always be like an exploding plate of spaghetti. Each game is complex and vastly different from every other game, with lots of intertangled bits. I argue for allowing the game implementation to take on its most natural form. From an engineer's point of view this can be thought of as taking the path of least resistance -- instead of forcing the game to adhere strictly to a certain paradigm, we let the game land where it will.

That said, pains can be taken to ensure the spaghetti explosion is as small as possible. This is where services like DirectX or Box2D come into play. These services attempt to provide an API that solves a specific problem. The thing about APIs is that they are incredibly difficult to implement and require solid engineering. Component models will require extremely solid API design in order to meet the demands of an exploding spaghetti game. As long as these APIs are constructed in a way that lets the game latch onto them without compromising its natural game-form, the API has succeeded. As Muratori explains in that last link, a good way to construct an API is to provide a range from lower-level "immediate mode" functions to higher-level "retained mode" functions. This range of options, all ultimately solving the same specific problem, can help the game retain its natural form while latching onto the API for help.

Here's some quick justifications for each point I proposed:

1) This is an attempt to provide a range of communication options for whatever the team's needs are, in order to construct a solid API. I would recommend culling options that do not make sense for a given project.

2) Writing good APIs is super difficult, and when someone constructs a component they are in a way creating an API that encapsulates the solution to a problem. In general devs are familiar with OOP-ish APIs involving classes and member functions. Perhaps for different teams a different style for implementing components would work. This is just one idea. The assumption I made here is that a solid engineer has already implemented the mechanisms for creating/storing components efficiently, and another engineer (possibly the same one) is later on using this mechanism to implement game-specific components.

3) A component-model ought to solve a problem. I'd advise defining what problems the components will solve and then engineer the solution to those problems. Nothing else. For example, if components are meant only to store AI-scripts for run-time game entities then many simplifications can be made here to create a solution. Perhaps an array of function pointers to scripts is all that a "game object" needs to be, where each function pointer represents a script. Perhaps another array can be used to store parameters to each corresponding script index. Additional complexity in the component-model should only arise to meet the challenges of the solutions that the component-model is trying to solve.


I understand your point Randy, but you might want to rephrase 'taking the path of least resistance', as you don't want to end up with an engine so inflexible that it is unable to add anything more to your game than what is already there. Also, I've heard the opposite said: when programming you want to design a system with a wide range of applicability. Though it may be more difficult to code in some respects, it means you get a large amount of freedom in how you code the smaller details, the actual meat of the game.

To follow up on my original idea, I'm planning to make it look something like this:


EntityPool // A pool of all entities of a single type
{
    // Similar to what you would find in an ECS system, but this adds/removes the component to every entity
    addComponent()
    removeComponent()
    void* getComponent( ComponentType type ) // Gets a Component class, but this class can act on any entity in the given pool
    bool  hasComponent( ComponentType type )
    // When the entity is clicked on a list of possible actions appear. Those actions are stored in the entity pools context menu
    ContextMenu cMenu
}

abstract TextureComponent
{
    virtual Texture& getTexture( EntityIndex index )
}

AlarmTextureComponent : TextureComponent
{
    Texture& getTexture( EntityIndex index )
    {
        if( alarms->getItem<bool>( index, alarmActiveIndex ) )
            return active;
        else
            return inactive;
    }

    ItemArr*   alarms
    int        alarmActiveIndex
    Texture&   active
    Texture&   inactive
}

Alarms : EntityPool
{
    Alarms()
    {
        ...
        addComponent( AlarmTextureComponent( &alarms, 0, activeTex, inactiveTex ) );

        cMenu.add( "Set Alarm", 
            [this](EntityIndex index){ alarms[index] = !alarms[index]; if( alarms[index] ) event( CallFireBrigade ); }
                 );
    }
    Arr<bool>  alarms
}

GraphicsSystem
{
    update()
    {
        for( each item on screen )
            if( item's pool has TextureComponent )
                render( itemPos, pool->TextureComponent->getTexture( id ) )
    }
}

This design shows generally how the composition is laid out and how the systems operate on it. The use of the event system in this design requires that a lot of objects hold a reference to the one event system (so the eventSystem pointer must be passed down quite a few levels), but I think that's a problem that is inevitable in the majority of engines and a necessary evil. The graphics system tries to render every item on the screen, as this is more efficient than cycling through every renderable entity, but I'm planning to implement a way for systems to capture only the entity pools that have the components they want and operate on just those (the same way an ECS system holds a personal container of entities with the components it wants to operate on).

EDIT: Oh, and if anyone is confused by storage of entity positions, that is done by a 2d array in which each element contains a vector of entity ids (entity ids contain an EntityType and an EntityIndex to find the correct entitypool and correct index in that entitypool respectively)


***good advice***
Thank you for posting that, good sir!

I understand your point Randy, but you might want to rephrase 'taking the path of least resistance', as you don't want to end up with an engine so inflexible that it is unable to add anything more to your game than what is already there. Also, I've heard the opposite said: when programming you want to design a system with a wide range of applicability. Though it may be more difficult to code in some respects, it means you get a large amount of freedom in how you code the smaller details, the actual meat of the game.

I think there's a balance - if you design it too abstractly, you're not adding anything but only exposing the same APIs your engine is built on, perhaps merely through the lens of a different paradigm.

You want to think and plan for the future, by designing in such a way that you can extend or modify without too much effort, but I'd err on the side of being too strict rather than too flexible - otherwise I'd run the risk of making an engine that benefits no game, rather than making a game from which an engine emerges. You can tweak your interfaces later, to expose more functionality, but it's harder to take an overly-generic interface and trim it back without rewriting the whole interface. </opinion>

Some say early optimisation is a bad thing, but it's not that black and white; it's more of an 80/20 rule with a lot of grey area. A post above mentioned that ECS is multithreading-friendly. Multithreading is an optimisation for scaling and higher-scope games that has become, and will keep becoming, more important as we utilise modern cores: quad-core is mainstream now and we're going beyond 8 cores plus hyperthreading. It isn't only about thread safety, but also about efficient parallel computing, and ECS built on a more purely data-oriented design supports that much better, since efficient parallel computing relies on the software architecture and the paradigm used.

While OOP, ECS, and DOD are mixable, some benefits of DOD can be destroyed by mixing it with OOP; it could be that some ECS architectures perform badly because of that. For small games, where scaling and heavy multithreading are not needed, it's a non-issue. That's my observation from reading a lot about ECS, and I'm interested in ECS scaling and its use for big-scope games.

This topic is closed to new replies.
