DOD and memory layout

31 comments, last by Yuukan 12 years, 2 months ago
Hi there, I'm trying to write a custom game engine for educational purposes. I read several presentations (on the DICE publications website and Games from Within) on the subject of Data Oriented Design, and I would really like to manage the memory myself. I want a cache-friendly game engine.

Basically, my game engine is split into two components: the Game Engine itself and the Game Framework. I will use custom allocators to manage the memory, but that's not the point. Since I really love the Artemis entity framework and how its logic and data are organized, I'd like to do something similar, and I'm sure it can be tied to the DOD concept. Let's say we have entities which have behaviours (data only) and push that data to a system (logic) which, given input data, can produce output data and operate on contiguous blocks of memory. So the behaviours / data need to be contiguous in memory, and my question is: who is responsible for allocating memory for behaviours? A custom system, or the system itself? And which custom allocator would be best for this purpose?




Could you tell us what platform(s) your engine is targeting? Remember that the kind of memory management discussed in those articles mostly benefits machines that have a small amount of (also slow) memory and limited cache. A lot of it applies to consoles or embedded systems (like cell phones). If your target is desktop PC, you will get little if any benefit at all.

As a general rule, the memory will be supplied by a linear allocator, which basically works as a stack: memory that has been allocated must be freed in the exact opposite order to the one in which it was allocated. To make this easier to use, DICE designed a scope stack allocator that makes it easier to allocate contiguous memory and then release it when it is no longer needed.
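
To make the stack behaviour concrete, here is a minimal sketch of a linear allocator. It is not DICE's scope stack allocator; the names `LinearAllocator`, `Marker`, `Rewind` and `DemoFrameScope` are made up for illustration, and it assumes you only store trivially destructible data in it (no destructors are ever run).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical linear (stack) allocator: a sketch under the assumptions above.
class LinearAllocator {
public:
    using Marker = std::size_t;

    explicit LinearAllocator(std::size_t capacity) : buffer_(capacity), top_(0) {}

    // Bump-allocate `size` bytes aligned to `alignment` (must be a power of two).
    // Returns nullptr when the buffer cannot satisfy the request.
    void* Allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base =
            reinterpret_cast<std::uintptr_t>(buffer_.data());
        const std::uintptr_t aligned =
            (base + top_ + alignment - 1) & ~(std::uintptr_t(alignment) - 1);
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > buffer_.size() || size > buffer_.size() - offset) {
            return nullptr;
        }
        top_ = offset + size;
        return buffer_.data() + offset;
    }

    // Remember the current top of the stack.
    Marker GetMarker() const { return top_; }

    // Release everything allocated after `marker` in one step (LIFO order).
    void Rewind(Marker marker) {
        assert(marker <= top_);
        top_ = marker;
    }

    std::size_t Used() const { return top_; }

private:
    std::vector<unsigned char> buffer_;  // backing storage, owned by the allocator
    std::size_t top_;                    // offset of the first free byte
};

struct Transform { float x, y, z; };

// Usage sketch: carve a contiguous array out of the arena, then free it
// by rewinding to the marker taken before the allocation.
inline std::size_t DemoFrameScope() {
    LinearAllocator arena(1024);
    const LinearAllocator::Marker frameStart = arena.GetMarker();
    void* memory = arena.Allocate(16 * sizeof(Transform), alignof(Transform));
    assert(memory != nullptr);
    Transform* first = new (memory) Transform{1.0f, 2.0f, 3.0f};
    (void)first;
    arena.Rewind(frameStart);
    return arena.Used();
}
```

The marker plays the role of the "scope": everything allocated after it sits contiguously in the buffer and goes away together on `Rewind`.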

It would help to have some snippets of code (even just pseudocode) to discuss the matter in more detail. It's early here and I have yet to get some coffee.
My engine targets PC first but must run on other platforms. I know this concept will be less crucial on desktop computers, but I still think it can help make the code more efficient for parallelization purposes. My idea is basically to use DOD for all core jobs like rendering, animation, and physics, because they run on thousands of objects and need to be fast.

I use a stack allocator to allocate the core subsystems. For instance, when the core needs to compute the world position for every object, the memory layout is important, but the relation between two objects can change at runtime: a sword is picked up by the player and its position is now relative to its parent (the player). So I need to be able to reorganize the memory. Which allocator should I use?
Sorry for butting in without anything constructive to say...

But I thought the point of DOD was the speed _difference_ between cache and memory, much more than the amount of it you have. And isn't the speed difference pretty huge on a desktop PC too?

Possibly it's just a little bit harder to thrash the cache on a PC, but if you have a huge data structure, you will anyhow, and will benefit greatly from being cache-friendly.

And then even more if you want to parallelize.

Or am I wrong? :/
I think you're right, Olof: it's all about access speed, and CPU cache is still faster than RAM.

BTW, I tried several things to implement an entity system like Artemis but DOD-oriented, and I am still stuck: I don't know how to build such a system. The idea is for an entity system to loop through all relevant data by querying a central memory manager, which returns a pointer to the first element to process and a count. But since an entity system can have several component mappers, the memory layout is really hard to define.

For instance, in the Process() method of an entity system I want to be able to do something like this:

// MemoryMarker is just a struct like this
struct MemoryMarker
{
    void*  m_begin;  // first element to process
    size_t m_count;  // number of elements (a count, not a pointer)
    // some getters and setters
};
MemoryMarker tm = ComponentMapper.Get<Transform>();
MemoryMarker rm = ComponentMapper.Get<Render>();
// tm.Count() or rm.Count() since they must process the same amount of entities
for (size_t i = 0; i < tm.Count(); ++i)
{
    // Process job! Element i of tm and element i of rm belong to the same entity
}


I have something that should work, but I worry about having to reorganize memory at runtime when an entity gets a new component, etc.

Maybe someone can help me out.

[quote name='Olof Hedman' timestamp='1325258802' post='4898165']
Sorry for butting in without anything constructive to say...

But I thought the point of DOD was the speed _difference_ between cache and memory, much more than the amount of it you have. And isn't the speed difference pretty huge on a desktop PC too?

Possibly it's just a little bit harder to thrash the cache on a PC, but if you have a huge data structure, you will anyhow, and will benefit greatly from being cache-friendly.

And then even more if you want to parallelize.

Or am I wrong? :/
[/quote]


It is about the difference in speed, and all platforms are affected. However, I would imagine that consoles suffer a greater impact for a few reasons: first, they typically have a much smaller amount of total cache, and second, their CPUs are simple, in-order cores (PS3, Xbox 360) or just plain slow to begin with (Wii). The PS3 and 360 can switch to another thread cheaply and return when resources are ready, but if you stall two threads awaiting memory, well, I'm not certain what happens, but it seems logical that you either wait, or load an external thread into one of the two contexts and hope it runs a while. Ironically, the Wii's processor is probably the least impacted by the rest of the system (it uses 1T-SRAM AFAIK, so while there is less of it, the memory system is rather fast compared to the Wii's CPU clock), but of course, the Wii runs at only 700ish MHz to start off.


It also matters for the newest PC CPUs, see "CPU Caches and Why You Care" talk by Scott Meyers (links below).
OP, perhaps those materials will be helpful (and the links therein):
http://stackoverflow...ove-performance
[Video] http://scottmeyers.b...-available.html
[Slides] http://scottmeyers.b...accu-talks.html
http://aristeia.com/...odeCamp2010.pdf
http://igoro.com/arc...-cache-effects/

[quote name='Olof Hedman' timestamp='1325258802' post='4898165']
Sorry for butting in without anything constructive to say...

But I thought the point of DOD was the speed _difference_ between cache and memory, much more then the amount of it you have. And isn't the speed difference pretty huge on a desktop PC too?


It is about the difference in speed, and all platforms are affected, however, I would imagine that consoles suffer a greater impact for a few reasons -
[/quote]

My point was just that if the cache is, say, 100x faster than RAM on platform X and just 10x faster on platform Y, I'd imagine you get a much higher gain from being cache-friendly on platform X, even if platform Y has much more cache/RAM or is generally faster... I thought the problem with insanely quick modern CPUs is that RAM lagged behind and didn't increase in speed as fast.

What you say about the Wii fits nicely into this. The RAM is quick, so you don't get as much benefit from being cache-friendly.
Right, but it's not always about the measurable clock-speed differences between CPU/cache/memory, or even the ratio of CPU MIPS to bandwidth. Simpler, in-order CPU cores like the PPC units in the Xbox 360 and PS3, or Intel's Atom processors, don't have the complex circuitry that their out-of-order brothers have, which can keep the CPU busy while it awaits data. That's why they have the trick of having two thread contexts on the chip -- it helps, and it's simpler than OoO or hyperthreading, but if you go beyond those two contexts it's essentially dead in the water until the memory reference can resolve, or a more expensive context switch occurs.

Such a memory reference on a fast CPU might "miss" for tens or hundreds of instruction slots before it resolves, but if the CPU can keep 75% of those slots busy with other work, then the aggregate impact is less than on a slower CPU which "misses" half as many instruction slots, but can't fill any of them with other work. With purely illustrative numbers: 200 stalled slots with 75% of them filled wastes 50, while 100 stalled slots with none filled wastes 100.



[quote]
I have something that should work, but I worry about having to reorganize memory at runtime when an entity gets a new component, etc.
[/quote]


Don't try to process one entity at a time.

Instead process types of components at a time; to use your example you would do all the transform components and then all the render ones.

This gives you the best chance at optimal instruction and data cache reuse while processing.

Your entities themselves only need to maintain some kind of handle to the component to send any messages/commands they need (or not even that: you could have a completely decoupled system whereby the entity stores 'id' values and sends messages via a message system to pass data along).

The key point is not to think of an entity as being a thing you update, instead do batches of component types together.
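
As a rough sketch of what that could look like (all names here, such as `TransformSystem`, `RenderSystem`, `Handle` and `Tick`, are made up for illustration, and handles are assumed to be plain indices that stay valid because nothing is ever removed):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical component data: plain structs, no behaviour.
struct Transform { float x, y, vx, vy; };
struct Render    { float screenX, screenY; std::uint32_t transformIndex; };

// A handle is just an index into a system's contiguous array (assumption:
// components are never removed, so indices stay stable).
using Handle = std::uint32_t;

struct TransformSystem {
    std::vector<Transform> items;  // all transforms, contiguous in memory

    Handle Add(const Transform& t) {
        items.push_back(t);
        return static_cast<Handle>(items.size() - 1);
    }

    // One tight loop over every transform: same code, linear data access.
    void Update(float dt) {
        for (Transform& t : items) {
            t.x += t.vx * dt;
            t.y += t.vy * dt;
        }
    }
};

struct RenderSystem {
    std::vector<Render> items;  // all render components, contiguous in memory

    Handle Add(Handle transform) {
        items.push_back(Render{0.0f, 0.0f, transform});
        return static_cast<Handle>(items.size() - 1);
    }

    // Runs after all transforms are updated; reads them through their index.
    void Update(const TransformSystem& transforms) {
        for (Render& r : items) {
            assert(r.transformIndex < transforms.items.size());
            const Transform& t = transforms.items[r.transformIndex];
            r.screenX = t.x;
            r.screenY = t.y;
        }
    }
};

// The entity owns no data and has no Update(); it only remembers handles.
struct Entity { Handle transform; Handle render; };

// One frame: every transform first, then every render component.
inline void Tick(TransformSystem& ts, RenderSystem& rs, float dt) {
    ts.Update(dt);
    rs.Update(ts);
}
```

Adding a component to an entity then only appends to that component type's array and stores a handle; the other arrays are left alone. Removal and keeping the arrays dense are not handled in this sketch.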
