Yuukan

DOD and memory layout

Hi there, I'm trying to write a custom game engine for educational purposes. I read several presentations (on the DICE publications website and gamesfromwithin) on the subject of Data Oriented Design, and I'd really like to manage the memory myself. I want a cache-friendly game engine.

Basically, my game engine is split into two components: the Game Engine itself and the Game Framework. I will use custom allocators to manage the memory, but that's not the point. Since I really love the Artemis entity framework and how it organizes logic and data, I'd like to do something similar, and I'm sure it can be tied to the DOD concept. Let's say we have entities which have behaviours (data only) and push that data to a system (logic) which, given input data, produces output data and operates on contiguous blocks of memory. So behaviours / data need to be contiguous in memory, and my question is: who is responsible for allocating memory for behaviours? A dedicated allocation system, or the system that processes them? And which custom allocator would be best for this purpose?

Could you tell us what platform(s) your engine is targeting? Remember that the kind of memory management discussed in those articles is mainly beneficial to machines that have a small amount of memory (which is also slow memory) and limited cache memory. A lot of it applies to consoles or embedded systems (like cell phones). If your target is desktop PC, you will get little if any benefit at all.

As a general rule, the memory will be supplied by a linear allocator. This is an allocator that basically works as a stack. That means memory that has been allocated must be freed in the exact opposite order to the one in which it was allocated. To make this easier to use, DICE have designed a scope stack allocator (My link) that makes it easier to allocate contiguous memory and then release it when it is no longer needed.
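To make the stack behaviour concrete, here is a minimal sketch of a linear allocator over a caller-supplied buffer. The class and method names (`LinearAllocator`, `Allocate`, `Mark`, `Rewind`) are made up for illustration; this is not DICE's scope stack allocator, only the general idea of bumping an offset and rewinding to a marker.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical linear (stack) allocator; names are illustrative only.
class LinearAllocator {
public:
    LinearAllocator(void* buffer, std::size_t size)
        : m_base(static_cast<std::uint8_t*>(buffer)), m_size(size), m_offset(0) {}

    // Bumps the offset and returns the aligned address, or nullptr when the
    // buffer is exhausted. `alignment` must be a power of two.
    void* Allocate(std::size_t bytes,
                   std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t current =
            reinterpret_cast<std::uintptr_t>(m_base) + m_offset;
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (current + mask) & ~mask;
        const std::size_t padding = static_cast<std::size_t>(aligned - current);
        if (padding + bytes > m_size - m_offset) return nullptr;
        m_offset += padding + bytes;
        return reinterpret_cast<void*>(aligned);
    }

    // A marker records the current top of the stack.
    std::size_t Mark() const { return m_offset; }

    // Rewinding releases everything allocated after the marker at once,
    // which enforces the "free in reverse order" rule.
    void Rewind(std::size_t marker) {
        assert(marker <= m_offset);
        m_offset = marker;
    }

    std::size_t Used() const { return m_offset; }

private:
    std::uint8_t* m_base;   // start of the backing buffer
    std::size_t   m_size;   // total bytes available
    std::size_t   m_offset; // bytes handed out so far
};
```

Typical use would be to take a `Mark()` at the start of a frame or a scope, allocate freely, then `Rewind()` to the marker when the scope ends. Individual allocations are never freed on their own.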

It would help to have some snippets of code (even just pseudocode) to discuss the matter in more detail. It's early here and I have yet to get some coffee.

My engine targets PC first but must run on other platforms. I know this concept is less crucial on a desktop computer, but I still think it can help produce more efficient code for parallelization purposes. My idea is basically to use DOD for all core jobs such as rendering, animation and physics, because they run on thousands of objects and need to be fast.

I use a stack allocator to allocate the core subsystems. For instance, when the core needs to compute the world position of every object, the memory layout is important, but the relation between two objects can change at runtime: a sword is picked up by the player and its position is now relative to its parent (the player). So I need to be able to reorganize the memory. Which allocator should I use?

Sorry for butting in without anything constructive to say...

But I thought the point of DOD was the speed _difference_ between cache and memory, much more than the amount of it you have. And isn't the speed difference pretty huge on a desktop PC too?

Possibly it's just a little bit harder to thrash the cache on a PC, but if you have a huge data structure, you will anyhow, and will benefit greatly from being cache friendly.

And then even more if you want to parallelize.

Or am I wrong? :/

I think you're right, Olof: it's all about access speed, and CPU cache is still faster than RAM.

BTW, I tried several things to implement an entity system like Artemis but DOD-oriented, and I am still stuck and don't know how to build such a system. The idea is that an entity system should be able to loop through all relevant data by querying a central memory manager, which returns a pointer to the first element to process and a count. But since an entity system can have several component mappers, the memory layout is really hard to define.

For instance, in the Process() method of an entity system I want to be able to do something like this:

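#include <cstddef>

// MemoryMarker is just a struct like this
struct MemoryMarker
{
    void*       m_begin; // first component in the contiguous block
    std::size_t m_count; // number of components (a count, not a pointer)
    // some getters and setters
    void*       Begin() const { return m_begin; }
    std::size_t Count() const { return m_count; }
};

// Inside Process(): ComponentMapper, Transform and Render come from the engine
MemoryMarker tm = ComponentMapper.Get<Transform>();
MemoryMarker rm = ComponentMapper.Get<Render>();
Transform* t = static_cast<Transform*>(tm.Begin());
Render*    r = static_cast<Render*>(rm.Begin());
// tm.Count() or rm.Count() since they must process the same amount of entities
for (std::size_t i = 0; i < tm.Count(); ++i, ++t, ++r)
{
    // Process job! t and r point to the same entity's components
}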


I have something that should work, but I worry about having to reorganize memory at runtime when an entity gains a new component, etc.

Maybe someone can help me out.

It is about the difference in speed, and all platforms are affected. However, I would imagine that consoles suffer a greater impact for a few reasons: first, they typically have a much smaller amount of total cache, and second, the CPUs are simple, in-order cores (PS3, Xbox 360) or just plain slow to begin with (Wii). The PS3 and 360 can switch to another thread cheaply and return when resources are ready, but if you stall two threads awaiting memory, well, I'm not certain what happens, but it seems logical that you either wait, or load an external thread into one of the two contexts and hope it runs awhile. Ironically, the Wii's processor is probably the least impacted by the rest of the system (they use 1T-SRAM AFAIK, so while there is less of it, the memory system is rather fast compared to the Wii's CPU clock), but of course, the Wii runs at only 700ish MHz to start off.

It also matters for the newest PC CPUs, see "CPU Caches and Why You Care" talk by Scott Meyers (links below).
OP, perhaps those materials will be helpful (and the links therein):
http://stackoverflow...ove-performance
[Video] http://scottmeyers.b...-available.html
[Slides] http://scottmeyers.b...accu-talks.html
http://aristeia.com/...odeCamp2010.pdf
http://igoro.com/arc...-cache-effects/


[quote name='Olof Hedman' timestamp='1325258802' post='4898165']
Sorry for butting in without anything constructive to say...

But I thought the point of DOD was the speed _difference_ between cache and memory, much more than the amount of it you have. And isn't the speed difference pretty huge on a desktop PC too?


It is about the difference in speed, and all platforms are affected, however, I would imagine that consoles suffer a greater impact for a few reasons -
[/quote]

My point was just that if the cache is, say, 100x faster than RAM on platform X and just 10x faster on platform Y, I'd imagine you get a much higher gain from being cache friendly on platform X, even if platform Y has much more cache/RAM or is generally faster... I thought the problem with the insanely quick modern CPUs is that RAM lagged behind and didn't increase in speed as fast.

What you say about the Wii fits nicely into this. The RAM is quick, so you don't get as much benefit from being cache friendly.

Right, but it's not always about the measurable clock-speed differences between CPU/cache/memory, or even the ratio of CPU MIPS to bandwidth. Simpler, in-order CPU cores like the PPC units in the Xbox 360 and PS3, or Intel's Atom processors, don't have the complex circuitry that their out-of-order brothers have, which can keep the CPU busy while it awaits data. That's why they have the trick of having two thread contexts on the chip. It helps, and it's simpler than OoO or hyperthreading, but if you go beyond those two contexts the core is essentially dead in the water until the memory reference can resolve, or a more expensive context switch occurs.

Such a memory reference on a fast CPU might "miss" for tens or hundreds of instruction slots before it resolves, but if the CPU can keep 75% of those slots busy with other work, then the aggregate impact is less than on a slower CPU which "misses" half as many instruction slots but can't fill any of them with other work. For example, a 200-slot miss with 75% of the slots filled wastes 50 slots, while a 100-slot miss with none filled wastes all 100.


[quote]
I have something that should work, but I worry about having to reorganize memory at runtime when an entity gains a new component, etc.
[/quote]


Don't try to process one entity at a time.

Instead, process one component type at a time; to use your example, you would do all the transform components and then all the render ones.

This gives you the best chance at optimal instruction and data cache reuse while processing.

Your entities themselves only need to maintain some kind of handle to the component to send any messages/commands they need (or not even that: you could have a completely decoupled system whereby the entity stores 'id' values and sends messages via a message system to pass data along).

The key point is not to think of an entity as being a thing you update; instead, process component types together in batches.
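As a rough sketch of what that could look like, here is one possible layout: a pool per component type that keeps its data packed, with entities holding only an id. Everything here (`ComponentPool`, `EntityId`, `UpdateTransforms`, `UpdateRenders`, and the fields of `Transform` and `Render`) is a hypothetical name chosen for the example, and the swap-and-pop removal is just one common way to keep the array dense when components come and go at runtime.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

using EntityId = std::uint32_t; // hypothetical handle stored by entities

// Hypothetical pool holding every component of one type contiguously.
template <typename T>
class ComponentPool {
public:
    // Appends a component for the entity at the end of the dense array.
    T& Add(EntityId id, const T& value) {
        assert(m_index.find(id) == m_index.end());
        m_index[id] = m_data.size();
        m_owner.push_back(id);
        m_data.push_back(value);
        return m_data.back();
    }

    // Swap-and-pop: the last element fills the hole, so the array stays
    // dense without shifting every element after the removed one.
    void Remove(EntityId id) {
        auto it = m_index.find(id);
        assert(it != m_index.end());
        const std::size_t slot = it->second;
        const std::size_t last = m_data.size() - 1;
        m_data[slot] = m_data[last];
        m_owner[slot] = m_owner[last];
        m_index[m_owner[slot]] = slot;
        m_data.pop_back();
        m_owner.pop_back();
        m_index.erase(id);
    }

    // Looks up one entity's component, or nullptr if it has none.
    T* Find(EntityId id) {
        auto it = m_index.find(id);
        return it == m_index.end() ? nullptr : &m_data[it->second];
    }

    T* Begin() { return m_data.data(); }
    std::size_t Count() const { return m_data.size(); }

private:
    std::vector<T> m_data;                             // dense component storage
    std::vector<EntityId> m_owner;                     // owner of each slot
    std::unordered_map<EntityId, std::size_t> m_index; // entity id -> slot
};

struct Transform { float x, y, z; };
struct Render    { int meshId; int drawCount; }; // placeholder fields

// Each system walks a single component type in one linear pass.
void UpdateTransforms(Transform* t, std::size_t count, float dx) {
    for (std::size_t i = 0; i < count; ++i) t[i].x += dx;
}

void UpdateRenders(Render* r, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) ++r[i].drawCount;
}
```

A frame would then call `UpdateTransforms(transforms.Begin(), transforms.Count(), dx)` for the whole pool, followed by `UpdateRenders(...)` for the render pool. Note that removal changes the order of the remaining components, so anything that needs a stable order (or a matching order across two pools) would need extra bookkeeping on top of this.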
