Pieisgood

Suggestions for memory management classes?


Hey guys, 

 

I've been building a series of memory management classes for a small game engine. I currently have a memory pool class, stack allocator class, double buffer allocator, and a series of overloaded new and delete operators to handle memory dumps. 
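For anyone following along, the stack allocator mentioned here can be sketched in a few lines. This is a minimal illustration, not the poster's actual code; the class and member names are made up:

```cpp
#include <cstddef>
#include <cstdint>
#include <cassert>

// Minimal stack (LIFO) allocator: allocations bump a pointer forward;
// freeing rolls the pointer back to a saved marker, releasing
// everything allocated after that marker in one step.
class StackAllocator {
public:
    explicit StackAllocator(std::size_t capacity)
        : buffer_(new std::uint8_t[capacity]), capacity_(capacity), top_(0) {}
    ~StackAllocator() { delete[] buffer_; }

    // alignment must be a power of two
    void* allocate(std::size_t size,
                   std::size_t alignment = alignof(std::max_align_t)) {
        std::size_t aligned = (top_ + alignment - 1) & ~(alignment - 1);
        if (aligned + size > capacity_) return nullptr;  // out of space
        top_ = aligned + size;
        return buffer_ + aligned;
    }

    std::size_t marker() const { return top_; }     // save current position
    void freeToMarker(std::size_t m) { top_ = m; }  // bulk "free"
    void clear() { top_ = 0; }

private:
    std::uint8_t* buffer_;
    std::size_t capacity_;
    std::size_t top_;
};
```

The appeal of this pattern is that per-frame or per-level allocations can all be released with a single `freeToMarker` call, with no per-object bookkeeping.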

 

Do you guys have any suggestions for other classes or techniques that would be good to use for memory management? I'm thinking about wrapping this portion of the code up and releasing it on github but I want to be sure it's not lacking critical functionality or structural classes.


Memory management is kind of a premature optimization. Do you really need it? What I do in such situations is use my own functions as simple wrappers around the standard library functions. Then later on, if I find that things are slow in certain areas, I implement my own stuff under those wrappers, but only where I actually need it. As phil_t said, memory managers are pretty specific.
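A minimal sketch of that wrapper idea (the `mem` namespace and `Tag` enum are illustrative, not any particular engine's API):

```cpp
#include <cstdlib>
#include <cstddef>
#include <cassert>

// Thin wrappers over the standard allocator. Use these everywhere
// first; later, reroute only the hot subsystems to custom allocators
// behind the same interface. The Tag enum is a hook for per-subsystem
// tracking when you eventually need it.
namespace mem {
    enum class Tag { General, Renderer, Audio };

    inline void* alloc(std::size_t size, Tag = Tag::General) {
        return std::malloc(size);  // swap for a pool/stack allocator later
    }
    inline void release(void* p, Tag = Tag::General) {
        std::free(p);
    }
}
```

The point is that call sites never change when you swap the implementation underneath.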


Memory managers can be pretty specific for particular games, sure. But, general memory management techniques and classes can at least be discussed.

 

When building a piece of software as general and large as a game engine, I think it's best practice to have a set of memory management classes, asserts, and new/delete overloads. For example, it's nice to know how memory is being allocated even before you begin writing specific game code. When storing vertices, normals, UVs, etc., it's nice to verify through testing that you aren't fragmenting memory, so that you don't find out later, when the game is actually running.
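As a sketch of the kind of new/delete overloads being described, here is a minimal global overload pair that counts live allocations; a real tracker would also record sizes and call sites (this is an illustration, not the poster's code):

```cpp
#include <cstdlib>
#include <cstddef>
#include <new>
#include <atomic>
#include <cassert>

// Global new/delete overloads that count live allocations, so you can
// assert at shutdown that nothing leaked, and watch allocation churn
// per frame. Counting only -- a real tracker would also log size and
// call site per allocation.
static std::atomic<long> g_liveAllocs{0};

void* operator new(std::size_t size) {
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    ++g_liveAllocs;
    return p;
}

void operator delete(void* p) noexcept {
    if (p) {
        --g_liveAllocs;
        std::free(p);
    }
}
```

Note that replacing the global operators affects every `new` in the program, including ones inside the standard library, which is exactly what makes this useful for spotting hidden allocations.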

 

But I'm also just wondering if there are any memory management techniques you guys have seen come in handy. Has there been a situation where organizing or monitoring memory was critical to improving your game, or let you avoid cache misses, etc.?


Well, I don't want to discourage you, but general game engines usually go nowhere, and no successful game engine out there started out general.

 

All have been built for a specific purpose (i.e. a specific game), and then continuously developed to adapt to other situations as well. So first you need a game you are aiming for when developing the engine, and that will give you the specific situations where memory allocation needs optimizing. If the game runs fine without any optimizations, then there is no reason to add your own memory management.

 

As a general rule there should be nothing developed in your engine that doesn't serve a specific purpose of a specific game you are trying to build with it.


Thanks Hodgman, I certainly wasn't thinking of locality. 

My code so far has three distinct allocation strategies, and I was looking to expand that. I'm looking into scope-stack allocation now.

 

I'm glad I wasn't on some sort of crazy train here. I figure creating memory management up-front is sort of a safety net, like unit testing. 

 

Also, is locality as much of an issue on hardware that supports virtual memory? Does that cost really add up on Windows, Linux, and OS X?


Yep, virtual memory doesn't help here. If you're processing a batch of data, you ideally want that data to be as small as possible and stored contiguously (contiguous in address space, either virtual or physical, depending on whether the address space is virtualized). You also ideally want to access each piece of data exactly once, in linear order, with no random accesses.
Working this way allows the cache to actually function.

If you access memory randomly, unpredictably, and in areas that have not been used recently, then you get a cache miss - where you have to wait for the data to be moved from RAM to cache to CPU, which can waste hundreds of clock cycles.
e.g. On a Cell SPU, a cache miss is about 400 cycles, each cycle can issue two instructions, and each instruction can operate on 4 floats. That's 3200 FLOPs wasted each time you have to fetch a value from RAM!
On an x86 CPU, it's thankfully not that bad, but it's still a huge performance concern.

CPU speeds are increasing at a faster rate than RAM speeds, so every year this issue just gets worse and worse, not better!
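The access-pattern point above can be illustrated with a toy example. Both functions below compute the same sum; on large arrays the linear walk is typically several times faster, because the hardware prefetcher can stream it. A sketch with made-up names:

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// Sums the same data two ways: in linear order (cache friendly), and
// in a scattered order driven by an index table (cache hostile). The
// results are identical; only the memory access pattern differs.
long long sumLinear(const std::vector<int>& data) {
    long long total = 0;
    for (int v : data) total += v;  // sequential, prefetcher-friendly
    return total;
}

long long sumScattered(const std::vector<int>& data,
                       const std::vector<std::size_t>& order) {
    long long total = 0;
    for (std::size_t i : order) total += data[i];  // indirect, unpredictable
    return total;
}
```

With a data set much larger than the last-level cache and a genuinely random `order` table, timing these two loops makes the cost of cache misses very concrete.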


I discovered while implementing "cache friendly" structures that they tend to dictate the code around them (the access and storage code), so it amounts to a kind of data-driven design. Typical OOP code, with lists of objects and such, doesn't map very well to this.


Data-driven design IS engine design. Going as far back as Game Programming Gems 1, Chapter 1, we can see that this has been a general goal in game development.

 

Also, inlined function size is something else to watch out for in cache management, since aggressive inlining bloats code and puts pressure on the instruction cache.

 

While I'm not looking to write for the PS3, it's a nice example of the cost. It also seems like a good time to revisit memory management, as whole new strategies are going to start cropping up on the PS4 and XBONE, since they both have homogeneous AMD cores.


Some popular ones that haven't been mentioned are page allocators, slab allocators, and paged pool allocators.  Another really simple one that isn't a heap replacement but is handy for stack memory is alloca.  However, it should never be used in a loop, in a recursive function, or with an allocation request that will blow up the stack.  Some temporary "scratch" allocators will try a statically sized array first, then alloca, and then go to the heap if needed.  One way of going about this is described here.
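A rough sketch of that scratch-memory pattern, minus the alloca step (alloca can't live behind a helper function, since its memory dies when that function returns). The `Scratch` name and single-allocation design are illustrative:

```cpp
#include <cstdlib>
#include <cstddef>
#include <cassert>

// Scratch buffer: serves small requests from a fixed inline array and
// falls back to the heap only when the request doesn't fit. This
// sketch supports a single allocation per Scratch instance; a real
// version would also handle alignment and reuse.
template <std::size_t InlineBytes = 4096>
class Scratch {
public:
    void* get(std::size_t size) {
        if (size <= InlineBytes) {
            fromHeap_ = false;
            return fixed_;          // fast path: no heap traffic at all
        }
        fromHeap_ = true;
        heap_ = std::malloc(size);  // slow path for oversized requests
        return heap_;
    }
    bool usedHeap() const { return fromHeap_; }
    ~Scratch() { if (fromHeap_) std::free(heap_); }

private:
    alignas(std::max_align_t) unsigned char fixed_[InlineBytes];
    void* heap_ = nullptr;
    bool fromHeap_ = false;
};
```

Because the common case never touches the heap, temporary working buffers inside a function become effectively free.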

 

If the end goal is memory performance (and it should be if you're writing allocators), I cannot agree enough with what Hodgman is saying here.  malloc is a general-purpose allocator that has to fulfill allocations of all different sizes and alignments.  It may also have to make a system call into the kernel, and those transitions are ridiculously expensive.  However, it's still primarily about cache misses at the end of the day.

 

The goal should be to make fewer allocations, with tightly packed data that is accessed contiguously.  Find out what a struct of arrays is, what prefetching does and how to use it responsibly, why casting a float to an int is expensive, what the restrict keyword is, and how this also applies to instructions.  Custom memory allocators are a small slice of a much bigger problem.
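For the struct-of-arrays point, a small illustration (hypothetical particle data, not from any engine):

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

// Array-of-structs: each particle's fields sit together, so a loop
// that only needs positions still drags velocities and lifetimes
// through the cache with every particle it touches.
struct ParticleAoS { float x, y, z, vx, vy, vz, life; };

// Struct-of-arrays: each field is its own tightly packed array, so a
// position-only pass touches exactly the bytes it needs.
struct ParticlesSoA {
    std::vector<float> x, y, z;
    std::vector<float> vx, vy, vz;
    std::vector<float> life;
};

void integratePositions(ParticlesSoA& p, float dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i) {
        p.x[i] += p.vx[i] * dt;  // each field streamed contiguously
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
```

The SoA layout is also what makes SIMD easy, since each field is already a dense array of like-typed values.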


That's another three to look up. Awesome. Also good to know; I'm starting to understand the framing here. It seems that avoiding cache misses is the serious issue.

 

It also opens my eyes a bit as to why a previous C++ program I wrote was super slow. It did matrix multiplication of Markov chain matrices, and it took forever to compute, while the same thing written in MATLAB was incredibly fast. I can only assume MATLAB's matrix multiplication has far fewer cache misses. It does perform sparse-matrix checks and calculations, but after the third iteration the matrix should have been dense, and it was still incredibly fast. Anyway, good information to have here.

 

http://penguin.ewu.edu/~trolfe/MatMult/MatOpt.html
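For what it's worth, a large part of that gap is likely loop order. The naive i-j-k multiply strides down columns of the right-hand matrix, which is cache hostile in row-major storage; reordering to i-k-j makes the inner loop walk both matrices row-wise. A sketch (MATLAB's BLAS backend adds blocking and SIMD on top of this):

```cpp
#include <vector>
#include <cstddef>
#include <cassert>

using Mat = std::vector<double>;  // row-major, n*n elements

// i-k-j order: the inner loop advances j, so reads of b and writes
// of c are both sequential in memory -- unlike the textbook i-j-k
// order, which strides through b a whole row-length at a time.
// c must be zero-initialized by the caller.
void multiplyIKJ(const Mat& a, const Mat& b, Mat& c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            double aik = a[i * n + k];      // hoisted: constant in inner loop
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
}
```

On large matrices this reordering alone often gives a severalfold speedup over the naive order, with no change to the arithmetic performed.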


