grill8

Questions about memory management and operator new in game engine design.

Recommended Posts

Hello,

I have questions about memory management and operator new in game engine design. I know real-time performance is critical and that operator new/delete is slow. I have a memory allocator class I wrote that uses free lists and paging: it simulates allocation of objects and avoids real allocations except when new pages are required.

Now, I know that the C++ STL containers use operator new/delete by default through their allocators. My engine uses the STL in quite a lot of places, including resource instancing and sorting objects, and most data that requires containers is in STL containers, so that means a lot of new/delete calls.

Here is an idea I would like some input on. I want to design a memory manager module that simply holds a set of memory allocators, each distributing pointers to blocks of a different size (see above). Then I overload the global operator new. Inside the overloaded operator, it checks the size of the object being allocated, finds the appropriate allocator (the one that hands out blocks of slightly larger size), and returns a pointer from it.

The idea then is that EVERY new/delete in the program (including the STL's, I believe) is handled by the memory manager via operator new. I believe this would greatly improve performance at runtime, and while there may be some extra memory consumption, that should be acceptable since memory is fairly cheap these days.

Any input? If this doesn't work the way I hope, please suggest alternatives. It just seems a hassle to plug your own allocators into every performance-critical spot and handle STL allocations that way, when an overloaded operator new backed by a memory manager module should take care of it everywhere, and more efficiently.

Thank you for your help.
Jeremy

The STL containers take an allocator as a template parameter, so you are better off just giving allocators to your containers. It will give you better control over the memory management.
Put memory pool allocators on containers you allocate from a lot.
For containers you allocate once at run time (or once per level), make a simpler allocator that only has to carve one block into pieces; when the level is done, you deallocate the entire block instead of the individual pieces. You can also make sure a block of memory is aligned or padded for specific needs (cache lines, SSE instructions, etc.).
Here are a couple gamedev links to get you started: Here and Here

Other things to note:
It is nice to know the file/line of an allocation.
It is nice to know the TYPE of an allocation (object type).
It is nice to know the ASSET TYPE of an allocation (debug, collision, audio, graphics, etc.).

Having different allocators for each class and each type, while it may be a pain to plug in the first time, can be really helpful in diagnosing memory problems, as well as showing you where your memory budget went. On a console, memory budgets get really tight, so asset tracking is very helpful.

Quote:
Original post by loufoque
Quote:
I know [...] that operators new/delete are slow.

How do you know it? Did you benchmark it?


I think what he meant to say is that dynamic memory allocation/deallocation is slow; if he didn't, then he is probably misinterpreting the problem.

The way I look at it: new/delete (and the array forms) allocate dynamic (heap) memory, and yes, that is slow; that's a given, no need for profiling. If you want to minimise heap allocation, why not create some sort of object manager where a maximum amount of object storage is set aside up front, and you use these pre-allocated objects when required and available? Of course, you would need to account for the case where no free objects are available, perhaps falling back to a 'new' call. This strategy would also appear to fit well with the pimpl idiom (pointer to implementation), allowing you to change the implementation of a particular object instance more efficiently; and if your pimpls had some redefinition functionality, calls to new would be very infrequent.

I have yet to really get my teeth into this area, so there could be huge holes in what I have outlined here.

Quote:
Original post by grill8
The idea then is that EVERY new/delete in the program (including the STL's, I believe) is handled through the memory manager system via operator new. This would greatly improve performance at runtime I believe, and while there may be some extra memory consumption it should be acceptable as it is my understanding that memory is fairly cheap these days.


IMO, it would be better to just use resize/reserve on the STL containers where appropriate. You will gain a tremendous performance boost if you use those functions wisely. Choosing the right STL container can also be a huge performance win:
I wrote a memory leak detector and used a vector or a map to keep track of the allocations. With a simple program that allocated a few thousand arrays of constant size, the vector implementation took about 50 seconds, whereas the map took about 4 seconds.

I really doubt that you can make the STL work faster by writing your own memory manager (as far as Microsoft's STL implementation is concerned, at least; I haven't used other implementations very often). Their implementation is pretty clever, as it preallocates more memory than you need in the first place. It is still more efficient to preallocate the things you already know you are going to need (e.g. you know there are 50 meshes in your level, so reserve that much memory in your containers).
You also have to ask yourself whether the rest of your allocations have a great impact on your code, and whether you can prevent that somehow.

Quote:
Original post by grill8
I have questions about memory management and operator new in game engine design.

Generally speaking, implementation details should not affect design.
Quote:

Ok, so I know real time performance is critical and that operators new/delete are slow. I have a memory allocator class that I wrote that uses free lists and paging to simulate allocation of objects and avoids allocations except when new pages are required.

A program is slow where it takes too long to perform an action. You determine this through measurement. Identify the slow parts through measurement, then make an effort to speed those parts up.

Do not assume you need to put a lot of effort into speeding something up until you have identified it as a slow bit.
Quote:

Now, I know that C++ stl containers by default use operators new/delete for their allocators. My engine uses stl in quite a lot of places within the engine, including resource instancing, sorting objects, and most data that requires containers are using stl containers so that means a lot of new/delete calls.

The containers found in the C++ standard library use their allocator member to allocate their memory. By default, the standard containers use std::allocator as their allocator, and std::allocator is specified as "allocating as if using ::operator new()". What that means in practice is that at some point std::allocator uses ::operator new() to allocate some memory, but most implementations use a slab allocation strategy with pooling.

Often, your standard library implementation will provide alternative allocator implementations, and there are third-party add-on allocators (such as those from Boost) that do fancy things.

Fact is, until you have identified where the performance bottlenecks are in your program (through appropriate testing and measurement), you should stick with the default and focus on functionality. Make it work, then make it work fast.

Quote:
Fact is, until you have identified where the performance bottlenecks are in your program (through appropriate testing and measurement), you should stick with the default and focus on functionality. Make it work, then make it work fast.


I disagree and I think this default answer is given out far too often here.

Consider a situation where I know my program will allocate millions of small objects during its lifetime. It is plainly obvious that allocating the memory at program start and placing the objects into it will be faster than dynamically allocating the objects when needed. I don't need to test performance; I already know it will be faster, and even if it isn't the major bottleneck of the program, optimising early will still give performance gains.

Make it work fast, then make it work faster...

Quote:
Original post by trippytarka
I disagree and I think this default answer is given out far too often here.
...


I think what Begma means is that you should not introduce some fancy memory allocation factory unless you find out that the way the STL manages memory is a bottleneck. The standard functionality of the STL already grants the ability to reserve memory before it is needed, as I described and as you pointed out with that example.

In simple terms: use the STL properly and a custom factory won't be needed in 99% of cases.

The first thing most console games do is replace the CRT allocator by overriding malloc and new. As far as I know (I don't use the STL in the games I work on), the STL uses operator new for some of its containers even when you provide a custom allocator. You can read up on EASTL if you want to know the reasons why.

I'm going to go against most people's advice in this thread and say that thinking about memory management, writing custom allocators, and overriding new/malloc from the beginning is going to be a huge benefit to your project's performance. In my experience, whenever a system was designed without thinking about its allocation patterns, the performance implications hit towards alpha/beta. That said, the suggestion to reserve memory for your containers is great advice! Pools of memory will always help.

As far as writing your own allocator, I would recommend checking out Doug Lea's memory allocator (dlmalloc): http://g.oswego.edu/dl/html/malloc.html . It is widely agreed in the industry to be the best general-purpose allocator out there and is used in numerous titles.

Good luck!
-= Dave

Thank you all for your help.

I did some research into Boost and found some allocators there that should definitely help and meet my requirements. I would like to ask a few more questions before I move on to the next topic, however.

Question 1)
It was mentioned that some implementations of the STL (I use VC 2008) may not actually allocate/deallocate objects individually via new/delete. The C++ standard, I believe, says that new/delete are used by default. So, before I start using Boost allocators: if VC 2008 already uses pooling (and I like the resize option as well), is it really worth using Boost at all, or should one just stick with the standard containers? Yes, there will be many allocations throughout the program's lifetime.

Question 2)
In designing a generalized object manager for my engine, I have come across a slightly puzzling scenario: naturally, most game objects, while using a uniform ADT interface, differ in size and in the amount of data they use. That means pool allocators do not work well for those allocations. I use instancing (and objects may be cached as well), so usually only the first base object is ever actually allocated; but since pooling is out, it seems the base objects that other objects are instanced from require allocation without pooling or similar schemes. If anyone has suggestions, I would love to hear them.

Thank you for your help.
Jeremy

Quote:
Original post by trippytarka
Quote:
Fact is, until you have identified where the performance bottlenecks are in your program (through appropriate testing and measurement), you should stick with the default and focus on functionality. Make it work, then make it work fast.


I disagree and I think this default answer is given out far too often here.

Consider a situation where I know my program will allocate millions of small objects during its lifetime. It is plainly obvious that allocating the memory at program start and placing the objects into it will be faster than dynamically allocating the objects when needed. I don't need to test performance; I already know it will be faster, and even if it isn't the major bottleneck of the program, optimising early will still give performance gains.

Make it work fast, then make it work faster...


I take your point, but don't go overboard.

Let us assume (for the sake of argument) that you spend a week designing, developing, tuning, and testing your memory allocator. Then, when you finish your game, you realise the bottleneck is the GPU and your changes do not produce any noticeable speedup.

If the custom allocator is something that you can drop in at a later date without affecting the code I would strongly recommend waiting until allocation is the bottleneck before doing anything about it.

Quote:
Original post by David Neubelt
As far as writing your own allocator, I would recommend checking out Doug Lea's memory allocator (dlmalloc): http://g.oswego.edu/dl/html/malloc.html . It is widely agreed in the industry to be the best general-purpose allocator out there and is used in numerous titles.

I'd also recommend looking at Google's allocator. It has the added benefit of being thread-safe, unlike dlmalloc (Lea's).

Google Performance Tools

[Edited by - mfawcett on January 22, 2009 4:53:38 PM]

Quote:
Original post by mfawcett
Quote:
Original post by David Neubelt
As far as writing your own allocator, I would recommend checking out Doug Lea's memory allocator (dlmalloc): http://g.oswego.edu/dl/html/malloc.html . It is widely agreed in the industry to be the best general-purpose allocator out there and is used in numerous titles.

I'd also recommend looking at Google's allocator. It has the added benefit of being thread-safe, unlike dlmalloc (Lea's).

Google Performance Tools


I don't know where you got that from.

Quote from dlmalloc.c
Thread-safety: NOT thread-safe unless USE_LOCKS defined
When USE_LOCKS is defined, each public call to malloc, free,
etc is surrounded with either a pthread mutex or a win32
spinlock (depending on WIN32). This is not especially fast, and
can be a major bottleneck. It is designed only to provide
minimal protection in concurrent environments, and to provide a
basis for extensions. If you are using malloc in a concurrent
program, consider instead using ptmalloc, which is derived from
a version of this malloc. (See http://www.malloc.de).

Though, it is quite thread-safe if you use a memspace per thread. Not only is that thread-safe, it avoids any locks, and I can guarantee a program that has no locks is much faster than one using a thread-safe (i.e. thread-locking) allocator. Placing locks on allocations will make memory allocations even more expensive than they already are.

Though, there are always situations where you need it. I would just try to design my program in such a way that it avoids them.

Take care,
-= Dave

Quote:
Original post by David Neubelt
I don't know where you get that from

Quote from dlmalloc.c
Thread-safety: NOT thread-safe unless USE_LOCKS defined


You're right; I was looking at outdated information in older PPT presentations on the subject.

Actually, replacing global new/delete with a good pooling/slab allocator is one of the best performance optimizations there is, especially combined with heavy STL usage. In practice, we have seen speedups in excess of a factor of 10 in scenarios where STL containers and std::strings were heavily used (under the MSVC 2008 CRT). I consider a good allocator an absolute requirement for any serious application or game. It has nothing to do with premature optimization; it's common sense.

As a side effect, it adds the possibility to include a leak and memory corruption detector with very little overhead.

I wish I had read the chapter on replacing new and delete before my post :).

Yann: Would it not be better to have a more localised implementation than replacing the global new and delete? Is creating a custom allocator for the STL containers not a better design approach? I was under the impression that changing the allocators is the way to control STL container allocation.

I know you mention allocators; it's just the replacing of global new/delete that has thrown me.

Quote:
Original post by Guthur
Yann: Would it not be better to have a more localised implementation than replacing the global new and delete? Is creating a custom allocator for the STL containers not a better design approach? I was under the impression that changing the allocators is the way to control STL container allocation.

In theory, you are right. But this assumes that the CRT implementation of the global allocator is good. Unfortunately, at least the ones in VC8 and VC9 are surprisingly bad, and the one for gcc is not much better.

Of course it is still a good idea to supply a special local allocator, if you know that you will follow a very specific allocation pattern. A highly specialized allocator can always outperform a generic allocator if it operates within its own sweet spot. But the situations where this makes sense are not too commonly encountered.

The vast majority of cases involve general and often unpredictable allocation patterns. The STL likes to allocate and release a lot, and usually very small memory fragments. In fact, the often-discussed 'best practices' for C++ (if such a thing exists) will worsen this behaviour. So it is important that the general allocator can handle it. The one in VC9 cannot; that's why replacing it with a better one makes a big difference.

I am just wondering now if one could use template metaprogramming to create the new/delete operator on the fly, dependent on compile-time analysis of object allocation.

I probably should think it through more, though; plus I only started that book today. It's amazing the stuff that goes on with this language. The Techniques chapter (chapter 2) of Modern C++ Design nearly made my head spin today :p

[Edited by - Guthur on January 22, 2009 6:27:20 PM]

Quote:
Original post by Guthur
I was just wondering if you could use template metaprogramming to create your new/delete operator on the fly, dependent on compile-time analysis of object allocation.

Over-engineering can make the solution worse than the original problem...

Quote:
Original post by grill8
Ok, so I know real time performance is critical


It could be important, depending on what kind of game you're making. A blanket statement that "it's critical" is overstating things.

Quote:
and that operators new/delete are slow.


In a manner of speaking, sure. In another manner of speaking, the only slow thing a computer does is wait for the human to respond. When programs appear to run slowly, it's generally more to do with the number of things they're doing, rather than the nature.

Quote:
Now, I know that C++ stl containers by default use operators new/delete for their allocators. My engine uses stl in quite a lot of places within the engine, including resource instancing, sorting objects, and most data that requires containers are using stl containers so that means a lot of new/delete calls.

I have an idea that I would like some input on.

What I would like to do is to design a memory manager module that simply has a bunch of memory allocators which distribute pointers to memory of different sizes (see above). Next, overload the global operator new. Within the overloaded new operator, it should check the size of the object requiring allocation and then find the appropriate allocator (the one that allocates memory of slightly larger size) and return the pointer.

The idea then is that EVERY new/delete in the program (including the STL's, I believe) is handled through the memory manager system via operator new. This would greatly improve performance at runtime I believe, and while there may be some extra memory consumption it should be acceptable as it is my understanding that memory is fairly cheap these days.


You could do that.

Or you could use an existing pool allocator implementation (Boost provides one) and specify the allocator template parameter for your standard library (please don't say "STL"; that hasn't been a relevant term for many years) containers.

And while you're at it, check whether (a) things are actually unacceptably slow, and (b) any slowdown might be addressed by simply choosing a more appropriate container type.

Quote:
It just seems a hassle to have to plug in your own allocators everywhere


I can't imagine that it would be more of a hassle than the work you propose. (You also lose flexibility by overloading the global operator; what if you want different containers to allocate from different pools?)

Quote:
Original post by Yann L
[...]
Have you tried replacing malloc et al. with the corresponding Windows heap functions? I haven't studied the VS 2008 CRT source, but I know that in VS 2002 the CRT added a lot of processing on top of the Windows heap. I never did any benchmarks, but I wouldn't be surprised to find that using the heap directly was much faster.

Quote:
Original post by Extrarius
Quote:
Original post by Yann L
[...]
Have you tried replacing malloc et al. with the corresponding Windows heap functions? I haven't studied the VS 2008 CRT source, but I know that in VS 2002 the CRT added a lot of processing on top of the Windows heap. I never did any benchmarks, but I wouldn't be surprised to find that using the heap directly was much faster.


The behavior of the MSVC memory allocator is surprisingly complex and baroque, yet simple at the same time. Depending on which version of MSVC you run, there are something like five different heap allocation schemes that MSVC uses: the debug heap, the small block allocator, the MSVC 5 heap allocator, the MSVC 6 heap allocator, and an allocator that just defers everything to HeapAlloc(). Which allocator is used is based on a mix of compiler flags, environment variables, the presence of a debugger, and the raw entropy of the universe. In most cases, the additional "processing on top of the windows heap" basically boils down to: "yup, I'm still using the Windows heap, just forward all the relevant arguments to HeapAlloc()", just with a lot of if statements to make sure that it's not actually using the small block heap or the MSVC 5/6 heap allocators.

The algorithm used by HeapAlloc() is fairly decent for C. However, C++ usage patterns are generally different. In particular, C++ favors more frequent allocations of smaller sized memory blocks, which makes HeapAlloc() much less friendly. For example, on 2000 and XP (and probably Vista, but I haven't investigated it yet), HeapAlloc() creates a 24 byte header and pads up to a multiple of 8 with data that gets used in heap consistency checks. Not a big deal when your memory allocations tend to be over a hundred bytes in size. A big deal when using things like shared_ptr which creates a lot of very small allocations.

Quote:
Original post by Yann L
Actually, replacing global new/delete with a good pooling/slab allocator is one of the best performance optimizations there is, especially combined with heavy STL usage. In practice, we have experienced speedups in excess of factor 10 in scenarios where STL containers and std::strings were heavily used (under the MSVC 2008 CRT).


To back up Yann's statement: when we profiled Doug Lea's malloc, our custom pooling containers were actually slower than dlmalloc's general-purpose allocation for our smaller allocations. Profiling against the default CRT allocator in our environment showed substantial improvements (I can't remember how much, though). There are some profiled comparisons posted on the internet somewhere, but they date from back in 2000. I'd love to see someone redo those tests against the newer CRT.

I don't have any experience with the standard library containers, so I can't comment on them, but I'd like to reiterate my original point that choosing a good back-end general-purpose allocator to replace the standard CRT one is a really good idea.

After that you should engineer your engine and game libraries to use specific custom heaps and then when designing algorithms you should choose the appropriate containers and allocation schemes.

I wouldn't call any of these premature optimizations - allocation requests, in my opinion, are like designing an algorithm and choosing the right data structure.

-= Dave

BTW, Yann, what allocator did your company use, or was it in-house tech?

