Advanced Memory Management

Started by
8 comments, last by wodinoneeye 12 years, 4 months ago
So over the course of the years, one finds it imperative that if he wants to improve performance, he must write his own memory management system. I've figured out how to globally override new and delete to make a quick call to malloc(...) and free(...) while printing where the memory is being allocated, as a test. The next challenge is to provide an alternative to malloc(...) and free(...) to improve performance. From what I've seen, you can't provide your own definitions for malloc and free under Visual Studio without recompiling the CRT, and brk and sbrk aren't available under Windows. Is there any other way to allocate memory?
Follow and support my game engine (still in very basic development)? Link
The Windows API has a ton of functions for memory management.

So over the course of the years, one finds it imperative that if he wants to improve performance, he must write his own memory management system.


Can someone elaborate on this? I understand that for games 10 years ago this could be considered imperative, but I don't see how this would make that much of a difference for modern games. Memory isn't (or shouldn't be) newed and deleted excessively every frame, and it seems like most games these days are GPU bound. I would guess that energy spent on parallelization would have a bigger performance impact.

Is there something I'm missing? Are there any benchmarks showing significant performance increase in a game based on a custom memory management system?
Generally I'd say it's best to manage your memory management (!!!) first, before looking to alternatives. This means moving from lots of small allocations to fewer large allocations (e.g. if you know in advance that you need memory for 25,000 objects, then allocate it all in one go, rather than making 25,000 separate allocations), getting rid of runtime allocations (careless use of STL for runtime dynamic lists can hurt you badly here) and reusing memory blocks from prior allocations if possible.

Once you've done that, if you still need to improve things, then I find that the Windows API VirtualAlloc pretty much rules here. All memory allocations from other API calls (and even from malloc/new) end up going through VirtualAlloc behind the scenes anyway, so you can short-circuit that and take control directly, using some problem-domain-specific knowledge to help you out.

There are no hard and fast rules, but as a general guideline what you would do is VirtualAlloc a pretty large block with MEM_RESERVE at startup, and keep a low-mark counter, initially at 0 (also keep a counter of the amount currently allocated). When you need memory you can then VirtualAlloc with MEM_COMMIT from your low mark to low mark + size (round this up to a page boundary or more for better results). Free is just a rewind of the low mark, and if the low mark + size is less than the total you've committed so far you don't need to do anything, just return a pointer. That's a fairly rough description and I may have left something out. You can easily wrap this in a nice memory manager class, by the way.

That's good for big allocations at startup or load time, but it's a little inflexible sometimes, so you can use HeapAlloc instead. This uses VirtualAlloc behind the scenes of course, and will give you much of the flexibility you currently get from malloc but with some nifty extra options (like being able to specify an initial commit size).

One really nice thing about these is that they hugely simplify the task of freeing memory. Instead of needing to fret over each individual allocation, you can just rewind your low mark (if using VirtualAlloc) or destroy your heap (if using HeapAlloc). This comes in particularly handy if you want to quickly and safely release all memory used during a game level, and you'll be more or less immune to leaks.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.


[quote name='Sepiantum' timestamp='1322526723' post='4888632']
So over the course of the years, one finds it imperative that if he wants to improve performance, he must write his own memory management system.


Can someone elaborate on this? I understand that for games 10 years ago this could be considered imperative, but I don't see how this would make that much of a difference for modern games. Memory isn't (or shouldn't be) newed and deleted excessively every frame, and it seems like most games these days are GPU bound. I would guess that energy spent on parallelization would have a bigger performance impact.

Is there something I'm missing? Are there any benchmarks showing significant performance increase in a game based on a custom memory management system?
[/quote]

Depends on many things. For instance, fragmentation created by uncontrolled heap calls may turn out to be a problem on some platforms, and other problems may benefit from memory pools due to their allocation strategy. A custom allocator may therefore be a solution to those problems. I say "may" because it is problem-specific (e.g. a dedicated linked-list allocator that keeps data contiguous) and platform-specific (e.g. beating the general-purpose heap allocator on an OS like Windows with your own allocator may turn out to be a harder problem than you think). General-purpose allocators do their job quite well, which is why it may not be worthwhile to replace them.

And definitely the almighty hint: do not optimize until you profile.

Cheers.


mhagain is exactly right.

Writing a memory manager is the wrong solution; it's like finding out you have a leak under your sink, and dropping a nuclear bomb on your house so you can rebuild the entire surrounding city with new pipes. Just replace the leaky bits and be done.


Put another way: you're talking about memory allocation being slow. If you do something 200 million times and make it 20% faster, you have a gain of... 20%. If you do that exact same action twice instead, you've made it 100,000,000 times faster. Worse, gaining that 20% is an exercise in extreme masochism. Don't try to whittle away at the symptoms; cure the disease.

The disease, in this case, is doing lots of small dynamic allocations. Look into pools, arenas, and disposable heaps - all of these are already implemented for you, can be found in free libraries or even your OS APIs themselves, and are basically drop-in compatible with everything you do with a small amount of effort.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Don't fix problems before they exist.

Write your game, then if you are having issues, look for ways to fix those issues. Is it CPU cycles? Memory? Disk access? The reality is that preemptive optimization almost never fixes a real problem, only takes time from working on the real project.
Keep in mind there are valid non-performance reasons to use a custom memory manager. Enhanced leak detection facilities or other memory debugging helpers and per-subsystem memory quotas are common ones.
The whole idea that "real games" are optimised by using a better "memory manager" is a huge red herring.

For starters, when we say we've rewritten the memory manager, we're talking about a general-purpose heap allocator -- Now, that's a very specific type of allocator, which is one of many options. Why should we use this particular allocator for 100% of our use cases?

If we go back to basics, C/C++ gives us "the heap" and "the stack". We allocate from the heap via new/malloc, and we allocate from the stack by creating local variables.
In many ways, the stack is superior to the heap, as it attaches very well defined lifetimes to each allocation (a local variable is destroyed when execution of its parent code block goes out of scope), whereas there is no implicit lifetime attached to heap allocations (they're destroyed whenever we choose to destroy them).
Because of this, we often go to great lengths to attach heap allocations to stack lifetimes, e.g. via RAII objects like auto_ptrs.

Going back to the discussion, for some reason most of the discussion on rewriting your memory manager is focused on reinventing the heap... but why not also reinvent the stack? There are many benefits to doing so; the most obvious is that your own stack can have different lifetime guarantees, other than the current execution scope.

You can have many stacks (alongside the C stack) as well as many heaps (alongside the C heap), and alongside all of those, you'll also have various rings and pools!
Advanced memory management isn't writing a better heap, it's making appropriate use of different allocation structures, instead of shoving everything into a single heap. Ironically, doing so actually makes memory management simpler overall, not more advanced (e.g. if everything is managed in stacks, then you'll automatically have no leaks without even having to use a single delete or shared_ptr!)

So over the course of the years, one finds it imperative that if he wants to improve performance, he must write his own memory management system. I've figured out how to globally override new and delete to make a quick call to malloc(...) and free(...) while printing where the memory is being allocated, as a test. The next challenge is to provide an alternative to malloc(...) and free(...) to improve performance. From what I've seen, you can't provide your own definitions for malloc and free under Visual Studio without recompiling the CRT, and brk and sbrk aren't available under Windows. Is there any other way to allocate memory?



If you have a lot of objects with fixed-size data structs (of the same memory size) that are constantly constructed and destructed, then free pools can greatly increase your performance.

Allocation/deallocation is as simple as modifying a next pointer on the object and the head pointer of the pool (plus any needed data prepopulation on construction).

Each memory size of pooled objects can have its own free pool (I've even forced minor struct variants to be the same memory size so that a single free pool would work more flexibly).

A free pool can grow if needed by allocating additional blocks of nodes instead of using static arrays.

Dynamically pool-allocated sub-objects belonging to normally allocated, irregularly sized objects are also possible, provided the lifecycle and sizing limitations are met.

Many games have many similarly statically sized objects, like projectiles, inventory items, state/event messages, registration markers, packet buffers, etc., whose usage patterns mean that shortcutting the allocate/deallocate process can bring significant processing improvements.



As someone else also mentioned, ring buffers for certain processing can also increase efficiency (this is basically the same idea as free pools, taken a step further).
Ratings are Opinion, not Fact

This topic is closed to new replies.
