Is memory management a must have?

Started by
14 comments, last by Andrew Kesterson 11 years, 6 months ago
In the book Game Programming Gems 8 it talks about overriding the new operator and creating some sort of HeapManager system to avoid memory fragmentation in RAM at run time. Is this a problem only for older systems, or do you still have to do it on Windows 7 with new hardware? It also says that if you have an array of structs with, say, 5 ints in each, and you only use 1 int from each struct, the CPU still wastes cycles on the extra unused data. I can't find much info about this anywhere else.
Don't solve a problem you don't have yet.

Also, it depends quite a bit on your memory management strategy.
If you allocate all memory when entering a game level and free it all when exiting, you won't get memory fragmentation.

5 ints causing the CPU to waste cycles? Modern CPUs fetch at least 32 bytes per cache line, so how would the waste happen?

https://www.kbasm.com -- My personal website

https://github.com/wqking/eventpp  eventpp -- C++ library for event dispatcher and callback list

https://github.com/cpgf  cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL, Box2D, SFML and Irrlicht.

I think what he meant was having an array of structs where each one has five integers. If the fields are not all operated on by the same functions, then you will be fetching whole five-int structs, whereas if you kept a struct of related arrays instead, you would prefetch only the relevant data, resulting in far fewer cache misses because the cache stays warm.

For example:
[source lang="cpp"]struct Unit
{
    vec3 transform;
    int32 other_data;
    int32 more_data;
};
std::vector< Unit > units;
[/source]
would perform much better as:
[source lang="cpp"]struct Units
{
    std::vector< vec3 > transforms;
    std::vector< int32 > other_datas;
    std::vector< int32 > more_datas;
};[/source]

This matters when doing something like updating transformation matrices: instead of fetching other_data and more_data along with each transform, as in the first example, you fetch only the transforms you need, resulting in fewer cache misses.
In C++, local variables are allocated on the stack, which is already reserved, so the OS does not have to search for memory to provide; but when you use "new", it does.
If you have a function with a local allocation like char* pMany = new char[100000000]; followed by delete[] pMany before the end of the block, you would be better off with a preallocated pool, used like this:

[source lang="cpp"]{
    char* pMany = (char*)m_Pool->Request(100000000); // all great
    CClassLike* pObject = (CClassLike*)m_Pool->Request(sizeof(CClassLike)); // beware: THIS will not call the constructor or destructor
    // use it, and when finished with the local data, reset the pool (or reset it before the function runs)
    m_Pool->Reset();
}[/source]

This would be the m_Pool type, CPool:

[source lang="cpp"]class CPool
{
public:
    CPool()
    {
        m_bAllocated = false;
        m_pPoolMemory = NULL;
    }
    bool CreatePool(long bytescount)
    {
        // use only once, or handle freeing first
        m_bAllocated = true;
        m_pPoolMemory = new char[bytescount];
        m_iPoolSize = bytescount;
        m_iCurrentProvidedBytes = 0;
        return true;
    }
    char* Request(long size)
    {
        if (m_iCurrentProvidedBytes + size > m_iPoolSize)
            return NULL;
        char* result = m_pPoolMemory + m_iCurrentProvidedBytes;
        m_iCurrentProvidedBytes += size;
        return result;
    }
    void Reset()
    {
        m_iCurrentProvidedBytes = 0;
    }
    void FreePool()
    {
        if (m_bAllocated)
        {
            delete[] m_pPoolMemory;
            m_bAllocated = false;
        }
    }
private:
    long m_iPoolSize;
    long m_iCurrentProvidedBytes;
    char* m_pPoolMemory;
    bool m_bAllocated;
};[/source]

As you can see, CPool::Request(size) is a very fast operation. I would not override the "new" operator myself, but that is just my personal preference.
That isn't what the gem is referring to.


Allocations with new will go out to the operating system, allocate memory, and return.

When the system allocates memory it OFTEN takes a long time. It doesn't always, but often it does. By "long time" I'm talking on the order of microseconds.

The 'gem' is that you create your own memory management pool. Instead of letting the OS handle all the work, YOU need to handle all the work. You need to address problems of memory fragmentation, allocators and deallocators, thread safety, and much more. Done correctly you can have allocations take place on the order of a few hundred nanoseconds.


The gem attempts to save around one microsecond per memory allocation.

That one microsecond is valuable when games are pushing the limit. The trade-off is that YOU end up needing to solve all those problems that have already been solved by the OS. It does a considerable amount of work for you so that you don't need to, so you had better have a serious need for those microseconds.

If you don't have a critical need for those few microseconds, you are probably better off avoiding custom allocators.

If you really feel like you need them, use a pre-written pool library like boost::pool that is already debugged and working properly.
To add on to frob's statement, you don't even need a generic "pool" to take advantage of this. Depending on what you are doing, you can just reuse the same memory blocks for the same purpose (but with new data), and get massive speedups. This has the added benefit that you don't need to worry about calculating sizes or fragmentation issues.

Real life example from my current project:
Since my game is a 2D tile-based game and every "chunk" of the world is the same size (20 by 20 tiles), when the player moves around and I need to unload chunks and load new ones, I reuse the memory allocated for the previous chunks to store the new chunks, instead of calling delete on the old area and new on the new area. Instead of calling new for 20 * 20 (400) tile structs (per layer per chunk), the tile memory is just reused for new tiles. My speedups were very noticeable, though I don't remember the exact amount gained.

In the same way, I reuse "tile" memory, "layer" memory, and "chunk" memory, and only allocate if I need more layers, or delete if I have unused layers.
Note: the 'tile' struct was very small and was already taking advantage of the Flyweight pattern, mostly holding a pointer to the shared data and a few extra ints for animation details (current frame, timing, etc...).

The rest of my game doesn't do this, since it's not a bottleneck and would be premature optimization (and introduces undesired complexity), but when I need the extra speed, there's almost always a way to get the extra speed. But don't be distracted by what you don't yet need.

If you really feel like you need them, use a pre-written pool library like boost::pool that is already debugged and working properly.


Just a note to say that boost::pool has been found to be considerably slower than the default allocator on some systems, just highlighting how non-trivial this is. I believe many people on the boost mailing list want the library (in its current form) deprecated in some way.
Is such memory management a must have? How much effort you expend managing memory heavily depends on the kind of game you are making.

For small games, it is often enough to ensure that one is not doing silly things - such as loading large resources multiple times (e.g. failing to cache textures/sounds etc), or dynamically allocating scores of simple objects such as particles or bullets.

As your game scales up merely not doing silly things will eventually cease to suffice. Of course, it is an open question as to what limit you are going to start brushing up against first. Depending on the application, you might find that the memory limitations you might come up against are actually on the GPU. Alternatively, the limit might be unrelated to managing memory, but instead due to memory access patterns as others have mentioned. Or it might be neither - there are lots of other things which could cause the game to under-perform, such as an algorithmic bottleneck in some unexpected subsystem. Middle range games might only need a bit of attention applied to one or two key subsystems to make the game perform as expected.

At the upper end of this scale are the kind of AAA games that require major engineering effort to even work on the computers they will be released on. You will be handed a time/memory budget for your subsystem and you'd better not exceed it! It is highly likely that all subsystems in the underlying engine will have some kind of custom memory management - even if only to enable the developers to see if they are meeting their budget.
It's not a must have.

But having a working tool to locate memory leaks is.

You can have memory leak detection going on with a custom heap allocator/deallocator, but generally the tools out there that work with new & delete are far better in terms of usability for tracking down memory leaks and fixing them.
I say Code! You say Build! Code! Build! Code! Build! Can I get a woop-woop? Woop! Woop!

In the book Game Programming Gems 8 it talks about overriding the new operator and creating some sort of HeapManager system to avoid memory fragmentation in RAM at run time. Is this a problem only for older systems, or do you still have to do it on Windows 7 with new hardware?


This only applies if you're allocating and releasing memory at runtime. And if you are - well, you shouldn't be. The correct pattern is to allocate everything you need up-front at start or load time, then just use it at runtime. If you need a pool of scratch memory for short-lived runtime allocations, then create that pool at startup or load time too, not at runtime, and pull from it. Otherwise you shouldn't be overloading new or delete in the general case.

It also says that if you have an array of structs with, say, 5 ints in each, and you only use 1 int from each struct, the CPU still wastes cycles on the extra unused data. I can't find much info about this anywhere else.


That sounds like the worst kind of micro-optimization. If you're worrying about things down to individual bytes and cycles, then you're worrying about the wrong things; there are rarely meaningful performance gains to be had from that kind of optimization. At the same time, if you have a struct with 5 ints but you only use one of them, the big question is: why on earth do you have 5 ints in the struct? If there's no reason for the other 4 to be there, get rid of them - but that's from a good code-cleanliness perspective rather than anything else.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

This topic is closed to new replies.
