
Is memory management a must have?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

15 replies to this topic

#1 mrheisenberg   Members   -  Reputation: 356


Posted 29 September 2012 - 06:53 PM

In the book Game Programming Gems 8 it talks about overriding the new operator and creating some sort of HeapManager system to avoid memory fragmentation in RAM at run time. Is this some kind of problem for older systems, or do you still have to do it on Windows 7 / with new hardware? It also says that if you have an array of structs with, let's say, 5 ints in each and you only use 1 int from each struct, the CPU still wastes cycles on the extra unused data. Can't find much info about this anywhere else.

Edited by mrheisenberg, 29 September 2012 - 06:54 PM.



#2 wqking   Members   -  Reputation: 756


Posted 29 September 2012 - 07:17 PM

Don't solve a problem you don't have yet.

Also, it depends quite a lot on your memory management strategy.
If you allocate all memory when entering a game level and free it all when exiting, you won't get memory fragmentation.

5 ints causing the CPU to waste cycles? CPUs fetch memory in cache lines of at least 32 bytes at a time, so where would the waste come from?

Edited by wqking, 29 September 2012 - 07:18 PM.

http://www.cpgf.org/
cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL Box2D, SFML and Irrlicht.
v1.5.5 was released. Now supports tween and timeline for ease animation.


#3 joew   Crossbones+   -  Reputation: 3162


Posted 29 September 2012 - 07:52 PM

I think what he meant was having an array of structs where each one has five integers. If the data is not all operated on in the same functions, you will be fetching whole five-int structs, whereas if you kept a struct of related arrays instead, you would prefetch only the relevant data, resulting in far fewer cache misses because you have kept the cache warm.

For example:
[source lang="cpp"]struct Unit
{
    vec3 transform;
    int32 other_data;
    int32 more_data;
};

std::vector< Unit > units;[/source]
would perform much better as:
[source lang="cpp"]struct Units
{
    std::vector< vec3 > transforms;
    std::vector< int32 > other_datas;
    std::vector< int32 > more_datas;
};[/source]

when doing something like updating transformation matrices, because instead of fetching other_data and more_data along with each transform, as in the first example, you fetch only the transforms you need, resulting in fewer cache misses.

Edited by Saruman, 29 September 2012 - 08:01 PM.


#4 JohnnyCode   Members   -  Reputation: 90


Posted 01 October 2012 - 01:46 PM

In C++, local variables are allocated from a preallocated pool (the stack), so the OS does not have to search for memory to provide. BUT when you use "new", it does search for memory.
If you have a function with a local variable like char* pMany = new char[100000000]; and you call delete[] pMany before the end of the block, then you should rather have a preallocated pool, used like this:

{
    char* pMany = (char*)m_Pool->Request(100000000); // all great
    CClassLike* pObject = (CClassLike*)m_Pool->Request(sizeof(CClassLike)); // beware, THIS will not call the constructor or destructor
    // use it, and when finished with the local data reset the pool, or reset it before the function runs
    m_Pool->Reset();
}

This would be the m_Pool type, CPool:

class CPool
{
public:
    CPool()
    {
        m_bAllocated = false;
    }
    bool CreatePool(long bytescount)
    {
        // use only once, or handle freeing the previous pool first
        m_bAllocated = true;
        m_pPoolMemory = new char[bytescount];
        m_iPoolSize = bytescount;
        m_iCurrentProvidedBytes = 0;
        return true;
    }
    char* Request(long size)
    {
        if (m_iCurrentProvidedBytes + size > m_iPoolSize)
            return NULL;
        char* result = m_pPoolMemory + m_iCurrentProvidedBytes;
        m_iCurrentProvidedBytes += size;
        return result;
    }
    void Reset()
    {
        m_iCurrentProvidedBytes = 0;
    }
    void FreePool()
    {
        if (m_bAllocated)
            delete[] m_pPoolMemory;
    }
private:
    long m_iPoolSize;
    long m_iCurrentProvidedBytes;
    char* m_pPoolMemory;
    bool m_bAllocated;
};

As you can see, CPool::Request(size) is just a few fast instructions. And I would not override the "new" operator, but that is just my personal preference.

#5 frob   Moderators   -  Reputation: 16154


Posted 01 October 2012 - 04:18 PM

That isn't what the gem is referring to.


Allocations with new will go out to the operating system, allocate memory, and return.

When the system allocates memory it OFTEN takes a long time. It doesn't always, but often it does. By "long time" I'm talking on the order of microseconds.

The 'gem' is that you create your own memory management pool. Instead of letting the OS handle all the work, YOU need to handle all the work. You need to address problems of memory fragmentation, allocators and deallocators, thread safety, and much more. Done correctly you can have allocations take place on the order of a few hundred nanoseconds.


The gem attempts to save around one microsecond per memory allocation.

That one microsecond is valuable when games are pushing the limit. The trade off is that YOU end up needing to solve all those problems that have already been solved by the OS. It does a considerable amount of work for you so that you don't need to, so you better have a serious need for those microseconds.

If you don't have a critical need for those few microseconds, you are probably better off avoiding custom allocators.

If you really feel like you need them, use a pre-written pool library like boost::pool that is already debugged and working properly.
Check out my personal indie blog at bryanwagstaff.com.

#6 Servant of the Lord   Crossbones+   -  Reputation: 14856


Posted 02 October 2012 - 11:56 AM

To add on to frob's statement, you don't even need a generic "pool" to take advantage of this. Depending on what you are doing, you can just reuse the same memory blocks for the same purpose (but with new data), and get massive speedups. This has the added benefit that you don't need to worry about calculating sizes or fragmentation issues.

Real life example from my current project:
Since my game is a 2D tile based game, and every "chunk" of the world is the same size (20 by 20 tiles), when I need to unload chunks and load new chunks when the player moves around, I use the memory allocated for the previous chunks to store the new chunks, instead of calling delete() on the old area and calling new() on the new area. Instead of calling new() for 20 * 20 (400) tile structs (per layer per chunk), the tile memory is just reused for new tiles. My speed ups were very noticeable, though I don't remember the exact amount gained.

In the same way, I reuse "tile" memory, "layer" memory, and "chunk" memory, and only allocate if I need more layers, or delete if I have unused layers.
Note: the 'tile' struct was very small and was already taking advantage of the Flyweight pattern, mostly holding a pointer to the shared data and a few extra ints for animation details (current frame, timing, etc...).

The rest of my game doesn't do this, since it's not a bottleneck and would be premature optimization (and introduces undesired complexity), but when I need the extra speed, there's almost always a way to get the extra speed. But don't be distracted by what you don't yet need.

It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.



#7 edd   Members   -  Reputation: 2097


Posted 02 October 2012 - 02:32 PM

If you really feel like you need them, use a pre-written pool library like boost::pool that is already debugged and working properly.


Just a note to say that boost::pool has been found to be considerably slower than the default allocator on some systems, just highlighting how non-trivial this is. I believe many people on the boost mailing list want the library (in its current form) deprecated in some way.

#8 rip-off   Moderators   -  Reputation: 6873


Posted 02 October 2012 - 04:12 PM

Is such memory management a must have? How much effort you expend managing memory heavily depends on the kind of game you are making.

For small games, it is often enough to ensure that one is not doing silly things - such as loading large resources multiple times (instead, cache textures/sounds etc.), or dynamically allocating scores of simple objects such as particles or bullets.

As your game scales up, merely not doing silly things will eventually cease to suffice. Of course, it is an open question as to what limit you are going to start brushing up against first. Depending on the application, you might find that the memory limitations you come up against are actually on the GPU. Alternatively, the limit might be unrelated to managing memory, but instead due to memory access patterns as others have mentioned. Or it might be neither - there are lots of other things which could cause the game to under-perform, such as an algorithmic bottleneck in some unexpected subsystem. Middle range games might only need a bit of attention applied to one or two key subsystems to make the game perform as expected.

At the upper end of this scale are the kind of AAA games that require major engineering effort to even work on the computers they will be released on. You will be handed a time/memory budget for your subsystem and you'd better not exceed it! It is highly likely that all subsystems in the underlying engine will have some kind of custom memory management - even if only to enable the developers to see if they are meeting their budget.

#9 Kyall   Members   -  Reputation: 287


Posted 08 October 2012 - 04:42 AM

It's not a must have.

But having a working tool to locate memory leaks is.

You can have memory leak detection going on with a custom heap allocator/deallocator, but generally the tools out there that work with new & delete are far more usable for tracking down memory leaks and fixing them.
I say Code! You say Build! Code! Build! Code! Build! Can I get a woop-woop? Woop! Woop!

#10 mhagain   Crossbones+   -  Reputation: 6315


Posted 08 October 2012 - 05:21 AM

In the book Game Programming Gems 8 it talks about overriding the new operator and creating some sort of HeapManager system to avoid memory fragmentation in RAM at run time. Is this some kind of problem for older systems, or do you still have to do it on Windows 7 / with new hardware?


This only applies if you're allocating and releasing memory at runtime. And if you are - well, you shouldn't be. The correct pattern is to allocate everything you need up-front at start or load time, then just use it at runtime. If you need a temp pool of scratch memory for short-lived runtime allocations, then create one and pull from that (but create the pool at startup or load time too, not at runtime); otherwise you shouldn't be overloading new or delete in the general case.

It also says that if you have an array of structs with, let's say, 5 ints in each and you only use 1 int from each struct, the CPU still wastes cycles on the extra unused data. Can't find much info about this anywhere else.


That sounds like the worst kind of micro-optimization. If you're worrying about things down to individual bytes and cycles, then you're worrying about the wrong things. There's rarely meaningful performance gains to be had from that kind of optimization. At the same time, if you have a struct with 5 ints but you only use one of them, the big question is - why on earth do you have 5 ints in the struct? If there's no reason for the other 4 to be there - get rid of them. But that's from a good code-cleanliness perspective rather than anything else.

It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#11 Olof Hedman   Crossbones+   -  Reputation: 2215


Posted 08 October 2012 - 05:36 AM

That sounds like the worst kind of micro-optimization. If you're worrying about things down to individual bytes and cycles, then you're worrying about the wrong things. There's rarely meaningful performance gains to be had from that kind of optimization. At the same time, if you have a struct with 5 ints but you only use one of them, the big question is - why on earth do you have 5 ints in the struct? If there's no reason for the other 4 to be there - get rid of them. But that's from a good code-cleanliness perspective rather than anything else.


I think they are more referring to a design issue, where those 5 ints are in some way conceptually coupled (they are parameters to a "vehicle" or such), but in the particular function you want to optimise you only use one of them (for example the "speed"). In that case it might be bad for memory throughput and the cache to work on this sparse array.
Still a micro-optimisation though, and nothing you should worry about until you find you need it through performance measurements.

It's good to know about these "tricks" or "gems", but one should not confuse them with code guidelines, and should not worry about them in daily work. (unless you are a performance optimisation specialist)

Edited by Olof Hedman, 08 October 2012 - 05:50 AM.


#12 joew   Crossbones+   -  Reputation: 3162


Posted 08 October 2012 - 05:44 PM

That sounds like the worst kind of micro-optimization. If you're worrying about things down to individual bytes and cycles, then you're worrying about the wrong things. There's rarely meaningful performance gains to be had from that kind of optimization.

I wouldn't call it a micro-optimization at all but rather a design issue and one that is becoming more important every year. There are two issues at stake, the first being the fact that fetching from RAM is slow and will continue to get slower. Therefore one of the most important issues is how you fetch and cache the data to operate on as there is really no reason not to do this... it usually makes the code much easier to read. On current generation platforms (and likely future) it is extremely important as you can't afford to DMA a bunch of data that you don't need to work on, etc. Therefore I wouldn't call this the "worst kind of optimization" but rather "the best kind of design".

Also note that when data is separated out and designed like this, it usually goes hand in hand with being able to parallelize operations much more easily. You aren't passing a large object with the kitchen sink inside (where realistically anything could be called/changed)... you are able to pass large contiguous blocks of memory that hold specific pieces of data to be worked on.

Edited by Saruman, 08 October 2012 - 05:52 PM.


#13 JohnnyCode   Members   -  Reputation: 90


Posted 12 October 2012 - 08:04 AM

That sounds like the worst kind of micro-optimization. If you're worrying about things down to individual bytes and cycles, then you're worrying about the wrong things. There's rarely meaningful performance gains to be had from that kind of optimization. At the same time, if you have a struct with 5 ints but you only use one of them, the big question is - why on earth do you have 5 ints in the struct? If there's no reason for the other 4 to be there - get rid of them. But that's from a good code-cleanliness perspective rather than anything else.


In my game I have on-the-fly mesh loading from an HDD stream. A mesh of 100,000 vertices needs about 8 arrays of megabyte size to dispatch to GPU RAM. If I had been allocating those temporary byte arrays, sending them to the GPU, and freeing them, my game would have been dropping frames badly. Instead I started to use preallocated memory for those temporary large arrays, and now I can load several 100,000-vertex models into the scene without a noticeable frame drop (without textures of course; those are preloaded for the whole world).

This was just an example; we are not talking about preallocating 5 ints. Dismissing preallocated memory as a useless optimization really makes no sense.

#14 akesterson   Members   -  Reputation: 138


Posted 14 October 2012 - 07:52 AM

As the other posters have pointed out, the gem refers to having a pool of memory that you allocate up front, as opposed to allocating on demand. This is the way that the Java JVM works; at startup time, it requests (from the operating system) the maximum amount of memory the program is configured to use (per environment flags), and then does its own allocations out of that memory later. This way it doesn't have to wait on the OS scheduler, kernel, whatever, to do the job for it, and it can optimize its memory arrangement however is optimal for that specific program. The previously mentioned boost::pool does the same thing. There are C libraries that do the same, etc, ad infinitum.

See the wikipedia article on Memory Pools for more generalized information: http://en.wikipedia....iki/Memory_pool

Edited by akesterson, 14 October 2012 - 07:55 AM.


#15 larspensjo   Members   -  Reputation: 1526


Posted 15 October 2012 - 04:15 AM

As the other posters have pointed out, the gem refers to having a pool of memory that you allocate up front, as opposed to allocating on demand. This is the way that the Java JVM works;


It is a mechanism I am hesitant about. It reminds me of eating lunch at a place like McDonalds: if it looks like the tables are not going to suffice, people start claiming tables before ordering their food. It is also like Microsoft Windows today: many applications take a long time to start, so they add some pre-startup functionality to the system boot or login.

Of course, there may be a benefit of speed. But it can also result in everyone losing. Please excuse me for associations in tangent space.
Current project: Ephenation.
Sharing OpenGL experiences: http://ephenationopengl.blogspot.com/

#16 akesterson   Members   -  Reputation: 138


Posted 15 October 2012 - 10:01 PM

It is a mechanism I am hesitant to. ... Of course, there may be a benefit of speed. But it can also result in everyone losing. Please excuse me for associations in tangent space.


It's not suitable for every situation, certainly, but there are times when you know you are better off allocating everything up front, rather than piecemeal. YMMV.



