C++ memory questions

Started by
10 comments, last by Satharis 10 years, 3 months ago

I use C++ as a hobby for game development and .NET professionally, but I don't understand the ins and outs of memory 100%. I think I have a decent understanding, but I want to ask some questions, explain how I think things work (from reading forums and other material about it), and maybe someone could be kind enough to tell me where I'm mistaken.

1) Memory created on the stack is created faster and accessed faster (therefore runs faster) than memory created on the heap.

2) There are only so many megabytes of stack memory per application, but the heap can grow and shrink.

3) STL containers actually store the objects they contain on the heap; it doesn't matter whether the element type is a pointer or not, i.e. (list<Object> vs list<Object*>). If I'm filling this list myself, ALL objects will be created on the heap, meaning the speed penalty between the stack and the heap applies.

4) Normal arrays that are not storing pointers, i.e. (Object obj[10]), are stored on the stack no matter what.

5) I thought I heard that if the object you are inside of was created on the heap, then the member variables it contains are also stored on the heap, even if they aren't pointers, i.e.:

class Object
{
};
 
class Entity
{
private:
   Object o;
};
 
Entity* e = new Entity();
 
// e->o is actually created on the heap instead of the stack because its parent object (Entity) was created on the heap

6) So given that the above isn't too far off, that means to truly squeeze performance out of both loading times and runtime speed, one should try to avoid the heap if possible. The problem is that you only have so much space on the stack (maybe 64-bit eases this pain a lot?) and that you have to know in advance how many objects you'd need. My thought about knowing is that very few games actually have unlimited amounts of any object. So you could store your objects in normal arrays of a predefined/max size and have code for reusing "old"/"deleted" objects. This would mean you are most likely wasting memory, but with the benefit of speed. Is that trade-off ever worth it in games? A side benefit is that you don't have to worry about memory management or leaks. I understand most parts of a game (audio/visual) will use the heap and have to, but I'm referring more to the gameplay code (the code I'm actually writing, since I use an engine for all the other stuff) rather than to resource management.

Thanks for reading


1) This is partially correct due to cache effects (the stack is small and accessed frequently, and is thus more likely to be in cache, while heap-allocated items are scattered in memory). However, the cache effects aren't what you base the stack/heap decision on: if you CAN allocate on the stack, you allocate on the stack. What makes the heap slow is that finding a free segment of memory is an extremely expensive operation, which is why you only want to allocate large blocks and prefer more "uniform" allocation strategies (stack/pool allocators).

2) The stack is limited in most cases, which is why you usually allocate on the heap if you need a huge chunk of memory (especially if the cost of allocation is overwhelmed by the processing of that huge memory chunk). You can usually change the size of the memory reserved for the stack, though.

3) Dynamic STL containers use the heap to store their objects because they are dynamic; you can't really resize memory allocated on the stack. A std::vector will allocate a contiguous chunk of memory from the heap where it stores your array of objects (and reallocates if needed). If the element type is Object, the array contains Objects; if it is Object*, the array contains Object pointers (however, you need to make sure the pointers actually point somewhere!). The containers don't really care what the type is; they just blindly store it. If you were to separately allocate these Objects on the heap (to fill the Object* container), they would be scattered all around memory instead of being stored in a contiguous chunk, which is very bad for the cache.

4) Normal arrays are stored on the stack because their size is known at compile time, and thus they can be. std::array also allocates on the stack because its size is known at compile time, so not all std containers store their elements on the heap.

5) Classes and structures in memory are simply their members stored in a single chunk of memory. So, in your example, Entity in memory is exactly the same as Object in memory. Thus, when you allocate an Entity on the heap, the Object must be on the heap too (because it's integrated as part of the memory representation of Entity).

E.g. if we had struct RGB { char r, g, b; }, in memory we would have

[ RGB ]

which is the same as

[char r][char g][char b]

which is the same as

[imagine 24 bits here]

A pointer is simply a number (which you may interpret as an Object*), so if Entity had an Object* instead, the bytes of the pointer would be on the heap, but the Object itself could be anywhere in memory (heap, stack, wherever you put it). That is, if Entity has an Object*, it doesn't mean the Object* points to the heap. You can point to an Object stored on the stack, too.

6) There's nothing wrong with the heap itself; it's the allocation that is the problem. If you were to use a fixed-size array (reusing the slots of old objects), you'd basically be implementing a custom allocator, which is most likely more efficient than the general allocator provided by the OS. You can combine the two: use the general allocator to get large blocks of memory, then use custom allocators to efficiently allocate your objects within those blocks. This way you get both fast allocation and unlimited space (if you allow allocating more of these big memory chunks from the general allocator as needed).

It's a compromise between wasted space and performance in many cases.

o3o

1) Memory created on the stack is created faster and accessed faster (therefor runs faster) than memory created on the heap.

Stack space is allocated faster than a general heap allocation; it's just a pointer increment. Heap allocation and deallocation have other data structures to manage, and threading to worry about.

But all your remaining questions revolve around a misconception about how "fast" this memory is. There's NO speed difference in using any particular byte of stack or heap memory; it's all the same physical RAM in the machine. The speed differences come from how you use the memory. Stack memory is "fast" in that you usually just pushed the object you're playing with onto the stack, which means the compiler will either optimize it into a register, or, if it does spill into main memory, it's going to be in the cache. Heap space is only "slow" because you can technically go grab any old byte that you want, and if it isn't in the cache, that takes time.

Pick the correct data structures for your objects and you can optimize for memory locality, which is the biggest time saver. Keep all the data you need to access near related data, and push all the unrelated data somewhere else. There are really big performance benefits to organizing your data into contiguous runs of information that you fully access, instead of letting the "interesting" bytes get spaced out across memory.

That said, regarding your STL questions: it's generally not about how a container uses the RAM, it's about the "Big O" complexity of the algorithm that you want. Containers are all about defining the insert, delete, find, and iterate complexity. Some data structures space out your objects in the heap more than others, but give you some complexity benefit in return. If you really care about how a container lays things out, you can, with some work, replace the allocator to get the behavior you're looking for. But you usually want to judge containers on the merits of "I need fast, sorted inserts" or "I need fast linear iteration", and store only what you need in each container so that you're only paying for the "interesting" bytes that you wanted to store.

But there is a speed difference in allocation, right? Meaning loading times would be faster if everything that could be were stored on the stack instead of the heap? Again, though, because most of the time we don't know how many objects we'll have, we would sacrifice memory by pre-allocating everything on the stack, but I would think loading would be greatly sped up. We all hate loading times, right? :)

I remember testing a 2D tile game where I created Tile objects dynamically for each element in the 2D array (trees, etc.). When I was using new Tile for each cell of this 2000x2000 array, it took something like 10 minutes to finish (I don't remember the exact numbers, but it was a long time). When I switched to stack memory, it was a couple of seconds. So the difference between Tile* map[2000][2000] and Tile map[2000][2000] was massive. Of course, that changed how I was going to code things, because I was using polymorphism for the Tile objects and couldn't once I stopped using pointers.

10 minutes sounds like an outrageously large amount of time just to perform 4 million allocations, even on the heap. I just did a test to do four million 32 byte allocations and it only took about 2 seconds.

Anyway: don't put all your stuff on the stack. The stack is useful for objects that are relatively small and expected to live only as long as the scope they're in. For those, the stack can be cheaper than a slower heap allocation. But for long-lived, large objects (like a massive array of tiles), you'll do best putting them on the heap.

And since heap allocation is slow, it's important to think about your memory allocations and not make tons of them wastefully. Making four million allocations for your tile map is wasteful, both in the time it takes to perform those allocations and in the per-allocation overhead. You'd be better off making one big allocation that contains the entire array of tiles, and using some technique other than virtual functions for polymorphism (i.e. switching behavior based on a type field in the tile, or something).


You'd be better off making one big allocation that contains the entire array of tiles, and using some technique other than virtual functions for polymorphism (i.e. switching behavior based on a type field in the tile, or something).

I agree, but the attractiveness of polymorphism is powerful :)

The stack pointer is in a register. If you allocate something on the stack, it's assigned an offset. There is no actual runtime allocation of memory that incurs a cost, just an address calculation. So when you use map[x][y] on a stack array, the compiler turns it into something like stack register + offset of map + (x * 2000 + y) * sizeof(Tile). The offset of map is fixed at compile time, so "allocating" the array costs nothing at runtime; only the cheap indexing arithmetic remains.

Even your Tile* map[2000][2000] line of code is not using the heap. That is placing 4 million Tile pointers on the stack. The first heap memory you use is when you call new to allocate space for one of the elements of the map 2D array. For performance, you want to minimize the number of times you call new (or malloc, etc.). The new operation is slow enough that many, many games reinvent the wheel and write their own memory allocation functions.

In your situation, I'd probably do:

const int numTilesX = 2000;
const int numTilesY = 2000;
Tile* map = new Tile[numTilesX * numTilesY];

and access any element as map[x + y * numTilesX]

I use single-dimensional arrays as often as possible. I just find them easier to debug, and I'm a bad enough programmer that I spend a lot more time debugging than writing new code =)

If it helps, think of it in terms of the struct keyword vs the class keyword in C#. Any local variable or class member in C++ automatically acts like a C# struct. If you explicitly allocate on the heap (it'll never happen automatically, except when some class you use does it internally, like most STL containers), then it's like a C# class. Likewise, the performance in C++ is not too far off from C#: don't allocate things you don't need to allocate. C# makes it harder to avoid allocations compared to C++ in exchange for making them easier to deal with. You really need a clear understanding of the stack and heap for high-end C# programming, too, so you might want to take the time to read some texts on the topic.

Video classes on pointers and allocation: https://www.youtube.com/playlist?list=PL2_aWCzGMAwLZp6LMUKI3cc7pgGsasm2_

Articles and online books on the topic:

http://www-ee.eng.hawaii.edu/~dyun/ee160/Book/chap14/subsection2.1.1.8.html

http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html

http://stackoverflow.com/questions/79923/what-and-where-are-the-stack-and-heap

I'm sure you could find more with a little Googling if those aren't cutting it for you.

Sean Middleditch – Game Systems Engineer – Join my team!

Hi there. I don't think it's that slow to allocate memory. In my last 2D RTS game I allocated all objects with new and deleted them, and I had no slowdowns or anything you could notice. And that could have 8 players with 800+ units each.

Hi there. I don't think it's that slow to allocate memory. In my last 2D RTS game I allocated all objects with new and deleted them, and I had no slowdowns or anything you could notice. And that could have 8 players with 800+ units each.

If you mean allocating each unit separately, pool allocation could make a huge difference when creating and destroying units, as well as when processing them (due to the cache being utilized better).

The creation and destruction of units probably isn't the problem here, though, because I doubt you would be doing thousands of allocations per second. It's more the processing part that would matter here.

So allocation isn't horribly slow in itself, but compared to the alternatives it is.

As an example, I had implemented my sparse quadtree using std::function (so I could use lambdas to do operations on it) and std::vector; when I replaced std::function with a template parameter and applied some .reserve() calls to the vectors, the performance improved at least tenfold.

o3o
