C++ memory questions

Started by
10 comments, last by Satharis 10 years, 3 months ago

I think we need to stop saying, “heap allocation” and “stack allocation”. There is just “allocation”, and it is always on the heap and it is always preferable to avoid for performance reasons.

The stack is nothing more than a piece of memory that was allocated (from the heap) by the operating system when it created the thread (and each thread has its own stack).

On a hardware level, unless you are working on Nintendo DS, there is absolutely no difference what-so-ever between stack memory and heap memory.
You can literally allocate from the heap, copy the thread’s stack memory over to the heap address, change the EBP/ESP pointers to the relative address on the allocated RAM, and the performance will be exactly the same (accounting for cache coherency and every single other factor—literally because this is all the operating system does when it creates a thread).

In a typical case the stack is accessed frequently enough to constantly be in cache, but accessing an array of 2,000 integers on the heap vs. on the stack will incur essentially the same number of cache misses and thus have the same performance.

In other words, the lesson to be learned is that any benefit the stack has in performance can easily transfer over to heap RAM as well. You can use allocated RAM (AKA “the heap”) just as efficiently as stack RAM, so just by moving data to the stack in itself isn’t going to gain you any amount of performance. It’s how you avoid allocations in the first place combined with how you take advantage of the cache and forward-predict.

So when you use map[x][y] from the stack, the compiler replaces that with something like stack register + offset to map + x * sizeof(Tile) + y * sizeof(Tile) * 2000. That numerical result is computed at compile time, so there is no runtime cost to using stack memory.

Wrong.
If you have a hard-coded array on the stack and then you access it with hard-coded indices, such as:


int iMyLife[20][30];
iMyLife[5][12]

…then the compiler can translate that into a simple:


mov			ecx, dword ptr [ebp+648] ; Assumes iMyLife follows no other locals inside the function.

When it comes to actual dynamic array access (iMyLife[Y][X]), the x86 instruction set is itself very dynamic.
The MOV instruction can be written as:


mov     edx, dword ptr [ebp+eax*4+100]

A single instruction can contain the stack base offset (EBP), an array index (EAX), size multiplier (4) and even an offset (100).
But what you are proposing is far beyond the x86 instruction set. It would look something like (assuming an offset of 100 again, sizeof(Tile) = 4, 2,000 tiles in a row, eax = X, ebx = y):


mov			edx, dword ptr [ebp+ebx*8000+eax*4+100]

This is obviously not a valid instruction.
#1: The size multiplier can be only 0, 1, 2, 4, or 8. Not 2,000, not 8,000, not even 20.
#2: No mnemonic allows 3 registers in a single operand; you can’t have ebp, ebx, and eax all being arguments for the right side.

Whether on the stack or the heap, even when the compiler knows the stride of the array (2,000 × sizeof( Tile )), the output code will be very similar.



The basic bottom line is not to assume the stack is faster than the heap. Use the stack how it is supposed to be used (for small local variables) and use the heap how it is supposed to be used (for large data, data of changing or unknown sizes, etc.) and learn general proper performance practices.


L. Spiro

I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

Advertisement

The basic bottom line is not to assume the stack is faster than the heap. Use the stack how it is supposed to be used (for small local variables) and use the heap how it is supposed to be used (for large data, data of changing or unknown sizes, etc.) and learn general proper performance practices.

This alone would sum up the point pretty well, even ignoring the entire rest of the information of the post.

If anything a more realistic point to make would be that there is almost never going to be a situation where you are suffering performance wise simply because you made an allocation on the heap vs the stack, if anything the problem is usually that you're making memory allocations on the heap in an extremely inefficient way.

This topic is closed to new replies.

Advertisement