C++ Memory Usage & Allocation

Started by bullfrog; 11 comments, last by 21st Century Moose
Hi,

I am currently working on a game and I am having some difficulty understanding why my memory usage is so high.

So for testing I created a simple project that allocates memory.


#include <iostream>

unsigned int g_uiTotalClusterMemory = 0;
unsigned int g_uiTotalCubeMemory = 0;

class CCube
{
public:
    CCube() {}
    ~CCube() {}

private:
    int m_i1;
    int m_i2;
};

class CCluster
{
public:
    CCluster()
    {
        // Each cube is a separate heap allocation:
        // 16 * 16 * 16 = 4096 calls to new per cluster.
        for (int i = 0; i < s_kiX; ++i)
        {
            for (int j = 0; j < s_kiY; ++j)
            {
                for (int k = 0; k < s_kiZ; ++k)
                {
                    m_pCube[i][j][k] = new CCube();
                    g_uiTotalCubeMemory += sizeof(CCube);
                }
            }
        }
    }

    ~CCluster() {}

private:
    static const int s_kiX = 16;
    static const int s_kiY = 16;
    static const int s_kiZ = 16;
    CCube* m_pCube[s_kiX][s_kiY][s_kiZ];
};

CCluster* g_pCluster = 0;

int main()
{
    const int kiNumClusters = 16 * 16 * 16;

    //Instance clusters
    g_pCluster = new CCluster[kiNumClusters];

    //Calculate memory usage for all clusters
    g_uiTotalClusterMemory = sizeof(CCluster) * kiNumClusters;

    //Convert bytes to megabytes
    g_uiTotalClusterMemory /= 1024;
    g_uiTotalClusterMemory /= 1024;
    g_uiTotalCubeMemory /= 1024;
    g_uiTotalCubeMemory /= 1024;

    //Calculate total memory
    unsigned int uiTotalMemory = g_uiTotalClusterMemory + g_uiTotalCubeMemory;

    //Output memory values to screen
    std::cout << "Total Cube Memory Used:" << g_uiTotalCubeMemory << "\n";
    std::cout << "Total Cluster Memory Used:" << g_uiTotalClusterMemory << "\n";
    std::cout << "Total Memory Used:" << uiTotalMemory << "\n";

    //Pause so the process can be inspected in Task Manager
    float fCaek = 0.0f;
    std::cin >> fCaek;

    return 0;
}


The code outputs the following:

Total Cube Memory Used:128
Total Cluster Memory Used:64
Total Memory Used:192

But Windows Task Manager and Process Explorer both tell me the program is using 329,624K of memory. Is this test set up correctly? If so, why is my program using more memory than I am allocating?

Thanks!

EDIT: I am using Visual Studio 2010 with the program running outside of the IDE, release build.

  • Task Manager is a terrible way to accurately judge memory use
  • The run-time will also use memory
  • Your code looks fine at a quick glance
16 * 16 * 16 = 4K cubes per cluster, and 4K clusters.
Each cube is 2 * sizeof(int) = 8 bytes.
You are allocating 4K * 4K * 8 bytes = 128M of cube memory, plus 4K * sizeof(CCluster) = 64M of pointer arrays.
It's no surprise to see Task Manager reporting 300M of memory usage.

https://www.kbasm.com -- My personal website

https://github.com/wqking/eventpp  eventpp -- C++ library for event dispatcher and callback list

https://github.com/cpgf/cpgf  cpgf library -- free C++ open source library for reflection, serialization, script binding, callbacks, and meta data for OpenGL, Box2D, SFML and Irrlicht.


[quote name='wqking']
16 * 16 * 16 = 4K cubes per cluster, and 4K clusters.
Each cube is 2 * sizeof(int) = 8 bytes.
You are allocating 4K * 4K * 8 bytes = 128M of cube memory, plus 4K * sizeof(CCluster) = 64M of pointer arrays.
It's no surprise to see Task Manager reporting 300M of memory usage.
[/quote]



From what I understand and what the program tells me:

16 * 16 * 16 = 4096 clusters
4096 * 16 * 16 * 16 = 16,777,216 cubes

16,777,216 * sizeof(CCube*) = 64MB
16,777,216 * (sizeof(int) * 2) = 128MB, like you said

329 - (64 + 128) = 137MB of memory that is unaccounted for?

I am using Process Explorer as well to check the program's memory footprint.

[quote name='bullfrog' timestamp='1331109417' post='4920008']
329 - (64 + 128) = 137MB of memory that is unaccounted for?
[/quote]

Your memory is not allocated one byte after another. There can be holes between your allocations, and Task Manager counts those holes too. It's quite reasonable for Task Manager to report two or three times the memory you think you allocated.



[quote name='wqking']
[quote name='bullfrog' timestamp='1331109417' post='4920008']
329 - (64 + 128) = 137MB of memory that is unaccounted for?
[/quote]
Your memory is not allocated one byte after another. There can be holes between your allocations, and Task Manager counts those holes too. It's quite reasonable for Task Manager to report two or three times the memory you think you allocated.
[/quote]


I see what you are saying. I will do another test that allocates blocks of 16 cubes instead of one at a time.

Thanks!
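For what it's worth, batching can go all the way to one new[] per cluster rather than blocks of 16. A sketch of that shape, reusing the CCube class from the first post (GetCube and m_pCubes are illustrative names, not code from the thread):

class CCluster
{
public:
    CCluster()
    {
        // One allocation for the whole cluster instead of 4096 small ones.
        m_pCubes = new CCube[s_kiX * s_kiY * s_kiZ];
    }

    ~CCluster()
    {
        delete[] m_pCubes;
    }

    // Illustrative helper: locate cube (i, j, k) in the flat array.
    CCube* GetCube(int i, int j, int k)
    {
        return &m_pCubes[(i * s_kiY + j) * s_kiZ + k];
    }

private:
    static const int s_kiX = 16;
    static const int s_kiY = 16;
    static const int s_kiZ = 16;
    CCube* m_pCubes; // single block of 4096 cubes
};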

[quote name='bullfrog' timestamp='1331112841' post='4920017']
I see what you are saying. I will do another test that allocates blocks of 16 cubes instead of one at a time.
[/quote]

You don't need to run more tests, especially tests against Task Manager or any other process inspector. They tell you very little.

What you should focus on:
1. Check for and avoid memory leaks.
2. If the memory usage is still huge, find out where it comes from and reduce it.
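
Since you are on Visual Studio, the CRT debug heap can do the leak checking for you. A minimal sketch (debug builds only; the deliberate leak is just to show the report):

#define _CRTDBG_MAP_ALLOC
#include <stdlib.h>
#include <crtdbg.h>

int main()
{
    // Dump any allocations still live when the process exits.
    // The report appears in the Visual Studio output window.
    _CrtSetDbgFlag(_CRTDBG_ALLOC_MEM_DF | _CRTDBG_LEAK_CHECK_DF);

    int* piLeaked = new int[16]; // deliberately never freed
    (void)piLeaked;

    return 0;
}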



[quote name='wqking']
[quote name='bullfrog' timestamp='1331112841' post='4920017']
I see what you are saying. I will do another test that allocates blocks of 16 cubes instead of one at a time.
[/quote]
You don't need to run more tests, especially tests against Task Manager or any other process inspector. They tell you very little.

What you should focus on:
1. Check for and avoid memory leaks.
2. If the memory usage is still huge, find out where it comes from and reduce it.
[/quote]

Allocating cube by cube seems to have been the problem. When I allocated the cubes in batches, the memory settled at the expected levels. Windows' memory alignment must not be the best.

Thank you for everyone's help!
Windows is probably doing fine here. There are multiple "layers" that affect memory allocation, any one of which could be contributing to this result.

The first layer is the type itself. Padding between members, padding at the end of a structure/class, and the typical implementation of virtual functions all add bytes to the allocation. Fortunately we can see these via sizeof().

Another layer is for dynamically allocated arrays (via new[]). The compiler must call the destructor for all the objects in the array, and to do this it must know the length of the array. A typical implementation will add a hidden word before the data where the length is stored.
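
You can see this hidden count on a given compiler by replacing operator new[] and comparing the size requested with the arithmetic you would expect. A sketch (the extra word is implementation-specific, so the exact output varies; error handling omitted):

#include <cstdlib>
#include <iostream>
#include <new>

void* operator new[](std::size_t size)
{
    // Report what the compiler actually asked for.
    std::cout << "operator new[] requested " << size << " bytes\n";
    return std::malloc(size);
}

void operator delete[](void* p)
{
    std::free(p);
}

struct NeedsDtor
{
    int m_iValue;
    ~NeedsDtor() {} // a non-trivial destructor forces the hidden count
};

int main()
{
    // 10 * sizeof(NeedsDtor) is 40 bytes, but a typical implementation
    // requests 40 + sizeof(size_t) so it can store the element count.
    NeedsDtor* p = new NeedsDtor[10];
    delete[] p;

    return 0;
}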

The above is all decided at compile time.

During execution, the language has emitted calls to the runtime (i.e. the implementation of operator new, malloc, etc). Generally speaking, the runtime requests big batches of memory from the operating system. The runtime splits this memory into smaller chunks, and hands it out. It also has to maintain some bookkeeping information to understand which memory in a chunk is allocated and when it can be freed.

Here is where most "Task Manager" analysis breaks down. From the task manager's point of view, your program is a black box. It cannot determine whether memory in your program is "logically" free, nor can it give you a separate figure for the bookkeeping overhead.
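
If you do want numbers you can trust more than the Task Manager column, you can query the process memory counters directly. A minimal sketch using the documented Windows psapi API (link against psapi.lib):

#include <windows.h>
#include <psapi.h>
#include <iostream>

#pragma comment(lib, "psapi.lib")

int main()
{
    PROCESS_MEMORY_COUNTERS pmc = { sizeof(pmc) };

    if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
    {
        // WorkingSetSize is roughly the Task Manager figure;
        // PagefileUsage is closer to what the process has committed.
        std::cout << "Working set: " << pmc.WorkingSetSize / (1024 * 1024) << " MB\n";
        std::cout << "Committed:   " << pmc.PagefileUsage  / (1024 * 1024) << " MB\n";
    }

    return 0;
}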

Then there are debug helpers, which might insert guard bytes in various places, allowing some kinds of buffer overrun to be detected. Again, Task Manager cannot help you here. These shouldn't be active if you are running a release executable by itself, but it is worth remembering that they can be there at other times.

The main thing causing your extra memory usage is that memory allocators add overhead to each allocation. Even in a release build I'd expect to see at least an additional 8 bytes per allocation, plus a rounding up to the next multiple of 8 bytes for alignment. This means allocating objects that are 8 bytes (or less) each will probably actually consume 16 bytes each. The extra data is used by the heap to track what memory is free and what isn't.
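
Running those numbers against the original test makes the Task Manager figure add up (a back-of-envelope sketch using the 16 bytes per cube suggested above):

16,777,216 cubes * 16 bytes (8 of data + ~8 of heap overhead) = 256 MB
+ 64 MB for the cluster pointer arrays
= ~320 MB, which is very close to the ~329 MB Task Manager reported.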

In debug the overhead can be much more, to allow better detection of heap corruption.

In your case you could simply allocate one large array of CCube objects (by changing the array definition to CCube m_Cube[s_kiX][s_kiY][s_kiZ]; ). That not only eliminates almost all of the allocation overhead, but also gets rid of the extra pointer per cube (which adds 4 bytes of overhead on top of the 8 bytes per cube)!
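
Concretely, that change to the original CCluster looks like this (the constructor no longer allocates anything, because the cubes live inside the cluster by value):

class CCluster
{
public:
    CCluster() {}   // no per-cube allocations any more
    ~CCluster() {}

private:
    static const int s_kiX = 16;
    static const int s_kiY = 16;
    static const int s_kiZ = 16;
    CCube m_Cube[s_kiX][s_kiY][s_kiZ]; // 4096 cubes stored by value
};

sizeof(CCluster) is now exactly 4096 * 8 bytes, and new CCluster[4096] becomes a single 128 MB allocation with one heap header instead of sixteen million.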

In general the standard solution is to pool the allocations, as sketched below. This should also improve performance, as you get better data locality and therefore fewer cache misses.
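
For reference, a pool along those lines might look something like this. This is only an illustrative sketch, not code from the thread; CCubePool and its members are made-up names, and a real pool would want growth and error handling:

#include <cstddef>
#include <new>
#include <vector>

class CCube
{
public:
    CCube() {}
    ~CCube() {}

private:
    int m_i1;
    int m_i2;
};

// A minimal fixed-size pool: one big allocation carved into cube-sized
// slots, with a free list threaded through the unused slots.
class CCubePool
{
public:
    explicit CCubePool(std::size_t uiCapacity)
        : m_slots(uiCapacity)
        , m_pFree(0)
    {
        // Thread every slot onto the free list.
        for (std::size_t i = 0; i < uiCapacity; ++i)
        {
            m_slots[i].pNext = m_pFree;
            m_pFree = &m_slots[i];
        }
    }

    CCube* Allocate()
    {
        if (!m_pFree)
            return 0;                   // pool exhausted

        Slot* pSlot = m_pFree;
        m_pFree = pSlot->pNext;
        return new (pSlot) CCube();     // construct in place, no heap call
    }

    void Free(CCube* pCube)
    {
        pCube->~CCube();                // destroy in place
        Slot* pSlot = reinterpret_cast<Slot*>(pCube);
        pSlot->pNext = m_pFree;
        m_pFree = pSlot;
    }

private:
    union Slot
    {
        char aData[sizeof(CCube)];      // raw storage for one cube
        Slot* pNext;                    // free-list link while unused
    };

    std::vector<Slot> m_slots;          // one allocation for all cubes
    Slot* m_pFree;                      // head of the free list
};

int main()
{
    CCubePool pool(16 * 16 * 16);
    CCube* pCube = pool.Allocate();
    pool.Free(pCube);
}

Each cube now costs exactly one slot with no per-allocation heap header, and neighbouring cubes sit next to each other in memory, which is where the cache-friendliness comes from.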

