Alan Kemp

Memory mapping

Alan Kemp
Hi, I am trying to display graphically the memory that my program has allocated. To do this I have overridden new and delete so I can record memory allocations: where they are in memory and their size. My problem is this: what address should I be using as a base address when I am plotting my memory? I know that my computer has 512MB of RAM, but with virtual memory, and each process getting its own address space, at what address does my program start getting memory allocated? Is it even a constant? I wrote a test program to allocate some memory and print the address it was allocated at:
  
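	#include <iostream>

	int main()
	{
		int *p[10];

		for (int i = 0; i < 10; i++)
		{
			p[i] = new int[10];
			std::cout << p[i] << std::endl;   // print each block's (virtual) address
		}
		for (int i = 0; i < 10; i++)
			delete[] p[i];
	}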
  
The lowest address I get is 0x003210E0, not exactly the nice round number I was hoping for. Any ideas or suggestions? Or can anyone point me in the direction of some specification that details where programs allocate memory?

Thanks in advance,

Alan
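A minimal sketch of the recording side of this, assuming a single-threaded program and C++11 or later. The names (AllocRecord, g_records, recordAlloc, recordFree) and the 4096-entry capacity are hypothetical. The log is a fixed static array so that recording an allocation never calls operator new itself, which would recurse. Only the plain global operator new/delete are replaced; the default array and sized forms forward to them, so the new int[10] calls in the test program above are recorded too.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// One live allocation: where it is and how big it is.
struct AllocRecord
{
    void*       address;
    std::size_t size;
};

const std::size_t kMaxRecords = 4096;   // hypothetical capacity
AllocRecord g_records[kMaxRecords];     // static storage: no heap use while logging
std::size_t g_recordCount = 0;

static void recordAlloc(void* p, std::size_t size)
{
    // Allocations beyond the capacity are simply not logged.
    if (g_recordCount < kMaxRecords)
        g_records[g_recordCount++] = AllocRecord{p, size};
}

static void recordFree(void* p)
{
    for (std::size_t i = 0; i < g_recordCount; ++i)
    {
        if (g_records[i].address == p)
        {
            // Swap-remove: move the last record into the freed slot.
            g_records[i] = g_records[--g_recordCount];
            return;
        }
    }
}

void* operator new(std::size_t size)
{
    // malloc(0) may return NULL, but new must return a unique non-null pointer.
    void* p = std::malloc(size ? size : 1);
    if (!p)
        throw std::bad_alloc();
    recordAlloc(p, size);
    return p;
}

void operator delete(void* p) noexcept
{
    if (!p)
        return;
    recordFree(p);
    std::free(p);
}
```

Walking g_records[0] to g_records[g_recordCount - 1] at any point gives the address and size of every live block, which is what a plot would be drawn from.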

kdogg
I don't remember all of this off the top of my head, but there is system memory that is usually at the 'beginning' of the memory. This is used by the OS and you can't write to it.

jamessharpe
I think that you can only show where the allocated blocks are relative to each other. The problem you have is that the OS will page areas of memory in and out of the virtual memory system. All that you can really show is whether the blocks of memory are being allocated next to each other, and from this determine the fragmentation that is occurring while your program is being run.

So to answer your question, you cannot assume that memory allocations begin at some constant address. The address to plot your memory against would be the lowest address that has been allocated to you; if a lower address is allocated later, use that as the new base address. Also, I think that it is probably a worthless task to show how every byte of 512MB is allocated. Why do you need a graphical representation of the locations, or is it just to show fragmentation?
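The "lowest address as base" suggestion can be sketched as follows, assuming you already have a list of live blocks (for example from a log kept by overridden new/delete). Block, Gap and findGaps are hypothetical names. The base is recomputed from the current blocks on every call, so a newly allocated lower address automatically becomes the new base; the gaps between blocks are what fragmentation looks like.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record of one live allocation (address stored as an integer).
struct Block
{
    std::uintptr_t address;
    std::size_t    size;
};

// A run of unallocated space, as an offset from the lowest allocated address.
struct Gap
{
    std::size_t offset;
    std::size_t size;
};

// Sorts blocks by address and returns the free gaps between them, measured
// relative to the lowest address. Adjacent or overlapping blocks give no gap.
std::vector<Gap> findGaps(std::vector<Block> blocks)
{
    std::vector<Gap> gaps;
    if (blocks.empty())
        return gaps;

    std::sort(blocks.begin(), blocks.end(),
              [](const Block& a, const Block& b) { return a.address < b.address; });

    const std::uintptr_t base = blocks.front().address;   // lowest address = plot base
    std::uintptr_t end = base + blocks.front().size;       // furthest block end seen so far

    for (std::size_t i = 1; i < blocks.size(); ++i)
    {
        if (blocks[i].address > end)                       // free space before this block
            gaps.push_back(Gap{static_cast<std::size_t>(end - base),
                               static_cast<std::size_t>(blocks[i].address - end)});
        end = std::max<std::uintptr_t>(end, blocks[i].address + blocks[i].size);
    }
    return gaps;
}
```

For blocks at 0x1000 (0x10 bytes), 0x1010 (0x10 bytes) and 0x1030 (0x20 bytes), the result is a single gap at offset 0x20 of size 0x10.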

Alan Kemp
> why do you need a graphical representation of the locations, or is it just to show fragmentation?

Yes, it's to show fragmentation. I am about to start a project on a console, and am pretty worried about all the problems I have read about memory fragmenting and causing inexplicable crashes after the program has been running a while.

I figured I could keep an eye on how memory was being used by drawing a bar along the bottom of the screen to represent memory, and color each part of it according to which system allocated it.
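A rough sketch of the bar idea, assuming each recorded block carries a one-character tag for the system that owns it (TaggedBlock, renderBar and the tag letters are hypothetical). It maps the range from the lowest allocated address to the highest block end onto a fixed number of cells using integer arithmetic; a real version would draw colored rectangles instead of characters.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical tagged allocation: 'tag' names the owning system,
// e.g. 'G' for graphics, 'S' for sound.
struct TaggedBlock
{
    std::uintptr_t address;
    std::size_t    size;
    char           tag;
};

// Renders [lowest address, highest end) as 'width' cells. Each cell shows
// the tag of the last block touching it, or '.' if no block touches it.
std::string renderBar(const std::vector<TaggedBlock>& blocks, std::size_t width)
{
    std::string bar(width, '.');
    if (blocks.empty() || width == 0)
        return bar;

    // Find the lowest start and the highest end address of all blocks.
    std::uintptr_t lo = blocks[0].address;
    std::uintptr_t hi = blocks[0].address + blocks[0].size;
    for (const TaggedBlock& b : blocks)
    {
        lo = std::min(lo, b.address);
        hi = std::max<std::uintptr_t>(hi, b.address + b.size);
    }
    const std::uint64_t span = hi - lo;
    if (span == 0)
        return bar;

    for (const TaggedBlock& b : blocks)
    {
        if (b.size == 0)
            continue;
        // Map the block's first and last byte onto cell indices.
        const std::uint64_t first = std::uint64_t(b.address - lo) * width / span;
        const std::uint64_t last  = std::uint64_t(b.address + b.size - 1 - lo) * width / span;
        for (std::uint64_t c = first; c <= last && c < width; ++c)
            bar[static_cast<std::size_t>(c)] = b.tag;
    }
    return bar;
}
```

For example, a 4-byte 'G' block at offset 0 and a 4-byte 'S' block at offset 8, drawn 12 cells wide, render as "GGGG....SSSS"; the dots in the middle are the fragmentation you want to spot.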

However, I don't think I am going to be able to prototype my idea on a PC; I guess I will just have to wait a couple of weeks until the hardware gets here and write it then.

Thanks for your help,

Alan

S1CA
1) "Programming Applications for Microsoft Windows" (aka "Advanced Windows" for older versions) by Jeffrey Richter (MS Press) is a book I'd suggest getting. It has a chapter (or a few) dedicated to how memory management, process address spaces etc. work under Windows. Unfortunately my copy is at home so I can't check a few specifics.


2) **ALL** addresses you'll use in your "user mode" program are virtual. How those addresses relate to real physical RAM addresses (or pages in the page file) is entirely up to the OS (and the MMU in the CPU - TLBs and all that gubbins). It's entirely feasible that the OS can swap the WHOLE of your address space to disk when a task switch happens, so the idea of "this program starts at physical address X and that maps to virtual address Y" doesn't really work (AFAICR everything actually works on a page by page basis anyway).


3) IIRC all executables are loaded at the SAME virtual address (0x400000 by default). Each process gets between 2GB and 4GB as its address range (OS dependent); obviously in most situations not all of those pages are reserved or committed. IIRC the process and user heaps go at the low addresses of that range and DLLs/shared memory (mapped files etc.) go at the top of that address range. Though of course there isn't anything to stop a program calling VirtualAlloc() directly, or specifying an address for a file mapping, to allocate some memory in the middle of the address range.


4) Memory allocated using the CRT (new, malloc etc.) will end up coming from a heap (i.e. a call to HeapAlloc()) which will be "somewhere" within your process address space. Overhead from the debug allocator, the CRT itself etc. means your "start address of memory" isn't ever a nice value.


5) Some handy utils and source at http://www.sysinternals.com/


6) I'd advise some/all of the following:
a. Write/obtain a decent debug memory allocator that you can easily add your own stuff to. [It'll come in handy later on the consoles.]

b. For all your PC stuff, allocate your own fixed size heap with HeapCreate() (passing a non-zero maximum size). The heap it creates should be a contiguous block of "virtual" address space memory. Make your allocator use that and make the size of the heap configurable.
That way you can set your allocator to have a maximum of, say, 24MB so you know the exact range (addresses aren't important, particularly in a graphical representation where you can "see" the fragmentation) and you can catch cases that would result in out of memory conditions on the consoles.
Many PC programmers would benefit from doing that rather than assuming infinite RAM...

c. The C Runtime Library (and so the default malloc) uses its own heap called _crtheap. If you want to find out stuff about that using the HeapXXX functions, you could declare "extern HANDLE _crtheap;" and use that handle. Remember that some CRT functions allocate memory internally, so really you should take those into account.

d. Even without your own allocator, what you could do is call GetProcessHeaps() to get all the heap handles for your process (including the CRT one), then display a chart for each heap. Use HeapWalk() to get info about each of the allocations in that heap; the entries it returns also let you work out the heap's total size for scaling the chart (HeapSize() only gives you the size of a single allocated block). Nothing needs absolute addresses, just a visual picture. VirtualQuery() will get you more detailed info about a block.

e. Beware the nastiness of statics/globals which need construction (i.e. complex C++ objects as globals). Unless you hook into all the crt0 nastiness (check the CRT source code out to see what goes on), any custom allocator will miss those allocations. Though if you use those you'll probably want to put a trap in your alloc()/free() to initialise the allocator if it's being called before the rest of your program has been started [the crt0 stuff calls the c'tors before calling main()/WinMain()].


7) There may be some memory allocated and being used by your process that you can't get at, such as stuff done by kernel mode drivers/Windows components (internal file buffers etc.). That's not too much of an issue unless your PC code gets really close to the console memory limits (i.e. it looks like you're just about OK on the PC, then you come to port to the console and realise you need a buffer that you didn't see on the PC).


8) But of course there are so many other platform differences (UMA for example - textures in system memory) that it's probably a good idea (IMO) to split memory further, i.e. "we'll set aside 8MB for the sound code & data, 16MB for the graphics, and xyz for the rest", so that you only have to prod people about fragmentation and running out of memory in the game level stuff.
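A portable sketch of the fixed-budget idea from b) and 8), assuming you only want the PC build to fail the way a memory-limited console would. It swaps HeapCreate() for plain malloc()/free() plus a byte counter, so unlike a real fixed-size heap it does not give you one contiguous address range. The class name, the per-block header and the budgets are hypothetical.

```cpp
#include <cstddef>
#include <cstdlib>

// Forwards to malloc/free but refuses any request that would take this
// allocator's total over its fixed budget.
class BudgetedAllocator
{
public:
    BudgetedAllocator(const char* name, std::size_t budgetBytes)
        : name_(name), budget_(budgetBytes) {}

    // Returns nullptr if the request would exceed the budget.
    void* allocate(std::size_t size)
    {
        if (size > budget_ - used_)          // used_ never exceeds budget_
            return nullptr;
        Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + size));
        if (!h)
            return nullptr;
        h->size = size;
        used_ += size;
        if (used_ > peak_)
            peak_ = used_;
        return h + 1;                        // user memory starts after the header
    }

    void release(void* p)
    {
        if (!p)
            return;
        Header* h = static_cast<Header*>(p) - 1;
        used_ -= h->size;
        std::free(h);
    }

    const char* name() const { return name_; }
    std::size_t used() const { return used_; }
    std::size_t peak() const { return peak_; }   // high-water mark, handy for tuning budgets
    std::size_t budget() const { return budget_; }

private:
    // Padded to max alignment so the pointer handed out stays suitably aligned.
    struct alignas(std::max_align_t) Header { std::size_t size; };

    const char* name_;
    std::size_t budget_;
    std::size_t used_ = 0;
    std::size_t peak_ = 0;
};
```

Splitting memory per system then means one instance each, e.g. BudgetedAllocator sound("sound", 8 * 1024 * 1024) and BudgetedAllocator graphics("graphics", 16 * 1024 * 1024); a subsystem that blows its budget gets nullptr on the PC long before it would on the console.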

--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com

Alan Kemp
Thank you, you have answered all my questions (even the ones I didn't realise I needed to ask).

I just ordered the book you recommended, and I am going to be spending quite a bit of time looking into all the HeapXXX() functions. I think I am going to do what you suggested: allocate myself 64MB of memory, and route all memory requests into that.

You mentioned being careful of statics/globals; is there any way I can hook into the memory allocation routines before these get constructed? Is it possible to write my own crt0?

Guess I have a lot of reading to do.

Thanks again,

Alan

S1CA
Hi Alan,

quote:
You mentioned being careful of statics/globals; is there any way I can hook into the memory allocation routines before these get constructed? Is it possible to write my own crt0?


I'm not sure if you're already aware of this, but the first piece of code that starts running in a Windows executable (whether console or Windows) IS NOT WinMain() or main().

The *real* entry point is usually actually one of:
mainCRTStartup() - console app.
wmainCRTStartup() - wide character console app.
WinMainCRTStartup() - Windows app.
wWinMainCRTStartup() - wide character Windows app.

When you link a project in MSVC, the entry point in the PE (Portable Executable) header is set up to point at one of the above symbols in the C Runtime Library. When you run the program, execution of code starts at the point specified by the entry point in the PE.

You can override this behavior and make the entry point point at a different symbol in the settings for your project under Link->Output (this sets the linker's /ENTRY option).

If you've installed the full version of the Platform SDK or the full headers and source code from MSVC, then there'll be a folder called SRC\CRT. That folder contains the source code to the whole of the C runtime library, including the above entry point functions.

The code for the entry point stuff is in a file called "crt0.c". It does a number of things including preparing the environment, initialising the heap, initialising stdin/stdout, sorting out the command line parameters etc. After all the behind the scenes initialisation it hands over control to main() or WinMain(). Once you exit your main()/WinMain() the code in there cleans up a few things and exits proper.
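A tiny portable demonstration that user code can run before main(): the constructor of a global object is called by the start-up code. The names are hypothetical. In practice every mainstream compiler runs such constructors before main(); on MSVC that happens inside the crt0 start-up described above, while other toolchains (for example GCC on ELF platforms, via the .init_array section) use a different mechanism for the same effect.

```cpp
// Both counters are statically initialised, before any code runs at all.
int g_startupStep = 0;   // bumped each time a start-up step runs
int g_probeStep   = -1;  // the step at which the global constructor ran

struct StartupProbe
{
    StartupProbe() { g_probeStep = ++g_startupStep; }
};

// Dynamic initialisation: the start-up code calls StartupProbe() for us.
StartupProbe g_probe;
```

If main() then does int mainStep = ++g_startupStep;, it sees g_probeStep == 1 and mainStep == 2, i.e. the constructor ran first.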


The nastiness of global C++ objects:

As much as C++/OO zealots try to wrap things up and pretend the system is nice and Object Oriented, the grim reality is that it isn't. The C++ linker is really just the C linker (thus name mangling for C++), Windows entry points are set up for C, etc. The C runtime library was also designed for C. The PE file format was designed for C.

So a complication arises: when and how C++ objects with global scope get constructed. Because the first line of the client main()/WinMain() might expect to use one of these global objects, it MUST happen BEFORE main()/WinMain(). However, it's also feasible that the constructor of a global object might expect to call something in the C runtime library, memset() or printf() for example, so the constructor needs to be called AFTER the basic parts of the C runtime library are set up.

So that means the construction NEEDS to be done during the stuff in crt0.c.
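A minimal sketch of the allocator "trap" for this situation, assuming a single custom allocator reached through free functions. MyAllocator, allocator(), myAlloc() and myFree() are hypothetical names, and the allocator just wraps malloc()/free() with a counter. Instead of an explicit "initialised yet?" flag it uses a function-local static (construct on first use), so a global constructor that allocates during start-up gets a fully constructed allocator no matter which global happens to be constructed first.

```cpp
#include <cstddef>
#include <cstdlib>

class MyAllocator
{
public:
    void* alloc(std::size_t size)
    {
        void* p = std::malloc(size);
        if (p)
            ++liveAllocs_;
        return p;
    }
    void release(void* p)
    {
        if (!p)
            return;
        --liveAllocs_;
        std::free(p);
    }
    long liveAllocs() const { return liveAllocs_; }

private:
    long liveAllocs_ = 0;
};

// Construct on first use: the allocator is created by whichever caller
// reaches it first, even a global constructor running before main().
MyAllocator& allocator()
{
    static MyAllocator instance;   // initialised on first call (thread-safe since C++11)
    return instance;
}

void* myAlloc(std::size_t size) { return allocator().alloc(size); }
void  myFree(void* p)           { allocator().release(p); }

// A global whose constructor allocates during start-up.
struct EarlyUser
{
    void* buffer;
    EarlyUser() : buffer(myAlloc(64)) {}
    ~EarlyUser() { myFree(buffer); }
    EarlyUser(const EarlyUser&) = delete;
    EarlyUser& operator=(const EarlyUser&) = delete;
};

EarlyUser g_earlyUser;
```

This does not hook crt0 itself; it only makes sure your own allocator is ready whenever start-up code first calls into it.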

That's the when, but now think about "how". If you're not using RTTI etc., then how does the code that crt0.c calls "know" the names/pointers, sizes etc. of the objects?

The answer is it does it with quite a nice hack which requires the cooperation of the compiler, linker and C runtime. The memory for the objects has already been allocated automatically in the PE because they're static or global; what's missing is a list of the constructors to call.

If you've got the CRT source code, look at crt0dat.c and cinitexe.c and you'll get a clue about the mechanism for calling the constructors.

cinitexe.c contains four global variables which are arrays of pointers to functions:

__xi_a[], __xi_z[], __xc_a[], __xc_z[]

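Each of those arrays contains a single NULL entry.

At initialisation time the code in crt0dat.c walks from __xi_a to __xi_z, and then from __xc_a to __xc_z, calling every non-NULL function pointer it finds in between: the _a and _z arrays are start and end markers rather than separate lists. Destruction works differently: the code that constructs each global object also registers its destructor with atexit(), so the objects are destroyed in reverse order of construction at program shutdown.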

Now for the clever bit...

In cinitexe.c around each of the arrays is a load of #pragma and linker setup stuff. The stuff surrounding the arrays forces each of the arrays (which in the .obj file created when cinitexe.c is compiled have **one** entry set to NULL) to have its own named "section" in the PE file.

The C/C++ compiler works on a per source code file (module) basis, and creates object code for each source file (blah.cpp->blah.obj). The linker then takes all these and joins them together to form a single whole PE file. If two .obj files have the same symbol name declared with global scope, you get a link error.

Because of the collaboration on this problem, FOR EACH MODULE (source file), when the **compiler** sees a global object that needs construction, it generates a small initialiser function that constructs the object (and registers its destructor with atexit()), and stores a pointer to that function in the .obj produced for that module.

These pointers are NOT given the names __xi_a[], __xc_a[] etc.; as noted above, two .obj files defining the same global symbol would give a link error.

Instead, the compiler forces them into a named PE section, just as cinitexe.c does for its arrays. For C++ constructors that section is CRT$XCU, whose name sorts between CRT$XCA and CRT$XCZ.

Now the final clever bit: when the linker sees sections whose names contain a '$' (CRT$XIA, CRT$XIZ, CRT$XCA, CRT$XCU, CRT$XCZ), it merges everything with the same prefix before the '$' and orders the pieces alphabetically by the part after the '$'. So the NULL in __xc_a[] (from CRT$XCA) comes first, then the CRT$XCU pointers from ALL the .obj files it reads, then the NULL in __xc_z[] (from CRT$XCZ), and likewise for the XI sections. Finally, the merged result ends up in one section in the PE: .data or .rdata.

So after all the linking has happened, the final PE has one table of initialiser pointers for ALL the .obj (compiled .c/.cpp) modules, with __xc_a at the start and __xc_z at the end (and similarly for __xi_a/__xi_z). crt0dat.c then calls each of these pointers at the relevant time during startup.

A small loose end is the names of the sections: they're chosen carefully so that the A suffix sorts before anything a module contributes and the Z suffix sorts after it, which guarantees the start and end markers bracket all the real entries after merging.
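A portable model of the merge-and-walk mechanism, written for illustration rather than taken from the CRT source. Contribution, linkSections() and initterm() are hypothetical stand-ins (the last is modelled on the CRT's _initterm()): a stable sort by section name mimics how the linker orders the $-suffixed pieces while keeping link order for identical names, and the walk calls every non-NULL pointer between the start and end markers.

```cpp
#include <algorithm>
#include <string>
#include <vector>

typedef void (*PVFV)();   // pointer to a void(void) function, like the CRT's _PVFV

// One .obj file's contribution to a named section, e.g. "CRT$XCU".
struct Contribution
{
    std::string section;
    std::vector<PVFV> entries;
};

// Model of the link step: order the pieces by section name (stable, so
// equal names keep link order), then lay them out one after another.
std::vector<PVFV> linkSections(std::vector<Contribution> objs)
{
    std::stable_sort(objs.begin(), objs.end(),
                     [](const Contribution& a, const Contribution& b)
                     { return a.section < b.section; });
    std::vector<PVFV> image;
    for (const Contribution& c : objs)
        image.insert(image.end(), c.entries.begin(), c.entries.end());
    return image;
}

// Model of the start-up walk: call every non-NULL pointer in [begin, end).
void initterm(const PVFV* begin, const PVFV* end)
{
    for (; begin < end; ++begin)
        if (*begin != nullptr)
            (**begin)();
}
```

With contributions CRT$XCZ {NULL}, CRT$XCU {a, b}, CRT$XCA {NULL} and CRT$XCU {c} in that link order, linkSections() lays them out as NULL, a, b, c, NULL and initterm() calls a, b, c. Swapping the two CRT$XCU contributions changes the call order, which is exactly why cross-module construction order can't be relied on.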

Incidentally: the order in which the pointers are called depends entirely on the order the compiler emitted them within each module and the order the linker merged the modules' sections together. That's in a nutshell why you should NEVER rely on the order of construction for global objects in different source files!

This is all of course Windows PC specific stuff, other OSes, other development systems and other hardware systems can do things differently.


Something I forgot to mention in my original post:

If you have problems with things like fragmentation and memory leaks between levels, you could do what I do in my own personal allocation stuff - have "lifetimes" for each allocation, e.g. "this allocation is alive for the whole time the .exe is alive", "this allocation is alive only for the duration of this level", "this allocation is alive only for the duration of this player" etc. You specify the lifetime in the allocation call.

Then in your allocator initialisation (at program startup time), set up "pools" for each lifetime of allocation you'll be using, i.e. for each "level" you could set up a 10MB pool, for the player stuff set up a 1MB pool etc. In other words, when you're planning your memory map, think of the lifetimes of allocations and decide the maximums you'll need for each type (this is also good stuff to show in a TDD).

So any allocation specifying "LEVEL" as a lifetime takes memory from the block of memory for that pool.

Now the cool bit: when you finish a particular level in the game, you want to free all of the memory allocated for the level, but usually don't want to free the current player data, game data etc. It's these per-level frees that are often missed, and often released out of order (resulting in fragmentation or at best lots of work for the coalesce stuff in the allocator).
If you've separated the lifetimes, you can release ALL of the allocations made for the level with ***ONE*** call that just resets the pool (i.e. just reset the block list for the "LEVEL" pool in the allocator). Result: hassle saved releasing everything, 0% fragmentation, 0% memory leaks etc...


Alan Kemp
Thanks Simon,

I think I am going to have to read all that a couple more times to take it in.

Thank you, this is a great help to me, and a very interesting topic.

Alan
