Jump to content
  • Advertisement
ryt

C++ Dynamic memory using "new" and malloc()

Recommended Posts

I recently started to explore how operator new and malloc() really work. If I understood correctly they get implemented using linked-list or forward-list when an array is used. This is probably because free (dynamic, heap) memory gets allocated all over (within its bounds) the memory and so something like std::vector isn't used, something that has continuous allocation.
If this is true than accessing such memory (through []) would not have access time of O(1) but some other, more complex.
Is this true or I misplaced something?  

Share this post


Link to post
Share on other sites
Advertisement

First off there are no rules on how malloc() and free() have to be implemented. I've replaced them numerous times when I didn't like their performance for some piece of code I was working on.  Beyond that, there is also the specific std::vector implementation to contend with, and also there is hardware caching and stuff like that. In general if you aren't growing a vector a lot, it's access time will probably be pretty consistent. But that again depends on how you are accessing the elements. If you run though them in order that will most likely be faster than jumping around randomly because of hardware caching. That doesn't really have much to do with allocation however.

I don't really use std::vector myself except for trivial stuff because I like to have control over exactly what's going on.  I rarely even use the standard malloc/new and free/delete anymore for the same reasons. If you know how you are going to use the heap, you can almost always beat the standard routines by replacing them.  Depends on what you are writing though.  It might not be worth it for a lot of applications.

Share this post


Link to post
Share on other sites

std::vector memory is guaranteed to be contiguous and accessing it in order is the best possible thing. It doesn't matter which allocator the memory came from, it will be contiguous.

Accessing any memory in order is cache-friendly and the best possible thing to do.

What @ryt is interested about, possibly, is that allocating and deallocating memory in random order, using any allocator, creates fragmentation. After some time, unless you take measures, you might be left with many 'holes' which add up to a lot of overall available space, but you won't be able to put your big contiguous array (vector) there at all.

I wanna add that Gnollrunner is right. BUT. You really, really shouldn't go into optimising your structures and allocators, unless you first MEASURE that you suffer from their performance. That goes for any premature optimisation :)

Share this post


Link to post
Share on other sites
On 11/15/2018 at 4:52 PM, pcmaster said:

What @ryt is interested about, possibly, is that allocating and deallocating memory in random order, using any allocator, creates fragmentation. After some time, unless you take measures, you might be left with many 'holes' which add up to a lot of overall available space, but you won't be able to put your big contiguous array (vector) there at all.

I'm just curious how standard implementations of new and malloc() work. Do they use continuous memory allocation or something else?
Because as you sad the memory could end up fragmented and we could arrive at the case where we have enough free memory but not a block (with continuous memory) big enough that we can use another new. In this case we would have to resort to some other implementation that does not use continuous memory and therefore not an O(1).

Share this post


Link to post
Share on other sites

There can be no resorting to anything, the promise of contiguity shall not be broken - if you demand a block, you're guaranteed to get a block or nothing.

Imagine that the pool administred by an allocator is 16B and the granularity is 1B with zero overhead (totally fine for our demonstration). You allocate eight times 2B. All is good. Then you deallocate every other allocation you got. All is good, there's 8B free. Yet, when you now try to allocate 4B, the allocation shall fail and you get nothing. It is THAT simple.

Share this post


Link to post
Share on other sites
9 hours ago, ryt said:

resort to some other implementation that does not use continuous memory

Allocators either garbage collect and compact the heap (in garbage collecting runtimes), ask the OS for more memory to expand the heap (ex: VirtualAlloc), return null, throw an exception, or produce some other type of error.

Successfully allocated memory is ALWAYS contiguous from the perspective of the program (code assumes that data structures are always arranged the same way).  However, protected mode virtual memory allows the addresses that the program uses to map to different actual physical RAM (or even hard drives)

Edited by Nypyren

Share this post


Link to post
Share on other sites
9 hours ago, ryt said:

I'm just curious how standard implementations of new and malloc() work. Do they use continuous memory allocation or something else?
Because as you sad the memory could end up fragmented and we could arrive at the case where we have enough free memory but not a block (with continuous memory) big enough that we can use another new. In this case we would have to resort to some other implementation that does not use continuous memory and therefore not an O(1).

First off, again there is no "standard implementation" only standard behavior.  However there are probably some commonality between implementations.

You can imagine memory being an unused paper tape. When you allocate something you draw a line on it. From the begging of the tape to the line is what you allocated. Now if you allocate again draw a line after the first one. That chunk is from the first line to the second line and so forth. But now if de-allocate something you might have a space in the middle that's unused. However it's only the size of the piece of memory that you previously allocated there.  It's up to the memory manager to figure out what to do with it. It has to save it somehow so if you allocate new piece of memory that is smaller than that space it can give that piece back to you. However let's say all your following allocations are all bigger than that space. In that case it will never be reused and you have a hole.

Now it can get even more complex. Let's say you allocate a piece of memory smaller than that space, so it gets reused.  But it's slightly smaller, so a sub-piece of memory that's not big enough to be useful for anything is left over at the end. So now you have a sliver of lost memory.  So this is where the memory manager comes in. It might think in advance that this situation is sub-optimal and try to give  you a different chunk of memory instead.

Then there is the situation where you de-allocate two chunks of memory next to each other. The allocator has to be smart enough to merge them into a bigger chunk.  There are really a lot of variables. That's why there are a lot of implementations. 

Years ago I worked on IBM workstations.  Someone brought a program to me and said it was crashing on exit or rather it would go into and infinite loop. After debugging it for a while. I found that it got stuck in the memory deallocator.  But it did eventually end after 10 minutes. He was allocating millions of tiny objects. On a hunch I simply replaced mallaoc/free with my own routines and boom, it fixed the problem.  The standard implementation was simply way sub-optimal for his particular memory usage model.

In general for a lot of things I use slab allocators. You can read up on them. But again they aren't good for everything. It all depends on what you are doing. For a lot of gaming stuff, some form of slab allocation probably works pretty well though. They are basically good where you have a lot of objects of similar size, or objects can be categorized in groups of similar sizes.  Strings are it's downfall but you can do a different heaps for those.  I use a lot "placement new" in C++ to allocate in different heaps. You can read up on that too. Over the years I have built up a big library of different heap libraries.  I try to pic the ones that best suit my job.

If you are really trying to optimize memory, there is a lot you can do, but you should try to figure out if it's an issue first. For many people the built in routines work well enough.

Share this post


Link to post
Share on other sites
On 11/15/2018 at 2:59 PM, ryt said:

If this is true than accessing such memory (through []) would not have access time of O(1) but some other, more complex.

This isn't true at all because accessing through the []-operator is just convinience. What the compiler in the background does (normally) is to move the base pointer by certain ammount of bytes and return the result. Accessing an element in a plain array of integer simply is

int* myArray = new int[20];

//int x = myArray[7] ->
int x = *(myArray + (sizeof(int) * 7));

delete[] myArray;

What I learned during study was that the OS maintains a list of memory chunks when you delete something. Any new allocation request is then passed through the list of returned chunks so if something matching is found, an allocation will return that chunk or at least a new piece of it. This is the reason why memory fragmentation occures more and more the longer a program runs.

I used to force run my own memory management and allocator classes whenever possible, also as @Gnollrunner wrote in his first post, use my own STL like container classes. I wrote Array (dynamic resizing), Vector (dynamic add/remove), Stack, Queue, HashTable, Map and whatever I used to need in my code. All of these are based on Array because what you get when allocating memory using the API features of C/C++ is a block of memory of certain size. Malloc isn't interested in the kind of cobject you want to place in there except for its size on the platform.

What I do when allocating memory for an array of integers is to calculate the padded size of the type and multiply this with the ammount of elements I want to place inside memory.

In C/C++ it is a mighty tool to shift pointers left or right some bytes, this can result in a much better performance.

While C/C++ dosen't has any garbage collection (SmartPtr dosen't count because it just deletes it's contents on leaving scope) this is the reason for the ammount of garbage collection libs out in the wild. Games heavy rely on good memory management because they do even more allocation/deallocation than a normal office program does.

There are some models how memory management and garbage collection work. One approach is for example to put ranges of memory in buckets of different size so you know when allocating an integer, it will always stay in a heap reagion whos size wont exceed certain limit while an other approach is to return smart pointer objects instead of plain ones so on garbage collection there is a chance to move objects arround in memory without causing data corruption on access

Share this post


Link to post
Share on other sites

The simple answer is yes.

Although there are no "standard" implementations, dynamic memory allocation and de-allocation is non-deterministic, and the only way to map a contiguous memory address space into non-deterministic allocations and de-allocations is to use something like a linked list or a mark-and-sweep algorithm (or some other non-O(1) strategy).

Even more fraught is that a single central memory allocator on a multi-threaded multi-processor environment (and they all are these days, even your toaster) you can one big lock to slow things down further.

If profiling reveals that memory allocation is an issue, the strategy is generally to allocate all memory up front on startup and never use the central system allocator again.  This is the norm in real time and embedded systems where the vagaries of malloc() can not be tolerated, and has become standard practice in many games for the same reason.

Share this post


Link to post
Share on other sites
4 hours ago, Bregma said:

If profiling reveals that memory allocation is an issue, the strategy is generally to allocate all memory up front on startup and never use the central system allocator again. 

For most stuff just allocating in chunks is fine. With reasonable sized chunks the amount of CPU time you are going to save allocating everything up front is probably negligible.  This is especially true with 64 bit machines where you have a lot of address space available. You can reserve a huge chunk of address space up front, and then actually add the memory in as you need it without having to predict ahead of time how much memory you will actually end up using.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!