RAM fragmentation
I have an application that loads and unloads large amounts of data from the hard drive almost continuously (several gigabytes in the space of an hour), and the allocations are not of a consistent size. It generates a lot of fragmentation. My question is, since the system must have uptimes of at least several days and be very responsive (I cannot have say a malloc() take more than a fraction of a second), what is the policy of an OS to deal with fragmented memory? I'm interested in both Windows XP and Linux. Does the OS wait until there isn't a sufficiently large contiguous block for a given malloc() before rearranging RAM (thus the allocation call I assume would take a long time to complete), or does it do it preemptively as a function of the amount of fragmentation? And can malloc() fail if there is enough total free space but not enough is contiguous, that is, does the OS not always consolidate the free RAM when needed? If so, how do I deal with that?
Pre-allocate blocks of different size.
Small blocks, and big blocks.
If one doesnt fit in a small block, use a big block. Yes there is wasted, but it will never get fragmented. You are just going to use more RAM for your application.
Small blocks, and big blocks.
If one doesnt fit in a small block, use a big block. Yes there is wasted, but it will never get fragmented. You are just going to use more RAM for your application.
Quote:Original post by Prune
I have an application that loads and unloads large amounts of data from the hard drive almost continuously (several gigabytes in the space of an hour),
That is not large. The way you describe your problem it sounds like you're IO bound.
Quote:what is the policy of an OS to deal with fragmented memory?For most purposes, it's a solved problem.
Quote:Does the OS wait until there isn't a sufficiently large contiguous block for a given malloc() before rearranging RAM (thus the allocation call I assume would take a long time to complete), or does it do it preemptively as a function of the amount of fragmentation?
See virtual addressing above.
To get responsive system, first make sure that your working set actually fits into RAM to avoid paging.
Quote:I cannot have say a malloc() take more than a fraction of a second
I'm not familiar with "a fraction" as measure of time. How much is that in seconds. Allocation of new memory, if needed, can be as little as inc eax, or about 1 CPU cycle.
What does your profiler currently say? Where are the bottle-necks?
Quote:Original post by Prune
I have an application that loads and unloads large amounts of data from the hard drive almost continuously (several gigabytes in the space of an hour), and the allocations are not of a consistent size. It generates a lot of fragmentation. My question is, since the system must have uptimes of at least several days and be very responsive (I cannot have say a malloc() take more than a fraction of a second), what is the policy of an OS to deal with fragmented memory? I'm interested in both Windows XP and Linux. Does the OS wait until there isn't a sufficiently large contiguous block for a given malloc() before rearranging RAM (thus the allocation call I assume would take a long time to complete), or does it do it preemptively as a function of the amount of fragmentation? And can malloc() fail if there is enough total free space but not enough is contiguous, that is, does the OS not always consolidate the free RAM when needed? If so, how do I deal with that?
The OS does not rearrange RAM at all. Defragmenting RAM like this would require that you move occupied blocks around in order to make larger contiguous blocks available, which is not possible because pointers to that data would then be outdated. You could implement such a scheme if you wanted to, although its a lot of work. Alternatively, try something like this: Slab allocation.
And yes, malloc can fail if there is enough free space but not enough free contiguous space.
Physical memory and virtual memory is different. Physical memory can be as fragmented as you like and not cause any problem in terms of performance (that's why it's Random Access Memory -- accessing byte 0x00000001 is just as fast as accessing byte 0xFFFFFFFF). It's when virtual memory gets fragmented that you have issues (not performance issues, however, because caching works on page-level and pages are the smallest allocation unit).
The issue you get with memory fragmentation is not performance, as I said, it's that you can run out of virtual memory without actually using up all your memory. On 64-bit systems, it's less of an issue, of course.
The way to solve problems with fragmented memory is to use memory pools. A decent strategy for reducing fragmentation is to use a slab allocator. For example, this is how memcache allocates memory.
Edit: oops, should read all the other posts first, outRider got in there before me with the slab allocator suggestion [smile]
The issue you get with memory fragmentation is not performance, as I said, it's that you can run out of virtual memory without actually using up all your memory. On 64-bit systems, it's less of an issue, of course.
The way to solve problems with fragmented memory is to use memory pools. A decent strategy for reducing fragmentation is to use a slab allocator. For example, this is how memcache allocates memory.
Edit: oops, should read all the other posts first, outRider got in there before me with the slab allocator suggestion [smile]
Quote:Original post by Antheus
That is not large. The way you describe your problem it sounds like you're IO bound.
It's not. I have several dozen different modules (dynamic linked libraries) with associated data (up to several hundred megabytes) loaded up to two per minute based on some schedule. While a few modules are running together, other threads load those modules that are upcoming in the schedule.
Quote:To get responsive system, first make sure that your working set actually fits into RAM to avoid paging.
At any time, I can guarantee that all modules that are loaded into RAM and their data will not exceed RAM (except for those that stream video or virtual textures off the HD).
Quote:I'm not familiar with "a fraction" as measure of time. How much is that in seconds. Allocation of new memory, if needed, can be as little as inc eax, or about 1 CPU cycle.
I'm not interested in "as little", but in "as much". Different modules allocate sets of widely varying blocks of memory. Since my maximum rate of loading modules is about two per minute, all of up to several hundred allocations will need to be doable in 30 seconds without impacting performance of the active modules more than a couple of percent, so when "pages are migrated (or reclaimed under memory pressure)" according to http://kernelnewbies.org/Linux_2_6_24 that must not prevent the render and various update threads of modules from running in real-time--and those are sometimes CPU/GPU bound, but sometimes memory bound (during a module's active time it doesn't normally access the HD, unless some are doing say video).
Quote:What does your profiler currently say? Where are the bottle-necks?
I haven't run things for more than a few minutes, and I only have a few modules at the moment so I can still fit them all in RAM. However, the system must scale to up to 30 modules and I already have some that take up 500 MB.
By the way, the average numbers I gave are for uncompressed data in memory, not the compressed data on the HD.
Codeka and outRider, if malloc can fail, then what's the point of the paragraph in http://kernelnewbies.org/Linux_2_6_24 that states that when fragmentation becomes high pages are migrated?
My concern is whether this anti-fragmentation feature in the kernel is a) complete, that is, it guarantees that sufficient rearranging will be done so malloc succeeds as long as combined free RAM is sufficient, b) whether it would impact performance in a setup such as I described in my reply to Antheus.
Also, Codeka wrote that "you can run out of virtual memory without actually using up all your memory". I don't understand how that is considered acceptable behavior?!
I think custom memory allocation is a very poor option for me because the nature of the modules I'm loading/running/unloading is very varied in terms of the allocation patterns etc. It would be essentially writing a general allocator, not one specific to a single program, and that seems an immense task.
By the way, I can't simply throw more RAM at the problem: I'm using high performance RAM and the ~20GB that would be necessary is simply too much money (not to mention that this limits scalability).
My concern is whether this anti-fragmentation feature in the kernel is a) complete, that is, it guarantees that sufficient rearranging will be done so malloc succeeds as long as combined free RAM is sufficient, b) whether it would impact performance in a setup such as I described in my reply to Antheus.
Also, Codeka wrote that "you can run out of virtual memory without actually using up all your memory". I don't understand how that is considered acceptable behavior?!
I think custom memory allocation is a very poor option for me because the nature of the modules I'm loading/running/unloading is very varied in terms of the allocation patterns etc. It would be essentially writing a general allocator, not one specific to a single program, and that seems an immense task.
By the way, I can't simply throw more RAM at the problem: I'm using high performance RAM and the ~20GB that would be necessary is simply too much money (not to mention that this limits scalability).
You're still confusing physical memory with virtual memory. That Linux kernel patch allows the kernal to rearrange physical memory but that does not affect the virtual memory that applications see at all. Besides, that features has very specific usage scenarios and doesn't really affect regular applications in any noticable way.
Here's a diagram which should help to illustrate the issue:
The bottom row represents your virtual address space, the top row represents physical memory (ignoring the swap file for simplicity). Each square represents one page, and the arrows correspond to where each virtual page resides in physical memory. The green squares are "free".
You can see that the physical memory is really "fragmented", in that there's no two pages that are contiguous. That's OK, because the page cache works on the page level anyway.
The problem, however, is that if you try to allocate 2 pages worth of memory in one go, you're going to get an "out of memory" error because malloc() isn't able to find two contiguous blocks of virtual memory. If you allocate just one page, it's fine, there's plenty of those.
Again, that Linux kernel article is not relevent here. It mainly conserns kernel-mode drivers that want to allocate large(-ish) blocks of contiguous physical memory. But user-mode programs only ever allocate virtual memory and so you don't need to worry about physical memory fragmentation.
What you're describing is more like a garbage collector which compacts the virtual memory when you start to run out. That's not how malloc()/new works, however and it's actually a lot more complicated to write a garbage collector than to simple use a fragmentation-resistant allocation strategy like a slab allocator.
Here's a diagram which should help to illustrate the issue:
The bottom row represents your virtual address space, the top row represents physical memory (ignoring the swap file for simplicity). Each square represents one page, and the arrows correspond to where each virtual page resides in physical memory. The green squares are "free".
You can see that the physical memory is really "fragmented", in that there's no two pages that are contiguous. That's OK, because the page cache works on the page level anyway.
The problem, however, is that if you try to allocate 2 pages worth of memory in one go, you're going to get an "out of memory" error because malloc() isn't able to find two contiguous blocks of virtual memory. If you allocate just one page, it's fine, there's plenty of those.
Again, that Linux kernel article is not relevent here. It mainly conserns kernel-mode drivers that want to allocate large(-ish) blocks of contiguous physical memory. But user-mode programs only ever allocate virtual memory and so you don't need to worry about physical memory fragmentation.
What you're describing is more like a garbage collector which compacts the virtual memory when you start to run out. That's not how malloc()/new works, however and it's actually a lot more complicated to write a garbage collector than to simple use a fragmentation-resistant allocation strategy like a slab allocator.
But why can't this page migration strategy the kernel uses to deal with physical memory fragmentation be used to deal with virtual memory fragmentation?
If I write a custom allocator, it's one thing as far as user code, but I don't see how I can force all the third party libraries to use that allocator as well. I've got things like libpng and SDL and OpenEXR and a dozen more. Moreover, it would have to be multithreaded (at least I don't need concurrent allocation since the running active modules are not allocating memory, just during their loading/initialization), and that seems difficult. A bunch of other problems spring to mind as well. The article on Wikipedia about slab allocators says "retaining an allocated memory that used to contain a data object of certain type and reusing that memory for the next allocations for another object of the same type". Since I don't know ahead of time all the various types different modules added in the future will allocate, not to mention that most types are user defined and there is only limited overlap in types between different modules, I cannot go that route which is really akin to vector<someType>::resize() at program start, since in my case someType will be many different things, most of them incoming as new modules get sent to the system over time.
If I write a custom allocator, it's one thing as far as user code, but I don't see how I can force all the third party libraries to use that allocator as well. I've got things like libpng and SDL and OpenEXR and a dozen more. Moreover, it would have to be multithreaded (at least I don't need concurrent allocation since the running active modules are not allocating memory, just during their loading/initialization), and that seems difficult. A bunch of other problems spring to mind as well. The article on Wikipedia about slab allocators says "retaining an allocated memory that used to contain a data object of certain type and reusing that memory for the next allocations for another object of the same type". Since I don't know ahead of time all the various types different modules added in the future will allocate, not to mention that most types are user defined and there is only limited overlap in types between different modules, I cannot go that route which is really akin to vector<someType>::resize() at program start, since in my case someType will be many different things, most of them incoming as new modules get sent to the system over time.
Quote:Original post by Prune
But why can't this page migration strategy the kernel uses to deal with physical memory fragmentation be used to deal with virtual memory fragmentation?
Because it would have to move all your pointers around. If you had "int *p = &myint" then when the virtual memory was shuffled around to make way for a new allocation, "myint" would be stored at a different address and so the value of "p" would have to be adjusted.
What you're asking for is a garbage collector. When the garbage collector runs, it stops your process, shuffles everything around, and then goes and fixes up all the pointers before restarting execution. That's why you get a pause when the GC runs (usually in the order of a few hundred milliseconds).
Quote:Original post by Prune
If I write a custom allocator, it's one thing as far as user code, but I don't see how I can force all the third party libraries to use that allocator as well.
Right, but you don't have to force everything to use it, just the stuff which does "big" allocations. You can pre-allocate a large block (say "physical memory - 1GB" or whatever), and do your big allocations out of that block. Let the other libraries do their own allocations.
Another option would be to switch to 64-bit, which has a much larger virtual address space (orders of magnitude larger) and so fragmentation is less of an issue (it's still an issue, but it might be enough). Personally, I'd do both.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement