Fragmentation in my own memory allocator for GPU memory


Hello.

I'm trying to get into Vulkan. According to the spec, it's bad practice to make a separate GPU memory allocation for each buffer/texture; instead I should write my own memory allocator and sub-allocate from larger blocks. Model data won't be a problem, but I use texture streaming, which requires lots and lots of allocations and deallocations very often, and performance is critical here. My plan is to allocate a fixed amount of memory up front (say 256-1024 MB, depending on the texture quality setting) and do my own sub-allocations from there. I've written my own allocator implementation that uses the best-fit algorithm: it finds the smallest free block the allocation fits into and allocates it there. I'm correctly merging freed blocks, and it all seems to work correctly and is fast enough for my uses.
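
For reference, a minimal sketch of the kind of best-fit sub-allocator with free-block merging described above; this is an assumption of what such an allocator might look like, not the poster's actual code, and all names are made up:

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>

// Hypothetical best-fit sub-allocator over one large device-memory block.
// Offsets index into a VkDeviceMemory allocation made elsewhere; this class
// never touches the memory itself, it only does the bookkeeping on the CPU.
class BestFitAllocator {
public:
    explicit BestFitAllocator(uint64_t capacity) { free_[0] = capacity; }

    // Returns the offset of the allocation, or nullopt if no free block fits.
    std::optional<uint64_t> allocate(uint64_t size) {
        // Best fit: scan for the smallest free block that is large enough.
        auto best = free_.end();
        for (auto it = free_.begin(); it != free_.end(); ++it)
            if (it->second >= size && (best == free_.end() || it->second < best->second))
                best = it;
        if (best == free_.end()) return std::nullopt;  // fragmentation or out of memory

        uint64_t offset = best->first;
        uint64_t remaining = best->second - size;
        free_.erase(best);
        if (remaining > 0) free_[offset + size] = remaining;  // split the block
        return offset;
    }

    void release(uint64_t offset, uint64_t size) {
        auto next = free_.lower_bound(offset);
        // Merge with the following free block if it is adjacent.
        if (next != free_.end() && offset + size == next->first) {
            size += next->second;
            next = free_.erase(next);
        }
        // Merge with the preceding free block if it is adjacent.
        if (next != free_.begin()) {
            auto prev = std::prev(next);
            if (prev->first + prev->second == offset) {
                prev->second += size;
                return;
            }
        }
        free_[offset] = size;
    }

private:
    std::map<uint64_t, uint64_t> free_;  // offset -> size, sorted by offset
};
```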

However, I am worried about fragmentation in the texture memory due to the random allocations and deallocations. I've written a test program that randomly allocates and deallocates "textures" (memory sizes of power-of-two textures with mipmaps), and I quite often get allocation errors due to fragmentation. Logging the failed allocation that occurred with the most free memory still left in the heap, I've seen 70 to 120 MB of a 1024 MB heap become unusable in the worst case after millions of allocations/frees. That's quite a big amount of wastage IMO.

> Is there any way I can improve my allocator to minimize fragmentation in the case of texture streaming?

> Is there any good way of dealing with fragmentation? Force-unloading some in-use textures to move them in memory? Defragmenting the memory in real time? For example, would it be possible to consolidate holes in the heap by copying textures around: moving small textures into holes without creating new ones, or even copying a texture into a hole and then back to a better location?


A common technique to help minimize it is to operate from different sides of the pool. Objects created as a stack pattern (always loaded, then in-game loaded, then level loaded, then room loaded, and unloaded always in reverse order) might be allocated from the low side of the pool, objects with more dynamic use patterns allocated from the high side of the pool.
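
For illustration, a minimal sketch of that two-sided scheme, assuming one pool addressed by byte offsets. The high side is shown as a stack too for brevity; in practice the dynamic side would run a general free-list allocator over its region. All names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical double-ended pool: stack-ordered lifetimes grow from the low
// end, dynamic allocations grow from the high end, and the two meet in the
// middle. Frees on the low side must happen in reverse allocation order.
class DoubleEndedPool {
public:
    explicit DoubleEndedPool(uint64_t capacity) : capacity_(capacity), high_(capacity) {}

    std::optional<uint64_t> allocLow(uint64_t size) {   // stack-like lifetimes
        if (low_ + size > high_) return std::nullopt;
        uint64_t offset = low_;
        low_ += size;
        return offset;
    }

    std::optional<uint64_t> allocHigh(uint64_t size) {  // dynamic lifetimes
        if (high_ - low_ < size) return std::nullopt;
        high_ -= size;
        return high_;
    }

    void freeLow(uint64_t size)  { assert(size <= low_); low_ -= size; }  // LIFO only
    void freeHigh(uint64_t size) { high_ += size; assert(high_ <= capacity_); }

private:
    uint64_t capacity_;
    uint64_t low_ = 0;   // next free offset on the low side
    uint64_t high_;      // first used offset on the high side
};
```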

When fragmentation gets bad enough, often the best way to fix it is simply to reload everything. Thankfully for graphics, needing to reload all your video card contents is something that can happen anyway, so triggering it intentionally can defragment your memory.

I'm not sure what Vulkan's method is for that, but since this happens at the hardware level I'm sure the drivers and API have a mechanism for handling it.

You can implement a defragmenter... but that requires a level of indirection to work properly. And I'd bet it's tricky to implement in the first place. I know nothing about Vulkan, but you could try this.

You could try mixing a stack allocator with a heap allocator. I've never seen this combination before, and this is off the top of my head. But the idea would be to treat your memory like a roll of projector film: you can splice and append things together, as well as remove pieces of bad film.

You'll end up wasting some space on bookkeeping, but for GPU memory, reserving 1 or 2 MB won't hurt, and this can be tuned later. The index entries can be structs with an unsigned integer ID; they store the size of an allocated block of memory and the offset to its start.

When you initially allocate, you start from the end of the stack and rise up. Your pages will be 4 bytes each. With each allocation you make, you add information about it to your allocator's index.

When you "delete" an item, you do not immediately delete it. Instead you simply mark that segment of memory for deletion.

When a new allocation comes in, you check through your index for anything that has been marked for deletion. If a marked segment is large enough to fit the inbound allocation, you place the allocation there and split the segment in two.

The difference between the "deleted" portion and your "new" portion becomes a smaller segment of marked "deleted" space.

Should an inbound allocation fail to fit in any deleted segment, you append it to the end of the stack.

When the time is convenient for the memory allocator, you defrag the memory.

Because you know how much deleted space and how much free space you have, it's a matter of swapping the pages' data and updating the index's offsets, starting from your first deleted segment and swapping data down until all of your free space is at the end.

This process might be terribly slow, though: 250 MB is 62,500,000 four-byte pages to move if the entire heap is defragmented.
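
For illustration, a minimal sketch of that index/defrag idea, assuming handle-based indirection so the compactor can move blocks and patch offsets. Reuse of marked-dead segments on allocation is omitted for brevity, the storage is a plain CPU buffer standing in for the GPU heap, and all names are hypothetical:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical handle-based allocator with deferred deletion and compaction.
// Users hold an index (handle) into 'entries_' and query the offset each time
// they need it, so the defragmenter is free to move blocks around.
struct Entry {
    uint64_t offset = 0;
    uint64_t size   = 0;
    bool     dead   = false;  // marked for deletion, not yet reclaimed
};

class CompactingArena {
public:
    explicit CompactingArena(uint64_t capacity) : storage_(capacity) {}

    uint32_t allocate(uint64_t size) {
        entries_.push_back({top_, size, false});
        top_ += size;                        // append at the end of the stack
        return uint32_t(entries_.size() - 1);
    }

    void markDead(uint32_t handle) { entries_[handle].dead = true; }

    uint64_t offsetOf(uint32_t handle) const { return entries_[handle].offset; }

    // Compaction: slide every live block down over the dead ones, patching
    // each entry's offset. Afterwards all free space sits at the end.
    void defragment() {
        uint64_t write = 0;
        for (Entry& e : entries_) {
            if (e.dead) continue;
            if (e.offset != write) {
                // memmove handles the overlapping downward copy correctly.
                std::memmove(&storage_[write], &storage_[e.offset], e.size);
                e.offset = write;
            }
            write += e.size;
        }
        top_ = write;
    }

private:
    std::vector<uint8_t> storage_;  // stand-in for the real GPU heap
    std::vector<Entry> entries_;
    uint64_t top_ = 0;              // high-water mark of the stack
};
```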

> When fragmentation gets bad enough, often the best way to fix it is simply to reload everything. Thankfully for graphics, needing to reload all your video card contents is something that can happen anyway, so triggering it intentionally can defragment your memory.


Reloading... I never thought about that. CURSE YOU OVER-COMPLICATION!

1. You can make your own defragmenter. An allocation is just an offset and a size; the defragmenter only needs to update those two fields for every allocation and memmove the data from the old region to the new one. Keep the offset and size private so this data can't be saved somewhere else and go out of sync: it must be queried from the allocation handle every time you need it.

2. Millions of allocations doesn't sound realistic to me.

3. At a higher level, batch allocations of similar size (e.g. don't follow a 2048x2048 texture with a 128x128 texture and then another 2048x2048 texture).

4. Make sure that when you've freed two contiguous regions, you properly merge them into one. This is very easy to get wrong.

5. Allocate smaller pools, and when you've run out of space in a pool, create another pool. You don't have to have just ONE big block; a few work too (see the sketch below).
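
A minimal sketch of point 5, assuming each pool wraps one device-memory block and a new pool is created on demand; the per-pool sub-allocator is reduced to a bump pointer for brevity, and all names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical "several pools" scheme: each Pool wraps one large device-memory
// block; when every existing pool is full, a new one is created on demand.
struct Pool {
    uint64_t capacity;
    uint64_t used = 0;  // bump pointer stands in for a real sub-allocator

    std::optional<uint64_t> allocate(uint64_t size) {
        if (used + size > capacity) return std::nullopt;
        uint64_t offset = used;
        used += size;
        return offset;
    }
};

struct Allocation {
    size_t   pool;    // which pool the block lives in
    uint64_t offset;  // offset within that pool's device memory
};

class PoolSet {
public:
    explicit PoolSet(uint64_t poolSize) : poolSize_(poolSize) {}

    Allocation allocate(uint64_t size) {
        assert(size <= poolSize_);  // one allocation must fit inside one pool
        for (size_t i = 0; i < pools_.size(); ++i)
            if (auto off = pools_[i].allocate(size))
                return {i, *off};
        // All pools full: create another one (this is where the real code
        // would call vkAllocateMemory for a fresh block).
        pools_.push_back(Pool{poolSize_});
        return {pools_.size() - 1, *pools_.back().allocate(size)};
    }

private:
    uint64_t poolSize_;
    std::vector<Pool> pools_;
};
```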

Sorry if I'm getting some of the terminology wrong here. I'm a bit outside my comfort zone. >___>

@frob

Yeah, I will try to keep memory with different lifetimes in different "pools", which in practice will be different hardware-allocated blocks, I guess. In most cases I know exactly how much memory I will need before I start allocating (model data, some static textures, etc.), so I can just calculate how much I need and allocate everything at once. Texture streaming is the only case where I'm dynamically doing things that require sub-allocations out of one big "real" allocation. I would rather avoid unloading all textures from VRAM as a "panic operation", as it could take almost a minute to reload 1 GB of texture data from disk, and there's a risk of thrashing when the heap is almost full, causing it to panic repeatedly.

@Tangletail

I do not have direct access to the memory in question. When I allocate GPU memory, I receive an "object" back from the Vulkan driver that represents the requested memory. If the computer has a discrete GPU with its own VRAM, the only way I can touch that memory is by mapping a CPU-side staging buffer and issuing a Vulkan command to copy its contents into the texture. The heap bookkeeping therefore has to live on the CPU.

@Matias
1. Again, I cannot access the memory directly. In Vulkan you first allocate device memory (VRAM) and then "bind" textures to a range inside the allocated device memory. This binding is permanent, so to move a texture I would need to create a new texture, copy the old texture's data to the new one (using Vulkan commands; the copy is entirely in device memory), wait for the GPU to finish the copy, wait for the GPU to finish using the old texture for rendering, THEN delete the original. Also, the source and destination ranges cannot overlap, so if a texture has to be shifted just a few bytes I would need to copy it to somewhere else, delete the original, then copy the copy to the correct place and delete the first copy.
2. No, I would have a few hundred, maybe a thousand at most. All mipmaps of a texture are stored together.
3. This is hard to avoid, but at least pretty much all textures will be 512x512 to 2048x2048. Note that four 512x512 textures do not require exactly the same amount of memory as one 1024x1024 due to mipmaps.
4. I've confirmed that my region merging is working 100% correctly in all special cases. It took some time to get right, yeah.
5. Wouldn't that just result in more wasted memory as I would start with smaller blocks instead of one big block? I have a fixed memory budget to maintain anyway.
EDIT: Hmm, it might be a good idea to try to force the memory allocations to be multiples of each other, so that (for example) a 2048x2048 texture is exactly 4x as big as a 1024x1024 texture. Currently a 2048x2048 is 4.000011x as big as a 1024x1024 texture. Padding textures to keep the relationship an integer might avoid cases where a texture barely fails to fit due to this, although according to my tests increasing block sizes just made the problems caused by fragmentation worse...
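
For what it's worth, a tiny sketch of that padding idea: round every allocation up to a multiple of a fixed quantum so that block sizes stay integer multiples of each other and freed holes can be reused exactly by smaller textures. The quantum value is illustrative, not from the thread:

```cpp
#include <cstdint>

// Hypothetical padding helper: round every texture allocation up to a
// multiple of a fixed quantum so that all block sizes are integer multiples
// of each other. In practice the quantum might be the padded footprint of
// the smallest streamed texture's mip chain.
constexpr uint64_t kQuantum = 512ull * 1024;  // 512 KiB, illustrative value

constexpr uint64_t paddedSize(uint64_t byteSize) {
    return (byteSize + kQuantum - 1) / kQuantum * kQuantum;  // round up
}
```

Usage would be something like passing VkMemoryRequirements::size through paddedSize() before handing the request to the sub-allocator.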
If you implement texture streaming, consider using sparse texture functionality. I haven't used it so far, but as far as I understand you upload chunks of 64 KB and bind them to the texture of your choice. This way you should be able to store a texture in several non-contiguous parts of memory.

Sparse textures would be great, but they're sadly not something I can rely on for compatibility reasons. Not even all implementations of Vulkan support sparse textures.

disclaimer: I'm not familiar at all with Vulkan yet


> According to the spec, it's bad practice to make a separate GPU memory allocation for each buffer/texture; instead I should write my own memory allocator.

If Vulkan's allocation functions behave like cuMemAlloc (from CUDA) did, simply returning pointers to driver-provided pages of greatly varying size, then using them directly could lead to fragmentation very quickly. If they behave more like common implementations of malloc, then I doubt you would run into fragmentation that often (at least not with textures).

If moving memory around is acceptable to you, you can pre-emptively reduce the risk of fragmentation by moving isolated allocations next to contiguous ones. This check can be performed when freeing an allocation. You might also periodically check whether any very small allocation with unallocated memory on either side can be moved into a small open space that has allocations on both sides. And always check all free blocks for the smallest one that fits; don't just use the first free block.

You could consider a pooled allocator per texture dimension: allocate one pool for 512x512 textures and another for your 1024x1024 textures. All allocations within a pool then use the same block size, so you don't have to worry about merging chunks together or other complex bookkeeping. Fragmentation only really exists in the sense that a pool might have (as yet) unused space; apart from that it's a non-issue (see the sketch below).

The main thing you have to worry about is tuning the pool capacities appropriately, but if you run out of space in a pool you can just allocate another one.
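
A minimal sketch of such a fixed-size pool, assuming one pool per texture dimension addressed by byte offsets; names are hypothetical:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-size pool: every slot has the same size (e.g. the
// footprint of one 1024x1024 mip chain), so any freed slot fits any future
// allocation and merging is unnecessary by construction.
class FixedSizePool {
public:
    FixedSizePool(uint64_t blockSize, uint32_t blockCount)
        : blockSize_(blockSize) {
        freeList_.reserve(blockCount);
        for (uint32_t i = blockCount; i-- > 0;) freeList_.push_back(i);
    }

    std::optional<uint64_t> allocate() {             // returns a byte offset
        if (freeList_.empty()) return std::nullopt;  // pool full: make another pool
        uint32_t block = freeList_.back();
        freeList_.pop_back();
        return uint64_t(block) * blockSize_;
    }

    void release(uint64_t offset) {
        freeList_.push_back(uint32_t(offset / blockSize_));
    }

private:
    uint64_t blockSize_;
    std::vector<uint32_t> freeList_;  // indices of free blocks
};
```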

If you are streaming textures, won't they always be the same size?
Why would you (de)allocate and not just overwrite the same locations?

If you really have to have a rolling stream then can a circular buffer solve the problem?
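
For illustration, a minimal sketch of such a circular buffer, assuming regions are retired strictly in FIFO order (e.g. once the GPU fence for the batch that used them has signalled); names are hypothetical:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical ring-buffer allocator for streamed uploads: regions are
// reclaimed strictly in FIFO order, so freeing is just advancing a cursor.
class RingAllocator {
public:
    explicit RingAllocator(uint64_t capacity) : capacity_(capacity) {}

    std::optional<uint64_t> allocate(uint64_t size) {
        uint64_t offset = head_ % capacity_;
        if (offset + size > capacity_) {   // doesn't fit before the wrap point:
            head_ += capacity_ - offset;   // waste the tail padding and wrap
            offset = 0;
        }
        if (head_ + size - tail_ > capacity_) return std::nullopt;  // buffer full
        head_ += size;
        return offset;
    }

    // Snapshot the cursor after submitting a batch...
    uint64_t cursor() const { return head_; }
    // ...and retire everything older than that snapshot once it's safe.
    void retireTo(uint64_t snapshot) { tail_ = snapshot; }

private:
    uint64_t capacity_;
    uint64_t head_ = 0;  // monotonically increasing write cursor
    uint64_t tail_ = 0;  // monotonically increasing retire cursor
};
```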

If that can't then you have to choose between pool allocation and defragmenting.

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara

