Back to General and Gameplay Programming

C++: Custom memory allocation

Programming

General and Gameplay Programming

Published April 15, 2013 by Tiago Costa, posted by Aqua Costa

Do you see issues with this article? Let us know.

Fast memory allocations along with memory leak detection can have a big impact on games performance. C++ provides two well known functions to allocate dynamic (heap) memory (malloc and new), these functions are usually very slow because they're general purpose functions and in some implementations require a context-switch from user mode into kernel mode. These functions also do not provide any kind of memory leak detection system natively. Using custom allocators we can have well defined usage patterns and optimize the allocation process accordingly.

Source code is available in the Aqua Engine at https://github.com/tiagovcosta/aquaengine (under Core/Allocators).

Base Allocator

Every allocator in this articles series will be derived from the class Allocator that declares 2 virtual functions (allocate and deallocate) that must be defined by each allocator.

Allocator.h


class Allocator 
{
  public: 
   
  Allocator(size_t size, void* start) 
    {
      _start = start; 
      _size = size; 
      _used_memory = 0; 
      _num_allocations = 0; 
    } 
  
  virtual ~Allocator() 
  { 
    ASSERT(_num_allocations == 0 && _used_memory == 0); 
    _start = nullptr; _size = 0; 
  } 
  
  virtual void* allocate(size_t size, u8 alignment = 4) = 0; 
  virtual void deallocate(void* p) = 0; 
  void* getStart() const { return _start; } 
  size_t getSize() const { return _size; } 
  size_t getUsedMemory() const { return _used_memory; } 
  size_t getNumAllocations() const { return _num_allocations; } 
  
  protected: 
  
  void* _start; 
  size_t _size; 
  size_t _used_memory; 
  size_t _num_allocations; 
}; 

namespace allocator 
{ 
  template <class T> T* allocateNew(Allocator& allocator) 
  {
    return new (allocator.allocate(sizeof(T), __alignof(T))) T; 
  } 
  
  template <class T> T* allocateNew(Allocator& allocator, const T& t) 
  {
    return new (allocator.allocate(sizeof(T), __alignof(T))) T(t);
  }
  
  template <class T> void deallocateDelete(Allocator& allocator, T& object)
  {
    object.~T(); 
    allocator.deallocate(&object);
  }
  
  template <class T> T* allocateArray(Allocator& allocator, size_t length)
  {
    ASSERT(length != 0); 
    u8 headerSize = sizeof(size_t)/sizeof(T); 
    
    if(sizeof(size_t)%sizeof(T) > 0) headerSize += 1; 
    
    //Allocate extra space to store array length in the bytes before the array 
    T* p = ( (T*) allocator.allocate(sizeof(T)*(length + headerSize), __alignof(T)) ) + headerSize;
    *( ((size_t*)p) - 1 ) = length;
    
    for (size_t i = 0; i < length; i++) 
      new (&p) T; 
    
    return p;
  }
  
  template <class T> void deallocateArray(Allocator& allocator, T* array) 
  {
    ASSERT(array != nullptr); 
    size_t length = *( ((size_t*)array) - 1 ); 
    
    for (size_t i = 0; i < length; i++) array.~T(); 
    
    //Calculate how much extra memory was allocated to store the length before the array 
    u8 headerSize = sizeof(size_t)/sizeof(T); 
    if(sizeof(size_t)%sizeof(T) > 0) 
      headerSize += 1; 
    allocator.deallocate(array - headerSize); 
  }
};

Memory leak detection

In the code above you can see an assert in the destructor, this is a simple and easy way to check if you forgot to deallocate any memory, that won't cause any overhead or take any extra memory. This simple method won't tell which allocation you forgot to deallocate but it will pin point in which allocator the leak occured so you can find the leak faster (especially if you use Proxy Allocators like I suggest later in this article).

Aligned Allocations

Processors access memory in word-sized blocks, so when a processor tries to access memory in an unaligned address it might have to access more word-sized memory blocks than would be needed if the memory was aligned and perform masking/shifting to get the required data in the register. Example: A processor accesses memory in 4-byte words (it can only directly access the words starting at (0x00, 0x04, 0x08, 0x0C,...). If it needs to access data (4 bytes) stored at the address 0x0B it will have to read two word-sized blocks (the address 0x08 and the address 0x0C) because the data crosses the boundary from one of the blocks to the other:

If the data was stored in an aligned address like 0x0C the processor could read it in a single memory access:

Aligned data definition

Primitive data is said to be aligned if the memory address where it is stored is a multiple of the size of the primitive. A data aggregate is said to be aligned if each primitive element in the aggregate is aligned.

Implementation

To n-byte align a memory address x we need to mask off the log2(n) least significant bits from x. Simply masking off bits will return the first n-byte aligned address before x, so in order to find the first after x we just need to add alignment-1 to x and mask that address.


inline void* alignForward(void* address, u8 alignment) 
{
  return (void*)( ( reinterpret_cast<u8*>(address) + static_cast<u8>(alignment-1) ) & static_cast<u8>(~(alignment-1)) );
}

It can be useful to calculate by how many bytes the address needs to adjusted to be aligned.


inline u8 alignForwardAdjustment(const void* address, u8 alignment) 
{ 
  u8 adjustment = alignment - ( reinterpret_cast<u8*>(address) & static_cast<u8*>(alignment-1) ); 
  
  if(adjustment == alignment) return 0; 
  
  //already aligned 
  return adjustment; 
}

Some allocators need to store an header before each allocation so they can use the adjustment space to reduce the memory overhead caused by the headers.


inline u8 alignForwardAdjustmentWithHeader(const void* address, u8 alignment, u8 headerSize) 
{
  u8 adjustment = alignForwardAdjustment(address, alignment); 
  u8 neededSpace = headerSize; 
  
  if(adjustment < neededSpace) 
  {
    neededSpace -= adjustment; 
    
    //Increase adjustment to fit header 
    adjustment += alignment * (neededSpace / alignment); 
    
    if(neededSpace % alignment > 0) adjustment += alignment;
  }
  
  return adjustment; 
}

The alignment must be a power of 2!

Linear Allocator

A Linear Allocator is the simplest and fastest type of allocator. Pointers to the start of the allocator, to the first free address and the total size of the allocator are maintained.

Allocations

New allocations simply move the pointer to the first free address forward.

Deallocations

Individual deallocations cannot be made in linear allocators, instead use clear() to completely clear the memory used by the allocator.

Implementation

LinearAllocator.h


#include "Allocator.h" 
  
class LinearAllocator : public Allocator 
{
  public: 
  
  LinearAllocator(size_t size, void* start); 
  ~LinearAllocator(); 
  
  void* allocate(size_t size, u8 alignment) override; 
  void deallocate(void* p) override; 
  void clear(); 
  
  private: 
  
  LinearAllocator(const LinearAllocator&); 
  
  //Prevent copies because it might cause errors 
  LinearAllocator& operator=(const LinearAllocator&); 
  void* _current_pos; 
};

LinearAllocator.cpp


#include "LinearAllocator.h" 
  
LinearAllocator::LinearAllocator(size_t size, void* start) : Allocator(size, start), _current_pos(start) { ASSERT(size > 0); }

LinearAllocator::~LinearAllocator() { _current_pos = nullptr; } 

void* LinearAllocator::allocate(size_t size, u8 alignment) 
{
  ASSERT(size != 0); 
  u8 adjustment = pointer_math::alignForwardAdjustment(_current_pos, alignment); 
  
  if(_used_memory + adjustment + size > _size) return nullptr; 
  
  uptr aligned_address = (uptr)_current_pos + adjustment; 
  _current_pos = (void*)(aligned_address + size); 
  _used_memory += size + adjustment; 
  _num_allocations++; 
  
  return (void*)aligned_address;
}

void LinearAllocator::deallocate(void* p) 
{
  ASSERT( false && "Use clear() instead" );
}

void LinearAllocator::clear() 
{
  _num_allocations = 0; 
  _used_memory = 0;
  _current_pos = _start;
}

Stack Allocator

A Stack Allocator, like the name says, works like a stack. Along with the stack size, three pointers are maintained:

Pointer to the start of the stack.
Pointer to the top of the stack.
Pointer to the last allocation made. (This is optional in release builds)

Allocations

New allocations move the pointer up by the requested number of bytes plus the adjustment needed to align the address and store the allocation header. The allocation header provides the following information:

Adjustment used in this allocation
Pointer to the previous allocation.

Deallocations

Memory must be deallocated in inverse order it was allocated! So if you allocate object A and then object B you must free object B memory before you can free object A memory.

To deallocate memory the allocator checks if the address to the memory that you want to deallocate corresponds to the address of the last allocation made. If so the allocator accesses the allocation header so it also frees the memory used to align the allocation and store the allocation header, and it replaces the pointer to the last allocation made with the one in the allocation header.

Implementation

StackAllocator.h


#include "Allocator.h" 
  
class StackAllocator : public Allocator 
{
  public: 
  
  StackAllocator(size_t size, void* start); 
  ~StackAllocator(); 
  
  void* allocate(size_t size, u8 alignment) override; 
  void deallocate(void* p) override; 
  
  private: 
  
  StackAllocator(const StackAllocator&); 
  
  //Prevent copies because it might cause errors 
  StackAllocator& operator=(const StackAllocator&); 
  
  struct AllocationHeader 
  {
    #if _DEBUG 
      void* prev_address; 
    #endif 
      
    u8 adjustment;
  };
  
  #if _DEBUG 
    void* _prev_position; 
  #endif 
  
  void* _current_pos; 
};

StackAllocator.cpp


#include "StackAllocator.h" 

StackAllocator::StackAllocator(size_t size, void* start) : Allocator(size, start), _current_pos(start) 
{
  ASSERT(size > 0); 
  #if _DEBUG 
    _prev_position = nullptr; 
  #endif 
}

StackAllocator::~StackAllocator() 
{
  #if _DEBUG 
    _prev_position = nullptr;
  #endif 
    
  _current_pos = nullptr; 
}

void* StackAllocator::allocate(size_t size, u8 alignment) 
{
  ASSERT(size != 0); 
  u8 adjustment = pointer_math::alignForwardAdjustmentWithHeader(_current_pos, alignment, sizeof(AllocationHeader)); 
  
  if(_used_memory + adjustment + size > _size) return nullptr; 
  
  void* aligned_address = pointer_math::add(_current_pos, adjustment); 
  
  //Add Allocation Header 
  AllocationHeader* header = (AllocationHeader*)(pointer_math::subtract(aligned_address, sizeof(AllocationHeader))); 
  header->adjustment = adjustment; 
  
  #if _DEBUG 
    header->prev_address = _prev_position;
    _prev_position = aligned_address; 
  #endif 
  
  _current_pos = pointer_math::add(aligned_address, size); 
  _used_memory += size + adjustment; 
  _num_allocations++; 
  
  return aligned_address; 
}

void StackAllocator::deallocate(void* p) 
{
  ASSERT( p == _prev_position ); 
  
  //Access the AllocationHeader in the bytes before p 
  AllocationHeader* header = (AllocationHeader*)(pointer_math::subtract(p, sizeof(AllocationHeader))); 
  _used_memory -= (uptr)_current_pos - (uptr)p + header->adjustment; 
  _current_pos = pointer_math::subtract(p, header->adjustment); 
  
  #if _DEBUG 
    _prev_position = header->prev_address; 
  #endif 
  
  _num_allocations--; 
}

Storing the last previous allocations in a list-like fashion and checking it before deallocations is not mandatory so it can be disabled in release builds. It's just helpful to prevent memory from being overwritten causing bugs.

FreeList Allocator

The FreeList allocator allows allocations of any size to be made (inside the available memory) and deallocations in any order. A linked-list of free blocks of memory is maintained (each free block contains information about its size and a pointer to the next free block).

Allocations

The allocator tries to find a free block large enough for the allocation to fit, if it finds multiple free blocks that meet the requeriments, there's 3 simple ways to decide which free block to choose:

First-fit - Use the first.
Best-fit - Use the smallest.
Worst-fit - Use the largest.

The best-fit method will in most cases cause less fragmentation than the other 2 methods.

In the example implementation below I use the first-fit method.

Deallocation

The allocator keeps the free blocks orderer by the start position. When an allocation is freed the allocator finds the right position in the free blocks list and tries to merge it with the adjacent blocks.

Implementation

FreeListAllocator.h


#include "Allocator.h" 
  
class FreeListAllocator : public Allocator 
{
  public: 
  
  FreeListAllocator(size_t size, void* start); 
  ~FreeListAllocator(); 
  
  void* allocate(size_t size, u8 alignment) override; 
  void deallocate(void* p) override; 
  
  private: 
  
  struct AllocationHeader { size_t size; u8 adjustment; }; 
  struct FreeBlock { size_t size; FreeBlock* next; }; 
  FreeListAllocator(const FreeListAllocator&); 
  
  //Prevent copies because it might cause errors 
  FreeListAllocator& operator=(const FreeListAllocator&); 
  FreeBlock* _free_blocks; 
};

FreeListAllocator.cpp


#include "FreeListAllocator.h" 
  
FreeListAllocator::FreeListAllocator(size_t size, void* start) : Allocator(size, start), _free_blocks((FreeBlock*)start) 
{
  ASSERT(size > sizeof(FreeBlock)); 
  _free_blocks->size = size; 
  _free_blocks->next = nullptr; 
}

FreeListAllocator::~FreeListAllocator() 
{
  _free_blocks = nullptr; 
}

void* FreeListAllocator::allocate(size_t size, u8 alignment) 
{
  ASSERT(size != 0 && alignment != 0); 
  FreeBlock* prev_free_block = nullptr; 
  FreeBlock* free_block = _free_blocks; 
  
  while(free_block != nullptr) 
  {
    //Calculate adjustment needed to keep object correctly aligned 
    u8 adjustment = pointer_math::alignForwardAdjustmentWithHeader(free_block, alignment, sizeof(AllocationHeader)); 
    size_t total_size = size + adjustment; 
    
    //If allocation doesn't fit in this FreeBlock, try the next 
    if(free_block->size < total_size) 
    {
      prev_free_block = free_block; 
      free_block = free_block->next; 
      continue;
    } 
    
    static_assert(sizeof(AllocationHeader) >= sizeof(FreeBlock), "sizeof(AllocationHeader) < sizeof(FreeBlock)"); 
    
    //If allocations in the remaining memory will be impossible 
    if(free_block->size - total_size <= sizeof(AllocationHeader)) 
    {
      //Increase allocation size instead of creating a new FreeBlock 
      total_size = free_block->size; 
      
      if(prev_free_block != nullptr) 
        prev_free_block->next = free_block->next; 
      else 
        _free_blocks = free_block->next; 
    }
    else
    {
      //Else create a new FreeBlock containing remaining memory 
      FreeBlock* next_block = (FreeBlock*)( pointer_math::add(free_block, total_size) );
      
      next_block->size = free_block->size - total_size;
      next_block->next = free_block->next; 
      
      if(prev_free_block != nullptr) 
        prev_free_block->next = next_block; 
      else 
        _free_blocks = next_block;
    } 
    
    uptr aligned_address = (uptr)free_block + adjustment; 
    AllocationHeader* header = (AllocationHeader*)(aligned_address - sizeof(AllocationHeader)); 
    header->size = total_size; 
    header->adjustment = adjustment; 
    _used_memory += total_size; 
    _num_allocations++; 
    
    ASSERT(pointer_math::alignForwardAdjustment((void*)aligned_address, alignment) == 0); 
    
    return (void*)aligned_address; 
  } 
  
  //ASSERT(false && "Couldn't find free block large enough!"); 
  return nullptr; 
}

void FreeListAllocator::deallocate(void* p) 
{
  ASSERT(p != nullptr); 
  AllocationHeader* header = (AllocationHeader*)pointer_math::subtract(p, sizeof(AllocationHeader)); 
  uptr block_start = reinterpret_cast<uptr>(p) - header->adjustment; 
  size_t block_size = header->size; 
  uptr block_end = block_start + block_size; 
  FreeBlock* prev_free_block = nullptr; 
  FreeBlock* free_block = _free_blocks; 
  
  while(free_block != nullptr) 
  {
    if( (uptr) free_block >= block_end ) break; 
    prev_free_block = free_block; 
    free_block = free_block->next;
  }
  
  if(prev_free_block == nullptr) 
  {
    prev_free_block = (FreeBlock*) block_start; 
    prev_free_block->size = block_size; 
    prev_free_block->next = _free_blocks; 
    _free_blocks = prev_free_block; 
  }
  else if((uptr) prev_free_block + prev_free_block->size == block_start) 
  {
    prev_free_block->size += block_size; 
  }
  else 
  {
    FreeBlock* temp = (FreeBlock*) block_start; 
    temp->size = block_size; 
    temp->next = prev_free_block->next; 
    prev_free_block->next = temp; 
    prev_free_block = temp; 
  }
  
  if( free_block != nullptr && (uptr) free_block == block_end) 
  {
    prev_free_block->size += free_block->size; 
    prev_free_block->next = free_block->next;
  }
  
  _num_allocations--; 
  _used_memory -= block_size; 
}

Pool Allocator

This allocator only allows allocations of a fixed size and alignment to be made, this results in both fast allocations and deallocations to be made. Like the FreeList allocator, a linked-list of free blocks is maintaied but since all blocks are the same size each free block only needs to store a pointer to the next one. Another advantage of Pool allactors is no need to align each allocation, since all allocations have the same size/alignment only the first block has to be aligned, this results in a almost non-existant memory overhead.

The block size of the Pool Allocator must be larger than sizeof(void*) because when blocks are free they store a pointer to the next free block in the list.

Allocations

The allocator simply returns the first free block and updates the linked list.

Deallocations

The allocator simply adds the deallocated block to the free blocks linked list.

Implementation

PoolAllocator.h


#include "Allocator.h" 

class PoolAllocator : public Allocator 
{ 
  public: 
  
  PoolAllocator(size_t objectSize, u8 objectAlignment, size_t size, void* mem);
  ~PoolAllocator();
  void* allocate(size_t size, u8 alignment) override;
  void deallocate(void* p) override; 
  
  private: 
  
  PoolAllocator(const PoolAllocator&); 
  
  //Prevent copies because it might cause errors 
  PoolAllocator& operator=(const PoolAllocator&); size_t _objectSize; 
  
  u8 _objectAlignment; 
  void** _free_list;
};

PoolAllocator.cpp


#include "PoolAllocator.h" 
  
PoolAllocator::PoolAllocator(size_t objectSize, u8 objectAlignment, size_t size, void* mem) : Allocator(size, mem), _objectSize(objectSize), _objectAlignment(objectAlignment) 
{
  ASSERT(objectSize >= sizeof(void*)); 
  
  //Calculate adjustment needed to keep object correctly aligned 
  u8 adjustment = pointer_math::alignForwardAdjustment(mem, objectAlignment); 
  _free_list = (void**)pointer_math::add(mem, adjustment); 
  size_t numObjects = (size-adjustment)/objectSize; 
  void** p = _free_list; 
  
  //Initialize free blocks list 
  for(size_t i = 0; i < numObjects-1; i++) 
  {
    *p = pointer_math::add(p, objectSize ); 
    p = (void**) *p;
  }
  
  *p = nullptr;
}

PoolAllocator::~PoolAllocator() 
{
  _free_list = nullptr;
}

void* PoolAllocator::allocate(size_t size, u8 alignment) 
{
  ASSERT(size == _objectSize && alignment == _objectAlignment); 
  if(_free_list == nullptr) return nullptr; 
  void* p = _free_list;
  _free_list = (void**)(*_free_list);
  _used_memory += size;
  _num_allocations++; 
  return p; 
}

void PoolAllocator::deallocate(void* p) 
{
  *((void**)p) = _free_list; 
  _free_list = (void**)p; 
  _used_memory -= _objectSize; 
  _num_allocations--;
}

Proxy Allocator

A Proxy Allocator is a special kind of allocator. It is just used to help with memory leak and subsystem memory usage tracking. It will simply redirect all allocations/deallocations to the allocator passed as argument in the constructor while keeping track of how many allocations it made and how much memory it is "using". Example: Two subsystems use the same allocator A. If you want to show in the debugging user interface how much memory each subsystem is using, you create a proxy allocator, that redirects all allocations/deallocations to A, in each subsystem and track their memory usage. It will also help in memory leak tracking because the assert in the proxy allocator destructor of the subsystem that is leaking memory will fail.

Implementation

ProxyAllocator.h


#include "Allocator.h" 
  
class ProxyAllocator : public Allocator 
{
  public: 
  
  ProxyAllocator(Allocator& allocator); 
  ~ProxyAllocator(); 
  void* allocate(size_t size, u8 alignment) override; 
  void deallocate(void* p) override; 
  
  private: 
  
  ProxyAllocator(const ProxyAllocator&); 
  
  //Prevent copies because it might cause errors 
  ProxyAllocator& operator=(const ProxyAllocator&); 
  Allocator& _allocator; 
};

ProxyAllocator.cpp


#include "ProxyAllocator.h" 
  
ProxyAllocator::ProxyAllocator(Allocator& allocator) : Allocator(allocator.getSize(), allocator.getStart()), _allocator(allocator) { } 

ProxyAllocator::~ProxyAllocator() { } 

void* ProxyAllocator::allocate(size_t size, u8 alignment) 
{
  ASSERT(size != 0); 
  _num_allocations++; 
  size_t mem = _allocator.getUsedMemory(); 
  
  void* p = _allocator.allocate(size, alignment);
  _used_memory += _allocator.getUsedMemory() - mem;
  return p; 
}

void ProxyAllocator::deallocate(void* p)
{
  _num_allocations--; 
  size_t mem = _allocator.getUsedMemory(); 
  _allocator.deallocate(p); 
  _used_memory -= mem - _allocator.getUsedMemory();
}

Allocator Managment

A large block of memory should be allocated when the program starts using malloc (and this should be the only malloc made) this large block of memory is managed by a global allocator (for example a stack allocator). Each subsystem should then allocate the block of memory it needs to work from the global allocator, and create allocators that will manage that memory.

Example usage

Allocate 1GB of memory using malloc and create a FreeList allocator to manage that memory.
Create a Proxy allocator that redirects all allocations to the FreeList allocator.
Initialize the Resource Manager by passing a pointer to the Proxy allocator in the constructor.
Register the Proxy allocator in the memory usage tracker so it shows how much memory the Resource Manager is using.
Allocate 16MB of memory using the FreeList allocator and create a Linear allocator to manage that memory and register it in the memory usage tracker.
Use the Linear allocator to make small temporary allocations needed for game logic, etc, and clear it before the end of each frame.
The Resource Manager will create a Pool allocator for every ResourcePackage it loads.

Tips & Tricks

Depending on the type of allocator, keep the number of individual allocations to a minimum to reduce the memory wasted by allocation headers.
Prefer using allocateArray() to individual allocations when it makes sense. Most allocators will use extra memory in each allocation to store allocation headers and arrays will only need single header.
Instead of making small size allocations from allocators with large amounts of memory available, allocate a single memory block capable of holding all the small allocations and create a new allocator to manage the memory block and make the small allocations from this block.

Performance Comparison

To test the performance of each allocator compared to malloc I wrote a program that measures how long it takes to make 20000 allocations (you can download the program in the end of the article), the tests where made in release mode and the results are averages of 3 runs.

Malloc vs Linear Allocator

10k 16 bytes allocations + 1k 256 bytes allocations + 50 2Mb allocations/deallocations (allocations made using the linear allocator are deallocated in a single call to clear().

Allocator Time (s) Malloc 0.639655 Linear 0.000072

Malloc vs Stack Allocator

10k 16 bytes allocations + 1k 256 bytes allocations + 50 2Mb allocations/deallocations

Allocator Time (s) Malloc 0.650435 Stack 0.000289

Malloc vs FreeList Allocator

10k 16 bytes allocations + 1k 256 bytes allocations + 50 2Mb allocations/deallocations

Allocator Time (s) Malloc 0.673865 FreeList 0.000414

Malloc vs Pool Allocator

20k 16 bytes allocations/deallocations

Allocator Time (s) Malloc 1.454934 Pool 0.000193

Conclusion

There isn't a single best allocator - it's important to think about how the memory will be allocated/accessed/deallocated and choose the right allocator for each situation. Full source code and performance tests in the attached file

Reference

http://bitsquid.blogspot.pt/2010/09/custom-memory-allocation-in-c.html Game Engine Architecture, Jason Gregory 2009 http://molecularmusings.wordpress.com/2011/07/05/memory-system-part-1/

Article Update Log

04 March 2014: Rewritten FreeListAllocator, added support to x64. (Also updated code style and added a new test).

16 November 2013: Fixed two errors in FreeListAllocator.cpp lines 48 and 60

14 April 2013: Fixed an error in allocateArray

13 April 2013: FreeList and Pool allocators added

12 April 2013: Performance comparison & test project added

11 April 2013: Initial release

10 Likes 39 Comments

Comments

yckx

I hate to nitpick, but the example memory addresses look like they should be hexadecimal, but aren't. 0x12 should be 0x0C, etc.

Otherwise, it's a decent read so far.

April 11, 2013 01:37 AM

Michael Tanczos

I hate to nitpick, but the example memory addresses look like they should be hexadecimal, but aren't. 0x12 should be 0x0C, etc.

Otherwise, it's a decent read so far.

Good catch..

April 11, 2013 04:21 AM

rozz666

It is forbidden to start identifier names with an underscore followed by a capital letter (e.g. _UsedMemory, _NumAllocations).

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

April 11, 2013 05:37 AM

cr88192

nitpick:

the implication that "new" and "malloc" involve (always) going into the kernel isn't really correct (with most sane implementations, luckily they aren't that bad).

these will typically operate in-process and maintain their own free-lists, and typically grab/release "large" (say: 256kB or 1MB) chunks of memory from the OS via lower-level system calls (typically things like "mmap()" or "VirtualAlloc()").

system calls are not used in cases where the needed memory is already in the free-list, or is being returned to the free-list.

so, the only times these will usually make system calls is if more memory needs to be grabbed from the OS, or if a particularly large memory object has just been allocated or freed (such as large array or similar).

ADD: issue seems have been partly fixed.

April 11, 2013 06:25 AM

Aqua Costa

The article isn't finished!

I still have to write about FreeList and Pool allocators, upload the code and update the performance tables (these are only placeholders), etc

It is forbidden to start identifier names with an underscore followed by a capital letter (e.g. _UsedMemory, _NumAllocations).

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

I wasn't aware it was forbidden.

If the derived classes do not update these variables, maybe that's because they provide their own leak detection system?

I hate to nitpick, but the example memory addresses look like they should be hexadecimal, but aren't. 0x12 should be 0x0C, etc.

Otherwise, it's a decent read so far.

Good catch indeed, I can't believe I missed that...

April 11, 2013 07:52 AM

BitMaster

If this is implemented via actual polymorphism as shown so far, then ~Allocator must be declared virtual. Aside from that, a lot of the 'work' done in the allocator destructors seems useless. Setting values to 0/nullptr would be at best useful as a debugging help - but the MSVC debug runtime for example would already overwrite the memory immediately afterwards anyway.

April 11, 2013 10:59 AM

tivolo

Just wanted to point out that there's two bugs in the implementation.

The first one is in

template<class T> T* allocateArray(uint maxNumObjects):

Allocating sizeof(T)*maxNumObjects is not enough for arrays of non-PODs! Most compilers (all compilers I've ever worked with, including consoles/handhelds) store the number of instances right before the start of the array. So in most cases (depending on the type T) you might need sizeof(T)*maxNumObjects + sizeof(uint32_t), but you can never be too sure about that.

I've written about that exact problem in more detail on my blog:

http://molecularmusings.wordpress.com/2011/07/05/memory-system-part-1/

If you use the implementation above, you'll overwrite memory as soon as you want to allocate an array of non-PODs. This can easily be verified using the memory window of your favourite debugger.

The second bug is in

template<class T> void deallocateDelete(T* pObject):

This one also has to do with arrays, namely that the destructors for the instances in the array aren't called. You need to call the destructors of all instances in reverse order. Again, for reasons stated above this can be tricky, because you need to know the number of instances stored in the array.

I know that the article isn't finished yet, just wanted to give you a heads-up :).

April 11, 2013 11:37 AM

Datamancer

It is forbidden to start identifier names with an underscore followed by a capital letter (e.g. _UsedMemory, _NumAllocations).

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

Starting functions, etc with an underscore is generally frowned upon, but it is not forbidden. Most compilers reserve the prefix underscore.

April 11, 2013 03:49 PM

cr88192

FWIW:

in my case I am actually using spans of "cells" (currently 16-byte) and bitmaps, for small objects in my allocator (also a GC also does leak detection and sometimes helps detect overruns). free-lists are also used for small-objects (based on the objects' cell-count), before scanning for free-cells in the bitmap. currently, memory for this is allocated in chunks of 16MB on desktop.

this is mostly for objects < 6KiB, past 6KiB and it resorts to a more traditional allocation strategy (free lists / etc). this is partly because, at least in my case, small objects tend to dominate in terms of total memory allocations.

past around 6kB, and the bitmaps become expensive (and a block-carving first-fit strategy becomes preferable, up until around several MB, where raw pages become preferable).

IME: the 16KiB-48KiB range tends to dominate in terms of total memory usage, but drops off quickly as objects get larger.

granted, I guess also possible could be using larger cells (say, 512 byte), then use exact-sized free-lists up to 1MB or similar (6KiB - 1MB), before resorting to OS pages.

April 11, 2013 04:45 PM

rozz666

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

If the derived classes do not update these variables, maybe that's because they provide their own leak detection system?

That's why the base class shouldn't assume anything about leak detection.

April 11, 2013 04:48 PM

rozz666

It is forbidden to start identifier names with an underscore followed by a capital letter (e.g. _UsedMemory, _NumAllocations).

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

Starting functions, etc with an underscore is generally frowned upon, but it is not forbidden. Most compilers reserve the prefix underscore.

Starting with an underscore and a lower case letter is permitted, but an underscore followed by a capital letter is reserved for the compiler implementation.

April 11, 2013 04:50 PM

cr88192

It is forbidden to start identifier names with an underscore followed by a capital letter (e.g. _UsedMemory, _NumAllocations).

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

Starting functions, etc with an underscore is generally frowned upon, but it is not forbidden. Most compilers reserve the prefix underscore.

Starting with an underscore and a lower case letter is permitted, but an underscore followed by a capital letter is reserved for the compiler implementation.

both "_Capital" and "__whatever" are reserved. more often though, "_Capital" is used for the language-standards to add more keywords, where it is more "__whatever" being used by the compilers (for example: "_Complex" or "_Generic" vs "__declspec" or "__attribute__").

April 11, 2013 05:21 PM

rozz666

It is forbidden to start identifier names with an underscore followed by a capital letter (e.g. _UsedMemory, _NumAllocations).

Also, leak detection (ASSERT(_NumAllocations == 0 && _UsedMemory == 0);) should be delegated to derived classes, since there is no guarantee that a derived class would update these variables.

Starting functions, etc with an underscore is generally frowned upon, but it is not forbidden. Most compilers reserve the prefix underscore.

Starting with an underscore and a lower case letter is permitted, but an underscore followed by a capital letter is reserved for the compiler implementation.

both "_Capital" and "__whatever" are reserved. more often though, "_Capital" is used for the language-standards to add more keywords, where it is more "__whatever" being used by the compilers (for example: "_Complex" or "_Generic" vs "__declspec" or "__attribute__").

Agreed.

April 11, 2013 07:39 PM

tivolo

Your math seems to be off in allocateArray. Consider a case with e.g. sizeof(T) == 8:

headerSize will be zero, so you don't allocate extra space for storing the array length, effectively overwriting memory.

Why not just always allocate an extra 4 bytes in order to store the length?

You could later always make an optimization and only store the length for types T that actually need to have their destructors called (similar to how the compiler does it, POD vs. non-POD).

April 14, 2013 10:55 AM

Aqua Costa

Your math seems to be off in allocateArray. Consider a case with e.g. sizeof(T) == 8:
headerSize will be zero, so you don't allocate extra space for storing the array length, effectively overwriting memory.

Yes I had a mistake in the line if(sizeof(T)%sizeof(u32) > 0) it should be if(sizeof(u32)%sizeof(T) > 0). It's now fixed. Thanks.

Why not just always allocate an extra 4 bytes in order to store the length?

Because if the alignment is larger than 4 bytes it would unalign the array.

April 14, 2013 01:09 PM

tivolo

Why not just always allocate an extra 4 bytes in order to store the length?

Because if the alignment is larger than 4 bytes it would unalign the array.

Well, you could always just make the allocate() method aware of that by passing an additional "offset" argument, so that "result ptr + offset" matches the alignment. This also has the benefit that you in general waste less space due to alignment.

Granted, it's more work for the individual allocators though.

April 14, 2013 04:28 PM

tivolo

Your math seems to be off in allocateArray. Consider a case with e.g. sizeof(T) == 8:
headerSize will be zero, so you don't allocate extra space for storing the array length, effectively overwriting memory.

Yes I had a mistake in the line if(sizeof(T)%sizeof(u32) > 0) it should be if(sizeof(u32)%sizeof(T) > 0). It's now fixed. Thanks.

You're welcome.

You should still call the destructors of the array's instances in reverse order to be correct, though. Hopefully nobody ever relies on that, but you never know :).

April 14, 2013 04:35 PM

Bush

Hi! Thanks to the author for the article!!!
Recently, I was concerned about finding memory leaks in my code. Other programmers advised to use different debuggers (vld, memcheck, deleaker, insure...) Some were advised not to use dynamic memory. But the method proposed in the article is different, very simple and I hope, effective enough.
Thanks, it is very helpful to me, +1 to the author

April 16, 2013 05:06 PM

Topiary

I read an article about smart pointers. I have convinced that using of smart pointers prevents errors and memory leaks.

April 17, 2013 08:29 AM

metsfan

I read an article about smart pointers. I have convinced that using of smart pointers prevents errors and memory leaks.

Smart pointers are a useful feature, but they aren't a solution to every problem. If every single one of your pointers is a smart pointer, you are definitly doing it wrong. You should only use shared (i.e. reference counted) pointers when you want two different objects to both be the "owner" of a pointer (i.e. shared ownership). This gives either object the ability to free the pointer, but only when both objects no longer reference it. In general though, you should try to create pointers that are owned by only one object, and deleted by only one object. This is a much better way to make sure memory is not leaked.

Also, just because you use smart pointers, it doesn't mean that you will never have a memory "leak". For instance, if you have two smart pointers, that both retain a reference to each other unless one of them frees the reference to the other, neither pointer will ever be released. So even with smart pointers, you still need to aware of what's going on.

April 17, 2013 05:38 PM

Topiary

I have to agree, smart pointers are not a panacea. I have a stock of various tools that help me to deal with errors and leaks in the code
thanks!

April 18, 2013 11:26 AM

termintator2012

if one of my programs does any closing of handles or freeing of memory within the program (versus at the exit point), I also have to set handle values back to INVALID_HANDLE_VALUE after closing and pointers back to NULL after freeing memory.

April 19, 2013 08:55 AM

Topiary

if one of my programs does any closing of handles or freeing of memory within the program (versus at the exit point), I also have to set handle values back to INVALID_HANDLE_VALUE after closing and pointers back to NULL after freeing memory.

In my opinion this is a good option, but not the most efficient. You can find more reliable methods higher

April 22, 2013 01:56 PM

captaincoder

Hi there

Thanks for writing this article - not much example code out there for custom allocators.

Just wanted to let you know there are serious issues with the FreeListAllocator and this code should NOT be used in production as it stands. By allocating blocks till the buffer is full, then random deallocating blocks before filling the buffer again, I can make the code either crash or loop forever whilst trying to allocate/deallocate blocks.

This replacement main() function illustrates the issue - you'll need to remove this assert from the allocate function - ASSERT(false && "Couldn't find free block large enough!");

int main()
{
char* bigbuf = (char*)malloc(1000);

// remember our allocs
std::list<void*> _allocs;

FreeListAllocator allocator(1000, bigbuf);

// lots of iterations
for (int i = 0; i < 1000; i++)
{
// allocate blocks till the allocator is full
for(;;)
{
size_t allocSize = 90;
void* alloc = allocator.allocate(allocSize, 0);

if (!alloc)
break;

_allocs.push_back(alloc);
}

// randomly mark ~50% of blocks to remove
std::list<void*> allocsToRemove;
for (std::list<void*>::iterator it = _allocs.begin(); it != _allocs.end(); it++)
{
if ((rand() % 2) == 0)
{
allocsToRemove.push_back(*it);
}
}

// deallocate the blocks
for (std::list<void*>::iterator it = allocsToRemove.begin(); it != allocsToRemove.end(); it++)
{
allocator.deallocate(*it);
_allocs.remove(*it);
}
}

// cleanup before exitting
for (std::list<void*>::iterator it = _allocs.begin(); it != _allocs.end(); it++)
{
allocator.deallocate(*it);
}

return 0;
}

February 26, 2014 12:27 PM

captaincoder

The issue above is that it can't cope with an alignment of 0. Fixing this edge condition seems to have stopped the crashies.

However there's another issue - it doesn't properly coalesce free blocks together. You can try this by filling the heap, randomly deleting blocks then filling it again.

Result - the free block is not the size of the whole buffer.

Edit: And there's an issue where blocks can end up pointing backwards to the front of the list...

Edit: And another where after aligning new allocations the previous block next pointer isn't set up correctly to the newly created aligned block...

Edit: And another where the free block pointers aren't correctly updated for blocks that don't result in a merge. I really don't mean to be negative but this code doesn't look like it's ever been tested with anything more rigorous than sequential allocation/deallocations :-/

February 26, 2014 01:45 PM

Aqua Costa

The issue above is that it can't cope with an alignment of 0. Fixing this edge condition seems to have stopped the crashies.

An alignment of 0 makes no sense, so yes it will probably cause issues. I updated the ASSERT to prevent allocations with an alignment of 0.

You're right, there were some errors in the deallocate function of FreeListAllocator, I rewritten the function (the free blocks are now ordered btw), I've tested it a bit and some cases that caused errors in the previous version are now fixed. Feel free to test it yourself and give me some feedback :)

March 04, 2014 03:56 AM

captaincoder

Hey bud

Thanks for the reply - just to say there's still issues there, with random alloc/deallocs it still dies pretty quickly. See my PM :)

March 04, 2014 10:47 AM

Red Ant

I'm sorry, but if you're going to go to the trouble of writing custom allocators why wouldn't you at least make them STL-compliant so you can actually use them with the various STL containers??

May 27, 2014 09:47 AM

colossal

I've been authoring a more modern version of this article for personal use, I have it under MIT if anybody would like to use it! :)

https://github.com/cgikoray/Nemesis-Memory

July 24, 2014 11:48 PM

Francismoy

Hi!

First of all, thanks for this fantastic article.

I don't fully understand the method alignForwardAdjustmentWithHeader, and in particular, the line:

adjustment += alignment * (neededSpace / alignment);

Wouldn't the first and second alignment cancel off, being equivalent to:

adjustment += neededSpace?

October 12, 2014 11:23 AM

Aqua Costa

Hi,

alignForwardAdjustmentWithHeader calculates the adjustment required to align address and is at least headerSize.

Example:

address = 13

alignment = 4

and the size of the allocation header is 6 bytes.

alignForwardAdjustment would return 3, but that's not enough to store the header.

So alignForwardAdjustmentWithHeader returns 3+4=7 enough to store the header and properly align the address.

adjustment += alignment * (neededSpace / alignment);

Wouldn't the first and second alignment cancel off, being equivalent to:

adjustment += neededSpace?

No, because neededSpace and alignment are ints, so a integer division will be performed.

Example: 2 * 5/2 = 4


//Increase adjustment to fit header
adjustment += alignment * (neededSpace / alignment);

if(neededSpace % alignment > 0)
    adjustment += alignment;

is basically calculating


adjustment += alignment * ceil((float)neededSpace/(float)alignment)

Does it make sense?

October 12, 2014 09:56 PM

Francismoy

Yes, I get it now, thank you :)

I was thinking about multi-threading, and concretely, about lock-free dynamic allocation. Is it possible, in your opinion, to use these custom allocators in a lock-free fashion by declaring then thread-local (c++11) so that each thread uses its own allocator? Or do you think that they (the allocators) should be redesigned to support lock-free dynamic allocation?

October 13, 2014 05:40 PM

Aqua Costa

Do you want every thread to allocate from the same block of memory? If so you'll probably need to add a lock to the allocator (maybe in a wrapper similar to the ProxyAllocator).

In my engine, threads don't usually allocate memory from a global allocator, instead they use separate allocators (each allocator manages a separate chunk of memory) so no there's no need for locks in most cases.

October 17, 2014 11:41 AM

Francismoy

Sorry for the late reply, and thanks for your answer.

I've been researching for a while on lock-free multi-threaded allocators, and I've found several of them (as an example: http://www.research.ibm.com/people/m/michael/pldi-2004.pdf).

As another example, the molecule engine creator adds the possibility of adding synchronization primitives to ensure thread-safe allocations (http://molecularmusings.wordpress.com/2011/08/03/memory-system-part-5/).

The question is: how do you make sure in your engine that each allocator manages a separate chunk of memory?

November 09, 2014 06:17 PM

LloydAnthony

Hi Tiago,

I've been looking for some sample implementation of Pool Allocator for a personal project. I like the example you have shown here. I've been looking to see exactly how it works and I understand quite well. There is only a small line I do not understand in the constructor:

[ PoolAllocator.cpp ]

#include "PoolAllocator.h"

PoolAllocator::PoolAllocator(size_t objectSize, u8 objectAlignment, size_t size, void* mem)
: Allocator(size, mem), _objectSize(objectSize), _objectAlignment(objectAlignment)
{
...

size_t numObjects = (size-adjustment)/objectSize;

// ^ WHY?
...
}

Why do you subtract the adjustment to size? is it necessary?

The article is very good and useful!

November 29, 2014 03:30 PM

SaveliyBaranov

I wrote a pool allocator similar to the one here, but expanding - when it's about to run out of space, it'll allocate an extra chunk of memory and link to it. It's still a work in progress, but in case anyone's interested:

https://github.com/concubicycle/poolallocator

October 11, 2016 09:03 PM

Michael Lojkovic

If I need my allocator to be compatible with vector::insert, for something like a vector<char> which type of allocator would I want to use? A stack allocator that's able to move bits after the insertion point up by x amount? would memmove be the most efficient means of implementing this?

March 27, 2017 05:14 PM

Lmoyer94

I know this is an old topic, but it's the best demonstration of a complete memory allocator that I've found. I'm not very advanced with C++, like I've never used reinterpret_cast or static_cast before, but I had a question about the Aligned Allocator code which uses those keywords.

What type are we supposed to put inside the "< >" for reinterpret_cast and static_cast? The "(" has a redline under it that says it was expecting a "<". Is it just supposed to be <u8> for "address" and "alignment-1". And is "uint8_t" acceptable instead of "u8"? For some reason "u8" is an error for me even after including <cstdint> and <stdint.h>.

So the code for the function would look like:


inline void* alignForward(void* address, uint8_t alignment)
{
return (void*)( ( reinterpret_cast<uint8_t>(address) + static_cast<uint8_t>(alignment-1)) & static_cast<uint8_t>(~(alignment-1)) );
}

November 27, 2017 06:40 PM

rip-off

Please do not reply to old topics. Create a new post, linking to this one for context instead - thanks!

November 27, 2017 11:04 PM

You must log in to join the conversation.

Don't have a GameDev.net account? Sign up!

Being able to efficiently use the available (limited) memory is extremely important in high performance applications like computer games.

C++: Custom memory allocation

Base Allocator

Memory leak detection

Aligned Allocations

Aligned data definition

Implementation

Linear Allocator

Allocations

Deallocations

Implementation

Stack Allocator

Allocations

Deallocations

Implementation

FreeList Allocator

Allocations

Deallocation

Implementation

Pool Allocator

Allocations

Deallocations

Implementation

Proxy Allocator

Implementation

Allocator Managment

Example usage

Tips & Tricks

Performance Comparison

Malloc vs Linear Allocator

Malloc vs Stack Allocator

Malloc vs FreeList Allocator

Malloc vs Pool Allocator

Conclusion

Reference

Article Update Log

Comments

Recommended Tutorials

Other Tutorials by Aqua Costa

C++: Custom memory allocation

Base Allocator

Memory leak detection

Aligned Allocations

Aligned data definition

Implementation

Linear Allocator

Allocations

Deallocations

Implementation

Stack Allocator

Allocations

Deallocations

Implementation

FreeList Allocator

Allocations

Deallocation

Implementation

Pool Allocator

Allocations

Deallocations

Implementation

Proxy Allocator

Implementation

Allocator Managment

Example usage

Tips & Tricks

Performance Comparison

Malloc vs Linear Allocator

Malloc vs Stack Allocator

Malloc vs FreeList Allocator

Malloc vs Pool Allocator

Conclusion

Reference

Article Update Log

Comments

Recommended Tutorials

Other Tutorials by Aqua Costa

Reticulating splines