Implementing custom memory allocators

Started by
5 comments, last by colossal 9 years, 8 months ago

Last year there was a great article on custom general memory allocators: http://www.gamedev.net/page/resources/_/technical/general-programming/c-custom-memory-allocation-r3010

I decided to take a hand at modernizing and fully implementing the article for my own use: https://github.com/cgikoray/Nemesis-Memory I am looking to see if anyone has comments or criticisms of my work thus far, as I would love to improve it as much as I can. Thank you anyone in advance, I am open to all opinions. Be as harsh as you wish!

Software Engineer | Credited Titles: League of Legends, Hearthstone

Advertisement

Just for purposes of education, one of the issues I have found is that my use of lock and unlock inside my concrete allocate() and deallocate() functions, currently can result result in a deadlock due to the fact that I have a return branch right after unlocking a mutex. To solve this I have to follow the principle of RAII and use std::lock_guard from the concurrency library of C++11. I'll probably end up still functionally using my hybrid_lock as a mutex concept for the lock_guard so that I can use spin locking via atomics.

Software Engineer | Credited Titles: League of Legends, Hearthstone

Yes, a single threaded memory allocator is MUCH easier than a multithreaded allocator.

The same can said of most systems, in fact.

To solve this I have to follow the principle of RAII and use std::lock_guard from the concurrency library of C++11. I'll probably end up still functionally using my hybrid_lock as a mutex concept for the lock_guard so that I can use spin locking via atomics.


That is the whole point of those RAII wrapper classes. Never take a lock without them.


So far as allocators and threading in general:

High-performance algorithms and locking/synchronization don't play well together. Every time you take a lock you're hitting some overhead (how much depends on the type of lock) and every time a thread is stuck waiting on a lock you're throwing away CPU cycles. When working with multiple threads you really really want to isolate the threads as much as possible; not doing this is one of the reasons people think threading is so hard (it's really not). Use algorithms that allow the threads to work as independently of each other as possible.

Typical modern general-purpose memory allocators make use of per-thread caches. This allows most allocations/deallocations on a thread to operate without every needing to touch a lock or any shared data structure.

A more thorough overview of one such allocator is http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html.

See also the excellent jemalloc

More constrained platforms might not be able to pay the overhead in memory "waste" of thread caches. Heavy threading or memory allocation aren't super ocmmon on such platforms (in games, anyway), though.

Specific allocators for various needs are often single-threaded in games since each instance of an allocator is typically only used by a single system running on a specific thread. Thread safety in these kinds of allocators is mostly just about ensuring only a single thread uses them, not allowing multiple threads to use them.

Sean Middleditch – Game Systems Engineer – Join my team!

Some quick-glance comments:

It might not play nicely with polymorphic arrays, but then again, I don't know if regular new[]/delete[] work correctly in this case either!
class Base { virtual ~Base(){}; }; class Derived : public Base { int foo[42]; };
Base* ptr = create_array<Derived>(1);
destroy_array(ptr);//uh oh, might use the wrong header_size

With linear_allocator, it's impossible to destruct any created objects (without asserting inside deallocate).

Should the pool_allocator assert that the requested size is smaller or equal to m_type_size?

IMHO, the constant use of the preprocessor makes the code harder to read:
#ifdef _DEBUG
    assert(condition && "message");
#endif
I would instead make your own assertion macro:
#ifndef NEMESIS_ASSERT
# ifdef _DEBUG
#  define NEMESIS_ASSERT( condition, message ) assert(condition && message)
# else
#  define NEMESIS_ASSERT( condition, message )
# endif
#endif

To solve this I have to follow the principle of RAII and use std::lock_guard from the concurrency library of C++11. I'll probably end up still functionally using my hybrid_lock as a mutex concept for the lock_guard so that I can use spin locking via atomics.


That is the whole point of those RAII wrapper classes. Never take a lock without them.


So far as allocators and threading in general:

High-performance algorithms and locking/synchronization don't play well together. Every time you take a lock you're hitting some overhead (how much depends on the type of lock) and every time a thread is stuck waiting on a lock you're throwing away CPU cycles. When working with multiple threads you really really want to isolate the threads as much as possible; not doing this is one of the reasons people think threading is so hard (it's really not). Use algorithms that allow the threads to work as independently of each other as possible.

Typical modern general-purpose memory allocators make use of per-thread caches. This allows most allocations/deallocations on a thread to operate without every needing to touch a lock or any shared data structure.

A more thorough overview of one such allocator is http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html.

See also the excellent jemalloc

More constrained platforms might not be able to pay the overhead in memory "waste" of thread caches. Heavy threading or memory allocation aren't super ocmmon on such platforms (in games, anyway), though.

Specific allocators for various needs are often single-threaded in games since each instance of an allocator is typically only used by a single system running on a specific thread. Thread safety in these kinds of allocators is mostly just about ensuring only a single thread uses them, not allowing multiple threads to use them.

Sean,

Thank you for the great advice. I have actually been asking myself the question, "Do I want to support threaded allocations/deallocations at all?" The entire purpose of this library is to provide functionality for specific types of allocation patterns, and I think that limiting their use to "same thread context" allocations might be only beneficial to my design choices. I found in performance tests that the introduction of mutex locks to my allocation algorithms basically proved that I should just use malloc and new anyway.

Thank you for the links, definitely going to look into them.

Software Engineer | Credited Titles: League of Legends, Hearthstone

Some quick-glance comments:

It might not play nicely with polymorphic arrays, but then again, I don't know if regular new[]/delete[] work correctly in this case either!


class Base { virtual ~Base(){}; }; class Derived : public Base { int foo[42]; };
Base* ptr = create_array<Derived>(1);
destroy_array(ptr);//uh oh, might use the wrong header_size

With linear_allocator, it's impossible to destruct any created objects (without asserting inside deallocate).

Should the pool_allocator assert that the requested size is smaller or equal to m_type_size?

IMHO, the constant use of the preprocessor makes the code harder to read:

#ifdef _DEBUG
    assert(condition && "message");
#endif
I would instead make your own assertion macro:

#ifndef NEMESIS_ASSERT
# ifdef _DEBUG
#  define NEMESIS_ASSERT( condition, message ) assert(condition && message)
# else
#  define NEMESIS_ASSERT( condition, message )
# endif
#endif

Hodgman,
Thank you for the comments -- regarding the linear allocator, I do in fact believe that this is an oversight on my part. The purpose of the allocator is to have a lot of objects at once all with the same lifetime. I am going to change destroy_all() to fit this, and loop through the allocated objects invoking their destructors.
The pool allocator should ABSOLUTELY be asserting that particular oversight as well.
And I agree that my assertion code needs more work. I already have enough preprocessor work as it is and should aim to make the code more readable. I certainly would like to isolate preprocessor directives as much as I can, as you have suggested.

Software Engineer | Credited Titles: League of Legends, Hearthstone

This topic is closed to new replies.

Advertisement