Sign in to follow this  

Custom C++ allocator for alligned data

Recommended Posts

Posted (edited)

I am a bit confused about writing a custom allocator for aligned data. My aligned structs/classes typically derive AlignedData (passing the struct/class name as class template argument). The latter provides a custom operator new/new[]/delete/delete[]. I now want to write a custom allocator for these aligned structs, but I am confused about allocate/deallocate vs construct/destroy.

General

#include <malloc.h>

inline void *AllocAligned(size_t size, size_t alignment = 16) noexcept {
    return _aligned_malloc(size, alignment);
}

template < typename DataT >
inline DataT *AllocAligned(size_t count) noexcept {
    return static_cast< DataT * >(AllocAligned(count * sizeof(DataT)));
}

inline void FreeAligned(void *ptr) noexcept {
    if (!ptr) {
        return;
    }

    _aligned_free(ptr);
}

AlignedData

template< typename DataT >
struct AlignedData {

public:

    static void *operator new(size_t size) {
        const size_t alignment = __alignof(DataT);
        
        // __declspec(align) on DataT is required
        static_assert(alignment > 8, 
            "AlignedData is only useful for types with > 8 byte alignment.");
        
        void * const ptr = AllocAligned(size, alignment);
        if (!ptr) {
            throw std::bad_alloc();
        }

        return ptr;
    }

    static void operator delete(void *ptr) noexcept {
        FreeAligned(ptr);
    }

    static void *operator new[](size_t size) {
        return operator new(size);
    }

    static void operator delete[](void *ptr) noexcept {
        operator delete(ptr);
    }
};

AlignedAllocator:

template< typename DataT, size_t AlignmentS = __alignof(DataT) >
struct AlignedAllocator {
    
public

    using value_type = DataT;
    using pointer =  DataT *;
    using reference = DataT &;
    using const_pointer = const DataT *;
    using const_reference = const DataT &;
    using size_type = size_t;
    using difference_type = ptrdiff_t;

    template< typename DataU >
    struct rebind {

    public:

        using other = AlignedAllocator< DataU, AlignmentS >;

        rebind &operator=(const rebind &r) = delete;
        rebind &operator=(rebind &&r) = delete;

    private:

        rebind() = delete;
        rebind(const rebind &r) = delete;
        rebind(rebind &&r) = delete;
        ~rebind() = delete;
    };

    AlignedAllocator() noexcept = default;
    
    AlignedAllocator(
        const AlignedAllocator< DataT, AlignmentS > &allocator) noexcept = default;
    
    AlignedAllocator(
        AlignedAllocator< DataT, AlignmentS > &&allocator) noexcept = default;
    
    template< typename DataU >
    AlignedAllocator(
        const AlignedAllocator< DataU, AlignmentS > &allocator) noexcept {}
    
    ~AlignedAllocator() = default;

    AlignedAllocator< DataT, AlignmentS > &operator=(
        const AlignedAllocator< DataT, AlignmentS > &allocator) noexcept = delete;

    AlignedAllocator< DataT, AlignmentS > &operator=(
        AlignedAllocator< DataT, AlignmentS > &&allocator) noexcept = delete;

    DataT *address(DataT &data) const noexcept {
        return &data;
    }
    
    const DataT *address(const DataT &data) const noexcept {
        return &data;
    }

    size_t max_size() const noexcept {
        return (static_cast< size_t >(0) - static_cast< size_t >(1)) 
            / sizeof(DataT); // independent of definition of size_t
    }

    DataT *allocate(size_t count) const {
        return new DataT[count];
    }

    template< typename DataU >
    DataT *allocate(size_t count, const DataU *) const {
        return allocate(count);
    }

    void deallocate(DataT *data, size_t count) const {
        if (count == 0) {
            return;
         }
         else if (count == 1) {
            delete data;
         }
         else {
            delete[] data;
          }
    }

    template< typename DataU, typename... ConstructorArgsT >
    void construct(DataU *data, ConstructorArgsT&&... args) const {
        new ((void *)data) DataU(std::forward< ConstructorArgsT >(args)...); // this looks like an allocation after all?
    }

    template< typename DataU >
    void destroy(DataU *data) const {
        data->~DataU();
    }

    bool operator==(
        const AlignedAllocator< DataT, AlignmentS > &rhs) const noexcept {
        
        return true; // stateless allocator
    }

    bool operator!=(
        const AlignedAllocator< DataT, AlignmentS > &rhs) const noexcept {
        
        return false; // stateless allocator
    }
};

I am not sure about the AlignmentS template parameter yet. I think of supporting non-AlignedData values as well. In this case, I would duplicate/replicate the content of my custom operator new/new[]/delete/delete[] in allocate and deallocate themselves.

Edited by matt77hias

Share this post


Link to post
Share on other sites
Posted (edited)
3 hours ago, _Silence_ said:

Why not using aligned_alloc ?

Instead of _aligned_malloc? I seem to be more familiar with Windows specific code :D . But if they are the same, I consider using the standard one, though it seems strange that the associated free (versus _aligned_free) is used for malloc/calloc as well.

 

Without relying on AlignedData:

DataT *allocate(size_t count) const {
  void * const ptr = AllocAligned(sizeof(DataT) * count, AlignmentS);
  if (!ptr) {
    throw std::bad_alloc();
  }
  
  return static_cast< DataT * >(ptr);
}

template< typename DataU >
DataT *allocate(size_t count, const DataU *hint) const {
  return allocate(count);
}

void deallocate(DataT *data, size_t count) const {
  FreeAligned((void *)data);
}

 

Edited by matt77hias

Share this post


Link to post
Share on other sites
19 hours ago, matt77hias said:

Instead of _aligned_malloc? I seem to be more familiar with Windows specific code  . But if they are the same, I consider using the standard one, though it seems strange that the associated free (versus _aligned_free) is used for malloc/calloc as well.

Well I don't know at all about the Microsoft version of this function. If you use C++ 11 or C 11, then aligned_alloc will be present in the standard library. If you use C++, then you'll have even more functionalities. So why using a non-standard and non-portable version of a function, even if you don't plan to develop on another platform.

Share this post


Link to post
Share on other sites
Posted (edited)

I like this way, so i know exactly what my memory is and where everything does. Also you can include a guard memory page after to detect overflow:

void *AllocateMemory(const size_t) {
  assert(size > 0);
  void *memory = VirtualAlloc(0, size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
  return(memory);
}

void FreeMemory(void *ptr) {
	assert(ptr != nullptr);
	VirtualFree(ptr, 0, MEM_FREE);
}

void *AllocateAlignedMemory(const size_t size, const size_t alignment) {
    assert(size > 0);
    assert((alignment > 0) && !(alignment & (alignment - 1)));

    // Allocate empty memory to hold a size of a pointer + the actual size + alignment padding 
    size_t newSize = sizeof(void *) + size + (alignment << 1);
    void *basePtr = AllocateMemory(newSize);

    // The resulting address starts after the stored base pointer
    void *alignedPtr = (void *)((uintptr_t)basePtr + sizeof(void *));

    // Move the resulting address to a aligned one when address is not aligned to the given alignment
    uintptr_t mask = alignment - 1;
    if ((alignment > 1) && (((uintptr_t)alignedPtr & mask) != 0)) {
        *(uintptr_t *)alignedPtr += ((uintptr_t)alignment - ((uintptr_t)alignedPtr & mask));
    }

    // Write the base pointer before the alignment pointer
    *(void **)((void *)((uintptr_t)alignedPtr - sizeof(void *))) = basePtr;

    return(alignedPtr);
}

void FreeAlignedMemory(void *ptr) {
    assert(ptr != nullptr);

    // Free the base pointer which is stored to the left from the given pointer
    void *basePtr = (void *)((void **)((uintptr_t)ptr - sizeof(void *)));
    FreeMemory(basePtr);
}

aligned_alloc() is not the same, as it does thread syncronisation as well.

Edited by Finalspace

Share this post


Link to post
Share on other sites
Posted (edited)
54 minutes ago, Finalspace said:

assert((alignment > 0) && !(alignment & (alignment - 1)));

assert(size);
assert(alignment && !(alignment & (alignment - 1)));

size_t will always be some unsigned whatever sized integer.

54 minutes ago, Finalspace said:

size_t newSize = sizeof(void *) + size + (alignment << 1);

How does the (alignment << 1) padding work? Assume: alignment = 16,  32 bytes will be added by default independent of size?

Edited by matt77hias

Share this post


Link to post
Share on other sites
1 hour ago, matt77hias said:

assert(size);
assert(alignment && !(alignment & (alignment - 1)));

size_t will always be some unsigned whatever sized integer.

How does the (alignment << 1) padding work? Assume: alignment = 16,  32 bytes will be added by default independent of size?

Well the size can be zero. Allocating zero bytes of memory makes no sense - so that assert has its place.

And the padding is always appended, with a little bit of memory waste in mind. If you want you can improve it by calculating the right amount of padding.

Share this post


Link to post
Share on other sites
Posted (edited)
2 minutes ago, Finalspace said:

Well the size can be zero. Allocating zero bytes of memory makes no sense - so that assert has its place.

I don't mean the is-not-zero test (I still have these two tests in my small refactor), I mean the greater-than-test feels less to-the-point since we are dealing with unsigneds anyway. :)

3 minutes ago, Finalspace said:

If you want you can improve it by calculating the right amount of padding.

Idd. this sounds more appropriate, especially if you have many small chuncks (versus using your own memory arenas).

Edited by matt77hias

Share this post


Link to post
Share on other sites
Posted (edited)
5 minutes ago, matt77hias said:

I don't mean the is-not-zero test (I still have these two tests in my small refactor), I mean the greater-than-test feels less to-the-point since we are dealing with unsigneds anyway.

True, for me its a non-brainer. I personally dont think about this kind of things, the compiler will catch it as a warning eventually and it wont hurt. Its just a _mm_cmpge_ps or _mm_cmpgt_ps in the end and done with one or two cycles.

Edited by Finalspace

Share this post


Link to post
Share on other sites
Posted (edited)

@OP that's a lot of code! Maybe it would be simpler to use malloc/free and perform your own alignment? Here was my strategy for 16 byte aligned allocations: https://github.com/RandyGaul/tinyheaders/blob/master/tinysound.h#L475-L490

static void* malloc16( size_t size )
{
	void* p = malloc( size + 16 );
	if ( !p ) return 0;
	unsigned char offset = (size_t)p & 15;
	p = (void*)TS_ALIGN( p + 1, 16 );
	*((char*)p - 1) = 16 - offset;
	TS_ASSERT( !((size_t)p & 15) );
	return p;
}

static void free16( void* p )
{
	if ( !p ) return;
	free( (char*)p - (size_t)*((char*)p - 1) );
}

 

Edited by Randy Gaul

Share this post


Link to post
Share on other sites

Are you aware of placement new and placement delete? These can be used to invoke constructor/desctructor code, while the use of malloc/free (or similar functions) are just for the allocations.

Share this post


Link to post
Share on other sites
12 hours ago, Randy Gaul said:

Are you aware of placement new and placement delete? These can be used to invoke constructor/desctructor code, while the use of malloc/free (or similar functions) are just for the allocations.

I figured this out in the meantime:

new ((void *)data) DataU(std::forward< ConstructorArgsT >(args)...);

No allocation will happen. The allocation/deallocation vs construction/destruction difference is now clear. For clarity, my final code is:

inline void *AllocAligned(size_t size, size_t alignment = 16) noexcept {
    return _aligned_malloc(size, alignment);
}

template < typename DataT >
inline DataT *AllocAlignedData(size_t count, size_t alignment = 16) noexcept {
    return static_cast< DataT * >(AllocAligned(count * sizeof(DataT), alignment));
}

inline void FreeAligned(void *ptr) noexcept {
    if (!ptr) {
        return;
    }

    _aligned_free(ptr);
}

template< typename DataT >
struct AlignedData {

    static void *operator new(size_t size) {
        const size_t alignment = __alignof(DataT);
        
        void * const ptr = AllocAligned(size, alignment);
        if (!ptr) {
            throw std::bad_alloc();
        }

        return ptr;
    }

    static void operator delete(void *ptr) noexcept {
        FreeAligned(ptr);
    }

    static void *operator new[](size_t size) {
        return operator new(size);
    }

    static void operator delete[](void *ptr) noexcept {
        operator delete(ptr);
    }
};

template< typename DataT, size_t AlignmentS = __alignof(DataT) >
struct AlignedAllocator final {
    
    using value_type = DataT;
    using pointer =  DataT *;
    using reference = DataT &;
    using const_pointer = const DataT *;
    using const_reference = const DataT &;
    using size_type = size_t;
    using difference_type = ptrdiff_t;

    template< typename DataU >
    struct rebind final {

        using other = AlignedAllocator< DataU, AlignmentS >;
    };

    AlignedAllocator() noexcept = default;
    AlignedAllocator(
        const AlignedAllocator< DataT, AlignmentS > &allocator) noexcept = default;
    AlignedAllocator(
        AlignedAllocator< DataT, AlignmentS > &&allocator) noexcept = default;
    template< typename DataU >
    AlignedAllocator(
        const AlignedAllocator< DataU, AlignmentS > &allocator) noexcept {}
    ~AlignedAllocator() = default;
    AlignedAllocator< DataT, AlignmentS > &operator=(
        const AlignedAllocator< DataT, AlignmentS > &allocator) noexcept = delete;
    AlignedAllocator< DataT, AlignmentS > &operator=(
        AlignedAllocator< DataT, AlignmentS > &&allocator) noexcept = delete;

    
    DataT *address(DataT &data) const noexcept {
        return &data;
    }
    
    const DataT *address(const DataT &data) const noexcept {
        return &data;
    }

    size_t max_size() const noexcept {
        return (static_cast< size_t >(0) - static_cast< size_t >(1)) 
            / sizeof(DataT);
    }

    DataT *allocate(size_t count) const {
        DataT * const data = AllocAlignedData(count, AlignmentS);
        if (!data) {
            throw std::bad_alloc();
        }

        return data;
    }

    template< typename DataU >
    DataT *allocate(size_t count, const DataU *hint) const {
        (void)hint;
        return allocate(count);
    }

    void deallocate(DataT *data, size_t count) const {
        (void)count;
        FreeAligned((void *)data);
    }
    
    template< typename DataU, typename... ConstructorArgsT >
    void construct(DataU *data, ConstructorArgsT&&... args) const {
        new ((void *)data) DataU(std::forward< ConstructorArgsT >(args)...);
    }

    template< typename DataU >
    void destroy(DataU *data) const {
        data->~DataU();
    }

    bool operator==(
        const AlignedAllocator< DataT, AlignmentS > &rhs) const noexcept {
        
        return true;
    }

    bool operator!=(
        const AlignedAllocator< DataT, AlignmentS > &rhs) const noexcept {
        
        return false;
    }
};

 

Share this post


Link to post
Share on other sites
9 hours ago, Hodgman said:

Do you need a custom C++ allocator for this? This is just so you can customise the allocation logic of std::vector, etc, right? Does the default one not do the right thing of calling your custom new operator?

This is for using smart pointers. std::make_shared (for constructing std::shared_ptr) will allocate the control and data block together with one allocation, ignoring custom operator new. An alternative is defining your own std::make_shared in a fashion similar to std::make_unique which explicitly invokes operator new, but it seems cleaner to be a bit more conform to the standard and just define an allocator.

Furthermore, I removed the implicit dependence on AlignedData, so if you want to align your objects, irrespective of the need to align them, you can do so with the custom allocator. Idd. for primitive types (e.g. __m128). 

Share this post


Link to post
Share on other sites
On 10/4/2017 at 1:52 AM, Hodgman said:

Do you need a custom C++ allocator for this? This is just so you can customise the allocation logic of std::vector, etc, right? Does the default one not do the right thing of calling your custom new operator?

Standard new (and therefore the default allocator as well), don't align by more than the maximum alignment of any built in types, and __m128 for example is not a C++ builtin type

Share this post


Link to post
Share on other sites
8 minutes ago, l0calh05t said:

Standard new (and therefore the default allocator as well), don't align by more than the maximum alignment of any built in types, and __m128 for example is not a C++ builtin type

He's got a custom new that calls _aligned_malloc, I assumed the default allocator would be nice enough to use the type's custom new :(

This is one of those dark corners of C++ for me because every game project I've worked on has just straight up banned the new keyword (besides placement new inside their own allocation header) :D 

Share this post


Link to post
Share on other sites
3 minutes ago, Hodgman said:

He's got a custom new that calls _aligned_malloc, I assumed the default allocator would be nice enough to use the type's custom new :(

This is one of those dark corners of C++ for me because every game project I've worked on has just straight up banned the new keyword (besides placement new inside their own allocation header) :D 

Allocators are very ugly.

For an aligned allocator, you could always use the one included with Eigen https://eigen.tuxfamily.org/dox/classEigen_1_1aligned__allocator.html

Share this post


Link to post
Share on other sites
2 hours ago, Hodgman said:

He's got a custom new that calls _aligned_malloc, I assumed the default allocator would be nice enough to use the type's custom new :(

This is one of those dark corners of C++ for me because every game project I've worked on has just straight up banned the new keyword (besides placement new inside their own allocation header) :D 

Also, due to the separation of allocate/deallocate and construct/destroy, the C++ allocators cannot directly use new/delete (makes sense if you consider that std::vector can allocate memory on reserve without constructing anything, which isn't possible with custom new/delete)

Share this post


Link to post
Share on other sites
Posted (edited)
2 hours ago, Hodgman said:

He's got a custom new that calls _aligned_malloc, I assumed the default allocator would be nice enough to use the type's custom new :(

Don't really like it myself but yeah that's the standard.

13 minutes ago, l0calh05t said:

Also, due to the separation of allocate/deallocate and construct/destroy, the C++ allocators cannot directly use new/delete (makes sense if you consider that std::vector can allocate memory on reserve without constructing anything, which isn't possible with custom new/delete)

Idd. that is basically the reason behind the "count" parameter, I assume, you can directly allocate a larger memory block, just in case.

Edited by matt77hias

Share this post


Link to post
Share on other sites
On 10/3/2017 at 9:39 AM, Finalspace said:

void *AllocateAlignedMemory(const size_t size, const size_t alignment) {     assert(size > 0);     assert((alignment > 0) && !(alignment & (alignment - 1)));     // Allocate empty memory to hold a size of a pointer + the actual size + alignment padding      size_t newSize = sizeof(void *) + size + (alignment << 1);     void *basePtr = AllocateMemory(newSize);     // The resulting address starts after the stored base pointer     void *alignedPtr = (void *)((uintptr_t)basePtr + sizeof(void *));     // Move the resulting address to a aligned one when address is not aligned to the given alignment     uintptr_t mask = alignment - 1;     if ((alignment > 1) && (((uintptr_t)alignedPtr & mask) != 0)) {         *(uintptr_t *)alignedPtr += ((uintptr_t)alignment - ((uintptr_t)alignedPtr & mask));     }     // Write the base pointer before the alignment pointer     *(void **)((void *)((uintptr_t)alignedPtr - sizeof(void *))) = basePtr;     return(alignedPtr); }

I recently noticed some code very similar to your allocator in the book Game Engine Architecture (2nd Edition). The author adds the alignment to the requested size, since worst case you are misaligned by alignment - 1 bytes. Since alignment bytes are added instead of alignment - 1 bytes, there is always a spare byte which is used for storing the adjustment, needed for recovering the original pointer when freeing. This approach only works for alignments of 128 bytes or lower (since the next power of 2, 256, cannot be stored in one byte).

Your approach handles all alignments by explicitly making room for a complete pointer instead of an offset. Though this makes me wonder if you cannot replace alignment << 1 by alignment - 1?

Share this post


Link to post
Share on other sites
8 hours ago, matt77hias said:

Your approach handles all alignments by explicitly making room for a complete pointer instead of an offset. Though this makes me wonder if you cannot replace alignment << 1 by alignment - 1?

It also seems to me that "alignment - 1" would be sufficient.

Edited by Zipster

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this