Custom allocator for std::string

Started by
6 comments, last by Antheus 15 years, 4 months ago
So far I have avoided using the std::string class (don't hate me) because I work on platforms where memory fragmentation can be a real problem e.g. Nintendo DS. However a recent thread on this site (entitled 'String vs char arrays') convinced me that I should at least do some research. From what I can tell the best solution would be for me to define a new string type that uses a custom allocator. A bit like this:
typedef basic_string<char, char_traits<char>, MyAllocator<char> > MyString;
The MyAllocator class would use multiple free lists of fixed block sizes (16 bytes, 32 bytes, 64 bytes, 128 bytes etc...) to avoid fragmentation and improve performance. Obvioulsy this would result in a small amount of wasted memory, but that's less of a problem than fragmentation so far as I'm concenred. So before I go ahead and implemnent this system, I was just wondering if anyone has attempted anything similar, and if so whether they have any pearls of wisdom to share with me.
Advertisement
I have not personally created an allocator just for std::string so I have no knowledge to help you there.

One other idea that might keep you thinking. If you already have a memory allocation system in place, you can override operator new, delete, new[], delete[]. I'm not sure if it is a guarantee that the std functions use operator new but I have found that at least win32 does by default, your platform may as well.

The reason this could be useful is that you could use your memory allocator for every allocation including all std class as well as every time you use new or delete.
Thanks Wolfdog, but I am already overriding the global new and delete operators. The reason I want to create a custom allocator is to prevent strings from calling these functions, because they are general purpose and therefore not always as efficient as I'd like (remember I'm developing for Nintendo DS, and resources are slim). A custom allocator will hopefully improve string performance, reduce fragmentation and also make debugging a little easier (keeping track of memory is that little bit trickier when strings are constantly allocating from the heap). Hope that all makes sense.
Given the vagueness of the question I can only say that you should consult a good C++ book that covers allocators to get a checklist of things to watch out for. (Example: Scott Meyer's Effective STL)
I believe this(powerpoint) is supposed to be a good read.
I was recently messing with this while implementing stack allocator for strings. For example, under MVC2008, all allocators passed as instances are redundantly copied during initialization. gcc on the other hand allocate()s on one instance, but then stores another (apparently doesn't like stateful allocators). And there's more. std::swap behavior is also interesting one, varying between allocators and containers. Trying to release memory allocated by vector is next to impossible in some cases, even with resize/swap. Last issue I found annoying was that assignments between same containers using different allocators is next to impossible, or at very least, nobody bothers doing it. In case of strings, most simply assume std::string. I suppose it makes sense, but severely limits usefulness of custom allocators.

That presentation is a good start, but ultimately I'd suggest verifying everything yourself. There is some allocator code out there, but IMHO it's not adequately tested or at least documented.

Make sure to understand the operations on standard library well. Especially copy constructions and assignments (especially various pass-by-value cases and temporaries), as well as swap behavior.

I'd suggest making custom allocator stateless (your storage is defined independently of allocator instance), with properly implemented comparison and copy construction.

I also found that littering the code with debug output (I prefer printf for memory management related classes), emitting the allocators and actual storage pointers to be useful in tracking down copies to be handy.

Note that some problems I encountered were related to stack allocations, heap-based allocators are considerably simpler to develop, so some of the problems I mention are not relevant to your case.

Quote:The MyAllocator class would use multiple free lists of fixed block sizes (16 bytes, 32 bytes, 64 bytes, 128 bytes etc...)


For Lua allocator, I used similar approach. But instead of keeping multiple lists, I re-used one single list.

Top page would (for example) 64k (64x1024). If I needed 16 byte allocator, I'd allocate the allocator itself within one of these pages (64*16). Again, this was intended for Lua, where I'd assume I have 1 relatively small (several megabytes), but continuous block of memory, into which I need to store objects of arbitrary size (6-2048 bytes).
Quote:Original post by Antheus
gcc on the other hand allocate()s on one instance, but then stores another (apparently doesn't like stateful allocators).


Eh?

Quote:Trying to release memory allocated by vector is next to impossible in some cases, even with resize/swap.


std::vector<T,Allocator>(original).swap(original); // should get rid of excess allocation...

Quote:Last issue I found annoying was that assignments between same containers using different allocators is next to impossible, or at very least, nobody bothers doing it.


std::vector<T,Allocator1> vector_with_allocator1( vector_with_allocator2.begin(), vector_with_allocator2.end() );
vector_with_allocator1.assign( vector_with_allocator3.begin(), vector_with_allocator3.end() );

Annoying at times, admittedly.
Quote:
Quote:Last issue I found annoying was that assignments between same containers using different allocators is next to impossible, or at very least, nobody bothers doing it.


std::vector<T,Allocator1> vector_with_allocator1( vector_with_allocator2.begin(), vector_with_allocator2.end() );
vector_with_allocator1.assign( vector_with_allocator3.begin(), vector_with_allocator3.end() );

Annoying at times, admittedly.


I was primarily trying to have a drop-in replacement for string. Something like:
struct stack_alloc {  ....  char buffer[1024];};typedef std::basic_string<char, char_traits<char>, stack_alloc> StackString;void foo(const StackString & s) {  foo_that_takes_std_string(s);}...foo("Hello World!");

I was hoping I could fool the compiler into using implicit conversion in above case, to avoid heap allocation.

The above worked without a problem under MVS, but gcc didn't like the approach. Note that allocator and its buffer are allocated implicitly.

For gcc I had to provide an instance to buffer myself, not as part of allocator.

The other issue with conversion however comes from foo_that_takes_std_string() not accepting my version. Obviously it's possible to convert, but that makes an extra copy, which is the very thing I tried to avoid.

Quote:Eh?


In above example, buffer is part of allocator. gcc calls allocator1.allocate(), copies it into allocator2, then destroys allocator1. Needless to say, the pointer provided by allocator1.allocate() is no longer valid. I'm not claiming this is invalid behavior, it just shows the differences between platforms.

MVS on the other hand copies allocator, then copies it again (for no apparent reason), and uses that version for the rest of owning object's life-time. The "copies it again" is completely redundant, but happens with all containers.


I'm also aware of the swap trick for vector, but in some cases deallocate() simply wasn't called until vector itself was destroyed.

This topic is closed to new replies.

Advertisement