# std::map, pool allocators


## Recommended Posts

Has anyone used custom allocators with std containers? The items would be of fixed size and very small, but they get inserted into / removed from maps very frequently, so I need some kind of pool. Oh yeah, ideally, all the maps (there are lots) would share the same pool. I don't know if that's possible.

##### Share on other sites
Sure it is; take a look here:

http://www.boost.org/doc/libs/1_39_0/libs/pool/doc/interfaces/pool_alloc.html

##### Share on other sites
I implemented some computational geometry algorithms and experimented a bit with pool allocation. My structures involved things like interval trees, hash tables, and lots of linked lists. I experimented with my own allocator<> using pooling behind the scenes as a drop-in replacement for the STL-managed stuff and obtained a modest improvement, but I didn't find the results compelling enough to continue using it over the far more standard std::allocator<>. My experiments with Boost.Pool gave similar performance results. One thing to be aware of is that the Boost stuff doesn't (so far as I know) have a drop-in STL-compatible allocator that also supports granular control over deallocation; instead there is a sort of singleton global cleanup that frees up everything at one time, which is a bit of a compromise.

I suspect the reason my performance gains with pooling were marginal is that a standard allocator's job is really very simple. It knows the size of the thing to be allocated and destroyed, so it can probably just maintain a free list embedded in relatively large contiguous blocks. On allocation there is no trying to match up an appropriately sized block from the store as with malloc() or new; instead it ought to be a pretty much guaranteed O(1) operation that will also get mostly inlined. One thing I suspect is that there may be cache implications with pooling, whereas more deterministic memory handling would reuse reclaimed memory, which might be over-allocated under pooling. This is more in the realm of speculation on my part and would vary wildly with usage anyway. I would agree that an allocator<> is the right point at which to experiment; at least that way it can be reused with the STL, which eases comparative testing with a minimum of effort.

##### Share on other sites
Quote:
 Original post by chairthrower
I suspect the reason that my performance gains with pooling were marginal is because a standard allocator's job is really very simple.

When using a custom allocator, the run-time improvement for allocations should be between a factor of 5 and 50. Perhaps your structures perform a negligible number of allocations, so they don't represent a bottleneck.

The std::allocator is the absolute worst possible thing to use for small allocations (maps, linked lists).

##### Share on other sites
Quote:
 Original post by Antheus
The std::allocator is the absolute worst possible thing to use for small allocations (maps, linked lists).

This depends on the std::allocator implementation. For example, Borland's C++ compilers used to use a pooled allocator by default for small block allocations. (I don't know if they still do this.)

##### Share on other sites
Quote:
 Original post by Antheus
Quote:
 Original post by chairthrower
I suspect the reason that my performance gains with pooling were marginal is because a standard allocator's job is really very simple.

When using a custom allocator, the run-time improvement for allocations should be between a factor of 5 and 50. Perhaps your structures perform a negligible number of allocations, so they don't represent a bottleneck.

The std::allocator is the absolute worst possible thing to use for small allocations (maps, linked lists).

I think (I have notes, but they're jumbled together with source that is also mixed with lots of other tests) that I used both my own allocator and a Boost pool allocator for an adapted interval tree based on STLport's red-black tree, used as a sweepline container, compiled with a modern g++ with optimization turned up. My best recollection is about a 2x performance improvement running the sweep while avoiding any numeric edge-intersection calculations. I believe this structure was probably quite heavy to manipulate, e.g. tree rebalancings (the additional interval requirement meant it was a couple of times slower than a std::set or std::map by comparison), so perhaps this extra work tended to predominate in my timings, and you would see higher ratios working with, say, a simple linked list.

Something that did play on my mind was the fact that only approximately log n edges would need to be contained at any point in the sweep, so I believed there could be an issue ensuring everything relevant was kept packed tightly in cache, which I decided could be better guaranteed by using std::allocator (freeing my edge structures as it went). It is possible that using the pooling meant things were spilling to main memory or even L2, and that this played on the timings. My test case was map geometry with approximately 100k vertices, but I freely acknowledge that I do not really have the experience to make any judgement about this and know little about what type of cache behaviour I could expect. In any case this is always going to be anecdotal, so perhaps I should have refrained from commenting on my specific performance tests. It should be relatively easy to develop and test an allocator for a specific problem. I agree that pool allocators would seem better adapted to lists, sets, and maps, and should probably be avoided for things like vectors that allocate relatively infrequently.

Edit: More context: since my algorithms were dominated by my initial sort on edge.min_y and by the O(log n) behaviour of the sweepline container, I judged the improvement not justified in my case.

[Edited by - chairthrower on May 21, 2009 12:35:33 PM]

##### Share on other sites
Ok, I had to clarify this again for myself with a restricted example: comparing the speed of the Boost pool allocator with std::allocator, recent g++ -O3. clock_t and uint64_t (C99?) may not be portable, unfortunately. dummy just accumulates the raw pointer values and is returned from main() to force evaluation and stop the compiler optimising too much away. The Boost fast pool allocator allocates 1M ints and then frees them all in one hit; std::allocator allocates 1M ints and frees each one immediately after allocating, as it goes. Each test runs 100 times.

The results show about a 3:1 speed improvement for the pooled allocator over the std allocator:

fast_pool_allocator: 1290ms
std::allocator: 3830ms

The test is *heavily* oriented in favour of std::allocator and probably could not even be considered a representative example, due to the immediate free after allocation and the presumption that std::allocator will just reallocate what it just freed. Additionally, pool_alloc with 1M x 4 bytes might be going outside the L2 cache. It would be possible to change the behaviour to a more representative use case, but I believe it shows that pooling is not a given order-of-magnitude win. Other factors such as standardization of code, contingent operations (tree rotations in std::set etc.), and cache should be considered.

```cpp
#include <iostream>
#include <ctime>
#include <cstdint>
#include <boost/pool/pool_alloc.hpp>

static clock_t m_sw_start;

static void start_timer()
{
    m_sw_start = clock();
}

static double elapsed_time()
{
    clock_t stop = clock();
    return double(stop - m_sw_start) * 1000.0 / CLOCKS_PER_SEC;
}

int main()
{
    unsigned n = 100;
    uint64_t dummy = 0;

    start_timer();
    for (unsigned k = 0; k < n; ++k)
    {
        typedef boost::fast_pool_allocator<int> alloc_type;
        alloc_type pa;
        for (unsigned i = 0; i < 1000000; ++i) {
            int* pi = pa.allocate(1);
            dummy += (uint64_t)pi;
        }
        boost::singleton_pool<boost::fast_pool_allocator_tag, sizeof(int)>::release_memory();
    }
    std::cout << "fast_pool_allocator " << elapsed_time() << "ms" << std::endl;

    start_timer();
    for (unsigned k = 0; k < n; ++k)
    {
        typedef std::allocator<int> alloc_type;
        alloc_type pa;
        for (unsigned i = 0; i < 1000000; ++i) {
            int* pi = pa.allocate(1);
            dummy += (uint64_t)pi;
            pa.deallocate(pi, 1);
        }
    }
    std::cout << "std::allocator " << elapsed_time() << "ms" << std::endl;

    return dummy;
}
```

Edit: this is a much better example for std::allocator, which for gcc still gives roughly the same 3:1 ratio in favour of the pool:

```cpp
for (unsigned i = 0; i < 10000; ++i) {
    const unsigned nn = 100;
    int* pi[nn];
    for (unsigned j = 0; j < nn; ++j)
        pi[j] = pa.allocate(1);
    func(pi, nn);   // do-nothing function defined in an external file;
                    // guarantees filling of pi etc.
    for (unsigned j = 0; j < nn; ++j) {
        dummy += (uint64_t)pi[j];
        pa.deallocate(pi[j], 1);
    }
}
```

[Edited by - chairthrower on May 21, 2009 4:33:58 PM]

##### Share on other sites
Thanks for the input. I'm not sure I will get a performance boost out of it, or whether it really matters (it's for a demo), but it's nice to have the option.

Here's a sample test I did. Getting the semantics right was a bit of a pain, but here goes.

Quote:
```cpp
#include <map>
#include <functional>
#include <boost/pool/pool_alloc.hpp>

struct Object { int m_dummy; };

typedef std::less<int> ObjectCompare;
typedef std::pair<const int, Object*> ObjectPair;
typedef boost::pool_allocator<ObjectPair> ObjectPool;
typedef std::map<int, Object*, ObjectCompare, ObjectPool> ObjectMap;

int main(int argc, char** argv)
{
    ObjectPool pool;
    ObjectMap map(ObjectCompare(), pool);
    map.insert(ObjectPair(0, new Object));
    return 0;
}
```
