Jump to content

  • Log In with Google      Sign In   
  • Create Account

Custom Allocator Efficiency Comparison


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

  • You cannot reply to this topic
4 replies to this topic

#1 jerrinx   Members   -  Reputation: 208

Like
0Likes
Like

Posted 14 May 2012 - 01:48 AM

Hey Guys,

I made a custom allocator (based on Doug Lee allocator) and a utility library for C++.
When I use the custom allocator I am not getting a lot of difference when compared to the windows default malloc and free.

This is my result on the release version of the program run from console.


########Windows Allocator########
Sample 0 Time: 56.000000 ms
Sample 1 Time: 52.000000 ms
Sample 2 Time: 46.000000 ms
Sample 3 Time: 55.000000 ms
Sample 4 Time: 54.000000 ms
Sample 5 Time: 52.000000 ms
Sample 6 Time: 46.000000 ms
Sample 7 Time: 54.000000 ms
Sample 8 Time: 51.000000 ms
Sample 9 Time: 46.000000 ms
Sample 10 Time: 58.000000 ms
Sample 11 Time: 46.000000 ms
Sample 12 Time: 48.000000 ms
Sample 13 Time: 46.000000 ms
Sample 14 Time: 50.000000 ms
Sample 15 Time: 55.000000 ms
Sample 16 Time: 56.000000 ms
Sample 17 Time: 45.000000 ms
Sample 18 Time: 47.000000 ms
Sample 19 Time: 45.000000 ms
Sample 20 Time: 47.000000 ms
Sample 21 Time: 56.000000 ms
Sample 22 Time: 47.000000 ms
Sample 23 Time: 45.000000 ms
Sample 24 Time: 46.000000 ms
Sample 25 Time: 57.000000 ms
Sample 26 Time: 57.000000 ms
Sample 27 Time: 49.000000 ms
Sample 28 Time: 56.000000 ms
Sample 29 Time: 46.000000 ms


########Custom Allocator########
Sample 0 Time: 47.000000 ms
Sample 1 Time: 44.000000 ms
Sample 2 Time: 39.000000 ms
Sample 3 Time: 40.000000 ms
Sample 4 Time: 38.000000 ms
Sample 5 Time: 39.000000 ms
Sample 6 Time: 38.000000 ms
Sample 7 Time: 40.000000 ms
Sample 8 Time: 40.000000 ms
Sample 9 Time: 38.000000 ms
Sample 10 Time: 42.000000 ms
Sample 11 Time: 39.000000 ms
Sample 12 Time: 38.000000 ms
Sample 13 Time: 45.000000 ms
Sample 14 Time: 44.000000 ms
Sample 15 Time: 40.000000 ms
Sample 16 Time: 37.000000 ms
Sample 17 Time: 38.000000 ms
Sample 18 Time: 45.000000 ms
Sample 19 Time: 43.000000 ms
Sample 20 Time: 46.000000 ms
Sample 21 Time: 44.000000 ms
Sample 22 Time: 45.000000 ms
Sample 23 Time: 37.000000 ms
Sample 24 Time: 45.000000 ms
Sample 25 Time: 42.000000 ms
Sample 26 Time: 39.000000 ms
Sample 27 Time: 41.000000 ms
Sample 28 Time: 38.000000 ms
Sample 29 Time: 42.000000 ms

For the custom allocator
allocation - O(number of free chunks)
delete - O(1)

Does any one else have any information regarding general statistics for custom allocators they have used as compared to the native one ?
Any information is greatly appreciated..

I included the test file for reference.

Thanks,
Jerry

Attached Files


JerrinX

Sponsor:

#2 Hodgman   Moderators   -  Reputation: 30352

Like
2Likes
Like

Posted 14 May 2012 - 06:41 AM

I would expect Microsoft/GCC/etc.'s malloc to perform almost as well as something like dlmalloc. Actually, I wouldn't be surprised if they were actually using dlmalloc themselves!

Not having your allocator/list code, I obviously can't run the same test myself, but I copied your code roughly to these results:
########Standard List########

Sample 0 Time: 174.658611 ms

Sample 1 Time: 178.035155 ms

Sample 2 Time: 170.639098 ms

Sample 3 Time: 169.447884 ms

Sample 4 Time: 171.300843 ms

Sample 5 Time: 171.426281 ms

Sample 6 Time: 170.957386 ms

Sample 7 Time: 171.255186 ms

Sample 8 Time: 171.174973 ms

Sample 9 Time: 171.274394 ms

Sample 10 Time: 172.276601 ms

Sample 11 Time: 173.462272 ms

Sample 12 Time: 172.348276 ms

Sample 13 Time: 172.015920 ms

Sample 14 Time: 171.314925 ms

Sample 15 Time: 170.047760 ms

Sample 16 Time: 172.620490 ms

Sample 17 Time: 171.961725 ms

Sample 18 Time: 170.538411 ms

Sample 19 Time: 172.772378 ms

Sample 20 Time: 171.157062 ms

Sample 21 Time: 170.712486 ms

Sample 22 Time: 171.153218 ms

Sample 23 Time: 171.347350 ms

Sample 24 Time: 171.799183 ms

Sample 25 Time: 170.436874 ms

Sample 26 Time: 169.666752 ms

Sample 27 Time: 170.434311 ms

Sample 28 Time: 171.126768 ms

Sample 29 Time: 169.729471 ms



########Malloc/Free########

Sample 0 Time: 72.385743 ms

Sample 1 Time: 77.789746 ms

Sample 2 Time: 78.608923 ms

Sample 3 Time: 76.863475 ms

Sample 4 Time: 73.470719 ms

Sample 5 Time: 74.614152 ms

Sample 6 Time: 73.470294 ms

Sample 7 Time: 73.288970 ms

Sample 8 Time: 74.126065 ms

Sample 9 Time: 73.396482 ms

Sample 10 Time: 74.125208 ms

Sample 11 Time: 73.187426 ms

Sample 12 Time: 73.719032 ms

Sample 13 Time: 72.559386 ms

Sample 14 Time: 75.735405 ms

Sample 15 Time: 78.938723 ms

Sample 16 Time: 78.135759 ms

Sample 17 Time: 78.236453 ms

Sample 18 Time: 79.052642 ms

Sample 19 Time: 75.144060 ms

Sample 20 Time: 73.745914 ms

Sample 21 Time: 73.770232 ms

Sample 22 Time: 74.688390 ms

Sample 23 Time: 74.539915 ms

Sample 24 Time: 75.108223 ms

Sample 25 Time: 74.743003 ms

Sample 26 Time: 74.312508 ms

Sample 27 Time: 73.611088 ms

Sample 28 Time: 73.832951 ms

Sample 29 Time: 104.177885 ms



########New/Delete########

Sample 0 Time: 106.053881 ms

Sample 1 Time: 81.357852 ms

Sample 2 Time: 84.248006 ms

Sample 3 Time: 79.414867 ms

Sample 4 Time: 77.739395 ms

Sample 5 Time: 77.777795 ms

Sample 6 Time: 78.748859 ms

Sample 7 Time: 80.961064 ms

Sample 8 Time: 78.002214 ms

Sample 9 Time: 77.532895 ms

Sample 10 Time: 77.698864 ms

Sample 11 Time: 79.061598 ms

Sample 12 Time: 81.683390 ms

Sample 13 Time: 80.320232 ms

Sample 14 Time: 80.953382 ms

Sample 15 Time: 80.176875 ms

Sample 16 Time: 80.615900 ms

Sample 17 Time: 78.425460 ms

Sample 18 Time: 79.089329 ms

Sample 19 Time: 78.178428 ms

Sample 20 Time: 78.876004 ms

Sample 21 Time: 76.929182 ms

Sample 22 Time: 78.069203 ms

Sample 23 Time: 77.831559 ms

Sample 24 Time: 77.523082 ms

Sample 25 Time: 77.952728 ms

Sample 26 Time: 80.115438 ms

Sample 27 Time: 78.923792 ms

Sample 28 Time: 78.023121 ms

Sample 29 Time: 78.078590 ms



########Custom New########

Sample 0 Time: 57.033867 ms

Sample 1 Time: 57.575718 ms

Sample 2 Time: 57.030451 ms

Sample 3 Time: 58.402147 ms

Sample 4 Time: 59.323721 ms

Sample 5 Time: 59.077967 ms

Sample 6 Time: 58.853120 ms

Sample 7 Time: 59.006289 ms

Sample 8 Time: 57.555236 ms

Sample 9 Time: 58.053996 ms

Sample 10 Time: 56.847844 ms

Sample 11 Time: 56.901176 ms

Sample 12 Time: 57.571024 ms

Sample 13 Time: 57.548411 ms

Sample 14 Time: 56.950241 ms

Sample 15 Time: 57.026185 ms

Sample 16 Time: 56.919523 ms

Sample 17 Time: 56.417350 ms

Sample 18 Time: 57.097863 ms

Sample 19 Time: 57.543717 ms

Sample 20 Time: 57.553958 ms

Sample 21 Time: 61.106279 ms

Sample 22 Time: 61.448030 ms

Sample 23 Time: 60.788851 ms

Sample 24 Time: 62.251419 ms

Sample 25 Time: 64.147048 ms

Sample 26 Time: 57.912774 ms

Sample 27 Time: 57.066292 ms

Sample 28 Time: 56.633238 ms

Sample 29 Time: 60.453072 ms



########Custom Malloc########

Sample 0 Time: 25.684588 ms

Sample 1 Time: 25.197776 ms

Sample 2 Time: 24.875225 ms

Sample 3 Time: 24.606859 ms

Sample 4 Time: 24.524942 ms

Sample 5 Time: 24.430651 ms

Sample 6 Time: 24.728457 ms

Sample 7 Time: 24.803974 ms

Sample 8 Time: 24.011251 ms

Sample 9 Time: 24.533048 ms

Sample 10 Time: 24.780935 ms

Sample 11 Time: 24.725469 ms

Sample 12 Time: 24.698591 ms

Sample 13 Time: 24.162285 ms

Sample 14 Time: 24.635872 ms

Sample 15 Time: 24.655499 ms

Sample 16 Time: 25.093244 ms

Sample 17 Time: 24.769414 ms

Sample 18 Time: 24.684938 ms

Sample 19 Time: 24.730589 ms

Sample 20 Time: 26.535336 ms

Sample 21 Time: 25.872743 ms

Sample 22 Time: 25.338145 ms

Sample 23 Time: 24.931969 ms

Sample 24 Time: 24.574434 ms

Sample 25 Time: 24.454543 ms

Sample 26 Time: 24.661472 ms

Sample 27 Time: 24.692191 ms

Sample 28 Time: 24.640139 ms

Sample 29 Time: 24.202392 ms
[source lang=cpp]StackAlloc alloc(scratch, ((u8*)scratch)+scratchSize);Scope a(alloc, "");Timer* timer = eiNew(a, Timer)();printf("\n\n########Standard List########");for(int itrSample = 0; itrSample < 30; itrSample++){ double begin = timer->Elapsed(); for(int itrItr = 0; itrItr < 100; itrItr++) { std::list<int> lst; for(int itr = 0; itr < 5000; itr++) { lst.push_front(itr); } } double end = timer->Elapsed(); printf("\nSample %d Time: %f ms", itrSample, float(end - begin)*1000.0f);}struct ListItem{ int data; ListItem* next;};printf("\n\n########Malloc/Free########");for(int itrSample = 0; itrSample < 30; itrSample++){ double begin = timer->Elapsed(); for(int itrItr = 0; itrItr < 100; itrItr++) { ListItem* lst = 0; for(int itr = 0; itr < 5000; itr++) { ListItem* push = (ListItem*)malloc(sizeof(ListItem)); push->next = lst; push->data = itr; lst = push; } while(lst) { ListItem* dead = lst; lst = dead->next; free(dead); } } double end = timer->Elapsed(); printf("\nSample %d Time: %f ms", itrSample, float(end - begin)*1000.0f);}printf("\n\n########New/Delete########");for(int itrSample = 0; itrSample < 30; itrSample++){ double begin = timer->Elapsed(); for(int itrItr = 0; itrItr < 100; itrItr++) { ListItem* lst = 0; for(int itr = 0; itr < 5000; itr++) { ListItem* push = new ListItem; push->next = lst; push->data = itr; lst = push; } while(lst) { ListItem* dead = lst; lst = dead->next; delete dead; } } double end = timer->Elapsed(); printf("\nSample %d Time: %f ms", itrSample, float(end - begin)*1000.0f);}printf("\n\n########Custom New########");for(int itrSample = 0; itrSample < 30; itrSample++){ double begin = timer->Elapsed(); for(int itrItr = 0; itrItr < 100; itrItr++) { Scope t(a,""); ListItem* lst = 0; for(int itr = 0; itr < 5000; itr++) { ListItem* push = eiNew(t, ListItem); push->next = lst; push->data = itr; lst = push; } } double end = timer->Elapsed(); printf("\nSample %d Time: %f ms", itrSample, float(end - begin)*1000.0f);}printf("\n\n########Custom Malloc########");for(int itrSample = 0; itrSample < 30; itrSample++){ double begin = timer->Elapsed(); for(int itrItr = 0; itrItr < 100; itrItr++) { Scope t(a,""); ListItem* lst = 0; for(int itr = 0; itr < 5000; itr++) { ListItem* push = eiAlloc(t, ListItem); push->next = lst; push->data = itr; lst = push; } } double end = timer->Elapsed(); printf("\nSample %d Time: %f ms", itrSample, float(end - begin)*1000.0f);}[/source]

#3 ApochPiQ   Moderators   -  Reputation: 15697

Like
2Likes
Like

Posted 14 May 2012 - 11:02 AM

The general-purpose allocators in the wild are extremely hard to improve upon, in my experience. The Windows NT low-fragmentation heap allocator in particular is insanely good at its job.

Where custom allocators come in handy is in a couple of niche situations:

- You're on a platform with no (or bad) default allocators, such as a console
- You're not using general-purpose allocation patterns


In the first case, custom libraries like dlmalloc are often deployed in certain configurations because the alternatives are terrible. If you're programming against Windows (and to a lesser extent most Linux/BSD flavors) you have no need to worry about general-purpose allocation.

The second case is IMHO the more interesting one. For instance, if you know your allocation pattern looks like "allocate ten million tiny things then throw them all away at once" you can write a custom allocator that'll be an order of magnitude faster than a general-purpose allocator easily. Stack allocators are often useful, one-way allocators as mentioned, and so on. There are a handful of such patterns that can be handy to tap if you know what to look for.


There's one other reason to screw with allocators in my book, and that's instrumentation. It can be extremely handy in a large, complex code base to get comprehensive dumps of who's using what memory. Thankfully, there are increasingly good tools for this on most platforms these days, so rolling your own is rarely necessary any more.

#4 jerrinx   Members   -  Reputation: 208

Like
0Likes
Like

Posted 14 May 2012 - 03:06 PM

Wow !! Thanks a lot Hodgman and ApochiPiQ for that useful update !!
Really awesome you guys replied so fast !!

Hey Hodgman ur getting some pretty impressive results.

From the code you posted I am guessing it is a stack allocator ? Am I right ?
I think it is better to compare the windows allocator to a pool based one though.
Also are you running on release on console ?

I attached the allocator here. Check out if you have time.
I recommend using a fragment size of 4 Bytes on 32 Bit machine

Thanks again !
Jerry

Attached Files


Edited by jerrinx, 14 May 2012 - 03:14 PM.

JerrinX

#5 Hodgman   Moderators   -  Reputation: 30352

Like
2Likes
Like

Posted 15 May 2012 - 02:16 AM

From the code you posted I am guessing it is a stack allocator ? Am I right ?
I think it is better to compare the windows allocator to a pool based one though.
Also are you running on release on console ?

Yep that's a stack allocator (the lifetime of the allocations are tied to the lifetime of the Scope t variable) so it's not really a fair comparison, but then again, the almost my entire engine is allocated from stack allocators... Posted Image Both my "custom new" and "custom malloc" tests are using the stack allocator, but the speed difference is that the first test also allocates a linked-list of destructor callbacks, whereas the second test skips all the data required to call destructors correctly when the scope object dies.

I tried using my fixed-size pool allocator in the same test (again, not quite a fair comparison with a general-purpose allocator) and got even faster results than my stack allocator though!
########Pool########
Sample 0 Time: 18.695643 ms
Sample 1 Time: 18.929025 ms
Sample 2 Time: 18.866306 ms
Sample 3 Time: 19.606981 ms
Sample 4 Time: 19.088168 ms
Sample 5 Time: 18.805722 ms
Sample 6 Time: 18.731056 ms
Sample 7 Time: 18.869294 ms
Sample 8 Time: 18.873986 ms
Sample 9 Time: 19.106515 ms
Sample 10 Time: 19.028010 ms
Sample 11 Time: 19.021183 ms
Sample 12 Time: 18.926892 ms
Sample 13 Time: 19.299362 ms
Sample 14 Time: 19.043369 ms
Sample 15 Time: 18.762629 ms
Sample 16 Time: 18.997716 ms
Sample 17 Time: 18.873986 ms
Sample 18 Time: 18.857347 ms
Sample 19 Time: 18.777136 ms
Sample 20 Time: 19.036969 ms
Sample 21 Time: 18.784815 ms
Sample 22 Time: 19.121021 ms
Sample 23 Time: 19.129980 ms
Sample 24 Time: 18.948652 ms
Sample 25 Time: 19.180752 ms
Sample 26 Time: 18.912386 ms
Sample 27 Time: 19.054888 ms
Sample 28 Time: 18.936278 ms
Sample 29 Time: 19.156005 ms

I attached the allocator here. Check out if you have time.

I tried to use it, but I'm using a different compiler than you -- the lib file was incompatible.

I've uploaded the source for my stack, pool and scope classes if you're interested.

Attached Files


Edited by Hodgman, 15 May 2012 - 02:18 AM.





Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.



PARTNERS