SymLinked

NedMalloc and Pools


Hi,

Using NedMalloc, is there any use for pools like the Boost ones? I have to admit I haven't profiled yet, but I am going to when I get home. I assume a generic allocator like NedMalloc can't be as fast as using pools for commonly allocated (each frame) objects.

What do you think? Have you made any comparisons? Would you use both in combination?

Sincerely,
Stefan

The first reply should probably be "profile whether allocations are your bottleneck first", but back to topic :-)

I have benchmarked two versions (current and previous, which was "current" at the time of testing) of NedMalloc against the MinGW default allocator as well as the low-frag heap in the pretty much best case scenario for NedMalloc and in more realistic scenarios, too. Tested with 1 to 5 threads on a 4-core machine. NedMalloc was compiled with full optimizations, and the default allocator was, well... whatever it is :-)
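For what it's worth, the general shape of such a test is roughly this (a sketch only, not the exact benchmark I ran; the sizes and iteration counts here are made up):

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <thread>
    #include <vector>

    // Each thread does many small allocations and frees; total wall time is
    // measured while the thread count varies from 1 to 5.
    static void hammer(std::size_t iterations)
    {
        std::vector<void*> live;
        live.reserve(64);
        for (std::size_t i = 0; i < iterations; ++i)
        {
            live.push_back(std::malloc(16 + (i % 240))); // small, mixed sizes
            if (live.size() == 64)                       // keep a small working set
            {
                for (void* p : live) std::free(p);
                live.clear();
            }
        }
        for (void* p : live) std::free(p);
    }

    int main()
    {
        for (unsigned threads = 1; threads <= 5; ++threads)
        {
            auto start = std::chrono::steady_clock::now();
            std::vector<std::thread> workers;
            for (unsigned t = 0; t < threads; ++t)
                workers.emplace_back(hammer, std::size_t(1000000));
            for (auto& w : workers) w.join();
            auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                          std::chrono::steady_clock::now() - start).count();
            std::printf("%u thread(s): %lld ms\n", threads, (long long)ms);
        }
    }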

The test benchmark coming with NedMalloc (deliberately?) doesn't test the low-frag heap, by the way.
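For reference, and since the bundled benchmark skips it: on Windows the low-frag heap is requested explicitly via HeapSetInformation. A minimal sketch (on Vista and later the LFH is usually already on by default):

    #include <windows.h>
    #include <cstdio>

    int main()
    {
        // Ask for the Low Fragmentation Heap on the process heap.
        ULONG mode = 2; // 2 = enable LFH
        if (!HeapSetInformation(GetProcessHeap(), HeapCompatibilityInformation,
                                &mode, sizeof(mode)))
            std::printf("LFH not enabled, error %lu\n", GetLastError());
        return 0;
    }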

The standard allocator consistently outperformed NedMalloc. NedMalloc got closer with 3-4 concurrent threads and was roughly on par with 5 concurrent threads, but even then it was still about 5% slower than the low-frag heap.
With 3 or fewer concurrent threads, the standard allocator beat both NedMalloc versions hands down (10-30% faster, depending on pattern and concurrency).

So... for the "fastest existing allocator", I would have expected to see something different.

Regardless, you have to ask yourself: when does your program need to do millions of allocations with a concurrency level of 3 or more? Even if you have 3 or 4 threads running in parallel, the allocations will usually not all happen at exactly the same time, so congestion is never really that high.
So even if NedMalloc were, say, 20 times faster in that scenario, the scenario itself practically never occurs. On the other hand, the default allocator is quite provably bug-free (tested billions of times) and zero maintenance, which is always a good thing.

EDIT:
Oi, forgot about pools. Pools (and stacks likewise) may be a valid optimization, since they can eliminate locking and may offer better cache behaviour. Plus, stacks have the additional benefit that an allocation is just a pointer increment and a deallocation is a no-op.
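To illustrate what I mean by a stack (this is only a sketch of the idea, not code from NedMalloc or Boost): allocation bumps an offset, individual frees do nothing, and the whole frame's memory is released with one reset.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Minimal per-frame arena/stack allocator.
    class FrameArena
    {
    public:
        explicit FrameArena(std::size_t bytes) : buffer_(bytes), offset_(0) {}

        void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t))
        {
            std::size_t p = (offset_ + align - 1) & ~(align - 1); // align up
            if (p + size > buffer_.size())
                return nullptr;                                   // frame budget exhausted
            offset_ = p + size;
            return buffer_.data() + p;                            // allocation = pointer bump
        }

        void reset() { offset_ = 0; }                             // frees everything at once

    private:
        std::vector<std::uint8_t> buffer_;
        std::size_t offset_;
    };

You call reset() at the end of the frame; of course this only works for data that doesn't need destructors run (or you run them yourself).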

However, again, be sure that you actually have so many regular allocations that this is necessary. The standard allocator easily does upwards of 10 million allocations per second on my machine. If you do 10,000 allocations per frame at 60 fps, that is 600,000 allocations per second, or about 6% of that budget. Worth optimizing?

Interesting find on NedMalloc.

I'm not sure I understand the rest of your claims though. 6% CPU on what machine? Netbooks, for instance, have plenty of memory but low processing power. What about fragmentation? Boost pools are quick to put in, so it's not really any additional work.

I've used pools for projectiles before, since they're created and destroyed often and in larger numbers. I'm also not sure what the standard allocator is. Is that MSVC's, some standard-library allocator I haven't used, or one of the ones found in some Linux distros?
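For reference, what I mean by a Boost pool is roughly this (Projectile is just a made-up type for illustration):

    #include <boost/pool/object_pool.hpp>

    struct Projectile
    {
        float x, y, vx, vy;
    };

    boost::object_pool<Projectile> projectiles;

    void spawn_and_kill()
    {
        Projectile* p = projectiles.construct();  // allocated from the pool
        p->x = p->y = 0.0f;
        p->vx = 1.0f;
        p->vy = 0.0f;
        // ... use it ...
        projectiles.destroy(p);                   // returned to the pool, not the heap
    }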

When I compared MSVC's allocator with NedMalloc, the difference was very visible, even when using regular std::string objects. Again, I'm not sure I understand all the details, so feel free to explain, and thanks for chiming in. :)

Quote:
Original post by SymLinked
6% CPU on what machine? Netbooks for instance have high memory and low processing power.
6% on my somewhat older desktop machine. While I do see that netbooks have low processing power, I think the point is a bit arguable insofar as they lack the processing power for many other things as well.
I mean, I wouldn't expect to play Crysis on a netbook, for example. It just isn't made for that kind of thing.

So, if I'm worrying that a netbook won't be able to cope with my number of allocations, then probably the correct thing to do is to radically scale down (e.g. reduce particles from 50,000 to 500), since that radical scale-down will likely be needed everywhere else anyway. And at that point, the allocations don't really matter any more.

Quote:
What about fragmentation?
Fragmentation surely occurs with every standard allocator (including Ned's), and certainly pools do help to limit it.

[Though, funnily enough, a lot of people (including myself) will go to great lengths to avoid the fragmentation devil, and then at the same time end up using std::vector with push_back, or a library like stb_vorbis which reallocates growing memory blocks all the time.]
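As a tiny illustration of the std::vector point (just a sketch): reserving up front turns the repeated grow-and-copy reallocations into a single allocation.

    #include <vector>

    void fill(std::vector<int>& v, int n)
    {
        v.reserve(n);            // one allocation up front...
        for (int i = 0; i < n; ++i)
            v.push_back(i);      // ...instead of repeated reallocations while growing
    }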

Quote:
I'm also not sure what the standard allocator is.
No idea, really. It is whatever the MinGW compiler maps malloc and operator new[] to; that may be Hoard (as in the Linux version), or simply a call into MSVCRT, or some wrapper with some special optimizations around the latter. I wouldn't know.

What I'm saying is that allocation is by no means as evil as people often depict it. Certainly, it is always worth doing optimisations that can be achieved with two additional lines of code, and it is always worth making sure that you do things in a somewhat reasonable way. Then the default implementation (whatever it is) works surprisingly well.

The same is true for locking in general, by the way. A lot of people (again, including me) have spent or still spend a lot of time writing nasty and possibly buggy lock-free structures and algorithms because lock contention is the devil and locking totally kills performance.
Except that a plain normal critical section, which is well-tested and reliable, takes something like two dozen clock cycles in the non-contended case, and the non-contended case occurs 99.99% of the time, unless you really try to construct a totally contrived example, such as 4 threads hammering the same lock in an infinite loop all the time.
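For completeness, by "plain normal critical section" I mean nothing more exotic than this (a minimal sketch on Windows):

    #include <windows.h>

    CRITICAL_SECTION g_lock;
    long g_counter = 0;

    void init()     { InitializeCriticalSection(&g_lock); }

    void bump()
    {
        EnterCriticalSection(&g_lock);   // a handful of cycles when uncontended
        ++g_counter;
        LeaveCriticalSection(&g_lock);
    }

    void shutdown() { DeleteCriticalSection(&g_lock); }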

So the old-school "profile first, then optimize" advice isn't all wrong.

Thanks for your advice, it's valuable to me.

Though, I wasn't questioning the use of NedMalloc vs. the default allocator (and MSVC's is much slower than ptmalloc, NedMalloc, Hoard, etc.). I was just questioning the use of pools together with NedMalloc, in case it already does similar things to what pools do.
