New is SLOW!


Hello all.

I just ran the following two functions 100 times each:

void passing(int j)
{
    for (int i = 0; i < 1000000; ++i)
    {
        int f = 0;
    }
}

void referencing(int* j)
{
    for (int i = 0; i < 1000000; ++i)
    {
        int* f = new int;
        delete f;
    }
}

Here was the output:

passing: 2381801 ticks
referencing: 787614789 ticks

That is over 300x as slow! Is this an anomaly of running Windows in a virtual machine, or are new and delete always this slow compared with local variable allocation? I also did not find that simply dereferencing j caused any significant slowdown in the code.

A lot of this behaviour in C++ is implementation-dependent, but as a general rule, in a nutshell:
- stack allocation requires incrementing a pointer by sizeof(WhatIWant)
- heap allocation requires synchronizing across all threads, walking some kind of free list until a suitable chunk of memory is found, modifying the list accordingly, and returning the pointer.
Note that this applies to C++ only. On modern Java implementations for example, new is silly cheap. Also note that the Debug runtime of MSVC for example does a lot of extra work (like initializing the allocated memory with a special value and padding the allocated memory with another special value).

Also note that an optimizing compiler will probably be able to eliminate your passing function completely, since it can detect that nothing in there has a side effect. I would not trust the benchmark as written. Either the compiler is not optimizing (in which case it's probably the default Debug configuration of MSVC, and as such completely useless for any benchmark), or the function and its containing loop were completely removed.


On modern Java implementations for example, new is silly cheap.


new might be cheap in Java compared to C++, but cleaning up isn't (excessive use of new in Java is almost as bad as it is in C++, since it gives the GC more work).

I didn't say it doesn't come with a price somewhere else. If Java's new were strictly better, with no negative side effect anywhere else, everyone would stop using C++ and start using Java (well, not really). The important point above was that new being expensive is something specific to C++. It comes with costs and benefits all over the place. Other languages, even if they have an identical-looking new keyword, do it differently, and things come with different costs and benefits.

Apart from the already mentioned obvious (the first function is a no-op), I don't consider 787 cycles for a general allocator slow. If you subtract the cycles for your for loop and for the non-inlineable library call, that's around 700-750 cycles altogether.

Actually, for a non-specialized allocator which has to deliver regardless of size and situation, and is running in debug mode (unless you use the worst compiler in existence, this is demonstrably a non-optimized build), that's pretty awesome performance.

In comparison, a single cache miss is on the order of several hundred cycles. Thus, realistically, for non-trivial code (code that actually uses a million allocated memory locations), the overhead of the allocator wouldn't matter much.

Samoth,

I was actually trying to generate cache misses with this code. I have read all about them and would like to observe them for myself. I wanted to know whether it is better to pass a pointer to a structure into an often-called piece of code, or to pass the structure by value to avoid a potential cache miss. Can you advise which you think would be better in that case? I might not be getting cache misses because I have a 15MB L2 (L3?) cache, but I know not all systems have that.
