Sign in to follow this  
KingofNoobs

New is SLOW!

Recommended Posts

KingofNoobs    305
Hello all.

I just ran the following two functions 100 times each:

void passing(int j )
{
for(int i = 0; i < 1000000; ++i)
{
int f = 0;
}
}
void referrencing(int* j)
{
for(int i = 0; i < 1000000; ++i)
{
int* f = new int;
delete f;
}
}

here was the output:

passing: 2381801 ticks
referrencing: 787614789 ticks

That is almost 400x as slow! Is this an anomaly of me running windows on a virtual machine, or is new, delete always this slow as compared with local variable alloation? I also did not find that simply dereferencing j caused any significant slowdown in the code.

Share this post


Link to post
Share on other sites
BitMaster    8651
A lot of this behaviour in C++ is implementation dependent but as a general rule in a nutshell:
- stack allocation requires incrementing a pointer by sizeof(WhatIWant)
- heap allocation requires synchronizing over all threads and walking some kind of list until a suitable chunk of memory is found, modifying the list accordingly, and returning the pointer.
Note that this applies to C++ only. On modern Java implementations for example, new is silly cheap. Also note that the Debug runtime of MSVC for example does [b]a lot[/b] of extra work (like initializing the allocated memory with a special value and padding the allocated memory with another special value).

Also note that an optimizing compiler will probably be able to completely eliminate your passing function since it can detect that nothing in there has a side effect. I would not trust the benchmark as written. Either the compiler is not optimizing (in which case it's probably the default Debug configuration in case of MSVC and as such completely useless for any benchmark) or the function and containing loop was probably completely removed.

Share this post


Link to post
Share on other sites
SimonForsman    7642
[quote name='BitMaster' timestamp='1354540404' post='5006585']
On modern Java implementations for example, new is silly cheap.
[/quote]

new might be cheap in Java compared to C++ but cleaning up isn't. (excessive use of new in Java is almost as bad as it is in C++ since it will give the GC more work)

Share this post


Link to post
Share on other sites
BitMaster    8651
I didn't say it does not come with a price at a different place. If Java-new were just completely better without any negative side effect at another point everyone would be stopping to use C++ and start using Java (well, not really). The important point up there was that new being expensive is something related to C++ only. It comes with prices and benefits all over the place. Other languages, even if they have the exactly same looking new-keyword, do that differently and things come with different prices and benefits.

Share this post


Link to post
Share on other sites
Apart from the already mentioned obvious (the first function is no-op), I don't consider 787 cycles for a general allocator slow. If you subtract the cycles for your [font=courier new,courier,monospace]for [/font]loop and for the non-inlineable library call, that's around 700-750 cycles alltogether.

Actually, for a non-specialized allocator which has to deliver regardless of size and situation, and is running in debug mode (unless you use the world's worst compiler in existence, this is [i]demonstrably [/i]a non-optimized build), that's a pretty awesome performance.

In comparison, a single cache miss is on the order of several hundred cycles. Thus, realistically, for non-trivial code (one that actually uses a million allocated memory locations), the overhead of the allocator wouldn't matter much.

Share this post


Link to post
Share on other sites
KingofNoobs    305
Samoth,

I was actually trying to generate cache misses with this code. I have read all about them and would like to experience them for myself. I wanted to know if it was better to send in a pointer to a structure for a often called piece of code, or to just pass the structure by value to avoid a potential cache miss. Can you advise on which you think would be better in that case? I might not be getting cache misses because I have a 15MB L2(3?) cache, but I know not all systems have that.

Share this post


Link to post
Share on other sites
It depends on the underlying allocator, but most allocators are sophisticated enough to properly optimize for that. All but the most stupid implementations will add a freed block to a kind of free list (some even do this per-thread), and return an element from there on the next allocation.

Therefore, if you allocate/free/allocate/free/allocate an integer, you will very likely get the same address or one of a handful of addresses every time. Thus, cache misses are not going to happen.

To provoke cache misses, you need to either allocate one large block and index into that randomly, or do some ten thousand allocations without freeing, and dereference the returned pointers (preferrably in random order). Any access pattern that is either localized or an access pattern with a constant stride (e.g. accessing every 64th or 253th byte) will very likely [i]not [/i]give a lot of cache misses due to either several accesses landing in the same cache line anyway or triggering the CPU's built-in prefetch mechanism to anticipate your requests. Edited by samoth

Share this post


Link to post
Share on other sites
the_edd    2109
[quote name='KingofNoobs' timestamp='1354560968' post='5006718']
I was actually trying to generate cache misses with this code. I have read all about them and would like to experience them for myself.
[/quote]
Without wanting to sound too disheartening, I wouldn't worry about this kind of stuff for the time being. Concentrate on writing solid code and understanding the language (for instance, free store vs automatic storage semantics is particularly important point in a value-based language such as C++).


If you really care about how much you're missing the cache, it's not something you want to be working out in your head, for the most part. You first want to write code sensibly and then use profiling tools to tell you where you might be able to improve it in terms of performance. [url="http://software.intel.com/en-us/articles/intel-performance-counter-monitor/"]Intel's PCM will give you really low-level information of this kind[/url].

[quote]
I wanted to know if it was better to send in a pointer to a structure for a often called piece of code,
[/quote]
Neither. Typically use reference-to-const for non-primitive types. Tweak according to profile results. Again, IMHO getting the basics of the language down first is much more worth while.


[quote]or to just pass the structure by value to avoid a potential cache miss. [/quote]
Whether or not an object is on the stack or heap makes a difference, as does its size, and the amount of indirection inherent in its data structure. The full answer is extremely complicated.

[quote]I might not be getting cache misses because I have a 15MB L2(3?) cache, but I know not all systems have that.[/quote]
If you insist on learning more about this stuff (and again, I don't think it's necessarily a good idea at this point in your education), a [url="http://igoro.com/archive/gallery-of-processor-cache-effects/"]good starting point is this article which demonstrates cache effects on performance[/url]. Note how the expert himself failed to explain all the effects revealed by the data. You could try replicating the results yourself in C++:
Edited by e?dd

Share this post


Link to post
Share on other sites
KingofNoobs    305
Alvaro: I assumed that by copying the data into the stack then accessing it there, I would certainly not miss the cache. Because the CPU is fast this should be fast also.
edd: Actually my real goal is to implement an Observer scheme where I have, for example, DigitalButtonObservable, AnalogButtonObservable, 2DAxisObservable, etc., and then have a single InputPolyObserver class built into a vector in each UIController. By doing that I could, for example, map the jump of my playable character to three separate actions, i.e. Space Bar, X (360 controller) and maybe a mouse move for some Indie coolness. I know that it is not exactly logical to have the same action map to such disparate input methods, but I want to start with a completely generic setup for the engine code, then simplify when making the actual game. I really want to avoid having to code the input system twice, but rather script the input in any way I please, including mixing input types i.e. Digital vs. Analog.

The reason I can't pass by const ref in this case is that, for example, a digital button would return a bool for buttonDown and a mouse would return its int or float x, and y coords as a struct or whatnot. The only way I could think of to implement this as a single overload of a function in the abstract Observer class, named something such as GetObservableState(...), is to pass something like a void*, then requiring my client code to interpret the event, perhaps by also passing a type, such as an enum specifying whether the button was a digital press or an analog press i.e. DIGITAL_PRESS, ANALOG_PRESS.

I could then simply load up an observer with a function pointer for its code and link it with any button or combination of buttons from any controller, keyboard, or mouse, and the code would simply work as requested. Do you see another way to implement this?

Share this post


Link to post
Share on other sites
Cornstalks    7030
[quote name='KingofNoobs' timestamp='1354565467' post='5006750']
Alvaro: I assumed that by copying the data into the stack then accessing it there, I would certainly not miss the cache. Because the CPU is fast this should be fast also.
[/quote]
It doesn't matter how fast the CPU is (when it comes to this cache/memory stuff); the bottleneck is the memory speed, not the CPU speed. Also, if there is a cache miss when using the object, there will be cache misses when copying the object as well. Just an FYI.

Share this post


Link to post
Share on other sites
rip-off    10976
I fail to see what any of that has to do with performance. Input is, relative to your game loop, a rare event. There is usually little point trying to optimise it, there are typically much "hotter" areas where you can actually reap measurable benefit.

There is a lot of stuff going on in this thread. First you were talking about the allocator, then the cache, now it seems to be about design. Try to keep a thread coherent, and create additional threads if necessary for unrelated topics.

Share this post


Link to post
Share on other sites
the_edd    2109
[quote name='KingofNoobs' timestamp='1354565467' post='5006750']
Alvaro: I assumed that by copying the data into the stack then accessing it there, I would certainly not miss the cache. Because the CPU is fast this should be fast also.
[/quote]
But where are you copying it from? Any memory you read from there has to go through the cache hierarchy too. I don't want to start sounding like a broken record, but I'd say get the basics down first.


[quote]
The reason I can't pass by const ref in this case is that, for example, a digital button would return a bool for buttonDown and a mouse would return its int or float x, and y coords as a struct or whatnot.
[/quote]
I still don't see why you can't use references to const, or even which part of the whole system you're talking about here. There are surely a whole bunch of functions involved, any of which could take arguments by value, reference or pointer(?).

[quote]
The only way I could think of to implement this as a single overload of a function in the abstract Observer class, named something such as GetObservableState(...), is to pass something like a void*, then requiring my client code to interpret the event, perhaps by also passing a type, such as an enum specifying whether the button was a digital press or an analog press i.e. DIGITAL_PRESS, ANALOG_PRESS.
[/quote]
It sounds like you're perhaps doing the observer pattern backwards. The thing being observed tells the things observing it what's going on. You really don't want to start falling back to void*.

[quote]
I could then simply load up an observer with a function pointer
[/quote]
void* and function pointers are all that's available in C. In C++, use virtual functions instead.

[quote]Do you see another way to implement this?[/quote]
We're wandering well off-topic here (for the second time), and my suggestion may not fit what you're doing, but I'd go with something like this:

[code]
template<typename State>
struct observer : uncopyable
{
virtual ~observer() { }
// called by an observable when a change is made.
virtual void on_update(const State &new_state) = 0;
};

template<typename State>
class observable : uncopyable
{
public:
// ...


// observers can be registered to receive notifications of change.
void add_observer(const share_ptr<observer<State> > &obs);

// called when something changes, calls on_update(new_state) for each
// registered observer.
void notify_observers(const State &new_state);


private:
// ...
};
[/code]

So a mouse might have an "observable<mouse_position>", for example. Anything that wants to know when the mouse has moved registers an observer<mouse_position> implementation. This is a very noddy version of something like boost::signal<>.

Share this post


Link to post
Share on other sites
KingofNoobs    305
Thank you very much edd, that is very instructive. I never considered the template version. Can you tell me how I can quote others and post clean and colored code like you did? Also, I have never heard the word "noddy."

Share this post


Link to post
Share on other sites
Cornstalks    7030
[quote name='KingofNoobs' timestamp='1354568007' post='5006769']
Thank you very much edd, that is very instructive. I never considered the template version. Can you tell me how I can quote others and post clean and colored code like you did?
[/quote]
Under each post, there's a button that says "Quote". Click it to quote someone. Alternatively, add [ quote ][ /quote ] (without spaces) tags in your post to create your own quote (but it won't display their name or post or timestamp if you don't click "Quote"). To get formatted code, use [ code ][ /code ] tags (again, without spaces)

Share this post


Link to post
Share on other sites
KingofNoobs    305
[quote name='Cornstalks' timestamp='1354568234' post='5006771']
[quote name='KingofNoobs' timestamp='1354568007' post='5006769']
Thank you very much edd, that is very instructive. I never considered the template version. Can you tell me how I can quote others and post clean and colored code like you did?
[/quote]
Under each post, there's a button that says "Quote". Click it to quote someone. Alternatively, add [ quote ][ /quote ] (without spaces) tags in your post to create your own quote (but it won't display their name or post or timestamp if you don't click "Quote"). To get formatted code, use [ code ][ /code ] tags (again, without spaces)
[/quote]

Thank you very much!

Share this post


Link to post
Share on other sites
alnite    3436
[quote name='BitMaster' timestamp='1354542515' post='5006595']
I didn't say it does not come with a price at a different place. If Java-new were just completely better without any negative side effect at another point everyone would be stopping to use C++ and start using Java (well, not really). The important point up there was that new being expensive is something related to C++ only. It comes with prices and benefits all over the place. Other languages, even if they have the exactly same looking new-keyword, do that differently and things come with different prices and benefits.
[/quote]

No. Allocating memory is always expensive regardless of what language you are speaking in, even in Java. I find this a common error among Java programmers. They take GC for granted and they forgot their code is wasting cycles. Allocating new inside a loop is very much a bad habit.

[CODE]
some loop {
String someStuff = "I'm" + " concatenating " + " like" + "crazy " + "here.";
MyClass temp = new MyClass();
// using temp and someStuff
}
[/CODE]

Don't do this. Since those variables are going to be used over and over, why don't you just create it once outside of loop and use it inside?

Share this post


Link to post
Share on other sites
KingofNoobs    305
I am a noob here. So I have a noobish question. I just received a downvote for this post. Is there a reason for this, and something I can avoid in the future? Because I don't like offending people or posting inappropriate content.

Share this post


Link to post
Share on other sites
lride    674
[quote name='KingofNoobs' timestamp='1354569012' post='5006782']
I am a noob here. So I have a noobish question. I just received a downvote for this post. Is there a reason for this, and something I can avoid in the future? Because I don't like offending people or posting inappropriate content.
[/quote]
+1 to compensate for that downvote.
It is an obvious fact that new is slow, so someone could've thought your question was ridiculous especially when you caught attention of people by capitalizing SLOW!

Share this post


Link to post
Share on other sites
Khatharr    8812
[quote name='KingofNoobs' timestamp='1354569012' post='5006782']
I am a noob here. So I have a noobish question. I just received a downvote for this post. Is there a reason for this, and something I can avoid in the future? Because I don't like offending people or posting inappropriate content.
[/quote]

Some people are just in a default state of 'offended' and will apply that to whoever and whatever they come across. If your conscience is clear then you're okay.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this