New is SLOW!

Started by
26 comments, last by iMalc 11 years, 5 months ago
It depends on the underlying allocator, but most allocators are sophisticated enough to properly optimize for that. All but the most stupid implementations will add a freed block to a kind of free list (some even do this per-thread), and return an element from there on the next allocation.

Therefore, if you allocate/free/allocate/free/allocate an integer, you will very likely get the same address or one of a handful of addresses every time. Thus, cache misses are not going to happen.

To provoke cache misses, you need to either allocate one large block and index into that randomly, or do some ten thousand allocations without freeing, and dereference the returned pointers (preferrably in random order). Any access pattern that is either localized or an access pattern with a constant stride (e.g. accessing every 64th or 253th byte) will very likely not give a lot of cache misses due to either several accesses landing in the same cache line anyway or triggering the CPU's built-in prefetch mechanism to anticipate your requests.
Advertisement

I was actually trying to generate cache misses with this code. I have read all about them and would like to experience them for myself.

Without wanting to sound too disheartening, I wouldn't worry about this kind of stuff for the time being. Concentrate on writing solid code and understanding the language (for instance, free store vs automatic storage semantics is particularly important point in a value-based language such as C++).


If you really care about how much you're missing the cache, it's not something you want to be working out in your head, for the most part. You first want to write code sensibly and then use profiling tools to tell you where you might be able to improve it in terms of performance. Intel's PCM will give you really low-level information of this kind.


I wanted to know if it was better to send in a pointer to a structure for a often called piece of code,
[/quote]
Neither. Typically use reference-to-const for non-primitive types. Tweak according to profile results. Again, IMHO getting the basics of the language down first is much more worth while.


or to just pass the structure by value to avoid a potential cache miss. [/quote]
Whether or not an object is on the stack or heap makes a difference, as does its size, and the amount of indirection inherent in its data structure. The full answer is extremely complicated.

I might not be getting cache misses because I have a 15MB L2(3?) cache, but I know not all systems have that.[/quote]
If you insist on learning more about this stuff (and again, I don't think it's necessarily a good idea at this point in your education), a good starting point is this article which demonstrates cache effects on performance. Note how the expert himself failed to explain all the effects revealed by the data. You could try replicating the results yourself in C++:
How would passing the structure by value prevent a cache miss?
Alvaro: I assumed that by copying the data into the stack then accessing it there, I would certainly not miss the cache. Because the CPU is fast this should be fast also.
edd: Actually my real goal is to implement an Observer scheme where I have, for example, DigitalButtonObservable, AnalogButtonObservable, 2DAxisObservable, etc., and then have a single InputPolyObserver class built into a vector in each UIController. By doing that I could, for example, map the jump of my playable character to three separate actions, i.e. Space Bar, X (360 controller) and maybe a mouse move for some Indie coolness. I know that it is not exactly logical to have the same action map to such disparate input methods, but I want to start with a completely generic setup for the engine code, then simplify when making the actual game. I really want to avoid having to code the input system twice, but rather script the input in any way I please, including mixing input types i.e. Digital vs. Analog.

The reason I can't pass by const ref in this case is that, for example, a digital button would return a bool for buttonDown and a mouse would return its int or float x, and y coords as a struct or whatnot. The only way I could think of to implement this as a single overload of a function in the abstract Observer class, named something such as GetObservableState(...), is to pass something like a void*, then requiring my client code to interpret the event, perhaps by also passing a type, such as an enum specifying whether the button was a digital press or an analog press i.e. DIGITAL_PRESS, ANALOG_PRESS.

I could then simply load up an observer with a function pointer for its code and link it with any button or combination of buttons from any controller, keyboard, or mouse, and the code would simply work as requested. Do you see another way to implement this?

I wonder as I wander...

http://www.davesgameoflife.com


Alvaro: I assumed that by copying the data into the stack then accessing it there, I would certainly not miss the cache. Because the CPU is fast this should be fast also.

It doesn't matter how fast the CPU is (when it comes to this cache/memory stuff); the bottleneck is the memory speed, not the CPU speed. Also, if there is a cache miss when using the object, there will be cache misses when copying the object as well. Just an FYI.
[size=2][ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]
I fail to see what any of that has to do with performance. Input is, relative to your game loop, a rare event. There is usually little point trying to optimise it, there are typically much "hotter" areas where you can actually reap measurable benefit.

There is a lot of stuff going on in this thread. First you were talking about the allocator, then the cache, now it seems to be about design. Try to keep a thread coherent, and create additional threads if necessary for unrelated topics.
rip-off,

I normally would do that, but I was responding directly to a post made by edd. I'll create a new thread next time.

I wonder as I wander...

http://www.davesgameoflife.com


Alvaro: I assumed that by copying the data into the stack then accessing it there, I would certainly not miss the cache. Because the CPU is fast this should be fast also.

But where are you copying it from? Any memory you read from there has to go through the cache hierarchy too. I don't want to start sounding like a broken record, but I'd say get the basics down first.



The reason I can't pass by const ref in this case is that, for example, a digital button would return a bool for buttonDown and a mouse would return its int or float x, and y coords as a struct or whatnot.
[/quote]
I still don't see why you can't use references to const, or even which part of the whole system you're talking about here. There are surely a whole bunch of functions involved, any of which could take arguments by value, reference or pointer(?).


The only way I could think of to implement this as a single overload of a function in the abstract Observer class, named something such as GetObservableState(...), is to pass something like a void*, then requiring my client code to interpret the event, perhaps by also passing a type, such as an enum specifying whether the button was a digital press or an analog press i.e. DIGITAL_PRESS, ANALOG_PRESS.
[/quote]
It sounds like you're perhaps doing the observer pattern backwards. The thing being observed tells the things observing it what's going on. You really don't want to start falling back to void*.


I could then simply load up an observer with a function pointer
[/quote]
void* and function pointers are all that's available in C. In C++, use virtual functions instead.

Do you see another way to implement this?[/quote]
We're wandering well off-topic here (for the second time), and my suggestion may not fit what you're doing, but I'd go with something like this:


template<typename State>
struct observer : uncopyable
{
virtual ~observer() { }
// called by an observable when a change is made.
virtual void on_update(const State &new_state) = 0;
};

template<typename State>
class observable : uncopyable
{
public:
// ...


// observers can be registered to receive notifications of change.
void add_observer(const share_ptr<observer<State> > &obs);

// called when something changes, calls on_update(new_state) for each
// registered observer.
void notify_observers(const State &new_state);


private:
// ...
};


So a mouse might have an "observable<mouse_position>", for example. Anything that wants to know when the mouse has moved registers an observer<mouse_position> implementation. This is a very noddy version of something like boost::signal<>.
Thank you very much edd, that is very instructive. I never considered the template version. Can you tell me how I can quote others and post clean and colored code like you did? Also, I have never heard the word "noddy."

I wonder as I wander...

http://www.davesgameoflife.com


Thank you very much edd, that is very instructive. I never considered the template version. Can you tell me how I can quote others and post clean and colored code like you did?

Under each post, there's a button that says "Quote". Click it to quote someone. Alternatively, add [ quote ][ /quote ] (without spaces) tags in your post to create your own quote (but it won't display their name or post or timestamp if you don't click "Quote"). To get formatted code, use [ code ][ /code ] tags (again, without spaces)
[size=2][ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]

This topic is closed to new replies.

Advertisement