std::set destructor performance
I have written a small simulation (work related), which performs most of its work on std::sets. The profiler is telling me that 40% of the running time is spent in the destructor of std::set, and another 25% in the copy constructor. At the same time, the operations I expected to be the performance sinks (union, intersection and difference) each consume less than 5% of the running time.
The sets are relatively small (never greater than 100 items), and contain only pointers, so the destructor and copy constructor should have very little work to do. I have tried reusing the sets by calling clear(), which had no performance benefit, nor did replacing the copy constructor with a call to swap().
I am running Apple's version of GCC 4.0.1, and am compiling with the -O3 -g flags (-g to obtain symbol names for the profiler). Profiler in use is Apple's Shark.
Any ideas how I might improve on this performance? It seems more than a little odd, given that the items involved are raw pointers. I would rather not have to replace std::set, but if necessary, that is certainly an option.
Quote: Original post by SiCrane
> Did you try using a pooled allocator?

Thanks! I hadn't thought of that, so I just added in boost::fast_pool_allocator, with the result that the copy constructor cost fell 1-2%, while the destructor cost fell 15%.
That is a very decent speedup, but these two still account for ~50% of my running time, and still dwarf the cost of the actual set operations (12%, 8% and 6% respectively).
The union, intersection and difference operations don't involve allocation or deallocation.
Well, it does -- but that time gets assigned (probably) to the constructor and destructor. :)
Sets and Maps are tree-based structures. This makes them exceedingly slow to copy.
How are you using your std sets? Do you add and remove elements, or is it more of a "build up the set" then "read from the finished set" then "throw away the set" sequence?
Because the build/use/dispose pattern can be done quite efficiently with lazy sorting on a vector buffer. And if duplicate elements are (relatively) rare or not a problem, then you can even defer duplicate detection until the lazy sort.
Obviously, to provide any useful information you need to show us some sample code. But my hunch is that you're passing/returning std::sets to/from functions by value. Unless you can actually explain why your code should spend any significant amount of time creating/destroying sets.
Quote: Original post by osmanb
> Obviously, to provide any useful information you need to show us some sample code. But my hunch is that you're passing/returning std::sets to/from functions by value. Unless you can actually explain why your code should spend any significant amount of time creating/destroying sets.

Code is long and convoluted (this is a simulation of a BitTorrent-like p2p network). All functions producing sets are passed the destination set by reference, and sets are only allocated in the main loop.
In a given cycle of the simulation, there are anywhere between 10 and 100 set operations (union, intersection or difference), and only a single set copy/swap.
However, if each of those operations counts as a copy to fill the destination set, and a destruction to clear the result (as NotAYakk suggests), you can see where the problem may arise.
I know this is probably a totally dumb question, but you are running a Release build, right? I've seen problems with sets and maps consuming huge amounts of CPU time due to iterator checking and other debug/safety functionality.
Quote: Original post by swiftcoder
> you can see where the problem may arise.

Is it actually slow then? The union, intersection and difference functions have linear time complexity, which is algorithmically quite fast, and they're only dealing with pointers, so I'm not actually that surprised that they account for only 26% of the running time compared to the heftier manipulation of the set data structures, plus the allocation and deallocation of memory.
Well, since you're containing PODs, you could eventually do a region allocator and avoid walking through the structure to free the memory.
Standard containers don't allow that, however. Maybe you could look into a Boost.Intrusive solution.
Why do you have a container of pointers also?
Quote: Original post by ApochPiQ
> I know this is probably a totally dumb question, but you are running a Release build, right? I've seen problems with sets and maps consuming huge amounts of CPU time due to iterator checking and other debug/safety functionality.

Ja, highest optimisation level under GCC.
Quote: Original post by dmatter
> Is it actually slow then?

Bit hard to say - I don't really have a baseline. The simulation was originally written in Python, but I ported it to C++ when the running times became too large.
The C++ code shows a 4-5x speedup over Python, which means that the (very small) simulations we are running currently take only 10 minutes each. Unfortunately, in the near future I will have to run much larger simulations, and very many of them.
Ultimately, after applying the pool allocator, it is probably fast enough - but the less time I spend waiting for data, the better.
Quote: Original post by loufoque
> Well, since you're containing PODs, you could eventually do a region allocator and avoid walking through the structure to free the memory. Standard containers don't allow that, however. Maybe you could look into a Boost.Intrusive solution. Why do you have a container of pointers also?

All the sets contain only pointers to nodes, while the nodes themselves are stored in a sorted, contiguous std::vector. This has the advantage that the sets are only operating on pointers (no expensive copies to be made).
This topic is closed to new replies.