std::set destructor performance

Started by
19 comments, last by iMalc 14 years, 10 months ago
I have written a small simulation (work related), which performs most of its work on std::sets. The profiler is telling me that 40% of the running time is spent in the destructor of std::set, and another 25% is spent in the copy constructor. At the same time, the expected performance sinks (union, intersection and difference operations) each consume less than 5% of the running time.

The sets are relatively small (never greater than 100 items), and contain only pointers, so the destructor and copy constructor should have very little work to do. I have tried reusing the sets by calling clear(), which had no performance benefit, and neither does replacing the copy constructor with a call to swap().

I am running Apple's version of GCC 4.0.1, and am compiling with the -O3 -g flags (-g to obtain symbol names for the profiler). The profiler in use is Apple's Shark.

Any ideas how I might improve on this performance? It seems more than a little odd, given that the items involved are raw pointers. I would rather not have to replace std::set, but if necessary, that is certainly an option.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Did you try using a pooled allocator?
Quote:Original post by SiCrane
Did you try using a pooled allocator?
Thanks! I hadn't thought of that, so I just added in boost::fast_pool_allocator, with the result that the copy constructor cost fell 1-2%, while the destructor cost fell 15%.

That is a very decent speedup, but these two still account for ~50% of my running time, and still dwarf the cost of the actual set operations (12%, 8% and 6%, respectively).

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

union, intersection and difference don't involve allocation or deallocation.

Well, it does -- but that time gets assigned (probably) to the constructor and destructor. :)

Sets and maps are tree-based structures, which makes them exceedingly slow to copy.

How are you using your std::sets? Do you add and remove elements throughout, or is it more of a "build up the set", then "read from the finished set", then "throw away the set" sequence?

Because the build/use/dispose pattern can be done quite efficiently with lazy sorting on a vector buffer. And if duplicate elements are (relatively) rare or not a problem, then you can even defer duplicate detection until the lazy sort.
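(The lazy-sorted vector buffer described above can be sketched roughly like this; it is an illustration of the idea, not the poster's actual code. Inserts are plain push_backs, and sorting plus duplicate removal are deferred until the first read.)

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T>
class LazySet {
    std::vector<T> items_;
    bool dirty_ = false;

    // Deferred sort + duplicate removal, done once before any lookup.
    void finalize() {
        if (dirty_) {
            std::sort(items_.begin(), items_.end());
            items_.erase(std::unique(items_.begin(), items_.end()),
                         items_.end());
            dirty_ = false;
        }
    }

public:
    // O(1) amortized insert: no tree rebalancing, no per-node allocation.
    void insert(const T& v) { items_.push_back(v); dirty_ = true; }

    bool contains(const T& v) {
        finalize();
        return std::binary_search(items_.begin(), items_.end(), v);
    }

    std::size_t size() {
        finalize();
        return items_.size();
    }
};
```

Because the storage is one contiguous buffer, copying or destroying a LazySet of pointers is a single memcpy-like operation or a single free, rather than a tree walk.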
Obviously, to provide any useful information you need to show us some sample code. But my hunch is that you're passing/returning std::sets to/from functions by value. Unless you can actually explain why your code should spend any significant amount of time creating/destroying sets.
Quote:Original post by osmanb
Obviously, to provide any useful information you need to show us some sample code. But my hunch is that you're passing/returning std::sets to/from functions by value. Unless you can actually explain why your code should spend any significant amount of time creating/destroying sets.
Code is long and convoluted (this is a simulation of a bittorrent-like p2p network).

All functions producing sets are passed the destination set by reference, and sets are only allocated in the main loop.

In a given cycle of the simulation, there are anywhere between 10 and 100 set operations (union, intersection or difference), and only a single set copy/swap.

However, if each of those operations counts as a copy to fill the destination set, and a destruction to clear the result (as NotAYakk suggests), you can see where the problem may arise.
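(A hypothetical sketch of the pattern being described, to make the cost concrete; `set_union_into` and the use of `int` keys are my illustration, not the poster's code. Even with the destination passed by reference, clear() tears down the old tree node by node and the inserter rebuilds it node by node, and a profiler attributes that work to the set's construction/destruction machinery.)

```cpp
#include <algorithm>
#include <iterator>
#include <set>

// Union a and b into out, reusing out's storage "by reference".
// out must not alias a or b.
void set_union_into(const std::set<int>& a, const std::set<int>& b,
                    std::set<int>& out) {
    out.clear();  // destroys every existing tree node
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::inserter(out, out.begin()));  // allocates one node per element
}
```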

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

I know this is probably a totally dumb question, but you are running a Release build, right? I've seen problems with sets and maps consuming huge amounts of CPU time due to iterator checking and other debug/safety functionality.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Quote:Original post by swiftcoder
you can see where the problem may arise.
Is it actually slow then?
The union, intersection and difference functions have linear time complexity, so they're algorithmically quite fast, and they're only dealing with pointers. I'm not actually that surprised that they only account for 26% of the running time, compared to the heftier manipulation of the set data structures and the allocation and deallocation of memory.
Well, since you're containing PODs, you could potentially use a region allocator and avoid walking through the structure to free the memory.

Standard containers don't allow that, however. Maybe you could look into a Boost.Intrusive solution.

Also, why do you have a container of pointers?
Quote:Original post by ApochPiQ
I know this is probably a totally dumb question, but you are running a Release build, right? I've seen problems with sets and maps consuming huge amounts of CPU time due to iterator checking and other debug/safety functionality.
Ja, highest optimisation level under GCC.
Quote:Original post by dmatter
Is it actually slow then?
Bit hard to say - I don't really have a baseline. The simulation was originally written in Python, but I ported it to C++ when the running times became too large.

The C++ code shows a 4-5x speedup over Python, which means that the (very small) simulations we are running currently take only 10 minutes each. Unfortunately, in the near future I will have to run much larger simulations, and very many of them.

Ultimately, after applying the pool allocator, it is probably fast enough - but the less time I spend waiting for data, the better.
Quote:Original post by loufoque
Well, since you're containing PODs, you could potentially use a region allocator and avoid walking through the structure to free the memory.

Standard containers don't allow that, however. Maybe you could look into a Boost.Intrusive solution.

Also, why do you have a container of pointers?
All the sets contain only pointers to nodes, while the nodes themselves are stored in a sorted, contiguous std::vector. This has the advantage that the sets are only operating on pointers (no expensive copies to be made).

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

This topic is closed to new replies.
