std::set destructor performance
I have written a small simulation (work related), which performs most of its work on std::sets. The profiler is telling me that 40% of the running time is spent in the destructor of std::set, and another 25% in the copy constructor. At the same time, the operations I expected to be the performance sinks (union, intersection and difference) each consume less than 5% of the running time.
The sets are relatively small (never greater than 100 items), and contain only pointers, so the destructor and copy constructor should have very little work to do. I have tried reusing the sets by calling clear(), which had no performance benefit, nor did replacing the copy constructor with a call to swap().
I am running Apple's version of GCC 4.0.1, and am compiling with the -O3 -g flags (-g to obtain symbol names for the profiler). Profiler in use is Apple's Shark.
Any ideas how I might improve on this performance? It seems more than a little odd, given that the items involved are raw pointers. I would rather not have to replace std::set, but if necessary, that is certainly an option.
Quote: Original post by SiCrane
> Did you try using a pooled allocator?

Thanks! I hadn't thought of that, so I just added in boost::fast_pool_allocator, with the result that the copy constructor cost fell 1-2%, while the destructor cost fell 15%.
That is a very decent speedup, but these two still account for ~50% of my running time, and still dwarf the cost of the actual set operations (12%, 8% and 6% respectively).
The union, intersection and difference operations don't involve allocation or deallocation.
Well, it does -- but that time gets assigned (probably) to the constructor and destructor. :)
Sets and Maps are tree-based structures. This makes them exceedingly slow to copy.
How are you using your std sets? Do you add and remove elements, or is it more of a "build up the set" then "read from the finished set" then "throw away the set" sequence?
Because the build/use/dispose pattern can be done quite efficiently with lazy sorting on a vector buffer. And if duplicate elements are (relatively) rare or not a problem, then you can even defer duplicate detection until the lazy sort.
Obviously, to provide any useful information you need to show us some sample code. But my hunch is that you're passing/returning std::sets to/from functions by value. Unless you can actually explain why your code should spend any significant amount of time creating/destroying sets.
Quote: Original post by osmanb
> Obviously, to provide any useful information you need to show us some sample code. But my hunch is that you're passing/returning std::sets to/from functions by value. Unless you can actually explain why your code should spend any significant amount of time creating/destroying sets.

Code is long and convoluted (this is a simulation of a BitTorrent-like p2p network). All functions producing sets are passed the destination set by reference, and sets are only allocated in the main loop.
In a given cycle of the simulation, there are anywhere between 10 and 100 set operations (union, intersection or difference), and only a single set copy/swap.
However, if each of those operations counts as a copy to fill the destination set, and a destruction to clear the result (as NotAYakk suggests), you can see where the problem may arise.
I know this is probably a totally dumb question, but you are running a Release build, right? I've seen problems with sets and maps consuming huge amounts of CPU time due to iterator checking and other debug/safety functionality.
Quote: Original post by swiftcoder
> you can see where the problem may arise.

Is it actually slow then? The union, intersection and difference functions have linear time complexity, which is algorithmically quite fast, and they're only dealing with pointers, so I'm not actually that surprised that they account for only 26% of the running time compared to the heftier manipulation of the set data structures, plus the allocation and deallocation of memory.
Well, since you're containing PODs, you could eventually do a region allocator and avoid walking through the structure to free the memory.
Standard containers don't allow that, however. Maybe you could look into a Boost.Intrusive solution.
Why do you have a container of pointers also?
Quote: Original post by ApochPiQ
> I know this is probably a totally dumb question, but you are running a Release build, right? I've seen problems with sets and maps consuming huge amounts of CPU time due to iterator checking and other debug/safety functionality.

Ja, highest optimisation level under GCC.
Quote: Original post by dmatter
> Is it actually slow then?

Bit hard to say - I don't really have a baseline. The simulation was originally written in Python, but I ported it to C++ when the running times became too large.
The C++ code shows a 4-5x speedup over Python, which means that the (very small) simulations we are running currently take only 10 minutes each. Unfortunately, in the near future I will have to run much larger simulations, and very many of them.
Ultimately, after applying the pool allocator, it is probably fast enough - but the less time I spend waiting for data, the better.
Quote: Original post by loufoque
> Well, since you're containing PODs, you could eventually do a region allocator and avoid walking through the structure to free the memory. Standard containers don't allow that, however. Maybe you could look into a Boost.Intrusive solution. Why do you have a container of pointers also?

All the sets contain only pointers to nodes, while the nodes themselves are stored in a sorted, contiguous std::vector. This has the advantage that the sets are only operating on pointers (no expensive copies to be made).
This topic is closed to new replies.