std::sort really slow in debug mode?

Started by
15 comments, last by zdlr 15 years, 5 months ago
Hello, I am sorting a std::vector which stores (about 400) elements of type IndexIntervalDesc with std::sort:
void TerrainQuadTree::computeVisiblePatches(const ICamera& cam, 
                                       std::vector<IndexIntervalDesc>& patches) { 
   for(int i=0; i < 4; i++) 
       tier1Quadrants->traverse(cam, patches); 
    
   std::sort(patches.begin(), patches.end(), sortVisiblePatchesFrontToBack);    
}

This is the predicate I am using:
bool sortVisiblePatchesFrontToBack(const IndexIntervalDesc& lhs, 
                              const IndexIntervalDesc& rhs) { 
   return lhs.distanceToCamSq < rhs.distanceToCamSq; 
}

And the elements in the vector look like this:
struct IndexIntervalDesc { 
   std::string    nodeID; 
   float        distanceToCamSq;    // Squared distance to camera    
   UINT        indexCount, startIndex, baseVertexIndex;    
};

The problem: If I am running the code in relase mode, everything runs fast (about 260 fps). But when I execute the same code in debug mode, my fps drops to 40 fps! If I don't call the sort function in debug mode the fps goes up to 180fps, so obviously the sort function is the reason for the huge fps loss. Just to be sure I measured the needed time for the call of sort: In Release Mode the call of sort needs about 0.000178794 seconds IN Debug Mode the call of sort needs about 0.0170315 seconds This is really annoying. Just because I am sorting the vector I cant use the debug version to play around. Why is std::sort so unbelievable slow in debug mode? Is there anything I can do against it?
Advertisement
IndexIntervalDesc contains a std::string, which is expensive to copy, because it allocates and deallocates memory on the heap.

So, sorting through patches may trigger a lot of copy calls, depending on how much there is to sort. Triggering a lot of copy calls will trigger a lot of heap activity, which is slow, espectially in debug mode, because there the heap is often checked a lot.

But even release mode code will be slower than it could be.

Fortunately you have some options to get rid of the expensive string copy call:

1) Make nodeID a std::string*. Copying a pointer is a very cheap operation.
2) Implement a swap method for IndexIntervalDesc, that's recognized by STL, effectively changing the expensive copy calls into much cheaper swap calls.
This may be trickier than it sounds. Read this.
3) Get rid of nodeId altogether or make it a different type, that's cheaper to copy.

I hope that helps!
As noted, this could help with the string member swapping (both in debug and release):
namespace std {    template<> inline void swap(IndexIntervalDesc &a, IndexIntervalDesc &b) throw() {        std::swap(a.nodeID, b.nodeID);        std::swap(a.distanceToCamSq, b.distanceToCamSq);        std::swap(a.indexCount, b.indexCount);        std::swap(a.startIndex, b.startIndex);        std::swap(a.baseVertexIndex, b.baseVertexIndex);    }};
You could also try turning off the checked iterators, and iterator debugging compile flags if you're using Visual Studio.
"In order to understand recursion, you must first understand recursion."
My website dedicated to sorting algorithms
Most of the STL is fairly slow in debug mode. It's template code which instantiates a lot of tiny classes and one-line functions, and it tends to rely heavily on the compiler being able to inline this. So when you disable optimizations, yes, it slows down a lot.

The same goes for std::vector. In release mode, it's essentially as efficient as a raw array, but only because the operator[] and any other small helper function gets inlined. Without optimizations, it's nowhere near as fast. Is this a problem? Why does performance in debug mode matter so much?
Thanks Rattenhirn (obviously you don't have one ;-), iMalc and Spoonbender.

Of course the performance in debug mode is not so important, but it just sucks if you run your application in debug mode and you just have a 10 fps slide show.

Anyway, in order to get a deeper understanding of the problem I added output messages to the Ctor and CopyCtor:
struct IndexIntervalDesc {   std::string  nodeID;   float        distanceToCamSq; // Squared distance to camera	   UINT         indexCount, startIndex, baseVertexIndex;	   IndexIntervalDesc() {       DXUTOutputDebugStringW(L"Ctor\n"); }   IndexIntervalDesc(const IndexIntervalDesc& rhs) {      DXUTOutputDebugStringW(L"CopyCtor\n"); }};
I tested this code with these lines:
std::vector<IndexIntervalDesc> v;IndexIntervalDesc i1;v.push_back(i1);IndexIntervalDesc i2;v.push_back(i2);

And the output is:
Quote:Ctor
CopyCtor
CopyCtor
Ctor
CopyCtor
CopyCtor
CopyCtor

I don't understand two things: First of all, why is the Copy Ctor called two times if I add i1? And second: Why is the copy ctor even called three times while adding i2?

My last question: Why is the copy ctor of IndexIntervalDesc actually called if I use std::sort? I mean I just pass references to the predicate so why is there any call to the copy ctor?
Quote:Original post by schupf two things: First of all, why is the Copy Ctor called two times if I add i1? And second: Why is the copy ctor even called three times while adding i2?

I'm not sure about this one, but one thing to remember is that whenever you push_back, the vector may be resized, and that involves copying all elements within it. If inserting an element for whatever reason involves two copies, then the third copy when inserting i2 could be accounted for if the vector gets resized, so i1 has to be copied.

Quote:
My last question: Why is the copy ctor of IndexIntervalDesc actually called if I use std::sort? I mean I just pass references to the predicate so why is there any call to the copy ctor?

Sorting the vector kinda involves moving the objects around, doesn't it? [grin]
And the only way to move an object is to make a copy of it. So every time sort determines that two elements should be swapped, it has to copy them around.

Of course, sort examines the elements in the vector by reference, but once it's determined that two objects should be swapped around, how would it achieve this without invoking the copy constructor? :)
>>Why does performance in debug mode matter so much?

its very important
A/ most developing is done with debug (faster compile times, tracking of errors etc)

but if your app is literally gonna run at 1fps (instead of 60+fps in release) debug mode becomes worthless

I believe EA have a replacement STL lib where debug mode is greatly improved speedwise, unfortunately its not generally available
Quote:Original post by schupfI don't understand two things: First of all, why is the Copy Ctor called two times if I add i1? And second: Why is the copy ctor even called three times while adding i2?


I can't think of a reason why the copy constructor is called one extra time in both cases. If you really want to know, just step into the push_back, using the debugger and behold.

As Spoonbender correctly stated, the second time has one more copy, because the vector grows from 1 to 2 and needs to copy the old contents (1 element) over.

Actually, at the first push_back, the vector is growing as well, from 0 to 1, but there are no elements to be copied.

If you want to cut down on copy calls when filling the vector with push_back, try to reserve slightly more elements than you will probably need. Or, in case you know exactly how many you are going to need, reserve that amount.

Quote:Original post by schupfMy last question: Why is the copy ctor of IndexIntervalDesc actually called if I use std::sort? I mean I just pass references to the predicate so why is there any call to the copy ctor?


Again, Spoonbender was faster than me. Note, that the std::sort should not call any copy constructors when a matching swap method is available.

Hm, now I am kinda disappointed. I tried your suggestions but none really helped that much.

These are the results of my speed measurements in debug mode (time which was needed to call std::sort and fps of my application):
1) No call of std::sort: 0s calling time and 177 fps
2) Calling std::sort without specialized swap: 0.017s and 44 fps
3) Calling std::sort with specialized swap: 0.015s and 47 fps
4) Calling std::sort with specialized swap and using std::string* in IndexIntervalDesc instead of std::string: 0.013s and 55 fps

So even if I specialize std::swap for my type (copy & pasted the code fro iMalc) AND use a string pointer its still annoying slow (55fps - remember: if I comment out the call of swap I get 177fps).

Its really frustrating that this one function call totally kills my debug performance.

Any further suggestions? @iMalc: You mentioned disabling checked iterators, and iterator debugging - how can I do this (I am using Visual Studio 2005)? I tried
#define _HAS_ITERATOR_DEBUGGING 0 in the first line of my cpp file, but this just gives me a "Debug Assertion Failed! Expression: ITERATOR LIST CORRUPTED"
This is a common problem. Debugging features, safety checks and logging aren't free. Deciding what to enable (and when) is a trade-off like any other, one between the features' cost and their benefit.

In this case why not simply put the sort in a separate source file for which you enable optimizations, even in debug mode? That way you can choose to spend your debugging cycles on more useful tests..

This topic is closed to new replies.

Advertisement