STL Performance Thread

79 comments, last by ApochPiQ 14 years, 1 month ago
Quote:Original post by Codeka
Are you telling me you don't use a profiler when doing performance analysis? If not a profiler then what? How do you know "Implementation A" is faster or slower than "Implementation B" if you don't profile?


It depends on where one is standing. It is also the crux of this "Foo is slow" debate, the basis of engineering, and the reason why Linux will not become a desktop OS, why Twitter made it big, and all the other absurdities that seem to permeate the computing world.

It is not code that matters, and it is not code that is slow.

Slowness is in the domain of users: they tell you something is slow. A profiler lets you pin down a limited set of causes for why it is slow.

This is why vector is good enough most of the time. Development time is limited, and after the other causes are dealt with, the application will at some point become fast enough.

A common fallacy of benchmarking is benchmarking in isolation.

Here's a more practical example: I deal in visualizations from time to time, and there was this graph-handling tool. I used all the best practices, memory layouts, algorithms, etc., until it was able to handle 100,000 nodes in real time. As Borat would say, "Great success!"

Except it wasn't. The biggest graph that any user would ever load would be no more than 500 nodes, which could be handled by a linked list of nodes using an O(n^2) algorithm.

I made the wrong assumption about what "fast" means. And the second fallacy: "future-proof". Sure, my version is "future-proof". Except that the future versions which actually do handle large graphs of hundreds of millions of nodes run on a cluster of CUDA machines.

This is just one example of why constant-factor optimizations are dangerous. Sure, 200% may sound like a lot, but unless it is achievable with minimal effort, there's likely a million-percent improvement in using a different approach altogether.

Especially with multi-core being old news, and TBB and similar tools abundant, it has become almost impossible to justify allocating any budget to constant-factor overhead.

Instead, improving development practices by automating performance tests, accurately capturing user expectations, and making accurate budget estimates will go much further, even if in smaller increments.

So I agree with that: the profiler, while an indispensable tool, is not where one should focus attention. Users are, for any arbitrary definition of user. They tell you which profiler to use, and where to point it.
Quote:Original post by thatguyfromthething
Profiler is another misconception, it's like saying you need a debugger to figure out bugs. It's just raw data, it catches things that amount to mistakes but it really tells you very little. Like the debugger, it is easy to misuse as a crutch that negates actual thought - it's not there to debug logic, but to catch simple mistakes and that's it.


Well, there's a big difference. When you're debugging, you already know you have a bug, and you know the general area to look in based on what isn't working. When you're profiling, you're often looking for that initial glimpse into where your bottlenecks are, because frankly, even though you may think you have an idea, you often won't really know until you profile.

Can tools be too heavily relied upon? Of course. But the fact is that your avoidance of these tools is part of the reason you feel the way you do. If you think a debugger and a profiler are two tools that tell you very little, I don't know what to say, other than that I couldn't disagree more. I can't foresee creating a large software project without both in my arsenal, and that is with some very good software engineering practices surrounding them.

Quote:Original post by thatguyfromthething
Most programs don't need to worry a lot about code size or performance, but most programs don't need to be written in C++. It's easy to get gigabytes of object files in C++, though.


Well, it's simply about picking the best tool for the job. If the only thing C++ had going for it were performance, I don't think it would consistently be one of the top three or four most widely used programming languages. It's hard to fathom that even 10% of all applications are truly performance-critical.

Quote:Original post by thatguyfromthething
The point of macros is metaprogramming, too. Templates were introduced as a replacement, because they work ok for little stuff but they are hard to deal with when making something large. I don't say anyone who uses macros is incompetent, but a template library programmed almost entirely in macros is pretty ironic and shows that it's probably not someone who should be listened to much. But, free has a big appeal.


A template library programmed almost entirely in macros? Well, I don't agree with that statement at all. That's a grandiose exaggeration.

Quote:Original post by thatguyfromthething
Yes, other languages did learn from C++ mistakes. That's why they don't use C++ templates at all, but what does that have to do with anything? My point is not to bash on anybody, just to point out reality. The fact C++ has been left in the dust should say a lot about many of C++'s features.


Actually, other languages started out with no templates; then people wrote a lot of type-unsafe and dangerous code, which revealed the need for generics. These languages later added support for generics and were often limited by their earlier choices. Most languages don't have a template or generic mechanism as powerful as C++ templates, but that often has to do with the history of those languages.

Anyway, you were saying STL is horrible; I was just curious what you were comparing it to that you might like better.

Quote:Original post by thatguyfromthething
I know templates are not OO, I'm the one who said it. Most people who talk about stl, design patterns, and OOP in C++ are pretty vague on what OOP is and how templates work. Or more likely, simply dead wrong. But, it's unsurprising because the way templates and C++ in general work is beyond mindboggling if you really look at it in detail.


Well, I think "beyond mindboggling" is a bit of an overstatement. Template metaprogramming can be ugly. Its true power wasn't really envisioned when C++ templates were designed, and it has evolved, through creative exploit after exploit, into a very powerful but somewhat confusing paradigm. That being said, there is an effort to simplify it with C++ concepts, though we're years away from getting that.
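For anyone who hasn't seen it, here's the classic minimal illustration of the technique; this is my own toy sketch, not code from anywhere in this thread:

// Classic compile-time factorial via template recursion.
// The compiler computes the value; Factorial<5>::value is 120.
template <unsigned N>
struct Factorial
{
	static const unsigned value = N * Factorial<N - 1>::value;
};

// An explicit specialization terminates the recursion.
template <>
struct Factorial<0>
{
	static const unsigned value = 1;
};

// Usable wherever a compile-time constant is required:
int lookup_table[Factorial<5>::value]; // an array of 120 ints

It is powerful, but error messages and readability degrade quickly as the examples get less trivial, which is much of what the "confusing paradigm" complaint is about.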

Quote:Original post by thatguyfromthething
People drop to c array because of performance, not size. I said nothing about resizing. You can also preallocate a vector, but that's not the real issue.


If you're saying a vector is still slower even after preallocating, I'll have to disagree. Memory allocations aside, an STL vector tends to operate as fast as, or faster than, a raw array, especially when looping, due to locality of reference.
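If anyone wants to try this for themselves, here's a rough sketch of the kind of side-by-side test being discussed; the element count and the clock() timer are arbitrary choices of mine, so treat the numbers as indicative only:

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

int main()
{
	const std::size_t N = 10000000;

	// Raw array, allocated once up front.
	int *raw = static_cast<int*>(std::malloc(N * sizeof(int)));
	std::clock_t t0 = std::clock();
	long sumRaw = 0;
	for (std::size_t i = 0; i < N; ++i) raw[i] = static_cast<int>(i);
	for (std::size_t i = 0; i < N; ++i) sumRaw += raw[i];
	std::clock_t t1 = std::clock();

	// Preallocated vector: reserve() removes reallocation from the picture.
	std::vector<int> vec;
	vec.reserve(N);
	std::clock_t t2 = std::clock();
	long sumVec = 0;
	for (std::size_t i = 0; i < N; ++i) vec.push_back(static_cast<int>(i));
	for (std::size_t i = 0; i < N; ++i) sumVec += vec[i];
	std::clock_t t3 = std::clock();

	std::printf("raw array: %ld clocks (checksum %ld)\n", static_cast<long>(t1 - t0), sumRaw);
	std::printf("vector:    %ld clocks (checksum %ld)\n", static_cast<long>(t3 - t2), sumVec);
	std::free(raw);
	return 0;
}

In a release build both loops walk contiguous memory, so the generated code is typically near-identical; that is the locality argument in practice.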

Quote:Original post by thatguyfromthething
Sure you can trace through anything, but good luck with that using stlport or dinkumware, or probably any other big template library. But more important, what assembly gets generated? The point is not to look for mistakes but to find out why things slow down. This is possible but much harder with templates, almost impossible when you have complicated things going on (which almost anything in the stl qualifies as).


I don't see why the generated assembly qualifies as "more important". But if you need to know, you can certainly look at the generated assembly; nothing prevents you from doing so. As for complexity, I find the complexity of my application itself is much more critical than the complexity of the STL.

Quote:Original post by thatguyfromthething
Who said I microoptimized anything? How would ten years passing have changed how stl works? Most of stlport was written 15+ years ago and has not changed a whit as far as I can tell in that time, whereas what qualifies as optimal code today has changed dramatically because hardware has changed dramatically. Sure computers can perform most tasks faster than ever, but again it comes to why are you using C++ if you don't care about performance? For some things you basically get paid in direct proportion to how much of x you can do, or even in exponential proportion. If Modo could handle 20 billion poly models would they be languishing in obscurity while zbrush has a dozen times the customers? Modo is certainly easier to use and it has all the same basic features, but bottom line is it's useless for most of what zbrush does so well simply due to not having the raw firepower.


Well, for one thing, many STL implementations were non-standard. In fact, the standard we're referring to is C++98; before that, the STL wasn't even exception-safe. I can't speak for STLport 15 years ago, because I've only needed STLport on two separate occasions, both for performance reasons, where the STLport implementation was faster. However, I'm quite confident that STLport has changed quite a bit in 15 years.

More to the point, Microsoft did not actually have a standards-compliant version of the STL until Visual Studio .Net 2003, IIRC. The VS6 one was terrible, buggy and leaky. This is why programmers with a heavy background in Windows tended to stick to MFC or ATL collections, rather than STL.

However, that was a long time ago, and these issues no longer apply. Then factor in that compilers have made tremendous leaps in the past 5-7 years. Visual Studio 2008 optimizes considerably better than VS6 ever did. The inline keyword, or lack thereof, is taken merely as a suggestion by modern compilers, whereas many older compilers followed it blindly, even when it degraded performance. A lot has changed.

I still have vague memories of leaky strstream implementations, of old GCC environments where STL didn't work quite right, of a 1996 implementation of STL on an old version of VxWorks, where half the methods of vector weren't implemented, because it was pre-standard. Yes, these times existed. They are long past.

Quote:Original post by thatguyfromthething
How does stl solve almost anything?


I never said the STL solves almost anything. Your comment was "STL is a joke".

Quote:Original post by thatguyfromthething
This came up as a discussion about data structures, if you remember. You (and I) are only talking about vectors because that's about all there is worth noting in stl.


Not at all. We're talking about vectors because you brought up vectors. I'd be happy to apply any of the discussion to list, set, multiset, map, multimap, unordered_set, unordered_map, etc...

Quote:Original post by thatguyfromthething
If that's all you need for data structures, you probably are mistaken to use C++ at all.


If you read my earlier post, you'll notice I also mentioned using Boost and other third party libraries. If you need a list, I'll give you a list.

Quote:Original post by thatguyfromthething
I'd almost say the entire reason to use C++ at all compared to say Java is to be able to make specialized APIs. Of course if there is some API you need that's only in C++ then you are stuck with C++, but the real point is so that the people writing the API can have performance and flexibility and in turn use other low level APIs. Say you want to make something akin to Oracle or a game engine. Do you really think STL will cover your data structure needs? Well, it won't even scratch the surface. However if you are just some guy who uses Oracle or a game engine's built in stuff and make GUIs on top of it, by all means don't worry about data structures they use at all, because it's out of your hands anyway. If that's the case, drop C++ completely if you can and use C# or mono or python or Java to call on their API to do the actual work. But you can be pretty sure that somewhere in those programs there's a lot of work that went into various data structures.


Again, you're writing things that I never said. The STL is far from covering all my data structure needs, but it's powerful and flexible enough to build new data structures on top of. The STL absolutely isn't perfect (especially with the design of allocators), but it's definitely better than you've given it credit for.
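As a toy illustration of what "building on top" looks like (my own sketch, not anything from this thread), here's a trivial adjacency-list graph that just composes std::vector:

#include <cstddef>
#include <vector>

// A minimal adjacency-list graph built entirely out of std::vector.
// Not claiming this is optimal; the point is that the containers compose.
class Graph
{
public:
	explicit Graph(std::size_t nodeCount) : adjacency(nodeCount) {}

	void addEdge(std::size_t from, std::size_t to)
	{
		adjacency[from].push_back(to);
	}

	const std::vector<std::size_t>& neighbours(std::size_t node) const
	{
		return adjacency[node];
	}

private:
	std::vector<std::vector<std::size_t> > adjacency;
};

Anything more specialised (intrusive lists, pools, spatial structures) you either write yourself or pull from Boost, which is exactly the point being made above.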

Quote:
If you want to convince yourself stl is a joke, just make a class with a vector of ints and a couple other primitive members, basically a typical object of some kind. Then make a vector of these classes, each with a dozen ints in the child vector. Add about a million to the vector, then sort it and see how long that takes to complete(for comparison, I can run the same code with my own implementation in about 150ms). Also note how awkward this is, syntactically speaking. Sure, you can blame the implementation, but the implementation is going to be different on each platform and yet tend to universally suck because. Also check the difference between what you get with a static array and a vector, I know that put a sad frown on my face.


OK, all you're saying is that you don't like the sort implementation. The STL certainly gives you the flexibility to plug in a different sort, or a different comparison, where the default doesn't suit you. If you're saying blindly using STL sort is a bad idea, I agree. In fact, I find that 99% of the time, if I need something sorted, a vector is the wrong container to start with. Regardless, if you have code that proves your mechanism takes 150ms and the STL version is ridiculously slow, then post it, and I'm sure we can analyze it and tell you what you're doing differently and how to use the STL more efficiently.
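If the real complaint is that sorting shuffles heavy objects around, one standard workaround (a sketch of mine, not the poster's code) is to sort indices with a custom comparator so only small integers move:

#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Heavy
{
	std::vector<int> data;
	std::string name;
};

// Orders indices by the objects they refer to, so std::sort moves
// machine-word-sized indices instead of whole Heavy objects.
struct ByData
{
	const std::vector<Heavy> *items;
	bool operator()(std::size_t a, std::size_t b) const
	{
		return (*items)[a].data < (*items)[b].data;
	}
};

std::vector<std::size_t> sortedOrder(const std::vector<Heavy> &items)
{
	std::vector<std::size_t> order(items.size());
	for (std::size_t i = 0; i < order.size(); ++i)
		order[i] = i;

	ByData byData = { &items };
	std::sort(order.begin(), order.end(), byData);
	return order;
}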
Quote:Original post by Antheus
Quote:Original post by Codeka
Are you telling me you don't use a profiler when doing performance analysis? If not a profiler then what? How do you know "Implementation A" is faster or slower than "Implementation B" if you don't profile?
It depends where one is standing. It is also the crux of this "Foo is slow". Also the basis for engineering, and the reason why Linux will not become desktop OS and why Twitter made it big and all other absurdities that seem to permeate the computing world.
I agree with you, but when you're talking about using std::vector (etc) vs. rolling your own, you can't conclude that one is faster than the other without profiling the specific circumstances.
Quote:Original post by thatguyfromthething
People think that the compiler has some magic ability to go in and do wonders and that somehow with the stl that it's even better and these super advanced programmers (who mainly wrote this code 15+ years ago)
Point me to a current compiler that uses a code generator not improved upon in the last 15 years. Among other objections to your statement, many code generators for amd64 have only been written in the last 5 years.
Quote:have some secret mojo no one else does.
They have domain-specific knowledge generally restricted to language designers, hardware engineers and the like, which most of the rest of us *don't* have. If you are up to date on the intricacies of modern cache architecture and instruction-level parallelism, good for you.
Quote:
Quote:It's certainly not possible without having a profiler

Profiler is another misconception, it's like saying you need a debugger to figure out bugs. It's just raw data, it catches things that amount to mistakes but it really tells you very little. Like the debugger, it is easy to misuse as a crutch that negates actual thought - it's not there to debug logic, but to catch simple mistakes and that's it.
If you are honestly claiming that attempting to solve a problem without reference to the relevant data is a sensible idea, then you open yourself to ridicule. I give you the opportunity to clarify what you meant there...
Quote:
Quote:and it's amazing how many people who haven't written line 1 of their software are spending their time thinking about how to optimise things.
Well sure, it is always the case you can preoptimize things.
And 99% of the time, you will be optimising the wrong thing. Remember the rule: 10% of the code takes 90% of the execution time. A profiler allows you to identify that 10%, but you can't do that until you write at least a prototype of the code.

You can legitimately make algorithmic optimisations ahead of time, but even here you want to make sure you are actually working on a bottleneck.
Quote:It's also easy to assume everyone on a site like this is a 15 year old game enthusiast or a college student but that's not necessarily the case and I'd bet there are also some of the best programmers in the world hanging around.
Contrary to popular myth, the best programmer is not the programmer who writes the best code. Rather, the best programmer is the programmer who delivers software conforming to project specifications, on or before the project deadline.

With that in mind, any optimisation beyond the project requirements is wasted time, and time is money. Optimising the wrong part of the application (because you didn't run the profiler) is even worse, because not only have you wasted time and money, but you likely still have to optimise the right part of the program.
Quote:Most programs don't need to worry a lot about code size or performance... most programs don't need to be written in C++.
This is the truest statement in this thread - Kudos.
Quote:It's easy to get gigabytes of object files in C++, though.
Sure, in Debug mode. But then again, most hard drives are hundreds of gigabytes these days, and release mode is generally on the order of 100x smaller...
Quote:People drop to c array because of performance, not size. I said nothing about resizing. You can also preallocate a vector, but that's not the real issue.
Are you claiming that a pre-allocated std::vector performs worse than a malloc'd array? If so, are you willing to put your money where your mouth is, and demonstrate reproducible benchmarks to prove this claim?
Quote:How would ten years passing have changed how stl works? Most of stlport was written 15+ years ago and has not changed a whit as far as I can tell in that time
The underlying compilers have improved drastically in that time. The standard library doesn't do anything particularly unusual or tricky - quite the opposite, it is very straightforward and efficient C++. Compiler improvements translate to direct performance improvements in standard library elements.
Quote:optimal code today has changed dramatically because hardware has changed dramatically.
Really? I would like you to supply a concrete reference for this.

In certain specific areas this is true, for instance the PS3 with its unusual Cell architecture. In most other areas, the only change in writing optimal code has been moving away from C++ (because C++ isn't a good fit for something like Google's cluster).
Quote:But you can be pretty sure that somewhere in those programs there's a lot of work that went into various data structures.
A lot of work went into, and continues to be put into, both the standard library and boost. Are you claiming there wasn't?
Quote:Add about a million to the vector, then sort it and see how long that takes to complete(for comparison, I can run the same code with my own implementation in about 150ms).
Again, reproducible benchmarks, or it didn't happen. If you are making a comparative performance claim, handwaving is not sufficient.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Quote:
If you want to convince yourself stl is a joke, just make a class with a vector of ints and a couple other primitive members, basically a typical object of some kind. Then make a vector of these classes, each with a dozen ints in the child vector. Add about a million to the vector, then sort it and see how long that takes to complete(for comparison, I can run the same code with my own implementation in about 150ms).


Mmhmm. This smells.

I distinctly remember the assembly generated from optimized STL being identical to that of arrays. I needed the C++ refresher anyway, so I put together this little test based on what I interpreted this mini-rant to mean.

MSVS2008, standard release build config:

foo.h
#include <string>
#define _SCL_SECURE 0
#include <vector>

class foo
{
public:
	int x;
	int y;
	std::string name;
	std::vector<int> stuff;

	foo();
	bool operator<(const foo &rhs);
};


foo.cpp
#include "foo.h"foo::foo(){	x=rand();	y=rand();	name = "";	int count = rand()%12;	for( int ix = 0; ix < count; ++ix){		stuff.push_back(rand());	}}bool foo::operator<(const foo &rhs){	if(stuff.size() < rhs.stuff.size()){		return(true);	}	if(stuff.size() > rhs.stuff.size()){		return(false);	}	for(int ix = 0; ix < stuff.size(); ++ix){		if(stuff[ix]<rhs.stuff[ix]){			return(true);		}		if(stuff[ix]>rhs.stuff[ix]){			return(false);		}	}	return(false);}


main.cpp
#include "foo.h"#include <vector>#include <windows.h>#include <algorithm>#include <iostream>int main(){	DWORD beginInit,endInit, beginSort, endSort, beginArrayInit, endArrayInit, beginArraySort, endArraySort;	beginInit = GetTickCount();	std::vector<foo> foos(10000);	endInit = GetTickCount();	beginSort = GetTickCount();	std::sort(foos.begin(),foos.end());	endSort = GetTickCount();	beginArrayInit = GetTickCount();	foo foos2[10000];	endArrayInit = GetTickCount();	beginArraySort = GetTickCount();	std::sort(&foos2[0],&foos2[9999]);	endArraySort = GetTickCount();	std::cout << "init: " << endInit-beginInit << std::endl;	std::cout << "sort: " << endSort-beginSort << std::endl;	std::cout << "array init: " << endArrayInit-beginArrayInit << std::endl;	std::cout << "array sort: " << endArraySort-beginArraySort << std::endl;	return(0);}


(using 10000 entries since a million blows the stack. Arrays are awesome!)

init: 110
sort: 125
array init: 375
array sort: 828


Yeah, those arrays are super awesome.


I don't have access to your code to test of course, but I suspect that any benefits you might gain in 'add a bunch of data and sort' tests will be offset by losses in 'remove entries from middle' or 'random access something near the end' or any of the other common collection behaviors. The STL has withstood years of analysis by people far better than you or I, and nobody has made a better performing general purpose vector. Period.

If you think that you or someone else has, prove it. Post some code or quit spreading disinformation.
Quote:Original post by swiftcoder
Quote:Add about a million to the vector, then sort it and see how long that takes to complete(for comparison, I can run the same code with my own implementation in about 150ms).
Again, reproducible benchmarks, or it didn't happen. If you are making a comparative performance claim, handwaving is not sufficient.
^^This.

Everyone in this thread knows that they are right, so quit talking and produce the code, then we can optimise it with and without STL.
We can objectively look at the results, see which one is faster, smaller source, smaller binary, etc...
Quote:Original post by thatguyfromthething
If you want to convince yourself stl is a joke, just make a class with a vector of ints and a couple other primitive members, basically a typical object of some kind. Then make a vector of these classes, each with a dozen ints in the child vector. Add about a million to the vector, then sort it and see how long that takes to complete(for comparison, I can run the same code with my own implementation in about 150ms). Also note how awkward this is, syntactically speaking. Sure, you can blame the implementation, but the implementation is going to be different on each platform and yet tend to universally suck because. Also check the difference between what you get with a static array and a vector, I know that put a sad frown on my face.


Then run the same thing on a compiler supporting r-value references and an appropriate standard library implementation and watch the *one* performance problem that vectors suffer from be completely squashed.
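For concreteness, here is a minimal sketch of what that looks like, assuming a C++11-capable compiler and library; the struct and names are mine, not anything from the earlier posts:

#include <utility>
#include <vector>

struct Record
{
	std::vector<int> payload;

	Record() {}

	// Copying duplicates the payload's heap allocation.
	Record(const Record &other) : payload(other.payload) {}

	// The r-value reference overload steals the other object's buffer instead.
	Record(Record &&other) noexcept : payload(std::move(other.payload)) {}

	Record &operator=(Record other) noexcept
	{
		payload.swap(other.payload);
		return *this;
	}
};

// With a noexcept move constructor available, std::vector<Record> growth and
// std::sort move the internal buffers around rather than deep-copying each
// element, which is exactly the cost the earlier benchmark was measuring.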
Here's my attempt :)

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <sstream>
#include <map>
#include <windows.h> // for QPC

struct typical_object_of_some_kind
{
	std::vector<int> vector_of_ints;
	std::string a_string;
	std::string another_string;
};

bool operator < (typical_object_of_some_kind const &lhs, typical_object_of_some_kind const &rhs)
{
	// we could do "a_string" first, but that would be cheating...
	std::less<std::vector<int> > pred;
	return pred(lhs.vector_of_ints, rhs.vector_of_ints);
}

//------------------------------------------------------------------------------

int main(int, char **)
{
	srand(0);
	std::vector<typical_object_of_some_kind> big_vector;
	for(int i = 0; i < 1000000; i++)
	{
		typical_object_of_some_kind tofsk;

		std::stringstream ss;
		ss << "This is a long-ish string that we'll add to the first value: " << i;
		tofsk.a_string = ss.str();
		for(int j = 0; j < 20; j++)
		{
			tofsk.vector_of_ints.push_back(rand());
		}
		big_vector.push_back(tofsk);
	}

	LARGE_INTEGER freq, start, end;
	::QueryPerformanceFrequency(&freq);
	::QueryPerformanceCounter(&start);

	std::cout << "about to sort..." << std::endl;
	std::sort(big_vector.begin(), big_vector.end());

	::QueryPerformanceCounter(&end);
	std::cout << "sort complete in: " << (static_cast<double>(end.QuadPart) - start.QuadPart) / freq.QuadPart << "s" << std::endl;

	// print out the first few, just to prove it actually *is* sorted...
	int num = 0;
	for(std::vector<typical_object_of_some_kind>::iterator it = big_vector.begin(); it != big_vector.end() && num < 100; ++it, num++)
	{
		std::stringstream ss;
		for(std::vector<int>::iterator it2 = it->vector_of_ints.begin(); it2 != it->vector_of_ints.end(); ++it2)
		{
			ss << " " << *it2;
		}
		std::cout << ss.str() << std::endl;
		num++;
	}
	return 0;
}

It's totally unoptimized and I just wrote it in my lunch break, so it's not that great. For 1,000,000 entries it sorts in about 6.5 seconds. If you take out the strings, it runs in about 2.7s. For 100,000 entries (with strings) it sorts in 530ms.

Not as fast as it could be, but how often do you need to sort 100,000-element arrays of ints anyway?
Quote:Original post by Telastyn
init: 110
sort: 125
array init: 375
array sort: 828


Yeah, those arrays are super awesome.


I don't have access to your code to test of course, but I suspect that any benefits you might gain in 'add a bunch of data and sort' tests will be offset by losses in 'remove entries from middle' or 'random access something near the end' or any of the other common collection behaviors. The STL has withstood years of analysis by people far better than you or I, and nobody has made a better performing general purpose vector. Period.

If you think that you or someone else has, prove it. Post some code or quit spreading disinformation.


He's trying to create a hypothetical example where the vector is going to do a lot of copying due to having to rearrange lots of elements. This is a pathological example of poor performance given the right circumstances and a sufficiently expensive-to-copy object. Regardless, the problem is non-existent with r-value references, so there's no point in even discussing it IMO.

Even if you want to say "well I'm talking about *old* code on compilers that didn't have that support", it's still arguably an invalid comparison because you aren't comparing the same things. A C-array is not a dynamically sized array. If you want a real comparison to a C-array then replace "vector" with boost::array<> and let's talk.
Quote:The STL has withstood years of analysis by people far better than you or I, and nobody has made a better performing general purpose vector.


This is what it comes down to right here. *General purpose* containers. You can create a structure that performs better than the STL in your particular situation, provided you know what you're doing, but it always involves sacrificing generality. The STL is designed to work with all element types without assuming traits that would break certain objects. For example, if it's okay to move an object around without invoking its copy constructor and destructor, then you can reallocate or shift a vector of these objects faster than the STL would. But then you've sacrificed generality and made your code more dangerous, too. If the object is later modified to contain a pointer to its own memory, you're in big trouble.
Fortunately, the commercial compilers actually take notice of which types you store in the vector and will perform shallow copies on objects that are known to be safe, e.g. a vector of other vectors.
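To make that trade-off concrete, here's a toy sketch (entirely my own illustration) of the kind of shortcut a specialised container might take, and the sort of type that makes it unsafe:

#include <cstddef>
#include <cstdlib>
#include <cstring>

// Grows a buffer by memcpy'ing the existing objects into the new storage
// instead of copy-constructing and destroying them. Fast, but only valid
// for types that can be relocated bitwise.
template <typename T>
T *grow_by_memcpy(T *old_data, std::size_t count, std::size_t new_capacity)
{
	T *new_data = static_cast<T *>(std::malloc(new_capacity * sizeof(T)));
	std::memcpy(new_data, old_data, count * sizeof(T));
	std::free(old_data); // note: destructors are deliberately not run
	return new_data;
}

// A type this breaks: it keeps a pointer into its own storage, so a bitwise
// move leaves 'current' dangling into memory that has just been freed.
struct SelfReferential
{
	char buffer[16];
	char *current;
	SelfReferential() : current(buffer) {}
};

A general-purpose vector can't assume every element type tolerates this, which is precisely the generality being paid for.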

So if you think the STL isn't a satisfactory solution to your problem, you might want to:

1) Profile your code to see if it really matters
2) Switch to a different container as you're probably just using the wrong one
3) Understand why the STL container is slower than a customised solution and which constraints your own method drops in order to outperform it.
4) Determine if the expected performance increase is worth the time you're about to invest
5) Write your own container, profile it, get frustrated when it's somehow slower than the STL, do some research, discover that your compiler's STL implementation is using optimisations you've never heard of, then incorporate those optimisations into your own code and hopefully receive a payoff for all your time spent!
