Sign in to follow this  
joe_phsiao

STL Map char* is faster than string

Recommended Posts

Hello, I wrote a test program to test the speed of find() in std::map. And I found that if I use char* as the key type, compare to string, the program would spend less time in find(). My question is why? I think both of them compare keys character by character. Here's the program:
#include <sstream>
#include <ctime>
#include <iostream>
#include <map>
#include <string>

using namespace std;

struct lesscmp{

  bool operator()(const char* st1, const char* st2 )
    return strcmp( st1, st2 ) < 0;
}

size_t LOOP = 50000;

int main()
{
  int t;
  
  map<string, int> map1;
  map<const char*, int, lesscmp> map2;

  string sta[LOOP];
  for( size_t i = 0 ; i < LOOP; i++ ){
    stringstream ss;
    ss << i;
    string st;
    ss >> st;
    sta[i] = st;

    map1[ st ] = 1;
    map2[ st.c_str() ] = 1;
  }

  t = clock();
  for( size_t i = 0 ; i < LOOP; i++ )
    map1.find( sta[i] );

  cout << clock() - t << endl;
  
  t = clock();
  for( size_t i = 0 ; i < LOOP; i++ )
    map2.find( sta[i].c_str() );

  cout << clock() - t << endl;

  return 0;
}

The output is: 110000 50000

Share this post


Link to post
Share on other sites
Because you're doing it wrong?

Using (relatively old) gcc-3.3.5 on linux 2.4.27 (relatively old debian install) 1.2 ghz, 512m with 'gcc -O3', and your source fixed to actually compile:


#include <sstream>
#include <string>
#include <iostream>
#include <map>

using namespace std;

struct lesscmp{

bool operator()(const char* st1, const char* st2 ){
return strcmp( st1, st2 ) < 0;
}
};


int main()
{
int t;
size_t LOOP = 50000;


map<string, int> map1;
map<const char*, int, lesscmp> map2;

string sta[LOOP];
for( size_t i = 0 ; i < LOOP; i++ ){
stringstream ss;
ss << i;
string st;
ss >> st;
sta[i] = st;

map1[ st ] = 1;
map2[ st.c_str() ] = 1;
}

t = clock();
for( size_t i = 0 ; i < LOOP; i++ )
map1.find( sta[i] );

cout << clock() - t << endl;

t = clock();
for( size_t i = 0 ; i < LOOP; i++ )
map2.find( sta[i].c_str() );

cout << clock() - t << endl;

return 0;
}




yields:

50000
50000


The key there is using -O3 I suspect. Release mode/optimizations on is your friend.

Share this post


Link to post
Share on other sites
I'm suprised that even works.
string st; is a temp variable with limited scope.
st.c_str() is a temp variable valid only within the context of the call to it
map2[ st.c_str() ] = 1; makes a copy of the POINTER not a copy of the string
and thusly, map2 probably contains garbage for keys.
Unless I'm totally missing something.

Share this post


Link to post
Share on other sites
Quote:
Original post by Telastyn
yields:

50000
50000


Shame. I was hoping std::string would win due to it being able to reject strings based on length rather than content - I don't know if strcmp does anything clever like that though.

Quote:
for( size_t i = 0 ; i < LOOP; i++ ){
stringstream ss;
ss << i;
string st;
ss >> st;
sta[i] = st;

map1[ st ] = 1;
map2[ st.c_str() ] = 1;
}

Hmm, what am I missing?

The way I see it the c_str is retrieved, stored in the map, and then the string is destructed, leaving the map's pointer dangling.

Edit:
@KulSeran: Snap [grin]

Share this post


Link to post
Share on other sites
I am using gcc-4.2.3 (Ubuntu 4.2.3-2ubuntu7) and linux 2.6.24-19.
The compile command is "g++ -O3 main.cc"

I try to run the program three times, and the outputs are:

joe@joe-laptop:~/Desktop/map$ ./a.out
50000
40000
joe@joe-laptop:~/Desktop/map$ ./a.out
60000
20000
joe@joe-laptop:~/Desktop/map$ ./a.out
60000
30000


The overall is faster, but seems like char* is still a little bit faster.

Share this post


Link to post
Share on other sites
Quote:
Original post by dmatter
Shame. I was hoping std::string would win due to it being able to reject strings based on length rather than content - I don't know if strcmp does anything clever like that though.
strcmp can't do anything clever like that, since it doesn't know the length ahead of time. It's sort of a silly optimization, though... it would be difficult to talk about the "average case" for string comparison, but I would think a differing first character would be at least as likely as a differing length for two different strings.

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiao
The overall is faster, but seems like char* is still a little bit faster.

The variance in your results is far too high to draw any meaningful conclusions, and clock() is not suitable for reliable benchmarking. Try the tests again with something a little more accurate and precise before jumping to conclusions.

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
strcmp can't do anything clever like that, since it doesn't know the length ahead of time.

I was thinking more along the lines of peeking into the header to the allocated memory to determine length.

Quote:
It's sort of a silly optimization, though... it would be difficult to talk about the "average case" for string comparison, but I would think a differing first character would be at least as likely as a differing length for two different strings.

I suppose, in the general case.
Although, with a finite set of symbols, there are an infinite number of strings beginning with a character c, but finite number of length n.

Share this post


Link to post
Share on other sites
Quote:
Original post by dmatter
I was thinking more along the lines of peeking into the header to the allocated memory to determine length.
The allocation sizes tell you nothing about whether the strings are equal or not.

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
Quote:
Original post by joe_phsiao
The overall is faster, but seems like char* is still a little bit faster.

The variance in your results is far too high to draw any meaningful conclusions, and clock() is not suitable for reliable benchmarking. Try the tests again with something a little more accurate and precise before jumping to conclusions.
Good to know. I wasn't aware that clock() was bad on Linux, but I did a bit of testing and it's atrocious on my system.
A quick test reveals that the resolution for clock() is 10 ms, while getrusage() has 4 ms resolution and gettimeofday() single us resolution (probably an intelligent RDTSC wrapper).
The whole thing is weird as I would've assumed that clock() would be a wrapper for getrusage(), and why it can't include the time elapsed during the current scheduling quanta is beyond me.

That should teach me not to rely on the stated precision rather than measuring the actual resolution..

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiaoThe overall is faster, but seems like char* is still a little bit faster.
Speed means nothing when it doesn't work anyway. I wouldn't be surprised if the second map only has one entry, with an invalid pointer.
Try printing out the contents of the maps afterwards and see for yourself, if your program happens not to crash that is.

Share this post


Link to post
Share on other sites
Quote:
Original post by dmatter
Quote:
Original post by Sneftel
The allocation sizes tell you nothing about whether the strings are equal or not.
True enough - Forget my nonsense! [embarrass]


Just to put the nail in this one ... even if you COULD somehow determine the length of the strings in O(1) time before you started comparing their contents ... that STILL wouldn't be useful, unless you don't want to use lexicographic ordering of your strings. Assuming that you want things in alphabetical (or some similar) ordering, how does it help you to know that "azz" is shorter then "aazz"?

Share this post


Link to post
Share on other sites
Well a literal string's length would be known at compile time but I doubt that would, or could, be made use of.

As for ordering, I originally had in mind that the find function would be testing for equality with the == operator (or similar) but, of course, in C++ it's done using the strict weak ordering predicate.

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
The variance in your results is far too high to draw any meaningful conclusions, and clock() is not suitable for reliable benchmarking. Try the tests again with something a little more accurate and precise before jumping to conclusions.

I would love to know what else is more suitable.

After I google searched for a while, it seems to me that c_str() points to an internal space in string class and it has a longer lifetime than the string object. That's why the above code works. But nobody knows exactly what the lifetime is, it's safer to ditch that use.

So I modify the main() to focus on the benchmark.


int main()
{
int t;

map<string, int> map1;
map<const char*, int, lesscmp> map2;

string sta1[LOOP];
const char* sta2[LOOP];

for( size_t i = 0 ; i < LOOP; i++ ){
stringstream ss;
ss << i;
string st;
ss >> st;
sta1[i] = st;
sta2[i] = sta1[i].c_str();

map1[ sta1[i] ] = 1;
map2[ sta2[i] ] = 1;
}

t = clock();
for( size_t i = 0 ; i < LOOP; i++ ){
map1.find( sta1[i] );
}
cout << clock() - t << endl;

t = clock();
for( size_t i = 0 ; i < LOOP; i++ ){
map2.find( sta2[i] );
}
cout << clock() - t << endl;

return 0;


The results are same as before.

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiao
After I google searched for a while, it seems to me that c_str() points to an internal space in string class and it has a longer lifetime than the string object. That's why the above code works. But nobody knows exactly what the lifetime is, it's safer to ditch that use.

Incorrect. Think of it like this: c_str() points to an internal buffer in the string object. When the string object is destructed, that internal buffer goes away as well.

That means that when your string st goes out of scope, the pointer you stored is now invalid. If it *seems* to still point to valid memory, then it just means that you were incredibly lucky that your program didn't crash.

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiao
After I google searched for a while, it seems to me that c_str() points to an internal space in string class and it has a longer lifetime than the string object. That's why the above code works. But nobody knows exactly what the lifetime is, it's safer to ditch that use.

No. The 'safe' lifetime of the string returned by c_str terminates once you call any non-const member function on that string object.

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiaoI would love to know what else is more suitable.
Under UNIX, there's no guaranteed good option, but gettimeofday anecdotally works well. Alternatively, you can use the time command to reliably clock the execution of an entire program, and put the tests in different files... but you'll want a program that runs at least 10-20 seconds, to amortize out startup time.

BTW, your program is still incorrect and possibly not working properly. You don't appear to fully understand the ramifications of object lifetimes and pointer aliasing. Do you have a good book on C++? If not, I can suggest some web references... but IMHO books really are best.

EDIT: actually, on a second look, any reasonable implementation should execute it fine. so yeah, just concentrate on the benchmarking aspect.

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiao
After I google searched for a while, it seems to me that c_str() points to an internal space in string class and it has a longer lifetime than the string object. That's why the above code works. But nobody knows exactly what the lifetime is, it's safer to ditch that use.
Some compilers implement string pooling for small strings. Other compilers it might give different results, or crash. Quite simply it is wrong as it makes too many assumptions about the underlying memory management routines and compiler specific behaviour, and on probably most compilers attempting to even display the contents of the map afterwards will probably cause a crash.
Please try outputing all those strings afterwards:
cout << "Size1 = " << map1.size() << endl;
for (map<string, int>::const_iterator it = map1.begin(); it != map1.end(); ++it)
cout << it->first << " ";
cout << "So far so good..." << endl;

cout << "Size2 = " << map2.size() << endl;
for (map<const char*, int, lesscmp>::const_iterator it = map2.begin(); it != map2.end(); ++it)
cout << it->first << " ";
cout << "Boom?" << endl;

Share this post


Link to post
Share on other sites
The code seems to work on my compiler (g++ 4.0.1 on a MacBook Pro). He keeps around the strings in the array called sta, so doing this should be safe:
  const size_t LOOP = 50000; // <-- Should be const

string sta[LOOP];
for( size_t i = 0 ; i < LOOP; i++ ){
stringstream ss;
ss << i;
string st;
ss >> st;
sta[i] = st;

map1[ st ] = 1;
map2[ sta[i].c_str() ] = 1; // <-- I changed this line
}




What he did probably works because the particular implementation of std::string that I am using happens to have copy-on-write semantics, so st and sta[i] have the same internal buffer. You should never rely on something like that.

It seems to be the case still that comparisons of std::string objects is slower than calls to strcmp(). (EDIT: Actually, they are the same; the difference in map performance must come from somewhere else (I bet it's because of the difference between the size of std::string and the size of a pointer)) This is the modified code I used to test it:
#include <sstream>
#include <string>
#include <iostream>
#include <map>
#include <sys/time.h>

using namespace std;

double now() {
struct timeval tp;
gettimeofday(&tp,0);
return tp.tv_sec+.000001*tp.tv_usec;
}

struct lesscmp {
bool operator()(const char* st1, const char* st2 ) {
return strcmp( st1, st2 ) < 0;
}
};

int main() {
double t;
const size_t LOOP = 50000;


map<string, int> map1;
map<const char*, int, lesscmp> map2;

string sta[LOOP];
for( size_t i = 0 ; i < LOOP; i++ ) {
stringstream ss;
ss << i;
string st;
ss >> st;
sta[i] = st;

map1[ st ] = 1;
map2[ sta[i].c_str() ] = 1;
}

t = now();
for( size_t i = 0 ; i < LOOP; i++ )
map1.find( sta[i] );

cout << now() - t << endl;

t = now();
for( size_t i = 0 ; i < LOOP; i++ )
map2.find( sta[i].c_str() );

cout << now() - t << endl;

return 0;
}






:~ alvaro$ g++ kk2.cpp -o kk2 -O3 -Wall -ansi -pedantic
:~ alvaro$ ./kk2
0.0410659
0.0234401


[Edited by - alvaro on July 7, 2008 4:51:05 AM]

Share this post


Link to post
Share on other sites
I found the benchmark in this program is really UNRELIABLE.

If I swap the two loops of find(), now the string loop is doing first, the string loop becomes the faster one. So the first loop is always faster. I guess it's some optimization in the compiler. But the result is really confusing.

Share this post


Link to post
Share on other sites
Quote:
Original post by joe_phsiao
I found the benchmark in this program is really UNRELIABLE.

If I swap the two loops of find(), now the string loop is doing first, the string loop becomes the faster one. So the first loop is always faster. I guess it's some optimization in the compiler. But the result is really confusing.

Actually, that's to be expected.

What you're noticing is known as "cache priming". Through the first benchmark, a lot of the data you're reading in is not yet resident in the cache, so it takes a while to pull it in from main memory. In the second test, since you already pulled all of it in, and at least some of it is still there, access is much faster. One solution is to run the benchmarks a few times in one execution, alternating between the two. Also, having a much larger data set (so that a smaller fraction is actually in the cache) helps. Ultimately, though, this demonstrates nothing so much as how little information one often gets from a contrived benchmark of two algorithmically equivalent implementations. It's possible--hell, it's likely--that one approach would work better in some circumstances but not others. The moral of the story: time realistic uses, not simple loops; worry about algorithmic performance rather than constant-factor; always keep YAGNI in mind.

Share this post


Link to post
Share on other sites
Quote:
Original post by Sneftel
Quote:
Original post by joe_phsiao
I found the benchmark in this program is really UNRELIABLE.

If I swap the two loops of find(), now the string loop is doing first, the string loop becomes the faster one. So the first loop is always faster. I guess it's some optimization in the compiler. But the result is really confusing.

Actually, that's to be expected.

What you're noticing is known as "cache priming". Through the first benchmark, a lot of the data you're reading in is not yet resident in the cache, so it takes a while to pull it in from main memory. In the second test, since you already pulled all of it in, and at least some of it is still there, access is much faster. One solution is to run the benchmarks a few times in one execution, alternating between the two. Also, having a much larger data set (so that a smaller fraction is actually in the cache) helps. Ultimately, though, this demonstrates nothing so much as how little information one often gets from a contrived benchmark of two algorithmically equivalent implementations. It's possible--hell, it's likely--that one approach would work better in some circumstances but not others. The moral of the story: time realistic uses, not simple loops; worry about algorithmic performance rather than constant-factor; always keep YAGNI in mind.


Second +1 to write clear, algorithmically correct code, then bench mark actual use for low level optimizations. So far the only thing you have determined with this "Benchmark" is that you have no clue what you are actually benchmarking.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this