STL Map char* is faster than string

Started by
23 comments, last by stonemetal 15 years, 9 months ago
Hello, I wrote a test program to test the speed of find() in std::map. And I found that if I use char* as the key type, compare to string, the program would spend less time in find(). My question is why? I think both of them compare keys character by character. Here's the program:

#include <sstream>
#include <ctime>
#include <iostream>
#include <map>
#include <string>

using namespace std;

struct lesscmp{

  bool operator()(const char* st1, const char* st2 )
    return strcmp( st1, st2 ) < 0;
}

size_t LOOP = 50000;

int main()
{
  int t;
  
  map<string, int> map1;
  map<const char*, int, lesscmp> map2;

  string sta[LOOP];
  for( size_t i = 0 ; i < LOOP; i++ ){
    stringstream ss;
    ss << i;
    string st;
    ss >> st;
    sta = st;

    map1[ st ] = 1;
    map2[ st.c_str() ] = 1;
  }

  t = clock();
  for( size_t i = 0 ; i < LOOP; i++ )
    map1.find( sta );

  cout << clock() - t << endl;
  
  t = clock();
  for( size_t i = 0 ; i < LOOP; i++ )
    map2.find( sta.c_str() );

  cout << clock() - t << endl;

  return 0;
}

The output is: 110000 50000
Advertisement
Because you're doing it wrong?

Using (relatively old) gcc-3.3.5 on linux 2.4.27 (relatively old debian install) 1.2 ghz, 512m with 'gcc -O3', and your source fixed to actually compile:

#include <sstream>#include <string>#include <iostream>#include <map>using namespace std;struct lesscmp{  bool operator()(const char* st1, const char* st2 ){    return strcmp( st1, st2 ) < 0;  }};int main(){  int t;  size_t LOOP = 50000;  map<string, int> map1;  map<const char*, int, lesscmp> map2;  string sta[LOOP];  for( size_t i = 0 ; i < LOOP; i++ ){    stringstream ss;    ss << i;    string st;    ss >> st;    sta = st;    map1[ st ] = 1;    map2[ st.c_str() ] = 1;  }  t = clock();  for( size_t i = 0 ; i < LOOP; i++ )    map1.find( sta );  cout << clock() - t << endl;  t = clock();  for( size_t i = 0 ; i < LOOP; i++ )    map2.find( sta.c_str() );  cout << clock() - t << endl;  return 0;}


yields:
5000050000


The key there is using -O3 I suspect. Release mode/optimizations on is your friend.
I'm suprised that even works.
string st; is a temp variable with limited scope.
st.c_str() is a temp variable valid only within the context of the call to it
map2[ st.c_str() ] = 1; makes a copy of the POINTER not a copy of the string
and thusly, map2 probably contains garbage for keys.
Unless I'm totally missing something.
Quote:Original post by Telastyn
yields:

50000
50000


Shame. I was hoping std::string would win due to it being able to reject strings based on length rather than content - I don't know if strcmp does anything clever like that though.

Quote: for( size_t i = 0 ; i < LOOP; i++ ){
stringstream ss;
ss << i;
string st;
ss >> st;
sta = st;

map1[ st ] = 1;
map2[ st.c_str() ] = 1;
}

Hmm, what am I missing?

The way I see it the c_str is retrieved, stored in the map, and then the string is destructed, leaving the map's pointer dangling.

Edit:
@KulSeran: Snap [grin]
Yeah, I didn't pay any attention to the functional correctness of the test. I have better things to futz around with than avoiding C++ pitfalls.
I am using gcc-4.2.3 (Ubuntu 4.2.3-2ubuntu7) and linux 2.6.24-19.
The compile command is "g++ -O3 main.cc"

I try to run the program three times, and the outputs are:

joe@joe-laptop:~/Desktop/map$ ./a.out
50000
40000
joe@joe-laptop:~/Desktop/map$ ./a.out
60000
20000
joe@joe-laptop:~/Desktop/map$ ./a.out
60000
30000


The overall is faster, but seems like char* is still a little bit faster.
Quote:Original post by dmatter
Shame. I was hoping std::string would win due to it being able to reject strings based on length rather than content - I don't know if strcmp does anything clever like that though.
strcmp can't do anything clever like that, since it doesn't know the length ahead of time. It's sort of a silly optimization, though... it would be difficult to talk about the "average case" for string comparison, but I would think a differing first character would be at least as likely as a differing length for two different strings.
Quote:Original post by joe_phsiao
The overall is faster, but seems like char* is still a little bit faster.

The variance in your results is far too high to draw any meaningful conclusions, and clock() is not suitable for reliable benchmarking. Try the tests again with something a little more accurate and precise before jumping to conclusions.
Quote:Original post by Sneftel
strcmp can't do anything clever like that, since it doesn't know the length ahead of time.

I was thinking more along the lines of peeking into the header to the allocated memory to determine length.

Quote:It's sort of a silly optimization, though... it would be difficult to talk about the "average case" for string comparison, but I would think a differing first character would be at least as likely as a differing length for two different strings.

I suppose, in the general case.
Although, with a finite set of symbols, there are an infinite number of strings beginning with a character c, but finite number of length n.
Quote:Original post by dmatter
I was thinking more along the lines of peeking into the header to the allocated memory to determine length.
The allocation sizes tell you nothing about whether the strings are equal or not.

This topic is closed to new replies.

Advertisement