CoffeeMug

strtok() performance

This topic is 5411 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

I'm using strtok with a multithreaded runtime library (I have to) and it is proving to be a major bottleneck in my code. I don't have exact numbers, but the profiler shows that about 60% of the time is spent inside the strtok function. I call strtok in an inner loop (I tokenize 100+ MB log files at work) and I'd really like to improve performance as much as possible. Any ideas what I could do? Perhaps a better tokenizer implementation? AFAIK the Boost implementation is significantly slower than the C runtime's. Thanks.

You could test this out; just modify it to suit your needs:


#include <string>
#include <deque>
#include <iterator>
#include <algorithm>
#include <iostream>

template< typename Container >
void stringtok(Container& container, const std::string& in,
               const char * const delimiters = " \t\n") {

    const std::string::size_type len = in.length();
    std::string::size_type i = 0;

    while (i < len) {
        // eat leading delimiters
        i = in.find_first_not_of(delimiters, i);
        if (i == std::string::npos)
            return; // nothing left but delimiters

        // find the end of the token
        std::string::size_type j = in.find_first_of(delimiters, i);

        // push token
        if (j == std::string::npos) {
            container.push_back(in.substr(i));
            return;
        } else
            container.push_back(in.substr(i, j - i));

        // set up for next loop
        i = j + 1;
    }
}

int main() {
    std::deque<std::string> tokens;
    std::string sentence;

    std::cout << "Enter a sentence:\n";

    std::getline(std::cin, sentence);

    stringtok(tokens, sentence);

    std::copy(tokens.begin(), tokens.end(),
              std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}

Well, what sort of data are you parsing? I realize you may not be able to be totally specific, but maybe there are some characteristics of the data that lend themselves to more optimal solutions.

Just arbitrary-size strings (30 bytes - 1 KB) separated by commas and containing nine tokens. I realize I could write a more efficient custom solution; I just find it surprising that strtok is performing so poorly.

Hmm, I replaced strtok with my own implementation (a simple for loop, really) and the bottleneck shifted to a completely different function. I wonder why the MSVC implementation is so slow...

On a different note, profilers are cool [smile]
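
For readers wondering what a "simple for loop" replacement might look like, here is a minimal sketch. It is not the poster's actual code: the names Token and split_fields are hypothetical, and it assumes a single delimiter character and that the line outlives the tokens, since each token only points into the original buffer. Unlike strtok, it keeps empty fields and does not modify the input.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// A token is a (pointer, length) view into the caller's buffer; nothing is copied.
// Hypothetical type for illustration only.
struct Token {
    const char* begin;
    std::size_t length;
};

// Splits `line` at every occurrence of `delim` and appends the fields to `out`.
// Empty fields are kept, so "a,,b" yields three tokens (strtok would yield two).
inline void split_fields(const std::string& line, char delim,
                         std::vector<Token>& out) {
    const char* start = line.data();
    const char* const end = start + line.size();
    for (const char* p = start; p != end; ++p) {
        if (*p == delim) {
            out.push_back(Token{start, static_cast<std::size_t>(p - start)});
            start = p + 1;  // next field begins just past the delimiter
        }
    }
    // The last field runs to the end of the line.
    out.push_back(Token{start, static_cast<std::size_t>(end - start)});
}
```

Reusing one std::vector<Token> across lines (calling clear() between them) avoids reallocating in the inner loop, which may matter for files of the size described above.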

strtok uses some static storage. Hence the old rule: never, ever, ever, ever, ever use strtok in a multithreaded environment. Use stringstreams - they're inefficient, but clean.
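
As a rough sketch of the stringstream approach suggested here, assuming comma-separated fields: the helper name split_with_stream is hypothetical. All parsing state lives in the local std::istringstream, so nothing is shared between calls or threads, at the cost of copying every field into its own std::string.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits `line` on `delim` using std::getline.
// Hypothetical helper for illustration; every field is copied.
inline std::vector<std::string> split_with_stream(const std::string& line,
                                                  char delim = ',') {
    std::vector<std::string> fields;
    std::istringstream stream(line);  // local state only, nothing static
    std::string field;
    while (std::getline(stream, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}
```

Note that this keeps empty fields in the middle of a line but drops an empty field after a trailing delimiter, so its results can differ from strtok's on the same input.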

Quote:
Original post by Pxtl
strtok uses some static storage. Hence the old rule: never, ever, ever, ever, ever use strtok in a multithreaded environment. Use stringstreams - they're inefficient, but clean.

Don't be so certain. I checked MSDN and it looks like they use thread-local storage for strtok:
Quote:
Each function uses a static variable for parsing the string into tokens. If multiple or simultaneous calls are made to the same function, a high potential for data corruption and inaccurate results exists. Therefore, do not attempt to call the same function simultaneously for different strings and be aware of calling one of these functions from within a loop where another routine may be called that uses the same function. However, calling this function simultaneously from multiple threads does not have undesirable effects.
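
The hazard the quoted documentation describes, one tokenizing loop calling another routine that also tokenizes, comes from the hidden per-function state. One way to sidestep it is to keep the cursor in an object the caller owns. The class below is a hypothetical sketch (FieldCursor is not a CRT or standard library type); it relies only on std::strspn and std::strcspn, and unlike strtok it does not write into the input buffer.

```cpp
#include <cstring>
#include <string>

// Hypothetical tokenizer that stores its position in the object itself rather
// than in static or thread-local storage, so several can be active at once.
class FieldCursor {
public:
    FieldCursor(const char* text, const char* delimiters)
        : cursor_(text), delimiters_(delimiters) {}

    // Copies the next token into `token` and returns true, or returns false
    // once the input is exhausted. Like strtok, runs of delimiters are skipped.
    bool next(std::string& token) {
        cursor_ += std::strspn(cursor_, delimiters_);  // skip leading delimiters
        if (*cursor_ == '\0') {
            return false;
        }
        const std::size_t length = std::strcspn(cursor_, delimiters_);
        token.assign(cursor_, length);
        cursor_ += length;
        return true;
    }

private:
    const char* cursor_;      // current position in the caller's text
    const char* delimiters_;  // set of delimiter characters
};
```

Because each FieldCursor carries its own position, an outer loop splitting on ";" can safely run an inner loop splitting each segment on " ", which is exactly the pattern the documentation warns against with strtok.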
