strtok not threadsafe on windows

Published July 09, 2010
Advertisement
Anyone, like me, who started off in the land of "C" or before
stl really got standardized probably uses, or has legacy code that has strtok
for tokenization.

On Unix/Mac platforms--strtok has been replaced by strsep.
To me, this is more for convenience: strsep is smart enough to know
to skip double delimiters.

Eg, strtok("--Hi!-Dash-Seperated", "-") = failure, 2 delimeters back to back.
strsep("--Hi!-Dash-Seperated", "-") = "Hi!"

Thats all great and wonderful, except if you are developing on windows,
strsep is not available in the msvc runtime.

The ANSI definition for strtok is a bit vague as it does not reference
what it ought do when it comes to threads and processes:

char * strtok ( char * str, const char * delimiters );Split string into tokensA sequence of calls to this function split str into tokens, which are sequences of contiguous characters separated by any of the characters that are part of delimiters.On a first call, the function expects a C string as argument for str, whose first character is used as the starting location to scan for tokens. In subsequent calls, the function expects a null pointer and uses the position right after the end of last token as the new starting location for scanning.To determine the beginning and the end of a token, the function first scans from the starting location for the first character not contained in delimiters (which becomes the beginning of the token). And then scans starting from this beginning of the token for the first character contained in delimiters, which becomes the end of the token.This end of the token is automatically replaced by a null-character by the function, and the beginning of the token is returned by the function.Once the terminating null character of str has been found in a call to strtok, all subsequent calls to this function with a null pointer as the first argument return a null pointer.


Seems okay, right? What does MSDN say about threads and strtok?
Almost all of the above plus this little note:

Note:Each function uses a thread-local static variable for parsing the string into tokens. Therefore, multiple threads can simultaneously call these functions without undesirable effects. However, within a single thread, interleaving calls to one of these functions is highly likely to produce data corruption and inaccurate results. When parsing different strings, finish parsing one string before starting to parse the next. Also, be aware of the potential for danger when calling one of these functions from within a loop where another function is called. If the other function ends up using one of these functions, an interleaved sequence of calls will result, triggering data corruption.


Well, that looks good as well. Looks like I can use it and not worry about
thread interactions.

But low and behold, what does the C Runtime say?

Using the statically linked CRT implies that any state information saved by the C runtime library will be local to that instance of the CRT. For example, if you use strtok, _strtok_l, wcstok, _wcstok_l, _mbstok, _mbstok_l when using a statically linked CRT, the position of the strtok parser is unrelated to the strtok state used in code in the same process (but in a different DLL or EXE) that is linked to another instance of the static CRT. In contrast, the dynamically linked CRT shares state for all code within a process that is dynamically linked to the CRT. This concern does not apply if you use the new more secure versions of these functions; for example, strtok_s does not have this problem.


Wow, so its only thread safe if you static link to the c runtime.
To make matters worse: If you create your COM connections in proc, that is
what they are--in process. So you and all your threads + all the threads
from any libraries you are using that are dynamically linked to the C
runtime have the chance of a collision.

Also--if you throw the /clr switch to use C++/CLI code: you are required
to dynamic link. The static option is mutually exclusive, so you can't
do a quick fix and just switch to static.

Looks like the only solution on windows is to switch to strtok_s--luckily
it works almost the same way as strtok. Almost... it still takes some tweaking
to get the exact functionality.

Good luck, and beware. If you are multi-threaded then this code
has the potential to throw an exception, and since there is no handler:
cause a crash.

...lFileLength = File->GetLength();Data = new TCHAR[lFileLength+1];File->Read(Data, lFileLength);			Data[lFileLength] = NULL;File->Close();char *buffer;// start our tokenizationbuffer = strtok(Data, "\n\r");...


Literally, after this call went on the stack and started to execute:
We had a background thread strtok which changed the value of
its static internal state pointer. When this went to execute, it was at
the end of the other call's buffer and tried to read past it.

If its not at the end of your address space, then the call could simply fail,
returning NULL, or get a garbled state where you are now tokenizing the wrong
buffer.

If you are at the end of the buffer, and the next address is not owned
by you: it will throw an exception.
0 likes 1 comments

Comments

Dragon88
Huh. I've been in C-land for a long time, and I've always found myself reaching for strstr() instead of strtok(). I guess I probably end up more or less re-implementing strtok(), but at least it's thread-safe, and allows me to deal with the double-delimiter issue easily enough.
July 10, 2010 05:01 PM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Advertisement