strtok bus error

Started by
5 comments, last by Nemesis2k2 18 years, 10 months ago
The following piece of code generates a bus error, and I've been twisting my head inside out trying to figure out why, but I'm stuck. I've tried looking at the strtok source (strtok.c), but it didn't make me any wiser. It works without error if I make str1 an array of chars instead of a char pointer. Any ideas or facts?

char* str1 = "hello_world";
char* str2 = strtok(str1, "_");

It has to do with the way the string "hello_world" is stored. It's a string literal, and compilers typically put string literals in a memory segment that is not modifiable. Writing to one is undefined behavior in C.

Try this:

char *str1 = "boo";
str1[1] = 'x';

That will typically give a run-time error.

However,

char str2[] = "boo";
str2[1] = 'x';

will not. This is because str2 is an array that holds its own copy of the literal's characters, and that copy lives in modifiable memory (on the stack, if str2 is a local variable). I wish I could remember the exact technical explanation behind the two, but it's been a few months since I've been out of class ;)


Now, about strtok. To tokenize a string, strtok actually -modifies- the source string itself (this is why its parameter isn't const): it overwrites the delimiter at the end of each token with a null character. It also remembers internally where it stopped, so that on the next call (when the string argument is NULL) it can continue where it left off. It's kind of anal and dangerous, but this is how they designed it.
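To see that modification in action, here's a minimal sketch (print_tokens is just an illustrative name, not anything from the original posts) that tokenizes a writable char array and then looks at what strtok left behind in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Splits a writable buffer on '_' and prints each token.
   strtok overwrites each delimiter it stops at with '\0',
   so buf MUST point to modifiable memory (not a string literal). */
int print_tokens(char *buf)
{
    int count = 0;
    char *tok = strtok(buf, "_");   /* first call: pass the buffer itself */
    while (tok != NULL) {
        printf("token %d: %s\n", count, tok);
        count++;
        tok = strtok(NULL, "_");    /* later calls: pass NULL to continue */
    }
    return count;
}
```

Calling it on char s[] = "hello_world"; prints "hello" and "world", and afterwards s[5], which used to be '_', is '\0'. That write is exactly the one that fails when str1 points at a string literal.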

Okay, I see. So wherever I see a function that takes a const char* I should pass a char*, and if it wants a char* I should pass a char[]?
Quote: Original post by branhield
Okay, I see. So wherever I see a function that takes a const char* I should pass a char*, and if it wants a char* I should pass a char[]?


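No. A function that takes a const char* only promises not to modify the string, so you can pass it a const char* or a plain char* (a char* converts to const char* implicitly). A function that takes a char* may write through the pointer, so you must give it memory you're allowed to modify, such as a char array or a malloc'd buffer, and never a string literal.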
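A minimal sketch of that rule, using two hypothetical helpers (count_upper and to_upper_in_place are made up for illustration): the const char* one happily accepts a literal or an array, while the char* one writes through its argument and so needs writable memory.

```c
#include <ctype.h>
#include <stddef.h>

/* Read-only access: const char* promises not to write through s,
   so callers may pass a literal, a const char*, a char*, or a char array. */
size_t count_upper(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (isupper((unsigned char)*s))
            n++;
    return n;
}

/* Writes through s, so callers must pass modifiable memory
   (a char array or a malloc'd buffer), never a string literal. */
void to_upper_in_place(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```

So count_upper("ABc") and count_upper(buf) are both fine, but to_upper_in_place("abc") would be the same mistake as strtok(str1, "_") in the original question.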

I'm no assembly guru, but I compared the code generated for char* sz = "sss" and char sz[] = "sss", and this is what it looks like to me:

When you use the "char*" method, the compiler reserves a 4-byte slot for the pointer in the function's stack frame (4 bytes on a 32-bit target); that's the variable the listing declares under the _TEXT segment.
And when execution gets to the point where you assign a value to it, all it does is a 'mov' instruction that places the address of the constant "hello_world" string into that 4-byte variable.

Now we come to
sz[0] = 'x';

When you attempt to set the value of a char in that string (and maybe an asm god can explain this to me), the address of the string is copied into the eax register, and then I see:

mov BYTE PTR [eax], 120

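From what I understand, it's trying to store 120 (the character code for 'x') at the address held in eax, which is the address of the string literal. That string sits in a non-writable segment, so the write fails at run time with an access violation; it's the OS, not the compiler, that complains.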

However, when we use the char[] method, the compiler reserves an array in the stack frame just big enough to hold the desired string. In our case ("hello_world"), it's 12 bytes long: 11 characters plus the terminating '\0'.
It then performs a number of mov instructions to copy the string from the constant segment into that array.

Then when you try to change an element in that string, it just changes the copy.
So it would seem to me that this is less efficient, because it copies the entire string, 4 bytes at a time, into the local array.
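The sizes seen in the listing can be checked from C itself. This is a minimal sketch with hypothetical helper names; note the pointer is only 4 bytes on a 32-bit target (it's typically 8 on a 64-bit one), while the array size doesn't depend on the platform.

```c
#include <stddef.h>

/* Pointer version: the variable holds only an address; the literal itself
   stays wherever the compiler placed it (often a read-only section). */
size_t pointer_size(void)
{
    const char *p = "hello_world";
    return sizeof p;   /* size of a pointer, not of the string */
}

/* Array version: storage for every character plus the '\0',
   initialized by copying from the literal. */
size_t array_size(void)
{
    char a[] = "hello_world";
    return sizeof a;   /* 11 characters + terminating '\0' = 12 */
}
```

The copy is the price of getting writable storage; for a short string like this it's a handful of instructions.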

Again, I don't know that much asm; perhaps someone else can explain it better.

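Oh, and FYI, when I run a release build, both methods work fine. That doesn't make the char* version correct, though: writing to a string literal is undefined behavior, so whether it crashes depends on the compiler and build settings.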
To put it simply, any literal is a constant.
So when you type "hello_world", you're making a constant C string.
When you do
char *str1 = "hello_world";
you're making a pointer to non-const string point to a const string, which is a bad thing.
If you then try to write to the string that 'str1' points to, you get undefined behavior because you're modifying a constant.

When you do
char str[] = "hello_world";
you're making an array of characters and copying over a constant string to the array. The array itself is not constant, so changing stuff inside the array is fine. The original string is never modified this way.

strtok modifies its first argument, which in your example is really a constant string.
The proper way to declare str1 would be
const char* str1 = "hello_world";
and if you do that, you'll get a compiler diagnostic (a warning in C, an error in C++) because strtok's first parameter is a plain char*, not a const char*. When you change the pointer to an array, the constant string is copied into the array, so strtok modifies the copy instead of the constant original.
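If all you have is a const char* (a literal, or a string some other code owns), one way to tokenize it is to make a private writable copy first. A minimal sketch, assuming hypothetical helpers copy_string and count_tokens; POSIX strdup (also standard since C23) does the same job as copy_string.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd, writable copy of s (caller must free it),
   or NULL if the allocation fails. */
char *copy_string(const char *s)
{
    size_t len = strlen(s) + 1;   /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Counts '_'-separated tokens in a read-only string by running strtok
   on a private copy, so the original is never written to.
   Returns -1 if the copy can't be allocated. */
int count_tokens(const char *s)
{
    char *work = copy_string(s);
    if (work == NULL)
        return -1;
    int count = 0;
    for (char *tok = strtok(work, "_"); tok != NULL; tok = strtok(NULL, "_"))
        count++;
    free(work);
    return count;
}
```

Because strtok skips runs of delimiters, count_tokens("a__b_c") counts three tokens, and the literal passed in is left untouched.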
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Okay, that clears it up! Thanks for the great replies, guys. I guess I really should take some time to understand assembly, as it seems that everything that gets fucked up in C can be explained more easily using asm.
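For anyone who's wondering why the compiler lets you assign a non-const pointer to constant data, it's for backward compatibility. In C, a string literal has type char[N], not const char[N], even though modifying it is undefined behavior, and early C++ followed suit. Standard C++ made literals const char[N] but kept the implicit conversion to char* as a deprecated feature until C++11 removed it. I personally think compilers should be throwing at least a warning nowadays (GCC's -Wwrite-strings option does this for C).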
