branhield

strtok bus error


The following piece of code generates a bus error. I've been twisting my head inside out trying to figure out why, but I'm stuck. I've tried looking at the strtok source (strtok.c), but it didn't make me any wiser. It works without error if I make str1 an array of chars instead of a char pointer. Any ideas/facts?
char* str1 = "hello_world";
char* str2 = strtok(str1, "_");

It has to do with the way the string literal "hello_world" is stored: most compilers put string literals in a memory segment that is read-only, and writing to a string literal is undefined behavior in C.

Try this:


char *str1 = "boo";
str1[1] = 'x';

That will typically crash at run time.



However,

char str2[] = "boo";
str2[1] = 'x';

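will not. This is because str2 is an array with its own storage (on the stack, for a local variable) that is initialized with a copy of the literal's characters, and that storage is modifiable. I wish I could remember the exact technical explanation behind the two, but it's been a few months since I've been out of class ;)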


Now, about strtok. What strtok does to tokenize a string is actually -modify- the source string itself (this is why its first parameter isn't const): it overwrites the delimiter that ends each token with a '\0'. It also remembers, in its own internal static state, where it stopped, so that on the next call (when the supplied source string is NULL) it can continue where it left off. It's kind of anal and dangerous, but this is how they designed it.

Quote:
Original post by branhield
Okay, I see. So wherever I see a function that takes a const char*, I should pass a char*, and if it wants a char*, I should pass a char[]?


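No. If a function expects a const char*, you should pass it a const char* (a plain char* also works; the const only promises that the function won't write through the pointer). It's functions that take a char* and write through it, like strtok, that need genuinely writable memory such as a char array.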

I'm no assembly guru, but I compared the generated code for char* sz = "sss" and char sz[] = "sss", and this is what it looks like to me:

When you use the "char*" method, the compiler reserves a 4-byte stack slot for the pointer itself (on a 32-bit build).
When execution reaches the assignment, all it does is a single 'mov' instruction that stores the address of the constant "hello_world" string into that 4-byte slot.

now, we come to
sz[0] = 'x';

When you attempt to set the value of a char in that string (and maybe an asm god can explain this to me), the address of the string is copied into the eax register, and then I see:

mov BYTE PTR [eax], 120

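From what I understand, it's storing 120 (the ASCII code for 'x') at the address held in eax, which is the address of the string literal itself. That address is in a read-only segment, so it isn't the compiler that complains: the write faults at run time (an access violation on Windows, a bus error or segmentation fault on Unix-like systems).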

However, when we use the char[] method, the compiler reserves stack space just big enough to hold the desired string. In our case, that's 12 bytes ("hello_world" is 11 characters plus the terminating '\0').
It then performs a number of mov instructions to copy the string from the constant segment into that stack space.

Then when you try to change an element in that string, it just changes the copy.
So it would seem to me that this is less efficient, because it copies the entire string, 4 bytes at a time, into a temporary location.

Again, I don't know that much asm, perhaps someone else can explain it better.

Oh, and FYI, when I run a release build, both methods appear to work fine. (That's undefined behavior happening not to crash, not something you can rely on.)

To put it simply, a string literal is a constant.
So when you type "hello_world", you're making a constant C string.
When you do
char *str1 = "hello_world";
you're making a pointer to a non-const string point to a constant string, which is a bad thing.
If you then try to write to the string that 'str1' points to, you get undefined behavior because you're modifying a constant.

When you do
char str[] = "hello_world";
you're making an array of characters and copying over a constant string to the array. The array itself is not constant, so changing stuff inside the array is fine. The original string is never modified this way.

strtok modifies its first argument, which in your example is really a constant string.
The proper way to declare str1 would be
const char* str1 = "hello_world";
and if you try to do that, the compiler will complain (an error in C++, at least a warning in C), because strtok's first parameter is a plain char*, not a const char*. When you change the pointer to an array, the constant string is copied into the array, so strtok modifies the copy instead of the constant original.

Okay, that clears it up! Thanks for the great replies, guys. I guess I really should take some time to understand assembly, as it seems that everything that gets fucked up in C can be explained more easily using asm.

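For anyone who's wondering why the compiler lets you assign a non-const pointer to constant data: it's for compatibility with older code. In C, a string literal has type char[N] rather than const char[N], even though writing to it is undefined behavior. C++ gave literals the type const char[N] but kept a special conversion to char* for compatibility; that conversion was deprecated from the start and removed in C++11. I personally think compilers should at least be throwing a warning nowadays (GCC's -Wwrite-strings does exactly that for C code).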
