Jump to content

  • Log In with Google      Sign In   
  • Create Account

String handling in C

  • You cannot reply to this topic
32 replies to this topic

#1 dave j   Members   -  Reputation: 599

Like
4Likes
Like

Posted 10 March 2013 - 04:53 PM

In a former colleague's code about 14 years ago:

sprintf(str, "%s %s %s", a, b);

The value of str was displayed on screen after this. This had been live at a bank for a couple of years before it was discovered. The reason it took so long to notice is that the value on the stack that was used for the third string's address happened to point to a byte containing zero.

Sponsor:

#2 Alpha_ProgDes   Crossbones+   -  Reputation: 4692

Like
1Likes
Like

Posted 10 March 2013 - 04:57 PM

Does any C or even C++ compiler catch a mismatch like that? That's a terrible bug to have. Code reviews FTW.


Beginner in Game Development? Read here.
 
Super Mario Bros clone tutorial written in XNA 4.0 [MonoGame, ANX, and MonoXNA] by Scott Haley
 
If you have found any of the posts helpful, please show your appreciation by clicking the up arrow on those posts Posted Image
 
Spoiler

#3 Paradigm Shifter   Crossbones+   -  Reputation: 5436

Like
4Likes
Like

Posted 10 March 2013 - 05:05 PM

gcc looks at the format string for printf & co and gives a warning I think. It has to be built in to the compiler (or via metadata related to a function declaration) since using variable length argument lists removes all checking to do with type and number of arguments...


"Most people think, great God will come from the sky, take away everything, and make everybody feel high" - Bob Marley

#4 mhagain   Crossbones+   -  Reputation: 8279

Like
3Likes
Like

Posted 10 March 2013 - 05:36 PM

"String handling in C" is a coding horror all on it's own - no further comment is necessary.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#5 TheChubu   Crossbones+   -  Reputation: 4766

Like
1Likes
Like

Posted 10 March 2013 - 06:04 PM

Care to elaborate? :D

 

I purposely stayed away from C's formatted output for the brief time I was learning C++.


"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#6 iGoogleThis   Members   -  Reputation: 207

Like
-1Likes
Like

Posted 11 March 2013 - 02:29 AM

Could be worse, could be a web of pointers so convoluted that they point to nothing while trying to point to some embedded function, with an over-called string in it that still works for some reason.  *shudders*


Editor // Joy-Toilet.com

Anything But Shitty Entertainment!


#7 patrrr   Members   -  Reputation: 1049

Like
0Likes
Like

Posted 11 March 2013 - 03:43 AM

gcc looks at the format string for printf & co and gives a warning I think. It has to be built in to the compiler (or via metadata related to a function declaration) since using variable length argument lists removes all checking to do with type and number of arguments...

 

Clang also does this, plus, it also checks that the format string is correct with respect to argument types. Really handy!



#8 mhagain   Crossbones+   -  Reputation: 8279

Like
1Likes
Like

Posted 11 March 2013 - 04:21 AM

Care to elaborate? biggrin.png

 

I purposely stayed away from C's formatted output for the brief time I was learning C++.

 

You can overflow the buffer at any time, you don't know how long it is, you need to walk over the entire string in order to do any operation (which can easily lead to O(n2) algorithms) - strings in C basically contain everything that one should not do if one was going to design a string library.  See http://en.wikipedia.org/wiki/C_string_handling#Criticism and http://www.joelonsoftware.com/articles/fog0000000319.html for more.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#9 dave j   Members   -  Reputation: 599

Like
0Likes
Like

Posted 11 March 2013 - 07:26 AM

Does any C or even C++ compiler catch a mismatch like that? That's a terrible bug to have. Code reviews FTW.


This was an IBM C compiler which didn't perform any such checks. I don't think any did at the time.

The team were supposed to do code reviews and should have picked this up then. My job was developer support which included solving "our code's crashing and we don't know why" type problems. In this case I was given a memory dump and asked to figure out what was going wrong.

#10 dave j   Members   -  Reputation: 599

Like
2Likes
Like

Posted 11 March 2013 - 07:34 AM

Care to elaborate? :D

I purposely stayed away from C's formatted output for the brief time I was learning C++.

Each %s in the string means there should be another parameter that is a pointer to a string. The line should look like:

sprintf(str, "%s %s %s", a, b, c);
Because the function is expecting another parameter on the stack to go with the third %s, it will use whatever is in the next memory location after the b. This could be anything!

#11 Sik_the_hedgehog   Crossbones+   -  Reputation: 1833

Like
0Likes
Like

Posted 11 March 2013 - 11:48 AM

Care to elaborate? biggrin.png

 

I purposely stayed away from C's formatted output for the brief time I was learning C++.

 

You can overflow the buffer at any time, you don't know how long it is, you need to walk over the entire string in order to do any operation (which can easily lead to O(n2) algorithms) - strings in C basically contain everything that one should not do if one was going to design a string library.  See http://en.wikipedia.org/wiki/C_string_handling#Criticism and http://www.joelonsoftware.com/articles/fog0000000319.html for more.

Actually it isn't just strings, it's arrays in general that suffer from that (actually with generic arrays it's even worse - with strings at least you can expect it to stop when there's a zero, with an array the only way to be 100% sure of the length is to pass it separately). Strings just happen to be one specific application of an array (to the point all array operations work on them).


Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#12 wintertime   Members   -  Reputation: 1877

Like
0Likes
Like

Posted 11 March 2013 - 02:56 PM

I always wonder when people just use printf-like functions with %s or even without a single %. Dont they know there are things like fputs, strcpy, strcat which dont need to parse possibly wrong format strings?



#13 Sik_the_hedgehog   Crossbones+   -  Reputation: 1833

Like
0Likes
Like

Posted 11 March 2013 - 04:01 PM

Then the code above would have been equivalent to this (bug included):

strcpy(str, a);
strcat(str, " ");
strcat(str, b);
strcat(str, " ");

I know that isn't optimal (it'll read all of the string thrice) and you can make it faster, but then the code becomes less clear and can be much harder to read. Not like this code is not error prone anyway - I wonder how many programmers end up reading the strcpy as strcat. So in that sense sprintf looks like a good thing because it makes the code more concise without giving up much on readability (if we're talking about just a single string then it's overkill though).


Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#14 Kylotan   Moderators   -  Reputation: 3338

Like
4Likes
Like

Posted 11 March 2013 - 05:52 PM

Care to elaborate? biggrin.png

 

I purposely stayed away from C's formatted output for the brief time I was learning C++.

 

C doesn't have strings. It has arrays of characters, and some fancy goggles for the programmer which make those arrays look and act a bit like strings if you're very careful.



#15 Khatharr   Crossbones+   -  Reputation: 3040

Like
2Likes
Like

Posted 11 March 2013 - 08:01 PM

Then the code above would have been equivalent to this (bug included):

strcpy(str, a);
strcat(str, " ");
strcat(str, b);
strcat(str, " ");
I know that isn't optimal (it'll read all of the string thrice) and you can make it faster, but then the code becomes less clear and can be much harder to read. Not like this code is not error prone anyway - I wonder how many programmers end up reading the strcpy as strcat. So in that sense sprintf looks like a good thing because it makes the code more concise without giving up much on readability (if we're talking about just a single string then it's overkill though).


char* unknown;
strcpy(str, a);
strcat(str, " ");
strcat(str, b);
strcat(str, " ");
srtcat(str, unknown);
I fixed your bug for you, sir.
void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

#16 Hodgman   Moderators   -  Reputation: 31847

Like
1Likes
Like

Posted 11 March 2013 - 08:23 PM

"String handling in C" is a coding horror all on it's own - no further comment is necessary.

I'd say "string manipulation in C" is a coding horror, but consuming read-only strings in C is refreshingly lacking in unnecessary abstraction.

 

In my C++ engine, I don't use any string classes. Instead I choose to use const char* for any strings, simply because I don't do any string manipulation at all, so the simplest solution works fine wink.png

[edit] to clarify, this also means not using any of the C standard library functions that work on strings [/edit]


Edited by Hodgman, 12 March 2013 - 04:18 AM.


#17 TheChubu   Crossbones+   -  Reputation: 4766

Like
0Likes
Like

Posted 11 March 2013 - 08:28 PM

Care to elaborate? biggrin.png

 

I purposely stayed away from C's formatted output for the brief time I was learning C++.

 

You can overflow the buffer at any time, you don't know how long it is, you need to walk over the entire string in order to do any operation (which can easily lead to O(n2) algorithms) - strings in C basically contain everything that one should not do if one was going to design a string library.  See http://en.wikipedia.org/wiki/C_string_handling#Criticism and http://www.joelonsoftware.com/articles/fog0000000319.html for more.

 

 

Care to elaborate? biggrin.png

I purposely stayed away from C's formatted output for the brief time I was learning C++.

Each %s in the string means there should be another parameter that is a pointer to a string. The line should look like:

sprintf(str, "%s %s %s", a, b, c);
Because the function is expecting another parameter on the stack to go with the third %s, it will use whatever is in the next memory location after the b. This could be anything!

 

 

Care to elaborate? biggrin.png

 

I purposely stayed away from C's formatted output for the brief time I was learning C++.

 

C doesn't have strings. It has arrays of characters, and some fancy goggles for the programmer which make those arrays look and act a bit like strings if you're very careful.

Oh I see then it might trash the memory, thanks!


Edited by TheChubu, 11 March 2013 - 08:30 PM.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

 

My journals: dustArtemis ECS framework and Making a Terrain Generator


#18 Sik_the_hedgehog   Crossbones+   -  Reputation: 1833

Like
0Likes
Like

Posted 12 March 2013 - 01:32 AM

"String handling in C" is a coding horror all on it's own - no further comment is necessary.

I'd say "string manipulation in C" is a coding horror, but consuming read-only strings in C is refreshingly lacking in unnecessary abstraction.

Only as long as you're reading it sequentially. If you ever need to know the length, you'll need to use strlen which traverses the entire string (and thereby is a performance penalty), and if you're using a variable-length encoding such as UTF-8, consider yourself screwed as all the functions work on chars rather than the proper characters (e.g. in that case strlen would return the number of bytes, rather than the number of characters).


Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#19 dave j   Members   -  Reputation: 599

Like
0Likes
Like

Posted 12 March 2013 - 05:24 AM

Oh I see then it might trash the memory, thanks!


It's not just that. If the value that happens to be on the stack is invalid if used as an address, it would crash the program.

#20 mhagain   Crossbones+   -  Reputation: 8279

Like
0Likes
Like

Posted 12 March 2013 - 07:09 AM

If you're lucky it trashes the memory and gives you a nice clean crash at the point where things started going wrong.

 

More normally, it seems to work OK but at some arbitrary point later and in a completely different part of your code you start getting weird things happen.

 

Yeah, strings are just arrays, but I think it's worth singling out strings here because if you're using an array you've normally got an extra level of awareness of what you're doing, whereas the CRT tries to look like it's pretending that strings are some kind of special case or something different, which may lead the unwary into thinking that they're OK.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.






PARTNERS