dave j

String handling in C


Could be worse, could be a web of pointers so convoluted that they point to nothing while trying to point to some embedded function, with an over-called string in it that still works for some reason.  *shudders*


gcc looks at the format string for printf and friends and gives a warning, I think. It has to be built into the compiler (or provided via metadata attached to a function declaration), since using variable-length argument lists removes all checking of the type and number of arguments...

 

Clang does this too, and it also checks that the format string matches the types of the arguments. Really handy!
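GCC and Clang let you opt your own printf-style functions into the same checking with the `format` function attribute. A minimal sketch, assuming GCC or Clang; `format_msg` is a hypothetical helper name, not part of any library:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats into a caller-supplied buffer.
 * format(printf, 3, 4) tells GCC/Clang that parameter 3 is a printf-style
 * format string and the variadic arguments start at position 4, so
 * -Wformat (enabled by -Wall) checks calls just like printf calls. */
int format_msg(char *buf, size_t size, const char *fmt, ...)
    __attribute__((format(printf, 3, 4)));

int format_msg(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    /* vsnprintf never writes more than size bytes (including the '\0')
     * and returns the length the complete output would have had. */
    int n = vsnprintf(buf, size, fmt, args);
    va_end(args);
    return n;
}
```

With this declaration, a call such as `format_msg(buf, sizeof buf, "%s %s %s", a, b)` draws the same "expects a matching argument" warning that `sprintf` would.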


Care to elaborate? :D

I purposely stayed away from C's formatted output for the brief time I was learning C++.

You can overflow the buffer at any time, you don't know how long it is, and you need to walk over the entire string to do almost any operation (which can easily lead to O(n²) algorithms). Strings in C basically contain everything one should avoid when designing a string library. See http://en.wikipedia.org/wiki/C_string_handling#Criticism and http://www.joelonsoftware.com/articles/fog0000000319.html for more.
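One common remedy for these problems is to store the length and capacity next to the characters, so an append never rescans the string and can refuse to overflow. A minimal sketch with a caller-supplied fixed-size buffer; `struct bstr`, `bstr_init` and `bstr_append` are hypothetical names, not a standard API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded string: the length is stored, not recomputed. */
struct bstr {
    char   *data;  /* caller-owned buffer, always null-terminated */
    size_t  len;   /* chars currently stored, excluding '\0' */
    size_t  cap;   /* total size of data in bytes, including '\0' */
};

/* Wraps a caller-supplied buffer of cap bytes (cap must be at least 1). */
void bstr_init(struct bstr *s, char *buf, size_t cap)
{
    s->data = buf;
    s->cap = cap;
    s->len = 0;
    s->data[0] = '\0';
}

/* Appends n bytes from src. Returns false and leaves s unchanged if they
 * would not fit; the cost depends only on n, not on the current length. */
bool bstr_append(struct bstr *s, const char *src, size_t n)
{
    if (n >= s->cap - s->len)   /* need n bytes plus one for '\0' */
        return false;
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return true;
}
```

Because the length travels with the buffer, appending in a loop stays linear overall instead of becoming quadratic.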


Does any C or even C++ compiler catch a mismatch like that? That's a terrible bug to have. Code reviews FTW.


This was an IBM C compiler which didn't perform any such checks. I don't think any did at the time.

The team were supposed to do code reviews and should have picked this up then. My job was developer support which included solving "our code's crashing and we don't know why" type problems. In this case I was given a memory dump and asked to figure out what was going wrong.

Care to elaborate? :D

I purposely stayed away from C's formatted output for the brief time I was learning C++.

Each %s in the string means there should be another parameter that is a pointer to a string. The line should look like:

sprintf(str, "%s %s %s", a, b, c);
Because the function expects another argument to go with the third %s, it will take whatever happens to be in the next argument slot after b and treat it as a pointer to a string. This could be anything! Formally, it is undefined behavior.
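Even with every `%s` matched, `sprintf` still cannot know how big `str` is. A hedged alternative, assuming C99 or later, is `snprintf`, which takes the buffer size and returns the length the full output would have needed, so truncation can be detected. `join3` is a hypothetical wrapper for illustration:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Writes "a b c" into str (of size bytes). Returns true only if the whole
 * result fit; on truncation (and size > 0) str holds a terminated prefix. */
bool join3(char *str, size_t size, const char *a, const char *b, const char *c)
{
    /* Three %s conversions, three string arguments. */
    int n = snprintf(str, size, "%s %s %s", a, b, c);
    return n >= 0 && (size_t)n < size;
}
```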

Care to elaborate? :D

I purposely stayed away from C's formatted output for the brief time I was learning C++.

You can overflow the buffer at any time, you don't know how long it is, and you need to walk over the entire string to do almost any operation (which can easily lead to O(n²) algorithms). Strings in C basically contain everything one should avoid when designing a string library. See http://en.wikipedia.org/wiki/C_string_handling#Criticism and http://www.joelonsoftware.com/articles/fog0000000319.html for more.

Actually it isn't just strings; it's arrays in general that suffer from that. With generic arrays it's even worse: with strings you can at least expect the data to stop at a zero, but with an array the only way to be 100% sure of the length is to pass it separately. Strings just happen to be one specific application of an array (to the point that all array operations work on them).
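For a general array, the usual C convention is to pass the element count alongside the pointer, since there is no sentinel like '\0' marking the end. A small sketch; `sum_ints` is a hypothetical example function:

```c
#include <stddef.h>

/* The array decays to a pointer at the call site, so the callee cannot
 * recover its length; the caller has to pass the count explicitly. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    return total;
}
```

A caller with `int data[] = {3, 4, 5};` would pass `sizeof data / sizeof data[0]` as the count; that `sizeof` trick only works where `data` is still an array, not after it has decayed to a pointer.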


I always wonder why people use printf-like functions with just a %s, or even without a single %. Don't they know there are functions like fputs, strcpy and strcat that don't need to parse a possibly wrong format string?
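As a small illustration: `printf(msg)` treats any `%` inside `msg` as a conversion, which is undefined behavior when no matching argument exists (and a classic hole when `msg` comes from a user), while `fputs` writes the text verbatim. `print_verbatim` is a hypothetical wrapper:

```c
#include <stdio.h>

/* Writes msg exactly as given. Equivalent safe spellings are
 * fputs(msg, out) and fprintf(out, "%s", msg); never fprintf(out, msg). */
int print_verbatim(FILE *out, const char *msg)
{
    /* fputs does not append a newline (unlike puts) and returns a
     * non-negative value on success, EOF on error. */
    return fputs(msg, out);
}
```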


Then the code above would have been equivalent to this (bug included):

strcpy(str, a);
strcat(str, " ");
strcat(str, b);
strcat(str, " ");

I know that isn't optimal (each strcat rescans the destination from the start to find its end, so the beginning of str gets read several times), and you can make it faster, but then the code becomes less clear and can be much harder to read. Not that this code is error-proof anyway; I wonder how many programmers end up misreading strcpy as strcat. So in that sense sprintf looks like a good thing, because it makes the code more concise without giving up much readability (if we're talking about just a single string, though, it's overkill).
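The rescanning can be avoided without losing much clarity by remembering where the previous copy ended, which is exactly the bookkeeping `strcat` redoes on every call. A minimal sketch, assuming (like the strcpy/strcat version) that the caller guarantees `str` is large enough; `copy_end` and `join_spaces` are hypothetical helpers, with `copy_end` behaving like POSIX `stpcpy`:

```c
#include <string.h>

/* Copies src to dst and returns a pointer to the terminating '\0' in dst,
 * so the next copy can start there instead of rescanning from the start. */
char *copy_end(char *dst, const char *src)
{
    size_t n = strlen(src);
    memcpy(dst, src, n + 1);   /* include the '\0' */
    return dst + n;
}

/* Builds "a b c" in str without ever rescanning str from the beginning.
 * The caller must ensure str has room for the whole result. */
void join_spaces(char *str, const char *a, const char *b, const char *c)
{
    char *end = copy_end(str, a);
    end = copy_end(end, " ");
    end = copy_end(end, b);
    end = copy_end(end, " ");
    copy_end(end, c);
}
```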



char* unknown;
strcpy(str, a);
strcat(str, " ");
strcat(str, b);
strcat(str, " ");
strcat(str, unknown);
I fixed your bug for you, sir.

"String handling in C" is a coding horror all on its own - no further comment is necessary.

I'd say "string manipulation in C" is a coding horror, but consuming read-only strings in C is refreshingly lacking in unnecessary abstraction.

 

In my C++ engine, I don't use any string classes. Instead I use const char* for all strings, simply because I don't do any string manipulation at all, so the simplest solution works fine ;)

Edit: to clarify, this also means not using any of the C standard library functions that work on strings.

Edited by Hodgman

Care to elaborate? :D

I purposely stayed away from C's formatted output for the brief time I was learning C++.

C doesn't have strings. It has arrays of characters, and some fancy goggles for the programmer which make those arrays look and act a bit like strings if you're very careful.

Oh I see then it might trash the memory, thanks!

Edited by TheChubu

"String handling in C" is a coding horror all on its own - no further comment is necessary.

I'd say "string manipulation in C" is a coding horror, but consuming read-only strings in C is refreshingly lacking in unnecessary abstraction.

Only as long as you're reading it sequentially. If you ever need to know the length, you have to call strlen, which traverses the entire string (a performance penalty). And if you're using a variable-length encoding such as UTF-8, consider yourself screwed, because all the functions work on chars (bytes) rather than actual characters; for example, strlen returns the number of bytes, not the number of characters.
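To make the UTF-8 point concrete: counting characters instead of bytes means skipping continuation bytes (those of the form `10xxxxxx`). A sketch assuming valid UTF-8 input; it counts code points, which still differ from user-perceived characters once combining marks are involved. `utf8_codepoints` is a hypothetical name:

```c
#include <stddef.h>

/* Counts code points in a valid, null-terminated UTF-8 string by counting
 * every byte that is not a continuation byte (0b10xxxxxx). Like strlen,
 * it still has to walk the whole string. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "h\xC3\xA9llo" ("héllo"), strlen reports 6 bytes while this returns 5.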


Oh I see then it might trash the memory, thanks!


It's not just that. If the value that happens to be on the stack is not a valid address, using it as one will crash the program.

If you're lucky it trashes the memory and gives you a nice clean crash at the point where things started going wrong.

 

More normally, it seems to work OK but at some arbitrary point later and in a completely different part of your code you start getting weird things happen.

 

Yeah, strings are just arrays, but I think it's worth singling them out here: when you're using an array you've normally got an extra level of awareness of what you're doing, whereas the CRT presents strings as if they were some kind of special case, which may lead the unwary into thinking that they're safe.


It is very easy for users to make grave mistakes, and the CRT has no way of detecting them. If you misuse strncpy() and end up without a null terminator, a later call to strlen() invokes undefined behavior and will likely crash. Additionally, for the reasons mentioned above, anything involving knowing the string's length is a nightmare. Unless you pass the length around yourself, recalculating it all the time is unacceptable, because it means looping over all of the characters to find the end. Using strcat() repeatedly means repeatedly finding the length of the destination string, then writing characters to the end of it. There's no way around that unless you keep track of the destination's length after each operation (which might require getting the length of the source string before each operation. D'oh!). Also, if you are using strncat(), don't forget that its count limits the number of characters appended, not the size of the destination, so you must leave room for the null terminator yourself. Rookie mistake.
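The strncpy() trap is that when the source is at least as long as the limit, no terminator is written (and when it is shorter, the rest of the destination is zero-padded). A common defensive pattern, sketched here as a hypothetical `copy_bounded` helper similar in spirit to BSD's `strlcpy`, always terminates and reports truncation:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copies at most size-1 chars of src into dst and always writes a '\0'
 * when size > 0. Returns false if src had to be truncated. */
bool copy_bounded(char *dst, size_t size, const char *src)
{
    if (size == 0)
        return false;
    size_t n = strlen(src);
    bool fits = n < size;
    if (!fits)
        n = size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return fits;
}
```

With plain strncpy(), the equivalent fix is to call `strncpy(dst, src, size - 1)` and then set `dst[size - 1] = '\0'` yourself.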

You also don't have the luxuries of a higher-level string object: things like resizing the string whenever you please, iterating through characters starting from the _end_ of the string, finding the character in the middle of the string, etc. Since you could have gotten the string through any means, if you find that there isn't enough space in it for more characters, you might not have enough information to know whether it was allocated with malloc(), on the stack, as part of a memory-mapped file, or is even garbage data. A std::basic_string can resize itself using its own copy of its allocator. A function that handles C strings must either refuse to resize them or blindly trust that the caller has provided enough space in the memory block. Worrying about this leads directly to using strncpy() and strncat(), and strncpy() has a case where no null terminator is appended, for historical reasons!
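The closest C equivalent of a string that owns its allocation is a struct that owns its heap buffer, so the code deciding to grow it knows the memory came from `malloc`/`realloc`. A minimal sketch; `struct dstr` and its functions are hypothetical, it only reports allocation failure, and it does not guard against `size_t` overflow for enormous strings:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string that owns its heap buffer.
 * Start from struct dstr s = {0}; realloc(NULL, n) behaves like malloc. */
struct dstr {
    char  *data;  /* heap buffer, null-terminated; NULL until first append */
    size_t len;   /* chars stored, excluding '\0' */
    size_t cap;   /* bytes allocated */
};

/* Appends src, doubling the capacity as needed. Returns false on
 * allocation failure, in which case s is left unchanged. */
bool dstr_append(struct dstr *s, const char *src)
{
    size_t n = strlen(src);
    if (s->len + n + 1 > s->cap) {
        size_t new_cap = s->cap ? s->cap : 16;
        while (new_cap < s->len + n + 1)
            new_cap *= 2;
        char *p = realloc(s->data, new_cap);
        if (p == NULL)
            return false;
        s->data = p;
        s->cap = new_cap;
    }
    memcpy(s->data + s->len, src, n + 1);  /* copy including '\0' */
    s->len += n;
    return true;
}

void dstr_free(struct dstr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```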

In closing, yes, it has all of the shortcomings of an array, and it lures the inexperienced (and sometimes the experienced, too) into a false sense of security by hiding how fragile a C string is, leading them to mishandle it, and then the fun begins.


Back on topic, I once did this:

 

char *buf = (char *) malloc (strlen (str + 1));
strcpy (buf, str);

 

Ouch!!!!

Edited by mhagain
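For the record, the bug is that `strlen(str + 1)` measures the string starting at its second character, one less than `strlen(str)`, so the buffer is two bytes short and `strcpy` writes past its end. The intended size is `strlen(str) + 1`. A corrected sketch as a hypothetical `dup_string` helper (POSIX `strdup`, also standardized in C23, does the same job; the cast on `malloc` is not needed in C):

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of str, or NULL if allocation fails.
 * The + 1 goes outside strlen(): it is room for the '\0'. */
char *dup_string(const char *str)
{
    size_t size = strlen(str) + 1;
    char *buf = malloc(size);
    if (buf != NULL)
        memcpy(buf, str, size);   /* copies the terminator too */
    return buf;
}
```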

I have to agree with Hodgman. In the case of C strings vs std::string, it's not black and white: there are cases where C strings are perfectly viable to use.

 

Generally, I prefer to use const char* in my public methods and std::string internally. The first reason is that different compilers (sometimes even different versions of the same compiler) produce incompatible binaries for the same template library, which can cause very bad bugs when you call a method that takes a std::string across DLL boundaries. This basically forces you to distribute a runtime for every single compiler. (Ogre is the perfect example of this.) The second reason I restrict myself to C strings in public interfaces is that if a function takes a std::string parameter, passing a constant literal such as "Hello, world!" constructs a temporary std::string, so you're depending on the compiler and library implementation to optimize it. On the other hand, passing a pointer to a char array is one very basic operation.

 

Of course, when it comes to string manipulation (or just storing copies), you'd be crazy not to use std::string. I agree 100% that strncat and friends are annoying, error-prone and very vulnerable to attacks. Basically, the only C functions I use are strlen() and wcslen() for getting the length of character arrays; for other operations there's no reason not to use std::string.

 

As for multi-byte strings, do people actually use that in game projects? For performance reasons, I'd rather use 8- or 16-bit characters anyway.


But without C style string manipulation we don't get all of the exciting possibilities of buffer overrun exploits! Ahh... The good old days....


As for multi-byte strings, do people actually use that in game projects? For performance reasons, I'd rather use 8- or 16-bit characters anyway.

Since when were string operations a major performance bottleneck in games?

If you plan on localising for non-Latin scripts, you probably want to use UTF-8 or UTF-16, both of which are variable-length encodings.