Sign in to follow this  
gsGomer

strtok unsafe?

Recommended Posts

Taking the advice I received on using strings instead of char arrays, I decided to get a nice STL book. I'm going through a chapter regarding strings, and must say the methods I'm learning are easy to use and powerful. I wrote a simple program using the strtok method that "strips" punctuation from a string. It looks like the following:
#include <iostream>
#include <string>

using namespace std;

int main()
{
	char x[] = "this, (is a) string!";
	char *symbol;
	symbol = strtok(x, ",;.!?");
	cout << symbol << endl;

	while ((symbol = strtok(NULL, " ,;.()!?")) != NULL)
		cout << symbol << endl;

		cout << "x at the end is :" << x << ": with length " << strlen(x) << "\n";


	system("Pause");
	return 0;
}

I know that system("Pause") is not portable, but I'm writing these on windows so it works for my situation At any rate the program compiles and runs fine, but I get the following warning from visual c++ 2005 express edition: Warning 1 warning C4996: 'strtok': This function or variable may be unsafe. Consider using strtok_s instead. To disable deprecation, use _CRT_SECURE_NO_WARNINGS. See online help for details. h:\dev\learning\applieddatastructures\2-8strtok\main.cpp 10 So I looked up strtok_s and it seems easy, but I had some questions on its usage. 1.) Is this part of the STL, if so is it some kind of new addition to the STL or has it been around for a while? 2.) I had thought that it was safer to use strings instead of char arrays to avoid buffer overflow, is this not really the case or is this security warning coming from something else?

Share this post


Link to post
Share on other sites
Lots of stuff about the CRT in there... did follow a link that states the following about the STL

"Beginning with Visual C++ 2005, several enhancements have been made to the libraries that ship with Visual C++, including the Standard C++ Library, to make them more secure.

Several methods in the Standard C++ Library have been identified as potentially unsafe because they could lead to a buffer overrun or other code defect. The use of these methods is discouraged, and new, safer methods have been created to replace them. These new methods all end in _s."


Is this a MS only issue, or are these methods found in the actual STL??

Share this post


Link to post
Share on other sites
These are Microsoft extensions.

Btw, here's your program. Notice that I spend most of the code writing a nice interface; the actual code to perform the strip is one line.


#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

std::string strip_punctuation(const std::string & s)
{
std::string ret = s;
std::remove_if(ret.begin(), ret.end(), std::ispunct);
return ret;
}

int main()
{
using namespace std;

string line;
while(true)
{
cout << "Please enter a sentence; enter an empty string to quit: ";
getline(cin, line);
if(line == "")
break;

cout << "The sentence, with punctuation stripped, is: \'"
<< strip_punctuation(line) << "\'" << endl;
}

return 0;
}

Share this post


Link to post
Share on other sites
Yes I do like your approach to this method quite a bit more, I'm assuming the remove_if function is from the algorithm header you've included?

Also, if MS says that strtok could be "unsafe" is it something I should attempt to avoid on all systems? What I mean is, if I was writing this same program on my Ubuntu machine, would I still be "unsafe" to use the strtok method?

Thanks for the info!

Share this post


Link to post
Share on other sites
0) "The STL" is dead. The thing you are talking about is the Standard C++ Library. This is an important concept. We're not talking about a third-party extension here; we're talking about the standard library of the language you are using. To ignore it in C++ is analogous to ignoring things like, say, strcat() in C.

1) Er, so if you're "taking advice on using strings", then why doesn't your "simple program" use them?

2) strtok() is an evil function. You don't want to use it. You don't want to use strtok_s(), either.

Here's the deal:

Everything with _s() after it is evil, because it's still fundamentally a C function. None of these functions have anything to do with the standard C++ library, except for the fact that you can (but normally shouldn't) use anything from the C standard library in C++. In C++, we use the string's built-in functionality, and/or algorithms from <algorithm>.

The corresponding functions without _s() at the end are, in a sense, more evil, because they haven't been designed to take buffer length into account. The _s functions have taken steps to make things a little safer against buffer overruns: you can, for example, set it up so that if you pass an array to a _s function, it will use a "secure template overload", check (at compile-time) the array size, and not overflow the array. (This only works in C++, where there are templates, and where arrays can be passed by reference. The C-specific _s functions just add an extra parameter where you specify the array length, and it becomes your responsibility to keep the values in sync.)

On the other hand, the functions with _s() at the end are, in another sense, the more evil ones, because they try to lure you away from doing things the right way. "Look at us!", they say, "we're protected against buffer overruns!" But doing things this way will still put arbitrary limits on your string lengths, and won't really help if you try to allocate memory for string dynamically. They're also evil in that they're non-standard, yes, and will tie your code to Windows. (The fundamental issue with the "plain" functions doesn't depend on the OS, though; it has to do with how C and C++ treat memory, and will be the same on Ubuntu, etc.)

strtok() is especially evil, though, because of how it works. The interface is strange, and it depends a lot on how C interprets "null-terminated" strings. Also, it drops those \0 characters directly into your source string, which means you can't parse a constant string that way, and can't necessarily get your original string back. On some Unix-like systems, the documentation for strtok() actually directly tells you never to use it!

3) Never mind portability; putting 'system("pause")' at the end of your program is a bad idea simply because it is an artificial pause. Your program should not just pause at the end, and expect an extra input from the user. That makes it harder to use your program from a script (consider, for a moment, the possibility that you might one day program something that isn't a game) and is just aesthetically ugly. It also treats your "problem" ("the command window closes after my program is done") as a problem, when it isn't actually one.

What you want to do is run the program properly. You can create a batch script that will run your program, and then the pause program; or you can play around with your IDE settings; or you can open a console prompt (Start->Run; 'cmd') and run it manually.

Share this post


Link to post
Share on other sites
Zahlman,
Lots of good info here, I appreciate you sharing this knowledge with me. It does spawn a couple of questions.

First, my code doesn't use strings because...well because its a duplicate of a program in a text book. Its actually the first example in the chapter for strings. Its good to know that the strtok is a bad function, I will remember this for future programs.

You say that the STL is dead, this really confuses me. I've been told by a lot of people, be it in books, articles, forums or otherwise that I should learn the STL...I even bought a book called the STL pocket reference. If this is really an outdated topic then perhaps I'm waisting my time?

The only other confusion I have is when you say that everything ending with _s is evil because its fundamentally C. Are you saying that I should never mix c and c++? If so what are the downsides to this, I know very little C coding so it wouldn't be a bad thing to not mix these together...would just be good to know if it's good practice.

I'm not writing a large project of any kind with any of this code, I'm simply trying to learn the functionality (good and bad) of the string class. I do understand the dangers of using system pause at the end of a function and if I were writing anything practical it would certainly be left out.

[edit]However your advice about batch files is not something I've ever tried so I will certainly give that a go![/edit]

Again I appreciate your answers and advice on these topics! :)

Share this post


Link to post
Share on other sites
Quote:
Original post by gsGomer
You say that the STL is dead, this really confuses me. I've been told by a lot of people, be it in books, articles, forums or otherwise that I should learn the STL...I even bought a book called the STL pocket reference. If this is really an outdated topic then perhaps I'm waisting my time?


This is really more a matter of semantics more than anything else. The STL did not "die", so much as it became part of the standard library (most of it did, at least). The things referred to as the "STL" still exist in the same form as they used to, and are still referred to as the "STL" by some, but technically they are no longer their own, separate (and non-standard) library.

Quote:

Are you saying that I should never mix c and c++?


Ideally, I'd say no. There are times when it is useful to use C in C++ code, but in general, when writing C++ use C++.

Share this post


Link to post
Share on other sites
Excellent clarification!

I swear for every thread I start on here I walk away learning at least five new things :-)

Thanks for the help!

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this