
Trouble with isspace with some characters

General and Gameplay Programming

Started by Jary January 30, 2007 08:31 AM

5 comments, last by Jary 17 years, 2 months ago

Jary

122

Author

January 30, 2007 08:31 AM

Hello everyone, I wish to use this algorithm:


bool space(char c)
{
	return isspace(c);
}

bool not_space(char c)
{
	return !isspace(c);
}

//GET A WORD
string getword(string& buffer)
{
	typedef string::iterator iter;
	iter i = buffer.begin();
	string word;

	if (i != buffer.end()) {
		i = find_if(i, buffer.end(), not_space);

		iter j = find_if(i, buffer.end(), space);

		if (i != buffer.end()) {
			word = string(i, j);
			buffer = string(j, buffer.end());
		}
	}

	return word;
}

The trouble is that characters like "é" or "è" make it crash. Is there any way to specify the right "isspace" overload (I think there are 13) that accepts any type of character? Thanks in advance.

Bregma

9,461

January 30, 2007 10:17 AM

Quote:Original post by Jary
The trouble is that characters like "é" or "è" make it crash. Is there any way to specify the right "isspace" overload (I think there are 13) that accepts any type of character? Thanks in advance.

You could use the C++ standard library. It just "does the right thing."

#include <algorithm>
#include <locale>
#include <string>
using namespace std;

bool space(const char c)
{
	return isspace(c, locale());
}

int main()
{
	const string s = "Now is the time";
	string::const_iterator iter = find_if(s.begin(), s.end(), space);
	// and so on and so forth....
}
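For reference, here is a minimal sketch of how the locale-aware predicate might be dropped into the original getword. The names is_space and is_not_space are made up for this example, and unlike the original, this version also strips the buffer when it contains only whitespace.

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Locale-aware predicates. std::isspace(c, loc) from <locale> takes a plain
// char, so accented characters with negative char values are handled.
bool is_space(char c)
{
    return std::isspace(c, std::locale());
}

bool is_not_space(char c)
{
    return !std::isspace(c, std::locale());
}

// Removes the first whitespace-delimited word from buffer and returns it.
// Returns an empty string when buffer holds no word; in that case any
// remaining whitespace is erased from buffer as well (an assumption of
// this sketch, not the behavior of the original function).
std::string getword(std::string& buffer)
{
    std::string::iterator first =
        std::find_if(buffer.begin(), buffer.end(), is_not_space);
    std::string::iterator last =
        std::find_if(first, buffer.end(), is_space);

    std::string word(first, last);
    buffer.erase(buffer.begin(), last);
    return word;
}
```

Calling getword repeatedly on a buffer peels off one word at a time until it returns an empty string.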

Stephen M. Webb
Professional Free Software Developer

Jary

122

Author

January 30, 2007 11:34 AM

Thank you very much !

I'm sorry but I have a last question:

I'm wondering if the above code with locale() is faster than this:

//GET A WORD
string getword(string& buffer)
{
	string::size_type i = 0;
	while (i < buffer.size() && buffer[i] != ' ') {
		++i;
	}

	string word;
	if (i != 0) {
		word = buffer.substr(0, i - 0);
		++i;
		if (i <= buffer.size())
			buffer = buffer.substr(i);
		else
			buffer.clear();
	}

	return word;
}

Bregma

9,461

January 30, 2007 12:04 PM

Quote:Original post by Jary
I'm wondering if the above code with locale() is faster than this:

No. The code using a hardcoded space character from the source language implementation set will definitely be faster than using the runtime locale's ctype facet. Then again, the faster code is not internationalizable and will fail if the space consists of invisible whitespace like tabs, carriage returns, linefeeds, or certain Klingon characters with names unpronounceable by human vocal apparatus.

Because the locale is a lightweight object, the std::algorithms allow inline expansions, and the std::ctype facet uses a lookup table, the speed difference is not likely to be noticeable outside of a very tight loop. In the context of parsing text from human interaction (GUI stuff) or file I/O (file parsing), the difference in speed will not be significant.
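If the loop does turn out to be tight, one possible approach (a sketch, not something measured in this thread) is to look the std::ctype facet up once with std::use_facet and query it per character. The function name count_spaces is hypothetical and only serves the illustration.

```cpp
#include <cstddef>
#include <locale>
#include <string>

// Counts the whitespace characters in text according to loc.
// The ctype facet is fetched once, outside the loop, so each character
// costs a single classification call on the facet.
std::size_t count_spaces(const std::string& text,
                         const std::locale& loc = std::locale())
{
    const std::ctype<char>& facet = std::use_facet<std::ctype<char> >(loc);

    std::size_t count = 0;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        if (facet.is(std::ctype_base::space, text[i]))
            ++count;
    }
    return count;
}
```

The facet reference stays valid as long as the locale it came from is alive, which is why the locale is passed in rather than created inside the loop.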

--smw

Stephen M. Webb
Professional Free Software Developer

Jary

122

Author

January 30, 2007 12:18 PM

Thank you both very much !

deffer

755

January 30, 2007 02:22 PM

Just to expand on what made the original code go nuts.

Special characters are not included in the 0-127 range of char values, but occupy the 128-255 range. When plain char is signed (as it is on many compilers), those values are seen as -128 to -1. isspace takes an int as its parameter, so your value is converted to a negative integer when passed to the isspace(int) function, which is only defined for values representable as unsigned char, or EOF.

The C version of the isspace() function is (in most implementations) internally using a lookup table of about 256 entries, one per possible character value, to see the character's properties (is it a space, is it lower/uppercase, etc.). Using a negative index into that table reads outside it, which can give you a read access violation.

The scenario happened to me a while ago, too. [smile]
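If you want to stay with the C function, the usual workaround is to cast the char to unsigned char before the call, so the value lands in 0-255 instead of going negative. A minimal sketch, reusing the predicate names from the original post:

```cpp
#include <cctype>
#include <string>

// Safe wrapper around the C isspace: the cast maps a possibly negative
// char into the 0..255 range before it is widened to int.
bool space(char c)
{
    return std::isspace(static_cast<unsigned char>(c)) != 0;
}

bool not_space(char c)
{
    return !space(c);
}
```

These can be passed to find_if exactly like the predicates in the first post.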

Jary

122

Author

January 30, 2007 02:24 PM

Ah thanks a lot !

I get it now :-)
