SpriteFont: rendering umlauts

23 comments, last by xycsoscyx 8 years, 1 month ago

Wow, thank you. Indeed, I wasn't using UTF-8...

Now I have it and everything works fine!

THANK YOU! *-*


I have one more question on this topic... xD

I have an input field where the user can type something...

I filter the field input with this regex:


std::regex reg("^[A-Za-z0-9_öäüÖÄÜß]*$");
if (!std::regex_match(levelName, reg)) {
	m_error.setError(16);	///< Allow only letters & digits
	return true;
}

The problem now is that the compiler apparently can't handle umlauts...

If I input this string:

äüöß

it checks against this string instead:

Ã¤Ã¼Ã¶ÃŸ

Is there a way to check for umlauts directly, or do I have to put these strange chars into the regex instead of the umlauts?

EDIT:

And if I have to do it with the strange chars:

How can I tell the regex that it should always treat the two chars as a unit?

Because ä is Ã¤ and not Ã or ¤

And I don't think it is good if I only check for those chars individually. I think I should always check for the two chars together (so that I really only accept umlauts, letters, digits and underscores).

EDIT 2:

I got this solution:


std::regex reg("^[A-Za-z0-9_(\b(? : ö|ä|ü|Ä|Ü|ß) - ? \b) + ]*$");

I know, this looks crazy, but I think it works...

Is this right? And is there a better solution for this? :s
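
For comparison, here is a minimal sketch of a stricter pattern that accepts each umlaut only as a complete two-byte UTF-8 sequence. It assumes the input string is UTF-8 encoded; the helper name `isValidLevelNameUtf8` is made up for illustration, and the bytes are written as escapes so the result does not depend on the source file encoding. Inside `[...]` a group like `(?:ö|ä)` is not an alternation, just a list of single characters, so the alternation has to sit outside the brackets:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: accepts only ASCII letters, digits, '_' and the
// German umlauts/ß, where each umlaut must be a complete UTF-8 byte pair.
bool isValidLevelNameUtf8(const std::string& name) {
    // Either one ASCII word character, or the lead byte 0xC3 followed by
    // exactly one of the second bytes of ö ä ü Ö Ä Ü ß (in that order).
    static const std::regex pattern(
        "^(?:[A-Za-z0-9_]|\xC3[\xB6\xA4\xBC\x96\x84\x9C\x9F])*$");
    return std::regex_match(name, pattern);
}
```

With this pattern a lone 0xC3 or a lone 0xA4 is rejected, which the character-class version cannot do.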

It looks like it's auto-converting to UTF-8, but doing straight character comparisons. There's a lot going on under the hood when working with the STL, regex, etc., including dealing with locale. Here's a good example of setting the locale to support UTF-8: http://stackoverflow.com/questions/11254232/do-c11-regular-expressions-work-with-utf-8-strings

char strings are typically treated as simple byte strings, but UTF-8 is an encoding. You can store a UTF-8 string in a std::string, but at that point the raw contents are no longer human-readable. A lot of editors know that the string is UTF-8 and convert automatically when displaying the value, but if you look at the raw memory you'll see that it's actually a buffer of UTF-8 data under the hood. That's what's happening with äüöß turning into Ã¤Ã¼Ã¶ÃŸ: Ã¤Ã¼Ã¶ÃŸ doesn't make any sense when read as text, but its values are the UTF-8 encoded data for äüöß.
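
To see the raw buffer described above, a small hex dump helps. This is only a sketch; `toHex` is a hypothetical helper name, and the test string is written with byte escapes so it does not depend on how the source file is saved:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: formats each byte of a string as two uppercase
// hex digits, separated by spaces.
std::string toHex(const std::string& bytes) {
    std::string out;
    char buf[4];
    for (unsigned char c : bytes) {
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(c));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling `toHex("\xC3\xA4")` (the UTF-8 encoding of ä) gives "C3 A4". A viewer that treats each byte as a Latin-1 character shows 0xC3 as Ã and 0xA4 as ¤, which is where the strange two-character pairs come from.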

EDIT: This is why your solution works: you're extending the regex to check the encoded UTF-8 values. You won't need to do that if you set the locale à la the SO link, or if you switch to using wchar_t types (wregex/wstring).
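
A rough sketch of the wchar_t route, assuming the level name is already available as a std::wstring (input that arrives as UTF-8 would have to be converted first). The helper name `isValidLevelNameWide` is made up, and the umlauts are written as universal character names so the literal does not depend on the source file encoding:

```cpp
#include <regex>
#include <string>

// Hypothetical helper using wide strings, so each umlaut is a single
// wchar_t and can sit in the character class like any other character.
bool isValidLevelNameWide(const std::wstring& name) {
    // ö ä ü Ö Ä Ü ß as universal character names.
    static const std::wregex pattern(
        L"^[A-Za-z0-9_\u00F6\u00E4\u00FC\u00D6\u00C4\u00DC\u00DF]*$");
    return std::regex_match(name, pattern);
}
```

The pattern is the original one from the question, just with wide types. No pairing tricks are needed because there are no multi-byte sequences left.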

I don't get it..

How do I do that?

Like this:


std::locale old;  // default-constructed: a copy of the current global locale
std::locale::global(std::locale("en_US.UTF-8"));
std::regex reg("^[A-Za-z0-9_öäüÖÄÜß]*$", std::regex_constants::extended);
// Check regex...
std::locale::global(old);  // restore the previous global locale

?

Correct, std::locale::global() sets some global values that std::regex (and other things) will use under the hood. Since your string seems to get converted into UTF-8 at compile time, setting the global locale to a UTF-8 variant means it will do proper handling when evaluating the regular expressions. You shouldn't even need to reset to the old value: if you're using UTF-8, you might as well use it across the board. Set it and forget it.
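
One practical caveat when doing "set it and forget it": the std::locale constructor throws std::runtime_error if the named locale is not installed, and locale names are platform-specific ("en_US.UTF-8" is a typical POSIX name). A minimal sketch of a guarded startup call, with the hypothetical helper name `trySetGlobalLocale`:

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: tries to install the named locale globally.
// Returns false, leaving the global locale unchanged, if the name is
// not available on this system.
bool trySetGlobalLocale(const char* name) {
    try {
        std::locale::global(std::locale(name));
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Something like `trySetGlobalLocale("en_US.UTF-8")` could then run once at program start. How far std::regex honours the locale for multi-byte sequences may differ between standard library implementations, so it is worth testing the regex with real umlaut input afterwards.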

This topic is closed to new replies.
