Wow, thank you. Indeed, I didn't have UTF-8...
Now I have it and everything works fine!
THANK YOU! *-*
I have one more question on this topic... xD
I have an input field where the user can type something...
I filter the input with this regex:
std::regex reg("^[A-Za-z0-9_öäüÖÄÜß]*$");
if (!std::regex_match(levelName, reg)) {
m_error.setError(16); ///< Allow only letters & digits
return true;
}
The problem now is that the compiler apparently can't work with umlauts...
If I input this string:
äüöß
It checks for the string:
Ã¤Ã¼Ã¶ÃŸ
Is there a way to check for umlauts, or do I have to put these strange chars in the regex instead of the umlauts?
EDIT:
And if I have to do it with the strange chars:
How can I tell the regex that it should always take the two chars together?
Because ä is Ã¤ and not Ã or ¤ on its own.
And I don't think it's good to check only for those single chars; I think I should always check for the two chars at once (so I really only accept umlauts, letters, digits & underscores).
EDIT 2:
I got this solution:
std::regex reg("^[A-Za-z0-9_(\b(? : Ã¶|Ã¤|Ã¼|Ã„|Ãœ|ÃŸ) - ? \b) + ]*$");
I know, this looks crazy, but I think it works...
Is this right? And is there a better solution for this? :s
It looks like it's auto-converting to UTF-8, but doing straight character comparisons. There's a lot going on under the hood when working with the STL, regex, etc., including locale handling. Here's a good example of setting the locale to support UTF-8: http://stackoverflow.com/questions/11254232/do-c11-regular-expressions-work-with-utf-8-strings
char strings are typically plain byte strings, whereas UTF-8 is an encoding. You can store a UTF-8 string in a std::string, but at that point the raw bytes aren't really human-readable anymore. A lot of editors know that the string is UTF-8 and decode it when displaying the value, but if you look at the raw memory you'll see that it's actually a buffer of UTF-8 encoded bytes. That's what's happening with äüöß turning into Ã¤Ã¼Ã¶ÃŸ: Ã¤Ã¼Ã¶ÃŸ doesn't make any sense when reading, but the values in it are the UTF-8 encoded bytes of äüöß, each shown as a separate single-byte character.
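To make the byte view described above concrete, here is a minimal sketch that dumps the raw bytes of a std::string as hex. The helper name toHexBytes is made up for illustration, and the sketch assumes the string already holds UTF-8 data.

```cpp
#include <string>

// Hypothetical helper: returns the bytes of `s` as space-separated
// uppercase hex pairs, e.g. "C3 A4" for a UTF-8 encoded 'ä'.
std::string toHexBytes(const std::string& s) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char byte : s) {      // unsigned so bytes >= 0x80 index safely
        if (!out.empty()) out += ' ';
        out += digits[byte >> 4];       // high nibble
        out += digits[byte & 0x0F];     // low nibble
    }
    return out;
}
```

Passing the UTF-8 encoding of äüöß (written with escapes as "\xC3\xA4\xC3\xBC\xC3\xB6\xC3\x9F") gives C3 A4 C3 BC C3 B6 C3 9F: four characters, eight bytes. A viewer that treats each byte as a Windows-1252 character shows exactly those bytes as Ã¤Ã¼Ã¶ÃŸ.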
EDIT: This is why your solution works: you're extending the regex to check the encoded UTF-8 values. You won't need to do that if you set the locale à la the SO link, or if you switch to using wchar_t types (wregex/wstring).
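Here is a rough sketch of both ideas, not a definitive implementation. std::regex on char strings matches one char (byte) at a time, so anything listed inside [...] is accepted byte by byte; to accept an umlaut only as a complete two-byte pair, the pair has to sit in an alternation outside the character class. The function names are made up for illustration. The first assumes the input is already UTF-8; the second assumes it has already been converted to a wide string (the conversion itself is not shown). How bytes above 0x7F behave in std::regex may vary between standard library implementations, so treat this as something to test on your toolchain.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: validates a level name held as UTF-8 bytes.
// Each accepted umlaut is the lead byte 0xC3 followed by one specific
// second byte, so a lone 0xC3 or a lone 0xA4 is rejected.
bool isValidLevelNameUtf8(const std::string& name) {
    static const std::regex reg(
        "^(?:[A-Za-z0-9_]"
        "|\xC3[\xA4\xB6\xBC\x84\x96\x9C\x9F])*$");  // ä ö ü Ä Ö Ü ß
    return std::regex_match(name, reg);
}

// Hypothetical helper: the same check on wide strings, where every
// umlaut is a single wchar_t and a plain character class is enough.
bool isValidLevelNameWide(const std::wstring& name) {
    static const std::wregex reg(
        L"^[A-Za-z0-9_\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC\u00DF]*$");
    return std::regex_match(name, reg);
}
```

Writing the bytes as \x escapes (and the wide characters as \u escapes) keeps the patterns independent of how the source file itself is saved.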
I don't get it..
How would I do that?
Like this:
std::locale old;
std::locale::global(std::locale("en_US.UTF-8"));
std::regex reg("^[A-Za-z0-9_öäüÖÄÜß]*$", std::regex_constants::extended);
// Check regex...
std::locale::global(old);
?
Correct. std::locale::global() sets global state that std::regex (and other things) use under the hood. Since your string seems to get converted to UTF-8 at compile time, setting the global locale to a UTF-8 variant means the regex matching should handle it properly. You shouldn't even need to reset to the old value: if you're using UTF-8, you might as well use it across the board. Set it and forget it.
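One practical caveat, sketched below under assumptions: constructing std::locale from a name throws std::runtime_error when that locale isn't available on the machine, so it can be worth wrapping the call. The helper name trySetGlobalLocale is made up for illustration, and which locale names exist (for example "en_US.UTF-8") depends on the system.

```cpp
#include <locale>
#include <stdexcept>

// Hypothetical helper: tries to install the named locale as the global
// locale. Returns false and leaves the global locale unchanged if the
// name is not available on this system.
bool trySetGlobalLocale(const char* name) {
    try {
        std::locale::global(std::locale(name));  // ctor throws on unknown names
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Call it once at startup, e.g. trySetGlobalLocale("en_US.UTF-8"), and build the std::regex objects afterwards: a regex picks up the global locale at the moment it is constructed, so one created earlier keeps the old locale.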