# Regex - Scan a string for special chars


Introduction

In my program I have a field where the user can enter data, which is then stored. This field should accept only normal letters, digits and (German) umlauts. I could already create a regex for the first two, but I don't know how to add the third (ä, ü, ö, Ä, Ü, Ö).

I'm also not sure how to check whether the regex matches the string. My plan is to scan the string (with regex_match) and return false if it contains any character that is not in one of the three categories mentioned above; otherwise it should return true.

Code

std::regex reg("[^A-Za-z0-9\\s]"); // TODO: add umlauts
std::string testString1 = "Hello this is the first test string";
std::string testString2 = "Hällo, this is the 2nd test String+#";

// Check testString1 for reg
//-> Should return true

// Check testString2 for reg
//-> Should return false


Detailed Question

How can I check whether a string matches the regex in my example, and how can I extend the regex so that it allows umlauts?

Thanks!

##### Share on other sites

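After poking around, it doesn't look like there's a character class for umlauts, but if you're using a locale that supports them (see http://stackoverflow.com/questions/11254232/do-c11-regular-expressions-work-with-utf-8-strings) then you should be able to just include them manually in the regex. Bear in mind that std::regex on a std::string compares individual char values, so this only works as intended when each umlaut is a single char in your encoding (e.g. ISO 8859-1); in UTF-8 each umlaut takes two bytes.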

std::regex reg("[^A-Za-z0-9äüöÄÜÖ\\s]");

##### Share on other sites

Thanks! :)

And what about the second part of the question?  :P

##### Share on other sites

Normally you try to write a regular expression that fully captures the kind of strings you want to allow. Your approach of writing a pattern for what you want to disallow and searching for instances of it can also work, but it is less common.
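To make the difference concrete, here is a minimal sketch of both approaches side by side. The function names valid_by_search and valid_by_match are made up for illustration, and the character set is the ASCII-only one from the question (letters, digits, whitespace), so umlauts are not handled here.

```cpp
#include <regex>
#include <string>

// Approach from the question: look for any single disallowed character.
// The input is valid only if no such character is found anywhere.
bool valid_by_search(const std::string& s) {
    static const std::regex disallowed("[^A-Za-z0-9\\s]");
    return !std::regex_search(s, disallowed);
}

// More common approach: describe what a valid string looks like.
// std::regex_match succeeds only if the pattern covers the whole input.
bool valid_by_match(const std::string& s) {
    static const std::regex allowed("[A-Za-z0-9\\s]*");
    return std::regex_match(s, allowed);
}
```

Both functions should agree on every input; the second one is usually easier to extend, for example by changing * to + if an empty string should be rejected.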

To simplify, I'm going to imagine a case where you only want lowercase letters (e.g. [a-z]). You can say that you expect zero or more, or one or more, by using the metacharacters * or +. You can use the metacharacters ^ and $ to say that the pattern must match the entire input, not just a part in the middle. So that might yield ^[a-z]*$, which matches a string consisting of zero or more lowercase letters and nothing else.

You might have something like this:

#include <iostream>
#include <regex>
#include <string>

int main() {
// Change the 0 to 1 to require at least one letter (+) instead of zero or more (*).
#if 0
    std::regex reg("^[a-z]+$");
#else
    std::regex reg("^[a-z]*$");
#endif

    std::string strings[] = { "", "    ", "hello", "1number2", "!goodbye!" };
    for (const std::string& s : strings) {
        // regex_match succeeds only if the whole string matches the pattern.
        std::cout << '"' << s << "\": " << (std::regex_match(s, reg) ? "matches" : "doesn't match") << '\n';
    }
}
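
A side note on the anchors: std::regex_match already requires the pattern to cover the whole input, so ^ and $ mainly make a difference when you use std::regex_search, which is happy with a partial match. A small sketch of that difference (the helper names found_anywhere and covers_whole_string are hypothetical, chosen only for this example):

```cpp
#include <regex>
#include <string>

// True if the pattern is found anywhere inside s (a partial match is enough).
bool found_anywhere(const std::string& s, const std::string& pattern) {
    return std::regex_search(s, std::regex(pattern));
}

// True only if the pattern accounts for every character of s.
bool covers_whole_string(const std::string& s, const std::string& pattern) {
    return std::regex_match(s, std::regex(pattern));
}
```

With the input "1number2", found_anywhere accepts the unanchored pattern [a-z]+ because "number" sits in the middle, while the anchored ^[a-z]+$ and covers_whole_string both reject it.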



There are helpful sites that can speed up developing regular expressions. The one I linked is tailored for Ruby (regex syntax can vary), but there are lots of alternatives out there.

Thanks! :)

##### Share on other sites

but I don't know how I can add the third thing (ä, ü, ö, Ä, Ü, Ö).

The general solution for handling non-ASCII characters is to work with code points instead of bytes, which in C++ means using some form of wide character string: std::wstring or, since C++11, std::u16string and std::u32string. See also http://en.cppreference.com/w/cpp/string/basic_string
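
For the regex part specifically, the standard library only provides std::regex_traits for char and wchar_t, so std::wstring together with std::wregex is the combination that works out of the box. Below is a minimal sketch under a few assumptions: the input is already available as a std::wstring (how you convert it depends on your platform and encoding), whitespace is allowed as in the patterns earlier in this thread, and the name is_allowed_input is made up for the example. The umlauts are written as universal character names so the result does not depend on the encoding of the source file.

```cpp
#include <regex>
#include <string>

// Accepts letters, digits, whitespace and the six German umlauts
// (ä ö ü Ä Ö Ü), given as \u escapes. An empty string is accepted too.
bool is_allowed_input(const std::wstring& s) {
    static const std::wregex allowed(
        L"[A-Za-z0-9\\s\u00E4\u00F6\u00FC\u00C4\u00D6\u00DC]*");
    // regex_match requires the pattern to cover the whole input.
    return std::regex_match(s, allowed);
}
```

Calling is_allowed_input(L"H\u00E4llo") should be accepted, while any input containing a character outside the set, such as + or #, should be rejected. Other letters such as ß would need to be added to the bracket expression explicitly.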
