boost::regex - How to match "variable" end of string?

Started by
2 comments, last by BedderDanu 9 years, 9 months ago
Greetz, I'm having somewhat of a design dilemma.

I'm working with boost::regex (Perl-style syntax) to match certain parts of a word, and for each match I then modify the string using regex_replace. The goal is to generate a word of a given length using regex_replace rules. Think of it as a random nickname generator. The program has an array of match rules, and for each match rule there is an array of "replace" rules, one of which is chosen at random upon a match. A loop iterates through all the rules and stops on the first one that matches. Then the string gets modified using regex_replace (some characters are added to the string), and the whole thing repeats until a word of the specified length has been generated.

Example: At the beginning we have an empty string, so the first regex rule that matches is the one for an empty string. This one "replaces" the empty string with the single character that the nickname is going to begin with. The character is chosen at random from a database. The loop now exits and repeats from the start. Since we no longer have an empty string, the former rule will no longer match. The loop proceeds to the next rule until it finds a match. So now the second character is chosen, then the third, etc. There are some rules that prevent more than two consecutive consonants or vowels, and some more complex pattern-based hackery.
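For readers following along, here is a minimal sketch of the loop described above. It is an illustration and not the poster's actual code: the Rule struct and the function names are hypothetical, and it uses std::regex instead of boost::regex so that it compiles without Boost (boost::regex provides regex_search and regex_replace under the same names). It assumes every replacement makes the word longer; otherwise the loop would not terminate.

```cpp
#include <cstddef>
#include <random>
#include <regex>
#include <string>
#include <vector>

// One match rule plus the replacement strings it can choose from.
// (Hypothetical layout; the real rules live in the poster's database.)
struct Rule {
    std::regex match;
    std::vector<std::string> replacements;  // regex_replace format strings
};

// Apply the first rule that matches, using a randomly chosen replacement.
// Returns false when no rule matched, so the caller can stop.
bool apply_first_match(std::string& word, const std::vector<Rule>& rules,
                       std::mt19937& rng)
{
    for (const Rule& rule : rules) {
        if (rule.replacements.empty() || !std::regex_search(word, rule.match))
            continue;
        std::uniform_int_distribution<std::size_t> pick(0, rule.replacements.size() - 1);
        // Replace only the first match so one rule application is one step.
        word = std::regex_replace(word, rule.match, rule.replacements[pick(rng)],
                                  std::regex_constants::format_first_only);
        return true;
    }
    return false;
}

// Repeat until the word reaches the target length or no rule applies.
// Assumes each replacement grows the word.
std::string generate(const std::vector<Rule>& rules, std::size_t target,
                     std::mt19937& rng)
{
    std::string word;
    while (word.size() < target && apply_first_match(word, rules, rng)) {
    }
    return word;
}
```

With three toy rules (empty string becomes "k", a trailing vowel gets an "n" appended, a trailing consonant gets an "a" appended) and a target of 5, the loop builds the word one character per pass.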

The thing I am stuck on is how to match the rules for the end of a word of a specific length. I have a separate database that dictates how a nickname should end. There are some preferred endings like "-ight", "-[aeo]yah", "-ite", "-ard", "-bby", etc., but in cases where none of these match, a single pattern-based character is selected as the ending. I don't want these rules to match prematurely, as the generated word has to be of a specific length, anywhere from 4 to 12 characters (always random). I cannot simply match the "$" anchor because that would match every time. There has to be some condition that detects that we're at the ending of the string (the ending being the final specified length).

My solution was to fill the string with a specific number of spaces which signify the permitted word length. These would then be replaced by regex rules. However, this method is unreliable because the replace rules can erase or insert multiple characters, and the word length becomes distorted.


Ideas?

I think ^.{#}$ will match a string of length #.

For example, ^.{12}$ would find strings of exactly length 12.

^.{#}$, exactly # of characters

^.{#,}$, at least # of characters

^.{0,#}$, up to # of characters (write the 0 explicitly; a bare {,#} is not accepted as a quantifier by every regex engine)

^.{A,B}$, between A and B characters.
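As a quick illustration of the bounded quantifiers listed above, the sketch below builds the "between A and B characters" pattern from two numbers and tests a word against it. The helper names are made up for this example, and std::regex stands in for boost::regex; the quantifier syntax shown is the same in both as far as these forms go.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Build an anchored pattern that accepts between lo and hi characters.
// The lower bound is always written out ("{0,5}", not "{,5}").
std::string length_pattern(std::size_t lo, std::size_t hi)
{
    return "^.{" + std::to_string(lo) + "," + std::to_string(hi) + "}$";
}

// True when word has between lo and hi characters (and no line breaks,
// since '.' does not match a newline).
bool has_length_between(const std::string& word, std::size_t lo, std::size_t hi)
{
    return std::regex_match(word, std::regex(length_pattern(lo, hi)));
}
```

For the 4 to 12 range mentioned in the question, length_pattern(4, 12) produces the string ^.{4,12}$.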

So your list of patterns could start with:

^\s*$ - match empty line

^.{#}$ - match exactly desired number of letters

^.{#,}$ - handle case of too many letters

[rest] - the rest of your patterns to fix the nick.

Thanks for the reply. Yeah, but how do I supply the "#" in the regex? I mean, the length of the nicks is decided at runtime, and it is always a random number. I cannot know in advance what this number will be, so I cannot put it into the regex rules. And even if I knew the number, there are still multiple possible lengths. The regex patterns are compiled only once, and they are packed into a JSON database. If only there were some regex character that I could adjust programmatically at runtime... Having to edit the rule (replace the "#" with a number) each time a nick of a different length is generated is still hackery, and it is no better than my own solution with spaces. Is there a better way to do this? Best regards!

Is it a static amount? Can you pre-program the length regexes for, say, 4 to 30 into the database, and just pull the one you need for that trip through?
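One way to read this suggestion, sketched under assumptions: compile one "exactly n characters" pattern per permitted length up front, then look up the one matching the current target. The function names and the use of std::map are choices made for this example only, and std::regex is used in place of boost::regex so the sketch has no external dependencies.

```cpp
#include <cstddef>
#include <map>
#include <regex>
#include <string>

// Compile one "exactly n characters" pattern per permitted length, once.
std::map<std::size_t, std::regex> build_length_table(std::size_t min_len,
                                                     std::size_t max_len)
{
    std::map<std::size_t, std::regex> table;
    for (std::size_t n = min_len; n <= max_len; ++n)
        table.emplace(n, std::regex("^.{" + std::to_string(n) + "}$"));
    return table;
}

// Look up the pattern for this run's target length and test the word.
// Returns false when the target is outside the precompiled range.
bool is_complete(const std::map<std::size_t, std::regex>& table,
                 std::size_t target, const std::string& word)
{
    auto it = table.find(target);
    return it != table.end() && std::regex_match(word, it->second);
}
```

The patterns are still compiled only once; only the lookup key changes per nickname.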

Do you have to do this with a regex match every time? Can you just check the string length first, before running the regex?

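A sketch in C++ (start, process and finish are placeholders for the regex steps):

#include <cstddef>
#include <random>
#include <string>

void start(std::string& s);    // placeholder: pick the first character
void process(std::string& s);  // placeholder: apply the ordinary rules
void finish(std::string& s);   // placeholder: apply the ending rules

std::string generate(std::mt19937& rng)
{
  const std::size_t len = std::uniform_int_distribution<int>(3, 30)(rng);
  std::string s;
  bool finished = false;
  while (!finished)
  {
    // len is only known at runtime, so it cannot be a case label;
    // an if/else chain replaces the switch.
    if (s.length() >= len)  // >= so that an overshoot still terminates
    {
      finish(s);
      finished = true;
    }
    else if (s.empty())
    {
      start(s);
    }
    else
    {
      process(s);
    }
  }
  return s;
}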
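To connect the length check back to the ending rules from the question, here is one possible sketch: an ending is only offered when it would bring the word to exactly the target length, so it cannot fire early. This is an assumption about how the check could be wired in, not something stated in the thread. The function name is hypothetical, and the endings are treated as literal strings for simplicity (a pattern-based ending such as "-[aeo]yah" would need its expanded length instead).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Return the endings that would bring word to exactly target characters.
// An ending is only considered once the remaining space equals its length.
std::vector<std::string> fitting_endings(const std::string& word,
                                         std::size_t target,
                                         const std::vector<std::string>& endings)
{
    std::vector<std::string> fits;
    if (word.size() >= target)
        return fits;  // no room left for any ending
    const std::size_t remaining = target - word.size();
    for (const std::string& ending : endings)
        if (ending.size() == remaining)
            fits.push_back(ending);
    return fits;
}
```

If the returned list is empty, the generator would keep applying the ordinary rules, or fall back to the single-character ending once only one slot remains.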

