I'm trying to solve a text-parsing problem using C++11's <regex> library. This is mostly a regex question rather than a C++ one, with only a few minor C++ twists. I could easily solve the problem with other C++ techniques, but I figure regular expressions are an important skill that I'm currently lacking.
The text I'm trying to parse has this format (the number of parameters varies, but for now I'm assuming there is at least one):
functionName("blah", "Foo", Stuff)
Quotation marks on the parameters are optional, so I need to accept both cases.
I want to get 'functionName' as one string and each parameter as its own string.
Here's what I'm trying. Below, I break it down into the parts I expect to line up with the input above:
[\w-]+\((("[\w- ]*")|([\w-]+))(, *(("[\w- ]*")|([\w-]+)))*\)
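A minimal sketch of testing the whole expression from C++, assuming the input is the single line functionName("blah", "Foo", Stuff) and the default ECMAScript grammar. The pattern is a raw string literal so the backslashes don't need doubling. The one change from the expression above is writing the quoted-argument class as [\w -], with the hyphen last: a hyphen right after \w sits where a range would go, and as far as I know some implementations (recent libstdc++, for one) throw std::regex_error for it instead of treating it as a literal. The name looksLikeCall is just for illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// The expression from above as a raw string literal (no doubled backslashes).
// Assumption: the hyphen in the quoted-argument class is moved to the end,
// [\w -], so every implementation reads it as a literal hyphen.
const std::regex kCall(
    R"re([\w-]+\((("[\w -]*")|([\w-]+))(, *(("[\w -]*")|([\w-]+)))*\))re");

// True only if the ENTIRE string has the call shape;
// regex_match (unlike regex_search) rejects partial matches.
bool looksLikeCall(const std::string& text) {
    return std::regex_match(text, kCall);
}
```

For example, looksLikeCall("f(x)") returns true, while the same call with a line break before the bracket returns false, because nothing in the expression allows whitespace or newlines between the name and the '('.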
I've been testing it on this site, and it doesn't match the way I think it should.
Here's what I think each part of the expression does:
[\w-]+
Function name: one or more word characters (letters, digits, underscore) or hyphens.
Should match: the name at the start of the input, functionName
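A quick way to see what [\w-]+ accepts on its own is to run it against whole candidate names. This is only a sketch; isValidName is a hypothetical helper. It also shows that \w includes the underscore, not just letters and digits.

```cpp
#include <cassert>
#include <regex>
#include <string>

// [\w-]+ : one or more word characters (letters, digits, underscore)
// or hyphens. regex_match succeeds only if the WHOLE string is a name.
bool isValidName(const std::string& name) {
    static const std::regex kName(R"re([\w-]+)re");
    return std::regex_match(name, kName);
}
```

With this, "my-func_2" counts as a name, while "my.func" and the empty string do not.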
\(
Opening function bracket (escaped, because a bare ( would start a group).
Should match: the ( immediately after functionName
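In the regex itself, \( is a literal bracket. C++ adds one more layer: in an ordinary string literal the backslash must itself be escaped, while a C++11 raw string literal passes it through unchanged. The delimiter "re" in R"re(...)re" is my own choice; a plain R"(...)" literal ends at the first )" sequence, which a pattern containing quotes and brackets can easily contain. The names kOrdinary, kRaw and startsWithCall are illustrative.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Two equivalent ways to write "a name immediately followed by (":
const std::regex kOrdinary("[\\w-]+\\(");  // ordinary literal: each \ doubled
const std::regex kRaw(R"re([\w-]+\()re");  // raw literal: written as-is

// True if text STARTS with a name immediately followed by "(".
// match_continuous forces the match to begin at the first character.
bool startsWithCall(const std::string& text, const std::regex& re) {
    return std::regex_search(text, re, std::regex_constants::match_continuous);
}
```

Both regexes behave identically; neither accepts "functionName (" with a space before the bracket.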
(("[\w- ]*")|([\w-]+))
First function parameter.
Should match: the first argument, "blah" (including its quotation marks)
Look at it like this:
( ("[\w- ]*") | ([\w-]+) )
First half: <quotation-mark> ZeroOrMore:(alphanumeric OR underscore OR hyphen OR space) <quotation-mark>
Second half: OneOrMore:(alphanumeric OR underscore OR hyphen)
The second half should match the same as the first half, but without quotes or spaces (arguments may only contain spaces inside quotes).
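The two halves of the argument alternation can be checked in isolation against single tokens. A sketch, assuming the hyphen is moved to the end of the quoted class so it is unambiguously a literal; isArgument is a hypothetical name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// One argument: a quoted string of word characters, spaces and hyphens
// (possibly empty, because of the *), OR an unquoted run of at least one
// word character or hyphen with no spaces.
bool isArgument(const std::string& token) {
    static const std::regex kArg(R"re(("[\w -]*")|([\w-]+))re");
    return std::regex_match(token, kArg);
}
```

Note that an empty quoted string, "", is accepted, since * allows zero characters between the quotes; an unquoted "two words" or an unclosed "blah is rejected.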
(, *(("[\w- ]*")|([\w-]+)))*
This is exactly the same as the previous part, except that it is prefixed with a comma and optional spaces. The entire sub-expression occurs zero or more times.
Should match: each remaining argument along with its leading comma and space, i.e. , "Foo" and then , Stuff
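A sketch of what a single match actually stores for the repeated part, assuming the full expression with the hyphen moved to the end of the quoted class. As far as I know, std::regex (like most regex engines) keeps only the LAST repetition of a quantified capture group, so the "Foo" argument is not recoverable from this one match. capturedGroup is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Capture groups are numbered by their opening parenthesis:
//   1 = first argument (2 = quoted form, 3 = unquoted form)
//   4 = one ", argument" repetition
//   5 = the argument inside that repetition (6 = quoted, 7 = unquoted)
const std::regex kCall(
    R"re([\w-]+\((("[\w -]*")|([\w-]+))(, *(("[\w -]*")|([\w-]+)))*\))re");

// Returns the text captured by group n, or "" if the call doesn't match.
std::string capturedGroup(const std::string& text, std::size_t n) {
    std::smatch m;
    if (!std::regex_match(text, m, kCall)) return "";
    return m[n].str();
}
```

For functionName("blah", "Foo", Stuff), group 1 holds "blah" (with its quotes), group 4 holds only ", Stuff", and group 5 holds Stuff.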
\)
Closing function bracket.
Should match: the final ) after Stuff
=============================================
I have several questions:
1) What am I overlooking in the expression above? Why doesn't it match the example input?
2) Assuming it did match, how do I pull out the different parts I want? I know how to do this on the C++ side, but I don't know how to specify in the expression itself which parts I want (the function name and each argument) versus which parts to discard.
3) How do I repeat a sub-expression without copying and pasting it into the expression multiple times? (For example, the pattern for the second and later arguments is a copy of the one for the first argument. How do I avoid that?)
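One possible sketch touching on questions 2 and 3, not the only way to do it. It assumes plain std::regex with the ECMAScript grammar, which as far as I know cannot "call" an earlier group's pattern (a backreference like \1 repeats the matched text, not the pattern). So the argument pattern is defined once as a C++ string and spliced in twice. Plain parentheses mark the parts to keep (capture groups), and (?:...) groups without capturing. Because a repeated group only remembers its last repetition, the validated argument list is then walked with std::sregex_iterator. kArgPattern, buildCallPattern and splitCall are hypothetical names.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// One argument, written once: quoted (word chars, spaces, hyphens; may be
// empty) or unquoted (word chars/hyphens). (?:...) does not capture.
const std::string kArgPattern = R"re((?:"[\w -]*"|[\w-]+))re";

// Splices kArgPattern in twice instead of copy-pasting it.
// Group 1 = function name, group 2 = the whole argument list.
std::string buildCallPattern() {
    const std::string name = R"re(([\w-]+))re";
    const std::string argList =
        "(" + kArgPattern + "(?:, *" + kArgPattern + ")*)";
    return name + R"re(\()re" + argList + R"re(\))re";
}

// Splits functionName("blah", "Foo", Stuff) into
// {"functionName", "\"blah\"", "\"Foo\"", "Stuff"}; returns {} on no match.
std::vector<std::string> splitCall(const std::string& text) {
    static const std::regex kCall(buildCallPattern());
    static const std::regex kArg("(" + kArgPattern + ")");  // group 1 = one argument

    std::smatch call;
    if (!std::regex_match(text, call, kCall)) return {};

    std::vector<std::string> parts{call[1].str()};  // the function name
    const std::string args = call[2].str();          // e.g. "blah", "Foo", Stuff
    // Visit every argument inside the already-validated argument list.
    for (std::sregex_iterator it(args.begin(), args.end(), kArg), end;
         it != end; ++it) {
        parts.push_back((*it)[1].str());
    }
    return parts;
}
```

The arguments keep their quotation marks in this sketch; stripping them afterwards on the C++ side is one option.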