Calling all Boost experts!
Ok, so I'm writing a compiler and am building the scanner right now in C++. I'm using the Boost regular expressions lib because it makes lexical analysis *so* much easier.
For some reason I seem to be having trouble using escape characters in my searches. For example, I have a file that is delimited by the newline character. So here's the expression that I would use to get the first line:
boost::regex lineDelimExpression("(.*)[\n].*");
This for some reason won't work and I can't figure out why. If I were to delimit the file by, say, the pound symbol (#) and use this regexp:
boost::regex poundDelimExpression("(.*)[#].*");
It has no problems whatsoever and works like a charm.
Anyone have any ideas why it won't work with \n? I've been racking my brain on this all day ;/
Yes! No better feeling than solving a problem on your own =)
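After what seems like my 100th careful examination I realized that '.' matches the newline character too (that's the Boost.Regex default unless match_not_dot_newline is set), so the greedy (.*) was running right past the first newline. The obvious solution was to tell it to match all characters except the newline character, like so: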
boost::regex lineDelimExpression("([^\n]*)[\n].*");
Cheers! =)
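For anyone who wants to see the difference side by side, here is a minimal sketch. It assumes std::regex as a stand-in for boost::regex so it builds without Boost. In std::regex's default grammar '.' does not match a newline, so the Boost behaviour is emulated by spelling the dot out as (?:.|\n). The function names first_line and greedy_capture are made up for the example.

```cpp
#include <regex>
#include <string>

// Returns the text before the first '\n', or the whole input if there is none.
std::string first_line(const std::string& text) {
    // [^\n]* cannot cross a newline, so the capture stops at the first one.
    static const std::regex line_expr("([^\n]*)\n");
    std::smatch m;
    if (std::regex_search(text, m, line_expr)) {
        return m[1].str();
    }
    return text;
}

// Emulates a '.' that also matches newline by writing it as (?:.|\n).
// The greedy group consumes everything, then backtracks only as far as
// the LAST '\n', so the capture spans several lines.
std::string greedy_capture(const std::string& text) {
    static const std::regex greedy_expr("((?:.|\n)*)\n");
    std::smatch m;
    if (std::regex_search(text, m, greedy_expr)) {
        return m[1].str();
    }
    return text;
}
```

With the input "alpha\nbeta\ngamma", first_line stops at the first newline while greedy_capture keeps going until the last one, which is the same symptom as the original (.*) expression.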
Get a copy of the Regex Coach. It is the *best* way to test out regular expressions. Written in LISP, too.
Have you tried [\r\n]? Windows typically uses the CRLF combo to end a line. I use these at work but can't remember off the top of my head. I'll post some source when I get into the office.
EDIT: I found a reference to back it up here
Quote:
You can use special character sequences to put non-printable characters in your regular expression. \t will match a tab character (ASCII 0x09), \r a carriage return (0x0D) and \n a line feed (0x0A). Remember that Windows text files use \r\n to terminate lines, while UNIX text files use \n.
I'm sure my problem had something to do with that, I'll confirm when I get into work.
evolutional-
So I modified the expression as follows:
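//([^\r\n]*)[\r\n]?.*
So it'll end the line if it finds a \n *or* a \r, whichever comes first. (Inside a character class, | and any ^ after the first position are literal characters, so writing the class as [^\n|^\r] would also stop the line at a | or ^.) I don't mind what comes after the first \n or \r so this works out great for me.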
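Here is a rough sketch of the "stop at \n or \r" idea, again assuming std::regex in place of boost::regex so it compiles on its own. The function names are hypothetical. The second function shows why the class is better written as [^\r\n]: a class spelled [^\n|^\r] treats | and the second ^ as ordinary characters, which matters for a scanner that has to read things like ||.

```cpp
#include <regex>
#include <string>

// Captures everything up to the first '\r' or '\n' (or the end of input).
// The trailing [\r\n]?.* from the thread is left out because regex_search
// only needs the capture group here.
std::string first_line_any_eol(const std::string& text) {
    static const std::regex expr("([^\r\n]*)");
    std::smatch m;
    if (std::regex_search(text, m, expr)) {
        return m[1].str();
    }
    return text;
}

// Same idea, but with '|' and '^' placed inside the class.
// Both are literal there, so the capture also stops at a '|' or '^'.
std::string first_line_pipe_class(const std::string& text) {
    static const std::regex expr("([^\n|^\r]*)");
    std::smatch m;
    if (std::regex_search(text, m, expr)) {
        return m[1].str();
    }
    return text;
}
```

Feeding both functions "a || b\nrest" shows the difference: the first returns the whole line "a || b", the second gives up at the first pipe.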
antareus-
I'm looking into this right now. Thanks a ton for the heads up.
Thanks again guys! =)