Calling all Boost experts!

3 comments, last by OuncleJulien 19 years, 7 months ago
OK, so I'm writing a compiler and am building the scanner right now in C++. I'm using the Boost regular expressions lib because it makes lexical analysis *so* much easier. For some reason I seem to be having trouble using escape characters in my searches. For example, I have a file that is delimited by the newline character. Here's the expression that I would use to get the first line:

boost::regex lineDelimExpression("(.*)[\n].*");

For some reason this won't work, and I can't figure out why. If I delimit the file with, say, the pound symbol (#) and use this regexp:

boost::regex poundDelimExpression("(.*)[#].*");

it has no problems whatsoever and works like a charm. Does anyone have any idea why it won't work with \n? I've been racking my brain on this all day ;/
Yes! No better feeling than solving a problem on your own =)

After what seemed like my 100th careful examination, I realized that '.' also matches the newline character, so the greedy (.*) was swallowing the newlines too. The obvious solution was to have the group match every character except the newline, like so:

boost::regex lineDelimExpression("([^\n]*)[\n].*");

Cheers! =)
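
For anyone who lands here later, the difference between the two expressions can be reproduced in a few lines. This is only a sketch under some assumptions: it uses std::regex from the standard library so that it has no dependencies, and because the default '.' in std::regex does not match a newline, the hypothetical kAny fragment spells out (?:.|\n) to mimic the Boost behaviour described above. The helper names (first_capture, greedy_first_line, negated_first_line) are made up for illustration.

```cpp
#include <regex>
#include <string>

// Returns capture group 1 of `pattern` matched against ALL of `text`,
// or an empty string when the pattern does not match.
std::string first_capture(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_match(text, m, re) ? m[1].str() : std::string();
}

// Assumption: stand-in for a '.' that also matches '\n' (the behaviour
// described in the post). std::regex's own '.' skips line terminators.
const std::string kAny = "(?:.|\n)";

// Greedy "any character" group: backtracks only as far as the LAST newline.
std::string greedy_first_line(const std::string& text) {
    return first_capture(text, "(" + kAny + "*)[\n]" + kAny + "*");
}

// Negated class: the group cannot cross a newline, so it stops at the FIRST one.
std::string negated_first_line(const std::string& text) {
    return first_capture(text, "([^\n]*)[\n]" + kAny + "*");
}
```

With the input "first\nsecond\nthird", greedy_first_line should give back "first\nsecond" while negated_first_line gives "first". Boost.Regex also documents a match_not_dot_newline match flag that stops '.' from matching a newline, which may be worth a look as an alternative to the negated class.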
Get a copy of the Regex Coach. It is the *best* way to test out regular expressions. Written in LISP, too.
Have you tried [\r\n]? Windows typically uses the CRLF combo to mark a new line. I use these at work but can't remember the details off the top of my head. I'll post some source when I get into the office.

EDIT: I found a reference to back it up here

Quote:
You can use special character sequences to put non-printable characters in your regular expression. \t will match a tab character (ASCII 0x09), \r a carriage return (0x0D) and \n a line feed (0x0A). Remember that Windows text files use \r\n to terminate lines, while UNIX text files use \n.


I'm sure my problem had something to do with that; I'll confirm when I get into work.
evolutional-
So I modified the expression as follows:

//([^\r\n]*)[\r\n]?.*

So it'll end the line if it finds a \n *or* a \r, whichever comes first. I don't mind what comes after the first \n or \r so this works out great for me.
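
One detail worth checking when writing these classes: inside [...] a | is a literal pipe rather than an "or", and ^ only negates when it is the first character, so [^\r\n] is enough to mean "neither \r nor \n". Here is a minimal sketch of that, again using std::regex as a stand-in for Boost so it needs no extra libraries. The helper names leading_run and first_line are made up for illustration.

```cpp
#include <regex>
#include <string>

// Returns the longest run at the start of `text` whose characters all
// match `char_class` (a bracket expression such as "[^\r\n]").
std::string leading_run(const std::string& text, const std::string& char_class) {
    const std::regex re("^" + char_class + "*");
    std::smatch m;
    std::regex_search(text, m, re);  // always matches, possibly with an empty run
    return m.str();
}

// Everything before the first '\r' or '\n', so both Windows (\r\n)
// and UNIX (\n) line endings are handled.
std::string first_line(const std::string& text) {
    return leading_run(text, "[^\r\n]");
}
```

For "one\r\ntwo" and "one\ntwo" alike, first_line should return "one". Feeding leading_run the class "[^\n|^\r]" on the input "a|b\nc" stops at the pipe and returns "a", whereas "[^\r\n]" returns "a|b", which is why the shorter class is the safer one for a scanner that may see | or ^ in its input.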


antareus-
I'm looking into this right now. Thanks a ton for the heads up.

Thanks again guys! =)

