OuncleJulien

Calling all Boost experts!

Ok, so I'm writing a compiler and am building the scanner right now in C++. I'm using the Boost regular expressions lib because it makes lexical analysis *so* much easier. For some reason I seem to be having trouble using escape characters in my searches.

For example, I have a file that is delimited by the newline character. Here's the expression that I would use to get the first line:

boost::regex lineDelimExpression("(.*)[\n].*");

For some reason this won't work and I can't figure out why. If I were to delimit the file by, say, the pound symbol (#) and use this regexp:

boost::regex poundDelimExpression("(.*)[#].*");

it has no problems whatsoever and works like a charm. Anyone have any ideas why it won't work with \n? I've been racking my brain on this all day ;/

Yes! No better feeling than solving a problem on your own =)

After what seems like my 100th careful examination I realized that '.' also matches the newline character, so the greedy (.*) was running straight past the first line break. The obvious solution was to tell it to match every character except the newline, like so:

boost::regex lineDelimExpression("([^\n]*)[\n].*");

Cheers! =)

Have you tried [\r\n]? Windows typically uses the CRLF combo to mark a new line. I use these at work but can't remember the details off the top of my head. I'll post some source when I get into the office.

EDIT: I found a reference to back it up here

Quote:

You can use special character sequences to put non-printable characters in your regular expression. \t will match a tab character (ASCII 0x09), \r a carriage return (0x0D) and \n a line feed (0x0A). Remember that Windows text files use \r\n to terminate lines, while UNIX text files use \n.


I'm sure my problem had something to do with that; I'll confirm when I get into work.

evolutional-
So I modified the expression as follows:

//([^\n\r]*)[\n\r]?.*

So it'll end the line if it finds a \n *or* a \r, whichever comes first. I don't mind what comes after the first \n or \r so this works out great for me.


antareus-
I'm looking into this right now. Thanks a ton for the heads up.

Thanks again guys! =)
