Implementing the regular expression functionality.

Engines and Middleware Programming

Started by lgc_ustc October 10, 2004 03:06 AM

7 comments, last by Extrarius 19 years, 6 months ago

138

Author

October 10, 2004 03:06 AM

Greetings to all:) I am wondering how can a program accept the user's input of a regular expression, and then it parses it, builds the NFA/DFA for it, and finally uses the NFA/DFA to search a matching pattern in a long string. Some examples for this are: the find dialog box of VC++6.0,with Regular Expression enabled; the regexp tool under unix; or the lexical scanner commonly used in compiler construction: flex(and many others like *lex). Any response would be preciated.

hh10k

589

October 10, 2004 03:42 AM

Check out Boost.Regex.

Sharlin

864

October 10, 2004 05:02 AM

There's a nice recent series of articles in GameDev detailing how state machines work and how to implement regex matching using them, here.

fagiano

307

October 10, 2004 12:11 PM

If you want a small one, check my T-Rex is 500 lines of code http://tiny-rex.sourceforge.net.

ciao
Alberto

-----------------------------The programming language Squirrelhttp://www.squirrel-lang.org

lgc_ustc

138

Author

October 10, 2004 04:54 PM

Thanks for your kind replies:) They are tremendously awesome especially the series articles from GameDev. However, the series does not seem to go to a conclusion, since it does not cover the implementation of converting NFAs to DFAs, or the simulation directly of input characters on NFAs. Can you give some suggestions on it? I spent hours searching using google but in vain.

Also,fagiano,I have downloaded your library and read the source, and it seems compact to me. Have you written any paper on how it works? Or any references of it? I need this simply can not fully understand what is going on there by reading the source only:)

DarkHamster

180

October 10, 2004 05:28 PM

Here's a hint... (I actually coded this just today, funny you should mention it): take the NFA, assume you start in the initial state. Follow any and all epsilon transitions, assume you're in any one of those states. Then follow each possible input, followed by epsilons, et cetera. Each combination of states is unique. I'm not sure how clear this was, so if you're still confused, give holler.

"There is no dark side of the moon really,
As a matter of fact, its all dark."

algorhythmic

176

October 11, 2004 06:10 AM

there's a 'fast regular expression' article up on CodeProject. It make interesting use of object-orientation in building the state machine and stl algorithms but be warned the code is pretty messy.

[email=algorhythmic@transcendenz.co.uk]algorhythmic[/email] | home | xltronic | whatever you do will be insignificant, but it is very important that you do it - mahatma gandhi

lgc_ustc

138

Author

October 12, 2004 08:14 PM

Hi, I am now pretty clear of how to simulate NFA without converting it into DFA using subset construction. Thanks for all the replies you posted.

BTW, I am still wondering how the lexical scanner generators family work: do they use the conventional methods in textbooks or some special techiques? I read the machine-generated scanner source before, and found it messy.

Extrarius

1,412

October 13, 2004 02:44 PM

I'm fairly certain that the generator's techniques are described in the dragon book(ISBN 0201101947). It covers some techniques that aren't so nice to do by hand and aren't very easy to read, but are more efficient and not TOO difficult to automate.

"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk

Implementing the regular expression functionality.

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Implementing the regular expression functionality.

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines