Implementing the regular expression functionality.

Started by
7 comments, last by Extrarius 19 years, 6 months ago
Greetings to all:) I am wondering how can a program accept the user's input of a regular expression, and then it parses it, builds the NFA/DFA for it, and finally uses the NFA/DFA to search a matching pattern in a long string. Some examples for this are: the find dialog box of VC++6.0,with Regular Expression enabled; the regexp tool under unix; or the lexical scanner commonly used in compiler construction: flex(and many others like *lex). Any response would be preciated.
Advertisement
Check out Boost.Regex.
There's a nice recent series of articles in GameDev detailing how state machines work and how to implement regex matching using them, here.
If you want a small one, check my T-Rex is 500 lines of code http://tiny-rex.sourceforge.net.

ciao
Alberto
-----------------------------The programming language Squirrelhttp://www.squirrel-lang.org
Thanks for your kind replies:) They are tremendously awesome especially the series articles from GameDev. However, the series does not seem to go to a conclusion, since it does not cover the implementation of converting NFAs to DFAs, or the simulation directly of input characters on NFAs. Can you give some suggestions on it? I spent hours searching using google but in vain.

Also,fagiano,I have downloaded your library and read the source, and it seems compact to me. Have you written any paper on how it works? Or any references of it? I need this simply can not fully understand what is going on there by reading the source only:)
Here's a hint... (I actually coded this just today, funny you should mention it): take the NFA, assume you start in the initial state. Follow any and all epsilon transitions, assume you're in any one of those states. Then follow each possible input, followed by epsilons, et cetera. Each combination of states is unique. I'm not sure how clear this was, so if you're still confused, give holler.


"There is no dark side of the moon really,
As a matter of fact, its all dark."
there's a 'fast regular expression' article up on CodeProject. It make interesting use of object-orientation in building the state machine and stl algorithms but be warned the code is pretty messy.
[email=algorhythmic@transcendenz.co.uk]algorhythmic[/email] | home | xltronic | whatever you do will be insignificant, but it is very important that you do it - mahatma gandhi
Hi, I am now pretty clear of how to simulate NFA without converting it into DFA using subset construction. Thanks for all the replies you posted.

BTW, I am still wondering how the lexical scanner generators family work: do they use the conventional methods in textbooks or some special techiques? I read the machine-generated scanner source before, and found it messy.
I'm fairly certain that the generator's techniques are described in the dragon book(ISBN 0201101947). It covers some techniques that aren't so nice to do by hand and aren't very easy to read, but are more efficient and not TOO difficult to automate.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk

This topic is closed to new replies.

Advertisement