Jump to content
  • Advertisement
Sign in to follow this  
lgc_ustc

Implementing the regular expression functionality.

This topic is 5092 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

Greetings to all:) I am wondering how can a program accept the user's input of a regular expression, and then it parses it, builds the NFA/DFA for it, and finally uses the NFA/DFA to search a matching pattern in a long string. Some examples for this are: the find dialog box of VC++6.0,with Regular Expression enabled; the regexp tool under unix; or the lexical scanner commonly used in compiler construction: flex(and many others like *lex). Any response would be preciated.

Share this post


Link to post
Share on other sites
Advertisement
There's a nice recent series of articles in GameDev detailing how state machines work and how to implement regex matching using them, here.

Share this post


Link to post
Share on other sites
Thanks for your kind replies:) They are tremendously awesome especially the series articles from GameDev. However, the series does not seem to go to a conclusion, since it does not cover the implementation of converting NFAs to DFAs, or the simulation directly of input characters on NFAs. Can you give some suggestions on it? I spent hours searching using google but in vain.

Also,fagiano,I have downloaded your library and read the source, and it seems compact to me. Have you written any paper on how it works? Or any references of it? I need this simply can not fully understand what is going on there by reading the source only:)

Share this post


Link to post
Share on other sites
Here's a hint... (I actually coded this just today, funny you should mention it): take the NFA, assume you start in the initial state. Follow any and all epsilon transitions, assume you're in any one of those states. Then follow each possible input, followed by epsilons, et cetera. Each combination of states is unique. I'm not sure how clear this was, so if you're still confused, give holler.

Share this post


Link to post
Share on other sites
there's a 'fast regular expression' article up on CodeProject. It make interesting use of object-orientation in building the state machine and stl algorithms but be warned the code is pretty messy.

Share this post


Link to post
Share on other sites
Hi, I am now pretty clear of how to simulate NFA without converting it into DFA using subset construction. Thanks for all the replies you posted.

BTW, I am still wondering how the lexical scanner generators family work: do they use the conventional methods in textbooks or some special techiques? I read the machine-generated scanner source before, and found it messy.

Share this post


Link to post
Share on other sites
I'm fairly certain that the generator's techniques are described in the dragon book(ISBN 0201101947). It covers some techniques that aren't so nice to do by hand and aren't very easy to read, but are more efficient and not TOO difficult to automate.

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!