|
||||||||||||||||||
Add Forum to Favorites | Send Topic To a Friend | View Forum FAQ | Track this topic |
Last Thread Next Thread ![]() |
| Algorithmic Forays Part 6 |
|
![]() sirSolarius Member since: 12/7/2003 |
||||
|
|
||||
| Ok, I understand everything except the 5 states right at the end. We have A, the start state. We then split that into eps-closure(move(A, a)) and eps-closure(move(A, b)). Those are our two new states, B and C. We then mark A. Why don't we break B into eps-closure(move(B, a)) and eps-closure(move(B,b)), and do the same with C? What is the operation that creates the new set? In other words, we are generating D and E using move(B, ?) and move(C, ?). Or am I just totally off? |
||||
|
||||
![]() spur4444 Member since: 3/9/2004 From: Haifa, Holy Land |
||||
|
|
||||
| We do break B and C. See the following: ~~~~~~~~ ... C is added, unmarked, to dfa_states. Since all the alphabet symbols are done for A, we go back to the outer loop. Are there any unmarked states ? Yes, there are B and C. So we pick B and go over the process again and again, until all the sets that are states of the DFA are marked. ... ~~~~~~~~ If this doesn't answer your question, please be more specific. What is the precise sentence you don't understand ? |
||||
|
||||
![]() porthios Member since: 4/11/2002 From: Canada |
||||
|
|
||||
| Hi, I started a regex parser before I read these articles (and had it working), but I must say that this is so far a nice series. Regular expressions can be confusing at times so you've done a good job of trying to explain it in a bit simpler terms. I have a question. I was gonna start putting in some features to my regex parser like the RepeatN structure (ie {m,n}) and charsets (ie [x-yz]) and was wondering if you'll consider doing an article or something describing those. For instance, I thought of a coupla ways to do the RepeatN structure. You could have a transition that has some values stating the min/max times it can be travelled and work with that or you could expand the NFA. (ie (a|b){2,4} --> (a|b)(a|b)((a|b)|eps)|(a|b)The second example obviously could become memory-intensive for the complex regexs. For the charset, how is that handled? For something like [a-z] would you have (a|b|c|d|e|f...|y|z) or something else? If you could point me in the right direction, or give a link, on how to implement these things I would be very grateful! Thanks, Porthios |
||||
|
||||
![]() sirSolarius Member since: 12/7/2003 |
||||
|
|
||||
Quote: Right, but we should have two sets from B (move(B, a) and move(B, b)) and two sets from C (move(C, a) and move(C, b)) right? How do you know that you only need two more sets (D and E) instead of four more, and how are D and E generated (i.e. move(B, ?) and move(C, ?))? |
||||
|
||||
![]() porthios Member since: 4/11/2002 From: Canada |
||||
|
|
||||
Quote: Ill try my best to answer your questions, if im wrong, go ahead and correct me! :) For your first question, the reason there's only two instead of four because we have to check to see if the set created from EpsilonClosure(Move(SET, input)) is already in the DFA table, and if it is, we wont add it to the unmarked set. That means That not every set generated from EpsilonClosure(Move(SET, input)) wil be required. For your second, I believe they are created like so: D = EpsilonClosure(Move(C, 'b')) E = EpsilonClosure(Move(D, 'b')) The answer to the first question explains why there is no set for EpsilonClosure(Move(C, 'a')) and EpsilonClosure(Move(D, 'a')) Hope this helps! |
||||
|
||||
![]() spur4444 Member since: 3/9/2004 From: Haifa, Holy Land |
||||
|
|
||||
Quote: I don't *know* it - it's a result of the algorithm's run :-) Read the algorithm (the pseudo code) again - and pay attention to the condition: if U is not in dfa_states add U to dfa_states, unmarked When we generate a new U with: U = eps-closure(N.move(T, i)) We don't just insert it into the DFA states, we first check that it's not already there. If you just go over the algorithm with B and C, you'll easily see this. As an intuitive hint, note that B contains all states of C, except 5, but 5 leads to no state not already contained in B, so the real *interesting* transition is on 'b' from 8. This gives us set D. Then, again the only *interesting* transition is on 'b' from 9... |
||||
|
||||
![]() spur4444 Member since: 3/9/2004 From: Haifa, Holy Land |
||||
|
|
||||
Quote: As I see it, there are two ways to think about an implementation: 1) A simple way - create something that works. It's the best for educational purposes, and it's the approach I'm taking in Algorithmic Forays. 2) Optimize - various optimizations "over" the simple way. For instance, though the "real" tools (Lex, Perl...) use some variation of the algorithms I've presented, they definitely also have many optimizations and cut-offs, and hand-tailorings, etc. I'm not focusing on this approach. As for your questions: A charset is definitely a simple alternation. [a-c] is just (a|b|c). Repeats can be achieved with eps, similarly to what you presented. For example: t{2,4} is tt(t|eps)(t|eps) As for links/references, you can get the Mastering Regular Expressions book if you're really into the subject. You can also try your luck with the source code / internal docs of Flex. |
||||
|
||||
All times are ET (US)![]() |
Last Thread Next Thread ![]() |
|