Building a parser (not solved)

Started by
12 comments, last by Nice Coder 18 years, 9 months ago
I'm not sure if a parser is the right word for this... Using Visual basic 6. What i would like to be able to do, is to have 'patterns' (somewhat liek aiml). So, pattern "Do you like *" Then on the responce i can have "No, i don't like {star1}" Or, a pattern can be "[Is/Are] you [great/good]?" The problem is, how would i deal with multiple stars? (or even one...), and how would i deal with the brackets? From, Nice coder [Edited by - Nice Coder on June 28, 2005 2:09:51 AM]
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Advertisement
Regular Expressions. I believe Boost has a C++ library for it, Java's standard library includes support, and Perl has it built in.

CM
Boost does indeed have a RegEx library.

Nice Coder: can you better explain exactly what do you do with the pattern?

Here's how I'd deal with the input/response example given, using the boost spirit library (LL parser):

#include <boost/spirit.hpp>#include <string>#include <iostream>void example ( void ) {    using namespace boost;    using namespace boost::spirit;    using namespace std;    string line;    string subject;    getline( cin , line );    parse_into< std::string::iterator > info = parse( line.begin() , line.end() ,        str_p("Do you like ") >> (+anychar_p)[ assign_a( subject ) ]        );    if ( ! info.full ) {        cout << "Sorry, I couldn't understand what you said" << endl;        return;    }    cout << "No, I don't like " << subject << endl;}


To break down the "grammar" (the matching rules) of the parse statement:


str_p("Do you like ") //Match anything starting with "Do you like ">> //followed by(+anychar_p) //1 or more characters of any type..//(*anychar_p) //would match 0 or more//(!anychar_p) //would match 0 or 1 character of any type//(+(anychar_p - ' ')) //would match 1 or more characters that arn't spaces[ assign_a( subject ) ] //..and assign the group to "subject"



Then, if the user entered: Do you like pie?
The program would respond: No, I don't like pie?
(since "?" is matched as part of the subject)

To fix that, we could change our grammar to:

str_p("Do you like ") //match "Do you like "
>> (+(anychar_p - '?'))[ assign_a( subject ) ] //match 1+ non question marks and assign them to subject
>> (!ch_p('?')) //optionally match (e.g. match 0 or 1) a question mark
>> end_p; //match the end of the input (not really a character, this consumes 0 characters, but will only "match" if we're at the end of the input.


Then, our program would respond:
to: Do you like pie?
with: No, I don't like pie

to: Do you like pie
with: No, I don't like pie

to: Do you like pie??
with: Sorry, I couldn't understand what you said
ok.

First off, this is for the ALPHA chatbot, as the primary comment based responce engine. (and its written in vb6... which is either a blessing or a curse, i haven't figured out which)

What that means, is that for things like comments (like YAY!!!) or questions that have lost there way (WTF???), we have to respond.
This is the first line, if nothing picks up, then it movwes to maximatch. (which is a nifty algo, but sometimes it just plain does wierd things when your talking about things on the fringes of its kb).

I also need some way of scripting responces for this. (so, we could go and gett it to add a variable for names in this, to be used for the question routines, ect.)

From,
Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Can anyone help?

From,
Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Quote:Original post by Conner McCloud
Regular Expressions. I believe Boost has a C++ library for it, Java's standard library includes support, and Perl has it built in.

CM


The problem is, i can't extract what the * represents...

From,
Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Can anyone help?

from,
Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Really, you haven't said a whole lot about what you want the parser to be able to do... If you just want to match a few commands with variable parameters, it's pretty simple. If you want to make your own entire complicated command language, it gets a lot harder.

I suggest you look up some books on parsers. Or any decent book on compilers will start with about two to four chapters on the art of parsing pretty complex input.

Also note that if you want, say, several small expressions to be composed to form large expressions, regexps probably won't be powerful enough. But then, maybe you don't need anything that complicated.
-----http://alopex.liLet's Program: http://youtube.com/user/icefox192
The major thing i want to be able to do, is to use *'s properly.

Mainly because i haven't got the foggiest of how to do it... :(

From,
Nice coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
I guess you'll need to look at something like BackTracking. It's pretty simple to write a recursive version of a parser that does what you said, it's slightly harder a faster version.

The idea is, you have a function that checks if if can match the input with the pattern. It knows where in the pattern it is, and where in the input it is. It then checks if it can match with the pattern.

The function should take 4 parameters; the current pattern position and the current input position.

Pseudo code:
function match(byref pattern, byref input, byref patternpos = 0, inputpos = 0){  if (at end of pattern) return true  while(not at end of pattern){OMG_A_LABEL:    select case (next character){      "*": //backtracking, I think. maybe not though. possibly even exhaustive.        patternpos++        find next wildcard, set patternpos to be just before it.        let text = the absolute (non-wilcard) text.        look for that text in the input.        if it is not found, return false.        if it is found, then call match(pattern, input, patternpos + length of text, place where text was found + length of text)        if that call to match returns true, then add the length of the text to patternpos, let inputpos = place where you last found that text + length of the text, and goto OMG_A_LABEL (you may use the continue statement if you decide to use C++)        if that returns false, go back and search for the text again, until all instances of that text return false. if they all return false, return false as well.      "[": //choice        read in the choices, then let patternpos = the pattern position after the closing ].        if(match(pattern, input, patternpos, inputpos) returns false) also return false      else: //Absolute text        find next wildcard, set patternpos to be just before it. let text = the absolute text that is being tested.        if text is not found here in the input, return false.    }  }}


The next step is to get the wildcards and replace {star1}, etc, with the correct values. You could do that fairly easily, I think.

[Edited by - MrEvil on June 28, 2005 7:44:48 AM]

This topic is closed to new replies.

Advertisement