Sign in to follow this  
Nice Coder

Building a parser (not solved)

Recommended Posts

I'm not sure if a parser is the right word for this... Using Visual basic 6. What i would like to be able to do, is to have 'patterns' (somewhat liek aiml). So, pattern "Do you like *" Then on the responce i can have "No, i don't like {star1}" Or, a pattern can be "[Is/Are] you [great/good]?" The problem is, how would i deal with multiple stars? (or even one...), and how would i deal with the brackets? From, Nice coder [Edited by - Nice Coder on June 28, 2005 2:09:51 AM]

Share this post


Link to post
Share on other sites
Boost does indeed have a RegEx library.

Nice Coder: can you better explain exactly what do you do with the pattern?

Here's how I'd deal with the input/response example given, using the boost spirit library (LL parser):

#include <boost/spirit.hpp>
#include <string>
#include <iostream>

void example ( void ) {
using namespace boost;
using namespace boost::spirit;
using namespace std;
string line;
string subject;
getline( cin , line );
parse_into< std::string::iterator > info = parse( line.begin() , line.end() ,
str_p("Do you like ") >> (+anychar_p)[ assign_a( subject ) ]
);

if ( ! info.full ) {
cout << "Sorry, I couldn't understand what you said" << endl;
return;
}
cout << "No, I don't like " << subject << endl;
}





To break down the "grammar" (the matching rules) of the parse statement:


str_p("Do you like ") //Match anything starting with "Do you like "
>> //followed by
(+anychar_p) //1 or more characters of any type..
//(*anychar_p) //would match 0 or more
//(!anychar_p) //would match 0 or 1 character of any type
//(+(anychar_p - ' ')) //would match 1 or more characters that arn't spaces
[ assign_a( subject ) ] //..and assign the group to "subject"



Then, if the user entered: Do you like pie?
The program would respond: No, I don't like pie?
(since "?" is matched as part of the subject)

To fix that, we could change our grammar to:

str_p("Do you like ") //match "Do you like "
>> (+(anychar_p - '?'))[ assign_a( subject ) ] //match 1+ non question marks and assign them to subject
>> (!ch_p('?')) //optionally match (e.g. match 0 or 1) a question mark
>> end_p; //match the end of the input (not really a character, this consumes 0 characters, but will only "match" if we're at the end of the input.


Then, our program would respond:
to: Do you like pie?
with: No, I don't like pie

to: Do you like pie
with: No, I don't like pie

to: Do you like pie??
with: Sorry, I couldn't understand what you said

Share this post


Link to post
Share on other sites
ok.

First off, this is for the ALPHA chatbot, as the primary comment based responce engine. (and its written in vb6... which is either a blessing or a curse, i haven't figured out which)

What that means, is that for things like comments (like YAY!!!) or questions that have lost there way (WTF???), we have to respond.
This is the first line, if nothing picks up, then it movwes to maximatch. (which is a nifty algo, but sometimes it just plain does wierd things when your talking about things on the fringes of its kb).

I also need some way of scripting responces for this. (so, we could go and gett it to add a variable for names in this, to be used for the question routines, ect.)

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by Conner McCloud
Regular Expressions. I believe Boost has a C++ library for it, Java's standard library includes support, and Perl has it built in.

CM


The problem is, i can't extract what the * represents...

From,
Nice coder

Share this post


Link to post
Share on other sites
Really, you haven't said a whole lot about what you want the parser to be able to do... If you just want to match a few commands with variable parameters, it's pretty simple. If you want to make your own entire complicated command language, it gets a lot harder.

I suggest you look up some books on parsers. Or any decent book on compilers will start with about two to four chapters on the art of parsing pretty complex input.

Also note that if you want, say, several small expressions to be composed to form large expressions, regexps probably won't be powerful enough. But then, maybe you don't need anything that complicated.

Share this post


Link to post
Share on other sites
I guess you'll need to look at something like BackTracking. It's pretty simple to write a recursive version of a parser that does what you said, it's slightly harder a faster version.

The idea is, you have a function that checks if if can match the input with the pattern. It knows where in the pattern it is, and where in the input it is. It then checks if it can match with the pattern.

The function should take 4 parameters; the current pattern position and the current input position.

Pseudo code:


function match(byref pattern, byref input, byref patternpos = 0, inputpos = 0){
if (at end of pattern) return true
while(not at end of pattern){
OMG_A_LABEL:
select case (next character){
"*": //backtracking, I think. maybe not though. possibly even exhaustive.
patternpos++
find next wildcard, set patternpos to be just before it.
let text = the absolute (non-wilcard) text.
look for that text in the input.
if it is not found, return false.
if it is found, then call match(pattern, input, patternpos + length of text, place where text was found + length of text)
if that call to match returns true, then add the length of the text to patternpos, let inputpos = place where you last found that text + length of the text, and goto OMG_A_LABEL (you may use the continue statement if you decide to use C++)
if that returns false, go back and search for the text again, until all instances of that text return false. if they all return false, return false as well.
"[": //choice
read in the choices, then let patternpos = the pattern position after the closing ].
if(match(pattern, input, patternpos, inputpos) returns false) also return false
else: //Absolute text
find next wildcard, set patternpos to be just before it. let text = the absolute text that is being tested.
if text is not found here in the input, return false.
}
}
}






The next step is to get the wildcards and replace {star1}, etc, with the correct values. You could do that fairly easily, I think.

[Edited by - MrEvil on June 28, 2005 7:44:48 AM]

Share this post


Link to post
Share on other sites
Since it looks like you might be considering using languages other than VB I also suggest that you take a look at ANTLR ( www.antlr.org ). It's basically Yacc/Bison and Flex/Lex on steroids and crack (respectively). It generates the lexer, parser, AND tree walker. Creation of abstract syntax tress is built in and it generates source for C++, Java, and C# (all from a single grammar file). It's written in Java, is open source, and has lots of existing grammar files.

Typically creating a grammar and associated support code for AST's is a time consuming task. I started my current grammar file almost two weeks ago (on friday) and it's almost complete - with full AST.

Share this post


Link to post
Share on other sites
There are regular expressions in VB6. Go to Project->References. The regular expression library is in there. If it's not, then go to microsoft's website and search for windows scripting host. Download and install.

If you want to extract a specific string from another string with regular expressions, you'll have to group the specific part of the expression with parentheses.

For example (all from memory, so some class names could be wrong):


Dim re as RegExp
Dim matches as MatchCollection
Set re = New RegExp
re.Pattern = "Do you like ([^$]+)$
matches = re.Execute(stringToParse)

print "You like " & matches.item(0).submatches(0)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this