Archived

This topic is now archived and is closed to further replies.

Boolean equation parser library?

This topic is 5405 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I''m looking for a library or source that can parse a boolean equation (with variables). I have tried myself for several hours, but it''s taking too much time. I''m very certain that I can make a functioning parser, but it will take time and probably be very buggy. Do anyone know where I can find example source or a ([L]GPL) library for such parser? I have spent time searching, but it was very difficult to find anything. It will need to be able to parse an equation like this: "A(BC''+D)+A''(E''C)+G''" Into preferable a binary tree, which I then can use to evaluate it. As an alternative, I''d be happy to get some ideas how to make the parser. How the parsed equation is stored is already solved; it''s the parsing of the ASCII version that troubles me. I thought I had it solved, but then realized that it didn''t take operator precedance into account, which broke the whole concept I had. Any pointers or tips are appreciated.

Share this post


Link to post
Share on other sites
I just wrote a very simple parser for a project of mine - it''s the input for a drive engine for an artificial life creature, so the idea is to take an expression such as

d[0]=(s[3]+50)*c[2]

which sets the value of drive 0 to sense 3 + 50 x chemical 2''s concentration.

As the datafiles aren''t going to be user-generated (mostly), I did bracket-depth counting; the above would look like this:

d[0]=(((s[3])+(50))*(c[2]))

using the depth of brackets to delimit statements.

Simple switch statements then construct some polymorphic objects (Elements, Expressions, Values, or References) to make a tree; call ''evaluate'' on the top and it''ll call down through the tree to give an answer.

You''ll be wanting something like this; if you can''t get operator precedence, then you can use bracket-depth, though I wouldn''t recommend it for user interaction.

If you want to write your own parser, I''d look into Lexx and Yacc (or Bison on the GNU front). Might take a while to learn, but it''s good knowledge.

Share this post


Link to post
Share on other sites
Thanks,

I need this one to be able to handle standard algebraic notation and should be able to handle errors in the equation. That is, it will take input directly from users.

I have two classes that should make up the parsed equation: BoolOperand and BoolOperator. A BoolOperand can be either a constant (0 or 1) or a variable (represented as an index) or a sub-expression (ie. a BoolOperator). The latter is simply an operator (AND or OR) and the two BoolOperands to operate on. I need only to call an Evaluate function on the root BoolOperand with a variable value table to evaluate the tree.

So, now I am trying to create this tree. The problem is that to be able to take operator precedence into account, I need to look ahead, which calls for an ever more complex and buggy parser. The current idea works if one use paranthesis to indicate precedance, so I''ve been thinking about having a step where I translate a string like this: "A+B''C(A''B+D)" to: "A+(B''C((A''B)+D))".

I will look into what you recommended and see if it is of any use.


Share this post


Link to post
Share on other sites
The best approach is to use lex/yacc or flex/bison. If that's not an option, the next best approach is to write a recursive-descent parser by hand.

Hmm.. Your use of postfix notation for negation of an expression will complicate things a bit, I suspect. Still, it's worth a try. The first step is to generate an appropriate grammar in EBNF form. (EBNF can work a bit more easily for recursive descent parsers, if used appropriately.)

Note: This is operating on the assumption that your order of operations is Parentheses, negation, AND, OR. (where concatenation is AND, and + is OR).


E ::= A { + A }*
A ::= N { N }*
N ::= T | N '
T ::= atom | ( E )


In the EBNF form used here, a group of the form { N }* means "optionally, N repeated any number of times."

I believe that the grammar above allows all well-formed expressions, and properly captures order of operations. Unfortunately, it can't quite be made into a recursive-descent parser. The "N ::= P | N '" production can't be decided without lookahead. If it could be changed to "N ::= P | P'", it would work, though. However, that change would prevent the legal (although not terribly useful) expression A' ' .

Ah, I see the solution now. Change the grammar to this:


E ::= A { + A }*
A ::= N { N }*
N ::= T { ' }*
T ::= atom | ( E )


Then forming a recursive-descent parser becomes trivial, especially given that associativity doesn't matter in these boolean expressions. Simply convert the grammar into a set of (indirectly recursive) functions, one for each non-terminal symbol in the grammar.

The code would look something like the following. There'll probably be at least one stupid logic error and one typo. But this should get the idea of the method across:


        
char *cur;

Expr *parse(char *s)
{
cur = s;
return parseE();
}

Expr *parseE()
{
Expr *temp = parseA();

while (skipWS() && *cur == '+')
{
cur++;
temp = new ORExpr(temp, parseA());
}

return temp;
}

Expr *parseA()
{
Expr *temp = parseN();

// note, this needs to be checked and maintained carefully,

// as it's looking for things that indicate that the next

// character isn't the start of an N expression. I think I

// got them both, but I'm not sure.

while (skipWS() && *cur != '+' && *cur != ')')
{
temp = new ANDExpr(temp, parseN());
}

return temp;
}

Expr *parseN()
{
Expr *temp = parseT();

while (skipWS() && *cur == '\')
{
cur++;
temp = new NOTExpr(temp);
}

return temp;
}

Expr *parseT()
{
// and finally, interesting stuff happens.

if (!skipWS)
{
// reached the end of the string, unexpectedly.

whatever error-handling code you want.
}
else if (*cur == '(')
{
cur++;
Expr * temp = parseE();

if (!skipWS() || *cur != ')')
{
// unbalanced parens.

whatever error-handling code you want.
}

cur++;
return temp;
}
else if (isalpha(*cur))
{
Expr *temp = new AtomExpr(*cur);
cur++;
return temp;
}
else
{
// unexpected character.

whatever error-handling code you want.
}
}

char skipWS()
{
while (*cur && isspace(*cur)) cur++;
return *cur;
}


Well, I can't see any mistakes in that, without actually testing it. Note, of course, that my code assumes you have a polymorphic Expr class to store parse trees in, with 4 subclasses: ORExpr, ANDExpr, NOTExpr, and AtomExpr. You'll need to alter that to fit whatever you're actually using for the parse tree, but that shouldn't be much trouble.

I hope this helps.

edit: found one typo, anyway.

[edited by - cwraith on March 2, 2003 6:26:17 PM]

Share this post


Link to post
Share on other sites
cwraith, thanks! That took some time to grasp, but I now see pretty clear how you''re doing it. I will need to look at it some more to see how it can be modified to work with my other stuff, but it look promising indeed. That was the refresher of mind I needed. Thanks again, and I''ll let you know when I get it done.


Share this post


Link to post
Share on other sites
yay!! I think I have a decent working parser now.

I started out by throwing away my previous operand and operator class, and made a polymorphic class as cwraith suggested:

  
class BoolExpr
{
public:
BoolExpr() : _a(0), _b(0) {}
~BoolExpr() { if(_a) delete _a; if(_b) delete _b; }
virtual bool Eval(bool values[]) = 0;
virtual bool Eval(unsigned long values) = 0;
virtual void Print(void) = 0;
protected:
BoolExpr *_a;
BoolExpr *_b;
};
Then derived from that:

  class BoolExpr_OR : public BoolExpr
{
public:
BoolExpr_OR(BoolExpr *a, BoolExpr *b) { _a = a; _b = b; }
bool Eval(bool values[]) { return _a->Eval(values) || _b->Eval(values); }
bool Eval(unsigned long values) { return _a->Eval(values) || _b->Eval(values); }
void Print(void) { std::cout << "or("; _a->Print(); std::cout << ", "; _b->Print(); std::cout << ")"; }
};

  class BoolExpr_AND : public BoolExpr
{
public:
BoolExpr_AND(BoolExpr *a, BoolExpr *b) { _a = a; _b = b; }
bool Eval(bool values[]) { return _a->Eval(values) && _b->Eval(values); }
bool Eval(unsigned long values) { return _a->Eval(values) && _b->Eval(values); }
void Print(void) { std::cout << "and("; _a->Print(); std::cout << ", "; _b->Print(); std::cout << ")"; }
};

  class BoolExpr_NOT : public BoolExpr
{
public:
BoolExpr_NOT(BoolExpr *a) { _a = a; }
bool Eval(bool values[]) { return !(_a->Eval(values)); }
bool Eval(unsigned long values) { return !(_a->Eval(values)); }
void Print(void) { std::cout << "not("; _a->Print(); std::cout << ")"; }
};

  class BoolExpr_ATOM : public BoolExpr
{
public:
BoolExpr_ATOM(int index) : _index(index) {}
bool Eval(bool values[]) { return values[_index]; }
bool Eval(unsigned long values) { return (values & (1 << _index)) ? true : false; }
void Print(void) { std::cout << "#" << _index; }
private:
int _index;
};
Although, this will make it harder if I deside that the recursive evaluation is too inefficient. These expressions are to be evaluated maybe a million+ times!

I followed your lead when designing the parser:

  
BoolExpr * BoolEquation::_CompileE(const char *&expr_string, std::string &var_list)
{
// E ::= And { + And }*

BoolExpr *expr = _CompileA(expr_string, var_list);
while(expr && *expr_string && *expr_string == ''+'')
expr = new BoolExpr_OR(expr, _CompileA(++expr_string, var_list));
return expr;
}

BoolExpr * BoolEquation::_CompileA(const char *&expr_string, std::string &var_list)
{
// A ::= N { N }*

BoolExpr *expr = _CompileN(expr_string, var_list);
while(expr && *expr_string && *expr_string != ''+'' && *expr_string != '')'')
expr = new BoolExpr_AND(expr, _CompileN(expr_string, var_list));
return expr;
}

BoolExpr * BoolEquation::_CompileN(const char *&expr_string, std::string &var_list)
{
// N ::= T { '' }*

BoolExpr *expr = _CompileT(expr_string, var_list);
while(expr && *expr_string && *expr_string == ''\'''') {
expr = new BoolExpr_NOT(expr);
expr_string++;
}
return expr;
}

BoolExpr * BoolEquation::_CompileT(const char *&expr_string, std::string &var_list)
{
// T ::= ( E ) | Atom

if(!*expr_string) {
_error = UNEXPECTED_END;
return 0;
}
if(*expr_string == ''('') {
BoolExpr *expr = _CompileE(++expr_string, var_list);
if(*expr_string != '')'') {
_error = UNBALANCED_PARANTHESES;
if(expr) delete expr;
return 0;
}
expr_string++;
return expr;
}
else {
int pos = var_list.find(*expr_string++, 0);
if(pos == std::string::npos) {
_error = UNKOWN_SYMBOL;
return 0;
}
return new BoolExpr_ATOM(pos);
}
_error = ILLEGAL_CHARACTER;
return 0;
}
I also stripped all white-space before starting the action.

I cannot thank you enough, cwraith. This was exactly the kind of clean solution that I was looking for but I was too burried in parser hell to see it. This is also a new way (for me) to parse, and will be extremely useful in the future.


Share this post


Link to post
Share on other sites
Or you could use the Spirit library which lets you write ''inline'' LL(inf) parsers (as in ''inline assembly'') in C++.




[ Start Here ! | How To Ask Smart Questions | Recommended C++ Books | C++ FAQ Lite | Function Ptrs | CppTips Archive ]
[ Header Files | File Format Docs | LNK2001 | C++ STL Doc | STLPort | Free C++ IDE | Boost C++ Lib | MSVC6 Lib Fixes ]

Share this post


Link to post
Share on other sites