Sign in to follow this  

python: binary expression parser

This topic is 4093 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I want to write a binary expression parser in python - one that will take statements like:
(a AND b) OR (c AND (d OR NOT e)) 
and evaluate it for given set of variables a, b, c, d, and e. Now, this seems to be a pretty standard parsing problem that I could throw bison or maybe antlr at. But that seems overkill, and I'm not familiar with those tools anyway. Alternatively, I could write one from scratch in a (probably) naive manner. But I kind of don't want to. It would be a good learning experience, since I've only done the theory of formal languages and grammars in my classes and no real coding, but I don't got the time. So - is there a really quick way to parse these? I'm using python so it seems like there might even be a package for this kind of thing. Thanks :)

Share this post


Link to post
Share on other sites
Here's a somewhat tongue-in-cheek solution:


class symbol: pass

AND = symbol()
OR = symbol()
NOT = symbol()

def parse(string, **kwargs):
# Turn the string into syntax that represents a nested tuple of variables
# and AND/OR/NOT symbols, evaluate the string to create that nested tuple,
# and then evaluate the expression. Here the keyword-args to 'parse'
# will provide the context in which the expression is evaluated.
return simplify(eval(str.replace(' ', ','), kwargs))

def simplify(tree):
if not isinstance(tree, tuple):
# We better have a variable...
assert isinstance(tree, bool)
return tree

# Handle any NOTs while simplifying subexpressions.
parsed = []
sense = True
for node in tree:
if node == NOT:
sense = not sense
else:
subexpression = simplify(node)
if not sense: subexpression = not subexpression # odd NOT count, so invert
parsed.append(subexpression)
sense = True # reset for the next set of NOTs.

# We should now either have a single item, or x AND y, or x OR y.
assert len(parsed) in (1, 3)
if len(parsed) == 1: return simplify(parsed)
# otherwise, we have an AND or an OR
assert parsed[1] in (AND, OR)
if parsed[1] == AND: return simplify(parsed[0]) and simplify(parsed[2])
else: return simplify(parsed[0]) or simplify(parsed[2])


Edit: Fixed now, I hope.

Share this post


Link to post
Share on other sites
Aha, that's the kind of trick that I was looking for.

I don't really know parsing too well (I've never done real coding in that, like I said I pretty much only know the theory) and I don't really know python or antlr either. So I didn't really want to mess around trying to hack something together on my own.

Share this post


Link to post
Share on other sites
Alright, thanks a lot! It works great!

One thing though, it has to be this way:


for node in tree:
if node == NOT:
sense = not sense
elif node == AND or node == OR: ### <-----------
parsed.append(node) ### <-----------
else:
subexpression = simplify(node)
if not sense: subexpression = not subexpression # odd NOT count, so invert
parsed.append(subexpression)
sense = True # reset for the next set of NOTs.


because otherwise it will attempt to call simplify(AND) or simplify(OR) and it would fail with an AssertionError as AND and OR are not tuples or bools

Share this post


Link to post
Share on other sites
Yes. Equivalently, you could define simplify(AND) to return AND, and similarly for OR. Thanks for taking the time to actually test it ;)

Kylotan: I primarily make use of eval() for rapidly developing text data formats. I don't even do any pre-parsing, either: I just document "you should make one big dict with these keys, each of which has values of those types... oh, and these class definitions will be available...", and then read and eval() the whole file as a single expression. Works a treat - way less code than even the barest-bones of XML readers, yet the data still gets typed automatically. :) (Oh, and you can easily switch between human-readable format, pickle jars and shelve databases. :) )

Share this post


Link to post
Share on other sites
one last thing:


# We should now either have a single item, or x AND y, or x OR y.
#assert len(parsed) in (1, 3) #<--------
if len(parsed) == 1: return simplify(parsed)
# otherwise, we have an AND or an OR
assert parsed[1] in (AND, OR)
if parsed[1] == AND: return simplify(parsed[0]) and simplify(tuple(parsed[2:])) #<----
else: return simplify(parsed[0]) or simplify(tuple(parsed[2:])) #<----


What is happening here, is that suppose you had a statement like:


a AND b AND c


Then, we would like to recursively call:

simplify(a) and simplify( (b,AND,c) )


and we can do this by creating a new tuple out of the slice removing the a,AND elements from the parsed list.

Finally, as a side note: it seems that this causes expressions such as:

a AND b OR c

are evaluated as:

a AND (b OR c)


which may or may not be a confusing operator precedence scheme. however, this isn't really a problem since truly "unambiguous" binary expressions always evaluate as intended - so I'm just going to leave it as is.

Share this post


Link to post
Share on other sites
... Actually, that's all much too complicated, although the even-more-tongue-in-cheek way is even less fun. ;P (Instead of transforming spaces to commas and doing all this processing, just lowercase the AND/OR/NOT keywords and eval() to get the answer directly.)

Share this post


Link to post
Share on other sites

This topic is 4093 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this