flex vs lemon

I am trying to understand the inherent differences between flex (a free rendition of lex) and LEMON (apparently a distant cousin of yacc).

(1) What does each technology do? What specific functions do they perform?
(2) What do they take as inputs and what gets "spit out" the other end?
(3) What order does one use them in: flex -> lemon OR lemon -> flex... and why?

Hopefully this should clear a lot of things up for me...

thanks,
ply

Quote:
Original post by plywood
I am trying to understand the inherent differences between flex (a free rendition of lex) and LEMON (apparently a distant cousin of yacc).

(1) What does each technology do? What specific functions do they perform?
(2) What do they take as inputs and what gets "spit out" the other end?
(3) What order does one use them in: flex -> lemon OR lemon -> flex.... and why?

Hopefully this should clear a lot of things up for me...


Both tools generate code for you, so you don't really use them in either "order". However, the code generated by flex feeds data to the code generated by LEMON, so in that sense the flex side is "used" first.
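To make the plumbing concrete, here's a minimal sketch of the hand-written glue that drives both generated modules. ParseAlloc/Parse/ParseFree are the entry points LEMON actually generates (the names are configurable); yylex() comes from flex. The header name grammar.h, the yylval variable, and the plain-int token value type are all assumptions of this sketch:

    /* Hand-written prototypes for the LEMON entry points; LEMON's generated
       header only covers the token codes, not these functions. */
    #include <stdlib.h>

    void *ParseAlloc(void *(*malloc_proc)(size_t));
    void  Parse(void *parser, int token_code, int token_value);
    void  ParseFree(void *parser, void (*free_proc)(void *));

    extern int yylex(void);   /* generated by flex */
    extern int yylval;        /* set by our flex actions (our own variable) */

    int main(void)
    {
        void *parser = ParseAlloc(malloc);
        int tok;
        while ((tok = yylex()) != 0)      /* flex returns 0 at end of input */
            Parse(parser, tok, yylval);   /* feed each token to the parser */
        Parse(parser, 0, 0);              /* token code 0 means "no more input" */
        ParseFree(parser, free);
        return 0;
    }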

The input to flex is a file that describes what kinds of "tokens" exist in your programming language. Typically these are things like "a variable name", "an integer constant", "a solid chunk of whitespace", or "an operator symbol". You use regular expressions to describe which sequences of characters make up a valid token of each type.

The output is generated C code (also usable from C++ with a little care - and you may be able to configure flex to emit other languages, but I wouldn't know) that is designed to read input from a file and create instances of a "token" structure. That structure will typically include an enumeration indicating the kind of token, plus an optional "value" field - e.g. the numeric value for an "integer constant" token, or the name of a variable for a "variable name" token. You'll need to write a little code, embedded in your flex input file, to explain what your structure should look like and what gets packed into it. However, the overall effort is much less than it would be to do the file reading and regular-expression matching yourself.
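For instance, a flex input for the little calculator language discussed further down might look like the sketch below. The TOK_ names are assumed to come from a LEMON-generated header (called grammar.h here), and yylval is a variable we declare ourselves, since flex alone doesn't define one:

    %{
    #include <stdlib.h>    /* for atoi() */
    #include "grammar.h"   /* TOK_NUMBER, TOK_PLUS, ... from LEMON (assumed name) */
    int yylval;            /* value of the token just matched */
    %}
    %option noyywrap
    %%
    [0-9]+      { yylval = atoi(yytext); return TOK_NUMBER; }
    "+"         { return TOK_PLUS; }
    "*"         { return TOK_TIMES; }
    "("         { return TOK_LPAREN; }
    ")"         { return TOK_RPAREN; }
    [ \t\n]+    { /* whitespace: matched and silently discarded */ }
    %%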

The input to that generated code is a source file in the language you're creating, and the output is a stream of token structures.
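In the minimal calculator sketches in this post a plain int is enough to carry a token's value, but for a fuller language the "token structure" described above might look something like this (a sketch; real projects often use a union for the value part):

    /* One possible shape for the token record described above. */
    typedef enum {
        TOK_KIND_NUMBER,      /* an integer constant */
        TOK_KIND_IDENT,       /* a variable name     */
        TOK_KIND_OPERATOR     /* +, *, and so on     */
    } TokenKind;

    typedef struct {
        TokenKind kind;
        int       value;      /* meaningful when kind == TOK_KIND_NUMBER */
        char     *name;       /* meaningful when kind == TOK_KIND_IDENT  */
    } Token;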

The input to LEMON is a file that describes the "grammar" of your language; i.e. what sequences of tokens make up its statements and expressions. You also describe actions to take when the parser recognizes particular sub-expressions; typically, these actions build some kind of tree structure - an "abstract syntax tree" - out of the input tokens.

The descriptions here are more powerful than what (ordinary) regular expressions allow, and that level of power is usually not called for when detecting tokens. In particular, these "context-free grammars" let you define constructs recursively - e.g. an arithmetic expression is made up of smaller arithmetic expressions (each valid on its own), all the way down to simple compositions of values and a single operator.

The output from LEMON is code that looks at the tokens in order, recognizes these structures as they appear, and performs the requested actions as it goes.
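A minimal sketch of such a grammar file for arithmetic expressions, building a tree as described. The ast_number/ast_binop/ast_set_root helpers and the ast.h header are hypothetical - ordinary functions you write yourself (one possible shape follows the next paragraph):

    %token_type   { int }
    %type expr    { Node * }
    %token_prefix TOK_
    %left PLUS.
    %left TIMES.

    %include {
    #include "ast.h"   /* our own AST declarations (hypothetical) */
    }

    program ::= expr(E).                { ast_set_root(E); }
    expr(A) ::= expr(B) PLUS expr(C).   { A = ast_binop('+', B, C); }
    expr(A) ::= expr(B) TIMES expr(C).  { A = ast_binop('*', B, C); }
    expr(A) ::= LPAREN expr(B) RPAREN.  { A = B; }
    expr(A) ::= NUMBER(B).              { A = ast_number(B); }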

The input to that code is a sequence of tokens, and the usual output is the same set of tokens, but rearranged into a tree.
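The tree nodes that those grammar actions build can be very small; a hypothetical ast.h to match the grammar sketch above:

    /* Hypothetical AST node type matching the grammar sketch above. */
    typedef struct Node {
        int          op;      /* '+' or '*' for inner nodes, 0 for leaves */
        int          value;   /* meaningful only for number leaves        */
        struct Node *left;
        struct Node *right;
    } Node;

    Node *ast_number(int value);                /* make a leaf        */
    Node *ast_binop(int op, Node *l, Node *r);  /* make an inner node */
    void  ast_set_root(Node *root);             /* stash the result   */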

Finally, the rest of the code that you write for your compiler ;) takes this tree and uses it to output the compiled code.
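For the calculator example coming up, that last stage can be a short recursive walk over the hypothetical Node type above - a sketch:

    /* Sketch: evaluate the finished tree bottom-up. */
    int eval(const Node *n)
    {
        if (n->op == 0)               /* leaf: a plain number */
            return n->value;
        return n->op == '+' ? eval(n->left) + eval(n->right)
                            : eval(n->left) * eval(n->right);
    }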




The usual "tutorial" example for a lexer and parser-generator combination is a basic calculator. The idea is that your "language" contains only arithmetic operators, parentheses, numbers, and whitespace tokens (which you ignore). The "compiled code" simply consists of the result ;) so the semantic action you take for each sub-expression in the parser is "evaluate the sub-expression and store the result in a number-token" (so that this number-token can be used in recognizing a higher-level sub-expression). The final piece of code that you write takes the tree and (see the LEMON sketch after this list):

- If there's just one node, of number type, then it must be the result, so we output it.
- Otherwise, there must have been something wrong with the input, so we report an error.
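As promised, a sketch of those semantic actions in LEMON. Here the tree is skipped entirely: expressions are given the value type int and every action computes its result on the spot (reusing the assumed TOK_ token names from earlier):

    %token_type   { int }
    %type expr    { int }
    %token_prefix TOK_
    %left PLUS.
    %left TIMES.

    %include {
    #include <stdio.h>
    }

    program ::= expr(E).                { printf("%d\n", E); }  /* "output the result" */
    expr(A) ::= expr(B) PLUS expr(C).   { A = B + C; }
    expr(A) ::= expr(B) TIMES expr(C).  { A = B * C; }
    expr(A) ::= LPAREN expr(B) RPAREN.  { A = B; }
    expr(A) ::= NUMBER(B).              { A = B; }

The error-reporting half of the list comes almost for free: if the tokens don't form a valid expression, the LEMON-generated parser calls the handler you supply via its %syntax_error directive, which is the natural place to complain about bad input.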
