plywood

lex/flex & yacc/lemon


Hi, I'm a senior (in CS) doing a 2-credit independent study this summer and have begun to discuss a "curriculum" layout with my course advisor. He suggested that I do something involving flex and lemon (specifically). His instructions were for me to become familiar with them over the next 3 - 4 weeks and then to propose a project to take on.

Having never worked with these two technologies before, I started by googling them and found that flex is (more or less) a free version of lex, and that lemon is a similar offspring of yacc. The web definition for lex is "a program generator for lexical analysis," and the definition for yacc is "yet another compiler compiler." Since those descriptions aren't very helpful, I continued on, and I believe I have a grasp of what these two systems do (what they are used for). My understanding is that they are essentially given syntactic descriptions of how a language can legally be used, and that they then produce parse trees that compilers use as "models" to determine the validity of statements written in the target language.

(1) How wrong am I in this understanding?
(2) How do lex and yacc work with each other? Is the same true for flex and lemon?
(3) Besides the official websites for each, does anybody have any good suggestions for "further reading" on introductory material?

Thanks,
ply

http://en.wikipedia.org/wiki/LALR_parser
http://en.wikipedia.org/wiki/Finite_state_machine
http://en.wikipedia.org/wiki/Regular_expression
http://en.wikipedia.org/wiki/Lexical_analysis
http://en.wikipedia.org/wiki/Compiler_generator
http://en.wikipedia.org/wiki/Perl

Start with these, then branch out from there.

The reason there's little or no documentation is that this is a very narrow field of CS (albeit a highly useful one for some tasks).

It is a specialization of finite state machines and related fields, and it draws on discrete mathematics, set theory, and other mostly mathematical areas.


But simply put, given:
- an alphabet of symbols
- a set of rules

it is possible to determine whether a given string composed of those symbols is valid according to the rules.

Example:
- The alphabet contains the letters 'A' and 'B'
- Rule #1: a string may only contain repetitions of a single letter

"AAAA", "B", and "BBB" are valid strings;
"", "AB", "AAAAAAAB", and "xyz" are not.

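As a concrete sketch (plain C, written for this explanation; the function name and sample list are made up for illustration), a hand-rolled checker for that single rule could look like this:

#include <stdbool.h>
#include <stdio.h>

/* Returns true for a non-empty string over {A, B} in which every
   letter matches the first one, false otherwise. */
static bool is_valid(const char *s)
{
    if (s[0] != 'A' && s[0] != 'B')      /* must begin with a letter of the alphabet */
        return false;
    for (size_t i = 1; s[i] != '\0'; ++i)
        if (s[i] != s[0])                /* every later letter must equal the first */
            return false;
    return true;
}

int main(void)
{
    const char *samples[] = { "AAAA", "B", "BBB", "", "AB", "AAAAAAAB", "xyz" };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        printf("\"%s\" -> %s\n", samples[i], is_valid(samples[i]) ? "valid" : "invalid");
    return 0;
}

The point of lex/yacc-style tools is that you never write this loop by hand: you write down the alphabet and the rules, and the checker is generated for you.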
Going from here, it's possible to derive a self-describing alphabet and ruleset that allow such checkers to be created automatically; that is exactly what lex/yacc/bison and similar tools do.

A lexer (lexical analyzer) is responsible for reading some input and extracting tokens from it. The lexer's alphabet is, for example, the set of ASCII characters, and its rules describe how those characters may follow one another (regular expressions: "a number is a sequence of the digits 0-9", "a keyword is one of 'begin', 'end', 'for'", ...).

The lexer's output is either an error (the input reached an invalid state) or a stream of tokens (logical, abstract entities).
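To make that concrete, here is a rough flex specification for exactly those example rules (a sketch only; the printed output is made up for illustration, whereas a real scanner would return token codes to a parser):

%option noyywrap
%{
#include <stdio.h>
%}
%%
"begin"|"end"|"for"   { printf("KEYWORD(%s)\n", yytext); }
[0-9]+                { printf("NUMBER(%s)\n", yytext); }
[ \t\r\n]+            { /* ignore whitespace between tokens */ }
.                     { printf("error: unexpected character '%s'\n", yytext); }
%%
int main(void)
{
    return yylex();   /* reads stdin and prints one line per recognized token */
}

Run through flex and compiled, this reads text from standard input and either reports each token it finds or flags an error, which is the lexer's whole job as described above.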

The grammar parser works the same way, but it is fed the tokens produced by the lexer and defines its rules at a much higher level.
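In lemon (one of the two tools the original poster was pointed at), those higher-level rules are written directly in terms of the lexer's tokens. A minimal sketch, with made-up token names, might look like this:

/* expr.y -- run through lemon to generate a C parser (illustrative only) */

%left PLUS.
%left TIMES.

/* The left-hand side of the first rule is the start symbol. */
program ::= expr.

/* An expression is built out of the tokens the scanner hands over. */
expr ::= expr PLUS expr.
expr ::= expr TIMES expr.
expr ::= LPAREN expr RPAREN.
expr ::= NUMBER.

One practical difference between the two families: a yacc-generated parser pulls tokens by calling yylex() itself, while a lemon-generated parser is push-driven, so your scanner calls the generated Parse() function once for every token it finds.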

All of these tools are context independent (context-free). '"Hi", he said' is not context independent: he could have said it emotionally, sarcastically, absent-mindedly, casually, ..., and in each case the response would be different. Almost all computer languages are context independent, which is an important property here: "for" is a for, no matter how or when it's used. The equality/assignment operator in VB could be considered an exception, since it can be either one depending on the context.

This is a somewhat broad topic with plenty of theory behind it. There are also plenty of details and semantics to swallow, which lex/yacc and similar implementations hide from you, but it doesn't hurt to at least look them over.

I did some work with flex + yacc last semester building a compiler. They work this way:

In flex, you specify the symbols (tokens) that make up your expressions and what the generated scanner does when it finds each of them. In yacc, you define a type that is used to keep each expression/symbol's characteristics, such as produced code, text value, and type info. A yacc file contains every rule that makes up your grammar, along with C++ code that is executed at the end of each of those rules, and you define some functions (including main) at the end of that file to make everything happen and produce code for the compiled output (which would be in ASM or some kind of intermediate language, like C-ASM -- in my case, the latter). I can try to explain them in more detail if you want.
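To give a feel for the file layout being described (a sketch in plain C rather than C++, with made-up names; real project files would look different), a yacc file typically has this shape:

/* calc.y -- declarations, then grammar rules with actions, then helper functions */
%{
#include <stdio.h>
int yylex(void);                  /* supplied by the flex-generated scanner */
void yyerror(const char *msg);
%}

%union {                          /* the per-symbol "type" mentioned above */
    int ival;
}
%token <ival> NUMBER
%type  <ival> expr
%left '+'
%left '*'

%%
input : expr                      { printf("result: %d\n", $1); }
      ;
expr  : expr '+' expr             { $$ = $1 + $3; }   /* runs when this rule is reduced */
      | expr '*' expr             { $$ = $1 * $3; }
      | NUMBER                    { $$ = $1; }
      ;
%%

void yyerror(const char *msg) { fprintf(stderr, "error: %s\n", msg); }

int main(void)                    /* the "functions at the end" of the file */
{
    return yyparse();             /* yyparse() calls yylex() whenever it needs a token */
}

A matching flex file would return these token codes (NUMBER via the header that yacc -d generates, '+' and '*' as plain characters), which is how the two tools hook together.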
