Jump to content
  • Advertisement
Sign in to follow this  
chaosgame

Lexical Analyzer

This topic is 4858 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

How would I go about writing my own lexical analyzer. Just something that would just take a std::string, parse it into tokens and dump it in a std::list.

Share this post


Link to post
Share on other sites
Advertisement
Robust lexical analyzers generally use Finite State Automata. Deterministic FSAs, which are processed from Nondeterministic FSAs. The nondeterministic FSA, in turn, is built from regular expressions. Any good book on compilers will teach you how to do these things; alternatively, do a web search for the terms I've capitalized. It's a heady area of computer science, but an accessible one.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
You can write them by hand for simple things. It's really fairly simple. If you want to use more advanced techniques like DFAs etc look at lex/flex and other generators, why reinvent the wheel on something like that?

In the old days, every C programmer knew how to do these simple things. Now in this modern day, everyone wants to use the most esoteric techniques to do a child's job.


Share this post


Link to post
Share on other sites
If you are going to STL, might has well use things like string streams and tokenizers.

But if you want to do it yourself... like Sneftel said, you have to construct the regular expression of each of your token. Then you produce a state machine that will read each character until it finishes reading a token or reaches an error state.

Example, the token for integer numbers. A regular expression could look like this:

(digit)^1+

with digit = (0|1|2|3|4|5|6|7|8|9)

That translates by "an integer is any sequence of at least one digit"

then you can easily write a deterministic FSA to read it:

State 0: On digit, goto State 1
On other, goto error.
State 1: On digit, goto State 1
On other, goto State 2
State 2: Integer read.

There are a lot of tutorial around about FSA and regular expressions. Good luck!

PS: If its not for learning but for a real project,, lex and lex++ are the way to go.

Share this post


Link to post
Share on other sites
We have an article series that covers things like finite state automatat if you're interested in writing those yourself.

Another option in C++ is to use boost::spirit to create a lexer.

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
You can write them by hand for simple things. It's really fairly simple. If you want to use more advanced techniques like DFAs etc look at lex/flex and other generators, why reinvent the wheel on something like that?

In the old days, every C programmer knew how to do these simple things. Now in this modern day, everyone wants to use the most esoteric techniques to do a child's job.
There are many valid reasons for doing it by hand, such as having readable code that fits your style and design. I've yet to see a generator that produced code I could use as anything more than a standalone converter.

Share this post


Link to post
Share on other sites
Quote:
Original post by Extrarius
Quote:
Original post by Anonymous Poster
You can write them by hand for simple things. It's really fairly simple. If you want to use more advanced techniques like DFAs etc look at lex/flex and other generators, why reinvent the wheel on something like that?

In the old days, every C programmer knew how to do these simple things. Now in this modern day, everyone wants to use the most esoteric techniques to do a child's job.
There are many valid reasons for doing it by hand, such as having readable code that fits your style and design. I've yet to see a generator that produced code I could use as anything more than a standalone converter.


Not only that, but for complex grammars auto-generated code can be quite slow compared to a hand written lexer. Even then, by far, the parser in a typical compiler is the most processor hungry part.

And anyway, every real computer scientist needs to know DFAs, NFAs, regex, formal grammars, and how to link them all together. They're essential to formal, theoretical computer science as well as understanding of more complex automata and turing machines, which in turn are essential to the study of algorithms.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by Extrarius
Quote:
Original post by Anonymous Poster
You can write them by hand for simple things. It's really fairly simple. If you want to use more advanced techniques like DFAs etc look at lex/flex and other generators, why reinvent the wheel on something like that?

In the old days, every C programmer knew how to do these simple things. Now in this modern day, everyone wants to use the most esoteric techniques to do a child's job.
There are many valid reasons for doing it by hand, such as having readable code that fits your style and design. I've yet to see a generator that produced code I could use as anything more than a standalone converter.


If you want to waste your time nitpicking about something like that, go ahead.
Many times strtok() suffices. When strtok() fails, there are handrollers. When handrollers fail, there is Lex. When lex fails, you kill yourself.

Not to mention Python, Perl and other scripting languages tend to have a ton of tools to help too that are fast and simple to use.

But rolling the DFA as pure C code is very simple anyways.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by etothex
Quote:
Original post by Extrarius
Quote:
Original post by Anonymous Poster
You can write them by hand for simple things. It's really fairly simple. If you want to use more advanced techniques like DFAs etc look at lex/flex and other generators, why reinvent the wheel on something like that?

In the old days, every C programmer knew how to do these simple things. Now in this modern day, everyone wants to use the most esoteric techniques to do a child's job.
There are many valid reasons for doing it by hand, such as having readable code that fits your style and design. I've yet to see a generator that produced code I could use as anything more than a standalone converter.


Not only that, but for complex grammars auto-generated code can be quite slow compared to a hand written lexer. Even then, by far, the parser in a typical compiler is the most processor hungry part.

And anyway, every real computer scientist needs to know DFAs, NFAs, regex, formal grammars, and how to link them all together. They're essential to formal, theoretical computer science as well as understanding of more complex automata and turing machines, which in turn are essential to the study of algorithms.


You make a good point but for 99% of the time you only need very basic tools.

If the OP wants to write a compiler, than by all means, this advice is good.
I am assuming OP wants practical advice for daily chores, not writing the next C#2++ compiler etc.

Share this post


Link to post
Share on other sites
Quote:
Original post by chaosgame
How would I go about writing my own lexical analyzer. Just something that would just take a std::string, parse it into tokens and dump it in a std::list.


Not sure what you are trying to do, but the easiest way is to parse something is just to use yacc and define your grammer. Its pretty easy. Its not so hard to make your own parser, but the nice thing about yacc is that its very easy to make changes. The only problem with shift/reducing parsing is that its hard to get good error messages.

Of course, parsing begs for a lisp like data structure, its really the easiest way to implement it.



Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

GameDev.net is your game development community. Create an account for your GameDev Portfolio and participate in the largest developer community in the games industry.

Sign me up!