Any good introduction for a noob on parsing scripting languages with Python?

1 comment, last by Alberth 7 years, 1 month ago

I want to start learning how to parse fragments of text (mostly snippets of code in a custom language) using Python. But I'm at a loss, since all the material I find seems quite advanced, or at least its terminology is too foreign to me.

So, is there any "for dummies"-style tutorial on parsing custom languages with Python? Could you point me to any good material? I know there are many packages for parsing (PyParse, modgrammar, etc.), but besides needing some introductory material, I am also not clear on whether they would work with Python 3.6.

Any advice would be welcome. Thank you in advance.


I used Pygments to do some parsing. It's fairly straightforward and will tell you what token each word in a script is. Pygments supports a lot of languages; for example, I used it to figure out the names and arguments of the public functions in an ActionScript 3 class. This took me about 3 hours to write, and most of that time was not spent on integrating Pygments into the script.

Pygments is mostly used for language highlighters in text editors. It also allows you to define the grammar that it matches the string against.
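To make "what token each word is" concrete, here is a minimal sketch of a tokenizer built only on the standard `re` module. It is not Pygments and does not use its API; the token names and the tiny language they describe are made up for illustration.

```python
import re

# Hypothetical token classes for a made-up scripting language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=(),]"),
    ("SKIP", r"\s+"),
]
# One big pattern with a named group per token class.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs; raise ValueError on an unknown character."""
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":  # whitespace is matched but not reported
            yield match.lastgroup, match.group()
        pos = match.end()


for kind, value in tokenize("x = 3 + foo(2)"):
    print(kind, value)
```

Each word or symbol comes out labelled with its kind, which is roughly the information a highlighter needs before it can pick a colour.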

You might want to look at abstract syntax (and abstract syntax trees) to understand the concepts that come up in parsing.
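One quick way to see an abstract syntax tree is Python's own `ast` module. The sketch below parses a Python expression rather than a custom language, so treat it purely as an illustration of the concept; the `show` helper is a hypothetical name.

```python
import ast

# Parse a Python expression (not a custom language) into its syntax tree.
tree = ast.parse("1 + 2 * 3", mode="eval")


def show(node, depth=0):
    """Return an indented outline of the node class names in the tree."""
    lines = ["  " * depth + type(node).__name__]
    for child in ast.iter_child_nodes(node):
        lines.extend(show(child, depth + 1))
    return lines


print("\n".join(show(tree)))
```

In the printed outline the multiplication sits nested beneath the addition, which is how the tree records that `2 * 3` is evaluated first.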

Worked on titles: CMR:DiRT2, DiRT 3, DiRT: Showdown, GRID 2, theHunter, theHunter: Primal, Mad Max, Watch Dogs: Legion

There is "Text Processing in Python", which looks like a good starting point: http://gnosis.cx/TPiP/

I never read it myself, as a glance through the table of contents suggested I already know most of it.

Python versions are not that interesting here; plain string operations haven't changed much. In addition, newer Python versions typically get new things added and very few (if any) things removed, so you should be OK with anything in Python 3. In fact, even 2.6 and 2.7 will mostly work, as these versions already moved a lot towards Python 3. The biggest difference between 2 and 3 is that data "outside" the program (bytes) and data "inside" it (text strings) are now strictly separated. You run into that when reading or writing files, but those are standard patterns, so once you know them, it's no problem.
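A minimal sketch of that standard pattern in Python 3, assuming the files are UTF-8 encoded (the snippet text and the temporary file are made up for the example):

```python
import os
import tempfile

source = 'print "héllo"\n'  # hypothetical snippet in a custom language

# Writing: text mode with an explicit encoding turns str into bytes on disk.
# newline="" switches off newline translation so the bytes are the same on every OS.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", newline="", suffix=".txt", delete=False
) as handle:
    handle.write(source)
    path = handle.name

# Reading in binary mode gives the raw "outside" data: bytes.
with open(path, "rb") as handle:
    raw = handle.read()

# Reading in text mode with an encoding gives the "inside" data: str.
with open(path, "r", encoding="utf-8", newline="") as handle:
    text = handle.read()

os.remove(path)

print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))
```

The byte count is one higher than the character count because "é" takes two bytes in UTF-8. For parsing you normally want to work on the `str` version.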

As for parsers, I always use PLY ( http://www.dabeaz.com/ply/ ), which is "Python Lex & Yacc". Lex and Yacc are the de facto standard tools for writing parsers for production compilers. They are old tools, and the C versions are all over the Internet and in compiler construction books.
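To give a feel for the split between a lexer (the "Lex" part) and grammar rules (the "Yacc" part), here is a small hand-written recursive-descent parser for arithmetic. It deliberately does not use PLY, only the standard library, and the grammar, class, and function names are assumptions made for this sketch. PLY would generate the equivalent machinery from rule definitions instead.

```python
import re

# Grammar (one method per rule):
#   expr   : term (('+' | '-') term)*
#   term   : factor (('*' | '/') factor)*
#   factor : NUMBER | '(' expr ')'


class Parser:
    def __init__(self, text):
        # Lexer step: integers, or any single non-space character.
        self.tokens = re.findall(r"[0-9]+|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        self.pos += 1
        return token

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.term())  # left-associative
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.take()
        if token is not None and re.fullmatch(r"[0-9]+", token):
            return int(token)
        if token == "(":
            node = self.expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


def parse(text):
    """Return a nested-tuple syntax tree for an arithmetic expression."""
    parser = Parser(text)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("1 + 2 * 3"))    # ('+', 1, ('*', 2, 3))
print(parse("(1 + 2) * 3"))  # ('*', ('+', 1, 2), 3)
```

The result is a tree of nested tuples, which a later step could evaluate or translate. Splitting `expr`, `term`, and `factor` into separate rules is what gives `*` and `/` higher precedence than `+` and `-`.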

