Jump to content
  • Advertisement
Sign in to follow this  
jwezorek

writing a scripting language parser in C++

This topic is 3074 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I'm about to write a parser for a scripting language in C++ and was wondering which Boost libraries I can leverage to do this. I know about Boost::Spirit but, honestly, I feel like it's going to take me longer to understand it enough to use it effectively than it is for me to just write a recursive descent parser. If someone would like to talk me out of that belief I'm open to changing my mind. I was wondering what I can use to do tokenization, however. I'm looking at boost::tokenizer but all the example code I've seen is using it as a glorified string split() function. I need to actually tokenize i.e. end up with a vector of token objects that have an enum for the type of token and a possibly null string value. Can boost::tokenizer do this?

Share this post


Link to post
Share on other sites
Advertisement
ANTLR strongly seconded. Parser generators are some of the most well-understood and refined pieces of software that exist today. Rolling a parser by hand anymore is a serious waste of effort, especially for nontrivial grammars.

If you are even passingly familiar with EBNF notation, you can specify the grammar with ease, and then having a parser generator create the code for you is a snap. If you're not familiar with the notation, it's easy to pick up, and learning how to specify grammars isn't very hard.

boost::spirit is a remarkable library, but it just doesn't match up to a good lexer/parser generator. The only reason I would accept for choosing spirit over ANTLR/flex/yacc/bison/et. al. is if you need to heavily develop the grammar and don't want to spend all your time regenerating your parser code. Even then, once the grammar stabilizes a bit, transitioning to a more robust tool is strongly recommended. There quickly comes a point for nontrivial grammars where you start spending long compile times in spirit, at which point you might as well just swap to a full-fledged generator.

Share this post


Link to post
Share on other sites
Quote:

If you are even passingly familiar with EBNF notation, you can specify the grammar with ease, and then having a parser generator create the code for you is a snap.


I must be cursed or something, but I never found this to be the case. I've never once gotten a parser generator to do anything, even just pasting the example into the tool. It's not as though there's a huge userbase either, and the tutorials are pretty much 'paste this code in, and it works!', which is of course useless when it doesn't work.

Not that you shouldn't use a parser generator, but they're not a snap to use in my (perhaps accursed) experience.

Share this post


Link to post
Share on other sites
So ANTLR will generate parsing code in C from a description of the grammar, I get that, but what will be the output of the generated C code when I call it to do parsing? -- that's the part I don't understand. How do you specify the back end?

Share this post


Link to post
Share on other sites
Quote:
Original post by Telastyn
Not that you shouldn't use a parser generator, but they're not a snap to use in my (perhaps accursed) experience.


Yeah, just looking around on the ANTLR site I can tell that it's going to be hard to get set up and doing something. The documentation looks bad, basically. I'm willing to give it a try, however.

The thing is this scripting language that I'm parsing is a newer version of a scripting language that I wrote the parser for 10 years ago -- so if it comes down to it, I have the old code. I mean with minor changes I could just re-use my old code for parsing expressions, for example, which is the hard part; however, it would kind of be ugly to do it that way -- cutting and pasting legacy code --, just wanted to do this cleanly this time.

But I don't have a ton of time for this project. Am already dealing with / getting feedback from the users of the app I'm working on with the scripting language not supported yet, etc ... so we'll see how this plays out.

[Edited by - jwezorek on February 11, 2010 8:10:19 PM]

Share this post


Link to post
Share on other sites
I've basically just finished doing what you propose.

I wanted to make a simple scripting language to describe the creation of a directed acyclic graph and I used flex/bison.

flex/bison might be old but I found them easy to use, they work brilliantly together for lexical analysis and parsing and I had both those two steps done in a day coming from having no background in compilers at all.

http://www.flipcode.com/archives/Implementing_A_Scripting_Engine-Part_1_Overview.shtml

not sure if you've seen this tutorial. I used it for reference.

I also looked at the code for both the pixie and aqsis renderman compliant renderers.
They are both open source and use flex/bison for their implementation of their rsl compilers.

between these three resources and a bit of googling, It was quite straight forward to get the basics working.

I spent much more time on the semantics and code generation than the parsing using these generators.


Share this post


Link to post
Share on other sites
I would STRONGLY, STRONGLY advise against implementing the scripting language yourself unless you have a special reason to do so (such as learning). Lua+Luabind is an insanely powerful and stupid simple VM and scripting language that can do anything you ever need, and is already bug-tested like crazy.

That being said, I recently finished implementing a sorta-simple scripting language in boost::spirit... frankly, it works, and its cool...I think it took me about 95% of the time in spirit as it would take me to implement it in recursive descent, and debugging was hell.

I think either way is a decent approach, but if you arent familiar with template meta-programming and aren't interested in learning, I would just implement it yourself recursive descent.

Share this post


Link to post
Share on other sites
Quote:
Original post by Steve132
I think either way is a decent approach, but if you arent familiar with template meta-programming and aren't interested in learning, I would just implement it yourself recursive descent.


Lua is not an option. I'm actually just writing a parser and serializer not implementing an interpreter. It's kind of hard to explain what I'm doing: I'm writing something that takes a scripting language as input and serializes it to a binary format that is set in stone and is a binary serialization of a scripting language that is slightly different than the one I'm taking as input. I thought this was going to be easier because I thought that the syntax of expressions was the same in both cases. Had that been the case I wouldn't have to parse expressions because in the old format they are serialized as strings, but, as it turns out, expressions are slightly different, so to do this right, I have to parse new style expressions and then serialize them as old-style expressions, if that makes any sense. The cleanest way to do what I have to do then is do a full parse of the new-style script.

I'm not interested in learning, I'm interested in getting this done. I've written parsers for nontrivial languages professionally twice before, both times I rolled my own -- once in Java where, at the time, there were no (good) tools I could have used, and once in C++ for which I could've used lex/yacc but didn't. I'm interested in boost::spirit or ANTLR to the extent that I can get this done faster using them than not.

[Edited by - jwezorek on February 12, 2010 4:33:42 AM]

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!