[C++] (Scanner/Lexer) / Parser

Started by
6 comments, last by hughiecoles 12 years, 8 months ago
I'm familiar with C++, but I admit I'm no pro. I didn't know whether to put this in 'Game Programming' part of the forum or here, but seeing I'm not dealing with things exactly for games, I figured I'd put it here.
I'm relatively new to the whole lexer-parser concept and its uses, but I became quite interested in exactly what sorts of things it can do after stumbling upon a shoot-em-up maker called "Danmakufu".
I've managed to get my eyes on the source-code of this engine, and a part of it consists of a lexer/parser. Anyway, to the meat of my question....
How exactly does this thing work? I understand how to use it, but the inner-workings of this engine proved to be quite difficult for me to put together. And I'm only new to this concept to begin with, so it makes it that much more difficult for me to understand.

Attached is the source code to the script engine I've asked from the author, who sadly can't explain how it works to me in detail in English.
The script engine is a part of the game engine he made (which is free). The idea is, it's a shmup-maker. The whole thing uses DirectX to animate and sound, and the resources are images/sounds provided by the person making the game. The whole thing uses a text document to make sense of these images/sounds to animate the game in almost any way one can imagine.

It uses a C-like language to make sense of everything. However, there are a few differences, such as the way it holds data. The keyword "let", for example, provides the respective variable the capacity of holding an integer, character, string, or a directory (within the engine, of course, since the text document is simply one, long string of characters).
Anyway, I need help deciphering this engine. One thing I'd really like to know in this engine is the use of microthreads/tasks and the use of yield, which I believe to be extremely useful in part of its design.
This is the source code to it. [attachment=4258:ScriptEngine.zip]

Here's what I understand so far.
-The input is a form of string
-The operators/operands/identifiers somehow get converted to tokens by the scanner
-These tokens go through a grammar check and converted into a readable code by the engine
-The grammar of this engine is similar to the C format. It has functions, tasks, allows use of variables, and a unique "yield" feature to suspend certain tasks at any time, and with precision, for later use.
-"MainLoop" is run every cycle in this engine, and the keyword yield appears to suspend that task and look for (and execute) other unfinished tasks.
-The same is true for a yield in any of the other other tasks. More than one yield suspends those tasks for another cycle, or more depending on the number of them.

Can someone break down this engine for me and how it works?
Advertisement
So this looks like a pretty standard hand-rolled recursive descent parser with a limited AST generator.

Basically the thing loops through your code and transitions through a state machine depending on what it finds; each state defines what is valid to come next, thereby specifying the overall grammar of the language. As it goes, it constructs these AST nodes which are basically hand-rolled variant records that can store different types dynamically. Once the AST is built from the code, the execution layer just walks it and does the according actions based on the node types and their contents.

Beyond that... it seems like you already have a pretty good grasp on it. Is there anything in particular which remains mysterious?

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

ApochPiQ,

Thank you for the reply.
Yes, I'm not very familiar with the concept of a state machine and AST generator.
Please elaborate what they are and what they do.
A state machine (more correctly a finite state machine) is simply a set of rules organized around some fairly basic ideas:
  • You can be in precisely one state at any given time
  • A state can transition into any number of other states
  • While you are "in" a state, you act in a particular context
  • Transitions between states represent changes to the context

You've almost certainly run into this concept before, although perhaps not formally. Many programmatic processes are directly isomorphic to finite state machines. Consider a GUI program with three different dialogs, which the user can navigate between, but they only see one dialog at a time. (Installers are a familiar example of this.) You can model this as a FSM where each dialog is one state, and each sensible change to a different dialog is a transition.

Basically any process that changes its behavior based on some contextual information can be considered a form of state machine, although that's stretching the definitions a little bit. FSMs are common in game simulations where an AI agent might do different things based on his current circumstances; a classic example is the three-state "Hunt/Evade/Dead" machine where a creature can swap between pursuing and escaping foes, and eventually will wind up in the "dead" state (from which there are no valid transitions). Pac-Man ghosts are modeled like this, for instance; eat the power pill, and they transition from the hunt to the evade state; gobble the ghost, and they go to dead. (Although in this case they actually transition to "respawning" whereby the eyeballs do their little animation and then the ghost reappears in the corral.)


ASTs (Abstract Syntax Trees) are a family of data structures for representing programs. The canonical example is a simple algebraic expression language: (1 + 2) can be represented as a tree with + as the root, and 1 and 2 as leaves. This means you have an operation (addition) based on two inputs (1 and 2 in this case). You can then recursively construct more sophisticated concepts using this pattern, such as 1 + (2 * 36), and so on.

AST generation is at the heart of compiler and interpreter implementation. It's a hefty subject and in fact merits several university-level courses to really cover in depth. There are many categories of operations that you can do on an AST, such as improving code optimality, and so on; but for the purposes of this program it's basically just a matter of building a data structure that's equivalent to the code but faster to operate on than a string buffer.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

ApochPiQ,

Alright, so I have a basic understanding of what they both do.
But I still don't understand what's going on in this engine as it does what it does.
The root of my question: how does this thing work?

I ran the debugger to try to see how this thing processes, but it seams understanding the structures do is my problem.
It starts at int main() function, as any simple program.
There, it runs across RunSample() function. This function assigns the string as the source input string.
That's about all I know.
Can you help me follow it as what it's doing after that? I want to understand this thing in depth. Is this the right place to ask?
Approaching this from the source is a poor way to do this. I'd recommend you get some theory behind you first, the "dragon book" would be somewhere to start.

One way to implement "yield" is to separate the local state used by a particular task (for example the program counter and stack pointer) from the program data (globals and the heap, or their equivalents). This can then stored it somewhere when "yield" is executed. A top level loop can cycle through all the available tasks, restoring their execution state and running them until they choose to yield.
This frankly isn't the kind of thing to try and learn in 10 minutes on an internet forum; it's a vastly complicated subject and as rip-off and I have both noted it carries a huge amount of studying for people who do it on a regular basis. If you're really interested in taking it apart, you have two options: get a grounding in compiler theory first, then tackle it with those tools; or apply yourself and learn how to read code more effectively. Both options are going to involve a huge amount of work and time.

Honestly this is really, really simple as far as compilers/interpreters go. It took me about 2 minutes of scanning the code to figure it out - but that's because I have a background in language implementation and extensive experience reading other people's code. To get to a point where you can grapple with it as efficiently, you'll have to invest in one skillset (or ideally both).


I don't mean to be obnoxious about it, but please do consider what you're asking: either I have to spend my time reverse engineering this thing, then break it down into pieces you can understand; or I can point you to the resources to let you do it yourself. Me being both a busy guy and pathologically lazy, which do you think I'm going to opt for? wink.gif

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I second the "Dragon Book" recommendation.
--------------------------------------Not All Martyrs See Divinity, But At Least You Tried

This topic is closed to new replies.

Advertisement