How to build a simple text parser

6 comments, last by jniblick 11 years, 7 months ago
I have been working for a little while now on a text based adventure game in the vein of Zork, and I have gotten to the point where I need a text parser. I naturally went online and looked for a tutorial on building one, but I came up short. I was dearly hoping that someone here could give me the basics on building a text parser and help me get my project going. Thank you.
Please be more specific. There is an endless variety of text parsers, working in ways limited only by your imagination, from command-line parsers to rich-text parsers...
_______________________________________________________________________________
CEO & Lead Developer at ATCWARE™
"Project X-1"; a 100% managed, platform-agnostic game & simulation engine

Please visit our new forums and help us test them and break the ice!
___________________________________________________________________________________
I describe basic text parsing here; is that the kind of text parsing you're interested in?
I am looking to build a very simple command-line parser that works like Zork's: it takes input from the player and responds without my having to hard-code every possible outcome. I want something very simple that breaks the string into tokens and then compares those tokens to words the game recognizes. I do not know how to do this.

I naturally went online and looked for a tutorial on building one, but I came up short.


That is likely because what you are looking for is complicated. In short, there is no short and easy answer.

For a very simple text adventure game, you could opt to support a limited subset of input, for example only verb + noun.
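As a rough illustration of the verb + noun idea, here is a minimal Python sketch. The word lists and the name parse_two_words are made up for this example; a real game would supply its own vocabulary.

```python
# Hypothetical vocabulary; a real game would load its own word lists.
VERBS = {"look", "take", "drop", "open", "examine"}
NOUNS = {"sword", "door", "lamp", "key"}


def parse_two_words(line):
    """Return (verb, noun), (verb, None), or None if the input is not understood."""
    tokens = line.lower().split()
    if not tokens or tokens[0] not in VERBS:
        return None
    if len(tokens) == 1:
        return (tokens[0], None)      # bare verb, e.g. "look"
    if len(tokens) == 2 and tokens[1] in NOUNS:
        return (tokens[0], tokens[1])
    return None                       # anything longer is rejected in this sketch
```

Anything that is not exactly one known verb, optionally followed by one known noun, is rejected, which is what keeps this version simple.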

Once you start wanting to parse more robust input, things quickly get more complicated.


> Examine the sword
Which sword, the silver one or the green one?
> The green one
The green sword is very sharp.
> Pick it up and put it on the altar of silver
You pick up the green sword and place it on the Altar of Silver Light.

Notice the disambiguation and handling of ambiguous English in the commands.


>Pick it up and put it on the altar of silver

In the above line alone, the parser has to determine that both of the "it" words refer to the sword just picked up, that "put it on" doesn't mean wear something, that "silver" refers to an object named "Altar of Silver" and not an adjective for the silver sword, etc. To handle this type of complexity, you need a parser that can handle ambiguous grammar, and a system built around it that can select the correct parse tree from a collection of possibilities, or otherwise ask for clarification from the player in order to select the correct tree. You will also need some sort of database of world objects, along with their names and adjectives, including plural forms, together with a system to keep track of item scope (i.e. what objects the player can see, or touch, or pick up).
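To make the object database and scope idea a bit more concrete, here is a small Python sketch of just the disambiguation step. It is nowhere near a full parser: Thing, candidates and resolve are hypothetical names, and it assumes the noun and adjectives have already been picked out of the input.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    name: str                                    # full display name
    noun: str                                    # head noun the player types
    adjectives: set = field(default_factory=set)


def candidates(noun, adjectives, in_scope):
    """Return the in-scope things matching the noun and every given adjective."""
    wanted = set(adjectives)
    return [t for t in in_scope if t.noun == noun and wanted <= t.adjectives]


def resolve(noun, adjectives, in_scope):
    """Return ("ok", thing), ("ask", question) or ("none", message)."""
    found = candidates(noun, adjectives, in_scope)
    if len(found) == 1:
        return ("ok", found[0])
    if not found:
        return ("none", f"You can't see any {noun} here.")
    # More than one match: build a clarifying question from the adjectives.
    labels = ["the " + " ".join(sorted(t.adjectives)) + " one" for t in found]
    return ("ask", f"Which {noun}, " + " or ".join(labels) + "?")
```

With a silver sword and a green sword in scope, resolve("sword", [], scope) comes back as an "ask" result with a "Which sword...?" question, while resolve("sword", ["green"], scope) picks the green one directly. Pronouns like "it" and multi-word names like "altar of silver" are not handled here.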

Zork itself was written in MDL and later in ZIL (Zork Implementation Language). Inform is a later domain-specific language for the same style of game; you can find a lot of info on it here, including a lot of high-level info on its parser:
http://emshort.wordp...signers-manual/

Here are a couple of links to discussions on the subject:
http://www.mud.co.uk...rd/commpars.htm
https://groups.googl...7E/e-xy-z6WRfUJ

Here is another take on tackling the problem:
http://fiziwig.com/intfic/design.html

Finally, a while back I implemented an Earley parser (http://en.wikipedia....i/Earley_parser), which is a type of parser that can handle ambiguous grammar, as an experiment in creating a generalized parser for these kinds of games. I can tell you it was not easy, at least for me, since I have had little experience with the field myself.

EDIT: just saw your clarification on "very simple parser". In this case I wouldn't aim as high as Zork; instead, try to get a verb + noun system to work first.
Perhaps you can, for the sake of learning, abandon the concept of writing an advanced text parser accepting commands in [action][target] form... Perhaps the best way for you to go is to use a "multiple choice" type of system, which means you only have to handle a limited number of cases and user input. You can step things up a notch and "script" your entire game in a basic text file the game just reads... the game knows what to expect as input, things work easily and transparently, everyone wins...
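A "multiple choice" game of this kind could be driven by a small table of scenes, roughly like the Python sketch below. The scene names, the SCENES layout and the choose function are all invented for illustration; the same data could just as well be read from a text file.

```python
# Hypothetical two-scene script; in practice this could be loaded from a file.
SCENES = {
    "cellar": {
        "text": "You are in a damp cellar.",
        "choices": [("Climb the stairs", "hall"), ("Wait", "cellar")],
    },
    "hall": {"text": "Daylight. You made it out.", "choices": []},
}


def choose(scene_id, answer):
    """Return the next scene id, or the current one if the answer is invalid."""
    choices = SCENES[scene_id]["choices"]
    try:
        index = int(answer) - 1        # the player types 1, 2, 3, ...
    except ValueError:
        return scene_id                # not a number: stay where we are
    if 0 <= index < len(choices):
        return choices[index][1]
    return scene_id
```

The game loop then only has to print the scene text, list the numbered choices, and call choose with whatever the player typed.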
Keep it simple for now. Split your dictionary into verbs and objects; you might even want to stick with the typical verbs found in point-and-click adventures.

"Please use the hammer to hit the nail"

First thing you are looking for is a verb, so look for one.

"Please" is useless and not found in the list of verbs. Move on.
"Use" is found, so from now on you search the object list.
"the" is again useless, not found and skipped.
"hammer" is found, so you got your first object.
"hit" is not found in the object list (though it could be in the verb list). Ignore.
"nail" is finally found and your sentence is complete: "use hammer nail"

Of course, the objects might be looked for in the player's inventory or the current scene (a global list would require storing the current location for each object).
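The word-by-word scan above could look something like this in Python. It is only a sketch: the VERBS set and the extract_command name are assumptions, and objects_in_scope stands in for whatever the inventory and current scene contain.

```python
# Hypothetical verb list for the example.
VERBS = {"use", "hit", "look", "examine", "open"}


def extract_command(line, objects_in_scope):
    """Find the first known verb, then collect every known object after it."""
    verb = None
    objects = []
    for raw in line.lower().split():
        word = raw.strip('.,!?"')          # drop surrounding punctuation
        if verb is None:
            if word in VERBS:
                verb = word                # from now on, search the object list
        elif word in objects_in_scope:
            objects.append(word)
        # anything else ("please", "the", a second verb) is skipped
    return verb, objects
```

Calling extract_command("Please use the hammer to hit the nail", {"hammer", "nail"}) gives the verb "use" and the objects "hammer" and "nail", in that order.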

To "resolve" your input, you could for example just use a nested map, so to define the outcome it could be something like:

someMap3["use"]["hammer"]["nail"] = functionToExecute;
someMap3["use"]["nail"]["hammer"] = functionToExecute;

Why both? Because the user could go with "hit nail with hammer" instead.

Also, there would be maps for just verbs ("look") and the most common 2 word inputs ("examine hammer", "open door"). Unless of course you'd rather go with a tree for parsing, which might seem a bit more natural.
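One way to sketch the lookup in Python is shown below. Note that it uses a single dict keyed by tuples rather than separate nested maps for one, two and three words, which is a simplification on my part; the handler functions and their texts are made up.

```python
def look_around():
    return "You are in a dusty workshop."


def examine_hammer():
    return "A worn claw hammer."


def hammer_nail():
    return "You drive the nail into the wall."


# One entry per accepted word order; both orders map to the same handler.
ACTIONS = {
    ("look",): look_around,
    ("examine", "hammer"): examine_hammer,
    ("use", "hammer", "nail"): hammer_nail,
    ("use", "nail", "hammer"): hammer_nail,
}


def execute(verb, objects):
    """Run the handler registered for this verb and object sequence, if any."""
    handler = ACTIONS.get((verb, *objects))
    if handler is None:
        return "You can't do that."
    return handler()
```

Because the key is just the verb followed by the objects, verb-only, two-word and three-word commands all go through the same table.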

Note that this is a very primitive method and you have to be careful not to have multiple objects with the same name. Also, several words should refer to the same verb or object, also easily done with a map.

verbs["hit"] = "use";
verbs["open"] = "use";

That drastically limits your combinations, but also allows generic inputs like "use chest". You can also go with enums for all your verbs and/or objects, changing the maps to:

verbs["hit"] = VERB_USE;
verbs["open"] = VERB_USE;
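In Python the synonym maps with enums might be sketched like this. The Verb enum, the extra synonyms ("mallet", "box") and the function names are assumptions for the example, not part of the scheme above.

```python
from enum import Enum, auto


class Verb(Enum):
    USE = auto()
    LOOK = auto()


# Several typed words collapse onto one canonical verb or object.
VERB_SYNONYMS = {"use": Verb.USE, "hit": Verb.USE, "open": Verb.USE, "look": Verb.LOOK}
NOUN_SYNONYMS = {"hammer": "hammer", "mallet": "hammer", "chest": "chest", "box": "chest"}


def canonical_verb(word):
    """Return the Verb for a typed word, or None if it is unknown."""
    return VERB_SYNONYMS.get(word.lower())


def canonical_noun(word):
    """Return the canonical object name for a typed word, or None if unknown."""
    return NOUN_SYNONYMS.get(word.lower())
```

With this in place, "hit chest", "open box" and "use chest" all reduce to the same pair of Verb.USE and "chest", so only one outcome has to be defined.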

Error messages might not always be useful, especially when things are ambiguous.

"Hammer nail into wall". If hammer isn't a verb, the parsing will fail completely ("What?"). If it is, you might get "Hammer with what?", requiring absurd input like "hammer nail into wall with hammer", which can fail if wall is also recognized ("I can't use wall").

So the first decision: do you want to spend a good bit of time on a clever parser or just make it work?
f@dz http://festini.device-zero.de
I am looking for a basic idea for a command-line text parser. I am really just looking for a basic idea for now, and I will build upon it as necessary. You have all helped me to that extent, and for that I thank you.
