Natural Language Parsing

Started by
21 comments, last by GeniX 24 years ago
Hi,

I had an idea for creating characters in games with whom the player can converse, and I am interested in natural language (NL) parsing for this. If the computer could parse an entered sentence and break it up into parts, it could look up the meaning of each part in a small dictionary in the game engine.

Example: Take the axe, and chop off the goblin's head.

The parser would, in theory, produce something like this (a very simplistic example):

Sentence 1
Do-er: Self
Action: take
Obj: axe

Sentence 2
Do-er: Self
Action: chop
Obj1: axe
Obj2: head
Obj2 Adj: goblin's

So you can see what I'm looking for. Most of you who have played games before will think of some of the later text adventures, where the parsers got pretty damn complex! I just need some pointers to information on how to split up the English language and "parse" it. (Developing a mini scripting language similar to Prolog may be the key to allowing easy but configurable language rules.)

regards, GeniX
regards, GeniX (www.cryo-genix.net)
I would suggest an introductory book on linguistics, in the area of syntax (maybe with semantics). It should include references that break English into a parse tree, much like this:

Phrase = Subject, Verb, Object, (Conjunction, Phrase)*

You will likely want to take a reasonable subset of the complete rule list, since there are several deviations from the standard pattern and English has some anomalies.

Be aware that if you wish to port the engine to different languages these rules will change.
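A rule like the one above can be turned into code fairly directly. Here is a minimal Python sketch, assuming a tiny made-up lexicon (the word sets and the function name `parse_phrases` are hypothetical, not from any library) and assuming every phrase is exactly three words:

```python
# Hypothetical toy lexicon; a real game would load a much larger one.
SUBJECTS = {"i", "you", "goblin"}
VERBS = {"take", "chop", "see"}
OBJECTS = {"axe", "head", "door"}
CONJUNCTIONS = {"and", "then"}


def parse_phrases(text):
    """Parse 'Subject Verb Object (Conjunction Subject Verb Object)*'."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    phrases = []
    pos = 0
    while True:
        if len(words) - pos < 3:
            raise ValueError("incomplete phrase")
        subject, verb, obj = words[pos:pos + 3]
        if subject not in SUBJECTS or verb not in VERBS or obj not in OBJECTS:
            raise ValueError(f"cannot parse: {subject} {verb} {obj}")
        phrases.append((subject, verb, obj))
        pos += 3
        if pos == len(words):
            return phrases
        # Anything after a complete phrase must be a conjunction.
        if words[pos] not in CONJUNCTIONS:
            raise ValueError(f"expected a conjunction, got {words[pos]!r}")
        pos += 1
```

For example, `parse_phrases("I take axe and you chop head")` yields one (subject, verb, object) tuple per phrase. Articles, adjectives and implied subjects are deliberately left out of this sketch.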
Hmm... the subject of imperative sentences is implicitly "you", so you'd have to treat those sentences as if the person were talking to the computer. Same overall effect; I just have a grammar thing.

In imperative sentences the verb always comes first ("Look out!", "Give me that."), unless the sentence is negative, in which case look out for "do not" and "don't". I doubt people would type things like "Don't walk on the lava" in a video game, though. For merely controlling the player character (PC), this expectation of sentence structure is sufficient, but actually conversing with NPCs may be quite an impractical feature for a video game. If you're willing to tackle it, you should look at the source of other language parsers to see how they split up the language.
http://ciips.ee.uwa.edu.au/~hutch/hal/
MegaHAL might have its source available.
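The imperative rule described above (implied subject "you", verb first, optional leading negation) could be sketched like this in Python. The function name and the returned dict layout are my own assumptions for illustration, and this is unrelated to how MegaHAL works:

```python
def parse_imperative(sentence):
    """Return the implied subject, the verb, the remaining words, and negation."""
    words = sentence.lower().strip(" .!?").split()
    negated = False
    # A negative imperative starts with "do not" or "don't".
    if words[:2] == ["do", "not"]:
        negated, words = True, words[2:]
    elif words[:1] in (["don't"], ["dont"]):
        negated, words = True, words[1:]
    if not words:
        raise ValueError("no verb found")
    # In an imperative, the first remaining word is taken to be the verb.
    return {"subject": "you", "verb": words[0], "rest": words[1:], "negated": negated}
```

It does not check that the first word really is a verb; that would need a dictionary lookup on top.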
I was looking more to break up sentences typed to a "simulated computer", which would then respond.

It would make RPGs and such a lot more interesting.

Jason Hutchens' page with MegaHAL didn't impress me that much. His target is a more general conversation bot, oriented more toward generating sentences. I'm looking more to give an illusion of understanding in a specific situation.

e.g. Innkeeper vs. Player

Player: Hi there.
Inn: Good morning, sir!
Player: May I get a room?
Inn: We have 4 rooms free at 8 each.
Player: Give me a room.
Inn: Here you go, sir. Room number 5.

--

Breaking up the player's sentence would help the NPC get a better idea of which verb references which object, and so on. Then the innkeeper could appear to "understand" the player. Of course, when the player asks the innkeeper something totally unrelated, the result would be some kind of "I don't know what you're on about" response.

If anyone knows of web sites that have info on such topics, please post here.
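To make the innkeeper idea concrete, here is a rough Python sketch of the simplest version: keyword rules with a fallback reply. The rule table and the function name `innkeeper_reply` are hypothetical, and this matches keywords only; it does not yet work out which verb refers to which object:

```python
import string

# Hypothetical knowledge base for one NPC: required keywords -> canned reply.
# More specific rules come first, because the first match wins.
INNKEEPER_RULES = [
    ({"hi"}, "Good morning, sir!"),
    ({"hello"}, "Good morning, sir!"),
    ({"give", "room"}, "Here you go, sir. Room number 5."),
    ({"room"}, "We have 4 rooms free at 8 each."),
]
FALLBACK = "I don't know what you're on about."


def innkeeper_reply(player_line):
    """Return the reply of the first rule whose keywords all appear in the line."""
    cleaned = player_line.lower().translate(str.maketrans("", "", string.punctuation))
    words = set(cleaned.split())
    for keywords, reply in INNKEEPER_RULES:
        if keywords <= words:  # every keyword of the rule is present
            return reply
    return FALLBACK
```

Anything the rules don't cover falls through to the "no idea" response, which is exactly the behaviour described for unrelated questions.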



regards,

GeniX
You need to consider generating a 'dictionary' for your game, of all the verbs, nouns, adverbs, adjectives, etc. This allows your parser to make a much better guess at which parts of the sentence are what. You can then strip out the fluff (generally articles like 'the' or 'a'), validate the sentences (certain verbs need no object, some need one, some also need an indirect object), and map the result to some sort of look-up table tied to your game logic, usually some sort of NPC knowledge base. Asking questions requires a different sentence structure (and therefore a slightly different parsing order) than making statements, so checking for a question first could simplify things.
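Here is a small Python sketch of those steps (dictionary lookup, stripping articles, checking how many objects a verb needs). All the tables and the function name `validate_command` are made up for illustration:

```python
# Hypothetical game dictionary: word -> part of speech.
DICTIONARY = {
    "look": "verb", "take": "verb", "give": "verb",
    "axe": "noun", "sword": "noun", "goblin": "noun", "me": "noun",
}
ARTICLES = {"the", "a", "an"}
# How many objects each verb requires: 0 = none, 1 = direct, 2 = direct + indirect.
VERB_OBJECTS = {"look": 0, "take": 1, "give": 2}


def validate_command(text):
    """Strip articles, look up each word, and check the verb has the objects it needs."""
    words = [w for w in text.lower().strip(" .!?").split() if w not in ARTICLES]
    unknown = [w for w in words if w not in DICTIONARY]
    if unknown:
        raise ValueError(f"unknown words: {unknown}")
    if not words or DICTIONARY[words[0]] != "verb":
        raise ValueError("expected the command to start with a verb")
    verb = words[0]
    nouns = [w for w in words[1:] if DICTIONARY[w] == "noun"]
    if len(nouns) != VERB_OBJECTS[verb]:
        raise ValueError(f"'{verb}' needs {VERB_OBJECTS[verb]} object(s), got {len(nouns)}")
    return verb, nouns
```

The returned (verb, nouns) pair is what you would then feed into the game's look-up table. Prepositions and questions are not handled in this sketch.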

Also, adjectives and nouns don't necessarily have to be treated separately, as both are just ways to distinguish one item from others. The noun specifies the type and the adjective specifies appearance and so on, but as long as the words single out that item, you have as much info as you need.

Look up recursive descent parsing. It's more commonly used for interpreting scripting languages, but you should be able to use some of the ideas for parsing English or any other language.
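As a rough illustration of recursive descent applied to game commands, here is a Python sketch with one method per grammar rule. The grammar, the item table and the `Parser` class are assumptions made up for this example. It also treats adjectives and nouns alike, as plain descriptor words that narrow down which item is meant:

```python
# Hypothetical world items, each described by a set of words. Adjectives and
# nouns are treated alike: both simply narrow down which item is meant.
ITEMS = {
    "rusty_axe": {"rusty", "axe"},
    "golden_axe": {"golden", "axe"},
    "golden_key": {"golden", "key"},
}
VERBS = {"take", "drop"}
ARTICLES = {"the", "a", "an"}


class Parser:
    def __init__(self, text):
        self.words = text.lower().strip(" .!?").split()
        self.pos = 0

    def peek(self):
        return self.words[self.pos] if self.pos < len(self.words) else None

    def command(self):
        # command := VERB noun_phrase
        verb = self.peek()
        if verb not in VERBS:
            raise ValueError(f"expected a verb, got {verb!r}")
        self.pos += 1
        return verb, self.noun_phrase()

    def noun_phrase(self):
        # noun_phrase := [ARTICLE] descriptors
        if self.peek() in ARTICLES:
            self.pos += 1
        candidates = self.descriptors(set(ITEMS))
        if len(candidates) != 1:
            raise ValueError(f"ambiguous or unknown item: {sorted(candidates)}")
        return candidates.pop()

    def descriptors(self, candidates):
        # descriptors := WORD descriptors | WORD
        word = self.peek()
        if word is None:
            raise ValueError("expected an item description")
        self.pos += 1
        narrowed = {name for name in candidates if word in ITEMS[name]}
        if self.peek() is None:
            return narrowed
        return self.descriptors(narrowed)
```

`Parser("take the rusty axe").command()` resolves to the rusty axe, while "take the axe" is rejected as ambiguous because two items match. A real game would probably ask the player which one they meant instead of raising an error.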
There's one thing that no one has thought of yet: spelling. It wouldn't look too realistic if the player said "Giv the axe to me" and the keeper said "I have no idea what you're talking about". Once you have a complete dictionary, I suggest that you modify it slightly to include slightly misspelt words, but giv (sorry) them the same meaning as the correct ones.

Don't bother with it now; add it once you have the rest of the game working.
Regarding misspelt words, you could use some heuristic to compare an unknown word with all the words in the dictionary and try to guess the meaning. You could cut down the search by working out the type of word you're expecting (verb, noun, etc.) and searching those word lists first.
So if "Give me a sord" matches a verb phrase best, then the computer knows that "sord" is most likely a noun. It searches the noun list, comparing word length, letter order and word fragments, and hopefully comes up with "sword" as the most likely answer.
You can then use the fact that the word was misspelt to ask the player a question: "Did you say you wanted a sword?"
None of the above is difficult; it's just a matter of getting the mix right.
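One way to try this out quickly in Python is the standard library's `difflib.get_close_matches`, which scores similarity with a sequence-matching ratio. That is a stand-in for the length/ordering/fragment heuristic described above, not the same thing. The word lists and the function name `guess_word` are hypothetical:

```python
import difflib

# Hypothetical word lists, grouped by part of speech.
WORD_LISTS = {
    "verb": ["give", "take", "chop", "look"],
    "noun": ["sword", "axe", "shield", "goblin"],
}


def guess_word(word, expected_type):
    """Return (best_match, was_corrected); best_match is None if nothing is close."""
    candidates = WORD_LISTS[expected_type]  # only search the expected word type
    if word in candidates:
        return word, False
    matches = difflib.get_close_matches(word, candidates, n=1, cutoff=0.6)
    return (matches[0], True) if matches else (None, False)
```

When `was_corrected` comes back true, the NPC can ask the "Did you say you wanted a sword?" question rather than silently accepting the guess. The 0.6 cutoff is just the library default and would need tuning.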

Also, it would be useful to have many different examples of each phrase type, from
"I would like a sword"
to
"Ug want sword"
As long as these don't conflict with or get confused with other phrases.
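A quick Python sketch of the "many examples per phrase type" idea, using a list of regular expressions that all map to the same meaning. The patterns and the function name `wanted_item` are made up for illustration:

```python
import re

# Hypothetical patterns that all express "the player wants <item>".
WANT_PATTERNS = [
    re.compile(r"^i would like an? (?P<item>\w+)$"),
    re.compile(r"^(?:please )?give me an? (?P<item>\w+)$"),
    re.compile(r"^ug want (?P<item>\w+)$"),
]


def wanted_item(text):
    """Return the requested item name, or None if no pattern matches."""
    cleaned = text.lower().strip(" .!?")
    for pattern in WANT_PATTERNS:
        match = pattern.match(cleaned)
        if match:
            return match.group("item")
    return None
```

Keeping each intent's patterns in its own list makes it easier to spot when a pattern for one phrase type would also match another.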

Mike
Another thing to remember: most people like to use pronouns in communication. They are easier to type than always using the proper, longer noun. Also, your NPC needs to remember the things that were said to it, at least for a short time.

I had an old text adventure game that used natural language very well. It understood context when it came to individual words. Some words can be either a verb or a noun. An example is the word "check": "The check is in the mail" and "I will check the mail". The game understood the difference between these two sentences. Any dictionary you make would have to include how each word can be used, as well as the definition for each use.
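I don't know how that game did it, but here is a guess at the simplest approach in Python: store every possible use of a word in the dictionary and let the previous word decide. The lexicon, the rules and the function names are all assumptions for illustration:

```python
# Hypothetical lexicon: each word maps to every part of speech it can take.
LEXICON = {
    "check": {"noun", "verb"},
    "mail": {"noun", "verb"},
    "the": {"article"},
    "i": {"pronoun"},
    "will": {"auxiliary"},
    "is": {"verb"},
    "in": {"preposition"},
}


def tag_word(words, index):
    """Pick a part of speech for words[index] using the previous word as context."""
    options = LEXICON[words[index]]  # raises KeyError for words not in the lexicon
    if len(options) == 1:
        return next(iter(options))
    previous = LEXICON.get(words[index - 1], set()) if index > 0 else set()
    if "article" in previous and "noun" in options:
        return "noun"  # e.g. "the check"
    if previous & {"auxiliary", "pronoun"} and "verb" in options:
        return "verb"  # e.g. "will check"
    # Fallback guess (the ambiguous words here are all noun/verb):
    # a sentence-initial word is treated as an imperative verb.
    return "verb" if index == 0 else "noun"


def tag_sentence(text):
    words = text.lower().strip(" .!?").split()
    return [(word, tag_word(words, i)) for i, word in enumerate(words)]
```

With this, "check" comes out as a noun in "The check is in the mail" and as a verb in "I will check the mail".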

Making friends one burger at a time.
Thanks.

Despite the lack of references, the postings have been useful.

I had not considered spelling mistakes :-)

A dictionary of words, associated either with actions or with objects in the game world, would of course be a necessity.

Maybe the language 'rules' shouldn't even be too strict. Then if a sentence is entered with slightly incorrect grammar, the parser would try to find the closest matching rule.

A recent 'history' is also a must. It would be nice if the NPC could not only match a pronoun to the last object spoken about (with gender), within a short period of time of course, but also use pronouns in its own responses.
That may seem more realistic.
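A rough Python sketch of such a history: remember recent mentions, forget them after a few turns, and resolve a pronoun to the most recent entity that fits it. The entity table and the `ConversationMemory` class are hypothetical:

```python
from collections import deque

# Hypothetical entity table: name -> pronoun used to refer to it.
ENTITY_PRONOUNS = {"sword": "it", "innkeeper": "him", "queen": "her"}


class ConversationMemory:
    """Remember recently mentioned entities so pronouns can be resolved."""

    def __init__(self, max_turns=3):
        self.max_turns = max_turns
        self.turn = 0
        self.mentions = deque()  # (turn, entity) pairs, newest last

    def next_turn(self):
        self.turn += 1
        # Forget mentions older than max_turns.
        while self.mentions and self.turn - self.mentions[0][0] > self.max_turns:
            self.mentions.popleft()

    def mention(self, entity):
        self.mentions.append((self.turn, entity))

    def resolve(self, pronoun):
        # Newest matching mention wins.
        for _, entity in reversed(self.mentions):
            if ENTITY_PRONOUNS.get(entity) == pronoun:
                return entity
        return None
```

The same table could be used in the other direction, so the NPC says "it" instead of repeating "the sword" in its reply.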

Still, does anyone have any texts/URLs which may help me out with this kind of stuff?




regards,

GeniX
Andre LaMothe has written some stuff on this topic. The articles can be found with his latest book, Tricks of the Windows Game Programming Gurus, and with the older Teach Yourself Game Programming in 21 Days. I have used his solutions in a couple of very different contexts and they work well. I don't know if the articles are available separately from the books, however. If you are stuck finding them, I think I have one in HTML that I could email to you. Just mail me @ rjbianco@home.com if you want it...

<(o)>
<(o)>

This topic is closed to new replies.
