English Language Processing?

16 comments, last by Rhalin 18 years, 6 months ago
Progress on my AI bot is continuing. I now have a functional IRC client system and other basic parsing facilities. The bot can connect to an IRC channel, send and receive messages, etc. The challenge now is to process the English language.

I am thinking of basing it on a "knowledge base". This would be my equivalent of a dictionary, but with logical links between the various items stored in it. Internally, it would be implemented as a std::multimap, where the keys are strings (the words) and the mapped elements are "knowledge elements". This would limit the knowledge base to the amount of RAM I have, but since I currently have 2GB, and don't plan on expanding this much further, it should be fine.

As for parsing, as I stated before, the bot will expect you to address it by its name, which is "Zee". So whenever someone says "Zee, <something>", the bot will attempt to parse that something. To make the processing simpler, the received sentence will be converted to lowercase and split into tokens (words). It will then be possible to match these words in the knowledge base.

This is where I am getting to the challenging part. I'm wondering if I should go with a purely grammatical approach. I'm thinking of classifying sentences based on type, where the relevant types would be a question, a statement (e.g. "Cats are animals"), or an order. But I know there can be some quite complex sentences in English, which are composed of many phrases. I'm wondering if I should take that into consideration as well, and how to go about things. The problem of identifying sentence types isn't that obvious either (there are *lots* of ways to phrase a question).

Has anybody got some experience with this? Does anyone want to give some insight or feedback?
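To make the plan above concrete, here is a minimal sketch of the addressing check, the lowercase tokenizer and the std::multimap lookup. The names `KnowledgeElement`, `tokenize`, `tokensIfAddressed` and `lookup` are made up for illustration, and the shape of a "knowledge element" (a relation plus a target word) is purely an assumption, since the post does not define one.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical "knowledge element": a labelled link to another word.
struct KnowledgeElement {
    std::string relation;  // e.g. "is-a"
    std::string target;    // e.g. "animals"
};

using KnowledgeBase = std::multimap<std::string, KnowledgeElement>;

// Return a lowercase copy of the text (ASCII only).
std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Lowercase the text, turn punctuation into spaces, and split on whitespace.
// Note that this also splits contractions ("don't" becomes "don" and "t").
std::vector<std::string> tokenize(const std::string& text) {
    std::string cleaned = toLower(text);
    for (char& c : cleaned) {
        if (std::ispunct(static_cast<unsigned char>(c))) c = ' ';
    }
    std::istringstream stream(cleaned);
    std::vector<std::string> tokens;
    std::string word;
    while (stream >> word) tokens.push_back(word);
    return tokens;
}

// If the first word is the bot's name, return the remaining tokens;
// otherwise return an empty vector (the bot was not addressed).
std::vector<std::string> tokensIfAddressed(const std::string& message,
                                           const std::string& botName) {
    std::vector<std::string> tokens = tokenize(message);
    if (tokens.empty() || tokens.front() != toLower(botName)) return {};
    tokens.erase(tokens.begin());
    return tokens;
}

// Collect every knowledge element stored under a word.
std::vector<KnowledgeElement> lookup(const KnowledgeBase& kb, const std::string& word) {
    std::vector<KnowledgeElement> found;
    auto range = kb.equal_range(word);
    for (auto it = range.first; it != range.second; ++it) found.push_back(it->second);
    return found;
}
```

Because punctuation is stripped before the name check, this sketch also accepts "Zee cats are animals" without the comma, which may or may not be what you want.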

Looking for a serious game project?
www.xgameproject.com
good luck to you.

many have tried and failed.
-

I think your best bet is to create a bot that learns from what others say (from scratch, preferably). Your bot can then learn how words are connected to one another. For example, if someone says "hello my name is tom" and then says "hi my name is tom", you can probably approximate that 'hello' and 'hi' mean the same thing. But my knowledge is not so advanced when it comes to natural language processing.
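One rough way to sketch the "hello"/"hi" idea: blank out each word of a sentence in turn, and remember which words have been seen filling the same blank. The class name `ContextLearner` and the "_" placeholder are invented for this example, and sharing a single context is a very weak hint rather than proof that two words mean the same thing.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Remembers, for each "sentence with one word blanked out",
// which words have been seen in the blank.
class ContextLearner {
public:
    // Record one sentence (assumed to be already lowercased).
    void observe(const std::string& sentence) {
        const std::vector<std::string> words = split(sentence);
        for (std::size_t i = 0; i < words.size(); ++i) {
            std::string context;
            for (std::size_t j = 0; j < words.size(); ++j) {
                if (j > 0) context += ' ';
                context += (j == i) ? std::string("_") : words[j];
            }
            fillers_[context].insert(words[i]);
        }
    }

    // True if both words have filled the same blank at least once.
    bool maybeSimilar(const std::string& a, const std::string& b) const {
        for (const auto& entry : fillers_) {
            if (entry.second.count(a) > 0 && entry.second.count(b) > 0) return true;
        }
        return false;
    }

private:
    static std::vector<std::string> split(const std::string& text) {
        std::istringstream stream(text);
        std::vector<std::string> words;
        std::string word;
        while (stream >> word) words.push_back(word);
        return words;
    }

    std::map<std::string, std::set<std::string>> fillers_;
};
```

After observing the two example sentences, "hello" and "hi" share the context "_ my name is tom", so `maybeSimilar("hello", "hi")` would report a possible match. A real bot would want many shared contexts before trusting it.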

The real problem starts when you try to give meaning to words or phrases :P
-www.freewebs.com/tm1rbrt -> check out my gameboy emulator ( worklog updated regularly )
I agree with ErUs.
It sounds like a great way to implement AI. In fact, I once tried it (but not online), but I never stuck with it. :(
Anyway, about the meaning of the words and phrases: why not try to implement those the way the feelings of characters are implemented in games such as "The Sims"?
Define a few main feelings and assign them to the words. Perhaps let the online users help you fill the feelings in. Then have the bot feel one feeling more than another, and change its "feelings" according to the words it recognizes.

It might not be exactly what you asked, but it could make a great AI game :)
Don't have one, sorry.
Well, first I would reorder the sentence into a standard format, deleting all extra parts. For example, take "The dog walked down the street". This could be shortened to "Dog walked down street". We have the main noun at the start of the sentence, followed by a verb. If we had "The street is where the dog walked", then we would need to reorder it into "Dog walked down street".
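The "deleting all extra parts" step is the easy half, and could be sketched like this. The function name `dropFillerWords` and the tiny filler list are assumptions for the example. The reordering half ("The street is where the dog walked") needs real grammar rules and is not attempted here.

```cpp
#include <set>
#include <string>
#include <vector>

// Remove words that carry little meaning for a simple bot.
// The list is a made-up starting point, not a complete set.
std::vector<std::string> dropFillerWords(const std::vector<std::string>& tokens) {
    static const std::set<std::string> fillers = {"the", "a", "an"};
    std::vector<std::string> kept;
    for (const std::string& token : tokens) {
        if (fillers.count(token) == 0) kept.push_back(token);
    }
    return kept;
}
```

This expects tokens that are already lowercased and split, so "the dog walked down the street" would come out as "dog walked down street".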

Perhaps you could set up some kind of "grammar rules" file which would tell the AI how to order a sentence. After you have it ordered a certain way, you can start doing stuff with it. For example, if you KNOW for a fact that the first word is the noun, then you can figure out a response.

Although this doesn't do much good without some kind of AI backing it up. How do you formulate a response? I guess one way would be for the AI's goal to be to gather as much information as it can, i.e. a one-way conversation ("What color is your dog?", "What is your dog's name?"), so that it just asks questions based on what people say.
This is a very big, very HARD field. Good luck!

I can give you some google words to search for:

ATN
FSM
Statistical Language Parsing
NLP

Python has an NLP toolkit
Perl has many linguistics tools and many many tools to help you in CPAN.
Prolog is a pretty darn good language for this stuff
Lisp and Scheme also excel at this.

I'd avoid C++ for something like this if you can help it.
"It's such a useful tool for living in the city!"
I'm surprised that everyone is making this out to be really hard - it's true that to get 100% accuracy is impossible with current technology, but parsing the average English sentence is actually pretty easy, and computer games running in 64Kb were doing a good job of this 20 years ago.

I would advise creating a vocabulary database of verbs, nouns, adjectives, etc. You may even be able to download such a thing somewhere. Then you can mark each word in a sentence with the appropriate type, and that then allows you to determine the sentence structure. You could even do this with Flex/Bison. All you have to do is put in a decent grammar definition, which shouldn't take long if you have plenty of training data - in your case, humans talking to it. Just log the sentences Zee couldn't understand and adjust the grammar or vocabulary accordingly.
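As a hedged sketch of the tagging step described above (not a Flex/Bison grammar): look each word up in a vocabulary, build a structure string from the word types, and collect the unknown words so they can be logged. `TaggedSentence`, `tagSentence` and the tag names are invented for this example, and one tag per word is a simplification, since many English words can be both a noun and a verb.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical vocabulary: word -> word type such as "NOUN" or "VERB".
using Vocabulary = std::map<std::string, std::string>;

struct TaggedSentence {
    std::string pattern;               // e.g. "NOUN VERB NOUN"
    std::vector<std::string> unknown;  // words missing from the vocabulary, for logging
};

// Replace each token with its word type; unknown words become "?".
TaggedSentence tagSentence(const std::vector<std::string>& tokens,
                           const Vocabulary& vocabulary) {
    TaggedSentence result;
    for (const std::string& token : tokens) {
        if (!result.pattern.empty()) result.pattern += ' ';
        auto entry = vocabulary.find(token);
        if (entry != vocabulary.end()) {
            result.pattern += entry->second;
        } else {
            result.pattern += "?";
            result.unknown.push_back(token);
        }
    }
    return result;
}
```

The resulting pattern string could then be compared against a list of known sentence structures, and anything left in `unknown` is a candidate for the "adjust the vocabulary" loop.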
Max payne, continuing with my help with zee....


You can keep track of the "Subject" of the sentence. When it is you, talk until it is one of the other people. Then wait for it to become you again, or you "timeout" (make comments about things after a period of activity, or inactivity).

I.e. when people are talking a lot, but not about you, after maybe 15 secs to 1 min just find a random "high probability of making a successful response" sentence and respond to it to *poke* yourself into the convo.
You then set the subject to yourself.

Also, after a while, fire off a random "conversation starter" like "Who's here?", a random joke, etc.

Just remember to eventually leave if nothing happens in maybe 5 mins.

Eg. Hi Zee.
Subject Zee, so talk

How are you?
Subject Zee (nothing changed) talk

Hmmm. Great
Subject Zee (nothing changed). talk

Dazza how are ya
Subject Dazza. be quiet and listen. (and log).
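The example above could be sketched roughly as follows. `SubjectTracker` is a made-up name, the bot name and nicks are assumed to be given in lowercase, and the timeouts and the "poke into the convo" behaviour are left out.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Tracks who the conversation is currently aimed at.
class SubjectTracker {
public:
    // botName and knownNicks are expected in lowercase.
    SubjectTracker(std::string botName, std::set<std::string> knownNicks)
        : botName_(std::move(botName)), knownNicks_(std::move(knownNicks)) {}

    // Update the subject from one message; returns true if the bot should talk.
    bool update(const std::string& message) {
        // Lowercase the message and turn punctuation into spaces.
        std::string cleaned;
        for (unsigned char c : message) {
            cleaned += std::ispunct(c) ? ' ' : static_cast<char>(std::tolower(c));
        }
        // The first name mentioned becomes the new subject;
        // if no name is mentioned, the subject stays as it was.
        std::istringstream stream(cleaned);
        std::string word;
        while (stream >> word) {
            if (word == botName_ || knownNicks_.count(word) > 0) {
                subject_ = word;
                break;
            }
        }
        return subject_ == botName_;
    }

    const std::string& subject() const { return subject_; }

private:
    std::string botName_;
    std::set<std::string> knownNicks_;
    std::string subject_;  // empty until someone is named
};
```

Fed the four lines from the example in order, this says "talk" for the first three and "be quiet" once Dazza is named.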


Pretty much, what you need is something that takes a sentence and a rule, and changes some variables.

ie.

I love school
[subject] love [object]

This changes [subject] to be I (which is replaced with the user's nick) and [object] to be school.
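A minimal sketch of that "sentence plus rule" matcher, assuming each [slot] captures exactly one word and the sentence is already lowercased. `matchRule`, `splitWords` and `Bindings` are names invented for the example.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Slot name -> captured word, e.g. "object" -> "school".
using Bindings = std::map<std::string, std::string>;

// Split on whitespace.
std::vector<std::string> splitWords(const std::string& text) {
    std::istringstream stream(text);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word) words.push_back(word);
    return words;
}

// Match a sentence against a rule such as "[subject] love [object]".
// On success, fills `bindings` and returns true; "i" is replaced by the
// speaker's nick. On failure, `bindings` is left untouched.
bool matchRule(const std::string& rule, const std::string& sentence,
               const std::string& speakerNick, Bindings& bindings) {
    const std::vector<std::string> pattern = splitWords(rule);
    const std::vector<std::string> words = splitWords(sentence);
    if (pattern.size() != words.size()) return false;

    Bindings found;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const std::string& part = pattern[i];
        const bool isSlot =
            part.size() > 2 && part.front() == '[' && part.back() == ']';
        if (isSlot) {
            const std::string name = part.substr(1, part.size() - 2);
            found[name] = (words[i] == "i") ? speakerNick : words[i];
        } else if (part != words[i]) {
            return false;  // a literal word in the rule did not match
        }
    }
    bindings = found;
    return true;
}
```

With a hypothetical nick "maxpayne", matching "i love school" against "[subject] love [object]" would bind subject to "maxpayne" and object to "school". Multi-word objects ("i love my school") would need slots that can capture more than one word.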

Czar_Botmaster (still willing to talk. You haven't been on recently...)
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Quote:Original post by Kylotan
I'm surprised that everyone is making this out to be really hard - it's true that to get 100% accuracy is impossible with current technology, but parsing the average English sentence is actually pretty easy, and computer games running in 64Kb were doing a good job of this 20 years ago.

I would advise creating a vocabulary database of verbs, nouns, adjectives, etc. You may even be able to download such a thing somewhere. Then you can mark each word in a sentence with the appropriate type, and that then allows you to determine the sentence structure. You could even do this with Flex/Bison. All you have to do is put in a decent grammar definition, which shouldn't take long if you have plenty of training data - in your case, humans talking to it. Just log the sentences Zee couldn't understand and adjust the grammar or vocabulary accordingly.


Parsing is one thing. Understanding what is parsed is another thing.

You can use WordNet for a vocabulary/dictionary tool. Your parser description.. it is a very sucky way to do it but it is easy and it will work for trivial things. I would mention Prolog again.. but so be it.

Have fun.

"It's such a useful tool for living in the city!"
I played with brainhat a bit some time ago; maybe it's worth a look.
Quote:Original post by Name_Unknown
Parsing is one thing. Understanding what is parsed is another thing.


Agreed.

From Wikipedia:
"Chomsky argues that modelling some aspects of human language requires a more complex formal grammar (as measured by the Chomsky hierarchy) than modeling others. For example, while a regular language is powerful enough to model English morphology, it is not powerful enough to model English syntax"

And therein lies the problem. It is still possible to do some minimal parsing for keywords however. The name escapes me now, but there was a program first written on the BBC-B computer (or around that time anyway) which you could have a basic conversation with. I think it was a name like small, tiny or smalltalk but not sure. It basically went along the lines of

Hi I'm Zee
<comp> Hi Zee
I have an itch
<comp> Please tell me more about this itch (registers "itch" as a keyword)
Well it's in a very personal place & i'd rather not
<comp> Tell me about this place
Er, no
...

when it couldn't find a suitable response, it would say something like "go on", although obviously such a simple set of rules would get tedious very quickly.
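The keyword behaviour in that exchange could be approximated like this. It is only a guess at how such a program might work, not a reconstruction of the original one, and `keywordReply` and the keyword set are invented for the example.

```cpp
#include <set>
#include <string>
#include <vector>

// Reply about the first recognised keyword, or fall back to "Go on".
// Tokens are expected to be lowercased already.
std::string keywordReply(const std::vector<std::string>& tokens,
                         const std::set<std::string>& keywords) {
    for (const std::string& token : tokens) {
        if (keywords.count(token) > 0) {
            return "Please tell me more about this " + token;
        }
    }
    return "Go on";
}
```

Given the keyword "itch", the tokens of "i have an itch" would produce "Please tell me more about this itch", and "er no" would fall through to "Go on".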

Based on what you're saying about statements (cats are animals etc), Prolog would definitely be worthwhile investigating, although its syntax and operation takes a bit of getting used to to say the least.
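For comparison, here is roughly what the "cats are animals" kind of fact and query looks like when done by hand in C++ instead of Prolog. `IsAFacts` is a made-up class, it only handles "is a" links, and Prolog would give you this sort of chained lookup (and much more) for free.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Stores "X is a Y" facts and answers queries by following the chain.
class IsAFacts {
public:
    // Record the fact "child is a parent", e.g. add("cat", "animal").
    void add(const std::string& child, const std::string& parent) {
        parents_.emplace(child, parent);
    }

    // True if `category` can be reached from `thing` via one or more facts.
    bool isA(const std::string& thing, const std::string& category) const {
        std::set<std::string> visited;  // guards against cycles in the facts
        std::vector<std::string> pending{thing};
        while (!pending.empty()) {
            const std::string current = pending.back();
            pending.pop_back();
            if (!visited.insert(current).second) continue;  // already explored
            auto range = parents_.equal_range(current);
            for (auto it = range.first; it != range.second; ++it) {
                if (it->second == category) return true;
                pending.push_back(it->second);
            }
        }
        return false;
    }

private:
    std::multimap<std::string, std::string> parents_;
};
```

So after adding "cat is an animal" and "animal is a living thing", asking whether a cat is a living thing follows both links and succeeds, while asking whether an animal is a cat does not.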

http://cs.wwc.edu/~cs_dept/KU/PR/Prolog.html

I have never used a version that produces a standalone executable; however, I believe they exist. Alternatively, you can interface it with another language (a friend did this in Java as part of his dissertation).
"I must not fear. Fear is the mindkiller. Fear is the little death that brings total obliteration. I will face my fear. I will permit it to pass over me and through me. And when it has gone past me I will turn to see fear's path. Where the fear has gone there will be nothing. Only I will remain." ~Frank Herbert, Dune
