English Language Processing?

Started by
16 comments, last by Rhalin 18 years, 7 months ago
Eliza is the word you're looking for.
Have a chat with Alice at www.alicebot.org =)
You'll have to decide how much preprogrammed knowledge you want the bot to start off with. If you allow yourself to supply grammar rules, your bot may speak grammatically correct English under most circumstances. As some others have said, though, making it actually appear to understand what it talks about is an entirely different issue. Every bot I've chatted with has frustrated me with its ability to find totally irrelevant connections between some words and its inability to find logical connections between others.
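To illustrate the gap between grammar and understanding, here is a minimal Python sketch that enumerates every sentence of a tiny grammar. The grammar, its symbols and its vocabulary are hypothetical, invented only for this example.

```python
from itertools import product

# Hypothetical toy grammar: each nonterminal maps to a list of alternatives,
# and each alternative is a sequence of symbols. Any symbol that is not a key
# in GRAMMAR is treated as a literal word (a terminal).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
    "N":  [["cat"], ["dog"]],
    "V":  [["sees"], ["chases"]],
}

def expand(symbol):
    """Return every word sequence this (non-recursive) grammar derives from symbol."""
    if symbol not in GRAMMAR:
        return [[symbol]]
    results = []
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative, then combine every choice.
        for parts in product(*(expand(s) for s in alternative)):
            results.append([word for part in parts for word in part])
    return results

sentences = [" ".join(words) for words in expand("S")]
for s in sentences:
    print(s)
```

All eight outputs are grammatical, but "the dog chases the dog" shows that the rules alone say nothing about meaning.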

Also, is it really necessary to keep everything in memory? Couldn't some kind of storage structure be used to load the relevant words and their data from files on the hard drive when needed? Obviously the storage structure would have to be pretty advanced, and I currently have no idea what it might look like. When I tried to implement language processing in my IRC bot (which didn't get very far), I settled for a word structure. Unfortunately, I realize it could never give a fair representation, especially since spelling errors occur. Even fairly good comparison functions will sometimes fail to determine whether one word is equivalent to another just by comparing their spelling.
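One common kind of spelling comparison function is edit (Levenshtein) distance. A minimal Python sketch, assuming the bot simply picks the vocabulary word with the smallest distance; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (classic dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

def closest_word(word: str, vocabulary: list[str]) -> str:
    """Pick the vocabulary entry with the smallest edit distance to word."""
    return min(vocabulary, key=lambda candidate: levenshtein(word, candidate))

print(closest_word("speling", ["spelling", "spending", "sapling"]))  # spelling
```

It also shows the weakness described above: edit distance measures spelling, not meaning. "form" is 2 edits from "from" but only 1 from "foam", so a typo can land on the wrong real word.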

An interesting experiment I've been thinking about for some time would be to give the bot English grammar to start off with, but also give it the ability to learn other languages by comparing them to English.
Quote:
An interesting experiment I've been thinking about for some time would be to give the bot English grammar to start off with, but also give it the ability to learn other languages by comparing them to English.


How exactly would you go about making a bot compare another language to English?

(especially when other languages link words to each other in different ways, etc. :/ )
It depends on what your goals are for the bot. If you want it to speak intelligently about a narrow field, your best bet is an AIML-based bot. As someone else stated earlier, if you want to attempt something more ambitious, you're wading out into some deep water. I'd start by looking into OpenCyc.
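To give a feel for the pattern-and-template idea behind AIML bots, here is a minimal Python sketch. It is NOT a real AIML interpreter: the category table, the `{star}` placeholder and the function names are hypothetical, and only the basic "upper-case pattern with a * wildcard maps to a reply" mechanism is imitated.

```python
import re

# Hypothetical categories: an upper-case pattern where "*" matches one or
# more words, and a reply template where {star} is replaced by whatever the
# wildcard captured.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("MY NAME IS *", "Nice to meet you, {star}."),
    ("WHAT IS *", "I don't know what {star} is."),
]

def normalize(text: str) -> str:
    """Upper-case the input and replace punctuation with spaces."""
    return " ".join(re.sub(r"[^\w\s]", " ", text).upper().split())

def respond(text: str) -> str:
    sentence = normalize(text)
    for pattern, template in CATEGORIES:
        # Build a regex: literal words stay literal, "*" becomes one or more words.
        regex = " ".join(r"(\S+(?: \S+)*)" if w == "*" else re.escape(w)
                         for w in pattern.split())
        match = re.fullmatch(regex, sentence)
        if match:
            # Title-case the captured words (they were upper-cased above).
            star = match.group(1).title() if match.groups() else ""
            return template.format(star=star)
    return "I have no answer for that."

print(respond("My name is Roy."))  # Nice to meet you, Roy.
```

Anything outside the table falls through to the default reply, which is why such bots work best in a narrow field.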

Roy
http://www.p2pmud.com - open source peer to peer interactive fiction
I started working on such a project (well, thinking about it), and the first thing I did was write a general state machine template.
I thought that if I could parse the words into literals and identifiers, I could just run those literals through a predefined state machine (built from a regex) and expect it to work.
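A minimal Python sketch of that first approach, assuming a flat (non-recursive) sentence pattern. Each word is replaced by its type, using the same placeholder words x, y, z, w and types T1 to T3 as the word-to-type map in this post, and the resulting type string is checked against a regular expression. The pattern `T1 T2 T3` is hypothetical, and Python's `re` module stands in for a hand-built state machine.

```python
import re

# Word -> type map, using the placeholder words and types from the post.
WORD_TYPES = {"x": "T1", "y": "T2", "z": "T3", "w": "T3"}

# Hypothetical flat sentence pattern over types, compiled once.
SENTENCE_PATTERN = re.compile(r"T1 T2 T3")

def tag(sentence: str) -> list[str]:
    """Replace each word by its type; unknown words raise KeyError."""
    return [WORD_TYPES[word] for word in sentence.split()]

def accepts(sentence: str) -> bool:
    """True if the sentence's type sequence matches the pattern exactly."""
    return SENTENCE_PATTERN.fullmatch(" ".join(tag(sentence))) is not None

print(accepts("x y z"), accepts("x y w"), accepts("x z y"))  # True True False
```

This tells you whether a sentence fits, but not how its parts nest, which is where the approach runs into trouble.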

However, I ran into two problems. First, the grammar of most languages contains recursive definitions, which cannot easily be expressed as a state machine. Second, it was tedious to work out which literals belonged to each type.

This is what I mean:
MAP FROM WORD TO TYPE
x -> T1
y -> T2
z -> T3
w -> T3

// more of a grammar than a regex actually
regex1 := T1 T2 [regex1]
The problem is obvious here: regex1 refers to itself, so we would need a recursive data structure to capture the structure of the sentence.

So to process languages with recursive definitions (such as English), you MUST parse sentences into syntax trees.
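As an illustration, here is a minimal recursive-descent parser in Python for the rule `regex1 := T1 T2 [regex1]` from this post, using the same word-to-type map. The tuple-based tree format and the function names are assumptions made for the sketch; the point is that the recursion in the grammar becomes nesting in the resulting syntax tree.

```python
# Same word -> type map as in the post.
WORD_TYPES = {"x": "T1", "y": "T2", "z": "T3", "w": "T3"}

def parse_regex1(words, pos=0):
    """Parse  regex1 := T1 T2 [regex1]  starting at words[pos].

    Returns (tree, next_position), where tree is the nested tuple
    ("regex1", first_word, second_word, subtree_or_None).
    """
    if pos + 1 >= len(words):
        raise SyntaxError(f"expected T1 T2 at word {pos}")
    first, second = words[pos], words[pos + 1]
    if (WORD_TYPES.get(first), WORD_TYPES.get(second)) != ("T1", "T2"):
        raise SyntaxError(f"expected T1 T2 at word {pos}")
    pos += 2
    subtree = None
    # Optional [regex1]: recurse only if another T1 word follows.
    if pos < len(words) and WORD_TYPES.get(words[pos]) == "T1":
        subtree, pos = parse_regex1(words, pos)
    return ("regex1", first, second, subtree), pos

def parse(sentence):
    """Parse a whole sentence; leftover words are a syntax error."""
    words = sentence.split()
    tree, end = parse_regex1(words)
    if end != len(words):
        raise SyntaxError(f"unexpected word {words[end]!r} at position {end}")
    return tree

print(parse("x y x y"))
# ('regex1', 'x', 'y', ('regex1', 'x', 'y', None))
```

Each recursive call handles one level of the grammar, so the call structure mirrors the tree it builds.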

THAT is as far as I can go with parsing and structuring for now (I'm waiting until I've read the Dragon Book).

As for UNDERSTANDING words, the computer simply can't do that in an ABSOLUTE sense. However, you can provide the computer with some tautologies (logic axioms) and have it recursively solve the questions that come up in a conversation.

In essence, you would define a transformation rule such as this:
{(X = Y) and (Y = Z)} <=> {(X = Z) and (X = Y) and (Y = Z)}
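Applying such a rule over and over until no new facts appear is known as forward chaining. A minimal Python sketch, assuming facts are stored as (left, right) pairs; it applies only the rule exactly as written, so it does not add symmetric facts such as Y = X.

```python
def transitive_closure(facts):
    """Apply {(X = Y) and (Y = Z)} => (X = Z) until nothing new appears.

    facts is a set of (left, right) pairs. The result keeps every original
    fact and adds each derived one, mirroring the right-hand side of the rule.
    """
    known = set(facts)
    while True:
        # Every pair of facts that chain through a shared middle term.
        derived = {(x, z) for (x, y1) in known for (y2, z) in known if y1 == y2}
        new = derived - known
        if not new:
            return known  # fixpoint reached: the rule adds nothing more
        known |= new

print(sorted(transitive_closure({("X", "Y"), ("Y", "Z")})))
# [('X', 'Y'), ('X', 'Z'), ('Y', 'Z')]
```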

Then, within a conversation:
<HUMAN>
A rabbit is a rat.
A rat is a mouse.
Is a rabbit a mouse?
<COMPUTER>
TRUE

In other words, the computer's knowledge is relative to the premises it has been given: it knows only what it has been told, expanded by what it can deduce logically.
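A minimal Python sketch of that conversation, assuming only two hypothetical sentence forms ("A rabbit is a rat." and "Is a rabbit a mouse?") and hard-coding the single transitivity rule for "is a". The class and method names are made up for the example. It answers UNKNOWN rather than FALSE, because it only knows what it has been told.

```python
import re

class Conversation:
    """Toy 'is a' reasoner for one conversation. It knows only what it has
    been told, plus what follows from chaining those statements."""

    # Hypothetical sentence forms: "A rabbit is a rat." / "Is a rabbit a mouse?"
    STATEMENT = re.compile(r"an? (\w+) is an? (\w+)\.?", re.IGNORECASE)
    QUESTION = re.compile(r"is an? (\w+) an? (\w+)\??", re.IGNORECASE)

    def __init__(self):
        self.is_a = {}  # word -> set of words it has been said to be

    def _follows(self, start, goal):
        """Depth-first search along the stated 'is a' links (transitivity)."""
        stack, seen = list(self.is_a.get(start, ())), set()
        while stack:
            word = stack.pop()
            if word == goal:
                return True
            if word not in seen:
                seen.add(word)
                stack.extend(self.is_a.get(word, ()))
        return False

    def say(self, line):
        line = line.strip()
        if m := self.QUESTION.fullmatch(line):
            a, b = m.group(1).lower(), m.group(2).lower()
            return "TRUE" if self._follows(a, b) else "UNKNOWN"
        if m := self.STATEMENT.fullmatch(line):
            a, b = m.group(1).lower(), m.group(2).lower()
            self.is_a.setdefault(a, set()).add(b)
            return "OK"
        return "I don't understand."

chat = Conversation()
print(chat.say("Is a rabbit a mouse?"))  # UNKNOWN: nothing has been said yet
chat.say("A rabbit is a rat.")
chat.say("A rat is a mouse.")
print(chat.say("is a rabbit a mouse"))   # TRUE
```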

This logic engine would be an enormous subject in its own right, but basically it can be modelled as an A* search.
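The post doesn't spell out how A* fits in. One possible reading (an assumption, not necessarily what the poster meant) is to treat every "is a" fact as a graph edge of cost 1. A proof that "a rabbit is a mouse" is then a path from rabbit to mouse, and A* finds the shortest such chain. Since there is no obvious distance estimate between concepts, this Python sketch defaults the heuristic to 0. That is admissible, but it makes A* behave like uniform-cost search. The function name and the extra facts are hypothetical.

```python
import heapq
import itertools

def astar_proof(facts, start, goal, heuristic=lambda concept: 0):
    """A* search for a chain of 'is a' facts leading from start to goal.

    facts: iterable of (a, b) pairs meaning "an a is a b"; each step costs 1.
    heuristic: estimate of the remaining steps; must never overestimate.
    Returns the list of concepts along the chain, or None if no proof exists.
    """
    edges = {}
    for a, b in facts:
        edges.setdefault(a, []).append(b)
    tie = itertools.count()  # tie-breaker so heap entries never compare paths
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best_cost = {start: 0}
    while frontier:
        _, _, cost, concept, path = heapq.heappop(frontier)
        if concept == goal:
            return path  # the chain of facts used is the "proof"
        if cost > best_cost[concept]:
            continue  # a cheaper route to this concept was already queued
        for nxt in edges.get(concept, ()):
            new_cost = cost + 1  # every "is a" step costs 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                heapq.heappush(frontier, (new_cost + heuristic(nxt), next(tie),
                                          new_cost, nxt, path + [nxt]))
    return None

FACTS = [("rabbit", "rat"), ("rat", "mouse"), ("rabbit", "mammal"), ("mammal", "animal")]
print(astar_proof(FACTS, "rabbit", "mouse"))  # ['rabbit', 'rat', 'mouse']
```

A better heuristic would need some domain knowledge about how "far apart" two concepts are, which is the hard part.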

GOOD LUCK
Quote:
This logic engine would prove to be an enormous subject but basically it can be modelled by an A* algorithm


Wha?? Please elaborate!! How would you use a pathfinding algorithm to do logic?

I mean, I could see it working if you converted everything down into Prolog rules and queried against those.
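For comparison, here is a minimal Python sketch of backward chaining, the basic idea behind how Prolog answers a query. It is restricted to ground (variable-free) rules, so no unification is needed, and the knowledge base is hypothetical. Unlike plain Prolog, which can loop forever on some recursive rules, it also guards against cycles.

```python
# Hypothetical Prolog-like knowledge base with ground atoms only:
#   goal :- body1, body2, ...   is stored as RULES[goal] = [[body1, body2, ...], ...]
FACTS = {"rabbit(harvey)"}
RULES = {
    "mammal(harvey)": [["rabbit(harvey)"]],
    "animal(harvey)": [["mammal(harvey)"]],
    "has_fur(harvey)": [["mammal(harvey)"]],
}

def prove(goal, in_progress=frozenset()):
    """A goal holds if it is a fact, or if every goal in the body of some
    rule for it can itself be proved (searched depth-first)."""
    if goal in FACTS:
        return True
    if goal in in_progress:  # cycle guard: don't retry a goal we're already proving
        return False
    for body in RULES.get(goal, []):
        if all(prove(sub, in_progress | {goal}) for sub in body):
            return True
    return False

print(prove("animal(harvey)"), prove("fish(harvey)"))  # True False
```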

I've recently started laying out specs for a similar project, although the goal is a little different (language translation).

Some of my approach might be useful, though. I was planning to use a tree-based structure for the vocabulary. Each entry would describe the word itself, list its single- and multi-word synonyms, and hold its derived forms (all words built from the base). Those forms would be generated by some basic rules written in a special syntax I'm working on, so the structure should be able to produce the proper forms of each word. The tree would also have a way to associate words with each other (not synonyms, but related), so that "steel" and "sword" would have a "weak" relation.
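A rough Python sketch of what one vocabulary entry in such a structure might hold. The field names, the "+suffix" rule syntax and the 0 to 1 relation strengths are all hypothetical stand-ins; the poster's own rule syntax isn't described.

```python
from dataclasses import dataclass, field

@dataclass
class WordEntry:
    """One entry of a hypothetical vocabulary structure (names are illustrative)."""
    base: str
    synonyms: list[str] = field(default_factory=list)        # single- or multi-word
    suffix_rules: list[str] = field(default_factory=list)    # e.g. "+s", "+ing"
    related: dict[str, float] = field(default_factory=dict)  # word -> strength, 0..1

    def forms(self) -> list[str]:
        """Generate derived forms by appending suffixes to the base.
        Real rules would also need spelling changes (e.g. "make" -> "making")."""
        return [self.base] + [self.base + rule[1:]
                              for rule in self.suffix_rules if rule.startswith("+")]

    def relate(self, other: "WordEntry", strength: float) -> None:
        """Record a symmetric, non-synonym association between two words."""
        self.related[other.base] = strength
        other.related[self.base] = strength

vocabulary = {}
for entry in (WordEntry("sword", synonyms=["blade"], suffix_rules=["+s"]),
              WordEntry("steel", suffix_rules=["+s", "+y"])):
    vocabulary[entry.base] = entry

vocabulary["steel"].relate(vocabulary["sword"], 0.2)  # a "weak" relation
print(vocabulary["sword"].forms())   # ['sword', 'swords']
print(vocabulary["sword"].related)   # {'steel': 0.2}
```

Storing relations as weights rather than a yes/no flag lets "weak" and "strong" associations live in the same structure.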

Now, this is all well and good for vocabulary and some basic translation from one language to another, but I'm sure there's a way to apply a similar method to make an AI "understand" some basic conversation.

I figure that as long as the AI understands enough of what is said, it should be able to ask a question so that someone can clarify what they said. This happens often enough in normal conversation anyway: "What did you say about my mother?!" ;)
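A minimal Python sketch of the "understand enough, then ask" idea. It assumes that "understanding" just means recognizing words from a fixed set, and that "enough" means at least half the words are known. The word list, the threshold and the replies are all hypothetical.

```python
# Hypothetical set of words the bot "understands"; everything else is unknown.
KNOWN_WORDS = {"the", "a", "is", "made", "of", "sword", "steel",
               "what", "did", "you", "say", "about", "my"}

def reply(sentence: str, min_known: float = 0.5) -> str:
    """Respond based on how much of the sentence the bot recognizes."""
    # Lower-case each word and strip surrounding punctuation.
    words = [w.strip(".,!?;:\"'").lower() for w in sentence.split()]
    words = [w for w in words if w]
    if not words:
        return "Did you say something?"
    unknown = [w for w in words if w not in KNOWN_WORDS]
    if not unknown:
        return "I understand."
    if 1 - len(unknown) / len(words) >= min_known:
        # Understood enough to ask a targeted follow-up question.
        return f"What do you mean by '{unknown[0]}'?"
    return "Sorry, I didn't understand that."

print(reply("The sword is made of steel."))    # I understand.
print(reply("The sword is made of mithril."))  # What do you mean by 'mithril'?
```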

Just my 2 cents

This topic is closed to new replies.
