Conversational AI


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

48 replies to this topic

#1 phantomus   Members   -  Reputation: 621

Posted 10 March 2004 - 08:29 AM

Hello,

My name is Jacco Bikker, and I'm pretty new to this forum. I am the author of WinAlice, a C++ version of Dr. Wallace's 'ALICE' with various extensions, such as moods. Usually I'm busy with mobile games and 3D stuff, but every now and then I want to create a virtual companion, or electronic offspring, or whatever; you know what I mean.

My latest piece of software is an interpreter based on ideas used in ALICE. I believe natural language is merely a way to communicate: it encapsulates information in a way that is compatible with human means of data transport (especially voice), while at the same time it is (almost) versatile enough to represent the thoughts that humans want to transfer. In this picture, language itself has nothing to do with intelligence; it is merely a link between two intelligent beings. The link itself can probably be simulated by a machine. This would open up the possibility of communication between a man and a machine in the way that is most natural to the human.

So I have been focusing on extracting information from language. My most recent attempt uses patterns and templates, like ALICE: patterns to classify input sentences, templates to generate replies. ALICE uses wildcards, like DOS: * and ? for multiple or single characters. My pattern matcher uses a much more complex syntax: multiple wildcards, optional parts and selections, as well as if statements and variables (both regular variables and stacks). The template processor is even more versatile and supports reprocessing of (parts of) the input sentence, based on variables and so on. It can even handle loops this way.

So now I have this cool interpreter (I'll gladly post more detailed info if anyone cares, and I'm even willing to share the source code), but I need a goal for it. I found that ALICE is ultimately a toy, as it does not pursue any goal other than entertaining the user, or avoiding being detected as an artificial entity. Yesterday I found the goal.

It's natural for a new being to be curious, so I want to make a curious entity. I want the software to check input sentences for words that it doesn't know yet, and I want it to ask questions about them. Things require definitions; actions require 'why' questions. I think a bot asking lots of questions would already be a huge improvement over the current situation, but it might get a bit too much, so I would also like the bot to look for information online. It could look up information using Google, scan the acquired text for more facts, and so on. The text it finds could very well contain things that it doesn't understand, and that way, talking about a car one day could result in a question about airbags the next day.

Of course it needs to store all this information. I would like to store it in a 'mind map': every item would have links to an arbitrary number of other items. Every link is a relation of a certain type: 'has a', 'is a', 'knows' and so on. A relation can be unidirectional or bidirectional, and it can have a weight. An item should have extra data attached too: the number of times the user used the data in the item, the source of the data, the trustworthiness of the data, and so on. This way, looking for new answers can start at items that are interesting but not yet clear.

Well, that was a big rant. I'm quite enthusiastic about the idea myself, but I would like to hear opinions: am I mentioning things that can't be done, and do you have interesting additions? Right now I have the pattern matcher, but that's it. I don't have the time to write the 10,000 patterns that it would probably need for decent chatter, like ALICE. I think I have found a good substitute, as the bot I just described would require relatively few patterns.

OK, fire away.

- Jacco (a.k.a. "The Phantom").

[edited by - phantomus on March 10, 2004 3:38:35 PM]
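
For readers who want something concrete, here is a rough sketch of the 'mind map' described in this post, written in Python for brevity (WinAlice itself is C++). The class names, the 0..1 trust scale and the `next_topic` heuristic are assumptions made for illustration; none of them come from Jacco's code.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """One node of the mind map, with the bookkeeping mentioned in the post."""
    name: str
    use_count: int = 0        # how often the user has referred to this item
    source: str = "unknown"   # where the data came from (labels are hypothetical)
    trust: float = 0.5        # assumed scale: 0.0 = unverified, 1.0 = trusted


@dataclass
class Link:
    """A typed, weighted relation between two items."""
    source: str
    relation: str             # e.g. 'is a', 'has a', 'knows'
    target: str
    weight: float = 1.0
    bidirectional: bool = False


@dataclass
class MindMap:
    items: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_item(self, name, **meta):
        # Returns the existing item if the name is already known.
        return self.items.setdefault(name, Item(name, **meta))

    def relate(self, source, relation, target, weight=1.0, bidirectional=False):
        self.add_item(source)
        self.add_item(target)
        self.links.append(Link(source, relation, target, weight, bidirectional))

    def neighbours(self, name):
        """Return (relation, other item) pairs reachable from `name`."""
        found = []
        for link in self.links:
            if link.source == name:
                found.append((link.relation, link.target))
            elif link.bidirectional and link.target == name:
                found.append((link.relation, link.source))
        return found

    def next_topic(self):
        """Assumed heuristic: prefer items that are used often but have few links."""
        return max(
            self.items.values(),
            key=lambda item: item.use_count - len(self.neighbours(item.name)),
        )
```

With a car that 'has a' airbag and 'is a' vehicle, an airbag that the user keeps mentioning but that has no links of its own would be the first thing this sketch asks about.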

#2 phantomus   Members   -  Reputation: 621

Posted 10 March 2004 - 08:32 AM

Forgot to post a link to the latest version of WinAlice:
www.bik5.com/winalice.htm
It's my best shot at conversational AI yet.


[edited by - phantomus on March 10, 2004 3:34:04 PM]

#3 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Posted 10 March 2004 - 08:34 AM

Am I being really stupid, or is the thing you want to implement a neural network? I suggest you look into them!

#4 phantomus   Members   -  Reputation: 621

Posted 10 March 2004 - 08:43 AM

No, definitely not. Basically I want to create software that gathers definitions of things (what's that called in English?) and other relations. It does so by detecting undefined words in user input and actively asking about these words. Besides that, it does autonomous research by looking up these words on the internet. I intend to abuse Google for this, and I expect that Google will serve my software rather large texts. These texts can be scanned using the same pattern matching to find more things that need definitions. The cool thing is that these things will usually be related to the original item.

The ideal role of the user in this process is that of a 'mentor': the user answers questions that Google can't answer, and confirms found data. In return, the user has a sensible conversation with a very curious entity. The conversation will be different every day, and it won't be limited to a built-in dataset.

#5 RPGeezus   Members   -  Reputation: 216

Posted 10 March 2004 - 08:47 AM

Visit

http://www.20q.net/

I'm not sure if this is the idea you were thinking of. It keeps track of relationships between things. Kind of neat.

Will

#6 cowsarenotevil   Crossbones+   -  Reputation: 2103

Posted 10 March 2004 - 08:52 AM

Interesting. It seems to me that the biggest difficulty here will be organizing the data, though I can't really help.

<- Cow Soft, free software I've made
"Gay marriage will encourage people to be gay, in the same way that hanging around tall people will make you tall." - Grizwald

#7 jamessharpe   Members   -  Reputation: 497

Posted 10 March 2004 - 09:10 AM

You might want to look into semantic nets. They're basically graphs that define relationships between different words, allowing the AI to make connections between different objects.

James

#8 Jack Sotac   Members   -  Reputation: 516

Posted 10 March 2004 - 09:13 AM

Is this not just a roundabout way of getting the users to generate the patterns instead of the programmers? Would more users equal more realistic chatter?

"definitions of things " -> facts?

#9 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Posted 10 March 2004 - 10:17 AM

Hi, Jacco.
My name is Marco, I'm from Brazil, and I have always thought that ALICE sucks. An "entity" that does not have intelligence has no future.
I think that artificial intelligence is possible. And I mean INTELLIGENCE, with real learning, not the minimal "intelligence" we put in our simple games.
For a while I dreamed of making a really smart chatbot, so I started to think and write things down in a document. When I realized that it could be a lifetime project, I took a break. I have some ideas, and I think that many of them match yours.
I will reopen this document and discuss some points, especially the approach to data storage. For now I would just like to register my interest.
Marco.


#10 Jolle   Members   -  Reputation: 178

Posted 10 March 2004 - 10:22 AM

a-i.com have some nice things going on

[edited by - Jolle on March 10, 2004 5:22:54 PM]

#11 Krippy2k   Members   -  Reputation: 134

Posted 10 March 2004 - 10:28 AM

Actually, there is a reason that ALICE doesn't ask the end user a lot of questions whose answers immediately affect its own personality. As soon as users find out that they are talking to a bot, they tend to feed it garbage to see how it responds. It would take an extremely advanced bot to handle user input in that way without becoming worthless as an entity by devouring massive amounts of nonsense. You would probably have to have the bot 'test' the user for truthfulness by occasionally asking questions that it knows the answer to, and/or assign varying degrees of certainty to answers by asking the same question of multiple users.
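
The two checks suggested above (control questions with known answers, and asking several users the same thing) can be combined in a small sketch. Everything here is hypothetical: the `AnswerPool` name, the smoothing formula and the exact-match comparison are assumptions chosen to keep the example short, not something ALICE does.

```python
from collections import Counter, defaultdict


class AnswerPool:
    """Weights user answers by how well each user did on control questions."""

    def __init__(self, known_facts):
        self.known_facts = known_facts              # question -> known answer
        self.checks = defaultdict(lambda: [0, 0])   # user -> [correct, asked]
        self.answers = defaultdict(list)            # question -> [(user, answer)]

    def record_control(self, user, question, answer):
        """Score an answer to a question the bot already knows."""
        correct, asked = self.checks[user]
        hit = answer.strip().lower() == self.known_facts[question].lower()
        self.checks[user] = [correct + int(hit), asked + 1]

    def reliability(self, user):
        """Smoothed hit rate; a user who was never tested starts at 0.5."""
        correct, asked = self.checks[user]
        return (correct + 1) / (asked + 2)

    def record_answer(self, user, question, answer):
        self.answers[question].append((user, answer.strip().lower()))

    def consensus(self, question):
        """Return (best answer, certainty between 0 and 1) for a question."""
        votes = Counter()
        for user, answer in self.answers[question]:
            votes[answer] += self.reliability(user)
        if not votes:
            return None, 0.0
        best, score = votes.most_common(1)[0]
        return best, score / sum(votes.values())
```

A user who fails the control questions still gets a vote, but a small one, so a single prankster cannot outweigh a user who has answered the known questions correctly.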

Experienced ALICE botmasters take a different route. Since you know you can't rely on the accuracy of user-level input, ALICE collects previously unknown information and stores it in a 'targeting' database. Most information gleaned is only immediately reflected in conversations with the user that provided the input. Then, in a private targeting session, ALICE presents the information to the botmaster, and the botmaster provides appropriate responses or symbolic reductions, which are then integrated into the global knowledge base that all users talk to. This helps keep information relatively accurate, and tuned to the overall personality of the bot.

Many of the same pitfalls apply to using Google. On an average query, you will get a handful of results that are relevant to what you are searching for, and several thousand or million results that have absolutely nothing to do with it, or that distort acceptable definitions immensely.

Having been down that particular road, I would personally suggest that you use more focused and targeted resources. Instead of using Google, use dictionary databases such as WordNet, Webster or the OED, thesauri, almanacs, etc. Choose resources that keep information in a uniformly structured way, and ones that are acceptably accurate. One thing that I did in my foray into extending ALICE was to interface her with the CIA World Factbook. Granted, that is not a 100% reliable source, but it is reasonably accurate and covers a wide range of information about countries in a well-structured format.

But generally speaking, it is my experience and that of many others that releasing a bot to learn from end users who are not bound to conversational rules requires a bot that is exceptionally advanced, and very well versed in distinguishing human concepts and demeanor (determining if something said is meant to be serious, or taken in jest, etc.).

There are also a lot of legal ramifications if the information the bot learns from one user is to be shared with others. There have been cases where people have used publicly accessible chatbots to store and retrieve stolen credit card numbers, and things like that. Liability of botmasters for information dispersed by their bots is a pretty grey area in most countries, so it is something you definitely want to keep in mind.

At first you would probably want to limit its 'learning' to a core set of users that you know you can trust to provide meaningful, accurate, and legal input.

Just some thoughts.

Peace

#12 cowsarenotevil   Crossbones+   -  Reputation: 2103

Posted 10 March 2004 - 11:16 AM

quote:
Original post by Jolle
a-i.com have some nice things going on

[edited by - Jolle on March 10, 2004 5:22:54 PM]


Alan is evil. Particularly because my own beautiful bot gets confused whenever talking to Alan.

<- Cow Soft, free software I've made
"Gay marriage will encourage people to be gay, in the same way that hanging around tall people will make you tall." - Grizwald

#13 Krippy2k   Members   -  Reputation: 134

Posted 10 March 2004 - 11:28 AM

BTW, nice interface. I like the 'human typing'.

The AIML set in that download is missing the default category though, and any otherwise unmatched input crashes the program (saying 'duh', for instance). If I add the default category to Defaults.aiml it works fine, with the little added quips about making typos.

Peace

#14 phantomus   Members   -  Reputation: 621

Posted 10 March 2004 - 07:59 PM

RPGeezus: 20Q is interesting. Correct me if I'm wrong, but it looks like an expert system. By letting tons of people play it, it gathers knowledge. Making it successful is probably a marketing issue though.

I want to make my software different in a couple of ways. First, it should learn not just things (WorldLingo suggests the English word 'pronoun', is that correct?) and their descriptions / abstractions, but also other relations. A monkey eats bananas; this links monkeys and bananas through the verb (WorldLingo again) 'eating'. This should make the program ask "WHY does a monkey eat bananas?" and, if it's not yet in the vocabulary, "What is a banana?". The same goes for monkey. When it scans Google for the word 'monkey', it will probably find a document that describes their habitat, so a future question might be "What is Africa?", and when you answer "A country", it might reply "Just like Holland?", while linking Africa to Holland and to the abstraction "country". I believe this goes beyond a normal expert system. It would allow for queries like "Are bananas and Holland related?", and the answer would be: "Vaguely; Holland is a country, and Africa is a country, monkeys live in Africa and monkeys eat bananas". Now that would be seriously cool, especially when it starts using links that you didn't enter yourself.
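
The "are bananas and Holland related?" query boils down to a shortest-path search over stored facts. Below is a minimal sketch, assuming facts are kept as (subject, relation, object) triples and that every link may be followed in both directions for this kind of question. The function name and the triple format are assumptions; the facts are the ones from the example above.

```python
from collections import deque


def find_relation(facts, start, goal):
    """Breadth-first search: shortest chain of facts linking start to goal."""
    # Build an undirected adjacency list; each edge remembers its fact.
    graph = {}
    for fact in facts:
        subject, _relation, obj = fact
        graph.setdefault(subject, []).append((obj, fact))
        graph.setdefault(obj, []).append((subject, fact))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbour, fact in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, path + [fact]))
    return None  # no chain of facts connects the two items


facts = [
    ("holland", "is a", "country"),
    ("africa", "is a", "country"),
    ("monkey", "lives in", "africa"),
    ("monkey", "eats", "banana"),
]
chain = find_relation(facts, "banana", "holland")
```

Here `chain` holds four facts, which is why the answer would be "vaguely": the longer the chain, the weaker the relation. A real bot would probably also want to cap the search depth once the database grows.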

cowsarenotevil: You are quite right, there's going to be a ton of data, and it needs to be accessed pretty quickly. I have help from an SQL guru, so that could improve things a bit. So far I use plain text files for the pattern code, and plain text files for vocabularies (used for the spellchecker at the moment). The new information records need a lot of info, so text files would not work anymore.

Marco: I'm very interested in your input. Tell me more.

Jack Sotac: No, it's not just a way to generate patterns automatically, or by chatting. My primary goal is to make an entity with a purpose. I believe THAT is what should separate it from ALICE and all the other bots out there. It would also be the only type of chatbot that I would consider more than just a toy.

Krippy2k: Thanks for your suggestions. I think you are right that the bot should use trusted sources. I want to tag gathered information with a figure indicating trustworthiness: I would consider information from a mentor (a known human) trustworthy and WordNet/CIA/whatever quite trustworthy, while information from Google would require confirmation from the mentor before it is accepted. However, even bad information can lead to interesting new questions.
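
A small sketch of that tagging scheme follows. The numeric trust values, the threshold and all names are made up for illustration; the only thing taken from the post is the ordering (mentor above reference works above web search) and the rule that web data needs mentor confirmation.

```python
from dataclasses import dataclass

# Assumed figures; only their ordering reflects the post.
SOURCE_TRUST = {"mentor": 1.0, "reference": 0.8, "web": 0.3}
ACCEPT_THRESHOLD = 0.5


@dataclass
class Fact:
    text: str
    source: str               # 'mentor', 'reference' (WordNet, CIA, ...) or 'web'
    confirmed: bool = False   # set once the mentor has confirmed the fact

    @property
    def trust(self):
        # A confirmed fact inherits the mentor's trust level.
        if self.confirmed:
            return SOURCE_TRUST["mentor"]
        return SOURCE_TRUST.get(self.source, 0.0)

    @property
    def accepted(self):
        return self.trust >= ACCEPT_THRESHOLD


def needs_confirmation(facts):
    """Facts the bot should ask its mentor about before relying on them."""
    return [fact for fact in facts if not fact.accepted]
```

Unaccepted facts are kept rather than thrown away, so they can still feed new questions, in line with the remark that even bad information can be useful.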

I do not intend to let the bot loose on the general public, by the way. Initially, the bot will run on my machine and gather information into a local database. If someone else runs the bot too, there will be a second local database. I imagine that I could tell the bot to merge with a known trustworthy friend at a specific IP; that would be an acceptable way of distributing the information gathering. The merge could lead to interesting questions, as some information is bound to conflict.
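
To show how a merge could surface conflicts, here is a simplified sketch. It assumes each local database maps a (subject, relation) pair to a single value, which is a strong simplification of the link structure discussed in this thread, and it leaves out the network transfer entirely. The function names are hypothetical.

```python
def merge_knowledge(local, remote):
    """Merge two fact stores; return the merged store and the conflicts found."""
    merged = dict(local)
    conflicts = []
    for key, remote_value in remote.items():
        if key not in merged:
            merged[key] = remote_value          # new information: just adopt it
        elif merged[key] != remote_value:
            # Keep the local value for now and remember the disagreement.
            conflicts.append((key, merged[key], remote_value))
    return merged, conflicts


def conflict_question(conflict):
    """Turn one conflict into a question for the mentor."""
    (subject, relation), mine, theirs = conflict
    return (f"I learned '{subject} {relation} {mine}', but my friend says "
            f"'{subject} {relation} {theirs}'. Which is right?")
```

Each conflict becomes a question for the mentor instead of being resolved silently, which matches the idea that the merge itself produces interesting questions.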

- Jacco.



#15 phantomus   Members   -  Reputation: 621

Posted 10 March 2004 - 08:01 PM

I'll try to fix the default pattern for WinAlice, thanks for the suggestion. It's old, dusty software, but I think it does a better job than ALICE. Did you encounter cases in which it returns to a previous topic? I'm particularly proud of that mechanism. It's still a hiding tactic though.

#16 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Posted 11 March 2004 - 11:41 AM

OK, let's have a sample of my thoughts:
The general idea is to create a chatbot that can respond to input, answering questions or formulating its own questions. It would also work on its data to generate new data (logical conclusions). In a second step, I would like to give the bot some variations that could work like emotions. The whole project is a great challenge, but this second step is really too big.

To do that, I think in terms of "Cells of Information" (or just CIs). A CI could be a word or a group of words, but it must have a meaning. A CI is nothing alone, but it has lots of links with other CIs, which is what makes the whole thing work (I hope). A link can be weak or strong, and can be more or less truthful. Strong links make the bot recall the linked CIs when some CI is activated; a truthful link, on the other hand, is just more reliable. A link between a knife and a cat could be a weak one, but it will have high truthfulness if it is of the type "smaller than" (just an example link).

A CI could also be represented in various languages. A language is also a CI.
Another special kind of CI is the "source of information" (or SI). A human is an SI. A knife is not. A babysitter is an SI. The bot must discover which CIs can be an SI and which cannot, and then give them credibility on a specific subject.

I am also thinking of algorithms to give life to all this stored data, like "expression making", "generalization", "reprocessing of truthfulness", and others.

I'm not explaining everything I have thought about, and everything I have thought about is still far from what is needed to make the bot. But I truly believe that I'm on the way. So are you. But... will we stay on the way, or reach the goal some day?

Finally, if you didn't understand something, it could also be a reflection of my "poor English", so forgive me.

What do you think?




#17 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:

Posted 11 March 2004 - 11:43 AM

By the way, the last post was mine...
Marco.

#18 Antonio Carisba   Members   -  Reputation: 122

Posted 11 March 2004 - 11:55 AM

Try reading Russell and Norvig, "Artificial Intelligence".
Ciao

#19 geoffsulcer   Members   -  Reputation: 122

Posted 12 March 2004 - 01:35 AM

Jacco, I was a user of your EasyCE, and I still play with AliCE once in a while. I learned a lot from studying the source.

I've often thought about having a conversation agent that "learned" rather than having hardcoded knowledge. The biggest problem with Alice is that it never retained knowledge. Your technique of jumping back to a previous topic was good, and made for some interesting conversations. But repeated or lengthy conversations could be frustrating, since Alice wouldn't remember information you had provided.

I like your idea of having a weighted network. I think about how my son (2 years old) is learning. He sometimes asks what something is, but more often he is just given information. He receives positive feedback when he repeats correct information, and negative feedback when he states misinformation. Over time, he builds up associations between words and the objects in his world. If he gets positive feedback from multiple trusted sources (parents), he picks up new information faster than if he receives conflicting feedback, or feedback from only untrusted sources (television, people he doesn't know, etc.).
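
One way to turn that observation into an update rule for link weights is sketched below. The formula (an exponential moving average whose step size is scaled by source trust), the learning rate and the trust figures are all assumptions for illustration, not a claim about how children actually learn.

```python
def reinforce(strength, positive, source_trust, rate=0.5):
    """Nudge an association strength toward +1 or -1.

    strength:     current strength, between -1.0 and 1.0
    positive:     True for positive feedback, False for negative
    source_trust: 0.0 (ignored) to 1.0 (fully trusted); scales the step size
    rate:         assumed base learning rate
    """
    target = 1.0 if positive else -1.0
    return strength + rate * source_trust * (target - strength)


# Two rounds of praise from a trusted parent versus the television.
parent = reinforce(reinforce(0.0, True, 1.0), True, 1.0)
television = reinforce(reinforce(0.0, True, 0.2), True, 0.2)

# Conflicting feedback from trusted sources largely cancels out.
conflicting = reinforce(reinforce(0.0, True, 1.0), False, 1.0)
```

With these numbers the parent-taught association ends up around 0.75, the television-taught one around 0.19, and the conflicting case lands back near zero, which mirrors the faster and slower learning described above.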

So, I thought about having an agent that "knows" nothing, except how to parse a sentence, and ask simple questions. The very first conversation might look like this:

Human: Hello
Agent: What is 'Hello'?
Human: Hello is a greeting.
Agent: What is a 'greeting'?

Over time, and many conversations, the agent might learn something. The problem I ran into is that at some point a basic set of knowledge needs to be available to the agent just to be able to parse the language. And some concepts are difficult to explain in a format the agent can parse and store in the network.
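
The dialogue above, and the bootstrap problem that follows from it, can be reproduced with a toy agent. This is only a sketch under stated assumptions: the `CuriousAgent` class is hypothetical, it understands exactly one sentence shape ("X is a Y"), and it has to be seeded with the words 'is', 'a' and 'an', which is a tiny version of the basic knowledge the agent needs before it can parse anything.

```python
import re


class CuriousAgent:
    """Asks about every word it has not been given a definition for."""

    DEFINITION = re.compile(r"(?:an? )?(\w+) is (?:an? )?(\w+)\.?")

    def __init__(self, known_words=()):
        self.known = set(known_words)   # bootstrap vocabulary
        self.definitions = {}           # word -> defining word
        self.pending = []               # unknown words, oldest first

    def hear(self, sentence):
        text = sentence.strip().lower()

        # Learn from sentences of the form "X is a Y".
        match = self.DEFINITION.fullmatch(text)
        if match:
            word, meaning = match.groups()
            self.definitions[word] = meaning
            self.known.add(word)
            if word in self.pending:
                self.pending.remove(word)

        # Queue every word that is still unknown.
        for word in re.findall(r"[a-z']+", text):
            if word not in self.known and word not in self.pending:
                self.pending.append(word)

        if self.pending:
            return f"What is '{self.pending[0]}'?"
        return "Tell me more."


agent = CuriousAgent(known_words={"is", "a", "an"})
first = agent.hear("Hello")                  # asks about 'hello'
second = agent.hear("Hello is a greeting.")  # asks about 'greeting'
```

Every definition introduces a new unknown word, so without a seed vocabulary or some stopping rule the agent would keep asking indefinitely.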

Anyway, those are some initial thoughts. Sorry for the length of the post. Good luck, I look forward to hearing more about it.

Geoff

It's a simple choice, really. Get busy livin' or get busy dyin'.

#20 phantomus   Members   -  Reputation: 621

Posted 12 March 2004 - 08:48 AM

As far as I can see, Marco, Geoff and I are describing more or less the same data structure. I believe it's the same way that Cyc stores data (check www.opencyc.org), which is basically a database of concepts and relations between the concepts. By adding things like the trustworthiness of data, the data can be gathered from both 'secure' and 'insecure' sources.

I believe storing data this way is quite natural; it almost looks like a neural network (so it's probably good). The system will probably even 'forget' data if there are only a few links to other items, just like our own memory does. Perhaps data could even age.
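
Forgetting and aging could be modelled with a simple decay curve. The sketch below assumes an exponential decay whose half-life grows with the number of links an item has; the 30-day half-life, the threshold and the record layout are invented for the example.

```python
def retention(age_days, link_count, half_life_days=30.0):
    """Fraction of an item's strength left after `age_days` without use.

    Each link to another item lengthens the half-life, so well-connected
    items fade more slowly than isolated ones.
    """
    effective_half_life = half_life_days * (1 + link_count)
    return 0.5 ** (age_days / effective_half_life)


def forget(items, today, threshold=0.1):
    """Keep only the items whose retention is still above the threshold.

    `items` maps a name to {'last_used': day number, 'links': link count}.
    """
    return {
        name: info
        for name, info in items.items()
        if retention(today - info["last_used"], info["links"]) >= threshold
    }
```

An isolated item unused for 120 days drops to about 6% and is forgotten, while an item with five links keeps about 63% over the same period.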

So the problem is, how do I (we?) fill this database? I believe Cyc is fed with data without using natural language, but I think in this case the system should be fed data using 'normal' conversations.

This indeed requires basic knowledge to extract data from input. Basically, we need to be able to do proper 'part of speech' tagging. I did some research, and it looks like the best English part-of-speech taggers still can't do much better than 75%, which could mean that the software would regularly ask about a noun that is actually a verb.

I think this problem can be circumvented by initially limiting the number of 'correct sentences': instead of allowing some random conversation from which the software extracts new data, a fixed set of sentences could be used. This makes the conversation less interesting, obviously, but over time the patterns can be made more generic, and the software will be learning long before that. Things like learning through the internet could be added at an even later stage. The limited set of patterns probably wouldn't even need part-of-speech tagging, while more generic conversation handling requires even more complex algorithms, like anaphora resolution.

Basic patterns:

'a(n) * is a(n) *' (a dog is an animal ==> DOG = ANIMAL)
'what is a(n) *?' (what is a dog?)
'is a(n) * a(n) *?' (is a dog an animal?)

More advanced patterns:

'* [verb] *' (cats eat mice ==> CAT EAT MICE)
(where [verb] is a built-in list of verbs, or something more intelligent)

I think this could lead to some results pretty quickly.
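
As a quick feasibility check, the three basic patterns can be handled by translating them into regular expressions. This is a sketch only: `compile_pattern` and `TinyBot` are hypothetical names, the translation rules cover just the syntax shown above (not the full pattern language of the interpreter), and the '[verb]' pattern is left out.

```python
import re


def compile_pattern(pattern):
    """Translate a pattern like 'a(n) * is a(n) *' into a compiled regex."""
    parts = []
    for token in pattern.lower().split():
        if token == "a(n)":
            parts.append(r"an?")            # matches 'a' or 'an'
        elif token == "*":
            parts.append(r"(.+?)")          # wildcard: one or more characters
        elif token == "*?":
            parts.append(r"(.+?)\?")        # wildcard followed by a question mark
        else:
            parts.append(re.escape(token))  # literal word
    return re.compile(r"\s+".join(parts))


class TinyBot:
    def __init__(self):
        self.is_a = {}                      # thing -> kind, e.g. dog -> animal
        self.rules = [
            (compile_pattern("a(n) * is a(n) *"), self.learn),
            (compile_pattern("what is a(n) *?"), self.describe),
            (compile_pattern("is a(n) * a(n) *?"), self.check),
        ]

    def learn(self, thing, kind):
        self.is_a[thing] = kind
        return f"OK, {thing.upper()} = {kind.upper()}"

    def describe(self, thing):
        if thing in self.is_a:
            return f"{thing.upper()} = {self.is_a[thing].upper()}"
        return f"I don't know. What is a {thing}?"

    def check(self, thing, kind):
        if thing not in self.is_a:
            return "I don't know."
        return "Yes." if self.is_a[thing] == kind else "No."

    def reply(self, sentence):
        text = sentence.strip().lower()
        for regex, handler in self.rules:
            match = regex.fullmatch(text)
            if match:
                return handler(*match.groups())
        return "I don't understand that yet."
```

Telling this bot "A dog is an animal" stores DOG = ANIMAL, after which "Is a dog an animal?" gets "Yes." and "What is a cat?" turns the question back on the user, which is the curious behaviour the thread is after.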

- Jacco.





