
Simple mechanisms for low-budget natural language generation



I'm hacking around on an IRC bot in my spare time, mostly as an interesting exercise in JavaScript. It has some basic functionality, but it just lacks that special something... so I want to teach it to talk.

 

Before we get too far into this, I should say that I'm fully aware that NLG is a massive field of research, and I'm not trying to pass any Turing tests here. I don't care if the generated "speech" even makes sense half the time; it's more for amusement than anything else.

 

My first inclination was to build a Markov model and use simple chains to construct sentences. Unfortunately, the space complexity of this is rather nasty, and the real killer is the amount of data needed to train the model adequately. I don't have a readily available corpus of plaintext to feed into the thing that suits the mood and personality I want to create.
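For concreteness, here's a minimal sketch of the word-level Markov chain idea (function names are invented for illustration). An order-1 chain keeps the table small, at the cost of less coherent output; the data-hunger problem described above only gets worse at higher orders:

```javascript
// Build an order-1 chain: each word maps to the list of words observed
// after it. Duplicates are kept so the sampling is frequency-weighted.
function buildChain(corpus) {
  const chain = {};
  const words = corpus.split(/\s+/).filter(Boolean);
  for (let i = 0; i < words.length - 1; i++) {
    (chain[words[i]] = chain[words[i]] || []).push(words[i + 1]);
  }
  return chain;
}

// Walk the chain from a start word, sampling a random successor each
// step, until we hit a dead end or the length cap.
function generate(chain, start, maxWords = 12) {
  const out = [start];
  let word = start;
  while (out.length < maxWords && chain[word]) {
    const successors = chain[word];
    word = successors[Math.floor(Math.random() * successors.length)];
    out.push(word);
  }
  return out.join(" ");
}
```

Even this toy version shows the training-data problem: with a small corpus, most words have only one observed successor, so the "generated" text just parrots the input.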

 

The next obvious route would be to construct a Petri net for the language I want to speak. The major advantage is that this is a compact and fairly efficient way to do poor-man's NLG; the disadvantage is that hand-authoring and tuning a Petri net for nontrivial languages can be a huge time sink.
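To see where the hand-authoring time goes, here is a much simpler cousin of that approach: a recursive template grammar. All rule names and alternatives below are made up for illustration; a real net would need many more rules, and tuning them is exactly the time sink described above:

```javascript
// Hand-authored expansion rules (entirely invented for illustration).
// Each nonterminal (prefixed with "$") maps to a list of alternatives;
// tokens inside an alternative are expanded recursively.
const rules = {
  $S: ["$GREET $SUBJECT $VERB"],
  $GREET: ["hey", "oh hi"],
  $SUBJECT: ["the bot", "this channel"],
  $VERB: ["never sleeps", "talks too much"],
};

function expand(symbol, pick = xs => xs[Math.floor(Math.random() * xs.length)]) {
  if (!rules[symbol]) return symbol; // terminal word: emit as-is
  return pick(rules[symbol])
    .split(" ")
    .map(tok => expand(tok, pick))
    .join(" ");
}
```

The `pick` parameter defaults to uniform random choice but can be swapped out (e.g. for a weighted or deterministic picker), which is handy for testing and for tuning the bot's "voice" later.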

So I figured I'd poke around here and see if anyone knows of good algorithms for simple NLG that I might be able to take advantage of. I don't mind having to use a huge data set as long as the data is easily constructed and/or readily available in an easily digested format. Runtime is important since this is supposed to be a realtime conversational bot.

 

Non-goals: contextual recognition, memory, progressive refinement/learning, etc. It doesn't even have to do more than dumb keyword recognition for all I care.

Cheers!


I'd second your Markov model idea, and somehow try to work around the training problem.

 

If you build a simple semantic model using WordNet, for example, you could significantly reduce the training data required. You'd end up learning at a higher level, <pronoun> <verb> <noun>, or possibly something more detailed like <pronoun> <eat> <vegetable>. I'm not sure how good the NLP/NLG libraries are for JavaScript, but there are some awesome ones in Python that could help with this.
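A rough sketch of that semantic-class idea (the class lists here are invented; a real version would pull them from WordNet or a similar lexical database). The point is that a learned pattern is a sequence of class names rather than literal words, so one observed sentence covers the whole cross-product of its classes:

```javascript
// Invented stand-ins for WordNet-style word classes.
const classes = {
  pronoun: ["I", "you", "we"],
  eat: ["eat", "devour", "nibble on"],
  vegetable: ["carrots", "kale", "potatoes"],
};

// Realize a class-level pattern like ["pronoun", "eat", "vegetable"]
// into a concrete sentence by sampling one member per class.
function realize(pattern, pick = xs => xs[Math.floor(Math.random() * xs.length)]) {
  return pattern.map(cls => pick(classes[cls])).join(" ");
}
```

With three members per class, the single pattern above already yields 27 distinct sentences, which is the training-data savings being described.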

 

Anyway, cool project ;-)


You might want to look up what was done for the NaNoGenMo project (look on GitHub). It might give you a few ideas about the different approaches.
