Camillelola

R&D Sentence classification and named identity detection with automatic retraining

Hi Folks,

I am learning artificial intelligence and trying out my first real-life AI application. I am trying to take various sentences as input and classify each one into one of X categories, based on the keywords and the 'action' in the sentence.

The keywords are, for example, merger, acquisition, award, product launch, etc. In essence, I am trying to detect whether a given sentence talks about a merger between two organizations, an acquisition by an organization, a person or an organization winning an award, the launch of a new product, and so on.

To do this, I have made a custom model for each keyword, based on the basic NLTK package model, and I am trying to improve the classification by dynamically tagging/updating the models with related keywords, synonyms, etc. Also, given a set of sentences, I present the user with the detected categorization and ask whether it is correct; if it is wrong, the user supplies the correct categorization and also identifies the entities.

So the objective is to first classify the sentence into a category and then detect the named entities in the sentence, based on the category.

The idea is to automatically retrain the models based on this feedback, so that performance improves over time with as little manual intervention as possible. For the sake of this project, we can assume that user feedback is accurate.

The problem I am facing is that NLTK only allows fixed-length entities while training, so, for example, a two-word award is detected as two awards.

What should my approach be to solve this problem? Is there a better NLU (even a commercial one) that can address it? It seems to me that this would be a common AI problem, and that I am missing something basic. I would love to have your input on this.

Thanks & Regards

Camillelola


Please note that this is a game AI forum. 99% of the people here wouldn't have any clue about the problem, and of those that do, perhaps 25% might have an inkling.

Regardless, a game AI forum is not a great place to ask questions about non-game AI.


Hey there,

IADave has a point: this forum is typically reserved for game AI, not natural language processing or other forms of knowledge-based mining algorithms. I haven't worked much with NLTK in Python, but I have used Naive Bayes and ID3 to perform sentiment analysis on sentences.

It sounds like you are working with a supervised model. That is, you present labeled training examples to your algorithm to show it what constitutes each classification. The disconnect seems to be that you aren't using a bag-of-words approach, but a keyword approach instead. How would your current algorithm attempt to classify, "Our engineers will be launching a rocket into low-earth orbit this afternoon."?

Unless you know in advance the type of sentences that your algorithm will be expected to classify, fixating on keywords that you think are relevant may not be the best approach. Instead, I'd advise a simple occurrence + bag-of-words approach. That is, keep track of all unique words and their occurrences in your training data in relation to the training label (acquisition, merger, launch, etc.), remove stop words (and, its, the, etc.), perform stemming on your words (so that "programmer" and "programming" both reduce to "program"), and present that data to your algorithm so it can determine which features of a sentence, given the training data, indicate label 'x'.
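As a rough illustration of that preprocessing, here is a minimal Python sketch using only the standard library. The stop-word list, the `crude_stem` helper, and the training sentences are all made up for the example; a real project would likely use a proper stemmer and a much larger stop-word list.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "its", "of", "to", "by", "for", "in", "is", "has"}

def crude_stem(word):
    """Strip a few common suffixes. A hypothetical stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(sentence):
    """Lowercase, keep alphabetic words, drop stop words, stem the rest."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]

def count_words_by_label(examples):
    """Build one bag of words (word -> occurrence count) per training label."""
    counts = defaultdict(Counter)
    for sentence, label in examples:
        counts[label].update(tokenize(sentence))
    return counts

# Made-up labeled training sentences.
training = [
    ("Acme acquires Widget Corp for $2 billion", "acquisition"),
    ("Globex launches a new smartphone", "launch"),
    ("Initech is launching its cloud platform", "launch"),
]
counts = count_words_by_label(training)
print(counts["launch"].most_common(3))
```

Note that "launches" and "launching" both end up counted under "launch", which is the point of stemming: the classifier sees one feature instead of two.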

Quote

The problem I am facing is that NLTK only allows fixed-length entities while training, so, for example, a two-word award is detected as two awards.

I don't know the level of abstraction you are working with, but that doesn't sound like an issue you should have if you are working with a bag-of-words approach.
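For the entity side of the question, one common convention (not something mentioned earlier in this thread, so treat it as an assumption about what might fit) is to tag each token as B- (begins an entity), I- (inside the same entity), or O (outside), and then merge consecutive tags into one span. The sketch below only shows the merging step; the `AWARD` label and the example tags are hypothetical, and in practice the tags would come from a trained tagger.

```python
def merge_bio_tags(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, entity_type) spans."""
    entities = []
    current_words, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that was open.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the entity that is currently open.
            current_words.append(token)
        else:
            # "O" tag (or an I- tag that matches nothing open): close the entity.
            if current_words:
                entities.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities

tokens = ["She", "won", "the", "Nobel", "Prize", "in", "2019"]
tags = ["O", "O", "O", "B-AWARD", "I-AWARD", "O", "O"]
print(merge_bio_tags(tokens, tags))  # [('Nobel Prize', 'AWARD')]
```

With this scheme a two-word award comes out as a single entity, because the second word is tagged as a continuation rather than as a new award.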

Quote

What should my approach be to solve this problem? Is there a better NLU (even a commercial one) that can address it? It seems to me that this would be a common AI problem, and that I am missing something basic. I would love to have your input on this.

NLTK is probably one of the easier frameworks out there for quickly performing natural language classification. I think it might help you not to start with a high-level framework, but to actually implement an algorithm yourself for learning purposes. Take a peek at the link below to implement a simple supervised classifier yourself:

https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/
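To give a feel for what implementing it yourself might look like, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing, written from scratch in Python. It is not taken from the linked article, and the class name, method names, and training sentences are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over whitespace-separated, lowercased words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of examples
        self.vocabulary = set()

    def train(self, examples):
        """Add labeled (sentence, label) pairs; may be called repeatedly."""
        for sentence, label in examples:
            words = sentence.lower().split()
            self.label_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocabulary.update(words)

    def classify(self, sentence):
        """Return the label with the highest log-probability for the sentence."""
        words = sentence.lower().split()
        total_examples = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n in self.label_counts.items():
            score = math.log(n / total_examples)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing so unseen words do not zero out the score.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Made-up training data.
training = [
    ("acme acquires widget corp", "acquisition"),
    ("globex buys initech in cash deal", "acquisition"),
    ("umbrella launches new phone", "launch"),
    ("hooli launches cloud product", "launch"),
]
clf = NaiveBayesTextClassifier()
clf.train(training)
print(clf.classify("startup launches new product"))
```

Because `train` only accumulates counts, user corrections can be fed back in by calling it again with the corrected (sentence, label) pairs, which is one simple way to approach the retraining loop described in the original question.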

 

Edited by markypooch
