text classification

Hello,

I need to do text classification on thousands of helpdesk questions and answers. I have already "summarized" the questions based on the answer given (since the answers are pretty standard), so what I currently have is something like:

"I already paid this invoice" -> [paid invoice]
"you're charging me for that, but I already paid that" -> [paid invoice]
"this isn't due anymore" -> [paid invoice]
etc.

Now I "just" need to relate the words to their category. For example, "I already paid" in a sentence would probably mean the [paid invoice] category.

I tried doing that with naive Bayes, but it has several problems:

1) It doesn't know that some words in a sentence are more important than others. For example, "I *DIDN'T* pay" is very different from "I paid", but naive Bayes doesn't handle that correctly.
2) Texts usually have 2 or 3 statistically linked "keywords", and naive Bayes basically considers only one at a time.

I can see some modifications that could be made:

a) For each phrase, check whether it's affirmative or negative.
b) Make it less naive. There are thousands of texts and words, so I can't consider every combination, but I could concatenate pairs of words and then analyze those concatenated words. For example, in the first example above, I would generate "Ialready", "Ipaid", "alreadypaid", etc. But that would require a lot of computing power, and I'm not sure about the results.

Any other ideas for solving this?
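To make ideas a) and b) concrete, here is a minimal sketch in plain Python, not a tested solution for the real helpdesk data. It makes several assumptions that are not in the post: a crude negation rule (every word after a negation word gets a `NOT_` prefix), adjacent word pairs (bigrams) instead of every possible pair, which keeps the number of features linear in the text length, Laplace smoothing, and a hypothetical second category `[unpaid invoice]` with two invented training sentences so the classifier has something to discriminate against.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed, deliberately small list of negation words.
NEGATIONS = {"not", "no", "never", "didn't", "don't", "isn't", "wasn't", "haven't"}

def tokenize(text):
    # Lowercase and keep letters and apostrophes, so "didn't" stays one token.
    return re.findall(r"[a-z']+", text.lower())

def mark_negation(tokens):
    # Crude rule: once a negation word is seen, prefix every later token with NOT_.
    out, negated = [], False
    for tok in tokens:
        if tok in NEGATIONS:
            negated = True
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out

def features(text):
    # Single words plus adjacent word pairs (bigrams).
    toks = mark_negation(tokenize(text))
    bigrams = [a + "_" + b for a, b in zip(toks, toks[1:])]
    return toks + bigrams

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for f in features(text):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        feats = [f for f in features(text) if f in self.vocab]  # skip unseen features
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for f in feats:
                score += math.log((counts[f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train = [
    ("I already paid this invoice", "paid invoice"),
    ("you're charging me for that, but I already paid that", "paid invoice"),
    ("this isn't due anymore", "paid invoice"),
    # Hypothetical examples for an invented second category:
    ("I didn't pay this invoice yet", "unpaid invoice"),
    ("I haven't paid, can I get more time", "unpaid invoice"),
]
model = NaiveBayes().fit([t for t, _ in train], [c for _, c in train])

print(model.predict("I already paid"))  # paid invoice
print(model.predict("I didn't pay"))    # unpaid invoice
```

With the negation prefix, "paid" and "NOT_paid" become different features, so "I paid" and "I didn't pay" no longer look almost identical to the classifier. The bigrams such as "already_paid" let linked keywords count together instead of only one word at a time.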
