Object recognition - Where do I begin?

5 comments, last by _Sauce_ 13 years, 4 months ago
Hello everyone,

I'm working on a program in which the user will draw something (say, a word). However, rather than recognize the word that's being drawn (I will tell them which word to draw), I need to recognize the way they draw it. Is it unusually big? Small? Curvy? Rigid?

I know that this is possible to do, but I'm not sure how to go about it. I think what I need to do is break up the drawing into pieces, where each piece is a polynomial function. I could then analyze the sizes, directions, and derivatives of the functions. Rather than reinvent the wheel, I thought I'd look online to see if there is an established way to do this. I've had trouble finding anything so far, especially since I don't know the terminology for something like this, so I thought I'd ask here.

If anyone could give me a hand I'd really appreciate it. Thanks!
Try to get your hands on as much "labeled data" (examples of drawings with the indication of whether they are big, small, curvy, rigid...) as you can. Then define some features that you can measure on each drawing (total length, diameter, general slope, some measure of the distribution of local curvatures...). Then use some machine-learning technique to decide which labels apply to new cases (support vector machines are the first thing that comes to mind).

You will probably need to tweak the setup (add or modify features, change parameters, try with other machine-learning techniques...) for a while until you are satisfied with the results.
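
As a rough sketch of the "define some features" step, assuming the drawing is captured as an ordered list of (x, y) points: the function name `stroke_features`, the dictionary keys, and the choice of bounding-box diagonal as a stand-in for diameter are all just illustrative, not an established recipe.

```python
import math

def stroke_features(points):
    """Compute a few simple shape features from an ordered list of (x, y) points."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    # Total path length: sum of the lengths of consecutive segments.
    segments = list(zip(points, points[1:]))
    length = sum(math.dist(a, b) for a, b in segments)

    # Bounding-box diagonal, used here as a cheap proxy for the diameter.
    bbox_diagonal = math.hypot(max(xs) - min(xs), max(ys) - min(ys))

    # Heading of each non-degenerate segment, then the absolute change in
    # heading between neighbours, wrapped into [0, pi].
    headings = [math.atan2(b[1] - a[1], b[0] - a[0]) for a, b in segments if a != b]
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = h1 - h0
        turns.append(abs(math.atan2(math.sin(d), math.cos(d))))
    mean_turn = sum(turns) / len(turns) if turns else 0.0

    return {
        "length": length,
        "bbox_diagonal": bbox_diagonal,
        # Direction from the first point to the last, in radians.
        "overall_direction": math.atan2(ys[-1] - ys[0], xs[-1] - xs[0]),
        # Crude curviness measure: average turning angle per joint.
        "mean_turn": mean_turn,
    }
```

The resulting numbers could then be fed to whichever classifier you pick. Note that `mean_turn` depends on how densely the points are sampled, so resampling the stroke to evenly spaced points first would probably help.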
This sounds like a hard problem that might be better suited to deep learning. What are you going to do next with the classification into curvy or not? A restricted Boltzmann machine or possibly even a deep belief net might be what you need.
As alvaro mentioned, a machine learning algorithm would be the way to go. Gather labeled data and split it into two chunks, one for training and one for testing, so you can check that you're not overfitting. Once you've done that, it's a matter of classifying the data. For something simple, try K-Nearest Neighbour or a perceptron (though a perceptron can only handle two classes, so you'd have to use one perceptron for each pair, e.g. Big and Small, Curvy and Rigid).

Define some good features as well: length, width, whatever you like.
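
Here's a minimal sketch of the split-then-classify idea with K-Nearest Neighbour, in plain Python. It assumes each sample is a `(features, label)` pair where `features` is a tuple of numbers already scaled to comparable ranges. The helper names `split`, `knn_predict` and `accuracy`, and the default values for `test_fraction` and `k`, are my own choices for illustration.

```python
import math
import random
from collections import Counter

def split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and return (train, test)."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def knn_predict(train, features, k=3):
    """Return the majority label among the k training samples closest to features."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train, test, k=3):
    """Fraction of held-out samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train, f, k) == label for f, label in test)
    return hits / len(test)
```

If accuracy on the held-out chunk is much worse than on the training chunk, that's the overfitting signal mentioned above. Unlike a perceptron, K-Nearest Neighbour handles more than two labels directly.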
Without looking at the research, and just off the top of my head, I'd suggest using a Histogram of Oriented Gradients to convert the drawing into 'strokes', rather than using a spline or other polynomial function. The only reason I say this is because HoGs have been used successfully for a lot of other image classification problems. Just a suggestion.
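
To give a flavour of the idea, here is a simplified sketch that is not a full HoG: a real HoG works on image gradients in a grid of cells with block normalisation, whereas this just bins the directions of stroke segments. It assumes the drawing is an ordered list of (x, y) points. The function name `orientation_histogram` and the default of 8 bins are arbitrary.

```python
import math

def orientation_histogram(points, bins=8):
    """Length-weighted histogram of segment orientations in [0, pi), normalised to sum to 1."""
    hist = [0.0] * bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue  # skip repeated points
        # Unsigned orientation: a segment drawn left-to-right and the same
        # segment drawn right-to-left fall in the same bin.
        angle = math.atan2(dy, dx) % math.pi
        index = min(int(angle / math.pi * bins), bins - 1)
        hist[index] += seg_len
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```

A rigid, boxy drawing should concentrate its weight in a couple of bins, while a curvy one should spread it more evenly, so the histogram itself can serve as a feature vector.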

From there you might want to look at PCA and Eigenfeatures, since you want to know what makes one thing significantly different from another.

Have you looked at gesture recognition? I know Microsoft has a few research papers on this relating to handwriting analysis for touch-screens and tablets.

First you have to determine the features of the image being drawn.

You use various filters that do contrast differentiation, edge detection, and line/block detection to build up a set of factors/attributes that can then be fed into the pattern matching/recognition part of the processing.

OCR processing has been done for quite a while, so there should be information about what algorithms are used.
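
As one example of such a filter, here's a bare-bones Sobel edge detector. It's a sketch that assumes the drawing has been rasterised into a grayscale image stored as a list of equal-length rows of numbers. The name `sobel_magnitude` is made up, and border pixels are simply left at zero.

```python
def sobel_magnitude(image):
    """Return the Sobel gradient magnitude per pixel; border pixels stay 0."""
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Apply both 3x3 kernels to the neighbourhood centred on (x, y).
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    gx += gx_kernel[j][i] * pixel
                    gy += gy_kernel[j][i] * pixel
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out
```

Statistics of the output (how many strong edge pixels, where they cluster) are the kind of attributes you would pass on to the recognition stage.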

You might find this article helpful http://www.gamedev.net/reference/articles/article2039.asp
