How can I recognize hand drawn shapes? (eg. the wizard in Trine)

Started by
5 comments, last by Shannon Barber 9 years, 8 months ago

tl;dr: I want simple hand drawn shapes (boxes, lines, circles) to be identified by my code when the user makes that pattern with their cursor.

Example video

I want to go from mouse input to solid objects based on the (rough) shape that is drawn. This is not edge or blob detection, nor vector art as far as I'm aware, but some sort of path direction/line segment detection. There are X preset shapes, and drawing something similar to one of them will create that shape at that size.

In Trine, this includes a square for a box and a line for a plank (not sure if there are others, but I don't mind that being spoiled for me if there are). So far I'm getting *way* more complex stuff than I need with my Google searches, I think, so turning here for more practical applications at my needs level. "Evolutionary visual learning" was described in one document I found, and that's far beyond anything I want.

Working in C#/Unity, in case there are any relevant libraries I should look at.

EDIT: These two assets in Unity seem to do what I want, but I'm looking to understand how it's done in case I can implement it more cheaply/effectively/simply myself for my own purposes (rather than adopting a new engine to what I already have)

Drawing Engine

HyperGlyph

Advertisement

Follow up from my own research: ends up there's no simple solution, should have figured. My options are either to go through the full learned state idea where a library is taught what to look for with examples, or you program it to compare against the sizes of a few basic shapes and nothing more.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.37.2973&rep=rep1&type=pdf

This article was the most helpful, and it looks like there's no easy or intuitive solution to this. The closest is...

1. Get convex hull for current shape (like putting a rubber band around it)

2. Compare convex hull to smallest of X shape that would fit around it (circle, then square, then triangle, etc)

3. If areas are almost one to one, they are probably the same shape.

It's pretty good for what it sets out to achieve, but it can't handle interesting shapes that are different, but have similar convex shapes (so ellipses and diamonds are tricky for it).

If I find any better options, I'll report back here, but for now it looks like this isn't a good enough approach, so a library is probably necessary.

Gesture recognition typically breaks down into two stages: quantizing the input into an easily-compared representation, and then comparing the result with a set of know gestures. I've experimented with a number of approaches to quantizing in the past.

Convex hull-based approaches, such as the one you describe, attempt to match the rough outline of the glyph to a set of pre-defined convex hulls. These methods tend to be fast and simple, but have trouble differentiating between similar outlines.

Stroke-based approaches quantize the input curve into discrete segments, and then compare the rate-of-change in slope and segment length against a set of predefined curves. These approaches are fast, and exceptionally resilient to changes in scale and rotation, but are heavily dependent on stroke direction (i.e. left-hand and right-hand circle are different shapes), and have trouble differentiating between shapes with matching regions of continuous curvature (i.e. circle vs ellipse).

Shape-based approaches typically rasterize the shape onto a bitmap grid, and then look for pixel similarity with a pre-defined set of images. These are a lot more robust for similar shapes, easily account for varying scale, but make it tricky to deal with rotations.

And of course, once you have quantized the input like this, there are always a whole host of machine learning techniques which can be applied to the problem...

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

A lot of the short-comings of the stroke-based method can be overcome with a good tool-chain and tweaks to implementation. For example, while a right-hand and a left-hand circle are going to be different, you could have your tool chain over-provision the shapes you're looking for by generating both circles as matches for "circle". With regard to implementation tweaks, while the method allows for you to be pretty resilient to transforms like scale and rotation, it doesn't lose this data, so you can still take advantage of orientation knowledge when you take the action that the gesture is meant to induce. For example, you can pick up an uneven scale transform to differentiate between a circle and ellipse.

Stroke-based for the win.

If you can be clever in your picks of shapes you let the user draw you can vastly simplify things.

For example, line vs. box.

If the user draws a single stroke where the average slopes of all the points making up the slope are similar - they drew a line, so make a line of the same length with the average slope.

If the above test fails, fit a rotated bounding box to what the user drew and divide it into 9 segments (3x3). If the stroke starts in one segment, and then goes around hitting each of the "bounding" segments without going into the middle, they drew a box, so make an in-world box that matches the bounding box you just fit.

You could extend this to a triangle by checking the 3x3 grid for the user hitting two "corners" of the grid and the middle segment of the opposite side.

All great posts, thank you!

Echo17's stroke based implementation best matches the slightly-modified idea mentioned above, it can easily recognize shapes being drawn from any combination of directions, so I'll probably use that. It is the first library linked in my original post, Drawing Engine.

The other engine, HyperGlyph, also still stands out - but hasn't been updated in over two years. Drawing Engine was updated two weeks ago, so it makes it a much easier choice.

Thanks again!

I was going to suggest the stroke-based method.

e.g. every 100ms (and on mouse up/down) capture the line-segment.

Capture the whole set of segments, then process it.

Merge lines, one at a time, and maintain a priority queue based on the "best-match rank"

You need to merge the most similar two lines each iteration to guarantee the best result.

You need a heuristic to rank how similar the current set of line segments is to your ideal line segments.

I think you can't just ignore magnitude, I think you need to use relative magnitude to the whole shape (e.g. normalize it).

Then measure the angle from the previous stroke.

Use the sum of least squares to rank it?

Repeat until you merge everything into 1 line.

The top ranking match on the the priority queue is your current best answer.

- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara

This topic is closed to new replies.

Advertisement