Using a camera as a mouse (Image recognition)

11 comments, last by UnshavenBastard 16 years, 2 months ago
Hello! I'm trying to use a webcam as a mouse-like device. My idea is to mount the camera on a rigid structure that holds it at a constant height and angle over the table's surface (or whatever I use it on). That way only three axes of movement are possible: position x, y, and the angle around the z-axis, phi.

What I'm looking for is an algorithm to recognize features in subsequent pictures. I imagine I could use methods similar to the ones used in automatic panorama photo-stitching, only I would face a less difficult problem, since the position and orientation of the next picture could be predicted with reasonably good accuracy from the last few frames (given a high enough framerate and a smooth enough path). The problem is I don't know much about image processing and feature recognition. Some keywords to search for would be great.

The purpose of this is that we need an accurate and reliable distance and rotation sensor for a small robot project we're doing in school. So, given that I can make this work, we'll need to implement it on some small computer chip (probably an AVR processor), so the algorithm would need to be reasonably efficient in processing power and memory consumption.

We thought about just using an ordinary mouse for this, but they (at least mine) generally aren't very precise or reliable. That is fine when steering a cursor on a screen, but not good enough to navigate by.

All answers appreciated!

Emil
Emil Jonssonvild
Whilst I can't help with the actual problem you asked about, I can make other suggestions.

Firstly, I don't think a typical webcam has a framerate comparable to that of an optical mouse: I'm thinking 15-30 fps for a webcam versus around 1500 fps for an optical mouse (according to howstuffworks.com). Mind you, I don't think an optical mouse transmits information about orientation, but mine is at least very smooth for translations.

Anyway, if you want reliability, then image recognition techniques perhaps aren't the way to go, see the KISS principle.
Why not just a sonar sensor for distance measuring and digital compass for orientation?
Quote:
Mind you, I don't think an optical mouse transmits information about orientation, but mine is at least very smooth for translations.



Well, mine isn't. I'm not talking about unsmooth translations, but about it getting stuck on certain patterns, missing some movements, etc. Perhaps we should just get a better mouse, though, and we'd be set.

Quote:
Anyway, if you want reliability, then image recognition techniques perhaps aren't the way to go, see the KISS principle.
Why not just a sonar sensor for distance measuring and digital compass for orientation?


We can't use a compass since we'll be indoors with a lot of electronic devices and steel beams around us, which would interfere with the Earth's magnetic field. We could use a gyro, but that would accumulate error unless it's 100% friction-free. We're planning on using several sonars (or IR distance sensors), but they aren't terribly accurate.

The goal is to release the robot in a room and let it draw a map of the interior, so we need some reliable way to keep track of its own location.

Thanks anyway

Emil
Emil Jonssonvild
Another idea is just dead-reckoning [smile].
If that's not acceptable then, if this is a wheeled robot, you could introduce a suitably high-resolution tachometer to measure wheel rotations and then indirectly figure out how much you've moved forward or rotated; the point being that the tachometer won't lie or make assumptions.
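
For example, here is a minimal dead-reckoning sketch in Python for a differential-drive robot; the geometry constants and the update_pose helper are placeholders I've made up, not from any particular library, and you would substitute your own measurements:

import math

# Hypothetical robot geometry -- substitute your own measurements.
WHEEL_RADIUS  = 0.03   # metres
WHEEL_BASE    = 0.15   # distance between the two drive wheels, metres
TICKS_PER_REV = 512    # encoder/tachometer resolution

def update_pose(x, y, theta, ticks_left, ticks_right):
    """Dead-reckon a new pose from encoder ticks accumulated since the last call."""
    dist_left  = 2 * math.pi * WHEEL_RADIUS * ticks_left  / TICKS_PER_REV
    dist_right = 2 * math.pi * WHEEL_RADIUS * ticks_right / TICKS_PER_REV
    dist   = (dist_left + dist_right) / 2.0          # forward motion of the centre
    dtheta = (dist_right - dist_left) / WHEEL_BASE   # change in heading
    # Integrate along the (approximately straight) segment using the mid-step heading.
    x += dist * math.cos(theta + dtheta / 2.0)
    y += dist * math.sin(theta + dtheta / 2.0)
    theta += dtheta
    return x, y, theta

Called every control tick with the counts accumulated since the previous call, this integrates forward motion and heading; the usual caveat is wheel slip, which the encoders cannot see.
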
Quote:Original post by cannonicus
What I'm looking for is an algorithm to recognize features in subsequent pictures. I imagine I could use methods similar to the ones used in automatic panorama photo-stitching, only I would face a less difficult problem, since the position and orientation of the next picture could be predicted with reasonably good accuracy from the last few frames (given a high enough framerate and a smooth enough path).


Well, I guess you should spend your time analyzing the video instead of the individual images. Or in other words, analyze the difference between two consecutive frames. There are techniques that would allow you to estimate the movement of the camera with relatively good precision.

I think it was optical flow (not sure if that's the correct translation of the French "flux optique") that could get you a movement vector for every pixel in your image (or block of pixels). In other words, it gives you the translation that you need to apply to a pixel to find it in the next image.

I've seen it used to estimate the movement of objects in a video, but I'm sure I've heard it can also be used to estimate camera movement. Anyway, in your case all your "objects" should move with a "consistent" (no better word for what I mean) vector.

In the previous paragraph I said "consistent" vectors instead of the same vectors because that is what reveals rotation as well as translation: if all the vectors point in the same direction, you only have translation; otherwise you also have some rotation.

If I understood what I did at school correctly, this will get you a translation vector and a way to estimate rotation. Once you have them, you use some pre-calibrated data from the camera to convert the translation in pixels into a translation in the real world. That conversion would be calculated/estimated using other calibration techniques.
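
To make that concrete, here is a rough sketch (my own, not from the course) of fitting a dense flow field to the three unknowns in the original poster's setup: in-plane translation plus rotation about the optical axis. It assumes a small per-frame rotation and a flat surface; the dense field could come from something like OpenCV's calcOpticalFlowFarneback:

import numpy as np

def rigid_motion_from_flow(flow, stride=8):
    """Fit a dense optical-flow field of shape (H, W, 2) to the 3-DOF model
    flow(p) ~ t + omega * perp(p), with p measured from the image centre.
    Assumes a downward-looking camera over a flat surface and a small
    per-frame rotation.  Returns (tx, ty) in pixels and omega in radians."""
    h, w = flow.shape[:2]
    ys, xs = np.mgrid[0:h:stride, 0:w:stride].astype(np.float64)
    px = (xs - w / 2.0).ravel()
    py = (ys - h / 2.0).ravel()
    fx = flow[::stride, ::stride, 0].ravel()
    fy = flow[::stride, ::stride, 1].ravel()
    # Unknowns [tx, ty, omega]:  fx = tx - omega*py,  fy = ty + omega*px
    a = np.zeros((2 * px.size, 3))
    a[0::2, 0] = 1.0
    a[0::2, 2] = -py
    a[1::2, 1] = 1.0
    a[1::2, 2] = px
    b = np.empty(2 * px.size)
    b[0::2] = fx
    b[1::2] = fy
    (tx, ty, omega), *_ = np.linalg.lstsq(a, b, rcond=None)
    return tx, ty, omega

The returned tx, ty are in pixels per frame and omega in radians per frame; turning pixels into real distance is the calibration step mentioned above.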

Anyway, this is not meant to be a complete explanation; it's just meant to show you that it might help you... and it's what I remember from a course I took over a year ago.

Also, this looked like something that needs a lot of calculations per image, especially when implemented using FFT. But I still think it might be worth a try. There are many ways to implement it, and I'm not sure which one uses the least processing. Memory-wise you just need the current and previous frames and a structure to hold the vectors.
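
On the FFT point: phase correlation is one FFT-based way to recover a pure translation between two frames fairly cheaply. A numpy-only sketch, assuming greyscale frames and no rotation (the sign convention may need flipping depending on which frame you treat as "previous"):

import numpy as np

def phase_correlation_shift(prev_frame, next_frame):
    """Estimate the integer-pixel (dx, dy) shift between two greyscale frames
    via FFT-based phase correlation.  Sketch only: no sub-pixel refinement,
    no windowing, and it assumes pure translation (no rotation)."""
    f1 = np.fft.fft2(prev_frame.astype(np.float64))
    f2 = np.fft.fft2(next_frame.astype(np.float64))
    cross_power = f1 * np.conj(f2)
    cross_power /= np.abs(cross_power) + 1e-12
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Shifts past the halfway point wrap around to negative values.
    if dy > prev_frame.shape[0] // 2:
        dy -= prev_frame.shape[0]
    if dx > prev_frame.shape[1] // 2:
        dx -= prev_frame.shape[1]
    return dx, dy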

JFF

Quote:Original post by dmatter
Another idea is just dead-reckoning [smile].
If that's not acceptable then, if this is a wheeled robot, you could introduce a suitably high-resolution tachometer to measure wheel rotations and then indirectly figure out how much you've moved forward or rotated; the point being that the tachometer won't lie or make assumptions.


That, or use some tracking device like those used for virtual reality or 3D reconstruction. You put one part on the ceiling of the room and one on the robot, and it will give you exact position and rotation.

I'm not sure about the costs though... just showing you another option.

JFF
You can disassemble some optical mice to access the optical mouse sensor directly rather than going through the PC interface. That lets you read the raw motion counts instead of the pointer translations from the driver, and you can then combine the inputs from multiple mice into a more accurate composite estimate. Example optical mouse hack.
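
A single sensor read this way only gives you the translation at its mounting point, but two sensors rigidly mounted at known positions also pin down the rotation. A small-angle sketch (the function and variable names here are just for illustration):

import numpy as np

def perp(v):
    """Rotate a 2D vector by +90 degrees."""
    return np.array([-v[1], v[0]])

def body_motion_from_two_mice(d1, d2, r1, r2):
    """Recover a small per-sample body motion (tx, ty, dtheta) from the raw
    motion counts of two optical-mouse sensors rigidly mounted on the robot.

    d1, d2 : (dx, dy) displacement reported by each sensor, in length units
    r1, r2 : fixed sensor positions in the robot frame

    Small-angle model: d_i = t + dtheta * perp(r_i)."""
    d1, d2 = np.asarray(d1, float), np.asarray(d2, float)
    r1, r2 = np.asarray(r1, float), np.asarray(r2, float)
    dr = r1 - r2
    dd = d1 - d2
    # dd = dtheta * perp(dr)  ->  project onto perp(dr) to isolate dtheta
    dtheta = np.dot(dd, perp(dr)) / np.dot(dr, dr)
    # Translation of the body origin: remove the rotational part from the mean.
    t = (d1 + d2) / 2.0 - dtheta * perp((r1 + r2) / 2.0)
    return t[0], t[1], dtheta

With the raw counts scaled to millimetres first, t is the body translation per sample and dtheta the heading change, which you can then dead-reckon as usual.
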
Quote:Original post by cannonicus
The purpose of this is that we need an accurate and reliable distance and rotation sensor for a small robot project we're doing in school.


'Optic flow' is indeed the English term used to describe the technique of detecting self motion (egomotion) by measuring velocities of objects directly in the image plane of a sensor fixed in the frame of reference of the body. It is a very good technique for solving the above problem and there is plenty of literature on how to do it.

Optical mouse sensors are just a cheap form of optic flow sensor. You can pick up a dedicated sensor board from Centeye (www.centeye.com) or hack your own... but if you're going to do this I'd recommend going out and buying a good-quality digital gaming mouse (should be about 2 kHz on the sensor read cycle).

You can use optic flow directly to perform odometry... indeed, this is how bees are able to know exactly where they are in relation to their hive while flying quite random paths over long distances (several miles). If you need some direction on how to do this just holler... or take a look at Mandayam Srinivasan's papers on the subject of visual odometry in bees... or one or more of the many papers on the application of visual odometry and egomotion detection in robotics. If you need some pointers just let me know.

I'd also love to know how your project turns out and to hear your perceptions of the assessment task. One aspect of my current research is in the area of visually moderated control in robots (although I work on flying robots mostly), so I'm always interested to hear of other people's experiences in this area.
++ for optical flow as a solution for apparent motion. I don't like optical flow for 3D problems because it screws up around occlusions, but as long as your surface is flat enough it should work very well.

If you need to compute the optical flow between two images yourself, the Lucas-Kanade method should work fine for you.
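
If you don't want to implement Lucas-Kanade from scratch, OpenCV's pyramidal implementation is a convenient starting point; the parameter values below are arbitrary defaults to tune, not recommendations:

import cv2
import numpy as np

def track_features(prev_gray, next_gray):
    """Sparse Lucas-Kanade tracking between two greyscale frames with OpenCV.
    Returns matched point pairs (old, new) for the features that tracked OK."""
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=8)
    if prev_pts is None:
        return np.empty((0, 2)), np.empty((0, 2))
    next_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray,
                                                      prev_pts, None)
    good = status.ravel() == 1
    return prev_pts[good].reshape(-1, 2), next_pts[good].reshape(-1, 2)

The matched pairs can then be fed into a least-squares rigid-transform fit to recover the per-frame translation and rotation.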

To extract the ego-motion from the optical flow, I liked a paper (it might even be one of Srinivasan's; that guy's done everything) that suggested first finding the rotation that makes the residual flow parallel. But personally, I found it easier (when I worked on the 3D case) to first find the translation that makes the residual flow null or concentric. Lastly, you'll need to calibrate image translation -> world translation. That depends on the distance from the camera to the table, the focal length of the camera, the pixel size, etc.
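
As a back-of-the-envelope example of that last calibration step, with a pinhole model the table-plane distance per image pixel is roughly camera_height * pixel_pitch / focal_length; every number below is invented, and in practice you'd calibrate by imaging a grid or ruler of known size:

# Pinhole-camera back-of-the-envelope: how far does the table move (in mm)
# for a one-pixel shift in the image?  All numbers are made-up examples.
CAMERA_HEIGHT_MM = 80.0    # camera lens to table surface
FOCAL_LENGTH_MM  = 4.0     # lens focal length
PIXEL_PITCH_MM   = 0.003   # physical size of one sensor pixel (3 um)

MM_PER_PIXEL = CAMERA_HEIGHT_MM * PIXEL_PITCH_MM / FOCAL_LENGTH_MM  # = 0.06 mm

def image_to_world(dx_px, dy_px):
    """Convert an image-plane translation (pixels) into table-plane millimetres."""
    return dx_px * MM_PER_PIXEL, dy_px * MM_PER_PIXEL

The rotation angle needs no such scaling, since an angle measured in the image is the same angle on the table.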

Of course, using a mouse would be easier, but nowhere near as fun as doing it yourself :P. Your precision problem with mice probably comes from the fact that small errors accumulate really fast. In that sense the very high frequency of mice becomes a liability rather than an advantage. You need a slower sensor that captures a larger portion of the table. Avoid sensors with a large aperture (like webcams!) because you'll get strong radial distortion, which will screw up your results.
Quote:Original post by Steadtler
To extract the ego-motion from the optical flow, I liked a paper (it might even be one of Srinivasan's; that guy's done everything) that suggested first finding the rotation that makes the residual flow parallel. But personally, I found it easier (when I worked on the 3D case) to first find the translation that makes the residual flow null or concentric.


Did you investigate this beyond 'getting it to work'? Did you find (or do you believe) that it was an artifact of your experimental setup, or do you have reason to believe it's a more fundamental result?

