cannonicus

Using a camera as a mouse (Image recognition)


Hello! I'm trying to use a webcam as a mouse-like device. My idea is to put the camera on a rigid structure that holds it at a fixed height and angle over the table's surface (or whatever I use it on). This way only three axes of movement are possible: position x, y, and the angle around the z-axis, phi.

What I'm looking for is an algorithm to recognize features in subsequent pictures. I imagine I could use methods similar to the ones used in automatic panorama photo-stitching, only I would face a less difficult problem, since the position and orientation of the next picture can be interpolated with reasonably good accuracy from the last few frames (given a high enough framerate and a smooth enough path). The problem is that I don't know much about image processing and feature recognition. Some keywords to search for would be great.

The purpose of this is that we need an accurate and reliable distance and rotation sensor for a small robot project we're doing in school. So, given that I can make this work, we'll need to implement it on some small computer chip (probably an AVR processor), so the algorithm used would need to be reasonably efficient in both processing power and memory consumption. We thought about just using an ordinary mouse for this, but they (at least mine) generally aren't very precise or reliable, which is fine when steering a cursor on a screen but not for navigating by.

All answers appreciated!

Emil

Whilst I can't help with the actual problem you asked about, I can make other suggestions.

Firstly, I don't think a typical webcam has a framerate comparable to that of an optical mouse: I'm thinking 15-30 fps for a webcam versus around 1500 for an optical mouse (according to howstuffworks.com). Mind you, I don't think an optical mouse transmits information about orientation, but mine is at least very smooth for translations.

Anyway, if you want reliability, then image recognition techniques perhaps aren't the way to go; see the KISS principle.
Why not just use a sonar sensor for distance measurement and a digital compass for orientation?

Quote:

Mind you, I don't think an optical mouse transmits information about orientation, but mine is at least very smooth for translations.



Well, mine isn't. I'm not talking about unsmooth translations, but about it getting stuck on certain patterns, missing some movements, etc. Perhaps we should just get a better mouse, though, and we'd be set.

Quote:

Anyway, if you want reliability, then image recognition techniques perhaps aren't the way to go; see the KISS principle.
Why not just use a sonar sensor for distance measurement and a digital compass for orientation?


We can't use compasses, since we'll be indoors with a lot of electronic devices and steel beams around us, which would interfere with the Earth's magnetic field. We could use a gyro, but that would accumulate error unless it's 100% friction free. We're planning on using several sonars (or IR distance sensors), but they aren't terribly accurate.

The goal is to release the robot in a room and let it draw a map of the interior, so we need some reliable way to keep track of our own location.

Thanks anyway

Emil

Another idea is just dead-reckoning [smile].
If that's not acceptable then, if this is a wheeled robot, you could introduce a suitably high-resolution tachometer (wheel encoder) to measure wheel rotations and then indirectly figure out how much you've moved forward or rotated; the point being that the tachometer won't lie or make assumptions.
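
For illustration, here's a minimal sketch of that kind of differential-drive dead reckoning in Python; the wheel circumference, ticks-per-revolution and wheelbase figures are placeholders you'd replace with measurements from the actual robot:

import math

# Rough differential-drive odometry from wheel encoder counts.
# The constants below are placeholder values for illustration only.
WHEEL_CIRCUMFERENCE_MM = 150.0
TICKS_PER_REV = 512
WHEELBASE_MM = 120.0           # distance between the two drive wheels
MM_PER_TICK = WHEEL_CIRCUMFERENCE_MM / TICKS_PER_REV

x, y, heading = 0.0, 0.0, 0.0  # pose estimate, heading in radians

def update_pose(ticks_left, ticks_right):
    """Integrate one sampling interval of encoder counts into the pose."""
    global x, y, heading
    d_left = ticks_left * MM_PER_TICK
    d_right = ticks_right * MM_PER_TICK
    d_center = (d_left + d_right) / 2.0          # distance moved by the robot's centre
    d_theta = (d_right - d_left) / WHEELBASE_MM  # change in heading (radians)
    # Assume the motion within one short interval is roughly straight.
    x += d_center * math.cos(heading + d_theta / 2.0)
    y += d_center * math.sin(heading + d_theta / 2.0)
    heading += d_theta

The usual caveat applies: wheel slip still accumulates as unrecoverable error, which is where a visual sensor could help.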

Quote:
Original post by cannonicus
What I'm looking for is an algorithm to recognize features in subsequent pictures. I imagine I could use methods similar to the ones used in automatic panorama photo-stitching, only I would face a less difficult problem, since the position and orientation of the next picture can be interpolated with reasonably good accuracy from the last few frames (given a high enough framerate and a smooth enough path).


Well, I guess you should spend your time analyzing the video rather than individual images; in other words, analyze the difference between two consecutive frames. There are techniques that would let you estimate the movement of the camera with relatively good precision.

I think it was optical flow (not sure if that's the correct translation of "flux optique" from French) that can get you a movement vector for every pixel in your image (or block of pixels). In other words, it returns the translation you need to apply to a pixel to find it in the next image.

I've seen it used to estimate the movement of objects in a video, but I'm fairly sure I've heard it can also be used to estimate camera movement. In any case, in your setup all the "objects" should move with a consistent (no better word for what I mean) vector.

In the previous paragraph I said "consistent" vectors rather than identical vectors because the flow field shows both translation and rotation: if all the vectors point in the same direction, you only have translation; otherwise, there is some rotation as well.

If I understood what I did at school correctly, this will get you a translation vector and a way to estimate rotation. Once you have them, you use some pre-calibrated data from the camera to convert the translation in pixels into a translation in the real world. That conversion would be calculated/estimated using other calibration techniques.

Anyway, this is not meant to be a complete explanation; it's just meant to show that the technique might help you... and it's what I remember from a course I took over a year ago.

Also, this looked like something that needs a lot of calculation per image, especially when implemented using the FFT. But I still think it might be worth a try. There are many ways to implement it, and I'm not sure which uses the least processing. Memory-wise you just need the current and previous frames and a structure to hold the vectors.
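
As a rough sketch of the idea (assuming OpenCV's Python bindings are available for prototyping on a PC; an AVR would need something far lighter), the dense flow between two grayscale frames and its average translation could be computed like this:

import cv2
import numpy as np

def frame_translation(prev_gray, curr_gray):
    """Estimate the average image-plane translation between two grayscale
    frames using dense optical flow (Farneback's method)."""
    flow = cv2.calcOpticalFlowFarneback(
        prev_gray, curr_gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    # flow[..., 0] holds the per-pixel x displacement, flow[..., 1] the y displacement.
    dx = float(np.mean(flow[..., 0]))
    dy = float(np.mean(flow[..., 1]))
    # If the flow vectors are not all roughly parallel, part of the motion is
    # rotation about the optical axis and needs a rigid fit rather than a mean.
    return dx, dy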

JFF

Quote:
Original post by dmatter
Another idea is just dead-reckoning [smile].
If that's not acceptable then, if this is a wheeled robot, you could introduce a suitably high-resolution tachometer (wheel encoder) to measure wheel rotations and then indirectly figure out how much you've moved forward or rotated; the point being that the tachometer won't lie or make assumptions.


That, or use some tracking device like those used for virtual reality or 3D reconstruction. You put one part on the ceiling of the room and the other on the robot, and it gives you exact position and rotation.

I'm not sure about the cost, though... just showing you another option.

JFF

You can disassemble some optical mice to access the optical sensor directly rather than going through the PC interface. That way you can read the raw motion counts instead of the pointer translations from the driver, and you can then process the inputs from multiple mice and combine them into a more accurate composite estimate. Example optical mouse hack.
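
For illustration, a minimal sketch of combining two such sensors mounted a known distance apart along the robot's x-axis (the 80 mm separation is a placeholder, and the counts are assumed to be converted to millimetres in the robot's frame already):

SENSOR_SEPARATION_MM = 80.0   # placeholder: distance between the two sensors,
                              # mounted along the robot's x-axis

def body_motion(dxa, dya, dxb, dyb):
    """Combine the per-sample displacements of two mouse sensors (in mm, in
    the robot's body frame) into one translation + rotation estimate."""
    # Both sensors see the same translation; rotation shows up as a
    # difference between their readings perpendicular to the baseline.
    d_theta = (dyb - dya) / SENSOR_SEPARATION_MM  # radians, small-angle approximation
    tx = (dxa + dxb) / 2.0                        # translation of the midpoint
    ty = (dya + dyb) / 2.0
    return tx, ty, d_theta

The rotation estimate relies on the small-angle approximation, which is reasonable as long as the sensors are read frequently.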

Quote:
Original post by cannonicus
The purpose of this is that we need an accurate and reliable distance and rotation sensor for a small robot project we're doing in school.


'Optic flow' is indeed the English term for the technique of detecting self-motion (egomotion) by measuring the velocities of objects directly in the image plane of a sensor fixed in the body's frame of reference. It is a very good technique for solving the above problem, and there is plenty of literature on how to do it.

Optical mouse sensors are just a cheap form of optic flow sensor. You can pick up a dedicated sensor board from Centeye (www.centeye.com) or hack your own... but if you're going to do this, I'd recommend going out and buying a good-quality gaming mouse (should be about 2 kHz on the sensor read cycle).

You can use optic flow directly to perform odometry... indeed, this is how bees are able to know exactly where they are in relation to their hive while flying quite random paths over long distances (several miles). If you need some direction on how to do this, just holler... or take a look at Mandayam Srinivasan's papers on the subject of visual odometry in bees, or one of the many papers on the application of visual odometry and egomotion detection in robotics. If you need some pointers, just let me know.
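
The odometry part itself is just integration. A minimal sketch, assuming some flow estimator already hands you a per-frame motion (dx, dy in millimetres, dphi in radians) in the robot's frame:

import math

class VisualOdometry:
    """Accumulate per-frame camera-plane motion into a running pose estimate,
    as in simple optic-flow odometry. The per-frame motion comes from whatever
    flow estimator is used upstream."""
    def __init__(self):
        self.x = 0.0
        self.y = 0.0
        self.phi = 0.0

    def step(self, dx, dy, dphi):
        # The frame-to-frame translation is measured in the robot/camera
        # frame, so rotate it into the world frame before summing.
        c, s = math.cos(self.phi), math.sin(self.phi)
        self.x += c * dx - s * dy
        self.y += s * dx + c * dy
        self.phi += dphi
        return self.x, self.y, self.phi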

I'd also love to know how your project turns out and to hear your thoughts on the assessment task. One aspect of my current research is visually mediated control in robots (although I mostly work on flying robots), so I'm always interested to hear about other people's experiences in this area.

++ for optical flow as a solution for apparent motion. I don't like optical flow for 3D problems because it breaks down around occlusions, but as long as your surface is flat enough it should work very well.

If you need to find the optical flow between two images by yourself, the Lucas-Kanade method should work fine for you.

To extract the ego-motion from the optical flow, I liked a paper (it might even be one of Srinivasan's; that guy's done everything) that suggested first finding the rotation that makes the residual flow parallel. But personally, I found it easier (when I worked on the 3D case) to first find the translation that makes the residual flow null or concentric. Lastly, you'll need to calibrate from image translation to world translation. That depends on the distance from the camera to the table, the focal length of the camera, the pixel size, etc.
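
As a rough sketch of this pipeline for the flat-table, rotation-only-about-the-optical-axis case (again assuming OpenCV for prototyping; the calibration constants are placeholder values you would measure for your own camera rig):

import cv2
import numpy as np

# Placeholder calibration values for an assumed camera rig; measure your own.
CAMERA_HEIGHT_MM = 60.0     # distance from the lens to the table
FOCAL_LENGTH_MM = 4.0       # lens focal length
PIXEL_PITCH_MM = 0.006      # physical size of one sensor pixel
MM_PER_PIXEL = CAMERA_HEIGHT_MM * PIXEL_PITCH_MM / FOCAL_LENGTH_MM  # pinhole relation

def rigid_motion(prev_gray, curr_gray):
    """Track corners with pyramidal Lucas-Kanade flow and fit a 2D rigid
    (rotation + translation) transform between two frames."""
    prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                       qualityLevel=0.01, minDistance=8)
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray,
                                                      prev_pts, None)
    good = status.ravel() == 1
    # estimateAffinePartial2D fits rotation + translation (+ uniform scale).
    M, _inliers = cv2.estimateAffinePartial2D(prev_pts[good], curr_pts[good])
    dphi = np.arctan2(M[1, 0], M[0, 0])    # rotation about the optical axis
    dx_mm = M[0, 2] * MM_PER_PIXEL         # image translation -> table millimetres
    dy_mm = M[1, 2] * MM_PER_PIXEL
    return dx_mm, dy_mm, dphi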

Of course, using a mouse would be easier, but nowhere near as fun as doing it yourself :P. Your precision problem with mice probably comes from the fact that small errors accumulate really fast; in that sense the very high frequency of a mouse becomes a liability rather than an advantage. You want a slower sensor that captures a larger portion of the table. Avoid sensors with a large aperture (like webcams!), because you'll get strong radial distortion, which will ruin your results.

Quote:
Original post by Steadtler
To extract the ego-motion from the optical flow, I liked a paper (it might even be one of Srinivasan's; that guy's done everything) that suggested first finding the rotation that makes the residual flow parallel. But personally, I found it easier (when I worked on the 3D case) to first find the translation that makes the residual flow null or concentric.


Did you investigate this beyond 'getting it to work'? Did you find (or do you believe) that it was an artifact of your experimental setup, or do you have reason to believe it's a more fundamental result?

Quote:
Original post by Timkin
Quote:
Original post by Steadtler
To extract the ego-motion from the optical flow, I liked a paper (it might even be one of Srinivasan's; that guy's done everything) that suggested first finding the rotation that makes the residual flow parallel. But personally, I found it easier (when I worked on the 3D case) to first find the translation that makes the residual flow null or concentric.


Did you investigate this beyond 'getting it to work'? Did you find (or do you believe) that it was an artifact of your experimental setup, or do you have reason to believe it's a more fundamental result?


The reason I found was that the second option removes the need to evaluate the direction of the parallel flow before each step of the optimization process. It eliminates a source of error and simplifies the cost function. Assuming, of course, that you know the optical center of your camera; otherwise there is no gain.

Of course, the original paper involved panning and tilting with a large aperture, so it couldn't do that. Since here the only rotation is around the optical axis (I'm assuming), it could apply.

(My case was different too; I used additional visual cues for rotation.)

Thanks for the extra info...

My situation is different as well, as I have three cameras in a fixed, known orientation, each with a wide field of view... so I get a hemispheric image, from which it is fairly easy to deduce rotations around any axis in 3D space. I have the added benefit of several accelerometers fixed in the frame of reference of the cameras, so this gives me sufficient information for visually mediated attitude control of the camera system. ;) Most of my work in this area is based around biological models for visuo-motor control of flying robots, and my flying camera system is one example, being based on the primate vestibulo-ocular system.

Ah, it's all good fun and I could chat about this stuff all day... but I'd better actually do some work... 8(

Cheers,

Timkin

Hrm, I'm not sure whether this is feasible at all... but it just came to mind when reading the thread. Two ideas; I guess the second is more realistic:

1)
You have some not-too-powerful, relatively high-frequency stationary radio transmitter sending a signal, and on the robot a direction-sensitive, rotatable antenna (radar-like). You could determine the direction of the stationary transmitter and thus know your orientation (in 2D).
I have no idea how precise this would be.


2)
Another idea would be to use your webcam and mount it on a rotatable part of your robot's head.
Then you have a light source that the robot can switch on and off via radio, to be sure it's looking at the right light source.
Preferably use infrared, so it isn't disturbed by other light sources in the room. My cheap Chinese webcam can pick up IR, so you just need to stick an IR-pass filter in front of such a cam and, voila, you have a pure IR cam.
You can use as many IR LEDs in your room as you want, at different positions the robot knows, and have it switch on via radio whichever LED it wants to see, if that makes orientation finding easier than with only one.
So you rotate the camera around and have the robot adjust it so that the center pixels of the camera are illuminated by the LED; if you also know the orientation of the camera relative to the robot (rotation measurement with maybe parts from an old, non-optical mouse?), you have your robot's orientation...

If you use more than one light source, you can also calculate position.
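
For illustration, a minimal sketch of that position fix from the bearings to two beacons at known positions, assuming the robot's heading is already known so the measured camera angles can be expressed in the world frame:

import math

def triangulate(beacon1, bearing1, beacon2, bearing2):
    """Locate the robot from the world-frame bearings (radians) to two
    beacons at known (x, y) positions."""
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # The robot P satisfies P = beacon_i - t_i * d_i for some t_i > 0.
    # Solve beacon1 - beacon2 = t1 * d1 - t2 * d2 as a 2x2 linear system.
    bx = beacon1[0] - beacon2[0]
    by = beacon1[1] - beacon2[1]
    det = -d1[0] * d2[1] + d1[1] * d2[0]
    if abs(det) < 1e-9:
        return None   # bearings (nearly) parallel: robot and beacons are collinear
    t1 = (-bx * d2[1] + by * d2[0]) / det
    return (beacon1[0] - t1 * d1[0], beacon1[1] - t1 * d1[1])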


EDIT:

If the rotating camera is problematic, you could use four webcams (if they have a wide enough FOV, that is), mounted at 90 degrees to each other on the robot's head, and infer the angles to the light sources from which of the pixels they light up. You'd have to know the FOV angle of the camera to do this; you could estimate it by capturing an image of your wallpaper with marks on it and measuring the camera-to-wall distance... or something (your girlfriend might not like the marks on the wall, though).

EDIT #2:
Hrm, of course this only works if it's OK that nobody stands in the robot's view :-D

