Video Input
What's the best way to work with video input? (either live or in any video format)
That is, I would like to be able to read the properties of a pixel at a given location and moment in time in a given video. I figure that if I could do this, or something similar, I could track the location of an object in the video, potentially in 3D space as well.
EDIT: My preferred language is Java, although I can adapt easily to C++, C, C#. Any others will require much reading on my part.
Well, I'm not actually answering your question, but making a general point. Getting the image data from the video stream is the easy part; tracking objects in anything approaching a generic way is a very complicated task.

As for 3D: consider a large rectangle far from the camera versus a small rectangle close to it. The two can be indistinguishable in the image. Second, cameras do not produce a pure "mathematical" perspective projection of the environment. They all introduce distortion (barrel, pincushion etc.). Even if you knew these parameters for a specific camera, along with the focal length and sensor size (they can be determined by calibration), you still cannot recover 3D information from a generic scene. You need (at least) two cameras for that. The general approach is known as "photogrammetry", and it is by no means trivial.

If you are *genuinely* interested in pursuing this (and it is a complex business), there are two free projects I've seen with software that builds a 3D model using a calibrated camera, a known background and a laser pointer. I can't remember the names or links, but I guess you'll be able to locate them if you search around a bit.
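The size/distance ambiguity above falls straight out of the pinhole model: apparent size is `f * W / Z`, so scaling real size and distance together leaves the image unchanged. A minimal sketch (the focal length and object sizes here are made-up illustration numbers, not values from this thread):

```java
// Pinhole-camera sketch: apparent width in pixels is f * W / Z
// (similar triangles). Two objects with the same W/Z ratio are
// indistinguishable from a single image.
public class SizeDistanceAmbiguity {
    // f: focal length in pixels, w: real width in metres, z: distance in metres
    static double apparentWidth(double f, double w, double z) {
        return f * w / z;
    }

    public static void main(String[] args) {
        double f = 800.0; // hypothetical focal length in pixels
        // A 2 m wide rectangle at 10 m...
        double far = apparentWidth(f, 2.0, 10.0);
        // ...and a 0.2 m wide rectangle at 1 m project to the same width.
        double near = apparentWidth(f, 0.2, 1.0);
        System.out.println(far + " == " + near); // both 160.0 pixels
    }
}
```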
Hope this helps,
Dan
EDIT: Here is one of those projects.
I seem to recall seeing this project a while back. As for the complicated math, a single image does contain enough information to determine the location of the tracked object, since the size of the object is known, as are the positions of the other objects and of the camera. With this information, I am confident that I can tackle the calculations. The object in question is a small 40 mm diameter sphere (a ping pong ball). The remaining problems are determining the position of the centre of the ball in the 2D video, and determining its radius in the 2D video; from there it would be easy. However, the ball is in motion and thus blurred, though I think I know how I would overcome this. The only problem I have is reading the data.
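Since the ball's real diameter (40 mm) is known, the pinhole model gives depth from a single image: `Z = f * D / d`, where `d` is the measured diameter in pixels. A minimal sketch, assuming a calibrated focal length in pixels (the numbers below are hypothetical):

```java
// Sketch: depth of a ball of known real diameter from one image.
// Z = f * D_real / d_pixels (pinhole model, distortion ignored).
// The focal length and measured pixel diameter are invented examples.
public class BallDepth {
    // focalPx: calibrated focal length in pixels
    // realDiamM: true diameter in metres, pixelDiam: measured diameter in pixels
    static double depthMetres(double focalPx, double realDiamM, double pixelDiam) {
        return focalPx * realDiamM / pixelDiam;
    }

    public static void main(String[] args) {
        double f = 800.0;  // hypothetical calibrated focal length, pixels
        double d = 0.040;  // ping pong ball diameter, metres
        double px = 16.0;  // diameter measured in the image, pixels
        System.out.println(depthMetres(f, d, px) + " m"); // 2.0 m
    }
}
```

The same model gives the lateral position: a ball centred `u` pixels from the principal point sits at `X = u * Z / f` in metres, so centre plus radius fully locates the ball in 3D.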
Well... for a ball on a uniform background (i.e. good contrast), assuming the lens has a relatively long focal length (to minimise distortion towards the edges) but not too long (or the change in apparent size with distance will be too small to detect reliably), this calculation is fairly simple.

As for how to obtain a video stream: my primary platform is Linux, so v4l/v4l2 (Video for Linux) is the obvious API from which you can obtain raw pixel data from the stream. On Windows, I believe this is what DirectShow is for. I believe it is capable of rendering directly to a context without intervention, but also of rendering to an off-screen pixel buffer (I presume this is how VirtualDub uses it). As for the Mac, I'm not sure what the API is, so I can offer no advice.

As for blur in the image: if your background is uniform, you could find all "ball coloured" pixels, weight them by their intensity (compared to the background), and carry out some sort of error minimisation to fit an unblurred ball of the appropriate size and location. This should work for a ball moving towards or away from the camera as well as across it. I'd be interested to hear if you are successful in this endeavour, so please reply if you get it working.
Good luck,
Dan