Robotic Imagery



Does anybody know of a program, or library of some sort, with which I can scan in real time an image coming from a web cam? Is there some way to interface the TWAIN driver directly into memory for analysis by my imagery program? I'm developing a program which, hopefully, can convert the 2D image of a 3D scene into 3D objects in the computer's brain (3D world), using motion for navigation by the robot.
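(For reference, here is roughly what the capture end can look like with OpenCV, the library that gets recommended further down the thread. This is just an untested sketch: the device index, window name, and Esc-to-quit convention are arbitrary choices, and the actual analysis step is left as a comment.)

```cpp
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::VideoCapture cap(0);                 // open the default webcam (device 0)
    if (!cap.isOpened()) {
        std::cerr << "Could not open the camera\n";
        return 1;
    }
    cv::Mat frame, gray;
    for (;;) {
        cap >> frame;                        // grab the next frame into memory
        if (frame.empty()) break;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);   // most analysis wants grayscale
        // ... run the image-analysis / 3D-reconstruction code on 'gray' here ...
        cv::imshow("webcam", gray);
        if (cv::waitKey(1) == 27) break;     // Esc to quit
    }
    return 0;
}
```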

Just a comment: you'll need two cameras separated by a known offset to get any sort of 3D derivation.

quote:
Original post by strider44
Just a comment: you'll need two cameras separated by a known offset to get any sort of 3D derivation.


Not true. You can simply move one camera to achieve this. However, even then, you'll just get a bad-looking heightmap from this method.
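(For what it's worth, whichever way the two views are produced, two cameras or one camera shifted by a known offset, the usual way to turn them into the heightmap mentioned above is block-matching stereo. A rough OpenCV sketch, with placeholder parameters that would need tuning, and rectified grayscale input assumed:)

```cpp
#include <opencv2/opencv.hpp>

// 'left' and 'right' are rectified 8-bit grayscale views of the same scene,
// either from two cameras or from one camera moved by a known offset.
cv::Mat disparityFromPair(const cv::Mat& left, const cv::Mat& right) {
    // numDisparities must be a multiple of 16, blockSize must be odd;
    // the values here are placeholders to tune for the actual setup.
    cv::Ptr<cv::StereoBM> matcher = cv::StereoBM::create(64, 21);
    cv::Mat disparity;
    matcher->compute(left, right, disparity);        // 16-bit fixed-point disparities
    cv::Mat viewable;
    disparity.convertTo(viewable, CV_8U, 255.0 / (64 * 16.0));  // scale for display
    return viewable;                                 // brighter = closer (roughly)
}
```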

Assuming you're using Windows, you can use DirectShow to capture from the webcam, but it's a real pain to do.
You could also take a look at the ARToolKit source code. Grab the version 2.52 downloads that use DirectShow and Video for Windows. There's also a version for Linux.

Enigma

Yes, though I respect everyone's ideas and give them thought, to me it is out of the question to have two cameras for the job.

The human brain doesn't need two eyes; try for yourself. Because of this I don't believe any computer imaging system should require two offset cameras, because if it does, it uses a different method than our mind does.

The philosophy behind my system comes down to this example (and I can elaborate hugely if you guys are interested): when there's a ZIP file on your computer, the ZIP file does not jump up and get running when you click on it; rather, Windows compares its extension with the list of known file extensions. It compares only what it knows to what it sees. Likewise, my imaging tech will only be able to distinguish what it's looking at if it has a predetermined model to compare against. The rest is just filling holes.
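(To make that "compare what it knows to what it sees" idea concrete in the simplest possible 2D form: template matching against a stored picture of a known object. This is only a toy illustration, not the actual system; the function name and the normalized-correlation method are arbitrary choices, and OpenCV is just one library that provides it.)

```cpp
#include <opencv2/opencv.hpp>

// Look for a stored "known object" image inside the current camera frame.
// Returns the top-left corner of the best match and a score in [0, 1]
// saying how closely what the camera sees resembles what the program knows.
cv::Point findKnownObject(const cv::Mat& frame, const cv::Mat& knownModel, double& score) {
    cv::Mat result;
    cv::matchTemplate(frame, knownModel, result, cv::TM_CCOEFF_NORMED);
    double minVal = 0.0, maxVal = 0.0;
    cv::Point minLoc, maxLoc;
    cv::minMaxLoc(result, &minVal, &maxVal, &minLoc, &maxLoc);
    score = maxVal;      // near 1.0: the scene contains something the program "knows"
    return maxLoc;
}
```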

The short and long of it is, computers aren't human and they cannot simply look at an object and make sense of it. They have to try to fit their own idea of what it is, and what they think it should be, onto it. This idea works its way down through the system I'm abstracting right now, and the way I work it, we should already have established a lot of the technology to do this. In fact, all the techniques can be built right on top of a lot of the functions available in OpenGL. (I'm not a DX fan; I've only learned GL and it seems to do the job.)

The goal of my research is to build a cheap, fast, independent program that needs no more than one camera to understand its surroundings and, with a glance and a little process comparison, can construct in its memory a 3D world which closely reflects the real world around it.

This kind of technology would enable us to build the robots that we have always wanted. If you guys want to discuss this, I'm all up for it; I just hope we're in the right forum!

Guest Anonymous Poster
Oh, and did I forget to say that I think the system should be able to look at any scene we are exposed to and resolve it? Not just a dinky little red-ball finder; this vision system would be told to find a path of adequate size in front of it, or a machine or object of a certain size.

I'm working on some concept photos of the imaging process right now, which I hope will help project my ideas better, so everybody can get a clear picture (I visualize everything) and challenge me all the better.

With that said, rip my little theory apart.

Have a look at DirectShow; that should do you for the image-capture end of things. It's not the fastest, but it works under Windows. Under Linux you can use Video4Linux.

The image-processing end of things has been done before, so my suggestion is to use a third-party library like OpenCV to give you the best shot at getting finished in a reasonable time. It comes with code and samples that deal with stereo vision, and some sophisticated tracking systems (Kalman and HMM).
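(As a taste of the tracking side: the classic constant-velocity Kalman filter, wired up by hand with OpenCV's cv::KalmanFilter. This is a sketch only; the noise covariances are placeholder values, and the 4-state/2-measurement layout is just the usual position-plus-velocity choice, not anything prescribed in this thread.)

```cpp
#include <opencv2/opencv.hpp>

// A minimal 2D constant-velocity tracker: state = [x, y, vx, vy], measurement = [x, y].
cv::KalmanFilter makeTracker() {
    cv::KalmanFilter kf(4, 2, 0);
    kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
        1, 0, 1, 0,
        0, 1, 0, 1,
        0, 0, 1, 0,
        0, 0, 0, 1);                                   // x += vx, y += vy each frame
    cv::setIdentity(kf.measurementMatrix);             // we observe x and y directly
    cv::setIdentity(kf.processNoiseCov,     cv::Scalar::all(1e-4));  // placeholder
    cv::setIdentity(kf.measurementNoiseCov, cv::Scalar::all(1e-1));  // placeholder
    cv::setIdentity(kf.errorCovPost,        cv::Scalar::all(1));
    return kf;
}

// Each frame: predict where the object should be, then correct with the
// position the image-processing step actually found.
cv::Point2f trackStep(cv::KalmanFilter& kf, const cv::Point2f& measured) {
    kf.predict();
    cv::Mat m = (cv::Mat_<float>(2, 1) << measured.x, measured.y);
    cv::Mat corrected = kf.correct(m);
    return cv::Point2f(corrected.at<float>(0), corrected.at<float>(1));
}
```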

You might want to look into some of the papers around from RoboCup (the robot soccer competition). They get the Sony AIBO dogs to do some really amazing 3D localisation with very minimal resources.


Guest Anonymous Poster
Thanks. I'll be sure to check that out.

No, you are incorrect when you say our human brain can do it with only one eye. You already have memories stored up in your mind of how objects look in three dimensions, and you use that data to make sense of what you see. You don't even decode it, because you really can't tell; you just make sense of it. I had a friend who only had one eye, and believe me, sometimes he couldn't tell if a moth was fluttering 3 feet away or a bird was flying 3 dozen feet away. That is just a simple example... let's not even get into the details of all the objects, shapes, colors, light intensities, blah blah blah that might be present in a scene for a camera to decode.

To address the obvious and most difficult issue, I have to say, in response to

quote:
... let's not even get into the details of all the objects, shapes, colors, light intensities, blah blah blah that might be present in a scene for a camera to decode.


that most difficult surfaces are actually patterns, and trends can be found in them, though a lot of them are rough. I think the program would have to be intelligent enough to recognize a uniform surface. (P.S. Gradients count too: gradual changes of light would be classified as part of a surface's trend or pattern, whereas sudden changes in the appearance of a surface's edge would be registered as points/vectors for later matching by the 3D plane-trace system.)

It is essential that the program be able to trace the edges of objects in the photo. This would allow the program to produce 2D points corresponding to the actual 3D dimensions of a particular surface.
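(A sketch of what that edge-tracing step could look like in practice, using the standard Canny-plus-contours route in OpenCV. The thresholds and the polygon-approximation tolerance are placeholder values, and the function name is made up for illustration.)

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Trace edges in a grayscale frame and reduce each outline to a handful of
// corner points: the 2D points to hand to the 3D plane-trace step later.
std::vector<std::vector<cv::Point>> traceObjectCorners(const cv::Mat& gray) {
    cv::Mat edges;
    cv::Canny(gray, edges, 50, 150);                  // thresholds need tuning per camera
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    std::vector<std::vector<cv::Point>> corners;
    for (const auto& c : contours) {
        std::vector<cv::Point> approx;
        // Collapse each traced edge into a few vertices (ideally ~4 for a bed-like shape).
        cv::approxPolyDP(c, approx, 0.02 * cv::arcLength(c, true), true);
        corners.push_back(approx);
    }
    return corners;
}
```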

The computer is not that dumb, either. Knowing that it's in a house, looking around it knows that it won't find anything that's over 12 feet tall, nor below its feet, usually.

With that, it would project a wireframe plane at the expected Y coordinate and shift it until the panels of this wireframe plane came close to the corners of the bed. By experimenting with this, the program would shift the wireframe plane, guesstimating the coordinates of the bed, wondering what it is, until it comes up with a 3D surface in its own mind that matches the points in the 2D picture that it sees. (P.S. That would have to be a close match; even a little bit off, in any 3D direction, and the coords would be pretty messed up. Movement, from different angles, would also refine this, though I don't want to make that necessary.)

Essentially, when looking at a bed, I want it to see roughly four corners, to keep that whole process less CPU intensive.
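(Worth noting: the "shift the wireframe until its corners line up with the picture" search described above already exists in closed form as pose estimation, exposed in OpenCV as solvePnP. The sketch below is not the iterative plane-shifting method from the post, just the standard shortcut for the same four-corner problem; the camera matrix is assumed to come from a one-off calibration, and lens distortion is ignored.)

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Given the four known 3D corners of a model (e.g. a bed of known size) and the
// four matching 2D corners found in the image, recover where the model must sit
// so that its projection lines up with what the camera sees.
bool fitModelToCorners(const std::vector<cv::Point3f>& modelCorners3d,
                       const std::vector<cv::Point2f>& imageCorners2d,
                       const cv::Mat& cameraMatrix,    // 3x3 intrinsics from calibration
                       cv::Mat& rvec, cv::Mat& tvec)   // out: model pose in camera space
{
    // Empty distortion coefficients = assume an undistorted lens (a simplification).
    return cv::solvePnP(modelCorners3d, imageCorners2d, cameraMatrix,
                        cv::Mat(), rvec, tvec);
}
```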

Hope that makes sense; I think that sums up the principle of the theory.


Another edit, about the two offset cameras: I can see that two eyes would give some advantage, but as mentioned above, this has not been implemented in software with much success as the sole method of 3D localisation (the bad-looking heightmap). It might help a bit in some ways, but a computer works too differently from the mind to see any real advantage. Again, the system as I see it so far doesn't need an offset camera. It might somehow boost performance, though. I think my approach to solving the vision problem is different.


quote:
The human brain doesn't need two eyes; try for yourself.


This is partially false. The human brain needs only one eye, but you can't compare the eye with a camera. The eye has a secret weapon: it can focus on a specific depth quickly, telling your brain what depth a specific object is at. This way you can still see depth (don't try seeing real depth without moving your eye, though; it's almost impossible). I wouldn't try that with a simple camera.

Ha, nice idea, and good luck. I personally think you will not get enough computational power out of today's hardware. I, for example, capture a simple laser line and use some simple math to generate 3D data, but on my Athlon XP 2400 it's already not a full 25 fps. OK, it's not optimized, but it's only simple math and one line that has to be detected and transferred to 3D (imagine how slow your full recognition will be).
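(For illustration only, here is roughly what that kind of laser-line capture can boil down to: find the brightest red pixel in each row, then triangulate the offset into a depth. This is not T2k's actual code; the baseline, the radians-per-pixel factor, and the brightness threshold are placeholder values that would come from calibrating the real rig.)

```cpp
#include <opencv2/opencv.hpp>
#include <vector>
#include <cmath>

// For each image row, locate the column where the red laser line is brightest,
// then turn that horizontal offset into a depth estimate by simple triangulation,
// assuming the laser is mounted parallel to the optical axis at a known baseline.
std::vector<float> laserLineDepths(const cv::Mat& bgrFrame,
                                   float baselineMetres  = 0.10f,    // placeholder
                                   float radiansPerPixel = 0.001f)   // placeholder
{
    std::vector<cv::Mat> channels;
    cv::split(bgrFrame, channels);                   // channels[2] is the red plane
    const cv::Mat& red = channels[2];
    const int centre = red.cols / 2;

    std::vector<float> depths(red.rows, 0.0f);
    for (int y = 0; y < red.rows; ++y) {
        double maxVal = 0.0;
        cv::Point maxLoc;
        cv::minMaxLoc(red.row(y), nullptr, &maxVal, nullptr, &maxLoc);
        if (maxVal < 128) continue;                  // no laser hit on this row
        float angle = (maxLoc.x - centre) * radiansPerPixel;   // offset -> ray angle
        if (std::fabs(angle) > 1e-6f)
            depths[y] = baselineMetres / std::tan(angle);      // triangulated depth
    }
    return depths;
}
```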

Here's how ya do it


T2k
