Object Recognition


I have been trying to solve an object recognition problem within my puny, naive brain for a few days now. My goal is to identify 20 or so different objects within images, and once an object is identified, to identify a more specific object in the subgroup based on coloration, decals, et cetera. The majority of these objects are 'similar,' but have certain key features which distinguish them. They exist in a 3D environment (projected to 2D), and therefore can be rotated in all directions as well as scaled and obscured (another object may sit in front of them).

Basically, I think I should start by breaking down what I think 'identifies' an object: color, shape, texture, and 'key features'. These are fairly self-explanatory: color identifies an object's color; shape defines the general outline of an object; texture strips away the color and grayscales the object, looking for contrasts that reveal major textural differences; and 'key identifiers/features' like decals are used to sub-identify certain objects.

But then things seem to get really inefficient when it comes to teaching the system. Because these objects can occur at almost any orientation, it seems almost impossible to identify objects with any real efficiency: you would need a chain of images taken at all orientations and then use some sort of similarity scale to get a 'best' fit. If you have several hundred objects, each photographed at several hundred orientations, the system doesn't scale too well.

Now, ignoring the above, my plan was basically, for each orientation, to create a basic 1000x1000 canvas and continually feed the system images of an object at different scales (but the same orientation). I would scale each image until the object filled the 1000x1000 canvas, then overlay the new image on the old canvas, using averaging algorithms to find similarities. 'Similar' areas would darken a pixel, while dissimilar areas would lighten it.

At the end of the day, you would have a very blurry grayscale identifier for the object. Texture would be a similar process, but would use color-contrast techniques to pull textures out more strongly. Color would simply be identified by a chain of 'similar colors' tagged with the same keyword; searching by color would be a 'closest match' sort of deal. Key features are a bit more complex, but I figured I could write an algorithm to identify areas of unique dissimilarity (colors different from the primary ones used on the object) and save these individually.

But then the whole thing sort of falls apart given that the objects can be obscured and oriented in any direction. It seems like I would need to feed the system, quite literally, millions of photos. Also, how do I locate an object within an image? The object could be anywhere within an image of any size, at any scale and orientation. Would I have to check absolutely every pixel location? I can't seem to make the mental leap from ASCII recognition to 3D object recognition.

I'm not trying to reach facial recognition technology, which would be far more complex than what I want. I want to be able to identify things like backpacks and chairs, all with pretty unique signatures. Any ideas? I put this under AI because I believe learning techniques will definitely be needed. Thanks!
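A minimal NumPy sketch of the per-orientation canvas-averaging scheme described above (the 1000x1000 size comes from the post; I've used a plain running average as a simpler starting point than the darken/lighten rule, so the blending here is one possible interpretation, not the poster's exact scheme):

```python
import numpy as np

CANVAS_SIZE = (1000, 1000)  # per-orientation canvas size, as in the post

def update_canvas(canvas, count, new_image):
    """Fold one grayscale image (already scaled so the object fills the
    canvas) into a running average. Regions that agree across samples
    stay stable; regions that disagree wash out toward grey."""
    assert new_image.shape == CANVAS_SIZE
    count += 1
    # incremental mean: the canvas always holds the average of all images so far
    canvas = canvas + (new_image.astype(np.float64) - canvas) / count
    return canvas, count

# usage: feed several images of the object at one fixed orientation
canvas, count = np.zeros(CANVAS_SIZE), 0
for img in (np.full(CANVAS_SIZE, v) for v in (60.0, 80.0, 100.0)):
    canvas, count = update_canvas(canvas, count, img)
# canvas now holds the per-pixel average (80.0 everywhere in this toy run)
```

The resulting blurry average is exactly the "gray-scale identifier" the post describes, though on its own it is still sensitive to alignment and scale.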

I knew it wouldn't be as easy as I wanted. Thank you sir! I will be sure to check it out. All other opinions are welcome!

The key term you're looking for in your literature search is image understanding. Many computer vision resources spend too much time worrying about image capture and image processing and too little on image understanding. Object classification from visual images falls under that last topic.

Here's something you'll want to keep an eye on: http://www.spectrum.ieee.org/apr07/4982

Although I don't think it's able to do what _you_ want it to yet (and I doubt anything else can come close), it probably won't be too long before it is able to do it, and more besides.

Hawkins also published a book in 2004 entitled "On Intelligence", which described the theory in detail.

It's a very exciting trail that Numenta's blazing.

[Edited by - Kring on March 31, 2007 4:53:39 PM]

For some reason that article is no longer available, so here's another one that's almost as good: http://www.wired.com/wired/archive/15.03/hawkins.html

Hopefully this one doesn't disappear.

[Edited by - Kring on April 1, 2007 11:07:28 PM]

I definitely had the feeling of "heard it all before" when I read that article about Hawkins' company and research. Not just the hype, but the technology as well. It's basically a hierarchical mixture of pattern classifiers with bidirectional weight updates. Each node essentially learns a Markov field model related to the set of 4x4 patterns it has been trained on, with dependency on the nodes above and below it in the hierarchy. The contribution of Hawkins will not be the technology, but rather an efficient implementation for training and use.

As for Hawkins' statement that the cortex implements "hierarchical temporal memory": that is pure conjecture. His statement that the proof of this conjecture is in the doing is logically absurd. I solved problem X using method A. That system solved problem X as well, hence it must also use method A.

Anyway, if he makes more millions from this, it only reiterates that most of the world doesn't understand science and technology, but is only too willing to consume it when it's sold with the right packaging!

I'm quite new to all this HTM stuff myself and my computer doesn't have enough RAM to handle their Research Release, but from what I've read so far it looks quite promising.
Only time will tell how it turns out, I suppose.

I've skimmed through a few papers on object classification in my research, although I'm afraid I'm not an expert in the processes involved for your particular problem. My understanding is that most approaches to object classification use some form of feature detector algorithm to form an abstract representation of the image; the most common one in use today would be SIFT, the Scale-Invariant Feature Transform. You would then train on the object in isolation to construct a database of its SIFT descriptors. Identification of the object would occur if you could find a minimum number of the descriptors in an acceptable configuration (i.e. one similar to the object); this helps with occlusions and error, as you don't need to match the entire set. I'm not sure how they deal with multiple orientations; I assume they just build a database of the object in a series of orientations.

It's an active area of computer vision research, so there are plenty of papers being published in this area if you want to look at what the academics are doing. The main conferences in computer vision are ICCV (International Conference on Computer Vision), CVPR (Conference on Computer Vision and Pattern Recognition; traditionally a bit more practically oriented than some of the others), ECCV (European Conference on Computer Vision), as well as ACCV (Asian Conference on Computer Vision) and ICPR (International Conference on Pattern Recognition).

Quote:
Original post by Timkin
I definitely had the feeling of "heard it all before" when I read that article about Hawkins' company and research. Not just the hype, but the technology as well. It's basically a hierarchical mixture of pattern classifiers with bidirectional weight updates. Each node essentially learns a Markov field model related to the set of 4x4 patterns it has been trained on, with dependency on the nodes above and below it in the hierarchy. The contribution of Hawkins will not be the technology, but rather an efficient implementation for training and use.

But has that mixture been used before? (Honest question - I have hardly any knowledge of the field, so I just don't know)

I got fairly caught up in the hype when I heard about it. After letting it simmer in the back of my brain for a while, my feeling is that it's a useful contribution (whether it's original or not), and will be a very useful addition to the AI toolbox, but like all the other methods its useful domain is limited, and it won't be capable of producing the kinds of phenomenal results Hawkins seems to imagine.

Quote:
As for Hawkins' statement that the cortex implements "hierarchical temporal memory": that is pure conjecture. His statement that the proof of this conjecture is in the doing is logically absurd. I solved problem X using method A. That system solved problem X as well, hence it must also use method A.

I'm not sure where this is coming from. Having read a couple of papers on the HTM idea, and On Intelligence (and watched a couple of talks given by Hawkins), the impression I got was that Hawkins' belief that the neocortex implements the HTM comes straight from analysis of the brain (I'm not certain whether the whole idea for the HTM came from his research in neuroscience or not), not from seeing that they result in similar behaviour. He's certainly done his homework on the neuroscience end of it. From memory, I don't think I've ever read or heard Hawkins claiming that the statement is more than conjecture on his part anyway, although he certainly seems to have a very strong belief in it.

Anyway, sorry to be off-topic.

John B

Quote:
Original post by JohnBSmall
Quote:
As for Hawkins' statement that the cortex implements "hierarchical temporal memory": that is pure conjecture. His statement that the proof of this conjecture is in the doing is logically absurd. I solved problem X using method A. That system solved problem X as well, hence it must also use method A.

I'm not sure where this is coming from.


He made a statement in his interview that if the system worked, it was proof that the brain used this computational model. That's not a logical inference, hence my statement. It could, of course, be a misrepresentation by the interviewer. I've read articles on a variety of proposed models of cortex structure (I did several years of professional neuroscience research in hospital research labs) and it's fairly evident that layering exists and that bidirectional information processing occurs in neuronal columns in the cortex, but that doesn't mean it translates into the exact computational model proposed in HTM. For the time being I'm optimistically skeptical; that is, I have my doubts, but would be happy to see some resolution of the issue of how the cortex organises and manages information during processing. If it can be shown rigorously that HTM is a faithful representation of this processing method, then great! If not, as you said, it's another tool in the box.

Quote:
Original post by Timkin
Quote:
Original post by JohnBSmall
Quote:
As for Hawkins' statement that the cortex implements "hierarchical temporal memory": that is pure conjecture. His statement that the proof of this conjecture is in the doing is logically absurd. I solved problem X using method A. That system solved problem X as well, hence it must also use method A.

I'm not sure where this is coming from.

He made a statment in his interview that if the system worked it was proof that the brain used this computational model.

Ah, fair enough. I should have checked up more carefully before jumping in.

John B

David Forsyth is the person I consider to be the leading expert on the subject. I witnessed an awesome demo of his human/animal recognition a few years ago.

You can find and read lots of his object recognition papers here. I suggest you look around for the others. Like Sneftel said, this is a big field, and you'll need lots of maths and computer vision concepts. Basically, it's a very difficult classification problem; you can look into the best classification methods (classification trees/forests, SVMs) if you want something supervised, or clustering approaches (Estimation-Maximization maybe?) for something not so supervised.
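As a toy illustration of the supervised route mentioned above, here is a scikit-learn SVM trained on fixed-length feature vectors. The data here is synthetic stand-in clusters of my own invention; in practice each vector would be an aggregated image descriptor (e.g. a bag-of-visual-words histogram), not random numbers:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# stand-in feature vectors: two object classes drawn around different centres
# (real inputs would be descriptor histograms computed from training images)
X_a = rng.normal(loc=0.0, scale=0.5, size=(50, 16))  # class 0, e.g. "backpack"
X_b = rng.normal(loc=2.0, scale=0.5, size=(50, 16))  # class 1, e.g. "chair"
X = np.vstack([X_a, X_b])
y = np.array([0] * 50 + [1] * 50)

# an RBF-kernel SVM learns a boundary between the two feature clusters
clf = SVC(kernel="rbf").fit(X, y)

# a new sample near class 1's centre should be labelled 1
prediction = clf.predict(rng.normal(loc=2.0, scale=0.5, size=(1, 16)))
```

The clustering alternative mentioned in the post would instead fit something like a mixture model to unlabelled feature vectors and let the classes emerge from the data.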

And yes, you need a pretty big training database for this to succeed ;)

Quote:
Original post by Steadtler
(Estimation-Maximization maybe?)


Did you actually mean Expectation-Maximisation, more widely known simply as the EM Algorithm?

To those who say that the picture-recognition program that Hawkins & co. came up with isn't really unique: I'd be very interested in testing out other types of software that can do the same thing.

Btw, if I can't download it and test it myself, then don't even bother mentioning it.

[Edited by - Kring on April 15, 2007 2:20:11 PM]

All I can say is that if you establish the key-feature-recogniser then you can do the rest quite easily without storing lots of images from different angles.

Let's say that the key-feature-recogniser... recognises, for lack of a better term, the nose on a person's face. If the person then rotates their head, the perspective on the nose will change, but because the program recognises the nose, it can account for any change in perspective or rotation of the head, and can then alter the image it is comparing against to match the same perspective. It then compares the two in the way you have detailed in order to determine how similar the two faces are.

Of course, you aren't doing facial recognition, but it does make a good example.

You could also try this.

I know it's not quite what you are looking for, but it presents some similar problems, and even if you can't use the code, the theory may help.

[Edited by - _Sauce_ on April 15, 2007 9:00:31 AM]

Quote:
Original post by Timkin
Quote:
Original post by Steadtler
(Estimation-Maximization maybe?)


Did you actually mean Expectation-Maximisation, more widely known simply as the EM Algorithm?


Yeah. Where was I when I typed that? lol.

Quote:
Original post by Kring
To those who say that the picture-recognition program that Hawkins & co. came up with isn't really unique: I'd be very interested in testing out other types of software that can do the same thing.

Btw, if I can't download it and test it myself, then don't even bother mentioning it.


There's a big difference between 'unique' and 'original'. His software might be unique in that it is the only software to implement HTM, but that doesn't make this information processing model original.

As for software, no one is going to give you access to research software at the cutting edge of AI, at least not without an iron-clad non-disclosure agreement and a damn good reason! As for commercial software, there isn't much out there that is publicly available. What I have seen is predominantly within the military domain and not for public consumption. This is why I believe Hawkins will make money... he's entering a market with few competitors with a decent product. That doesn't make it super-intelligent software or a model of the brain... it makes him a good businessman. He has competitors, but most aren't in the market yet.

As for competing models of intelligence and methods for image understanding, check out the research literature. There's plenty out there.

Quote:
Original post by Timkin
There's a big difference between 'unique' and 'original'. His software might be unique in that it is the only software to implement HTM, but that doesn't make this information processing model original.

As for software, no one is going to give you access to research software at the cutting edge of AI, at least not without an iron-clad non-disclosure agreement and a damn good reason! As for commercial software, there isn't much out there that is publicly available. What I have seen is predominantly within the military domain and not for public consumption. This is why I believe Hawkins will make money... he's entering a market with few competitors with a decent product. That doesn't make it super-intelligent software or a model of the brain... it makes him a good businessman. He has competitors, but most aren't in the market yet.

As for competing models of intelligence and methods for image understanding, check out the research literature. There's plenty out there.


I've read that Numenta is involved with military contractors as well, so if what you say is true, it's likely the same thing. But anyway, there are a lot of people who claim to have great technology, but do it mainly to grab investors: when you ask them how their stuff works or whether they have an example to show you, they come up short.

Numenta, on the other hand, can both explain how their stuff works, and have examples that people can actually test out for themselves. It's always wise to be highly skeptical of big claims that are short on details and examples.

You said: "There's a big difference between 'unique' and 'original'. His software might be unique in that it is the only software to implement HTM, but that doesn't make this information processing model original."

This information processing model was pretty much discovered by Hawkins, and only became general knowledge after the publication of his book in 2004. The likeliest reason he doesn't have competition yet is that everyone else is secretly following his lead.

Quote:
Original post by Kring
This information processing model was pretty much discovered by Hawkins, and only became general knowledge after the publication of his book in 2004.


If you're referring specifically to HTM, then you are, of course, correct. If you're referring to hierarchical decomposition and bidirectional parameter adjustment for classification (as I was), then I've definitely seen it before. Specifically, in hierarchical Markov field models for clustering and classification (with some application to MRI), as well as hierarchical Bayesian models for modelling the cortex from MRI images. I'll grant that the application of an architecture such as this to time-dependent classification is probably novel.

Quote:
Original post by Timkin
If you're referring specifically to HTM, then you are, of course, correct. If you're referring to hierarchical decomposition and bidirectional parameter adjustment for classification (as I was), then I've definitely seen it before. Specifically, in hierarchical Markov field models for clustering and classification (with some application to MRI), as well as hierarchical Bayesian models for modelling the cortex from MRI images. I'll grant that the application of an architecture such as this to time-dependent classification is probably novel.


Yes, Hawkins did say something similar in a recent article I read.

This is interesting.

It really makes me wonder how the hell human beings are so good at it. What exactly allows me to make identifications of objects within pictures so easily and accurately? What are the heuristics I use to do that? It's so automatic that I can't even break it down!

Quote:
Original post by Kevinator
It really makes me wonder how the hell human beings are so good at it. What exactly allows me to make identifications of objects within pictures so easily and accurately? What are the heuristics I use to do that?


One popular theory as to how the brain performs recognition/understanding is that it proposes many possible, closely related explanations of what it 'sees' (and here I use 'see' to mean all sensory information). These are then tested against subsequent information and the poorly performing hypotheses are quickly culled until (hopefully) only one hypothesis remains, which is accepted as truth. This processing happens on timescales of the order of milliseconds, but can last longer. We know that in the visual system, for example, detecting certain information in our environment leads to set sequences of eye movements to vary the information we gather (which we don't even realise happens as our brain hides the variation in the images we see).

In certain cortical regions the hypotheses are seen to be recorded as oscillations at given frequencies of spike trains in closed neuronal loops, with variations represented by a narrow band of alternate frequency signals. The information combined in these signals is feature information from the sensory space; for example, texture, lines (and their orientation), shadow, movement velocity, etc. The hypotheses are ways of combining this information into a plausible description of the environment, grounded in what we have already observed. If we've never seen something before and it is difficult to cull the hypotheses, we can have difficulty working out what it is we are seeing. There are classic visual illusions used in psychology to assess this indecisive behaviour of the brain with regard to vision, and one could reasonably assume such illusions could be created for our other sensory systems.

A good example of this kind of recognition occurs in the olfactory system, which was shown (back in the mid '80s iirc) to encode smell information as low dimensional attractors in the phase space of certain neuronal clusters. If you recognised a smell/taste the oscillations in the olfactory neurons would settle onto a unique attractor. If you didn't, after repeated exposure (which equates to reinforcement of the stimulus) you learned a new attractor and hence a new smell/flavour.
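Computationally, the propose-test-cull loop described above is close to sequential Bayesian filtering. A toy sketch (entirely my own illustration of the mechanism, not a model from this thread; the colour values and threshold are arbitrary):

```python
import math

def cull_hypotheses(priors, likelihood_fn, observations, threshold=0.01):
    """Reweight a set of competing hypotheses by the likelihood of each
    incoming observation, renormalise, and discard any hypothesis whose
    posterior weight falls below the threshold."""
    weights = dict(priors)
    for obs in observations:
        weights = {h: w * likelihood_fn(h, obs) for h, w in weights.items()}
        total = sum(weights.values())
        if total == 0:
            break  # every hypothesis ruled out; nothing left to cull
        weights = {h: w / total for h, w in weights.items()
                   if w / total >= threshold}
    return weights

# toy example: deciding between two objects from noisy colour readings
expected = {"backpack": 0.2, "chair": 0.8}  # hypothetical mean colour values
def likelihood(h, obs):
    # Gaussian-style likelihood around each hypothesis's expected value
    return math.exp(-((obs - expected[h]) ** 2) / 0.02)

posterior = cull_hypotheses({"backpack": 0.5, "chair": 0.5},
                            likelihood, [0.75, 0.82, 0.78])
# "backpack" is culled after the first reading; only "chair" survives
```

The brain's version is of course vastly richer, but the shape is the same: many hypotheses in parallel, rapid reweighting by new evidence, and early elimination of the poor performers.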


I could go on about this stuff all day, but I won't. If you're interested, there is heaps of information out there in both lay and professional publications. It makes for great Sunday afternoon reading! ;)

Cheers,

Timkin
