

letter recognition

started by justo



I'm working on a fairly simple letter recognition program using a basic neural net, and I remember reading a few months ago about a data set of letters out in the ether somewhere. If I recall correctly it was just boolean pixels making up all the letters of the alphabet many times over, used for training a net. Obviously this would make training my net a lot easier, if I could just load a .dat file or something and run that. If anyone knows where this might be, or another method/source I could use, I would really appreciate the help. Thanks, justo [edited by - justo on August 18, 2003 9:06:08 PM]

Two years ago at GDC, I was shown an applet from the Postal Service that showed how they use NNs to do character recognition for handwriting, etc. on addresses. I don't know if they had an available character set, but that might be something to search for.

For your information, it WAS a grid system of boolean data... a pixel was either on or off.
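(To make that format concrete, here is a minimal sketch in Python/NumPy, not taken from any particular dataset, of how a boolean-pixel letter might be stored and flattened into a training vector for a net. The 5x5 grid and the label scheme are illustrative assumptions.)

    # A 5x5 boolean-pixel "T", flattened into an input vector plus a
    # class label, the way such training sets are typically laid out.
    import numpy as np

    letter_T = np.array([
        [1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
        [0, 0, 1, 0, 0],
    ], dtype=np.uint8)

    x = letter_T.flatten()   # 25-element input vector for the net
    y = ord('T') - ord('A')  # class label in 0..25
    print(x, y)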

Dave Mark - President and Lead Designer
Intrinsic Algorithm
"Reducing the world to mathematical equations!"

OK, I found numerous references to the NIST database via Google, but it's like $1000 and not very viable for this project. Anyone else have any other ideas?

You don't give many specifics about what you're doing (size of the input image and whether other sizes/resolutions will be converted to a standard image size, whether you expect to accept grayscale or color images to be converted to binary, etc.).

Anyway, the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html) might have what you're looking for. Otherwise, try the data sources list at KDnuggets (http://www.kdnuggets.com/datasets/index.html) or search on your favorite search engine for things like "OCR", "character recognition", etc.

Here are two other possible sources:
http://www.eelab.usyd.edu.au/cel/Resources/ocr/ocrdata.htm
http://www.cs.wisc.edu/~dyer/cs766/hw/hw4/hw4-sheng/sheng.html

Good luck,
Predictor
http://will.dwinnell.com

OK, thanks a lot, I'll check those out. I don't know if it matters now, but the reason I was looking for this before I got too far along was so I could know what resolution to pick... whatever has the largest source of training sets. And I *probably* won't put in the capability to accept greyscale/color images, but who knows, somewhere along the line I may get bored with doing UI stuff and play around with some conversion algorithms.

anyway, thanks again
justo

For what it's worth, although I despise the AI: A Modern Approach book by Russell & Norvig, the one section I think they actually did well was a comparison of neural nets, boosted ANNs, SVMs, boosted SVMs, decision trees, nearest neighbor, and LeNet at letter recognition on the US Postal database (it's $1,000? I didn't know that...).

Anyway, it's an interesting read and you might be able to find it on the Web somewhere (maybe even at AIMA's Web site at Berkeley).

All of which assumes you are interested in alternatives to ANNs. As expected (by me at least), ANNs took a super long time to train and were not the best. LeNet is the reigning champion. It's a feature extractor. Some people call them ANNs, but then again, some people will call anything that uses linked lists an ANN. LeNet is just a series of arrays, each smaller than the last, that squeezes values kind of like trilinear filtering does in graphics. If you have a 16x16 base array you might squeeze it into a 4x4 array where each 4x4 block of pixels is averaged to create one pixel in the next layer. The end result is a very small array with, hopefully, all the proper fields filled in.
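(A minimal Python/NumPy sketch of that squeeze step, following the simplified averaging description above rather than LeNet's actual learned convolutions; the random image is just a stand-in.)

    import numpy as np

    img = (np.random.rand(16, 16) > 0.5).astype(float)  # stand-in 16x16 binary image

    # Average each 4x4 block down to one value: 16x16 -> 4x4.
    pooled = img.reshape(4, 4, 4, 4).mean(axis=(1, 3))
    print(pooled.shape)  # (4, 4)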

BTW, if you're just going to scan in free-form text, I should mention that psychologists have spent a lot of time talking about how strongly vision influences human thinking and how human thought is strongly influenced by directions like up and down. The reason this is important is that humans rely a lot on orientation for meaning. It's why most humans consider a diamond to be a different shape than a square, and it makes a lot of difference in telling lowercase "b", "d", and "p" apart. Vision tends to be "size invariant", meaning we don't give a lot of importance to size except in relative situations (big "O" vs. little "o", big "I" vs. little "l").

Anyway, the point is that someone has to rotate the image properly before you can recognize it. I think the way humans do it for a single letter is they feature-match a letter or number to an orientation-sensitive feature template, and if there are no matches they mentally rotate the image 15 degrees clockwise and try again (repeat up to 24 times). The exception is if the image is 2D and can be flipped in the 3rd dimension (i.e., rotated on a diagonal axis), which your brain does right away. Note this doesn't work on 3D images. Also note that humans probably look at other clues for rotation when they're looking at a page of text rather than a single letter.
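(A toy Python sketch of that rotate-and-retry strategy, assuming binary images and a dict of same-size templates. The function name and the pixelwise-agreement score are hypothetical illustrations, not how production OCR does it.)

    import numpy as np
    from scipy.ndimage import rotate

    def best_match(image, templates, step_deg=15):
        # Try each 15-degree orientation (24 in all); score every template
        # by pixelwise agreement and keep the best label found.
        best_label, best_score = None, -np.inf
        for k in range(360 // step_deg):
            rotated = rotate(image, k * step_deg, reshape=False, order=0)
            for label, tmpl in templates.items():
                score = np.sum(rotated == tmpl)
                if score > best_score:
                    best_label, best_score = label, score
        return best_label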

-b

quote:
Original post by baylor
I think the way humans do it for a single letter is they feature-match a letter or number to an orientation-sensitive feature template, and if there are no matches they mentally rotate the image 15 degrees clockwise and try again (repeat up to 24 times).


You'll find that most people, when presented with a letter that they think they might recognise but don't - i.e., it has features of a letter but is not a pattern they recognise instantly - will rotate either the presented object (paper) or their head if they can't rotate the object (computer screen).

We have neuronal columns in our visual cortex whose firing rate depends on the orientation of visual stimuli. That is, we can say that certain neuronal clusters are sensitive to given orientations of stimuli. It has also been found - from in vivo experiments in cats and simulations of human neuronal columns of the visual cortex - that features in visual stimuli are encoded in: a) the firing rates of clusters of neurons; b) the difference in firing rates between clusters of neurons; c) the phase coherence between clusters of neurons; and d) (most interestingly) the temporal evolution of phase coherence between clusters of neurons.

So what does all this mean? It means that orientation and direction of motion of visual stimuli are important features; however, the encoding of features of visual stimuli in natural visual cortices is not easily translated into simulated neural systems such as the ANN. Therefore, you're going to have to either: 1) train your classifier on all possible orientations (presuming that the stimulus is stationary relative to the receptor); or 2) design an artificial neuronal network that encodes orientation features in loosely connected sub-networks.

Most people just do (1).
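(For what option (1) usually amounts to in practice, here is a minimal data-augmentation sketch in Python/NumPy, assuming images stored as arrays; the 15-degree step and all names are illustrative choices, not a prescribed recipe.)

    import numpy as np
    from scipy.ndimage import rotate

    def augment_orientations(images, labels, step_deg=15):
        # Expand the training set with every rotation of each sample so
        # the classifier sees all orientations at training time.
        xs, ys = [], []
        for img, label in zip(images, labels):
            for k in range(360 // step_deg):
                xs.append(rotate(img, k * step_deg, reshape=False, order=0))
                ys.append(label)
        return np.array(xs), np.array(ys)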

Cheers,

Timkin

Ha, thanks for the responses. I thought that was it for this thread.

baylor: just joking about the $1000 thing, I was wrong. It's $90, which is still more than I'd be willing to spend on this, but I guess not unreasonable considering how huge a database it is and how useful it would be for a professional system. As for LeNets... does the "Le" stand for Laplacian pyramids? I learned about those in a class I sat in on last year, using them for texture filtering, as you mentioned. It seems like the same technique. I'll definitely check them out.

As for your and Timkin's discussion on orientation: really, I'm just doing this as a simple project, figuring out how to save and load and train neural nets in manageable ways. I find it all very interesting though, so bring on the discussion!

I've read about encoding information in the different firing rates of neurons and how to abstract that, along with a host of other issues that really put my little net to shame. It makes you appreciate the real things and, like you guys seem to think, makes them so much more interesting to study and try to emulate. Besides AI: A Modern Approach, which you seem to love so much baylor, are there any good books you recommend? Books that you still have and pull off your shelf to read about something... maybe a mix of theory and practical examples?

thanks again for the responses
justo

quote:
Original post by Timkin
...you're going to have to either: 1) train your classifier on all possible orientations (presuming that the stimulus is stationary relative to the receptor); or 2) design an artificial neuronal network that encodes orientation features in loosely connected sub-networks.

Most people just do (1).


Another possibility is to add a non-neural pre-processing step which is invariant to rotation (typically, either rotating the data to a standard orientation or computing summary features which are invariant to rotation).
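(One example of the summary-features route, sketched in Python/NumPy under the assumption of binary images; the radial histogram is just one of many rotation-invariant choices, and the function name and bin count are illustrative.)

    import numpy as np

    def radial_profile(img, nbins=8):
        # Histogram pixel mass by distance from the centroid. Rotating
        # the image leaves these bin totals (nearly) unchanged, so they
        # can be fed to the net in place of raw pixels.
        ys, xs = np.nonzero(img)
        if len(xs) == 0:
            return np.zeros(nbins)
        cy, cx = ys.mean(), xs.mean()
        r = np.hypot(ys - cy, xs - cx)
        hist, _ = np.histogram(r, bins=nbins, range=(0, r.max() + 1e-9))
        return hist / hist.sum()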

-Predictor
http://will.dwinnell.com

