#### Archived

This topic is now archived and is closed to further replies.

# letter recognition


## Recommended Posts

I'm working on a fairly simple letter recognition program using a basic neural net, and I remember reading a few months ago about a data set of letters out in the ether somewhere. If I recall correctly, it was just boolean pixels making up all the letters of the alphabet many times over, used for training a net. Obviously this would make training my net a lot easier if I could just load a .dat file or something and run that. If anyone knows where this might be, or another method/source I could use, I would really appreciate the help. Thanks, justo [edited by - justo on August 18, 2003 9:06:08 PM]

##### Share on other sites
2 years ago, at GDC, I was shown an applet from the Postal Service that showed how they use NNs to do the character recognition for handwritten addresses. I don't know if they had an available character set, but that might be something to search for.

For your information, it WAS a grid system of boolean data... a pixel was either on or off.

Dave Mark - President and Lead Designer
Intrinsic Algorithm -
"Reducing the world to mathematical equations!"

OK, I found numerous references to the NIST database via Google, but it's like $1000 and not very viable for this project. Anyone else have any other ideas?

##### Share on other sites
You don't give many specifics about what you're doing (size of the input image, whether other sizes/resolutions will be converted to a standard image size, whether you expect to accept grayscale or color images to be converted to binary, etc.).

Anyway, the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/MLRepository.html) might have what you're looking for. Otherwise, try the data sources list at KDnuggets (http://www.kdnuggets.com/datasets/index.html) or search on your favorite search engine for things like "OCR", "character recognition", etc. Here are two other possible sources:

http://www.eelab.usyd.edu.au/cel/Resources/ocr/ocrdata.htm
http://www.cs.wisc.edu/~dyer/cs766/hw/hw4/hw4-sheng/sheng.html

Good luck,
Predictor
http://will.dwinnell.com

##### Share on other sites
OK, thanks a lot, I'll check those out. I don't know if it matters now, but the reason I was looking for this before I got too far along was so I could know what resolution to pick: whatever has the largest source of training sets. And I *probably* won't put in the capability to accept grayscale/color images, but who knows; somewhere along the line I may get bored with doing UI stuff and play around with some conversion algorithms. Anyway, thanks again,
justo

##### Share on other sites
Both gray and color can be reduced to on/off at some level.

Dave Mark - President and Lead Designer
Intrinsic Algorithm -
"Reducing the world to mathematical equations!"
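That reduction to on/off can be sketched as simple global thresholding of an 8-bit grayscale image (the threshold value and the dark-ink-on-light-paper convention here are arbitrary assumptions):

```python
def to_boolean(gray, threshold=128):
    """Reduce an 8-bit grayscale image (list of rows of 0-255 values)
    to on/off pixels by global thresholding: dark pixels (ink) become
    1, light pixels (paper) become 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]

# Tiny illustrative image: dark, light / light, dark.
gray = [
    [ 10, 200],
    [250,  40],
]
binary = to_boolean(gray)
```

Color images would first be reduced to grayscale (e.g. by averaging channels) before the same thresholding step.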
#### Share this post ##### Link to post ##### Share on other sites For what it's worth, although I despise the AI: A Modern Approach book by Russell & Norvig, the one section I think they actually did well was a comparison of neural nets, boosted ANNs, SVMs, boosted SVMs, decision trees, nearest neighbor and LeNet in letter recognition on the US Postal database (it's $1,000? I didn't know that...)

Anyway, it's an interesting read and you might be able to find it on the Web somewhere (maybe even at AIMA's Web site at Berkeley).

All of which assumes you are interested in alternatives to ANNs. As expected (by me, at least), ANNs took a very long time to train and were not the best. LeNet is the reigning champion. It's a feature extractor. Some people call them ANNs, but then again, some people will call anything that uses linked lists an ANN. LeNet is just a series of arrays, each smaller than the last, that squeezes values kind of like trilinear filtering does in graphics. If you have a 16x16 base array you might squeeze it into a 4x4 array where each 4x4 block of pixels is averaged to create one pixel in the next layer. The end result is a very small array with, hopefully, all the proper fields filled in.
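That squeezing step can be sketched as plain average pooling. This is only a minimal illustration of the averaging idea described above (shown on an assumed 4x4 grid reduced to 2x2), not the actual LeNet architecture:

```python
def squeeze(grid, factor):
    """Average-pool a square grid by `factor`: each factor x factor
    block of pixels becomes one averaged value in the output grid."""
    n = len(grid)
    out = []
    for i in range(n // factor):
        row = []
        for j in range(n // factor):
            block = [grid[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

# A 4x4 grid of 0/1 pixels squeezed to 2x2 by averaging 2x2 blocks.
grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
pooled = squeeze(grid, 2)
```

Stacking several such layers, each smaller than the last, gives the "series of arrays" structure described above.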

BTW, if you're just going to scan in free-form text, I should mention that psychologists have spent a lot of time talking about how strongly vision influences human thinking and how human thought is strongly influenced by directions like up and down. The reason this is important is that humans rely a lot on orientation for meaning; it's why most humans consider a diamond to be a different shape than a square, and it makes a lot of difference in telling lowercase "b", "d" and "p" apart. Vision tends to be "size invariant", meaning we don't give a lot of importance to size except in relative situations (big "O" vs. little "o", big "I" vs. little "l").

Anyway, the point is that someone has to rotate the image properly before you can recognize it. I think the way humans do it for a single letter is they feature-match a letter or number to an orientation-sensitive feature template, and if there are no matches they mentally rotate the image 15 degrees clockwise and try again (repeat 24 times). The exception is if the image is 2D and can be flipped in the 3rd dimension (i.e., rotated on a diagonal axis), which your brain does right away. Note this doesn't work on 3D images. Also note that humans probably look at other clues for rotation when they're looking at a page of text rather than a single letter.

-b

##### Share on other sites
quote:
Original post by baylor
I think the way humans do it for a single letter is they feature-match a letter or number to an orientation-sensitive feature template, and if there are no matches they mentally rotate the image 15 degrees clockwise and try again (repeat 24 times).

You'll find that most people, when presented with a letter that they think they might recognise but don't - i.e., it has features of a letter but is not a pattern they recognise instantly - will rotate either the presented object (paper) or their head if they can't rotate the object (computer screen).

We have neuronal columns in our visual cortex whose firing rate depends on the orientation of visual stimuli. That is, we can say that certain neuronal clusters are sensitive to given orientations of stimuli. It has also been found - from in vivo experiments in cats and simulations of human neuronal columns of the visual cortex - that features in visual stimuli are encoded in a) the firing rates of clusters of neurons; b) the difference in firing rates between clusters of neurons; c) the phase coherence between clusters of neurons; and d) (most interestingly) the temporal evolution of phase coherence between clusters of neurons.

So what does all this mean? It means that orientation and direction of motion of visual stimuli are important features; however, the encoding of features of visual stimuli in natural visual cortices is not easily translated into simulated neural systems such as the ANN. Therefore, you're going to have to either 1) train your classifier on all possible orientations (presuming that the stimulus is stationary relative to the receptor); or 2) design an artificial neural network that encodes orientation features in loosely connected sub-networks.

Most people just do (1).

Cheers,

Timkin
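Option (1) above can be sketched by generating rotated copies of each training glyph. This minimal example uses only the four 90-degree rotations, which stay exact on a boolean grid; a real training set would use much finer angular steps and some interpolation:

```python
def rot90(grid):
    """Rotate a square grid 90 degrees clockwise."""
    n = len(grid)
    return [[grid[n - 1 - j][i] for j in range(n)] for i in range(n)]

def all_orientations(glyph):
    """The glyph plus its three 90-degree rotations, to be fed to the
    classifier as additional training examples."""
    out = [glyph]
    for _ in range(3):
        out.append(rot90(out[-1]))
    return out

# Illustrative 3x3 "L" glyph and its rotated training copies.
L = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
variants = all_orientations(L)
```

Each rotated copy keeps the original class label, so the classifier learns to map any orientation of the glyph to the same letter.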

##### Share on other sites
Ha, thanks for the responses. I thought that was it for this thread.

baylor: just joking about the $1000 thing, I was wrong. It's $90, which is still more than I'd be willing to spend on this, but I guess not unreasonable considering how huge a database it is and how useful it would be for a professional system. As for LeNets... does the "Le" stand for Laplacian pyramids? I learned about them in a class I sat in on last year, using them for texture filtering, as you mentioned; it seems like the same technique. I'll definitely check them out.

As for your and Timkin's discussion on orientation: really I'm just doing this as a simple project, figuring out how to save, load and train neural nets in manageable ways. I find it all very interesting, though, so bring on the discussion!

I've read about encoding information in the different firing rates of neurons and how to abstract that, along with a host of other issues that really put my little net to shame. It makes you appreciate the real thing and, like you guys seem to think, makes it so much more interesting to study and try to emulate. Besides AI: A Modern Approach, which you seem to love so much, baylor, are there any good books you recommend? Books that you still have and pull off your shelf to read about something... maybe a mix of theory and practical examples?

thanks again for the responses
justo

##### Share on other sites
quote:
Original post by Timkin
...you're going to have to either 1) train your classifier on all possible orientations (presuming that the stimulus is stationary relative to the receptor); or 2) design an artificial neural network that encodes orientation features in loosely connected sub-networks.

Most people just do (1).

Another possibility is to add a non-neural pre-processing step which is invariant to rotation (typically, either rotating the data to a standard orientation or computing summarized features which are invariant to rotation).

-Predictor
http://will.dwinnell.com
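As a minimal sketch of that second idea, here are two simple summaries of a boolean glyph that do not change when the glyph is rotated (the glyphs and feature choices are purely illustrative; on a discrete grid the invariance is exact only for 90-degree rotations):

```python
def invariant_features(grid):
    """Two rotation-invariant summaries of a boolean glyph: the total
    ink count, and the mean distance of ink pixels from their centroid.
    Neither depends on the glyph's orientation."""
    pts = [(r, c) for r, row in enumerate(grid)
           for c, px in enumerate(row) if px]
    n = len(pts)
    cr = sum(r for r, _ in pts) / n          # centroid row
    cc = sum(c for _, c in pts) / n          # centroid column
    mean_radius = sum(((r - cr) ** 2 + (c - cc) ** 2) ** 0.5
                      for r, c in pts) / n
    return n, mean_radius

# The same illustrative "L" glyph in two orientations.
glyph = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]
rotated = [        # glyph rotated 90 degrees clockwise
    [1, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
features_a = invariant_features(glyph)
features_b = invariant_features(rotated)
```

Feeding such features to the classifier instead of raw pixels removes the orientation problem at the cost of discarding orientation-dependent distinctions (e.g. "b" vs. "q"), so they are usually combined with other features.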

##### Share on other sites
quote:
Original post by justo
...books that you still have and pull off your shelf to read about something... maybe a mix of theory and practical examples?

While some people don't like it, I like and still use "C++ Neural Networks & Fuzzy Logic" as a reference and "how-to" for many ANN tasks. It comes with code and data, so it's fairly good for learning, although there are certainly other good books out there that suit this purpose as well.

Timkin

##### Share on other sites
quote:
Original post by justo
...are there any good books you recommend? Books that you still have and pull off your shelf to read about something... maybe a mix of theory and practical examples?

For the image processing side of this process, I'd suggest either "Algorithms for Image Processing and Computer Vision", by Parker, or "Digital Image Processing", by Gonzalez and Woods. Parker's book includes more OCR-specific information.

Assuming that you are training some sort of learning system for the classification, I'd suggest "Neural Networks for Statistical Modeling", by Smith for a purely neural approach, "Advanced Methods in Neural Computing", by Wasserman for alternative neural architectures, or one of "Computer Systems That Learn", by Weiss and Kulikowski, "Predictive Data Mining", by Weiss and Indurkhya, or "Solving Data Mining Problems through Pattern Recognition", by Kennedy, Lee, Reed, Van Roy and Lippmann for a broader mixture of learning algorithms.

All of these titles, and others, are described (including ISBNs) in Will's Technical Book List, which can be found at http://will.dwinnell.com/will/Wills%20Technical%20Book%20List%20Mar-2002.doc

Good luck!
Predictor
http://will.dwinnell.com

[edited by - Predictor on August 26, 2003 8:40:41 AM]

##### Share on other sites
For a good coverage of classification techniques, you might also want to try "Pattern Classification", by Duda & Hart. For a good recent text on using ANNs for pattern recognition/classification, check out "Neural Networks for Pattern Recognition" by C.M. Bishop (Clarendon Press, 1995).

Cheers,

Timkin

##### Share on other sites
quote:
Original post by Timkin
For a good coverage of classification techniques, you might also want to try: "Pattern Classification", by Duda & Hart.

There is a second edition of this book, co-authored by Duda, Hart and Stork. For the ISBN and a review, see:

http://www.pcai.com/Paid/Issues/AC14563/CD4231A/16.2_PA/PCAI-16.2-Paid-pg.40-Bookzone.htm