Si1ver

any neural network experts?


Hey, does anyone have experience training neural networks with large datasets? I'm doing character recognition, and at this point it isn't feasible to train by trial and error, especially when I have to leave it running overnight.

At the moment I'm training on the characters A-Z, with 10 training examples per character.

My network structure is 36,36,36,26.
Learning rate = 0.05
Momentum = 0.01

With higher learning rates or momentum it doesn't seem to converge at all; the error just oscillates wildly. With these settings it does converge, but more and more slowly as the error drops from 2000 to 1200. And at 1200 error, where it seems to be caught in a local minimum, it is only partially good at recognising characters.

My only guess is that the network isn't complex enough for the dataset? So I am now training with 36,36,36,30,26 to see what happens, but it takes forever: about one minute per epoch.

Any ideas, anyone? Should I try increasing the size of the hidden layers instead of increasing the number of layers?

Cheers,
Silver
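For reference, a minimal NumPy sketch of the architecture being described: a 36-36-36-26 feed-forward net with sigmoid activations. The 6x6 input layout and the weight ranges are assumptions, since the post doesn't specify them; biases are omitted for brevity.

import numpy as np

rng = np.random.default_rng(0)

# Layer sizes as described: 36 inputs, two hidden layers of 36, 26 outputs.
sizes = [36, 36, 36, 26]
weights = [rng.uniform(-0.5, 0.5, (n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    # x is the 36-value input vector (assumed to be a flattened
    # 6x6 character bitmap); returns 26 activations, one per letter.
    for w in weights:
        x = sigmoid(w @ x)
    return x

out = forward(rng.random(36))
print(chr(ord('A') + int(np.argmax(out))))  # most active letter node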

Backpropagation is surprisingly robust if you let it train for ages, more so than other training algorithms.

However, I've had a lot of success with things like RPROP. The idea is to batch up all the weight deltas while training and adapt a per-weight step size from the sign of the accumulated gradient, so you're less likely to get stuck in local minima.

It's not too much work to implement. See the FANN library if you get stuck.
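For what it's worth, a rough NumPy sketch of the iRPROP- variant, assuming you already accumulate a full-batch gradient each epoch (the function name and default constants are illustrative, not taken from FANN):

import numpy as np

def irprop_minus(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5,
                 step_max=50.0, step_min=1e-6):
    # One iRPROP- update for one weight matrix. grad is this epoch's
    # full-batch gradient, prev_grad the previous epoch's, and step
    # holds the per-weight step sizes.
    same = grad * prev_grad
    # Gradient kept its sign: still heading downhill, so grow the step.
    step = np.where(same > 0, np.minimum(step * eta_plus, step_max), step)
    # Gradient flipped sign: we overshot a minimum, so shrink the step
    # and skip this weight's update for one epoch.
    step = np.where(same < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(same < 0, 0.0, grad)
    # Each weight moves by its own step size; only the sign of the
    # gradient is used, never its magnitude.
    w = w - np.sign(grad) * step
    return w, grad, step

You'd call this once per epoch for each weight matrix, with step initialised to something like 0.1 everywhere. Because only the gradient's sign is used, there is no global learning rate to tune.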

Quote:
Original post by Si1ver
Hey, does anyone have experience training neural networks with large datasets? I'm doing character recognition, and at this point it isn't feasible to train by trial and error, especially when I have to leave it running overnight.

At the moment I'm training on the characters A-Z, with 10 training examples per character.

My network structure is 36,36,36,26.
Learning rate = 0.05
Momentum = 0.01

With higher learning rates or momentum it doesn't seem to converge at all; the error just oscillates wildly. With these settings it does converge, but more and more slowly as the error drops from 2000 to 1200. And at 1200 error, where it seems to be caught in a local minimum, it is only partially good at recognising characters.

My only guess is that the network isn't complex enough for the dataset? So I am now training with 36,36,36,30,26 to see what happens, but it takes forever: about one minute per epoch.

Any ideas, anyone? Should I try increasing the size of the hidden layers instead of increasing the number of layers?


I use early stopping to prevent over-fitting, and my experience is that backpropagation frequently hits its optimum fairly quickly (sometimes in as few as 50 training iterations). Obviously, your experience may be very different, given different software and data.

As a general note, once the neural network has more than one output node, there is the possibility that some output nodes finish training before others. This implies that further training will overfit some nodes while others are still underfit. With 26 output nodes, there is a strong possibility that this is happening.

I suggest trying different methods of pre-processing the data. This will often improve results more than tinkering with training settings.

Good luck!
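As an illustration, a generic early-stopping loop might look like the sketch below. net.train_one_epoch and net.error are stand-ins for whatever your own network class provides, and how you split off the validation set is up to you.

import copy

def train_with_early_stopping(net, train_set, val_set, patience=10):
    # Stop when validation error hasn't improved for `patience`
    # consecutive epochs, and keep the best weights seen so far.
    best_err = float("inf")
    best_net = copy.deepcopy(net)
    epochs_since_best = 0
    while epochs_since_best < patience:
        net.train_one_epoch(train_set)   # assumed method
        err = net.error(val_set)         # assumed method
        if err < best_err:
            best_err, best_net = err, copy.deepcopy(net)
            epochs_since_best = 0
        else:
            epochs_since_best += 1
    return best_net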



Another option would be to train 26 separate networks, one per letter, with each net acting as a yes/no detector for its letter.

Similar approaches have been used for things like autonomous navigation, where one net controls yaw, another controls pitch, and so on.
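A sketch of that idea, with make_net, net.train and net.output standing in for whatever your own code provides:

import string

def train_detectors(make_net, examples):
    # examples: list of (feature_vector, letter) pairs.
    detectors = {}
    for letter in string.ascii_uppercase:
        net = make_net()  # a small net with a single output node
        # Relabel the whole dataset as "is this letter or not?"
        labelled = [(x, 1.0 if y == letter else 0.0) for x, y in examples]
        net.train(labelled)
        detectors[letter] = net
    return detectors

def classify(detectors, x):
    # Pick the letter whose detector fires most strongly.
    return max(detectors, key=lambda letter: detectors[letter].output(x))

The relabelling step is what turns one 26-way problem into 26 much simpler yes/no problems.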

I think your network structure is actually too complex, especially in the modified version with 3 hidden layers. Two hidden layers should be plenty.

You mention oscillatory behavior - are you doing batch training or per-example training? Oscillations often occur when you take a training step after each example; you can avoid this with batch training - running all of your examples through once, accumulating the errors, and then doing one training step afterward.

-Kirk
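In code, the difference kirkd describes is roughly the following. net.backprop and net.weights are placeholders: backprop here is assumed to return the per-layer gradients (as NumPy arrays) without applying them.

def batch_epoch(net, examples, learning_rate=0.05):
    # Sum the gradients over every example, then apply a single
    # weight update at the end of the epoch.
    total_grads = None
    for x, target in examples:
        grads = net.backprop(x, target)
        if total_grads is None:
            total_grads = [g.copy() for g in grads]
        else:
            for acc, g in zip(total_grads, grads):
                acc += g
    # One update for the whole batch, instead of one per example.
    for w, g in zip(net.weights, total_grads):
        w -= learning_rate * g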

Thanks for your comments, guys. I have mostly come to a solution now. I believe the network was in fact too complex, as kirkd suggested; I was much more successful with one hidden layer. In addition, I now process the training examples in a random order each epoch.

Silver
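For anyone curious, the random-order training mentioned above amounts to something like this, where net.train_step is a stand-in for one online backprop update:

import random

def train(net, examples, epochs):
    # examples: list of (input, target) pairs, reshuffled every epoch
    # so the net never sees them in the same order twice in a row.
    for _ in range(epochs):
        random.shuffle(examples)
        for x, target in examples:
            net.train_step(x, target)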

One thing I would recommend before building any NN is to consider the input and output data itself: how do you represent the data to and from the net? This is often the first issue in making a fast net.

For example, you are using 26 output nodes, with each node representing a letter. A better way would be to have the net output the letter's index as a binary number:

00001 = a
00010 = b
00011 = c
...
11010 = z

This, along with using only one hidden layer, would really speed up the net. I used this approach with a net that detected numbers in 16x16 pixel images, with one hidden layer and a 5-node output layer. Backprop only took a minute.
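In code, that encoding and its inverse might look like this (the 0.5 threshold on the output activations is an assumption):

def letter_to_bits(letter):
    # a -> 1 ... z -> 26, written as 5 bits, most significant first.
    n = ord(letter.lower()) - ord('a') + 1
    return [(n >> i) & 1 for i in reversed(range(5))]

def outputs_to_letter(outputs):
    # Threshold each output node at 0.5, then rebuild the index.
    bits = [1 if o > 0.5 else 0 for o in outputs]
    n = int("".join(map(str, bits)), 2)
    return chr(ord('a') + n - 1)

assert letter_to_bits('c') == [0, 0, 0, 1, 1]
assert outputs_to_letter([0.9, 0.8, 0.1, 0.9, 0.2]) == 'z'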

You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.

Quote:
Original post by Kylotan
You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.


I've actually run into more problems with the "one node per letter" structure, mainly over-fitting. I also don't think that it matters. :) I've trained nets on vector data to output controls for bots; the outputs were between -1 and 1, with a few nodes controlling turning angle and speed. What I'm saying is that there need be no inherent connection between the output nodes themselves. Just because a = 00001 and c = 00011 doesn't mean there is a relationship.

I just found the project I used this for. The net was:

384 -> 40 -> 7

384 inputs because the images were 24x16. 150 training images, numbers up to 75, with a white or gray background. It works every time I train a net: the error rate always falls and never fluctuates. :) (I was most proud of it, lol)

Your network has a lot of layers...

Ten examples per character sounds a bit low...

What are you using as input to the network?

Are you using separate training sets and testing sets to prevent overfitting?

Quote:
Original post by the_eb
Quote:
Original post by Kylotan
You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.


I've actually run into more problems with the "one node per letter" structure, mainly over-fitting. I also don't think that it matters. :) I've trained nets on vector data to output controls for bots; the outputs were between -1 and 1, with a few nodes controlling turning angle and speed. What I'm saying is that there need be no inherent connection between the output nodes themselves. Just because a = 00001 and c = 00011 doesn't mean there is a relationship.

But there clearly is, since one of the output nodes is the same in each case. That means you've embodied in the network the requirement that one particular output node cannot change between A and C, whereas it has to change between A and B or between B and C. This places requirements on some of the weights and affects the training.

e.g. With only one node differing between a C and a G, you can expect a lot of errors there, whereas I and J, which you might expect to be confused more often, have 2 nodes differing, making them easier to differentiate: you'd need 2 nodes to be wrong in order to wrongly classify one as the other.

Looking at it from a different angle, imagine a 33-letter alphabet (e.g. Cyrillic) with 6 binary output nodes representing the numbers 0 to 32. Now you have 1 output node that exclusively identifies the last letter regardless of the other 5 nodes, whereas every other letter depends on a combination of the other 5. You can hopefully see that this is not an efficient use of the network: there are 5 outputs completely irrelevant to 1 of the letters, and 32 letters for which one of the outputs is irrelevant.

Of course, in the case we're talking about, this disadvantage is outweighed by the benefit of having a fifth as many outputs to train. But it's well worth bearing in mind, because generally you don't want to force your network to fit irrelevant patterns. Ideally the outputs should be completely independent.

(EDIT: apologies for the broken quoting before.)

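To make the bit-distance argument concrete, here is a quick check of the Hamming distances under the 5-bit scheme from the earlier post:

def hamming(a, b):
    # Number of differing bits between the codes of two letters,
    # using the a = 1 ... z = 26 encoding.
    x = (ord(a) - ord('a') + 1) ^ (ord(b) - ord('a') + 1)
    return bin(x).count("1")

print(hamming('c', 'g'))   # 1 - a single wrong bit turns a C into a G
print(hamming('i', 'j'))   # 2 - I and J need two wrong bits to be confused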

Not sure if you are doing this for school, but if you are, you should strongly consider neuroevolution: that is, evolving the network topology itself rather than hand-crafting an ad-hoc topology.

There are several methods for doing it, but NEAT is fairly easy to implement, and there are multiple open-source implementations out there in a variety of languages:

http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies

The benefits are that (1) the problem gets solved with a minimal network, and (2) you don't have to hand-code network topologies by trial and error, which may be extremely tricky anyway if the problem is sufficiently complex.
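For a flavour of what evolving the topology means, here is a toy sketch of just one ingredient, NEAT's add-node mutation. A real implementation also needs innovation numbers, speciation, and crossover; the class layout here is simplified for illustration.

import random
from dataclasses import dataclass

@dataclass
class Gene:
    # One connection gene; real NEAT also tracks a global
    # innovation number on each gene for crossover.
    src: int
    dst: int
    weight: float
    enabled: bool = True

def add_node_mutation(genome, next_node_id):
    # Split an existing connection src -> dst into src -> new -> dst,
    # disabling the original connection.
    conn = random.choice([g for g in genome if g.enabled])
    conn.enabled = False
    # The paper's convention: weight 1.0 into the new node and the old
    # weight out of it, so the mutation's initial effect on behaviour is small.
    genome.append(Gene(conn.src, next_node_id, 1.0))
    genome.append(Gene(next_node_id, conn.dst, conn.weight))
    return next_node_id + 1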

