any neural network experts?

11 comments, last by EJH 14 years, 1 month ago
Hey, does anyone have experience training neural networks with large datasets? I'm doing character recognition, and at this point it isn't feasible to use trial and error to train, especially when I have to leave it on overnight.

At the moment I'm training characters A-Z, with 10 training examples for each character.

My network structure is 36,36,36,26.
Learning rate = 0.05
Momentum 0.01

With higher learning rates or momentum it doesn't seem to converge at all, the error just swings up and down, but with these values it converges slowly, getting slower and slower as the error drops from 2000 to 1200. And at around 1200 error, where it seems to be stuck in a local minimum, it is only partially good at recognising characters.

My only guess would be that the network isn't complex enough for the dataset? So I am now training with 36,36,36,30,26 to see what happens, but it takes forever, about one minute per epoch.

Any ideas, anyone? Should I try increasing the size of the hidden layers instead of increasing the number of layers?

Cheers,
Silver
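For reference, a minimal numpy sketch of the kind of setup described above: a 36,36,36,26 sigmoid network trained by plain online backprop with learning rate 0.05 and momentum 0.01. This is not Silver's code; the initialisation, bias handling and squared-error measure are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [36, 36, 36, 26]          # input, two hidden layers, output
lr, momentum = 0.05, 0.01

# One weight matrix per layer (with a bias column) and a velocity term for momentum.
weights = [rng.normal(0.0, 0.1, (n_out, n_in + 1))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
velocity = [np.zeros_like(w) for w in weights]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    """Return the activations of every layer for one input vector."""
    activations = [x]
    for w in weights:
        x = sigmoid(w @ np.append(x, 1.0))   # append 1.0 as the bias input
        activations.append(x)
    return activations

def train_example(x, target):
    """One online backprop step with momentum; returns the squared error."""
    acts = forward(x)
    delta = (acts[-1] - target) * acts[-1] * (1.0 - acts[-1])
    for i in reversed(range(len(weights))):
        grad = np.outer(delta, np.append(acts[i], 1.0))
        if i > 0:   # propagate the error before this layer's weights change
            delta = (weights[i][:, :-1].T @ delta) * acts[i] * (1.0 - acts[i])
        velocity[i] = momentum * velocity[i] - lr * grad
        weights[i] += velocity[i]
    return float(np.sum((acts[-1] - target) ** 2))
```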
Backpropagation is surprisingly robust if you let it train for ages, more so than other training algorithms.

However, I've had a lot of success with things like RPROP. The idea is to batch up all the weight deltas over the whole training set and adapt each weight's step size from the sign of its gradient, so you're less likely to get stuck in local minima.

It's not too much work to implement. See the FANN library if you get stuck.
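If it helps, here's a rough Python sketch of the RPROP idea. It's the simplified iRPROP- variant, and the parameter names are just my choices; FANN's implementation is the one to actually use.

```python
import numpy as np

def rprop_update(w, grad, prev_grad, step,
                 eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One RPROP step: adapt each weight's step size from the sign of its
    full-batch gradient, ignoring the gradient's magnitude."""
    sign_change = np.sign(grad) * np.sign(prev_grad)
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)     # back off where the sign flipped
    w = w - np.sign(grad) * step
    return w, grad, step
```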


Do you think the structure is sufficient?
Quote:Original post by Si1ver
Hey, does anyone have experience training neural networks with large datasets? I'm doing character recognition, and at this point it isn't feasible to use trial and error to train, especially when I have to leave it on overnight.

At the moment I'm training characters A-Z, with 10 training examples for each character.

My network structure is 36,36,36,26.
Learning rate = 0.05
Momentum 0.01

With higher learning rates or momentum it doesn't seem to converge at all, the error just swings up and down, but with these values it converges slowly, getting slower and slower as the error drops from 2000 to 1200. And at around 1200 error, where it seems to be stuck in a local minimum, it is only partially good at recognising characters.

My only guess would be that the network isn't complex enough for the dataset? So I am now training with 36,36,36,30,26 to see what happens, but it takes forever, about one minute per epoch.

Any ideas, anyone? Should I try increasing the size of the hidden layers instead of increasing the number of layers?


I use early stopping to prevent over-fitting, and my experience is that backpropagation frequently hits its optimum fairly quickly (sometimes in as few as 50 training iterations). Obviously, your experience may be very different, given different software and data.
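For what it's worth, the early-stopping loop looks roughly like this. The train_one_epoch and validation_error functions, and the copy_weights/set_weights methods, are placeholders for your own routines, not a real API.

```python
def train_with_early_stopping(net, train_set, val_set, patience=10, max_epochs=500):
    """Stop when the validation error hasn't improved for `patience` epochs."""
    best_error, best_weights = float("inf"), net.copy_weights()
    stale = 0
    for epoch in range(max_epochs):
        train_one_epoch(net, train_set)             # one pass of backprop
        error = validation_error(net, val_set)      # error on held-out examples
        if error < best_error:
            best_error, best_weights, stale = error, net.copy_weights(), 0
        else:
            stale += 1
            if stale >= patience:
                break                               # further training would over-fit
    net.set_weights(best_weights)                   # roll back to the best epoch
    return net
```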

As a general note, once the neural network has more than one output node, there is the possibility that some output nodes are trained before others. This implies that further training will over-fit some nodes while others are still under-fit. With 26 output nodes, I suspect there is a strong possibility that this is happening.

I suggest trying different methods of pre-processing the data. This will often improve results more than tinkering with training settings.

Good luck!



Another option would be to use 26 separate networks (one for each letter) and train them independently.

Similar approaches have been used for things like autonomous navigation, where one net controls yaw, another controls pitch, and so on.
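A quick sketch of what that looks like for the letter case; train_binary_net (builds a single-output net) and the predict method are placeholders, not real functions.

```python
import string

def train_per_letter_nets(examples):
    """examples: list of (feature_vector, letter) pairs."""
    nets = {}
    for letter in string.ascii_uppercase:
        # each net only learns "is this letter X or not?"
        labeled = [(x, 1.0 if y == letter else 0.0) for x, y in examples]
        nets[letter] = train_binary_net(labeled)
    return nets

def classify(nets, x):
    # pick the letter whose network responds most strongly
    return max(nets, key=lambda letter: nets[letter].predict(x))
```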

I think your network structure is actually too complex, especially in the modified version with 3 hidden layers. Two hidden layers should be plenty.

You mention oscillatory behavior - are you doing batch training or individual (online) training? Oscillations often occur when you take a training step after each example; you can avoid this with batch training - run all of your examples through once, accumulate the errors, and then do one training step afterward.
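In code, the difference is just where the weight update happens; compute_gradients below is a placeholder for your backprop pass.

```python
def batch_epoch(weights, training_set, lr=0.05):
    """Accumulate gradients over every example, then take a single step."""
    totals = [0.0 * w for w in weights]
    for x, target in training_set:
        grads = compute_gradients(weights, x, target)   # no update here
        totals = [t + g for t, g in zip(totals, grads)]
    # one training step after the whole set has been seen
    return [w - lr * t / len(training_set) for w, t in zip(weights, totals)]
```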

-Kirk

Thanks for your comments guys. I have mostly come to a solution now. I believe the network was in fact too complex, as kirkd suggested. I was much more successful with one hidden layer. In addition, I am now processing the training examples in a random sequence for each epoch.

Silver
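For anyone following along, the per-epoch shuffling Silver mentions is just this; train_on_example stands in for a single backprop step.

```python
import random

def train_shuffled(examples, epochs):
    for _ in range(epochs):
        random.shuffle(examples)            # new random presentation order each epoch
        for x, target in examples:
            train_on_example(x, target)
```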
One thing I would recommend before making any NN is to consider the input and output data itself. How do you represent the data going to and from the net? This is often the first issue in making a fast net.

For example, you are using 26 output nodes, with each node representing a letter. A better way would be to have the net output a binary number.

00001 = a
00010 = b
00011 = c
...
11010 = z

This, along with only using one hidden layer, would really speed up the net. I used this with a net that detected numbers in 16x16 pixel images, with one hidden layer and a 5-node output layer. Backprop only took a minute.
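Roughly, the two target encodings look like this; the bit order is my assumption, chosen to match the table above (A = 00001 up to Z = 11010).

```python
def binary_target(letter, bits=5):
    """'A' -> [0,0,0,0,1], 'B' -> [0,0,0,1,0], ..., 'Z' -> [1,1,0,1,0]."""
    index = ord(letter.upper()) - ord('A') + 1          # A=1 ... Z=26, as in the post
    return [(index >> b) & 1 for b in reversed(range(bits))]

def one_hot_target(letter, classes=26):
    """The usual one-node-per-letter target: 'A' -> [1,0,0,...,0]."""
    index = ord(letter.upper()) - ord('A')
    return [1 if i == index else 0 for i in range(classes)]
```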
You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.
Quote:Original post by Kylotan
You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.


I've run into more problems trying to train nets that use the "one node per letter" structure, mostly with over-fitting. I also don't think the implied relationship matters. :) I've trained nets on vector data to output controls for 'bots'; the output was between 1 and -1, with a few nodes controlling turning angle and speed. What I'm saying is that there is no inherent connection between the output nodes themselves. Just because a = 00001 and c = 00011 doesn't mean there is a relationship.

I just found the project I used this for. The net was:

384 -> 40 -> 7

384 inputs because the images were 24x16. 150 training images, numbers up to 75, with a white or gray background. It works every time I train a net; the error rate always falls and never fluctuates. :) (I was most proud of it lol)

