Si1ver

any neural network experts?


Recommended Posts

Hey, does anyone have experience training neural networks with large datasets? I'm doing character recognition, and at this point it isn't feasible to use trial and error to train, especially when I have to leave it on overnight.

At the moment I'm training characters A-Z, with 10 training examples for each character.

My network structure is 36,36,36,26.
Learning rate = 0.05
Momentum = 0.01

With higher learning rates or momentums it doesn't seem to converge; the error just oscillates wildly. With these settings it does converge, but more and more slowly as the error drops from 2000 to 1200. And at 1200 error, where it seems to get caught in a local minimum, it is only partially good at recognising characters.

My only guess is that the network isn't complex enough for the dataset? So I am now training with 36,36,36,30,26 to see what happens, but it takes forever - like one minute per epoch.

Any ideas, anyone? Should I try increasing the size of the hidden layers instead of increasing the number of layers?

Cheers,
Silver
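For reference, the plain backprop-with-momentum update the post describes boils down to a per-weight rule like the sketch below (illustrative names and values, not the poster's actual code): each step mixes the current gradient with the previous step.

```python
import numpy as np

def momentum_step(weights, grad, prev_delta, lr=0.05, momentum=0.01):
    # delta = -lr * grad + momentum * prev_delta
    delta = -lr * grad + momentum * prev_delta
    return weights + delta, delta

# One step from zero weights with a made-up gradient:
w = np.zeros(3)
prev = np.zeros(3)
g = np.array([1.0, -2.0, 0.5])
w, prev = momentum_step(w, g, prev)
```

With momentum this small (0.01) the update is almost pure gradient descent; momentum mainly matters over many consecutive steps in the same direction.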

Backpropagation is surprisingly robust if you let it train for ages, more so than other training algorithms.

However, I've had a lot of success with things like RPROP. The idea is to batch up all the weight deltas while training, so you're less likely to get stuck in local minima.

It's not too much work to implement. See the FANN library if you get stuck.
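A rough sketch of the RPROP idea (a simplified RPROP- variant, not FANN's exact implementation): each weight keeps its own step size, which grows when successive batch gradients agree in sign and shrinks when they flip, and the weight moves by the sign of the gradient times that step.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    # Compare the sign of this batch's gradient with the previous batch's.
    same = np.sign(grad) * np.sign(prev_grad)
    # Same sign: grow the per-weight step. Sign flip: shrink it.
    step = np.where(same > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(same < 0, np.maximum(step * eta_minus, step_min), step)
    # Move each weight against its gradient by its own step size.
    w = w - np.sign(grad) * step
    return w, step
```

Because only the sign of the gradient is used, RPROP is insensitive to gradient magnitude, which is part of why it tends to behave better than a fixed learning rate on problems like this.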

Quote:
Original post by Si1ver
Hey, does anyone have experience training neural networks with large datasets? I'm doing character recognition, and at this point it isn't feasible to use trial and error to train, especially when I have to leave it on overnight.

At the moment I'm training characters A-Z, with 10 training examples for each character.

My network structure is 36,36,36,26.
Learning rate = 0.05
Momentum 0.01

With higher learning rates or momentums it doesn't seem to converge; the error just oscillates wildly. With these settings it does converge, but more and more slowly as the error drops from 2000 to 1200. And at 1200 error, where it seems to get caught in a local minimum, it is only partially good at recognising characters.

My only guess is that the network isn't complex enough for the dataset? So I am now training with 36,36,36,30,26 to see what happens, but it takes forever - like one minute per epoch.

Any ideas, anyone? Should I try increasing the size of the hidden layers instead of increasing the number of layers?


I use early stopping to prevent over-fitting, and my experience is that backpropagation frequently hits its optimum fairly quickly (sometimes in as few as 50 training iterations). Obviously, your experience may be very different, given different software and data.
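An early-stopping loop of the kind described here might look like the following sketch, where `train_epoch` and `validate` are placeholders for your own training routine and held-out validation-error measurement (names are illustrative):

```python
def train_with_early_stopping(train_epoch, validate, max_epochs=500, patience=20):
    """Train until the validation error stops improving for `patience` epochs."""
    best_err = float("inf")
    best_epoch = 0
    for epoch in range(max_epochs):
        train_epoch()            # one pass over the training set
        err = validate()         # error on a held-out validation set
        if err < best_err:
            best_err, best_epoch = err, epoch   # new best: remember it
        elif epoch - best_epoch >= patience:
            break                # no improvement for `patience` epochs: stop
    return best_err, best_epoch
```

In a real setup you would also snapshot the weights at each new best epoch and restore them at the end, since the final weights are past the optimum.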

As a general note, once the neural network has more than one output node, there exists the possibility that some output nodes are trained before others. This implies that further training will overfit some nodes while others are still underfit. With 26 output nodes, I suggest there is a strong possibility that this is happening.

I suggest trying different methods of pre-processing the data. This will often improve results more than tinkering with training settings.

Good luck!



Another option would be to train 26 separate networks, one for each letter, each trained independently.

Similar approaches have been used for things like autonomous navigation, where one net controls yaw, another controls pitch, and so on.
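The one-network-per-letter idea can be sketched as 26 binary classifiers, each answering "is this an A?", "is this a B?", and so on. Here `make_net`, `train_binary`, and `run_net` are placeholders for your own NN code:

```python
def train_per_letter(examples, make_net, train_binary):
    """examples: list of (input, letter) pairs. Returns one net per letter."""
    nets = {}
    for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
        # Positive examples are this letter; everything else is negative.
        labeled = [(x, 1.0 if y == letter else 0.0) for x, y in examples]
        nets[letter] = train_binary(make_net(), labeled)
    return nets

def classify(nets, x, run_net):
    # Pick the letter whose network responds most strongly.
    return max(nets, key=lambda letter: run_net(nets[letter], x))
```

Each small network trains fast and can be tuned (or retrained) independently, at the cost of 26 forward passes per classification.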

I think your network structure is actually too complex, especially in the modified version with 3 hidden layers. Two hidden layers should be plenty.

You mention oscillatory behavior - are you doing batch training or individual training? Oscillations often occur when you take a training step after each example; you can avoid this with batch training - run all of your examples through once, accumulate the errors, and then do one training step afterward.
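The difference between the two modes can be sketched like this, with `grad_fn` as a placeholder for the backprop gradient of a single example:

```python
import numpy as np

def batch_epoch(w, examples, grad_fn, lr=0.05):
    # Batch training: accumulate the gradient over every example,
    # then take ONE weight step per epoch.
    total = np.zeros_like(w)
    for x, y in examples:
        total += grad_fn(w, x, y)
    return w - lr * total

def online_epoch(w, examples, grad_fn, lr=0.05):
    # Online/stochastic training: a weight step after EVERY example,
    # which is what tends to produce the oscillation described.
    for x, y in examples:
        w = w - lr * grad_fn(w, x, y)
    return w
```

If the examples are presented in a fixed order, consecutive online steps can repeatedly pull the weights in conflicting directions; batching averages those pulls out.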

-Kirk

Thanks for your comments, guys. I have mostly come to a solution now. I believe the network was in fact too complex, as kirkd suggested; I was much more successful with one hidden layer. In addition, I am now processing the training examples in a random order each epoch.

Silver

One thing I would recommend before building any NN is to consider the input and output data itself: how should the data be represented to and from the net? This is often the first issue in making a fast net.

For example, you are using 26 output nodes, with each node representing a letter. A better way would be to have the net output a binary number:

00001 = a
00010 = b
00011 = c
...
11010 = z

This, along with using only one hidden layer, would really speed up the net. I used this with a net that detected numbers in 16x16 pixel images, with one hidden layer and a 5-node output layer. Backprop only took a minute.
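The 5-bit encoding described above (a = 1 through z = 26) can be sketched with a pair of helpers like these (illustrative names):

```python
def letter_to_bits(ch, n_bits=5):
    """Encode a letter as a list of bits, most significant first."""
    idx = ord(ch.lower()) - ord('a') + 1   # a -> 1, ..., z -> 26
    return [(idx >> b) & 1 for b in reversed(range(n_bits))]

def bits_to_letter(bits):
    """Decode a bit list (e.g. thresholded network outputs) back to a letter."""
    idx = int("".join(str(b) for b in bits), 2)
    return chr(ord('a') + idx - 1)
```

In practice the network's real-valued outputs would be thresholded at 0.5 before decoding, so a single marginal output bit can flip the result to a completely different letter - which is the risk raised in the next reply.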

You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.

Quote:
Original post by Kylotan
You have to be careful with that sort of thing, as it implies that A and C have some factor in common that B doesn't have, and so on. I can see that 5 outputs might well converge more quickly than 26 but there's a real risk of training the wrong sort of information there.


I've run into more problems trying to train a net that uses the "one node per letter" structure, with over-fitting issues. I also don't think the shared-bits concern matters. :) I've used nets trained on vector data to output controls to 'bots'; the outputs were between 1 and -1, with a few nodes controlling turning angle and speed. What I'm saying is that there is no inherent connection between the output nodes themselves. Just because a = 00001 and c = 00011 doesn't mean there is a relationship.

I just found the project I used this for. The net was:

384 -> 40 -> 7

384 inputs because the images were 24x16. 150 training images, numbers up to 75, with a white or gray background. It works every time I train a net; the error rate always falls and never fluctuates. :) (I was most proud of it lol)
