Neural Net training - error sum sometimes INCREASES

I wrote my first artificial neural network implementation the other day and got it working great for XOR and a few other simple functions I gave it. It seemed a little slow at times, but it would usually get the error sum for each epoch down to a reasonable level eventually.

I've decided to give it something a little more complex and now I'm having some issues. The network consists of 7 inputs, 4 outputs, and, currently, a single hidden layer with 6 neurons in it. It uses backprop for training and, while I've tried learning rates across a wide range of values, it's currently set at 0.5. I'm trying to train the network to navigate an agent around a single wall and toward a target location. The 7 inputs are the lengths of 3 rays that extend out to find walls and the activation levels of 4 radar wedges around the agent that return information about the target.

As it is, the training set ends up being around 150 entries long (they are recorded at every time step as I drive the agent around the wall toward the target). I've reduced this to as few as 6 entries and I still get the same problem: the change in the error sum eventually levels off to the point where it's no longer decreasing, even though the sum may be something like 30.0 or 40.0 (whereas I'm trying to get it down to 0.01). Sometimes the error will actually INCREASE over the course of the training. I can't figure out whether this is a problem with my data set or with my network training implementation. Here's basically what my code is doing (or what I think it's doing, anyway):

For each training set
   For each neuron in the output layer
      Set the error value of this neuron to:
       error = (target output - actual output) * actual output * (1 - actual output)

      error sum += (target output - actual output) squared

      For each weight from the hidden layer to the output layer
         weight += error * learning rate * activation of hidden layer neuron
      Next weight

      bias weight += error * learning rate * bias (bias is 1)
   Next output layer neuron

   For each neuron in the hidden layer
      error = 0
      For each neuron in the output layer
         error += output neuron error * weight from hidden to output
      Next output layer neuron

      error *= hidden neuron activation * (1 - hidden neuron activation)

      For each weight from inputs to the hidden layer
         hidden neuron weight += error * learning rate * input value associated with this weight
      Next weight

      hidden bias weight += error * learning rate * bias
   Next hidden layer neuron
Next training set
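
In actual C++ it comes out roughly like this (a stripped-down sketch of the above rather than my real code, so the names are made up; every activation is a sigmoid):

#include <vector>

// One backprop pass for a single training example, mirroring the pseudocode
// above. 'hidden' and 'output' are the activations from the forward pass.
// Returns this example's contribution to the error sum.
double BackpropOneExample(const std::vector<double>& input,
                          const std::vector<double>& hidden,
                          const std::vector<double>& output,
                          const std::vector<double>& target,
                          std::vector<std::vector<double> >& hiddenWeight,  // [input][hidden]
                          std::vector<std::vector<double> >& outputWeight,  // [hidden][output]
                          std::vector<double>& hiddenBias,
                          std::vector<double>& outputBias,
                          double learningRate)
{
    double errorSum = 0.0;
    std::vector<double> outputError(output.size());

    for (size_t j = 0; j < output.size(); ++j)
    {
        // (target - actual) times the sigmoid derivative actual * (1 - actual)
        outputError[j] = (target[j] - output[j]) * output[j] * (1.0 - output[j]);
        errorSum += (target[j] - output[j]) * (target[j] - output[j]);

        for (size_t i = 0; i < hidden.size(); ++i)
            outputWeight[i][j] += learningRate * outputError[j] * hidden[i];

        outputBias[j] += learningRate * outputError[j];   // bias input is 1
    }

    for (size_t i = 0; i < hidden.size(); ++i)
    {
        // Push the output-layer errors back through the hidden-to-output weights.
        double err = 0.0;
        for (size_t j = 0; j < output.size(); ++j)
            err += outputError[j] * outputWeight[i][j];
        err *= hidden[i] * (1.0 - hidden[i]);

        for (size_t k = 0; k < input.size(); ++k)
            hiddenWeight[k][i] += learningRate * err * input[k];

        hiddenBias[i] += learningRate * err;               // bias input is 1
    }

    return errorSum;
}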
==============

If you've made it this far, I thank you. If any clarification is needed, let me know. I'd appreciate any help you can offer.
Since no one has replied, I thought I would add my $0.02. With a 7-6-4 network like the one you're training, drop your alpha down to 0.1 or 0.01. I could see a learning rate of 0.5 diverging like crazy if something doesn't go quite right during training. Lowering your rate will most likely increase training time, but the chance of converging on what you want is much better.
To add on to what cirons said:
You might also add some sort of momentum term to your learning. It keeps the net from chasing bad spikes in the training set: when an update pushes in the opposite direction to the previous one, the effective step shrinks, and when updates keep pushing the same way, it grows.
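Roughly like this, per weight (just a sketch, not tested; mu somewhere around 0.8 or 0.9 is a common starting point):

// Weight update with a momentum term: a fraction of the previous step is
// carried over, so one odd training example can't yank the weight too far,
// while steps that keep pointing the same way build up speed.
// prevDelta has to be stored separately for every weight.
void UpdateWithMomentum(double& weight, double& prevDelta,
                        double error, double input,
                        double learningRate, double mu)
{
    double delta = learningRate * error * input + mu * prevDelta;
    weight += delta;
    prevDelta = delta;   // remembered for the next pass
}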
There are also schemes like delta-bar-delta and Widrow-Hoff learning, but I can't remember the details at the moment :)

[Edited by - Black Knight on February 23, 2008 2:43:22 AM]
Yeah, if the error is plateauing like that it sounds like you're finding a local minimum of some sort. A momentum term should help.

Richard "Superpig" Fine - saving pigs from untimely fates - Microsoft DirectX MVP 2006/2007/2008/2009
"Shaders are not meant to do everything. Of course you can try to use it for everything, but it's like playing football using cabbage." - MickeyMouse

Actually, I discovered that my major problem was that I wasn't normalizing my input. Some of my inputs were in the range 0 - 60,000 while others were between 0 and 50. So I've scaled them all so that they now fall between -1 and 1, and it performs much better. I've also added momentum, which seems to help keep it from getting stuck as frequently.
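The scaling itself is nothing fancy, basically just a linear remap of each input's known range into [-1, 1], something like this (sketch, with made-up names):

// Remap a raw sensor value from its known range [inMin, inMax] into [-1, 1]
// before it goes into the network (e.g. a ray length of 0 - 60,000, or a
// radar wedge activation of 0 - 50).
double NormalizeInput(double value, double inMin, double inMax)
{
    return 2.0 * (value - inMin) / (inMax - inMin) - 1.0;
}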

It's working much better now, and I've actually had it navigate around the wall and find the target on its own a few times. I think the network is in pretty good shape. Mainly I think I need to adjust the sensors to present more meaningful data (I may have to add a couple more inputs as well).

Thanks for the help! Of course, any other suggestions you might have are welcome.
