neural network input/outputs

Started by hdaly; 6 comments, last by hdaly 20 years ago
Hi all,

I've implemented a standard back-propagation neural network, and it works fine when the inputs and desired outputs are in the range 0.0 to 1.0. My activation function is the standard sigmoid 1/(1+e^-x).

However, if I change the activation function to 2/(1+e^-x) and have the inputs and desired patterns in the range 0.0 to 2.0, the network doesn't work at all. I've changed the derivative of the activation function (used for training) from AF*(1-AF) to 2*AF*(1-AF) to take account of the changed activation function. Any ideas why this isn't working?

My eventual goal is to have the network accept inputs and desired outputs in the range -20.0 to 20.0, using the activation function (2/(1+e^-x))-1.0. Is there anything preventing a network from using both negative and positive numbers?

Any help or ideas would be greatly appreciated.

hdaly.
I know nothing practical about neural networks, but why not rescale your inputs before feeding them into the network? Then you can supply them on any scale you wish.
Cheers for that.
If in the end I have to rescale I will, but I just want to see if it will work without rescaling first. I have a feeling (not based on experience or anything!) that by rescaling from -20.0/20.0 to 0.0/2.0, some significance of the difference between negative and positive numbers might be lost.

hdaly.
quote:Original post by hdaly
Cheers for that.
If in the end I have to rescale I will, but I just want to see if it will work without rescaling first. I have a feeling (not based on experience or anything!) that by rescaling from -20.0/20.0 to 0.0/2.0, some significance of the difference between negative and positive numbers might be lost.


In most cases, inputs don't "need" to be scaled, although it is often advantageous in practice.

Scaling the output variable is the usual solution to this problem. Can you think of an example where information would be lost and how?
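To make that concrete, here is a rough sketch in C of what output scaling could look like (the function names are just mine for illustration; the -20..20 range is the one mentioned in the original post): squash the targets into 0..1 for training, then map the network's output back afterwards.

/* Map a target in [-20, 20] to [0, 1] for training... */
double scale_target(double t)   { return (t + 20.0) / 40.0; }

/* ...and map the network's output back to the original range. */
double unscale_output(double y) { return y * 40.0 - 20.0; }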

-Predictor
http://will.dwinnell.com






To be honest, no, I can't give an example of information being lost; it's just a gut feeling really.
Using input that includes both negatives and positives has been shown to decrease training time (Haykin, Neural Networks). However, I don't think that going from -20.0 to 20.0 is really beneficial (this is just my opinion, though). If I were you, I would scale the inputs between -1.0 and 1.0 like...
scaledvector = 2*(inputvector - min)/(max - min) - 1;

Then change your activation function to a hyperbolic tangent (also shown to work well with negative/positive input). The general form for the activation function goes something like:

a*tanh(b*neuron_sum)

Suggested values for a and b are 1.7159 and 2.0/3 respectively.

But, as you noticed, you will have to change the gradient function to accommodate the new derivative of this activation function.

I'll save you some time... I think it should be:

(b/a)*(a - neuron_output)*(a + neuron_output)

instead of:

neuron_output*(1.0 - neuron_output)

like in a normal sigmoid activation function network.

Give that a go and see if you have any more success. I used it on a digital Fourier transform that was scaled between -1.0 and 1.0, and I had a good deal of success with it. (Although, sadly, results from one person's experiment are not always transferable to another person's, since input data always differs in how well it can be separated.)

Good luck!

Ryan
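A minimal C sketch of the suggestion above, in case it helps; the function and constant names are just illustrative, not from any particular library:

#include <math.h>

/* Suggested constants from the post above: a = 1.7159, b = 2/3. */
static const double A_GAIN  = 1.7159;
static const double B_SLOPE = 2.0 / 3.0;

/* Rescale a raw input from [min, max] into [-1, 1]. */
double scale_input(double x, double min, double max)
{
    return 2.0 * (x - min) / (max - min) - 1.0;
}

/* Activation: y = a * tanh(b * neuron_sum). */
double tanh_activation(double neuron_sum)
{
    return A_GAIN * tanh(B_SLOPE * neuron_sum);
}

/* Derivative written in terms of the neuron's output y:
   dy/dsum = (b/a) * (a - y) * (a + y). */
double tanh_derivative(double neuron_output)
{
    return (B_SLOPE / A_GAIN)
         * (A_GAIN - neuron_output)
         * (A_GAIN + neuron_output);
}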
Well, if you want the outputs of the sigmoid f(x) between 0 and 2, you should have

f(x) = 2 / (1 + exp(-x))

and

f'(x) = (1/2) * f(x) * (2 - f(x))

(Since f(x) is just twice the standard sigmoid s(x), its derivative is 2*s(x)*(1 - s(x)) = f(x)*(1 - f(x)/2), which is the expression above, not the 2*AF*(1-AF) you used.)


That should work.


Let's make a general rule here:

To have the sigmoid function f(x) between min and max:

a = max - min
b = -min

f(x) = ( a / (1 + exp(-x)) ) - b
f''(x) = (1/a) * (b + f(x)) * (a - b - f(x))
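In C, a quick sketch of that rule might look like this (the function names are just mine for illustration):

#include <math.h>

/* a = max - min, b = -min
   f(x)  = a / (1 + exp(-x)) - b
   f'(x) = (1/a) * (b + f(x)) * (a - b - f(x))  */

double scaled_sigmoid(double x, double min, double max)
{
    double a = max - min;
    double b = -min;
    return a / (1.0 + exp(-x)) - b;
}

/* Derivative expressed in terms of the output f = f(x). */
double scaled_sigmoid_deriv(double f, double min, double max)
{
    double a = max - min;
    double b = -min;
    return (1.0 / a) * (b + f) * (a - b - f);
}

With min = 0 and max = 2 this reduces to the 0-to-2 case above, and with min = -20 and max = 20 it covers the range you said you want (a = 40, b = 20).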


Try this out and tell me how it goes!


quote:Original post by trub
Using input that includes both negatives and positives has been shown to decrease training time (Haykin, Neural Networks).


It's probably worth qualifying the above, which, while true for the most common MLP implementations, will not be true for all. MLPs trained by global (or hybrid) optimization, for instance, will not be affected by such scaling one way or the other.

-Predictor
http://will.dwinnell.com

