The maths behind back propagation

I've been debugging my neural net class for a week now and I can't figure out what's wrong. I thought it could be that the back propagation equation isn't implemented as intended. Not being that good at reading long mathematical equations with a bunch of Greek letters, I looked up this rather simplified version at generation5:

d2(1) = x2(1)(1 - x2(1))w3(1,1)d3(1)

Which in English means (at least as I perceive it): "Delta for neuron n in layer l = (output of neuron n in layer l) * (1 - output of neuron n in layer l) * (weight for input n in neuron n in layer l+1) * (delta for neuron n in layer l+1)"

First things first, have I got that right? It sounds reasonable: the delta should be high if this neuron contributed much to the error that the next neuron in line produced. However, there is one thing here that I'm uncertain of: "weight for input n in neuron n in layer l+1"

This is where we look at how much the next neuron in line considers our input; if it doesn't care much about our output, we shouldn't care too much about adjusting ourselves to it. The thing that is bothering me is that it's both neuron n and input n. What if we have two neurons in the next layer? Then we only look at one, the one straight ahead. For example:
x2(1) ---- w3(1,1) > x3(1)
      \  / w3(2,1) 
      /  \ w3(1,2) 
x2(2) ---- w3(2,2) > x3(2)

Here, x2(1) serves as an input to both x3(1) and x3(2), but according to the formula it will only look at x3(1), the one straight ahead: "weight for input n in neuron n in layer l+1" -> "weight for input 1 in neuron 1 in layer 3". x2(2), on the other hand, will only look at the other neuron, also straight ahead: "weight for input 2 in neuron 2 in layer 3".

Shouldn't back propagation consider all the neurons that the current neuron serves as an input to? Or is this the way it should be done? Please correct me where I'm wrong.
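For reference, the textbook form of the rule does sum over every neuron in the next layer; the quoted formula matches it only when there is a single neuron downstream. A sketch in the notation above, assuming that w_{l+1}(m,n) means the weight that neuron m in layer l+1 gives to the output of neuron n in layer l (the index order is an assumption, the post does not spell it out):

```latex
\delta_l(n) = x_l(n)\,\bigl(1 - x_l(n)\bigr) \sum_{m} w_{l+1}(m,n)\,\delta_{l+1}(m)

% For the two-neuron example in the diagram:
\delta_2(1) = x_2(1)\,\bigl(1 - x_2(1)\bigr)\,\bigl[\, w_3(1,1)\,\delta_3(1) + w_3(2,1)\,\delta_3(2) \,\bigr]
```

So x2(1) is held responsible for its share of the error at both x3(1) and x3(2), not just the one straight ahead.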

I think it's a perceptron node net. The article I built my net from is found here. I've read more, but that was the one I copied most of the structure from.

And what do I want to do with it? Well... learn neural nets. :P The idea is to create a very dynamic class with a configurable number of inputs, hidden layers and neurons per hidden layer, which I can use anywhere I want a neural net.

Ok Miz,

From the article, they aren't really using a hard limiter; they are using a sigmoidal function. This is used because sigmoidal functions are differentiable, which gradient-based training needs.

So, for instance, let us consider a simple neural network with an input, an output, and a hidden layer. The input layer will just consist of nodes that are the input values.

Let's go through the feedforward operation (when we have the inputs and we want to see what the output produces). For simplicity, let's say we have 3 inputs (one of them a bias = 1), 3 hidden nodes, and 2 outputs.

To calculate the feedforward of the hidden layer we have:

h1 = v(w11*x1+w21*x2+w31*x3)
h2 = v(w12*x1+w22*x2+w32*x3)
h3 = v(w13*x1+w23*x2+w33*x3)
where v is sigmoidal: v(x) = 1/(1+exp(-x))

This is the output of the hidden layer.

Now let's calculate the output layer (we shall call the weights here o instead of w).
y1 = v(o11*h1+o21*h2+o31*h3)
y2 = v(o12*h1+o22*h2+o32*h3)
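The feedforward equations above could be sketched in C++ roughly like this. The function name `layerOutputs` and the `weights[i][j]` layout (weight from input i to node j, so w11 is `weights[0][0]`) are assumptions made for illustration, not something taken from the article.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sigmoid activation: v(x) = 1 / (1 + exp(-x)).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Computes the outputs of one layer.
// weights[i][j] is the weight from input i to node j of this layer,
// so node j outputs v(sum over i of weights[i][j] * in[i]).
std::vector<double> layerOutputs(const std::vector<std::vector<double>>& weights,
                                 const std::vector<double>& in) {
    assert(!weights.empty());
    assert(weights.size() == in.size());
    const std::size_t nodes = weights[0].size();
    std::vector<double> out(nodes, 0.0);
    for (std::size_t j = 0; j < nodes; ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < in.size(); ++i) {
            assert(weights[i].size() == nodes);
            sum += weights[i][j] * in[i];
        }
        out[j] = sigmoid(sum);
    }
    return out;
}
```

With a 3x3 matrix `w` and a 3x2 matrix `o`, the whole pass is `layerOutputs(o, layerOutputs(w, x))`, where one element of `x` is the bias input fixed at 1.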

Now that we can do feedforward of a neural network let's do training... next message. Let me know if you don't understand so far.

To determine the backpropagation equations, we shall use gradient-based optimization. In other words, we want to minimize this error function:

Error = 0.5*(t1-y1)^2 + 0.5*(t2-y2)^2

where tk is the target output for yk.

To do this, we must take the derivative of the Error with respect to each weight. I won't go into this because you may not like derivatives. You can see the idea in my journal later if you want to take a look. I will just give you the weight update equations right now. They are:

First let
v'(x) = (1-v(x))*v(x) -- this is the derivative of v(x)
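A small sketch of this derivative in code, in case it helps. Both function names are made up for illustration. The point is that x in v'(x) is the total input to the node, while the shortcut (1-out)*out only works on the node's output out = v(x).

```cpp
#include <cassert>
#include <cmath>

// Sigmoid activation: v(x) = 1 / (1 + exp(-x)).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// v'(x), given the total input x to the node.
double sigmoidDerivFromInput(double x) {
    const double out = sigmoid(x);
    return (1.0 - out) * out;
}

// The same derivative, given the node's output out = v(x).
// A sigmoid output always lies between 0 and 1.
double sigmoidDerivFromOutput(double out) {
    assert(out >= 0.0 && out <= 1.0);
    return (1.0 - out) * out;
}
```

Since the feedforward pass already stores every node's output, the second form is usually the convenient one during training.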

Then the weight update equations are (if I didn't make a mistake)
new_o11 = o11 + v'(o11*h1+o21*h2+o31*h3)*h1*(t1-y1)*learning_rate
new_o21 = o21 + v'(o11*h1+o21*h2+o31*h3)*h2*(t1-y1)*learning_rate
new_o31 = o31 + v'(o11*h1+o21*h2+o31*h3)*h3*(t1-y1)*learning_rate
new_o12 = o12 + v'(o12*h1+o22*h2+o32*h3)*h1*(t2-y2)*learning_rate
new_o22 = o22 + v'(o12*h1+o22*h2+o32*h3)*h2*(t2-y2)*learning_rate
new_o32 = o32 + v'(o12*h1+o22*h2+o32*h3)*h3*(t2-y2)*learning_rate

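For the hidden-layer weights, write a1 = o11*h1+o21*h2+o31*h3 and a2 = o12*h1+o22*h2+o32*h3 for the total inputs to y1 and y2, and b1 = w11*x1+w21*x2+w31*x3, b2 = w12*x1+w22*x2+w32*x3, b3 = w13*x1+w23*x2+w33*x3 for the total inputs to h1, h2 and h3. Each update sums one term per output, because every hidden node feeds both outputs:

new_w11 = w11 + {[v'(a1)*o11*(t1-y1) + v'(a2)*o12*(t2-y2)]*v'(b1)*x1}*learning_rate
new_w21 = w21 + {[v'(a1)*o11*(t1-y1) + v'(a2)*o12*(t2-y2)]*v'(b1)*x2}*learning_rate
new_w31 = w31 + {[v'(a1)*o11*(t1-y1) + v'(a2)*o12*(t2-y2)]*v'(b1)*x3}*learning_rate

new_w12 = w12 + {[v'(a1)*o21*(t1-y1) + v'(a2)*o22*(t2-y2)]*v'(b2)*x1}*learning_rate
new_w22 = w22 + {[v'(a1)*o21*(t1-y1) + v'(a2)*o22*(t2-y2)]*v'(b2)*x2}*learning_rate
new_w32 = w32 + {[v'(a1)*o21*(t1-y1) + v'(a2)*o22*(t2-y2)]*v'(b2)*x3}*learning_rate

new_w13 = w13 + {[v'(a1)*o31*(t1-y1) + v'(a2)*o32*(t2-y2)]*v'(b3)*x1}*learning_rate
new_w23 = w23 + {[v'(a1)*o31*(t1-y1) + v'(a2)*o32*(t2-y2)]*v'(b3)*x2}*learning_rate
new_w33 = w33 + {[v'(a1)*o31*(t1-y1) + v'(a2)*o32*(t2-y2)]*v'(b3)*x3}*learning_rate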
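Written as loops, one training step for this 3-3-2 example might look like the sketch below. The `Net` struct, the function names and the `w[i][j]` / `o[j][k]` layout (w11 is `w[0][0]`) are assumptions for illustration. The hidden-layer part uses the standard backpropagation rule, where each hidden node sums a term for every output it feeds, and it computes all deltas before changing any weight.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kInputs = 3, kHidden = 3, kOutputs = 2;
using Inputs = std::array<double, kInputs>;
using Hidden = std::array<double, kHidden>;
using Outputs = std::array<double, kOutputs>;

// Hypothetical container: w[i][j] is the weight from input i to hidden node j,
// o[j][k] is the weight from hidden node j to output k.
struct Net {
    std::array<std::array<double, kHidden>, kInputs> w{};
    std::array<std::array<double, kOutputs>, kHidden> o{};
};

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Feedforward: fills h and y from x.
void forward(const Net& net, const Inputs& x, Hidden& h, Outputs& y) {
    for (std::size_t j = 0; j < kHidden; ++j) {
        double sum = 0.0;
        for (std::size_t i = 0; i < kInputs; ++i) sum += net.w[i][j] * x[i];
        h[j] = sigmoid(sum);
    }
    for (std::size_t k = 0; k < kOutputs; ++k) {
        double sum = 0.0;
        for (std::size_t j = 0; j < kHidden; ++j) sum += net.o[j][k] * h[j];
        y[k] = sigmoid(sum);
    }
}

// Error = 0.5*(t1-y1)^2 + 0.5*(t2-y2)^2.
double error(const Outputs& t, const Outputs& y) {
    double e = 0.0;
    for (std::size_t k = 0; k < kOutputs; ++k) e += 0.5 * (t[k] - y[k]) * (t[k] - y[k]);
    return e;
}

// One gradient-descent step on a single sample.
// Returns the error measured before the weights were changed.
double trainStep(Net& net, const Inputs& x, const Outputs& t, double learningRate) {
    assert(learningRate > 0.0);
    Hidden h{};
    Outputs y{};
    forward(net, x, h, y);

    // Output deltas: v'(total input to y_k) * (t_k - y_k), with v' = y*(1-y).
    Outputs deltaOut{};
    for (std::size_t k = 0; k < kOutputs; ++k)
        deltaOut[k] = y[k] * (1.0 - y[k]) * (t[k] - y[k]);

    // Hidden deltas: sum over ALL outputs, using the not yet updated o weights.
    Hidden deltaHid{};
    for (std::size_t j = 0; j < kHidden; ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < kOutputs; ++k) sum += net.o[j][k] * deltaOut[k];
        deltaHid[j] = h[j] * (1.0 - h[j]) * sum;
    }

    // Weight updates: weight += learning_rate * delta * (input on that weight).
    for (std::size_t j = 0; j < kHidden; ++j)
        for (std::size_t k = 0; k < kOutputs; ++k)
            net.o[j][k] += learningRate * deltaOut[k] * h[j];
    for (std::size_t i = 0; i < kInputs; ++i)
        for (std::size_t j = 0; j < kHidden; ++j)
            net.w[i][j] += learningRate * deltaHid[j] * x[i];

    return error(t, y);
}
```

Calling `trainStep` repeatedly on the same sample should make the returned error shrink, which is a quick sanity check before trying a real training set.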

Pray I didn't make a mistake LOL.

[Edited by - NickGeorgia on January 26, 2006 4:48:35 PM]

I understand the first post very well, and derivatives are no problem... if I remember correctly, they work like this:

y(x) = x^2
y'(x) = 2x


And I already use the sigmoid function; I thought it might work better than a hard limiter.

That error equation of yours works slightly differently from the one I found on generation5. On the site it was like this:

Error = y * (1 - y) * (t - y)

Or did I misunderstand that last part?

[edit] Getting late... I'll return to this thread tomorrow for a reread and, hopefully, I'll get some kind of huge eureka-insight. :P [smile]
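One possible reading, offered tentatively: the generation5 expression looks like the delta of an output neuron rather than a different error function. Assuming v is the sigmoid and writing a for the total input to the output neuron (a symbol introduced here, not used on either site), the two fit together like this:

```latex
E = \tfrac{1}{2}\,(t - y)^2, \qquad y = v(a), \qquad v'(a) = y\,(1 - y)

-\frac{\partial E}{\partial a} = (t - y)\,v'(a) = y\,(1 - y)\,(t - y)
```

Under that reading, y * (1 - y) * (t - y) is the same factor as v'(...)*(t-y) in the weight update equations, so the two descriptions would agree.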

Hmm, that looks quite different from what I learned on the site. But what I've got now doesn't work :P Or I can't make it work, I mean. Is this correct?

w is weight for hidden neurons
o is weight for output neurons
h is what a neuron in the hidden layer fired/output
x is initial input to net
t is target output of net
y is actual output of net

In my net I only use one output. I made a thread about it here and came to the conclusion that nets with a single output neuron performed better: a little slower, but better results.

I'll make a backup tomorrow and try to switch my code to that system instead, and see if I have any more luck. I'll post the results here either way.

Thanks, Nick. [smile] Rating++.

[edit] Oh, and about the error equation I mentioned that you weren't sure about: I found it on the site I linked to in my second post. Go check it out if you want.

Yep, I think you got my notation down. Since they were using a different error equation, I would imagine the equations would be different. OK, let me know how it turns out. (Hope I didn't make a mistake, but I'll check it over later)

Also here is a link on how you might do it if you wanted to use hard limiters.

I've read your post over and over, Nick, and here's what I've come up with.

First, look at the design of my net:

Input               Hidden_1              Hidden_2              Output_Neuron

x(0,0)              x(1,0)                x(2,0)                <- Biases

x(0,1)   w(0,0,0)   x(1,1)     w(1,0,0)   x(2,1)     w(2,0,0)
x(0,2)   w(0,1,0)              w(1,1,0)              w(2,1,0)   x(3,0)   // Total net output
         w(0,2,0)              w(1,2,0)              w(2,2,0)

         w(0,0,1)   x(1,2)     w(1,0,1)   x(2,2)
         w(0,1,1)              w(1,1,1)
         w(0,2,1)              w(1,2,1)

Here, x(l,n) is the output from neuron n in layer l. n = 0 is the bias, which is 1 in every layer. w(l,f,n) is the weight for input f to neuron n in layer l.

For example, x(1,1) is Sigmoid( x(0,0)*w(0,0,0)+x(0,1)*w(0,1,0)+x(0,2)*w(0,2,0) ).

x(3,0) is the total net output.

Trying to convert your formulas into one I could use, I came up with this for the output neuron:

w(2,f,0) = w(2,f,0) + v'(Input) * x(2,f) * (d-Input) * learn_rate

Where d is the desired output of the net, and Input is defined as the sum of all inputs times their respective weights: x(2,0)*w(2,0,0)+x(2,1)*w(2,1,0)+x(2,2)*w(2,2,0). v'(x) is the derivative function, which I wrote like this:

float Deriv(float num) {return (1-num)*num;};

Is that right?
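Two details seem worth checking here. First, (1-num)*num equals the sigmoid derivative only when num is the neuron's output Sigmoid(Input), not Input itself. Second, the usual error term is the desired output minus the neuron's output, not d minus Input. A sketch of the output-neuron update with both points applied follows. `UpdateOutputNeuron` and its parameters are hypothetical names, with `inputs[f]` standing in for x(2,f) and `weights[f]` for w(2,f,0).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

float Sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Sigmoid derivative, valid only when num is already a sigmoid OUTPUT.
float Deriv(float num) { return (1.0f - num) * num; }

// Updates the weights of a single output neuron in place.
// inputs[0] is the bias input (1); d is the desired output.
// Returns the neuron's output as it was before the update.
float UpdateOutputNeuron(std::vector<float>& weights, const std::vector<float>& inputs,
                         float d, float learnRate) {
    assert(weights.size() == inputs.size());
    float input = 0.0f;  // sum of all inputs times their weights
    for (std::size_t f = 0; f < inputs.size(); ++f) input += inputs[f] * weights[f];

    const float out = Sigmoid(input);            // x(3,0), the net output
    const float delta = Deriv(out) * (d - out);  // derivative taken at the output

    for (std::size_t f = 0; f < inputs.size(); ++f)
        weights[f] += delta * inputs[f] * learnRate;
    return out;
}
```

Calling it twice with the same inputs and target should return a second value closer to d than the first.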

When I got to the new weight calculations for the hidden layers, I got completely stuck; I can't come up with a good way to write a formula for it. This is the best I could do:

w(l,f,n) = w(l,f,n) + v'(Input) * w(l+1,n, ? ) * (d-NetOut) * x(l,f) * learn_rate

Here, v'(Input) is the same as above, i.e. it uses the total input to the output neuron. But this is the same problem that made me start this thread. Maybe v'(Input) should instead use the total input to the neuron in the next layer.

I've got so many thoughts, but they are scattered, so I have problems describing them. It makes sense to multiply the total input of the next-layer neuron by a weight, but for which neuron? Neuron n here serves as the input to the next neuron, so it makes sense to set f=n. But which neuron does this apply to? Neuron 1 in layer 1 serves as input 1 for every neuron in layer 2, so which neuron should I calculate with?

I've probably made a bunch of errors here, so I'll stop and wait for your reply. I could probably use a break too; you know how blind you get when you look at the same problem for too long.
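In the standard formulation the answer to "which neuron?" is all of them: a hidden neuron's delta sums over every neuron in the next layer that it feeds, and the derivative factor is taken at the hidden neuron's own output. A minimal sketch for one hidden layer follows. `HiddenDeltas` and the `nextWeights[m][n]` layout are made up for illustration and do not follow the w(l,f,n) indexing exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Deltas for one hidden layer, given the deltas of the layer after it.
//   outputs[n]        : sigmoid output of neuron n in this layer
//   nextWeights[m][n] : weight that neuron m in the next layer gives to
//                       the output of neuron n in this layer
//   nextDeltas[m]     : delta already computed for neuron m in the next layer
std::vector<double> HiddenDeltas(const std::vector<double>& outputs,
                                 const std::vector<std::vector<double>>& nextWeights,
                                 const std::vector<double>& nextDeltas) {
    assert(nextWeights.size() == nextDeltas.size());
    std::vector<double> deltas(outputs.size(), 0.0);
    for (std::size_t n = 0; n < outputs.size(); ++n) {
        double sum = 0.0;
        for (std::size_t m = 0; m < nextDeltas.size(); ++m) {
            assert(nextWeights[m].size() == outputs.size());
            sum += nextWeights[m][n] * nextDeltas[m];  // every downstream neuron counts
        }
        // The derivative is taken at THIS neuron's output, not the output neuron's.
        deltas[n] = outputs[n] * (1.0 - outputs[n]) * sum;
    }
    return deltas;
}
```

Each weight into neuron n can then be nudged by learn_rate * deltas[n] * (the input that weight multiplies). Working from the output layer backwards, the deltas just computed for one layer become `nextDeltas` for the layer before it.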

Thanks again for all your help.
