The maths behind back propagation

13 comments, last by NickGeorgia 18 years, 2 months ago
I've been debugging my neural net class for a week now and I can't figure out what's wrong. I thought it could be that the back propagation equation isn't implemented as intended. Not being that good at reading long mathematical equations with a bunch of Greek letters, I looked up this rather simplified version at Generation5:

d2(1) = x2(1)(1 - x2(1))w3(1,1)d3(1)

Which in English means (at least how I perceive it): "Delta for neuron n in layer l = (output of neuron n in layer l) * (1 - output of neuron n in layer l) * (weight for input n in neuron n in layer l+1) * (delta for neuron n in layer l+1)".

First things first, have I got that right? It sounds reasonable: the delta should be high if this neuron contributed a lot to the error that the next neuron in line produced. However, there is one thing here that I'm uncertain of: "weight for input n in neuron n in layer l+1". This is where we look at how much the next neuron in line considers our input; if it doesn't care much about our output, we shouldn't care too much about adjusting ourselves to it. The thing that is bothering me is that it's both neuron n and input n. What if we have two neurons in the next layer? Then we only look at one, the one straight ahead. For example:

x2(1) ---- w3(1,1) > x3(1)
      \  / w3(2,1) 
       ><
      /  \ w3(1,2) 
x2(2) ---- w3(2,2) > x3(2)

Here, x2(1) serves as an input to both x3(1) and x3(2), but according to the formula it will only look at x3(1), the one straight ahead: "weight for input n in neuron n in layer l+1" -> "weight for input 1 in neuron 1 in layer 3". x2(2), on the other hand, will only look at the other neuron, also straight ahead: "weight for input 2 in neuron 2 in layer 3". Shouldn't back propagation consider all the neurons to which the current neuron serves as an input? Or is this the way it should be done? Please correct me where I'm wrong.
What type of neural network are you using, Miz? Perceptron nodes?

And what do you want it to do?
I think it's a perceptron node net. The article I built my net from is found here. I've read more, but that was the one I copied most of the structure from.

And what do I want to do with it? Well... learn neural nets. :P The idea is to create a very dynamic class with a configurable number of inputs, hidden layers, and neurons per hidden layer, which I can use anywhere I want a neural net.
Ok Miz,

From the article, they aren't really using a hard limiter; they are using a sigmoidal function. This is used because sigmoidal functions are differentiable.

So, for instance, let us consider a simple neural network with an input, an output, and a hidden layer. The input layer will just consist of nodes that are the input values.

Let's go through the feedforward operation (when we have the inputs and we want to see what output the network produces). For simplicity, let's say we have 3 inputs (one of them the bias = 1), 3 hidden nodes, and 2 outputs.

To calculate the feedforward of the hidden layer we have:

h1 = v(w11*x1+w21*x2+w31*x3)
h2 = v(w12*x1+w22*x2+w32*x3)
h3 = v(w13*x1+w23*x2+w33*x3)
where v is the sigmoidal function: v(x) = 1/(1+exp(-x))

This is the output of the hidden layer.

Now let's calculate the output layer (we shall call the weights here o instead of w).
y1 = v(o11*h1+o21*h2+o31*h3)
y2 = v(o12*h1+o22*h2+o32*h3)

Now that we can do feedforward of a neural network let's do training... next message. Let me know if you don't understand so far.


To determine the backpropagation equations, we shall use gradient based optimization. In other words, we want to minimize this equation:

Error = 0.5*(t1-y1)^2 + 0.5*(t2-y2)^2

where tk is the target output for yk.

To do this, we must take the derivative of the Error with respect to each weight. I won't go into this because you may not like derivatives. You can see the idea in my journal later if you want to take a look. I will just give you the weight update equations right now. They are:

First let
v'(x) = (1-v(x))*v(x) -- this is the derivative of v(x)

Then the weight update equations are (if I didn't make a mistake)
new_o11 = o11 + v'(o11*h1+o21*h2+o31*h3)*h1*(t1-y1)*learning_rate
new_o21 = o21 + v'(o11*h1+o21*h2+o31*h3)*h2*(t1-y1)*learning_rate
new_o31 = o31 + v'(o11*h1+o21*h2+o31*h3)*h3*(t1-y1)*learning_rate
new_o12 = o12 + v'(o12*h1+o22*h2+o32*h3)*h1*(t2-y2)*learning_rate
new_o22 = o22 + v'(o12*h1+o22*h2+o32*h3)*h2*(t2-y2)*learning_rate
new_o32 = o32 + v'(o12*h1+o22*h2+o32*h3)*h3*(t2-y2)*learning_rate

(note the extra leading v'(...) on these: it is the hidden node's own derivative, which comes from the chain rule)

new_w11 = w11 + v'(w11*x1+w21*x2+w31*x3)*[v'(o11*h1+o21*h2+o31*h3)*o11*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o12*(t2-y2)]*x1*learning_rate
new_w21 = w21 + v'(w11*x1+w21*x2+w31*x3)*[v'(o11*h1+o21*h2+o31*h3)*o11*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o12*(t2-y2)]*x2*learning_rate
new_w31 = w31 + v'(w11*x1+w21*x2+w31*x3)*[v'(o11*h1+o21*h2+o31*h3)*o11*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o12*(t2-y2)]*x3*learning_rate

new_w12 = w12 + v'(w12*x1+w22*x2+w32*x3)*[v'(o11*h1+o21*h2+o31*h3)*o21*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o22*(t2-y2)]*x1*learning_rate
new_w22 = w22 + v'(w12*x1+w22*x2+w32*x3)*[v'(o11*h1+o21*h2+o31*h3)*o21*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o22*(t2-y2)]*x2*learning_rate
new_w32 = w32 + v'(w12*x1+w22*x2+w32*x3)*[v'(o11*h1+o21*h2+o31*h3)*o21*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o22*(t2-y2)]*x3*learning_rate

new_w13 = w13 + v'(w13*x1+w23*x2+w33*x3)*[v'(o11*h1+o21*h2+o31*h3)*o31*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o32*(t2-y2)]*x1*learning_rate
new_w23 = w23 + v'(w13*x1+w23*x2+w33*x3)*[v'(o11*h1+o21*h2+o31*h3)*o31*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o32*(t2-y2)]*x2*learning_rate
new_w33 = w33 + v'(w13*x1+w23*x2+w33*x3)*[v'(o11*h1+o21*h2+o31*h3)*o31*(t1-y1)+
          v'(o12*h1+o22*h2+o32*h3)*o32*(t2-y2)]*x3*learning_rate
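
To make that concrete, here is a rough C++ sketch of one update step for the same 3-3-2 example (the function and variable names are just for illustration, and it assumes a feedforward pass has already filled in h and y):

// One gradient step for the 3-3-2 example. Assumes the feedforward pass has
// already produced the hidden outputs h[3] and the network outputs y[2].
void UpdateWeights(const float x[3], const float h[3], const float y[2],
                   const float t[2], float w[3][3], float o[3][2],
                   float learning_rate)
{
    // delta_out[k] = v'(net_k)*(t_k - y_k); since y_k = v(net_k),
    // v'(net_k) can be written as y_k*(1 - y_k).
    float delta_out[2];
    for (int k = 0; k < 2; ++k)
        delta_out[k] = y[k] * (1.0f - y[k]) * (t[k] - y[k]);

    // delta_hid[j] = v'(net_j) * sum_k o[j][k]*delta_out[k], with v'(net_j) = h_j*(1 - h_j).
    float delta_hid[3];
    for (int j = 0; j < 3; ++j)
    {
        float sum = 0.0f;
        for (int k = 0; k < 2; ++k)
            sum += o[j][k] * delta_out[k];   // note: uses the *old* output weights
        delta_hid[j] = h[j] * (1.0f - h[j]) * sum;
    }

    // Hidden-to-output weights: new_o_jk = o_jk + delta_out[k]*h_j*learning_rate
    for (int j = 0; j < 3; ++j)
        for (int k = 0; k < 2; ++k)
            o[j][k] += learning_rate * delta_out[k] * h[j];

    // Input-to-hidden weights: new_w_ij = w_ij + delta_hid[j]*x_i*learning_rate
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            w[i][j] += learning_rate * delta_hid[j] * x[i];
}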

Pray I didn't make a mistake LOL.

[Edited by - NickGeorgia on January 26, 2006 4:48:35 PM]
I understand the first post very well, and derivatives are no problem... if I remember correctly, they work like this:

y(x) = x^2
y'(x) = 2x

Right?

And I already use the sigmoid function; I thought that might work better than a hard limiter.

That error equation of yours works slightly differently from the one I found on Generation5. On the site it was like this:

Error = y * (1 - y) * (t - y)

Or did I misunderstand that last part?

[edit] Getting late... I'll return to this thread tomorrow for a reread and, hopefully, I'll get some kind of huge eureka insight. :P [smile]
I just use the sum of squared errors. Not sure about the one you mentioned.
Hmm, that looks quite different from what I learned on the site. But what I have now doesn't work :P Or rather, I can't make it work. Is this correct?:

w is weight for hidden neurons
o is weight for output neurons
h is what a neuron in the hidden layer fired/outputted
x is initial input to net
t is target output of net
y is actual output of net

In my net I only use one output; I made a thread about it here and came to the conclusion that nets with a single output neuron performed better. A little slower, but better results.

I'll make a backup tomorrow and try to switch my code over to that system instead, and see if I have any more luck. I'll post results here either way.

Thanks, Nick. [smile] Rating++.

[edit] Oh, and the error equation I mentioned that you weren't sure about: I found it on the site I linked to in my second post. Go check it out if you want.
Yep, I think you got my notation down. Since they were using a different error equation, I would imagine the equations would be different. OK, let me know how it turns out. (Hope I didn't make a mistake, but I'll check it over later)

Also here is a link on how you might do it if you wanted to use hard limiters.
I've read your post over and over, Nick, and here's what I've come up with.

First, look at the design of my net:
Input     Hidden_1             Hidden_2             Output_Neuron

x(0,0)               x(1,0)               x(2,0)                         <- Biases
x(0,1)    w(0,0,0)   x(1,1)    w(1,0,0)   x(2,1)    w(2,0,0)
x(0,2)    w(0,1,0)             w(1,1,0)             w(2,1,0)   x(3,0)    // Total net output
          w(0,2,0)             w(1,2,0)             w(2,2,0)
          w(0,0,1)   x(1,2)    w(1,0,1)   x(2,2)
          w(0,1,1)             w(1,1,1)
          w(0,2,1)             w(1,2,1)


Here, x(l,n) is the output from neuron n in layer l. n = 0 is the bias (always 1) in every layer. w(l,f,n) is the weight for input f to neuron n in layer l.

For example, x(1,1) is Sigmoid( x(0,0)*w(0,0,0)+x(0,1)*w(0,1,0)+x(0,2)*w(0,2,0) ).

x(3,0) is the total net output.
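
To be explicit about how I think of the storage, here is a rough C++ sketch of that indexing (just a sketch, with the sizes hard-coded to the layout above):

#include <cmath>

// x[l][n]   : output of neuron n in layer l; n = 0 is the bias and is fixed at 1
//             (the output layer is the exception: x[3][0] is the real net output).
// w[l][f][n]: weight for input f into the n:th computed neuron of layer l+1,
//             so the neuron fed by w[l][*][n] ends up in x[l+1][n+1].
float x[4][3];       // layers 0..3; the output layer only uses x[3][0]
float w[3][3][2];    // w[2][*][*] only uses n = 0 (the single output neuron)

float Sigmoid(float value) { return 1.0f / (1.0f + std::exp(-value)); }

// One hidden neuron, e.g. hidden layer 1, neuron 1:
// x(1,1) = Sigmoid( x(0,0)*w(0,0,0) + x(0,1)*w(0,1,0) + x(0,2)*w(0,2,0) )
float ComputeHiddenNeuron(int layer, int n)
{
    float sum = 0.0f;
    for (int f = 0; f < 3; ++f)
        sum += x[layer][f] * w[layer][f][n];
    return Sigmoid(sum);   // the caller stores this in x[layer+1][n+1]
}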

Trying to convert your formulas into one I could use, I came up with this for the output neuron:

w(2,f,0) = W(2,f,0) + v'(Input) * x(2,f) * (d-Input) * learn_rate

Where d is the desired output of the net, and Input is defined as the sum of all inputs times their respective weights: x(2,0)*w(2,0,0)+x(2,1)*w(2,1,0)+x(2,2)*w(2,2,0). The v'(x) function is the derivative, which I wrote like this:

float Deriv(float num) { return (1 - num) * num; }  // sigmoid derivative, written in terms of the sigmoid's output (pass in v(x), not x)


Is that right?

When I got to the hidden layer's new weight calculations, I got completely stuck; I can't come up with a good way to write a formula for it. This is the best I could do:

w(l,f,n) = w(l,f,n) + v'(Input) * w(l+1,n, ? ) * (d-NetOut) * x(l,f) * learn_rate

Here, v'(Input) is the same as above, i.e. the total input to the output neuron. But this is the same problem that made me start this thread: maybe v'(Input) should instead be the total input to the neuron in the next layer.

I have so many thoughts, but they are scattered, so I have trouble describing them. It makes sense to multiply the total input of the next-layer neuron with a weight, but for which neuron? Neuron n here serves as the input to the next neuron, so it makes sense to set f = n. But which neuron n is this? Neuron 1 in layer 1 serves as input 1 for every neuron in layer 2, so which neuron should I calculate with?

I've probably made a bunch of errors here, so I'll stop and wait for your reply. I could probably use a break too; you know how blind you get when you look at the same problem for too long.

Thanks again for all your help.

This topic is closed to new replies.
