# Back Prop Problem


## Recommended Posts

I am writing a neural network that takes 64 inputs (all normalised between 0 and 1) and tries to reproduce them at the output, with 16 hidden neurons. I am using simple back-prop to train it, but I only seem to be cycling between 2 or 3 different weight vectors. I initialised the network with small random weights ((double)rand()/RAND_MAX). What could be the problem? My textbook uses conjugate gradient for training, but I don't want to use it, as I don't understand the maths behind it. I am computing the gradient vector with respect to the weights and then using w += grad_w * learning_rate as the update step. Is it that I am trying to train a little too much using a rather simplistic algorithm, or is the problem with my algorithm?
```cpp
double Layernet::find_grad(double *input)
{
    int n = n_hid * (n_in + 1) + n_out * (n_hid + 1); // total number of weights

    for (int j = 0; j < n; j++)
        grad[j] = 0.0;                                // zero the gradient vector

    double diff;
    double error = 0.0;

    for (int k = 0; k < n_out; k++)
    {
        diff = input[k] - out[k];                     // target is the input itself
        error += diff * diff;
        outdelta[k] = diff * act_deriv(out[k]);
    }

    // gradient of the weights connecting hidden to output
    double delta;
    for (int k = 0; k < n_out; k++)
    {
        delta = outdelta[k];
        int l;
        for (l = 0; l < n_hid; l++)
            grad[n_hid * (n_in + 1) + k * (n_hid + 1) + l] = delta * hid[l];
        grad[n_hid * (n_in + 1) + k * (n_hid + 1) + l] = delta; // bias term (l == n_hid)
    }

    // gradient of the weights connecting input to hidden
    for (int i = 0; i < n_hid; i++)
    {
        delta = 0.0;
        for (int jj = 0; jj < n_out; jj++)
            delta += outdelta[jj] * out_coeffs[jj * (n_hid + 1) + i];
        delta *= act_deriv(hid[i]);                   // was act_deriv(hid): needs the i-th activation

        for (int k = 0; k < n_in; k++)
            grad[i * (n_in + 1) + k] = delta * input[k];
        grad[i * (n_in + 1) + n_in] = delta;          // bias term
    }
    return error / (double)n_out;
}
```

```cpp
void Layernet::modify_weights()
{
    const double learning_rate = 0.4;   // a rate this large can make the error diverge
    int i, j;

    for (i = 0; i < n_hid; i++)
        for (j = 0; j <= n_in; j++)
            hid_coeffs[j + i * (n_in + 1)] += learning_rate * grad[j + i * (n_in + 1)];

    // the output-layer gradients live in the same array, after the hidden ones
    for (i = 0; i < n_out; i++)
        for (j = 0; j <= n_hid; j++)
            out_coeffs[j + i * (n_hid + 1)] +=
                learning_rate * grad[n_hid * (n_in + 1) + j + i * (n_hid + 1)];
}
```

Thanks.

##### Share on other sites
Some debugging has shown that the total error of the network, the sum of (target[k] - observed[k])^2, actually starts *increasing* after about 100 iterations. How is that happening? Any clues?

##### Share on other sites
The code's a bit fuzzy to me, but first off the block: you don't need to store the gradients explicitly, you just need to keep the outputs of the units (which you'll need anyway; that's where your problem is). For the sigmoid, the derivative is output * ( 1 - output ), so forget the explicit gradients.

Then, w += grad w * learning is not complete; it should be

w += grad w * learning * error * input

The error is obviously determined by the output (that's why you need to keep the outputs): for the output layer it is (output - desired output), and for the hidden layer(s) this value is calculated backwards through the network (the inverse of how you calculated the output going forward).

In linear algebra terms it's like this.

Going forward, for all layers:

sigmoid( weight-matrix * input ) = ( output of next layer )

At the output layer:

output error = ( output - desired output )

Then backwards, for all layers:

weight-matrix-transposed * output error = input error (the output error of the previous layer)

and

weight adjustment = learning * ( output * ( 1 - output ) ) * input * error

This is probably still not very clear, but looking at it as matrix/vector products simplifies your implementation tremendously, and you might be able to use an EFFICIENT linear algebra package! If you use a matrix-matrix multiply, the same scheme lets you process several inputs at the same time more efficiently (for offline training). Lin alg packs usually do a better job of optimising a matrix-matrix multiply than consecutive matrix-vector multiplies. Speed is really the crux!

##### Share on other sites
Quote:
 Then, w += grad w * learning is not complete, should be w += grad w * learning * error * input

Page 163 of Neural Networks (Simon Haykin, 2nd Edition) shows

delta w(i,j) = - eta * grad    (eqn 4.12)

where grad is the partial derivative of the error function with respect to w(i,j).

##### Share on other sites
Well, I don't want to argue with you. I looked it up in my handbook, R. Rojas, Neural Networks (A Systematic Introduction), pages 165-167, which gives the above information. All I can say is that my networks work and you seem to have a problem. Have a look at this link http://www.dontveter.com/bpr/public2.html; there's a detailed numerical example, so that should help.
