Backpropagation XOR issue

Started by
15 comments, last by alvaro 9 years, 4 months ago
I had an edit block in the previous comment, but I kept getting it wrong, so here it is (hopefully correct this time):

Let me add some explanation of why the scale of the initial weights matters. Let's imagine your inputs to a particular layer are uniformly distributed random numbers between 0 and 1. If you multiply them by random weights between -1 and 1 and add N of them together, the size of the result will typically be about sqrt(N)/3 (that's the standard deviation of the resulting distribution). For large N that means your sigmoid function will saturate very close to 0 or 1 with high probability, and it's very hard to learn anything from there, because the sigmoid is so flat in those regions that the gradients are essentially 0. By using weights between -1/sqrt(N) and +1/sqrt(N), the typical linear combination will have zero mean and standard deviation 1/3, so the sigmoid has only a small probability of saturating and you can still learn how to adjust the weights usefully.

The assumption that the inputs into the layer are uniformly distributed is not very important. If instead they are independent binary inputs, each being 1 with probability 1/2 and 0 with probability 1/2, the standard deviation of the random linear combination with weights uniformly distributed between -1/sqrt(N) and +1/sqrt(N) is 1/sqrt(6) ~= 0.40825. So the situation is qualitatively the same.
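To make those numbers concrete, here is a minimal Monte Carlo sketch (the class and variable names are mine, not from any code in this thread) that measures the standard deviation of the linear combination for the three cases above: uniform inputs with weights in [-1, 1], uniform inputs with weights in [-1/sqrt(N), +1/sqrt(N)], and binary inputs with the scaled weights.

```java
import java.util.Random;

// Monte Carlo check of the standard deviations quoted above.
public class InitScaleCheck {
    static final int N = 100;         // number of inputs into the layer
    static final int TRIALS = 100000; // number of random linear combinations to sample
    static final Random rng = new Random();

    // Standard deviation of sum_i w_i * x_i over many random draws.
    static double measuredStdDev(boolean binaryInputs, double weightRange) {
        double sum = 0.0, sumSq = 0.0;
        for (int t = 0; t < TRIALS; t++) {
            double s = 0.0;
            for (int i = 0; i < N; i++) {
                double x = binaryInputs ? (rng.nextBoolean() ? 1.0 : 0.0)
                                        : rng.nextDouble();              // uniform in [0, 1)
                double w = (2.0 * rng.nextDouble() - 1.0) * weightRange; // uniform in [-range, +range]
                s += w * x;
            }
            sum += s;
            sumSq += s * s;
        }
        double mean = sum / TRIALS;
        return Math.sqrt(sumSq / TRIALS - mean * mean);
    }

    public static void main(String[] args) {
        System.out.printf("uniform inputs, weights in [-1, 1]:                  %.3f (sqrt(N)/3 = %.3f)%n",
                measuredStdDev(false, 1.0), Math.sqrt(N) / 3.0);
        System.out.printf("uniform inputs, weights in [-1/sqrt(N), +1/sqrt(N)]: %.3f (1/3 = %.3f)%n",
                measuredStdDev(false, 1.0 / Math.sqrt(N)), 1.0 / 3.0);
        System.out.printf("binary inputs,  weights in [-1/sqrt(N), +1/sqrt(N)]: %.3f (1/sqrt(6) = %.3f)%n",
                measuredStdDev(true, 1.0 / Math.sqrt(N)), 1.0 / Math.sqrt(6.0));
    }
}
```

With N = 100 the three printed values should come out close to 3.33, 0.33 and 0.41, matching sqrt(N)/3, 1/3 and 1/sqrt(6).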

So you believe I should always run the back-propagation, period. OK, I will change the code to do that.

Do you think it would be useful if I shared the weights here after one run, and again after, say, 100 runs?

Thank you!

I think it would be more useful to see the weights before and after one update, together with the computations that led to that update.

Sure, ok, I think I can do that. Gimme a day and I'll post the results... brb....

I think I found something wrong with my code, I'm using JavaNNS to validate it right now, if I still have issues, I'll post them here soon.

For now, thank you for your patience and assistance. :)

Hmm, even though I've validated my BP algorithm against JavaNNS, the neural net doesn't always converge to a solution when training on XOR. Is this a documented symptom?

If I set all the weights to the same value, it doesn't seem to reach a solution at all. I would venture that the net has more success when it starts with random weights that happen to favor a certain result. I'm off for the holidays, but I will post more results here when I have them. Thanks all, and happy holidays to you and your families.

A neural network with equal weights within a layer will still have equal weights within that layer after a step of gradient descent: the hidden units all compute the same output, so they all receive the same gradient. Such a network is therefore only as effective as a network with a single hidden unit per layer. This is well known.
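To illustrate the symmetry argument, here is a minimal 2-2-1 sigmoid network trained on XOR with plain online backprop, starting with every weight set to 0.5. It is only a sketch (the layout, learning rate and epoch count are my own choices, not taken from the code discussed in this thread): because the two hidden units receive identical updates at every step, their weight vectors stay identical forever, so the network behaves like one with a single hidden unit and never fits XOR.

```java
import java.util.Arrays;

// Minimal 2-2-1 sigmoid network trained on XOR with every weight initialized to 0.5.
public class EqualWeightsXor {
    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    public static void main(String[] args) {
        double[][] inputs  = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[]   targets = {0, 1, 1, 0};

        // w1[j] = {weight from x0, weight from x1, bias} for hidden unit j
        double[][] w1 = {{0.5, 0.5, 0.5}, {0.5, 0.5, 0.5}};
        // w2 = {weight from h0, weight from h1, bias} for the output unit
        double[] w2 = {0.5, 0.5, 0.5};
        double lr = 0.5;

        for (int epoch = 0; epoch < 10000; epoch++) {
            for (int p = 0; p < 4; p++) {
                double x0 = inputs[p][0], x1 = inputs[p][1];

                // Forward pass
                double[] h = new double[2];
                for (int j = 0; j < 2; j++)
                    h[j] = sigmoid(w1[j][0] * x0 + w1[j][1] * x1 + w1[j][2]);
                double out = sigmoid(w2[0] * h[0] + w2[1] * h[1] + w2[2]);

                // Backward pass (squared-error loss, online updates)
                double deltaOut = (targets[p] - out) * out * (1 - out);
                double[] deltaH = new double[2];
                for (int j = 0; j < 2; j++)
                    deltaH[j] = deltaOut * w2[j] * h[j] * (1 - h[j]);

                // Weight updates
                for (int j = 0; j < 2; j++) {
                    w2[j]    += lr * deltaOut * h[j];
                    w1[j][0] += lr * deltaH[j] * x0;
                    w1[j][1] += lr * deltaH[j] * x1;
                    w1[j][2] += lr * deltaH[j];
                }
                w2[2] += lr * deltaOut;
            }
        }

        // The two hidden units end up with identical weights, and the network cannot separate XOR.
        System.out.println("hidden unit 0: " + Arrays.toString(w1[0]));
        System.out.println("hidden unit 1: " + Arrays.toString(w1[1]));
        for (int p = 0; p < 4; p++) {
            double h0  = sigmoid(w1[0][0] * inputs[p][0] + w1[0][1] * inputs[p][1] + w1[0][2]);
            double h1  = sigmoid(w1[1][0] * inputs[p][0] + w1[1][1] * inputs[p][1] + w1[1][2]);
            double out = sigmoid(w2[0] * h0 + w2[1] * h1 + w2[2]);
            System.out.printf("%.0f XOR %.0f -> %.3f (target %.0f)%n",
                    inputs[p][0], inputs[p][1], out, targets[p]);
        }
    }
}
```

Replacing the two 0.5 initializations with small random values (for example uniform in [-1/sqrt(N), +1/sqrt(N)], per the earlier posts) breaks the symmetry and usually lets the same code learn XOR, though, as noted above, it can still occasionally get stuck.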
