Neural Networks: Hidden Layer Bias

How important is it to have a bias on your hidden layers? Can you have the same effect by just having a bias on the input layer? Is it worth having one (or more) fixed value(s) in your training (and output) patterns?

For example, for XOR with input {A, B, fixed} and output {result, fixed}:

{0,0,1} {0,1}
{0,1,1} {1,1}
{1,0,1} {1,1}
{1,1,1} {0,1}

Does anyone have a simple C(++) source I could try these out with, using multilayer EBP?

Thanks, Rob.
You need bias going into every hidden node and every output node.

There is no bias going into the input nodes.

The bias value is always 1.0.

Have the learning algorithm learn the weights for the connecting edges normally.

You will need a sigmoid and its derivative (sigmoidPrime) for the activation function; the bias unit itself has no activation, it just outputs 1.0.

Basically, a bias neuron helps separate cases and allows you to have smaller neural nets than you would without one.

I'm a little fuzzy on the exact reasons, as it's been a while since I went through the theory, but I just coded a NN at work and it works well.

Also, XOR should look like this:

A - B - Output
0 - 0 - 0
0 - 1 - 1
1 - 0 - 1
1 - 1 - 0

The bias shouldn't even show up in the training data. It should just be handled automatically without you even having to see it.

Make your neural net data-driven: two files, one config and one data.

That way you can muck about with the internal structure of the NN by modifying the config file rather than having to modify the code.
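Since the OP asked for simple C(++) source, here is a minimal sketch of the above in C++ (my own code, not from anyone in this thread): a 2-H-1 feed-forward net trained on XOR with plain error backpropagation. The hidden-layer size H, the learning rate, the initial weight range, and the epoch count are all arbitrary choices, and convergence from a random start isn't guaranteed; the point is just to show the bias handled as a constant 1.0 input whose weights are learned like any other.

#include <cmath>
#include <cstdio>
#include <cstdlib>

static double sigmoid(double x)      { return 1.0 / (1.0 + std::exp(-x)); }
static double sigmoidPrime(double y) { return y * (1.0 - y); } // expects y = sigmoid(x)

int main()
{
    const int    H   = 3;    // hidden units (arbitrary choice)
    const double eta = 0.5;  // learning rate (arbitrary choice)
    const double inputs[4][2] = {{0,0},{0,1},{1,0},{1,1}};
    const double targets[4]   = {0,1,1,0};

    // hiddenW[j][0..1]: input weights of hidden unit j; hiddenW[j][2]: its bias weight.
    // outW[0..H-1]: hidden-to-output weights; outW[H]: the output unit's bias weight.
    double hiddenW[H][3], outW[H + 1];
    for (int j = 0; j < H; ++j)
        for (int i = 0; i < 3; ++i)
            hiddenW[j][i] = std::rand() / (double)RAND_MAX - 0.5;
    for (int j = 0; j <= H; ++j)
        outW[j] = std::rand() / (double)RAND_MAX - 0.5;

    for (int epoch = 0; epoch < 20000; ++epoch) {
        for (int p = 0; p < 4; ++p) {
            // Forward pass: the bias is just a constant 1.0 input to every hidden
            // and output unit, so it never appears in the training data.
            double hidden[H];
            for (int j = 0; j < H; ++j)
                hidden[j] = sigmoid(hiddenW[j][0] * inputs[p][0] +
                                    hiddenW[j][1] * inputs[p][1] +
                                    hiddenW[j][2] * 1.0);
            double net = outW[H] * 1.0;
            for (int j = 0; j < H; ++j)
                net += outW[j] * hidden[j];
            double out = sigmoid(net);

            // Backward pass: bias weights are trained exactly like any other weight.
            double outDelta = (targets[p] - out) * sigmoidPrime(out);
            double hiddenDelta[H];
            for (int j = 0; j < H; ++j)
                hiddenDelta[j] = outDelta * outW[j] * sigmoidPrime(hidden[j]);

            for (int j = 0; j < H; ++j)
                outW[j] += eta * outDelta * hidden[j];
            outW[H] += eta * outDelta * 1.0;

            for (int j = 0; j < H; ++j) {
                hiddenW[j][0] += eta * hiddenDelta[j] * inputs[p][0];
                hiddenW[j][1] += eta * hiddenDelta[j] * inputs[p][1];
                hiddenW[j][2] += eta * hiddenDelta[j] * 1.0;
            }
        }
    }

    // Show what the trained net produces for the four XOR patterns.
    for (int p = 0; p < 4; ++p) {
        double net = outW[H];
        for (int j = 0; j < H; ++j)
            net += outW[j] * sigmoid(hiddenW[j][0] * inputs[p][0] +
                                     hiddenW[j][1] * inputs[p][1] +
                                     hiddenW[j][2]);
        std::printf("%g XOR %g -> %.3f\n", inputs[p][0], inputs[p][1], sigmoid(net));
    }
    return 0;
}

Note that the training data only contains the A, B, and target columns; the 1.0 bias inputs live entirely inside the forward pass.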
The bias is needed to be able to solve problems that are not linearly separable.


/Nico
Thanks, but can I get away with JUST biasing the input pattern? (to simplify implementation).

I last did NNs ages ago, and I remember mucking around with my own C NN "engine" which is lost on a very old disk (I recompiled it for GCC to use 32bit memory addressing, that's how old!).

I'm planning on writing a NN to drive round a race track, and I'm going to make the program use a technique I thought of to create its own learning patterns on the fly.

The NN Inputs will consist of:
Direction of travel relative to heading.
Steering Wheel relative to heading.
De/Acceleration Force.
Accelerator pedal.
64 radial Time-To-Impact inputs for the left side of the track, surrounding the driver.
64 radial Time-To-Impact inputs for the right side of the track, surrounding the driver.
(1 implies Impact next frame; -1 no chance of impact)

The outputs are Steering Wheel and Accelerator (a possible layout is sketched below).
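Just to pin down the sizes (the struct and field names below are hypothetical, only restating the list above): that's 4 + 64 + 64 = 132 inputs and 2 outputs, which might be laid out like this:

// Hypothetical layout of the planned race-track network's I/O, matching the
// counts listed above (4 scalar inputs + 64 + 64 time-to-impact rays -> 2 outputs).
struct DriverInputs {
    double directionRelHeading;   // direction of travel relative to heading
    double steeringRelHeading;    // steering wheel position relative to heading
    double accelForce;            // de/acceleration force
    double acceleratorPedal;      // accelerator pedal position
    double ttiLeft[64];           // radial time-to-impact, left side (1 = impact next frame, -1 = no chance)
    double ttiRight[64];          // radial time-to-impact, right side of track
};                                // 4 + 64 + 64 = 132 input values

struct DriverOutputs {
    double steeringWheel;
    double accelerator;
};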
So you will have no biases on any of the hidden layers?
I don't have any mathematical proof, but intuitively, I would suspect that without any bias on the hidden layers, you will be limiting the ability of your net. The bias allows your fit curve to shift and have an arbitrary intercept. If you think about it in one dimension, consider an X-axis with a sigmoidal curve going through the origin. Without the bias, you're stuck with the curve going through the origin. If you add in a bias term, you can now move that interception point on the X-axis around. This would translate to the hidden layers as well. Without a bias, they can only have one possible point of intercept without shifting, which may limit the ability to model a given curve.
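For a concrete one-dimensional illustration (the weight and bias values below are made-up numbers, not from anyone's net):

#include <cmath>
#include <cstdio>

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main()
{
    double w = 3.0, b = -2.0;                   // arbitrary weight and bias weight
    std::printf("%f\n", sigmoid(w * 0.0));      // 0.5 at x = 0, no matter what w is
    std::printf("%f\n", sigmoid(w * 0.0 + b));  // ~0.119: the bias has shifted the curve along x
    return 0;
}

Without the bias term, the curve is pinned to output 0.5 at the origin; the bias is what lets that crossing point move.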

I should point out that an earlier post said that the bias was necessary to address linearly inseparable problems, but that is not correct. Linear inseparability is addressed by having hidden layers. Without hidden layers, you essentially have a linear function approximator - the perceptron. When you add in a hidden layer, you start allowing for nonlinear transformations of the input variables.

I hope that helps.

-Kirk

Quote:Original post by ROBERTREAD1
Thanks, but can I get away with JUST biasing the input pattern? (to simplify implementation).

I think the answer is yes. If the network really needs to have a bias further down the line, it can always use one of the neurons to just produce a constant, which can then be used as a bias in the next layer.
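One rough way to see that (toy number of my own): if a hidden unit's net input is dominated by a large weight on the input-layer bias, its sigmoid output saturates and is effectively constant, and the next layer can scale that constant as its bias.

#include <cmath>
#include <cstdio>

static double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main()
{
    // Suppose this hidden unit's net input is essentially just a bias weight of 10.
    std::printf("%f\n", sigmoid(10.0)); // ~0.99995: effectively a constant 1 for the next layer
    return 0;
}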

Quote:Original post by kirkd
I don't have any mathematical proof, but intuitively, I would suspect that without any bias on the hidden layers, you will be limiting the ability of your net. The bias allows your fit curve to shift and have an arbitrary intercept. If you think about it in one dimension, consider an X-axis with a sigmoidal curve going through the origin. Without the bias, you're stuck with the curve going through the origin. If you add in a bias term, you can now move that interception point on the X-axis around. This would translate to the hidden layers as well. Without a bias, they can only have one possible point of intercept without shifting, which may limit the ability to model a given curve.

I should point out that an earlier post said that the bias was necessary to address linearly inseparable problems, but that is not correct. Linear inseparability is addressed by having hidden layers. Without hidden layers, you essentially have a linear function approximator - the perceptron. When you add in a hidden layer, you start allowing for nonlinear transformations of the input variables.

I hope that helps.

-Kirk



Ah, so this is an actual way of doing a feed-forward network?

Do you know if there are any articles about this that I can read up on?

Thanks
Yes, the bias nodes are part of a feed-forward network. I assumed we were talking about feed-forward networks from the context.

As for articles, I don't have any specific recommendations. A few books of interest include Neural Networks for Pattern Recognition by Bishop, Neural and Adaptive Systems by Principe, The Elements of Statistical Learning by Hastie, and Pattern Classification by Duda. The only one of those that includes a discussion of bias units is Bishop, and there is no discussion of their necessity in the hidden layer.

There are tons of tutorials on the web. Try Googling for "neural net tutorial" and you'll get a whole weekend's worth of reading material. I searched for quite a while, but didn't find anything related to sacrificing the bias on the hidden layers.

-Kirk

This is from a classifier perspective. Bias is needed when you need separation boundaries that don't all intersect at the origin.

Example:

weight1*x + weight2*y + weight3*z + ... + (0 bias term) = 0 defines a hyperplane passing through the origin.

weight1*x + weight2*y + weight3*z + ... + weightbias*(bias = 1) = 0 defines a hyperplane that (for a nonzero bias weight) does not pass through the origin.

Think of these hyperplanes, lines, etc. as decision boundaries on what output a neural network should have.
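To make that concrete with a tiny 2D example (arbitrary weights of my own, not from the thread):

#include <cstdio>

int main()
{
    // Without a bias the decision boundary w1*x + w2*y = 0 must pass through the
    // origin; with a bias weight it is w1*x + w2*y + wb*1 = 0, here x + y = 1.5.
    double w1 = 1.0, w2 = 1.0, wb = -1.5;
    double x = 0.0, y = 0.0;                         // evaluate at the origin

    std::printf("%f\n", w1 * x + w2 * y);            //  0.0: the origin always sits on the no-bias boundary
    std::printf("%f\n", w1 * x + w2 * y + wb * 1.0); // -1.5: the biased boundary has moved off the origin
    return 0;
}

With w1 = w2 = 1 and wb = -1.5, the boundary x + y = 1.5 is exactly the kind of shifted line a single no-bias neuron could never produce.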

Edit: err what kirkd said ... :)

