How important is it to have a bias on your hidden layers?
Can you have the same effect by just having a bias on the input layer?
Is it worth having one (or more) fixed value(s) in your training (and output) patterns?
e.g., for XOR:
input{A,B,fixed}output{result,fixed}
{0,0,1}{0,1}
{0,1,1}{1,1}
{1,0,1}{1,1}
{1,1,1}{0,1}
Does anyone have a simple C(++) source I could use to try these out with multilayer EBP (error backpropagation)?
Thanks,
Rob.
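For the first layer, a fixed input really is a bias: a constant 1 times a learned weight is just an added constant. The C++ sketch below checks that equivalence. `netWithBias` and `netWithFixedInput` are hypothetical names used only for illustration. Note that the fixed column only biases the first layer; whether later layers need their own bias is what the rest of the thread discusses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Weighted sum with an explicit bias term: sum_i w[i]*x[i] + b.
double netWithBias(const std::vector<double>& w,
                   const std::vector<double>& x, double b) {
    assert(w.size() == x.size());
    double sum = b;
    for (std::size_t i = 0; i < x.size(); ++i) sum += w[i] * x[i];
    return sum;
}

// Same weighted sum, but the bias is stored as the last weight and the
// input pattern is extended with a fixed value of 1.0 (like {A, B, 1}).
double netWithFixedInput(const std::vector<double>& wExt,
                         std::vector<double> x) {
    x.push_back(1.0);  // the "fixed" column of the pattern
    assert(wExt.size() == x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) sum += wExt[i] * x[i];
    return sum;
}
```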

###
#2
Anonymous Poster
Guests

Posted 08 November 2005 - 09:09 AM

You need bias going into every hidden node and every output node.

There is no bias going into the input nodes.

The bias input is always 1.0.

Have the learning algorithm learn the weights for the connecting edges normally.

You will need a sigmoid and its derivative (sigmoidPrime) for the activation function of the nodes; the bias node itself simply outputs 1.0.

Basically, a bias neuron helps separate cases and allows you to have smaller neural nets than you would without bias.

I'm a little fuzzy on the exact reasons, as it's been a while since I went through the theory, but I just coded a NN at work and it works well.
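A minimal C++ sketch of that layout, assuming one hidden layer, sigmoid activations, and a hypothetical storage convention where the last weight of each node belongs to the bias input (always 1.0). The input nodes just pass the pattern through, so the bias never appears in the pattern itself. `layerForward` and `forward` are illustrative names, not from any library.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One fully connected layer. weights[j] holds the incoming weights of node j:
// one per input, plus a final weight for the bias input, which is always 1.0.
std::vector<double> layerForward(const std::vector<std::vector<double>>& weights,
                                 const std::vector<double>& in) {
    std::vector<double> out;
    out.reserve(weights.size());
    for (const auto& w : weights) {
        assert(w.size() == in.size() + 1);
        double z = w.back() * 1.0;  // bias weight times the constant bias input
        for (std::size_t i = 0; i < in.size(); ++i) z += w[i] * in[i];
        out.push_back(sigmoid(z));
    }
    return out;
}

// Input nodes pass the pattern through unchanged (no bias there);
// hidden and output nodes each get their own bias weight.
std::vector<double> forward(const std::vector<std::vector<double>>& hiddenW,
                            const std::vector<std::vector<double>>& outputW,
                            const std::vector<double>& pattern) {
    return layerForward(outputW, layerForward(hiddenW, pattern));
}
```

Because the bias is just one more weight, backpropagation can update it exactly like the others, treating its input value as 1.0.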

Also, XOR should look like this:

| A | B | Output |
|---|---|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

The bias shouldn't even show up in the training data. It should just be handled automatically without you even having to see it.
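Since the original question asked for simple C++ source, here is a minimal sketch of multilayer EBP for the XOR table above. It assumes a 2-3-1 sigmoid network, squared error, online (per-pattern) updates, and arbitrary choices of learning rate, epoch count, and seed. `XorNet`, `trainXor` and `totalError` are hypothetical names. The bias weights live inside the network; the patterns hold only A, B and the target.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Minimal 2-H-1 network trained with online error backpropagation (EBP).
// Every hidden and output node has a bias weight on a constant 1.0 input,
// so the training patterns hold only A, B and the target.
struct XorNet {
    static constexpr int kIn = 2;
    int hidden;
    std::vector<std::vector<double>> wH;  // [hidden][kIn + 1]; last entry is the bias weight
    std::vector<double> wO;               // [hidden + 1]; last entry is the bias weight
    std::vector<double> h;                // hidden activations from the last forward()

    XorNet(int hiddenNodes, unsigned seed)
        : hidden(hiddenNodes),
          wH(hiddenNodes, std::vector<double>(kIn + 1)),
          wO(hiddenNodes + 1),
          h(hiddenNodes) {
        assert(hiddenNodes > 0);
        std::mt19937 rng(seed);
        std::uniform_real_distribution<double> dist(-1.0, 1.0);
        for (auto& row : wH)
            for (double& w : row) w = dist(rng);
        for (double& w : wO) w = dist(rng);
    }

    static double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

    double forward(const double* x) {
        double z = wO[hidden];      // output bias weight * 1.0
        for (int j = 0; j < hidden; ++j) {
            double a = wH[j][kIn];  // hidden bias weight * 1.0
            for (int i = 0; i < kIn; ++i) a += wH[j][i] * x[i];
            h[j] = sigmoid(a);
            z += wO[j] * h[j];
        }
        return sigmoid(z);
    }

    // One online backprop step on the squared error 0.5 * (target - y)^2.
    void train(const double* x, double target, double rate) {
        double y = forward(x);
        double deltaO = (target - y) * y * (1.0 - y);  // sigmoid'(z) = y * (1 - y)
        for (int j = 0; j < hidden; ++j) {
            // The hidden delta uses the output weight before it is updated.
            double deltaH = deltaO * wO[j] * h[j] * (1.0 - h[j]);
            for (int i = 0; i < kIn; ++i) wH[j][i] += rate * deltaH * x[i];
            wH[j][kIn] += rate * deltaH;  // hidden bias input is 1.0
            wO[j] += rate * deltaO * h[j];
        }
        wO[hidden] += rate * deltaO;      // output bias input is 1.0
    }
};

const double kXorInputs[4][2] = {{0.0, 0.0}, {0.0, 1.0}, {1.0, 0.0}, {1.0, 1.0}};
const double kXorTargets[4] = {0.0, 1.0, 1.0, 0.0};

// Sum of squared errors over the four XOR patterns.
double totalError(XorNet& net) {
    double e = 0.0;
    for (int p = 0; p < 4; ++p) {
        double d = kXorTargets[p] - net.forward(kXorInputs[p]);
        e += 0.5 * d * d;
    }
    return e;
}

void trainXor(XorNet& net, int epochs, double rate) {
    for (int e = 0; e < epochs; ++e)
        for (int p = 0; p < 4; ++p) net.train(kXorInputs[p], kXorTargets[p], rate);
}
```

For example, `XorNet net(3, 42u); trainXor(net, 20000, 0.5);` and then `net.forward(kXorInputs[p])` returns the output for pattern `p`. Small backprop nets can get stuck in local minima on XOR from some starting weights, so if the outputs don't settle near the targets, try another seed or an extra hidden node.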

Make your neural net data-driven: two files, one for config and one for data.

That way you can muck about with the internal structure of the NN by modifying the config file rather than having to modify the code.
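One way to read the config half of that advice, as a C++ sketch. The `layers`/`rate` file format below is entirely hypothetical, as are `NetConfig` and `readConfig`. Bias nodes are not listed, on the assumption that the code adds one to every hidden and output layer.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical config format, one setting per line ('#' starts a comment line):
//   layers 2 3 1   node counts per layer; bias nodes are implicit, not listed
//   rate 0.5       learning rate
struct NetConfig {
    std::vector<int> layers;
    double rate = 0.1;  // default used if the file has no "rate" line
};

NetConfig readConfig(std::istream& in) {
    NetConfig cfg;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string key;
        if (!(fields >> key) || key[0] == '#') continue;  // blank or comment line
        if (key == "layers") {
            int n;
            while (fields >> n) {
                assert(n > 0 && "layer sizes must be positive");
                cfg.layers.push_back(n);
            }
        } else if (key == "rate") {
            fields >> cfg.rate;
        }
        // Unknown keys are silently ignored in this sketch.
    }
    return cfg;
}
```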

###
#4
Members - Reputation: **100**

Posted 08 November 2005 - 02:27 PM

Thanks, but can I get away with JUST biasing the input pattern? (to simplify implementation).

I last did NNs ages ago, and I remember mucking around with my own C NN "engine", which is lost on a very old disk (I recompiled it with GCC to use 32-bit memory addressing, that's how old!).

I'm planning on writing a NN to drive round a race track, and I'm going to make the program use a technique I thought of to create its own learning patterns on the fly.

The NN inputs will consist of:

- Direction of travel relative to heading.
- Steering wheel relative to heading.
- De/acceleration force.
- Accelerator pedal.
- 64 radial time-to-impact inputs surrounding the driver, for the left side of the track.
- 64 radial time-to-impact inputs surrounding the driver, for the right side of the track.

(For the time-to-impact inputs, 1 implies impact next frame; -1 means no chance of impact.)

The outputs are steering wheel and accelerator.
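A C++ sketch of how those inputs might be flattened into one 132-entry input vector (4 + 64 + 64). The struct, its field names, `kNoImpact`, and especially the time-to-impact encoding `2 / frames - 1` (which gives 1 for impact next frame and tends toward -1 as impact recedes) are hypothetical; only the endpoints 1 and -1 come from the post.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

constexpr int kRays = 64;  // radial time-to-impact samples per track side
constexpr double kNoImpact = std::numeric_limits<double>::infinity();

// Hypothetical encoding of time-to-impact (in frames) into [-1, 1]:
// 1 frame -> 1 (impact next frame); kNoImpact -> -1 (no chance of impact).
double encodeTimeToImpact(double frames) {
    if (std::isinf(frames)) return -1.0;
    assert(frames >= 1.0);
    return 2.0 / frames - 1.0;
}

struct CarSensors {
    double travelDirection;                 // direction of travel relative to heading
    double steering;                        // steering wheel relative to heading
    double accelForce;                      // de/acceleration force
    double accelerator;                     // accelerator pedal
    std::array<double, kRays> leftFrames;   // time-to-impact, left side of track
    std::array<double, kRays> rightFrames;  // time-to-impact, right side of track
};

// Flattens the sensors into the 4 + 64 + 64 = 132 network inputs.
// No bias entry here; append a constant 1.0 if you bias only the input pattern.
std::vector<double> buildInputs(const CarSensors& s) {
    std::vector<double> in{s.travelDirection, s.steering, s.accelForce, s.accelerator};
    for (double f : s.leftFrames) in.push_back(encodeTimeToImpact(f));
    for (double f : s.rightFrames) in.push_back(encodeTimeToImpact(f));
    return in;
}
```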

###
#6
Members - Reputation: **505**

Posted 11 November 2005 - 04:10 AM

I don't have any mathematical proof, but intuitively, I would suspect that without any bias on the hidden layers, you will be limiting the ability of your net. The bias allows your fitted curve to shift and have an arbitrary intercept. Think about it in one dimension: consider an X-axis with a sigmoidal curve centered on the origin. Without the bias, you're stuck with the curve centered on the origin. If you add a bias term, you can move that crossing point along the X-axis. The same applies to the hidden layers: without a bias, each hidden node has only one possible crossing point and cannot shift it, which may limit the net's ability to model a given curve.
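The one-dimensional picture can be checked directly. In this C++ sketch (hypothetical helper names `unit` and `midpoint`), a unit computes `sigmoid(w * x + b)`. With `b = 0` its output at `x = 0` is 0.5 whatever `w` is, while a bias moves the 0.5 crossing to `x = -b / w`.

```cpp
#include <cassert>
#include <cmath>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// A single 1-D unit: output = sigmoid(w * x + b).
double unit(double x, double w, double b) { return sigmoid(w * x + b); }

// Where the unit's output crosses 0.5, i.e. where w * x + b = 0.
double midpoint(double w, double b) {
    assert(w != 0.0);
    return -b / w;
}
```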

I should point out that an earlier post said that the bias was necessary to address linearly inseparable problems, but that is not correct. Linear inseparability is addressed by having hidden layers. Without hidden layers, you essentially have a linear function approximator: the perceptron. When you add a hidden layer, you start allowing for nonlinear transformations of the input variables.
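To make the hidden-layer point concrete: XOR is not linearly separable, so no single threshold unit can compute it, but two hidden threshold units can. This C++ sketch uses hand-picked weights rather than trained ones (an illustrative assumption; `step` and `xorNet` are hypothetical names). Note that each unit's threshold is itself a bias term: the hidden layer supplies the nonlinearity, and the biases position the lines.

```cpp
#include <cassert>

// Threshold (step) unit on binary inputs: fires when w1*a + w2*b + bias > 0.
int step(double w1, double w2, double bias, int a, int b) {
    assert((a == 0 || a == 1) && (b == 0 || b == 1));
    return (w1 * a + w2 * b + bias > 0.0) ? 1 : 0;
}

// XOR with one hidden layer of two units, using hand-picked weights:
// h1 acts as OR, h2 acts as AND, and the output fires for "OR but not AND".
int xorNet(int a, int b) {
    int h1 = step(1.0, 1.0, -0.5, a, b);   // OR
    int h2 = step(1.0, 1.0, -1.5, a, b);   // AND
    return step(1.0, -1.0, -0.5, h1, h2);  // h1 AND NOT h2
}
```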

I hope that helps.

-Kirk

###
#7
Crossbones+ - Reputation: **11861**

Posted 11 November 2005 - 07:11 AM

Quote:

Original post by ROBERTREAD1

Thanks, but can I get away with JUST biasing the input pattern? (to simplify implementation).

I think the answer is yes. If the network really needs to have a bias further down the line, it can always use one of the neurons to just produce a constant, which can then be used as bias in the next layer.
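A C++ sketch of that trick, assuming the bias exists only as a fixed 1.0 at the end of the input pattern (as in the original `{A,B,1}` patterns). A hidden neuron with zero weights on A and B and a large weight (10 here, an arbitrary choice) on the fixed input outputs nearly the same value for every pattern, which the next layer can weight like a bias. `hiddenNeuron` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// Hidden neuron fed by the input pattern, whose last entry is the fixed 1.0.
// There is no separate bias term: the fixed input is the only bias source.
double hiddenNeuron(const std::vector<double>& w, const std::vector<double>& pattern) {
    assert(w.size() == pattern.size());
    double z = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i) z += w[i] * pattern[i];
    return sigmoid(z);
}
```

The price is one hidden neuron spent on producing a constant, and the network still has to learn those weights; an explicit bias per node avoids both.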

###
#8
Members - Reputation: **100**

Posted 12 November 2005 - 12:09 AM

Quote:

Original post by kirkd

I don't have any mathematical proof, but intuitively, I would suspect that without any bias on the hidden layers, you will be limiting the ability of your net. The bias allows your fitted curve to shift and have an arbitrary intercept. Think about it in one dimension: consider an X-axis with a sigmoidal curve centered on the origin. Without the bias, you're stuck with the curve centered on the origin. If you add a bias term, you can move that crossing point along the X-axis. The same applies to the hidden layers: without a bias, each hidden node has only one possible crossing point and cannot shift it, which may limit the net's ability to model a given curve.

I should point out that an earlier post said that the bias was necessary to address linearly inseparable problems, but that is not correct. Linear inseparability is addressed by having hidden layers. Without hidden layers, you essentially have a linear function approximator: the perceptron. When you add a hidden layer, you start allowing for nonlinear transformations of the input variables.

I hope that helps.

-Kirk

Ah, so this is an actual way of doing a feed-forward network?

Do you know if there are any articles about this that i can read up on?

Thanks

###
#9
Members - Reputation: **505**

Posted 12 November 2005 - 02:53 AM

Yes, the bias nodes are part of a feed-forward network. I assumed we were talking about feed-forward networks from the context.

As for articles, I don't have any specific recommendations. A few books of interest include Neural Networks for Pattern Recognition by Bishop, Neural and Adaptive Systems by Principe, The Elements of Statistical Learning by Hastie, and Pattern Classification by Duda. The only one of those that includes a discussion of bias units is Bishop, and there is no discussion of their necessity in the hidden layer.

There are tons of tutorials on the web. Try Googling "neural net tutorial" and you'll get a whole weekend's worth of reading material. I searched for quite a while, but didn't find anything related to sacrificing the bias on the hidden layers.

-Kirk

###
#10
Members - Reputation: **580**

Posted 13 November 2005 - 04:31 AM

This is from a classifier perspective. Bias is needed when you need separation boundaries that don't all intersect at the origin.

Example:

weight1*x + weight2*y + weight3*z + ... + (0 bias term) = 0 defines a hyperplane going through the origin.

weight1*x + weight2*y + weight3*z + ... + weightbias*(bias=1) = 0 defines a hyperplane that does not intersect the origin (as long as weightbias is nonzero).

Think of these hyperplanes, lines, etc. as decision boundaries on what output a neural network should have.
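The same two cases in vector notation (symbols chosen here for illustration: $\mathbf{w}$ for the weights, $\mathbf{x}$ for the inputs, $b$ for the weight on the constant bias input):

$$
\mathbf{w}^\top\mathbf{x} = 0 \quad\Rightarrow\quad \mathbf{x} = \mathbf{0} \text{ always lies on the boundary}
$$

$$
\mathbf{w}^\top\mathbf{x} + b = 0 \quad\Rightarrow\quad \text{distance from the origin} = \frac{|b|}{\lVert\mathbf{w}\rVert}
$$

For example, with $\mathbf{w} = (1, 1)$ and $b = -1.5$ the boundary is the line $x + y = 1.5$, at distance $1.5/\sqrt{2} \approx 1.06$ from the origin, which puts $(1,1)$ on one side and the other three XOR corners on the other. With $b = 0$ the line becomes $x + y = 0$, which passes through $(0,0)$.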

Edit: err what kirkd said ... :)

###
#11
Members - Reputation: **505**

Posted 13 November 2005 - 11:42 AM

Nick,

What are your thoughts on the need for bias nodes in hidden layers? Intuitively, I'm certain that they should be there for the exact same reason we want them in the input layer, but I can't come up with any supporting evidence. Everywhere I've looked and found discussions on bias nodes, they are included in all the hidden layers, but I've never seen a discussion of what would happen if they weren't there. Again, I can intuitively see that they will provide the same flexibility as they do in the input nodes: move the hyperplane away from the origin. Is it possible, however, that without a bias node in the hidden layer, the transform that occurs in the hidden layer will merely need to adjust itself through the input-hidden layer weights? Does a bias node in the hidden layer merely allow us to find "a" solution rather than "a specific" solution?

Hmmmm.....

-Kirk
