backprop neural net training trouble

Started by niblick
1 comment, last by niblick 17 years, 9 months ago
Hi. I've been having some trouble with my neural net not being able to train. Here's some info, so hopefully I can get some pointers on where I'm going wrong.

Context: my neural net is a simple one. I am using it to process some vectorised mouse input, and it only has 3 gestures to recognise. To this end it is simply a 24-node input layer, a 6-node hidden layer (to be tweaked) and a 3-node output layer. It uses a sigmoid function for its activation and backprop for the training method.

Issue: I have got the neural net activating and setting up with random weights, but as it trains I notice the output values for the gestures tending towards (1, 1, 1) instead of (1, 0, 0), (0, 1, 0) and (0, 0, 1). This training is way out of whack. I have been debugging it as best I can and have noticed some errors, but fixing them has not got the network working. The mean squared error also tends to 6 and stops there, although it will happily train forever.

Questions: has anyone else had these issues with their neural nets and could tell me where I might look to check them out? I am going to review all of my processing code to ensure the network is set up correctly, but I have been doing this and comparing it to another working net recently, to no avail. *When comparing with the other net, I did discover the only real difference was that I was recalculating my output-layer weights before calculating my hidden-layer error values, whereas they were calculating all the error values and then recalculating the weights. But I don't see how this could cause it to screw up so badly.*

Code samples: here's some code from the net that should be useful in understanding my approach. I will abstract some of it to avoid explaining all the classes I have used, yadda yadda...

Activation function

// Sigmoid activation; dResponse controls the steepness of the curve
double sigmoid (double dInput, double dResponse)   {
   return ( 1 / (1 + exp (-dInput / dResponse) ));
}


Backprop error function

// Error term (delta) for an output node: (expected - observed) times the
// sigmoid derivative expressed in terms of the observed output
double	ANN::BackProp::Output::Error	(double dObserved, double dExpected)   {
   return   ((dExpected - dObserved) * (dObserved * (1 - dObserved)));
}
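
As a side note, O * (1 - O) is the derivative of the sigmoid only when dResponse is 1; with the dResponse parameter from the activation function above, the true derivative picks up a 1/dResponse factor. A small sketch of what that would look like (illustrative names, not the posted classes):

// Derivative of sigmoid (x, dResponse) written in terms of its output o:
// d/dx sigmoid = (1 / dResponse) * o * (1 - o)
double sigmoidDerivative (double dOutput, double dResponse)   {
   return (dOutput * (1 - dOutput)) / dResponse;
}

// Output delta using that derivative; reduces to Error () above when dResponse == 1
double outputDelta (double dObserved, double dExpected, double dResponse)   {
   return (dExpected - dObserved) * sigmoidDerivative (dObserved, dResponse);
}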


Weight change for output layer

// Weight change for a hidden-to-output weight: learning rate * error * node value
// (dLastMomentumChange is accepted but not used, so no momentum term is applied)
double	ANN::BackProp::Output::WeightChange	(double dObserved, double dError, double dLastMomentumChange, double dLearningRate)	{
	return	(dObserved * dError * dLearningRate);
}
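
For comparison, the textbook delta-rule update for a hidden-to-output weight uses the activation of the hidden node on that connection (not the output node's own value), times the output node's error and the learning rate, with an optional momentum term on top. A rough sketch with made-up names, not the posted class:

// Standard delta-rule weight change for one hidden-to-output connection,
// with an optional momentum term based on the previous change
double DeltaRuleWeightChange (double dHiddenActivation, double dError, double dLastWeightChange, double dLearningRate, double dMomentum)	{
	return	(dLearningRate * dError * dHiddenActivation) + (dMomentum * dLastWeightChange);
}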


Training iteration. For each training step we have the code...

bool	TrainIter (std::vector <double> * vecdObserved, std::vector <double> * vecdExpected)	{
	double error = 0.0, sum=0, dWeightAdjust=0;
	int n=0;
	
	//	loop through output layer calculating error
	for (int i=0; i < m_vecnOutLayer.size (); i++)	{
		
		error = Error (vecdObserved->at (i), vecdExpected->at (i));				
		m_vecnOutLayer.at (i).m_dError = error;
		double dTempError = (vecdObserved->at (i) - vecdExpected->at (i));
		m_dSSE += dTempError*dTempError;
		
		// adjust weight of h>o layer weights
		for (int j=0; j < m_vecnHiddenLayer.size (); j++)	{	
			dWeightAdjust = WeightChange (vecdObserved->at (i), m_vecnOutLayer.at (i).m_dError, m_vecnOutLayer.at (i).m_vecdLastWeights.at (j), STD_LEARNINGRATE);
			m_vecnOutLayer.at (i).m_vecdLastWeights.at (j) = dWeightAdjust;
			m_vecnOutLayer.at (i).m_vecdWeights.at (j)	+= dWeightAdjust;
		}
		dWeightAdjust = m_vecnOutLayer.at (i).m_dError * STD_BIAS * STD_LEARNINGRATE ;

		m_vecnOutLayer.at (i).m_dBiasWeight +=	dWeightAdjust;
		m_vecnOutLayer.at (i).m_dBiasLastWeight = dWeightAdjust;
	}
	
	for (int i=0; i < m_vecnHiddenLayer.size (); i++)	{	
		// calc error for hidden layer
		sum = 0.0;
		for (int j=0; j < m_vecnOutLayer.size (); j++)	{
			sum += m_vecnOutLayer.at (j).m_vecdWeights.at (i) * m_vecnOutLayer.at (j).m_dError;
		}
		sum *= (m_vecnHiddenLayer.at (i).m_dValue * (1 - m_vecnHiddenLayer.at (i).m_dValue));
		m_vecnHiddenLayer.at (i).m_dError = sum;

		// adjust weight of i>h layer weights
		for (int j=0; j<vecdObserved->size (); j++)	{
			dWeightAdjust = BackProp::Hidden::WeightChange (vecdObserved->at (j), sum, m_vecnHiddenLayer.at (i).m_vecdLastWeights.at (j));
			m_vecnHiddenLayer.at (i).m_vecdWeights.at (j) += dWeightAdjust;
			m_vecnHiddenLayer.at (i).m_vecdLastWeights.at (j) = dWeightAdjust;
		}
		m_vecnHiddenLayer.at (i).m_dBiasWeight += m_vecnHiddenLayer.at (i).m_dError * STD_BIAS * STD_LEARNINGRATE;
	}
	if (m_dSSE < STD_ERRORTHRESHOLD)	{
		return (true);
	}
	return (false);
}
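
On the ordering point raised above: the usual arrangement is to finish computing every node's error term before applying any weight changes, because the hidden-layer errors are back-propagated through the old output-layer weights. Roughly, reusing the member names from the snippet (weight and bias update loops omitted):

// Pass 1: compute all error terms while the old weights are still in place
for (int i=0; i < m_vecnOutLayer.size (); i++)	{
	m_vecnOutLayer.at (i).m_dError = Error (vecdObserved->at (i), vecdExpected->at (i));
}
for (int i=0; i < m_vecnHiddenLayer.size (); i++)	{
	double sum = 0.0;
	for (int j=0; j < m_vecnOutLayer.size (); j++)	{
		sum += m_vecnOutLayer.at (j).m_vecdWeights.at (i) * m_vecnOutLayer.at (j).m_dError;
	}
	m_vecnHiddenLayer.at (i).m_dError = sum * m_vecnHiddenLayer.at (i).m_dValue * (1 - m_vecnHiddenLayer.at (i).m_dValue);
}

// Pass 2: only now apply the weight and bias changes for both layers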


I was going to abstract the above iteration code, but it is pretty simple. There are two vectors of 'nodes', each containing a list of the weight values for the incoming nodes. Each iteration is called once the NN has been fed and has processed the next of the gestures, in repeating order. Thanks, nib.

Edit: I have looked over the processing function of the NN and it is simply carrying out the neural net: for node X, sum the values of the nodes in the layer below with the appropriate weighting value, add the bias * bias weight, and feed through the activation function. Et voila. Nothing fancy and nothing wrong, which leads me to believe the error is in the area described above.

[Edited by - niblick on July 6, 2006 4:59:07 AM]
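
For reference, the processing step described in that edit would look roughly like this (illustrative names, not the actual members):

// Forward pass for one node: weighted sum of the layer below,
// plus bias * bias weight, fed through the activation function
double ProcessNode (const std::vector <double> & vecdInputs, const std::vector <double> & vecdWeights, double dBias, double dBiasWeight, double dResponse)	{
	double dSum = 0.0;
	for (int i=0; i < vecdInputs.size (); i++)	{
		dSum += vecdInputs.at (i) * vecdWeights.at (i);
	}
	dSum += dBias * dBiasWeight;
	return sigmoid (dSum, dResponse);
}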
Why is your error function so weird? I was expecting to see
return (dExpected - dObserved) * (dExpected - dObserved);

I was basing it off Mat Buckland's from "AI Techniques for Game Programmers".

I have been noticing the error value being WAY out of whack. It gives a reasonable answer when comparing an output of, say, 0.8 to 1.0, but when I compare it with 0.0 it likes to tell me there's less of an error.

I will go and try it using (E-O)^2.

I noticed some noob errors in the bulk of the code, but the error value has been pissing me right off.

Edit: (E-O)^2 doesn't help. It gives more meaningful error values, but that's an error for the entire network rather than an error for the individual node itself.
(E-O)O(1-O) gives a signed error value, but seems to love giving bullshit when I compare a high value with 0. :arg:
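
For what it's worth, the two formulas measure different things: (E-O)^2 summed over the output nodes is the network-level error you watch for convergence, while (E-O)*O*(1-O) is the per-node delta that drives the weight updates. The delta is always scaled down by the derivative factor O(1-O), which is at most 0.25 and shrinks towards zero as O approaches 0 or 1 (the flat ends of the sigmoid), so it will look much smaller than the raw difference E-O. Side by side:

// Per-node delta that drives the weight updates (small near the sigmoid's flat ends)
double dDelta = (dExpected - dObserved) * dObserved * (1 - dObserved);
// Squared error, summed over the output nodes to monitor training progress
double dSqErr = (dExpected - dObserved) * (dExpected - dObserved);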

[Edited by - niblick on July 7, 2006 9:05:07 AM]

