Trouble with sqrt & template functions

Started by
25 comments, last by jmau0438 14 years, 7 months ago
I don't think so, but someone more knowledgeable than I might beg to differ. As far as I'm aware, an extra semi-colon will result in an empty statement in the assembled code, that the compiler will probably optimise away for you.
Advertisement
Important tip: the comma operator is a subtle beast. Don't try to be clever, you'll only get burned.

Just write it on separate lines. Or use std::vector<>. Why aren't you using std::vector<>?
delete [] X_M;X_M = 0;
I actually am using vectors. What I had posted was kind of like a draft. I took in almost all the advise of the fine programmers at game dev. For those who are curious of my final solution, I suppose I'll post it here:

template < typename TYPE > TYPE STD_DEVIATION ( const vector < TYPE > Array ) {

//The sum of all values in the sample/array, or the sum of X
TYPE SampleSum = 0;
//The squared sum of all values in the sample/array, or the sum of (X)2
TYPE SampleSumSquared = 0;
//Constant value representing the number of (actively used) elements in the vector.
const int Length = Array.size();

//Compute & store the sum of the samples' raw & squared values
for ( int intLoop = 0; intLoop < Length; intLoop++ ) {
SampleSum += Array [ intLoop ];
SampleSumSquared += ( Array [ intLoop ] * Array [ intLoop ] );
};

//Return the population standard deviation
return sqrt ( ( SampleSumSquared - ( SampleSum * SampleSum ) /
Length ) / ( Length - 1 ) );
};

What do you fellas think?
Looks good. And yeah, as long as you need the for-loop anyway for the sum of squares (it could be done with std::accumulate, std::transform and a couple other pieces, I think, but it really doesn't do a very good job of looking any simpler :) ), might as well use it for the sum of values too. ;)

Only things I would change:

- Pass the vector in by reference, and don't call it 'Array' when it isn't one. :) (Actually, instead of trying to describe what kind of container it is, better to say a little about what its contents are. ;) )

- Iterate using an iterator, and save the value in a temporary:

// BTW, a loop iteration variable doesn't benefit from that 'int' prefix // any more than anything else. :)typedef vector<TYPE>::const_iterator const_iterator;for (const_iterator it = data.begin(); it != data.end(); ++it) {  const TYPE& value = *it;  SampleSum += value;  SampleSumSquared += (value * value);} // no semicolon needed here


- It really doesn't need that much commenting ;)
I used accumulate in the function, and I gotta say it made it way more readable, check it out:

template < typename TYPE > TYPE STD_DEVIATION ( const vector < TYPE > &Vec ) {

//The sum of all values in the sample/array, or the sum of X
TYPE SampleSum = accumulate ( Vec.begin(), Vec.end(), 0 );
//The squared sum of all values in the sample/array, or the sum of (X)2
TYPE SampleSumSquared = accumulate ( Vec.begin(), Vec.end(), 0, AccumulateSquaredSum < TYPE > );
//Constant value representing the number of (actively used) elements in the vector.
const int Length = Vec.size();

//Return the population standard deviation
return sqrt ( ( SampleSumSquared - ( SampleSum * SampleSum ) / Length ) / ( Length - 1 ) );

};

Yea I kept the comments, but I can't tell you how many times I wrote something nice and just used it for like a year, and when I wanted to modify it I ended up spending more time than I would have liked understanding what it was that I did. At least this way I won't forget, and explaining it helps me to better understand what it was that I did. I know we've probably beat this whole thing to death, but what do ya think?
I think you should drop the all-caps, they are hard to read and they are universally understood to stand for constant values, not types or function names.

I generally wouldn't mark a variable like "Length" const. While technically correct it can add unnecessary mental processing because most programmers associate "const" with pointers or reference types (and constant values of course). This is a minor point, a more important one is consistency. If Length is constant - why not SampleSum or SampleSumSquared?

Quote:
Yea I kept the comments, but I can't tell you how many times I wrote something nice and just used it for like a year, and when I wanted to modify it I ended up spending more time than I would have liked understanding what it was that I did. At least this way I won't forget, and explaining it helps me to better understand what it was that I did. I know we've probably beat this whole thing to death, but what do ya think?

The *intent* behind your is fine, but they are a little verbose. The don't actually give any more information than the variable names (which are well chosen here) and algorithms do. This is what we call self documenting code, to anyone familiar with the standard algorithms the code is clear.

I would me more concerned with documenting the rather obscure looking formula used in the return value than the variables being input into it. Even including a link you can refer to later would be a good idea.

Also, I would be tempted to break it over a few lines, like Zahlman did, or introduce a few temporaries. Large expressions are difficult for the mind to parse - make it easier.
Quote:Original post by Zahlman
So we end up with the (sum of x2) - 2N x02 + N x02, or just (sum of x2) - N x02. And since (sum of x) == (N x0), again, we can simplify as (sum of x2) - ((sum of x)2 / N). :)


Unfortunately, floating point arithmetic doesn't follow all the same laws as ideal mathematics. This transformation is quite literally a textbook example of a numerically unstable algorithm (ex: pg 19 of "Scientific Computing An Introduction Survey" by Heath). Taking the difference of two sums that can be very close to one another can result in what is called catastrophic cancellation and can cause standard deviation calculations orders of magnitude different than the actual value and, on occasion, taking the square root of a negative number.

Wikipedia lists a few more stable algorithms for calculating variance.
And here I was doing it specifically to avoid round-off error by delaying the division when the template type is integral... :P

Anyway, as a rule of thumb, if it isn't readable without the comments, it isn't really readable with them, either. Comments aren't there to enhance readability so much as to provide direction to the person reading the code.

That doesn't always mean you have to rewrite it again. It sometimes means you have to trust your ability to understand (and write) code a little more. :)
Thanks for your posting guys. Again, I guess I'll have to answer in parts:

Zahlman & Rip-Off - Comments

In terms of commenting, I'm not trying to improve readability so much as to leave behind bread-crumbs for later. As Zahlman said, "Comments aren't there to enhance readability so much as to provide direction to the person reading the code." This has kind of been my aim with comments, to leave behind direction. Maybe I'm not hitting that up as well as I could, so I'll work on it. Again to quote Zahlman, "It sometimes means you have to trust your ability to understand (and write) code a little more.", and maybe trusting myself in this respect is what I'm really having a problem with.

Rip-Off - Constant Consistency and the Return Value

I agree on your point of consistency, so the "SampleSum" & "SampleSumSquared" are now constant. Although I understand what you mean in how a constant cast can "add unnecessary mental processing because most programmers associate "const" with pointers or reference types (and constant values of course)", I've chosen to retain the constant cast as identifying them as simply constants isn't too far of a strech in this case. Your point on further commenting on the return value is also a good one, and I'll make changes to comment after this post.

SiCrane - Statistics & Negative Numbers

What you say in regards to negative numbers in this formula is absolutely true. However, I have found in most (keyword is most here) statistical calculations when you get a negative result you should check your math before anything else. In fact, in this very discussion thread I had a problem were I was performing a squared root of a value when I should have just squared it, which yielded a negative number just prior to the final algorithm. In this case, I got C/C++'s NaN value (-1.#IND), which did indeed suck. However, it was my math, not the algorithm, which got the better of me. Your point is still valid, but in this case (that being statistical math) I'd prefer it broke here as it will indicate to me that my math is wrong somewhere. Man, I should write an exception for just that! Awesome!
Quote:Original post by jmau0438
Your point is still valid, but in this case (that being statistical math) I'd prefer it broke here as it will indicate to me that my math is wrong somewhere.

That makes no sense. If it breaks here it doesn't indicate anything about your math anywhere else. It only indicates that you used a numerically unstable algorithm here.

This topic is closed to new replies.

Advertisement