Sign in to follow this  

Evaluate my float comparison strategy, plz

This topic is 4725 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I have been fighting for quite some time with epsilons, and I have been hating the ambiguity of them - one kind of epsilon would have no impact on numbers too large, while having an enormous impact on smaller numbers. What I have tried, is to scale my epsilon to match the numbers under comparison. This is what I came up with:
						// 0.0000001% within each other
#define EPSILON_DEFINER				10000000.0f;

inline bool FloatCmp(float a_value1, float a_value2)
{
	// sort the values
	SortValues(&a_value1, &a_value2);
	// then see if the biggest value is within decent range of the lowest
	if(fabs(a_value2) > fabs(a_value1)*10.0f)
		return false;

	float l_EPSILON;

	// guard against division with 0
	if(a_value1 == 0.0f)
		l_EPSILON = 0.0000001;
	else
		// ok, they are in the same range, create a decent EPSILON
		l_EPSILON = fabs(a_value1) / EPSILON_DEFINER;

	return (a_value2 - a_value1 < l_EPSILON);
}


template<class Item>
void SortValues(Item* a_value1, Item* a_value2)
{
	if(*a_value2 < *a_value1)
	{
		Item l_valuebuffer = *a_value1;
		*a_value1 = *a_value2;
		*a_value2 = l_valuebuffer;
	}
}







Edit: This is an altered version compared to the original post, the first was flawed, but this should work. If needed, EPSILON_DEFINER can be set to something else, if you want to have a hit on numbers further from each other - or if you want even more precision when 64-bit floats come along. Another way could be to make a global, that you allow the program to initialize upon startup... Ok, this is what I do: I start by sorting my input values (a_value2 will be the biggest). Then I check if they are within decent range of each other (I multiply smallest number by 10 - if large number is still larger, then they are obviously not equal. If they are in the same range, then I create an EPSILON based on their size, before I make the standard test (notice, I know that a_value2 is the largest, since I sorted it before, therefore I don't need to fabs the subtraction). Do you see any flaws? And how do you evaluate the costs? We are talking 3 comparisons (one inside SortValues()), one multiplication, one division, one subtraction. Too expensive? (Notice that SortValues() is declared inline in its header file, so there is no overhead to the function call). [Edited by - Mercenarey on January 1, 2005 11:24:46 AM]

Share this post


Link to post
Share on other sites
First I have to say, that I'm not sure what you exactly want. It seems to me like you want a sort of generic float comparison function, but why?

If you are looking if the temperature on two locations are the same, then you might want to call 0.0 and 0.2 equal. In other situations, like when 0.0 and 0.2 would indicate Amperes of current for example, you would certainly not call them equal. The truth is, that the epsilon is different in every situation, and that you should not look at the size of the floating point numbers.

If I am wrong and you're using it for some higher purpose, have a look at this:

http://www.gamedev.net/community/forums/topic.asp?topic_id=291163

As you can see, floating point variables cosist out of an exponent and a mantissa. The best thing you can do is:
1. Compare the exponents. If they are 2 or more a part then there's at least a factor 2 difference, and you would call them different (so false), else:
2. Compare the mantissae. Look at their difference, if it is bigger than some integer 'strictness' value (between 0 and 2^23), than they are not equal, else:
3. They are 'equal'

Good luck with whatever you're doing ;)
Bas

[Edited by - basananas on January 1, 2005 1:04:18 PM]

Share this post


Link to post
Share on other sites
helpful part: It just doesn't work. FloatCmp(-10,-10) = false;

I wrote that code

inline bool FloatCmp(float a, float b){
// keep your code clean. Long nondescriptive parameter names
// do not improve readability.
// Prefixes in pure functions do not improve readability.
if(abs(a-b)<abs_tolerance) return true;
if(a>0){
if(b>a*(1+rel_tolerance) || a>b*(1+rel_tolerance)) return false;
}else{
if(b<a*(1+rel_tolerance) || a<b*(1+rel_tolerance)) return false;
}
return true;
}





but i suggest you not to use such things.

As about performance, you can first check if difference of exponents of floats <= 1 , and then do rest, it should accelerate somewhat.

rant part:
I see flaw in entire idea to make universal "FloatCmp". There is no single reliable way to "compare floats for equality". Every "method" will fail in some uses. With bigger numbers you typically get bigger error, but how much it grow depends to what calculations you do. Usually you get linear dependence, but in some cases you can get quadratic. It is much better if every time you have need to compare floats, you think what maximal tolerance is acceptable _for software user_ in that specific case. For example, you need to check if point is on line. It is better to check if point is inside some reasonable cone, (for example, cone that corresponds to pixel size) and if it fails, you need bigger precision than floats...

In most of my programs, i compare floats with certain tolerance that depends to specific case. For example, i do not need big precision for pixel color values.

Float comparasion is very "innatural" operation, you should be able to avoid it. Usually you don't really need to compare floats but need to compare for specific well-known range. If you don't know range, it might be a sign that you are doing something conceptually wrong.

edit: as about exponents, it's almost as basananas said, except that you can't just compare matissas.
Example:
1.999999 and 2.0000001 have different exponents and therefore matissas is very different too. First is something like 1.1111111...0...*2^0 , second is like 0.000000...1...*2^1 , so matissas differ. Branching is costly, so handling such cases specially is not really much faster.

edit: and strictly speaking, float epsilon is more or less well-defined thing, and is used in iterative algorithms to stop when precision is good enough and will not improve anymore.

Share this post


Link to post
Share on other sites
Dmytry:

Ya, you are right, it fails with minus-numbers. I didn't test it properly. I will work on that :)


Ok, thats a minor detail, however.


The reason why I want a generic epsilon to compare with, is to be able to set size of a world dynamically. Of course I could set the epsilon when I set the world parameters...

Basananas:
I will take your advice and look into exponent/mantissa.
But how do I get access directly to the mantissa and exponent values?

[Edited by - Mercenarey on January 1, 2005 10:23:51 AM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by basananas
1. Compare the exponents. If they don't match then there's at least a factor 2 difference, and you would call the different, else:

Not true, the difference can be much smaller than a factor 2. For instance, 1.0 and 0.99999999 have different exponents.

Share this post


Link to post
Share on other sites
Why not take the easy approach and write a function that evaluates a tolerance?
bool equal(float a, float b, float tol=.000001)
{ return (abs(a-b)<tol)?true:false }
Seems to me to be a heck of a lot easier than messing around with float internals, and achieves the same purpose/

Share this post


Link to post
Share on other sites
Quote:
Original post by thedustbustr
Why not take the easy approach and write a function that evaluates a tolerance?
bool equal(float a, float b, float tol=.000001)
{ return (abs(a-b)<tol)?true:false }
Seems to me to be a heck of a lot easier than messing around with float internals, and achieves the same purpose/

read posts above, esp. my post and code.
At bigger numbers you need bigger "tol"

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
Quote:
Original post by basananas
1. Compare the exponents. If they don't match then there's at least a factor 2 difference, and you would call the different, else:

Not true, the difference can be much smaller than a factor 2. For instance, 1.0 and 0.99999999 have different exponents.


Yeah, that's true. You'll have to look for a difference of at least 2, and if the difference is smaller, you'll have to look at the mantissa of both of the values.

I've written an example for you (Mercenarey that is :)) Use it, tear it apart and do whatever you want to do with it.


#include <iostream>
using namespace std;

#define STRICTNESS 1.2 //The strictness function check returns true if:
// (A/B) < STRICTNESS, in which A is larger than B
//and A and B are the two compared floating points
//STRICTNESS may be any value between 1.0 and 2.0

int check(float * a, float * b);

float a1 = 5;
float a2 = 4.6;

int main() {

cout << check (&a1, &a2);

return 0;
}

//Pre: both are of the same sign (either both positive or both negative)
//Post: true if max(a,b)/min(a,b) < strictness, false otherwise
int check(float * a, float * b) {
unsigned int toIntA = * ((unsigned int *) a);
unsigned int toIntB = * ((unsigned int *) b);

//Check if the floats had the same sign
if (toIntA & 0x80000000) { //float a was negative (bit 0 = sign bit)
if (!(toIntB & 0x80000000)) return 0; //Different signs, return false
}

//Calculate exponents and mantissae of a and b
unsigned int expA = ((toIntA & 0x7F800000) >> 23) - 127; //See the forum thread I was talking about...
unsigned int mantissaA = 0x800000 | (toIntA & 0x007FFFFF); //idem dito, or ask on the forum if you don't understand it

unsigned int expB = ((toIntB & 0x7F800000) >> 23) - 127;
unsigned int mantissaB = 0x800000 | (toIntB & 0x007FFFFF);

if (expB - expA > 1 || expA - expB > 1) return 0; //Considering that their exponents are at least 2 apart
//expA/expB or expB/expA must be greater than 2, thus out of range
if (expA > expB) {
mantissaA = mantissaA << 1; //expA is still greater than expB, so shift it's mantissa right
} else {
if (expB > expA) {
//Shift mantissaB right and swap it with mantissaA to make sure that A > B
unsigned int swap = mantissaA; mantissaA = mantissaB << 1; mantissaB = swap;
} else {
//Equal exponents
if (mantissaB > mantissaA) {
//swap the mantissae so that A > B
unsigned int swap = mantissaA; mantissaA = mantissaB; mantissaB = swap;
}
}
}

//Now just look if mantissaA / mantissaB smaller than the STRICTNESS value:
if ((double)mantissaA / (double)mantissaB < STRICTNESS) return 1;

return 0;
}






Good luck,
Bas

Share this post


Link to post
Share on other sites
thedustbustr:

Try this:
float aa = 10000000.0f;
float bb = 10000001.0f;

float cc = fabs(aa - bb);

cc will 1, even though the difference between the numbers is extremely small (percentage-wise), and your small epsilon will detect them as being different, even if you might wish to evaluate them to being equal.

Similarly, if you compare these:

float aa = 0.000000000000f;
float bb = 0.000000000001f;

float cc = fabs(aa - bb);

cc will be extremely small, and you need an even smaller epsilon to detect a difference. Your simple epsilon of 0.000001f will have no chance to see a difference - it will first see a difference when the numbers are a million times bigger!
That is the epsilon problem - you can not use the same epsilon for all magnitudes. And that is what I try to solve with my code.

For some numbers it will be far too small, for others it will be far too big.

basananas:
Thanx alot! Those ways to evaluate mantissa/exponent was just what I was looking for :)

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
While basananas approach might work, I don't see why you would want to complicate things like that. Just try something like

return fabs(a - b) <= epsilon * max(fabs(a), fabs(b))

or

return fabs(a - b) <= epsilon * (fabs(a) + fabs(b))

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
While basananas approach might work, I don't see why you would want to complicate things like that. Just try something like

return fabs(a - b) <= epsilon * max(fabs(a), fabs(b))

or

return fabs(a - b) <= epsilon * (fabs(a) + fabs(b))


It's kind of amazing how most of code posted in this thread fail to work with negative numbers.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by Dmytry
It's kind of amazing how most of code posted in this thread fail to work with negative numbers.

What do you mean? Why wouldn't it work with negative numbers?

Share this post


Link to post
Share on other sites
Easy way to test equality
(its been a while since i've been coding in c++, so bear with me)

int32 *a;
int32 *b

*a = &floata
*b = &gloatb

if !(a ^ b) {
// Floats are equal, exactly
} else {
// Floats are not equal
}


Easy way.

hard way


int32 *a
int32 *b
bool sa
bool sb
int32 fraca
int32 fracb
int32 manta
int32 mantb

*a = &floata
*b = &floatb

Sa = a & b10000... you get the piecture
Sb = b & b100000 same here

Fraca = a & b011111111
fracb = b & b011111111

Manta = a & b0000000001111111111111111111111
Mantb = b & b0000000001111111111111111111111



This seperates the sign bits, fractions and mantissas from the floats.

You first check wiether the sign bits are the same, if there not then they arn't equal (i know this is a generalisation, but for most numbers that you'll be using, it'll hold true).

You can then use the mantissa to do the first equality test, ie, they should be resimably similar. just suptract them, & out the sign bit, and check if its < 2 or 3.

you then find out the difference between the two fractions, (just subract them, & out the sign bit, and see if there close enough).

And if it takes all that, then actually calculate both as integers.

You then get both ints, take the difference between them (& out the sign bit again), and divide that by the sum of the floats.

You then check weither this error is small enough to be ok.

Quote:
Original post by Dmytry
Branching is costly, so handling such cases specially is not really much faster.


Actually brancing is quite fast nowdays. maybe on an 8080 it may have been very costly, but now, its really cheap. (read up on branch prediction)


HTH
From,
Nice coder

Share this post


Link to post
Share on other sites
DmyTry:
Lol, good comment about the negatives :)
I corrected my code, it should work with negatives now.

You can cut/paste from the original post, if you wanna try. Sorry for the inconvenience :)

Nice Coder:
if !(a ^ b) {...

What does "^" mean?

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
Quote:
Original post by Dmytry
Branching is costly, so handling such cases specially is not really much faster.


Actually brancing is quite fast nowdays. maybe on an 8080 it may have been very costly, but now, its really cheap. (read up on branch prediction)

Actually, on 8080 branching costed much less than on P4 if you count cost in additions (that is, how many additions spend same time. Also, on P4 branching is more costly than on P3, etc. Read about pipelines. "branching prediction" doesn't always work, it often works with 50/50 chances.

Share this post


Link to post
Share on other sites
AP: first code that uses "max" will not work. Second will work. In this thread, probability to get working code = 50/50 [grin]

Quote:
Original post by Mercenarey
DmyTry:
Lol, good comment about the negatives :)
I corrected my code, it should work with negatives now.

You can cut/paste from the original post, if you wanna try. Sorry for the inconvenience :)

I don't cut-n-paste, i just see mistakes.

FloatCmp(-0.000000001,0.0000000001);

You misundestood me. "but i suggest you not to use such things." does not mean you need to write your own "universal comparison routine" that fail. You just need to drop that idea about universal float comparison routine. I doubt you have good reasons to do that comparison. If you still do not want to drop, better use my code (with apporiate abs_tolerance and rel_tolerance), it is really much faster, clealer, safer, etc. (ote that you should avoid division as it is slow.)

Your code still sucks, sorry.

Note that with negatives, if you call FloatCmp(-1,-100);
after sorting, -1 is the largest,
if(fabs(a_value2) > fabs(a_value1)*10.0f)return false;
just serves no purprose.
Note that
FloatCmp(-0.00000001,0); is true but
FloatCmp( 0.00000001,0);
and
FloatCmp(-0.00000001,-0.0000000001);
is false.

edit: sorry, i'm bit mistaken myself by double-wrongness, so first isn't true.

So, good luck with it....

[Edited by - Dmytry on January 2, 2005 3:34:25 AM]

Share this post


Link to post
Share on other sites
Quote:
Original post by Mercenarey
DmyTry:
Lol, good comment about the negatives :)
I corrected my code, it should work with negatives now.

You can cut/paste from the original post, if you wanna try. Sorry for the inconvenience :)

Nice Coder:
if !(a ^ b) {...

What does "^" mean?


Exclusive or

I do !(a ^ b) {

Once i've casted the floats to integers, becasue
for any binary number n ^ n = 0.
So, if both numbers are binarily equal (all the bits are equal), then when there xored together, they evaluate to 0.

I then add the !, so that the 0 from the xor, (which is taken as false), gets negitated to 1, (which is TRUE (actual names, defined soemwhere)).

Just doing my job. I'm not called Nice coder for nothing you know...

From,
Nice coder

Share this post


Link to post
Share on other sites
Quote:
Original post by Dmytry
Quote:
Original post by Nice Coder
Quote:
Original post by Dmytry
Branching is costly, so handling such cases specially is not really much faster.


Actually brancing is quite fast nowdays. maybe on an 8080 it may have been very costly, but now, its really cheap. (read up on branch prediction)

Actually, on 8080 branching costed much less than on P4 if you count cost in additions (that is, how many additions spend same time. Also, on P4 branching is more costly than on P3, etc. Read about pipelines. "branching prediction" doesn't always work, it often works with 50/50 chances.


I've already read about branch prediction, and its quite good.

Although the pipeline gets in the way (a lot), conditional jumps are pretty fast. It just depends on how you use them.

From,
Nice coder

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by Dmytry
AP: first code that uses "max" will not work. Second will work. In this thread, probability to get working code = 50/50 [grin]

Still can't see why it wouldn't work. Can you explain?

Share this post


Link to post
Share on other sites
Quote:
Original post by Anonymous Poster
Quote:
Original post by Dmytry
AP: first code that uses "max" will not work. Second will work. In this thread, probability to get working code = 50/50 [grin]

Still can't see why it wouldn't work. Can you explain?

ops, sorry man, somehow haven't noticed fabs(a) and fabs(b) in max.(and would bet 'em wasn't there[grin] ) Will remember not to post into math & physics at first day of new year... :(

Share this post


Link to post
Share on other sites
DmyTry:
Yes, I know that my code sucks. I didn't test it well enough before writing it here.
But it still works with all those examples, that you brought up, where you say it would fail.

I see other problems with this approach, however: When comparing very small numbers, the epsilon will always be alot smaller, and therefore make small numbers always be not equal (unless they are 100% equal).
And that is a problem, considering, that often you will be interested in having small numbers be equal...

What I asked for was an evaluation of this approach, and we got that. A debate about different problems and possible solutions can't be bad - it can only clarify. And our discussion have made me see alot of problems with my approach.

But since we have already established that my code has weaknesses, why can't we just forget it, and look at the concept? There is no reason to keep pointing out that the code sucks - especially not since this is still an interesting conceptual discussion.

--------------

I already have a FloatCmp with a provided epsilon, why do you want me to write that one more time? The idea was, that I didn't have to worry about epsilons and tolerances. What you suggest would just put me back on that track again.

Share this post


Link to post
Share on other sites
In code i posted, you can set
abs_tolerance=0.0000001;
and
rel_tolerance=0.0000001;
and it will work correctly with small numbers too.

Actually, i'm looking at that from physics point of view. Values are almost never exactly equal, but you can consider them to be equal if them is closer than error of instruments. "Close enough" is usually meant to be difference of values, as is or in logarithmic scale. First corresponds to abs_tolerance , second to rel_tolerance. If you *might* have negative numbers, relative tolerance is usually meaningless.

That is, if you measure themperatures in Kelvin degrees , rel_tolerance is meaningfull , error between 0.9K versus 1K and 9K versus 10K is similar.
If in Celsius or Farenheit, not so.

Share this post


Link to post
Share on other sites
Quote:
Original post by Nice Coder
I've already read about branch prediction, and its quite good.

Although the pipeline gets in the way (a lot), conditional jumps are pretty fast. It just depends on how you use them.


Branch prediction is good when the condition is predictable. People are talking about branching on relative size of exponents on floats in a low-level library. That sounds like russian roulette to me. Conditional moves are fast though, and can be used for the min/max stuff.

Share this post


Link to post
Share on other sites
@Mercenarey,

I'm going to defend my simplistic approach.
Quote:
bool equal(float a, float b, float tol=.000001)
{ return (fabs(a-b)<tol)?true:false }
(assuming fabs() is the floating point equivalent of abs())


As Dmytry said, trying to compare to within a certain percent is futile, it depends on the situation. You should know at call time to what tolerance you need them compared, so you pass it to the function (0.000001f is just a default tolerance). There is no way to determine an appropriate tolerance given only the two values: you need to have knowledge of the application. Thus, the programmer must pass in an appropriate tolerance. Your battle to calculate an ideal epsilon from the input values is pointless. KISS (Keep It Simple Stupid)

This code works with negatives.

[edit]
Quote:
I already have a FloatCmp with a provided epsilon, why do you want me to write that one more time? The idea was, that I didn't have to worry about epsilons and tolerances. What you suggest would just put me back on that track again.
Unfortunately, any solution which abstracts away the epsilon/tolerance is never going to be ideal for all circumstances.

[edit2]You could try converting your inputs to fundamental units (thus losing any 10^3's etc in your input) (think in scientific notation here), making your epsilon the desired significant figures times 10 raised to the larger input exponent - 1. eg

float a = 1.000*10^6;
float b = 9.999*10^5;
const float epsilon = .0001*10^(6-1) //.0001 represents 4 sig figs
//epsilon = .0001*10^5 = 10 (which is reasonable, considering
//we are dealing with quantities in the millions- this might
//be too accurate)

.0001 is an arbitrary constant of how tolerant your comparison will be- how many digits are significant in your comparison.

You should be able to pull the exponent from the way floats are stored (see erlier discussion in this thread)

I haven't thought this approach all the way through, and it will by no means be perfect- problems with not inputting your own epsilon will still arise.

[Edited by - thedustbustr on January 2, 2005 4:36:28 PM]

Share this post


Link to post
Share on other sites
easy way


float fa
float fb
int ia
int ib

ia = fabs(fa)
ib = fabs(fb)

if ((abs((ia - ib)) / (ia + ib)) < epsilon) {
//The numbers are equal
}



Now thats the easy way, the fast ways going to be harder.

From,
Nice coder

Share this post


Link to post
Share on other sites

This topic is 4725 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this