I'm having problems with the resolution of floating-point calculations and am unsure how I should handle them. Here's my problem and my (failed) solution:
(NOTE: Either my browser or the forum software is not displaying the PLUS sign, even though I'm writing it. In my code below, I will write out "-PLUS-" instead of using the addition sign, to make it clear)
The equations of motion of a game object are governed by the following equation (movement as a function of the parameter r, which increases a very small amount every loop):
[source="cpp"]
float b = 26.087315;
float b2 = 680.54797;
float Rs = 10.;
float position = 0;
float EPSILON = .00001;
// Movement loop
while (r is still valid)
{
// r is the dependent variable, b and Rs are constants
float val = 1. - b2 / (r * r) * (1. - Rs / r);
float movement = b / (r * r * sqrt(val);
position = position -PLUS- movement;
r = r -PLUS- EPSILON;
}
The equation (1. - (b2 / r^2) * (1. - Rs / r)) blows up to infinity, depending on the values of b2 and Rs. In this case, the critical value is 15.840763. I have code that solves the above equation to find the critical point, so this is not a problem - I simply start r at a value above this critical value (15.840763) and do not allow r to fall low enough to be equal to or below this value.
Above the critical value, the function (1. - (b2 / r^2) * (1. - Rs / r)) is smooth and continuous, increasing in value from 0 to 1 as r approaches infinity.
However, certain values above the critical value will still result in a divide-by-zero error, as val will equal 0. I reasoned that this was because of the imprecision of floating-point numbers - too close to the critical value and val is evaluated to such a small number that it rounds to 0.
So I first ran the following code to determine the lowest possible value of r (while still remaining above my critical value) such that val is still evaluted to be greater than 0:
float b2 = 680.54797;
float Rs = 10.;
float critical_r = 15.840763; // Calculated earlier in the code
float low_limit = critical_r; // The value for which we know any r
// equal to or below will return a 0
// upon evaluation of "val"
float high_limit = 20.; // A value chosen because I -KNOW- this to be valid
// from earlier code. Any r value equal to or greater
// than high_limit should be guaranteed to return a
// number greater than 0 upon evaluation of "val"
float min_r = high_limit; // min_r will be the smallest possible r above
// critical_r that we know will not provide a 0
// for the evaluation of "val"
while(1)
{
// Test a value midway between the low limit and high limit
float test_r = low_limit -PLUS- 0.5 * (high_limit - low_limit);
if (test_r - low_limit <= 0 || high_limit - test_r <= 0)
{
// We've reached the floating point resolution limit, where we
// can't fine-tune the value of min_r anymore
break;
}
float val = 1. - b2 / (test_r * test_r) * (1. - Rs / test_r);
if (val <= 0)
{
// Mathematically, we know val should be greater than 0.
// However, due to floating point precision limits, we have
// calculated it to be equal to or less than 0, so we know
// min_r has to be greater than our current value of test_r
low_limit = test_r;
}
else
{
// We've shown that evaluating "val" with r = test_r, we get
// a greater-than-zero value. Thus, our value of test_r is
// currently the lowest possible value of r that we know will
// give a valid value for "val," despite floating point
// precision errors
high_limit = test_r;
min_r = test_r;
}
}
// Since our tolerance limits our resolution to this difference at the most
// sensitive point anyway, let's use it as a minimum cap always
EPSILON = min_r - critical_r;
So my code attempts to narrow down on the smallest possible value of r that is above critical_r (15.840763) AND also yields a greater-than-zero value for "val." I then forbid my movement code from ever allowing r to go below this value (min_r).
Unfortunately, I've found that while min_r yields a greater-than-zero value for "val," there are other values of r that are GREATER than min_r which will return 0 for "val."
For example, running with the above code and constants (b2 = 680.54797, Rs = 10., critical_r calculated to be 15.840763), I find:
min_r = 15.840767
When I evaluate "val" at min_r:
val = 1. - b2 / (min_r * min_r) * (1. - Rs / min_r) = 5.9604645e-008;
Interestingly, if I evaluate val at r = 15.840769 (which is greater than min_r):
val = 1. - b2 / (15.840769 * 15.840769) * (1. - Rs / 15.840769) = 5.9604645e-008;
So val has the same value, even though r is now slightly higher. Fine, I can chalk this up to imprecision in the floating-point calculation.
However, if I evaluate val at r = 15.840771 (which is greater than both min_r and the last test value of r, 15.840769):
val = 1. - b2 / (15.840771 * 15.840771) * (1. - Rs / 15.840771) = 0;
I expect val to be equal to previous values, if not larger (as my r value is larger). However, not only is it smaller, it's actually 0 - which will lead to a divide-by-zero error! Mathematically, this makes no sense - I conclude it must be due to floating-point resolution and precision errors!
Clearly, my algorithm for determining the smallest possible r value to use (while still remaining valid given the limitations of the floating-point calculating abilities of my machine) is not completely effective.
How should I approach and solve this problem? How can I effectively determine the values of r I can use in my equations above, (given a value for b2 and Rs)?