Floating point calculations on different phones

6 comments, last by Pink Horror 8 years, 1 month ago

I have a strange bug and I can't find where it comes from or what causes it. Are there any specs that describe floating point calculations, and when NaNs can occur, on different CPUs (mainly in Sony and Samsung phones)? I have issues on a Samsung Grand Prime, whereas on a Sony Xperia J I have none.


Usually CPUs target an IEEE spec, but the manufacturer docs are authoritative if they're available. If they're not then I think experimentation is the only reliable path. You can make a tester that attempts several different types of calculations and captures the results in a short log with a CPU identifier, then emails it to you. Pass that around to various users to gather a database and then post it somewhere (like GDN) so that people don't have to mess with it. You can probably find willing testers with a wide variety of devices at XDA.
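A rough sketch of what such a tester could look like in C++. The function names, the device identifier string and the choice of operations are all made up for illustration; the idea is just to log exact bit patterns so that results from two devices can be compared without rounding getting in the way. It assumes a 32-bit IEEE 754 float.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Return the raw bit pattern of a float as 8 hex digits, so results from two
// devices can be compared exactly rather than through rounded decimal text.
std::string floatBits(float f) {
	std::uint32_t u;
	static_assert(sizeof u == sizeof f, "float is assumed to be 32 bits");
	std::memcpy(&u, &f, sizeof u);
	char buf[9];
	std::snprintf(buf, sizeof buf, "%08x", static_cast<unsigned>(u));
	return buf;
}

// Hypothetical probe: run a few operations and build one log line per result.
// 'volatile' keeps the compiler from folding the calculations at build time.
std::string runFloatProbe(const std::string& deviceId) {
	volatile float zero = 0.0f, minusOne = -1.0f, x = 0.996562f;
	std::string log = "device: " + deviceId + "\n";
	log += "0/0      " + floatBits(zero / zero) + "\n";
	log += "sqrt(-1) " + floatBits(std::sqrt(minusOne)) + "\n";
	log += "log10(x) " + floatBits(std::log10(x)) + "\n";
	log += "acos(x)  " + floatBits(std::acos(x)) + "\n";
	return log;
}
```

The returned string can be written to the device log or mailed back; how the device identifier is obtained is platform specific and left out here.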

If you just need to transfer float values between platforms that are handling them differently you may want to resort to multiplying by a precision factor, truncating to an integer type and then reversing that process on the other end of the connection.
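A minimal sketch of that idea, assuming a precision factor of 1000 (three decimal places) and values that fit comfortably in a 32-bit integer after scaling. The names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Assumed precision factor: 1000 keeps three decimal places.
constexpr float kPrecision = 1000.0f;

// Sender side: scale and truncate to an integer, which is transferred
// identically on every platform. The caller must make sure the value is
// finite and that value * kPrecision fits in an int32.
std::int32_t encodeFixed(float value) {
	return static_cast<std::int32_t>(std::trunc(value * kPrecision));
}

// Receiver side: reverse the scaling.
float decodeFixed(std::int32_t fixed) {
	return static_cast<float>(fixed) / kPrecision;
}
```

The round trip loses anything below the chosen precision, so the factor has to be picked to match what the simulation actually needs.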


It looks like the Samsung Galaxy Grand Prime has an ARM Cortex-A53 CPU and the Sony Xperia J has an ARM Cortex-A5. So hunting on the ARM site is your best bet, if you can find the spec there. From what I can see, the Cortex-A5 has an 'optional Cortex-A5 FPU' and supports ANSI/IEEE Std 754-1985 floating point. The A53 has an FPU and supports IEEE 754-2008, and I guess it's newer and better. The different ways of running the calculations might give different results.

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0041c/Ciheagah.html
http://hugofeng.info/2014/04/25/float-operations-on-arm/


A much more likely source of your NaNs is some calculation that relies on a metric taken from the device, such as screen size or density.

Maybe indirectly, through sensors or touch input.

Also, maybe things happen in different orders, and a value that is initialized on one device isn't on the other.

I'd check those for 0/0 divisions and square roots of negative numbers and other such things before starting to dig into the IEEE specs.

Operations that might generate NaN are listed here: https://en.wikipedia.org/wiki/NaN
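As a sketch of what guarding against two of those cases could look like: acos of a value that rounding has pushed slightly outside [-1, 1], and normalizing a zero-length vector (a 0/0 division). The Vec3 type and both helper names are made up for illustration and are not taken from any code in this thread.

```cpp
#include <algorithm>
#include <cmath>

// acos is only defined on [-1, 1]; a dot product of two "unit" vectors can
// land slightly outside that range through rounding, which yields NaN.
// Clamping the input first avoids that.
float safeAcos(float cosine) {
	return std::acos(std::min(1.0f, std::max(-1.0f, cosine)));
}

// Hypothetical minimal vector type.
struct Vec3 { float x, y, z; };

// Normalizing a zero-length vector divides 0 by 0; return a zero vector
// instead. A NaN length (from NaN components) is treated the same way.
Vec3 safeNormalize(Vec3 v) {
	float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
	if (len <= 0.0f || std::isnan(len)) return Vec3{0.0f, 0.0f, 0.0f};
	return Vec3{v.x / len, v.y / len, v.z / len};
}
```

Logging std::isnan() on intermediate values right after each suspicious operation is usually the quickest way to find which one produces the first NaN.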

Found where the problem lies:


output log:
/**************************************************
vel: 1.4984 m/s
surf_choordlen: 1.78408
Vfi: X 0.0102074 Y 0.00227052 Z 0.999945
ACTUAL_FRAME[i].surface: 4.11654
CFRn: 4134.78 <-- here
skin_percentage_coeff: 0.320026
skin: X 27775.8 Y 6178.39 Z 2.72098e+06
***************************************************/

code:



for (int i=0; i < submerged_tri_count; i++)
if (ACTUAL_FRAME[i].surface > 0.1) //do not count anything that is less than 10 cm^2
{
	vec3 element_form_drag = vec3(0.0, 0.0, 0.0);
	vec3 element_skin_drag = vec3(0.0, 0.0, 0.0);
	vec3 cog2pC = vectorAB(pos + ROTATION_MAT * CENTER_OF_GRAVITY, ACTUAL_FRAME[i].pC);
	vec3 local_vel = vel + AngVel * cog2pC;
	vec3 vn = Normalize(local_vel);
	float vv = VectorLength(local_vel);
	float v2 = vv; //speed, kept for logging
	vv = vv * vv; //speed squared
	float form_percentage_coeff = absnf( acos(dot(ACTUAL_FRAME[i].normal, vn)) / float(pi) );
	if (dot(ACTUAL_FRAME[i].normal, vn) <= 0.0) form_percentage_coeff = form_percentage_coeff*0.8; //suction force
	float skin_percentage_coeff = absnf(1.0 - form_percentage_coeff); //due to float inaccuracy
	element_form_drag = (-vn) * (form_percentage_coeff * 999.1026 * 0.5 * vv * ACTUAL_FRAME[i].surface); //form drag
	ALOG("vel: "+FloatToStr(v2) +" m/s");

	/*
	 * This is a bit complicated: we project the velocity vector onto the tested surface,
	 * check the chord length, calculate the Reynolds number, etc.
	 */

	//get chord length
	float surf_choordlen = getTriangleChoordLength(
		-local_vel, ACTUAL_FRAME[i].normal, ACTUAL_FRAME[i].pC,
		ACTUAL_FRAME[i].V[0], ACTUAL_FRAME[i].V[1], ACTUAL_FRAME[i].V[2], true, ACTUAL_FRAME[i].pC);
	ALOG("surf_choordlen: "+FloatToStr(surf_choordlen));
	if (surf_choordlen > 20.0) surf_choordlen = 3.0;

	//calc Reynolds number
	float Rn = ReynoldsNum(VectorLength(local_vel), surf_choordlen, 0.894);
	if (Rn > 2.0)
	{
		float lg = log10(Rn - 2.0);

		float CFRn = 0.075 / (lg * lg);

		vec3 Vfi = Normalize( ProjectVectorOnPlane(ACTUAL_FRAME[i].normal, -local_vel) );
		ALOG("Vfi: "+POINT_TO_TEXT(Vfi));
		ALOG("ACTUAL_FRAME[i].surface: " + FloatToStr(ACTUAL_FRAME[i].surface));
		ALOG("CFRn: " + FloatToStr(CFRn));
		ALOG("skin_percentage_coeff: " + FloatToStr(skin_percentage_coeff));
		element_skin_drag = Vfi * (0.5 * 999.1026 * CFRn * ACTUAL_FRAME[i].surface * skin_percentage_coeff);
		ALOG("skin: "+POINT_TO_TEXT(element_skin_drag));
	}

Now, this part of the code gives wrong results on the Samsung:


float Rn = ReynoldsNum(VectorLength(local_vel), surf_choordlen, 0.894);
if (Rn > 2.0)
{
	float lg = log10(Rn - 2.0);

	float CFRn = 0.075 / (lg * lg);
}



where:

inline float ReynoldsNum(float V, float choord_len, float kinematic_viscosity_of_fluid)
{
	return (V*choord_len) / kinematic_viscosity_of_fluid;
}

:/

Transcendentals (like log10) are weird - they have very different behavior on various platforms for handling different edge cases. What's the actual value you are passing to log10 in the error case?



Btw if you can get the specific hardware model(s) that are misbehaving, you can look up what CPUs they have and find out more detail on this.
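To make the edge cases concrete, here is a small sketch of the expression from the posted snippet wrapped in a function (the wrapper name is mine). On an IEEE 754 platform, log10(1) is exactly 0, so an input of Rn = 3 divides by zero and gives infinity, and inputs close to 3 give very large finite values. log10(0) is -infinity and log10 of a negative number is NaN, although the posted code's `Rn > 2.0` check keeps those two out.

```cpp
#include <cmath>

// Same expression as in the posted snippet: 0.075 / log10(Rn - 2)^2.
float cfrnAsPosted(float rn) {
	float lg = std::log10(rn - 2.0f);
	return 0.075f / (lg * lg);
}
```

For example, cfrnAsPosted(3.0f) is +infinity, cfrnAsPosted(2.0f) is 0 and cfrnAsPosted(1.0f) is NaN, so logging the exact argument passed to log10 on both devices is the first thing to compare.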


I'll post the expected case in a spoiler:
[spoiler]


vel: 1.4984 m/s
surf_choordlen: 1.78408
Vfi: X 0.0102074 Y 0.00227052 Z 0.999945
ACTUAL_FRAME[i].surface: 4.11654
CFRn: 4134.78 <-- here
skin_percentage_coeff: 0.320026
skin: X 27775.8 Y 6178.39 Z 2.72098e+06


float Rn = (1.4984 * 1.78408) / 0.894; // -> 2.9902298344519015659955257270694
if (Rn > 2.0)
{
	// Rn - 2.0 in that case should be: 0.9902298344519015659955257270694
	float lg = log10(Rn - 2.0);
	// lg * lg => 0.98055512503864038156439399626653

	float CFRn = 0.075 / (lg * lg); // => 0.075 / 0.98055 = 0.07648728570670058156645529789237
	// but it's 4134.78...
}
[/spoiler]
CFRn should be 0.07648728570670058156645529789237 and I get 4134.78.
Since I don't have any logging of Rn - 2.0, I need to rerun the app and find another error:
vel: 1.73881 m/s
surf_choordlen: 1.54067
Rn - 2.0 = 0.996562 <- this is what goes into the log10 function
Vfi: X 0.956411 Y 0.0546405 Z -0.286865 <- doesn't seem to be normalized (even after being passed through normalization)
ACTUAL_FRAME.surface: 26.2391
CFRn: 33526.4 where


float lg = log10(0.996562); => according to Google's calculator, log10(0.996562) is -0.00149567697,
so lg * lg is 0.0000022370495985883809

float CFRn = 0.075 / (lg * lg); => 0.075 / 0.0000022370495985883809 = 33526.301807222498769557812760537

skin_percentage_coeff: 0.998636
skin: X 4.19728e+08 Y 2.39794e+07 Z -1.25893e+08
So after that I'm confused: in the first example (in the spoiler) it shows the wrong thing, but now it shows that it's properly calculated, and my equation for skin drag dies there. Still, I shouldn't get such big coefficients; they should stay somewhere in the range of, I don't know, 0 to 1.2.
Additionally, when I pass the same numbers and check that on all phones:
float lg = log10(0.996562);
float CFRn = 0.075 / (lg * lg);
ALOG("SKIN COEFF: "+FloatToStr(CFRn));
I get the same result. But on the Samsung I still get wrong results.
On the Sony the whole sim is stable, on the Samsung it is not. I am running the same simulation on both devices, so I am stepping each physics iteration by 0.033 ms.
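A quick way to cross-check the two logged cases is to evaluate the expression both ways, as in the sketch below (function names are mine, and it uses double so the hand calculations can be compared directly). With the log10 step, Rn - 2.0 = 0.99023 gives roughly 4125, which is close to the logged 4134.78; the small gap is presumably down to the logged inputs being rounded. The 0.0765 figure from the spoiler only comes out if the log10 step is skipped and Rn - 2.0 is squared directly. The two logged Vfi vectors also both have a length of about 1.

```cpp
#include <cmath>

// What the posted code computes: 0.075 / log10(Rn - 2)^2.
double withLog(double rnMinusTwo) {
	double lg = std::log10(rnMinusTwo);
	return 0.075 / (lg * lg);
}

// What the hand calculation in the spoiler computed: Rn - 2 was squared
// directly, without taking log10 first.
double withoutLog(double rnMinusTwo) {
	return 0.075 / (rnMinusTwo * rnMinusTwo);
}

// Euclidean length of a 3D vector, for checking the logged Vfi values.
double length3(double x, double y, double z) {
	return std::sqrt(x * x + y * y + z * z);
}
```

So on these numbers the Samsung output matches what the formula gives; the huge coefficients come from log10 of a value close to 1 being close to 0.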

I don't know anything about the physics involved here, but log10(Rn - 2.0) looks incredibly strange. I just can't think of many situations where I would want to use the logarithm of a number, but subtract 2 first. The fact that it's a log10 makes me think that this is a parameter that tends to differ by orders of magnitude - it's 10, or 100, or 1000. So, the fact you're starting with a number close to 3 is already fishy, and subtracting 2 from it is more fishy.

I tried to look up the source of that equation, and it looks like it should be log10(Rn) - 2.0. I'm still also skeptical about your Rn. A sample graph I found comparing this and other methods started out with a value of 10,000 on the left side of the graph. I really don't think this model is meant to apply to small Reynolds numbers, whatever that number even means (sorry I do not know the physics at all).
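For reference, a sketch of the formula with the subtraction moved outside the logarithm, which as far as I can tell is the ITTC 1957 friction line, Cf = 0.075 / (log10(Rn) - 2)^2. The function names are mine. The viscosity used in the usage note (0.894e-6 m^2/s, roughly water near 25 °C in SI units) is my assumption; the posted code passes 0.894, which may be missing the 1e-6 factor.

```cpp
#include <cmath>

// Reynolds number: speed times characteristic length over kinematic viscosity.
double reynoldsNum(double v, double chordLen, double kinematicViscosity) {
	return (v * chordLen) / kinematicViscosity;
}

// Friction coefficient with the subtraction outside the logarithm:
// Cf = 0.075 / (log10(Rn) - 2)^2.
// Only meaningful for large Rn; at Rn = 100 the denominator is zero.
double skinFrictionCoeff(double rn) {
	double d = std::log10(rn) - 2.0;
	return 0.075 / (d * d);
}
```

With the first logged case (1.4984 m/s, chord 1.78408) and the assumed viscosity, Rn comes out around 3 million and the coefficient around 0.0037, which is at least in the range the original poster expected.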

Also, when I see something like this, I just don't know what to think:


if (surf_choordlen > 20.0) surf_choordlen = 3.0;

I know we're dealing with a number less than 20 here, but why clamp down to 3 when this is above 20, while leaving numbers in the 3 to 20 range alone? It's just a sign to me that says, don't trust any of the math here. The last thing I'd think about, when seeing this, is processor specifications. I understand the inclination - I used to work somewhere where they blamed every screwy animation bug on floating point issues.

I think things will work better on all platforms if the math made more sense.

Edit: I tried to edit the extra "I" out of "A sample I graph I found" three times, and it didn't take. What gives?
Edit2: I fixed that, but I cannot delete these remarks down here. I'm going to stop now.

