
Double to float C++


Oh no! Please use proper bit manipulation, no strings of ones and zeroes. I beg you…


It’s only for proof-of-concept; a base reference. Once that’s working, then comes optimization, right? : )

Yes, but that's not optimization. It's ‘turning an eyesore of a clumsy and hard-to-read workaround into generic code’.

The only proper zero in that string is the string terminator. Which gives me a grain of uncertainty and doubt, besides the desire to vomit all over the place. :D

But seriously, as far as i can read it, all it does is set the n rightmost bits to zero, so it's exactly what i proposed yesterday? Are you sure it makes a difference?

As you can tell, I’m not a bit twiddler. There are people who are better than me at it…. Such as yourself.

Yeah, maybe because that's the first thing the C64 manual taught me. It's still super useful, mostly to pack multiple uints or bools into one.
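For anyone who hasn't done that kind of packing before, a minimal sketch (the field widths are made up purely for illustration):

#include <cstdint>
#include <cassert>

int main()
{
	// Pack a 10-bit x, a 10-bit y and a bool flag into one uint32_t.
	uint32_t x = 513, y = 300;
	bool flag = true;

	uint32_t packed = (x & 0x3FFu) | ((y & 0x3FFu) << 10) | (uint32_t(flag) << 20);

	// Unpack by shifting back down and masking off the other fields.
	uint32_t ux = packed & 0x3FFu;
	uint32_t uy = (packed >> 10) & 0x3FFu;
	bool uflag = ((packed >> 20) & 1u) != 0;

	assert(ux == x && uy == y && uflag == flag);
	return 0;
}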

Your code should give the same result as this:

double value = PI;
uint64_t bits = (uint64_t&) value;
bits = bits & 0b1111111111111111111111111111111110000000000000000000000000000000ull;

double truncated = (double&) bits;

Clearing the least 31 bits, if i got the counting right.
Binary numbers are quite intuitive to use for bit math, sometimes. : )
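Side note: the reference casts work in practice on the usual compilers, but if the aliasing rules worry anyone, the same truncation can be written with std::memcpy (or C++20's std::bit_cast). A minimal sketch:

#include <cstdint>
#include <cstring>

// Same truncation as above, but punning through std::memcpy,
// which stays well defined regardless of aliasing rules.
double truncate_low_bits(double value)
{
	uint64_t bits;
	std::memcpy(&bits, &value, sizeof bits);   // double -> uint64_t
	bits &= ~((uint64_t(1) << 31) - 1);        // clear the 31 least significant bits
	std::memcpy(&value, &bits, sizeof value);  // uint64_t -> double
	return value;
}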

I tried this, and it's super close to perfection, except in some cases:

#include <iostream>
#include <iomanip>
#include <string>
#include <bitset>
using namespace std;


void get_double_bit_string(double d, string& s)
{
	s = "";

	for (int i = 63; i >= 0; i--)
		s += to_string((reinterpret_cast<uint64_t&>(d) >> i) & 1);
}


double truncate_normalized_double(double d)
{
	//return static_cast<double>(static_cast<float>(d));

	double value = d;
	uint64_t bits = (uint64_t&)value;
	bits = bits & 0b1111111111111111111111111111111110000000000000000000000000000000ull; // clear the 31 least significant bits

	double truncated = (double&)bits;

	string sd = "";
	get_double_bit_string(truncated, sd);
	cout << sd << endl;

	//std::bitset<64> Bitset64(sd);

	//uint64_t value = Bitset64.to_ullong();

	//double dv = reinterpret_cast<double&>(value);
	//string sdv = "";
	//get_truncated_bit_string(dv, sdv);
	//cout << sdv << endl;

	double df = static_cast<double>(static_cast<float>(d));
	string sdf = "";
	get_double_bit_string(df, sdf);
	cout << sdf << endl;

	return truncated;
}	



int main(void)
{
	cout << setprecision(20) << endl;

	for(double d = 0.0; d < 1.0; d += 0.1)
		cout << truncate_normalized_double(d) << endl << endl;

	return 0;
}

The results are:

0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0

0011111110111001100110011001100110000000000000000000000000000000
0011111110111001100110011001100110100000000000000000000000000000
0.099999994039535522461

0011111111001001100110011001100110000000000000000000000000000000
0011111111001001100110011001100110100000000000000000000000000000
0.19999998807907104492

0011111111010011001100110011001100000000000000000000000000000000
0011111111010011001100110011001101000000000000000000000000000000
0.29999995231628417969

0011111111011001100110011001100110000000000000000000000000000000
0011111111011001100110011001100110100000000000000000000000000000
0.39999997615814208984

0011111111100000000000000000000000000000000000000000000000000000
0011111111100000000000000000000000000000000000000000000000000000
0.5

0011111111100011001100110011001100000000000000000000000000000000
0011111111100011001100110011001101000000000000000000000000000000
0.59999990463256835938

0011111111100110011001100110011000000000000000000000000000000000
0011111111100110011001100110011001100000000000000000000000000000
0.69999980926513671875

0011111111101001100110011001100110000000000000000000000000000000
0011111111101001100110011001100110100000000000000000000000000000
0.79999995231628417969

0011111111101100110011001100110010000000000000000000000000000000
0011111111101100110011001100110011000000000000000000000000000000
0.89999985694885253906

0011111111101111111111111111111110000000000000000000000000000000
0011111111110000000000000000000000000000000000000000000000000000
0.99999976158142089844

Seems you need 2 bits more, so

bits = bits & 0b1111111111111111111111111111111110000000000000000000000000000000ull;

should become:

bits = bits & 0b1111111111111111111111111111111111100000000000000000000000000000ull;

Only the last output does not fit the pattern, but it should work.
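Counting it out: a float stores 23 mantissa bits, so matching float precision means clearing the 52 − 23 = 29 low bits, and the mask keeps 1 sign + 11 exponent + 23 mantissa = 35 bits. A sketch that builds the mask from the number of bits to drop instead of spelling out the literal (the function name and default parameter are just illustrative):

#include <cstdint>
#include <cstring>

// Keep only the top `keep_mantissa_bits` of the 52 stored mantissa bits.
// keep_mantissa_bits = 23 mimics float's stored mantissa width.
double reduce_mantissa(double value, int keep_mantissa_bits = 23)
{
	const int drop = 52 - keep_mantissa_bits;            // 29 bits for float-like precision
	const uint64_t mask = ~((uint64_t(1) << drop) - 1);  // ones everywhere except the low `drop` bits

	uint64_t bits;
	std::memcpy(&bits, &value, sizeof bits);
	bits &= mask;
	std::memcpy(&value, &bits, sizeof value);
	return value;
}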

I have a memory corruption bug happening during multithreading, so the crash tells me nothing about the cause.
Thus, after fruitless guessing, i decided to use debug mode. Which i usually can't, because it's slow AF.

After more than two hours of execution, a window popped up. I had just come back from eating and saw it.
It said it was an out-of-bounds write to a std::vector. Great, now i only need to click the button and i'll see where it is, i thought.

While my hand was moving towards the mouse, the corruption caused a real crash from another thread. :O

Now i can't get back to the first crash. I'm fucked.

That sucks. Sorry to hear about the issues. :( Bugs suck.

Thank you again for all of your ideas. You're an ideas guy, as well as the coder guy. That's really rare.

Turned out the debugger was at the right place, and i can prevent further crashes now. : ) But figuring out the true origin of the problem will take me some more time… >:(

Not sure if zeroing out bits is an idea. That's just the primary way to reduce precision. Actually i wanted to propose it in my first reply, but there was a more detailed response already, and then confusion slipped in.

Maybe it's not the precision that matters.
Besides that, you always make the value a little bit smaller, since there is no rounding.
I still think you're either missing a term, or the values from the solar system textbook are not accurate enough to serve as ground truth. Maybe the gravity of other planets also affects the results enough to cause the error you see.
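If the one-sided error from truncation matters, one common trick is to add half of the discarded range before masking, which rounds to the nearest reduced value instead of always chopping toward zero magnitude. A sketch only (ties and edge cases like infinity/NaN are not handled, and the names are illustrative):

#include <cstdint>
#include <cstring>

// Rounded precision reduction: add half of the range that will be
// cleared, then mask. A carry out of the mantissa bumps the exponent,
// which is still the correctly rounded result (e.g. 0.999... -> 1.0).
double reduce_mantissa_rounded(double value, int keep_mantissa_bits = 23)
{
	const int drop = 52 - keep_mantissa_bits;
	const uint64_t mask = ~((uint64_t(1) << drop) - 1);

	uint64_t bits;
	std::memcpy(&bits, &value, sizeof bits);
	bits += uint64_t(1) << (drop - 1);   // round half up in magnitude
	bits &= mask;
	std::memcpy(&value, &bits, sizeof value);
	return value;
}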

Btw, how can we even tell something is moving at the speed of light, or not moving at all?
We require some reference, like a global world space. But that does not exist, or does it?
Would the movement of the sun itself also affect your results?

I'm sure that you'll figure it out, after more thought.

Yes, Mach's principle is what you're looking for.

Edit: I'm giving up. Thanks for all of your help!!!

Edit: The code is:

#include <iostream>
#include <iomanip>
#include <string>
#include <bitset>
using namespace std;



void get_double_bit_string(double d, string& s)
{
	s = "";

	for (int i = 63; i >= 0; i--)
		s += to_string((reinterpret_cast<uint64_t&>(d) >> i) & 1);
}


double truncate_normalized_double(double d)
{
	if (d <= 0.0)
		return 0.0;
	else if (d >= 1.0)
		return 1.0;

	//////return static_cast<double>(static_cast<float>(d));

	string s;
	get_double_bit_string(d, s);
	cout << s << endl;

	const int64_t mantissa_size = 52;
	uint64_t max = static_cast<uint64_t>(-1); // 2^64 - 1

	uint64_t bits = reinterpret_cast<uint64_t&>(d);
	bits = bits & 0b1111111111111111111111111111111111100000000000000000000000000000ull; // keep sign + exponent + top 23 mantissa bits
	double reduced = reinterpret_cast<double&>(bits);

	get_double_bit_string(reduced, s);
	cout << s << endl;

	double df = static_cast<double>(static_cast<float>(d));
	string sdf = "";
	get_double_bit_string(df, sdf);
	cout << sdf << endl;

	return reduced;
}


int main(void)
{
	cout << setprecision(30) << endl;

	for(double d = 0; d < 1.0; d += 0.1)
		cout << truncate_normalized_double(d) << endl << endl;


	return 0;
}
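As a sanity check on the masking approach: once only 23 mantissa bits are left, the value should survive a round trip through float unchanged, as long as its exponent is within float range. A minimal self-contained sketch (the helper name is just for illustration):

#include <cassert>
#include <cstdint>
#include <cstring>

// Clear the 29 low mantissa bits, leaving float-like precision.
static double reduce(double value)
{
	uint64_t bits;
	std::memcpy(&bits, &value, sizeof bits);
	bits &= ~((uint64_t(1) << 29) - 1);
	std::memcpy(&value, &bits, sizeof value);
	return value;
}

int main()
{
	for (double d = 0.0; d < 1.0; d += 0.1)
	{
		const double reduced = reduce(d);
		// A double with only 23 mantissa bits and an in-range exponent
		// is exactly representable as a float, so this must hold.
		assert(static_cast<double>(static_cast<float>(reduced)) == reduced);
	}
	return 0;
}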