Jump to content
  • Advertisement
Sign in to follow this  
JoeJ

Is there a trick to use signed floats with integer atomic min/max?

This topic is 492 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I want to calculate a bounding box form points given by float numbers on GPU, but it has atomic min/max operations only for integers.
Usually i multiply the float value by a large number and round to integer,
but if all values are positive i could just use the float bits as they are and integer min/max works properly without a need to convert from float to int.
Of course my values are positive and negative.

Is there any trick i miss here?

Share this post


Link to post
Share on other sites
Advertisement
[EDIT: Oh, I deleted the first paragraph because it's probably incorrect. I have to think about it a bit more.]

However, you can also implement reduce operations without atomics. The Thrust library, for instance, gives you reductions: http://docs.nvidia.com/cuda/thrust/#reductions Edited by �lvaro

Share this post


Link to post
Share on other sites
You can treat the bit pattern as a "sign and magnitude" format integer. Convert that into a two's compliment integer, do the atomic stuff, then convert back to sign and magnitude and bitwise cast back to float.

This problem also applies to counting sort algorithm, so a search for how to radix sort floats will find example solutions.

Share this post


Link to post
Share on other sites
Here's one way to do the conversion Hodgman talks about (which I botched badly in my first attempt):
unsigned flip_negatives(unsigned x) {
  return x ^ ((-(x >> 31)) & 0x7fffffffu);
}

Share this post


Link to post
Share on other sites
Yeah, found the same meanwhile with the radix keyword: http://stereopsis.com/radix.html
That's what i've expected, thanks guys!

Here the code again also with the back conversion:
static inline uint32_t F2U(uint32_t f)
{
uint32_t mask = -int32_t(f >> 31) | 0x80000000;
return f ^ mask;
}

static inline uint32_t U2F(uint32_t f)
{
uint32_t mask = ((f >> 31) - 1) | 0x80000000;
return f ^ mask;
}

I'll try the scan too but on recent GPUs atomics in LDS seem always faster than a scan for min/max. Edited by JoeJ

Share this post


Link to post
Share on other sites

Here's one way to do the conversion Hodgman talks about (which I botched badly in my first attempt):

unsigned flip_negatives(unsigned x) {
  return x ^ ((-(x >> 31)) & 0x7fffffffu);
}


Spotted a difference and your code is probably still buggy.
It has a single order error when going from negative to positive, with this test code:
(ignoring conversiation errors because i don't know your bock conversion)

I'm too lazy to figure out how it works exactly rigth now, so i'm just 99% sure :D

	struct stuff
	{
		static inline uint32_t F2U(uint32_t f)
		{
			uint32_t mask = -int32_t(f >> 31) | 0x80000000;
			return f ^ mask;
			//return f ^ ((-(f >> 31)) & 0x7fffffffu); // <- does not work
		}

		static inline uint32_t U2F(uint32_t f)
		{
			uint32_t mask = ((f >> 31) - 1) | 0x80000000;
			return f ^ mask;
		}
	};

	static int k=1;
	if(k)
	{
		k = 0;
		SystemTools::Log("FLOAT MAX TEST\n");
		
		uint32_t old = 0;
		for (float f=-50; f<50; f+=0.1f)
		{
			uint32_t i = (uint32_t&) f;
			i = stuff::F2U(i);
			if (old > i) SystemTools::Log("ORDER ERROR i %f\n", f);
			old = i;

			uint32_t back = stuff::U2F(i);
			if ( ((float&)back) != f) SystemTools::Log("CONVERSION ERROR i %f\n", f); 
		}
		SystemTools::Log("\n");
	}

Share this post


Link to post
Share on other sites

Here's one way to do the conversion Hodgman talks about (which I botched badly in my first attempt):

unsigned flip_negatives(unsigned x) {
  return x ^ ((-(x >> 31)) & 0x7fffffffu);
}


Spotted a difference and your code is probably still buggy.
It has a single order error when going from negative to positive, with this test code:
(ignoring conversiation errors because i don't know your bock conversion)


No, my code does something slightly different than the one you are using, but it's correct. My code requires interpreting the result as a signed integer before you compare. You can think of the one you found as adding 2^31 to every number so the comparisons can be done as unsigned integers. There is very little difference between both options.

Share this post


Link to post
Share on other sites
Ah sorry, but i like your version a lot more.
We can rewrite this even simpler with signed numbers:
 
int32_t flip_negatives(int32_t f)
{
	return f ^ ((f >> 31) & 0x7fffffff); // if negative flip all bits except the sign to flip the direction numbers increase
}
The back conversation is just the same function and does not need a addition, so your version is clearly better.

Also, after looking closely it boils down to the exact thing i've had in mind before starting the thread - but i did not dare to try due to floating point paranoia :D Edited by JoeJ

Share this post


Link to post
Share on other sites

All those conversions are neat and all, but are they actually the right solution to the REAL underlying problem?

I want to calculate a bounding box form points given by float numbers on GPU, but it has atomic min/max operations only for integers.

Yes, converting to integer work is faster for that part of building a bounding box. And doing it as an integer allows using the atomic operation that works in place. However, non-atomic min/max is supported just fine if you are modifying a buffer instead of a single value, and doesn't suffer the performance penalty of the atomic ordering. Yes you will need a slightly different algorithm to gather the results, but that is quite a collection of bitwise operations and they are not free either.  Even with those integer-based bitwise operations you've still got the waiting for atomic operations as you slowly build your bounding box.

Secondly, if you're looking at bounding boxes of objects that can be rendered, why are you doing it that way?  If they were small and trivial you wouldn't need to be asking, so we can assume their bigger data sets.  In that case, there are alternative methods to calculate bounding boxes on the GPU, like this one

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!