Is there a trick to use signed floats with integer atomic min/max?
Usually i multiply the float value by a large number and round to integer,
but if all values are positive i could just use the float bits as they are and integer min/max works properly without a need to convert from float to int.
Of course my values are positive and negative.
Is there any trick i miss here?
However, you can also implement reduce operations without atomics. The Thrust library, for instance, gives you reductions: http://docs.nvidia.com/cuda/thrust/#reductions
This problem also applies to counting sort algorithm, so a search for how to radix sort floats will find example solutions.
unsigned flip_negatives(unsigned x) {
return x ^ ((-(x >> 31)) & 0x7fffffffu);
}
That's what i've expected, thanks guys!
Here the code again also with the back conversion:
static inline uint32_t F2U(uint32_t f)
{
uint32_t mask = -int32_t(f >> 31) | 0x80000000;
return f ^ mask;
}
static inline uint32_t U2F(uint32_t f)
{
uint32_t mask = ((f >> 31) - 1) | 0x80000000;
return f ^ mask;
}
I'll try the scan too but on recent GPUs atomics in LDS seem always faster than a scan for min/max.
Here's one way to do the conversion Hodgman talks about (which I botched badly in my first attempt):
unsigned flip_negatives(unsigned x) { return x ^ ((-(x >> 31)) & 0x7fffffffu); }
Spotted a difference and your code is probably still buggy.
It has a single order error when going from negative to positive, with this test code:
(ignoring conversiation errors because i don't know your bock conversion)
I'm too lazy to figure out how it works exactly rigth now, so i'm just 99% sure :D
struct stuff
{
static inline uint32_t F2U(uint32_t f)
{
uint32_t mask = -int32_t(f >> 31) | 0x80000000;
return f ^ mask;
//return f ^ ((-(f >> 31)) & 0x7fffffffu); // <- does not work
}
static inline uint32_t U2F(uint32_t f)
{
uint32_t mask = ((f >> 31) - 1) | 0x80000000;
return f ^ mask;
}
};
static int k=1;
if(k)
{
k = 0;
SystemTools::Log("FLOAT MAX TEST\n");
uint32_t old = 0;
for (float f=-50; f<50; f+=0.1f)
{
uint32_t i = (uint32_t&) f;
i = stuff::F2U(i);
if (old > i) SystemTools::Log("ORDER ERROR i %f\n", f);
old = i;
uint32_t back = stuff::U2F(i);
if ( ((float&)back) != f) SystemTools::Log("CONVERSION ERROR i %f\n", f);
}
SystemTools::Log("\n");
}
Here's one way to do the conversion Hodgman talks about (which I botched badly in my first attempt):
unsigned flip_negatives(unsigned x) { return x ^ ((-(x >> 31)) & 0x7fffffffu); }
Spotted a difference and your code is probably still buggy.
It has a single order error when going from negative to positive, with this test code:
(ignoring conversiation errors because i don't know your bock conversion)
No, my code does something slightly different than the one you are using, but it's correct. My code requires interpreting the result as a signed integer before you compare. You can think of the one you found as adding 2^31 to every number so the comparisons can be done as unsigned integers. There is very little difference between both options.
We can rewrite this even simpler with signed numbers:
int32_t flip_negatives(int32_t f)
{
return f ^ ((f >> 31) & 0x7fffffff); // if negative flip all bits except the sign to flip the direction numbers increase
}
The back conversation is just the same function and does not need a addition, so your version is clearly better.Also, after looking closely it boils down to the exact thing i've had in mind before starting the thread - but i did not dare to try due to floating point paranoia :D
All those conversions are neat and all, but are they actually the right solution to the REAL underlying problem?
I want to calculate a bounding box form points given by float numbers on GPU, but it has atomic min/max operations only for integers.
Yes, converting to integer work is faster for that part of building a bounding box. And doing it as an integer allows using the atomic operation that works in place. However, non-atomic min/max is supported just fine if you are modifying a buffer instead of a single value, and doesn't suffer the performance penalty of the atomic ordering. Yes you will need a slightly different algorithm to gather the results, but that is quite a collection of bitwise operations and they are not free either. Even with those integer-based bitwise operations you've still got the waiting for atomic operations as you slowly build your bounding box.
Secondly, if you're looking at bounding boxes of objects that can be rendered, why are you doing it that way? If they were small and trivial you wouldn't need to be asking, so we can assume their bigger data sets. In that case, there are alternative methods to calculate bounding boxes on the GPU, like this one.