
# request HLSL support for sqrt() for integers


## Recommended Posts

The problem is that float operations are not deterministic on GPUs, so sqrt() should support ints for the people who want determinism.

I created my own sqrt() for ints. It's actually a function to get the length of a 2D uint vector, but it has the limitation of only giving correct results for components from 0 to 2^15.

```hlsl
uint length2D(uint2 u, uint b)
{
    // dot(u, u)
    uint l = u.x * u.x + u.y * u.y;
    // initial approximation of sqrt(l)
    b = b + ((u.x + u.y - b) >> 1);
    // refine it with three Babylonian (Newton) iterations
    b = (b + (l / b)) >> 1;
    b = (b + (l / b)) >> 1;
    b = (b + (l / b)) >> 1;
    return b;
}
```


"b" is the biggest value in the "u" vector

Oops, it's actually a request for a faster length() function for ints that supports every vector dimension (2D to 4D) and gives the correct answer for every component from 0 to 2^32.

Edited by lomateron

##### Share on other sites

sqrt is as deterministic as every other instruction.

I'm not sure what gave you the impression otherwise.

##### Share on other sites
Floating-point operations surely are not deterministic on GPUs, but I'm pretty sure that casting an int to a float, calling sqrt(), then casting back to int (truncate, floor, ceil) will produce deterministic results.

##### Share on other sites

If we're using the usual definition of "determinism" to mean a system that doesn't produce random results (ie, same input + same set of operations = same output. Every time.) then I fail to see how any of the normal operations on a GPU can be classified as non-deterministic.

Now, if you're talking about things that are sensitive to timing (like Atomic operations, UAV writes) then you can get some non-determinism, but only by virtue of having started operating on a shared resource with many threads. This is the same non-determinism you'd get on any architecture, CPUs included.

##### Share on other sites

Arrrgg, I messed up the title. Please read the whole question: I would like to request that the people who make the intrinsic HLSL functions add a length() function for uint vectors that gives a correct answer for every vector value uint4(0 to 2^32, 0 to 2^32, 0 to 2^32, 0 to 2^32).

##### Share on other sites

Your request doesn't really make sense.

The value of a function that returns '1' for the length of the vector uint2(1,1) doesn't seem very useful.

What do you propose the function does when the length of the vector exceeds 2^32 and can no longer be represented in a 32 bit uint?

##### Share on other sites

So there is the case when the length of the vector is 1, another when it's 0, and some other cases when it is bigger than 2^32, but there are a whole lot more cases where it actually makes sense, and those outweigh the others. So it actually makes sense to add that function. It's the same with float operations, isn't it? It's even worse: there are more cases where length() doesn't work with float vectors.

What do I propose when it's bigger than 2^32? Return the same value as when any number is divided by 0.

Edited by lomateron

##### Share on other sites

Your premise for wanting the function seemed to be based on the idea that doing it in floating point was somehow non-deterministic. Your function is going to give incorrect results every time the length of the vector isn't a whole integer, so why not do it in floating point, where you'll get a far closer-to-correct result?

Bear in mind your method is going to break down with values a lot lower than 2^32. As soon as 'l' exceeds 2^32 your entire function is broken. A vector of uint2(66000, 66000) will produce a completely nonsensical result.

##### Share on other sites

I just tested the HLSL intrinsic length() against my integer length2D() and length3D(). My functions are deterministic; the HLSL intrinsic length() isn't. Tested on an old ATI card vs a new NVIDIA card, on HLSL 4. You should test it yourself, and that should be reason enough for the people who want determinism.

Edited by lomateron

##### Share on other sites

> If we're using the usual definition of "determinism" to mean a system that doesn't produce random results (ie, same input + same set of operations = same output. Every time.) then I fail to see how any of the normal operations on a GPU can be classified as non-deterministic.
>
> Now, if you're talking about things that are sensitive to timing (like Atomic operations, UAV writes) then you can get some non-determinism, but only by virtue of having started operating on a shared resource with many threads. This is the same non-determinism you'd get on any architecture, CPUs included.

For two different machines to produce the same output (GPU speaking), they must follow these rules:

1. Exact same GPU chip (not even different revisions).
2. Same drivers (to generate the same ISA).
3. Same version of HLSL compiler (if compiling from source).

Otherwise the result will not be deterministic across machines. This is very different from x86/x64 and ARM CPUs, where the same assembly with the same input will produce the same output even across different Intel & AMD chips, as long as you stay away from some transcendental FPU functions (like acos), some non-deterministic instructions (RCPPS & RSQRTPS), and certain models with HW bugs (e.g. the FDIV bug).

Edited by Matias Goldberg