# request HLSL support for sqrt() for integers

This topic is 675 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

The problem is that float operations are not deterministic on GPU, so sqrt() should support ints for people who want determinism.

I created my own sqrt() for ints. It's actually a function to get the length of a 2D uint vector, but it has the limitation of only getting correct results for components from 0 to 2^15.

```
// Integer length of a 2D uint vector.
// "b" must be the larger of u.x and u.y.
// Only correct while both components stay below 2^15;
// above that, the 32-bit dot product overflows.
uint length2D(uint2 u, uint b)
{
    if (b == 0) return 0; // guard: avoid dividing by zero for the zero vector
    // squared length (dot(u, u))
    uint l = u.x * u.x + u.y * u.y;
    // starting guess for sqrt(l): max + min/2
    b = b + ((u.x + u.y - b) >> 1);
    // refine with three iterations of the Babylonian method
    b = (b + (l / b)) >> 1;
    b = (b + (l / b)) >> 1;
    b = (b + (l / b)) >> 1;
    return b;
}
```


"b" is the biggest value in the "u" vector

Oops, it's actually a request for a faster length() function for ints that supports every vector dimension (2D to 4D) and gives a correct answer for every value from 0 to 2^32.

Edited by lomateron

##### Share on other sites

sqrt is as deterministic as every other instruction.

I'm not sure what gave you the impression otherwise.

##### Share on other sites
Floating point operations surely are not deterministic on GPU, but I'm pretty sure that casting an int to a float, calling sqrt(), then casting back to int (truncate, floor, ceil) will produce deterministic results.

##### Share on other sites

If we're using the usual definition of "determinism" to mean a system that doesn't produce random results (i.e., same input + same set of operations = same output, every time), then I fail to see how any of the normal operations on a GPU can be classified as non-deterministic.

Now, if you're talking about things that are sensitive to timing (like atomic operations or UAV writes) then you can get some non-determinism, but only by virtue of having started operating on a shared resource with many threads. This is the same non-determinism you'd get on any architecture, CPUs included.

##### Share on other sites

Arrrgg, I messed up the title. Please read the whole question: I would like to request that the people who make the intrinsic HLSL functions add a length() function for uint vectors that gets a correct answer for every vector value uint4(0 to 2^32, 0 to 2^32, 0 to 2^32, 0 to 2^32).

##### Share on other sites

Your request doesn't really make sense.

The value of a function that returns '1' for the length of the vector uint2(1,1) doesn't seem very useful.

What do you propose the function does when the length of the vector exceeds 2^32 and can no longer be represented in a 32 bit uint?

##### Share on other sites

So there's the case when the length of the vector is 1, another when it's 0, and some cases when it's bigger than 2^32, but there are a whole lot more cases where it actually makes sense, and those outweigh the others, so it actually makes sense to make that function. It's the same with float operations, isn't it? It's even worse: there are more cases where length() doesn't work with float vectors.

What do I propose when it's bigger than 2^32? Return the same value you get when any number is divided by 0.

Edited by lomateron

##### Share on other sites

Your premise for wanting the function seemed to be based on the idea that doing it in floating point was somehow non-deterministic. Your function is going to give incorrect results every time the length of the vector isn't a whole integer, so why not do it in floating point where you'll get a far closer to correct result?

Bear in mind your method is going to break down with values a lot lower than 2^32. As soon as 'l' exceeds 2^32 your entire function is broken. A vector of uint2(66000, 66000) will produce a completely nonsensical result.
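The wraparound is easy to demonstrate with the same 32-bit arithmetic on the CPU; for components of 66000, even a single square already exceeds 2^32 (the helper names here are my own):

```c
#include <stdint.h>

/* squared length of (x, x), as the 32-bit shader math computes it */
uint32_t dot_wrapped(uint32_t x) { return x * x + x * x; }

/* the true squared length, kept in 64 bits */
uint64_t dot_exact(uint32_t x) { return 2 * (uint64_t)x * x; }
```

For x = 66000 the true squared length is 8,712,000,000, but the 32-bit version wraps to 122,065,408, so the Babylonian iterations converge toward the square root of the wrong number entirely.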

##### Share on other sites

Just tested the HLSL intrinsic length() vs my integer length2D() and length3D(): my functions are deterministic, the HLSL intrinsic length() isn't. Tested on an old ATI vs a new NVIDIA, on HLSL 4. You should test it yourself, and that should be enough reason for the people who want determinism.

Edited by lomateron

##### Share on other sites

> If we're using the usual definition of "determinism" to mean a system that doesn't produce random results (ie, same input + same set of operations = same output. Every time.) then I fail to see how any of the normal operations on a GPU can be classified as non-deterministic.
>
> Now, if you're talking about things that are sensitive to timing (like Atomic operations, UAV writes) then you can get some non-determinism, but only by virtue of having started operating on a shared resource with many threads. This is the same non-determinism you'd get on any architecture, CPUs included.

For two different machines to produce the same output (GPU speaking), they must follow these rules:

1. Exact same GPU chip (not even different revisions).
2. Same drivers (to generate the same ISA).
3. Same version of HLSL compiler (if compiling from source).

Otherwise the result will not be deterministic across machines. This is very different from x86/x64 and ARM CPUs, where the same assembly with the same input will produce the same output even across different Intel & AMD chips, as long as you stay away from some transcendental FPU functions (like acos) and some non-deterministic instructions (RCPPS & RSQRTPS), and ignore certain models with HW bugs (e.g. the FDIV bug).

Edited by Matias Goldberg

##### Share on other sites

> For two different machines to produce the same output (GPU speaking), they must follow these rules:

Therein lies the difference between the two arguments we're making.

Is floating point arithmetic deterministic? Yes.

Can I run the same code across multiple vendors' hardware and expect identical results? No, that's not what I said or claimed.

The fact that the same code can produce different results on different hardware doesn't make "float operations non-deterministic". Saying things like "non-deterministic across machines" is not really a good use of terms. The results are deterministic on machine A and deterministic on machine B, but the fact that you can't predict the output on Machine C doesn't mean "float operations are non-deterministic".

If floating point operations were non-deterministic they'd be a great source of random numbers!

##### Share on other sites
> Your function is going to give incorrect results every time the length of the vector isn't a whole integer, so why not do it in floating point where you'll get a far closer to correct result?

Float operations have a margin of error too. When using uints you just have to change your definition of what 0 to 1 means; for example, 0 to 1 in my world physics engine is 0 to 2^9 uint.

Edited by lomateron

##### Share on other sites
> Your function is going to give incorrect results every time the length of the vector isn't a whole integer, so why not do it in floating point where you'll get a far closer to correct result?
>
> The same thing happens with float operations, when using uints you just have to change your definition of what 0 to 1 means, for example 0 to 1 in my world physics engine is 0 to 2^9 uint.

If large errors are less important to you than predictable results across different hardware then by all means implement your length functions however you want, but don't expect that they'll get added to HLSL.

##### Share on other sites

> Can I run the same code across multiple vendors' hardware and expect identical results? No, that's not what I said or claimed.

It's what the topic is about though...

> The fact that the same code can produce different results on different hardware doesn't make "float operations non-deterministic".

It's pretty common to call game code that produces different results on different machines "non-deterministic". It's also common for games requiring determinism to jump through a lot of hoops when it comes to floating point precision and machine implementation differences, such as using fixed-point and implementing all your math (including sqrt/length calculations) using integer instructions...
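For reference, the standard fixed-point trick here is a digit-by-digit (bit-by-bit) integer square root, which uses only integer shifts, adds, and compares and therefore gives bit-identical results on any hardware. A sketch in C; with a 64-bit input it covers the full-range uint2 dot product the original poster wants:

```c
#include <stdint.h>

/* Floor square root of a 64-bit value by the classic
   digit-by-digit method: only shifts, adds, and compares,
   so it is bit-exact on every machine. */
uint32_t isqrt64(uint64_t n)
{
    uint64_t r = 0;                  /* result built one bit at a time */
    uint64_t bit = 1ull << 62;       /* highest even power of two */

    while (bit > n) bit >>= 2;       /* start below the top set bit pair */
    while (bit != 0) {
        if (n >= r + bit) {          /* this bit belongs in the root */
            n -= r + bit;
            r = (r >> 1) + bit;
        } else {
            r >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)r;
}
```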

##### Share on other sites

> If large errors are less important to you than predictable results across different hardware

Wait, what?

uint operations are more accurate than float operations. As I said, you can change the definition of 0 to 1 to your own taste, so I can make (0 to 1) = (0 to 2^25), and this will make it 2 times more accurate than a (0 to 1) float.

Edited by lomateron

##### Share on other sites

> > If large errors are less important to you than predictable results across different hardware
>
> wait what?
>
> uint operations are more accurate than float operations, as I said you can change the definition of 0 to 1 at your own taste, so I can make (0 to 1) = (0 to 2^25) and this will make it 2 times more accurate than (0 to 1) float

Let me try and understand your system then:

You said you mapped "0-1" in your world to 0 - 2^9 uint.

So if you want the length of the floating point vector float2(1 / 512, 1 / 512) you would instead call your integer length2D function with uint2(1,1)?

The truly accurate length of float2(1 / 512, 1 / 512) is 0.002762136.

Your length2D function, when called with uint2(1,1), will return 1. 1 in your "0 - 2^9" = "0 - 1" system is equivalent to 0.001953125.
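The decimals above check out in C: the exact length is sqrt(2)/512 and the fixed-point answer maps back to 1/512, a relative error of about 29% (exactly 1 - 1/sqrt(2), the worst case for truncating the length of a diagonal unit-step vector):

```c
#include <math.h>

/* exact length of (1/512, 1/512) */
double exact_length(void)  { return sqrt(2.0) / 512.0; }  /* ~0.002762136 */

/* what the integer result 1 means in the 0..2^9 == 0..1 mapping */
double mapped_result(void) { return 1.0 / 512.0; }        /* 0.001953125 */
```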

##### Share on other sites

> wait what?
>
> uint operations are more accurate than float operations, as I said you can change the definition of 0 to 1 at your own taste, so I can make (0 to 1) = (0 to 2^25) and this will make it 2 times more accurate than (0 to 1) float

As I already pointed out, since you're calculating (x * x) + (y * y) you can't guarantee avoiding overflow unless you ensure x and y don't exceed 2^15.

(2^25 * 2^25) + (2^25 * 2^25) = 2^51.

##### Share on other sites

> For two different machines to produce the same output (GPU speaking), they must follow these rules:
>
> Therein lies the difference between the two arguments we're making.
>
> Is floating point arithmetic deterministic? Yes.
>
> Can I run the same code across multiple vendors' hardware and expect identical results? No, that's not what I said or claimed.
>
> The fact that the same code can produce different results on different hardware doesn't make "float operations non-deterministic". The results are deterministic on machine A and deterministic on machine B, but the fact that you can't predict the output on Machine C doesn't mean "float operations are non-deterministic".
>
> If floating point operations were non-deterministic they'd be great source of random numbers!

You are, strictly speaking, correct. But something as simple as a driver update or a DirectX runtime update can cause your shaders to return a different output, which makes this kind of determinism useless for almost all practical use cases.

##### Share on other sites

> You are, strictly speaking, correct. But considering that something as simple as a driver update or a DirectX runtime update can cause your shaders to return a different output, makes this kind of determinism useless to almost all practical use cases.

Being able to normalize/transform/manipulate vectors and have the result be the same from one frame to the next is critically important to 3D graphics unless you're content for the entire screen to jitter and wobble uncontrollably.

##### Share on other sites

> As I already pointed out, since you're calculating (x * x) + (y * y) you can't guarantee avoiding overflow unless you ensure X and Y don't exceed 2^15.

HLSL trigonometric functions have some magic going on behind them; that's why I want them to make a length() for uints. They could make something that lets the function calculate all vectors that have a length less than 2^32.

Edited by lomateron

##### Share on other sites

> > As I already pointed out, since you're calculating (x * x) + (y * y) you can't guarantee avoiding overflow unless you ensure X and Y don't exceed 2^15.
>
> HLSL trigonometric functions have some magic going on behind it, that's why I want them to do a length() for uints, they could make something that will let the function calculate all vectors that have a length less than 2^32

Magic? What magic does it perform?

You realise "length" in HLSL is implemented as sqrt(x * x + y * y)?

##### Share on other sites

HLSL trigonometric functions are sin(), cos(), tan(), etc...

##### Share on other sites

My point is that you're not really asking for "length" for integers to be added to HLSL, but rather new instructions to be added to future GPU hardware that perform some sort of "magic", as you put it.

For reference, your implementation of length2D is approximately 13 times slower than floating point length2D on AMD hardware, so I hope you don't call it too often.

##### Share on other sites

> My point is that you're not really asking for "length" for integers to be added to HLSL, but rather new instructions to be added to future GPU hardware that perform some sort of "magic", as you put it.

The "magic" is standard fixed-point stuff.

That said, HLSL isn't likely to add any new intrinsics for instructions that don't exist on any GPUs. You'd be better off petitioning NVidia/AMD with a good argument for why they should spend transistor-space on a fixed-point instruction set.

However... even if you could convince AMD/NVidia to implement fixed-point sqrt, they'd probably each implement it using different levels of approximation, so you'd be left in the same position as with floats!! :lol:

Edited by Hodgman

##### Share on other sites

Since you want the same results for sqrt on different GPUs, I assume that what you need this for is not directly related to graphics, because if it were, a small error would most likely not matter.

If that is the case, you should consider OpenCL or CUDA where you can enforce IEEE 754 floating point compliance on recent GPUs.

##### Share on other sites
