Difference between Texture object Load call with/without offset?

Hey Guys,

 

Is the following code effectively the same:

 

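```
// Variant 1: offset passed through Load's optional offset parameter
uint uSampleDepth = inputTex.Load(i3CurrentUVIdx, int2(i, 0));

// Variant 2: offset added directly to the location
uint uSampleDepth = inputTex.Load(i3CurrentUVIdx + int3(i, 0, 0));
```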

 

They produce exactly the same result, but if you put the first one inside a dynamic loop, the HLSL compiler complains:

warning X3582: texture access must have literal offset and multisample index, forcing loop to unroll...error X3511: unable to unroll loop, loop does not appear to terminate in a timely manner (1024 iterations)
 
I understand you can't put inputTex.Sample(...) inside a dynamic loop, but why does Load with an offset cause the same error?

Also, what's the difference between specifying the offset in the Load call and applying the offset to the first parameter?
 
Thanks


The MSDN documentation says:

 

"An optional offset applied to the texture coordinates before sampling. The offset type is dependent on the texture-object type."

 

 

So if you use an offset of 0, you will get the same result because no offset is applied. Did you try hard-coding the offset to see if the results change?


See the documentation for the ld assembly instruction. For the first example you listed, the offset will be applied using the "address offset by immediate integer", as described in the section called "Address Offset". This offset needs to be a literal: in other words, it needs to be "hard-coded" into the shader program. This is why the compiler is trying to unroll the loop for you, so that it can convert the offset into a hard-coded literal. For the second example you listed, the offset will be applied to the original address using normal arithmetic instructions that store the result in a general-purpose register, which will then be used as the srcAddress parameter for the ld instruction. This works fine in a dynamic loop, since the register value can just be recalculated every loop iteration.

 

Whether or not there's any performance difference between the variations depends on the hardware. Some hardware might natively support immediate offsets, and will therefore avoid the extra arithmetic required to compute the final offset address. Other hardware might not support that at all, and will end up doing the equivalent of your second example when the driver JIT compiles the D3D assembly into the native bytecode for that GPU architecture.
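To make the two cases concrete, here is a minimal sketch of both variants side by side. The constant buffer, `uSampleCount`, `kTaps`, and the function names are hypothetical and only there for illustration; `inputTex` and `i3CurrentUVIdx` follow the original post. Treat it as an assumption-laden example rather than a reference implementation.

```hlsl
Texture2D<uint> inputTex : register(t0);

// Hypothetical constant buffer: the loop bound is only known at run time.
cbuffer Params : register(b0)
{
    uint uSampleCount;
};

// Offset folded into the location. This is plain integer arithmetic on a
// register, so the loop can stay dynamic.
uint SumDynamic(int3 i3CurrentUVIdx)
{
    uint uSum = 0;
    int iCount = (int)uSampleCount;
    [loop]
    for (int i = 0; i < iCount; ++i)
    {
        uSum += inputTex.Load(i3CurrentUVIdx + int3(i, 0, 0));
    }
    return uSum;
}

// Offset passed through the optional parameter. The bound is a small
// compile-time constant (hypothetical kTaps), so the compiler can unroll
// the loop and turn each offset into a literal. Immediate offsets are
// limited to the range [-8, 7].
static const int kTaps = 4;

uint SumUnrolled(int3 i3CurrentUVIdx)
{
    uint uSum = 0;
    [unroll]
    for (int i = 0; i < kTaps; ++i)
    {
        uSum += inputTex.Load(i3CurrentUVIdx, int2(i, 0));
    }
    return uSum;
}

// Example pixel shader entry point (assumes a UINT render target).
uint main(float4 pos : SV_Position) : SV_Target
{
    int3 i3CurrentUVIdx = int3(int2(pos.xy), 0);
    return SumDynamic(i3CurrentUVIdx) + SumUnrolled(i3CurrentUVIdx);
}
```

If the bound in the second function came from `uSampleCount` instead of a constant, the compiler would have no way to produce literal offsets, which is the situation that triggers X3582 and X3511 in the original question.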


Thanks nicolas, but I meant the difference between applying the offset through the optional parameter and applying it directly to the location. If we can achieve the same result by just applying the offset directly to the 'uv' param, what's the point of having the optional offset param in the API?


Thanks MJP, I just learned that I can find the difference myself by examining the assembly :) BTW, is there any convenient online HLSL 'compiler' which, given the HLSL code, can show you the assembly, like godbolt.org does for C++?

Edited by Mr_Fox


"is there any convenient online HLSL 'compiler' which, given the HLSL code, can show you the assembly"

 

This probably would not help you find out how the final hardware handles the offset, assuming the code goes through a toolchain like this: HLSL -> DX bytecode (== 'assembly'?) -> driver -> Nvidia PTX -> final hardware machine code.

I guess only the machine code would clarify this, and AFAIK Nvidia does not show it (AMD does).

 

I remember someone managed to get PTX by provoking a compile error, so that a file written out somewhere contained the PTX code. But I don't remember any details or where I read this. There may be an unofficial tool.

 

However, the additional arithmetic won't make a difference on a slow operation like a texture fetch, but you might save a register, and this could matter.

I'd try again to get Nsight to work. This page shows that it gives all the necessary information you need to know how well your code performs: http://docs.nvidia.com/gameworks/content/developertools/desktop/analysis/report/cudaexperiments/kernellevel/achievedoccupancy.htm
