n3Xus

[DX10 & HLSL] Cost of un/packing variables in shaders

Hello, I'm doing some packing in my shaders: I pack four values in the range 0-255 into one int. I then checked the shader assembly in PIX to see how many instructions are used (though I don't know how to write shaders in assembly), but PIX shows just this:
Quote:
ps_4_0
dcl_output o0.xyzw
mov o0.xyzw, l(0.857143, 0.714286, 0.714286, 0.000000)
ret
// Approximately 2 instruction slots used
The HLSL code:
Quote:
// The four ints to get packed into one
int b3=4,b2=5,b1=7,b0=6;
// Do packing
int res=b3<<8;
res|=0x000000ff&b2;
res<<=8;
res|=0x000000ff&b1;
res<<=8;
res|=0x000000ff&b0;
// Extract and return just to see something
return float4(0x000000ff&res>>0*8,0x000000ff&res>>2*8,0x000000ff&res>>30*8,0);
The un/packing is correct, but how come it detects "approximately" 2 instructions? How many instructions does this really need?

As your example shows, the shader uses only one instruction to do the actual work (the other counted slot is presumably the ret). The compiler was able to do all of the math up front, at compile time, since none of it depends on uniform or stream input. All that remains is the mov that writes the precomputed result into the shader's output register. Change your code to use a stream or uniform input and you'll see the actual code required to calculate this in a shader.


You can use the command-line compiler fxc.exe, which comes with the SDK, to compile your shaders and view the instruction output without having to add PIX to the process.
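
To make that concrete, here is a minimal sketch of the same packing driven by a uniform input, so the compiler cannot fold it away. The constant buffer `PackInput`, the helpers `Pack4` and `Unpack`, the entry point `PSMain`, and the file names in the comment are all made up for illustration; the shift-and-mask logic follows the original post, and the final division by 255 is only there to bring the result into a displayable range.

```hlsl
// Hypothetical file packed.hlsl. One way to get a listing without PIX
// (file names are placeholders):
//   fxc /T ps_4_0 /E PSMain /Fc packed.asm packed.hlsl

// Hypothetical constant buffer: the four values now arrive as a uniform,
// so their contents are unknown at compile time.
cbuffer PackInput : register(b0)
{
    int4 gBytes; // x = b0, y = b1, z = b2, w = b3, each expected in 0..255
};

// Pack four 8-bit values into one 32-bit int (b3 ends up in the top byte).
int Pack4(int4 b)
{
    int res = b.w << 8;
    res |= 0x000000ff & b.z;
    res <<= 8;
    res |= 0x000000ff & b.y;
    res <<= 8;
    res |= 0x000000ff & b.x;
    return res;
}

// Extract byte 'index' (0 = lowest byte) from a packed int.
int Unpack(int packed, int index)
{
    return 0x000000ff & (packed >> (index * 8));
}

float4 PSMain() : SV_Target
{
    int res = Pack4(gBytes);
    // Unpack all four bytes again and scale to 0..1 just to see something.
    return float4(Unpack(res, 0), Unpack(res, 1),
                  Unpack(res, 2), Unpack(res, 3)) / 255.0;
}
```

With an input like this, the listing should contain real integer shift, AND and OR instructions instead of a single mov of a literal; the exact count depends on the compiler version and optimization flags.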

It is worth noting that the performance profile is hardware-specific regardless of the complexity of the shader intermediate code.

In the current generation of graphics hardware, bit shifts and bit masking are generally much slower than simple float math, since these operations are commonly implemented in the transcendental ALUs, and there are generally far fewer of those than "simple" ALUs. However, in the near future, consumer GPUs could have discrete integer ALUs, so the performance profile of this technique would change for the better.

In this non-dynamic case, the D3D compiler is able to optimize the shifts out, but if your input is variable, the hardware may be forced to do actual bit shifting.
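
For comparison, this is roughly what a float-only variant of the packing could look like. It is a sketch under assumptions, not something from the original code: the helper names `PackFloat3` and `UnpackFloat3` are made up, the inputs are assumed to be whole numbers in 0..255, and only three values are packed because a 32-bit float represents integers exactly only up to 24 bits.

```hlsl
// Sketch only: pack three 8-bit values (whole numbers in 0..255) into one
// float using multiplies and adds instead of shifts and masks.
float PackFloat3(float3 b)
{
    return b.x + b.y * 256.0 + b.z * 65536.0;
}

// Recover the three values with floor and subtraction.
float3 UnpackFloat3(float packed)
{
    float3 b;
    b.z = floor(packed / 65536.0);   // highest byte
    packed -= b.z * 65536.0;         // strip it off
    b.y = floor(packed / 256.0);     // middle byte
    b.x = packed - b.y * 256.0;      // what is left is the lowest byte
    return b;
}
```

For example, packing (6, 7, 5) gives 6 + 1792 + 327680 = 329478, and unpacking that value returns 5, 7 and 6 for z, y and x. Whether this actually beats the integer version has to be measured on the target hardware.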

This is why you should profile your shader performance using PIX and/or manufacturer-specific profilers, using as many different cards as you can. That said, worrying about things like this is premature optimization, given the (non)complexity of the shaders presented. It should be enough at this point that all cards capable of SM4 will run the code correctly.

[Edited by - Nik02 on July 24, 2009 11:40:18 PM]

Also, shader intermediate code doesn't necessarily have a 1:1 correspondence to the machine-specific code that the driver creates for the hardware to run.

This is due to the fact that only the driver knows how the hardware actually implements the ops, and D3D itself doesn't.

The same goes for CPU assembly; even though the instructions may seem very specific, the actual hardware and/or CPU driver will at least rearrange the instructions in order to fill its arithmetic logic and memory loading units in the most efficient way. "The most efficient way" is a constantly moving target, since other processes also compete for the resources in a modern OS.

ATI has a tool (GPU ShaderAnalyzer) that shows you how the shader byte code looks as real GPU ISA code. It can also tell you how fast the different GPUs can run such code.

NVIDIA's tool for this is called ShaderPerf. The last time I checked, it only gave you performance values and didn't show the GPU code.
