auto.magician

DX11 vs_4_0 optimisations

This topic is 2477 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hiya,
Sorry, the title should read ps_4_0 optimisations....


I've searched the forums for this problem and couldn't find anything related.
I'm writing a vs and ps in DX11 using shader model 4. If I use optimisation level 3 in the D3DCompile function, I get around 65fps; however, if I turn off optimisations using the skip optimisation flag, I get 80fps!
Has anyone heard of this kind of thing? Could it simply be the drivers for my GPU, or the GPU itself? The shaders are nothing special; the vs is a basic 'pass-through' to the ps. Any ideas and help are appreciated. The ps is below -

cbuffer ScreenDim
{
    float screenWidth;
    float screenHeight;
    float2 padding;
}

struct PixelLightingType
{
    float4 position : SV_POSITION;
    float2 tex : TEXCOORDS0;
    float4 lightPR : TEXCOORDS1;
    float4 lightCI : TEXCOORDS2;
};

Texture2D inTex[2];
SamplerState Sampler;

float4 LightingPixelShader(PixelLightingType input) : SV_TARGET
{
    float4 outColor;

    float depth = inTex[0].Sample(Sampler, input.tex).r;

    float3 normal = inTex[1].Sample(Sampler, input.tex).rgb;
    normal = normal * 2 - 1;
    normal = normalize(normal);

    float3 pixel;
    pixel.x = screenWidth * input.tex.x;
    pixel.y = screenHeight * input.tex.y;
    pixel.z = depth;

    float3 shading = 0;

    float3 lightDir = input.lightPR.xyz - pixel;
    float cone = saturate(1 - length(lightDir) / input.lightPR.w);
    if (cone > 0)
    {
        float distance = 1 / length(lightDir) * input.lightCI.w;
        float amount = max(dot(normal + depth, normalize(distance)), 0);

        shading = distance * amount * cone * input.lightCI.rgb;
    }

    outColor = float4(shading, 1);
    return outColor;
}




Thanks in advance.

Dave

If you're actually sure that it's the pixel shader that's causing this performance delta, then I would look at the compiled assembly and see what the differences are. It might be putting in a branch instruction in one version, and flattening it in the other.

Hiya MJP,

Yes, I'm 100% sure it's the pixel shader. Or at least I think I am :P
I'm compiling the vs and ps separately and changing the compilation flag only for the ps. I'm not using the fx framework at all. Without optimisations the ps ends up with 45 instruction slots, which include 2 'if else endif' blocks nested one inside the other. The optimised version is only 26 instruction slots with no nesting or branching, but it's almost 20% slower.
TBH, the assembly was the first place I looked. Is it worth posting the assembly output here?
Are you thinking something might be stalling in the pipeline?
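
A framerate delta like this is often easier to judge as a frame-time delta. The following is a minimal C++ sketch that uses only the 65fps and 80fps figures quoted in this thread; the helper names `frame_ms` and `extra_ms` are made up for illustration.

```cpp
#include <cassert>

// Hypothetical helper: convert frames per second to milliseconds per frame.
double frame_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Extra milliseconds spent per frame when running at slow_fps
// instead of fast_fps.
double extra_ms(double slow_fps, double fast_fps) {
    return frame_ms(slow_fps) - frame_ms(fast_fps);
}
```

With the numbers above, `frame_ms(65.0)` is about 15.38 ms and `frame_ms(80.0)` is 12.5 ms, so `extra_ms(65.0, 80.0)` comes to roughly 2.88 ms per frame for the optimised build.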

You can try forcing the branch in the optimized version, and see if that speeds things up. Just do this:

[branch]
if (cone>0)
{
...
}
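
As a rough CPU-side analogy (not the compiler's actual output), the two code shapes can be sketched in C++. The branched form jumps over the body when the condition fails, while the flattened form always runs the body and then selects a result. Every name here is hypothetical, and `lit_amount` is only a stand-in for the real work inside the if-block.

```cpp
#include <cassert>

// Stand-in for the work inside the if-block (assumed, much simpler
// than the real shader body).
float lit_amount(float cone, float intensity) {
    return cone * intensity;
}

// Branched shape: test the condition and skip the body when it fails.
// body_runs counts how many times the body was actually executed.
float shade_branched(float cone, float intensity, int* body_runs) {
    assert(body_runs != nullptr);
    float shading = 0.0f;
    if (cone > 0.0f) {
        ++*body_runs;
        shading = lit_amount(cone, intensity);
    }
    return shading;
}

// Flattened shape: always execute the body, then select the result.
float shade_flattened(float cone, float intensity, int* body_runs) {
    assert(body_runs != nullptr);
    ++*body_runs;
    const float taken = lit_amount(cone, intensity);
    return (cone > 0.0f) ? taken : 0.0f;
}
```

Both functions return the same value for the same inputs. The only difference is that the flattened one pays for the body even when `cone` is 0.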

Wow, thank you.

It increased the instruction slots to 31 but brought the framerate back up to 80.
I've read about those commands in the docs, but I thought they would make things slower as more instruction slots would be used. Do you know where I could find information on the speed of the shader commands and functions?

Thank you for that tip and for fixing it up! And I've learned something new too.
Thanks again.


Dave.

There's not really any direct correlation between the number of shader ASM instructions and performance, or even the number of shader cycles. The driver will JIT compile your ASM shaders into microcode for your specific GPU, and at that point each instruction could translate into any number of cycles. Plus, shader performance in general is pretty complicated, due to texture fetch latency and many threads running in parallel. The vendor-specific tools can give you a better idea of the number of shader cycles and things of that nature. Either way, a branch can significantly change your performance, since you can skip the instructions inside the branch (if enough adjacent threads all take the same branch).
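
The "enough adjacent threads" condition can be illustrated with a toy cost model in C++. This is a sketch under loud assumptions, not a measurement: threads are assumed to run in lockstep groups of `group_size`, a group pays for the branch body if any of its threads takes the branch, and all costs are arbitrary units. The `Costs` struct and both function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Arbitrary-unit costs (assumed values, not real GPU cycle counts).
struct Costs {
    long body;      // instructions inside the if-block
    long overhead;  // evaluating the branch itself
};

// Flattened code: every group always executes the body.
long flattened_cost(const std::vector<bool>& takes_branch,
                    std::size_t group_size, Costs c) {
    assert(group_size > 0);
    const std::size_t groups =
        (takes_branch.size() + group_size - 1) / group_size;
    return static_cast<long>(groups) * c.body;
}

// Branched code: a group skips the body only if none of its threads
// takes the branch, but always pays the branch overhead.
long branched_cost(const std::vector<bool>& takes_branch,
                   std::size_t group_size, Costs c) {
    assert(group_size > 0);
    long total = 0;
    for (std::size_t start = 0; start < takes_branch.size();
         start += group_size) {
        bool any_taken = false;
        for (std::size_t i = start;
             i < start + group_size && i < takes_branch.size(); ++i) {
            any_taken = any_taken || takes_branch[i];
        }
        total += c.overhead + (any_taken ? c.body : 0);
    }
    return total;
}
```

With eight threads in groups of four, a body cost of 10 and a branch overhead of 1, the flattened cost is 20 either way. If the four lit threads are adjacent, the branched cost is 12; if the lit threads are spread so that each group contains one, the branched cost is 22, which is worse than flattening.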
