
DX11 vs_4_0 optimisations

Sorry, the title should read ps_4_0 optimisations...

I've searched the forums for this problem and couldn't find anything related.
I'm writing a VS and PS in DX 11.0 using shader model 4. If I use optimisation level 3 (D3DCOMPILE_OPTIMIZATION_LEVEL3) in the D3DCompile function, the frame rate is around 65 fps; however, if I turn off optimisations using the skip-optimisation flag (D3DCOMPILE_SKIP_OPTIMIZATION), I get 80 fps!
Has anyone heard of this kind of thing? Could it simply be the drivers for my GPU, or the GPU itself? The shaders are nothing special: the VS is a basic pass-through to the PS. Any ideas and help are appreciated. The PS is below:

cbuffer ScreenDim
{
    float screenWidth;
    float screenHeight;
    float2 padding;
};

struct PixelLightingType
{
    float4 position : SV_POSITION;
    float2 tex : TEXCOORDS0;
    float4 lightPR : TEXCOORDS1;
    float4 lightCI : TEXCOORDS2;
};

Texture2D inTex[2];
SamplerState Sampler;

float4 LightingPixelShader(PixelLightingType input) : SV_TARGET
{
    float4 outColor;

    float depth = inTex[0].Sample(Sampler, input.tex).r;

    // Unpack the normal from [0, 1] to [-1, 1]
    float3 normal = inTex[1].Sample(Sampler, input.tex).rgb;
    normal = normal * 2 - 1;
    normal = normalize(normal);

    float3 pixel;
    pixel.x = screenWidth * input.tex.x;
    pixel.y = screenHeight * input.tex.y;
    pixel.z = depth;

    float3 shading = 0;

    // Vector from this pixel to the light; lightPR.w is the light radius
    float3 lightDir = input.lightPR.xyz - pixel;
    float cone = saturate(1 - length(lightDir) / input.lightPR.w);
    if (cone > 0)
    {
        float distance = 1 / length(lightDir) * input.lightCI.w;
        float amount = max(dot(normal + depth, normalize(lightDir)), 0);

        shading = distance * amount * cone * input.lightCI.rgb;
    }

    outColor = float4(shading, 1);
    return outColor;
}

Thanks in advance.


If you're actually sure that it's the pixel shader that's causing this performance delta, then I would look at the compiled assembly and see what the differences are. It might be putting in a branch instruction in one version, and flattening it in the other.

Hiya MJP,

Yes, I'm 100% sure it's the pixel shader. Or at least I think I am :P
I'm compiling the VS and PS separately and changing the compilation flag only for the PS. I'm not using the FX framework at all. Without optimisations the PS ends up with 45 instruction slots, including two 'if else endif' blocks nested one inside the other. The optimised version is only 26 instruction slots with no nesting or branching, but it's almost 20% slower.
TBH, the assembly was the first place I looked. Is it worth posting the assembly output here?
Are you thinking it might be something stalling in the pipeline?

Wow, thank you.

It increased the instruction slots to 31 but brought the frame rate back up to 80.
I'd read about those commands in the docs, but I thought they would make things slower since more instruction slots would be used. Do you know where I could find information on the speed of the shader commands and functions?

Thank you for that tip and for fixing it up! And I've learned something new too.
Thanks again.


There's not really any direct correlation between shader ASM instructions and performance, or even the number of shader cycles. The driver will JIT compile your ASM shaders into microcode for your specific GPU, and that could translate into any number of cycles. Plus, shader performance in general is pretty complicated, due to texture fetch latency and the many threads running in parallel. The vendor-specific tools can give you a better idea of the number of shader cycles and things of that nature. Either way, a branch can significantly change your performance, since you can skip the instructions inside the branch (if enough adjacent threads all take the same branch).
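A toy model of that last point, again a C++ sketch rather than anything vendor-specific: assume pixels are shaded in fixed-size groups (the group size of 32 used in the example is an assumption; real hardware differs) and that a group can only skip the branch body if none of its pixels need it. The function name `groupsExecutingBody` is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Counts how many groups must run the branch body, given which pixels take
// the branch. A group skips the body only if *no* pixel in it takes the branch;
// if even one does, the whole group pays for the body.
std::size_t groupsExecutingBody(const std::vector<bool>& takesBranch,
                                std::size_t groupSize) {
    std::size_t groups = 0;
    for (std::size_t start = 0; start < takesBranch.size(); start += groupSize) {
        bool anyPixel = false;
        for (std::size_t i = start; i < start + groupSize && i < takesBranch.size(); ++i) {
            if (takesBranch[i]) { anyPixel = true; break; }
        }
        if (anyPixel) ++groups;
    }
    return groups;
}
```

With 128 pixels and a group size of 32, lighting 8 adjacent pixels means only 1 group runs the body, while lighting the same number of pixels spread out (one every 16 pixels) makes all 4 groups run it. Same number of lit pixels, four times the work.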

  • Similar Content

    • By Stewie.G
      I've been trying to implement a basic Gaussian blur using the Gaussian formula, and here is what it looks like so far:
      float gaussian(float x, float sigma)
      {
          float pi = 3.14159;
          float sigma_square = sigma * sigma;
          float a = 1 / sqrt(2 * pi * sigma_square);
          float b = exp(-((x * x) / (2 * sigma_square)));
          return a * b;
      }
      My problem is that I don't quite know what sigma should be.
      It seems that if I provide a random value for sigma, weights in my kernel won't add up to 1.
      So I ended up calling my gaussian function with sigma == 1, which gives me weights adding up to 1, but also a very subtle blur.
      Here is what my kernel looks like with sigma == 1
              [0]    0.0033238872995488885    
              [1]    0.023804742479357766    
              [2]    0.09713820127276819    
              [3]    0.22585307043511713    
              [4]    0.29920669915475656    
              [5]    0.22585307043511713    
              [6]    0.09713820127276819    
              [7]    0.023804742479357766    
              [8]    0.0033238872995488885    
      I would have liked it to be more "rounded" at the top, or better spread, instead of wasting [0], [1], [2] on values below 0.1.
      Based on my experiments, the key to this is to provide a different sigma, but if I do, my kernel values no longer add up to 1, which results in a darker blur.
      I've found this post 
      ... which helped me a bit, but I am really confused by the part where he divides sigma by 3.
      Can someone please explain how sigma works? How is it related to my kernel size, how can I balance my weights with different sigmas, etc.?
      Thanks :-)
    • By mc_wiggly_fingers
      Is it possible to asynchronously create a Texture2D using DirectX11?
      I have a native Unity plugin that downloads 8K textures from a server and displays them to the user for a VR application. This works well, but there's a large frame drop when calling CreateTexture2D. To remedy this, I've tried creating a separate thread that creates the texture, but the frame drop is still present.
      Is there anything else I could do to prevent that frame drop from occurring?
    • By cambalinho
      I'm trying to draw a circle using math:
      class coordenates
      {
      public:
          coordenates(float x = 0, float y = 0) { X = x; Y = y; }
          float X;
          float Y;
      };

      coordenates RotationPoints(coordenates ActualPosition, double angle)
      {
          coordenates NewPosition;
          NewPosition.X = ActualPosition.X*sin(angle) - ActualPosition.Y*sin(angle);
          NewPosition.Y = ActualPosition.Y*cos(angle) + ActualPosition.X*cos(angle);
          return NewPosition;
      }
      But now I know this has one problem: I don't use the origin.
      Even so, I'm having problems working out how to rotate the point.
      These coordinates work between -1 and 1 (floating point).
      Can anyone advise me on how to create the circle?
    • By isu diss
      I managed to convert the OpenGL code on http://john-chapman-graphics.blogspot.co.uk/2013/02/pseudo-lens-flare.html to HLSL, but unfortunately I don't know how to add it to my atmospheric scattering code (Sky - first image). Can anyone help me?
      I tried binding the sky texture as an SRV and implementing the lens flare code in the pixel shader, but I don't know how to separate them (second image).

    • By jonwil
      I have some code (not written by me) that is creating a window to draw stuff into using these:
      CreateDXGIFactory1 to create an IDXGIFactory1
      dxgi_factory->CreateSwapChain to create an IDXGISwapChain
      D3D11CreateDevice to create an ID3D11Device and an ID3D11DeviceContext
      Other code (that I don't quite understand) that creates various IDXGIAdapter1 and IDXGIOutput instances
      Still other code (that I don't quite understand) that creates some ID3D11RenderTargetView and ID3D11DepthStencilView instances and does something with those as well (possibly loading them into the graphics context somewhere, although I can't quite see where)
      What I want to do is to create a second window and draw stuff to that as well as to the main window (all drawing would happen on the one thread with all the drawing to the sub-window happening in one block and outside of any rendering being done to the main window). Do I need to create a second IDXGISwapChain for my new window? Do I need to create a second ID3D11Device or different IDXGIAdapter1 and IDXGIOutput interfaces? How do I tell Direct3D which window I want to render to? Are there particular d3d11 functions I should be looking for that are involved in this?
      I am good with Direct3D9, but this is the first time I am working with Direct3D11 (and the guy who wrote the code has left our team, so I can't ask him for help).
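On Stewie.G's Gaussian question: a common approach, and a likely reading of the "divide by 3" step in the linked post (which isn't reproduced here), is to pick sigma from the kernel radius, sigma = radius / 3, so the taps span about ±3 sigma (roughly 99.7% of the curve's area), and then divide every weight by the sum of all weights so they add up to 1 for any sigma. Below is a minimal C++ sketch under those assumptions; `gaussianKernel` is a hypothetical helper name.

```cpp
#include <cmath>
#include <vector>

// Builds a (2 * radius + 1)-tap Gaussian kernel whose weights sum to 1.
std::vector<float> gaussianKernel(int radius, float sigma) {
    std::vector<float> weights(2 * radius + 1);
    float sum = 0.0f;
    for (int i = -radius; i <= radius; ++i) {
        // The 1 / sqrt(2 * pi * sigma^2) factor is left out: it is the same
        // for every tap, so it cancels when we normalise below.
        float w = std::exp(-(float(i) * i) / (2.0f * sigma * sigma));
        weights[i + radius] = w;
        sum += w;
    }
    // Normalise so the blur neither darkens nor brightens the image.
    for (float& w : weights) w /= sum;
    return weights;
}
```

For a 9-tap kernel (radius 4), `gaussianKernel(4, 4.0f / 3.0f)` gives a wider, flatter top than sigma = 1. Raising sigma further without raising the radius cuts off more of the curve's tails, but because of the normalisation the blur still won't get darker.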
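On cambalinho's circle question: the snippet above uses sin/sin and cos/cos, whereas the standard 2D rotation by an angle θ is x' = x·cos θ − y·sin θ and y' = x·sin θ + y·cos θ. To rotate around a point other than the origin, translate so that point sits at the origin, rotate, then translate back. A minimal C++ sketch; `Point`, `rotateAround` and `circlePoints` are hypothetical names standing in for the `coordenates` class and `RotationPoints` function.

```cpp
#include <cmath>
#include <vector>

struct Point { float x; float y; };  // stand-in for the 'coordenates' class

// Rotate p counter-clockwise by 'angle' radians around 'center'.
Point rotateAround(Point p, Point center, double angle) {
    const double c = std::cos(angle), s = std::sin(angle);
    const double dx = p.x - center.x, dy = p.y - center.y;   // move center to the origin
    return { static_cast<float>(center.x + dx * c - dy * s),  // standard 2D rotation,
             static_cast<float>(center.y + dx * s + dy * c) };// then move back
}

// Generate 'segments' points on a circle by rotating one start point
// by i / segments of a full turn.
std::vector<Point> circlePoints(Point center, float radius, int segments) {
    const double pi = 3.14159265358979323846;
    std::vector<Point> pts;
    pts.reserve(segments);
    const Point start{ center.x + radius, center.y };
    for (int i = 0; i < segments; ++i)
        pts.push_back(rotateAround(start, center, 2.0 * pi * i / segments));
    return pts;
}
```

Rotating the same start point by a growing angle, rather than repeatedly rotating the previous point, avoids accumulating rounding error around the circle.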