
DX11 - Pixel Shader 5 vs. Group Shared Memory and Atomic operations


pcmaster    982
Greetings community,

we all know that SM5 brought the ability to scatter writes from pixel shaders too, not only from compute shaders. MSDN is rather brief on this topic. I can only see that the Interlocked*() instructions are available in both PS and CS, I suppose on UAVs. DeviceMemoryBarrier() seems to work in both PS and CS, and it seems to be the only barrier instruction usable in PS. My question now is whether it's fundamentally impossible to take advantage of group shared memory in PS too. I don't see an API for that, and maybe that makes sense. In GL 4.2 I noticed they released the GL_ARB_shader_image_load_store extension, which apparently supports the same kind of thing, but still nothing for manipulating the scarce but fast shared memory :( I have implemented various parallel algorithms in OpenCL, so although I might seem a little confused now, I'm very much aware of which memory is which and what it's good for in GPGPU via CUDA/OpenCL/DX11 CS.

Also, I see virtually nobody discussing the use of atomic instructions outside compute shaders, and I wonder why. I see some OIT and bokeh implementations around that use append buffers. But I have a scenario where I need to rasterise normal geometry with a lot of textures, and where I might benefit from reducing a lot of information directly in the pixel shaders using atomic operations on global (device) buffers, instead of writing out shitloads of texture data and reducing it in parallel afterwards. I'm not going to elaborate on my scenario further; I'll just state that I need to analyse what has been rasterised. I don't know how much the performance will suffer if all units (fragments) try to write to the same memory location using InterlockedMax() or similar :(
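
For reference, here is a minimal sketch of what such a pixel-shader reduction might look like, assuming one render target is bound and the reduced quantity is the maximum luminance of the rasterised pixels. All names (gAlbedo, gSampler, gStats, PSMain) are hypothetical and not taken from any sample. In D3D11 the UAV would be bound with ID3D11DeviceContext::OMSetRenderTargetsAndUnorderedAccessViews, and pixel-shader UAV slots share numbering with render targets, hence u1 here.

```hlsl
// Sketch only: hypothetical resource names, untested, not profiled.
Texture2D<float4> gAlbedo  : register(t0);
SamplerState      gSampler : register(s0);

// Slot u0 is taken by the single render target, so the UAV lives in u1.
// gStats[0] = running maximum (quantised luminance)
// gStats[1] = number of pixels that contributed
RWStructuredBuffer<uint> gStats : register(u1);

struct PSInput
{
    float4 position : SV_Position;
    float2 uv       : TEXCOORD0;
};

float4 PSMain(PSInput input) : SV_Target0
{
    float4 color = gAlbedo.Sample(gSampler, input.uv);

    // Interlocked operations work on int/uint, so quantise the value first.
    float luminance = dot(color.rgb, float3(0.299f, 0.587f, 0.114f));
    uint  key       = (uint)(saturate(luminance) * 65535.0f);

    // Every pixel hits the same two addresses: the worst case for contention.
    InterlockedMax(gStats[0], key);
    InterlockedAdd(gStats[1], 1u);

    return color;
}
```

The buffer would have to be reset before each pass (for example with ClearUnorderedAccessViewUint) and read back or consumed by a later pass. Whether the contention on a single address is acceptable is exactly the open question here, so this is something to profile rather than assume.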

Any thoughts on pixel shaders (not compute shaders!) and shared and atomic stuff in DX11?

MJP    19788
There's no way to access shared memory at all in pixel shaders. I would assume that the GPU is already using shared memory for coordinating pixel shader executions, but even if that's not the case, the API gives you no means of using it. So you're out of luck on that one.

I really haven't played around much with UAVs in pixel shaders, aside from using an append buffer for bokeh (I wrote that sample you're talking about). I'd imagine device-wide interlocked operations are pretty slow, due to the kind of synchronization required for that sort of operation. Even interlocked adds on shared memory are pretty slow... if you look at any fast parallel reductions for compute shaders or CUDA, you'll find that they all avoid atomics. But it would definitely be better to profile than to assume, so if you do try any experiments I'd love to know how they turn out.
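
To illustrate the atomics-free style of reduction mentioned above, here is a rough sketch of a compute shader that finds a per-group maximum using group shared memory and barriers only. The names (gInput, gPartialMax, sData, CSReduceMax) and the group size of 256 are assumptions for illustration, and the input length is assumed to be a multiple of GROUP_SIZE.

```hlsl
// Sketch only: hypothetical names, assumes input length is a multiple of GROUP_SIZE.
#define GROUP_SIZE 256

StructuredBuffer<uint>   gInput      : register(t0);
RWStructuredBuffer<uint> gPartialMax : register(u0);

// One slot per thread in the group.
groupshared uint sData[GROUP_SIZE];

[numthreads(GROUP_SIZE, 1, 1)]
void CSReduceMax(uint3 dtid : SV_DispatchThreadID,
                 uint3 gid  : SV_GroupID,
                 uint  gi   : SV_GroupIndex)
{
    // Each thread loads one element into shared memory.
    sData[gi] = gInput[dtid.x];
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction: halve the number of active threads each step.
    for (uint stride = GROUP_SIZE / 2; stride > 0; stride >>= 1)
    {
        if (gi < stride)
        {
            sData[gi] = max(sData[gi], sData[gi + stride]);
        }
        // The barrier sits outside the branch so every thread reaches it.
        GroupMemoryBarrierWithGroupSync();
    }

    // One plain write per group, no atomics involved.
    if (gi == 0)
    {
        gPartialMax[gid.x] = sData[0];
    }
}
```

A second dispatch over gPartialMax (or a small CPU readback) would then combine the per-group results. Faster variants exist, but this shows the basic shape: coordination comes from the barrier, not from interlocked operations.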

griffin77    125
That is my understanding too: there is magic going on behind the scenes when you compile a pixel shader that converts the pixel shader code into low-level GPU instructions that use shared memory and the like (basically everything you have to do yourself when you write CS or CUDA code).

I have used DeviceMemoryBarrier() in a pixel shader, but the documentation is VERY sketchy. As I understand it, this is basically a hint telling the compiler that all the GPU threads in the current block should finish accessing global memory before continuing. Used correctly, this should reduce the memory access overhead associated with different threads accessing global memory. But without a coherent description of exactly what this means in the context of a pixel shader, it's difficult to know if I'm using it correctly. Does anyone know of a good description of what this function means in the context of a pixel shader?

Jason Z    6436
Building on what the others have said, there is no access to the group shared memory in pixel shaders. If you consider for a moment how it is used in compute shaders, I think it will be clear why. In the compute shader, you specify how large the thread groups are that you will be working with, and how many of them will be executed in a particular dispatch. Alongside the thread group size, you declare how much shared memory each group will use. This gives very fine control over how many threads need to access the memory, and you can design your algorithm very precisely to coordinate access to it.
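
As a rough illustration of that coupling between group size and shared memory, here is a sketch of a compute shader in which each group of TILE threads caches one row segment (plus a one-texel border on each side) in group shared memory and then applies a 3-tap horizontal box filter. The names (gSrc, gDst, sRow, CSBoxRow) and the tile width of 64 are hypothetical.

```hlsl
// Sketch only: hypothetical names; dispatch as (ceil(width / TILE), height, 1).
#define TILE 64

Texture2D<float4>   gSrc : register(t0);
RWTexture2D<float4> gDst : register(u0);

// Shared storage is sized from the group width: TILE texels plus two borders.
groupshared float4 sRow[TILE + 2];

[numthreads(TILE, 1, 1)]
void CSBoxRow(uint3 dtid : SV_DispatchThreadID,
              uint3 gtid : SV_GroupThreadID)
{
    uint width, height;
    gSrc.GetDimensions(width, height);

    int x    = int(dtid.x);
    int y    = int(dtid.y);
    int maxX = int(width) - 1;

    // Every thread loads its own texel; the first and last threads
    // additionally load the left and right border texels.
    sRow[gtid.x + 1] = gSrc[int2(clamp(x, 0, maxX), y)];
    if (gtid.x == 0)
    {
        sRow[0] = gSrc[int2(clamp(x - 1, 0, maxX), y)];
    }
    if (gtid.x == TILE - 1)
    {
        sRow[TILE + 1] = gSrc[int2(clamp(x + 1, 0, maxX), y)];
    }

    // Wait until the whole tile is in shared memory.
    GroupMemoryBarrierWithGroupSync();

    if (dtid.x < width && dtid.y < height)
    {
        gDst[dtid.xy] = (sRow[gtid.x] + sRow[gtid.x + 1] + sRow[gtid.x + 2]) / 3.0f;
    }
}
```

The point is that the author knows exactly which threads share sRow and in what pattern they touch it, because [numthreads] fixes the group shape at compile time. A pixel shader has no equivalent declaration to build on.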

In a pixel shader on the other hand, there is currently no method or concept of a thread group. Instead, it is up to the vendors to determine the optimal split size to be used when rasterizing a primitive, and then it is done more or less behind the scenes. This makes it impossible for a developer to write a shader that will have a coherent access strategy to the shared memory.

Who knows what will be coming in the next versions of D3D, but this seems like a logical extension of the possibilities. People have been talking about programmable rasterization for a while too, so perhaps sometime down the road there could be selectable group sizes for rasterization... That is just pure speculation though - I would be happy with a programmable rasterizer, but I don't know if one would ever come around and/or be useful...
