[D3D12] Total brightness problem in compute shader



The code below is supposed to calculate the total brightness of the tsrc texture.

 

The good thing is HD Graphics 4600 calculates it just fine. The bad thing is GTX 980 does not.

 

The values I read from the read-back buffer fluctuate wildly, but they seem to stay below the correct value.

 

I took the code for atomic addition of float values from this thread http://www.gamedev.net/topic/613648-dx11-interlockedadd-on-floats-in-pixel-shader-workaround/

 

I have no idea what's going on. Thanks in advance.

 

EDIT: 'globallycoherent' doesn't work. Using 'InterlockedAdd' and summing uints doesn't work either.

#define TotalGroups 32
#define RSDT "RootFlags(0), UAV(u0), DescriptorTable(SRV(t0))" // Descriptor table is required for texture

Texture2D<float4> tsrc: register(t0);
RWByteAddressBuffer total : register(u0);

groupshared float4 bpacked[TotalGroups*TotalGroups];

float brightness(float4 cl) {
  // TODO: replace with correct implementation
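  // (a standard choice would be Rec. 709 luma: dot(cl.rgb, float3(0.2126f, 0.7152f, 0.0722f)))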
  return cl.r + cl.g + cl.b;
}

[RootSignature(RSDT)]
[numthreads(TotalGroups,TotalGroups,1)]
void CSTotal(uint3 gtid: SV_GroupThreadId, uint3 gid : SV_GroupId, uint gindex : SV_GroupIndex, uint3 dtid : SV_DispatchThreadID) {
  uint2 crd = (gid.xy * TotalGroups + gtid.xy)*2;
  float br[4];
  [unroll]
  for (uint x = 0; x < 2; ++x) {
    [unroll]
    for (uint y = 0; y < 2; ++y) {
      // Color outside of tsrc is guaranteed to be 0.
      br[y * 2 + x] = brightness(tsrc[crd+uint2(x,y)]);
    }
  }
  bpacked[gindex] = float4(br[0],br[1],br[2],br[3]);
  if (all(dtid == uint3(0, 0, 0))) {
    // set initial value of total brightness accumulator
    total.Store(0, asuint(0.0));
  }
  AllMemoryBarrierWithGroupSync();
  // bpacked array now contains brightness in each component of each value

  // reduce bpacked to single value
  [unroll]
  for (uint thres = TotalGroups*TotalGroups / 2; thres > 0; thres /= 2) {
    if (gindex < thres) {
      bpacked[gindex] += bpacked[gindex + thres];
    }
    AllMemoryBarrierWithGroupSync();
  }
  if (gindex == 0) {
    float4 cl = bpacked[0];
    float value = cl.r + cl.g + cl.b + cl.a;

    // First thread in thread group atomically adds calculated brightness to the accumulator
    uint comp, orig = total.Load(0);
    [allow_uav_condition]
    do {
      comp = orig;
      total.InterlockedCompareExchange(0, comp, asuint(asfloat(orig) + value), orig);
    } while (orig != comp);
  }
}

The invocation of the compute shader is written in Rust, but it should be sufficiently readable.

 

I'm sure the Rust bindings to D3D12 are not the cause of the problem; I've been working with them for months without issues.

        let src_desc = srv_tex2d_default_slice_mip(srcdesc.Format, 0, 1);
        core.dev.create_shader_resource_view(Some(&src), Some(&src_desc), res.total_dheap.cpu_handle(0));

        clist.set_pipeline_state(&self.total_cpso);
        clist.set_compute_root_signature(&self.total_rs);
        clist.set_descriptor_heaps(&[res.total_dheap.get()]);
        clist.set_compute_root_descriptor_table(1, res.total_dheap.gpu_handle(0));
        clist.set_compute_root_unordered_access_view(0, res.rw_total.get_gpu_virtual_address());

        clist.resource_barrier(&[
          *ResourceBarrier::transition(&src,
            D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE),
          *ResourceBarrier::transition(&res.rw_total,
            D3D12_RESOURCE_STATE_COMMON, D3D12_RESOURCE_STATE_UNORDERED_ACCESS),
        ]);

        // Each thread group covers a TOTAL_CHUNK_SIZE x TOTAL_CHUNK_SIZE block of texels
        // (presumably 64x64: 32x32 threads, each sampling a 2x2 block).
        clist.dispatch(cw / TOTAL_CHUNK_SIZE, ch / TOTAL_CHUNK_SIZE, 1);
        clist.resource_barrier(&[
          *ResourceBarrier::transition(&res.rw_total,
            D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_STATE_COPY_SOURCE),
        ]);
        clist.copy_resource(&res.rb_total, &res.rw_total);
        clist.resource_barrier(&[
          *ResourceBarrier::transition(&res.rw_total,
            D3D12_RESOURCE_STATE_COPY_SOURCE, D3D12_RESOURCE_STATE_COMMON),
        ]);
        try!(clist.close());

        core.compute_queue.execute_command_lists(&[clist]);

        wait_for_compute_queue(core, &res.fence, &create_event());

        let total_brightness = res.total_brightness();
        let avg_brightness = total_brightness / cw as f32 / ch as f32;


Edited by red75prime


I knew it. This part looked just too ugly.

  if (all(dtid == uint3(0, 0, 0))) {
    // set initial value of total brightness accumulator
    total.Store(0, asuint(0.0));
  }

Thread groups execute in no particular order, so group (0, 0, 0) can reset the accumulator after other groups have already added their sums. When I replaced it with ClearUnorderedAccessViewUint... I got a program crash on GTX 980. It seems this function is broken on NVIDIA.

 

Then I replaced it with

[RootSignature(RSDT)]
[numthreads(1,1,1)]
void CSClearTotal() {
  // total is the RWByteAddressBuffer from above; it has no operator[],
  // so the write goes through Store (uint 0 at byte offset 0)
  total.Store(0, 0);
}

And now it works.

 

EDIT: Maybe ClearUnorderedAccessViewUint isn't broken. Maybe I don't understand what the second parameter means.
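For reference, the call takes two descriptors for the same UAV. A minimal C++ sketch of the documented usage (cmdList, shaderVisibleHeap, gpuHandle, cpuHandle and rwTotal are illustrative names, not from the code above):

// Per the docs, the GPU handle must come from a shader-visible heap that is
// currently set on the command list, while the CPU handle must come from a
// NON-shader-visible heap; both must describe the same UAV.
const UINT zeros[4] = { 0, 0, 0, 0 };
cmdList->SetDescriptorHeaps(1, &shaderVisibleHeap);
cmdList->ClearUnorderedAccessViewUint(
    gpuHandle,    // D3D12_GPU_DESCRIPTOR_HANDLE in the bound shader-visible heap
    cpuHandle,    // D3D12_CPU_DESCRIPTOR_HANDLE from a non-shader-visible heap
    rwTotal,      // ID3D12Resource* the UAV was created on
    zeros,        // value written to every element of the view
    0, nullptr);  // no rects: clear the whole view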

Edited by red75prime


I took the code for atomic addition of float values from this thread http://www.gamedev.net/topic/613648-dx11-interlockedadd-on-floats-in-pixel-shader-workaround/

 

Don't spin your GPU like this. Read this instead: http://developer.download.nvidia.com/compute/cuda/1.1-Beta/x86_website/projects/reduction/doc/reduction.pdf

 

or search "parallel reduction". There's actually faster ways to do it on an nvidia gpu that aren't exposed to dx12 sad.png.

 

They're reducing twice the number of values in a 1920x1080 texture (about 4 million items) in 0.268 ms on something like 5-year-old hardware for a reference benchmark.

Edited by Dingleberry


 

 


I implemented parallel reduction inside a thread group; atomic addition is performed once per thread group. And the performance is not that bad: around 4 ms for a 3840x2160 R32G32B32A32_FLOAT texture.

 

Vendor-specific optimizations can be safely postponed, I think.

Edited by red75prime


You might be getting a debug error if ClearUnorderedAccessViewUint fails. I have a GTX 970 and it works fine. You need to have the buffer's UAV in a currently set descriptor heap, and the UAV also can't be in a shader-visible heap, IIRC. That's what got me at first.

 

I'm still highly skeptical about that atomic float addition. If 4 ms is good enough then great, but it seems like a pretty substantial amount of time to me.

Edited by Dingleberry


You need to have the buffer's UAV in a currently set descriptor heap, and the UAV also can't be in a shader-visible heap, IIRC

 

Thank you. MSDN doesn't mention any of this, and the debug layer's message in case of error is not clear at all.

 

I experimented a bit. The GPU descriptor handle can be in any descriptor heap (set or not, shader-visible or not); the debug layer doesn't complain in any case.

 

The CPU descriptor handle, however, must not refer to a UAV in a shader-visible heap.

 

EDIT: My bad. MSDN does have a comment about this in the community additions section, but I use offline docs.

Edited by red75prime


D3D12 is really tricky. It took me two weeks to make the code work on 3 out of 4 GPUs I have.

 

The key insight is: "A shader cannot reliably read from a UAV of a resource filled by another shader; you need to use an SRV to read from it."

 

Another one is: "The two ways to sum float values produced by different thread groups are a) lock the buffer and sum on the CPU, or b) spin on InterlockedCompareExchange." A sketch of option (a) is below.
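A minimal C++ sketch of option (a), assuming each thread group has written its partial sum into a buffer that was then copied to a readback-heap resource and the fence has been waited on (numGroups and readbackBuffer are illustrative names):

// Sum the per-group partial results on the CPU once the copy into the
// readback buffer is known to have completed.
float* partials = nullptr;
D3D12_RANGE readRange = { 0, numGroups * sizeof(float) };
readbackBuffer->Map(0, &readRange, reinterpret_cast<void**>(&partials));

float total = 0.0f;
for (UINT i = 0; i < numGroups; ++i)
    total += partials[i];

D3D12_RANGE written = { 0, 0 }; // the CPU wrote nothing
readbackBuffer->Unmap(0, &written);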


The key insight is: "A shader cannot reliably read from a UAV of a resource filled by another shader; you need to use an SRV to read from it."

Shouldn't that be possible as long as you issue the appropriate resource transition between the two shader dispatches? The transition tells the driver that there's a data dependency between the two dispatches, so it can insert a wait-for-cache-flush command before the 2nd one, ensuring UAV coherency.


Shouldn't that be possible as long as you issue the appropriate resource transition between the two shader dispatches? The transition tells the driver that there's a data dependency between the two dispatches, so it can insert a wait-for-cache-flush command before the 2nd one, ensuring UAV coherency.

 

It doesn't work on the Microsoft Basic Render Driver and, possibly, on HD 4600 (I have other problems with that one); I just checked. Also, it requires one more resource barrier; D3D12 doesn't allow transitioning from one state into the same state.
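For a dispatch-to-dispatch UAV dependency, D3D12 also provides a dedicated UAV barrier, which sidesteps the same-state restriction entirely. A minimal C++ sketch (cmdList, rwTotal and the group counts are illustrative names):

// A UAV barrier says "all preceding UAV accesses to this resource must
// complete before subsequent ones begin"; it does not change the resource state.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type = D3D12_RESOURCE_BARRIER_TYPE_UAV;
barrier.UAV.pResource = rwTotal; // or nullptr to cover all UAV accesses

cmdList->Dispatch(1, 1, 1);             // e.g. the clear pass
cmdList->ResourceBarrier(1, &barrier);  // make its writes visible
cmdList->Dispatch(groupsX, groupsY, 1); // the accumulation pass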

Edited by red75prime
