maxest

DX11
HLSL race condition when writing to shared memory passed to function

8 posts in this topic

I have code like this:

groupshared uint tempData[ElementsCount];

[numthreads(ElementsCount/2, 1, 1)]
void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
{
    tempData[gtID.x] = 0;
}

And it works fine. Now I change it to this:

void MyFunc(inout uint3 gtID: SV_GroupThreadID, inout uint inputData[ElementsCount])
{
    inputData[gtID.x] = 0;
}

groupshared uint tempData[ElementsCount];

[numthreads(ElementsCount/2, 1, 1)]
void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
{
    MyFunc(gtID, tempData);
}

and I get "error X3695: race condition writing to shared memory detected, consider making this write conditional." Is there any way to work around this?


Based on a quick search I found these:

https://www.gamedev.net/topic/594131-dx11-compute-shader-race-condition-error-when-using-optimization-level-2-or-3/

http://xboxforums.create.msdn.com/forums/t/63981.aspx

Based on those it sounds like there might be a bug in certain versions of the compiler. I'd suggest trying to use either command line fxc.exe or a more recent version of the d3dcompiler dll to see if it makes any difference.


I came across those threads as well, but that isn't the cause here.

Also, I'm not really sure how to update my d3dcompiler. I'm using Windows 10, so I presume it gets updated automatically, but I use Visual Studio 2013, so I can't be sure that the most up-to-date DLL is used.

I found out that the problem appears even in this code:

static const int ElementsCount = 512;

groupshared uint tempData[2 * ElementsCount];

void MyFunc(inout uint3 gtID: SV_GroupThreadID, inout uint inputData[2 * ElementsCount])
{
}

[numthreads(ElementsCount, 1, 1)]
void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
{
    MyFunc(gtID, tempData);
}

Note that I don't even write anything to tempData in MyFunc.
I also found that the problem goes away if I remove the "inout" modifier, but then the array is probably just copied, because the code no longer works as expected.


https://blogs.msdn.microsoft.com/chuckw/2012/05/07/hlsl-fxc-and-d3dcompile/ explains all the details of how the compiler DLL works.

If you want to check which version of the DLL your program is using, just pause it in the debugger and look for the DLL in the Modules window. I believe the latest version is D3dcompiler_47.dll.

Have you tried compiling the shader using fxc.exe?


Yeah, I do indeed have D3dcompiler_47.dll.

I did try fxc.exe; I forgot to mention that in my previous post. The same problem persists.


There are many versions of d3dcompiler_47.dll, which was a very dumb idea…

If your shader is simply compiled from Visual Studio as an HLSL source file, the fxc and DLL you use are probably tied to the Windows SDK that is set up in your project.

I'm not saying that getting the latest one would solve this, but you may still be running an outdated compiler :)


I've reproduced the behaviour, and simplified the case that goes wrong. Here's my minimal failing case:

groupshared uint tempData[1];

void MyFunc(inout uint inputData[1])
{
}

[numthreads(2, 1, 1)]
void CSMain()
{
    MyFunc(tempData);
}

It looks like just passing the argument to the function is enough to make it fail to compile.
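
One possible explanation, offered as a sketch rather than a confirmed diagnosis: HLSL passes function parameters by value, and "inout" means copy-in on entry plus copy-out on return. Under that reading, the call above behaves roughly like the hand-expanded version below, where every thread in the group unconditionally writes the array back to shared memory. The name localCopy is hypothetical and used only for illustration.

```hlsl
groupshared uint tempData[1];

[numthreads(2, 1, 1)]
void CSMain()
{
    // Copy-in: each thread reads the shared array into a private copy.
    uint localCopy[1];
    localCopy[0] = tempData[0];

    // (The empty body of MyFunc would run here, operating on localCopy.)

    // Copy-out: each thread writes its private copy back, unconditionally.
    // Both threads store to tempData[0], which is the kind of write the
    // compiler may be flagging as a race.
    tempData[0] = localCopy[0];
}
```

If that is what happens, it would also fit the earlier observation that removing "inout" silences the error: an in-only parameter is copied in but never written back.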

Here's a workaround for the problem - don't pass the array as a function argument:

#define ElementsCount 256

groupshared uint tempData[ElementsCount];

void MyFunc(in uint3 gtID)
{
    tempData[gtID.x] = 0;
}

[numthreads(ElementsCount/2, 1, 1)]
void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
{
    MyFunc(gtID);
}

Yeah, I'm perfectly aware of that workaround, and that's how I do it. But because I can't pass a shared memory array to a function, I can't make the function more general. Instead I need to copy it into each of the few files that use it.
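
One way to share the function without passing the array might be to generate it with a preprocessor macro that takes the name of the groupshared array. This is only a sketch under that assumption: DEFINE_CLEAR_FUNC and ClearTempData are made-up names, and it has not been checked against every compiler version.

```hlsl
#define ElementsCount 256

// Hypothetical helper macro: stamps out a function that writes directly to
// the named groupshared array, so the array is never passed as an argument.
#define DEFINE_CLEAR_FUNC(FuncName, SharedArray) \
    void FuncName(uint index)                    \
    {                                            \
        SharedArray[index] = 0;                  \
    }

groupshared uint tempData[ElementsCount];

// Instantiate the function once for this shader's shared array.
DEFINE_CLEAR_FUNC(ClearTempData, tempData)

[numthreads(ElementsCount/2, 1, 1)]
void CSMain(uint3 gID: SV_GroupID, uint3 gtID: SV_GroupThreadID)
{
    ClearTempData(gtID.x);
}
```

The macro definition could live in a shared include file, so that each shader declares its own groupshared array and instantiates the function once instead of carrying a pasted copy.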
