
# DX12 Bug only appear when Debug layer is off... Need help :( (NV Driver bug, Fixed in 378.49)

This topic is 450 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Hey Guys,

Recently I encountered a rendering bug that only shows up when the debug layer is off. Even after I fixed all validation errors/warnings (including the GPU-Based Validation ones), the bug still exists when the debug layer is off. It happens on all GPUs available to me (GTX 680M, GTX 1080), but it does not happen on the WARP device.*

After days of struggling I found two ways to 'solve' this bug: 1. replacing one particular split barrier with a normal barrier; 2. breaking one cmdlist into two and submitting them to the GPU in order... Neither of these 'solutions' makes any sense to me, and I have almost run out of ideas. Please see here and there.

So I trimmed my project to get rid of the Kinect dependencies (it uses Kinect color and depth sensor images as input) and made a repo for anyone who is interested or willing to test/help (thanks!).

Here is the repo: https://github.com/pengliu916/BugRepo.git

To successfully compile and run the code you need a DX12-capable GPU and Windows SDK 10.0.14393.0, and to get rid of the GPU-Based Validation warnings/errors your GPU needs to support typed loads.

(The following paragraph is not necessary for understanding the bug; it's just in case someone needs more information.)

This project originally uses the depth map from the Kinect depth sensor to create/update a TSDF (truncated signed distance field) volume to reconstruct a 3D model of what the Kinect sees. To maintain this dynamic sparse volume efficiently, I use blocks to avoid updating every voxel each frame: instead of checking each voxel against the depth map, I first check each block (containing 8^3 voxels) against the depth map, and then in the next pass do the voxel-depth map check only for voxels in the needed blocks. The bug is in this block update routine.

To avoid depending on the Kinect, I modified the project to use a GPU-generated depth map as input (a rotating sphere in the foreground with a wall in the background). To make the extremely slow WARP device also produce a reasonable result, I made the animation frame-based rather than time-based, and I changed the volume resolution to 64^3. You can change it to 512^3, and it will run at 70 fps on a GTX 680M, but remember to make the voxel size small to see the whole picture; you will then see that this is a data corruption bug.

You can also press the 'ResetVolume' button to reset the related resources. But the other features shown in the right panel may not work or may even cause crashes, since I removed a lot of important components in a very short time...

If you directly compile and run the project you will see the following

[attachment=34634:Bug.PNG]

So you can see the sphere is broken, and the background wall is broken, due to wrong block updates. I visualize a wrongly updated (missing) block as a small red box and a correct block as a big green box, so ideally you should not see any small red boxes; only big green boxes should appear and disappear as the sphere moves, as in the following HWDeviceReference and WarpDeviceReference:

[attachment=34632:HWDeviceReference.PNG][attachment=34633:WarpDeviceReference.PNG]

*I lied: the WARP device won't give you the expected result (though it still doesn't show this bug) unless you uncheck the circled checkbox, but that is totally unrelated: that setting affects the rendering part, while the bug is in the volume updating part.

So to 'solve' the bug, there are three ways:

1. Change Core::g_config.enableDebuglayer to true in file KinectVisualizer.cpp line 180, and make sure you are in a Debug build (the debug layer is disabled in other builds):


void
KinectVisualizer::OnConfiguration()
{
    Core::g_config.FXAA = false;
    Core::g_config.warpDevice = false;
    Core::g_config.enableDebuglayer = false; // changing this to true enables the debug layer
    Core::g_config.enableGPUBasedValidationInDebug = false;
    Core::g_config.swapChainDesc.Width = _width;
    Core::g_config.swapChainDesc.Height = _height;
    Core::g_config.swapChainDesc.BufferCount = 5;
    Core::g_config.passThroughMsg = true;
    Core::g_config.useSceneBuf = false;
}



This enables the debug layer, and under a Debug build the bug magically disappears (I don't know why...).

2. Comment out line 1215 in file TSDFVolume\TSDFVolume.cpp:

    cptCtx.DispatchIndirect(_indirectParams, 0);
}
//======================================================================
// Code Part A
//
// The following line will cause the bug if 'Code Part B' is commented

BeginTrans(cptCtx, _occupiedBlocksBuf, UAV); // this line

// Add blocks to UpdateBlockQueue from DepthMap
Trans(cptCtx, _fuseBlockVol, UAV);
Trans(cptCtx, *pDepthTex, psSRV | csSRV);
Trans(cptCtx, *pWeightTex, csSRV);
Trans(cptCtx, _updateBlocksBuf, UAV);


This removes the split transition (with the begin half gone, the end half automatically becomes a normal transition). This 'fixes' the bug; I also don't know why...
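For readers unfamiliar with split barriers, the relationship between the split pair and the single normal barrier it collapses into can be modeled in isolation. This is a standalone sketch with mock types, not the Windows SDK; in the real D3D12 API the two halves are the same transition barrier issued twice, first with D3D12_RESOURCE_BARRIER_FLAG_BEGIN_ONLY and then with D3D12_RESOURCE_BARRIER_FLAG_END_ONLY.

```cpp
#include <cassert>

// Minimal mock of D3D12 transition-barrier semantics. The names mirror the
// real API, but these are standalone illustrative types.
enum class State { UAV, SRV, RTV };
enum class Flag { None, BeginOnly, EndOnly };

struct Transition {
    State before, after;
    Flag  flag;
};

// A split barrier records the same transition twice: the BeginOnly half lets
// the driver start the transition early, the EndOnly half completes it.
// Removing the Begin half leaves a single full barrier describing the same
// state change, just without the overlap window.
Transition CollapseSplit(const Transition& begin, const Transition& end) {
    assert(begin.flag == Flag::BeginOnly && end.flag == Flag::EndOnly);
    assert(begin.before == end.before && begin.after == end.after);
    return Transition{begin.before, begin.after, Flag::None};
}
```

Semantically the collapsed barrier and the split pair describe the same transition; the split form only gives the driver a window to hide the transition cost, which is why replacing one with the other should not change the rendered result.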

3. Uncomment the 3 lines of code starting at line 1259 in TSDFVolume\TSDFVolume.cpp:

//======================================================================
// Code Part B
//
// The following 3 lines are a workaround for the bug if
// 'Code Part A' is uncommented

//cptCtx.Flush();
//cptCtx.SetRootSignature(_rootsig);
//_UpdateAndBindConstantBuffer(cptCtx);

// Update voxels in blocks from UpdateBlockQueue and create queues for
// NewOccupiedBlocks and FreedOccupiedBlocks
Trans(cptCtx, _occupiedBlocksBuf, UAV);
Trans(cptCtx, _renderBlockVol, UAV);
Trans(cptCtx, _fuseBlockVol, UAV);
Trans(cptCtx, _newFuseBlocksBuf, UAV);


This ends recording of the current cmdlist, flushes all cached resource barriers, submits the list to the GPU for execution, grabs a new cmdlist from the cmdlist pool for the following GPU calls, and then sets the root signature and constant buffer back. This also 'fixes' the bug, and I don't know why...
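To make the mechanics of that workaround concrete, here is a standalone mock of the flush-and-rebind sequence. MockContext and its members are illustrative names for this sketch, not the repo's actual classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Standalone model of the flush workaround: Flush() submits everything
// recorded so far and hands back a fresh command list.
struct MockContext {
    std::vector<std::string> pending;   // work recorded on the current cmdlist
    std::vector<std::string> submitted; // work already handed to the GPU
    bool rootSigBound = false;

    void SetRootSignature() { rootSigBound = true; }
    void Record(const std::string& cmd) { pending.push_back(cmd); }

    // Close the current list, submit it, and start a new one. Per-list state
    // (root signature, bound constant buffers) does not carry over, which is
    // why the workaround re-binds them right after the flush.
    void Flush() {
        submitted.insert(submitted.end(), pending.begin(), pending.end());
        pending.clear();
        rootSigBound = false; // a fresh cmdlist starts with no state
    }
};
```

The key observable effect is the split into two submissions: the GPU fully executes the first batch of work (including its barriers) before any command from the second batch runs, which is a stronger ordering guarantee than a single list provides.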

Please let me know if you have any trouble compiling and running the code; any comments or words are appreciated.

Edited by Mr_Fox

##### Share on other sites

I get an error here:

macro_[2].Definition = "1";
V(_Compile(L"BlockQueueCreate_cs", macro_, &blockQCreate_Pass1CS));
macro_[2].Definition = "0"; macro_[3].Definition = "1";
V(_Compile(L"BlockQueueCreate_cs", macro_, &blockQCreate_Pass2CS));   <--

The first shader compiles fine, but the second fails:

...
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\ucrtbase.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\vcruntime140.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\uxtheme.dll'. Cannot find or open the PDB file.
[ INFO    ]: DX12Framework start
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\msctf.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\dwmapi.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\rmclient.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\amdxc64.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\version.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\amdihk64.dll'. Module was built without symbols.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\detoured.dll'. Module was built without symbols.
[ INFO    ]: D3D12-capable hardware found (selected):  AMD Radeon (TM) R9 Fury Series (4072 MB)
[ INFO    ]: D3D12-capable hardware found:  Microsoft Basic Render Driver (0 MB)
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\amdxc64.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\version.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\amdihk64.dll'. Module was built without symbols.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\detoured.dll'. Module was built without symbols.
[ WARN    ]: Tier 1, 2 and 3 are supported.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\DXGIDebug.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\dcomp.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\rsaenh.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\bcrypt.dll'. Cannot find or open the PDB file.
'MiniEngine-KinectVisualizer.exe' (Win32): Loaded 'C:\Windows\System32\cryptbase.dll'. Cannot find or open the PDB file.
[ INFO    ]: QueryHeap created
[ INFO    ]: Typed load is supported
[ ERROR    ]: C:\dev\BugRepo-master\x64\Debug\TSDFVolume_BlockQueueCreate_cs.hlsl(108,9-39): error X4532: cannot map expression to cs_5_1 instruction set

The program '[0x2174] MiniEngine-KinectVisualizer.exe' has exited with code 0 (0x0).


108 probably refers to the line number in the source file? That line has:

InterlockedOr(

##### Share on other sites

I get an error here:

Thanks for trying, but I just tried on all the machines in my lab and they don't get such errors (though I only have one GTX 680M and two GTX 1080 machines). I guess I should ask for an AMD test machine...

All the shader sources are there, and 108 is the line number; for this kind of prototype project I compile all shaders at runtime. Unfortunately I can't reproduce the error you encountered. Maybe you could try compiling with SM 5.0? (In file TSDFVolume\TSDFVolume.cpp lines 123-125, change all "_5_1" to "_5_0".) At least this change works on the Nvidia GPUs here.
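The suggested edit amounts to lowering the compile-target string passed to the shader compiler. A hypothetical helper (not code from the repo) illustrating the substitution:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper showing the suggested edit: lowering a compile-target
// string from Shader Model 5.1 to 5.0 ("cs_5_1" -> "cs_5_0"). In the repo
// the equivalent change is made by editing the string literals by hand.
std::string LowerToSm50(std::string target) {
    const std::string from = "_5_1";
    const size_t pos = target.find(from);
    if (pos != std::string::npos)
        target.replace(pos, from.size(), "_5_0");
    return target;
}
```
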

##### Share on other sites

Or you could try turning the debug layer on in file KinectVisualizer.cpp line 180 to see whether that gives a more informative error message? It's weird that the compile result is so different across vendors...

##### Share on other sites

Changing to 5_0 fixed it, and it runs on AMD now. Changing all volume size settings to 512, I get 7 ms in release mode.

The bug does not happen; I always see a whole sphere.

There are some artifacts at the silhouette (like in the screenshots you posted), but I guess this does not indicate the bug.

...making an NV driver bug more likely.

##### Share on other sites

Thanks JoeJ, I updated the repo with that fix.

The artifacts on the silhouette are caused by deliberately rejecting surfaces whose normal diverges too much from the view direction; that's necessary since those pixels are less stable from the Kinect, so I just reject them.

I wish more people would test it and confirm it is not a bug of mine somewhere...

And, well, that's both good news and bad news for me: on the good side, I think I can just use the workaround and move forward; on the bad side, it means all those days of work were wasted...

But that brings up a question: in a prototype project like this, I can hit a weird bug that takes me several days and turns out not to be my fault. A much more complex game engine must hit many more such issues, so how do they know whether something is a driver bug or not? Tracking down such bugs could be a waste of precious time for them.

Again, thank you for helping me with this, really appreciated.

##### Share on other sites

I can tell you that on an Intel card (HD 530) the application TDRs. Do you have a DX12 capable Intel GPU to test on?

##### Share on other sites

I can tell you that on an Intel card (HD 530) the application TDRs. Do you have a DX12 capable Intel GPU to test on?

Thanks Adam, I will try to find a DX12-capable Intel GPU to test on within an hour. Do you have another dedicated GPU to see whether it works there? Thanks.

##### Share on other sites

But that brings up a question: in a prototype project like this, I can hit a weird bug that takes me several days and turns out not to be my fault. A much more complex game engine must hit many more such issues, so how do they know whether something is a driver bug or not? Tracking down such bugs could be a waste of precious time for them.

Driver bugs are a common thing. I can't confirm this is a driver bug (I lack the DX knowledge), but if you think the behaviour is against the spec, you should talk to NV. They will listen, look at your repo, and fix the bug, although the process might take some time (months).

To make it easier for them, you could strip down your project until only the bug remains (e.g. printing just a few numbers to prove the misbehaviour would be enough).

Usually you just post on their public forums.

I can tell you that on an Intel card (HD 530) the application TDRs. Do you have a DX12 capable Intel GPU to test on?

In the reduction code there is the assumption that the GPU has at least 32 threads in lockstep and some LDS barriers are missing.

Intel has only 8 threads in lockstep, so the code will likely go wrong.

You should add all the barriers and leave it to the compiler to remove them if it's safe to do so.
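This failure mode can be demonstrated without a GPU. The simulation below models a tree reduction whose final steps omit group barriers, and schedules each SIMD batch of threads through the whole barrier-free tail before the next batch runs, which is a legal execution order when no barrier forces otherwise. With a 32-wide batch the result is correct; with an 8-wide batch some reads see stale partial sums. All names here are illustrative; this is not the repo's HLSL.

```cpp
#include <cassert>
#include <vector>

// CPU simulation of a tree reduction over "groupshared" memory. Threads
// execute in lockstep batches of simdWidth. Without barriers, a legal
// schedule is to run each batch through the entire barrier-free tail before
// the next batch starts, which is what this models.
int ReduceTail(int groupSize, int simdWidth) {
    std::vector<int> shared(groupSize, 1); // each thread contributes 1
    // Properly synchronized phase: one step per barrier, strides above 32.
    for (int stride = groupSize / 2; stride > 32; stride /= 2)
        for (int t = 0; t < stride; ++t)
            shared[t] += shared[t + stride];
    // Barrier-free tail (strides 32..1), written assuming 32-wide lockstep.
    for (int batch = 0; batch < 32; batch += simdWidth)
        for (int stride = 32; stride >= 1; stride /= 2)
            for (int t = batch; t < batch + simdWidth; ++t)
                if (t < stride)
                    shared[t] += shared[t + stride];
    return shared[0]; // the correct sum is groupSize
}
```

With simdWidth 8, thread 0 reaches stride 16 before threads 16..23 have applied their stride-32 additions, so it folds in stale values; that is exactly the hazard a GroupMemoryBarrierWithGroupSync between steps would prevent in HLSL.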

One interesting thing: changing some block size value from 8x8 to a higher value (16x16 IIRC) in the UI doubled performance for me, from 7 ms to 3.6 ms.

Is that expected and is it possible to show detailed profiler timings somehow?

##### Share on other sites
In the reduction code there is the assumption that the GPU has at least 32 threads in lockstep and some LDS barriers are missing. Intel has only 8 threads in lockstep, so the code will likely go wrong. You should add all the barriers and leave it to the compiler to remove them if it's safe to do so.

Thanks JoeJ. The reduction code should be removed, since the actual call to that pass has been removed; I will update the repo.

One interesting thing: Changing some block size value from 8x8 to a higher value (16x16 IIRC) at the UI doubled performance for me from 7ms to 3.6 ms. Is that expected and is it possible to show detailed profiler timings somehow?

Yup, I built this project with lots of different options, as you can see from the side panel. There are two different block volumes, one for updating and one for rendering, and their resolutions can be changed; that is meant to be experimented with to figure out the best configuration.

For GPU timing: in the bottom right corner there is the Engine Panel, and its first collapsible line is the GPU Profiler; you can click on it to open options that show timings for all passes. But there is probably another NV bug in that, since I observe corrupted timing info only on the GTX 1080.

Edited by Mr_Fox