### Similar Content

• While working on a project using D3D12 I was getting an exception being thrown while trying to get a D3D12_CPU_DESCRIPTOR_HANDLE. The project is using plain C so it uses the COBJMACROS. The following application replicates the problem happening in the project.
#define COBJMACROS
#pragma warning(push, 3)
#include <Windows.h>
#include <d3d12.h>
#include <dxgi1_4.h>
#pragma warning(pop)

IDXGIFactory4 *factory;
ID3D12Device *device;
ID3D12DescriptorHeap *rtv_heap;

int WINAPI wWinMain(HINSTANCE hinst, HINSTANCE pinst, PWSTR cline, int cshow)
{
    (hinst), (pinst), (cline), (cshow);
    HRESULT hr = CreateDXGIFactory1(&IID_IDXGIFactory4, (void **)&factory);
    hr = D3D12CreateDevice(0, D3D_FEATURE_LEVEL_11_0, &IID_ID3D12Device, &device);
    D3D12_DESCRIPTOR_HEAP_DESC desc;
    desc.NumDescriptors = 1;
    desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_RTV;
    desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_NONE;
    desc.NodeMask = 0;
    hr = ID3D12Device_CreateDescriptorHeap(device, &desc, &IID_ID3D12DescriptorHeap, (void **)&rtv_heap);
    D3D12_CPU_DESCRIPTOR_HANDLE rtv = ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart(rtv_heap);
    (rtv);
}

The call to ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart throws an exception. Stepping into the disassembly for ID3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart shows that the error occurs on the instruction
mov  qword ptr [rdx],rax
which seems odd, since rdx doesn't appear to be used. Any help would be greatly appreciated. Thank you.

• By lubbe75
As far as I understand there is no real random or noise function in HLSL.
I have a big water polygon, and I'd like to fake water wave normals in my pixel shader. I know it's not efficient and the standard way is really to use a pre-calculated noise texture, but anyway...
Does anyone have any quick and dirty HLSL shader code that fakes water normals, and that doesn't look too repetitious?

• Hi,
I finally managed to get the Vulkan device that emulates DX11 working, but everything is flipped vertically now because Vulkan has a different clip space. What are the best practices for keeping these implementations consistent? I tried using a vertically flipped viewport, and while it works on an Nvidia 1050, the Vulkan debug layer throws error messages saying this is not supported by the spec, so it might not work on other hardware. Another possibility is to flip the clip-space position Y coordinate before writing it out in the vertex shader, but that requires changing and recompiling every shader. I could also bake it into the camera projection matrices, though I want to avoid that because then I would need to track down every place in the engine where I upload matrices... Any chance of an easy extension or something? If not, I will probably go with changing the vertex shaders.
• By NikiTo
Some people say "discard" has no positive effect on optimization. Other people say it will at least spare the texture fetches.

if (color.A < 0.1f)
{
    //discard;
    clip(-1);
}
// tons of reads of textures following here
// and loops too

Some people say that "discard" will only mask out the output of the pixel shader, while still evaluating all the statements after the "discard" instruction.

MSN>
discard: Do not output the result of the current pixel.
<MSN

As usual it is unclear, but it suggests that "clip" could discard the whole pixel (maybe stopping execution too).

I think that, at least for thermal and energy-consumption reasons, the GPU should not evaluate the statements after "discard", but some people on the internet say that the GPU computes the statements anyway. What I am more worried about are the texture fetches after discard/clip.

(What if, after the discard, I have an expensive branch decision that makes the neighboring pixels that took the cheap branch stall for nothing? This is crazy.)
• By NikiTo
I have a problem. My shaders are huge, in the sense that they contain a lot of code. Many of my pixels should be completely discarded. I could put a comparison and a discard at the very beginning of the shader, but as far as I understand, the discard statement does not save any workload at all, as the pixel has to stall until the long, huge neighboring shaders complete.
Initially I wanted to use the stencil to discard pixels before the execution flow enters the shader, even before the GPU distributes/allocates resources for this shader, to avoid stalling the pixel shader execution flow. I had assumed that Depth/Stencil discards pixels before the pixel shader, but I see now that it happens inside the very last stage, the Output Merger. It seems extremely inefficient to render a little mirror that way in a scene with a big viewport. Why did they put the stencil test in the Output Merger anyway? Handling of the stencil is so limited compared to other resources. Do people use stencil functionality at all for games, or do they prefer discard/clip?

Will the GPU stall the pixel if I issue a discard at the very beginning of the pixel shader, or will the GPU already start using the freed-up resources to render another pixel?

# DX12 [D3D12] One more bug...


Hi,

Porting my stuff to DirectX12 is proving to be a real challenge.

Right now I am in the process of converting my gpu occlusion code to DX12 and I noticed something very strange.

First off, this bug is happening with and without using WARP adapter.

ConstantBuffer<Parameters> g_parameters : register(b0);
RWStructuredBuffer<uint> VisibleInstances : register(u0);

{
uint instance = DTid.x;
if(instance < g_parameters.instanceCount)
VisibleInstances[instance] = instance;
}


Inspecting the buffer reveals that all the values are "0".

Before you ask: "g_parameters.instanceCount" is not 0, and the buffer was pre-filled with 0xffff so I could see whether anything was actually written to it.

If I slightly change the shader to:
(Removing "instance")
ConstantBuffer<Parameters> g_parameters : register(b0);
RWStructuredBuffer<uint> VisibleInstances : register(u0);

{
if(DTid.x < g_parameters.instanceCount)
VisibleInstances[DTid.x] = DTid.x;
}


Then VisibleInstances contains what I would expect.. {0, 1, 2, 3, ... }

I am building shaders using cs_5_1.

I could paste the disassembled code, but I can tell you that the only difference is that the first listing copies the value of DTid.x into r0.x before using it, whereas the second listing uses DTid.x directly.

Any idea what is going on?

##### Share on other sites

Does it make any difference if you compile the shaders as cs_5_0?

At least on the compiler I have here, there's only a difference in the compiled DXBC if I use /Od to disable optimisations. Are you using /Od?

##### Share on other sites

I am using /Od in debug but the problem is also in release (without /Od).

I had to convert some shaders to build with cs_5_0 but I was able to test it.

It does appear that the problem is specific to cs_5_1 as I get the expected behavior using cs_5_0.

##### Share on other sites
Is there a way to get unbounded descriptor tables on 5_0 profiles?

##### Share on other sites

No, there isn't.

I take it from your code snippet that you aren't providing a root signature at compile time?

##### Share on other sites
I do not; I build it in C++.

##### Share on other sites

Are you in a position to build a small repro and send it our way?

##### Share on other sites

I think so yes. I'll get to it tonight.

##### Share on other sites

Sorry Adam, I could not get back to you sooner, but I found the issue and it's not related to the shader compiler.

The problem was not "DTid.x" but "g_parameters.instanceCount".

After our discussion, I tried to isolate the problem in a smaller program and I could not get a 100% repro.

I am using root constants to pass "g_parameters.instanceCount" and in the smaller app, it was not packed the same way as in my big app.

Then I realized that root constants follow the same packing rules as constant buffers.

My root signature contained seven 32-bit root values to map to an HLSL struct that looks like this:

struct
{
float3 minC;
// < automatic padding here!! >
float3 maxC;
uint instanceCount;
};

I moved "instanceCount" after "minC" and it worked. I could also have added one more 32-bit value to the root signature to account for the padding.

So I learned something... Treat root constants just as you would treat constant buffers (in terms of packing rules).
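
To make the packing rule concrete, here is a minimal sketch in C of how offsets fall out for the struct above. It is a simplified model under stated assumptions: it handles only scalars and vectors of 1 to 4 32-bit components, and it applies the single rule that a member may not straddle a 16-byte (4-dword) register boundary. Arrays, matrices and nested structs are not covered. The names place_member and layout_members are hypothetical helpers invented for this illustration; they are not part of D3D12 or the shader compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Number of 32-bit components in one HLSL constant register (16 bytes). */
enum { REGISTER_DWORDS = 4 };

/*
 * Hypothetical helper: returns the dword offset at which a scalar or vector
 * member of `size_dwords` components (1..4) is placed, given that the
 * previous members end at `cursor`. A member that would straddle a 16-byte
 * register boundary is pushed to the start of the next register.
 */
size_t place_member(size_t cursor, size_t size_dwords)
{
    assert(size_dwords >= 1 && size_dwords <= REGISTER_DWORDS);
    size_t used_in_register = cursor % REGISTER_DWORDS;
    if (used_in_register + size_dwords > REGISTER_DWORDS)
        cursor += REGISTER_DWORDS - used_in_register; /* skip the padding */
    return cursor;
}

/*
 * Hypothetical helper: lays out `count` members in declaration order and
 * writes each member's dword offset to `offsets`. Returns the total number
 * of dwords spanned, padding included.
 */
size_t layout_members(const size_t *sizes, size_t count, size_t *offsets)
{
    size_t cursor = 0;
    for (size_t i = 0; i < count; ++i) {
        offsets[i] = place_member(cursor, sizes[i]);
        cursor = offsets[i] + sizes[i];
    }
    return cursor;
}
```

Under this model, the original order (float3, float3, uint, i.e. sizes 3, 3, 1) places the members at dword offsets 0, 4 and 7 and spans 8 dwords, so "instanceCount" lands outside a block of seven root constants. The reordered struct (float3, uint, float3, i.e. sizes 3, 1, 3) places them at offsets 0, 3 and 4 and spans exactly 7 dwords.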

Cheers!