
# DX12 Mutexes for sharing resources


## Recommended Posts

I'm trying to share resources across different command queues. Obviously I want to keep the sharing to a minimum, but at some point something needs to be shared. AFAIK there's no mutex object in DX12, but does using fences work? Here's my current plan, which seems to work, but I'm wondering whether it could be better and what other people do. I want the compute queue to run as fast and as often as it can, but I also want the draw queue to be able to stop it when drawing needs to happen.

F0 <- 1

F1 <- 0

Compute loop:

* queue->Set F1 = 1
* queue->Wait for F0 == 1
* Exec command list
* queue->Set F1 = 0

Draw loop:

* cpu->Set F0 = 0, this should stop new computes from starting
* queue->Wait for F1 == 0, this should wait until the currently executing compute command list is finished
* Exec command list, compute queue should not be executing anything at this point
* queue->Set F0 = 1, compute queue can run again

I'm pretty sure there's a deadlock in there somewhere but whatever.

Should/could I do this with one fence instead of two? If I have a bunch of command lists pending inside the compute queue, I want the draw queue to take priority so that I can stack a bunch of stuff into compute but maintain priority for drawing. Obviously this will fall apart if a compute command list takes too long and the draw queue waits for it, so I want compute command lists to be pretty fast relative to draws.
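
One caveat with the plan above: as far as I know, a queue-side wait in D3D12 (`ID3D12CommandQueue::Wait`) is satisfied once the fence reaches or exceeds the requested value, so a wait for `F1 == 0` would pass immediately, and toggling a fence between 1 and 0 does not translate directly. A common alternative is to hand out increasing values. The following is a minimal CPU-side model of that idea, not D3D12 code. `Command`, `Queue`, `Simulate`, `PingPongTwoFences`, and `PingPongOneFence` are hypothetical names made up for this sketch, and the round-robin stepping is only an assumption about how two queues might interleave.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical CPU-side model of GPU queues and fences; not the D3D12 API.
struct Command {
    enum Kind { Wait, Exec, Signal } kind;
    std::size_t fence;    // index into the fence table (ignored for Exec)
    std::uint64_t value;  // fence value to wait for or to signal
    char tag;             // label written to the log by Exec
};
using Queue = std::vector<Command>;

// Steps every queue in turn until all are drained or none can advance.
// A Wait passes once the fence value is >= the requested value.
// Returns the Exec order, with '!' appended if some queue is stuck (deadlock).
std::string Simulate(const std::vector<Queue>& queues,
                     std::vector<std::uint64_t> fences) {
    std::vector<std::size_t> next(queues.size(), 0);
    std::string log;
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t q = 0; q < queues.size(); ++q) {
            if (next[q] == queues[q].size()) continue;  // queue drained
            const Command& c = queues[q][next[q]];
            assert(c.kind == Command::Exec || c.fence < fences.size());
            if (c.kind == Command::Wait && fences[c.fence] < c.value) continue;  // blocked
            if (c.kind == Command::Exec) log += c.tag;
            if (c.kind == Command::Signal) fences[c.fence] = c.value;
            ++next[q];
            progressed = true;
        }
    }
    for (std::size_t q = 0; q < queues.size(); ++q) {
        if (next[q] != queues[q].size()) return log + '!';
    }
    return log;
}

// Two fences: compute signals fence 0, draw signals fence 1.
std::string PingPongTwoFences(std::uint64_t rounds) {
    Queue compute, draw;
    for (std::uint64_t i = 1; i <= rounds; ++i) {
        compute.push_back({Command::Wait, 1, i - 1, ' '});  // previous draw finished
        compute.push_back({Command::Exec, 0, 0, 'C'});
        compute.push_back({Command::Signal, 0, i, ' '});
        draw.push_back({Command::Wait, 0, i, ' '});         // this round's compute finished
        draw.push_back({Command::Exec, 0, 0, 'D'});
        draw.push_back({Command::Signal, 1, i, ' '});
    }
    return Simulate({compute, draw}, {0, 0});
}

// One fence: odd values mean "compute finished", even values mean "draw finished".
std::string PingPongOneFence(std::uint64_t rounds) {
    Queue compute, draw;
    for (std::uint64_t i = 1; i <= rounds; ++i) {
        compute.push_back({Command::Wait, 0, 2 * i - 2, ' '});
        compute.push_back({Command::Exec, 0, 0, 'C'});
        compute.push_back({Command::Signal, 0, 2 * i - 1, ' '});
        draw.push_back({Command::Wait, 0, 2 * i - 1, ' '});
        draw.push_back({Command::Exec, 0, 0, 'D'});
        draw.push_back({Command::Signal, 0, 2 * i, ' '});
    }
    return Simulate({compute, draw}, {0});
}
```

In this model `PingPongTwoFences(3)` and `PingPongOneFence(3)` both return `"CDCDCD"`, so one fence is enough for strict alternation. Two queues that each begin with a wait on the other's first signal come back as `"!"`, which is the kind of deadlock worth checking for. Note that this models strict alternation only; it says nothing about queue priority, and letting compute run several batches between draws would mean the draw waits on the latest compute value submitted so far.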

I think setting the queue priority would let me do this -- i.e. if two command queues are waiting on the same fence, the one with higher priority should get it? But all the documentation says is:

> The priority for the command queue, as a D3D12_COMMAND_QUEUE_PRIORITY enumeration constant to select normal or high priority

so idk :(.
Edited by Dingleberry

##### Share on other sites

You should use fences for multiple queue synchronization on GPU.

Conceptually, I suggest something like this:

commandQueue1->ExecuteCommandLists(...);
// Insert a fence: the GPU sets it to `value` once the work above has finished.
commandQueue1->Signal(fence, value);
// Do some work...
// The next execution of commandQueue2 needs the results of commandQueue1.
commandQueue2->Wait(fence, value);
// The following execution will not happen until the fence is reached.
commandQueue2->ExecuteCommandLists(...);
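
The pattern above leaves open where `value` comes from. A minimal sketch of the bookkeeping, assuming one strictly increasing counter per fence, could look like the following. `FakeFence`, `FakeQueue`, `FenceTimeline`, and `SubmitDependent` are hypothetical names used only so the example compiles without the Windows SDK; in real D3D12 the corresponding calls are `ID3D12CommandQueue::ExecuteCommandLists(NumCommandLists, ppCommandLists)`, `Signal(ID3D12Fence*, UINT64)`, and `Wait(ID3D12Fence*, UINT64)`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins that only record the calls made on them.
struct FakeFence {};

struct FakeQueue {
    std::string name;
    std::string* trace;  // shared log of calls, in submission order

    void ExecuteCommandLists() { *trace += name + ".Execute;"; }
    void Signal(FakeFence*, std::uint64_t v) {
        *trace += name + ".Signal(" + std::to_string(v) + ");";
    }
    void Wait(FakeFence*, std::uint64_t v) {
        *trace += name + ".Wait(" + std::to_string(v) + ");";
    }
};

// Hands out strictly increasing values for a single fence.
struct FenceTimeline {
    FakeFence* fence;
    std::uint64_t last;

    explicit FenceTimeline(FakeFence* f) : fence(f), last(0) {}
    std::uint64_t Next() { return ++last; }
};

// Submits the producer's work, signals a fresh fence value behind it, and
// makes the consumer wait for that value before its own work is submitted.
// Returns the fence value that marks the producer's work as finished.
template <class QueueT>
std::uint64_t SubmitDependent(QueueT& producer, QueueT& consumer,
                              FenceTimeline& timeline) {
    assert(timeline.fence != nullptr);
    producer.ExecuteCommandLists();
    const std::uint64_t value = timeline.Next();
    producer.Signal(timeline.fence, value);
    consumer.Wait(timeline.fence, value);
    consumer.ExecuteCommandLists();
    return value;
}
```

Because every submission takes a new, larger value, an earlier wait can never be satisfied by accident by a later signal, and the returned value can be kept around if the CPU also needs to know when that work has completed.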

##### Share on other sites

Why would you down vote somebody for attempting to give an honest answer to your question, even if you disagree with it? You just said "I don't know the answer, but I know that's not it and you're wrong. Piss off mate". That's what you just said.

Edited by ExErvus

##### Share on other sites

> Why would you down vote somebody for attempting to give an honest answer to your question, even if you disagree with it? You just said "I don't know the answer, but I know that's not it and you're wrong. Piss off mate". That's what you just said.

I'm not saying that the downvote was justified, however let's not start an unnecessary heated discussion, let's wait for Dingleberry to respond.

##### Share on other sites

I don't want to talk about up/downvoting at all, unless you're talking about compute shader vote functions. I didn't tell anyone to piss off, they simply didn't read my post.

Edited by Dingleberry

##### Share on other sites

Have you looked at the nBodyGravity sample? I think this does exactly what you're trying to do: simulate as frequently as possible and every now and then render the results into the swapchain buffer. It uses multiple threads as well as multiple queues.

##### Share on other sites

I need to go over it again, but why are multiple threads necessary? In this case the threads don't seem to be performing a whole lot of work -- I understand it's a sample, but even then the situation seems to be a lot of compute work vs not too much time to assemble the compute command list. Wouldn't it be fine for the main thread to synchronize both command queues?