Is it possible that two or more fragments (perhaps from different primitives) with the SAME xy target (screen) coordinates are ever executed concurrently by multiple GPU threads?

That may sound odd, so let me rephrase it just to be sure.
Say I submit 2 triangles that will project to the very same target xy pixels (they may or may not have different z (depth)).

1. A vertex shader is invoked for each of the 6 vertices, and all 6 invocations can run at once (there are plenty of groups and units on the GPU, right?).
2. There is no geometry shader (irrelevant here). Both triangles will project to the same pixels of the target. The rasteriser rasterises and pixel shaders get invoked...
3. Is there a possibility that two pixel shaders (one for a fragment on triangle 0 and one for a fragment on triangle 1), both of which want to shade (and possibly write or scatter) at the very same target (screen) location, get executed truly concurrently (obviously in different thread groups)?
I suppose the answer is yes; I am obviously interested in the worst-case scenario. Am I right? Early-Z, the presence of discard, explicit depth writes, and similar peculiarities probably have a say in this.
The next question (DX11/GL4) would be: is it necessary to have ANY output target bound to the output merger, or is it enough to bind 0 RTVs and 1 UAV (via OMSetRenderTargetsAndUnorderedAccessViews())? I'm not talking about compute shaders.
Part 1: Theoretically, yes, they can run concurrently. The D3D/GL specs are intentionally not too specific about how this works under the hood, just that the end result needs to be deterministic. There *may* be specific hardware implementations that kill or rearrange operations if this happens in practice, but there are no blanket 'this is how it must work' restrictions I am aware of. When in doubt, use atomics.
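To illustrate the "use atomics" advice, here is a rough sketch of a pixel shader that scatters into a UAV with an interlocked add instead of a plain store. The buffer, constant-buffer layout, and names (`gCounts`, `gScreenWidth`) are my own invention, not anything from your setup:

```hlsl
// Hypothetical raw UAV bound at slot u0; names are illustrative only.
RWByteAddressBuffer gCounts : register(u0);

cbuffer Params : register(b0)
{
    uint gScreenWidth;  // assumed screen width in pixels
};

void PSMain(float4 pos : SV_Position)
{
    // Linear byte address for this pixel (4 bytes per counter).
    uint addr = (uint(pos.y) * gScreenWidth + uint(pos.x)) * 4;

    // Atomic increment: the update is not lost even if two fragments
    // with the same xy coordinate execute truly concurrently.
    uint prev;
    gCounts.InterlockedAdd(addr, 1, prev);
}
```

A plain `Store()` at the same address would be the racy version; the interlocked variant is what makes concurrent same-pixel fragments safe.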
Part 2: Good question; I think you may be okay. It's entirely possible to do depth-only rendering by not binding any RTVs (just a DSV), and I would think the same applies here.
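For what it's worth, an untested sketch of that binding, assuming an existing device context and UAV (`context`, `myUav`, and the dimensions are placeholder names). One gotcha: with no RTV or DSV bound there is nothing to infer the raster size from, so the viewport has to be set explicitly:

```cpp
// With no render target bound, rasterization dimensions come from
// the viewport, so set it to the intended "screen" size.
D3D11_VIEWPORT vp = {};
vp.Width    = static_cast<float>(screenWidth);   // assumed dimensions
vp.Height   = static_cast<float>(screenHeight);
vp.MaxDepth = 1.0f;
context->RSSetViewports(1, &vp);

// Bind zero RTVs, no DSV, and one pixel-shader UAV at slot 0
// (the UAV start slot must be >= the number of RTVs, so 0 is fine here).
ID3D11UnorderedAccessView* uavs[]  = { myUav };  // hypothetical UAV
UINT initialCounts[]               = { UINT(-1) };  // -1 = keep offset; only
                                                    // meaningful for append/consume UAVs
context->OMSetRenderTargetsAndUnorderedAccessViews(
    0, nullptr,   // NumRTVs = 0, no render target views
    nullptr,      // no depth-stencil view
    0, 1,         // UAVStartSlot, NumUAVs
    uavs, initialCounts);
```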
This is implementation specific (I know of cases where it is true and cases where it is not), but yes, on most high-end graphics hardware this will happen. Post-shader, there is the requirement that fragments must appear to be rendered as though they were processed in triangle submission order.
Thank you very much, guys. I will then have to assume (despite your last sentence, Crowley99) that it will happen, and I will use atomics. I'll try to update this thread with performance and results if I get to try the 0 RTV + 0 DSV + n UAV scenario.