[D3D12] Rasterizer Ordered View. What is it actually capable of?

6 comments, last by _void_ 8 years, 6 months ago

Hello,

I have started playing around with ROV and order independent transparency.

I have a very simple app with two transparent triangles intersecting.

The documentation does not say anything about how you can specify the ordering for them: either back-to-front or front-to-back.

I thought that I should be able to figure this out from the rendered result. So I always overwrite the ROV with the last received color in the pixel shader.


struct PSInput
{
	float4 screenSpacePos	: SV_Position;
	float4 color	        : COLOR;
};

RasterizerOrderedTexture2D<float4> BlendTexture : register(u0);

void Main(PSInput input)
{
	int2 texCoord = int2(input.screenSpacePos.xy);
	BlendTexture[texCoord] = input.color;
}

I specify the data for two triangles in one vertex buffer and then render them with one draw call.

The red triangle goes first and the blue triangle after.


struct Vertex
{
	Vector4f position;
	Vector4f color;
};	
const Vertex vertices[] = 
{
        // Red triangle
	{Vector4f( 0.5f,  0.75f, 0.8f, 1.0f), Vector4f(1.0f, 0.0f, 0.0f, 1.0f)},
	{Vector4f( 0.5f, -0.75f, 0.8f, 1.0f), Vector4f(1.0f, 0.0f, 0.0f, 1.0f)},
	{Vector4f(-0.4f,  0.0f,  0.2f, 1.0f), Vector4f(1.0f, 0.0f, 0.0f, 1.0f)},
        // Blue triangle
	{Vector4f(-0.5f,  0.75f, 0.8f, 1.0f), Vector4f(0.0f, 0.0f, 1.0f, 1.0f)},
	{Vector4f( 0.4f,  0.0f,  0.2f, 1.0f), Vector4f(0.0f, 0.0f, 1.0f, 1.0f)},
	{Vector4f(-0.5f, -0.75f, 0.8f, 1.0f), Vector4f(0.0f, 0.0f, 1.0f, 1.0f)},
};

In the result below, you can see that the red triangle indeed was rendered first and the blue triangle after.

I expected the red triangle to overwrite the blue triangle wherever the red one is closer to the viewer. However, this does not happen.

I was assuming pixel sync to work in a similar fashion as it works with per-pixel lists for OIT.

It seems that ROV only ensures that primitives are rendered in the order they are submitted in, not in the order of pixels as if they are in a per-pixel list.

MSDN (https://msdn.microsoft.com/en-us/library/windows/desktop/dn914601(v=vs.85).aspx) states that

The order in which overlapping ROV accesses of pixel shader invocations are executed is identical to the order in which the geometry is submitted

[Attached image: ROV.png]

I would like to ask you these questions:

1. How do we specify the ordering for ROV?

2. Is the ordering on per-primitive or per-pixel level (like with per-pixel lists for OIT)?

Many thanks


I would like to ask you these questions:
1. How do we specify the ordering for ROV?
2. Is the ordering on per-primitive or per-pixel level (like with per-pixel lists for OIT)?

You've quoted the documentation already -- The order in which overlapping ROV accesses of pixel shader invocations are executed is identical to the order in which the geometry is submitted. That's it.
ROV's don't just magically give you OIT out of the box -- but they are a useful tool for accelerating certain OIT algorithms.

The purpose of ROV's is that they allow pixel shader invocations to occur atomically, which means you can perform read->modify->write algorithms within the pixel shader.
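To make the read->modify->write idea concrete, here is a minimal CPU-side sketch in C++ (not shader code) of what each pixel shader invocation could do to a single ROV texel: read the stored depth, compare it with its own, and overwrite only if it is nearer. The names `Texel`, `Fragment`, `nearestWins` and `resolvePixel` are hypothetical, and the sketch assumes that a smaller depth means closer to the viewer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for one texel of the ROV: a color plus the depth
// of the fragment that currently owns it.
struct Texel {
    float r = 0.0f, g = 0.0f, b = 0.0f;
    float depth = 1.0f;   // assumption: 1.0 is the far plane, smaller is nearer
};

// Hypothetical stand-in for the input of one pixel shader invocation.
struct Fragment {
    float r, g, b;
    float depth;
};

// Read-modify-write on a single texel: keep whichever fragment is nearer.
// On the GPU this is only safe if invocations touching the same pixel
// cannot interleave, which is the guarantee an ROV provides.
void nearestWins(Texel& texel, const Fragment& frag) {
    assert(frag.depth >= 0.0f && frag.depth <= 1.0f);
    if (frag.depth < texel.depth) {   // read + compare
        texel.r = frag.r;             // modify + write
        texel.g = frag.g;
        texel.b = frag.b;
        texel.depth = frag.depth;
    }
}

// Apply all fragments of one pixel, one after another, in the given order.
Texel resolvePixel(const std::vector<Fragment>& fragments) {
    Texel texel;
    for (const Fragment& f : fragments) nearestWins(texel, f);
    return texel;
}
```

With a plain overwrite, as in the shader from the first post, the last fragment applied always wins. With the compare above, the nearer fragment wins whichever order the two are applied in, as long as each read-modify-write runs without interleaving.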

Normally, for a pixel where the red and blue triangles overlap, the GPU is allowed to execute the pixel shaders for both triangles concurrently. Even so, the "Output merger" stage of the pipeline ensures that the red triangle's result is written to the Render-Target-View first and the blue triangle's result second.
i.e. traditionally, the GPU timeline might look like:


[Red Pixel Shader]->export result->[OM writes red pixel to RTV]
[Blue Pixel Shader]->export result->....waiting.....waiting....->[OM writes blue pixel to RTV]

This is perfectly fine for regular rendering -- the OM ensures that the render target is written to according to the draw order... but, no guarantees are made about the order that the pixel shaders are executed in. With traditional algorithms, this doesn't really matter.

This traditional behavior only becomes a big problem if the pixel shader wants to use UAV's to perform a read-modify-write operation on the current pixel -- they're running concurrently in a non-deterministic order, so such an operation cannot be safely performed. What we need is a per-pixel mutex!
Within the pixel shader you might have:


Red:[Read from UAV][Compute result][Write to UAV]
Blue:.......[Read from UAV][Compute result][Write to UAV]

Or randomly, you might have:


Red:......[Read from UAV][Compute result][Write to UAV]
Blue:[Read from UAV][Compute result][Write to UAV]

Or randomly, if you're very lucky, everything might seem to be correct:


Red:[Read from UAV][Compute result][Write to UAV]
Blue:........lucky wait........lucky wait.......[Read from UAV][Compute result][Write to UAV]

This is a text-book race-condition. It's down to chance whether the two pixel shaders will communicate properly, or whether one will simply overwrite the other.

When using ROV's, the timeline changes to:


[Red Pixel Shader]->export result->[OM writes red pixel to RTV]
.....waiting......[Blue Pixel Shader]->export result->..waiting...[OM writes blue pixel to RTV]

And so, the read-modify-write operation becomes perfectly safe:


Red: [Read from UAV][Compute result][Write to UAV]
Blue:....waiting........waiting........waiting...[Read from UAV][Compute result][Write to UAV]

And now you can be sure that the blue pixel will read the value that was written by the red pixel. The two invocations of the pixel shader can now communicate with each other safely, as if there was a mutex on the pixel.
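The effect on a blend can be modelled on the CPU. The sketch below is an illustration in C++, not GPU code: `BlendFragment`, `blendOver`, `blendInArrivalOrder` and `blendInSubmissionOrder` are hypothetical names, only one color channel is tracked, and sorting by `primitiveId` is a modelling shortcut for the stall that real hardware would apply to the later invocation.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct BlendFragment {
    int primitiveId;   // position in submission order (0 = submitted first)
    float value;       // a single color channel, for brevity
    float alpha;
};

// "Over" blend expressed as a read-modify-write on the destination value.
float blendOver(float dst, const BlendFragment& f) {
    assert(f.alpha >= 0.0f && f.alpha <= 1.0f);
    return f.value * f.alpha + dst * (1.0f - f.alpha);
}

// Models a plain UAV when the invocations happen not to interleave:
// fragments are blended in whatever order they arrive.
float blendInArrivalOrder(const std::vector<BlendFragment>& arrived) {
    float dst = 0.0f;
    for (const BlendFragment& f : arrived) dst = blendOver(dst, f);
    return dst;
}

// Models an ROV: whatever the arrival order, the accesses to this pixel
// take effect in submission order.
float blendInSubmissionOrder(std::vector<BlendFragment> arrived) {
    std::stable_sort(arrived.begin(), arrived.end(),
                     [](const BlendFragment& a, const BlendFragment& b) {
                         return a.primitiveId < b.primitiveId;
                     });
    return blendInArrivalOrder(arrived);
}
```

For a red fragment (id 0, value 1.0, alpha 0.5) and a blue fragment (id 1, value 0.0, alpha 0.5), the arrival-order blend gives 0.25 or 0.5 depending on which arrives first, while the submission-order blend gives 0.25 both times.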

These HW-accelerated "per-pixel mutexes" mean that OIT algorithms that have complex per-pixel data structures can be written in new ways now.

See also:
https://software.intel.com/en-us/blogs/2013/03/27/programmable-blend-with-pixel-shader-ordering
http://advances.realtimerendering.com/s2013/2013-07-23-SIGGRAPH-PixelSync.pdf
http://international.download.nvidia.com/geforce-com/international/pdfs/GeForce_GTX_980_Whitepaper_FINAL.PDF
Intel has this feature available in D3D11, under the name "pixel shader ordering" or "PixelSync"

@Hodgman, thank you for such a nice explanation.

I have just had a quick look at the presentation by NVIDIA you have linked, page 25 in particular.

One potential application for Raster Ordered View is order independent transparency rendering algorithms, which handle the case of an application that is unable to pre-sort its transparent geometry by instead having the pixel shader maintain a sorted list of transparent fragments per pixel.

After reading this you could come to the conclusion that ROV should do exactly what per-pixel list does and even have advantages over it.

At least it should take care of sorting your data. But it does not.

which handle the case of an application that is unable to pre-sort its transparent geometry by instead having the pixel shader maintain a sorted list of transparent fragments per pixel

It is you who are responsible for sorting the data as you specify its order when you feed it into IA. The documentation is really misleading.

@Hodgman, thanks again!

After reading this you could come to the conclusion that ROV should do exactly what per-pixel list does and even have advantages over it.

At least it should take care of sorting your data. But it does not.

Yes, it does.

You're writing a blue and a red triangle. First the red triangle is sent to the GPU, then the blue triangle. The following can happen when you don't use ROVs:

Frame 0:
1. Blue's pixel shader is started
2. Red's pixel shader is started
3. Blue's pixel shader writes to the UAV.
4. Red's pixel shader writes to the UAV.
5. Blue's pixel shader finishes.
6. Red's pixel shader finishes.
7. The ROP unit sorts red & blue triangle so that the output of Blue is always written to the colour framebuffer after the Red one.

Result: Blue wrote to the UAV first, then Red did.

Frame 1:
1. Blue's pixel shader is started
2. Red's pixel shader is started
3. Red's pixel shader writes to the UAV.
4. Red's pixel shader finishes.
5. Blue's pixel shader writes to the UAV.
6. Blue's pixel shader finishes.
7. The ROP unit sorts red & blue triangle so that the output of Blue is always written to the colour framebuffer after the Red one.

Result: Red wrote to the UAV first, then Blue did.

In frame 0, Blue wrote to the UAV before Red. In frame 1, Red wrote to the UAV before Blue. You get a completely random order. If the UAV space is limited (e.g. it only supports up to 4 fragments per pixel), you can get a lot of flickering (e.g. if you have 8 triangles, any random 4 of these 8 triangles will be written to the UAV, so the result may vary a lot every frame).

With ROVs, Red will ALWAYS write to the ROV before Blue does. Granted, you need to sort the triangles yourself which normally isn't trivial. But at least you can make the assumption that the mesh rendered in the previous drawcall has already written to the ROV (e.g. sort at draw call level, rather than triangle level). Without ROVs, you cannot assume that unless you place full blown memory barriers between the draw calls (can get pretty expensive!).
Also even if you don't sort at all, you get the guarantee the exact same input will always generate the same output; without ROVs there will be race conditions so the output may vary per frame, despite having the same input. In other words you get deterministic order.
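The flickering case with a limited number of slots can be sketched as follows. This is a CPU-side illustration in C++ with hypothetical names (`keepFirstArrivals`, `keepFirstSubmitted`); fragments are reduced to their primitive IDs, and sorting stands in for the ordering that an ROV would enforce in hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the primitive IDs that end up in a per-pixel buffer with
// `capacity` slots when fragments are appended first come, first served.
std::vector<int> keepFirstArrivals(const std::vector<int>& arrivalOrder,
                                   std::size_t capacity) {
    assert(capacity > 0);
    std::vector<int> kept;
    for (int id : arrivalOrder) {
        if (kept.size() < capacity) kept.push_back(id);
    }
    return kept;
}

// Same buffer, but the appends take effect in submission order, as they
// would with an ROV. Sorting the IDs is only a modelling shortcut.
std::vector<int> keepFirstSubmitted(std::vector<int> arrivalOrder,
                                    std::size_t capacity) {
    std::sort(arrivalOrder.begin(), arrivalOrder.end());
    return keepFirstArrivals(arrivalOrder, capacity);
}
```

With 8 triangles and 4 slots, two frames whose fragments arrive as `{5, 2, 7, 0, 1, 3, 4, 6}` and `{3, 6, 0, 4, 1, 7, 2, 5}` keep different sets under `keepFirstArrivals`, but both keep `{0, 1, 2, 3}` under `keepFirstSubmitted`.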

Edit: Note that all of this assumes you're not forcing earlydepthstencil. If you do, you will still get race conditions: if the GPU evaluates the depth of Blue before Red, the pixel shader of Red will never run (since we already know Blue will be in front of Red); if the GPU evaluates the depth of Red before Blue, the pixel shader of Red will be executed and its value then overwritten by Blue's.

After reading this you could come to the conclusion that ROV should do exactly what per-pixel list does and even have advantages over it.

At least it should take care of sorting your data. But it does not.

What NV are saying in that quote, is that ROVs are useful when implementing "per-pixel lists for OIT".
Without ROV, you'd have to use all sorts of atomic contraptions to safely generate your lists, but ROVs make it very simple to safely generate the lists.

And then the marketing guys get in there and "make easier"->"make possible" :lol:
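As an illustration of the kind of per-pixel list the quote talks about, here is a minimal CPU-side sketch in C++ of a fixed-size, depth-sorted list that each fragment updates with a read-modify-write. It is not taken from the NVIDIA paper or any shipping implementation: `Node`, `PixelList`, `kMaxNodes`, `insertSorted` and `resolve` are hypothetical, smaller depth is assumed to be nearer, and dropping the farthest node on overflow is a simplification chosen for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

struct Node {
    float depth;   // assumption: smaller is nearer
    float value;   // a single color channel, for brevity
    float alpha;
};

constexpr std::size_t kMaxNodes = 4;   // hypothetical per-pixel budget

struct PixelList {
    std::array<Node, kMaxNodes> nodes{};
    std::size_t count = 0;
};

// Insert one fragment, keeping the nodes sorted nearest-first. This reads
// and rewrites the whole list, so on the GPU it needs the per-pixel
// ordering guarantee that an ROV provides.
void insertSorted(PixelList& list, Node frag) {
    assert(frag.alpha >= 0.0f && frag.alpha <= 1.0f);
    if (list.count == kMaxNodes) {
        // List is full: ignore fragments behind everything already kept,
        // otherwise drop the farthest node to make room.
        if (frag.depth >= list.nodes[kMaxNodes - 1].depth) return;
        --list.count;
    }
    std::size_t i = list.count;
    while (i > 0 && list.nodes[i - 1].depth > frag.depth) {
        list.nodes[i] = list.nodes[i - 1];   // shift farther nodes back
        --i;
    }
    list.nodes[i] = frag;
    ++list.count;
}

// Composite the stored nodes back-to-front over a background value.
float resolve(const PixelList& list, float background) {
    float dst = background;
    for (std::size_t i = list.count; i > 0; --i) {
        const Node& n = list.nodes[i - 1];
        dst = n.value * n.alpha + dst * (1.0f - n.alpha);
    }
    return dst;
}
```

Here the sorting by depth is done by the shader logic itself; the ROV only makes each insertion safe. As long as the list does not overflow and no two fragments share a depth, the resolved value is the same whichever order the fragments are inserted in.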

@Matias many thanks for trying to help

Could you please elaborate on this? I am not really following you.

But at least you can make the assumption that the mesh rendered in the previous drawcall has already written to the ROV (e.g. sort at draw call level, rather than triangle level). Without ROVs, you cannot assume that unless you place full blown memory barriers between the draw calls (can get pretty expensive!).

GPUs must ensure that the output of the rasterized triangles stays in order (in the order they were submitted). But this doesn't mean they get processed in order.

Normally this is not a problem, since eventually the ROP or something similar will be the one performing the final sort. How this is done is extremely GPU-architecture dependent.

This breaks when UAVs enter the scene (hence the name UAV: the U stands for unordered), since UAVs are accessed while the shaders are being run. The shaders do not necessarily run in order (except when using ROVs).

Hence if your code does something like this:


DrawPrimitive( 3 triangles ); //Red
DrawPrimitive( 10 triangles ); //Blue
DrawPrimitive( 6 triangles ); //Yellow

The UAV may be filled with 3 yellow tris, 2 blue tris, 2 red tris, 2 yellow tris, 1 blue tri, etc., in that order. The following frame may have a completely different order.

The only ways to guarantee the UAV is first filled with 3 red tris, then 10 blue tris, then 6 yellow tris is with either:

1. ROVs.

2. Multiple passes, sending IDs and then performing a final sorting phase to ensure order.

3. Explicit memory barriers. In OpenGL it would be:


DrawPrimitive( 3 triangles ); //Red
glMemoryBarrier();
DrawPrimitive( 10 triangles ); //Blue
glMemoryBarrier();
DrawPrimitive( 6 triangles ); //Yellow

In D3D12:


DrawPrimitive( 3 triangles ); //Red
ResourceBarrier();
DrawPrimitive( 10 triangles ); //Blue
ResourceBarrier();
DrawPrimitive( 6 triangles ); //Yellow

Note: D3D11 doesn't provide explicit memory barriers.

Without the barriers or ROVs, you get data races. A barrier per draw would obliterate performance (if you have many draws) so that gets ruled out for most of the uses.

Normally it's not THAT chaotic, because most GPU architectures will consume draws serially and not in parallel (especially if there were state changes in between), or have some internal/implicit dependency that imposes some order, but you're relying on luck and it may stop working on a different GPU.

It is neither impossible nor guaranteed that they will be processed serially; hence a yellow triangle (which was submitted last) may be inserted into your UAV first. This is completely counter-intuitive, and for some algorithms it can cause issues.

So what I was trying to say about ROVs is that although you may not want or be able to sort all 19 triangles in these 3 draw calls (usually you will have thousands or millions of tris with thousands of draws btw, not to mention the problem of overlapping triangles), you may want to sort the draws so you can control the order at draw call granularity (e.g. you don't care which of the yellow triangles in the draw gets added first to the ROV, but at least you can ensure the Red triangles are inserted before the Blue & Yellow ones by having the draw call with the Red triangles go before the Blue & Yellow ones).

In my experience UAVs aren't as chaotic as I'm making them sound, but you don't really get any guarantee. With ROVs, you get guarantees, no data races, deterministic output, and also a lot of control over the order. Needless to say these benefits don't come for free, but they're much cheaper than the alternatives (explicit memory barriers, using multiple passes and/or UAV sorting phases).

Controlling each individual tri could be prohibitively expensive, but controlling them at draw call level... that's another story.

In fact most game engines sort their draw calls every frame.
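Sorting at draw call granularity could look roughly like the sketch below. It is a hypothetical illustration in C++: `DrawCall`, `viewDepth` and `sortBackToFront` are made-up names, and using a single distance per draw (for example to the center of the mesh's bounding volume) is an assumption, not something taken from a particular engine.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct DrawCall {
    std::string name;
    float viewDepth;   // hypothetical: distance from the camera to the mesh
};

// Order the draws back-to-front so that, with ROVs, farther meshes are
// guaranteed to access the ROV before nearer ones.
void sortBackToFront(std::vector<DrawCall>& draws) {
    auto fartherFirst = [](const DrawCall& a, const DrawCall& b) {
        return a.viewDepth > b.viewDepth;
    };
    std::stable_sort(draws.begin(), draws.end(), fartherFirst);
    assert(std::is_sorted(draws.begin(), draws.end(), fartherFirst));
}
```

For draws named Red, Blue and Yellow at distances 2, 10 and 6, this yields the order Blue, Yellow, Red. Triangles inside a single draw are not reordered, which matches the trade-off described above.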

@Hodgman and @Matias thank you once again

