
[D3D12] Rasterizer Ordered View. What is it actually capable of?

This topic is 801 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


Hello,

 

I have started playing around with ROV and order independent transparency.

I have a very simple app with two transparent triangles intersecting.

 

The documentation does not say how to specify the ordering for them: back-to-front or front-to-back.

I thought I should be able to figure this out from the rendered result, so the pixel shader always overwrites the ROV with the last color it receives.

struct PSInput
{
	float4 screenSpacePos	: SV_Position;
	float4 color	        : COLOR;
};

RasterizerOrderedTexture2D<float4> BlendTexture : register(u0);

void Main(PSInput input)
{
	int2 texCoord = int2(input.screenSpacePos.xy);
	BlendTexture[texCoord] = input.color;
}

I specify the data for two triangles in one vertex buffer and then render them with one draw call.

The red triangle goes first and the blue triangle after.

struct Vertex
{
	Vector4f position;
	Vector4f color;
};

const Vertex vertices[] =
{
	// Red triangle
	{Vector4f( 0.5f,  0.75f, 0.8f, 1.0f), Vector4f(1.0f, 0.0f, 0.0f, 1.0f)},
	{Vector4f( 0.5f, -0.75f, 0.8f, 1.0f), Vector4f(1.0f, 0.0f, 0.0f, 1.0f)},
	{Vector4f(-0.4f,  0.0f,  0.2f, 1.0f), Vector4f(1.0f, 0.0f, 0.0f, 1.0f)},
	// Blue triangle
	{Vector4f(-0.5f,  0.75f, 0.8f, 1.0f), Vector4f(0.0f, 0.0f, 1.0f, 1.0f)},
	{Vector4f( 0.4f,  0.0f,  0.2f, 1.0f), Vector4f(0.0f, 0.0f, 1.0f, 1.0f)},
	{Vector4f(-0.5f, -0.75f, 0.8f, 1.0f), Vector4f(0.0f, 0.0f, 1.0f, 1.0f)},
};

In the result below, you can see that the red triangle indeed was rendered first and the blue triangle after.

I expected the red triangle to overwrite the blue one wherever red is closer to the viewer. However, this does not happen.

I was assuming pixel sync would work the same way per-pixel lists for OIT do.

It seems that ROV only ensures that primitives are rendered in the order they are submitted in, not in the order of pixels as if they are in a per-pixel list.

 

MSDN (https://msdn.microsoft.com/en-us/library/windows/desktop/dn914601(v=vs.85).aspx) states that

 

 

The order in which overlapping ROV accesses of pixel shader invocations are executed is identical to the order in which the geometry is submitted
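
To make that observation concrete, here is a small CPU-side sketch (not GPU code) of what the overwrite shader does for a single pixel when the ROV accesses run in submission order. The `Fragment` struct and `resolveOverwrite` function are made up for illustration, and the depth values are arbitrary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One fragment landing on a given pixel. Hypothetical type, for illustration only.
struct Fragment {
    int         submissionIndex; // 0 = red triangle, 1 = blue triangle
    float       depth;           // smaller = closer to the viewer
    std::string color;
};

// Mimics "BlendTexture[texCoord] = input.color" when the accesses for one
// pixel are executed in submission order. Depth is never consulted.
std::string resolveOverwrite(const std::vector<Fragment>& inSubmissionOrder) {
    std::string texel = "clear";
    int previous = -1;
    for (const Fragment& f : inSubmissionOrder) {
        assert(f.submissionIndex > previous); // caller must pass submission order
        previous = f.submissionIndex;
        texel = f.color;                      // plain overwrite, last writer wins
    }
    return texel;
}
```

Whether red or blue is closer at that pixel, the result is the color of the primitive submitted last, which matches what the screenshot shows.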

 

ROV.png

 

I would like to ask you these questions:

1. How do we specify the ordering for ROV?

2. Is the ordering on per-primitive or per-pixel level (like with per-pixel lists for OIT)?

 

Many thanks

Edited by _void_


@Hodgman, thank you for such a nice explanation.

 

I have just had a quick look at the presentation by NVIDIA you have linked, page 25 in particular.

 

 

One potential application for Raster Ordered View is order independent transparency rendering algorithms, which handle the case of an application that is unable to pre-sort its transparent geometry by instead having the pixel shader maintain a sorted list of transparent fragments per pixel.

 

After reading this you could come to the conclusion that ROV should do exactly what per-pixel list does and even have advantages over it.

 

At least it should take care of sorting your data. But it does not.

 

 

 

which handle the case of an application that is unable to pre-sort its transparent geometry by instead having the pixel shader maintain a sorted list of transparent fragments per pixel

 

You are the one responsible for sorting the data, since you set its order when you feed it into the input assembler (IA). The documentation is really misleading.

 

@Hodgman, thanks again!

Edited by _void_


After reading this you could come to the conclusion that ROV should do exactly what per-pixel list does and even have advantages over it.
 
At least it should take care of sorting your data. But it does not.

Yes, it does.

You're writing a blue and a red triangle. First the red triangle is sent to the GPU, then the blue triangle. The following can happen when you don't use ROVs:

Frame 0:
1. Blue's pixel shader is started
2. Red's pixel shader is started
3. Blue's pixel shader writes to the UAV.
4. Red's pixel shader writes to the UAV.
5. Blue's pixel shader finishes.
6. Red's pixel shader finishes.
7. The ROP unit sorts red & blue triangle so that the output of Blue is always written to the colour framebuffer after the Red one.

Result: Blue wrote to the UAV first, then Red did.

Frame 1:
1. Blue's pixel shader is started
2. Red's pixel shader is started
3. Red's pixel shader writes to the UAV.
4. Red's pixel shader finishes.
5. Blue's pixel shader writes to the UAV.
6. Blue's pixel shader finishes.
7. The ROP unit sorts red & blue triangle so that the output of Blue is always written to the colour framebuffer after the Red one.

Result: Red wrote to the UAV first, then Blue did.

In frame 0, Blue wrote to the UAV before Red. In frame 1, Red wrote to the UAV before Blue. You get a completely random order. If the UAV space is limited (e.g. it only supports up to 4 fragments per pixel), you can get a lot of flickering (e.g. if you have 8 triangles, any random 4 of these 8 triangles will be written to the UAV, so the result may vary a lot every frame).

With ROVs, Red will ALWAYS write to the ROV before Blue does. Granted, you need to sort the triangles yourself which normally isn't trivial. But at least you can make the assumption that the mesh rendered in the previous drawcall has already written to the ROV (e.g. sort at draw call level, rather than triangle level). Without ROVs, you cannot assume that unless you place full blown memory barriers between the draw calls (can get pretty expensive!).
Also even if you don't sort at all, you get the guarantee the exact same input will always generate the same output; without ROVs there will be race conditions so the output may vary per frame, despite having the same input. In other words you get deterministic order.
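
The flickering argument can be sketched on the CPU. This is only a model under stated assumptions: a seeded shuffle stands in for GPU scheduling, triangle IDs stand in for fragments, and the names `fillPixel`, `orderedArrival` and `unorderedArrival` are invented for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Keeps only the first `capacity` triangle IDs that reach the pixel, like a
// fixed-size per-pixel fragment array that drops fragments once it is full.
std::vector<int> fillPixel(const std::vector<int>& arrivalOrder, std::size_t capacity) {
    assert(capacity > 0);
    std::vector<int> stored;
    for (int id : arrivalOrder) {
        if (stored.size() < capacity) stored.push_back(id);
    }
    return stored;
}

// ROV model: accesses to the pixel happen in submission order (0, 1, 2, ...).
std::vector<int> orderedArrival(int triangleCount) {
    std::vector<int> ids(static_cast<std::size_t>(triangleCount));
    std::iota(ids.begin(), ids.end(), 0);
    return ids;
}

// Plain UAV model (assumption): a seeded shuffle stands in for whatever order
// the GPU happens to run the pixel shader invocations in on a given frame.
std::vector<int> unorderedArrival(int triangleCount, unsigned frameSeed) {
    std::vector<int> ids = orderedArrival(triangleCount);
    std::mt19937 rng(frameSeed);
    std::shuffle(ids.begin(), ids.end(), rng);
    return ids;
}
```

With 8 triangles and room for 4, `fillPixel(orderedArrival(8), 4)` keeps triangles 0 to 3 every frame, while `fillPixel(unorderedArrival(8, frame), 4)` keeps whichever 4 happened to arrive first for that seed.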

Edit: Note that all of this assumes you're not forcing earlydepthstencil. If you do, you will still get race conditions: if the GPU evaluates the depth of Blue before Red, the pixel shader of Red will never run (since we already know Blue will be in front of Red); if the GPU evaluates the depth of Red before Blue, the pixel shader of Red will be executed and its value then overwritten by Blue's.

Edited by Matias Goldberg


After reading this you could come to the conclusion that ROV should do exactly what per-pixel list does and even have advantages over it.

At least it should take care of sorting your data. But it does not.

What NV is saying in that quote is that ROVs are useful when implementing "per-pixel lists for OIT".
Without ROV, you'd have to use all sorts of atomic contraptions to safely generate your lists, but ROVs make it very simple to safely generate the lists.
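
As a rough illustration of that split of responsibilities, here is a CPU-side sketch of the read-modify-write a pixel shader invocation might perform on its pixel's list. The sorting is done by the shader code itself; the ROV only guarantees that two invocations touching the same pixel do not interleave. `Node` and `insertFragment` are hypothetical names, and depth is assumed to be normalized to [0, 1].

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry of a per-pixel fragment list. Hypothetical layout.
struct Node {
    float depth; // assumed normalized: 0 = nearest, 1 = farthest
    int   color; // placeholder payload
};

// Inserts a fragment so the list stays sorted back-to-front (largest depth first).
// On the GPU this whole read-modify-write would sit inside the ROV access,
// so no other invocation for the same pixel can run it at the same time.
void insertFragment(std::vector<Node>& pixelList, Node fragment) {
    assert(fragment.depth >= 0.0f && fragment.depth <= 1.0f);
    std::size_t i = 0;
    while (i < pixelList.size() && pixelList[i].depth >= fragment.depth) {
        ++i; // skip everything that is farther away than the new fragment
    }
    pixelList.insert(pixelList.begin() + static_cast<std::ptrdiff_t>(i), fragment);
}
```

Inserting the same fragments in a different submission order gives the same sorted list, as long as the list has room for all of them and no two fragments share a depth.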

And then the marketing guys get in there and "make easier"->"make possible" :lol:


@Matias many thanks for trying to help

 

Could you please elaborate on this? I am not really following you.

 

 

 

But at least you can make the assumption that the mesh rendered in the previous drawcall has already written to the ROV (e.g. sort at draw call level, rather than triangle level). Without ROVs, you cannot assume that unless you place full blown memory barriers between the draw calls (can get pretty expensive!).


 


 

GPUs must ensure that the output of the rasterized triangles stays in order (in the order they were submitted). But this doesn't mean they get processed in order.

Normally this is not a problem, since eventually the ROP or something similar will be the one performing the final sort. How this is done is extremely GPU-architecture dependent.

This breaks when UAVs enter the scene (the U in UAV stands for unordered), since UAVs are accessed while the shaders are running, and the shaders do not necessarily run in order (except when using ROVs).

 

Hence if your code does something like this:

DrawPrimitive( 3 triangles ); //Red
DrawPrimitive( 10 triangles ); //Blue
DrawPrimitive( 6 triangles ); //Yellow

The UAV may be filled with 3 yellow tris, 2 blue tris, 2 red tris, 2 yellow tris, 1 blue tri, etc. In that order. Following frame may have a completely different order.

 

The only ways to guarantee the UAV is first filled with 3 red tris, then 10 blue tris, then 6 yellow tris is with either:

1. ROVs.

 

2. Multiple passes, sending IDs and then performing a final sorting phase to ensure order.

 

3. Explicit memory barriers. In OpenGL it would be:

DrawPrimitive( 3 triangles ); //Red
glMemoryBarrier( GL_ALL_BARRIER_BITS );
DrawPrimitive( 10 triangles ); //Blue
glMemoryBarrier( GL_ALL_BARRIER_BITS );
DrawPrimitive( 6 triangles ); //Yellow

In D3D12:

DrawPrimitive( 3 triangles ); //Red
ResourceBarrier( UAV barrier ); //D3D12_RESOURCE_BARRIER_TYPE_UAV
DrawPrimitive( 10 triangles ); //Blue
ResourceBarrier( UAV barrier ); //D3D12_RESOURCE_BARRIER_TYPE_UAV
DrawPrimitive( 6 triangles ); //Yellow

Note: D3D11 doesn't provide explicit memory barriers.

 

Without the barriers or ROVs, you get data races. A barrier per draw would obliterate performance (if you have many draws), so that gets ruled out for most uses.

 

Normally it's not THAT chaotic, because most GPU architectures will consume draws serially and not in parallel (especially if there were state changes in between), or have some internal/implicit dependency that imposes some order. But you're relying on luck, and it may stop working on a different GPU.

Serial processing is possible but not guaranteed; hence a yellow triangle (which was submitted last) may be inserted first into your UAV. This is completely counter-intuitive, and for some algorithms it can cause issues.

 

So what I was trying to say about ROVs is this: you may not want, or be able, to sort all 19 triangles in these 3 draw calls (usually you will have thousands or millions of tris across thousands of draws, not to mention the problem of overlapping triangles). But you may want to sort the draws so you control the order at draw call granularity. E.g. you don't care which of the yellow triangles in a draw gets added to the ROV first, but you can at least ensure the Red triangles are inserted before the Blue and Yellow ones by issuing the draw call with the Red triangles before the Blue and Yellow ones.

 

In my experience UAVs aren't as chaotic as I'm making them sound, but you don't really get any guarantee. With ROVs, you get guarantees: no data races, deterministic output, and a lot of control over the order. Needless to say these benefits don't come for free, but they're much cheaper than the alternatives (explicit memory barriers, multiple passes and/or UAV sorting phases).

Controlling each individual tri could be prohibitively expensive, but controlling them at draw call level... that's another story.

In fact most game engines sort their draw calls every frame.
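
A minimal sketch of sorting at draw call granularity, assuming each draw carries some representative distance from the camera. The `DrawCall` struct, its fields and `sortBackToFront` are invented for the example and are not part of any engine or of D3D12.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-draw record: what to draw and how far away it is.
struct DrawCall {
    int   meshId;    // made-up handle identifying the mesh
    float viewDepth; // assumed: distance from the camera, larger = farther
};

// Orders draws back-to-front so that, with ROVs, farther meshes access each
// pixel before nearer ones. Triangles inside one draw keep their own order.
void sortBackToFront(std::vector<DrawCall>& draws) {
    auto fartherFirst = [](const DrawCall& a, const DrawCall& b) {
        return a.viewDepth > b.viewDepth;
    };
    // stable_sort keeps the original submission order for draws at equal depth
    std::stable_sort(draws.begin(), draws.end(), fartherFirst);
    assert(std::is_sorted(draws.begin(), draws.end(), fartherFirst));
}
```

The sorted vector would then be walked once per frame to record the draws in that order. Overlapping or intersecting meshes still cannot be resolved at this granularity.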

Edited by Matias Goldberg
