Detailed flow through the Direct3D 11 Pipeline

posted in On the path with a ramblin' man

Published March 21, 2009

Feedback on the below is greatly appreciated!

Detailed flow

At first glance the new stages don't appear overly complex, but on closer inspection, when designing and writing code, the flow of data and the responsibilities of each unit can quickly become confusing. This is compounded by the deeper pipeline being harder to visualize - a classic VS+PS pipeline was small enough to be held in a developer's head, but trying to juggle all 6 stages can get much more difficult!

The above diagram shows a simplified flow of the Curved Point Normal implementation covered later in this article. To help make the diagram clearer, the conceptual view discussed in the previous section has been included with a matching colour scheme. Arrows show the flow of data and/or execution, and individual shader functions are denoted in the form "xx(yy) -> zz", where 'xx' is the abbreviation of the shader type, 'yy' are the important parameter(s) and 'zz' is the output.

Before getting involved in the full details there are two important and general observations to be made:

Several of the shaders are executed repeatedly for different inputs according to configuration by the developer. Whilst a bit obvious, it is important to note that a Domain Shader transforms only a single new point at a time; unlike a Geometry Shader, the code you write does not have to write all of the new points out to a stream.
Previously in Direct3D 10 the Geometry Shader was the only unit that had visibility of the entire set of inputs. Now the Hull, Domain and Geometry Shaders have visibility of all outputs from the previous stage.

The Input Assembler

As previously mentioned, the Input Assembler can now output primitives with up to 32 vertices, but regardless of this the IA functions in exactly the same way as it has in previous versions. It uses the vertex declaration (an ID3D11InputLayout created from an array of D3D11_INPUT_ELEMENT_DESC structures), a vertex buffer (an ID3D11Buffer with a binding of D3D11_BIND_VERTEX_BUFFER) and an index buffer (another ID3D11Buffer with a binding of D3D11_BIND_INDEX_BUFFER). The topology set on the device will be D3D11_PRIMITIVE_TOPOLOGY_n_CONTROL_POINT_PATCHLIST, where 'n' is between 1 and 32, and the IA will then read the index buffer in chunks of 'n' and pick out the appropriate vertices from the vertex buffer.

In this example there are four vertices defining a quad on the XZ plane and six indices defining two triangles. Unlike many other tessellation algorithms there is no adjacency information required thus it doesn't appear to be any different from rendering a normal quad. Without tessellation the output would look like:
< image of just plain 4-vert quad >
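
As a rough illustration of the "read the index buffer in chunks of 'n'" behaviour described above, here is a CPU-side sketch in C++. It is not Direct3D code: Vertex, Patch and assemblePatches are hypothetical names invented for this example, and dropping an incomplete trailing chunk is simply a choice made in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one element of the vertex buffer.
struct Vertex { float x, y, z; };

// One patch: the 'n' vertices selected by 'n' consecutive indices.
using Patch = std::vector<Vertex>;

// Mimics the IA walking the index buffer in chunks of 'n' for an
// n-control-point patch list. Indices left over at the end that do not
// fill a whole patch are ignored in this sketch.
std::vector<Patch> assemblePatches(const std::vector<Vertex>& vertexBuffer,
                                   const std::vector<std::uint32_t>& indexBuffer,
                                   std::size_t n)
{
    assert(n >= 1 && n <= 32); // patch lists allow 1 to 32 control points
    std::vector<Patch> patches;
    for (std::size_t first = 0; first + n <= indexBuffer.size(); first += n) {
        Patch patch;
        for (std::size_t i = 0; i < n; ++i)
            patch.push_back(vertexBuffer.at(indexBuffer[first + i]));
        patches.push_back(patch);
    }
    return patches;
}
```

Fed with the four quad vertices and six indices from this example and n = 3, the sketch yields two three-vertex patches, which is why the vertex shader stage in the diagram sees six inputs rather than four.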

The Vertex Shaders

As previously stated, the vertex shader no longer has to output a projection space vertex to SV_Position like in Direct3D 10 (technically this could be done in a Geometry Shader in D3D10, but it was more efficient to stick with the conventional VS approach). It is now completely free to operate on data in whatever form the application gives it (via the IA) and output that data in whatever coordinate system or format it chooses.

The common use-case for a vertex shader with D3D11 tessellation will be for animation - transforming a model according to the bones provided for a skeletal animation being a good example. In this example the vertex shader simply transforms the model-space vertex buffer data into world-space for the later stages. Note that once the VS has executed the later stages cannot see any of the original data from the vertex buffer, so if this is useful it should be passed down as part of an output.

The Hull Shaders

This is the first new programmable unit in the Direct3D 11 pipeline. It is made up of two developer-authored functions - the Hull Shader itself and a 'constant function'.

The constant function is executed once per-patch and its job is to compute any values that are shared by all control points and don't logically belong as per-control-point attributes. As far as Direct3D 11 is concerned the requirement is to output an array of SV_TessFactor and SV_InsideTessFactor values. Depending on the primitive topology the size of these arrays varies, but all this is discussed in a later section. The outputs from a constant function are limited to 128 scalars (32 float4's) which gives ample room for per-patch constants once the tessellation factors have been included.

An attribute on the main Hull Shader function declares how many output control points will be generated. This can be a maximum of 32, and does not have to match the topology set on the Input Assembler - it is perfectly legal for the HS to increase or decrease the number of control points. This attribute also determines how many times the individual Hull Shader is executed: once for each declared output control point, with the index provided via an SV_OutputControlPointID uint input. Each invocation can output up to 32 float4's or 128 scalars, the same as the per-patch constant function, but with one difference: the maximum output across all HS invocations is 3,968 scalars. In practice this means that if you're outputting 32 control points then each one can only use 31 float4's instead of 32. Putting these numbers together (3,968 scalars for control points plus 128 for per-patch constants), the entire Hull Shader output is clamped to 4,096 32-bit scalars (16 KB).
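
The budget arithmetic above can be checked with a few lines of C++. This is only a worked calculation using the limits quoted in the text; the constant and function names are made up for the example.

```cpp
#include <algorithm>
#include <cassert>

// Limits quoted in the text above.
constexpr int kMaxScalarsPerControlPoint  = 128;  // 32 float4's per invocation
constexpr int kMaxScalarsAllControlPoints = 3968; // across all HS invocations
constexpr int kMaxPatchConstantScalars    = 128;  // constant function output

// Sum of both limits: the whole Hull Shader output, in 32-bit scalars.
constexpr int kMaxHullShaderScalars =
    kMaxScalarsAllControlPoints + kMaxPatchConstantScalars;

// Largest number of float4 outputs each control point can use when the
// Hull Shader declares 'outputControlPoints' output control points.
int maxFloat4PerControlPoint(int outputControlPoints)
{
    assert(outputControlPoints >= 1 && outputControlPoints <= 32);
    const int scalars = std::min(kMaxScalarsPerControlPoint,
                                 kMaxScalarsAllControlPoints / outputControlPoints);
    return scalars / 4; // four scalars per float4
}
```

With 32 output control points the per-point share of 3,968 scalars is 124, i.e. 31 float4's; with 31 or fewer control points the per-invocation limit of 32 float4's is the one that applies.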

Both functions have full visibility of all vertices output by the Vertex Shader and deemed to be part of this primitive. This is represented on the diagram by the blue arrows leading from the Vertex Shading section to each of the Hull Shader and constant function invocations.

In this example of Curved Point Normal Triangles (discussed in detail later) the later Domain Shader needs 10 control points to construct the appropriate cubic surface. These are broken up into 3 for the original vertices, 2 for each edge (6 in total) and 1 in the middle of the triangle, for 10 in total.
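
To make the 3 + 6 + 1 breakdown concrete, here is a CPU-side C++ sketch of how the 10 control points are commonly derived in the published Curved PN Triangles scheme. It is not the shader code from this article (that comes later): Vec3, edgePoint, buildControlPoints and the array layout are hypothetical, and the normals are assumed to be unit length.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Edge control point one third of the way from p towards q, projected onto
// the tangent plane defined by the unit normal n at p.
Vec3 edgePoint(Vec3 p, Vec3 q, Vec3 n)
{
    const float w = dot(q - p, n);
    return (p * 2.0f + q - n * w) * (1.0f / 3.0f);
}

// Assumed layout: [0..2] original vertices, [3..8] edge points, [9] centre.
std::array<Vec3, 10> buildControlPoints(const std::array<Vec3, 3>& p,
                                        const std::array<Vec3, 3>& n)
{
    std::array<Vec3, 10> cp{};
    for (int i = 0; i < 3; ++i) cp[i] = p[i];

    int k = 3;
    for (int i = 0; i < 3; ++i) {
        const int j = (i + 1) % 3;
        cp[k++] = edgePoint(p[i], p[j], n[i]); // edge point nearer p[i]
        cp[k++] = edgePoint(p[j], p[i], n[j]); // edge point nearer p[j]
    }
    assert(k == 9); // two points on each of the three edges

    Vec3 edgeSum{0.0f, 0.0f, 0.0f};
    for (int e = 3; e < 9; ++e) edgeSum = edgeSum + cp[e];
    const Vec3 E = edgeSum * (1.0f / 6.0f);             // mean of edge points
    const Vec3 V = (p[0] + p[1] + p[2]) * (1.0f / 3.0f); // mean of vertices
    cp[9] = E + (E - V) * 0.5f;                          // centre point
    return cp;
}
```

In a Hull Shader each of the 10 invocations would compute just one of these points, selected by its output control point index, rather than building the whole array in one go as this sketch does.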

The Fixed Function Tessellator

The next stage of processing is entirely fixed function and operates as a black-box except for the two inputs - the SV_TessFactor and SV_InsideTessFactor values output by the Hull Shader constant function.

A very important point to note is that the control points output by the Hull Shader are not used by this stage. That is, it does all of its tessellation work based on the two aforementioned inputs. The control points are entirely in existence for you as the developer to create your tessellation algorithm and the pipeline itself doesn't ever pay them any attention, hence why there are no restrictions (other than space) on the output from the individual invocations.
The output from the tessellator is a set of weights corresponding to the primitive topology declared in the Hull Shader - line, triangle or quad. Each of these weights gets fed into a separate Domain Shader invocation, discussed next. In addition to these newly created vertices which the developer actually sees as part of the Domain Shading stage the tessellator also handles the necessary winding and relations between domain samples so that they form correct triangles that can later be rasterized.

The Domain Shaders

Domain shaders run in isolation yet, as is necessary, they can see all the control points and per-patch constants output by the earlier Hull Shader stage. Simply put, the domain shader's job is to take the point on the line/triangle/quad domain provided by the tessellator and use the control mesh provided by the hull shader to create a completely new, renderable vertex.
It is now the Domain Shader's responsibility to output a projection space vertex coordinate to SV_Position (although, strictly speaking, the GS could do this but it'll be less efficient).
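
As an illustration of "one domain point in, one vertex out", here is a CPU-side C++ sketch that evaluates a cubic Bezier triangle from 10 control points and one set of barycentric weights. It stands in for what a triangle-domain shader might do for the Curved Point Normal example; PnPatch, its member names and evaluate are hypothetical, and the final transform into projection space for SV_Position is left out.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Hypothetical control mesh: b_ijk is weighted by a^i * b^j * c^k, where
// (a, b, c) are the barycentric weights of corners 0, 1 and 2.
struct PnPatch {
    Vec3 b300, b030, b003;                   // original vertices
    Vec3 b210, b120, b021, b012, b102, b201; // two points per edge
    Vec3 b111;                               // centre point
};

// Evaluates the cubic Bezier triangle at one domain location. A domain
// shader would run this once per point produced by the tessellator.
Vec3 evaluate(const PnPatch& p, float a, float b, float c)
{
    assert(a + b + c > 0.999f && a + b + c < 1.001f); // weights sum to one
    return p.b300 * (a * a * a) + p.b030 * (b * b * b) + p.b003 * (c * c * c)
         + p.b210 * (3.0f * a * a * b) + p.b120 * (3.0f * a * b * b)
         + p.b021 * (3.0f * b * b * c) + p.b012 * (3.0f * b * c * c)
         + p.b102 * (3.0f * a * c * c) + p.b201 * (3.0f * a * a * c)
         + p.b111 * (6.0f * a * b * c);
}
```

Note that the function reads every control point but produces a single position, which matches the observation earlier that the Domain Shader sees the whole patch yet only ever emits one new point per invocation.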

The Geometry Shaders

This stage remains unchanged from previous versions of Direct3D and effectively marks the end of any tessellation related programming. Unless the Domain Shader passed along any of the control mesh information as per-vertex attributes then the Geometry Shader has no knowledge of the tessellation work that preceded it.
Unlike in Direct3D 10 where the number of GS invocations was directly linked to the parameters of a draw call, the number of executions is now linked to the SV_TessFactor and SV_InsideTessFactor values emitted from the Hull Shader constant function. If these are constant and/or set by the application (e.g. via a constant buffer) then you can derive the number of GS invocations, but if a more intelligent LOD scheme is implemented then the number of invocations will be much more difficult to compute.

Rasterization and the Pixel Shaders

Tessellation is a geometry-based operation, thus the final rasterization stages remain completely unchanged and oblivious to anything that came before them.

One Final Note on Efficiency

The diagram presented here takes the naive approach of executing as many times as there are outputs or inputs. It is expected that hardware can take advantage of commonality (via pre- or post-transform caches for example) and reduce the number of invocations. The two best candidates are the vertex and domain shaders; in this example there are 6 VS invocations and 36 DS invocations yet there are only 4 unique control points and 14 domain points. Specifically, the example used here would do 50% more vertex shading and 157% more domain shading - in a field where performance is crucial it's easy to see why the hardware would want to be cleverer!
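
The two percentages quoted above follow directly from the invocation counts. A small C++ worked calculation, with a function name chosen purely for this example:

```cpp
#include <cassert>

// Extra work, in percent, of running 'invocations' shader executions when
// only 'uniqueInputs' distinct results are actually needed.
double redundantWorkPercent(int invocations, int uniqueInputs)
{
    assert(uniqueInputs > 0 && invocations >= uniqueInputs);
    return (static_cast<double>(invocations) / uniqueInputs - 1.0) * 100.0;
}
```

redundantWorkPercent(6, 4) gives 50 for the vertex shader, and redundantWorkPercent(36, 14) gives roughly 157 for the domain shader.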


Comments

Mike.Popoloski

I absolutely love the color coding and the way it's linked up with the overview. It's a good way to break down the increasingly complex pipeline. I approve.

March 21, 2009 11:19 PM

jollyjeffers

Quote:Original post by Mike.Popoloski
I approve.

Excellent, thanks! [grin]

March 22, 2009 02:39 AM

clb

Excellent job covering the new D3D11 ground so early on! Just a small note, you have a line "< image of just plain 4-vert quad >" there, was it supposed to show a picture instead?

March 22, 2009 04:10 AM

Jason Z

Great job covering many details in not so much space - I just have a couple small suggestions:

1. Add your shader code snippets in expandable code blocks by each explanation for your sample algorithm. It will help clarify what each stage is doing.

2. Perhaps add a shorter overview of the basic concept of each (new) stage to consider while reading - I think the HS and DS stages are going to be a big pill to swallow on first reading!

Anyhow, I think it turned out great, and the color coding is very effective. Also, I think you mentioned before that you were going to add images of the geometry at each stage - this will be immensely helpful too!

March 22, 2009 07:26 AM

sirob

Why are there two tessellators? Any chance you could reduce it to just one box that executes twice? (unless I misunderstood why there were two).

Looks awesome otherwise!

March 22, 2009 02:50 PM

jollyjeffers

Thanks for the feedback everyone - greatly appreciated [smile]

Quote:you have a line "< image of just plain 4-vert quad >" there, was it supposed to show a picture instead?

I thought I edited that out, oops! This text is an excerpt from a larger article and that placeholder is highlighted in bold+yellow so that I can fill it in once I've generated the image...

Quote:Add your shader code snippets in expandable code blocks by each explanation for your sample algorithm. It will help clarify what each stage is doing.

Later parts of the article go into the actual code, so I wasn't planning on including any here. This still counts as part of the introduction [smile]

Quote:add a shorter overview of the basic concept of each (new) stage to consider while reading

Yup, that already exists in the longer article - I just haven't posted it here! Glad you suggested it as I was debating whether the "high level" then "detailed view" were both necessary or if I should merge the two...

Quote:you mentioned before that you were going to add images of the geometry at each stage

Yup, I just need to run up the sample code and grab some shots...

Quote:Why are there two tessellators?

To correspond with the two paths through the pipeline - there are two primitives being rendered, thus I've effectively duplicated the diagram. I thought it'd be clearer how/where the tessellator worked if I had two of them - putting a single unit in seemed to suggest to me that they were somehow shared or that it was some sort of forced synchronisation of the pipeline...

Cheers,
Jack

March 23, 2009 08:39 AM


jollyjeffers

Author
