With some new details of Direct3D 11.1 becoming available over the past couple of days, it is clear that some interesting new capabilities are being built into the API. The two additions that seem to provide the biggest functional changes are: 1. The ability to utilize unordered access views (UAVs) in all of the programmable pipeline stages (previously they were only allowed in compute and pixel shaders). 2. The ability to perform logical operations in the blending stage of the output merger. I wanted to stop and think about what possible uses these new features could enable. As a warning, I haven't seen or run any D3D11.1 implementations, and the following ideas are strictly what I have thought up since hearing about the new features - they may or may not be plausible on actual hardware! Today I will talk a little about the addition of UAVs across the entire pipeline, and I will post again in the future about the logical operations in the output merger.
[subheading]Using UAVs In The Entire Pipeline[/subheading] UAVs provide random access read and write capabilities for device memory resources. In Direct3D 11, they are limited to the compute and pixel shaders only, but the new announcement in Direct3D 11.1 is that we can now use them in all of the programmable stages of the pipeline. The programmable stages have always had random access read capabilities through Shader Resource Views (SRVs), so I will limit my thoughts to what is now possible with the new capability to perform random access writes.
General purpose computation is supported quite well with compute shaders that use UAVs (in fact, the UAV is the only output mechanism that a compute shader has). This means that we shouldn't really consider general purpose computation in the other programmable pipeline stages - the new uses that UAVs bring to those stages would instead need to take advantage of each stage's location in the pipeline, and hence utilize the underlying data that is special or only available at that stage. With this in mind, we can consider each of the stages in sequence as we progress through the pipeline and consider what we find there. The following image of the Direct3D 11 pipeline is taken from our D3D11 book Practical Rendering and Computation with Direct3D 11.
[subheading]Vertex Shader[/subheading] The first stop in the pipeline that will gain the ability to use UAVs is the vertex shader stage. It is situated between the input assembler and the hull shader, and is generally intended to perform transformations on the vertices produced by the input assembler (where transformations typically include world, view, projection, and/or skinning calculations). The vertex shader-specific data therefore includes the object space vertex information and the post-transformation data, along with the SV_VertexID and SV_InstanceID system values. This data is limited to an individual vertex, as the vertex shader doesn't know anything about primitives. Even at this early point in the pipeline, the available data is already useful. While the object space data isn't really interesting (since it comes from vertex buffers already available in resources), it could be interesting to output some data on the results of what the input assembler is producing. In some of the more exotic drawing setups, a vertex can be constructed from several vertex buffers, can be the result of instancing, and can also be part of several primitives with indexed rendering. Gathering statistics in this area could provide some insight into how heavily the vertex shader stage is being used. When this data is combined with the post-transformation data, you can also gain some LOD-based information. For example, if a mesh is rendered and its output screen space positions are accumulated into a UAV in the form of a bounding box (with the min/max checks done in the VS before writing), then you can easily determine the onscreen size of that mesh and apply appropriate LOD techniques on the next rendering frame.
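The bounding box accumulation described above could be sketched roughly as follows. To be clear, this is purely hypothetical code of my own devising - the buffer layout, the `gMeshID` constant, and the fixed-point encoding are all assumptions, used here only because the HLSL interlocked intrinsics operate on integer types:

```hlsl
// Hypothetical sketch: a vertex shader that atomically accumulates a
// screen-space bounding box per mesh into a UAV. MeshBounds entries are
// assumed to be reset to (0xFFFF, 0xFFFF, 0, 0) before each frame.
struct BoundsEntry
{
    uint minX; uint minY;
    uint maxX; uint maxY;
};

RWStructuredBuffer<BoundsEntry> MeshBounds : register( u1 );

cbuffer PerObject : register( b0 )
{
    matrix WorldViewProj;
    uint   gMeshID;     // illustrative: index of this mesh in MeshBounds
};

float4 VSMain( float3 pos : POSITION ) : SV_Position
{
    float4 output = mul( float4( pos, 1.0f ), WorldViewProj );

    // Project to NDC, then quantize to a fixed-point screen coordinate so
    // that the integer atomics can be used (InterlockedMin/Max take uints).
    float2 ndc    = output.xy / output.w;
    uint2  screen = uint2( saturate( ndc * 0.5f + 0.5f ) * 65535.0f );

    InterlockedMin( MeshBounds[gMeshID].minX, screen.x );
    InterlockedMin( MeshBounds[gMeshID].minY, screen.y );
    InterlockedMax( MeshBounds[gMeshID].maxX, screen.x );
    InterlockedMax( MeshBounds[gMeshID].maxY, screen.y );

    return output;
}
```

Note that vertices behind the near plane would need special handling before the divide by w - that detail is glossed over in this sketch.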
This concept could become fully automated with the use of indirect rendering too. If every rendered mesh is given an ID, we could create a structured buffer where the mesh ID is the index into the buffer. Each rendered frame would use the vertex shader to determine the mesh's screen space bounding box and store it in our structured buffer. After all the rendering is done in a frame, a compute shader pass is used to calculate the appropriate LOD level and update the contents of another buffer with the appropriate indirect rendering arguments. This indirect rendering buffer would then be used to render all of the meshes in the next frame, repeating the process. You would get immediate, frame-by-frame and mesh-by-mesh LOD adjustments. Of course, there could be contention for writing to the UAV from all of the vertex shader instances, and this would need to be taken into consideration when designing the algorithm.
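The compute shader pass in this scheme might look something like the sketch below. Again, everything here is an assumption on my part - the buffer names, the three-LODs-per-mesh layout, and the area thresholds are all invented for illustration. The five-uint argument layout, however, matches what DrawIndexedInstancedIndirect expects:

```hlsl
// Hypothetical follow-up pass: one thread per mesh reads the bounding box
// accumulated during the previous frame and writes indirect draw arguments
// that select an LOD based on approximate screen coverage.
struct BoundsEntry { uint minX; uint minY; uint maxX; uint maxY; };
struct LodInfo     { uint indexCount; };    // illustrative per-LOD data

StructuredBuffer<BoundsEntry> MeshBounds   : register( t0 );
StructuredBuffer<LodInfo>     MeshLods     : register( t1 ); // 3 per mesh
RWBuffer<uint>                IndirectArgs : register( u0 ); // 5 per mesh

[numthreads( 64, 1, 1 )]
void CSMain( uint3 id : SV_DispatchThreadID )
{
    BoundsEntry b = MeshBounds[id.x];

    // Approximate on-screen area in fixed-point units, then pick an LOD.
    // The thresholds here are arbitrary placeholders.
    uint area = ( b.maxX - b.minX ) * ( b.maxY - b.minY );
    uint lod  = ( area > 1 << 28 ) ? 0 : ( area > 1 << 24 ) ? 1 : 2;

    // DrawIndexedInstancedIndirect argument layout:
    // { IndexCountPerInstance, InstanceCount, StartIndexLocation,
    //   BaseVertexLocation, StartInstanceLocation }
    uint base = id.x * 5;
    IndirectArgs[base + 0] = MeshLods[id.x * 3 + lod].indexCount;
    IndirectArgs[base + 1] = 1;
    IndirectArgs[base + 2] = 0;
    IndirectArgs[base + 3] = 0;
    IndirectArgs[base + 4] = 0;
}
```

A real implementation would also need to handle meshes that were never rendered (and hence have an empty bounding box) in the previous frame.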
[subheading]Hull Shader[/subheading] Next stop is the hull shader, which runs in two different phases (per control point and per patch). The special information that is available in this stage is the patch level information. Due to the very specific nature of this pipeline stage, I don't really see many novel uses for UAVs here. Perhaps they could be used during the debugging of new tessellation algorithms, or for gathering metrics about patches, such as the screen space area of a patch or the density of control points within a given area. I'm sure more uses will come along, but for now it isn't obvious what else I would do with the new technology...
[subheading]Domain Shader[/subheading] The domain shader is executed once for every tessellation point (or vertex, if you will) produced by the tessellator stage, and is more or less responsible for giving a position and set of attributes to the tessellated vertex. This means that our mesh data has been amplified by this point, and we should take that into consideration for any new UAV usage that we come up with. Any reading or writing done here will probably occur at a much higher number of invocations than in any of the earlier stages, so it should be done with care.
Really the best use I can think of for the domain shader is to implement your own stream output directly from this stage. Data could be written directly to a buffer resource, which could then be read somewhere else in the pipeline. This would be ideal for cases where you are using the tessellation stages but not the geometry shader. Even if you are using the geometry shader, the use of UAVs should have fewer restrictions than the stream output, and should allow a greater amount of data to be produced.
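A "manual stream output" from the domain shader might look roughly like this. This is a speculative sketch - the vertex layout and buffer names are invented, and it assumes the UAV is created with a hidden counter (the D3D11_BUFFER_UAV_FLAG_COUNTER flag) so that IncrementCounter can be used to allocate output slots:

```hlsl
// Hypothetical sketch: each tessellated vertex is appended to a structured
// buffer via the UAV's hidden counter, emulating stream output from the DS.
struct StreamedVertex
{
    float3 position;
    float2 texcoord;
};

RWStructuredBuffer<StreamedVertex> StreamOut : register( u1 );

struct PatchConstants
{
    float edges[3] : SV_TessFactor;
    float inside   : SV_InsideTessFactor;
};

struct ControlPoint
{
    float3 position : POSITION;
    float2 texcoord : TEXCOORD;
};

[domain("tri")]
float4 DSMain( PatchConstants pc,
               float3 bary : SV_DomainLocation,
               const OutputPatch<ControlPoint, 3> patch ) : SV_Position
{
    // Barycentric interpolation of the patch control points.
    StreamedVertex v;
    v.position = patch[0].position * bary.x
               + patch[1].position * bary.y
               + patch[2].position * bary.z;
    v.texcoord = patch[0].texcoord * bary.x
               + patch[1].texcoord * bary.y
               + patch[2].texcoord * bary.z;

    // Append to the UAV and continue down the pipeline as normal.
    uint slot = StreamOut.IncrementCounter();
    StreamOut[slot] = v;

    return float4( v.position, 1.0f );
}
```

Note that, unlike real stream output, the vertices land in the buffer in no particular order, so any consumer would have to treat them as an unordered point cloud or carry extra indexing data along with each vertex.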
[subheading]Geometry Shader[/subheading] Now we have the geometry shader, which has access to complete primitives just before they are passed to the rasterizer stage. There has been the ability to perform stream output from the geometry shader for quite some time, but stream output does not provide the random write access that UAVs will. This provides what I think is one of the most interesting new possibilities for these UAVs. Since the data is available prior to rasterization, it means that we can build a representation of a mesh which is not subject to rasterization aliasing. Consider the case of shadow mapping - there are so many algorithms out there that attempt to find a way to reduce the effects of aliasing produced by rasterization. If we utilize the geometry shader to produce a data structure that represents a mesh in light space, then we could later use the resulting representation for querying the visibility of a particular point from the light source. This technique has already been published and used in CPU-based shadowing algorithms, where it is called irregular Z-buffering. It should be possible to directly implement the irregular Z-buffer with the information available in the geometry shader.
Typical implementations utilize a grid of linked lists, which would be possible with UAVs. Any primitives not facing the light source could be discarded, while the remaining primitives are stored in a linked list for each cell of a rough grid. Later, visibility checks can be done by traversing a cell's linked list looking for an intersection. Strictly speaking, this technique could have been performed with the compute shader in much the same manner. However, using the geometry shader has the advantage that the tessellation results can be used directly, which is significantly simpler than re-implementing the entire tessellation functionality in the compute shader.
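The grid-of-linked-lists insertion could be sketched along these lines, borrowing the head-pointer-plus-node-buffer pattern used for per-pixel linked lists. Everything here is an assumption for illustration: the input is presumed to already be in light space, the grid resolution is arbitrary, and the facing test depends on the light-space convention:

```hlsl
// Hypothetical sketch: the geometry shader inserts each light-facing
// triangle into a per-cell linked list via a head pointer UAV and a node
// buffer UAV. The node buffer is assumed to have a hidden counter.
struct TriangleNode
{
    float3 v0;
    float3 v1;
    float3 v2;      // light-space triangle vertices
    uint   next;    // index of the next node in this cell's list
};

RWTexture2D<uint>                HeadPointers : register( u1 ); // 128x128 grid
RWStructuredBuffer<TriangleNode> Nodes        : register( u2 );

struct GSInput { float4 position : SV_Position; };

[maxvertexcount(3)]
void GSMain( triangle GSInput input[3], inout TriangleStream<GSInput> stream )
{
    float3 p0 = input[0].position.xyz;
    float3 p1 = input[1].position.xyz;
    float3 p2 = input[2].position.xyz;

    // Discard triangles facing away from the light (sign convention is an
    // assumption about the light-space setup).
    float3 n = cross( p1 - p0, p2 - p0 );
    if ( n.z <= 0.0f )
    {
        // Select the grid cell from the triangle's center point.
        float2 center = ( p0.xy + p1.xy + p2.xy ) / 3.0f;
        uint2  cell   = uint2( saturate( center * 0.5f + 0.5f ) * 127.0f );

        // Allocate a node and link it in at the head of the cell's list.
        uint newIndex = Nodes.IncrementCounter();
        uint oldHead;
        InterlockedExchange( HeadPointers[cell], newIndex, oldHead );

        Nodes[newIndex].v0   = p0;
        Nodes[newIndex].v1   = p1;
        Nodes[newIndex].v2   = p2;
        Nodes[newIndex].next = oldHead;
    }

    // Pass the primitive through unchanged.
    stream.Append( input[0] );
    stream.Append( input[1] );
    stream.Append( input[2] );
}
```

A large triangle could of course span multiple cells, so a real implementation would need to insert it into every cell its bounding box touches rather than just the cell containing its center.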