Last time we discussed the big picture of compute shaders, and took a quick look at the syntax needed to get them set up. This time around, we will look a bit closer at the resources used with compute shaders, both in HLSL and in the application. Plus, we'll see how all of the index/identifier information is used within the compute shader for a useful purpose. So let's get started...
New Resources Overview
In D3D9 and D3D10, resources were bound to the pipeline as either read or write, but never both at the same time. For example, vertex/index buffers were always read, render targets were always written, and a texture could serve as either an input (a shader resource) or an output (a render target) - but only one at a time, never both simultaneously. The available resource view types were the Render Target View (RTV), Depth Stencil View (DSV), and Shader Resource View (SRV). Each of these resource views only allows one direction - read OR write.
Now D3D11 has changed all of that, introducing a new type of resource view that provides read/write access to a given resource through a single resource binding. The new resource view is called an 'Unordered Access View' (or UAV for short), and it can be used to bind resources to either the compute shader or the pixel shader.
With the UAV, a shader can read from or write to any location within the attached resource. This opens up the possibility of performing scatter operations in addition to gather operations. Together with the flexible threading model we discussed last time, you can start to consider the types of operations that are possible. For image processing (which includes post processing of rendered scenes), the familiar 'pixel shader' approach could be easily implemented: a thread is invoked to process each pixel of an image, and it can sample several pixels of the input image to determine what to write to the output. This is considered a gather operation.
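As a quick sketch of the gather case (the resource names, the 8x8 thread group size, and the 3x3 averaging kernel here are my own illustrative choices, not from a particular sample), each thread reads a neighborhood of the input texture and writes a single averaged texel to a UAV-bound output. It uses the dispatch thread ID system value, which we'll look at more closely later:

```hlsl
Texture2D<float4>   InputImage  : register( t0 );  // read-only input (SRV)
RWTexture2D<float4> OutputImage : register( u0 );  // read/write output (UAV)

[numthreads( 8, 8, 1 )]
void CSMAIN( uint3 DispatchID : SV_DispatchThreadID )
{
    // Gather: read a 3x3 neighborhood of the input...
    float4 Sum = float4( 0.0f, 0.0f, 0.0f, 0.0f );

    for ( int y = -1; y <= 1; y++ )
        for ( int x = -1; x <= 1; x++ )
            Sum += InputImage[ DispatchID.xy + int2( x, y ) ];

    // ...and write a single averaged value to the output.
    OutputImage[ DispatchID.xy ] = Sum / 9.0f;
}
```

Note that this sketch ignores the image borders - out-of-range reads simply return zero in D3D11, which slightly darkens the edges.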
In addition to this, you could also imagine implementing an algorithm that takes as input a list of particle locations and then calculates a new position for each particle - plus allows some particles to split into multiple additional particles (to simulate an explosion). Since a single input can end up writing to multiple output locations, we can consider this to be a scatter operation. We could also do combinations of these approaches to do both scatter and gather operations in the same shader.
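A minimal sketch of that scatter idea (the Particle struct, buffer names, and the 'explosion' condition are all hypothetical): each thread consumes one input particle and may emit more than one output element, so a single input scatters to multiple output locations:

```hlsl
struct Particle
{
    float3 position;
    float3 velocity;
};

StructuredBuffer<Particle>       InputParticles  : register( t0 );
AppendStructuredBuffer<Particle> OutputParticles : register( u0 );

[numthreads( 64, 1, 1 )]
void CSMAIN( uint3 DispatchID : SV_DispatchThreadID )
{
    Particle p = InputParticles[ DispatchID.x ];
    p.position += p.velocity;

    // Every particle writes at least one output element...
    OutputParticles.Append( p );

    // ...but some split into an additional particle (hypothetical condition).
    if ( length( p.velocity ) > 10.0f )
    {
        Particle q = p;
        q.velocity = -p.velocity;
        OutputParticles.Append( q );
    }
}
```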
So how do we implement these operations? One of the largest influences on what we can do comes from the resources that we use to read data from and write data to. Let's look at what resources are available and how to declare them in HLSL and in the application.
To start with, the compute shader has access to all of the resources from the standard shader core. This generally includes buffers and textures, as well as subsets of each of these resources. New to D3D11 are structured buffers, byte address buffers, and append/consume buffers. Each of these resources provides a unique ability, as shown in the following list:
Structured Buffers: Provides a resource that contains an array of structures, with the structure accessible in the shader via an index.
Byte Address Buffers: Provides a resource that represents a raw array of bytes. This lets you do pretty much whatever you want to with the memory, and should open up new algorithms with more general purpose data structures (i.e. data structures that require pointers between cells instead of using an inherent structure as in textures...).
Append/Consume Buffers: Provides a resource that allows for queue-like access - a resource can have elements added to it (append) and removed from it (consume) to allow for processing of elements in an ordered fashion.
With this much diversity, you can implement pretty much whatever algorithm you want to, including non-graphical operations. Other potential uses include physics, pathfinding, and animation with quite a few other possibilities in general.
Looking at all of these resource types from the algorithm point of view promises a great increase in potential, but when you first consider it from the application side, it seems like there are a ton of possibilities that your renderer functions will need to cover. However, resource creation really only deals with textures and buffers - you just configure the description appropriately before creation, and then you can use the special features in the shader once the resource is properly bound to the pipeline. Over the next few tips, I'll cover in more detail how to create each type of resource and the corresponding resource view for a given algorithm (that's a slight foreshadowing of what is to come...).
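To make that concrete, here is a rough application-side sketch for just one case - creating a structured buffer and a UAV for it (error handling is omitted, the Particle struct and element count are placeholders, and pDevice is assumed to be an existing ID3D11Device):

```cpp
struct Particle { float position[3]; float velocity[3]; };  // hypothetical element type

// A structured buffer is an ordinary buffer with the STRUCTURED misc flag
// and a per-element stride, plus the UNORDERED_ACCESS bind flag for UAV use.
D3D11_BUFFER_DESC desc;
ZeroMemory( &desc, sizeof( desc ) );
desc.ByteWidth           = sizeof( Particle ) * 1024;
desc.Usage               = D3D11_USAGE_DEFAULT;
desc.BindFlags           = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
desc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
desc.StructureByteStride = sizeof( Particle );

ID3D11Buffer* pBuffer = NULL;
pDevice->CreateBuffer( &desc, NULL, &pBuffer );

// The UAV over a structured buffer must use DXGI_FORMAT_UNKNOWN.
D3D11_UNORDERED_ACCESS_VIEW_DESC uav;
ZeroMemory( &uav, sizeof( uav ) );
uav.Format             = DXGI_FORMAT_UNKNOWN;
uav.ViewDimension      = D3D11_UAV_DIMENSION_BUFFER;
uav.Buffer.NumElements = 1024;

ID3D11UnorderedAccessView* pUAV = NULL;
pDevice->CreateUnorderedAccessView( pBuffer, &uav, &pUAV );
```

The other buffer types follow the same pattern with different flags (for example, byte address buffers use D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS), which is what I mean by "configure the description appropriately."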
So now we have resources that can be bound to the pipeline, but how do we declare and use resources in HLSL? Up until D3D11, the following standard resource objects could be specified and used:
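The listing didn't survive here, so as a reconstruction: these are the read-only resource objects available prior to D3D11 (the variable names are my own, and `float4` is just one possible element type):

```hlsl
Buffer<float4>             MyBuffer;
Texture1D<float4>          MyTexture1D;
Texture1DArray<float4>     MyTexture1DArray;
Texture2D<float4>          MyTexture2D;
Texture2DArray<float4>     MyTexture2DArray;
Texture2DMS<float4,4>      MyTexture2DMS;
Texture2DMSArray<float4,4> MyTexture2DMSArray;
Texture3D<float4>          MyTexture3D;
TextureCube<float4>        MyTextureCube;
TextureCubeArray<float4>   MyTextureCubeArray;
```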
These objects could be declared, have a resource bound to them, and then be used in the HLSL code. With the additional resource types mentioned above, we get some additional object types to use. Let's start with the existing types that can now be read/write instead of read only:
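Again reconstructing the missing listing: these are the read/write ('RW') counterparts of the existing buffer and texture objects (variable names are my own):

```hlsl
RWBuffer<float4>         MyRWBuffer;
RWTexture1D<float4>      MyRWTexture1D;
RWTexture1DArray<float4> MyRWTexture1DArray;
RWTexture2D<float4>      MyRWTexture2D;
RWTexture2DArray<float4> MyRWTexture2DArray;
RWTexture3D<float4>      MyRWTexture3D;
```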
These have obvious resource types, and simply get bound to the pipeline with a UAV instead of a normal shader resource view or render target view. The next type that we talked about was the structured buffer, which is declared as follows:
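The declaration listing is missing here, so as a sketch (the struct and its contents are placeholders of your own choosing):

```hlsl
struct MyStruct
{
    float4 position;
    float4 color;
};

StructuredBuffer<MyStruct>   MyInput;   // read-only, bound with an SRV
RWStructuredBuffer<MyStruct> MyOutput;  // read/write, bound with a UAV
```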
These objects allow you to declare a structure in your HLSL code and then declare a buffer of those structs - then you simply index the resource with the [] operator and an appropriate index. These have been exceedingly easy to use in my experiments so far, and we'll be talking more about these later on.
Next are the byte address buffers:
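Reconstructing this missing listing as well - byte address buffers come in read-only and read/write flavors, and are accessed by byte offset (the names here are my own):

```hlsl
ByteAddressBuffer   MyRawInput;   // read-only, bound with an SRV
RWByteAddressBuffer MyRawOutput;  // read/write, bound with a UAV

// Access is by byte offset, which must be a multiple of 4, e.g.:
//   uint value = MyRawInput.Load( 8 );
//   MyRawOutput.Store( 8, value );
```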
These basically allow you to declare a 4-byte-aligned array of memory for use in whatever you want to do with it in the compute shader. I would expect that these fellows will be quite popular with the folks that like to push the limits of traditional computing (as in 'who will be the first one to build a red-black tree in the compute shader' [grin]). I'm sure we'll be investigating these more in the future as well.
Up next are the Append/Consume buffers:
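One more reconstructed listing - these are declared over a structure just like structured buffers (the struct and names here are placeholders):

```hlsl
struct Particle { float3 position; float3 velocity; };

ConsumeStructuredBuffer<Particle> InputQueue;   // elements removed with .Consume()
AppendStructuredBuffer<Particle>  OutputQueue;  // elements added with .Append()
```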
These buffers essentially provide a stream-like interface for the programmer. One could imagine a typical algorithm using two such buffers: 'consume' elements from the input buffer, and 'append' results to the output buffer under some conditions. These also seem geared toward the more eccentric applications, and could allow for building basic data structures like queues and stacks.
That brief overview shows you the plethora of object types that can be declared and used in HLSL with shader model 5. If you include the new tessellation objects that I haven't discussed yet (InputPatch and OutputPatch), there is a grand total of 22 different objects to choose from! I'll have to wait until next time to begin the discussion of the group, thread, and dispatch IDs and how they become useful in performing parallel operations. The good stuff is coming when we start playing around with sample implementations, so get familiar with these objects for next time!