DirectX 11 Compute Shader tutorial

With the introduction of DirectX 11 come a number of exciting new features that you as a game developer (or a graphics technology enthusiast) will definitely want to play around with. The most prominent of them are the Tessellation Shaders and the Compute Shaders. Since my personal interest in graphics leans towards rendering, I am particularly interested in the compute shader. Those with an inclination towards geometry are probably more keen on diving into the tessellation shader. But whatever your focus of interest may be, I believe these upcoming features have a much more promising employment outlook than the geometry shaders, which in my opinion turned out to be complete duds.

Microsoft released their beta DirectX 11 SDK in the November update, which means you can start playing around with these new toys right away! I know what you are thinking: doesn't DirectX 11 require Windows 7? And is DirectX 11 compatible hardware even out yet? Fully DirectX 11 compatible hardware has not been released yet by any of the manufacturers (NVIDIA or ATI). But Microsoft has defined a subset of DirectX 11 features that will run on DirectX 10 class hardware, and luckily the compute shader is one of them! The formal name for this initiative is "DirectX 11 Compute on DirectX 10 hardware". We will start seeing public driver releases from NVIDIA and ATI for these features soon. The point, however, is that you can start developing your first DirectX 11 application right away ... with a reference device, of course.

This tutorial walks you through the steps of creating a simple application that uses compute shaders through the DirectX 11 API. The following code is meant to run on DirectX 10 class hardware. I assume that you already have experience with writing DirectX code, so I will not go over redundant details.

1] Creating the device, context, and swap chain:

DXGI_SWAP_CHAIN_DESC sd;
ZeroMemory( &sd, sizeof( sd ) );
sd.BufferCount = 1;
sd.BufferDesc.Width = width;
sd.BufferDesc.Height = height;
sd.BufferDesc.Format = DXGI_FORMAT_R8G8B8A8_UNORM;
sd.BufferDesc.RefreshRate.Numerator = 60;
sd.BufferDesc.RefreshRate.Denominator = 1;
sd.BufferUsage = DXGI_USAGE_RENDER_TARGET_OUTPUT;
sd.OutputWindow = g_hWnd;
sd.SampleDesc.Count = 1;
sd.SampleDesc.Quality = 0;
sd.Windowed = TRUE;

D3D_FEATURE_LEVEL level;
D3D_FEATURE_LEVEL levelsWanted[] =
{
    D3D_FEATURE_LEVEL_11_0,
    D3D_FEATURE_LEVEL_10_1,
    D3D_FEATURE_LEVEL_10_0
};
UINT numLevelsWanted = sizeof( levelsWanted ) / sizeof( levelsWanted[0] );

D3D_DRIVER_TYPE driverTypes[] =
{
    D3D_DRIVER_TYPE_HARDWARE,
    D3D_DRIVER_TYPE_REFERENCE,
};
UINT numDriverTypes = sizeof( driverTypes ) / sizeof( driverTypes[0] );

UINT createDeviceFlags = 0;   // e.g. D3D11_CREATE_DEVICE_DEBUG if you want the debug layer

for( UINT driverTypeIndex = 0; driverTypeIndex < numDriverTypes; driverTypeIndex++ )
{
    g_driverType = driverTypes[driverTypeIndex];
    hr = D3D11CreateDeviceAndSwapChain( NULL, g_driverType, NULL, createDeviceFlags,
                                        levelsWanted, numLevelsWanted, D3D11_SDK_VERSION,
                                        &sd, &g_pSwapChain, &g_pd3dDevice, &level, &g_pd3dContext );
    if( SUCCEEDED( hr ) )
        break;
    else if( g_driverType == D3D_DRIVER_TYPE_HARDWARE )
        MessageBox( NULL, L"Could not create hardware device", L"Device creation failed", MB_OK );
}

The code above cycles through the various driver types and selects the highest feature level that is available. If you were using the "DirectX 11 Compute on DirectX 10 hardware" drivers, the DirectX runtime would end up selecting D3D_FEATURE_LEVEL_10_0 or _10_1, depending on which graphics card you have.
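If you want to double-check which feature level you actually ended up with, you can simply ask the device. This little snippet is just an illustration of mine (it only relies on the g_pd3dDevice created above):

// Report which feature level the runtime actually gave us.
switch( g_pd3dDevice->GetFeatureLevel() )
{
    case D3D_FEATURE_LEVEL_11_0: OutputDebugString( L"Got feature level 11_0\n" ); break;
    case D3D_FEATURE_LEVEL_10_1: OutputDebugString( L"Got feature level 10_1\n" ); break;
    case D3D_FEATURE_LEVEL_10_0: OutputDebugString( L"Got feature level 10_0\n" ); break;
    default:                     OutputDebugString( L"Got some other feature level\n" ); break;
}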
Note that D3D_FEATURE_LEVEL_11_0 is only for graphics cards that support the entire DirectX 11 feature set. Cards that support only the subset are still considered 10.x feature level cards. I will assume that with the code above you are expecting D3D_DRIVER_TYPE_HARDWARE to be selected along with D3D_FEATURE_LEVEL_10_0. Of course, that will not be the case if you don't have the proper drivers, but let's just assume it is for the sake of demonstration.

2] Check for Compute feature support

D3D11_FEATURE_DATA_D3D10_X_HARDWARE_OPTIONS options;
hr = g_pd3dDevice->CheckFeatureSupport( D3D11_FEATURE_D3D10_X_HARDWARE_OPTIONS,
                                        &options, sizeof( options ) );
if( !options.ComputeShaders_Plus_RawAndStructuredBuffers_Via_Shader_4_x )
{
    MessageBox( NULL, L"Compute Shaders are not supported on your hardware", L"Unsupported HW", MB_OK );
    return E_FAIL;
}

Now we get to the compute shader stuff. If this is the first time you are hearing about compute shaders, let me give you a quick description of what they are before I show any code related to them. A compute shader is basically the same as any other shader, the pixel shader for example. Just like the pixel shader is invoked for each pixel, the compute shader is invoked for each "thread". A thread is a generic, independent execution entity that doesn't require any sort of geometry. Up until now, if you wanted to do general purpose computation on the GPU (to exploit this parallel computing beast), you had to resort to 3D geometry trickery that didn't really make any sense for the problem you were trying to solve. For example, if you wanted to do matrix multiplication on the GPU, you had to draw a quad just to force rasterization of certain pixels, which would then allow you to run the pixel shader on them. Draw a quad for a matrix multiplication?? What?? This is exactly the problem the compute shader solves. All you have to do now is dispatch a number of threads, and your shader will be executed for each of those threads. That's all. Clean and simple.

In DirectX, these threads are organized into "groups". You have X * Y * Z threads in each group, and U * V * W thread groups in your application. You can think of them as 3D blocks of threads and groups. Threads are organized into groups for synchronization purposes, but I will not get into that here. One thing to note, though: for "DirectX 11 Compute on DirectX 10 hardware", the third dimension is always 1, i.e. there are X * Y * 1 threads in each group and U * V * 1 total groups. The number of groups is specified at dispatch time, while the number of threads in each group is hardcoded in the compute shader. We will see this later when we get to the code.
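The buffer sizes, the shader, and the dispatch call below all refer to a handful of THREAD_* constants. These are not provided by DirectX; they are just values I picked for this sample. Assuming a 4 * 4 group size (to match the shader we will write in step 6) and an arbitrary 32 * 32 grid of groups, the definitions could look like this:

#define THREAD_GROUP_SIZE_X 4      // threads per group along X (matches numthreads in the shader)
#define THREAD_GROUP_SIZE_Y 4      // threads per group along Y
#define THREAD_GROUPS_X     32     // number of groups dispatched along X
#define THREAD_GROUPS_Y     32     // number of groups dispatched along Y
#define THREAD_GRID_SIZE_X  ( THREAD_GROUPS_X * THREAD_GROUP_SIZE_X )   // total threads along X
#define THREAD_GRID_SIZE_Y  ( THREAD_GROUPS_Y * THREAD_GROUP_SIZE_Y )   // total threads along Y

With these example values the grid is 128 * 128 threads, and the structured buffer we create next holds one element per thread.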
The compute shader is just like any other shader in that you can read buffer and texture resources from it. But what, and how, do you output from a compute shader? From vertex shaders you output transformed vertices; from geometry shaders you output primitives; from pixel shaders you output color and depth ... but what in the world do you output from a compute shader? Well, compute shaders actually don't "output" anything. Instead, you store your computation results in a buffer, at whatever location you like. This is done via what is known as an "Unordered Access View" (UAV). These are just like the other resource views we have had since DirectX 10, except that they let you read and write at arbitrary locations. On DirectX 11 hardware, UAVs can be created from both buffers and textures. On DirectX 10 hardware, however, you cannot create UAVs of typed resources (i.e. textures). Instead, we use two new types of buffers: (1) structured buffers and (2) raw buffers. Structured buffers are arrays of structures; raw buffers are arrays of bytes. I will not go into the details on these because they are just, well, buffers :/ Besides, I think I have given you just enough background to finally show you some code.

3] Creating a structured buffer

struct BufferStruct
{
    UINT color[4];
};

D3D11_BUFFER_DESC sbDesc;
sbDesc.BindFlags           = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
sbDesc.CPUAccessFlags      = 0;
sbDesc.MiscFlags           = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
sbDesc.StructureByteStride = sizeof( BufferStruct );
sbDesc.ByteWidth           = sizeof( BufferStruct ) * THREAD_GRID_SIZE_X * THREAD_GRID_SIZE_Y;
sbDesc.Usage               = D3D11_USAGE_DEFAULT;
hr = g_pd3dDevice->CreateBuffer( &sbDesc, NULL, &g_pStructuredBuffer );

(Note that we pass NULL for the initial data, so the buffer starts out uninitialized; the compute shader will fill it in.)

4] Creating a UAV of a structured buffer

D3D11_UNORDERED_ACCESS_VIEW_DESC sbUAVDesc;
sbUAVDesc.Buffer.FirstElement = 0;
sbUAVDesc.Buffer.Flags        = 0;
sbUAVDesc.Buffer.NumElements  = THREAD_GRID_SIZE_X * THREAD_GRID_SIZE_Y;
sbUAVDesc.Format              = DXGI_FORMAT_UNKNOWN;
sbUAVDesc.ViewDimension       = D3D11_UAV_DIMENSION_BUFFER;
hr = g_pd3dDevice->CreateUnorderedAccessView( g_pStructuredBuffer, &sbUAVDesc, &g_pStructuredBufferUAV );

By the way, we can also create a read-only view of this same buffer so that it can be bound to other stages of the graphics pipeline, such as the pixel shader. This is done via a Shader Resource View (SRV). (Note: UAVs are available in pixel shaders as well, but only on DirectX 11 hardware.) So, for instance, you can have your compute shader perform some sort of simulation, store the result in a buffer via the UAV, and then read the result from the pixel shader via an SRV and render something based on it; there is a small sketch of that at the end of this post. Anyway, here is how you create an SRV:

5] Creating a SRV of a structured buffer

D3D11_SHADER_RESOURCE_VIEW_DESC sbSRVDesc;
sbSRVDesc.Buffer.FirstElement = 0;
sbSRVDesc.Buffer.NumElements  = THREAD_GRID_SIZE_X * THREAD_GRID_SIZE_Y;
sbSRVDesc.Format              = DXGI_FORMAT_UNKNOWN;
sbSRVDesc.ViewDimension       = D3D11_SRV_DIMENSION_BUFFER;
hr = g_pd3dDevice->CreateShaderResourceView( g_pStructuredBuffer, &sbSRVDesc, &g_pStructuredBufferSRV );

Alright! Time to create the compute shader.

6] Here is the source code of a very simple compute shader. It doesn't really do any "computation" per se; it simply identifies its thread and writes a value derived from the thread ID into the UAV.

struct BufferStruct
{
    uint4 color;
};

RWStructuredBuffer<BufferStruct> g_OutBuff;

[numthreads( 4, 4, 1 )]
void main( uint3 threadIDInGroup : SV_GroupThreadID, uint3 groupID : SV_GroupID )
{
    float4 color = float4( (float)threadIDInGroup.x / THREAD_GROUP_SIZE_X,
                           (float)threadIDInGroup.y / THREAD_GROUP_SIZE_Y, 0, 1 ) * 255;

    int buffIndex = ( groupID.y * THREAD_GROUP_SIZE_Y + threadIDInGroup.y ) * THREAD_GROUPS_X * THREAD_GROUP_SIZE_X
                  + ( groupID.x * THREAD_GROUP_SIZE_X + threadIDInGroup.x );

    g_OutBuff[ buffIndex ].color = color;
}

Note how we have hardcoded the number of threads in the shader; this specifies how many threads are in each group. Also note the input parameters to the main() function: each thread is identified by a 3D thread ID inside the group and by a 3D group ID, and both of these values are provided to us as inputs to the shader. In this compute shader I'm simply using these IDs to compute an index into my structured buffer and write out some value. Nothing fancy. Oh, and the THREAD_GROUP_SIZE_* and THREAD_GROUPS_* macros are something I defined; they are not provided by the runtime. The number of groups is something we pass at dispatch time (we'll see that in a bit), but the group sizes have to be known when the shader is compiled, so they need to end up in the shader source one way or another.
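How do those macros end up in the shader source? In my case the shader lives in a plain C string (cs_src). One simple option, and this is only a sketch of mine rather than the only way, is to prepend matching #define lines to that string before compiling; the csBody variable and the STR/STR2 helpers below are made up for this example. You could just as well pass the values through the pDefines parameter of D3DCompile().

#include <string>

// Stringize helpers so the example constants from earlier can be pasted into the HLSL text.
#define STR2( x ) #x
#define STR( x )  STR2( x )

// Prepend the group-size/group-count macros to the shader body from step 6.
const char* cs_defines =
    "#define THREAD_GROUP_SIZE_X " STR( THREAD_GROUP_SIZE_X ) "\n"
    "#define THREAD_GROUP_SIZE_Y " STR( THREAD_GROUP_SIZE_Y ) "\n"
    "#define THREAD_GROUPS_X "     STR( THREAD_GROUPS_X )     "\n"
    "#define THREAD_GROUPS_Y "     STR( THREAD_GROUPS_Y )     "\n";

std::string csSource = std::string( cs_defines ) + csBody;  // csBody holds the HLSL text from step 6
const char* cs_src   = csSource.c_str();                    // this is what gets handed to D3DCompile() below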
7] Compile and create the compute shader object

hr = D3DCompile( cs_src, strlen( cs_src ), NULL, NULL, NULL, "main", "cs_4_0", 0, 0, &pByteCodeBlob, NULL );
hr = g_pd3dDevice->CreateComputeShader( pByteCodeBlob->GetBufferPointer(), pByteCodeBlob->GetBufferSize(),
                                        NULL, &g_pComputeShader );

Note that we compile against the cs_4_0 profile, which is the compute shader profile for DirectX 10 class hardware. (In real code you would also want to pass a blob as the last parameter of D3DCompile() to capture compile errors, and check hr.)

8] And finally we have our compute pass, where we dispatch the threads

UINT initCounts = 0;
g_pd3dContext->CSSetUnorderedAccessViews( 0, 1, &g_pStructuredBufferUAV, &initCounts );
g_pd3dContext->CSSetShader( g_pComputeShader, NULL, 0 );
g_pd3dContext->Dispatch( THREAD_GROUPS_X, THREAD_GROUPS_Y, 1 );

ID3D11UnorderedAccessView* pNullUAV = NULL;
g_pd3dContext->CSSetUnorderedAccessViews( 0, 1, &pNullUAV, &initCounts );

Look at the Dispatch() call. This is where we specify the number of thread groups that we want to execute. Note that the last parameter (i.e. the Z dimension) is 1, since we want to be able to run this on DirectX 10 class hardware.
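9] Bonus: reading the structured buffer from a pixel shader

Earlier, when talking about SRVs, I promised a small sketch of how you could read the results from a pixel shader. What follows is purely illustrative and not part of the sample above: the PSMain entry point, the NULL-SRV unbind, and the pixel-to-element mapping are all made up for the example, and the THREAD_GRID_SIZE_X macro would have to be injected into this shader's source the same way as the compute shader's macros.

// HLSL: the same structure, but read through a read-only StructuredBuffer (SRV)
// instead of the RWStructuredBuffer (UAV) used by the compute shader.
struct BufferStruct
{
    uint4 color;
};
StructuredBuffer<BufferStruct> g_InBuff : register( t0 );

float4 PSMain( float4 pos : SV_Position ) : SV_Target
{
    // Illustrative mapping from pixel position to buffer element; it assumes the
    // viewport matches the THREAD_GRID_SIZE_X by THREAD_GRID_SIZE_Y thread grid.
    uint index = (uint)pos.y * THREAD_GRID_SIZE_X + (uint)pos.x;
    return (float4)g_InBuff[ index ].color / 255.0f;
}

// C++: bind the SRV from step 5 to the pixel shader stage before drawing.
// The UAV must be unbound from the compute stage first (as done at the end of step 8),
// because a resource cannot be bound for writing and for shader input at the same time.
g_pd3dContext->PSSetShaderResources( 0, 1, &g_pStructuredBufferSRV );
// ... draw a full-screen quad with a pixel shader built from the code above ...
ID3D11ShaderResourceView* pNullSRV = NULL;
g_pd3dContext->PSSetShaderResources( 0, 1, &pNullSRV );   // unbind when done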

And that's all, folks! I hope you enjoyed the tutorial.
