Lyost

GDNet+ Pro
  • Content count

    28
  • Joined

  • Last visited

Community Reputation

328 Neutral

About Lyost

  • Rank
    Member

Personal Information

  • Interests
    Programming
  1. Update: source now available at https://github.com/lyost/d3d12_framework

While trying to build a couch and dealing with a broken pipe below the concrete floor of the basement, I've also continued playing with Direct3D12. Since the last blog entry, I have implemented an FPS meter that uses a basic texture atlas for its display, added classes for having vertex and index buffers reside in GPU memory without direct CPU access, and added a depth-fail shadow volume test case to exercise the stencil part of the depth-stencil buffer in the framework.

FPS Meter

So far in the framework, the Game base class passed the value of the fixed timestep to the update and draw functions as the elapsed time. In order to compute the actual number of frames per second, the actual elapsed time between frames is needed instead. So, both values are now provided as arguments to the update and draw functions. This lets the game choose which value to use, or use both. This of course required a minor update to all the existing test programs to add the additional argument, even though they still use the fixed timestep value.

The FPS meter itself is a library in the project named "fps_monitor" so it can be easily re-used by projects as needed. The library is the FPSMonitor class and the shaders needed for rendering it. The FPSMonitor calculates and displays the minimum, maximum, and average FPS over a configurable number of frames. It has its own graphics pipeline for rendering. So that it doesn't get bloated with code for loading different image formats or texture atlas data formats, the already loaded data is taken as arguments to the constructor. The vertices sent to the vertex shader use projection space x and y coordinates that maintain the width and height of the character as provided to the FPSMonitor constructor (which means this works best with monospaced fonts), uv coordinates for the texture going from 0-1 in both dimensions, and the key into the texture atlas lookup table (initialized to 0, but the Update function fills in the desired value for that frame).

m_vertex_buffer_data[i * VERTS_PER_CHAR    ] = { XMFLOAT2(-1 + x,                y),                 XMFLOAT2(0.0f, 0.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 1] = { XMFLOAT2(-1 + x,                y - m_char_height), XMFLOAT2(0.0f, 1.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 2] = { XMFLOAT2(-1 + x + m_char_width, y - m_char_height), XMFLOAT2(1.0f, 1.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 3] = { XMFLOAT2(-1 + x + m_char_width, y),                 XMFLOAT2(1.0f, 0.0f), 0 };

The texture atlas lookup table is provided to the vertex shader through a constant buffer that is an array of the uv coordinates covering a rectangle for each entry.

struct LookupTableEntry
{
  float left;
  float right;
  float top;
  float bottom;
};
cbuffer LOOKUP_TABLE : register(b0)
{
  LookupTableEntry lookup_table[24];
}

The combination of the 0-1 uv coordinates on each vertex and the lookup table index allows the vertex shader to easily compute the uv coordinates for the particular character in the texture atlas.

output.uv.x = (1 - input.uv.x) * lookup_table[input.lookup_index].left + input.uv.x * lookup_table[input.lookup_index].right;
output.uv.y = (1 - input.uv.y) * lookup_table[input.lookup_index].top  + input.uv.y * lookup_table[input.lookup_index].bottom;

An alternative approach would be to skip the index field in the vertex data and update the uv coordinates on the host so that the vertex shader becomes more of a pass-through.
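Since the min/max/average calculation over a configurable number of frames is the heart of FPSMonitor, here is a minimal sketch of that bookkeeping. The class and member names are illustrative only (the framework's actual implementation isn't shown in this entry); what it demonstrates is the fixed-size ring of frame times described above.

// Illustrative sketch only, not the framework's FPSMonitor code: a fixed-size ring of
// frame times from which min/max/average FPS can be computed each frame.
#include <algorithm>
#include <vector>

class FrameTimeWindow
{
  public:
    explicit FrameTimeWindow(size_t num_samples) : m_times(num_samples, 0.0f), m_next(0), m_filled(0) { }

    // record the elapsed time (in seconds) of the frame that just finished
    void AddFrameTime(float seconds)
    {
      m_times[m_next] = seconds;
      m_next = (m_next + 1) % m_times.size();
      m_filled = std::min(m_filled + 1, m_times.size());
    }

    // min/max/average FPS over the samples collected so far
    void GetStats(float& min_fps, float& max_fps, float& avg_fps) const
    {
      if (m_filled == 0) { min_fps = max_fps = avg_fps = 0; return; }
      float min_time = m_times[0];
      float max_time = m_times[0];
      float total    = 0;
      for (size_t i = 0; i < m_filled; i++)
      {
        min_time = std::min(min_time, m_times[i]);
        max_time = std::max(max_time, m_times[i]);
        total   += m_times[i];
      }
      max_fps = 1 / min_time;      // shortest frame time -> highest FPS
      min_fps = 1 / max_time;      // longest frame time  -> lowest FPS
      avg_fps = m_filled / total;  // number of frames divided by total elapsed time
    }

  private:
    std::vector<float> m_times;
    size_t m_next;
    size_t m_filled;
};

Until all of the sample slots have been refilled after a frame rate change, the three values will differ, which matches the behavior described below when testing the meter.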
In order to test that the FPS values are being computed correctly, the test program needs the frame rate to vary. Conceptually there are 2 ways to accomplish this within a program. One is to switch between different sets of content: one that doesn't stress the system's rendering capabilities and one that does. The other way, and the one taken in the test program, is to change the fixed timestep duration. By pressing and releasing numpad 1, 2, or 3, the test program moves between 60, 30, or 24 FPS respectively. While changing the frame rate up or down instantly changes the min or max FPS, the average FPS takes a little while, based on the number of samples, to reach a steady value. Assuming a system can handle the requested frame rate, once enough samples at the new frame rate have occurred to fill all of the sample slots in the FPSMonitor class, all 3 should have the same value.

GPU Vertex and Index Buffers

The vertex and index buffers in the framework thus far have used D3D12_HEAP_TYPE_UPLOAD so that their memory can be mapped when their data needs to be updated. While the FPS meter discussed in the previous section needs to update a vertex buffer every frame, that is a rare case. Taking the common example of loading a model, normally its vertex and index buffers wouldn't change after loading, so there is no need for CPU access after that point. To cover this, there are additional classes for vertex and index buffers that use D3D12_HEAP_TYPE_DEFAULT, named VertexBufferGPU_* and IndexBufferGPU16. To populate or update the data in these GPU-only buffers, the existing vertex and index buffer classes provide a PrepUpload function for the corresponding GPU-only type. This adds commands to a command list for copying data between the two buffers; the actual copying is done when the command list is executed. Beyond the lack of CPU access, they function the same as the previously existing vertex and index buffers, so there's not too much to say about these.
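For reference, a PrepUpload-style copy boils down to a CopyBufferRegion from the upload-heap buffer into the default-heap buffer followed by a barrier so the input assembler can read the destination. A rough raw-Direct3D12 sketch of the idea (not the framework's code; the resources and byte count are assumed to already exist, with the destination currently in the copy-dest state):

// Sketch: record a copy from an upload-heap (CPU-writable) buffer to a default-heap
// (GPU-only) vertex buffer.  Nothing is copied until the command list is executed and
// a fence confirms completion.
void RecordVertexBufferUpload(ID3D12GraphicsCommandList* cmd_list, ID3D12Resource* upload_buffer, ID3D12Resource* gpu_buffer, UINT64 num_bytes)
{
  cmd_list->CopyBufferRegion(gpu_buffer, 0, upload_buffer, 0, num_bytes);

  // transition the destination so it can be bound as a vertex buffer afterwards
  D3D12_RESOURCE_BARRIER barrier;
  barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
  barrier.Flags                  = D3D12_RESOURCE_BARRIER_FLAG_NONE;
  barrier.Transition.pResource   = gpu_buffer;
  barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
  barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_COPY_DEST;
  barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_VERTEX_AND_CONSTANT_BUFFER;
  cmd_list->ResourceBarrier(1, &barrier);
}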
Stencil Part of the Depth-Stencil Buffer

Up until now, the depth-stencil buffer has been used for just depth data. Exercising the stencil portion of this buffer required framework updates to create a depth-stencil with an appropriate format (previously the depth-stencils were all DXGI_FORMAT_D32_FLOAT), adding the ability to configure the stencil when creating a pipeline, and an algorithm to use for a test case.

For the format, the DepthStencil class has an optional argument of "bool with_stencil" that, if true, will create the depth stencil with a format of DXGI_FORMAT_D32_FLOAT_S8X24_UINT. If it is false (the default), the format will be DXGI_FORMAT_D32_FLOAT.

For configuring the stencil, the static CreateD3D12 functions in the Pipeline class had their "DepthFuncs depth_func" argument changed to "const DepthStencilConfig* depth_stencil_config". If that argument is NULL, both the depth and stencil tests are disabled. If it points to an instance of the DepthStencilConfig struct, then the depth and stencil tests can be enabled or disabled individually, along with specifying the other configuration data.

/// <summary>
/// Enum of the various stencil operations
/// </summary>
/// <remarks>
/// Values must match D3D12_STENCIL_OP
/// </remarks>
enum StencilOp
{
  SOP_KEEP = 1,
  SOP_ZERO,
  SOP_REPLACE,
  SOP_INCREMENT_CLAMP,
  SOP_DECREMENT_CLAMP,
  SOP_INVERT,
  SOP_INCREMENT_ROLLOVER,
  SOP_DECREMENT_ROLLOVER
};

/// <summary>
/// Configuration for processing pixels
/// </summary>
struct StencilOpConfig
{
  /// <summary>
  /// Stencil operation to perform when stencil testing fails
  /// </summary>
  StencilOp stencil_fail;

  /// <summary>
  /// Stencil operation to perform when stencil testing passes, but depth testing fails
  /// </summary>
  StencilOp depth_fail;

  /// <summary>
  /// Stencil operation to perform when both stencil and depth testing pass
  /// </summary>
  StencilOp pass;

  /// <summary>
  /// Comparison function to use to compare stencil data against existing stencil data
  /// </summary>
  CompareFuncs comparison;
};

/// <summary>
/// Configuration for the depth stencil
/// </summary>
struct DepthStencilConfig
{
  /// <summary>
  /// true if depth testing is enabled. false otherwise
  /// </summary>
  bool depth_enable;

  /// <summary>
  /// true if stencil testing is enabled. false otherwise
  /// </summary>
  bool stencil_enable;

  /// <summary>
  /// Format of the depth stencil view. Must be correctly set if either depth_enable or stencil_enable is set to true.
  /// </summary>
  GraphicsDataFormat dsv_format;

  /// <summary>
  /// true if writing to the depth portion of the depth stencil is allowed. false otherwise.
  /// </summary>
  bool depth_write_enabled;

  /// <summary>
  /// Comparison function to use to compare depth data against existing depth data
  /// </summary>
  CompareFuncs depth_comparison;

  /// <summary>
  /// Bitmask for identifying which portion of the depth stencil should be used for reading stencil data
  /// </summary>
  UINT8 stencil_read_mask;

  /// <summary>
  /// Bitmask for identifying which portion of the depth stencil should be used for writing stencil data
  /// </summary>
  UINT8 stencil_write_mask;

  /// <summary>
  /// Configuration for processing pixels with a surface normal towards the camera
  /// </summary>
  StencilOpConfig stencil_front_face;

  /// <summary>
  /// Configuration for processing pixels with a surface normal away from the camera
  /// </summary>
  StencilOpConfig stencil_back_face;
};

After those changes it was onto an algorithm to use as a test case. While over the years I've read up on different algorithms that use the stencil, I hadn't implemented one before. I ended up picking depth-fail shadow volumes, using both the Wikipedia article and http://joshbeam.com/articles/stenciled_shadow_volumes_in_opengl/ for reference (I don't plan for this entry to be a tutorial on depth-fail, so I'd recommend those links if you want to read up on the algorithm). The scene is a simple one comprised of an omnidirectional light source at (8, 0, 0), an occluder at (1, 0, 0), and a textured cube that can be moved in y and z with the arrow keys and is initially positioned at (-7, 0, 0). The textured cube starts in shadow, and the up, down, and left arrows allow it to be moved partially or completely out of shadow and back into shadow. For the right arrow key, there was an issue where the framework was always assuming D3D12_CULL_MODE_BACK, which prevented the stencil buffer from being correct. Since the stencil configuration in D3D12 allows different stencil operations for front faces and back faces, only 1 pass is needed for setting the stencil buffer when the cull mode is set to none.
With that change, the model was also correctly lit when moving out of the shadow volume with the right arrow key.
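To tie the pieces of this entry together, here is roughly how the DepthStencilConfig struct above could be filled in for the shadow volume pass of depth-fail with the cull mode set to none: depth test on, depth writes off, back faces increment on depth fail, front faces decrement on depth fail. This is a sketch of the idea rather than the test program's actual code, and the CompareFuncs and GraphicsDataFormat enumerator spellings (CF_LESS_EQUAL, CF_ALWAYS, the dsv_format value) are my assumptions since they aren't listed in this entry.

// Sketch: depth-fail shadow volume pass stencil configuration (cull mode none).
// CF_LESS_EQUAL / CF_ALWAYS and the dsv_format value are assumed enumerator names.
DepthStencilConfig config;
config.depth_enable        = true;
config.stencil_enable      = true;
config.dsv_format          = D32_FLOAT_S8X24_UINT; // assumed name matching DXGI_FORMAT_D32_FLOAT_S8X24_UINT
config.depth_write_enabled = false;                // shadow volumes must not write depth
config.depth_comparison    = CF_LESS_EQUAL;
config.stencil_read_mask   = 0xff;
config.stencil_write_mask  = 0xff;

// front faces: decrement with wrap when the depth test fails
config.stencil_front_face.stencil_fail = SOP_KEEP;
config.stencil_front_face.depth_fail   = SOP_DECREMENT_ROLLOVER;
config.stencil_front_face.pass         = SOP_KEEP;
config.stencil_front_face.comparison   = CF_ALWAYS;

// back faces: increment with wrap when the depth test fails
config.stencil_back_face.stencil_fail = SOP_KEEP;
config.stencil_back_face.depth_fail   = SOP_INCREMENT_ROLLOVER;
config.stencil_back_face.pass         = SOP_KEEP;
config.stencil_back_face.comparison   = CF_ALWAYS;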
  2. Direct3D12 framework

    Screenshots from the various test programs for the Direct3D12 framework
  3. Cleanup work

    There were two main pieces of cleanup work that I wanted to take care of in my Direct3D12 framework. The first was to break apart the LoadContent functions of the various test programs. The second was to minimize object lifetimes. Previously all the various framework wrapper objects and their internal D3D12 resources would live for nearly the entire program regardless of how long they were actually needed.

LoadContent

In each test program the LoadContent function was a mess because it was doing more than it was originally intended to do, which was to load the models or other content needed for the program. Since I have been trying to minimize dependencies in these test programs to keep the D3D12 code as clear as possible, I'm not using a model library and am instead filling in the vertex, index, and other buffers directly. On top of that, D3D12 also requires setting up the graphics pipeline, which was also being done in those functions. To clean this up and make the code more understandable at a glance, I've introduced new TestModel and TestGraphicsPipeline classes in each test program (some have an additional pipeline class as needed for the particular test case). These new classes take a lot of the burden off the Game subclass by making each responsible for managing just a model or a graphics pipeline respectively (even if the model is still hard-coded for demonstration purposes). The LoadContent functions now take care of creating and configuring instances of these classes. So, graphics pipeline setup is still done as needed by Direct3D12, but it is encapsulated and offloaded to an appropriate class. Where before a typical LoadContent function was a few hundred lines, it now looks like:

void GameMain::LoadContent()
{
  GraphicsCore& graphics = GetGraphics();

  m_pipeline = new TestGraphicsPipeline(graphics);
  m_model    = new TestModel(graphics, m_pipeline->GetShaderResourceDescHeap(), m_pipeline->GetCommandList());
  m_pipeline->SetModel(m_model);

  // setup the cameras for the viewport
  Viewport full_viewport = graphics.GetDefaultViewport();
  m_camera_angle = 3 * XM_PI / 2;
  m_camera = new Camera(full_viewport.width / full_viewport.height, 0.01f, 100.0f, XMFLOAT4(0, 0, -10, 1), XMFLOAT4(0, 0, 1, 0), XMFLOAT4(0, 1, 0, 0));
  m_pipeline->SetCamera(m_camera);
}

For the various fields that were part of the test program Game subclasses, those have been moved to either TestModel or TestGraphicsPipeline as appropriate. One field of note is the descriptor heap. Due to a requirement of ID3D12GraphicsCommandList::SetDescriptorHeaps, where it can use only 1 D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV and only 1 D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER heap for rendering, the descriptor heap needs to be shared between the model and pipeline classes. The models just need it while creating their texture resources, and the pipeline needs it both for creating non-model resources (e.g. a constant buffer to hold the camera matrices) and for binding the descriptor heap when rendering a frame. So, the TestGraphicsPipeline class owns the descriptor heap instance and provides an accessor method for the TestModel to use it.
This means a real game using this approach would need to compute the number of unique textures for its models and add in the number of constant buffers the rendering process requires, then either pass that information into the pipeline for creation of the descriptor heap and allow the models access to the same descriptor heap, or move creation of the descriptor heap outside of the pipeline, pass it to both the pipeline and the models, and manage its lifetime alongside them.

Minimizing Object Lifetimes

For the fields that were moved to the TestGraphicsPipeline class, there were a few that were being kept for nearly the whole duration of the program that weren't needed for that long. In particular, the various shader instances and the input layout. Those are used to create the framework's Pipeline instance, which internally creates an ID3D12PipelineState. Once that instance is created, there is no need to keep the shaders or input layout around any longer.

The TestModel classes didn't need to keep their TextureUploadBuffer instances around once their content had been copied to the Texture subclass that is actually used for rendering. So, after the TextureUploadBuffer's PrepUpload function has been called for all the textures to upload, the command list has executed, and a fence has been waited on, the TextureUploadBuffer should be safe to delete. However, I would occasionally get an exception and this message in the debug window:

D3D12 ERROR: ID3D12Resource::: CORRUPTION: An ID3D12Resource object (0x0000027D7F65D880:'Unnamed Object') is referenced by GPU operations in-flight on Command Queue (0x0000027D7D3BB130:'Unnamed ID3D12CommandQueue Object'). It is not safe to final-release objects that may have GPU operations pending. This can result in application instability. [ EXECUTION ERROR #921: OBJECT_DELETED_WHILE_STILL_IN_USE]

This exposed a bug in my fence implementation that had been hiding there all along. While the code for D3D12_Core::WaitOnFence is a conversion from Microsoft's D3D12HelloWorld sample, it turns out I had forgotten to initialize the value passed along to the command queue's Signal function. In debug builds this led to the value starting at 0, which was also the fence's initial value. This caused D3D12_Core::WaitOnFence to think the fence was already complete and allow execution of the program to continue. Sometimes my system would be fast enough that this was okay for this data set, other times I'd get the error above. Once I initialized the initial signal value to 1, D3D12_Core::WaitOnFence would properly detect whether the fence needed to be waited on. Technically any value greater than the fence's initial value would work.

Miscellaneous

I also tweaked D3D12_TextureUploadBuffer::PrepUploadInternal to only do state changes on the subresource being uploaded to instead of all subresources in the target texture. When I had initially written the D3D12_ReadbackBuffer, my starting point was copying D3D12_ConstantBuffer. As a result, the 256-byte alignment required for a constant buffer has also been in the readback buffer code for the past couple of times I've uploaded the framework's source to a dev journal/blog entry. Since that isn't actually a requirement for a readback buffer, it has been removed.
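Coming back to the fence fix above: for anyone who runs into the same error, the fix comes down to always passing the command queue's Signal a value greater than the fence's last completed value. A minimal standalone sketch of the wait, using only the documented fence API (variable names are mine, error handling omitted):

// Sketch: signal the fence on the queue, then block the CPU until the GPU reaches it.
// fence_value is assumed to start at 1 when the fence was created with an initial value
// of 0, so the first wait can never be mistaken for already complete.
void WaitForGpu(ID3D12CommandQueue* queue, ID3D12Fence* fence, UINT64& fence_value, HANDLE fence_event)
{
  const UINT64 value_to_wait_for = fence_value;
  queue->Signal(fence, value_to_wait_for); // GPU sets the fence to this value when it reaches this point
  ++fence_value;                           // the next wait uses a strictly larger value

  if (fence->GetCompletedValue() < value_to_wait_for)
  {
    fence->SetEventOnCompletion(value_to_wait_for, fence_event);
    WaitForSingleObject(fence_event, INFINITE);
  }
}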
  4. Stream Output Addendum

    Shortly after I posted my previous dev journal entry, I got a direct message asking how I determined the number of vertices that the stream output buffer should be able to hold in the test program. I think that is a great question, and while I replied discussing it for the test program, I think it warrants another entry to discuss the minimum size of the stream output buffer both for the test program and more generally.

Vertices Available to the Stream Output Stage

The stream output stage places vertices from up to 4 output streams into corresponding stream output buffers, if bound. One of these output streams can also go to the rasterizer stage for drawing to a render target. If the geometry shader is not used, then only 1 output stream is available in the graphics pipeline. In the previous entry I discussed determining how many vertices were written to a stream output buffer (basically reading back BufferFilledSizeLocation and converting bytes to vertices). In the event there are more vertices in the output stream than there is room in the buffer, the buffer is filled in the order the vertices appear in the output stream, the extraneous vertices are discarded, and the count doesn't include them. If the overflow is in the output stream for the rasterizer stage, then the vertices that don't fit into the stream output buffer are still sent along to the rasterizer with the rest.

Test Program

The test program tries to be relatively simple for testing stream output capability. The two triangles on the right are trying to duplicate the two triangles on the left, where the ones on the left went through all the graphics pipeline stages including sending the tessellated vertices to a stream output buffer. So the stream output buffer needs to have space for at least as many vertices as are used for the left image. For the left image, I am sending a 4 point control patch through the pipeline, for which the hull shader sets all edge and inside tessellation factors to 1 and uses an output topology of triangle. The tessellation factors of 1 prevent subdivision of the edges and interior, so it's the output topology change that matters in this case. Since the control patch is a quad and the hull shader output topology is a triangle, the tessellator stage needs to create vertices for two objects, triangles in this case. Rather than re-use shared vertices between objects, the tessellator stage duplicates the vertices. That leads to the domain shader being invoked 6 times for the control patch (3 per triangle). The geometry shader isn't doing any further amplification, so the stream output stage will get, and need capacity for, 6 vertices.

Vertex Shader Without Tessellation and Geometry Shader

The stream output stage can be used without tessellation or the geometry shader. In this case, there is 1 output stream available and it is comprised of the vertices returned by the vertex shader. There are basically 4 ways to use the graphics pipeline in this configuration. The first is just using a vertex buffer with a list topology. There is no amplification of vertices in this case. The stream output buffer should be sized for the number of vertices passed to the input assembler stage. This is not necessarily equal to the number of vertices in the vertex buffer, since the first argument to ID3D12GraphicsCommandList::DrawInstanced specifies the number of vertices to draw.
Documenting this case made me realize that while the framework allows for setting the number of vertices to draw, it assumed the start index would always be 0 (the third argument to ID3D12GraphicsCommandList::DrawInstanced). So, I have added an overload to D3D12_CommandList::DrawInstanced to allow the start index to be configurable.

The second way is using vertex and index buffers with a list topology. Since the index buffer could reference the same vertex multiple times or skip vertices entirely, the stream output buffer should be sized for the number of indices passed to the input assembler stage. Like the previous case, ID3D12GraphicsCommandList::DrawIndexedInstanced has arguments to use a subset of the index buffer. Also like the previous case, documenting this made me realize that D3D12_CommandList::DrawIndexedInstanced assumed the starting index into the index buffer would be 0, and again I have added an overload to allow the start index to be configurable.

The third way is either of the previous cases using a strip topology. The shared vertices get duplicated, effectively converting to a list topology. So, this will cause vertex amplification based on which primitive shape is used for the topology.

The fourth way is using an instance buffer with any of the previous cases. Instance buffers have a multiplicative effect on the number of vertices. Like the other buffers, ID3D12GraphicsCommandList::DrawInstanced and ID3D12GraphicsCommandList::DrawIndexedInstanced take arguments to allow a subset of the instance buffer to be used. So, the stream output buffer would need to be sized for the number of vertices for a single instance multiplied by the number of entries in the instance buffer passed to the graphics pipeline. (A small sizing sketch pulling these cases together follows after the geometry shader discussion below.)

Adding Tessellation

When the tessellation stages are enabled, the number of vertices produced depends on the output topology and the tessellation factors. As mentioned when discussing the test program, the same vertex can be duplicated by the tessellator if it is used by different objects in the output topology, which obviously increases the vertex count. Increasing the tessellation factors increases the subdivision of the control patch, which by definition increases the vertex count. The test program made things simple for these stages by using fixed values. In the case of dynamic tessellation values, such as camera-distance LOD over a large terrain patch viewed along the surface, the exact number of vertices produced is a bit more difficult to calculate, and it is of questionable value to do so since the number of vertices could change from frame to frame as the camera moves. Instead, calculating the upper bound of the vertices produced for a particular hull shader allows the stream output buffer to be appropriately sized, and the number of vertices actually written can be retrieved as needed.

Adding a Geometry Shader

The geometry shader allows for using multiple output streams with a maximum of 4, one of which can go to the rasterizer stage. This stage is also interesting in that the shader is responsible for adding vertices to the output streams. This means it can effectively discard a vertex by not adding it to an output stream. Or conversely, it could add more vertices than there are in its input patch. The result of these possibilities is that the amount of vertex amplification or reduction is implementation dependent, and the shader bound to this stage will need to be analyzed for its upper bound.
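Pulling the vertex-shader-only cases together, the minimum stream output buffer size is simple arithmetic on the draw arguments; the tessellation and geometry shader cases then replace the per-instance count with the upper bound discussed above. This is my own summary sketch, not framework code:

// Sketch: minimum stream output buffer size in bytes for the vertex-shader-only cases.
// vertex_or_index_count is the count passed to DrawInstanced/DrawIndexedInstanced (for a
// strip topology, use the vertex count after conversion to the equivalent list topology),
// and instance_count multiplies the output when an instance buffer is used.
UINT64 MinStreamOutputBytes(UINT64 vertex_stride, UINT64 vertex_or_index_count, UINT64 instance_count)
{
  return vertex_stride * vertex_or_index_count * instance_count;
}

// With tessellation or a geometry shader, replace vertex_or_index_count with the analyzed
// upper bound of vertices those stages can emit.  For the test program, that bound is the
// 6 tessellated vertices: MinStreamOutputBytes(stride, 6, 1) == 6 * stride.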
Oversized Stream Output Buffer

In the course of doing some additional testing to make sure I discussed all the factors involved here, I had updated the test program (which subsequently got spun off into its own) so that the stream output buffer was massively oversized for the data set and would log the number of vertices in the stream output buffer. That logging made it obvious that the stream output buffer is appended to if there is space each time the pipeline uses it, instead of always starting from the beginning.

Stream output buffer num verts: 6
Stream output buffer num verts: 12
Stream output buffer num verts: 18
Stream output buffer num verts: 24
Stream output buffer num verts: 30
Stream output buffer num verts: 36
Stream output buffer num verts: 42

This means that the UINT64 for BufferFilledSizeLocation needs to be reset when the buffer should be overwritten. So, I have added a new function in the framework's StreamOutputBuffer/D3D12_StreamOutputBuffer classes to do exactly that:

virtual void PrepReset(CommandList& command_list, ConstantBuffer& scratch_buffer) = 0

Rather than take the approach of GetNumVerticesWrittenD3D12, where the function takes care of executing the command list and waiting on the fence, PrepReset only adds the commands to a command list. The reason for this difference is that GetNumVerticesWrittenD3D12 needs to execute the command list and wait on the fence for the data to be available in a host readable buffer to do its computation. Whereas PrepReset doesn't need to do any computation, it just needs the sizeof(UINT64) bytes at the start of scratch_buffer to be 0 when the command list executes and the fence is waited on. There is a remark in the function comment to document this requirement. That also means that I can't just re-use the buffer the test program has for the camera, and a separate one needs to be created. I could have made the creation of this scratch buffer internal to the class, where when the first StreamOutputBuffer is created, so is this scratch buffer. However, I expect there may be other uses for a scratch buffer as I continue development of this framework. Now that the test program is updated to call this during the draw function, before the stream output buffer is bound to the stream output stage, the logging of the number of vertices written to the stream output buffer produces the expected result:

Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6

Resetting the buffer also ensures that each frame the two triangles on the right are getting the stream output of the triangles on the left for the same frame.
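To make the ordering concrete, here is roughly where the reset lands in the draw function. The member names and surrounding calls are illustrative (only PrepReset's signature comes from this entry), so treat it as a sketch of the ordering rather than the actual test program source:

// Sketch of the ordering in the draw function: reset the byte counter before the
// stream output buffer is bound and used again this frame.
m_command_list->Reset(NULL);

// queue zeroing of the UINT64 at the start of the stream output buffer's resource;
// m_so_scratch is a separate buffer whose first sizeof(UINT64) bytes are 0
m_so_buffer->PrepReset(*m_command_list, *m_so_scratch);

// ... bind the pipeline, root signature, and stream output buffer here, then issue the
//     draw that feeds the stream output stage ...

m_command_list->Close();
graphics.ExecuteCommandList(*m_command_list);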
  5. Adding MSAA and stream output support

      Thanks!
  6. Since my last entry I've added support for more of Direct3D12's features to my framework:

- Cube and cube array textures. The implementation of these isn't notably different from the other texture types, so I don't plan to delve into them.
- Mipmaps for all the texture types (including cube and cube array). I'll talk briefly about these in a moment.
- MSAA render targets.
- The stream output stage of the graphics pipeline.

Mipmaps

Since I added mipmap support after all the texture types, there were 3 main questions for implementing this feature:

- How do I want to expose creating textures that use them?
- How to determine the subresource index for all the different types of textures?
- Since using compile time type checking to avoid incorrect operations is a goal of this project, how does this affect the TextureUploadBuffer interface and D3D12_TextureUploadBuffer implementation?

For the first question, the options are to add more classes treating textures that use mipmaps as new texture types, or to add the number of mipmaps as an argument to the existing texture classes. The existing texture types being represented by different classes makes sense from a validation perspective since they have a different number of dimensions, whereas mipmaps are just different resolutions of a texture. Since the difference between a texture with multiple mipmaps and a texture with just 1 resolution is so minor, adding more classes is overkill conceptually and practically doesn't have any substantial validation benefit due to the number of mipmaps being variable. This naturally leads to the other option: adding the number of mipmaps as an argument when creating the resource. For any place that needs the mipmap index to be validated (e.g. when determining which mipmap in a texture to upload to), the minimum of 0 can be checked by using the appropriate unsigned type (UINT16), and the maximum can be checked with the argument validation that I had mentioned in a previous dev journal entry, which in this case is just comparing two UINT16 values.

Determining the subresource index is obvious for 1D and 2D textures with mipmaps (0 is the normal image, and each mipmap level just increments by 1). For the more complex texture types, such as a cube array with mipmaps, determining the answer requires checking and understanding the documentation. I found https://msdn.microsoft.com/en-us/library/windows/desktop/ff476906%28v=vs.85%29.aspx and https://msdn.microsoft.com/en-us/library/windows/desktop/dn705766%28v=vs.85%29.aspx useful for understanding the order of subresource indices for the texture arrays. And since the second link covers it so well and has nice graphics to explain it, I won't re-hash it here.
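For the record, the ordering those pages describe reduces to the standard Direct3D12 subresource formula: mip levels vary fastest, then array slices, then planes. A small helper expressing it (my own naming, not a framework function):

// Subresource index for a texture (array) with mipmaps, following the D3D12 documentation.
UINT CalcSubresourceIndex(UINT mip_slice, UINT array_slice, UINT plane_slice, UINT mip_levels, UINT array_size)
{
  return mip_slice + (array_slice * mip_levels) + (plane_slice * mip_levels * array_size);
}

// examples:
//   2D texture, mip level 3:                            CalcSubresourceIndex(3, 0, 0, mip_levels, 1)
//   cube texture, +Y face (array slice 2), mip level 1: CalcSubresourceIndex(1, 2, 0, mip_levels, 6)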
As for TextureUploadBuffer, it is responsible for creating the upload buffer and preparing a command list to copy from the upload buffer to a texture subresource (e.g. a particular side of a cube texture). For reference, here's the declaration of one of the overloads for creating a TextureUploadBuffer from before mipmapping support was added:

static TextureUploadBuffer* CreateD3D12(const GraphicsCore& graphics, const Texture2D& texture);

This guarantees that the upload buffer will be large enough to hold the data for that particular texture, but also allows the buffer to be re-used for other smaller or equal textures. For texture arrays, the created buffer is enough for 1 entry in the array. So if you have a Texture2DArray with 5 entries and want to upload to all of them at once, you would need 5 TextureUploadBuffer instances (the function comments for the array overloads explain this). So there are two basic options here: add in the mipmap level to create an exact-fit resource, or ignore the mipmap level the buffer will be used for and create a resource large enough for the default texture size. The first option would reduce memory usage, increase code complexity, and require the mipmap level between the created buffer and the requested upload level to be validated. The second option fits the pattern established by the texture array types of making the buffer big enough for 1 subresource. So, I went with the second approach, which leaves the function signature unchanged for mipmap support. This does waste a bit of memory, but the buffer only needs to be kept around while uploading, so the extraneous memory usage is temporary.

There is another detail about mipmap support I would like to mention, which is the shader resource view. In Direct3D there can be multiple views for the same resource, such as using a 2D texture as both a texture and a render target. Another option for these multiple views is to create a view that has access to only a subset of the mipmap levels. In the framework, the internal function D3D12_Texture::Create (called by the factory methods for all the texture classes) takes care of creating the resource and a shader resource view that has access to all the mipmap levels. At the moment, this full view is the only one available through the framework. However, for future development work adding partial mipmap shader resource views, I would implement them by adding factory methods to the various texture classes that take the source texture and min/num mipmap levels as arguments. Instead of calling D3D12_Texture::Create, these would just AddRef the resource and create the requested shader resource view, passing both the resource and shader resource view to the instance of the particular texture class to manage. This AddRef-and-create-a-new-view approach is basically the same approach I took with RenderTarget::CreateD3D12FromTexture, where the main differences are that the render target is not dealing with mipmap levels and it's creating a render target view instead of another shader resource view.

MSAA

While there are other resources online that cover MSAA render targets and depth stencils in Direct3D12, I want to sum up the changes needed for them in the remainder of this section. There is one point worth mentioning before getting into the changes: the swap chain cannot be created with MSAA enabled. A separate render target is needed, and I'll discuss how to go from this separate render target to the back buffer at the end of this section.

The first change is actually optional but recommended: checking the quality level supported by the device. This is to ensure that when creating render targets and depth stencils you don't exceed the supported quality level. Since devices can support different levels of multisampling for different formats, settings about the render target need to be passed in when checking, specifically the format, multisample count per pixel, and whether it is for a tiled resource or not. Since this check should be done before creating an MSAA render target, in the framework these fields are passed to a GraphicsCore instance's CheckSupportedMultisampleLevels function instead of the function taking a RenderTargetMSAA instance and extracting the values. The GraphicsCore instance is available to the application through the Game base class, which has a function named GetGraphics that has been used in all the sample programs.
The function's declaration in GraphicsCore is:

virtual UINT CheckSupportedMultisampleLevels(GraphicsDataFormat format, UINT sample_count, bool tiled) const = 0;

And for anyone that would like to see the implementation that calls into Direct3D12:

UINT D3D12_Core::CheckSupportedMultisampleLevels(GraphicsDataFormat format, UINT sample_count, bool tiled) const
{
  D3D12_FEATURE_DATA_MULTISAMPLE_QUALITY_LEVELS query;
  query.Format = (DXGI_FORMAT)format;
  query.SampleCount = sample_count;
  query.Flags = tiled ? D3D12_MULTISAMPLE_QUALITY_LEVELS_FLAG_TILED_RESOURCE : D3D12_MULTISAMPLE_QUALITY_LEVELS_FLAG_NONE;
  query.NumQualityLevels = 0;

  HRESULT rc = m_device->CheckFeatureSupport(D3D12_FEATURE_MULTISAMPLE_QUALITY_LEVELS, &query, sizeof(query));
  if (FAILED(rc))
  {
    throw FrameworkException("Failed to get multisample quality information");
  }

  return query.NumQualityLevels;
}

Unlike the previous change, the rest of the changes are required for MSAA to work. The next change is to create a pipeline that can handle MSAA. Since in the test program everything going through the graphics pipeline goes to an MSAA render target, I updated the only pipeline created in it to use MSAA. Through the framework this means adding 2 arguments to each of Pipeline's CreateD3D12 overloads. Here's an example of the update to one of the overloads:

static Pipeline* CreateD3D12(const GraphicsCore& graphics_core, const InputLayout& input_layout, Topology topology, const Shader& vertex_shader, const StreamOutputConfig* stream_output, const Shader& pixel_shader, DepthFuncs depth_func, const RenderTargetViewConfig& rtv_config, const RootSignature& root_sig, UINT ms_count = 1, UINT ms_quality = 0, bool wireframe = false)

Since I expect MSAA to be enabled more often than wireframe rendering, I added the MSAA arguments of ms_count and ms_quality before wireframe so that the wireframe argument can keep using its default value when not needed, instead of needing to be expressly set for every pipeline. As for how this affects the Direct3D12 code for creating a pipeline, it is updating 3 fields in D3D12_GRAPHICS_PIPELINE_STATE_DESC. While they are non-adjacent lines in D3D12_Pipeline::CreateDefaultPipelineDesc, the changes to these fields are:

D3D12_GRAPHICS_PIPELINE_STATE_DESC desc;
desc.RasterizerState.MultisampleEnable = ms_count > 1;
desc.SampleDesc.Count = ms_count;
desc.SampleDesc.Quality = ms_quality;
// fill in the rest of the structure here

And for anyone that wants to see the full implementation of D3D12_Pipeline::CreateDefaultPipelineDesc:
void D3D12_Pipeline::CreateDefaultPipelineDesc(D3D12_GRAPHICS_PIPELINE_STATE_DESC& desc, const D3D12_InputLayout& layout, const D3D12_RenderTargetViewConfig& rtv, const D3D12_RootSignature& root, D3D12_PRIMITIVE_TOPOLOGY_TYPE topology, UINT ms_count, UINT ms_quality, bool wireframe, const StreamOutputConfig* stream_output)
{
#ifdef VALIDATE_FUNCTION_ARGUMENTS
  if (layout.GetNextIndex() != layout.GetNum())
  {
    throw FrameworkException("Not all input layout entries have been set");
  }
  if (ms_count < 1)
  {
    throw FrameworkException("Invalid multisample count");
  }
  else if (ms_count == 1 && ms_quality != 0)
  {
    throw FrameworkException("Multisampling quality must be 0 when multisampling is disabled");
  }
#endif /* VALIDATE_FUNCTION_ARGUMENTS */

  desc.pRootSignature = root.GetRootSignature();
  desc.InputLayout.pInputElementDescs = layout.GetLayout();
  desc.InputLayout.NumElements = layout.GetNum();
  desc.RasterizerState.FillMode = wireframe ? D3D12_FILL_MODE_WIREFRAME : D3D12_FILL_MODE_SOLID;
  desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
  desc.RasterizerState.FrontCounterClockwise = false;
  desc.RasterizerState.DepthBias = D3D12_DEFAULT_DEPTH_BIAS;
  desc.RasterizerState.DepthBiasClamp = D3D12_DEFAULT_DEPTH_BIAS_CLAMP;
  desc.RasterizerState.SlopeScaledDepthBias = D3D12_DEFAULT_SLOPE_SCALED_DEPTH_BIAS;
  desc.RasterizerState.DepthClipEnable = true;
  desc.RasterizerState.MultisampleEnable = ms_count > 1;
  desc.RasterizerState.AntialiasedLineEnable = false;
  desc.RasterizerState.ForcedSampleCount = 0;
  desc.RasterizerState.ConservativeRaster = D3D12_CONSERVATIVE_RASTERIZATION_MODE_OFF;
  desc.BlendState = rtv.GetBlendState();
  desc.DepthStencilState.DepthEnable = false;
  desc.DepthStencilState.StencilEnable = false;
  desc.SampleMask = UINT_MAX;
  desc.PrimitiveTopologyType = topology;
  desc.NumRenderTargets = rtv.GetNumRenderTargets();
  memcpy(desc.RTVFormats, rtv.GetFormats(), sizeof(RenderTargetViewFormat) * desc.NumRenderTargets);
  desc.SampleDesc.Count = ms_count;
  desc.SampleDesc.Quality = ms_quality;
  if (stream_output)
  {
    desc.StreamOutput = ((D3D12_StreamOutputConfig*)stream_output)->GetDesc();
  }
}

You'll notice that function isn't calling ID3D12Device::CreateGraphicsPipelineState. That is because the various D3D12_Pipeline::Create overloads call CreateDefaultPipelineDesc, make the changes to D3D12_GRAPHICS_PIPELINE_STATE_DESC specific to their particular overload, and then call ID3D12Device::CreateGraphicsPipelineState.

The next change is creating an MSAA render target and depth stencil. In keeping with the framework's design and goals, these are separate classes (RenderTargetMSAA and DepthStencilMSAA) from their non-MSAA versions. Like the pipeline, creating these requires adding the sample count and quality as arguments to their respective CreateD3D12 functions. These additional arguments get applied to the D3D12_RESOURCE_DESC for both the render target and the depth stencil:

D3D12_RESOURCE_DESC resource_desc;
resource_desc.SampleDesc.Count = sample_count;
resource_desc.SampleDesc.Quality = quality;
// fill in the rest of the structure here

Also the render target view and depth stencil view need their view dimension updated (the non-MSAA versions use D3D12_RTV_DIMENSION_TEXTURE2D and D3D12_DSV_DIMENSION_TEXTURE2D respectively):

D3D12_RENDER_TARGET_VIEW_DESC view_desc;
view_desc.ViewDimension = D3D12_RTV_DIMENSION_TEXTURE2DMS;
// fill in the rest of the structure here

D3D12_DEPTH_STENCIL_VIEW_DESC view_desc;
view_desc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2DMS;
// fill in the rest of the structure here

After the resources are created they can be used with a corresponding pipeline by passing them to the framework command list's OMSetRenderTarget overload, or for raw Direct3D12, ID3D12GraphicsCommandList::OMSetRenderTargets. That leaves the final change for adding MSAA support: resolving the MSAA render target to a presentable back buffer.
When using the framework this is accomplished by calling the CommandList's RenderTargetToResolved function, which makes the end of a Draw function something along the lines of:

m_command_list->RenderTargetToResolved(*m_render_target_msaa, current_render_target);
m_command_list->RenderTargetResolvedToPresent(current_render_target);
m_command_list->Close();
graphics.ExecuteCommandList(*m_command_list);
graphics.Swap();

And the implementation of RenderTargetToResolved is:

void D3D12_CommandList::RenderTargetToResolved(const RenderTargetMSAA& src, const RenderTarget& dst)
{
  ID3D12Resource* src_resource = ((const D3D12_RenderTargetMSAA&)src).GetResource();
  ID3D12Resource* dst_resource = ((const D3D12_RenderTarget&)dst).GetResource();

  D3D12_RESOURCE_BARRIER barrier[2];
  barrier[0].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
  barrier[0].Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
  barrier[0].Transition.pResource = src_resource;
  barrier[0].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
  barrier[0].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
  barrier[0].Transition.StateAfter = D3D12_RESOURCE_STATE_RESOLVE_SOURCE;
  barrier[1].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
  barrier[1].Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
  barrier[1].Transition.pResource = dst_resource;
  barrier[1].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
  barrier[1].Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
  barrier[1].Transition.StateAfter = D3D12_RESOURCE_STATE_RESOLVE_DEST;
  m_command_list->ResourceBarrier(2, barrier);

  m_command_list->ResolveSubresource(dst_resource, 0, src_resource, 0, dst_resource->GetDesc().Format);
}

Stream Output

While this is not the last stage in the graphics pipeline, it was the last one to get added to the framework. I've updated the "Direct3D12 framework" gallery with a screenshot of the test program for this stage. It displays 2 quads formed by 2 triangles per quad. While graphically this is not a complex image, there is a fair amount going on in the test program to generate it. The quad on the left is a tessellated 4 point control patch, where the 4 vertices are sent via a vertex buffer from the host to the graphics card and the results of tessellation are rendered and also stored to a stream output buffer. The quad on the right takes the stream output buffer that was filled by the other quad and uses the vertex and pixel shader stages to render the same quad at a different location. Since Direct3D12 does not have an equivalent of Direct3D11's ID3D11DeviceContext::DrawAuto, the test program also uses the number of vertices the GPU wrote to the stream output buffer to determine how many vertices to draw. This involves reading back a UINT64 that is in a buffer on the GPU. For this simple of a test case the read back isn't strictly necessary, but it allows for testing functionality that should prove useful for more complex or dynamic tessellation.

Since the graphics pipelines for the two quads are so different from each other, they need separate pipelines and even separate root signatures. The pipeline for the left quad needs all the stages turned on; it uses the corresponding overload and sets the nullable stream_output argument to point to an instance of the framework's StreamOutputConfig class to describe what is going to the stream output stage and which output stream should go to the rasterizer stage.
The StreamOutputConfig class is a wrapper around D3D12_STREAM_OUTPUT_DESC that adds validation when setting the various fields and computes the stride in bytes for each output stream. The pipeline for the right quad needs the input assembler, vertex shader, rasterizer, pixel shader, and output merger stages enabled. Since the source for setting up these two pipelines is a bit lengthy, rather than copy it here, it is available in the stream_output test program's GameMain::CreateNormalPipeline and GameMain::CreateStreamOutputPipeline functions in the attached source.

The key part linking the two quads is the stream output buffer. It is created in CreateNormalPipeline by calling the framework's StreamOutputBuffer::CreateD3D12 function with the same StreamOutputConfig object that was passed when creating the left quad's pipeline. The heap properties for creating a committed resource for the stream output buffer are normal for a GPU read/write buffer:

D3D12_HEAP_PROPERTIES heap_prop;
heap_prop.Type = D3D12_HEAP_TYPE_DEFAULT;
heap_prop.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
heap_prop.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
heap_prop.CreationNodeMask = 0;
heap_prop.VisibleNodeMask = 0;

Due to the BufferFilledSizeLocation field in D3D12_STREAM_OUTPUT_BUFFER_VIEW, there is a detail worth mentioning when filling out D3D12_RESOURCE_DESC. In Direct3D11 the number of bytes or vertices written to a stream output buffer was handled internally and the application didn't need to think about it. This changed in Direct3D12, where the application needs to provide memory in a GPU writable buffer for this field (https://msdn.microsoft.com/en-us/library/windows/desktop/mt431709%28v=vs.85%29.aspx#fixed_function_rendering). In the framework, I regard this as part of the stream output buffer and want it to be part of its resource, which means increasing the number of bytes specified by the width field of D3D12_RESOURCE_DESC. The question becomes: how many bytes should the resource be increased by to include this field? The MSDN page for D3D12_STREAM_OUTPUT_BUFFER_VIEW (https://msdn.microsoft.com/en-us/library/windows/desktop/dn903817(v=vs.85).aspx) doesn't mention how large this field is, and https://msdn.microsoft.com/en-us/library/windows/desktop/dn903944%28v=vs.85%29.aspx discusses a 32-bit field named "BufferFilledSize" for how many bytes have been written to the buffer, along with another field of unspecified size named "BufferFilledSizeOffsetInBytes". However, increasing the buffer by only sizeof(UINT) causes Direct3D12's debug layer to output all sorts of fun messages when trying to use the stream output buffer. Using a UINT64 works, which would indicate that "BufferFilledSizeOffsetInBytes" is a 32-bit field as well. However, as will be discussed in the part about determining the number of vertices written to a stream output buffer, this 64-bit quantity seems to be 1 field for the number of bytes written rather than 2 separate fields.
So, with it being determined that the additional field is a UINT64, the code for filling in the resource description becomes:

D3D12_RESOURCE_DESC res_desc;
res_desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
res_desc.Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
res_desc.Width = num_bytes + sizeof(UINT64); // note: +sizeof(UINT64) due to graphics card counter for how many bytes it has written to the stream output buffer
res_desc.Height = 1;
res_desc.DepthOrArraySize = 1;
res_desc.MipLevels = 1;
res_desc.Format = DXGI_FORMAT_UNKNOWN;
res_desc.SampleDesc.Count = 1;
res_desc.SampleDesc.Quality = 0;
res_desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
res_desc.Flags = D3D12_RESOURCE_FLAG_NONE;

Since it makes sense for a stream output buffer to be initially ready for use with the stream output stage, the initial resource state should reflect that. So the code for creating the resource is:

ID3D12Resource* buffer;
HRESULT rc = core.GetDevice()->CreateCommittedResource(&heap_prop, D3D12_HEAP_FLAG_NONE, &res_desc, D3D12_RESOURCE_STATE_STREAM_OUT, NULL, __uuidof(ID3D12Resource), (void**)&buffer);
if (FAILED(rc))
{
  ostringstream out;
  out << "Failed to create committed resource for stream output buffer. HRESULT = " << rc;
  throw FrameworkException(out.str());
}

Next the stream output buffer view needs to be created. For D3D12_STREAM_OUTPUT_BUFFER_VIEW, the BufferLocation is the GPU address pointing to the beginning of the stream output buffer, SizeInBytes is the number of bytes in the stream output buffer, and BufferFilledSizeLocation is the GPU address of that additional field I previously talked about, the counter of how many bytes have been written to the stream output buffer. Initially I had put the byte counter field after the stream output buffer. However, so that copying the byte counter to a host readable buffer (D3D12_StreamOutputBuffer::GetNumVerticesWritten) doesn't have to compute an offset when calling ID3D12GraphicsCommandList::CopyBufferRegion, I moved it to the beginning of the resource and pushed the stream output buffer back by sizeof(UINT64).

D3D12_STREAM_OUTPUT_BUFFER_VIEW so_view;
so_view.BufferFilledSizeLocation = buffer->GetGPUVirtualAddress();
so_view.BufferLocation = so_view.BufferFilledSizeLocation + sizeof(UINT64);
so_view.SizeInBytes = num_bytes;

Since the stream output buffer can also be used as a vertex buffer for the input assembler stage, a vertex buffer view needs to be created to support that. It's pretty straightforward: the same start address as the stream output buffer (not including the byte counter field), the total size of the buffer, and the vertex stride in bytes, which is taken from the StreamOutputConfig for the particular stream index.

D3D12_VERTEX_BUFFER_VIEW vb_view;
vb_view.BufferLocation = so_view.BufferLocation;
vb_view.SizeInBytes = (UINT)num_bytes;
vb_view.StrideInBytes = (UINT)vertex_stride;

To make the number of vertices written to a stream output buffer available to an application, the framework provides a static GetNumVerticesWrittenD3D12 function in the StreamOutputBuffer class. Its declaration is:

static void GetNumVerticesWrittenD3D12(GraphicsCore& graphics, CommandList& command_list, const std::vector<StreamOutputBuffer*> so_buffers, ReadbackBuffer& readback_buffer, std::vector<UINT>& num_vertices);

You'll notice that rather than take a single stream output buffer, this takes a collection of them.
This is because, to retrieve the data, the byte count field needs to be copied from the resource for the stream output buffer, which is only GPU accessible, to a host readable buffer. That process involves using a command list to perform the copy, executing the command list and waiting on a fence so that the copying is complete, and finally converting from the number of bytes written to the number of vertices in the buffer (which is dividing the value from the field by the vertex stride). Doing all of that per stream output buffer basically stalls the application on the CPU by waiting for multiple fences to complete. By performing these operations on a collection, there is only 1 fence for the CPU to wait on. The source for this function is:

void D3D12_StreamOutputBuffer::GetNumVerticesWritten(GraphicsCore& graphics, CommandList& command_list, const vector<StreamOutputBuffer*> so_buffers, ReadbackBuffer& readback_buffer, vector<UINT>& num_vertices)
{
  D3D12_Core& core = (D3D12_Core&)graphics;
  ID3D12GraphicsCommandList* cmd_list = ((D3D12_CommandList&)command_list).GetCommandList();
  D3D12_ReadbackBuffer& rb_buffer = (D3D12_ReadbackBuffer&)readback_buffer;
  ID3D12Resource* resource = rb_buffer.GetResource();

  command_list.Reset(NULL);
  num_vertices.reserve(num_vertices.size() + so_buffers.size());
  vector<StreamOutputBuffer*>::const_iterator so_it = so_buffers.begin();
  UINT64 dst_offset = 0;
  while (so_it != so_buffers.end())
  {
    D3D12_StreamOutputBuffer* curr_so_buffer = (D3D12_StreamOutputBuffer*)(*so_it);
    cmd_list->CopyBufferRegion(resource, dst_offset, curr_so_buffer->m_buffer, 0, sizeof(UINT64));
    ++so_it;
    dst_offset += sizeof(UINT64);
  }
  command_list.Close();
  core.ExecuteCommandList(command_list);
  core.WaitOnFence();

  rb_buffer.Map();
  UINT64* cbuffer_value = (UINT64*)rb_buffer.GetHostMemStart();
  so_it = so_buffers.begin();
  while (so_it != so_buffers.end())
  {
    D3D12_StreamOutputBuffer* curr_so_buffer = (D3D12_StreamOutputBuffer*)(*so_it);
    num_vertices.push_back((UINT)((*cbuffer_value) / curr_so_buffer->m_vb_view.StrideInBytes));
    ++so_it;
    ++cbuffer_value;
  }
  rb_buffer.Unmap();
}

One additional thing I would like to mention about the stream output stage is that if multiple output streams are used by a shader, they must be of type PointStream. In the test program, I am sending the tessellated object space coordinate to the stream output buffer, but due to this restriction I am also sending it to the pixel shader (in addition to the projection coordinates). While sending this additional field to the pixel shader isn't harmful, it's also unnecessary since the pixel shader doesn't use it. Having the left quad drawn as points isn't the behavior I wanted for this test program, so using multiple output streams to minimize the data sent to both the stream output and pixel shader stages wasn't viable in this case.
  7. Just a minor update from my previous post. I've updated my D3D12 framework to use committed instead of placed resources, along with centralizing the texture creation code. The switch-over was pretty straightforward, since it was basically removing an additional resource heap and switching my calls from CreateResource on my heap wrapper class (which internally called CreatePlacedResource) to CreateCommittedResource instead, while being careful to copy the correct initial state for the resource type. Since placed resources are something I will probably want to revisit down the road, along with revising them to allow for overlapping resources (which is their primary benefit), I saved off the diff as reference code in the ref directory in the project. Also, in my previous post I had mentioned that I needed to use 2 different resources and copy between them to use a render target as a texture. While that is true with placed resources on heap tier 1 devices, when using committed resources it is possible to use the same resource for the render target as the texture. It just requires that the flags in the resource description include D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET. Since MSDN recommends setting that flag only when the texture will be used as a render target, and since most textures will be loaded from a file in real usage instead, I created the additional Texture2DRenderTarget and D3D12_Texture2DRenderTarget classes. The existing RenderTarget/D3D12_RenderTarget class can have an instance created from one of these new classes, which causes it to share the same resource (with corresponding AddRef/Release calls for managing its lifetime). I could have re-used the existing Texture2D/D3D12_Texture2D classes by adding a flag to their create functions, however adding the new classes keeps with the goal of using compile time type checking to avoid incorrect operations. I have added another test program to demonstrate this (render_target_to_texture_same_resource). In the spirit of the test programs being unit tests and serving as reference code, I have kept the previous test program and framework functionality around.
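For anyone curious what the shared-resource case looks like at the API level, the essential part is just the extra flag on the resource description when the committed resource is created; both a shader resource view and a render target view can then be created against the same resource. A trimmed sketch under those assumptions (view creation, clear value, and error handling omitted; the variable names are mine):

// Sketch: create a 2D texture usable both as a texture and as a render target by
// setting D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET at creation time.
D3D12_HEAP_PROPERTIES heap_prop = {};
heap_prop.Type = D3D12_HEAP_TYPE_DEFAULT;

D3D12_RESOURCE_DESC desc = {};
desc.Dimension        = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
desc.Width            = width;
desc.Height           = height;
desc.DepthOrArraySize = 1;
desc.MipLevels        = 1;
desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Layout           = D3D12_TEXTURE_LAYOUT_UNKNOWN;
desc.Flags            = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET; // the key difference

ID3D12Resource* texture = NULL;
HRESULT rc = device->CreateCommittedResource(&heap_prop, D3D12_HEAP_FLAG_NONE, &desc, D3D12_RESOURCE_STATE_RENDER_TARGET, NULL, __uuidof(ID3D12Resource), (void**)&texture);
// on success, create an SRV and an RTV against the same "texture" resource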
  8. Playing with Direct3D12

    It's been quite some time since I last posted here, despite reading some of the forums regularly. For this post I've gotten my work-in-progress Direct3D12 framework project into a good enough state that I thought I'd share it out (a zip file of the source is attached). The purposes of this project are:

- Play around with Direct3D12 to get hands-on experience with it.
- Build a framework that uses compile time type checking to try to avoid invalid operations. A natural extension of this point is that inclusion of d3d12 headers should be restricted to inside the framework, and the consuming application should not include d3d12 headers or have them included indirectly from the framework's public headers.

In its current state, the project supports:

- All the rendering stages except stream output
- Placed resources (see note at the end of this post)
- Heap tier 1 hardware (https://msdn.microsoft.com/en-us/library/windows/desktop/dn986743%28v=vs.85%29.aspx)
- Vertex, index, and instance buffers
- The various texture types (1D, 1D array, 2D, 2D array, and 3D)
- Multiple viewports
- Resizing the window and toggling full screen mode (note: there is a bug at the moment for resizing while in full screen mode)
- Debug and release modes for x86 and x64

The compute pipeline is on my to-do list. The texturing support also needs to be improved to support mipmapping and MSAA textures. By not supporting MSAA at the moment, I've been able to cheat when it comes to dealing with alignment of resources in a buffer heap by only considering the size, since they all use the same alignment without MSAA.

Getting Started

When I started this project, the first step was to upgrade to Windows 10 since I was still on 7 at that point, as well as updating to Visual Studio 2015. After that, a driver update was still necessary so that I could use Direct3D12 with the existing graphics card. Once I got to working with code, the starting point of this project was my Direct3D11 framework that I had discussed in a previous dev journal entry. With the new Visual Studio project, I was initially confused why it couldn't find d3d12.h, for which the answer turned out to be that I needed to update my project settings. By default a project's "Target Platform Version" (in the "General" section) is "8.1", which is an issue since Windows 8.1 doesn't support Direct3D12. After updating that field to "10.0.10240.0", the compiler was able to find the Direct3D12 headers without further issue. Another stumbling block I hit early on was figuring out that I was on heap tier 1 hardware, meaning that each resource heap can only hold 1 type of resource, whereas tier 2 allows for all types of resources in the same resource heap.
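Figuring out which heap tier the hardware supports is a single feature-support query; this is how I understand it maps to the API (a sketch outside the framework, error handling trimmed):

// Sketch: query the resource heap tier so the code knows whether all resource types can
// share one heap (tier 2) or each resource type needs its own heap (tier 1).
D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
HRESULT rc = device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS, &options, sizeof(options));
if (SUCCEEDED(rc) && options.ResourceHeapTier == D3D12_RESOURCE_HEAP_TIER_1)
{
  // buffers, render targets/depth stencils, and other textures must live in separate heaps
}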
Test Programs

As I had mentioned, the starting point for my Direct3D12 project was my Direct3D11 framework, which had included only a few test programs, since my approach was to update the existing test program until I was doing something incompatible with it and then spin off a new one. When going to Direct3D12, I very quickly realized that approach was problematic. The main test program was trying to use vertex, index, and instance buffers, along with constant buffers for the various cameras, texturing, and multiple viewports. While all of those are perfectly reasonable to do with a graphics API, that additional functionality was encumbering when trying to start with the basics of initializing the API. So, with this project, instead of constantly expanding a minimal number of test programs, there are a variety of them. This has the added benefit of them basically being unit tests for previously implemented functionality. To keep with that idea, each one is focused on testing the functionality in a simple way instead of implementing some graphics algorithm, which allowed me to focus on debugging the functionality instead of also debugging a higher-level algorithm. This distinction is perhaps most clear in the test program for the hull and domain shader stages: while most people would probably implement an LOD or terrain smoothing algorithm, I took the approach of rendering a wireframe of a square to visualize how the tessellation was subdividing it. As the following list shows, the test programs were created in order of expanding functionality and complexity. Screenshots of the test programs are in the album at https://www.gamedev.net/gallery/album/1165-direct3d12-framework/.

Differences from the Direct3D11 Framework

Aside from which graphics API is being used, there are a few important differences between the Direct3D11 framework and the 12 one.

Working with Direct3D12

Since the significant difference between Direct3D11 and 12 is resource management rather than graphical capabilities, that has pretty much been the main focus of the framework so far. Because the framework uses placed resources, creation is a bit more involved than it needs to be: the aligned size of a resource has to be determined, a buffer heap created to store all the resources of that type, a descriptor heap created with enough entries for all the resources needed for the frame, and finally all of that used to create the resource and its view. My understanding of committed resources is that finding the aligned size and creating the buffer heap can be skipped if those are used instead of placed resources. Aside from resource creation, another resource management difference is needing to use a fence to allow the graphics card to finish processing a command list. I've had to work with fences on non-graphics projects before, and with a little reasoning I found it pretty straightforward to determine where these needed to be added. And speaking of command lists, I found them to be a very nice change from Direct3D11's contexts. The rendering process with command lists is basically: reset the list, tell it which pipeline to use, set up your resources, issue your draw commands, transition the render target to the present state, close the command list, and execute it. This keeps the pipeline in a clean and known state, since there is no concern over what was bound to the pipeline for the previous frame or render pass. One key area of difference between Direct3D11 and 12 that I need more practice with is the root signature. These are new with Direct3D12 and describe the stages used and the layout of resources for each stage. Since a program should try to minimize the number of root signatures, they should be defined in a way that covers the variety of pipeline configurations the program will use. The key issues I had with them were figuring out when a root signature entry needs to be a table vs a view, and, when a shader uses multiple textures, whether to use 1 entry in a table where the register space covers all the textures or one entry per texture in the table (and in case you're curious, the one-table-entry-per-texture approach is the correct one). A rough sketch of that layout is below.
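To make that concrete, here is a rough, hedged sketch (not the framework's code) of a root signature where one descriptor table holds one SRV entry per texture for a pixel shader sampling two textures, assuming a valid ID3D12Device* named device:

// Sketch: one SRV range entry per texture inside a single descriptor table.
D3D12_DESCRIPTOR_RANGE ranges[2] = {};
ranges[0].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
ranges[0].NumDescriptors = 1;
ranges[0].BaseShaderRegister = 0; // t0
ranges[0].OffsetInDescriptorsFromTableStart = 0;
ranges[1].RangeType = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
ranges[1].NumDescriptors = 1;
ranges[1].BaseShaderRegister = 1; // t1
ranges[1].OffsetInDescriptorsFromTableStart = 1;

D3D12_ROOT_PARAMETER param = {};
param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
param.DescriptorTable.NumDescriptorRanges = 2;
param.DescriptorTable.pDescriptorRanges = ranges;
param.ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;

D3D12_ROOT_SIGNATURE_DESC desc = {};
desc.NumParameters = 1;
desc.pParameters = &param;
desc.Flags = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

ID3DBlob* blob = NULL;
ID3DBlob* error = NULL;
D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &blob, &error);
ID3D12RootSignature* root_sig = NULL;
device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(), IID_PPV_ARGS(&root_sig));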
One project I've periodically thought about is creating a program that displays a GUI of the graphics pipeline, allows the shaders for the different stages to be specified, and then outputs a root signature that is compatible with those settings. However, that is something I haven't started yet.

Test Programs
- host_computed_triangle: A very basic test of rendering a triangle. Instead of communicating a camera's matrices via a constant buffer so the vertices can be transformed in the vertex shader, this test is so simple that the projection coordinates of the triangle's vertices are set when creating the vertex buffer, so the vertex shader is just a pass-through.
- index_buffer: Like host_computed_triangle, this test has the projection coordinates set when creating the vertex buffer, but adds an index buffer so that 2 triangles make a quad.
- single_2d_texture: A copy of host_computed_triangle modified to use a green texture instead of per-vertex colors.
- constant_buffer: A copy of host_computed_triangle modified to use a color specified in a constant buffer instead of the per-vertex color.
- two_textured_instance_cubes: Uses vertex and index buffers to render a cube, with the same 2D texture applied to each face. This uses a constant buffer to communicate a camera's matrices to the vertex shader. By adding an instance buffer, two cubes are displayed instead of just the one specified in the vertex/index buffers. I also added using the left and right arrow keys to rotate the camera around the cubes. After seeing what should have been an obscured cube drawn on top of the other one, I realized that the previous test programs had depth testing turned off. So, while I originally intended this program to be just a test of instance rendering, it is also the first test of depth stencils.
- geometry_shader_viewports: This is 4 cameras/viewports showing different angles of two_textured_instance_cubes. The left and right arrow keys will only update the upper-left camera. This brings the test programs on par with the main Direct3D11 framework test program, though it does rendering to multiple viewports completely differently. The Direct3D11 framework test program would iterate over the different viewports to make them active and issue the draw commands to that particular viewport, whereas this test program renders to all 4 viewports at once by using the geometry shader to create the additional vertices and send them to the correct viewport.
- hull_and_domain: In order to exercise two more stages of the rendering pipeline that hadn't been used yet, this draws a quad in wireframe mode so that the tessellation is visible. I plan to revisit this program in the future to make the tessellation factors dynamic based on user input, but for now hull_and_domain_hs.hlsl needs to be edited, recompiled, and the program restarted to view the changed factors.
- texture_type_tester: Draws a quad to the screen, where pressing spacebar cycles through 1D, 2D, and 3D textures. The 1D texture is a gradient from red to blue. The 2D texture is the same one used by two_textured_instance_cubes. The 3D texture has 3 slices where the first is all red, the second is all green, and the third is all blue. The uvw coordinates are set so that some of each slice is visible.
- texture_array_tester: Uses instance rendering to draw 3 quads to the screen, where pressing space cycles through a 1D or 2D texture array. The 1D textures are a gradient from red to blue, where to show the different entries in the array the green component is set based on which entry it is (0 for the first, 50% for the second, and 100% for the last). The 2D textures take the same approach to the green component, but instead of being a gradient, each has a red triangle in the upper left half and a blue triangle in the bottom right half.
- render_target_to_texture: Draws to a render target, then copies the render target to a texture. Since heap tier 1 in Direct3D12 does not allow a render target texture to be on the same heap as a non-render-target texture, the same ID3D12Resource could not be used for both a texture and a render target with only different views created for it, like was done in Direct3D11. Hence this program creates two different ID3D12Resources and copies between them.
- texture_multiple_uploads: The previous test programs uploaded textures from host memory to the graphics card's memory one at a time, with a fence between each, re-using the same buffer in host memory. This program uploads all the textures at once with only 1 fence by using 1 buffer per texture. Otherwise it's the same as texture_array_tester.

Lessons Learned

One of the goals of the Direct3D11 framework was to be similar to XNA. While the Direct3D12 framework has a nearly identical Game base class, similarity to XNA is not a goal of this project. In particular, the ContentManager has been removed from the framework, which included loading image formats as textures. There were two motivating factors behind this. The first is that file formats aren't really part of a graphics API. The second is that an asset loading library built on top of the framework is more extensible and allows new formats to be added if a particular project requires them, instead of bloating the framework with a wide variety of formats. If you look through the source for the test programs, you'll notice I currently don't have an asset loading library and instead create my textures in code, so the idea of an additional library is more relevant if this framework were used on a real project. But it's still a matter of filling in an in-memory buffer to pass to the framework along with the pixel format, so the only things really missing are dealing with file IO and file formats.

This framework also has consistent error handling (aside from 1 more class to update). Rather than using the included log library to note the problem and then return NULL or false, this one uses a FrameworkException class which inherits from std::exception. Improving that exception class is on my to-do list, but in its current state it has moved log library usage out of the framework and makes the application aware of the problem. Due to the simplicity of the test programs, they still call into the log library and bail out of the program, but a real application could potentially take steps to work around the problem (e.g. if you get an exception when trying to create a texture due to not enough space in the resource heap, make another resource heap with enough space). Another use of the FrameworkException class is argument validation. The Direct3D11 framework didn't consistently do argument validation and frequently just documented that it was the caller's responsibility to ensure a particular invariant. In keeping with C++'s pay-for-what-you-use philosophy, I made argument validation optional via a define in BuildSettings.h. In the attached zip file, it is turned on, because while I try to use compile-time type checking to avoid incorrect operations, that can't catch all cases, and I found it useful while developing the test programs. For a real project, I would expect it to be turned on in debug mode but off in release mode.
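As a rough, hypothetical sketch of that pattern (the names FrameworkException and VALIDATE_FUNCTION_ARGUMENTS come from the framework, but these bodies are illustrative, not its actual code):

#include <exception>
#include <string>

// Illustrative exception type carrying a message, as described above.
class FrameworkException : public std::exception
{
  public:
    explicit FrameworkException(const std::string& msg) : m_msg(msg) {}
    const char* what() const noexcept override { return m_msg.c_str(); }
  private:
    std::string m_msg;
};

// BuildSettings.h would define (or not define) VALIDATE_FUNCTION_ARGUMENTS, so a create
// function only pays for validation when the define is present.
void CreateTexture2D(unsigned int width, unsigned int height)
{
#ifdef VALIDATE_FUNCTION_ARGUMENTS
  if (width == 0 || height == 0)
  {
    throw FrameworkException("texture dimensions must be non-zero");
  }
#endif
  // ... actual resource creation would go here ...
}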
From a previous dev journal entry, the Direct3D11 framework generated its public headers by using ifndefs and a processing program. That was a very messy solution, and in retrospect I'm not really sure why I did it that way; it also didn't solve the issue (nor was it intended to) of the application indirectly getting the Direct3D11 header files included by including the framework headers. The Direct3D12 framework keeps the idea of public vs private headers, but doesn't use a processing program. The public headers are the generic interface and don't need to include the Direct3D12 headers. They have factory methods to create the actual instances from the private headers. The pimpl idiom could also have been used here instead of inheritance to solve this issue in a reasonable way. The main reason I chose factory methods and inheritance is that if the backing graphics API were changed to OpenGL, Metal, or something else, it would be a matter of adding new factory methods and private header subclasses (along with argument validation to make sure all the private classes are using the same graphics API).

For shaders, instead of continuing what I had done in the Direct3D11 framework of changing the name of the entry point function to reflect which stage the shader was for, it would have been less updating of file properties to go with Visual Studio's default of "main". Though when editing the file properties, setting the dialog to "all configurations" and "all platforms" does help mitigate this rename issue.

Multiple test programs that check functionality instead of implementing a graphics algorithm worked quite well, though they are visually less impressive. Multiple viewports in Direct3D12 seem to be best implemented by using the geometry shader to send the vertices to the correct viewport. And compile-time type checking isn't enough to avoid invalid operations, hence the VALIDATE_FUNCTION_ARGUMENTS ifdefs.

When I was first working on the hull_and_domain test program, as a result of copying and pasting a different project as the starting point, the index buffer was initially set up for using 2 triangles to make a quad. Since the code was modified to use an input topology of a 4-point control patch, that produced an incorrect result, which let me realize that I just needed to update the index buffer to be the indices of the 4 vertices in order. While this is obvious in retrospect and didn't take me long to figure out, the online resources I looked at didn't include this detail, so I figured I'd mention it here.

Setting the build/run platform to x64 helps find more spots where argument validation is needed. This is mainly due to my use of the STL, where size() returns a size_t that varies in width by platform, whereas the Direct3D12 API uses fixed-width types such as UINT32 across all platforms.

Other Notes

After the last Windows Update, there were additional messages generated for incorrect states. I'm glad these messages were added, and I took care of resolving them inside the framework.
They were things like the initial states of the depth stencil and render targets needing to be D3D12_RESOURCE_STATE_DEPTH_WRITE and D3D12_RESOURCE_STATE_PRESENT respectively, instead of D3D12_RESOURCE_STATE_GENERIC_READ, which I had simply copied and pasted from the texturing and constant buffer heaps. While most of the Direct3D12 examples I saw online used committed resources, which are simpler to set up, the framework currently uses placed resources. This is due to my wanting to get hands-on experience with the interactions between the resource, the resource heap, and the descriptor heap. However, since the framework currently prevents creating overlapping resources, which is the primary benefit of placed resources, and since Nvidia recommends committed resources over placed ones, I will likely revise this in the future.
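To make that initial-state note concrete, here is a rough, hedged sketch (not the framework's code) of creating a committed depth stencil that starts in D3D12_RESOURCE_STATE_DEPTH_WRITE, assuming a valid ID3D12Device* named device and an arbitrary 1024x768 window:

// Sketch: a depth stencil should start in DEPTH_WRITE rather than GENERIC_READ.
D3D12_RESOURCE_DESC desc = {};
desc.Dimension        = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
desc.Width            = 1024; // assumed
desc.Height           = 768;  // assumed
desc.DepthOrArraySize = 1;
desc.MipLevels        = 1;
desc.Format           = DXGI_FORMAT_D32_FLOAT;
desc.SampleDesc.Count = 1;
desc.Flags            = D3D12_RESOURCE_FLAG_ALLOW_DEPTH_STENCIL;

D3D12_HEAP_PROPERTIES heap_props = {};
heap_props.Type = D3D12_HEAP_TYPE_DEFAULT;

D3D12_CLEAR_VALUE clear = {};
clear.Format = DXGI_FORMAT_D32_FLOAT;
clear.DepthStencil.Depth = 1.0f;

ID3D12Resource* depth_stencil = NULL;
device->CreateCommittedResource(&heap_props, D3D12_HEAP_FLAG_NONE, &desc,
  D3D12_RESOURCE_STATE_DEPTH_WRITE, &clear, IID_PPV_ARGS(&depth_stencil));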
  9. D3D11 Framework Update

While there is still plenty of work to do on my d3d11 framework, I thought I'd post an update on the current status. What's currently in it:
- keyboard input
- mouse input
- vertex shaders
- pixel shaders
- viewports
- blend and depth stencil states
- texturing (just loading from a file for now, will be expanding functionality later)
- resizing/fullscreen toggling

A short list of things that I will be adding (I have a longer list of tasks for my own reference):
- rest of the shader types
- model loading
- high dpi mouse support
- joystick/xbox controller support
- deferred contexts
- sprite support (DirectXTK looks like it might make this part easy, especially the text support)
- networking lib

It would be further along, but two main factors slowed down development. The first and main factor was that I wasn't able to work on it for most of January due to work requiring a significant amount of extra hours. The second, more of a speed bump, was that there are several functions that MSDN notes cannot be used in applications submitted to the Windows Store (the entire D3D11 reflection API, for example). While I am a long way off from doing anything with the Windows Store, if ever, I figured it might make this framework more useful and reduce maintenance down the line if I followed the recommendations and requirements for it. Part of that is to compile shaders at build time instead of runtime, so I upgraded to VS2012 to get integrated support for that instead of trying to roll my own content pipeline (which is not off the table for other content types, just no immediate plans for that feature). Converting the project from VS2010 to VS2012 needed four notable changes, the first three of which are straightforward:

1. Removing references to the June 2010 DirectX SDK from include and linker paths, since the header and library files are now part of the VS2012 installation and available in default directories.

2. Switching from xnamath.h to directxmath.h, which in addition to changing the name of the include file also moves the types in the file into the DirectX namespace. Since the rest of the DirectX functions and types (e.g. ID3D11DeviceContext) are not in that namespace, it seems a bit inconsistent to have some functions/types in the namespace and others outside of it.

3. The next easy change was to do build-time shader compilation and change the runtime loading mechanism to load the .cso file. This was a matter of splitting the vertex and pixel shaders into separate .hlsl files and setting their properties appropriately (mainly "Entrypoint Name" and "Shader Type"; I also like "Treat Warnings As Errors" on). The ContentManager class also needed its shader loading mechanism updated. Instead of calling D3DX11CompileFromFile in the CompileShader function, the CompileShader function was changed to LoadFile, which does a binary load of a file into memory. After that, the code was the same as before: calling CreateVertexShader or CreatePixelShader on the ID3D11Device instance. A rough sketch of this flow is below.
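As a rough sketch of that load-and-create flow (the function and file names here are stand-ins, not the ContentManager's actual code), assuming a valid ID3D11Device* named device:

#include <d3d11.h>
#include <fstream>
#include <vector>

// Sketch: binary-load a compiled shader object (.cso) into memory.
std::vector<char> LoadFile(const char* path)
{
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  std::vector<char> data((size_t)file.tellg()); // error handling omitted for brevity
  file.seekg(0);
  file.read(data.data(), data.size());
  return data;
}

// Sketch: hand the loaded bytes to the device, same as when they came from D3DX11CompileFromFile.
ID3D11VertexShader* CreateVertexShaderFromCSO(ID3D11Device* device, const char* path)
{
  std::vector<char> bytecode = LoadFile(path); // e.g. "test_vs.cso" (hypothetical name)
  ID3D11VertexShader* shader = NULL;
  device->CreateVertexShader(bytecode.data(), bytecode.size(), NULL, &shader);
  return shader;
}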
There was a little wrinkle in these edits regarding the current working directory when debugging vs the location of the .cso files. The .cso files were placed in $(OutDir) with the results of compiling the other projects in the solution (i.e. the .lib and .exe files, not the .obj for each code file), whereas the working directory when debugging was the code directory for the test program. This meant that just specifying the filename without a path would fail to find the file at runtime. So either the output path for compiling the shaders needed to change, or the working directory when debugging did. I chose to change the working directory, since running in the same directory as the .exe file is a better match for how an application would be used outside of development. This also meant that the texture I was using needed to be copied to $(OutDir) as well, which was easy enough to add as a "Post-Build Event" (though I initially put in the copy command as "cp" since I'm used to Linux, but it didn't take long to remember the Windows name for the command is "copy").

4. This was the not-so-straightforward change: texture loading. Previously I was using D3DX11CreateShaderResourceViewFromFile, which is not available in VS2012. To get texture loading working again, I had to do a bit of research. I didn't want to get bogged down in loading various image file formats and reinventing the wheel (this entire project can probably be called reinventing the wheel, but the point of it is more about applying D3D11 and expanding my understanding of it). Luckily, right on the MSDN page for D3DX11CreateShaderResourceViewFromFile, there are links to recommended replacements. The recommended runtime replacement is DirectXTK, which compiled right out of the box for me, no playing with settings or paths to get it to compile. In using it, there was one problem I ran into. Initially I was using seafloor.dds from Tutorial 7 of the June 2010 DirectX Sample Browser. The DirectXTK Simple Win32 Sample comes with a texture file of the same name that looks the same. However, there is at least a slight difference in the file contents: the one from the June 2010 sample failed to load with an ERROR_NOT_SUPPORTED issue, but the one from the DirectXTK sample works. After dealing with that, I decided to get rid of third-party content, and am now using a debug texture I had lying around, which is a .png file and works with DirectXTK just fine.

Backing up for a moment to before the VS2012 upgrade: in my previous journal entry, "D3D11 Port of Particle System Rendering", I mentioned that I wanted to look into other ways of creating the input layout description for a vertex shader. The first thing I looked into was the shader reflection API, so that the input layout would automatically be determined by the vertex shader code. This was obviously before I knew the API would not be available to Windows Store applications or decided to follow those requirements. Aside from that, I ran into a different issue that prevented me from using it here: the vertex shader code knows what its inputs are, but it doesn't know which are per-vertex and which are per-instance. From looking at D3D11_INPUT_ELEMENT_DESC, knowing per-vertex vs per-instance is important for setting the InputSlotClass and InstanceDataStepRate fields correctly. Since reflection wasn't an option here, I created a class to manage the input layout, not surprisingly named InputLayout. It avoids the simple mistake I made in the previous journal entry of incorrectly computing the aligned byte offsets, as well as avoiding spelling mistakes in semantic names and attempts to set more input layout entries than were allocated. In its current form, it does force non-instance vertex buffers into slot 0 and instance buffers into slot 1, precluding any additional vertex buffers, which I might revisit once I hit a scenario that requires more.
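For reference, here's a rough sketch (not the InputLayout class itself) of a layout description mixing per-vertex data in slot 0 with per-instance data in slot 1, showing the InputSlotClass and InstanceDataStepRate fields that reflection can't determine; the semantic names are hypothetical:

// Sketch: per-vertex position/texcoord in slot 0, per-instance offset in slot 1.
D3D11_INPUT_ELEMENT_DESC layout[] =
{
  { "POSITION",    0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,
    D3D11_INPUT_PER_VERTEX_DATA, 0 },
  { "TEXCOORD",    0, DXGI_FORMAT_R32G32_FLOAT,    0, D3D11_APPEND_ALIGNED_ELEMENT,
    D3D11_INPUT_PER_VERTEX_DATA, 0 },
  { "INST_OFFSET", 0, DXGI_FORMAT_R32G32B32_FLOAT, 1, 0,
    D3D11_INPUT_PER_INSTANCE_DATA, 1 }, // advance once per instance
};
// device->CreateInputLayout(layout, 3, vs_bytecode, vs_bytecode_len, &input_layout);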
For configuring the D3D11 rendering pipeline, my approach has been to get everything into a known state. Take the example of setting a constant buffer on the vertex shader stage: if a particular constant buffer is not set, then whatever the previous value was will still be present the next time the shader is invoked. To avoid these stale states, when a VertexShader instance's MakeActive function is called, it sets all of the constant buffers to the last values the VertexShader instance received for them (the actual function for setting a constant buffer is in ShaderBase). Taking this known-state design a step further is where the RenderPipeline class comes in (once I get the parts done and integrated into it, anyway). The plan is that when an instance's MakeActive function is called, the vertex shader, pixel shader, their constant buffers, etc. all become active on the provided ID3D11DeviceContext. An obvious improvement over setting all of the pipeline info each time would be to only set the parts that don't match the last values for that ID3D11DeviceContext instance. However, I've been trying to get things working before I start looking into performance improvements. A minimal sketch of the known-state idea appears at the end of this entry.

Below is a screenshot of the test program, which is using instance rendering for 2 cubes, and 4 viewports to display them from different angles.

[sharedmedia=gallery:images:3470]

Attached is a zip file that has the source for the framework (MIT license, like normal for me), though it does not include DirectXTK, which can be found here. To add it in, put the DirectXTK include files in the framework's external_tools/DirectXTK/public_inc/ directory, and once built, put DirectXTK.lib in external_tools/DirectXTK/lib/Debug/ or external_tools/DirectXTK/lib/Release/ depending on which configuration you built it for.
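Going back to the known-state design described earlier in this entry, a minimal, hypothetical sketch of the idea (not the framework's actual VertexShader/ShaderBase classes) might look like:

#include <d3d11.h>

// Sketch: a vertex shader wrapper that rebinds all of its constant buffers on activation,
// so nothing stale from a previous frame or render pass is left on the context.
class VertexShaderSketch
{
  public:
    VertexShaderSketch(ID3D11VertexShader* shader) : m_shader(shader)
    {
      for (UINT i = 0; i < NUM_SLOTS; ++i) { m_buffers[i] = NULL; }
    }

    void SetConstantBuffer(UINT slot, ID3D11Buffer* buffer)
    {
      if (slot < NUM_SLOTS) { m_buffers[slot] = buffer; }
    }

    void MakeActive(ID3D11DeviceContext* context)
    {
      context->VSSetShader(m_shader, NULL, 0);
      // rebind every slot to the last value this instance received
      context->VSSetConstantBuffers(0, NUM_SLOTS, m_buffers);
    }

  private:
    static const UINT NUM_SLOTS = 4; // assumed small fixed count for the sketch
    ID3D11VertexShader* m_shader;
    ID3D11Buffer*       m_buffers[NUM_SLOTS];
};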
10. I don't have immediate plans to port this again. When I have free time again, I'd like to get back to working on my D3D11 framework, which I have made progress on but still has a long list of things that need to be done.
  11. Public header generation

In the course of expanding and fixing the D3D11 framework from my previous journal entry, I've found that there are types and functions that I want to have public within the library but not exposed outside of it (internal scope in C#). Instead of purely using the pimpl idiom (for those unfamiliar with the term or how to apply it, these should help explain it: http://en.wikipedia.org/wiki/Opaque_pointer and https://www.gamedev.net/page/resources/_/technical/general-programming/the-c-pimpl-r1794), I decided on public and private headers, for which pimpl can be used to hide types. To automatically generate the public headers, I created a utility program that uses tokens in the private headers to generate the public ones (to be clear, the utility doesn't automatically do pimpl, that's up to the developer; the utility just does simple line exclusion). I did a few quick searches and didn't see a utility that already did this for C++, but it was simple enough to program my own.

There were a few choices for how to specify the start/stop tokens: comments, ifdef, or ifndef. So that IDE outlining can show the affected sections and so the build of the library doesn't need an additional define (this is pure laziness since adding a define to the project settings is trivial), I went with lines starting with "#ifndef" and "#endif" which also contain "PUBLIC_HEADER" to mark the start/end of sections to exclude from the public header. The lines need to start with the "#ifndef" or "#endif" tokens so that, if needed, a line can be commented out rather than deleted and then having to remember where it goes if/when it needs to be added back in. I've tested it successfully so far for hiding types (via pimpl's declare-and-only-use-pointers approach) and functions. For an example usage:

#ifndef FOO_H
#define FOO_H

class Foo
{
  public:
    // stuff...

#ifndef PUBLIC_HEADER
    // this function is excluded from the public header
    void blah();
#endif /* PUBLIC_HEADER */

    // stuff...

  private:
    // stuff...
};

#endif /* FOO_H */

For integrating it into the library build, I added a post-build event to make the public include directory and call the utility. That has the side effect that the test program in the solution can't check its code against the library headers until the library build is complete, and if an error is found during the test program build, clicking the error message opens the public header instead of the private one in the solution. I consider this okay since it is not the normal use case. The normal use case is a separate solution where a released build of the library is added to "additional dependencies" and the path to the public headers is added to the "additional include directories" project settings. An alternative to generating public headers is to use the same headers for public and private and switch on an ifdef when building the library. I didn't go this way primarily due to one issue: if a program build sets that define, all the library's private parts become available to the program.

Source (under MIT license as normal):
  12. I didn't create any for this since they would look the same as the previous journal entry, aside from the window icon and name.  The particle systems I did to test the functionality of this port look the same as they do in the XNA renderer.
13. The D3D11 port of the rendering process for my particle system editor is complete (the VC++ 2010 project and source under the MIT license can be found at the end of this entry; you will need to edit the include and additional dependency paths to build). The hardest part of the project has been finding time for it, since my job has been demanding evenings and weekends recently to release that project. But I've been on vacation this past week and spent part of it finishing this port, which from looking back took about 12 weeks of calendar time and vastly less actual time (at least half of which was this week). I started this port after reading "Practical Rendering & Computation with Direct3D11" and thought it would be a good first project to start applying what I had learned from that book. From reading it and from using XNA on previous projects, I knew I wanted to create an interface layer to hide most of the details of D3D11 from the application, which is where the d3d11_framework project (also included below) came from. It is a partial recreation of the XNA framework in C++, though it is comparatively in very rough shape. It also takes into account the differences between D3D9/XNA and D3D11, such as using xnamath.h and passing an ID3D11DeviceContext* to the rendering functions so that multithreaded rendering is possible, though the creation of deferred contexts is currently not implemented. The current form is good enough to get through a sample project (a port of tutorial 7 from the D3D11 samples included in the June 2010 SDK, to which I added instance rendering) and the particle system rendering. My plan for my next project is some cleanup work and expansion of the framework, including pushing parts of the particle system port into the framework. I already have a laundry list of tasks that is sure to grow. I had thought about a compute shader version of the particle system, but I'm going to hold off on that at least for now.

In the course of working on this project, I'm pretty sure I made every newbie mistake:
- Using XMVECTOR and XMMATRIX in dynamically allocated types, which led to access violations when changing them since they may not be properly aligned. So now I use them only during calculations and store the results in XMFLOAT3/XMFLOAT4 or XMFLOAT4X4.
- Missing that the order of the arguments to XMMatrixRotationRollPitchYaw(pitch, yaw, roll) is different from XNA (yaw, pitch, roll) and different from its function name (roll, pitch, yaw).
- Forgetting that constant buffers bound to the vertex shader stage are not automatically bound to the pixel shader stage (not sure if automatically binding them is just an XNA behavior or a D3D9 one).
- For the layout description, having the aligned byte offsets of both instance entries set to 0 (one of the things on my to-do list for the framework is to look into making creating the layout description less error prone).

In addition, while creating the C++ port, I wondered why I made the C# editor and XNA renderer multithreaded. The C++ version is completely single threaded, since there really is no need for multithreading. The best answer I can come up with for why the C# versions have multithreading is that C# makes it easy.

d3d11_framework:
particle system renderer:
  14. Particle System Editor - Edit: Now with source

The journal has been updated to include a zip file of the source. I took a look at SlimDX, but have not had time to get the sample programs working under Visual C# 2010 (they compile, but there is a runtime exception early on).