Entries in this blog

tricona

Update: source now available at https://github.com/lyost/d3d12_framework

While trying to build a couch and dealing with a broken pipe below the concrete floor of the basement, I've also continued playing with Direct3D12.  Since the last blog entry, I have implemented an FPS meter that uses a basic texture atlas for its display, added classes for vertex and index buffers that reside in GPU memory without direct CPU access, and added a depth-fail shadow volume test case to exercise the stencil part of the depth-stencil in the framework.

FPS Meter

[Screenshot: FPS meter display (fps_monitor.png)]

So far in the framework, the Game base class passed the value of the fixed timestep to the update and draw functions as the elapsed time.  To compute the actual number of frames per second, the actual elapsed time between frames is needed instead.  So, both values are now provided as arguments to the update and draw functions, letting the game choose which value to use, or use both.  This of course required a minor update to all the existing test programs to add the additional argument, even though they still use the fixed timestep value.
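In terms of signatures, that looks something like the following sketch (the parameter types and names here are assumptions, not the framework's actual declarations):

// hypothetical Game base class signatures; actual names/types may differ
virtual void Update(UINT step_ms, UINT actual_elapsed_ms);
virtual void Draw(UINT step_ms, UINT actual_elapsed_ms);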

The FPS meter itself is a library in the project named "fps_monitor" so it can easily be re-used by other projects as needed.  The library consists of the FPSMonitor class and the shaders needed for rendering it.  The FPSMonitor calculates and displays the minimum, maximum, and average FPS over a configurable number of frames, and has its own graphics pipeline for rendering.  So that it doesn't get bloated with code for loading different image formats or texture atlas data formats, the constructor takes already loaded data as arguments.
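As a rough sketch of that bookkeeping (the member names here are hypothetical, and this assumes the sample window has already been filled with an initial frame time):

// minimal sketch of the per-frame min/max/average computation over a sample window
void FPSMonitor::Update(float actual_elapsed_seconds)
{
  m_samples[m_next_sample] = actual_elapsed_seconds;
  m_next_sample = (m_next_sample + 1) % m_samples.size();

  float min_time = FLT_MAX;
  float max_time = 0;
  float total    = 0;
  for (float sample : m_samples)
  {
    if (sample < min_time) min_time = sample;
    if (sample > max_time) max_time = sample;
    total += sample;
  }
  m_max_fps = 1 / min_time;             // shortest frame time => highest FPS
  m_min_fps = 1 / max_time;             // longest frame time => lowest FPS
  m_avg_fps = m_samples.size() / total; // frames over total elapsed time
}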

The vertices sent to the vertex shader contain projection space x and y coordinates that maintain the character width and height provided to the FPSMonitor constructor (which means this works best with monospaced fonts), uv coordinates for the texture going from 0-1 in both dimensions, and the key into the texture atlas lookup table (initialized to 0; the Update function fills in the desired value for that frame).
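The entries being filled in below correspond to a vertex layout along these lines (the struct and field names are assumptions, not the framework's actual declarations):

// hypothetical vertex layout for one corner of a character quad
struct FPSCharVertex
{
  XMFLOAT2 pos;          // projection space x,y
  XMFLOAT2 uv;           // 0-1 across the character quad
  UINT     lookup_index; // key into the texture atlas lookup table
};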

m_vertex_buffer_data[i * VERTS_PER_CHAR    ] = { XMFLOAT2(-1 + x,                y),                 XMFLOAT2(0.0f, 0.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 1] = { XMFLOAT2(-1 + x,                y - m_char_height), XMFLOAT2(0.0f, 1.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 2] = { XMFLOAT2(-1 + x + m_char_width, y - m_char_height), XMFLOAT2(1.0f, 1.0f), 0 };
m_vertex_buffer_data[i * VERTS_PER_CHAR + 3] = { XMFLOAT2(-1 + x + m_char_width, y),                 XMFLOAT2(1.0f, 0.0f), 0 };

The texture atlas lookup table is provided to the vertex shader through a constant buffer, which is an array of entries holding the uv coordinates that cover the rectangle for each atlas entry.

struct LookupTableEntry
{
  float left;
  float right;
  float top;
  float bottom;
};

cbuffer LOOKUP_TABLE : register(b0)
{
  LookupTableEntry lookup_table[24];
}

The combination of the 0-1 uv coordinates on each vertex and the lookup table index allows the vertex shader to easily compute the uv coordinates for the particular character in the texture atlas.

output.uv.x = (1 - input.uv.x) * lookup_table[input.lookup_index].left + input.uv.x * lookup_table[input.lookup_index].right;
output.uv.y = (1 - input.uv.y) * lookup_table[input.lookup_index].top  + input.uv.y * lookup_table[input.lookup_index].bottom;

An alternative approach would be to skip the index field in the vertex data and update the uv coordinates on the host so that the vertex shader becomes more of a pass through.

In order to test that the FPS values are computed correctly, the test program needs the frame rate to vary.  Conceptually there are two ways to accomplish this within a program.  One is to switch between two sets of content, one that doesn't stress the system's rendering capabilities and one that does.  Another way, and the way taken in the test program, is to change the fixed timestep duration.  By pressing and releasing numpad 1, 2, or 3, the test program moves between 60, 30, or 24 FPS respectively.  While changing the frame rate up or down instantly changes the min or max FPS, the average FPS takes a little while, based on the number of samples, to reach a steady value.  Assuming a system can handle the requested frame rate, once enough samples at the new frame rate have occurred to fill all of the sample slots in the FPSMonitor class, all three should report the same value.

GPU Vertex and Index Buffers

The vertex and index buffers in the framework thus far have used D3D12_HEAP_TYPE_UPLOAD so that their memory can be mapped when their data needs to be updated.  While the FPS meter discussed in the previous section needs to update a vertex buffer every frame, this is a rare case.  Taking the common example of loading a model, normally its vertex and index buffers wouldn't change after loading, so there is no need for CPU access afterwards.  To cover this, there are additional classes for vertex and index buffers that use D3D12_HEAP_TYPE_DEFAULT, named VertexBufferGPU_* and IndexBufferGPU16.  To populate or update the data in these GPU-only buffers, the existing vertex and index buffer classes provide a PrepUpload function for the corresponding GPU-only type, which adds commands to a command list for copying data between the two buffers.  The actual copying is done when the command list is executed.  Beyond the lack of CPU access, they function the same as the previously existing vertex and index buffers, so there's not too much more to say about them.
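A usage sketch of the above (the buffer type and overload signatures here are assumptions beyond the VertexBufferGPU_*/PrepUpload naming mentioned above):

// CPU-writable upload buffer, created and filled as before
VertexBuffer_PositionTexture* upload = VertexBuffer_PositionTexture::CreateD3D12(graphics, num_verts, vert_data);
// GPU-only counterpart with matching capacity
VertexBufferGPU_PositionTexture* gpu = VertexBufferGPU_PositionTexture::CreateD3D12(graphics, num_verts);

// queue the copy from the upload buffer to the GPU-only buffer
upload->PrepUpload(graphics, *command_list, *gpu);

// the copy actually happens when the command list is executed
command_list->Close();
graphics.ExecuteCommandList(*command_list);
graphics.WaitOnFence();

// after the fence, the upload buffer can be deleted if no further updates are needed
delete upload;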

Stencil Part of the Depth-Stencil Buffer

[Screenshot: depth-fail shadow volume test (stencil.png)]

Up until now, the depth-stencil buffer has been used for just depth data.  Exercising the stencil portion of this buffer required framework updates to create a depth-stencil with an appropriate format (previously the depth-stencils were all DXGI_FORMAT_D32_FLOAT), the ability to configure the stencil when creating a pipeline, and an algorithm to use as a test case.

For the format, the DepthStencil class has an optional "bool with_stencil" argument; if true, the depth stencil is created with a format of DXGI_FORMAT_D32_FLOAT_S8X24_UINT.  If it is false (the default), the format will be DXGI_FORMAT_D32_FLOAT.

For configuring the stencil, the static CreateD3D12 functions in the Pipeline class had their "DepthFuncs depth_func" argument changed to "const DepthStencilConfig* depth_stencil_config".  If that argument is NULL, both the depth and stencil tests are disabled.  If it points to an instance of the DepthStencilConfig struct, then the depth and stencil tests can be enabled or disabled individually, along with specifying the other configuration data.

/// <summary>
/// Enum of the various stencil operations
/// </summary>
/// <remarks>
/// Values must match D3D12_STENCIL_OP
/// </remarks>
enum StencilOp
{
  SOP_KEEP = 1,
  SOP_ZERO,
  SOP_REPLACE,
  SOP_INCREMENT_CLAMP,
  SOP_DECREMENT_CLAMP,
  SOP_INVERT,
  SOP_INCREMENT_ROLLOVER,
  SOP_DECREMENT_ROLLOVER
};

/// <summary>
/// Configuration for processing pixels
/// </summary>
struct StencilOpConfig
{
  /// <summary>
  /// Stencil operation to perform when stencil testing fails
  /// </summary>
  StencilOp stencil_fail;

  /// <summary>
  /// Stencil operation to perform when stencil testing passes, but depth testing fails
  /// </summary>
  StencilOp depth_fail;

  /// <summary>
  /// Stencil operation to perform when both stencil and depth testing pass
  /// </summary>
  StencilOp pass;

  /// <summary>
  /// Comparison function to use to compare stencil data against existing stencil data
  /// </summary>
  CompareFuncs comparison;
};

/// <summary>
/// Configuration for the depth stencil
/// </summary>
struct DepthStencilConfig
{
  /// <summary>
  /// true if depth testing is enabled.  false otherwise
  /// </summary>
  bool depth_enable;

  /// <summary>
  /// true if stencil testing is enabled.  false otherwise
  /// </summary>
  bool stencil_enable;

  /// <summary>
  /// Format of the depth stencil view.  Must be correctly set if either depth_enable or stencil_enable is set to true.
  /// </summary>
  GraphicsDataFormat dsv_format;

  /// <summary>
  /// true if writing to the depth portion of the depth stencil is allowed.  false otherwise.
  /// </summary>
  bool depth_write_enabled;

  /// <summary>
  /// Comparison function to use to compare depth data against existing depth data
  /// </summary>
  CompareFuncs depth_comparison;

  /// <summary>
  /// Bitmask for identifying which portion of the depth stencil should be used for reading stencil data
  /// </summary>
  UINT8 stencil_read_mask;

  /// <summary>
  /// Bitmask for identifying which portion of the depth stencil should be used for writing stencil data
  /// </summary>
  UINT8 stencil_write_mask;

  /// <summary>
  /// Configuration for processing pixels with a surface normal towards the camera
  /// </summary>
  StencilOpConfig stencil_front_face;

  /// <summary>
  /// Configuration for processing pixels with a surface normal away from the camera
  /// </summary>
  StencilOpConfig stencil_back_face;
};

After those changes, it was on to an algorithm to use as a test case.  While over the years I've read up on different algorithms that use the stencil, I hadn't implemented one before.  I ended up picking depth-fail shadow volumes, using both the Wikipedia article and http://joshbeam.com/articles/stenciled_shadow_volumes_in_opengl/ for reference (I don't intend for this entry to be a tutorial on depth-fail, so I'd recommend those links if you want to read up on the algorithm).  The scene is a simple one comprised of an omnidirectional light source at (8, 0, 0), an occluder at (1, 0, 0), and a textured cube, initially positioned at (-7, 0, 0), that can be moved in y and z with the arrow keys.  The textured cube starts in shadow, and the up, down, and left arrows allow it to be moved partially or completely out of shadow and back in.  For the right arrow key, there was an issue where the framework always assumed D3D12_CULL_MODE_BACK, which prevented the stencil buffer from being correct.  Since the stencil configuration in D3D12 allows different stencil operations for front faces and back faces, only one pass is needed to set the stencil buffer when the cull mode is set to none.  By doing that, the model is correctly lit when moving out of the shadow volume with the right arrow key as well.
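To give a concrete idea of the configuration data involved, a depth-fail style setup might look roughly like this (the values are a sketch based on the algorithm, not copied from the test program, and the CompareFuncs values are placeholders):

DepthStencilConfig config;
config.depth_enable        = true;
config.stencil_enable      = true;
config.dsv_format          = /* a format with stencil data, per the with_stencil flag */;
config.depth_write_enabled = false; // depth writes stay off while drawing the shadow volume
config.depth_comparison    = /* less */;
config.stencil_read_mask   = 0xff;
config.stencil_write_mask  = 0xff;

// front faces: decrement on depth fail, otherwise leave the stencil alone
config.stencil_front_face.stencil_fail = SOP_KEEP;
config.stencil_front_face.depth_fail   = SOP_DECREMENT_ROLLOVER;
config.stencil_front_face.pass         = SOP_KEEP;
config.stencil_front_face.comparison   = /* always */;

// back faces: increment on depth fail, otherwise leave the stencil alone
config.stencil_back_face.stencil_fail = SOP_KEEP;
config.stencil_back_face.depth_fail   = SOP_INCREMENT_ROLLOVER;
config.stencil_back_face.pass         = SOP_KEEP;
config.stencil_back_face.comparison   = /* always */;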

tricona

There were two main pieces of cleanup work that I wanted to take care of in my Direct3D12 framework. The first was to break apart the LoadContent functions of the various test programs. The second was to minimize object lifetimes. Previously, all the various framework wrapper objects and their internal D3D12 resources would live for nearly the entire program regardless of how long they were actually needed.

LoadContent

In each test program the LoadContent function was a mess due to doing more than it was originally intended to do, which was to load the models or other content needed for the program. Since I have been trying to minimize dependencies in these test programs to keep the D3D12 code as clear as possible, I'm not using a model library and am instead filling in the vertex, index, and other buffers directly. On top of that, D3D12 also requires setting up the graphics pipeline, which was also being done in those functions. To clean this up and make the code more understandable at a glance, I've introduced new TestModel and TestGraphicsPipeline classes in each test program (some have an additional pipeline class as needed for the particular test case). These new classes take a lot of the burden off the Game subclass by being responsible for managing just a model or a graphics pipeline respectively (even if the model is still hard-coded for demonstration purposes). The LoadContent functions now take care of creating and configuring instances of these classes. So, the graphics pipeline setup required by Direct3D12 still happens, but it is encapsulated and offloaded to an appropriate class. Where before a typical LoadContent function was a few hundred lines, it now looks like:

void GameMain::LoadContent()
{
  GraphicsCore& graphics = GetGraphics();
  m_pipeline = new TestGraphicsPipeline(graphics);
  m_model = new TestModel(graphics, m_pipeline->GetShaderResourceDescHeap(), m_pipeline->GetCommandList());
  m_pipeline->SetModel(m_model);

  // setup the cameras for the viewport
  Viewport full_viewport = graphics.GetDefaultViewport();
  m_camera_angle = 3 * XM_PI / 2;
  m_camera = new Camera(full_viewport.width / full_viewport.height, 0.01f, 100.0f, XMFLOAT4(0, 0, -10, 1), XMFLOAT4(0, 0, 1, 0), XMFLOAT4(0, 1, 0, 0));
  m_pipeline->SetCamera(m_camera);
}

For the various fields that were part of the test programs' Game subclasses, those have been moved to either TestModel or TestGraphicsPipeline as appropriate. One field of note is the descriptor heap. Due to a requirement of ID3D12GraphicsCommandList::SetDescriptorHeaps, which can use only 1 D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV heap and only 1 D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER heap for rendering, the descriptor heap needs to be shared between the model and pipeline classes. The models need it only while creating their texture resources, and the pipeline needs it both for creating non-model resources (e.g. a constant buffer to hold the camera matrices) and for binding the descriptor heap when rendering a frame. So, the TestGraphicsPipeline class owns the descriptor heap instance and provides an accessor method for the TestModel to use it. This means a real game using this approach would need to compute the number of unique textures for its models, add the number of constant buffers the rendering process requires, and then either pass that information into the pipeline for creation of the descriptor heap and allow the models access to the same descriptor heap, or move creation of the descriptor heap outside of the pipeline and pass it to both the pipeline and the models, as well as manage its lifetime alongside them.
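In code form, that sizing amounts to something like this sketch (illustrative names, not framework functions):

// count descriptors needed before creating the shared CBV/SRV/UAV heap
UINT num_model_textures = /* total unique textures across all models */;
UINT num_const_buffers  = /* constant buffers used by rendering, e.g. camera matrices */;
UINT num_descriptors    = num_model_textures + num_const_buffers;
// ...create the descriptor heap with num_descriptors entries, then share it
// between the pipeline and the models...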

Minimizing Object Lifetimes

For the fields that were moved to the TestGraphicsPipeline class, a few were being kept for nearly the whole duration of the program that weren't needed for that long. In particular, the various shader instances and the input layout. Those are used to create the framework's Pipeline instance, which internally creates an ID3D12PipelineState. Once that instance is created, there is no need to keep the shaders or input layout around any longer.

The TestModel classes didn't need to keep the TextureUploadBuffer instances around once their content had been copied to the Texture subclass that is actually used for rendering. So, after the TextureUploadBuffer's PrepUpload function has been called for all the textures to upload, the command list has executed, and a fence has been waited on, the TextureUploadBuffer should be safe to delete. However, I would occasionally get an exception and a message in the debug window of:

D3D12 ERROR: ID3D12Resource::: CORRUPTION: An ID3D12Resource object (0x0000027D7F65D880:'Unnamed Object') is referenced by GPU operations in-flight on Command Queue (0x0000027D7D3BB130:'Unnamed ID3D12CommandQueue Object'). It is not safe to final-release objects that may have GPU operations pending. This can result in application instability. [ EXECUTION ERROR #921: OBJECT_DELETED_WHILE_STILL_IN_USE]

So, this exposed a bug in my fence implementation that had been hiding there all along. While the code for D3D12_Core::WaitOnFence is a conversion from Microsoft's D3D12HelloWorld sample, it turns out I had forgotten to initialize the value passed along to the command queue's Signal function. In debug builds this led to the value starting at 0, which was also the fence's initial value. This caused D3D12_Core::WaitOnFence to think the fence was already complete and allow execution of the program to continue. Sometimes my system would be fast enough that this was okay for this data set; other times I'd get this error. Once I initialized the signal value to 1, D3D12_Core::WaitOnFence would properly detect whether the fence needed to be waited on. Technically, any value greater than the fence's initial value would work.
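For reference, the standard signal-and-wait pattern this fixes looks roughly like the following (a sketch with assumed member names, not the framework's exact code):

// use a value strictly greater than the fence's initial value of 0
m_fence_value++;
m_command_queue->Signal(m_fence, m_fence_value);
if (m_fence->GetCompletedValue() < m_fence_value)
{
  // not complete yet, so block until the GPU reaches the signal
  m_fence->SetEventOnCompletion(m_fence_value, m_fence_event);
  WaitForSingleObject(m_fence_event, INFINITE);
}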

Miscellaneous


  • I also tweaked D3D12_TextureUploadBuffer::PrepUploadInternal to only do state changes in the subresource being uploaded to instead of all subresources in the target texture.
  • When I had initially written the D3D12_ReadbackBuffer, my starting point was copying D3D12_ConstantBuffer. As a result, the 256 byte alignment required for a constant buffer has also been in the readback buffer code for the past couple of times I've uploaded the framework's source to a dev journal/blog. Since that isn't actually a requirement for a readback buffer, it has been removed.

tricona

Shortly after I posted my previous dev journal entry, I got a direct message asking how I determined the number of vertices that the stream output buffer should be able to hold in the test program. I think that is a great question, and while I replied discussing it for the test program, it warrants another entry to discuss the minimum size of the stream output buffer both for the test program and more generally.

Vertices Available to the Stream Output Stage

The stream output stage places vertices from up to 4 output streams into corresponding stream output buffers, if bound. One of these output streams can also go to the rasterizer stage for drawing to a render target. If the geometry shader is not used, then only 1 output stream is available in the graphics pipeline.

In the previous entry I discussed determining how many vertices were written to a stream output buffer (basically reading back BufferFilledSizeLocation and converting bytes to vertices). In the event there are more vertices in the output stream than there is room in the buffer, the buffer is filled in the order the vertices appear in the output stream, the extraneous vertices are discarded, and the count doesn't include them. If the overflow is in the output stream for the rasterizer stage, the vertices that don't fit into the stream output buffer are still sent along to the rasterizer with the rest.

Test Program

The test program tries to be relatively simple while testing stream output capability. The two triangles on the right duplicate the two triangles on the left, where the ones on the left went through all the graphics pipeline stages, including sending the tessellated vertices to a stream output buffer. So the stream output buffer needs space for at least as many vertices as are used for the left image. For the left image, I am sending a 4 point control patch through the pipeline, for which the hull shader sets all edge and inside tessellation factors to 1 with an output topology of triangle. The tessellation factors of 1 prevent subdivision of the edges and interior, so it's the output topology change that matters in this case. Since the control patch is a quad and the hull shader output topology is a triangle, the tessellator stage needs to create vertices for two objects, triangles in this case. Rather than re-use shared vertices between objects, the tessellator stage duplicates the vertices. That leads to the domain shader being invoked 6 times for the control patch (3 per triangle). The geometry shader isn't doing any further amplification, so the stream output stage will get, and needs capacity for, 6 vertices.
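So the minimum size works out as follows (vertex_stride here stands for the per-vertex size from the stream output declaration):

// quad patch with a triangle output topology => 2 triangles, no shared vertices
UINT num_verts = 2 * 3;                      // 6 vertices
UINT min_bytes = num_verts * vertex_stride;  // minimum stream output buffer size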

Vertex Shader Without Tessellation and Geometry Shader

The stream output stage can be used without tessellation or the geometry shader. In this case, there is 1 output stream available and it is comprised of the vertices returned by the vertex shader. There are basically 4 ways to use the graphics pipeline in this configuration:

  1. Using just a vertex buffer with a list topology. There is no amplification of vertices in this case. The stream output buffer should be sized for the number of vertices passed to the input assembler stage. This is not necessarily equal to the number of vertices in the vertex buffer, since the first argument to ID3D12GraphicsCommandList::DrawInstanced specifies the number of vertices to draw. Documenting this case made me realize that while the framework allows setting the number of vertices to draw, it assumed the start index would always be 0 (the third argument to ID3D12GraphicsCommandList::DrawInstanced). So, I have added an overload to D3D12_CommandList::DrawInstanced to make the start index configurable.

  2. Using vertex and index buffers with a list topology. Since the index buffer could reference the same vertex multiple times or skip vertices entirely, the stream output buffer should be sized for the number of indices passed to the input assembler stage. Like the previous case, ID3D12GraphicsCommandList::DrawIndexedInstanced has arguments to use a subset of the index buffer. Also like the previous case, documenting this made me realize that D3D12_CommandList::DrawIndexedInstanced assumed the starting index into the index buffer would be 0, and again I have added an overload to make the start index configurable.

  3. Either of the previous cases using a strip topology. The shared vertices get duplicated, effectively converting to a list topology. So, this will cause vertex amplification based on which primitive shape is used for the topology.
  4. Using an instance buffer with any of the previous cases. Instance buffers have a multiplicative effect on the number of vertices.  Like the other buffers, ID3D12GraphicsCommandList::DrawInstanced and ID3D12GraphicsCommandList::DrawIndexedInstanced take arguments to allow a subset of the instance buffer to be used. So, the stream output buffer would need to be sized for the number of vertices for a single instance multiplied by the number of entries in the instance buffer passed to the graphics pipeline. A sizing sketch for all four cases follows this list.
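A sketch of the resulting sizing for all four cases (the variable names are illustrative):

// 1. vertex buffer only, list topology: the vertex count passed to DrawInstanced
UINT case1_verts = draw_vertex_count;

// 2. vertex + index buffers, list topology: the index count passed to DrawIndexedInstanced
UINT case2_verts = draw_index_count;

// 3. strip topology: shared vertices are duplicated into list form, e.g. a
//    triangle strip of n primitives expands to n * 3 vertices
UINT case3_verts = num_primitives * verts_per_primitive;

// 4. instancing multiplies any of the above by the instance count used in the draw call
UINT case4_verts = verts_per_instance * instance_count;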

Adding Tessellation

When the tessellation stages are enabled, the number of vertices produced depends on the output topology and the tessellation factors. As mentioned when discussing the test program, the same vertex can be duplicated by the tessellator if it is used by different objects in the output topology, which obviously increases the vertex count. Increasing the tessellation factors increases the subdivision of the control patch, which by definition increases the vertex count. The test program kept things simple for these stages by using fixed values. In the case of dynamic tessellation factors, such as camera distance LOD over a large terrain patch viewed along the surface, the exact number of vertices produced is more difficult to calculate, and of questionable value to compute since it could change frame to frame as the camera moves. Instead, calculating the upper bound of vertices produced for a particular hull shader allows the stream output buffer to be appropriately sized, and the number of vertices actually written can be retrieved as needed.

Adding a Geometry Shader

The geometry shader allows for using multiple output streams, with a maximum of 4, one of which can go to the rasterizer stage. This stage is also interesting in that the shader is responsible for adding vertices to the output streams. This means it can effectively discard a vertex by not adding it to an output stream, or conversely add more vertices than there are in its input patch. The result is that the amount of vertex amplification or reduction depends on the particular shader, and the shader bound to this stage will need to be analyzed for its upper bound.

Oversized Stream Output Buffer

In the course of doing some additional testing to make sure I discussed all the factors involved here, I updated the test program (which subsequently got spun off into its own test program) so that the stream output buffer was massively oversized for the data set, and logged the number of vertices in the stream output buffer. That logging made it obvious that each time the pipeline uses the stream output buffer, it is appended to if there is space, instead of always starting from the beginning.

Stream output buffer num verts: 6
Stream output buffer num verts: 12
Stream output buffer num verts: 18
Stream output buffer num verts: 24
Stream output buffer num verts: 30
Stream output buffer num verts: 36
Stream output buffer num verts: 42

This means that the UINT64 for BufferFilledSizeLocation needs to be reset when the buffer should be overwritten. So, I have added a new function to the framework's StreamOutputBuffer/D3D12_StreamOutputBuffer classes to do exactly that:

virtual void PrepReset(CommandList& command_list, ConstantBuffer& scratch_buffer) = 0;

Rather than take the approach of GetNumVerticesWrittenD3D12, where the function takes care of executing the command list and waiting on the fence, PrepReset only adds the commands to a command list. The reason for this difference is that GetNumVerticesWrittenD3D12 needs to execute the command list and wait on the fence for the data to be available in a host readable buffer to do its computation, whereas PrepReset doesn't need to do any computation. It just needs the sizeof(UINT64) bytes at the start of scratch_buffer to be 0 when the command list executes and the fence is waited on. There is a remark in the function comment to document this requirement. That also means that I can't just re-use the buffer the test program has for the camera, and a separate one needs to be created. I could have made the creation of this scratch buffer internal to the class, where when the first StreamOutputBuffer is created, so is this scratch buffer. However, I expect there may be other uses for a scratch buffer as I continue development of this framework.
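A sketch of the resulting call order in a draw function (the surrounding calls are assumptions):

m_command_list->Reset(m_so_pipeline);
// zero the byte counter before the stream output buffer is bound
m_so_buffer->PrepReset(*m_command_list, *m_scratch_buffer);
// ...then bind the stream output buffer to the stream output stage and draw as before...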

Now that the test program is updated to call this during the draw function, but before the stream output buffer is bound to the stream output stage, the logging of the number of vertices written to the stream output buffer produces the expected result:

Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6
Stream output buffer num verts: 6

Resetting the buffer also ensures that each frame the two triangles on the right are getting the stream output of the triangles on the left for the same frame.

tricona

Since my last entry I've added support for more of Direct3D12's features to my framework, which are:

  • Cube and cube array textures. The implementation of these isn't notably different from the other texture types, so I don't plan to delve into these.
  • Mipmaps for all the texture types (including cube and cube array). I'll talk briefly about these in a moment.
  • MSAA render targets.
  • Stream output stage of the graphics pipeline.

Mipmaps

Since I added mipmap support after all the texture types, there were 3 main questions for implementing this feature:

  1. How do I want to expose creating textures that use them?
  2. How to determine the subresource index for all the different types of textures?
  3. Since using compile time type checking to avoid incorrect operations is a goal of this project, how does this affect the TextureUploadBuffer interface and D3D12_TextureUploadBuffer implementation?

For the first question, the options are to add more classes that treat textures using mipmaps as new texture types, or to add the number of mipmaps as an argument to the existing texture classes. The existing texture types being represented by different classes makes sense from a validation perspective since they have a different number of dimensions, whereas mipmaps are just different resolutions of a texture. Since the difference between a texture with multiple mipmaps and a texture with just 1 resolution is so minor, adding more classes is overkill conceptually, and practically doesn't have any substantial validation benefit due to the number of mipmaps being variable. This naturally leads to the other option of adding it as an argument when creating the resource. For any place that needs the mipmap index to be validated (e.g. when determining which mipmap in a texture to upload to), the minimum of 0 can be checked by using the appropriate unsigned type (UINT16), and the maximum can be checked with the argument validation that I mentioned in a previous dev journal entry, which in this case is just comparing two UINT16 values.

Determining the subresource index is obvious for 1D and 2D textures with mipmaps (0 is the full-size image, and each mipmap level just increments by 1). For the more complex texture types, such as a cube array with mipmaps, determining the answer requires checking and understanding the documentation. I found https://msdn.microsoft.com/en-us/library/windows/desktop/ff476906%28v=vs.85%29.aspx and https://msdn.microsoft.com/en-us/library/windows/desktop/dn705766%28v=vs.85%29.aspx useful for understanding the order of subresource indices for the texture arrays. And since the second link covers it so well and has nice graphics to explain it, I won't re-hash it here.
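For reference, the formula behind those docs (it's what the D3D12CalcSubresource helper in d3dx12.h computes) is:

// subresource index for a given mip level, array slice, and plane slice
UINT CalcSubresource(UINT mip_slice, UINT array_slice, UINT plane_slice, UINT mip_levels, UINT array_size)
{
  return mip_slice + array_slice * mip_levels + plane_slice * mip_levels * array_size;
}
// e.g. for a cube texture (array_size = 6), mip 2 of face 3 with 5 mip levels:
// CalcSubresource(2, 3, 0, 5, 6) == 2 + 3 * 5 == 17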

As for TextureUploadBuffer, it is responsible for creating the upload buffer and preparing a command list to copy from the upload buffer to a texture subresource (e.g. a particular side of a cube texture). For reference, here's the declaration of one of the overloads for creating a TextureUploadBuffer from before mipmapping support was added:

static TextureUploadBuffer* CreateD3D12(const GraphicsCore& graphics, const Texture2D& texture);

This guarantees that the upload buffer will be large enough to hold the data for that particular texture, but also allows the buffer to be re-used for other textures of smaller or equal size. For texture arrays, the created buffer is large enough for 1 entry in the array. So if you have a Texture2DArray with 5 entries and want to upload to all of them at once, you would need 5 TextureUploadBuffer instances (the function comments for the array overloads explain this). So there are two basic options here: add in the mipmap level to create an exact fit resource, or ignore the mipmap level the buffer will be used for and create a resource large enough for the default texture size. The first option would reduce memory usage, but increase code complexity and require validating the mipmap level between the created buffer and the requested upload level. The second option fits the pattern established by the texture array types of making the buffer big enough for 1 subresource. So, I went for the second approach, which leaves the function signature unchanged for mipmap support. This does waste a bit of memory, but the buffer only needs to be kept around while uploading, so the extraneous memory usage is temporary.

There is another detail about mipmap support I would like to mention, which is the shader resource view. In Direct3D there can be multiple views of the same resource, such as using a 2D texture as both a texture and a render target. Another option for these multiple views is to create a view that has access to only a subset of the mipmap levels. In the framework, the internal function D3D12_Texture::Create (called by the factory methods for all the texture classes) takes care of creating the resource and a shader resource view that has access to all the mipmap levels. At the moment, this full view is the only one available through the framework. However, for future work on partial mipmap shader resource views, I would implement them by adding factory methods to the various texture classes that take the source texture and min/num mipmap levels as arguments. Instead of calling D3D12_Texture::Create, these would just AddRef the resource and create the requested shader resource view, passing both the resource and shader resource view to the instance of the particular texture class to manage. This AddRef-and-create-a-new-view approach is basically the same approach I took with RenderTarget::CreateD3D12FromTexture, where the main differences are that the render target is not dealing with mipmap levels and it's creating a render target view instead of another shader resource view.
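In raw Direct3D12 terms, the partial view comes down to two fields of the shader resource view description, shown here for a 2D texture (the surrounding values are illustrative):

D3D12_SHADER_RESOURCE_VIEW_DESC srv_desc = {};
srv_desc.Format                    = DXGI_FORMAT_R8G8B8A8_UNORM; // or whatever the resource's format is
srv_desc.ViewDimension             = D3D12_SRV_DIMENSION_TEXTURE2D;
srv_desc.Shader4ComponentMapping   = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING;
srv_desc.Texture2D.MostDetailedMip = min_mip_level;  // first mipmap level visible through the view
srv_desc.Texture2D.MipLevels       = num_mip_levels; // how many levels from there
device->CreateShaderResourceView(resource, &srv_desc, descriptor_handle);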

MSAA

While there are other resources online that cover MSAA render targets and depth stencils in Direct3D12, I want to sum up the changes needed for them in the remainder of this section. One point worth mentioning before getting into the changes: the swap chain cannot be created with MSAA enabled. A separate render target is needed, and I'll discuss how to go from this separate render target to the back buffer at the end of this section.

The first change is actually optional but recommended: checking the quality levels supported by the device. This ensures that when creating render targets and depth stencils you don't exceed the supported quality level. Since devices can support different levels of multisampling for different formats, settings about the render target need to be passed in when checking, specifically the format, the multisample count per pixel, and whether it is for a tiled resource or not. Since this check should be done before creating an MSAA render target, in the framework these fields are passed to a GraphicsCore instance's CheckSupportedMultisampleLevels function instead of the function taking a RenderTargetMSAA instance and extracting the values. The GraphicsCore instance is available to the application via the Game base class, which has a function named GetGraphics that has been used in all the sample programs. The function's declaration in GraphicsCore is:

virtual UINT CheckSupportedMultisampleLevels(GraphicsDataFormat format, UINT sample_count, bool tiled) const = 0;

And for anyone that would like to see the implementation that calls into Direct3D12:

UINT D3D12_Core::CheckSupportedMultisampleLevels(GraphicsDataFormat format, UINT sample_count, bool tiled) const
{
  D3D12_FEATURE_DATA_MULTISAMPLE_QUALITY_LEVELS query;
  query.Format = (DXGI_FORMAT)format;
  query.SampleCount = sample_count;
  query.Flags = tiled ? D3D12_MULTISAMPLE_QUALITY_LEVELS_FLAG_TILED_RESOURCE : D3D12_MULTISAMPLE_QUALITY_LEVELS_FLAG_NONE;
  query.NumQualityLevels = 0;
  HRESULT rc = m_device->CheckFeatureSupport(D3D12_FEATURE_MULTISAMPLE_QUALITY_LEVELS, &query, sizeof(query));
  if (FAILED(rc)) {
    throw FrameworkException("Failed to get multisample quality information");
  }
  return query.NumQualityLevels;
}
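And a usage sketch (the GraphicsDataFormat value here is a hypothetical name):

// check how many quality levels the device supports for 4x MSAA on this format (non-tiled)
UINT num_quality_levels = graphics.CheckSupportedMultisampleLevels(R8G8B8A8_UNORM, 4, false);
if (num_quality_levels == 0)
{
  // 4x MSAA unsupported for this format; fall back to a lower sample count
}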

Unlike the previous change, the rest of the changes are required for MSAA to work.

The next change is to create a pipeline that can handle MSAA. In the test program, everything going through the graphics pipeline goes to an MSAA render target, so I updated the only pipeline created in it to use MSAA. In the framework this means adding 2 arguments to each of Pipeline's CreateD3D12 overloads. Here's an example of the update to one of the overloads:

static Pipeline* CreateD3D12(const GraphicsCore& graphics_core, const InputLayout& input_layout, Topology topology, const Shader& vertex_shader, const StreamOutputConfig* stream_output, const Shader& pixel_shader, DepthFuncs depth_func, const RenderTargetViewConfig& rtv_config, const RootSignature& root_sig, UINT ms_count = 1, UINT ms_quality = 0, bool wireframe = false)

Since I expect MSAA to be enabled more often than wireframe rendering, I added the MSAA arguments of ms_count and ms_quality before wireframe so that the wireframe argument can keep using its default value when not needed, instead of needing to be expressly set for every pipeline. As for how this affects the Direct3D12 code for creating a pipeline, it is a matter of updating 3 fields in D3D12_GRAPHICS_PIPELINE_STATE_DESC. While they are non-adjacent lines in D3D12_Pipeline::CreateDefaultPipelineDesc, the changes to these fields are:

D3D12_GRAPHICS_PIPELINE_STATE_DESC desc;
desc.RasterizerState.MultisampleEnable = ms_count > 1;
desc.SampleDesc.Count = ms_count;
desc.SampleDesc.Quality = ms_quality;
// fill in the rest of the structure here

And for anyone that wants to see the full implementation of D3D12_Pipeline::CreateDefaultPipelineDesc:

void D3D12_Pipeline::CreateDefaultPipelineDesc(D3D12_GRAPHICS_PIPELINE_STATE_DESC& desc, const D3D12_InputLayout& layout, const D3D12_RenderTargetViewConfig& rtv, const D3D12_RootSignature& root, D3D12_PRIMITIVE_TOPOLOGY_TYPE topology, UINT ms_count, UINT ms_quality, bool wireframe, const StreamOutputConfig* stream_output)
{
#ifdef VALIDATE_FUNCTION_ARGUMENTS
  if (layout.GetNextIndex() != layout.GetNum())
  {
    throw FrameworkException("Not all input layout entries have been set");
  }
  if (ms_count < 1)
  {
    throw FrameworkException("Invalid multisample count");
  }
  else if (ms_count == 1 && ms_quality != 0)
  {
    throw FrameworkException("Multisampling quality must be 0 when multisampling is disabled");
  }
#endif /* VALIDATE_FUNCTION_ARGUMENTS */
  desc.pRootSignature = root.GetRootSignature();
  desc.InputLayout.pInputElementDescs = layout.GetLayout();
  desc.InputLayout.NumElements = layout.GetNum();
  desc.RasterizerState.FillMode = wireframe ? D3D12_FILL_MODE_WIREFRAME : D3D12_FILL_MODE_SOLID;
  desc.RasterizerState.CullMode = D3D12_CULL_MODE_BACK;
  desc.RasterizerState.FrontCounterClockwise = false;
  desc.RasterizerState.DepthBias = D3D12_DEFAULT_DEPTH_BIAS;
  desc.RasterizerState.DepthBiasClamp = D3D12_DEFAULT_DEPTH_BIAS_CLAMP;
  desc.RasterizerState.SlopeScaledDepthBias = D3D12_DEFAULT_SLOPE_SCALED_DEPTH_BIAS;
  desc.RasterizerState.DepthClipEnable = true;
  desc.RasterizerState.MultisampleEnable = ms_count > 1;
  desc.RasterizerState.AntialiasedLineEnable = false;
  desc.RasterizerState.ForcedSampleCount = 0;
  desc.RasterizerState.ConservativeRaster = D3D12_CONSERVATIVE_RASTERIZATION_MODE_OFF;
  desc.BlendState = rtv.GetBlendState();
  desc.DepthStencilState.DepthEnable = false;
  desc.DepthStencilState.StencilEnable = false;
  desc.SampleMask = UINT_MAX;
  desc.PrimitiveTopologyType = topology;
  desc.NumRenderTargets = rtv.GetNumRenderTargets();
  memcpy(desc.RTVFormats, rtv.GetFormats(), sizeof(RenderTargetViewFormat) * desc.NumRenderTargets);
  desc.SampleDesc.Count = ms_count;
  desc.SampleDesc.Quality = ms_quality;
  if (stream_output)
  {
    desc.StreamOutput = ((D3D12_StreamOutputConfig*)stream_output)->GetDesc();
  }
}

You'll notice that function isn't calling ID3D12Device::CreateGraphicsPipelineState. That is because the various D3D12_Pipeline::Create overloads call CreateDefaultPipelineDesc, then call ID3D12Device::CreateGraphicsPipelineState after making their overload-specific changes to D3D12_GRAPHICS_PIPELINE_STATE_DESC.

The next change is creating a MSAA render target and depth stencil. In keeping with the framework's design and goals, these are separate classes (RenderTargetMSAA and DepthStencilMSAA) from their non-MSAA versions. Like the pipeline, creating these requires adding the sample count and quality as arguments to their respective CreateD3D12 functions. These additional arguments get applied to D3D12_RESOURCE_DESC for both the render target and depth stencil:

D3D12_RESOURCE_DESC resource_desc;
resource_desc.SampleDesc.Count = sample_count;
resource_desc.SampleDesc.Quality = quality;
// fill in the rest of the structure here

Also, the render target view and depth stencil view need their view dimensions updated (the non-MSAA versions use D3D12_RTV_DIMENSION_TEXTURE2D and D3D12_DSV_DIMENSION_TEXTURE2D respectively):

D3D12_RENDER_TARGET_VIEW_DESC view_desc;
view_desc.ViewDimension = D3D12_RTV_DIMENSION_TEXTURE2DMS;
// fill in the rest of the structure here
D3D12_DEPTH_STENCIL_VIEW_DESC view_desc;
view_desc.ViewDimension = D3D12_DSV_DIMENSION_TEXTURE2DMS;
// fill in the rest of the structure here

After the resources are created, they can be used with a corresponding pipeline by passing them to the framework command list's OMSetRenderTarget overload, or in raw Direct3D12, ID3D12GraphicsCommandList::OMSetRenderTargets. That brings us to the final change for adding MSAA support: resolving the MSAA render target to a presentable back buffer. When using the framework this is accomplished by calling the CommandList's RenderTargetToResolved function, which makes the end of a Draw function something along the lines of:

m_command_list->RenderTargetToResolved(*m_render_target_msaa, current_render_target);
m_command_list->RenderTargetResolvedToPresent(current_render_target);
m_command_list->Close();
graphics.ExecuteCommandList(*m_command_list);
graphics.Swap();

And the implementation of RenderTargetToResolved is:

void D3D12_CommandList::RenderTargetToResolved(const RenderTargetMSAA& src, const RenderTarget& dst)
{
  ID3D12Resource* src_resource = ((const D3D12_RenderTargetMSAA&)src).GetResource();
  ID3D12Resource* dst_resource = ((const D3D12_RenderTarget&)dst).GetResource();
  D3D12_RESOURCE_BARRIER barrier[2];
  barrier[0].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
  barrier[0].Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
  barrier[0].Transition.pResource = src_resource;
  barrier[0].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
  barrier[0].Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
  barrier[0].Transition.StateAfter = D3D12_RESOURCE_STATE_RESOLVE_SOURCE;
  barrier[1].Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
  barrier[1].Flags = D3D12_RESOURCE_BARRIER_FLAG_NONE;
  barrier[1].Transition.pResource = dst_resource;
  barrier[1].Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
  barrier[1].Transition.StateBefore = D3D12_RESOURCE_STATE_PRESENT;
  barrier[1].Transition.StateAfter = D3D12_RESOURCE_STATE_RESOLVE_DEST;
  m_command_list->ResourceBarrier(2, barrier);
  m_command_list->ResolveSubresource(dst_resource, 0, src_resource, 0, dst_resource->GetDesc().Format);
}

Stream output

While this is not the last stage in the graphics pipeline, it was the last one to be added to the framework. I've updated the "Direct3D12 framework" gallery with a screenshot of the test program for this stage. It displays 2 quads formed by 2 triangles each. While graphically this is not a complex image, there is a fair amount going on in the test program to generate it. The quad on the left is a tessellated 4 point control patch, where the 4 vertices are sent in a vertex buffer from the host to the graphics card and the results of tessellation are rendered and also stored to a stream output buffer. The quad on the right takes the stream output buffer that was filled by the other quad and uses the vertex and pixel shader stages to render the same quad at a different location. Since Direct3D12 does not have an equivalent of Direct3D11's ID3D11DeviceContext::DrawAuto, the test program also uses the number of vertices the GPU wrote to the stream output buffer to determine how many vertices to draw. This involves reading back a UINT64 that is in a buffer on the GPU. For a test case this simple the read back isn't strictly necessary, but it allows for testing functionality that should prove useful for more complex or dynamic tessellation.

Since the graphics pipelines for the two quads are so different from each other, they need separate pipelines and even root signatures. The pipeline for the left quad needs all the stages turned on, uses the corresponding overload, and sets the nullable stream_output argument to point to an instance of the framework's StreamOutputConfig class, which describes what is going to the stream output stage and which output stream should go to the rasterizer stage. The StreamOutputConfig class is a wrapper around D3D12_STREAM_OUTPUT_DESC that adds validation when setting the various fields and computes the stride in bytes for each output stream. The pipeline for the right quad needs the input assembler, vertex shader, rasterizer, pixel shader, and output merger stages enabled. Since the source for setting up these two pipelines is a bit lengthy, rather than copy it here, it is available in the stream_output test program's GameMain::CreateNormalPipeline and GameMain::CreateStreamOutputPipeline functions in the attached source.

The key part linking the two quads is the stream output buffer. It is created in CreateNormalPipeline by calling the framework's StreamOutputBuffer::CreateD3D12 function with the same StreamOutputConfig object that was passed to creating the left quad's pipeline. The heap properties for creating a committed resource for the stream output buffer are normal for a GPU read/write buffer:

D3D12_HEAP_PROPERTIES heap_prop;
heap_prop.Type = D3D12_HEAP_TYPE_DEFAULT;
heap_prop.CPUPageProperty = D3D12_CPU_PAGE_PROPERTY_UNKNOWN;
heap_prop.MemoryPoolPreference = D3D12_MEMORY_POOL_UNKNOWN;
heap_prop.CreationNodeMask = 0;
heap_prop.VisibleNodeMask = 0;

Due to the BufferFilledSizeLocation field in D3D12_STREAM_OUTPUT_BUFFER_VIEW, there is a detail worth mentioning when filling out D3D12_RESOURCE_DESC. In Direct3D11 the number of bytes or vertices written to a stream output buffer was handled internally, and the application didn't need to think about it. This changed in Direct3D12, where the application needs to provide memory in a GPU writable buffer for this field (https://msdn.microsoft.com/en-us/library/windows/desktop/mt431709%28v=vs.85%29.aspx#fixed_function_rendering). In the framework, I regard this as part of the stream output buffer and want it to be part of its resource, which means increasing the number of bytes specified by the width field of D3D12_RESOURCE_DESC. The question becomes: how many bytes should the resource be increased by to include this field? The MSDN page for D3D12_STREAM_OUTPUT_BUFFER_VIEW (https://msdn.microsoft.com/en-us/library/windows/desktop/dn903817(v=vs.85).aspx) doesn't mention how large this field is, and https://msdn.microsoft.com/en-us/library/windows/desktop/dn903944%28v=vs.85%29.aspx discusses a 32-bit field named "BufferFilledSize" for how many bytes have been written to the buffer, along with another field of unspecified size named "BufferFilledSizeOffsetInBytes". However, increasing the buffer by only sizeof(UINT) causes Direct3D12's debug layer to output all sorts of fun messages when trying to use the stream output buffer. Using a UINT64 works, which would indicate that "BufferFilledSizeOffsetInBytes" is a 32-bit field as well. However, as will be discussed in the part about determining the number of vertices written to a stream output buffer, this 64-bit quantity seems to be 1 field for the number of bytes written rather than 2 separate fields. So, with the additional field determined to be a UINT64, the code for filling in the resource description becomes:

D3D12_RESOURCE_DESC res_desc;
res_desc.Dimension = D3D12_RESOURCE_DIMENSION_BUFFER;
res_desc.Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
res_desc.Width = num_bytes + sizeof(UINT64); // note: +sizeof(UINT64) due to graphics card counter for how many bytes it has written to the stream output buffer
res_desc.Height = 1;
res_desc.DepthOrArraySize = 1;
res_desc.MipLevels = 1;
res_desc.Format = DXGI_FORMAT_UNKNOWN;
res_desc.SampleDesc.Count = 1;
res_desc.SampleDesc.Quality = 0;
res_desc.Layout = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;
res_desc.Flags = D3D12_RESOURCE_FLAG_NONE;

Since it makes sense for a stream output buffer to be initially ready for use with the stream output stage, the initial resource state should reflect that. So the code for creating the resource is:

ID3D12Resource* buffer;
HRESULT rc = core.GetDevice()->CreateCommittedResource(&heap_prop, D3D12_HEAP_FLAG_NONE, &res_desc, D3D12_RESOURCE_STATE_STREAM_OUT, NULL, __uuidof(ID3D12Resource), (void**)&buffer);
if (FAILED(rc))
{
  ostringstream out;
  out << "Failed to create committed resource for stream output buffer. HRESULT = " << rc;
  throw FrameworkException(out.str());
}

Next the stream output buffer view needs to be created. For D3D12_STREAM_OUTPUT_BUFFER_VIEW, BufferLocation is the GPU address of the beginning of the stream output buffer, SizeInBytes is the number of bytes in the stream output buffer, and BufferFilledSizeLocation is the GPU address of the additional field I previously talked about, the counter of how many bytes have been written to the stream output buffer. Initially I had put the byte counter field after the stream output buffer. However, so that copying the byte counter to a host readable buffer (D3D12_StreamOutputBuffer::GetNumVerticesWritten) doesn't have to compute an offset when calling ID3D12GraphicsCommandList::CopyBufferRegion, I moved it to the beginning of the resource and pushed the stream output buffer back by sizeof(UINT64).

D3D12_STREAM_OUTPUT_BUFFER_VIEW so_view;
so_view.BufferFilledSizeLocation = buffer->GetGPUVirtualAddress();
so_view.BufferLocation = so_view.BufferFilledSizeLocation + sizeof(UINT64);
so_view.SizeInBytes = num_bytes;

Since the stream output buffer can also be used as a vertex buffer for the input assembler stage, a vertex buffer view needs to be created to support that. It's pretty straightforward: the same start address as the stream output buffer (not including the byte counter field), the total size of the buffer, and the vertex stride in bytes, which is taken from the StreamOutputConfig for the particular stream index.

D3D12_VERTEX_BUFFER_VIEW vb_view;
vb_view.BufferLocation = so_view.BufferLocation;
vb_view.SizeInBytes = (UINT)num_bytes;
vb_view.StrideInBytes = (UINT)vertex_stride;

For making the number of vertices written to a stream output buffer available to an application, the framework provides a static GetNumVerticesWritten function in the StreamOutputBuffer class. Its declaration is:

static void GetNumVerticesWrittenD3D12(GraphicsCore& graphics, CommandList& command_list, const std::vector<StreamOutputBuffer*> so_buffers, ReadbackBuffer& readback_buffer, std::vector<UINT>& num_vertices);

You'll notice that rather than take a single stream output buffer, this takes a collection of them. This is because, to retrieve the data, the byte count field needs to be copied from the stream output buffer's resource, which is only GPU accessible, to a host readable buffer. That process involves using a command list to perform the copy, executing the command list, waiting on a fence so that the copying is complete, and finally converting the number of bytes written to the number of vertices in the buffer (which is dividing the value of the field by the vertex stride). Doing all of that per stream output buffer would basically stall the application on the CPU by waiting for multiple fences to complete. By performing these operations on a collection, there is only 1 fence causing the CPU to wait. The source for this function is:

void D3D12_StreamOutputBuffer::GetNumVerticesWritten(GraphicsCore& graphics, CommandList& command_list, const vector<StreamOutputBuffer*> so_buffers, ReadbackBuffer& readback_buffer, vector<UINT>& num_vertices)
{
  D3D12_Core& core = (D3D12_Core&)graphics;
  ID3D12GraphicsCommandList* cmd_list = ((D3D12_CommandList&)command_list).GetCommandList();
  D3D12_ReadbackBuffer& rb_buffer = (D3D12_ReadbackBuffer&)readback_buffer;
  ID3D12Resource* resource = rb_buffer.GetResource();
  command_list.Reset(NULL); 
  num_vertices.reserve(num_vertices.size() + so_buffers.size());
  vector<StreamOutputBuffer*>::const_iterator so_it = so_buffers.begin();
  UINT64 dst_offset = 0;
  while (so_it != so_buffers.end())
  {
    D3D12_StreamOutputBuffer* curr_so_buffer = (D3D12_StreamOutputBuffer*)(*so_it);
    cmd_list->CopyBufferRegion(resource, dst_offset, curr_so_buffer->m_buffer, 0, sizeof(UINT64));
    ++so_it;
    dst_offset += sizeof(UINT64);
  }
  command_list.Close();
  core.ExecuteCommandList(command_list);
  core.WaitOnFence();
  rb_buffer.Map();
  UINT64* cbuffer_value = (UINT64*)rb_buffer.GetHostMemStart();
  so_it = so_buffers.begin();
  while (so_it != so_buffers.end())
  {
    D3D12_StreamOutputBuffer* curr_so_buffer = (D3D12_StreamOutputBuffer*)(*so_it);
    num_vertices.push_back((UINT)((*cbuffer_value) / curr_so_buffer->m_vb_view.StrideInBytes));
    ++so_it;
    ++cbuffer_value;
  }
  rb_buffer.Unmap();
}

One additional thing I would like to mention about the stream output stage is that if multiple output streams are used by a shader, they must all be of type PointStream. In the test program, I am sending the tessellated object space coordinates to the stream output buffer, but due to this restriction I am also sending them to the pixel shader (in addition to the projection coordinates). While sending this additional field to the pixel shader isn't harmful, it's also unnecessary since the pixel shader doesn't use it. Having the left quad drawn as points isn't the behavior I wanted for this test program, so using multiple output streams to minimize the data sent to both the stream output and pixel shader stages wasn't viable in this case.

tricona

Just a minor update from my previous post. I've updated my D3D12 framework (d3d12_framework.zip) to use committed instead of placed resources, along with centralizing the texture creation code. The switch over was pretty straightforward, since it was basically removing an additional resource heap and switching my calls from CreateResource on my heap wrapper class (which internally called CreatePlacedResource) to CreateCommittedResource instead, while being careful to copy the correct initial state for the resource type. Since placed resources are something I will probably want to revisit down the road, along with revising them to allow for overlapping resources, which is their primary benefit, I saved off the diff as reference code in the ref directory of the project.

Also, in my previous post I had mentioned that I needed to use 2 different resources and copy between them to use a render target as a texture. While that is true with placed resources on heap tier 1 devices, when using committed resources it is possible to use the same resource for both the render target and the texture. It just requires that the flags in the resource description include D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET. Since MSDN recommends setting that flag only when the texture will be used as a render target, and since in real usage most textures will be loaded from a file instead, I created the additional Texture2DRenderTarget and D3D12_Texture2DRenderTarget classes. The existing RenderTarget/D3D12_RenderTarget class can have an instance created from one of these new classes, which causes it to share the same resource (with corresponding AddRef/Release calls for managing its lifetime). I could have re-used the existing Texture2D/D3D12_Texture2D classes by adding a flag to their create functions, however adding the new classes keeps with the goal of using compile time type checking to avoid incorrect operations. I have added another test program to demonstrate this (render_target_to_texture_same_resource). In the spirit of the test programs being unit tests and reference code, I have kept the previous test program and framework functionality around.
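
For reference, a minimal sketch of creating such a committed resource follows. The format, dimensions, initial state, and the omitted optimized clear value are assumptions for illustration, device is the ID3D12Device, and error handling is trimmed.

D3D12_HEAP_PROPERTIES heap_props = {};
heap_props.Type = D3D12_HEAP_TYPE_DEFAULT;

D3D12_RESOURCE_DESC desc = {};
desc.Dimension        = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
desc.Width            = 256;
desc.Height           = 256;
desc.DepthOrArraySize = 1;
desc.MipLevels        = 1;
desc.Format           = DXGI_FORMAT_R8G8B8A8_UNORM;
desc.SampleDesc.Count = 1;
desc.Layout           = D3D12_TEXTURE_LAYOUT_UNKNOWN;
desc.Flags            = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET; // the key flag

ID3D12Resource* resource = NULL;
HRESULT rc = device->CreateCommittedResource(&heap_props, D3D12_HEAP_FLAG_NONE, &desc,
  D3D12_RESOURCE_STATE_RENDER_TARGET, NULL, IID_PPV_ARGS(&resource));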

tricona

Playing with Direct3D12

It's been quite some time since I last posted here, despite reading some of the forums regularly. For this post I've gotten my work-in-progress Direct3D12 framework project into a good enough state that I thought I'd share it out (a zip file of the source is attached). The purposes of this project are:

  1. Play around with Direct3D12 to get hands on experience with it.

  2. Build a framework that uses compile time type checking to try to avoid invalid operations.

  3. A natural extension of the previous point is that inclusion of d3d12 headers should be restricted to inside the framework and the consuming application should not include d3d12 headers or have them included indirectly from the framework's public headers.

For the current state of this project, it supports:

  • All the rendering stages except stream output

  • Placed resources (see note at the end of this post)

  • Heap tier 1 hardware (https://msdn.microsoft.com/en-us/library/windows/desktop/dn986743%28v=vs.85%29.aspx)

  • Vertex, index, and instance buffers

  • The various texture types (1D, 1D array, 2D, 2D array, and 3D)

  • Multiple viewports

  • Resizing the window and toggling full screen mode (note: there is a bug at the moment for resizing while in full screen mode)

  • Debug and release modes for x86 and x64

The compute pipeline is on my to-do list. The texturing support also needs to be improved to support mipmapping and MSAA textures. By not supporting MSAA at the moment, I've been able to cheat when dealing with the alignment of resources in a buffer heap: since all the resource types use the same alignment without MSAA, only the size needs to be considered.
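
For anyone curious, the cheat amounts to a running offset rounded up to the shared 64KB alignment. A minimal sketch, where offset and size are illustrative names:

// without MSAA, buffers and textures all use the 64KB default alignment
const UINT64 alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT;
UINT64 aligned_size = (size + alignment - 1) & ~(alignment - 1);
UINT64 next_offset  = offset + aligned_size;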

Getting Started

When I started this project, the first step was to upgrade to Windows 10 since I was still on 7 at that point, as well as updating to Visual Studio 2015. After that a driver update was still necessary so that I could use Direct3D12 with the existing graphics card. Once I got to working with code, the starting point of this project was my Direct3D11 framework that I had discussed in a previous dev journal entry. With the new Visual Studio project, I was initially confused why it couldn't find d3d12.h, for which the answer turned out to be that I needed to update my project settings. By default a project's "Target Platform Version" (in the "General" section) is "8.1", which is an issue since Windows 8.1 doesn't support Direct3D12. After updating that field to "10.0.10240.0", the compiler was able to find the Direct3D12 headers without further issue. Another stumbling block I hit early on was figuring out that I was on heap tier 1 hardware, meaning that each resource heap can only hold 1 type of resource, whereas tier 2 allows for all types of resources in the same resource heap.

Test Programs

As I had mentioned, the starting point for my Direct3D12 project was my Direct3D11 framework, which had included only a few test programs since my approach was to update the existing test program until I was doing something incompatible with it and then spin off a new one. When going to Direct3D12 I very quickly realized that approach was problematic. The main test program was trying to use vertex, index, and instance buffers, along with constant buffers for the various cameras, texturing, and multiple viewports. While all of those are perfectly reasonable to do with a graphics API, that additional functionality was encumbering when trying to start with the basics of initializing the graphics API. So, with this project, instead of constantly expanding a minimal number of test programs, there are a variety of them. This has the added benefit of basically being unit tests for previously implemented functionality. To keep with this idea of them being unit tests, each one is focused on testing the functionality in a simple way instead of implementing some graphics algorithm. This allowed me to focus on debugging the functionality instead of debugging a higher level algorithm as well. This distinction is perhaps most clear in the test program for the hull and domain shader stages. While most people would probably implement an LOD or terrain smoothing algorithm, I took the approach of rendering a wireframe of a square to visualize how the tessellation was subdividing the square. As the following list shows, the test programs were created in order of expanding functionality and complexity. Screenshots of the test programs are in the album at https://www.gamedev.net/gallery/album/1165-direct3d12-framework/.

Differences from the Direct3D11 Framework

Aside from which graphics API is being used, there are a few important differences between the Direct3D11 framework and the 12 one.

Working with Direct3D12

Since the significant difference between Direct3D11 and 12 is resource management rather than graphical capabilities, that has pretty much been the main focus of the framework so far. Because the framework is using placed resources, this was a bit more complicated than it needed to be: the aligned size for a resource needs to be determined, a buffer heap created to store all the resources of that type, then a descriptor heap created with enough entries for all the resources needed for the frame, and finally all of that is used to create the resource and its view. My understanding of committed resources is that finding the aligned size and creating the buffer heap can be skipped if those are used instead of placed resources. Aside from resource creation, another resource management difference is needing to use a fence to wait for the graphics card to finish processing a command list. I've had to work with fences on non-graphics projects before, and with a little reasoning I found it pretty straightforward to determine where these needed to be added.
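
As a sketch of what that wait looks like (the command queue, fence, fence value, and Win32 event handle names are illustrative and assumed to have been created during initialization):

fence_value++;
command_queue->Signal(fence, fence_value);
if (fence->GetCompletedValue() < fence_value)
{
  // the GPU hasn't reached this point in the queue yet, so block until it has
  fence->SetEventOnCompletion(fence_value, fence_event);
  WaitForSingleObject(fence_event, INFINITE);
}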

And speaking of command lists, I found them to be a very nice change from Direct3D11's contexts. The rendering process with command lists is basically: reset the command list, tell it which pipeline to use, set up your resources, issue your draw commands, transition the render target to the present state, close the command list, and execute it. This keeps the pipeline in a clean and known state since there is no concern over what was bound to the pipeline for the previous frame or render pass.
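
A rough sketch of that flow, where names such as allocator, pipeline_state, and back_buffer are illustrative and the resource binding calls are trimmed:

command_list->Reset(allocator, pipeline_state);
command_list->SetGraphicsRootSignature(root_signature);
// ... bind descriptor heaps, viewports, vertex/index buffers, render target ...
command_list->DrawIndexedInstanced(num_indices, 1, 0, 0, 0);

// transition the render target so it can be presented
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource   = back_buffer;
barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PRESENT;
command_list->ResourceBarrier(1, &barrier);

command_list->Close();
ID3D12CommandList* lists[] = { command_list };
command_queue->ExecuteCommandLists(1, lists);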

One key area of difference between Direct3D11 and 12 that I need more practice with is the root signature. These are new with Direct3D12 and describe the stages used and the layout of resources for each stage. Since a program should try to minimize the number of root signatures, they should be defined in a way that covers the variety of pipeline configurations the program will use. The key issues I had with them were figuring out when I needed a root signature entry to be a table vs a view, and, if I have multiple textures in a shader, whether I should use 1 entry in a table where the range of registers covers all the textures or one entry per texture in the table (and in case you're curious, the one table entry per texture approach is correct). One project I've periodically thought about is a program that displays a GUI of the graphics pipeline, allows the shaders for the different stages to be specified, and then outputs a root signature that is compatible with those settings. However, that is something I haven't started yet.
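
For reference, here is a minimal sketch of the one-table-entry-per-texture layout, with two pixel shader textures at t0 and t1. The variable names are illustrative, and static samplers and error handling are omitted.

D3D12_DESCRIPTOR_RANGE ranges[2] = {};
for (UINT i = 0; i < 2; i++)
{
  ranges[i].RangeType          = D3D12_DESCRIPTOR_RANGE_TYPE_SRV;
  ranges[i].NumDescriptors     = 1;
  ranges[i].BaseShaderRegister = i; // t0, t1
  ranges[i].OffsetInDescriptorsFromTableStart = D3D12_DESCRIPTOR_RANGE_OFFSET_APPEND;
}

D3D12_ROOT_PARAMETER param = {};
param.ParameterType = D3D12_ROOT_PARAMETER_TYPE_DESCRIPTOR_TABLE;
param.DescriptorTable.NumDescriptorRanges = 2;
param.DescriptorTable.pDescriptorRanges   = ranges;
param.ShaderVisibility = D3D12_SHADER_VISIBILITY_PIXEL;

D3D12_ROOT_SIGNATURE_DESC desc = {};
desc.NumParameters = 1;
desc.pParameters   = &param;
desc.Flags         = D3D12_ROOT_SIGNATURE_FLAG_ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT;

ID3DBlob* blob  = NULL;
ID3DBlob* error = NULL;
D3D12SerializeRootSignature(&desc, D3D_ROOT_SIGNATURE_VERSION_1, &blob, &error);
ID3D12RootSignature* root_sig = NULL;
device->CreateRootSignature(0, blob->GetBufferPointer(), blob->GetBufferSize(), IID_PPV_ARGS(&root_sig));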

Test Programs

  1. host_computed_triangle: A very basic test of rendering a triangle. Instead of communicating a camera's matrices via a constant buffer so the vertices can be transformed in the vertex shader, this test is so simple that the projection coordinates of the triangle's vertices are set when creating the vertex buffer, making the vertex shader just a pass-through.
  2. Index_buffer: Like host_computed_triangle, this test has the projection coordinates set when creating the vertex buffer, but adds using an index buffer to use 2 triangles to make a quad.
  3. single_2d_texture: A copy of host_computed_triangle modified to use a green texture instead of per-vertex colors
  4. constant_buffer: A copy of host_computed_triangle modified to use a color specified in a constant buffer instead of the per-vertex color.
  5. two_textured_instance_cubes: Uses vertex and index buffers to render a cube, with the same 2D texture applied to each face. This uses a constant buffer to communicate a camera's matrices to the vertex shader. By adding in an instance buffer, two cubes are displayed instead of just the one specified in the vertex/index buffers. I also added in using the left and right arrow keys to rotate the camera around the cubes. After seeing what should have been an obscured cube drawn on top of the other one, I realized that the previous test programs had depth testing turned off. So, while I originally intended this program to be just a test of instance rendering, it is also the first test of depth stencils.
  6. geometry_shader_viewports: This is 4 cameras/viewports showing different angles of two_textured_instance_cubes. The left and right arrow keys will only update the upper-left camera. This brings the test programs on par with the main Direct3D11 framework test program, though it does rendering to multiple viewports completely differently. The Direct3D11 framework test program would iterate over the different viewports, making each active and issuing the draw commands for that particular viewport. Whereas this test program renders to all 4 viewports at once by using the geometry shader to create the additional vertices and send them to the correct viewport (the C++ side of binding all 4 viewports is sketched after this list).
  7. hull_and_domain: In order to exercise two more stages of the rendering pipeline that haven't been used yet, this draws a quad in wireframe mode so that the tessellation is visible. I plan to revisit this program in the future to make the tessellation factors dynamic based on user input, but for now hull_and_domain_hs.hlsl needs to be edited, recompiled, and the program restarted to view the changed factors.
  8. texture_type_tester: Draws a quad to the screen, where pressing spacebar cycles through a 1D, 2D, and 3D textures. The 1D texture is a gradient from red to blue. The 2D texture is the same one used by two_textured_instance_cubes. The 3D texture has 3 slices where the first is all red, the second is all green, and the third is all blue. The uvw coordinates are set such that some of each slice will be visible.
  9. texture_array_tester: Uses instance rendering to draw 3 quads to the screen, where pressing space cycles through a 1D or 2D texture array. The 1D textures are a gradient from red to blue, where to show the different entries in the array the green component is set based on which entry in the array it is (0 for the first, 50% for the second, and 100% for the last). The 2D textures take the same approach to the green component, but instead of being a gradient each has a red triangle in the upper left half and a blue triangle in the bottom right half.
  10. render_target_to_texture: Draws to a render target then copies the render target to a texture. Since heap tier 1 in Direct3D12 does not allow a render target texture to be on the same heap as a non-render target texture, the same ID3D12Resource could not be used for a texture and render target with only different views created for the same resource, as was done in Direct3D11. Hence this program creates two different ID3D12Resources and does a copy between them.
  11. texture_multiple_uploads: The previous test programs uploaded textures from host memory to the graphics card's memory 1 at a time, with a fence between each one, re-using the same buffer in host memory. This program uploads all the textures at once with only 1 fence by using 1 buffer per texture. Otherwise it's the same as texture_array_tester.
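
As mentioned in the geometry_shader_viewports entry above, the C++ side of that test just binds all 4 viewports at once and leaves routing to the geometry shader via SV_ViewportArrayIndex. A sketch, where width and height are illustrative names for the window dimensions:

D3D12_VIEWPORT viewports[4];
for (UINT i = 0; i < 4; i++)
{
  // 2x2 grid of quarter-window viewports
  viewports[i].TopLeftX = (i % 2) * (width  / 2.0f);
  viewports[i].TopLeftY = (i / 2) * (height / 2.0f);
  viewports[i].Width    = width  / 2.0f;
  viewports[i].Height   = height / 2.0f;
  viewports[i].MinDepth = 0.0f;
  viewports[i].MaxDepth = 1.0f;
}
command_list->RSSetViewports(4, viewports);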

Lessons Learned

  • One of the goals of the Direct3D11 framework was to be similar to XNA. While the Direct3D12 framework has a nearly identical Game base class, similarity to XNA is not a goal of this project. In particular, the ContentManager, which included loading image formats as textures, was removed from the framework. There were two motivating factors behind this. The first was that file formats aren't really part of a graphics API. The second is that an asset loading library on top of the framework is more extensible and allows for new formats to be added if a particular project requires them, instead of bloating the framework with a wide variety of formats. If you look through the source for the test programs, you'll notice I currently don't have an asset loading library and instead create my textures through code, so this thought of an additional library is more for if this framework were used on a real project. But it's still filling in an in-memory buffer to pass to the framework along with the pixel format, so the only things really missing are dealing with file IO and file formats.
  • This framework also has consistent error handling (aside from 1 more class to update). Rather than using the included log library to make note of the problem and then return NULL or false, this one uses a FrameworkException class which inherits from std::exception. Improving that exception class is on my to-do list, but in its current state it has moved log library usage out of the framework and makes the application aware of the problem. Due to the simplicity of the test programs, they still call into the log library and bail out of the program, but a real application could potentially take steps to work around the problem (e.g. if you get an exception when trying to create a texture due to not enough space in the resource heap, make another resource heap with enough space).
  • Another use of the FrameworkException class is argument validation. The Direct3D11 framework didn't consistently do argument validation and frequently just documented that it was the caller's responsibility to ensure a particular invariant. As part of C++'s pay for what you use philosophy, I made argument validation optional via a define in BuildSettings.h. In the attached zip file, it is turned on because while I try to use compile time type checking to avoid incorrect operations, that can't catch all cases and I found it useful in development of the test programs. For a real project, I would expect it to be turned on in debug mode but off in release mode.
  • From a previous dev journal entry, the Direct3D11 framework used public header generation by using ifndefs and a processing program. This was a very messy solution and, in retrospect, I'm not really sure why I did it that way. It also didn't solve the issue (nor was it intended to) of the application indirectly getting the Direct3D11 header files included by including the framework headers. The Direct3D12 framework keeps the idea of public vs private headers, but doesn't use a processing program. The public headers are the generic interface and don't need to include the Direct3D12 headers. They have factory methods to create the actual instances from the private headers. The pimpl idiom could also have been used here instead of inheritance to solve this issue in a reasonable way. The main reason I chose factory methods and inheritance is that if the backing graphics API were changed to OpenGL, Metal, or something else, then it would be a matter of adding new factory methods and private header subclasses (along with argument validation to make sure all the private classes are using the same graphics API).
  • For shaders, instead of continuing what I had done in the Direct3D11 framework of changing the name of the entry point function to reflect which stage the shader was for, it would have required less updating of file properties to go with Visual Studio's default of "main". Though when editing the file properties, setting the dialog to "all configurations" and "all platforms" does help mitigate this rename issue.
  • Multiple test programs to check the functionality instead of implement a graphics algorithm worked quite well, though are visually less impressive.
  • Multiple viewports in Direct3D12 seem to be best implemented by using the geometry shader to send the vertices to the correct viewport.
  • Compile time type checking isn't enough to avoid invalid operations, hence the VALIDATE_FUNCTION_ARGUMENTS ifdefs.
  • When I was first working on the hull_and_domain test program, as a result of copying and pasting a different project as the starting point, the index buffer was initially set up for using 2 triangles to make a quad. Since the original code for that test program was modified to use an input topology of a 4 point control patch, that produced an incorrect result and allowed me to realize that I just needed to update the index buffer to be the indices of the 4 vertices in order. While this is obvious in retrospect and didn't take me long to realize, the online resources I looked at didn't include this detail, so I figured I'd mention it here.
  • Setting the platform to build/run to x64 helps find more spots where argument validation is needed. This is mainly due to my using the STL, where the return type of size() is a size_t that can vary by platform, whereas the Direct3D12 API uses fixed-width types such as UINT32 across all platforms (a sketch of the kind of check this catches follows this list).
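
A minimal sketch of the kind of check the validation define guards. The FrameworkException constructor shown here is an assumption for illustration; the actual class is in the framework source, and UINT_MAX comes from <climits>.

#ifdef VALIDATE_FUNCTION_ARGUMENTS
if (indices.size() > UINT_MAX) // size_t may be 64-bit on x64, UINT is always 32-bit
{
  throw FrameworkException("too many indices to fit in a UINT");
}
#endif /* VALIDATE_FUNCTION_ARGUMENTS */
UINT num_indices = (UINT)indices.size();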

Other Notes

  1. After the last Windows Update there were additional messages generated for incorrect states. I'm glad that these messages were added, and I took care of resolving them inside the framework. It was things like the initial states of depth stencils and render targets needing to be D3D12_RESOURCE_STATE_DEPTH_WRITE and D3D12_RESOURCE_STATE_PRESENT respectively, instead of D3D12_RESOURCE_STATE_GENERIC_READ, which I had simply copied and pasted from the texturing and constant buffer heaps.
  2. While most examples that I saw online for Direct3D12 used committed resources, which are simpler to set up, the framework currently uses placed resources. This is due to my wanting to get hands-on experience with the interactions between the resource, resource heap, and descriptor heap. However, since the framework currently prevents creating overlapping resources, which are the primary benefit of placed resources, and since Nvidia recommends committed resources over placed, I will likely revise this in the future.
tricona
While there is still plenty of work to do on my d3d11 framework, I thought I'd post an update on the current status. For what's currently in it:

  • keyboard input
  • mouse input
  • vertex shaders
  • pixel shaders
  • viewports
  • blend and depth stencil states
  • texturing (just loading from a file for now, will be expanding functionality later)
  • resizing/fullscreen toggling

    A short list of things that I will be adding is (I have a longer list of tasks for my own reference):

    • rest of the shader types
    • model loading
    • high dpi mouse support
    • joystick/xbox controller support
    • deferred contexts
    • sprite support (DirectXTK looks like it might make this part easy, especially the text support)
    • networking lib

      It would be further along, but two main factors slowed down development. The first and main factor was that I wasn't able to work on it for most of January due to work requiring a significant amount of extra hours. The second, more of a speed bump, was that there are several functions that MSDN notes cannot be used in applications submitted to the Windows Store (the entire D3D11 reflection API, for example). While I am a long way off from doing anything with the Windows Store, if ever, I figured it might make this framework more useful and reduce maintenance down the line if I followed the recommendations and requirements for it. Part of that is to compile shaders at build time instead of runtime, so I upgraded to VS2012 to get integrated support for that instead of trying to roll my own content pipeline (which is not off the table for other content types, just no immediate plans for that feature).

      Converting the project from VS2010 to VS2012 needed four notable changes, the first three of which are straightforward:
      1. Removing references to the June 2010 DirectX SDK from include and linker paths, since the header and library files are now part of the VS2012 installation and available in default directories.
      2. Switching from xnamath.h to directxmath.h, which in addition to changing the name of the include file also means the types in the file are now part of the DirectX namespace. Since the rest of the DirectX functions and types (e.g. ID3D11DeviceContext) are not in that namespace, it seems a bit inconsistent to have some functions/types in the namespace and others outside of it.
      3. The next easy change was to do build time shader compilation and change the runtime loading mechanism to load the .cso file. This was a matter of splitting the vertex and pixel shaders into separate .hlsl files and setting their properties appropriately (mainly "Entrypoint Name" and "Shader Type"; I also like "Treat Warnings As Errors" on). The ContentManager class also needed its shader loading mechanism updated. Instead of calling D3DX11CompileFromFile in the CompileShader function, the CompileShader function was changed to LoadFile, which does a binary load of a file into memory. After that, the code was the same: calling CreateVertexShader or CreatePixelShader on the ID3D11Device instance (a sketch of this load-and-create step follows this list). There was a little wrinkle in these edits regarding the current working directory when debugging vs the location of the .cso files. The .cso files were placed in $(OutDir) with the results of compiling the other projects in the solution (i.e. .lib and .exe files, not the .obj for each code file), whereas the working directory when debugging was the code directory for the test program. This meant that just specifying the filename without a path would fail to find the file at runtime. So either the output path for compiling the shaders needed to change or the working directory when debugging needed to change. I chose to change the working directory since running in the same directory as the .exe file is a better match to how an application would be used outside of development. This also meant that the texture I was using needed to be copied to $(OutDir) as well, which was easy enough to add as a "Post-Build Event" (though I initially put in the copy command as "cp" since I'm used to Linux, but it didn't take long to remember the Windows name for the command is "copy").
      4. This was the not so straightforward change: texture loading. Previously I was using D3DX11CreateShaderResourceViewFromFile, which is not available in VS2012. To get texture loading working again, I had to do a bit of research. I didn't want to get bogged down in loading various image file formats and reinventing the wheel (this entire project can probably be called reinventing the wheel, but the point of it is more about applying D3D11 and expanding my understanding of it). Luckily, right on the MSDN page for D3DX11CreateShaderResourceViewFromFile, there are links to recommended replacements. The recommended runtime replacement is DirectXTK, which compiled right out of the box for me, no playing with settings or paths needed. For using it, there was one problem I ran into. Initially I was using seafloor.dds from Tutorial 7 of the June 2010 DirectX Sample Browser. The DirectXTK Simple Win32 Sample comes with a texture file of the same name that looks the same. However, there is at least a slight difference in the file contents. The one from the June 2010 sample failed to load with an ERROR_NOT_SUPPORTED issue, but the one from the DirectXTK sample works. After dealing with that, I decided to get rid of third party content, and am now using a debug texture I had lying around, which is a .png file and works with DirectXTK just fine.
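
      As mentioned in item 3, the load-and-create step boils down to reading the .cso bytes and handing them to the device. A minimal sketch, where the file name and variable names are illustrative and error handling is omitted:

      #include <fstream>
      #include <vector>

      std::ifstream file("foo_vs.cso", std::ios::binary | std::ios::ate);
      std::streamsize size = file.tellg();
      file.seekg(0, std::ios::beg);
      std::vector<char> bytecode((size_t)size);
      file.read(bytecode.data(), size);

      // hand the compiled bytecode to the device
      ID3D11VertexShader* vs = NULL;
      device->CreateVertexShader(bytecode.data(), (SIZE_T)size, NULL, &vs);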

      Backing up for a moment to before the VS2012 upgrade: in my previous journal entry, "D3D11 Port of Particle System Rendering", I mentioned that I wanted to look into other ways of creating the input layout description for a vertex shader. The first thing I looked into was the shader reflection API, so that the input layout would automatically be determined by the vertex shader code. This was obviously before I knew the API would not be available to Windows Store applications or decided to follow those requirements. Aside from that, I ran into a different issue that prevented me from using it here. The vertex shader code knows what its inputs are, but it doesn't know which are per-vertex and which are per-instance. From looking at D3D11_INPUT_ELEMENT_DESC, knowing per-vertex or per-instance is important to set the InputSlotClass and InstanceDataStepRate fields correctly. Since reflection wasn't an option here, I created a class to manage the input layout, not surprisingly named InputLayout. It does avoid the simple mistake I had made in the previous journal entry of incorrectly computing the aligned byte offsets, as well as avoiding spelling mistakes for semantic names and attempts to set more input layout entries than were allocated. In its current form, it does force non-instance vertex buffers into slot 0 and instance buffers into slot 1, precluding any additional vertex buffers, which I might revisit later once I hit a scenario that requires more.
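
      To illustrate the fields in question, here is a sketch of an input layout mixing per-vertex data in slot 0 with per-instance data in slot 1. The semantic names and formats are illustrative, not the framework's actual layout:

      D3D11_INPUT_ELEMENT_DESC layout[] =
      {
        // per-vertex data in slot 0
        { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0,
          D3D11_INPUT_PER_VERTEX_DATA, 0 },
        { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, D3D11_APPEND_ALIGNED_ELEMENT,
          D3D11_INPUT_PER_VERTEX_DATA, 0 },
        // per-instance world position in slot 1, advancing once per instance
        { "INST_POS", 0, DXGI_FORMAT_R32G32B32_FLOAT, 1, 0,
          D3D11_INPUT_PER_INSTANCE_DATA, 1 },
      };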

      For configuring the D3D11 rendering pipeline, my approach has been to get everything into a known state. Taking the example of setting a constant buffer on the vertex shader stage: if a particular constant buffer is not set, then whatever the previous value was will be present the next time the shader is invoked. To avoid these stale states, when a VertexShader instance's MakeActive function is called, it sets all of the constant buffers to the last value the VertexShader instance received for them (the actual function for setting a constant buffer is in ShaderBase). Taking this known state design a step further is where the RenderPipeline class comes in (once I get the parts done and integrated into it, anyways). The plan is that when an instance's MakeActive function is called, the vertex shader, pixel shader, their constant buffers, etc. all become active on the provided ID3D11DeviceContext. An obvious improvement to setting all of the pipeline info each time would be to only set the parts that don't match the last value for the ID3D11DeviceContext instance. However, I've been trying to get things working before I start looking into performance improvements.

      Below is a screen shot of the test program, which is using instance rendering for 2 cubes, and 4 viewports to display them from different angles.
      [sharedmedia=gallery:images:3470]


      d3d11_framework.zip is a zip file that has the source for the framework (MIT license, like normal for me). It does not include DirectXTK, which can be found here. To add it in, put the DirectXTK include files in the framework's external_tools/DirectXTK/public_inc/ directory, and once built, put the DirectXTK.lib file in external_tools/DirectXTK/lib/Debug/ or external_tools/DirectXTK/lib/Release/ depending on which configuration you built.
tricona
In the course of expanding and fixing the D3D11 framework from my previous journal entry, I've found that there are types and functions that I want to have as public in the library but not exposed outside of it (internal scope in C#). Instead of purely using the pimpl idiom (for those unfamiliar with the term or how to apply it, these should help explain it: http://en.wikipedia.org/wiki/Opaque_pointer and https://www.gamedev.net/page/resources/_/technical/general-programming/the-c-pimpl-r1794), I decided on public and private headers, for which pimpl can be used to hide types. To automatically generate the public headers, I created a utility program that uses tokens in the private headers to generate the public ones (to be clear, the utility doesn't automatically do pimpl, that's up to the developer, the utility just does simple line exclusion). I did a few quick searches and didn't see a utility that already did this for C++, but it was simple enough to program my own.

There were a few choices for how to specify the start/stop tokens: comments, ifdef, ifndef. So that IDE outlining can show the affected sections and so the build of the library doesn't need an additional define (this is pure laziness since adding a define to the project settings is trivial), I went with lines starting with "#ifndef" and "#endif" which also contain "PUBLIC_HEADER" to mark the start/end of sections to exclude from the public header. The lines need to start with the "#ifndef" or "#endif" tokens so that if needed a line can be commented out rather than deleted, avoiding having to remember where it needs to go if/when it needs to be added back in. I've tested it successfully so far for hiding types (via pimpl's declare and only use pointers approach) and functions. For an example usage:

#ifndef FOO_H
#define FOO_H

class Foo
{
  public:
    // stuff...

#ifndef PUBLIC_HEADER
    // this function is excluded from the public header
    void blah();
#endif /* PUBLIC_HEADER */

    // stuff...

  private:
    // stuff...
};

#endif /* FOO_H */



For integrating it into the library build, I added a post-build event to make the public include directory and call the utility. That has the side effect that the test program in the solution can't check its code against the library headers until the library build is complete, and if an error is found during the test program build, clicking the error message opens the public header instead of the private one in the solution. I consider this okay since it is not the normal use case. The normal use case is having a separate solution where a released build of the library is added to "additional dependencies", and the path to the public headers is added to the "additional include directories" project settings.

An alternative to generating public headers is to use the same headers for public and private, and switch to using an ifdef when building the library. I didn't go this way primarily due to one issue: if a program build sets that define, all of the library's private parts are available to the program.

Source (under MIT license as normal): public_header_generator.zip
tricona
The D3D11 port of the rendering process for my particle system editor is complete (the VC++ 2010 project and source under the MIT license can be found at the end of this entry; you will also need to edit the include and additional dependency paths to build). The hardest part of the project has been finding time for it, since my job has been demanding evenings and weekends recently to release that project. But I've been on vacation this past week and I've spent part of it finishing this port, which from looking back took about 12 weeks of calendar time and vastly less actual time (at least half of which was this week).

I started this port after reading "Practical Rendering & Computation with Direct3D11" and thought it would be a good first project for applying what I had learned from that book. From reading it, and from using XNA on previous projects, I knew I wanted to create an interface layer to hide most of the details of D3D11 from the application, which is where the d3d11_framework project (also included below) came from. It is a partial recreation of the XNA framework in C++, though it is comparatively in very rough shape. It also takes into account the differences between D3D9/XNA and D3D11, such as using xnamath.h and passing an ID3D11DeviceContext* to the rendering functions so that multithreaded rendering is possible, though the creation of deferred contexts is currently not implemented. The current form is good enough to get through a sample project (a port of tutorial 7 from the D3D11 samples included in the June 2010 SDK, to which I added instance rendering) and the particle system rendering. My plan for my next project is some cleanup work and expansion of the framework, including pushing parts of the particle system port into the framework. I already have a laundry list of tasks that is sure to grow. I had thought about a compute shader version of the particle system, but I'm going to hold off on that at least for now.

In the course of working on this project, I'm pretty sure I made every newbie mistake:

  • Using XMVECTOR and XMMATRIX in dynamically allocated types, which led to access violations when changing them since they may not be properly aligned. So, now I use them only during calculations and store the result in XMFLOAT3/XMFLOAT4 or XMFLOAT4X4 (a sketch of this pattern follows this list).
  • Missing that the order of the arguments to XMMatrixRotationRollPitchYaw(pitch, yaw, roll) is different than XNA (yaw, pitch, roll) and different than its function name (roll, pitch, yaw)
  • Constant buffers bound to the vertex shader stage are not automatically bound to the pixel shader stage (not sure if this is just XNA that does this or if it's D3D9)
  • For the layout description, having both aligned byte offsets for the instance entries set to 0 (one of the things on my to-do list for the framework is to look into making creating the layout description less error prone)
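
    As referenced in the first bullet, the pattern is to keep the XMFLOAT types as the stored form and only load into XMVECTOR/XMMATRIX for the duration of a calculation. A sketch using a hypothetical Entity type (these functions exist in both xnamath.h and DirectXMath, the latter inside the DirectX namespace):

    struct Entity
    {
      XMFLOAT4X4 world; // safe to store in heap allocated objects, unlike XMMATRIX
    };

    void Rotate(Entity& e, float pitch, float yaw, float roll)
    {
      XMMATRIX world = XMLoadFloat4x4(&e.world);                // load for calculation
      XMMATRIX rot   = XMMatrixRotationRollPitchYaw(pitch, yaw, roll);
      XMStoreFloat4x4(&e.world, XMMatrixMultiply(rot, world));  // store the result
    }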

    In addition, while creating the C++ port, I wondered why I made the C# editor and XNA renderer multithreaded. The C++ version is completely single threaded, since there really is no need for multithreading. The best answer I can come up with for why the C# versions have multithreading is that C# makes it easy.

    d3d11_framework: d3d11_framework.zip
    particle system renderer: ParticleSystemEditor_D3D11.zip
tricona
[heading]Reason for the Project[/heading]
In the tank game, the particle system was probably one of the weaker parts of the game on the technical side. In addition, it was a tedious process to adjust it: run the game, get to a point where the particle system will be used, kick it off, exit, tweak it, and repeat. That is why my project after that game was this particle system editor.

[heading]Two Windows[/heading]
My plan was to create an editor UI using winforms and the rendering process in XNA. Since I want to start playing around with Direct3D 11 soon, including creating a renderer for this in it, I did not have the rendering window embedded in the editor window. Instead the winforms editor uses a TCP socket to connect to the rendering window. All the configuration settings and changes to them are sent over the socket. This way once I get to creating the D3D11 renderer, the winforms editor can be used without modification.
gallery_49001_516_26998.png

gallery_49001_516_9718.png

When I started this, I expected the winforms and XNA side of things to take about a week. Instead it ended up taking just over two weeks of development time (calendar time is a different story). A couple days of the extra development time were spent trying out instance rendering for the first time. I also moved the billboarding to the vertex shader (the tank game did it on the CPU) using the method described in section 1.2 of GPU Gems 2. Though instead of moving the vertices to the world space position in the vertex buffer, I just passed the world space position as part of the instance data. This allowed the vertex buffer containing the positions of the vertices to remain 3 or 4 vertices long (depending on particle shape) and unmodified.

[heading]Re-use[/heading]
So that the particle systems created are easy to use in future projects, the editor can save/load files using an XML format. I also wrote it to be easily expandable in case a new configuration parameter or feature is needed. In case I end up going really wild with it, I have been writing it thus far so that plug-ins could eventually be supported for existing categories of configuration parameters (e.g. adding a new dispersion type or emitter shape).
The particle system code itself is mostly re-usable but could use a variety of tweaks for a game implementation. The implementation in the renderer allows for the configuration to change at any time; in a game, that additional complexity is likely not needed. Also, many of the conditionals could be completely removed in a version for a game. And of course there would need to be a minor change to get rid of the looping that is in the current implementation (the emitter runs for a configurable time period and, after all the particles die, waits a second before starting up again).

[heading]Next[/heading]
While I could expand this to include attaching particle systems to models, then defining triggers for when the particle systems should start/stop, I'm not going to do that (yet). Next up for me is to finish reading "Practical Rendering & Computation with Direct3D11". After that, and maybe a few basic practice projects in D3D11, will come the D3D11 renderer for the particle system editor, which I'll mainly be doing for a combination of practice and code that I'll very likely need in a future game.

[heading]Download[/heading]
Since particle systems look better in action, if you want to try it, you can grab the editor particle_system_editor.zip and the renderer xna_renderer.zip. Please note that these are just a zipping up of the directories I had Visual C# publish to. I haven't tried to use this feature before, and the only testing I've done was on one of my computers here, which required installing the XNA Framework Redistributable 4.0 first. If I were going to release it as a real tool instead of "if you feel like it, feel free to give it a whirl", then there are other features I would like to include, such as:

  • Installer that includes both the editor and renderer
  • Being able to set more things to change over time, such as individual particle size, alpha (overriding the coloring setting in the current coloring modes)
  • Adding alpha to the color sets and sequences (currently alpha is used only in textures)
  • Lighting and materials
  • Models as particles (not sure if this is normally done, but could be cool for particle systems with a low number of active particles)
  • Add color selection to the dialog for adding/editing color set/sequence entries instead of using a button to launch a color selection dialog
  • Custom control for selecting colors that is faster to use
  • More intuitive UI for doing color sequences when interpolation is turned on
  • Make renderer part of editor window by default, while keeping client/server code as an option
  • Handle resizing the editor window/dialogs
  • Help menu
  • Ability to specify a custom port/IP/host name for the client/server portion
  • Renderer able to render multiple particle systems at once
  • Editor able to edit multiple particle systems (i.e. each particle system is its own tab in the editor UI)
  • Make color set/sequence % readable on dark colors
  • Add ability to unselect a color in the color set/sequence list boxes

    [heading]Source[/heading]
    I had been thinking about releasing the source for this project and since the comments after the original posting said it would be useful, here it is: ParticleSystemEditor.zip. That is a zip file of the Visual C# solution, project files, and source. I'm going with the MIT license for this (LICENSE file included in the same directory as the solution files), so have fun with the code.
tricona

XNA Game Complete

[heading]Summary[/heading]
The primary goals of my tank game project were to learn about the XNA framework and to create a feature complete game. I feel that I have accomplished both. I used the term "feature complete game" instead of "complete game" because all the code is there and working for the game, but content is lacking. There is 1 model for an enemy turret, 1 model for both the player and enemy tanks, the textures for all models are just their UV maps, and there is only 1 level currently (level changes were tested by having the same level be the next level). As for what's in:

  • Support for multiple players in split screen
  • Main Menu for starting a campaign, selecting a level, or changing options
  • Basic particle systems for weapon impacts and object destruction
  • Location based triggers
  • Checkpoint respawn system, complete with fallback if there are too many deaths at the current checkpoint too quickly
  • State machine-based AI where the state machine is specified for each enemy in the level instead of enemy type
  • Navmesh and A* for pathfinding and smoothing
  • Ability for subsequent levels. Due to detecting when to switch to the next level as a trigger, the campaign actually doesn't have to be linear since each next level trigger can go to a different level and there can be multiple triggers in a given level.

    Some key things that I've learned from this are:

    • XNA is great at creating quick test programs
    • XNA's Ray struct needs to have its Direction field be unit length or intersection methods don't give correct results
    • Blender 2.56 is significantly easier to use than the last version I tried
    • When creating a model, be careful with the object matrices if you plan to do anything beyond just drawing the model. Having an object matrix not be the identity matrix means the vertices aren't actually being updated in the modeling software and can produce unexpected results in a custom model processor
    • Using "SharedResource = true" means that what is supposed to be the same instance will in fact be different instances at run-time
    • Path smoothing is easy on a navmesh when all connected regions share a complete edge

      Here's some screen shots of the game:
      gallery_49001_299_1110.png
      The main menu (as with the rest of the game, nothing fancy)

      gallery_49001_299_43309.png
      Level start (player's tank in foreground, enemy tank a bit further back and to the side, wall on the left)

      gallery_49001_299_47475.png
      Weapon particle system

      gallery_49001_299_63513.png
      Particle system for killing an enemy tank

      [heading]What's next[/heading]
      There are a few things I have planned for my next few projects. I want to spend some time playing with different graphics algorithms, as well as create a more complex and better looking particle system. While I'm working on those, I'm also going to be reading a book on D3D11. Once all that is done, I plan to start working on my next game. I already have a few ideas to flesh out and will probably start the design doc before I actually finish any of the other projects :-)
tricona
While working on a basic state machine AI for my tank game, I found that my A* pathfinding on a navmesh wasn't working. I had already verified that the A* implementation worked in a tester program, which created the navmesh at run-time. However, in the game, the content pipeline is used for loading a navmesh from an XML file. The A* implementation was using Object.ReferenceEquals to determine whether the destination region was reached, and was relying on each region's neighbor list working on the same instances of regions. As it turns out, due to each region's neighbor list having the [ContentSerializer(SharedResource = true)] attribute and parameter on the property, neighbors were not the same instances throughout the navmesh, and therefore the destination detection did not work, nor did the closed list.

My navmesh is specified in an XML file enumerating the regions and their connecting edges. For the sake of this particular game, the regions of the navmesh are axis aligned boxes where connected regions must share a full edge. Because of the axis aligned boxes part, my class for storing the regions is named "Rect". I originally tried to declare its neighbor list property as:

[ContentSerializer]
public Rect[] Neighbors { get; private set; }
That produced a compiler error for a cyclic reference. This is because when region 0 is a neighbor of region 1, region 1 is in region 0's neighbor list and vice versa. From a quick search, the normal way to deal with this is to set the "SharedResource = true" parameter for the "ContentSerializer" attribute:

[ContentSerializer (SharedResource = true)]
public Rect[] Neighbors { get; private set; }
What I hadn't realized until debugging my A* implementation was that setting "SharedResource = true" made it so the region 1 instance in region 0's neighbor list was not the same instance as in the list of all regions in the navmesh. The particular test I used to determine this was:

if (Object.ReferenceEquals(m_regions[0], m_regions[1].Neighbors[0]))
{
  int foo = 3;
}
if (Object.ReferenceEquals(m_regions[0], m_regions[1].Neighbors[1]))
{
  int foo = 3;
}
With breakpoints set at both of the "int foo = 3;" lines, neither breakpoint was hit. It should also be noted that in my navmesh for this level each region is connected to only two other regions, which is why only the two if statements above were needed for this test. After debugging this, I found http://social.msdn.m...4-652207885ad8/ which (while old) confirms that setting SharedResource to true does result in different instances.

My solution was to store an ID number for each region and have the neighbor list that the XML importer populates be a list of those IDs. At run-time, after the navmesh is loaded, each region has a CompleteLoad function called with the list of all regions. This function takes care of converting the IDs to the actual instances, which allows my A* implementation to function properly.
tricona

Mesh's Matrix

[heading]Model Processing[/heading]
While working on a model processor to determine the bounds of a model, I noticed something odd. The min/max were not what I expected and the total width did not match the model. The code is simple and taken from one of my tester programs where it worked correctly, so I was wondering why this model was an issue.

private void GetAllVertices(NodeContent input, LinkedList<Vector3> verts,
  ref BoundingBox box)
{
  MeshContent mesh = input as MeshContent;
  if (mesh != null)
  {
    // load the positions from the mesh node
    foreach (Vector3 pos in mesh.Positions)
    {
      // store the vertex
      verts.AddLast(pos);

      // update the bounding box
      if (pos.X < box.Min.X)
      {
        box.Min.X = pos.X;
      }
      if (pos.X > box.Max.X)
      {
        box.Max.X = pos.X;
      }
      if (pos.Y < box.Min.Y)
      {
        box.Min.Y = pos.Y;
      }
      if (pos.Y > box.Max.Y)
      {
        box.Max.Y = pos.Y;
      }
      if (pos.Z < box.Min.Z)
      {
        box.Min.Z = pos.Z;
      }
      if (pos.Z > box.Max.Z)
      {
        box.Max.Z = pos.Z;
      }
    }
  }
  else
  {
    // recurse on the children of a non-mesh node
    foreach (NodeContent child in input.Children)
    {
      GetAllVertices(child, verts, ref box);
    }
  }
}

This function is called from the model processor's Process method with box initialized so that its min values are float.MaxValue and its max values are float.MinValue. The part of storing the vertices in a linked list was for other functionality in this particular model processor (building a heightmap from the mesh).

Since this code works elsewhere, I inspected the fbx file. The fbx files I use are generated from the fbx exporter that comes with Blender 2.56.5, which generates text based fbx files; Autodesk's site says fbx files can also be binary. I noticed that the matrix in the PoseNode for the model was not the identity matrix, whereas in the models that work it is. Later exploration of the fbx files also showed that the Model node for the mesh has a "Lcl Rotation" property whose rotation in degrees matches the matrix.

The model showing the issue is the result of manipulating a Blender grid object. These objects are in the xy plane when created. Since I was creating terrain for a game where terrain is in the xz plane and y is the height, I rotated the object by -90 degrees around the x-axis, where the minus sign is to get the normal facing the right direction of +y. This is what created the issue: setting rotation or scale in object mode does not edit the vertices of the mesh, instead it manipulates the object's world matrix. Since the model I was dealing with was a single mesh, the solution was simple: set the object rotation to 0 and, in edit mode, rotate by -90 around the x-axis. By applying the rotation in edit mode, the vertices' coordinates were actually changed by the rotation. To help avoid this problem with future models, I've updated the "Modeling Requirements" section of my design doc to specify that objects must have a rotation of 0 around each axis and a scale of 1.

[heading]How This Matters to Drawing[/heading]
Going through the process in the previous section helped me understand the details of a different section of code. When drawing a simple model in XNA, the code is typically along the lines of:

Matrix[] transforms = new Matrix[model.Bones.Count];
model.CopyAbsoluteBoneTransformsTo(transforms);
foreach (ModelMesh mesh in model.Meshes)
{
  foreach (BasicEffect effect in mesh.Effects)
  {
    effect.World = transforms[mesh.ParentBone.Index] * World;
    effect.View = cam.View;
    effect.Projection = cam.Projection;
  }

  mesh.Draw();
}

With World being a Matrix defined by the game. Why use "transforms[mesh.ParentBone.Index] * World" instead of just World? Neither MSDN nor "Learning XNA 4.0" by Aaron Reed goes into detail on why it is necessary. MSDN keeps referring to bones in its brief explanation, but doesn't mention that even models without a skeleton still have 1 bone in XNA (discussed in the next paragraph). "Learning XNA 4.0" says it's to handle models that are comprised of multiple meshes. However, this code matters even for single mesh models that are not attached to a skeleton, such as my terrain model from the previous section.

All meshes in a model have a world space matrix attached to them. This matrix allows rotation and scaling of the mesh to be applied to the vertices that it is comprised of. XNA loads this matrix as if it were a bone and sets it as the parent bone of the mesh. If you skip applying this matrix, then your rendering of the model may not match what is shown in the software used to create the model.
tricona
[heading]Intro[/heading]
Don't let the title deceive you, this is not a simple rotation case. This is the subject that I've spent more time on than any other in my tank game so far, even more than the rest of the game combined. If I had found a solution, this dev journal entry would probably be named "Orientation of Objects Substantially Larger Than Heightmap Resolution". After a week[sup]1[/sup] of working on it, I thought I would end up writing a tutorial on this, due to being unable to find reference material for this particular problem (aside from one page that I have been unable to find again and isn't in the browser history when I got to looking for it). After a month[sup]1[/sup] plus of working on this, I decided it was time to move on as my motivation was getting sapped. I may come back to this problem later, but for now, I want to get the rest of the game working.

This problem is determining the pitch and roll of the tanks based on the terrain they are over. Positioning an object on a heightmap using a single point is a fairly simple task, and I could find an abundance of reference material on it. When using a single point, the orientation of the object can be determined by looking at the normal of the heightmap cell it is in or the face it is on. However, my tanks are substantially larger than the resolution of my heightmap. The tanks roughly have a footprint of 9.5 meters by 3.5 meters, and the heightmap is a regular 2d grid in xz with indices 1 meter apart. I could increase the distance between points on the heightmap, but thought that there must be a better solution.

While looking for the better solution, I spun off test programs so that I could debug different techniques and their parts separately from the whole game, and not have to revert changes in order to get the game back. This showed one of the strengths of XNA: how quickly a new program can get up and running so you can concentrate on what you are trying to do. Window creation, gathering input, model loading, and basic rendering are all there once the project is created. The only thing missing is a basic camera class or struct, which of course I brought over from the game. While I'm not sold on XNA for large projects, for test programs the quick setup is very nice.

[heading]First Test Program[/heading]
The first test program was to debug part of an algorithm that didn't pan out. My first attempt was to gather all the entries in the heightmap that are within the bounding box of the object/tank, and from there try to determine the pitch and roll. For this and the next attempt, the real trick is going from heightmap values to pitch and roll, which I will discuss in the next section. This way of gathering heightmap values has a flaw in that it could cause the object to clip through an incline if the bounds didn't end on an integer value (due to the heightmap having a resolution of 1 meter and its alignment).

The part of the first attempt that the first test program dealt with is determining which entries in the heightmap are within the object's bounding box. There are 4 pieces of information needed for this search: the bounding box, the object's position, the object's rotation (just yaw, since determining pitch and roll is the ultimate goal), and the heightmap. The axis-aligned bounding box, which XNA's BoundingBox struct represents, is determined by the model. After doing a model processor for skeletal animation, writing one to determine the bounding box was trivial. The object's position and rotation are variables that can be adjusted by the user and are passed to the function that does the search. The heightmap is the class I put the search function in, and for the sake of this test it is just an 11 by 11 grid with all heights set to 0. The search has an additional twist: in the design doc I specified that the origin of the model is to be the center of mass[sup]2[/sup], which is not necessarily the geometric center. This is easy to deal with, but worth mentioning. As with the ray-sphere intersection on scaled, rotated, and translated spheres discussed in my previous dev journal entry, I created a matrix, but this one is to convert the entries of the heightmap, which are in world space, into the object space of the bounds. This matrix is:
Matrix.CreateTranslation(-box.Center()) * rot * Matrix.CreateTranslation(pos)
Where box.Center() is an extension method to get the center of the BoundingBox, rot is the object's rotation matrix for yaw, and pos is the position of the object in world space. As with the ray conversion, I multiplied each heightmap entry by the inverse of the matrix. The results of the conversions are run against BoundingBox's Contains method, and so long as it does not return ContainmentType.Disjoint, the position is in bounds.

In order to verify this part of the algorithm worked and to debug earlier iterations of the code, I drew a series of squares to represent the heightmap locations, colored white if not a match and green if they are within the bounds. The squares were centered at the heightmap positions and scaled down slightly so as not to overlap. An outline box was also drawn as a reference for where the bounds are. The model for the outline box was also the model used for creating the bounding box in this test program, rather than the tank model from the game. A screen shot is below (note: white squares that partially overlap with the outline box are outside the bounds because it is the center of the square that is the entry in the heightmap, and the center is outside the outline box).
gallery_49001_299_2671.png

[heading]Second Test Program[/heading]
The purpose of the second test program was to convert the heightmap values into pitch and roll, the hard part of this problem. A variant of the outline box model was used for this program, the change being some added height to the model. Since I didn't bother making the outline box anything more than a series of faces in either tester, in this one I set the cull mode to none so that both front and back faces would be drawn. In addition to that model, several terrain test patches were created to cover all the basic scenarios and some more complicated ones. So that the object's position was fixed for the tests against the different terrain, instead of moving the object I switched the terrain and ran the algorithm on the switch. Doing it this way was nice because breakpoints were not being hit on every update cycle, just when there was an actual change.
gallery_49001_299_30504.png
From taking the final code from the first test program and integrating it into the second, I quickly realized that this was not going to work. It's been quite a while since this part, so I don't remember all the problems. But from what I remember, the two main problems were:
1. The y value for the object was an unknown, and wouldn't be known until the pitch and roll were known.
2. By taking all of the heightmap entries within the bounds, I could compute the pitch and roll between any number of them, and how many there were would depend on the footprint of the object.
To handle problem 2, I tried to split the heightmap entries into quadrants within the bounds, trying to keep track of the entry that would cause the greatest pitch and/or roll. This is what made problem 1 an issue, since calculating the pitch and roll of a point relative to the center requires both points to be known, and the center was not yet known.

Those problems, plus that reference page I couldn't re-find, caused me to switch to a completely different method: use control points[sup]3[/sup] at defined positions on the bounding box of the object and deal with pitch and roll between the control points. In order to avoid clipping through the crest of a hill, I went with 8 control points: 4 at the corners and the other 4 along the perimeter, aligned with the center of mass. The numbers in the image below are the indices I used into the array that stored the points.
gallery_49001_299_4001.png
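As a rough illustration of that layout, building the 8 object-space control points from the bounding box might look like the sketch below. Since the model origin is the center of mass, the perimeter points sit at x = 0 or z = 0 in object space; the exact index assignment comes from the image above, so the ordering here is my assumption:
// corners first, then the perimeter points aligned with the center of mass
// (the object-space origin); the index order is an assumption
Vector3 min = box.Min;
Vector3 max = box.Max;
Vector3[] control_points = new Vector3[8];
control_points[0] = new Vector3(min.X, 0, min.Z);  // corners
control_points[1] = new Vector3(max.X, 0, min.Z);
control_points[2] = new Vector3(min.X, 0, max.Z);
control_points[3] = new Vector3(max.X, 0, max.Z);
control_points[4] = new Vector3(min.X, 0, 0);      // perimeter, aligned with
control_points[5] = new Vector3(max.X, 0, 0);      // the center of mass
control_points[6] = new Vector3(0, 0, min.Z);
control_points[7] = new Vector3(0, 0, max.Z);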
For each control point, the heightmap is queried for the height at that point. After that, the points can be combined to compute various pitches and rolls. Since I had decided in the design doc that models should face the +x direction, pitch is rotation around the z axis and roll is rotation around the x axis. Based on XNA's description of the arguments to Matrix.CreateFromYawPitchRoll, it seems more typical to have the facing direction be along +z. In my next game I will change the facing direction, but I will keep the current one for this game and its test programs. In the image below, the numbers are the indices into the arrays of floats used for keeping track of the individual pitches and rolls between the control points. In case it's not clear, indices 4 and 5 are between the corner control points, and the rest are between a particular corner and a center-of-mass-aligned control point.
gallery_49001_299_6266.png
After computing the various pitches and rolls between the control points, the remaining questions were:
1. Which of those values should be used for pitch and roll?
2. What is the y value for the object's position?
I'll answer the second question first, as the answer was consistent across every method I tried for the first question. For each control point, I took the xz vector from subtracting the highest control point from the current one, applied the pitch and roll from the answer to the first question to that vector, then added the resulting vector to the highest control point. Since code is probably clearer (note: the loop only goes through the first 4 indices because those are the corner control points and the only ones needed for the next part):
for (int i = 0; i < 4; ++i)
{
  if (i != max_index)
  {
    // xz offset from the highest corner to this one, with y zeroed out
    Vector3 xz = new Vector3(
      control_points_object[i].X - control_points_object[max_index].X,
      0,
      control_points_object[i].Z - control_points_object[max_index].Z);
    // rotate the offset by the pitch/roll and re-anchor at the highest corner
    control_points_object[i] =
      Vector3.Transform(xz, pitch_roll) + control_points_object[max_index];
  }
}

After that, performing bilinear interpolation on the corner control points gives the y value for the object's position.
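Concretely, a sketch of that interpolation; corner_y, u, and v are names I'm introducing for illustration, and the corner ordering matches the assumption in the earlier control-point sketch:
// u and v locate the object's origin within the footprint along x and z,
// both normalized to [0, 1]; corner_y holds the corner control points' y
// values (0-1 on one z edge, 2-3 on the other, per the assumed ordering)
float y0 = MathHelper.Lerp(corner_y[0], corner_y[1], u);
float y1 = MathHelper.Lerp(corner_y[2], corner_y[3], u);
float object_y = MathHelper.Lerp(y0, y1, v);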

The answer to the first question is where I got stuck. After trying a variety of methods, I asked on the forums in case someone else had solved this problem before. Apparently this isn't a common problem, as there were no responses, which had me hoping I could find a solution and post a tutorial on it in a dev journal. The methods I had tried before posting the question were taking the min, max, min magnitude, and max magnitude per side, and combining the sides with the same variety of functions. Resorting to these combinations came well after thinking about the problem and trying/debugging the one approach that seemed reasonable. Some would give reasonable results in the default orientation, but break when I tested other yaws (my test cases were yaws, in radians, of 0, pi/4, pi/2, 3*pi/4, pi, and -pi/4). I then tried a series of if/else cases for equalities and inequalities, which again worked in the default orientation and broke in several of the other terrain/yaw test cases.

Also, in the course of these initial attempts, it became clear that when both pitch and roll were non-zero, each needed to be scaled back. Rotating by both at their full values caused over-rotation, and the object clipped through the terrain. The scaling factor I used after realizing its necessity was:
public float Scale
{
  get
  {
    // fraction of the total rotation magnitude that belongs to pitch; the
    // small epsilon avoids a division by zero when pitch and roll are both 0
    return Math.Abs(Pitch) / (Math.Abs(Pitch) + Math.Abs(Roll) + .000001f);
  }
}

With the "+ .000001f" simply to handle the case were pitch and roll are both 0. For applying the scaling factor, it is simply Scale * Pitch, and (1 - Scale) * Roll.

The next time I worked on this after posting the question on the forums, I tried a brute-force approach: go through all combinations of pitches and rolls between the control points, assign a score to each combination, and pick the one with the best score. So that I could debug the scoring method, I made the test program keep all the combinations and flip to the next-best combination with a button press. My goal with this brute-force method was to find the right answer in all the test cases and analyze the results to find out what the right combination method is.

After debugging the scoring mechanism and code, I went with a two-part scoring mechanism. The first part is that combinations whose control points all remain at or above their heightmap values are better than those where any dip below. After that, it is whichever combination has the smallest total increase of the control points' y values over their heightmap values. What I was shooting for with this scoring mechanism was finding the result that matched the terrain the closest without going below it.
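As a rough sketch of that comparison (the struct and field names are hypothetical, not from the game):
// hypothetical representation of one pitch/roll combination's results
struct Candidate
{
  public bool AnyBelowTerrain;  // did any control point dip below the heightmap?
  public float TotalIncrease;   // sum over control points of (y - heightmap y)
}

// returns true if a scores better than b under the two-part mechanism
static bool IsBetter(Candidate a, Candidate b)
{
  // part 1: staying at or above the terrain everywhere always wins
  if (a.AnyBelowTerrain != b.AnyBelowTerrain)
    return !a.AnyBelowTerrain;
  // part 2: otherwise prefer the combination that hugs the terrain closest
  return a.TotalIncrease < b.TotalIncrease;
}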

From analyzing the results of that scoring mechanism, the min/max approaches didn't fit the results at all. So I adjusted my if/else method of computing the final result to be very close to the brute-force method. It couldn't be an exact match because the analysis showed a contradiction: one test case with a particular inequality result required a particular index to be used, but a different test case with the same inequality result needed a different index. After adjusting the if/else method, I re-enabled that code, disabled the scoring mechanism, and ran through the test cases. While most of the results looked good, the contradiction case looked terrible.

[heading]Game Integration[/heading]
Since the scoring mechanism produced good results, and at this point I wanted to get on with the game rather than stay on this problem indefinitely (I might come back to it eventually), I tried plugging the scoring method into the game. Given what the game currently does and what I expect it to do, it should be able to handle the added cost of running through the combinations without noticeable degradation. Performance wasn't the problem. While the scoring results looked good in a static scene, when applied to a moving object the difference frame-to-frame was too severe. The camera follows the tank at a fixed distance, so the tank's movement affects the camera, which made the jitter very apparent. The jitter came from only looking at the tank's current position on the terrain, so the best-fit pitch and roll on one frame had no relation to the pitch and roll on the previous frame.

At this point, I've decided it's time to move on with my game, as there are plenty of other aspects that need to be implemented. At the time of working on this problem, there wasn't a HUD, object-object collision detection, or any AI yet. The first two of those have since been added, and the third is the next topic to start looking at[sup]4[/sup]. While I'm not sure if there is a way to combine the various pitches and rolls between the control points to get a satisfying result, I think this problem might be solvable by using physics to tell the tank when the ground level has increased, or when the center of mass has crested a hill or cliff and can start tilting down. But physics simulation is a subject well beyond the scope of this particular game, so for now my game is just going with level terrain. Since the goals of the game are to:
1. Get experience with XNA
2. Create a full game
3. Be a small game that can be developed quickly
I'm willing to sacrifice the detail of cresting hills and crossing short trenches, since that detail, combined with how much is left to do, means the "quickly" part of goal #3 has already passed.

[heading]Foot Notes[/heading]
[sup]1[/sup] Just to be clear, these are calendar times and this has been spare-time work, definitely nowhere close to 40+ hour work weeks.
[sup]2[/sup] The reason for models having their origin at the center of mass instead of the geometric center is that during the initial planning I was thinking of integrating some physics. I have since decided that physics is outside the scope of this game due to the amount of complexity it brings, and this was supposed to be a quick-to-develop game.
[sup]3[/sup] "Control points" might not be the conventional term and could be why I can't re-find that reference page.
[sup]4[/sup] I read "Programming Game AI by Example" by Mat Buckland a couple months ago and look forward to applying some of that.
tricona
Since my plan for the tank game was for it to be quick to develop[sup]1[/sup], I decided to use hit-scan weapons instead of actually generating and tracking a projectile object. Naturally, I went with a 2-tiered approach: a fast-rejection phase followed by a more precise test.

Due to my game objects being able to rotate, I went with bounding spheres for the fast-rejection phase. I could have dealt with adjusting an AABB (axis-aligned bounding box[sup]2[/sup]) for rotation, or used an OBB (oriented bounding box), but decided on bounding spheres for simplicity. XNA already provides a BoundingSphere struct that can be tested for intersection against a ray, so it was the natural choice.
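As a sketch of how that phase might look (the BoundingSphere property and the surrounding names are assumptions for illustration, not the game's actual code):
// fast-rejection phase: treat a hit-scan shot as a ray and test it against
// each object's world-space bounding sphere; names here are assumptions
Ray shot = new Ray(muzzle_position, Vector3.Normalize(aim_direction));
foreach (GameObject obj in objects)
{
  float? t = shot.Intersects(obj.BoundingSphere);
  if (t.HasValue)
  {
    // passed the cheap test; run the precise collision-sphere phase below
    precise_candidates.Add(obj);
  }
}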

For the more precise phase, I could have tested against a collision mesh and gotten very accurate results. But for the sake of playing around a bit, I decided to take a different approach, which I think turned out to be a good exploration of XNA's Ray and BoundingSphere structs. I used an XML file to specify spheres that closely approximate the mesh. To allow these spheres to approximate the mesh, they can be scaled and translated along each axis, as well as rotated about an arbitrary axis. Each sphere is also attached to a bone so that animation is taken into account.

gallery_49001_299_27160.png
Turret and tank models

gallery_49001_299_34270.png
Turret and tank collision spheres

Manipulating spheres this way is just applying what I learned writing the ray tracer in my college computer graphics course. If you have never written a ray tracer, I highly recommend the exercise. In that program, there were different primitives we needed to support (spheres, triangles, cubes, cylinders, etc.). The primitives were all unit size and centered at the origin. To get a non-unit sphere or triangle, a scale operation had to be applied, which adjusted the primitive instance's world matrix. When it came to rendering, the ray was simply multiplied by the inverse of the world matrix. It is worth noting that for the inverse alone to work, the ray's position and direction were treated as Vector4s with w components of 1 and 0 respectively.
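In XNA terms, that position/direction distinction corresponds to the two transform helpers below. A minimal sketch, assuming world is the primitive's composed scale/rotate/translate matrix:
// bring the world-space ray into the unit primitive's object space
Matrix inv_world = Matrix.Invert(world);
// positions transform with w = 1, so translation applies
Vector3 obj_pos = Vector3.Transform(ray.Position, inv_world);
// directions transform with w = 0, so translation is ignored
Vector3 obj_dir = Vector3.TransformNormal(ray.Direction, inv_world);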

In the tank game, for checking the collision spheres against the ray, I tried adjusting an instance of XNA's Ray struct the same way, once again using XNA's BoundingSphere struct for the intersection test. This did not work. From looking at the documentation of BoundingSphere.Intersects, I didn't see a reason why it wouldn't. So the next step in my debugging was extracting the relevant values with breakpoints and doing the math by hand for the hit and miss test cases. The math worked out correctly. It turns out that on the Ray struct's member documentation page, the direction is specified as a unit vector; there is no note that using a non-unit direction causes the Intersects method to give incorrect results. This matters because scaling the collision spheres means the converted ray's direction will not be unit length (unless the resulting scale is 1). My solution was to implement the ray-sphere intersection test myself in my CollisionSphere class. Another solution would have been to convert the ray and then normalize the direction. However, that gives back an incorrect t value, which messes up determining the closest intersection to the ray's source and positioning the cross-hairs in 3D space. Maybe it's because of my prior experience with this, but I find it bizarre that XNA's Ray struct has this unit-length requirement.
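For reference, here is a self-contained version of the kind of test I ended up writing. This is a standard quadratic-formula ray-sphere test (assuming the usual System and Microsoft.Xna.Framework usings), not the CollisionSphere class's actual code:
// unlike XNA's Ray.Intersects, this does not assume the direction is unit
// length, and the returned t is in the scale of the direction passed in
static float? IntersectRaySphere(Vector3 ray_pos, Vector3 ray_dir,
                                 Vector3 center, float radius)
{
  // solve |ray_pos + t * ray_dir - center|^2 = radius^2 for t
  Vector3 m = ray_pos - center;
  float a = Vector3.Dot(ray_dir, ray_dir); // not assumed to be 1
  float b = 2f * Vector3.Dot(m, ray_dir);
  float c = Vector3.Dot(m, m) - radius * radius;
  float discriminant = b * b - 4f * a * c;
  if (discriminant < 0f)
    return null; // the ray misses the sphere
  float sqrt_disc = (float)Math.Sqrt(discriminant);
  float t = (-b - sqrt_disc) / (2f * a); // nearer intersection first
  if (t < 0f)
    t = (-b + sqrt_disc) / (2f * a); // ray starts inside the sphere
  return t < 0f ? (float?)null : t;
}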

[sup]1[/sup] It has already taken substantially longer than I expected, but I am learning not only the framework, but also more about game development. So, I still intend to complete this project.
[sup]2[/sup] While I assume that most people reading this already know that AABB stands for axis-aligned bounding box, I prefer to define the meanings of all abbreviations at least once just in case someone is unfamiliar.
tricona

Starting with XNA

[subheading]Normal Mapping[/subheading]
After reading and going through the code in "Learning XNA 4.0" by Aaron Reed, I created a few minor test programs to play with some basic rendering techniques while learning more of the XNA framework. In each of these programs, I used simple models to test against (cubes, spheres, and a curved surface in one case).

The first of the rendering techniques was doing per-pixel lighting manually instead of using BasicEffect's PreferPerPixelLighting property. The development of this program was straightforward and there wasn't anything of note.

After that I moved on to normal mapping. There are plenty of tutorials online for normal mapping, so I'm not going to go into the details of the algorithm. The only hang-up I had in this program was getting the binormal and tangent. There are also plenty of tutorials online on how to compute those, but it turns out that in XNA you don't need to do so manually. I didn't see this mentioned in the various references on normal mapping, so I thought I'd mention the detail here. After you add the model to your content project, expand the "Content Processor" property for that model and set "Generate Tangent Frames" to "True". This way the XNA framework's model processor takes care of computing the binormal and tangent, which will then be available to your shaders. This was a lesson in exploring your tools.

[subheading]Skeletal Animation[/subheading]
I found "Learning XNA 4.0" by Aaron Reed to be a good introduction to XNA. However, I also think the main thing it is lacking is that it does not cover the content pipeline. While it covers a large variety of topics in XNA and provides good detail, the content pipeline is a big part of using XNA beyond the BasicEffect rendering mechanism. Even to use XNA's SkinnedEffect renderer, a custom model processor is needed. Luckily, Microsoft's App Hub has the Skinned Model example that covers creating the relevant model processor. From analyzing it, I think it's a good example to learn from. It shows some additional classes/structs to help with loading, and shows that an additional library is needed to actually allow the serialized data to accessible to the content pipeline and the game at runtime.

For the purposes of playing with skeletal animation, I also wrote my own vertex and pixel shaders for rendering an animated test model that was basically just 3 joined cubes with 3 bones. To help with debugging both the computation of the bone matrices and the shaders, I switched the program between the custom shaders and XNA's SkinnedEffect renderer. In the course of debugging, I was surprised to find that the content pipeline automatically takes care of interpolating the keyframes: the test model only had 2 keyframes in the fbx file, but after loading there were 104.
tricona

Intro

[subheading]About me[/subheading]
I am a software engineer who graduated from UNH back in 2005 with a BS in CS. By day I work in C++, various scripting languages, and with the user and admin levels of Linux, on distributed software that needs to run 24/7. By night (well, evenings and weekends :)), I'm a hobbyist developer working on various projects, not all game related. I predominantly work in C++, but have been learning C# in my spare time for over a year now and like many of its features. It's been about 14 years since I started with C++, and before that I was doing hobby projects in Visual Basic and QBasic. Even apart from programming, I have been using a computer for nearly as far back as I can remember.

During my freshman year of college, my desire to learn how to create 3D games led me to NeHe on this site, and I have been predominantly a lurker on the forums ever since. Back then I posted a few times under the handle Tricona, but never really did much with the account. I was surprised it still existed, and that it was attached to my current email address, when I went to sign up for GDNet+. The main reason I want a GDNet+ account is to help support the site that I have learned a lot from.

After going through the OpenGL lessons on NeHe, I tried to create some games on my own and never completed any of them. Knowing OpenGL helped me in several of my college courses, but knowing a graphics API and knowing how to create a game are two different things. Looking back on it, I bit off more than I could chew with those projects. Instead of starting with several simple projects and building up from there, my second or third project was an RTS/FPS hybrid. In that one project alone I transitioned the graphics rendering from immediate-mode OpenGL to VBOs, and finally tried switching to Ogre. Graphics development aside, the game itself was too complex for my knowledge of game development.

Fast-forwarding from college to the present day: after doing several application projects in C# (each to practice different areas of the language), I am trying out game development in C#. For a few months now I have been playing with XNA 4.0. I started with a few simple programs to familiarize myself with the framework and to play with some simple rendering algorithms (per-pixel lighting, normal mapping, skeletal animation). My current project (described in the last section of this entry) is to continue exploring XNA and to create a complete game.

[subheading]Purpose of this Dev Journal[/subheading]
I am doing a dev journal for two main reasons. The first is to provide a reference for solutions to issues I've encountered, in case anyone else runs into them. An example of the type of issue I'm talking about: when using XNA's Ray struct, the direction must be unit length or the intersection methods don't work correctly, which I talk more about in another entry. I don't plan to do tutorials, as there are already plenty out there, unless I hit something I can't find adequately covered elsewhere.

The second reason is to have somewhere to post progress on my projects, as I feel that publicly posting progress should help with motivation to continue a project (regardless of whether anyone reads it or not :)). Several times, when I've gotten past a hurdle in whatever hobby project I'm working on, I've posted a status update on Facebook about it. From doing that, I found that Facebook has a limit on status updates somewhere in the area of 400-460 characters (I forget exactly where in that range the limit is). Also, most of my friends, on Facebook or otherwise, are not programmers, so they don't understand most of what I'm talking about. By posting here instead, the people who might read this will likely have the background to understand it.

[subheading]Current Project[/subheading]
My current project is a simple tank game. The primary goals of this project are to explore the XNA framework more and to (finally) create a complete game. I've been using HTML to keep a design doc for the game and have been updating it as the programming progresses. Especially at the start of the project, this doc proved very useful for keeping me focused on what the game is to be and for recording the technical details (e.g. modeling requirements, level file format). The gameplay is not going to be anything fancy: the player controls a tank and can destroy enemy turrets and tanks, and the game will have multiple levels. Since XNA allows for developing games for the Xbox 360 and Windows, I have been using a wired 360 controller on my PC for development, and have been coding the game to support split-screen co-op play. I haven't done the menus yet (I've been concentrating on gameplay), so the mechanism to get extra players into the game is currently not implemented. I'm planning to take a break from development tomorrow to write more dev journal entries about some of the issues that have arisen so far, both in this game and in the earlier XNA programs.