About ZachBethel

  • Community Reputation: 921 Good
  • Location: Upland, IN
  1. Please tell me you're not going to be the only engineer on this. That just isn't working out for you, as brilliant as you are. ;)
  2. Is it valid to map one region of a readback resource while the GPU simultaneously writes to a disjoint region of it? I've got a profiler subsystem with a single readback buffer that is N times the size of my query heap, for N frames in flight. The SDK debug layer gives a warning that the subresource is mapped while the GPU is writing to it.
  3. It turns out it's the UAV barrier that's barking at me, and it only seems to happen if I issue a UAV barrier on a command list without first transitioning the resource to the Unordered Access state in that same command list, which seems wrong.
  4. I've got a scenario where I'm building a command list that uses UAVs. The UAV is transitioned to the Unordered Access state in a prior command list, like so:

     Command List A:
       Transition NonPixelShaderResource -> UnorderedAccess

     Command List B:
       UAV barrier
       ClearUnorderedAccessViewUint
       Dispatch
       more UAV barriers

     Direct Queue: (A, B)

     When I queue a UAV barrier on the later command list, I get this error spewing:

       D3D12 ERROR: ID3D12CommandList::ClearUnorderedAccessViewUint: Resource state (0x0) of resource (0x00000242CA4635A0:'Histogram') (subresource: 0) is invalid for use as a unordered access view. Expected State Bits: 0x8, Actual State: 0x0, Missing State: 0x8. [ EXECUTION ERROR #538: INVALID_SUBRESOURCE_STATE]
       D3D12 ERROR: ID3D12GraphicsCommandList::ResourceBarrier: Before state (0x8) of resource (0x00000242CA4635A0:'Histogram') (subresource: 0) specified transition barrier does not match with the state (0x0) specified in the previous call to ResourceBarrier [ RESOURCE_MANIPULATION ERROR #527: RESOURCE_BARRIER_BEFORE_AFTER_MISMATCH]

     Is the debug layer just over-validating, or is there actually an issue here? The errors don't quite make sense: if I remove the UAV barrier call they stop, but my resource is definitely not in the common state (0x0). I get the errors even when I create the resource in the UnorderedAccess state. Besides, how could the debug layer know I haven't transitioned the resource properly before I call ExecuteCommandLists? A prior command list could do the transition.

     Has anyone encountered this issue before?
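For what it's worth, both errors are consistent with a validator that only sees one command list at a time while recording, and assumes state 0x0 for any resource the list hasn't touched. This is a toy model, not the actual debug layer, and all names are hypothetical; it just shows how per-list tracking could produce exactly these two messages:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy model (NOT the real debug layer): each command list tracks only the
// transitions recorded into it, and assumes COMMON (0x0) for any resource
// it has not seen yet. Hypothetical names throughout.
using ResourceId = std::string;
constexpr uint32_t kCommon = 0x0;
constexpr uint32_t kUnorderedAccess = 0x8;           // D3D12 state bit 0x8
constexpr uint32_t kNonPixelShaderResource = 0x40;   // D3D12 state bit 0x40

struct CommandListValidator {
    std::map<ResourceId, uint32_t> known;  // states established in THIS list

    // Returns false ("INVALID_SUBRESOURCE_STATE") when the tracked state
    // lacks the UAV bit; an unseen resource is assumed to be in 0x0.
    bool clearUav(const ResourceId& r) {
        auto it = known.find(r);
        uint32_t s = (it == known.end()) ? kCommon : it->second;
        if (it == known.end()) known[r] = s;  // validator now believes 0x0
        return (s & kUnorderedAccess) != 0;
    }

    // Returns false ("BEFORE_AFTER_MISMATCH") when 'before' disagrees with
    // a state this list has already recorded for the resource.
    bool transition(const ResourceId& r, uint32_t before, uint32_t after) {
        auto it = known.find(r);
        if (it != known.end() && it->second != before) return false;
        known[r] = after;
        return true;
    }
};
```

Validated in isolation, "Command List B" trips both checks even though list A transitions the resource correctly, which would match the behavior described above.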
  5. A thing I'm struggling with right now is how to handle mapping of resources across multiple command lists, i.e.:

       // Thread 1
       ConstData cd;
       cd.data = 1;
       ptr = constBuffer.Map();
       memcpy(ptr, &cd, sizeof(cd));
       constBuffer.Unmap();
       CL1->draw(obj1);
       cd.data = 2;
       ptr = constBuffer.Map();
       memcpy(ptr, &cd, sizeof(cd));
       constBuffer.Unmap();
       CL1->draw(obj2);

       // Thread 2
       CL2->draw(obj3); // uses the const buffer being written by thread 1

       // Submission thread: CL1 -> CL2

     One approach I've seen is to cache changes to the resource in a command-list-local cache, and then update the contents of the buffers when the command lists are serialized on the submission thread.
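A common alternative to the command-list-local cache is per-thread linear sub-allocation: each draw copies its constants into a fresh slice of an upload arena, so earlier draws' data is never overwritten and threads never contend. A minimal sketch, assuming a persistently mapped upload heap stands behind the arena (all names hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Each push() returns a 256-byte-aligned offset holding a private copy of
// the constants; that offset is what you would bind as a root CBV (DX12) or
// dynamic uniform offset (Vulkan). One arena per recording thread.
class UploadArena {
public:
    explicit UploadArena(size_t bytes) : storage_(bytes), head_(0) {}

    size_t push(const void* data, size_t size) {
        size_t offset = align(head_, 256);   // CBV offset alignment
        assert(offset + size <= storage_.size() && "arena exhausted");
        std::memcpy(storage_.data() + offset, data, size);
        head_ = offset + size;
        return offset;
    }

    const std::byte* at(size_t offset) const { return storage_.data() + offset; }

private:
    static size_t align(size_t v, size_t a) { return (v + a - 1) / a * a; }
    std::vector<std::byte> storage_;  // stands in for a mapped upload heap
    size_t head_;
};
```

With this, the two draws in the example above each get their own offset, so CL2 on another thread can use its own arena without ever seeing a half-written buffer.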
  6. Great, that's what I expected. I feel like a lot of documents assume you know that and gloss over it. On that note, when building a task graph, it seems wise to statically bake out your high-level render passes on your queue-submission thread, batch up all the command lists in those passes (e.g. wait until all your Z-prepass lists come in), and then submit in dependency-sorted order (wait to submit the G-buffer lists until the Z-prepass group has been submitted).
  7. When you submit command lists to a command queue, what ordering guarantees / expectations do you have? According to MSDN:

       GPU work submission
       To execute work on the GPU, an app must explicitly submit a command list to a command queue associated with the Direct3D device. A direct command list can be submitted for execution multiple times, but the app is responsible for ensuring that the direct command list has finished executing on the GPU before submitting it again. Bundles have no concurrent-use restrictions and can be executed multiple times in multiple command lists, but bundles cannot be directly submitted to a command queue for execution. Any thread may submit a command list to any command queue at any time, and the runtime will automatically serialize submission of the command list in the command queue while preserving the submission order.

     That last sentence is where I'm confused. Is it the case that if I build N command lists and call ExecuteCommandLists(...) with an array of those N command lists, they are processed in order? That much seems to be true. The fuzzier part for me is how transition barriers and fences play into the submission order.

     Say I have a Z-prepass and a shadow pass, and then some G-buffer pass. Assuming I transition-barrier everything correctly, am I expected to submit my Z-prepass / shadow pass command lists before the G-buffer command lists? That would basically mean I have to schedule my submission thread to wait for all the precursor work to come in from the job system before it can submit. This is what I expect I have to do, but it's pretty unclear to me. It doesn't help that none of the samples online actually do a job-system-based multithreaded demo. :)

     I would love an elaboration on how the driver actually schedules the command list work. Thanks!
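The "wait for precursor work, then submit in order" scheme described above amounts to a topological sort over passes: the submission thread waits until every command list of a pass has arrived, then calls ExecuteCommandLists per pass in dependency order. A toy illustration of just the ordering step (pass names and structure are hypothetical):

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

struct Pass {
    std::string name;
    std::vector<std::string> dependsOn;  // passes that must be submitted first
};

// Kahn-style topological sort; the result is the order in which the
// submission thread would call ExecuteCommandLists for each pass's batch.
std::vector<std::string> submissionOrder(const std::vector<Pass>& passes) {
    std::map<std::string, int> indegree;
    std::map<std::string, std::vector<std::string>> users;
    for (auto& p : passes) indegree[p.name];       // ensure entry exists
    for (auto& p : passes)
        for (auto& d : p.dependsOn) { users[d].push_back(p.name); ++indegree[p.name]; }

    std::vector<std::string> ready, order;
    for (auto& [name, deg] : indegree)
        if (deg == 0) ready.push_back(name);
    while (!ready.empty()) {
        auto n = ready.back(); ready.pop_back();
        order.push_back(n);
        for (auto& u : users[n])
            if (--indegree[u] == 0) ready.push_back(u);
    }
    return order;  // every pass appears after all of its dependencies
}
```

In the Z-prepass / shadow / G-buffer example, Z-prepass and shadows have no dependencies and may be submitted in either order, but G-buffer is always submitted after Z-prepass.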
  8. Hey all, I'm reading up on render passes, which seem like a powerful concept for cluing the driver into the exact path of your render pipeline. In the spec they explain that the driver can react to ordering constraints to insert transition barriers. Something I'm confused about is how render passes relate to command lists and command list submission.

     For one, render passes form a DAG. Do I still have to submit command lists in dependency-sorted order with respect to render passes? I would expect so, but I wasn't able to find any specific details on that.

     Secondly, what's the granularity of a command list relative to a render pass? Can a command list span several render passes (through multiple begin/end blocks)? Can a render pass be composed of several command lists (and if so, does each one inherit the begin/end state from the previously submitted list)?

     If you understand the details of this I would love to get your input. Thanks!
  9. I've been thinking more about this, and I've come to realize some things. I did some investigation into how some real workloads handle the root signature, and the vast majority of what I saw have a structure similar to this:

     For bucketed scene draws:
       0: some push constants
       1: per-draw constant buffer
       2: per-pass constant buffer
       3: per-material constant buffer
       4: a list of SRVs

     For various post-processing jobs:
       0+: constant buffers
       a simple table of UAVs
       a simple table of SRVs

     I didn't find any use cases where different regions of the same descriptor table were used for different things; for the most part a simple list of SRVs / UAVs seems to be enough. I also realized that Vulkan has the strong notion of a render pass, and that UAVs could be factored into render passes as outputs (which are then transitioned to SRVs). To me, it seems like having constant buffer binding slots, a way to bind a list of SRVs to the draw call, and a way to bind a list of UAVs to a render pass is enough to support most scenarios.

     With regards to list allocation, it seems like descriptor layouts are going to be bounded by the application. Like you said, Witek902, you could just create a free-list pool for descriptors and orphan them on update into a recycle queue. Static descriptor sets just get allocated once and held. For DX12, you could model that same technique by allocating fixed-size pieces out of a descriptor heap, or use some sort of buddy allocator. The descriptor heap approach is a bit weirder, because the ideal use case seems to be keeping the same heap bound for the whole frame.

     I also read in GPU Fast Paths that using dynamic constant buffers eats up 4 registers of the USER-DATA memory devoted to the pipeline layout. Apparently using a push constant to offset into a big table is more performant (I'm not sure how portable this is to platforms like mobile). Anyway, just some thoughts.
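The free-list-with-recycle-queue idea above can be sketched against fixed-size slots in a descriptor heap: a retired slot carries the fence value of the last frame that referenced it, and is only returned to the free list once the GPU has passed that value. A minimal sketch (class and method names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Slots index fixed-size regions of a shader-visible descriptor heap (or
// stand in for pooled Vulkan descriptor sets). 'completedFence' is the
// queue's last completed fence value, e.g. ID3D12Fence::GetCompletedValue().
class DescriptorSlotPool {
public:
    explicit DescriptorSlotPool(uint32_t count) {
        for (uint32_t i = 0; i < count; ++i) free_.push_back(i);
    }

    uint32_t allocate(uint64_t completedFence) {
        // Drain retirements the GPU has finished with before allocating.
        while (!retired_.empty() && retired_.front().fence <= completedFence) {
            free_.push_back(retired_.front().slot);
            retired_.pop_front();
        }
        assert(!free_.empty() && "descriptor heap exhausted");
        uint32_t s = free_.back();
        free_.pop_back();
        return s;
    }

    // Orphan a slot on update; it becomes reusable once the fence passes.
    void retire(uint32_t slot, uint64_t fenceWhenUnused) {
        retired_.push_back({slot, fenceWhenUnused});
    }

private:
    struct Retired { uint32_t slot; uint64_t fence; };
    std::vector<uint32_t> free_;
    std::deque<Retired> retired_;   // FIFO: fences are monotonic
};
```

Static descriptor sets would simply allocate once and never call retire; versioned sets retire their old slot each time they are rewritten.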
  10. I've been reading up on how the resource binding methods work in Vulkan and DX12. I'm trying to figure out how to best design an API that abstracts the two with respect to binding descriptors to the pipeline. Naturally, the two APIs are similar, but I'm finding that they treat descriptor binding differently in subtle ways.

     Disclaimer: skip to the bottom if you have a deep understanding of this already and just care about my specific question.

     Explanation: in DirectX, you define a "root signature". It can have push constants, inlined descriptor binding points, or descriptor table binding points. It also defines static samplers on the signature itself. A descriptor table is a contiguous block of descriptors within a descriptor heap; binding a table means specifying the table's first descriptor in the heap to the pipeline. Tables can hold either UAV/SRV/CBV descriptors or SAMPLER descriptors; you cannot mix the two within a single heap, and therefore within a single table. Descriptor tables are also organized into ranges, where each range defines one or more descriptors of a SINGLE type.

     (images: root signature example; descriptor heap indirection)

     In Vulkan, you define a "pipeline layout". It can have push constants and "descriptor set" binding points; you cannot define inlined descriptor binding points. Each descriptor set defines a set of static samplers. A descriptor set is a first-class object in Vulkan. It also has one or more ranges of a SINGLE descriptor type.

     (image: descriptor sets)

     Now, an interesting pattern I'm seeing is that the two APIs provide descriptor versioning functionality for completely different things. In DirectX, you can version descriptors implicitly within the command list using the root descriptor bindings; this allows you to do things like specify a custom offset for a constant buffer view. In Vulkan, they provide an explicit UNIFORM_DYNAMIC descriptor type that allows you to version an offset into the command list.

     (image: dynamic uniform descriptors)

     Question: I'm really just looking for advice on how to organize binding points for an API that wraps these two models.

     My current tentative approach is to provide an API for creating buffers and images, and then explicit UAV/SRV/CBV/RTV/DSV views into those objects. The resulting view is an opaque, typeless handle on the frontend that can map to descriptors in DirectX 12 or to some staging resource in Vulkan for building descriptor sets. I think I want to provide an explicit "ResourceSet" object that defines 1..N ranges of views, similar to how both the descriptor set and descriptor table models work. I expect I would make sampler binding a separate API that does its own thing for the two backends. I would really like to treat these ResourceSet objects similar to constant buffers, except that I'm just writing view handles into them.

     I need to figure out how to handle versioning of updates to these descriptor sets. In the simplest case, I treat them as fully static. This maps well to both DX12 and Vulkan, because I can simply allocate space in a descriptor heap or create a descriptor set, write the descriptors to it, and I'm done. Handling dynamic updates becomes complicated for both APIs, and this is the crux of where I'm struggling right now. Both APIs let me push constants, so that's not really a problem. However, DirectX allows you to version descriptors directly in the command list, while Vulkan gives you dynamic offsets into buffers, which seems to be chiefly for CBVs. So if I want a descriptor set with 3 CBVs and dynamic offsets, in DirectX I have to explicitly version the entire table by allocating new space in the heap and spilling descriptors to it. On the other hand, since Vulkan doesn't really have the notion of root descriptors, I'd have to create multiple descriptor set objects and version those out if I want to bind a single dynamic UAV.
     Either way, it seems like the preferred model is to build static descriptor sets but provide some fast path for constant buffers, and that's the direction I think I'm going to head in. Anyway, does this sound like a sane approach? Have you found better ways to abstract these two binding models?

     Side question: how do you version descriptor sets in Vulkan? Do you just have to pool descriptor sets for the frame and spill when updates occur? Thanks!
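The "ResourceSet" idea above, reduced to a sketch: opaque view handles grouped into single-type ranges, which a backend could translate into either a descriptor table (DX12) or a descriptor set (Vulkan). These frontend types are entirely hypothetical, just one way the shape described in the post could look:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class ViewType { CBV, SRV, UAV };   // sampler binding kept separate

// Opaque, typeless on the frontend; the backend resolves it to a descriptor
// (DX12) or to staging data for vkUpdateDescriptorSets (Vulkan).
struct ViewHandle { uint32_t index; };

// One contiguous run of descriptors of a SINGLE type, mirroring both a
// descriptor table range and a descriptor set binding.
struct ResourceRange {
    ViewType type;
    std::vector<ViewHandle> views;
};

struct ResourceSet {
    std::vector<ResourceRange> ranges;   // 1..N ranges, single type each

    // Total descriptors to reserve in a heap (DX12) or set layout (Vulkan).
    uint32_t descriptorCount() const {
        uint32_t n = 0;
        for (auto& r : ranges) n += static_cast<uint32_t>(r.views.size());
        return n;
    }
};
```

A static set is written once at this size; a dynamic update would allocate a fresh block of that size and spill the handles again, matching the versioning discussion above.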
  11. It looks like you're allowed to create a resource with an unknown dimension, as opposed to an explicit buffer, texture{1d, 2d, 3d}.   What does that mean, practically? Is it possible to create a raw buffer and then create views that map it as 2D, for instance? The MSDN docs don't really explain what this means at the driver or even SDK level.
  12. I believe NVIDIA explicitly calls out that you should avoid batch-submitting your entire frame at the very end. I believe the idea is to keep the hardware busy with roughly 5 vkQueueSubmit / ExecuteCommandLists calls per frame. That said, I guess there are several ways you can pipeline your frame.
  13. Vulkan and DX12 have very similar APIs, but they differ fundamentally in the design of their synchronization primitives. Both Vulkan and DirectX 12 have resource barriers, so I'm going to ignore those.

     DirectX 12 uses fences with explicit values that are expected to monotonically increase. In the simplest case, you have the swap chain present barrier. I can see two ways to implement fencing in this case:

     1) You create N fences. At the end of frame N you signal fence N and then wait on fence (N + 1) % SwapBufferCount.
     2) You create 1 fence. At the end of each frame you increment the fence and save off the value. You then wait for the fence to reach the value for frame (N + 1) % SwapBufferCount.

     In general, the "timestamp" approach to fencing seems powerful. For instance, I can have a page allocator that retires pages with a fence value and then waits for the fence to reach that point before recycling the page. It seems like creating one fence per command list submission would be expensive (maybe not? how lightweight are fences?).

     Now compare this with Vulkan. Vulkan has the notion of fences, semaphores, and events, which are explained in detail here. All these primitives are binary: each is signaled once and stays signaled until you reset it. I'm less familiar with how to use these kinds of primitives, because you can't take the timestamp approach like you can with DX12 fences. For instance, to build the page allocator in Vulkan, the fence is the correct primitive to use, because it involves synchronizing the state of the hardware queue with the host (i.e. knowing when a retired page can be recycled). To do this, I now have to create one fence for each vkQueueSubmit call, and the page allocator receives a fence handle instead of a timestamp.

     It seems to me that the DirectX-style fence is more flexible, and I would imagine that internally the Vulkan fence uses the same underlying primitive as the DirectX fence to track signaling. In short, DirectX's timestamp-based fencing lets you use fewer fence objects overall.

     My primary concern is a common backend between Vulkan and DX12. The wiser course seems to be supporting the Vulkan-style binary fences, because they can be implemented with DX12 fences. My worry is whether I will lose performance by creating one fence per ExecuteCommandLists call versus one overall in DirectX. For those who understand the underlying hardware and APIs more deeply than I do, I would appreciate some insight into these design decisions. Thanks!
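The "binary fences can be implemented with DX12 fences" point can be sketched directly: represent each binary fence as a (monotonic fence, target value) pair, so a single DX12-style fence object can back any number of them. A minimal sketch with hypothetical names, where TimelineFence stands in for ID3D12Fence:

```cpp
#include <cassert>
#include <cstdint>

// Stands in for ID3D12Fence: a value that only ever moves forward.
struct TimelineFence {
    uint64_t completed = 0;
    void signal(uint64_t v) { if (v > completed) completed = v; }
};

// Vulkan-style binary fence emulated on top of the timeline: "signaled"
// means the shared timeline has reached this fence's recorded target.
struct BinaryFence {
    TimelineFence* timeline = nullptr;
    uint64_t target = 0;

    // Called at submit time (the vkQueueSubmit analogue): remember the
    // value this submission will signal on the shared timeline.
    void armAt(uint64_t submitValue) { target = submitValue; }
    bool isSignaled() const { return timeline->completed >= target; }
    void reset() { target = UINT64_MAX; }  // unsignaled until re-armed
};
```

With this layering, the page allocator can keep consuming raw timestamp values internally while the public API hands out binary fences, so the one-fence-per-submit cost reduces to a small struct rather than a kernel object.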
  14. That makes sense, and it's what I figured would happen. Thanks!
  15. It seems like it's valid usage; the docs don't seem to mention it either way. However, isn't the point of a DEFAULT resource that it's not visible to the host? You would have to use an upload heap to stage it.