MJP

Moderators
  • Content count

    8459
Community Reputation

19731 Excellent

About MJP

Personal Information

  • Location
    Irvine, CA
  1. I've definitely run into a few Nvidia DX12 driver bugs (especially when DX12 was new), but I haven't personally seen anything with compute shaders. The driver and/or shader JIT is probably just trying to do something clever, and ends up doing something bad.
  2. Two other approaches that you can take:
       1. Create the CBV descriptor on-the-fly with the desired offset. Creating descriptors is fast, since there's no allocation or heavy resource creation.
       2. Use a "root" CBV, which lets you pass an arbitrary GPU virtual address.
     I'd recommend #2, since it's easy and cheap. Nvidia also prefers it for various reasons.
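A minimal sketch of the address arithmetic behind the root CBV approach, assuming per-draw constants are sub-allocated from one large buffer. D3D12 expects constant buffer data to start on 256-byte boundaries; `baseAddress` stands in for the value returned by `ID3D12Resource::GetGPUVirtualAddress`, and `ConstantRing`, `AlignUp` and `Allocate` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// D3D12 places constant buffer data on 256-byte boundaries
// (D3D12_CONSTANT_BUFFER_DATA_PLACEMENT_ALIGNMENT).
constexpr std::uint64_t kCbvAlignment = 256;

// Round 'value' up to the next multiple of 'alignment' (must be a power of two).
constexpr std::uint64_t AlignUp(std::uint64_t value, std::uint64_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical linear allocator over one large constant buffer.
struct ConstantRing {
    std::uint64_t baseAddress; // stand-in for GetGPUVirtualAddress()
    std::uint64_t capacity;    // size of the buffer in bytes
    std::uint64_t cursor;      // next free byte offset, start at 0

    // Reserve 'size' bytes and return the GPU address to bind as a root CBV.
    std::uint64_t Allocate(std::uint64_t size) {
        const std::uint64_t offset = AlignUp(cursor, kCbvAlignment);
        assert(offset + size <= capacity && "constant buffer exhausted");
        cursor = offset + size;
        return baseAddress + offset;
    }
};
```

The returned address is the kind of value you would hand to `ID3D12GraphicsCommandList::SetGraphicsRootConstantBufferView`, so no descriptor has to be created for each offset.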
  3. That's correct, the set of UAV's bound for the CS stage is distinct from the set that's bound for rendering stages. You need to clear the rendering UAV bindings before you can bind the resource as an SRV.
  4. DX11

    That error message is correct: the dimensions and MSAA counts need to match between your render targets and your depth-stencil buffers. You can't use a larger depth-stencil buffer with a smaller render target. Depth buffers tend to be complicated in modern GPU's: they have all kinds of compression features, and they will often use tiled memory layouts to improve bandwidth. Allowing a larger depth buffer to work with a smaller render target would probably get in the way of these things. In D3D12 you can alias two different depth buffer resources onto the same heap range to avoid wasting memory, but in D3D11 you don't have control over resource memory placement.
  5. Your intuition is correct: with true HDR sky intensities it's possible that the lower mip levels could have a very different result. It depends on the relative intensities between the clear pixels and the cloud pixels, which can be very different in a tone-mapped LDR image vs. an HDR image that uses values that are directly proportional to physical intensities.
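To make the difference concrete, here is a small sketch that compares a mip texel built from HDR values with one built from already tone-mapped values. The Reinhard curve and the function names are assumptions chosen for illustration; the original discussion does not say which tone-mapping operator is in use.

```cpp
#include <vector>

// Simple Reinhard operator, used here only as a stand-in tone-mapping curve.
inline float ToneMap(float hdr) { return hdr / (1.0f + hdr); }

// Box-filter average, i.e. roughly what one texel of a lower mip level holds.
inline float Average(const std::vector<float>& texels) {
    float sum = 0.0f;
    for (float t : texels) sum += t;
    return texels.empty() ? 0.0f : sum / static_cast<float>(texels.size());
}

// Mip generated from HDR data, tone mapped afterwards.
inline float DownsampleThenToneMap(const std::vector<float>& hdr) {
    return ToneMap(Average(hdr));
}

// Mip generated from an image that was already tone mapped to LDR.
inline float ToneMapThenDownsample(const std::vector<float>& hdr) {
    std::vector<float> ldr;
    for (float t : hdr) ldr.push_back(ToneMap(t));
    return Average(ldr);
}
```

With three dim texels of intensity 1 and one bright texel of intensity 61, the HDR path averages to 16 before tone mapping, while the LDR path averages values that are all below 1, so the two results end up far apart. With uniform input both paths agree.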
  6. Nice! Glad to hear that you got it working.  :)
  7. It's still mostly relevant, although with modern API's you have more flexibility with regards to how you can read data in a shader. For instance, you now have structured buffers which are generally more convenient to work with than textures or constant buffers. Typically your vertex shader will only be concerned with reading the final skeleton pose data, and won't be performing any of the actual animation or blending work that you're referring to. You would probably want to do this ahead of time on the CPU (or possibly in a compute shader if you want to get fancy), where you write out the final bone transforms into buffers for each unique animation state. Then your instances would read from these buffers, with multiple instances possibly sharing the same animation state. That whitepaper achieves this sharing of animation states by putting an indirection index into the per-instance data, which is then used as an offset into a global combined bone texture/buffer. On modern API's you could also achieve this with bindless techniques, where instead of having an offset into a buffer you could have an index into an array of descriptors.
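The indirection described above can be sketched on the CPU side like this. It is only an illustration under assumed names (`Float4x4`, `InstanceData`, `AppendPose`, `FetchBone` are all hypothetical): each unique animation state writes its final bone transforms into one combined buffer, and each instance stores nothing but the offset of the state it uses.

```cpp
#include <cstdint>
#include <vector>

struct Float4x4 { float m[16]; };

// Per-instance data: where this instance's pose starts in the shared buffer.
struct InstanceData {
    std::uint32_t boneOffset; // index of the first bone of the animation state
};

// Append one animation state's final bone transforms to the combined buffer
// and return the offset that instances using that state should store.
std::uint32_t AppendPose(std::vector<Float4x4>& combined,
                         const std::vector<Float4x4>& pose) {
    const std::uint32_t offset = static_cast<std::uint32_t>(combined.size());
    combined.insert(combined.end(), pose.begin(), pose.end());
    return offset;
}

// What a vertex shader would do: fetch a bone through the indirection.
const Float4x4& FetchBone(const std::vector<Float4x4>& combined,
                          const InstanceData& instance,
                          std::uint32_t boneIndex) {
    return combined[instance.boneOffset + boneIndex];
}
```

Two instances that store the same `boneOffset` read exactly the same transforms, which is the sharing of animation states that the post describes.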
  8. Ultimately everything in Windows will end up in a D3D texture, and will get composited with the rest of the desktop by the Desktop Window Manager (DWM) using the GPU. By going through GDI+ (which is what WinForms uses under the hood) you'll be going through some layers of abstraction first, but at the end of the day your framebuffer is going to end up in a texture. By skipping GDI you can probably get better performance, but I couldn't tell you how much faster it would run or whether it would be meaningful for you. So if you want to learn a bit about MonoGame, D3D, or OpenGL, then go ahead and see if you can get your program working by more directly writing to a texture. But otherwise, I probably wouldn't worry about it too much unless you've determined (or strongly suspect) that overhead from GDI is really limiting your performance. Anyhow, I have no experience with MonoGame but I did a bunch of XNA work back in the day. Their Texture2D class has a SetData method that you can use to fill a texture with data from an array of integers, which should be easy for you to use. To get that on the screen, probably the easiest way would be to create a SpriteBatch and pass your Texture2D to SpriteBatch.Draw. You could also of course manually draw a triangle to the screen that reads your texture, either by using a built-in shader program or by writing your own.
  9. That page you linked is only talking about loads, and not stores. Stores have always been supported for most formats, but loads have optional support. You can read from one mip level of a texture as an SRV while simultaneously writing to another mip level of the same texture using a UAV. You just have to make sure that your SRV and UAV are created so that they only "target" a single mip level of the texture, otherwise you will get errors from the validation layer. This means you will need N-1 SRV's and N-1 UAV's for a texture that has N mip levels to generate a mip chain 1 mip level at a time. You can control the mip level available to an SRV by setting the "MostDetailedMip" and "MipLevels" members of the D3D11_TEX2D_SRV structure, which is part of D3D11_SHADER_RESOURCE_VIEW_DESC. Unordered access view descriptions have the "MipSlice" member which lets you achieve the same thing.
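As a rough sketch of the bookkeeping, the function below lists the N-1 view pairs needed for a texture with N mip levels. `SrvRange` and `UavSlice` are plain stand-ins that mirror only the relevant fields of D3D11_TEX2D_SRV and D3D11_TEX2D_UAV, and `MipPass` and `BuildMipPasses` are hypothetical helper names; real code would fill in the full view descriptions and create the views on the device.

```cpp
#include <cstdint>
#include <vector>

// Stand-ins for the relevant fields of D3D11_TEX2D_SRV and D3D11_TEX2D_UAV.
struct SrvRange { std::uint32_t MostDetailedMip; std::uint32_t MipLevels; };
struct UavSlice { std::uint32_t MipSlice; };

// One downsample pass: read a single mip level, write the next one.
struct MipPass { SrvRange source; UavSlice destination; };

// Build the N-1 passes needed to generate a chain of 'mipCount' levels.
std::vector<MipPass> BuildMipPasses(std::uint32_t mipCount) {
    std::vector<MipPass> passes;
    for (std::uint32_t mip = 0; mip + 1 < mipCount; ++mip) {
        // The SRV exposes exactly one level; the UAV targets the level below it.
        const MipPass pass{ SrvRange{ mip, 1u }, UavSlice{ mip + 1u } };
        passes.push_back(pass);
    }
    return passes;
}
```

Each pass reads and writes different mip levels, so the SRV and UAV of a pass never overlap.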
  10. Support for UAV typed stores to R11G11B10_FLOAT is required for FEATURE_LEVEL_11_0, as indicated by this chart. What's optional is support for UAV typed loads. So in other words if you only need to write to a R11G11B10 texture, then you're fine. You'll only need the extended UAV typed load support if you want to read from a UAV with that format.
  11. FYI you're not going crazy: this warning is new with the Windows 10 Creators Update. It's always been required to have the "unorm/snorm" thing according to the spec, but almost nobody knew about it because it wasn't well-documented and there was no validation error for it (it has happened to work correctly on most hardware).
  12. I would recommend reading Bump Mapping Unparametrized Surfaces on the GPU.
  13. Explicitly passing gradients can be slower in some situations, but it depends on the hardware as well as the specifics of the shader itself. Sending the gradients requires sending more data from the shader unit to the texture unit, and in some cases it can cause texture sampling to run at a lower rate.
      There are 3 ways (that I know of) that you can handle sampling your textures in your deferred pass:
        1. Pack all scene textures into atlases or arrays
        2. Use virtual texturing techniques, which effectively let you achieve atlasing through hardware or software indirections
        3. Use bindless resources
      When I experimented with this I went with #3 using D3D12, and it worked just fine. The one thing you need to watch out for with bindless techniques is divergent sampling: if different threads within a warp/wavefront sample from different texture descriptors, the performance will go down by a factor proportional to the number of different textures. Generally your materials tend to be coherent in screen-space so it's not too much of an issue, but if you have areas with many different materials only covering a small number of pixels then your performance may suffer.
      I also disagree with ATEFred that it won't help you at all. One of the primary advantages of deferred techniques is that it can decouple your heavy shading from your geometric complexity, which helps you to avoid the reduced efficiency from sub-pixel triangles. Deferred texturing in particular aims to go even further than traditional deferred techniques by only writing a slim G-Buffer and moving all texture sampling to a deferred pass, which makes it even more appealing for your situation. Obviously your G-Buffer pass is still going to be slower than it would be with a lower geometric complexity that has a more ideal triangle-to-pixel ratio, but in general the more you can decouple your shading from rasterization the less you'll be impacted by poor quad utilization.
That said, you should always be thorough about gathering performance numbers using apples-to-apples comparison as much as you can, so that you can make your choices based on concrete data. Some things can vary quite a bit depending on the scene, material complexity, the GPU, drivers, etc.
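One way to get a feel for the divergent sampling cost mentioned above is to count how many distinct descriptors the threads of a single warp/wavefront would touch. This is only a CPU-side model with a hypothetical function name, not how any GPU or driver reports the cost; it could be run over a material ID buffer split into wave-sized groups of pixels.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Number of distinct texture descriptors referenced by the threads of one
// wave. A result of 1 means fully coherent sampling; larger values suggest
// proportionally more work for that wave.
std::size_t UniqueDescriptorsInWave(
        const std::vector<std::uint32_t>& descriptorIndexPerThread) {
    const std::set<std::uint32_t> unique(descriptorIndexPerThread.begin(),
                                         descriptorIndexPerThread.end());
    return unique.size();
}
```

Averaging this count over a captured frame gives an apples-to-apples number to compare between scenes, before and after changes to material layout.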
  14. "Back-buffer" specifically refers to a render target that belongs to a "swap chain" of buffers that are used to present to a window or screen, where that chain typically consists of two separate buffers that you swap every time you present. The terminology can vary a bit between API's and engines, and these terms are used by D3D/DXGI API's. In some other places you might see it called a "flip" operation instead of a "present", since you're "flipping" between front and back buffers. You may also see the back buffer referred to as a "frame buffer", which is an older term that dates back to days when graphics hardware had a specific region of memory that was dedicated to scanning out images to the display.
      So in general you have textures, which are typically read-only. But you can also have textures that the GPU can write to, and then possibly read from later. In D3D these are called "render targets", since they can be the "target" of your rendering operations. With that terminology your back-buffer is really just a special render target, and the latest versions of the API absolutely work that way. In older D3D versions it was common to refer to non-backbuffer render targets as "off-screen targets", or "off-screen textures", since back then it was rather unusual to have a render target that wasn't tied to a swap chain. So you might say "I'm going to render this shadow to an off-screen texture, then I'll read from it when I'm actually rendering to the screen using the back-buffer". In modern times it's possible for an engine to go through dozens of different render targets in a single frame, and typically you'll do the majority of your rendering to "off-screen" targets instead of the back-buffer.
A typical setup might go like this:
    • Render Z Only to depth buffer
    • Render to N G-Buffer Targets
    • Render SSAO to RT
    • Render Shadow Maps
    • Read G-Buffer/Depth/SSAO/Shadows and Render Lighting to RT
    • Read lighting RT and perform Bloom + Motion Blur + DOF Passes
    • Read Lighting/Bloom/MB/DOF results and combine, perform tone mapping, write to back-buffer
    • Render UI
    • Present
In this context it's really the back-buffer that's the "unusual" target, and the terminology and newer API's tend to reflect that. As for MSAA...back in the days before we used dozens of render targets and games typically just rendered directly to the back-buffer/framebuffer, the way you would use MSAA would be to specify that you wanted MSAA when you created the swap chain, and then your back-buffer would just magically have MSAA. You'd draw to it, present, and it would Just Work without you really needing to do anything as a programmer. Behind the scenes the GPU had to do some work with your MSAA target, since it's not natively presentable on its own. It has to be resolved, which is an operation that combines the individual sub-samples to create a non-MSAA image that can be shown on the screen. Up until D3D12 you could still do things this way: you could specify an MSAA mode in the swap chain parameters, render to an MSAA swap chain, and the driver would resolve for you when you call Present. But it doesn't make sense to do this if you have a setup like the one I outlined above, since you've already done a bunch of post-processing operations and UI rendering by the time you reach the back-buffer, and you probably only want MSAA for your "real" geometry passes. So instead you'll create your own MSAA render target, and manually resolve to a non-MSAA render target. For old-school forward rendering this can be pretty simple: there's a dedicated API for resolving, and it will let the driver/GPU do it for you.
However starting with D3D10 you can also do your own resolves in a shader, which is required for deferred rendering and/or for achieving higher-quality results with HDR. Even if you're just doing a normal resolve there's no downside to creating your own MSAA target instead of creating the swap chain with MSAA, since the same thing would happen behind the scenes if you created an MSAA swap chain. So I would recommend doing that, since it will set you up for doing more advanced rendering with post-processing or other techniques that require additional render targets. Getting back to the issue of forcing MSAA through the driver control panel: in the older days when people just rendered to the back-buffer, it was really easy for the driver to force MSAA to be enabled without the app knowing about it. It would just silently ignore the MSAA parameter when creating the swap chain, and replace it with something else. Back then everyone just did simple forward rendering, so everything would once again "just work". These days it's not nearly so simple. Even forward-rendered games often go through many render targets, and so the driver would have to carefully choose which render targets get silently promoted to MSAA. On top of that, it would have to figure out a point in the frame where it could sneak in a resolve operation before anybody reads from those render targets, since the results would be broken if it didn't do this. In a deferred setup like the one I outlined above there's really no way for the driver to do it, since handling MSAA requires invasive changes to the shader code (this is often true in modern forward-rendering setups, which often make use of semi-deferred techniques that require special handling for MSAA). This is what Hodgman is referring to when he says that it's a hack that can break games, and why nobody should ever turn it on anymore (I don't even know why they still have it in the control panel).
As a developer, the only sane thing you can do is ignore that feature entirely, and hope that nobody turns it on when they run your game or app.
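For reference, the resolve step discussed in this post boils down to combining the sub-samples of each pixel into a single value. The sketch below assumes a plain box filter (an unweighted average) over a single-channel image stored with all sub-samples of a pixel laid out consecutively; the filter a real driver uses, and the function name `ResolveBox`, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Box-filter resolve: average the sub-samples of each pixel into one value.
// 'samples' stores 'sampleCount' consecutive values per pixel.
std::vector<float> ResolveBox(const std::vector<float>& samples,
                              std::size_t sampleCount) {
    std::vector<float> resolved;
    if (sampleCount == 0) return resolved;
    resolved.reserve(samples.size() / sampleCount);
    for (std::size_t first = 0; first + sampleCount <= samples.size();
         first += sampleCount) {
        float sum = 0.0f;
        for (std::size_t s = 0; s < sampleCount; ++s) sum += samples[first + s];
        resolved.push_back(sum / static_cast<float>(sampleCount));
    }
    return resolved;
}
```

A custom shader resolve follows the same shape but can weight or tone map the sub-samples before averaging, which is why it tends to be preferred for HDR targets.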
  15. As you've pointed out, on any recent hardware/API there's nothing stopping you from doing fully programmable vertex fetch. There are 3 things you should keep in mind though:
        • The performance between programmable and fixed-function vertex fetch may not be the same. There's still GPU's out there that have dedicated hardware for vertex fetch, and using it could possibly be the fastest path depending on what you're doing. On the other hand, some hardware (for instance anything made by AMD in the last 7 years) has no dedicated vertex fetch, and will generate shader code that implements your input layout. But even then there can be differences depending on what types of resources you fetch your data from (structured buffer vs. formatted buffer vs. textures), and your data layout (AoS vs. SoA). For an example, here's what happened when someone benchmarked a bunch of different ways to fetch vertex data on their GTX 970.
        • GPU's will typically tie their post-VS cache to indices from an index buffer, so you'll still need to use a dedicated index buffer to benefit from it. You may want to look through this thread for some ideas on how to do interesting things within the limitations of standard index buffers.
        • Input layouts let you have some decoupling between your vertex buffer layout and your actual vertex shader, which can be convenient in some cases. However it's possible that different input layouts will cause the driver to generate different permutations of your VS (or different VS preludes) behind the scenes.
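To illustrate the AoS vs. SoA distinction for programmable vertex fetch, here is a CPU-side sketch of the two layouts. The `vertexId` parameter plays the role of the vertex index a shader would receive, the vectors stand in for structured buffers, and all type and function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Float3 { float x, y, z; };
struct Float2 { float u, v; };
struct Vertex { Float3 position; Float2 uv; };

// AoS: one buffer of whole vertices, a single fetch returns every attribute.
Vertex FetchAoS(const std::vector<Vertex>& vertices, std::size_t vertexId) {
    return vertices[vertexId];
}

// SoA: one buffer per attribute, each attribute is fetched separately.
struct VertexStreams {
    std::vector<Float3> positions;
    std::vector<Float2> uvs;
};

Vertex FetchSoA(const VertexStreams& streams, std::size_t vertexId) {
    return Vertex{ streams.positions[vertexId], streams.uvs[vertexId] };
}
```

Both functions return the same vertex; what differs is the memory access pattern, which is the part worth benchmarking on your target hardware.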