MJP

About MJP

  • Rank
    XNA/DirectX Moderator & MVP

Personal Information

  • Location
    Irvine, CA
  1. Your intuition is correct: with true HDR sky intensities it's possible that the lower mip levels could have a very different result. It depends on the relative intensities between the clear pixels and the cloud pixels, which can be very different in a tone-mapped LDR image vs. an HDR image that uses values that are directly proportional to physical intensities.
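To make that concrete, here's a tiny sketch with made-up intensity values, using a simple Reinhard-style curve standing in for whatever tone mapping is actually applied. It just shows how averaging after tone mapping diverges from tone mapping the average of the HDR values:

```cpp
#include <cstdio>

// Hypothetical example: a bright "cloud" pixel next to a dim "sky" pixel.
// Averaging after a Reinhard-style tonemap (LDR) vs. averaging the raw HDR
// intensities first gives noticeably different results.
static float Reinhard(float x) { return x / (1.0f + x); }

int main()
{
    const float cloud = 50.0f;   // bright HDR intensity (made-up value)
    const float sky = 0.5f;      // dim HDR intensity (made-up value)

    const float avgOfLDR = (Reinhard(cloud) + Reinhard(sky)) * 0.5f; // ~0.66
    const float ldrOfAvg = Reinhard((cloud + sky) * 0.5f);           // ~0.96

    std::printf("average of LDR values: %f, LDR of HDR average: %f\n",
                avgOfLDR, ldrOfAvg);
    return 0;
}
```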
  2. Nice! Glad to hear that you got it working.  :)
  3. It's still mostly relevant, although with modern API's you have more flexibility with regards to how you can read data in a shader. For instance, you now have structured buffers, which are generally more convenient to work with than textures or constant buffers. Typically your vertex shader will only be concerned with reading the final skeleton pose data, and won't be performing any of the actual animation or blending work that you're referring to. You would probably want to do this ahead of time on the CPU (or possibly in a compute shader if you want to get fancy), where you write out the final bone transforms into buffers for each unique animation state. Then your instances would read from these buffers, with multiple instances possibly sharing the same animation state. That whitepaper achieves this sharing of animation states by putting an indirection index into the per-instance data, which is then used as an offset into a global combined bone texture/buffer. On modern API's you could also achieve this with bindless techniques, where instead of having an offset into a buffer you could have an index into an array of descriptors.
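As a rough sketch of the CPU side of that approach (all of the names below are hypothetical, not from the whitepaper), you might build one combined bone array per frame and give each instance an offset into it:

```cpp
#include <DirectXMath.h>
#include <vector>
#include <cstdint>

// Hypothetical layout: one big array of bone matrices holding every unique
// animation state for this frame, uploaded to a structured buffer. Each
// instance stores just an offset into that array, so many instances can
// share the same skeleton pose.
struct BonePalette
{
    std::vector<DirectX::XMFLOAT4X4> boneTransforms; // final skinning matrices
};

struct InstanceData
{
    DirectX::XMFLOAT4X4 world;
    uint32_t boneOffset;  // index of this instance's first bone in the global buffer
    uint32_t padding[3];  // keep 16-byte alignment for GPU consumption
};

// Builds the combined bone array plus per-instance data. "instanceStateIndex"
// maps each instance to one of the unique animation states (assumed inputs).
void BuildFrameData(const std::vector<BonePalette>& uniqueStates,
                    const std::vector<uint32_t>& instanceStateIndex,
                    std::vector<DirectX::XMFLOAT4X4>& combinedBones,
                    std::vector<InstanceData>& instances)
{
    std::vector<uint32_t> stateOffsets(uniqueStates.size());
    combinedBones.clear();
    for(size_t i = 0; i < uniqueStates.size(); ++i)
    {
        stateOffsets[i] = static_cast<uint32_t>(combinedBones.size());
        combinedBones.insert(combinedBones.end(),
                             uniqueStates[i].boneTransforms.begin(),
                             uniqueStates[i].boneTransforms.end());
    }

    instances.resize(instanceStateIndex.size());
    for(size_t i = 0; i < instances.size(); ++i)
    {
        // The world transform would come from the scene; identity as a placeholder
        DirectX::XMStoreFloat4x4(&instances[i].world, DirectX::XMMatrixIdentity());
        instances[i].boneOffset = stateOffsets[instanceStateIndex[i]];
    }
}
```

The vertex shader then only has to add its per-vertex bone indices to the instance's boneOffset when it reads the structured buffer.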
  4. Ultimately everything in Windows will end up in a D3D texture, and will get composited with the rest of the desktop by the Desktop Window Manager (DWM) using the GPU. By going through GDI+ (which is what WinForms uses under the hood) you'll be going through some layers of abstraction first, but at the end of the day your framebuffer is going to end up in a texture. By skipping GDI you can probably get better performance, but I couldn't tell you how much faster it would run or whether it would be meaningful for you. So if you want to learn a bit about MonoGame, D3D, or OpenGL, then go ahead and see if you can get your program working by more directly writing to a texture. But otherwise, I probably wouldn't worry about it too much unless you've determined (or strongly suspect) that overhead from GDI is really limiting your performance. Anyhow, I have no experience with MonoGame but I did a bunch of XNA work back in the day. Their Texture2D class has a SetData method that you can use to fill a texture with data from an array of integers, which should be easy for you to use. To get that on the screen, probably the easiest way would be to create a SpriteBatch and pass your Texture2D to SpriteBatch.Draw. You could also of course manually draw a triangle to the screen that reads your texture, either by using a built-in shader program or by writing your own.
  5. That page you linked is only talking about loads, and not stores. Stores have always been supported for most formats, but loads have optional support. You can read from one mip level of a texture as an SRV while simultaneously writing to another mip level of the same texture using a UAV. You just have to make sure that your SRV and UAV are created so that they only "target" a single mip level of the texture, otherwise you will get errors from the validation layer. This means you will need N-1 SRV's and N-1 UAV's for a texture that has N mip levels, if you generate the mip chain 1 mip level at a time. You can control the mip levels available to an SRV by setting the "MostDetailedMip" and "MipLevels" members of the D3D11_TEX2D_SRV structure, which is part of D3D11_SHADER_RESOURCE_VIEW_DESC. Unordered access view descriptions have the "MipSlice" member, which lets you achieve the same thing.
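A minimal sketch of what those view descriptions might look like in D3D11, assuming a compute-shader downsample that reads mip N and writes mip N+1 (the device and texture are assumed to exist, and error handling is omitted):

```cpp
#include <d3d11.h>

// Create an SRV that sees only mip "srcMip" and a UAV that targets
// mip "srcMip + 1", so a compute shader can read one level while writing the
// next without triggering validation errors.
void CreateMipViews(ID3D11Device* device, ID3D11Texture2D* texture,
                    DXGI_FORMAT format, UINT srcMip,
                    ID3D11ShaderResourceView** srv,
                    ID3D11UnorderedAccessView** uav)
{
    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = format;
    srvDesc.ViewDimension = D3D11_SRV_DIMENSION_TEXTURE2D;
    srvDesc.Texture2D.MostDetailedMip = srcMip;  // read from this mip only
    srvDesc.Texture2D.MipLevels = 1;
    device->CreateShaderResourceView(texture, &srvDesc, srv);

    D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
    uavDesc.Format = format;
    uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2D;
    uavDesc.Texture2D.MipSlice = srcMip + 1;     // write to the next mip
    device->CreateUnorderedAccessView(texture, &uavDesc, uav);
}
```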
  6. Support for UAV typed stores to R11G11B10_FLOAT is required for FEATURE_LEVEL_11_0, as indicated by this chart. What's optional is support for UAV typed loads. So in other words, if you only need to write to a R11G11B10 texture, then you're fine. You'll only need the extended UAV typed load support if you want to read from a UAV with that format.
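If you do end up needing typed UAV loads, you can query for them at runtime with something along these lines (a sketch with minimal error handling; the device is assumed to exist):

```cpp
#include <d3d11.h>

// Stores to R11G11B10_FLOAT UAVs are always available at FL11_0, but typed
// loads need a runtime check through CheckFeatureSupport.
bool SupportsTypedUAVLoad_R11G11B10(ID3D11Device* device)
{
    D3D11_FEATURE_DATA_FORMAT_SUPPORT2 support = {};
    support.InFormat = DXGI_FORMAT_R11G11B10_FLOAT;
    if(FAILED(device->CheckFeatureSupport(D3D11_FEATURE_FORMAT_SUPPORT2,
                                          &support, sizeof(support))))
        return false;

    return (support.OutFormatSupport2 & D3D11_FORMAT_SUPPORT2_UAV_TYPED_LOAD) != 0;
}
```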
  7. FYI you're not going crazy: this warning is new with the Windows 10 Creator's Update. It's always been required to have the "unorm/snorm" thing according to the spec, but almost nobody knew about it because it wasn't well-documented and there was no validation error for it (it has happened to work correctly on most hardware).
  8. I would recommend reading Bump Mapping Unparametrized Surfaces on the GPU.
  9. Explicitly passing gradients can be slower in some situations, but it depends on the hardware as well as the specifics of the shader itself. Sending the gradients requires sending more data from the shader unit to the texture unit, and in some cases it can cause texture sampling to run at a lower rate. There are 3 ways (that I know of) that you can handle sampling your textures in your deferred pass:
     1. Pack all scene textures into atlases or arrays
     2. Use virtual texturing techniques, which effectively let you achieve atlasing through hardware or software indirections
     3. Use bindless resources
     When I experimented with this I went with #3 using D3D12, and it worked just fine. The one thing you need to watch out for with bindless techniques is divergent sampling: if different threads within a warp/wavefront sample from different texture descriptors, the performance will go down by a factor proportional to the number of different textures. Generally your materials tend to be coherent in screen-space so it's not too much of an issue, but if you have areas with many different materials only covering a small number of pixels then your performance may suffer.
     I also disagree with ATEFred that it won't help you at all. One of the primary advantages of deferred techniques is that they can decouple your heavy shading from your geometric complexity, which helps you to avoid the reduced efficiency from sub-pixel triangles. Deferred texturing in particular aims to go even further than traditional deferred techniques by only writing a slim G-Buffer and moving all texture sampling to a deferred pass, which makes it even more appealing for your situation. Obviously your G-Buffer pass is still going to be slower than it would be with a lower geometric complexity that has a more ideal triangle-to-pixel ratio, but in general the more you can decouple your shading from rasterization, the less you'll be impacted by poor quad utilization.
     That said, you should always be thorough about gathering performance numbers using apples-to-apples comparisons as much as you can, so that you can make your choices based on concrete data. Some things can vary quite a bit depending on the scene, material complexity, the GPU, drivers, etc.
  10. "Back-buffer" specifically refers to a render target that belongs to a "swap chain" of buffers that are used to present to a window or screen, where that chain typically consists of two separate buffers that you swap every time you present. The terminology can vary a bit between API's and engines, and these terms are used by D3D/DXGI API's. In some other places you might see it called a "flip" operation instead of a "present", since you're "flipping" between front and back buffers. You may also see the back buffer referred to as a "frame buffer", which is an older term that dates back to days when graphics hardware had a specific region of memory that was dedicated to scanning out images to the display. So in general you have textures, which are typically read-only. But you can also have textures that the GPU can write to, and then possibly read from later. In D3D these are called "render targets", since they can be the "target" of your rendering operations. With that terminology your back-buffer is really just a special render target, and the latest versions of the API absolutely work that way. In older D3D versions it was common to refer to non-backbufffer render targets as "off-screen targets", or "off-screen textures", since back then it was rather unusual to have a render target that wasn't tied to a swap chain. So you might say "I'm going to render this shadow to an off-screen texture, then I'll read from it when I'm actually rendering to the screen using the back-buffer". In modern times it's possible for an engine to go through dozens of different render targets in a single frame, and typically you'll do majority of your rendering to "off-screen" targets instead of the back-buffer. A typical setup might go like this: Render Z Only to depth buffer Render to N G-Buffer Targets Render SSAO to RT Render Shadow Maps Read G-Buffer/Depth/SSAO/Shadows and Render Lighting to RT Read lighting RT and perform Bloom + Motion Blur + DOF Passes Read Lighting/Bloom/MB/DOF results and combine, perform tone mapping, write to back-buffer Render UI Present In this context it's really the back-buffer that's the "unusual" target, and the terminology and newer API's tend to reflect that. As for MSAA...back in the days before we used dozens of render targets and games typically just rendered directly to the back-buffer/framebuffer, the way you would use MSAA would be to specify that you wanted MSAA when you created the swap chain, and then your back-buffer would just magically have MSAA. You'd draw to it, present, and it would Just Work without you really needing to do anything as a programmer. Behind the scenes the GPU had to do some work with your MSAA target, since it's not natively presentable on its own. It has to be resolved, which is an operation that combines the individual sub-samples to create a non-MSAA image that can be shown on the screen. Up until D3D12 you could still do things this way: you could specify an MSAA mode in the swap chain parameters, render to an MSAA swap chain, and the driver would resolve for you when you call Present. But it doesn't make sense to do this if you have a setup like the one I outlined above, since you've already done a bunch of post-processing operations and UI rendering by the time you reach the back-buffer, and you probably only want MSAA for your "real" geometry passes. So instead you'll create your own MSAA render target, and manually resolve to a non-MSAA render target. 
For old-school forward rendering this can be pretty simple: there's a dedicated API for resolving, and it will let the driver/GPU do it for you. However, starting with D3D10 you can also do your own resolves in a shader, which is required for deferred rendering and/or for achieving higher-quality results with HDR. Even if you're just doing a normal resolve there's no downside to creating your own MSAA target instead of creating the swap chain with MSAA, since the same thing would happen behind the scenes if you created an MSAA swap chain. So I would recommend doing that, since it will set you up for doing more advanced rendering with post-processing or other techniques that require additional render targets.
Getting back to the issue of forcing MSAA through the driver control panel: in the older days when people just rendered to the back-buffer, it was really easy for the driver to force MSAA to be enabled without the app knowing about it. It would just silently ignore the MSAA parameter when creating the swap chain, and replace it with something else. Back then everyone just did simple forward rendering, so everything would once again "just work". These days it's not nearly so simple. Even forward-rendered games often go through many render targets, and so the driver would have to carefully choose which render targets get silently promoted to MSAA. On top of that, it would have to figure out a point in the frame where it could sneak in a resolve operation before anybody reads from those render targets, since the results would be broken if it didn't do this. In a deferred setup like the one I outlined above there's really no way for the driver to do it, since handling MSAA requires invasive changes to the shader code (this is often true in modern forward-rendering setups as well, which often make use of semi-deferred techniques that require special handling for MSAA). This is what Hodgman is referring to when he says that it's a hack that can break games, and why nobody should ever turn it on anymore (I don't even know why they still have it in the control panel). As a developer, the only sane thing you can do is ignore that feature entirely, and hope that nobody turns it on when they run your game or app.
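Here's a rough D3D11 sketch of that approach: create your own MSAA target, render to it, and then use the dedicated resolve API to get a single-sampled result. Resource creation is simplified, and "nonMsaaTarget" could be the back-buffer itself, assuming the formats match:

```cpp
#include <d3d11.h>

// Create a dedicated 4x MSAA render target instead of an MSAA swap chain,
// then resolve it into a non-MSAA texture. "device" and "context" are assumed
// to already exist, and error handling is omitted.
void CreateAndResolveMSAA(ID3D11Device* device, ID3D11DeviceContext* context,
                          UINT width, UINT height,
                          ID3D11Texture2D* nonMsaaTarget)
{
    const DXGI_FORMAT format = DXGI_FORMAT_R8G8B8A8_UNORM;

    D3D11_TEXTURE2D_DESC desc = {};
    desc.Width = width;
    desc.Height = height;
    desc.MipLevels = 1;
    desc.ArraySize = 1;
    desc.Format = format;
    desc.SampleDesc.Count = 4;       // 4x MSAA
    desc.SampleDesc.Quality = 0;
    desc.Usage = D3D11_USAGE_DEFAULT;
    desc.BindFlags = D3D11_BIND_RENDER_TARGET;

    ID3D11Texture2D* msaaTexture = nullptr;
    ID3D11RenderTargetView* msaaRTV = nullptr;
    device->CreateTexture2D(&desc, nullptr, &msaaTexture);
    device->CreateRenderTargetView(msaaTexture, nullptr, &msaaRTV);

    // ... bind msaaRTV (plus an MSAA depth buffer) and render the scene ...

    // Fixed-function resolve into the single-sampled target
    context->ResolveSubresource(nonMsaaTarget, 0, msaaTexture, 0, format);

    msaaRTV->Release();
    msaaTexture->Release();
}
```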
  11. As you've pointed out, on any recent hardware/API there's nothing stopping you from doing fully programmable vertex fetch. There are 3 things you should keep in mind though:
     1. The performance between programmable and fixed-function vertex fetch may not be the same. There are still GPU's out there that have dedicated hardware for vertex fetch, and using it could possibly be the fastest path depending on what you're doing. On the other hand, some hardware (for instance anything made by AMD in the last 7 years) has no dedicated vertex fetch, and will generate shader code that implements your input layout. But even then there can be differences depending on what types of resources you fetch your data from (structured buffer vs. formatted buffer vs. textures), and your data layout (AoS vs. SoA). For an example, here's what happened when someone benchmarked a bunch of different ways to fetch vertex data on their GTX 970.
     2. GPU's will typically tie their post-VS cache to indices from an index buffer, so you'll still need to use a dedicated index buffer to benefit from it. You may want to look through this thread for some ideas on how to do interesting things within the limitations of standard index buffers.
     3. Input layouts let you have some decoupling between your vertex buffer layout and your actual vertex shader, which can be convenient in some cases. However it's possible that different input layouts will cause the driver to generate different permutations of your VS (or different VS preludes) behind the scenes.
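For reference, here's a sketch of the CPU side of the "programmable fetch plus a real index buffer" setup from point 2, in D3D11. The vertex layout and function names are just illustrative, and the vertex shader is assumed to index the structured buffer with SV_VertexID:

```cpp
#include <d3d11.h>
#include <DirectXMath.h>

// Hypothetical vertex layout stored in a structured buffer
struct Vertex
{
    DirectX::XMFLOAT3 position;
    DirectX::XMFLOAT3 normal;
    DirectX::XMFLOAT2 uv;
};

// Feed vertex data through a structured buffer SRV instead of an input
// layout, while keeping a normal index buffer so the post-VS cache still
// works. "device", "context", "indexBuffer", and the bound shaders are
// assumed to already exist.
void DrawWithVertexPulling(ID3D11Device* device, ID3D11DeviceContext* context,
                           const Vertex* vertices, UINT vertexCount,
                           ID3D11Buffer* indexBuffer, UINT indexCount)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = sizeof(Vertex) * vertexCount;
    desc.Usage = D3D11_USAGE_IMMUTABLE;
    desc.BindFlags = D3D11_BIND_SHADER_RESOURCE;
    desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
    desc.StructureByteStride = sizeof(Vertex);

    D3D11_SUBRESOURCE_DATA initData = { vertices, 0, 0 };
    ID3D11Buffer* vertexSRVBuffer = nullptr;
    device->CreateBuffer(&desc, &initData, &vertexSRVBuffer);

    D3D11_SHADER_RESOURCE_VIEW_DESC srvDesc = {};
    srvDesc.Format = DXGI_FORMAT_UNKNOWN;              // structured buffer
    srvDesc.ViewDimension = D3D11_SRV_DIMENSION_BUFFER;
    srvDesc.Buffer.FirstElement = 0;
    srvDesc.Buffer.NumElements = vertexCount;
    ID3D11ShaderResourceView* vertexSRV = nullptr;
    device->CreateShaderResourceView(vertexSRVBuffer, &srvDesc, &vertexSRV);

    // No vertex buffers or input layout: the VS pulls vertices itself
    context->IASetInputLayout(nullptr);
    context->IASetIndexBuffer(indexBuffer, DXGI_FORMAT_R32_UINT, 0);
    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    context->VSSetShaderResources(0, 1, &vertexSRV);
    context->DrawIndexed(indexCount, 0, 0);

    vertexSRV->Release();
    vertexSRVBuffer->Release();
}
```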
  12. If you enable the debug layer for your device (which you should *definitely* do for debug builds), you will get error messages when you have a resource simultaneously bound as both an input (SRV) and an output (RTV or UAV).
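For example, something like this (a sketch, with the flag only set for debug builds):

```cpp
#include <d3d11.h>

// Create the device with the debug layer in debug builds so that hazards
// like a texture bound as both an SRV and an RTV produce error messages.
HRESULT CreateDeviceWithDebugLayer(ID3D11Device** device, ID3D11DeviceContext** context)
{
    UINT flags = 0;
#if defined(_DEBUG)
    flags |= D3D11_CREATE_DEVICE_DEBUG;   // enables the validation/debug layer
#endif

    const D3D_FEATURE_LEVEL levels[] = { D3D_FEATURE_LEVEL_11_0 };
    return D3D11CreateDevice(nullptr, D3D_DRIVER_TYPE_HARDWARE, nullptr, flags,
                             levels, 1, D3D11_SDK_VERSION, device, nullptr, context);
}
```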
  13. If you're using a pixel shader and you would like to write to a specific mip as a render target, you need to create a render target view that specifically targets that mip level and face index. You can do that by specifying the appropriate values for MipSlice, FirstArraySlice, and ArraySize in the D3D11_TEX2D_ARRAY_RTV struct that's a member of D3D11_RENDER_TARGET_VIEW_DESC (it's the same for D3D12 if you're using that API instead of D3D11). If you're using a compute shader, you instead have to create an unordered access view that targets the mip level that you want to write to. It's basically the same as creating the RTV, where you set the appropriate values for the MipSlice, FirstArraySlice, and ArraySize members of D3D11_TEX2D_ARRAY_UAV, which is a member of D3D11_UNORDERED_ACCESS_VIEW_DESC.
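A sketch of both view types targeting a single mip of a single cubemap face might look like this, assuming the texture was created as a 6-slice array with the appropriate bind flags:

```cpp
#include <d3d11.h>

// Create an RTV and a UAV that each target one mip of one face of a cubemap
// (a 6-element texture array). Error handling is omitted.
void CreatePerFaceMipViews(ID3D11Device* device, ID3D11Texture2D* cubemap,
                           DXGI_FORMAT format, UINT mipLevel, UINT faceIndex,
                           ID3D11RenderTargetView** rtv,
                           ID3D11UnorderedAccessView** uav)
{
    D3D11_RENDER_TARGET_VIEW_DESC rtvDesc = {};
    rtvDesc.Format = format;
    rtvDesc.ViewDimension = D3D11_RTV_DIMENSION_TEXTURE2DARRAY;
    rtvDesc.Texture2DArray.MipSlice = mipLevel;
    rtvDesc.Texture2DArray.FirstArraySlice = faceIndex;
    rtvDesc.Texture2DArray.ArraySize = 1;
    device->CreateRenderTargetView(cubemap, &rtvDesc, rtv);

    D3D11_UNORDERED_ACCESS_VIEW_DESC uavDesc = {};
    uavDesc.Format = format;
    uavDesc.ViewDimension = D3D11_UAV_DIMENSION_TEXTURE2DARRAY;
    uavDesc.Texture2DArray.MipSlice = mipLevel;
    uavDesc.Texture2DArray.FirstArraySlice = faceIndex;
    uavDesc.Texture2DArray.ArraySize = 1;
    device->CreateUnorderedAccessView(cubemap, &uavDesc, uav);
}
```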
  14. So I'm guessing you're linking against d3dcompiler_47.dll using the import libs from the Windows SDK? There are actually multiple versions of that DLL in the wild: they've updated it multiple times alongside Windows SDK releases, but kept the same filename. If you're linking against that and loading whatever version the OS has installed in C:\Windows\System32, you'll end up loading different versions of the DLL depending on which OS version you're running on. If you want to ensure that you always use the same compiler version, you should include the DLL alongside your executable (or alternatively, pre-compile your shaders). That said, I'm not sure why this behavior would be different between compiler versions. It's very possible that a bug (or different behavior) was introduced somewhere along the way.
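If you do decide to ship the DLL next to your executable, one option is to load it explicitly instead of linking the import library (which would otherwise load whichever DLL the loader finds first). A hedged sketch, with paths and error handling kept illustrative:

```cpp
#include <Windows.h>
#include <d3dcompiler.h>

// Load the d3dcompiler_47.dll that ships next to the executable rather than
// whatever version lives in System32, then grab D3DCompile from it. With the
// default DLL search order, the application directory is searched before the
// system directories when only a file name is given.
pD3DCompile LoadLocalShaderCompiler()
{
    HMODULE compilerModule = LoadLibraryW(L"d3dcompiler_47.dll");
    if(compilerModule == nullptr)
        return nullptr;

    return reinterpret_cast<pD3DCompile>(GetProcAddress(compilerModule, "D3DCompile"));
}
```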
  15. If you'd like to see a more complete example of dynamic indexing, you can check out the deferred texturing demo that I made a while ago. It uses dynamic indexing in both the forward and deferred rendering path to sample material textures, as well as to sample decal textures. There's also an experimental branch where I use bindless techniques throughout the entire rendering framework. Basically all SRV's are persistently allocated from a global descriptor heap, and every shader accesses them using 32-bit indices. However I should warn you that there may be a few bugs on this branch that I haven't fixed yet, and there's also a few issues with dynamic buffers that I have to clean up.
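For anyone curious what that persistent-allocation scheme can boil down to, here's a minimal sketch of the general idea in D3D12. This is illustrative only, and the names are not from the actual demo or the experimental branch:

```cpp
#include <d3d12.h>
#include <cstdint>

// One big shader-visible heap: every SRV gets a permanent slot, and shaders
// receive the 32-bit slot index (via a constant buffer or root constant) to
// do the dynamic indexing. Freeing/recycling of slots is omitted.
struct GlobalSRVHeap
{
    ID3D12DescriptorHeap* heap = nullptr;
    UINT descriptorSize = 0;
    uint32_t nextFreeIndex = 0;

    void Init(ID3D12Device* device, uint32_t capacity)
    {
        D3D12_DESCRIPTOR_HEAP_DESC desc = {};
        desc.Type = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
        desc.NumDescriptors = capacity;
        desc.Flags = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;
        device->CreateDescriptorHeap(&desc, __uuidof(ID3D12DescriptorHeap),
                                     reinterpret_cast<void**>(&heap));
        descriptorSize = device->GetDescriptorHandleIncrementSize(
            D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV);
    }

    // Creates an SRV in the next free slot and returns its global index
    uint32_t AllocateSRV(ID3D12Device* device, ID3D12Resource* resource,
                         const D3D12_SHADER_RESOURCE_VIEW_DESC* srvDesc)
    {
        const uint32_t index = nextFreeIndex++;
        D3D12_CPU_DESCRIPTOR_HANDLE handle = heap->GetCPUDescriptorHandleForHeapStart();
        handle.ptr += SIZE_T(index) * descriptorSize;
        device->CreateShaderResourceView(resource, srvDesc, handle);
        return index;  // passed to shaders as a 32-bit index
    }
};
```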