About MJP

  • Rank
    XNA/DirectX Moderator & MVP


  1. It looks to me like you're rendering back-faces for the sphere mesh representing your point light, and that sphere is getting clipped by the far clipping plane of your camera. Does the issue go away if you increase the far clip distance? Generally you want to render front-faces for your light proxy geo when your camera/near plane isn't intersecting the light volume, since that avoids issues like this one.
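A minimal sketch of the distance test described above (names and the near-plane padding are my own, hypothetical choices): pick which faces of the light's sphere proxy to cull based on whether the camera could be inside the light volume.

```cpp
#include <cmath>

// Which faces to cull when rasterizing the light's sphere proxy.
enum class CullMode { Front, Back };

// If the camera (padded by the near-clip distance) might intersect the light
// volume, draw back-faces (cull front) so the proxy doesn't disappear when
// we're inside it; otherwise draw front-faces (cull back) to avoid the
// far-plane clipping issue described above.
CullMode ChooseLightProxyCullMode(const float cameraPos[3], const float lightPos[3],
                                  float lightRadius, float nearClip)
{
    const float dx = cameraPos[0] - lightPos[0];
    const float dy = cameraPos[1] - lightPos[1];
    const float dz = cameraPos[2] - lightPos[2];
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

    if (dist <= lightRadius + nearClip)
        return CullMode::Front;   // inside the volume: render back-faces
    return CullMode::Back;        // outside the volume: render front-faces
}
```

In practice you'd also want a small epsilon on top of the near-clip padding to stay conservative at the boundary.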
  2. If you're always going to offset the depth in one direction, you can use SV_DepthGreaterEqual or SV_DepthLessEqual. These can potentially be more optimal for GPUs that perform depth testing before the pixel shader runs (pretty much all desktop GPUs do this).
  3. MJP

    DirectX 12 command queues

    D3D12 is not the same as Vulkan when it comes to queues. In Vulkan you can query how many queues (and of which type) are supported by the device, and then you bind to those queues. The idea is that if the actual hardware supports N extra compute queues, then they'll be exposed in Vulkan and you can submit to any of those to have the work execute concurrently. In D3D12 you can create as many queues as you want regardless of what the hardware supports. The queues are "virtualized" in D3D12, which means that the OS/scheduler can do things like "flattening" multiple submissions into a single hardware queue (this is possible because the queues in D3D12 are subsets of each other's functionality). I talked about this a bit in this article (scroll down to the section called "The Present: Windows 10, D3D12, and WDDM 2.0"). If you want to know for sure that your command lists are executing in parallel, you'll need to use a tool like GPUView or AMD's Radeon Graphics Profiler.
  4. After applying a D3D-style perspective projection, "w" is equivalent to the Z coordinate of the vertex in view space (relative to the eye/camera). A typical orthographic projection does not produce an equivalent value; instead the "z" value is equivalent to (ViewSpaceZ - NearClip) / (FarClip - NearClip). So if you'd like, you can reconstruct the view-space Z value from that using the near/far clip plane values that you used, or by directly using values from the orthographic matrix. Alternatively, you can just compute ViewSpaceZ in the vertex shader by transforming the vertex into view space.
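The reconstruction described above is just a linear remap; a sketch (function name is mine):

```cpp
// Invert a D3D-style orthographic depth value back to view-space Z.
// For orthographic projection: depth = (viewZ - nearClip) / (farClip - nearClip),
// so inverting it is a straight linear remap.
float OrthoDepthToViewZ(float depth, float nearClip, float farClip)
{
    return depth * (farClip - nearClip) + nearClip;
}
```

Equivalently, you can read the relevant elements straight out of the orthographic matrix you built and invert with those.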
  5. Do you know which cost you were measuring in this case? As in, were you measuring CPU cost or GPU cost? Only Nvidia could say for sure what's going on in their driver, so I don't think that anybody here could tell you definitively why that might be the case. In terms of GPU cost, one thing that can really cause some trouble in certain cases is that the D3D11 spec requires a full GPU sync point between dispatches. This can be expensive in terms of GPU cost, since it usually requires a full thread/execution barrier as well as flushing of caches. Normally you need this in cases where one dispatch writes a result and the next dispatch immediately reads from it, but in cases where you have lots of dispatches writing to the same buffer or texture with no dependencies, the flushes are unnecessary. Nvidia and AMD actually have special "extension" APIs that can let you disable the sync/flush for cases where you know it's safe to do so, but you have to be verrrrrrrrrry careful when using these: if you mess it up, you can get unpredictably corrupted results. See the section called "UAV Overlap" from my list of D3D11 driver hacks for more info. In terms of CPU cost, that's much harder to say. It really depends on what's going on in the driver. For instance it might be doing some expensive check to see if it can skip the flush depending on which resources are bound when you call Dispatch, but it's impossible to say for sure without seeing the driver code.
  6. So the problem with MSAA is that you need to have both an MSAA render target and an MSAA depth target for it to work. This means that if you want your depth/normal pass to work as a prepass for your forward pass so that you don't have any overdraw (which you certainly want), then your prepass has to render to MSAA depth/render targets. For something like SSAO, you're typically going to do that at a lower resolution anyway. In that case, you probably want to downsample from a full-res MSAA depth/normal target to a low-res non-MSAA target.
  7. That's right: the depth from an orthographic projection is linearly distributed across the range between the clip planes. A fixed-point UNORM format will therefore have uniform precision across that entire range, at least in terms of the values that are stored. You will still see some effects from the precision of the floating-point math used to compute the depth values before they're stored in the depth buffer.
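A quick sketch of what that uniform precision means concretely (helper name is mine): with an N-bit UNORM depth buffer and an orthographic projection, the quantization step is the same constant number of view-space units everywhere between the clip planes.

```cpp
// Size of one UNORM quantization step, expressed in view-space depth units,
// for an orthographic projection. The step is constant across [near, far]
// because orthographic depth is a linear remap of view-space Z.
double UnormDepthStepInViewUnits(int bits, double nearClip, double farClip)
{
    const double levels = static_cast<double>((1ull << bits) - 1);
    return (farClip - nearClip) / levels;
}
```

Contrast this with a perspective projection, where the distribution of stored depth values is heavily skewed toward the near plane.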
  8. If you're getting a crash in a D3D function, then make sure that you enable the D3D debug layer for your non-release builds. The debug layer will tell you when you pass invalid parameters to API functions, or otherwise trigger invalid behavior.
  9. Yes, you can absolutely do this. Projection onto SH is an integral of your signal multiplied with the SH basis functions, which means you can formulate it as a parallel reduction on the GPU. I actually implemented this many years ago for a lightmap baker, where the final lightmap coefficients were generated by rendering a hemicube per-texel and then projecting that onto SH. This is pretty much exactly what you're doing, except that you have a full cubemap and not just the upper half of one. That was one of the first compute shaders that I ever wrote, so I'm sure it's nowhere near optimal. It probably would have been more cache efficient to do the reduction in 2D tiles instead of along rows, and I'm sure the shared memory access pattern was full of bank conflicts. But if you want to have a look, it should help get you started on your own implementation.
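To make the "projection is just an integral" point concrete, here's a CPU reference sketch (my own code, not the baker mentioned above) that projects a spherical signal onto 2-band (L0/L1) SH with Monte Carlo sampling. On the GPU this per-sample math stays the same and the sum becomes the parallel reduction; a real cubemap integration would also weight each texel by its solid angle instead of sampling uniformly.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdlib>
#include <functional>

// Project a spherical function f(x, y, z) onto the first 4 real SH basis
// functions using uniform Monte Carlo sampling over the sphere.
std::array<float, 4> ProjectOntoSH1(const std::function<float(float, float, float)>& f,
                                    int numSamples)
{
    std::array<float, 4> coeffs = { 0.0f, 0.0f, 0.0f, 0.0f };
    const float pi = 3.14159265f;
    std::srand(1234);  // fixed seed for a deterministic sketch
    for (int i = 0; i < numSamples; ++i)
    {
        // Uniformly distributed direction on the unit sphere.
        const float u = static_cast<float>(std::rand()) / RAND_MAX;
        const float v = static_cast<float>(std::rand()) / RAND_MAX;
        const float z = 1.0f - 2.0f * u;
        const float r = std::sqrt(std::max(0.0f, 1.0f - z * z));
        const float phi = 2.0f * pi * v;
        const float x = r * std::cos(phi);
        const float y = r * std::sin(phi);

        const float value = f(x, y, z);
        coeffs[0] += value * 0.282095f;       // Y_0^0
        coeffs[1] += value * 0.488603f * y;   // Y_1^-1
        coeffs[2] += value * 0.488603f * z;   // Y_1^0
        coeffs[3] += value * 0.488603f * x;   // Y_1^1
    }
    // Monte Carlo estimator: scale by (sphere area / sample count).
    const float weight = 4.0f * pi / numSamples;
    for (float& c : coeffs)
        c *= weight;
    return coeffs;
}
```

Projecting a constant signal is a handy sanity check: the DC coefficient should come out to sqrt(4*Pi) times the constant, and the band-1 coefficients should vanish.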
  10. AMD doesn't support the "Driver Command Lists" feature for deferred contexts. This means that the D3D11 runtime lets you use deferred contexts, but instead of storing commands in actual hw-specific command buffers it will store them in a device-agnostic intermediate buffer. The runtime will then serialize those commands and pass them to the driver to create the final command buffer for the GPU. While this can possibly let you parallelize certain aspects of submitting commands, the actual command buffer generation is going to happen on a single thread. Thus you may be better off using a single thread and letting the CPU reach peak turbo clocks instead of trying to use multiple threads/cores to generate deferred command lists. Like turanszkij mentioned, D3D11 is just a really poor fit for multithreading in terms of its core level of abstraction. It tries to hide dependencies and asynchronous execution from you, and it's hard to do all of that hiding and abstraction unless the command submission is single-threaded or otherwise serialized.
  11. Schlick fresnel should work fine for the clear coat surface of a car. The clear coat basically acts like a layer of clear plastic/glass over the paint, so the same rules apply. Perhaps your light source isn't bright enough? You should also make sure that your specular terms are properly balanced with your diffuse terms. For instance if you omit the 1 / Pi in the Lambertian diffuse BRDF, then your specular will always look too dim (since your diffuse will be too bright).
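The 1/Pi balance mentioned above is easy to verify numerically: integrating the Lambertian BRDF (albedo / Pi) times cos(theta) over the hemisphere should return exactly the albedo, meaning the surface never reflects more diffuse energy than it receives. Drop the 1/Pi and the integral comes out Pi times too large, which is why the specular then looks too dim by comparison. A sketch (function name is mine):

```cpp
#include <cmath>

// Midpoint-rule integration of the Lambertian BRDF times cos(theta) over the
// upper hemisphere: integral of (albedo/Pi) * cos(theta) * sin(theta) dtheta dphi,
// with theta in [0, Pi/2] and phi in [0, 2*Pi]. The result should equal albedo.
double LambertHemisphereIntegral(double albedo, int thetaSteps, int phiSteps)
{
    const double pi = 3.14159265358979323846;
    const double dTheta = (0.5 * pi) / thetaSteps;
    const double dPhi = (2.0 * pi) / phiSteps;
    double sum = 0.0;
    for (int i = 0; i < thetaSteps; ++i)
    {
        const double theta = (i + 0.5) * dTheta;
        for (int j = 0; j < phiSteps; ++j)
        {
            const double brdf = albedo / pi;  // Lambertian BRDF, with the 1/Pi
            sum += brdf * std::cos(theta) * std::sin(theta) * dTheta * dPhi;
        }
    }
    return sum;  // approximately equal to albedo
}
```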
  12. I know this doesn't exactly help with your problem, but have you tried using PIX or RenderDoc instead of the VS graphics debugger? I find both of those tools to be *much* more usable than the built-in VS debugger.
  13. This presentation (and accompanying paper) talks about the cosine/dot products in the denominator a bit, amongst other things. The reason for these terms is that the BRDF always deals with a surface patch whose area is exactly 1, but the projected area from the eye's point of view and the light's point of view is not 1 (each is proportional to the cosine of the angle between the eye/light direction and the surface normal). These terms date back to 1967(!), when they were discussed in Torrance and Sparrow's paper about off-peak specular reflections.
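Written out, the two places those cosines show up (this is the standard Cook-Torrance form, not something specific to the presentation linked above): the reflection equation carries an (n . l) factor that converts radiance to irradiance, while a microfacet specular BRDF carries the matching projected-area terms in its denominator.

```latex
L_o(\mathbf{v}) = \int_{\Omega} f(\mathbf{l},\mathbf{v}) \, L_i(\mathbf{l}) \, (\mathbf{n}\cdot\mathbf{l}) \, d\omega_{\mathbf{l}},
\qquad
f_{\mathrm{spec}}(\mathbf{l},\mathbf{v}) = \frac{D(\mathbf{h}) \, F(\mathbf{v},\mathbf{h}) \, G(\mathbf{l},\mathbf{v},\mathbf{h})}{4 \, (\mathbf{n}\cdot\mathbf{l}) \, (\mathbf{n}\cdot\mathbf{v})}
```

The (n . l)(n . v) in the denominator is exactly the pair of projected-area corrections described above.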
  14. MJP

    Unusually high memory usage

    So you have to be a bit careful when looking at the overall memory usage of your process. There are lots of things that can allocate virtual memory inside of your process, of which your own code is just one. The OS and its various components might allocate things in your address space, third-party libraries linked into your code or loaded as a DLL can allocate, and then of course there's a very heavy D3D runtime and a user-mode driver loaded into your process. Sometimes these other components will allocate memory as a direct result of your code (for instance, if you create a D3D resource), but in many other cases they will do so indirectly. The driver might need some memory for generating command buffers, or it might have a pool of temporary memory that it draws from when you map a dynamic buffer. Basically this means that while it's always a good idea to keep an eye on your memory usage, you probably need more information than what Task Manager gives you if you really want to know what's going on. For the memory directly allocated by your own code, writing your own simple tools can be really useful. Generally you want to have all kinds of information that's specific to your game or engine, such as "how much memory am I using for this one level?" or "how much memory am I using for caching streamed data?". As for keeping track of everyone else's allocations, the best tool there is ETW. With that you can trace the callstack of every call to VirtualAlloc, which can give you clues as to what's going on. Unfortunately there will be things for which you don't have PDBs, but you can at least get symbols for Microsoft DLLs using their public symbol server. The new PIX for Windows also has a built-in memory capture tool that can give you the same information, which it does by using ETW under the hood.
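A hypothetical, minimal version of the "simple tools" mentioned above (all names are mine): a per-category byte counter that your engine's allocation paths report into, so you can answer questions like "how much memory is this level using?". Note that this only sees allocations that go through your own code, not the driver's or the runtime's.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Engine-specific budget categories; extend to whatever your game cares about.
enum MemCategory { Mem_Level, Mem_Streaming, Mem_Audio, Mem_NumCategories };

// Zero-initialized at static storage duration; atomic so any thread can allocate.
std::atomic<std::size_t> g_categoryBytes[Mem_NumCategories];

void* TrackedAlloc(std::size_t numBytes, MemCategory category)
{
    g_categoryBytes[category] += numBytes;
    return std::malloc(numBytes);
}

void TrackedFree(void* ptr, std::size_t numBytes, MemCategory category)
{
    g_categoryBytes[category] -= numBytes;
    std::free(ptr);
}
```

A real version would typically also record callstacks or tags per allocation so the counters can be broken down further, which is where an ETW capture picks up the slack for everything outside your own code.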
  15. MJP

    Silly Input Layout Problem

    The OP's VS code is fine; the shader compiler automatically converts a float4x4 to 4 float4 attributes in the input signature (with sequential semantic indices).