About MJP

  • Rank
    XNA/DirectX Moderator & MVP

  1. Whoops, sorry about that! That value is what you're going to compare against the depth value in the shadow map, so in your case you want to use "lightDepthValue". I'll update the code that I posted in case anybody else copy/pastes it. The "offset" parameter is optional, so you can ignore that. It will offset the location that the texture is sampled by a number of texels equal to that parameter.
  2. I haven't done this myself, but couldn't you just multiply the V coordinate by -1 and then add 1? That should work even for tiled UV's. The V coordinate will often be negative, but that's fine since you're probably going to use a signed representation anyway for your UV's.
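For intuition, here's a tiny sketch of that V-axis flip (my own illustration, not code from the thread):

```cpp
#include <cassert>

// Hypothetical helper: flip the V axis by negating and adding 1.
// For tiled (out-of-range) V values the result can go negative, which is
// fine if you store UVs in a signed format and use wrap addressing.
float FlipV(float v)
{
    return -v + 1.0f;
}
```

Note that FlipV(2.5f) gives -1.5f; under wrap addressing that lands on the same fractional texel as flipping 0.5f, so tiling still behaves.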
  3. Is there anywhere in your code where you're setting device states before drawing your 3D square? There are several states you can set on the context that will affect your rendering, such as the blend state, depth/stencil state, rasterizer state, input layout, and vertex/index buffers, and I don't see you setting those anywhere in the code you've provided. SpriteBatch will set those states in order to do its thing (there's a list of the states it sets here, under the section called "State management"). You'll want to make sure that you set all of the states that you need before issuing your draw call in order to ensure proper results. One thing that you can do to help with this is to call ID3D11DeviceContext::ClearState at the beginning of every frame, which will set the context back to a default state. I would also recommend enabling the debug validation layer when you create your device (but only in debug builds), and checking any warnings or errors that it reports. Another thing that can help with these kinds of issues is to use a debugging tool like RenderDoc, which will let you inspect the device state at the time of a particular draw call.
  4. I'm not sure that I completely understand what you're trying to do here. Are you trying to add a penumbra to your shadow, so that the shadows don't have hard edges? If so, then the standard way to do this with shadow maps is to use percentage closer filtering (PCF for short). In very simple terms, PCF amounts to sampling the shadow map several times around a small region, performing the depth comparison for each sample, and then computing a single result by applying a filter kernel (the simplest filter kernel being a box filter, where you essentially just compute the average of all of the results). The easiest way to get started with PCF is to let the hardware perform automatic 2x2 bilinear filtering for you. You'll have to make a few changes to your code to do this:

    • Create a special "comparison" sampler state to use for sampling your shadow depth map. You do this by specifying "D3D11_COMPARISON_LESS_EQUAL" as the "ComparisonFunc" member of the D3D11_SAMPLER_DESC structure. This specifies that the hardware should return 1 when the passed-in surface depth value is <= the shadow map depth value stored in the texture. You'll also want to use "D3D11_FILTER_COMPARISON_MIN_MAG_MIP_LINEAR" to specify that you want 2x2 bilinear filtering when you sample.
    • In your shader code, declare your shadow sampler state with the type "SamplerComparisonState" instead of "SamplerState".
    • Change your shader code to use SampleCmp instead of Sample. SampleCmp will return the filtered comparison result instead of the shadow map depth value.

So you'll also want to restructure your code so that it looks something like this:

    SamplerComparisonState ShadowSampler;

    lightDepthValue = input.lightViewPositions[i].z / input.lightViewPositions[i].w;
    lightDepthValue = lightDepthValue - bias;

    float lightVisibility = shaderTextures[6 + i].SampleCmp(ShadowSampler, projectTexCoord, lightDepthValue);
    lightIntensity = saturate(dot(input.normal, normalize(input.lightPositions[i]))) * lightVisibility;
    color += (diffuseCols[i] * lightIntensity * 0.25f);

Once you've got the hang of that and you want to look into more advanced filtering techniques, you can check out a blog post I wrote that talks about some of the most common ways to do shadow map filtering (or jump right to the code sample).
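For intuition about what the comparison sampler is doing, here's a small CPU-side sketch (my own model of the hardware's behavior, not code from the thread) of 2x2 bilinear PCF: perform the LESS_EQUAL test on each of the four nearest texels, then bilinearly blend the 0/1 results:

```cpp
#include <cassert>
#include <cmath>

// Software model of a comparison sampler with a LINEAR comparison filter:
// each of the 4 nearest shadow map texels is compared against the reference
// depth, and the binary results are blended with the bilinear weights fx/fy.
float Pcf2x2(const float depths[2][2], float fx, float fy, float refDepth)
{
    float r00 = (refDepth <= depths[0][0]) ? 1.0f : 0.0f;
    float r10 = (refDepth <= depths[0][1]) ? 1.0f : 0.0f;
    float r01 = (refDepth <= depths[1][0]) ? 1.0f : 0.0f;
    float r11 = (refDepth <= depths[1][1]) ? 1.0f : 0.0f;

    // Standard bilinear blend of the comparison results
    float top = r00 + (r10 - r00) * fx;
    float bot = r01 + (r11 - r01) * fx;
    return top + (bot - top) * fy;
}
```

A pixel halfway between a lit and a shadowed texel gets 0.5 visibility, which is exactly where the soft edge comes from.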
  5. ResolveSubresource unfortunately doesn't work for depth buffers. If you want to do it, you need to do it manually with a pixel shader that outputs to SV_Depth. You also probably wouldn't want the average of the sub-pixel depth values, since this wouldn't make much sense. Anyway, you don't need to copy the depth resource to do #2, as long as you're not writing to the depth buffer during your water pass. If you create a read-only depth-stencil view, then you can read from it using an SRV while the DSV is still bound to the pipeline.
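As a sketch of the read-only DSV setup (the function name, formats, and the assumption that the depth texture was created with DXGI_FORMAT_R32_TYPELESS plus both DEPTH_STENCIL and SHADER_RESOURCE bind flags are mine, for illustration):

```cpp
#include <d3d11.h>

// Hypothetical helper: create a depth-stencil view with the read-only depth
// flag set, so the depth buffer can stay bound for depth testing while an
// SRV over the same resource is read in the water pass.
HRESULT CreateReadOnlyDepthDSV(ID3D11Device* device,
                               ID3D11Texture2D* depthTexture,
                               ID3D11DepthStencilView** outDSV)
{
    D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = {};
    dsvDesc.Format = DXGI_FORMAT_D32_FLOAT;
    dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
    dsvDesc.Flags = D3D11_DSV_READ_ONLY_DEPTH;  // depth writes disabled via this view
    return device->CreateDepthStencilView(depthTexture, &dsvDesc, outDSV);
}
```

The matching SRV would use a compatible format such as DXGI_FORMAT_R32_FLOAT over the typeless resource.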
  6. A few branches to compute a texture array index really doesn't sound like a big deal to me. If you're dealing with atlasing of textures that are all the same size, then texture arrays are definitely the easiest way to do it. This is especially true when it comes to mipmaps (which you'll want for terrain textures), since texture arrays keep mips separate and therefore let you avoid the "bleeding" problems that you run into with traditional atlases. There are some caveats when it comes to dynamically updating an atlas at runtime from the CPU, which I can elaborate on if you can tell me which API you're planning on using for this. If you're curious or you'd like to expand your atlas approach into something more generalized, you may want to Google for some articles or presentations about virtual texturing. Virtual texturing is really a generalization of what you're proposing, and has been effectively used for terrain in games with large worlds (like the Battlefield series, or the Far Cry series). The typical approach they use to figure out where to sample a pixel's texture from is to have an indirection texture that's sampled first. So for instance, you might have a "virtual texture" that's 32k x 32k texels that represents all textures that could ever be referenced, but you only keep an 8k x 8k atlas of textures loaded. You would first sample the indirection texture to see where the virtual texture page is loaded into the atlas, and that would give you UV coordinates to use when sampling the atlas. So if your "page" size is 32x32, then your indirection texture would only need to be 1k x 1k. In practice it gets pretty complicated with mip mapping, since each mip will typically be packed separately in the atlas, which requires manual mip sampling + filtering in the pixel shader. There's also somewhat-recent hardware + API support for virtual textures, called "Tiled Resources" in D3D and "Sparse Textures" in GL/Vulkan.
If you use that you can potentially skip the indirection texture and also remove the need for manual mip/anisotropic filtering in the pixel shader, but your virtual texture still has to respect the API limits (16k max in D3D). D3D10-level hardware guarantees support for 8k textures, and D3D11-level hardware guarantees support for 16k textures.
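As a toy illustration of the indirection math above (my own sketch, using the post's numbers: 32x32-texel pages mean a 32k x 32k virtual texture needs only a 1k x 1k indirection texture):

```cpp
#include <cassert>

// Map a virtual texel coordinate to the indirection-texture entry that says
// where its page currently lives in the physical atlas. The atlas location
// itself would be read from that entry at runtime; this only computes which
// entry to read.
struct PageCoord { int x, y; };

PageCoord VirtualToPage(int virtualX, int virtualY, int pageSize = 32)
{
    return { virtualX / pageSize, virtualY / pageSize };
}
```

With pageSize = 32, virtual texel (32767, 32767) maps to indirection entry (1023, 1023), i.e. a 1k x 1k indirection texture covers the whole 32k virtual space.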
  7. Alternatively, you can just transform your point by your combined view + projection matrix and ensure that the resulting XYZ coordinates are between -W and +W (or 0 and +W for the Z component if using D3D conventions for your projection matrix).
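A minimal sketch of that clip-space test (my own illustration, assuming row-vector convention and D3D's 0-to-w depth range):

```cpp
#include <cassert>

// Transform a point by a combined view-projection matrix (row-vector
// convention: result = p * M) and test the clip-space containment
// described above: -w <= x,y <= w and 0 <= z <= w.
struct Float4 { float x, y, z, w; };

Float4 Transform(const Float4& p, const float m[4][4])
{
    return {
        p.x * m[0][0] + p.y * m[1][0] + p.z * m[2][0] + p.w * m[3][0],
        p.x * m[0][1] + p.y * m[1][1] + p.z * m[2][1] + p.w * m[3][1],
        p.x * m[0][2] + p.y * m[1][2] + p.z * m[2][2] + p.w * m[3][2],
        p.x * m[0][3] + p.y * m[1][3] + p.z * m[2][3] + p.w * m[3][3],
    };
}

bool InsideFrustum(const Float4& clip)
{
    return clip.x >= -clip.w && clip.x <= clip.w &&
           clip.y >= -clip.w && clip.y <= clip.w &&
           clip.z >= 0.0f    && clip.z <= clip.w;
}
```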
  8. Which API are you using to render on the GPU? It sounds like you're using Direct3D, but the various versions have different behavior when it comes to device states and multithreading. Either way you almost certainly don't want to use multiple devices, especially if you're sharing the same content among your various windows.
  9. Is this what you're looking for? https://msdn.microsoft.com/en-us/library/windows/desktop/bb173347(v=vs.85).aspx
  10. Yes, you'll need to sample your IBL cubemap for each layer. This is because each layer will have a different normal, roughness, and specular reflectance, which means you'll need to sample the cubemap with a different reflection vector and mip level.
  11. DX12 and threading

    That's exactly what should be happening: that's the point where the CPU waits for the GPU to finish the previous frame. You always need to wait in order to make sure that you don't overwrite a command buffer that the GPU is still reading from. For instance, say the CPU is submitting frame 60 and the GPU is working on frame 59. The CPU will have generated command buffers using command allocator index 0, and the GPU is consuming command buffers from allocator index 1. If the CPU doesn't wait for the GPU to finish the previous frame and starts writing to a command buffer using allocator index 0, it will write to data that the GPU is reading from. If you're GPU-bound (the GPU is taking longer than the CPU to complete a frame), then you should expect to spend some time waiting on the fence. To be more precise, if the GPU is taking N milliseconds to present a frame and it's taking the CPU M milliseconds to process a frame and submit it to the GPU, then you'll end up waiting ~N-M milliseconds for the fence to be signaled. So if the GPU is VSYNC'ed at 16.6ms and it only takes you 1ms to submit a frame on the CPU, you'll spend ~15.6ms waiting for the fence.
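The N-M relationship can be written down directly (a trivial sketch of my own, just to make the arithmetic concrete):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// If the GPU takes 'gpuMs' to complete a frame and the CPU takes 'cpuMs' to
// build and submit one, the CPU ends up blocked on the fence for roughly the
// difference (and not at all when the CPU is the slower side).
float ExpectedFenceWaitMs(float gpuMs, float cpuMs)
{
    return std::max(0.0f, gpuMs - cpuMs);
}
```

The post's example: a VSYNC'ed GPU at 16.6ms with a 1ms CPU frame gives ~15.6ms of fence waiting.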
  12. DX12 and threading

    The code that I posted will let the CPU get no more than 1 frame ahead of the GPU. After the CPU submits command lists to the direct queue, it waits for the previous GPU frame to finish. So if the GPU is taking more time to complete a frame than the CPU is (or if VSYNC is enabled), the CPU will be effectively throttled by the fence and will stay tied to the GPU's effective framerate. In my experience, frame pacing issues usually come from situations where the time delta being used for updating the game's simulation doesn't match the rate at which frames are actually presented on the screen. This can happen very easily if you use the length of the previous frame as your delta for the next frame. When you do this, you're basically saying "I expect the next frame to take just as long to update and render as the previous frame". This assumption will hold when you're locked at a steady framerate (usually due to VSYNC), but if your framerate is erratic then you will likely have mismatches between your simulation time delta and the actual frame time. It can be especially bad when missing VSYNC, since your frame times may go from 16.6ms up to 33.3ms, and perhaps oscillate back and forth. I would suggest the following for mitigating this issue:

    • Enable VSYNC, and never miss a frame! This will give you 100% smooth results, but obviously it's much easier said than done.
    • Detect when you're not making VSYNC, and increase the sync interval to 2. This will effectively halve your framerate (for instance, you'll go from 60Hz to 30Hz on a 60Hz display), but that may be preferable to "mostly" making full framerate with frequent dips.
    • Alternatively, disable VSYNC when you're not quite making it. This is common on consoles, where you have the ability to do this much better than you do on PC. It's good for when you're just barely missing your VSYNC rate, since in that case most of the screen will still get updated at full rate (however there will be a horizontal tear line). It will also keep you from dropping to half the VSYNC rate, which will reduce the error in your time delta assumption.
    • Triple buffering can give you results similar to disabling VSYNC while also preventing tearing (note that non-fullscreen D3D apps on Windows are effectively triple-buffered by default, since they go through the desktop compositor).
    • You could also try filtering your time deltas a bit to keep them from getting too erratic when you don't make VSYNC. I've never tried this myself, but it's possible that having more consistent but smaller errors in your time delta is better than less frequent but larger errors.

    Hopefully someone else can chime in with more thoughts if they have experience with this. I haven't really done any specific research or experimentation with this issue outside of making games feel good when they ship, so don't consider me an authority on it.
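One possible shape for the delta-filtering idea (my own assumption about how to implement it, not something from the post) is a simple exponential moving average, which trades a bit of lag for smoothness:

```cpp
#include <cassert>
#include <cmath>

// Exponential moving average over frame-time deltas. A small alpha smooths
// erratic frame times heavily; alpha = 1 passes the raw delta through.
struct DeltaFilter
{
    float smoothed = 0.0f;
    bool first = true;

    float Update(float rawDelta, float alpha = 0.2f)
    {
        if (first) { smoothed = rawDelta; first = false; }
        else       { smoothed += alpha * (rawDelta - smoothed); }
        return smoothed;
    }
};
```

For example, a spike from 16.6ms to 33.3ms only moves the filtered delta to ~19.9ms with alpha = 0.2, so the simulation step stays close to the common case instead of oscillating.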
  13. That sounds like a bug in Nvidia's driver. In D3D11 the results of UAV writes should be visible to all other pipeline stages after the Dispatch completes, regardless of flags and whether or not you've used atomic operations.
  14. For The Order our compositing system was totally based on parameter blending, much in the same spirit as Disney's material system that was presented at SIGGRAPH 2012. The only exception was when compositing cloth-based materials, in which case we would evaluate lighting with both our cloth BRDF as well as with GGX specular, and then perform a simple linear blend of the two results. As far as I know UE4 is generally doing parameter blending as well, but I've never worked with that engine so I'm not very familiar with the specifics. As you've already figured out, you can't really simulate true multi-layer materials with parameter blending alone. To do it "for real", you have to make some attempt at modeling the light transport through the various translucent layers (much like they do in that COD presentation). This generally requires some approximation of volume rendering so that you can compute the amount of light absorbed as it travels through the medium. For something like car paint, at the minimum you'll need to compute your specular twice: once for the light being reflected off of the clear coat layer, and again for the light being reflected off of the actual metallic paint flecks. I'd probably start out with something like this:

    • Compute the specular reflection off of the clear coat using the roughness and IOR (F0 specular intensity)
    • Compute the amount of light transmitted (refracted) into the clear coat using the Fresnel equations and IOR
    • Compute the intensity and direction of the view direction as it refracts into the clear coat using the Fresnel equations and IOR
    • Compute the specular reflection off of the metallic layer using separate roughness and IOR (and perhaps a separate normal map), and also using the refracted view direction
    • Final result is ClearCoatSpecular + MetallicSpecular * LightTransmitAmt * ViewTransmitAmt

This would not account for absorption in the clear coat, but it would give you the characteristic dual specular lobes. If you wanted to account for absorption, you could compute a travel distance through the clear coat by having a "thickness" parameter and computing the intersection of the light/view ray with an imaginary surface located parallel to the outer surface. Using that distance you could evaluate the Beer-Lambert equation and use the result to modulate your transmittance values.
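As a toy scalar version of that layering (my own sketch; the specular terms and transmission amounts are plain inputs here, where a real shader would derive them from its BRDF and the Fresnel equations, and the absorption term uses Beer-Lambert transmittance = exp(-absorption * distance)):

```cpp
#include <cassert>
#include <cmath>

// Combine the two specular lobes: the clear coat reflection, plus the
// metallic-layer reflection attenuated by the light/view transmission
// amounts and by Beer-Lambert absorption through the coat.
float LayeredSpecular(float clearCoatSpec, float metallicSpec,
                      float lightTransmit, float viewTransmit,
                      float absorption, float thickness)
{
    float beerLambert = std::exp(-absorption * thickness);
    return clearCoatSpec + metallicSpec * lightTransmit * viewTransmit * beerLambert;
}
```

With absorption set to 0 this reduces to the ClearCoatSpecular + MetallicSpecular * LightTransmitAmt * ViewTransmitAmt formula from the list above; as absorption grows, only the clear coat lobe survives.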
  15. DX12 and threading

    That's not quite what I meant. You'll still want to signal your fence and wait on it every frame, you just need to wait on the value one frame later. The first frame you don't need to wait because there was no "previous" frame, but you do need to wait for every frame after that. Here's what my code looks like, minus a few things that aren't relevant:

    void EndFrame(IDXGISwapChain4* swapChain, uint32 syncIntervals)
    {
        DXCall(CmdList->Close());

        ID3D12CommandList* commandLists[] = { CmdList };
        GfxQueue->ExecuteCommandLists(ArraySize_(commandLists), commandLists);

        // Present the frame.
        DXCall(swapChain->Present(syncIntervals, syncIntervals == 0 ? DXGI_PRESENT_ALLOW_TEARING : 0));

        ++CurrentCPUFrame;

        // Signal the fence with the current frame number, so that we can check back on it
        FrameFence.Signal(GfxQueue, CurrentCPUFrame);

        // Wait for the GPU to catch up before we stomp an executing command buffer
        const uint64 gpuLag = DX12::CurrentCPUFrame - DX12::CurrentGPUFrame;
        Assert_(gpuLag <= DX12::RenderLatency);
        if(gpuLag >= DX12::RenderLatency)
        {
            // Make sure that the previous frame is finished
            FrameFence.Wait(DX12::CurrentGPUFrame + 1);
            ++DX12::CurrentGPUFrame;
        }

        CurrFrameIdx = DX12::CurrentCPUFrame % NumCmdAllocators;

        // Prepare the command buffers to be used for the next frame
        DXCall(CmdAllocators[CurrFrameIdx]->Reset());
        DXCall(CmdList->Reset(CmdAllocators[CurrFrameIdx], nullptr));
    }