
MJP

Moderator
  • Content count

    8566
  • Joined

  • Last visited

Community Reputation

19878 Excellent

1 Follower

About MJP

  • Rank
    XNA/DirectX Moderator & MVP

Personal Information

Social

  • Twitter
    @MyNameIsMJP
  • Github
    TheRealMJP
  1. You have to compute the diffuse albedo (Cdiff) and specular albedo (F0) from the base color like this:

        float3 diffuseAlbedo = lerp(baseColor.xyz, 0.0f, metallic);
        float3 specularAlbedo = lerp(0.03f, baseColor.xyz, metallic);

    The basic idea is that non-metals have diffuse lighting, non-colored specular, and a small F0, while metals have no diffuse, colored specular, and a potentially much higher F0. Cavity is usually just multiplied with the final result of the specular term (f(l, v)). It's essentially an occlusion term that represents how much of the specular is blocked by the surface.
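
    Here's a minimal HLSL sketch of how those pieces could fit together for a single light, just to show where cavity lands. Everything here is an assumption for illustration: the stand-in specularBRDF() is a simple normalized Blinn-Phong placeholder, and N/L/V, roughness, cavity, and lightColor are whatever your shader already has.

        // Stand-in specular term so this compiles; swap in your actual microfacet BRDF
        float3 specularBRDF(in float3 N, in float3 L, in float3 V, in float roughness, in float3 specularAlbedo)
        {
            float3 H = normalize(L + V);
            float r = max(roughness, 0.01f);
            float specPower = max(2.0f / (r * r) - 2.0f, 1.0f);
            float normalization = (specPower + 8.0f) / (8.0f * 3.141592654f);
            return specularAlbedo * normalization * pow(saturate(dot(N, H)), specPower);
        }

        // Direct lighting for one light; cavity only occludes the specular term
        float3 DirectLight(in float3 N, in float3 L, in float3 V, in float roughness,
                           in float3 diffuseAlbedo, in float3 specularAlbedo,
                           in float cavity, in float3 lightColor)
        {
            float nDotL = saturate(dot(N, L));
            float3 fDiffuse = diffuseAlbedo * (1.0f / 3.141592654f);
            float3 fSpecular = specularBRDF(N, L, V, roughness, specularAlbedo) * cavity;
            return (fDiffuse + fSpecular) * nDotL * lightColor;
        }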
  2. You can't, unfortunately. You either have to accept having a fully expanded vertex buffer (which can be MUCH bigger than your original indexed VB), or you need to do a separate pass. Neither is ideal, really. There is another option, but only if you're running on Windows 8 or higher: UAVs from the vertex shader. All you need to do is bind a structured buffer (or any kind of buffer) as a UAV, and then write to it from your vertex shader using SV_VertexID as the index.
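
    A rough sketch of what the shader side of that can look like. The vertex layout, constant buffer, and register slot here are assumptions; in D3D11.1 the UAV would be bound with OMSetRenderTargetsAndUnorderedAccessViews, and UAV slots come after the render target slots (hence u1 with a single render target).

        struct StreamedVertex
        {
            float3 PositionWS;
            float3 NormalWS;
        };

        // UAV slot u1 assumes one render target bound in slot 0
        RWStructuredBuffer<StreamedVertex> StreamOutput : register(u1);

        cbuffer VSConstants : register(b0)
        {
            float4x4 World;
            float4x4 WorldViewProjection;
        };

        struct VSInput
        {
            float3 Position : POSITION;
            float3 Normal : NORMAL;
        };

        float4 VSMain(in VSInput input, in uint vertexID : SV_VertexID) : SV_Position
        {
            // Write the transformed vertex out through the UAV, indexed by SV_VertexID
            StreamedVertex v;
            v.PositionWS = mul(float4(input.Position, 1.0f), World).xyz;
            v.NormalWS = normalize(mul(input.Normal, (float3x3)World));
            StreamOutput[vertexID] = v;

            // Normal vertex shader output
            return mul(float4(input.Position, 1.0f), WorldViewProjection);
        }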
  3. DX12 Descriptor Resource Sets

    Long term, you definitely don't want to be constantly copying around descriptors into tables if you want the best possible performance. So yeah, assuming that you mean "descriptor table" instead of "descriptor heap", then #3 sounds like it's the closest to that ideal. If you do it that way, then you can basically follow the general practice of constant buffers, and group your descriptors based on update frequency. Ideally you would want to be able to identify descriptor tables that never change, and only build them once.

    There's another (crazier) approach, which is what we use at work, but it requires at least RESOURCE_BINDING_TIER_2 (so no Kepler or old Intel GPUs). Basically the idea is that you go full "bindless": instead of having contiguous tables of descriptors, you have one great big table and expose that to all of your shaders. Then each shader has access to a big unbounded texture array, and it samples a texture by using an integer index to grab the right texture from the array. Then "binding" can be as simple as filling a constant buffer with a bunch of uints. There are a bunch of wrinkles in practice, but I assure you it works. You can also do some really fun things with this setup once you have it working, since it basically gives your shaders arbitrary access to any texture or buffer, with the ability to use any data structure you want to store handles.
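
    A minimal sketch of the shader side of that bindless setup. The register/space assignments and the material constant buffer layout are just assumptions for illustration, and the unbounded array plus NonUniformResourceIndex need SM 5.1+ dynamic indexing.

        // One giant descriptor table exposed as an unbounded texture array
        Texture2D MaterialTextures[] : register(t0, space1);
        SamplerState LinearSampler : register(s0);

        cbuffer MaterialConstants : register(b0)
        {
            uint AlbedoMapIndex;
            uint NormalMapIndex;
        };

        float4 PSMain(in float4 position : SV_Position, in float2 uv : TEXCOORD0) : SV_Target0
        {
            // "Binding" a texture is just reading an integer handle out of a constant buffer
            return MaterialTextures[NonUniformResourceIndex(AlbedoMapIndex)].Sample(LinearSampler, uv);
        }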
  4. I just wanted to chime in on a few things, since I've lost too much of my time to this particular subject. I'm sure plenty of games still bake the direct contribution for at least some of their lights. We certainly did this for The Order, and did it again for Lone Echo. Each of our lights has flags that control whether or not the diffuse and indirect lighting is baked, so the lighting artists could choose to fully bake the light if it was unimportant and/or it would only ever need to affect static geometry. We also always baked area lights, since we didn't have run-time support for area lights on either of those games. For the sun shadows we also bake the shadow term to a separate lightmap. We'll typically use this for surfaces that are past the last cascade of dynamic runtime shadows, so that they still have something to fall back on. Here's a video if you want to see what it looks like in practice: https://youtu.be/zxPuZYMIzuQ?t=5059.

    It's common to store irradiance in a lightmap (or possibly a distribution of irradiance values about a hemisphere in modern games), but if you want to compute a specular term then you need to store a radiance distribution in your lightmap. Radiance tells you "if I pick a direction, how much light is coming in from that direction?", while irradiance tells you "if I have a surface with a normal oriented in this direction, what's the total amount of cosine-weighted light hitting the surface?". You can use irradiance to reconstruct Lambertian diffuse (since the BRDF is just a constant term), but that's about it. Any more complicated BRDFs, including specular BRDFs, require that you calculate Integral(radiance * BRDF) for all directions on the hemisphere surrounding the surface you're shading. How to do this efficiently completely depends on the basis function that you use to approximate radiance in your lightmap.

    If you want SH but only on a hemisphere, then you can check out H-basis. It's basically SH reformulated to only exist on the hemisphere surrounding the Z axis, and there's a simple conversion from SH -> H-basis. You can also project directly into H-basis if you want to. I have some shader code here for projecting and converting. You can also do a least-squares fitting on SH to give you coefficients that are optimized for the upper hemisphere. That said, I'm sure you would be fine with the Last of Us approach of ambient + dominant direction (I believe they kept using that on Uncharted 4), but it's nice to know all of your options before making a decision.

    You don't necessarily have to store directions for a set of SGs in a lightmap. We assume a fixed set of directions in tangent space, which saves on storage and makes the solve easier. But that's really the nicest part of SGs: you have a lot of flexibility in how you can use them, as opposed to SH, which has a fixed set of orthogonal basis functions. For instance you could store a direction for one SG, and use implicit directions for the rest.
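
    To illustrate the irradiance-vs-radiance point, here's a tiny HLSL sketch of the Lambertian case, where the constant BRDF pulls out of the hemisphere integral and stored irradiance is all you need. The texture and sampler names are made up, and some engines pre-fold the 1/Pi into the bake, so treat the scaling as an assumption.

        Texture2D IrradianceLightmap : register(t0);
        SamplerState LinearSampler : register(s0);

        static const float Pi = 3.141592654f;

        // Lambertian diffuse from an irradiance lightmap: the BRDF is the constant
        // diffuseAlbedo / Pi, so it pulls out of the integral over the hemisphere.
        // Any other BRDF (specular in particular) needs the full integral of
        // BRDF(l, v) * radiance(l) * cos(theta), which is why the lightmap then
        // has to store a radiance distribution (SH, H-basis, SGs, etc.).
        float3 DiffuseFromLightmap(in float2 lightmapUV, in float3 diffuseAlbedo)
        {
            float3 irradiance = IrradianceLightmap.Sample(LinearSampler, lightmapUV).xyz;
            return diffuseAlbedo * (1.0f / Pi) * irradiance;
        }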
  5. Whoops, sorry about that! That value is what you're going to compare against the depth value in the shadow map, so in your case you want to use "lightDepthValue". I'll update the code that I posted in case anybody else copies/pastes it. The "offset" parameter is optional, so you can ignore that. It will offset the location where the texture is sampled by a number of texels equal to that parameter.
  6. I haven't done this myself, but couldn't you just multiply the V coordinate by -1 and then add 1? That should work even for tiled UV's. The V coordinate will often be negative, but that's fine since you're probably going to use a signed representation anyway for your UV's.
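
    Something like this, as a sketch:

        // Flip the V direction; works for tiled UVs too since only the sign/offset changes
        float2 FlipV(in float2 uv)
        {
            return float2(uv.x, uv.y * -1.0f + 1.0f);   // same as 1.0f - uv.y
        }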
  7. Is there anywhere in your code where you're setting device states before drawing your 3D square? There are several states that you can set on the context that will affect your rendering, such as blend state, depth/stencil state, rasterizer state, input layout, and vertex/index buffers, and I don't see you setting those anywhere in the code you've provided. SpriteBatch will set those states in order to do its thing (there's a list of the states that it sets here, under the section called "State management"). You'll want to make sure that you set all of the states that you need before issuing your draw call in order to ensure proper results. One thing that you can do to help with this is to call ID3D11DeviceContext::ClearState at the beginning of every frame, which will set the context back to a default state. I would also recommend enabling the debug validation layer when you create your device (but only in debug builds), and checking any warnings or errors that it reports. Another thing that can help with these kinds of issues is to use a debugging tool like RenderDoc, which will let you inspect the device state at the time of a particular draw call.
  8. I'm not sure that I completely understand what you're trying to do here. Are you trying to add a penumbra to your shadow, so that the shadows don't have hard edges? If so, then the standard way to do this with shadow maps is to use percentage closer filtering (PCF for short). In very simple terms, PCF amounts to sampling the shadow map several times around a small region of the shadow map, performing the depth comparison for each sample, and then computing a single result by applying a filter kernel (the simplest filter kernel being a box filter, where you essentially just compute the average of all of the results).

    The easiest way to get started with PCF is to let the hardware perform automatic 2x2 bilinear filtering of the comparison results for you. You'll have to make a few changes to your code to do this:

    • Create a special "comparison" sampler state to use for sampling your shadow depth map. You do this by specifying "D3D11_COMPARISON_LESS_EQUAL" as the "ComparisonFunc" member of the D3D11_SAMPLER_DESC structure. This specifies that the hardware should return 1 when the passed-in surface depth value is <= the shadow map depth value stored in the texture. You'll also want to use "D3D11_FILTER_COMPARISON_MIN_MAG_MIP_LINEAR" to specify that you want 2x2 bilinear filtering when you sample.
    • In your shader code, declare your shadow sampler state with the type "SamplerComparisonState" instead of "SamplerState".
    • Change your shader code to use SampleCmp instead of Sample. SampleCmp will return the filtered comparison result instead of the shadow map depth value.

    So you'll also want to restructure your code so that it looks something like this:

        SamplerComparisonState ShadowSampler;

        lightDepthValue = input.lightViewPositions[i].z / input.lightViewPositions[i].w;
        lightDepthValue = lightDepthValue - bias;

        float lightVisibility = shaderTextures[6 + i].SampleCmp(ShadowSampler, projectTexCoord, lightDepthValue);

        lightIntensity = saturate(dot(input.normal, normalize(input.lightPositions[i]))) * lightVisibility;
        color += (diffuseCols[i] * lightIntensity * 0.25f);

    Once you've got the hang of that and you want to look into more advanced filtering techniques, you can check out a blog post I wrote that talks about some of the most common ways to do shadow map filtering (or jump right to the code sample).
  9. ResolveSubresource unfortunately doesn't work for depth buffers. If you want to do it, you need to do it manually with a pixel shader that outputs to SV_Depth. You also probably wouldn't want the average of the sub-pixel depth values, since this wouldn't make much sense. Anyway, you don't need to copy the depth resource to do #2, as long as you're not writing to the depth buffer during your water pass. If you create a read-only depth-stencil view, then you can read from it using an SRV while the DSV is still bound to the pipeline.
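
    A sketch of what that manual resolve pixel shader could look like. The resource name is made up, and whether you want the min or the max of the sub-samples depends on your depth convention and what you're using the resolved depth for; either usually makes more sense than an average.

        Texture2DMS<float> DepthMapMS : register(t0);

        // Manual MSAA depth "resolve": write a single per-pixel depth out to SV_Depth
        float ResolveDepthPS(in float4 screenPos : SV_Position) : SV_Depth
        {
            uint2 pixelPos = uint2(screenPos.xy);

            uint width, height, numSamples;
            DepthMapMS.GetDimensions(width, height, numSamples);

            float depth = DepthMapMS.Load(pixelPos, 0);
            for(uint i = 1; i < numSamples; ++i)
                depth = max(depth, DepthMapMS.Load(pixelPos, i));

            return depth;
        }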
  10. A few branches to compute a texture array index really doesn't sound like a big deal to me. If you're dealing with atlasing of textures that are all the same size, then texture arrays are definitely the easiest way to do it. This is especially true when it comes to mipmaps (which you'll want for terrain textures), since texture arrays keep mips separate and therefore let you avoid the "bleeding" problems that you run into with traditional atlases. There are some caveats when it comes to dynamically updating an atlas at runtime from the CPU, which I can elaborate on if you can tell me which API you're planning on using for this.

    If you're curious, or you'd like to expand your atlas approach into something more generalized, you may want to look for some articles or presentations about virtual texturing. Virtual texturing is really a generalization of what you're proposing, and has been effectively used for terrain in games with large worlds (like the Battlefield series, or the Far Cry series). The typical approach that they use for the "figure out where to sample this pixel's texture from" problem is to have an indirection texture that's sampled first. So for instance, you might have a "virtual texture" that's 32k x 32k texels that represents all textures that could ever be referenced, but you only keep an 8k x 8k atlas of textures loaded. You would first sample the indirection texture to see where the virtual texture page is loaded into the atlas, and that would give you UV coordinates to use when sampling the atlas. So if your "page" size is 32x32, then your indirection texture would only need to be 1k x 1k. In practice it gets pretty complicated with mip mapping, since each mip will typically be packed separately in the atlas, which requires manual mip sampling + filtering in the pixel shader.

    There's also somewhat-recent hardware + API support for virtual textures, called "Tiled Resources" in D3D and "Sparse Textures" in GL/Vulkan. If you use that you can potentially skip the indirection texture and also remove the need for manual mip/anisotropic filtering in the pixel shader, but your virtual texture still has to respect the API limits (16k max in D3D). D3D10-level hardware guarantees support for 8k textures, and D3D11-level hardware guarantees support for 16k textures.
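
    A very stripped-down HLSL sketch of that indirection lookup. The page-table encoding, resource names, and sizes are all assumptions, and mip selection is sidestepped entirely by sampling level 0 (in a real implementation that's where most of the complexity lives).

        Texture2D<float4> PageTable : register(t0);   // 1k x 1k for a 32k virtual texture with 32-texel pages
        Texture2D PhysicalAtlas : register(t1);       // e.g. the resident 8k x 8k atlas
        SamplerState PointSampler : register(s0);
        SamplerState AtlasSampler : register(s1);

        static const float2 NumVirtualPages = float2(1024.0f, 1024.0f);

        float4 SampleVirtualTexture(in float2 virtualUV)
        {
            // xy = atlas UV of the page's corner, z = the page's size in atlas UV space
            float4 pageEntry = PageTable.SampleLevel(PointSampler, virtualUV, 0.0f);

            // Offset within the page, remapped into the physical atlas
            float2 inPageUV = frac(virtualUV * NumVirtualPages);
            float2 atlasUV = pageEntry.xy + inPageUV * pageEntry.z;

            return PhysicalAtlas.SampleLevel(AtlasSampler, atlasUV, 0.0f);
        }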
  11. Alternatively, you can just transform your point by your combined view + projection matrix and ensure that the resulting XYZ coordinates are between -W and +W (or 0 and +W for the Z component if using D3D conventions for your projection matrix).
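
    In other words, something along these lines (this assumes row-vector style mul() and a D3D-style projection matrix with z in [0, w]):

        bool PointInFrustum(in float3 positionWS, in float4x4 viewProjection)
        {
            float4 p = mul(float4(positionWS, 1.0f), viewProjection);
            return (p.x >= -p.w) && (p.x <= p.w) &&
                   (p.y >= -p.w) && (p.y <= p.w) &&
                   (p.z >= 0.0f) && (p.z <= p.w);
        }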
  12. Which API are you using to render on the GPU? It sounds like you're using Direct3D, but the various versions have different behavior when it comes to device states and multithreading. Either way you almost certainly don't want to use multiple devices, especially if you're sharing the same content among your various windows.
  13. Is this what you're looking for? https://msdn.microsoft.com/en-us/library/windows/desktop/bb173347(v=vs.85).aspx
  14. Yes, you'll need to sample your IBL cubemap for each layer. This is because each layer will have a different normal, roughness, and specular reflectance, which means you'll need to sample the cubemap with a different reflection vector and mip level.
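
    A minimal sketch of the per-layer lookup, assuming a pre-filtered specular cubemap where the mip level maps to roughness; the mip count and the roughness-to-mip mapping here are just placeholders.

        TextureCube SpecularIBLMap : register(t0);
        SamplerState IBLSampler : register(s0);

        // viewWS points from the surface towards the eye
        float3 SampleLayerIBL(in float3 normalWS, in float3 viewWS, in float roughness)
        {
            float3 reflectionWS = reflect(-viewWS, normalWS);

            const float numMips = 7.0f;                    // assumed mip chain length
            float mipLevel = roughness * (numMips - 1.0f); // assumed roughness -> mip mapping

            return SpecularIBLMap.SampleLevel(IBLSampler, reflectionWS, mipLevel).xyz;
        }

    You'd call something like this once per layer with that layer's normal and roughness, then weight each result by that layer's reflectance/Fresnel before compositing.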
  15. DX12 DX12 and threading

    That's exactly what should be happening: it's where the CPU waits for the GPU to finish the previous frame. You always need to wait in order to make sure that you don't overwrite a command buffer that the GPU is still reading from. For instance, say the CPU is submitting frame 60 and the GPU is working on frame 59. The CPU will have generated command buffers using command allocator index 0, and the GPU is consuming command buffers from allocator index 1. If the CPU doesn't wait for the GPU to finish the previous frame and instead moves straight on to frame 61, it will start writing to command buffers from allocator index 1, which is data that the GPU is still reading from. If you're GPU-bound (the GPU is taking longer than the CPU to complete a frame), then you should expect to spend some time waiting on the fence. To be more precise, if the GPU is taking N milliseconds to present a frame and it's taking the CPU M milliseconds to process a frame and submit it to the GPU, then you'll end up waiting ~N-M milliseconds for the fence to be signaled. So if the GPU is VSYNC'ed at 16.6ms and it only takes you 1ms to submit a frame on the CPU, you'll spend ~15.6ms waiting for the fence.