
About MJP

  • Rank
    XNA/DirectX Moderator & MVP

  1. Getting the cubemap face + UV coordinates from a direction vector is fairly simple. The largest component determines the face, and the other two components are then your UV's after you divide by the max component and remap them from [-1, 1] to [0, 1]. Here's some example code for you from one of my open-source projects:

```cpp
template<typename T> static XMVECTOR SampleCubemap(Float3 direction, const TextureData<T>& texData)
{
    Assert_(texData.NumSlices == 6);

    float maxComponent = std::max(std::max(std::abs(direction.x), std::abs(direction.y)), std::abs(direction.z));
    uint32 faceIdx = 0;
    Float2 uv = Float2(direction.y, direction.z);
    if(direction.x == maxComponent)
    {
        faceIdx = 0;
        uv = Float2(-direction.z, -direction.y) / direction.x;
    }
    else if(-direction.x == maxComponent)
    {
        faceIdx = 1;
        uv = Float2(direction.z, -direction.y) / -direction.x;
    }
    else if(direction.y == maxComponent)
    {
        faceIdx = 2;
        uv = Float2(direction.x, direction.z) / direction.y;
    }
    else if(-direction.y == maxComponent)
    {
        faceIdx = 3;
        uv = Float2(direction.x, -direction.z) / -direction.y;
    }
    else if(direction.z == maxComponent)
    {
        faceIdx = 4;
        uv = Float2(direction.x, -direction.y) / direction.z;
    }
    else if(-direction.z == maxComponent)
    {
        faceIdx = 5;
        uv = Float2(-direction.x, -direction.y) / -direction.z;
    }

    uv = uv * Float2(0.5f, 0.5f) + Float2(0.5f, 0.5f);

    return SampleTexture2D(uv, faceIdx, texData);
}
```

I don't think there's any simple matrix or transformation that will get you UV coordinates for a cubemap that's set up as a "cross". It would be easier if you had all of the faces laid out horizontally or vertically in cubemap face order (+X, -X, +Y, -Y, +Z, -Z), but if that's not possible then it should be possible to do a bit of extra computation to go from face index -> cross coordinates. From there doing bilinear filtering isn't too hard by just treating the texture as 2D, but smoothly filtering across cubemap faces requires all kinds of special logic.
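As a rough sketch of that "extra computation" from face index to cross coordinates: the function name, the 4x3 tile layout, and the nearest-texel rounding below are all my assumptions for illustration, not something from the post, so check the tile table against however your particular cross image is actually laid out.

```cpp
#include <cstdint>
#include <cassert>

struct CrossCoord { uint32_t x, y; };

// Hypothetical helper: converts a cubemap face index (D3D order: +X, -X, +Y, -Y,
// +Z, -Z) plus that face's [0, 1] UV into texel coordinates within a 4x3
// horizontal-cross texture, treating the cross as one big 2D image.
CrossCoord CubeFaceToCross(uint32_t faceIdx, float u, float v, uint32_t faceSize)
{
    assert(faceIdx < 6);
    // (column, row) of each face's tile in the 4x3 cross -- one common convention.
    static const uint32_t tiles[6][2] = {
        { 2, 1 },   // +X
        { 0, 1 },   // -X
        { 1, 0 },   // +Y
        { 1, 2 },   // -Y
        { 1, 1 },   // +Z
        { 3, 1 },   // -Z
    };
    CrossCoord coord;
    coord.x = tiles[faceIdx][0] * faceSize + uint32_t(u * (faceSize - 1) + 0.5f);
    coord.y = tiles[faceIdx][1] * faceSize + uint32_t(v * (faceSize - 1) + 0.5f);
    return coord;
}
```

From there you can do the 2D bilinear filtering mentioned above by fetching the four neighboring texels, as long as you stay inside a single face's tile.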
  2. DirectX Texture Tool

    Most (all?) of the features of the old DX Texture Tool are now built into Visual Studio (since 2012, IIRC). Just drag an image into VS and it will open in the image editor. RenderDoc also has a DDS/image viewer built in that supports drag-and-drop, but I don't think that it has any editing capabilities.
  3. As an alternative to Map/Unmap, you can also use UpdateSubresource1 as described in this article. That particular method also saves you from having to manually avoid writing to a buffer that the GPU is currently reading from, which is pretty dodgy to attempt in D3D11 anyway since you don't have explicit submission or fences.
  4. I would echo what galo1n mentioned about being careful with regards to which memory statistic you're looking at. Like any OS that uses virtual memory, Windows has quite a few statistics that you can query via task manager or Win32 API's, and they all have different meanings (and sometimes the difference is quite subtle). D3D resources will always allocate committed virtual memory via VirtualAlloc. This will increase the "commit size" in task manager, which is telling you the total amount of committed virtual memory in the process. By extension it will also increase the system total commit size that you can view under the "Performance" tab of task manager (it's the value labeled "Committed"). If you want to query these values programmatically, you can use GetProcessMemoryInfo for process-specific stats and GetPerformanceInfo for system-wide stats.

Committed memory has to be backed by either system memory or the page file, which means that your commit total is typically not equal to the physical RAM consumption of your process. To see that, you want to look at the private working set, which is visible in task manager under the Details and Performance tabs. The process and system totals are also returned by the functions I linked above. In general your working set will be a function of how much memory your program is actually accessing at any given time. So if you allocate a bunch of memory that you never use, it can get paged out to disk and it won't be reflected in your working set. However Windows tends to only page out data when it really needs to, so if you access some memory once and then never again, it can stay in your working set until somebody needs that physical memory. If your D3D resources are causing a large increase in your working set, then it's possible that you're over-committing the GPU's dedicated memory pool.
Either the Windows video memory manager (VidMM) or the video card driver (depending on the version of Windows that you're running) will automatically move GPU data to and from system memory if it can't keep the entire system's worth of resources resident in dedicated GPU memory. If this happens a lot it can really tank your performance, so you generally want to avoid it as much as possible. You can check for this by capturing your program with ETW and using GPUView, or by using the new PIX for Windows.

In general though you probably want to limit your committed memory as much as possible, even if your working set is low. Like I mentioned earlier it needs to be backed by either system memory or the page file, and if those are both exhausted your system will start to get very unhappy, and either you or the video card's user-mode driver will crash and burn. This unfortunately means that the system's physical memory amount, page file size, and memory usage of other programs will dictate how much memory you can use without crashing, which in turn means that users' higher performance settings may crash even if they have a nice GPU with lots of dedicated memory. On the last game I shipped there were a non-trivial number of crashes from users with high-end video cards who cranked their settings, but had their page file turned off!
  5. I would go with updating a constant buffer that contains the material parameters. It's very common to update a constant buffer before issuing a Draw, so drivers will try to optimize for that case. For DX11, you can see this article from Nvidia on how to optimize your constant buffer updates, and use D3D11.1 features in Windows 8.1 to make updating them even faster. On DX12/Vulkan/consoles updating a constant buffer is (almost) completely in your hands, and so you can potentially make it *really* fast.
  6. With a metallic workflow the specular reflectance is fixed for dielectrics (constant IOR), while for metals the specular reflectance is equal to the base color. So there's no need to store an IOR, since it's redundant. It also fits well with how most tools/engines have artists author material parameters, and is easily usable with Schlick's approximation. Naty Hoffman talks about this a bit in the section entitled "Fresnel Reflection" from these course notes.
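To make the relationship concrete, here's a minimal sketch in the spirit of the post: the type, the function names, and the 0.04 dielectric F0 are my assumptions (engines pick their own conventions), showing how a metallic workflow derives F0 and feeds it into Schlick's approximation.

```cpp
#include <cmath>
#include <algorithm>
#include <cassert>

struct Float3 { float x, y, z; };

inline Float3 Lerp(Float3 a, Float3 b, float t)
{
    return { a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t };
}

// For dielectrics (metallic == 0) F0 is a fixed 0.04; for metals (metallic == 1)
// F0 is the base color, so no IOR needs to be stored. In-between values blend.
Float3 ComputeF0(Float3 baseColor, float metallic)
{
    const Float3 dielectricF0 = { 0.04f, 0.04f, 0.04f };
    return Lerp(dielectricF0, baseColor, metallic);
}

// Schlick's approximation: F = F0 + (1 - F0) * (1 - cosTheta)^5,
// which goes to 1.0 (white) at grazing angles where cosTheta approaches 0.
Float3 FresnelSchlick(Float3 f0, float cosTheta)
{
    float f = std::pow(1.0f - std::min(std::max(cosTheta, 0.0f), 1.0f), 5.0f);
    const Float3 white = { 1.0f, 1.0f, 1.0f };
    return Lerp(f0, white, f);
}
```

Note how a single base color parameter serves double duty: diffuse reflectance for dielectrics, F0 specular reflectance for metals.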
  7. The "metallic" workflow comes from Disney's 2012 SIGGRAPH presentation about their physically based shading model (slides, course notes). Basically when metallic is 0 you treat your base color as the diffuse reflectance, with a fixed F0 ("head-on") specular reflectance of 0.03 or 0.04 (with Fresnel applied so that it goes to 1.0 at grazing angles). This gives you a typical dielectric with colored diffuse and white specular. However when metallic is 1.0 you instead use your base color as your F0 specular reflectance, with diffuse reflectance set to 0. This now gives you a typical metal, with colored specular and no diffuse. So that lets you represent both dielectrics and metals with a single set of 5 parameters (base color, metallic, and roughness), which is nice for deferred renderers and/or for packing those parameters into textures.

The 1/Pi factor in a Lambertian diffuse BRDF is essentially a normalization term that ensures that the surface doesn't reflect more energy than the amount that is incident to the surface (the irradiance). Imagine a white surface with diffuse reflectance of 1 that's in a completely white room (like the Construct from The Matrix), where the incoming radiance is 1.0 in every direction. If you compute the irradiance in this situation by integrating the cosine (N dot L) term over the entire hemisphere surrounding the surface normal, you get a value of Pi. Now let's say our diffuse BRDF is just Cdiff instead of Cdiff/Pi. To get the outgoing radiance in any viewing direction, you would compute Cdiff * irradiance. This would give you a value of Pi for Cdiff = 1.0, which means that the surface is reflecting a value of Pi in every viewing direction! In other words we have 1.0 coming in from every direction, but Pi going out! However if we use the proper Lambertian BRDF with the 1/Pi factor, we end up with 1.0 going out and all is well.
So yes, this means that if you have a red surface with Cdiff = [1, 0, 0] that's being lit by a directional light with irradiance of 1.0, then the surface will have an intensity of [1 / Pi, 0, 0]. However this should make sense if you consider that in this case the lighting is coming from a single direction, and is getting scattered in all directions on the hemisphere. So naturally there is less light in any particular viewing direction. To get a fully lit surface, you need to have constant incoming lighting from all directions on the hemisphere.
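The energy argument above can be checked numerically. This little helper is mine, not from the original post: it integrates the cosine term over the hemisphere with the midpoint rule and converges to Pi, which is exactly why the Lambertian BRDF needs the 1/Pi factor to conserve energy.

```cpp
#include <cmath>
#include <cassert>

// Numerically integrate cos(theta) over the hemisphere around the normal.
// The solid-angle measure is dOmega = sin(theta) dTheta dPhi, so the integrand
// is cos(theta) * sin(theta); the analytic answer is Pi.
double HemisphereCosineIntegral(int thetaSteps, int phiSteps)
{
    const double pi = 3.14159265358979323846;
    const double dTheta = (0.5 * pi) / thetaSteps;  // theta in [0, pi/2]
    const double dPhi = (2.0 * pi) / phiSteps;      // phi in [0, 2*pi]
    double sum = 0.0;
    for(int i = 0; i < thetaSteps; ++i)
    {
        double theta = (i + 0.5) * dTheta;          // midpoint rule
        for(int j = 0; j < phiSteps; ++j)
            sum += std::cos(theta) * std::sin(theta) * dTheta * dPhi;
    }
    return sum;  // converges to Pi
}
```

So with incoming radiance 1.0 everywhere, outgoing radiance is (Cdiff / Pi) * Pi = Cdiff, and nothing gets amplified.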
  8. DX11 Viewports required?

    Your vertex shader is expected to output vertices in "clip space", where X and Y are between -W and +W and Z is between 0 and +W, and everything outside of that range is clipped (see the section called "Viewport culling and clipping" in this article for more info, or google for "clip space" to find some more resources if you're curious). (Note that some diagrams use OpenGL conventions where Z is between -W and +W, whereas D3D specifies that Z is between 0 and W.) The rasterizer expects homogeneous coordinates where the final coordinate will be divided by W after interpolation, which is how you get to the [-1, -1, 0] -> [1, 1, 1] "normalized device coordinate" (NDC) space that you're referring to in your above post.

So with the details out of the way, let's say that we wanted to position a triangle so that it has one vertex at the top-middle of the screen, one at the right-middle of the screen, and one in the very center of the screen. The easiest way to do this is to use a value of 1.0 for W, which means we can specify the XYZ coordinates directly in [-1, -1, 0] -> [1, 1, 1] NDC space. So to get the triangle where we want, we could set the three vertices to (0, 1, 1, 1), (1, 0, 1, 1), (0, 0, 1, 1). If we do this, the triangle will be rasterized in the top-right quadrant of the screen, with all pixels having a z-buffer value of 1.0.

In practice, you usually don't calculate vertex coordinates in this way except in special circumstances (like drawing a full-screen quad). Instead you'll apply a projection matrix that takes us from camera-relative 3D space to a projected 2D space, where the resulting coordinates are perfectly set up to be in the clip space that I mentioned earlier. Projection matrices typically aren't too fancy: they're usually just a scale and a translation for X, Y, and Z, with either 1 or Z ending up in the W component. For 2D stuff like sprites, orthographic matrices are usually the weapon of choice.
For X and Y, an orthographic matrix will usually divide X and Y by the "width" and "height" of the projection, and possibly also shift them afterwards with a translation. If you think about it, this is a perfect way to automatically account for the viewport transform so that you can work in 2D coordinates. Let's say you wanted to work such that (0, 0) is the bottom left of the screen, and (ViewportWidth, ViewportHeight) is the top right. To go from this coordinate space to [-1, 1] NDC space, you would do something like this:

```hlsl
// Go from [0, VPSize] to [0, 1]
float2 posNDC = posVP / float2(VPWidth, VPHeight);

// Go from [0, 1] to [-1, 1]
posNDC = posNDC * 2.0f - 1.0f;
```

Now you can do this yourself in the vertex shader if you'd like, but if you carefully look at how an orthographic projection is set up you should see that you can use such a matrix to represent the transforms that I described above. You can even work an extra -1 into the Y component if you wanted to have your original coordinate space set up so that (0, 0) is the top-left corner of the screen, which is typical for 2D and UI coordinate systems.

Perspective projections are a little more complicated, and aren't really set up for 2D operations. Instead they create perspective effects by scaling X and Y according to Z, so that things appear smaller as they get further from the camera. But your typical symmetrical perspective projection is still doing roughly the same thing as an orthographic projection, in that it's applying a scale and translation so that your coordinates will end up with (-W, -W) as the bottom left of the screen and (W, W) as the top right. One of the major differences is that an orthographic projection will typically always set W to 1.0, while a perspective projection will typically set it to the Z value of the coordinate before the projection was applied.
Then when homogeneous "divide-by-w" happens, coordinates with a higher Z value will end up being closer to 0, which makes geometry appear smaller as it gets further away from the camera.
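As a concrete illustration of the orthographic setup described above, here's a minimal sketch (the type and function names are mine, not from the post) that maps 2D screen coordinates with a top-left origin straight into D3D clip space, with the -1 folded into the Y scale as mentioned:

```cpp
#include <cmath>
#include <cassert>

struct Float4 { float x, y, z, w; };

// Maps (0, 0) at the top-left and (width, height) at the bottom-right of the
// screen into clip space. Since W is fixed at 1.0, the homogeneous divide is a
// no-op and these coordinates are also the final NDC values.
Float4 OrthoProject2D(float px, float py, float width, float height)
{
    Float4 clip;
    clip.x = (px / width)  *  2.0f - 1.0f;   // [0, W] -> [-1, 1]
    clip.y = (py / height) * -2.0f + 1.0f;   // [0, H] -> [1, -1], Y flipped
    clip.z = 0.0f;                           // 2D: everything on the near plane
    clip.w = 1.0f;                           // orthographic: W is always 1
    return clip;
}
```

In a real renderer this scale-and-translate would of course live in the orthographic projection matrix rather than in a standalone function, but the arithmetic is the same.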
  9. For Tier 2 your entire shader-visible UAV and CBV descriptor tables need to have valid descriptors in them. So if your root signature specifies a descriptor table with 8 UAV's and your shader only uses 4, then there still needs to be 8 valid UAV descriptors in the table that you specify with SetGraphicsRootDescriptorTable. NULL descriptors count as "valid" in this case, so you can fill up the rest of your table with NULL descriptors if you want. The validation layer will complain at you if you mess this up, but unfortunately it will only do this if you run on Tier 2 hardware. There's not a whole lot of DX12-capable Tier 2 hardware out there ever since Nvidia upgraded their Maxwell and Pascal GPU's to Tier 3.
  10. This is expected. The typical "stabilization" techniques for cascaded shadow maps only fix flickering for the case of completely static geometry. If the geometry moves or otherwise animates, you'll get flickering due to changes in how the geometry rasterizes into the depth map. The only way to avoid this is to apply filtering when sampling the shadow map (or potentially pre-filter, if you're using a technique like VSM that supports pre-filtering).
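To sketch what "filtering when sampling" means here, below is a CPU-side illustration of percentage-closer filtering (PCF), one common choice. The function name and the fixed 3x3 kernel are my assumptions; a real implementation does this in the pixel shader with hardware comparison samplers.

```cpp
#include <algorithm>
#include <cmath>
#include <cassert>

// Averages nine binary depth comparisons around (x, y). Each tap asks "is the
// receiver closer than the stored occluder depth?", and the average produces a
// soft [0, 1] shadow factor instead of a hard, flickery edge.
float PCF3x3(const float* shadowMap, int width, int height, int x, int y, float receiverDepth)
{
    float lit = 0.0f;
    for(int dy = -1; dy <= 1; ++dy)
    {
        for(int dx = -1; dx <= 1; ++dx)
        {
            // Clamp taps to the edge of the shadow map.
            int sx = std::min(std::max(x + dx, 0), width - 1);
            int sy = std::min(std::max(y + dy, 0), height - 1);
            lit += (receiverDepth <= shadowMap[sy * width + sx]) ? 1.0f : 0.0f;
        }
    }
    return lit / 9.0f;
}
```

Wider kernels (and per-frame jittered tap offsets) trade more samples for smoother results; pre-filterable representations like VSM avoid the per-tap comparisons entirely.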
  11. Just go ahead and look at an image of a cone under perspective projection: it's very clearly not a triangle in 2D. You could possibly fit a triangle to the cone, but I don't know if there's a (cheap) algorithm for doing that.
  12. As others have alluded to, there's a layer of abstraction here that you're not accounting for. fxc outputs DXBC, which contains instructions that use a virtual ISA. So it's basically emitting code for a virtual GPU that's expected to behave according to rules defined in the spec for that virtual ISA. The GPU and its driver are then responsible for taking that virtual ISA, JIT-compiling it to native instructions that the GPU can understand (which they sometimes do by going through multiple intermediate steps!), and then executing those instructions in a way that keeps the final shader results consistent with what the virtual machine would have produced. This gives GPU's tons of leeway in how they can design their hardware and their instructions, which is kind of the whole point of having a virtual ISA. The most famous example is probably recent Nvidia and AMD GPU's, which have their threads work in terms of scalar instructions instead of the 4-wide instructions that are used in DXBC. Ultimately what this means for you is that you often can only reason about things in terms of the DXBC virtual ISA. The IHV's will often provide tools that can show you the final hardware-specific instructions for reference, which can occasionally help guide you in terms of writing more optimal code for a particular generation of hardware. But in the long run hardware can change in all kinds of ways, and you can never make assumptions about how hardware will compute the results of your shader program. That said, the first thing you should do is compile your shader and look at the resulting DXBC. In the case of loading from an R8_UINT texture and XOR'ing it, it's probably just going to load the integer data into a single component of one of its 16-byte registers and then perform an xor instruction on that.
Depending on what else is going on in your program you might or might not have other data packed into the same register, and the compiler may or may not merge multiple scalar operations into a 2, 3, or 4-way vector operation. But again, this can have little bearing on the actual instructions executed by the hardware. In general, I think that your worries about "packing parallel XOR's" are a little misplaced in terms of modern GPU's. GPU's will typically use SIMT-style execution, where a single "thread" running a shader program will run on a single lane of a SIMD unit. So as long as you have lots of threads executing (pixels being shaded, in your case) the XOR will pretty much always be run in parallel across wide SIMD units as a matter of course.
  13. DX11 Constant buffer and names?

    In DXBC assembly, constant buffers are made up of "elements" that are 16 bytes wide. So the constant buffer will always be made up of N elements, where the total size is then 16 * N bytes. This is why you have to create your constant buffers rounded up to the next multiple of 16 bytes when you call CreateBuffer(). This is also the reason for trying to pack vector types so that they don't cross 16-byte element boundaries. DXBC is basically a virtual ISA that works in terms of 4-component vectors, which means that registers and instructions can typically work with 4 values at a time. This applies to constant buffers as well, where each element is a 16-byte value that can be treated as a 4-component vector, and can be used in instructions as if it were a register. As an example, let's look at a simple shader and its resulting DXBC output from the compiler:

```hlsl
cbuffer MyConstants
{
    float4 MyValue;
};

float4 PSMain() : SV_Target0
{
    return MyValue * 8.0f;
}

// ps_5_0
// dcl_globalFlags refactoringAllowed
// dcl_constantbuffer CB0[1], immediateIndexed
// dcl_output o0.xyzw
// mul o0.xyzw, cb0[0].xyzw, l(8.000000, 8.000000, 8.000000, 8.000000)
// ret
```

You'll see that the whole program is really just a single instruction, where it basically says "multiply the first float4 element from the constant buffer with 8.0". Since "MyValue" is a float4 and lines up exactly with a constant buffer "element", the DXBC assembly can reference all of that data and multiply it with a single instruction.
Now let's try another example where we split up "MyValue" so that it straddles a 16-byte boundary, which causes it to be located in two different constant buffer elements:

```hlsl
cbuffer MyConstants
{
    float3 SomeOtherValue;
    float MyValue_X;
    float3 MyValue_XYZ;
};

float4 PSMain() : SV_Target0
{
    return float4(MyValue_X, MyValue_XYZ) * 8.0f;
}

// ps_5_0
// dcl_globalFlags refactoringAllowed
// dcl_constantbuffer CB0[2], immediateIndexed
// dcl_output o0.xyzw
// mul o0.x, cb0[0].w, l(8.000000)
// mul o0.yzw, cb0[1].xxyz, l(0.000000, 8.000000, 8.000000, 8.000000)
// ret
```

In this case the compiler has to emit two separate instructions to perform the multiply, since an instruction can only use a single constant buffer element as an operand. Do keep in mind that this is all rather specific to the particulars of DXBC's virtual ISA, which can be (and very often is) very different from the actual native instructions executed by the GPU. For example, Nvidia and AMD long ago dropped the notion of vector instructions within a single execution thread, and instead only work with scalar operations. So in that case a float4 multiply will always expand out to 4 individual instructions, and it doesn't necessarily gain them anything to have the source data aligned to a 16-byte boundary in the constant buffer. The new open-source DirectX shader compiler (dxc) has a completely different (scalar) output format, and so they might even change the packing rules for that compiler in the future.
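As a small CPU-side illustration of the CreateBuffer() rounding rule mentioned above, here's a hypothetical helper (the name is mine) that rounds a byte size up to the next multiple of 16:

```cpp
#include <cstdint>
#include <cassert>

// D3D11 constant buffer sizes must be a multiple of 16 bytes (one DXBC
// "element"), so CPU-side code typically rounds the struct size up before
// passing it to CreateBuffer. Adding 15 then masking off the low 4 bits
// rounds up to the next multiple of 16 (and leaves exact multiples alone).
constexpr uint32_t AlignConstantBufferSize(uint32_t sizeInBytes)
{
    return (sizeInBytes + 15) & ~uint32_t(15);
}
```

For example, a 36-byte constants struct would need a 48-byte buffer.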
  14. There are many variants of "tile based shading", so you'll have to adapt your culling based on your particular approach. If you're only using a single Z partition that isn't fit to the depth buffer, then your sub-frusta will extend all the way from the camera's near clipping plane to the far clipping plane. In such a case your sub-frusta will typically be very long and skinny, in which case a bounding sphere can be a poor fit. Also, a 3D cone is not a triangle under perspective projection, so your 2D approach wouldn't work.
  15. If you have the corners of your frustum slice, then you should be able to fit a bounding sphere to those points using any of the common sphere-from-points algorithms. Depending on what projection matrix you use and how you're generating your sub-frusta you may be able to make some assumptions that you can use to quickly compute your sphere, but you have to be careful with that. In the typical case of perspective projection matrices, the sub-frusta will be non-symmetrical (they will appear skewed towards the corners of the screen), and the points on the rear plane will be more spread out than the points on the front plane.
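As one minimal sphere-from-points sketch (types and names are mine, not from the post): center the sphere on the centroid of the corner points and take the radius as the distance to the farthest point. This is not the minimal bounding sphere — Ritter's or Welzl's algorithms give tighter results — but it is a valid, conservative fit for the 8 corners of a frustum slice.

```cpp
#include <cmath>
#include <cstddef>
#include <cassert>

struct Vec3 { float x, y, z; };
struct Sphere { Vec3 center; float radius; };

// Conservative bounding sphere: centroid of the points, radius = max distance.
Sphere SphereFromPoints(const Vec3* points, size_t count)
{
    Vec3 centroid = { 0.0f, 0.0f, 0.0f };
    for(size_t i = 0; i < count; ++i)
    {
        centroid.x += points[i].x;
        centroid.y += points[i].y;
        centroid.z += points[i].z;
    }
    centroid.x /= count;
    centroid.y /= count;
    centroid.z /= count;

    float maxDistSq = 0.0f;
    for(size_t i = 0; i < count; ++i)
    {
        float dx = points[i].x - centroid.x;
        float dy = points[i].y - centroid.y;
        float dz = points[i].z - centroid.z;
        maxDistSq = std::max(maxDistSq, dx * dx + dy * dy + dz * dz);
    }
    return { centroid, std::sqrt(maxDistSq) };
}
```

Because the rear-plane corners are more spread out than the front-plane corners, the centroid (and therefore the sphere center) ends up biased towards the rear of the slice, which is usually fine for light culling.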