

MJP

Member Since 29 Mar 2007

#5233908 Roughness in a Reflection

Posted by MJP on 09 June 2015 - 03:05 PM

Like swiftcoder mentioned, the common way to do this is to pick a roughness per mip level and pre-convolve each mip with the corresponding specular lobe. As roughness increases you end up with lower-frequency information (AKA blurrier data), and so it makes sense to use the lower-resolution mip levels for higher roughnesses.

If you want to try to approximate more physically-based BRDF's, then you have to go further and try to incorporate a Fresnel term as well as a geometry term. Unfortunately you can't pre-integrate those terms into a cubemap, and so you have to use approximations in order to try to get the correct reflection intensity. There are two course presentations from SIGGRAPH 2013 that go into the details of doing this: the Black Ops II presentation, and the Unreal Engine 4 presentation.
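As a rough sketch of the sampling side (not tested, and SpecularCubemap, LinearSampler, and numMipLevels are just placeholder names), you map roughness to a mip level and fetch from the pre-convolved cubemap with SampleLevel. How you map roughness to mip depends on how you convolved the mips in the first place; a straight linear mapping is shown here just for illustration:

TextureCube SpecularCubemap : register(t0);
SamplerState LinearSampler : register(s0);

float3 SamplePreConvolvedSpecular(in float3 reflectionDir, in float roughness, in float numMipLevels)
{
    // Mip 0 holds the lowest-roughness (mirror-like) convolution, and the
    // last mip holds the highest-roughness one
    float mipLevel = roughness * (numMipLevels - 1.0f);
    return SpecularCubemap.SampleLevel(LinearSampler, reflectionDir, mipLevel).rgb;
}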


#5233660 order of resource creation

Posted by MJP on 08 June 2015 - 04:40 PM

I don't want to get into specifics since it's under NDA, so I'll just say that it does a great job of fully exposing all of the hardware's functionality in a sane manner. I really like what's happening with D3D12/Vulkan, particularly in that it shifts a lot of the memory and synchronization management over to us. That, combined with bindless resources and more direct control over command buffer generation/submission, should allow for much greater efficiency compared to D3D11. I don't want to compare PS4 and D3D12 too much though, since it's not really a fair comparison. The PS4 API's only need to expose the functionality of a single hardware configuration, on a platform that's primarily designed for running one game at a time. D3D12 is in the much tougher position of exposing a wide range of GPU's and drivers in the context of a multitasking OS running many applications, which is a considerably more complex situation. And so you end up with concepts like a root signature, which abstracts away many different low-level binding models so that it can present them through a single coherent interface. In that context, I think D3D12 has done a great job of exposing things as efficiently as possible while still working within the limitations imposed by PC development.


#5233621 Difference between camera velocity & object velocity

Posted by MJP on 08 June 2015 - 02:57 PM

Computing velocity from only the depth buffer assumes that each pixel is static in world-space. Therefore the resulting velocity buffer only captures velocity from camera movements, and not from geometry movement. On the other hand, computing velocity in the pixel shader during rasterization can capture velocity from both camera movements as well as object movements, since you're using both the previous camera transform as well as the previous object transform for computing the previous pixel position. If you also keep track of the joint transforms from the previous frame, you can also capture velocity from joint-based animations.
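Here's a rough sketch of the pixel shader side (not tested, placeholder names): it assumes the vertex shader outputs the clip-space position computed with both the current and the previous frame's transforms, so the resulting velocity picks up camera movement, object movement, and (if you also feed in the previous joint transforms) skinned animation:

struct PSInput
{
    float4 Position : SV_Position;
    float4 CurrPosCS : CURRPOSITION;    // clip-space position from this frame's transforms
    float4 PrevPosCS : PREVPOSITION;    // clip-space position from last frame's transforms
};

float2 ComputeVelocity(in PSInput input)
{
    // Perspective divide to get NDC for both frames, then take the delta
    float2 currNDC = input.CurrPosCS.xy / input.CurrPosCS.w;
    float2 prevNDC = input.PrevPosCS.xy / input.PrevPosCS.w;
    return currNDC - prevNDC;
}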


#5233594 The most efficient way to create a const array in HLSL?

Posted by MJP on 08 June 2015 - 01:16 PM

Talking about this sort of thing can get a bit complicated, since there's actually a layered approach to how shaders execute on a GPU. Your HLSL code is compiled to D3D bytecode, which is a bunch of instructions and declarations for a virtual ISA. When you actually bind that shader at runtime and use it, the driver will JIT compile the virtual ISA bytecode into the actual ISA used by the GPU. It's actually a bit like CLR/.NET in this regard, in that you have different semantics for both the virtual machine and the actual hardware. So to simplify things, I'm mostly just going to talk about D3D's virtual ISA here since there's such a wide variety of GPU's out in the wild.

The D3D shader ISA does not have any concept of a stack, so there's no possibility of your array being placed on the stack and read from there. The ISA can only read values from registers, or from buffers/textures. Which one of these it uses will depend on how you declare the array, and also on the code that uses the array. In your particular case, it primarily depends on whether you declare your array in a constant buffer and whether or not the compiler unrolls your for loop.

The first part is simple: if you declare your array inside of a cbuffer declaration, or declare it as a global without the "static" modifier, then the array data will be loaded out of a constant buffer.

The second part is a bit more complicated. In general the HLSL compiler likes to unroll loops whenever it can, which is typically when the number of iterations is known at compile time. In your case the number of iterations is fixed at 16, and so it's likely that the compiler will unroll the loop. If it unrolls the loop and your array is marked as constant (doesn't come from a buffer), then the compiler can essentially inline your array data right into the code generated for the unrolled loop. This can be nice, since it removes the need for any memory access instructions when using the array data. However it also results in more total program instructions compared to using flow control instructions. Note that you can try to force the compiler to unroll the loop by using the [unroll] attribute on your for loop.

If the compiler doesn't unroll the loop and instead uses dynamic flow control instructions (either because the loop count isn't known, or because you used the [loop] attribute), then the compiler will no longer be able to directly embed your array values into the code, since the same code is used for each loop iteration. It also can't just pre-load the array values into a bunch of registers, since the D3D ISA lacks the ability to dynamically index into registers. So what it will typically do is embed your array values into a compiler-generated constant buffer. This is a special constant buffer that the driver manages transparently, and it allows the shader program to dynamically index into the array using memory instructions.

With all of that said, I would recommend keeping the code the way you have it. Doing it that way should allow the compiler to unroll the loop, which is usually good from a performance point of view. Not only will you avoid memory accesses from reading the values out of a constant buffer, but unrolling can also give the hardware a better opportunity to pipeline your shadow map texture fetches in order to hide latency.
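Just to illustrate (not tested, and the offsets and names here are placeholders rather than your actual code), this is the sort of pattern that lets the compiler unroll the loop and fold the constant array straight into the generated code:

static const float2 SampleOffsets[16] =
{
    float2(-0.75f, -0.75f), float2(-0.25f, -0.75f), float2(0.25f, -0.75f), float2(0.75f, -0.75f),
    float2(-0.75f, -0.25f), float2(-0.25f, -0.25f), float2(0.25f, -0.25f), float2(0.75f, -0.25f),
    float2(-0.75f,  0.25f), float2(-0.25f,  0.25f), float2(0.25f,  0.25f), float2(0.75f,  0.25f),
    float2(-0.75f,  0.75f), float2(-0.25f,  0.75f), float2(0.25f,  0.75f), float2(0.75f,  0.75f)
};

float FilterShadowMap(in Texture2D shadowMap, in SamplerState shadowSampler,
                      in float2 uv, in float2 texelSize, in float compareDepth)
{
    float result = 0.0f;

    // The trip count is known at compile time, so the compiler can unroll this
    // loop and inline the constant offsets instead of loading them from memory
    [unroll]
    for(uint i = 0; i < 16; ++i)
    {
        float shadowMapDepth = shadowMap.SampleLevel(shadowSampler, uv + SampleOffsets[i] * texelSize, 0.0f).x;
        result += (compareDepth <= shadowMapDepth) ? 1.0f : 0.0f;
    }

    return result / 16.0f;
}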


#5233388 Temporary Buffer Management in Post-Processing

Posted by MJP on 07 June 2015 - 11:48 AM

What I've done in the past is to maintain a list of temporary textures that can be used by post-processing. Whenever a post-processing pass needs a temporary texture, it asks for one by specifying a size and format. If one already exists that isn't currently being used, that texture gets marked as "in use" and is returned to the post-processing pass that requested it. Otherwise, a new texture is created. Then when the post-processing pass is done using the texture, it marks it as "free" so that another pass can potentially use it. This potentially uses more memory than D3D12 where you can share memory between arbitrary textures, and requires you to allocate your textures during your first frame. But on the upside it's very simple, and can still save quite a bit of memory.


#5233204 order of resource creation

Posted by MJP on 06 June 2015 - 01:16 PM

On Windows Vista and up, all GPU memory is managed by WDDM, not the driver. WDDM will "virtualize" the memory in the sense that the driver will only work with references to allocations, and the actual contents of those allocations may be moved in and out of GPU memory depending on what's currently being used by the entire system. On Windows 10 with WDDM 2.0 and D3D12, things are different yet again, since that model supports full-blown virtual addressing on the GPU with unique per-process virtual address spaces. In other words, things have changed a bit on the memory management side of things since that document was written. Over-committing GPU memory can definitely still be a problem, however with D3D11 it will typically manifest as performance issues instead of out-of-memory errors. This is because of WDDM's paging system, which will attempt to move your resources in and out of GPU memory mid-frame. Nvidia actually just put out an article about diagnosing these problems, so you should definitely read it over.


#5232993 Questions about the final stages of the graphics pipeline

Posted by MJP on 05 June 2015 - 12:46 PM

Typically, it will do the depth-test beforehand if:

- Blending is disabled.
- The shader does not output a custom depth or stencil value.
- The shader does not use a clip/discard instruction to implement alpha testing.
- Fixed-function alpha testing is disabled.
- The pixel shader doesn't perform random writes to memory using UAVs.


I don't think that I've ever heard of a GPU disabling early depth due to blending being enabled, nor can I think of a reason why that would be the case. Early depth/stencil is still completely valid when blending is used, since blending doesn't affect the depth or stencil test at all.

Also related to the UAV's, D3D11 lets you specify the [earlydepthstencil] attribute to force early depth/stencil optimizations to be enabled even if you have writes to UAV's.
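For example (a minimal, untested sketch with placeholder names):

RWStructuredBuffer<uint> VisibleCounter : register(u1);

[earlydepthstencil]
float4 PSMain(in float4 screenPos : SV_Position) : SV_Target0
{
    // With [earlydepthstencil], only pixels that pass the depth/stencil test
    // reach this point, so occluded pixels never perform the UAV write
    uint prevValue;
    InterlockedAdd(VisibleCounter[0], 1, prevValue);

    return float4(1.0f, 1.0f, 1.0f, 1.0f);
}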


#5232500 Color multisampling with multiple rendertargets in the geometry pass

Posted by MJP on 02 June 2015 - 06:55 PM

Handling MSAA with deferred rendering is fairly complicated. You can't just enable multisampling for your G-Buffer render targets and then resolve them: doing this won't provide the same result as forward rendering. Instead you need to calculate lighting for each subsample in your G-Buffer targets, and then resolve that result. Doing this in a naive way is expensive, since you would be effectively supersampling the lighting phase (which is typically the most expensive part for a deferred renderer). To make it workable, you need to instead choose per-pixel whether to apply lighting for just one sub-sample, or whether to apply it for all sub-samples. Typically this is done using some sort of conservative metric, for instance by checking the depth of each subsample and only using per-sample lighting if the difference in depth samples passes some threshold.
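As an untested sketch of that kind of classification (placeholder names, and the actual metric varies between implementations), you could compare each sub-sample's depth against sample 0 and flag the pixel for per-sample lighting if any of them diverge past a threshold:

Texture2DMS<float> DepthBufferMS : register(t0);

bool RequiresPerSampleLighting(in uint2 pixelPos, in uint numSamples, in float depthThreshold)
{
    float depth0 = DepthBufferMS.Load(pixelPos, 0);

    for(uint i = 1; i < numSamples; ++i)
    {
        float depth = DepthBufferMS.Load(pixelPos, i);
        if(abs(depth - depth0) > depthThreshold)
            return true;
    }

    return false;
}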

Here's some reading material for you:

https://software.intel.com/en-us/articles/deferred-rendering-for-current-and-future-rendering-pipelines

http://www.crytek.com/download/Sousa_Graphics_Gems_CryENGINE3.pdf

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

http://docs.nvidia.com/gameworks/content/gameworkslibrary/graphicssamples/d3d_samples/antialiaseddeferredrendering.htm

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Deferred%20Shading%20Optimizations.pps


#5231120 Fov and proportional Depth

Posted by MJP on 26 May 2015 - 01:38 PM

Ahh sorry, I didn't see your tag.

As an alternative, you can use ddx/ddy to calculate the mip level manually. Something like this should work (not tested):

float MipLevel(in float2 uv)
{ 
    float2 dx_uv = ddx(uv);
    float2 dy_uv = ddy(uv);
    float maxSqr = max(dot(dx_uv, dx_uv), dot(dy_uv, dy_uv));
 
    return 0.5 * log2(maxSqr); // == log2(sqrt(maxSqr));
}



#5231116 Depth texture empty (shadowmapping)

Posted by MJP on 26 May 2015 - 01:30 PM

SamplerComparisonState lets you use the hardware's PCF, which is generally faster than doing it manually in the shader. Basically you get 2x2 PCF at the same cost as a normal bilinear texture fetch, which is pretty nice.

To use it, you want to create your sampler state with D3D11_FILTER_COMPARISON_MIN_MAG_MIP_LINEAR and D3D11_COMPARISON_LESS_EQUAL. Then in your shader, declare your sampler as a SamplerComparisonState, and sample your shadow map texture using SampleCmp or SampleCmpLevelZero. For the comparison value that you pass to SampleCmp, you pass the pixel's projected depth in shadow space. The hardware will then compare the pixel depth against the depth from the shadow map texture, and return 1 when the pixel depth is less than or equal to the shadow map depth.
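In shader code it ends up looking something like this (untested, placeholder names), assuming the sampler bound to s0 was created with the comparison filter and LESS_EQUAL comparison described above:

Texture2D ShadowMap : register(t0);
SamplerComparisonState ShadowSampler : register(s0);

float SampleShadowMap(in float2 shadowUV, in float pixelDepthInShadowSpace)
{
    // The hardware compares the pixel depth against the shadow map texels and
    // returns the bilinearly-filtered comparison results (2x2 PCF)
    return ShadowMap.SampleCmpLevelZero(ShadowSampler, shadowUV, pixelDepthInShadowSpace);
}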


#5230915 Fov and proportional Depth

Posted by MJP on 25 May 2015 - 04:16 PM

So it sounds like you just want to fade out a detail texture at the point where the details would be imperceptibly small. Unfortunately this doesn't just depend on depth (as you've already discovered), but also on the FOV as well as the resolution at which you're rasterizing. Really what you want is to do the same thing that the hardware does for calculating which mip level to use, which is to analyze the gradients of your texture UV's. This can be done using the quad derivative functions that are available in HLSL or GLSL, or with HLSL it's also possible to get the texture LOD level directly with Texture2D.CalculateLevelOfDetail (I would imagine that GLSL has an equivalent as well). With this, you could pick a mip level for your grass texture at which it should fade out, and then use the returned LOD value to compute your alpha:

float lod = DetailTexture.CalculateLevelOfDetail(DetailSampler, uv);
float alpha = 1.0f - smoothstep(StartFadeMipLevel, EndFadeMipLevel, lod);



#5230913 What's the deal with setting multiple viewports on the rasterizer?

Posted by MJP on 25 May 2015 - 04:01 PM

"Apparently it's a semantic applied to the geometry shader output, but I would imagine you can apply it to the vertex shader output if you have no geometry shader (I may be wrong on this)."


Unfortunately, that's not the case. You can only use it as an output from a geometry shader.

Recent AMD hardware supports setting it from a vertex shader at the hardware level, but it's not exposed in D3D11. However they did expose it as an OpenGL extension.
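For reference, the geometry shader path looks roughly like this (untested, placeholder names), with the viewport index chosen per-primitive and passed down from earlier in the pipeline:

struct GSInput
{
    float4 Position : SV_Position;
    uint ViewportIndex : VIEWPORTINDEX;     // e.g. computed per-instance in the VS
};

struct GSOutput
{
    float4 Position : SV_Position;
    uint ViewportIndex : SV_ViewportArrayIndex;
};

[maxvertexcount(3)]
void GSMain(triangle GSInput input[3], inout TriangleStream<GSOutput> output)
{
    // Pass the triangle through unchanged, routing it to the selected viewport
    for(uint i = 0; i < 3; ++i)
    {
        GSOutput vertex;
        vertex.Position = input[i].Position;
        vertex.ViewportIndex = input[0].ViewportIndex;
        output.Append(vertex);
    }
}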


#5230912 Depth texture empty (shadowmapping)

Posted by MJP on 25 May 2015 - 03:54 PM

Your code for creating the depth texture and corresponding DSV + SRV looks correct, and so does your vertex shader code. If I were you, I would take a frame capture using RenderDoc to see what's going on. First, I would check the depth texture after rendering shadow casters to see if it looks correct. Keep in mind that for a depth texture, if you used a perspective projection then it will appear mostly white by default. To get a better visualization, use the range slider to set the start range to about 0.9. If the depth texture looks okay, then I would check the draw call where you use the shadow map to make sure that your textures and samplers are bound correctly.

As for that sampler state that you've created, how exactly are you using it? Are you trying to use it with a SamplerComparisonState in your pixel shader? Or are you just using a regular SamplerState for sampling from your shadow map texture?

Either way, always make sure that you've created your device with the D3D11_CREATE_DEVICE_DEBUG flag when you're debugging problems like this. It will cause D3D to output warnings to your debugger output window whenever an error occurs due to API misuse.


#5230600 Problem Alpha Blending in DirectX

Posted by MJP on 23 May 2015 - 01:56 PM

Do you disable depth buffer writes when rendering the transparent cube?


#5229740 Clamp light intensity

Posted by MJP on 18 May 2015 - 07:47 PM

Like Hodgman mentioned, mixing small mirror-like roughness values with infinitely-small point lights is bad news. Not only will you get those unreasonably-high values out of your BRDF, but that specular highlight is going to alias like crazy. So unless crazy sparkling specular is part of your game's look, I would avoid the lower roughness range for analytical light sources. It can work okay for area lights (or approximations to area lights, such as environment maps), but not for point lights.

Clamping can be good, especially if you get into higher light intensities. Note that if you want to use real-world photometric units, FP16 won't be enough and you'll need to introduce some kind of scale factor to avoid overflow in the specular highlights. You should also note that even if you clamp, you can still cause overflow after-the-fact by using alpha blending. During production of The Order we actually had this problem all over the place, mainly due to light bulbs stacking up on the same pixel. We ended up just detecting overflow early on in the PostFX chain, and converting back to a reasonable value. It was heavy-handed, but guaranteed that invalid values didn't slip through into DOF and bloom and create the dreaded "squares of death".

We also used FP16 buffers everywhere, since R11G11B10 wasn't enough precision for our intensity range. You'll definitely want FP16 if you decide to use photometric units, since there's a very large range of values for real-world intensities.
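A minimal sketch of that sort of overflow scrub (untested, placeholder names, and not the actual production code) might look like this: run it early in the PostFX chain, replace non-finite values, and clamp anything implausibly bright before DOF and bloom get a chance to smear it around:

float3 ScrubOverflow(in float3 color, in float maxSceneBrightness)
{
    // Replace INFs/NaNs that can show up after blending already-huge values
    if(any(isinf(color)) || any(isnan(color)))
        color = float3(maxSceneBrightness, maxSceneBrightness, maxSceneBrightness);

    // Clamp whatever is left to a plausible scene range
    return min(color, maxSceneBrightness);
}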



