
Member Since 29 Mar 2007

#5276220 Shadow map depth range seems off

Posted on 17 February 2016 - 04:46 PM

^^^ what mhagain said. I would suggest reading through this article for some good background information on the subject.

Also, a common trick for visualizing a perspective depth buffer is to just ignore the range close to the near clip plane and then treat the remaining part as linear. Like this:

float zw = DepthBuffer[pixelPos];  
float visualizedDepth = saturate(zw - 0.9f) * 10.0f;  
You can also compute the original view-space Z value from a depth buffer if you have the original projection matrix used for generating that depth buffer. If you take this and normalize it to a [0,1] range using the near and far clip planes, then you get a nice linear value:

float zw = DepthBuffer[pixelPos];
float z = Projection._43 / (zw - Projection._33);
float visualizedDepth = saturate((z - NearClip) / (FarClip - NearClip));
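The reconstruction above can be sanity-checked outside the shader. Here's a small Python sketch that mirrors the two HLSL snippets, assuming a standard left-handed D3D-style perspective projection (the near/far values are made up for illustration):

```python
def make_projection_terms(near_clip, far_clip):
    # _33 and _43 of a standard left-handed D3D-style perspective projection:
    # the two terms that produce clip-space z and w from view-space Z.
    a = far_clip / (far_clip - near_clip)               # Projection._33
    b = -near_clip * far_clip / (far_clip - near_clip)  # Projection._43
    return a, b

def view_z_to_depth(z, a, b):
    # Forward direction: what the rasterizer stores in the depth buffer (z/w).
    return (z * a + b) / z

def depth_to_view_z(zw, a, b):
    # Mirrors the HLSL: z = Projection._43 / (zw - Projection._33)
    return b / (zw - a)

def visualized_depth(z, near_clip, far_clip):
    # The normalized linear value from the snippet above.
    return min(max((z - near_clip) / (far_clip - near_clip), 0.0), 1.0)

near, far = 1.0, 100.0
a, b = make_projection_terms(near, far)
for z in (1.0, 10.0, 50.0, 100.0):
    zw = view_z_to_depth(z, a, b)
    assert abs(depth_to_view_z(zw, a, b) - z) < 1e-6  # round-trips exactly
```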

#5275840 D3D12: Copy Queue and ResourceBarrier

Posted on 15 February 2016 - 08:35 PM

Honestly I use a direct queue for everything, including moving data to and from the GPU. The Present API only presents at multiples of the screen refresh rate (I haven't had luck getting unlocked FPS yet), and I get 120 FPS whether I use a direct queue or a copy queue. Unless you are moving a lot of data to and from the GPU, I personally feel the copy queue just makes things more complex than they need to be for how much performance gain you might get with it.

It definitely depends on how much data you're moving around, and how long it might take the GPU to copy that data. The problem with using the direct queue for everything is that it's probably going to serialize with your "real" graphics work. So if you submit 15ms worth of graphics work for a frame and you also submit 1ms worth of resource copying on the direct queue, then your entire frame will probably take 16ms on the GPU. Using a copy queue could potentially allow the GPU to execute the copy while also concurrently executing graphics work, reducing your total frame time.

#5275061 Diffuse IBL - Importance Sampling vs Spherical Harmonics

Posted on 09 February 2016 - 05:48 PM

Spherical harmonics are pretty popular for representing environment diffuse, because they're fairly compact and they can be evaluated with just a bit of shader code. For L2 SH you need 9 coefficients (so 27 floats for RGB), which is about the size of a 2x2 cubemap. However it will always introduce some amount of approximation error compared to ground truth, and in some cases that error can be very noticeable. In particular it has the problem where very intense lighting will cause the SH representation to over-darken on the opposite side of the sphere, which can lead to totally black (or even negative!) values.

The other nice thing about SH is that it's really simple to integrate. With specular pre-integration you usually have to integrate for each texel of your output cubemap, which is why importance sampling is used as an optimization. If you're integrating to SH you don't need to do this, you're effectively integrating for a single point. This means you can just loop over all of the texels in your source cubemap, which means you won't "miss" any details. You can look at my code for an example, if you want.

#5274941 Limiting light calculations

Posted on 09 February 2016 - 02:49 AM

I was always advised to avoid dynamic branching in pixel shaders.

You should follow this advice if you're working on a GPU from 2005. If you're working on one from the last 10 years... not so much. On a modern GPU I would say that there are two main things you should be aware of with dynamic branches and loops:

1. Shaders will follow the worst case within a warp or wavefront. For a pixel shader, this means groups of 32-64 pixels that are (usually) close together in screen space. What this means is that if you have an if statement whose condition evaluates false for 31 pixels but true for one pixel in the 32-thread warp, then they all have to execute what's inside the if statement. This can be especially bad if you have an else clause, since you can end up with your shader executing both the "if" as well as the "else" part of your branch! For loops it's similar: the shader will keep executing the loop until all threads have hit the termination condition. Note that if you're branching or looping on something from a constant buffer, then you don't need to worry about any of this. In that case every single pixel will take the same path, so there's no coherency issue.

2. Watch out for lots of nested flow control. Doing this can start to add overhead from the actual flow control instructions (comparisons, jumps, etc.), and can cause the compiler to use a lot of general purpose registers.
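The first point can be modeled with a toy cost function: a warp pays for every branch path that at least one of its threads takes. This is a simplification (it ignores the flow-control instruction overhead from point 2), but it captures the coherency issue:

```python
# Toy model of warp divergence: a 32-thread warp executes every branch path
# that at least one of its threads takes.
WARP_SIZE = 32

def warp_cost(conds, cost_if, cost_else):
    cost = 0
    if any(conds):          # at least one thread enters the "if" side
        cost += cost_if
    if not all(conds):      # at least one thread enters the "else" side
        cost += cost_else
    return cost

uniform_true = [True] * WARP_SIZE
mixed        = [True] + [False] * (WARP_SIZE - 1)

assert warp_cost(uniform_true, 10, 5) == 10  # coherent: only the "if" runs
assert warp_cost(mixed, 10, 5) == 15         # divergent: both paths run
```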

For the case you're talking about, a dynamic branch is totally appropriate and is likely to give you a performance increase. The branch should be fairly coherent in screen space, so you should get lots of warps/wavefronts that can skip what's inside of the branch. For an even more optimal approach, look into deferred or clustered techniques.

#5274702 Questions on Baked GI Spherical Harmonics

Posted on 06 February 2016 - 05:34 PM

For The Order we kept track of "dead" probes that were buried under geometry. These were detected by counting the percentage of rays that hit backfaces when baking the probes, and marking as "dead" if over a threshold. Early in the project the probe sampling was done on the CPU, and was done once per object. When doing this, we would detect dead probes during filtering (they were marked with a special value), and give them a filter weight of 0. Later on we moved to per-pixel sampling on the GPU, and we decided that manual filtering would be too expensive. This lead us to preprocess the probes by using a flood-fill algorithm to assign dead probes a value from their closest neighbor. We also ended up allowing the lighting artists to author volumes, where any probes inside of the volume would be marked as "dead". This was useful for preventing leaking through walls or floors.
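The flood-fill step can be sketched as follows. This is a minimal 2D version written purely for illustration (the actual probe grids were 3D, and the shipped implementation differed): each dead probe takes the value of its nearest live neighbor, in BFS order.

```python
from collections import deque

def flood_fill_probes(values, dead, width, height):
    # Assign each dead probe the value of its nearest live neighbor by
    # breadth-first search outward from all live probes at once.
    result = list(values)
    queue = deque(i for i in range(len(values)) if not dead[i])
    filled = [not d for d in dead]
    while queue:
        i = queue.popleft()
        x, y = i % width, i // width
        for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            if 0 <= nx < width and 0 <= ny < height:
                j = ny * width + nx
                if not filled[j]:
                    filled[j] = True
                    result[j] = result[i]
                    queue.append(j)
    return result

# 3x1 grid: the middle probe is dead and inherits from a live neighbor.
vals = [1.0, 0.0, 5.0]
dead = [False, True, False]
assert flood_fill_probes(vals, dead, 3, 1) == [1.0, 1.0, 5.0]
```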

#5274158 A problem about implementing stochastic rasterization for rendering motion blur

Posted on 03 February 2016 - 08:33 PM

So they're using iFragCoordBase to look up a value in the random time texture. This will essentially "tile" the random texture over the screen, taking MSAA subsamples into account. With no MSAA the random texture will be tiled over 128x128 squares on the screen, while in the 4xMSAA case it will be tiled over 64x64 squares. This ensures that each of the 4 subsamples gets a different random time value inside of the loop.

#5274149 Normalized Blinn Phong

Posted on 03 February 2016 - 07:50 PM

You should read through the section called "BRDF Characteristics" in chapter 7, specifically the part where they cover directional-hemispherical reflectance. This value is the "area under the function" that Hodgman is referring to, and must be <= 1 in order for a BRDF to obey energy conservation. As Hodgman mentioned, a BRDF can still return a value > 1 for a particular view direction, as long as the result is still <= 1 after integrating over the hemisphere of possible view directions.

#5273767 Shadow Map gradients in Forward+ lighting loop

Posted on 01 February 2016 - 07:24 PM

In our engine I implemented it the way that you've described. It definitely works, but it consumes extra registers which isn't great. I don't know of any cheaper alternatives that would work with anisotropic filtering.

#5272923 directional shadow map problem

Posted on 27 January 2016 - 07:48 PM

You can use a bias value that depends on the angle between the surface normal and the direction to the light:

float bias = clamp(0.005 * tan(acos(NoL)), 0, 0.01);
where: NoL = dot(surfaceNormal, lightDirection);

tan(acos(x)) == sqrt(1 - x * x) / x

You really do not want to use the inverse trig functions on a GPU. They are not natively supported by GPU ALUs, and will cause the compiler to generate a big pile of expensive code.
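A quick numeric check of that identity, plus what the bias computation looks like with the inverse trig removed (the function name and defaults are mine, not from the original code):

```python
import math

# tan(acos(x)) and its closed form agree; the closed form avoids inverse
# trig, which GPUs have to emulate with many ALU instructions.
for x in (0.05, 0.25, 0.5, 0.75, 0.99):
    assert abs(math.tan(math.acos(x)) - math.sqrt(1.0 - x * x) / x) < 1e-9

def shadow_bias(NoL, scale=0.005, max_bias=0.01):
    # clamp(0.005 * tan(acos(NoL)), 0, 0.01), rewritten without acos/tan.
    return min(max(scale * math.sqrt(1.0 - NoL * NoL) / NoL, 0.0), max_bias)
```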

#5272897 D3d12 : d24_x8 format to rgba8?

Posted on 27 January 2016 - 05:01 PM

Yes, they mentioned it on some twitter account, but then does GCN store a 24-bit depth value as 32 bits if a 24-bit depth texture is requested?
Since there is no bandwidth advantage if 24 bits need to be stored in a 32-bit location with 8 bits wasted, the driver might as well promote d24x8 to d32 + r8?

No, they store it as 24-bit fixed point with 8 bits unused. It only uses 32 bits if you request a floating point depth buffer, and they can't promote from fixed point -> floating point since the distribution of precision is different.

[EDIT] Is it possible to copy the depth component to a RGBA8 (possibly typeless) texture, or do I have to use a shader to manually convert the float depth to int, do some bit shift operations, and store the components separately?

You can only copy between textures that have the same format family.

#5272794 D3d12 : d24_x8 format to rgba8?

Posted on 26 January 2016 - 09:03 PM

D3D12 doesn't allow creating a shader resource view for a resource that was created with a different format. The only exception is if the resource was created with a "TYPELESS" format, in which case you can create an SRV using a format from that same "family". So for instance if you create a texture with R8G8B8A8_TYPELESS, you can create an SRV that reads it as R8G8B8A8_UNORM.

If you really wanted to, you can create two placed resources at the same memory offset within the same heap. However this is very unlikely to give you usable results, since the hardware is free to store the texture data in a completely different layout or swizzle pattern for resources that use different formats. You also can't keep depth buffers and normal textures in the same heap if the hardware reports RESOURCE_HEAP_TIER_1, which applies to older Nvidia hardware.

#5272522 Shimmering / offset issues with shadow mapping

Posted on 24 January 2016 - 05:34 PM

The first is that I need to implement some culling of objects that don't need to be considered when rendering the shadow maps (I haven't really looked into this yet and it is probably simple enough; I'd imagine I can just perform my usual frustum culling routine using an ortho view-projection matrix that has its near plane pulled back to the light source (or rather by the length of the scene camera's z-range) but is otherwise the same as the light matrix of each cascade split?).

Yes, you can perform standard frustum/object intersection tests in order to cull objects for each cascade. Since the projection is orthographic, you can also treat the frustum as an OBB and test for intersection against that. Just be aware that if you use pancaking, then you have to treat the frustum as if it extended infinitely towards the light source. If you're going to cull by testing against the 6 planes of the frustum, then you can simply skip testing the near clip plane.

The second is how to get this shadow map interpolation working properly. I just whipped the following up for testing; it doesn't really create any visible difference from leaving the interpolation part out altogether, but am I going about this in the right way, or would I be better off changing my approach?

Generally you want to determine if your pixel is at the "edge" of a cascade, using whichever method you use for partitioning the viewable area into your multiple cascades. You can have a look at my code for an example, if you'd like. In that sample app, the cascade is chosen using the view-space Z value (depth) of the pixel. It basically checks how far into the cascade the pixel is, and if it's in the last 10% of the depth range it starts to blend in the next cascade.
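As a rough sketch of that selection/blending logic (the split depths and the 10% blend range here are illustrative values, not the ones from the sample app):

```python
def cascade_and_blend(view_z, splits, blend_range=0.1):
    # Pick a cascade from the pixel's view-space depth, and blend in the
    # next cascade over the last `blend_range` fraction of its depth range.
    # `splits` holds each cascade's far depth, in increasing order.
    prev_split = 0.0
    for i, split in enumerate(splits):
        if view_z <= split:
            # How far into this cascade the pixel is, in [0, 1].
            t = (view_z - prev_split) / (split - prev_split)
            blend = max(t - (1.0 - blend_range), 0.0) / blend_range
            return i, blend  # blend = weight of the *next* cascade
        prev_split = split
    return len(splits) - 1, 0.0

splits = [10.0, 25.0, 60.0, 150.0]
assert cascade_and_blend(5.0, splits) == (0, 0.0)  # middle of cascade 0
idx, blend = cascade_and_blend(9.8, splits)        # last 10% of cascade 0
assert idx == 0 and 0.0 < blend <= 1.0
```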

#5271622 Shimmering / offset issues with shadow mapping

Posted on 17 January 2016 - 06:28 PM

By the way, regarding this part of your code:

// The near- and far value multiplications are arbitrary and meant to catch shadow casters outside of the frustum, 
// whose shadows may extend into it. These should probably be better tweaked later on, but let's see if it at all works first.
It's not necessary to pull back the shadow near clip plane in order to capture shadows from meshes that are outside the view frustum. You can handle this with a technique sometimes referred to as "pancaking", which flattens such meshes onto each cascade's near clip plane. See this thread for details. I recommend implementing it by disabling depth clipping in the rasterizer state, since that avoids artifacts for triangles that intersect the near clip plane.

#5271620 [D3D12] Freeing committed resources?

Posted on 17 January 2016 - 06:23 PM

ComPtr<T> calls Release on the underlying pointer when it's assigned a new value. So you can just do "vertexBuffer = ComPtr<ID3D12Resource>()", and Release will be called. Alternatively you can call "vertexBuffer.Reset()", which is equivalent. Or if you're going to pass the ComPtr to CreateCommittedResource, it will call Release as part of its overloaded "&" operator. The resource will then be destroyed whenever the ref count hits 0, so if that's the only reference to the resource it will be destroyed immediately.

Just be careful when destroying resources, since it's invalid to do so while the GPU is still using them. So if you've just submitted a command list that references the resource, you need to wait on a fence to ensure that the GPU is finished before you destroy it. If you mess this up, the debug layer will typically output an error message.

#5271617 Cubemap Depth Sample

Posted on 17 January 2016 - 06:17 PM

It looks like the direction that you use to sample the cubemap is backwards. You want to do "shadowPosH - l", assuming that "l" is the world space position of your point light. The code that vinterberg posted is actually incorrect in the same way: it uses the variable name "fromLightToFragment", but it's actually computing a vector from the fragment to the light (this is why it uses "-fromLightToFragment" when sampling the cube map).

Also...if you're going to use SampleCmp to sample a depth buffer, then you can't use the distance from your point light to the surface as the comparison value. Your depth buffer will contain [0, 1] values that correspond to z/w after applying your projection matrix, not the absolute world space distance from the light to the surface. This means you need to project your light->surface vector onto the axis that corresponds to the cubemap face you'll be sampling from:

float3 shadowPos = surfacePos - lightPos;
float shadowDistance = length(shadowPos);
float3 shadowDir = normalize(shadowPos);

// Taking the max of the absolute components tells us 2 things: which cubemap face we're going
// to sample from, and also what the projected distance is onto the major axis for that face.
float projectedDistance = max(max(abs(shadowPos.x), abs(shadowPos.y)), abs(shadowPos.z));

// Compute the projected depth value that matches what would be stored in the depth buffer
// for the current cube map face. "ShadowProjection" is the projection matrix used when
// rendering to the shadow map.
float a = ShadowProjection._33;
float b = ShadowProjection._43;
float z = projectedDistance * a + b;
float dbDistance = z / projectedDistance;

return ShadowMap.SampleCmpLevelZero(PCFSampler, shadowDir, dbDistance - Bias);
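The depth math above can be verified numerically, again assuming a standard left-handed D3D-style perspective projection for the cube face (the near/far values here are arbitrary illustration values):

```python
def db_distance(projected_distance, near, far):
    # Mirrors the HLSL above: rebuild the z/w value the rasterizer would
    # have stored in the depth buffer for a given projected distance.
    a = far / (far - near)           # ShadowProjection._33
    b = -near * far / (far - near)   # ShadowProjection._43
    z = projected_distance * a + b
    return z / projected_distance    # what SampleCmp compares against

near, far = 0.1, 50.0
assert abs(db_distance(near, near, far)) < 1e-6       # near plane -> 0
assert abs(db_distance(far, near, far) - 1.0) < 1e-6  # far plane  -> 1
# Closer surfaces map to smaller depth buffer values, as expected.
assert db_distance(10.0, near, far) < db_distance(20.0, near, far)
```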