

#5272522 Shimmering / offset issues with shadow mapping

Posted by on 24 January 2016 - 05:34 PM

The first is that I need to implement some culling of objects that don't need to be considered when rendering the shadow maps (I haven't really looked into this yet and it is probably simple enough; I'd imagine I can just perform my usual frustum culling routine using an ortho view-projection matrix that has its near plane pulled back to the light source (or rather by the length of the scene camera's z-range) but is otherwise the same as the light matrix of each cascade split?).

Yes, you can perform standard frustum/object intersection tests in order to cull objects for each cascade. Since the projection is orthographic, you can also treat the frustum as an OBB and test for intersection against that. Just be aware that if you use pancaking, then you have to treat the frustum as if it extended infinitely towards the light source. If you're going to cull by testing against the 6 planes of the frustum, then you can simply skip testing the near clip plane.
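As a concrete sketch of the plane-based version: the types and names below are hypothetical, and the plane array is assumed to store inward-facing normals with the near clip plane last, so that pancaking can skip it.

```cpp
#include <array>
#include <cstddef>

// Hypothetical minimal math types for illustration.
struct Float3 { float x, y, z; };
struct Plane { Float3 n; float d; };          // dot(n, p) + d >= 0 means "inside"
struct BoundingSphere { Float3 center; float radius; };

static float Dot(const Float3& a, const Float3& b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Tests a bounding sphere against a cascade's ortho frustum. The planes are
// ordered with the near clip plane last; with pancaking we treat the frustum
// as extending infinitely toward the light, so we simply skip that plane.
bool IntersectsCascade(const BoundingSphere& sphere, const std::array<Plane, 6>& planes,
                       bool pancaking)
{
    const std::size_t numPlanes = pancaking ? 5 : 6;
    for(std::size_t i = 0; i < numPlanes; ++i)
    {
        if(Dot(planes[i].n, sphere.center) + planes[i].d < -sphere.radius)
            return false; // completely outside this plane
    }
    return true;
}
```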

The second is how to get this shadow map interpolation working properly. I just whipped the following up for testing; it doesn't really create any visible difference from just leaving the interpolation part out altogether, but am I going about this in the right way, or would I be better off changing my approach?

Generally you want to determine if your pixel is at the "edge" of a cascade, using whichever method you use for partitioning the viewable area into your multiple cascades. You can have a look at my code for an example, if you'd like. In that sample app, the cascade is chosen using the view-space Z value (depth) of the pixel. It basically checks how far into the cascade the pixel is, and if it's in the last 10% of the depth range it starts to blend in the next cascade.
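The "last 10% of the depth range" idea can be sketched like this (a CPU-side illustration with hypothetical names; in practice this runs in the pixel shader):

```cpp
#include <algorithm>
#include <cmath>

// Returns 0 while the pixel is in the first 90% of the cascade's depth range,
// then ramps linearly up to 1 at the far end of the cascade, where the next
// cascade's shadow result gets blended in.
float CascadeBlendFactor(float viewSpaceZ, float cascadeNear, float cascadeFar)
{
    const float blendStart = 0.9f; // start blending in the last 10% of the range
    const float t = (viewSpaceZ - cascadeNear) / (cascadeFar - cascadeNear);
    return std::clamp((t - blendStart) / (1.0f - blendStart), 0.0f, 1.0f);
}
```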

#5271622 Shimmering / offset issues with shadow mapping

Posted by on 17 January 2016 - 06:28 PM

By the way, regarding this part of your code:

// The near- and far value multiplications are arbitrary and meant to catch shadow casters outside of the frustum, 
// whose shadows may extend into it. These should probably be better tweaked later on, but let's see if it works at all first.
It's not necessary to pull back the shadow near clip plane in order to capture shadows from meshes that are outside the view frustum. You can handle this with a technique sometimes referred to as "pancaking", which flattens said meshes onto each cascade's near clip plane. See this thread for details. I recommend implementing it by disabling depth clipping in the rasterizer state, since that avoids artifacts for triangles that intersect the near clip plane.
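For reference, here's roughly what that rasterizer state looks like in D3D11 (a sketch, assuming the DXCall error-checking helper used elsewhere in this thread):

```cpp
// Sketch: a rasterizer state with depth clipping disabled, which is what makes
// pancaking work. Triangles in front of the light-space near plane are no
// longer clipped away; their depth output is clamped to the viewport's range.
D3D11_RASTERIZER_DESC rsDesc = { };
rsDesc.FillMode = D3D11_FILL_SOLID;
rsDesc.CullMode = D3D11_CULL_BACK;
rsDesc.DepthClipEnable = FALSE;   // the key setting for pancaking

ID3D11RasterizerState* shadowRS = nullptr;
DXCall(device->CreateRasterizerState(&rsDesc, &shadowRS));
```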

#5271620 [D3D12] Freeing committed resources?

Posted by on 17 January 2016 - 06:23 PM

ComPtr<T> calls Release on the underlying pointer when it's assigned a new value. So you can just do "vertexBuffer = ComPtr<ID3D12Resource>()", and Release will be called. Alternatively you can call "vertexBuffer.Reset()", which is equivalent. Or if you're going to pass the ComPtr to CreateCommittedResource, it will call Release as part of its overloaded "&" operator. The resource will then be destroyed whenever the ref count hits 0, so if that's the only reference to the resource, it will be destroyed immediately.
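To make the lifetime behavior concrete, here's a toy sketch of those refcounting semantics (not the real WRL ComPtr, just an illustration):

```cpp
// Toy refcounted type standing in for a COM object.
struct RefCounted
{
    int refCount = 1;
    void AddRef()  { ++refCount; }
    int  Release() { return --refCount; } // real COM objects delete themselves at 0
};

// Minimal ComPtr-like wrapper: releases its pointer on Reset, on being given a
// new value, and on destruction.
template<typename T>
struct SmartPtr
{
    T* ptr = nullptr;
    void Reset()      { if(ptr != nullptr) { ptr->Release(); ptr = nullptr; } }
    void Attach(T* p) { Reset(); ptr = p; }  // takes ownership, like ComPtr::Attach
    ~SmartPtr()       { Reset(); }
};
```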

Just be careful when destroying resources, since it's invalid to do so while the GPU is still using them. So if you've just submitted a command list that references the resource, you need to wait on a fence to ensure that the GPU is finished before you destroy it. If you mess this up, the debug layer will typically output an error message.

#5271617 Cubemap Depth Sample

Posted by on 17 January 2016 - 06:17 PM

It looks like the direction that you use to sample the cubemap is backwards. You want to do "shadowPosH - l", assuming that "l" is the world space position of your point light. The code that vinterberg posted is actually incorrect in the same way: it uses the variable name "fromLightToFragment", but it's actually computing a vector from the fragment to the light (this is why it uses "-fromLightToFragment" when sampling the cube map).

Also...if you're going to use SampleCmp to sample a depth buffer, then you can't use the distance from your point light to the surface as the comparison value. Your depth buffer will contain [0, 1] values that correspond to z/w after applying your projection matrix, not the absolute world space distance from the light to the surface. This means you need to project your light->surface vector onto the axis that corresponds to the cubemap face you'll be sampling from:

float3 shadowPos = surfacePos - lightPos;
float3 shadowDir = normalize(shadowPos);

// Taking the max of the absolute components tells us 2 things: which cubemap face we're
// going to sample from, and also what the projected distance is onto the major axis for that face.
float projectedDistance = max(max(abs(shadowPos.x), abs(shadowPos.y)), abs(shadowPos.z));

// Compute the projected depth value that matches what would be stored in the depth buffer
// for the current cube map face. "ShadowProjection" is the projection matrix used when
// rendering to the shadow map.
float a = ShadowProjection._33;
float b = ShadowProjection._43;
float z = projectedDistance * a + b;
float dbDistance = z / projectedDistance;

return ShadowMap.SampleCmpLevelZero(PCFSampler, shadowDir, dbDistance - Bias);
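To sanity-check the _33/_43 math, here's the same computation on the CPU, assuming a standard D3D-style perspective projection for the shadow faces (the near/far values in the test are hypothetical):

```cpp
#include <cmath>

// For a D3D-style perspective projection, the relevant matrix terms are
// _33 = f / (f - n) and _43 = -n * f / (f - n). The depth buffer stores
// z/w = a + b / viewZ, which is exactly what the shader computes as dbDistance.
float DepthBufferValue(float viewZ, float nearClip, float farClip)
{
    const float a = farClip / (farClip - nearClip);
    const float b = (-nearClip * farClip) / (farClip - nearClip);
    const float z = viewZ * a + b;
    return z / viewZ;
}
```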

#5271472 Cube shadow mapping issue

Posted by on 16 January 2016 - 05:16 PM

I'm going to lock this thread since you already have another thread open about this issue. I'd also like to add that it's not really appropriate to just dump some code on this forum and then ask people to write a feature for you. If you're having trouble, then please feel free to ask questions, and the community here will do their best to answer them.

#5271471 Cubemap Depth Sample

Posted by on 16 January 2016 - 05:10 PM

You don't need a render target to write depth, you just need a depth stencil view. For rendering to the 6 faces separately, you just need 6 depth stencil views that each target a particular face. It's almost exactly like the code that you have for creating the 6 render target views, except that you create depth stencil views:
for(uint32_t i = 0; i < 6; ++i)
{
    D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = { };
    dsvDesc.Format = format;
    dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2DARRAY;
    dsvDesc.Texture2DArray.ArraySize = 1;
    dsvDesc.Texture2DArray.FirstArraySlice = i;
    dsvDesc.Texture2DArray.MipSlice = 0;
    dsvDesc.Flags = 0;
    DXCall(device->CreateDepthStencilView(textureResource, &dsvDesc, &arraySliceDSVs[i]));
}
The other way to do it is to have 1 depth stencil view that targets the entire array, and then use SV_RenderTargetArrayIndex from a geometry shader in order to specify which slice you want to render to.

#5271208 Vertex to cube using geometry shader

Posted by on 14 January 2016 - 08:56 PM

Relevant blog post: http://www.joshbarczak.com/blog/?p=667

#5271201 [D3D12] Render wireframe on top of solid rendering

Posted by on 14 January 2016 - 08:00 PM

With a "standard" projection, your depth buffer will have a value of 0.0 at the near clipping plane and 1.0 at the far clipping plane. Based on your description and the image that you posted, you're definitely using a "reversed" projection. This is fine; in fact, it gives you better precision with floating-point depth buffers. The reason I asked is that it means you'll want to use a *positive* depth bias instead of the usual negative bias.

If you can't get the depth bias to work, you could always do something custom in the pixel shader. For instance, you could output SV_DepthGreaterEqual from the pixel shader, and add a small bias directly in the shader. Or alternatively, you can do a "manual" depth test using alpha blending or discard. For each pixel you could read in the depth buffer value, compare against the depth of the pixel being shaded, and set alpha to 1 if the depth is close enough within some threshold (or set it to 0 otherwise).
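To make the sign flip concrete, here's a small C++ sketch (with hypothetical near/far values) showing how a standard projection maps near→0 and far→1, while a reversed projection maps near→1 and far→0:

```cpp
#include <cmath>

// Post-projection depth (z/w) for a standard D3D-style perspective projection:
// _33 = f / (f - n), _43 = -n * f / (f - n), so z/w = a + b / viewZ.
float StandardDepth(float viewZ, float n, float f)
{
    const float a = f / (f - n);
    const float b = (-n * f) / (f - n);
    return (viewZ * a + b) / viewZ;
}

// A reversed projection is the same formula with near and far swapped, which
// flips the depth range and is why the depth bias needs to flip sign too.
float ReversedDepth(float viewZ, float n, float f)
{
    return StandardDepth(viewZ, f, n);
}
```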

#5271015 HLSL Bilinear Filtering ?

Posted by on 14 January 2016 - 12:29 AM

Assuming that you've created and bound the sampler state correctly, that should be sufficient to have bilinear filtering enabled.

The way that rasterization and pixel shading works is that by default, all attributes that are output by your vertex shader will be interpolated to the exact center of each pixel. Here's a quick and dirty diagram that shows what I mean by that:

(attached diagram: Pixel Coordinates.png)

This diagram shows the case of a 2x2 render target that's fully covered by a quad with 4 vertices. As far as SV_Position goes, the edges of the pixels are located at integer coordinates, while the center (which is where attributes are interpolated to) is located at .5. So the X coordinate of the first pixel is 0.5, the next one is 1.5, then 2.5, and so on. The UVs are interpolated in the same way, except that they're typically 0 or 1 at the vertices. This means that they end up being fractional at every pixel, and their value ends up being lerp(float2(0, 0), float2(1, 1), svPosition / renderTargetSize), which simplifies to svPosition / renderTargetSize.

So if you wanted to sample the texel neighboring to your right with a UV coordinate, you would do UV + (1.0f / renderTargetSize). Alternatively, Sample takes an integer offset that you can use to specify that you want to sample exactly 1 texel over to the right. Or if you prefer, you can use the bracket operators to directly load a texel using SV_Position like this: return Texture0[uint2(svPosition + float2(1, 0))].

Now let's say that we're downsampling that 2x2 render target to 1x1. In that case, SV_Position will be (0.5, 0.5) for that one pixel, and the UV will also be (0.5, 0.5), since the center of this pixel lies exactly in the center of your 4 vertices. In other words, the 1x1 target's single pixel center would be in the exact center of the above diagram. This is perfect for bilinear filtering, since it means that the filter kernel will sample all 4 texels and blend them all equally. However if you wanted to sample a single texel with UVs, you would need to offset by 1/4 of a pixel. So for example, to sample the bottom-right texel you would do UV + (float2(0.25f, 0.25f) / renderTargetSize). Or again you could load using explicit coordinates by doing Texture0[uint2(svPosition * 2.0f)].
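The coordinate conventions above can be sketched in C++ (names here are illustrative, not from any API):

```cpp
// Pixel i has its center at i + 0.5 in SV_Position space.
float PixelCenter(int pixelIndex)
{
    return pixelIndex + 0.5f;
}

// The interpolated UV at a pixel center is simply svPosition / renderTargetSize.
float UVAtPixel(int pixelIndex, int renderTargetSize)
{
    return PixelCenter(pixelIndex) / renderTargetSize;
}

// UV offset that moves exactly one texel over in a texture of the given size.
float OneTexelOffset(int textureSize)
{
    return 1.0f / textureSize;
}
```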

#5269998 Explicit suffix “.f” for floats in HLSL.

Posted by on 08 January 2016 - 12:31 AM

It's the same as in C++: the "f" suffix specifies that it's a floating-point literal. If you leave it off you get an integer, which means an implicit conversion to float happens in cases like the second line of your code snippet. It's essentially the same as doing this:

OUT.position = mul( ModelViewProjection, float4( IN.position, (float)1 ) );
Using the "f" suffix is more explicit: it tells the compiler that you want your value to be a float, and therefore no implicit conversion is necessary. In most cases there will probably be no difference in the generated code; it's more a matter of style and personal preference. However, you should be careful when mixing floats with doubles, since in some cases that will give you different results, and it can potentially result in more expensive double-precision ops being used by the GPU.
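A C++ analogue of the same pitfall (the literal rules work the same way in HLSL):

```cpp
// "1 / 2" is integer division, which truncates to 0 before the implicit
// conversion to float ever happens. Adding the suffix (or a decimal point)
// makes the literal floating point, so the division happens in floating point.
float half1 = 1 / 2;     // integer division first, then conversion: 0.0f
float half2 = 1.0f / 2;  // float division: 0.5f
```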

#5269344 Does anyone use counter/Append/Consume buffers very much?

Posted by on 05 January 2016 - 12:22 AM

The only catch with an append buffer is that you can only append one element at a time. This can be wasteful if a single thread decides to append multiple elements, since a lot of hardware implements an append buffer by performing atomic increments on a "hidden" counter variable. For such cases, you can potentially get better performance (as well as better data coherency) by performing a single atomic add in order to "reserve" multiple elements in the output buffer.
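Here's a sketch of that reservation pattern, using a C++ atomic as a stand-in for an atomic counter in a GPU buffer (the names are hypothetical):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Stand-in for an output buffer with a "hidden" counter, like an append buffer.
struct OutputBuffer
{
    std::vector<uint32_t> data;
    std::atomic<uint32_t> counter { 0 };
};

// Instead of N single-element appends (N atomic increments), each thread does
// one atomic add to reserve a contiguous range, then writes its elements into
// that range. This also keeps each thread's output coherent in memory.
void AppendElements(OutputBuffer& buffer, const uint32_t* elements, uint32_t count)
{
    const uint32_t baseIndex = buffer.counter.fetch_add(count); // reserve a range
    for(uint32_t i = 0; i < count; ++i)
        buffer.data[baseIndex + i] = elements[i];
}
```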

#5269091 Questions on Baked GI Spherical Harmonics

Posted by on 03 January 2016 - 06:40 PM

I agree with Dave: you should remove the ambient term. To start out, I would try rendering your cubemaps with nothing but direct lighting applied to your meshes. When you bake this and combine it with direct lighting at runtime, you'll effectively have 1-bounce GI. Since it's only 1 bounce you'll end up with shadowed areas that are too dark, but this is probably better than just introducing a flat ambient term.

If you want to approximate having more bounces, a really easy way to do that is to repeat your baking step, but this time feeding in the results from your first bake pass. If you do this, then every pass gives you another bounce. As long as you have physically-plausible materials that always absorb some energy (in other words, each component of the diffuse albedo is < 1), every added bounce will have a smaller effect on the final result. This means that you can often get a decent approximation of GI with as few as 3 or 4 bounces.
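A trivial sketch of that falloff (the albedo values here are purely illustrative):

```cpp
// With a diffuse albedo below 1, the energy contributed by bounce k scales
// roughly like albedo^k, so successive bounces matter geometrically less and
// the total converges after a handful of passes.
float BounceContribution(float albedo, int bounce)
{
    float contribution = 1.0f;
    for(int i = 0; i < bounce; ++i)
        contribution *= albedo;
    return contribution;
}
```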

#5269089 Particle Systems Full GPU

Posted by on 03 January 2016 - 06:33 PM

The last game I shipped simulated all particles on the GPU. We still had our legacy codepath for CPU-simulated particles, but I don't think that we actually had any assets in the shipping game that used it. At least for us all of our use cases for particles were things that didn't require feedback from the simulation back to the CPU. Personally, if any such case came up I would be fine writing an optimized component/actor system to handle that one particular case instead of making our particle systems flexible enough to handle it. For the other 99% of cases, the GPU path is extremely fast, and simulation was essentially free as far as we were concerned (especially on PS4 where we could use async compute).

I would also disagree with the notion that "uploading" is a high cost of GPU-based particle systems. In our case, the amount of data transferred from CPU to GPU every frame was very small, and generally amounted to a few small constant buffers. Everything else was pre-packed into static buffers or textures kept in GPU memory.

#5268651 IASetIndexBuffer Error

Posted by on 31 December 2015 - 05:48 PM

So it's saying that the virtual address supplied by your index buffer view is outside the range of the resource to which that address belongs. However, it lists the address of the resource as 'nullptr', which is peculiar. Is there any chance that you've already destroyed the resource containing your index buffer?

#5268650 VertexBuffers and InputAssmbler unnecessary?

Posted by on 31 December 2015 - 05:45 PM

I personally shipped a project that 100% used manual vertex fetch in the shader, although this was for a particular console that has an AMD GCN variant as its GPU. AMD GPUs have no fixed-function vertex fetch, and so they implement "input assembler" functionality by generating a small preamble for the VS that they call a "fetch shader". They can basically generate this fetch shader for any given input layout, and the vertex shader uses it through something analogous to a function call for their particular ISA. When the fetch shader runs, it pulls the data out of your vertex buffer(s) using SV_VertexID and SV_InstanceID, and deposits it all in registers. The vertex shader then knows which registers contain the vertex data according to convention, and it can proceed accordingly.

Because of this setup, the fetch shader can sometimes have suboptimal code generation compared to a vertex shader that performs manual vertex fetch. The fetch shader must ensure that all vertex data is deposited into registers up-front, and must ensure that the loads are completed before passing control back to the VS. However, if the VS itself is fetching vertex data, then the vertex fetches can potentially be interleaved with other VS operations, and can potentially re-use registers whose contents are no longer needed.

Unfortunately I'm not sure if it's the same when going through DX11, since there are additional API layers in the way that might prevent optimal code-gen. I'm also not sure which hardware still has fixed-function vertex fetch, and what kind of performance delta you can expect.