Jump to content

  • Log In with Google      Sign In   
  • Create Account


Member Since 29 Mar 2007
Offline Last Active Today, 02:00 AM

#5271472 Cube shadow mapping issue

Posted by MJP on 16 January 2016 - 05:16 PM

I'm going to lock this thread since you already have another thread open about this issue. I'd also like to add that it's not really appropriate for this forum to just dump some code, and then ask people to write a feature for you. If you're having trouble, then please feel to ask questions, and the community here will do their best to answer them.

#5271471 Cubemap Depth Sample

Posted by MJP on 16 January 2016 - 05:10 PM

You don't need a render target to write depth, you just need a depth stencil view. For rendering to the 6 faces separately, you just need 6 depth stencil views that each target a particular face. It's almost exactly like the code that you have for creating the 6 render target views, except that you create depth stencil views:
for(uint32_t i = 0; i < 6; ++i)
    D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = { };
    dsvDesc.Format = format;
    dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2DARRAY;
    dsvDesc.Texture2DArray.ArraySize = 1;
    dsvDesc.Texture2DArray.FirstArraySlice = i;
    dsvDesc.Texture2DArray.MipSlice = 0;
    dsvDesc.Flags = 0;
    DXCall(device->CreateDepthStencilView(textureResource, &dsvDesc, &arraySliceDSVs[i]));
The other way to do it is to have 1 depth stencil view that targets the entire array, and then use SV_RenderTargetArrayIndex from a geometry shader in order to specify which slice you want to render to.

#5271208 Vertex to cube using geometry shader

Posted by MJP on 14 January 2016 - 08:56 PM

Relevant blog post: http://www.joshbarczak.com/blog/?p=667

#5271201 [D3D12] Render wireframe on top of solid rendering

Posted by MJP on 14 January 2016 - 08:00 PM

With a "standard" projection, your depth buffer will have a value of 0.0 at the near clipping plane and 1.0 at the far clipping plane. Based on your description and the image that you posted, you're definitely using a "reversed" projection. This is fine, in fact it gives you better precision for floating point depth buffers. The reason I asked, is because it means that you'll want to use a *positive* depth bias instead of the normal negative bias.

If you can't get the depth bias to work, you could always do something custom in the pixel shader. For instance, you could output SV_DepthGreaterEqual from the pixel shader, and add a small bias directly in the shader. Or alternatively, you can do a "manual" depth test using alpha blending or discard. For each pixel you could read in the depth buffer value, compare against the depth of the pixel being shaded, and set alpha to 1 if the depth is close enough within some threshold (or set it to 0 otherwise).

#5271015 HLSL Bilinear Filtering ?

Posted by MJP on 14 January 2016 - 12:29 AM

Assuming that you've created and bound the sampler state correctly, that should be sufficient to have bilinear filtering enabled.

The way that rasterization and pixel shading works is that by default, all attributes that are output by your vertex shader will be interpolated to the exact center of each pixel. Here's a quick and dirty diagram that shows what I mean by that:

Pixel Coordinates.png

This diagram shows the case of a 2x2 render target that's fully covered by a quad with 4 vertices. As far as SV_Position goes, the edges of the pixels are located at integer coordinates while the center (which is where attributes are interpolated to) is located at .5. So the X coordinate of the first pixel is 0.5, the next one is 1.5, then 2.5, and so on. The UV's are interpolated, in the same way, except that they're typically 0 or 1 at all of the vertices. This means that they end up being fractional at every pixel, and their value ends up being lerp(float2(0, 0), float2(0, 1), svPosition / renderTargetSize). So if you wanted to sample the texel neighboring to your right with a UV coordinate, you would do UV + (1.0f / renderTargetSize). Alternatively, Sample takes an integer offset that you can use to specify that you want to sample exactly 1 texel over to the right. Or if your prefer, you can use the bracket operators to directly load a texel using SV_Position like this: return Texture0[uint2(svPosition + float2(1, 0))].

Now let's say that we're downsampling that 2x2 render target to 1x1. In that case, SV_Position will be (0.5, 0.5) for that one pixel. However the UV will be (0.5, 0.5), since the center of this pixel lies exactly in the center of your 4 vertices. In other words, the 1x1 target's single pixel center would be in the exact center of the above diagram. This is perfect for bilinear filtering, since it means that the filter kernel will sample all 4 texels and blend them all equally. However if you wanted to sample a single texel with UV's, you would need to offset by 1/4 of a pixel. So for example to sample the bottom right texel, you would want to do UV + (float2(0.25f, 0.25f) / renderTargetSize). Or again you could load using explicit coordinates by doing Texture0[uint2(svPosition * 2.0f + 1.0f)]

#5269998 Explicit suffix “.f” for floats in HLSL.

Posted by MJP on 08 January 2016 - 12:31 AM

It's the same as C++: the "f" suffix specifies that it's a floating point literal. If you leave it off you will get an integer, which means you will get an implicit conversion to float in cases like the second line of your code snippet. It's essentially the same as doing this:

OUT.position = mul( ModelViewProjection, float4( IN.position, (float)1 ) );
Using the "f" suffix is more explicit: it tells the compiler that you want your value to be a float, and therefore no implicit conversion is necessary. In most cases there will probably be no difference in the generated code, it's more a matter of style and personal preference. However you should be careful when mixing floats with doubles, since in some cases that will give you different results, and can potentially result in more expensive double-precision ops being used by the GPU.

#5269344 Does anyone use counter/Append/Consume buffers very much?

Posted by MJP on 05 January 2016 - 12:22 AM

The only catch with an append buffer is that you can only append one element at a time. This can be wasteful if a single thread decides to append multiple elements, since a lot of hardware implements an append buffer by performing atomic increments on a "hidden" counter variable. For such cases, you can potentially get better performance (as well as better data coherency) by performing a single atomic add in order to "reserve" multiple elements in the output buffer.

#5269091 Questions on Baked GI Spherical Harmonics

Posted by MJP on 03 January 2016 - 06:40 PM

I agree with Dave: you should remove the ambient term. To start out, I would try just rendering your cubemaps with nothing but direct lighting applied to your meshes. When you bake this and combine it with direct lighting at runtime, you'll effectively have 1 bounce GI. Since it's only 1 bounce you'll end up with shadowed areas that are too dark, but this is probably better than just introducing a flat ambient term. If you want to approximate having more bounces, a really easy way to that is to just repeat your baking step, but this time feeding in the results from your first bake pass. If you do this, then every pass gives you another bounce. As long as you have physically-plausible materials where they always absorb some energy (in other words, each component of the diffuse albedo is < 1), then every added bounce will have a smaller effect on the final result. This means that you can often get a decent approximation of GI with as few as 3 or 4 bounces.

#5269089 Particle Systems Full GPU

Posted by MJP on 03 January 2016 - 06:33 PM

The last game I shipped simulated all particles on the GPU. We still had our legacy codepath for CPU-simulated particles, but I don't think that we actually had any assets in the shipping game that used it. At least for us all of our use cases for particles were things that didn't require feedback from the simulation back to the CPU. Personally, if any such case came up I would be fine writing an optimized component/actor system to handle that one particular case instead of making our particle systems flexible enough to handle it. For the other 99% of cases, the GPU path is extremely fast, and simulation was essentially free as far as we were concerned (especially on PS4 where we could use async compute).

I would also disagree with the notion that "uploading" is a high cost of GPU-based particle systems. In our case, the amount of CPU->GPU data that occurred every frame was very small, and generally amounted to a few small constant buffers. Everything else was pre-packed into static buffers or textures kept in GPU memory.

#5268651 IASetIndexBuffer Error

Posted by MJP on 31 December 2015 - 05:48 PM

So it's saying the virtual address supplied by your index buffer view is outside the range of the resource to which that address belongs to. However it lists the address of the resource as 'nullptr', which is peculiar. Is there any chance that perhaps you're already destroyed the resource containing your index buffer?

#5268650 VertexBuffers and InputAssmbler unnecessary?

Posted by MJP on 31 December 2015 - 05:45 PM

I personally shipped a project that 100% used manual vertex fetch in the shader, although this was for a particular console that has a AMD GCN variant as its GPU. AMD GPUs have no fixed-function vertex fetch, and so they implement "input assembler" functionality by generating small preamble for the VS that they call a "fetch shader". They can basically generate this fetch shader for any given input layout, and the vertex shader uses it through something analogous to a function call for their particular ISA. When the fetch shader runs, it pulls the data out of your vertex buffer(s) using SV_VertexID and SV_InstanceID, and deposits them all in registers. The vertex shader then knows which registers contain the vertex data according to convention, and it can proceed accordingly. Because of this setup, the fetch shader can sometimes have suboptimal code generation compared to a vertex shader that performs manual vertex fetch. The fetch shader must ensure that all vertex data is deposited into registers up-front, and must ensure that the loads are completed before passing control back to the VS. However if the VS is fetching vertex data, then the vertex fetches can potentially be interleaved with other VS operations, and can potentially re-use registers whose contents are no longer needed.

Unfortunately I'm not sure if it's the same when going through DX11, since there are additional API layers in the way that might prevent optimal code-gen. I'm also not sure which hardware still has fixed-function vertex fetch, and what kind of performance delta you can expect.

#5268647 GS Output not being rasterized (Billboards)

Posted by MJP on 31 December 2015 - 04:57 PM

If I understand your code correctly, it looks like you're setting the output vertex position to have z = 0.0 and w = 0.0, which is invalid. Try setting to w to 1.0 instead.

#5268406 [D3D12] Driver level check to avoid duplicate function call?

Posted by MJP on 29 December 2015 - 05:23 PM

As far as I know there's no API-level guarantee that the implementation will filter out redundant calls for you. It's possible that the drivers will do it, but there's no way of knowing without asking them or profiling. Filtering yourself should be pretty easy and cheap, you can just cache the pointer to the PSO that's currently set for that command list and compare with it before setting a new one.

#5267077 OpenGL Projection Matrix Clarifications

Posted by MJP on 19 December 2015 - 05:40 PM

This image is from the presentation Projection Matrix Tricks by Eric Lengyel, and shows how normalized device coordinates work using OpenGL conventions:


As you can see, in OpenGL the entire visible depth range between the near clip plane and the far clip plane is mapped to [-1, 1] in normalized device coordates. So if a position has a Z value of 0 then it it's not actually located at the camera position, it's actually located somewhere between the near clip plane and the far clip plane (but not exactly halfway between, since the mapping is non-linear).

#5266801 Questions on Baked GI Spherical Harmonics

Posted by MJP on 17 December 2015 - 01:27 PM

Yes, you'll either need to use multiple textures or atlas them inside of 1 large 3D texture (separate textures is easier). It would be a lot easier if GPU's supported 3D texture arrays, but unfortunately they don't.