Is there a depth buffer bound? And if so, is depth testing enabled? At first glance it looks like the draw is being rejected by the depth test. If you look in the event history (usually on the side of the graphics debugger) you may see a little symbol that looks like a crossed-out 'Z' (if I'm remembering correctly).
Subresources can be in different states, but the usual barrier rules still apply: the before and after states you specify must match each subresource's actual state. So, for example, you could have mip level 0 in the shader-resource state and mip level 1 in the render-target state. See the documentation for D3D12_RESOURCE_TRANSITION_BARRIER, specifically the Subresource member.
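As a sketch of what that looks like in code (pTexture, mipLevels, arraySize, and pCmdList are assumed names, not from your code), this transitions only mip 1 to the render-target state while mip 0 stays a shader resource:

```
// Transition only mip level 1; mip 0 stays in the shader-resource state.
// The subresource index is mip + arraySlice * mipLevels, which the
// D3D12CalcSubresource helper computes for you.
D3D12_RESOURCE_BARRIER barrier = {};
barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
barrier.Transition.pResource   = pTexture;
barrier.Transition.Subresource = D3D12CalcSubresource(1, 0, 0, mipLevels, arraySize);
barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_RENDER_TARGET;
pCmdList->ResourceBarrier(1, &barrier);
```

Using D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES for the Subresource member instead transitions every subresource at once, which is why it would be invalid here: the two mips are in different before states.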
You can definitely re-use command lists, as well as command allocators, albeit at different paces.
You can think of a command list as providing the interface to write instructions to the command allocator. The command allocator is basically just a chunk of memory that lives somewhere. It doesn't really matter exactly where - that's for the driver implementer to decide. When you submit a command list to the command queue, you're basically saying "this is the next chunk of work for the GPU to complete and here's the chunk of memory that contains all the instructions". Even if the GPU immediately takes that work on (it probably won't), you still need that chunk of memory containing the instructions to be available the entire time they're being processed and executed.
So for command lists, you can reset them as soon as you've submitted them to the command queue and immediately begin reusing them, because all you're really resetting is which allocator they write into. However, you must not reset the allocator whose commands were just submitted until you're completely sure the GPU has finished executing them. Once it has, you're free to cycle back and reset/reuse that allocator.
I'd recommend searching this site for ID3D12Fence. I'm sure I've seen a few posts over the past few months that have touched on this topic and show how to synchronize command list and allocator usage across multiple frames. To get you started, though, you'll need to think in terms of having enough allocators to cover however many frames of latency you'll be working with.
You seem to have discovered for yourself, and Krzysztof mentioned, how arrays are stored in HLSL constant buffers: every array element is padded out to a full 16-byte register. A possible solution would be to just halve your sample count and use the full float4 for storage, so most of each register no longer goes to waste.
It's OK to reset the command list immediately after submitting it as long as you use a different allocator or wait for a GPU signal that the submitted work has completed.
ID3D12CommandAllocator* pAlloc1;
ID3D12CommandAllocator* pAlloc2;
// later on
pCommandList->Reset(pAlloc1, nullptr);
// record some commands, then submit via ExecuteCommandLists
// now you can reset the command list, but use pAlloc2 since pAlloc1 will still be executing
pCommandList->Reset(pAlloc2, nullptr);
After submitting the command list the second time (after recording into pAlloc2), you need to check the fence value set after the first submission to make sure it has completed before resetting with pAlloc1. You can use more allocators to reduce the likelihood of ever actually blocking on a fence, but allocators only ever grow in size: the largest batch of commands recorded into one determines how much memory that allocator holds onto until it's destroyed (destroyed through the COM Release, not Reset).
Command Allocator Reset: It seems as though this is literally destroying the contents of the heap, which may have command list data active on the GPU.
Kind of. That's exactly why you want to ensure the job has finished executing before resetting the command allocator, but it's likely more of a pointer reset back to the beginning of the allocator's memory than any kind of expensive teardown.
How does the memory ownership work between the command allocator and command list? Is the allocator doing implicit double (or more?) buffering on a command list reset?
The allocator owns the memory. The command list is your interface to write into that memory, but it doesn't actually own it. You've probably deduced from the previous parts of the post that the allocator does no additional buffering (double or otherwise), so again, that's why you want to ensure your work is done (fence) before resetting an allocator.
Yes, that's correct. As long as the samples are from non-fractional mip levels, the result is the same as bilinear filtering. In the use case above, you could also use D3D11_FILTER_MIN_MAG_LINEAR_MIP_POINT for the same result.
Adam_42's way to increase the blur is one approach. Another approach is to simply run the result of the first blur pass through the blur technique again. In pseudo-code it would look something like this:
int numPasses = 3;
for (int i = 0; i < numPasses; ++i)
    Blur(source, target); // hypothetical blur pass; swap source/target each iteration (ping-pong)
If you were using a 7-tap blur in your shader and ran it through that loop 3 times, the result is equivalent to a single, much wider blur: the support grows to 3 x (7 - 1) + 1 = 19 taps, with weights equal to the three kernels convolved together.
Unfortunately I haven't used either of the other approaches, so I won't be much help there. If you render the front and back faces, you should be able to calculate the thickness of the geometry and use that as a replacement for the thickness value you'd get from the shadow map. The difference, though, is that this value won't account for other occluders in the scene that would otherwise obstruct the light and diminish or cancel the transmittance effect.
I'm not positive, but my first guess is that it's because of the additional spacing you're adding between blur taps. Try sampling directly neighboring pixels instead, without the value below; you can just set it to 1.0f for testing before removing it entirely.
float blur = 20.0f; // The higher the more blurry
I'm on mobile at the moment, so I'll try to get to your other questions later if no one else has yet.
There shouldn't be any issue using the technique in a forward shading path. You just apply it at the same time you draw and light your geometry. If it's not already, add the world position as an output from the vertex shader / input to the pixel shader. This way you don't even need to reconstruct it; the interpolated value generated by the rasterizer is the same value you'd be getting from reconstruction in a deferred setup. You'll still need access to your light's view-projection matrix and shadow map for sampling. The steps are the same as above, minus position reconstruction, once you get to the pixel shader.
I'm not sure I'm interpreting Aras's comment correctly from the link you posted, but if he's saying that the shadow map isn't accessible, there might not be a good way to do this as it's a key part of the technique. I'd be surprised if there wasn't some way to customize the pipeline though to provide the shadow map. This is where my limited knowledge of Unity isn't helping.
I haven't worked with Unity so I'm not sure on some of its details, but from your description it sounds like you're using a deferred shading pipeline. You should still have all you need to get the transmittance effect working.
From my description above, PWS comes from reconstructing the world space position from the depth buffer at a given pixel, or sampling it directly from a buffer that stores position - whichever Unity uses. If you have access to that and the light's view-projection matrix, which should be included somewhere in a constant buffer as long as the shadow map is being applied to the light, you should have everything you need. You shouldn't need any off-screen information for this effect.
You need to do this as two separate passes, one horizontal then one vertical (or vice versa). What you currently have is blurring each pixel in a cross shape instead of as the full area around the pixel.
In other words, your current setup blurs in a plus shape for each pixel: it only samples along the horizontal and vertical lines through the current pixel (P). Separating these into two passes, blurring the first pass into a temporary buffer that's used as the input for the second pass, gives you a blur that also includes the pixels in the full area surrounding the center point.
It's been a while since I implemented this effect, so I may be a bit hazy on some details. Basically what's happening is this.
You have a world space position (PWS). Nothing too special here, just a plain world space position.
You transform position PWS by the light's view-projection matrix (PLVP). This is similar to what you do for shadow mapping, but in this case instead of transforming a vertex, you're transforming the exact pixel's world space value.
You then need to transform PLVP into texture space. You can do this manually, or just fold the texture-space transform into the light's view-projection matrix; either way it leaves the z-component (depth) of PLVP intact, which is the part we need next.
Divide PLVP.xy by PLVP.w and use that as the texture coordinate to sample your shadow map (D1).
Extract the z-component from PLVP (D2).
In the implementation I followed (and the one you cited) they store their shadow maps linearly. I don't do this, so I reconstruct their linear depth. D1 and D2 both need to have linear depth. If your shadow map is stored linearly, you can just multiply each by the far plane used in the light's projection matrix, otherwise you can use projection values to reconstruct them. See MJP's post if you're unclear on what values to use: https://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/
Now that you have both values linearized, you can take the absolute value of their difference to find the distance between the depths DIST = abs(D1 - D2).
The rest of the effect is based on the scale, weights, and variances values that are generally defined per-kernel. They will vary from material to material, but the code provided should give you all you need for using them properly.
The links below are a little more recent than the one you posted and include the transmittance function as part of the larger solution.