SampleCmpLevelZero
sum += tDepthMap.SampleCmpLevelZero(ShadowSampler, uv + offset, z);
is supposed to do 2x2 PCF in one fetch. But is this a "driver hack" like the one they had in DX9? The documentation for SampleCmpLevelZero does not say that it does 2x2 PCF.
Nah, it's not a hack anymore. You just need to enable LINEAR filtering in your sampler state to get PCF.
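For reference, here is a minimal sketch of what that sampler could look like if you declare state in an effects (.fx) file. The address modes and the comparison function are assumptions on my part, so pick whatever suits your depth convention. If you are not using the effects framework, the equivalent API-side setting is D3D11_FILTER_COMPARISON_MIN_MAG_LINEAR_MIP_POINT in the D3D11_SAMPLER_DESC.

```hlsl
// Sketch only: comparison sampler with LINEAR filtering, which is what
// makes SampleCmpLevelZero blend the 2x2 compare results (PCF).
// AddressU/AddressV and ComparisonFunc are assumed values, not from the thread.
SamplerComparisonState ShadowSampler
{
    Filter         = COMPARISON_MIN_MAG_LINEAR_MIP_POINT;
    AddressU       = Clamp;
    AddressV       = Clamp;
    ComparisonFunc = LESS_EQUAL;
};
```

With a POINT comparison filter instead, the same SampleCmpLevelZero call should give you a single 0-or-1 compare result rather than a filtered one.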
Is SampleCmpLevelZero preferred over GatherCmpRed?
I'm still concerned why I wasn't able to emulate basic PCF using GatherCmpRed. For the vector returned:
float4 s = GatherCmpRed(...);
Which sample points do the components map to (top-left, top-right, bottom-left, bottom-right)?
s.x = ?
s.y = ?
s.z = ?
s.w = ?
And would I just use COMPARISON_MIN_MAG_LINEAR_MIP_POINT as the comparison filter?
Yeah you are right. Thanks.
I think that's the right order, although I'm not totally sure. You could do a simple test where you render the screen coordinate of each pixel to a render target, then use PIX to debug a shader that calls GatherRed and GatherGreen on it to see which XY coordinates were grabbed.
For Gather I've always just used MIN_MAG_MIP_POINT, never tried LINEAR. I have no idea if LINEAR changes the behavior of Gather in any way.
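A rough sketch of that test shader, in case it helps. Every name here (tCoordMap, PointClampSampler, DebugGatherPS) is hypothetical, and it assumes an earlier pass has already written each pixel's x coordinate to the red channel and its y coordinate to the green channel. It needs ps_5_0 for GatherRed/GatherGreen.

```hlsl
// Hypothetical debug resources: a two-channel target holding pixel coordinates.
Texture2D<float2> tCoordMap;        // R = pixel x, G = pixel y (written by an earlier pass)
SamplerState      PointClampSampler; // assumed to be MIN_MAG_MIP_POINT on the API side

struct GatherDebug
{
    float4 xs : SV_Target0; // x coordinate of the texel behind each gather component
    float4 ys : SV_Target1; // y coordinate of the texel behind each gather component
};

GatherDebug DebugGatherPS(float4 pos : SV_Position, float2 uv : TEXCOORD0)
{
    GatherDebug o;
    // Component i of xs/ys together identify which texel gather component i came from.
    o.xs = tCoordMap.GatherRed(PointClampSampler, uv, int2(0, 0));
    o.ys = tCoordMap.GatherGreen(PointClampSampler, uv, int2(0, 0));
    return o;
}
```

Stepping through a pixel in PIX and reading xs/ys side by side should show directly which corner each of .x, .y, .z and .w maps to.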
To clarify: DO NOT try to use GatherCmpRed to speed up PCF!
GatherCmpRed simply gives you the 4 raw depth compares that the hardware uses internally to do PCF. You have to do the bilinear filtering yourself, and there's other overhead that makes it not worth it. Only use it if you're doing a special kernel that doesn't need PCF.
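To make "do the bilinear filtering yourself" concrete, here is a minimal sketch of a single 2x2 PCF tap built from one GatherCmpRed. The function name and the shadowMapSize parameter are made up for illustration, and it assumes normalized UVs and the component layout described in this thread (w = top-left, z = top-right, x = bottom-left, y = bottom-right). The hardware computes its filter fraction in fixed point, so don't expect a bit-exact match with SampleCmpLevelZero.

```hlsl
Texture2D<float>       tDepthMap;
SamplerComparisonState ShadowSampler; // comparison sampler; filter mode is your choice

// Hypothetical helper: emulates one bilinear PCF tap from four raw compares.
float ManualPCF2x2(float2 uv, float z, float2 shadowMapSize)
{
    // Four 0/1 compare results for the 2x2 footprint around uv.
    float4 s = tDepthMap.GatherCmpRed(ShadowSampler, uv, z, int2(0, 0));

    // Fractional position inside the footprint (texel centers sit at +0.5).
    float2 f = frac(uv * shadowMapSize - 0.5);

    // Blend horizontally along the top and bottom rows, then vertically.
    float top    = lerp(s.w, s.z, f.x);
    float bottom = lerp(s.x, s.y, f.x);
    return lerp(top, bottom, f.y);
}
```

That is one gather plus a frac and three lerps to reproduce what a single SampleCmpLevelZero gives you for free, which is the overhead being warned about here.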
For the sake of completeness, below is some code that does 3x3 PCF samples, and 3x3 gather samples.
The gather version is a lot slower because it incurs a lot more adds (4 components vs 1 component per sample), and the compiler uses up a lot more registers (4 instead of 1 per sample) in an effort to hide latency. You might be able to get around this if you're doing a lot more math with each result than just accumulating them. Compile it yourself and check the microcode with something like NVShaderPerf. It's worse than you'd expect.
float fShadow = 0;
#ifdef DO_HARDWARE_PCF
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2(-1, -1));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2(-1, 0));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2(-1, 1));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 0, -1));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 0, 0));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 0, 1));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 1, -1));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 1, 0));
fShadow += ShadowTexture.SampleCmpLevelZero(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 1, 1));
#else
float4 vShadow = 0;
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2(-1, -1));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2(-1, 0));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2(-1, 1));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 0, -1));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 0, 0));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 0, 1));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 1, -1));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 1, 0));
vShadow += ShadowTexture.GatherCmpRed(ShadowSampler, vShadowCoord.xyw, vShadowCoord.z, int2( 1, 1));
// The samples come in this order:
// W Z
// X Y
// Meaning W is in the -1,-1 uv direction and Y is in the +1,+1 direction.
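// Bilinear weights from the fractional texel position.
// Assumes normalized UVs and a 1024x1024 shadow map.
float4 vLerp;
vLerp.wz = frac(vShadowCoord.xy * 1024 - 0.5);
vLerp.xy = (float2)1 - vLerp.wz;
// Weights per component: x = bottom-left, y = bottom-right, z = top-right, w = top-left.
vLerp = vLerp.xwwx * vLerp.zzyy;
// Apply the weights once to the summed compares, then average the 9 taps.
fShadow = dot(vShadow, vLerp) / 9;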
#endif
return fShadow;