tile frustum construction issue for tile-based deferred shading

Started by
5 comments, last by dongsaturn 10 years, 6 months ago

Hi all,

I'm just learning tile-based deferred shading, and found the great article by Andrew Lauritzen. I also checked the source code, which is really helpful. But I'm still confused about the code for tile frustum construction:


    // Work out scale/bias from [0, 1]
    float2 tileScale = float2(mFramebufferDimensions.xy) * rcp(float(2 * COMPUTE_SHADER_TILE_GROUP_DIM));
    float2 tileBias = tileScale - float2(groupId.xy);

    // Now work out composite projection matrix
    // Relevant matrix columns for this tile frusta
    float4 c1 = float4(mCameraProj._11 * tileScale.x, 0.0f, tileBias.x, 0.0f);
    float4 c2 = float4(0.0f, -mCameraProj._22 * tileScale.y, tileBias.y, 0.0f);
    float4 c4 = float4(0.0f, 0.0f, 1.0f, 0.0f);

    // Derive frustum planes
    float4 frustumPlanes[6];
    // Sides
    frustumPlanes[0] = c4 - c1;
    frustumPlanes[1] = c4 + c1;
    frustumPlanes[2] = c4 - c2;
    frustumPlanes[3] = c4 + c2;
    // Near/far
    frustumPlanes[4] = float4(0.0f, 0.0f,  1.0f, -minTileZ);
    frustumPlanes[5] = float4(0.0f, 0.0f, -1.0f,  maxTileZ);
    
    // Normalize frustum planes (near/far already normalized)
    [unroll] for (uint i = 0; i < 4; ++i) {
        frustumPlanes[i] *= rcp(length(frustumPlanes[i].xyz));
    }

I'm sure it's the Clip Space Approach for deriving the frustum, explained here in detail.
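
For readers unfamiliar with the clip-space approach, here is a minimal CPU-side sketch in Python. It assumes a D3D-style left-handed perspective matrix used with row vectors (clip = v * M), so each plane is a sum or difference of matrix columns. The names `perspective_lh`, `extract_planes` and `plane_value` are illustrative and do not come from the sample.

```python
def perspective_lh(x_scale, y_scale, z_near, z_far):
    """D3D-style left-handed perspective matrix, row-vector convention."""
    q = z_far / (z_far - z_near)
    return [
        [x_scale, 0.0, 0.0, 0.0],
        [0.0, y_scale, 0.0, 0.0],
        [0.0, 0.0, q, 1.0],
        [0.0, 0.0, -z_near * q, 0.0],
    ]


def column(m, j):
    return [m[i][j] for i in range(4)]


def add(a, b):
    return [x + y for x, y in zip(a, b)]


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def extract_planes(m):
    """Unnormalized planes; a point is inside when every plane value is >= 0."""
    c1, c2, c3, c4 = (column(m, j) for j in range(4))
    return [
        sub(c4, c1),  # x_clip <=  w_clip
        add(c4, c1),  # x_clip >= -w_clip
        sub(c4, c2),  # y_clip <=  w_clip
        add(c4, c2),  # y_clip >= -w_clip
        c3,           # z_clip >= 0 (D3D near plane)
        sub(c4, c3),  # z_clip <= w_clip (far plane)
    ]


def plane_value(plane, point):
    """Evaluate a*x + b*y + c*z + d for a view-space point."""
    x, y, z = point
    return plane[0] * x + plane[1] * y + plane[2] * z + plane[3]


planes = extract_planes(perspective_lh(1.0, 1.0, 0.1, 100.0))
inside = all(plane_value(p, (0.0, 0.0, 5.0)) >= 0.0 for p in planes)
```

The shader quoted above only builds the four side planes this way (from columns 1, 2 and 4) and takes near/far from the tile's depth range instead.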

The really confusing part for me is the code that builds the tile projection matrix.


    // Work out scale/bias from [0, 1]
    float2 tileScale = float2(mFramebufferDimensions.xy) * rcp(float(2 * COMPUTE_SHADER_TILE_GROUP_DIM));
    float2 tileBias = tileScale - float2(groupId.xy);

    // Now work out composite projection matrix
    // Relevant matrix columns for this tile frusta
    float4 c1 = float4(mCameraProj._11 * tileScale.x, 0.0f, tileBias.x, 0.0f);
    float4 c2 = float4(0.0f, -mCameraProj._22 * tileScale.y, tileBias.y, 0.0f);
    float4 c4 = float4(0.0f, 0.0f, 1.0f, 0.0f);

mFramebufferDimensions holds the current viewport dimensions.
COMPUTE_SHADER_TILE_GROUP_DIM is the tile dimension.

I think the correct scale factor should be:


    float2 tileScale = float2(mFramebufferDimensions.xy) * rcp(float(COMPUTE_SHADER_TILE_GROUP_DIM));
    // instead of 
    // float2 tileScale = float2(mFramebufferDimensions.xy) * rcp(float(2 * COMPUTE_SHADER_TILE_GROUP_DIM));

And I can't figure out the exact meaning of tileBias.

I think tileScale is applied in view space, and tileBias in normalized device space (the tile projection matrix).
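
One way to test guesses like these is to compute, on the CPU, which span of the screen the side planes enclose for a given tileScale and tileBias. The sketch below is a minimal Python check along one axis. It assumes a D3D-style perspective projection where clip-space w equals view-space z, so the projected coordinate is x' = tileScale * ndc + tileBias (the mCameraProj._11 factor cancels) and the planes c4 - c1 and c4 + c1 keep -1 <= x' <= 1. The function names are hypothetical, and `tight_scale_bias` is my own derivation, not something taken from the sample.

```python
def side_plane_span_in_tiles(num_tiles, tile_scale, tile_bias):
    """Span (lo, hi), in tile units along one axis, enclosed by c4 +/- c1.

    Tile units run from 0 at the left screen edge to num_tiles at the right,
    so tile i ideally maps to (i, i + 1).
    """
    ndc_lo = (-1.0 - tile_bias) / tile_scale  # where x' == -1
    ndc_hi = (1.0 - tile_bias) / tile_scale   # where x' == +1

    def to_tiles(ndc):
        return (ndc + 1.0) * 0.5 * num_tiles

    return to_tiles(ndc_lo), to_tiles(ndc_hi)


def quoted_scale_bias(num_tiles, tile_index):
    """Scale/bias as in the quoted shader: dims / (2 * tileDim), scale - groupId."""
    scale = num_tiles / 2.0
    return scale, scale - tile_index


def tight_scale_bias(num_tiles, tile_index):
    """Assumed alternative that maps tile i exactly onto x' in [-1, 1]."""
    return float(num_tiles), float(num_tiles - 2 * tile_index - 1)


# 1280 pixels wide with 16-pixel tiles gives 80 tiles; look at tile 10.
quoted_span = side_plane_span_in_tiles(80, *quoted_scale_bias(80, 10))
tight_span = side_plane_span_in_tiles(80, *tight_scale_bias(80, 10))
```

Under these assumptions the quoted scale and bias map the tile to x' in [0, 1], which matches the "from [0, 1]" comment, while planes built as c4 ± c1 bound x' to [-1, 1]. Changing only the scale factor, without also changing the bias, does not make the two agree.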

After days of experiments I still can't figure it out by myself.

Please give me some help!


I'm really confused.

Update:

I also found another method for deriving the tile frustum in the AMD Forward+ example.

1. Calculate the X, Y positions of the four corners on the near plane of the tile frustum in normalized device space, starting from their viewport-space positions.

Note: we are using inverted 32-bit float depth for better precision, so near and far are reversed, and the z value on the near plane is 1.f instead of 0.f.

2. Transform them back to view space by multiplying by the inverse projection matrix.

3. Build the tile frustum from the four corners and the origin in view space, using the Geometric Approach described here in detail.


    float4 frustumEqn[4];// construct frustum for this tile

    // four corners of the tile, clockwise from top-left
    float4 frustum[4];
    frustum[0] = ConvertProjToView( float4( pxm/(float)uWindowWidthEvenlyDivisibleByTileRes*2.f-1.f, (uWindowHeightEvenlyDivisibleByTileRes-pym)/(float)uWindowHeightEvenlyDivisibleByTileRes*2.f-1.f,1.f,1.f) );
    frustum[1] = ConvertProjToView( float4( pxp/(float)uWindowWidthEvenlyDivisibleByTileRes*2.f-1.f, (uWindowHeightEvenlyDivisibleByTileRes-pym)/(float)uWindowHeightEvenlyDivisibleByTileRes*2.f-1.f,1.f,1.f) );
    frustum[2] = ConvertProjToView( float4( pxp/(float)uWindowWidthEvenlyDivisibleByTileRes*2.f-1.f, (uWindowHeightEvenlyDivisibleByTileRes-pyp)/(float)uWindowHeightEvenlyDivisibleByTileRes*2.f-1.f,1.f,1.f) );
    frustum[3] = ConvertProjToView( float4( pxm/(float)uWindowWidthEvenlyDivisibleByTileRes*2.f-1.f, (uWindowHeightEvenlyDivisibleByTileRes-pyp)/(float)uWindowHeightEvenlyDivisibleByTileRes*2.f-1.f,1.f,1.f) );

    // create plane equations for the four sides of the frustum,
    // with the positive half-space outside the frustum (and remember,
    // view space is left handed, so use the left-hand rule to determine
    // cross product direction)
    for(uint i=0; i<4; i++)
        frustumEqn[i] = CreatePlaneEquation( frustum[i], frustum[(i+1)&3] );
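
The three steps above can be sketched on the CPU in Python. This is a rough sketch under stated assumptions, not the AMD code: the bodies of ConvertProjToView and CreatePlaneEquation are not shown in this thread, so `ndc_to_view` and `tile_side_planes` below are hypothetical stand-ins. I also assume a symmetric perspective projection with diagonal terms `proj_11` and `proj_22`, and unproject each corner to view depth 1, which is enough because every side plane passes through the view-space origin.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def normalize(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)


def ndc_to_view(ndc_x, ndc_y, proj_11, proj_22):
    """Assumed stand-in for ConvertProjToView: view-space point at depth 1."""
    return (ndc_x / proj_11, ndc_y / proj_22, 1.0)


def tile_side_planes(px_min, py_min, px_max, py_max,
                     width, height, proj_11, proj_22):
    """Four plane normals through the origin, positive half-space outside."""
    def ndc(px, py):
        # Same pixel-to-NDC mapping as the quoted shader (y flipped).
        return (px / width * 2.0 - 1.0, (height - py) / height * 2.0 - 1.0)

    # Corners clockwise from top-left, as in the quoted code.
    corners_px = [(px_min, py_min), (px_max, py_min),
                  (px_max, py_max), (px_min, py_max)]
    corners = [ndc_to_view(*ndc(px, py), proj_11, proj_22)
               for px, py in corners_px]
    return [normalize(cross(corners[i], corners[(i + 1) & 3]))
            for i in range(4)]


planes = tile_side_planes(640, 352, 656, 368, 1280, 720, 1.0, 1.5)
# A view-space point whose projection lands inside the tile.
point_in_tile = (0.125, 0.0, 10.0)
inside = all(dot(n, point_in_tile) < 0.0 for n in planes)
```

With this winding a point is inside the tile frustum when all four dot products are negative, matching the "positive half-space outside" comment in the quoted code.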


It's much easier to understand, but a little bit slower.

Thank you very much. (Sorry for my poor English)


Hi all,

I have another question, this time about the code that reconstructs the view-space position from depth.


    // Compute screen/clip-space position and neighbour positions
    // NOTE: Mind DX11 viewport transform and pixel center!
    // NOTE: This offset can actually be precomputed on the CPU but it's actually slower to read it from
    // a constant buffer than to just recompute it.
    float2 screenPixelOffset = float2(2.0f, -2.0f) / gbufferDim;
    float2 positionScreen = (float2(positionViewport.xy) + 0.5f) * screenPixelOffset.xy + float2(-1.0f, 1.0f);

positionViewport is the SV_Position.

I know it transforms from viewport space [0, ViewportDimension] to clip space [-1, 1].

But why is the 0.5f offset added to positionViewport before the transformation?

Thank you very much.

One note. With tiled deferred you can skip the far and near planes altogether at the tile level. All tile frustums share those, so you can first cull all lights against the camera frustum and then cull that list against all tiles.

About a month ago I went through this same sample trying to understand how it works, and created my own implementation of it from that.

Regarding your first post, I'll admit, that's the one part of the sample I didn't entirely understand, so I was watching this thread hoping someone would post about it. At the time I decided I'd just copy the frustrum construction code wholesale and then revisit it if-and-when I needed to understand what exactly it was doing. The only other time I've needed to construct a frustrum manually is CPU-side for culling the scene. Here's hoping someone sheds some light though.

Your second question is to do with rasterization. My understanding is that rasterized triangles are tested against pixel centers. See the following - http://msdn.microsoft.com/en-us/library/windows/desktop/cc627092(v=vs.85).aspx . If you think about the coordinates in viewport space, they describe the top-left corner of each pixel, e.g. the upper right pixel in an 800x600 viewport is 799, 0. However the depth buffer you're sampling was generated in clip-space via rasterization where pixel coordinates describe the center of each pixel.

So in order to reconstruct an accurate position you need your clip-space xy values to also refer to pixel centers, rather than top-left corner of the pixel, just like the depth buffer. Adding 0.5f to the x and y viewport coordinates makes them refer to pixel center, so that upper right pixel in an 800x600 viewport becomes 799.5, 0.5. Then when those coordinates are transformed into clip space they still refer to pixel centers rather than top-left pixel corners, the same as your depth buffer, and your reconstructed position is more accurate. Hopefully that makes sense.
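
To make the half-pixel offset concrete, here is a small Python sketch of the same mapping as the quoted shader lines. The function name and the `pixel_center` switch are mine, for illustration only.

```python
def viewport_to_ndc(px, py, width, height, pixel_center=True):
    """Map integer pixel coordinates to clip-space xy in [-1, 1].

    Mirrors: (positionViewport + 0.5) * float2(2, -2) / dim + float2(-1, 1)
    """
    offset = 0.5 if pixel_center else 0.0
    scale_x = 2.0 / width
    scale_y = -2.0 / height  # y is flipped between viewport and clip space
    return ((px + offset) * scale_x - 1.0,
            (py + offset) * scale_y + 1.0)


# Upper-right pixel of an 800x600 viewport.
corner = viewport_to_ndc(799, 0, 800, 600, pixel_center=False)
center = viewport_to_ndc(799, 0, 800, 600, pixel_center=True)
```

Without the offset, pixel (0, 0) maps exactly to the clip-space corner (-1, 1) and no pixel ever reaches x = 1. With the offset, the first and last pixel in a row sit symmetrically around zero, half a pixel in from each edge.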

Lastly, I'm not entirely certain it would be wise to skip the near/far frustrum planes at a tile level as suggested above. The entire reason to compute each tile's min and max Z range and create a per-tile frustrum is to cull lights much more efficiently. If all tiles use the same frustrum, all lights within the camera frustrum will be within every tile's Z range. You'd essentially be only culling lights for each tile in X and Y alone. That would allow for situations where a foreground light would be processed for tiles with only background pixels. The foreground light would be processed on a pixel level during shading when the attentuation test fails since the background pixels are too far from the light. By using a per-tile frustrum that same foreground light is culled once for all 256 pixels of a 16x16 tile (since the foreground light will never intersect the per-tile near plane of a tile only containing such distant pixels), rather than tested 256 times. When you start using a lot more lights, which is one of the big reasons to use tiled deferred, you can see how many per-pixel checks become per-tile if each tile has it's own frustrum with near/far planes derived from the min/max Z of the tile. It's not perfect, and tiles with large Z ranges have efficiency problems (2.5D culling can help with that), but it's a lot better than only testing the camera frustrum.
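
A tiny Python sketch of that foreground-light case, using near/far planes of the same form as frustumPlanes[4] and frustumPlanes[5] in the first post. The sphere test here (plane distance >= -radius for every plane) is a common conservative check that I am assuming for illustration, and the side planes are left out to keep the example short.

```python
def depth_planes(min_z, max_z):
    """Near/far planes in view space, positive half-space inside."""
    return [(0.0, 0.0, 1.0, -min_z),
            (0.0, 0.0, -1.0, max_z)]


def sphere_touches_frustum(planes, center, radius):
    """Conservative test: keep the light unless it is fully behind a plane."""
    x, y, z = center
    return all(a * x + b * y + c * z + d >= -radius
               for a, b, c, d in planes)


camera_range = depth_planes(0.1, 1000.0)  # shared camera near/far
tile_range = depth_planes(50.0, 60.0)     # min/max Z of a background-only tile

foreground_light = ((0.0, 0.0, 2.0), 1.0)  # (center, radius), close to camera

kept_by_camera = sphere_touches_frustum(camera_range, *foreground_light)
kept_by_tile = sphere_touches_frustum(tile_range, *foreground_light)
```

Here the light survives the camera-range test but is rejected by the tile's own depth range, so it never reaches the per-pixel loop for that tile.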


Yeah, you are absolutely right. I read the code hastily and missed the min/maxZ. Somehow I thought I was reading the camera's near and far planes there. Silly me.


No worries, we all do silly things. For instance, reading back your quote, it looks like I have serious problems typing frustum and attenuation.

I appreciate your kind help. I understand the problem now. I will read the Rasterization Rules page carefully later.

This topic is closed to new replies.