Elegant reduction algorithm on CS

Started by
-1 comments, last by Happy SDE 7 years ago

Hi Forum!

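I need to reduce a 2D texture.
Each Dispatch() reduces it 16x in both directions in a compute shader.
The biggest problem arises on the last step, say when the texture is 8x5: out-of-bounds reads return 0, so dividing the sum by 16*16 underestimates the average of an incomplete tile. On the earlier steps the error is not as big.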

I found the following solution: for the last, incomplete thread groups, store their real denominators in the CB and use them instead of dividing by 16*16.
The CB data changes only on Resize(), so I don't need to update it every frame: I just keep a vector of them together with the reduction targets.
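
To make the Resize() part concrete, here is a minimal C++ sketch of how the per-pass constants could be precomputed. The names `ReductionCB`, `ReductionPass`, and `BuildReductionChain` are hypothetical and only for illustration; the four CB fields mirror the shader below, and everything else is an assumption about how the host side might be organized.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the shader's constant buffer.
struct ReductionCB {
    uint32_t xGroupId;      // index of the last group along X
    uint32_t xDenominator;  // texels that group really covers along X (1..16)
    uint32_t yGroupId;      // index of the last group along Y
    uint32_t yDenominator;  // texels that group really covers along Y (1..16)
};

// Hypothetical description of one Dispatch() in the chain.
struct ReductionPass {
    uint32_t inWidth, inHeight;  // size of the texture read by this pass
    uint32_t groupsX, groupsY;   // Dispatch() arguments, also the output size
    ReductionCB cb;
};

constexpr uint32_t kTGSize = 16;  // must match gLumReductionTGSize

// Builds one entry per pass until the result is 1x1.
// Intended to be called from Resize() only.
std::vector<ReductionPass> BuildReductionChain(uint32_t width, uint32_t height)
{
    std::vector<ReductionPass> passes;
    if (width == 0 || height == 0)
        return passes;

    while (width > 1 || height > 1) {
        ReductionPass p{};
        p.inWidth  = width;
        p.inHeight = height;
        p.groupsX  = (width  + kTGSize - 1) / kTGSize;  // round up
        p.groupsY  = (height + kTGSize - 1) / kTGSize;

        // Only the last group in each direction can be incomplete.
        p.cb.xGroupId     = p.groupsX - 1;
        p.cb.xDenominator = width  - p.cb.xGroupId * kTGSize;
        p.cb.yGroupId     = p.groupsY - 1;
        p.cb.yDenominator = height - p.cb.yGroupId * kTGSize;

        passes.push_back(p);
        width  = p.groupsX;
        height = p.groupsY;
    }
    return passes;
}
```

For a 1920x1080 source this gives three passes (120x68, then 8x5, then 1x1), and the last pass gets denominators 8 and 5 instead of 16 and 16. When a dimension is a multiple of 16, the last group's denominator simply comes out as 16, so no special case is needed.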

But I wonder: is there a more elegant solution?

Here is a sketch:


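static const uint gLumReductionTGSize = 16;
static const uint NumThreads = gLumReductionTGSize * gLumReductionTGSize;

// Resource declarations (types and register slots are assumptions)
Texture2D<float> InputLumMap : register(t0);
RWTexture2D<float> OutputLumMap : register(u0);
groupshared float LumSamples[NumThreads];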

cbuffer CB
{
    uint cb_xGroupId;
    uint cb_xDenominator; //if GroupID.x == cb_xGroupId, use it. Otherwise - gLumReductionTGSize  
    uint cb_yGroupId;
    uint cb_yDenominator; //if GroupID.y == cb_yGroupId, use it. Otherwise - gLumReductionTGSize  
}


//Each time reduce by 16x16
[numthreads(gLumReductionTGSize, gLumReductionTGSize, 1)]
void main(uint3 GroupID : SV_GroupID, uint3 DispatchThreadId : SV_DispatchThreadID, uint ThreadIndex : SV_GroupIndex)
{
    // Out-of-bounds reads return 0, so they add nothing to the sum
    float pixelLuminance = InputLumMap[DispatchThreadId.xy];

    // Store in shared memory
    LumSamples[ThreadIndex] = pixelLuminance;
    GroupMemoryBarrierWithGroupSync();

    // Reduce
    [unroll]
    for (uint s = NumThreads / 2; s > 0; s >>= 1)
    {
        if (ThreadIndex < s)
        {
            LumSamples[ThreadIndex] += LumSamples[ThreadIndex + s];
        }

        GroupMemoryBarrierWithGroupSync();
    }

    if (ThreadIndex == 0)
    {
        // Only the last group in each direction can be incomplete
        uint divX = (GroupID.x == cb_xGroupId) ? cb_xDenominator : gLumReductionTGSize;
        uint divY = (GroupID.y == cb_yGroupId) ? cb_yDenominator : gLumReductionTGSize;

        OutputLumMap[GroupID.xy] = LumSamples[0] / float(divX * divY);
    }
}
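
For checking the shader output, a CPU reference of a single pass can help. The following is a minimal C++ sketch, assuming a row-major float image; `ReduceOnce` and `kTile` are hypothetical names, and the code clamps each tile to the image bounds instead of reading zeros, which should give the same averages as the denominators from the CB.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr uint32_t kTile = 16;  // must match gLumReductionTGSize

// Averages every 16x16 tile of a row-major width x height image.
// Edge tiles are divided by the number of texels they actually cover.
std::vector<float> ReduceOnce(const std::vector<float>& src,
                              uint32_t width, uint32_t height)
{
    const uint32_t outW = (width  + kTile - 1) / kTile;
    const uint32_t outH = (height + kTile - 1) / kTile;
    std::vector<float> dst(static_cast<size_t>(outW) * outH, 0.0f);

    for (uint32_t gy = 0; gy < outH; ++gy) {
        for (uint32_t gx = 0; gx < outW; ++gx) {
            const uint32_t x0 = gx * kTile;
            const uint32_t y0 = gy * kTile;
            const uint32_t x1 = std::min(x0 + kTile, width);   // clamp to image
            const uint32_t y1 = std::min(y0 + kTile, height);

            float sum = 0.0f;
            for (uint32_t y = y0; y < y1; ++y)
                for (uint32_t x = x0; x < x1; ++x)
                    sum += src[static_cast<size_t>(y) * width + x];

            // Real texel count of this tile, not 16*16
            const uint32_t count = (x1 - x0) * (y1 - y0);
            dst[static_cast<size_t>(gy) * outW + gx] = sum / static_cast<float>(count);
        }
    }
    return dst;
}
```

The summation order differs from the tree reduction on the GPU, so the comparison should use a small tolerance rather than exact equality.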

Thanks in advance!

