Hi Forum!
I need to reduce a 2D texture to its average value.
Each Dispatch() reduces it by 16x in both directions in a compute shader.
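The biggest problem arises at the last step, say when the texture is 8x5: out-of-bounds reads return 0, so dividing the sum by 16*16 gives an average that is far too small. For the earlier steps the error is not as big.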
I found the following solution: for the last, incomplete thread groups in each direction, store their real denominators in a constant buffer and use those instead of dividing by 16*16.
The CB data changes only on Resize(), so I don't need to update it every frame: I just keep a vector of these values, one entry per reduction target.
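To make the idea concrete, here is a minimal host-side C++ sketch of how such per-pass values could be computed on Resize(). It assumes each pass dispatches ceil(width / 16) x ceil(height / 16) groups and that only the last group in each direction can be incomplete. The names ReductionPassCB, kTileSize and BuildReductionPasses are hypothetical and used only for illustration; the struct just mirrors the four cbuffer fields.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical CPU-side mirror of the shader's cbuffer CB (illustrative names).
struct ReductionPassCB {
    uint32_t xGroupId;      // index of the last (possibly incomplete) group in x
    uint32_t xDenominator;  // number of texels that group really covers in x
    uint32_t yGroupId;      // index of the last (possibly incomplete) group in y
    uint32_t yDenominator;  // number of texels that group really covers in y
};

constexpr uint32_t kTileSize = 16;  // assumed to match gLumReductionTGSize

// Builds one entry per reduction pass, until the target is 1x1.
// Returns an empty vector for a degenerate (zero-sized) input.
std::vector<ReductionPassCB> BuildReductionPasses(uint32_t width, uint32_t height)
{
    std::vector<ReductionPassCB> passes;
    if (width == 0 || height == 0)
        return passes;

    while (width > 1 || height > 1) {
        // Number of thread groups dispatched for this pass (rounded up).
        const uint32_t groupsX = (width + kTileSize - 1) / kTileSize;
        const uint32_t groupsY = (height + kTileSize - 1) / kTileSize;

        ReductionPassCB cb;
        cb.xGroupId = groupsX - 1;
        cb.xDenominator = width - cb.xGroupId * kTileSize;   // in 1..16
        cb.yGroupId = groupsY - 1;
        cb.yDenominator = height - cb.yGroupId * kTileSize;  // in 1..16
        passes.push_back(cb);

        // The output of this pass is the input of the next one.
        width = groupsX;
        height = groupsY;
    }
    return passes;
}
```

With this sketch, an 8x5 input yields a single pass with denominators 8 and 5, and a 1920x1080 input yields three passes (1920x1080, 120x68, 8x5).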
But I wonder: is there a more elegant solution?
Here is a sketch:
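static const uint gLumReductionTGSize = 16;
// Threads per group; also the size of the shared-memory array
static const uint NumThreads = gLumReductionTGSize * gLumReductionTGSize;

// Declarations omitted from the original sketch; resource types are assumed
Texture2D<float> InputLumMap;    // input of this reduction pass
RWTexture2D<float> OutputLumMap; // one texel per thread group
groupshared float LumSamples[NumThreads];

cbuffer CB
{
    uint cb_xGroupId;
    uint cb_xDenominator; // if GroupID.x == cb_xGroupId, use it. Otherwise - gLumReductionTGSize
    uint cb_yGroupId;
    uint cb_yDenominator; // if GroupID.y == cb_yGroupId, use it. Otherwise - gLumReductionTGSize
}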
// Each pass reduces by 16x16
[numthreads(gLumReductionTGSize, gLumReductionTGSize, 1)]
void main(uint3 GroupID : SV_GroupID, uint3 DispatchThreadId : SV_DispatchThreadID, uint ThreadIndex : SV_GroupIndex)
{
    // Will read 0 in case "out of bounds"
    float pixelLuminance = InputLumMap[DispatchThreadId.xy];

    // Store in shared memory
    LumSamples[ThreadIndex] = pixelLuminance;
    GroupMemoryBarrierWithGroupSync();

    // Reduce
    [unroll]
    for (uint s = NumThreads / 2; s > 0; s >>= 1)
    {
        if (ThreadIndex < s)
        {
            LumSamples[ThreadIndex] += LumSamples[ThreadIndex + s];
        }
        GroupMemoryBarrierWithGroupSync();
    }

    if (ThreadIndex == 0)
    {
        uint divX = (GroupID.x == cb_xGroupId) ? cb_xDenominator : gLumReductionTGSize;
        uint divY = (GroupID.y == cb_yGroupId) ? cb_yDenominator : gLumReductionTGSize;
        OutputLumMap[GroupID.xy] = LumSamples[0] / (divX * divY);
    }
}
Thanks in advance!