How to do downsampling with DirectX 11?

11 comments, last by vinterberg 6 years, 7 months ago
3 hours ago, theScore said:

I am only interested in the 1x1 version. Do you have a better method than downsampling for getting the 1x1 texture?

Below is a possible implementation. I am not saying it is the fastest, but it shows the logic clearly and is quite easy to understand. Only profiling, tweaking the group count, and changing how the texture is traversed will get you to the optimum, but it should already be quite fast.

This is just two dispatches with one small intermediate texture of w/8 by 1 pixels. The first pass computes one average per 8-pixel-wide column and writes it to the intermediate resource, then the second pass computes the average of the columns.

Each pass first computes a local average in each thread, then combines the values of the group through groupshared storage, and finally the first thread in the group writes the result.

There may be errors in the code since I did not test it, but it should be quite close.

EDIT: On hold. The missing float atomics on PC make this a little harder to implement than on PS4/Xbox One. This needs some adjustment, I will fix that later :(

// I assume the original image has dimensions that are multiples of 8, for clarity.
// Create a texture of dimension [w/8, 1] of type float with UAV/SRV binding, call it Columns.
// Create a texture of dimension [1, 1] of type float with UAV/SRV binding, call it Result.

// At runtime:
// Set compute shader 1 (Pass1)
// Set Columns to U0
// Set SourceImage to T0
// Dispatch(width / 8, 1, 1);
// Set compute shader 2 (Pass2)
// Set Result to U0 (do this first, so Columns is no longer bound as a UAV)
// Set Columns to T0
// Dispatch(1, 1, 1);
// Voilà


// Common.hlsli
float Lum( float3 rgb ) { return dot(rgb,float3(0.25,0.60,0.15)); }

// Pass1.hlsl
#include "Common.hlsli"
Texture2D<float3> sourceImage : register(t0);
RWTexture2D<float> columns : register(u0);

// InterlockedAdd only works on int/uint (see the EDIT above), so each thread
// stores its partial value in groupshared memory and thread 0 sums them.
groupshared float partial[64];

[numthreads(8, 8, 1)]
void main(uint3 GTid : SV_GroupThreadID, uint gidx : SV_GroupIndex, uint3 Gid : SV_GroupID) {
	uint2 dim;
	sourceImage.GetDimensions(dim.x, dim.y);

	// Each thread averages its own pixel position over all 8x8 tiles of this column.
	uint rowCount = dim.y / 8;
	float tmp = 0.f;
	for (uint row = 0; row < rowCount; ++row)
		tmp += Lum(sourceImage[GTid.xy + uint2(Gid.x, row) * 8]) / float(rowCount); // this uses operator[]; you can try a sampler + Sample to hit half-pixel UVs here.

	partial[gidx] = tmp / 64.f;
	GroupMemoryBarrierWithGroupSync(); // make every partial value visible to thread 0

	if (gidx == 0) {
		float sum = 0.f;
		for (uint i = 0; i < 64; ++i)
			sum += partial[i];
		columns[uint2(Gid.x, 0)] = sum;
	}
}

// Pass2.hlsl
#include "Common.hlsli"
Texture2D<float> columns : register(t0);
RWTexture2D<float> average : register(u0);

// Same approach as Pass1: no float atomics, so sum through a groupshared array.
groupshared float partial[64];

[numthreads(64, 1, 1)]
void main(uint3 GTid : SV_GroupThreadID) {
	uint2 dim;
	columns.GetDimensions(dim.x, dim.y);

	// Each thread sums every 64th column value, starting at its own index.
	float tmp = 0.f;
	for (uint col = GTid.x; col < dim.x; col += 64)
		tmp += columns[uint2(col, 0)];

	partial[GTid.x] = tmp;
	GroupMemoryBarrierWithGroupSync(); // make every partial value visible to thread 0

	if (GTid.x == 0) {
		float sum = 0.f;
		for (uint i = 0; i < 64; ++i)
			sum += partial[i];
		average[uint2(0, 0)] = sum / float(dim.x);
	}
}

 


If you're having trouble grasping the simple downsample technique, a compute shader will only confuse you more LOL

Start out simple, and work from there....

.:vinterberg:.

