DX11 How to downsample with DirectX 11?

theScore    158

Hello!

I have a texture in 4K resolution and I need to downsample it to a 1x1 texture.

I know there are intermediate downsampling steps before reaching the 1x1 texture, but how does downsampling work, and how should I write my pixel shader to downsample my texture?

vinterberg    1236

For each slice, you just use a simple copy-pixel pixel shader and adjust your UVs with 0.5 texel offset with linear filtering - just continue this down to the 1x1 texture. That way, you let the hardware filtering do the job of averaging the pixels with nearly no cost, since you're sampling in-between texels :)
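
To get a feel for how many of these passes "down to the 1x1 texture" involves, here is a small C++ sketch that lists the level sizes. It assumes "4K" means 3840x2160 and the usual halve-and-round-down mip rule; `mipChain` is a hypothetical helper name, not part of any API.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sizes of every level from the full texture down to 1x1, halving each
// dimension with round-down and clamping it to 1 (the usual mip-chain rule).
std::vector<std::pair<unsigned, unsigned>> mipChain(unsigned width, unsigned height) {
    std::vector<std::pair<unsigned, unsigned>> levels;
    for (;;) {
        levels.emplace_back(width, height);
        if (width == 1 && height == 1) break;
        width = std::max(1u, width / 2);
        height = std::max(1u, height / 2);
    }
    return levels;
}
```

Under those assumptions, `mipChain(3840, 2160)` yields 12 levels, so 11 downsample passes. Some levels have odd sizes (for example 15 x 8), and halving those with a single 2x2 tap skips the last row or column of texels, so the final value is only approximately the true average unless the odd levels get special handling.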

 

theScore    158
11 hours ago, vinterberg said:

For each slice, you just use a simple copy-pixel pixel shader and adjust your UVs with 0.5 texel offset with linear filtering - just continue this down to the 1x1 texture. That way, you let the hardware filtering do the job of averaging the pixels with nearly no cost, since you're sampling in-between texels :)

 

So you mean the pixel shader would look something like this?

pixelOut = pixelIN;

vinterberg    1236

Pseudo-code:

for (slice = (nslices - 1) to 1 step -1)
{
	setrendertarget(texture[slice - 1]);				// render to half-size texture
	pixelshader->setinputtexture(texture[slice]);
	texwidth = texture[slice].width;					// the double-size texture we want to downsample
	texheight = texture[slice].height;
	float u_offset = (1.0f / texwidth) / 2.0f;			// half texel size
	float v_offset = (1.0f / texheight) / 2.0f;
	pixelshader->setUVoffset(u_offset, v_offset);
	pixelshader->run();									// draw a fullscreen quad
}

pixelshader
{
	pixel_out = texture_sample(texture_input, vertexshader_UV + UVoffset);
};
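
What the loop above computes can be mimicked on the CPU. The following C++ sketch is a reference for checking results, not GPU code: it assumes a single-channel, square, power-of-two image, and the names `Image`, `halve` and `downsampleToOne` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-channel image stored row by row.
struct Image {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels;
    float at(std::size_t x, std::size_t y) const { return texels[y * width + x]; }
};

// One pass: each destination pixel is the mean of a 2x2 source block, which is
// what a bilinear tap placed on the shared corner of those four texels returns.
Image halve(const Image& src) {
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    Image dst{src.width / 2, src.height / 2, {}};
    dst.texels.reserve(dst.width * dst.height);
    for (std::size_t y = 0; y < dst.height; ++y)
        for (std::size_t x = 0; x < dst.width; ++x)
            dst.texels.push_back(0.25f * (src.at(2 * x, 2 * y) + src.at(2 * x + 1, 2 * y) +
                                          src.at(2 * x, 2 * y + 1) + src.at(2 * x + 1, 2 * y + 1)));
    return dst;
}

// Repeat the pass until a single texel is left.
// Assumes a square power-of-two image, so every level halves cleanly.
float downsampleToOne(Image img) {
    while (img.width > 1) img = halve(img);
    return img.texels[0];
}
```

For a power-of-two image every texel ends up with the same weight, so the 1x1 result equals the plain mean of the input.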

 

unbird    8338
13 hours ago, belfegor said:

I thought we didn't need the pixel offset since DX11?

We don't, but here it is set deliberately so that bilinear filtering does the downsampling.

Aside: you can also use the API to do this, though you won't have control over the filtering: ID3D11DeviceContext::GenerateMips. Note the mandatory creation flags for the texture: D3D11_RESOURCE_MISC_GENERATE_MIPS, together with both D3D11_BIND_RENDER_TARGET and D3D11_BIND_SHADER_RESOURCE.

belfegor    2835

I am confused about what to believe now.

For example, I am looking at MJP's shadow sample project, where he downsamples/scales a texture; there are no "pixel offsets" applied, just a bilinear filter:

 

quad verts

QuadVertex verts[4] =
{
    { XMFLOAT4( 1,  1, 1, 1), XMFLOAT2(1, 0) },
    { XMFLOAT4( 1, -1, 1, 1), XMFLOAT2(1, 1) },
    { XMFLOAT4(-1, -1, 1, 1), XMFLOAT2(0, 1) },
    { XMFLOAT4(-1,  1, 1, 1), XMFLOAT2(0, 0) }
};

 

quad vertex shader

VSOutput QuadVS(in VSInput input)
{
    VSOutput output;

    // Just pass it along
    output.PositionCS = input.PositionCS;
    output.TexCoord = input.TexCoord;

    return output;
}

 

// Uses hw bilinear filtering for upscaling or downscaling
float4 Scale(in PSInput input) : SV_Target
{
    return InputTexture0.Sample(LinearSampler, input.TexCoord);
}
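
The offset question can be checked with a little arithmetic. The C++ sketch below works out, along one axis, which source texels a bilinear fetch blends when a fullscreen quad with UVs from 0 to 1 is drawn into a half-size target. It assumes the D3D10+ convention (pixel and texel centers sit at half-integer coordinates); `Footprint` and `bilinearFootprint` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Which two source texels a bilinear fetch blends along one axis, and how much
// weight the second one gets.
struct Footprint {
    int firstTexel;      // index of the left (or top) texel
    float secondWeight;  // weight of texel firstTexel + 1; the first gets 1 - secondWeight
};

// srcWidth: width of the texture being read, in texels.
// dstPixel: index of the pixel being written in the half-size target.
// uvOffset: extra offset added to the interpolated UV (0 for a plain copy shader).
Footprint bilinearFootprint(int srcWidth, int dstPixel, float uvOffset) {
    const float dstWidth = srcWidth / 2.0f;
    const float u = (dstPixel + 0.5f) / dstWidth + uvOffset;  // UV at the pixel center
    const float x = u * srcWidth - 0.5f;  // texel space, with texel centers on integers
    const float base = std::floor(x);
    return {static_cast<int>(base), x - base};
}
```

With an 8-texel source, `bilinearFootprint(8, 1, 0.0f)` gives texel 2 with a second weight of 0.5, an even blend of texels 2 and 3. Adding half a source texel, `bilinearFootprint(8, 1, 0.5f / 8)`, gives texel 3 with a second weight of 0, so the fetch lands exactly on one texel center. Under these assumptions the plain copy shader with a linear sampler already averages each 2x2 block, which matches what the sample does.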

 

galop1n    977

Are you interested only in the 1x1 version, or do you need the whole chain? In short, are you computing the average exposure for exposure adaptation, or something else?

If you are interested only in the 1x1 result, as I understand your question, forget about the pixel shader. A compute shader that loops over the image and keeps the running average in groupshared memory or in registers will outperform it: you avoid the bandwidth of writing and reading full surfaces, and you get rid of the expensive pipeline flush between reduction passes (caused by switching a resource from RTV to SRV).

If you are interested in the full chain, compute can also come out ahead. For example, you can save on reads by having one compute pass generate 3 mips in a single run, producing the extra 2 from what it already read for the first reduction and working in groupshared memory.

Also forget about the legacy GenerateMips; it is not a hardware feature and usually does a suboptimal job compared to a hand-crafted solution.

 

galop1n    977
3 hours ago, theScore said:

I am interested in the 1x1 version only. Do you have a better method than downsampling for getting the 1x1 texture?

Below is a possible implementation. I am not saying it is the fastest, but it shows the logic clearly and is quite easy to understand. Only profiling and tweaking the group count and the way the texture is traversed will lead to the optimum, but it should already be quite fast.

This is just two dispatches with one small intermediate texture of w/8 by 1 pixels. The first pass computes one average per 8-pixel-wide column and writes it to the intermediate resource; the second pass then computes the average of the columns.

Each pass first computes a local average for each thread, then averages those values across the group through groupshared storage, and finally the first thread in the group writes the result.

There is potential for errors in the code; I did not test it, but it should be quite close.

EDIT: On hold. The missing float atomics on PC make this a little harder to implement than on PS4/Xbox One, so it needs some adjustment. I will fix that later :(
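
One common way around the missing float atomics (HLSL's InterlockedAdd on PC only takes int and uint) is to give every thread its own slot in a groupshared array and sum the slots as a tree. This is not the code from this post, just a CPU stand-in in C++ to show the access pattern; `kGroupSize` and `groupSum` are hypothetical names, and each outer loop iteration stands for one step between two group barriers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kGroupSize = 64;  // matches numthreads(8, 8, 1)

// CPU stand-in for a groupshared tree reduction: each outer iteration plays the
// role of one step between two group barriers, in which the first `stride`
// lanes each add the value held by the lane `stride` slots further on.
float groupSum(std::array<float, kGroupSize> shared) {
    for (std::size_t stride = kGroupSize / 2; stride > 0; stride /= 2)
        for (std::size_t lane = 0; lane < stride; ++lane)
            shared[lane] += shared[lane + stride];
    return shared[0];  // lane 0 ends up holding the total
}
```

For 64 lanes this takes 6 steps, and within a step no lane reads a slot that another lane writes, which is why a barrier between steps is enough on the GPU.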

 



// I assume the original image has dimensions that are multiples of 8, for clarity
// you will create a texture of dimension [w/8, 1] of type float with uav/srv binding, call it Columns
// you will create a texture of dimension [1,1] of type float with uav/srv binding, call it Result

// At runtime :
// SetCompute 1
// Set Columns to U0
// Set SourceImage to T0
// Dispatch( width / 8, 1, 1);
// SetCompute 2
// Set Columns to T0
// Set Result to U0
// Dispatch( 1, 1, 1 );
// Voilà


// Common.hlsli
float Lum( float3 rgb ) { return dot(rgb,float3(0.25,0.60,0.15)); }

// Pass1.hlsl
#include "Common.hlsli"
Texture2D<float3> sourceImage : register(t0);
RWTexture2D<float> columns : register(u0);

groupshared float intermediate;
[numthreads(8, 8, 1)]
void main(uint2 GTid : SV_GroupThreadID, uint gidx : SV_GroupIndex, uint2 Gid : SV_GroupID) {
	intermediate = 0;
	
	uint2 dim;
	sourceImage.GetDimensions(0,dim.x,dim.y);

	uint rowCount = dim.y / 8; 
	float tmp = 0.f;
	for(uint row = 0; row < rowCount; ++row )
		tmp += Lum(sourceImage[ GTid + uint2(Gid.x,row) * 8 ]) / float(rowCount); // this uses operator[]; you can try a sampler + Sample to hit half-pixel UVs here

	GroupMemoryBarrierWithGroupSync(); // for the initial intermediate = 0;
	InterlockAdd(intermediate,tmp / 64.f); // float atomic add: not available in PC HLSL, see the EDIT note above
	GroupMemoryBarrierWithGroupSync(); // for the interlock

	if (gidx == 0) 
		columns[uint2(Gid.x, 0)] = intermediate; // RWTexture2D is indexed with a uint2
}

// Pass2.hlsl
#include "Common.hlsli"
Texture2D<float> columns : register(t0);
RWTexture2D<float> average : register(u0);

groupshared float intermediate;
[numthreads(64, 1, 1)]
void main(uint GTid : SV_GroupThreadID) {
	intermediate = 0;
	
	uint2 dim;
	columns.GetDimensions(0,dim.x,dim.y);

	float tmp = 0.f;
	for(uint col = 0; col < dim.x; col += 64)
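		tmp += columns[uint2(col + GTid, 0)]; // Texture2D is indexed with a uint2

	GroupMemoryBarrierWithGroupSync(); // for the initial intermediate = 0;
	InterlockAdd(intermediate,tmp); // float atomic add: not available in PC HLSL, see the EDIT note above
	GroupMemoryBarrierWithGroupSync(); // for the interlock

	if (GTid == 0) 
		average[uint2(0, 0)] = intermediate / dim.x; // the single texel of the 1x1 result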
}
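
To check the output of the two dispatches, a CPU reference of the same two-pass idea can help. This is a minimal C++ sketch, assuming the luminance values have already been extracted into a row-major array and that the width is a multiple of 8; `columnAverages` and `averageOfColumns` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pass 1: one average per 8-texel-wide column strip.
// lum holds width * height luminance values, row by row; width % 8 == 0 is assumed.
std::vector<float> columnAverages(const std::vector<float>& lum,
                                  std::size_t width, std::size_t height) {
    assert(width % 8 == 0 && lum.size() == width * height);
    std::vector<float> columns(width / 8, 0.0f);
    for (std::size_t c = 0; c < columns.size(); ++c) {
        double sum = 0.0;
        for (std::size_t y = 0; y < height; ++y)
            for (std::size_t x = 0; x < 8; ++x)
                sum += lum[y * width + c * 8 + x];
        columns[c] = static_cast<float>(sum / (8.0 * height));
    }
    return columns;
}

// Pass 2: average of the strip averages. Every strip covers the same number of
// texels, so this equals the mean of the whole image.
float averageOfColumns(const std::vector<float>& columns) {
    double sum = 0.0;
    for (float c : columns) sum += c;
    return static_cast<float>(sum / columns.size());
}
```

Comparing `averageOfColumns(columnAverages(lum, width, height))` against a read-back of the Result texture, within a small tolerance for float rounding, is one way to validate the shaders.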

 

Edited by galop1n
