
DX11: How to downsample a texture with DirectX 11?


Hello!

I have a texture in 4K resolution and I need to downsample it to get a 1x1 resulting texture.

I know there are intermediate downsampling steps before reaching the 1x1 texture, but how does downsampling work, and how should I write my pixel shader to downsample my texture?
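For reference, the intermediate steps are simply the levels of a mip chain. The sketch below uses a hypothetical helper `MipChain` and assumes a 3840x2160 source, since "4K" can also mean 4096 pixels wide. It lists the levels using the D3D rule that each level halves each dimension, rounding down, with a minimum of 1.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Size of every level of a full mip chain, following the D3D rule that each
// level is max(1, floor(previous / 2)) in each dimension. Level 0 is the source.
std::vector<std::pair<unsigned, unsigned>> MipChain(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    std::vector<std::pair<unsigned, unsigned>> levels;
    levels.emplace_back(width, height);
    while (width > 1 || height > 1)
    {
        width = std::max(1u, width / 2);
        height = std::max(1u, height / 2);
        levels.emplace_back(width, height);
    }
    return levels;
}
```

For the assumed source, `MipChain(3840, 2160)` returns 12 levels: 3840x2160, 1920x1080, 960x540, 480x270, 240x135, 120x67, 60x33, 30x16, 15x8, 7x4, 3x2, 1x1. Odd sizes such as 135 are rounded down, so on a non-power-of-two image a chain of 2x2 reductions does not weight every source row exactly equally.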


For each slice, you just use a simple copy-pixel pixel shader and adjust your UVs with 0.5 texel offset with linear filtering - just continue this down to the 1x1 texture. That way, you let the hardware filtering do the job of averaging the pixels with nearly no cost, since you're sampling in-between texels :)
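The key point is that a bilinear tap placed exactly between four texels returns their average. Below is a small CPU emulation of one downsampling step. It uses hypothetical `Image`, `SampleBilinear` and `Downsample2x` helpers, a single channel, and clamp addressing. The tap is taken at each destination pixel center, which under D3D10/11 texel-center conventions already falls between four source texels for a half-size target. Whether an extra offset is needed is discussed further down the thread.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal single-channel image; texel (x, y) is stored at data[y * width + x].
struct Image
{
    int width = 0;
    int height = 0;
    std::vector<float> data;
    float At(int x, int y) const { return data[y * width + x]; }
};

// Bilinear sample at normalized (u, v) using D3D10/11 conventions: texel x
// covers [x, x + 1) in texel space, so its center is at x + 0.5.
// Out-of-range neighbours are clamped, like D3D11_TEXTURE_ADDRESS_CLAMP.
float SampleBilinear(const Image& img, float u, float v)
{
    float tx = u * img.width - 0.5f;   // position relative to texel centers
    float ty = v * img.height - 0.5f;
    int x0 = static_cast<int>(std::floor(tx));
    int y0 = static_cast<int>(std::floor(ty));
    float fx = tx - x0;
    float fy = ty - y0;
    auto cx = [&](int x) { return x < 0 ? 0 : (x >= img.width ? img.width - 1 : x); };
    auto cy = [&](int y) { return y < 0 ? 0 : (y >= img.height ? img.height - 1 : y); };
    float top = (1 - fx) * img.At(cx(x0), cy(y0)) + fx * img.At(cx(x0 + 1), cy(y0));
    float bottom = (1 - fx) * img.At(cx(x0), cy(y0 + 1)) + fx * img.At(cx(x0 + 1), cy(y0 + 1));
    return (1 - fy) * top + fy * bottom;
}

// One downsampling pass: a single bilinear tap per destination pixel, taken at
// the destination pixel center. For a half-size target that UV falls on the
// corner shared by four source texels, so each of them gets a weight of 0.25.
Image Downsample2x(const Image& src)
{
    assert(src.width % 2 == 0 && src.height % 2 == 0);
    Image dst;
    dst.width = src.width / 2;
    dst.height = src.height / 2;
    dst.data.resize(dst.width * dst.height);
    for (int y = 0; y < dst.height; ++y)
        for (int x = 0; x < dst.width; ++x)
            dst.data[y * dst.width + x] =
                SampleBilinear(src, (x + 0.5f) / dst.width, (y + 0.5f) / dst.height);
    return dst;
}
```

With a 4x2 source whose rows are 1, 3, 5, 7 and 5, 7, 9, 11, `Downsample2x` produces 4 and 8: the means of the left and right 2x2 blocks.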


vinterberg said:

For each slice, you just use a simple copy-pixel pixel shader and adjust your UVs with 0.5 texel offset with linear filtering - just continue this down to the 1x1 texture. That way, you let the hardware filtering do the job of averaging the pixels with nearly no cost, since you're sampling in-between texels :)


So you mean the pixel shader would look something like this:

pixelOut = pixelIN;




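// Pseudocode. texture[0] is the 1x1 level, texture[nslices - 1] the full-resolution source.
for (slice = nslices - 1; slice >= 1; --slice)
{
    setrendertarget(texture[slice - 1]);                               // render to the half-size texture
    setviewport(texture[slice - 1].width, texture[slice - 1].height);  // the viewport must match the new target
    texwidth  = texture[slice].width;                                  // the double-size texture we want to downsample
    texheight = texture[slice].height;
    float u_offset = (1.0f / texwidth) / 2.0f;                         // half texel size
    float v_offset = (1.0f / texheight) / 2.0f;
    pixelshader->setUVoffset(u_offset, v_offset);
    pixelshader->setinput(texture[slice]);                             // bind the source level as a shader resource
    pixelshader->run();                                                // draw a fullscreen quad
}

// Inside the pixel shader:
pixel_out = texture_sample(texture_input, vertexshader_UV + UVoffset);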
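The loop walks the chain from the full-size texture down to 1x1. Below is a CPU sketch using a hypothetical `ReduceToOnePixel` that takes a square, power-of-two, single-channel input. Each pass stores the 2x2 box average that one bilinear tap between four texels returns. For power-of-two sizes, the final 1x1 value is exactly the mean of the whole image.

```cpp
#include <cassert>
#include <vector>

// Repeatedly halves a square power-of-two, single-channel image until it is 1x1.
// Each pass stores the 2x2 box average of the previous level, which is what one
// bilinear tap at the shared texel corner returns. Returns the final 1x1 value.
float ReduceToOnePixel(std::vector<float> level, int size)
{
    assert(size > 0 && (size & (size - 1)) == 0);           // power of two
    assert(static_cast<int>(level.size()) == size * size);
    while (size > 1)
    {
        int half = size / 2;
        std::vector<float> next(half * half);
        for (int y = 0; y < half; ++y)
            for (int x = 0; x < half; ++x)
                next[y * half + x] = 0.25f * (level[(2 * y) * size + 2 * x] +
                                              level[(2 * y) * size + 2 * x + 1] +
                                              level[(2 * y + 1) * size + 2 * x] +
                                              level[(2 * y + 1) * size + 2 * x + 1]);
        level.swap(next);   // the new level becomes the source of the next pass
        size = half;
    }
    return level[0];
}
```

For a 4x4 image holding 0 through 15, the first pass gives 2.5, 4.5, 10.5 and 12.5, and the second gives 7.5, the mean of all 16 values.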


belfegor said:

I thought we didn't need a pixel offset since DX11?

We don't, but here it is set deliberately to use bilinear filtering for downsampling.

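Aside: you can also use the API to do this, though you won't have control over the filtering: ID3D11DeviceContext::GenerateMips. Note the mandatory creation flags for the texture: the D3D11_BIND_RENDER_TARGET and D3D11_BIND_SHADER_RESOURCE bind flags plus the D3D11_RESOURCE_MISC_GENERATE_MIPS misc flag.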


I am confused about what to believe now.

For example, in MJP's shadow sample project, where he downsamples/scales a texture, no "pixel offset" is applied, just a bilinear filter:


Quad vertices:

QuadVertex verts[4] =
{
    { XMFLOAT4(1, 1, 1, 1), XMFLOAT2(1, 0) },
    { XMFLOAT4(1, -1, 1, 1), XMFLOAT2(1, 1) },
    { XMFLOAT4(-1, -1, 1, 1), XMFLOAT2(0, 1) },
    { XMFLOAT4(-1, 1, 1, 1), XMFLOAT2(0, 0) }
};

Quad vertex shader:

VSOutput QuadVS(in VSInput input)
{
    VSOutput output;

    // Just pass it along
    output.PositionCS = input.PositionCS;
    output.TexCoord = input.TexCoord;

    return output;
}

Scaling pixel shader:

// Uses hw bilinear filtering for upscaling or downscaling
float4 Scale(in PSInput input) : SV_Target
{
    return InputTexture0.Sample(LinearSampler, input.TexCoord);
}
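The offset question can be checked with a little arithmetic along one axis. The sketch below uses hypothetical `SourcePosition` and `LeftWeight` helpers and assumes D3D10/11 rasterization rules: pixel centers sit at half-integer coordinates, and source texel j is centered at j + 0.5. It also assumes a full-screen quad whose UVs span [0, 1], like the one above. Under these assumptions, with no offset each destination pixel of a half-size target lands exactly between two source texels, so bilinear filtering weights both by 0.5. Adding half a source texel moves the tap onto a texel center instead.

```cpp
#include <cassert>
#include <cmath>

// Source texel-space position (texel j spans [j, j + 1)) of the center of
// destination pixel i, when a [0, 1] UV quad is drawn into a target that is
// srcWidth / 2 pixels wide. uvOffsetTexels is an extra UV offset in source texels.
double SourcePosition(int i, int srcWidth, double uvOffsetTexels)
{
    assert(srcWidth % 2 == 0);
    int dstWidth = srcWidth / 2;
    double u = (i + 0.5) / dstWidth + uvOffsetTexels / srcWidth;  // UV seen by the pixel shader
    return u * srcWidth;
}

// Bilinear weight given to the left texel of the pair being blended
// (1.0 means the tap sits on a texel center and nothing is averaged).
double LeftWeight(double position)
{
    double t = position - 0.5;   // relative to texel centers
    return 1.0 - (t - std::floor(t));
}
```

For an 8-texel source, destination pixel 2 lands at position 5.0 with no offset (weight 0.5, so texels 4 and 5 are averaged), but at 5.5 with a half-texel offset (weight 1.0, so only texel 5 is read). Under these assumptions, this is why the quad-based `Scale` shader above can downsample without an explicit offset. The classic half-pixel correction comes from D3D9, whose pixel centers sat on integer coordinates.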



Are you interested only in the 1x1 version, or do you need the whole chain? In short, are you computing the average exposure for exposure adaptation, or something else?

If, as I understand your question, you are only interested in the 1x1 result, you should forget about the pixel shader. A compute shader that loops over the image and keeps the running average in groupshared memory or in registers will beat the bandwidth cost of writing and reading full surfaces. You also get rid of the expensive pipeline flush between reduction passes, which is caused by switching each level from RTV to SRV.
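To put a rough number on the bandwidth argument: every intermediate level of a pixel-shader chain is written once and then read back once. The hypothetical helper below counts those texels using D3D-style floor halving. For an assumed 3840x2160 source, it returns 2,764,655 texels, about a third of the 8,294,400 source texels. A single compute pass that reduces straight to 1x1 still reads the source once, but skips that round trip.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Total texels in every level below the source when a width x height image is
// reduced to 1x1 by repeated floor halving (minimum 1 per dimension). In a
// pixel-shader chain each of these texels is written once and read back once.
std::uint64_t IntermediateTexels(std::uint32_t width, std::uint32_t height)
{
    assert(width > 0 && height > 0);
    std::uint64_t total = 0;
    while (width > 1 || height > 1)
    {
        width = std::max<std::uint32_t>(1, width / 2);
        height = std::max<std::uint32_t>(1, height / 2);
        total += static_cast<std::uint64_t>(width) * height;
    }
    return total;
}
```

Multiply by the bytes per texel of your format to get the extra traffic in bytes. The per-pass RTV-to-SRV transitions mentioned above come on top of that.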

If you are interested in the full chain, compute can also come out ahead. For example, you can again save on reads by having one compute shader generate 3 mips in a single run: it produces the extra 2 from what it already read for the first reduction, working in groupshared memory.
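Here is a CPU sketch of that idea, using hypothetical `TileMips` and `ReduceTile` types with a single channel. One "thread group" reads an 8x8 tile once and derives the next three levels (4x4, 2x2, 1x1) from values it already holds, the way a groupshared version would.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Three mip levels derived from one 8x8 tile: 4x4, 2x2 and 1x1.
struct TileMips
{
    std::array<float, 16> mip1{};   // 4x4
    std::array<float, 4> mip2{};    // 2x2
    float mip3 = 0.0f;              // 1x1
};

// Average of the 2x2 block at (x, y) of a size x size level stored row-major.
template <std::size_t N>
float Box2x2(const std::array<float, N>& level, int size, int x, int y)
{
    assert(2 * y + 1 < size && 2 * x + 1 < size);
    return 0.25f * (level[(2 * y) * size + 2 * x] + level[(2 * y) * size + 2 * x + 1] +
                    level[(2 * y + 1) * size + 2 * x] + level[(2 * y + 1) * size + 2 * x + 1]);
}

// The tile is the only data read from "memory"; every later level is computed
// from the previous one, which would live in groupshared storage on the GPU.
TileMips ReduceTile(const std::array<float, 64>& tile)
{
    TileMips out;
    for (int y = 0; y < 4; ++y)
        for (int x = 0; x < 4; ++x)
            out.mip1[y * 4 + x] = Box2x2(tile, 8, x, y);
    for (int y = 0; y < 2; ++y)
        for (int x = 0; x < 2; ++x)
            out.mip2[y * 2 + x] = Box2x2(out.mip1, 4, x, y);
    out.mip3 = Box2x2(out.mip2, 2, 0, 0);
    return out;
}
```

On the GPU, each level would be separated by a GroupMemoryBarrierWithGroupSync() and written to its own mip UAV. The CPU version only illustrates the data flow.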

Also forget about the legacy GenerateMips: it is not a hardware feature, and it usually does a suboptimal job compared to a hand-crafted solution.


theScore said:

I am only interested in the 1x1 version. Do you have a better method than downsampling for getting the 1x1 texture?

Below is a possible implementation. I don't claim it is the fastest, but it shows the logic clearly and is quite easy to understand. Only profiling, and tweaking the group count and the way the texture is traversed, will lead to the optimum, but it should already be quite fast.

This is just two dispatches with one small intermediate texture of w/8 by 1 pixels. The first pass computes one average per 8-pixel-wide column and writes it to the intermediate resource. The second pass then computes the average of the columns.

Each pass first computes a local average for its own thread, then averages the values for the group using groupshared storage, and finally writes the result from the first thread in the group.

The code may contain errors; I did not test it, but it should be quite close.

EDIT: On hold. The missing float atomics on PC make this a little harder to implement than on PS4/Xbox One. It needs some adjustment, and I will fix that later :(
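A common workaround for the missing float atomics is a tree reduction in groupshared memory. This is a swapped-in technique, not necessarily the fix galop1n had in mind. Each thread writes its partial value to its own slot. At each step, the first `stride` threads add the value `stride` slots away, halving the number of active threads until slot 0 holds the group total. The CPU emulation below uses a hypothetical `GroupSum` to show the access pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a groupshared tree reduction over one value per thread. On the GPU
// the inner loop runs in parallel, and a GroupMemoryBarrierWithGroupSync()
// separates the steps. Active threads write only slots below 'stride' and read
// only slots at or above it, so the sequential emulation gives the same result.
float GroupSum(std::vector<float> partial)
{
    const std::size_t n = partial.size();
    assert(n > 0 && (n & (n - 1)) == 0);   // thread count must be a power of two
    for (std::size_t stride = n / 2; stride > 0; stride /= 2)
        for (std::size_t thread = 0; thread < stride; ++thread)
            partial[thread] += partial[thread + stride];
    return partial[0];
}
```

For 64 threads holding 0 through 63, this takes 6 steps and leaves 2016 in slot 0. Dividing by the thread count gives the group average.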


// I assume the original image has dimensions that are multiples of 8, for clarity.
// Create a texture of dimension [w/8, 1] of type float (e.g. DXGI_FORMAT_R32_FLOAT) with UAV/SRV binding; call it Columns.
// Create a texture of dimension [1, 1] of type float with UAV/SRV binding; call it Result.

// At runtime:
// SetCompute 1
// Set Columns to U0
// Set SourceImage to T0
// Dispatch( width / 8, 1, 1 );
// SetCompute 2
// Set Result to U0 (first, so Columns is no longer bound as a UAV when it is bound as an SRV)
// Set Columns to T0
// Dispatch( 1, 1, 1 );
// Voilà

// Common.hlsli
float Lum( float3 rgb ) { return dot(rgb, float3(0.25, 0.60, 0.15)); }

// Pass1.hlsl
#include "Common.hlsli"
Texture2D<float3> sourceImage : register(t0);
RWTexture2D<float> columns : register(u0);

// One slot per thread, combined with a tree reduction instead of a float
// InterlockedAdd, which is not available on PC D3D11.
groupshared float partial[64];

[numthreads(8, 8, 1)]
void main(uint3 GTid : SV_GroupThreadID, uint gidx : SV_GroupIndex, uint3 Gid : SV_GroupID) {
	uint2 dim;
	sourceImage.GetDimensions(dim.x, dim.y);

	// Each thread averages one pixel out of every 8x8 tile in this group's column.
	uint rowCount = dim.y / 8;
	float tmp = 0.f;
	for (uint row = 0; row < rowCount; ++row)
		tmp += Lum(sourceImage[GTid.xy + uint2(Gid.x, row) * 8]); // this uses operator[]; you can try a sampler + Sample at half-pixel UVs here
	partial[gidx] = tmp / float(rowCount);
	GroupMemoryBarrierWithGroupSync();

	// Halve the number of active threads each step until partial[0] holds the group sum.
	[unroll]
	for (uint s = 32; s > 0; s >>= 1) {
		if (gidx < s)
			partial[gidx] += partial[gidx + s];
		GroupMemoryBarrierWithGroupSync();
	}

	if (gidx == 0)
		columns[uint2(Gid.x, 0)] = partial[0] / 64.f;
}

// Pass2.hlsl
#include "Common.hlsli"
Texture2D<float> columns : register(t0);
RWTexture2D<float> average : register(u0);

groupshared float partial[64];

[numthreads(64, 1, 1)]
void main(uint3 GTid : SV_GroupThreadID) {
	uint2 dim;
	columns.GetDimensions(dim.x, dim.y);

	// Each thread sums every 64th column.
	float tmp = 0.f;
	for (uint col = GTid.x; col < dim.x; col += 64)
		tmp += columns[uint2(col, 0)];
	partial[GTid.x] = tmp;
	GroupMemoryBarrierWithGroupSync();

	[unroll]
	for (uint s = 32; s > 0; s >>= 1) {
		if (GTid.x < s)
			partial[GTid.x] += partial[GTid.x + s];
		GroupMemoryBarrierWithGroupSync();
	}

	if (GTid.x == 0)
		average[uint2(0, 0)] = partial[0] / float(dim.x);
}
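Since the shaders are untested, a CPU reference of the same two-pass average is handy for validating the value read back from Result. Below is a minimal sketch using a hypothetical `TwoPassAverage`. It takes already computed per-pixel luminance and assumes dimensions that are multiples of 8, as the shaders do.

```cpp
#include <cassert>
#include <vector>

// CPU reference for the two compute passes. 'lum' holds per-pixel luminance,
// row-major, width x height. Pass 1: average of each 8-pixel-wide column strip.
// Pass 2: average of those column values.
float TwoPassAverage(const std::vector<float>& lum, int width, int height)
{
    assert(width % 8 == 0 && height % 8 == 0);
    assert(static_cast<int>(lum.size()) == width * height);
    std::vector<float> columns(width / 8, 0.0f);
    for (int c = 0; c < width / 8; ++c)
    {
        float sum = 0.0f;
        for (int y = 0; y < height; ++y)
            for (int x = 0; x < 8; ++x)
                sum += lum[y * width + c * 8 + x];
        columns[c] = sum / (8.0f * height);
    }
    float total = 0.0f;
    for (float v : columns)
        total += v;
    return total / static_cast<float>(columns.size());
}
```

The GPU sums in a different order, so compare the readback against this reference with a small tolerance rather than exact equality.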


Edited by galop1n

