MysteryX

HLSL Dithering


I'm processing video data through a series of HLSL shaders at 16-bit depth before returning 8-bit data to the CPU, using DirectX 9.

 

I just realized I'm not applying any dithering!

 

I saw this option in DirectX. What kind of dithering does it apply? Some people told me it probably depends on the GPU and I shouldn't rely on it.

m_pDevice->SetRenderState(D3DRS_DITHERENABLE, TRUE);
 
I would like to apply Ordered Dither at the last stage of processing. Has anyone done such an implementation?


Hi, 

 

You may first use this script to generate a texture which contains the Bayer matrix (with threshold value encoded into the RG channel):

 

https://jsfiddle.net/ming4883/9qmjssuq/

 

and then fetch the threshold values using the following HLSL

uniform sampler2D _BayerTex; // the generated texture
uniform float4 _BayerTex_TexelSize; // 1.0 / width, 1.0 / height, width, height

float Bayer( float2 uv )
{
	uv = uv * _BayerTex_TexelSize.xy;
	float val = dot(tex2D(_BayerTex, uv).rg, float2(256.0 * 255.0, 255.0));
	val = val * _BayerTex_TexelSize.x * _BayerTex_TexelSize.y;
	return val;
}

where uv is the screen-space pixel coordinate (i.e. 0..screen width, 0..screen height),

and make sure you set the sampler of _BayerTex to D3DTEXF_POINT and D3DTADDRESS_WRAP.
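If you'd rather generate the matrix in C++ instead of translating the fiddle, the recursive Bayer construction is small. This is a sketch with my own names; the encoding assumes, as the shader's dot product implies, that R holds the high byte and G the low byte of the threshold:

```cpp
#include <cstdint>
#include <vector>

// Build an n x n Bayer matrix (n must be a power of two) by iterative
// doubling: each pass expands every entry v into the 2x2 block
// [4v, 4v+2; 4v+3, 4v+1]. Entries end up a permutation of 0 .. n*n-1.
std::vector<uint32_t> makeBayer(int n) {
    std::vector<uint32_t> m(n * n, 0);
    for (int size = 1; size < n; size *= 2) {
        for (int y = 0; y < size; ++y) {
            for (int x = 0; x < size; ++x) {
                uint32_t v = 4 * m[y * n + x];
                m[y * n + x]                   = v;      // top-left
                m[y * n + (x + size)]          = v + 2;  // top-right
                m[(y + size) * n + x]          = v + 3;  // bottom-left
                m[(y + size) * n + (x + size)] = v + 1;  // bottom-right
            }
        }
    }
    return m;
}

// Split one threshold across the R and G bytes so that the shader's
// dot(rg, float2(256.0 * 255.0, 255.0)) recovers it: after sampling,
// rg = (R/255, G/255), so the dot product yields R*256 + G.
void encodeThreshold(uint32_t value, uint8_t& r, uint8_t& g) {
    r = uint8_t(value >> 8);   // high byte
    g = uint8_t(value & 0xFF); // low byte
}
```

The `val * _BayerTex_TexelSize.x * _BayerTex_TexelSize.y` line then divides the decoded value by width*height, mapping it back into [0, 1).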

 

Hope this helps


Thanks! So I'd have to translate that JavaScript into C++?

 

I just found this.

 

D3DXLoadSurfaceFromSurface has the option D3DX_FILTER_DITHER

"The resulting image must be dithered using a 4x4 ordered dither algorithm."

 

Contrary to D3DRS_DITHERENABLE, this one clearly specifies the algorithm it uses.

 
Is that what I'm looking for, or is your implementation better?


OK, your version allows using a 4x4, 5x5 or 6x6 matrix while D3DX_FILTER_DITHER uses only 4x4. Result-wise, I suppose it should be the same?

 

However, I ran into an implementation problem. Dithering needs to be applied when I downgrade from 16-bit to 8-bit, and it needs to be done on the GPU because transferring back to the CPU is a serious bottleneck. I need to read from the Render Target, copy into another texture while applying the dither, and then transfer to the CPU. However, it appears that transferring back to the CPU with D3DXLoadSurfaceFromSurface is *MUCH* slower than using GetRenderTargetData (no idea why).

 

One work-around would be to re-run the dithered image through another loop of processing (ugly design).

 

Another work-around would be to use D3DRS_DITHERENABLE while rendering, but I have no clue what kind of dithering it applies.

 

Or perhaps I'm best off adapting the code you posted. I'll need to translate it into PS_3_0 and C++ (and I know nothing about writing HLSL).

 

Oh, your code uses a 16x16, 32x32 or 64x64 matrix instead of 4x4! That's a considerable difference, although I'm not sure which is best before doing x265 encoding.

 

 

 

The size of the map selected should be equal to or larger than the ratio of source colors to target colors. For example, when quantizing a 24bpp image to 15bpp (256 colors per channel to 32 colors per channel), the smallest map one would choose would be 4x2, for the ratio of 8 (256:32). This allows expressing each distinct tone of the input with different dithering patterns

 

Which means that to convert 16-bit (65536 levels) to 8-bit (256 levels), I need a ratio of 256, which means I need a 16x16 grid. The 4x4 grid definitely won't be optimal. I could code the generation of the matrix, or perhaps it's best to simply hardcode the values. Does anyone know where I can find them?
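That back-of-the-envelope calculation is easy to mechanize. A sketch (the function name is mine), restricted to the square power-of-two sizes Bayer matrices come in:

```cpp
// Smallest power-of-two side length for a square ordered-dither matrix
// that can express every level discarded when reducing srcBits per
// channel to dstBits: it needs at least 2^(srcBits - dstBits) cells.
int minDitherMatrixSide(int srcBits, int dstBits) {
    int levels = 1 << (srcBits - dstBits); // ratio of source to target levels
    int side = 1;
    while (side * side < levels)
        side *= 2; // Bayer matrices come in power-of-two sizes
    return side;
}
```

minDitherMatrixSide(16, 8) gives 16, matching the 16x16 grid above; the 8-to-5-bits-per-channel example from the quote gives 4 (a 4x4 square already covers the ratio of 8).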



(quoting the Bayer HLSL snippet and sampler-state instructions from the reply above)

 

OK. This needs to be called from another PS_3_0 HLSL file that has a main entry point, correct? What would that main function look like?

 

 

I found this 32x32 Bayer Matrix in MPC-HC's source code. I can just trim it into 16x16 and discard the extra values, correct?


I'm really not good with HLSL, but here's what I managed to put together. Is this script correct? Either way, it compiles and will be easy to change after your feedback.

 

It just gives "warning X3206: implicit truncation of vector type" on "return val"

sampler s0 : register(s0);
sampler s1 : register(s1); // the Bayer Matrix texture
float4 p0 :  register(c0);
float2 p1 :  register(c1);
float2 MatrixSize : register(c2);

#define width  (p0[0])
#define height (p0[1])
#define px (p1[0])
#define py (p1[1])

float Bayer(float2 uv)
{
	uv = uv * p1.xy;
	float2 val = dot(tex2D(s1, uv).rg, float2(256.0 * 255.0, 255.0));
	val = val * MatrixSize.x * MatrixSize.y;
	return val;
}

// -- Main code --
float4 main(float2 tex : TEXCOORD0) : COLOR {
    float4 c0 = tex2D(s0, tex);
    c0.x = c0.x + Bayer(tex);
    c0.y = c0.y + Bayer(tex);
    c0.z = c0.z + Bayer(tex);
    return c0;
}

Do I apply the same noise on all 3 channels?

 

If I'm copying these values (0x2c90, 0x38f4, 0x3bba, 0x29e0, etc.) into a D3DFMT_A8R8G8B8 texture, it would make more sense to use 'bg' instead of 'rg' and copy each value into the first 2 bytes of the 4-byte pixel.
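One caveat about copying the raw bytes: in D3DFMT_A8R8G8B8 the little-endian memory layout is B, G, R, A, and the shader's dot product on `.bg` treats B as the high byte; a raw little-endian write of the 16-bit value puts its low byte into B, so the value decodes byte-swapped. A sketch (helper names are mine) of packing explicitly instead:

```cpp
#include <cstdint>

// Pack one 16-bit threshold into an A8R8G8B8 pixel so that the shader's
// dot(tex2D(s1, uv).bg, float2(256.0 * 255.0, 255.0)) = B*256 + G
// recovers the original value: B gets the high byte, G the low byte.
uint32_t packThreshold(uint16_t value) {
    uint8_t b = uint8_t(value >> 8);   // high byte -> blue  (byte offset 0)
    uint8_t g = uint8_t(value & 0xFF); // low byte  -> green (byte offset 1)
    // Read back as a uint32_t this is 0xAARRGGBB; alpha opaque, red unused.
    return 0xFF000000u | (uint32_t(g) << 8) | b;
}

// CPU mirror of the shader's decode, for verification.
uint16_t unpackThreshold(uint32_t pixel) {
    uint8_t b = uint8_t(pixel & 0xFF);
    uint8_t g = uint8_t((pixel >> 8) & 0xFF);
    return uint16_t(b * 256 + g);
}
```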


I wrote the code to create the Bayer Matrix, trimming the 32x32 matrix from MPC-HC down to 16x16 and copying each value into the B and G fields of a BGRA texture (does the order between the two byte fields matter?)

#include "Dither.h"

// Dither matrix in 16-bit floating point format
const unsigned short DITHER_MATRIX[DITHER_MATRIX_SIZE][DITHER_MATRIX_SIZE] = {
0x2c90, 0x38f4, 0x3bba, 0x29e0, 0x35f4, 0x3230, 0x3bbc, 0x3924, 0x3a46, 0x3644, 0x39e2, 0x370c, 0x3444, 0x3b1a, 0x3140, 0x39d2,
0x385a, 0x3b24, 0x2c10, 0x38c6, 0x3808, 0x2780, 0x3bbe, 0x37f8, 0x350c, 0x3a6c, 0x3368, 0x3bc0, 0x3000, 0x3886, 0x31b0, 0x3554,
0x3a94, 0x3618, 0x3430, 0x3a34, 0x3834, 0x39fe, 0x2740, 0x3758, 0x3494, 0x3b7a, 0x2700, 0x3958, 0x3858, 0x3a24, 0x364c, 0x3bc2,
0x3278, 0x3a22, 0x353c, 0x39de, 0x3268, 0x3a98, 0x36fc, 0x2ed0, 0x39e0, 0x30f0, 0x381a, 0x3996, 0x35ac, 0x3af2, 0x39b8, 0x37bc,
0x3250, 0x39dc, 0x3800, 0x30e8, 0x3b42, 0x34d4, 0x3970, 0x3afe, 0x3020, 0x3898, 0x33e8, 0x3b34, 0x2e10, 0x3320, 0x391a, 0x26c0,
0x3784, 0x38de, 0x3060, 0x3b5c, 0x3600, 0x38e6, 0x3490, 0x3b2a, 0x387a, 0x365c, 0x3b3c, 0x2be0, 0x37ac, 0x33d8, 0x2680, 0x3b98,
0x38d6, 0x2a60, 0x3b7e, 0x391e, 0x36d0, 0x2fe0, 0x3812, 0x32a0, 0x3a84, 0x36b0, 0x3a50, 0x357c, 0x37dc, 0x3b68, 0x3594, 0x3aca,
0x344c, 0x3a7c, 0x3674, 0x3884, 0x2d30, 0x3a48, 0x3170, 0x398e, 0x2900, 0x3a30, 0x34bc, 0x38ea, 0x3b70, 0x3a3c, 0x3852, 0x3460,
0x3b04, 0x37a0, 0x351c, 0x2d40, 0x3a80, 0x394e, 0x3b84, 0x3614, 0x3900, 0x2b20, 0x396c, 0x31b8, 0x38ca, 0x3a0c, 0x3038, 0x385c,
0x39a2, 0x2c70, 0x3ba2, 0x3464, 0x3992, 0x36dc, 0x3bc4, 0x3580, 0x3824, 0x32d0, 0x3abc, 0x2ec0, 0x3560, 0x30f8, 0x3974, 0x3610,
0x3a12, 0x3110, 0x3aaa, 0x38a2, 0x35e4, 0x341c, 0x28c0, 0x3a02, 0x34a8, 0x3b60, 0x3790, 0x3aa2, 0x2c40, 0x346c, 0x373c, 0x3bc6,
0x32f0, 0x37e8, 0x391c, 0x3100, 0x3af6, 0x2640, 0x3868, 0x3098, 0x3b3e, 0x3944, 0x3620, 0x3870, 0x39da, 0x374c, 0x3bc8, 0x2e20,
0x3804, 0x3932, 0x3660, 0x3260, 0x3bca, 0x38ce, 0x3ade, 0x382e, 0x30a0, 0x389e, 0x33a0, 0x363c, 0x3b86, 0x3910, 0x3a58, 0x2820,
0x36a0, 0x3b28, 0x34e0, 0x3a40, 0x3768, 0x3510, 0x3a54, 0x390e, 0x36e8, 0x2ae0, 0x3bcc, 0x31a0, 0x3aa4, 0x2600, 0x38cc, 0x3400,
0x3ac4, 0x2800, 0x3b4a, 0x39ee, 0x2cc0, 0x3764, 0x31c8, 0x35cc, 0x3bb6, 0x39a8, 0x2f30, 0x3a1e, 0x3816, 0x3160, 0x35b0, 0x389a,
0x3a86, 0x3070, 0x3848, 0x2d70, 0x38ba, 0x3baa, 0x2e60, 0x3414, 0x3ae4, 0x3544, 0x3a06, 0x37fc, 0x347c, 0x36d8, 0x3b12, 0x35a4};

HRESULT __stdcall CopyDitherMatrixToSurface(InputTexture* dst, IScriptEnvironment* env) {
	// Copy into the B and G bytes of the BGRA texture (little-endian: the short's low byte lands in B, its high byte in G)
	int TempMatrix[DITHER_MATRIX_SIZE][DITHER_MATRIX_SIZE]{ };
	short* pOut;
	for (int i = 0; i < DITHER_MATRIX_SIZE; ++i) {
		for (int j = 0; j < DITHER_MATRIX_SIZE; ++j) {
			*(short*)&TempMatrix[i][j] = DITHER_MATRIX[i][j];
		}
	}

	HR(CopyAviSynthToBuffer((byte*)&TempMatrix, 4 * DITHER_MATRIX_SIZE, 1, DITHER_MATRIX_SIZE, DITHER_MATRIX_SIZE, dst, env));
	return S_OK;
}

HRESULT __stdcall CopyAviSynthToBuffer(const byte* src, int srcPitch, int clipPrecision, int width, int height, InputTexture* dst, IScriptEnvironment* env) {
	// Copies source frame into main surface buffer, or into additional input textures
	RECT SrcRect;
	SrcRect.top = 0;
	SrcRect.left = 0;
	SrcRect.right = width;
	SrcRect.bottom = height;
	HR(D3DXLoadSurfaceFromMemory(dst->Surface, nullptr, nullptr, src, GetD3DFormat(clipPrecision, false), srcPitch, nullptr, &SrcRect, D3DX_FILTER_NONE, 0));
	return S_OK;
}

However, I don't understand the logic of the shader.

sampler s0 : register(s0);
uniform sampler s1 : register(s1); // the Bayer Matrix texture
float4 p0 :  register(c0);
float2 p1 :  register(c1);
uniform float4 MatrixSize : register(c2); // width, height, 1/width, 1/height
 
#define width  (p0[0])
#define height (p0[1])
#define px (p1[0])
#define py (p1[1])
 
float Bayer(float2 uv)
{
    uv = uv * MatrixSize.zw;
    float2 val = dot(tex2D(s1, uv).bg, float2(256.0 * 255.0, 255.0));
    val = val * MatrixSize.z * MatrixSize.w;
    return val;
}
 
// -- Main code --
float4 main(float2 tex : TEXCOORD0) : COLOR {
    float4 c0 = tex2D(s0, tex);
    c0.x = c0.x + 1 / Bayer(tex);
    c0.y = c0.y + 1 / Bayer(tex);
    c0.z = c0.z + 1 / Bayer(tex);
    return c0;
}

 
Can someone review this HLSL code?
 
Thanks


This looks close enough. It appears to be working, unless I'm missing something.

sampler s0 : register(s0);
uniform sampler s1 : register(s1); // the Bayer Matrix texture
float4 p0 :  register(c0);
float2 p1 :  register(c1);
uniform float4 MatrixSize : register(c2); // width, height, 1/width, 1/height

#define width  (p0[0])
#define height (p0[1])
#define px (p1[0])
#define py (p1[1])

float Bayer(float2 uv)
{
    uv = uv * MatrixSize.zw;
    float val = dot(tex2D(s1, uv).bg, float2(256.0 * 255.0, 255.0));
    val = val * MatrixSize.z * MatrixSize.w;
    return val;
}

// -- Main code --
float4 main(float2 tex : TEXCOORD0) : COLOR {
    float4 c0 = tex2D(s0, tex);
    c0.xyz += ((Bayer(tex) - 128.0) / 256.0 / 255.0);
    return c0;
}
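For anyone following along, here is a CPU-side sketch of the arithmetic this shader performs (names are mine). It assumes the decoded matrix entry is a plain rank in 0..255, as a 16x16 matrix normalized by MatrixSize.z * MatrixSize.w would give; note that the MPC-HC table above stores half-float bit patterns, which decode to a different numeric range (though positive halfs keep their ordering when compared as integers). The offset spans about plus or minus half of one 8-bit step, so a 16-bit value falling between two 8-bit codes gets split between them by the threshold:

```cpp
#include <algorithm>
#include <cstdint>

// Simulate the shader: add the sub-LSB ordered-dither offset
// (Bayer(tex) - 128) / 256 / 255 to the normalized color, then
// round to 8 bits as the render-target write does.
uint8_t ditherTo8Bit(uint16_t value16, int threshold /* 0..255 */) {
    double c = value16 / 65535.0;                         // normalized color
    double offset = (threshold - 128.0) / 256.0 / 255.0;  // ~ +/- half an 8-bit LSB
    double d = std::clamp(c + offset, 0.0, 1.0);
    return uint8_t(d * 255.0 + 0.5);                      // round to nearest code
}
```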
