Color correction - 3D LUT

Started by
7 comments, last by wh1sp3rik 11 years ago

Hello,

I am trying to implement some kind of color correction.

I have already implemented 1D LUT color correction, which is really easy to code, and its palette is easy to make.

Now I would like to use a 3D LUT, so that I can control more color transformations.

It's based on this http://udn.epicgames.com/Three/ColorGrading.html

I managed to use their standard palette, but the problem is that I can't use hardware filtering, because a filtered pixel is affected by the neighbouring 16x16 table on the left (and on the right).

It should be easy to do with a 3D texture, but how can I generate such a texture? For me, the easiest way is to just use a 2D texture like the one on the website I linked.

Do you have any ideas or tricks?

Thank you very much.

DirectX 11, C++

You really need to ensure you remap the UVs to address from pixel centres to get the full range. This will stop the bleeding onto neighbouring charts and give more exact results.

So if XY is RG and B is the chart index then do

UV.rg = (UV.rg * 15.0f/16.0f) + (0.5f/16.0f);
You may not need the (0.5f/16.0f) at the end depending on whether you need the half texel offset.

To filter across charts you'll need to do two taps and a lerp on the fractional distance between charts.
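
A minimal CPU-side sketch of the remap and two-tap blend described above, written in C++ so the arithmetic can be checked outside a shader. It assumes a 256x16 strip of sixteen 16x16 charts, with red along x inside a chart, green along y and blue selecting the chart. The names `StripLookup` and `stripLookup` are hypothetical; in a shader, each of the two UVs would be fed to a linear sampler and the two results blended with lerp.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical result type: the two sample positions and the blend factor.
struct StripLookup {
    float u0, u1;   // u coordinate in the lower and upper blue chart
    float v;        // v coordinate shared by both taps
    float blend;    // lerp factor between the two taps
};

// r, g and b are expected in [0, 1].
StripLookup stripLookup(float r, float g, float b) {
    const float size = 16.0f;  // LUT entries per axis (assumption)

    // Remap [0,1] so it spans texel centres only: 0.5/16 .. 15.5/16.
    float ru = r * (size - 1.0f) / size + 0.5f / size;
    float gv = g * (size - 1.0f) / size + 0.5f / size;

    // Blue selects the chart; its fractional part is blended across two charts.
    float slice = b * (size - 1.0f);
    float low   = std::floor(slice);
    float high  = std::min(low + 1.0f, size - 1.0f);

    StripLookup out;
    out.u0    = (low  + ru) / size;  // chart offset plus position inside the chart
    out.u1    = (high + ru) / size;
    out.v     = gv;
    out.blend = slice - low;
    return out;
}
```

Because red and green never leave the texel-centre range of their chart, bilinear filtering cannot pull in texels from the neighbouring chart; only the blue axis needs the manual blend.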

About the bleeding: it's caused by the anisotropic sampler. If I use a point sampler, it works fine, but then the output is limited to the 4096 colours (16x16x16) stored in the table.

I also tried interpolating the colour between two samples and got bad results. Perhaps I just did something wrong; it worked on paper, hehe.

DirectX 11, C++

You definitely shouldn't be using an anisotropic sampler. Make sure you use a linear sampler to get the full range of 2^24 colours. Post your HLSL code for the lookup and it should be easy to fix.

Hello again :)

I think I solved my problem. I just made helper functions for converting from 2D UV to 3D UVW and back, and that helped :)

This is my code. It's a compute shader, as I mostly use CS; it's faster than vertex shader + pixel shader (rendering a fullscreen quad, etc.).


Texture2D<float4>   backbuffer             : register(t0);
RWTexture2D<float4> backbufferOut          : register(u0);
Texture2D<float4>   palette                : register(t1);

struct CSInput
{
   uint3 groupID  : SV_GroupID;
   uint3 threadID : SV_DispatchThreadID;
};


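// Texel position in the 256x16 strip -> (red, green, blue) LUT index.
uint3 To3D( uint2 uv )
{
    uint blue = floor( uv.x / 16.0 );   // which 16x16 block
    uint red  = uv.x - blue*16;         // column inside that block
    uint green= uv.y;
    return uint3(red,green,blue);
}

// (red, green, blue) LUT index -> texel position in the 256x16 strip.
uint2 To2D( uint3 uvw )
{
    uint v = uvw.y;
    uint u = uvw.z*16+uvw.x;
    return uint2(u,v);
}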


[numthreads(16, 16, 1)]
void main( CSInput input )
{
    float4 oldColor      = backbuffer.Load(uint3(input.threadID.xy,0));
    uint3  uColor        = oldColor.xyz*255;   // 8-bit channel values, 0..255

    // Position inside the 16x16x16 LUT: lower cell, upper cell and exact value.
    uint3  uvw_low       = floor( uColor/16.0);
    uint3  uvw_high      = ceil(  uColor/16.0);
    float3 uvw_avg       = uColor/16.0;

    // Per-axis blend weight (fractional position between low and high).
    // The upper cell is clamped to 15 so a read never leaves the 16x16 block.
    float  xpercent = 0.0; float ypercent = 0.0; float zpercent = 0.0;
    if( uvw_high.x < 16) {if(uvw_high.x-uvw_low.x > 0) xpercent = 1.0-(uvw_high.x-uvw_avg.x); } else uvw_high.x=15;
    if( uvw_high.y < 16) {if(uvw_high.y-uvw_low.y > 0) ypercent = 1.0-(uvw_high.y-uvw_avg.y); } else uvw_high.y=15;
    if( uvw_high.z < 16) {if(uvw_high.z-uvw_low.z > 0) zpercent = 1.0-(uvw_high.z-uvw_avg.z); } else uvw_high.z=15;

    // Two taps: the low corner and the high corner of the LUT cell.
    float4 color1         = palette.Load( uint3( To2D(uvw_low),0 ) );
    float4 color2         = palette.Load( uint3( To2D(uvw_high),0 ) );
    float4 color3         = float4(0,0,0,1);

    // Each output channel is blended with its own weight.
    // Note: this uses two corner texels only, not all eight corners of the cell.
    color3.x = lerp( color1.x, color2.x, xpercent );
    color3.y = lerp( color1.y, color2.y, ypercent );
    color3.z = lerp( color1.z, color2.z, zpercent );

    backbufferOut[input.threadID.xy] = color3;
}

If anyone has an idea how to optimize it (if that's possible), I will be happy. Thanks

DirectX 11, C++

A few ideas:

  • Use 1.0f/16.0f as a variable at the beginning of the shader (or a constant, whichever) and use it to multiply instead of divide; you'll save yourself a little bit of performance that way (maybe not a ton, but still worth trying).
  • By the looks of it, I'd say your if( uvw_high.x < 16) branches can be avoided - definitely try doing this, as it'll probably speed up the shader. Even just a comparison (uvw_high.x < 16 ? ooohh : aaahhh) might be better - of course YMMV.
  • You're not really using groupshared memory or things of that sort, so I'm actually questioning how much faster this would be than a pixel shader implementation - it might even be slower (depending on your renderer setup, of course; context switches might screw it up if you're doing mostly compute and shifting back and forth). But don't mind me here, I'm just thinking out loud and wouldn't know without trying both. :)
  • If you want to squeeze the absolute most out of your shader, then maybe give this a read: "Low-level thinking in high-level shading languages". You'd be amazed at how much a few extra brackets and some small reordering can do for your performance (I've been following these guidelines since it was posted, and it hasn't let me down).
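
As one possible way to drop the branches mentioned above, the lower cell, upper cell and blend weight can be derived with floor and min alone. This is a CPU-side C++ sketch rather than the original shader; `Axis` and `splitAxis` are hypothetical names, and it scales an 8-bit value by 15/255 (an assumption that differs from the /16.0 in the posted code) so that 255 lands exactly on the last LUT entry.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical per-axis result: two neighbouring LUT entries and a blend factor.
struct Axis {
    int   low;     // lower LUT entry, 0..15
    int   high;    // upper LUT entry, clamped to 15
    float weight;  // blend factor from low to high, 0..1
};

// value8 is an 8-bit channel value in 0..255.
Axis splitAxis(int value8) {
    // Map 0..255 onto 0..15 so that both ends hit a LUT entry exactly.
    float pos  = (value8 * 15) / 255.0f;
    float base = std::floor(pos);

    Axis a;
    a.low    = static_cast<int>(base);
    a.high   = std::min(a.low + 1, 15);  // clamp replaces the "< 16" branch
    a.weight = pos - base;               // fractional part, 0 at the last entry
    return a;
}
```

At the top of the range the weight is zero, so the clamped upper entry contributes nothing and no special case is needed. The same floor, min and subtraction exist as HLSL intrinsics and operators, but whether this is actually faster on a given GPU would have to be measured.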

"Use 1.0f/16.0f as a variable at the beginning of the shader (or a constant, whichever) and use it to multiply instead of divide, you'll save yourself a little bit of performance that way (maybe not a ton but still worth trying)."

This should make no difference. HLSL compilers can replace things like divide by 2 with multiply by 0.5, and other obvious optimizations.

"UV.rg = (UV.rg * 15.0f/16.0f) + (0.5f/16.0f);
You may not need the (0.5f/16.0f) at the end depending on whether you need the half texel offset."

You definitely need the 0.5 offset, as that's how texture coordinates work on every API. In a 16x16 texture, the first pixel is at 0.5/16 and the last at 15.5/16.
If you're using an unwrapped 256x16, then for the x-axis your coord ranges are 0.5/256 to 15.5/256, 16.5/256 to 31.5/256, etc...

If you don't correctly address your texel centers like this, there will be all sorts of subtle problems with your colour LUT.

There is also the infamous "half pixel offset" issue, which only affects D3D9, but that's something else entirely (to do with rasterization of triangles).

Hi Hodgman,

I did not know about the half-pixel offset for just sampling a texture. I only knew about it for DX9 fullscreen-quad rendering.

My case is a bit different, as I am using the Load function, whose input is a uint3. I'm not sure whether it still needs the half-pixel offset that float UVs do.

Styves: Thanks for the ideas, I will check them. I will try to think about the IF conditions; they are there to avoid reading from a different 16x16 block.

DirectX 11, C++

This topic is closed to new replies.
