I have a working Gaussian blur implemented in a compute shader, and now I want to implement a bilateral blur (filter), but I don't know where to start.

The formula for Gaussian blur in 1D is (the implementation is split into a horizontal and a vertical pass to be more efficient):
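For reference, the 1D Gaussian blur is usually written in the following textbook form. This is the common notation, not necessarily the one I originally had in mind: \(\sigma\) is the standard deviation, \(r\) the blur radius, and \(w_i\) the normalized weights (what the shader stores in `gWeights`).

```latex
\[
G_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}},
\qquad
w_i = \frac{G_\sigma(i)}{\sum_{j=-r}^{r} G_\sigma(j)},
\qquad
I'(x) = \sum_{i=-r}^{r} w_i \, I(x+i)
\]
```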

The formula for bilateral blur is:
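In its commonly quoted form (again the standard notation, assumed here), the bilateral filter multiplies a spatial Gaussian by a second Gaussian on the intensity difference and then normalizes. \(p\) is the pixel being filtered, \(q\) runs over the neighborhood \(S\), and \(\sigma_s\), \(\sigma_r\) are the spatial and range standard deviations.

```latex
\[
BF[I]_p = \frac{1}{W_p} \sum_{q \in S}
          G_{\sigma_s}\!\left(\lVert p - q \rVert\right)\,
          G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right)\, I_q,
\qquad
W_p = \sum_{q \in S}
      G_{\sigma_s}\!\left(\lVert p - q \rVert\right)\,
      G_{\sigma_r}\!\left(\lvert I_p - I_q \rvert\right)
\]
```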

The problem is that I don't know how to apply the values I have from the Gaussian blur to the bilateral blur.

The source of the Gaussian blur shader is:

    cbuffer cbSettings
    {
        float gWeights[11] =
        {
            0.05f, 0.05f, 0.1f, 0.1f, 0.1f, 0.2f, 0.1f, 0.1f, 0.1f, 0.05f, 0.05f,
        };
    };

    cbuffer cbFixed
    {
        static const int gBlurRadius = 5;
    };

    Texture2D gInput;
    RWTexture2D<float4> gOutput;

    #define N 256
    #define CacheSize (N + 2*gBlurRadius)
    groupshared float4 gCache[CacheSize];

    [numthreads(N, 1, 1)]
    void HorzBlurCS(int3 groupThreadID : SV_GroupThreadID,
                    int3 dispatchThreadID : SV_DispatchThreadID)
    {
        //
        // Fill local thread storage to reduce bandwidth. To blur
        // N pixels, we will need to load N + 2*BlurRadius pixels
        // due to the blur radius.
        //

        // This thread group runs N threads. To get the extra 2*BlurRadius pixels,
        // have 2*BlurRadius threads sample an extra pixel.
        if(groupThreadID.x < gBlurRadius)
        {
            // Clamp out of bound samples that occur at image borders.
            int x = max(dispatchThreadID.x - gBlurRadius, 0);
            gCache[groupThreadID.x] = gInput[int2(x, dispatchThreadID.y)];
        }
        if(groupThreadID.x >= N-gBlurRadius)
        {
            // Clamp out of bound samples that occur at image borders.
            int x = min(dispatchThreadID.x + gBlurRadius, gInput.Length.x-1);
            gCache[groupThreadID.x+2*gBlurRadius] = gInput[int2(x, dispatchThreadID.y)];
        }

        // Clamp out of bound samples that occur at image borders.
        gCache[groupThreadID.x+gBlurRadius] = gInput[min(dispatchThreadID.xy, gInput.Length.xy-1)];

        // Wait for all threads to finish.
        GroupMemoryBarrierWithGroupSync();

        //
        // Now blur each pixel.
        //

        float4 blurColor = float4(0, 0, 0, 0);

        [unroll]
        for(int i = -gBlurRadius; i <= gBlurRadius; ++i)
        {
            int k = groupThreadID.x + gBlurRadius + i;

            blurColor += gWeights[i+gBlurRadius]*gCache[k];
        }

        gOutput[dispatchThreadID.xy] = blurColor;
    }

    [numthreads(1, N, 1)]
    void VertBlurCS(int3 groupThreadID : SV_GroupThreadID,
                    int3 dispatchThreadID : SV_DispatchThreadID)
    {
        //
        // Fill local thread storage to reduce bandwidth. To blur
        // N pixels, we will need to load N + 2*BlurRadius pixels
        // due to the blur radius.
        //

        // This thread group runs N threads. To get the extra 2*BlurRadius pixels,
        // have 2*BlurRadius threads sample an extra pixel.
        if(groupThreadID.y < gBlurRadius)
        {
            // Clamp out of bound samples that occur at image borders.
            int y = max(dispatchThreadID.y - gBlurRadius, 0);
            gCache[groupThreadID.y] = gInput[int2(dispatchThreadID.x, y)];
        }
        if(groupThreadID.y >= N-gBlurRadius)
        {
            // Clamp out of bound samples that occur at image borders.
            int y = min(dispatchThreadID.y + gBlurRadius, gInput.Length.y-1);
            gCache[groupThreadID.y+2*gBlurRadius] = gInput[int2(dispatchThreadID.x, y)];
        }

        // Clamp out of bound samples that occur at image borders.
        gCache[groupThreadID.y+gBlurRadius] = gInput[min(dispatchThreadID.xy, gInput.Length.xy-1)];

        // Wait for all threads to finish.
        GroupMemoryBarrierWithGroupSync();

        //
        // Now blur each pixel.
        //

        float4 blurColor = float4(0, 0, 0, 0);

        [unroll]
        for(int i = -gBlurRadius; i <= gBlurRadius; ++i)
        {
            int k = groupThreadID.y + gBlurRadius + i;

            blurColor += gWeights[i+gBlurRadius]*gCache[k];
        }

        gOutput[dispatchThreadID.xy] = blurColor;
    }
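As a possible starting point, the inner blur loop of the shader could be changed roughly as sketched below. This is only a sketch: the spatial weights stay in `gWeights`, each sample is additionally weighted by how close its color is to the center pixel, and the result is divided by the sum of the weights. `gRangeSigma` and `BilateralBlur` are hypothetical names that are not part of the original shader, and the value 0.1 is an arbitrary placeholder.

```hlsl
// Hypothetical tuning constant (not in the original shader): controls how fast
// the weight falls off as colors differ. Smaller values preserve edges more.
static const float gRangeSigma = 0.1f;

// Sketch of a bilateral replacement for the inner blur loop.
// 'center' is the index of this thread's own pixel in gCache, i.e.
// groupThreadID.x + gBlurRadius (or groupThreadID.y + gBlurRadius vertically).
// Must be called after GroupMemoryBarrierWithGroupSync().
float4 BilateralBlur(int center)
{
    float4 centerColor = gCache[center];
    float4 colorSum    = float4(0, 0, 0, 0);
    float  weightSum   = 0.0f;

    [unroll]
    for(int i = -gBlurRadius; i <= gBlurRadius; ++i)
    {
        float4 sampleColor = gCache[center + i];

        // Spatial term: the same precomputed Gaussian weights as before.
        float spatialW = gWeights[i + gBlurRadius];

        // Range term: Gaussian of the RGB difference to the center pixel.
        float3 diff   = sampleColor.rgb - centerColor.rgb;
        float  rangeW = exp(-dot(diff, diff) / (2.0f * gRangeSigma * gRangeSigma));

        float w = spatialW * rangeW;
        colorSum  += w * sampleColor;
        weightSum += w;
    }

    // The combined weights no longer sum to 1, so normalize. weightSum is
    // non-zero because the center sample has rangeW == 1 and a positive
    // spatial weight.
    return colorSum / weightSum;
}
```

In `HorzBlurCS` the loop would then become `gOutput[dispatchThreadID.xy] = BilateralBlur(groupThreadID.x + gBlurRadius);`, and likewise with `groupThreadID.y` in `VertBlurCS`. Note that the bilateral filter is not truly separable, so running a horizontal pass followed by a vertical pass only approximates the full 2D filter.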

Even with the comments I find it very hard to follow. I'm not at all sure which values come from which part of the calculation, so I have no idea how to implement it.

Maybe you can give me a hint about what exactly is what, or how I can approach the problem, because I really can't work it out on my own.

Regards Helgon

Edit:

Some more questions

1) Almost every blur implementation I've found was done in the vertex shader. Is it really worth the "hard work" to do it in the CS? Of course it's much quicker, but is the difference that big? And if so, why isn't it done in the CS more often?

2) Another small question: is the CS used often in game development, or is it used more by rendering software and mathematical applications?

**Edited by ~Helgon, 07 December 2012 - 09:41 PM.**