 hlsl bloom optimization
I have two questions for you guys:

1. What are the optimizations that I can make to my bloom shader? (see code below)
2. How do I properly reduce the number of samples taken without losing quality? I remember reading somewhere that I should be able to sample in between texels and adjust the weights accordingly, but I don't remember the details or where I read it. What is the algorithm used for this?

//Constants
static const int cKernelSize = 13;
static const float fBrightThreshold = 0.6f; 
static const float fBloomScale = 1.5f;

//Extern parameters
uniform extern texture 		Tex;
uniform extern texture		OriginalTex; 
uniform extern unsigned int	unWidth;
uniform extern unsigned int	unHeight;

//Offsets
uniform extern float2 		BlurOffsetsH[cKernelSize];
uniform extern float2 		BlurOffsetsV[cKernelSize];
uniform extern float2 		ScaleOffsets[16];

//Samplers
sampler2D g_SampTex = sampler_state
{
	Texture = <Tex>;
	AddressU = Clamp;
    AddressV = Clamp;
    MinFilter = Point;
    MagFilter = Linear;
    MipFilter = None;
};

sampler2D g_SampOriginalTex = sampler_state
{
    Texture = <OriginalTex>;
    AddressU = Clamp;
    AddressV = Clamp;
    MinFilter = Linear;
    MagFilter = Linear;
    MipFilter = None;
};

//Blur Weights
static const float BlurWeights[cKernelSize] = 
{
	1.0f / 4096.0f,
	12.0f / 4096.0f,
	66.0f / 4096.0f,
	220.0f / 4096.0f,
	495.0f / 4096.0f,
	792.0f / 4096.0f,
	924.0f / 4096.0f,
	792.0f / 4096.0f,
	495.0f / 4096.0f,
	220.0f / 4096.0f,
	66.0f / 4096.0f,
	12.0f / 4096.0f,
	1.0f / 4096.0f,
}; 

//Brightpass
float4 DownSample4xBrightPass(float2 uv : TEXCOORD1 ) : COLOR0
{   
    float4 Color = 0;
   
    for (int i = 0; i < 16; i++)
    {
        Color += tex2D( g_SampTex, uv + ScaleOffsets[i].xy ) - fBrightThreshold;
    }

    return Color / 16;
}

//Horizontal Blur
float4 HorizontalBlurPS(float2 uv : TEXCOORD1 ) : COLOR0
{
    float4 Color = 0;
    
    for (int i = 0; i < cKernelSize; i++)
    {    
        Color += tex2D( g_SampTex, uv + BlurOffsetsH[i].xy ) * BlurWeights[i];
    }

    return Color * fBloomScale;
}

//Vertical Blur
float4 VerticalBlurPS(float2 uv : TEXCOORD1 ) : COLOR0
{
    float4 Color = 0;

    for (int i = 0; i < cKernelSize; i++)
    {
        Color += tex2D( g_SampTex, uv + BlurOffsetsV[i].xy ) * BlurWeights[i];
    }

    return Color * fBloomScale;
}

//Upscale and blend
float4 UpSample4xBlend(float2 uv : TEXCOORD0 ) : COLOR0
{
	return tex2D( g_SampTex, uv) * 0.8 + tex2D( g_SampOriginalTex, uv);
}

technique TBloom
{
    pass p0
    {
        VertexShader = null;
        PixelShader = compile ps_2_0 DownSample4xBrightPass();
    }

    pass p1
    {
        VertexShader = null;
        PixelShader = compile ps_2_0 HorizontalBlurPS();
    }

    pass p2
    {
        VertexShader = null;
        PixelShader = compile ps_2_0 VerticalBlurPS();
    }

    pass p3
    {
        VertexShader = null;
        PixelShader = compile ps_2_0 UpSample4xBlend();
    }
}




Usually you have a bunch of effects in your post effects pipeline, and bloom is just one of them. Your best optimization path is then to combine the effects so that they share the available resources (blurred render targets, luminance render targets, etc.) as efficiently as possible.
Overall, this will be your biggest optimization gain.
Optimizations at the algorithm level are obviously your next best bet, but you probably already made several of those to reach the first goal described in the previous paragraph. You might check out the algorithms described in my PostFX talk at GDC 2007. You can find it at

http://www.coretechniques.info/index_2007.html

If you look at the algorithms described in there, they are already optimized versions of what is usually used. Overall, on a 360 or PS3, all the PostFX you find in this presentation should clock in at about 4-5 ms of GPU time.


Oh, one thing: in general you want to use 4-tap filter kernels wherever possible. You might try to figure out ways to get any filter kernel down to 4 taps. For example, if your hardware supports bilinear filtering of your render targets, switch on bilinear filtering and offset your filter kernel accordingly. The result is sometimes not quite the same, but sometimes close enough :-) It depends on your hardware. On some hardware this might not make a big difference, so ask PIX :-)
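
A rough sketch of the "bilinear filtering plus offset kernel" idea, applied to the 13-tap blur from the original post. This is host-side C++ (the language is an assumption, since the thread only shows the HLSL side), and `Tap` and `MergeTapsForBilinear` are hypothetical names. It assumes the blur source is sampled with linear min and mag filtering and that offsets are measured in texels from the centre of the centre texel. Two neighbouring taps with weights w1 and w2 can be replaced by a single fetch with weight w1 + w2, placed at the weighted average of their positions, because the bilinear filter then blends the two texels in the ratio w1 : w2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One tap of a 1D blur: offset from the centre in texels, and its weight.
struct Tap {
    float offsetTexels;
    float weight;
};

// Merges neighbouring taps of a symmetric, odd-sized kernel into single
// bilinear fetches. `weights` holds the full kernel with the centre weight
// in the middle. Assumes every merged pair has a non-zero total weight.
std::vector<Tap> MergeTapsForBilinear(const std::vector<float>& weights)
{
    assert(weights.size() % 2 == 1);  // kernel must have a centre tap
    const int radius = static_cast<int>(weights.size() / 2);

    std::vector<Tap> taps;
    taps.push_back({0.0f, weights[radius]});  // centre tap is kept as-is

    // Walk outwards from the centre, pairing texel i with texel i + 1.
    for (int i = 1; i <= radius; i += 2) {
        const float w1 = weights[radius + i];
        const float w2 = (i + 1 <= radius) ? weights[radius + i + 1] : 0.0f;
        const float w = w1 + w2;
        // Weighted average position between the two texel centres.
        const float offset = (i * w1 + (i + 1) * w2) / w;
        taps.push_back({ offset, w});  // right (or lower) side
        taps.push_back({-offset, w});  // mirrored side
    }
    return taps;
}
```

Fed the `BlurWeights` table from the first post, this gives 7 taps instead of 13: the centre at 924/4096, then fetches at ±18/13 (about 1.385), ±42/13 (about 3.231) and ±66/13 (about 5.077) texels with weights 1287/4096, 286/4096 and 13/4096. Multiply the offsets by the texel size of the texture being blurred before uploading them to `BlurOffsetsH` / `BlurOffsetsV`. On a render-target format that cannot be filtered bilinearly this does not apply, so check what your hardware supports.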

http://diaryofagraphicsprogrammer.blogspot.com/
Check out our online D3D10 book: Programming Vertex, Geometry, and Pixel Shaders


what do you mean 4-tap? 4 samples?


Quote:
Original post by unexist
what do you mean 4-tap? 4 samples?


Yup. Taps == samples when you're talking about filters.




Matt Pettineo | DirectX/XNA MVP

