# (solved) [XNA] Bloom Postprocess performance

I'm using the standard bloom shader for XNA found here: http://creators.xna.com/en-US/sample/bloom. It looks amazing. One problem I'm having, though, is that on lower-end machines the performance takes a massive hit depending on what is happening on screen: if there are a lot of models or explosions, the frame rate takes a steep dive. Is bloom post-process performance dependent on screen composition? If so, is there any way to alleviate the FPS hit? Here's the code. Anyone see any ways to optimize?
```hlsl
// BloomExtract.fx
// Pixel shader extracts the brighter areas of an image.
// This is the first step in applying a bloom postprocess.

sampler TextureSampler : register(s0);

float BloomThreshold;

float4 PixelShader(float2 texCoord : TEXCOORD0) : COLOR0
{
    // Look up the original image color.
    float4 c = tex2D(TextureSampler, texCoord);

    // Adjust it to keep only values brighter than the specified threshold.
    return saturate((c - BloomThreshold) / (1 - BloomThreshold));
}

technique BloomExtract
{
    pass Pass1
    {
        PixelShader = compile ps_2_0 PixelShader();
    }
}
```
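On the question of whether bloom cost depends on screen composition: the extract pass does the same work for every pixel, namely one texture fetch, a subtract, a divide and a clamp, with no branch on brightness. A minimal CPU reference of that arithmetic, written in C++ so it can run off the GPU, makes this visible; `Color`, `saturate` and `bloomExtract` are hypothetical names for illustration, not part of XNA or the sample.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// RGBA color with components in [0, 1], mirroring HLSL float4.
using Color = std::array<float, 4>;

// Clamp to [0, 1], the CPU equivalent of HLSL saturate().
inline float saturate(float x) { return std::clamp(x, 0.0f, 1.0f); }

// CPU reference for the BloomExtract pixel shader:
// saturate((c - BloomThreshold) / (1 - BloomThreshold)), applied per channel
// (alpha included, exactly like the float4 math in the shader).
Color bloomExtract(const Color& c, float threshold) {
    assert(threshold >= 0.0f && threshold < 1.0f);  // avoid dividing by zero
    Color out{};
    for (std::size_t i = 0; i < c.size(); ++i) {
        // Same fixed sequence of operations whatever the input value is.
        out[i] = saturate((c[i] - threshold) / (1.0f - threshold));
    }
    return out;
}
```

With a threshold of 0.25, a channel at 0.625 maps to 0.5, anything at or below 0.25 maps to 0, and 1.0 stays 1.0; in every case the pixel costs the same.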


```hlsl
// GaussianBlur.fx
// Pixel shader applies a one-dimensional gaussian blur filter.
// This is used twice by the bloom postprocess, first to
// blur horizontally, and then again to blur vertically.

sampler TextureSampler : register(s0);

#define SAMPLE_COUNT 15

float2 SampleOffsets[SAMPLE_COUNT];
float SampleWeights[SAMPLE_COUNT];

float4 PixelShader(float2 texCoord : TEXCOORD0) : COLOR0
{
    float4 c = 0;

    // Combine a number of weighted image filter taps,
    // using the offset and weight for tap i.
    for (int i = 0; i < SAMPLE_COUNT; i++)
    {
        c += tex2D(TextureSampler, texCoord + SampleOffsets[i]) * SampleWeights[i];
    }

    return c;
}

technique GaussianBlur
{
    pass Pass1
    {
        PixelShader = compile ps_2_0 PixelShader();
    }
}
```
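`SampleOffsets` and `SampleWeights` are filled in by the host code before each blur pass. The sketch below shows one way to compute them for a Gaussian, assuming integer-texel tap spacing and a hypothetical `sigma` parameter; `Offset`, `BlurTaps` and `computeBlurTaps` are illustrative names, and the sample's own C# may space or weight its taps differently.

```cpp
#include <array>
#include <cassert>
#include <cmath>

constexpr int kSampleCount = 15;  // must match SAMPLE_COUNT in the shader

struct Offset { float x, y; };

// Hypothetical CPU-side container for the blur shader's parameters.
struct BlurTaps {
    std::array<Offset, kSampleCount> offsets{};
    std::array<float, kSampleCount> weights{};
};

// Unnormalized 1D Gaussian at distance n (in texels). The usual
// 1/sqrt(2*pi*sigma^2) factor is skipped because we normalize below.
inline float gaussian(float n, float sigma) {
    return std::exp(-(n * n) / (2.0f * sigma * sigma));
}

// Fill the taps for one blur direction. (dx, dy) is one texel step:
// (1/width, 0) for the horizontal pass, (0, 1/height) for the vertical one.
BlurTaps computeBlurTaps(float dx, float dy, float sigma) {
    static_assert(kSampleCount % 2 == 1, "need a center tap plus symmetric pairs");
    assert(sigma > 0.0f);
    BlurTaps t;
    t.offsets[0] = {0.0f, 0.0f};  // center tap
    t.weights[0] = gaussian(0.0f, sigma);
    float total = t.weights[0];
    for (int i = 0; i < kSampleCount / 2; ++i) {
        const float dist = static_cast<float>(i + 1);
        const float w = gaussian(dist, sigma);
        // One tap on each side of the center, sharing the same weight.
        t.offsets[2 * i + 1] = {dx * dist, dy * dist};
        t.offsets[2 * i + 2] = {-dx * dist, -dy * dist};
        t.weights[2 * i + 1] = w;
        t.weights[2 * i + 2] = w;
        total += 2.0f * w;
    }
    // Normalize so the weights sum to 1 and the blur keeps overall brightness.
    for (float& w : t.weights) w /= total;
    return t;
}
```

Note that the weights only change the look; the cost is fixed by the tap count, so every blurred pixel costs 15 fetches per pass. Lowering `SAMPLE_COUNT` (and `kSampleCount` with it) is another knob for low-end GPUs.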


```hlsl
// BloomCombine.fx
// Pixel shader combines the bloom image with the original
// scene, using tweakable intensity levels and saturation.
// This is the final step in applying a bloom postprocess.

sampler BloomSampler : register(s0);
sampler BaseSampler : register(s1);

float BloomIntensity;
float BaseIntensity;

float BloomSaturation;
float BaseSaturation;

// Helper for modifying the saturation of a color.
float4 AdjustSaturation(float4 color, float saturation)
{
    // The constants 0.3, 0.59, and 0.11 are chosen because the
    // human eye is more sensitive to green light, and less to blue.
    float grey = dot(color.rgb, float3(0.3, 0.59, 0.11));

    return lerp(grey, color, saturation);
}

float4 PixelShader(float2 texCoord : TEXCOORD0) : COLOR0
{
    // Look up the bloom and original base image colors.
    float4 bloom = tex2D(BloomSampler, texCoord);
    float4 base = tex2D(BaseSampler, texCoord);

    // Adjust color saturation and intensity.
    bloom = AdjustSaturation(bloom, BloomSaturation) * BloomIntensity;
    base = AdjustSaturation(base, BaseSaturation) * BaseIntensity;

    // Darken down the base image in areas where there is a lot of bloom,
    // to prevent things looking excessively burned-out.
    base *= (1 - saturate(bloom));

    // Combine the two images.
    return base + bloom;
}

technique BloomCombine
{
    pass Pass1
    {
        PixelShader = compile ps_2_0 PixelShader();
    }
}
```
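The combine pass is likewise fixed work per pixel: two fetches plus some arithmetic. A per-pixel CPU reference can help when tuning `BloomIntensity`, `BaseIntensity` and the two saturation values without round-tripping through the GPU. This is a sketch only; `Color`, `CombineParams`, `adjustSaturation` and `bloomCombine` are hypothetical names, and the math follows the shader above.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

using Color = std::array<float, 4>;  // RGBA, like HLSL float4

inline float saturate(float x) { return std::clamp(x, 0.0f, 1.0f); }

// AdjustSaturation: lerp between the luminance-weighted grey and the color.
// As in the shader, the scalar grey is broadcast to all four channels.
Color adjustSaturation(const Color& c, float saturation) {
    const float grey = c[0] * 0.3f + c[1] * 0.59f + c[2] * 0.11f;
    Color out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out[i] = grey + (c[i] - grey) * saturation;  // lerp(grey, c, saturation)
    }
    return out;
}

// Hypothetical bundle of the four effect parameters.
struct CombineParams {
    float bloomIntensity, baseIntensity, bloomSaturation, baseSaturation;
};

Color bloomCombine(const Color& bloomIn, const Color& baseIn, const CombineParams& p) {
    assert(p.bloomIntensity >= 0.0f && p.baseIntensity >= 0.0f);
    Color bloom = adjustSaturation(bloomIn, p.bloomSaturation);
    Color base = adjustSaturation(baseIn, p.baseSaturation);
    Color out{};
    for (std::size_t i = 0; i < 4; ++i) {
        bloom[i] *= p.bloomIntensity;
        base[i] *= p.baseIntensity;
        base[i] *= 1.0f - saturate(bloom[i]);  // darken base where bloom is strong
        out[i] = base[i] + bloom[i];
    }
    return out;
}
```

With zero bloom the base color passes through (given unit intensity and saturation); with full bloom the base is darkened away and the result is close to the bloom color.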


[Edited by - EJH on May 3, 2009 11:50:07 PM]

If it's a full-screen pass that does a constant amount of work, there's no reason for it to take any more or less time depending on what's in the frame.

You can also try rendering the bloom passes into render targets 1/2 or 1/4 the size of the framebuffer; fewer pixels means proportionally less shader work.
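Both replies come down to arithmetic: with the shaders above, the number of texture fetches per frame depends only on resolution and bloom buffer size, never on what is drawn. The rough model below counts only `tex2D` calls (it ignores bandwidth, fill rate and cache behavior) and assumes that extract and both blur passes write to reduced-size bloom targets while combine writes the full-size backbuffer. `bloomFetchesPerFrame` and `bloomDivisor` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Texture fetches per output pixel in each pass of the shaders above.
constexpr std::uint64_t kExtractFetches = 1;  // BloomExtract: one tex2D
constexpr std::uint64_t kBlurFetches = 15;    // GaussianBlur: SAMPLE_COUNT taps
constexpr std::uint64_t kCombineFetches = 2;  // BloomCombine: bloom + base

// Rough per-frame fetch count. bloomDivisor divides each dimension of the
// bloom targets (1 = full size, 2 = half, 4 = quarter). Note that nothing
// here depends on scene content.
std::uint64_t bloomFetchesPerFrame(std::uint64_t width, std::uint64_t height,
                                   std::uint64_t bloomDivisor) {
    assert(bloomDivisor >= 1);
    const std::uint64_t bloomPixels = (width / bloomDivisor) * (height / bloomDivisor);
    // Extract + horizontal blur + vertical blur at bloom size,
    // then combine at full size.
    return bloomPixels * (kExtractFetches + 2 * kBlurFetches)
         + width * height * kCombineFetches;
}
```

At 1280x720 this gives 30,412,800 fetches with full-size bloom targets and 8,985,600 with half-size ones, so halving each dimension cuts the bloom cost by more than a factor of three under this model.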

Oh damn, I just found out what I was doing wrong.

On a resolution change you have to call LoadContent() on the bloom component again to create new render targets the size of the new backbuffer. I was forgetting to call UnloadContent() before calling LoadContent() again, so I think it was leaving a bunch of unused render targets allocated on the GPU. Now it is very fast. =)
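The fix is a release-then-recreate sequence on resize. XNA code is C#, so this is only a C++ analog under stated assumptions: `RenderTarget` and `BloomTargets` are hypothetical stand-ins, not XNA's `RenderTarget2D` or the sample's component, and the live-instance counter exists only to make a leak visible.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a GPU render target. It only counts live
// instances, so leaked targets would show up as a growing liveCount.
struct RenderTarget {
    inline static int liveCount = 0;
    int width, height;
    RenderTarget(int w, int h) : width(w), height(h) { ++liveCount; }
    ~RenderTarget() { --liveCount; }
    RenderTarget(const RenderTarget&) = delete;
    RenderTarget& operator=(const RenderTarget&) = delete;
};

// Hypothetical owner of the bloom targets; load()/unload() play the roles
// of LoadContent()/UnloadContent() from the post.
class BloomTargets {
public:
    void load(int w, int h) {
        // Mirrors the rule from the post: unload before loading again.
        // (unique_ptr would free old targets on reassignment anyway; the
        // assert just makes the intended order explicit.)
        assert(!scene_ && "call unload() before load()");
        scene_ = std::make_unique<RenderTarget>(w, h);
        half1_ = std::make_unique<RenderTarget>(w / 2, h / 2);
        half2_ = std::make_unique<RenderTarget>(w / 2, h / 2);
    }

    void unload() {
        scene_.reset();
        half1_.reset();
        half2_.reset();
    }

    // Resolution change: release the old targets, then create new ones
    // sized to the new backbuffer.
    void onResolutionChanged(int w, int h) {
        unload();
        load(w, h);
    }

    int sceneWidth() const { return scene_ ? scene_->width : 0; }

private:
    std::unique_ptr<RenderTarget> scene_, half1_, half2_;
};
```

However many times the resolution changes, only three targets stay alive. In C++ RAII makes the release automatic; in C#, GPU resources are generally released through explicit disposal, which fits the buffers piling up when UnloadContent() was skipped.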

