HLSL - use multi-pass or..?

This topic is 4296 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Hi, I'm looking at HLSL blurs, and the standard method seems to be a fake gaussian by first blurring on one axis, then the other. How do I actually do that? Do I have to render the first-axis blur to a render target, grab that, and then do the other axis? That would be pretty slow. Or is this where multi-pass shaders come in? I haven't found any good texts on this, so I don't know exactly how it works. Is the result of pass 0 fed back into pass 1 (that's what I want)? (I want to implement Kawase's blur method, which also includes up- and downsampling. Does anyone have anything to share? In MDX even?) Thanks!

MDXInfo to the rescue once again!
So, just so I get this straight - multipass *does* mean taking the output of last pass and doing new operations on it?

Does each pass have its own max instruction set? Each PS pass in 2.0 can use 64 ops?

Another quick one (I feel stupid for asking but I want to make sure): the "compile" below is a load-time command right (when creating the effect)? It doesn't actually compile VertexShader() each pass, each time the technique is used?

technique bloom
{
    pass p0
    {
        vertexshader = compile vs_1_1 VertexShader();
        pixelshader = compile ps_2_0 Bloom_X();
    }
    pass p1
    {
        vertexshader = compile vs_1_1 VertexShader();
        pixelshader = compile ps_2_0 Bloom_Y();
    }
}

TIA

Quote:
Original post by Jonas B
MDXInfo to the rescue once again!
So, just so I get this straight - multipass *does* mean taking the output of last pass and doing new operations on it?

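Yes, as long as your application sets it up that way: render pass 0 into a render target, then bind that render target's texture as the input of pass 1. The effect framework itself only switches shaders and states between passes; it does not route the output for you.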
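
A minimal sketch of how the two blur passes could be chained through one effect, assuming the application swaps the source texture and the render target between passes. SourceTexture, TexelSize, QuadVS, Blur3 and the pass layout are made-up names for illustration, not code from this thread or from MDXInfo:

```hlsl
// Sketch only: every name here is hypothetical.
texture SourceTexture;   // set by the application before each pass
float2  TexelSize;       // (1/width, 1/height) of SourceTexture

sampler SourceSampler = sampler_state
{
    Texture   = <SourceTexture>;
    MinFilter = Linear;
    MagFilter = Linear;
    AddressU  = Clamp;
    AddressV  = Clamp;
};

struct VS_OUT
{
    float4 pos : POSITION;
    float2 uv  : TEXCOORD0;
};

// Full-screen quad; positions are assumed to be in clip space already.
VS_OUT QuadVS(float4 pos : POSITION, float2 uv : TEXCOORD0)
{
    VS_OUT o;
    o.pos = pos;
    o.uv  = uv;
    return o;
}

// Three-tap (1 2 1) / 4 blur along the axis given by 'delta'.
float4 Blur3(float2 uv, float2 delta)
{
    return 0.25 * tex2D(SourceSampler, uv - delta)
         + 0.50 * tex2D(SourceSampler, uv)
         + 0.25 * tex2D(SourceSampler, uv + delta);
}

float4 BlurX_PS(float2 uv : TEXCOORD0) : COLOR0
{
    return Blur3(uv, float2(TexelSize.x, 0));
}

float4 BlurY_PS(float2 uv : TEXCOORD0) : COLOR0
{
    return Blur3(uv, float2(0, TexelSize.y));
}

technique TwoPassBlur
{
    // Application: SourceTexture = scene, render target = temporary RT.
    pass p0
    {
        VertexShader = compile vs_1_1 QuadVS();
        PixelShader  = compile ps_2_0 BlurX_PS();
    }
    // Application: SourceTexture = the temporary RT written by p0,
    // render target = back buffer (or the next RT in the chain).
    pass p1
    {
        VertexShader = compile vs_1_1 QuadVS();
        PixelShader  = compile ps_2_0 BlurY_PS();
    }
}
```

The shaders never see each other's output directly; pass p1 blurs whatever texture the application has bound as SourceTexture at that moment.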

Quote:
Original post by Jonas B
Does each pass have its own max instruction set? Each PS pass in 2.0 can use 64 ops?

Yes, exactly. The GPU doesn't care that you actually run the second shader on the output of the first one. You have the same limits as you'd have with two totally independent shaders (for ps_2_0 that is 64 arithmetic plus 32 texture instructions per shader).

Quote:
Original post by Jonas B
Another quick one (I feel stupid for asking but I want to make sure): the "compile" below is a load-time command right (when creating the effect)? It doesn't actually compile VertexShader() each pass, each time the technique is used?
...

Sure, it will compile once and use it afterwards.

kp

Quote:
the standard method seems to be a fake gaussian by first blurring on one axis, then the other.
Gaussian blur is separable along the two axes - there's nothing fake about it [grin]

Have a look at this ATI paper for some more details on the separability of a 2D Gaussian function.
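
For reference, the separability argument can be written out as follows. This is the standard textbook form, not a quotation from the ATI paper; sigma is the standard deviation, r is an assumed kernel radius in texels, and I is the source image:

```latex
\begin{aligned}
G(x, y) &= \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}
         = g(x)\, g(y),
\qquad g(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{t^2}{2\sigma^2}} \\
(I * G)(x, y) &= \sum_{j=-r}^{r} g(j) \left[ \sum_{i=-r}^{r} g(i)\, I(x - i,\, y - j) \right]
\end{aligned}
```

The inner sum is the horizontal pass and the outer sum is the vertical pass, so a kernel of width 2r+1 costs 2(2r+1) taps per pixel instead of (2r+1)^2.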

Both the PostProcess and HDRPipeline (shameless plug! [oh]) demonstrate most of the things you've mentioned in this thread. Sure, they're both native-DX, but once you get into the shader/effect side of things it's the same [smile]

Quote:
It doesn't actually compile VertexShader() each pass, each time the technique is used?
As kovacsp said, it's a load-time operation - but it does afford you some interesting tricks [wink]

With the effects framework you can use the uniform keyword and get some useful code re-use. Saved me a lot of time in my recent project (lots and lots and lots of post-processing [grin]).
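
A minimal sketch of the kind of re-use the uniform keyword allows, assuming a blur whose direction is fixed at compile time. BlurPS, SourceTexture, TexelSize and the 5-tap binomial weights are illustrative choices, not Jack's actual code:

```hlsl
// Sketch only: every name here is hypothetical.
texture SourceTexture;
float2  TexelSize;   // (1/width, 1/height), set by the application

sampler SourceSampler = sampler_state
{
    Texture   = <SourceTexture>;
    MinFilter = Point;
    MagFilter = Point;
    AddressU  = Clamp;
    AddressV  = Clamp;
};

// 'direction' is a uniform argument: its value is supplied on the
// 'compile' line, so each pass below gets its own specialised shader.
float4 BlurPS(float2 uv : TEXCOORD0, uniform float2 direction) : COLOR0
{
    float2 delta = direction * TexelSize;

    // 5-tap binomial weights (1 4 6 4 1) / 16.
    float4 sum = tex2D(SourceSampler, uv) * (6.0 / 16.0);
    sum += (tex2D(SourceSampler, uv + delta)
          + tex2D(SourceSampler, uv - delta)) * (4.0 / 16.0);
    sum += (tex2D(SourceSampler, uv + 2 * delta)
          + tex2D(SourceSampler, uv - 2 * delta)) * (1.0 / 16.0);
    return sum;
}

// Vertex shader omitted: this assumes a pre-transformed full-screen quad.
technique Blur
{
    pass Horizontal { PixelShader = compile ps_2_0 BlurPS(float2(1, 0)); }
    pass Vertical   { PixelShader = compile ps_2_0 BlurPS(float2(0, 1)); }
}
```

One function body serves both axes, and because the direction is resolved at compile time there is no per-pixel branching on it.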

hth
Jack

Great docs, thanks!

The ATI doc speaks of inner and outer taps, though I fail to see the difference. My hunch is that there's some kind of caching of pixels close to the current one, and looking them up is called inner tapping. Close? Cigar?


This also seems a bit strange (also from ATI's blur):
Color[1] = tex2D(linearImageSampler, tap12); // samples 1, 2

samples 1 and 2? When it's just one sample? I've got to be missing something in their algorithm... Same "contradiction" in their explanation:
"One centertap, six innertaps and six outertaps [...] Sample 25 texels"
1+6+6=25?

Quote:
Original post by Jonas B
The ATI doc speaks of inner and outer taps, though I fail to see the difference. My hunch is that there's some kind of caching of pixels close to the current one, and looking them up is called inner tapping. Close? Cigar?

As far as I can recall that article, the texcoords for the inner taps are calculated in the vertex shader, which runs on the corners of the fullscreen quad, and are passed down interpolated to the pixel shader. The UV coords for the outer taps are calculated in the pixel shader, because the number of UVs that can be passed down is limited, so if you want more, you need to compute them there. (It's slower to compute the UVs in the pixel shader, as it runs for every pixel, so you want to lift as many of these to the vertex shader as you can.)
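
A minimal sketch of the split, assuming a horizontal 5-tap box blur with three inner taps and two outer taps. The names and the tap layout are made up for illustration and are not the layout used in the ATI paper:

```hlsl
// Sketch only: every name here is hypothetical.
float2  TexelSize;                     // (1/width, 1/height), set by the application
sampler SourceSampler : register(s0);

struct VS_OUT
{
    float4 pos    : POSITION;
    float2 center : TEXCOORD0;
    float2 innerP : TEXCOORD1;         // +1 texel, computed once per vertex
    float2 innerN : TEXCOORD2;         // -1 texel, computed once per vertex
};

VS_OUT BlurVS(float4 pos : POSITION, float2 uv : TEXCOORD0)
{
    VS_OUT o;
    o.pos    = pos;
    o.center = uv;
    o.innerP = uv + float2(TexelSize.x, 0);
    o.innerN = uv - float2(TexelSize.x, 0);
    return o;
}

float4 BlurPS(float2 center : TEXCOORD0,
              float2 innerP : TEXCOORD1,
              float2 innerN : TEXCOORD2) : COLOR0
{
    // Inner taps: coordinates arrive ready-made from the interpolators.
    float4 sum = tex2D(SourceSampler, center)
               + tex2D(SourceSampler, innerP)
               + tex2D(SourceSampler, innerN);

    // Outer taps: coordinates are computed here, once per pixel.
    float2 outerP = center + float2(2 * TexelSize.x, 0);
    float2 outerN = center - float2(2 * TexelSize.x, 0);
    sum += tex2D(SourceSampler, outerP) + tex2D(SourceSampler, outerN);

    return sum / 5;
}
```

With only five taps everything would fit in the interpolators; the split only becomes necessary once the tap count exceeds the eight texture coordinate registers that ps_2_0 offers.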

Quote:
Original post by Jonas B
This also seems a bit strange (also from ATI's blur):
Color[1] = tex2D(linearImageSampler, tap12); // samples 1, 2

samples 1 and 2? When it's just one sample? I've got to be missing something in their algorithm... Same "contradiction" in their explanation:
"One centertap, six innertaps and six outertaps [...] Sample 25 texels"
1+6+6=25?

If you sample exactly between two texels and switch bilinear filtering on, the hardware will sample both texels and give you back their average. You can (mis)use this feature to sample two pixels with one tap, thus getting filters that are twice as wide.
Have a look at this article (the What Size Rendertarget? section), where it mentions that correct (that is, one-to-one) texture sampling will occur only if you add a half-texel offset. If you don't, you'll get the averaging through bilinear filtering even if you don't want it. But in this case, you do.

With this: 1+2*6+2*6 = 25
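
A minimal one-dimensional sketch of the trick, assuming a linear-filtered sampler and a uv that sits exactly on a texel centre. The names are made up for illustration; this is not the ATI shader:

```hlsl
// Sketch only: every name here is hypothetical.
texture SourceTexture;
float2  TexelSize;                     // (1/width, 1/height), set by the application

sampler LinearSampler = sampler_state
{
    Texture   = <SourceTexture>;
    MinFilter = Linear;                // bilinear filtering is what makes this work
    MagFilter = Linear;
    AddressU  = Clamp;
    AddressV  = Clamp;
};

// 5-texel horizontal box filter using only 3 texture fetches.
float4 Box5WithThreeTaps(float2 uv : TEXCOORD0) : COLOR0
{
    // 1.5 texels from the centre is halfway between texels +1 and +2,
    // so the filtering hardware returns their average in a single fetch.
    float2 d = float2(1.5 * TexelSize.x, 0);

    float4 center  = tex2D(LinearSampler, uv);
    float4 right12 = tex2D(LinearSampler, uv + d);   // mean of texels +1 and +2
    float4 left12  = tex2D(LinearSampler, uv - d);   // mean of texels -1 and -2

    // Each paired tap stands for two texels, hence the factor of 2.
    return (center + 2 * right12 + 2 * left12) / 5;
}
```

Here 3 taps cover 1 + 2*1 + 2*1 = 5 texels; the ATI count of 1 + 2*6 + 2*6 = 25 follows the same bookkeeping with 13 taps.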

kp

That makes sense. Too bad they didn't include the vertex shader showing how to generate those texcoords (I've only messed with PS so far) but I should be able to experimentally find my way by RenderMonkeying.

Quote:
Original post by Jonas B
That makes sense. Too bad they didn't include the vertex shader showing how to generate those texcoords (I've only messed with PS so far) but I should be able to experimentally find my way by RenderMonkeying.
Oh, if this is all you need then here, have my shaders (for horizontal blur):
Vertex shader:

extern float shadowMapSizeXReciproc;

struct VSOUTPUT_BLUR
{
    float4 vPosition   : POSITION;
    float2 vTexCoord   : TEXCOORD0;
    float2 vTexCoord_p1: TEXCOORD1;
    float2 vTexCoord_p2: TEXCOORD2;
    float2 vTexCoord_p3: TEXCOORD3;
    float2 vTexCoord_n1: TEXCOORD4;
    float2 vTexCoord_n2: TEXCOORD5;
    float2 vTexCoord_n3: TEXCOORD6;
};

VSOUTPUT_BLUR main( float4 inPosition : POSITION, float2 inTexCoord : TEXCOORD0 )
{
    // Output struct
    VSOUTPUT_BLUR OUT = (VSOUTPUT_BLUR)0;

    // Output the position
    OUT.vPosition = inPosition;

    // Output the texture coordinates
    float2 texelSize = float2(shadowMapSizeXReciproc, 0.0);
    OUT.vTexCoord    = inTexCoord;
    OUT.vTexCoord_p1 = inTexCoord + texelSize;
    OUT.vTexCoord_p2 = inTexCoord + texelSize * 2;
    OUT.vTexCoord_p3 = inTexCoord + texelSize * 3;
    OUT.vTexCoord_n1 = inTexCoord - texelSize;
    OUT.vTexCoord_n2 = inTexCoord - texelSize * 2;
    OUT.vTexCoord_n3 = inTexCoord - texelSize * 3;

    return OUT;
}




Pixel shader:

struct VSOUTPUT_BLUR
{
    float4 vPosition   : POSITION;
    float2 vTexCoord   : TEXCOORD0;
    float2 vTexCoord_p1: TEXCOORD1;
    float2 vTexCoord_p2: TEXCOORD2;
    float2 vTexCoord_p3: TEXCOORD3;
    float2 vTexCoord_n1: TEXCOORD4;
    float2 vTexCoord_n2: TEXCOORD5;
    float2 vTexCoord_n3: TEXCOORD6;
};

sampler shadowedPixels: register(s0);

float4 main(VSOUTPUT_BLUR IN) : COLOR0
{
    // Only the first channel is blurred; .r makes the float4 -> scalar
    // truncation explicit instead of relying on an implicit conversion.
    half ret = 0.0f;

    ret += tex2D(shadowedPixels, IN.vTexCoord).r;
    ret += tex2D(shadowedPixels, IN.vTexCoord_p1).r;
    ret += tex2D(shadowedPixels, IN.vTexCoord_p2).r;
    ret += tex2D(shadowedPixels, IN.vTexCoord_p3).r;
    ret += tex2D(shadowedPixels, IN.vTexCoord_n1).r;
    ret += tex2D(shadowedPixels, IN.vTexCoord_n2).r;
    ret += tex2D(shadowedPixels, IN.vTexCoord_n3).r;

    // Box filter: equal weight for all 7 taps, replicated to all channels.
    return (ret / 7);
}




This one doesn't use the "sample between texels" trick, and it does a simple box filter instead of a Gaussian, but it gives good results for me. Use it or modify it as you wish.
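
For comparison, a minimal sketch of a Gaussian-like variant that reuses the TEXCOORD layout of the vertex shader above. The binomial weights are an assumption chosen for illustration; they are not from kp's code or the ATI paper:

```hlsl
// Sketch only: same seven interpolated coordinates as the box filter,
// but with binomial weights that approximate a Gaussian.
sampler shadowedPixels : register(s0);

float4 main(float2 c  : TEXCOORD0,
            float2 p1 : TEXCOORD1, float2 p2 : TEXCOORD2, float2 p3 : TEXCOORD3,
            float2 n1 : TEXCOORD4, float2 n2 : TEXCOORD5, float2 n3 : TEXCOORD6) : COLOR0
{
    // Row 6 of Pascal's triangle (1 6 15 20 15 6 1), normalised by 64.
    float4 sum = tex2D(shadowedPixels, c) * (20.0 / 64.0);
    sum += (tex2D(shadowedPixels, p1) + tex2D(shadowedPixels, n1)) * (15.0 / 64.0);
    sum += (tex2D(shadowedPixels, p2) + tex2D(shadowedPixels, n2)) * ( 6.0 / 64.0);
    sum += (tex2D(shadowedPixels, p3) + tex2D(shadowedPixels, n3)) * ( 1.0 / 64.0);
    return sum;
}
```

The weights sum to 1, so overall brightness is preserved; the only change from the box filter is that texels further from the centre contribute less.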

kp
