Ender1618

Sampling inner taps and outer taps??


I'm implementing some fake HDR in HLSL via SM2. While figuring out how to write a blur filter, I have come across some examples where the Gaussian kernel diameter is 25 pixels, and these pixels are sampled with 13 texture reads (in between texels) in 2 passes (horizontal and vertical). These reads are separated into 1 center point, 6 inner taps, and 6 outer taps. The texture coordinates for the inner taps are calculated by the vertex shader and passed as inputs to the pixel shader, while the outer tap coordinates are fed to the pixel shader via a global variable array in the effect file (whose values are precomputed on the CPU, dependent on the target resolution). Is there some benefit to doing this over just setting all 13 sample coordinates via the global variable array? Why the separation into inner taps and outer taps? I'm sure there is some reason for this, but the examples don't really explain it.
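For reference, here's a rough sketch of the kind of setup these examples seem to use (all names, offsets, and weights are my own, not taken from any particular sample; the real examples place the taps between texels so bilinear filtering combines two kernel samples per read, whereas this sketch uses whole-texel offsets to keep it short):

```hlsl
// Horizontal blur pass, assuming a vs_2_0 / ps_2_0 effect.
// Inner tap coordinates come from the vertex shader as interpolators;
// outer tap offsets come from a constant array set by the application.

texture g_sceneTex;
sampler g_sceneSampler = sampler_state
{
    Texture   = <g_sceneTex>;
    MinFilter = LINEAR;
    MagFilter = LINEAR;
};

float2 g_texelSize;        // 1 / render-target resolution, set by the CPU
float2 g_outerOffsets[6];  // precomputed outer tap offsets, set by the CPU
float  g_weights[13];      // Gaussian weights, set by the CPU

struct VS_OUT
{
    float4 pos     : POSITION;
    float2 center  : TEXCOORD0;  // center tap
    float4 inner01 : TEXCOORD1;  // two inner taps packed per interpolator
    float4 inner23 : TEXCOORD2;
    float4 inner45 : TEXCOORD3;
};

VS_OUT BlurVS(float4 pos : POSITION, float2 uv : TEXCOORD0)
{
    VS_OUT o;
    o.pos    = pos;
    o.center = uv;
    // Inner taps at +/-1, +/-2, +/-3 texels along x (horizontal pass).
    o.inner01 = float4(uv - float2(1, 0) * g_texelSize, uv + float2(1, 0) * g_texelSize);
    o.inner23 = float4(uv - float2(2, 0) * g_texelSize, uv + float2(2, 0) * g_texelSize);
    o.inner45 = float4(uv - float2(3, 0) * g_texelSize, uv + float2(3, 0) * g_texelSize);
    return o;
}

float4 BlurPS(VS_OUT i) : COLOR
{
    float4 c = tex2D(g_sceneSampler, i.center) * g_weights[0];

    // Inner taps: coordinates arrive ready-made from the interpolators.
    c += tex2D(g_sceneSampler, i.inner01.xy) * g_weights[1];
    c += tex2D(g_sceneSampler, i.inner01.zw) * g_weights[2];
    c += tex2D(g_sceneSampler, i.inner23.xy) * g_weights[3];
    c += tex2D(g_sceneSampler, i.inner23.zw) * g_weights[4];
    c += tex2D(g_sceneSampler, i.inner45.xy) * g_weights[5];
    c += tex2D(g_sceneSampler, i.inner45.zw) * g_weights[6];

    // Outer taps: coordinates are computed in the pixel shader from constants,
    // so each fetch has to wait for a (trivial) add first.
    c += tex2D(g_sceneSampler, i.center + g_outerOffsets[0]) * g_weights[7];
    c += tex2D(g_sceneSampler, i.center + g_outerOffsets[1]) * g_weights[8];
    c += tex2D(g_sceneSampler, i.center + g_outerOffsets[2]) * g_weights[9];
    c += tex2D(g_sceneSampler, i.center + g_outerOffsets[3]) * g_weights[10];
    c += tex2D(g_sceneSampler, i.center + g_outerOffsets[4]) * g_weights[11];
    c += tex2D(g_sceneSampler, i.center + g_outerOffsets[5]) * g_weights[12];

    return c;
}

technique Blur
{
    pass P0
    {
        VertexShader = compile vs_2_0 BlurVS();
        PixelShader  = compile ps_2_0 BlurPS();
    }
}
```

The horizontal pass is shown; the vertical pass just swaps the offset axis. The question is essentially whether the outer-tap adds in the pixel shader cost anything compared to having the vertex shader produce all 13 coordinates.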

OK, I think this stems from my obvious misunderstanding of what a dependent texture read and a non-dependent texture read are. Why is sampling with a texture offset taken from a global array less desirable than getting the coordinate from the vertex shader's interpolation? Does the hardware somehow prefetch the interpolated coordinates in a way that can't be done when the offsets come from a global array? Is there something funny about accessing a global array in a pixel shader? Can anyone explain this to me? I have googled it and have yet to find a satisfactory explanation.

Using the interpolators gives values before the PS is executed. Involving ANY computation, be it actual arithmetic or using constants from an array, requires execution of the PS (however trivial that computation is).

Modern GPUs might have exceptionally high numbers attached to them, but they are still massively pipelined and heavily biased in some cases. Even at 100+ gigabytes per second of bandwidth, fetching from memory can still have high latency.

Drivers (or lately the GPU) will try to hide this by scheduling arithmetic operations whilst the texture data is being fetched - parallelism at a low level. But if you have any sampling operations depending on arithmetic, then this parallelism is broken, as you explicitly synchronize the TMUs and ALUs...
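As a minimal illustration (my own sketch, not code from any sample): the first two reads below use coordinates exactly as they come out of the interpolators, while the third has to wait on an add in the pixel shader before the fetch can be issued.

```hlsl
sampler g_tex : register(s0);
float2  g_offset;   // illustrative constant set by the application

float4 TapsPS(float2 uv      : TEXCOORD0,   // interpolated center coordinate
              float2 uvInner : TEXCOORD1)   // inner tap, already offset in the VS
              : COLOR
{
    // Non-dependent reads: the coordinates come straight from the interpolators,
    // so the fetches can be kicked off before any ALU work runs.
    float4 a = tex2D(g_tex, uv);
    float4 b = tex2D(g_tex, uvInner);

    // This read depends on pixel-shader arithmetic: the add must complete
    // before the texture unit can start the fetch.
    float4 c = tex2D(g_tex, uv + g_offset);

    return a + b + c;
}
```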

To truly answer your question would require input from the IHVs, but I'd imagine my description will at least give you the right idea [smile]

hth
Jack

Also, when coords are coming from the VS, the hardware knows how quickly the coords are changing and can decide which mip levels to fetch, how to perform anisotropic filtering, etc. When the coords are set in the pixel shader code, there is (as far as I know) no rate-of-change information available, causing a fetch from the highest mip level, which may look bad in certain situations when traversing large areas of the texture per on-screen pixel. While the dependent read will be slower, the lack of rate information probably doesn't matter in the slightest for fullscreen effects like blur. As long as the dependent read is faster than performing a second pass to get extra texcoords, you're better off using it.
