circassia

Precalculating values and storing them in variables, efficient?

Hi guys,

I want to write a Gaussian blur shader that runs entirely on the GPU, and I need some benchmarking tips.

[b]How it's done now:[/b] The blur weights are computed on the CPU, then passed to the shader as an array of length 5.

[b]How I want to do it now:[/b] I want to compute the weights for 100 different blurriness levels and hard-code them in the HLSL code, as an array of 5 weights per level, so that I don't have to pass any values anymore.

My question is whether this method is a bad idea or whether it's acceptable.

Thank you very much for your help!

What are you doing on the CPU to calculate your blur weights? Could you not do that on the GPU and pass an 'intensity' value to the shader instead? On modern machines I don't think any of this will even be noticeable as long as the draw calls are kept down, i.e. you're not sending the same information over and over, a hundred thousand times.

It will be faster if you hardcode the values (though I'm almost sure there is a memory limit for hardcoded values, and it might get slower as the number of hardcoded values increases). However, [u]you won't be able to change the blurriness dynamically[/u] unless you pass a value containing the blurriness and use it to index into the hardcoded weights.

Anyway, the cost of passing 5 float values to a shader is probably small (or effectively only 4 floats, since you would still need to pass an int containing the blurriness if you used hardcoded weights).

So you should try both methods and see which is faster in your usage scenarios.

Hi all,

Thank you for the multiple answers. The function for the Gaussian weight calculation is as follows:


[code]
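// Returns the unnormalized Gaussian weight for a sample `distance` texels from the center.
// Needs `using System;` for Math and must live inside a class.
static double Gaussian(int distance, double standardDeviation)
{
    return 1 / (Math.Sqrt(2 * Math.PI) * standardDeviation * standardDeviation) *
           Math.Exp(-Math.Pow(distance, 2) / (2 * Math.Pow(standardDeviation, 2)));
}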
[/code]

This formula returns the weights. I don't know whether it's a good idea to do this sort of calculation on the GPU. At the moment I am passing a float3 to my shader every time the standardDeviation changes.

As I mentioned, I would precalculate those weights for each standardDeviation (from 0 to 100) and generate a float array with 500 elements.
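To make the idea concrete, here is a rough sketch of how such a 500-element table could be built. It is only an illustration under some assumptions that are not from my actual code: the class name GaussianTable is made up, the standard deviations run from 1 to 100 (0 would divide by zero), the 5 weights are for distances 0 to 4, and each row is normalized assuming the off-center taps are sampled on both sides of the center.

```csharp
using System;

static class GaussianTable
{
    const int TapCount = 5;      // weights per blur level, as in the post
    const int LevelCount = 100;  // assumed: standard deviations 1..100

    // Same formula as the Gaussian function above.
    static double Gaussian(int distance, double standardDeviation)
    {
        return 1 / (Math.Sqrt(2 * Math.PI) * standardDeviation * standardDeviation) *
               Math.Exp(-Math.Pow(distance, 2) / (2 * Math.Pow(standardDeviation, 2)));
    }

    // Builds a flat array; the row for `level` starts at index level * TapCount.
    public static float[] Build()
    {
        var table = new float[LevelCount * TapCount];
        for (int level = 0; level < LevelCount; level++)
        {
            double sigma = level + 1; // skip 0 to avoid dividing by zero
            var row = new double[TapCount];
            double sum = 0;
            for (int tap = 0; tap < TapCount; tap++)
            {
                row[tap] = Gaussian(tap, sigma);
                // Assumed symmetric kernel: off-center taps count twice.
                sum += tap == 0 ? row[tap] : 2 * row[tap];
            }
            // Normalize so the full symmetric kernel sums to 1.
            for (int tap = 0; tap < TapCount; tap++)
                table[level * TapCount + tap] = (float)(row[tap] / sum);
        }
        return table;
    }
}
```

A weight is then looked up as table[level * 5 + tap], and the same flat layout would carry over to an array hard-coded in the shader.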

Thank you all!

Based on my experience, the Pow() function can be slow for squaring. I find it's better to simply multiply the numbers together much like you've done in the first line of the formula.

In addition, Math.Sqrt(2 * Math.PI) is a constant value, so you could get rid of the expensive square root by pre-calculating it somewhere else and passing in the value.
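Putting both suggestions together might look something like the sketch below. The class name FastGaussian and the field name SqrtTwoPi are just placeholders I made up; the formula itself is unchanged from the one posted, with Math.Pow replaced by plain multiplication and the square root computed once.

```csharp
using System;

static class FastGaussian
{
    // Hypothetical name: sqrt(2 * pi), computed once instead of on every call.
    static readonly double SqrtTwoPi = Math.Sqrt(2 * Math.PI);

    public static double Gaussian(int distance, double standardDeviation)
    {
        // Square by multiplying instead of calling Math.Pow.
        double variance = standardDeviation * standardDeviation;
        // Cast before squaring so the product is computed as a double.
        double distanceSquared = (double)distance * distance;
        return 1 / (SqrtTwoPi * variance) * Math.Exp(-distanceSquared / (2 * variance));
    }
}
```

Whether this is measurably faster depends on how often the function is called, so it is worth timing before and after.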

To address the issue of whether to have the calculation done on the GPU or the CPU: in general the GPU is going to be much faster, since it's heavily optimised for crunching numbers, but it will depend on how heavily each processor is already loaded. If your GPU is already grinding away on other shaders but the CPU is mostly idle, it may be better to have the CPU do it instead.

The only way to know for sure is to try both approaches and see if there's a difference in frame rate. In short, benchmark! :)

It's ultimately going to depend on the hardware, so the only way to know for sure is to profile. Precomputing values and sending them as constants can obviously save you shader math, but accessing constants is not free either. On newer hardware in particular the amount of on-chip ALU power can be tremendous, and it may in fact be quicker to generate the values in the shader.
