sunrisefe

Which is faster in a vertex shader: addition or texture sampling?

To reduce the amount of computation for a complex function, I want to split it into several two-variable subfunctions, each stored in one 2D texture. The values of each subfunction are precomputed and stored in the texture, with the subfunction's two variables mapped to the texture coordinates. Each subfunction should be linear in one variable when the other is fixed, so that we can take advantage of the texture's linear filtering. Now I wonder which is faster: a texture sample or an addition (or multiplication)?
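
The linear-filtering idea can be checked on the CPU first. The following is a minimal sketch in Python rather than shader code: `build_table` and `sample_bilinear` are hypothetical helpers that emulate a precomputed texture and a linearly filtered lookup, ignoring hardware details such as half-texel offsets and limited filtering precision. Under those assumptions, a function that is linear in each variable when the other is fixed is reproduced between grid points up to rounding, while a curved function such as a*a is only approximated.

```python
def build_table(g, size):
    """Precompute g(a, b) on a size x size grid covering [0, 1] x [0, 1]."""
    step = 1.0 / (size - 1)
    # table[j][i] holds g at a = i * step (u axis) and b = j * step (v axis)
    return [[g(i * step, j * step) for i in range(size)] for j in range(size)]


def sample_bilinear(table, u, v):
    """Emulate a linearly filtered 2D lookup at (u, v), both in [0, 1]."""
    size = len(table)
    x = u * (size - 1)
    y = v * (size - 1)
    i0 = min(int(x), size - 2)  # left/top texel of the 2x2 neighbourhood
    j0 = min(int(y), size - 2)
    fx = x - i0
    fy = y - j0
    top = table[j0][i0] * (1 - fx) + table[j0][i0 + 1] * fx
    bottom = table[j0 + 1][i0] * (1 - fx) + table[j0 + 1][i0 + 1] * fx
    return top * (1 - fy) + bottom * fy


def g_bilinear(a, b):
    # Linear in a when b is fixed, and linear in b when a is fixed.
    return 3.0 * a * b + 2.0 * a - b + 1.0


def g_curved(a, b):
    # Not linear in a, so filtering can only approximate it.
    return a * a


exact_error = abs(sample_bilinear(build_table(g_bilinear, 4), 0.37, 0.81)
                  - g_bilinear(0.37, 0.81))
curved_error = abs(sample_bilinear(build_table(g_curved, 4), 0.5, 0.3)
                   - g_curved(0.5, 0.3))
print(exact_error, curved_error)
```

Here `exact_error` comes out at rounding level, while `curved_error` is clearly nonzero on a 4x4 table and shrinks as the table grows.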

GPUs are complex massively-parallel machines. The only way to know for sure is to try it.

Whether a single addition is faster than a single texture sample doesn't really matter on its own. For example, if your shader is already ALU-heavy, precalculating some work and storing the results in a texture can be an optimization. Conversely, there are situations where fetching the texture is the slower option.

RDragon1, thank you.

For computation-intensive cases, we should use a texture to store precomputed values. For a nonlinear function, how do we make use of a texture? E.g.
f = (x^2 + sin(y)) * 5 + x^3 * z + 6. How do we precalculate it into a texture?

For per-vertex calculations I would probably just store the precomputed result along with the other mesh attributes in the vertex stream.

Although I would likely start by just putting that code in the shader.

Quote:
Original post by sunrisefe
RDragon1, thank you.

For computation-intensive cases, we should use a texture to store precomputed values. For a nonlinear function, how do we make use of a texture? E.g.
f = (x^2 + sin(y)) * 5 + x^3 * z + 6. How do we precalculate it into a texture?

Well, you're going to have to split it into two textures, given that you have three variables and a 3D texture would be far from optimal.

You'll need to define the domain of each variable so that you can map it to [0,1], then generate your textures. You can split the equation however you like, e.g.

f1 = 5 * (x^2 + sin(y)) + 6
f2 = x^3 * z

Make f1 and f2 functions of the texture coordinates u and v instead of their respective two variables. Then in your shader you would do something like:

// f1 depends on (x, y) and f2 on (x, z), so each lookup needs its own coordinate
value = texture2D(func1, coordXY);
value += texture2D(func2, coordXZ);

Of course, you can do much more with textures. The code above assumes that each result is stored in a whole 32-bit RGBA value, but you could instead split the equations among 16-bit channels, storing each result in two color channels, then sample one color and add one pair of channels to the other pair in the shader:

value = texture2D(func1, coord);
realValue = value.rg + value.ba;

There are many ways to do this, so you will have to experiment.
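
The split above can be tried on the CPU before writing any shader. The sketch below is Python, not GLSL, and makes several assumptions that are not from the thread: the domains `X_MAX`, `Y_MAX`, `Z_MAX`, the table size `SIZE`, and the helpers `build_table` and `sample_bilinear`, which stand in for texture generation and a linearly filtered lookup. It normalizes each variable to [0,1], looks up f1 at (x, y) and f2 at (x, z), and compares the sum against the directly evaluated f.

```python
import math

# Assumed domains and table size, chosen only for illustration.
X_MAX, Y_MAX, Z_MAX = 2.0, 2.0 * math.pi, 4.0
SIZE = 256


def f1(x, y):
    return 5.0 * (x * x + math.sin(y)) + 6.0


def f2(x, z):
    return x ** 3 * z


def f_direct(x, y, z):
    return (x * x + math.sin(y)) * 5.0 + x ** 3 * z + 6.0


def build_table(func, a_max, b_max, size):
    """Sample func over [0, a_max] x [0, b_max]; table[j][i] maps to (u, v)."""
    step = 1.0 / (size - 1)
    return [[func(i * step * a_max, j * step * b_max) for i in range(size)]
            for j in range(size)]


def sample_bilinear(table, u, v):
    """Emulate a linearly filtered lookup at normalized coordinates (u, v)."""
    size = len(table)
    x = u * (size - 1)
    y = v * (size - 1)
    i0 = min(int(x), size - 2)
    j0 = min(int(y), size - 2)
    fx = x - i0
    fy = y - j0
    top = table[j0][i0] * (1 - fx) + table[j0][i0 + 1] * fx
    bottom = table[j0 + 1][i0] * (1 - fx) + table[j0 + 1][i0 + 1] * fx
    return top * (1 - fy) + bottom * fy


tex_f1 = build_table(f1, X_MAX, Y_MAX, SIZE)
tex_f2 = build_table(f2, X_MAX, Z_MAX, SIZE)


def f_lookup(x, y, z):
    # Each table gets its own coordinate pair: (x, y) for f1, (x, z) for f2.
    u = x / X_MAX
    return (sample_bilinear(tex_f1, u, y / Y_MAX)
            + sample_bilinear(tex_f2, u, z / Z_MAX))


lookup_error = abs(f_lookup(1.3, 2.0, 3.1) - f_direct(1.3, 2.0, 3.1))
print(lookup_error)
```

The table resolution is independent of the numeric range of the variables: a wider domain only stretches the spacing between samples, which increases the interpolation error wherever the function curves.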

Halifax2, thank you first of all. I have the following questions:

f1 = 5 * (x^2 + sin(y)) + 6
f2 = x^3 * z

1. Suppose x, y, and z are real and all lie in [0, 10000], which exceeds the maximum texture size that the GPU supports. What should I do?

2. Since f1 is not linear, what should I do when x and y do not fall exactly on the texel coordinates of func1 (e.g. the texture size is 256*256 and x != i/255 for i = 0, 1, 2, ..., 255)?

To RDragon1,
"Although I would likely start by just putting that code in the shader."
what if every vertex has to calculate the very complex function?

Quote:
Original post by sunrisefe
what if every vertex has to calculate the very complex function?


So, what's wrong with that? It's not really that complicated - typical GPUs have a builtin sincos instruction anyway, and a few multiplies and adds are nothing. GPUs have tons of flops of power. You'll know if it's fast enough after you try it - depending on your scene, it might turn up as a tiny blip of a perf hit. Or, it could be a massive drain on performance. You won't really know until you try ;)

Quote:
Original post by RDragon1
Quote:
Original post by sunrisefe
what if every vertex has to calculate the very complex function?


So, what's wrong with that? It's not really that complicated - typical GPUs have a builtin sincos instruction anyway, and a few multiplies and adds are nothing. GPUs have tons of flops of power. You'll know if it's fast enough after you try it - depending on your scene, it might turn up as a tiny blip of a perf hit. Or, it could be a massive drain on performance. You won't really know until you try ;)


Yes, for a simple function with a few additions or multiplications, it does not matter. But when the function is complex, with 50 additions and 50 multiplications per vertex, and you evaluate it entirely in the vertex shader for real-time animation, what happens to the animation's performance?

Depends on how many verts there are. 50 instructions isn't a "very complex" shader. Plus, IIRC some cards can dual-issue a MUL in one pipe and a MADD in another. Also be aware that MADD exists (multiply and add as one instruction).

Seriously, try it ;)
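
To make the instruction count concrete, here is one way the thread's example function could be regrouped into multiply-add steps. This is a Python sketch, not shader code, and `madd` is a hypothetical helper standing in for a hardware MADD; how a real shader compiler schedules the expression is not something this shows.

```python
import math


def madd(a, b, c):
    """Stand-in for a multiply-add instruction: a * b + c."""
    return a * b + c


def f_direct(x, y, z):
    return (x * x + math.sin(y)) * 5.0 + x ** 3 * z + 6.0


def f_madd(x, y, z):
    # Regrouping: f = x^2 * (x * z + 5) + (5 * sin(y) + 6)
    s = madd(5.0, math.sin(y), 6.0)  # 1 sin + 1 madd
    t = madd(x, z, 5.0)              # 1 madd
    x2 = x * x                       # 1 mul
    return madd(x2, t, s)            # 1 madd


print(f_direct(1.5, 0.7, 2.0), f_madd(1.5, 0.7, 2.0))
```

Written this way, the example needs one sine, one multiply, and three multiply-adds per evaluation.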

OK, thank you.
Using a texture to hold precalculated values seems like an effective way to reduce the amount of computation. But I don't know how to solve the following questions:

Suppose the original function is:
f = (x^2 + sin(y)) * 5 + x^3 * z + 6.

Divide f into two subfunctions, each of which can be stored in a texture:
f1 = 5 * (x^2 + sin(y)) + 6 as one texture
f2 = x^3 * z as another texture

1. Suppose x, y, and z are real and all lie in [0, 10000], which exceeds the maximum texture size that the GPU supports. What should I do?

2. Since f1 is not linear, what should I do when x and y do not fall exactly on the texel coordinates of func1 (e.g. the texture size is 256*256 and x != i/255 for i = 0, 1, 2, ..., 255)?

[Edited by - sunrisefe on December 25, 2009 1:18:50 AM]

If you can't create a 1x10000 texture, you can turn your 1D value into a 2D value by wrapping it at some point.

If you make the texture 128x128 and your 1D value is 'value', the new uv coordinates are:

u = frac( value * 127.0f )
v = floor( value * 127.0f ) / 127.0f

... I think
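
The wrapping idea can also be written with integer indices, which sidesteps the question of whether the scale factor should be 127 or 128. This is a sketch in Python under assumptions not taken from the thread: `value` is already normalized to [0, 1], entries are stored row by row, and the lookup uses nearest-texel sampling, since linear filtering across a row boundary would blend entries that are not neighbours in the 1D sequence. The names `wrap_index`, `texel_center_uv`, and `lookup` are hypothetical.

```python
WIDTH, HEIGHT = 128, 128


def wrap_index(value, width=WIDTH, height=HEIGHT):
    """Map a normalized 1D value in [0, 1] to an integer (col, row) texel."""
    count = width * height
    index = min(int(value * count), count - 1)  # clamp value == 1.0 to the last entry
    return index % width, index // width


def texel_center_uv(col, row, width=WIDTH, height=HEIGHT):
    """Normalized coordinates of a texel's centre, for a nearest lookup."""
    return (col + 0.5) / width, (row + 0.5) / height


# A stand-in "texture" whose entry k simply stores k, stored row by row.
table = [[row * WIDTH + col for col in range(WIDTH)] for row in range(HEIGHT)]


def lookup(value):
    col, row = wrap_index(value)
    return table[row][col]


print(wrap_index(0.0), wrap_index(1.0), lookup(200.5 / (WIDTH * HEIGHT)))
```

If interpolation between consecutive entries is needed, one option is to fetch the two neighbouring entries by index and blend them manually in the shader.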

RDragon1, thank you very much. But it seems that I didn't make myself understood.


Suppose f = (x^2 + sin(y)) * 5 + x^3 * z + 6. It's a function of three variables, but a 2D texture can take two variables at most. So, following Halifax2's suggestion, we should split the equation into two subequations, e.g.

f1 = 5 * (x^2 + sin(y)) + 6 as one texture
f2 = x^3 * z as another texture

Make f1 and f2 functions of the texture coordinates u and v instead of their respective two variables. Then in the shader we would do something like:

// f1 depends on (x, y) and f2 on (x, z), so each lookup needs its own coordinate
value = texture2D(func1, coordXY);
value += texture2D(func2, coordXZ);

But now I have the following questions:

1. Suppose x, y, and z are real and all lie in [0, 10000], which exceeds the maximum texture size that the GPU supports, since the maximum size of a 2D texture is 8192*8192. What should I do?

2. Since f1 is not linear, what should I do when x and y do not fall exactly on the texel coordinates of func1 (e.g. the texture size is 256*256 and x != i/255 for i = 0, 1, 2, ..., 255)?

[Edited by - sunrisefe on December 25, 2009 6:46:58 PM]
