Which is faster in a vertex shader: addition or texture sampling?

10 comments, last by sunrisefe 14 years, 4 months ago
To reduce the amount of computation for a complex function, I want to split it into several two-variable subfunctions, each stored in one 2D texture. The values of each subfunction are precomputed and stored in the texture, with the subfunction's two variables mapped to the texture coordinates. Each subfunction should be linear when one variable is held fixed, so that the texture's linear filtering can be exploited between texels. Now I wonder which is faster: texture sampling or addition (or multiplication)?
GPUs are complex massively-parallel machines. The only way to know for sure is to try it.

Whether addition is faster than a texture sample in isolation doesn't really matter. For example, if your shader is already ALU-heavy, precalculating some work and storing the results in a texture could be an optimization. Conversely, there are situations where the texture fetch is the slower option.
RDragon1, thank you.

For computation-intensive cases, we should use a texture to store precomputed values. For a nonlinear function, how do we make use of a texture? E.g.
f = (x^2 + sin(y)) * 5 + x^3 * z + 6. How do we precompute it into a texture?
For per-vert calculations I would probably just store this along with other mesh attributes in the vertex stream

Although I would likely start by just putting that code in the shader.
Quote:Original post by sunrisefe
RDragon1, thank you.

For computation-intensive cases, we should use a texture to store precomputed values. For a nonlinear function, how do we make use of a texture? E.g.
f = (x^2 + sin(y)) * 5 + x^3 * z + 6. How do we precompute it into a texture?

Well you're going to have to split it into two textures given that you have 3 variables, and a 3D texture would be far less than optimal.

You're going to need to define your domain so that you can map it to [0,1], then generate your texture. You can split the equation however you feel like, e.g.
f1 = 5 * (x^2 + sin(y)) + 6
f2 = x^3 * z

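Make f1 and f2 functions of texture coordinates u and v instead of their respective 2 variables. Note that f1 depends on (x, y) while f2 depends on (x, z), so each lookup needs its own coordinate pair (coordXY and coordXZ below stand for the normalized pairs). Then in your shader you would do something like:
value = texture2D(func1, coordXY);
value += texture2D(func2, coordXZ);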

Now of course you can do much more with textures. For example, with the above code I assumed that you would like to store the result in a 32-bit RGBA value, but you could split the equations among 16-bit channels, store the results into 2 color channels, then sample one color and add the two channels to the other two channels in the shader:
value = texture2D(func1, coord);
realValue = value.rg + value.ba;

You could do a lot of things, therefore you must experiment.
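The bake-and-lookup idea above can be checked on the CPU before committing to a shader. The following is only a rough sketch in Python, not shader code: the domain bound of 10, the table size of 64, and the names bake, sample, f_lookup and f_direct are all assumptions made for illustration. It also uses a simplified texel layout in which texel i sits at u = i/(size-1); real GPU filtering places texel centers at half-texel offsets.

```python
import math

# Assumed domain and table size for this sketch (hypothetical values).
X_MAX = Y_MAX = Z_MAX = 10.0
SIZE = 64  # texels per side

def f1(x, y):
    return 5.0 * (x * x + math.sin(y)) + 6.0

def f2(x, z):
    return x ** 3 * z

def bake(func, a_max, b_max, size=SIZE):
    """Tabulate func on a size x size grid covering [0, a_max] x [0, b_max]."""
    scale_a = a_max / (size - 1)
    scale_b = b_max / (size - 1)
    return [[func(i * scale_a, j * scale_b) for i in range(size)]
            for j in range(size)]

def sample(table, u, v):
    """Bilinear lookup for u, v in [0, 1], clamped at the edges."""
    size = len(table)
    fu = min(max(u, 0.0), 1.0) * (size - 1)
    fv = min(max(v, 0.0), 1.0) * (size - 1)
    i0 = min(int(fu), size - 2)
    j0 = min(int(fv), size - 2)
    tu, tv = fu - i0, fv - j0
    top = table[j0][i0] * (1.0 - tu) + table[j0][i0 + 1] * tu
    bottom = table[j0 + 1][i0] * (1.0 - tu) + table[j0 + 1][i0 + 1] * tu
    return top * (1.0 - tv) + bottom * tv

FUNC1 = bake(f1, X_MAX, Y_MAX)  # indexed by (x, y)
FUNC2 = bake(f2, X_MAX, Z_MAX)  # indexed by (x, z)

def f_lookup(x, y, z):
    """Approximate f with two table lookups, each using its own coordinates."""
    return (sample(FUNC1, x / X_MAX, y / Y_MAX)
            + sample(FUNC2, x / X_MAX, z / Z_MAX))

def f_direct(x, y, z):
    return (x ** 2 + math.sin(y)) * 5 + x ** 3 * z + 6
```

Comparing f_lookup against f_direct at points between grid nodes gives a feel for the interpolation error at a given table size, which is the kind of experiment the post recommends.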
Denzel Morris (@drdizzy) :: Software Engineer :: SkyTech Enterprises, Inc.
"When men are most sure and arrogant they are commonly most mistaken, giving views to passion without that proper deliberation which alone can secure them from the grossest absurdities." - David Hume
Halifax2, thank you first of all. I have the following questions:

f1 = 5 * (x^2 + sin(y)) + 6
f2 = x^3 * z

1. Suppose x, y, z are real and all lie in [0, 10000], which goes beyond the maximum texture size the GPU supports. What should I do?

2. Since f1 is not linear, what should I do when x and y do not fall exactly on texel coordinates of func1 (e.g. the texture size is 256*256 and x != i/255 for i = 0, 1, 2, ..., 255)?

To RDragon1,
"Although I would likely start by just putting that code in the shader."
What if every vertex has to calculate a very complex function?
Quote:Original post by sunrisefe
What if every vertex has to calculate a very complex function?


So, what's wrong with that? It's not really that complicated - typical GPUs have a builtin sincos instruction anyway, and a few multiplies and adds are nothing. GPUs have tons of flops of power. You'll know if it's fast enough after you try it - depending on your scene, it might turn up as a tiny blip of a perf hit. Or, it could be a massive drain on performance. You won't really know until you try ;)
Quote:Original post by RDragon1
Quote:Original post by sunrisefe
What if every vertex has to calculate a very complex function?


So, what's wrong with that? It's not really that complicated - typical GPUs have a builtin sincos instruction anyway, and a few multiplies and adds are nothing. GPUs have tons of flops of power. You'll know if it's fast enough after you try it - depending on your scene, it might turn up as a tiny blip of a perf hit. Or, it could be a massive drain on performance. You won't really know until you try ;)


Yes, for a simple function with a few additions or multiplications it does not matter. But when the function is complex, with say 50 additions and 50 multiplications per vertex, and you evaluate it only in the vertex shader for real-time animation, what happens to the animation?
Depends on how many verts there are. 50 instructions isn't a "very complex" shader. Plus, IIRC some cards can dual-issue a MUL in one pipe and a MADD in another. Also be aware that MADD exists (multiply and add as one instruction).
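As a rough illustration of the MADD remark, the example function from earlier in the thread can be regrouped so that most of its work is multiply-add shaped. The sketch below is plain Python, not shader code, and the names f_direct and f_madd are made up for this example; whether a shader compiler actually emits these exact instructions is an assumption, not something the thread confirms.

```python
import math

def f_direct(x, y, z):
    # The function as written in the thread.
    return (x ** 2 + math.sin(y)) * 5 + x ** 3 * z + 6

def f_madd(x, y, z):
    # Same value, regrouped as x^2 * (x*z + 5) + (5*sin(y) + 6).
    s = math.sin(y)      # one sine
    a = 5.0 * s + 6.0    # multiply-add
    b = x * z + 5.0      # multiply-add
    x2 = x * x           # multiply
    return x2 * b + a    # multiply-add
```

Under this grouping the whole function is one sine, one multiply and three multiply-adds, which is consistent with the point that it is far from a heavy shader.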

Seriously, try it ;)
OK, thank you.
It's clear that precalculating into a texture is an effective way to reduce the amount of computation. But I don't know how to solve the following problems:

Suppose the original function is:
f = (x^2 + sin(y)) * 5 + x^3 * z + 6.

Divide f into two subfunctions, each of which can be stored in a texture:
f1 = 5 * (x^2 + sin(y)) + 6 as one texture
f2 = x^3 * z as another texture

1. Suppose x, y, z are real and all lie in [0, 10000], which goes beyond the maximum texture size the GPU supports. What should I do?

2. Since f1 is not linear, what should I do when x and y do not fall exactly on texel coordinates of func1 (e.g. the texture size is 256*256 and x != i/255 for i = 0, 1, 2, ..., 255)?
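One way to explore both questions is to measure the interpolation error directly. Texture coordinates are normalized, so the domain [0, 10000] does not have to match the texture size; what matters is how many texels cover each feature of the function. The sketch below is a one-dimensional CPU-side experiment in Python on the sin(y) part of f1 only. The function name max_lerp_error, the probe count and the table sizes are hypothetical choices, and texel i is assumed to sit at i/(size-1) as in the question.

```python
import math

def max_lerp_error(size, y_max=10000.0, probes=20000):
    """Worst observed error of linearly interpolating a sin(y) table
    with `size` entries spread over [0, y_max]."""
    table = [math.sin(i * y_max / (size - 1)) for i in range(size)]
    worst = 0.0
    for k in range(probes):
        y = (k + 0.5) * y_max / probes       # probe between table entries
        f = y / y_max * (size - 1)           # normalized coordinate -> texel space
        i0 = min(int(f), size - 2)
        t = f - i0
        approx = table[i0] * (1.0 - t) + table[i0 + 1] * t
        worst = max(worst, abs(approx - math.sin(y)))
    return worst

coarse = max_lerp_error(256)               # 256 texels over [0, 10000]
finer = max_lerp_error(4096)               # 4096 texels over [0, 10000]
narrow = max_lerp_error(256, y_max=10.0)   # 256 texels over [0, 10]
```

With 256 texels over [0, 10000] each texel spans several periods of sin, so the table cannot represent it, while the same 256 texels over a narrow range track it closely. Because sin is periodic, reducing y to a single period before the lookup is one possible way to keep the table small; that is a suggestion, not something proposed in the thread.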

[Edited by - sunrisefe on December 25, 2009 1:18:50 AM]

This topic is closed to new replies.
