funkeejeffounet

Number of GPU cycles for cos and sin functions?


Hello, I was wondering where I could find GPU documentation on instruction cycle counts, of the kind commonly available for other processors. I am especially interested in the number of cycles the cos and sin instructions take on Nvidia's latest hardware (GF6x00). Could anyone help me with that, please? Are they implemented in hardware? Thanks in advance. Cheers, Jeff.

A GPU IS a processor (graphics processing unit). Anyway, I remember seeing somewhere that on GeForce 6 series cards it's a single cycle (maybe I was just dreaming :-p), but I do have that memory.

The Radeon X800 has it, anyway.
EDIT:
Quote:

ORIGINALLY AT: http://gear.ibuypower.com/GVE/Store/ProductDetails.aspx?sku=VC-POWERC-147
Smartshader HD


• Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware
• DirectX 9.0 Vertex Shaders
- Vertex programs up to 65,280 instructions with flow control
- Single cycle trigonometric operations (SIN & COS)
• DirectX 9.0 Extended Pixel Shaders
- Up to 1,536 instructions and 16 textures per rendering pass
- 32 temporary and constant registers
- Facing register for two-sided lighting
- 128-bit, 64-bit & 32-bit per pixel floating point color formats
- Multiple Render Target (MRT) support
• Complete feature set also supported in OpenGL® via extensions


I'm going to look at some Nvidia cards as well.
EDIT: Interesting. Though I haven't looked a hell of a lot, I haven't found any evidence that Nvidia has that as well. (Damn, because I just got a GeForce 6600, lol.) Oh well, we've still got FBOs, nah nah nah nah nah naaaaahhh!!! (I've got a 9700 right now, so don't think I'm an ATI hater, lol.)
Hope that helps.
-Dan

Hmmm, to be honest, I don't know. As Ademan pointed out, the X800 series is single cycle and ATI made a reasonable amount of noise about it, but NV haven't commented on it. So I wouldn't say either way whether it is or isn't, especially given NV's tendency to favour texture lookups over instructions... (AFAIK anyway; they might have changed their tune on the NV40 line.)

I had exactly the same reaction as the_phantom.
When searching the net for the cost of trigonometric functions on GPUs, you can only find articles about ATI's newest graphics cards.

On the Nvidia site, you cannot find anything about such things, so if they don't brag about it, it seems pretty obvious that they do not support such functions in hardware...

Anyway, their GPU programming guide says that we should prefer the built-in functions for simple instructions (log, exp...) over a precalculated table (texture).
I've tested both approaches, and a precalculated texture isn't worth the bother (even for atan...), so I guess that even if they do not support it in hardware, these functions are still fast.
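
To make the "precalculated table versus built-in function" comparison concrete, here is a minimal CPU-side sketch in Python. It is only an analogue of a texture lookup, not shader code: the table size of 256 entries, the linear filtering, and the names `SIN_TABLE`, `sin_lookup` and `max_error` are all assumptions chosen for illustration.

```python
import math

TABLE_SIZE = 256  # hypothetical resolution, like a 256-texel 1D texture

# Precompute one full period of sin, as one would bake it into a texture.
SIN_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def sin_lookup(x):
    """Approximate sin(x) with a linearly filtered, wrapping table lookup."""
    # Map x to a table coordinate; the modulo mimics a repeating texture.
    u = (x / (2.0 * math.pi)) % 1.0 * TABLE_SIZE
    base = int(u)
    frac = u - base
    i0 = base % TABLE_SIZE
    i1 = (i0 + 1) % TABLE_SIZE
    # Linear interpolation between the two neighbouring entries.
    return SIN_TABLE[i0] * (1.0 - frac) + SIN_TABLE[i1] * frac


def max_error(samples=10000):
    """Largest absolute difference from math.sin over the range [-10, 10)."""
    xs = [-10.0 + 20.0 * k / samples for k in range(samples)]
    return max(abs(sin_lookup(x) - math.sin(x)) for x in xs)
```

With these assumed settings the table stays within about 1e-4 of `math.sin`, so the table costs memory and a fetch in exchange for limited precision, which is the trade-off being weighed against the built-in function.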

I guess Nvidia will have these built into the hardware for their next NV5x.

Cheers, Jeff.

PS:
How come there aren't any docs on instruction cycle counts if you want to write assembly for the GPU? How can you optimize then?

Quote:
Original post by funkeejeffounet

On the Nvidia site, you cannot find anything about such things, so if they don't brag about it, it seems pretty obvious that they do not support such functions in hardware...

Anyway, their GPU programming guide says that we should prefer the built-in functions for simple instructions (log, exp...) over a precalculated table (texture).
I've tested both approaches, and a precalculated texture isn't worth the bother (even for atan...), so I guess that even if they do not support it in hardware, these functions are still fast.



Quoting from the same chapter of the nVidia GPU programming guide:

"3.5.3.3. The sincos() Function
Despite the preceding advice, the GeForce FX family and later GPUs support some complex mathematical functions natively in hardware. One such function that is convenient is the sincos function, which allows you to simultaneously calculate the sine and cosine of a value."


and

"Any time you can encode a complex sequence of arithmetic operations in a texture, you can improve performance. Keep in mind that some complex functions, such as log and exp, are micro-instructions in ps_2_0 and higher profiles, and therefore don’t need to be encoded in textures for optimal performance."

Quote:
Original post by funkeejeffounet
On the Nvidia site, you cannot find anything about such things, so if they don't brag about it, it seems pretty obvious that they do not support such functions in hardware...
[...]
I've tested both approaches, and a precalculated texture isn't worth the bother (even for atan...), so I guess that even if they do not support it in hardware, these functions are still fast.


Everything you do in a shader is (should be) supported in hardware. Do you want the drivers to emulate a sin function in a pixel shader? And you think it'd still be fast?!

Yes, it's certainly going to be done in hardware, and has been for some time, so it will be fast; it just might not be a single cycle. However, it is trivial, so I wouldn't worry about it. The NV40 is certainly fast enough that it won't matter whether it takes 1 cycle or 2, and the compiler should be able to rearrange things so the cost is hidden.

As for the lack of instruction cycle counts, what you have to realise is that even the ARB_vertex/fragment_program interfaces, while they might look like assembly, do not have a 1:1 mapping to GPU instructions, and with pipelining and the compiler's ability to rearrange things, cycle counts become somewhat redundant.

Quote:
Original post by _the_phantom_
As for the lack of instruction cycle counts, what you have to realise is that even the ARB_vertex/fragment_program interfaces, while they might look like assembly, do not have a 1:1 mapping to GPU instructions, and with pipelining and the compiler's ability to rearrange things, cycle counts become somewhat redundant.


As well as doing multiple things per cycle. For example, a texture lookup plus a multiply-add instruction would be done in parallel in one cycle (hopefully).
While it is important to understand things like that, concentrate first on developing the shader and optimizing its bottlenecks.
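
The point about several operations sharing a cycle can be illustrated with a toy model in Python. This is purely a sketch under loud assumptions: it does not describe any real GPU, it ignores data dependencies and latencies, and it assumes a hypothetical machine that can issue one "tex" and one "alu" instruction together when they are adjacent in program order. The function names are made up for this example.

```python
def cycles_serial(program):
    """Cycle count if every instruction takes its own cycle."""
    return len(program)


def cycles_coissue(program):
    """Cycle count for a toy in-order machine that can pair two adjacent
    instructions in one cycle when they use different units."""
    cycles = 0
    i = 0
    while i < len(program):
        # Pair with the next instruction only if it uses the other unit.
        if i + 1 < len(program) and program[i] != program[i + 1]:
            i += 2
        else:
            i += 1
        cycles += 1
    return cycles


# Hypothetical instruction stream: unit names only, no real opcodes.
shader = ["tex", "alu", "alu", "tex", "alu", "alu"]
print(cycles_serial(shader), cycles_coissue(shader))  # prints "6 4"
```

Even in this crude model the total is not the sum of per-instruction costs, which is why a table of cycle counts per instruction would say little about the final speed of a shader.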

The best way to check the speed of this stuff is simply to create a shader with what you want to test, draw it to, say, a 1000x1000 window, and see how fast it runs on your hardware. If anything is done in software, you'll soon be able to tell.
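
The measuring approach suggested above can be sketched as a small timing harness. The version below is a CPU analogue in Python, not a GPU benchmark: it calls a per-pixel function over a grid and reports the best time per pixel. The names `time_per_pixel`, `shade_sin` and `shade_baseline` are hypothetical, and a real test would render an actual shader and time the frames instead.

```python
import math
import time


def time_per_pixel(shade, width=1000, height=1000, repeats=3):
    """Return the best-of-`repeats` time in seconds per call of shade(x, y)
    over a width x height grid."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for y in range(height):
            for x in range(width):
                shade(x, y)
        best = min(best, time.perf_counter() - start)
    return best / (width * height)


def shade_sin(x, y):
    """The operation under test: one sin and one cos per pixel."""
    return math.sin(x * 0.01) * math.cos(y * 0.01)


def shade_baseline(x, y):
    """Same arithmetic shape without the trig calls, for comparison."""
    return (x * 0.01) * (y * 0.01)
```

Calling `time_per_pixel(shade_sin)` and `time_per_pixel(shade_baseline)` and comparing the two gives a rough cost for the trig calls. Taking the best of several runs reduces noise, but a measurement like this shows overall throughput rather than an exact cycle count.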

Guest Anonymous Poster
How will that allow you to properly differentiate between a 1 and 2 cycle cost?

oh, it won't.

Quote:
Original post by Anonymous Poster
How will that allow you to properly differentiate between a 1 and 2 cycle cost?

oh, it won't.


Oh no. Your sin functions take 2 friggin' cycles. Come on, why should you care? Would you use different code paths for cards that need an extra clock cycle? It is the shader performance that counts, not the extra clock cycle...
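
For a sense of scale, here is a back-of-the-envelope calculation in Python of what one extra cycle per pixel might cost per frame. Every figure in it is an assumption picked for illustration (a 1000x1000 window, 16 pixel pipelines working in parallel, a 400 MHz clock, and no latency hiding at all); none of them is a measured or documented value for a specific card.

```python
def extra_frame_time_ms(pixels, extra_cycles, pipelines, clock_hz):
    """Extra milliseconds per frame if every pixel costs `extra_cycles`
    more cycles and the work is spread evenly over `pipelines`."""
    total_cycles = pixels * extra_cycles / pipelines
    return total_cycles / clock_hz * 1000.0


# Assumed figures, see the note above: 1000x1000 pixels, 1 extra cycle,
# 16 parallel pipelines, 400 MHz clock.
delta_ms = extra_frame_time_ms(1000 * 1000, 1, 16, 400e6)
print(round(delta_ms, 3))  # prints 0.156
```

Under these assumptions the extra cycle adds roughly 0.16 ms per frame, and less if the scheduler manages to hide it behind other work.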
