Number of GPU cycles for cos and sin functions?

Hello, I was wondering where I could find GPU documentation that lists instruction cycle counts, as we commonly see for other processors. I'm especially interested in the cycle counts of the cos and sin instructions on the latest Nvidia hardware (GF6x00); could anyone help me with that, please? Are they implemented in hardware? Thanks in advance, Cheers, Jeff.
Are sin and cosine done by the GPU? I would have thought they would be done by the processor.
The GPU IS a processor (graphics processing unit). Anyhow, I remember seeing somewhere that on GeForce 6 series cards it's a single cycle (maybe I was just dreaming :-p), but I have that memory.

The Radeon X800 has it, anyway.
EDIT:
Quote:
ORIGINALLY AT: http://gear.ibuypower.com/GVE/Store/ProductDetails.aspx?sku=VC-POWERC-147
Smartshader HD


• Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware
• DirectX 9.0 Vertex Shaders
  - Vertex programs up to 65,280 instructions with flow control
  - Single cycle trigonometric operations (SIN & COS)
• DirectX 9.0 Extended Pixel Shaders
  - Up to 1,536 instructions and 16 textures per rendering pass
  - 32 temporary and constant registers
  - Facing register for two-sided lighting
  - 128-bit, 64-bit & 32-bit per pixel floating point color formats
  - Multiple Render Target (MRT) support
• Complete feature set also supported in OpenGL® via extensions


I'm going to look for some Nvidia cards as well.
EDIT: Interesting. Though I haven't looked a hell of a lot, I haven't found any evidence that Nvidia has that as well. (Damn, 'cause I just got a GeForce 6600, lol.) Oh well, we've still got FBOs, nah nah nah nah nah naaaaahhh!!! (I've got a 9700 right now, so don't think I'm an ATI hater, lol.)
Hope that helps
-Dan
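To put the "Single cycle trigonometric operations (SIN & COS)" bullet from that spec in context, here is a minimal vs_2_0-style HLSL sketch of the kind of per-vertex trig it matters for: a sine-wave vertex displacement. Every constant, structure, and function name below is invented for illustration and is not taken from the quoted product page.

```hlsl
// Illustrative vs_2_0-style vertex shader: sine-wave displacement.
// All names (WorldViewProj, Time, WaveFreq, WaveAmp) are made up for this sketch.
float4x4 WorldViewProj;
float    Time;
float    WaveFreq;
float    WaveAmp;

struct VS_IN  { float4 pos : POSITION; float3 nrm : NORMAL; };
struct VS_OUT { float4 pos : POSITION; float3 nrm : TEXCOORD0; };

VS_OUT main(VS_IN i)
{
    VS_OUT o;

    // One sin and one cos per vertex -- exactly the sort of per-vertex trig
    // that a single-cycle SIN/COS unit speeds up.
    float phase  = i.pos.x * WaveFreq + Time;
    float offset = sin(phase) * WaveAmp;

    float4 displaced = i.pos;
    displaced.y += offset;
    o.pos = mul(displaced, WorldViewProj);

    // Tilt the normal by the slope of the wave (the derivative of sin is cos).
    o.nrm = normalize(i.nrm + float3(-cos(phase) * WaveAmp * WaveFreq, 0, 0));
    return o;
}
```

Whether sin and cos take one cycle or several shows up directly in vertex throughput for shaders like this, which is why ATI calls it out as a feature.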
They're single cycle on modern graphics cards.
Hmmm, tbh, I dunno. As Ademan pointed out, the X800 series is single cycle and ATI made a reasonable amount of noise about it, but NV haven't commented on it, so I wouldn't say either way whether it is or isn't, given NV's tendency to favour texture lookups over instructions... (AFAIK, anyway; they might have changed their tune with the NV40 line.)
I had exactly the same reaction as the_phantom.
When searching the net for the cost of trigonometric functions on GPUs, you can only find articles about ATI's newest graphics cards.

You can't find anything about this on the Nvidia site, so if they don't brag about it, it seems pretty obvious that they don't support these functions in hardware...

Anyway, their GPU programming guide says that for simple instructions (log, exp...) we should prefer the built-in functions over a precalculated table (texture).
I've tested both approaches, and a precalculated texture isn't worth bothering with (even for atan...), so I guess that even if they don't support these functions in hardware, they're still fast.
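For anyone wanting to reproduce that comparison, here is roughly what the two approaches look like as ps_2_0-style HLSL. This is a hedged sketch rather than the original test code (which isn't shown in the post), so the sampler and function names are assumptions, and the 1D texture is assumed to have been filled with sine/cosine values by the application beforehand.

```hlsl
// Approach 1: the built-in intrinsics.
float4 ps_builtin(float2 uv : TEXCOORD0) : COLOR
{
    float angle = uv.x * 6.2831853;          // map [0,1] to [0, 2*pi]
    return float4(sin(angle), cos(angle), 0, 1);
}

// Approach 2: a precalculated 1D lookup texture.
// SinTable is assumed to hold sin(2*pi*u) in red and cos(2*pi*u) in green,
// with wrap addressing, so one fetch replaces the math.
sampler1D SinTable;

float4 ps_lookup(float2 uv : TEXCOORD0) : COLOR
{
    float2 sc = tex1D(SinTable, uv.x).rg;    // single texture fetch
    return float4(sc, 0, 1);
}
```

On hardware where sin and cos are micro-instructions, the first version avoids the texture fetch, the bandwidth it costs, and the precision limits of the table, which is consistent with the conclusion above that the lookup texture isn't worth the trouble.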

I guess Nvidia will build these into hardware for their next NV5x.

Cheers, Jeff.

PS:
How come there aren't any docs on instruction cycle counts if you want to write assembly for the GPU? How can you optimize then?
I'm quite sure nVidia's NV4x does the standard trig functions in one cycle. Precomputed texture table lookups are slower, waste bandwidth, and use extra memory.
"C lets you shoot yourself in the foot rather easily. C++ allows you to reuse the bullet!"
Quote: Original post by funkeejeffounet

You can't find anything about this on the Nvidia site, so if they don't brag about it, it seems pretty obvious that they don't support these functions in hardware...

Anyway, their GPU programming guide says that for simple instructions (log, exp...) we should prefer the built-in functions over a precalculated table (texture).
I've tested both approaches, and a precalculated texture isn't worth bothering with (even for atan...), so I guess that even if they don't support these functions in hardware, they're still fast.



Quoting from the same chapter of the nVidia GPU programming guide:

"3.5.3.3. The sincos() Function
Despite the preceding advice, the GeForce FX family and later GPUs support some complex mathematical functions natively in hardware. One such function that is convenient is the sincos function, which allows you to simultaneously calculate the sine and cosine of a value."


and

"Any time you can encode a complex sequence of arithmetic operations in a texture, you can improve performance. Keep in mind that some complex functions, such as log and exp, are micro-instructions in ps_2_0 and higher profiles, and therefore don’t need to be encoded in textures for optimal performance."

Niko Suni
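For reference, sincos() is a standard HLSL/Cg intrinsic that returns both results through out parameters. Below is a minimal, illustrative ps_2_0-style sketch that uses it to rotate a texture coordinate; the rotation shader itself and the names in it are just an example, not taken from the guide.

```hlsl
// Rotate a texture coordinate about (0.5, 0.5) using one sincos()
// instead of separate sin() and cos() calls.
sampler2D BaseMap;   // illustrative sampler name
float     Angle;     // rotation angle in radians

float4 main(float2 uv : TEXCOORD0) : COLOR
{
    float s, c;
    sincos(Angle, s, c);                     // sine and cosine computed together

    float2 centered = uv - 0.5;
    float2 rotated  = float2(c * centered.x - s * centered.y,
                             s * centered.x + c * centered.y) + 0.5;

    return tex2D(BaseMap, rotated);
}
```

Calling sincos() where both values are needed gives the compiler the chance to emit the combined hardware operation rather than two separate transcendental instructions.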

Quote: Original post by funkeejeffounet
You can't find anything about this on the Nvidia site, so if they don't brag about it, it seems pretty obvious that they don't support these functions in hardware...
[...]
I've tested both approaches, and a precalculated texture isn't worth bothering with (even for atan...), so I guess that even if they don't support these functions in hardware, they're still fast.


Everything you do in a shader is (or should be) supported in hardware. Do you want the drivers to emulate a sin function in a pixel shader? And you think it'd still be fast?!
"C lets you shoot yourself in the foot rather easily. C++ allows you to reuse the bullet!"
Yes, it's certainly going to be done in hardware, and has been for some time, so it will be fast; it just might not be a single cycle. However, it's trivial, so I wouldn't worry about it: the NV40 is certainly fast enough that it won't matter whether it takes 1 cycle or 2, and the compiler should be able to rearrange things so the cost is hidden.

As for the lack of instruction cycle counts, what you have to realise is that even the ARB_vertex/fragment_program interfaces, while they might look like assembly, do not have a 1:1 mapping to GPU instructions, and with pipelining and the compiler's freedom to rearrange things, cycle counts become somewhat meaningless.
