Number of GPU cycles for cos and sin functions?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

14 replies to this topic

#1 funkeejeffounet   Members   -  Reputation: 136


Posted 29 May 2005 - 06:44 AM

Hello, I was wondering where I could find GPU documentation covering instruction cycle counts, as we commonly see for other processors. I am especially interested in the number of cycles for the cos and sin instructions on the latest Nvidia hardware (GF6x00); could anyone help me with that, please? Are they implemented in hardware? Thanks in advance. Cheers, Jeff.


#2 Tera_Dragon   Members   -  Reputation: 260


Posted 29 May 2005 - 11:37 AM

Are sine and cosine done by the GPU? I would have thought they would be done by the processor.

____________________________________________________________
Programmers Resource Central

#3 Ademan555   Members   -  Reputation: 361


Posted 29 May 2005 - 11:47 AM

The GPU IS a processor (graphics processing unit). Anywho, I remember seeing somewhere that on GeForce 6 series cards it's a single cycle (maybe I was just dreaming :-p), but I have that memory.

The Radeon X800 has it, anyway.
EDIT:
Quote:

ORIGINALLY AT: http://gear.ibuypower.com/GVE/Store/ProductDetails.aspx?sku=VC-POWERC-147
Smartshader HD


• Support for Microsoft® DirectX® 9.0 programmable vertex and pixel shaders in hardware
• DirectX 9.0 Vertex Shaders
  - Vertex programs up to 65,280 instructions with flow control
  - Single cycle trigonometric operations (SIN & COS)
• DirectX 9.0 Extended Pixel Shaders
  - Up to 1,536 instructions and 16 textures per rendering pass
  - 32 temporary and constant registers
  - Facing register for two-sided lighting
  - 128-bit, 64-bit & 32-bit per-pixel floating point color formats
  - Multiple Render Target (MRT) support
• Complete feature set also supported in OpenGL® via extensions


I'm going to look for some Nvidia cards as well.
EDIT: Interesting; though I haven't looked a hell of a lot, I haven't found any evidence that Nvidia has that as well. (Damn, 'cause I just got a GeForce 6600, lol.) Oh well, we've still got FBOs, nah nah nah nah nah naaaaahhh!!! (I've got a 9700 right now, so don't think I'm an ATI hater, lol.)
Hope that helps,
-Dan

#4 tomlu709   Members   -  Reputation: 156


Posted 29 May 2005 - 11:50 AM

They're single cycle on modern graphics cards.

#5 phantom   Moderators   -  Reputation: 7258


Posted 29 May 2005 - 12:45 PM

Hmmm, tbh, I dunno. As Ademan pointed out, the X800 series is single cycle and ATI made a reasonable amount of noise about it, but NV haven't commented on it, so I wouldn't say either way whether it is or isn't, given NV's tendency to favour texture lookups over instructions... (afaik, anyway; they might have changed their tune on the NV40 line)

#6 funkeejeffounet   Members   -  Reputation: 136


Posted 29 May 2005 - 10:29 PM

I had exactly the same reaction as _the_phantom_.
When searching the net for the cost of trigonometric functions on GPUs, you can only find articles related to ATI's newest graphics cards.

On the Nvidia site, you cannot find anything about such things, so if they don't brag about it, it seems pretty obvious that they do not support such functions in hardware...

Anyway, their GPU programming guide says we should prefer the built-in functions for simple instructions (log, exp...) over a precalculated table (texture).
I've tested both approaches, and a precalculated texture isn't worth bothering with (even for atan...), so I guess that even if they do not support it in hardware, these functions are still fast.

Guess Nvidia will have these built into hardware for their next NV5x.

Cheers, Jeff.

PS:
How come there aren't any docs on instruction cycle counts if you want to write assembly for the GPU? How can you optimize then?
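The trade-off Jeff tested (precalculated texture table vs. built-in function) can be sketched on the CPU. The following is a hypothetical Python analogue, not the actual shader test: a 256-entry sine table with linear interpolation stands in for a 1D lookup texture, and the maximum error against the built-in shows why the table buys little. The table size and sweep range are illustrative choices, not from the thread.

```python
import math

# Hypothetical CPU-side analogue of the "precalculated texture" approach:
# a sine table with linear interpolation, like sampling a 1D lookup texture.
TABLE_SIZE = 256
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]

def table_sin(x):
    """Approximate sin(x) from the table with linear interpolation."""
    t = (x / (2 * math.pi)) % 1.0          # map x into one period, [0, 1)
    pos = t * TABLE_SIZE
    i = int(pos)
    frac = pos - i
    return SINE_TABLE[i] * (1 - frac) + SINE_TABLE[i + 1] * frac

# Worst-case error over a sweep of inputs: small, but the built-in is exact
# (to float precision) and needs no extra memory or bandwidth.
max_err = max(abs(table_sin(x / 100.0) - math.sin(x / 100.0)) for x in range(1000))
```

Even when the table's accuracy is acceptable, it costs storage and a memory fetch per call, which matches the consensus in the thread that the built-in wins.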

#7 vNistelrooy   Members   -  Reputation: 140


Posted 29 May 2005 - 10:50 PM

I'm quite sure nVidia's NV4x does the standard trigonometric functions in one cycle. Precomputed texture table lookups are slower, waste bandwidth, and use extra memory.
"C lets you shoot yourself in the foot rather easily. C++ allows you to reuse the bullet!"

#8 Nik02   Crossbones+   -  Reputation: 2818


Posted 29 May 2005 - 10:50 PM

Quote:
Original post by funkeejeffounet

On the Nvidia site, you cannot find anything about such things, so if they don't brag about it, it seems pretty obvious that they do not support such functions in hardware...

Anyway, their GPU programming guide says we should prefer the built-in functions for simple instructions (log, exp...) over a precalculated table (texture).
I've tested both approaches, and a precalculated texture isn't worth bothering with (even for atan...), so I guess that even if they do not support it in hardware, these functions are still fast.



Quoting from the same chapter of the nVidia GPU programming guide:

"3.5.3.3. The sincos() Function
Despite the preceding advice, the GeForce FX family and later GPUs support some complex mathematical functions natively in hardware. One such function that is convenient is the sincos function, which allows you to simultaneously calculate the sine and cosine of a value."


and

"Any time you can encode a complex sequence of arithmetic operations in a texture, you can improve performance. Keep in mind that some complex functions, such as log and exp, are micro-instructions in ps_2_0 and higher profiles, and therefore don’t need to be encoded in textures for optimal performance."

Niko Suni
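The sincos function the guide describes (one operation, both results) has a well-known mathematical analogue: Euler's identity, e^(ix) = cos x + i·sin x, yields both values from a single complex exponential. The sketch below is a CPU-side Python illustration of that idea, not the GPU instruction itself.

```python
import cmath
import math

def sincos(x):
    """Return (sin(x), cos(x)) from a single complex exponential,
    mirroring the idea of the GPU sincos instruction: one evaluation
    producing both results, via e^(ix) = cos x + i*sin x."""
    z = cmath.exp(1j * x)
    return z.imag, z.real

# sin(30 deg) = 0.5, cos(30 deg) = sqrt(3)/2
s, c = sincos(math.pi / 6)
```

In an actual shader you would simply call the HLSL/Cg `sincos` intrinsic and let the compiler map it to hardware; the point here is only that computing the pair jointly is cheaper than two independent evaluations.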


#9 vNistelrooy   Members   -  Reputation: 140


Posted 30 May 2005 - 12:15 AM

Quote:
Original post by funkeejeffounet
On the Nvidia site, you cannot find anything about such things, so if they don't brag about it, it seems pretty obvious that they do not support such functions in hardware...
[...]
I've tested both approaches, and a precalculated texture isn't worth bothering with (even for atan...), so I guess that even if they do not support it in hardware, these functions are still fast.


Everything you do in a shader is (should be) supported in hardware. Do you want the drivers to emulate a sin function in a pixel shader? And you think it'd still be fast?!
"C lets you shoot yourself in the foot rather easily. C++ allows you to reuse the bullet!"

#10 phantom   Moderators   -  Reputation: 7258


Posted 30 May 2005 - 02:55 AM

Yes, it's certainly going to be done in hardware, and has been for some time, so it will be fast; it just might not be a single cycle. However, it is trivial, so I wouldn't worry about it. The NV40 is certainly fast enough that it won't matter if it takes 1 cycle or 2, and the compiler should be able to rearrange stuff so the cost is hidden.

As for the lack of instruction cycle docs, what you have to realise is that even the ARB_vertex/fragment_program interfaces, while they might look like assembly, do not have a 1:1 mapping to GPU instructions, and with pipelining and the ability of the compiler to rearrange things, cycle counts become somewhat redundant.
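The pipelining point can be made concrete with a toy model (a hypothetical sketch, not a description of any real GPU): if a pipelined unit can issue one operation per cycle regardless of each operation's latency, then a long stream of independent operations finishes in roughly `num_ops + latency - 1` cycles, so per-instruction latency barely affects total time.

```python
def pipelined_cycles(num_ops, latency, issue_rate=1):
    """Toy model of a pipelined execution unit: a new op is issued every
    issue_rate cycles regardless of latency, so total time is dominated
    by throughput, not by per-instruction latency."""
    if num_ops == 0:
        return 0
    # Last op is issued at cycle (num_ops - 1) * issue_rate and
    # completes `latency` cycles later.
    return (num_ops - 1) * issue_rate + latency

# For a long stream of independent ops, a 1-cycle vs 4-cycle sin
# barely matters: 1000 ops take 1000 vs 1003 cycles.
fast = pipelined_cycles(1000, 1)
slow = pipelined_cycles(1000, 4)
```

This is why, as phantom says, a published cycle count per instruction would tell you little about real shader throughput.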

#11 tomlu709   Members   -  Reputation: 156


Posted 30 May 2005 - 08:03 AM

They are single cycle on the 6800 according to NVIDIA. Google for it. Even if they aren't, they'd be faster than texture lookups with all the hassle involved.

#12 vNistelrooy   Members   -  Reputation: 140


Posted 30 May 2005 - 08:10 AM

Quote:
Original post by _the_phantom_
As for the lack of instruction cycle docs, what you have to realise is that even the ARB_vertex/fragment_program interfaces, while they might look like assembly, do not have a 1:1 mapping to GPU instructions, and with pipelining and the ability of the compiler to rearrange things, cycle counts become somewhat redundant.


As well as doing multiple things per cycle. For example, a texture lookup plus a multiply-add instruction would (hopefully) be done in parallel in one cycle.
While it is important to understand things like that, concentrate on developing the shader and optimizing the bottlenecks in the shaders first.
"C lets you shoot yourself in the foot rather easily. C++ allows you to reuse the bullet!"

#13 zedzeek   Members   -  Reputation: 528


Posted 30 May 2005 - 08:54 AM

The best way to check the speed of this stuff is simply to create a shader of what you wanna test, draw it to, say, a 1000x1000-sized window, and see how fast it runs on your hardware. If anything is done in software, you'll soon tell.
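zedzeek's approach, as a shape, is just a throughput benchmark: evaluate the function once per pixel over a big grid and compare against a cheap baseline. The real test would render a full-screen shader pass; the sketch below is only a hypothetical CPU-side stand-in in Python (the grid size and scaling factor are arbitrary choices).

```python
import math
import time

def benchmark(fn, size=1000):
    """Time applying fn once per 'pixel' of a size x size grid,
    a CPU-side stand-in for timing a full-screen shader pass."""
    start = time.perf_counter()
    acc = 0.0                      # accumulate so the loop isn't optimized away
    for i in range(size * size):
        acc += fn(i * 1e-4)
    return time.perf_counter() - start, acc

t_sin, _ = benchmark(math.sin)
t_base, _ = benchmark(lambda x: x)   # baseline: no transcendental work
# If t_sin dwarfs t_base, the function is expensive (or emulated in
# software); comparable times suggest it is cheap relative to the
# surrounding per-pixel work.
```

As the next reply points out, a coarse test like this can catch a software fallback, but it cannot resolve a 1-cycle vs. 2-cycle difference.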

#14 Anonymous Poster_Anonymous Poster_*   Guests   -  Reputation:


Posted 30 May 2005 - 09:00 AM

How will that allow you to properly differentiate between a 1 and 2 cycle cost?

oh, it won't.

#15 vNistelrooy   Members   -  Reputation: 140


Posted 30 May 2005 - 12:19 PM

Quote:
Original post by Anonymous Poster
How will that allow you to properly differentiate between a 1 and 2 cycle cost?

oh, it won't.


Oh no, your sin functions take 2 friggin' cycles. Come on, why should you care? Would you use different code paths for cards that need an extra clock cycle? It is the shader performance that counts, not the extra clock cycle...
"C lets you shoot yourself in the foot rather easily. C++ allows you to reuse the bullet!"



