Shader limitations and best practices

11 comments, last by Lightness1024 11 years, 8 months ago

because pow will use the Special Function Unit that is idling in your shader


While I'm not questioning your advice, as you generally want to use intrinsics as much as you can, there is no guarantee about which unit pow will run on, because not all hardware has an SFU. AMD's latest GPU architecture, for example, doesn't have a dedicated SFU; all the vector units can do SFU work as required.
Regarding your cone stepping loop: you'd have to check the asm output to see whether it is being unrolled, but if it is, the cause could be that cone_steps is defined as a const int = 15. The compiler then knows the maximum number of iterations and will unroll the whole loop if there are enough instruction slots. Try leaving the value undefined in the shader and setting it from the CPU; the compiler then doesn't know the iteration count and will leave this bit as a loop. Whether that will help performance is unknown, but it will reduce the instruction count by a ton.
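To make the two options concrete, here is a minimal HLSL sketch. It is not the original shader: `height_map`, `CONE_STEPS_CONST`, `accumulate_const` and `accumulate_uniform` are made-up names, and the loop body is a placeholder rather than real cone stepping. Only `cone_steps` comes from the thread.

```hlsl
// All names except cone_steps are hypothetical placeholders.
sampler2D height_map;                 // assumed height/relief texture

// Variant A: the trip count is a compile-time constant.
// The compiler knows there are exactly 15 iterations and may unroll them all.
static const int CONE_STEPS_CONST = 15;

float accumulate_const(float2 uv, float2 step_uv)
{
    float sum = 0.0;
    for (int i = 0; i < CONE_STEPS_CONST; ++i)
    {
        // tex2Dlod takes an explicit mip level, so no gradients are needed inside the loop.
        sum += tex2Dlod(height_map, float4(uv + step_uv * i, 0.0, 0.0)).r;
    }
    return sum;
}

// Variant B: the trip count is a uniform set by the application (CPU side).
// The compiler cannot know its value, so it has no fixed count to unroll against.
int cone_steps;

float accumulate_uniform(float2 uv, float2 step_uv)
{
    float sum = 0.0;
    for (int i = 0; i < cone_steps; ++i)
    {
        sum += tex2Dlod(height_map, float4(uv + step_uv * i, 0.0, 0.0)).r;
    }
    return sum;
}
```

HLSL also lets you put an `[unroll]` or `[loop]` attribute in front of the `for` statement to ask for one behaviour or the other explicitly. What the compiler actually emits still depends on the target profile, so checking the asm output remains the only way to be sure.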

[quote name='Lightness1024' timestamp='1345759361' post='4972777']
because pow will use the Special Function Unit that is idling in your shader


While I'm not questioning your advice, as you generally want to use intrinsics as much as you can, there is no guarantee about which unit pow will run on, because not all hardware has an SFU. AMD's latest GPU architecture, for example, doesn't have a dedicated SFU; all the vector units can do SFU work as required.
[/quote]
Ok, I didn't know that.
It also depends on the shader model profile the compiler is targeting. It is possible that when targeting SM3, pow will be expanded into a Taylor series (which was definitely the case for e.g. sin(x) in SM1; in SM2 the compiler replaces sin(x) with the sincos asm intrinsic).
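For readers who haven't seen one written out, this is roughly what a Taylor expansion of sin(x) looks like as shader code. It is a hand-written sketch, not what any particular compiler emits: `sin_taylor` is a hypothetical helper, and the four-term polynomial is only reasonable for x roughly in [-pi/2, pi/2].

```hlsl
// Hypothetical helper, not compiler output.
// Four-term Taylor polynomial of sin(x) around 0:
//   x - x^3/3! + x^5/5! - x^7/7!
// written in Horner form so it maps onto a short chain of multiply-adds.
float sin_taylor(float x)
{
    float x2 = x * x;
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))));
}
```

The point is that an expansion like this costs several arithmetic instructions on the regular ALUs, whereas a hardware intrinsic can be a single instruction on targets that provide one.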

This topic is closed to new replies.
