HLSL fast trig functions


Thanks for the insightful and detailed responses, everybody. Do you think future versions of HLSL might support this, though? Even with the differences between AMD and Nvidia architectures, I would think it wouldn't be too hard to add an assembly instruction that maps to the fast trig functions on Nvidia hardware and falls back to the normal trig instructions on AMD hardware. Doesn't the JIT compiler know what hardware is being used? I don't think the compiler should use the fast trig functions without being explicitly told to do so, because accuracy may be important for some applications; I just don't understand why there isn't an assembly instruction for this. The fact that a feature isn't supported by both vendors shouldn't mean HLSL can't expose it at all. There just needs to be an instruction that uses the fast trig operations when the hardware supports them. Seems simple to me... but then, I'm no expert.


I couldn't really answer those questions for sure. I don't have any insider info on the process Microsoft uses to decide what goes into the specification, or what criteria are used when deciding whether to add an instruction.

The JIT compiler definitely knows what hardware is being used...it has to, since its job is to produce microcode for that specific hardware. In general it can't make assumptions about the required precision or accuracy of a calculation, so I'm pretty sure that in most cases it won't try to swap out a sin or cos for an approximate version. However, vendors will definitely tweak their drivers to optimize for specific high-profile games, so that they can get higher performance in benchmarks. I wouldn't be surprised if those optimizations included shader tweaks that adjust precision or accuracy.

It might be worth experimenting with half floats, if they provide enough precision. It's possible the JIT will pick different instructions based on what types are involved, but I've not tried it.
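If you do try that, the Direct3D 11.1 route would be the minimum-precision types rather than a true half. A minimal sketch, assuming min16float is available (the function and names here are made up purely for illustration):

    // Sketch only: assumes D3D11.1 minimum-precision types. The driver
    // may evaluate min16float at 16-bit precision where the hardware
    // supports it, or silently fall back to full 32-bit floats.
    min16float4 ApplyWave(min16float4 colour, min16float t)
    {
        // The JIT may select different (possibly faster) instructions
        // for reduced-precision operands -- worth profiling either way.
        return colour * (min16float(0.5) + min16float(0.5) * sin(t));
    }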

If you need faster trig functions, you could try approximating them with a texture lookup - you can use texture wrapping to handle the repetition, so it's only a couple of instructions. A texture could also get you sin(x) and cos(x) in a single lookup.
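A rough sketch of that idea, assuming a 1D lookup texture whose red/green channels store sin and cos over one full period, sampled with a wrap-mode sampler (the resource names here are invented):

    Texture1D<float2> gTrigLUT : register(t0); // RG = (sin, cos) over one period
    SamplerState      gWrapLUT : register(s0); // created with ADDRESS_WRAP

    static const float TWO_PI = 6.28318530718f;

    float2 SinCosLookup(float x)
    {
        // Wrap addressing repeats the table for free, so no frac()/fmod()
        // is needed to reduce the angle. Accuracy depends on the table
        // size and on linear filtering between samples.
        return gTrigLUT.SampleLevel(gWrapLUT, x / TWO_PI, 0.0f);
    }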

To find out what the GPU's JIT compiler actually produces, there are vendor tools that will show you the final hardware ISA (AMD's GPU ShaderAnalyzer is one example).

Modern AMD and Nvidia GPUs don't have any ALU support for half-precision floating point. In fact, Microsoft removed support for half precision from HLSL, and then recently added it back for Direct3D 11.1 (as the minimum-precision types, so that mobile GPUs could be supported).

All of this raises the question: just how heavily are you using these functions that you feel the need for faster versions of them? Have you actually benchmarked and determined that these particular functions are a bottleneck for you, or is this a relatively vague "faster versions would be nice" kind of thing?

Personally, I've done full-screen post-processing effects with two sins per pixel, and my own benchmarks showed ROP cost to be so dominant that it would take some pretty damn heavy shaders for the ALU work to even register by comparison. In summary, I doubt fast versions are even needed outside of some extreme corner cases.
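For illustration, that kind of workload looks roughly like this (a hypothetical distortion pass; the resource names and constants are made up):

    Texture2D    gScene  : register(t0);
    SamplerState gLinear : register(s0);
    cbuffer PerFrame : register(b0) { float gTime; };

    float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
    {
        // Two transcendental ops per pixel -- trivial ALU work next to
        // the bandwidth/ROP cost of touching every pixel on screen.
        float2 offset = float2(sin(uv.y * 40.0f + gTime),
                               sin(uv.x * 40.0f + gTime)) * 0.005f;
        return gScene.Sample(gLinear, uv + offset);
    }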

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

