HLSL fast trig functions


Thanks for the insightful and detailed responses, everybody. Do you think future versions of HLSL might support this, though? Even with the differences between AMD and Nvidia architectures, I would think it wouldn't be too hard to add an assembly instruction that maps to the fast trig functions on Nvidia hardware and falls back to the normal trig instructions on AMD hardware. Doesn't the JIT compiler know what hardware is being used? I don't think the compiler should use the fast trig functions without being explicitly told to do so, because accuracy may be important for some applications; I just don't understand why there isn't an assembly instruction for this. The fact that a feature isn't supported by both vendors shouldn't mean HLSL can't expose it at all. There just needs to be an instruction that uses the fast trig operations when the hardware supports them. Seems simple to me... but then, I'm no expert.


I couldn't really answer those questions for sure. I don't have any insider info on the process Microsoft uses to decide what goes into the specification, or what criteria are used when deciding whether to add an instruction.

The JIT compiler definitely knows what hardware is being used...it has to, since its job is to produce microcode for that specific hardware. In general it can't make assumptions about the required precision or accuracy of a calculation, so I'm pretty sure that in most cases it won't try to swap out a sin or cos for an approximate version. However, vendors will definitely tweak their drivers to optimize for specific high-profile games, so that they can get higher performance in benchmarks. I wouldn't be surprised if those optimizations included shader tweaks that adjust precision or accuracy.

It might be worth experimenting with half floats, if they provide enough precision. It's possible the JIT will pick different instructions based on what types are involved, but I've not tried it.
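If you do try that, the Direct3D 11.1 route would be the minimum-precision types rather than a true half. A minimal sketch, assuming min16float is available (the function and names here are made up purely for illustration):

    // Sketch only: assumes D3D11.1 minimum-precision types. The driver
    // may evaluate min16float at 16-bit precision where the hardware
    // supports it, or silently fall back to full 32-bit floats.
    min16float4 ApplyWave(min16float4 colour, min16float t)
    {
        // The JIT may select different (possibly faster) instructions
        // for reduced-precision operands -- worth profiling either way.
        return colour * (min16float(0.5) + min16float(0.5) * sin(t));
    }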

If you need faster trig functions, you could try approximating them with a texture lookup - you can use texture wrapping to handle the repetition, so it's only a couple of instructions. A texture could also get you sin(x) and cos(x) in a single lookup.
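A rough sketch of that idea, assuming a 1D lookup texture whose red/green channels store sin and cos over one full period, sampled with a wrap-mode sampler (the resource names here are invented):

    Texture1D<float2> gTrigLUT : register(t0); // RG = (sin, cos) over one period
    SamplerState      gWrapLUT : register(s0); // created with ADDRESS_WRAP

    static const float TWO_PI = 6.28318530718f;

    float2 SinCosLookup(float x)
    {
        // Wrap addressing repeats the table for free, so no frac()/fmod()
        // is needed to reduce the angle. Accuracy depends on the table
        // size and on linear filtering between samples.
        return gTrigLUT.SampleLevel(gWrapLUT, x / TWO_PI, 0.0f);
    }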

To find out what the GPU's JIT compiler actually produces, there are vendor tools that will show you the final hardware ISA (AMD's GPU ShaderAnalyzer is one example).

Modern AMD and Nvidia GPUs don't have any ALU support for half-precision floating point. In fact, Microsoft removed support for half precision from HLSL, and then recently added it back for Direct3D 11.1 (as the minimum-precision types, so that mobile GPUs could be supported).

All of this raises the question: just how heavily are you using these functions that you feel the need for faster versions of them? Have you actually benchmarked and determined that these particular functions are a bottleneck for you, or is this a relatively vague "faster versions would be nice" kind of thing?

Personally, I've done full-screen post-processing effects with two sins per pixel, and my own benchmarks showed ROP cost to be so dominant that it would take some pretty damn heavy shaders for the ALU work to even register by comparison. In summary, I doubt fast versions are even needed outside of some extreme corner cases.
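For illustration, that kind of workload looks roughly like this (a hypothetical distortion pass; the resource names and constants are made up):

    Texture2D    gScene  : register(t0);
    SamplerState gLinear : register(s0);
    cbuffer PerFrame : register(b0) { float gTime; };

    float4 PSMain(float4 pos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
    {
        // Two transcendental ops per pixel -- trivial ALU work next to
        // the bandwidth/ROP cost of touching every pixel on screen.
        float2 offset = float2(sin(uv.y * 40.0f + gTime),
                               sin(uv.x * 40.0f + gTime)) * 0.005f;
        return gScene.Sample(gLinear, uv + offset);
    }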

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

