Quote:Original post by samoth
Quote:To those of you saying never to use such constructs, you are missing the point. Sometimes you *NEED* a little more performance, in which case you do stuff like this.
Sorry, but I have to object. I was saying that hacks are ok, if they are needed. Here they aren't needed.
While it is true that your bit trick runs about 6.8 times faster than the standard function when fully optimised (counting clock ticks on an AMD64 single core system using gcc 4.2), providing -msse -mfpmath=sse on the command line causes the compiler to generate code that runs almost twice as fast, with full precision, without punning pointers, and without possible gotchas.
You assume SSE is available and your compiler knows how to use it, which is an even bigger assumption across platforms. The above code is quite portable, and works on almost any platform with IEEE 754 floats, and even a few that don't (but have nearly conformant floats). So ask yourself - between SSE and the above code, which is really more portable when you do *need* the speed?
The trick posted was made popular by the Quake source code, and there was no SSE available at that time. Often in a game you don't need "full precision", since the result may only be used for something like computing normals for lighting, and who cares if those are off by a few percent.
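For anyone who hasn't seen it, here is a rough sketch of that kind of trick. I'm assuming the code under discussion is the well-known inverse square root from the Quake III source; the names fast_rsqrt and normalize3 are mine, not from the original post, and I copy the bits with memcpy instead of casting pointers, which sidesteps the punning complaint. It assumes 32-bit IEEE 754 floats and a positive, normal input.

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x). Assumes x is a positive, normal
   32-bit IEEE 754 float (hypothetical helper name). */
static float fast_rsqrt(float x)
{
    uint32_t i;
    float y;

    memcpy(&i, &x, sizeof i);       /* reinterpret the float's bits as an integer */
    i = 0x5f3759dfu - (i >> 1);     /* magic constant gives a rough first guess */
    memcpy(&y, &i, sizeof y);       /* reinterpret the bits back as a float */

    /* One Newton-Raphson step to tighten the guess. */
    return y * (1.5f - 0.5f * x * y * y);
}

/* Scale a 3-component vector to roughly unit length, e.g. a lighting normal.
   Assumes the vector is not zero (hypothetical helper name). */
static void normalize3(float v[3])
{
    float inv_len = fast_rsqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);

    v[0] *= inv_len;
    v[1] *= inv_len;
    v[2] *= inv_len;
}
```

With a single Newton step the result is typically within a fraction of a percent of the exact value, which is fine for lighting normals and useless if you actually need full precision.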
If you're writing a per-pixel rasterizer, for example, for a handheld on, say, an ARM, where you don't have SSE and the screen is pretty small, you need speed, and the above code is *exactly* what you want. You don't have the luxury of SSE, pixel shaders, GPUs, and other numerical tricks.
In short, any good programmer should understand *each* of the approaches and choose the one that does what they need. Which gets back to the fact that the original poster's code may be needed.