C++: Rotating and scaling a bitmask - slow!

Started by
13 comments, last by oliii 17 years, 1 month ago
The only use of fixed point in this case would be for additions, and as far as I can recall those are about as fast as floating point additions. The main thing is avoiding that conversion from a float to an int, which would be done by a bit shift operation. I also googled some ways to do a faster float to int conversion, but what could possibly compete with a shift?
A fixed point add is just an integer add; on any CPU younger than 5 years or so there's not really an appreciable difference between an integer add and a floating point add.

Eliminating the float/int conversion would certainly add some speed, although it isn't clear how much. Obviously the original performance problem was algorithmic in nature, not low-level. Any further gains would likely have to come from careful profiling and tuning. In any case IMHO the small gains of eliminating the conversion are probably not worth the cost in code complexity for moving to fixed-point.

Removing inner conditionals from the loop, optimizing the stepping algorithm (Bresenham is not the fastest known linear step algorithm although I can't recall the names of any alternatives at the moment), and possibly exploiting SIMD vectorization would probably all give better payoff. (In particular, since this is a repeated application of a linear step through a rectangular space, it should be possible to do, say, 4 rows at a time via SIMD.)

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Quote:Original post by ApochPiQ
The practical use of fixed-point math in consumer-PC hardware ended many years ago.


Sorry I have to disagree with this, to a point, at least for a P4 CPU (is that still modern?).

I recently wrote some image manipulation functions that originally used floating point and found them way too slow. So I did a comparative test using fixed point and discovered it performed over twice as fast.

So in general my experience has shown that fixed point is faster than floating point. However I then switched much of the function over to simple lookup tables and that blew both fixed point and floating point out of the water, but that's a completely different optimisation technique ;)

Having said that, I'd expect SIMD to be as fast if not faster than fixed point, but I didn't bother investigating it at the time since the lookup table was clearly the most efficient method at the algorithm level, albeit at a slight cost in memory.

Ultimately I feel it's always good to test different approaches out if you have the time. If you can write and support SIMD then it is probably going to be as good if not better than fixed point, but fixed point is likely to always be faster than floating point. However, there may be alternative optimisations at the algorithm level that can easily beat any gains you get from switching from floating point to fixed point or SIMD, and those should also be profiled.
A few things.

First of all, without sharing both the initial and rewritten code, your anecdotal evidence is less than worthless. For all we know you "accidentally" introduced an algorithmic speedup during the float/fixed changeover and that's the source of your results. There's also a fairly well-known performance hit to using "magic" values (INF, NAN, and so on) on the P4 processor series in particular, so if your code involved any of that, look no further for your performance explanation.

Secondly, anecdotal evidence without even profiling stats to back it up is less than worthless anyways since we're talking about highly situational micro-optimizations here.

Third, it seems to me like you don't know what SIMD means. SIMD stands for Single Instruction Multiple Data. Assuming your algorithm can be reasonably implemented to handle it, SIMD allows you to do, say, four calculations on different data at one time - in the same time it would have taken to do a single calculation. There are of course some small bits of overhead and not all code can be vectorized, but in general, well-written SIMD code will absolutely stomp all over non-vectorized fixed point.

Finally, it's an inescapable fact that FP pipelines in modern CPUs are just as fast as the integer pipelines. Both Intel and AMD's official optimization references show floating-point operations running neck-and-neck with integer operations in general; it's even possible for FP throughput to be higher than integer throughput if the code is pipelined right (the compiler makes a really huge difference here). The bottom line is that unless you're doing a huge amount of integer/real swapping (i.e. introducing float conversions), chances are just doing it in floating point will be more than fast enough.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Quote:
float r = sqrt(dx*dx + dy*dy); // Distance/radius from centre to this pixel
float a = atan2(dy, dx); // Angle from centre to this pixel
// Rotate pixel around centre
float fSrcX = cX + cos(a + angle) * r;
float fSrcY = cY + sin(a + angle) * r;


in an inner loop? 40,000 times? I rest my case.

Transform matrices. Yeah... Or some funky hacky 'Doom style' bitmap manipulation.

Everything is better with Metal.

