Quote: Original post by outRider
A compiler that does inter-procedural analysis will be able to propagate constants defined in other compilation units, so if size is a constant somewhere in your program you will get the loop unrolling and branch collapsing benefits everywhere it is used. You might check that your compiler does this first.
Unfortunately, C++ compilers generally suck at this. (And of course, C++ itself doesn't exactly make it easy.)
Lode:
For further optimization, I'd try something like this in the loop body:
// Probably no biggie, but compilers tend to like temporaries better than arrays,
// because there's no risk of aliasing
uint32 a2 = a[i];
uint32 b2 = b[i];
uint32 out2 = a2 + b2 + carry;
// A conditional move is a lot easier for the compiler (and CPU) to reorder and
// optimize than a branch
carry = ((!carry && out2 < b2) || (carry && out2 <= b2)) ? 1U : 0U;
out[i] = out2;
Might not make a difference in your case, but your version relies on the compiler to 1) be able to eliminate the branches (when I tried something similar a year or two ago, GCC had problems with that, at least), and 2) perform enough alias analysis to optimize away all the array accesses.
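To make the suggestion concrete, here is a rough sketch of what the whole loop might look like with the temporaries and the branch-free carry. The function name add_words, the little-endian word order (least significant word first), the index i, and the use of std::uint32_t in place of uint32 are all my assumptions, not something taken from the original code, so adapt as needed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds two multi-word unsigned integers stored least
// significant word first, writing the result to `out`.
// Returns the carry out of the top word (0 or 1).
// Assumes unsigned int is at least 32 bits wide, so the additions wrap
// modulo 2^32 rather than being promoted to a wider signed type.
inline std::uint32_t add_words(const std::uint32_t* a, const std::uint32_t* b,
                               std::uint32_t* out, std::size_t size)
{
    std::uint32_t carry = 0;
    for (std::size_t i = 0; i < size; ++i) {
        // Copy to temporaries first, so the compiler does not have to
        // prove that `out` does not alias `a` or `b`.
        const std::uint32_t a2 = a[i];
        const std::uint32_t b2 = b[i];
        const std::uint32_t out2 = a2 + b2 + carry;

        // Overflow test: without a carry-in the sum wrapped iff out2 < b2;
        // with a carry-in it wrapped iff out2 <= b2.
        carry = ((!carry && out2 < b2) || (carry && out2 <= b2)) ? 1U : 0U;

        out[i] = out2;
    }
    return carry;
}
```

Because each word is read into a temporary before anything is stored, this should also work in place (out pointing at the same array as a or b). Whether the compiler really emits a conditional move or setcc instead of a branch is worth checking in the generated assembly.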