Quote: Original post by staticVoid2
also any reason why it takes so long in std::vector?
I'll bet that you're either compiling in debug mode or with runtime iterator checking on.
By the way, forcing the compiler to generate a particular sequence of instructions or loop form is usually quite hard. But anyway, the following loop form is usually the fastest in assembly language and is what the AMD (and Intel, IIRC) optimization guide recommends:
void iterate_over(const int *array, size_t len) {
    ptrdiff_t i;
    if (!len) {
        return;
    }
    array += len;          // point one past the end
    i = -(ptrdiff_t)len;   // count up from -len toward zero
    do {
        process(array + i); // array + i walks from the first element to the last
    } while (++i);          // loop ends when i reaches zero
}
Of course, other factors such as cache behavior and, you know, what's actually inside the loop usually make far more of a difference. And if the loop overhead really is a problem (it's usually virtually free on x86, since little beyond painstakingly hand-tuned assembly code can hope to use all of the processor's execution resources), then you can always unroll the loop.
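For illustration, here's a sketch of what manual 4x unrolling might look like; process and the running total are hypothetical per-element work, not something from the original post:

```cpp
#include <cstddef>

// Hypothetical per-element work: accumulate into a running total.
long total = 0;
void process(const int *p) { total += *p; }

// Handle four elements per iteration, then mop up the 0-3 leftovers.
void iterate_unrolled(const int *array, std::size_t len) {
    std::size_t i = 0;
    for (; i + 4 <= len; i += 4) {
        process(array + i);
        process(array + i + 1);
        process(array + i + 2);
        process(array + i + 3);
    }
    for (; i < len; ++i) { // remainder loop
        process(array + i);
    }
}
```

Whether this actually helps depends on the body of the loop; modern compilers will often do this for you at -O2/-O3.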
The ideal form also depends on register pressure, the size of the array elements (the indexed method is only suitable for power-of-two element sizes between one and eight bytes on x86/x64), and so on and so forth. Plus, other architectures prefer very different code. It's a mess, really, so I'd just stick to the most natural form and then rewrite in assembly language any code you truly need to squeeze every last cycle out of.
The one easy and universal fix I can think of is that 32-bit indices seem to generate crappy code on x64, at least in GCC, so stick to size_t and the like.
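To make that concrete, here are the two loop shapes side by side (the summing body is just a placeholder): with a 32-bit index on x64 the compiler may have to widen the index to pointer width before each address calculation, whereas a size_t index already lives in a full-width register:

```cpp
#include <cstddef>

// 32-bit index: on x64 the compiler may need an extra
// zero-extension of 'i' before it can form the address a + i.
long sum_u32(const int *a, unsigned len) {
    long s = 0;
    for (unsigned i = 0; i < len; ++i) s += a[i];
    return s;
}

// Pointer-width index: 'i' can feed straight into the
// addressing mode with no widening step.
long sum_size_t(const int *a, std::size_t len) {
    long s = 0;
    for (std::size_t i = 0; i < len; ++i) s += a[i];
    return s;
}
```

Both compute the same result, of course; the difference only shows up in the generated assembly.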
edit: Whether the loop condition is re-evaluated every iteration depends on whether the compiler can prove that the array's size and origin won't change. If they're simply passed to a function as a pointer and an integer, you're probably fine. But if you're handed a whole std::vector by reference and call end() every iteration, you'd probably be better off caching it. The same rule of thumb applies even to a local vector if you ever pass its address to another function.
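A sketch of the cached form, in the C++03-era iterator style of the time (the summing body is just a placeholder):

```cpp
#include <vector>

// Naive form: 'it != v.end()' may call end() every iteration if the
// compiler can't prove the vector is left unmodified by the loop body.
// Cached form below: evaluate end() exactly once, up front.
long sum(const std::vector<int> &v) {
    long s = 0;
    for (std::vector<int>::const_iterator it = v.begin(), e = v.end();
         it != e; ++it) {
        s += *it;
    }
    return s;
}
```

The caveat: caching is only valid if the loop body never inserts or erases, since that can invalidate the cached iterator.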