for loops and overhead?

Started by
22 comments, last by Gammastrahler 21 years, 10 months ago
hi, i´ve programmed a vector template class so that i can reuse my code for 2D, 3D or 4D vectors of different data types. therefore, most of the methods must use a for loop like this:
  
template <unsigned int N, class T = float>
float CVector<N, T>::Dot(const CVector &pVec) const
{
   float result = 0;
   for (unsigned int i = 0; i < N; i++) 
      result += v[i] * pVec.v[i];
   return result;
}
  
my question is: do for loops have any significant overheads / performance drawbacks? thanks Gammastrahler
Advertisement
For loops are compiled into conditional jumps which are very fast. I see nothing unneccessary in your function, and every operation you are doing will be fast.

Using the benefits of reusability via templates and writing lean and mean code. That''s why i like writing good C code with C++''s encapsulation.
*sniff* And that''s what game programming should be all about.

Oh, and for further speed questions I advise you learn how to use your compiler/IDE''s profiler. Its a really powerful tool that tells you what areas you need to redesign/optimize, instead of searching line by line for redundancy.

I''ve always lived by the philosophy of write everything lean and clean, and write your graphics/networking code fast and mean.

- Kevin "BaShildy" King
Game Programmer: DigiPen
www.mpogd.com
- Kevin "BaShildy" KingGame Programmer: DigiPenwww.mpogd.com
In many cases a for loop is slower than avoiding it altogether. Although it contains only a conditional branch machine code instruction such as branch still takes time.

However, some compilers support an optimization called "loop unrolling". The compiler will attempt to detect whether it can know how many time the loop with execute and if the number is small it will eliminate the loop completely by replacing it with the contents directly. If N was 3 you code would be replaced by the compiler with:

result += v[0] * pVec.v[0];
result += v[1] * pVec.v[1];
result += v[2] * pVec.v[2];

Whether this happens in your case you can check by looking at the resulting assembler code.

Some compilers (such as Intel C++ or Codeplay VectorC) detect such constructions and can compile them to MMX/SSE/3DNow instructions.

Also, if you want to help your compiler to produce faster code (whether it actually helps, depends on the compiler) you make sure it tests against zero instead of N like this:

for (unsigned int i = N-1; i >= 0; --i)

BTW, I think there is a bug in your program, result shouldn't be declared as float, but as T and the same goes for the return type.

[edited by - felonius on June 1, 2002 7:23:58 AM]
Jacob Marner, M.Sc.Console Programmer, Deadline Games
quote:
Also, if you want to help your compiler to produce faster code (whether it actually helps, depends on the compiler) you make sure it tests against zero instead of N like this:


Speaking from an x86 point of view, why would this be faster? You have to get the flags set somehow to jump off them. The traditional way you would do:

cmp ebx, ecx
jge
...

And for the one you outlined you would do perhaps:

add ebx, 0
jl
...

Why would the latter be any faster than the former?
If you would like the compiler to unroll the loop for you, look into template meta programming. Of course if you are worried about performance, you might want to inline it also.
"conditional jumps which are very fast"

aren''t they actually among the slowest of instructions? Afterall, a mispredict creates a bubble in the pipeline. Loops that are only 2 to 4 iterations long are the worst too. A two iteration loop would be mispredicted half the time, which would be devastating right?
In this specific case, where N will probably be either 2, 3, or 4, isn''t it faster to unroll the loops?

Also, in a function like "Dot" that will be called very often, wouldn''t it be better to create "Result" once as some dummy class member?


Two ways to avoid loops in C++:
Template Specialization:

  template <unsigned int N>inline unsigned int Factorial(void) {	unsigned int sum = 1;	for(unsigned int a=N; a>1; --a)		sum += a;	return sum;}// A specialization for if the factorial of 3 is neededtemplate <>inline unsigned int Factorial<3>(void) {	return 6;}  

Template Metaprogramming:

  template <unsigned int N>struct Factorial {	static inline unsigned int Get(void) {		return Factorial<N-1>::Get()+N;	}};template <>struct Factorial <1> {	static inline unsigned int Get(void) {		return 1;	}};template <>struct Factorial <0> {	static inline unsigned int Get(void) {		return 0;	}};  

I once wrote a bubble sort using metaprogramming, it''s a lot of fun to play with. I''m sure you can figure out how to add the type template parameter back into that (if your compiler happens to support partial specialization, that is).

>> Why would the latter be any faster than the former?

I don''t know about the x86 architecture but on many RISC machines the latter is faster because you just need one less instruction. On, for instance, the Digital Alpha a jump comparison x<3 is this (t0 is x):

ldiq t1, 3 // This is only needed in first iteration.
...
...
cmplt t0, t1, t2
bne t2, mylabel

while the x>=0 can be expressed just as

bge t0, mylabel

and this is clearly faster. Since x86 internally also have to subtract to two values before checking the sign I guess it will be slower there too. Remember on the x86, some instructions are slower than others and comparison between two integer is more complex than comparing a single number against zero (here only the sign bit needs to be tested).
Jacob Marner, M.Sc.Console Programmer, Deadline Games
quote:
return Factorial<N-1>::Get()+N;


I think you meant

return Factorial<N-1>::Get()*N;

but anyways. So will the compiler essentially pre-compute the factorial function so that it is not done at run-time if given constant input? What happens with


  int x;cin >> x;cout << Factorial<x>::Get() << endl;cout << Factorial<7>::Get() << endl;  


I don''t see how the first call could be expanded, but the second could.

This topic is closed to new replies.

Advertisement