Jump to content
  • Advertisement

Archived

This topic is now archived and is closed to further replies.

Gammastrahler

for loops and overhead?

This topic is 5919 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

hi, i´ve programmed a vector template class so that i can reuse my code for 2D, 3D or 4D vectors of different data types. therefore, most of the methods must use a for loop like this:
  
template <unsigned int N, class T = float>
float CVector<N, T>::Dot(const CVector &pVec) const
{
   float result = 0;
   for (unsigned int i = 0; i < N; i++) 
      result += v[i] * pVec.v[i];
   return result;
}
  
my question is: do for loops have any significant overheads / performance drawbacks? thanks Gammastrahler

Share this post


Link to post
Share on other sites
Advertisement
For loops are compiled into conditional jumps which are very fast. I see nothing unneccessary in your function, and every operation you are doing will be fast.

Using the benefits of reusability via templates and writing lean and mean code. That''s why i like writing good C code with C++''s encapsulation.
*sniff* And that''s what game programming should be all about.

Oh, and for further speed questions I advise you learn how to use your compiler/IDE''s profiler. Its a really powerful tool that tells you what areas you need to redesign/optimize, instead of searching line by line for redundancy.

I''ve always lived by the philosophy of write everything lean and clean, and write your graphics/networking code fast and mean.

- Kevin "BaShildy" King
Game Programmer: DigiPen
www.mpogd.com

Share this post


Link to post
Share on other sites
In many cases a for loop is slower than avoiding it altogether. Although it contains only a conditional branch machine code instruction such as branch still takes time.

However, some compilers support an optimization called "loop unrolling". The compiler will attempt to detect whether it can know how many time the loop with execute and if the number is small it will eliminate the loop completely by replacing it with the contents directly. If N was 3 you code would be replaced by the compiler with:

result += v[0] * pVec.v[0];
result += v[1] * pVec.v[1];
result += v[2] * pVec.v[2];

Whether this happens in your case you can check by looking at the resulting assembler code.

Some compilers (such as Intel C++ or Codeplay VectorC) detect such constructions and can compile them to MMX/SSE/3DNow instructions.

Also, if you want to help your compiler to produce faster code (whether it actually helps, depends on the compiler) you make sure it tests against zero instead of N like this:

for (unsigned int i = N-1; i >= 0; --i)

BTW, I think there is a bug in your program, result shouldn't be declared as float, but as T and the same goes for the return type.

[edited by - felonius on June 1, 2002 7:23:58 AM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:

Also, if you want to help your compiler to produce faster code (whether it actually helps, depends on the compiler) you make sure it tests against zero instead of N like this:



Speaking from an x86 point of view, why would this be faster? You have to get the flags set somehow to jump off them. The traditional way you would do:

cmp ebx, ecx
jge
...

And for the one you outlined you would do perhaps:

add ebx, 0
jl
...

Why would the latter be any faster than the former?

Share this post


Link to post
Share on other sites
If you would like the compiler to unroll the loop for you, look into template meta programming. Of course if you are worried about performance, you might want to inline it also.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
"conditional jumps which are very fast"

aren''t they actually among the slowest of instructions? Afterall, a mispredict creates a bubble in the pipeline. Loops that are only 2 to 4 iterations long are the worst too. A two iteration loop would be mispredicted half the time, which would be devastating right?

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
In this specific case, where N will probably be either 2, 3, or 4, isn''t it faster to unroll the loops?

Also, in a function like "Dot" that will be called very often, wouldn''t it be better to create "Result" once as some dummy class member?


Share this post


Link to post
Share on other sites
Two ways to avoid loops in C++:
Template Specialization:

  
template <unsigned int N>
inline unsigned int Factorial(void) {
unsigned int sum = 1;
for(unsigned int a=N; a>1; --a)
sum += a;
return sum;
}

// A specialization for if the factorial of 3 is needed

template <>
inline unsigned int Factorial<3>(void) {
return 6;
}

Template Metaprogramming:

  
template <unsigned int N>
struct Factorial {
static inline unsigned int Get(void) {
return Factorial<N-1>::Get()+N;
}
};

template <>
struct Factorial <1> {
static inline unsigned int Get(void) {
return 1;
}
};

template <>
struct Factorial <0> {
static inline unsigned int Get(void) {
return 0;
}
};

I once wrote a bubble sort using metaprogramming, it''s a lot of fun to play with. I''m sure you can figure out how to add the type template parameter back into that (if your compiler happens to support partial specialization, that is).

Share this post


Link to post
Share on other sites
>> Why would the latter be any faster than the former?

I don''t know about the x86 architecture but on many RISC machines the latter is faster because you just need one less instruction. On, for instance, the Digital Alpha a jump comparison x<3 is this (t0 is x):

ldiq t1, 3 // This is only needed in first iteration.
...
...
cmplt t0, t1, t2
bne t2, mylabel

while the x>=0 can be expressed just as

bge t0, mylabel

and this is clearly faster. Since x86 internally also have to subtract to two values before checking the sign I guess it will be slower there too. Remember on the x86, some instructions are slower than others and comparison between two integer is more complex than comparing a single number against zero (here only the sign bit needs to be tested).

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
quote:

return Factorial<N-1>::Get()+N;



I think you meant

return Factorial<N-1>::Get()*N;

but anyways. So will the compiler essentially pre-compute the factorial function so that it is not done at run-time if given constant input? What happens with


  
int x;

cin >> x;
cout << Factorial<x>::Get() << endl;
cout << Factorial<7>::Get() << endl;


I don''t see how the first call could be expanded, but the second could.

Share this post


Link to post
Share on other sites

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

We are the game development community.

Whether you are an indie, hobbyist, AAA developer, or just trying to learn, GameDev.net is the place for you to learn, share, and connect with the games industry. Learn more About Us or sign up!

Sign me up!