Archived

This topic is now archived and is closed to further replies.

Fast Loop

This topic is 5503 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I''m in the process of completing the levels for a new 2D game that I''m working on and I''m getting real close to creating the game engine. In the previous game I created, which was 2D, I use C-style loops but I didn''t have too many objects to deal with. But now with my new which will have long levels and many of them I''m going to need to make fast loops to have the game speed be acceptable. Just to let you know I wrote both games C++. Here is an example of the loops I would use in my previous game (Quackoo vs Tie Fighter)... Example 1 for (int x=0;x);x++) // Do some stuff... while(!infile.eof()) // Do more stuff... See that''s what I mean. I think my loops are just too slow. If you have any suggestions please tell me. I really need help with this.

Share this post


Link to post
Share on other sites
well....

you can''t really speed up a loop as such (i.e. a for loop is a for loop is a for loop). you can speed up the logic performed in each iteration of a loop which is what i think you''re asking. some things you just flat out have to do for every object in the game -> movement, collision detection, etc. you can speed up collision detection and other types of things that only happen to "close together" object by creating a spatial partitioning scheme like BSP or OctTrees. that way the collision detection part of the loop isn''t an O(n^2) algorithm.

your best case for an update loop is probably something like O(nlogn), but i''m really just making that up . a draw loop can similarly be made much more efficient by using the same spatial partitioning scheme to decide upon Potentially Visable Object Sets. sniff around the Articles & Resources Game Programming section (i think that''s the one with all the spatial partitioning schemes).

did i answer your question?

-me

Share this post


Link to post
Share on other sites
quote:
Original post by FrancoisSoft
I''m in the process of completing the levels for a new 2D game that I''m working on and I''m getting real close to creating the game engine. In the previous game I created, which was 2D, I use C-style loops but I didn''t have too many objects to deal with. But now with my new which will have long levels and many of them I''m going to need to make fast loops to have the game speed be acceptable. Just to let you know I wrote both games C++. Here is an example of the loops I would use in my previous game (Quackoo vs Tie Fighter)...

Example 1


for (int x=0;x);x++)
// Do some stuff...

while(!infile.eof())
// Do more stuff...


See that''s what I mean. I think my loops are just too slow. If you have any suggestions please tell me. I really need help with this.



The best way to quicken your loops is to unroll them. In other words, instead of looping 100 times, loop 10 times and repeat the instruction 10 times in the loop body. That''s an old DOS trick that Lamothe gave in his original Tricks Of The Game Programming Gurus... I don''t think you''ll see that much of an improvement but it should help out a bit. It''ll save you a couple of cycles per iterations.





[Cyberdrek | the last true sorcerer | Spirit Mage - mutedfaith.com]

Share this post


Link to post
Share on other sites
Here is an other loop question:

How does the infinite loop for( ;; ) { ... }
translate intto assembly. I mean, if I do not
have a init, cond and expr in the loop, does
the compiles also leave out the tests andstuff?

LOOP: ...
jump LOOP

Share this post


Link to post
Share on other sites
Like Cyberdrek said.

Unrolling the loop will cut down the loop overhead. A simple loop
like this:


      
// Calculate offset and determine memory address end before loop

while(offset<offsetEnd) // one compare, one conditional branch

{
destination[offset]= source[offset++] + 1; // one increment

} // one unconditional branch



Unrolling the process 10 times would cut the loop overhead down to about a 10th. However, this will bloat the code size and may cause cache problems for the CPU so don't go nuts.


  
// Calculate memory end offset and determine memory address before loop

while(destination<destinationEnd) // one compare, one conditional branch

{
destination[0]= source[0] + 1;
destination[1]= source[1] + 1;
destination[2]= source[2] + 1;
destination[3]= source[3] + 1;
destination[4]= source[4] + 1;
destination[5]= source[5] + 1;
destination[6]= source[6] + 1;
destination[7]= source[7] + 1;
destination[8]= source[8] + 1;
destination[9]= source[9] + 1;
destination += 10; // one addition

source += 10; // one addition

} // one unconditional branch



If there where 100 MOVs to be performed the loop overhead in the unrolled loop whould be 10 compares, contional and unconditional braches instead of 100. There are 20 additions in the unrolled vs 100 increments in the original. (Cutting out 350 instructions, sorry don't know the estimated cycle time that could save)

Or something like that.

Anyway unrolling loops will only have the biggest impact if the loop over head has a close ratio to the process like 1:2 but if it were 1:1000 unrolling the loop won't do much. (besides that process would take ~4000-5000 cycles)

Also: I hope an empty loop is converted to a simple unconditional jump by the complier!! I guess that depends on the complier.


[edited by - CodeJunkie on November 20, 2002 12:38:03 AM]



[edited by - codejunkie on November 21, 2002 2:50:31 PM]

Share this post


Link to post
Share on other sites
I''m not really trying to confuse anyone, but "That''s an old DOS trick" is quite true. Ever since the Pentium and branch prediction unrolling loops has become in many cases *more* cpu intensive than a normal loop. This is because branch prediction allows the processor to avoid misses within a loop (effectively making the conditional branch take 1 cycle, even appear to take "0" cycles on some). At the same time an unrolled loop can strain your cache (especially on a P4), which could mean 2 cache misses for *every* iteration of the loop.

Share this post


Link to post
Share on other sites
Good Point Michalson.

What size code will cause a cache miss then?
Could you possibly get away will unrolling a loop just a few times if its small enough?
I mean the loop has some overhead doesn't it? (yeah it might not be much thou)

Thanks

[edited by - CodeJunkie on November 20, 2002 9:10:52 PM]

Any help with organizing code in C/C++ to take advantage of new CPUs pipelining to avoid cache misses would be great.

[edited by - CodeJunkie on November 20, 2002 9:30:58 PM]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Template specialization for unrolling a loop entirely can be good.

Share this post


Link to post
Share on other sites
quote:
Original post by nano
Here is an other loop question:

How does the infinite loop for( ;; ) { ... }
translate intto assembly. I mean, if I do not
have a init, cond and expr in the loop, does
the compiles also leave out the tests andstuff?

LOOP: ...
jump LOOP

Assembly language features an unconditional jump (aka branch always) instruction. If you use a break to terminate the loop, the assembler inserts a label after the loop as the branch destination on the break. Otherwise you''ll need a goto (another unconditional branch, this time to a location outside the loop).

Share this post


Link to post
Share on other sites
are you sure that''s where your problem lies?

profile first to identify that you really have a problem

also ++i is quicker than i++ as there is no temporary variable created

Share this post


Link to post
Share on other sites
quote:
Original post by petewood
also ++i is quicker than i++ as there is no temporary variable created
Not in the case of primitive types (like the one used in his example)

Share this post


Link to post
Share on other sites
quote:
Not in the case of primitive types (like the one used in his example)

why are primitive types any different? there has to be something to hold the temporary value.

lets not get into discussions of what the compiler does as there are too many to compare.

baseline, there will be a temporary of come kind.

It was only a passing comment, not meant as the big solution to FrancoisSoft''s loops but something that you can use, rather than i++, by default.

Share this post


Link to post
Share on other sites
With primitives, the compiler can safely transform postincrement to preincrement when it knows the previous value is unused.

Not so with iterators and such.

Share this post


Link to post
Share on other sites
quote:
Original post by DrPizza
With primitives, the compiler can safely transform postincrement to preincrement when it knows the previous value is unused.

Not so with iterators and such.



Maybe I'm wrong (I only read articles on assembly and wrote a few lines yeeeeeeeaaaaaars ago) but the cpu does actually have dedicated built-in operators for primitive types, especially integers. I would not be surprised that increment/decrement by 1 is a built-in operator in the cpu...

===EDIT===

Sorry, I realize that what I say is off-topic whith what I quoted... m(_ _)m


----
David Sporn AKA Sporniket

[edited by - davidsporn on November 21, 2002 10:40:43 AM]

Share this post


Link to post
Share on other sites
quote:
Original post by davidsporn
I would not be surprised that increment/decrement by 1 is a built-in operator in the cpu...

What does that have to do with it?

Share this post


Link to post
Share on other sites
quote:
Original post by DrPizza
With primitives, the compiler can safely transform postincrement to preincrement when it knows the previous value is unused.
That's not the whole reason, because the post-incremented variable can be within an expression too. What postincrement does is to use the current value of the variable inside the expression, and increment the variable after the expression.
Example:

5*(x++)+2;

<=>

5*x+2;
++x;

The temporary problem comes only when user defined types try to mimic this behaviour, which is impossible without a temp.
quote:
Original post by petewood
baseline, there will be a temporary of come kind.
I don't see any in the above simplification I made, even though it post-increments correctly. It simply delays the incrementation until the expression has been evaluated.

Of course there's a slight possibility that a compiler exists which uses a temp for this. But I doubt it, since it's so trivial to move the increment operation a bit lower.

It works with branching code too, since comparing is separate from the branch operation.

if (x++) {}

becomes

cmp ax, 0
inc ax, 1
je after_if

(or something, ages since I've written x86 asm)

[edited by - civguy on November 21, 2002 2:48:06 PM]

Share this post


Link to post
Share on other sites