Need speed - "for-loop" it or type it?

4 comments, last by Verso 22 years, 10 months ago
I have an inline assembly function to put pixels directly onto a DDraw surface, and my question is - should I use a for loop to call the function (~100 calls for a "laser"), or will that really slow things down (relative to the speed of the function itself)? The other option, which is just a little bit tedious to say the least, is to directly call the function.
Remember - Hard work pays off in the long run, but laziness pays off immediately.
Remember that, with a loop, you have the added overhead of (a) incrementing or decrementing a counter, and (b) the conditional check of the counter to see whether to jump out of the loop.

I would recommend calling it 100 times manually, although you should profile it and see whether it matters. Remember, with copy and paste, you only need to paste a few times: copy one, paste one, then copy both of them and paste again. Repeat this and you'll have 100 in no time.
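For the "profile it" part, a rough timing harness could look like the sketch below. It assumes a stand-in plot() routine (hypothetical, not the original poster's inline assembly) and uses the standard clock() call. The buffer is volatile so the compiler cannot drop the stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the real pixel routine. */
static void plot(volatile uint16_t *buf, size_t i, uint16_t color)
{
    buf[i] = color;
}

/* Returns the CPU seconds spent drawing a `length`-pixel "laser"
   `repeats` times with a plain for loop. */
static double time_laser_loop(volatile uint16_t *buf, size_t length,
                              unsigned repeats)
{
    clock_t start = clock();
    for (unsigned r = 0; r < repeats; ++r)
        for (size_t i = 0; i < length; ++i)
            plot(buf, i, (uint16_t)r);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Timing the hand-written version the same way, with a large enough repeat count to get above the clock resolution, shows whether the difference is even measurable.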
Without going into a great deal of detail, the benefit of unrolling a loop is in instruction scheduling and not function call overhead. If it is performance critical, you should unroll it to some degree. Unrolling simply means doing more in one pass of the loop: rather than doing one pixel at a time, you do two or four. There quickly comes a point where the benefit you get isn't worth the effort it takes, and unrolling 100 iterations of a loop would certainly fall in that category. Unrolling four or eight would get you the vast majority, if not all, of the measurable benefit that unrolling a hundred iterations would get you. You should start, though, by looking at the disassembly and seeing if there is anything obviously redundant or unneeded. Just because you tell it to inline a function doesn't mean the compiler will. Just because it inlines it doesn't mean it isn't allocating and copying variables that, strictly speaking, are unneeded.
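As a rough illustration of unrolling by four, here is a minimal sketch. It assumes 16-bit pixels and a hypothetical fill_span_unrolled4() helper, not the original poster's routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write `count` 16-bit pixels starting at `dst`. */
static void fill_span_unrolled4(uint16_t *dst, size_t count, uint16_t color)
{
    size_t i = 0;

    /* Main body: four stores per pass, so the counter update and the
       branch are paid once per four pixels instead of once per pixel. */
    for (; i + 4 <= count; i += 4) {
        dst[i + 0] = color;
        dst[i + 1] = color;
        dst[i + 2] = color;
        dst[i + 3] = color;
    }

    /* Remainder: up to three leftover pixels. */
    for (; i < count; ++i)
        dst[i] = color;
}
```

For a 100-pixel laser this runs 25 passes of the main loop and no remainder.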

You also have to look not just at the function being called, but also at the function calling it and perhaps even the function that called that one. As an example, if your pixel plotting routine is being called by a rectangle drawing routine, that may be nice structured programming, but it isn't good for performance, because the lower routine has to repeatedly convert x,y into a memory address. You could in that case write a horizontal line routine instead. You would still be doing a multiply on the y, but at least it would be once per line rather than once per pixel. That multiply for successive lines could in turn be replaced with an add if the routine drew the whole rectangle itself. Before you leap to excesses you should first be sure the simple stuff is taken care of.
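The address arithmetic described above might look like the sketch below. The Surface16 struct and both function names are hypothetical. It assumes a 16-bit surface whose pitch is given in bytes, as a locked DirectDraw surface would report it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a locked 16-bit surface. */
typedef struct {
    uint8_t *base;   /* first byte of the top-left pixel */
    size_t   pitch;  /* bytes per scanline */
} Surface16;

/* Per-pixel plot: one multiply on y for every pixel drawn. */
static void put_pixel(const Surface16 *s, int x, int y, uint16_t color)
{
    uint16_t *row = (uint16_t *)(s->base + (size_t)y * s->pitch);
    row[x] = color;
}

/* Rectangle fill: one multiply for the first row, then one add per row. */
static void fill_rect(const Surface16 *s, int x, int y, int w, int h,
                      uint16_t color)
{
    uint8_t *row_bytes = s->base + (size_t)y * s->pitch;
    for (int j = 0; j < h; ++j) {
        uint16_t *row = (uint16_t *)row_bytes + x;
        for (int i = 0; i < w; ++i)
            row[i] = color;
        row_bytes += s->pitch;  /* step to the next scanline with an add */
    }
}
```

Neither function clips, so callers are assumed to pass coordinates that lie inside the surface.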

p.s. If you are going to go to an excess then use a macro to generate it. Then at least the code stays maintainable.
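One way such a macro could be built is sketched below. The REPEAT_* names and plot8() are made up for illustration; each macro level doubles the statement it is given.

```c
#include <stdint.h>

/* Hypothetical repetition macros: each level pastes the statement twice. */
#define REPEAT_2(stmt)  stmt; stmt
#define REPEAT_4(stmt)  REPEAT_2(stmt); REPEAT_2(stmt)
#define REPEAT_8(stmt)  REPEAT_4(stmt); REPEAT_4(stmt)

/* Writes eight pixels with no loop and returns the next free position. */
static uint16_t *plot8(uint16_t *p, uint16_t color)
{
    REPEAT_8(*p++ = color);
    return p;
}
```

Changing the per-pixel statement then means editing one line rather than a hundred pasted copies.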

Edited by - LilBudyWizer on June 22, 2001 11:54:15 AM
Keys to success: Ability, ambition and opportunity.
Also note that the inline keyword is just a suggestion to the compiler. If you use a loop or call other non-inlined functions within an inlined function, the compiler will most likely treat the inline function as a standard function, defeating the whole purpose of making the function inline in the first place.
Joseph Fernald
Software Engineer
Red Storm Entertainment.
------------------------
The opinions expressed are that of the person posting
and not that of Red Storm Entertainment.
To get some extra inlining, try this:

#pragma inline_depth( 255 )
#pragma inline_recursion( on )
#pragma auto_inline( on )

#define inline __forceinline


It still won't work unless you turn inlining on in the compiler
options though.

Is the purpose of this pixel plotting function to allow
different pixel formats? Because functions called through
function pointers can't inline, obviously.

Why are you plotting pixels anyway? If you're drawing lines
then use a line algorithm. Otherwise use a sprite.
Most optimizing compilers should unroll the loop automatically (although it doesn't hurt to do your own optimizations by hand). Unrolling a loop that much is *way* overkill, however. The cost of iterating through 100 pixels is pretty much nothing as long as the buffer is in system memory (writing 100 pixels to VRAM might stall things a bit, however).

I wouldn't worry about it too much. If it's extremely slow you can always come back to it later. Besides, prematurely optimizing things in a fashion that makes them difficult to change (such as unrolling a loop 100 times) is a bad idea. If you want to change the laser effect later it will be a nightmare.

This topic is closed to new replies.
