loop break

Started by
47 comments, last by Ectara 9 years, 11 months ago

Nice to see a simple but interesting question get discussed in detail tbh.

I'm going to be honest with all of you. This seems like a pointless and impossible argument. I knew coming into it that playing devil's advocate wouldn't make me very popular here.

However, if you're on a platform where you actually care about code size down to the last free byte, then unrolling loops and using extra registers for simple bookkeeping are counter-productive.

Things I am arguing:

  1. If you're counting bytes, small loops are important.
  2. If you're on a low-powered platform, you may not even have a very good optimizing compiler.
  3. If you're on a slow, small-cache/cacheless CPU, that extra register matters.

Things I am NOT arguing:

  1. There isn't better code for modern machines and processors.
  2. If you [modify the scenario in some way that suits some particular argument], it won't optimize to nothing or some large, but fast, control structure.
  3. This is the preferred coding method.
  4. Counting down is easier to read.
  5. Optimizers are bad at optimizing.

I feel this is turning into a flood of "Look, when I set the limit to zero, the counting up version converts to an AVX loop that plays 'Moonlight Sonata' out of my PC speaker. Take THAT!" and everyone high-fives and up-votes. To be honest, this is all missing the point. Most of the arguments here discard the reasons I gave for my use case and attack the flimsiest part of the argument instead, offering solutions that generate astronomically large code for a simple loop, use two or more registers just for loop counting (a waste if you only have four registers), and even do things as bizarre as turning it into a function call.

I suggested that where it matters, it can make a difference. Showing that it doesn't matter on a high-speed processor with a giant three-level cache and sixteen 64-bit general-purpose registers is great. However, presenting those results as relevant to my original point is misleading.

EDIT: As evidenced by the downvote.

I have to say that if you'd qualified your original posts with these points, things might have gone differently; the ensuing discussion might even have been interesting rather than tedious!

From my point of view, your earlier posts in this thread came across as claiming that "for (i = something; i--;)" was faster than "for (i = something; i > something_else; i--)", backed by a disingenuous, contrived example that wasn't an apples-to-apples comparison. I'm not saying that's what you were doing, only that it's how you came across. I need to stress this because you've been guilty of doing what you claim others have done to you, i.e. misrepresenting their positions and arguments.

"Where it matters it can make a difference" is something that's true of anything: one could hypothesise a processor that's several orders of magnitude slower at subtraction than it is at addition and in that use case - even for asm instructions and register usage - you'd probably still want to count up. I think that hypothetical (and, I admit, contrived) example defeats your final statement (with specific applicability to counting down vs counting up, not in general terms), but I've no arguments with the thinking behind it, which seems to me to essentially boil down to: "choose your optimization strategies according to what's good and what actually gives results on your target platform".

But surely we all already knew that?


Trying it on Visual C++, counting up:

; 10   : 	for(unsigned int i = 0; i != limit; ++i){

	mov	ecx, DWORD PTR _limit$[ebp]
	add	esp, 12					; 0000000cH
	xor	eax, eax
	test	ecx, ecx
	je	SHORT $LN1@up
$LL3@up:

; 11   : 		buf[i] = i;

	mov	BYTE PTR _buf$[ebp+eax], al
	inc	eax
	cmp	eax, ecx
	jne	SHORT $LL3@up
$LN1@up:

; 12   : 	}

Counting down:

; 23   : 	for(unsigned int i = limit; i-- ; ){

	mov	eax, DWORD PTR _limit$[ebp]
	add	esp, 12					; 0000000cH
	test	eax, eax
	je	SHORT $LN6@down
$LL2@down:
	dec	eax

; 24   : 		buf[i] = i;

	mov	BYTE PTR _buf$[ebp+eax], al
	jne	SHORT $LL2@down
$LN6@down:

; 25   : 	}
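
Counting down does appear to save one compare instruction per iteration: dec already sets the zero flag that jne tests (the mov in between doesn't touch the flags), so no separate cmp is needed.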

I'm not familiar with the Visual C++ command line, and the project I used is a bit of a dumping ground for helping people here and elsewhere, so it's possible I have some odd options enabled, but the basic optimisation options appear to be on.


But surely we all already knew that?

Not everyone; there's a For Beginners section of the site, and a lot of people may interpret this subforum as a "what not to do" of programming. It's entirely possible for someone to read this and start automatically condemning every use of the idiom without bothering to ask why.

I'd prefer that they learn to write readable code while they're still learning, but there's no reason to ban parts of their toolbox when something is the right tool for the job, unless there's red tape.
