loop break

Started by
47 comments, last by Ectara 9 years, 11 months ago

This is really just an obfuscated version of the idiom:


while (i--)
{
    /* do something */
}

Where i is a preexisting variable.

Not really complete unless you show the 'i' variable getting its initial value.

Somehow it would have to get the right value previously (and having it set somewhere far above would obscure the codes obvious operation and add to the potential for a error).

Yes, the point is a lot of the time the bounds you are interested in are already in a local variable (or a function parameter). Not that I would do this unless it was clear from context what I was doing. But in my opinion, it's much better than using a for loop with its iteration function (i--) in its termination condition.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Advertisement


Ectara, you don't cache the limit variable into a local in the up case, so it has to be fetched every iteration, which doesn't happen in your down case.

...Isn't that the point, that counting down doesn't fetch on every iteration? It is cached into a local variable; if you look at the ASM, it mov's the value from an offset from the stack pointer. That sounds local to me. It's doing exactly what it would if there were too many values held in the loop, and the counter spilled to RAM.

I thought the original point was the compiler's optimizer would essentially create the same exact code for:


for(int i=10;i--;){}

and


for(int i=0;i<10;i++){}

however by including violatile, you effectively shut off the optimizer from doing it's thing.

yea, there are edge cases where this would not be true(and you were trying to show that edge case). but that's just all it is, edge cases where yes you are correct, a count down version would theoretically generate tinier code. but I think if you get to the point where your profiling indicates that this one extra instruction is your bottlenexk, you likely have too weak of hardware for w/e your trying to do.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.


I.e compare this:



for (int i = 10; i--;)



To this:



for (int i = 10; i; i--)

I have no idea where you're getting that, as the second version never appeared in the thread. Furthermore, that's not even the same loop! Think about it: i is 10 at the start, passes the non-zero check, then it is 10 as it goes into the loop body...

I have no idea what you're on about.


Counting up wouldn't have fetched each iteration if you'd cached he volatile into a local. If you want to simulate register pressure, then use a lot of variables in the loop body rather than disabling the optimizer via volatile.

Clearly, your definition of "local" and mine don't align. To me, a local variable is one of automatic storage duration defined with in a function or deeper scope. Volatile doesn't change that.

Moreover, it seems like you're arguing my method, rather than my result. I don't know why it isn't obvious that using two temporary variables (that the compiler can't optimize away to a constant) instead of one increases register pressure, and I need to prove it with numerous examples.


Basically if you pull that 'clever' shit you can consider yourself unemployed pretty quick.

Someday, people won't always assume that I'm not employed as a software engineer right now, and tell me the dictionary definition of obvious terms.


You can't use a forced example to demonstrate the optimizer's actual behavior!

Then that nullifies all of your suggested modifications to my exercise, and we're at a stalemate.


rather than forcing its hand with code that gives it zero leeway.

Because the point was that if it isn't put into a register, it can result in larger code. You are making all suggestions that would either optimize it to a constant or cache it in a register, which would still be no faster than counting down; at best, it'd break even. My argument from the beginning is reducing register pressure and comparing against zero, while your suggestions are allowing the compiler to assume there's no register pressure and use all registers at once.


Neither use negative values so signed/unsigned is nuetral, but it would be interesting to see if it affects the codegen (many code bases use uint over int by default).

Post results, I'm curious.


IMO it's also more common to write !=limit and !=0 rather than 0...

In my and my co-worker's opinions, it isn't.


If you're testing whether the compiler can transform between up/down

I'm not.


I thought the original point was the compiler's optimizer would essentially create the same exact code for:

for(int i=10;i--;){}

and

for(int i=0;i<10;i++){}

I posted three examples that disagreed with the consensus, and I got refutation about none of them in particular, so I chose the one that was most likely to make the biggest difference.


but that's just all it is, edge cases where yes you are correct, a count down version would theoretically generate tinier code.

I hardly consider any of my original three examples an edge case. I have a lot of loops that does a lot of permutations on every iteration, and the counter spills to RAM.


but I think if you get to the point where your profiling indicates that this one extra instruction is your bottlenexk

No, I never said that. People said they would be the same, and I posted an example where they weren't, with proof. This and many other arguments in this thread is a complete and utter strawman argument.

Really, I should have known better than to come into a coding horror thread, and post a use case for the idiom that people are dumping on.

Moreover, it seems like you're arguing my method, rather than my result.

Exactly. I'm not attacking you or arguing the results. I'm honestly interested in what the results are. No need to get super defensive.
...I was just pointing out that by using volatile, you've forced the compiler's hand, so you may as well have just hand-written the assembly; it's not a test of the compiler, so seeing that the method is artificial, the results are also artificially achieved.

You can't use a forced example to demonstrate the optimizer's actual behavior!

Then that nullifies all of your suggested modifications to my exercise, and we're at a stalemate.

Not quite. I was suggesting that your example forces the compiler to only produce very specific assembly -- that your use of volatile has explicitly told the compiler exactly how it's allowed to construct the loops, and that it's not allowed to make any changes/optimizations.
Real production code will not use the volatile keyword like this (in 99.99% of cases) so it's not at all natural code. Write it naturally and perhaps the compiler can convert a forwards loop into a superior, optimal backwards loop?

If you're testing whether the compiler can transform between up/down

I'm not.

Oh, you threw me off with this where you tried to prove that the up/down versions produced different code tongue.png

You do however realize that an optimizing compiler will produce exactly the same code whether you use the obfuscated version or not?

[up.c ... down.c ...] They aren't identical, and the counting down version has one fewer instruction (compiled with gcc -O3 -S). Yes, cleaner code is better. However, it is all too easy to say that they compile to the same instructions.

getting certed [...] documents [...] meetings [...] lots of shit [...] huge extra cost
If I didn't know that this is exactly true and if it wasn't so sad, I'd say congrats on an excellent joke. Sadly, all that matters is whether or not some totally irrelevant (and likely not even competent) person gets an itch, because when that happens you simply don't get the "certified" stamp which is soooooooooooo important. But again, sadly, it's really important because the next incompetent idiot will only buy a product that is certified, no matter whether it's good or bad, and he simply won't buy yours if you don't have the stamp... sad.png

getting certed [...] documents [...] meetings [...] lots of shit [...] huge extra cost
If I didn't know that this is exactly true and if it wasn't so sad, I'd say congrats on an excellent joke.

mostly our stuff is safety critical, it's a legal matter, as in you go to jail or pulled up on corporate manslaughter if mistakes are made. Reviews are logged with your name at official bodies, every last test result is recorded and signed in triplicate. Every line of code is scrutinised. The restrictions on what you can do with the languages are huge. It can be miserable at times.

ah, sorry Samoth, I misread that as if you weren't sure how quality standards are met. Now I see you do! It's been a long day smile.png

In addition to making sure that it doesn't all get optimized out, the volatile limit simulates register pressure; it is entirely possible for that counter to spill to RAM.

It's actually really cute what happens when you remove the volatile from each case:
// down.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	movl	$10000, %eax            ## imm = 0x2710
	xorps	%xmm0, %xmm0
	.align	4, 0x90
LBB0_1:
	movups	%xmm0, -10032(%rbp,%rax)
	addq	$-16, %rax
	jne	LBB0_1
// up.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	leaq	-10016(%rbp), %rdi
	movl	$10000, %esi            ## imm = 0x2710
	callq	___bzero
It appears that the optimiser does not know that the two loops are identical, as the more normal upwards loop is optimised to a call to bzero(), whereas the downwards case actually generates a loop.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

volatile simulates register pressure [...] possible for the counter to spill to RAM

There is a lot more to volatile than just spilling the contents to RAM. The compiler is told to assume that the contents of the memory location can be changed externally, at any time.

Which means that it cannot reorder reads and writes, and cannot coalesce fetches. It must fetch the value exactly when it is used, not any earlier or later, and it will completely stall the processor's pipeline doing so.

If the loop counter has to be written to RAM because of register pressure, the compiler hides the latency by fetching the value early, so it happens while other instructions execute (if the loop is not totally trivial of course, but a trivial loop won't have high register pressure).

In addition to making sure that it doesn't all get optimized out, the volatile limit simulates register pressure; it is entirely possible for that counter to spill to RAM.

It's actually really cute what happens when you remove the volatile from each case:

// down.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	movl	$10000, %eax            ## imm = 0x2710
	xorps	%xmm0, %xmm0
	.align	4, 0x90
LBB0_1:
	movups	%xmm0, -10032(%rbp,%rax)
	addq	$-16, %rax
	jne	LBB0_1

// up.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	leaq	-10016(%rbp), %rdi
	movl	$10000, %esi            ## imm = 0x2710
	callq	___bzero
It appears that the optimiser does not know that the two loops are identical, as the more normal upwards loop is optimised to a call to bzero(), whereas the downwards case actually generates a loop.

Amazing what happens when you actually compare like-with-like, isn't it?

Of course a contrived case can be set up to prove any point, so it is important to do an equivalent comparison.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

This topic is closed to new replies.

Advertisement