Jump to content

  • Log In with Google      Sign In   
  • Create Account


loop break

  • You cannot reply to this topic
54 replies to this topic

#21 mhagain   Crossbones+   -  Reputation: 7802

Like
0Likes
Like

Posted 28 April 2014 - 08:41 AM

 

They aren't identical, and the counting down version has one fewer instruction

 

ONE instruction!  Would you write your own allocator if you felt that malloc was too slow on some platforms?


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


Sponsor:

#22 Ectara   Crossbones+   -  Reputation: 2911

Like
0Likes
Like

Posted 28 April 2014 - 08:52 AM

 

 

They aren't identical, and the counting down version has one fewer instruction

 

ONE instruction!  Would you write your own allocator if you felt that malloc was too slow on some platforms?

 

That's a false analogy. Standard allocators already exist. If I'm writing a loop, it doesn't already exist; it is up to me to choose which characteristics it will have.

 

EDIT:

That's also ignoring the "one temporary, less register pressure" argument.


Edited by Ectara, 28 April 2014 - 08:54 AM.


#23 Hodgman   Moderators   -  Reputation: 29302

Like
2Likes
Like

Posted 28 April 2014 - 04:55 PM

Ectara, you don't cache the limit variable into a local in the up case, so it has to be fetched every iteration, which doesn't happen in your down case.
The use of volatile is also going to do god knows what to the optimizer - you can't compare code that touches that keyword to any other code.

Remove volatile, add that local, maybe also use unsigneds so the compiler doesn't have to reason about negative numbers, make the up condition a != rather than a <, and they'll be more equivalent. Ten add a loop that passes the output array's values into a printf or something to ensure they're not optimized out.

#24 Ectara   Crossbones+   -  Reputation: 2911

Like
0Likes
Like

Posted 28 April 2014 - 05:09 PM


Ectara, you don't cache the limit variable into a local in the up case, so it has to be fetched every iteration, which doesn't happen in your down case.

...Isn't that the point, that counting down doesn't fetch on every iteration? It is cached into a local variable; if you look at the ASM, it mov's the value from an offset from the stack pointer. That sounds local to me. It's doing exactly what it would if there were too many values held in the loop, and the counter spilled to RAM.

 


The use of volatile is also going to do god knows what to the optimizer

I posted the ASM. You can see exactly what happened.

 


Remove volatile, add that local

In other words, allow the compiler to see that it's an unchanging number and inline the constant limit (I've tried that).

 


maybe also use unsigneds so the compiler doesn't have to reason about negative numbers, make the up condition a != rather than a <, and they'll be more equivalent.

...In other words, just don't use signed values, so that I can feel good about counting up all the time? I fail to see the point of actually modifying the use case to suit the algorithm, rather than the other way around.

 


Ten add a loop that passes the output array's values into a printf or something to ensure they're not optimized out.

I don't mean to be abrasive, but did you read the assembly output? It clearly moves zero to each of the array elements, exactly as the C code says (array is volatile qualified). It's guaranteed to have not been optimized out, because it says right in the resulting code that it is doing it.



#25 Servant of the Lord   Crossbones+   -  Reputation: 18478

Like
1Likes
Like

Posted 28 April 2014 - 05:29 PM

 

What does it do?


Perhaps this slight variation, using the "goes to" operator, will clarify:
while ( i --> 0) {
    // ...
}

 

I'm not sure if you are being tongue-in-cheek or not, but when I want to iterate backwards, I use (almost) the same thing:

for(int i = 10; i --> 0;)
{
     //...counts backwards: 9,8,7,6,5,4,3,2,1,0
}

Apparently that's frowned upon by some C++ programmers, because people mistake "-->" for its own operator. However, I feel it genuinely helps me remember the syntax of something I rarely use, preventing off-by-one errors. It's a useful mnemonic.

Any programmer who doesn't know this mysterious non-operator will look it up, and henceforth would know it. They won't start actively using an operator that they don't know what it does - and if they do, then your use of --> in your code base is the least of your concerns.


It's perfectly fine to abbreviate my username to 'Servant' rather than copy+pasting it all the time.
All glory be to the Man at the right hand... On David's throne the King will reign, and the Government will rest upon His shoulders. All the earth will see the salvation of God.
Of Stranger Flames - [indie turn-based rpg set in a para-historical French colony] | Indie RPG development journal

[Fly with me on Twitter] [Google+] [My broken website]

[Need web hosting? I personally like A Small Orange]


#26 catslap   Members   -  Reputation: 35

Like
1Likes
Like

Posted 28 April 2014 - 05:47 PM

Note that these are just examples; I don't know what you or your co-workers would find more readable, since the post-decrement loop is very readable to me.

 

It's a bit deeper than that. The coding standards are legal standards, and go off to a lab to get certed. You can break a rule, but it means getting in both reviewers, and  senior manager  booked for a meeting(costly, and pisses people off, although there's biscuits), then a document is written justifying the deviation from standard (costly and boring, massive waste of time and resource), then the certification lab want to check it but charge more (maybe £6k). Basically if you pull that 'clever' shit you can consider yourself unemployed pretty quick. They canned 12 guys on NYE no warning, another 4 last week, all for bad business practice, they were probably adequate coders, just didn't understand the industry.

 

There are times when you get pulled in to make something uber efficient, like context switching on a blistering fast fpga, but mostly, write it like a muppet can read it and you keep on getting paid.

 

I do like that loop though :)



#27 mhagain   Crossbones+   -  Reputation: 7802

Like
1Likes
Like

Posted 28 April 2014 - 05:52 PM

...Isn't that the point, that counting down doesn't fetch on every iteration?

 

No, that's not the point.  This was never about counting down versus counting up, you're picking a completely different fight here, one that nobody else has even gone near talking about, and you're treating everyone else as if they're having the same fight as you whereas they're not.  This is about comparing the cutesy construct given in the OP with a more idiomatic for loop.

 

I.e compare this:

 

for (int i = 10; i--;)

 

To this:

 

for (int i = 10; i; i--)

 

That's what the point is.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#28 wodinoneeye   Members   -  Reputation: 747

Like
0Likes
Like

Posted 28 April 2014 - 06:27 PM

This is really just an obfuscated version of the idiom:

while (i--)
{
    /* do something */
}

Where i is a preexisting variable.

 

Not really complete unless you show the  'i' variable getting its initial value.

 

Somehow it would have to get the right value previously (and having it set somewhere far above would obscure the codes obvious operation and add to the potential for a error).


--------------------------------------------Ratings are Opinion, not Fact

#29 Hodgman   Moderators   -  Reputation: 29302

Like
1Likes
Like

Posted 28 April 2014 - 06:28 PM

...Isn't that the point, that counting down doesn't fetch on every iteration? It is cached into a local variable; if you look at the ASM, it mov's the value from an offset from the stack pointer. That sounds local to me. It's doing exactly what it would if there were too many values held in the loop, and the counter spilled to RAM.

Counting up wouldn't have fetched each iteration if you'd cached he volatile into a local. If you want to simulate register pressure, then use a lot of variables in the loop body rather than disabling the optimizer via volatile.

I posted the ASM. You can see exactly what happened.

and it's not necessarily the same thing that the optimizer would've actually done in real code. Your written an example that does exactly what you assume the compiler will do, but can't possibly show us what the compiler would actually do in a real situation.
You can't use a forced example to demonstrate the optimizer's actual behavior! ;-)
 

In other words, allow the compiler to see that it's an unchanging number and inline the constant limit (I've tried that).

You can use a global, or a parameter from another function, or get the value from another translation unit, etc... That will stop the compiler from hard doing the loop limit in he same way that it does in real code -- rather than forcing its hand with code that gives it zero leeway.
 

...In other words, just don't use signed values, so that I can feel good about counting up all the time? I fail to see the point of actually modifying the use case to suit the algorithm, rather than the other way around.

unsigneds work for both up and down. If you want to compare those two loops fairly then you've got to make them as equivalent as possible -- if one uses "not zero", the other should use "not limit", etc. Neither use negative values so signed/unsigned is nuetral, but it would be interesting to see if it affects the codegen (many code bases use uint over int by default).
IMO it's also more common to write !=limit and !=0 rather than <limit and >0...
 

I don't mean to be abrasive, but did you read the assembly output? It clearly moves zero to each of the array elements, exactly as the C code says (array is volatile qualified). It's guaranteed to have not been optimized out, because it says right in the resulting code that it is doing it.

If you're testing whether the compiler can transform between up/down, then the use of volatile on the output array completely voids your test -- volatile writes can't be reordered, so you've told the compiler explicitly not to modify your iteration order!
So to test this behavior of the optimizer, you need to remove volatile, and then add an alternative method of stopping the code from being optimized away entirely.

#30 wodinoneeye   Members   -  Reputation: 747

Like
0Likes
Like

Posted 28 April 2014 - 06:42 PM

There is also that thing that Ive seen more than a little where in the test clause the constant is put first

 

for(i=0; 10 > i; i++)

   {

   }

 

or more commonly

 

if (10 > varx)  {  }

 

or

 

if (Mode1 == varx)   {     }

 

which usually makes me double check rather more than the classic  i<10  style


--------------------------------------------Ratings are Opinion, not Fact

#31 Bacterius   Crossbones+   -  Reputation: 8482

Like
0Likes
Like

Posted 28 April 2014 - 06:50 PM

 

This is really just an obfuscated version of the idiom:

while (i--)
{
    /* do something */
}

Where i is a preexisting variable.

 

Not really complete unless you show the  'i' variable getting its initial value.

 

Somehow it would have to get the right value previously (and having it set somewhere far above would obscure the codes obvious operation and add to the potential for a error).

 

 

Yes, the point is a lot of the time the bounds you are interested in are already in a local variable (or a function parameter). Not that I would do this unless it was clear from context what I was doing. But in my opinion, it's much better than using a for loop with its iteration function (i--) in its termination condition.


The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#32 slicer4ever   Crossbones+   -  Reputation: 3466

Like
0Likes
Like

Posted 28 April 2014 - 07:00 PM

 


Ectara, you don't cache the limit variable into a local in the up case, so it has to be fetched every iteration, which doesn't happen in your down case.

...Isn't that the point, that counting down doesn't fetch on every iteration? It is cached into a local variable; if you look at the ASM, it mov's the value from an offset from the stack pointer. That sounds local to me. It's doing exactly what it would if there were too many values held in the loop, and the counter spilled to RAM.

 

I thought the original point was the compiler's optimizer would essentially create the same exact code for:

for(int i=10;i--;){}

and

for(int i=0;i<10;i++){}

however by including violatile, you effectively shut off the optimizer from doing it's thing.

 

yea, there are edge cases where this would not be true(and you were trying to show that edge case).  but that's just all it is, edge cases where yes you are correct, a count down version would theoretically generate tinier code.  but I think if you get to the point where your profiling indicates that this one extra instruction is your bottlenexk, you likely have too weak of hardware for w/e your trying to do.


Edited by slicer4ever, 28 April 2014 - 07:06 PM.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

#33 Ectara   Crossbones+   -  Reputation: 2911

Like
0Likes
Like

Posted 28 April 2014 - 08:23 PM


I.e compare this:



for (int i = 10; i--;)



To this:



for (int i = 10; i; i--)

I have no idea where you're getting that, as the second version never appeared in the thread. Furthermore, that's not even the same loop! Think about it: i is 10 at the start, passes the non-zero check, then it is 10 as it goes into the loop body...

 

I have no idea what you're on about.

 


Counting up wouldn't have fetched each iteration if you'd cached he volatile into a local. If you want to simulate register pressure, then use a lot of variables in the loop body rather than disabling the optimizer via volatile.

Clearly, your definition of "local" and mine don't align. To me, a local variable is one of automatic storage duration defined with in a function or deeper scope. Volatile doesn't change that.

 

Moreover, it seems like you're arguing my method, rather than my result. I don't know why it isn't obvious that using two temporary variables (that the compiler can't optimize away to a constant) instead of one increases register pressure, and I need to prove it with numerous examples.

 


Basically if you pull that 'clever' shit you can consider yourself unemployed pretty quick.

 

Someday, people won't always assume that I'm not employed as a software engineer right now, and tell me the dictionary definition of obvious terms.

 


You can't use a forced example to demonstrate the optimizer's actual behavior!

Then that nullifies all of your suggested modifications to my exercise, and we're at a stalemate.

 


rather than forcing its hand with code that gives it zero leeway.

Because the point was that if it isn't put into a register, it can result in larger code. You are making all suggestions that would either optimize it to a constant or cache it in a register, which would still be no faster than counting down; at best, it'd break even. My argument from the beginning is reducing register pressure and comparing against zero, while your suggestions are allowing the compiler to assume there's no register pressure and use all registers at once.

 


Neither use negative values so signed/unsigned is nuetral, but it would be interesting to see if it affects the codegen (many code bases use uint over int by default).

Post results, I'm curious.

 


IMO it's also more common to write !=limit and !=0 rather than 0...

In my and my co-worker's opinions, it isn't.

 


If you're testing whether the compiler can transform between up/down

I'm not.

 


I thought the original point was the compiler's optimizer would essentially create the same exact code for:

for(int i=10;i--;){}

and

for(int i=0;i<10;i++){}

I posted three examples that disagreed with the consensus, and I got refutation about none of them in particular, so I chose the one that was most likely to make the biggest difference.

 


but that's just all it is, edge cases where yes you are correct, a count down version would theoretically generate tinier code.

I hardly consider any of my original three examples an edge case. I have a lot of loops that does a lot of permutations on every iteration, and the counter spills to RAM.

 


but I think if you get to the point where your profiling indicates that this one extra instruction is your bottlenexk

No, I never said that. People said they would be the same, and I posted an example where they weren't, with proof. This and many other arguments in this thread is a complete and utter strawman argument.

 

Really, I should have known better than to come into a coding horror thread, and post a use case for the idiom that people are dumping on.



#34 Hodgman   Moderators   -  Reputation: 29302

Like
1Likes
Like

Posted 28 April 2014 - 08:55 PM

Moreover, it seems like you're arguing my method, rather than my result.

Exactly. I'm not attacking you or arguing the results. I'm honestly interested in what the results are. No need to get super defensive.
...I was just pointing out that by using volatile, you've forced the compiler's hand, so you may as well have just hand-written the assembly; it's not a test of the compiler, so seeing that the method is artificial, the results are also artificially achieved.
 

You can't use a forced example to demonstrate the optimizer's actual behavior!

Then that nullifies all of your suggested modifications to my exercise, and we're at a stalemate.

Not quite. I was suggesting that your example forces the compiler to only produce very specific assembly -- that your use of volatile has explicitly told the compiler exactly how it's allowed to construct the loops, and that it's not allowed to make any changes/optimizations.
Real production code will not use the volatile keyword like this (in 99.99% of cases) so it's not at all natural code. Write it naturally and perhaps the compiler can convert a forwards loop into a superior, optimal backwards loop?
 

If you're testing whether the compiler can transform between up/down

I'm not.

Oh, you threw me off with this where you tried to prove that the up/down versions produced different code tongue.png

You do however realize that an optimizing compiler will produce exactly the same code whether you use the obfuscated version or not?

[up.c ... down.c ...] They aren't identical, and the counting down version has one fewer instruction (compiled with gcc -O3 -S). Yes, cleaner code is better. However, it is all too easy to say that they compile to the same instructions.


#35 samoth   Crossbones+   -  Reputation: 4678

Like
0Likes
Like

Posted 29 April 2014 - 04:13 AM

getting certed [...] documents [...] meetings [...] lots of shit [...] huge extra cost
If I didn't know that this is exactly true and if it wasn't so sad, I'd say congrats on an excellent joke. Sadly, all that matters is whether or not some totally irrelevant (and likely not even competent) person gets an itch, because when that happens you simply don't get the "certified" stamp which is soooooooooooo important. But again, sadly, it's really important because the next incompetent idiot will only buy a product that is certified, no matter whether it's good or bad, and he simply won't buy yours if you don't have the stamp... sad.png

#36 catslap   Members   -  Reputation: 35

Like
0Likes
Like

Posted 29 April 2014 - 02:27 PM

 

getting certed [...] documents [...] meetings [...] lots of shit [...] huge extra cost
If I didn't know that this is exactly true and if it wasn't so sad, I'd say congrats on an excellent joke. 

 

 

mostly our stuff is safety critical, it's a legal matter, as in you go to jail or pulled up on corporate manslaughter if mistakes are made. Reviews are logged with your name at official bodies, every last test result is recorded and signed in triplicate. Every line of code is scrutinised. The restrictions on what you can do with the languages are huge. It can be miserable at times.


Edited by catslap, 29 April 2014 - 02:27 PM.


#37 catslap   Members   -  Reputation: 35

Like
0Likes
Like

Posted 29 April 2014 - 02:30 PM

ah, sorry Samoth, I misread that as if you weren't sure how quality standards are met. Now I see you do! It's been a long day smile.png



#38 swiftcoder   Senior Moderators   -  Reputation: 9845

Like
3Likes
Like

Posted 01 May 2014 - 05:36 AM

In addition to making sure that it doesn't all get optimized out, the volatile limit simulates register pressure; it is entirely possible for that counter to spill to RAM.

It's actually really cute what happens when you remove the volatile from each case:
// down.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	movl	$10000, %eax            ## imm = 0x2710
	xorps	%xmm0, %xmm0
	.align	4, 0x90
LBB0_1:
	movups	%xmm0, -10032(%rbp,%rax)
	addq	$-16, %rax
	jne	LBB0_1
// up.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	leaq	-10016(%rbp), %rdi
	movl	$10000, %esi            ## imm = 0x2710
	callq	___bzero
It appears that the optimiser does not know that the two loops are identical, as the more normal upwards loop is optimised to a call to bzero(), whereas the downwards case actually generates a loop.

Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#39 samoth   Crossbones+   -  Reputation: 4678

Like
1Likes
Like

Posted 01 May 2014 - 06:41 AM

volatile simulates register pressure [...] possible for the counter to spill to RAM

There is a lot more to volatile than just spilling the contents to RAM. The compiler is told to assume that the contents of the memory location can be changed externally, at any time.

Which means that it cannot reorder reads and writes, and cannot coalesce fetches. It must fetch the value exactly when it is used, not any earlier or later, and it will completely stall the processor's pipeline doing so.

 

If the loop counter has to be written to RAM because of register pressure, the compiler hides the latency by fetching the value early, so it happens while other instructions execute (if the loop is not totally trivial of course, but a trivial loop won't have high register pressure).



#40 mhagain   Crossbones+   -  Reputation: 7802

Like
0Likes
Like

Posted 01 May 2014 - 07:10 AM

 

In addition to making sure that it doesn't all get optimized out, the volatile limit simulates register pressure; it is entirely possible for that counter to spill to RAM.

It's actually really cute what happens when you remove the volatile from each case:
// down.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	movl	$10000, %eax            ## imm = 0x2710
	xorps	%xmm0, %xmm0
	.align	4, 0x90
LBB0_1:
	movups	%xmm0, -10032(%rbp,%rax)
	addq	$-16, %rax
	jne	LBB0_1
// up.c
	.cfi_offset %rbx, -24
	movq	___stack_chk_guard@GOTPCREL(%rip), %rbx
	movq	(%rbx), %rax
	movq	%rax, -16(%rbp)
	leaq	-10016(%rbp), %rdi
	movl	$10000, %esi            ## imm = 0x2710
	callq	___bzero
It appears that the optimiser does not know that the two loops are identical, as the more normal upwards loop is optimised to a call to bzero(), whereas the downwards case actually generates a loop.

 

 

Amazing what happens when you actually compare like-with-like, isn't it?

 

Of course a contrived case can be set up to prove any point, so it is important to do an equivalent comparison.


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.






PARTNERS