# optimizing compilers

This topic is 3255 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

I just fiddled around with gcc to see how well it optimizes tail recursion.
unsigned gcd(unsigned x, unsigned y)
{
while (y)
{
int z = x % y;
x = y;
y = z;
}
return x;
}

unsigned recursive(unsigned x, unsigned y)
{
return y == 0 ? x : recursive(y, x % y);
}


And with gcc -O2 -S it already compiles to the exact same assembly!
	.p2align 2,,3
.globl _gcd
.def	_gcd;	.scl	2;	.type	32;	.endef
_gcd:
pushl	%ebp
movl	%esp, %ebp
movl	8(%ebp), %eax
movl	12(%ebp), %ecx
testl	%ecx, %ecx
jne	L6
jmp	L2
.p2align 2,,3
L8:
movl	%edx, %ecx
L6:
xorl	%edx, %edx
divl	%ecx
movl	%ecx, %eax
testl	%edx, %edx
jne	L8
L2:
leave
ret
.p2align 2,,3
.globl _recursive
.def	_recursive;	.scl	2;	.type	32;	.endef
_recursive:
pushl	%ebp
movl	%esp, %ebp
movl	8(%ebp), %eax
movl	12(%ebp), %ecx
testl	%ecx, %ecx
jne	L14
jmp	L10
.p2align 2,,3
L15:
movl	%edx, %ecx
L14:
xorl	%edx, %edx
divl	%ecx
movl	%ecx, %eax
testl	%edx, %edx
jne	L15
L10:
leave
ret


I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)

##### Share on other sites
Quote:
 Original post by DevFredAnd with gcc -O2 -S it already compiles to the exact same assembly!
Modern C compilers can usually handle basic tail-recursion. Unfortunately it's very easy to confuse the compiler with anything beyond the most trivial cases (e.g. any code where you take the address of a local variable is pretty much out) and you typically can't enable only this optimization for debug builds, so you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

Quote:
 Original post by DevFredI must say I'm quite impressed. When was the last time you were astonished by your compiler? :)
I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a). Granted, that's a comparatively easy optimization to implement but it was the first time I actually learned a new bit-hack from a compiler.

Lately I've writing code for an AVR microcontroller using a commercial compiler from Imagecraft, and constantly amazed by the code it generates. Not in a good way though..

##### Share on other sites
Quote:
 Original post by implicityou still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

But the recursive solution is so clean and elegant!
I guess my imperative mind has been spoiled by Haskell :)

##### Share on other sites
Quote:
 Original post by implicitI remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a).

Shouldn't that be: convert (a & m) | (b & ~m) to ((a ^ b) & m) ^ b?

##### Share on other sites
Moving to General Programming. I trust there are no objections? :)

##### Share on other sites
Quote:
 Original post by DevFredI must say I'm quite impressed. When was the last time you were astonished by your compiler? :)

When it rerolled a manually unrolled loop so it could version it and unroll it by a different factor.

##### Share on other sites
Out of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?

##### Share on other sites
Quote:
 Original post by AshleysBrainOut of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?

Look at the recursive function: it calls itself. Now look at the generated assembly. There is no such function call anymore!

The tail recursion has been optimized to a simple loop from L15 to jne L15 which is exactly like the loop from L8 to jne L8.

##### Share on other sites
Pixel Shader assembly is better than anything you've ever seen. Seriously, I'm implementing a compiler right now and everything's not so hard (although it ain't a walk in the park...), but pixel shaders, man. What crazy guys.

In pixel shaders, you have n pipelines. This means you can process n pixels at once. Now, branching is really bad, because it stalls all pipelines, just for one pipeline. Branching is terrible. So what does the compiler do? It turns this code:

// 1: directional lightelse if (light_used == 1) {    L = -light_direction;}// 2: point lightelse if (light_used == 2) {    L = light_position - position;}

into something like this...

_itmp = light_used - 1L = (-light_direction * (1 - _itmp)) + ( (light_position - position) * _itmp );

Pow, branch removed. Faster. But it can do it with way more complex examples. It took me a while to think up something as simple as that, but it can handle way more involved logic. And it can do it with switch statements, everything. Incredible.

• 48
• 12
• 10
• 10
• 9
• ### Forum Statistics

• Total Topics
631378
• Total Posts
2999665
×