optimizing compilers

Started by
7 comments, last by _goat 14 years, 9 months ago
I just fiddled around with gcc to see how well it optimizes tail recursion.
unsigned gcd(unsigned x, unsigned y)
{
    while (y)
    {
        int z = x % y;
        x = y;
        y = z;
    }
    return x;
}

unsigned recursive(unsigned x, unsigned y)
{
    return y == 0 ? x : recursive(y, x % y);
}

And with gcc -O2 -S it already compiles to the exact same assembly!
	.p2align 2,,3
.globl _gcd
	.def	_gcd;	.scl	2;	.type	32;	.endef
_gcd:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	12(%ebp), %ecx
	testl	%ecx, %ecx
	jne	L6
	jmp	L2
	.p2align 2,,3
L8:
	movl	%edx, %ecx
L6:
	xorl	%edx, %edx
	divl	%ecx
	movl	%ecx, %eax
	testl	%edx, %edx
	jne	L8
L2:
	leave
	ret
	.p2align 2,,3
.globl _recursive
	.def	_recursive;	.scl	2;	.type	32;	.endef
_recursive:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	12(%ebp), %ecx
	testl	%ecx, %ecx
	jne	L14
	jmp	L10
	.p2align 2,,3
L15:
	movl	%edx, %ecx
L14:
	xorl	%edx, %edx
	divl	%ecx
	movl	%ecx, %eax
	testl	%edx, %edx
	jne	L15
L10:
	leave
	ret

I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)
Advertisement
Quote:Original post by DevFred
And with gcc -O2 -S it already compiles to the exact same assembly!
Modern C compilers can usually handle basic tail-recursion. Unfortunately it's very easy to confuse the compiler with anything beyond the most trivial cases (e.g. any code where you take the address of a local variable is pretty much out) and you typically can't enable only this optimization for debug builds, so you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

Quote:Original post by DevFredI must say I'm quite impressed. When was the last time you were astonished by your compiler? :)
I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a). Granted, that's a comparatively easy optimization to implement but it was the first time I actually learned a new bit-hack from a compiler.

Lately I've writing code for an AVR microcontroller using a commercial compiler from Imagecraft, and constantly amazed by the code it generates. Not in a good way though..
Quote:Original post by implicit
you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

But the recursive solution is so clean and elegant!
I guess my imperative mind has been spoiled by Haskell :)
Quote:Original post by implicit
I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a).


Shouldn't that be: convert (a & m) | (b & ~m) to ((a ^ b) & m) ^ b?
Widelands - laid back, free software strategy
Moving to General Programming. I trust there are no objections? :)
Quote:Original post by DevFred
I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)


When it rerolled a manually unrolled loop so it could version it and unroll it by a different factor.
Out of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?
Construct (Free open-source game creator)
Quote:Original post by AshleysBrain
Out of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?

Look at the recursive function: it calls itself. Now look at the generated assembly. There is no such function call anymore!

The tail recursion has been optimized to a simple loop from L15 to jne L15 which is exactly like the loop from L8 to jne L8.
Pixel Shader assembly is better than anything you've ever seen. Seriously, I'm implementing a compiler right now and everything's not so hard (although it ain't a walk in the park...), but pixel shaders, man. What crazy guys.

In pixel shaders, you have n pipelines. This means you can process n pixels at once. Now, branching is really bad, because it stalls all pipelines, just for one pipeline. Branching is terrible. So what does the compiler do? It turns this code:

// 1: directional lightelse if (light_used == 1) {    L = -light_direction;}// 2: point lightelse if (light_used == 2) {    L = light_position - position;}


into something like this...

_itmp = light_used - 1L = (-light_direction * (1 - _itmp)) + ( (light_position - position) * _itmp );


Pow, branch removed. Faster. But it can do it with way more complex examples. It took me a while to think up something as simple as that, but it can handle way more involved logic. And it can do it with switch statements, everything. Incredible.


[ search: google ][ programming: msdn | boost | opengl ][ languages: nihongo ]

This topic is closed to new replies.

Advertisement