Sign in to follow this  
DevFred

optimizing compilers

Recommended Posts

I just fiddled around with gcc to see how well it optimizes tail recursion.
unsigned gcd(unsigned x, unsigned y)
{
    while (y)
    {
        int z = x % y;
        x = y;
        y = z;
    }
    return x;
}

unsigned recursive(unsigned x, unsigned y)
{
    return y == 0 ? x : recursive(y, x % y);
}

And with gcc -O2 -S it already compiles to the exact same assembly!
	.p2align 2,,3
.globl _gcd
	.def	_gcd;	.scl	2;	.type	32;	.endef
_gcd:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	12(%ebp), %ecx
	testl	%ecx, %ecx
	jne	L6
	jmp	L2
	.p2align 2,,3
L8:
	movl	%edx, %ecx
L6:
	xorl	%edx, %edx
	divl	%ecx
	movl	%ecx, %eax
	testl	%edx, %edx
	jne	L8
L2:
	leave
	ret
	.p2align 2,,3
.globl _recursive
	.def	_recursive;	.scl	2;	.type	32;	.endef
_recursive:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	12(%ebp), %ecx
	testl	%ecx, %ecx
	jne	L14
	jmp	L10
	.p2align 2,,3
L15:
	movl	%edx, %ecx
L14:
	xorl	%edx, %edx
	divl	%ecx
	movl	%ecx, %eax
	testl	%edx, %edx
	jne	L15
L10:
	leave
	ret

I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)

Share this post


Link to post
Share on other sites
Quote:
Original post by DevFred
And with gcc -O2 -S it already compiles to the exact same assembly!
Modern C compilers can usually handle basic tail-recursion. Unfortunately it's very easy to confuse the compiler with anything beyond the most trivial cases (e.g. any code where you take the address of a local variable is pretty much out) and you typically can't enable only this optimization for debug builds, so you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

Quote:
Original post by DevFredI must say I'm quite impressed. When was the last time you were astonished by your compiler? :)
I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a). Granted, that's a comparatively easy optimization to implement but it was the first time I actually learned a new bit-hack from a compiler.

Lately I've writing code for an AVR microcontroller using a commercial compiler from Imagecraft, and constantly amazed by the code it generates. Not in a good way though..

Share this post


Link to post
Share on other sites
Quote:
Original post by implicit
you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

But the recursive solution is so clean and elegant!
I guess my imperative mind has been spoiled by Haskell :)

Share this post


Link to post
Share on other sites
Quote:
Original post by implicit
I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a).


Shouldn't that be: convert (a & m) | (b & ~m) to ((a ^ b) & m) ^ b?

Share this post


Link to post
Share on other sites
Quote:
Original post by DevFred
I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)


When it rerolled a manually unrolled loop so it could version it and unroll it by a different factor.

Share this post


Link to post
Share on other sites
Quote:
Original post by AshleysBrain
Out of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?

Look at the recursive function: it calls itself. Now look at the generated assembly. There is no such function call anymore!

The tail recursion has been optimized to a simple loop from L15 to jne L15 which is exactly like the loop from L8 to jne L8.

Share this post


Link to post
Share on other sites
Pixel Shader assembly is better than anything you've ever seen. Seriously, I'm implementing a compiler right now and everything's not so hard (although it ain't a walk in the park...), but pixel shaders, man. What crazy guys.

In pixel shaders, you have n pipelines. This means you can process n pixels at once. Now, branching is really bad, because it stalls all pipelines, just for one pipeline. Branching is terrible. So what does the compiler do? It turns this code:


// 1: directional light
else if (light_used == 1) {
L = -light_direction;
}
// 2: point light
else if (light_used == 2) {
L = light_position - position;
}



into something like this...


_itmp = light_used - 1
L = (-light_direction * (1 - _itmp)) + ( (light_position - position) * _itmp );



Pow, branch removed. Faster. But it can do it with way more complex examples. It took me a while to think up something as simple as that, but it can handle way more involved logic. And it can do it with switch statements, everything. Incredible.


Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this