Back to General and Gameplay Programming

optimizing compilers

General and Gameplay Programming Programming

Started by DevFred July 21, 2009 02:57 AM

7 comments, last by _goat 14 years, 9 months ago

DevFred

840

Author

July 21, 2009 02:57 AM

I just fiddled around with gcc to see how well it optimizes tail recursion.

unsigned gcd(unsigned x, unsigned y)
{
    while (y)
    {
        int z = x % y;
        x = y;
        y = z;
    }
    return x;
}

unsigned recursive(unsigned x, unsigned y)
{
    return y == 0 ? x : recursive(y, x % y);
}

And with gcc -O2 -S it already compiles to the exact same assembly!

	.p2align 2,,3
.globl _gcd
	.def	_gcd;	.scl	2;	.type	32;	.endef
_gcd:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	12(%ebp), %ecx
	testl	%ecx, %ecx
	jne	L6
	jmp	L2
	.p2align 2,,3
L8:
	movl	%edx, %ecx
L6:
	xorl	%edx, %edx
	divl	%ecx
	movl	%ecx, %eax
	testl	%edx, %edx
	jne	L8
L2:
	leave
	ret
	.p2align 2,,3
.globl _recursive
	.def	_recursive;	.scl	2;	.type	32;	.endef
_recursive:
	pushl	%ebp
	movl	%esp, %ebp
	movl	8(%ebp), %eax
	movl	12(%ebp), %ecx
	testl	%ecx, %ecx
	jne	L14
	jmp	L10
	.p2align 2,,3
L15:
	movl	%edx, %ecx
L14:
	xorl	%edx, %edx
	divl	%ecx
	movl	%ecx, %eax
	testl	%edx, %edx
	jne	L15
L10:
	leave
	ret

I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)

implicit

504

July 21, 2009 06:03 AM

Quote:Original post by DevFred
And with gcc -O2 -S it already compiles to the exact same assembly!

Modern C compilers can usually handle basic tail-recursion. Unfortunately it's very easy to confuse the compiler with anything beyond the most trivial cases (e.g. any code where you take the address of a local variable is pretty much out) and you typically can't enable only this optimization for debug builds, so you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

Quote:Original post by DevFredI must say I'm quite impressed. When was the last time you were astonished by your compiler? :)

I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a). Granted, that's a comparatively easy optimization to implement but it was the first time I actually learned a new bit-hack from a compiler.

Lately I've writing code for an AVR microcontroller using a commercial compiler from Imagecraft, and constantly amazed by the code it generates. Not in a good way though..

DevFred

840

Author

July 21, 2009 06:12 AM

Quote:Original post by implicit
you still can't rely on tail-recursion as a general loop mechanism unless you want to risk blowing the stack.

But the recursive solution is so clean and elegant!
I guess my imperative mind has been spoiled by Haskell :)

Prefect

373

July 21, 2009 06:51 AM

Quote:Original post by implicit
I remember seeing MSVC optimize a bit-merge expression (a & m | b ~m) into an equivalent two-address sequence ((a ^ b & m) ^ a).

Shouldn't that be: convert (a & m) | (b & ~m) to ((a ^ b) & m) ^ b?

Widelands - laid back, free software strategy

Zahlman

1,682

July 21, 2009 09:42 AM

Moving to General Programming. I trust there are no objections? :)

outRider

852

July 21, 2009 10:13 AM

Quote:Original post by DevFred
I must say I'm quite impressed. When was the last time you were astonished by your compiler? :)

When it rerolled a manually unrolled loop so it could version it and unroll it by a different factor.

AshleysBrain

162

July 21, 2009 02:46 PM

Out of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?

Construct (Free open-source game creator)

DevFred

840

Author

July 21, 2009 05:14 PM

Quote:Original post by AshleysBrain
Out of curiosity, I don't know assembly too well, so anyone care to describe the optimisation, how it works and why it's cool?

Look at the recursive function: it calls itself. Now look at the generated assembly. There is no such function call anymore!

The tail recursion has been optimized to a simple loop from L15 to jne L15 which is exactly like the loop from L8 to jne L8.

_goat

804

July 21, 2009 08:30 PM

Pixel Shader assembly is better than anything you've ever seen. Seriously, I'm implementing a compiler right now and everything's not so hard (although it ain't a walk in the park...), but pixel shaders, man. What crazy guys.

In pixel shaders, you have n pipelines. This means you can process n pixels at once. Now, branching is really bad, because it stalls all pipelines, just for one pipeline. Branching is terrible. So what does the compiler do? It turns this code:

// 1: directional lightelse if (light_used == 1) {    L = -light_direction;}// 2: point lightelse if (light_used == 2) {    L = light_position - position;}

into something like this...

_itmp = light_used - 1L = (-light_direction * (1 - _itmp)) + ( (light_position - position) * _itmp );

Pow, branch removed. Faster. But it can do it with way more complex examples. It took me a while to think up something as simple as that, but it can handle way more involved logic. And it can do it with switch statements, everything. Incredible.

[ search: google ][ programming: msdn | boost | opengl ][ languages: nihongo ]

optimizing compilers

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

optimizing compilers

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines