my c++ d c# benchmark!

Started by
70 comments, last by Raghar 17 years, 6 months ago
Looks like DMD is missing some optimization oportunities in the div and mul functions. I tried using a similar modification to what Promit had posted:

import std.c.stdio;import std.c.stdlib;import std.c.time;const int LONG_TIME=4000;byte[] p;byte[] t;int q;int main(char[][] args){	int startime, endtime;	int i;	if (args.length == 2) {		sscanf(&args[1][0],"%d",&q);	} else {		printf("Usage: pi [precision]\n");		exit(55);	}	if (q < 0)	{		printf("Precision was too low, running with precision of 0.\n");		q = 0;	}	if (q > LONG_TIME)	{	    printf("Be prepared to wait a while...\n");	}	// Compute one more digit than we display to compensate for rounding	q++;	p.length = q + 1;	t.length = q + 1;	/* compute pi */	std.c.time.time(&startime);	arctan(2);	arctan(3);	mul4();	std.c.time.time(&endtime);	// Return to the number of digits we want to display	q--;	/* print pi */	printf("pi = %d.",cast(int)(p[0]));	for (i = 1; i <= q; i++)	printf("%d",cast(int)(p));	printf("\n");	printf("%ld seconds to compute pi with a precision of %d digits.\n",endtime-startime,q);	return 0;}void arctan(int s){	int n;	t[0] = 1;	div(s); /* t[] = 1/s */	add();	n = 1;	do {		mul(n);		div(s * s);		div(n += 2);		if (((n-1) / 2) % 2 == 0)			add();		else			sub();	} while (!tiszero());}void add(){	int j;	for (j = q; j >= 0; j--)	{		if (t[j] + p[j] > 9) {			p[j] += t[j] - 10;			p[j-1] += 1;		} else			p[j] += t[j];	}}void sub(){	int j;	for (j = q; j >= 0; j--)		if (p[j] < t[j]) {			p[j] -= t[j] - 10;			p[j-1] -= 1;		} else			p[j] -= t[j];}void mul(int multiplier){	int b;	int i;	int carry = 0, digit = 0;	for (i = q; i >= 0; i--) {		b = (t * multiplier + carry);		carry = b / 10;		digit = b - carry * 10;		t = digit;	}}/* t[] /= l */void div(int divisor){	int i, b;	int quotient, remainder = 0;	for (i = 0; i <= q; i++) {		b = (10 * remainder + t);		quotient = b / divisor;		remainder = b - divisor * quotient;		t = quotient;	}}void div4(){	int i, c, d = 0;	for (i = 0; i <= q; i++) {		c = (10 * d + p) / 4;		d = (10 * d + p) % 4;		p = c;	}}void mul4(){	int i, c, d;	d = c = 0;	for (i = q; i >= 0; i--) {		d = (p * 4 + c) % 10;		c = (p * 4 + c) / 10;		p = d;	}}int tiszero(){	int k;	for (k = 0; k <= q; k++)		if (t[k] != 0)			return false;	return true;}


Then I tested the original and modified program on two systems, on linux machine from my univ and my laptop running windows. Here are my results:

Linux 2.6.17-1.2142_FC4smp

Digital Mars D Compiler v0.168
dmd, no modifications: 50 seconds
dmd, modifications: 33 seconds
dmd flags: -release -inline -O

g++ (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
g++, no modifications: 29 seconds
g++, modifications: 32 seconds
g++ flags: -O3



Windows XP SP2 Pro

Digital Mars D Compiler v0.168
dmd, no modifications: 20 seconds
dmd, modifications: 15 seconds
dmd flags: -release -inline -O

g++ (GCC) 3.4.2 (mingw-special)
g++, no modifications: 12 seconds
g++, modifications: 14 seconds
g++ flags: -O3

gdc (GCC) 3.4.2 (mingw-special) (gdc 0.19, using dmd 0.162)
gdc, no modifications: 12 seconds
gdc, modifications: 14 seconds
gdc flags: -frelease -finline -O3



I ran each test twice and took the better result. I used a windows build of the GCC D Compiler available from the following page: http://gdcwin.sourceforge.net/
Advertisement
Quote:Original post by Anonymous Poster
I think this was fair. It's not like Promit's code is more idiomatic C#, it's just a tweak that happens to make .NET run faster. Similar tweaks could probably be found in C++ and D versions as well, and we could spend weeks optimizing each one. Why not actually compare versions that are near identical in code to save the trouble? It even looks to me like his code would make the algorithm wrong due to the fact that floating point math isn't as accurate, but I didn't check this..

Comparing identical versions is pointless unless you're comparing different compilers of the same langage. In other words, its only fair if you're comparing VC++ to g++. When comparing different languages, you have to make changes for each langage in order to keep the test fair...a tightly optimized program for language X may well run like shit in language Y. You should keep the algorithm the same, and that's it.

CM
Quote:Original post by doctorsixstring
Different languages work in different ways.
It's more of a comparison of compilers than languages here. As you can see, the codes are near identical. What we could conclude from Promit's tests is that the C++ compiler relieves the writer of the code from micro-optimizations more than the .NET compiler does. He did the same fix for C++ and the speed was unaffected. So when writing code casually for C++ without regard to micro-optimizations, you'd get 34/15 times faster code than with the .NET compiler. With .NET you'd have to spend extra effort doing the compiler's job to get down to 22/20. I think this already tells us something useful.
Quote:What I care about is the performance I'll achieve using those languages in the way they're designed.
Can you claim with a straight face that Promit's opitimization is how you would've written the code in the first place with C#, and you would've used the modulo operator on C++ instead? It's not like C# was "designed" to perform worse with the integer modulo/div combo than C++. It's just a quirk.
Quote:Original post by Conner McCloud
Comparing identical versions is pointless unless you're comparing different compilers of the same langage. In other words, its only fair if you're comparing VC++ to g++. When comparing different languages, you have to make changes for each langage in order to keep the test fair...a tightly optimized program for language X may well run like shit in language Y.
Even if we were comparing different compilers of the same language, one could use the same reasoning you do. One would want to optimize one's code for the particular compiler's quirks, no? And code tightly optimized for compiler X could run like shit when compiled with compiler Y.

For a code like this all three are essentially identical. None of them provide some super speed constructs that lack from the other languages that should be exploited for this code. Promit's changes still work for D and C++ directly. The original D code worked for C++ and C# (essentially) directly. I know there are tons of actual cases where language differences matter for speed, only this is not one of them. Promit's change was only to make the compiler happy, not to make the code "better C#".
Quote:Original post by Anonymous Poster
It's more of a comparison of compilers than languages here.


Comparing a C# compiler to a C or C++ compiler to a D compiler is a bit like comparing apples to oranges, don't you think?

Quote:Original post by Anonymous Poster
As you can see, the codes are near identical.


As I mentioned earlier, I think this is the wrong way to run a benchmark. What if the original example is written in a non-C language, and then ported to C in a way that is beneficial to the original language? Obviously that would be unfair.

Quote:Original post by Anonymous Poster
What we could conclude from Promit's tests is that the C++ compiler relieves the writer of the code from micro-optimizations more than the .NET compiler does. He did the same fix for C++ and the speed was unaffected. So when writing code casually for C++ without regard to micro-optimizations, you'd get 34/15 times faster code than with the .NET compiler. With .NET you'd have to spend extra effort doing the compiler's job to get down to 22/20. I think this already tells us something useful.


I agree that C++ has the edge over C# in this case. It is interesting that C#'s modulus operator is so much slower that C++. What version of .NET was used for the tests? Has anyone compared it to non-MS implementations like Mono?

Quote:Original post by Anonymous Poster
Can you claim with a straight face that Promit's opitimization is how you would've written the code in the first place with C#, and you would've used the modulo operator on C++ instead?


No, I honestly would never have thought to use Promit's code instead of the modulus.

Quote:Original post by Anonymous Poster
It's not like C# was "designed" to perform worse with the integer modulo/div combo than C++. It's just a quirk.


I agree. It would be interesting to see if this quirk is common across all versions and implementations of .NET.

- Mike
i ported promit's c# code to java. i also just woke up, so beware silly mistakes. this runs about 22.7 seconds on a x2 3800+ @2ghz to give a general idea. i don't have any c/c++/c# compilers installed right now so i can't give any comparisons.
public class Pi{	public static void main( String[] args )	{		Pi pi = new Pi();		pi.run( args );	}	private final int LONG_TIME = 4000;	byte[] p;	byte[] t;	int q;	private void run( String[] args )	{		int i;		if( args.length == 1 )		{			q = Integer.parseInt( args[0] );		}		else		{			System.out.println( "Usage: pi [precision]" );			return;		}		if( q < 0 )		{			System.out.println( "Precision was too low, running with precision of 0." );			q = 0;		}		if( q > LONG_TIME )		{			System.out.println( "Be prepared to wait a while..." );		}		// Compute one more digit than we display to compensate for rounding		q++;		p = new byte[q + 1];		t = new byte[q + 1];		/* compute pi */                long start = System.currentTimeMillis();		arctan( 2 );		arctan( 3 );		mul4();                long time = System.currentTimeMillis() - start;		// Return to the number of digits we want to display		q--;		/* print pi */		System.out.printf( "pi = %d.", p[0] );		for( i = 1; i <= q; i++ )			System.out.print( p );		System.out.println();		System.out.printf( "%f seconds to compute pi with a precision of %d digits.", time / 1000.0, q );		return;	}	void arctan( int s )	{		int n;		t[0] = 1;		div( s ); /* t[] = 1/s */		add();		n = 1;		do		{			mul( n );			div( s * s );			div( n += 2 );			if( ( ( n - 1 ) / 2 ) % 2 == 0 )				add();			else				sub();		} while( !tiszero() );	}	void add()	{		int j;		for( j = q; j >= 0; j-- )		{			if( t[j] + p[j] > 9 )			{				p[j] += (byte) ( t[j] - 10 );				p[j - 1] += 1;			}			else				p[j] += t[j];		}	}	void sub()	{		int j;		for( j = q; j >= 0; j-- )			if( p[j] < t[j] )			{				p[j] -= (byte) ( t[j] - 10 );				p[j - 1] -= 1;			}			else				p[j] -= t[j];	}	void mul( int multiplier )	{		int b;		int i;		int carry = 0, digit;		for( i = q; i >= 0; i-- )		{			b = ( t * multiplier + carry );			carry = (int) ( b * 0.1f );			digit = b - 10 * carry;			t = (byte) digit;		}	}	/* t[] /= l */	void div( int divisor )	{		int i, b;		int quotient, remainder = 0;		float fdiv = 1.0f / divisor;		for( i = 0; i <= q; i++ )		{			b = ( 10 * remainder + t );			quotient = (int) ( b * fdiv );			remainder = b - divisor * quotient;			t = (byte) quotient;		}	}	void div4()	{		int i, c, d = 0;		for( i = 0; i <= q; i++ )		{			c = ( 10 * d + p ) / 4;			d = ( 10 * d + p ) % 4;			p = (byte) c;		}	}	void mul4()	{		int i, c, d;		d = c = 0;		for( i = q; i >= 0; i-- )		{			d = ( p * 4 + c ) % 10;			c = ( p * 4 + c ) / 10;			p = (byte) d;		}	}	boolean tiszero()	{		int k;		for( k = 0; k <= q; k++ )			if( t[k] != 0 )				return false;		return true;	}}
This space for rent.
Quote:Original post by h3r3tic
Looks like DMD is missing some optimization oportunities in the div and mul functions.
I did a little experimenting and found out that the performance hinges upon whether the / and % operations are done together with one divide instruction, or separately with two. Doing them separately nearly doubles the benchmark time. The divide instruction is one expensive operation. All the benchmark does is test whether that optimization is being done or not. The DMD and C# compilers are deficient in not doing it, but that isn't a deficiency in the languages, just their compiler implementations.
I'm a little off topic here, but it took 1.672 seconds to calculate pi to 10,000 places in Lisp ^-^ (I didn't write the code).

(defun pi-atan (k n)
(do* ((a 0)
(w (* n k))
(k2 (* k k))
(i -1))
((= w 0) a)
(setq w (truncate w k2))
(incf i 2)
(incf a (truncate w i))
(setq w (truncate w k2))
(incf i 2)
(decf a (truncate w i))))

(defun calc-pi (digits)
(let* ((n digits)
(m (+ n 3))
(tenpower (expt 10 m)))
(values
(truncate
(-
(+ (pi-atan 18 (* tenpower 48))
(pi-atan 57 (* tenpower 32)))
(pi-atan 239 (* tenpower 20)))
1000))))
Pants, I forgot the code tags.
...and Mathematica does it in 0.05 seconds.


O'Caml version (based on the OP's version):
let main args =  let q =    if Array.length args = 2    then int_of_string args.(1) + 1    else 101  in  let p = Array.make (q + 1) 0  and t = Array.make (q + 1) 0  in  let add () =    for j = q downto 0 do      if t.(j) + p.(j) > 9      then	(	 p.(j) <- p.(j) + t.(j) - 10;	 p.(j-1) <- p.(j-1) + 1	)      else	p.(j) <- p.(j) + t.(j)    done  and sub () =    for j = q downto 0 do      if p.(j) < t.(j)      then	(	 p.(j) <- p.(j) - t.(j) + 10;	 p.(j - 1) <- p.(j - 1) - 1	)      else	p.(j) <- p.(j) - t.(j)    done  and mul multiplier =    let carry = ref 0    and digit = ref 0    and b = ref 0    in    for i = q downto 0 do      b := t.(i) * multiplier + !carry;      digit := !b mod 10;      carry := !b / 10;      t.(i) <- !digit    done  and div divisor =    let quotient = ref 0    and remainder = ref 0    and b = ref 0    in    for i = 0 to q do      b := 10 * !remainder + t.(i);      quotient := !b / divisor;      remainder := !b mod divisor;      t.(i) <- !quotient    done  and mul4 () =    let c = ref 0    and d = ref 0    in    for i = q downto 0 do      d := (p.(i) * 4 + !c) mod 10;      c := (p.(i) * 4 + !c) / 10;      p.(i) <- !d    done  and tiszero () =    let k = ref 0    and r = ref true    in    while (!k <= q) && !r do      r := t.(!k) = 0;      k := !k + 1    done;    !r  in  let arctan s =    let n = ref 1    in    let loop () =      mul !n;      div (s * s);      n := !n + 2;      div !n;      if ((!n - 1) / 2) mod 2 = 0      then add ()      else sub ()    in    t.(0) <- 1;    div s;    add ();    loop ();    while not (tiszero ()) do      loop ()    done  in  let show array =    Array.iter print_int array  and run () =    arctan 2;    arctan 3;    mul4 ()  in  let time () =    let before = Sys.time ()    in    run ();    let after = Sys.time ()    in    show p;    print_newline();    Printf.printf "Time measured: %f" (after -. before)  in  Printf.printf "Computing pi (%d digits)..." (q - 1);  time ();;main Sys.argv;;


C++: 27s
C#: 53.6s
O'Caml: 60.7s

This topic is closed to new replies.

Advertisement