Kambiz

My C++ / D / C# benchmark!

71 posts in this topic

I ported Promit's C# code to Java. I also just woke up, so beware silly mistakes. This runs in about 22.7 seconds on an X2 3800+ @ 2GHz, to give a general idea. I don't have any C/C++/C# compilers installed right now, so I can't give any comparisons.

public class Pi
{
    public static void main( String[] args )
    {
        Pi pi = new Pi();
        pi.run( args );
    }

    private final int LONG_TIME = 4000;
    byte[] p;
    byte[] t;
    int q;

    private void run( String[] args )
    {
        int i;

        if( args.length == 1 )
        {
            q = Integer.parseInt( args[0] );
        }
        else
        {
            System.out.println( "Usage: pi [precision]" );
            return;
        }

        if( q < 0 )
        {
            System.out.println( "Precision was too low, running with precision of 0." );
            q = 0;
        }

        if( q > LONG_TIME )
        {
            System.out.println( "Be prepared to wait a while..." );
        }

        // Compute one more digit than we display to compensate for rounding
        q++;

        p = new byte[q + 1];
        t = new byte[q + 1];

        /* compute pi */
        long start = System.currentTimeMillis();
        arctan( 2 );
        arctan( 3 );
        mul4();
        long time = System.currentTimeMillis() - start;

        // Return to the number of digits we want to display
        q--;

        /* print pi */
        System.out.printf( "pi = %d.", p[0] );
        for( i = 1; i <= q; i++ )
            System.out.print( p[i] );
        System.out.println();
        System.out.printf( "%f seconds to compute pi with a precision of %d digits.", time / 1000.0, q );
    }

    /* accumulate arctan(1/s) into p[], term by term, until the terms vanish */
    void arctan( int s )
    {
        int n;

        t[0] = 1;
        div( s ); /* t[] = 1/s */
        add();
        n = 1;
        do
        {
            mul( n );
            div( s * s );
            div( n += 2 );
            if( ( ( n - 1 ) / 2 ) % 2 == 0 )
                add();
            else
                sub();
        } while( !tiszero() );
    }

    /* p[] += t[], one decimal digit per byte, with carry */
    void add()
    {
        int j;

        for( j = q; j >= 0; j-- )
        {
            if( t[j] + p[j] > 9 )
            {
                p[j] += (byte) ( t[j] - 10 );
                p[j - 1] += 1;
            }
            else
                p[j] += t[j];
        }
    }

    /* p[] -= t[], with borrow */
    void sub()
    {
        int j;

        for( j = q; j >= 0; j-- )
            if( p[j] < t[j] )
            {
                p[j] -= (byte) ( t[j] - 10 );
                p[j - 1] -= 1;
            }
            else
                p[j] -= t[j];
    }

    /* t[] *= multiplier; (int)( b * 0.1f ) stands in for b / 10 to dodge
       the integer-divide codegen issue this thread is about */
    void mul( int multiplier )
    {
        int b;
        int i;
        int carry = 0, digit;

        for( i = q; i >= 0; i-- )
        {
            b = ( t[i] * multiplier + carry );
            carry = (int) ( b * 0.1f );
            digit = b - 10 * carry;
            t[i] = (byte) digit;
        }
    }

    /* t[] /= divisor, using a precomputed float reciprocal instead of / and % */
    void div( int divisor )
    {
        int i, b;
        int quotient, remainder = 0;
        float fdiv = 1.0f / divisor;

        for( i = 0; i <= q; i++ )
        {
            b = ( 10 * remainder + t[i] );
            quotient = (int) ( b * fdiv );
            remainder = b - divisor * quotient;
            t[i] = (byte) quotient;
        }
    }

    /* p[] /= 4 (unused in this port, kept to match the original) */
    void div4()
    {
        int i, c, d = 0;

        for( i = 0; i <= q; i++ )
        {
            c = ( 10 * d + p[i] ) / 4;
            d = ( 10 * d + p[i] ) % 4;
            p[i] = (byte) c;
        }
    }

    /* p[] *= 4 */
    void mul4()
    {
        int i, c, d;

        d = c = 0;

        for( i = q; i >= 0; i-- )
        {
            d = ( p[i] * 4 + c ) % 10;
            c = ( p[i] * 4 + c ) / 10;
            p[i] = (byte) d;
        }
    }

    boolean tiszero()
    {
        int k;

        for( k = 0; k <= q; k++ )
            if( t[k] != 0 )
                return false;
        return true;
    }
}
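
For reference, the identity the program relies on (inferred from the code; the thread never states it) is

\pi = 4\left(\arctan\tfrac{1}{2} + \arctan\tfrac{1}{3}\right),

with each arctangent evaluated digit by digit from the Taylor series \arctan(1/s) = \sum_{k \ge 0} \frac{(-1)^k}{(2k+1)\,s^{2k+1}}; arctan() accumulates the series terms into p[] and mul4() supplies the final factor of 4.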

Quote:
Original post by h3r3tic
Looks like DMD is missing some optimization opportunities in the div and mul functions.
I did a little experimenting and found that the performance hinges on whether the / and % operations are done together with one divide instruction or separately with two. Doing them separately nearly doubles the benchmark time; the divide instruction is an expensive operation. All the benchmark really tests is whether that optimization is being done. The DMD and C# compilers are deficient in not doing it, but that isn't a deficiency in the languages, just in their compiler implementations.
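
To make that concrete, here is a minimal C++ sketch (mine, not from the thread) of the benchmark's inner division loop, written so the two operations sit side by side. On x86 a single idiv leaves the quotient in eax and the remainder in edx, so an optimizer that spots the pair can serve both lines with one instruction; a compiler that misses it pays for two divides per digit.

/* t[] /= divisor -- the / and % on the same operands are foldable
   into a single idiv by a sufficiently smart compiler. */
void divLoop( signed char t[], int q, int divisor )
{
    int remainder = 0;
    for( int i = 0; i <= q; i++ )
    {
        int b = 10 * remainder + t[i];
        t[i] = (signed char) ( b / divisor ); /* quotient */
        remainder = b % divisor;              /* same idiv result */
    }
}
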
I'm a little off topic here, but it took 1.672 seconds to calculate pi to 10,000 places in Lisp ^-^ (I didn't write the code).

(defun pi-atan (k n)
  (do* ((a 0)
        (w (* n k))
        (k2 (* k k))
        (i -1))
       ((= w 0) a)
    (setq w (truncate w k2))
    (incf i 2)
    (incf a (truncate w i))
    (setq w (truncate w k2))
    (incf i 2)
    (decf a (truncate w i))))

(defun calc-pi (digits)
  (let* ((n digits)
         (m (+ n 3))
         (tenpower (expt 10 m)))
    (values
     (truncate
      (-
       (+ (pi-atan 18 (* tenpower 48))
          (pi-atan 57 (* tenpower 32)))
       (pi-atan 239 (* tenpower 20)))
      1000))))
...and Mathematica does it in 0.05 seconds.


O'Caml version (based on the OP's version):

let main args =
  let q =
    if Array.length args = 2
    then int_of_string args.(1) + 1
    else 101
  in
  let p = Array.make (q + 1) 0
  and t = Array.make (q + 1) 0
  in
  let add () =
    for j = q downto 0 do
      if t.(j) + p.(j) > 9
      then (
        p.(j) <- p.(j) + t.(j) - 10;
        p.(j - 1) <- p.(j - 1) + 1
      )
      else
        p.(j) <- p.(j) + t.(j)
    done
  and sub () =
    for j = q downto 0 do
      if p.(j) < t.(j)
      then (
        p.(j) <- p.(j) - t.(j) + 10;
        p.(j - 1) <- p.(j - 1) - 1
      )
      else
        p.(j) <- p.(j) - t.(j)
    done
  and mul multiplier =
    let carry = ref 0
    and digit = ref 0
    and b = ref 0
    in
    for i = q downto 0 do
      b := t.(i) * multiplier + !carry;
      digit := !b mod 10;
      carry := !b / 10;
      t.(i) <- !digit
    done
  and div divisor =
    let quotient = ref 0
    and remainder = ref 0
    and b = ref 0
    in
    for i = 0 to q do
      b := 10 * !remainder + t.(i);
      quotient := !b / divisor;
      remainder := !b mod divisor;
      t.(i) <- !quotient
    done
  and mul4 () =
    let c = ref 0
    and d = ref 0
    in
    for i = q downto 0 do
      d := (p.(i) * 4 + !c) mod 10;
      c := (p.(i) * 4 + !c) / 10;
      p.(i) <- !d
    done
  and tiszero () =
    let k = ref 0
    and r = ref true
    in
    while (!k <= q) && !r do
      r := t.(!k) = 0;
      k := !k + 1
    done;
    !r
  in
  let arctan s =
    let n = ref 1
    in
    let loop () =
      mul !n;
      div (s * s);
      n := !n + 2;
      div !n;
      if ((!n - 1) / 2) mod 2 = 0
      then add ()
      else sub ()
    in
    t.(0) <- 1;
    div s;
    add ();
    loop ();
    while not (tiszero ()) do
      loop ()
    done
  in
  let show array =
    Array.iter print_int array
  and run () =
    arctan 2;
    arctan 3;
    mul4 ()
  in
  let time () =
    let before = Sys.time ()
    in
    run ();
    let after = Sys.time ()
    in
    show p;
    print_newline ();
    Printf.printf "Time measured: %f" (after -. before)
  in
  Printf.printf "Computing pi (%d digits)..." (q - 1);
  time ();;

main Sys.argv;;



C++: 27s
C#: 53.6s
O'Caml: 60.7s
O'Caml: 61.98 s for your version, 51.29 s for mine. With array bounds-checking disabled: 59.71 s for your version, 50.04 s for mine.

My code is given below. The numbers in comments are execution counts for each line of code when computing 1000 digits. The modifications relative to your code mainly consisted of removing the unnecessary indirections (references) you added. The original C code doesn't use pointers-to-variables, so why should we?


let args = Sys.argv
let q =
  if Array.length args = 2
  then int_of_string args.(1) + 1
  else 101

let p = Array.make (q + 1) 0
let t = Array.make (q + 1) 0

let main2 () =
  (* 2 *) let add () =
    (* 1486 *) for j = q downto 0 do
      if t.(j) + p.(j) > 9
      then (
        p.(j) <- p.(j) + t.(j) - 10;
        p.(j - 1) <- p.(j - 1) + 1
      )
      else
        p.(j) <- p.(j) + t.(j)
    done
  and sub () =
    (* 1486 *) for j = q downto 0 do
      if p.(j) < t.(j)
      then (
        p.(j) <- p.(j) - t.(j) + 10;
        p.(j - 1) <- p.(j - 1) - 1
      )
      else
        p.(j) <- p.(j) - t.(j)
    done

  and mul multiplier =
    (* 2968 *) let rec mul i carry =
      (* 2737504 *) if i >= 0 then
        let b = t.(i) * multiplier + carry in
        let carry = b / 10 in
        (t.(i) <- b mod 10; mul (i - 1) carry)
    in mul q 0

  and div divisor =
    (* 5940 *) let rec div i remainder =
      (* 5477220 *) if i <= q then
        let b = 10 * remainder + t.(i) in
        let remainder = b mod divisor in
        (t.(i) <- b / divisor; div (i + 1) remainder)
    in div 0 0

  and mul4 () =
    (* 2 *) let rec mul i c =
      (* 1106 *) if i >= 0 then
        let a = p.(i) * 4 + c in
        let d = a mod 10 in
        let c = a / 10 in
        (p.(i) <- d; mul (i - 1) c)
    in mul q 0

  and tiszero () =
    (* 2968 *) let rec check i =
      (* 1375939 *) (i > q) || (t.(i) = 0 && check (i + 1))
    in check 0

  in let arctan s =
    (* 4 *) let sqs = s * s in
    let rec loop n =
      (* 2968 *) mul n;
      div sqs;
      let n = n + 2 in
      div n;
      if ((n - 1) / 2) mod 2 = 0 then add () else sub ();
      if not (tiszero ()) then loop n
    in
    t.(0) <- 1;
    div s;
    add ();
    loop 1
  in
  let show array =
    (* 2 *) Array.iter print_int array
  and run () =
    (* 2 *) arctan 2;
    arctan 3;
    mul4 ()
  in
  let time () =
    (* 2 *) let before = Sys.time () in
    let _ = run () in
    let after = Sys.time () in
    show p;
    print_newline ();
    Printf.printf "Time measured: %f" (after -. before)
  in
  Printf.printf "Computing pi (%d digits)..." (q - 1);
  print_newline ();
  time ();;

main2 ();;
Quote:
Original post by doctorsixstring
What version of .NET was used for the tests?
.NET 2.0


I have tested gumpy macdrunken's Java code; here is the result:
15.578000 seconds to compute pi with a precision of 10000 digits.
Promit's C# version:
19.443 seconds to compute pi with a precision of 10000 digits.

I think many people here misunderstood me:
1) I'm not searching for a fast way to compute pi.
2) I'm not interested in optimizing the original program.

I just saw the pi.d sample and noticed that I did not need to change any of the important functions (add, sub, mul, ...) to make the program run in C/C++ and C#. (I just had to add some casts.)
D, C++, and C# have a similar syntax, so I thought "give them the same code and see what happens!"

Why can the MS C++ compiler optimize those mul and div functions but the C# compiler cannot?

[Edited by - Kambiz on October 7, 2006 5:29:26 PM]
Good to see some discussion -- I was hoping this thread would pick up some steam.

First of all, with regard to the Java version -- how does it compare on client and server VMs? I don't really want to set up a full Java environment right now. Also, how does Java do with the original version of the code, without my modifications?

Second, I intentionally withheld discussion of why I made the changes I did, and what the implications were. It was mainly to see what others would say first, instead of me jumping full into the discussion from the beginning. So here's my perspective on the matter...

I started things off with NProf, just so that I didn't actually have to understand how this program works. Div and mul obviously account for the vast majority of CPU time. Notice that we're working with C code's ideal case here -- it is reasonable to assume that the C version cannot be beaten. (By the way, the C code is really tight. I couldn't find any modification that made a dent in the execution time without falling back to MMX.) Looking at the functions, they're completely trivial operations, with the exception of the divide and modulus. Now on x86, the idiv instruction computes both the division and modulus results at the same time, and the generated assembly listing from VC makes it clear that the compiler has leveraged this optimization.

Some quick googling about C# math performance followed, which turned up a suggestion that the C# compiler and CLR do not optimize division constructs particularly well. So at that point I began to suspect that replacing the division and modulus would make an impact, and as you can see, it did. Note that the replacement is kind of crude -- it drags in the FPU when there's really no reason to involve it. More to the point, super-micro-benchmarks would most likely show that conversion from floating point to int is itself a performance problem. In fact, I expected that bringing the C# optimizations back to C++ would hurt the C++ times, but it didn't.
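
For illustration, here is a rough C++ rendering of that workaround (my sketch of the trick, mirroring the modified mul/div in the Java port above, not Promit's exact code): replace the integer divide with a multiply by a precomputed float reciprocal, then recover the remainder by subtraction.

/* Hack: divide-free b / 10 and b % 10. Only safe for the small,
   non-negative b values this program produces -- the float reciprocal
   is not accurate enough for a general-purpose replacement. */
int divmod10( int b, int *remainder )
{
    int quotient = (int) ( b * 0.1f ); /* replaces b / 10 */
    *remainder = b - 10 * quotient;    /* replaces b % 10 */
    return quotient;
}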

On later reflection, I decided that DMD probably botches this somewhat machine specific optimization in exactly the same way. The results posted later built with DM C++, combined with the substantially better DMD results using my modifications, support this conclusion. I strongly suspect the JVM will screw this up too.

In other words, all I did was a crude workaround for a more fundamental optimization problem. I didn't use any special C# functionality or features; I just wrote a hack that dodges the bug. This is a problem that we can expect to vanish in the next...oh, I don't know, let's say something vague like the next 5-10 years. It's pretty obvious that math performance is hardly ideal in the managed world -- and as I said at the beginning, we're playing on C's home turf. We can't win here.

Lastly, realize that if someone were to break out the MMX intrinsics, everything except C would be seriously screwed. You'd be looking at times for the C code that were about 1/4 to 1/3 of what they are now. DMD, C#, and Java wouldn't be able to touch that -- and we can't even do that kind of autovectorization statically, let alone in a JITter.

Let's be frank: the non-C languages here all manage a positively lame turnout. DMD, C#, and Java are all very respectable after my modifications, and certainly adequate for nearly all real-world applications, but it took extra work to get there. In my case it was only 15 minutes because I knew what I was doing -- for a less experienced developer, or more complex code, the story could get much worse. There are lots of situations where C# and Java can outperform C/C++ code, but tight math and array accesses are obviously not among them.

[Edited by - Promit on October 7, 2006 8:18:19 PM]
Quote:
Original post by deathkrush
The fastest way to calculate PI in C is of course:

*** Source Snippet Removed ***


Hey, I loved the snippet!
Though it wouldn't compile as given (F_00() is used before it is defined), so I rearranged the code blocks.

Then it crunches out the number '0.250'.

So I might add: it may be a great way to calculate pi, compact and cool-looking, but it lacks one extra nicety we would like in a program that calculates pi: producing the correct result.
On all other accounts it's cool though! :)

- Jacob
Quote:
Original post by Promit
On later reflection, I decided that DMD probably botches this somewhat machine specific optimization in exactly the same way.

Examining the assembler output of DMD shows that what you suspect is true. Strangely, DMD does do that optimization in other cases, so it looks like a compiler problem that can be easily corrected.

Quote:
This is a problem that we can expect to vanish in the next...oh, I don't know, let's say something vague like the next 5-10 years.
Or maybe in the next update, if it is an easily correctable problem?

Quote:
It's pretty obvious that math performance is hardly ideal in the managed world -- and as I said at the beginning, we're playing on C's home turf. We can't win here.

C is at a disadvantage with heavy numerical work because of the aliasing problem. It's why Fortran is still preferred (and to be sure, Fortran compilers have gotten the benefit of decades of optimization work by legions of very smart programmers).

Quote:
Lastly, realize that if someone were to break out the MMX intrinsics, everything except C would be seriously screwed. You'd be looking at times for the C code that were about 1/4 to 1/3 what they are now. DMD, C#, and Java wouldn't be able to touch that -- and we can't even do autovectorization like that statically, let alone in a JITter.


I don't know much about the intrinsics needed to support MMX, but since DMD already has built-in intrinsics for some functions (like sin(), cos(), etc.), adding more doesn't seem any harder than adding intrinsics to C. D also has an inline assembler with MMX support, so those instructions can be reached that way.

Quote:
Let's be frank. The non-C languages here all manage to make a positively lame turnout. DMD, C#, and Java are all very respectable after my modifications, and certainly adequate for nearly all real world applications, but it took extra work to get there. In my case it was only 15 minutes because I knew what I was doing -- for a less experienced developer, or more complex code, the story could get much worse. There are lots of situations where C# and Java can outperform C/C++ code, but tight math and array accesses are obviously not one of them.


The substantial difference here is little more than an optimization folding / and % into one instruction rather than two. I don't see any reason why C, C++, D, Java, or C# cannot do that, nor does it seem like a particularly hard optimization to implement. All you're seeing is the result of a special case put into the optimizer, not any fundamental language difference.
Quote:
Original post by Stachel
Quote:
This is a problem that we can expect to vanish in the next...oh, I don't know, let's say something vague like the next 5-10 years.
Or maybe in the next update, if it is an easily correctable problem?
It's a damn lucky day when an optimizer problem is easily corrected -- optimization is dangerous business. Though I was actually referring more to the JITted languages.
Quote:
C is at a disadvantage with heavy numerical work because of the aliasing problem. It's why Fortran is still preferred (and to be sure, Fortran compilers have gotten the benefit of decades of optimization work by legions of very smart programmers).
Looking at the assembly output of the C version, I don't see anything left to fix -- what you're saying may be true for more complex mathematical work, but for this program the C version is as fast as it gets without resorting to machine-specific code.
Quote:
I don't know much about the intrinsics to support MMX, but since DMD already has built in intrinsics for some functions (like sin(), cos(), etc.), then adding more doesn't seem like it would be any more of a problem than it would be to add intrinsics to C. D has an inline assembler with support for MMX, so those instructions can be gotten at that way.
Intrinsic support itself is relatively easy to do. The real benefit of intrinsic support is that the optimizer can be conscious of your "assembly" when you use them, and that's a more difficult challenge. It's not an insurmountable challenge -- VC and GCC do it, after all. I'd put my hopes on GDC rather than DMD here.
Quote:
The substantial difference here is little more than an optimization folding /% into one instruction rather than two. I don't see any reason why C, C++, D, Java, or C# cannot do that, nor does it seem like a particularly hard optimization to do. All you're seeing is the result of a special case put into the optimizer, not any fundamental language difference.
Sure (with the exception of support for vectorized instruction sets, which Java and C# lack). The problem is that introducing a lot of these features into a compiler is difficult to do well and requires extensive testing. I can't claim to know why the CLR can't do this optimization. There's no reason D can't do it, but it doesn't, which leads back to the original objection I expressed in the other thread -- D lacks the developer support and maturity that the other languages, C and C++ in particular, have. Without that, they're screwed.
Quote:
Original post by Promit
Quote:
Original post by Stachel
Quote:
This is a problem that we can expect to vanish in the next...oh, I don't know, let's say something vague like the next 5-10 years.
Or maybe in the next update, if it is an easily correctable problem?
It's a damn lucky day when an optimizer problem is easily corrected -- optimization is dangerous business. Though I was actually referring more to the JITted languages.


I would agree that just sticking a programmer on an optimizer and expecting him to make non-breaking improvements is unreasonable. But for the experts who work on them, such problems may very well be easily corrected.

Quote:
Quote:
C is at a disadvantage with heavy numerical work because of the aliasing problem. It's why Fortran is still preferred (and to be sure, Fortran compilers have gotten the benefit of decades of optimization work by legions of very smart programmers).
Looking at the assembly output of the C version, I don't see anything left to fix -- what you're saying may be true for more complex mathematical work, but for this program the C version is as fast as it gets without resorting to machine-specific code.


Where Fortran shines is its ability to replace entire loops with parallel vectorized code. With C's aliasing problem, such transformations are very nearly impossible, though the new restrict pointer type may help.
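
As a hypothetical illustration of the aliasing point (not from the post): with plain pointers the compiler must assume the arrays may overlap and so reloads memory on every iteration; promising no overlap is what frees a vectorizer to rewrite the loop. In C99 that promise is spelled restrict; most C++ compilers accept the __restrict extension used below.

/* With __restrict the compiler may assume out, a, and b never alias,
   so iterations are independent and the loop can be vectorized. */
void add_arrays( float * __restrict out, const float * __restrict a,
                 const float * __restrict b, int n )
{
    for( int i = 0; i < n; i++ )
        out[i] = a[i] + b[i];
}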

Quote:
Quote:
I don't know much about the intrinsics needed to support MMX, but since DMD already has built-in intrinsics for some functions (like sin(), cos(), etc.), adding more doesn't seem any harder than adding intrinsics to C. D also has an inline assembler with MMX support, so those instructions can be reached that way.
Intrinsic support itself is relatively easy to do. The real benefit of intrinsic support is that the optimizer can be conscious of your "assembly" when you use them, and that's a more difficult challenge. It's not an insurmountable challenge -- VC and GCC do it, after all. I'd put my hopes on GDC rather than DMD here.


That makes sense. GDC does leverage all the ongoing work done on gcc. There's no technical reason D should be at a disadvantage relative to C.

Quote:
Quote:
The substantial difference here is little more than an optimization folding / and % into one instruction rather than two. I don't see any reason why C, C++, D, Java, or C# cannot do that, nor does it seem like a particularly hard optimization to implement. All you're seeing is the result of a special case put into the optimizer, not any fundamental language difference.
Sure (with the exception of support for vectorized instruction sets, which Java and C# lack). The problem is that introducing a lot of these features into a compiler is difficult to do well and requires extensive testing. I can't claim to know why the CLR can't do this optimization. There's no reason D can't do it, but it doesn't, which leads back to the original objection I expressed in the other thread -- D lacks the developer support and maturity that the other languages, C and C++ in particular, have. Without that, they're screwed.


This is why D is explicitly built to leverage existing technology. Both DMD and GDC are built on top of well-developed C compilers. C libraries are directly accessible from D, and C header files can be mechanically converted to D. I think you place too much stock in the /% optimization; in my experiments DMD does that optimization in other circumstances.
Quote:
Original post by Promit
First of all, with regard to the Java version -- how does it compare on client and server VMs? I don't really want to set up a full Java environment right now. Also, how does Java do with the original version of the code, without my modifications?

For the Java version I just made a new project in NetBeans and ran the program on JRE 1.6.0 (beta 2). (I also tested it from the command line.)
I replaced the modified functions with the original ones and tested again:

package test;

public class Main
{
    public static void main( String[] args )
    {
        Main pi = new Main();
        pi.run( args );
    }

    private final int LONG_TIME = 4000;
    byte[] p;
    byte[] t;
    int q;

    private void run( String[] args )
    {
        int i;

        if( args.length == 1 )
        {
            q = Integer.parseInt( args[0] );
        }
        else
        {
            System.out.println( "Usage: pi [precision]" );
            return;
        }

        if( q < 0 )
        {
            System.out.println( "Precision was too low, running with precision of 0." );
            q = 0;
        }

        if( q > LONG_TIME )
        {
            System.out.println( "Be prepared to wait a while..." );
        }

        // Compute one more digit than we display to compensate for rounding
        q++;

        p = new byte[q + 1];
        t = new byte[q + 1];

        /* compute pi */
        long start = System.currentTimeMillis();
        arctan( 2 );
        arctan( 3 );
        mul4();
        long time = System.currentTimeMillis() - start;

        // Return to the number of digits we want to display
        q--;

        /* print pi */
        System.out.printf( "pi = %d.", p[0] );
        for( i = 1; i <= q; i++ )
            System.out.print( p[i] );
        System.out.println();
        System.out.printf( "%f seconds to compute pi with a precision of %d digits.", time / 1000.0, q );
    }

    void arctan( int s )
    {
        int n;

        t[0] = 1;
        div( s ); /* t[] = 1/s */
        add();
        n = 1;
        do
        {
            mul( n );
            div( s * s );
            div( n += 2 );
            if( ( ( n - 1 ) / 2 ) % 2 == 0 )
                add();
            else
                sub();
        } while( !tiszero() );
    }

    void add()
    {
        int j;

        for( j = q; j >= 0; j-- )
        {
            if( t[j] + p[j] > 9 )
            {
                p[j] += (byte) ( t[j] - 10 );
                p[j - 1] += 1;
            }
            else
                p[j] += t[j];
        }
    }

    void sub()
    {
        int j;

        for( j = q; j >= 0; j-- )
            if( p[j] < t[j] )
            {
                p[j] -= (byte) ( t[j] - 10 );
                p[j - 1] -= 1;
            }
            else
                p[j] -= t[j];
    }

    /* t[] *= multiplier -- original version, with plain / and % */
    void mul( int multiplier )
    {
        int b;
        int i;
        int carry = 0, digit = 0;

        for( i = q; i >= 0; i-- )
        {
            b = ( t[i] * multiplier + carry );
            digit = b % 10;
            carry = b / 10;
            t[i] = (byte) digit;
        }
    }

    /* t[] /= divisor -- original version, with plain / and % */
    void div( int divisor )
    {
        int i, b;
        int quotient, remainder = 0;

        for( i = 0; i <= q; i++ )
        {
            b = ( 10 * remainder + t[i] );
            quotient = b / divisor;
            remainder = b % divisor;
            t[i] = (byte) quotient;
        }
    }

    void div4()
    {
        int i, c, d = 0;

        for( i = 0; i <= q; i++ )
        {
            c = ( 10 * d + p[i] ) / 4;
            d = ( 10 * d + p[i] ) % 4;
            p[i] = (byte) c;
        }
    }

    void mul4()
    {
        int i, c, d;

        d = c = 0;

        for( i = q; i >= 0; i-- )
        {
            d = ( p[i] * 4 + c ) % 10;
            c = ( p[i] * 4 + c ) / 10;
            p[i] = (byte) d;
        }
    }

    boolean tiszero()
    {
        int k;

        for( k = 0; k <= q; k++ )
            if( t[k] != 0 )
                return false;
        return true;
    }
}



Result:
39.000000 seconds to compute pi with a precision of 10000 digits.
The equivalent C# test:
34.745 seconds to compute pi with a precision of 10000 digits.

I thought that JIT compilation could produce more efficient code because it can optimize for the target machine. Maybe I should just wait for the next versions of these compilers. C/C++ is a mature language, and it is not surprising that excellent compilers are available for it.

I do not understand why people are rating me down for this thread.
I just downloaded DMD 0.169, released half an hour ago. The change log contains:
"- Improved optimization of div/mod combinations."

pi.d compiled with DMD 0.168 took 20 seconds to run on my machine. Compiled with DMD 0.169, 14 seconds is all it takes. That's now only 2 seconds behind GDC and g++.

DMD.score++;
I downloaded the new D compiler:
DMD 0.169: 17 seconds to compute pi with a precision of 10000 digits.

DMD 0.168: 26 seconds to compute pi with a precision of 10000 digits.
MS C++ 2005: 15 seconds to compute pi with a precision of 10000 digits.

Interesting, D is now much faster.
Quote:
Original post by Kambiz
I thought that JIT compilation could produce more efficient code because it can optimize for the target machine.
But it has less time to compile. Obviously it can't spend a minute optimizing a small program that would run in 20 seconds without full optimizations, whereas C compilation can take as long as necessary to produce the best output code.
Kambiz, I encourage you to add the GDC D compiler to the charts :)
http://gdcwin.sourceforge.net/
Out of curiosity I implemented it in i386 machine code -- it ended up 2 seconds slower than the compiled C++ version. At a guess (apart from my shoddy code, of course) I'd say it's because the compiler was more willing to unroll loops than I was. :)
Nice benchmark, but of course it doesn't cover all performance-relevant aspects.

I would like to see a memory allocation benchmark, with lots of news and deletes. I guess the managed version will be better there.
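
Something like this minimal C++ sketch is the kind of test being suggested (hypothetical; the loop count and block size are arbitrary), with the same loop ported to C#/Java/D against their garbage-collected allocators:

#include <chrono>
#include <cstdio>

int main()
{
    const int N = 10000000; /* ten million new/delete pairs */
    auto start = std::chrono::steady_clock::now();
    for( int i = 0; i < N; i++ )
    {
        int *p = new int[16];
        p[0] = i;      /* touch the block; note a clever optimizer may still
                          elide paired new/delete, so a real harness should
                          keep the pointers live before freeing them */
        delete[] p;
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    std::printf( "%f seconds for %d new/delete pairs\n", elapsed.count(), N );
    return 0;
}
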
Quote:
Original post by h3r3tic
Kambiz, I encourage you to add the GDC D compiler to the charts :)
http://gdcwin.sourceforge.net/
I cannot compile using gdmd; I get this error message:
object.d: module object cannot read file 'object.d'
I have never used these GNU compilers; can you tell me the correct command-line arguments?
I have tried "gdmd pi.d -O -release" with "C:\MinGW\include\d\3.4.5" added to the PATH variable.
(I have to work now; I will try this again later.)

Quote:
Original post by cody
Nice benchmark, but of course it doesn't cover all performance-relevant aspects.

I would like to see a memory allocation benchmark, with lots of news and deletes. I guess the managed version will be better there.
I agree.
Quote:
Original post by Kambiz
I cannot compile using gdmd; I get this error message:
object.d: module object cannot read file 'object.d'
I have never used these GNU compilers; can you tell me the correct command-line arguments?
I have tried "gdmd pi.d -O -release" with "C:\MinGW\include\d\3.4.5" added to the PATH variable.
(I have to work now; I will try this again later.)

The error message means that it cannot find gphobos, the GDC standard library. It could be a MinGW or GDC installation problem, but I'm not sure what exactly in this case. You could try adding 'C:\MinGW\include\d\3.4.5' to your path. I have mostly used DMD, so I can't really help much here. On a last note, maybe it's a problem with GDC based on GCC 3.4.5? I've got the one based on 3.4.2 and it seems to work fine.
As for the command-line arguments, both
> gdmd -inline -release -O pi.d
and
> gdc -frelease -finline -O3 pi.d
worked for me.
Quote:
Original post by cody
Nice benchmark, but of course it doesn't cover all performance-relevant aspects.
Here's a benchmark showing that low-level code generation doesn't always matter; sometimes higher-level features matter much more:

http://www.digitalmars.com/d/cppstrings.d

Scroll down to see the benchmark.

Quote:
Original post by Kambiz
I'm just surprised a little: I thought that D shouldn't be much slower than C++, and I thought that C# would be much faster. Maybe there is some optimization option I have not used(?)

What do you think about the results?

Which version of DMD? With v0.169 of DMD (the latest, just posted) and the .NET 2.0 runtime I got:

Intel 2.2 GHz:
CPP 2.28 (cl /Ox /TP)
D 2.26 (dmd -O -inline -release)
C# 4.52 (csc /optimize+ /checked- /unsafe+)

AMD 3200+:
CPP 1.43
D 1.46
C# 3.25

Check this for some other comparisons:

Computer Language Shootout

