Sign in to follow this  
  • entries
    122
  • comments
    246
  • views
    89775

Parallel Programming and Primes

Sign in to follow this  

269 views

Tip: How to reverse a list in one line in F#:
let rev l = List.fold_left (fun xs x -> x :: xs) [] l
------------
So I've been looking to write up Part 3 of Functional programming on .NET. If you recall, I stated that F# was much more performant at factorising integers than the respective Nemerle code. But a good part of this was due more to implementation details (a minor 1 line change cut the Nemerle time down by 2/3 although it still remained much slower than the F# in general). So I decided to revisit the Nemerle code to see if I could rearchitecture the code in a more appropriate manner.

In addition to levaraging some number theory concepts I decided to play with some parallel programming concepts - since now, its much easier on .NET than in general - to make this program as efficient as possible. I will note that the following was done on a dual core laptop. I decided to look into the ParallelFX library.

In this article the following code caught my eye:
int sum = Parallel.Aggregate(0, 100, // iteration domain
0, // initial value
delegate(int i) { return (isPrime(i) ? i : 0) }, // apply
// on each element
delegate(int x, int y) { return x+y; } // combine results
);
The code sums the first 100 primes. Ah primes I thought, let me use this example to get my feet wet in the library. As stated in the article, the code is written in that manner so that no collisions in access occur. Essentially, every one hankering for the same thing must wait their turn. So the essence of parallel programming is what every child learns in kindergarten. If everyone talks at the same time then some points are going to be missed and we would be caught going over the same ground. Writing properly parallel code is a fine balancing act. This is probably why F# and functional languages are now suddenly in. Since properly referentially transparent code can be more easily written in them and then easily divided up to be sent out for parallelization. An interesting note is that without lambda expressions a library like Parallel FX would not make much sense.

To continue, in the current version of the library the Parallel.Aggregate method has been removed due to its redundancy. However, the top resource on the Task Parallel library is ofcourse the blog by its creators. On there the following broken code snippet was given (Warning does not compile):
static T ParallelAggregate(
int fromInclusive, int toExclusive, T seed,
Action selector, Action aggregator)
{
T result = seed;
object aggLock = new object();
Parallel.For(fromInclusive, toExclusive, () => initialValue, (i,state) =>
{
state.ThreadLocalState =
aggregator(state.ThreadLocalState, selector(i));
},
partial => { lock(aggLock) result = aggregator(partial, result); } );
return result;
}
Looking at the documentation the most likely candidate looked like it might be:
public static void For(
int fromInclusive,
int toExclusive,
Func threadLocalSelector,
Action> body,
Action threadLocalCleanup)
In the above example there was no way aggregarator could be of type Action and take selector as a parameter - Action always returns a void. Anyways I decided to rewrite the above snippet in Nemerle:
module ParallelAggr
public ParallelAggregate [a](fromInclusive : int, toExclusive : int, seed : a, selector : int -> a, aggregator : a * a -> a ) : a
mutable result = seed
def aggLock = object();
Parallel.For(fromInclusive, toExclusive, Func(fun(){seed;}),
Action(fun(i , state : ParallelState[a]){
state.ThreadLocalState = aggregator(state.ThreadLocalState, selector(i : int));
}), Action(fun (partialC) { lock(aggLock){ result = aggregator(partialC, result); }}));
result
and in F#:
let parallelAggregate (fromInclusive : int) (toExclusive : int) (seed : 'a) (selector : int -> 'a) (aggregator : 'a -> 'a -> 'a )  =
let result = ref(seed)
let aggLock = new Object()

Parallel.For(fromInclusive, toExclusive, new 'a Func((fun () -> seed)),
new Action>( (fun i state ->
(state.ThreadLocalState <- aggregator state.ThreadLocalState (selector i)) ) ) ,
new Action<'a>(fun partialC ->
result := lock (aggLock)(fun () -> aggregator partialC !result) ))

!result
And if you are interested I wrote similar code properly for C#:
static double  ParallelAggregate(
int fromInclusive, int toExclusive, double seed,
Func selector, Func aggregator)
{
//decimal d = new decimal ();
//Decimal.FromIn
var result = seed;
object aggLock = new object();
Parallel.For(fromInclusive, toExclusive, () => seed , (i, state) =>
{
state.ThreadLocalState =
aggregator(state.ThreadLocalState, selector( (int)i ));
},
partial => { lock (aggLock) result = aggregator(partial, result); });
return result;
}
Firstly as you may notice the library is biased towards C# 3.0 "lambda expressions". Indeed it is here that Nemerle outshines F#. Nemerle has some macros which eliminate the need for typing boiler plate object intialization code, reducing visual noice a fair bit. In fact in terms of how easily it integrates with CLR concepts I have found Nemerle to be preferable to F#. Ive been working on an asp.net website and have found Nemerle to be the better functional language in terms of integrating with the CLR as a stand alone language. Nemerle was built from the ground up to integrate more fully with OOP concepts and how it exposes its internals classes for other CLR languages. I find it to be an impressive blend of OOP and functional programming. It can hang just as easily with the ML and Haskell peeps as it can with the C# and Java crowd. F# is a bit more lightweight and more conservative about how it does traditional OOP and also a bit closer to the likes of haskell than nemerle.

Anyways to use, one simply calls the function and also passes the bodies of code as functions.
Nemerle:
sum = ParallelAggr.ParallelAggregate(2 , 10000001, 0.0, fun(i){match(Number.IsPrime2(i:int)){|true =>i :double ; | false => 0.0;}}, fun(x,y){x+y})

F#:
let sum2 = parallelAggregate 2 20000000 0.0 (fun i -> match (prime (Float.of_int i)) with
true -> Float.of_int i
| false -> 0.0) (fun x y -> x+y)
The reason Nemerle's code is all stuffed into one line is due to the fact that support for whitespace is not yet perfect in the language but using the parallel construct in F# is more natural and integrates so well it looks like it actually belongs... However the Nemerle code can be made to belong using a simple 1 line macro. As an example here I extended the syntax crappily (the result is placed in sum):



Haven written this code in the languages I decided to see how each language would do. I did not have a primality checking function for C# so I did a quick search on google and got this code from the gamedev forums which I used unmodified:

static bool IsPrime(double TestNumber)
{
double limit = System.Math.Sqrt(TestNumber) + 1;
if (TestNumber == 2)
return true;
if (TestNumber % 2 == 0) //Check if TestNumber is even
return false;
double x = 3;

//Devide until the square root is reached
for (; x < limit; x += 2)
{
if (TestNumber % x == 0)
return false;
}
return true;
}
I then decided to compare how all the languages fared agaisnt each other. So I wrote a little program that sums all primes in a range in C++ Win32 and C++ CLI as well. A few important things to empasize are: This is not a language speed test, such a notion is ill conceived. It just tells how well each language sums primes using a very simple primality check on my machine. One conclusion I do feel I can make is that Nemerle does not scale very well with space. The more space the algorithim is using the more rapid the perfomance degredation. However for most tasks this will not matter and a properly written algorithim sensitive to the language would get around (most of) this.

  • Nemerle, C# and both C++ versions all use the IsPrime function above.

  • For whatever reason up until 500,000 C++ Win32 is fastest. But thereafter C++ CLR is slightly faster.

  • The F# primality check is the old lazy evaluation based one I described in the last post. So it is not really a fair comparison since its code is different from the rest. The algorithim works best for larger ranges and approaches C#. From this I would venture a guess that F# is in general at least 95% as fast C# 3.0.

  • The F# code benefited the most from parallelization, often taking just a bit less than 1/2 the time on my dual core.

  • As stated the primality check in nemerle is that listed above. Given that, though Nemerle is the least performance of the lot the fact that it takes only 13 secs to find an sum all the primes to 2,000,000 is still of note.

  • For small bodies of work (e.g. to 5,000) the parallel library adds more overhead than it is worth - it never goes below ~18 secs

  • The C++ CLR version was the over all winner taking on average 1/3 the time as C#. Again this does not mean that in general C# is that much slower only that for this particular poor example it was on my machine.

  • On all the .NET languages: C++, F#, Nemerle compiling in release or debug mode had seemingly little effect on the speed. However it made a significant difference to the C++ win32 example.

  • I did not bother running the Nemerle to 50,000,000. What is there is actually only a projection.
The Data:
Sign in to follow this  


3 Comments


Recommended Comments

Yes! I have been looking for a language that would bring together all the best parts of C++, SML, and Ruby for a LONNNNGGGG while. Nemerle may just be that language! How have I never heard of this language?

Now...the real question is...did I install it correctly on my MacOSX box? Definitely questionable.

Share this comment


Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now