GPGPU in a local network


9 replies to this topic

#1 Tsus   Members   -  Reputation: 1028


Posted 04 March 2012 - 02:52 PM

Hi guys!

I’m looking for a way to distribute some number crunching across multiple PCs in a network (we’re talking about 10 PCs). I primarily want to utilize their GPUs but using their CPUs might still give some extra speed.

Microsoft’s C++ AMP looks promising, since it runs C++ code on the CPU and, with the help of DirectCompute, on the GPU. Do any of you have experience with it? How flexible is it really? Can I share resources between AMP and DirectCompute or CUDA? How well does it map code to the GPU? Perhaps some hand-optimized code could be faster… Is it possible to add hand-optimized code for certain tasks?
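(Just to make the programming model concrete, here is roughly what a trivial kernel looks like in the current AMP beta; square_on_gpu is only a name I made up for illustration.)

#include <amp.h>
#include <vector>

// Squares a vector on whichever accelerator the AMP runtime picks.
void square_on_gpu(std::vector<float>& data)
{
    concurrency::array_view<float, 1> av(static_cast<int>(data.size()), data);
    concurrency::parallel_for_each(av.extent,
        [=](concurrency::index<1> idx) restrict(amp)
        {
            av[idx] = av[idx] * av[idx];
        });
    av.synchronize(); // copy the results back into the host vector
}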

I noticed Microsoft’s implementation of the Message Passing Interface. What else does it offer aside from message passing? I’ve seen some basic scheduling features and read about a monitoring tool. What are your experiences with this software? Do you know of better implementations of the Message Passing Interface? Is it easy to add new scheduling strategies? Does this run entirely on Windows 7?
And what is the current state of the art in distributed computing on local networks?
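(For reference, the overall structure I’m picturing with MPI is something like the minimal sketch below; the local "result" is just a stand-in for whatever each node’s GPU would actually compute.)

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Stand-in for the real task: each node would launch its GPU work here.
    int result = rank * rank;

    // Collect one result per process on rank 0.
    std::vector<int> results(rank == 0 ? size : 0);
    MPI_Gather(&result, 1, MPI_INT, results.data(), 1, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0)
        for (int i = 0; i < size; ++i)
            std::printf("node %d returned %d\n", i, results[i]);

    MPI_Finalize();
    return 0;
}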

Generally, I’m wondering whether C++ AMP and Microsoft’s MPI would be a good combination. Any thoughts on that?
What would you choose if you wanted to use all the GPUs in a network?

Thank you in advance!
Best regards
Tsus

Acagamics e.V. – IGDA Student Game Development Club (University of Magdeburg, Germany)

#2 Cornstalks   Crossbones+   -  Reputation: 6974


Posted 04 March 2012 - 03:45 PM

Have you looked at OpenCL? I haven't actually used it myself, so I can't vouch for it, but it's worth looking at if you haven't. I plan on giving it a try in the future.
[ I was ninja'd 71 times before I stopped counting a long time ago ] [ f.k.a. MikeTacular ] [ My Blog ] [ SWFer: Gaplessly looped MP3s in your Flash games ]

#3 Tsus   Members   -  Reputation: 1028


Posted 04 March 2012 - 06:51 PM

Thanks!

It has been a while since I last thought about switching from CUDA or DirectCompute to OpenCL. Back then, OpenCL was too slow.
I looked around for benchmarks and actually found one that compared CUDA with OpenCL and AMP.
AMP was the slowest, which pretty much rules it out. I guess it will take some more time for AMP to catch up. (It’s still in beta, so I’ll look again when it actually ships.)
OpenCL seems to be quite close to the performance of CUDA now, which is very nice actually. I think I’ll look into it (if only for the experience).

This still leaves the question of Microsoft’s Message Passing Interface open. Is it a good choice?

Acagamics e.V. – IGDA Student Game Development Club (University of Magdeburg, Germany)


#4 Antheus   Members   -  Reputation: 2397


Posted 05 March 2012 - 10:23 AM

Have you found the optimal algorithm for such a distributed environment yet?

Which API is the best fit for it?

#5 frob   Moderators   -  Reputation: 20513


Posted 05 March 2012 - 11:01 AM

OpenCL seems to be quite close to the performance of CUDA now, which is very nice actually. I think I’ll look into it (if only for the experience).

This still leaves the question of Microsoft’s Message Passing Interface open. Is it a good choice?

Google finds quite a few papers (both academic papers and normal, human-readable articles) comparing different MPI implementations.


Have you found the optimal algorithm for such a distributed environment yet?

Which API is the best fit for it?

This is perhaps the most important question.

You've got clusters of tightly coupled compute nodes (the cores and GPU within each machine) connected to one another over a much slower, loosely coupled network. You'll need to design your algorithm accordingly: favor communication between nodes on a single machine and minimize communication between machines.
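One way to express that split with plain MPI is to give all the processes on a machine their own communicator, so collectives on it never touch the network. Rough sketch only (the name hash is simplistic and a real version would have to deal with collisions):

#include <mpi.h>

// FNV-1a hash of the host name, used as a "color" for MPI_Comm_split.
// Caveat: two different hosts could in principle hash to the same color.
static int hash_name(const char* s)
{
    unsigned h = 2166136261u;
    for (; *s; ++s)
        h = (h ^ static_cast<unsigned char>(*s)) * 16777619u;
    return static_cast<int>(h & 0x7fffffff);
}

// Returns a communicator containing only the ranks on this machine.
MPI_Comm make_node_local_comm()
{
    char name[MPI_MAX_PROCESSOR_NAME];
    int len = 0;
    MPI_Get_processor_name(name, &len);

    MPI_Comm nodeComm;
    MPI_Comm_split(MPI_COMM_WORLD, hash_name(name), 0, &nodeComm);
    return nodeComm;
}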
Check out my personal indie blog at bryanwagstaff.com.

#6 Tsus   Members   -  Reputation: 1028


Posted 05 March 2012 - 12:41 PM

Google finds quite a few papers (both academic papers and normal, human-readable articles) comparing different MPI implementations.

Oh, yes indeed. I found a paper that compared Microsoft’s MPI to a Unix implementation and it turned out Microsoft isn’t too far behind, which would be okay for me. I definitely have to do further research, though.


Have you found the optimal algorithm for such a distributed environment yet?

Which API is the best fit for it?

This is perhaps the most important question.

You've got clusters of tightly coupled compute nodes (the cores and GPU within each machine) connected to one another over a much slower, loosely coupled network. You'll need to design your algorithm accordingly: favor communication between nodes on a single machine and minimize communication between machines.

Finding an optimal (or at least a useful) algorithm is the one and only purpose of the project. And yeah, that’s why I’d like to find an API with good tools for workload monitoring and for experimenting with scheduling strategies. That’s where I hope you have some experience to share.

Acagamics e.V. – IGDA Student Game Development Club (University of Magdeburg, Germany)


#7 slayemin   Members   -  Reputation: 2479


Posted 07 March 2012 - 12:51 PM

When you're writing your algorithm, make sure you measure the performance of the distributed work. I once wrote an algorithm that distributed work out to multiple CPUs and broke it down into very small increments. It turned out that while each piece of work was done quickly, the overhead of the network latency caused the overall time to complete the job to increase (~1 minute). When I made the jobs bigger, the computers could spend more time doing work on the CPU rather than sending and receiving network packets, which cut the working time to about 15 seconds. There's probably a sweet spot for every algorithm where you maximize the work done and minimize the time taken.
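A toy model of that trade-off (the latency and per-item numbers below are made up, but it shows why larger chunks amortize the per-message overhead):

#include <cstdio>

int main()
{
    const double latencyPerMessage = 0.005;   // assume 5 ms network overhead per job
    const double workPerItem       = 0.0001;  // assume 0.1 ms of compute per item
    const int    totalItems        = 100000;
    const int    workers           = 10;

    for (int chunk = 10; chunk <= 10000; chunk *= 10)
    {
        const int messages = (totalItems + chunk - 1) / chunk;
        // Latency is paid per message; compute is spread across the workers.
        const double seconds = (messages * latencyPerMessage
                              + totalItems * workPerItem) / workers;
        std::printf("chunk size %5d -> ~%.2f s total\n", chunk, seconds);
    }
    return 0;
}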

For what it's worth, the university I was at purchased and used an NVIDIA card with a ton of processing power. I think it was a Tesla? I don't know what the price tag looks like or how to use it, so I can't make any recommendations from experience.

Eric Nevala

Indie Developer | Dev blog


#8 Tsus   Members   -  Reputation: 1028


Posted 07 March 2012 - 01:57 PM

Thanks for the tip! I’ll watch out for it and make sure the workload size is scalable, perhaps even adaptive.
For a few months I had the pleasure of working with a Tesla. My colleagues could hear when my fill rate went up. The performance was awesome. The 10 desktops I’m working with have GTX 460s, which I hope is good enough.

Which API did you use for distributing the tasks?

Acagamics e.V. – IGDA Student Game Development Club (University of Magdeburg, Germany)


#9 slayemin   Members   -  Reputation: 2479


Posted 08 March 2012 - 12:08 PM

I wrote the app in Java and used MPI. I don't have the source code available with me so I can't go into specifics on how I did it. If you're interested, I can get back to you in a few months when I get home.

Eric Nevala

Indie Developer | Dev blog


#10 Tsus   Members   -  Reputation: 1028


Posted 08 March 2012 - 03:47 PM

Alright, then! It looks like I’ll go with MPI. (I’ll first have a look at Microsoft’s implementation.)

I wrote the app in Java and used MPI. I don't have the source code available with me so I can't go into specifics on how I did it. If you're interested, I can get back to you in a few months when I get home.

That’s a very nice offer, thanks.
If I have trouble setting MPI up, I’ll be back.

Thanks again you all for the advice!

So long, Eric, and come home safely.
Best regards

Acagamics e.V. – IGDA Student Game Development Club (University of Magdeburg, Germany)




