GPGPU in a local network

Hi guys!

I’m looking for a way to distribute some number crunching across multiple PCs in a network (we’re talking about 10 PCs). I primarily want to utilize their GPUs but using their CPUs might still give some extra speed.

I’ve seen that Microsoft’s C++ AMP seems promising, since it runs C++ code on the CPU and, with the help of DirectCompute, on the GPU. Do any of you have experience with it? How flexible is it really? Can I share resources between AMP and DirectCompute or CUDA? How well does it map code to the GPU? Perhaps hand-optimized code can be faster… Is it possible to add hand-optimized code for certain tasks?

I noticed Microsoft’s implementation of the Message Passing Interface. What else does it offer aside from the message passing? I’ve seen some basic scheduling stuff and read about a monitoring tool. What are your experiences with this software? Do you know of better implementations of the Message Passing Interface? Is it easy to add in new scheduling strategies? Does this run entirely on Windows 7?
Or what is the current state-of-the-art in distributed computing in local networks?

Generally, I’m wondering whether C++ AMP and Microsoft’s MPI are a good combination. Any thoughts on that? :)
What would your choice be if you wanted to use all the GPUs in a network?

Thank you in advance!
Best regards
Tsus

Have you looked at OpenCL? I haven't actually used it myself, so I can't vouch for it, but it's worth looking at if you haven't. I plan on giving it a try in the future.

Thanks!

It has been a while since I last thought about switching from CUDA or DirectCompute to OpenCL. Back in the day, OpenCL was too slow.
I looked around for benchmarks and actually found one that compared CUDA with OpenCL and AMP.
AMP was the slowest, so that more or less rules it out. I guess it will take more time until AMP catches up. (It is still in beta, so I’ll look again when it actually ships.)
OpenCL seems to be quite close to the performance of CUDA now, which is very nice actually. I think I’ll look into it (if only for the experience).

This leaves the question about Microsoft’s Message Passing Interface open. Is it a good choice?

Have you found the optimal algorithm yet for such a distributed environment?

Which API is the best fit for it?

[quote]
OpenCL seems to be quite close to the performance of CUDA now, which is very nice actually. I think I’ll look into it (if only for the experience).

This leaves the question about Microsoft’s Message Passing Interface open. Is it a good choice?
[/quote]

Google finds quite a few papers (both academic papers and normal human-readable articles) comparing different MPI implementations.

[quote]
Have you found the optimal algorithm yet for such a distributed environment?

Which API is the best fit for it?
[/quote]

This is perhaps the most important question.

You've got clusters of tightly coupled computing nodes (the GPUs and CPU cores inside each PC) connected to each other through slower, loosely coupled links (the network). You'll need to design your algorithm accordingly, favoring communication between nodes on a single machine and minimizing communication between machines.
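One way to act on this is a two-level split: first give each PC one contiguous block of work (a single network transfer per machine), then divide that block among the devices inside the PC. Below is a minimal Python sketch, assuming the work is a flat range of independent items and that each device's relative throughput has been measured beforehand; the machine names and throughput numbers are hypothetical. It only computes the boundaries; actually shipping the ranges (for example over MPI) is not shown.

```python
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # half-open interval [start, end) of work-item indices


def split_proportionally(start: int, end: int, weights: List[float]) -> List[Range]:
    """Split [start, end) into contiguous ranges sized according to `weights`."""
    total = sum(weights)
    n = end - start
    ranges: List[Range] = []
    cursor, acc = start, 0.0
    for i, w in enumerate(weights):
        acc += w
        # Derive each boundary from the running total so rounding errors do not
        # pile up; the last range always ends exactly at `end`.
        boundary = end if i == len(weights) - 1 else start + round(n * acc / total)
        ranges.append((cursor, boundary))
        cursor = boundary
    return ranges


def hierarchical_partition(n_items: int,
                           machines: Dict[str, List[float]]) -> Dict[str, List[Range]]:
    """Level 1: one contiguous block per machine (one network transfer each).
    Level 2: split each block among that machine's local devices."""
    names = list(machines)
    machine_weights = [sum(machines[name]) for name in names]
    blocks = split_proportionally(0, n_items, machine_weights)
    return {name: split_proportionally(lo, hi, machines[name])
            for name, (lo, hi) in zip(names, blocks)}


# Hypothetical relative throughputs per device: [GPU, CPU].
machines = {"pc01": [10.0, 1.0], "pc02": [10.0, 1.0]}
plan = hierarchical_partition(1000, machines)
print(plan)
```

With these made-up numbers, `pc01` gets items 0 to 499 and splits them 455/45 between its GPU and CPU, so the only inter-machine traffic is one block per PC.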

[quote]
Google finds quite a few papers (both academic papers and normal human-readable articles) comparing different MPI implementations.
[/quote]

Oh, yes indeed. I found a paper that compared Microsoft’s MPI to a Unix implementation, and it turned out Microsoft isn’t too far behind, which is okay for me. I definitely have to do further research, though.


[quote name='Antheus' timestamp='1330964590' post='4919484']
[quote]
Have you found the optimal algorithm yet for such a distributed environment?

Which API is the best fit for it?
[/quote]
This is perhaps the most important question.

You've got clusters of tightly coupled computing nodes (the GPUs and CPU cores inside each PC) connected to each other through slower, loosely coupled links (the network). You'll need to design your algorithm accordingly, favoring communication between nodes on a single machine and minimizing communication between machines.
[/quote]
Finding an optimal (or at least a useful) algorithm is the one and only purpose of the project. And yes, that’s why I’d like to find an API with good tools for workload monitoring and for experimenting with scheduling strategies. That’s where I hope you have some experience to share. :)

When you're writing your algorithm, make sure that you're measuring the performance of your distributed work. I once wrote an algorithm that distributed work out to multiple CPUs and broke the work down into very small increments. It turned out that while each piece of work was done quickly, the overhead of network latency increased the overall time to complete the job (to about 1 minute). When I made the jobs bigger, the computers could spend more time doing work on the CPU rather than sending and receiving network packets, which cut the working time to about 15 seconds. There's probably a sweet spot for every algorithm where each job is large enough to amortize the communication overhead and the total time is minimized.
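The trade-off described above can be put into a back-of-the-envelope model. Below is a minimal Python sketch, assuming independent work items with a uniform cost (`per_item_s`), a fixed cost per chunk for the network round trip (`per_chunk_overhead_s`), and identical PCs. The function names and all numbers are illustrative assumptions, not measurements from this thread.

```python
import math
from typing import Iterable


def estimated_time(n_items: int, workers: int, chunk: int,
                   per_item_s: float, per_chunk_overhead_s: float) -> float:
    """Rough makespan: each chunk costs a fixed overhead (network round trip,
    (de)serialization) plus its compute time; chunks are spread evenly."""
    chunks = math.ceil(n_items / chunk)
    rounds = math.ceil(chunks / workers)  # chunks handled by the busiest worker
    return rounds * (per_chunk_overhead_s + chunk * per_item_s)


def best_chunk(candidates: Iterable[int], **model) -> int:
    """Pick the candidate chunk size with the lowest estimated time."""
    return min(candidates, key=lambda c: estimated_time(chunk=c, **model))


# Made-up numbers: 1000 items, 10 PCs, 10 ms per item, 50 ms per round trip.
params = dict(n_items=1000, workers=10, per_item_s=0.01, per_chunk_overhead_s=0.05)
for c in (1, 10, 50, 100, 500, 1000):
    print(c, round(estimated_time(chunk=c, **params), 2))
print("best:", best_chunk((1, 10, 50, 100, 500, 1000), **params))
```

In this toy model the estimate is lowest at a chunk size of 100, i.e. one chunk per PC: smaller chunks pay the round trip more often, while larger ones leave some PCs idle. The model ignores uneven machine speeds, so the real sweet spot still has to come from measurements.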

For what it's worth, the university I was at purchased and used an NVIDIA card that had a ton of processing power. I think it was a Tesla? I don't know what the price tag looks like or how to use it, so I can't make any recommendations from experience.

Thanks for the tip! I’ll watch out for that and make sure the workload size is scalable, perhaps even adaptive.
For a few months I had the pleasure of working with a Tesla. My colleagues could hear when my fill rate went up. :) The performance was awesome. The 10 desktops I’m working with have GTX 460s, which I hope is good enough.

Which API did you use for distributing the tasks?
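For an adaptive workload size, one established option is guided self-scheduling: whenever a PC asks for work it receives ceil(remaining / P) items, so early chunks are large (little communication overhead) and later chunks shrink (better load balance when some machines finish early). Below is a minimal Python sketch, assuming independent work items and a master that hands out chunks on request; the actual transport (for example MPI) is not shown. `simulate` is a hypothetical stand-in that replays the dispatch with made-up worker speeds, which is also a cheap way to compare scheduling strategies before running them on the real network.

```python
import heapq
from typing import Iterable, Iterator, List


def guided_chunks(n_items: int, workers: int, min_chunk: int = 1) -> Iterator[int]:
    """Guided self-scheduling: each request gets ceil(remaining / workers)
    items (but at least min_chunk and at most what is left)."""
    remaining = n_items
    while remaining > 0:
        chunk = max(-(-remaining // workers), min_chunk)  # integer ceiling
        chunk = min(chunk, remaining)
        yield chunk
        remaining -= chunk


def simulate(chunks: Iterable[int], speeds: List[float], overhead_s: float) -> float:
    """Hand each chunk to whichever worker becomes idle first; return the
    time at which the last worker finishes (the makespan)."""
    heap = [(0.0, i) for i in range(len(speeds))]  # (time worker is idle, worker id)
    heapq.heapify(heap)
    for c in chunks:
        t, i = heapq.heappop(heap)
        heapq.heappush(heap, (t + overhead_s + c / speeds[i], i))
    return max(t for t, _ in heap)


# Hypothetical setup: 100 items, 4 PCs, the first one twice as fast.
speeds = [2.0, 1.0, 1.0, 1.0]
guided = list(guided_chunks(100, 4))
print(guided)
print("static:", simulate([25] * 4, speeds, 0.0))
print("guided:", simulate(guided, speeds, 0.0))
```

With one PC twice as fast as the others, the static equal split finishes at 25 time units while the guided chunks finish at 20, which is the ideal for 100 items at a combined speed of 5 items per unit. Setting a non-zero `overhead_s` shows the other side of the trade-off discussed above, and `min_chunk` keeps the tail chunks from becoming so small that the round trip dominates.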

I wrote the app in Java and used MPI. I don't have the source code available with me so I can't go into specifics on how I did it. If you're interested, I can get back to you in a few months when I get home.

Alright, then! It looks like I’ll go with MPI. (I’ll first have a look at Microsoft’s implementation.)


[quote]
I wrote the app in Java and used MPI. I don't have the source code available with me so I can't go into specifics on how I did it. If you're interested, I can get back to you in a few months when I get home.
[/quote]

That’s a very nice offer, thanks. :)
If I have trouble setting up MPI, I’ll be back.

Thanks again, all of you, for the advice!

So long, Eric, and come home safely.
Best regards
