Multithreading, parallel computing questions

I got the impression that the original poster may well be an academic who has legitimate data and will go through the relevant peer review channels, but has heard that parallel computing is the most efficient way to do this. So the fact that he doesn't know much about distributed computing doesn't necessarily reflect on the legitimacy of what he wants to do.

_Sigma, perhaps you could elaborate on (a) the nature and amount of the data you're expecting to have, and (b) the type of algorithm you will need to run on it (ignoring threading/distributed concerns).
I've used all three. As mentioned, they serve different purposes. I have a post about OpenMP vs. TBB on my blog, which is in my profile, if you want more information.

Boost.Threads is nice for task-based threading. In particular, its mutex-style classes and condition variables are great. TBB provides similar facilities, but TBB can be configured to run on your standard Windows threads, giving it full compatibility with Boost's threading constructs. In addition, TBB gives you concurrent containers and iterator-style loops that it can break up for you; it has a lot of useful facilities. I've also used OpenMP. OpenMP is the simplest option if you have very simple loops that you're looking to break up. For the more complex C++-style cases, I haven't found OpenMP as useful as TBB, especially since throwing an exception out of an OpenMP thread means death with no chance of recovery. One more thing to note is that mixing OpenMP threads with OS threads/Boost threads/TBB runs the risk of oversubscription (too many threads at once), since the schedulers don't know about each other.
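To make the contrast concrete, here's a minimal sketch of the same loop written both ways (my own toy example, not from either library's docs; the function names are made up, and I'm using a C++11 lambda in the TBB version for brevity where older TBB code would use a function object):

#include <vector>
#include <tbb/parallel_for.h>
#include <tbb/blocked_range.h>

// OpenMP: a single pragma splits a simple counted loop across cores.
void scale_omp(std::vector<double>& v, double k)
{
    #pragma omp parallel for
    for (int i = 0; i < static_cast<int>(v.size()); ++i)
        v[i] *= k;
}

// TBB: the same loop expressed as a blocked range that the task
// scheduler subdivides among its worker threads.
void scale_tbb(std::vector<double>& v, double k)
{
    tbb::parallel_for(tbb::blocked_range<size_t>(0, v.size()),
        [&](const tbb::blocked_range<size_t>& r) {
            for (size_t i = r.begin(); i != r.end(); ++i)
                v[i] *= k;
        });
}

The OpenMP version needs your compiler's OpenMP flag (-fopenmp on GCC, /openmp on MSVC); the TBB version just links against the TBB runtime.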

Overall, I prefer to use Boost.Threads with TBB. OpenMP isn't bad, though, and it is standard, which is a good thing, so you shouldn't necessarily discount it. Just remember that OpenMP's focus isn't C++, even though C++ is supported.

Also, TBB is commercial, as noted. I believe it's $300 per developer for the commercial license.
Rydinare, thank you very much for your reply + blog. Has helped lots.
Quote:Also, TBB is commercial, as noted. I believe it's $300 per developer for the commercial license.

However, there is a free open source version...is there a reason it wouldn't work for me?

Quote:frob
...

Erm, I guess I didn't explain myself well? I feel like I just asked how to write a climate model running on my uber-l33t Pentium D.... O_o

Quote:Your choices for that list of "3 main threading libraries" shows just how little you know about the field

No shit, Sherlock. That is why I'm asking here. I didn't realize that knowing the answer to one's question was a prerequisite for posting on these forums.


Quote:I got the impression that the original poster may well be an academic who has legitimate data and will go through the relevant peer review channels, but has heard that parallel computing is the most efficient way to do this. So the fact that he doesn't know much about distributed computing doesn't necessarily reflect on the legitimacy of what he wants to do.


I'm currently finishing up my undergrad, a degree specifically aimed at building a strong background for modelling physical processes. It is a combination of Physical Geography, Computer Science and Physics. After this, I will be actively looking into a master's degree.

This last summer I worked with a model to investigate the effects that increased shrub density has on end-of-winter snow water equivalents in the Mackenzie Delta. I presented a poster at the IUGG conference in Perugia, Italy. I'm currently working on testing a blowing snow model implementation: comparing against satellite data and field observations, as well as running parameter sensitivity analyses. In addition, I've implemented a radiation balance model to look at diurnal patterns in glacial melt, as well as a basic geometry-rules-based wind model, for two undergraduate projects.

So whilst I'm not a PhD student, I do have experience working with models, as well as writing my own. I'm well aware of the peer review that is done, as well as the extreme complexities involved.

In regards to data, I am (was) working with $70,000 LiDAR DEM data sets, as well as other satellite imagery, etc. Perhaps that will stand as testament to the fact that I'm not just dicking around in my basement pretending I'm writing weather models.

Quote:A single PC isn't going to do serious weather forecasting, analyze sea floor sonar scans for oil beds, review your medical scans searching for cancer, render next year's new animated movies, or perform most other serious real-world tasks.

Yes, I know. However, you can still do some very cool, and still meaningful, modelling experiments on a single computer. You don't need a supercomputer to investigate real-world phenomena.

After working last summer with that model (the one I presented in Perugia) I became completely disheartened with its quality. In short, it was shit. It would randomly crash, it was slow, and so on; the kind of thing that wouldn't pass a 2nd-year CS course. After talking to others I was working with, it appeared that models like this were standard in my field (hydrology, snow research, stuff like that). I raised my concerns with my supervisor, who pretty much told me to get used to it.

The people writing these models have NO CS background; they are all self-taught, and they lack fundamental CS skills. For example, the model I'm currently working with is developed by one fellow who has no formal CS training. Therefore the model is:

1) Not under source control
2) Lacking external documentation, with little internal documentation
3) Coded on the fly, with no design docs
4) Tied to a dead compiler that, if you can even find it, costs you $600
5) Doing too much, while (poorly) reinventing the wheel extensively

As well, it crashes randomly and has quirks out the wazoo. Given what I've seen, I am honestly sceptical of the implementation of portions of the model, and I'm unsure whether I can trust its results.

Oh, did I mention it was closed source? Yeah, I can't even look at the code.

So I see this mess and go "I can do better".

Some models (such as the one I'm working with) are moving to what are called HRUs: Hydrological Response Units. Instead of having a 2 m DEM and working in a gridded fashion, they break the landscape up into areas that the author feels will all respond the same way to an event. The computations become very simple, but you lose the effects that topography has on the processes. You can break the landscape up into many more HRUs, but you still lose out on topography. You get a faster model at the sacrifice of being able to use topography...
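To make that concrete, here's a toy sketch of the sort of gridded, per-cell computation I mean (the names and "physics" are placeholders, not any real model): each cell depends only on its own inputs, so nothing about the grid forces the loop to run serially.

#include <cstddef>
#include <vector>

// Hypothetical gridded update: 'swe' is snow water equivalent per
// DEM cell, 'melt' a per-cell melt estimate derived from topography
// (slope, aspect, elevation). Every cell is independent, so the loop
// could be split across cores or cluster nodes.
void update_grid(std::vector<double>& swe,
                 const std::vector<double>& melt,
                 std::size_t rows, std::size_t cols)
{
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c) {
            const std::size_t i = r * cols + c;
            swe[i] = swe[i] > melt[i] ? swe[i] - melt[i] : 0.0;
        }
}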

I see no reason why a highly threaded, distributed model can't surpass (in my mind) the clunky notion of an HRU. I'm under the impression that it is the extreme lack of CS skills in the field that means no one really does it.

Therefore, I wish to start learning multi-threaded/distributed programming (which I don't have any experience with) so I can take advantage of multi-threaded platforms.

Quote:Sigma, perhaps you could elaborate on (a) the nature and amount of the data you're expecting to have, and (b) the type of algorithm you will need to run on it (ignoring threading/distributed concerns)

Data can be 1000x1000 (or more) matrices.
The algorithms really depend on the model, so it is difficult to comment without any specific cases...

I hope the above has further explained my point and goals...

Quote:Found this one.

heh, yeah I actually posted there.
Quote:I see no reason why a highly threaded, distributed model can't surpass (in my mind) the clunky notion of an HRU. I'm under the impression that it is the extreme lack of CS skills in the field that means no one really does it.


Academia. CS is considered no better than carpentry there. In general, people will argue that you don't need any CS knowledge: they did it themselves, so why would they need some useless guy in their lab writing code they cannot understand, when it doesn't do anything more than their code did, and has the huge disadvantage of being obscure, requiring strange libraries, and being too complex?

What you're going up against here is insurmountable resistance that you won't break. That's the way the scientific community works. I'm certain that half of these models run on Fortran interpreters, using code that was transcribed from punch cards.

Writing code in academic circles is considered a technical job. It falls right there with assembling a cabinet, replacing a light bulb or plugging in an electrical device.

Lastly, there's the ego thing. Propose to high-profile researchers that they re-use code, and they'll find huge problems with other people's code. For example, files will not be capitalized, something that is of crucial importance. So they'll rewrite everything themselves.

If you're developing this for yourself, and can apply it, then it'll be fine. But if you're hoping to revolutionize the way this research works, you're in for a rude awakening. You're most likely the only one who will ever see this code, no matter how many places you publish it in.

IMHO, I wouldn't try solving problems that don't need to be solved. Look into existing clustering solutions and study the concepts of concurrency so that you can split the model appropriately. This will give you the most effect for minimal effort.
Quote:If you're developing this for yourself, and can apply it, then it'll be fine.

Yes, this would be my immediate goal.
Quote:Look into existing clustering solutions and study the concepts of concurrency so that you can split the model appropriately

Thanks for the link.
Can each Beowulf node take advantage of multi threaded stuff?
Quote:Original post by _Sigma

Can each Beowulf node take advantage of multi threaded stuff?


I believe it's one process per core. It doesn't matter, though; it's MPI-based and designed for arbitrary sizes.

For computationally heavy stuff, threads are undesirable, since they can decrease performance. For highest throughput, split the input data set into disjoint sets, then have individual processes work on each one. The highest performance and simplest design come from independent processes (whether on multiple cores or across a network cluster), with each process getting full CPU time.

Threads help when you need responsive applications and where tasks are small and the cost of IPC is high. For computationally heavy applications, neither of these is true if the problem set is correctly defined and parallelized.
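A minimal sketch of that pattern (my illustration; the per-element "work" is a stand-in sum, not anything from a real model): each rank owns a disjoint slice of the problem, and the partial results are combined with a single reduce at the end.

#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv)
{
    const long N = 1000000;            // total problem size
    int rank = 0, size = 1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each process computes over its own disjoint slice of [0, N).
    const long begin = N * rank / size;
    const long end   = N * (rank + 1) / size;

    double local = 0.0;
    for (long i = begin; i < end; ++i)
        local += static_cast<double>(i);   // stand-in for real work

    // Combine the partial sums on rank 0. No shared state, no locks.
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("total = %f\n", total);

    MPI_Finalize();
    return 0;
}

Launch it with something like mpirun -np 8 ./a.out and each of the eight processes gets full CPU time on its own core (or node).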

I have some experience in the field with large-scale clusters, though I don't have that much experience with threading, other than working on a multi-threaded server and some OpenMP projects.

MPI will get the job done, but I found that you end up spending most of the time getting the message-passing timing correct and trying to balance the load more evenly. Then I did some work in Charm++. At first I hated it, as it is relatively young and rough around the edges, but the more I used it (and the more painful trying to go back to MPI became), the more I appreciated the object-oriented aspects of the language. A properly configured Charm installation on your cluster will also automatically load-balance all the Charm objects (chares, I believe they're called). This freed me from some of those details and allowed me to concentrate on the app itself a little more.

I warn you though, Charm++ is a research language, but I have seen it in the wild.

As for my thoughts on OpenMP: while the threading pragmas can be useful for small improvements here and there, they don't take away from the need to design your program with parallelism in mind to get the most performance improvement. Of course, we can only wish that every problem was embarrassingly parallel =)
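As an example of the "small improvements here and there" kind of pragma (a toy sketch with made-up names, not from any real model): a reduction over a grid takes one extra clause to parallelize, but only because the loop was already written with independent iterations.

#include <vector>

// Sum a field over the whole grid. The reduction clause gives each
// thread its own private partial sum and combines them at the end,
// avoiding a race on 'sum'.
double total_swe(const std::vector<double>& swe)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < static_cast<long>(swe.size()); ++i)
        sum += swe[i];
    return sum;
}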
Metalstorm, thanks. I'll look into that.

d000hg. That is fantastic. Cheers.

So does it seem like a reasonable approach to begin learning basic multi-threading for a SINGLE computer, get the hang of that, and then move to distributed programming?

Starting with TBB seems good, as I found this book.

Does this seem reasonable?

Quote:However you need to decide early on if you want to pursue multithreaded or distributed solutions. I think there is an OMP-like distributed library somewhere.

To echo Kylotan, why is that?

I really appreciate all the replies here, exactly what I was looking for!
I wouldn't say that you have to choose between going distributed and just locally threaded. I think the most helpful thing to do when beginning to learn multi-threaded/multi-process programming is to understand the design of such programs, rather than the specific library you end up using.

If you don't have a good grasp of critical sections, semaphores, deadlocks, race conditions, and so on and so forth that show up in parallel computing, it can cause a much larger headache than just learning the library.
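To show the most basic of those hazards (a minimal sketch; I'm using C++11's std::thread and std::mutex here for brevity, though Boost.Thread offers near-identical classes): two threads incrementing a shared counter without a lock will lose updates, because ++ is a read-modify-write, not an atomic step.

#include <iostream>
#include <mutex>
#include <thread>

long counter = 0;
std::mutex counter_mutex;

void increment_unsafe()
{
    for (int i = 0; i < 100000; ++i)
        ++counter;                 // data race: updates get lost
}

void increment_safe()
{
    for (int i = 0; i < 100000; ++i) {
        std::lock_guard<std::mutex> lock(counter_mutex);
        ++counter;                 // only one thread at a time
    }
}

int main()
{
    std::thread a(increment_unsafe), b(increment_unsafe);
    a.join(); b.join();
    std::cout << "unsafe: " << counter << "\n"; // usually < 200000

    counter = 0;
    std::thread c(increment_safe), d(increment_safe);
    c.join(); d.join();
    std::cout << "safe:   " << counter << "\n"; // always 200000
    return 0;
}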

I think the library comes second, since each implementation has its advantages and disadvantages. You will probably end up learning a few different libraries and picking whichever works best for the task.

Anyway, I hope you enjoy the process of learning parallel computing, _Sigma. It's really quite satisfying the first time you see an amazing speedup over the linear version of your program.
Quote:Original post by _Sigma
Rydinare, thank you very much for your reply + blog. Has helped lots.
Quote:Also, TBB is commercial, as noted. I believe it's $300 per developer for the commercial license.

However, there is a free open source version...is there a reason it wouldn't work for me?


If you're not doing commercial work, the open source version should be fine.

I'm glad you found my blog and post useful. Let me know if you have any more questions. [smile]

