Events+WaitForMultipleObjects vs IO completion ports?

4 comments, last by ApochPiQ 11 years, 9 months ago
I'm frustum culling terrain using a quad tree, which is an obvious candidate for multithreading. In my case, I'm using a precalculated PVS pre-sorted front to back. I'm looking to chop up the PVS and multithread the culling.

Would using completion ports be advantageous over events and wait functions, performance-wise? There seem to be mixed opinions on this, with many saying that the maximum number of events is the only downside, which isn't relevant in this case.

Another possibility would be to use PostThreadMessage, though that would probably (?) be the least efficient.

Cheers
With completion ports, you have multiple threads waiting on a single source of input. This is the opposite of WaitForMultipleObjects, where you typically have a single thread waiting on multiple sources of input.

So, what work distribution algorithms do you have in mind?

I'd recommend starting out with a thread pool, whose interface is as agnostic as possible towards the internal scheduling/processing code. Once you have that you'll be in a good position to answer the question yourself through experimentation and profiling, presumably on the kind of hardware you're targeting and with representative workloads.
Thanks for the reply.

Basically, I have a single main thread which offloads work (when applicable) to an existing worker thread pool, the size of the pool being dependent upon the number of physical cores.

In a quad core scenario, one thread triggers 4 worker threads, and waits for them to complete before continuing its merry way.

main thread    ----- triggers workers ----- waits for workers to complete -----
worker threads      \--------------------------/
                    \--------------------------/
                    \--------------------------/
                    \--------------------------/
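To make the fork/join shape above concrete, here's a sketch in portable C++ (a counter plus condition variable standing in for per-worker events and a WaitForMultipleObjects with bWaitAll = TRUE). It spawns threads per call purely for brevity; my real code reuses the pool, and the "frustum test" here is just a placeholder predicate.

```cpp
#include <algorithm>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Fork/join for one frame: main thread hands each worker a slice of the
// PVS, then blocks until every slice is done.
void CullSlices(const std::vector<int>& cells, std::vector<int>& visible) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;
    std::mutex m;
    std::condition_variable allDone;
    unsigned remaining = cores;

    size_t chunk = (cells.size() + cores - 1) / cores;
    visible.assign(cells.size(), 0);

    std::vector<std::thread> workers;
    for (unsigned w = 0; w < cores; ++w) {
        workers.emplace_back([&, w] {
            size_t begin = w * chunk;
            size_t end = std::min(cells.size(), begin + chunk);
            for (size_t i = begin; i < end; ++i)
                visible[i] = (cells[i] > 0) ? 1 : 0;  // stand-in frustum test
            std::lock_guard<std::mutex> lock(m);
            if (--remaining == 0) allDone.notify_one();  // "SetEvent"
        });
    }
    {   // "WaitForMultipleObjects(..., bWaitAll = TRUE, ...)"
        std::unique_lock<std::mutex> lock(m);
        allDone.wait(lock, [&] { return remaining == 0; });
    }
    for (auto& t : workers) t.join();
}
```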


Because everything is synchronised, I can approach this in several ways:

PostThreadMessage - I could reuse the same threads, dispatching on an application-defined message; very easy to code and visualise.
Reset events - easy to sync, but would require many thread pools, one tailored for each 'job'.
Completion ports - [slightly] harder to code for, but would make it easy to break a job up into many sections (which would alleviate one thread getting behind) without numerous context switches.

#By breaking a job into many sections, I mean: say I have an array of cells from a PVS. Dividing it into the number of cores is fine unless Windows happens to schedule another thread on the same core and dumps the cache. If I broke the job into 4 x #cores (for example), the other cores would pick up the slack if I used completion ports.
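A sketch of what I mean by that, again in portable C++ for brevity (an atomic chunk counter standing in for workers pulling packets off the completion port; the "frustum test" is a placeholder):

```cpp
#include <algorithm>
#include <atomic>
#include <thread>
#include <vector>

// Over-decomposition: split the PVS into many more chunks than cores and
// let workers pull the next chunk from a shared counter -- the same
// load-balancing effect as posting many small jobs to an IOCP. If one
// core is stalled (e.g. the OS scheduled something else on it), the
// remaining cores simply consume more chunks.
void CullChunked(const std::vector<int>& cells, std::vector<int>& visible) {
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 4;
    size_t chunks = 4u * cores;  // 4 x #cores, as above
    size_t chunkSize = (cells.size() + chunks - 1) / chunks;
    visible.assign(cells.size(), 0);

    std::atomic<size_t> next{0};  // stand-in for the completion queue
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < cores; ++w) {
        workers.emplace_back([&] {
            for (;;) {
                size_t c = next.fetch_add(1);  // grab the next chunk
                if (c >= chunks) return;       // no chunks left
                size_t begin = c * chunkSize;
                size_t end = std::min(cells.size(), begin + chunkSize);
                for (size_t i = begin; i < end; ++i)
                    visible[i] = (cells[i] > 0) ? 1 : 0;  // stand-in frustum test
            }
        });
    }
    for (auto& t : workers) t.join();
}
```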

I'm only really showing PostThreadMessage for the sake of completeness. Each has its own advantages/disadvantages depending on your point of view. If you can't tell, I'm heavily leaning towards completion ports, but would rather hear of any 'gotchas' before I implement it!

Incidentally, the frustum culling I mentioned is just one example of thread usage. Creating a multithreaded render list, to eventually sort on the main thread, is another prime candidate.
IOCPs will, theoretically at least, give you better results when it comes to cache coherency and core affinity of your threads. However, APCs (the mechanism by which you would use IOCPs to do this) are tricky as hell to get right, so think carefully about whether the few percent of throughput gain is worth your time ;-)

Also, it may be worth noting that IOCPs may not necessarily do all the nifty core-affinity logic when you're using them in this regard. AFAIK they're designed to be run on kernel-level IO handles, and you're not exactly doing any kernel IO here. So it may not actually pay off in the long run. I'm not 100% on that, though, so if you have some spare hair to pull out and a lot of time, it'd be nice to know if IOCPs actually work nicely in this scenario!

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

ApochPiQ...

I'll manage the threads myself, and set affinity per core (or per virtual core pair in the case of hyperthreading). I think I'm going to have to test under different conditions (such as other threads using 100% CPU, with or without HT, etc.). I'll also test without setting affinity... maybe the OS *can* be smart at this sort of thing.

In truth, I was guessing this would be a 'suck it and see' type of scenario!
The nice thing about IOCP's implementation is that if you have two back-to-back completions, they will tend to run on the same thread without forcing a context switch. This is difficult if not impossible to guarantee with events, for example. It also means that setting core affinities by hand is counter-productive in a lot of cases.

However, I will stress again that I personally don't know offhand if this "affinity" works when you're just using APCs and not actual kernel IO handles.


