ferr

[.net] C# Multi-threading, Quad Core, Thread Affinity?

I'm messing around with my quad core, trying to get four large processes/threads running in parallel at max speed, i.e. one thread per core. The problem is that SetThreadAffinityMask seems to do nothing. Within the ThreadStart function I make this call to the external functions:
SetThreadAffinityMask(GetCurrentThread(), new IntPtr(1 << 0));
I've tried a couple of other methods, including setting the process thread's ProcessorAffinity. Nothing has worked so far. Here are the extern declarations for those functions:
[DllImport("kernel32.dll")]
static extern IntPtr GetCurrentThread();

[DllImport("kernel32.dll")]
static extern IntPtr SetThreadAffinityMask(IntPtr hThread, IntPtr dwThreadAffinityMask);
I'm running under Vista64 with Q6600 using VC# 2008 EE.

My understanding is that some extra stuff happens under the hood for threads that run .NET code (I guess due to requiring communication with the GC and its thread, etc.). So, the moment you have .NET code running on your process thread, you lose the ability to set thread affinity.

What is the return value? I'm guessing it's zero.
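
One way to answer the return-value question is to check it in code. This is only a minimal sketch, assuming a Windows machine and a runtime that provides RuntimeInformation (.NET Framework 4.7.1 or later, or .NET Core). The names AffinityCheck and PinCurrentThread are made up for illustration. According to the Win32 documentation, SetThreadAffinityMask returns the thread's previous mask on success and zero on failure.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical helper, not from the original post.
static class AffinityCheck
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    // SetLastError = true lets Marshal.GetLastWin32Error() report why a call failed.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern UIntPtr SetThreadAffinityMask(IntPtr hThread, UIntPtr dwThreadAffinityMask);

    // Pins the calling thread to the cores in 'mask' and returns the previous mask.
    // Throws Win32Exception if the API reports failure (return value zero).
    public static ulong PinCurrentThread(ulong mask)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            throw new PlatformNotSupportedException("SetThreadAffinityMask is a Win32 API.");

        // Tell the runtime that the following code depends on the identity
        // of the current physical OS thread.
        Thread.BeginThreadAffinity();

        UIntPtr previous = SetThreadAffinityMask(GetCurrentThread(), new UIntPtr(mask));
        if (previous == UIntPtr.Zero)
        {
            int error = Marshal.GetLastWin32Error();
            Thread.EndThreadAffinity();
            throw new Win32Exception(error);
        }
        return previous.ToUInt64();
    }
}
```

Calling AffinityCheck.PinCurrentThread(1) at the top of the ThreadStart function would then either return the old mask or throw with the Win32 error code, which should show whether the call itself is failing or succeeding without any visible effect.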

Anyway. I also wonder why you are doing this?
You shouldn't be setting thread affinity for performance reasons. If anything, it'll only make things slower: the OS is really good at managing threads, especially in Vista. Unless you have some very specific code that requires it, I'd say you'd be better off without it.

Quote:

msdn:
In most cases, it is better to let the system select an available processor.


Process affinity, on the other hand, should work fine. If that isn't working, well, I'm not sure what is going wrong.
Even so, it's still better to leave it up to the OS (in my opinion :))
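
For reference, process-wide affinity can be set from managed code through System.Diagnostics.Process. A minimal sketch follows; the class and method names are hypothetical, and Process.ProcessorAffinity is only supported on Windows and Linux.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not from the original thread.
static class ProcessAffinityDemo
{
    // Sets the affinity mask of the current process and returns the previous mask.
    // Bit n of the mask stands for logical processor n.
    public static long SetProcessAffinity(long mask)
    {
        using (Process current = Process.GetCurrentProcess())
        {
            long previous = current.ProcessorAffinity.ToInt64();
            current.ProcessorAffinity = new IntPtr(mask);
            return previous;
        }
    }
}
```

For example, ProcessAffinityDemo.SetProcessAffinity(0x3) would restrict the whole process to the first two logical processors, which is the same thing Task Manager's "Set Affinity" dialog does.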

Quote:
Headkaze 1 << 0 shifts 1 to the left zero times, so you're left with 1. I don't think this is what you're trying to achieve here, surely?

That is one of maybe five different approaches I used for passing in that argument. I could not find many examples of literal usage with this function. The two that I did find used that argument, and another had something similar to "0x01" (in instances where I wanted to use processor #1).
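
To make the mask argument concrete: bit n of the affinity mask stands for logical processor n (zero-based), so 1 << 0 and 0x01 both mean "first core only". A small sketch of building masks, with hypothetical helper names:

```csharp
using System;

// Hypothetical helper for illustration only.
static class AffinityMasks
{
    // Mask with only the bit for the given zero-based core index set.
    public static long MaskForCore(int coreIndex)
    {
        if (coreIndex < 0 || coreIndex > 63)
            throw new ArgumentOutOfRangeException("coreIndex");
        return 1L << coreIndex;
    }

    // Mask that allows every listed core, e.g. MaskForCores(0, 2) == 0x5.
    public static long MaskForCores(params int[] coreIndices)
    {
        long mask = 0;
        foreach (int index in coreIndices)
            mask |= MaskForCore(index);
        return mask;
    }
}
```

On a quad core, the four single-core masks are 0x1, 0x2, 0x4 and 0x8, so thread i of four would pass new IntPtr(AffinityMasks.MaskForCore(i)).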


Quote:
Anyway. I also wonder why you are doing this?
You shouldn't be setting thread affinity for performance reasons. If anything, it'll only make things slower: the OS is really good at managing threads

I thought that would be true going into this as well, but after a lot of testing (i.e. actually going into Task Manager and setting affinity manually) I have discovered that with this particular set of processes it is best to have one thread per core. The fact is that so far I cannot exceed 25% of my total CPU. Even with multi-threading completely "off", the CPU only gets to 25%. I wonder if this is a priority issue or something like that?

Here are some of the results I found with threadless, 1 thread, and 2 threads:

Threadless:
- No affinity
-- CPU Usage: 25% CPU usage spread across 4 cores, max core usage roughly 80%.
-- Process Completion Time: 25 seconds.

One Thread:
- No affinity
-- CPU Usage: 25% CPU usage spread across 4 cores, max core usage roughly 50%.
-- Process Completion Time: 64 seconds.
- Affinity to 1 core
-- CPU Usage: 25% CPU usage spread across 1 core, max core usage 100%.
-- Process Completion Time: 44 seconds.

Two Threads:
- No affinity
-- CPU Usage: 25% CPU usage spread across 4 cores, max core usage roughly 40%.
-- Process Completion Time: 71 seconds.
- Affinity to 1 core
-- CPU Usage: 25% CPU usage spread across 1 core, max core usage 100%.
-- Process Completion Time: 18 seconds.

The strangest thing to me is threadless vs. one thread. I would have thought they would have very similar completion times, but the way they are spread across cores matters a lot. I guess one core doing 100% of the work beats two cores doing 50% each. But then again, with core affinity set on the one-thread run, it's still much slower, even with more CPU usage on the same core.

Quote:
You might want to look into trying the following instead?

The Task Parallel Library (TPL)

Thanks, I'll look into this.
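
For anyone following along, a rough sketch of what the TPL approach could look like. It assumes a runtime where System.Threading.Tasks is available (.NET 4 or later), and BusyWork is a hypothetical stand-in for one of the "large processes", not the actual workload from this thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical demo, not the original poster's code.
static class ParallelDemo
{
    // CPU-bound placeholder job; the result depends only on the seed.
    public static long BusyWork(int seed)
    {
        long acc = seed;
        for (int i = 0; i < 5000000; i++)
            acc = (acc * 31 + i) % 1000003;
        return acc;
    }

    // Runs jobCount independent jobs and lets the scheduler decide
    // which worker thread and core each one runs on.
    public static long[] RunAll(int jobCount)
    {
        long[] results = new long[jobCount];
        Parallel.For(0, jobCount, i =>
        {
            results[i] = BusyWork(i); // each iteration writes only its own slot
        });
        return results;
    }
}
```

ParallelDemo.RunAll(4) queues four jobs without any affinity calls. Whether it actually spreads them across all four cores is up to the scheduler, which is the point of the suggestion.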


As kind of a by-the-way: I'm not really trying to run these particular tasks on specific cores for performance reasons. Honestly, I'm just trying to figure out how I would do it if I needed to.

I think you need to set your threading model to multi-threaded apartment (MTA). If you're not using WinForms or doing COM interop, you can just put the MTAThread attribute on your Main method. Otherwise, you need to call Thread.SetApartmentState on the threads that you create so that they are MTA.

Some info on the threading models is in the link below.
http://www.developer.com/net/cplus/article.php/2202491

I tried setting mine to MTA and it crashed with something about a mouse event problem. I posted a thread about it, and people said I didn't have to set up MTA, that it should multi-thread anyway. I still can't get past 50% CPU usage on my dual core. I'm working on this issue too!

Edit: Cancel this. RipTorn asked the same thing, and you already answered, so never mind.

Leaving this here for posterity.

Quote:
Original post by ferr
I'm messing around with my quad core, trying to get four large processes/threads running in parallel at max speed, i.e. one thread per core. The problem is that SetThreadAffinityMask seems to do nothing.

Just out of interest, why are you trying to manually push the threads onto separate cores?
If the OS's thread scheduling/load balancing code is working at all then it should do an adequate job of distributing them without any extra hints.

When I modified my raytracer to run in two threads, XP didn't seem to have a problem spreading them over my two cores...

Having said that, when the raytracer was running in just one thread, the OS did seem to want to continuously bounce that thread between my two cores instead of leaving it running on just one of them (at least judging by the CPU usage graphs)... so maybe the load balancing code isn't quite as clever as I'd hoped.

John B
