# [.net] Parallel performance in C#


I'm looking to optimize a texture synthesis program I'm writing by making it parallel. I ran a few tests in C#, however, and realized I have no idea how to get good parallel performance in .NET. Here's an example of what I've tried:

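```csharp
// Sequential baseline: 1000 convolutions on the calling thread.
WI wi = new WI(a, b, c);
for (int i = 0; i < 1000; i++)
{
    DoSomething(wi);
}

...

// Parallel version: two asynchronous delegate calls, 500 convolutions each.
// (Asynchronous delegates are .NET Framework only; on .NET Core and .NET 5+
// BeginInvoke throws PlatformNotSupportedException.)
WorkDelegate wd = new WorkDelegate(DoSomething500);
IAsyncResult res1 = wd.BeginInvoke(new WI(a, b, c), null, null);
IAsyncResult res2 = wd.BeginInvoke(new WI(a2, b2, c2), null, null);
// EndInvoke waits for each call to finish, rethrows any exception from the
// worker, and releases the call's resources (WaitOne alone does not).
wd.EndInvoke(res1);
wd.EndInvoke(res2);
```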


In this example, DoSomething convolves two 1000x1000 matrices, and DoSomething500 does that 500 times. Since the two threads should be completely independent, with no cache-coherency problems and so on, I'd expect this to run close to 2x as fast on a dual-core system, but the parallel version is at most 40-60% faster (a speedup of roughly 1.4x to 1.6x). Does anyone have ideas on making this faster, or should I stick to C++ for parallel work?
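
To put numbers on a comparison like this, here is a minimal timing sketch. It assumes a newer .NET where `Task.Run` is available, swapped in for the delegate `BeginInvoke` used above (which newer runtimes no longer support). `SpeedupTimer`, `Measure` and the `halfOfTheWork` action are hypothetical names; the action stands in for something like DoSomething500 and is not the poster's code. Speedup is sequential time divided by parallel time, so perfect scaling on two cores would give 2.0.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: times the same total work run sequentially and split
// across two thread-pool tasks, and reports the ratio.
public static class SpeedupTimer
{
    // Wall-clock time of a single call to `action`.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Sequential time / parallel time: 2.0 is ideal on two cores,
    // 1.4 to 1.6 corresponds to "40-60% faster".
    public static double Speedup(TimeSpan sequential, TimeSpan parallel) =>
        sequential.TotalSeconds / parallel.TotalSeconds;

    public static double Measure(Action halfOfTheWork)
    {
        // Untimed warm-up so JIT compilation is not charged to either run.
        halfOfTheWork();

        // Sequential: both halves on the calling thread.
        TimeSpan seq = Time(() => { halfOfTheWork(); halfOfTheWork(); });

        // Parallel: one half per task; WaitAll blocks until both finish
        // and rethrows any exception raised inside them.
        TimeSpan par = Time(() =>
        {
            Task t1 = Task.Run(halfOfTheWork);
            Task t2 = Task.Run(halfOfTheWork);
            Task.WaitAll(t1, t2);
        });

        return Speedup(seq, par);
    }
}
```

Calling `SpeedupTimer.Measure(() => DoSomething500(wi))` would report the ratio directly; the result will vary from run to run, so averaging several measurements is advisable.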

---
All the code I've seen is in C++:
http://www.devx.com/go-parallel/Article/32726

---
You should take a look at Amdahl's law.

---
According to Amdahl's law, the speedup should be 2x, since there is virtually no sequential code: 1 / (0 + 1/2) = 2 for a sequential fraction of 0 on two cores. Amdahl's law doesn't account for memory bandwidth or other hardware constraints, which are more likely my problem. Convolving matrices that can't fit in the cache will probably make memory bandwidth my bottleneck, so it may not be a .NET issue after all.
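
For reference, Amdahl's law in its usual form, where P is the fraction of the work that can run in parallel and N is the number of cores (P and N are notation introduced here, not in the thread). With P = 1 and N = 2 it reproduces the ideal 2x above. Read the other way, a measured 1.5x on two cores would correspond to P = 2/3, which does not fit two fully independent workloads, so a shared hardware resource is the more plausible explanation.

$$
S(N) = \frac{1}{(1 - P) + \frac{P}{N}}, \qquad
S(2)\big|_{P=1} = \frac{1}{0 + \frac{1}{2}} = 2, \qquad
\frac{1}{(1 - P) + \frac{P}{2}} = 1.5 \;\Rightarrow\; P = \frac{2}{3}
$$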

---
AFAIK, BeginInvoke runs the delegate on a low-priority background thread from the .NET thread pool.

If you really want to push it, you should create your own threads and possibly set the processor affinity of each one.
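
A minimal sketch of the dedicated-thread suggestion, assuming the two halves of the work are passed in as delegates; `DedicatedThreads` and `RunOnTwoThreads` are hypothetical names. `System.Threading.Thread` has no processor-affinity property, so this sketch does not pin threads to cores and leaves placement to the OS scheduler.

```csharp
using System;
using System.Threading;

public static class DedicatedThreads
{
    // Runs the two halves on two newly created threads and waits for both.
    public static void RunOnTwoThreads(Action firstHalf, Action secondHalf)
    {
        // Background threads do not keep the process alive on exit.
        var t1 = new Thread(() => firstHalf()) { IsBackground = true, Name = "worker-1" };
        var t2 = new Thread(() => secondHalf()) { IsBackground = true, Name = "worker-2" };

        // Priority stays at the default (Normal). Setting it to
        // ThreadPriority.BelowNormal would favor the UI and other programs.
        t1.Start();
        t2.Start();

        // Join blocks until each thread finishes. Unlike tasks, an unhandled
        // exception on a raw thread is not rethrown here; it ends the process.
        t1.Join();
        t2.Join();
    }
}
```

Compared with the thread pool, this trades convenience (no exception propagation, no reuse of threads) for explicit control over each worker's lifetime, name, and priority.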

---
Changing thread priority isn't going to make a big difference when almost nothing else is running. Besides, a program like this should run at low priority anyway, so that other programs (and my own UI) remain responsive. As for processor affinity, the OS scheduler already spreads threads across the cores on its own.

For anyone who is interested, I tracked down the problem. Each 1000x1000 matrix of floats is around 4 MB (1000 x 1000 x 4 bytes). Convolving two of them into a third gives a working set of about 12 MB, far larger than the CPU cache, so most memory accesses miss the cache and go straight out to the memory bus. Since that bus is shared between both cores, running both cores at once simply made memory bandwidth the limiting factor. When I took memory bandwidth out of the equation by convolving smaller matrices (which is closer to my texture synthesis application), the parallel version was nearly 95% faster than the single-threaded one (a speedup of about 1.95x).
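
The working-set arithmetic above can be checked with a few lines of code. This is a back-of-envelope sketch: `WorkingSet`, `Bytes` and `LargestFittingSize` are hypothetical names, and any cache size passed in is an assumption about the target CPU, not a figure from the thread.

```csharp
using System;

public static class WorkingSet
{
    // Bytes needed for `count` square n x n matrices of 4-byte floats.
    public static long Bytes(int n, int count) => (long)n * n * sizeof(float) * count;

    // Largest n such that `count` n x n float matrices fit in `cacheBytes`.
    public static int LargestFittingSize(long cacheBytes, int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        long perElementBytes = (long)sizeof(float) * count;
        int n = (int)Math.Sqrt((double)cacheBytes / perElementBytes);
        // Correct for floating-point rounding in either direction.
        while ((long)(n + 1) * (n + 1) * perElementBytes <= cacheBytes) n++;
        while (n > 0 && (long)n * n * perElementBytes > cacheBytes) n--;
        return n;
    }
}
```

For example, `WorkingSet.Bytes(1000, 3)` gives 12,000,000 bytes for two inputs plus the result, and with a hypothetical 2 MiB cache, `WorkingSet.LargestFittingSize(2L * 1024 * 1024, 3)` returns 418, i.e. matrices of roughly 418x418 or smaller would keep all three resident.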

---
I'm interested =) ... thanks for posting this thread :).

---
Yes, thanks indeed for posting. I found it to be quite informative!
