TimMisiak

[.net] Parallel performance in C#

This topic is 4236 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


I'm looking to optimize a texture synthesis program I'm writing by making it parallel. I ran a few tests in C#, however, and realized I have no clue how to get performant parallel algorithms in .NET. Here's an example of what I've tried:

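// Sequential version: 1000 convolutions on one thread.
WI wi = new WI(a, b, c);
for (int i = 0; i < 1000; i++)
{
    DoSomething(wi);
}

...

// Parallel version: two asynchronous calls of 500 convolutions each,
// each with its own work item.
WorkDelegate wd = new WorkDelegate(DoSomething500);
IAsyncResult res1 = wd.BeginInvoke(new WI(a, b, c), null, null);
IAsyncResult res2 = wd.BeginInvoke(new WI(a2, b2, c2), null, null);
// Block until both calls have finished.
res1.AsyncWaitHandle.WaitOne();
res2.AsyncWaitHandle.WaitOne();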



In this example, DoSomething just convolves two 1000x1000 matrices, and DoSomething500 does it 500 times. Since the two threads should be completely independent (no cache coherency problems, etc.), I'd think that this should run at close to 2x speed on a dual-core system, but I find that the parallel version runs only 40-60% faster. Does anyone have any ideas on making this faster, or should I stick to C++ for parallel stuff?

According to Amdahl's law, the speedup should be 2x, since there is virtually no sequential code: with parallel fraction P = 1 on N = 2 cores, 1/((1 - P) + P/N) = 1/(0 + 1/2) = 2. Amdahl's law doesn't take memory bandwidth or other hardware constraints into consideration, and those are more likely my problem. Convolving matrices that can't fit inside the cache is probably going to make memory bandwidth my bottleneck, so it may not be a .NET issue after all.
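
For reference, the Amdahl's law arithmetic can be written out as below. The symbols S (speedup), P (parallel fraction) and N (core count) are the usual textbook names, not something taken from the code above, and the last line is only one way to read the measured 40-60% gain: it treats the shortfall as if it were serial code, which is an assumption, since bandwidth contention would produce the same number.

```latex
% Amdahl's law: speedup on N cores with parallel fraction P
S(N) = \frac{1}{(1 - P) + \dfrac{P}{N}}

% Ideal case from the post: P = 1, N = 2
S(2) = \frac{1}{0 + \tfrac{1}{2}} = 2

% Solving for P at N = 2 gives the "effective" parallel fraction
P = 2\left(1 - \frac{1}{S}\right)
\quad\Rightarrow\quad
S = 1.5 \;\text{gives}\; P = 2\left(1 - \tfrac{2}{3}\right) = \tfrac{2}{3}
```

In other words, a 1.5x speedup on two cores looks the same as a program in which a third of the work cannot be overlapped at all.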

AFAIK, BeginInvoke runs on a low-priority thread from the Windows Forms background thread pool.

If you really want to push it, you should be creating your own threads, and possibly setting processor affinity to each of them.
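
A minimal sketch of the "create your own threads" suggestion, using System.Threading.Thread instead of BeginInvoke. The Work method is a hypothetical stand-in for DoSomething500 (it just sums squares), and the class and method names are made up for illustration. Processor affinity is left out on purpose.

```csharp
using System;
using System.Threading;

public static class DedicatedThreadsSketch
{
    // Hypothetical stand-in for DoSomething500: sums i*i for i in [0, count).
    public static double Work(int count)
    {
        double sum = 0.0;
        for (int i = 0; i < count; i++)
        {
            sum += (double)i * i;
        }
        return sum;
    }

    // Starts one dedicated thread per entry in counts, waits for all of
    // them, and returns each thread's result in the matching slot.
    public static double[] RunOnDedicatedThreads(int[] counts)
    {
        double[] results = new double[counts.Length];
        Thread[] threads = new Thread[counts.Length];
        for (int t = 0; t < counts.Length; t++)
        {
            int index = t; // local copy so each lambda captures its own slot
            threads[t] = new Thread(() => results[index] = Work(counts[index]));
            threads[t].Start();
        }
        foreach (Thread thread in threads)
        {
            thread.Join(); // block until this worker has finished
        }
        return results;
    }

    public static void Main()
    {
        double[] results = RunOnDedicatedThreads(new[] { 1000, 2000 });
        Console.WriteLine(results[0] + " " + results[1]);
    }
}
```

Each worker writes only to its own slot of the results array, so no locking is needed, and Join guarantees the writes are visible before the results are read.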

Changing thread priority isn't going to make a big difference when almost nothing else is running. And a program like this should be running at low priority anyway so that other programs (and my own UI) remain responsive. As for processor affinity, the scheduler does this automatically.

For anyone who is interested, I tracked down the problem. Each 1000x1000 matrix of floats is around 4 MB. Obviously, convolving two matrices (resulting in a third) touches far more data than can fit in the cache. This means that most memory accesses are cache misses that go straight to the memory bus. Since the bus is shared between both cores, running both cores simply made memory bandwidth the limiting factor. When I took memory bandwidth out of the equation by convolving smaller matrices (which is closer to my texture synthesis application), the parallel version was nearly 95% faster than the single-threaded version.
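
A rough sketch of how this kind of comparison could be reproduced. Everything here is an assumption for illustration: the workload is a simple element-wise multiply rather than the actual convolution, and the class, method names and sizes are made up. It computes the working set for a given matrix size and times one thread against two threads that each do the same amount of private work.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class WorkingSetSketch
{
    // Bytes touched by matrixCount float matrices of size n x n.
    public static long WorkingSetBytes(int n, int matrixCount)
    {
        return (long)n * n * sizeof(float) * matrixCount;
    }

    // Hypothetical workload: c = a * b element by element, repeated.
    static void Multiply(float[] a, float[] b, float[] c, int passes)
    {
        for (int p = 0; p < passes; p++)
        {
            for (int i = 0; i < c.Length; i++)
            {
                c[i] = a[i] * b[i];
            }
        }
    }

    // Runs the workload on threadCount threads, each with its own three
    // n x n matrices, and returns the elapsed wall-clock milliseconds.
    public static double TimeMs(int n, int passes, int threadCount)
    {
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            float[] a = new float[n * n];
            float[] b = new float[n * n];
            float[] c = new float[n * n];
            threads[t] = new Thread(() => Multiply(a, b, c, passes));
        }
        Stopwatch timer = Stopwatch.StartNew();
        foreach (Thread thread in threads) thread.Start();
        foreach (Thread thread in threads) thread.Join();
        timer.Stop();
        return timer.Elapsed.TotalMilliseconds;
    }

    public static void Main()
    {
        // Same number of element operations per thread in both cases.
        int[] sizes = { 1000, 100 };
        int[] passes = { 50, 5000 };
        for (int k = 0; k < sizes.Length; k++)
        {
            double one = TimeMs(sizes[k], passes[k], 1);
            double two = TimeMs(sizes[k], passes[k], 2);
            // Two threads do twice the work, so ideal scaling gives 2.0.
            Console.WriteLine("n=" + sizes[k]
                + " working set per thread=" + WorkingSetBytes(sizes[k], 3) + " bytes"
                + " scaling=" + (2.0 * one / two).ToString("F2"));
        }
    }
}
```

For n = 1000 the three matrices come to 12,000,000 bytes per thread, versus 120,000 bytes for n = 100. Whether the scaling figure actually drops for the larger size depends on the machine's cache sizes and memory system, so treat any numbers from this as indicative only.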
