C# - Having a strange problem with threading.

8 comments, last by Narf the Mouse 14 years, 1 month ago
I'm attempting to write a simple parallel-processing class. It receives a list of items of a generic type and an action to perform on them. Then it "splits" the list among threads, each of which applies the action to "its" list items. The problem is, I'd expect processor use to rise to around 100% while this is running, but it tops out at 58%, so it seems as if something isn't parallelizing properly. Source:

using System;
using System.Collections.Generic;
using System.Linq;

// Tuple2<T1, T2, T3, T4> is a custom four-member tuple type (not shown).
public static class Paralelize<T>
{
    [ThreadStatic]
    static List<System.Threading.Thread> threads = new List<System.Threading.Thread>();

    // Applies op to every item of operand, split across System.Environment.ProcessorCount threads.
    public static void ParalelizeOperation(IEnumerable<T> operand, Action<T> op)
    {
        Int32 t;

        threads.Clear();
        System.Threading.Thread temp;
        Tuple2<IEnumerable<T>, Int32, Int32, Action<T>> data;
        // Start ProcessorCount - 1 background threads; thread t handles items t, t + ProcessorCount, ...
        for (t = 0; t < System.Environment.ProcessorCount - 1; ++t)
        {
            data = new Tuple2<IEnumerable<T>, Int32, Int32, Action<T>>(operand, System.Environment.ProcessorCount, t, op);
            temp = new System.Threading.Thread(new System.Threading.ParameterizedThreadStart(RunIEnumerable));
            temp.IsBackground = true;
            temp.Start(data);
            threads.Add(temp);
        }

        // The calling thread processes the last share itself.
        data = new Tuple2<IEnumerable<T>, Int32, Int32, Action<T>>(operand, System.Environment.ProcessorCount, System.Environment.ProcessorCount - 1, op);
        RunIEnumerable(data);

        // Spin-wait on the worker threads.
        t = 0;
        while (t < threads.Count && threads[t].IsAlive)
            if (!threads[t].IsAlive)
                ++t;
    }

    // obj holds (items, stride, first index, action).
    static void RunIEnumerable(Object obj)
    {
        Tuple2<IEnumerable<T>, Int32, Int32, Action<T>> data = (Tuple2<IEnumerable<T>, Int32, Int32, Action<T>>)obj;
        IEnumerable<T> enumerable = data.Item1;

        for (Int32 t = data.Item3; t < enumerable.Count(); t += data.Item2)
        {
            data.Item4(enumerable.ElementAt(t));
        }
    }
}

Your class is ridiculously complex and heavyweight, as well as a nightmare to try to read and follow. Additionally, the operation you are trying to implement already exists.

int[] data = new[] { 0, 1, 2, 3, 4, 5, 6 };
Parallel.ForEach(data, Console.WriteLine);
Mike Popoloski | Journal | SlimDX
Quote:Original post by Mike.Popoloski
Your class is ridiculously complex and heavyweight, as well as a nightmare to try to read and follow. Additionally, the operation you are trying to implement already exists.

Wow, uh, thanks.

Er, what changes would you suggest to make it simple and lightweight?

Edit: That operation topped out at 57%, according to Task Manager. I have a dual-core CPU; I'd expect at least a 33% improvement in speed, but I'm not really seeing any.
Quote:Original post by Narf the Mouse
Er, what changes would you suggest to make it simple and lightweight?


I think he's saying just write your data processing function and use Parallel.ForEach. From what I gather Parallel.ForEach will efficiently split your data amongst threads running the passed function. So, I'd suggest reading the docs about Parallel...
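A minimal sketch of that approach, assuming .NET 4 (System.Threading.Tasks and System.Collections.Concurrent). When the per-item work is very cheap, one documented pattern is to let Partitioner.Create hand out contiguous index ranges, so the delegate overhead is paid once per range rather than once per element. ParallelSketch and ApplyInRanges are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ParallelSketch
{
    // Applies op to every element of items, letting Parallel.ForEach
    // schedule contiguous index ranges on the thread pool.
    public static void ApplyInRanges<T>(IList<T> items, Action<T> op)
    {
        // Partitioner.Create(from, to) throws if to <= from, so skip empty input.
        if (items.Count == 0)
        {
            return;
        }

        // Each range is a Tuple<int, int> covering indices [Item1, Item2).
        var ranges = Partitioner.Create(0, items.Count);
        Parallel.ForEach(ranges, range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                op(items[i]);
            }
        });
    }
}
```

Whether this beats a plain Parallel.ForEach(list, action) depends on how much work each item does; for a body as small as one string concatenation, per-element scheduling overhead can dominate.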

-me
Quote:Original post by Palidine
Quote:Original post by Narf the Mouse
Er, what changes would you suggest to make it simple and lightweight?


I think he's saying just write your data processing function and use Parallel.ForEach. From what I gather Parallel.ForEach will efficiently split your data amongst threads running the passed function. So, I'd suggest reading the docs about Parallel...

-me

As I've said before about other things, I wouldn't learn much of anything about parallel processing beyond how to use Parallel from that. And Parallel.ForEach isn't showing any real speed improvement on my dual-core either, so it seems something is still not working right.
Hmmm, where to begin? First off, a big issue is your use of the Thread class, which is the heavyweight option in .NET threading: every new Thread is a separate OS thread with its own stack. .NET maintains a thread pool that reuses its threads, which is far preferable for small and fast operations. That change alone will make your function more efficient.
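A sketch of the thread-pool route, assuming .NET 4: Task.Factory.StartNew queues each chunk to the thread pool (with the default scheduler), and Task.WaitAll blocks until every chunk is done, rethrowing any failures as an AggregateException. It also assumes the input is an IList<T>, so each chunk can be a contiguous index range. PoolSketch and ApplyOnPool are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PoolSketch
{
    // Splits items into one contiguous chunk per processor and runs each chunk
    // as a task on the thread pool; returns once every chunk has finished.
    public static void ApplyOnPool<T>(IList<T> items, Action<T> op)
    {
        int count = items.Count;
        int workers = Environment.ProcessorCount;
        int chunkSize = (count + workers - 1) / workers; // ceiling division; 0 only when count is 0
        var tasks = new List<Task>();

        for (int start = 0; start < count; start += chunkSize)
        {
            int begin = start;                          // per-iteration copies for the closure
            int end = Math.Min(start + chunkSize, count);
            tasks.Add(Task.Factory.StartNew(() =>
            {
                for (int i = begin; i < end; i++)
                {
                    op(items[i]);
                }
            }));
        }

        Task.WaitAll(tasks.ToArray());
    }
}
```

Contiguous chunks also keep each worker on its own region of the list, rather than interleaving items across threads as the strided version does.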

Why do you maintain your list of threads outside of the function? Furthermore, why is it marked ThreadStatic? Do you even know what this does? It adds overhead to every access, and the field initializer runs only once, on whichever thread first touches the class, so on any other thread the threads field is null. It doesn't even appear to be necessary in this case.
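A small demonstration of that pitfall, based on the documented behavior of ThreadStaticAttribute: an initializer on a [ThreadStatic] field runs once, in the static constructor, so only the thread that first touches the class sees an initialized value. ThreadStaticDemo and its members are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class ThreadStaticDemo
{
    // Initialized only on the thread that triggers the static constructor;
    // every other thread starts with the default value, null.
    [ThreadStatic]
    static List<int> perThreadList = new List<int>();

    public static bool ListExistsOnCurrentThread()
    {
        return perThreadList != null;
    }

    // Checks the same field from a freshly started thread.
    public static bool ListExistsOnNewThread()
    {
        bool exists = true;
        var thread = new Thread(() => exists = ListExistsOnCurrentThread());
        thread.Start();
        thread.Join();
        return exists;
    }
}
```

If per-thread state really were needed, initializing it lazily (checking for null on each thread) or using .NET 4's ThreadLocal<T> with a value factory avoids this; for the class above, a plain local List<Thread> inside the method would do.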

Your use of this "Tuple2" type makes things confusing. Why not one of the built-in tuple types? If you're running on an older version of .NET, why is it called Tuple2 if there are four members? You should follow the usage pattern given in .NET 4 and provide static creation methods so that you can take advantage of type inference and cut down on those atrocious type parameter lists. The use of the "var" keyword will also help here.
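A sketch of the static-creation-method pattern, modeled on .NET 4's Tuple.Create. Quad is a hypothetical stand-in for the custom Tuple2 type; it is mutable because System.Tuple's items are read-only and the poster's benchmark assigns to Item3.

```csharp
// Hypothetical mutable four-member tuple, standing in for the custom Tuple2 type.
public class Quad<T1, T2, T3, T4>
{
    public T1 Item1 { get; set; }
    public T2 Item2 { get; set; }
    public T3 Item3 { get; set; }
    public T4 Item4 { get; set; }

    public Quad(T1 item1, T2 item2, T3 item3, T4 item4)
    {
        Item1 = item1;
        Item2 = item2;
        Item3 = item3;
        Item4 = item4;
    }
}

// Non-generic companion class with a generic factory method, following the same
// pattern as Tuple.Create: the type arguments are inferred from the call site.
public static class Quad
{
    public static Quad<T1, T2, T3, T4> Create<T1, T2, T3, T4>(T1 item1, T2 item2, T3 item3, T4 item4)
    {
        return new Quad<T1, T2, T3, T4>(item1, item2, item3, item4);
    }
}
```

Inside ParalelizeOperation, var data = Quad.Create(operand, System.Environment.ProcessorCount, t, op); would then replace the spelled-out Tuple2<IEnumerable<T>, Int32, Int32, Action<T>> constructor call.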

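Your loop to wait for the other threads to finish is weird at first glance, and it's also wrong: it in practice never advances t, so it exits as soon as the first worker finishes without waiting for any of the others, and while it waits it spins, burning a core. Did you not know of the Thread.Join method?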
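A minimal sketch of waiting with Thread.Join instead of spinning. It assumes every thread in the list has already been started (Join throws ThreadStateException on an unstarted thread); JoinSketch and WaitForAll are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Threading;

public static class JoinSketch
{
    // Blocks (without spinning) until every thread has finished. Join returns
    // immediately for a thread that has already exited, so the order is irrelevant.
    public static void WaitForAll(IEnumerable<Thread> threads)
    {
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
    }
}
```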

You're at least using .NET 3.5 here, so why not use lambda expressions to write the RunIEnumerable method inline?
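A sketch of the same strided split written with lambdas, assuming the input is an IList<T>. Each thread's work captures what it needs, so no ParameterizedThreadStart, Object cast, or four-member tuple is required. LambdaThreadsSketch and Apply are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class LambdaThreadsSketch
{
    // Worker w handles items w, w + workers, w + 2 * workers, ... as in the original.
    public static void Apply<T>(IList<T> items, Action<T> op)
    {
        int workers = Environment.ProcessorCount;
        int count = items.Count;                 // read once, not on every iteration
        var threads = new List<Thread>();

        for (int w = 0; w < workers - 1; w++)
        {
            int start = w;                       // per-iteration copy; capturing w directly would
                                                 // share one variable across all the lambdas
            var worker = new Thread(() =>
            {
                for (int i = start; i < count; i += workers)
                {
                    op(items[i]);
                }
            });
            worker.IsBackground = true;
            worker.Start();
            threads.Add(worker);
        }

        // The calling thread takes the last share, as in the original.
        for (int i = workers - 1; i < count; i += workers)
        {
            op(items[i]);
        }

        foreach (Thread t in threads)
        {
            t.Join();
        }
    }
}
```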

Your RunIEnumerable method is very inefficient. You're calling Enumerable.Count() and Enumerable.ElementAt(), which have to walk the sequence from the start whenever it isn't backed by a collection or list. Each call is then O(n), which makes the whole loop O(n²) and will kill you on large data sets. Additionally, Count() gets called on every loop iteration.
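A small experiment that makes the cost visible, assuming a lazily generated sequence (neither ICollection<T> nor IList<T>). The hypothetical ItemsProduced counter records how many elements the iterator has to produce, comparing the Count()/ElementAt() pattern from RunIEnumerable with materializing the sequence once.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class EnumerationCost
{
    // Hypothetical instrumentation: total elements produced by Numbers() so far.
    public static int ItemsProduced;

    // A lazy sequence, so Count() and ElementAt() must walk it from the start.
    public static IEnumerable<int> Numbers(int n)
    {
        for (int i = 0; i < n; i++)
        {
            ItemsProduced++;
            yield return i;
        }
    }

    // The RunIEnumerable pattern: Count() on every iteration, ElementAt(t) per item.
    public static long SumLikeOriginal(IEnumerable<int> seq)
    {
        long sum = 0;
        for (int t = 0; t < seq.Count(); t++)
        {
            sum += seq.ElementAt(t);
        }
        return sum;
    }

    // Materialize once; afterwards Count and the indexer are O(1).
    public static long SumMaterialized(IEnumerable<int> seq)
    {
        IList<int> list = seq as IList<int> ?? seq.ToList();
        int count = list.Count;
        long sum = 0;
        for (int t = 0; t < count; t++)
        {
            sum += list[t];
        }
        return sum;
    }
}
```

For 100 elements, the first version makes the iterator produce well over 10,000 elements, while the second produces exactly 100.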

Finally, the task manager isn't the greatest way to get performance information, so I would take any numbers it's giving you with a grain of salt.
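One alternative to reading Task Manager is to time the work directly with System.Diagnostics.Stopwatch. A minimal sketch, assuming a warm-up run (so JIT compilation isn't measured) and taking the best of several runs is an acceptable measure; TimingSketch and BestOfMilliseconds are hypothetical names, and runs must be at least 1.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Runs action once untimed, then returns the fastest of `runs` timed runs in milliseconds.
    public static double BestOfMilliseconds(Action action, int runs)
    {
        action(); // warm-up
        double best = double.MaxValue;
        for (int r = 0; r < runs; r++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
        }
        return best;
    }
}
```

Comparing BestOfMilliseconds for the sequential loop against the parallel version gives a speedup ratio that doesn't depend on how Task Manager samples CPU usage.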
Mike Popoloski | Journal | SlimDX
Quote:Original post by Narf the Mouse
Edit: That operation topped out at 57%, according to Task Manager. I have a dual-core CPU; I'd expect at least a 33% improvement in speed, but I'm not really seeing any.


You won't see any speed increase if you're I/O-bound or all your threads are contending for a single mutex-protected resource. The problem at that point isn't Parallel.ForEach, or your home-brewed pale imitation of it, but whatever you're doing inside the loop. By using my psychic powers, I am able to determine you're doing something along these lines. Care to share how you're using this?
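A sketch of the "single mutex-protected resource" case, assuming .NET 4's Parallel.ForEach. In the first method every iteration takes the same lock, so the bodies effectively run one at a time. The second uses the localInit/localFinally overload so each worker accumulates privately and locks only once. ContentionSketch and Work are hypothetical; Work is a stand-in for CPU-bound per-item work.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ContentionSketch
{
    static readonly object Gate = new object();

    // All the work happens inside one shared lock: effectively serial.
    public static long SumSerialized(IList<int> values)
    {
        long total = 0;
        Parallel.ForEach(values, v =>
        {
            lock (Gate) { total += Work(v); }
        });
        return total;
    }

    // Per-worker subtotals; the lock is taken once per worker, in localFinally.
    public static long SumWithLocals(IList<int> values)
    {
        long total = 0;
        Parallel.ForEach(
            values,
            () => 0L,                                            // localInit: per-worker subtotal
            (v, state, subtotal) => subtotal + Work(v),          // body: no shared state touched
            subtotal => { lock (Gate) { total += subtotal; } }); // localFinally
        return total;
    }

    // Hypothetical stand-in for CPU-bound per-item work.
    static long Work(int v)
    {
        long acc = v;
        for (int i = 0; i < 1000; i++)
        {
            acc = (acc * 31 + i) % 1000003;
        }
        return acc;
    }
}
```

Both methods return the same total; only the second lets the expensive part run on several cores at once.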
As far as I can see, I'm using it for exactly what it's supposed to be used for: looping over an IEnumerable and applying a function.

System.Diagnostics.Stopwatch stopwatch = new System.Diagnostics.Stopwatch();
stopwatch.Start();
Double startTime, endTime;
Int32 t = 0,
    count = 10000000;
List<Tuple2<String, String, String, String>> list = new List<Tuple2<String, String, String, String>>(count);
for (t = 0; t < count; ++t)
{
    list.Add(new Tuple2<String, String, String, String>("Hello ", "World!", "", ""));
}
Action<Tuple2<String, String, String, String>> action = new Action<Tuple2<String, String, String, String>>(
    a =>
    {
        a.Item3 = a.Item1 + a.Item2;
    }
);
startTime = stopwatch.Elapsed.TotalMilliseconds;
for (t = 0; t < count; ++t)
{
    action(list[t]);
}
endTime = stopwatch.Elapsed.TotalMilliseconds;
Handy.StringLoopsTo(endTime - startTime, loops: count, a: Console.WriteLine);
startTime = stopwatch.Elapsed.TotalMilliseconds;
Paralelize<Tuple2<String, String, String, String>>.ParalelizeOperation(
    list,
    action
);
// System.Threading.Tasks.Parallel.ForEach<Tuple2<String, String, String, String>>(list, action);
endTime = stopwatch.Elapsed.TotalMilliseconds;
Handy.StringLoopsTo(endTime - startTime, loops: count, a: Console.WriteLine);
Console.ReadKey(true);
Quote:
As I've said before about other things, I wouldn't learn much of anything about parallel processing beyond how to use Parallel from that.

You've got the right idea, but the wrong approach. You will learn more about the domain by using Parallel than you will by trying to re-implement the technology incorrectly without a proper foundation. In fact, the approach of "learning by stumbling about in the dark" is likely to lead you to learn things incorrectly.

This is not saying that you don't learn by doing and making mistakes -- mistakes are certainly important. But your approaches thus far seem to lack that foundation and your attitude tends to imply, to me at least, that you're letting yourself get tunnel vision regarding reinventing the wheel.

As far as concurrency goes, in any case, the harder things to learn are more about what you do in a concurrent context rather than how you do it. In other words, it's how you deal with what gets run on multiple threads, not the thread pool and sync primitive implementations themselves.
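A tiny example of the "what you do on multiple threads" problem: a plain counter++ from several threads is a read-modify-write, so two threads can read the same value and one increment is lost. Interlocked.Increment performs the update atomically. SharedCounterSketch and CountAtomically are hypothetical names; Parallel.For assumes .NET 4.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SharedCounterSketch
{
    // Increments a shared counter from many iterations running in parallel.
    // Writing counter++ here instead could lose updates.
    public static int CountAtomically(int iterations)
    {
        int counter = 0;
        Parallel.For(0, iterations, i => Interlocked.Increment(ref counter));
        return counter;
    }
}
```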
Quote:Original post by jpetrie
Quote:
As I've said before about other things, I wouldn't learn much of anything about parallel processing beyond how to use Parallel from that.

You've got the right idea, but the wrong approach. You will learn more about the domain by using Parallel than you will by trying to re-implement the technology incorrectly without a proper foundation. In fact, the approach of "learning by stumbling about in the dark" is likely to lead you to learn things incorrectly.

This is not saying that you don't learn by doing and making mistakes -- mistakes are certainly important. But your approaches thus far seem to lack that foundation and your attitude tends to imply, to me at least, that you're letting yourself get tunnel vision regarding reinventing the wheel.

As far as concurrency goes, in any case, the harder things to learn are more about what you do in a concurrent context rather than how you do it. In other words, it's how you deal with what gets run on multiple threads, not the thread pool and sync primitive implementations themselves.

Huh...Good point.

This topic is closed to new replies.
