How to do Ordered Multithreading: 2 or more threads executing 1 same function.

This topic is 1920 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Here's my current code and the output:

[source lang="java"]public class Worker{
public static int id;

public static void print(){
System.out.println("Thread hashcode: " + Integer.toString(id));
}
}

public class Main implements Runnable {
public static void main(String[] arg){
Main m = new Main();

t1.start();
t2.start();
}

public synchronized void run(){
for (int i = 0; i < 20; i++){
Worker.id = t.hashCode();
Worker.print();
}
}
}

//=================OUTPUT===========================
[/source]

The ouput, in other words, are labeled as:

AAAAAABBBBBB

I would like to make it so that it does:

ABABABABABAB

In the past, I would create threads and make them work on things independent of one another, Thread A does X work, Thread B does Y work, Thread C does Z work, and so on. The X, Y, and Z do not conflict each other.

Now that I'm starting out on doing multithreading in the same function, I realized it's a unknown adventure for me. I would like someone to lead the way, and warn me, "It's too dangerous to go alone, take this!".

Would anyone like to help?

EDIT: I'm using Firefox 16 and Internet Explorer 9, both browsers appeared to wonk out the code formatting. Edited by tom_mai78101

Share on other sites
If all you want is minimal overlapping, you can just give each of your threads an "index" (from 0, 1, 2, ..., n - 1 where n is your thread count) and make each thread aware of how many threads it is working with. Then, each thread with index T does work unit T, T + n, T + 2n, T + 3n, ... there you have it, zero overlapping. For two threads, the first one will do 0, 2, 4, 6, ... and the second one will do 1, 3, 5, 7, ... all within the same "function".

Note you probably don't want to use System.out.println when working with threads, since they'll print their stuff in nondeterministic order depending on which thread got there first. You want to keep a memory buffer which'll store your thread's results (one slot for each work unit you're doing) and each thread writes there. Then, at the end, you can process it however you want.

Also, for multithreading to be of any use, you need your function to contain significant work that can be parallelized. If you're just trying to get two threads to somehow execute an arbitrary function "together" (your diagram is rather vague), that's not the way to go, especially considering each thread has its own stack and CPU context.

I find OpenMP to be rather refreshing when it comes to getting familiar with the concepts involved in scalable multithreading. Don't know if Java has that though. Edited by Bacterius

Share on other sites
The significant work I want to give to 2 or more threads, is a Rendering function. All it does is to blit bitmap images onto a Canvas, which will, according to my game logic, make the Canvas "refresh" quicker.

This is why I wanted the threads to execute in an orderly fashion, like a queue pool of threads running towards a goal and hitting a button, in a straight line.

If I index each thread with an ID or some sort, I wouldn't know how to implement it. I wouldn't be able to split the Rendering function in groups in whic they (group IDs) are divisible by the respective thread IDs. The Rendering function is based solely on the order of execution, so it's easier to sort their Z positions out (drawing from back to front). And another problem would be, if I don't do this in an orderly fashion, the rendering will be a catastrophe.

=========================================

Just off topic question. What do you do to println() properly for multithreading programs, if I shouldn't be using println(), as you've mentioned, but I don't want to use a memory buffer? I want to avoid using any sort of memory buffers as this program is to be executed on a hardware with very limited memory.

Share on other sites

The significant work I want to give to 2 or more threads, is a Rendering function. All it does is to blit bitmap images onto a Canvas, which will, according to my game logic, make the Canvas "refresh" quicker.

This is why I wanted the threads to execute in an orderly fashion, like a queue pool of threads running towards a goal and hitting a button, in a straight line.

If I index each thread with an ID or some sort, I wouldn't know how to implement it. I wouldn't be able to split the Rendering function in groups in whic they (group IDs) are divisible by the respective thread IDs. The Rendering function is based solely on the order of execution, so it's easier to sort their Z positions out (drawing from back to front). And another problem would be, if I don't do this in an orderly fashion, the rendering will be a catastrophe.

=========================================

Just off topic question. What do you do to println() properly for multithreading programs, if I shouldn't be using println(), as you've mentioned, but I don't want to use a memory buffer? I want to avoid using any sort of memory buffers as this program is to be executed on a hardware with very limited memory.

You pretty much can't control the order of execution when you have multiple threads without removing the parallell execution(Which really is the point of using threads)

Now, since you are blitting bitmaps onto a canvas you should probably give each thread its own section of the canvas to write to so if you got 2 threads you can just split the canvas in the middle and have ThreadA blit the left side while ThreadB blits the right side (with 4 threads you can split it both vertically and horizontally or into 4 "rows" or "columns"), that way there is no interference between threads.

as for println, there is no good way to solve this without synchronization (Which degrades performance), you can have each thread write to its own stream without any problems but if you want them to write to the same stream you will need synchronization. (println in java is synchronized so if one thread calls println on standard output any other thread trying to call println will block until the first one has completed its println call), your best bet is probably to write to separate streams, timestamp the writes and then merge the two streams at a later point. (printing to the console is slow anyway so its not a big deal to merge all the output at the end of each frame and then push it out) Edited by SimonForsman

Share on other sites
That, I did not realize the meaning until now.

What method design (function design) should I approach so that it would like ideally parallelized? And that would be it. I'll try to study the pattern or paradigm, and probably ask a few more questions about them if I get into some problems.

Share on other sites
For good performance with multiple threads here's some rough guidelines of what to do:

- Output from each thread must go to a different bit of memory (i.e. on a separate cache line to memory any other threads touch). This may mean you need two copies of the data.
- Read only input data can be shared between all threads.
- Instead of using threads directly, use a thread pool that someone else has written and create one thread per CPU core if you get a choice.
- The main thread should split the work to be done into a bunch of independent tasks and adds them to the pool's work queue. It can then continue with other processing that won't affect the data the threads are using if you want. At some point the main thread probably needs to block and wait for the threads to complete their tasks.
- If the tasks can vary in how long they take to process it's a good idea to create significantly more tasks than you have CPU cores to distribute the workload more evenly, but making too many tasks will also add overheads.

By the way because of the way the CPU cache works in 64-byte cache lines you really don't want to have one task processing even elements of an array, and another task handling the odd elements. It's usually much quicker to give the first half of the array to one thread, and the second half to another to stop them fighting over memory accesses. The same goes for splitting an image vertically - do all the splitting horizontally instead and it should be quicker.

Share on other sites
"Now, since you are blitting bitmaps onto a canvas you should probably give each thread its own section of the canvas to write to so if you got 2 threads you can just split the canvas in the middle and have ThreadA blit the left side while ThreadB blits the right side (with 4 threads you can split it both vertically and horizontally or into 4 "rows" or "columns"), that way there is no interference between threads."

Split vertically, not horizontally. Why? Because bitmap memory normally runs left to right, top to bottom. This means that the pixels either side of a vertical dividing line are next to each other in memory and there's a good chance caches will try and pre-load the data. If things next to each other in memory are accessed by different threads running on cores in different packages, they'll have to move ownership of the memory around between the cores' caches. This is slow (it's like a cache miss, but a bit less expensive).

If you split vertically, you minimise the number of pixels which border each other and might be repeatedly moved between the threads/cores.

Share on other sites
Katie: wouldn't that be a horizontal split, though?

Share on other sites
There's a lot of discussion about splitting up the canvas into groups, and then let each thread draw the group. It gave me a whole lot of questions.

1. Wouldn't splitting up the Canvas into areas for threads to render on cause something that resembles tearing?

2. In Java, I do not know of a Java method that allows splitting bitmaps while it is being drawn. What is the logic for that? Test to see if the bitmap Rect area is over the threshold, and then stop drawing that area, while letting the other thread draw the undrawn area?

3. rip-off mentioned that I may be passing a gradually completing Canvas into methods. The possibility that this might be my bottleneck is actually quite high (as I haven't considered it before), as I have passed the same reference to the Canvas object through a lot of methods. Are there other alternatives to efficiently draw tons of bitmaps (of powers of 2, in my case) onto a Canvas? Edited by tom_mai78101

Share on other sites

There's a lot of discussion about splitting up the canvas into groups, and then let each thread draw the group. It gave me a whole lot of questions.

1. Wouldn't splitting up the Canvas into areas for threads to render on cause something that resembles tearing?

2. In Java, I do not know of a Java method that allows splitting bitmaps while it is being drawn. What is the logic for that? Test to see if the bitmap Rect area is over the threshold, and then stop drawing that area, while letting the other thread draw the undrawn area?

3. rip-off mentioned that I may be passing a gradually completing Canvas into methods. The possibility that this might be my bottleneck is actually quite high (as I haven't considered it before), as I have passed the same reference to the Canvas object through a lot of methods. Are there other alternatives to efficiently draw tons of bitmaps (of powers of 2, in my case) onto a Canvas?

1,2 you can access a bitmaps data array in Java (atleast on Android) and copy the parts of the array you want to the buffer you're blitting to.

3. yes, this is quite possible, unless you are blitting tons of bitmaps or using some really ineffective blitting method sending the canvas to the GPU will be the slow part. The Canvas in Java seems to be singlebuffered so you should blit to a software buffer (BufferedImage should work) and then blit that buffer to the canvas. (which means you only do a single blit to the canvas). Also, if you have layers or sections of layers that rarely change you can blit those to their own buffered images at launch + when they change and then blit those images to the main bufferedimage each frame. Edited by SimonForsman

Share on other sites
"wouldn't that be a horizontal split, though?"

OK, I was terminologically unclear. Split it vertically into horizontal strips. Yes, the split line is horizontal.

Share on other sites
The main problem for splitting the Canvas for threads to work on is the splitting part. I have not found so far a way to split a Canvas class object area into at least two areas. They said that the class Canvas doesn't "own" pixel data. The canvas I'm using to make drawing calls is always attached to the bitmap images.

I'm compelled to create more Canvas objects, and run threads through them. Merging Canvas is going to be a new topic for me, and so does trying to get 2 canvas to draw the same bitmap at different locations to match up with its universal coordinates (originally intended coordinates), because once they are split up, the X and Y destination coordinates for the bitmaps are not entirely the same as they should be.

For Android, in my opinion, their method of using their subset of Java would involve something like using Canvas constructor to create one from a bitmap. Maybe that counts as an BufferedImage substitute.

I did checked out a few resources about parallel programming for Java. The conncurrency keyword, "synchronized" is a total mystery for me. They also mentioned that how a "synchronized" keyword can also affect immensely the execution of threads. It would seemed I have reached a Pandora's Box, and probably may get my project totalled if I keep nudging towards it.

Share on other sites
Can you tell us more about your game?

There are lots of techniques that can be used to speed up the game. Multi-threading is a very complex way - it generally requires large architectural changes to how your program works, and also can only provide at most a constant speedup N - where N is the number of distinct hardware "cores" that can be used simultaneously. The chances that your program will get such a boost are low, most real applications aren't perfectly parallisable.

There might be other ways which can offer a much larger speedup - algorithmic improvements, choosing the right data structures or improving your memory layout and access patterns.

I did checked out a few resources about parallel programming for Java. The conncurrency keyword, "synchronized" is a total mystery for me. They also mentioned that how a "synchronized" keyword can also affect immensely the execution of threads. It would seemed I have reached a Pandora's Box, and probably may get my project totalled if I keep nudging towards it.
[/quote]
Hold up - if you don't understand such things then you absolutely should not be using threads. Threads add some very complex behaviour to the otherwise simple to understand Java programming model. When two threads interact without correct communication, the results are undefined and can result in some bizarre, even seemingly "impossible" situations.

If you are serious about learning to write multi-threaded programs you need to build up this knowledge first.

Share on other sites

Can you tell us more about your game?

My game is basically a player controlling a ball with an accelerometer embedded in a hardware system (such as an Android phone). The ball is used to knock other balls into a hole (the goal) in order to complete the level. There are many levels with objects interacting with the player's ball and the balls that should be knocked into the goal.

EDIT: I should mentioned that the accelerometer can easily change its X axis, Y axis, and Z axis values, simply because you are holding the accelerometer with your hands. It is not supposed to be placed on a table or on a hard surface.

About the game logic, the game itself is split into two groups, the Update group and the Rendering group. It's multithreaded and it's done in a simple way of abusing the nature of threads. Like many other people have said time and time again, threads are run inconsistently together, causing many race conditions and multiple complex problems if not handled correctly. By understanding the nature of how threads compete one another on the CPU cores (in this case, I have two cores), I utilize a parallel programming paradigm.

One thread is used to update logic ONLY. The other thread is used to render objects ONLY. And that's it. All variables are read-only if thread A wants to read what thread B is using, and vice versa. I have not done anything with concurrency in order to make both of these threads communicate with each other. I simply made them as separate processes with "global-ish" variables wrapped in a Java class (namespace, as you would call it).

When I test the program out, all is fine. Everything run as expected, and there's no bugs on the rendering or updates. It's a perfect combination. But, there are times when I felt it is running slow after a long period of processing the threads, updates, and rendering.

I did checked out a few resources about parallel programming for Java. The conncurrency keyword, "synchronized" is a total mystery for me. They also mentioned that how a "synchronized" keyword can also affect immensely the execution of threads. It would seemed I have reached a Pandora's Box, and probably may get my project totalled if I keep nudging towards it.

Hold up - if you don't understand such things then you absolutely should not be using threads. Threads add some very complex behaviour to the otherwise simple to understand Java programming model. When two threads interact without correct communication, the results are undefined and can result in some bizarre, even seemingly "impossible" situations.

If you are serious about learning to write multi-threaded programs you need to build up this knowledge first.
[/quote]

The only thing I understand about threads, is that they are unpredictable. So, I used this only understanding and wrote a medium-sized program that goes with this rule of thumb. Technically, it's a little bit like multithreading, but in reality, I really don't know a better way to describe this. "Separated threading" sounds a bit off, but that's a better description of it.

If "synchronized" is the answer to all of my questions, then it's either me not looking at it correctly or it's something I find it hard to grasp, yet can't find out why. But I did some more research before posting this.

The function pipe I drew in the diagram may well be a Java synchronized method. In order to let two threads go in one by one and on a tight, repeating loop, I have to understand the concept of notify(), notifyAll() and wait() and how it works in such a loop... This is the part where I'm probably stuck in, and had me lost my focus, I believe. Edited by tom_mai78101

Share on other sites

The only thing I understand about threads, is that they are unpredictable. So, I used this only understanding and wrote a medium-sized program that goes with this rule of thumb. Technically, it's a little bit like multithreading, but in reality, I really don't know a better way to describe this. "Separated threading" sounds a bit off, but that's a better description of it.
[/quote]

Sounds interesting. One thread/function that manages and executes different tasks, like when a multi-tasking OS is running 2 threads simultaneously on a single core (or running N+X threads on N cores, where X > 0)? I assume this requires that your tasks can suspend themselves after a timeslice (to give control back to the main function). Resembles coroutines (used within Lua to resemble multi-tasking) to me.

Share on other sites

The only thing I understand about threads, is that they are unpredictable. So, I used this only understanding and wrote a medium-sized program that goes with this rule of thumb. Technically, it's a little bit like multithreading, but in reality, I really don't know a better way to describe this. "Separated threading" sounds a bit off, but that's a better description of it.

Sounds interesting. One thread/function that manages and executes different tasks, like when a multi-tasking OS is running 2 threads simultaneously on a single core (or running N+X threads on N cores, where X > 0)? I assume this requires that your tasks can suspend themselves after a timeslice (to give control back to the main function). Resembles coroutines (used within Lua to resemble multi-tasking) to me.
[/quote]

I think you could say that. I just don't have any better descriptive ways to say it though.

Share on other sites

My game is basically a player controlling a ball with an accelerometer embedded in a hardware system (such as an Android phone). The ball is used to knock other balls into a hole (the goal) in order to complete the level. There are many levels with objects interacting with the player's ball and the balls that should be knocked into the goal.
[/quote]
This game doesn't sounds complicated enough to require multi-threading when rendering. Arguably it might not require any threading at all.

About the game logic, the game itself is split into two groups, the Update group and the Rendering group. It's multithreaded and it's done in a simple way of abusing the nature of threads. Like many other people have said time and time again, threads are run inconsistently together, causing many race conditions and multiple complex problems if not handled correctly. By understanding the nature of how threads compete one another on the CPU cores (in this case, I have two cores), I utilize a parallel programming paradigm.

One thread is used to update logic ONLY. The other thread is used to render objects ONLY. And that's it. All variables are read-only if thread A wants to read what thread B is using, and vice versa. I have not done anything with concurrency in order to make both of these threads communicate with each other. I simply made them as separate processes with "global-ish" variables wrapped in a Java class (namespace, as you would call it).

When I test the program out, all is fine. Everything run as expected, and there's no bugs on the rendering or updates. It's a perfect combination.
[/quote]
Unfortunately, it sounds like you have a latent bug. Your current combination of code, hardware and JVM implementation might not manifest the bug, but a change in any of these could break your program.

The problem is simple: Java doesn't guarantee that a thread will see a "consistent" view of the state of the program modified by other threads in the absence of synchronized (or Java classes/methods that have the same memory model semantics as synchronized (e.g. Atomic classes, volatile variables or explicit locks).

There are two particularly interesting cases of bugs caused by this.

One is that thread A does not see changes made by thread B at all - each thread might be only reading and writing to its own registers and processor caches, and in the absence of synchronization there is no reason for the JVM to explicitly let the processor know it must flush these local changes to main memory, and for the second thread to reload any changes.

The second is that thread A sees some of the actions of thread B, but in an unexpected order or even in seemingly "impossible" orders. For example, it is possible for a thread to see a partially constructed object in the following code:
 private static Foo sharedFoo = null; public static Foo getFoo() { if(sharedFoo == null) { Foo foo = new Foo(42); sharedFoo = foo; } return sharedFoo; } 
Even though Foo's constructor sets some field to the passed value, if two threads race on this method then it is possible for the following interaction to take place:
 Thread A - Calls getFoo().getValue() Thread A - Enter method getFoo() Thread A - sharedFoo == null Thread A - Allocate memory for "foo" // Note: for this example we presume the JVM zeros the allocated memory. Thread A - set sharedFoo to reference this memory >>> Context switch <<< Thread B - Calls getFoo().getValue() Thread B - Enter method getFoo() Thread B - sharedFoo != null Thread B - return sharedFoo Thread B - getValue() returns 0 // !!! >>> Context switch <<< Thread A - Calls constructor for "foo" Thread A - return sharedFoo Thread A - getValue() returns 42 
It might seem odd that the constructor of "foo" is run after the reference "sharedFoo" has been set, but this is a perfectly valid optimisation for the JVM/hardware to perform in the absence of synchronisation. If the constructor was even more complicated, you could end up with one thread seeing a class instance with the invariants broken!

But, there are times when I felt it is running slow after a long period of processing the threads, updates, and rendering.
[/quote]
If your game is running slow I am positive there are much better ways to get better performance. In any case, your first step should be to profile the application and determine the source of the slowness.

Adding threads can slow an application down - unless done with care.

If "synchronized" is the answer to all of my questions, then it's either me not looking at it correctly or it's something I find it hard to grasp, yet can't find out why. But I did some more research before posting this.

The function pipe I drew in the diagram may well be a Java synchronized method. In order to let two threads go in one by one and on a tight, repeating loop, I have to understand the concept of notify(), notifyAll() and wait() and how it works in such a loop... This is the part where I'm probably stuck in, and had me lost my focus, I believe.
[/quote]
One could use synchronized to get the kind of pattern you described above. However, the code would actually run slower than if you had a single thread alternating between the two actions.

Share on other sites
Do you mean something like this? I happened to get this to work as I wanted to.

[source lang="java"]package test;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Test {
List<Integer> array = new ArrayList<Integer>();
Runnable run = new Runnable() {
@Override
public void run() {
while (array.size() < 50) {
synchronized (array) {
try {
array.wait();
}
catch (InterruptedException e) {
}
}
}
}
};
Runnable wait = new Runnable() {
@Override
public void run() {
Random r = new Random();
while (array.size() < 50) {
synchronized (array) {
try {
}
catch (InterruptedException e) {
e.printStackTrace();
}
array.notify();
}
}
}
};

public Test() {
t1 = new Thread(run);
t2 = new Thread(run);
t3 = new Thread(wait);
}

public void start() {
t1.start();
t2.start();
t3.start();
}

public static void main(String[] ar) {
Test t = new Test();
t.start();
}
}
[/source]

What it does is it alternates between printing out "Thread 1 - First Thread" and "Thread 2- Last Thread" over and over again 50 times. I gave it a few extra milliseconds of time for it to completely finish the synchronized run() method, but that's only because I wanted to learn more about the keyword, [color=#800080]synchronized.

Anyway, if my current project is built on a latent bug in its concurrent state, I think the best method to increase performance would have to be what you recommended me to do, since you did say that if I were to modify the thread logic, I could break the build easily. If I just modify a small portion of the thread flow, it wouldn't hurt much at all.

My biggest proof to support that sentence is the 7 additional entities and 9 level maps I have added since I started the "separated threading" paradigm four months ago (July, at the time). I just kept adding and adding new game entities and maps, to the point that I'm just fixing small collision bugs here and there.

If I use only one single thread to work on both the Update group and the Rendering group, its work would be about twice the amount of work it takes for a thread to process either the Update group or the Rendering group, and not both. I did test to see if running on a single thread works or not, just by rewriting a small portion of my base code, where it is the place that initiated the separated threading. The result is actually slower than usual; the game runs sluggishly slow. I reverted it back, and it's now normal.

I really don't know how to say this, but I have to keep this "latent bug" existing on my build.

Gotta love JVM for this, I guess. Or in this case, gotta love Android's Dalvik VM.

EDIT: I hate my browser crashing... Edited by tom_mai78101

Share on other sites
The "wait" "run" code seems wrong to me:

• Access to the "array" variable should be synchronized: (array.size() and array.add(...)).
• That it alternates between both threads depends on the current JVM implementation. From Object.notify() javadoc: "If any threads are waiting on this object, one of them is chosen to be awakened. The choice is arbitrary and occurs at the discretion of the implementation."
• Even an "adecuate" JVM implementation is not enough to guarantee that.
• The "run" code runs 50+1 times

Share on other sites
An interesting thing is to take the program and wrap it in a reasonably long loop (I've removed the sleep and the output to better illustrate the point):
 public class Test { Thread t1; Thread t2; Thread t3; List<Integer> array = new ArrayList<Integer>(); Runnable run = new Runnable() { @Override public void run() { while (array.size() < 50) { synchronized (array) { try { array.wait(); } catch (InterruptedException e) { } array.add(Thread.currentThread().hashCode()); // System.out.println(Thread.currentThread().getName()); } } } }; Runnable wait = new Runnable() { @Override public void run() { // Random r = new Random(); while (array.size() < 50) { synchronized (array) { /* try { Thread.sleep(r.nextInt(200) + 100); } catch (InterruptedException e) { e.printStackTrace(); } */ array.notify(); } } } }; public Test() { t1 = new Thread(run); t1.setName("Thread 1 - First Thread"); t2 = new Thread(run); t2.setName("Thread 2 - Last Thread"); t3 = new Thread(wait); } public void start() { t1.start(); t2.start(); t3.start(); } public void join() throws InterruptedException { final long ONE_MINUTE = 1000 * 60; t1.join(ONE_MINUTE); t2.join(ONE_MINUTE); t3.join(ONE_MINUTE); } public static void main(String[] ar) throws InterruptedException { for(int i = 0 ; i < 1000 ; ++i) { Test t = new Test(); t.start(); t.join(); int count = Thread.activeCount(); System.out.println("Run " + i + ": elements " + t.array.size() + " outstanding threads: " + count); } System.out.println("Done"); } } 
On my system, running this program usually results in at least one thread hanging after about 200 - 250 iterations (sometimes more, sometimes less). It is possible that around this point the JVM has decided to do some more advanced optimisations after monitoring the code for some time. Another interesting thing to note is that even when finished running, the list sometimes contains 50 elements, other times 51.

These are the concurrent bugs in your program that I can see:

• It contains a data race - threads read the size() of the array without holding the lock.

• Use of wait()/notify() without a logical condition associated with it.

• You are not calling Object.wait() in a loop. This means that the code is sensitive to spurious or delayed wakeups.

• You are using Object.notify() rather than Object.notifyAll(). Notifying a single thread is advanced.

Here is one situation where your program could hang. Imagine that array.size() is 49. Both "run" threads are waiting on the condition variable. The "wait" thread is just after acquiring the lock, it then calls notify(), releases the lock and gets context switched out. The thread that was signalled is now runnable and will eventually get context switched in. It re-acquires the lock on exiting wait(), and adds an element to the array and releases its lock. During its next iteration of the loop, the array is size 50 so it ends. The system (eventually) context switches back to the "wait" thread, which also may notice that the size is 50. If it does, it will also end. There still is a thread waiting for a signal that will never arrive. Other times, the "wait" thread will not notice, it will enter the synchronized block (which guarantees that it's view of the events to become up to date), it notify()s again and finishes. In this case the third thread is signalled again and also finishes.

But these bugs aside, let us imagine you fix all that. The code behaves the way you want - in every way except performance. Due to the coarse nature of the locks, the code is actually slower than the equivalent straight line code in a single thread.

Performance aside - how would we fix the correctness of your program? Well, the first thing I want to do is define what I believe the program should do. My decision is that it should populate a list of 50 elements with the identifiers of the threads, such that no two identifiers are adjacent in the final array. Once we understand that, we can then consider what our wait/notify condition should be. As a given thread, we need to wait when last element in the array is our identifier. This implies we don't wait when the array is empty. We also don't wait when the array is "full" - when our limit of 50 has been reached.

Once we have waited for this condition to be met, our thread can act. Because we have two competing conditions we don't know which of them caused us to exit the wait loop. If it was the array becoming full, we must end. Otherwise, we can add our identifier to the array. Finally, we notify the waiters on the condition variable.

This is such a program, which I believe to be correct:
 public class Test { private static final int Max = 50; private final Thread t1; private final Thread t2; private final List<Long> array = new ArrayList<Long>(); public Test() { Runnable run = new Runnable() { @Override public void run() { final Long me = Long.valueOf(Thread.currentThread().getId()); boolean running = true; while (running) { synchronized (array) { while(array.size() < Max &amp;&amp; !array.isEmpty() &amp;&amp; array.get(array.size() - 1).equals(me)) { try { array.wait(); } catch (InterruptedException e) { throw new RuntimeException(e); } } if(array.size() < Max) { array.add(me); } else { running = false; } array.notifyAll(); } } } }; t1 = new Thread(run); t1.setName("Thread 1 - First Thread"); t2 = new Thread(run); t2.setName("Thread 2 - Last Thread"); } public void start() { t1.start(); t2.start(); } public void join() throws InterruptedException { final long ONE_MINUTE = 1000 * 60; t1.join(ONE_MINUTE); t2.join(ONE_MINUTE); } public boolean check() { synchronized (array) { Long previous = null; for(Long current : array) { if(previous != null &amp;&amp; previous.equals(current)) { return false; } previous = current; } return array.size() == Max; } } public static void main(String[] ar) throws InterruptedException { for(int i = 0 ; i < 1000 ; ++i) { Test t = new Test(); t.start(); t.join(); if(!t.check()) { throw new IllegalStateException("Bad check!"); } int count = Thread.activeCount(); System.out.println("Outstanding threads: " + count); } System.out.println("Done"); } } 

Note that the runnable code supports an arbitrary number of threads - provided there are at least two (it will deadlock if there is a single thread as the wait condition of the last entry not being its own identifier will never change):
 public class Test { private static final int Max = 50; private final List<Thread> threads = new ArrayList<Thread>(); private final List<Long> array = new ArrayList<Long>(); public Test() { Runnable run = new Runnable() { @Override public void run() { final Long me = Long.valueOf(Thread.currentThread().getId()); boolean running = true; while (running) { synchronized (array) { while(array.size() < Max &amp;&amp; !array.isEmpty() &amp;&amp; array.get(array.size() - 1).equals(me)) { try { array.wait(); } catch (InterruptedException e) { throw new RuntimeException(e); } } if(array.size() < Max) { array.add(me); } else { running = false; } array.notifyAll(); } } } }; int numThreads = new Random().nextInt(42) + 2; for(int i = 0 ; i < numThreads ; ++i) { threads.add(new Thread(run)); } } public void start() { for(Thread thread : threads) { thread.start(); } } public void join() throws InterruptedException { final long ONE_MINUTE = 1000 * 60; for(Thread thread : threads) { thread.join(ONE_MINUTE); } } public boolean check() { synchronized (array) { Long previous = null; for(Long current : array) { if(previous != null &amp;&amp; previous.equals(current)) { return false; } previous = current; } return array.size() == Max; } } public static void main(String[] ar) throws InterruptedException { for(int i = 0 ; i < 1000 ; ++i) { Test t = new Test(); t.start(); t.join(); if(!t.check()) { throw new IllegalStateException("Bad check!"); } int count = Thread.activeCount(); System.out.println("Outstanding threads: " + count); } System.out.println("Done"); } } 
Note that this program does not cause each thread to be interleaved in-order. That is, you could get interleavings like:

ABACBACAB
[/quote] Edited by rip-off

Share on other sites
For comparison, the performance difference between a naive serial version of that program and the given parallel one is two orders of magnitude. I only included a "fixed" version for reference purposes, not because I think it is a good solution to your problem.

Share on other sites

the performance difference between a naive serial version of that program and the given parallel one is two orders of magnitude

I guess that the above means that the serial version is around a hundred times faster. Am I right?

Share on other sites
Actually, I was incorrect, the performance difference is actually just over 50 times faster.

This is a simple program running 100,000 iterations for 2 candidate entries (i.e. 2 threads in the parallel version), filling a 50 element array 3 times each. The serial logic is like this:
 public static Long run(Long [] candidates) { List<Long> array = new ArrayList<Long>(); Random random = new Random(); while(array.size() < Common.Max) { int index = random.nextInt(candidates.length); Long candidate = candidates[index]; if(array.isEmpty() || !candidate.equals(array.get(array.size() - 1))) { array.add(candidate); } } Common.check(array); return array.get(random.nextInt(array.size())); } 
The return value is used to prevent the compiler from optimising everything away. The parallel implementation was modified to behave similarly.

The driver program looks like this:
 public class Profiler { private static long profile(String name, Callable<Long> callable) throws Exception { long accumulator = 0; long timer = 0; for(int i = 0 ; i < 1000 ; ++i) { long start = System.nanoTime(); Long value = callable.call(); accumulator += value.longValue(); long end = System.nanoTime(); long duration = end - start; timer += duration; } System.out.println(name + " duration: " + timer + " nanoseconds"); System.out.println(name + " accumulator: " + accumulator); return timer; } public static void main(String [] args) throws Exception { final Long [] candidates = { Long.valueOf(42), Long.valueOf(13) }; for(int i = 0 ; i < 3 ; ++i) { long a = profile("Serial", new Callable<Long>() { @Override public Long call() throws Exception { return Serial.run(candidates); } }); long b = profile("Parallel", new Callable<Long>() { @Override public Long call() throws Exception { return Parallel.run(candidates); } }); if(a > b) { System.out.println(a / (double)b); } else { System.out.println(b / (double)a); } } } } 
Increasing the array size to 10,000 (and reducing the number of iterations) makes the performance difference about 40% faster. Obviously the thread creation / joining times skew the result much more heavily when the array is only 50 elements.

Of course, this is a very unrealistic scenario - the "multi-threaded" implementation is just as serial as the naive implementation - it would never be a candidate for production code. The purpose is to illustrate in actual numbers what the excessive communication overhead can cost you if you try to multi-thread a serial algorithm.

Share on other sites

the performance difference is actually just over 50 times faster

Ok, just one order of magnitude.
But what I wanted to emphasize is that the serial version will be ALWAYS faster than the multi-threaded version - where in this particular scenario multi-threaded does not mean parallel ];)