"Thread" Mutual Exclusivity

3 comments, last by Ectara 12 years ago
I have a linked list of threads that iterate through a string in a regular expression tester. The threads are part of an NFA state machine; each is an object that contains the current state, loop, and capture group information. They aren't threads provided by the machine or OS, and they all move in lock-step with each other, one character at a time. However, in implementing possessive repetition operators, I have run into a problem: I cannot iterate through a loop and then, when the current iteration fails, simply break out of it and continue after the loop. By then, characters have been evaluated and consumed that should not have been; the thread would have had to magically know that the current iteration would fail and continue evaluating outside of the possessive loop instead.
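For readers unfamiliar with this style of matcher, the following is a minimal sketch of lock-step "threads" in C++. It is not the poster's implementation: the instruction set (`Op`, `Inst`), the `Thread` record, and the functions `addThread` and `run` are all hypothetical, and loop counters and capture groups are left out. It only shows how every thread advances by one character per step and how a thread that cannot consume the character simply dies.

```cpp
#include <list>
#include <string>
#include <vector>

// Hypothetical instruction set for a tiny NFA program (an assumption).
enum class Op { Char, Split, Match };

struct Inst {
    Op op;
    char c;    // used by Char: the character to consume
    int x, y;  // Char: x = next instruction; Split: x and y = both branches
};

// A "thread" is only a record of where one NFA path currently is.
struct Thread {
    int pc;
};

// Follow Split instructions until the thread rests on a Char or Match.
void addThread(const std::vector<Inst>& prog, std::list<Thread>& out,
               std::vector<bool>& seen, int pc) {
    if (seen[pc]) return;  // at most one thread per state per step
    seen[pc] = true;
    if (prog[pc].op == Op::Split) {
        addThread(prog, out, seen, prog[pc].x);
        addThread(prog, out, seen, prog[pc].y);
    } else {
        out.push_back(Thread{pc});
    }
}

// Returns true if the whole input matches the program.
bool run(const std::vector<Inst>& prog, const std::string& input) {
    std::list<Thread> current, next;
    std::vector<bool> seen(prog.size(), false);
    addThread(prog, current, seen, 0);
    for (char ch : input) {
        seen.assign(prog.size(), false);
        for (const Thread& t : current) {
            const Inst& in = prog[t.pc];
            if (in.op == Op::Char && in.c == ch)
                addThread(prog, next, seen, in.x);
            // A thread that cannot consume ch is not copied over: it dies.
        }
        current.swap(next);
        next.clear();
    }
    for (const Thread& t : current)
        if (prog[t.pc].op == Op::Match) return true;
    return false;
}
```

With a four-instruction program for `a+b` (Char 'a', Split back to the Char or forward, Char 'b', Match), `run` accepts "aab" and rejects "aa". The Split is where a thread duplicates itself, which is the same mechanism the question proposes to reuse for the possessive loop.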

So, I decided that, like the other operators, I should spawn a duplicate thread at the start of every iteration. The duplicate breaks out of the loop and starts evaluating after the end of the loop, while the original thread continues iterating through the loop. The problem is that only one of them should exist. If the thread that is iterating through the loop reaches the loop's end, it should always be preferred to the one that attempted to break out of the loop. And if the thread that is in the loop dies, the last one to break out of the loop is the correct one. However, I can't figure out how to ensure that only one thread survives. The time of decision is simple: at the end of the loop, if the looping thread is alive, kill the other. If the looping thread dies before reaching the end, the other will continue on with no assistance.

However, I can't think of a good way to kill the other thread. I can't keep a pointer to it; it may pass the torch to another thread and, more importantly, it may spawn new threads that I can't track and kill without spending a large amount of memory on what should be a simple feature. Can anyone offer any advice?
Fine-grained thread control is somewhat undesirable.

Threads should be treated as if they run for the full duration of a time slice, or ~20ms. In 20ms you can easily process a megabyte of characters. If you are using multiple cores and they are oversubscribed, the controlling thread may get preempted while the others run unhindered, meaning they can easily run for many slices.


The correct solution to the above problem is thread pooling via tasks (continuations, generators, coroutines). Define algorithms in such a way that they have no internal state, at least not the typical per-thread stack. An approach like that also allows you to 'suspend' work:

struct State {
    int counter, max;
    bool interrupted;
    ...
    void doOneStep() {
        if (counter < max && !interrupted) {
            doStuff(counter);
            counter++;
            enqueue(this); // put self back into task queue
        } else {
            enqueue(someOtherTask); // schedule some other task
        }
    }
};
The above implementation is naive and will lead to terrible throughput, but it shows the basic idea.
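To make the idea concrete, here is a minimal single-threaded sketch of such a task queue. The names `CountTask` and `runAll` are made up for illustration, the "work" is reduced to recording which task ran, and a finished or interrupted task is simply dropped rather than scheduling some other task. Treat it as one possible shape under those assumptions, not a tuned design.

```cpp
#include <deque>
#include <vector>

// Hypothetical resumable task: all progress lives in the object, not on a stack.
struct CountTask {
    int id;
    int counter;
    int max;
    bool interrupted;
};

// Runs tasks round-robin, one step per turn, and returns the order of steps.
std::vector<int> runAll(std::deque<CountTask> queue) {
    std::vector<int> trace;  // task ids in the order their steps ran
    while (!queue.empty()) {
        CountTask task = queue.front();
        queue.pop_front();
        if (task.counter < task.max && !task.interrupted) {
            trace.push_back(task.id);  // stand-in for doStuff(counter)
            ++task.counter;
            queue.push_back(task);     // not finished: back of the line
        }
        // A finished or interrupted task is not re-enqueued.
    }
    return trace;
}
```

Because each task performs exactly one step before yielding, setting `interrupted` between steps is enough to stop it; no task ever has to be preempted.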

An approach like the above doesn't require hard threads anymore: it can run on a single core with no threading support at all (and incredibly efficiently at that). If you are using multiple cores/hard threading, the queue overhead is considerably bigger and must be taken into account. Synchronization is also no longer deterministic.

"And if the thread that is in the loop dies, the last one to break out of the loop is the correct one."

An approach like this is better suited to heterogeneous distributed systems with large numbers of nodes. It's far from ideal for local computation. Thread execution is not deterministic; "last one" is mostly a matter of luck.
I used "threads" in quotation marks; the threads are part of an NFA state machine, and each is an object that contains the current state, loop, and capture group information. They aren't threads provided by the machine or OS, and they all move in lock-step with each other.

However, thank you for the reply. I should have made this clearer.
Perhaps I could mark the threads that break out early with the address of the looping thread, and they'll pass the mark on to their children. When the looping thread hits the end of the loop, it could check for its address in all of the currently running threads and remove those that are marked. If it dies, the marks could be erased from the threads, to prevent them from being removed incorrectly if the looping thread is recycled and a new thread bears the same address. My main concern is that this method takes time linear in the number of running threads.
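The marking idea above could be sketched roughly as follows. This is only an illustration: `SimThread`, `spawnChild`, `killMarked`, and `clearMarks` are hypothetical names, and an integer `id` stands in for the looping thread's address, with 0 meaning "unmarked".

```cpp
#include <list>

// Hypothetical "thread" record; only the fields needed for the marking idea.
struct SimThread {
    int id;        // stands in for the thread's address (an assumption)
    int markedBy;  // id of the looping thread this one escaped from, or 0
};

// A child inherits its parent's mark, so escapees stay traceable after forking.
SimThread spawnChild(const SimThread& parent, int newId) {
    return SimThread{newId, parent.markedBy};
}

// The looping thread reached the loop's end: remove every thread marked with it.
void killMarked(std::list<SimThread>& threads, int looperId) {
    threads.remove_if(
        [looperId](const SimThread& t) { return t.markedBy == looperId; });
}

// The looping thread died: erase its marks so a recycled id cannot kill later.
void clearMarks(std::list<SimThread>& threads, int looperId) {
    for (SimThread& t : threads)
        if (t.markedBy == looperId) t.markedBy = 0;
}
```

Both `killMarked` and `clearMarks` walk the entire list, which is the linear cost mentioned above.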
Never mind, that won't work: if the looping thread spawns children and then dies, the children will not have the same address as their parent, so they cannot kill the marked threads.

This topic is closed to new replies.
