Sign in to follow this  
  • entries
    375
  • comments
    1136
  • views
    297498

Multithreading

Sign in to follow this  
superpig

81 views

I'm rolling back around to multithreading; that time of year, I guess. While walking back along Parks Road today, I was thinking in particular about debugging multithreaded code.

Firstly, what makes it different to debugging regular code? There are a few categories of bugs that simply don't exist in regular code. The prime one is the "race condition," a bug that only occurs when things happen in a particular order. You've also got deadlock, priority inversion, etc. These are all related to the fact that more than one thing is happening at once.

The problem is that these things are difficult to debug precisely because more than one thing is happening at once, and the way in which they do this is unpredictable. If that file load had taken just 5ms longer, it wouldn't have finished by the time the resource managed tried reading it, etc. If you serialise things - just run each thread one after another - then the problem will likely disappear, because things are no longer happening at the same time such that the interactions cause problems. Can we reconcile the two? I smell a tool.

Say you've got three processes that might be running simultaneously; let's just consider them as functions. If you wanted to test for interactions between them (and error states arising from such interactions), what you might do is write a simple unit test that kicks off the three threads, lets them run to completion, and then checks for errors, and does this 1000 times in the hope that it'll catch a reasonable number of different sequences. If we assume that we're running on a single CPU, then our functions are actually being serialised and executed - thread1, thread1, thread1, contextswitch, thread2, thread2, contextswitch, thread3 - and we have no control over how these instructions and context switches are generated and interleaved. We run the test 1000 times and hope that the resulting sequences represent a good portion of all the possible way things could happen.

I'm not content with that. If the CPU can take machine code from three threads and interleave them with context switches in between, why can't we? We'd be able to control how/when the switches happen that way; we could execute one instruction from each thread at a time, or five from the first for every one of the second, or we could simulate the effect of a context switch /right in the middle/ of a particularly critical operation...

Even better, we could generate sequences programmatically. Say we want to run 3 functions, A, B, and C, and each function is 100 instructions long, with no odd critical sections or anything like that. We start with function A in it's plain form, take function B, and squash bits of it in between A. How many possible ways can you interleave two 100-instruction functions, you may well ask? I'm not sure. It's probably quite a lot, involving 100-factorial terms. If anyone figures it out, tell me, please. Anyway, once you've done that you've got a 200-instruction sequence (+ context switches) and can interleave C in the same way. The number of sequences increases exponentially with each additional thread, I think... there are probably some degenerate cases that could be eliminated that haven't occured to me yet.

You take your billions of generated sequences (yeah, it probably is) and you run each one through the unit tests. Any of the tests that fail, you note down, recording the sequence that failed it. You'd probably need some smartness in the test such that it can figure out when something has become stuck in deadlock and abort instead of never completing the test. You can probably group failing sequences by failure point to produce more useful information.
Sign in to follow this  


2 Comments


Recommended Comments

Interesting post... I'd reckon you could only "solve" it by digging into some seriously entertaining CompSci core texts. Trust me, they aren't the most entertaining!

The thing you have to decide is "how low do you go" - simple as it sounds, you'll eventually either getting down to the OS or hardware, maybe (at best) some friendly virtual machine.

EDIT: Removed the bits that weren't even remotely relevant

Share this comment


Link to comment
Just out of curiosity over the combinatorial explosion... your estimate of billions of combinations is probably a little optimistic [smile]

Regular calculator can't handle such big numbers as 100!

35! = 10,333,147,966,386,144,929,666,651,337,523,200,000,000

Call it ten and-a-bit tredecillion [lol]

Jack

Share this comment


Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now