Is This a 'Thread Deadlock'?

Hi guys,

Yesterday I was playing with threads and using a global variable to determine if a thread had finished or not.

Pseudocode:

// worker thread
loop until the work is finished
set 'done' to true when finished

// main loop
loop until 'done' is true

I found that the program would stall in the main loop and never terminate.

I also found that locking 'done' in both the thread and the main loop resolved the issue.

Is this an example of a deadlock?

It's simply not valid code :wink: :P
If you're sharing a mutable variable between threads, then you need to use some form of synchronization, such as wrapping it in a mutex.

Assuming C/C++: The old-school advice would be: this will work fine if 'done' is volatile, but don't do that (it's not what volatile is for, and will still be buggy). You can, however, make done a std::atomic and it will work in this particular scenario. An atomic variable is basically one that's wrapped in a super-lightweight, hardware-accelerated mutex, and by default is set to provide "sequentially consistent ordering" of memory operations.
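
A minimal sketch of that suggestion, assuming C++11 (the worker function and the work itself are placeholders):

#include <atomic>
#include <thread>

std::atomic<bool> done{false}; // std::atomic defaults to sequentially consistent ordering

void worker()
{
    // ... do the actual work here ...
    done = true; // atomic store; guaranteed to become visible to the main thread
}

int main()
{
    std::thread t(worker);
    while (!done) { /* busy wait; each check is an atomic load */ }
    t.join();
}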

Without some form of synchronization being present, there's no guarantee that changes to memory made by one thread will be visible to another thread at all, or in the correct order.
Ordering matters a lot in most cases, e.g. let's say we have:
result = 0
done = false

Worker:
  result = 42; // do some work
  done = true; // publish our results to main thread

Main:
  Launch Worker
  while(!done) {;} // busy wait
  print( result );
If implemented correctly, this code should print '42'.
But, if memory ordering isn't enforced by the programmer (by using a synchronization primitive), then it's possible that Main sees a version of memory where done==true, but result==0 -- i.e. the memory writes from Worker have arrived out of order.
Synchronization primitives solve this. e.g. the common solution would be:
mutex
result = 0
done = false

Worker:
  lock(mutex)
  {
    result = 42;
    done = true;
  }

Main:
  Launch Worker
  while(1)
  {
    lock(mutex)
    {
      if done then break;
    } 
  }
  print( result );
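
Translated into C++11, that pseudocode might look something like this sketch (the function names are just for illustration):

#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;
int result = 0;
bool done = false;

void worker()
{
    std::lock_guard<std::mutex> lock(m);
    result = 42; // do some work
    done = true; // publish our results to the main thread
}

int main()
{
    std::thread t(worker);
    for (;;)
    {
        std::lock_guard<std::mutex> lock(m); // acquiring the lock synchronizes with the worker's unlock
        if (done) break;
    }
    std::printf("%d\n", result); // safe: the worker's writes happen-before this read
    t.join();
}
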
Or the simplest atomic version... which honestly you should only try to use after doing a lot of study on the C++11 memory model and the x86 memory model (and the memory models of other processors...) because it's easy to misunderstand and have incorrect code :(
result = 0
done = false

Worker:
  result.AtomicWrite(42, sequential_consistency)
  done.AtomicWrite(true, sequential_consistency)

Main:
  Launch Worker
  while(!done.AtomicRead(sequential_consistency)) {;} // busy wait
  print( result.AtomicRead(sequential_consistency) )
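
In real C++11 that might look like the following sketch (sequential consistency is already std::atomic's default; it's spelled out here only to mirror the pseudocode):

#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> result{0};
std::atomic<bool> done{false};

void worker()
{
    result.store(42, std::memory_order_seq_cst); // do some work
    done.store(true, std::memory_order_seq_cst); // publish to the main thread
}

int main()
{
    std::thread t(worker);
    while (!done.load(std::memory_order_seq_cst)) { /* busy wait */ }
    std::printf("%d\n", result.load(std::memory_order_seq_cst));
    t.join();
}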

Nice! Thanks Hodgman for clearing this up for me.

In my case I ended up locking the variable, as that was the only approach that worked 'consistently'. Your explanation shows why.

It's not a "deadlock" per se but the symptoms are identical.

A deadlock is when you have locking and you've put yourself into a situation where nobody can acquire the locks they need to make progress. e.g., if there are two locks, both threads need both locks, but they acquire the locks in the opposite order; one thread has lock A and the other thread has lock B, so neither can ever grab the other lock. This is one of the reasons why sharing mutable data (and hence requiring locks) is often considered a poor design. When sharing is necessary, higher level primitives like fences or channels are preferable (they do sharing, but they wrap it all up in a single small well-tested hard-to-misuse place).
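
As an aside, here's a minimal sketch of that classic lock-ordering deadlock (the two mutexes and thread functions are hypothetical):

#include <mutex>
#include <thread>

std::mutex a, b;

void thread1()
{
    std::lock_guard<std::mutex> holdA(a); // takes A first...
    std::lock_guard<std::mutex> holdB(b); // ...then may wait forever on B
}

void thread2()
{
    std::lock_guard<std::mutex> holdB(b); // takes B first...
    std::lock_guard<std::mutex> holdA(a); // ...then may wait forever on A
}

int main()
{
    std::thread t1(thread1), t2(thread2);
    t1.join(); // may never return if each thread grabbed its first lock
    t2.join();
}

(std::lock(a, b), which acquires both mutexes together, is the standard way to avoid this particular trap.)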

The reason your code hangs forever, for your edification, is that C++ is fully within its rights to strip out unobservable code. There are very explicit rules about what is "observable" in the standard.

In particular, the main loop is running a piece of code that never sets your global variable. Hence, the compiler is allowed to cache a value of it, because C++ does not assume that all variables are modifiable at any time by any thread (because that would disallow a vast majority of useful optimizations!). Very roughly speaking, the C++11 memory model added semantics about atomics, barriers, and locking primitives that basically say that their use invalidates any caching, forcing the compiler to reread values from memory. That's why your locks work - the use of a lock tells the compiler that anything it cached before the lock is invalid.

Hodgman's answer illustrates a second facet of those rules: the compiler (and OS, and hardware, etc.) is allowed to re-order reads and writes so long as that reordering is unobservable to valid C++ code (which yours is not!). It's often called the "as-if rule", meaning that the compiler can change your code in any way it wants so long as the output is _as if_ it had translated it verbatim, assuming that the original source code followed all the rules.

One of those rules is that memory is not shared between threads without synchronization primitives. Without synchronization, the compiler assumes the memory is not shared, and can hence reorder or even elide reads and writes.

That's why debugging release builds gets so hard, too; the compiler is _allowed_ to elide variables or reorder all your code, so long as the _result_ is identical (in correct code). That ability is _essential_ to optimization.

Sean Middleditch – Game Systems Engineer – Join my team!

^^ Nice explanation! Thanks man! :)

Exactly as Sean said; I will try to recap it more briefly.

If a single CPU core performs a write and stops there, there is no reason for the written value to move out of the closest cache and up the ladder to RAM; and if another CPU core reads a value, there is no need to fetch it down the ladder from RAM if it hits in that core's cache (L1, L2, whatever). A lock, for example, ensures that reading the variable is actually done from RAM (or the shared L3), and releasing the lock ensures the variable is written back out to RAM (or at least the shared L3).

(Remember that the lock keyword takes a () in which you specify the object being guarded. The body of the block is not tied to any particular variable; it simply contains the operations that need to happen under the lock.)

Simply put, a single assignment does not necessarily happen at a level visible to RAM or to other cores. That is what may be confusing you: "I assigned this, so why doesn't the thread reading it see that value?" :)

That's a primitive type (with one reader and one writer), which means updating it is atomic, which means it's not a locking problem.

He didn't use 'volatile' so the check got optimized out.
Adding locking will probably fix the issue, but it's not guaranteed! And it's overkill for such a simple case.

volatile int g_done;

volatile int g_done;

That is not what volatile means in C or C++. Some compilers throughout history adopted volatile for threading and for other purposes of their own, but it is not portable and not standard. Java and C# adopted that meaning, but this thread is marked as C++. That improper usage introduces unnecessary costs and performance penalties and still may not solve the problem.

The correct solution is std::atomic, as described by Hodgman and others above. The atomic templates are fully specialized to use low-overhead functionality (such as interlocked values) to get great performance while still enforcing thread safety.
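
As a sketch, the std::atomic replacement for the volatile declaration above would look something like:

#include <atomic>

std::atomic<int> g_done{0}; // safe, portable, and low-overhead

// Worker thread, when finished:
//   g_done.store(1);
// Main loop:
//   while (g_done.load() == 0) { /* wait */ }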

To add to the above, adding volatile probably will "fix" it, but even if it does, it would still be theoretically undefined code that's just happening to work in practice, and could break in any future build...

Adding volatile will tell the compiler not to optimize (A) into (B):


(A) while( !done ) { doStuff }
(B) bool local = done; while( !local ) { doStuff } // uh oh, infinite loop!

So volatile will "fix" that one problem.

But volatile has nothing to do with multi-threading, so it does nothing for the memory ordering issues. I gave the example above of:


  result = 42; // do some work
  done = true; // publish our results to main thread

Even if you're using volatile, it's possible for the other thread to see the "done = true" write appear in memory first, and the "result = 42" write appear in memory second... which means that other thread might try to print result before it's been written, which is a race condition.

If you use volatile in your code, you have a bug.
