Is this thread safe?

5 comments, last by ultramailman 10 years, 4 months ago

Hello. I am curious if I can have two threads process a shared array in a pipelined manner, and still be thread-safe without using mutex locks:

thread1 advances in the array, doing its work.

thread2 also advances, but stops to wait if the next element is being processed by thread1.

To test that, I wrote a small program.

A writer thread writes into a shared array.

A reader thread reads from the shared array, then prints it in the terminal.

This repeats until 80000 elements have been printed.

I repeated this program many times, and one time the printing paused indefinitely. I suspect there was a deadlock, but I can't reproduce it again.

My question is: can the two threads in this program reach a state of deadlock? If so, can someone explain how it can happen?

edit: I thought this would be fine because a while ago I read an article saying it is fine as long as each piece of data is written to by only one thread. I kept that in mind, and these two threads seem to do just that (at least I think they do).


#include <pthread.h>
#include <stdio.h>
//#include <windows.h>
#include <time.h>
#define DATA_SIZE (80)

// used to stop threads
volatile int running = 1;

// both threads use this share array of data
volatile unsigned data[DATA_SIZE] = {0};

// numbers to show which element a thread is using
volatile unsigned writer_seq = 1, reader_seq = 0;

// numbers to show how many times the threads waited for each other
volatile unsigned long long writer_waits = 0, reader_waits = 0;

// number of iterations the reader thread will run
volatile unsigned reads_left = DATA_SIZE * 1000;
void * writer_thread(void * args)
{
    unsigned count = 0;
    while(running){
        if(writer_seq == 0)
            count = (count + 1) % 10;
        data[writer_seq] = count;
        unsigned next = (writer_seq + 1) % DATA_SIZE;
        while(next == reader_seq)
        {
            writer_waits++;
            //Sleep(1);
        }
        writer_seq = next;
    }
    return NULL;
}
void * reader_thread(void *args)
{
    while(running){
        printf("%u", data[reader_seq]);
        unsigned next = (reader_seq + 1) % DATA_SIZE;
        while(next == writer_seq)
        {
            reader_waits ++;
            //Sleep(1);
        }
        reader_seq = next;
        
        reads_left--;
        if(reads_left == 0)
            running = 0;
    }
    return NULL;
}
int main(void)
{
    clock_t t1 = clock();
    pthread_t writer, reader;
    pthread_create(&writer, NULL, &writer_thread, NULL);
    pthread_create(&reader, NULL, &reader_thread, NULL);
    void * ret = NULL;
    pthread_join(writer, &ret);
    pthread_join(reader, &ret);
    clock_t t2 = clock();
    printf("reader waits:%llu, writer waits:%llu, time:%f\n", reader_waits, writer_waits, (float)(t2 - t1) / CLOCKS_PER_SEC);
}

You're trying to implement a lock-free single-producer/single-consumer queue.

Your code is not safe on any kind of modern CPU that can perform reordering -- to fix it you need memory fences.

If you used mutexes/etc., they would contain fences inside their implementations that would fix your code ;)

Lastly, don't get in the habit of using volatile. See "volatile considered harmful" -- in every job I've worked, using volatile is generally banned, because it usually isn't required, usually isn't doing what the author thinks it's doing, and usually indicates the code is buggy.

I can't spot an obvious deadlock in there, but the values being passed through the data array can be corrupted by the updates to the seq variables being reordered with the reads/writes of the array.

[edit]Mine is much the same, but using atomics instead of volatiles: fifo_spsc.h fifo_spsc.hpp

Also check out: http://www.1024cores.net/home/lock-free-algorithms/queues
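To make the idea concrete, here is a minimal sketch of a lock-free single-producer/single-consumer ring buffer using C11 atomics instead of volatiles and explicit fences. All names (`spsc_queue`, `spsc_push`, `spsc_pop`) are illustrative, not from the linked files; the release store on each index is what publishes the matching array write to the other thread.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define QSIZE 80

// Single-producer/single-consumer ring buffer (illustrative sketch).
// head is written only by the consumer; tail only by the producer.
typedef struct {
    unsigned buf[QSIZE];
    _Atomic unsigned head;
    _Atomic unsigned tail;
} spsc_queue;

bool spsc_push(spsc_queue *q, unsigned v)
{
    unsigned tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    unsigned next = (tail + 1) % QSIZE;
    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return false; // full
    q->buf[tail] = v;
    // release: the buf[] write above becomes visible before the new tail
    atomic_store_explicit(&q->tail, next, memory_order_release);
    return true;
}

bool spsc_pop(spsc_queue *q, unsigned *v)
{
    unsigned head = atomic_load_explicit(&q->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return false; // empty
    *v = q->buf[head];
    // release: the buf[] read above completes before the slot is reused
    atomic_store_explicit(&q->head, (head + 1) % QSIZE, memory_order_release);
    return true;
}
```

The acquire/release pairing here plays the same role as the memory fences mentioned above, but scoped to exactly the two loads and stores that need ordering.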

I should probably let someone else answer this question, because I am by no means an expert on thread safety, but I think I'll take a shot at it anyway. My guess is that the CPU's out of order execution could be causing problems for you. The "reader_seq = next" assignment doesn't depend on the while loop that precedes it, and the CPU will notice this and could perform the reader_seq = next assignment before it does the loop. Consider this sequence of events that is possible with the CPU reordering your instructions:

writer_thread: next = writer_seq + 1 = 2

writer_thread: writer_seq = next = 2 // cpu performs this out of order

reader_thread: next = reader_seq + 1 = 1

reader_thread: next == writer_seq: 1 == 2

reader_thread: reader_seq = next = 1

reader_thread: next = reader_seq + 1 = 2

reader_thread: reader_seq = 2 // performed out of order

writer_thread: next == reader_seq: 2 == 2 - wait!

reader_thread: next == writer_seq: 2 == 2 - wait!

- deadlock!

Check out the wikipedia article for memory barriers for more info: http://en.wikipedia.org/wiki/Memory_barrier

@Hodgman

Ah I see, reordered operations can certainly make the polling useless.

The reason I put volatile before the file scoped variables was to disable optimization for those variables, so that even if I turned on an optimization flag, their values would be read every time instead of being optimized away to a constant. Would that be a valid reason to use volatile?

@Samith

Nice example, Samith. Took me a few times of following those steps to realize what happened, haha.

So from what I can tell at the moment, if I can somehow make sure the while-loop polling happens before incrementing the sequence number, then there should be no problems. Right? I guess I'll need to make use of a memory barrier somewhere.

edit: After looking at some stackoverflow questions, it seems like if I can make use of memory barriers, then the volatiles won't be needed.

edit2: I used memory barriers. After trying different places to put the barriers (before assigning sequence number to next, before the loop), I finally got it to not deadlock by putting the barriers inside both poll loops.

Cool that you found a solution using memory barriers :) However, if it's of any interest, I once had a similar problem. In my case, what happened was that the OS would sometimes switch from one thread to the other immediately after a thread read a shared (volatile) variable from memory, before it had a chance to examine its value. As a result, when the first thread regained focus, the variable had actually been altered by the second one, although the value it was about to use hadn't been updated. For example, if you do something like this:

volatile bool busy = 0;

if(!busy)
{
    // do something here
}

the first line "if(!busy)" is actually translated into two instructions; one for fetching the boolean value from memory, and one for testing if it's zero. This means that if the OS switches to the other thread immediately after the value has been read (with the first instruction) and the other thread in the meanwhile accesses some shared resources, trouble can arise when the first thread regains focus because it will examine the value as it was stored before it lost focus and perhaps determine that it's ok to proceed, even though that actually isn't the case.

It may sound unlikely that the OS would switch focus at that particular point in the code, but trust me: if the code is repeated perhaps thousands of times per second it will happen, sooner or later. I'm talking from bitter experience here ;)
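The "fetch, then test" race described above can be closed by making the fetch and the claim a single atomic operation, so no thread switch can slip between them. A minimal sketch with C11 atomics; the names (`busy`, `try_claim`) are illustrative, not from the post:

```c
#include <stdatomic.h>
#include <stdbool.h>

// Shared flag; false means the resource is free.
_Atomic bool busy = false;

// Atomically claim the resource. Even if two threads race here,
// the compare-and-exchange succeeds for exactly one of them.
bool try_claim(void)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&busy, &expected, true);
}
```

Unlike the volatile `if(!busy)` check, the load and the test-and-set cannot be separated by a context switch, so the window Noctumus describes is gone.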

I don't do any threading-related tasks in C++, but I was curious about the keyword 'volatile'. I came across this http://msdn.microsoft.com/en-us/library/12a04hfd(v=vs.120).aspx and the first line (assuming I read it correctly) made it sound useful when dealing with hardware: driver development, maybe, or communicating directly with hardware. I could be wrong, however; I've never used the keyword before.

EDIT: http://en.wikipedia.org/wiki/Volatile_variable#C.2B.2B11 says volatile is ONLY meant to be used for hardware access, so I guess it is safe to assume you should not be using it at all in your game

The new loop looks like this now:


next = (reader_seq + 1) % DATA_SIZE;
do __sync_synchronize();
while(next == writer_seq);
reader_seq = next;

But now I can see atomics are much simpler to work with.
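For comparison, the same poll loop could be written with a C11 acquire load on the shared sequence number, replacing the full `__sync_synchronize()` barrier. This is a sketch under the thread's setup, wrapped in a hypothetical `reader_advance` helper so it stands alone:

```c
#include <stdatomic.h>

#define DATA_SIZE 80

// Shared sequence counters (sketch; initial values chosen so the
// example below returns immediately instead of spinning).
_Atomic unsigned writer_seq = 2;
unsigned reader_seq = 0; // touched only by the reader thread

// Advance the reader one slot, spinning while the writer occupies it.
// The acquire load guarantees a fresh, ordered read of writer_seq,
// doing the job of the explicit barrier in the loop above.
unsigned reader_advance(void)
{
    unsigned next = (reader_seq + 1) % DATA_SIZE;
    while (next == atomic_load_explicit(&writer_seq, memory_order_acquire))
        ; // spin: the slot ahead is still being written
    reader_seq = next;
    return reader_seq;
}
```

An acquire load is also typically cheaper than a full fence on every spin iteration, since it only orders this load against later operations.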

@Noctumus

Sounds like atomics were what you needed.

@Dynamo_Maestro

Well, I knew volatile wasn't anything to help with thread safety (in C). In C it's only for disabling optimizations on a variable, and for keeping the order of writes between two volatile variables as written in the code. What I didn't know was that the CPU itself can execute writes out of order, even if the code has them in order. I've rid the code of all volatile now; it seems what I needed was either atomics or memory barriers.

Writing thread-safe code without a mutex is hard, I just realized. A deadlock can happen only very rarely, so if I don't know what I'm doing, I can only assume the code is thread-safe until I encounter one. Even now, I'm not sure if just that one memory barrier in both loops is all I need.

edit: Testing on gnu/linux showed that deadlocks still happen with this placement of memory barrier.

edit 2:

The polling loop now looks like this:


next = (reader_seq + 1) % DATA_SIZE;
unsigned tmp;
do {
    __sync_synchronize();
    tmp = writer_seq;
    __sync_synchronize();
} while(next == tmp);
reader_seq = next;

edit 3:

The one in edit 2 worked, but it can be reduced to only one barrier in the loop:


next = (reader_seq + 1) % DATA_SIZE;
do __sync_synchronize(); // this barrier makes sure writer_seq is up-to-date before checking against "next"
while(next == writer_seq);
 
__sync_synchronize(); // this barrier makes sure reader_seq is updated only after the polling loop
reader_seq = next;

This topic is closed to new replies.
