Multithreaded rendering synchronization issue (double buffering)

Started by lipsryme
11 comments, last by lipsryme 7 years, 2 months ago
EDIT: Sorry guys, having issues with the forum.

Hey guys,
I'm trying to implement double-buffered multithreading in my renderer, but I've run into an issue with the synchronization.

Here's the function on the [main thread] that generates the command list:

void FERenderer::Render(ISwapChain *swapChain)
{
    // Set current swapChain to render to
    this->currentChain = swapChain;
    this->renderer->SetCurrentSwapChain(swapChain);

    // Create [RenderThread] here (will wait until [MainThread] is done generating command list)
#ifdef MULTITHREADED_RENDERER_ENABLED
    if (!this->renderThreadCreated && !this->contentManager->IsCompilingShader())
    {
        this->renderThread = std::thread(&IBackEndRenderer::ExecuteCommandList, this->renderer,
                                         this->commandList, std::ref(renderThreadID));

        this->renderThreadCreated = true;
        this->renderThread.detach();
    }

    this->CreateCommandList();
#else
    this->CreateCommandList();
    if (!this->contentManager->IsCompilingShader())
    {
        renderThreadID = 1; // workaround
        this->renderer->ExecuteCommandList(this->commandList, std::ref(renderThreadID));
        renderThreadID = 0;
    }
#endif

    this->profiler->FrameComplete();

#ifdef MULTITHREADED_RENDERER_ENABLED
    this->renderer->RenderDoneCondition().Wait();
    this->renderer->RenderCommandsReadyCondition().Set();
    this->renderer->BufferSwappedCondition().Wait(); // <- If removed, no deadlock!
#endif
}
And this is the function on the [render thread] that executes the command list:

void LipsRenderD3D11::ExecuteCommandList(std::vector<unsigned char> *cmdList, bool &renderThreadID)
{
#ifdef MULTITHREADED_RENDERER_ENABLED
    while (true)
#endif
    {
#ifdef MULTITHREADED_RENDERER_ENABLED
        // Wait for [MAINTHREAD] to provide us with command list data
        bufferSwappedCondition.Reset();
        renderDoneCondition.Set();
        renderCommandsReadyCondition.Wait();

        renderDoneCondition.Reset();
        renderCommandsReadyCondition.Reset();

        // Swap buffer IDs
        renderThreadID = !renderThreadID;

        // Notify [MAINTHREAD] that the buffer was swapped, so we can generate new commands in parallel
        bufferSwappedCondition.Set();
#endif

        // Early out if forcing shutdown of renderThread
        if (this->forceShutdown)
            return;

        const size_t dataSize = cmdList[!renderThreadID].size();
        unsigned char *commandList = cmdList[!renderThreadID].data();
        unsigned int n = 0;
        SwapChainD3D11 *swapChain = (SwapChainD3D11*)this->currentChain;

        while (n < dataSize)
        {
            eRC e_RC = (eRC)ReadCommand<unsigned int>(commandList, n);

            switch (e_RC)
            {
                //...
            }
        }
    }
}
I tested this synchronization design in a console application and logically it should work; however, putting it in my renderer, I get a deadlock as soon as I run my engine.
The deadlock doesn't happen when I remove the BufferSwappedCondition().Wait() call, but then it crashes instead. It runs okay, as far as I can tell, if I make renderThreadID atomic and have the render thread do the swap (which creates a form of synchronization), but I'm not sure that's a viable workaround. Do you have any idea why this leads to a deadlock?
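
The atomic workaround I mean looks roughly like this (just a sketch; renderThreadID would become a std::atomic<bool> member rather than the bool& parameter shown above):

#include <atomic>

// Sketch: renderThreadID as an atomic member instead of a plain bool.
std::atomic<bool> renderThreadID{ false };

// [RenderThread], replacing "renderThreadID = !renderThreadID;".
// Only the render thread writes it, so a load+store toggle is enough.
void SwapBufferID()
{
    bool id = renderThreadID.load(std::memory_order_acquire);
    renderThreadID.store(!id, std::memory_order_release);
}

// [MainThread], picking the buffer to generate commands into:
bool FrontBufferID()
{
    return renderThreadID.load(std::memory_order_acquire);
}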
Have you declared renderThreadID volatile? Otherwise the compiler assumes the variable doesn't need to be visible outside the while loop of a single thread; in ExecuteCommandList it might determine that there is no use in changing that variable and optimize it out.

In some APIs, 'waits' are allowed to wake up spuriously (early, without the condition actually being met). In such APIs, they have to be called in a loop that checks the condition.

You don't really show how the main thread uses this ID variable. Do the two threads share it, but pass ownership of it back and forth via those waits/signals, so that only one thread owns it at a time? It might be better to show a simple bit of pseudocode or a diagram of how these two threads are meant to be synchronizing/sharing data.
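
To make that concrete, here's my reading of the handshake the posted code seems to implement, as pseudocode (a sketch inferred from the Wait/Set calls above; correct me if I've misread it):

// MainThread (per frame)             RenderThread (loop)
// ----------------------             -------------------
// CreateCommandList()                bufferSwapped.Reset()
// renderDone.Wait()        <-------- renderDone.Set()
// renderCommandsReady.Set() ------>  renderCommandsReady.Wait()
//                                    renderDone.Reset()
//                                    renderCommandsReady.Reset()
//                                    renderThreadID = !renderThreadID
// bufferSwapped.Wait()     <-------- bufferSwapped.Set()
// (next frame's commands)            execute cmdList[!renderThreadID]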

@Krypt0n volatile is not for multithreading use on multicore PCs; using it for such only guarantees memory ordering bugs.

@Hodgman
no, volatile signals the compiler to make changes visible outside the scope of the current translation unit and runtime. Something like

while (!renderThreadID) {}

can be optimized into

if (!renderThreadID)
    while (true) {}

if renderThreadID is not volatile.

@Hodgman
no, volatile signals the compiler to make changes visible outside the scope of the current translation unit and runtime. Something like

while (!renderThreadID) {}

can be optimized into

if (!renderThreadID)
    while (true) {}

if renderThreadID is not volatile.

Yeah I know, but it should still never be used to let the compiler know that the variable can be modified by another thread, unless you're on an old single-core PC with no instruction or memory reordering capabilities. Even with that fix, it's still horribly broken.

e.g. Given some code like this, someone might expect that volatile will allow the producer thread to signal the consumer thread, resulting in the consumer spinning for a while and then printing 42.


int data[2] = { -1, -1 };
volatile int index = 0;

// producer thread:
data[0] = 42;
index = 1;

// consumer thread:
while (index == 0) { /* spin */ }
printf("%d", data[0]);

However, the CPU itself (not even the compiler!!) might decide to read the value of data[0] before it reads the value of index, leading this to print -1 instead of 42 as expected. Likewise, the CPU might decide to write index=1 to memory before it writes data[0]=42 to memory, leading to the same result of -1 being printed. It might print 42, but that's down to chance, not correct code.
Even if the compiler does produce the loads and stores in the right order (which it's not actually bound to, unless data is also volatile), CPUs themselves are allowed to reorder loads and stores as long as the single-threaded behavior of the program would not be affected. The CPU does not know that you've marked a variable as volatile or not. You need to explicitly tell the CPU that it's not allowed to reorder memory accesses with a memory fence instruction (which the compiler will emit when using atomics, critical sections, or any kind of actual synchronization primitive). If you use the volatile keyword in multi-threaded code, you have a bug.
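
For contrast, a sketch of the same example written with std::atomic and release/acquire ordering, which is what I'd reach for instead of volatile (the necessary fences are implied by the memory orders):

#include <atomic>
#include <cstdio>

int data[2] = { -1, -1 };
std::atomic<int> index{ 0 };

// producer thread:
void producer()
{
    data[0] = 42;
    // release: everything written before this store becomes visible to
    // any thread that acquire-loads index and observes the value 1
    index.store(1, std::memory_order_release);
}

// consumer thread:
void consumer()
{
    while (index.load(std::memory_order_acquire) == 0) { /* spin */ }
    std::printf("%d", data[0]); // guaranteed to print 42
}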

(which the compiler will emit when using atomics, critical sections, or any kind of actual synchronization primitive)

That's not correct; it depends on the implementation of the synchronization primitive. E.g. there are processor architectures which require explicit cache flushes; on those, the atomic implementations commonly don't enforce a flush or memory sync beyond the actual atomic, as otherwise simple spins would become insanely expensive.
Hence, don't rely on this behavior; it's probably just the implementation you're used to.
As an example, std::atomic allows you to specify memory ordering: http://en.cppreference.com/w/cpp/atomic/memory_order
And you obviously don't pay the price for it if you don't need it; it's not enforced. Therefore there is no generic rule that synchronization primitives enforce memory visibility.
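
For illustration, roughly what the different orderings cost on x86 (a small sketch; relaxed and release stores compile to a plain mov, while the default seq_cst store needs a locked instruction or fence):

#include <atomic>

std::atomic<int> flag{ 0 };

void Stores()
{
    flag.store(1, std::memory_order_relaxed); // no ordering guarantees; plain mov on x86
    flag.store(1, std::memory_order_release); // orders earlier writes; still a plain mov on x86
    flag.store(1);                            // seq_cst (the default); xchg or mov+mfence on x86
}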


@Hodgman
no, volatile signals the compiler to make changes visible outside the scope of the current translation unit and runtime. Something like

while (!renderThreadID) {}

can be optimized into

if (!renderThreadID)
    while (true) {}

if renderThreadID is not volatile.

Yeah I know, but it should still never be used to let the compiler know that the variable can be modified by another thread, unless you're on an old single-core PC with no instruction or memory reordering capabilities. Even with that fix, it's still horribly broken.
Don't make these "always true" assumptions. If you know your architecture, and you know what you are doing, then it makes no sense to take the more expensive route just for the sake of ideological correctness. There is fully valid software that uses volatile for communication; it's not just about "cores", it can be for system-wide data exchange.
(Don't read this as me saying "all you need is volatile"; that's not what I imply! ;) )

e.g. Given some code like this, someone might expect that volatile will allow the producer thread to signal the consumer thread, resulting in the consumer spinning for a while and then printing 42.

int data[2] = { -1, -1 };
volatile int index = 0;
// producer thread:
data[0] = 42;
index = 1;
// consumer thread:
while (index == 0) { /* spin */ }
printf("%d", data[0]);
However, the CPU itself (not even the compiler!!) might decide to read the value of data[0] before it reads the value of index, leading this to print -1 instead of 42 as expected. Likewise, the CPU might decide to write index=1 to memory before it writes data[0]=42 to memory, leading to the same result of -1 being printed. It might print 42, but that's down to chance, not correct code.
Hypothetically possible; practically, he runs Windows, DX11, on an x86 CPU. There is just one case where x86 reorders memory operations (a store followed by a load from a different address), and your example is not covering it.
With one producer and one consumer on x86, you'll get away without atomics.

Yes, atomics + barriers are the "safe route", but that's what he has already verified; that's why he was asking for guidance. Hence trying volatile might hint whether his compiler was optimizing the loop away. Your suggestion about spurious wakeups is just as valid to try.
Well, basically all the main thread does with renderThreadID is use it as the subscript to access the correct one of the two buffers (i.e. the one that is not being worked on by the render thread), like so: commandList[renderThreadID][size]. I tried making it volatile as you suggested, but that didn't help. Also, I believe I'm already handling the spurious wakeup condition:
class Event
{
public:
    Event() : flag(false) {}

    void Set()
    {
        flag = true;
        condition.notify_all();
    }
    void Reset()
    {
        flag = false;
        condition.notify_all();
    }
    void Wait()
    {
        std::unique_lock<std::mutex> lk(m);
        while (!condition.wait_for(lk, std::chrono::milliseconds(0), [this]() { return flag; }))
        {
            if (flag) // avoid spurious wakeup
                break;
        }
    }
    void Wait(const long milliseconds)
    {
        std::unique_lock<std::mutex> lk(m);
        condition.wait_for(lk, std::chrono::milliseconds(milliseconds), [this]() { return flag; });
    }

private:
    mutable std::mutex m;
    mutable std::condition_variable condition;
    bool flag;
};


The only synchronization between the threads is the wait/signalling. If that works correctly, there should be no need for a barrier around that renderThreadID variable, should there? Or perhaps I need to rethink this more...
Have you declared renderThreadID volatile?

(quoting the long explanation above)

Everything you say is perfectly correct under those very narrow assumptions (Windows x86 system and MSVC compiler), but I guess what Hodgman tried to point out was that using a std::atomic and doing a store(... std::memory_order_relaxed) is the same thing and no more expensive (it will be an ordinary store on x86, with very, very little impact on the compiler's ability to reorder instructions), but it is 100% reliable and portable (and... formally correct).

In some APIs, 'waits' are allowed to wake up spuriously (early, without the condition actually being met).

Unlikely. Windows tends to oversleep (well-known), but it rarely over-waits (it's possible, being non-RT, but priority boost makes it magically work 99.9% reliably), and to my knowledge it never spuriously wakes, except in one condition that practically never happens. The only situation of which I'm aware in which this can happen is being in an alertable wait (one of SleepEx(...TRUE), GetQueuedCompletionStatus(), or NtWaitForKeyedEvent(...TRUE)) while, during your wait, another thread posts a user-level APC (kernel APCs just run and block your thread again without leaving a trace) or calls NtAlertThread(). So, unless you build your event class around a keyed event or a critical section (which uses a keyed event) and at the same time have some code in another thread which does one of the above, that shouldn't happen. Since you have to deliberately do something very specific, you would likely know if that was the case. I don't know how MSVC implements its condition variables, but it might as well be that there's a KEV in there. Though why a spurious wake should cause a deadlock is beyond my understanding...

while (!condition.wait_for(lk, std::chrono::milliseconds(0), [this]() { return flag; }))

Is this deliberate?

That's not likely to be the cause of your problem, but you do know that this is not the same as calling wait()? On the contrary: given a zero timeout, you are never waiting; this is busy spinning.
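
For what it's worth, a blocking version might look something like this (a sketch using the predicate overload of wait(), which already handles spurious wakeups; note it also takes the lock when writing flag, which the posted class skips):

#include <condition_variable>
#include <mutex>

class Event
{
public:
    void Set()
    {
        {
            std::lock_guard<std::mutex> lk(m); // publish flag under the lock
            flag = true;
        }
        condition.notify_all();
    }
    void Reset()
    {
        std::lock_guard<std::mutex> lk(m);
        flag = false;
    }
    void Wait()
    {
        std::unique_lock<std::mutex> lk(m);
        condition.wait(lk, [this] { return flag; }); // blocks; re-checks the predicate on wakeup
    }

private:
    std::mutex m;
    std::condition_variable condition;
    bool flag = false;
};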

Have you declared renderThreadID volatile?


(quoting the long explanation above)

Everything you say is perfectly correct under those very narrow assumptions (Windows x86 system and MSVC compiler), but I guess what Hodgman
I agreed with him insofar as he's talking hypothetically, and pointed out that my reply was just a poke at the narrow case the topic starter has (as nobody had even tried to help for 2 weeks, apparently). If it doesn't help, that's fine, but it gets the topic going; everybody gives a random shot and we solve it together. I never intended to claim I'm a master of remote-crystal-ball-debugging :)

tried to point out was that using a std::atomic and doing a store(... std::memory_order_relaxed) is the same thing and no more expensive (it will be an ordinary store on x86, with very, very little impact on the compiler's ability to reorder instructions), but it is 100% reliable and portable (and... formally correct).

It was not clear to me that he was talking about std::atomic; he seemed to be talking about any implementation of atomics (or synchronization primitives in general).
Like he pointed out, it's not just about compiler reordering of instructions, it's about memory ordering in the hardware. Yes, even on x86 you might need that, depending on the case; relaxed might not be enough.

This topic is closed to new replies.
