Prune

std::thread in MSVC with sync/async cancellation support


One thing I liked in pthreads that's not in C++11 threads is cancellation support. What I don't like in pthreads, however, is the inability to first attempt a synchronous cancellation and then, if that fails (with an error or user-settable timeout), attempt an asynchronous cancellation (this is because one cannot change the cancellation type of another thread, only of the calling thread).

Under Windows, however, this was not difficult to achieve. Luckily, MSVC does not have a std::thread implementation (the code below requires the 2010 version and won't work with older ones), so I rolled my own and was able to integrate the cancellation support I need. I'm working on a high reliability application where certain hung threads should be able to be cancelled and restarted, and I don't have the luxury of just using separate processes because of the performance impact of using shared memory.

Here's actual code from an application:
[source lang="cpp"]
// Launch here
_cmpThread = move(thread(&Log::Compress, this));
...
...
// Elsewhere
if (_cmpThread.joinable() && !TimedThreadJoin(_cmpThread, 3000))
{
if (!Cancellation::DoSync(_cmpThread, 500)) Cancellation::DoAsync(_cmpThread);
THROW(runtime_error, "Had to cancel log compression thread")
}
if (!(_cmpException == nullptr)) rethrow_exception(_cmpException);
[/source]
The above first waits for three seconds (another thing I miss from the std::thread API is timed joins). If that fails, it tries to cancel the thread synchronously, at cancellation points (more on those later), and if that also fails, it injects an exception into the (presumably) hung thread. Asynchronous cancellation is generally frowned upon, but I've tested this fairly thoroughly; problems are very rare, and it at least gives you a chance to recover without terminating the program. With pthreads, on the other hand, you have to commit to a cancellation being either synchronous or asynchronous and can't try the safer route first. In both the synchronous and the asynchronous case, the exception unwinds the stack and calls object destructors (although in the asynchronous case this occasionally fails to happen properly).

Now onto the implementation.

I implemented synchronous cancellation using Windows APCs (asynchronous procedure calls) so that blocking system calls are interrupted as well, rather than relying only on userland cancellation points. I then added cancellation checks to my synchronization primitives and to any other thread-related wait points. For example, here's my mutex lock function (the algorithm is based on the wonderful locklessinc.com articles on optimization):
[source lang="cpp"]
while (Load(_owned) || _interlockedbittestandset(reinterpret_cast<long *>(&_waiters), 0))
{
unsigned long waiters(Load(_waiters) | 1);
if (AtomicCAS(&_waiters, waiters, waiters + 512) == waiters) // Indicate sleeping
{
long const ret(NtWaitForKeyedEvent(_handle, reinterpret_cast<void **>(this), 1, nullptr)); // Sleep
#pragma warning(disable : 4146) // Negating an unsigned
AtomicAdd(&_waiters, -256); // Indicate finished waking
#pragma warning(default : 4146)
if (ret)
{
if (ret != STATUS_USER_APC) throw std::runtime_error("Failed to wait for keyed event");
Cancellation::Test();
}
}
}
[/source]
NtWaitForKeyedEvent() returns STATUS_USER_APC when the wait is interrupted by an APC, but the explicit Cancellation::Test() call is still necessary because the APC may not have come from an actual cancellation. Other cancellation points I've added include thread joins (including the non-standard timed join I've added), sleep functions, etc., and of course Cancellation::Test() can also be called from user code.
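The keyed-event wait above is Windows-specific. As a portable analogue, here is a minimal sketch of a blocking wait that doubles as a cancellation point. It assumes a condition variable in place of the alertable keyed-event wait, and a cancellation flag stored in the primitive itself rather than per thread as in the post; `InterruptibleQueue` and `Cancelled` are hypothetical names. As with `Cancellation::Test()` after `STATUS_USER_APC`, the waiter re-tests the flag every time it wakes.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the post's Cancellation exception.
struct Cancelled : std::runtime_error {
    Cancelled() : std::runtime_error("wait cancelled") {}
};

// A blocking queue whose Pop() is a cancellation point. Cancel() plays the
// role of the queued APC: it sets a flag and wakes every waiter.
class InterruptibleQueue {
public:
    void Push(int value) {
        {
            std::lock_guard<std::mutex> lk(m_);
            items_.push_back(value);
        }
        cv_.notify_one();
    }

    void Cancel() {
        {
            std::lock_guard<std::mutex> lk(m_);
            cancelled_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until an item is available; throws Cancelled if cancellation
    // was requested first. The predicate form also absorbs spurious wake-ups.
    int Pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return cancelled_ || !items_.empty(); });
        if (cancelled_) throw Cancelled();  // cancellation takes priority
        int value = items_.front();
        items_.pop_front();
        return value;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<int> items_;
    bool cancelled_ = false;
};
```

Unlike an APC, this only interrupts waits on this particular primitive; it cannot break a thread out of an arbitrary system call.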

In the case of a synchronous cancellation, Cancellation::Test() throws a Cancellation object, which the std::thread run function expects to catch, so any catch (...) in user code should be preceded by catch (Cancellation const &) { throw; }.
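A minimal sketch of that handler ordering, using a hypothetical empty `Cancellation` struct and hypothetical helpers `TestCancel()`, `DoWork()` and `RunThreadBody()` (the last standing in for the thread's run function). Because the rethrowing handler comes first, the catch (...) only ever sees ordinary errors:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the post's Cancellation type (which, like this
// one, is not derived from std::exception).
struct Cancellation {};

// Hypothetical cancellation point: throws when a cancellation is pending.
inline void TestCancel(bool requested) {
    if (requested) throw Cancellation();
}

// User code: handles its own errors but must not swallow a cancellation.
inline std::string DoWork(bool cancel, bool fail) {
    try {
        TestCancel(cancel);
        if (fail) throw std::runtime_error("work failed");
        return "done";
    } catch (Cancellation const &) {
        throw;  // let it propagate to the thread's run function
    } catch (...) {
        return "recovered";  // ordinary errors are handled locally
    }
}

// Hypothetical stand-in for the thread run function's Cancellation handler.
inline std::string RunThreadBody(bool cancel, bool fail) {
    try {
        return DoWork(cancel, fail);
    } catch (Cancellation const &) {
        return "cancelled";
    }
}
```

If the two handlers in `DoWork()` were swapped, a cancellation would be reported as "recovered" and the thread would keep running.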

So I've made std::thread creation and running as follows:

Constructors for various numbers of arguments (no variadic templates yet in MSVC):
[source lang="cpp"]
template<typename C, typename A0, typename A1>
inline thread::thread(C &&func, A0 &&arg0, A1 &&arg1)
{
Start(make_shared<function<void (void)>>(function<void (void)>(std::bind(forward<C>(func), forward<A0>(arg0), forward<A1>(arg1)))));
}
// and so on
[/source]
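For comparison, on a compiler with variadic templates, the per-arity constructors collapse into a single template. A minimal sketch, where `MakeRunPack` is a hypothetical helper that produces the `shared_ptr<function<void (void)>>` that `Start()` takes:

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical helper: binds any callable and its arguments into a
// heap-allocated nullary function (the "runpack" passed to Start()).
template <typename C, typename... A>
std::shared_ptr<std::function<void()>> MakeRunPack(C &&func, A &&...args) {
    return std::make_shared<std::function<void()>>(
        std::bind(std::forward<C>(func), std::forward<A>(args)...));
}
```

One design caveat: std::bind passes its stored arguments as lvalues, and std::function requires a copyable target. Unlike the real std::thread, this approach therefore cannot forward move-only arguments.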
Implementation:
[source lang="cpp"]
struct Pack
{
shared_ptr<function<void (void)>> runpack;
shared_ptr<Pack> self;
};

void thread::Start(shared_ptr<function<void (void)>> runpack)
{
auto pack(make_shared<Pack>(Pack()));
pack->runpack = runpack;
pack->self = pack; // Allow new thread to take RAII ownership by incrementing reference count
unsigned int tid;
_handle = reinterpret_cast<void *>(_beginthreadex(nullptr, 0, RunThread, pack.get(), 0, &tid));
if (_handle) StoreRel(_id._id, tid);
else
{
int err(errno);
pack->self.reset(); // Decrement reference count so when function returns pack will be destroyed
THROW(runtime_error, "Failed to start new thread: " << StrErr(err))
}
}

unsigned int __stdcall thread::RunThread(void *args)
{
Pack *p(static_cast<Pack *>(args));
shared_ptr<Pack> pack;
pack.swap(p->self);
try
{
(*pack->runpack)();
}
catch (Cancellation &c)
{
#if !defined NDEBUG
assert(Cancellation::thisThread);
while (Cancellation::thisThread->get_id() == thread::id()) this_thread::yield();
assert(Cancellation::thisThread->get_id() == this_thread::get_id());
#endif
Cancellation::thisThread->detach();
Store(c._e, true);
}
catch (...)
{
terminate();
}
return 0;
}
[/source]
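The `self` member in `Pack` is what lets ownership cross a C-style entry point that only takes a `void *`. Here is a portable sketch of the same hand-off, assuming std::thread in place of `_beginthreadex` and leaving out cancellation handling. The `Start`/`RunThread` names mirror the post, but the code is a simplified illustration.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <thread>

// Same shape as the post's Pack: it holds a shared_ptr to itself.
struct Pack {
    std::shared_ptr<std::function<void()>> runpack;
    std::shared_ptr<Pack> self;
};

// Entry point with the void* signature a C threading API expects.
inline void RunThread(void *args) {
    Pack *p = static_cast<Pack *>(args);
    std::shared_ptr<Pack> pack;
    pack.swap(p->self);  // take RAII ownership; pack is freed when we return
    (*pack->runpack)();
}

// Hypothetical launcher using std::thread in place of _beginthreadex.
inline std::thread Start(std::shared_ptr<std::function<void()>> runpack) {
    auto pack = std::make_shared<Pack>();
    pack->runpack = std::move(runpack);
    pack->self = pack;  // keeps the pack alive until the new thread takes over
    try {
        return std::thread(RunThread, static_cast<void *>(pack.get()));
    } catch (...) {
        pack->self.reset();  // break the self-reference so the pack is freed
        throw;
    }
}
```

The self-reference is only a temporary cycle. The new thread breaks it by swapping `self` into a local, and the launch failure path breaks it explicitly.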
std::thread is a friend of Cancellation and thus, unlike user code, can set the Cancellation flag _e. If a cancellation is caught and not rethrown, _e is still false when the Cancellation object is destroyed, so its destructor calls terminate(). This is by design: if thread code accidentally prevents a cancellation by swallowing the exception, at least the whole process is killed rather than the thread silently carrying on.
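A self-contained sketch of that "terminate unless acknowledged" idea. Here `Runner` is a hypothetical stand-in for the std::thread run function and `TestPoint()` is a hypothetical cancellation point. As in the post's non-const copy constructor, copying transfers the duty to acknowledge, so temporaries created while throwing die quietly.

```cpp
#include <cassert>
#include <exception>

class Runner;  // hypothetical stand-in for the std::thread run function

// A cancellation exception that must not be swallowed: unless the designated
// friend marks it handled, its destructor calls std::terminate().
class Cancellation {
public:
    // Copying moves the "must be acknowledged" duty to the new object.
    Cancellation(const Cancellation &other) : handled_(other.handled_) {
        other.handled_ = true;
    }
    ~Cancellation() {
        if (!handled_) std::terminate();  // swallowed by user code
    }

private:
    Cancellation() {}
    friend void TestPoint(bool cancel);
    friend class Runner;
    mutable bool handled_ = false;
};

// Hypothetical cancellation point: the only place the exception is created.
void TestPoint(bool cancel) {
    if (cancel) throw Cancellation();
}

class Runner {
public:
    // Runs f and returns true if it was cancelled. Only Runner may mark the
    // exception handled, so user code that swallows it ends in terminate().
    template <typename F>
    static bool Run(F f) {
        try {
            f();
        } catch (Cancellation &c) {
            c.handled_ = true;
            return true;
        }
        return false;
    }
};
```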

Asynchronous cancellations are implemented in the usual way, by editing the thread context and setting the instruction pointer to a throwing function.

Here's the code for the Cancellation class:

cancellation.h
[source lang="cpp"]
// Copyright 2011 Borislav Trifonov. All rights reserved.

#if !defined CANCELLATION_H
#define CANCELLATION_H

#include "sysenv.h"

#include <exception> // For terminate()

namespace std
{
class thread;
}

#if defined WINDOWS

#if !defined _WINDOWS_
#if defined X86_64
typedef unsigned __int64 ULONG_PTR;
#else // X86
typedef unsigned long ULONG_PTR;
#endif
#endif

class Cancellation // Do not throw unless wait alerted by user APC since it uses thisThread
{
public:
inline static void Test(void);
static bool DoSync(std::thread &t, unsigned int ms); // Will return false if timeout or throw on error; do not call multiple times on the same thread
static void DoAsync(std::thread &t);
static void DontOptimize(void);
private:
inline Cancellation(void);
inline Cancellation(Cancellation &other);
inline ~Cancellation(void);
friend std::thread;
__declspec(noreturn) static void Throw(std::thread *const tp);
static void RaiseAsync(std::thread &t);
static void __stdcall APCProc(ULONG_PTR ptr);
static int inProgress; // Faster to check this first than TLS
static thread_local bool doThisThread; // Matches the thread_local definition in cancel.cpp
static thread_local std::thread *thisThread;
bool _e;
};

inline Cancellation::Cancellation(void)
{
Store(_e, false);
}

inline Cancellation::Cancellation(Cancellation &other)
{
Store(_e, other._e);
Store(other._e, true);
}

inline Cancellation::~Cancellation(void)
{
if (!Load(_e)) terminate();
}

inline void Cancellation::Test(void)
{
if (Load(inProgress) && Load(doThisThread))
{
AtomicAdd(&Cancellation::inProgress, -1);
Store(Cancellation::doThisThread, false);
throw Cancellation();
}
}

#else // Linux

extern "C" void pthread_testcancel(void); // C linkage, otherwise the mangled name won't link

namespace Cancellation
{
inline void Test(void)
{
pthread_testcancel();
}
bool DoSync(std::thread &t, unsigned int ms); // Will return false if timeout or throw on error; do not call multiple times on the same thread
void DoAsync(std::thread &t);
}

#endif

bool TimedThreadJoin(std::thread &t, unsigned int ms);

#endif
[/source]
cancel.cpp
[source lang="cpp"]
// Copyright 2011 Borislav Trifonov. All rights reserved.

#if defined _MSC_VER || defined __ICL

#include <windows.h>
#include "cancellation.h"
#include <cassert>
#include "thread.h"
#include "error_plus.h"

using namespace std;

int Cancellation::inProgress(0);

thread_local bool Cancellation::doThisThread(false);

thread_local thread *Cancellation::thisThread(nullptr);

void Cancellation::DontOptimize(void)
{
volatile int i(0);
if (i) throw;
}

bool Cancellation::DoSync(std::thread &t, unsigned int ms)
{
if (!QueueUserAPC(APCProc, t.native_handle(), reinterpret_cast<ULONG_PTR>(&t))) THROW_LASTWINERR(runtime_error, "Could not cancel thread")
if (ms) return TimedThreadJoin(t, ms);
return !t.joinable();
}

void Cancellation::DoAsync(std::thread &t)
{
Cancellation::RaiseAsync(t);
if (!TimedThreadJoin(t, 500)) // Thread might be in a system call, so try to interrupt
{
if (!QueueUserAPC(APCProc, t.native_handle(), reinterpret_cast<ULONG_PTR>(&t))) THROW_LASTWINERR(runtime_error, "Could not cancel thread") // Might have blocked in a system call again
if (!TimedThreadJoin(t, 500)) THROW(runtime_error, "Thread was not cancelled")
}
}

__declspec(noreturn) void Cancellation::Throw(thread *const tp)
{
Store(Cancellation::thisThread, tp);
throw Cancellation();
}

void Cancellation::RaiseAsync(std::thread &t)
{
HANDLE h(t.native_handle());
if (static_cast<int>(SuspendThread(h)) == -1) THROW_LASTWINERR(runtime_error, "Could not suspend thread being cancelled")

CONTEXT ctx;
ctx.ContextFlags = CONTEXT_CONTROL;
if (!GetThreadContext(h, &ctx)) THROW_LASTWINERR(runtime_error, "Could not get context of thread being cancelled")
#ifdef X86_64
#error "Need to pass the parameter using Microsoft's x86-64 calling convention--probably in RCX"
ctx.Rip = reinterpret_cast<unsigned long long>(Throw);
#else // X86
ctx.Esp -= 2 * sizeof(size_t);
(reinterpret_cast<thread **>(ctx.Esp))[1] = &t; // No need to set return address since Throw() is noreturn
ctx.Eip = reinterpret_cast<unsigned long>(Throw);
#endif
if (!SetThreadContext(h, &ctx)) THROW_LASTWINERR(runtime_error, "Could not set context of thread being cancelled")

int ret, last(MAXIMUM_SUSPEND_COUNT);
while (true)
{
ret = ResumeThread(h);
switch (ret)
{
case -1:
THROW_LASTWINERR(runtime_error, "Could not resume thread being cancelled");
case 0:
THROW(logic_error, "Could not resume thread being cancelled as it's not suspended")
return;
case 1:
return;
default:
if (ret > last) THROW(runtime_error, "Suspend count of thread being cancelled not monotonically decreasing") // Avoid any possibility for an infinite loop
last = ret;
break;
}
}
}

void __stdcall Cancellation::APCProc(ULONG_PTR ptr)
{
Store(Cancellation::thisThread, reinterpret_cast<thread *>(ptr));
AtomicAdd(&Cancellation::inProgress, 1);
Store(Cancellation::doThisThread, true);
}

bool TimedThreadJoin(std::thread &t, unsigned int ms)
{
return t.join(ms);
}

#else // Linux

#include <pthread.h>
#include <errno.h> // For ETIMEDOUT
#include <time.h> // For clock_gettime()
#include <?/descr.h> // For descriptor struct pthread and masks // TODO: Check if pthreadP.h needed for __pthread_unwind()
#include "cancellation.h"
#include "thread.h"
#include "error_plus.h"

using namespace std;

// pthread_timedjoin_np() (a GNU extension) takes an absolute CLOCK_REALTIME deadline, not a relative timeout
static timespec DeadlineAfter(unsigned int ms)
{
timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
ts.tv_sec += ms / 1000;
ts.tv_nsec += static_cast<long>(ms % 1000) * 1000000L;
if (ts.tv_nsec >= 1000000000L)
{
++ts.tv_sec;
ts.tv_nsec -= 1000000000L;
}
return ts;
}

bool Cancellation::DoSync(std::thread &t, unsigned int ms) // TODO: Test
{
int ret(pthread_cancel(t.native_handle()));
if (ret) THROW(runtime_error, "Could not cancel thread: " << LastErr(ret))
void *res;
timespec tmspc(DeadlineAfter(ms));
ret = pthread_timedjoin_np(t.native_handle(), &res, &tmspc);
if (ret == ETIMEDOUT) return false;
if (ret) THROW(runtime_error, "Could not cancel thread: " << LastErr(ret))
if (res != PTHREAD_CANCELED) THROW(runtime_error, "Thread was not cancelled")
t.detach();
return true;
}

void Cancellation::DoAsync(std::thread &t) // TODO: Test
{
struct pthread *other(reinterpret_cast<pthread *>(t.native_handle())); // pthread_t is opaque pointer to the descriptor
int oldval(Load(other->cancelhandling));
while (true) // Change the cancellation type of another thread (cannot use pthread_setcanceltype)
{
int newval(oldval | CANCELTYPE_BITMASK);
if (newval == oldval) break;
int curval(AtomicCAS(&other->cancelhandling, oldval, newval));
if (curval != oldval)
{
oldval = curval;
continue;
}
if (CANCEL_ENABLED_AND_CANCELED_AND_ASYNCHRONOUS(newval))
{
// ?? Cannot call __do_cancel() as it and the __pthread_unwind() it calls use self
}
break;
}
if (!DoSync(t, 500)) THROW(runtime_error, "Thread was not cancelled")
}

bool TimedThreadJoin(std::thread &t, unsigned int ms)
{
void *res;
timespec tmspc(DeadlineAfter(ms));
int ret(pthread_timedjoin_np(t.native_handle(), &res, &tmspc));
if (ret == ETIMEDOUT) return false;
if (ret) THROW(runtime_error, "Could not join thread: " << LastErr(ret))
return true;
}

#endif
[/source]
A couple of headers are omitted for brevity. Load() and Store() are relaxed memory order loads and stores; I use them instead of std::atomic because I've been using them for a long time and want to retain consistency.
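Since those headers aren't shown, here is a guess at what equivalent helpers could look like on top of std::atomic. This is an assumption, not the post's code, and it also assumes that `StoreRel()` (used in `thread::Start()`) is a release store.

```cpp
#include <atomic>
#include <cassert>

// Relaxed load/store: atomic (no torn reads or writes) but with no ordering
// guarantees relative to surrounding memory accesses.
template <typename T>
inline T Load(const std::atomic<T> &a) {
    return a.load(std::memory_order_relaxed);
}

template <typename T, typename U>
inline void Store(std::atomic<T> &a, U value) {
    a.store(static_cast<T>(value), std::memory_order_relaxed);
}

// Assumed meaning of StoreRel(): a release store, so writes made before it
// are visible to a thread that later reads the value with an acquire load.
template <typename T, typename U>
inline void StoreRel(std::atomic<T> &a, U value) {
    a.store(static_cast<T>(value), std::memory_order_release);
}
```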

The Linux version is a work in progress and not in a usable state. The problem is that pthread_setcanceltype() only works on the calling thread, and it's difficult to work around this. As it stands, changing the cancellation type to asynchronous as in the Linux code above will likely prevent the stack from being unwound properly during a cancellation. I'm looking for suggestions.

I can post my complete std::thread upon request, but if you don't need cancellation support I'd suggest using Anthony Williams' version, as it is complete and he was one of the main authors of the C++11 thread library, after all: www.stdthread.co.uk
I can share the code for my optimized synchronization primitives once I clean it up a bit; PM me if you want to see it now.

Cheers

Thread cancellation is generally agreed to be a bad idea. In almost all languages it's next to impossible to guarantee that external shared resources have been properly released (memory, handles, mutexes, locks, file access, databases, transactions, ...). Even Java, which has top-to-bottom reflection capabilities and supports threading as a first-class construct, deprecated thread cancellation.

A simple problem in C++ is memory allocation. A thread allocates 1MB of memory and is cancelled before those resources can be properly cleaned up (smart pointers don't help, since references might be shared, perhaps even in subtle ways inside the kernel). Each time you restart a thread you lose 1MB, causing a memory leak. This problem isn't solved (or solvable!) even in (plain) Java, which is completely managed.

The final problem is external tasks. A thread starts an external task (performing a backup of a data cluster), then blocks waiting for the result (which takes 8 hours), but is cancelled. The external task keeps on running, but as a "zombie": whatever work it performs is redundant, since it might require a follow-up from the initiating thread. This can occur in many places inside the kernel when using async operations.

[quote]I'm working on a high reliability application where certain hung threads should be able to be cancelled and restarted[/quote]

If you're working on a *high reliability* application, then there is no such thing as "hung". Each thread needs to be in a deterministic state at any given time. The fact that something can hang is a huge warning sign, and the cause must be determined and handled specifically. Imagine a flight control system that can hang. If a thread can hang, you lose the guarantee of progress, so if one thread can hang in such a way, the entire system can hang indefinitely.

Otherwise, the design would need to be *highly available*, which is achieved with redundancy. This redundancy can be either symmetric (run the same task in multiple places using different configurations/parameters/implementations), or the overall system can be based around non-deterministic but high-probability completion criteria (map/reduce, eventual consistency or similar). It also becomes necessary to handle partially completed or permanently failed tasks, along with the ability to detect and properly classify them (if an SQL INSERT is taking 17 hours, did it hang, or is it just taking a long time?).


[quote](another thing I miss from the std::thread API is timed joins)[/quote]


I agree, but you can easily fix that using a condition_variable. Have the thread signal the condition just as it leaves its thread function, then do a timed wait on that condition variable instead of join()ing the thread right away.
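A minimal sketch of that suggestion. `JoinableWithTimeout` is a hypothetical wrapper: the thread signals the condition as the last step of its function, and the owner does a timed wait before committing to join(). It adds only the timeout, not cancellation, so a hung thread still blocks the destructor.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

class JoinableWithTimeout {
public:
    // The body must not throw (an escaping exception calls std::terminate).
    explicit JoinableWithTimeout(std::function<void()> body)
        : t_([this, body] {
              body();
              std::lock_guard<std::mutex> lk(m_);
              done_ = true;  // signal just as the thread function ends
              cv_.notify_all();
          }) {}

    JoinableWithTimeout(const JoinableWithTimeout &) = delete;
    JoinableWithTimeout &operator=(const JoinableWithTimeout &) = delete;

    // Joins and returns true if the body finished within the timeout;
    // otherwise returns false and leaves the thread running and joinable.
    bool TimedJoin(std::chrono::milliseconds timeout) {
        if (!t_.joinable()) return true;  // already joined
        std::unique_lock<std::mutex> lk(m_);
        if (!cv_.wait_for(lk, timeout, [this] { return done_; })) return false;
        lk.unlock();
        t_.join();  // returns promptly: the body has already completed
        return true;
    }

    ~JoinableWithTimeout() {
        if (t_.joinable()) t_.join();
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
    std::thread t_;  // declared last so the members above exist before it starts
};
```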


[quote name='Prune' timestamp='1322862910' post='4889927']
[quote](another thing I miss from the std::thread API is timed joins)[/quote]

I agree, but you can easily fix that using a condition_variable. Have the thread signal the condition just as it leaves its thread function, then do a timed wait on that condition variable instead of join()ing the thread right away.
[/quote]
Yes, though there are some issues with doing this (for example, making sure that the destructors of thread_local objects have completed before the condvar is signaled); the proper way to do it with the API would be std::notify_all_at_thread_exit(). However, that requires adding a mutex and a condvar, so why make it more complicated? IMHO, having a timed join is a good idea, and clearly at least the pthreads people agree.
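A sketch of the std::notify_all_at_thread_exit() variant. The worker sets the flag and hands its locked mutex to the runtime, which unlocks it and notifies only after the thread's thread_local objects have been destroyed. `ExitSignal`, `SignalAtThreadExit` and `TimedJoin` are hypothetical names, and `SignalAtThreadExit()` must be the last thing the thread does, because it keeps the mutex locked until the thread exits.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <utility>

struct ExitSignal {
    std::mutex m;
    std::condition_variable cv;
    bool exited = false;
};

// Call as the final step of the thread function.
inline void SignalAtThreadExit(ExitSignal &sig) {
    std::unique_lock<std::mutex> lk(sig.m);
    sig.exited = true;
    // Unlock + notify_all happen after thread_local destructors have run.
    std::notify_all_at_thread_exit(sig.cv, std::move(lk));
}

// Waits up to `timeout` for the exit signal; joins and returns true on success.
inline bool TimedJoin(std::thread &t, ExitSignal &sig, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(sig.m);
    if (!sig.cv.wait_for(lk, timeout, [&sig] { return sig.exited; })) return false;
    lk.unlock();
    t.join();
    return true;
}
```

This confirms the point about extra machinery: a mutex, a condvar and a flag per thread, just to get a timed join.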

The same goes for the lack of counting semaphores. You can build one from a condvar and a mutex, but it's obviously not as efficient (compare with sem_wait() and sem_post() in the glibc source), and that can make a big difference in some applications. And what about barriers? Recent developments of dynamic derivatives of barriers, such as phasers, are very promising, so it makes no sense for barriers to be yet another POSIX feature that C++11 ignores.
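For reference, this is the condvar-and-mutex workaround in its simplest form. `Semaphore` is a hypothetical class whose methods mimic sem_post(), sem_wait() and sem_trywait(). Every operation takes the mutex, which is where it loses to glibc's fast path. (C++20 later added std::counting_semaphore and std::barrier to the standard library.)

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

class Semaphore {
public:
    explicit Semaphore(unsigned initial = 0) : count_(initial) {}

    void Post() {  // like sem_post()
        {
            std::lock_guard<std::mutex> lk(m_);
            ++count_;
        }
        cv_.notify_one();
    }

    void Wait() {  // like sem_wait(): blocks while the count is zero
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }

    bool TryWait() {  // like sem_trywait(): never blocks
        std::lock_guard<std::mutex> lk(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    unsigned count_;
};
```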


[quote]Thread cancellation is generally agreed to be a bad idea. In almost all languages it's next to impossible to guarantee that external shared resources have been properly released (memory, handles, mutexes, locks, file access, databases, transactions, ...). Even Java, which has top-to-bottom reflection capabilities and supports threading as a first-class construct, deprecated thread cancellation.[/quote]

This does not apply to synchronous cancellation, because it relies on well-defined interruption points of which the compiler is aware. Synchronous cancellation done this way has no more issues than exception handling does, at least the way I've implemented it, which is similar to Boost's interruption points. When the thread is blocked in a system call, at least on Windows, the APC mechanism for interrupting system calls is also well defined, and there's no reason to think MS wouldn't have put in proper cleanup when returning from such a call with a value indicating an APC interrupt rather than any other return value, such as a timeout. If you look at the pthreads implementation on Linux, synchronous cancellation also unwinds the stack and properly cleans up everything, so it can be assumed to be safe.

With asynchronous cancellation, the code can be interrupted pretty much anywhere, including places where the compiler does not expect an exception to be possible. So in a scope where I expect a thread may hang, I can add a call to Cancellation::DontOptimize(), which decreases the chance of a problem. Under Linux, pthreads implements asynchronous cancellation with signals (signals are well defined in the Unix world, and I don't see people running around saying signals are unsafe), and it also attempts to unwind the stack and clean up. But of course, asynchronous cancellation remains risky, which was my motivation for wanting to be able to try a synchronous one first.

[quote]A simple problem in C++ is memory allocation. A thread allocates 1MB of memory and is cancelled before those resources can be properly cleaned up (smart pointers don't help, since references might be shared, perhaps even in subtle ways inside the kernel). Each time you restart a thread you lose 1MB, causing a memory leak. This problem isn't solved (or solvable!) even in (plain) Java, which is completely managed.[/quote]
My answer to this is simple: it's not a black-and-white thing; it's a statistical consideration. Cancellations are rare. In most cases, a synchronous cancellation (interruption in Boost) allows recovery without any problem (unless it is thrown, uncaught, in a destructor, but that is a design issue). And in my testing, asynchronous cancellation succeeds without leaks most of the time. Even if there is a small leak, many of them can be afforded before the whole process needs to be restarted. If the process needs to restart on average once in every ten asynchronous cancellations, that's a hell of a lot better than restarting every time (and again, in my experiments it's rarer than that).

[quote]The final problem is external tasks. A thread starts an external task (performing a backup of a data cluster), then blocks waiting for the result (which takes 8 hours), but is cancelled. The external task keeps on running, but as a "zombie": whatever work it performs is redundant, since it might require a follow-up from the initiating thread. This can occur in many places inside the kernel when using async operations.[/quote]
This is a program design issue, not a fundamental problem with cancellation. And on Windows at least, blocking system calls such as file I/O have alertable versions (ReadFileEx, WriteFileEx) that can be safely interrupted with an APC.

[quote][quote]I'm working on a high reliability application where certain hung threads should be able to be cancelled and restarted[/quote]

If you're working on a *high reliability* application, then there is no such thing as "hung". Each thread needs to be in a deterministic state at any given time. The fact that something can hang is a huge warning sign, and the cause must be determined and handled specifically. Imagine a flight control system that can hang. If a thread can hang, you lose the guarantee of progress, so if one thread can hang in such a way, the entire system can hang indefinitely.

Otherwise, the design would need to be *highly available*, which is achieved with redundancy. This redundancy can be either symmetric (run the same task in multiple places using different configurations/parameters/implementations), or the overall system can be based around non-deterministic but high-probability completion criteria (map/reduce, eventual consistency or similar). It also becomes necessary to handle partially completed or permanently failed tasks, along with the ability to detect and properly classify them (if an SQL INSERT is taking 17 hours, did it hang, or is it just taking a long time?).[/quote]
I should have said highly available. My goal is to minimize how often the whole process restarts. I haven't actually had any thread hang in deployed code yet, except for graphics driver/graphics hardware problems (where a restart is necessary anyway), but since users sometimes load plugins, I can't make any guarantees about code I didn't write. In browsers it has recently become fashionable to load plugins in separate processes, but I'm not making a browser, and performance matters to me. I'm sure many other use cases for cancellation can be found.

Redundancy is not always a solution because of resource constraints. In my case, the application is interactive software running on kiosks that already pushes the available computational power, and I can't throw more hardware at it to increase redundancy (how many GTX 580s can you keep cool in a small space?).

I think you are assuming that the availability of cancellation would encourage less care about code quality. But that is not a real argument against it, any more than the fact that you can bruise a finger with a hammer is an argument against hammers. We're nowhere near being able to efficiently prove the formal correctness of complex software. In a well-constructed project, waiting on a thread will complete in a reasonable, expected time with extremely high probability, but not with absolute certainty. Of the remaining cases, having the option of cancellation lets one recover from at least a large fraction, so one gets closer to 100%. What's the problem, then? If a programmer gets sloppier because they expect to be able to just cancel a hung thread, that's the fault of the programmer, not of the concept of cancellation.

The fact that cancellations are a part of POSIX threads is a very clear indication that they are useful. My guess is that synchronous cancellation (i.e. interruption points) will be added to a later C++ revision, though asynchronous won't be due to manufactured controversy and considerations that I think are more academic than practical.


[quote]The fact that cancellations are a part of POSIX threads is a very clear indication that they are useful.[/quote]


Non sequitur.

POSIX has many badly broken parts.

[quote]My answer to this is simple: it's not a black-and-white thing; it's a statistical consideration. Cancellations are rare. In most cases, a synchronous cancellation (interruption in Boost) allows recovery without any problem (unless it is thrown, uncaught, in a destructor, but that is a design issue).[/quote]

It should be stressed up front that your use case is not about thread cancellation but about a watchdog.

For remote systems, the watchdog is usually hardware-based, so if there's no response within some time, the system is rebooted. Many embedded motherboards support this.

Under Linux, the boot process can be optimized enough for this restart not to be a problem (<5 seconds from signal to full function), but on Windows it takes a bit longer.


Cancellation points can serve as syntactic sugar, but they do not solve the problem. If a thread is to exit cleanly, it may need to clean up the resources it claimed. That can take an arbitrary amount of time or even never complete, leaving us with the original problem again and requiring a forceful shutdown. Otherwise, we either abandon the termination or hang the main thread.

[quote]This is a program design issue, not a fundamental problem with cancellation.[/quote]

The user is wrong. Got it.

Why bother working around hung threads, then? It's the user's fault as well.

[quote]My guess is that synchronous cancellation (i.e. interruption points) will be added to a later C++ revision, though asynchronous won't be due to manufactured controversy and considerations that I think are more academic than practical.[/quote]

Sure, let's not bring any engineering or science aspects into the fashion industry called programming. Pret-a-programme, as they say.

Honestly, I have to agree that thread cancelation is a bad idea.

A better way to implement a reliable watchdog without ending up with corruption is to break the task out into separate processes, so that an entire process can be killed along with its entire address space, and to have the RPC payloads between the host and task processes carefully audited for corruption at runtime.

I already have a watchdog, running as a system service (Windows) or daemon (Linux). However, the watchdog acts by restarting the whole process (or system, in some cases).
In my situation, the application rotates multiple pieces of touch-interactive content (some of it third-party) on a schedule, every few minutes, and in some cases the screen is shared between several pieces of content. If I can avoid restarting the process, I can have working content on screen nearly all the time. This is especially important when a bunch of people are playing with an interactive piece at the moment a problem occurs (which is very common in a busy mall; I mean the people, not the problem), let alone if a client or potential client happens to be looking at it.

Right now, depending on which thread freezes, recovery is possible most of the time. For example, the render thread does not really affect object state, so it can simply be restarted and video memory repopulated from RAM, which takes a fraction of a second (I cache content in RAM because, in general, not all scheduled content fits in video memory at the same time). The same goes for most of the compute/physics threads, for which I run a task-based scheduler. The loading thread is pretty much the only one whose restart would force me to reload a full piece of content, and in that case I can simply run the next piece. In most cases, only a failure of the main thread would require a process restart.

Overall it works, but I haven't figured out all the corner cases yet. Specifically, I'm not sure what to do if an APC occurs that is not a cancellation. If, for example, NtWaitForKeyedEvent() in the mutex lock code above returns early but the cancellation check finds that it was not a cancellation, I don't know what the most appropriate action is. For something like sleep, I would just resume the sleep (with a properly adjusted timeout). In a more complicated case such as the mutex, I don't think I can do that, for two reasons. First, I'm not sure what the OS does internally if NtReleaseKeyedEvent() is called after the wait has been interrupted by an APC. If it does nothing special, the release call would block until the wait is resumed, which is bad since an unlock should never block. But perhaps Windows accounts for alerts in some way? Second, if multiple threads are waiting and are alerted by APCs, and only some of them have resumed waiting by the time a release is called, then the ones that resume waiting later will miss the wakeup.
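For the sleep case, resuming with an adjusted timeout comes for free if the wait is expressed against an absolute deadline computed once. A portable sketch, assuming a condition variable in place of the APC-alertable wait. `CancellableSleeper` is hypothetical, `Poke()` simulates a wake-up that is not a cancellation, and std::runtime_error stands in for the post's Cancellation.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <stdexcept>

class CancellableSleeper {
public:
    // Returns normally after `duration`; throws if Cancel() is called first.
    void SleepFor(std::chrono::milliseconds duration) {
        const auto deadline = std::chrono::steady_clock::now() + duration;
        std::unique_lock<std::mutex> lk(m_);
        while (!cancelled_) {
            // An early wake-up (spurious, or Poke()) simply loops back and
            // waits on the same deadline, i.e. with the remaining time only.
            if (cv_.wait_until(lk, deadline) == std::cv_status::timeout && !cancelled_)
                return;
        }
        throw std::runtime_error("sleep cancelled");
    }

    void Cancel() {
        {
            std::lock_guard<std::mutex> lk(m_);
            cancelled_ = true;
        }
        cv_.notify_all();
    }

    void Poke() { cv_.notify_all(); }  // wake-up that is not a cancellation

private:
    std::mutex m_;
    std::condition_variable cv_;
    bool cancelled_ = false;
};
```

This only covers the sleep case; it says nothing about the keyed-event release semantics raised for the mutex.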
