Fast thread synchronisation (WIN32)

12 comments, last by Jan Wassenberg 18 years, 4 months ago
Hi again.

In an application I'm writing, I sometimes have to load new resources on the fly (when a specific command comes from the user at run time). Unfortunately, I have to keep a stable 50 fps, so what I want to do is load resources in a separate thread. A standard model (I guess):

- a queue (Q1) for requests and a queue (Q2) for completed requests;
- the active thread adds new requests to Q1 (synchronisation here) whenever it wants to;
- the active thread takes and handles completed requests from Q2 (synchronisation here) at an appropriate time (between frames, for example);
- the loading thread takes requests from Q1 (synchronisation here) as they come (woken up if needed), then performs the loading;
- the loading thread puts completed requests on Q2 (synchronisation here); if there are more requests it keeps loading in a loop, and if there are none it sleeps.

A few questions/problems:

1. How should I handle synchronisation under WIN32, especially the sleeping/waking part? I know what synchronisation is about; I just can't find my way around the structures WIN32 provides. So far I've only found CRITICAL_SECTION.
2. Performance is a big issue here. Could I use other synchronisation objects/functions, or should I stick with CRITICAL_SECTION only?
3. I tried to use InitializeCriticalSectionAndSpinCount, but it won't compile, while InitializeCriticalSection does. It's a plain "identifier not found" error. Admittedly I don't have SMP (though the target machine will), but could that be the problem? (I doubt it, but what do I know.)

So, a bunch of noobish questions for you ;) ANY ideas appreciated, even "this sucks, man! (because...)"

Cheers.
~def
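Since the sleeping/waking part is the sticking point, here is a minimal sketch of the Q1/Q2 model described above. It is written against POSIX threads so it can be compiled and tested anywhere; on Win32 the same shape roughly maps onto a CRITICAL_SECTION for the lock plus an auto-reset event (CreateEvent/SetEvent/WaitForSingleObject) for the wake-up, or, on Vista and later, a CONDITION_VARIABLE with SleepConditionVariableCS. The request struct, the loader_* names and the "id * 2" stand-in for the actual loading are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request record; a real one would hold a file name and the loaded texture. */
typedef struct request {
    int id;
    int result;              /* filled in by the loader thread */
    struct request *next;
} request;

typedef struct { request *head, *tail; } queue;   /* plain FIFO, only touched under the lock */

static void q_push(queue *q, request *r) {
    r->next = NULL;
    if (q->tail) q->tail->next = r; else q->head = r;
    q->tail = r;
}

static request *q_pop(queue *q) {
    request *r = q->head;
    if (r) { q->head = r->next; if (!q->head) q->tail = NULL; }
    return r;
}

typedef struct {
    pthread_mutex_t lock;    /* plays the role of the CRITICAL_SECTION */
    pthread_cond_t  work;    /* signalled when Q1 gains an item or quit is requested */
    queue pending;           /* Q1: main thread -> loader */
    queue done;              /* Q2: loader -> main thread */
    bool quit;
} loader;

void loader_init(loader *l) {
    assert(l != NULL);
    pthread_mutex_init(&l->lock, NULL);
    pthread_cond_init(&l->work, NULL);
    l->pending.head = l->pending.tail = NULL;
    l->done.head = l->done.tail = NULL;
    l->quit = false;
}

/* Main thread: queue a request and wake the loader if it is asleep. */
void loader_submit(loader *l, request *r) {
    pthread_mutex_lock(&l->lock);
    q_push(&l->pending, r);
    pthread_cond_signal(&l->work);
    pthread_mutex_unlock(&l->lock);
}

/* Main thread, between frames: take one finished request, or NULL if none is ready. */
request *loader_poll(loader *l) {
    pthread_mutex_lock(&l->lock);
    request *r = q_pop(&l->done);
    pthread_mutex_unlock(&l->lock);
    return r;
}

/* Ask the loader to finish the remaining requests and then exit. */
void loader_stop(loader *l) {
    pthread_mutex_lock(&l->lock);
    l->quit = true;
    pthread_cond_signal(&l->work);
    pthread_mutex_unlock(&l->lock);
}

/* Loader thread: sleep until work arrives, load outside the lock, publish the result. */
void *loader_thread(void *arg) {
    loader *l = arg;
    pthread_mutex_lock(&l->lock);
    for (;;) {
        while (!l->quit && l->pending.head == NULL)
            pthread_cond_wait(&l->work, &l->lock);   /* releases the lock while asleep */
        if (l->pending.head == NULL) break;         /* quit requested and nothing left */
        request *r = q_pop(&l->pending);
        pthread_mutex_unlock(&l->lock);
        r->result = r->id * 2;                       /* stand-in for the slow load/decode */
        pthread_mutex_lock(&l->lock);
        q_push(&l->done, r);
    }
    pthread_mutex_unlock(&l->lock);
    return NULL;
}
```

The important design point is that the slow work happens with the lock released, so the main thread's loader_poll only ever waits for a few pointer updates.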
Synchronization Functions

If you want the lightest-weight synchronization possible, I believe what you want is InterlockedExchange.

You could use it to implement a mutex-like entity like so:
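#include <windows.h>

/* A minimal spin lock built on InterlockedExchange: 0 = free, 1 = held. */
typedef LONG volatile InterThreadMutex;

/* Try once to take the lock; returns TRUE if we got it. */
BOOL TryAcquire(InterThreadMutex *TheMutex)
{
   /* InterlockedExchange atomically stores 1 and returns the previous value,
      so a previous value of 0 means the lock was free and is now ours. */
   if(InterlockedExchange(TheMutex, 1) == 0)
   {
      return TRUE;
   }
   return FALSE;
}

/* Keep trying, giving up the rest of the time slice between attempts. */
void WaitAcquire(InterThreadMutex *TheMutex)
{
   while(!TryAcquire(TheMutex))
   {
      Sleep(0);
   }
}

/* Mark the lock free again (InterlockedExchange also acts as a full memory barrier). */
void Release(InterThreadMutex *TheMutex)
{
   InterlockedExchange(TheMutex, 0);
}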
(Written in Win32-API-style C; it would be much, much cleaner as C++.)
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Identify the resource used by the different threads and use a critical section to guard it. I use the term "resource" in the same sense that Relisoft does in "Resources and their Ownership". Reviewing that page, I see that it even uses critical sections as part of its discussion.

Q1. Win32 synchronization objects are intended to be referenced through handles. The CRITICAL_SECTION structure is meant to be opaque, and critical sections only work between threads of the same process. Mutexes work the same way critical sections do, except that they can also work across processes. Semaphores and events provide additional, more specialized features.

Q2. Stick with CS.

Q3. From the InitializeCriticalSectionAndSpinCount documentation: "To compile an application that uses this function, define _WIN32_WINNT as 0x0403 or later."
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
You need to tell the Windows headers that it's OK if your code doesn't run on Win95 or on WinNT 4.0 before SP3; InitializeCriticalSectionAndSpinCount only exists on Win98 and on NT 4.0 SP3 and later. Do that with "#define _WIN32_WINNT 0x0403" (or higher) before including windows.h.

Critical sections are pretty good, particularly with carefully chosen spin counts. Always keep in mind that a spin count on a single-processor system will only hurt you. Choosing the best spin count is something of a black art and tends to be very sensitive to the particular system the code is running on.

The takeaway: don't simply write them off, as so many people do.

You might also consider an SList (the interlocked singly linked list). It is only supported on XP and later, though.

If you insist on rolling your own, google for "lock-free queue" or something similar, and please keep in mind that it's very easy to screw this stuff up and end up worse off than if you had just used the standard primitives.
-Mike
Anon Mike: You could use code similar to that in the "Getting Hardware Information" example to get the number of processors and avoid setting a spin count on single-processor machines. Of course, you should be careful when doing such things, as there's no real way to tell how things will change in the future. The 100-core single-chip machines of tomorrow might report only a single CPU =-|
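Following the processor-count suggestion, here is a small sketch that picks a spin count from the number of online CPUs: GetSystemInfo's dwNumberOfProcessors on Windows, and sysconf(_SC_NPROCESSORS_ONLN) elsewhere so the logic can be tested off Windows. It also places the _WIN32_WINNT define before windows.h, as described for Q3. The value 4000 is only a placeholder assumption to tune per machine, and spin_count_for, online_cpus and init_tuned_cs are hypothetical names. Note that the InitializeCriticalSectionAndSpinCount documentation states the spin count is ignored on single-processor systems, so the explicit check mainly documents intent.

```c
#ifdef _WIN32
#  ifndef _WIN32_WINNT
#    define _WIN32_WINNT 0x0403   /* required for InitializeCriticalSectionAndSpinCount */
#  endif
#  include <windows.h>
#else
#  include <unistd.h>
#endif
#include <assert.h>

/* Placeholder starting point; the best value has to be measured per system. */
#define DEFAULT_SPIN_COUNT 4000UL

/* No spinning on one CPU: the lock holder cannot run while we spin. */
unsigned long spin_count_for(long ncpu)
{
    assert(ncpu >= 1);
    return ncpu > 1 ? DEFAULT_SPIN_COUNT : 0UL;
}

/* Number of processors the OS reports (never less than 1). */
long online_cpus(void)
{
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);
    return (long)si.dwNumberOfProcessors;
#else
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
#endif
}

#ifdef _WIN32
/* Initialise a critical section with a spin count chosen from the CPU count. */
BOOL init_tuned_cs(CRITICAL_SECTION *cs)
{
    return InitializeCriticalSectionAndSpinCount(cs, (DWORD)spin_count_for(online_cpus()));
}
#endif
```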
Thanks for all your help.

InterlockedExchange (and family) looks nice, but I would need to reimplement the spin-wait, which the critical section already provides, presumably about as efficiently as possible.
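To give a sense of what reimplementing the spin-wait involves, here is a minimal C11 sketch of a spin-then-yield lock built on atomic exchange, the portable counterpart of InterlockedExchange. SPIN_LIMIT and all names are hypothetical. Unlike a critical section, which spins and then falls back to waiting in the kernel, this lock never truly sleeps, so a waiting thread keeps burning CPU time apart from the yields.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Arbitrary placeholder; like a critical-section spin count, it needs tuning. */
#define SPIN_LIMIT 1000

typedef struct { atomic_int held; } spin_lock;   /* 0 = free, 1 = held */

/* One attempt: a cheap read first, so waiters don't keep writing to the cache line. */
static int spin_try(spin_lock *l)
{
    return atomic_load_explicit(&l->held, memory_order_relaxed) == 0 &&
           atomic_exchange_explicit(&l->held, 1, memory_order_acquire) == 0;
}

/* Spin for a while, then give up the time slice (like Sleep(0)) and try again. */
void spin_acquire(spin_lock *l)
{
    assert(l != NULL);
    for (;;) {
        for (int i = 0; i < SPIN_LIMIT; ++i)
            if (spin_try(l)) return;
        sched_yield();
    }
}

/* Release with release ordering so writes made under the lock are visible to the next owner. */
void spin_release(spin_lock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```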

As for the sleeping part, I went with Extrarius' original suggestion (in the code snippet) of simply calling Sleep, here for a fixed amount of time (2 seconds [!]).
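Rather than sleeping a fixed 2 seconds, the loader can block on an event that the main thread sets whenever it queues a request; the timeout then becomes only a safety net. Win32 provides this directly as an auto-reset event (CreateEvent with bManualReset set to FALSE, SetEvent, and WaitForSingleObject with a millisecond timeout). To keep the sketch testable off Windows, here is a rough POSIX equivalent; the auto_event type and its functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical auto-reset event, similar in spirit to a Win32 auto-reset event. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool signalled;
} auto_event;

void event_init(auto_event *e)
{
    assert(e != NULL);
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->signalled = false;
}

/* Like SetEvent: release one waiter, or the next thread that waits. */
void event_set(auto_event *e)
{
    pthread_mutex_lock(&e->lock);
    e->signalled = true;
    pthread_cond_signal(&e->cond);
    pthread_mutex_unlock(&e->lock);
}

/* Like WaitForSingleObject(h, ms): true if signalled, false on timeout.
   Consuming the signal resets the event. */
bool event_wait(auto_event *e, long ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default clock of pthread_cond_timedwait */
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&e->lock);
    int rc = 0;
    while (!e->signalled && rc != ETIMEDOUT)    /* loop also absorbs spurious wakeups */
        rc = pthread_cond_timedwait(&e->cond, &e->lock, &deadline);
    bool got = e->signalled;
    e->signalled = false;                       /* auto-reset */
    pthread_mutex_unlock(&e->lock);
    return got;
}
```

In the loader loop this would look like `event_wait(&work_ready, 2000)` followed by draining Q1: the thread wakes as soon as a request is queued, and the 2-second timeout only matters if a signal is somehow missed.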

Also, I think using an SList introduces unnecessary synchronization, since I already perform all the operations, plus some additional checks, inside the critical section. So I'm using a similar structure, only plain and home-made.


It seems to be working, but it really brings the main thread to its knees, causing a lot of dropped frames (I have to keep a stable 50 fps). I'll try lowering the resource thread's priority to see if that helps.
If you're going to use critical sections, use WaitForSingleObject to acquire them.
Quote: Original post by Extrarius
If you're going to use critical sections, use WaitForSingleObject to acquire them.

No, you need to use EnterCriticalSection to acquire a critical section. WaitForSingleObject is for synchronization objects that require kernel intervention, like mutexes, events, etc. Half the point of using a critical section in the first place is to avoid the overhead of calling into the kernel.

I can't imagine why you think SLists are too heavyweight.

To repeat, at least try the standard objects before dismissing them.
-Mike
Quote: Original post by Anon Mike
I can't imagine why you think SLists are too heavyweight.


Because it synchronizes every time I use it. Whenever I use it, I'm already inside a critical section (in this implementation, at least; in another one the situation would be different), and I'm accessing it several times in a row.
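For the "several accesses in a row" concern, one common pattern is to detach the whole list in a single lock round trip and then walk it with no lock held. The Win32 SList offers the same idea as InterlockedFlushSList, which removes all entries in one interlocked operation. The sketch below uses a pthreads mutex so it can be tested anywhere; node, done_list and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical completed-request node. */
typedef struct node { int id; struct node *next; } node;

typedef struct {
    pthread_mutex_t lock;
    node *head;              /* newest first, pushed by the loader */
} done_list;

/* Loader side: one short lock per finished item. */
void done_push(done_list *d, node *n)
{
    assert(d != NULL && n != NULL);
    pthread_mutex_lock(&d->lock);
    n->next = d->head;
    d->head = n;
    pthread_mutex_unlock(&d->lock);
}

/* Main-thread side: detach everything with one lock round trip,
   then return it oldest-first. The caller walks the result with no lock held. */
node *done_take_all(done_list *d)
{
    pthread_mutex_lock(&d->lock);
    node *list = d->head;
    d->head = NULL;
    pthread_mutex_unlock(&d->lock);

    node *reversed = NULL;   /* restore FIFO order outside the lock */
    while (list) {
        node *next = list->next;
        list->next = reversed;
        reversed = list;
        list = next;
    }
    return reversed;
}
```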

"Heavyweight" may be too strong a word here. I only meant that it's unnecessary.


As for WaitForSingleObject, I'm using it to wait for the thread to finish, after setting a volatile bool flag that says "thread, quit now, please".
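Here is a minimal portable sketch of that shutdown handshake: set a flag, then wait for the thread to exit (pthread_join standing in for WaitForSingleObject on the thread handle). It uses a C11 atomic_bool because in standard C a plain volatile flag is not a synchronization primitive; MSVC does give volatile extra ordering guarantees, but that is compiler-specific. The worker struct and functions are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical worker state. */
typedef struct {
    atomic_bool quit;        /* "thread, quit now, please" */
    atomic_long iterations;  /* lets the caller see that the loop ran */
} worker;

void *worker_main(void *arg)
{
    worker *w = arg;
    while (!atomic_load_explicit(&w->quit, memory_order_acquire)) {
        atomic_fetch_add_explicit(&w->iterations, 1, memory_order_relaxed);
        sched_yield();       /* stand-in for "load one resource" */
    }
    return NULL;
}

/* Analogue of: set the flag, then WaitForSingleObject(threadHandle, INFINITE). */
void worker_stop(worker *w, pthread_t t)
{
    assert(w != NULL);
    atomic_store_explicit(&w->quit, true, memory_order_release);
    pthread_join(t, NULL);
}
```

One caveat: if the worker is asleep on an event or condition variable rather than looping, setting the flag alone will not wake it, so the stop routine also has to signal that object.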

--

I also tried lowering the priority of the loading thread, by one step only. I no longer lose frames in the main thread, but the loading takes far too long (about 2-3 seconds to turn a 360x288 JPG file into a standard ARGB texture). As I have to read a few hundred of those, I guess I have to speed it up a little...

Any ideas?
Have you considered using overlapped I/O rather than going to the trouble of using a separate thread for I/O?

