
## Recommended Posts

Hello

I want to know how exactly I can achieve "thread safe" code. I know there are some libraries (Boost, mutexes and so on) which do this, but it is kind of hard for me to figure out how to build code from the documentation. I know the principles, but it is just hard.

Does anyone have any code (C++) or a tutorial on how to achieve this?

Thank you

##### Share on other sites
You shouldn't constantly be starting threads, nor creating large numbers of threads.

The main thread sends messages to the background thread, saying it wants to load a model, and the background thread sends messages back once it's loaded. To achieve thread-safety, you just need to make sure that any variables that are used by both threads cannot be used by both threads at the same time (which is what mutexes do for you).

Here's a simple example of two threads communicating via message queues:

```cpp
struct LoadRequest { std::string fileName; Model* target; };
struct LoadResult  { Model* target; void* data; };

CRITICAL_SECTION loadRequestMutex;
CRITICAL_SECTION loadResultMutex;
std::deque<LoadRequest> loadRequestQueue; // you must lock/unlock loadRequestMutex whenever you use this variable
std::deque<LoadResult>  loadResultQueue;  // you must lock/unlock loadResultMutex whenever you use this variable

void MainThread_LoadFile(const std::string& fileName, Model* target)
{
    LoadRequest message;
    message.fileName = fileName;
    message.target = target;
    // Send a request to the background thread:
    EnterCriticalSection(&loadRequestMutex);
    loadRequestQueue.push_back(message);
    LeaveCriticalSection(&loadRequestMutex);
}

DWORD WINAPI BackgroundThread_Loop(LPVOID lpParam)
{
    while (!g_Quit)
    {
        Sleep(0);
        LoadRequest inMessage;
        bool receivedMessage = false;
        // Get a request from the main thread:
        EnterCriticalSection(&loadRequestMutex);
        if (!loadRequestQueue.empty())
        {
            inMessage = loadRequestQueue.front();
            loadRequestQueue.pop_front();
            receivedMessage = true;
        }
        LeaveCriticalSection(&loadRequestMutex);
        // If there was a request sent:
        if (receivedMessage)
        {
            // Read the file:
            FILE* fp = fopen(inMessage.fileName.c_str(), "rb");
            fseek(fp, 0, SEEK_END);
            long size = ftell(fp);
            rewind(fp);
            char* buffer = new char[size];
            fread(buffer, 1, size, fp);
            fclose(fp);
            LoadResult outMessage;
            outMessage.target = inMessage.target;
            outMessage.data = buffer;
            // Send the results back to the main thread:
            EnterCriticalSection(&loadResultMutex);
            loadResultQueue.push_back(outMessage);
            LeaveCriticalSection(&loadResultMutex);
        }
    }
    return 0;
}

void MainThread_Loop()
{
    // Create the mutexes
    InitializeCriticalSection(&loadRequestMutex);
    InitializeCriticalSection(&loadResultMutex);
    // Start the background loop
    HANDLE backgroundThread = CreateThread(0, 0, BackgroundThread_Loop, 0, 0, 0);
    while (!g_Quit)
    {
        // Run the game:
        Update(); // might call MainThread_LoadFile
        Render();
        // Process results from the background thread:
        EnterCriticalSection(&loadResultMutex);
        while (!loadResultQueue.empty())
        {
            LoadResult message = loadResultQueue.front();
            loadResultQueue.pop_front();
            message.target->DataLoaded(message.data); // give the loaded data to the model object
            delete[] (char*)message.data;
        }
        LeaveCriticalSection(&loadResultMutex);
    }
    // Make sure the background thread is finished
    WaitForSingleObject(backgroundThread, INFINITE);
    // Clean up the mutexes
    DeleteCriticalSection(&loadRequestMutex);
    DeleteCriticalSection(&loadResultMutex);
}
```

##### Share on other sites
On Windows you want to use overlapped I/O; it's the most efficient way to load files on another thread.

Here's how I do it. I call sys_read from the main thread with the filename, callback, and user data whenever I want to load a file. Then I call sys_tick from the main thread every frame. This checks for completed transfers and executes the callback.

Pros:
- No mutexes, locks, or anything of the sort.

Cons:
- All calls to sys_read and sys_tick must come from the same thread. I don't find this to be a limitation at all.

```cpp
typedef void (*readhandler_t)(byte *, unsigned, void *);

unsigned sys_read(const char *name, readhandler_t callback, void *user);
void sys_tick(void);
```

Win32 implementation:

```cpp
typedef struct {
    void         *user;
    HANDLE        file;
    OVERLAPPED    overlapped;
    byte         *buffer;
    readhandler_t callback;
} readop_t;

typedef std::list<readop_t *> ReadOpsList;
static ReadOpsList readops = ReadOpsList();

unsigned sys_read(const char *name, readhandler_t callback, void *user)
{
    HANDLE file;
    DWORD size;
    byte *b;
    readop_t *o;

    file = CreateFile(name, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING,
                      FILE_FLAG_SEQUENTIAL_SCAN | FILE_FLAG_OVERLAPPED, NULL);
    if (file == INVALID_HANDLE_VALUE) {
        return 0;
    }
    size = GetFileSize(file, NULL);
    if (size == INVALID_FILE_SIZE) {
        return 0;
    }
    b = (byte *)malloc(size + sizeof(readop_t));
    o = (readop_t *)(b + size);
    o->user = user;
    o->file = file;
    o->buffer = b;
    o->callback = callback;
    o->overlapped.Offset = o->overlapped.OffsetHigh = 0;
    o->overlapped.hEvent = NULL;
    if (!ReadFileEx(o->file, o->buffer, size, &o->overlapped, NULL)) {
        free(o);
        return 0;
    }
    readops.push_back(o);
    return size;
}

static bool CompleteReadOp(const readop_t *&o)
{
    DWORD dwNumberOfBytesTransfered;
    if (GetOverlappedResult(o->file, &o->overlapped, &dwNumberOfBytesTransfered, FALSE)) {
        CloseHandle(o->file);
        o->callback(o->buffer, dwNumberOfBytesTransfered, o->user);
        /* readop_t is appended to buffer to save allocations.
           either free o->buffer here, or let the user own it */
        return true;
    }
    return false;
}

void sys_tick(void)
{
    readops.remove_if(CompleteReadOp);
}
```

Oh, and here's the (untested) standard implementation for other platforms until I get around to learning them.

```cpp
unsigned sys_read(const char *name, readhandler_t callback, void *user)
{
    FILE *file;
    long size;
    char *buffer;
    unsigned read;

    file = fopen(name, "rb");
    if (!file) {
        return 0;
    }
    fseek(file, 0, SEEK_END);
    size = ftell(file);
    rewind(file);
    buffer = (char *)malloc(sizeof(char) * size);
    read = fread(buffer, 1, size, file);
    fclose(file);
    callback((byte *)buffer, size, user);
    return read;
}

void sys_tick(void)
{
}
```

##### Share on other sites

Windows overlapped I/O does this without the overhead of all the threads (each thread requires a non-trivial amount of memory to be allocated).

##### Share on other sites
The GetOverlappedResult function will "check" if the I/O operation is completed, right? So Windows is handling the I/O while the main loop is not blocked? Why are there no conflicts if Windows is writing to the same memory you are trying to check with GetOverlappedResult?

##### Share on other sites

> You shouldn't constantly be starting threads, nor creating large numbers of threads.
>
> The main thread sends messages to the background thread, saying it wants to load a model, and the background thread sends messages back once it's loaded. [full message-queue example quoted above]

Thanks for this!

Question: Will the mutex block until it is free? Can this affect framerate significantly if the render (main) thread has to wait for the worker thread?
Also, Sleep(0) will make the app consume 100% CPU (on one core at least). Is there any other method recommended for waking the thread? Sleep(1) will limit the thread to 1000 loops per second, which is not good if more are needed.

##### Share on other sites
> Question: Will the mutex block until it is free?

Yep. The point of the mutex is to force a thread to wait until it's safe to go on.
However, in cases where you don't want it to block (e.g. you've got other work for the thread to do), you could use `TryEnterCriticalSection`.

> Can this affect framerate significantly if the render (main) thread has to wait for the worker thread?

Yes, excessive use of mutexes is bad for performance.
Take note though that in that example, the mutexes are only locked ("entered") for a very small amount of time. For example, in `BackgroundThread_Loop`, the mutex is locked in order to retrieve a message from the queue, then it is unlocked again. Then the message is processed outside of the lock (which means the main thread is free to lock the mutex while the message is being processed).
Always try to keep things locked for the shortest amount of time possible.

> Also, Sleep(0) will make the app consume 100% CPU (on one core at least), any other method recommended?

No, not having a `Sleep` call there at all will make the app consume 100% of a CPU core. Adding the `Sleep(0)` will allow other applications to steal some CPU time from the game if they need it.

In general though, you'd use semaphores or events to put the worker thread to sleep and have it wake up only when notified by the main thread. This reduces CPU usage to zero when there's no work, but also makes things less responsive (there will be a few ms delay between issuing the notification and the thread waking up).

While this is a decent example of a worker thread, take note of typedef struct's post above about Overlapped I/O -- this is the native Windows way of loading files in the background and will perform much better.

> The GetOverlappedResult function will "check" if the I/O operation is completed, right? So Windows is handling the I/O while the main loop is not blocked? Why are there no conflicts if Windows is writing to the same memory you are trying to check with GetOverlappedResult?

Yes, `GetOverlappedResult` checks if the file loading is complete. There's no conflict because when you started the file load, Windows gave you a 'ticket', and when you're checking if it's done, you're giving that ticket back to Windows. You don't touch the actual memory in the meantime.
All of these functions are documented on MSDN (which is probably the top Google result when searching for the function names).

##### Share on other sites
> No, not having a `Sleep` call there at all will make the app consume 100% of a CPU core. Adding the `Sleep(0)` will allow other applications to steal some CPU time from the game if they need it.

On XP/2000 it might be of interest that Sleep(0) behaves differently: it only yields to threads of equal priority. If all other ready threads happen to have lower priority, the call returns immediately. But typically you don't fiddle with priorities anyway.

Another option, if you know that you only block for a very short time, is atomics. I'd consider them "advanced", however; first get used to basic locking/unlocking. The problem with lock-free retries built on atomics is that they effectively spin in a busy loop, which is fine on multi-core machines but pretty useless on a single core (since nobody can release the lock until the current thread runs out of its time slice).

Also worth mentioning: making software thread-safe is about much more than just spamming mutexes left and right. I remember wasting a lot of time trying to explain to some people that mixing event-based and polling designs across multiple threads is a stupid idea and begs for bugs (which we had on a regular basis). Why? Thread A modifies state and sends an event; thread B reacts by polling the state. Too bad thread A already changed the state again in the meantime. Mutexes won't prevent that (unless you block thread A for a long time until B has finally processed the event). The easy solution would have been to send the state along with the event.

##### Share on other sites
What happens to list::remove_if if there are no elements, is it just skipped?

```cpp
void sys_tick(void)
{
    if (!readops.size())
        readops.remove_if(CompleteReadOp);
}
```

Is this necessary/faster?

Why do you prefer to use a callback instead of a regular function call? Is this to handle different filetypes in different ways?

Also
```cpp
free(o);
```

As I see it, "b" points to the first address allocated, and "o" is appended at the last addresses. Why not free "b" if (!ReadFileEx...)?

##### Share on other sites
Unless you've already considered this, you should first detect how many cores are on the computer. If there's only one your code needs to recognize this and not try to multithread at all. If you don't care about single core machines then ignore this but it's an important factor regardless.

##### Share on other sites

> What happens to list::remove_if if there are no elements, is it just skipped?
>
> ```cpp
> void sys_tick(void)
> {
>     if (!readops.size())
>         readops.remove_if(CompleteReadOp);
> }
> ```
>
> Is this necessary/faster?
>
> Why do you prefer to use a callback instead of a regular function call? Is this to handle different filetypes in different ways?

Not sure what you mean by necessary/faster. The if statement isn't necessary (in fact it's wrong, there shouldn't be a "!". That's what I get for trying to clean out irrelevant portions of code before posting). If you mean polling with GetOverlappedResult: there are two options with overlapped I/O. You can either create an event (the hEvent field of the OVERLAPPED struct) and wait on that (WaitForSingleObject), or just poll with GetOverlappedResult. In general, polling is better than events for games, since you're updating every frame anyway, and you never ever want to block.

Using a callback just gives me the most flexibility, and the best architecture. Any individual file can have specific code related to it, neatly tucked away in that callback. Or you can use a generic callback for different file types. Or do both by adding another layer between the sys_read and the user callback. That's what I do with graphics assets:

```cpp
struct asset_t;
typedef void (*loadhandler_t)(asset_t *);

struct asset_t {
    const char   *name;
    int           type;
    int           flags;
    loadhandler_t onload;
    void         *usr;
    H3DRes        res;
    bool          loaded;
};
```

Implementation
```cpp
static void OnReadAsset(byte *b, unsigned len, void *usr)
{
    asset_t *asset = (asset_t *)usr;
    h3dLoadResource(asset->res, (const char *)b, len);
    free(b);
    asset->loaded = true;
    if (asset->onload) {
        asset->onload(asset);
    }
}

void asset_read(asset_t *a)
{
    if (a && a->name) {
        a->res = h3dAddResource(a->type, a->name, a->flags);
        sys_read(a->name, OnReadAsset, a);
    }
}
```

So for graphics assets, instead of using sys_read directly, I create an asset_t (containing flags/etc specific to the renderer). asset_read handles getting the data loaded into the renderer, and then gives me an optional callback for any other stuff I might want to do with that asset once it's loaded.

The end result is that it's very clear what's going on. You can make the flexibility vs abstraction tradeoff that's best for any given situation.

Other people I'm sure use a system where you register handlers for different file types. That works fine, but it's forcing you to use the same level of abstraction for all of your file loading. What if I have 1 .BMP that I want loaded by the renderer, but another that I'm just using as a heightmap, so I don't want to actually create a texture for it? Callbacks let you layer abstraction, so I can handle processing the file data at subsystem-level, filetype-level, or individual file level, as needed.

EDIT: I should mention that you can always use a virtual function instead of a function pointer to do these callbacks, if you don't like the whole "C feel". I just like the option to declare my assets statically:

```cpp
static asset_t map = { "materials/map.material.xml", H3DResTypes::Material, 0, InitMap };
```

You can tell at a glance what this asset is, where it is, and what happens when it's loaded.

> Also
>
> ```cpp
> free(o);
> ```
>
> As I see it, "b" points to the first address allocated, and "o" is appended at the last addresses. Why not free "b" if (!ReadFileEx...)?

Yup, that's a bug, should be freeing b there. Thanks.

##### Share on other sites

> Unless you've already considered this, you should first detect how many cores are on the computer. If there's only one your code needs to recognize this and not try to multithread at all. If you don't care about single core machines then ignore this but it's an important factor regardless.

Not true. Sure, you could detect how many cores there are, but it doesn't affect the topic of threading the loading: a background loading thread spends most of its time blocked on I/O, so it's still useful on a single core.