Thread Safety, using threads to load models.

Started by
10 comments, last by NineYearCycle 12 years, 10 months ago
Hello

I want to know how exactly I can achieve "thread safe" code. I know there are some libraries boost, Mutex and so on which do this but it is kinda hard for me to figure out how to build code from documentation. I know the principles but it is just hard :(

Basically I don't want my main loop getting blocked when a huge texture needs to be loaded. So I though about starting a thread, passing a pointer and let the thread do the work and then maybe using PostMessage() to tell the main loop when it's ready. But if too many threads are initiated at once I fear it will crash the computer. So I am hoping message qeues will resolve my problem. Main loop will just push down what model to be loaded and a pointer. The loading thread will do the loading, then uses the pointer to point to the model (it's encapsulated in a class) and then post in another message qeue that the model is ready to be drawn.

Does anyone have any code (c++) or tutorial on how to achieve this?

Thank you
Advertisement
You shouldn't constantly be starting threads, nor creating large numbers of threads.

You can have just two threads, the main game thread and the background-loading thread.

The main thread sends messages to the background thread, saying it wants to load a model, and the background thread sends messages back once it's loaded. To achieve thread-safety, you just need to make sure that any variables that are used by both threads cannot be used by both threads at the same time (which is what mutexes do for you).

Here's a simple example of two threads communicating via message queues:struct LoadRequest
{
std::string fileName;
Model* target;
};

struct LoadResult
{
Model* target;
void* data;
};

CRITICAL_SECTION loadRequestMutex;
CRITICAL_SECTION loadResultMutex;
std::deque<LoadRequest> loadRequestQueue;//you must lock/unlock loadRequestMutex whenever you use this variable
std::deque<LoadResult> loadResultQueue;//you must lock/unlock loadResultMutex whenever you use this variable

void MainThread_LoadFile(const std::string& fileName, Model* target)
{
LoadRequest message;
message.fileName = fileName;
message.target = target;

// Send a request to the background thread:
EnterCriticalSection(&loadRequestMutex);
loadRequestQueue.push_back(message);
LeaveCriticalSection(&loadRequestMutex);
}

DWORD WINAPI BackgroundThread_Loop( LPVOID lpParam )
{
while(!g_Quit)
{
Sleep(0);
LoadRequest inMessage;
bool receivedMessage = false;

//Get a request from the main thread:
EnterCriticalSection(&loadRequestMutex);
if( !loadRequestQueue.empty() )
{
inMessage = loadRequestQueue.front();
loadRequestQueue.pop_front();
receivedMessage = true;
}
LeaveCriticalSection(&loadRequestMutex);

//if there was a request sent:
if( receivedMessage )
{
//read the file:
FILE* fp = fopen( inMessage.fileName, "rb" );
fseek( fp, 0, SEEK_END );
long size = ftell( fp );
rewind( fp );
char* buffer = new char[Size];
fread( buffer, 1, Size, fp );
fclose( fp );

LoadResult outMessage;
outMessage.target = inMessage.target;
outMessage.data = buffer;

//Send the results back to the main thread:
EnterCriticalSection(&loadResultMutex);
loadResultQueue.push_back(message);
LeaveCriticalSection(&loadResultMutex);
}
}
}

void MainThread_Loop()
{
//Create the mutexes
InitializeCriticalSection(&loadRequestMutex);
InitializeCriticalSection(&loadResultMutex);
//Start the background loop
HANDLE backgroundThread = CreateThread( 0, 0, BackgroundThread_Loop, 0, 0, 0);

while(!g_Quit)
{
//Run the game:

Update(); // might call MainThread_LoadFile

Render();

// Process results from the background thread:
EnterCriticalSection(&loadResultMutex);
while( !loadResultQueue.empty() )
{
LoadResult message = loadResultQueue.front();
loadResultQueue.pop_front();

message.target->DataLoaded( message.data );//give the loaded data to the model object

delete [] message.data;
}
LeaveCriticalSection(&loadResultMutex);
}

//Make sure the background thread is finsihed
WaitForSingleObject(backgroundThread, INFINITE);
//Clean up the mutexes
DeleteCriticalSection(&loadRequestMutex);
DeleteCriticalSection(&loadResultMutex);
}
In windows you want to use Overlapped I/O, it's the most efficient way to load files on another thread.

Here's how I do it. I call sys_read from the main thread with the filename, callback, and user data whenever I want to load a file. Then I call sys_tick from the main thread every frame. This checks for completed transfers and executes the callback.

Pros:
Callbacks are always executed on the same thread as sys_read
No mutexes locks or anything of the sort

Cons
All calls to sys_read and sys_tick must come from the same thread. I don't find this to be a limitation at all.

Header
typedef void (*readhandler_t)(byte *, unsigned, void *);
unsigned sys_read(const char *, readhandler_t, void *user);
void sys_read(void);


Win32 Implementation
typedef struct {
void *user;
HANDLE file;
OVERLAPPED overlapped;
byte *buffer;
readhandler_t callback;
} readop_t;

typedef std::list<readop_t *> ReadOpsList;
static ReadOpsList readops = ReadOpsList();

unsigned sys_read(const char *name, readhandler_t callback, void *user) {
HANDLE file;
DWORD size;
byte *b;
readop_t *o;

file = CreateFile(name, GENERIC_READ, FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN | FILE_FLAG_OVERLAPPED, NULL);
if (file == INVALID_HANDLE_VALUE) {
return 0;
}

size = GetFileSize(file, NULL);
if (size == INVALID_FILE_SIZE) {
return 0;
}

b = (byte *)malloc(size + sizeof(readop_t));
o = (readop_t *)(b + size);
o->user = user;
o->file = file;
o->buffer = b;
o->callback = callback;
o->overlapped.Offset = o->overlapped.OffsetHigh = 0;
o->overlapped.hEvent = NULL;

if (!ReadFileEx(o->file, o->buffer, size, &o->overlapped, NULL)) {
free(o);
return 0;
}

readops.push_back(o);

return size;
}

static bool CompleteReadOp(const readop_t *& o) {
DWORD dwNumberOfBytesTransfered;

if (GetOverlappedResult(o->file, &o->overlapped, &dwNumberOfBytesTransfered, FALSE)) {
CloseHandle(o->file);
o->callback(o->buffer, dwNumberOfBytesTransfered, o->user);
/* readop_t is appended to buffer to save allocations.
either free o->buffer here, or let the user own it */
return true;
}
return false;
};

void sys_tick(void) {
readops.remove_if(CompleteReadOp);
}


Oh, and here's the (untested) standard impelmentation for other platforms until I get around to learning them.
unsigned sys_read(const char *name, readhandler_t callback, void *user) {
FILE *file;
long size;
char *buffer;
unsigned read;

file = fopen(name, "r");
if (!file) {
return 0;
}

fseek(file, 0, SEEK_END);
size = ftell(file);
rewind(file);

buffer = (char *)malloc(sizeof(char)*size);
read = fread(buffer, 1, size, file);
fclose(file);

callback(buffer, size, user);
return read;
}

void sys_tick(void) {
}
Anthony Umfer
"You can have just two threads, the main game thread and the background-loading thread."

A reason to have many threads in the background is to issue many IO requests. The reason to issue many IO requests is to a) saturate the disk bus (because you want this stuff loaded) and not have any dead time and b) to tell the OS about all the things you want to load. Why? So it can then optimise the loads; suppose you want to load 5 files all on different tracks on the disk. If you tell the OS about all of them (by asking for them) then the OS can order the accesses to go across the disk in the optimal seeking pattern; It may make sense to load #3 first, then #2 then #5 and so on. The OS will know more about how to do this than you probably can because it knows more about the storage device.

Windows overlapped IO does this without the overhead of all the threads (each thread requires a non-trivial amount of memory to be allocated).
The GetOverlappedResult function will "check" if the I/O operation is completed right? So Windows is handling the I/O while the main loop is non-blocked? Why are there no conflict if windows is writing to the same memory you are trying to check with GetOverlappedResult?

You shouldn't constantly be starting threads, nor creating large numbers of threads.

You can have just two threads, the main game thread and the background-loading thread.

The main thread sends messages to the background thread, saying it wants to load a model, and the background thread sends messages back once it's loaded. To achieve thread-safety, you just need to make sure that any variables that are used by both threads cannot be used by both threads at the same time (which is what mutexes do for you).

Here's a simple example of two threads communicating via message queues:struct LoadRequest
{
std::string fileName;
Model* target;
};

struct LoadResult
{
Model* target;
void* data;
};

CRITICAL_SECTION loadRequestMutex;
CRITICAL_SECTION loadResultMutex;
std::deque<LoadRequest> loadRequestQueue;//you must lock/unlock loadRequestMutex whenever you use this variable
std::deque<LoadResult> loadResultQueue;//you must lock/unlock loadResultMutex whenever you use this variable

void MainThread_LoadFile(const std::string& fileName, Model* target)
{
LoadRequest message;
message.fileName = fileName;
message.target = target;

// Send a request to the background thread:
EnterCriticalSection(&loadRequestMutex);
loadRequestQueue.push_back(message);
LeaveCriticalSection(&loadRequestMutex);
}

DWORD WINAPI BackgroundThread_Loop( LPVOID lpParam )
{
while(!g_Quit)
{
Sleep(0);
LoadRequest inMessage;
bool receivedMessage = false;

//Get a request from the main thread:
EnterCriticalSection(&loadRequestMutex);
if( !loadRequestQueue.empty() )
{
inMessage = loadRequestQueue.front();
loadRequestQueue.pop_front();
receivedMessage = true;
}
LeaveCriticalSection(&loadRequestMutex);

//if there was a request sent:
if( receivedMessage )
{
//read the file:
FILE* fp = fopen( inMessage.fileName, "rb" );
fseek( fp, 0, SEEK_END );
long size = ftell( fp );
rewind( fp );
char* buffer = new char[Size];
fread( buffer, 1, Size, fp );
fclose( fp );

LoadResult outMessage;
outMessage.target = inMessage.target;
outMessage.data = buffer;

//Send the results back to the main thread:
EnterCriticalSection(&loadResultMutex);
loadResultQueue.push_back(message);
LeaveCriticalSection(&loadResultMutex);
}
}
}

void MainThread_Loop()
{
//Create the mutexes
InitializeCriticalSection(&loadRequestMutex);
InitializeCriticalSection(&loadResultMutex);
//Start the background loop
HANDLE backgroundThread = CreateThread( 0, 0, BackgroundThread_Loop, 0, 0, 0);

while(!g_Quit)
{
//Run the game:

Update(); // might call MainThread_LoadFile

Render();

// Process results from the background thread:
EnterCriticalSection(&loadResultMutex);
while( !loadResultQueue.empty() )
{
LoadResult message = loadResultQueue.front();
loadResultQueue.pop_front();

message.target->DataLoaded( message.data );//give the loaded data to the model object

delete [] message.data;
}
LeaveCriticalSection(&loadResultMutex);
}

//Make sure the background thread is finsihed
WaitForSingleObject(backgroundThread, INFINITE);
//Clean up the mutexes
DeleteCriticalSection(&loadRequestMutex);
DeleteCriticalSection(&loadResultMutex);
}



Thanks for this!

Question: Will the mutex block until it is free? Can this effect framerate significanly if the render(main) thread has to wait for the worker thread?
Also, Sleep(0) will make the app consume 100% CPU (on one core atleast), any other method recomended? (waking the thread). Sleep(1) will limit the thread to 1000 loops per second. Not good if more are needed.
Question: Will the mutex block until it is free?
Yep. The point of the mutex is to force a thread to wait until it's safe to go on.
However, in cases where you don't want it to block (e.g. you've got other work for the thread to do), you could use [font="Courier New"]TryEnterCriticalSection[/font].
Can this effect framerate significanly if the render(main) thread has to wait for the worker thread?[/quote]Yes, excessive use of mutexes is bad for performance.
Take note though that in that example, the mutexes are only locked ("entered") for a very small amount of time. For example, in [font="Courier New"]BackgroundThread_Loop[/font], the mutex is locked in order to retrieve a message from the queue, then it is unlocked again. Then the message is processed outside of the lock (which means the main thread is free to lock the mutex while the message is being processed).
Always try to keep things locked for the shortest amount of time possible.
Also, Sleep(0) will make the app consume 100% CPU (on one core atleast), any other method recomended? (waking the thread). Sleep(1) will limit the thread to 1000 loops per second. Not good if more are needed.[/quote]No, not having a [font="Courier New"]Sleep[/font] call there at all will make the app consume 100% of a CPU core. Adding the [font="Courier New"]Sleep(0)[/font] will allow other applications to steal some CPU time from the game if they need it.

In general though, you'd use semaphores or events to put the worker thread to sleep and have it wake up only when notified by the main thread. This will reduce the CPU usage comlpetely (when not working), but will also make things less responsive (there will be a few ms delay between issuing the notification and the thread waking up).


While this is a decent example of a worker thread, take note of typedef struct's post above about Overlapped I/O -- this is the native Windows way of loading file in the background and will perform much better.
The GetOverlappedResult function will "check" if the I/O operation is completed right? So Windows is handling the I/O while the main loop is non-blocked? Why are there no conflict if windows is writing to the same memory you are trying to check with GetOverlappedResult?
Yes [font="Courier New"]GetOverlappedResult[/font] checks if the file loading is complete. There's no conflict because when you started the file load, windows gave you a 'ticket', and when you're checking if it's done, you're giving that ticket back to Windows. You don't touch the actual memory in the meantime.
All of these functions are documented on the MSDN (which is probably the top Google result when searching for the function names).
No, not having a [font="Courier New"]Sleep[/font] call there at all will make the app consume 100% of a CPU core. Adding the [font="Courier New"]Sleep(0)[/font] will allow other applications to steal some CPU time from the game if they need it.


For XP/2000 it might be of interest that it behaves differently and only allows threads of equal priority to run. If all other threads happen to have lower priority the call returns immediately. But typically you don't fiddle with priorities anyway.

Another option if you know that you block only for a very short time are atomics. I'd consider them as "advanced", however and first get used to basic locking/unlocking. The problem with atomics is that they effectively retry in a busy loop, which is fine on multi core machines, but obviously very useless for single core (since nobody can unlock it until the current thread runs out of time).

Also worth mentioning: making software multithreading safe is about much more than just spamming mutexes left and right. I remember wasting a lot of time trying to explain some people that mixing event based and polling across multiple threads is a stupid idea and begs for bugs (which we had on a regular base). Why? Thread A modifies state and sends event, thread B reacts by polling state. Too bad Thread A already changed the state again in the meantime. Mutexes won't prevent that (unless you block Thread A for a long time until B finally processed the event). The easy solution would have been to send the state along with the event.
f@dzhttp://festini.device-zero.de
What happens to list::remove_if if there are no elements, is it just skipped?

void sys_tick(void) {
if(!readops.size())
readops.remove_if(CompleteReadOp);
}


Is this necessary/faster?

Why do you prefer to use a callback instead of a regular function call? Is this to handle different filetypes in different ways?


Also

free(o);


As I see it, "b" points to the first adress allocated. And "o" is appended to the last adresses. Why not free "b" if (!ReadFileEx....?
Unless you've already considered this, you should first detect how many cores are on the computer. If there's only one your code needs to recognize this and not try to multithread at all. If you don't care about single core machines then ignore this but it's an important factor regardless.
Always strive to be better than yourself.

This topic is closed to new replies.

Advertisement