
#1 addy914   Members   -  Reputation: 158


Posted 04 August 2012 - 10:29 PM

Hello everyone, I am planning on making a game in C++ and I want it to be multi-threaded. The biggest issue with having multiple threads is making everything thread-safe. I know of the many ways to make it safe, but I want to know which will be the BEST and FASTEST for me. My primary question is what's the fastest, but if you want to list off the pros/cons of each one, then hey, I don't mind learning more :P

Basically, I am going to have 4 threads for loading content, one thread for input/output, one thread for rendering, and one thread for networking. I am going to have a Form class to represent the current Form/Screen. This Form class will be the basis of where I need to start worrying about thread-safety. The Form class will have HandlePacket, Draw, and HandleInput functions that will each be called from different threads. The form will have variables that will be accessed from each thread.

I do know that mutexes are slower than Windows EnterCriticalSection/LeaveCriticalSection, since a critical section stays in user mode while a mutex goes through the kernel. But I am not sure if boost's smart pointers are faster than critical sections. I did try out atomics for a bit, but at one point they deadlocked a program for no apparent reason. I also heard of TLS (thread-local storage), which could possibly work.

If I were to use smart pointers, would I make the form a smart pointer, or just the data inside of it? If I were to use TLS, the data won't be synchronized across threads, correct? If I were to use EnterCriticalSection/LeaveCriticalSection, would I have a CRITICAL_SECTION variable for each variable inside the form, or just one for the whole form?

I am going for a balance between memory and performance efficiency. So, I don't want the option that takes a ton of memory but is fast, and I don't want the option that takes no memory but is slow. A good balance would be ideal for me.

I would greatly appreciate a pros/cons list of my options because going from website to website, someone seems to say something different. It would help set me straight on what works and what doesn't. Thank you in advance.


#2 Hodgman   Moderators   -  Reputation: 30384


Posted 04 August 2012 - 10:57 PM

My primary question is what's the fastest

Don't share data between threads ;-P

Basically, I am going to have 4 threads for loading content, one thread for input/output, one thread for rendering, and one thread for networking.

What about on single-core or dual-core CPUs? Thread-pool systems (with a variable number of threads based on the CPU) that execute a flow-based graph of tasks are generally very simple and efficient.
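As a minimal sketch of the flow-based-task idea (the task names are hypothetical, and std::async stands in for a real thread pool): independent tasks run wherever the runtime puts them, and a downstream task runs once its inputs are ready.

#include <future>
#include <string>

// Hypothetical tasks; the names and work are illustrative only.
std::string LoadMesh()    { return "mesh"; }
std::string LoadTexture() { return "texture"; }
std::string BuildModel(const std::string& mesh, const std::string& tex) { return mesh + "+" + tex; }

int main() {
    // LoadMesh and LoadTexture have no dependencies, so they can run on any core.
    auto mesh = std::async(std::launch::async, LoadMesh);
    auto tex  = std::async(std::launch::async, LoadTexture);

    // BuildModel is a downstream node in the graph: it runs once both inputs are ready.
    std::string model = BuildModel(mesh.get(), tex.get());
    (void)model;
}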

Edited by Hodgman, 04 August 2012 - 10:59 PM.


#3 addy914   Members   -  Reputation: 158


Posted 04 August 2012 - 11:22 PM

Haha, I wish I didn't have to share data between the threads, but I do :(

I think I understand what you're saying, and that might just work out. I would still have to synchronize the data, but what you're saying could work better. Come to think of it, I am using OpenGL to render, and I have to make the OpenGL context current on the rendering thread at initialization. It would be somewhat wasteful to set up a context for each thread in the pool.

#4 Tribad   Members   -  Reputation: 854


Posted 05 August 2012 - 02:49 AM

Haha, I wish I didn't have to share data between the threads, but I do :(

I think I understand what you're saying, and that might just work out. I would still have to synchronize the data, but what you're saying could work better. Come to think of it, I am using OpenGL to render, and I have to make the OpenGL context current on the rendering thread at initialization. It would be somewhat wasteful to set up a context for each thread in the pool.

Hodgman is right with his answer.
If data must exist in another thread, make a copy.
I created a messaging system for my own applications to achieve that. In my approach the only thing shared between threads is the message queue. The messages define the information flow in a strict way, so the overall system gains more structure.
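A minimal sketch of such a shared message queue, assuming a simple Message struct and a blocking pop (illustrative names, not Tribad's actual code):

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>

// Hypothetical message type; real messages would carry typed payloads.
struct Message {
    int type;
    std::string payload;
};

class MessageQueue {
public:
    void Push(Message msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        cv_.notify_one();  // wake one waiting consumer
    }

    // Blocks until a message is available.
    Message Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty(); });
        Message msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<Message> queue_;
};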

#5 addy914   Members   -  Reputation: 158


Posted 05 August 2012 - 10:37 AM

I'm not sure it would even be beneficial to have multiple threads if I use a queue system to get and set data. I could make a copy of it, but the data still needs to be synchronized between all threads to be accurate. The whole Form class, and the data inside of it, is going to be accessed by each thread. If I make a copy of all of that, that's about 8x more of the Form class than I need and want :P

I just want one instance of the Form class, and I want it to be safely accessible from all of the threads. I have seen threads about shared_ptr being slow, and I could believe that; I think they overdid it with shared_ptr. EnterCriticalSection is what I'm leaning towards, but I'm not sure if I should have one critical section for the whole form, or one for each variable inside the form. Also, what does Linux/Unix offer for thread safety? I plan on making this project for other operating systems, and if they don't offer something just as fast and simple, then I would be right back to this question.

#6 Tribad   Members   -  Reputation: 854


Posted 05 August 2012 - 10:54 AM

I'm not sure if it would even be beneficial to have multiple threads if I use a queue system to get and set data. I could make a copy of it, but still the data needs to be synchronized between all threads to be accurate.

A single thread should only rely on the data it has available at a specific time. That works in the following situations:
The threads have different functionality and so are not forced to be absolutely in sync.
If multiple threads are working on the same functionality, the data should be split to de-couple/unshare the threads' data.

I just want one instance of the Form class, and I want it to be able to be safely accessed by all of the threads. I have seen threads about shared_ptr being slow, and I could believe that. I think they overdid it with the shared_ptr. EnterCriticalSection is what I'm leaning towards, but I'm not sure if I should have one critical section for the whole form, or for the variables inside the critical section. Also, what does Linux/Unix offer for thread safety? I did plan on making this project for other OS systems, and if they don't offer something just as fast and simple, then I would be right back to this question.

In the uncontended case, EnterCriticalSection/LeaveCriticalSection boil down to a single atomic instruction in user mode. If you use something generic, access to the data is locked at all times; if you do the locking yourself, you can decide for every situation anew whether access must be locked or not.
A Windows mutex is meant for synchronization between multiple processes, e.g. ones that share data through a shared-memory segment, so for threads within one process critical sections are enough.
On Linux/Unix you will usually find pthreads, which gives you equivalent functions.
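A minimal sketch of guarding one shared value with those primitives on each platform (error handling omitted; the variable names are illustrative):

#ifdef _WIN32
#include <windows.h>

CRITICAL_SECTION g_cs;   // user-mode lock; the kernel only gets involved under contention
int g_sharedValue = 0;

void Init()     { InitializeCriticalSection(&g_cs); }
void Shutdown() { DeleteCriticalSection(&g_cs); }

void SetValue(int v) {
    EnterCriticalSection(&g_cs);
    g_sharedValue = v;
    LeaveCriticalSection(&g_cs);
}
#else
#include <pthread.h>

pthread_mutex_t g_mutex = PTHREAD_MUTEX_INITIALIZER;
int g_sharedValue = 0;

void SetValue(int v) {
    pthread_mutex_lock(&g_mutex);
    g_sharedValue = v;
    pthread_mutex_unlock(&g_mutex);
}
#endif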

#7 addy914   Members   -  Reputation: 158


Posted 05 August 2012 - 11:20 AM

If I were to use smart pointers to ensure thread safety, then what would I do for regular variables? Would I just wrap each one in a smart pointer and dereference it when I need to change it? If I wrap a smart pointer around a class and each thread accesses that smart pointer at the same time, is the data inside the class thread-safe? If all the threads call a function through the smart pointer and that function accesses data inside the class, is it thread-safe?

I still don't have legitimate reasons not to use atomics. They seem like a good idea, but I don't know how efficient they are. Reading about atomics, it seems it all depends on your CPU.

#8 ill   Members   -  Reputation: 320


Posted 05 August 2012 - 12:17 PM

Here's a great article that talks about how to make modern multicore game engines. I'm basing my engine on this.

http://software.inte...el-game-engine/

If you look at PhysX and other engines of today, like Rage and Unreal, they talk about doing similar things, just not in as much detail.

I also strongly recommend you use Intel Threading Building Blocks if you're making a PC game, or whatever is available if you're targeting a different platform. I probably wouldn't bother with multithreading a game on a phone just yet though. Maybe give it a few years.

http://threadingbuildingblocks.org/

Edited by ill, 05 August 2012 - 12:19 PM.


#9 addy914   Members   -  Reputation: 158


Posted 05 August 2012 - 06:27 PM

Yeah, I see how that would work well. How much overhead do smart pointers or critical sections have? It seems like a lot of trouble to keep local copies of all the data if the overhead isn't going to be noticeable. Thinking about it theoretically, the time to lock, access the value, and unlock wouldn't be noticeable. The issue is how much memory critical sections take up.

#10 ill   Members   -  Reputation: 320


Posted 05 August 2012 - 07:35 PM

The point of the Intel article was to avoid things like critical sections.
If you have shared data, it's best to pass it around as messages to the things that need it and have them go forth and do their own thing.

You basically have a bunch of systems running on their own and being synchronized by sending each other messages, and it's best to minimize the number of systems and the amount of synchronization that needs to happen, since each adds memory and memcpy overhead.

For example, I could have 50 AI characters that need updating. Every game loop I'd tell the AI guys to go off and run in their own worker threads in parallel (very easy and takes a few lines if you are using Intel TBB or OpenMP, as those manage it all for you). Then at the end of the game loop I'd synchronize all the changes up. Some AI character might be set to observe the position of another AI character so it can make new decisions next update. So I'd have the observed character send a copy of its position to the observer, so no critical sections or synchronization are required mid-update.
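A minimal sketch of that update-then-sync pattern, using plain std::thread instead of TBB/OpenMP (the AICharacter fields and observer wiring here are hypothetical, not ill's actual code): each character updates in parallel against last frame's data, and observed positions are copied across only in a single-threaded sync phase at the end of the frame.

#include <algorithm>
#include <thread>
#include <vector>

struct AICharacter {
    float x = 0, y = 0;                   // current position
    float observedX = 0, observedY = 0;   // last known position of the character it watches
    int   observing = -1;                 // index of the observed character, -1 for none

    void Update() { x += 1.0f; }          // stand-in for real AI/movement logic
};

void UpdateAll(std::vector<AICharacter>& ai, unsigned numThreads) {
    if (numThreads == 0) numThreads = 1;

    // 1. Parallel phase: each thread updates a disjoint slice, so there are no shared writes.
    std::vector<std::thread> workers;
    size_t chunk = (ai.size() + numThreads - 1) / numThreads;
    for (unsigned t = 0; t < numThreads; ++t) {
        size_t begin = t * chunk;
        size_t end = std::min(ai.size(), begin + chunk);
        workers.emplace_back([&ai, begin, end] {
            for (size_t i = begin; i < end; ++i) ai[i].Update();
        });
    }
    for (auto& w : workers) w.join();

    // 2. Sync phase (single-threaded): copy observed positions for next frame's decisions.
    for (auto& c : ai) {
        if (c.observing >= 0) {
            c.observedX = ai[c.observing].x;
            c.observedY = ai[c.observing].y;
        }
    }
}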

I also was reading the Unreal Engine 3 documentation, and each game update is divided into three phases: the pre step, the actual step, and the post step. I'm not sure I 100% agree with their method so far, but they have all actors update in parallel during the middle step. The pre step and the post step update all actors serially.

Edited by ill, 05 August 2012 - 07:38 PM.


#11 addy914   Members   -  Reputation: 158


Posted 05 August 2012 - 09:16 PM

Yeah, I read the article and the idea intrigued me, but I would still like to look at all possible options. The method seems like a 'work-around' to synchronizing data. I am not sure how the efficiency of critical sections compares to the message-queue idea. If using critical sections will cause a major overhead, then yes, I will most likely use your idea because it is the next best thing. I don't think overloading the queue with update messages is ideal, plus it will create (number of threads) * local data bytes while my method just creates 1 * local data bytes. I have 7 threads, which will create 6x more memory than I need. Plus, I will end up overloading the queue with update messages when I could just use a critical section, and at most it will block the thread for 1ms. I will probably put a timeout value on the critical section so it won't block the thread for more than a few seconds. As long as I don't enter the critical section until I absolutely need to write/read a value, the read/write will seem almost instantaneous.

I am going to do some tests with Windows critical sections to see their performance. It seems the simplest, since I get to decide when to lock and unlock. I have seen performance problems with shared_ptr when it's not used correctly, and I don't want to take that chance. Atomics have caused me problems in the past, and I would prefer not having to deal with them.

If anyone has reasons not to use critical sections, then I would appreciate information on why. Picking how to make a program thread-safe is a really tough decision in life :P

#12 ill   Members   -  Reputation: 320


Posted 05 August 2012 - 10:01 PM

I implemented my own version of this and it seems to run very well so far.

I had 2 threads. One is the main game control thread with Physics, AI, etc... The other thread is the graphics.

The game thread ran at 60 Hz and the graphics thread ran as fast as it could so the FPS would be as smooth as the game can handle.

I had some simple entities in my game world that would just update their position every game step in the game thread. The graphics thread had corresponding objects for each of those objects.

So I tried it with different numbers of these objects and had AMAZING results once I got most of the bugs out. I had a release build smoothly running 50,000!!!! of these objects. This was basically 50,000 observed property updates and messages being passed about 60 times a second on an i5. I was also sending ordered messages (Messages that must be processed in the same order as created) to create and destroy these entities every 2 seconds.

Later I also added PhysX in so I had that running at the same time as my simple objects.

I think this is definitely the way to go and seems to be the way modern game engines like Unreal, Rage, Frostbite, etc. are programmed. I've put in quite a lot of work and research myself over the last few months. The reason is you end up running all of your game's entities massively in parallel instead of one at a time, meaning it's possible for each individual AI character to make use of the cores in your PC.

Here's a google doc showing some diagrams that helped me finalize the design. It's not 100% how it turned out in the end because I improved on some ideas as I went but it's pretty close. https://docs.google....hKfpn0XRws/edit

Also here's the source forge repo if you want to see some of the code. It'd be a lot of work for you to get it compiled and all that but you might want to look in here for some ideas. The threading code itself is under illEngine/util/communication. https://sourceforge....ects/illengine/

Edited by ill, 05 August 2012 - 10:20 PM.


#13 Hodgman   Moderators   -  Reputation: 30384


Posted 05 August 2012 - 10:45 PM

Shared-memory style concurrency requires very careful synchronisation, and is therefore error prone and potentially very inefficient. The cost of locking/unlocking a mutex (the computer science concept, aka "critical section", not the windows object of the same name) is a red herring. The fact that you've got multiple threads contending for the usage of the same cache-lines of RAM is going to cause your processing cores to wait for each other at a hardware level. You should avoid any design that causes cache contention.
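As a small illustration of the cache-line point (not from the post above, and the 64-byte line size is just the typical value): even with no locks at all, two threads writing counters that happen to sit on the same cache line will stall each other, which is why per-thread data is usually padded onto separate lines.

#include <atomic>
#include <thread>

// Two counters packed together: they typically share a cache line, so two threads
// incrementing them ping-pong that line between cores (false sharing).
struct Packed {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Padding each counter to its own 64-byte line (assumed cache-line size) removes the contention.
struct Padded {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename Counters>
void HammerBoth(Counters& c) {
    std::thread t1([&] { for (int i = 0; i < 10000000; ++i) c.a.fetch_add(1); });
    std::thread t2([&] { for (int i = 0; i < 10000000; ++i) c.b.fetch_add(1); });
    t1.join();
    t2.join();
}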

Message-passing style concurrency is the right default. You don't have any mutable data shared between threads at any time, so you don't need any explicit synchronisation. You instead chain together sequences of tasks which communicate via messages, and let the underlying messaging framework handle the implementation issues (which internally requires shared-memory). This is not only a high-performance design when implemented properly (it's the standard approach used in HPC systems like MPI) but is much less error prone for the programmer (deadlocks, race-conditions? nope). It's also based on well-defined branches of mathematics and can therefore be reasoned about within well defined logical frameworks, unlike shared-memory programming, which only allows for ad-hoc reasoning systems.

Edited by Hodgman, 05 August 2012 - 10:53 PM.


#14 Mike Bossy   Members   -  Reputation: 662


Posted 06 August 2012 - 09:02 PM

Another vote for message passing. It's the only sane way to do things unless you're an absolute concurrency guru. I've done that on my last few projects and absolutely love the results. Debugging is a lot easier :)

One other thing to keep in mind is that some things cannot be shared across threads. Depending on the libraries you are using some resources require being used on the same thread that initialized them.

#15 addy914   Members   -  Reputation: 158


Posted 06 August 2012 - 11:39 PM

I see what you guys mean now. All the other synchronization methods suck :P
I am trying to imagine how this will fit in my game, and a few issues come to mind. I draw my own textboxes, so I need a copy of the string there, and I also need a copy in the input thread because when a button is clicked I need that data from the textbox. I am also wondering about characters: I will need a copy of the characters to be drawn, but I will also need a copy in the network thread when walk packets and such are sent. This all seems a bit messy.

#16 Hodgman   Moderators   -  Reputation: 30384


Posted 07 August 2012 - 12:38 AM

in the input thread

Don't think in terms of "threads". Use a framework that abstracts away threads.
Instead of having the X thread, Y thread and Z thread, it's much more efficient to run the X tasks across all threads, then the Y tasks across all threads, then the Z tasks across all threads. This makes 100% use out of a single-core machine, or a 24-core machine, whereas the "XYZ thread" system is hard-coded to best perform on a 3-core machine (and even then, certain cores will be much more over-worked than others).

Remember, in game engines, the point of using multiple threads is to take advantage of multiple CPU cores. If an operation isn't computationally complex enough to max out a single core, it doesn't need to be burdened with the complexity of multi-threading. Something like "collecting user input" definitely won't be complex enough to warrant having an entire thread dedicated to that single task (unless your user input device is a camera maybe, like the Kinect)!

Check out the "effective concurrency" series. The first one is here, and the last one has an index of them all.

Edited by Hodgman, 07 August 2012 - 12:51 AM.


#17 Cygon   Crossbones+   -  Reputation: 1091


Posted 07 August 2012 - 02:21 AM

  • Windows CriticalSections and Linux futexes are usually the best option you have. If contention is low, they will burn a minimal number of CPU cycles: if no other thread is in the CriticalSection/futex, entering costs about as much as a function call. Under contention a thread will first do a busy loop, repeatedly checking whether the CriticalSection/futex can be entered, and only then will it be put to sleep (which is by comparison extremely expensive, since the thread has to wait for the scheduler to allocate it another time slice when the CriticalSection/futex becomes free again).

I would recommend using std::mutex if you can (Visual Studio 2012 RC, GCC 4.6+). It's portable and provides its own RAII scopes (std::lock_guard).

  • There's also the possibility of writing lock-free data structures. These are mostly based on an operation called "compare and exchange" which all CPUs in the last 10 years have supported as an atomic operation (one that can't be preempted by another thread in the middle). There is a golden window of contention where lock-free data structures are much faster than CriticalSections/futexes - they're slightly slower at zero contention and tend to completely mess up under very high contention. They're also incredibly difficult to write even after years of experience with instruction reordering, cache lines and compiler behaviors. And they're a patent minefield.
  • Thread-local storage is the equivalent of copying your data to each thread. You seem to have one variable, but each thread reading or writing it is in fact reading a variable of its own. Sometimes useful, often confusing.
  • Smart pointers cannot help you with threading in any way. They cannot do any synchronization, simply because when you call a method on the object the smart pointer is referencing, the smart pointer's member access operator (operator->) is invoked to return the address of the object. It could enter a CriticalSection/futex there, but there's no place where it could leave it again.

If a smart pointer is thread-safe, that means it won't blow up if you, for example, copy it while it's being destroyed (a normal smart pointer would, for example, grab the wrapped pointer, then increment the reference count - which might just have been decremented to zero between the two operations by another thread that is now destroying the object behind the wrapped pointer). Hint: boost::shared_ptr and std::shared_ptr are not thread-safe. Boost's page on shared_ptr thread safety makes it sound a bit as if they were, but it is only saying that any number of threads can read the shared_ptr (dereference it) - which holds true for any C++ object - while a write (assignment, reference count adjustment) must never happen at the same time.



If you have the chance to use C++11, check out std::future to parallelize tasks in a simple way. Boost, Win32 and WinRT also offer thread pools (here's a small code snippet using the Win32 thread pool API: WindowsThreadPool.cpp), which are great if you can partition work into equal chunks (number of chunks = std::thread::hardware_concurrency ideally). Depending on the specific thread pool implementation, you can even do blocking tasks in those threads (the Windows thread pool will vastly overcommit the CPU based on some heuristics, and it has a flag through which you can hint that you plan to block one of its threads for a longer time).
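A minimal sketch of the std::future approach mentioned above, splitting hypothetical work into std::thread::hardware_concurrency chunks with std::async (the summing task is illustrative only):

#include <algorithm>
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical work item: sum a slice of a large vector.
static long long SumRange(const std::vector<int>& data, size_t begin, size_t end) {
    return std::accumulate(data.begin() + begin, data.begin() + end, 0LL);
}

long long ParallelSum(const std::vector<int>& data) {
    unsigned chunks = std::max(1u, std::thread::hardware_concurrency());
    size_t chunkSize = data.size() / chunks;
    std::vector<std::future<long long>> futures;
    for (unsigned i = 0; i < chunks; ++i) {
        size_t begin = i * chunkSize;
        size_t end = (i + 1 == chunks) ? data.size() : begin + chunkSize;
        // std::launch::async forces a real thread instead of deferred execution.
        futures.push_back(std::async(std::launch::async, SumRange, std::cref(data), begin, end));
    }
    long long total = 0;
    for (auto& f : futures) total += f.get();  // blocks until each chunk is done
    return total;
}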

Edited by Cygon, 07 August 2012 - 02:23 AM.

Professional C++ and .NET developer trying to break into indie game development.
Follow my progress: http://blog.nuclex-games.com/ or Twitter - Topics: Ogre3D, Blender, game architecture tips & code snippets.

#18 BeerNutts   Crossbones+   -  Reputation: 2944


Posted 07 August 2012 - 08:01 AM

(Since no-one else has asked yet, I'll be "that guy").

Do you need multiple threads? Is your game going to be so processor intensive that it would cause slow-downs if you have multiple threads? Or do you just want to use multiple threads for fun?

A vast majority of hobbyist games don't need multiple threads (I'd guess only ~5% do, if that), and adding them just makes for more work and more problems. And when you have a bug caused by a race condition or deadlock, you're going to have a helluva time debugging it.

I just wanted to make sure you were aware of this, and that you truly need to use threads.
My Gamedev Journal: 2D Game Making, the Easy Way

---(Old Blog, still has good info): 2dGameMaking
-----
"No one ever posts on that message board; it's too crowded." - Yoga Berra (sorta)

#19 addy914   Members   -  Reputation: 158


Posted 07 August 2012 - 11:04 AM

in the input thread

Don't think in terms of "threads". Use a framework that abstracts away threads.
Instead of having the X thread, Y thread and Z thread, it's much more efficient to run the X tasks across all threads, then the Y tasks across all threads, then the Z tasks across all threads. This makes 100% use out of a single-core machine, or a 24-core machine, whereas the "XYZ thread" system is hard-coded to best perform on a 3-core machine (and even then, certain cores will be much more over-worked than others).
Remember, in game engines, the point of using multiple threads is to take advantage of multiple CPU cores. If an operation isn't computationally complex enough to max out a single core, it doesn't need to be burdened with the complexity of multi-threading. Something like "collecting user input" definitely won't be complex enough to warrant having an entire thread dedicated to that single task (unless your user input device is a camera maybe, like the Kinect)!
Check out the "effective concurrency" series. The first one is here, and the last one has an index of them all.

So, you're thinking more like a thread pool with x threads that all take from a thread-safe queue and perform a task when one enters the queue. That does sound like it'd work a lot better, but I am using OpenGL, which requires that the context be current in the thread. I could make a new context for each thread and set it there. I also have the question: how am I going to have variables per thread, if the threads will be running virtually any type of task?

Windows CriticalSections and Linux futexes are usually the best option you have. If contention is low, they will burn a minimal number of CPU cycles: if no other thread is in the CriticalSection/futex, entering costs about as much as a function call. Under contention a thread will first do a busy loop, repeatedly checking whether the CriticalSection/futex can be entered, and only then will it be put to sleep (which is by comparison extremely expensive, since the thread has to wait for the scheduler to allocate it another time slice when the CriticalSection/futex becomes free again).
I would recommend using std::mutex if you can (Visual Studio 2012 RC, GCC 4.6+). It's portable and provides its own RAII scopes (std::lock_guard).
There's also the possibility of writing lock-free data structures. These are mostly based on an operation called "compare and exchange" which all CPUs in the last 10 years have supported as an atomic operation (one that can't be preempted by another thread in the middle). There is a golden window of contention where lock-free data structures are much faster than CriticalSections/futexes - they're slightly slower at zero contention and tend to completely mess up under very high contention. They're also incredibly difficult to write even after years of experience with instruction reordering, cache lines and compiler behaviors. And they're a patent minefield.
Thread-local storage is the equivalent of copying your data to each thread. You seem to have one variable, but each thread reading or writing it is in fact reading a variable of its own. Sometimes useful, often confusing.
Smart pointers cannot help you with threading in any way. They cannot do any synchronization, simply because when you call a method on the object the smart pointer is referencing, the smart pointer's member access operator (operator->) is invoked to return the address of the object. It could enter a CriticalSection/futex there, but there's no place where it could leave it again.
If a smart pointer is thread-safe, that means it won't blow up if you, for example, copy it while it's being destroyed (a normal smart pointer would, for example, grab the wrapped pointer, then increment the reference count - which might just have been decremented to zero between the two operations by another thread that is now destroying the object behind the wrapped pointer). Hint: boost::shared_ptr and std::shared_ptr are not thread-safe. Boost's page on shared_ptr thread safety makes it sound a bit as if they were, but it is only saying that any number of threads can read the shared_ptr (dereference it) - which holds true for any C++ object - while a write (assignment, reference count adjustment) must never happen at the same time.
If you have the chance to use C++11, check out std::future to parallelize tasks in a simple way. Boost, Win32 and WinRT also offer thread pools (here's a small code snippet using the Win32 thread pool API: WindowsThreadPool.cpp), which are great if you can partition work into equal chunks (number of chunks = std::thread::hardware_concurrency ideally). Depending on the specific thread pool implementation, you can even do blocking tasks in those threads (the Windows thread pool will vastly overcommit the CPU based on some heuristics, and it has a flag through which you can hint that you plan to block one of its threads for a longer time).

I originally was going to use EnterCriticalSection, but I don't like that 'if contention is low'. Contention would most likely be high in my application. Yeah, a lot of systems do have atomic operations for different variable types, but it all depends upon the system. I am checking out std::future and std::promise now; they do sound interesting.

(Since no-one else has asked yet, I'll be "that guy").

Do you need multiple threads? Is your game going to be so processor intensive that it would cause slow-downs if you have multiple threads? Or do you just want to use multiple threads for fun?

A vast majority of hobbyist games don't need multiple-threads (I'd guess only ~5%, if that), and adding them just makes for more work and more problems. And, when you have a bug caused by race-condition/deadlock, you're going to have a helluva time debugging it.

I just wanted to make sure you were aware of this, and that you truly need to use threads.


Yeah, I could probably do without threads, but I want to have them. If I use the message-passing style of thread safety, then I won't have to worry about those issues, since each thread will be accessing its own local set of variables. I don't mind doing the extra work/research to implement multiple threads. Once I do get a working system of thread safety, I will be able to use it in most of my applications. I will also know when it is okay or not okay to use EnterCriticalSection and other such alternatives for thread safety.

#20 Hodgman   Moderators   -  Reputation: 30384


Posted 07 August 2012 - 08:51 PM

So, you're thinking more like a thread pool with x amount of threads that all take from a thread-safe queue and will perform a task when one enters the queue.

Yeah, pretty much.
Though personally, I prefer not to use a multiple-producer/multiple-consumer queue as the central point, because these are hard to implement, and it's bound to have a lot of contention if your tasks are small. If possible I prefer to schedule a whole frame's worth of tasks in advance, so that the worker threads know what they're doing for the next frame up-front and don't have to keep popping items out of a queue. These are the kinds of details that a thread-pool framework should implement for you though ;)

I am using OpenGL which requires that the context be set in the thread.

Tell the thread-pool that your OpenGL tasks are constrained to thread #0 and don't worry about it. OpenGL calls for rendering should only be a small percentage of your frame time (so they don't need to be split across several cores), and a queue-based thread-pool will automatically re-balance by doing more non-OpenGL work on the other cores.
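A minimal sketch of that kind of constraint, assuming a hypothetical two-queue design rather than any real framework: worker #0 owns the GL context and drains both its private queue and the shared one, while the other workers only take shared tasks.

#include <atomic>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>

using Task = std::function<void()>;

// Tiny lock-protected queue with a non-blocking pop, so worker #0 can poll two queues.
class TaskQueue {
public:
    void Push(Task t) {
        std::lock_guard<std::mutex> lock(m_);
        q_.push(std::move(t));
    }
    bool TryPop(Task& out) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::queue<Task> q_;
};

TaskQueue g_glQueue;      // tasks that must run on the thread owning the GL context
TaskQueue g_sharedQueue;  // everything else

void WorkerLoop(bool ownsGlContext, const std::atomic<bool>& running) {
    Task task;
    while (running.load()) {
        // Worker #0 prefers GL tasks; all workers fall back to the shared queue.
        if ((ownsGlContext && g_glQueue.TryPop(task)) || g_sharedQueue.TryPop(task))
            task();
        else
            std::this_thread::yield();
    }
}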

I also have the question, how am I going to have variables per thread, if the threads will be running virtually any time of task?

I'm not quite sure what you mean. "Tasks" usually include all the variables that are required to perform the task - as long as only one thread is performing a task, its variables belong to that thread.

Edited by Hodgman, 07 August 2012 - 09:02 PM.




