Multithreading: locks in World entities

Started by pachanga
6 comments, last by Hodgman 14 years, 5 months ago
(a small preamble first: I started a related topic in Multiplayer and Network Programming, but this seems a more appropriate place for such discussions, so I'm posting a more general version here as well)

Guys, I'm developing a multiplayer game application in C++ and am currently choosing an appropriate multithreading architecture for it. The core of the application is an endless loop which updates all entities of the game World every frame. Currently this World loop is single-threaded. It's working just fine, but I'd really like to make it scale better on multicore machines.

All World entities exist in Locations, as follows:

- Location
  - WorldEntity
  - WorldEntity
  - ...
- Location
  - WorldEntity
...

so I was thinking about running each Location (and its update logic) in a separate thread. This means I need to synchronize access to the World entities properly, and that is exactly what I don't want to do: I believe explicit locking in domain class methods is wrong, and it makes development, maintenance and debugging much, much harder.

At first I was thinking about isolating a Location's entities from entities in other Locations by forbidding any calls between them. What are possible ways to achieve this? Store each Location's entities in thread-local storage so that they are not accessible from outside? Or maybe use a process per Location instead of a thread? (but that would complicate everything a lot).

However, even if Location entities are nicely isolated, there is another problem: persistence. I already have a simple generic persistence service which runs in a separate thread. It can be used in async mode: it accepts an object to be saved and returns a special future object which can be used to track the persistence process. I would love to use this service, but since it runs in a separate thread I again need to synchronize access to the domain classes properly.

One possible option would be to implement proper cloning of domain objects, so that the persistence service would accept a copy of the object to be saved and no explicit locking would be needed...

Hence the question: is all of the above worth it? Or should I simply add explicit synchronization logic to all domain classes and be done with it? Or maybe there is some better option I'm not aware of? Thanks in advance.

[Edited by - pachanga on November 8, 2009 3:13:31 PM]
Quote:Original post by pachanga
At first I was thinking about isolating Location entities from entities in different Locations by forbidding any calls between them.
This solution sounds ok -- your game will scale to N cores, where N is the number of locations being processed.
If you have 4 locations, your game will scale up to a quad-core CPU (but won't scale any further than that).
Quote:What are possible ways to achieve this?
Simply don't create any links (pointers, references, etc.) between entities in different locations!

You'll also need a thread safe messaging system if locations need to be able to communicate with each other. You can use a lock-free queue to pass messages from one thread to another safely.
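As a sketch of what such a cross-location messaging system could look like (a minimal mutex-based mailbox; `Message`, `Mailbox`, `Push` and `Drain` are illustrative names, not from any real system -- a lock-free queue would replace the mutex but the interface, push from any thread and drain on the owning thread, stays the same):

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical message type; the fields are placeholders.
struct Message {
    int targetEntityId;
    std::string payload;
};

// A minimal thread-safe mailbox for messages sent between locations.
class Mailbox {
public:
    // Called from any location's thread.
    void Push(const Message& m) {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(m);
    }

    // Drains all pending messages at once; called once per frame
    // by the location that owns this mailbox.
    std::queue<Message> Drain() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::queue<Message> out;
        std::swap(out, queue_);
        return out;
    }

private:
    std::mutex mutex_;
    std::queue<Message> queue_;
};
```

Each location drains its mailbox at a known point in its update, so entities never see messages arrive mid-update.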

You don't need to go putting each location in its own process; just use discipline (and add debugging code to check you haven't made any mistakes).

You could have debugging code like:
//Thread local global variable to tell which location is being processed on this thread:
threadlocal int g_CurrentLocation;

void Location::Update()
{
  g_CurrentLocation = this->ID;
  for each this->entities as e
    e->Update();
  g_CurrentLocation = 0;
}

void Entity::Update()
{
  //make sure this entity is being updated by the correct thread
  assert( g_CurrentLocation == parentLocation->ID );
  //...
}

Entity* Location::GetEntityByName( const std::string& name )
{
  //if someone is trying to get a link to one of this location's entities,
  //make sure they are also being processed by this location
  assert( g_CurrentLocation == this->ID );
  return this->entities.find( name );
}
Quote:Or maybe I should simply add explicit synchronising logic into all domain classes and be done with it?
If you choose this option, I'd recommend using lock-hierarchies to ensure you don't get any deadlocks.
Quote:Or maybe there is some better option I'm not aware of?
There's no "right" way to do concurrency... but IMO, mutex-based concurrency is outdated, and will become ever more outdated with more and more cores becoming available to us. Everything should be based on "do work, memory fence, publish work". If possible you should wait for an existing memory fence rather than adding a new one for each piece of work.
Some links:
http://www.insomniacgames.com/research_dev/articles/2009/1500943
http://www.ddj.com/hpc-high-performance-computing/210604448

In my system, every game entity can potentially be running on a separate thread without any locks/mutexes.
I use the message passing / actor model of concurrency, where all function calls between entities are converted into "messages", which are executed at a safe point in time (not immediately). Return values from messages are returned as futures.
The system runs in cycles (each cycle is a memory fence). Each cycle all entities with pending messages execute their messages (each entity in parallel). Processing a message may generate new messages, which are added to a wait-free queue. At the end of the cycle, this queue is divided up among the worker threads and a new cycle begins.
If there is no work to do in a cycle (the queue is empty), then that means it's the end of the frame, and I increment the frame counter / time variables.
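The cycle loop described above could be sketched single-threaded like this (`CycleRunner`, `Enqueue` and `RunFrame` are invented names for illustration; the real system executes each cycle's messages across worker threads and places a memory fence at the cycle boundary):

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Single-threaded sketch of the cycle loop. Each "message" is just a
// callable here; executing it may enqueue more messages, which only run
// in the NEXT cycle -- that boundary is where a real implementation
// would fence and redistribute work among worker threads.
class CycleRunner {
public:
    void Enqueue(std::function<void()> msg) {
        pending_.push_back(std::move(msg));
    }

    // Runs cycles until no messages remain (i.e. the end of the frame).
    // Returns the number of cycles executed.
    int RunFrame() {
        int cycles = 0;
        while (!pending_.empty()) {
            // Messages generated during this cycle wait until the next one.
            std::vector<std::function<void()>> current;
            std::swap(current, pending_);
            for (auto& msg : current)
                msg();  // executed in parallel in the real system
            ++cycles;
        }
        return cycles;
    }

private:
    std::vector<std::function<void()>> pending_;
};
```

An empty queue at the start of a cycle means the frame is done, matching the description above.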

[Edited by - Hodgman on November 8, 2009 9:56:14 PM]
Quote:Original post by Hodgman
Some links:
http://www.insomniacgames.com/research_dev/articles/2009/1500943
http://www.ddj.com/hpc-high-performance-computing/210604448


Thanks, I'll have a look

Quote:
In my system, every game entity can potentially be running on a separate thread without any locks/mutexes.
I use the message passing / actor model of concurrency, where all function calls between entities are converted into "messages", which are executed at a safe point in time (not immediately). Return values from messages are returned as futures (http://en.wikipedia.org/wiki/Futures_and_promises).


Yep, the idea with futures is great; boost::future by Anthony Williams is a very nice implementation of this idea. BTW, I wonder what the messages actually are in your system? Are they just functors which are queued into a mailbox of the recipient? E.g. something like this:

PlayerPtr player = getPlayer(...);
SomeFuture future = player->queue(boost::bind(&Player::doSomeThing, player, arg1, arg2));


Quote:
The system runs in cycles (each cycle is a memory fence). Each cycle all entities with pending messages execute their messages (each entity in parallel). Processing a message may generate new messages, which are added to a wait-free queue. At the end of the cycle, this queue is divided up among the worker threads and a new cycle begins.
If there is no work to do in a cycle (the queue is empty), then that means it's the end of the frame, and I increment the frame counter / time variables.


Sounds interesting, thanks for sharing.

It's also interesting to know whether your objects are safely cloneable/copyable.
Quote:Original post by pachanga
Yep, the idea with futures is great; boost::future by Anthony Williams is a very nice implementation of this idea. BTW, I wonder what the messages actually are in your system? Are they just functors which are queued into a mailbox of the recipient? E.g. something like this:
PlayerPtr player = getPlayer(...);
SomeFuture future = player->queue(boost::bind(&Player::doSomeThing, player, arg1, arg2));
Yeah that's pretty much it (store the function address, this-pointer and arguments in a struct and put it in a queue), but I use some macros of my own instead of boost::bind, so the syntax I use is:
class Foo
{
public:
  Method2( int, doSomething, int, arg1, int, arg2 );
};

int Foo::doSomething_( int arg1, int arg2 )
{
  return arg1 + arg2;
}

Foo* pFoo = ...;
TFuture<int> four = pFoo->doSomething( 2, 2 );
An important consideration here is the memory allocator used to manage these structs/queues. My current implementation is over 100 times faster than my first proof-of-concept because I use thread-local temporary memory stores (most structs are only needed for a single cycle, simplifying allocation routines a lot) whereas my first attempt just used the default new/delete.

I haven't actually looked at boost::future, but one thing that my macros do is accept 'future' arguments as well as regular values, so you can write:
pFoo->doSomething( 2, pFoo->doSomething(1,1) );
The macro detects at compile time whether any of the arguments are futures and knows how many cycles it has to delay execution of the doSomething_ function.
With no futures passed in, there is a 1 cycle delay. With a plain future passed in there is a 2 cycle delay.
In the above case, the return value is a TFuture<int,2>, meaning the result won't be known for 2 cycles. If this value was passed to a 'message function' the macros would know to delay execution for 3 cycles, etc...

Quote:It's also interesting to know whether your objects are safely cloneable/copyable.
I haven't needed this yet, but you could return a copy as a future.

As well as futures, objects can pass their own address to a function, e.g.
class Foo
{
public:
  Method1( void, SendMeAClone, CloneReceiver*, arg1 );
};

void Foo::SendMeAClone_( CloneReceiver* pRec )
{
  Foo temp = *this;
  pRec->HereIsYourClone( temp );
}
Thank you for the reply. As with every solution there are always pros and cons, could you please name any drawbacks?
Well for one I have to use my ugly macros, e.g. this:
  Method2( int, doSomething, int, arg1, int, arg2 );

instead of:
  int doSomething( int arg1, int  arg2 );


Secondly, as I said, you've got to be careful how you handle the memory allocation for the "queued function objects" (the struct that holds the args + function pointer), as you'll be doing one of these allocations per function call.
By default that will be a ridiculous amount of overhead. You need a low-overhead constant-time allocator (and usually optimising memory allocation *speed* means that you waste memory *space*).
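One way such a low-overhead allocator could look is a per-thread "bump" arena (an assumption about the general technique, not the actual implementation described above; `CycleArena`, `Allocate` and `Reset` are invented names). Allocation is just a pointer increment, and everything is "freed" at once at the end of the cycle, trading wasted space for speed exactly as described:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a per-cycle bump allocator for queued message structs.
// No per-object bookkeeping: allocation bumps an offset, and Reset()
// reclaims the whole arena in O(1) at the end of the cycle.
class CycleArena {
public:
    explicit CycleArena(std::size_t capacity)
        : buffer_(capacity), offset_(0) {}

    void* Allocate(std::size_t size) {
        // Round up to 16 bytes to keep allocations aligned
        // (speed bought at the cost of wasted space).
        const std::size_t aligned = (size + 15) & ~std::size_t(15);
        if (offset_ + aligned > buffer_.size())
            return nullptr;  // arena exhausted for this cycle
        void* p = buffer_.data() + offset_;
        offset_ += aligned;
        return p;
    }

    // Called at the cycle boundary: frees everything at once.
    void Reset() { offset_ = 0; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t offset_;
};
```

Keeping one arena per worker thread also sidesteps allocator contention, since no thread ever allocates from another thread's store.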

Also, the queue itself that you put the data into should be based on a wait-free algorithm, and care must be taken to avoid false sharing.

Third, using message passing for everything is a *general purpose* solution to concurrency. If you study a particular problem, you might be able to come up with a *specific* solution that is better for that single case.

Lastly, you've got to keep in mind that parallel logic doesn't always happen in the same order. If you've got a "game replay" function, it might break because the threading system could execute messages in a different order each time you run it.

e.g. If a player is on 10 health and gets 3 messages "at the same time" (i.e. during a single cycle):
TakeDamage(5)
TakeDamage(5)
GiveHealth(5)

If these messages are executed in the above order, the player's health will drop to 0 and they'll die.
But if they were executed in the order below, the player would survive:
TakeDamage(5)
GiveHealth(5)
TakeDamage(5)

I get around this by assigning each message a "priority" -- so when writing the game logic, I might give all "GiveHealth" messages higher priority than "TakeDamage" messages.
Each cycle, every entity sorts their current message queue by priority before executing them, in order to stop these "random threading bugs" from occurring.
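The priority trick can be sketched like this (the priorities, `PendingMessage` struct and `ApplyMessages` helper are illustrative, not the original system's code): sorting each entity's per-cycle messages by priority makes the outcome independent of the nondeterministic order in which threads delivered them.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

struct PendingMessage {
    int priority;                  // higher priority runs first
    std::function<void(int&)> fn;  // operates on the entity's health
};

// Sort this cycle's messages by priority, then execute them in order.
int ApplyMessages(int health, std::vector<PendingMessage> messages) {
    // stable_sort keeps same-priority messages in arrival order.
    std::stable_sort(messages.begin(), messages.end(),
                     [](const PendingMessage& a, const PendingMessage& b) {
                         return a.priority > b.priority;
                     });
    for (auto& m : messages)
        m.fn(health);
    return health;
}
```

With GiveHealth at a higher priority than TakeDamage, the player on 10 health always ends the cycle on 5, no matter which order the three messages arrived in.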


What I like about it though is that it just makes things automatically concurrent without having to worry much about it (besides the priority thing!). You can be a user of a message passing system without being a threading expert.
There are no deadlocks, and as long as you assign the right priorities, there are no race conditions either, and you never have to use a mutex (in the game-logic code).
Thank you very much for your time and detailed explanations. BTW, what do you use for memory fences implementation?
On windows, I use the Interlocked family of functions.

On any x86 platform though (Windows/Linux/etc) you can use the LOCK prefix with the XCHG/CMPXCHG/etc instructions. The Windows interlocked functions compile down to these instructions.


Windows also provides _ReadWriteBarrier (and _ReadBarrier/_WriteBarrier), which on x86 are equivalent to calling any Interlocked function on a temporary variable.

On other platforms these functions may be more efficient -- x86 only has "full barriers" (i.e. _ReadBarrier/_WriteBarrier will do the exact same thing as _ReadWriteBarrier on x86), but other CPUs support "partial barriers", making _ReadBarrier/_WriteBarrier more efficient.
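The "do work, memory fence, publish work" pattern can be sketched with C++11 atomics (which post-date this thread; at the time you would have used InterlockedExchange / _ReadWriteBarrier on Windows or LOCK-prefixed instructions on x86 -- the release/acquire pair below maps to those same barriers there):

```cpp
#include <atomic>
#include <cassert>
#include <thread>

int g_payload = 0;                       // the "work"
std::atomic<bool> g_published(false);    // the publish flag

void Producer() {
    g_payload = 42;                      // do work
    // release store: fences the write above, then publishes
    g_published.store(true, std::memory_order_release);
}

int Consumer() {
    // acquire load: once the flag is seen, the payload write is visible too
    while (!g_published.load(std::memory_order_acquire))
        std::this_thread::yield();       // wait for the publish
    return g_payload;                    // guaranteed to read 42
}
```

The point of the cycle-based design above is that one fence per cycle covers every message processed in that cycle, instead of paying for a barrier on each individual piece of work.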

