
Multicore design... harder than this, somehow?



I think the biggest problem is that entity updates are rarely independent of other entities. If entity A needs to know the position of entity B, how do you handle that without either a) creating a read-only copy of entity B for each update, or b) using some sort of locking to ensure that thread 1 does not update entity B's position while thread 2 is trying to read it?

You could possibly "batch" updates so that if entity A "knows" it's going to need information about entity B, the two get queued to the same worker thread, but then you've just added more problems: how do you know in advance that entity A will need access to entity B's state? Besides, it is rare that entities can be batched like that and still be separated out efficiently (e.g. usually entity A needs entity B's state, entity B needs entity C's state, and so on).

Also, you mention a "cleanup" thread which iterates over the "done" list. How do you protect the "done" list while the worker thread is writing to it and the "cleanup" thread is trying to read from it at the same time?

In the same vein, how does the game add things to the worker's "work" queues at the same time as the workers are reading from them?
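One standard answer to that last question (a general technique, not the thread's actual design, which instead separates the phases in time): if exactly one thread pushes to a queue and exactly one pops from it, a single-producer/single-consumer ring buffer needs only two atomic indices and no mutex. A sketch under those assumptions:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: a single-producer / single-consumer ring buffer.
// With one thread pushing (the game) and one popping (the worker), two
// atomic indices are enough -- no mutex required.
template <typename T, size_t N>
class SpscQueue {
    T buf[N];
    std::atomic<size_t> head{0}; // advanced only by the consumer
    std::atomic<size_t> tail{0}; // advanced only by the producer
public:
    bool Push(const T& v) {
        size_t t = tail.load(std::memory_order_relaxed);
        size_t next = (t + 1) % N;
        if (next == head.load(std::memory_order_acquire)) return false; // full
        buf[t] = v;
        tail.store(next, std::memory_order_release); // publish the element
        return true;
    }
    bool Pop(T& out) {
        size_t h = head.load(std::memory_order_relaxed);
        if (h == tail.load(std::memory_order_acquire)) return false; // empty
        out = buf[h];
        head.store((h + 1) % N, std::memory_order_release); // free the slot
        return true;
    }
};
```

The acquire/release pairing guarantees the consumer sees a fully written element before it sees the advanced tail index.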

Quote:
 So anyway, seeing the huge fuss everyone's making on this subject and the reasonably short time I came up with a solution, I figure I must've forgotten something important or overlooked an aspect of the design. What is it? It seems too good to be true :S
I'm not crystal clear on how your design works, so let me try and clarify here:
* All entities in the database that need to be updated are identified.
* Indexes/ptrs to these entities are then divided among the worker threads 'work lists'.
* The worker threads make a copy of each entity in their 'work lists' and call an Update func on the copy.
* If Update needs to read any data from another entity, it will read data from the database, not from one of these "thread private" copies.
* The copies are placed into a 'done list'.
* At the end of the update cycle, the copied entities in the 'done lists' are written back to the database.

^^ Is that correct?

Quote:
 Original post by InvalidPointer
As with Intel's Smoke and many other 'highly-threaded' architectures, a core scheduler/monitor class prepares n worker threads for execution. In my implementation, this would basically entail creation (or hypothetical reuse via an object pool, etc.) of a 'done list' and a 'work queue' for each of the worker threads. Both are linked lists only accessed by their respective threads and as such do not have mutexes assigned to them. Work queue construction is arbitrary, but I thought doing an even division of all entities to process makes the most sense. Work balance may be an issue, but the solution is fairly straightforward and will be discussed more later.
My worker pool works in a similar fashion. Each worker thread has its own private (mutex-less) queue of tasks to run. The workers fill up their private queues from a shared public queue (using lock-free/interlocked techniques). Each worker thread grabs more than one task at a time from the public queue to minimise contention.
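The batched grab can be sketched with a single interlocked counter over a shared task array (names and structure are assumptions, not Hodgman's actual code):

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <vector>

// Hypothetical sketch of the "grab a batch" idea: workers claim a span
// of tasks with one atomic fetch_add, then run them from a private local
// list -- one interlocked operation per batch, not per task.
struct Task { int id; };

class PublicQueue {
    std::vector<Task> tasks;
    std::atomic<size_t> next{0};
public:
    explicit PublicQueue(std::vector<Task> t) : tasks(std::move(t)) {}

    // Claim up to 'batch' tasks into the caller's private list.
    // Returns how many tasks were actually claimed (0 when exhausted).
    size_t Grab(size_t batch, std::vector<Task>& local) {
        size_t begin = next.fetch_add(batch, std::memory_order_relaxed);
        if (begin >= tasks.size()) return 0;
        size_t end = std::min(begin + batch, tasks.size());
        local.assign(tasks.begin() + begin, tasks.begin() + end);
        return end - begin;
    }
};
```

Because each fetch_add hands out a disjoint index range, no two workers ever receive the same task, and contention is limited to one atomic increment per batch.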

Quote:
 While the work queue covers what to *do*, the 'done queue' covers what's been *done* and is what I would consider to be the strength of the architecture. When a thread finishes processing an entity/whatever, it just writes the result to a temporary buffer until a separate 'cleanup' thread iterates over all n thread 'done queues' and commits them to the 'real' database for later use. By doing this we completely remove the need for locks as we can guarantee only one thread is ever going to be accessing both lists and as such there's no need for exclusion.
Each thread has its own 'done queue', which is basically a list of delayed writes to a database, right? Doesn't this mean that writing to the database requires a lock (etc), as two threads might be going through their 'done queues' at the same time?
[EDIT] Oh, I see there is another thread that processes the 'done queues'. Does this mean that the overall update kinda looks like this?
Create list of entities to update.
Update entities, output to thread's done queue.
Iterate through all done queues, commit data.

(i.e. the clean-up thread can safely use the done queues because the worker threads are guaranteed not to be updating any entities while the clean-up thread is running.)
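The three phases above can be modeled serially to show why no locks are needed. This is a hypothetical single-threaded sketch of the data flow (all names assumed), not anyone's actual implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model of the frame: build work lists, update into per-worker
// done lists, then commit everything back. Phases never overlap in time,
// which is the property that makes the real version lock-free.
struct Entity { int id; int hp; };

void RunFrame(std::vector<Entity>& db, int workers) {
    // 1. Partition entity indices among the workers' work lists.
    std::vector<std::vector<size_t>> work(workers);
    for (size_t i = 0; i < db.size(); ++i)
        work[i % workers].push_back(i);

    // 2. "Update": each worker copies its entities, updates the copies,
    //    and appends them to its private done list. Reads of db are safe
    //    here because nothing writes to db during this phase.
    std::vector<std::vector<Entity>> done(workers);
    for (int w = 0; w < workers; ++w)
        for (size_t i : work[w]) {
            Entity copy = db[i];
            copy.hp -= 1;            // stand-in for Entity::Update()
            done[w].push_back(copy);
        }

    // 3. Commit: a single clean-up pass writes the done lists back.
    for (auto& list : done)
        for (const Entity& e : list)
            db[e.id] = e;
}
```

In the threaded version only step 2 runs in parallel; steps 1 and 3 are the single-threaded bookends that guarantee nobody reads while the commit writes.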

[Edited by - Hodgman on October 21, 2008 8:47:55 PM]

Quote:

Yeah, I wasn't particularly explicit about that. The point was that there's no need for a lock on the central DB as there are no writes to it while updates are still being done. See www.flounder.com/no_synchronization.htm for an explanation and the actual point that got my mind going. Ultimately I can read from them with impunity.

On a similar note, I don't need locks for the per-thread lists -- they're per-thread, so they aren't going to change unless the specific owning thread does something to them, which is exactly the guarantee a lock would otherwise provide. Additionally, I aimed to *defer* the final updates and the actual writes until I know nothing still needs to read the old data. Again, there'd be no need for a lock.

Quote:
Original post by Hodgman
Quote:
 So anyway, seeing the huge fuss everyone's making on this subject and the reasonably short time I came up with a solution, I figure I must've forgotten something important or overlooked an aspect of the design. What is it? It seems too good to be true :S
I'm not crystal clear on how your design works, so let me try and clarify here:
* All entities in the database that need to be updated are identified.
* Indexes/ptrs to these entities are then divided among the worker threads 'work lists'.
* The worker threads make a copy of each entity in their 'work lists' and call an Update func on the copy.
* If Update needs to read any data from another entity, it will read data from the database, not from one of these "thread private" copies.
* The copies are placed into a 'done list'.
* At the end of the update cycle, the copied entities in the 'done lists' are written back to the database.

^^ Is that correct?

Quote:
 Original post by InvalidPointer
As with Intel's Smoke and many other 'highly-threaded' architectures, a core scheduler/monitor class prepares n worker threads for execution. In my implementation, this would basically entail creation (or hypothetical reuse via an object pool, etc.) of a 'done list' and a 'work queue' for each of the worker threads. Both are linked lists only accessed by their respective threads and as such do not have mutexes assigned to them. Work queue construction is arbitrary, but I thought doing an even division of all entities to process makes the most sense. Work balance may be an issue, but the solution is fairly straightforward and will be discussed more later.
My worker pool works in a similar fashion. Each worker thread has its own private (mutex-less) queue of tasks to run. The workers fill up their private queues from a shared public queue (using lock-free/interlocked techniques). Each worker thread grabs more than one task at a time from the public queue to minimise contention.

Quote:
 While the work queue covers what to *do*, the 'done queue' covers what's been *done* and is what I would consider to be the strength of the architecture. When a thread finishes processing an entity/whatever, it just writes the result to a temporary buffer until a separate 'cleanup' thread iterates over all n thread 'done queues' and commits them to the 'real' database for later use. By doing this we completely remove the need for locks as we can guarantee only one thread is ever going to be accessing both lists and as such there's no need for exclusion.
Each thread has its own 'done queue', which is basically a list of delayed writes to a database, right? Doesn't this mean that writing to the database requires a lock (etc), as two threads might be going through their 'done queues' at the same time?

You pretty much have it. I had originally conceived that there's no real reason the determination of what to update (a 'dirty flag,' if you will) and the actual update couldn't be merged, but other than a few semantic differences that's right on the money. I also thought that just having the threads iterate over a certain span of indices in the handle table made the most sense -- it doesn't involve copying.

Object creation might mess things up a bit now that I think about it. I'll have to sleep on that one.

EDIT: Yeah, you got it. Although 'serial task' is a bit of a misnomer. I think of it more as 'bit that requires one available thread.' Others can do other threadly things in the mean time, such as prepping display lists, sound, rendering calls, whatever.

Quote:
 Original post by InvalidPointer
Yeah, you got it. Although 'serial task' is a bit of a misnomer. I think of it more as 'bit that requires one available thread.' Others can do other threadly things in the mean time, such as prepping display lists, sound, rendering calls, whatever.
Yeah, I meant "serial in respect to other entity related tasks". I have a class (FrameManager) in my system that handles the whole "update cycle" - in this class I have a FrameState variable/enumeration (for debugging mostly).

When someone calls fm.Update I assert that fm.state == eIdle and then set it to ePreUpdate. Then I get all the entities/threads ready to start updating. Then I set state to eUpdate and set all the worker threads going. After all the workers are finished, I set state to ePostUpdate and do clean-up tasks (such as processing your done lists) and then set state to eIdle again.

Tracking these states has been a very useful development/debugging tool -- if you put lots of assertions in your database/entities then you can pick up any threading/sequence problems straight away -- such as if someone tries to write to the database while the FM is in the eUpdate state, or if someone creates a new entity during ePostUpdate, etc...
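A minimal sketch of such a state guard (the enum values are from the post; the class layout and method names are assumptions):

```cpp
#include <cassert>

// Sketch of the FrameState guard described above. The asserts catch
// out-of-phase access, e.g. a database write while workers are mid-update.
enum FrameState { eIdle, ePreUpdate, eUpdate, ePostUpdate };

class FrameManager {
    FrameState state = eIdle;
public:
    FrameState State() const { return state; }

    void Update() {
        assert(state == eIdle);
        state = ePreUpdate;  // gather entities, fill work queues...
        state = eUpdate;     // workers run; DB writes are forbidden here
        // ...join worker threads...
        state = ePostUpdate; // commit done lists, assign new object IDs
        state = eIdle;
    }

    void WriteToDatabase() {
        // Only legal outside the parallel update phase.
        assert(state != eUpdate);
    }
};
```

In a debug build every database/entity entry point checks the current FrameState, so a call from the wrong phase fails immediately at the call site instead of corrupting data silently.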

Quote:
 Object creation might mess things up a bit now that I think about it. I'll have to sleep on that one.
The problem that I had with object creation was figuring out how to assign ID numbers to new objects in a deterministic manner.

Determinism is a key requirement of my system (if I run the same simulation, I should get the same results, even if entities are updated in a different order), and object IDs are used for network replication, save-games, etc.
e.g. In a multi-player shared simulation, where the simulation is run on all PCs (e.g. for an RTS game), if different PCs assign different object IDs to a new object, then the network will lose sync.

The solution I found was to assign IDs at the end of the frame (i.e. in the ePostUpdate state) and to sort all new objects in a deterministic order.
In my system, objects can only be created in the update stage, so during updating I keep track of which entity is currently being updated. When a new object is created, I pair it with the ID of this (currently updating) entity.
Then at ePostUpdate I sort the new objects by their paired IDs.
I use TLS for this, e.g.
thread_local<Entity*> g_tlsCurrentlyUpdating;

void RunUpdateEntityTask( Entity& plzUpdateMe )
{
  g_tlsCurrentlyUpdating = &plzUpdateMe;
  plzUpdateMe.Update();
  g_tlsCurrentlyUpdating = 0;
}

Entity::Entity( FrameManager& fm )
{
  fm.EntityConstructed( this );
}

void FrameManager::EntityConstructed( Entity* p )
{
  ASSERT( state == eUpdate );
  ASSERT( g_tlsCurrentlyUpdating != NULL );
  constructed.push_back( std::make_pair( p, g_tlsCurrentlyUpdating->GetId() ) );
}

void FrameManager::DoPostUpdate()
{
  ASSERT( state == eUpdate );
  state = ePostUpdate; // assignment, not '=='

  std::stable_sort( constructed.begin(), constructed.end(), sortOnPairDotSecond );

  for( size_t i = 0; i < constructed.size(); ++i )
    constructed[i].first->SetId( GetNewObjectId() );
}

Just a quick note. Have you taken a look at SGI's Performer docs? Its threaded rendering is basically a lock-free threaded pipeline divided into 3 distinct phases.

App->Cull->Draw

IIRC each phase sits in its own thread, with its own copy of the state, and at any point in time the code in each phase sees only its own frame's data. The cost is a 3-frame latency between the code currently executing and the frame it produces being drawn; the advantage is that the interframe latency is roughly the time of the slowest phase. In an ideal world...
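The 3-frame latency can be illustrated with a serial model of the hand-off (a sketch with assumed names; Performer's real pipeline runs the stages concurrently in separate threads):

```cpp
#include <array>
#include <cassert>

// Serial model of the App->Cull->Draw pipeline. Each stage owns a copy of
// one frame's data; at a frame boundary the data shifts one stage down,
// so what Draw emits trails what App is simulating by 3 frame boundaries.
struct PipelineModel {
    std::array<int, 3> inFlight{{-1, -1, -1}}; // frame no. in [app, cull, draw]
    int nextFrame = 0;
    int lastDrawn = -1;

    void Tick() {
        if (inFlight[2] >= 0) lastDrawn = inFlight[2]; // Draw finishes its frame
        inFlight[2] = inFlight[1];  // Cull's output becomes Draw's input
        inFlight[1] = inFlight[0];  // App's output becomes Cull's input
        inFlight[0] = nextFrame++;  // App starts a brand-new frame
    }
};
```

After the pipe fills, every Tick completes one frame, so throughput is set by the slowest stage even though any given frame takes three Ticks to reach the screen.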


I recall from one of the mags (Game Developer?) that triple-buffered rendering is done with one buffer for the current display, the next for the frame being built, and a third to get started on if the second frame is finished early (to make up time when the frame after that takes an extra amount of time). The third frame can be discarded if some significant shift (such as the player changing course) takes place.

Note that this would have to integrate with the simulation buffering system the OP describes, where the relevant data for each frame is finalized as early as possible and the current set is protected by applying massed delta updates at the frame changeover. The third frame could start on static scenery rendering (or things like wave effects) which won't rely on object data yet.

Cyclic interlocks control the task-list postings and access to the different frame data sets. Keeping all CPUs busy can be hard, but some tasks like AI pathfinding can be run across 'turns' and so fill in on CPUs which would otherwise be idle (i.e. waiting for the single thread that does the lockless DB update pass).