Multithreaded Engine Design

Started by
23 comments, last by Mononofu 15 years, 9 months ago
Quote:Original post by Mononofu
Hm.

Sounds logical.

So it would be better if I used a normal game loop and used multithreaded libs for CPU-heavy tasks like physics, as well as libs with background threads for audio, network and input?

Edit:

I finally came to this conclusion:

My suggestion:
+ threads will run constantly, no matter what other threads are doing - constant workload
- probably much overhead due to copying of all that world stuff - I don't know how much time that will take

Your suggestion:
+ not much overhead
- if one thing is only serial, it will force the other cores to do nothing


Well, I would advise against having n threads running constantly. You are setting yourself up for some rather odd race conditions, particularly with synchronization between systems.

There is a strong argument for following the CSP style of having serial jobs which go as wide as possible at each step. You have little chance of timing-based race conditions, and it still allows the logical sequence of processing that games require.
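The "serial steps, each as wide as possible" idea could be sketched roughly like this; `parallelFor` and its worker-striding scheme are invented for illustration, not taken from any particular engine:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Sketch of the CSP-ish pattern: steps run in a fixed serial order, but each
// step fans out across its data before the next step begins.
void parallelFor(std::size_t count, const std::function<void(std::size_t)>& body) {
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w)
        pool.emplace_back([&, w] {
            // Each worker strides over the indices, so no index is touched twice.
            for (std::size_t i = w; i < count; i += workers) body(i);
        });
    for (auto& t : pool) t.join(); // barrier: the next serial step starts only after this one
}
```

Because every step ends with a join, there is no window for one system's thread to observe another system mid-update, which is the timing-based race the post is warning about.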

In reality, there's little point in having more threads than you have hardware threads to run them on. And as multi-core CPUs tend to share their instruction cache, having disparate threads running on the same CPU is usually not as efficient as multiple threads doing the same thing.

Another option is having one thread doing game processing and submitting render data, and a render thread processing, sorting and preparing that data for actual rendering, with a synchronisation point at the end of each frame. This obviously means your rendering is a frame behind... but that's usually OK.

In terms of style, it means you should avoid globals, avoid singletons, avoid static access to anything. No direct access; everything via members. (This is generally a good thing to do anyway...)

More threads != more performance.
Scalability has more to do with efficient, modular data design than with adding more threads...

Matt D







your never as good as they say you were, never as bad as they say you was.
Well, problem number 1 is that you can't use data parallelism with (closed source) libraries, since the point of using libraries is that you don't have to implement things yourself. But if you want to use data parallelism, you have to write all these libs (physics, audio, graphics...) yourself, since (at least, I believe so) there are currently no libs supporting data parallelism.

Of course, it's possible to combine task-based and data-based parallelism, so I'm trying to implement task-based parallelism first. Later, when I have more experience with how the various libs work, I can implement data-based parallelism without a problem.

btw: I got some new ideas about my engine, I will post them in a few minutes.

Edit: Ah, didn't see the posts before me. Good post, Drigovas. Do you have some more specific hints on how to implement task parallelism?

Edit2:


The user creates a new instance of GameEngine.
GameEngine gets the basic configuration from somewhere (a config file, whatever), creates all the subsystems, and passes pointers to them to the thread manager. The thread manager will (at least for now; later there may be load balancing, etc.) create a thread for every subsystem and start them.
The User subsystem will manage input and is responsible for managing all actions by the user. These actions are passed to the network subsystem, which is responsible for validating them (for a future multiplayer possibility). The network system then passes these actions to the physics subsystem.
Communication between the threads (where needed) will occur either via message passing, using a messageBuffer, for general and small messages, or via special objects like GeometryOverlay, Actions or Settings. Mostly, messages will be used to notify the subsystems of certain events.
The physics system (the only one actually modifying world geometry) only writes to GeometryOverlay, which in turn overwrites the actual geometry data every time the physics system has finished a pass. Of course, this will happen using locks, to avoid corrupted scenes.
AI will be handled exactly like the player, except that it gets its own subsystem (and therefore its own thread).
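A rough sketch of how the GeometryOverlay commit described above might look, assuming a flat float array for the geometry and a lock only around commits and reads (everything besides the GeometryOverlay name is invented):

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <vector>

// The physics thread writes into the overlay freely; the committed geometry is
// only overwritten under a lock once a full physics pass has finished, so
// readers never see a half-updated scene.
class GeometryOverlay {
public:
    // Physics thread: accumulate changes for the current pass.
    // No lock needed here, since only physics touches the overlay.
    void write(std::size_t index, float value) { overlay_[index] = value; }

    // Physics thread: commit the finished pass into the shared geometry.
    void commit(std::vector<float>& sceneGeometry) {
        std::lock_guard<std::mutex> lock(sceneMutex_);
        for (const auto& kv : overlay_) sceneGeometry[kv.first] = kv.second;
        overlay_.clear();
    }

    // Any other thread: read a consistent snapshot of the geometry.
    std::vector<float> snapshot(const std::vector<float>& sceneGeometry) {
        std::lock_guard<std::mutex> lock(sceneMutex_);
        return sceneGeometry;
    }

private:
    std::mutex sceneMutex_;
    std::map<std::size_t, float> overlay_;
};
```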
Any suggestions or hints?

[Edited by - Mononofu on July 26, 2008 9:46:25 AM]
Quote:Original post by Mononofu
Well, problem number 1 is that you can't use data parallelism with (closed source) libraries, since the point of using libraries is that you don't have to implement things yourself. But if you want to use data parallelism, you have to write all these libs (physics, audio, graphics...) yourself, since (at least, I believe so) there are currently no libs supporting data parallelism.
Not really. Physics simulation is a rather ideal target for data parallelism, and you'd be hard pressed to find an industrial-strength physics engine that doesn't feature data parallelism in some fashion. This is actually a pretty good example of the argument I was making earlier about being able to add data parallelism later in a project. If you have a physics step in a sequential program, you can scoop the sequential engine out and drop in a parallel one [that'll likely be data parallel, as opposed to task parallel], and you'll get that performance jump essentially for free. For example: Havok physics is threaded heavily, and while it is closed source, I'd put money on it that they do it by segmenting regions off and running them in a data parallel fashion [if only because this is pretty much the traditional way of handling these sorts of spatial problems in threading]. If you'd like another example, Bullet physics [open source] has some threading options too. It's been a bit of time since I've dug through the source, but last time I did, it had a sequential dynamics simulation followed by a sequential or threaded [optional] constraint solver. Again, easy modularity for future changes... Invent a threaded dynamics solver and you can just drop it into Bullet, and then Bullet is threaded.

You'll likely be harder pressed to find something a bit more specialized, like multithreaded steering [not that it'll matter, since it is comparably cheap]. But really modular things like physics are frequently available threaded.
Upon request: My design.

Sorry about the big brick of text. And a quick warning: I tend to confuse myself a bit when I type single posts that are this huge. If anything seems kind of strange, that is why...

My design is split into two components with respect to games. The main reason for this is as I described regarding my design philosophy and task parallelism: it is better to design a single communication component first, and build everything else on top of that. So for that reason, my design consists of a communication framework, with all the game-specific stuff built on top of it.

COMMUNICATION FRAMEWORK [nothing game specific]:

My communication framework consists of 4 main objects/interfaces/things.

Communication Port: a communication port is the exit point for update data to be sent to other objects. There is nothing that inherits from communication port. Each communication port relates to zero or more communication channels. Updates are published from the thread holding the communication PORT, not the communication channel.

Communication Channel: this is the receiving side of updates, and each communication channel relates to zero or more communication ports. There are lots of different kinds of communication channel for dealing with the different types of updates that will be discussed later.

Task: This is a job that must be done. They are expressed as three stages, and are treated in a way very similar to pure functions. They receive input from communication channels, compute the next step based on internal data and received input, and then publish their updates through their communication ports, in that explicit order, each interval. They do NOT access any outside data in other objects, and maintain internally all state that is required to operate effectively [yes, this means a lot of redundancy of data]. Due to this explicit ordering, this means that all updates that are sent out are the result of inputs that were completely received by the time the updates were published. For this reason, there are no partial updates, and there is never an instance in which an update is published to one task with data that is different from the data that is published to a different task. Updates are published once per iteration, and all updates are published in ONE call of 'send' per communication port [relevance discussed later].
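A minimal sketch of such a three-stage task; `Update`, `Channel` and `Port` are stand-in types invented for illustration, and everything runs single-threaded here to show only the ordering:

```cpp
#include <vector>

struct Update { int value; };

// Receiving side: holds updates until the owning task's next iteration.
struct Channel { std::vector<Update> pending; };

// Sending side: publishes one task's updates to its subscribed channels.
struct Port {
    std::vector<Channel*> subscribers;
    void send(const Update& u) {
        for (Channel* c : subscribers) c->pending.push_back(u);
    }
};

class Task {
public:
    Channel input;
    Port output;

    // One iteration: the three stages run in this explicit order, so every
    // published update reflects all input received this interval.
    void step() {
        // 1. receive: drain the channel into internal state
        for (const Update& u : input.pending) state_ += u.value;
        input.pending.clear();
        // 2. compute: operate on internal data only (no outside access)
        int result = state_ * 2;
        // 3. publish: one send through the port, once per iteration
        output.send(Update{result});
    }

private:
    int state_ = 0; // all required state is kept internally
};
```

Because publishing always happens last, every subscriber sees updates derived from the same fully-received input, which is what rules out the partial-update case described above.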

Task Scheduler: A task scheduler is an abstract interface that is initialized with either a set of one or more tasks, or a set of one or more task schedulers. A task scheduler effectively doesn't do any of the actual work, but is mostly used to provide a tagging mechanism that I can use to coordinate communication between tasks. The real importance of a task scheduler is to determine the type of communication channel that will be instantiated between the various tasks based on a set of rules so as to maximize performance. There are many types of task schedulers. Some schedulers are sequential, in that they process contained tasks from start to finish. Others are lock-step, in that they process their tasks in parallel, but do so at a uniform frequency. Some are signaled schedulers that wait for communication, some run without a fixed frequency, and so on.

One specific type of task scheduler is called the 'master' task scheduler, and contains all the others. This acts as the entry point for creation, has functions that initialize the tree of other task schedulers, and implements the various rules by which channels are created. It also presents all the different tasks to each other, so that they can request appropriate communication channel creation. The master task scheduler analyzes the logical topology of tasks [as specified by the structure of contained schedulers] and works out the types of synchronization support that are needed at the various channels. The master task scheduler doesn't place any restrictions on the speed at which its immediate tasks run.

Examples of the rules that I was referring to are as follows:

If Task A and Task B are both in the same sequential scheduler [in that they are run in one thread, A-B-A-B-A-B], and B requests communication from A: Create a channel that does not provide any buffering of updates [relying on the characteristic that updates are posted at the end of a task, and thus an update will not become invalid at least until the start of the task on the next iteration].

If Task A and Task B are run in lock step, then provide buffering, but do not create a channel that handles multiple iterations of A for each of B.

If Task A and Task B are run in different threads, NOT in lockstep [cycle on different frequencies], then provide a communication channel that implements an 'update merge', that squeezes an update together with a previous update, to make one big update. The purpose of this is to not allow for fast tasks to flood the memory with updates that are queued up for slow tasks [very important. Actually, without this detail, the physics task in my current project is capable of maxing my memory with updates waiting to be processed by the long-term AI strategic planning task].

If Task B is fed only by Task A, then use a channel that doesn't perform any synchronization.

.... and other rules here for other special cases

When in doubt, act conservatively: Instantiate a buffered channel w/ merged update and full synchronization.
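The 'update merge' channel from the different-frequency rule could be sketched like this (single-threaded for clarity; a real channel would add locking, and all names are invented):

```cpp
#include <optional>

// A merged (possibly multi-iteration) update. Fields are illustrative only.
struct WorldUpdate {
    int tick;          // latest tick covered by this update
    int movedEntities; // accumulated count of entities that moved
};

// When the producer runs faster than the consumer, new updates are folded into
// the one still queued, so memory stays bounded at a single pending update
// instead of growing with the speed difference between the two tasks.
class MergingChannel {
public:
    void publish(const WorldUpdate& u) {
        if (pending_) {
            pending_->tick = u.tick;                    // keep the newest tick
            pending_->movedEntities += u.movedEntities; // squeeze updates together
        } else {
            pending_ = u;
        }
    }

    std::optional<WorldUpdate> take() {
        std::optional<WorldUpdate> out = pending_;
        pending_.reset();
        return out;
    }

private:
    std::optional<WorldUpdate> pending_;
};
```

How two updates get "squeezed together" is per-field and domain-specific; the point is only that the channel's memory footprint is constant no matter how far the slow consumer falls behind.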

The end result is a sort of tree structure of schedulers as tree nodes and tasks as leafs. For example, a typical render split engine with asynchronous file IO and pathfinding might look something like this:
MasterScheduler
+ SignaledScheduler
| + FileLoader
+ SignaledScheduler
| + Pathfinding
+ SequentialScheduler
| + LongTermPlanning
+ LockstepScheduler
| + SequentialScheduler
| | + User updates
| | + GameLogic
| | + Steering
| | + Physics
| + SequentialScheduler
| | + Render
And would look pretty much exactly like that in the code as well [only with a lot of 'new this' and 'new that' statements]
After the master scheduler is created, it has 'initializeCommunicationChannels' called, which asks each component who it wants to talk to and establishes communication accordingly. The benefit of this is that if you find the physics step takes too long in the above example, and you can tolerate the latency introduced by making it its own task, you can break it off, and the only change to your code is changing:
... [stuff] ...
new LockstepScheduler(
    new SequentialScheduler(
        new UserUpdate(),
        new GameLogic(),
        new Steering(),
        new Physics()),
    new SequentialScheduler(new Render()))
... [stuff] ...
to:
... [stuff] ...
new LockstepScheduler(
    new SequentialScheduler(
        new UserUpdate(),
        new GameLogic(),
        new Steering()),
    new SequentialScheduler(new Physics()),
    new SequentialScheduler(new Render()))
... [stuff] ...
Oooooh-kay, so communication is handled and spoken for. So then it goes into the game stuff.... which isn't quite as relevant to this discussion.
// I'm pretty frustrated. I'm writing this post the third time >.<

Thank you very much for posting this! Very interesting.

I like your communication channels, they look quite interesting to me, especially to direct the messages to where they belong. I think I will implement something similar in my engine.

Your schedulers are also quite good, mainly to ensure nothing unnecessary is computed (i.e. physics calculating two updates while graphics only renders one).

This topic is closed to new replies.
