Sign in to follow this  
I_Smell_Tuna

Multi-threaded game

Recommended Posts

Ok, I'm making a 3D video game and I want to put my various sub-systems on their own threads - for obvious performance reasons - while grouping dependant sub-systems together on a thread. I plan on doing this by double buffering the game data. To do this each sub-system will have a version of the game data that it updates before each loop iteration. When a sub-system loop finishes it will update the back buffer and swap the buffers. Does anybody see a problem with this method for multi-threading a video game? The only thing I can think of is that it will consume more memory.

Share this post


Link to post
Share on other sites
The performance reasons are not obvious - unless you are targeting a multiprocessor system. Are you?

The major problem is that the various sub-systems are usually very interdependent, but that doesn't mean it can't be done.

Share this post


Link to post
Share on other sites
I'm not targeting multi-processor systems, I am targeting multi-core systems. Dual core processors are already on the market, which means that an application needs to be multi-threaded to utilize both cores. The interdependancies usually are singaling of events which can be worked around easily. And any good program is designed in modules anyway.

Share this post


Link to post
Share on other sites
Quote:
Original post by I_Smell_Tuna
I'm not targeting multi-processor systems, I am targeting multi-core systems.
Same thing, different name.

Quote:
The interdependancies usually are singaling of events which you can get around easily.
Yeah, but will it provide you a speed boost? If things need to be synchronized, it stalls the game while the synchronization is worked out.

I think the double-buffer approach has some merit, if you're very selective about what is doubled and what isn't (memory is the big constraint here). The approach is very similar to processor pipelining, with many of the associated issues related to data hazards. Games do have the interesting property that frame N+1 does not really depend on frame N - if something occurs during the processing of frame N, the response can probably be delayed until frame N+2 without too much impact.

It's probably not surprising that many of the same technique for increasing parralelism show up in different domains - the process of ordering instructions in a superscalar processor is similar to ordering transactions in a database, for example. Research in other areas has already been done, and determing what, if anything, can be applied to games is a good place to start.

However, expecting it to give you an immediate performance increase is naïve - such a system will have to be very carefully designed to minimize data hazards and synchronization stalls. It's not impossible, and it is something I'm planning on researching further when I get some free time, but I'm expecting it to be very difficult to make both correct and efficient.

Share this post


Link to post
Share on other sites
Multi-processor and multi-core systems are not the same by far. Anyway, back to the topic. I mentioned that systems that are dependant on each other would be grouped together and put on the same thread, thus eliminating the need to synchronize them. Each sub-system could also have a public event queue that the sub-system checks before each loop for your sub-system synching.

Thanks for the feedback.

Share this post


Link to post
Share on other sites
The current situation on the PC side (with the Athlon x2 and Pentium D), it is basically the same as a multiprocessor (at least from a software perspective). The AMD has more ram bandwidth with NUMA on a multiproc system, and quicker cache coherency on a dual core. The Pentium D doesn't have either of those advantages, so it's dang near the same (well win xp home will take advantage of dual core, but not dual proc). CELL is the only processor I'm aware of that currently has asymmetric cores. You could have asymmetric processors too, but I haven't seen anything designed for that (Unless you want to count a GPU as another proc)

But as for your design, it depends on how dependent your sub systems are on each other. For example if there so dependent that you only have 2 threads where one does 90% of the work, you only gain 11% more speed of your possible 100% (on a dual core). So breaking them up more and maintian sync with critical sections (or whatever else works) will probably be faster (but harder to design). Or if you break it up alot (like every dynamic object having it's own thread and dealing with its environment only with event queues, for an extreme example) then context switching and managing all those queues will really hurt performance. I think dual buffering is certainly a good way of handling multiply threads (I think novodex uses a form of that), but it's not be the best solution for everything. As a general design focus you system looks pretty decent (I'm considering a similar system), but it's not an end all be all solution, so you'll probably want to mix in some other other ways of syncing multiply threads (to break up dependent things into threads, or removing queues when they're excesive), when it'll be worth it. How well it'll work is very application specific, and may be perfect for what you're doing, or be fairly ineffienct.

On a side note, When I was looking into some multithreading stuff, I found out that normal allocations (malloc, nonoverloaded new) require blocking. So consider that when you implement your queues (like use a custom allocator with std::list).

Share this post


Link to post
Share on other sites
A quick aside: Test all multi-threaded code on a dual-core or dual CPU system - this will expose more issues than on just a single CPU.

As for the idea:
Intriguing - I presume you're limiting yourself to two buffers which swap? If not, how would they swap between the various subsystems? I can't see how this could scale to multiple buffers for multiple subsystems without interaction problems.

From what you say I'd envisage something like the following:

One thread outputting current frame to hardware working on one buffer.
One thread running everything else to compute next frame working on other buffer.
Both thread's current iteration of loop finished; swap buffers.
Repeat.

Assuming I have the idea right, the single-threaded equivalent would do exactly the same; this approach should be faster at the expense of memory. How much faster is harder to answer. The critical part is that both thread loops must have finished their current task before swapping occurs. It strikes me that one loop as I'm proposing is all output, the other is all calculation, so a graphics thread and a sound thread could be operating on the output buffer, whilst the rest calculates. In addition, don't be afraid to have 'proper' multi-threading with synchronisation within those two loops as needed - as long as you control context switching, individual loops with OpenMP for example might scale well.

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
There seems to be a bit of confusion between multiprocessor and multicore. The intel site explains it pretty well.

My question is: Whats the point if all of the processor heavy work is done by the one thread? You havent identified cpu bottle necks eg, AI, ROAM, collision. In the case of a ROAM system, updating the index buffer etc means that it is synced to the rendering, so it'd be part of that. AI and collision may or may not be tightly dependant etc.

Share this post


Link to post
Share on other sites
As best I can tell, every subsystem is dependent on the previous subsystem. The data flow is something like:

Input -\
> Move -> Collide -> React -> Render
AI ----/

Bascially, each stage needs the result of the previous stage for that frame (except the AI and Input stages). Assigning a CPU to each stage (obviously combining stages on current CPUs), with each stage having its own version of the world state, greatly reduces the amount of synchronization by not sharing mutable data between threads.

Each stage "finishes" at the same time - like a CPU, there's a master "clock", the slowest stage, and the other stages have to stall until the slow stage completes. Once it does, the buffers get shuffled to the next stage and the next cycle starts.

The problem is that maintaining the world state (N copies, where N is the number of stages) is expensive. There are certainly ways to reduce the memory usage - most of it doesn't change from frame to frame and doesn't need to held by each stage - but they involve spending more time. Ah, the eternal CS tradeoff...

Share this post


Link to post
Share on other sites
It infact possible to decouple the render from the rest of the logic with little problems.

The trick is to have three 'frames' of data hanging around. These would be;
- previous
- last computed
- next

Previous was the state before the current drawing took place, last computed is the the final position for the next time step and 'next' is your "work in progress" frame.

This system works by having fixed time step world up dates while the renderer throws out frames as fast as it can. While its rendering the positions/rotations etc are interpolated between 'previous' and 'last computed'. Once you get to the same time that 'last computed' was produced you move everything up the line, so that it looks like;
- last computed
- next
- previous

and then around you go again.
While you are doing the drawing based on the first 2 frames you update the third frame.
Swapping is just a simple matter of moving pointers around, which is an atomic operation, so at each 'update' point you have 3 pointer swaps per object to perform.

A key point to note with this system is the need for two clocks, one running one time step AHEAD of the other, this is for the update thread so thinks swap correctly.

I've got a small 'proof of concept' program which shows the above working (50 updates per second and over 1700fps) for a simple test scene, I think Superpig has done some work into this as well. My hope is to expand apon it all for my final year project at college for my degree [wink]

Share this post


Link to post
Share on other sites
Guest Anonymous Poster
Quote:
Original post by I_Smell_Tuna
Ok, I'm making a 3D video game and I want to put my various sub-systems on their own threads - for obvious performance reasons - while grouping dependant sub-systems together on a thread. I plan on doing this by double buffering the game data. To do this each sub-system will have a version of the game data that it updates before each loop iteration. When a sub-system loop finishes it will update the back buffer and swap the buffers.

Does anybody see a problem with this method for multi-threading a video game? The only thing I can think of is that it will consume more memory.




How do you define "sub-systems" (what are the ones you plan to use...)?

AI stuff like pathfinding is batchable and somewhat independant (yet requiring access to the current world data).

File operations (like loading assets on the fly) can be seperated.

Communications (if you have significant packet traffic)

Sound/Music management

if your animation/physics are a heavy load, you might want the rest of the game engine running on the other CPU.

Why keep seperate copies of the game data when each processor can access the same data in memory (and then maintain their own local working data).
You may still have to maintain a READ & MODIFY/UPDATE phases/cycle to eliminate
threads trying to read partially changed shared data.

Share this post


Link to post
Share on other sites
For most intents and purposes, on PC, multi-core is equal to multi-CPU. Hyper-threaded, however, is nowhere near multi-core (or multi-CPU).

There are two possible performance problems with threaded systems:

1) Synchronization has a penalty, and may additionally incur jitter caused by the OS scheduler.

2) Multiple threads (and even multiple cores) means multiple things competing for your memory bus. Your memory bus is already often the bottleneck in the system. More doesn't mean better at this point -- in fact, copying your state around a lot just means you add more memory traffic.

For I/O-like things, like asynchronous file and network read/write, and sound mixing, a separate thread for the subsystem (or thread pool, even) will make sense. For highly de-coupled compute-bound systems, a thread per system MAY make sense, if you can buffer the data sufficiently without paying additional copying overhead.

I would look at the NovodeX SDK (free download) for how to structure physical simulation as a sub-thread (although in their case, that thread may even be actual hardware). You could also conceive of a separate thread for rendering a set of game state to the screen -- although modern GPUs already do a good job of parallelizing this, using their own CPU.

So, the most I can see threading a current gaming engine would be:

- I/O in its own thread pool (for asynchronicity, not computation)
- Sound mixing in its own thread (for real-time scheduling, not really compute bound on modern CPUs)
- Physics simulation in its own thread (or on hardware)
- Rendering in its own thread
- "the main thread" which coordinates the others, reads user input, reaps physics output and sends back instructions for next frame, and funnels game state to the renderer for presentation

I would probably put AI in the main thread, or a fiber of the main thread, because I don't think the gains are there in trying to put that in a separate thread. Also, AI will be compute bound without any clear synchronization points, which means that it might hog the CPU for an entire scheduling quanta on a non-multicore system, which would lead to stuttering frame rates.

Share this post


Link to post
Share on other sites
Separating your rendering engine from the rest of your game is easy. All it needs to do is read the scene data, so just copy the data to your thread from the global data, and let the rest of the program continue. This should give you a pretty high FPS on a dual-core/multi-proc system.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this