Concurrent rendering & game-logic

14 comments, last by Polarist 11 years, 2 months ago
I just released my first game, so I thought I'd throw in my two cents.

I originally had the same idea as you: "Wouldn't it be great if the logic could run in parallel with the graphics? I could get twice the perf!" So I went down that road with my game design. It's a fairly simple puzzle game that's heavier on graphics than anything else.

I separated all of my drawing into a second thread and had a collection of "proxy" objects in the logic thread. When I wanted to do something graphics-related, I called a method on the proxy, which cached the call, and then at the sync point between the threads I transferred the cached proxy commands to the actual graphics objects. The idea was that my two threads would only need to be synchronized for a short window, and then the logic for the next frame could run in parallel with the drawing.
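
A minimal C++17 sketch of that proxy idea, assuming a hypothetical `Sprite` render object that may only be modified on the render thread. `Sprite`, `SpriteCommand`, `CommandExchange` and `SpriteProxy` are illustrative names, not the poster's actual code. The logic thread records calls without locking, `publish()` runs at the sync point, and the render thread applies the commands in the order they were recorded:

```cpp
#include <mutex>
#include <unordered_map>
#include <vector>

// Hypothetical render-side object; in a real game it would wrap GL resources
// and must only be modified on the render thread.
struct Sprite {
    float x = 0.0f, y = 0.0f;
    bool visible = true;
};

// One call recorded on the logic thread.
struct SpriteCommand {
    enum class Kind { Move, SetVisible };
    int spriteId;
    Kind kind;
    float x, y;
    bool visible;
};

class CommandExchange {
public:
    // Logic thread only: commands recorded since the last sync point.
    std::vector<SpriteCommand>& logicList() { return logic_; }

    // Sync point (logic thread): hand everything recorded so far to the render side.
    // Appending instead of overwriting keeps commands the renderer has not consumed yet.
    void publish() {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.insert(pending_.end(), logic_.begin(), logic_.end());
        logic_.clear();
    }

    // Render thread: take the published commands and apply them in recorded order.
    void apply(std::unordered_map<int, Sprite>& sprites) {
        std::vector<SpriteCommand> commands;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            commands.swap(pending_);
        }
        for (const SpriteCommand& c : commands) {
            auto it = sprites.find(c.spriteId);
            if (it == sprites.end()) continue;  // sprite no longer exists
            if (c.kind == SpriteCommand::Kind::Move) {
                it->second.x = c.x;
                it->second.y = c.y;
            } else {
                it->second.visible = c.visible;
            }
        }
    }

private:
    std::mutex mutex_;
    std::vector<SpriteCommand> logic_;    // touched only by the logic thread
    std::vector<SpriteCommand> pending_;  // shared, guarded by mutex_
};

// Logic-side stand-in for a Sprite: records calls instead of executing them.
class SpriteProxy {
public:
    SpriteProxy(int id, CommandExchange& exchange) : id_(id), exchange_(exchange) {}
    void move(float x, float y) {
        exchange_.logicList().push_back({id_, SpriteCommand::Kind::Move, x, y, true});
    }
    void setVisible(bool visible) {
        exchange_.logicList().push_back(
            {id_, SpriteCommand::Kind::SetVisible, 0.0f, 0.0f, visible});
    }

private:
    int id_;
    CommandExchange& exchange_;
};
```

Replaying commands in recorded order settles one of the questions raised below (whether the order of proxy calls matters); it does nothing for calls that bypass the proxy, which still have to stay on the render thread.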

A neat idea in theory, but looking back, I do somewhat regret it. I failed to accurately predict where my bottlenecks would be, and it turned out my game spent nearly 95% of its time in the render loop, so I got a negligible benefit from running the two in parallel. I also had many hard-to-fix bugs and confusing moments when trying to keep the threads separate ("Can I call this method on the non-graphics thread? Does the order of calls to the proxy matter?" etc.). I probably lengthened my development time by two months for almost no noticeable performance gain.

So I'll say this can certainly work if you want it to, but make sure you actually will need it before unleashing a huge amount of extra headaches on yourself. Sounds like you're up for a challenge, so maybe you'd like to go this way, but if you're looking at it from a business standpoint make sure you can justify the extra development time.
My Projects:
Portfolio Map for Android - Free Visual Portfolio Tracker
Electron Flux for Android - Free Puzzle/Logic Game

if you're looking at it from a business standpoint make sure you can justify the extra development time.


No business is involved. I make my money at my day job.

If business were involved, I'd be using a game engine (probably Unity) and focusing on time-to-market.

The current game I am working on is a strategy game with lots of pathfinding (which takes up ~70% of the CPU time).

So I thought to myself: Why not go multi-threaded?

I have done multi-threaded before in my own sick & twisted ways.

But now that I've realized how common a problem this is, I would like to know if a "standard" design pattern exists for multi-threaded rendering.

I am not looking for one of those "don't do it, because it's a waste of time" replies.

I am curious & reckless, meaning I am willing to make an effort to satisfy my curiosity even if it kills my project, because I like learning new stuff.

My Oculus Rift Game: RaiderV

My Android VR games: Time-Rider & Dozer Driver

My browser game: Vitrage - A game of stained glass

My Android games: Enemies of the Crown & Killer Bees


I've done many single-threaded renderers (using VBOs since forever). I started to get bored, so I thought I'd give multi-threaded rendering a try. I'd love the extra complexity to shake things up :-) .

There are many ways to go about that. Here are a few topics I plan to cover in my upcoming book:

  • Redundancy Checks
    • Immediate
    • Deferred
  • Render Queues
  • Fill-Rate Reduction
  • Bandwidth Reduction
  • Proper Vertex Buffer Updates
  • Efficient Swapping of Render Targets
    • Avoiding Logical Buffer Loads
  • Uniform Redundancy Checks
  • Shader Redundancy Checks
  • Physically Based Rendering
    • Physically Based Blinn-Phong
    • Efficient Oren-Nayar
  • Frustum Culling
  • Multi-Threaded Rendering

There are tons of other things you can do.
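
As an illustration of the first item, here is a minimal C++17 sketch of an immediate redundancy check: a cache remembers the last program and textures that were set and skips calls that would not change anything. `FakeDriver` and `StateCache` are hypothetical; `FakeDriver` only counts calls, whereas in a real renderer its methods would forward to the graphics API (for example `glUseProgram` and `glBindTexture`, with `glActiveTexture` selecting the unit). The "Deferred" variant from the list is not shown.

```cpp
#include <array>
#include <cstdint>

// Hypothetical stand-in for the graphics API; it only counts how many calls reach it.
struct FakeDriver {
    int calls = 0;
    void useProgram(std::uint32_t /*program*/) { ++calls; }
    void bindTexture(std::uint32_t /*unit*/, std::uint32_t /*texture*/) { ++calls; }
};

// Immediate redundancy check: compare against the last value set and skip the
// call when nothing would change. Only valid if every state change goes through it.
class StateCache {
public:
    explicit StateCache(FakeDriver& driver) : driver_(driver) { textures_.fill(kUnknown); }

    void useProgram(std::uint32_t program) {
        if (program == program_) return;  // redundant, skip
        program_ = program;
        driver_.useProgram(program);
    }

    void bindTexture(std::uint32_t unit, std::uint32_t texture) {
        if (unit < textures_.size()) {
            if (textures_[unit] == texture) return;  // redundant, skip
            textures_[unit] = texture;
        }
        driver_.bindTexture(unit, texture);  // units outside the cache are always forwarded
    }

    // Call after anything else has touched the API state directly.
    void invalidate() {
        program_ = kUnknown;
        textures_.fill(kUnknown);
    }

private:
    static constexpr std::uint32_t kUnknown = 0xFFFFFFFFu;  // forces the next call through
    FakeDriver& driver_;
    std::uint32_t program_ = kUnknown;
    std::array<std::uint32_t, 8> textures_{};
};
```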



    As L. Spiro said, I think that multithreaded rendering, or at least creating multiple command queues concurrently, can help.


    So I gather that queueing GL render commands is the "traditional" way to go?
    It is. This is what we do on our in-house engine for Xbox 360, PlayStation 3, PlayStation Vita, etc. We used it in Infinite Undiscovery, Star Ocean 5, Valkyrie Profile, etc.
    It is the time-tested traditional way to do it.



    But now that I've realized how common a problem this is, I would like to know if a "standard" design pattern exists for multi-threaded rendering.

    I am not looking for one of those "don't do it, because it's a waste of time" replies.

    I am curious & reckless, meaning I am willing to make an effort to satisfy my curiosity even if it kills my project, because I like learning new stuff.

    I already explained the underlying concepts, though I really don’t know how well it will work in Java even if you did everything perfectly.
    As mentioned before, my book will have a chapter on multi-threaded rendering and include a running C++ demo with and without multi-threading for comparison. Although it will be using OpenGL ES 2.0, the same concept applies to any renderer and any API.
    Unfortunately it will not likely be available for another year.

    There are still many other ways to feed your appetite. Have you done physically based rendering before? Have you implemented render queues?
    Even if you insist on continuing with multi-threaded rendering I still suggest you sit on your hands and think about it. When you are done, think about it some more.

    As was mentioned, if you can’t anticipate where the bottlenecks will be, you can’t succeed at the task, period.
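
One way to find those bottlenecks before committing to a threading design is to measure how frame time splits between phases (the kind of split the first reply describes, with 95% of the time in the render loop). A minimal C++17 sketch using only `<chrono>`; `PhaseProfiler` and the phase names are illustrative:

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulates wall-clock time per named phase (e.g. "logic", "pathfinding",
// "render") so you can see where a frame goes before parallelizing anything.
class PhaseProfiler {
public:
    using Clock = std::chrono::steady_clock;

    // RAII timer: adds the elapsed time to its phase when it goes out of scope.
    class Scope {
    public:
        Scope(PhaseProfiler& profiler, std::string phase)
            : profiler_(profiler), phase_(std::move(phase)), start_(Clock::now()) {}
        Scope(const Scope&) = delete;
        Scope& operator=(const Scope&) = delete;
        ~Scope() { profiler_.add(phase_, Clock::now() - start_); }

    private:
        PhaseProfiler& profiler_;
        std::string phase_;
        Clock::time_point start_;
    };

    // Relies on C++17 guaranteed copy elision, since Scope is non-copyable.
    Scope measure(std::string phase) { return Scope(*this, std::move(phase)); }

    void add(const std::string& phase, Clock::duration elapsed) { totals_[phase] += elapsed; }

    // Fraction (0..1) of all measured time spent in `phase`.
    double share(const std::string& phase) const {
        Clock::duration total{};
        for (const auto& entry : totals_) total += entry.second;
        auto it = totals_.find(phase);
        if (it == totals_.end() || total.count() == 0) return 0.0;
        return std::chrono::duration<double>(it->second).count() /
               std::chrono::duration<double>(total).count();
    }

private:
    std::map<std::string, Clock::duration> totals_;
};
```

Typical use is `{ auto s = profiler.measure("render"); renderFrame(); }` around each phase, where `renderFrame` stands for whatever hypothetical function does that phase's work.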


    L. Spiro

    I restore Nintendo 64 video-game OST’s into HD! https://www.youtube.com/channel/UCCtX_wedtZ5BoyQBXEhnVZw/playlists?view=1&sort=lad&flow=grid

    Also, if you insist on continuing with multi-threaded rendering instead of thinking about it for a few months, might I suggest starting a new project whose goal is specifically multi-threaded rendering?
    No need to mess up your current project since it will probably take a few tries to get it right.


    L. Spiro


    I got around to implementing render-lists.

    It was relatively painless and did the trick.

    Thanks for the great advice.
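
For reference, one common shape for a render list (render queue) is sketched below in C++17: each draw is stored with a packed sort key, and the list is sorted before it is walked so that draws sharing a shader and texture end up adjacent. The field names and the key's bit layout (shader, then texture, then depth) are assumptions for illustration, not necessarily what the poster built.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One entry in a render list. Fields are illustrative; a real engine would also
// carry mesh/buffer handles, transforms, and so on.
struct DrawItem {
    std::uint16_t shader;
    std::uint16_t texture;
    std::uint32_t depth;  // quantized depth, e.g. front-to-back for opaque objects
    int objectId;

    // Pack the fields so one integer compare orders by shader, then texture,
    // then depth. This bit layout is an assumption, not a standard.
    std::uint64_t sortKey() const {
        return (std::uint64_t{shader} << 48) | (std::uint64_t{texture} << 32) | depth;
    }
};

// Sort so that draws sharing a shader (and texture) are adjacent, which cuts
// state changes when the list is walked.
void sortRenderList(std::vector<DrawItem>& items) {
    std::sort(items.begin(), items.end(), [](const DrawItem& a, const DrawItem& b) {
        return a.sortKey() < b.sortKey();
    });
}

// Number of shader changes a walk over the list would cause (the first bind counts).
int countShaderSwitches(const std::vector<DrawItem>& items) {
    int switches = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || items[i].shader != items[i - 1].shader) ++switches;
    }
    return switches;
}
```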


    If you're still considering the multithreaded approach, take a look at the Smoke demo from Intel, which builds an n-core scalable game engine using a task-based approach (using TBB).

    A task-based approach makes it quite painless to build multithreaded processes. To get up and running, it took me just a couple of hours to go from a single-threaded design to one that achieved 100% CPU utilization across 4 cores during a few of my processing-heavy operations. There was certainly much more work to be done afterwards to fully take advantage of multithreading, but it was quite easy to speed up the performance-sensitive pieces for my needs.

    The idea behind a task-based and scalable solution is basically to "future-proof" the engine for any later developments in hardware, and to be decoupled from the number of physical cores available on the machine. I.e. the engine should maximize hardware usage regardless of whether you are allotted 1 or 20 cores.

    There are fairly modern game engines that take the older approach of giving each subsystem its own thread (e.g. the renderer gets one thread, the physics system gets one thread, etc.). But as far as I'm aware, that method is now considered antiquated and is no longer advisable.

    The "new" way is basically to cut up all your game's subsystems into smaller, independent tasks. You then hand those tasks to cores as the cores finish their current tasks. So, for instance, your physics calculations could be split into 8 tasks, your render queue construction into 4 tasks, your particle systems into 6 tasks, whatever. You do have to keep in mind that certain systems need to run after others, but you only need to enforce that at a high level. Given the high degree of concurrency, you should also consider lock-free data structures and thread-local storage. After all those tasks are completed, you combine their outcomes into a completed game state that you can pass on for rendering.
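
A minimal C++17 sketch of the task-based idea, using `std::async` as a stand-in for a real task scheduler such as TBB. Note that `std::launch::async` runs each task as if on a new thread rather than on a reusable worker pool, so this illustrates the structure, not the performance. `Particle`, `parallelFor` and `stepParticles` are hypothetical names. Work is split into one task per chunk, ordering is enforced only between stages, and the second stage writes per-task partial results that are combined afterwards (a simple stand-in for thread-local storage):

```cpp
#include <algorithm>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

struct Particle {
    float pos = 0.0f;
    float vel = 0.0f;
};

// Split [0, count) into at most `taskCount` chunks, run body(taskIndex, begin, end)
// for each chunk as its own task, and wait for all of them (a join point).
template <typename Body>
void parallelFor(std::size_t count, std::size_t taskCount, Body body) {
    if (count == 0) return;
    taskCount = std::clamp<std::size_t>(taskCount, 1, count);
    const std::size_t chunk = (count + taskCount - 1) / taskCount;
    std::vector<std::future<void>> tasks;
    for (std::size_t t = 0, begin = 0; begin < count; ++t, begin += chunk) {
        const std::size_t end = std::min(count, begin + chunk);
        tasks.push_back(std::async(std::launch::async, body, t, begin, end));
    }
    for (auto& task : tasks) task.get();  // waits, and rethrows any task exception
}

// One frame: integrate, then count particles past `limit`.
std::size_t stepParticles(std::vector<Particle>& ps, float dt, float limit) {
    // Scale with whatever the machine offers instead of hard-coding a thread count.
    const std::size_t taskCount = std::max(1u, std::thread::hardware_concurrency());

    // Stage 1: each task owns a disjoint index range, so no locks are needed.
    parallelFor(ps.size(), taskCount, [&](std::size_t, std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) ps[i].pos += ps[i].vel * dt;
    });

    // Stage 2 depends on stage 1, so it starts only after stage 1 has joined.
    // Each task writes its own slot (a real engine would pad against false sharing).
    std::vector<std::size_t> partial(taskCount, 0);
    parallelFor(ps.size(), taskCount, [&](std::size_t t, std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i)
            if (ps[i].pos > limit) ++partial[t];
    });

    // Combine the per-task outcomes into one result.
    std::size_t total = 0;
    for (std::size_t n : partial) total += n;
    return total;
}
```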

    Beyond that model of concurrent programming, there's also the question of how asynchronous you want your "game" to be from your renderer. From my brief dive into the topic, I understand that there are two general architectures to consider: double-buffered and triple-buffered.

    In a double-buffered setup, you have two copies of your game state: one for the "last" frame and one for the "next" frame. The "last" frame is read-only at this point; the renderer reads it for drawing, and the other systems read it to calculate the "next" frame. Each subsystem should have clear ownership of write access to its own part of the "next" frame. One benefit of this second buffer is that it saves you from the headache of making sure every little thing happens in the correct order, as it removes a lot of potential for collisions. In this approach, as soon as all the subsystems and the renderer have finished, you swap your "next" frame with your "last" frame and repeat.
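
A minimal C++17 sketch of the double-buffered arrangement, assuming a hypothetical `GameState` and a single hypothetical `movementSystem`. The swap flips an index rather than copying data. Note that right after a swap the "next" buffer still holds state from two frames ago, so each subsystem must overwrite everything it owns:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Illustrative game state; a real one would hold every system's data.
struct GameState {
    std::vector<float> unitX;
    int frame = 0;
};

class DoubleBufferedState {
public:
    explicit DoubleBufferedState(const GameState& initial) : buffers_{initial, initial} {}

    // Read-only during a frame: the renderer draws it, subsystems read from it.
    const GameState& last() const { return buffers_[last_]; }

    // Written by the subsystems during a frame; each owns a distinct part of it.
    GameState& next() { return buffers_[1 - last_]; }

    // Call only once every subsystem AND the renderer have finished the frame.
    // Swaps roles by flipping an index; no state is copied.
    void swap() { last_ = 1 - last_; }

private:
    std::array<GameState, 2> buffers_;
    std::size_t last_ = 0;
};

// Example subsystem: reads only `last`, writes only the parts of `next` it owns.
void movementSystem(const GameState& last, GameState& next) {
    next.unitX.resize(last.unitX.size());
    for (std::size_t i = 0; i < last.unitX.size(); ++i) {
        next.unitX[i] = last.unitX[i] + 1.0f;
    }
    next.frame = last.frame + 1;
}
```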

    The triple-buffered setup is a similar idea, except that the game and the renderer do not need to run in lock-step. The three buffers in this approach can be labeled "being rendered", "last fully updated", and "currently being updated". When the renderer finishes rendering, it immediately moves on to the "last fully updated" buffer (unless it is already rendering that one). When the subsystems finish calculating, they mark the "currently being updated" buffer as the new "last fully updated" and begin storing the next game state in the remaining buffer, i.e. whichever one is neither being rendered nor last fully updated. With this approach, the renderer will only slow down if it is already on the last fully updated buffer, and the subsystems never have to wait for the renderer to finish.
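
A minimal lock-free C++17 sketch of the triple-buffered hand-off for one updating thread and one rendering thread. It uses a common formulation in which the index of the "last fully updated" buffer lives in one atomic together with a "fresh" flag, and each side swaps its own buffer with it via `exchange`, so the updater can never be handed the buffer being rendered. `TripleBuffer` and its method names are illustrative, not from the post:

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Lock-free triple buffer: one producer (game update), one consumer (renderer).
// Roles are tracked by index; data is never copied between slots.
template <typename T>
class TripleBuffer {
public:
    // Producer: the slot "currently being updated".
    T& writeBuffer() { return slots_[write_]; }

    // Producer: publish the finished slot as "last fully updated" and continue in
    // the slot that was previously in that role (never the one being rendered).
    void publish() {
        const std::uint8_t prev = middle_.exchange(
            static_cast<std::uint8_t>(write_ | kFresh), std::memory_order_acq_rel);
        write_ = prev & kIndexMask;
    }

    // Consumer: switch to the newest published state if there is one.
    // Returns false (keep drawing the current state) when nothing new was published.
    bool acquire() {
        if ((middle_.load(std::memory_order_relaxed) & kFresh) == 0) return false;
        const std::uint8_t prev = middle_.exchange(read_, std::memory_order_acq_rel);
        read_ = prev & kIndexMask;
        return true;
    }

    // Consumer: the slot "being rendered".
    const T& readBuffer() const { return slots_[read_]; }

private:
    static constexpr std::uint8_t kFresh = 0x4;      // set when middle_ holds unseen data
    static constexpr std::uint8_t kIndexMask = 0x3;  // low bits hold the slot index
    std::array<T, 3> slots_{};
    std::uint8_t write_ = 0;                // producer-only
    std::uint8_t read_ = 1;                 // consumer-only
    std::atomic<std::uint8_t> middle_{2};   // shared "last fully updated" slot + flag
};
```

If the updater publishes twice before the renderer looks, the older state is simply skipped, which matches the post's point that the subsystems never wait for the renderer.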

    Also, note that these buffers should not need to be block-copied over each other; use pointers or indices to denote which one currently has which role. And if memory is a large concern, you can manage a queue of changes rather than full buffers.

    Anyway, I hope this sheds some light on the high-level details of multithreaded engine programming. There appears to be a lot of development in the area, and it's one that I find quite interesting.
