

Concurrent rendering & game-logic


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

15 replies to this topic

#1 SillyCow   Members   -  Reputation: 849


Posted 28 October 2012 - 05:43 PM

I have an engine design question:

I would like to run my logic on a different thread than the rendering.
The problem: The logic thread moves the units around, generally changing their state, while the render thread tries to render them.

The most naive approach is to use a bunch of locks on the different entities. But as far as I know locks are performance eaters. Although I know that since hyper-threading that overhead has gone down significantly.

Another option is to copy the state on every render frame before I render it. But that also looks very expensive performance-wise.

I could also lock the renderer while the logic is running, but then what's the point of being multi-threaded?

So I'd like to know:

What is the correct approach to concurrent rendering while running game-logic?

Edited by SillyCow, 28 October 2012 - 05:47 PM.

My new android game : Enemies of the Crown

My previous android game : Killer Bees



#2 ic0de   Members   -  Reputation: 804


Posted 28 October 2012 - 10:27 PM

My current approach is just to keep everything on one thread. There is really not that much of a performance gain you can expect from multithreading your game. Chances are adding this functionality will be a HUGE hassle and probably slow down your code due to overhead. I use a simple Logic -> Physics -> Rendering order.

you know you program too much when you start ending sentences with semicolons;


#3 slicer4ever   Crossbones+   -  Reputation: 3326


Posted 28 October 2012 - 11:12 PM

I have to disagree with the poster above me. Multi-threading can have a huge beneficial impact on performance, particularly if you're doing anything physics-heavy, and there are several possible ways to approach multi-threading for your problem.

Personally, to solve the issue between render and logic, I use three buffers (or matrices) to represent each of my units. One is the real matrix, used only by the logic; the other two are shared between the logic and the renderer. When the logic thread finishes working on an object, it checks whether a draw buffer/matrix is available for that unit, copies its own matrix into the available buffer, and marks that buffer as swappable. When the renderer sees that the buffer is swappable, it swaps it with the other render buffer and clears the flag.

it's a bit of a memory hog, but it's an alternative to locking threads. depending on the size of the game, memory might not be an issue.
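
A minimal Java sketch of the per-unit handoff described above, assuming one logic thread and one render thread. The class and method names are hypothetical, and a 4x4 matrix stored as float[16] is an assumption, not something the post specifies.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical per-unit holder: one logic matrix plus two render matrices. */
final class UnitMatrixBuffers {
    private final float[] logic = new float[16]; // written only by the logic thread
    private float[] back = new float[16];        // logic copies into this one
    private float[] front = new float[16];       // renderer reads this one
    private final AtomicBoolean swappable = new AtomicBoolean(false);

    /** Logic thread: the matrix the simulation works on. */
    float[] logicMatrix() {
        return logic;
    }

    /**
     * Logic thread: copy the logic matrix into the free render buffer.
     * Returns false if the renderer has not picked up the previous copy yet.
     */
    boolean publish() {
        if (swappable.get()) {
            return false; // back buffer still belongs to the renderer
        }
        System.arraycopy(logic, 0, back, 0, logic.length);
        swappable.set(true); // hands the back buffer over to the renderer
        return true;
    }

    /** Render thread: swap in the newest copy if there is one, then return it. */
    float[] acquireForRender() {
        if (swappable.get()) {
            float[] tmp = front;
            front = back;
            back = tmp;
            swappable.set(false); // gives the old front buffer back to the logic thread
        }
        return front;
    }
}
```

The flag is the only shared synchronisation point: the logic thread touches the back buffer only while the flag is clear, and the renderer touches it only while the flag is set, so no lock is needed for this one-producer, one-consumer case.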

Edited by slicer4ever, 28 October 2012 - 11:25 PM.

Check out https://www.facebook.com/LiquidGames for some great games made by me on the Playstation Mobile market.

#4 SillyCow   Members   -  Reputation: 849


Posted 29 October 2012 - 12:22 AM



This is basically the copy approach, it's what I've used till now.
Problem is I'm developing for Java.
Java is very bad at bulk memory copies, since objects are allocated sporadically and there is no memcpy even if they weren't.
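
One possible workaround, sketched here as an assumption rather than something proposed in the thread: keep the per-unit data the renderer needs in flat primitive arrays instead of in scattered objects. Primitive arrays are contiguous, and the standard System.arraycopy can copy them in bulk. The UnitPositions class and its fields are hypothetical.

```java
/** Hypothetical structure-of-arrays snapshot: index i holds the position of unit i. */
final class UnitPositions {
    final float[] x;
    final float[] y;

    UnitPositions(int capacity) {
        x = new float[capacity];
        y = new float[capacity];
    }

    /** Bulk-copies every position from src; both snapshots must have the same capacity. */
    void copyFrom(UnitPositions src) {
        System.arraycopy(src.x, 0, x, 0, x.length);
        System.arraycopy(src.y, 0, y, 0, y.length);
    }
}
```

Whether this is fast enough on a given device is something only a profiler can answer; the sketch only shows that the copy itself does not have to walk individual objects.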



#5 SimonForsman   Crossbones+   -  Reputation: 5953


Posted 29 October 2012 - 12:45 AM



If you are using OpenGL you are already running the majority of the rendering concurrently with the game logic (on the GPU), so breaking off the command passing (which is all OpenGL does) to its own thread is pretty close to pointless. (You shouldn't send that many commands to OpenGL each frame anyway.)

A few things you can do if your renderer is slow:
1) Do not use immediate mode.
2) See point 1.

If you're not using immediate mode and it's still slow, you should profile and see where the slow parts are.
I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

#6 Lauris Kaplinski   Members   -  Reputation: 841


Posted 29 October 2012 - 05:58 AM



Pretty much this.

Physics normally benefits more from multithreading than rendering.

If you still want to use multithreading you can separate the rendering thread. I.e.
  • In your main thread submit relevant geometry to render lists (but you have to make sure that render lists do not reference data that will be modified in game logic thread)
  • While main thread proceeds to next logic/physics update the render thread wakes up and starts submitting render lists to OpenGL
  • You can even start a new render list submission before rendering is complete if you guarantee that they do not interfere with each other (i.e. use two separate top-level list containers and do not share objects)
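
The three steps above can be sketched in Java roughly as follows. This is only an illustration under assumptions: RenderItem and RenderListDemo are hypothetical names, the "drawing" is just a recorded frame number instead of OpenGL calls, and java.util.concurrent.Exchanger is one of several ways to hand the two list containers back and forth.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

/** Hypothetical render-list entry: plain copied values, no references to live game objects. */
final class RenderItem {
    final int frame;
    final float x, y;
    RenderItem(int frame, float x, float y) { this.frame = frame; this.x = x; this.y = y; }
}

final class RenderListDemo {
    /** Runs the given number of logic updates and returns the frame numbers the render thread "drew". */
    static List<Integer> run(final int frames) {
        final Exchanger<List<RenderItem>> exchanger = new Exchanger<>();
        final List<Integer> drawn = new ArrayList<>();

        Thread renderThread = new Thread(() -> {
            List<RenderItem> list = new ArrayList<>();
            try {
                for (int f = 0; f < frames; f++) {
                    // Hand back an empty list, receive the filled one.
                    list = exchanger.exchange(list);
                    for (RenderItem item : list) {
                        drawn.add(item.frame); // stand-in for the real OpenGL submission
                    }
                    list.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        renderThread.start();

        List<RenderItem> list = new ArrayList<>();
        try {
            for (int f = 0; f < frames; f++) {
                // Logic/physics update: submit copies of what should be drawn.
                list.add(new RenderItem(f, f * 1.5f, 0f));
                // Swap containers; the render thread draws this frame while we build the next.
                list = exchanger.exchange(list);
            }
            renderThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return drawn;
    }
}
```

With this handoff the logic thread can run at most one frame ahead of the renderer, because each exchange waits for the other side to arrive.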

Lauris Kaplinski

First technology demo of my game Shinya is out: http://lauris.kaplinski.com/shinya
Khayyam 3D - a freeware poser and scene builder application: http://khayyam.kaplinski.com/

#7 de_mattT   Members   -  Reputation: 308


Posted 29 October 2012 - 06:54 PM

I'm working on an over-engineered Tetris-style game at the moment (also in Java). I too have designed the game so that the rendering and physics can be done in parallel.
I have a Tile class for each Tetris square and a Group class which contains a number of Tiles which form the familiar Tetris shapes ('L', '2x2', etc).

I originally went for the "bunch of locks on different entities" approach; locking each Tile for rendering or for moving. I was doing the synchronisation within the Tile class, i.e. locking the position vector before reading / writing it. There was the added complication that when moving a Group of Tiles I had to lock all the Tiles in the Group and move them together.

I recently decided that this approach wasn't working. My Tile and Group classes were getting very cluttered and confusing with all the synchronisation going on. I felt that the complicated nature of the solution guaranteed problems and bugs. My new solution is based on the "copy the state on every frame" solution, but on each frame I only copy what has changed (e.g. new Tile objects added, Tile objects deleted, Tile objects moved). I have a Synchronisation class which keeps track of changes and then sends them to the rendering thread after each frame.

Synchronisation is responsible for all synchronisation between the physics and rendering threads, which makes the rest of the code simpler. Currently Synchronisation is implemented using the utilities provided in java.util.concurrent, without any need for low-level synchronisation, so it is also relatively simple.
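
The post does not show the Synchronisation class, so the following is only a guess at its general shape: the physics thread records changes during a frame and sends them as one batch, and the render thread applies whole batches to its own copy of the tile positions. TileChange, ChangeChannel and all method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical description of one change to a tile. */
final class TileChange {
    enum Kind { ADDED, MOVED, REMOVED }
    final Kind kind;
    final int tileId;
    final int x, y;
    TileChange(Kind kind, int tileId, int x, int y) {
        this.kind = kind; this.tileId = tileId; this.x = x; this.y = y;
    }
}

final class ChangeChannel {
    private final BlockingQueue<List<TileChange>> frames = new LinkedBlockingQueue<>();
    private List<TileChange> current = new ArrayList<>(); // physics thread only

    /** Physics thread: remember a change made during this frame. */
    void record(TileChange change) {
        current.add(change);
    }

    /** Physics thread: send all of this frame's changes as one batch. */
    void endFrame() {
        frames.add(current);
        current = new ArrayList<>();
    }

    /** Render thread: apply every complete batch to the renderer's own view (tileId -> {x, y}). */
    void applyTo(Map<Integer, int[]> view) {
        List<TileChange> batch;
        while ((batch = frames.poll()) != null) {
            for (TileChange c : batch) {
                if (c.kind == TileChange.Kind.REMOVED) {
                    view.remove(c.tileId);
                } else {
                    view.put(c.tileId, new int[] { c.x, c.y });
                }
            }
        }
    }
}
```

Sending whole frames rather than single changes means the renderer never sees a Group with only some of its Tiles moved.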

I'm still a long way off having a completed game and I cannot comment on performance, but in terms of simplicity of design the "copy the state every time" approach seems much better for my game.

Just thought I'd share my experience

Matt

#8 L. Spiro   Crossbones+   -  Reputation: 12803


Posted 29 October 2012 - 11:50 PM

I disagree with the notion that there is nothing to be gained from multi-threaded rendering.
Calls to OpenGL are not free. Building an internal command list takes more time than you think in a lot of cases, especially when flushes are induced (note that these are not supposed to stall the CPU side, but try adding one anywhere in your code once per frame and see how your new framerate feels).
Another example is that when you don’t use VBOs for your vertex buffers, the driver will internally copy the whole buffer to another location when you call ::glDrawArrays() or ::glDrawElements().
Obviously you should always be using VBOs, but there are other things that can also incur a copy (poorly aligned vertex data, etc.) and this is just one example to illustrate the point.
The point is that there could be a lot more heavy lifting happening under the hood than you realize, and multi-threaded rendering always gives you a boost in performance…


…when done properly.
If it is not done properly, the result will always be worse than not doing it at all, and even after my speech on the potential gains of multi-threaded rendering I am still going to have to suggest to you both the same as was already suggested: Don’t use it. Focus your time elsewhere, such as on using VBOs.

Neither of you seem to have the right idea about how it should work.
A renderer should not know what a Tile or Group is. The renderer should not be locking these things and they should not have anything to do with the “current state” that you want to copy to be rendered.
You should be building up a command list such that commands for rendering (and I am not talking about, “Render This Tile”, I am talking about, “Set Depth Test True”) can be submitted from your game thread and eaten from your render thread.
Synchronization is not hell; there are only 2 places for it: Sending commands to the command buffer and sending resources off to be deleted (which can’t be done until the renderer is done with them).

As the render thread only reads what is in the buffer, you are free to continue about moving your tiles and game objects.
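
To make the idea of a low-level command list concrete, here is a small Java sketch. It is an assumption-laden illustration, not L. Spiro's engine: the opcodes, the CommandBuffer class and the RenderBackend interface are all hypothetical, and the backend only records text where a real one would issue the corresponding OpenGL calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical backend; a real one would call into OpenGL on the render thread. */
interface RenderBackend {
    void setDepthTest(boolean enabled);
    void bindTexture(int textureId);
    void draw(int firstVertex, int vertexCount);
}

/** Records what it is asked to do, so the replay can be inspected without a GL context. */
final class RecordingBackend implements RenderBackend {
    final List<String> log = new ArrayList<>();
    public void setDepthTest(boolean enabled) { log.add("depthTest " + enabled); }
    public void bindTexture(int textureId) { log.add("bindTexture " + textureId); }
    public void draw(int firstVertex, int vertexCount) { log.add("draw " + firstVertex + " " + vertexCount); }
}

/** Commands are encoded as plain ints so that building the list allocates nothing per command. */
final class CommandBuffer {
    private static final int OP_DEPTH_TEST = 0;
    private static final int OP_BIND_TEXTURE = 1;
    private static final int OP_DRAW = 2;

    private int[] data = new int[256];
    private int size;

    private void put(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    // Game thread: append commands.
    void setDepthTest(boolean enabled) { put(OP_DEPTH_TEST); put(enabled ? 1 : 0); }
    void bindTexture(int textureId) { put(OP_BIND_TEXTURE); put(textureId); }
    void draw(int firstVertex, int vertexCount) { put(OP_DRAW); put(firstVertex); put(vertexCount); }
    void reset() { size = 0; }

    /** Render thread: execute every recorded command in order. */
    void replay(RenderBackend backend) {
        int i = 0;
        while (i < size) {
            switch (data[i++]) {
                case OP_DEPTH_TEST: backend.setDepthTest(data[i++] != 0); break;
                case OP_BIND_TEXTURE: backend.bindTexture(data[i++]); break;
                case OP_DRAW: backend.draw(data[i], data[i + 1]); i += 2; break;
                default: throw new IllegalStateException("corrupt command buffer");
            }
        }
    }
}
```

The sketch deliberately leaves out the handoff itself. In practice a finished buffer would be passed to the render thread at one synchronisation point (for example through a queue or by swapping two buffers), and the game thread must not touch it again until the renderer has replayed it.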


The goal is to make your command list free to build and much faster than the one built by the driver. The driver will later build yet another command list when your render thread starts executing a render, but the overhead of some expensive commands (such as any large data copies it needs to make for whatever reason) gets pushed off to another thread, leaving your main logic/physics thread free to continue.


However I have no idea how you can do this efficiently in Java. And unless you are 100% positive how you can, you are likely to only be wasting your time just to end up with a slower game than you had before.
Focus on removing redundant state changes, avoid resolves and logical restores, use best practices when working with OpenGL commands, etc.


L. Spiro
It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#9 Ashaman73   Crossbones+   -  Reputation: 6881


Posted 30 October 2012 - 12:39 AM

There is really not that much of a performance gain you can expect from multithreading your game.

Sorry, but this is BS.

What is the correct approach to concurrent rendering while running game-logic?

Game logic is always hard to run concurrently with other engine parts because its job is to manipulate state. Just think about a missile created ad hoc in the game logic loop: you really need to be careful not to create all the necessary entities (physics body, render model, sound files) on the fly and add them to the corresponding sub-systems right away. In this case you should work with proxies, which stay in an invalid state until they are properly integrated at a given sync point.
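
A rough Java sketch of such proxies, assuming the sync point is a moment when no subsystem is running. EntityProxy, SpawnQueue and the two subsystem lists are hypothetical stand-ins for real physics and render registries.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical handle the game logic gets back immediately; unusable until integrated. */
final class EntityProxy {
    final String kind;
    private volatile boolean valid;

    EntityProxy(String kind) { this.kind = kind; }

    boolean isValid() { return valid; }
    void markValid() { valid = true; }
}

final class SpawnQueue {
    private final ConcurrentLinkedQueue<EntityProxy> pending = new ConcurrentLinkedQueue<>();

    /** Game logic: ask for a new entity without touching any subsystem yet. */
    EntityProxy request(String kind) {
        EntityProxy proxy = new EntityProxy(kind);
        pending.add(proxy);
        return proxy;
    }

    /** Sync point: register all pending proxies with the subsystems and mark them valid. */
    int integrate(List<EntityProxy> physicsWorld, List<EntityProxy> renderScene) {
        int integrated = 0;
        EntityProxy proxy;
        while ((proxy = pending.poll()) != null) {
            physicsWorld.add(proxy);
            renderScene.add(proxy);
            proxy.markValid();
            integrated++;
        }
        return integrated;
    }
}
```

Game logic can keep a reference to the proxy right away, but has to check isValid() (or simply wait a frame) before treating the missile as part of the world.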

As L. Spiro said, I think that multithreaded rendering, or at least creating multiple command queues concurrently, can help.

But there's still hope of optimizing the rendering without using multithreaded rendering. The basic idea is to fill up the rendering queue faster than the GPU is capable of processing it. Once the CPU is done, the GPU is still running, leaving the CPU free for other tasks:

Simplified tasks in a single frame

      Render       Game logic       Physics      Audio
CPU |--------||------------------||---------||-----------|
GPU   |------------------------------|

If you don't want to touch the game logic you can try to extract as much as possible from the rendering task and process it concurrently like this

Simplified tasks in a single frame

CPU 1 |--| Render animation
CPU 2 |-----| Rendering pipeline
CPU 3 |---------| Physics
CPU 4 |--Audio---|S|---Game logic---|
GPU   |------------------------------|

I.e. extract the calculation of the animation for the next frame from your rendering pipeline (double buffering); there is no need to stall the pipeline filling here. S is the sync point: you start the game logic once all the other tasks have finished.
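
The second diagram can be approximated in Java with an ExecutorService: the independent tasks are submitted together, invokeAll acts as the sync point S, and game logic runs only afterwards. This is a sketch under assumptions; FrameScheduler is a hypothetical name and the tasks merely log their own names instead of doing real work.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;

final class FrameScheduler {
    /** Runs one frame and returns the order in which the tasks reported in. */
    static List<String> runFrame(ExecutorService pool) {
        final List<String> log = Collections.synchronizedList(new ArrayList<String>());

        // Tasks that do not depend on each other within a frame.
        List<Callable<Void>> parallel = new ArrayList<>();
        for (String name : Arrays.asList("animation", "render pipeline", "physics", "audio")) {
            parallel.add(() -> {
                log.add(name); // stand-in for the real subsystem update
                return null;
            });
        }

        try {
            // Sync point S: invokeAll returns only after every task has completed.
            pool.invokeAll(parallel);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // Game logic runs alone, so it can manipulate state freely.
        log.add("game logic");
        return log;
    }
}
```

A caller would create the pool once, for example with Executors.newFixedThreadPool(4), and reuse it for every frame rather than creating threads per frame.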

Edited by Ashaman73, 30 October 2012 - 12:41 AM.


#10 SillyCow   Members   -  Reputation: 849


Posted 30 October 2012 - 12:32 PM

Don’t use it. Focus your time elsewhere, such as on using VBOs.


I've done many single threaded renderers (Using VBOs since forever). Started to get bored, so I thought I'd give multi-threaded rendering a try. I'd love the extra complexity to shake things up :-) .


As L. Spiro said, I think that multithreaded rendering, or at least creating multiple command queues concurrently, can help.


So I gather that queueing GL render commands is the "traditional" way to go?

Edited by SillyCow, 30 October 2012 - 12:34 PM.



#11 karwosts   Members   -  Reputation: 832


Posted 30 October 2012 - 12:59 PM

I just released my first game, so I thought I'd throw in my two cents.

I had the same idea as you originally, "wouldn't this be great if the logic could run parallel to the graphics, I could get twice the perf!", so I went down that road with my game design. It's a somewhat simple puzzle game that's heavier on the graphics than anything else.

I separated all of my drawing into a second thread, and had a collection of "proxy" objects in the logic thread, such that when I wanted to do something graphics related, I called a method on the proxy which cached it, and then during the syncpoint between threads I transferred the proxy commands to the actual graphics objects. The idea was that my two threads would only need to be synchronized for a short window, and then the logic for the next frame could run parallel to the drawing.

A neat idea in theory, but all in all, looking back, I do somewhat regret it. I failed to accurately predict where my bottlenecks would be, and it turned out my game spent nearly 95% of the time in the render loop, such that I got a negligible benefit from putting them in parallel. I also had many hard-to-fix bugs and confusing moments when trying to keep the threads separate: "Can I call this method on the non-graphics thread? Does the order of calls to the proxy matter?" etc. I probably lengthened my development time by two months for almost no noticeable performance gain.

So I'll say this can certainly work if you want it to, but make sure you actually will need it before unleashing a huge amount of extra headaches on yourself. Sounds like you're up for a challenge, so maybe you'd like to go this way, but if you're looking at it from a business standpoint make sure you can justify the extra development time.
My Projects:
Portfolio Map for Android - Free Visual Portfolio Tracker
Electron Flux for Android - Free Puzzle/Logic Game

#12 SillyCow   Members   -  Reputation: 849


Posted 30 October 2012 - 03:37 PM

if you're looking at it from a business standpoint make sure you can justify the extra development time.


No business is concerned. I make my money at my day-job.

If business was concerned, I'd be using a game engine ( probably Unity ) and focusing on time-to-market.

The current game I am working on is a strategy game with lots of path-finding (which takes up ~70% of the CPU).

So I thought to myself: Why not go multi-threaded?

I have done multi-threaded before in my own sick & twisted ways.

But now that I've realized how much of a common problem this is, I would like to know if a "standard" design pattern exists to render multi-threaded.

I am not looking for one of those "don't do it, cause it's a waste of time" replies.

I am curious & reckless, meaning I am willing to make an effort to satisfy my curiosity even if it kills my project, because I like learning new stuff.



#13 L. Spiro   Crossbones+   -  Reputation: 12803


Posted 30 October 2012 - 06:12 PM

I've done many single threaded renderers (Using VBOs since forever). Started to get bored, so I thought I'd give multi-threaded rendering a try. I'd love the extra complexity to shake things up :-) .

There are many ways to go about that. Here are a few topics I plan to cover in my upcoming book:
  • Redundancy Checks
    • Immediate
    • Deferred
  • Render Queues
  • Fill-Rate Reduction
  • Bandwidth Reduction
  • Proper Vertex Buffer Updates
  • Efficient Swapping of Render Targets
    • Avoiding Logical Buffer Loads
  • Uniform Redundancy Checks
  • Shader Redundancy Checks
  • Physically Based Rendering
    • Physically Based Blinn-Phong
    • Efficient Oren-Nayar
  • Frustum Culling
  • Multi-Threaded Rendering
There are tons of other things you can do.


As L. Spiro said, I think that multithreaded rendering, or at least creating multiple command queues concurrently, can help.


So I gather that queueing GL render commands is the "traditional" way to go?

It is. This is what we do on our in-house engine for Xbox 360, PlayStation 3, PlayStation Vita, etc. We used it in Infinite Undiscovery, Star Ocean 5, Valkyrie Profile, etc.
It is the time-tested traditional way to do it.


But now that I've realized how much of a common problem this is, I would like to know if a "standard" design pattern exists to render multi-threaded.

I am not looking for one of those "don't do it, cause it's a waste of time" replies.

I am curious & reckless, meaning I am willing to make an effort to satisfy my curiosity even if it kills my project, because I like learning new stuff.

I already explained the underlying concepts, though I really don’t know how well it will work in Java even if you did everything perfectly.
As mentioned before, my book will have a chapter on multi-threaded rendering and include a running C++ demo with and without multi-threading for comparison. Although it will be using OpenGL ES 2.0, the same concept applies to any renderer and any API.
Unfortunately it will not likely be available for another year.

There are still many other ways to feed your appetite. Have you done physically based rendering before? Have you implemented render queues?
Even if you insist on continuing with multi-threaded rendering I still suggest you sit on your hands and think about it. When you are done, think about it some more.

As was mentioned, if you can’t anticipate where the bottlenecks will be, you can’t succeed at the task, period.


L. Spiro

#14 L. Spiro   Crossbones+   -  Reputation: 12803


Posted 31 October 2012 - 01:49 AM

Also if you insist on continuing with multi-threaded rendering instead of thinking about it for a few months, might I suggest starting a new project whose goal is specifically multi-threaded rendering?
No need to mess up your current project since it will probably take a few tries to get it right.


L. Spiro


Edited by L. Spiro, 09 February 2013 - 07:27 PM.


#15 SillyCow   Members   -  Reputation: 849


Posted 09 February 2013 - 04:32 PM

I got around to implementing render-lists.

It was relatively painless and did the trick.

Thanks for the great advice.


Edited by SillyCow, 09 February 2013 - 04:33 PM.



#16 Polarist   Members   -  Reputation: 160


Posted 09 February 2013 - 07:45 PM

If you're still considering the multithreaded approach, take a look at the Smoke demo from Intel, which builds an n-core scalable game engine using a task-based approach (using TBB).

A task-based approach makes it quite painless to construct multithreaded processes. To get up and running, it took me just a couple hours to go from a single-threaded design to one that was achieving 100% CPU utilization over 4 cores during a few of my processing heavy operations.  There was certainly much more work to be done afterwards to fully take advantage of multithreading, but it was quite easy to speed up the performance sensitive pieces for my needs.

 

The idea behind a task-based and scalable solution is basically to "future-proof" the engine for any later developments in hardware, and to be decoupled from the number of  physical cores available on the machine.  I.e. the engine should maximize hardware usage regardless of whether you are allotted 1 or 20 cores.  

 

There are fairly modern game engines that take the older approach of assigning subsystems their own threads.  (E.g. the renderer gets 1 thread, the physics system gets 1 thread, etc.)  But as far as I'm aware, that method is antiquated and no longer advisable.

 

The "new" way is basically to cut up all your game's subsystems into smaller, independent tasks.  You then assign those tasks to cores as they finish their current tasks.  So for instance, your physics calculations could be split up into 8 tasks, your render queue construction could be split up into 4 tasks, your particle systems could be split up into 6 tasks, whatever.  You do have to keep in mind that certain systems need to come after others, but you only need to ensure that at a high level.  You also need to consider that, given the high degree of concurrency, you should be using lock-free data structures and considering thread-local storage.  After all those tasks are completed, you assemble their outcomes into a completed game state that you can pass on for rendering.
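
As a small illustration of cutting one subsystem into independent tasks, the Java sketch below splits a position update into chunks and runs them on a thread pool. It uses java.util.concurrent rather than TBB, and ChunkedPhysics and its parameters are hypothetical. Each task owns a disjoint slice of the arrays, so no locks are needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

final class ChunkedPhysics {
    /** Advances pos by vel * dt, split into `tasks` independent chunks (tasks must be >= 1). */
    static void integrate(float[] pos, float[] vel, float dt, int tasks, ExecutorService pool) {
        List<Callable<Void>> jobs = new ArrayList<>();
        int chunk = (pos.length + tasks - 1) / tasks; // ceiling division
        for (int t = 0; t < tasks; t++) {
            final int from = Math.min(pos.length, t * chunk);
            final int to = Math.min(pos.length, from + chunk);
            jobs.add(() -> {
                // This task is the only writer for indices [from, to).
                for (int i = from; i < to; i++) {
                    pos[i] += vel[i] * dt;
                }
                return null;
            });
        }
        try {
            // Wait for every chunk and surface any exception a task threw.
            for (Future<Void> done : pool.invokeAll(jobs)) {
                done.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The number of tasks is independent of the number of threads in the pool, which is the point of the approach: the same code runs on 1 core or 20.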

 

 

So beyond that model of concurrent programming, there's also a consideration for how asynchronous you want your "game" to be from your renderer.  From my brief dive into the topic, I understand that there are two general architectures to consider: double-buffered and triple-buffered.

 

In a double-buffered setup, you have two copies of your game state, one of the "last" frame and one of the "next" frame.  The "last" frame should be read-only at this point; it is read by the renderer for drawing and by the other systems for calculating the "next" frame.  The subsystems should have clear ownership of write access to different parts of the "next" frame.  One benefit of this second buffer is that it saves you from the headache of making sure every little thing happens in the correct order, as it removes a lot of potential for collisions.  In this approach, as soon as all the subsystems and rendering are complete, you swap your "next" frame with your "last" frame and repeat.
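
A minimal Java sketch of the double-buffered state, assuming the game state is just an array of floats. DoubleBufferedState is a hypothetical name, and the update rule (each cell takes the value of its right-hand neighbour) was picked only because it would give wrong results if done in place.

```java
/** Hypothetical double-buffered state: one read-only "last" frame, one writable "next" frame. */
final class DoubleBufferedState {
    private final float[][] buffers;
    private int lastIndex = 0; // which buffer currently plays the "last" role

    DoubleBufferedState(int size) {
        buffers = new float[2][size];
    }

    /** Read-only view for the renderer and for subsystems computing the next frame. */
    float[] last() { return buffers[lastIndex]; }

    /** Write target for the subsystems. */
    float[] next() { return buffers[1 - lastIndex]; }

    /** Call only once all subsystems and the renderer are done with the current frame. */
    void swap() { lastIndex = 1 - lastIndex; }

    /** Example update: every cell reads a neighbour from "last" and writes itself into "next". */
    void step() {
        float[] in = last();
        float[] out = next();
        for (int i = 0; i < in.length; i++) {
            out[i] = in[(i + 1) % in.length];
        }
        swap();
    }
}
```

Because every read goes to the "last" buffer, the order in which the cells (or subsystems) are processed does not affect the result. No data is copied on swap; only the index changes.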

 

The triple-buffered setup is a similar idea, except that the game and the renderer do not need to run in lock-step.  The three buffers for this approach can be described as the one "being rendered", the one "last fully updated", and the one "currently being updated".  When the renderer finishes rendering, it immediately moves on to the "last fully updated" buffer (unless it is already there).  When the subsystems finish calculating, they reassign the "currently being updated" buffer as the "last fully updated" one and begin storing the next game state in the previous "last fully updated" buffer.  With this approach, the renderer only has to wait if it is already on the last fully updated buffer, and the subsystems never have to wait for the renderer to finish.

 

Also, note that these buffers should not need to be block-copied over each other; they should use pointers or indices to denote which one currently has which role.  And if memory is a large concern, rather than full buffers, you can manage a queue of changes instead.
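
The triple-buffered variant with index swapping can be sketched in Java as follows, assuming exactly one writer thread (the game) and one reader thread (the renderer). TripleBuffer is a hypothetical name and the state is again just a float array; the dirty-bit trick is a common way to build this, not something the post prescribes.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical single-writer, single-reader triple buffer that swaps indices, never data. */
final class TripleBuffer {
    private static final int DIRTY = 4;     // flag bit: the middle slot holds unread data
    private static final int INDEX_MASK = 3;

    private final float[][] slots;
    private int writeIndex = 0;             // "currently being updated" (game thread only)
    private int readIndex = 1;              // "being rendered" (render thread only)
    private final AtomicInteger middle = new AtomicInteger(2); // "last fully updated"

    TripleBuffer(int size) {
        slots = new float[3][size];
    }

    /** Game thread: the slot to fill. It holds stale data, so overwrite it completely. */
    float[] writeSlot() {
        return slots[writeIndex];
    }

    /** Game thread: make the filled slot the "last fully updated" one; never blocks. */
    void publish() {
        int previous = middle.getAndSet(writeIndex | DIRTY);
        writeIndex = previous & INDEX_MASK;
    }

    /** Render thread: switch to the newest complete state if there is one, then return it. */
    float[] latest() {
        if ((middle.get() & DIRTY) != 0) {
            int previous = middle.getAndSet(readIndex); // our old slot goes back, marked clean
            readIndex = previous & INDEX_MASK;
        }
        return slots[readIndex];
    }
}
```

If the game publishes twice before the renderer looks, the older state is simply skipped; if the renderer looks twice without a new publish, it gets the same slot again.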

 

Anyway, hope I shed some high level details on multithreaded engine programming.  There appears to be a lot of development in the area, and it's one that I find quite interesting.


Edited by Polarist, 09 February 2013 - 09:03 PM.




