Concurrent rendering & game-logic
I have an engine design question:
I would like to run my logic on a different thread than the rendering.
The problem: the logic thread moves the units around, generally changing their state, while the render thread tries to render them.
The most naive approach is to use a bunch of locks on the different entities. But as far as I know, locks are performance eaters, although I have heard that this overhead has gone down significantly since hyper-threading.
Another option is to copy the state on every render frame before I render it. But that also looks very expensive performance-wise.
I could also lock the renderer while the logic is running, but then what's the point of being multi-threaded?
So I'd like to know:
What is the correct approach to concurrent rendering while running game-logic?
My current approach is just to keep everything on one thread. There is really not that much of a performance gain you can expect from multithreading your game. Chances are that adding this functionality will be a HUGE hassle and will probably slow down your code due to overhead. I use a simple Logic -> Physics -> Rendering order.
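A minimal sketch of that single-threaded Logic -> Physics -> Rendering order, assuming a fixed time step. `updateLogic`, `stepPhysics` and `render` are hypothetical placeholders that only record the order in which they run:

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded frame loop: logic, then physics, then rendering, always in that order. */
public class SingleThreadLoop {
    // Records which stage ran, so the ordering is easy to check.
    static final List<String> trace = new ArrayList<>();

    // Hypothetical stage methods; a real game would do actual work here.
    static void updateLogic(double dt) { trace.add("logic"); }
    static void stepPhysics(double dt) { trace.add("physics"); }
    static void render()               { trace.add("render"); }

    /** Runs a fixed number of frames with a fixed time step. */
    static void run(int frames, double dt) {
        for (int i = 0; i < frames; i++) {
            updateLogic(dt);
            stepPhysics(dt);
            render(); // nothing else touches game state now, so no locks or copies are needed
        }
    }

    public static void main(String[] args) {
        run(2, 1.0 / 60.0);
        System.out.println(trace); // [logic, physics, render, logic, physics, render]
    }
}
```

The appeal of this design is that the renderer always sees a consistent world for free; the cost is that logic, physics and rendering can never overlap.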
I have to disagree with the poster above me: multi-threading can have a huge positive impact on performance, particularly if you're doing anything physics-heavy, and there are several possible ways to approach multi-threading for your problem.
Personally, to solve the render/logic issue, I use three buffers (or matrices) to represent my units. One is the real matrix, used by the logic; the other two are shared between the logic and the renderer. When the logic thread finishes working on an object, it checks whether a draw buffer/matrix is available for that unit, copies its matrix into the available buffer, and marks the buffer as swappable. When the renderer sees that the buffer is swappable, it swaps it with the other render buffer and clears the flag.
It's a bit of a memory hog, but it's an alternative to locking threads. Depending on the size of the game, memory might not be an issue.
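The scheme above can be sketched in Java as a lock-free triple buffer. This is an illustration under assumptions, not the poster's code: each unit's state is a 16-float matrix, the "swappable" flag is packed into an `AtomicInteger` together with the index of the buffer in transit, and because the writer always receives a buffer back, it never has to wait for one to become available. The class name `TripleBuffer` is hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Lock-free triple buffer for one unit's 4x4 matrix (16 floats).
 * The logic thread keeps its own "real" matrix and copies it into the spare buffer;
 * the render thread swaps the published buffer in when the "swappable" flag is set.
 */
public class TripleBuffer {
    private static final int INDEX_MASK = 0b011;
    private static final int SWAPPABLE  = 0b100;   // flag: the middle buffer holds unread data

    private final float[][] buffers = { new float[16], new float[16], new float[16] };
    private int spareIndex = 0;   // owned by the logic thread
    private int readIndex = 1;    // owned by the render thread
    // Index of the buffer in transit between the threads, plus the SWAPPABLE flag.
    private final AtomicInteger middle = new AtomicInteger(2);

    /** Logic thread: copy the real matrix into the spare buffer and mark it swappable. */
    public void publish(float[] realMatrix) {
        System.arraycopy(realMatrix, 0, buffers[spareIndex], 0, 16);
        int previous = middle.getAndSet(spareIndex | SWAPPABLE); // atomic publish with volatile semantics
        spareIndex = previous & INDEX_MASK;                      // take back whatever was in the middle
    }

    /** Render thread: swap in the newest published matrix, if any, and return the one to draw. */
    public float[] readBuffer() {
        if ((middle.get() & SWAPPABLE) != 0) {
            int previous = middle.getAndSet(readIndex);          // hand back the old buffer, clear the flag
            readIndex = previous & INDEX_MASK;
        }
        return buffers[readIndex];
    }

    public static void main(String[] args) {
        TripleBuffer unit = new TripleBuffer();
        float[] real = new float[16];
        real[12] = 5f;                  // x translation in a column-major matrix
        unit.publish(real);
        real[12] = 6f;                  // logic keeps modifying its own copy
        System.out.println(unit.readBuffer()[12]); // 5.0
    }
}
```

Each publish copies only 16 floats, and the renderer never sees a half-written matrix because it only reads a buffer after the atomic hand-off.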
This is basically the copy approach; it's what I've used until now.
The problem is that I'm developing in Java.
Java is very bad at bulk memory copies, since objects are allocated all over the heap, and even if they weren't, there is no memcpy for objects.
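One common workaround, not proposed in the thread, is to keep unit state in flat primitive arrays (a structure-of-arrays layout) rather than one object per unit; `System.arraycopy` then copies a whole snapshot in bulk. The `UnitSnapshot` class below is a hypothetical sketch of that idea:

```java
/**
 * Structure-of-arrays unit state: positions live in primitive arrays,
 * so a full snapshot is a couple of System.arraycopy calls instead of a per-object deep copy.
 */
public class UnitSnapshot {
    final float[] x;
    final float[] y;
    int count;

    UnitSnapshot(int capacity) {
        x = new float[capacity];
        y = new float[capacity];
    }

    /** Adds a unit and returns its index (no growth, to keep the sketch short). */
    int add(float px, float py) {
        x[count] = px;
        y[count] = py;
        return count++;
    }

    /**
     * Bulk-copies the live units into dst. Must run while the logic thread is not writing,
     * e.g. at the end of a logic tick; dst needs at least the same capacity.
     */
    void copyTo(UnitSnapshot dst) {
        System.arraycopy(x, 0, dst.x, 0, count);
        System.arraycopy(y, 0, dst.y, 0, count);
        dst.count = count;
    }

    public static void main(String[] args) {
        UnitSnapshot live = new UnitSnapshot(1000);
        UnitSnapshot forRenderer = new UnitSnapshot(1000);
        int tank = live.add(10f, 20f);
        live.copyTo(forRenderer);
        live.x[tank] += 1f;                       // logic moves on; the snapshot is unaffected
        System.out.println(forRenderer.x[tank]);  // 10.0
    }
}
```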
If you are using OpenGL, you are already running the majority of the rendering concurrently with the game logic (on the GPU), so breaking the command passing (which is all OpenGL does) off into its own thread is pretty close to pointless. (You shouldn't send that many commands to OpenGL each frame anyway.)
A few things you can do if your renderer is slow:
1) Do not use immediate mode.
2) See point 1.
If you're not using immediate mode and it's still slow, you should profile and see where the slow parts are.
Pretty much this.
Physics normally benefits more from multithreading than rendering.
If you still want to use multithreading, you can separate out the rendering thread, e.g.:
- In your main thread, submit relevant geometry to render lists (but make sure the render lists do not reference data that will be modified in the game logic thread)
- While the main thread proceeds to the next logic/physics update, the render thread wakes up and starts submitting the render lists to OpenGL
- You can even start a new render list submission before rendering is complete, if you guarantee that the lists do not interfere with each other (i.e. use two separate top-level list containers and do not share objects)
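The render-list scheme above can be sketched with `java.util.concurrent.Exchanger`, which swaps two objects between a pair of threads. This assumes exactly one logic thread and one render thread; `DrawItem` is a hypothetical entry type, and the string log stands in for the OpenGL calls the render thread would make:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Exchanger;

/**
 * Two render lists swapped between the main (logic) thread and a render thread.
 * The main thread fills one list while the render thread drains the other.
 */
public class RenderListHandoff {
    /** Draw entry holding copied values, never a reference to a mutable game object. */
    static final class DrawItem {
        final int meshId;
        final float x, y;
        DrawItem(int meshId, float x, float y) { this.meshId = meshId; this.x = x; this.y = y; }
    }

    /** Runs 'frames' frames and returns what the render thread "submitted", in order. */
    static List<String> run(int frames) {
        Exchanger<List<DrawItem>> exchanger = new Exchanger<>();
        List<String> submitted = new ArrayList<>();   // written only by the render thread

        Thread renderThread = new Thread(() -> {
            List<DrawItem> renderSide = new ArrayList<>();
            try {
                for (int f = 0; f < frames; f++) {
                    renderSide = exchanger.exchange(renderSide); // give back the drained list, get a full one
                    for (DrawItem d : renderSide) {
                        // Stand-in for the actual OpenGL submission of this item.
                        submitted.add("frame " + f + ": mesh " + d.meshId + " at " + d.x);
                    }
                    renderSide.clear();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        renderThread.start();

        try {
            List<DrawItem> logicSide = new ArrayList<>();
            float unitX = 0f;                                  // game state, owned by the main thread
            for (int f = 0; f < frames; f++) {
                unitX += 1f;                                   // logic/physics update
                logicSide.add(new DrawItem(7, unitX, 0f));     // copy what the renderer needs
                logicSide = exchanger.exchange(logicSide);     // hand it over, receive the drained list
            }
            renderThread.join();   // join also makes 'submitted' safely visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return submitted;
    }

    public static void main(String[] args) {
        run(3).forEach(System.out::println);
    }
}
```

Because each `DrawItem` holds copied values, the logic thread can keep mutating its own game state while the render thread drains the other list. `Exchanger` blocks whichever side arrives first, so the faster thread waits for the slower one at most once per frame.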
I'm working on an over-engineered Tetris-style game at the moment (also in Java). I too have designed the game so that the rendering and physics can be done in parallel.
I have a Tile class for each Tetris square and a Group class which contains a number of Tiles which form the familiar Tetris shapes ('L', '2x2', etc).
I originally went for the "bunch of locks on different entities" approach; locking each Tile for rendering or for moving. I was doing the synchronisation within the Tile class, i.e. locking the position vector before reading / writing it. There was the added complication that when moving a Group of Tiles I had to lock all the Tiles in the Group and move them together.
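For comparison, the per-Tile locking described above might look roughly like this. It is a reconstruction for illustration, not Matt's code; the lock-ordering rule (ascending `id`) is an assumption added to avoid deadlocks when two groups share tiles:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** The per-entity locking approach: every Tile guards its own position with a lock. */
public class LockedTiles {
    static final class Tile {
        final int id;
        final ReentrantLock lock = new ReentrantLock();
        private int x, y;                         // guarded by 'lock'

        Tile(int id, int x, int y) { this.id = id; this.x = x; this.y = y; }

        /** Render thread: lock just long enough to copy the position out. */
        int[] position() {
            lock.lock();
            try {
                return new int[] { x, y };
            } finally {
                lock.unlock();
            }
        }
    }

    static final class Group {
        final List<Tile> tiles;

        Group(List<Tile> members) {
            // Always lock in ascending id order so two groups sharing tiles cannot deadlock.
            tiles = new ArrayList<>(members);
            tiles.sort(Comparator.comparingInt((Tile t) -> t.id));
        }

        /** Logic thread: a group must move as a unit, so every tile lock is held at once. */
        void moveBy(int dx, int dy) {
            int locked = 0;
            try {
                for (Tile t : tiles) { t.lock.lock(); locked++; }
                for (Tile t : tiles) { t.x += dx; t.y += dy; }
            } finally {
                for (int i = locked - 1; i >= 0; i--) tiles.get(i).lock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        Tile a = new Tile(2, 0, 0), b = new Tile(1, 1, 0);
        Group g = new Group(Arrays.asList(a, b));
        g.moveBy(0, 3);
        // A renderer reading a and b one at a time could still see the group half-moved
        // if moveBy ran between the two reads; this is part of what makes the approach messy.
        System.out.println(Arrays.toString(a.position()) + " " + Arrays.toString(b.position())); // [0, 3] [1, 3]
    }
}
```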
I recently decided that this approach wasn't working. My Tile and Group classes were getting very cluttered and confusing with all the synchronisation going on. I felt that the complicated nature of the solution guaranteed problems and bugs. My new solution is based on the "copy the state on every frame" solution, but on each frame I only copy what has changed (e.g. new Tile objects added, Tile objects deleted, Tile objects moved). I have a Synchronisation class which keeps track of changes and then sends them to the rendering thread after each frame.
The Synchronisation class is responsible for synchronisation between the physics and rendering threads, which keeps the rest of the code simpler. It is currently implemented with the utilities provided in java.util.concurrent, without any need for low-level synchronisation, so it is also relatively simple.
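Matt's Synchronisation class isn't shown, so the following is only a guess at its shape: the physics thread records added, moved and removed tiles into a per-frame list, and `endFrame()` passes the whole list to the render thread through a `LinkedBlockingQueue`. Method names such as `tileMoved` and `applyPending` are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Collects tile changes on the physics thread and hands one batch per frame to the render thread. */
public class Synchronisation {
    enum Kind { ADDED, MOVED, REMOVED }

    /** Immutable change record carrying values, never a reference to a live Tile. */
    static final class Change {
        final Kind kind;
        final int tileId, x, y;
        Change(Kind kind, int tileId, int x, int y) {
            this.kind = kind; this.tileId = tileId; this.x = x; this.y = y;
        }
    }

    private final BlockingQueue<List<Change>> batches = new LinkedBlockingQueue<>();
    private List<Change> pending = new ArrayList<>();              // physics thread only
    private final Map<Integer, int[]> renderView = new HashMap<>(); // render thread only

    // ---- physics thread ----
    void tileAdded(int id, int x, int y) { pending.add(new Change(Kind.ADDED, id, x, y)); }
    void tileMoved(int id, int x, int y) { pending.add(new Change(Kind.MOVED, id, x, y)); }
    void tileRemoved(int id)             { pending.add(new Change(Kind.REMOVED, id, 0, 0)); }

    /** End of a physics frame: publish this frame's changes as one batch. */
    void endFrame() {
        batches.add(pending);          // the queue safely publishes the list to the consumer
        pending = new ArrayList<>();   // never touch the published list again
    }

    // ---- render thread ----
    /** Applies all finished batches without blocking and returns the render thread's tile positions. */
    Map<Integer, int[]> applyPending() {
        List<Change> batch;
        while ((batch = batches.poll()) != null) {
            for (Change c : batch) {
                if (c.kind == Kind.REMOVED) {
                    renderView.remove(c.tileId);
                } else {
                    renderView.put(c.tileId, new int[] { c.x, c.y });
                }
            }
        }
        return renderView;
    }

    public static void main(String[] args) {
        Synchronisation sync = new Synchronisation();
        sync.tileAdded(1, 4, 0);
        sync.endFrame();
        sync.tileMoved(1, 4, 1);
        sync.endFrame();
        System.out.println(sync.applyPending().get(1)[1]); // 1
    }
}
```

Only changes cross the thread boundary, and each batch is handed over whole, so the render thread never sees a frame that is only half applied.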
I'm still a long way off having a completed game and I cannot comment on performance, but in terms of simplicity of design the "copy the state every time" approach seems much better for my game.
Just thought I'd share my experience.
Matt
I disagree with the notion that there is nothing to be gained from multi-threaded rendering.
Calls to OpenGL are not free. Building an internal command list takes more time than you think in a lot of cases, especially when flushes are induced (note that these are not supposed to stall the CPU side, but try adding one anywhere in your code once per frame and see how your new framerate feels).
Another example: when you don’t use VBOs for your vertex buffers, the driver will internally copy the whole buffer to another location when you call ::glDrawArrays() or ::glDrawElements().
Obviously you should always be using VBOs, but other things can also incur a copy (poorly aligned vertex data, etc.); this is just one example to illustrate the point.
The point is that there could be a lot more heavy lifting happening under the hood than you realize, and multi-threaded rendering always gives you a boost in performance…
…when done properly.
If it is not done properly, the result will always be worse than not doing it at all. Even after my speech on the potential gains of multi-threaded rendering, I still have to give you both the same advice that was already given: Don’t use it. Focus your time elsewhere, such as on using VBOs.
Neither of you seem to have the right idea about how it should work.
A renderer should not know what a Tile or Group is. The renderer should not be locking these things and they should not have anything to do with the “current state” that you want to copy to be rendered.
You should be building up a command list such that commands for rendering (and I am not talking about, “Render This Tile”, I am talking about, “Set Depth Test True”) can be submitted from your game thread and eaten from your render thread.
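A minimal sketch of such a command list, assuming commands are encoded as ints in a growable array. The opcodes and the `Backend` interface are hypothetical; a real render thread would forward them to OpenGL. Handing a filled buffer to the render thread (for example through a `java.util.concurrent.BlockingQueue`) would be the only synchronization needed for the commands themselves:

```java
import java.util.Arrays;

/** Low-level render commands recorded as ints on the game thread and replayed on the render thread. */
public class CommandBuffer {
    // Hypothetical opcodes; a real renderer would define many more.
    static final int SET_DEPTH_TEST = 1;   // 1 argument: 0 or 1
    static final int BIND_TEXTURE   = 2;   // 1 argument: texture handle
    static final int DRAW           = 3;   // 2 arguments: vertex buffer handle, vertex count

    /** Hypothetical backend; on the render thread a real one would forward these calls to OpenGL. */
    interface Backend {
        void setDepthTest(boolean enabled);
        void bindTexture(int handle);
        void draw(int vertexBuffer, int vertexCount);
    }

    private int[] data = new int[64];
    private int size;

    private void put(int value) {
        if (size == data.length) data = Arrays.copyOf(data, size * 2); // grow when full
        data[size++] = value;
    }

    // ---- recording: game thread ----
    void setDepthTest(boolean enabled)  { put(SET_DEPTH_TEST); put(enabled ? 1 : 0); }
    void bindTexture(int handle)        { put(BIND_TEXTURE); put(handle); }
    void draw(int vbo, int vertexCount) { put(DRAW); put(vbo); put(vertexCount); }

    // ---- replay: render thread, after the buffer has been handed over ----
    void execute(Backend gl) {
        int i = 0;
        while (i < size) {
            int op = data[i++];
            switch (op) {
                case SET_DEPTH_TEST: gl.setDepthTest(data[i++] != 0); break;
                case BIND_TEXTURE:   gl.bindTexture(data[i++]); break;
                case DRAW: {
                    int vbo = data[i++];
                    int count = data[i++];
                    gl.draw(vbo, count);
                    break;
                }
                default: throw new IllegalStateException("unknown opcode " + op);
            }
        }
    }

    /** Empties the buffer so it can be reused for another frame. */
    void reset() { size = 0; }

    public static void main(String[] args) {
        CommandBuffer cb = new CommandBuffer();
        cb.setDepthTest(true);
        cb.bindTexture(42);
        cb.draw(7, 36);
        cb.execute(new Backend() {
            public void setDepthTest(boolean enabled) { System.out.println("depth test " + enabled); }
            public void bindTexture(int handle)       { System.out.println("bind texture " + handle); }
            public void draw(int vbo, int count)      { System.out.println("draw " + vbo + " x" + count); }
        });
    }
}
```

Recording is just int writes into an array, which is the "cheap to build" property described here; the expensive driver work happens later, on the render thread, inside `execute`.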
Synchronization is not hell; there are only 2 places for it: Sending commands to the command buffer and sending resources off to be deleted (which can’t be done until the renderer is done with them).
As the render thread only reads what is in the buffer, you are free to continue about moving your tiles and game objects.
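For the second synchronization point named above, deleting resources, one approach (a sketch, not L. Spiro's implementation) is to tag each released handle with the frame that may still use it, and let the render thread free it only after that frame has executed. `DeferredDeletion` and its methods are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Delays deleting GPU resources until the render thread has finished every frame that may use them. */
public class DeferredDeletion {
    private static final class Pending {
        final int handle;
        final long frame;   // last frame whose commands may still reference the handle
        Pending(int handle, long frame) { this.handle = handle; this.frame = frame; }
    }

    private final ConcurrentLinkedQueue<Pending> queue = new ConcurrentLinkedQueue<>();

    /** Game thread: called with the number of the frame currently being recorded. */
    void release(int handle, long frame) {
        queue.add(new Pending(handle, frame));
    }

    /**
     * Render thread (the only consumer): after finishing 'completedFrame', returns the handles
     * that are now safe to delete. Assumes frames are released in non-decreasing order.
     */
    List<Integer> frameCompleted(long completedFrame) {
        List<Integer> safe = new ArrayList<>();
        Pending p;
        while ((p = queue.peek()) != null && p.frame <= completedFrame) {
            queue.poll();            // safe: no other thread removes entries
            safe.add(p.handle);
        }
        return safe;
    }

    public static void main(String[] args) {
        DeferredDeletion deletions = new DeferredDeletion();
        deletions.release(10, 1);   // texture 10 dropped while recording frame 1
        deletions.release(11, 2);
        System.out.println(deletions.frameCompleted(1)); // [10]
    }
}
```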
The goal is to make your command list cheap to build, and much faster to build than the one built by the driver. The driver will later build yet another command list when your render thread starts executing a render, but the overhead of some expensive commands (such as any large data copies it needs to make for whatever reason) gets pushed off to another thread, leaving your main logic/physics thread free to continue.
However I have no idea how you can do this efficiently in Java. And unless you are 100% positive how you can, you are likely to only be wasting your time just to end up with a slower game than you had before.
Focus on removing redundant state changes, avoid resolves and logical restores, use best practices when working with OpenGL commands, etc.
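Removing redundant state changes can be as simple as caching the last value sent. This sketch assumes a single piece of state (depth testing) and a hypothetical `DepthTestSink` in place of the real glEnable/glDisable calls:

```java
import java.util.ArrayList;
import java.util.List;

/** Skips redundant state changes by remembering the last value sent to the driver. */
public class StateCache {
    /** Hypothetical sink; a real one would call glEnable/glDisable(GL_DEPTH_TEST). */
    interface DepthTestSink { void apply(boolean enabled); }

    private final DepthTestSink sink;
    private Boolean depthTest;   // null = unknown, so the first call always goes through

    StateCache(DepthTestSink sink) { this.sink = sink; }

    void setDepthTest(boolean enabled) {
        if (depthTest != null && depthTest == enabled) return;  // redundant: no driver call
        depthTest = enabled;
        sink.apply(enabled);
    }

    /** Call after code that may have changed the state behind the cache's back. */
    void invalidate() { depthTest = null; }

    public static void main(String[] args) {
        List<Boolean> driverCalls = new ArrayList<>();
        StateCache cache = new StateCache(driverCalls::add);
        cache.setDepthTest(true);
        cache.setDepthTest(true);   // filtered
        cache.setDepthTest(false);
        System.out.println(driverCalls); // [true, false]
    }
}
```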
L. Spiro
"There is really not that much of a performance gain you can expect from multithreading your game."
Sorry, but this is BS.
"What is the correct approach to concurrent rendering while running game-logic?"
Game logic is always hard to run concurrently with other engine parts because it manipulates so much state. Just think about a missile created ad hoc in the game logic loop: you need to be careful not to create all the necessary entities (physics, render model, sound files) on the fly and add them straight to the corresponding subsystems. Instead, you should work with proxies that stay in an invalid state until they are properly integrated at a given sync point.
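The proxy idea can be sketched as a pending list that the game logic fills freely and that is only flushed into the subsystems at the sync point. `World`, `spawn` and `integratePending` are hypothetical names, and the three lists stand in for real physics, render and audio subsystems:

```java
import java.util.ArrayList;
import java.util.List;

/** Entities spawned during game logic stay proxies until the next sync point integrates them. */
public class SpawnProxies {
    static final class Entity {
        final String name;
        boolean integrated;   // false while the entity is only a proxy
        Entity(String name) { this.name = name; }
    }

    static final class World {
        // Hypothetical stand-ins for the physics, render and audio subsystems' entity lists.
        final List<Entity> physics = new ArrayList<>();
        final List<Entity> render = new ArrayList<>();
        final List<Entity> audio = new ArrayList<>();
        private final List<Entity> pending = new ArrayList<>();

        /** Game logic: hand out a proxy immediately, but leave the subsystems untouched. */
        Entity spawn(String name) {
            Entity e = new Entity(name);
            pending.add(e);
            return e;
        }

        /** Sync point: no subsystem is running now, so registering with all of them is safe. */
        void integratePending() {
            for (Entity e : pending) {
                physics.add(e);
                render.add(e);
                audio.add(e);
                e.integrated = true;
            }
            pending.clear();
        }
    }

    public static void main(String[] args) {
        World world = new World();
        Entity missile = world.spawn("missile");
        System.out.println(missile.integrated + " " + world.render.size()); // false 0
        world.integratePending();
        System.out.println(missile.integrated + " " + world.render.size()); // true 1
    }
}
```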
As L. Spiro said, I think that multithreaded rendering, or at least building multiple command queues concurrently, can help.
But there's still hope of optimizing the rendering without multithreaded rendering. The basic idea is to fill up the rendering queue faster than the GPU can process it. Once the CPU is done, the GPU is still running, leaving the CPU free for other tasks:
Simplified tasks in a single frame
     Render    Game logic          Physics    Audio
CPU  |--------||------------------||---------||-----------|
GPU  |------------------------------|
If you don't want to touch the game logic, you can try to extract as much as possible from the rendering task and process it concurrently, like this:
Simplified tasks in a single frame
CPU 1  |--| Render animation
CPU 2  |-----| Rendering pipeline
CPU 3  |---------| Physics
CPU 4  |--Audio---|S|---Game logic---|
GPU    |------------------------------|
I.e., extract the calculation of the animation for the next frame from your rendering pipeline (double buffering), so filling the pipeline never has to wait for it. S is the sync point: you start the game logic once all the other tasks have finished.
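The second diagram can be approximated with an `ExecutorService`: the four tasks run in parallel, `invokeAll` plus `Future.get` acts as the sync point S, and the animation pose is double-buffered so rendering reads this frame's pose while animation writes the next one. The task bodies are placeholders, and `FrameTasks` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** One frame: animation, render pipeline, physics and audio in parallel, then sync point S, then game logic. */
public class FrameTasks {
    private float[] front = new float[4];   // animation pose the renderer draws this frame
    private float[] back = new float[4];    // pose being computed for the next frame
    final List<String> log = Collections.synchronizedList(new ArrayList<>());

    void runFrame(ExecutorService pool, int frame) throws Exception {
        final float[] readPose = front, writePose = back;
        List<Callable<Void>> tasks = new ArrayList<>();
        tasks.add(() -> { Arrays.fill(writePose, frame + 1); return null; });   // animation, next frame
        tasks.add(() -> { log.add("render " + frame + " pose " + readPose[0]); return null; });
        tasks.add(() -> { log.add("physics " + frame); return null; });          // placeholder work
        tasks.add(() -> { log.add("audio " + frame); return null; });            // placeholder work
        for (Future<Void> f : pool.invokeAll(tasks)) {
            f.get();                 // sync point S: wait for every task and rethrow any failure
        }
        float[] tmp = front;         // swap the animation buffers for the next frame
        front = back;
        back = tmp;
        log.add("game logic " + frame);   // runs alone, so it may change any state
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        FrameTasks frames = new FrameTasks();
        try {
            frames.runFrame(pool, 0);
            frames.runFrame(pool, 1);
        } finally {
            pool.shutdown();
        }
        System.out.println(frames.log); // order within a frame varies; game logic always comes last
    }
}
```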
"Don’t use it. Focus your time elsewhere, such as on using VBOs."
I've done many single threaded renderers (Using VBOs since forever). Started to get bored, so I thought I'd give multi-threaded rendering a try. I'd love the extra complexity to shake things up :-) .
"As L. Spiro said, I think that multithreaded rendering, or at least creating multiple command queues concurrently, can help."
So I gather that queueing GL render commands is the "traditional" way to go?