Polarist

Posted 09 February 2013 - 09:03 PM

If you're still considering the multithreaded approach, take a look at the Smoke demo from Intel, which builds an n-core scalable game engine using a task-based approach (built on Intel's Threading Building Blocks, TBB).

A task-based approach makes it quite painless to construct multithreaded processes. To get up and running, it took me just a couple of hours to go from a single-threaded design to one achieving 100% CPU utilization across 4 cores during a few of my processing-heavy operations. There was certainly much more work to be done afterwards to fully take advantage of multithreading, but it was quite easy to speed up the performance-sensitive pieces for my needs.

 

The idea behind a task-based, scalable solution is basically to "future-proof" the engine for later developments in hardware and to decouple it from the number of physical cores available on the machine. That is, the engine should maximize hardware usage whether you are allotted 1 core or 20.

 

There are fairly modern game engines that take the older approach of assigning each subsystem its own thread (e.g. the renderer gets one thread, the physics system gets one thread, etc.). But as far as I'm aware, that method is antiquated and no longer advisable.

 

The "new" way is basically to cut up all of your game's subsystems into smaller, independent tasks, and then hand those tasks to cores as they finish their current ones. So, for instance, your physics calculations could be split into 8 tasks, your render-queue construction into 4 tasks, your particle systems into 6 tasks, and so on. You do have to keep in mind that certain systems need to run after others, but you only need to enforce that ordering at a high level. You should also consider that, given the high degree of concurrency, lock-free data structures and thread-local storage become important. Once all the tasks are completed, you reassemble their outputs into a completed game state that you can pass on for rendering.
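As a rough sketch of the idea, here's a hypothetical workload (summing a large buffer) split into independent tasks. I'm using std::async as a stand-in for a real task scheduler like TBB, and parallelSum and its chunking are just illustrations, not anything from the Smoke demo:

```cpp
#include <future>
#include <numeric>
#include <vector>

// Split one workload into numTasks independent tasks, one per chunk,
// then reassemble the partial results after all tasks complete.
long long parallelSum(const std::vector<int>& data, int numTasks) {
    std::vector<std::future<long long>> tasks;
    const std::size_t chunk = data.size() / numTasks;
    for (int i = 0; i < numTasks; ++i) {
        const std::size_t begin = i * chunk;
        const std::size_t end =
            (i == numTasks - 1) ? data.size() : begin + chunk;
        // Each task is independent, so the runtime can schedule it onto
        // whichever core is free next.
        tasks.push_back(std::async(std::launch::async, [&data, begin, end] {
            return std::accumulate(data.begin() + begin,
                                   data.begin() + end, 0LL);
        }));
    }
    long long total = 0;
    for (auto& t : tasks) total += t.get();  // join: combine task outputs
    return total;
}
```

The same shape applies to physics or particle updates: partition the work, fire off the tasks, and merge the results at a known synchronization point.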

 

 

Beyond that model of concurrent programming, there's also the question of how asynchronous you want your "game" to be from your renderer. From my brief dive into the topic, I understand that there are two general architectures to consider: double-buffered and triple-buffered.

 

In a double-buffered setup, you have two copies of your game state, one for the "last" frame and one for the "next" frame. The "last" frame should be read-only at this point: the renderer reads from it for drawing, and the other systems read from it to calculate the "next" frame. The subsystems should have clear ownership of write access to different parts of the "next" frame. One benefit of this second buffer is that it saves you the headache of making sure every little thing happens in the correct order, as it removes a lot of potential for collisions. In this approach, as soon as all the subsystems and the renderer are finished, you swap your "next" frame with your "last" frame and repeat.
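A minimal sketch of that two-buffer scheme, with GameState and its fields as placeholders for whatever your engine actually stores:

```cpp
struct GameState { int frame = 0; /* positions, velocities, ... */ };

struct DoubleBuffer {
    GameState buffers[2];
    int lastIdx = 0;  // which buffer currently plays the "last" role

    const GameState& last() const { return buffers[lastIdx]; }      // read-only
    GameState&       next()       { return buffers[1 - lastIdx]; }  // write side

    // Once all subsystems and the renderer are done, flipping one index
    // swaps the roles -- no block copy required.
    void swap() { lastIdx = 1 - lastIdx; }
};
```

Subsystems write only through next(), readers only through last(), and the swap at the end of the frame is the single point where roles change.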

 

The triple-buffered setup is a similar idea, except that the game and the renderer do not need to run in lock-step. The three buffers can be labeled "being rendered", "last fully updated", and "currently being updated". When the renderer finishes a frame, it immediately moves on to the "last fully updated" buffer (unless it's already rendering it). When the subsystems finish calculating, they relabel "currently being updated" as "last fully updated" and begin storing the next game state in the buffer that previously held "last fully updated". With this approach, the renderer only re-renders the same state if it's already holding the most recently updated buffer, and the subsystems never have to wait for the renderer to finish.
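The role bookkeeping for the triple buffer might look like this. This is only a sketch of the index juggling; a real implementation would guard these swaps with atomics or a mutex, which I'm omitting here:

```cpp
#include <utility>

struct TripleBuffer {
    int rendering   = 0;  // buffer the renderer is drawing from
    int lastUpdated = 1;  // most recent complete game state
    int updating    = 2;  // buffer the subsystems are writing into
    bool fresh = false;   // is lastUpdated newer than what's being rendered?

    // Subsystems finished a frame: publish it and recycle the stale buffer.
    void finishUpdate() {
        std::swap(lastUpdated, updating);
        fresh = true;
    }

    // Renderer finished a frame: pick up the newest complete state, or
    // keep re-rendering the one it has if nothing newer is ready.
    void finishRender() {
        if (fresh) {
            std::swap(rendering, lastUpdated);
            fresh = false;
        }
    }
};
```

Note that the three indices always stay distinct, so the renderer never reads a buffer the subsystems are writing.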

 

Also note that these buffers should not be block-copied over each other; use pointers or indices to denote which buffer currently has which role. And if memory is a large concern, you can manage a queue of changes instead of full buffers.
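The change-queue alternative could be sketched like this, with WorldState and its fields again being placeholders for illustration: subsystems record their writes as deferred changes rather than writing into a second full copy of the state, and the queue is drained at a safe point in the frame.

```cpp
#include <functional>
#include <queue>

struct WorldState { int score = 0; float x = 0.0f; };

struct ChangeQueue {
    std::queue<std::function<void(WorldState&)>> changes;

    // Called by subsystems during the frame instead of mutating state.
    void record(std::function<void(WorldState&)> change) {
        changes.push(std::move(change));
    }

    // Called once at a synchronization point to commit all recorded writes.
    void applyAll(WorldState& state) {
        while (!changes.empty()) {
            changes.front()(state);
            changes.pop();
        }
    }
};
```

This trades the memory of a full second buffer for the bookkeeping of the queue, which pays off when frame-to-frame changes are small relative to the whole state.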

 

Anyway, I hope I shed some light on the high-level details of multithreaded engine programming. There appears to be a lot of development in this area, and it's one I find quite interesting.

