
2 replies to this topic

### #1 TheAsterite  Members   -  Reputation: 203


Posted 12 May 2013 - 02:33 AM

So as of now, I have a somewhat decent grasp of how to break up jobs into smaller parts for parallel processing. I wanted to move on to designing an engine framework that revolves around handling different systems, such as physics or rendering, as different threads, when I came across this article by Intel: http://software.intel.com/en-us/articles/designing-the-framework-of-a-parallel-game-engine.

It seemed really interesting, so I downloaded the source code to their demonstration and was looking at it.

My question is, how relevant is this paper today? Is this scheduling technique used widely in the games industry?

### #2 AgentC  Members   -  Reputation: 1826


Posted 12 May 2013 - 05:05 AM

My view is that the Intel Smoke framework is somewhat over-generic (universal scene, which is extended with system scenes, change management) and some of its concepts are dubious, such as conflict resolution when several systems change the same property, e.g. object transform (IMO that should not happen in the first place). It doesn't really address the ordering of tasks within a frame, which doesn't hurt in the demo, but in an actual game it could lead to nasty non-determinism if, for example, AI sometimes runs one frame ahead and sometimes one frame behind with respect to physics. I'd consider it potentially dangerous if used directly as a basis for learning or for one's own engine.

I'd recommend, at least in the beginning, being very explicit (even hardcoded) about which systems or tasks actually run simultaneously, and what changes they propagate to each other, for example physics updating object transforms, which the renderer then picks up. Then profile constantly to discover the bottlenecks and to be able to decide whether threading will actually benefit them.

Some more resources (in case you're not familiar with them yet)

Every time you add a boolean member variable, God kills a kitten. Every time you create a Manager class, God kills a kitten. Every time you create a Singleton...

### #3 AllEightUp  Moderators   -  Reputation: 4767


Posted 12 May 2013 - 11:26 AM

I tend to agree with AgentC that Smoke is over-generic and also has some really bad bits which should not exist.  Having said that, though, let me rephrase it as my preferred question: does the complication outweigh the benefit?  Not everyone is a threading guru, nor in reality should they be.  A person writing gameplay code should not have to worry about threading beyond a very few rules; if they do have to worry about threading a lot, it is an architectural failure in the sense of the Knuth saying "premature optimization is the root of all evil", and threading is most definitely an optimization.  A threading system doesn't have to execute at 99+% of Amdahl's-law efficiency to be a benefit; 90% is good enough for most things, as it scales to 8-10 cores before diminishing returns prevent further benefit.  Better still, most of the performance loss is pre/post-frame organization; the internal per-frame work, when done well but non-intrusively, can average closer to 97-98%, which scales beyond 10 cores.

I am a fan of staged execution myself.  It is similar in concept to the discussions of threading in the entity frameworks folks are talking about.  You break the execution up into several pieces (components) using a set of simple rules even a junior programmer can follow.  Take a flocking system, for example; I won't detail the algorithm, just the breakdown and how the rules apply:

```cpp
Vector3f CalculateFlocking()
{
    // Read positions of flock.
    // Calculate center point, avoidance etc.
    // Generate new velocity.
}

void Update()
{
    mVelocity = CalculateFlocking();
    mPosition += mVelocity * Time::DeltaTime();
}
```


The above won't work in the way I do threading, because it breaks the prime rule of my system: in a single stage of the update you cannot both read from and write to a variable.  The converse also holds: you can't write a variable and then use it for further calculation in the same stage.  In the case of flocking, the update function breaks the rule because the calculation reads multiple object positions and then writes to this object's position.  The reason is simple: without inserting a mutex (against another rule) to protect the position member, the assignment is non-atomic and you could get partially updated (potentially invalid) vectors used by other simultaneous executions of this function.  Additionally, there is no consistency in whether reading a position gets an old or a new version, which can throw off a number of calculations.

Fixing the update function is simple: break it into two pieces (note: velocity is also read by the flocking calculation, so I fix that as well):

```cpp
void Update1()
{
    mNewVelocity = CalculateFlocking();
}

void Update2()
{
    mVelocity = mNewVelocity;
    mPosition += mNewVelocity * Time::DeltaTime();
}
```


Now, iterating Update1 and Update2 with multiple threads is completely safe without any per-object synchronization; you just have to make sure all Update1s have completed before starting on the Update2s.  Performance-wise, even with the temporaries, this runs exceptionally fast with only a single synchronization point between Update1 and Update2.  I posted an example video at one time which shows flocking implemented in this manner: it is not fancy (and takes a second up front to stabilize), but 2000 objects updating multi-threaded in the above manner is not exactly trivial.  The more important number, which I didn't show, is a near-complete lack of locking/kernel time taking place except between the various update calls, of which there were 12 stages in that example.

Finally, to keep with the basic idea of staying out of people's way, you can write the first example during initial development and simply mark the "stage" as single-threaded.  Once you get things working, you can turn on a debug helper which flags any same-stage data accesses and start the process of decomposing the update into different pieces.
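One way such a debug helper might look (a hypothetical simplification that tracks a single variable per object; a real checker would care about accesses visible across objects within a stage):

```cpp
#include <cassert>

// Hypothetical checked wrapper: in a debug build it records reads and
// writes per stage and asserts if the same variable is both read and
// written before the stage boundary (EndStage) is reached.
template <typename T>
class StageChecked {
public:
    const T& Read() {
        assert(!wasWritten_ && "same-stage rule violated: read after write");
        wasRead_ = true;
        return value_;
    }
    void Write(const T& v) {
        assert(!wasRead_ && "same-stage rule violated: write after read");
        wasWritten_ = true;
        value_ = v;
    }
    // Called at each stage boundary to reset the access flags.
    void EndStage() { wasRead_ = wasWritten_ = false; }
private:
    T value_{};
    bool wasRead_ = false, wasWritten_ = false;
};
```

With something like this in place, the original single-stage Update() would trip the assertion (position read and written in one stage), while the Update1/Update2 split passes cleanly, which is exactly the decomposition signal the post describes.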

With only 3 rules and better than 95% Amdahl's-law efficiency, I've never seen any real reason to worry about this solution much further.  It stays out of the way of getting things done, provides plenty of performance benefit, and is very simple to implement.  I'll take the minor losses for the gain of simplicity in this case; not sure how you will feel, of course, but it is worth keeping in mind.
