

Member Since 24 Apr 2011
Offline Last Active Yesterday, 08:54 PM

#5197383 3ds Max or Maya?

Posted by AllEightUp on 10 December 2014 - 08:33 AM

Unfortunately there are few 'one package solves everything' items out there and you need to balance things based on your target. First, for modeling in general, most artists don't tend to work in either Max or Maya exclusively for that and tend to use a pipeline of tools. For instance, one of my modelers tends to work as follows: Silo 3D for rough shape modeling, ZBrush for detailing, Max for application of final materials. But, on the other side, many (most?) of the artists who do animation tend to prefer the tools available in Maya over most other packages. Finally, programmers will tend to prefer writing plugins for Maya over Max since the SDK is so much cleaner, but unfortunately the choice is often forced on folks to be Max because that has a longer history of supporting games.

If I had to suggest one though, I'd probably suggest Maya since it makes life easier on the programmers and most of the places it doesn't work "great" are fixed through use of other tools. I've shipped games which had no custom editors and, through use of plugins, our level editing and everything else was part of Maya. Max *can* do similar but it is generally a much more difficult undertaking. Of course I'd have to say that Max is the 'prettier' of the two and Maya is quite utilitarian out of the box; primarily this is because Maya is designed to be customized, not so much used as a stock install.

As to Blender, if your artists can use it, it can work quite well. Other packages to consider in the long run (though more niche of course): Modo, Lightwave, Houdini, Cinema 4D and a whole slew of others.

It's a tough choice and there is no "one best" answer unfortunately.

#5169745 Throttling

Posted by AllEightUp on 28 July 2014 - 07:40 AM

So, I don't completely agree with hplus in regards to the sent data, given that I tend to send some redundant data in UDP, but I do agree that there is missing information in your blog post and above description.  How much data are you averaging per packet?  That is the key missing item.  I will make some assumptions about the data but mostly just cover at a very high level (i.e. missing enough details to drive a truck through) some of the problems involved with the naive solution you have.


First off, there is little reason to be sending packets at such a high rate.  Your reasoning for wanting to get things on the wire as fast as possible is relevant but when you look at the big picture, hardly viable.  The number you need to be considering here is latency, but of course latency consists of three specific pieces: delay till put on the wire, network transit time and actual receiver action time.  Assuming that your nic is completely ready to receive a packet and put it on the wire, your minimal latency is 5-10 ms because the nic/wifi/whatever needs to form the data into a packet, prepend the headers and then actually transmit the data at the appropriate rate over the wire.  Add on top of this the fact that you are sending packets every 33.33~ms you have a potential maximal latency of 40ish ms from the point you call the send API to when the data actually hits the wire.  If the network is busy, the wifi is congested or weak, you can easily be in the 50+ms range before a packet actually even hits the wire from the point you call the function to send the data.  In general, you need more intelligence in your system than simply sending packets at a high fixed rate if you want to reduce latency but still not "cause" errors and dropped packets.


The next thing to understand is that routers tend to drop UDP before TCP.  At a high level this is a technically incorrect description; it's more that the routers will see small high rate packets from your client, potentially even having two or three buffered for transit to the next hop, and then larger packets of TCP at a more reasonable rate and prefer to drop your little packets in favor of the larger packets from someone else.  Given there are easily 10+ hops between a client and a server, the packet lottery is pretty easy to lose under such conditions when the network is even minimally congested.  Add in reliable data getting dropped regularly and now your latencies are creeping up into the 200+ ms range depending on how you manage resend.


How to start fixing all these issues to deal with the random and unexplained nature of networking while maintaining low latency is all about intelligent networking systems.  Your "experiment" to reduce packet rates is headed in the correct direction, but unfortunately a simple on/off is not the best solution.  The direction you need to be headed is more TCP like in some ways, specifically you should be balancing latency and throughput as best you can while also (unlike what hplus suggested) using any extra bandwidth required to reduce the likelihood of errors causing hiccups.


I'll start by mentioning how to reduce the effect of errors on your networking first.  The common reliable case is the fire button or the do something button which must reach the other side.  In my networking systems I have an "always send" buffer which represents any critical data such as the fire button.  So, if I'm sending a position update several times a second, each packet also contains the information for the fire button until such time as the other side acks that it received it.  So, barring a massive network hiccup, even though a couple packets may have been dropped the fire button message will likely get through as quickly as possible.  This is specifically for "critical" data, I wouldn't use this for chat or other things which are not twitch critical.  In general, this alone allows you to avoid the worst cases of having "just the wrong packet got dropped" which throws off the player's game.  Yup, it uses more data than strictly necessary but for very good reason.
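To make the "always send" idea a bit more concrete, here is a minimal sketch of how such a buffer could look.  All of the names (CriticalEvent, Packet, AlwaysSendBuffer) are made up for illustration and there is no real socket code here, it only shows critical events riding along in every packet until they are acked.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration only.
struct CriticalEvent {
    std::uint32_t id;       // unique, increasing event id
    std::string   payload;  // e.g. "fire"
};

struct Packet {
    std::string                position_update;  // regular, unreliable data
    std::vector<CriticalEvent> critical;         // redundant copies of unacked events
};

class AlwaysSendBuffer {
public:
    // Queue a critical event; returns the id assigned to it.
    std::uint32_t Push(const std::string& payload) {
        const std::uint32_t id = next_id_++;
        pending_[id] = payload;
        return id;
    }

    // Build the next packet: regular data plus every still-unacked event.
    Packet BuildPacket(const std::string& position_update) const {
        Packet packet;
        packet.position_update = position_update;
        for (const auto& entry : pending_)
            packet.critical.push_back({entry.first, entry.second});
        return packet;
    }

    // The other side confirmed this event, so stop resending it.
    void Ack(std::uint32_t id) { pending_.erase(id); }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::uint32_t                        next_id_ = 1;
    std::map<std::uint32_t, std::string> pending_;
};
```

The receiver has to ignore event ids it has already processed, since the same event will usually arrive more than once.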


Moving towards the TCP like stuff, let me clarify a bit.  What you really want here is the bits which replace your "experiment" piece of code with something a bit more robust.  In general, you want three things: mtu detection (for games you just want, can I send my biggest packet safely), slow start/restart packet rates and a non-buffered variation of the sliding window algorithm.  So, the MTU (maximum transmission unit) is pretty simple and kinda like your current throttling detection, send bigger packets until they start consistently failing then back off till they get through.  Somewhere between where they were failing and where they are getting through is the MTU for the route you are transmitting on.  You don't need to actually detect the MTU for a game, you just want to know that if everything starts failing, MTU could be the reason and you should back off large packets till they get through.
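A rough sketch of the "grow until it fails, then back off" idea, under my own assumptions: the class name, the step size and the "three losses in a row counts as consistent" threshold are all invented for illustration and would need tuning against real traffic.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: tracks the largest packet size currently believed safe.
class PacketSizeLimiter {
public:
    PacketSizeLimiter(std::size_t min_size, std::size_t max_size, std::size_t step)
        : min_(min_size), max_(max_size), step_(step), limit_(min_size) {}

    std::size_t Limit() const { return limit_; }

    // A packet of 'size' bytes was acknowledged by the other side.
    void OnDelivered(std::size_t size) {
        failures_ = 0;
        if (size >= limit_)                        // a packet at the limit got through,
            limit_ = std::min(max_, limit_ + step_);  // so try something bigger next
    }

    // A packet of 'size' bytes is considered lost.
    void OnLost(std::size_t size) {
        if (size <= min_) return;  // minimum sized packets say nothing about the size limit
        if (++failures_ >= kConsecutiveFailures) {
            // Consistent failure: back off, but never below the minimum.
            limit_ -= std::min(step_, limit_ - min_);
            failures_ = 0;
        }
    }

private:
    static constexpr int kConsecutiveFailures = 3;  // assumed threshold
    std::size_t min_, max_, step_, limit_;
    int failures_ = 0;
};
```

This does not find the real MTU, it only answers "can I send my biggest packet safely", which is all a game normally needs.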


The second bit, slow start/restart is actually a lot more important than many folks realize.  Network snarls happen regularly, either things are being rerouted, something has a hiccup or potentially real hardware failures crop up.  In regards to UDP, the rerouting can be devastating because your previously detected "safe" values are now all invalid and you need to reset them and start over.  A sliding window deals with this normally and is generally going to take care of this, but I wanted to call it out separately because you need to plan for it.


The sliding window (see: http://en.wikipedia.org/wiki/Sliding_Window_Protocol) is modified from TCP for UDP requirements.  Instead of filling a buffer with future data to be sent, you simply maintain the packets per second and average size of the packets you "think" you will be sending.  The purpose of computing the sliding window though is so you can build heuristics for send rate and packet sizes in order to "play nice" with the routers between two points and still minimize the latencies involved.  Additionally, somewhat like the inverse of the Nagle algorithm, you can introduce "early send" for those critical items in order to avoid the maximal latencies.  I.e. if you are sending at 10 packets a second and the last packet goes out just as a "fire" button is received, you can look at sending the next packet early to reduce the critical latency but still stay in the nice flow that the routers expect from you.  A little jitter in the timing of packets is completely expected and they don't get mad about that too much.  And, even if some router drops the early packet, your next regularly scheduled packet with the duplicated data might get through.
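The "early send" part is easy to sketch on its own.  This is only a toy scheduler with invented names and numbers (SendScheduler, a minimum gap between packets); a real one would derive the interval from the measured window rather than hard coding it.

```cpp
// Hypothetical scheduler: fixed send interval, with an early send allowed
// for critical data as long as a minimum gap since the last packet is kept.
class SendScheduler {
public:
    SendScheduler(double interval_ms, double min_gap_ms)
        : interval_ms_(interval_ms),
          min_gap_ms_(min_gap_ms),
          last_send_ms_(-interval_ms) {}  // so the very first call may send

    // Returns true if a packet should go out at time 'now_ms'.
    bool ShouldSend(double now_ms, bool has_critical) {
        const double elapsed  = now_ms - last_send_ms_;
        const double required = has_critical ? min_gap_ms_ : interval_ms_;
        if (elapsed < required) return false;
        last_send_ms_ = now_ms;  // the regular schedule restarts from this send
        return true;
    }

private:
    double interval_ms_;
    double min_gap_ms_;
    double last_send_ms_;
};
```

With a 100 ms interval and a 25 ms minimum gap, a fire event arriving 30 ms after the last packet goes out immediately instead of waiting the remaining 70 ms.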


I could go on for a while here but I figure this is already getting to be a novel sized post.  I'd suggest looking at the sliding window algorithm, why it exists, how it works etc and then consider how you can use that with UDP without the data stream portion.  I've implemented it a number of times and, while far from perfect, it is about the best you can get given the randomness of networking in general.

#5168852 Does anyone know of this game editor the video uploader is using?

Posted by AllEightUp on 24 July 2014 - 06:23 AM

If you are talking about the capture starting around 1:38, that's just 3DS Max with a custom plugin.  Otherwise, the only other stuff I saw was some in game UI work to help editing.

#5166721 Storing position of member variable

Posted by AllEightUp on 14 July 2014 - 06:29 AM

You might want to look at doing this in a more C++ like manner without the potentially difficult to deal with side effects of pointers into protected/private data.  The way I went recently was to implement a system like this using std::function.  Basically using accessors I could bind up access in nice little objects which worked with my serialization.  So, for instance, in your given example, I wouldn't expose x, y, z separately I'd simply bind the accessor to the vector as:


// The placeholder stands for the component instance, so the signature takes one.
std::function< Vector& ( MyComponent& ) > accessor( std::bind( &MyComponent::Position, std::placeholders::_1 ) );


With that, pop it in a map of named accessors and you can get/set the component data without breaking encapsulation of the class with evil memory hackery.  Obviously there is more work involved in cleaning this up to allow multiple types, separate get/set functions and a bunch of other things I wanted supported but it is considerably better behaved than direct memory access.  The primary reason I avoided the direct memory access was because I have a number of items which need to serialize but work within a lazy update system, if I bypass the accessor, the data in memory can be in completely invalid states.  With a bound member (or internal lambda), everything can be runtime type safe (unlikely to get compile time type safety), works with lazy update systems and generally a much more robust and less hacky solution.
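As a rough illustration of the map of named accessors with separate get/set functions, something along these lines works.  Everything here (Vector, MyComponent, VectorAccessor, BindAccessors, the revision counter) is a made up stand-in, not code from my system, and it only handles one type.

```cpp
#include <functional>
#include <map>
#include <string>

struct Vector { float x, y, z; };

// Hypothetical component; the revision counter stands in for whatever
// bookkeeping a lazy update system needs when data changes.
class MyComponent {
public:
    const Vector& Position() const { return position_; }
    void SetPosition(const Vector& v) { position_ = v; ++revision_; }
    int Revision() const { return revision_; }

private:
    Vector position_ = {0.0f, 0.0f, 0.0f};
    int    revision_ = 0;
};

// A get/set pair bound to one component instance.
struct VectorAccessor {
    std::function<Vector()>            get;
    std::function<void(const Vector&)> set;
};

using AccessorMap = std::map<std::string, VectorAccessor>;

// Bind named accessors; serialization only ever goes through these,
// so the component's own bookkeeping is never bypassed.
AccessorMap BindAccessors(MyComponent& component) {
    AccessorMap accessors;
    accessors["position"] = VectorAccessor{
        [&component] { return component.Position(); },
        [&component](const Vector& v) { component.SetPosition(v); }};
    return accessors;
}
```

The lambdas capture the component by reference, so the map must not outlive the component it was bound to.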

#5165439 FBX and Skinned Animation

Posted by AllEightUp on 07 July 2014 - 09:21 PM

I'm getting my bind pose data like this (pseudo):
bone->GetTransformMatrix() and bone->GetTransformLinkMatrix()
...that seems to be working alright.

The transform link is the way I do the bind pose matrix myself, so it looks correct to me.

I don't understand the concept of FbxAnimStack and FbxAnimLayer. I think a stack is a collection of layers, and a layer is what used to be called a 'take' (which I think is like a key). I also don't understand what the FbxAnimEvaluator does.

You can pretty much ignore the layers unless you intend to do some pretty advanced stuff, just collapse them at the start and you'll only have one layer per stack. (I don't remember the exact call, it's on the FbxAnimStack I believe.)

The FbxAnimStack is actually what used to be called a take as I remember it. For DCC tools which export multiple animations per FBX, you would have one of these stacks per animation. So, a file could have a walk stack, run stack, turn, jump etc. Maya seems to ignore this, I think Max will use them if you author the files in a specific manner.

Finally the evaluator is basically an animation playback system for the content of the FBX file. Basically if you set a stack as its current context, it will allow you to sample the scene and get the transforms, etc from the scene at various times. This brings us to the rest:

I'm doing the modelling, rigging, and animating myself in 3ds Max. Let's say I want to do a simple two key idle animation for my soldier, I pose him at time=0 and at time=30. All I really need are the bone transforms for those two times (for my simple system.)
Can you put that in terms of FbxAnimStack, FbxAnimLayer, FbxAnimCurveNode, FbxAnimCurve, FbxAnimCurveKey ? I don't know where to look for the bone transforms at those two times.

The FBX SDK gives you a number of ways to get the data you might want. Unfortunately due to different DCC tools (Max/Maya/etc) you may not be able to get exactly the data you want. For instance, let's say you find the root bone and it has translation on it in the animation. You can access the transform in a number of ways. You can use the LclTransform property and ask for the FbxAMatrix at various times. Or you can call the evaluation functions with a time to get the matrix. Or you can use the evaluator's EvaluateNode function to evaluate the node at a time. And finally, the most complicated version is you can get the curve nodes from the properties and look at the curve's keys.

Given all those options, you might think getting the curves would be the way to go. Unfortunately Maya, for instance, bakes the animation data to a set of keys which have nothing to do with the keys actually setup in Maya. The reason for this is that the curves Maya uses are not the same as those FBX supports. So, even if you get the curves directly, they may have hundreds of keys in them since they might have been baked.

What this means is that basically unless Max has curves supported by FBX, it may be baking them and you won't have a way to find what the original two *poses* in your terms were. Generally you will iterate through time and sample the scene at a fixed rate. Yup, it kinda sucks and generally you'll want to simplify the data after sampling.
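The sample-then-simplify step does not depend on the FBX SDK at all, so here is an SDK-free sketch of it.  The 'evaluate' callback is a placeholder for whatever call returns the value at a given time, a single double stands in for a full bone transform, and the simplification is a deliberately naive greedy pass; all names are assumptions of mine.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

struct Key { double time; double value; };

// Sample 'evaluate' at a fixed rate over [start, end].
std::vector<Key> SampleFixedRate(const std::function<double(double)>& evaluate,
                                 double start, double end, double rate_hz) {
    std::vector<Key> keys;
    const int count = static_cast<int>(std::floor((end - start) * rate_hz + 0.5)) + 1;
    for (int i = 0; i < count; ++i) {
        const double t = start + i / rate_hz;
        keys.push_back({t, evaluate(t)});
    }
    return keys;
}

// Drop interior keys which linear interpolation between the last kept key
// and the following sample reproduces to within 'tolerance'.
std::vector<Key> Simplify(const std::vector<Key>& keys, double tolerance) {
    if (keys.size() < 3) return keys;
    std::vector<Key> out;
    out.push_back(keys.front());
    for (std::size_t i = 1; i + 1 < keys.size(); ++i) {
        const Key& a = out.back();
        const Key& b = keys[i + 1];
        const double u = (keys[i].time - a.time) / (b.time - a.time);
        const double predicted = a.value + u * (b.value - a.value);
        if (std::fabs(predicted - keys[i].value) > tolerance)
            out.push_back(keys[i]);
    }
    out.push_back(keys.back());
    return out;
}
```

For a channel that really is a straight blend between two poses, 31 samples at 30 Hz collapse back to the two end keys.  Real transforms need a per-component or quaternion-aware error measure rather than a single scalar.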

Hopefully this gets you past the understanding issues for the moment. FBX is fun stuff, ain't it?

#5165047 FBX and Skinned Animation

Posted by AllEightUp on 06 July 2014 - 08:52 AM

I think you are mixing up some FBX concepts with your data requirements and also you might be getting confused since FBX 2014 has changed all the animation structures and any tutorials or information you find via Google are likely using the older SDKs.  The reason I say this is that the "pose" structure in FBX is not directly related to animation data.  The pose structure is more of a placeholder structure which is used to describe a skin binding in the FBX file, but it is not absolutely required and for instance won't exist in files with rigid body animation.  (NOTE: it seems pretty random if it exists or not depending on the DCC tool and data in the file.)  In general, I suggest not really thinking in terms of poses directly as it will just confuse things; for the runtime animation side, I'd suggest thinking in terms of "keys" instead to stay separated from FBX concepts.


Now, in terms of the data you want, it's pretty simple given your description of how you are storing things.  Basically if you have the inverse bind pose, you have the data structure you need already and you simply need to decide what to do with it.  The style of animation you seem to be shooting for is full skeleton key frames which is among the most simple to implement and has some benefits in various ways.  All you need for this to work is the duration of the animation, playback rate and a set of keys representing the animation.  The keys are what you are calling poses but not related to the FbxPose nodes.  Anyway, assuming you have a concept of a matrix hierarchy which represents the bind pose, you would use the same matrix hierarchy to store the key frames.  So you end up with the following:



  MatrixHierarchy: bind pose



     1..n MatrixHierarchy: key
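In code that layout might look roughly like the following.  The type names, the flat parent index array and the FindKeys helper are all my own assumptions for the sketch; it only shows the storage plus how duration and playback rate pick the two keys to blend between.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix4 = std::array<float, 16>;

// One full-skeleton set of matrices; used for the bind pose and for each key.
struct MatrixHierarchy {
    std::vector<int>     parents;   // parent bone index per bone, -1 for the root
    std::vector<Matrix4> matrices;  // one matrix per bone
};

struct SkeletonAnimation {
    MatrixHierarchy              bind_pose;
    std::vector<MatrixHierarchy> keys;      // 1..n key frames
    float duration = 0.0f;                  // seconds
    float rate     = 0.0f;                  // keys per second
};

// Which two keys surround a given time, and how far between them we are.
struct KeyLookup {
    std::size_t first;
    std::size_t second;
    float       blend;  // 0 = all 'first', 1 = all 'second'
};

KeyLookup FindKeys(const SkeletonAnimation& anim, float time) {
    assert(!anim.keys.empty());
    const float clamped  = std::max(0.0f, std::min(time, anim.duration));
    const float position = clamped * anim.rate;
    const std::size_t last   = anim.keys.size() - 1;
    const std::size_t first  = std::min(static_cast<std::size_t>(position), last);
    const std::size_t second = std::min(first + 1, last);
    const float blend = (first == last) ? 0.0f : position - static_cast<float>(first);
    return {first, second, blend};
}
```

Storing the parent indices in every key is redundant, but it keeps the key frames in exactly the same structure as the bind pose, which is the point of the layout above.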


Getting the data out of Fbx can be a chore.  I won't go over the whole thing, I'll just mention some of the fun bits I ran into recently when switching to the 2014 SDK:


1.  Make sure to tell FbxIOSettings to import animation.  I spent a day thinking I had a bug but instead I had simply forgotten to tell the damned thing to import the animation data.  Without this set, the SDK lies to you saying there *is* animation and even allows you to iterate all the structures, get lengths and everything, but extracting animation just gives you the first frame repeatedly without errors or other hints that the data wasn't loaded.

2.  Even if you only have one FbxAnimStack, make sure to go get the evaluator from the fbx manager and set the context to the stack.  Otherwise I found in some cases it was doing the wrong things.

3.  Don't rely on any FbxPose's existing.  Between Max, Maya, Modo and others, it seems hit and miss if it will be there and in general you don't really need it anyway.

4.  I occasionally got additive animation layers which didn't make much sense given the source art, I'd suggest running the layer collapse function right after import to remove any of them.


Anyway, good luck.  Hopefully the little outline helps explain how you can store the data simply.  The FBX SDK on the other hand is a box of crazy and drives me insane.

#5164050 Cry Engine or Unity?

Posted by AllEightUp on 01 July 2014 - 08:50 AM

I wouldn't wish the HeroEngine even on my worst enemy (I've rewritten the spaghetti code in that engine once, never again), I suggest avoiding it.  There are MMO frameworks available on the Unity asset store which I've heard reasonable things about.  Overall, I'd have to suggest Unity for a starting point myself.  Don't expect it to support 5k folks out of the box, but a couple hundred sounds quite doable from what I've heard.

#5160016 Plugins management in application plugin system

Posted by AllEightUp on 12 June 2014 - 05:59 AM

The other decent option is to have each plugin export a registration table that the app uses to figure out how to register the plugin. Old plugins keep working because you never change the format of this table; new plugins just put different data in it. e.g

struct PluginInfo {
  const char* name;
  const char* version;


__declspec(dllexport) PluginInfo _plugin_info = {
The later approach is not brittle, but it's incredibly inflexible. What if you want a single DLL to supply two different things? What if a DLL needs to change the thing it registers based on runtime choices? Use AllEightUp's advice.

This is exactly the style I was using several years ago which drove me to turn things around.  Eventually I had to put in so many different flags, ids of dependencies and such that these structures were getting out of hand.  And worse, I stopped wanting to change them for new features and exceptions because having to go through all the plugins to update a field became a serious chore.  Reversing things as suggested prevented needing to touch 90% of the code just because one plugin had a unique requirement for its loading.

#5160012 Plugins management in application plugin system

Posted by AllEightUp on 12 June 2014 - 05:53 AM

It is one possible solution.

For my personal feeling I dont think it is a good thing that the plugin accesses information in the application. In your solution the factory class.

I would setup a factory class in the application and make the application responsible to load and register the plugin.


For a C++ solution I would define a generic Interface class that is used within the factory. And like your install function someone will need a create-interface function that creates a new Interface. There is not more needed to handle the plugin.

The difference is that the plugin accesses no data in the application. So its a question of what "feels" better.


I may come around later with some pictures.


For the purpose of plugins, I find that letting them figure out what to do is generally the best approach since the application side classes are less likely to need to be changed for special requirements.  This is a matter of opinion of course and everyone has preferences but I tend to think of it as an extension of SRP (the single responsibility principle): the factory stays self contained and knows nothing about the plugins, their requirements etc.  Each plugin remains self contained in that it knows its requirements and how to install and instantiate itself, and as such it tells others how to use the plugin contents.


Just an example of one of the cases where this style makes things considerably easier.  Say you have a preference which selects OpenGL versus D3D rendering.  If you make the factories responsible for figuring out the content of the plugins you need to either filter the plugins by name, check flags in the plugins or still rely on the plugin to fail to register based on the preference.  In the case of the plugins doing the work, each plugin can simply ask for the prefs object and decide to install or not.  The factory remains simple in this case.  As you add more items to the system, say Bullet physics versus Havok, versus whatever, having the factories do the work gets more and more complicated while the plugin side can do the work exceptionally easily.
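A stripped down sketch of the OpenGL versus D3D case, with everything in one file instead of real shared libraries.  Prefs, RendererFactory, Registry and the two install functions are hypothetical names for this example only.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Prefs { std::string renderer; };  // hypothetical preference object

class RendererFactory {
public:
    using Creator = std::function<std::string()>;  // returns a description in this sketch

    bool Install(const std::string& id, Creator creator) {
        return creators_.emplace(id, std::move(creator)).second;
    }
    bool Has(const std::string& id) const { return creators_.count(id) != 0; }

private:
    std::map<std::string, Creator> creators_;
};

struct Registry {
    Prefs           prefs;
    RendererFactory renderers;
};

using InstallFunction = bool (*)(Registry&);

// Each "plugin" checks the prefs itself and quietly declines if not wanted.
bool InstallOpenGL(Registry& registry) {
    if (registry.prefs.renderer != "opengl") return false;
    return registry.renderers.Install("opengl", [] { return std::string("OpenGL renderer"); });
}

bool InstallD3D(Registry& registry) {
    if (registry.prefs.renderer != "d3d") return false;
    return registry.renderers.Install("d3d", [] { return std::string("D3D renderer"); });
}

// Application side: no filtering, no flags, no knowledge of plugin contents.
std::size_t LoadAll(Registry& registry, const std::vector<InstallFunction>& plugins) {
    std::size_t installed = 0;
    for (InstallFunction install : plugins)
        if (install(registry)) ++installed;
    return installed;
}
```

Adding a Bullet versus Havok choice later means writing two more install functions; LoadAll and the existing plugins stay untouched.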

#5159745 Plugins management in application plugin system

Posted by AllEightUp on 11 June 2014 - 06:06 AM

I tend to turn this entire thing around for flexibility reasons.  The problem I've had with plugins which attempt to expose data to the system is that the structure/flags/whatever you wish to expose ends up constantly changing as you add more features and it makes keeping all the plugins up to date difficult.  Instead of doing this, I let the plugins do the work themselves.  Basically I load up the plugin and call a register function in the plugin.  Now it is up to the plugin to find factories to hook into, tell systems about themselves etc.  In this way, if I add a new factory I don't need to go through all the existing plugins and update the registration structures, existing plugins simply continue to work since they don't know about the new thing or need to change how they register.  When I add a plugin for the new item, it just "works" because it is querying the system for the new factory to plug into instead of the system querying the plugin to figure out where it should be hooked.


Just a quick outline of this.  (Note that I use a custom variation of the COM pattern, it is not MS COM though, just the pattern.)

// In the application: ask every discovered plugin to load itself.
for (auto& plugin : foundPlugins)
{
  plugin.Load(registry);  // assumed loop body, it is missing from the post as captured
}

// Plugins don't cause load failures, they just don't register and that is logged.
bool Plugin::Load(Registry& registry)
{
  m_SharedObject.Load(m_Filename);
  if (m_SharedObject.Loaded())
  {
    InstallFunction install = m_SharedObject.GetFunction<InstallFunction>("Install");
    if (install && install(registry))
      return true;
  }
  // TODO: Clean up everything from above on failure...
  return false;
}

// From a plugin.
bool __stdcall Install(Registry& registry)
{
  // I'm a file format plugin...
  iFileFormatFactory* factory = nullptr;
  // Note: the factory is probably a singleton internally, it deals with that behind the scenes.
  if (registry.Create<iFileFormatFactory>(&factory))
  {
    // Note: you can pass in descriptive flags here in a "real" system.
    if (factory->Install(MyPlugin::kId, &MyPlugin::Creator))
      return true;
  }
  return false;
}

There are a whole slew of benefits to this solution.  For instance, if this plugin absolutely needs another plugin to exist before it can load, it could soft fail and add itself to a deferred load (i.e. try again) list in the registry to be called again after other things get a chance to load.  Or, if this was itself the factory mentioned, it could register the factory, then implement a subdirectory scan to load plugins for itself.


Overall, this pattern has served me very well and allows for very intricate plugin systems without the primary "Registry" getting all bloated with options and parsing systems.

#5157538 Trying to break into more adv. game prog. areas

Posted by AllEightUp on 02 June 2014 - 08:38 AM

So, the first thing to know is that by having started with C#, you have a beginning understanding of C++ syntax already.  There are fundamental differences mostly in the structure and declaration of things but the core portions of the languages are very similar such that actually coding things and applying logic isn't going to change too horribly.  The largest differences are going to be in class declarations, how h(xx/pp) versus cpp files are used and other trivial, but important, items which take a while to work through.


Having said that, if you have a pro license for Unity you could simply start off by writing some plugins for Unity.  Even without the pro license you can still write external helpers and use networking to communicate back and forth, though that may easily be too much to start off with.


Anyway, for C++ by itself given the listed goals.  I'd actually *start* by looking at the build systems first.  If you want to target cross platform eventually then don't put off learning how to do it.  Trying to unwind code from compiler, library, OS etc differences at a later date is nearly futile.  Start out coding that way if it is where you want to end up.  As to your choices, you know you are biting off a lot though I'd question the need/desire/utility of writing physics and audio libraries yourself.  The other items are pretty solid items to learn and even while large, the software renderer will likely show a lot of things many programmers never see in terms of math and related.


In general though, if you start these things, you will learn C++ by osmosis as you go.  The one thing I highly suggest is keeping things simple to start with.  Don't try and learn/use every feature of C++, start with the minimal set you can to do the work.  Only jump to new features of the language as you have become solid with the ones you are using and also make sure you *need* a feature before you use it.  For me, I think that one of the most important things to learn early on is pointers, arrays and how to work with those "low level" features of the language prior to jumping on the shared pointer band wagon.  It is amazing how many recent folks simply don't know how to use simple pointers and it really cripples them moving forward, even when using STL and the likes.


Anyway, this is a pretty huge subject.  I personally suggest just jumping in since you know C# to a degree and there is similarity.  If that doesn't work, Amazon is rife with C++ beginner books.

#5155258 msvc "install" command?

Posted by AllEightUp on 22 May 2014 - 01:17 PM

I'm not comfortable with NMake but CMake creates an INSTALL project in your solution which is the "make install" equivalent. So you can build this project from the command-line and it should dump your build folder into C:\Program Files (x86)\YourProgramName. I don't know if there is an easy way to change the install directory.


As long as all the install commands work on relative items, the variable CMAKE_INSTALL_PREFIX allows you to tell CMake where to perform the actual install.  It defaults to "C:/Program Files/<solution name>" and you can change it in cmake at anytime after the project command (i.e. after cache is loaded) or simply modify it in the cmake-gui.

#5150641 Multithreading vs variable time per frame

Posted by AllEightUp on 30 April 2014 - 03:41 PM

I've thought of an easy way to eliminate all those mutexes. I can give every object and the camera etc a pair of matrices and whatever else needs to be accessed in both threads and use them in much the same way as double buffering. Then I only need one global flag to make the rendering thread use one copy while the game thread is using the other, then flip them over at each tick, and the only thread-safe constraint it needs is to be volatile. How does that sound?


As Sean points out, it doesn't need to be volatile and actually volatile is a bad thing to use anyway.  You don't even need this to be atomic either.  Give the sim and renderer a working index and use three threads: control, sim and renderer.  Control starts off by issuing a sim tick to get the initial set of matrices and when sim completes it tells the renderer to render using index 0.  Control increases the index for the sim to 1 and tells it to compute the next frame as the renderer is going.  At this point you can completely decouple rendering and simulation.  If the renderer finishes before the next simulation finishes, you can re-render the same data or wait.  If the sim has finished another frame and is working on a third, renderer can extrapolate from last frame through current frame.  If the renderer is slow, sim keeps alternating between two matrices that it fills with the most recent data.  The key point is that the sim and renderer notify the control thread and wait until told what to do, a single mutex/blocking state per game frame is completely viable and won't pose any significant performance issue.  If the logic controlling the indexes in use is in the control thread, you don't need any thread safety on the indexes since the other threads are unable to contend for the data at that point.
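Here is a small sketch of the control/sim/renderer split with the indices owned by the control loop.  It is heavily simplified and all names are invented: the "sim" and "renderer" are std::async tasks launched per frame rather than long lived threads, FrameData stands in for the per-object matrices, and the renderer simply waits on the sim every frame instead of re-rendering or extrapolating.

```cpp
#include <array>
#include <future>
#include <vector>

struct FrameData { int tick = -1; };  // stand-in for the per-object matrices

// Runs 'frames' frames and returns the tick each rendered frame saw.
// Only the control loop ever changes which slot a worker uses, so the
// indices need no atomics or volatile.
std::vector<int> RunFrames(int frames) {
    std::array<FrameData, 3> slots;
    std::vector<int> rendered;

    int sim_index = 0;
    slots[sim_index].tick = 0;  // initial sim tick, finished before any rendering

    for (int frame = 0; frame < frames; ++frame) {
        const int render_index = sim_index;            // most recent finished frame
        const int write_index  = (sim_index + 1) % 3;  // slot the sim fills next
        sim_index = write_index;

        // The sim writes one slot while the renderer reads a different one.
        auto sim = std::async(std::launch::async, [&slots, write_index, frame] {
            slots[write_index].tick = frame + 1;
        });
        auto render = std::async(std::launch::async, [&slots, render_index] {
            return slots[render_index].tick;
        });

        rendered.push_back(render.get());
        sim.get();  // the one blocking point per frame
    }
    return rendered;
}
```

The slot that neither worker is using on a given frame still holds the previous frame, which is what the renderer would read if it wanted to extrapolate from last frame through current frame.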


Using shadow data of this form is fairly common.  Deciding what to do when one or the other items is slow, that's really up to you and how you want to deal with it.  In general though, I believe all the mixtures of possible fast/slow require 3 matrices stored and only the control thread is allowed to change the index which the other threads will use.

#5150533 Multithreading vs variable time per frame

Posted by AllEightUp on 30 April 2014 - 07:02 AM

 Data parallel works in a lot of cases very well where thread pools and task dags lose horribly due to Amdahl's law.

All the DAG/Job systems that I've had experience with have been data-parallel.
The two solutions I've seen are, either:

  • data-parallel systems will put many jobs into the queue, representing chunks of their data that can be executed in parallel,
    • Usually there's another tier in here - e.g. A system has 100000 items - it might submit a "job list", which might contain 100 sub-jobs that each update 100 of the system's items. An atomic counter in the job-list is used to allow every thread in the pool to continually pop different sub-jobs. Once all the sub-jobs have been popped, the job-list itself is removed from the pool's queue. The DAG is actually one of job-lists, not jobs.
  • or, every job is assumed to be data parallel, with a numItems constant associate with them, and begin/end input variables in the job entry point. Many threads can execute the same job, specifying different, non-overlapping begin/end ranges within begin=0 to end=numItems. Once the job has been run such that every item in that full range has been executed exactly once, then it's marked as being complete. Non-scalable / non-data-parallel jobs have numItems=1.
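The atomic counter approach from the quoted list can be sketched in a few lines.  This is an illustration under my own assumptions, not code from any of the engines mentioned: ParallelFor is an invented name and it spawns its threads per call, where a real system would reuse a fixed pool.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Run job(begin, end) over [0, num_items) in chunks. Every worker claims the
// next chunk by bumping a shared atomic counter, so ranges never overlap.
void ParallelFor(std::size_t num_items, std::size_t chunk, unsigned workers,
                 const std::function<void(std::size_t, std::size_t)>& job) {
    assert(chunk > 0 && workers > 0);
    std::atomic<std::size_t> next{0};

    auto worker = [&] {
        for (;;) {
            const std::size_t begin = next.fetch_add(chunk);
            if (begin >= num_items) return;  // every sub-job has been popped
            job(begin, std::min(begin + chunk, num_items));
        }
    };

    std::vector<std::thread> pool;
    for (unsigned i = 0; i + 1 < workers; ++i)
        pool.emplace_back(worker);
    worker();  // the calling thread takes part as well
    for (auto& thread : pool)
        thread.join();
}
```

A non-scalable job is just the degenerate case of num_items = 1, which a single worker picks up while the others find nothing left to pop.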

Are you saying that a data-parallel system wouldn't have a fixed pool of threads, or maybe you're using connotations of the phrase "thread pool" that I'm missing?


The difference which I've seen is really the usage patterns and implementation details of the underlying thread system.  Unfortunately I was trying to generalize without getting into the details too much, but that of course leaves a lot of questions for those familiar with the problems involved.  Additionally, I probably mixed up how I described the difference as they tend to overlap a bit.


I'll try and clean this up using an example of a game which shipped a few years ago; the engine was modeled somewhat on concepts of Unreal in certain areas and initially had very little threading.  On the 360 and PS3, running the game as basically 3 threads was not getting the performance required.  I.e. the divisions were basically: main game systems, rendering and GPU computations.  (NOTE: The GPU computations in this case are normally associated with rendering but I keep them separate here since the GPU was over utilized and some work needed to shift back to the CPUs.)  Other than various minor helper threads for file io, general input, and other such things, this was pretty much the limit of the threading and the game was suffering due to this.  As far as shipped (A+) games go, this division is still very common today and the common addition of job queues is not really a "fix" as the changes need to happen architecturally from day one to really utilize current and future core counts.


Anyway, back to the example engine.  By adding a relatively fast job system I moved animation, skinning, various bits of AI update logic and a couple of other systems over to using the thread system, and we managed to get the primary threads under 100% and also reduce load on the GPU such that more pretty stuff could be pushed through.  This is as far as things were taken, "just enough" to ship.  This leaves two reasons I draw the line where I do: the thread system itself was limited by the need to integrate with a functionally threaded engine, and only bits and pieces of the engine were flattened out enough to actually distribute.  Newer engines start with threading from the ground up, and the way they are coded is driven by the necessity of threading from day one, such that the thread system itself does not have to tiptoe around functional concepts.  This might seem subtle but it is actually quite a large difference and causes a pretty major change to the overall architecture and design of systems and engine organization.  Obviously a 100% parallel engine, the ideal case under Amdahl's law, is still a long way off, but I have seen new engines which can push upwards of 90-95% (i.e. 5-10% is still single threaded) over 24 cores, which gives them the ability to do living world simulations orders of magnitude larger than pretty much anything else.  Also note that rendering becomes more tightly integrated with threading, which allows better throughput and fewer GPU stalls, thus helping to push even prettier pictures than before.
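
To put rough numbers on the 90-95% figure, here is a small sketch of the standard Amdahl's law formula, speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the core count. The function name amdahlSpeedup is just made up for this example, and the formula is an idealized upper bound that ignores contention, scheduling and memory costs.

```cpp
#include <cassert>

// Ideal speedup under Amdahl's law for a workload whose parallel
// fraction is `parallelFraction` (0..1) when run on `cores` cores.
// Hypothetical helper for illustration; real scaling will be lower.
double amdahlSpeedup(double parallelFraction, unsigned cores) {
    assert(cores > 0);
    assert(parallelFraction >= 0.0 && parallelFraction <= 1.0);
    const double serialFraction = 1.0 - parallelFraction;
    return 1.0 / (serialFraction + parallelFraction / cores);
}
```

With 24 cores this gives roughly 7.3x at 90% parallel and roughly 11.2x at 95% parallel, so even that last 5-10% of single threaded work costs more than half of the theoretical 24x.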


This is still painting with a very broad brush and not detailing the reasons.  I don't think I can go much further without getting into the nitpicky details and starting up discussions of cache, latencies, contention avoidance and all that.  Hopefully I at least cleared up where I try to draw the line between the two?

#5150478 Multithreading vs variable time per frame

Posted by AllEightUp on 29 April 2014 - 10:03 PM

Keep in mind that when talking about these things, threading in engines continues to evolve. Thread pools and DAG-based task issuance are not really the direction most engines are moving, due to the fact that they don't scale very well. Thread pools are mostly a bolt-on system to allow functionally threaded engines to stay somewhat relevant; it's a bolt-on that helps, not a great solution. Newer engines have moved (are moving) towards data parallel threading due to its well-proven scaling abilities. This is a threading model similar to hardware items such as GPUs, telecom switches, etc., and of course languages such as Erlang, OpenCL, etc. The first major usage of this threading style was for RenderMan's Shading Language; though it wasn't threaded initially, it simulated a giant SIMD engine, which is basically where the best threaded engines have been going. Another example was Havok's hydra threading system: once past broadphase processing, it was data parallel in computing intersections/penetrations and such.

Of course. All modern game engines are very parallel.

I hope you are only counting engines from the last couple of years in this. The big culprits out there right now (Unity, Unreal and Source) are very uneven in how well they thread, no matter how pretty the output is. Basically, with those engines you see 2 or 3 cores getting pounded pretty hard; those are the holdovers from having been functionally threaded when they started out. Then you generally see the remaining cores getting some utilization from the thread pools, but it is generally pretty low, and you see a very large amount of contention burning up a notable portion of the cores. Recent engines, notably those using things like the component entity model, thread out data parallel, and those *are* very parallel.

But, again, threading in general is still evolving in games. Data parallel works very well in a lot of cases where thread pools and task DAGs lose horribly due to Amdahl's law. On the other hand, there are a couple of places where data parallel falls down. The point of all this is that this is an area of ongoing research; it's not a cut and dried "x is the thing to do", though thread pools are really not a good solution, as mentioned. They're a crutch to get some threading, not good threading.