

#5291660 when to use concurrency in video games

Posted by AllEightUp on 14 May 2016 - 11:44 PM


*except on some MS compilers, where on x86, volatile reads/writes are generated using the LOCK instruction prefix

MS doesn't use the LOCK instruction for volatile reads and writes. LOCK would provide sequential consistency, but MS volatile only guarantees acquire/release. On x86, reads naturally have acquire semantics and writes naturally have release semantics (assuming they're not non-temporal). The MS volatile just ensures that the compiler doesn't re-order or optimize out instructions in a way that would violate the acquire/release semantics.

Yeah, you're right - I've struck that bit out in my above post. From memory I thought that they'd gone as far as basically using their Interlocked* intrinsics silently on volatile integers, but it's a lot weaker than that. I even just gave it a go in my compiler and couldn't get it to emit a LOCK prefix except when calling InterlockedCompareExchange/InterlockedIncrement manually :)


This means that even with MS's stricter form of volatile, it would be very hard to use them to write correct inter-thread synchronization (i.e. you should still only see them deep in the guts of synchronization primitives, and not in user code).



As a general note on volatiles, I also did a test for fun.  I took the scheduler for my distribution system and added a single volatile to the head index of the lazy ring buffer.  I changed nothing else; I'm still using explicit atomic load/store to access it.  It slowed down the loop by about 10%, which is quite a bit worse than my worst guess.  This was on a dual Xeon, compiled by Clang; I'd be terrified to see what happens with the MS hackery on volatiles.  As a note: I believe there is now an option in VC2015 to disable the MS-specific behavior, so it may be no worse than Clang with that set.


As to volatiles and threading in general, I don't believe I use the keyword volatile anywhere in my code, either at home or at work, and it is fully multi-core from the ground up.  Unlike what I called out above, I'm not using threading just to ship; it is a fundamental design goal of the overall architecture.
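For completeness, the portable way to get the acquire/release behavior discussed above is std::atomic with explicit memory orders rather than volatile. A minimal sketch (the ring-buffer names are invented for illustration, not the poster's actual code):

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical single-producer/single-consumer head index. std::atomic with
// explicit memory orders expresses the acquire/release intent portably;
// volatile would not, and may pessimize codegen on MS compilers.
struct RingHead {
    std::atomic<std::size_t> head{0};

    // Producer: publish a new head with release semantics so prior writes
    // to the buffer slots become visible to the consumer.
    void publish(std::size_t newHead) {
        head.store(newHead, std::memory_order_release);
    }

    // Consumer: read with acquire semantics, pairing with the release store.
    std::size_t read() const {
        return head.load(std::memory_order_acquire);
    }
};
```

On x86 both operations compile to plain moves (no LOCK prefix), matching the discussion above; the memory orders only constrain the compiler there.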

#5291441 when to use concurrency in video games

Posted by AllEightUp on 13 May 2016 - 12:25 PM

I think one key point is being glossed over regarding the 'when' portion of the question... When do you use concurrency? "Only when you need it to ship!"  I have shipped many games where I added threading engines but only bothered to port bits and pieces of code to the parallel models to hit a solid 60 FPS with a little headroom.  I just wanted to point this out, as it seemed to be getting lost among the 'shiny' reasons to do concurrency. :)

#5290634 How do I design this kind of feature?

Posted by AllEightUp on 08 May 2016 - 06:32 AM


There is another manner to look at this which may be a variation of haegarr's response.  ...

Yes, that is definitely what I meant except that you're going the refactoring way (which, of course, is totally fine) where I already have been trapped once back in time and hence know of that particular necessity we're talking about here.


Interestingly enough, there is many good stuff to learn from TA / IF engines, where the interaction of the player is focused on performing such kind of actions.



My primary motivation for separating 'usable' from 'use' in this case is that the results can now be exposed to script systems considerably more easily.  I generally dump most of these items into a behavior tree, since the simple 'use a key' example can be extended to include a lot more checks and becomes the basis of a puzzle system as well.  I.e.:



-  haveItem(key, "Some door")

-  haveItem(scroll, "Nasty Green Ritual")

-  isDay("Tuesday")

-  isHour("Noon")

-  actorInVicinity(Player, "Nasty Green Altar", 5)

-  actorHasPerformed(Player, "Sacrifice", "Chicken Feet")

-  makeActionAvailable("Trigger Nasty Green Apocalypse")


Now the player can destroy the world by reusing prior work.  With enough generic actions, even the final line can be pushed off to script, such that none of this requires custom code.  Maybe the OP is making a game with puzzles, maybe not; either way, fixing the SRP violation allows throwing this stuff into script, where it is easier to reconfigure and experiment.
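As an illustrative sketch (invented names, not the poster's actual script system), the list of checks above maps naturally onto a behavior-tree sequence node: each line is a predicate, and the final action fires only when every predicate passes.

```cpp
#include <functional>
#include <vector>

// Hypothetical minimal sequence node: conditions are checked in order and
// the first failing one short-circuits; the action runs only on full success.
using Condition = std::function<bool()>;
using Action    = std::function<void()>;

struct Sequence {
    std::vector<Condition> conditions;
    Action onSuccess;

    // Returns true if every condition passed and the action was triggered.
    bool tick() const {
        for (const auto& c : conditions)
            if (!c())
                return false;  // e.g. it isn't Tuesday, or no chicken feet
        onSuccess();           // makeActionAvailable("...") equivalent
        return true;
    }
};
```

Each condition (haveItem, isDay, actorInVicinity, ...) would be a reusable leaf shared across puzzles, which is exactly the reuse the paragraph above is after.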

#5290399 How do I design this kind of feature?

Posted by AllEightUp on 06 May 2016 - 06:58 AM

There is another manner to look at this, which may be a variation of haegarr's response.  At the top level, your problem is generally known as cross-cutting: a design works for 90% of cases, but the other 10% doesn't fit the same pattern and causes issues such as this.  Usually this is caused by a failure to separate concepts properly in the design, and your current design suffers from this.  Consider your ItemEffect: it only supplies a single function, so how could it be breaking SRP?  Well, in the example provided it conflates the concept of 'usable' with the concept of 'use', i.e. it checks that it is in a usable condition before attempting to trigger a state change.  Separating the 'is it usable' concern from the 'use it' concern would be a first step to solving much of the design issue, and as haegarr suggests, it is similar in behavior to how a component entity model works.  So, reworking your example, you could do something like the following:
class Action {
public:
    virtual ~Action() = default;
    virtual void perform(/*...*/) = 0;  // 'do' is a C++ keyword, hence 'perform'
};

class UseKey : public Action {
public:
    void perform(/*...*/) override { /* open the door */ }
};
It has no condition checks; it just performs the action, assuming that something else has validated that everything is ready to go.  The thing that makes the checks could be broken down into many separate conditions (generally a good idea), but I'll be lazy and outline it in a single class, and also assume you have a services-oriented design around things:
// This assumes it is an inventory item; the same pattern holds for other variations.
class ConditionCheck {
public:
    virtual ~ConditionCheck() = default;
    virtual void addToInventory(/*...*/) = 0;
    virtual void removeFromInventory(/*...*/) = 0;
};

class KeyConditions : public ConditionCheck {
public:
    void addToInventory(/*...*/) override {
        // Assume a 'smart world' which supplies various services.
        // The primary service of concern here is an awareness system.
        mQuery = GetWorld().SpatialAwareness().Query()
            .Radius(2.5) // keys pay attention to things within 2.5 meters
            .OnChange(std::bind(&KeyConditions::onVisible, this, std::placeholders::_1));
    }

    void removeFromInventory(/*...*/) override {
        mQuery = nullptr;
    }

private:
    void onVisible(const std::vector<ObjectHandle>& doors) {
        for (const auto& door : doors)
            if (haveKeyFor(door) && facing(door))
                ; // make the 'use key' action available
    }

    SpatialAwareness::QueryHandle mQuery;
};
This is not a perfect example, but hopefully it shows the goals and direction such an architecture would take. It combines a more complete application of SRP with a reactive design to prevent the behavioral lock-in you are finding.

Perhaps this is too much of a change; you could still borrow some concepts and fix the SRP issue to be in a better position at a minimum. Of course, the issue a lot of folks have with something like this is that it feels (is) pretty abstract and takes a bit of getting used to. Additionally, WoopsASword does have a point: unless this is a relatively major portion of your gameplay, a simplified solution may be better. I would tend to use this solution if I were creating a huge RPG-style game, but if it were a fairly simple game, I'd keep it simple.

#5284955 In terms of engine technology, what ground is left to break?

Posted by AllEightUp on 03 April 2016 - 09:30 PM

Among others, look at the recently released Maxplay engine. We're using it at work -- I spend my days mixed between it, Unity, and custom work -- and many of my former co-workers from past jobs have been using the engine on new projects. (Notably, The Void, hi guys!) While not as polished as Unity, Unreal, Source, or other longstanding engines, it has many things that, once you realize they exist, make it hard to switch back to the older engines.

At GDC there were quite a few groups pushing new game engines, many with excellent innovations. A few of them I mentioned above.

It is probable that Unity and Unreal will incorporate this functionality over time, but as they are older and established, they have to invest heavily in maintaining the past. As is typical, incumbents are less agile than newcomers but will adopt features as they can.


Correction, MaxPlay is not released.  It was used to build a demo for Intel and then shown privately.

#5281561 Building in a OS that you don't have (Cross-platform 2D engine)

Posted by AllEightUp on 16 March 2016 - 05:16 PM

I generally use VMware Fusion running Windows 7 or 10.  I believe it supports DX10 and GL 4.0, so it does pretty well for our needs.

#5281542 Building in a OS that you don't have (Cross-platform 2D engine)

Posted by AllEightUp on 16 March 2016 - 02:55 PM

I would actually disagree with ByteTroll to a certain level.  We use continuous integration, everything is built using VMs, and in a lot of cases our automated tests run within the VMs as well.  So long as that process completes, we hand the builds to QA and they test on real hardware.  Additionally, when working on a Mac, I tend to run a Windows VM for testing until I switch back to a real Windows box.  So, I use VMs quite a bit for both building and running.  Is this common?  I'd say it is for folks doing a lot of cross-platform work.

#5281255 C++ Self-Evaluation Metrics

Posted by AllEightUp on 14 March 2016 - 03:11 PM

I like to answer: by myself 8, with Google 10+.. :D

#5281096 Multithreaded engines

Posted by AllEightUp on 13 March 2016 - 02:56 PM

I would expand on Hodges' and Tangletail's results, as they are both valid but I think miss an important point.  The problem I have seen with most game threading is that instead of changing the game code itself to work correctly when threaded, folks end up writing these monstrous threading systems with task dependencies, work stealing, etc., when none of that is really needed.  I don't say those solutions are wrong, but every time I see them in production code I generally end up avoiding dependencies and such by simply refactoring the code with proper mutable/immutable divisions, such that I can perform all the same work by issuing a block of tasks, a fence or barrier, and another block of tasks.


Here is an example since I was doing some work last weekend on exactly this problem.

13ms per frame with dependency driven linear code:



4ms per frame with the dependencies broken into two passes: the first pass does all the inter-relational computation, the second pass writes all the results back to the objects.



The changes were simple and quick to implement.  I just separated the 'look at all the other objects' portion of the code from the 'update my data' portion, such that it could be run in parallel.  And this is not a contrived test; it is real game code running a very expensive simulation over 25k objects.  The funny thing is that without the renderer it runs at ~500 FPS with 100% CPU load, <~1% spent in kernel, and almost completely linear scaling across 2-12 cores; the gains then start falling off as you approach 24 cores, mostly because 25k objects just isn't enough to feed the cores.


Perhaps I'm missing some reason for these complicated task scheduling systems, but I don't think so.  I've shipped a number of games using this same style of threading without problems, and it usually makes the code considerably easier to maintain, if for no other reason than that side effects have to be removed.  But you have to be explicit, and as such you need to break your logic into a series of pieces, much more like a GPU set of shaders than linear code.  It is very similar to the other forum thread about data-oriented versus object-oriented design.
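A minimal sketch of that two-pass structure, with invented object and pass names: a parallel sweep over disjoint ranges, a thread join acting as the fence, then the write-back pass. This is a sketch of the pattern, not the poster's actual simulation code.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

struct Object {
    float state  = 0.0f;  // immutable during pass one
    float result = 0.0f;  // private staging slot written in pass one
};

// Pass one: read everyone's 'state', write only our own 'result'.
void computePass(std::vector<Object>& objs, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i) {
        float accum = 0.0f;
        for (const Object& other : objs)  // read-only access to all objects
            accum += other.state;
        objs[i].result = accum;           // each task owns a disjoint range
    }
}

// Pass two: after all threads join (the fence), publish the results.
void writebackPass(std::vector<Object>& objs, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        objs[i].state = objs[i].result;
}

void update(std::vector<Object>& objs, unsigned nThreads) {
    auto run = [&](void (*pass)(std::vector<Object>&, std::size_t, std::size_t)) {
        std::vector<std::thread> pool;
        std::size_t chunk = (objs.size() + nThreads - 1) / nThreads;
        for (unsigned t = 0; t < nThreads; ++t) {
            std::size_t b = t * chunk, e = std::min(objs.size(), b + chunk);
            if (b < e) pool.emplace_back(pass, std::ref(objs), b, e);
        }
        for (auto& th : pool) th.join();  // the fence between passes
    };
    run(computePass);   // block of tasks
    run(writebackPass); // fence above, then the second block
}
```

No dependency graph is needed: pass one never writes shared data, pass two never reads another object's data, so the only synchronization is the join between passes.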


Anyway, for someone new, I'd always suggest keeping it simple and learning the usually simple set of rules instead of trying to write complicated threading solutions so you don't have to follow the rules correctly.

#5279255 Cross-platform API design

Posted by AllEightUp on 03 March 2016 - 08:38 AM

A typical method I use leverages the compiler include paths. Here's an example:

    core <- platform agnostic headers, <core/whatever.hpp> for instance.
    windows <- platform specific headers for windows.
    osx <- same, just for Mac.

When I build the solution, makefile or whatever (I use CMake most of the time), I set the include path to <include> for all the generic code and then add an include path for <include/windows>.  Now, when I type "#include <core/Platform.hpp>", it comes out of the Windows-specific directory.  I have no need of ifdef/else/endif blocks, and things are relatively clean.  A nice thing about this (especially when using generators like CMake) is that you can use the same style for things such as target processors, whether SIMD is enabled or disabled, etc.  Of course you eventually end up with 4+ different include paths for a single library, but I have found that preferable to most other solutions I've tried.  Also keep in mind that the same style works at the source level.


Hope this helps.

#5277380 Managing game object IDs

Posted by AllEightUp on 21 February 2016 - 07:38 PM

Just one thing to mention here: unless you are really hard up for memory (and this really should not be a place that is eating up memory), using 64-bit values for IDs is generally perfectly acceptable.  You can generate 100 IDs per second for approximately 5.8 billion years without wrapping.  As such, I humbly suggest you should not worry about it too much...  :)
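As a sketch of that arithmetic in practice (names invented): a monotonically increasing 64-bit counter is all most games need, since 2^64 ids at 100 ids/sec is roughly 5.8 billion years of play before wraparound.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical minimal ID generator. The atomic fetch_add makes it safe to
// call from any thread; relaxed ordering is enough because uniqueness, not
// ordering with other data, is all we need.
class IdGenerator {
public:
    std::uint64_t next() {
        return mNext.fetch_add(1, std::memory_order_relaxed);
    }
private:
    std::atomic<std::uint64_t> mNext{1};  // 0 reserved as an 'invalid id'
};
```

Reserving 0 as "invalid" is a common convention so default-constructed handles are distinguishable from real ones.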

#5274846 Behavior Trees or FSM?

Posted by AllEightUp on 08 February 2016 - 07:47 AM

As Sean says, there is no hard either/or choice to make; FSMs and BTs are inherently mixable.  On the other hand, I have shipped titles with only FSMs and others with only BT-like approaches.  When I have mixed them, I usually break things down into two groups: items with well-known, fixed rules defining entry/exit/transition use an FSM, and items where the rules are a bit sloppier become BTs.  For instance, a relatively common FSM state: moveTo.  It moves the actor from point A to B and exits only if the target has been changed or reached, or an interrupting event occurs, such as death or a decision interrupt (i.e. a new enemy shows up in the target area).  Generally you code such a thing up once and reuse it in lots of places; it is a tool you don't have to tweak a lot.  On the other hand, the decision to call moveTo is often based on a list of priorities, which BTs make fairly easy: if I'm low health and that's a good hiding spot, moveTo; if I'm low health and in cover, don't moveTo; etc.
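The priority list that gates entry into a fixed state like moveTo can be sketched as a minimal BT-style selector. Names are invented, and a real node would return run-states (running/success/failure) rather than a bool:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical priority selector: ordered (condition, action) pairs where
// the first passing condition wins and its action runs, e.g. triggering
// the moveTo FSM state or deciding to stay in cover.
struct PrioritySelector {
    std::vector<std::pair<std::function<bool()>, std::function<void()>>> rules;

    bool tick() const {
        for (const auto& [cond, act] : rules)
            if (cond()) { act(); return true; }
        return false;  // nothing applicable this tick
    }
};
```

The selector holds the sloppy, frequently tweaked decision rules, while moveTo itself stays a fixed, well-tested FSM state, which is exactly the split described above.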


Getting into the specifics of your goals, though: generally speaking, FSM versus BT does not solve the squad-tactics side of the equation.  In this area you generally need an invisible actor which represents the squad and handles the overall intelligence.  Generally speaking, this means turning off the individual units' decision making and letting the invisible object be a general planner.  In the past I have done this using multiple levels of FSMs and BTs, where the squad intelligence says things like 'move into this area', which spawns an FSM that moves the individual units to new points in the general area.  The top-level intelligence gets a notification when they have moved or some interesting thing has happened, and starts planning the next thing for the squad to do.


The hierarchical behavior systems mentioned are easy to describe but often a nightmare to implement.  So, rather than worrying so much about the one-or-the-other choice, I'd suggest you look into planning how you would do something simple, like having a squad go to a location and take cover, with all the little edge-case rules needed: two units don't try for the same cover, they individually go to their closest cover, sniper units stay to the rear of the expected area of action, etc.  Getting that figured out in a hierarchical or other AI system is a massive pain in the ass, but when done correctly it looks awesome to the end player, who watches the 'smart' behavior of enemies.


Hope this gives you some more ideas and refinement.

#5265143 How to mark roof for show/hide?

Posted by AllEightUp on 06 December 2015 - 10:01 AM

Generally speaking, you will eventually want a bit more information about the world than just inside/outside.  What I suggest may be a bit overkill for the specific question, but a subset could do what you are looking for.


I start by determining rules that I can use to make general categorization of the area.  For instance, I might start by saying that any tile which is roofed and can be entered from a tile that is not roofed constitutes a transition between indoors and outdoors.  Then I may say any roofed tile which connects to another roofed tile is considered part of the same indoor area.  By running a relatively simple flood fill style algorithm I can uniquely identify various groups of tiles which are joined together to form a single indoor area.


Starting with something this simple, you can now find which tiles are connected interiors and only disable drawing of the roofs for those tiles.  Expand on these rules and you can do things such as identifying special 'types' of rooms based on contents, internal divisions such as doors between room areas etc.  The nice thing is that a fairly simple set of rules can categorize a relatively large number of things so if you have AI involved they can do relatively intelligent looking things when wandering around.
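The flood-fill grouping described above might look like the following sketch (the row-major grid layout and names are assumptions, not the poster's actual code): each connected group of roofed tiles gets a unique area id, and hiding a roof means hiding every tile sharing the id of the tile the player entered.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Labels connected roofed tiles with an area id, following the rule "any
// roofed tile which connects to another roofed tile is part of the same
// indoor area". Returns -1 for outdoor (unroofed) tiles, otherwise an
// index shared by all tiles of one interior.
std::vector<int> labelInteriors(const std::vector<bool>& roofed, int w, int h) {
    std::vector<int> area(roofed.size(), -1);
    int nextArea = 0;
    for (int start = 0; start < w * h; ++start) {
        if (!roofed[start] || area[start] != -1)
            continue;                       // outdoors, or already labeled
        std::queue<int> open;               // classic BFS flood fill
        open.push(start);
        area[start] = nextArea;
        while (!open.empty()) {
            int t = open.front(); open.pop();
            int x = t % w, y = t / w;
            const int  nbrs[4] = { t - 1, t + 1, t - w, t + w };
            const bool ok[4]   = { x > 0, x < w - 1, y > 0, y < h - 1 };
            for (int i = 0; i < 4; ++i)
                if (ok[i] && roofed[nbrs[i]] && area[nbrs[i]] == -1) {
                    area[nbrs[i]] = nextArea;
                    open.push(nbrs[i]);
                }
        }
        ++nextArea;
    }
    return area;
}
```

The same labeling pass is a natural place to hang the extra categorization rules mentioned below (room types, door divisions, and so on).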

#5259982 C++ Weird Behaviors With Multi-Threading

Posted by AllEightUp on 01 November 2015 - 08:07 AM

Looking through the code, much as Hodgman suggests, there are a number of unprotected accesses taking place, and you need to get your protections cleaned up.  Additionally, I would suggest removing the use of the Event type.  Event in Windows should be avoided, as it doesn't tend to work well for most things people use it for; generally speaking, I would replace it with a semaphore.  I suggest this because Event doesn't work as most folks expect, and it is a very common cause of deadlocks and missed work.  The problem is that an event is simply a toggle, set or unset: if you call SetEvent multiple times, the event is still only set or unset from the workers' point of view, hence only one thread gets the event and then resets it, even though there are more work items that should be issued to other threads.  NOTE: the behavior can differ based on how you construct the event; if it is not auto-reset, multiple threads could attempt to pull work when there is only one item queued.


I don't claim this is your problem, but in conjunction with Hodgman's suggestion, it would be a start toward fixing this code.

#5256469 Task Scheduling Question.

Posted by AllEightUp on 09 October 2015 - 10:16 PM

I suggest you look at this differently: what is it that causes dependencies in the first place?  The code is not dependent on other code and does not require ordering; the problem is all about the data access the code performs while running.  It is obvious to state this, but have you ever really thought through the implications?  Entity component systems 'can' be extremely fast, damned near optimal over n cores in some use cases, so the real question is: what is it about the component object model that made things better?  I don't mean to suggest that component object models are better or worse; I find they both have general use cases where they excel and fail, so using one or the other depends on your goals.


But, specifically, your described solution points to a common thought process which I, personally, don't agree with.  Attempting to precompute some form of dependency graph is self-defeating.  It's like using qsort on 'mostly' sorted data: a simple bubble sort would have been faster for the use case.  The same thing generally applies to executing game code; normally order doesn't matter all that much, and you only have to worry about dependency in a very few cases.  So, instead of sorting objects up front based on their dependencies on other objects, simply allow objects to 'defer' themselves to be executed later.  The idea is that you do one pass over all objects, leveraging code/data cache coherency and all the good things of not worrying about dependencies, and solve the expensive case of non-linear memory access only after you have reduced the problem to a handful of remaining items to compute.


I'm not sure the above makes a lot of sense; the point is that sorting up front in a game is basically a waste of a lot of complicated time.  Let the handful of objects with dependencies identify themselves in the first pass over all objects, wasting a minor bit of performance.  Compared to a big sorting task run once a frame, taking a small number of cache hits and such in a second (or third/fourth) pass over just the remaining items will almost always outperform pre-sorts.
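A minimal sketch of the defer-don't-sort loop described above (names invented; it assumes every deferred object eventually succeeds, otherwise the retry loop would spin forever):

```cpp
#include <functional>
#include <vector>

// Hypothetical updatable object: update() returns true if it completed,
// false if it must be deferred (e.g. it needed data another object has
// not produced yet this frame).
struct Updatable {
    std::function<bool()> update;
};

void runFrame(std::vector<Updatable>& objects) {
    std::vector<Updatable*> remaining;
    remaining.reserve(objects.size());

    // Pass 1: the cheap, cache-friendly sweep over everything. The rare
    // dependent object defers itself instead of being pre-sorted.
    for (auto& o : objects)
        if (!o.update())
            remaining.push_back(&o);

    // Passes 2..n over the shrinking leftover set; cache misses here are
    // cheap because only a handful of objects typically remain.
    while (!remaining.empty()) {
        std::vector<Updatable*> next;
        for (auto* o : remaining)
            if (!o->update())
                next.push_back(o);
        remaining.swap(next);  // assumes forward progress each pass
    }
}
```

The first pass stays branch-light and linear in memory, and the dependency handling cost is paid only by the few objects that actually have dependencies.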


Hopefully this suggests other ways to look at the problem..