• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here. The Curated Books The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell Presents 100+ sets of questions, or different lenses, for viewing a game’s design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here. A Practical Guide to Indie Game Marketing, by Joel Dreskin Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here. An Architectural Approach to Level Design This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here. Learn more and download the ebook by clicking here. Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.
Sign in to follow this  
Followers 0
Norman Barrows

AI speedups for running game in accelerated time

28 posts in this topic

I'm working on a caveman fps / rpg / simulator game.

 

because its a simulator, you can accelerate time, like in a flight simulator, or a submarine simulator, to skip past the dull stuff.

the game currently supports real time, 2x, 900x, and 5400x game speeds. Unlike waiting in Oblivion, you can do this with potential hostiles nearby. It automatically drops out of accelerated time if anything tries to attack you.

 

in the game, you can interact with the environment and perform actions (like in The SIMs) such as gathering resources and making weapons.

 

when performing an action such as making a wood spear, which takes 4 game hours, it runs fast with no active targets, and slow with a herd of 20 antelope mulling about. with no active targets, at 900x normal speed, making a spear takes 4 seconds (in real world time). with 20 active targets, at 900x normal speed, it takes 22 seconds (in real world time).

 

so i'm looking for ways to speedup the AI so it still does (or approximates) what it does in real time combat, but runs faster at 900x and 5400x normal speed.

 

so far i've gone to selecting target (which requires a range check to every active target) once every 5 seconds, and made the animals steer based on a desired heading calculated every 5 seconds (which requires a call to atan). these calculations are performed on every 15th target in succession over 15 frames. IE: if ( target_number % 15 == frame_number ) do_AI_stuff.

 

i'm using a fast RISC style 2d bounding box distance calculation for the range check when selecting target.

 

i tried an atan lookup function, but the math coprocessor is faster.

 

the amount of time between target selections is the amount of lag in response the targets exhibit. if an antelope is programmed to run when you get within 50 feet, it can take up to 5 seconds for it to target you, decide you're too close, and start running. So selecting targets even less frequently doesn't seem to be an option.

 

everything else in the AI except the range check to all targets and the atan call is fast simple code, mostly ADDs and COMPs.

 

however it does do a check once per second for being "cornered", which requires moving 4 points in 2d (4 sins and 4 cos's) and 4 collision checks (point vs axis aligned bounding boxes).

 

i can spread out the target and heading calculations so i'm only doing it for one target per frame, but i'd still be doing it for all targets over 5 seconds of game time. this would reduce per frame overhead to a maximum of one set of target calculations per frame, but not overall overhead. 

 

i can do the "cornered" check every 5 or 10 seconds, that will get me a little something, but other than that, I'm not sure what to try next.

 

faking it? is that the answer? when all else fails, fake it?

 

 

 

0

Share this post


Link to post
Share on other sites

Profile it.

 

yeah, unfortunately, there's not much left to slice out. i'm afraid that all profiling will tell me is that range checks to every active target take time, calculation heading with atan takes a little time, but is not done often, and all the rest is sort of spread out evenly across code that changes animal AI state  (hunting/eating, wandering, standing, flocking, etc), adds stuff together, and checks if a > b kind of thing. while this still needs to be confirmed with timing tests, I suspect some "approximation" approach may be required, so i was sort of thinking ahead.

0

Share this post


Link to post
Share on other sites
Well, I got that 22 seconds (with targets) vs 4 seconds (no targets) down to 7 seconds (with targets) vs 4 seconds (no targets). same test case, a saved game with a herd of antelope, and the player 17% of the way through making a wood spear.
 
and I haven't had to reach for the profiler yet.  not sure I have one... <g> 
 
I hate it when I'm researching a topic, and I find a thread of interest, and they never tell you at the end what they did.
 
so, for the benefit of anyone in the future reading this and wondering how I went from 22 to 7...`
 
the main routine for moving all targets, move_animals, was your normal big loop that did some stuff common to all targets, then called other big AI routines depending on the type of target.
 
the big loop had 6 basic types of AI that it called. these 6 basic types encompass the 14 different types of AI currently in the game.
 
The first thing i did was i replaced the big main loop with a small one that calls one of 6 AI routines, based on the AI type the unit uses.
 
Then I started working on the maintain distance AI, as this was the one used in the test case, and the one most commonly used  in the game.
 
I started by copying the code from the subroutine calls it made into the AI routine itself. This gave a better picture of what it did where.
 
Then I split it up into a proper oh, whats it called? I've been using them forever...well at least since the 80's...  
a high level expert system determines the AI state for the unit (attack, flee, stand, etc). the AI state determines which low level expert system is used to move the unit. low level expert systems in turn may determine lower level AI states (avian attack climb vs avian attack dive), which in turn determine the use of even lower level expert systems (avian_attack_climb_AI and avian_attack_dive_AI). 
 
so i split the maintain distance AI into setstate() [the high level expert system] and runstate() [the low level expert systems].
 
the selection of target once every 5 seconds is done in setstate, just before the target is used for a state change check.
 
then i continued to copy code from subroutines into setstate and runstate, until i got down to stuff like animal[a].AIstate=0.
 
at this point i was looking at 4 states in runstate that were active during the test case: stand, flee, wander, and flock.
 
and wander and flock had calls to relheading() [get relative heading to target]. the dreaded heading calculation with atan().
 
before the speedup, the game steered targets based on their relative heading to their target, which requires frequent calculation of heading to the target, the dreaded atan. 
 
I had already changed flee to pre-calculate the heading to the target they were fleeing from when they select target. the heading is calculated, stored, and then used to steer by.
 
So i changed wander and flock to do the same thing. so now heading to target is pre-calculated and stored when a target is selected, and that heading is used to steer the animal. 
 
I haven't taken a look at the lower level code that actually does the steering or moving. there may be additional speedups there. I also haven't taken a look at the collision recovery AI or cornered AI yet.
 
Now I have to make the other types of AI pre-calculate target heading.
 
I also plan to split the six basic types of AI into the actual 14 different types of AI in the game. This means each type of AI will have its own dedicated high level and low level expert systems, with no checks or branching to handle multiple types of AI (such as ground attack vs avian attack).
 
funny, the more things change, the more they stay the same.
 
after 25 years of writing game AI, atan is still a bottleneck, and pre-calculating stuff once and storing the result for later use is still required.
0

Share this post


Link to post
Share on other sites
What are you using atan for? If it's a bottleneck, you may have luck replacing it with vector math (ex: dot products and cross products can handle the majority of gameplay logic relating to angles).
0

Share this post


Link to post
Share on other sites

What are you using atan for? If it's a bottleneck, you may have luck replacing it with vector math (ex: dot products and cross products can handle the majority of gameplay logic relating to angles).

 

 

calculating compass heading in degrees from x1,z1 to x2,z2 in the x,z plane in world space.

 

hadn't occurred to me to try using vectors as a speedup.

 

but this get into the whole "how to store an orientation" question. and the whole issue of having to possibly convert a given method of storing orientation (unit direction vector) into something the graphics engine can take as input (Euler angles, etc).

 

the title uses a shooter interface, so orientations are stored as Euler angles (x,y,z rot), and the player is limited to two degrees of rotational freedom (xrot and yrot).

 

the graphics engine is based off of directx 9 fixed function pipeline for maximum compatibility  as i recall, that API only supports Euler angles, rotation matrices, and quaternions as input. so any ugly math avoided by using vector stuff you eventually have to use to convert it into something the graphics engine can take as input.

 

however, if there's a lot of ugly math done before drawing, it may be faster to avoid the ugly math before drawing, and take the hit of one bit of ugly math to convert to graphics engine format.

 

have to think about that one, it may be a better way to do the whole "movement / orientation / graphics engine input format" thing.

 

first step, what input formats does the graphics engine support?

 

well, it takes a transformation matrix, and has a bunch of helper functions for creating matrices (D3DxMatrixRotationX, etc).

 

so, you store orientation as a direction vector. player can't rotate around Z axis, so don't need a rotation angle, so its not a quaternion, just a dir vector. so then you want to turn, and you have a desired direction vector. I dunno, maybe i'm just locked into trig thinking. I got an A in Linear at OSU but recall almost none of it (i can't even do a linear combination anymore, forgot how!). so i'd be calculatng angles based on projections, and determining whether my desired vector is left or right of my current vector, and by how much. just seems like the wrong kind of numbers to me. vectors are more about translation than rotation. but as i said, i recall almost zero linear, so there may be a way that i don't remember

 

really the way to do might be with a quat, since they can do it all, then simply limit the degrees of freedom for the game in question. but this may be overkill. i've only contemplated using quats in games with 3 degrees of rotational freedom.

0

Share this post


Link to post
Share on other sites

Just use direction vector and your up vector to construct a matrix for the object. 2 of the rows will be the direction and up vector and the other one will be the cross product of those (need to normalise it if up and facing aren't perpendicular after doing the cross product).

0

Share this post


Link to post
Share on other sites

Just use direction vector and your up vector to construct a matrix for the object. 2 of the rows will be the direction and up vector and the other one will be the cross product of those (need to normalise it if up and facing aren't perpendicular after doing the cross product).

 

I used a similar approach in my starship flight sim, but it had up, right, and forward unit vectors, as it had 3 degrees of rotational freedom.

 

to turn it, i'd rotate the three points  around x,y,and z, then gram-shmidt. the engine was a p-correct t-mapped poly engine i wrote myself when MS bought rend386 and took it off the market, and before they released it as DirectX 1.0. Thee engine used euler angles for input, so I still had to do projections to extract the euler angles for drawing. it worked great, starships would swoop, dive, roll, etc.

 

right now i'm working on an airship flight sim.   but again, its a 2 degrees of rotational freedom game, so it uses xrot and yrot for orientation. 

 

before doing that, i tested storing orientation as a 4x4 matrix, and using gram-shmidt.   its ok, at 1600x900, but not pixel perfect.   

 

i really need to break down and learn quats. but that's not required until the next version of Gamma Wing or SIMSpace, or if i get motivated to do a WW1 fighter sim (like red barron 2).

 

i guess the idea with a quat is you can rotate the direction vector with a rotation matrix to turn on two of your axes, and simply increment or decrement zrot to rotate around local z.

0

Share this post


Link to post
Share on other sites

Just use direction vector and your up vector to construct a matrix for the object. 2 of the rows will be the direction and up vector and the other one will be the cross product of those (need to normalise it if up and facing aren't perpendicular after doing the cross product).

 

 

ok, but how do you figure out which way to turn and climb/dive, given a current direction unit vector, and a desired direction unit vector?

 

calculating the displacement vector in each plane would just be SUBs, nice and fast, can do that 40,000 times a frame, no prob.

 

but that only gets you direction (left/rght) and final position (displacement, left or right), you'd still have to back solve to get the rotation amount.

 

the problem i'm trying to solve is this:

i'm a critter in the game. i have a current direction (yrot or current_direction_unit_vector). i have a current position x1,z1.

i have a target (another critter) at x2,z2. whats the direction to the target?

from this i can determine the relative heading to the target, by subtracting my heading from the heading to the target: rel_heading = heading to tgt -  my yrot.

this tells me whether to turn left or right, and by how much, and if my target is in an "arc of fire" in front of me. arc of fire is used to turn attacking critters towards their target until abs(relheading) < 15 degrees, before they start to run towards the target, so they don't run in big circles. abs(relheading) < 45 degrees is used for testing if a target is "in strike zone" in melee combat.

 

the game used to pre-calculate relative heading to target once every frame for every active critter. calculating heading requires atan. basically atan dx/dz, trapping out the 4 cardinal point cases, and watching your signs and numerators and denominators, depending on the quadrant.

 

this was sufficient for real time, and 2x speed, but not for 900x and 5400x speeds.

 

i finished reorganizing the AI into nice modules, and making everything select target once every 5 seconds, and calculate heading to target when it selects target, and as needed when it changes AI state, such as an animal returning  to a location its supposed to defend, and needing to calculate the heading to the nearest threat, now that is "back at its post" again. range to target is pre-calculated once per frame.

 

i re-ordered the checks in the loops that select a target, so that on average it will skip critters that should not be targeted sooner. no major speed up there. the select target routine used the most was already in optimal order. 

 

other than finding a substitute for atan, abut the only other thing i can think of is speeding up the search for select target. its a loop that searches a static array of structs.

right now it searches the entire list (200 structs), even though there many only be 20 or so active. i was thinking of keeping track of the last active struct, and stopping the loop there. the list is un-ordered. after skipping inactive, dead, friendly, etc targets, is does a bbox range check. but once the simulation gets close to 200 active targets, that will be about 40,000 range checks each time targets are selected. at once every 5 game seconds, with a completion time of 4 game hours to make a woods spear, that works out to about 115.2 million range checks while the  player sits and watches an animation of their caveperson making a wood spear (whats planned), or watches the world run at high speed in the background while a message displays progress (what it does now). 900x game speed draws one frame every game minute. 5400x speed draws one frame every game hour. so the effect right now with the progress message is like watching time lapse photography. if the game didn't have 3rd person view, i might not do the animations at all.

0

Share this post


Link to post
Share on other sites

one speedup i though of (inspired by the displacement vector idea):

 

instead of calculating heading(x1,z1,x2,z2) using atan:

 

dx=x2-x1

dz=z2-z1

 

by checking for dz or dz = 0 you get the 4 cardinal directions

 

by checking dx > dz and dx < dz, you get the 45 degree cases, and can pin everything else down to an octet (22.5 degrees). by checking for things like dx > 2*dz you can get it down narrower than an octet. this may be a good enough approximation.and requires no atan, just SUBs, COMPs, and maybe some MULs.

0

Share this post


Link to post
Share on other sites

Reading through your discussion points suggests to me that you are optimizing the wrong problem in general.  Atan should not be in your list of problems, more than likely your problem is simply one of not stepping back and optimizing how often you are calling various functions at a higher level.  I.e. the quote: "so far i've gone to selecting target (which requires a range check to every active target) once every 5 seconds" should draw your scrutiny and atan in general should be ignored.  Instead, what you probably want to be looking at is how to remove calls to the range check in the first place.

 

The first thing that jumps out at me in the mentioned quote quote is that it is a description of a very bad "O" notation processing time curve.  Your curve is basically number of calls versus number of objects or: 2 objects means 2 calls (i.e. distance from me to you and the inverse), 3 objects means 6 calls, 4 objects means 12, 5 objects means 20 calls etc.  That's an "O( n^2-n )" pattern which is very bad.  By trying to update less frequently you are reducing only by a linear amount and as such, no matter what you do at some point the number of calls still drags everything to a stand still.  What you really want is something to get rid of the "n^2" portion of this problem so you are only calling the code in question once in a while, or not at all preferably.

 

Before I suggest my solution, I'll make a specific point:

 

atan(2) is not really evil, it's not even that slow unless you call it in massive amounts.  What is more evil is that it is often a function where you are getting a cache miss since it is used to figure out an angle between this object and another object and that other object is unlikely to be in cache at the moment.  A profiler will often glob up the cache miss into the atan call since the compiler will often make the actual "access" to the other object while setting up the stack prior to calling atan(2).  Usually this means you are seeing a cache hit as included with the atan(2) call and miss-interpreting what is actually slow.  If you see atan(2) in your profiles, check the call count, if it is very high then the cost is likely true, if it is a couple hundred, you are likely looking at a cache miss and not a function call problem.  (Per frame that is.)  Profiling is critical but also reading "between the lines" is exceptionally important since cache misses are more of a black art to figure out than a gospel from any profiler or guess you can make.

 

 

Anyway, my suggestion for you is to stop optimizing things the way you are because your first goal seems like it should be to reduce the O notation issue.  (I.e. the n^2 portion of the number of calls I described above.)  You also want to reduce cache misses which suggests you need to step back and work the problem at a higher level.  The higher level would be SAS (spacial awareness system) which works over all objects and only triggers your AI bits as actual "entry/exit" events happen.  An SAS is optimized to correctly figure out the distances between objects and trigger events only when an object properly enters a given range, without going n^2 in processing time or incurring the dreaded cache misses constantly.

 

You might want to check out: http://codesuppository.blogspot.com/2008/03/spatial-awareness-system.html for an overall description of several variations.  I tend to use a sweep/prune method myself as flocking is part of my simulation and while it is slower overall it doesn't get the big performance spikes of other solutions when many objects move together.

 

Given the problems you seem to be facing though, working the problem at a higher level seems to be required as the current direction is not likely to be good enough without sacrificing a lot of game play.  My personal unoptimized single core version deals with 5k objects at 500fps easy in debug, while release is 1700+ (stl usage in debug sucks).  Given something like this in your game, you would simple start making something, add an awareness bubble around the character and if there are any "entry" events for enemies, go back to normal speed.  No more n^2 issues and optimizing small repeatedly called functions won't be a big issue anymore.

1

Share this post


Link to post
Share on other sites

Atan should not be in your list of problems, more than likely your problem is simply one of not stepping back and optimizing how often you are calling various functions at a higher level.

and only triggers your AI bits as actual "entry/exit" events happen.

 

random encounter is the entry event. death or range > visual_range + 50 ft or range > 300 feet (depending on the target type) is the exit condition (target removed from simulation). so the only AI crunched is for alive critters within 300 feet or less of a player character.

 

add an awareness bubble around the character and if there are any "entry" events for enemies, go back to normal speed.  No more n^2 issues and optimizing small repeatedly called functions won't be a big issue anymore.

 

the whole idea is to get it to run fast at 900x and 5400x normal speed with up to 200 active units in the player's awareness bubble, preferably without "faking" it.

 

 

Atan should not be in your list of problems, more than likely your problem is simply one of not stepping back and optimizing how often you are calling various functions at a higher level.

 

i'm now down to selecting target once every 5 seconds, and doing the atan at that time. there are a couple other places where i do atan, when AI state changes, like from targeting a location to move towards, to targeting a nearby hostile critter. but even that 5 seconds is probably too much. it means its up to 5 seconds before a critter notices and responds to the player or any other hostile. 

 

atan and the range checks are the two bottlenecks. atan now gets pre-calculated. so the range checks are the killer.

it is a description of a very bad "O" notation processing time curve.

 

 

yep, checks go up as the square of the number of targets.

 

gotta sign off

 

will finish reply later.

0

Share this post


Link to post
Share on other sites

random encounter is the entry event. death or range > visual_range + 50 ft or range > 300 feet (depending on the target type) is the exit condition (target removed from simulation). so the only AI crunched is for alive critters within 300 feet or less of a player character.

This is exactly the point I was making with all the O notations.  An SAS would cut the processing time in massive ways using any of the suggestions from the link I provided (well, excepting the brute force method which is what you currently use) because you would only have to worry about the "entry" events.  The SAS is optimized to run the complicated I see you/you see me stuff without going exponential in processing time.

 

add an awareness bubble around the character and if there are any "entry" events for enemies, go back to normal speed.  No more n^2 issues and optimizing small repeatedly called functions won't be a big issue anymore.

 

the whole idea is to get it to run fast at 900x and 5400x normal speed with up to 200 active units in the player's awareness bubble, preferably without "faking" it.

 

Yup, and by removing the primary overhead via pushing it out to a global system which can avoid all the bottlenecks of a "per object" solution you should be able to do that without tricks quite easily.  Simply reducing the update rate is a linear solution, it won't be a solution for long if you increase the unit counts since your current version is exponential in cost.

 

Atan should not be in your list of problems, more than likely your problem is simply one of not stepping back and optimizing how often you are calling various functions at a higher level.

 

i'm now down to selecting target once every 5 seconds, and doing the atan at that time. there are a couple other places where i do atan, when AI state changes, like from targeting a location to move towards, to targeting a nearby hostile critter. but even that 5 seconds is probably too much. it means its up to 5 seconds before a critter notices and responds to the player or any other hostile. 

 

atan and the range checks are the two bottlenecks. atan now gets pre-calculated. so the range checks are the killer.

Again, I suggest that you may be miss-identifying your problem.  Atan is a really damned fast function in reality, same with distance checks which is a simple subtraction then a square root.  (Ok, non-simd code they take dimension times n time.)  These are trivial math problems and should be executing in absolutely no noticeable time.  But, if they show up n^2 times per call, yes, they are going to be hot spots in the code.  Or, perhaps, the more notable issue is that the cache misses get identified with these functions and something which should take 10-20 cycles all of a sudden takes 500-2000 cycles to resolve because it has to go fetch a cache line.

 

Either way, a SAS would solve a lot of this by offloading a very complicated task from being per object to a global system which properly optimizes things without the brute force solution.

 

it is a description of a very bad "O" notation processing time curve.

 

 

yep, checks go up as the square of the number of targets.

I figured given the description.  Hopefully the above clarifies why I think you should stop optimizing what seems to be a minor issue.  :)

0

Share this post


Link to post
Share on other sites

while checking into SAS's and K-dimensional trees, it occurred to me that i never did a "in awareness bbox" check in the loop for select target before calculating range. so i added that to set_wild_tgt, which is the target selection routine used by the critters in the test case.  timed it, no major difference. then out of curiosity, i put a trace on it to count the number of times it checks range while selecting the target for a critter. adding "in awareness bbox" cut the number of range checks from one to zero. basically, its not doing any range checks at all, and trivially rejects everything as inactive, same species, or "can't see them through all this overgrowth".

 

when i was reorganizing the AI, I added traces to track the current target and AI state of an animal (the first critter in the herd). according to all the traces, critters never check range, don't have a target, and just transition between stand, wander, and flock AI states. so any remaining slowdown is in the expert system that handles state changes, or in the runstate() routine that executes the various AI states. time for more timing tests, or a quick look at the code to see if some parts are obviously fast or slow. but at this point, the speed the game is running at is pretty decent. but there is that up to 5 second lag in targeting.

Edited by Norman Barrows
0

Share this post


Link to post
Share on other sites

So in other words, I was only partially correct, you were optimizing something trivial but it wasn't being called thousands of times a frame, it wasn't even being called.  :) 

 

Anyway, for the future though, when you look into the SAS systems, here is a movie from a little example I put together real quick using my new simulation engine rewrite:  http://www.youtube.com/watch?v=JVTBRes6gY0&feature=youtu.be  It was a nice little unit test for the system so figured what the hell. :)

 

It is 2000 objects all running the flocking algo but using an SAS to track objects of interest (i.e. worthy of adding to the flocking calculations).  It also randomizes view distances for all the boids between 25 and 50 units (some are pretty near sighted :)) so it is a bit more challenging.  The flocking algo and movement is all done multi-core, the rendering and awareness systems are both currently single core though.  Given that it runs at over 100fps without rendering and the code is experimental and as such not highly optimized, that should give you an idea of how well the SAS system can help you out in the future with such problems if you do actually run into them.

0

Share this post


Link to post
Share on other sites

So in other words, I was only partially correct, you were optimizing something trivial but it wasn't being called thousands of times a frame, it wasn't even being called.   

 

i hadn't gotten to optimizing yet, i was still in the speculation and educated guess phase. sometimes it can show obvious things like out of order trivial rejections in loops, constants being re-calculated inside loops, and other first pass - write it easy, not fast coding stuff.  then you break bad with the timers and tracing once you've done the obvious stuff.    inspecting the code and thinking about it first will give you a good idea what the timers will tell you. if not, then learn from the timers, and next time you'll do a better job of predicting the profile results, or not writing lazy code in the first place where you know its going to matter. just doing the obvious stuff got me from 22 seconds down to about 7 seconds. a trace indicates that no further speedups are possible with atan and range while target selection continues to be done at a rate of only once every five seconds. I'd like to do target selection once per second to make them more responsive. until then, atan and range are fast enough. i now need to look at setstate(), the expert system that sets AI state, and runstate(), the code that executes the various AI states. I know that setstate() can be sped up. for MAINTAIN_DISTANCE_AI herds, every frame it "rolls the dice" to determine if a critter changes AI state between wander, flock, and stand. and sometimes "rolls the dice" to determine what they do next (wander vs flock, etc). runstate() can be sped up too. i have't sliced the avian and non-avian AI into totally separate parts yet, so many of the move() routines (attack, run, etc) have two branches of code in them, one for avians and one for non-avians. splitting that up so its just one short clean shot of code will eliminate some flow control checks. but not potentially setting state every frame , and instead doing it once per sec or so will be the major speedup.

Edited by Norman Barrows
0

Share this post


Link to post
Share on other sites

Possibly 900X 5400X  as long as you can cut down the rendering (and the question then will be how frequently you should or what events warrant an update)  That is if most of your time in the game is spent waiting for animations to finish before calculating the next move (even when the units are running asynchonusly to each other).  

 

If you have heavyweight AI taking up the majority of the processing time then you may be stuck if you cannot abstract the  solutions the AI finds and uses well enough to preserve a valid simulation.

 

 

Oh and your optimization attempts at a low level  - I assume you have already done more general optimiztions like streamlining object create/destruct functions (particularly use of object pools to get rid of the horrendous alloc/dealloc  processing costs).....

 

Lazy evaluations ???  Task queues to get rid of polling loops??  Lockless methods and/or  micro fibers ??

Edited by wodinoneeye
0

Share this post


Link to post
Share on other sites

Given that it runs at over 100fps without rendering and the code is experimental and as such not highly optimized, that should give you an idea of how well the SAS system can help you out in the future with such problems if you do actually run into them.

 

I can see how a system like this would probably be required for persistent NPC's that go about their business whether the player is there or not, like the dogfights occurring all over the front in Red Baron II. When you went on a mission, it modeled the activity of every airborne plane on the entire front.

 

in my case, you only have to worry about the awareness bubble of band members the player controls (max 10 band members). the simulation adds and removes targets from the simulation as they move into and out of the awareness bubbles of the player's band members. They are added when a band member encounters them, either by random encounter,or by moving withing range of some objective, such as an occupied cave. The are removed when they are no longer in any band member's "bubble". checks for removal are done once per second.

0

Share this post


Link to post
Share on other sites

Possibly 900X 5400X  as long as you can cut down the rendering (and the question then will be how frequently you should or what events warrant an update) 

 

900x = one frame drawn per game minute.

 

5400x = one frame drawn per game hour.

If you have heavyweight AI taking up the majority of the processing time then you may be stuck if you cannot abstract the  solutions the AI finds and uses well enough to preserve a valid simulation.

 

If you have heavyweight AI taking up the majority of the processing time then you may be stuck if you cannot abstract the  solutions the AI finds and uses well enough to preserve a valid simulation.

 

Well, its not really heavyweight, randomly change states (flock, wander, stand), turn, move. occasional heading calculations for turning (i had to go to once per second in the attack AI's to get it responsive enough), and of course, collision checks when moving.

 

Changing state once every 5 seconds vs once per second didn't do much.

 

Oh and your optimization attempts at a low level  - I assume you have already done more general optimiztions like streamlining object create/destruct functions (particularly use of object pools to get rid of the horrendous alloc/dealloc  processing costs).....

 

In the test case, no targets are being added or removed, there's just 2 dozen of them active on the screen mulling around while you watch 4 hours of game time time go by in seven seconds (at 900x).

 

all targets are stored in a static array of structs, so no malloc's or free's to slow things down or cause a possible memory leak.

0

Share this post


Link to post
Share on other sites

Task queues to get rid of polling loops?? 

 

windows message pump is the closest thing to a polling loop active during the test case.

 

the game does a lot of simulation in the background. i may need to time it with and without critters active. I may be down to the point where active critters are not such a bottleneck compared to the overall overhead of everything else being simulated.

0

Share this post


Link to post
Share on other sites
Given the amount of speculation you're doing on what's slow, I'm still rather shocked you haven't bothered using a profiler yet. Seriously, give it a go, you might be surprised how much you discover.
0

Share this post


Link to post
Share on other sites

Given the amount of speculation you're doing on what's slow, I'm still rather shocked you haven't bothered using a profiler yet. Seriously, give it a go, you might be surprised how much you discover.

 

Yes, at this point all the obvious stuff has been done, and it time for divide and conquer with the timers if i want to do more. but the most recent tests show the game running at about 917x normal speed, when running at "900x" speed.   Note that 900x is a theoretical approximate speed. IE its only drawing the screen once every 900 "game turns". And inspection of the code shows a relatively even distribution of relatively fast code. So the results will most likely be that i'm spending a little time everywhere. The first thing to test is time with critters active vs time without. that will determine whether its the AI of the active animals, or the background modeling being done all the time. then its time for the timers to determine more.

0

Share this post


Link to post
Share on other sites

 


Given that it runs at over 100fps without rendering and the code is experimental and as such not highly optimized, that should give you an idea of how well the SAS system can help you out in the future with such problems if you do actually run into them.

 

I can see how a system like this would probably be required for persistent NPC's that go about their business whether the player is there or not, like the dogfights occurring all over the front in Red Baron II. When you went on a mission, it modeled the activity of every airborne plane on the entire front.

 

in my case, you only have to worry about the awareness bubble of band members the player controls (max 10 band members). the simulation adds and removes targets from the simulation as they move into and out of the awareness bubbles of the player's band members. They are added when a band member encounters them, either by random encounter,or by moving withing range of some objective, such as an occupied cave. The are removed when they are no longer in any band member's "bubble". checks for removal are done once per second.

The way I deal with this is very simple but it leverages the SAS considerably expecting it to have done all the heavy lifting.  This is the basic idea:

 

class somedoodad : public Go::Object
{
  somedoodad()
  :  mUpdate( std::bind( &somedoodad::Update, this ) )
  {....  this is only used for serialization since member func pointers don't serialize ....}
 
public:
  somedoodad( Go::Manager& gm )
  : Go::Object( gm )
  , mUpdate( gm, std::bind( &somedoodad::Update, this ), World::Priorities::kActiveGo )
  {
    // Go ahead and activate this object, this will bind the update and start calling it
    // in the multicore engine.
    Manager().Activate( this );
  }
 
  void Update()
  {
    if( GetComponent< Go::Awareness >()->Inverse().size()==0 )
    {
       // Oh noes, time to die.
       Manager().Destroy( this );
       return;
    }
 
    // Do normal realtime update when in view of other objects.
  }
};

 

Simple, and the heavy lifting is performed in the SAS as mentioned.  Performance wise, the combination of the object/component system itself and leveraging the SAS solution allows an unoptimized (it's a rewrite leveraging C++11 as you can see) simulation engine to run 500-1000 "active" objects to be updated at well over 500fps when graphics/sound items are disabled.  So, again, the SAS solution does most of the work and my current variation has plenty of room for more specific optimization and multi-core work which can supply an easy 2-3 times performance increase overall without nit-picking details.

 

I do agree with Apoch though, performing a real profiling pass would refine the issue considerably.  I personally never optimize anything without a profile in hand to tell me what specifically started slowing things down.  Currently though, the only low level optimization I have in my rewrite is the SIMD math library which I moved over from the older system, everything I've been doing to maintain the current performance is usually just fixing brute force items to be more systematic, such as the SAS solution instead of searching for things manually.

0

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0