Sign in to follow this  
Ashaman73

Hitting the perforamce wall: sensory scanning

Recommended Posts

Well, I'm currently trying to test and tune the performance of my game and I'm hitting a performance wall within my AI.
Some background:
[list][*]pathfinding is not part of the bottleneck and can be ignored[*]entities are controlled by a behavior tree writting in Lua using LuaJit2[*]entities change update intervals according to activity of surrounding(LOD)[/list]
My stress test is a cluster of about 50 entities in a combat situation with the player close. My idle performance for some entities is ~4ms , this stress test goes up to 60-120 ms, that is 1-2ms per entity :o. My goal is to reduce it to a much lower, more stable behavior (~20ms).

Due to the none linear behavior of the performance, I fear that the sensory scanning of the surrounding is the bottle neck O(n^2). I can query the surrounding for entities which met a list of certain criteria and get a basic rating of their importants (checking the criteria is quite expensive). I'm not sure what the best approach would be, some ideas sofar:
[list=1][*]Asynchronously sensory scanning, an entity enqueues a request and get the result a few frames later.
Issue: Delayed reaction and refactoring of my BHTs.[*]Dynamic range adjustment, when the performance decreases the scanning range decreases too.
Issue: missing important targets when inside of a cluster.[*]Caching of sensory results: partly done (current target is locked and reevaluated)
Issue: missing of changing surrounding, delayed reaction ?[/list]
Anyone has some experiences, best practises or ideas ?

Share this post


Link to post
Share on other sites
[quote name='Ashaman73' timestamp='1313560043' post='4850158']
Well, I'm currently trying to test and tune the performance of my game and I'm hitting a performance wall within my AI.
Some background:
[list][*]pathfinding is not part of the bottleneck and can be ignored[*]entities are controlled by a behavior tree writting in Lua using LuaJit2[*]entities change update intervals according to activity of surrounding(LOD)[/list]
My stress test is a cluster of about 50 entities in a combat situation with the player close. My idle performance for some entities is ~4ms , this stress test goes up to 60-120 ms, that is 1-2ms per entity :o. My goal is to reduce it to a much lower, more stable behavior (~20ms).

Due to the none linear behavior of the performance, I fear that the sensory scanning of the surrounding is the bottle neck O(n^2). I can query the surrounding for entities which met a list of certain criteria and get a basic rating of their importants (checking the criteria is quite expensive). I'm not sure what the best approach would be, some ideas sofar:
[list=1][*]Asynchronously sensory scanning, an entity enqueues a request and get the result a few frames later.
Issue: Delayed reaction and refactoring of my BHTs.[*]Dynamic range adjustment, when the performance decreases the scanning range decreases too.
Issue: missing important targets when inside of a cluster.[*]Caching of sensory results: partly done (current target is locked and reevaluated)
Issue: missing of changing surrounding, delayed reaction ?[/list]
Anyone has some experiences, best practises or ideas ?
[/quote]

You may be able to prune your calculations using BSPs or Quadtree's to designate units that are allowed to update since something interesting is going on next to them (i.e., the player is there), the rest are not even quieried until they are close enough become actively involved at which point they are activated. You're best bet is to reduce as much as practical any "search" or "scanning" algorithms in favor of direct calculations with an entity based on local neighbors in the BSP or Quadtree to see if an action should take place due to some sort of threshold (e.g., i can see the player and i want to hurt him so i attack, player made a noise i could hear i'm going to go find him, i see that food and i'm hungry so im going to eat it, etc.). At least that is my approach and what i've seen from alot of AIs that try to implement more complex behavior. Each AI can easily know what it is near it locally since that is quickly accessible in the BSP or Quadtree, and therefore just does a test to see if it is close enough to care about it to respond with some sort of action. Course, if you get 50 AIs in the same location, this could still slow things down, so you'll want to spread them out smartly, but it is still better than testing 50 AIs in a level when you only need to test 2 or 3 which are close enough to respond to the player cause they're all in the same quadrant or subquadrant. Either way, quadtrees make finding which AIs should be active logarithmic and is constant afterwards until the quadrant is left, and makes action determination constant (best)[1 AI and 1 player] or linear (average)[more than 1 entity in a quadrant with memory of past actions] per AI. Even if you allow testing of everything in that quadrant, you're still dealing with a handful of things, so the number of tests is low, so things move quickly.

This also helps a bit with allowing AIs more freedom of action separate of the player as you can run a quadrant per cycle or a subquadrant per cycle to further limit things and then start updating "uninvolved" AIs less when player-vicinity AIs get actively involved with the player. This would allow for more autonomy with less overhead and allows your AIs to do other interesting things, like stumble upon the player, stalk other AIs or look for junk, or respond to a level-wide alarm. For finding stuff your AI doenst directly know about, but may have "general" ideas of, depending on your implementation, it may be able to "roll up" information if you need an AI to find soemthing, so that it knows that quadrant X contains a subquadrant with item Y, it then procedes to quadrant X and looks further when it gets there, eventually getting to the locaiton of item Y. You can use this for the AIs to talk to eachother and share knowledge of interest like "i saw a player in quadrant X" or "there was a dead body in quadrant X, watch out!".

Basically, the best thing i can suggest is, if the player isnt experiencing it, it ultimately is wasted cycles, so don't update AI that isnt involved with the player unless it is important to your game. Keep AI dormant or unchecked until it becomes a concern to the player whenever possible, you can cheat this in many ways: Dormancy can be AIs sleeping, "chatting" with other AIs, "goofing off", "bored", watching a campfire, "cleaning" something, whatever that makes look more interesting to the player when they stumble upon the AI, but ultimately require no cycles to process in any way until the player stumbles upon them when you can activate any animations or whatever to give variety until the player takes action or the local AIs notice the player in some way and react.

Anyway, hope that helps.

Share this post


Link to post
Share on other sites
Make sure your actions are largely event-based rather than polling for them. e.g. Don't constantly check to see if a grenade went off in your area. Let the grenade tell [i]you[/i] that it went off.

Also, if there are expensive calculations that are going to be used be multiple agents, store them separately and don't recalculate for each agent. When needed, the agents can check for the calculated value and simply use it.

Share this post


Link to post
Share on other sites
As NetGnome suggested, are you using space partitioning for anything else like collision or rendering? If so, you can use it to make AI queries fast. After all, AI sensor range queries are very similar to collision detection ... i.e. "what units intersect with my sensor range"?

Personally, I did a game with 100's of units on essentially a 2D plane (top-down RTS style game play) and used the game's existing spatial partitioning grid to optimize the AI queries. Since the partitions are uniform, I was able to define all AI unit's sensor ranges in terms of grid squares, so there was very little explicit distance calculation when finding targets. If the target is in a nearby grid square, it is in range. Even with 200+ AI units the game runs 60 fps on 5 year old laptop hardware.

Anyway, yeah...use spatial partitioning if you can. ;)

Share this post


Link to post
Share on other sites
@NetGnome and EJH:
Thx for the sugguestions, but all these things I have already considered and implemented in a similar way. For example I'm using sweep'n'prune + binary search to query the surrounding, which is very fast even with a very high number of queries per frame.

This isn't the issue. The problem is, that once I've determined the potential candidates I need to check for game related properties (is it an enemy, is it injured, is it an item or a npc etc.). These tests are quite expensive in my engine due to the nature of my attribute handling, this in combination with a cluster of npc seems to be one of the bottlenecks (~40% AI performance in cluster test case).

Currently I'm checking all potential candidates in the surrounding, in a cluster this could be 10-20. With a more or less 50 entity cluster , these are 500-1000 entity scans per frame. Certainly I have to take a look at the per-frame scan, these seems obviously too much, still I must take care to do some load balancing to avoid sudden time peaks .
If I would only check the first 10 entities it could lead to reaction artifacts (i.e. an entity stands in a cluster of 10 unimportant entities not seeing the 11th dangerous entity approaching).

[quote]Make sure your actions are largely event-based rather than polling for them. e.g. Don't constantly check to see if a grenade went off in your area. Let the grenade tell [i]you[/i] that it went off.

Also, if there are expensive calculations that are going to be used be multiple agents, store them separately and don't recalculate for each agent. When needed, the agents can check for the calculated value and simply use it.[/quote]
I'm polling only to determine entities around to include
new entities who enter the area of interest. When scanning an entity I already use some kind of "view" which maps certain states to simple attributes (i.e. 100 of 500 hp => condition="bad"). Thereafter most checks depends on the association between two entities.

Hmmm... maybe it is time to implement some kind of query-engine and move the script-part into the more efficient c++ engine in addition to an entity-entity query cache...:unsure:

Share this post


Link to post
Share on other sites
[quote name='Ashaman73' timestamp='1313647262' post='4850626']
@NetGnome and EJH:
Thx for the sugguestions, but all these things I have already considered and implemented in a similar way. For example I'm using sweep'n'prune + binary search to query the surrounding, which is very fast even with a very high number of queries per frame.

This isn't the issue. The problem is, that once I've determined the potential candidates I need to check for game related properties (is it an enemy, is it injured, is it an item or a npc etc.). These tests are quite expensive in my engine due to the nature of my attribute handling, this in combination with a cluster of npc seems to be one of the bottlenecks (~40% AI performance in cluster test case).

Currently I'm checking all potential candidates in the surrounding, in a cluster this could be 10-20. With a more or less 50 entity cluster , these are 500-1000 entity scans per frame. Certainly I have to take a look at the per-frame scan, these seems obviously too much, still I must take care to do some load balancing to avoid sudden time peaks .
If I would only check the first 10 entities it could lead to reaction artifacts (i.e. an entity stands in a cluster of 10 unimportant entities not seeing the 11th dangerous entity approaching).

[quote]Make sure your actions are largely event-based rather than polling for them. e.g. Don't constantly check to see if a grenade went off in your area. Let the grenade tell [i]you[/i] that it went off.

Also, if there are expensive calculations that are going to be used be multiple agents, store them separately and don't recalculate for each agent. When needed, the agents can check for the calculated value and simply use it.[/quote]
I'm polling only to determine entities around to include
new entities who enter the area of interest. When scanning an entity I already use some kind of "view" which maps certain states to simple attributes (i.e. 100 of 500 hp => condition="bad"). Thereafter most checks depends on the association between two entities.

Hmmm... maybe it is time to implement some kind of query-engine and move the script-part into the more efficient c++ engine in addition to an entity-entity query cache...:unsure:
[/quote]

Given that you have so many AIs so close together, you can probably get away with implementing some form of swarming, so that 2-5 of your enemies are actually just one AI that appears to be 2-5 individuals. Of course this is heavily dependant on the game you're making and whether this can be achieved without it becoming obvious to the player, but if done right, it may look like a squad of enemies working in tandom, giving the appearance of "better" AI while also cutting down your calculations by ~2-5x (assuming you're crafty about it)

Share this post


Link to post
Share on other sites
[quote name='Net Gnome' timestamp='1313749008' post='4851135']
Given that you have so many AIs so close together, you can probably get away with implementing some form of swarming, so that 2-5 of your enemies are actually just one AI that appears to be 2-5 individuals. Of course this is heavily dependant on the game you're making and whether this can be achieved without it becoming obvious to the player, but if done right, it may look like a squad of enemies working in tandom, giving the appearance of "better" AI while also cutting down your calculations by ~2-5x (assuming you're crafty about it)
[/quote]
This is just a test case. All of my AI agents are individuals with their own demands,weaknesses, fears etc. , but there are certain game events where a cluster of entities is probably and I want to cushen the FPS impact. In fact it is a performance optimisation.

I've already extended my engine to run on multiple cores, but AI, especially when running in a VM (lua), is really hard to optimize, therefore I've started to transfer the scanning of the surrounding from my script engine to the C++ engine and make it multithreaded and asynchroniously. I've done this already with pathfinding with really great results.:D

Share this post


Link to post
Share on other sites
[quote name='Ashaman73' timestamp='1313750492' post='4851141']
[quote name='Net Gnome' timestamp='1313749008' post='4851135']
Given that you have so many AIs so close together, you can probably get away with implementing some form of swarming, so that 2-5 of your enemies are actually just one AI that appears to be 2-5 individuals. Of course this is heavily dependant on the game you're making and whether this can be achieved without it becoming obvious to the player, but if done right, it may look like a squad of enemies working in tandom, giving the appearance of "better" AI while also cutting down your calculations by ~2-5x (assuming you're crafty about it)
[/quote]
This is just a test case. All of my AI agents are individuals with their own demands,weaknesses, fears etc. , but there are certain game events where a cluster of entities is probably and I want to cushen the FPS impact. In fact it is a performance optimisation.

I've already extended my engine to run on multiple cores, but AI, especially when running in a VM (lua), is really hard to optimize, therefore I've started to transfer the scanning of the surrounding from my script engine to the C++ engine and make it multithreaded and asynchroniously. I've done this already with pathfinding with really great results.:D
[/quote]

Yea, a VM would definitely kill performance :) Though i am interested in how you structured your multi-threaded pathfinding and returned the info to the AI.

Share this post


Link to post
Share on other sites
[quote name='Net Gnome' timestamp='1313750984' post='4851146']
Though i am interested in how you structured your multi-threaded pathfinding and returned the info to the AI.
[/quote]
To avoid unnecessary locking my pathfinding works only when modification of the underlying waypoint structure is barred. In this case multiple thread can read the waypoint structure without any problem. Whenever I manipulate the wp structure (seldom) I reset all running pathfinding processors.

The communication between pathfinder and AI is event driven. The AI can submit a pathfinding request which will be enqueued to the pathfinding queue. The requests will be dispatchted to one of pathfinding processors, which will do the pathfinding job within a time slice(i.e. 50 nodes are visited per frame) and send the result to the AI. The current action of the AI is stopped while waiting on the path result, thought the AI is still active (scanning the surrounding, reacting to events etc.).

My multithreaded approach is similar to the one in Id tech 5(google for id tech 5 challenges), where they use job lists and signal-wait tokens for synchronisation.

Share this post


Link to post
Share on other sites
[quote name='Ashaman73' timestamp='1313756833' post='4851173']
[quote name='Net Gnome' timestamp='1313750984' post='4851146']
Though i am interested in how you structured your multi-threaded pathfinding and returned the info to the AI.
[/quote]
To avoid unnecessary locking my pathfinding works only when modification of the underlying waypoint structure is barred. In this case multiple thread can read the waypoint structure without any problem. Whenever I manipulate the wp structure (seldom) I reset all running pathfinding processors.

The communication between pathfinder and AI is event driven. The AI can submit a pathfinding request which will be enqueued to the pathfinding queue. The requests will be dispatchted to one of pathfinding processors, which will do the pathfinding job within a time slice(i.e. 50 nodes are visited per frame) and send the result to the AI. The current action of the AI is stopped while waiting on the path result, thought the AI is still active (scanning the surrounding, reacting to events etc.).

My multithreaded approach is similar to the one in Id tech 5(google for id tech 5 challenges), where they use job lists and signal-wait tokens for synchronisation.
[/quote]

Very interesting. So you're AI queues up an "Update Path" job to your pathfinder processor and enters itself into a "wait for path update" state so it can continue to monitor things. Then before the pathfinder thread rejoins, it fires off an "Update AI with Path" event which the AI consumes at which point it changes itself back to some sort of action state? Very slick!

Are the events managed via the AI or are they managed via the pathfinder processor, or some game event manager?

Share this post


Link to post
Share on other sites
[quote name='Net Gnome' timestamp='1313761244' post='4851197']
Very interesting. So you're AI queues up an "Update Path" job to your pathfinder processor and enters itself into a "wait for path update" state so it can continue to monitor things. Then before the pathfinder thread rejoins, it fires off an "Update AI with Path" event which the AI consumes at which point it changes itself back to some sort of action state? Very slick!
[/quote]
Yes, the same principle (I'm using BHTs, nost only FSMs, which can take control from the waiting node ).

[quote name='Net Gnome' timestamp='1313761244' post='4851197']
Are the events managed via the AI or are they managed via the pathfinder processor, or some game event manager?
[/quote]
I have a general purpose bus system in my engine over which several components can communicate via events.

Share this post


Link to post
Share on other sites
Space partitioning - simple is 2D 3D zones that evey movement removes only zone registration and adds to new or leaves the same if unchanged
A grid square or cubic area with size about that of the maximum interaction range (or finest LOD). Since you usually wont be centered you will have to check all
adjacent squares (8) /cubes(26) and filter with true interaction distance to get a culled interaction/sensor list for the object. If populations are usually low per 'zone'
then simple fast link-lists can be used instead of a more heavyweight (and wasteful) container types.

Individual objects maintain a 'target' list of already assessed 'in-range' targets so that you dont have to reevaluate them every time.
You will have to add newly seen targets and remove long non-visible ones AND if your AI logic evaluated actions or distances
(ie- closer is a higher threat). Thats Delta (change) processing that would require reevaluation due to some difference in the situation)
AI interpretation of the situation dwarfs the basic object culling processing so if you can cache the results of alot of classification
processing you can often save alot of CPU resource. That also allows for AI processing to have a continuum of identified objects to
watch patterns of behavior (and depending if you have partial/faulty sensor info to infer positions of somethig you cant temporarily see)

If your game has 'sides' then the results of AI classification/assessment might be pooled per side instead of per object. Grouping entities
(ie- a fleet) pooling at a smaller size than a 'side' likewise might be done.

LUA runs about 10X slower than native code so if you can move some of these bulk crunching operations to C++ then you can save
a significant amount of CPU and leave more time for the irregular logic of the scripting.


I think LOD processing interval was already mentioned (long range scan would happen less often than close range where the more
complex/immediate/high-priority interactions will take place.

Share this post


Link to post
Share on other sites
[quote name='ApochPiQ' timestamp='1313921312' post='4851846']
Have you actually profiled your code at all?
[/quote]
Yes, I have an ingame time profiler on microsecond base to measure the avg call duration and the call count of certain important processes. The benefit is, that I got time measuring on every run with a detailed log at the end, I can narrow down on certain test cases and I can use it within lua. Additionally I use [url="http://sleepy.sourceforge.net/"]sleepy[/url], a free profiler.

I have not used any commercial profiling tools sofar, I save the trail versions for hardship cases :wink: and I've tested the lua profiler without success.

Share this post


Link to post
Share on other sites
If you already have profiling data, any clear optimization wins should be staring you in the face.

Also, without [i]us[/i] having access to that data, pretty much any advice you get on optimization here is going to be about as useful as blindly pounding on the keyboard and hoping a faster algorithm pops out when you're done. We don't know nearly enough about your systems, your code, your hardware, or anything else relevant to your situation to be able to give you [i]really relevant[/i] advice on your particular case. The more you can share, the better.

Share this post


Link to post
Share on other sites
[quote name='ApochPiQ' timestamp='1314040437' post='4852457']
If you already have profiling data, any clear optimization wins should be staring you in the face.

Also, without [i]us[/i] having access to that data, pretty much any advice you get on optimization here is going to be about as useful as blindly pounding on the keyboard and hoping a faster algorithm pops out when you're done. We don't know nearly enough about your systems, your code, your hardware, or anything else relevant to your situation to be able to give you [i]really relevant[/i] advice on your particular case. The more you can share, the better.
[/quote]
Yes, it dawns on me that this special case seems to be more of a monolog where I must help myself :( (and yes, I'm currently optimizing the bottleneck staring right at me :P). The issue is, that I need quite "advanced" advices to optimize it further. The engine is in development since 12 years and it is far from perfect, but I've done a lot to keep a clear design, to optimize performance and keep it stable and bug free. It is quite teethed, writing all this down to give enough information would lead to a wall of text.

Ok, an other, more general approach: [i]How to make an AI written in lua multithreaded ?[/i]
Don't answer this question yet, I think that I need to create a new topic asking this question. Sofar I'm thanking all who tried to help me arranging my ideas :wink:

Share this post


Link to post
Share on other sites
I arrived a little late for the party, but on the individual vs swarm AI, I would submit to you that when you have so many agents clustered together it would actually be desirable for them to team up and behave as a group, even if they started as individuals, if two stray army soldiers on the same team meet up in the battlefield they would likely join forces as long as they have a common goal, in this case, killing the player.

As for lua scripting, make sure you use lua to connect the dots and not to make the dots, leave calculations and behavior implementation to the native language and use lua only to trigger the reactions to specific events, events that should also be calculated and triggered from native language.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this