• Create Account

Banner advertising on our site currently available from just \$5!

# [Flocking] How to improve performance?

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

18 replies to this topic

### #1sheep19  Members   -  Reputation: 414

Like
0Likes
Like

Posted 12 April 2014 - 04:18 PM

I'm doing a simulation of flying birds for my thesis.

I am using the usual steering behaviors: Cohesion, Separation, Velocity Match and Obstacle Avoidance (to avoid terrain).

The program is written in C# and uses Unity3D. I have disabled physics for the boids.

I'm also using alglib to calculate K-nearest neighbors using AKNN. alglib uses k-d trees for this.

So here are the fps I get:

100 boids = 60 fps

200 boids = 35 fps

300 boids = 23 fps

500 boids = 13 fps

1000 boids = 5 fps

As the number of boids increases, performance degrades rapidly. I'd like to do something to improve it, but sadly I am using the free version of unity3D, which does not have a profiler.

Is the performance I'm getting good? I haven't found results online for this.

Should I find another library instead of alglib?

By the way, you can see the program online here. Requires unity3d plugin.

### #2Álvaro  Crossbones+   -  Reputation: 15728

Like
5Likes
Like

Posted 12 April 2014 - 04:30 PM

First, do yourself a favor and start thinking of how long it takes to do something, not how many times you can do something per second. It's easier to think about the former than the latter.

It looks like your performance is approximately linear in the number of boids, which is about the best you can hope for. If you were using a more naive algorithm, you could easily see the time it takes to complete a frame growing with the square of the number of boids.

So the issue is then how you can get more than 6000 boid updates per second. I don't know anything about unity3D, but I am sure there are ways to profile your code. If nothing else, one can insert timing calls into the code to generate a log that tells you how long each step is taking.

If everything else fails, post your code so we can see if you are doing something obviously inefficient.

### #3jefferytitan  Crossbones+   -  Reputation: 2445

Like
0Likes
Like

Posted 12 April 2014 - 10:55 PM

One thing I have heard (but have not confirmed) is that Unity does not support multi-threading, but .Net/Mono does. Therefore you may be able to do all the boid calculations in a separate thread using primitive types, then pass the results back to Unity to update the boid GameObjects. Note that none of the Unity provided types are thread-safe AFAIK. Vector3 should be fine, I hope! ;)

### #4sheep19  Members   -  Reputation: 414

Like
0Likes
Like

Posted 13 April 2014 - 02:27 AM

I have made some measurements using the StopWatch class of the .net framework.

When using the following steering behaviors I had the results below - using 500 boids.

Time to build K-D tree: 2.34ms. So this is not the problem.

Time to update boid steerings (all 500 boids, average time of N frames)

Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 46.77ms

Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 39.55ms

Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 24.44ms

So by not using some behaviors the time dropped significantly. The difference from not using VelocityMatch to not using Cohesion was bigger because Cohesion uses k = 30 for aknn while velocity match uses k = 10. When I set k = 10 for cohesion the time dropped as well.

This is the code that behaviors use to get k-nearest neighbors:

public class Flock
{
List<Entity> boids;
alglib.kdtree kdt;

public Flock(params Entity[] _boids)
{
boids = new List<Entity>();

}

public int GetNeighborhoodInfo(Entity character, int maxNeighbors, float maxDistance, float minDotProduct, double aknnApproxVal, out Vector3 center, out Vector3 avgVeL)
{
var point = new double[3]{character.position.x, character.position.y, character.position.z};

var results = new double[0, 0]; // the array containing the results (points of the k nearest neighbors)

int neighbors = (maxNeighbors == 0 ? 0 : alglib.kdtreequeryaknn(kdt, point, maxNeighbors, aknnApproxVal));

// get the results
alglib.kdtreequeryresultsxy(kdt, ref results);

center = avgVeL = Vector3.zero;

// where is the character looking
var look = character.transform.forward;

int count = 0;
for (int i = 0; i < neighbors; ++i)
{
var pos = new Vector3((float)results[i, 0], (float)results[i, 1], (float)results[i, 2]);
var vel = new Vector3((float)results[i, 3], (float)results[i, 4], (float)results[i, 5]);

if (pos == character.position)
continue;

if( Vector3.Distance(character.position, pos) <= maxDistance )
{
// check for angle
if( minDotProduct > -1.0f )
{
var offset = pos - character.position;
if( Vector3.Dot(look, offset.normalized) < minDotProduct )
continue;
}

center += pos;
avgVeL += vel;
++count;
}
}

center /= count;
avgVeL /= count;

return count;
}
}


I think it's pretty straight-forward... So I guess the only solution is to search for another library for aknn? I read somewhere that akkn could be made more efficient when using octrees instead of k-d trees.

### #5Loofou  Members   -  Reputation: 111

Like
0Likes
Like

Posted 13 April 2014 - 04:16 AM

Normally you don't need to run an AI system every frame for every agent. I assume your above Behaviours are on every single agent entity?

One way to optimize is to use agent handlers that calculate behaviour for a bunch of agents. That way you can reuse many calculations.

Another way (these are combinable) is to use Coroutines (Threads are a really bad idea in Unity) to run your algorithms over multiple frames. This can boost performance really good.

The last not so obvious problem you might have is colliders. You wrote above you "disabled" physics on your agents. Does this mean you removed the RigidBody component or do you set it to isKinematic? Do you still have colliders attached to the objects?

The problem is: Unity handles every Collider that is not on the same object as a Rigidbody like it was static. Moving a static collider means for Unity to recalculate EVERY other static collider, which is performance hell. So the way to go with moving colliders is 'Collider + Rigidbody (isKinematic)' if you do want to disable physics on that object. Or you remove the colliders, too.

### #6Álvaro  Crossbones+   -  Reputation: 15728

Like
0Likes
Like

Posted 13 April 2014 - 07:05 AM

I am a programming dinosaur, so I only use dynamic memory allocation when I need it. For a simulation like this, nothing other than perhaps the k-d tree implementation should use dynamic memory allocation at all. Your code has new' all over the place, and this is my first guess as to why it is so slow. Perhaps there is no alternative to using new' everywhere in C#, but my C/C++ brain is disgusted.

Is there any way you can switch to C++ for this project?

### #7Telanor  Members   -  Reputation: 1422

Like
0Likes
Like

Posted 13 April 2014 - 09:59 AM

Álvaro is right about the news. The ones on the Vector3s don't cost anything but the 2 array allocations could be hurting performance and are completely unnecessary. Move the allocation of those 2 arrrays into the constructor and just set the values in GetNeighborhoodInfo.

### #8Loofou  Members   -  Reputation: 111

Like
0Likes
Like

Posted 13 April 2014 - 01:02 PM

Álvaro is right about the news. The ones on the Vector3s don't cost anything but the 2 array allocations could be hurting performance and are completely unnecessary. Move the allocation of those 2 arrrays into the constructor and just set the values in GetNeighborhoodInfo.

That is true, but I don't think that is the most pressing performance issue there. Making the arrays fields, will help nonetheless. Also you should use jagged arrays, not twodimensional arrays, as they are faster to fill in the mono version Unity is using, than the latter ones. (Jagged arrays: int[x][y] instead of int [x,y])

### #9sheep19  Members   -  Reputation: 414

Like
0Likes
Like

Posted 13 April 2014 - 01:47 PM

I see what you guys mean, but the problem is that alglib expects the array to be of this type. And even If allocate earlier it will still be re-allocated by alglib (that's why it takes the array by reference) as you can see below:

// get the results
alglib.kdtreequeryresultsxy(kdt, ref results);

Also as far as I'm aware dynamic memory allocation in C# is fast (I read it's just a pointer increment, not sure how it works).

I can't switch to C++, it's too late now Also Unity doesn't support C++ so...

The last not so obvious problem you might have is colliders. You wrote above you "disabled" physics on your agents. Does this mean you removed the RigidBody component or do you set it to isKinematic? Do you still have colliders attached to the objects?

The agents do have rigidbodies attatched to them because I use them to set the velocity of the agents. However they do not have colliders attatched to them.

### #10Telanor  Members   -  Reputation: 1422

Like
0Likes
Like

Posted 13 April 2014 - 01:52 PM

The fact that it takes it by reference should actually be proof of the opposite. It's allowing you to handle the allocation. Also pre-allocating it in the constructor doesn't mean you have to change the array type. You can keep the same type, all you have to do is move the allocation:

List<Entity> boids;
alglib.kdtree kdt;
double[] point
double[,] results;

public Flock(params Entity[] _boids)
{
boids = new List<Entity>();
point = new double[3];
results = new double[INSERT, SIZE];

}

public int GetNeighborhoodInfo(Entity character, int maxNeighbors, float maxDistance, float minDotProduct, double aknnApproxVal, out Vector3 center, out Vector3 avgVeL)
{
point[0] = character.position.x;
point[1] = character.position.y;
point[2] = character.position.z;

int neighbors = (maxNeighbors == 0 ? 0 : alglib.kdtreequeryaknn(kdt, point, maxNeighbors, aknnApproxVal));

// get the results
alglib.kdtreequeryresultsxy(kdt, ref results);

...

}

Edit: Sorry I misread the code, I fixed it up a bit. I'm not sure what the size of the results array should be. The documentation for alglib says this:

        XY      -   possibly pre-allocated buffer. If XY is too small to store
result, it is resized. If size(XY) is enough to store
result, it is left unchanged.

So if you provide an array that's properly sized, there won't be any allocation

Edited by Telanor, 13 April 2014 - 02:22 PM.

### #11Loofou  Members   -  Reputation: 111

Like
0Likes
Like

Posted 13 April 2014 - 03:41 PM

Also as far as I'm aware dynamic memory allocation in C# is fast (I read it's just a pointer increment, not sure how it works).

Dynamic Memory Allocation is fast in .Net 4+ and in newer versions of Mono. Unfortunately Unity still uses Mono 2.6 and that has a pretty bad garbage collector, so you have to be careful of memory allocation and especially destroying if you need good performance. I recommend you read the following article series: http://www.gamasutra.com/blogs/WendelinReich/20131109/203841/C_Memory_Management_for_Unity_Developers_part_1_of_3.php

But I still think your best solution would be to slice up your algorithm using Coroutines. Heavy calculations are best spread over multiple frames, if you don't need precise real time adjustments. You don't need to calculate the positions of every agent 30+ times per second.

If you don't know about Unitys Coroutines, I gladly provide a few articles about it.

### #12Hodgman  Moderators   -  Reputation: 38840

Like
0Likes
Like

Posted 13 April 2014 - 07:20 PM

When using the following steering behaviors I had the results below - using 500 boids.
Time to build K-D tree: 2.34ms. So this is not the problem.

Time to update boid steerings (all 500 boids, average time of N frames)
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 46.77ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 39.55ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 24.44ms

2.36ms is 14% of a 60Hz frame -- quite a large percentage of time! If every task in your game took this long, you could only have 7 different tasks making up your entire game logic (and still run at 60fps).

A good way to get some perspective when profiling these numbers, is to pick a framerate that you'd ideally like to have your game running at (e.g. 60fps or 30fps) and then convert the frame-times (ms) into percentages (e.g. a 60Hz frame gives you a budget of 16.667ms per frame):

Task                              Time      60Hz%  30Hz%
K-D Tree                       => 2.34ms  / 14%  / 7%
VelocityMatch                  => 7.22ms  / 43%  / 22%
Cohesion                       => 22.33ms / 134% / 67%
Separation + ObstacleAvoidance => 17.22ms / 103% / 52%
49.11ms / 294% / 148% < totals

I wouldn't be surprised if it weren't possible to reimplement your entire boids system to process those 500 objects in under 2ms total

### #13AllEightUp  Moderators   -  Reputation: 4539

Like
0Likes
Like

Posted 13 April 2014 - 09:10 PM

When using the following steering behaviors I had the results below - using 500 boids.
Time to build K-D tree: 2.34ms. So this is not the problem.

Time to update boid steerings (all 500 boids, average time of N frames)
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 46.77ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 39.55ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 24.44ms

2.36ms is 14% of a 60Hz frame -- quite a large percentage of time! If every task in your game took this long, you could only have 7 different tasks making up your entire game logic (and still run at 60fps).

A good way to get some perspective when profiling these numbers, is to pick a framerate that you'd ideally like to have your game running at (e.g. 60fps or 30fps) and then convert the frame-times (ms) into percentages (e.g. a 60Hz frame gives you a budget of 16.667ms per frame):

Task                              Time      60Hz%  30Hz%
K-D Tree                       => 2.34ms  / 14%  / 7%
VelocityMatch                  => 7.22ms  / 43%  / 22%
Cohesion                       => 22.33ms / 134% / 67%
Separation + ObstacleAvoidance => 17.22ms / 103% / 52%
49.11ms / 294% / 148% < totals

I wouldn't be surprised if it weren't possible to reimplement your entire boids system to process those 500 objects in under 2ms total

Hodge is quite correct, that's a pretty horrible update per frame to be hitting.  I also don't see a performance time for:

int neighbors = (maxNeighbors == 0 ? 0 : alglib.kdtreequeryaknn(kdt, point, maxNeighbors, aknnApproxVal));
alglib.kdtreequeryresultsxy(kdt, ref results);


Tree searches are expensive, even via kd-tree's.  The first step of that search is going down the tree looking for the closest tree node, then it has to search up and out for the neighbors.  If nothing else, this is blowing huge chunks of memory bandwidth and most likely showing up as most of your per boid update.  Doing this from within every boid update, almost guarantee'd to be O(log n) or worse in cost.

My suggestion for this would be to move the creation of the list of neighbors out to a system designed to linearize the cost of maintaining such information.  You can generally google for: "Spatial Awareness" with things like "c++ library" and get some starting points.  I wrote a library which I tested with flocking a while back and even did a vid capture:   It's ugly but that's 2000 boids flocking and the most costly thing is the rendering.  WIthout rendering, I pushed that system to 10000 boids and it still ran at a steady 60fps while barely touching the cpu.  It's designed for AI usage which includes a bit of a different set of requirements but the basic "aware of" list is what flocking requires and as such the overall system was well tested by flocking.

Anyway, you are highly unlikely to be able to optimize a per-object solution such as this beyond a certain point.  If you follow all the current suggestions, you might drop 5-10ms off the frame times but you'll still have all the same problems, just pushed out a bit further.  Try some Googles for "AI Spatial Awareness" including "c++ library" and other trailers and you might find some ideas.  You'll be looking for items which supply "aware of" lists.  They can be implemented in many ways though I favor sweep and prune due to it having the fewest edge cases and the most consistent per frame processing time no matter what is going on.

### #14sheep19  Members   -  Reputation: 414

Like
0Likes
Like

Posted 14 April 2014 - 09:55 AM

Alright, I moved the array allocations in the Flock class's constructor. And in the rebuildKDTree function I do not create the array every frame.

The time dropped from ~46 ms to ~36 ms. That was pretty big. By the way, I had forgotten that Unity uses Mono and not .NET

About co-routines: Seems I do not have access to startCoroutine because the class that handles the updates does not derive from MonoBehavior. However I did it myself: I manually update only n boids every frame (basically what I do is set a flag in each behavior that indicates whether it will use its old value or calculate a new one).

This is my update function:

public override void Update(float dt)
{
if (Time.frameCount == currentFrame) return;
currentFrame = Time.frameCount;

//------------
kdBuildTimer.Start();

flock.RebuildKdTree();
anchorFlock.RebuildKdTree();

kdBuildTimer.Stop();
//------------

updateTimer.Start();

for(int i = 0; i < entries.Count; ++i)
{
(((entries[i].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[0].behaviour as Separation).useOldValues = true;
(((entries[i].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[1].behaviour as Cohesion).useOldValues = true;
(((entries[i].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[2].behaviour as VelocityMatch).useOldValues = true;
}

int limit = (current + 100 < entries.Count ? current + 100 : entries.Count);
for(; current < limit; ++current)
{
(((entries[current].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[0].behaviour as Separation).useOldValues = false;
(((entries[current].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[1].behaviour as Cohesion).useOldValues = false;
(((entries[current].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[2].behaviour as VelocityMatch).useOldValues = false;
}

if( current == entries.Count )
current = 0;

// update birds and calculate the center of the flock
for (int i = 0; i < entries.Count; ++i)

updateTimer.Stop();
++frameCount;
Debug.Log("Time to build K-D tree: " + kdBuildTimer.ElapsedMilliseconds / (float)frameCount + ", Time to update: " + updateTimer.ElapsedMilliseconds / (float)frameCount);
}



The k-d tree is built every frame however. I guess I can do something about that too - re-build it every x seconds (where x < 1.0 of course).

So now by having 500 boids and updating 100 every frame I get 25 fps (time to update is about 16ms) instead of 15 fps (and 34 ms to update).

But the motion of the boids is not the same when updating different number of boids every frame. But I guess this is expected. By the way isn't this "cheating"? That I'm not updating all every frame? Or is this a standard thing to do in AI?

Another thing: If I set more boids (5000 for example ) I still get very long update times even when I still update 100 boids per frame. This is because those that get updaded still do queries on a huge k-d tree... Can I do something about this? What I was thinking was having something that "remembers" where the closest neighbors were so that next time they won't be far away. But how can I know this without re-calculating everything?

Edited by sheep19, 14 April 2014 - 09:58 AM.

### #15ferrous  Members   -  Reputation: 2860

Like
1Likes
Like

Posted 14 April 2014 - 12:15 PM

It's pretty standard to not run expensive stuff every frame, and AI is often expensive.  =)

### #16Loofou  Members   -  Reputation: 111

Like
1Likes
Like

Posted 14 April 2014 - 01:52 PM

About co-routines: Seems I do not have access to startCoroutine because the class that handles the updates does not derive from MonoBehavior. However I did it myself: I manually update only n boids every frame (basically what I do is set a flag in each behavior that indicates whether it will use its old value or calculate a new one).

Well, you can start a coroutine from another class (that is a monobehaviour). It just needs to return IEnumerator, that's all. There are some tricks to start coroutines from non-monobehaviour classes as well, but that is pretty deep stuff and I don't think it will help at all in your situation. If you have found a way to spread your calculation over multiple frames - good.

Another thing: If I set more boids (5000 for example ) I still get very long update times even when I still update 100 boids per frame. This is because those that get updaded still do queries on a huge k-d tree... Can I do something about this? What I was thinking was having something that "remembers" where the closest neighbors were so that next time they won't be far away. But how can I know this without re-calculating everything?

And well, for that you can use a heuristic algorithm that uses the last known position, and velocity vector of an agent and tries to "guess" where it will be after a few frames. Of course, the longer you "guess" the more errors in calculation you will have. So you need to balance your update times well, so that you still got good flocking behaviour.

### #17Ashaman73  Crossbones+   -  Reputation: 11092

Like
2Likes
Like

Posted 14 April 2014 - 11:54 PM

Performance wise two bottlenecks comes to mind.

1. updating the structure which holds the boids (kd-tree)

2. scanning for boids for the steering behaviour.

1. Using a kd-tree (with dynamic allocation) could be an issue. I would recommend sweep'n'prune, too, which is basically a fix array (for each axis) which could be sorted (insertion or tim sort) in almost O(n), due to the fact, that the coherence of the moving boids is really low.

2. If using sweep'n'prune, you can scan the surrounding in O(log n) using a modified bineary search, which sums up to O(n *log n). To improve this further on, you can cache the scan result for a few frames. Just increase the radius to scan a start set of neighbor boids, then hold this set and update it each time. The maximum movement of a boids is a threshold to when the cache will get invalid (in this case the probability that an unknown boid is entering the steering context is quite high). If you don't need a 100% perfect context information at each frame, you can really save a lot of time with a cache.

With this improvements you can get your time complexity from O(n*n) to almost O(n).

Ashaman

### #18wodinoneeye  Members   -  Reputation: 1052

Like
0Likes
Like

Posted 28 April 2014 - 07:03 PM

Might it be possible to create 'flock' lists  (list of unit members within a defined 'flock entity - I'd do it with a link list pointer chain var in the unit object itself) where for a short time current grouping subset data is used for the interacting 'flocking' spacing comparisons,  and then at an adaquately frequent  interval the spacial tree is rebuilt/adjusted and the flock lists are rebuilt to be used again for some period ?

Cluster Flocking ...

The main idea is to eliminate alot of repeated heavyweight processing overhead  (particularly with high frame rates and only small relative movement increments of all the units)  Spacial tree is modified less frequently than it is used...  The flock subsets might only have to be checked brute force within their own members 1/2N^2  when N is fairly small might allow a 'good enuf' approximation.

Possibly some 'smart' interflock calculations might be done/needed to detect flocks in the same vicinity -- possibly to force a variable interval for a subset recalculation using the full spacial processing.

Observing testing to tune for the right intervals and how to keep the subsets groups small enough for it to have any advantage over the simpler constant application....

--------------------------------------------Ratings are Opinion, not Fact

### #19noisecrime  Members   -  Reputation: 746

Like
1Likes
Like

Posted 30 April 2014 - 07:34 AM

Whilst everyone is correct in terms of addressing memory allocations and re-engineering your overall algorithms being used will give the best improvements, I thought it odd that no-one had pointed out that you are using a standard distance check instead of a squared distance check. I.e. instead of using Vector3.Distance() use Vector3.sqrMagnitude() and provide your MaxDistance as a squared value. Though these days i've no idea of the relative performance benefits of doing this, probably not as much as it used to, but still worth doing IMHO.

I'd also maybe look into where you convert the KD Tree results into Unity types (Vector3). For example you convert vel but its only used if several conditional checks pass, would it not be more prudent to do the conversion only when you needed to add it to the avgVeL?

Indeed I would maybe question why you convert to Unity types at all within the function, leave the vectors as doubles (unless there is a tangible benefit in casting to and working with floats) and only convert to Unity Vector3 the avgCenter and avgVeL at the end of the function or in the calling function. It would mean writing your own sqrDistance and dotproduct methods, but they are easy enough, just don't forget to convert your character.pos from Vector3 to a double array in the calling function. In other words its probably far better to keep all your data in one format/type and convert at the end, instead of converting on the fly, especially when some of the data may not even be used.

However as I said at the beginning you probably find better gains through changing your approach  or algorithms, but the few optimisations above are pretty straightforward and simple to implement.

For repeating an action infrequently like updating your KDtree you could use InvokeRepeating() in Unity.

Its not clear to me but are you calling GetNeighborhoodInfo() for each boid for each Cohesion, Separation, VelocityMatch? If so shouldn't you simply be sharing the results between all three behaviours and just calculate it once per AI Tick using the highest K value you need?

Finally a few points about Unity. If you have a spare email address you should be able to sign up for a 30 day demo of Pro, getting you access to the profiler. Alternatively you could try emailing Unity Technologies and asking for a Pro demo key, explaining that you are using it for your thesis and supply evidence, as they are quite a nice company and might be willing to help you out - at least it doesn't hurt to try.

Edited by noisecrime, 30 April 2014 - 08:09 AM.

Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

PARTNERS