• Announcements

    • khawk

      Download the Game Design and Indie Game Marketing Freebook   07/19/17

      GameDev.net and CRC Press have teamed up to bring a free ebook of content curated from top titles published by CRC Press. The freebook, Practices of Game Design & Indie Game Marketing, includes chapters from The Art of Game Design: A Book of Lenses, A Practical Guide to Indie Game Marketing, and An Architectural Approach to Level Design. The GameDev.net FreeBook is relevant to game designers, developers, and those interested in learning more about the challenges in game development. We know game development can be a tough discipline and business, so we picked several chapters from CRC Press titles that we thought would be of interest to you, the GameDev.net audience, in your journey to design, develop, and market your next game. The free ebook is available through CRC Press by clicking here. The Curated Books The Art of Game Design: A Book of Lenses, Second Edition, by Jesse Schell Presents 100+ sets of questions, or different lenses, for viewing a game’s design, encompassing diverse fields such as psychology, architecture, music, film, software engineering, theme park design, mathematics, anthropology, and more. Written by one of the world's top game designers, this book describes the deepest and most fundamental principles of game design, demonstrating how tactics used in board, card, and athletic games also work in video games. It provides practical instruction on creating world-class games that will be played again and again. View it here. A Practical Guide to Indie Game Marketing, by Joel Dreskin Marketing is an essential but too frequently overlooked or minimized component of the release plan for indie games. A Practical Guide to Indie Game Marketing provides you with the tools needed to build visibility and sell your indie games. With special focus on those developers with small budgets and limited staff and resources, this book is packed with tangible recommendations and techniques that you can put to use immediately. As a seasoned professional of the indie game arena, author Joel Dreskin gives you insight into practical, real-world experiences of marketing numerous successful games and also provides stories of the failures. View it here. An Architectural Approach to Level Design This is one of the first books to integrate architectural and spatial design theory with the field of level design. The book presents architectural techniques and theories for level designers to use in their own work. It connects architecture and level design in different ways that address the practical elements of how designers construct space and the experiential elements of how and why humans interact with this space. Throughout the text, readers learn skills for spatial layout, evoking emotion through gamespaces, and creating better levels through architectural theory. View it here. Learn more and download the ebook by clicking here. Did you know? GameDev.net and CRC Press also recently teamed up to bring GDNet+ Members up to a 20% discount on all CRC Press books. Learn more about this and other benefits here.
Sign in to follow this  
Followers 0
sheep19

[Flocking] How to improve performance?

18 posts in this topic

I'm doing a simulation of flying birds for my thesis.

 

I am using the usual steering behaviors: Cohesion, Separation, Velocity Match and Obstacle Avoidance (to avoid terrain).

The program is written in C# and uses Unity3D. I have disabled physics for the boids.

 

I'm also using alglib to calculate K-nearest neighbors using AKNN. alglib uses k-d trees for this.

 

So here are the fps I get:

100 boids = 60 fps

200 boids = 35 fps

300 boids = 23 fps

500 boids = 13 fps

1000 boids = 5 fps

 

As the number of boids increases, performance degrades rapidly. I'd like to do something to improve it, but sadly I am using the free version of unity3D, which does not have a profiler.

 

Is the performance I'm getting good? I haven't found results online for this.

Should I find another library instead of alglib?

 

By the way, you can see the program online here. Requires unity3d plugin.

0

Share this post


Link to post
Share on other sites

One thing I have heard (but have not confirmed) is that Unity does not support multi-threading, but .Net/Mono does. Therefore you may be able to do all the boid calculations in a separate thread using primitive types, then pass the results back to Unity to update the boid GameObjects. Note that none of the Unity provided types are thread-safe AFAIK. Vector3 should be fine, I hope! ;)

0

Share this post


Link to post
Share on other sites

I have made some measurements using the StopWatch class of the .net framework.

 

When using the following steering behaviors I had the results below - using 500 boids.

Time to build K-D tree: 2.34ms. So this is not the problem.

 

Time to update boid steerings (all 500 boids, average time of N frames)

Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 46.77ms

Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 39.55ms

Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 24.44ms

 

So by not using some behaviors the time dropped significantly. The difference from not using VelocityMatch to not using Cohesion was bigger because Cohesion uses k = 30 for aknn while velocity match uses k = 10. When I set k = 10 for cohesion the time dropped as well.

 

This is the code that behaviors use to get k-nearest neighbors:

public class Flock
{
	List<Entity> boids;
	alglib.kdtree kdt;

	public Flock(params Entity[] _boids)
	{
		boids = new List<Entity>();

		Add(_boids);
	}
	
	public int GetNeighborhoodInfo(Entity character, int maxNeighbors, float maxDistance, float minDotProduct, double aknnApproxVal, out Vector3 center, out Vector3 avgVeL) 
	{
		var point = new double[3]{character.position.x, character.position.y, character.position.z};

		var results = new double[0, 0]; // the array containing the results (points of the k nearest neighbors)

		int neighbors = (maxNeighbors == 0 ? 0 : alglib.kdtreequeryaknn(kdt, point, maxNeighbors, aknnApproxVal));

		// get the results
		alglib.kdtreequeryresultsxy(kdt, ref results);

		center = avgVeL = Vector3.zero;

		// where is the character looking
		var look = character.transform.forward;

		int count = 0;
		for (int i = 0; i < neighbors; ++i)
		{
			var pos = new Vector3((float)results[i, 0], (float)results[i, 1], (float)results[i, 2]);
			var vel = new Vector3((float)results[i, 3], (float)results[i, 4], (float)results[i, 5]);

			if (pos == character.position)
				continue;


			if( Vector3.Distance(character.position, pos) <= maxDistance )
			{
				// check for angle
				if( minDotProduct > -1.0f )
				{
					var offset = pos - character.position;
					if( Vector3.Dot(look, offset.normalized) < minDotProduct )
						continue;
				}
				
				center += pos;
				avgVeL += vel;
				++count;
			}
		}

		center /= count;
		avgVeL /= count;


		return count;
	}
}

I think it's pretty straight-forward... So I guess the only solution is to search for another library for aknn? I read somewhere that akkn could be made more efficient when using octrees instead of k-d trees.

0

Share this post


Link to post
Share on other sites

Normally you don't need to run an AI system every frame for every agent. I assume your above Behaviours are on every single agent entity?

One way to optimize is to use agent handlers that calculate behaviour for a bunch of agents. That way you can reuse many calculations.

Another way (these are combinable) is to use Coroutines (Threads are a really bad idea in Unity) to run your algorithms over multiple frames. This can boost performance really good.

 

The last not so obvious problem you might have is colliders. You wrote above you "disabled" physics on your agents. Does this mean you removed the RigidBody component or do you set it to isKinematic? Do you still have colliders attached to the objects?

 

The problem is: Unity handles every Collider that is not on the same object as a Rigidbody like it was static. Moving a static collider means for Unity to recalculate EVERY other static collider, which is performance hell. So the way to go with moving colliders is 'Collider + Rigidbody (isKinematic)' if you do want to disable physics on that object. Or you remove the colliders, too.

0

Share this post


Link to post
Share on other sites
I am a programming dinosaur, so I only use dynamic memory allocation when I need it. For a simulation like this, nothing other than perhaps the k-d tree implementation should use dynamic memory allocation at all. Your code has `new' all over the place, and this is my first guess as to why it is so slow. Perhaps there is no alternative to using `new' everywhere in C#, but my C/C++ brain is disgusted. smile.png

Is there any way you can switch to C++ for this project?
0

Share this post


Link to post
Share on other sites
Álvaro is right about the `new`s. The ones on the Vector3s don't cost anything but the 2 array allocations could be hurting performance and are completely unnecessary. Move the allocation of those 2 arrrays into the constructor and just set the values in GetNeighborhoodInfo.
0

Share this post


Link to post
Share on other sites

Álvaro is right about the `new`s. The ones on the Vector3s don't cost anything but the 2 array allocations could be hurting performance and are completely unnecessary. Move the allocation of those 2 arrrays into the constructor and just set the values in GetNeighborhoodInfo.

 

That is true, but I don't think that is the most pressing performance issue there. Making the arrays fields, will help nonetheless. Also you should use jagged arrays, not twodimensional arrays, as they are faster to fill in the mono version Unity is using, than the latter ones. (Jagged arrays: int[x][y] instead of int [x,y])

0

Share this post


Link to post
Share on other sites

I see what you guys mean, but the problem is that alglib expects the array to be of this type. And even If allocate earlier it will still be re-allocated by alglib (that's why it takes the array by reference) as you can see below:

 

// get the results
alglib.kdtreequeryresultsxy(kdt, ref results);

 

Also as far as I'm aware dynamic memory allocation in C# is fast (I read it's just a pointer increment, not sure how it works).

I can't switch to C++, it's too late now :P Also Unity doesn't support C++ so...

 

The last not so obvious problem you might have is colliders. You wrote above you "disabled" physics on your agents. Does this mean you removed the RigidBody component or do you set it to isKinematic? Do you still have colliders attached to the objects?

The agents do have rigidbodies attatched to them because I use them to set the velocity of the agents. However they do not have colliders attatched to them.

0

Share this post


Link to post
Share on other sites
The fact that it takes it by reference should actually be proof of the opposite. It's allowing you to handle the allocation. Also pre-allocating it in the constructor doesn't mean you have to change the array type. You can keep the same type, all you have to do is move the allocation:
 
List<Entity> boids;
alglib.kdtree kdt;
double[] point
double[,] results;

public Flock(params Entity[] _boids)
{
	boids = new List<Entity>();
	point = new double[3];
	results = new double[INSERT, SIZE];

	Add(_boids);
}

public int GetNeighborhoodInfo(Entity character, int maxNeighbors, float maxDistance, float minDotProduct, double aknnApproxVal, out Vector3 center, out Vector3 avgVeL) 
{
	point[0] = character.position.x;
	point[1] = character.position.y;
	point[2] = character.position.z;

	int neighbors = (maxNeighbors == 0 ? 0 : alglib.kdtreequeryaknn(kdt, point, maxNeighbors, aknnApproxVal));

	// get the results
	alglib.kdtreequeryresultsxy(kdt, ref results);
	
	
	...
	
}
Edit: Sorry I misread the code, I fixed it up a bit. I'm not sure what the size of the results array should be. The documentation for alglib says this:

        XY      -   possibly pre-allocated buffer. If XY is too small to store
                    result, it is resized. If size(XY) is enough to store
                    result, it is left unchanged.
So if you provide an array that's properly sized, there won't be any allocation Edited by Telanor
0

Share this post


Link to post
Share on other sites

Also as far as I'm aware dynamic memory allocation in C# is fast (I read it's just a pointer increment, not sure how it works).

 

Dynamic Memory Allocation is fast in .Net 4+ and in newer versions of Mono. Unfortunately Unity still uses Mono 2.6 and that has a pretty bad garbage collector, so you have to be careful of memory allocation and especially destroying if you need good performance. I recommend you read the following article series: http://www.gamasutra.com/blogs/WendelinReich/20131109/203841/C_Memory_Management_for_Unity_Developers_part_1_of_3.php

 

But I still think your best solution would be to slice up your algorithm using Coroutines. Heavy calculations are best spread over multiple frames, if you don't need precise real time adjustments. You don't need to calculate the positions of every agent 30+ times per second.

If you don't know about Unitys Coroutines, I gladly provide a few articles about it.

0

Share this post


Link to post
Share on other sites

When using the following steering behaviors I had the results below - using 500 boids.
Time to build K-D tree: 2.34ms. So this is not the problem.
 
Time to update boid steerings (all 500 boids, average time of N frames)
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 46.77ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 39.55ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 24.44ms

2.36ms is 14% of a 60Hz frame -- quite a large percentage of time! If every task in your game took this long, you could only have 7 different tasks making up your entire game logic (and still run at 60fps).

A good way to get some perspective when profiling these numbers, is to pick a framerate that you'd ideally like to have your game running at (e.g. 60fps or 30fps) and then convert the frame-times (ms) into percentages (e.g. a 60Hz frame gives you a budget of 16.667ms per frame):

Task                              Time      60Hz%  30Hz%
K-D Tree                       => 2.34ms  / 14%  / 7%
VelocityMatch                  => 7.22ms  / 43%  / 22%
Cohesion                       => 22.33ms / 134% / 67%
Separation + ObstacleAvoidance => 17.22ms / 103% / 52%
                                  49.11ms / 294% / 148% < totals

I wouldn't be surprised if it weren't possible to reimplement your entire boids system to process those 500 objects in under 2ms total wink.png

0

Share this post


Link to post
Share on other sites

 

When using the following steering behaviors I had the results below - using 500 boids.
Time to build K-D tree: 2.34ms. So this is not the problem.
 
Time to update boid steerings (all 500 boids, average time of N frames)
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 46.77ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 39.55ms
Behaviors used: Cohesion, Separation, VelocityMatch, ObstacleAvoidance => 24.44ms

2.36ms is 14% of a 60Hz frame -- quite a large percentage of time! If every task in your game took this long, you could only have 7 different tasks making up your entire game logic (and still run at 60fps).

A good way to get some perspective when profiling these numbers, is to pick a framerate that you'd ideally like to have your game running at (e.g. 60fps or 30fps) and then convert the frame-times (ms) into percentages (e.g. a 60Hz frame gives you a budget of 16.667ms per frame):

Task                              Time      60Hz%  30Hz%
K-D Tree                       => 2.34ms  / 14%  / 7%
VelocityMatch                  => 7.22ms  / 43%  / 22%
Cohesion                       => 22.33ms / 134% / 67%
Separation + ObstacleAvoidance => 17.22ms / 103% / 52%
                                  49.11ms / 294% / 148% < totals

I wouldn't be surprised if it weren't possible to reimplement your entire boids system to process those 500 objects in under 2ms total wink.png

 

Hodge is quite correct, that's a pretty horrible update per frame to be hitting.  I also don't see a performance time for:

int neighbors = (maxNeighbors == 0 ? 0 : alglib.kdtreequeryaknn(kdt, point, maxNeighbors, aknnApproxVal));
alglib.kdtreequeryresultsxy(kdt, ref results);

Tree searches are expensive, even via kd-tree's.  The first step of that search is going down the tree looking for the closest tree node, then it has to search up and out for the neighbors.  If nothing else, this is blowing huge chunks of memory bandwidth and most likely showing up as most of your per boid update.  Doing this from within every boid update, almost guarantee'd to be O(log n) or worse in cost.

 

My suggestion for this would be to move the creation of the list of neighbors out to a system designed to linearize the cost of maintaining such information.  You can generally google for: "Spatial Awareness" with things like "c++ library" and get some starting points.  I wrote a library which I tested with flocking a while back and even did a vid capture: https://www.youtube.com/watch?v=JVTBRes6gY0  It's ugly but that's 2000 boids flocking and the most costly thing is the rendering.  WIthout rendering, I pushed that system to 10000 boids and it still ran at a steady 60fps while barely touching the cpu.  It's designed for AI usage which includes a bit of a different set of requirements but the basic "aware of" list is what flocking requires and as such the overall system was well tested by flocking.

 

Anyway, you are highly unlikely to be able to optimize a per-object solution such as this beyond a certain point.  If you follow all the current suggestions, you might drop 5-10ms off the frame times but you'll still have all the same problems, just pushed out a bit further.  Try some Googles for "AI Spatial Awareness" including "c++ library" and other trailers and you might find some ideas.  You'll be looking for items which supply "aware of" lists.  They can be implemented in many ways though I favor sweep and prune due to it having the fewest edge cases and the most consistent per frame processing time no matter what is going on.

0

Share this post


Link to post
Share on other sites

Alright, I moved the array allocations in the Flock class's constructor. And in the rebuildKDTree function I do not create the array every frame.

The time dropped from ~46 ms to ~36 ms. That was pretty big. By the way, I had forgotten that Unity uses Mono and not .NET  tongue.png

 

About co-routines: Seems I do not have access to startCoroutine because the class that handles the updates does not derive from MonoBehavior. However I did it myself: I manually update only n boids every frame (basically what I do is set a flag in each behavior that indicates whether it will use its old value or calculate a new one).

 

This is my update function:

public override void Update(float dt)
    {
        if (Time.frameCount == currentFrame) return;
        currentFrame = Time.frameCount;

        //------------
        kdBuildTimer.Start();

        flock.RebuildKdTree();
        anchorFlock.RebuildKdTree();

        kdBuildTimer.Stop();
        //------------

        updateTimer.Start();

        for(int i = 0; i < entries.Count; ++i)
        {
            (((entries[i].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[0].behaviour as Separation).useOldValues = true;
            (((entries[i].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[1].behaviour as Cohesion).useOldValues = true;
            (((entries[i].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[2].behaviour as VelocityMatch).useOldValues = true;
        }

        int limit = (current + 100 < entries.Count ? current + 100 : entries.Count);
        for(; current < limit; ++current)
        {
            (((entries[current].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[0].behaviour as Separation).useOldValues = false;
            (((entries[current].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[1].behaviour as Cohesion).useOldValues = false;
            (((entries[current].behavior as PrioritySteering).Groups[1] as BlendedSteering).Behaviors[2].behaviour as VelocityMatch).useOldValues = false;
        }

        if( current == entries.Count )
            current = 0;

        // update birds and calculate the center of the flock
        for (int i = 0; i < entries.Count; ++i)
            UpdateSteering(dt, entries[i]);

        updateTimer.Stop();
        ++frameCount;
        Debug.Log("Time to build K-D tree: " + kdBuildTimer.ElapsedMilliseconds / (float)frameCount + ", Time to update: " + updateTimer.ElapsedMilliseconds / (float)frameCount);
    }


The k-d tree is built every frame however. I guess I can do something about that too - re-build it every x seconds (where x < 1.0 of course).

So now by having 500 boids and updating 100 every frame I get 25 fps (time to update is about 16ms) instead of 15 fps (and 34 ms to update).

But the motion of the boids is not the same when updating different number of boids every frame. But I guess this is expected. By the way isn't this "cheating"? That I'm not updating all every frame? Or is this a standard thing to do in AI?

Another thing: If I set more boids (5000 for example smile.png ) I still get very long update times even when I still update 100 boids per frame. This is because those that get updaded still do queries on a huge k-d tree... Can I do something about this? What I was thinking was having something that "remembers" where the closest neighbors were so that next time they won't be far away. But how can I know this without re-calculating everything?

Edited by sheep19
0

Share this post


Link to post
Share on other sites


About co-routines: Seems I do not have access to startCoroutine because the class that handles the updates does not derive from MonoBehavior. However I did it myself: I manually update only n boids every frame (basically what I do is set a flag in each behavior that indicates whether it will use its old value or calculate a new one).

 

Well, you can start a coroutine from another class (that is a monobehaviour). It just needs to return IEnumerator, that's all. There are some tricks to start coroutines from non-monobehaviour classes as well, but that is pretty deep stuff and I don't think it will help at all in your situation. If you have found a way to spread your calculation over multiple frames - good.

 


Another thing: If I set more boids (5000 for example ) I still get very long update times even when I still update 100 boids per frame. This is because those that get updaded still do queries on a huge k-d tree... Can I do something about this? What I was thinking was having something that "remembers" where the closest neighbors were so that next time they won't be far away. But how can I know this without re-calculating everything?

 

And well, for that you can use a heuristic algorithm that uses the last known position, and velocity vector of an agent and tries to "guess" where it will be after a few frames. Of course, the longer you "guess" the more errors in calculation you will have. So you need to balance your update times well, so that you still got good flocking behaviour.

1

Share this post


Link to post
Share on other sites

Performance wise two bottlenecks comes to mind.

 

1. updating the structure which holds the boids (kd-tree)

2. scanning for boids for the steering behaviour.

 

1. Using a kd-tree (with dynamic allocation) could be an issue. I would recommend sweep'n'prune, too, which is basically a fix array (for each axis) which could be sorted (insertion or tim sort) in almost O(n), due to the fact, that the coherence of the moving boids is really low.

 

2. If using sweep'n'prune, you can scan the surrounding in O(log n) using a modified bineary search, which sums up to O(n *log n). To improve this further on, you can cache the scan result for a few frames. Just increase the radius to scan a start set of neighbor boids, then hold this set and update it each time. The maximum movement of a boids is a threshold to when the cache will get invalid (in this case the probability that an unknown boid is entering the steering context is quite high). If you don't need a 100% perfect context information at each frame, you can really save a lot of time with a cache.

 

With this improvements you can get your time complexity from O(n*n) to almost O(n).

2

Share this post


Link to post
Share on other sites

Might it be possible to create 'flock' lists  (list of unit members within a defined 'flock entity - I'd do it with a link list pointer chain var in the unit object itself) where for a short time current grouping subset data is used for the interacting 'flocking' spacing comparisons,  and then at an adaquately frequent  interval the spacial tree is rebuilt/adjusted and the flock lists are rebuilt to be used again for some period ?

 

Cluster Flocking ...

 

The main idea is to eliminate alot of repeated heavyweight processing overhead  (particularly with high frame rates and only small relative movement increments of all the units)  Spacial tree is modified less frequently than it is used...  The flock subsets might only have to be checked brute force within their own members 1/2N^2  when N is fairly small might allow a 'good enuf' approximation. 

 

 

 

Possibly some 'smart' interflock calculations might be done/needed to detect flocks in the same vicinity -- possibly to force a variable interval for a subset recalculation using the full spacial processing.

 

Observing testing to tune for the right intervals and how to keep the subsets groups small enough for it to have any advantage over the simpler constant application....

0

Share this post


Link to post
Share on other sites

Whilst everyone is correct in terms of addressing memory allocations and re-engineering your overall algorithms being used will give the best improvements, I thought it odd that no-one had pointed out that you are using a standard distance check instead of a squared distance check. I.e. instead of using Vector3.Distance() use Vector3.sqrMagnitude() and provide your MaxDistance as a squared value. Though these days i've no idea of the relative performance benefits of doing this, probably not as much as it used to, but still worth doing IMHO.

 

I'd also maybe look into where you convert the KD Tree results into Unity types (Vector3). For example you convert vel but its only used if several conditional checks pass, would it not be more prudent to do the conversion only when you needed to add it to the avgVeL?

 

Indeed I would maybe question why you convert to Unity types at all within the function, leave the vectors as doubles (unless there is a tangible benefit in casting to and working with floats) and only convert to Unity Vector3 the avgCenter and avgVeL at the end of the function or in the calling function. It would mean writing your own sqrDistance and dotproduct methods, but they are easy enough, just don't forget to convert your character.pos from Vector3 to a double array in the calling function. In other words its probably far better to keep all your data in one format/type and convert at the end, instead of converting on the fly, especially when some of the data may not even be used.

 

However as I said at the beginning you probably find better gains through changing your approach  or algorithms, but the few optimisations above are pretty straightforward and simple to implement.

 

For repeating an action infrequently like updating your KDtree you could use InvokeRepeating() in Unity.

 

Its not clear to me but are you calling GetNeighborhoodInfo() for each boid for each Cohesion, Separation, VelocityMatch? If so shouldn't you simply be sharing the results between all three behaviours and just calculate it once per AI Tick using the highest K value you need?

 

Finally a few points about Unity. If you have a spare email address you should be able to sign up for a 30 day demo of Pro, getting you access to the profiler. Alternatively you could try emailing Unity Technologies and asking for a Pro demo key, explaining that you are using it for your thesis and supply evidence, as they are quite a nice company and might be willing to help you out - at least it doesn't hurt to try.

Edited by noisecrime
1

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0