EdBoon

terribly slow function needs more efficient structure


I have a program that I ran through a profiler, and 50% of the execution time is spent in one function. It is really slowing down the program, and I thought maybe there is a much better way to code this. The program is in XNA, but it's really more of a standard code question, so I put it in General Game Programming. I am running collision detection using spatial hashing, and this Update function is what updates the grid.


public void Update()
{
    int bucketListCount;
    int count = occupiedBucket.Count();

    // For every occupied bucket, test each pair of entities in that bucket once.
    for (int i = 0; i < count; i++)
    {
        bucketListCount = bucketList[occupiedBucket.ElementAt(i)].Count();
        for (int j = 0; j < bucketListCount; j++)
        {
            for (int k = j + 1; k < bucketListCount; k++)
            {
                bucketList[occupiedBucket.ElementAt(i)].ElementAt(j)
                    .handleCollide(bucketList[occupiedBucket.ElementAt(i)].ElementAt(k));
            }
        }
    }
}

There are a lot of uses of ElementAt(); would that cause the slowdown? The function it calls inside the loops (handleCollide) isn't spending any time in itself, so it is all just in the for statements. There has to be a better way to code this. The bucket list count is usually around 3 or 4, and count is around 200-300.

Thanks, guys.

[Edited by - EdBoon on November 29, 2010 9:02:32 PM]

Quote:
Original post by no such user
What are the exact types of the variables here?


Oh yeah, sorry. The objects in the list that it is counting and calling ElementAt() on are from a class I created holding the enemy information, such as position, health, bitmap, etc.

Also, I guess I could explain a little bit of what the for loops are doing:

The bucketList is an array of lists that represent buckets in the grid. When an object gets placed into a bucket (a sector of the grid), an indicator gets placed inside occupiedBucket, so that I don't have to cycle through the entire bucketList to find whether there is an object inside a particular bucket. If an item is across a bucket divider, it gets placed into two buckets (or four if it sits at a cross). Within each bucket, every enemy and projectile is checked against everyone else in that bucket's list at least once, and the collision function is run on each pair.

The more I try to explain, the more I see I'm going in circles.

Variables:
occupiedBucket: a list of integers
bucketList: an array of lists which contain GameEntities (enemies, projectiles, turrets, etc.)
handleCollide: not too important, because the profiler says hardly any time is spent inside this function, but 50% is spent inside the Update function.


Please let me know what is unclear, because I'm sure there's more. :) Thanks again.

Are the buckets linked lists or vectors/arrays? ElementAt will be very expensive for linked lists. ElementAt(200) will have to go through 200 nodes before it gets the element you're interested in.

Quote:
Original post by Thrump
Are the buckets linked lists or vectors/arrays? ElementAt will be very expensive for linked lists. ElementAt(200) will have to go through 200 nodes before it gets the element you're interested in.


I do not believe they are linked lists; they seem to function the same as ArrayLists but are type safe. (I am using C#, by the way.)

I declare them like this:


public static int min = 0;
public static int max = Game1.map.levelHeight;
public static int cellSize = 64;
public static int width = (max - min) / cellSize;
public static int numberOfBuckets = width * width;
public List<GameEntity>[] bucketList = new List<GameEntity>[numberOfBuckets];
public List<int> occupiedBucket = new List<int>();
Vector2 normal;


public Grid()
{
    for (int i = 0; i < numberOfBuckets; i++)
    {
        bucketList[i] = new List<GameEntity>();
    }
}


So I'm pretty sure they aren't searching through the entire list for each access, but I may be wrong.

Originally I had them as foreach loops in the Update function, but I read that for loops are much more efficient, although I didn't see much of a change in the performance numbers. Actually, a couple of days ago it was much more code, and I have gotten it down to the couple of for loops above, but still without much improvement in efficiency. Any ideas?
Thanks again.

The proper way to iterate through a linked list is to do something like

while (currentElement = list->nextelement())
{
    // do something with currentElement
}

Otherwise, each time you address an element by its index number, the program iterates through the list element by element until it gets to the requested element, which takes more and more time as the index increases.

List<T> is represented by arrays in .Net, not linked lists.

List<T> has a [] operator. Use it instead of ElementAt.

ElementAt is a LINQ extension method and uses the IEnumerable interface. The IEnumerable interface is not meant for random access and is guaranteed to be slower than indexing the List<T> directly using operator[].



That said, your loop can be written like this:


public void Update()
{
    foreach (var bucket in occupiedBucket.Select(x => bucketList[x]))
    {
        int bucketListCount = bucket.Count();
        for (int j = 0; j < bucketListCount - 1; j++)
            for (int k = j + 1; k < bucketListCount; k++)
                bucket[j].handleCollide(bucket[k]);
    }
}

Quote:
Original post by Nypyren
List<T> is represented by arrays in .Net, not linked lists.

List<T> has a [] operator. Use it instead of ElementAt.

ElementAt is a LINQ extension method and uses the IEnumerable interface. The IEnumerable interface is not meant for random access and is guaranteed to be slower than indexing the List<T> directly using operator[].


Thanks a lot. I removed the ElementAt calls in there and just used indexing, and it improved by about 2%, but going from 50% to 48% is not much difference. :(

Any other simple mistakes you can see?

The HandleCollide function must be pretty expensive then, if the enumeration wasn't the primary cause.

Alternatively, instead of checking everything inside each occupied bucket against everything else in that bucket, you could use sweep and prune instead of (or in conjunction with) the buckets.
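
For illustration, here is a minimal 1D sweep-and-prune sketch. This is not code from the project; it just assumes the GameEntity type exposes the boundsBox rectangle and handleCollide method shown elsewhere in this thread:

// Minimal 1D sweep-and-prune sketch (illustration only).
void SweepAndPrune(List<GameEntity> entities)
{
    // Sort by the left edge of each bounding box along X.
    entities.Sort((a, b) => a.boundsBox.X.CompareTo(b.boundsBox.X));

    for (int i = 0; i < entities.Count; i++)
    {
        int rightEdge = entities[i].boundsBox.X + entities[i].boundsBox.Width;
        for (int j = i + 1; j < entities.Count; j++)
        {
            // Once the next left edge starts past our right edge, nothing
            // further down the sorted list can overlap on X either.
            if (entities[j].boundsBox.X > rightEdge)
                break;
            entities[i].handleCollide(entities[j]);
        }
    }
}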

[Edited by - Nypyren on November 29, 2010 10:47:09 PM]

Quote:
Original post by Nypyren
List<T> is represented by arrays in .Net, not linked lists.

List<T> has a [] operator. Use it instead of ElementAt.

ElementAt is a LINQ extension method and uses the IEnumerable interface. The IEnumerable interface is not meant for random access and is guaranteed to be slower than indexing the List<T> directly using operator[].



That said, your loop can be written like this:


public void Update()
{
    foreach (var bucket in occupiedBucket.Select(x => bucketList[x]))
    {
        int bucketListCount = bucket.Count();
        for (int j = 0; j < bucketListCount - 1; j++)
            for (int k = j + 1; k < bucketListCount; k++)
                bucket[j].handleCollide(bucket[k]);
    }
}

Oops, I spoke too soon; that was before you had posted the code. Thanks! It did help quite a bit more, from 48% to around 30%. The profiler now says 61% is spent in Update, 78% of that in Grid.Update itself, and then 58% of that time in Update itself and 41% in handleCollide, whereas before the ratio leaned much more toward Update itself. Thanks so much! Faster by about 15-20%, but there is still a lot of time inside Update. Hmm, maybe I'll try going through handleCollide, but should it be that much of the program? 61% * 78% * 58% (27% total) just for that couple of loops? Then again, it's much, much better than it was before.

Sorry, I keep editing my posts :)

It's very strange that my posted code would perform that much better than just your version (replacing ElementAt with operator[]). The optimizer should turn yours into something that runs identically to mine.

Are you profiling it with optimization suppressed?

Quote:
Original post by Nypyren
Sorry, I keep editing my posts :)

It's very strange that my posted code would perform that much better than just your version (replacing ElementAt with operator[]). The optimizer should turn yours into something that runs identically to mine.

Are you profiling it with optimization suppressed?


No, I don't believe I am running with optimization suppressed. I am going to guess it is normally on; I hadn't changed that setting, and to be honest I am not sure how to do that.

Yeah, regarding the sweep and prune idea, I was kind of trying to simulate that with the inner for loop starting at k = j + 1, so it wouldn't check the same two objects twice.

Usually profilers don't suppress optimizations, but the Visual Studio debugger does by default. If you're profiling at the same time as debugging, then that might happen.

If you just want to mess with the option in Visual Studio to see if it makes any difference, it's under Tools -> Options -> Debugging -> "Suppress JIT optimization on module load".

That option affects how the intermediate bytecode (IL) in the EXE gets converted to native machine code on whichever platform you're running on (are you running on a PC or a 360?).

If the checkbox is enabled (the default) AND you use Visual Studio to launch your program, it will disable some optimizations so that you can debug your code better. If it's disabled, debugging the optimized code can be VERY weird (you might not be able to step into certain functions, variables might appear to contain random numbers, etc.).

If you launch your EXE without debugging, or from Explorer, it will be optimized without caring what the checkbox is set to.



It doesn't make sense to me for profilers to have this kind of option (since you would want your code to go as fast as possible!), but it *might* be doing it for some unknown reason.

You say that 50% of the execution time is in this function, but percentages by themselves are nearly meaningless when profiling! Instead you should be talking about things in terms of time (i.e. milliseconds).
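
For example (a rough sketch, not code from the project; "grid" here is a hypothetical reference to the Grid instance), you can wrap the call with System.Diagnostics.Stopwatch to get wall-clock milliseconds:

var sw = System.Diagnostics.Stopwatch.StartNew();
grid.Update();   // "grid" is a hypothetical reference to the Grid instance
sw.Stop();
System.Diagnostics.Debug.WriteLine("Grid.Update took " + sw.Elapsed.TotalMilliseconds + " ms");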

Questions for you...

1) Are you having a performance problem (ie low frame rate?) or are you just trying to make it faster because it's taking up too much % of your execution time?

2) How long does this function take on average to execute?

3) How many times do you call this function per frame?

4) Is your program doing anything else significant yet or is this most of all that it does?

Also a little bit of info for you...

Optimizing functions like this, where you try to iterate through lists in a better way, make math faster, etc., is called MICRO optimization.

Micro optimizations should be done only after macro optimizations and usually they don't get you very much performance increase.

MACRO optimizations are where all of your big wins come from usually when optimizing. Macro optimizations are where you change the strategy of your code, not the fine grained details of how it's implemented.

For instance, you might have the very fastest implementation of a bubble sort, but it's going to lose out to a much poorer implementation of quick sort!

Can't wait to hear the answers, and I hope the info is useful for you!
-Atrix

Quote:
Original post by Atrix256
You say that 50% of the execution time is in this function, but percentages by themselves are nearly meaningless when profiling! Instead you should be talking about things in terms of time (i.e. milliseconds).

Questions for you...

1) Are you having a performance problem (ie low frame rate?) or are you just trying to make it faster because it's taking up too much % of your execution time?

2) How long does this function take on average to execute?

3) How many times do you call this function per frame?

4) Is your program doing anything else significant yet or is this most of all that it does?

Also a little bit of info for you...

Optimizing functions like this, where you try to iterate through lists in a better way, make math faster, etc., is called MICRO optimization.

Micro optimizations should be done only after macro optimizations and usually they don't get you very much performance increase.

MACRO optimizations are where all of your big wins come from usually when optimizing. Macro optimizations are where you change the strategy of your code, not the fine grained details of how it's implemented.

For instance, you might have the very fastest implementation of a bubble sort, but it's going to lose out to a much poorer implementation of quick sort!

Can't wait to hear the answers, and I hope the info is useful for you!
-Atrix



Thanks for the post, a lot of useful information! As for the questions:
1) I am experiencing a performance problem. On PC the framerate starts going down at about 400 enemies, and on Xbox (which is the platform it will end up on) it starts almost right away, at 5 or 10 enemies. My goal is around 1000 on Xbox, so I'm trying to focus on the biggest areas first, and the most time was spent in this function.

2) The average time through the full Grid.Update function is 12 ms, and the average for its own body alone is 6.9 ms. In a 34-second total program run it was called 1355 times, with 16 seconds of the total 34 spent inside Grid.Update. handleCollide was called 16,998,343 times, and the total time spent in handleCollide was 6.5 seconds.

3) Grid.Update is called once per frame. Would it be okay to maybe call it once every 2 frames? I'm thinking this wouldn't be ideal.

4) No, the program is pretty big. It contains map tile functionality, game state management, and quite a number of items/enemies/ammunition/money/player types/placeable items/etc. I have been working on it for a while, but none of the other functionality seems to be taking much time. There is a big chunk inside the main program itself, but I haven't drilled down into that code yet because it was a bit less than Grid.Update.

From a high level, for all objects/items/game entities there is pretty much an update phase (mainly game AI and seeking methods, and updating positions, along with the map scrolling), then an insert phase (inserting collidable items into the buckets), then a draw and animation phase to display the correct bitmap at the necessary location and rotation.

Again, thanks for the post; it got me thinking about a few things that may help. If anything else comes to mind about optimization, let me know. Thanks.

Quote:
Original post by EdBoon

1) I am experiencing a performance problem. On PC the framerate starts going down at about 400 enemies, and on Xbox (which is the platform it will end up on) it starts almost right away, at 5 or 10 enemies. My goal is around 1000 on Xbox, so I'm trying to focus on the biggest areas first, and the most time was spent in this function.


A similar question was answered in another thread.

Quote:
2) the average time through the full grid.update function is 12 ms, average for inside itself alone is 6.9ms and in a 34 sec total program run, it was called 1355 times and


The code given in the other thread is effectively optimal. For 500 entities, the time I got was 1 ms. Given the slight differences between C++ and C# and the containers used, the numbers are perfectly comparable - a factor of 6 is just a small constant.

Simply put, an n^2 algorithm will not do 1000 entities. Period. Not in C#, not in C++, not in assembly. At least not at the desired framerates.

Find a better algorithm. There are plenty to choose from, ranging from hash-based approaches to kd-trees and sweep line algorithms.

Quote:
Original post by Antheus

The code given in the other thread is effectively optimal. For 500 entities, the time I got was 1 ms. Given the slight differences between C++ and C# and the containers used, the numbers are perfectly comparable - a factor of 6 is just a small constant.

Simply put, an n^2 algorithm will not do 1000 entities. Period. Not in C#, not in C++, not in assembly. At least not at the desired framerates.

Find a better algorithm. There are plenty to choose from, ranging from hash-based approaches to kd-trees and sweep line algorithms.


Thanks for the post, and the thread link. But this is not an n^2 algorithm that checks every object against every other object. I am using a spatial hashing method that puts items into a 1D list based on their position and only checks items in that area (or bucket, as I call it). So if there are 100 enemies, in groups of 5 that are all in different areas, that is 10 checks per group of 5 (each pair is checked once per 'set'), which would total 200. Let me know if I didn't explain that clearly.
To put the items in the list, I do the following:


public void Insert(GameEntity item)
{
    Vector2 position = new Vector2(item.boundsBox.X, item.boundsBox.Y) + Game1.mapOffset;
    int gridCell = ((int)(position.X / cellSize) + (int)(position.Y / cellSize) * width);
    int gridCell3 = ((int)(position.X + item.boundsBox.Width) * 1 / cellSize +
                     (int)((position.Y + item.boundsBox.Height) * 1 / cellSize) * width);
    if (gridCell < numberOfBuckets && gridCell >= 0)
    {
        if (bucketList[gridCell] != null && gridCell3 < numberOfBuckets && gridCell3 >= 0)
        {
            bucketList[gridCell].Add(item);
            if (bucketList[gridCell].Count() == 1)
                occupiedBucket.Add(gridCell);
            if (gridCell != gridCell3)
            {
                bucketList[gridCell3].Add(item);
                if (bucketList[gridCell3].Count() == 1)
                    occupiedBucket.Add(gridCell3);
                if (gridCell != gridCell3 - 1)
                {
                    bucketList[gridCell3 - 1].Add(item);
                    bucketList[gridCell + 1].Add(item);
                    if (bucketList[gridCell3 - 1].Count() == 1)
                        occupiedBucket.Add(gridCell3 - 1);
                    if (bucketList[gridCell + 1].Count() == 1)
                        occupiedBucket.Add(gridCell + 1);
                }
            }
        }
    }
}


This function maps a position to one number (a bucket) and adds the item to it. If the item's size overlaps 2 or 4 buckets, it adds the item to all of them.

The main part to notice is:

int gridCell = ((int)(position.X / cellSize) + (int)(position.Y / cellSize) * width);



I previously was using the n^2 method and it performed about the same, which I don't understand, because now it is more like 2*n, or 8*n if they all overlapped at a cross in the grid.

Again, thanks.

Edit: also, as shown in the Insert code, collisions only get checked once there is a second item in that bucket.

Have you checked whether the objects are reasonably distributed throughout the buckets? You might be able to correlate bucket load with performance issues and you might need to tweak your algorithm to take your specific data distribution into account.
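
As a quick sanity check, something along these lines would show how the load is spread. This is a diagnostic sketch against the occupiedBucket and bucketList fields posted earlier, not code from the project:

// Diagnostic sketch: report how entities are spread across the occupied
// buckets so that overloaded cells stand out.
public void DumpBucketLoads()
{
    int maxLoad = 0, total = 0;
    foreach (int cell in occupiedBucket)
    {
        int load = bucketList[cell].Count;
        total += load;
        if (load > maxLoad) maxLoad = load;
    }
    System.Diagnostics.Debug.WriteLine(
        "occupied buckets: " + occupiedBucket.Count +
        ", entries (straddlers counted per bucket): " + total +
        ", worst bucket: " + maxLoad);
}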

Yeah, I wonder if maybe a lot of your objects are winding up in the same buckets?

Or maybe the code that adds to multiple buckets when it doesn't fit in one (straddles the line etc) is accidentally adding to them all?

Or maybe it isn't removing references when the objects move around so the buckets just keep filling up as time goes on?
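
If the grid is meant to be rebuilt from scratch every frame, a minimal reset along these lines (a sketch against the posted Grid fields, not the actual project code) would rule out that last one:

// Sketch of a per-frame reset: empty every bucket used last frame,
// then forget which buckets were occupied.
public void Clear()
{
    foreach (int cell in occupiedBucket)
        bucketList[cell].Clear();
    occupiedBucket.Clear();
}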

Try shrinking your grid and see if this helps. Say the max number of buckets in the grid is around 10; then try a max setting of 5.

So are you using a LIST and accessing it via array index? You don't give a full enough listing, but with performance that bad it's got to be a mistake of this kind.

I am not sure how a list in C# will deal with being accessed in that manner, but it's probably a bad idea.

What you probably want is to make this whole thing much simpler: put it into a vector and then sort the elements with a postman's sort. You can also use a spatial index, so that testing means pretty much walking along the structure one time. You should be able to handle hundreds of thousands like this with a good implementation (maybe less since it's not C++, but still much more).
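
For what it's worth, here is a rough C# sketch of that sort-based idea, reusing the gridCell formula from earlier in the thread. "allEntities" is a hypothetical list of everything collidable, and List<T>.Sort is an ordinary comparison sort rather than a true postman's (radix) sort, so treat this purely as an illustration:

// Key every entity by its grid cell, sort by that key, then test pairs
// only within each contiguous run of equal cells.
var keyed = new List<KeyValuePair<int, GameEntity>>();
foreach (GameEntity e in allEntities)   // allEntities is hypothetical
{
    int cell = (int)((e.boundsBox.X + Game1.mapOffset.X) / cellSize)
             + (int)((e.boundsBox.Y + Game1.mapOffset.Y) / cellSize) * width;
    keyed.Add(new KeyValuePair<int, GameEntity>(cell, e));
}
keyed.Sort((a, b) => a.Key.CompareTo(b.Key));

for (int start = 0; start < keyed.Count; )
{
    // Find the end of the run of entities that share this cell.
    int end = start;
    while (end < keyed.Count && keyed[end].Key == keyed[start].Key)
        end++;
    // Test every pair inside the run once.
    for (int j = start; j < end; j++)
        for (int k = j + 1; k < end; k++)
            keyed[j].Value.handleCollide(keyed[k].Value);
    start = end;
}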

Thanks for all the responses, guys; I'll try to touch on them all.

As for the buckets, I have gone through them in depth; each bucket has the right number of items in it, and the items are being cleared out of it fine.

When I use fewer buckets (or make each cell larger), the efficiency goes down exponentially, and with how I have it now there usually aren't more than 4 or 5 in a bucket at a time.

C# deals with accessing the array of lists well. I could do a postman's sort, but this is really bugging me, and I want to learn at a low level why this approach won't work (if it won't), or what I am doing wrong, so I can make it work correctly. I have read a few sources where people have had very high success with this method.

I started using a couple of different profilers, and some of them say the most CPU time is spent in other areas! :/ Not sure where to set my focus.

But if it helps, I added a screenshot from the profiler that has the most detail to it; maybe someone will notice something:

[profiler report screenshot]

Well hey, I have some potentially good news if I'm correct (and, like you say, if that report can be trusted, since it's saying different stuff than the other).

That report shows that most of your time is spent in Present.

What I take that to mean is that you are GPU bound by a pretty large amount!

If that is indeed the case, optimizing your CPU side code will do nothing for you at this point in time.

It looks like you need to optimize your GPU (shaders etc) for you to see performance gains.

If you get your GPU down to where you are again CPU bound, then maybe optimizing your collision routine, or that vertex buffer copying would be the way to go. Until then though, it doesn't look like it would do anything for you.

My 2 cents anyhow!

Quote:
Original post by Atrix256
Well hey, I have some potentially good news if I'm correct (and, like you say, if that report can be trusted, since it's saying different stuff than the other).

That report shows that most of your time is spent in Present.

What I take that to mean is that you are GPU bound by a pretty large amount!

If that is indeed the case, optimizing your CPU side code will do nothing for you at this point in time.

It looks like you need to optimize your GPU (shaders etc) for you to see performance gains.

If you get your GPU down to where you are again CPU bound, then maybe optimizing your collision routine, or that vertex buffer copying would be the way to go. Until then though, it doesn't look like it would do anything for you.

My 2 cents anyhow!


That would make a lot of sense, because I have really optimized Grid.Update a lot without seeing much difference. I hope this is the case too. I will read about how to optimize shaders and such, implement changes, and see if it does anything.

Thanks again Atrix.

