terribly slow function needs more efficient structure

Started by
32 comments, last by adt7 13 years, 3 months ago
Quote:Original post by Nypyren
List<T> is represented by arrays in .Net, not linked lists.

List<T> has a [] operator. Use it instead of ElementAt.

ElementAt is a LINQ extension method and uses the IEnumerable interface. The IEnumerable interface is not meant for random access and is guaranteed to be slower than indexing the List<T> directly using operator[].



That said, your loop can be written like this:

public void Update(){    foreach (var bucket in occupiedBucket.Select(x => bucketList[x]))    {        int bucketListCount = bucket.Count();        for (int j = 0; j < bucketListCount - 1; j++)            for (int k = j + 1; k < bucketListCount; k++)                bucket[j].handleCollide(bucket[k]);    }}

oops i spoke too soon, before you had passed the code, thanks! it did help a bit more from 48 to around 30 so quite a bit. The profiler now says 61% is spent in update, 78% of that in grid.update itself, and then 58% of that time in update itself and 41% in handle collide where before the ratio was a lot more towards in update itself. Thanks so much! faster by about 15-20% still a lot of time inside of update.. hmm maybe i'll try to go through collide, but should it be that much of the program? 61%*78%*58% (27% total) just for those couple of loops? Then again, much much better than it was before.

http://www.youtube.com/user/bigrookdigital
Undead Empire for Xbox 360 --> http://bit.ly/dYdu3z
Gamerscore Tracker for Xbox 360 --> http://bit.ly/vI4T4X
www.bigrookgames.com
twitter.com/BigRookGames

Advertisement
Sorry, I keep editing my posts :)

It's very strange that my posted code would perform that much better than just your version (replacing ElementAt with operator[]). The optimizer should turn yours into something that runs identically to mine.

Are you profiling it with optimization suppressed?
Quote:Original post by Nypyren
Sorry, I keep editing my posts :)

It's very strange that my posted code would perform that much better than just your version (replacing ElementAt with operator[]). The optimizer should turn yours into something that runs identically to mine.

Are you profiling it with optimization suppressed?


No i don't believe I am running with optimization suppressed? i am going to guess it is normally on, I hadn't changed that setting, to be honest i am not sure how to do that.

yea with the sweep and prune thing, I was kind of trying to simulate that with the for loop starting at k = j+1 then it wouldn't check 2 objects twice.

http://www.youtube.com/user/bigrookdigital
Undead Empire for Xbox 360 --> http://bit.ly/dYdu3z
Gamerscore Tracker for Xbox 360 --> http://bit.ly/vI4T4X
www.bigrookgames.com
twitter.com/BigRookGames

Usually profilers don't suppress optimizations, but the visual studio debugger does by default. If you're profiling at the same time as debugging then that might happen.

If you just want to mess with the option in visual studio to see if it makes any difference, it's under (Tools -> Options -> Debugging -> Suppress JIT optimization on module load).

That option affects how the intermeriate bytecode in the EXE gets converted to native machine code on whichever platform you're running on (are you running on a PC or a 360?)

If the checkbox is enabled (default) AND you use visual studio to launch your program, it will disable some optimizations so that you can debug your code better. If it's disabled, debugging the optimized code can be VERY weird (you might not be able to step into certain functions, variables might appear to contain random numbers, etc).

If you launch your EXE without debugging, or from explorer, it will optimize it without caring what the checkbox is set to.



It doesn't make sense to me for profilers to have this kind of option (since you would want your code to go as fast as possible!), but it *might* be doing it for some unknown reason.
You say that 50% of the execution time is in this function, but percents by themselves are nearly meaningless when profiling! Instead you should be talking about things in time (ie milliseconds)

Questions for you...

1) Are you having a performance problem (ie low frame rate?) or are you just trying to make it faster because it's taking up too much % of your execution time?

2) How long does this function take on average to execute?

3) How many times do you call this function per frame?

4) Is your program doing anything else significant yet or is this most of all that it does?

Also a little bit of info for you...

Optimizing functions like this where you try to iterate through lists in a better way, make math faster etc is called MICRO optimization.

Micro optimizations should be done only after macro optimizations and usually they don't get you very much performance increase.

MACRO optimizations are where all of your big wins come from usually when optimizing. Macro optimizations are where you change the strategy of your code, not the fine grained details of how it's implemented.

For instance, you might have the very fastest implementation of a bubble sort, but it's going to lose out to a much poorer implementation of quick sort!

Can't wait to hear the answers, and i hope the info is useful for you!
-Atrix
Quote:Original post by Atrix256
You say that 50% of the execution time is in this function, but percents by themselves are nearly meaningless when profiling! Instead you should be talking about things in time (ie milliseconds)

Questions for you...

1) Are you having a performance problem (ie low frame rate?) or are you just trying to make it faster because it's taking up too much % of your execution time?

2) How long does this function take on average to execute?

3) How many times do you call this function per frame?

4) Is your program doing anything else significant yet or is this most of all that it does?

Also a little bit of info for you...

Optimizing functions like this where you try to iterate through lists in a better way, make math faster etc is called MICRO optimization.

Micro optimizations should be done only after macro optimizations and usually they don't get you very much performance increase.

MACRO optimizations are where all of your big wins come from usually when optimizing. Macro optimizations are where you change the strategy of your code, not the fine grained details of how it's implemented.

For instance, you might have the very fastest implementation of a bubble sort, but it's going to lose out to a much poorer implementation of quick sort!

Can't wait to hear the answers, and i hope the info is useful for you!
-Atrix



thanks for the post, a lot of useful information! As for the questions:
1) I am experiencing a performance problem. On PC the framerate starts going down at about 400 enemies, and on xbox ( which is the platform it will end up on) it starts almost right away at 5 or 10 enemies. My goal is around 1000 on xbox so im trying to focus on the biggest areas first which the most time was in this function.

2) the average time through the full grid.update function is 12 ms, average for inside itself alone is 6.9ms and in a 34 sec total program run, it was called 1355 times and 16sec of the total 34 inside of grid.update. handlecollide was called 16,998,343 times. Total time spent in handle collide is 6.5sec.

3) Grid.Update is called once per frame. Would it be okay to maybe call it once per 2 frames? I'm thinking this wouldn't be ideal.

4) No the program is pretty big, it contains map tile functionality, game state management, then there is quite a number of items/enemies/ammunition/money/playertypes/placeable items/etc.. been working on it for a while but none of the other functionality seems to be taking much time. There is a big chunk inside of the main program itself but I havent drilled down into that code yet because it was a bit less than the grid.update.

From a high level for all objects/items/game entities there is pretty much an update phase which is mainly game AI and seeking methods, updating the positions, along with the map scrolling, then there is an insert phase (insert collidable items into the buckets), then a draw and animation phase to display the correct bitmap at the location and rotation necessary.

Again thanks for the post, got me thinking about a few things that may help. If anything comes to mind about optimization let me know. thanks

http://www.youtube.com/user/bigrookdigital
Undead Empire for Xbox 360 --> http://bit.ly/dYdu3z
Gamerscore Tracker for Xbox 360 --> http://bit.ly/vI4T4X
www.bigrookgames.com
twitter.com/BigRookGames

Quote:Original post by EdBoon

1) I am experiencing a performance problem. On PC the framerate starts going down at about 400 enemies, and on xbox ( which is the platform it will end up on) it starts almost right away at 5 or 10 enemies. My goal is around 1000 on xbox so im trying to focus on the biggest areas first which the most time was in this function.


A question similar to this was replied in another thread.

Quote:2) the average time through the full grid.update function is 12 ms, average for inside itself alone is 6.9ms and in a 34 sec total program run, it was called 1355 times and


The code given in other thread is effectively optimal. For 500 entities, the time I got was 1ms. Given slight differences between C++ and C# and containers used, the numbers are perfectly comparable - factor of 6 is just a small constant.

Simply put, n^2 algorithm will not do 1000 entities. Period. Not in C#, not in C++ not in assembly. At least not for desired framerates.

Find a better algorithm. There are plenty to choose from, ranging from hash-based to kd-trees, sweep line algorithms....
Quote:Original post by Antheus

The code given in other thread is effectively optimal. For 500 entities, the time I got was 1ms. Given slight differences between C++ and C# and containers used, the numbers are perfectly comparable - factor of 6 is just a small constant.

Simply put, n^2 algorithm will not do 1000 entities. Period. Not in C#, not in C++ not in assembly. At least not for desired framerates.

Find a better algorithm. There are plenty to choose from, ranging from hash-based to kd-trees, sweep line algorithms....


thanks for the post, and the thread link. But this is not n^2 algorithm that checks every object with every other object. I am using a spatial hashing method that puts items into a 1d list based on its position and only checks items in that area (or bucket as i call them). so if there is 100 enemies, and groups of 5 are all in different areas, that is 10 checks per 5 (checks are done 1 per 'set'). that would total to be 200. Let me know if i didn't explain very clear.
to put the items in the list I do the following:

public void Insert(GameEntity item)        {            Vector2 position = new Vector2(item.boundsBox.X, item.boundsBox.Y) + Game1.mapOffset;            int gridCell = ((int)(position.X  / cellSize) + (int)(position.Y / cellSize) * width);            int gridCell3 = ((int)(position.X + item.boundsBox.Width) * 1 / cellSize + (int)((position.Y + item.boundsBox.Height) * 1 / cellSize) * width);            if (gridCell < numberOfBuckets && gridCell >= 0)            {                if (bucketList[gridCell] != null && gridCell3 < numberOfBuckets && gridCell3 >= 0)                {                    bucketList[gridCell].Add(item);                    if(bucketList[gridCell].Count() == 1)                        occupiedBucket.Add(gridCell);                    if (gridCell != gridCell3)                    {                        bucketList[gridCell3].Add(item);                        if (bucketList[gridCell3].Count() == 1)                            occupiedBucket.Add(gridCell3);                        if (gridCell != gridCell3 - 1)                        {                            bucketList[gridCell3 - 1].Add(item);                            bucketList[gridCell + 1].Add(item);                            if (bucketList[gridCell3-1].Count() == 1)                                occupiedBucket.Add(gridCell3-1);                            if (bucketList[gridCell+1].Count() == 1)                                occupiedBucket.Add(gridCell+1);                        }                    }                }            }        }


This function assigned a position to 1 number or bucket and adds it to that. If the size is overlapping 2 or 4 buckets it adds to them all.

the main part to notice is
int gridCell = ((int)(position.X  / cellSize) + (int)(position.Y / cellSize) * width);



I previously was using n^2 method and it almost performed about the same which I dont understand because now it is more like 2*n or 8*n if they all overlapped at a cross in the grid.

again thanks.

edit* also as shown in the insert code, it only checks collision if it is the 2nd item in that bucket

http://www.youtube.com/user/bigrookdigital
Undead Empire for Xbox 360 --> http://bit.ly/dYdu3z
Gamerscore Tracker for Xbox 360 --> http://bit.ly/vI4T4X
www.bigrookgames.com
twitter.com/BigRookGames

Have you checked whether the objects are reasonably distributed throughout the buckets? You might be able to correlate bucket load with performance issues and you might need to tweak your algorithm to take your specific data distribution into account.
yeah i wonder if maybe alot of your objects are winding up in the same buckets?

Or maybe the code that adds to multiple buckets when it doesn't fit in one (straddles the line etc) is accidentally adding to them all?

Or maybe it isn't removing references when the objects move around so the buckets just keep filling up as time goes on?

This topic is closed to new replies.

Advertisement