Loops+switch slowing things down?

Started by
8 comments, last by cache_hit 14 years, 1 month ago
I'm working on a fairly simple game, using C++. I've got an array of particles, and each particle is pixel-sized. So, my window is 1024x768, and the array is also 1024x768. I have two for() loops, one nested in another, looping through all of the particles in the array. For each particle, a switch checks its type and acts on it accordingly. Tonight, I've added several new types of particles, and thus several new cases for my switch. I noticed that my game's performance is impacted quite heavily by the additions of two rather large cases. However, the two types of particles handled by these cases aren't even present in the game. Now, this doesn't seem right to me. If the types satisfying these cases aren't present in the game, shouldn't this have no impact on performance? For reference, here are the two cases I'm talking about. When I comment them out, I lose ~10 FPS:
                case 4:
                //Fall straight down.
                if(y<level.level_y-1 && (level.particles_array[x][y+1].type==0)){
                    level.particles_array[x][y].type=0;
                    level.particles_array[x][y].appearance=0;

                    new_x=x;
                    new_y=y+1;
                    level.particles_array[new_x][new_y].type=4;
                }

                //Fall down left.
                else if(y<level.level_y-1 && x>0 && (level.particles_array[x-1][y+1].type==0)){
                    level.particles_array[x][y].type=0;
                    level.particles_array[x][y].appearance=0;

                    new_x=x-1;
                    new_y=y+1;
                    level.particles_array[new_x][new_y].type=4;
                }

                //Fall down right.
                else if(y<level.level_y-1 && x<level.level_x-1 && (level.particles_array[x+1][y+1].type==0)){
                    level.particles_array[x][y].type=0;
                    level.particles_array[x][y].appearance=0;

                    new_x=x+1;
                    new_y=y+1;
                    level.particles_array[new_x][new_y].type=4;
                }

                //If the water particle has air above it.
                if(level.particles_array[new_x][new_y].type!=0){
                    level.particles_array[new_x][new_y].appearance=0;
                }
                else if(level.particles_array[new_x][new_y-1].type==0){
                    //
                    level.particles_array[new_x][new_y].appearance=random_range(1,2);
                }
                break;
                //Sand.
                case 5:
                //Fall straight down.
                if(y<level.level_y-1 && (level.particles_array[x][y+1].type==0)){
                    level.particles_array[x][y].type=0;
                    level.particles_array[x][y].appearance=0;

                    level.particles_array[x][y+1].type=5;
                    level.particles_array[x][y+1].appearance=random_range(0,2);
                }

                //Fall down left.
                else if(y<level.level_y-1 && x>0 && (level.particles_array[x-1][y+1].type==0)){
                    level.particles_array[x][y].type=0;
                    level.particles_array[x][y].appearance=0;

                    level.particles_array[x-1][y+1].type=5;
                    level.particles_array[x][y+1].appearance=random_range(0,2);
                }

                //Fall down right.
                else if(y<level.level_y-1 && x<level.level_x-1 && (level.particles_array[x+1][y+1].type==0)){
                    level.particles_array[x][y].type=0;
                    level.particles_array[x][y].appearance=0;

                    level.particles_array[x+1][y+1].type=5;
                    level.particles_array[x][y+1].appearance=random_range(0,2);
                }
                break;


[Edited by - Dark_Oppressor on February 28, 2010 11:14:49 PM]
Advertisement
It may be helpful to post the entire function's code, including the for loops, so that we can see why these two cases are different from the others (and hence, why it's slowing things down).
Oh no, I may have misrepresented the problem. The two cases in question are not the problem. If they are commented out, I can comment out other cases and get similar performance boosts. The problem I'm having is that the more cases I have in my switch, the slower the game runs, regardless of whether or not those cases are actually being used in the game.
I'm pretty sure switch statements have constant performance as you have more cases. The first thing I'd do to optimize this though, is ask myself "Do I really need 1024 x 768 particles at the same time?". After that, break up the update so that only a half, or a quarter of the particles get updated at a time. The end user won't notice the difference and you're doing a quarter of the work each frame. If all you're doing is adding cases (that aren't even being used) then that doesn't sound right to me....

EDIT- Just googled this:
http://www.codeguru.com/cpp/cpp/cpp_mfc/comments.php/c4007/?thread=48816

Looks like in debug builds, the switch is handled like a big if else if else... which would explain the problem. In release it shouldn't be as bad. Still, that's a LOT of particles every frame tho...
Quote:However, the two types of particles handled by these cases aren't even present in the game.

Now, this doesn't seem right to me. If the types satisfying these cases aren't present in the game, shouldn't this have no impact on performance?

It should gave a minor impact, that of testing the 'if()' or 'case' statements to see if they match, and however small it may be, a 'minor' impact is definitely not 'no' impact.

You added two case statements to check against. You loop 786,432 times (that's 1024x768), so you check the statements 786,432 times each.

Quote:I noticed that my game's performance is impacted quite heavily by the additions of two rather large cases.

How heavy is 'quite heavily'? How are you measuring the 'heaviness' of the addition?

Quote:For reference, here are the two cases I'm talking about. When I comment them out, I lose ~10 FPS:

What was your FPS before you lost 10 FPS? Typically, you don't measure perfomance by FPS, you measure it by the average number of millaseconds per frame, for this very reason: it's very misleading.

If you have 200 FPS, and drop to 190 FPS, that's not much of a drop.
200 FPS = 5 millaseconds per frame
190 FPS = 5.26 millaseconds

That's only an increase of 0.26 millaseconds per frame, in this case.

If you have 50 FPS, and drop to 40 FPS, that's a much larger drop, but still tolerable.
50 FPS = 20 millaseconds per frame
40 FPS = 25 millaseconds per frame

That's an increase of 5 millaseconds per frame. It's still only a drop of '10 FPS'.

Calculate how much millaseconds per frame it increased by, then decide if it's a huge increase or a small increase. Is the game running so slow that it's unplayable? If not, then work on your game, and optimize it later. Unless it's unbearably slow, don't worry about optimizing it, until you are getting ready for release. If it is unbearably slow, then by all means, try to make it go faster.
First of all, that's nearly a million particles, and I'm betting those particles are what, 16 bytes (1 3D vector + RGB color) at least? That's nearly 16 megabytes of particles alone! And if you don't have a pooled allocator, I bet you're hopping all over memory and trashing the hell out of your cache.

A million particles is pretty crazy honestly... In the last Halo: Reach ViDoc bungie was boasting, IIRC, the leap from 3000 particles/frame (halo 3 tech) to 30,000 per frame.

Frankly, the fact that you equate screen resolution with particles at all causes me to wonder if you know what a particle system should entail, unless this is for some kind of full-screen plasma effect or similar.

The reason you're likely seeing such an ill effect on performance as you add switch cases is because, as the amount of code in those cases grow (altogether) your loop is no longer fitting within the instruction cache (because each iteration is a different case, and different code). Another reason is that, as the number and disparity of particle types grows, the switch statement may have been changed from an efficient jump-table to something less efficient.


Particle systems are all about throughput -- and the only way you get that these days is to really make optimal use of memory bandwidth, and consequently, your cache at all levels.

Rather than having one loop which handles all particle types, you want to sort every particle into a different bin based on its type, and then process all particles of a given type in one go -- this means you don't have to ask all those questions (each case and if-then-else-if statement) for each particle -- its all this big, branchy code that's killing your performance. If you fix that, then your instruction cache will be happy.

If you take the next logical step, you want to allocate particles of each type from a custom memory allocator for that particle type, and then process them all in order from first to last. Processing particles should be order independent, so this is the most efficient use of the data cache.

throw table_exception("(? ???)? ? ???");

First of all, FPS and MS per frame are a simple equation away from one another. They are both useful. The point is that the relative difference in FPS with the only thing changed being the content of a switch is very odd, at least to me. Regardless, I actually have it tell me both of those things anyway, so the drop in terms of MS per frame is about 4 MS, from 24 to 20. This probably would not matter enough for me to even worry about, except I barely have it running passably as it is. I'm using SDL + OpenGL, and rendering 786,432 little particles each frame is quite taxing. For reference, I just looked at another project I'm working on, which uses more RAM, and consistently uses more CPU power. It renders at .8 MS per frame.

Apparently I need to clarify what I'm doing. It's a 2D game in which each pixel of the screen has a corresponding particle in the array, so that I can play around with them and do fun stuff. Think "Falling Sand Game." My use of the word 'particle' must have been misleading, sorry about that. This is not a particle engine as part of a larger game. This is the game, essentially.

The particles are each 2 bytes, so that's actually only about a meg and a half.
Is it possible for your logic (since we can't see the whole thing) to factor out the 'y<level.level_y-1' test and do it first?
Are you doing these performance tests in an optimized release build? Measuring performance in debug builds isn't very useful.

Assuming it is release mode, can you compare the disassembly in release between the slow and fast versions, and tell us what's different. Alternatively post enough code so others can compile it and do the comparisons.

By the way which compiler are you using?
Do a little test. Take the code for cases 4 and 5 and put them in their own little function. Call these functions exec_case_4() and exec_case_5(). Mark with a __declspec(noinline) assuming you're using Visual C++, and the corresponding declspec for GCC or Intel if you're using those compilers.

Now run your code again. I am willing to bet it will speed up. This would be because you're filling up the instruction cache with a bunch of useless code that isn't going to be executed. Maybe without these 2 cases, the entire block of code fits in the instruction cache, but with these 2 extra cases, it has to keep reloading the instruction cache on every iteration through the look.

If that doesn't work, check the generated assembly and see if it's perhaps generating a jump table for your switch statement. That could also exhibit poor performance for similar reasons.

This topic is closed to new replies.

Advertisement