Faster than Switch

This topic is 2666 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

I have the following code in my program:
 switch (block) { case fire: updateFire(true); break; case smoke: updateSmoke(); break; case tree: updateTree(); break; case steam: updateSteam(); break; default: break; } 

This switch is called >10,000 times per frame so it needs to be faster.

All of these items are enums so they are numbers (0-3).

What I want to do is have all these functions in an array.

Something like this:
 // //Defines // enum item ...item list (tree, fire etc)... itemupdates[4]; //Functions that need to be called itemupdates[0] = updateFire(); itemupdates[1] = updateSmoke(); itemupdates[2] = updateTree(); itemupdates[3] = updateSteam(); // //Switch replacement // call itemUpdates[block]; 

How would I implement this?

I have tried googling this but it's such an uncommon task (I think) I don't even know what to search.

Share on other sites
First and foremost: Did you profile this, and found that it is indeed this switch statement that is the bottleneck?

Share on other sites

All of these items are enums so they are numbers (0-3).
What I want to do is have all these functions in an array.
How would I implement this?
A good compiler will automatically produce this function pointer table for you from your existing switch code!
You may have already implemented it!

I'm guessing that you have this problem because you've got a list of 10,000 items, which you iterate through, and it's only when you inspect each item that you discover it's type?

To optimise this, you should pull the type-test out of that loop, into the code above it. Partition your data in advance into groups of the same type, then update the entire group at once. updateAllFires(true); updateAllSmokes(); updateAllTrees(); updateAllSteams();

Share on other sites
Those of us that have Turing machines prefer a non-standard keyword "oracle(variable) { }" instead of switch...

In real world however there is no real solution. The proposed function pointer solution is equivalent and likely worse on some compilers.

To optimise this, you should pull the type-test out of that loop, into the code above it. Partition your data in advance into groups of the same type, then update the entire group at once.[/quote]
Which likely isn't possible. For Minecraft-like or similar map, cells cannot be rearranged.

Either one uses sparse iteration (intrusive linked list), but insertions that maintain in-order memory traversal will be n/C if using skip list.
One could sort on each frame, taking a n*C overhead and hope that C is smaller than branching overhead. Relies on stable sort and radix sort comes with some overhead.
One could use bi-directional pointers between typed lists and map data, but suffer from similar ordering problems.
Sparse matrix/bitmap would be another solution, where one only updates specific type on each pass, but it would only benefit if there are long runs of same types.

There simply isn't a universal solution that would be better.

switch is called >10,000 times per frame[/quote]It's not switch that is the bottleneck. 10,000 per frame is 600,000 per second. CPUs today can do 600 million on single core.

the bottleneck are your functions, not switch.

Share on other sites

Those of us that have Turing machines prefer a non-standard keyword "oracle(variable) { }" instead of switch...

In real world however there is no real solution. The proposed function pointer solution is equivalent and likely worse on some compilers.

To optimise this, you should pull the type-test out of that loop, into the code above it. Partition your data in advance into groups of the same type, then update the entire group at once.

Which likely isn't possible. For Minecraft-like or similar map, cells cannot be rearranged.

Either one uses sparse iteration (intrusive linked list), but insertions that maintain in-order memory traversal will be n/C if using skip list.
One could sort on each frame, taking a n*C overhead and hope that C is smaller than branching overhead. Relies on stable sort and radix sort comes with some overhead.
One could use bi-directional pointers between typed lists and map data, but suffer from similar ordering problems.
Sparse matrix/bitmap would be another solution, where one only updates specific type on each pass, but it would only benefit if there are long runs of same types.

There simply isn't a universal solution that would be better.

switch is called >10,000 times per frame[/quote]It's not switch that is the bottleneck. 10,000 per frame is 600,000 per second. CPUs today can do 600 million on single core.

the bottleneck are your functions, not switch.
[/quote]

This is actually what I am using currently:
 if (block == water || block == oil || block == lava) { updateLiquid(i); } else if (block == ocean) { updateOcean(i); } else if (block == sand) { updateFall(i); } else if (block == wood) { updateWood(i); } else if (block == fire) { if (updateStep==0) { updateFire(i, true); } else { liquidBufferAdd(gridActive.x,gridActive.y,gridActive.z); } } else if (block == tree) { updateTree(i); } else if (block == steam) { updateSteam(i); } else if (block == smoke) { updateSmoke(i); } if (block == lava) { updateFire(i, false); updateLavaTouchWater(i); } 

Why did I do an If Statement Chain? I have no idea.
That's why I'm changing to the switch.

The switch should be faster than the chain right?

Share on other sites
Which likely isn't possible. For Minecraft-like or similar map, cells cannot be rearranged.
It's always possible (esp. when speaking of turing machines) and downright necessary on certain CPUs.
The most common implementation we use is lists of indices into the storage area.array<BlockStorage> items; array<int> idxTypeA; array<int> idxTypeB; foreach( int i in idxTypeA ) UpdateTypeA( items ) foreach( int i in idxTypeB ) UpdateTypeB( items ) function: ChangeType( array& from, array& to, int idx ) from.remove(idx) to.add(idx)

Share on other sites
I am making a Voxel Engine similar to Minecraft. My game is more like a falling sand 3d (Reaction Focused) instead of a mining game.

Arrays:
The way I use to manage the active blocks is to declare an array to store positions of active blocks
(Checking each block would take Way too long)
When it adds a block it looks threw the array to see if it's already in there, using a for loop, then it looks for an empty position in the array and adds it, using a for loop.
During the update function the array is copied and it cycles threw the new array looking at each block that is listed and its surrounding, changing accordingly.
(It must be copied to make sure it's not being performed twice, because sometimes an action adds another place to watch)

Deque:
Now I have decided to use deque.
When it adds a block it looks threw the array to see if it's already in there, using a for loop, then it pushes it on the back.
During the update function it pops each element from the deque and checks each block that is listed and its surrounding, changing accordingly.
Because sometimes an action adds another place to watch, I will need some means of telling it where to stop.

The second method should be faster right?
Is there a way I can farther improve this?

Share on other sites

The switch should be faster than the chain right?

It depends. Compilers do optimize switches, e.g. by using a binary tree of comparisons instead of a chain, and even switching to jump tables at certain sizes (which depend on the target architecture).

Overall, worrying about the switch statement is worrying about the wrong thing, unless you have proof in the form of profiling. Most likely, it's the functions you're calling that take up a lot of time.

That said, the key to optimization is to not do the same work faster, but to do less work instead. It appears like you are looping over the entire world, performing an update step for each cell every frame. Most likely, this can be avoided by rearranging the way your simulation works. For example, it is most likely that the vast majority of cells are actually idle most of the time, so the way to improve speed is to simply not loop over idle cells. If it turns out that only 1% of the cells are active at any typical point in time, then looping over only those active cells can improve the performance of your simulation by an order of magnitude or more.

On top of that, many types of cells probably work fine if you update them once every few seconds instead of once per frame. Those are just some ideas that could help significantly speed things up much more than any microoptimizations could ever do.

Share on other sites
When everyone says profiler, what are you referring to?

1. 1
2. 2
Rutin
21
3. 3
4. 4
frob
15
5. 5

• 9
• 13
• 9
• 33
• 13
• Forum Statistics

• Total Topics
632593
• Total Posts
3007282

×