Jump table optimization

Started by
5 comments, last by Russell 21 years, 2 months ago
I read that you can use a switch statement to generate a jump table and create branchless code. For example: If you had this...

for (player = FirstPlayer(); *player; player++) {
  if (player->type == 0)
    ShootAtHimBecauseHeIsAnEnemy();
  else if (player->type == 1)
    GoHelpHimBecauseHeIsAFriend();
}
 
We could replace it with...

for (player = FirstPlayer(); *player; player++) {
  switch (player->type) {
    case 0: goto team_1;
    case 1: goto team_2;
  }
  team_1:
  ShootAtHimBecauseHeIsAnEnemy();
  continue;
  team_2:
  GoHelpHimBecauseHeIsAFriend();
  continue;
}
 
First off, is this true that this is faster (switch statement), and if so, how much faster? Also, I am curious how much you could use this method before it stops being faster. For instance, if you have one jump table that jumps to another jump table that jumps to another jump table, etc...still faster than conditionals? If you could replace all of your conditionals in an inner loop with switch statements that translate into jump tables, it might be worth trying. It could get ugly after a while though. I''d like to get an idea of the implications from a performance standpoint, as well as from a design perspective (which I am guessing isn''t good). One other thing I wondered about when I saw this was that function pointers are generally going to be at least as slow as a conditional, since there is indirection there. How come this is fast, but using function pointers is not?
Advertisement
(I hope this is somewhat coherent - it''s bedtime )

Unless you have tons of players and types, selecting an action to perform will most likely not have a significant impact on performance.

That said, some things you could do:
1) sort players according to type, thereby hoisting the tests out of the loop (good if there are lots of players, few types)
2) sort your tests by probability (big gains if there are many players of a certain type)
3) use a jump table (iff you have lots of (preferably equally likely) types and profiling in normal operation shows it''s faster)

quote:I read that you can use a switch statement to generate a jump table and create branchless code.

Your 2nd example is filled with branches - 2..3 (depending on compiler cleverness) jumps per loop, one of which is impossible to predict.

quote:First off, is this true that this is faster (switch statement), and if so, how much faster?

Don''t ask about performance without answering the ''if'' statements above; profile / look at the asm code!

quote:Also, I am curious how much you could use this method before it stops being faster. For instance, if you have one jump table that jumps to another jump table that jumps to another jump table, etc...still faster than conditionals?

That''ll kill performance.

quote:I''d like to get an idea of the implications from a performance standpoint, as well as from a design perspective (which I am guessing isn''t good).

Using switch is fine. Emulating a jump table is silly (compilers may translate switch into cmp/jcc chains, a binary decision tree, or a jump table, on top of which you are adding another layer of indirection).

quote:One other thing I wondered about when I saw this was that function pointers are generally going to be at least as slow as a conditional, since there is indirection there. How come this is fast, but using function pointers is not?

Jump tables are generally slow for the same reason.

HTH
Jan

500x1
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3
switch is nice because it can filter out input that is outside the range of your case statements. So if you are checking each number between 1 and 40, as opposed to an if/else chain where each conditional has to be evaluated, switch will jump to the default case (vis it''s jump table algorithms) if the input is outside the range.

Sometimes a switch statement won''t compile into a jump table if the range of the case numbers is too large. For example:
switch(i){   case 11:      {         // Stuff      } break;   case 9:      {         // Stuff      } break;	case 4:   case 4:      {         // Stuff      } break;   case 15:      {         // Stuff      } break;   case 9999:      {         // Stuff      } break;   default:      break;} 

The compiler isn''t going to generate a jump table for this because it needs at least a byte for every integer between 4 and 9999 in addition to the addresses to jump to for each case... now that''s just a lot of useless "code" if the compiler were to use a jump table (as jump table addresses are compiled into the code segment) This would turn into a cmp/je chain as Jan suggested.

You have to consider a whole bunch of things when you think about whether or not to use a switch. In terms of memory, each case statement itself (including the default case if it executes some code) will need four CS bytes for a pointer, and each integer value in-between your high/low case values will need a CS byte as a table index. The switch statement also generates the code to check for out-of-range values, as well as find the index into the jump table, and finally jump to the stored address.

In terms of speed, switch has to perform additional instructions to work properly, as mentioned above. It has to use a few indirections (if byte ptr [edx] is considered an indirection in assembly) for the jump table.

You will be better off using an if/else chain if you only have a few cases. Once you get past 8, 9, or 10 cases, however, then it''s time to consider a switch statement.

You know what? You would be best off disassembling some code an taking a look yourself
quote:You know what? You would be best off disassembling some code an taking a look yourself

I''m not sure what good that would do. I know assembly enough that I could follow the basic instructions if needed, but I wouldn''t have known (for instance) that a jmp [eax] is considered an indirection. I would have looked at that and thought, "wow! one instruction! blazing fast!"

Thus, I am posting here
quote:but I wouldn''t have known (for instance) that a jmp [eax] is considered an indirection.


Hehe, I didn''t know it was an indirection until I tryed following the code without the indirection and it didn''t make sense

But looking at the switch assembly can give you a good idea on how jump tables work. You could even implement your own if you wanted to
learning techniques to optimize important areas of code, great, but remember the following:

rearanging the code to trick the compiler into generating certain output has many downsides (it won''t work the same on different compilers, it won''t work the same on different platforms, it erases many of the benifits of high level languages, specifically it can hurt code self documentation and maintainability, it may actually hurt the performance in your code, if you analyze it incorrectly or situations change).

my advice for optimizers is always, spend your time learning techniques for efficiently and acurately profiling your code (including using profiling tools and understanding their output and the big picture), THEN the types of optimizations you should perform will often become evident ... you won''t be asking, how many levels of jumps tables should i use, you''ll be asking - can I quit verifying the range of my data in this important inner loop, or ... maybe I shouldn''t use dynamic memory inside this condition or loop, since I know the maximum size isn''t that big ...

remember that getting the RIGHT behavior is way more important than the best performance ... and getting a job done, and to market on schedule will keep the contracts coming. AFTER the unit/module/project is functionally complete ... which is the perfect time for many reasons ... 1. you have already solved your problem, so it hopefully won''t be changing much anymore, 2. only once the section is feature complete do you have enough information to know if a jump table or other such low level optimization can actually be of any benifit, 3. the nearer to completion the project is, the more meaningful the profiling information.
quote:The compiler isn''t going to generate a jump table for this because it needs at least a byte for every integer between 4 and 9999 in addition to the addresses to jump to for each case... now that''s just a lot of useless "code" if the compiler were to use a jump table (as jump table addresses are compiled into the code segment) This would turn into a cmp/je chain as Jan suggested.

One huge value won''t prevent the compiler from using a jump table - it can just test for the 9999 case, and jump otherwise. BTW, jump table code usually looks like this:
> eax - value to be tested (0..n-1)(code)jmp [tbl+eax*4]		; indirect branch(data)tbl:			; jump targets (32 bit absolute addr)dd addr_0...dd addr_n-1 


quote:You have to consider a whole bunch of things when you think about whether or not to use a switch. In terms of memory, each case statement itself (including the default case if it executes some code) will need four CS bytes for a pointer, and each integer value in-between your high/low case values will need a CS byte as a table index. The switch statement also generates the code to check for out-of-range values, as well as find the index into the jump table, and finally jump to the stored address.

The compiler usually does the thinking VC7 tends to use cmp/jcc when there are only a few cases, a jump table if there are a bunch of cases (say > 8) in a tight range, and some moderately clever decision tree code otherwise.

quote:In terms of speed, switch has to perform additional instructions to work properly, as mentioned above. It has to use a few indirections (if byte ptr [edx] is considered an indirection in assembly) for the jump table.
You will be better off using an if/else chain if you only have a few cases. Once you get past 8, 9, or 10 cases, however, then it''s time to consider a switch statement.

I''d say compilers have an easier time optimizing switch statements (there is no overhead specific to switch).
I guess it only matters if a) one case occurs 95% of the time or b) or you need to test intervals => use if; c) lots of cases in a tight range => use switch (the compiler will likely make a jump table out of it). Otherwise, use what you like


quote:I know assembly enough that I could follow the basic instructions if needed, but I wouldn''t have known (for instance) that a jmp [eax] is considered an indirection. I would have looked at that and thought, "wow! one instruction! blazing fast!"

Jump addresses can be specified indirectly in memory or in a reg, as opposed to a direct address/displacement in the instruction. Problems with jump tables: the table might not be in cache (ouch), and the jump can''t be predicted (bummer on CPUs with a deep pipe).


> rearanging the code to trick the compiler into generating certain output has many downsides
I agree. Beyond a certain level, if you care about performance, write it in asm!
E8 17 00 42 CE DC D2 DC E4 EA C4 40 CA DA C2 D8 CC 40 CA D0 E8 40E0 CA CA 96 5B B0 16 50 D7 D4 02 B2 02 86 E2 CD 21 58 48 79 F2 C3

This topic is closed to new replies.

Advertisement