• Advertisement

Archived

This topic is now archived and is closed to further replies.

really big switch statements

This topic is 5713 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

I''m writing a console emulator. After fetching the opcode, there are 256 possible instructions to execute. The question is which is the fastest way to branch off to the instruction you need? Two choices I can think of, if there''s an even better way let me know. 1) one huge switch statement. case ''0x00'', case ''0x01'', ..., case ''0xFF''. This creates a huge function. Does the compiler see what you''re trying to do and create a "jump table" to quickly go to the case you''re after? If not, this amounts to just a big chain of if/elseif right? Seems pretty slow to me... 2) array of 256 function pointers, each pointing to a function that performs one instruction, and after fetching the opcode you call function opcode_handler[opcode](); . There are 256 small functions in your source file, each handling one instruction. There is no decision making, just indexing your function ptr array and calling them. But, this creates some overhead for every emulated opcode, especially since all the functions must return a value. Basically what''s the fastest way to branch off 256 possiblities?

Share this post


Link to post
Share on other sites
Advertisement
I think the compiler will build a jump table, if the switch values you use allow that (i.e. are nearly contiguous and small enough). Modern compilers are pretty good in local optimizations. Btw, you can find that out, by having a look at the code your compiler generates. Build in release mode (optimizations on), and insert a hard breakpoint just before the switch().... Then have a look to the assembly window.

Forever trusting who we are
And nothing else matters
- Metallica

Share this post


Link to post
Share on other sites
You could write the jump table stuff yourself in ASM...

Superpig
- saving pigs from untimely fates
- sleeps in a ham-mock at www.thebinaryrefinery.cjb.net

Share this post


Link to post
Share on other sites
I dont belive there would be much of a difference between the 2 approches. Depending on the compiler, the case statment might be faster. However you might want to keep in mind how easy it is to add and change opcodes.

The array of function pointers is the most flexiable approch.
The big thing for it is you dont have to change a huge case statment every time you need to change a single opcode.

The big advantage is opcode mapping. This is were you map various opcodes to different things based on some settings.

Any opcode you dont support, can be mapped to the invalid opcode function.

Share this post


Link to post
Share on other sites
There won''t be a real difference (unless you''re using a strange compiler), since both are just jump tables. The only faster approach I''ve seen was in an optimized life simulation, which took the option to be executed, multiplied it (using a shift) and added a base offset. Essentially there was an ''array'' of code sections. All the sections where aligned using filler to occupy the same power of 2 space, and stored continueously in memory.


jump (Option shr 2)+16

16://Code for option 0, ending with jump to END
//Filler ensures that the next section will be 16 bytes after the start of this one
32://Code for option 1, ending with jump to END
48;//Code for option 2, ending with jump to END

END

Share this post


Link to post
Share on other sites
The JIT-compiler for Rotor[1] is implemented as a switch(fjitcompiler.cpp):


  
switch (opcode)
{

case CEE_PREFIX1:
GET(opcode_val, unsigned char, false);
opcode = OPCODE(opcode_val + 256);
goto DECODE_OPCODE;

case CEE_LDARG_0:
case CEE_LDARG_1:
case CEE_LDARG_2:
case CEE_LDARG_3:
offset = (opcode - CEE_LDARG_0);
// Make sure that the offset is legal (with respect to the IL encoding)

VERIFICATION_CHECK(offset < 4);
goto DO_LDARG;

case CEE_LDARG_S:
GET(offset, unsigned char, false);
goto DO_LDARG;

case CEE_LDARG:
GET(offset, unsigned short, false);
DO_LDARG:
// Make sure that the offset is legal (with respect to the number of arguments)

VERIFICATION_CHECK(offset < maxArgs);
varInfo = &argsMap[offset];
if (methodInfo->args.isVarArg() && !varInfo->isReg) {
emit_VARARG_LDARGA(varInfo->offset, offsetVarArgToken);
trackedType = varInfo->type;

This goes on for several thousand lines, so I wont post everything

[1]http://msdn.microsoft.com/downloads/default.asp?URL=/downloads/sample.asp?url=/msdn-files/027/001/901/msdncompositedoc.xml


Share this post


Link to post
Share on other sites
The OO approach : one base class which contains a function DoInstruction (which is virtual) ,now derive for every opcode a class which implements DoInstruction for the specific opcode. Now you keep an array of pointer''s to the base class and you instantiate them with the right class (off course you sort them by opcode). Executing the opcode''s now becomes simple just call

array[opcode]->DoInstruction();

It''s about the same approach as the array of function pointer''s.
It''s faster than the big switch (if you wanna execute the last opcode you have to go through the switch, here you can do it immediatly).

Share this post


Link to post
Share on other sites
quote:
Original post by George2
It''s faster than the big switch (if you wanna execute the last opcode you have to go through the switch, here you can do it immediatly).


If you have a decent compiler the switch statement gets turned into a jump table, so its not any slower than your vtable, and you do not do 255 compares to get to the last op code. This is the point of having the switch statement in the language in the first place, otherwise you could just use if else if or have a much more flexible switch like basic''s select case.

Share this post


Link to post
Share on other sites
how exactly does the switch jump table work?
how would it do this:

  
switch(x)
{
case 0:break;
case 1:break;
case 5:break;
case 20:break;
case 34: break;
case 231:break;
default:break;
}


how can it make a jump table from such funny values? is it actually a series of jump tables mixed with normal ifs and such?

Share this post


Link to post
Share on other sites
quote:
Original post by disjunction
how can it make a jump table from such funny values? is it actually a series of jump tables mixed with normal ifs and such?


It depends on the compiler. You should look at the asm generated by your particular compiler if you are interested. I compiled your switch statement with some assignments inside (as opposed to just empty) and VC7 does indeed table the switch statement, although it does it by making the table have 232 entries, which is a huge waste of memory, at the gain of some speed.

Anyways, write an if statement when you want compare behavior. Write a switch statement when you want a jump table.

Share this post


Link to post
Share on other sites
I know VC++ 6 with the optimizations turned on makes very good jump tables. If the numbers are wierd it creates 2 tables, one 32-bit table for addresses and one 8-bit table for indexes to the address. Compilers are very good because they can make very optimized code rather than ASM that has to be human changeable

Creating a list of functions is no good, it is way more complicated to pass arguments and access variables and much slower than one big function

You already know a lot about assembly, so just look at the ASM listings to see if the code is any good

Share this post


Link to post
Share on other sites
This is a fairly complex problem - which method is faster depends on a couple of different things.

If the code to handle the op code is short, inline it and put it in a switch. If the code is long, you want to make a function.

Only profiling will tell you for certain. The switch statement should be faster if the code is inlined into the switch, because no function call takes place then. However, as the code in the switch becomes larger it can cause more cache misses, and if it swells to over 4k it could cause an extra page fault (lethal).

Magmai Kai Holmlor

"Oh, like you''ve never written buggy code" - Lee

[Look for information | GDNet Start Here | GDNet Search Tool | GDNet FAQ | MSDN RTF[L] | SGI STL Docs | STFW | Asking Smart Questions ]

[Free C++ Libraries | Boost | ACE | Loki | MTL | Blitz++ | wxWindows]

Share this post


Link to post
Share on other sites
Preliminary tests... in Release mode

big switch statement, emulator runs at ~31MHz
array of func ptrs, emulator runs at ~28MHz

I think the speed gain of the "jump table" is mostly negated by the fact that VC++ has over 20 instructions between the switch (opcode) and the first case line. Seems it isn't too good at accessing its own jump table

Not much difference in the two so far. Kind of shocking that my emulator, in its infancy stage, runs over 30 times slower than the hardware it emulates (I have a 1000 MHz t-bird)

[edited by - foofightr on June 1, 2002 12:50:54 PM]

Share this post


Link to post
Share on other sites
How about std::map?

I assume you''re getting you OPCODES from some sort of commands typed into the console. A std::map has a hash table built into it and does lookups very fast.

Share this post


Link to post
Share on other sites
No, I meant console emulator as in gaming console emulator. You know, video game consoles like Nintendo. You get the opcodes from the ROM file.

Magmai, the source file for the CPU code is 76kb, the majority which is the opcodes, so that would be quite a bit bigger than 4k even when optimized for space

Share this post


Link to post
Share on other sites

  • Advertisement