# A Wild switch() Appears!


## Recommended Posts

@Nypyren:

When I wrote an x86 disassembler, and later a userland-only interpreter/emulator, I used listing tables (essentially the same tables as my assembler). The listings were held in a text file using a notation derived from the Intel docs; during the build process a tool converted them into C source files and matching headers, so parts of the assembler and disassembler/emulator were procedurally generated.

For the interpreter, to speed up decoding, I used a set of index tables (IIRC, 2x 256): one table matched single-byte instructions, fall-back tables matched 2-byte forms, and anything that couldn't be matched in 2 bytes went through a table search over a list of possible matching instructions. This made the code a lot less hairy, though lots of big switches could potentially have been faster.
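The two-level table lookup described here can be sketched roughly like this (a hypothetical C sketch; the table layout, names, and fallback handling are assumptions, not the poster's actual code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One 256-entry table for single-byte opcodes, and a fallback table for
   two-byte (0x0F-escaped) forms; anything longer would go to a list search. */
typedef struct { const char *mnemonic; int valid; } OpInfo;

static OpInfo one_byte[256];
static OpInfo two_byte[256];

static void init_tables(void)
{
    one_byte[0x90] = (OpInfo){"nop", 1};
    one_byte[0xC3] = (OpInfo){"ret", 1};
    two_byte[0xA2] = (OpInfo){"cpuid", 1};   /* encoded as 0F A2 */
}

/* Returns the matched mnemonic and advances *pos, or NULL if unmatched. */
static const char *decode(const unsigned char *code, size_t *pos)
{
    unsigned char b = code[*pos];
    if (b == 0x0F) {                          /* escape byte: two-byte form */
        OpInfo *op = &two_byte[code[*pos + 1]];
        if (op->valid) { *pos += 2; return op->mnemonic; }
        return NULL;                          /* real code: fall back to list search */
    }
    if (one_byte[b].valid) { *pos += 1; return one_byte[b].mnemonic; }
    return NULL;
}
```

The appeal over a switch is that the tables can be generated from the same listing file as the assembler, keeping decoder and encoder in sync.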

Granted, to make interpretation faster, I ended up effectively implementing a CPU cache which converted "traces" of x86 instructions into threaded code (sequences of calls via function pointers). This was later reused by my script-language interpreter (and my script VM's JIT uses a hybrid strategy, mixing direct instruction sequences with calls back into C land, which is part of the reason for its weak performance).
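A minimal sketch of the threaded-code idea: a decoded trace becomes an array of function pointers that a tight loop calls in order, so the trace never needs re-decoding. The context struct and operations here are invented for illustration:

```c
#include <assert.h>

/* Hypothetical VM context and operation type; real traces would carry
   decoded operands alongside each function pointer. */
typedef struct Ctx { int acc; int running; } Ctx;
typedef void (*Op)(Ctx *);

static void op_inc(Ctx *c)  { c->acc += 1; }
static void op_dbl(Ctx *c)  { c->acc *= 2; }
static void op_halt(Ctx *c) { c->running = 0; }

/* The dispatch loop: no decoding, just indirect calls in sequence. */
static void run_trace(Ctx *c, Op *trace)
{
    c->running = 1;
    for (int i = 0; c->running; i++)
        trace[i](c);
}
```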

I'll refrain from going too much into the nature of the issues with trying to make my script VM faster (summary: most of the time goes into the C->script calls anyway, and there is no good way to optimize this short of making the C-side VM interface behave more like COM+ or similar, *, ...).

Granted, decoding instructions for my script language does itself use some giant switch() blocks (on a quick check, the "main one" is about 2000 lines, or about 4x longer than the one from BTIC1C). It mostly just unpacks the instruction into a structure and picks the correct function pointer to execute it. However, I consider it cleaner and less "wild", mostly because it is better organized and is a single flat-list switch, rather than an unwieldy mess of nested switches implementing a small state machine.

Most of the actual execution logic on the C side of the VM then consists of short functions, often doing a single operation (such as adding two integers and storing the result).
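The decode-switch-plus-short-handlers pattern described in the last two paragraphs might look roughly like this (a C sketch with invented opcodes and struct layout, not the actual VM code):

```c
#include <assert.h>

/* Decoded instruction: operand fields plus a handler pointer. */
typedef struct Insn {
    int dst, a, b;
    void (*exec)(struct Insn *, int *regs);
} Insn;

/* Handlers each do a single operation, as described above. */
static void ex_add(Insn *in, int *regs) { regs[in->dst] = regs[in->a] + regs[in->b]; }
static void ex_sub(Insn *in, int *regs) { regs[in->dst] = regs[in->a] - regs[in->b]; }

enum { OP_ADD = 1, OP_SUB = 2 };

/* The "giant switch" lives only in the decoder, not in the execute loop.
   Returns the encoded length, or -1 on an unknown opcode. */
static int decode_insn(const unsigned char *p, Insn *in)
{
    in->dst = p[1]; in->a = p[2]; in->b = p[3];
    switch (p[0]) {
    case OP_ADD: in->exec = ex_add; return 4;
    case OP_SUB: in->exec = ex_sub; return 4;
    default:     return -1;
    }
}
```

Keeping the switch flat and purely concerned with unpacking is what makes it tolerable at 2000 lines: each case is one self-contained mapping, not a nested state machine.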

*: basically, the ability to get C-side interfaces for various script-language classes and then make more direct calls into the compiled script code, rather than calling via variable-argument C functions which waste clock cycles doing things like copying function arguments from a va_list based on a signature string. As-is, this overhead actually outweighs the cost of the naive JIT output, as most of the script functions/methods are fairly short.


##### Share on other sites
My previous disassembler worked like that - a large text file that I would parse into a data structure used for disassembly. Unfortunately, my existing data structure wasn't well suited to dealing with SSE/VEX prefixes and was very difficult to debug, so when I updated my disassembler to support SSE/AVX instructions, I decided to change to hardcoded switch statements. It's much easier to debug overall, and I can use more clever rules:

```csharp
return ValidSV13(Opcode.VPCMPEQB, Vdq, Hdq, Wdq);
```

"ValidSV13" is a special way of dealing with SSE vs. AVX instructions that are almost identical. Most AVX instructions differ from their SSE counterparts only by having one extra operand (a nondestructive output register). However, there are a few different patterns:

(excuse my naming convention - I often called AVX "VEX" in my code instead due to dealing with the VEX prefixes too often)

```csharp
private bool Valid(Opcode opcode, params Func<Operand>[] operandDecoders)
{
    AccountForSIMD66(opcode); // If prefix 66 is present but the opcode is SIMD, revert the operandSize to default!

    Operand[] operands = new Operand[operandDecoders.Length];
    for (int i = 0; i < operandDecoders.Length; ++i)
        operands[i] = operandDecoders[i]();

    var machineCode = stream.ToArray().Take((int)stream.Position).ToArray();

    Instruction = new Instruction(addr, machineCode, opcode, operands);

    return true;
}

private void AccountForSIMD66(Opcode opcode)
{
    if (g3 == 0x66 && simd66Opcodes.Contains(opcode))
    {
        g3 = 0;
        ApplyOperandSizePrefixes();
    }
}

/// <summary>
/// Valid only in non-64-bit modes.
/// </summary>
private bool Valid_i64(Opcode opcode, params Func<Operand>[] operandDecoders)
{
    if (x64)
        return false;

    return Valid(opcode, operandDecoders);
}

/// <summary>
/// Valid only in 64-bit mode.
/// </summary>
private bool Valid_o64(Opcode opcode, params Func<Operand>[] operandDecoders)
{
    if (!x64)
        return false;

    return Valid(opcode, operandDecoders);
}

/// <summary>
/// Valid; in 64-bit mode the 32-bit operand size is forced to 64 bits.
/// </summary>
private bool Valid_d64(Opcode opcode, params Func<Operand>[] operandDecoders)
{
    if (x64 && operandSize == 32)
        operandSize = 64;

    return Valid(opcode, operandDecoders);
}

/// <summary>
/// Valid; in 64-bit mode all operand sizes are forced to 64 bits.
/// </summary>
private bool Valid_f64(Opcode opcode, params Func<Operand>[] operandDecoders)
{
    if (x64)
        operandSize = 64;

    return Valid(opcode, operandDecoders);
}

// Naming convention for the following functions:
// SV###.  Each digit indicates the operand that the SSE version takes.
// The VEX version takes all arguments.

// For example, SV13 means:
// SV = SSE/VEX, auto-mapping the opcode from VEX to SSE.
// 1 = argument 1 is the first SSE operand.
// 3 = argument 3 is the second SSE operand.

// Functions in which both versions take all the same arguments are provided for consistency.

/// <summary>
/// SSE or VEX, both taking the same two operands.
/// Auto-maps the VEX opcode to SSE when SSE is used.
/// </summary>
private bool ValidSV12(Opcode vex, Func<Operand> v1, Func<Operand> v2)
{
    return vexPresent ? Valid(vex, v1, v2)
                      : Valid(VEXtoSSE[vex], v1, v2);
}

/// <summary>
/// SSE or VEX, both taking the same three operands.
/// Auto-maps the VEX opcode to SSE when SSE is used.
/// </summary>
private bool ValidSV123(Opcode vex, Func<Operand> v1, Func<Operand> v2, Func<Operand> v3)
{
    return vexPresent ? Valid(vex, v1, v2, v3)
                      : Valid(VEXtoSSE[vex], v1, v2, v3);
}

/// <summary>
/// SSE with operands 1 and 3, or VEX with all three.
/// Auto-maps the VEX opcode to SSE when SSE is used.
/// </summary>
private bool ValidSV13(Opcode vex, Func<Operand> v1, Func<Operand> v2, Func<Operand> v3)
{
    return vexPresent ? Valid(vex, v1, v2, v3)
                      : Valid(VEXtoSSE[vex], v1, v3);
}

/// <summary>
/// SSE with operands 2 and 3, or VEX with all three.
/// Auto-maps the VEX opcode to SSE when SSE is used.
/// </summary>
private bool ValidSV23(Opcode vex, Func<Operand> v1, Func<Operand> v2, Func<Operand> v3)
{
    return vexPresent ? Valid(vex, v1, v2, v3)
                      : Valid(VEXtoSSE[vex], v2, v3);
}

/// <summary>
/// SSE with operands 1, 3 and 4, or VEX with all four.
/// Auto-maps the VEX opcode to SSE when SSE is used.
/// </summary>
private bool ValidSV134(Opcode vex, Func<Operand> v1, Func<Operand> v2, Func<Operand> v3, Func<Operand> v4)
{
    return vexPresent ? Valid(vex, v1, v2, v3, v4)
                      : Valid(VEXtoSSE[vex], v1, v3, v4);
}
```


##### Share on other sites

Wow, that looks very much like my first attempt at writing a C64 emulator.

##### Share on other sites

@Nypyren:

I still use the listing-based strategy for the SSE, XOP, and AVX stuff, but at one point had to tweak things to allow opcodes with more than 3 operands (IIRC, I increased the limit to 7).

I also had to extend the notation somewhat to represent the rules for the added prefixes (there was a little hair WRT these prefixes, and neither Intel nor AMD stuck with a straightforward plain-ASCII notation, so I devised another one). The listing notation also includes special characters marking where the various prefixes go.

My interpreter didn't use SSE, FWIW, but was limited to a more 486-like subset, namely basic integer operations and the x87 FPU (emulated mostly via 64-bit integer math).

FWIW: at this point, this is the only really free/unencumbered form of x86, as supporting MMX or SSE or similar risks stepping on patents (so, for example, a CPU vendor could freely make a chip using a 486-like version of the ISA, but would have to license things from Intel and AMD if they wanted SSE and x86-64).

I think I had planned on adding emulated MMX and SSE, but never got to it, and the interpreter turned out to be nowhere near as useful as originally imagined. At the time I figured x86 could be used as an IL, but then I came to realize: x86 doesn't really make for a good IL.

##### Share on other sites

> ...x86 doesn't really make for a good IL.

I wholeheartedly agree. The x86 encoding's main redeeming feature over common RISC encodings is that the variable encoding length allows much more space-efficient code (MOVs are typically 2 bytes, immediate operands don't require two separate hi/lo instructions, and common stuff like push/pop is encoded in 1 byte). But by every other measure it's WAY too complicated.
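For concreteness, here are the standard 32-bit x86 encodings behind those size claims (these byte sequences are the real encodings; only the array names are invented):

```c
#include <assert.h>

/* Exact machine code for three common 32-bit x86 instructions,
   illustrating the variable-length, compact encoding. */
static const unsigned char push_eax[]    = { 0x50 };                         /* push eax: 1 byte */
static const unsigned char mov_eax_ebx[] = { 0x89, 0xD8 };                   /* mov eax, ebx: 2 bytes */
static const unsigned char mov_eax_42[]  = { 0xB8, 0x2A, 0x00, 0x00, 0x00 }; /* mov eax, 42: imm32 inline, 5 bytes */
```

Compare a typical fixed-width RISC, where each of these would be 4 bytes, and a 32-bit immediate load would usually need two instructions.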

My decompiler uses an IL with 15 instructions (5 of which are for the SCAS/MOVS/etc. instructions, since those are almost always part of an inlined strlen/strcpy/memcpy and are much easier to detect if I don't convert them to full loops in IL).

##### Share on other sites

> ...x86 doesn't really make for a good IL.
>
> I wholeheartedly agree. The x86 encoding's main redeeming feature over common RISC encodings is that the flexible encoding length allows for much more space-efficient code (doing MOVs are typically 2 bytes, immediate operands don't require two separate hi/lo instructions, common stuff like push/pop are encoded in 1 byte). But for all other considerations it's WAY too complicated.
>
> My decompiler uses an IL with 15 instructions (5 of which are for the SCAS/MOVS/etc instructions since those are almost always part of a strlen/strcpy/memcpy function that got inlined and are much easier to detect if I don't convert them to full loops in IL).

The main problem I had with trying to use x86 as an IL (for an interpreter) was that too many architecture details are tied up in how the instructions work, so it is pretty much impossible to gloss over things like the pointer size or whether registers are naturally 32 or 64 bits while still keeping it binary-compatible with native x86.

The result ends up looking like a sub-program operating within a main program, with contortions needed to get data or calls into or out of the interpreter, which wasn't really worthwhile. It could work, maybe, but it compared poorly with just using a script language compiled to a bytecode (and leaving most other things to native code).

A bytecode, meanwhile, can be designed to gloss over a lot of architectural details (like how big the native pointer or integer is, how exactly various values are represented, ...).

Most of my bytecode formats have been stack-based (for example, the current script-VM bytecode is a statically-typed stack machine). It has a lot of instructions, but is in some ways "rather CISC".
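A tiny sketch of what a stack-based bytecode interpreter like this looks like (the opcode set is invented; the real VM is statically typed and far larger). The bytecode only speaks in abstract values, which is how it hides details like pointer width:

```c
#include <assert.h>

/* Hypothetical minimal opcode set. */
enum { B_PUSH, B_ADD, B_MUL, B_RET };

/* Execute bytecode until B_RET, returning the top of the stack. */
static int run_stack_vm(const int *code)
{
    int stack[64], sp = 0;
    for (int pc = 0; ; ) {
        switch (code[pc++]) {
        case B_PUSH: stack[sp++] = code[pc++];          break;
        case B_ADD:  sp--; stack[sp - 1] += stack[sp];  break;
        case B_MUL:  sp--; stack[sp - 1] *= stack[sp];  break;
        case B_RET:  return stack[--sp];
        }
    }
}
```

For example, `(2 + 3) * 4` compiles to `B_PUSH 2, B_PUSH 3, B_ADD, B_PUSH 4, B_MUL, B_RET`.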

I have a few times considered moving over to 3AC (*), but the level of work involved (it would require re-implementing much of the VM core) has thus far prevented me from doing so.

Some early tests (with a small prototype interpreter loop and hand-crafted instruction sequences) showed rather promising performance: I was getting within 9x of C with just an interpreter. My estimate is that a JIT for it could likely produce C-competitive performance.

Besides the effort involved, which has killed off most attempts to make the move, there has been some conflict over the best way to get from the ASTs to 3AC: whether to first compile to the existing bytecode and then convert that to 3AC through an optimizer, or to write a new upper end and compile directly from the ASTs to 3AC.

*: sort of like a RISC, but with variable-length instructions.

Most of the 3AC bytecode wouldn't be all that notable; the following gives the basic idea. The logical representation of the instructions is an ASM-like format (*2), but the actual instructions generally consist of a prefix+opcode pair followed by any operands (the prefix encodes the value type and operand layout, with the opcode saying which logical operation to perform).

*2:

```
opcode.type dest, srca, srcb
opcode.type dest, src, immed
...
```
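A rough sketch of decoding the prefix+opcode form described above (the field widths, nibble packing, and names are assumptions for illustration, not the actual VM's encoding):

```c
#include <assert.h>

/* Decoded 3AC instruction: the prefix supplies type and layout,
   the opcode the logical operation. */
typedef struct {
    unsigned type;        /* from prefix: e.g. int/long/float */
    unsigned layout;      /* from prefix: dest,srca,srcb vs dest,src,immed */
    unsigned opcode;      /* which logical operation to perform */
    unsigned char ops[3]; /* operand fields */
} Insn3AC;

enum { LAYOUT_RRR = 0, LAYOUT_RRI = 1 };

/* Decode one instruction; returns its encoded length.
   A real VM would make the length vary with the layout. */
static int decode_3ac(const unsigned char *p, Insn3AC *in)
{
    in->type   = p[0] >> 4;     /* high nibble of prefix: value type */
    in->layout = p[0] & 0x0F;   /* low nibble: operand layout */
    in->opcode = p[1];
    in->ops[0] = p[2];
    in->ops[1] = p[3];
    in->ops[2] = p[4];
    return 5;
}
```

Splitting type/layout into a prefix keeps the opcode space flat, much like the "single flat-list switch" argument earlier in the thread.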

##### Share on other sites

OP

Can't help much - irregular logic code is just irregular...

Possibly use a C macro for the stuffing of ctb[0]..[3] to reduce it to one line, and another for ctb[4]..[7].

##### Share on other sites

> OP
>
> cant help much - irregular logic code is just irregular...
>
> Possibly use a C macro for the stuffing of the ctb[0]..[3] to reduce it to one line and another for ctb[4]..[7]

Yeah, it is possible...

A lot of this code is written fairly quickly, and so tends to develop repeating patterns and some verbosity.

Some things are left expanded out in case I see a way to make them faster; putting things in macros can sometimes hide potential optimizations, or make issues for looking at and stepping through stuff in the debugger.

Though, for example, "*_memcpy8()" actually is a macro in this case (it chooses a target-specific mechanism to copy 8 bytes).
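A minimal sketch of what such a macro might look like (the name is hypothetical; the real macro presumably selects among several target-specific variants):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical "*_memcpy8()"-style macro: copy exactly 8 bytes.
   A fixed-size memcpy is lowered by modern compilers to a single
   8-byte load/store pair on 64-bit targets, so the portable form
   is usually as fast as a hand-rolled pointer-cast version, without
   the alignment and strict-aliasing hazards. */
#define my_memcpy8(d, s) memcpy((d), (s), 8)
```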
