Inline assembly

16 comments, last by cr88192 11 years, 1 month ago
I use VirtualProtect and PAGE_EXECUTE to dynamically generate re-entrancy stubs in the Epoch runtime. Basically it allows you to hand off Epoch code as a callback to a C library and get it invoked correctly, using a generated machine-code bridge function and a little bit of marshaling magic.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

LLVM's JIT engine is pretty much the definitive kick-ass explosion-laden version of this concept :-)

yep, I didn't know about LLVM when I wrote mine, and when I encountered it, I didn't want to go through the hassle of converting all my stuff over to work with it.


I was originally exposed to how JITs and assemblers work by looking at the Quake3 JIT, and also the NASM source. I ended up designing my assembler as something of a compromise.

it is internally string-driven: it uses strings consisting of a mix of commands and hex digits (giving literal byte values), in a notation derived from the one used in the Intel docs (just with other letters to indicate various prefixes, ...).

(~75% of the work was transcribing the contents of the Intel docs into a big assembler listing.) (this notation was later hackishly extended to support ARM and Thumb, which involved transcribing their block notation into the listing's notation.)

basically, the mnemonic is used to look up the first instruction form, and the assembler walks along the forms until it finds one whose arguments match those parsed from the ASM code, then passes this off to a function that converts the ASM listing string and arguments into the output opcode.
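A minimal sketch of that lookup-and-walk, assuming a hypothetical table in which all forms of a mnemonic are stored contiguously (as in the listing further down) and the parser has already classified each argument. Only a few 32-bit forms are included, and `insn_form`, `find_form` and the operand enums are illustrative names, not the assembler's real ones.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Operand kinds as written in the listing (only a few shown). */
typedef enum { OP_NONE, OP_R32, OP_RM32, OP_I8, OP_I32 } op_kind;

/* What the (hypothetical) parser found in the ASM source. */
typedef enum { ARG_NONE, ARG_REG32, ARG_MEM32, ARG_IMM } arg_kind;
typedef struct { arg_kind kind; int64_t value; } arg;

typedef struct {
    const char *mnemonic;
    const char *encoding;  /* listing-style encoding string */
    op_kind ops[2];
} insn_form;

/* All forms of one mnemonic sit next to each other, in preference order. */
static const insn_form forms[] = {
    { "add", "TX83/0,ib", { OP_RM32, OP_I8 } },
    { "add", "TX81/0,id", { OP_RM32, OP_I32 } },
    { "add", "TX03/r",    { OP_R32,  OP_RM32 } },
    { "inc", "TXFF/0",    { OP_RM32, OP_NONE } },
};

/* Does a parsed argument satisfy an operand kind from the listing? */
static int fits(op_kind want, const arg *a) {
    switch (want) {
    case OP_NONE: return a->kind == ARG_NONE;
    case OP_R32:  return a->kind == ARG_REG32;
    case OP_RM32: return a->kind == ARG_REG32 || a->kind == ARG_MEM32;
    case OP_I8:   return a->kind == ARG_IMM && a->value >= INT8_MIN && a->value <= INT8_MAX;
    case OP_I32:  return a->kind == ARG_IMM && a->value >= INT32_MIN && a->value <= INT32_MAX;
    }
    return 0;
}

const insn_form *find_form(const char *mnemonic, const arg args[2]) {
    size_t n = sizeof forms / sizeof forms[0], i = 0;
    /* Find the first form for this mnemonic (a bigger table would want a hash here). */
    while (i < n && strcmp(forms[i].mnemonic, mnemonic) != 0) i++;
    /* Walk along until a form's operands match the parsed arguments. */
    for (; i < n && strcmp(forms[i].mnemonic, mnemonic) == 0; i++)
        if (fits(forms[i].ops[0], &args[0]) && fits(forms[i].ops[1], &args[1]))
            return &forms[i];
    return NULL;
}
```

Because the `,ib` form is listed before the `,id` one, `add eax, 5` resolves to "TX83/0,ib" while an add of 100000 to a memory operand falls through to "TX81/0,id"; the table order is what makes the shorter encoding win.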

initially, an early JIT of mine used it at a much lower level, generally calling the assembler's internal functions directly, but I later switched mostly over to a textual ASM interface, as this was much nicer to work with. I also switched from spitting machine code directly into a buffer on each command to buffering code into sections in an object context (and subsequently linking the produced object).

I have generally found it to be "fast enough". the cost of parsing the ASM code doesn't add all that much to the total cost of assembling it. much more relevant are things like whether or not the preprocessor is enabled, and whether or not it is using multi-pass jump-length optimization, ...
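A minimal sketch of what multi-pass jump-length optimization usually means: start every jump in the short `EB rel8` form, lay out the code, widen any jump whose displacement does not fit into the 5-byte `E9 rel32` form, and repeat until nothing changes. The `item` representation is hypothetical, and this is the generic technique rather than necessarily how this particular assembler does it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical item list: plain code of a fixed size, or a jmp to the start of another item. */
typedef struct {
    bool is_jump;
    int size;       /* current size in bytes (fixed for plain code) */
    size_t target;  /* for jumps: index of the item jumped to */
} item;

#define SHORT_JMP 2 /* EB rel8 */
#define NEAR_JMP  5 /* E9 rel32 */

/* Returns the number of passes taken. offsets must hold n + 1 entries. */
int relax_jumps(item *items, size_t n, long *offsets) {
    for (size_t i = 0; i < n; i++)
        if (items[i].is_jump) items[i].size = SHORT_JMP; /* optimistic start */
    int passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        passes++;
        /* Lay out the code with the current size guesses. */
        offsets[0] = 0;
        for (size_t i = 0; i < n; i++) offsets[i + 1] = offsets[i] + items[i].size;
        /* Widen any short jump whose displacement no longer fits in rel8. */
        for (size_t i = 0; i < n; i++) {
            if (!items[i].is_jump || items[i].size == NEAR_JMP) continue;
            /* rel8 is relative to the end of the jump instruction */
            long disp = offsets[items[i].target] - offsets[i + 1];
            if (disp < -128 || disp > 127) {
                items[i].size = NEAR_JMP;
                changed = true;
            }
        }
    }
    return passes;
}
```

Widening one jump can push another short jump out of range, which is why the loop repeats; since jumps only ever grow, it terminates after at most one pass per jump plus one.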


my higher-level JITs spit out textual ASM code for the target.

I have on and off had ideas for a medium-level "portable lower-level intermediate language" (as an intermediate between a language-specific back-end and the native target-specific ASM), but thus far not much has been done. a recent partial effort (~ last year) would have been loosely inspired by the Dalvik IR (and was being designed along with a reasonably fast interpreter), but I ended up mostly sticking with my existing script-language back-end (its JIT is naive, but "generally good enough").


part of the challenge for these low-level IRs is effectively glossing over the various C ABIs in a "sensible" (reasonably efficient) way. this is partly because IMHO the SysV/AMD64 ABI is "filled with evil" in some areas. I mostly use a subset which stays compatible as long as one isn't passing complex structs by value across language boundaries, with a simplification: structs are either passed in a single register (if they fit), or via a reference/pointer, without any sort of structure decomposition.
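A minimal sketch of that simplified struct rule, assuming 64-bit general-purpose registers; `classify_struct` and `pack_small_struct` are hypothetical helpers, not part of any real ABI library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REG_BYTES 8 /* assumption: 64-bit general-purpose registers */

typedef enum { PASS_IN_REGISTER, PASS_BY_REFERENCE } pass_mode;

/* The simplified rule: no SysV-style splitting of the struct into
   INTEGER/SSE eightbytes, just "fits in one register or not". */
pass_mode classify_struct(size_t size) {
    return size <= REG_BYTES ? PASS_IN_REGISTER : PASS_BY_REFERENCE;
}

/* Copies a small struct's bytes into the value that would be loaded
   into the argument register. Returns 0 for structs that do not fit. */
uint64_t pack_small_struct(const void *s, size_t size) {
    uint64_t r = 0;
    if (size > sizeof r) return 0; /* should have been passed by reference */
    memcpy(&r, s, size);
    return r;
}
```

The price of skipping decomposition is interoperability: the real SysV ABI passes a `struct { double x, y; }` by value in two SSE registers, whereas this rule passes it by pointer, so such calls only line up when both sides use the simplified convention.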

mhagain, on 15 Mar 2013 - 13:26, said:
I've never personally tested this, but http://msdn.microsoft.com/en-us/library/windows/desktop/aa366553(v=vs.85).aspx indicates that VirtualAlloc or VirtualProtect with the appropriate PAGE_EXECUTE flag should allow it.

It's still horrible though. Even if there are sometimes reasons why you may want to do it, it remains horrible (just imagine trying to debug it!)

Yeah. I was expecting to have to use those functions when I was goofing with it. The VS 2008 debugger had no problem jumping right into the code, though (in the disassembly view, of course). It is pretty horrible though.
void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.


IIRC, DEP is not enabled by default for user code with 32-bit apps, so you can generally get around needing it.
IMO it is still a good idea though.

things are a little stricter on Linux though, and for daemons it is generally required to essentially "double map" the executable memory (separate RX and RW mappings, vs a single RWX memory region). so far I haven't bothered, as at least for user apps, the distros I have been using don't really seem to care that much.
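A minimal Linux sketch of the "double map" approach, assuming glibc 2.27 or later for `memfd_create`: the same anonymous file is mapped twice, once RW for the JIT to write into and once RX for execution, so no single mapping is ever writable and executable at the same time. `dual_map` and its functions are illustrative names.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Two views of the same memory: one writable, one executable. */
typedef struct {
    unsigned char *rw; /* the JIT writes code here */
    unsigned char *rx; /* callers execute code from here */
    size_t size;
} dual_map;

/* Returns 0 on success, -1 on failure. Linux-specific (memfd_create). */
int dual_map_create(dual_map *m, size_t size) {
    int fd = memfd_create("jit-code", MFD_CLOEXEC);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return -1;
    }
    void *rw = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *rx = mmap(NULL, size, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0);
    close(fd); /* the mappings keep the memory alive */
    if (rw == MAP_FAILED || rx == MAP_FAILED) {
        if (rw != MAP_FAILED) munmap(rw, size);
        if (rx != MAP_FAILED) munmap(rx, size);
        return -1;
    }
    m->rw = rw;
    m->rx = rx;
    m->size = size;
    return 0;
}

void dual_map_destroy(dual_map *m) {
    munmap(m->rw, m->size);
    munmap(m->rx, m->size);
}
```

Code would be emitted through `m.rw` and called through `m.rx`. On x86 nothing further is needed after writing; on ARM the instruction cache would also need flushing, e.g. with GCC's `__builtin___clear_cache` on the RX range.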

Is RX even used? I remember that in protected mode you could set a selector to be unreadable and unwritable but executable (meaning you can get the CPU to run code off it, but you can't get its instructions to read memory from that area). I'd imagine this is still possible in long mode.

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

it is basically the same as before, just with the addition of an NX/XD bit to disable executing code.

RX is basically useful because it lets you put literal values alongside the code.
this can be useful for things like executable structures (such as when creating things like closures which mimic C function pointers), where a single data-object may contain both the executable stub-code, and also data members for things like the captured environment bindings, ...
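As a hedged illustration of such an executable structure, here is one possible x86-64 layout (an assumption, not the poster's actual code): 16 bytes of stub code followed by the captured environment pointer and the real target. The stub reads its own data members with RIP-relative loads, which is exactly why the page has to be readable as well as executable. It puts the environment in `r10` (the register the SysV AMD64 ABI uses for the static chain) and tail-jumps to the target, so the caller's arguments pass through untouched; the target has to be code that knows to look in `r10`. `closure_thunk` and `closure_init` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stub code first, then the data the stub reads. */
typedef struct {
    uint8_t code[16];
    void *env;    /* captured environment */
    void *target; /* real implementation */
} closure_thunk;

void closure_init(closure_thunk *t, void *target, void *env) {
    /* mov r10, [rip + disp32]  -> 4C 8B 15 disp32 (7 bytes, ends at offset 7) */
    int32_t d1 = (int32_t)(offsetof(closure_thunk, env) - 7);
    /* jmp qword [rip + disp32] -> FF 25 disp32 (6 bytes, ends at offset 13) */
    int32_t d2 = (int32_t)(offsetof(closure_thunk, target) - 13);
    memset(t->code, 0xCC, sizeof t->code); /* pad with int3 */
    t->code[0] = 0x4C; t->code[1] = 0x8B; t->code[2] = 0x15;
    memcpy(&t->code[3], &d1, 4); /* little-endian host assumed, as on x86 */
    t->code[7] = 0xFF; t->code[8] = 0x25;
    memcpy(&t->code[9], &d2, 4);
    t->env = env;
    t->target = target;
}
```

To use it, the object would be placed in memory mapped readable and executable, and its address cast to a function pointer with the target's signature (a conversion that is not portable ISO C but works on the usual compilers).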

lol, I was literally thinking about creating a thread about this a few minutes ago: writing assembly code without an inline assembler!!!

I've done this many times, but I've never actually used a string. I always used a dynamically allocated array of bytes defining my assembly code. I've never had to do this for a game (except once when I first started coding for MacOSX and I didn't know what Apple's breakpoint function was called), but I've done this very frequently when working on emulators. Not trying to brag, but some of you might actually recognize my nickname. If you do, then I guess I don't need further explanation. ^_^ It really comes in handy if you're writing a static-rec or dynarec engine (I prefer static rec).
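For comparison with a string-driven assembler, a minimal sketch of the "array of bytes" style: a growable buffer plus one helper per hand-encoded instruction. Only `mov eax, imm32` (`B8 id`) and `ret` (`C3`) are shown; `code_buf` and the helper names are made up for the example, and the bytes would still have to be copied into executable memory before they could be run.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* A growable array of machine-code bytes. */
typedef struct { uint8_t *data; size_t len, cap; } code_buf;

/* Appends n raw bytes, doubling the allocation as needed; returns -1 on OOM. */
int buf_emit(code_buf *b, const uint8_t *bytes, size_t n) {
    if (b->len + n > b->cap) {
        size_t cap = b->cap ? b->cap * 2 : 64;
        while (cap < b->len + n) cap *= 2;
        uint8_t *p = realloc(b->data, cap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = cap;
    }
    memcpy(b->data + b->len, bytes, n);
    b->len += n;
    return 0;
}

/* mov eax, imm32  ->  B8 id (immediate stored little-endian) */
int emit_mov_eax_imm32(code_buf *b, uint32_t imm) {
    const uint8_t insn[5] = { 0xB8, (uint8_t)imm, (uint8_t)(imm >> 8),
                              (uint8_t)(imm >> 16), (uint8_t)(imm >> 24) };
    return buf_emit(b, insn, sizeof insn);
}

/* ret  ->  C3 */
int emit_ret(code_buf *b) {
    static const uint8_t insn[1] = { 0xC3 };
    return buf_emit(b, insn, sizeof insn);
}
```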

For games, I'd only use it to write my own intrinsics for something like SSE, CPUID or those oddball instructions that aren't recognized by the inline assembler. Of course, this isn't an easy task, you'll have to read the docs on Intel's webpage: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

WBINVD anyone?

Oh, and depending on what compiler you're using, debugging it isn't hard! XCode disassembles properly written byte-encoded assembly code and steps through it normally. That's just one of the many reasons I absolutely LOVE XCode and MacOSX!

Shogun

@Shogun: yes, this is also a possible reason to write an assembler.

then you can dynamically assemble whatever you need, without having to manually consult the Intel docs and generate the machine-code sequences by hand.

also, in the simple case, writing an assembler isn't really hard; in my case most of the "work" was the long, tedious job of transcribing the instruction listings and similar.

(initially, I wrote it over the course of a few days, but the assembler has expanded a bit since then).

though, yes, I have still used manually-generated machine code in a few places, as for certain uses this is more useful (the assembler isn't entirely free...).

as an example, from my listing:


add
        04,ib           al,i8
        X80/0,ib        rm8,i8          aleph
        WX83/0,ib       rm16,i8         aleph
        TX83/0,ib       rm32,i8         aleph
        X83/0,ib        rm64,i8
        X02/r           r8,rm8          aleph
        X00/r           rm8,r8          aleph
        W05,iw          ax,i16
        WX81/0,iw       rm16,i16        aleph
        WX03/r          r16,rm16        aleph
        WX01/r          rm16,r16        aleph
        T05,id          eax,i32
        TX81/0,id       rm32,i32        aleph
        TX01/r          rm32,r32        aleph
        TX03/r          r32,rm32        aleph
        X05,id          rax,i32
        X81/0,id        rm64,i32
        X03/r           r64,rm64
        X01/r           rm64,r64

...

inc
        W40|r           r16             leg
        T40|r           r32             leg
        XFE/0           rm8             aleph
        WXFF/0          rm16            aleph
        TXFF/0          rm32            aleph
        XFF/0           rm64

where 'X' basically means where to put the REX prefix, W/T where to insert the operand-size prefix (when relevant), ...

(other letters have other meanings, like V/S for address-size prefix, H/I/J/K/L for VEX and XOP forms, ...).

likewise: '/r' means "insert ModRM here", ',id' means immediate dword, ...

and the last column is basically for flags, such as to indicate when/where sequences are valid (CPU modes, ...).

'leg' basically means "legacy only", 'long' means long mode only, and 'aleph' was for a past x86 subset.
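To make the notation concrete, a minimal encoder for a subset of it, under stated assumptions: 64-bit mode only (so 'T' emits nothing), register-direct ModRM only (mod = 11, no memory operands or SIB), and for '|r' forms the register sits in the `rm` field. It follows the meanings given above for X, W/T, '/r' and the immediates, plus Intel's usual conventions for '/digit' (the digit goes into ModRM.reg) and register-in-opcode forms; how the real assembler handles these cases is not stated, so `operands` and `encode_form` are hypothetical, and the flags column is ignored.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int rm;      /* register in the r/m slot (or the |r register), 0-15 */
    int reg;     /* register for '/r' forms, 0-15 */
    int64_t imm; /* immediate value */
    int rex_w;   /* 1 for a 64-bit operand size */
} operands;

static int hexval(char c) {
    return isdigit((unsigned char)c) ? c - '0' : toupper((unsigned char)c) - 'A' + 10;
}

/* Returns the number of bytes written, or 0 for notation this sketch does not handle.
   Byte registers that need a bare REX (spl..dil) are not handled. */
size_t encode_form(const char *s, const operands *op, uint8_t *out) {
    size_t n = 0;
    int uses_reg = strstr(s, "/r") != NULL; /* does ModRM.reg hold a register? */
    while (*s) {
        if (*s == 'W') {                     /* 16-bit operand-size prefix */
            out[n++] = 0x66; s++;
        } else if (*s == 'T') {              /* 32-bit: no prefix in 64-bit mode */
            s++;
        } else if (*s == 'X') {              /* REX, only emitted when needed */
            int rex = 0x40 | (op->rex_w ? 8 : 0)
                    | (uses_reg && (op->reg & 8) ? 4 : 0)
                    | ((op->rm & 8) ? 1 : 0);
            if (rex != 0x40) out[n++] = (uint8_t)rex;
            s++;
        } else if (s[0] == '/' && s[1] == 'r') {                /* ModRM, reg = register */
            out[n++] = (uint8_t)(0xC0 | (op->reg & 7) << 3 | (op->rm & 7));
            s += 2;
        } else if (s[0] == '/' && s[1] >= '0' && s[1] <= '7') { /* ModRM, reg = digit */
            out[n++] = (uint8_t)(0xC0 | (s[1] - '0') << 3 | (op->rm & 7));
            s += 2;
        } else if (s[0] == '|' && s[1] == 'r') {                /* register added to opcode */
            if (n == 0) return 0;
            out[n - 1] = (uint8_t)(out[n - 1] + (op->rm & 7));
            s += 2;
        } else if (s[0] == ',' && s[1] == 'i') {                /* ,ib / ,iw / ,id */
            int size = s[2] == 'b' ? 1 : s[2] == 'w' ? 2 : s[2] == 'd' ? 4 : 0;
            if (size == 0) return 0;
            for (int i = 0; i < size; i++) /* little-endian */
                out[n++] = (uint8_t)((uint64_t)op->imm >> (8 * i));
            s += 3;
        } else if (isxdigit((unsigned char)s[0]) && isxdigit((unsigned char)s[1])) {
            out[n++] = (uint8_t)(hexval(s[0]) << 4 | hexval(s[1])); /* literal byte */
            s += 2;
        } else {
            return 0;
        }
    }
    return n;
}
```

For example, "TX01/r" with rm = 8 (r8d) and reg = 2 (edx) yields 41 01 D0, i.e. `add r8d, edx`.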

I had once considered having a verifiable x86 subset sort of like NaCl (but more like the JVM verifier), but this idea didn't really go anywhere.

when the assembler is compiled, a tool basically takes the instruction listings, and converts them into the relevant C source files and headers (basically, it converts them into big prebuilt tables).
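A rough sketch of that generator step, assuming the listing layout shown above (unindented mnemonic lines, indented form lines whose encoding, operands and optional flags are separated by whitespace). It is written in C here and emits only table rows; the real tool's language and output format are not described, so `listing_to_c` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Converts listing text into C initializer rows in out.
   Returns the number of chars written, or -1 on overflow. */
int listing_to_c(const char *listing, char *out, size_t cap) {
    char mnem[32] = "";
    size_t used = 0;
    const char *p = listing;
    if (cap > 0) out[0] = '\0';
    while (*p) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        char line[256];
        if (len >= sizeof line) return -1;
        memcpy(line, p, len);
        line[len] = '\0';
        p += len + (eol ? 1 : 0);

        if (line[0] != ' ' && line[0] != '\t') { /* unindented: a new mnemonic (or blank) */
            char m[32];
            if (sscanf(line, "%31s", m) == 1) strcpy(mnem, m);
            continue;
        }
        char f[3][64] = { "", "", "" }; /* encoding, operands, flags */
        if (sscanf(line, "%63s %63s %63s", f[0], f[1], f[2]) < 1) continue;
        int w = snprintf(out + used, cap - used, "{ \"%s\", \"%s\", \"%s\", \"%s\" },\n",
                         mnem, f[0], f[1], f[2]);
        if (w < 0 || (size_t)w >= cap - used) return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Each form line becomes one row such as `{ "inc", "XFE/0", "rm8", "aleph" },`, ready to be pasted inside an array initializer in a generated source file.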
