Inline assembly
"LLVM's JIT engine is pretty much the definitive kick-ass explosion-laden version of this concept :-)"
yep, I didn't know about LLVM when I wrote mine, and when I encountered it I didn't want to go through the hassle of converting all my stuff over to work with it.
I was originally exposed to how JITs and assemblers work by looking at the Quake3 JIT, and also the NASM source. I ended up making my assembler as more of a compromise.
it is internally string-driven, using strings consisting of a mix of commands and hex-digits (giving literal byte values), using a notation derived from that used in the Intel docs (just with other letters to indicate various prefixes, ...).
(~ 75% of the work was transcribing the contents of the Intel docs into a big assembler listing). (this notation was later hackishly extended to support ARM and Thumb, by transcribing their block-notation into the listing's notation).
basically, the mnemonic is used to look up the first instruction form, and the assembler walks along the forms until it finds one where the arguments match those parsed from the ASM code, then passes this off to a function which converts the listing-string and arguments into the output opcode.
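As a rough illustration of that lookup, here is a minimal C sketch. The names (`OpKind`, `InsnForm`, `forms`, `find_form`) and the matching rules are made up for the example and are not taken from the actual assembler; it only assumes that the forms for one mnemonic sit next to each other, with the tighter encodings first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Operand kinds, loosely named after the listing (hypothetical enum). */
typedef enum { OP_NONE, OP_R32, OP_RM32, OP_I8, OP_I32 } OpKind;

/* One instruction form: mnemonic, encoding pattern, expected operands. */
typedef struct {
    const char *mnemonic;
    const char *pattern;
    OpKind      args[2];
} InsnForm;

/* Forms of one mnemonic are contiguous, tighter encodings first. */
static const InsnForm forms[] = {
    { "add", "TX83/0,ib", { OP_RM32, OP_I8   } },
    { "add", "TX81/0,id", { OP_RM32, OP_I32  } },
    { "add", "TX01/r",    { OP_RM32, OP_R32  } },
    { "inc", "TXFF/0",    { OP_RM32, OP_NONE } },
};

/* Does the operand kind a form expects accept what the parser saw? */
static int accepts(OpKind want, OpKind got)
{
    if (want == got) return 1;
    if (want == OP_RM32 && got == OP_R32) return 1; /* a register fits r/m */
    if (want == OP_I32  && got == OP_I8)  return 1; /* small imm fits imm32 */
    return 0;
}

/* Find the first form of `mnemonic` whose operands match, or NULL. */
static const InsnForm *find_form(const char *mnemonic, OpKind a0, OpKind a1)
{
    size_t i, count = sizeof forms / sizeof forms[0];

    assert(mnemonic != NULL);
    for (i = 0; i < count; i++) {
        if (strcmp(forms[i].mnemonic, mnemonic) != 0)
            continue;
        if (accepts(forms[i].args[0], a0) && accepts(forms[i].args[1], a1))
            return &forms[i];
    }
    return NULL;
}
```

With this table, `find_form("add", OP_R32, OP_I8)` picks the short `TX83/0,ib` form, while an imm32 operand falls through to `TX81/0,id`. A real table would be indexed by mnemonic rather than scanned linearly.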
initially, an early JIT of mine used it at a much lower level, generally calling the assembler's internal functions directly. I later switched mostly to a textual ASM interface, as this was much nicer to work with, and went from spitting machine code directly into a buffer on each command to buffering code into sections in an object-context (and subsequently linking the produced object).
I have generally found it to be "fast enough". the cost of parsing the ASM code doesn't add all that much to the total cost of assembling it. much more relevant are things like whether or not the preprocessor is enabled, and whether or not it is using multi-pass jump-length optimization, ...
my higher-level JITs spit out textual ASM code for the target.
I have on/off had ideas for a medium level "portable lower-level intermediate language" (as an intermediate between a language-specific back-end and the native target-specific ASM), but thus far not much has been done. a recent partial effort (~ last year) would have been loosely inspired by the Dalvik IR (and was being designed along with a reasonably fast interpreter), but I ended up mostly sticking with my existing script-language back-end (its JIT is naive, but "generally good enough").
part of the challenge for these low-level IRs is effectively glossing over the various C ABIs in a "sensible" (reasonably efficient) way. partly because IMHO the SysV/AMD64 ABI is "filled with evil" in some areas, I mostly use a subset which stays compatible as long as one isn't passing complex structs by value across language boundaries. the simplification I used here: structs are either passed in a single register (if they fit), or by reference/pointer, without any sort of structure-decomposition.
mhagain, on 15 Mar 2013 - 13:26, said:
I've never personally tested this, but http://msdn.microsoft.com/en-us/library/windows/desktop/aa366553(v=vs.85).aspx indicates that VirtualAlloc or VirtualProtect with the appropriate PAGE_EXECUTE flag should allow it.
It's still horrible though. Even if there are sometimes reasons why you may want to do it, it remains horrible (just imagine trying to debug it!)
Yeah. I was expecting to have to use those functions when I was goofing with it. The VS 2008 debugger had no problem jumping right into the code though. In disassembled view, ofc. It is pretty horrible though.
IIRC, DEP is not enabled by default for user code with 32-bit apps, so you can generally get around needing it.
IMO it is still a good idea though.
things are a little more strict on Linux though, and for daemons it is generally required to essentially "double map" the executable memory (to have separate RX and RW mappings, vs a single RWX memory region). thus far I haven't bothered yet, as at least for user-apps, the distros I have been using don't really seem to care that much.
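A minimal sketch of such a double mapping, assuming Linux with glibc's `memfd_create` (one possible backing object; `shm_open` would be another). The `DualMap` type and the `dualmap_*` helpers are hypothetical names for this example: code is written through the RW view and executed through the RX view of the same pages.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    void  *rw;   /* writable view: the JIT stores code through this */
    void  *rx;   /* executable view of the same physical pages */
    size_t len;
} DualMap;

/* Map one anonymous in-memory file twice. Returns 0 on success. */
static int dualmap_create(DualMap *dm, size_t len)
{
    int fd;

    assert(dm != NULL && len > 0);
    fd = memfd_create("jit-code", MFD_CLOEXEC);
    if (fd < 0) return -1;
    if (ftruncate(fd, (off_t)len) != 0) { close(fd); return -1; }

    dm->rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    dm->rx = mmap(NULL, len, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
    close(fd);   /* the mappings keep the memory alive */

    if (dm->rw == MAP_FAILED || dm->rx == MAP_FAILED) {
        if (dm->rw != MAP_FAILED) munmap(dm->rw, len);
        if (dm->rx != MAP_FAILED) munmap(dm->rx, len);
        return -1;
    }
    dm->len = len;
    return 0;
}

/* Store bytes via the RW view; they show up in the RX view as well. */
static void dualmap_emit(DualMap *dm, size_t offset,
                         const unsigned char *bytes, size_t n)
{
    assert(offset + n <= dm->len);
    memcpy((unsigned char *)dm->rw + offset, bytes, n);
}

static void dualmap_destroy(DualMap *dm)
{
    munmap(dm->rw, dm->len);
    munmap(dm->rx, dm->len);
}
```

The function pointer handed to callers would be computed from `dm->rx` plus the offset, never from `dm->rw`. On non-x86 targets an instruction-cache flush would also be needed before executing freshly written code.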
Is RX even used? I remember that with protected mode you could set a selector to be unreadable and unwritable but executable (meaning you can get the CPU to run code off it, but you can't get its instructions to read memory from that area), I'd imagine this is still possible in long mode.
it is basically the same as before, just with the addition of an NX/XD bit to disable executing code.
RX is basically useful for being able to put literal values in the data.
this can be useful for things like executable structures (such as when creating things like closures which mimic C function pointers), where a single data-object may contain both the executable stub-code, and also data members for things like the captured environment bindings, ...
lol, I was literally thinking about creating a thread about this a few minutes ago: writing assembly code without an inline assembler!!!
I've done this many times, but I've never actually used a string. I always used a dynamically allocated array of bytes defining my assembly code. I've never had to do this for a game (except once when I first started coding for MacOSX and I didn't know what Apple's breakpoint function was called), but I've done this very frequently when working on emulators. Not trying to brag, but some of you might actually recognize my nickname. If you do, then I guess I don't need further explanation. ^_^ It really comes in handy if you're writing a static-rec or dynarec engine (I prefer static rec).
For games, I'd only use it to write my own intrinsics for something like SSE, CPUID or those oddball instructions that aren't recognized by the inline assembler. Of course, this isn't an easy task, you'll have to read the docs on Intel's webpage: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
WBINVD anyone?
Oh, and depending on what compiler you're using, debugging it isn't hard! XCode disassembles properly written byte encoded assembly code and steps through it normally. That's just one of the many reasons I absolutely LOVE XCode and MacOSX!
Shogun
@Shogun: yes, this is also a possible reason to write an assembler.
then you can dynamically assemble whatever, without having to manually consult the Intel docs and generate the machine code sequences by hand.
also, in the simple case, writing an assembler isn't really hard, in my case most of the "work" was the long/tedious job of transcribing the instruction-listings and similar.
(initially, I wrote it over the course of a few days, but the assembler has expanded a bit since then).
though, yes, I have still used manually-generated machine code in a few places, as for certain uses this is more useful (the assembler isn't entirely free...).
as an example, from my listing:
add
  04,ib       al,i8
  X80/0,ib    rm8,i8      aleph
  WX83/0,ib   rm16,i8     aleph
  TX83/0,ib   rm32,i8     aleph
  X83/0,ib    rm64,i8
  X02/r       r8,rm8      aleph
  X00/r       rm8,r8      aleph
  W05,iw      ax,i16
  WX81/0,iw   rm16,i16    aleph
  WX03/r      r16,rm16    aleph
  WX01/r      rm16,r16    aleph
  T05,id      eax,i32
  TX81/0,id   rm32,i32    aleph
  TX01/r      rm32,r32    aleph
  TX03/r      r32,rm32    aleph
  X05,id      rax,i32
  X81/0,id    rm64,i32
  X03/r       r64,rm64
  X01/r       rm64,r64
...
inc
  W40|r       r16         leg
  T40|r       r32         leg
  XFE/0       rm8         aleph
  WXFF/0      rm16        aleph
  TXFF/0      rm32        aleph
  XFF/0       rm64
where 'X' basically means where to put the REX prefix, W/T where to insert the operand-size prefix (when relevant), ...
(other letters have other meanings, like V/S for address-size prefix, H/I/J/K/L for VEX and XOP forms, ...).
likewise: '/r' means "insert ModRM here", ',id' means immediate dword, ...
and the last column is basically for flags, such as to indicate when/where sequences are valid (CPU modes, ...).
'leg' basically means "legacy only", 'long' means long-mode only, and 'aleph' was basically for a past x86 subset.
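To make the notation a bit more concrete, here is a small C sketch that expands a few of these pattern strings into bytes. It is not the actual assembler code: `Operands` and `encode_pattern` are hypothetical, only register-direct operands in long mode are handled, and it assumes that 'W' marks a 16-bit operand (0x66 prefix in long mode), 'T' marks a 32-bit operand (no prefix needed there), and that REX.W comes from the operand size. Forms such as `40|r` are rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical operand bundle; register-direct operands only. */
typedef struct {
    int  reg;   /* register number used by "/r" (ModRM.reg) */
    int  rm;    /* register number for ModRM.rm */
    int  wide;  /* nonzero for 64-bit operand size (sets REX.W) */
    long imm;   /* immediate consumed by ",ib" / ",iw" / ",id" */
} Operands;

static int hexval(int c)
{
    return isdigit(c) ? c - '0' : toupper(c) - 'A' + 10;
}

/* Expand a pattern such as "X01/r" into machine code for long mode.
   Returns the byte count, or 0 for an unsupported pattern.
   `out` needs room for at least 16 bytes. */
static size_t encode_pattern(const char *pat, const Operands *ops,
                             unsigned char *out)
{
    size_t n = 0;
    int regfield = 0;
    const char *p;

    assert(pat != NULL && ops != NULL && out != NULL);

    /* Work out ModRM.reg first, since the REX.R bit depends on it. */
    for (p = pat; *p; p++)
        if (*p == '/' && p[1])
            regfield = (p[1] == 'r') ? ops->reg : p[1] - '0';

    for (p = pat; *p; ) {
        if (*p == 'X') {            /* REX goes here, if one is needed */
            int rex = 0x40 | (ops->wide ? 8 : 0)
                    | (((regfield >> 3) & 1) << 2) | ((ops->rm >> 3) & 1);
            if (rex != 0x40) out[n++] = (unsigned char)rex;
            p += 1;
        } else if (*p == 'W') {     /* 16-bit operand: 0x66 prefix */
            out[n++] = 0x66;
            p += 1;
        } else if (*p == 'T') {     /* 32-bit operand: already the default */
            p += 1;
        } else if (*p == '/' && (p[1] == 'r' || (p[1] >= '0' && p[1] <= '7'))) {
            out[n++] = (unsigned char)(0xC0 | ((regfield & 7) << 3)
                                            | (ops->rm & 7));
            p += 2;
        } else if (*p == ',' && p[1] == 'i' && p[2]) {
            int size = p[2] == 'b' ? 1 : p[2] == 'w' ? 2 : p[2] == 'd' ? 4 : 0;
            unsigned long v = (unsigned long)ops->imm;
            int i;
            if (size == 0) return 0;
            for (i = 0; i < size; i++)      /* little-endian immediate */
                out[n++] = (unsigned char)(v >> (8 * i));
            p += 3;
        } else if (isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1])) {
            out[n++] = (unsigned char)((hexval((unsigned char)p[0]) << 4)
                                       | hexval((unsigned char)p[1]));
            p += 2;
        } else {
            return 0;               /* command letter not handled here */
        }
    }
    return n;
}
```

For example, `X01/r` with rm = rax (0), reg = rcx (1) and a 64-bit operand size yields `48 01 C8`, which is `add rax, rcx`; `TX83/0,ib` with rm = ecx and an immediate of 5 yields `83 C1 05`.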
I had once considered having a verifiable x86 subset sort of like NaCl (but more like the JVM verifier), but this idea didn't really go anywhere.
when the assembler is compiled, a tool basically takes the instruction listings, and converts them into the relevant C source files and headers (basically, it converts them into big prebuilt tables).