
Inline assembly

17 replies to this topic

#1 ultramailman   Prime Members   -  Reputation: 1563


Posted 01 March 2013 - 07:42 PM

Not mine, I just came across it while googling.

Machine instructions are put inside a string literal. Said literal is then cast to a function pointer to be called. Pretty exotic, what do you think?

 

http://stackoverflow.com/a/5602143

 

 




#2 Bacterius   Crossbones+   -  Reputation: 8494


Posted 01 March 2013 - 09:29 PM

Nice hack, though it won't usually work nowadays because most operating systems mark data pages as non-executable (DEP/NX), so instructions outside the code segment will not be run. You could also just use a char array such as { 0xC3, 0xDD, ... } instead of a string literal with escape codes.
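To make the two spellings concrete, here is a minimal sketch. The names `code_str`, `code_arr` and `same_bytes` are made up for illustration, and the bytes are the x86 encoding of `mov eax, 42 ; ret` rather than anything from the linked answer. It only compares the data and does not try to execute it.

```c
#include <string.h>

/* x86 machine code for: mov eax, 42 ; ret
 * (illustrative bytes, not taken from the linked answer). */

/* Spelling 1: string literal with hex escapes.
 * The compiler appends a terminating NUL, so sizeof is 7, not 6. */
static const char code_str[] = "\xB8\x2A\x00\x00\x00\xC3";

/* Spelling 2: plain byte array, no hidden terminator. */
static const unsigned char code_arr[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };

/* Returns 1 if both spellings hold the same instruction bytes. */
int same_bytes(void)
{
    return sizeof code_str - 1 == sizeof code_arr &&
           memcmp(code_str, code_arr, sizeof code_arr) == 0;
}
```

The array form avoids the extra NUL and the hex-escape quirks, which is probably why it tends to be the more readable choice.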

 

Also, you're fired :P


The slowsort algorithm is a perfect illustration of the multiply and surrender paradigm, which is perhaps the single most important paradigm in the development of reluctant algorithms. The basic multiply and surrender strategy consists in replacing the problem at hand by two or more subproblems, each slightly simpler than the original, and continue multiplying subproblems and subsubproblems recursively in this fashion as long as possible. At some point the subproblems will all become so simple that their solution can no longer be postponed, and we will have to surrender. Experience shows that, in most cases, by the time this point is reached the total work will be substantially higher than what could have been wasted by a more direct approach.

 

- Pessimal Algorithms and Simplexity Analysis


#3 Khatharr   Crossbones+   -  Reputation: 2958


Posted 02 March 2013 - 03:53 AM

Actually, I did a goof-off project in Windows not long ago where I composed bytecode into an array of unsigned chars and then called it. I expected a page fault, but never got one. It ran with no complaints.
void hurrrrrrrr() {__asm sub [ebp+4],5;}

There are ten kinds of people in this world: those who understand binary and those who don't.

#4 Waterlimon   Crossbones+   -  Reputation: 2457


Posted 02 March 2013 - 04:07 AM

It might be embedded in the code of the program, making it not complain when you run it...

o3o


#5 darookie   Members   -  Reputation: 1437


Posted 02 March 2013 - 04:30 AM

Waterlimon said:
It might be embedded in the code of the program, making it not complain when you run it...

Since string literals in C are not marked as executable, it's much more likely that DEP is simply not enabled for all programs.



#6 Khatharr   Crossbones+   -  Reputation: 2958


Posted 02 March 2013 - 04:33 AM

Dynamically allocated. I actually allocated a huge buffer, then calculated the amount of it to use for data and the part to use for the bytecode that would manipulate said data. Insane laughter ensued.

Could have been DEP. I've upgraded my OS since then, so I can't check to see what the setting was.

Edited by Khatharr, 02 March 2013 - 04:34 AM.


#7 Zao   Members   -  Reputation: 878


Posted 15 March 2013 - 09:10 AM

If you want to do runtime x86 and x86_64 code generation properly (and yes, there are use cases for it), I would recommend using Xbyak. Nice little C++ DSL to generate machine code at runtime, complete with labels, jumping to them and all sorts of nice stuff.


To make it is hell. To fail is divine.

#8 BGB   Crossbones+   -  Reputation: 1554


Posted 15 March 2013 - 12:54 PM

FWIW: in my case I am using my own assembler library (BGBASM), where it is fairly common to basically just assemble globs of code (as textual ASM), and then call into it using function pointers.

this is also used somewhat in cases where normally inline ASM would be used, but is slightly more portable between compilers, and a little more flexible (since the code generated can be specialized based on settings at runtime or similar).


the assembler currently has x86 and x86-64 support, and also partial / mostly untested ARM and Thumb support, and misc things like an NaCl sub-mode (which aligns labels and similar), and uses an NASM derived syntax (and C inspired macro facilities).

its API is basically begin/end pairs with a bunch of printf-like calls in between (this seemed the most convenient for my uses FWIW).

it can also produce and accept COFF objects (also still used on Linux), and has wrapper code over some OS APIs (namely for loading DLLs and SOs). internally it uses partly disjoint assemble and link stages, and the code is "linked" against the running program image using a big region of RWX memory.


currently there is no standalone download for it though (could put it online if anyone is interested), though AFAIK YASM can do something similar.

#9 ApochPiQ   Moderators   -  Reputation: 14895


Posted 15 March 2013 - 01:58 PM

LLVM's JIT engine is pretty much the definitive kick-ass explosion-laden version of this concept :-)

#10 mhagain   Crossbones+   -  Reputation: 7803


Posted 15 March 2013 - 02:19 PM

I've never personally tested this, but http://msdn.microsoft.com/en-us/library/windows/desktop/aa366553%28v=vs.85%29.aspx indicates that VirtualAlloc or VirtualProtect with the appropriate PAGE_EXECUTE flag should allow it.

 

It's still horrible though.  Even if there are sometimes reasons why you may want to do it, it remains horrible (just imagine trying to debug it!)
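For the curious, a rough sketch of the VirtualAlloc/VirtualProtect route could look like the following. Treat it as an illustration under assumptions, not tested production code: `run_generated_code` is a made-up name, the six bytes are the x86 encoding of `mov eax, 42 ; ret`, and the non-Windows branch swaps in POSIX `mmap`/`mprotect` as the rough equivalent. The memory is written while read/write and only then switched to read/execute, so it is never writable and executable at the same time.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <string.h>

#if defined(_WIN32)
#include <windows.h>
#else
#include <sys/mman.h>
#endif

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Copies a tiny x86 routine into fresh memory, flips the pages from
 * read/write to read/execute, and calls it.
 * Returns 42 on success, or -1 if the platform is not x86 or the OS
 * refuses the allocation or the protection change. */
int run_generated_code(void)
{
#if !HAVE_X86
    return -1;
#else
    /* mov eax, 42 ; ret  (same encoding in 32- and 64-bit mode) */
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    int (*fn)(void);
    int result;

#if defined(_WIN32)
    DWORD old_protect;
    void *mem = VirtualAlloc(NULL, sizeof code, MEM_COMMIT | MEM_RESERVE,
                             PAGE_READWRITE);
    if (mem == NULL)
        return -1;
    memcpy(mem, code, sizeof code);
    if (!VirtualProtect(mem, sizeof code, PAGE_EXECUTE_READ, &old_protect)) {
        VirtualFree(mem, 0, MEM_RELEASE);
        return -1;
    }
    FlushInstructionCache(GetCurrentProcess(), mem, sizeof code);
#else
    void *mem = mmap(NULL, sizeof code, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, code, sizeof code);
    if (mprotect(mem, sizeof code, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, sizeof code);
        return -1;
    }
#endif

    /* ISO C has no object-to-function pointer cast, so copy the address. */
    memcpy(&fn, &mem, sizeof fn);
    result = fn();

#if defined(_WIN32)
    VirtualFree(mem, 0, MEM_RELEASE);
#else
    munmap(mem, sizeof code);
#endif
    return result;
#endif
}
```

Calling `run_generated_code()` should give 42 on an x86 machine that allows the protection change, and -1 otherwise.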


It appears that the gentleman thought C++ was extremely difficult and he was overjoyed that the machine was absorbing it; he understood that good C++ is difficult but the best C++ is well-nigh unintelligible.


#11 ApochPiQ   Moderators   -  Reputation: 14895


Posted 15 March 2013 - 02:38 PM

I use VirtualProtect and PAGE_EXECUTE to dynamically generate re-entrancy stubs in the Epoch runtime. Basically it allows you to hand off Epoch code as a callback to a C library and get it invoked correctly, using a generated machine-code bridge function and a little bit of marshaling magic.

#12 BGB   Crossbones+   -  Reputation: 1554


Posted 15 March 2013 - 03:33 PM

ApochPiQ said:
LLVM's JIT engine is pretty much the definitive kick-ass explosion-laden version of this concept :-)

yep, I didn't know about LLVM when I wrote mine, and when I encountered it didn't want to have to go through the hassle of converting all my stuff over to work with it.


I was originally exposed to how JITs and assemblers work by looking at the Quake3 JIT, and also the NASM source. I ended up making my assembler as more of a compromise.

it is internally string-driven, using strings consisting of a mix of commands and hex-digits (giving literal byte values), using a notation derived from that used in the Intel docs (just with other letters to indicate various prefixes, ...).

(~ 75% of the work was mostly transcribing the contents of the Intel docs into a big assembler listing). (this notation was later hackishly extended to support ARM and Thumb, and transcribing their block-notation into the listing's notation).

basically, the mnemonic is used to lookup the first instruction form, and it will walk along until it finds one where the arguments match those parsed from the ASM code, then pass this off to a function to convert the ASM listing-string and arguments into the output opcode.
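as a rough sketch of that lookup-and-walk step (this is not the actual BGBASM code; `opkind`, `form`, `forms` and `find_form` are hypothetical names, the rows are a handful taken from the add/inc listing posted later in this thread, and operand matching is simplified to an exact comparison):

```c
#include <stddef.h>
#include <string.h>

/* Operand kinds as they might look after parsing (hypothetical names). */
enum opkind { OP_NONE, OP_AL, OP_EAX, OP_R32, OP_RM8, OP_RM32, OP_I8, OP_I32 };

/* One row of the listing: mnemonic, listing-string, operand kinds. */
struct form {
    const char *mnemonic;
    const char *pattern;     /* e.g. "T05,id" */
    enum opkind a, b;
};

/* A few rows, grouped by mnemonic as in the listing. */
static const struct form forms[] = {
    { "add", "04,ib",     OP_AL,   OP_I8   },
    { "add", "X80/0,ib",  OP_RM8,  OP_I8   },
    { "add", "T05,id",    OP_EAX,  OP_I32  },
    { "add", "TX81/0,id", OP_RM32, OP_I32  },
    { "add", "TX01/r",    OP_RM32, OP_R32  },
    { "inc", "T40|r",     OP_R32,  OP_NONE },
    { "inc", "TXFF/0",    OP_RM32, OP_NONE },
};

/* Returns the listing-string of the first form whose operands match,
 * or NULL if the mnemonic is unknown or no form accepts the operands. */
const char *find_form(const char *mnemonic, enum opkind a, enum opkind b)
{
    size_t n = sizeof forms / sizeof forms[0];
    size_t i = 0;

    /* Step 1: look up the first form for this mnemonic. */
    while (i < n && strcmp(forms[i].mnemonic, mnemonic) != 0)
        i++;

    /* Step 2: walk along that mnemonic's forms until the operands match. */
    for (; i < n && strcmp(forms[i].mnemonic, mnemonic) == 0; i++) {
        if (forms[i].a == a && forms[i].b == b)
            return forms[i].pattern;
    }
    return NULL;
}
```

a real matcher would presumably also let a plain register satisfy an rm32 slot and so on; the returned listing-string is what would then get handed to the encoder together with the parsed arguments.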

initially, an early JIT of mine was using it at a much lower level, generally using assembler internal functions directly, but I later ended up mostly switching over to a textual ASM interface mostly as this was much nicer to work with, and switched from directly spitting out machine code into a buffer on each command, to buffering code into sections in an object-context (and subsequently linking the produced object).

I have generally found it to be "fast enough". the cost of parsing the ASM code doesn't add all that much to the total cost of assembling it. much more relevant are things like whether or not the preprocessor is enabled, and whether or not it is using multi-pass jump-length optimization, ...


my higher-level JITs spit out textual ASM code for the target.

I have on/off had ideas for a medium level "portable lower-level intermediate language" (as an intermediate between a language-specific back-end and the native target-specific ASM), but thus far not much has been done. a recent partial effort (~ last year) would have been loosely inspired by the Dalvik IR (and was being designed along with a reasonably fast interpreter), but I ended up mostly sticking with my existing script-language back-end (its JIT is naive, but "generally good enough").


part of the challenge for these low-level IRs is mostly that of effectively glossing over the various C ABIs in a "sensible" (reasonably efficient) way. this is partly due to IMHO the SysV/AMD64 ABI being "filled with evil" in some areas; I mostly use a subset which is compatible as long as one isn't passing complex structs by value across language boundaries. I used a simplification here: structs are either passed in a single register (if they will fit), or by reference/pointer, without any sort of structure-decomposition.

#13 Khatharr   Crossbones+   -  Reputation: 2958


Posted 15 March 2013 - 08:21 PM

mhagain, on 15 Mar 2013 - 13:26, said:
I've never personally tested this, but http://msdn.microsoft.com/en-us/library/windows/desktop/aa366553(v=vs.85).aspx indicates that VirtualAlloc or VirtualProtect with the appropriate PAGE_EXECUTE flag should allow it.

It's still horrible though. Even if there are sometimes reasons why you may want to do it, it remains horrible (just imagine trying to debug it!)

Yeah. I was expecting to have to use those functions when I was goofing with it. The VS 2008 debugger had no problem jumping right into the code though. In disassembled view, ofc. It is pretty horrible though. :D

#14 BGB   Crossbones+   -  Reputation: 1554


Posted 16 March 2013 - 01:08 AM


mhagain, on 15 Mar 2013 - 13:26, said:
I've never personally tested this, but http://msdn.microsoft.com/en-us/library/windows/desktop/aa366553(v=vs.85).aspx indicates that VirtualAlloc or VirtualProtect with the appropriate PAGE_EXECUTE flag should allow it.

It's still horrible though. Even if there are sometimes reasons why you may want to do it, it remains horrible (just imagine trying to debug it!)

Khatharr said:
Yeah. I was expecting to have to use those functions when I was goofing with it. The VS 2008 debugger had no problem jumping right into the code though. In disassembled view, ofc. It is pretty horrible though. :D


IIRC, DEP is not enabled by default for user code in 32-bit apps, so you can generally get away without VirtualAlloc/VirtualProtect.
IMO using them is still a good idea though.

things are a little more strict on Linux though, and for daemons it is generally required to essentially "double map" the executable memory (to have separate RX and RW mappings, vs a single RWX memory region). thus far I haven't bothered yet, as at least for user-apps, the distros I have been using don't really seem to care that much.
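a minimal sketch of the "double map" idea, assuming a POSIX system: the same backing object is mapped twice, once RW for the code generator and once RX for execution. `dual_map_demo` is a made-up name, and `tmpfile()` is only a portable stand-in for whatever shared-memory object a real daemon would use. the sketch only checks that bytes written through the RW view are visible through the RX view; it does not execute anything.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for fileno/ftruncate on glibc */
#endif
#include <stdio.h>
#include <string.h>

#if !defined(_WIN32)
#include <sys/mman.h>
#include <unistd.h>
#endif

/* Maps the same backing file twice: once read/write, once read/execute.
 * Returns 1 if bytes written through the RW view show up in the RX view,
 * or -1 if the platform or filesystem refuses the setup. */
int dual_map_demo(void)
{
#if defined(_WIN32)
    return -1;                       /* POSIX-only sketch */
#else
    /* mov eax, 42 ; ret  (just sample bytes here, never executed) */
    static const unsigned char payload[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    const size_t len = 4096;
    int ok = -1;

    FILE *f = tmpfile();             /* stand-in for a shared-memory object */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0) {
        fclose(f);
        return -1;
    }

    /* Two independent views of the same pages. */
    unsigned char *rw = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    unsigned char *rx = mmap(NULL, len, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);

    if (rw != MAP_FAILED && rx != MAP_FAILED) {
        memcpy(rw, payload, sizeof payload);                 /* write via RW */
        ok = memcmp(rx, payload, sizeof payload) == 0 ? 1 : -1;  /* read via RX */
    }

    if (rw != MAP_FAILED) munmap(rw, len);
    if (rx != MAP_FAILED) munmap(rx, len);
    fclose(f);
    return ok;
#endif
}
```

the RX mapping can fail if the temporary directory is mounted noexec, in which case the function just reports -1.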

#15 Sik_the_hedgehog   Crossbones+   -  Reputation: 1596


Posted 16 March 2013 - 01:24 PM

Is RX even used? I remember that with protected mode you could set a selector to be unreadable and unwritable but executable (meaning you can get the CPU to run code off it, but you can't get its instructions to read memory from that area). I'd imagine this is still possible in long mode.


Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#16 BGB   Crossbones+   -  Reputation: 1554


Posted 16 March 2013 - 07:04 PM

Sik_the_hedgehog said:
Is RX even used? I remember that with protected mode you could set a selector to be unreadable and unwritable but executable (meaning you can get the CPU to run code off it, but you can't get its instructions to read memory from that area). I'd imagine this is still possible in long mode.

it is basically the same as before just with the addition of an NX/XD bit to disable executing code.

RX is basically useful for being able to put literal values in the data.
this can be useful for things like executable structures (such as when creating things like closures which mimic C function pointers), where a single data-object may contain both the executable stub-code, and also data members for things like the captured environment bindings, ...

Edited by cr88192, 16 March 2013 - 07:04 PM.


#17 blueshogun96   Crossbones+   -  Reputation: 874


Posted 21 March 2013 - 10:37 PM

lol, I was literally thinking about creating a thread about this a few minutes ago: writing assembly code without an inline assembler!!!  

 

I've done this many times, but I've never actually used a string.  I always used a dynamically allocated array of bytes defining my assembly code.  I've never had to do this for a game (except once when I first started coding for MacOSX and I didn't know what Apple's breakpoint function was called), but I've done this very frequently when working on emulators.  Not trying to brag, but some of you might actually recognize my nickname.  If you do, then I guess I don't need further explanation. ^_^  It really comes in handy if you're writing a static-rec or dynarec engine (I prefer static rec).  

 

For games, I'd only use it to write my own intrinsics for something like SSE, CPUID or those oddball instructions that aren't recognized by the inline assembler.  Of course, this isn't an easy task, you'll have to read the docs on Intel's webpage: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

 

WBINVD anyone? ;)

 

Oh, and depending on what compiler you're using, debugging it isn't hard!  XCode disassembles properly written byte encoded assembly code and steps through it normally.  That's just one of the many reasons I absolutely LOVE XCode and MacOSX! 8)

 

Shogun


Follow Shogun3D on the official website: http://shogun3d.net

 



#18 BGB   Crossbones+   -  Reputation: 1554


Posted 22 March 2013 - 03:07 AM

@Shogun: yes, this is also a possible reason to write an assembler.

 

then you can dynamically assemble whatever, without having to manually consult the Intel docs and generate the machine code sequences by hand.

also, in the simple case, writing an assembler isn't really hard, in my case most of the "work" was the long/tedious job of transcribing the instruction-listings and similar.

 

(initially, I wrote it over the course of a few days, but the assembler has expanded a bit since then).

 

though, yes, I have still used manually-generated machine code in a few places, as for certain uses this is more useful (the assembler isn't entirely free...).

 

 

as an example, from my listing:

add
        04,ib           al,i8
        X80/0,ib        rm8,i8          aleph
        WX83/0,ib       rm16,i8         aleph
        TX83/0,ib       rm32,i8         aleph
        X83/0,ib        rm64,i8
        X02/r           r8,rm8          aleph
        X00/r           rm8,r8          aleph
        W05,iw          ax,i16
        WX81/0,iw       rm16,i16        aleph
        WX03/r          r16,rm16        aleph
        WX01/r          rm16,r16        aleph
        T05,id          eax,i32
        TX81/0,id       rm32,i32        aleph
        TX01/r          rm32,r32        aleph
        TX03/r          r32,rm32        aleph
        X05,id          rax,i32
        X81/0,id        rm64,i32
        X03/r           r64,rm64
        X01/r           rm64,r64

...

inc
        W40|r           r16             leg
        T40|r           r32             leg
        XFE/0           rm8             aleph
        WXFF/0          rm16            aleph
        TXFF/0          rm32            aleph
        XFF/0           rm64

 

where 'X' basically means where to put the REX prefix, W/T where to insert the operand-size prefix (when relevant), ...

(other letters have other meanings, like V/S for address-size prefix, H/I/J/K/L for VEX and XOP forms, ...).

 

likewise: '/r' means "insert ModRM here", ',id' means immediate dword, ...

 

and the last column is basically for flags, such as to indicate when/where sequences are valid (CPU modes, ...).

 

'leg' basically means "legacy only", 'long' means long-mode only, and 'aleph' was basically for a past x86 subset.
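to show how two of those rows turn into bytes, here is a hand-written sketch. it is not generated from the listing and not the assembler's own code; the function names are made up, and it assumes 32-bit or 64-bit mode with registers 0..7, so the 'T' (operand-size) and 'X' (REX) slots end up empty.

```c
#include <stddef.h>
#include <stdint.h>

/* "T05,id  eax,i32": opcode 05, then a 32-bit immediate, little-endian.
 * Assumes 32-bit operands are the default, so no operand-size prefix.
 * Writes 5 bytes to out and returns the count. */
size_t encode_add_eax_imm32(unsigned char *out, uint32_t imm)
{
    out[0] = 0x05;
    out[1] = (unsigned char)(imm & 0xFF);
    out[2] = (unsigned char)((imm >> 8) & 0xFF);
    out[3] = (unsigned char)((imm >> 16) & 0xFF);
    out[4] = (unsigned char)((imm >> 24) & 0xFF);
    return 5;
}

/* "TX01/r  rm32,r32": opcode 01, then a ModRM byte.
 * Register-direct only: mod = 11, reg = source, rm = destination.
 * Register numbers are 0..7 (eax=0, ecx=1, edx=2, ebx=3, ...),
 * so no REX prefix is needed. Writes 2 bytes and returns the count. */
size_t encode_add_r32_r32(unsigned char *out, unsigned dst, unsigned src)
{
    out[0] = 0x01;
    out[1] = (unsigned char)(0xC0 | ((src & 7) << 3) | (dst & 7));
    return 2;
}
```

so `add eax, 0x12345678` comes out as 05 78 56 34 12, and `add ecx, edx` as 01 D1.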

I had once considered having a verifiable x86 subset sort of like NaCl (but more like the JVM verifier), but this idea didn't really go anywhere.

 

 

when the assembler is compiled, a tool basically takes the instruction listings, and converts them into the relevant C source files and headers (basically, it converts them into big prebuilt tables).


Edited by cr88192, 22 March 2013 - 03:08 AM.





