Resources for writing a JIT compiler for my own markup graph system (with focus on x86 and Android)

0 comments, last by Adam_42 11 years, 4 months ago
I'm working on my own markup system, which is graph based (a la Kismet). It can currently output GLSL code, and it wouldn't be overly complex to extend it to target a scripting language such as Lua. I'm limited to the same instruction set supported by modern shaders, although things like bitwise operations wouldn't make it into the initial set.


I'm trying to come up with a contingency solution for when the relevant shader profile isn't available or isn't desired, and emulate the same functionality in software. As mentioned, it wouldn't be too difficult to do this through Lua or to make use of Dalvik directly on Android, but since I'm curious and somewhat stubborn, I'd like to at least try to write my own native code compiler, starting with x86. With a limited instruction set, no planned code-generation optimizations and no code parsing (since the input is a graph system), this should be a simple matter of stacking CPU instructions on top of each other.

There is a lot of material on the web on compiler programming, but I have a feeling I don't really need to delve into most of this. Can anyone suggest good webpages to get the kind of information I need or would I be better off just taking a reverse engineering approach, implementing all of the required functionality in a higher level language and working it out from the disassembly?

Note that I really am trying to jump the gun here by avoiding getting knee-deep in a field that has decades of experience and literature to show for it. However, I feel that in view of my modest requirements and no interest in focusing too heavily on the topic, the time required to parse through hundreds of resources could be more effectively spent elsewhere while still achieving what I need. If I'm being naive in my assumptions, don't be shy to just say so - this is something I'd like to try to do myself, not something I absolutely need.
To get decent performance I'd suggest an offline process that goes something like this (I'm assuming that generating HLSL as well as GLSL won't be too hard, as they are very similar):

1. Run fxc.exe on the HLSL to turn it into assembly instructions and optimize it.
2. Generate a list of variables. To simplify swizzling ("foo.zwx") I'd treat each component as a separate variable.
3. Convert each variable declaration and assembly instruction into C/C++ code. It might be simplest to write a function for each instruction, and then just generate a bunch of function calls, which the compiler should inline.
4. Compile the generated code in your favourite compiler on the target platforms.
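
Steps 2 and 3 above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the `PROGRAM` list, the `r0.xy` register naming, the `op_mul`/`op_add` function names and `run_shader` are all assumptions made up for the example, and real fxc output would need its own parser. Each swizzled operand is split into one scalar variable per component, and each shader instruction becomes one C function call per component.

```python
# Hypothetical shader-assembly-like instructions: (opcode, destination, sources).
# The opcode set and register names are illustrative only.
PROGRAM = [
    ("mul", "r0.xy", ["v0.xy", "c0.zw"]),
    ("add", "r1.x", ["r0.x", "r0.y"]),
]

def components(operand):
    """Split 'r0.zwx' into per-component names ['r0_z', 'r0_w', 'r0_x']."""
    reg, _, swizzle = operand.partition(".")
    return [f"{reg}_{c}" for c in (swizzle or "xyzw")]

def generate_c(program):
    """Emit C source: one float per component, one call per scalar operation."""
    variables = []
    calls = []
    for opcode, dest, sources in program:
        dest_parts = components(dest)
        source_parts = [components(s) for s in sources]
        for i, d in enumerate(dest_parts):
            # Assumes every source swizzle has as many components as the destination.
            args = [parts[i] for parts in source_parts]
            for name in [d] + args:
                if name not in variables:
                    variables.append(name)
            calls.append(f"    {d} = op_{opcode}({', '.join(args)});")
    decls = [f"static float {name};" for name in variables]
    return "\n".join(decls + ["", "void run_shader(void)", "{"] + calls + ["}"])

print(generate_c(PROGRAM))
```

One limitation of this sketch: because components are assigned one after another, an instruction that reads and writes the same register with different swizzles (for example `r0.xy` computed from `r0.yx`) would see partially updated values, so a real generator would need temporaries for that case.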

If you really want to generate machine code directly you need to replace steps 3 and 4. The new step 3/4 would be something like:

3. Allocate some memory to hold the variables, keeping track of where each one is. Then for each instruction in the program, translate that into a block of code, patching in the correct address for each variable and branch target. If the instruction is the start of a loop you'll also need to keep track of its address so you can branch to it at the end of the loop.
4. Generate a single ret instruction at the end so you can call it via a function pointer from your C/C++ code. Parameter passing can be done by writing to those variables.
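
As a rough sketch of what the replacement steps 3 and 4 might look like, the code below builds the bytes for a tiny program of scalar float adds and multiplies. It assumes 32-bit x86 with SSE, where a ModRM byte of `05` means "absolute 32-bit address" (in 64-bit mode the same byte means RIP-relative, so this would not work unchanged there). `VARIABLE_BASE`, the tuple format of the program and the function names are made up for the example; a real JIT would use the address of memory it actually allocated.

```python
import struct

# Assumed base address of the variable block (hypothetical value).
VARIABLE_BASE = 0x10000000

def allocate(names):
    """Give each float variable a 4-byte slot and record its absolute address."""
    return {name: VARIABLE_BASE + 4 * i for i, name in enumerate(names)}

def abs32(address):
    """Encode an absolute address as a little-endian 32-bit value."""
    return struct.pack("<I", address)

# Third opcode byte (after F3 0F) of the SSE scalar instruction for each op.
ARITH = {"add": 0x58, "mul": 0x59}  # addss, mulss

def emit(program, slots):
    """Translate (opcode, dest, lhs, rhs) tuples into 32-bit x86 machine code."""
    code = bytearray()
    for opcode, dest, lhs, rhs in program:
        # movss xmm0, [lhs]        F3 0F 10 05 <addr32>
        code += bytes([0xF3, 0x0F, 0x10, 0x05]) + abs32(slots[lhs])
        # addss/mulss xmm0, [rhs]  F3 0F 58|59 05 <addr32>
        code += bytes([0xF3, 0x0F, ARITH[opcode], 0x05]) + abs32(slots[rhs])
        # movss [dest], xmm0       F3 0F 11 05 <addr32>
        code += bytes([0xF3, 0x0F, 0x11, 0x05]) + abs32(slots[dest])
    code += bytes([0xC3])  # ret, so the block can be called via a function pointer
    return bytes(code)

slots = allocate(["x", "y", "r"])
machine_code = emit([("add", "r", "x", "y")], slots)
print(machine_code.hex())
```

To actually run the result, the bytes would have to be copied into memory that is marked executable (for instance via `mmap` with `PROT_EXEC` on Linux/Android or `VirtualAlloc` on Windows), and the variable slots would have to live at the addresses that were patched in.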

To create the block of machine code for each instruction you have a couple of options:
1. Use a C/C++ compiler to generate a function for each instruction as above. You then just need to generate a bunch of function calls passing the right parameters in machine code. This obviously adds a bunch of function calling overhead, but the functions themselves can be optimized, and you'll only have to generate a few different instructions (push address_of_variable; call address_of_function; test eax,eax; jnz start_of_loop_address).
2. Use a C/C++ compiler to generate code, and look at the assembly to work out where the addresses of the variables / branch targets go. This isn't as easy as you might think as the compiler may optimize things into registers, and use forms of instructions which take a relative offset instead of a complete address. So you'll have some work to do to get the instructions you need.
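
The instruction sequence from option 1, and the relative-offset issue mentioned in option 2, can be illustrated with a small sketch. It assumes 32-bit x86 and a cdecl-style callee, which is why an `add esp, 4` is inserted after the call to pop the pushed argument; that cleanup is an addition for the example and is not part of the sequence listed above. The function and parameter names are hypothetical. The key point is that `call` and `jnz` in their `rel32` forms store the distance from the end of the instruction to the target, not the target address itself.

```python
import struct

def rel32(target, next_instruction):
    """rel32 operands are relative to the address of the following instruction."""
    return struct.pack("<i", target - next_instruction)

def emit_call_and_loop(code_address, variable_address, function_address, loop_start):
    """Emit: push var; call fn; add esp,4; test eax,eax; jnz loop_start."""
    code = bytearray()
    # push imm32               68 <addr32>
    code += b"\x68" + struct.pack("<I", variable_address)
    # call rel32               E8 <rel32>   (instruction is 5 bytes long)
    next_ip = code_address + len(code) + 5
    code += b"\xE8" + rel32(function_address, next_ip)
    # add esp, 4               83 C4 04     (assumed cdecl caller cleanup)
    code += b"\x83\xC4\x04"
    # test eax, eax            85 C0
    code += b"\x85\xC0"
    # jnz rel32                0F 85 <rel32> (instruction is 6 bytes long)
    next_ip = code_address + len(code) + 6
    code += b"\x0F\x85" + rel32(loop_start, next_ip)
    return bytes(code)

block = emit_call_and_loop(
    code_address=0x1000,      # where this block will be placed
    variable_address=0x2000,  # argument pushed for the helper function
    function_address=0x3000,  # compiled helper implementing the instruction
    loop_start=0x1000,        # branch back to the top of the block
)
print(block.hex())
```

Because the offsets depend on `code_address`, the block has to be generated (or re-patched) for the exact location it will execute from.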


All of those options will require you to be comfortable with reading disassembly on your target platforms.

This topic is closed to new replies.
