Which CPU to model in a VM interpreter?

11 comments, last by DracoLacertae 11 years, 3 months ago

Suppose I'm writing my own virtual machine interpreter, and I want to target it with a 'real' language like C. I could retarget the back end of GCC, but I hear that work is horrendous. What I would rather do is abuse a GCC back end already written for another processor. Back in school, we had a stripped-down GCC that output a subset of MIPS, and we had to write a MIPS emulator, so it would be similar to that, just much more feature-rich.

I suppose this is more of a question for those who have written a lot of assembly language: what CPU do you think is best suited to being interpreted? Keep in mind that I don't necessarily need to run a modern-style OS on the VM, and the memory architecture can be simple. I'm even considering subsets of certain processors. For instance, a 386 in real mode has 32-bit registers and floating-point support (387), and there's a trick to address a flat 4 GB from real mode (called unreal mode). If my interpreter only acted as a 386 in 'unreal mode', that would be fine by me. But x87 opcodes are complex... I want a simpler processor.

Why do I want to do this? For fun. Several years ago, I wrote a little VM and my own language, and gave it bindings to a C library I wrote that did basic graphics. The VM would call functions like "load this image file" or "draw this image here", and the C library did all the grunt work. I kind of want to do the same thing again, but with OpenGL 3D graphics. I could invent a new pet language again, and I might, but if I could get plain C compiling to VM bytecode, that would be great. Some of the optimizations GCC does (loop unrolling, moving constants out of loops, etc.) would probably benefit bytecode programs quite a bit.

And before someone lists it as an x86 interpreter: I am considering just adding OpenGL calls to DOSBox. If I reserve certain addresses for communication with the outside of the VM, there wouldn't be anything stopping me from writing 3D-accelerated apps in QBasic or Turbo C!
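
A minimal sketch of the "reserved addresses" idea, assuming a flat 64 KB guest memory and a single reserved port. The names `Vm`, `HOSTCALL_PORT`, `vm_store8` and `vm_load8` are hypothetical and are not taken from DOSBox or any other emulator; a real host would dispatch to a graphics call where this sketch only records the request.

```c
#include <stdint.h>

/* Hypothetical layout: guest RAM is a flat byte array and one
   address is reserved as a "host call" port. */
enum { VM_MEM_SIZE = 65536, HOSTCALL_PORT = 0xFF00 };

typedef struct {
    uint8_t  mem[VM_MEM_SIZE];
    uint32_t last_hostcall;   /* most recent request, kept for illustration */
    unsigned hostcall_count;  /* how many requests the guest has made */
} Vm;

/* Every guest store goes through this function. A write to the
   reserved port is intercepted instead of landing in RAM. */
static void vm_store8(Vm *vm, uint16_t addr, uint8_t value)
{
    if (addr == HOSTCALL_PORT) {
        /* A real host would look up `value` and call into the
           native library (e.g. an OpenGL wrapper) here. */
        vm->last_hostcall = value;
        vm->hostcall_count++;
        return;
    }
    vm->mem[addr] = value;
}

/* Plain guest load; the reserved port is never written, so it reads as 0. */
static uint8_t vm_load8(const Vm *vm, uint16_t addr)
{
    return vm->mem[addr];
}
```

The guest program needs no special instructions with this approach: an ordinary store to the reserved address is enough, which is why it would work from any language that can poke memory.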

Your goals are fundamentally in opposition to one another.

If you want to use an existing and functional instruction set in your VM, you need to build a pretty rich VM.

On the other hand, if you want to build a simple VM that you can reasonably finish in a moderate timespan, you'll need to pick an instruction set that isn't terribly complicated.


x86 is probably the worst instruction set to try and emulate, at least as far as popular contemporary instruction sets go. x64 is a little better in terms of cleanliness but it's still huge.

I'd recommend a RISC architecture to begin with, since by definition they're mostly simple operations and their complexity comes from having lots of registers. Something old is probably good, too, but there are good modern RISCs out there like the ATMega line.


Actually, come to think of it, your sweet spot might be to do something like an Arduino software emulator. The hardware is well-specified and toolchains exist for it already, so you don't need to roll your own compiler; the instruction set is pretty clean; and the VM implementation can be trivially checked against other emulators or actual Arduino hardware if you really want.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

I'd suggest going for something for which there are existing emulators. For example the 68000 might be simple enough, but still has plenty of support around because it was used in the Atari ST and Amiga 500. It also has 32-bit registers so it's not too limited. It's also documented - http://www.easy68k.com/paulrsm/doc/68kprm.pdf

Of course you will have the added fun of it being big endian, so there might be better options around.
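
To make the endianness point concrete, here is a small sketch of how an emulator on a little-endian host might read and write 32-bit big-endian values in guest memory. The helper names `read_be32` and `write_be32` are made up for illustration; assembling the value byte by byte keeps the code independent of the host's own byte order.

```c
#include <stdint.h>

/* Read a 32-bit big-endian value: the most significant byte comes first. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Write a 32-bit value in big-endian order. */
static void write_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```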

The 68k is a total CISC beast. I wouldn't recommend trying to emulate it as something "simple."

If you want the extreme in simple, do the 6502. It only has a handful of registers and, with one-byte opcodes, no more than 256 of them.
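
With at most 256 opcodes, the whole interpreter core can be a single table lookup per instruction. The following is a rough sketch under heavy simplifications: it handles only four real 6502 opcodes (LDA immediate `0xA9`, TAX `0xAA`, INX `0xE8`, NOP `0xEA`), ignores the status flags and cycle counts entirely, and treats every other opcode (including BRK, `0x00`) as "halt". The `Cpu` struct and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint8_t  a, x;        /* accumulator and X index register */
    uint16_t pc;          /* program counter */
    bool     halted;
    uint8_t  mem[65536];  /* flat 64 KB address space */
} Cpu;

typedef void (*OpHandler)(Cpu *);

/* Handlers for a few opcodes. Flag updates are deliberately omitted. */
static void op_lda_imm(Cpu *c) { c->a = c->mem[c->pc++]; }
static void op_tax(Cpu *c)     { c->x = c->a; }
static void op_inx(Cpu *c)     { c->x++; }
static void op_nop(Cpu *c)     { (void)c; }
static void op_halt(Cpu *c)    { c->halted = true; }

static OpHandler op_table[256];

/* Fill the table: anything not implemented stops the machine. */
static void init_table(void)
{
    for (int i = 0; i < 256; i++)
        op_table[i] = op_halt;
    op_table[0xA9] = op_lda_imm;
    op_table[0xAA] = op_tax;
    op_table[0xE8] = op_inx;
    op_table[0xEA] = op_nop;
}

/* Fetch one opcode byte, dispatch through the table, repeat. */
static void run(Cpu *c)
{
    while (!c->halted) {
        uint8_t opcode = c->mem[c->pc++];
        op_table[opcode](c);
    }
}
```

A `switch` over the opcode byte would work just as well; the table form simply makes the "256 slots" limit visible.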

Of course, if you don't mind using someone else's emulator core to create your VM, you could use just about anything. You could also create your own VM machine code, but that would require you to write the assembly back end for whatever C compiler you used, which is far from trivial.

AVR32 and PowerPC would be my choices. Both RISC. Both well supported. Also both are well documented.

Go for MMIX.

It's a fictional processor Donald Knuth uses in The Art of Computer Programming books. 64-bit and RISC.

"I can't believe I'm defending logic to a turing machine." - Kent Woolworth [Other Space]

MMIX is a nice instruction set, but I'd be careful about using it to learn how to build a VM. It's a very idealistic design and while it's excellent stuff (hell, it's Knuth!) it won't teach you the ugly reality of dealing with actual hardware instruction sets etc.

I stick by ATMega because you can buy an Arduino and verify your VM's behavior. Trying to simulate a fast CPU is going to be extremely hard; trying to simulate a slow but effective CPU gives you room to eat the software emulation overhead without making the VM useless.

In other words, write a program that works nicely on an Arduino or similar ATMega chipset, and then pump it through the VM. If all goes well, your VM should be competitive in performance with the real hardware (which ain't gonna happen if you want to do modern PowerPC, and MMIX in particular has no reference hardware) and you can validate the VM's behavior for free.


Some notes regarding common RISC architectures:

0. Read about SPARC. It's a free architecture with full micro-architecture documentation!

1. As ApochPiq said, the Atmel AVR family is quite popular due to the Arduino platform.

2. The ARM architecture is the common/industry standard among hardware manufacturers.

3. Also, to whet your appetite: PowerPC is used in the Curiosity rover.

MMIX is a nice instruction set, but I'd be careful about using it to learn how to build a VM. It's a very idealistic design and while it's excellent stuff (hell, it's Knuth!) it won't teach you the ugly reality of dealing with actual hardware instruction sets etc.

Indeed, I upvoted RatTrap's post for bringing up the "The Art of Computer Programming" series!

Something else that no one has mentioned yet is LLVM (Low Level Virtual Machine). It is designed to be a platform for exactly the kind of thing you are thinking of. It is used as a platform onto which higher level virtual machines can be implemented. I would highly recommend that you check it out as it is very cool. Plus it is licensed under a very liberal open source license so if you have commercial aspirations that shouldn't be a problem.

http://vmkit.llvm.org/

LLVM is an awesome instruction set, but it's not the easiest thing ever to actually build a VM for. It's designed more for translation down to machine code than for live execution. There are a lot of very "strange" instructions that don't correspond in any way to hardware CPUs, and you have to understand them very well in order to execute LLVM bitcode (phi nodes and GEPs come to mind as just a couple of examples).
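
To illustrate why a phi node is awkward for an interpreter: its result depends on which basic block control just came from, so the interpreter has to remember the predecessor block on every branch. The sketch below is not LLVM's API; `PhiIncoming` and `phi_select` are hypothetical, blocks are identified by plain integers, and the incoming values are already-evaluated constants rather than SSA values.

```c
#include <stddef.h>
#include <stdint.h>

/* One (predecessor block, value) pair of a phi node. */
typedef struct {
    int     pred_block;  /* id of the block control may arrive from */
    int64_t value;       /* value the phi takes when arriving from it */
} PhiIncoming;

/* Pick the phi's value given the block we just left.
   Returns 1 and stores the value in *out on success,
   or 0 if no incoming entry matches (malformed input). */
static int phi_select(const PhiIncoming *in, size_t n,
                      int came_from, int64_t *out)
{
    for (size_t i = 0; i < n; i++) {
        if (in[i].pred_block == came_from) {
            *out = in[i].value;
            return 1;
        }
    }
    return 0;
}
```

No hardware CPU has an instruction like this, which is why compilers lower phi nodes into ordinary register moves before emitting machine code.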


