Most scripting language implementations include some form of VM (a few just execute the parse tree directly, but that approach is dying out).
The gist is pretty simple. You can build a naive VM for a custom language using a structure similar to:
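#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

enum class OpCode : uint8_t {
    Noop,
    LoadConstant,
    Add,
    Subtract,
    JumpIfEqual,
    JumpIfNotEqual,
    // etc.
    NumOpCodes
};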
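using RegisterName = uint8_t;
// A uint8_t can name up to 256 registers, so indices >= NumRegisters
// must be rejected when a program is loaded.
constexpr size_t NumRegisters = 128;
struct Instruction final {
    OpCode opCode = OpCode::Noop;
    RegisterName dest = 0;
    RegisterName src1 = 0;
    RegisterName src2 = 0;
    union {
        double dImmediate = 0.0; // constant operand (LoadConstant)
        size_t iImmediate;       // jump target (JumpIf*)
    };
};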
class Context final {
    std::array<double, NumRegisters> m_Registers{}; // zero-initialized
    std::vector<Instruction> m_Instructions;
    size_t m_CurrentInstruction = 0;
public:
    bool LoadAssembly(std::string const& path);
    void Execute();
};
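bool Context::LoadAssembly(std::string const& path) {
    // Parse a very simple assembly language here using istream and friends,
    // filling m_Instructions; report failure for a missing or malformed file.
    bool success = false; // placeholder until the parser is written
    return success;
}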
void Context::Execute() {
    while (m_CurrentInstruction < m_Instructions.size()) {
        // Fetch, then advance; jumps overwrite m_CurrentInstruction below.
        Instruction const& inst = m_Instructions[m_CurrentInstruction++];
        switch (inst.opCode) {
        case OpCode::Noop:
            break; // do nothing
        case OpCode::LoadConstant:
            m_Registers[inst.dest] = inst.dImmediate;
            break;
        case OpCode::Add:
            m_Registers[inst.dest] = m_Registers[inst.src1] + m_Registers[inst.src2];
            break;
        case OpCode::JumpIfEqual:
            if (m_Registers[inst.src1] == m_Registers[inst.src2])
                m_CurrentInstruction = inst.iImmediate;
            break;
        // etc.
        default:
            // std::exception has no string constructor in standard C++
            throw std::runtime_error("Unknown instruction");
        }
    }
}
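LoadAssembly is left as a stub above. As a rough idea of what the parsing might look like, here is a minimal sketch that assumes a made-up text syntax with one instruction per line (`load r0 1.5`, `add r2 r0 r1`, `jeq r0 r1 4`, `;` for comments). The mnemonics, the `ParseAssembly` and `ReadRegister` helpers, and the simplified `Instruction` (separate fields instead of a union) are all hypothetical, and it needs C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

enum class OpCode : std::uint8_t {
    Noop, LoadConstant, Add, Subtract, JumpIfEqual, JumpIfNotEqual
};

struct Instruction {
    OpCode opCode = OpCode::Noop;
    std::uint8_t dest = 0, src1 = 0, src2 = 0;
    double dImmediate = 0.0;     // constant for "load"
    std::size_t iImmediate = 0;  // target index for "jeq"/"jne"
};

// Reads a register token such as "r12"; returns false on malformed input
// or if the index is not below 128 (NumRegisters in the VM above).
static bool ReadRegister(std::istream& in, std::uint8_t& out) {
    std::string tok;
    if (!(in >> tok) || tok.size() < 2 || tok[0] != 'r') return false;
    unsigned value = 0;
    for (std::size_t i = 1; i < tok.size(); ++i) {
        if (tok[i] < '0' || tok[i] > '9') return false;
        value = value * 10 + static_cast<unsigned>(tok[i] - '0');
        if (value >= 128) return false;
    }
    out = static_cast<std::uint8_t>(value);
    return true;
}

// Parses the whole stream; returns std::nullopt on the first bad line.
std::optional<std::vector<Instruction>> ParseAssembly(std::istream& in) {
    std::vector<Instruction> program;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ls(line);
        std::string mnemonic;
        if (!(ls >> mnemonic) || mnemonic[0] == ';') continue; // blank or comment
        Instruction inst;
        bool ok = true;
        if (mnemonic == "noop") {
            inst.opCode = OpCode::Noop;
        } else if (mnemonic == "load") {
            inst.opCode = OpCode::LoadConstant;
            ok = ReadRegister(ls, inst.dest)
              && static_cast<bool>(ls >> inst.dImmediate);
        } else if (mnemonic == "add" || mnemonic == "sub") {
            inst.opCode = (mnemonic == "add") ? OpCode::Add : OpCode::Subtract;
            ok = ReadRegister(ls, inst.dest)
              && ReadRegister(ls, inst.src1)
              && ReadRegister(ls, inst.src2);
        } else if (mnemonic == "jeq" || mnemonic == "jne") {
            inst.opCode = (mnemonic == "jeq") ? OpCode::JumpIfEqual
                                              : OpCode::JumpIfNotEqual;
            ok = ReadRegister(ls, inst.src1)
              && ReadRegister(ls, inst.src2)
              && static_cast<bool>(ls >> inst.iImmediate);
        } else {
            ok = false; // unknown mnemonic
        }
        if (!ok) return std::nullopt;
        program.push_back(inst);
    }
    return program;
}
```

A LoadAssembly along these lines would open the file with std::ifstream, pass it to the parser, and store the result in m_Instructions only if parsing succeeded.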
You can build from there: support more than just double values, add a stack for function calls, and so on. You can read the specification for an existing bytecode format (say, Java's) and load that instead of a custom format, though a VM usually has to be tailored to one specific bytecode, since the format places requirements on the VM. You can study language parsing and compiling to get your own high-level language (or an existing one) compiling to your VM's bytecode format. With some work on the above you can tighten the binary format so that Instruction doesn't waste space on fields an opcode doesn't use, implement better instruction dispatch, and so on.
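As an illustration of the first of those extensions, a stack for function calls could look roughly like the following. This is only a sketch: the `Call`, `Return`, and `Halt` opcodes, the `Run` function, and `MakeDemoProgram` are hypothetical names, and only return addresses are stacked (a fuller design would also give each call its own registers or stack frame).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

enum class OpCode : std::uint8_t { LoadConstant, Add, Call, Return, Halt };

struct Instruction {
    OpCode opCode = OpCode::Halt;
    std::uint8_t dest = 0, src1 = 0, src2 = 0;
    double dImmediate = 0.0;     // constant for LoadConstant
    std::size_t iImmediate = 0;  // target instruction index for Call
};

// Runs the program and returns whatever is left in register 0.
double Run(std::vector<Instruction> const& program) {
    std::array<double, 256> registers{};   // 256 so every uint8_t index is valid
    std::vector<std::size_t> callStack;    // return addresses only
    std::size_t pc = 0;
    while (pc < program.size()) {
        Instruction const& inst = program[pc++];
        switch (inst.opCode) {
        case OpCode::LoadConstant:
            registers[inst.dest] = inst.dImmediate;
            break;
        case OpCode::Add:
            registers[inst.dest] = registers[inst.src1] + registers[inst.src2];
            break;
        case OpCode::Call:
            if (inst.iImmediate >= program.size())
                throw std::runtime_error("Call target out of range");
            callStack.push_back(pc);       // pc already points past the Call
            pc = inst.iImmediate;
            break;
        case OpCode::Return:
            if (callStack.empty())
                throw std::runtime_error("Return with empty call stack");
            pc = callStack.back();         // resume after the matching Call
            callStack.pop_back();
            break;
        case OpCode::Halt:
            return registers[0];
        }
    }
    return registers[0];
}

// Main code at indices 0-3, one "function" at indices 4-5.
std::vector<Instruction> MakeDemoProgram() {
    return {
        {OpCode::LoadConstant, 1, 0, 0, 2.0, 0}, // 0: r1 = 2
        {OpCode::LoadConstant, 2, 0, 0, 3.0, 0}, // 1: r2 = 3
        {OpCode::Call, 0, 0, 0, 0.0, 4},         // 2: call function at 4
        {OpCode::Halt},                          // 3: stop
        {OpCode::Add, 0, 1, 2},                  // 4: r0 = r1 + r2
        {OpCode::Return},                        // 5: back to 3
    };
}
```

Running the demo program leaves 5 in register 0: the Call pushes index 3, the function adds the two registers, and Return pops 3 so execution continues at Halt.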
Your VM's design is going to vary based on whether you're interpreting a dynamic language, a static language, a "trusted" language, etc. Lua's simple (and open source) VM is based on a dynamic trusted bytecode: the VM does not verify the bytecode it runs, so it is only safe when that bytecode comes from Lua's own compiler, and a host that runs untrusted scripts should accept source text only so that malformed or malicious instructions never reach the VM. Java (and there are open source JVMs you can inspect) uses a static untrusted bytecode. .NET/C#/VB (Mono is an open source implementation of a .NET VM) uses a more advanced static untrusted bytecode.
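To make the trusted/untrusted distinction concrete: a VM that accepts untrusted bytecode has to validate it before running it, while a trusted VM can skip those checks. Below is a minimal sketch of such a load-time check for the toy instruction set from earlier. The `Verify` function and the simplified `Instruction` are hypothetical, and a real verifier for a static bytecode (the JVM's, for example) also checks types and control flow, which this does not attempt.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class OpCode : std::uint8_t {
    Noop, LoadConstant, Add, Subtract, JumpIfEqual, JumpIfNotEqual, NumOpCodes
};

constexpr std::size_t NumRegisters = 128;

struct Instruction {
    OpCode opCode = OpCode::Noop;
    std::uint8_t dest = 0, src1 = 0, src2 = 0;
    double dImmediate = 0.0;
    std::size_t iImmediate = 0; // jump target
};

// Returns an empty string if the program passes the checks,
// otherwise a description of the first problem found.
std::string Verify(std::vector<Instruction> const& program) {
    for (std::size_t i = 0; i < program.size(); ++i) {
        Instruction const& inst = program[i];
        std::string const where = "instruction " + std::to_string(i) + ": ";

        // Opcode bytes read from a file can hold any value, not just valid enumerators.
        if (static_cast<std::uint8_t>(inst.opCode) >=
            static_cast<std::uint8_t>(OpCode::NumOpCodes))
            return where + "unknown opcode";

        // A uint8_t register name can exceed the 128-entry register file.
        if (inst.dest >= NumRegisters || inst.src1 >= NumRegisters ||
            inst.src2 >= NumRegisters)
            return where + "register index out of range";

        bool const isJump = inst.opCode == OpCode::JumpIfEqual ||
                            inst.opCode == OpCode::JumpIfNotEqual;
        // A target equal to program.size() is allowed: it simply ends execution.
        if (isJump && inst.iImmediate > program.size())
            return where + "jump target out of range";
    }
    return {};
}
```

With checks like these done once at load time, the Execute loop can index the register array without bounds checks on every instruction, which is the usual trade a verifying VM makes.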
You can then look into libjit or various related libraries for generating raw machine instructions, which avoids much of the need for a custom VM.
For a top-of-the-line VM with JITting, you'll want to start reading the source code of things like LuaJIT or the various JavaScript engines. The state of the art is very advanced and complex, but thankfully there are a hobajillion articles on the Web about the advancements made in the open source JavaScript VMs, LuaJIT, and even the experimental Java VMs. The tricks used to enable JIT support in a dynamic language are far from trivial, so if you're interested in exploring JITting you might find it easier to implement a statically typed VM, at least at first.