Idea Spec: FRBC Design
while it is still not settled whether I will do much with this, I did go and start working on a spec for a new bytecode format.
intended source languages would probably be BGBScript and C. (the design is based on an incomplete IR I was originally intending as a new IR for BGBScript, but following a recent idea it is mostly being re-purposed for use with C).
specifics are still not entirely settled, nor are things really complete, but the goal is more to sketch out what the "completed" form would look like.
it seems at least possible to consider compiling C to a bytecode-based IL;
this could potentially allow running the same binary code on more than one target architecture (either via interpretation, or via JIT or AOT compilation).
the spec in this case is using a statically-typed register-machine IL;
as expected of an IL for use with C, it will provide support for pointers and for pointer arithmetic.
however, another sub-goal is to allow "verifiable safety" for operations: pointers remain available and can do all their usual "pointer stuff", but it is still possible to keep the code sandboxed and avoid it stomping memory. this will come at some performance cost. the cheaper of the options in this case is to generally use fat-pointers for areas of code making frequent use of pointers, while possibly falling back to narrow pointers for cases where fat-pointers are unnecessary or would create other issues (such as hindering inter-operation with native code, which generally uses solely narrow pointers).
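as a rough illustration of the fat-pointer idea, here is a minimal sketch in C. the layout (`cur`, `base`, `limit`) and the names `FatPtr`, `fat_make`, `fat_add`, `fat_in_range`, and `fat_load_u8` are assumptions made up for this example and are not taken from the spec; addresses are held as `uintptr_t` so that out-of-range intermediate values stay well-defined in the host C code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical fat-pointer layout: the current address plus the
   bounds of the object the pointer was derived from */
typedef struct {
    uintptr_t cur;   /* address the pointer currently refers to */
    uintptr_t base;  /* first byte of the source object */
    uintptr_t limit; /* one past the last byte of the source object */
} FatPtr;

/* build a fat pointer to the start of an object of the given size */
FatPtr fat_make(void *obj, size_t size) {
    FatPtr p;
    p.base = (uintptr_t)obj;
    p.cur = p.base;
    p.limit = p.base + size;
    return p;
}

/* pointer arithmetic only moves 'cur'; the bounds travel along unchanged */
FatPtr fat_add(FatPtr p, ptrdiff_t off) {
    p.cur += (uintptr_t)off;
    return p;
}

/* the whole access [cur, cur+len) must lie inside [base, limit) */
bool fat_in_range(FatPtr p, size_t len) {
    return p.cur >= p.base && p.cur <= p.limit &&
           len <= (size_t)(p.limit - p.cur);
}

/* checked byte load: returns false rather than touching memory out of
   range (a real VM would presumably raise a trap here instead) */
bool fat_load_u8(FatPtr p, uint8_t *out) {
    if (!fat_in_range(p, 1))
        return false;
    *out = *(const uint8_t *)p.cur;
    return true;
}
```

the check itself is just a couple of compares against values already carried with the pointer, which is where the cost advantage over a lookup would come from.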
an alternative to the use of fat-pointers is basically to use lookups on access to implement barrier checks, but this has a few drawbacks:
lookups aren't free (they would cost more than validating that a pointer is still in range);
they won't detect cases where the pointer has "jumped ship" from one memory object to another (detecting this is possible if intermediate operations are done with fat pointers).
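the "jumped ship" problem can be sketched as follows. the `Region` table and the functions `region_lookup`, `lookup_check`, and `origin_check` are hypothetical names for this example only; the linear scan is there for clarity, and a real lookup would presumably use something faster.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical record of one live memory object */
typedef struct {
    uintptr_t base;
    size_t size;
} Region;

/* find the object (if any) that contains the address */
const Region *region_lookup(const Region *tab, size_t n, uintptr_t addr) {
    for (size_t i = 0; i < n; i++)
        if (addr >= tab[i].base && addr - tab[i].base < tab[i].size)
            return &tab[i];
    return NULL;
}

/* lookup-style barrier: only asks "is this address inside *some* object?" */
bool lookup_check(const Region *tab, size_t n, uintptr_t addr) {
    return region_lookup(tab, n, addr) != NULL;
}

/* fat-pointer-style check: asks "is this address still inside the
   object the pointer was originally derived from?" */
bool origin_check(const Region *origin, uintptr_t addr) {
    return addr >= origin->base && addr - origin->base < origin->size;
}
```

if two 16-byte objects sit back to back, a pointer derived from the first and advanced by 20 bytes passes `lookup_check` (it lands inside the second object) but fails `origin_check` against the first.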
however, by themselves, fat-pointers still don't necessarily address the matter of type-safety, namely validating that a pointer held within a data-structure is not overwritten.
still, many of these cases can be at least partially detected and dealt with:
where the code can't be "proven safe", it is possible to fall back to the use of explicit checks;
fat-pointers can be kept outside of memory areas writable by the code itself (in these cases, things like structures will typically only contain narrow pointers, and if the VM has reason to suspect that a pointer might have been overwritten by an operation, it can verify that whatever address is held is still "valid" in that it refers to an accessible region of memory).
this sort of validation is likely to be needed in cases involving things like accessing pointers within unions, explicit conversions between pointers and integers, and generally cases where "suspicious" memory operations are performed involving an object (such as a memset or memcpy call, ...).
in this sense, the memory manager will also take on a role in memory-access validation (well, and probably also other things, like performing leak detection, ...).
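one way this re-validation might look: a narrow pointer read back from a structure is checked against the memory manager's records, and a fresh fat pointer is only produced if the address falls inside a known block. everything named here (`Block`, `Node`, `guest_clear_node`, `widen_checked`) is a hypothetical stand-in for this sketch, not something defined by the spec.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct { uintptr_t base; size_t size; } Block;   /* memory-manager record */
typedef struct { uintptr_t cur, base, limit; } FatPtr;   /* kept outside guest memory */

/* example guest structure: holds only a narrow pointer */
typedef struct {
    int tag;
    uint8_t *data;
} Node;

/* a "suspicious" operation: raw bytes written over a struct holding a pointer */
void guest_clear_node(Node *n) {
    memset(n, 0, sizeof *n);
}

/* re-validate a narrow pointer and, if it refers to an accessible block,
   rebuild a fat pointer from the memory manager's bounds */
bool widen_checked(const Block *blocks, size_t n, const void *narrow, FatPtr *out) {
    uintptr_t addr = (uintptr_t)narrow;
    for (size_t i = 0; i < n; i++) {
        if (addr >= blocks[i].base && addr - blocks[i].base < blocks[i].size) {
            out->cur = addr;
            out->base = blocks[i].base;
            out->limit = blocks[i].base + blocks[i].size;
            return true;
        }
    }
    return false; /* not inside any known block: treat as invalid */
}
```

note that this only shows the address is accessible; it does not show that it is the same pointer that was originally stored, which is the type-safety gap mentioned above.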
granted, in keeping with C tradition, we can't necessarily trap as soon as an invalid pointer is generated (this could prevent some otherwise valid code from working), but ideally we need to trap when either:
memory is actually accessed via such an invalid pointer (behaving in this case more like a traditional memory access exception);
such a pointer moves into a "required safe" area, which basically means where it is known statically that an access will occur via this pointer (or alternatively, where no later validation checks would be possible).
the latter case is mostly relevant when converting fat-pointers back into narrow pointers, where it could be required that whenever such an operation occurs, the pointer must be pointing at a "valid" address for the source object.
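a small sketch of this trapping policy, under the assumption that arithmetic never traps and that narrowing is the point where validity is enforced. `VmStatus`, `fat_add`, and `fat_narrow` are hypothetical names; the sketch also assumes a one-past-the-end address does not count as "valid" for narrowing, which a real design might well decide differently.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t cur, base, limit; } FatPtr;
typedef enum { VM_OK, VM_TRAP_BOUNDS } VmStatus;

/* arithmetic is always allowed, even if the result is temporarily invalid */
FatPtr fat_add(FatPtr p, ptrdiff_t off) {
    p.cur += (uintptr_t)off;
    return p;
}

/* fat -> narrow conversion: the bounds are about to be lost, so this is
   the last point where the pointer can be checked against its source object */
VmStatus fat_narrow(FatPtr p, void **out) {
    if (p.cur < p.base || p.cur >= p.limit)
        return VM_TRAP_BOUNDS; /* stand-in for raising a trap */
    *out = (void *)p.cur;
    return VM_OK;
}
```

with this, code that steps a pointer to the end of an array and then back by one element still works, since only the final in-range value is ever narrowed or accessed.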
if code is "untrusted", all this probably also means denying access to any memory which it doesn't specifically have access to, and similarly placing limits on what sorts of API functions may be used (say, untrusted code is limited to a set of white-listed wrapper APIs, and may not have direct access to the underlying native APIs).
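the white-list part could be as simple as a table consulted when the VM resolves a call. the API names below (`vm_print`, `vm_alloc`, `native_open`) and the `api_allowed` function are invented for this sketch; the assumed policy is that unknown names are denied to everyone and non-listed native APIs are denied to untrusted code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* hypothetical API table: which entry points untrusted code may call */
typedef struct {
    const char *name;
    bool untrusted_ok; /* true if white-listed for untrusted code */
} ApiEntry;

static const ApiEntry api_table[] = {
    { "vm_print",    true  }, /* wrapper API, safe to expose */
    { "vm_alloc",    true  }, /* wrapper API, goes through the memory manager */
    { "native_open", false }, /* underlying native API, trusted code only */
};

/* decide whether a call to 'name' may be resolved for this code */
bool api_allowed(const char *name, bool trusted) {
    for (size_t i = 0; i < sizeof api_table / sizeof api_table[0]; i++)
        if (strcmp(api_table[i].name, name) == 0)
            return trusted || api_table[i].untrusted_ok;
    return false; /* unknown names are always denied */
}
```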