There has been a lot of interest on the forums in creating custom programming languages. Since I have done this, I was thinking of doing a semi-retrospective, semi-rewrite to illustrate how to build a working programming language interpreter. I had been considering this for quite some time, but I didn't quite have the motivation to do it.
I'm going to start with a basic Lisp syntax, using a hand-rolled parser. Integrating a proper parser is a hurdle that I don't want to place too early in the process. I do plan to do it eventually so I can use a nice custom syntax. Most of my initial effort will be concentrated on getting the interpreter up to scratch. Its working title is "Rasp", which is an ugly mix of my user name and "Lisp".
I haven't tied down all the details yet, but I'm aiming for a semi-dynamic language, with optional explicit but otherwise inferred strong typing. That is, a function can be written without mentioning types. If a variable is used in a context where the type is known (e.g. passed to a library function or explicitly typed) then the type will be inferred, and trying to invoke the function with an incorrect type will be an error. Types will be inferred recursively, so a generic function calling a second generic function which calls a typed function will have the types propagated all the way back. In certain circumstances, I expect the type will not be deducible. For the moment, I'm going to do something similar to "autoboxing", where in such cases the program will generate type errors at runtime rather than compile time. Variables would be strongly typed: once a variable is given a type (explicitly via declaration or implicitly through first assignment), it would not be legal to assign it a value of an unrelated type.
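To make the "propagated all the way back" idea concrete, here is a minimal sketch in C++. It is not how Rasp does (or will do) inference; it assumes a deliberately tiny model in which every function takes one parameter and forwards it to at most one callee. The names `Type`, `Function`, `Program`, `inferParamTypes`, `callIsAccepted` and `exampleProgram` are all made up for this illustration.

```cpp
#include <map>
#include <string>

enum class Type { Unknown, Int, String };

// Hypothetical model: one parameter per function, forwarded to at most
// one callee. An empty callee name means the function calls nothing.
struct Function {
    Type paramType;
    std::string callee;
};

using Program = std::map<std::string, Function>;

// Copy known parameter types from callees back to their callers,
// repeating until a pass changes nothing (a fixed point).
void inferParamTypes(Program& program) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (auto& entry : program) {
            Function& fn = entry.second;
            if (fn.paramType != Type::Unknown || fn.callee.empty()) continue;
            auto callee = program.find(fn.callee);
            if (callee != program.end() &&
                callee->second.paramType != Type::Unknown) {
                fn.paramType = callee->second.paramType;
                changed = true;
            }
        }
    }
}

// Would a call to `name` with an argument of type `arg` pass the static check?
// A parameter that is still Unknown is accepted here; in this sketch that
// stands in for "defer the check to runtime".
bool callIsAccepted(const Program& program, const std::string& name, Type arg) {
    auto it = program.find(name);
    if (it == program.end()) return false;
    return it->second.paramType == Type::Unknown || it->second.paramType == arg;
}

// outer -> inner -> printInt, where only printInt is explicitly typed.
Program exampleProgram() {
    Program p;
    p["printInt"] = Function{Type::Int, ""};
    p["inner"]    = Function{Type::Unknown, "printInt"};
    p["outer"]    = Function{Type::Unknown, "inner"};
    p["lonely"]   = Function{Type::Unknown, ""};  // nothing to infer from
    return p;
}
```

After `inferParamTypes` runs on `exampleProgram()`, `outer` and `inner` both pick up `Int` from `printInt`, so a call to `outer` with a string would be rejected up front. `lonely` stays `Unknown`, which is the case where the check would have to happen at runtime.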
Some or all of these ideas might be compromised if they end up contradicting each other, if they overly complicate the implementation, or finally if they just aren't nice to use. In any case, almost none of them will be present early on. Hopefully it won't be too difficult to add them later.
Some of the more advanced things I'm interested in investigating are automatic type decomposition to enable "data oriented programming" without imposing a large burden on the programmer, or possibly an explicit way of separating the physical arrangement of the data from the logical arrangement, to put the programmer in charge of their data layout. Other ideas include a hybrid approach to memory, using RAII where possible and garbage collection only where cycles can occur. I think the type system I'm considering might sometimes leave the compiler unable to infer a useful type; it would end up inferring something like "Object" in managed languages, though I'd prefer to omit such an arbitrary base class from the language if possible. To achieve this I may be forced to use explicit types when defining data types. I'll need to investigate.
I'd also like a language that helps programmers with multi-threaded code, or at least doesn't hinder them in quite the same way that other languages do. The plans I have for this are more nebulous at the moment, so we'll see how it goes. Maybe something like language support for transactions, which can be rolled back and treated as a critical section.
I'm also thinking about an alternative to exceptions and error codes, or possibly merging them together. For example, with checked return codes the compiler would reject code that does not test the return value of marked functions. I'd also like to investigate some kind of language support to ease the burden of passing error codes through a function call hierarchy, which might be aided by the inference engine I'm including. We'll see.
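For a feel of what "checked return codes" might look like, the closest existing thing I can point at is C++17's `[[nodiscard]]` attribute. It only asks the compiler for a warning rather than a hard rejection, so treat this as an approximation of the idea and not as Rasp's design. `ErrorCode`, `findUser` and `greetUser` are hypothetical names invented for the example.

```cpp
#include <string>

// Marking the type [[nodiscard]] asks the compiler to warn whenever a
// function returning ErrorCode has its result silently dropped.
enum class [[nodiscard]] ErrorCode { Ok, NotFound };

// Hypothetical lookup that can fail.
ErrorCode findUser(const std::string& name) {
    return name == "alice" ? ErrorCode::Ok : ErrorCode::NotFound;
}

// Passing the error up one level by hand. This check-and-return
// boilerplate is the burden that language support might remove.
ErrorCode greetUser(const std::string& name, std::string& greeting) {
    const ErrorCode err = findUser(name);
    if (err != ErrorCode::Ok) {
        return err;
    }
    greeting = "hello, " + name;
    return ErrorCode::Ok;
}
```

Writing a bare `findUser("bob");` as a statement would draw a warning from compilers that honour the attribute. The stricter version I have in mind would refuse to compile it at all.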
Long term goals include optimising the implementation to the point that people might want to actually use it, and eventually JIT compilation using LLVM. I would consider targeting a prewritten VM, but I don't think it will allow me to get to some of the juicier advanced topics.
But the first few iterations will be very simple. I threw together an interpreter yesterday using my old "Lisp" project as a base. It lacks a parser. I'll get something simple working for the weekend, during which I hope to publish (or at least write) my first "article". I am currently testing it by building a raw instruction list and executing it. The first iteration will support integer literals and calling arbitrary C++ functions, hopefully even in a nested manner.
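To give an idea of what "building a raw instruction list and executing it" can mean, here is a minimal sketch of a stack-based evaluator in C++. It is an assumption-laden illustration, not the actual Rasp code: the `Instruction` layout, the `pushInt`/`callNative` helpers, and the `addInts`/`mulInts` functions are all invented for this example, and native functions are simplified to take a vector of ints.

```cpp
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <vector>

// A callable C++ function, simplified to "ints in, int out".
using NativeFn = std::function<int(const std::vector<int>&)>;

// Hypothetical instruction: either push a literal or call a native function.
struct Instruction {
    enum class Op { PushInt, CallNative } op;
    int value;          // used by PushInt
    NativeFn fn;        // used by CallNative
    std::size_t argc;   // used by CallNative: number of arguments to pop
};

Instruction pushInt(int value) {
    return Instruction{Instruction::Op::PushInt, value, NativeFn(), 0};
}

Instruction callNative(NativeFn fn, std::size_t argc) {
    return Instruction{Instruction::Op::CallNative, 0, fn, argc};
}

// Run the instructions in order against an operand stack and return the
// single value left at the end.
int execute(const std::vector<Instruction>& program) {
    std::vector<int> stack;
    for (const Instruction& ins : program) {
        if (ins.op == Instruction::Op::PushInt) {
            stack.push_back(ins.value);
        } else {
            if (stack.size() < ins.argc) {
                throw std::runtime_error("stack underflow");
            }
            // The last argc values on the stack are the arguments, in order.
            const auto first = stack.end() - static_cast<std::ptrdiff_t>(ins.argc);
            const std::vector<int> args(first, stack.end());
            stack.erase(first, stack.end());
            stack.push_back(ins.fn(args));
        }
    }
    if (stack.size() != 1) {
        throw std::runtime_error("program must leave exactly one value");
    }
    return stack.back();
}

int addInts(const std::vector<int>& a) { return a.at(0) + a.at(1); }
int mulInts(const std::vector<int>& a) { return a.at(0) * a.at(1); }

// The nested call (add 1 (mul 2 3)), flattened by hand since there is no parser.
std::vector<Instruction> nestedExample() {
    return { pushInt(1), pushInt(2), pushInt(3),
             callNative(mulInts, 2), callNative(addInts, 2) };
}
```

Calling `execute(nestedExample())` evaluates the inner multiplication first and then the addition. Nesting falls out of the stack discipline for free: the result of an inner call is simply another value waiting on the stack when the outer call runs.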
I started from scratch, not because I wanted to but because that is the best way to demonstrate the iterative process over the series of articles. Even stripped down to quite basic functionality, the base code is not small. I believe it is about the smallest useful size that will give me a good bit to talk about without being overwhelming. Hopefully it will be about the size that my intended audience can manage.
I know I can get it to the point where my last Lisp language was, which was reasonably powerful but with glaring omissions, notably user defined types. I should be able to get a few articles out before that point, then I can start branching into some of the topics I mentioned above.
The main complication is that my time is divided because I'm working on a game with a friend at the moment. We're getting close to a potential test of a gameplay mockup; we're aiming to have the program ready by the end of January and possibly conduct some tests in the months following.