Jump to content

  • Log In with Google      Sign In   
  • Create Account





Epoch - Generics vs. Templates

Posted by ApochPiQ, 08 May 2012 · 1,159 views

Epoch
This is mostly just some rambling to try and organize my own thoughts regarding the Epoch type system, but it may also be interesting for general consumption, so I decided to post it here instead of just leaving it in my scratch book.


One of the big concerns I have for the next steps of Epoch's development is supporting generic programming. In a nutshell, this means writing code that is type agnostic - it works regardless of what's put into it, perhaps with some constraints, and doesn't require duplication for handling different types of data.

C++ templates are one approach to this; Java/C# have generics; and there are other approaches which are less type-safe such as the void* convention used in C.

Part of my goal in Epoch is to have a very rich type system, so anything weakly typed like void* is out. That leaves two major competing approaches, which I'm roughly dividing into template style and generic style. Perhaps, though, a more accurate naming convention would be compile-time genericity versus run-time genericity, which I think better captures the essential points of each respective philosophy.


Compile-time Genericity
This is the C++ template model. In brief, you have some code-like "stuff" which is not really code, but rather a sort of instruction set for how to generate some code. The fact that templates look a lot like real code is a major tripping point for a lot of newcomers, and one of the big reasons why I'm cautious about just jumping into compile-time genericity for Epoch.

Once you write a template, you then instantiate it by asking the compiler to "fill in the blanks" using a specific set of arguments. These arguments can be types or certain values in C++, but generally speaking we're more interested in the types aspect in this discussion. You can think of this process as sort of a mad-lib. The template gives you a, well, template - and the arguments are what you plug into the gaps in that template.

Whether you get reasonable code or nonsense depends a lot on what you put into the blanks. Of course, unlike a mad-lib, our goal is generally to get something sensible out instead of something wacky and humorous; but the analogy still works.

The problem is that the mechanisms used for this make it very difficult to explain why we get nonsensical results. A human reading a mad-lib knows intuitively that it sounds funny because you used "watermelon" as a noun instead of "uncle" or whatever; this alters the meaning of the resulting code and makes it nonsense. However, a compiler reading a template doesn't know that you can't multiply two strings. It just vomits out a type error which, generally speaking, will be long, complicated, hard to read, minimally useful, and scary as all hell.

I believe that with some serious engineering it could be possible to emit very smart error messages in the presence of template-generated compilation errors, but it would be a hard problem. The fact that C++ templates have existed for so long and still get a bad rap for their error quality is an indicator of how tough this issue really is.

There's one final aspect of compile-time genericity which I'd like to emphasize before moving on: it really is compile time. Once the code is generated from our template, we can't get back to the template itself anymore, let alone to anything interesting with it at runtime. This means that a container of Derived objects is not compatible with a container of Base objects, because the compiler has specialized the code for Derived and Base respectively. It no longer remembers - especially not while the program is running - that both containers are, say, linked lists, and both contain objects that should allow the container of Derived to be used in place of a container of Base, via the Liskov Substitution Principle and a little bit of creative interpretation.

That leads us right into...


Run-time Genericity
This is the meaning of the term "generics" as usually used by languages like Java and C#. Generics are similar to templates only in that they try to allow us to write code that operates in a type-agnostic fashion. Other than that, they are very different approaches.

In the generics world, we don't fill out a mad-lib and then throw away the original template; instead, we annotate the so-called "generic type" to indicate that it is in fact generic. The compiler uses this annotation to make sure that we don't try to put a string into a list of integers, but the really interesting bit is all done at run-time. In a JIT compiled language this gets rather hairy because there is a measure of template-like code generation going on, but the fundamental principle of generics is that they operate on run-time accessible type information instead of just baking the code together at compile time.

The implementation details of generics vary in different languages and platforms, but the principles remain roughly the same. Some generics systems are substantially more powerful and safe than others, for instance, but all generics systems essentially rely on run-time state to help track what's put into a generic object and what can be safely pulled back out.


Orthogonal Concerns
There are a couple of things worth mentioning which are orthogonal to compile-vs-run-time genericity. The first is constraints. One nice thing about C# generics is you can say "you can put anything in here you like, as long as it obeys these rules". Doing this in C++ is possible using policy-based design for instance, but is substantially harder to do correctly, and when it breaks, it spews out a proper tome of compiler errors that require superhuman unraveling skills to decipher.

I believe that a stronger type system than that used in classic C++ could allow for good template constraints and still emit useful error messages; one example of this was the proposal for constraints that was considered for the C++11 standard. Given that Epoch starts from an even stricter and more potent type system, I think it would be within the realm of possibility to do good constraints systems regardless of how the genericity is obtained initially.

The other interesting facet of all this is subtyping. As is well known, subtyping can wreak havoc with type inference, and combined with genericity can make a lot of type inference questions undecidable. That's obviously bad. Subtyping is useful for a lot of program models, but not strictly necessary, and I think there are ways to achieve similar power without actually using traditional OO-style inheritance and other subtyping approaches.

The critical observation on that front is that interfaces and implementations are traditionally things which look tightly coupled but need not be. Anyone who has used Epoch in its current incarnations will notice that it looks remarkably procedural: there are no member methods or functions allowed on objects. (Objects can hold function pointers, but that's not the same thing.)

My idea here is to make the split explicit in the language design itself. Instead of having "classes" which combine data and functionality, I want to keep everything cleanly separated. There will be representations and contracts.

A representation can be any type, including a primitive, a sum type, an aliased type, a structure, a generic type, and so on. A contract is a mechanism by which you interact with an opaque entity. Contracts do not carry any guarantees of how they do their job; they just specify how you can interact with an object.

To get a traditional class, you write a representation, then a contract that describes the functionality. There may or may not be a step where you say "this type means 'facilitate this contract using this representation'" but I haven't really thought it out that far yet.



I wanted to concretize some of this with a hypothetical code example, but I couldn't think of anything offhand that really suits the idea. Either this needs more thought in general, or I need more sleep in general. Take your pick.




Something I always found interesting was D's "template mixin" concept - it was like a scope-aware C++ template mixed with macros.

If you're asking us to take our pick in terms of Epoch - I'd say "pick whichever fits the purpose of the language the best". The classic "it depends" answer, but it's really about fit to the problems you want to solve. From what you've said about contracts, etc - you may wish to look into Microsoft research's Singularity OS project, specifically Spec#/Sing# which is designed from the ground up to work with contracts.
Generics in java are compile-time generics too (don't know about C#). Once the code has been compiled, the additional class informations are removed. That's the reason you can't determine at runtime if it is a ArrayList<Integer> or ArrayList<Boolean>.

Living in both worlds, C++ and Java, for more then a decade, I would always prefer templates over generics. Generics, atleast in java, are the clumpsy way to add template support without reengineering the whole language or adding a pre-compiler.
Yeah, I definitely hate Java generics, although I tried to keep my disparaging remarks about them out of the post :-)

Type erasure is just a really bad solution to generic implementation, and I don't consider it an option for any newly designed, serious language. .NET doesn't use type erasure, so it's a much better approach IMO.

I'm honestly leaning towards .NET style generics because they have a lot of advantages and are in some ways easier to implement than true compile-time templates, at least if you want quality code generation. I'm willing to be swayed, though :-)

October 2014 »

S M T W T F S
   1234
567891011
12131415161718
1920 21 22232425
262728293031 

Recent Comments

PARTNERS