designing a programming language for fun and learning

Started by
7 comments, last by Seoushi 14 years, 11 months ago
So I've decided to design a programming language, mostly for fun and learning, before I start doing anything serious next year when I get into college. Right now I'm having trouble figuring out whether this type system would work - so I thought maybe you could help me out and share some ideas. Basically, the language would be strongly/statically typed (types must be determinable at compile time), but types would be implicit by default. Example:
class Foo:
   def bar
   def someint as int

   def __init__():
      someint = 1
      # this would generate an error since bar is not assigned to and the type of 'bar' can't be determined


   def __init__(x):
      bar = x # here bar has the same type as the parameter x, which can only be determined by the compiler
      someint = 0

   def __init__(x as int):
      bar = x # here the bar is int
      someint = 1.0f # error - someint is declared as int

Class members don't need to have their types specified until object construction. After the constructor, all members must have been assigned to and have their types determined. The new object's type will be an instantiation of the partially defined class. So if I construct Foo like this:
a = new Foo(new X())

The typeof(a) would be Foo<bar=X>. I could also use this notation to specify types explicitly, e.g.:
a = new Foo<bar=int>(1.0f)

This would generate an error, since the member bar is explicitly typed to int. It's obvious that for a given Foo<bar=X> the constructor taking an int would not work and would generate a compile-time error, but this does not seem like a problem IMO (it would be like specialization in C++ templates).

My questions:

A) Would this type system work, and/or what problems do you see with this approach? Are there other languages featuring a similar type system?

B) Since this code would be statically compiled, the compiler would have to generate code for each type combination of a class/function/method. This would probably produce large binary files, but what concerns me more is how it would affect runtime performance. Is the CPU smart enough with instruction loading that it can efficiently cache code in parallel with program execution, or is the loading serial? E.g.:

-------predict calls and load----------
----execute-----------------call-------

or: execute -> call -> cache miss, load -> continue

Would this cause big performance penalties?

C) One obvious problem is that virtual members need to have fully defined types, but if I am already creating a specialized function for each type combination, can I ignore the vtable and inheritance and treat classes as structures at the code-generation level? E.g.:

class X:
   ...

class Y(X):
   ...

def foo(object as X):
   ...

y = new Y()
foo(y as X)


When I call foo(y as X), the compiler 'knows' that y is actually of type Y. Instead of generating code for one function foo that takes an X-typed object and calls methods through a vtable, the compiler generates a specialized instance of foo which accepts and works with Y; only the semantic analyser "pretends" that foo works with type X. Classes, inheritance, etc. then become features of the semantic-analysis level, while at the code-generation level X and Y have nothing to do with each other, except that X's members are a subset of Y's members.

Anyway, I have this strange feeling that there is some problem with this approach which would prevent it from working, but I can't really tell what it is - so any comments/ideas/suggestions/corrections/further reading links are welcome :)
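To make the devirtualization idea concrete, here is a rough Python sketch (illustration only - the helper make_specialized_foo and everything except the class names X and Y from the example above is hypothetical). Instead of one foo that dispatches through a vtable, the "compiler" emits one specialized foo per concrete type seen at a call site, with the method resolved once up front:

```python
class X:
    def name(self):
        return "X"

class Y(X):                    # Y's members are a superset of X's
    def name(self):
        return "Y"

def make_specialized_foo(concrete):
    # stand-in for code generation: the returned function is bound to one
    # concrete type, so the call inside it is direct, not virtual
    method = concrete.name     # resolved once, at "compile time"
    def foo(obj):
        return method(obj)     # direct call; no runtime vtable lookup
    return foo

# call site foo(y as X): the compiler knows y is really Y,
# so it picks (or generates) the Y-specialized instance
foo_for_Y = make_specialized_foo(Y)
y = Y()
result = foo_for_Y(y)
```

The semantic analyser would still type-check the call site against X; only the emitted code is bound to Y.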
Quote:Original post by RedDrake

A) would this type system work, and/or what problems do you see with this approach ? Any other languages featuring similar types system ?

B) Since this code would be statically compiled the compiler would have to generate code for each type combination of some class/function/method. This would probably generate large binary files, but what concerns me more is how would this affect runtime performance.


A) Other than saying it is strongly statically typed, you say nothing about the type system, so yeah, any strongly statically typed language has done something similar. What you go on to suggest looks like templates\generics in every language that has them.

B) It would only have to generate code for each class\function\method for each type it is instantiated with. If I instantiate class foo with int and only call method bar, there is no reason to generate code for function baz (unless you have some sort of dynamic linking mechanism). If I never instantiate foo with string, then there is no need to generate code for it.
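That lazy, per-instantiation generation can be sketched in a few lines of Python (all names here - generated, instantiate, bar - are hypothetical illustrations, not part of any proposed compiler): a specialization is only "generated" the first time a function is used with a new combination of argument types.

```python
generated = {}   # (function name, arg types) -> specialized function

def instantiate(fn, *arg_types):
    # stand-in for the compiler emitting code for one type combination;
    # a second request for the same combination reuses the cached instance
    key = (fn.__name__, arg_types)
    if key not in generated:
        generated[key] = fn
    return generated[key]

def bar(x):
    return x + x

bar_int = instantiate(bar, int)    # "generates" bar<int>
bar_str = instantiate(bar, str)    # "generates" bar<str>
instantiate(bar, int)              # reuses bar<int>; nothing new generated
```

A bar<float> never appears in the output because no call site asked for it - which is the point being made above.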
Quote:This would probably generate large binary files, but what concerns me more is how would this affect runtime performance.
Explain more about this. Do you think large binaries are going to run slowly? Do you think generating native code is going to perform badly?
Quote:C) One obvious problem is that virtual members need to have fully defined types, but if I am already creating a specialized function for each type combination can I ignore the vtable and inheritance and treat classes as structures at the code generation level ?
I don't know; you never explain what semantics that is supposed to have, or how it works at all. In fact, this is the first time you have mentioned virtual functions, so as far as I know your language doesn't have such things.
Quote:Original post by stonemetal
A) Other than saying it is strongly statically typed, you say nothing about the type system, so yeah, any strongly statically typed language has done something similar. What you go on to suggest looks like templates\generics in every language that has them.

But the idea is to have classes and OO features implemented in the semantic analysis. So when I have class Y derive from X, Y has no connection with X outside the analyzer, e.g. no vtable.
A pure virtual member in X would simply tell the analyser that Y must implement a method with the same name/parameters.
This would make code generation a bit tricky, e.g.:
def foo(a):
   x as Interface
   if a == 1:
      x = new DerivedA()
   if a == 2:
      x = new DerivedB()
   x.do_something()


Now the compiler needs to generate a code path for both the a == 1 case and the a == 2 case, because DerivedA and DerivedB have no 'vtable'.
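What the lowered code would look like can be sketched in Python (purely illustrative; the analysis that determines the set of possible types is assumed to have already run). The "compiler" knows x can only be DerivedA or DerivedB at the call, so x.do_something() becomes a type switch with a direct call in each arm:

```python
class DerivedA:
    def do_something(self):
        return "A"

class DerivedB:
    def do_something(self):
        return "B"

def foo(a):
    # after flow analysis, x's possible types here are {DerivedA, DerivedB};
    # for this sketch, any a other than 1 falls into the DerivedB case
    x = DerivedA() if a == 1 else DerivedB()
    # lowered dispatch: one arm per possible concrete type, no vtable
    if type(x) is DerivedA:
        return DerivedA.do_something(x)   # direct call for the a == 1 path
    else:
        return DerivedB.do_something(x)   # direct call for the a == 2 path
```

The cost is that the number of arms grows with the number of possible concrete types the analysis finds.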

Quote:Original post by stonemetal
B) It would only have to generate code for each class\function\method for each type it is instantiated with. If I instantiate class foo with int and only call method bar, there is no reason to generate code for function baz (unless you have some sort of dynamic linking mechanism). If I never instantiate foo with string, then there is no need to generate code for it.

True, but making this the default behaviour would probably mean it's used a lot. I don't know; I guess you are right - most of the time it shouldn't be a problem. But the generated code would definitely be larger.

Quote:Original post by stonemetal Explain more about this. Do you think large binaries are going to run slowly? Do you think generating native code is going to perform badly?

Well, if the binary code is larger than the CPU cache, wouldn't large code cause frequent cache misses/loads/stalls?
Quote:Original post by RedDrake
Basically the language would be strong/static typed (types must be determinable at compile time), but types would be implicit by default.


Right. Type inference is used in many languages.

Quote:
A) would this type system work, and/or what problems do you see with this approach ? Any other languages featuring similar types system ?


I don't see anything wrong immediately based on the examples shown. Maybe I'm assuming too much, but this looks pretty much like Python and is pretty similar to a number of languages with type inference.

Quote:
B) Since this code would be statically compiled the compiler would have to generate code for each type combination of some class/function/method.


Depends on your requirements. .NET languages are statically compiled but create only one instance of code per generic class/method, though they need to do a little trickery at runtime to instantiate the concrete class/method.

Quote:
Would this cause big performance penalties ?


I don't know. But I also don't care. Performance is a secondary or tertiary concern after you get it working.

Quote:
C) One obvious problem is that virtual members need to have fully defined types, but if I am already creating a specialized function for each type combination can I ignore the vtable and inheritance and treat classes as structures at the code generation level ?


I don't think so, but that depends on how the objects are stored/dealt with under the hood. Remember that (almost) the entire original motivation for classes was to create a system where you didn't have to have the entire source code to compile a program. If a DLL only says that an object is of type X, you can't infer what it is beyond that boundary.

Quote:
Anyway, I have this strange feeling that there are some problems with this approach which would prevent it from working but I can't really tell what it is - so any comments/ideas/suggestions/corrections/further reading links are welcome :)


LtU tends to be the go-to place for language theory, even if it's morbidly academic.
To try to explain better: I want to implement the 'standard' OO features of C# or C++ (inheritance/virtuals/visibility etc.) in the language.
But in code generation, a final class (one that can be instantiated) is 'flattened' into a struct that contains only the class's member objects.
Functions/methods are 'specialized' by the compiler, and the appropriate function calls are determined at compile time.
To do this, the compiler must predict all possible paths where the type can change under some interface, as in the branching example above.
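The flattening step can be sketched in Python (the helper flatten_members and the hierarchy encoding are hypothetical illustrations): a class and its bases collapse into one flat, ordered member list, so a base class's members are always a prefix of every derived class's layout - which is exactly the "X members are a subset of Y members" property.

```python
def flatten_members(hierarchy, cls):
    """hierarchy maps class name -> (base name or None, own member names).
    Returns the flattened struct layout: inherited members first, then own."""
    base, own = hierarchy[cls]
    inherited = flatten_members(hierarchy, base) if base else []
    return inherited + [m for m in own if m not in inherited]

# mirrors: class X with members a, b; class Y(X) adding member c
classes = {
    "X": (None, ["a", "b"]),
    "Y": ("X", ["c"]),
}
```

With this layout, code specialized for Y can be handed something laid out like X for its leading fields, with no vtable pointer anywhere in the struct.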

I want to do this so that I can abstract the OO out of the language and implement it through 'metaprogramming' interfaces, keeping only a minimal type system (like C structs) in the core language.

So I'm just interested whether there is some flaw/error that would prevent this from working.
Quote:Original post by Telastyn
I don't think so, but that depends on how the objects are stored/dealt with under the hood. Remember that (almost) the entire original motivation for classes was to create a system where you didn't have to have the entire source code to compile a program. If a DLL only says that an object is of type X, you can't infer what it is beyond that boundary.

Hmm... is including the bytecode in the DLL such a big issue? I always thought OO was about encapsulation/data hiding :\
Quote:Original post by RedDrake
To try to explain better: I want to implement the 'standard' OO features of C# or C++ (inheritance/virtuals/visibility etc.) in the language.
But in code generation, a final class (one that can be instantiated) is 'flattened' into a struct that contains only the class's member objects.
Functions/methods are 'specialized' by the compiler, and the appropriate function calls are determined at compile time.
To do this, the compiler must predict all possible paths where the type can change under some interface, as in the branching example above.

I want to do this so that I can abstract the OO out of the language and implement it through 'metaprogramming' interfaces, keeping only a minimal type system (like C structs) in the core language.

So I'm just interested whether there is some flaw/error that would prevent this from working.


No, the general concept should be fine. Tangent (my current hobby project) does this sort of class flattening. It does not do type inference though, and I forget exactly why, but I remember realizing that it was going to be impossible to determine at compile time which polymorphic method to invoke.

But beyond that, it's definitely possible to model traditional OOP features in a structural type system (which allows some 'flattening', assuming I understand you correctly).
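For a concrete taste of structural typing, here is a miniature in Python using typing.Protocol (runtime-checkable here purely for demonstration; the names Drawable, Circle and render are made up for this sketch): an object conforms to an interface simply by having the right members, with no declared inheritance relationship.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Drawable(Protocol):
    def draw(self) -> str: ...

class Circle:            # never mentions Drawable anywhere
    def draw(self):
        return "circle"

def render(d: Drawable):
    return d.draw()      # accepted because Circle structurally matches
```

This is the semantic-analysis-only view of interfaces described above: conformance is checked by shape, and nothing about it needs to survive into the generated code.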
Quote:Original post by RedDrake

Well if binary code > CPU cache size then wouldn't large code cause frequent cache misses/load/stalls ?


It shouldn't; you would jump to the section of code you were about to execute. It would end up being no different from a function call or a vtable call, since you would be doing more or less the same thing.
From the looks of it, you want Python, but converted to static typing with type inference. I would recommend you look at and/or learn other languages before you make your own. Some of the languages I've learned a lot from are Scheme, Haskell, OCaml, Factor and Scala; while you might not like any of them, it's worth looking at them and their features to give you a better idea of what is out there.

How do you plan to get the language running? As far as I can see, there are three main options:
- Target a VM (JVM, .NET, Parrot VM, LLVM)
- Convert it into straight assembly
- Convert it to another language and use its compiler (C and C-- seem to be common for this)

I would also strongly suggest that you make the base language as small as possible and make everything else a library for the language to use. This makes it nicer for users, since their code looks more like the base language instead of tacked on (see DSLs). Making things libraries also lets you focus on what you want rather than on the language as a whole. The last big advantage of this approach is that if you ever want help, it will be easier for people to get up to speed and contribute small parts.

I've done a bit of language design and have looked at quite a few languages, and there is a lot more out there than just another C++ clone with a different syntax. I'm not trying to bash your language; all I'm saying is: explore what's out there first.

This topic is closed to new replies.
