[Xhed] Of Immutability and Optimizations

Started by
3 comments, last by ToohrVyk 16 years, 1 month ago
Some of you might remember my work on a new programming language, Xhed, which has consistently made its way into my GameDev posts for the last year or so. The structure of the language has been fixed, and I have already developed a complete interpreter. I am currently in the process of bootstrapping the language. If people are interested in discussing about language features, requesting features, or raising interesting points about the current structure, I will gladly hear them out. Xhed is intended to be a compiled, garbage-collected, purely functional, strongly typed and type-inferred language centered around high-performance stream processing. The main objective is to provide stream processing primitives as part of an expressive yet simple language to increase optimization opportunities. I expect reasonably large and idiomatic Xhed programs to transparently take advantage (whenever available) of multi-core processors, programmable GPU pipelines, cell processors and correctly configured clusters to improve computation speed. A short example:

package net.nicollet.Examples

# The McCarthy 91 function
# The compiler would answer:
#   net.nicollet.Examples.f91 : int -> int

public f91 = { n ->
  if n > 100 then n - 10
  else n - 11 >> f91 >> f91
}

# Computing the maximum convergence time for 
# the Syracuse sequence over 1 .. n 
# The compiler would answer:
#   net.nicollet.Examples.syr : int -> int

public syr = { n ->
  let s = { i -> 
    if i % 2 = 0 then i / 2 else 3*n + 1
  } in
  # Create an array containing the convergence times
  Xh.make n (Xh.until { i -> i = 1 } s)
  # Reduce the array to a single maximum
  >> Xh.reduce Xh.max
}
The language has some inspirations from other languages:
  • C#/Java: the package system, the public/private tokens on file-scope identifiers
  • F#: the >> operator (a >> f is equivalent to f(a), but sometimes more readable)
  • ML: most of the syntax and the entire type system.
  • Felix: reuse of the block procedure syntax, the while loop.
  • Haskell/Miranda: pure functional philosophy.
  • Brook/CG/CUDA: stream processing approaches.
The core language supports built-in integers (unspecified size), reals (uses 64-bit if the hardware supports it, 32-bit wherever necessary), ASCII strings, dynamic-sized arrays, and closures. Its control structures involve (automatically recursive) let-in, if-then-else, and function calls. The standard library also provides for-while-fold-map-reduce functions which will generally be heavily optimized. However, kernel-based computation is not forced: it's possible to access array indices freely at any point in the program, though this will usually forfeit most parallelization techniques and as such is discouraged. The language also allows file-scope type declarations (which are actually typedefs), which may be made opaque to other packages through use of the private access modifier at file scope, and can be used to constrain value types in the typical ML fashion. I expect to add exceptions and labeled types to the language at some point in the future, but only after the initial compiler works. The language interacts with the exterior world through function calls: it's possible to use the available bindings (which should include C, C++, C# and Objective Caml) to invoke a function by providing it with arguments, and retrieving the result of the computation once it's done, possibly in an asynchronous fashion. As such, the language can be used both as a co-processor in a larger imperative language with side-effects, or as a standalone computation system with a small scaffold that reads input from files and writes output back to other files. [Edited by - ToohrVyk on February 22, 2008 4:41:25 AM]
Advertisement
I'm interested. Do you have an up-to-date spec/informal semantics? Where is the interpreter?

The stream processing concepts sound a little like Lucid Synchrone, an extension of OCaml (unfortunately I don't really know anything about it).

So the language is completely pure? No mutable references or input/output? Is the purpose to perform efficient/parallel computation on values input from a host language? That's quite interesting, and I think you could do quite a lot on optimisation and parallelism if so.

Type system: do you have algebraic types and HM polymorphism? Weren't you planning on dependent types at one point, too?

Consider adding support for first-class continuations rather than exceptions. People seem to overlook them, but they're right up there with first-class functions and typing when it comes to increasing language expressiveness. They're what I miss from Scheme when programming in Haskell. SML/NJ has typed first-class continuations. If you are not familiar with them, this is quite a good explanation.
Quote:Original post by Rebooted
I'm interested. Do you have an up-to-date spec/informal semantics? Where is the interpreter?


The interpreter is currently on my other computer. It's a five-file C# project, and it is extremely limited (it fully implements the language semantics, but it has almost no error diagnosis when errors happen, and the type system is not in place yet). I am currently working on a bootstrapped compiler written directly in Xhed.

The semantics are extremely simple: programs manipulate values, which are typical ML-like values:
  • Base value types (int, bool, real, char).
  • Streams (aka arrays, aka vectors) of an underlying type and specified size. Streams are indexed starting at one, instead of zero.
  • Tuples.
  • Closures, which are (env, argnames, body), where "env" is the environment of the closure, "argnames" are the names of the arguments, and "body" is the expression inside the closure.


Expressions are evaluated inside environments (which map variable names to variable values). Aside from the usual operational semantics of binary and unary operators, you have:
  • eval(env, literal) = value(literal)
  • eval(env, variable) = env[variable]
  • eval(env, let x = a in b) = eval(env + [x -> eval(env,a)], b)
  • eval(env, { args -> expr }) = closure(env, args, expr).
  • eval(env, f a) = call(f, eval(env, a))
    Calling a closure removes the first argument from "args" and adds it to the inner closure environment, bound to the passed argument. Then, if there are no arguments left, it evaluates the closure body in that inner environment. Otherwise, returns the new closure.


There are a handful of builtins: @x is the size-of operator, returns the number of elements in stream x, x evaluates to the i-th element of the stream x, which means that the "front" element is x[1] and the "back" element is x[@x], make n f creates an array such that x = f i and @x = n, and while c f i returns the first x = fn i such that c x = false (by order of increasing n ≥ 0). All other standard library functions are implemented in terms of these built-ins (but are, obviously, to be optimized by the compiler).

Quote:So the language is completely pure? No mutable references or input/output? Is the purpose to perform efficient/parallel computation on values input from a host language? That's quite interesting, and I think you could do quite a lot on optimisation and parallelism if so.


Yes, the language is completely pure. There are several optimizations: obviously, there are the basic ones (trivial parallelization of make-map-reduce sequences), the ones related to concurrent evaluation (f x + f y being computed on two threads and then added together), and the ones related to stream processing which are currently not very often seen:

public sort = { s ->  # Create a heap from the stream s  Xh.Stream.fold Xh.Heap.add Xh.Heap.empty s  # Create a sorted array back from the stream  >> Xh.Stream.generate { h -> ! (Xh.Heap.is_empty h) }                        { h -> Xh.Heap.top h, Xh.Heap.pop h }}public example = { s ->  s >> foo >> sort >> bar}


The point is that the stream argument of fold has a "streaming" format, which means that its elements are read in sequential order one after another without any lookahead. Similarly, the output stream of generate has a "streaming" format, which means that its elements are written in sequential order. So, let's assume that foo and bar respectively generate and read "streaming" format streams. Then, I could split the example function in two threads running on two cores.

Thread one executes foo, and outputs the generated elements of the output stream to a pipeline one after another as soon as they are available. Then, it executes bar, telling it to read from another pipeline whenever it needs a new element to work on.

Thread two executes sort, which will read an element from the input pipeline and place it in the heap whenever it's available, then pop elements from the heap and put them in an output pipeline as fast as possible.

Since the pipelines of the two threads are connected, the results are as expected. Since the threads run in parallel, the total time required between foo and bar evaluation is the O(log n) insertion time of a single element, which could be lowered to O(1) with an even more clever data structure.

Quote:Type system: do you have algebraic types and HM polymorphism? Weren't you planning on dependent types at one point, too?


I have plans for HM polymorphism and recursive types, though not necessarily sum types (which can be represented as visitors).

Quote:Consider adding support for first-class continuations rather than exceptions. People seem to overlook them, but they're right up there with first-class functions and typing when it comes to increasing language expressiveness. They're what I miss from Scheme when programming in Haskell. SML/NJ has typed first-class continuations. If you are not familiar with them, this is quite a good explanation.


I'll look into it. I am familiar with continuations, but am generally unable to read LISP code for long durations without my eyes bleeding. [smile]
The semantics look pretty simple and clean to me.

Quote:Original post by ToohrVyk
while c f i returns the first x = fn i such that c x = false (by order of increasing n ≥ 0). All other standard library functions are implemented in terms of these built-ins (but are, obviously, to be optimized by the compiler).

You can also implement while in terms of other constructs: while c f i = if c i then while c f (f i) else i

Quote:I have plans for HM polymorphism and recursive types, though not necessarily sum types (which can be represented as visitors).
Quote:I expect to add exceptions and labeled types to the language at some point in the future, but only after the initial compiler works.

I'm not sure what sort of data types you mean. What do you mean by recursive/labeled types if not sums? By visitors do you mean the pattern using objects?

Your optimisation ideas make me think of two things:

  • Nested data parallelism, which is very similar to your ideas, based around parallel operations on arrays such as array comprehensions: [: f1*f2 | f1 <- v1 | f2 <- v2 :].

  • Lenient evaluation as opposed to strict or lazy evaluation. Under lenient evaluation, function arguments are evaluated in parallel to the body and bound to futures until needed

Quote:(f x + f y being computed on two threads and then added together)
This is lenient evaluation. This optimisation must be conservative, because switching from strict to lenient evaluation can change behaviour. You could instead use lenient evaluation all the time, but this will incur context-switching (although this shouldn't be too bad if you use lightweight threads and have them scheduled to hardware threads).
Quote:Original post by Rebooted
You can also implement while in terms of other constructs: while c f i = if c i then while c f (f i) else i


It's true. I was speaking of the interpreter here (which does not support tail recursion and thus cannot handle the above correctly).

Quote:I'm not sure what sort of data types you mean. What do you mean by recursive/labeled types if not sums? By visitors do you mean the pattern using objects?


A labeled type:
public struct complex = { re : float; im : float }

A sum type, in Objective Caml:
type 'a list = E | N of 'a * 'a list

A sum type, in Xhed:
public type 'a list = 'b. ('a * 'a list -> 'b) -> 'b -> 'bpublic (empty : 'a list) = { _ x ->   x }public (cons : 'a -> 'a list -> 'a list) = { h t ->  { x _ -> x h t }}public (match : ('a * 'a list -> 'b) -> 'b -> 'a list -> 'b) = { cons empty lst ->   lst cons empty }public list_length =   match { h, t -> list_length t + 1 }        ( 0 )


Quote:This is lenient evaluation. This optimisation must be conservative, because switching from strict to lenient evaluation can change behaviour. You could instead use lenient evaluation all the time, but this will incur context-switching (although this shouldn't be too bad if you use lightweight threads and have them scheduled to hardware threads).


Yes, this is true. The point is that since there are no side-effects, the language can be assumed to be eager-evaluated (and the code must be written to respect this assumption), but the optimizer may choose at any point in time to resort to lazy evaluation instead if this improves performance.

This topic is closed to new replies.

Advertisement