Tree Manipulation Language

Started by
24 comments, last by choffstein 17 years, 7 months ago
I had written a rudimentary recursive parser to build TeX linguistics trees out of simple bracketed trees, like (root (node) (node) (root (node))). I don't think I ever got much further than it, but I could've probably turned it into a language.
Advertisement
Quote:Original post by Daerax
Quote:Original post by ToohrVyk
Any language with pattern matching would work in this case, including LISP and ML variants. Illustrating with Objective Caml (or F#) syntax:

Did you type that off the top of your head? Thats really impressive! I ask this because of the minor mistakes you made in the code which imply you didnt test the code and did it all off the top of your head.

Last time I checked, this guy was still impressive. I don't know what he can't do on the top of his head (well... maybe moving real-life objects, although I'm not even sure about that).
Quote:Original post by Daerax
Did you type that off the top of your head? Thats really impressive! I ask this because of the minor mistakes you made in the code which imply you didnt test the code and did it all off the top of your head.


I'm a Caml Light teacher, so I find myself writing that kind of things on a white board every so often (as well as debugging my students' programs off the top of my head, too). I also write more O'Caml code than C/C++/C# code (and more PHP/LaTeX than O'Caml, but that's another matter).

You might want to look into buying "ANSI Common Lisp"(ISBN: 0133708756) as it includes as an example a pretty complex pattern matching system. It uses it to provide a relational logic language ("M is-mother-of C" implies "C is-child-of M" type of stuff), but the way it handles things (IIRC) should make it fairly easy to build up a formalization of mathematics on top (math formalization does start with basic logic, after all =-)

-Extrarius
Quote:Original post by Emmanuel Deloget
Quote:Original post by Daerax
Quote:Original post by ToohrVyk
Any language with pattern matching would work in this case, including LISP and ML variants. Illustrating with Objective Caml (or F#) syntax:

Did you type that off the top of your head? Thats really impressive! I ask this because of the minor mistakes you made in the code which imply you didnt test the code and did it all off the top of your head.

Last time I checked, this guy was still impressive. I don't know what he can't do on the top of his head (well... maybe moving real-life objects, although I'm not even sure about that).


well im sure he could easily move objects on the top of his head hehe. [grin] but yeah him, that anonymous poster, zahlman and snk_kid ( <- used to) constantly write amazing bits of code on the forums on the go, that just made you go wow. for a lack of words.

Quote:Original post by Anonymous Poster
You might want to look into buying "ANSI Common Lisp"(ISBN: 0133708756) as it includes as an example a pretty complex pattern matching system. It uses it to provide a relational logic language ("M is-mother-of C" implies "C is-child-of M" type of stuff), but the way it handles things (IIRC) should make it fairly easy to build up a formalization of mathematics on top (math formalization does start with basic logic, after all =-)

-Extrarius


There is a way that is many, many orders of magnitudes better than this. The method you are espousing would quickly get bogged down in semantics and become impossible to work with least of all verify, as it would become more and more specific and hardcoded. As well it would be exponentially difficult to extend. You know of metamath. that way is one way, another way is to do it constructively in a strongly typed lambda calculus whose termination assures that the proof is correct.
Quote:AP #1: Well, I'm reasonably keen on Lisp now. Those examples were great, thanks .
Here's another example of a cool thing that lisp is especially good at: molding itself to what you want to do. Maybe it'll make you keener. ;)

I've thought about this and I'm growing increasingly envious of the ML pattern matching syntax based on destructuring types by their type constructors. It's extremely convenient for manipulating tree structures.

Haskellers and ML'ers are rightly smug about it. So we lispers might either

a) sneer and say it isn't that cool anyway or

b) bow deeply to those guys and steal everything that isn't nailed down in ML.

I'm going with option b.

Of course, we don't have invertible type constructors for all types in lisp, and we don't necessarily want to go through the hassle of having them (sometimes dynamicity is good). So we'll go a slightly different route: we'll pattern match against a pattern and then filter the bindings through some (optional) predicates. These can be type predicates (providing the functionality of ML patterns) or arbitrary predicates.

We want to be able to write the famous haskell factorial or map:

fac :: Integer -> Integerfac 0 = 1fac n | n > 0 = n * fac (n-1)map                     :: (a->b) -> [a] -> map f  []               =  []map f (x:xs)            =  f x : map f xs


in a similar way. So we want syntax, let's call it defun-matcher, that takes a name and an arbitrary number of pattern-predicate-expression tuples and turns them into a common lisp function with the same name that has the desired behaviour. The fac and map functions above might look like this:

(defun-matcher  ((fact 0) (t) 1)  ((fact n) ((numberp n) (> n 0)) (* n (fact (- n 1)))))(defun-matcher   ((my-map f nil) (t) nil)  ((my-map f (x . xs)) (t) (cons (funcall f x) (my-map f xs))))


More parentheses, but it's equally beautiful once you look past them.

We need some medium macrology to accomplish this: we have to unfold a pattern into code that matches that pattern, and this is tougher than recursively matching bindings because we have to know whether a pattern variable has been bound or not 'previously'. 'Previously' means we have to impose an ordering on how we flatten the pattern into code that matches it; we didn't have to worry about that before.

We can use tyhe recursive approach we used for the previous pattern matcher as long as we remember to pass along the variables that have yet to be bound into each recursive branch (while knowing their relative ordering when the pattern is unfolded into code). Then the code will know whether to check a variable against a previous binding or whether to bind it directly and move on. I'm no Paul Graham, so we'll restrict ourselves to matching against lists only.
Here're a few functions we need first: the predicates to check whether something is a pattern variable or a quoted expression, a function that extracts a list of variables from patterns, and a function that returns the variables in the first pattern that are not in the second pattern. We'll need the last two functions to keep track of which variables have not been given bindings yet during the creation of the matcher code.

(defun var-p (pattern)  "This is to check whether something in a pattern is a pattern variable  or something else (the + - * / symbols you want to treat as constant)."  (and (symbolp pattern) (not (member pattern '(nil + - * / _)))))(defun quoted-p (pattern)  (and (consp pattern) (eq 'quote (car pattern))))(defun get-vars (tree)  "returns a list of all variables in a pattern (save those escaped by a quote)."  (cond ((and (consp tree) (eq 'quote (car tree)))	 (list))	((consp tree)	 (append (get-vars (car tree)) (get-vars (cdr tree))))	((var-p tree)	 (list tree))	(t (list))))(defun var-difference (pattern1 pattern2)  (set-difference (get-vars pattern1) (get-vars pattern2)))


Here's the matcher-code generator, which turns a pattern into code that matches it and binds the pattern variables in it. We also pass it an identifier for an expression (which it'll match against) an inner-form (which it will place inside the bindings so it can be evaluated within them). It must know which variables are as yet unbound, so as to bind them instead of comparing them to their (nonexistent!) binding.

It looks scarier than it actually is: for you non-lispers, the code that is preceded by a backquote is just a code template that is returned (this function generates code, after all), it isn't part of the logic in the function and isn't actually doing anything.

(defun make-matcher (pattern expr unbound-vars inner-form)  (let ((right-branch (gensym)))    (cond ((var-p pattern)	   (if (member pattern unbound-vars)	     `(let ((,pattern ,expr))		,inner-form)	     `(if (equalp ,pattern ,expr)		,inner-form)))	  ((and (consp pattern) (eq 'quote (car pattern)))	   `(if (equalp ,pattern ,expr)	      ,inner-form))	  ((consp pattern)	   (destructuring-bind (pattern1 . pattern2) pattern	     (let ((inner-unbound (var-difference pattern2 pattern1)))	       `(if (consp ,expr)		  (let ((,expr (car ,expr)) (,right-branch (cdr ,expr))) 		    ,(make-matcher pattern1 expr unbound-vars 				   (make-matcher pattern2 right-branch inner-unbound inner-form)))))))	  ((eq pattern '_)	   inner-form)	  (t 	    `(if (equalp (quote ,pattern) ,expr)	       ,inner-form)))))


Now, this only destructures a pattern and fails if the pattern doesn't match. We also wanted to fail if the generated bindings didn't satisfy some arbitrary predicates, remember?

And we also want a function that puts this all together: one we can call with the symbol the expression to be matched will be initially bound to and with a list of pattern-guard-form tuples. This function will set things up so that the matcher works as expected: trying to satify the patterns and guards in order, and evaluating and returning the form in the first successful match.

(defun make-guarded-matcher (in-pattern guards form expr-symbol)  (let ((inner-form `(if (and ,@guards) (return-from successful-match ,form))))     (make-matcher in-pattern expr-symbol (get-vars in-pattern) inner-form)))(defun single-matcher (expr-symbol pattern-guard-forms)  (let ((guarded-matchers (loop for p-g-f in pattern-guard-forms				collect (destructuring-bind (pattern guards form) p-g-f					      (make-guarded-matcher pattern guards form expr-symbol)))))    `(block successful-match	    (progn ,@guarded-matchers))))
We now have a function that can turn lists of pattern-guard-form tuples into the code to match and execute the first successful match. But we wanted to be able to define functions, and these can take more than one argument (each argument can be a pattern). Fortunately, it is just a matter of transforming the pattern-function syntax into pattern-guard-form tuples , which we already know how to deal with.

None of this code is particularly pretty (I'm pressed for time) and this is the ugliest bt, but also the simplest: the hard work is behind us.

(defun check-fn-arity (fns-args)  (destructuring-bind (name-args . therest) fns-args    (let ((arity+1 (length name-args))	  (name (car name-args)))      (loop for function-args in therest do	    (if (or (not (eq name (car function-args))) 		    (not (= arity+1 (length function-args))))	      (error "name or arity mismatch in a pattern-match defined function")))      (cons name (- arity+1 1)))))(defmacro defun-matcher (&rest name-pattern-guard-forms)  (let* ((expr-symbol (gensym))	 (function-args (mapcar #'car name-pattern-guard-forms))	 (name-arity (check-fn-arity function-args))	 (pattern-guard-forms (mapcar (lambda (n-p-g-f) 					(destructuring-bind ((name . args) . g-f) n-p-g-f					  (cons args g-f))) 				      name-pattern-guard-forms)))      (let ((args (loop for x from 1 to (cdr name-arity) collect (gensym))))	`(defun ,(car name-arity) ,args	   (let ((,expr-symbol (list ,@args)))	     ,(single-matcher expr-symbol pattern-guard-forms))))))


Whoopee! We're done.

Let's see what code is actually generated when we write down those beautiful, concise pattern function definitions:

[122]> (macroexpand-1 '(defun-matcher  ((fact 0) (t) 1)  ((fact n) ((numberp n) (> n 0)) (* n (fact (- n 1))))))(DEFUN FACT (#:G3372) (LET ((#:G3371 (LIST #:G3372)))  (BLOCK SUCCESSFUL-MATCH   (PROGN    (IF (CONSP #:G3371)     (LET ((#:G3371 (CAR #:G3371)) (#:G3373 (CDR #:G3371)))      (IF (EQUALP '0 #:G3371)       (IF (EQUALP 'NIL #:G3373)        (IF (AND T) (RETURN-FROM SUCCESSFUL-MATCH 1))))))    (IF (CONSP #:G3371)     (LET ((#:G3371 (CAR #:G3371)) (#:G3376 (CDR #:G3371)))      (LET ((N #:G3371))       (IF (EQUALP 'NIL #:G3376)        (IF (AND (NUMBERP N) (> N 0))         (RETURN-FROM SUCCESSFUL-MATCH (* N (FACT (- N 1))))))))))))) ;T


And for my-matcher:

[123]> (macroexpand-1 '(defun-matcher  ((my-map f nil) (t) nil)  ((my-map f (x . xs)) (t) (cons (funcall f x) (my-map f xs)))))(DEFUN MY-MAP (#:G3380 #:G3381) (LET ((#:G3379 (LIST #:G3380 #:G3381)))  (BLOCK SUCCESSFUL-MATCH   (PROGN    (IF (CONSP #:G3379)     (LET ((#:G3379 (CAR #:G3379)) (#:G3382 (CDR #:G3379)))      (LET ((F #:G3379))       (IF (CONSP #:G3382)        (LET ((#:G3382 (CAR #:G3382)) (#:G3383 (CDR #:G3382)))         (IF (EQUALP 'NIL #:G3382)          (IF (EQUALP 'NIL #:G3383)           (IF (AND T) (RETURN-FROM SUCCESSFUL-MATCH NIL)))))))))    (IF (CONSP #:G3379)     (LET ((#:G3379 (CAR #:G3379)) (#:G3387 (CDR #:G3379)))      (LET ((F #:G3379))       (IF (CONSP #:G3387)        (LET ((#:G3387 (CAR #:G3387)) (#:G3388 (CDR #:G3387)))         (IF (CONSP #:G3387)          (LET ((#:G3387 (CAR #:G3387)) (#:G3390 (CDR #:G3387)))           (LET ((X #:G3387))            (LET ((XS #:G3390))             (IF (EQUALP 'NIL #:G3388)              (IF (AND T)               (RETURN-FROM SUCCESSFUL-MATCH                (CONS (FUNCALL F X) (MY-MAP F XS)))))))))))))))))) ;T[124]>


Isn't it a relief we can get lisp to write that instead of having to do it ourselves? And it's bulky but pretty efficient: it stops looking as soon as it discovers it can't match the pattern.
Now, this code is almost certainly buggy and far from pretty. I only tested it on the fact and my-map examples (which surprisingly seem to work):

[124]> (fact 3)6[125]> (my-map (lambda (x) (* x x)) (list 1 2 3 4 5))(1 4 9 16 25)[126]>


But it's VERY little work for what it does (extend lisp syntax to comfortably express functions in terms of patterns and predicates they must match). Suddenly you have syntax that is almost as pretty as Haskell pattern-match function definitions, and it takes only a few hours to extend your language!

And I feel very flattered at being called legendary, but I'm actually a beginner at lisp (been writing it for a little over two years, on and off) and I've forgotten the languages I thought I 'knew' before (I'm starting to realize I didn't know them at all).

The point is that some things that are unthinkably hard in other languages are very easy in lisp; writing this requires no extraordinary ability, just the right language. Lisp is very well suited to this sort of thing, and I like the practice :)
Quote:Original post by Anonymous Poster
And I feel very flattered at being called legendary, but I'm actually a beginner at lisp

Woah.

Well, it seems like I'm going to need to learn Lisp anyway - just to fully appreciate that code.

If no one can point me in an even better direction, I'm going to go for Lisp on this project. Cheers [smile]

This topic is closed to new replies.

Advertisement