ToohrVyk

Better Operator Overloading


This is a follow-up to Extreme Operator Overloading, and attempts to describe the language idea more formally than the previous post did.

Values

The language manipulates immutable values: they can be created, but never changed or assigned to. Values fall into three categories (a few examples follow the list):
  • Literals. These are things like 42, 3.141592f, "Hello", 'x' or false. They also include "constructed literals", such as string(12,'x'), array(42,"Hello").
  • Objects. These are in fact fieldsets (bindings between field names and field values), such as (name -> "Joe", age -> 35). Some of the fields may bear the extend keyword, which (as shown later) indicates that the object may "decay" to that field when used in a context where the field could be used but the object couldn't: (name -> extend "Joe", age -> 35) means that the program will use the name whenever a string could be used but the object couldn't. Objects also include tuples, of the form ("Joe",35), and everything may be mixed: (extend "Joe", age -> 35). An unnamed field automatically receives the name field_i for the smallest i > 0 that causes no collision with the other names, names for unnamed fields being allotted left-to-right.
  • Singletons. These objects are unique and different from every other object in the program. A singleton named X is created with the new X; statement, which creates a new singleton every single time it's evaluated. Singletons have no default properties and cannot, by default, be used to do anything at all.
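For illustration, here is a small sketch of values in each category, written with the syntax just defined (point is a hypothetical name):

42                            // literal
string(12,'x')                // constructed literal
(name -> "Joe", age -> 35)    // object with two named fields
("Joe", 35)                   // tuple: field_1 -> "Joe", field_2 -> 35
(extend "Joe", age -> 35)     // field_1 -> "Joe"; decays to a string when needed
new point;                    // declaration creating a fresh singleton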

Grammar

The basic building block of the grammar is the expression. Expressions fall into two categories: overloadable and non-overloadable. Expressions are (with the expected priorities wherever applicable):
expr =
 | x                       // Variable name, literal or object creation.
 | expr.field              // Field access. Overloadable.
 | expr:expr               // Satisfying a contract. Specially overloadable.
 | expr[expr]              // Subscripting. Overloadable.
 | expr expr               // Function application. Overloadable.
 | expr op expr            // Binary operation (+,-,&&,^,%,<<,== etc). Overloadable.
 | op expr                 // Unary operation (-,*,!). Overloadable.
 | { decl* } expr          // Declaration list. 
 | expr as expr            // Explicit value casting. Overloadable.
 | using(expr) expr        // Using declarations. Specially overloadable.
 | if expr then expr else expr
For instance:

if is_odd n then (10,22).field2 * using(rational) 1 / 2 else -3

A declaration describes the behaviour or existence of a given overloadable expression. It takes an extremely simple form:
decl =
  | conditions {{ expr }} = expr;          // Conditional definition.
  | conditions {{ using(x) }} = { decl* }; // 'Using' definition.
  | new x;                                 // Singleton declaration.
  | using x;                               // Using a list of declarations.

conditions =
  | ε                      // No conditions
  | for(x:expr) conditions // Universal quantifier
  | if(expr) conditions    // Restriction
The expression between the {{ }} must be a construct made of overloadable expressions, and its terminal members must be variable names, non-constructed literals, or objects constructed from these. For instance:

new max;
for(x:int,y:int) if (x >= y) {{ max (x,y) }} = x;
for(x:int,y:int) if (x < y)  {{ max (x,y) }} = y;

The using(e) e expression is specially overloadable: its declaration only incorporates the object between parentheses, and describes a set of declarations as opposed to an expression. Note that the e:e expression is also specially overloadable, but may not be placed in a declaration. It is automatically overloaded when e as e is overloaded, following the simple rule that y : t is true if and only if there exists a value x such that x as t yields y (for obvious reasons, this is undecidable, and the compiler may arbitrarily complain if the definition of e as t is too complex).

A file is considered to be a list of declarations decl*. For syntactic sugar reasons, declarations may be factored (so for(a) b; and for(a) c; can be folded into for(a) {b; c;}, and the same, intuitively, for if and else). Also, using(a,b) e means using(a) using(b) e, and new a,b; means new a; new b;.

Compiling

If file filename.ext contains code code, then compiling code into a library or program creates a binary containing the equivalent of:
new filename;
{{ using(filename) }} = { code };
The only exception is when there is already a new filename; declaration in the file. In that case, the compiler does not generate a new wrapper and uses the existing one instead (still reporting any errors that do not relate to filename). This allows:
// minmax.ext
new minmax;
{{ minmax }} = (

  min -> { 
    new min; 
    for (x:int,y:int) 
      if (x < y) {{ min (x,y) }} = x;
      else       {{ min (x,y) }} = y; 
  } min,

  max -> { 
    new max;
    for (x:int,y:int) 
      if (x < y) {{ max (x,y) }} = y;
      else       {{ max (x,y) }} = x; 
  } max

);

// otherfile.ext
minmax.min(1,2)
Linking libraries together simply concatenates the code equivalent of the libraries into a single code (in the order that libraries were linked in) then compiles the result as if it were a new file (thus adding another level of using indirection).
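For instance, linking the two files above in that order would produce the equivalent of the following (a sketch; the name linked for the resulting binary is hypothetical):

new linked;
{{ using(linked) }} = {
  // minmax.ext already declares new minmax;, so it keeps its own wrapper:
  new minmax;
  {{ minmax }} = ( ... );   // the object from minmax.ext
  // otherfile.ext gets the generated wrapper:
  new otherfile;
  {{ using(otherfile) }} = { minmax.min(1,2) };
};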

Semantics

The meaty part. Evaluating an expression whose arguments are all values (as opposed to other expressions) consists of replacing the expression with a value according to certain rules. When the arguments are other expressions, those expressions are first evaluated, in unspecified order, so that they become values, after which the encompassing expression can be evaluated. The rules for evaluating an expression are conceptually simple (the devil is in the details): the evaluator looks for a corresponding expression definition within the environment (the set of definitions visible from where the expression is), replaces the expression with the expression it's defined as, and (if the result is not a value) keeps evaluating until it is one. For instance, consider the evaluation of 1 + (max (1+2,4)), with the max function defined above.
  • To evaluate the sum, the max expression must be reduced to a value, which means the tuple must be reduced to a value, which means 1+2 must be reduced to a value. In the environment, {{ 1 + 2 }} = 3; and so the tuple becomes (3,4). This is a clean and nice value, so we can move on.
  • To evaluate max (3,4), the evaluator has two definitions. The first is invalid here, because its condition x >= y fails (3 >= 4 is false), but the second is usable. Thus, the entire term evaluates to y where (x = 3, y = 4), which means 4.
  • Then, {{ 1 + 4 }} = 5; is applied to obtain the final value, 5.
There are four difficult concepts to understand: environment layers, overloading, environment taint and environment injection.

Environment Layers

Environments are ordered sets of definitions and new expressions, each of them being a layer. Whenever a new definition is encountered, it is added at the top of the set (it becomes the topmost layer). Layers added at a given time disappear from the environment when the expression they were in disappears (a small example follows the list):
  • using(e) x: the using(e) statement is executed, which may add any number of layers on top of the environment. These layers disappear as soon as x is fully evaluated.
  • { d; } x: the d definitions are added to the environment, and disappear when x is fully evaluated.
  • { a; b; } x is actually interpreted as { a; } { b; } x, so that the layer created by definition a is usable within definition b, and all of them disappear when x is evaluated.
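As a small sketch of layering (assuming the usual integer definitions are in the environment):

{ new one;
  {{ one }} = 1;   // this layer exists while the trailing expression is evaluated
} one + one        // evaluates to 1 + 1 = 2; the layer then disappears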

Overloading

Several definitions may exist for any given expression. These definitions differ by the constraints imposed on the arguments: are they integers? Tuples? Must they satisfy a given property? One must also take into account the existence of extend fields in objects, which let objects masquerade as other objects. When no definition fits an expression, an error is raised. When exactly one definition fits, it is used. When more than one definition fits, things get ugly: first, the one which requires the fewest extend conversions is used; then, if ambiguity remains, the one from the highest layer is used.
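For instance (a sketch; describe and its int/string definitions are illustrative assumptions):

new describe;
for (x:int)    {{ describe x }} = "int";
for (x:string) {{ describe x }} = "string";

describe 42                            // "int": exactly one definition fits
describe (extend "Joe", age -> 35)     // "string": fits after one extend conversion
{ for (x:int) {{ describe x }} = "!"; }
  describe 42                          // "!": both int definitions fit with zero
                                       // conversions, so the topmost layer wins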

Environment Taint

This is the most difficult rule, which can be described as: "If a definition may affect the value of an expression, then it doesn't disappear from the environment for that value." The definition remains in the environment, specialized for that value only, and other definitions may disappear from under it. However, this definition in turn may reference other definitions, at which point they, too, remain in the environment in the same order. The objective of this rule is to allow returning objects along with their intended behaviour, instead of just their value. Otherwise, returning functions would be impossible.
new add;
for (x:int) {{add x}} = { 
  new closure; 
  for (y:int) {{closure y}} = x + y;
} closure;
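Presumably, the returned singleton can then be applied later:

(add 2) 3   // 5: the {{closure y}} definition, specialized to x = 2, is still alive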
In the example above, it is necessary for the {{ closure y }} definition to remain available. The environment taint rule performs this by keeping that definition alive until the returned value disappears. The rule is called taint because building a definition around a given value effectively taints the value with the definition forever. So, one of the funny things that you can do is this:
new strange;
for (x:int) {{strange x}} = {
  for (y:int) {{x + y}} = 0;
} x; 
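Presumably, the value returned by strange carries its taint around: (strange 5) + 3 would evaluate to 0 rather than 8, because the tainted definition of + shadows the usual one for that particular value.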

Environment Injection

This is the complex consequence of Environment Layers and Overloading. It means that the behaviour of an object inside a definition depends on the environment in which the definition is used (as opposed to usual languages, where the environment in which the definition is written is what matters). Consider:
new succ;
for (x:int) {{succ x}} = x + 1;

new poly;
for (x:int) {{poly x}} = (succ x) * (succ x); 

  // Returns 3*3 = 9
  ... poly 2 ...

  // Returns 2*2 = 4
  ... { for(x:int) {{succ x}} = x; } poly 2 ...
It's arguably not quite sane, but it has its uses. Besides, library writers can always protect themselves by reloading the correct layer of definitions on top of the layer defined by the user before the call:
// filename.ext
new succ;
for (x:int) {{succ x}} = x + 1;

new poly;
for (x:int) {{poly x}} = using(filename) (succ x) * (succ x); 

// somewhereelse.ext
using (filename);

  // Returns 3*3 = 9
  ... { for(x:int) {{succ x}} = x; } poly 2 ...

Static Checking

Well, this is still giving me trouble. I'll post more about this when I actually manage to think through it.

Ahnfelt
While I am opposed to operator overloading in most languages, I think this could be a very nice tool for making DSLs. I can't wait to see where this is going!

JuNC
This looks very interesting, I haven't absorbed it all yet but it's certainly looking easier to get at than the last thread which is still screwing with my head :)

I do suspect however that the static analysis is going to be very hard; it seems like many of the problems of dynamic binding occur, as well as various typing issues - that's just a gut feeling though, so pinch of salt etc.

Ahnfelt: I'm interested in your opposition to overloading, could you expand? It seems to me that overloading based on type is one of the more pragmatically useful things to add to a language since so much of maths is based on the idea of symbol reuse. Perhaps it's just my graphics bias as I like to be able to look at vector expressions in infix notation ;)

ToohrVyk
Now, concerning static checking. First, I introduce new statements which may appear anywhere a declaration may appear: the assert and isdef statements. They work like this:
conditions assert(expr)
conditions isdef(expr)

Typical conditional and quantifier factoring rules apply.

Generating constraints


Whenever you write an expression, it generates a constraint signature, which is the set of all conditions that must be verified by the environment for the expression to be valid. Constraints (including user-provided ones) may also use the isdef(expr) constraint, which is verified if and only if the expression has a well-defined meaning. One can use this to determine whether a variable is defined (isdef(varname)), whether a given operator works for a given type (for(a:type,b:type) isdef(a+b)), whether an entity provides a using operator (isdef(using(a) a)), and so on. Such constraints are also automatically generated whenever code is encountered. Note, however, that isdef is not a boolean value (so that lattice traversal using the LFP terminates) but an assert-like check: it is not possible to assert that something is not defined. For example:
// Defining McCarthy's 91 function
new maccarthy;
for (x:int) if (x >= 0)
  if (x < 100)
    {{maccarthy x}} = maccarthy (maccarthy (x + 11));
  else
    {{maccarthy x}} = x - 10;

// Automatic generation of static constraints:
for (x:any) isdef(x:int);
for (x:int) isdef(x >= 0);
for (x:int) isdef(x < 100);
for (x:int) assert ((x >= 0):bool);
for (x:int) assert ((x < 100):bool);

for (fun:any, arg:any) def(fun arg) =
  if (fun = maccarthy)
    if (arg:int)
      if (arg >= 0)
        if (arg < 100) {
          isdef(arg + 11);
          isdef(maccarthy (arg+11));
          isdef(maccarthy (maccarthy (arg+11)));
        } else {
          isdef(arg - 10);
        }


As shown above, the compiler automatically generates the constraints used to determine whether code is valid. It contains five global constraints, which represent the assumptions made by the defining code about integers, and four conditional constraints, which represent the assumptions made by the function about the calling environment.

Checking constraints


A constraint is checked at various times, depending on its type. Global constraints are checked once at the point where the definition is generated, and then once every time an environment change happens while the definition exists (this includes environment taint). Sequential changes in the environment prior to evaluating an expression count as only one change (and the check is done at the end). This lets the programmer write a sequence of definitions that temporarily violates the constraints, in order to redefine something, without failing the check:
// Generates: for (x:int) assert (isdef (odd x))
for (x:int) if (odd x) {{x % 2}} = 1;
// Fails the above constraint, since it's a newly defined entity with no operators
new odd;
// This satisfies the above constraint again
for (x:int) {{odd x}} = true;


Conditional constraints are checked whenever a constraint involving them is also checked. For instance, if the compiler needs to determine whether for(i:int) if(i >= 0) isdef(maccarthy i), then the three constraints for (i:int) isdef(i + 11), for (i:int) isdef(maccarthy (i + 11)) and for (i:int) isdef(maccarthy(maccarthy(i + 11))) would require checking. As soon as all constraints that are required have been verified, the code is allowed. If a constraint is provably violated, the compiler may issue an error. If a constraint may not be proven to be either verified or violated, the compiler may either issue an error or issue a warning and decorate the code with a runtime assertion, depending on the settings.

For performance reasons, all checks performed in the default environment (that is, without injection that would affect the checks) are remembered, so that the check only occurs once, and is reused everywhere else. A check is only done twice if there is an environment injection which might affect the outcome (at which point it's expected that the check occur again anyway). In short, when only part of an environment was changed, checking is optimized by only re-doing the checks that depend on the changed part, not the entire environment (so that, if the user leaves integers alone, the entire integer-checking does not happen again whenever the user defines a local lambda).

The solving algorithm


Proving happens as a recursive process. Everything starts with a knowledge base about the environment, to which information is added, and a node (what is to be proved). The proof is done by backward induction: if a proof did establish that the node is true, what would its penultimate step be? The solver picks a possible such property by pattern-matching the node against axioms and the right-hand sides of theorems in the knowledge base, selecting the most probable one using heuristics (simplicity, resemblance to other elements in the current code, and previous successes). If it's an axiom, the proof is complete. If it's the inverse of an axiom, the proof indicates incorrectness and immediately fails (if it was the only way to prove something, the entire proof is incorrect). If it's a theorem, the solver goes on to prove the left-hand member of the theorem recursively, after adding the node to the knowledge base (to obtain the GFP instead of the LFP).

The actual details and heuristics of this are still a work in progress...

Ahnfelt
Quote:
Original post by JuNC
Ahnfelt: I'm interested in your opposition to overloading, could you expand? It seems to me that overloading based on type is one of the more pragmatically useful things to add to a language since so much of maths is based on the idea of symbol reuse. Perhaps it's just my graphics bias as I like to be able to look at vector expressions in infix notation ;)


I can see how it's useful that + is defined for integers, floating point, complex numbers, vectors, matrices and quaternions, since that is an established style generally agreed on. I do not mind if the language has predefined overloaded operators (it's pretty nice, actually).

However, if the users of the language are allowed to overload operators, it results in abominations such as << being used for both "shift left" and "write to stream", or complete madness such as the boost::spirit parser, which implements a BNF-esque syntax in C++ by overloading completely unrelated operators.

In the end, if the language has operators defined for the most common types, that's all I think there should be. A library, say, that overloads <- to write to a stream seems much less clear to me than having a method called write().

If it had never been done before and I was seeing this feature for the first time, I'd love it. It's just that history (C++) has shown that it will get abused.

Quote:
Original post by Ahnfelt
If it had never been done before and I was seeing this feature for the first time, I'd love it. It's just that history (C++) has shown that it will get abused.


I would argue that 'it will be abused' is a poor reason to exclude any language feature. Sufficiently evil programmers can write evil code in any language.

Ahnfelt
Which is why I base my opposition to the concept on what has happened in another language that implements the feature. I do not oppose the feature because it can be abused, but because apparently it will be.

Edit: Whoops, that was not really a reply. Instead, let me hear your argument for why it's a poor reason to exclude a language feature that, extrapolating from C++ usage, will be abused at least as much as it is put to good use. Of course, this is a subjective matter, but let's assume it is for the sake of the argument ;-)

Then again, we're sidetracking the thread...

Quote:
Original post by Ahnfelt
Then again, we're sidetracking the thread...


I don't mind [smile] please go on, it's interesting.

ToohrVyk
I'm back with some more thoughts.

New declaration semantics


I've been thinking about the Open-Closed Principle lately, as well as the difficulty of handling environment injection correctly. So, I'm making two essential changes to the system: I'm dropping the using(x) statement, and I'm restricting overloading declarations so that they may only affect expressions which contain an element that's currently being defined. What this means is that you cannot change what happens when two objects appear in a given expression unless you're currently defining one of the two objects. To give a concrete example, you may write for(x : t) {{func x}} = x; if you are currently defining t (and func already exists), or if you are currently defining func (and t already exists), but not in any other situation.

This brings a change to how definitions are actually grouped: they're now bound to an object creation (using the new x statement). So, I'm altering the grammar to look like:
expr =
 | x                          // Variable name, literal or object creation.
 | expr.field                 // Field access. Overloadable.
 | expr:expr                  // Satisfying a contract. Specially overloadable.
 | expr[expr]                 // Subscripting. Overloadable.
 | expr expr                  // Function application. Overloadable.
 | expr op expr               // Binary operation (+,-,&&,^,%,<<,== etc). Overloadable.
 | op expr                    // Unary operation (-,*,!). Overloadable.
 | expr as expr               // Explicit value casting. Overloadable.
 | new name { decl* }         // Object definition.
 | let name = expr in expr    // Expression naming.
 | if expr then expr else expr

decl =
 | conditions $ expr $ = expr; // Conditional definition.

The object definition evaluates to an object. The name n of the object is available within the declaration block that follows it, and that declaration block is mutually recursive (the bindings are only checked for existence after the declaration block is finished). Every single declaration in the block must declare an expression where one of the members is a new variable. A new variable is either n, or a variable x quantified as x:y where y is a new variable. An example of source code (a counter-example to this rule follows it):

//pair.ext
new pair
{
  for (x:any, y:any) $(x,y) as pair$ = (first -> x, second -> y);

  for (x:pair) $x.swap$ = (x.second, x.first) as pair;

  $pair.twice$ = new twice
  {
    for (x:any) $twice x$ = (x,x) as pair;
  };
}
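By contrast, here is a sketch of a declaration that the new-variable rule rejects (pair2 is a hypothetical object):

new pair2
{
  // Error: no member of the declared expression is a new variable;
  // pair2 itself does not appear in it at all.
  for (x:int, y:int) $x * y$ = 0;
}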


An advantage of this approach is that the behaviour of any expression or declaration that doesn't depend on unprovided parameters is checked for correctness exactly once, when it is encountered.

The new semantics drop the notion of layered environments (or environments at all). Since declarations are attached to a defined object, we consider objects as data-and-expressions associations. Whenever several objects are involved in an expression, all the declarations attached to them are filtered to determine which may be used. The one which requires the fewest type conversions is kept (if none fit, or several tie, a missing-declaration or ambiguity error occurs, depending on the situation) and evaluated.
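For instance (a sketch; meters is illustrative, and int arithmetic is assumed predefined):

new meters
{
  for (i:int) $i as meters$ = i;
  for (a:meters, b:meters) $a + b$ = ((a as int) + (b as int)) as meters;
}

// Both operands involve meters, so the + declaration attached to meters
// is the one that fits with the fewest conversions:
(1 as meters) + (2 as meters)   // 3 as meters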

New class behaviour


There is a bit of compiler magic behind the as operator. All values have a .class field which represents their class. For instance, 1.class evaluates to int. This can be changed by using the as operator: (1 as unsigned).class evaluates to unsigned even though the value remains the same. This remains true when the as operator is overloaded by the programmer: in our example above, ( (1,2) as pair ).class evaluates to pair. Objects and tuples have the class any, strings have the class string, and arrays have the class array.
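For instance (a sketch of the rules just described):

1.class                   // int
(1 as unsigned).class     // unsigned: the class changed, the value did not
((1,2) as pair).class     // pair, via the as overload from pair.ext
"Hello".class             // string
(name -> "Joe").class     // any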

The a:b operator could be seen mostly as a shortcut for a.class = b, but there is a subtle difference in the way implicit casting interacts with it. If you remember, implicit casting works using the extend keyword, by saying that a tuple or object may behave as if it were one of its own members. Thus, a:b is in fact a shortcut for "either a.class = b, or x.class = b for some value x that a transitively extends". Note that all objects implicitly extend any.
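For instance (a sketch, reusing the decay rule from the Values section):

let v = (extend 1, tag -> "one") in v : int   // true: v extends 1, and 1.class = int
// v : any also holds, since all objects implicitly extend any.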

An important element here is the Liskov Substitution Principle. Since overloading is mostly free and there is no actual type-checking system, one could do the following:
new base
{
  for (i:int) $i as base$ = i;
  $base.value$ = new value
  {
    for (b:base) $value b$ = b as int;
  };
  for (b:base) $b.double$ = (base.value b) * 2;
}

new derived
{
  for (b:base, j:int) $(b,j) as derived$ = (base -> extends b, value -> j);
  for (d:derived) $base.value d$ = (base.value d.base), d.value;
  $derived.incorrect$ = new incorrect
  {
    for (d:derived) $incorrect d$ = d.double;
  };
}


Therefore, base.double will have a "base-only" interface, which works well (because it's checked at definition time that everything is fine), and a "derived" interface, which specifies what a derived class must satisfy. For instance, in the situation above:
isdef ((base.value d) * 2)
When using a derived as a base, the "derived" version is used, and the checks are done. For performance reasons, the "derived" interface is generated lazily (because most of the time there is no need for it).

The garbage collector


The garbage collector can be heavily optimized depending on the usage that a given expression makes of a value. The collector uses four approaches:
  • (Slowest) A reference-counted heap system (sound because the language semantics preclude the existence of cycles) which destroys the object when nobody references it anymore.
  • (Slow) Destructor-based heap system where the object is destroyed when the encompassing object or scope is gone.
  • (Fast) A stack auto system where the object is destroyed as part of the expression code before returning a value.
  • (Fastest) A stack drop system where the object isn't even destroyed, just forgotten (which assumes that it has no members which are destructor-based).

The compiler builds the final code with placeholders where the allocation and deallocation should be. Then, based on the generated code, a second pass chooses the best possible allocation scheme based on the expected lifetime of the object.
