Optimization Mk.II

This thread is a follow-up to that older thread, which I had started right before my finals (how smart of me). I'm designing a compiler from a high-level, garbage-collected, strongly-typed, side-effect-free language to a low-level language such as C, C--, or one of GCC's intermediate representations. The objective is to let an existing optimizer handle the transformation of low-level code into machine code, along with the pipeline optimization, inlining decisions and similar low-level, machine-dependent optimizations that I have no intention of delving into. My own optimizing compiler would deal with areas that are usually not handled by compilers for low-level languages. I already have some high-level optimization ideas, described below. I was wondering whether any of you have experience with this level of optimization that you would like to share.
  • Elimination of unused object fields, and/or of the initialization of unused object fields. For example:
    class Person { 
      string name;
      int age;
      bool male;
    };
    
    def compare(a:Person,b:Person) -> bool
    {
      return (a.age > b.age) || (a.male == b.male);
    }
    
    def frobnicate(p:Person) -> bool
    {
      def someone = new Person { 
        name = someLongComputation();
        age = 42;
        male = false;
      };
      return compare(p,someone);
    }
    In this example, compare does not use the name field of Person. So, the initialization from someLongComputation() would be optimized out. Even better: if the compare function was always called on an object which does not need its name field for anything else, then an alternate version of Person without a name field would be created, optimizing memory usage.
  • Specialization of generic functions based on what constraints are present at the call site. For example:
    def angle(x:any) -> bool 
    {
      return (x:real) && (x >= 0) && (x <= 2 * Math::pi);
    }
    
    def frobnicate(a:angle) -> real 
    {
      return cos(a) * (1 - cos(a));
    }
    
    def cos(a:real) -> real
    {
      var a:real = a;
      while (a < 0) a += 2 * Math::pi;
      while (a > 2 * Math::pi) a -= 2 * Math::pi;
    
      assert(a:angle);
      return cos_lut(a);
    }
    
     In the above code, the naive implementation of cos loops to move its argument into the [0, 2π] interval. However, this is not necessary when the argument is of type angle, which already guarantees that condition. Therefore, cos applied to an angle, as in the frobnicate function, would probably be split into a new specialized version that skips the loops completely (a rough sketch of what this could compile to appears after this list).
  • Elimination or modification of garbage collection instructions where ownership can be inferred by the compiler. Typical example:
    def succ(x:complex) -> complex
    {
      return complex(x.re + 1,x.im);
    }
    
    def frobnicate(x:complex) -> complex * complex
    {
      def c = complex(10,20);
      return (succ x,succ c);
    }
    
     Here, three versions of succ would exist. The first (create) would assume x is needed elsewhere, and allocate memory for its return value. The second (reuse) would assume x is not needed elsewhere, and reuse its memory for the return value. The third (count) would assume that x is reference-counted, and decide based on the count whether memory should be allocated. In the frobnicate function, succ c would use the reuse version, while succ x would use the same convention as frobnicate itself. So, calling frobnicate on a variable that is not used anywhere else would result in a single allocation (that of c) even though three complex numbers are created. For greater numbers of arguments, more versions would be created. Version creation is on-demand, when enough call sites ask for it (typically, if a version is not used enough to justify the code bloat, it would not be created). A sketch of the create/reuse split also appears after this list.
  • Handling frequently evaluated results and sharing them between functions. For example:
    
    for(T:type) if (comparable(T))
    def find_max(x:array(T)) -> T
    {
      var i:index = 1;
      var m:T = x[0];
      while (i < length(x))
      {
        m = max(m,x[i]);
        i += 1;
      }
      return m;
    }
    
    def max_on_size(x:array(real)) -> real
    {
      def max = find_max(x);
      def len = to_real(length(x));
      return max / len;
    }
     In this example, length(x) is computed several times. For simplicity, we'll assume that this computation is costly, so evaluating it multiple times would be a waste. find_max would therefore publish the costly operations it performs on its argument (here, length(x)) as part of its interface. If the calling code recognizes some of these operations, it can provide find_max with a lazy evaluator. After execution, this evaluator contains the value of length(x) if it was computed, so max_on_size can use the existing value instead of recomputing it. A sketch of this lazy-evaluator idea appears after the list as well.
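
To make the second bullet concrete, here is a rough C++ sketch of what the two generated versions of cos might look like; the names (cos_any, cos_angle) and the stand-in lookup routine are invented for illustration and are not part of the design:

#include <cmath>

// Stand-in for the cos_lut table lookup from the example above.
double cos_lut(double a) { return std::cos(a); }

const double two_pi = 6.283185307179586;

// General version: the argument is an arbitrary real, so it is first
// reduced into the [0, 2*pi] interval, as in the naive implementation.
double cos_any(double a) {
    while (a < 0)      a += two_pi;
    while (a > two_pi) a -= two_pi;
    return cos_lut(a);
}

// Specialized version: the call site guarantees the 'angle' constraint
// (0 <= a <= 2*pi), so both reduction loops are dropped entirely.
double cos_angle(double a) {
    return cos_lut(a);
}

// frobnicate(a:angle) is compiled against the specialized version.
double frobnicate(double a) {
    double c = cos_angle(a);
    return c * (1 - c);
}
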
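Similarly, a hedged sketch of the create/reuse split from the third bullet, again in C++; the Complex struct, the unique_ptr-based storage and the suffixed names are mine, chosen only to show the idea:

#include <memory>
#include <utility>

struct Complex { double re, im; };

// "create" version: the argument may still be needed elsewhere,
// so fresh storage is allocated for the result.
std::unique_ptr<Complex> succ_create(const Complex& x) {
    return std::make_unique<Complex>(Complex{x.re + 1, x.im});
}

// "reuse" version: the compiler has proved the argument is dead after
// the call, so its storage is recycled for the result.
std::unique_ptr<Complex> succ_reuse(std::unique_ptr<Complex> x) {
    x->re += 1;
    return x;
}

// frobnicate: c is local and never aliased, so succ(c) picks the reuse
// version; succ(x) follows whatever convention frobnicate itself was
// compiled with (the conservative create version is shown here).
std::pair<std::unique_ptr<Complex>, std::unique_ptr<Complex>>
frobnicate(const Complex& x) {
    auto c = std::make_unique<Complex>(Complex{10, 20});
    return {succ_create(x), succ_reuse(std::move(c))};
}
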
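Finally, a small C++ sketch of the last bullet's lazy-evaluator idea; the Lazy cell, expensive_length and the exact calling convention are assumptions of mine, not a fixed part of the design:

#include <algorithm>
#include <cstddef>
#include <functional>
#include <optional>
#include <vector>

// A minimal lazy cell: computes its value at most once, then caches it.
template <typename T>
struct Lazy {
    std::function<T()> compute;
    std::optional<T> cached;
    const T& get() {
        if (!cached) cached = compute();
        return *cached;
    }
};

// Pretend length() is costly (e.g. a full list traversal in the real language).
std::size_t expensive_length(const std::vector<double>& x) { return x.size(); }

// find_max publishes its use of length(x) by taking it as a lazy cell.
double find_max(const std::vector<double>& x, Lazy<std::size_t>& len) {
    double m = x[0];
    for (std::size_t i = 1; i < len.get(); ++i) m = std::max(m, x[i]);
    return m;
}

double max_on_size(const std::vector<double>& x) {
    Lazy<std::size_t> len{[&x] { return expensive_length(x); }, {}};
    double mx = find_max(x, len);                 // forces length(x) once
    return mx / static_cast<double>(len.get());   // reuses the cached value
}
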
Bullet 1: if you allow any sort of serialization, this is a bad idea. An unused field could be something from a previous version, or file padding, or serve an assortment of other uses that are not obvious from the lack of use in the code.

Last bullet: you have to prove that the code in the function length() does not change X (i.e. that it is a const function). But you also have to prove that, between calls, nothing changes X in a way that would change the result of length(X). Depending on the complexity of the code, and on whether it is multi-threaded, you can have MAJOR problems proving this.
Quote:Original post by KulSeran
Bullet 1: if you allow any sort of serialization, this is a bad idea. An unused field could be something from a previous version, or file padding, or serve an assortment of other uses that are not obvious from the lack of use in the code.


Serialization is a user of all the serialized fields in an object, by definition, so no serialized fields would be removed from a potentially serialized object. The optimization would only occur when there could not possibly be any observable difference at runtime if the field were missing or uninitialized.

Quote:Last bullet: you have to prove that the code in the function length() does not change X (i.e. that it is a const function). But you also have to prove that, between calls, nothing changes X in a way that would change the result of length(X). Depending on the complexity of the code, and on whether it is multi-threaded, you can have MAJOR problems proving this.


Actually, the proof is trivial: no code is ever allowed to change a value in the language (since side-effects are not allowed). If a piece of code appears to modify an object, it instead creates a fresh copy of the object and applies the modification to the copy — which may be simplified by bullet three into modifying the object in-place if nobody else is using it, so no performance loss is caused by constant unnecessary cloning. In short, once you get your hands on an object, it stays the way it is forever.

Of course, the language allows local variables, which are in fact references to existing objects. Modifying a variable copies the current referenced value, alters the copy, and makes the variable reference the copy. Variables cannot be passed as arguments (but their referenced values are).
This sounds like a situation which would really need HotSpot-style runtime optimisation - otherwise you're going to get massive code bloat when all the functions get specialised in multiple ways (and if one specialisation isn't used very often, it'll spend more time being pulled into the instruction cache than it will save on computation, so you have to be selective).

Once you have something like that, the rest seems to collapse under inlining into normal optimisation - inlined function calls will let you easily see what variables are never being used (and you can just treat objects as a series of variables, to handle point 1), and will let you see mutually-exclusive arithmetic constraints and non-escaping variables (so you can stack-allocate them and avoid GC) and common subexpressions (so you can avoid calling length(x) multiple times).

I can't see how you could possibly do that in an offline compiler - there are far too many cases where inline-specialisation is possible (and it could grow exponentially with the number of function calls), but only a very small number where it's worthwhile, and you don't have enough data to identify those cases. Maybe multi-pass profile-guided-optimisation would let you get part of the way towards that?

Quote:Original post by ToohrVyk
Actually, the proof is trivial: no code is ever allowed to change a value in the language (since side-effects are not allowed). If a piece of code appears to modify an object, it instead creates a fresh copy of the object and applies the modification to the copy — which may be simplified by bullet three into modifying the object in-place if nobody else is using it, so no performance loss is caused by constant unnecessary cloning. In short, once you get your hands on an object, it stays the way it is forever.


So, basically, everything is pass by value?

So to have a function which modifies an existing object, you'd have to do something like:

something = func_that_modifies(something)


Which would create a copy of 'something', pass that to func_that_modifies, probably return that copy?

Frankly, that strikes me as potentially a worse performance hit than the things you've been talking about.

Correct me if I'm wrong :)
Quote:Original post by kyoryu
Quote:Original post by ToohrVyk
Actually, the proof is trivial: no code is ever allowed to change a value in the language (since side-effects are not allowed). If a piece of code appears to modify an object, it instead creates a fresh copy of the object and applies the modification to the copy — which may be simplified by bullet three into modifying the object in-place if nobody else is using it, so no performance loss is caused by constant unnecessary cloning. In short, once you get your hands on an object, it stays the way it is forever.


So, basically, everything is pass by value?

So to have a function which modifies an existing object, you'd have to do something like:

something = func_that_modifies(something)


Which would create a copy of 'something', pass that to func_that_modifies, probably return that copy?

Frankly, that strikes me as potentially a worse performance hit than the things you've been talking about.

Correct me if I'm wrong :)


Many languages handle objects in exactly that way; the performance hit is not as great as you might think.
Best regards, Omid
Excors: you're definitely right, a way to feed profiling results back into the optimization process would be a very large benefit. However, even without this, there is much that can be done. I disagree with your mention of an exponential increase with the number of calls: while this would indeed occur when inlining functions in order to do analysis, this is not the approach taken here.

Instead, I use static analysis to transfer information (about called functions, memory ownership, and so on) from callee to caller, where fixpoints can be identified in linear time. So, I have very easily extracted knowledge (does any function use this member? — just look at all functions), easily extracted knowledge (will this function ever use this member? — this only requires examining each function twice), and locally extracted knowledge (has this local object been aliased by someone else yet? — the answer is either "no", "yes" or "perhaps", to avoid undecidability).

In the end, I can find out some universal properties that don't even represent a runtime tradeoff (for instance, that nobody uses a given function, class or member) with a linear examination of my code. But, of course, knowing the number of calls to each version of a function would be an excellent asset in choosing which one is the default and which one isn't. Until then, I would have to rely on heuristics about call counts.
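
As a toy illustration of this callee-to-caller propagation, here is how the "which members does this function ever touch, directly or through its callees?" question could be answered with a simple fixpoint loop. This is only a C++ model I'm using to explain the idea, with made-up names; the real analysis would run over the compiler's own representation.

#include <map>
#include <set>
#include <string>
#include <vector>

// Per-function facts gathered in a single pass over the code.
struct FnInfo {
    std::set<std::string> direct_fields;   // fields read directly in the body
    std::vector<std::string> callees;      // functions called from the body
};

// Propagate used-field sets from callees to callers until nothing changes.
std::map<std::string, std::set<std::string>>
used_fields(const std::map<std::string, FnInfo>& fns) {
    std::map<std::string, std::set<std::string>> used;
    for (const auto& [name, info] : fns) used[name] = info.direct_fields;

    bool changed = true;
    while (changed) {                       // fixpoint iteration
        changed = false;
        for (const auto& [name, info] : fns)
            for (const auto& callee : info.callees)
                for (const auto& field : used[callee])
                    changed |= used[name].insert(field).second;
    }
    return used;
}

Once the map is computed, "does any function at all use this member?" is a simple union over it, without re-examining any function body.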

kyoryu: yes, everything is passed by value. This does not mean, however, that a copy is made whenever an argument is passed: it just happens to be "as if" the copy had been made. For instance, if the function never attempts to modify the argument, then passing the original instead of a copy makes no difference, so that's what happens. In fact, since it is the responsibility of the modifying code to create the copy (also known as Copy-On-Write), a copy will never be performed unless a modification happens.

And then, there's the reuse of the older object as the new one: if the optimizer can determine that nobody else holds an alias to the modified object, then the object is modified in place. For example, consider the following code:

def change(x:something) -> something
{
  x.a = 10;
  x.b = x.a + x.c;
  x.c = x.d * 2;
  return x;
}


Here, the x.a = 10; assignment would create a copy of x and fill it with all the original values of x for fields b, c and d. Then, the x.b = x.a + x.c; would notice that x now references the copy, which hasn't been aliased by anything else. So it's a modification in-place. The same thing would happen with the x.c = x.d * 2;.

Of course, if the change function can assume that its argument x is not aliased by anyone else, then even the initial copy is not performed and the modification is done in-place. The optimizer might even be able to detect statements of the form x = change(x); and eliminate the superfluous assignment (because x was changed in place).
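
For readers who prefer to see this in low-level terms, here is a rough C++ rendering of that behaviour; the CowRef handle and the use_count() test are my own stand-ins for what the generated code would do, not the actual runtime representation.

#include <memory>

struct Something { int a = 0, b = 0, c = 0, d = 0; };

// A copy-on-write handle: writes go through mutate(), which clones the
// payload only if someone else still holds a reference to it.
class CowRef {
    std::shared_ptr<Something> p;
public:
    explicit CowRef(Something v) : p(std::make_shared<Something>(v)) {}
    const Something& get() const { return *p; }
    Something& mutate() {
        if (p.use_count() > 1)                 // aliased: clone first
            p = std::make_shared<Something>(*p);
        return *p;                             // unaliased: modify in place
    }
};

// The 'change' function from above: the first write may trigger one copy,
// the following writes see an unaliased object and happen in place.
CowRef change(CowRef x) {
    x.mutate().a = 10;
    x.mutate().b = x.get().a + x.get().c;
    x.mutate().c = x.get().d * 2;
    return x;
}

If the caller hands over its last reference (e.g. via std::move), use_count() is 1 on entry and no copy is made at all, which corresponds to the "reuse the old object as the new one" case.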
Quote:Original post by ToohrVyk
I disagree with your mention of an exponential increase with the number of calls: while this would indeed occur when inlining functions in order to do analysis, this is not the approach taken here.

Instead, I use static analysis to transfer information (about called functions, memory ownership, and so on) from callee to caller, where fixpoints can be identified in linear time. ...
But if you're specialising the callees (as in your second bullet point), and they themselves can call more functions, and you're propagating constraints down the invocations, I can imagine there still being a danger - if you had code like
void f(int n) {
    if (n < 10)      g(n);
    else if (n < 20) g(n+1);
    else if (n < 40) g(n+2);
    else             g(n+3);
}
void g(int n) {
    if (n < 10)      h(n);
    else if (n < 20) h(n+1);
    else if (n < 40) h(n+2);
    else             h(n+3);
}
void h(int n) {
    if (n < 10)      i(n);
    else if (n < 20) i(n+1);
    else if (n < 40) i(n+2);
    else             i(n+3);
}
...
then you might want to start by specialising g for the case n<10 and for 11<=n<21 and for 22<=n<42 and for 42<n (or you would at least consider those cases to see if it's worth generating code for a specialised function). The first specialised version of g would then want to specialise h for n<10; the second of g would want h for 12<=n<21 and for 22<=n<23; the third of g would want h for 24<=n<42 and 43<=n<45; and so on, preferably without any arithmetic mistakes I might have made (but hopefully I didn't mess up the concept).

If you carried the analysis on far enough, maybe you'd find the answer is always 0 and the whole of f can be replaced by a single statement, which would be a nice optimisation. But since the number of specialisations can grow exponentially, it's not good for compile times, so presumably you'd have to choose some arbitrary limits, and then the optimiser is relying on a load of hacks (or I suppose you could call them heuristics). Not that there's anything unusual or wrong about that - it's just not an elegant algorithm any more, which I suppose is the cost when you want it to work [smile]

That seems to be irrelevant when you're only transferring information up the call chain (i.e. from callee to caller, assuming there are no loops), e.g. "this function only uses the fields 'age' and 'male'" - in that case it seems like a form of type inference, where compare is given the type a:Person[age,male],b:Person[age,male] -> bool, and the type system has a subtype relation Person[age,male] <: Person so you can still pass any arbitrary Person to that function, and the optimiser looking at a call to compare can see that only the age and male fields are needed without having to look inside the definition of compare.

From your example code, it looks like the Person creation is equivalent to
def someone_name = someLongComputation();
def someone_age = 42;
def someone_male = false;
and once you've got that kind of type inference, the call to compare(a:Person[age,male],b:Person[age,male]) -> bool could be compiled into compare(p.age, p.male, someone_age, someone_male) and then some sort of live variable analysis lets you see that the assignment to someone_name is unnecessary and you can delete that statement, which seems to be what you'd like to do. Or am I missing some horrible flaws in this method?
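
In low-level terms, the transformed code might come out roughly like the following C++; the struct and parameter names are invented, and it flattens the object exactly as described above.

// Specialised record holding only the fields compare actually needs.
struct Person_age_male { int age; bool male; };

bool compare(int a_age, bool a_male, int b_age, bool b_male) {
    return a_age > b_age || a_male == b_male;
}

bool frobnicate(const Person_age_male& p) {
    // someone_name = someLongComputation() is dead, so it is never emitted.
    const int  someone_age  = 42;
    const bool someone_male = false;
    return compare(p.age, p.male, someone_age, someone_male);
}
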
