[java] Performance vs. improved readability (Escape analysis does not work!?)

Started by
17 comments, last by Frederick 13 years, 11 months ago
Hi, I need some help from serious Java game developers. At the moment I am revisiting my matrix code. My previous version was basically an array of values. Unfortunately, this is not the most intuitive way to deal with matrix programming. This time I tried a more OOP style:

public class NewMatrix3 {
    private Vector3 x, y, z;
    private Point3 w;
     ...
}
The matrix now has subobjects, which bring their own methods. From a programming point of view this makes lots of things easier (at least in my opinion =)). In particular, I tried this:

    public Point3 multiply(Point3 p) {
        Point3 r = new Point3(w);

        Vector3 dx = x.multiply(p.getX());
        Vector3 dy = y.multiply(p.getY());
        Vector3 dz = z.multiply(p.getZ());

        r.translate(dx); r.translate(dy); r.translate(dz);

        return r;
    }

The purpose of this method is to transform a point in space by the matrix. The method is unorthodox, but much more visual than plain row/column dot-product multiplication: the result point starts at the origin w of the matrix's coordinate system, and a linear combination of the axis vectors is added to it. So mathematically all should be right?! I hope so, but that's not the point.

We have a method here with at least 3 local objects, the vectors dx, dy, dz, which are allocated inside the scalar multiply method of Vector3. If I understand correctly, these are perfect candidates for escape analysis. That's a thing I really dislike about Java... code like this is simply not feasible from a performance standpoint (at least it was), but is perfectly fine in C++ ;-( I would really like to be able to write like this. So I tried -server -XX:+DoEscapeAnalysis with a small test program:

        NewMatrix3 matrix = new NewMatrix3();
        matrix.loadAxisAngleRotation(30, new Vector3(1,1,1));
        Point3 p = new Point3();
        for(;;) {
            p.set((float)Math.random() * 100f, (float)Math.random() * 100f, (float)Math.random() * 100f);

            Point3 r = matrix.multiply(p);

            System.out.println(r.getX());

        }
Unfortunately, there is NO difference compared to running without escape analysis. The NetBeans profiler reports a huge number of allocated Vector3 objects. But if I got this right, none should be allocated at all.

I would really appreciate some help, because I like Java and it has advantages that pay off later in game programming, like good scripting support, continuations via Javaflow, reflection, low compilation times, and runtime code loading. But this sacrifice of readability in exchange for performance is really a pain. I am even thinking of porting my project to C++... :-(

Btw., escape analysis seems to work somehow, as it lowered total object creation in my main program. So it does at least something. But I believe the example I provided should be the standard case for escape analysis and should provide a stellar speed-up, shouldn't it?!

Thank you! Really desperate for help, Frederick
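For comparison, this is roughly what the multiply method looks like once the temporaries dx, dy, dz are written out by hand, which is the shape scalar replacement would ideally produce. It is only a sketch: Vec3 and ScalarMatrix3 are made-up stand-ins for the Vector3, Point3 and NewMatrix3 classes from the post, and writing the result into a caller-supplied array is an assumption of this example, not something from the original code.

```java
// Hypothetical stand-in for the thread's Vector3/Point3: just three floats.
final class Vec3 {
    final float x, y, z;

    Vec3(float x, float y, float z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }
}

// Hypothetical counterpart of NewMatrix3: three axis vectors plus an origin w.
final class ScalarMatrix3 {
    private final Vec3 ax, ay, az, w;

    ScalarMatrix3(Vec3 ax, Vec3 ay, Vec3 az, Vec3 w) {
        this.ax = ax;
        this.ay = ay;
        this.az = az;
        this.w = w;
    }

    // Computes w + px*ax + py*ay + pz*az and stores it in out[0..2].
    // No temporary vector objects are created along the way.
    void multiply(float px, float py, float pz, float[] out) {
        out[0] = w.x + px * ax.x + py * ay.x + pz * az.x;
        out[1] = w.y + px * ax.y + py * ay.y + pz * az.y;
        out[2] = w.z + px * ax.z + py * ay.z + pz * az.z;
    }
}
```

This is exactly the readability trade-off the post complains about: the linear combination is still there, but it is no longer visible as three named steps.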
ummpf....

I just tried the Windows snapshot of JDK 7. EA should be enabled by default, but again I get NOOOO results. I am getting a little frustrated...
r.translate(dx);
return r;

Why do you think dx is not "escaping"?
Hmm,

let's see

Quote:
The -XX:+DoEscapeAnalysis option directs HotSpot to look for objects that are created and referenced by a single thread within the scope of a method compilation.


dx is created within the scope, that's "x.multiply(p.getX())"
dx is referenced within the scope, that's "r.translate(dx)"

Potentially dx could escape through "translate", but it is only accessed read-only.
In this case the optimization should kick in. I forgot to provide that code snippet.

If an object "escapes" but is only accessed read-only, HotSpot might inline all the code in the method and eliminate the object creation.
You may check this here:

http://java.sun.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html

The thing is, I tried to run the example from this page, but escape analysis didn't kick in either. I have some code snippets now that are clearer:

public void rotate(float angle, Vector3 axis) {
    Matrix3 rotationMatrix = new Matrix3();
    rotationMatrix.loadAxisAngleRotation(angle, axis);
    rotationMatrix.transformOnlyOrientation(this);
}


You can see that the "rotationMatrix" isn't even passed as a parameter this time, so there is no hidden magic, but the problem remains. The above example produces lots of object allocations.

I would really like this to work; I don't want to use static methods or hacks like that.

Here is the "defensive copy" example from the page:

class Matrix3 {
    private Vector3 x, y, z;
    private Point3 pos;

    Vector3 getX() {
        // Defensive copy
        return new Vector3(x);
    }
}


You see the matrix has subobjects for its axes and these are returned "safely" in order to prevent manipulation from the outside. The link states this should be optimized, but it isn't.

I would be very grateful if anybody could check that and give me advice on how to enable the feature: which VM version I need and which parameters. According to the link it should work, but it doesn't.

Btw., I am using the -Xloggc mechanism and the NetBeans memory profiler to check whether EA kicks in. Please tell me if anything might be wrong with that.

My Java version is:
java version "1.7.0-ea"
Java(TM) SE Runtime Environment (build 1.7.0-ea-b91)
Java HotSpot(TM) Client VM (build 18.0-b03, mixed mode, sharing)

Thank you a lot for your attention,
Frederik
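One detail in the version output above may matter: it reports the Client VM. As far as the HotSpot documentation goes, escape analysis is implemented in the server compiler, so -XX:+DoEscapeAnalysis would only be expected to do anything when the Server VM is actually the one running. A small sketch for checking, from inside the program, which VM is in use and which flags really reached it. VmInfo and its method names are made up for this example; the two library calls are standard.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports which VM is running and with which flags.
final class VmInfo {
    // VM name, i.e. the part of `java -version` that says Client VM or Server VM.
    static String vmName() {
        return System.getProperty("java.vm.name");
    }

    // Arguments passed to the JVM itself (not the main class or its arguments).
    static List<String> vmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    public static void main(String[] args) {
        System.out.println("VM:    " + vmName());
        System.out.println("Flags: " + vmArguments());
    }
}
```

If the flags were set in one place (for example an IDE project setting) and the program is launched another way, printing them like this shows whether they were passed at all.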
Escape analysis will always be just an optional run-time optimization. It may kick in, or it may not.

If performance is important, use a different style. If performance is optional and source code is what matters most, then use OO style. If performance is paramount, use C# or C++.

Java, due to the design of its memory model, is the worst possible case for this type of operation and will always remain so. The biggest problems do not come from temporaries, but from the lack of in-place array allocations.

For performance-sensitive processing, matrices and vectors would be organized in arrays and processed sequentially to maximize cache locality. This isn't possible in Java, so that is a factor of 8-20 performance hit vs. C/C++.

Java is simply the wrong abstraction for this type of problem.
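The layout described above (vectors stored back to back and walked sequentially) has no direct equivalent for arrays of Java objects, because such an array holds references rather than the objects themselves. A common workaround, at the cost of the object-style API, is to pack the components into one primitive array. A minimal sketch, assuming a made-up Vec3Buffer class; whether this closes any of the gap quoted above is not something the sketch measures.

```java
// Hypothetical packed storage: vector i occupies data[3*i] .. data[3*i + 2].
final class Vec3Buffer {
    private final float[] data;

    Vec3Buffer(int count) {
        data = new float[3 * count];
    }

    int size() {
        return data.length / 3;
    }

    void set(int i, float x, float y, float z) {
        int o = 3 * i;
        data[o] = x;
        data[o + 1] = y;
        data[o + 2] = z;
    }

    // component: 0 = x, 1 = y, 2 = z
    float get(int i, int component) {
        return data[3 * i + component];
    }

    // Adds (dx, dy, dz) to every vector in one sequential pass over the array.
    void translateAll(float dx, float dy, float dz) {
        for (int o = 0; o < data.length; o += 3) {
            data[o] += dx;
            data[o + 1] += dy;
            data[o + 2] += dz;
        }
    }
}
```

The price is the one the thread is about: there are no Vector3 objects with their own methods any more, only indices into a float array.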
Quote:code like this is simply not feasible from a performance standpoint ..., but is perfectly fine in c++

[irony]Hmm, interesting.
How much difference in performance?[/irony]

Quote:You see the matrix has subobjects for its axes and these are returned "safely" in order to prevent manipulation from the outside. The link states this should be optimized, but it isn´t.

The link doesn't say that. It says that "if the compiler determines that the original object is never modified, it may optimize and eliminate the call to make a copy".

Quote:dx is referenced within the scope, that's "r.translate(dx)"

Potentially dx could escape through "translate", but it is only accessed read-only.
In this case the optimization should kick in. I forgot to provide that code snippet.

But that implies that the escape analysis has to be performed through n levels of indirection. I am not sure about that. I cannot deduce that from the link. Can you give me any hint?
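On the indirection question, the HotSpot page linked earlier distinguishes three escape states: NoEscape, ArgEscape and GlobalEscape. A sketch of one method per state, using a made-up EscapeCases class. The comments describe how the documentation defines the states; what a particular VM build actually does with each method is not something this code can show.

```java
// Hypothetical illustration of the three escape states named in the HotSpot docs.
final class EscapeCases {
    static final class Vec {
        final float x, y, z;

        Vec(float x, float y, float z) {
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    // Anything stored here is reachable from other methods and threads.
    static Vec lastSeen;

    // NoEscape: the temporary never leaves this method,
    // so it is a candidate for being replaced by plain local floats.
    static float noEscape(float a) {
        Vec t = new Vec(a, a, a);
        return t.x + t.y + t.z;
    }

    // ArgEscape: the temporary is handed to a callee that only reads it.
    // Assumption: the allocation can only go away if the callee gets inlined.
    static float argEscape(float a) {
        Vec t = new Vec(a, a, a);
        return sum(t);
    }

    static float sum(Vec v) {
        return v.x + v.y + v.z;
    }

    // GlobalEscape: the temporary is published through a static field,
    // so it has to exist as a real heap object.
    static float globalEscape(float a) {
        Vec t = new Vec(a, a, a);
        lastSeen = t;
        return t.x + t.y + t.z;
    }
}
```

In the multiply method from the first post, dx, dy and dz are in the middle case: they are passed to translate, so whether they can be eliminated presumably depends on translate and the scalar multiply being inlined.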
Quote:If performance is paramount, use C# or C++.

If performance is paramount, use a proper algorithm.
Quote:Original post by ppgamedev
Quote:If performance is paramount, use C# or C++.

If performance is paramount, use a proper algorithm.


The right man in the wrong place can make all the difference.
Quote:Original post by ppgamedev
Quote:If performance is paramount, use C# or C++.

If performance is paramount, use a proper algorithm.


Yep. The optimal data-centric algorithm of this type is 8-20 times slower in Java than in any language that allows control over memory allocation. Usually that means C or C++, but Fortran would fit the bill as well; it's just a bit hard to find developers for it.

I speak from experience: when running a cluster of machines doing some memory-heavy processing, Java's lack of contiguous allocation prevents certain algorithmic optimizations. 8-20 times may not seem like much, but think of it this way: instead of needing half a rack in a data center, a single quad core will do. Data centers also make it difficult to do real-time graphics when dealing with tens of millions of vertices.

In Java, lack of cache locality kills this type of algorithm, and there is no way around it. Pooling and such prevents recycling, but it doesn't allow sequential ordering of elements.

Escape analysis under 1.6 has been shown to provide around a factor of 15 performance improvement in the best case; then there is still that factor of 8-20.

Until Java introduces a C#-style struct type, it's simply not economically viable for this type of task.
http://hal.inria.fr/docs/00/31/20/39/PDF/RT-0353.pdf

Well, in this report the difference with Fortran is not so dramatic.

This topic is closed to new replies.
