[java] Performance vs. improved readability (Escape analysis does not work!?)


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

18 replies to this topic

#1 Frederick   Members   -  Reputation: 140

Posted 01 May 2010 - 01:13 PM

Hi, I need some help from serious Java game developers. At the moment I am revisiting my matrix code. My previous version was basically an array of values. Unfortunately this is not the most intuitive way to deal with matrix programming. This time I tried a more OOP style:
public class NewMatrix3 {
    private Vector3 x, y, z;
    private Point3 w;
     ...
}
The matrix now has subobjects, which bring their own methods. From a programming point of view this makes lots of things easier (at least in my opinion =)). In particular I tried this:
    public Point3 multiply(Point3 p) {
        Point3 r = new Point3(w);

        Vector3 dx = x.multiply(p.getX());
        Vector3 dy = x.multiply(p.getY());
        Vector3 dz = x.multiply(p.getZ());

        r.translate(dx); r.translate(dy); r.translate(dz);

        return r;
    }

The purpose of this method is to transform a point in space by the matrix. The method is unorthodox, but much more visual than plain row/column dot-product multiplication: the result point starts at the origin w of the matrix's coordinate system, and a linear combination of the axis vectors is added to it. So mathematically all should be right?! I hope so, but that's not the point.

We have a method here with at least 3 local objects, the vectors dx, dy, dz, which are allocated inside the scalar multiply method of Vector3. If I understand it right, these are perfect candidates for escape analysis. That's a thing I really dislike about Java... code like this is simply not feasible from a performance standpoint (at least it was), but is perfectly fine in C++ ;-( I would really like to write code like this. So I tried -server -XX:+DoEscapeAnalysis with a small test program:
    NewMatrix3 matrix = new NewMatrix3();
    matrix.loadAxisAngleRotation(30, new Vector3(1,1,1));
    Point3 p = new Point3();
    for(;;) {
        p.set((float)Math.random() * 100f, (float)Math.random() * 100f, (float)Math.random() * 100f);

        Point3 r = matrix.multiply(p);

        System.out.println(r.getX());
    }
Unfortunately there is NO difference compared to running without escape analysis. The NetBeans profiler reports a hell of a lot of Vector3 allocations, but if I got this right, none should be allocated at all.

I would really appreciate some help, because I like Java, and there are advantages to come later on in game programming, like good scripting support, continuations via Javaflow, reflection, low compilation times and runtime code loading. But this sacrifice of readability in exchange for performance is really a pain. I am even thinking of porting my project to C++... :-(

Btw., escape analysis seems to work somehow, as it lowered total object creation in my main program. So it does at least something. But I believe the example I provided should be the standard case for escape analysis and should give a stellar speed-up, shouldn't it?!

Thank you! Really desperate for help, Frederick
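
The transform described in this post (start at the origin w, then add each axis scaled by the matching coordinate of p) can be sketched in self-contained form. This is only a sketch under assumptions: the field layouts of Vector3 and Point3 below are hypothetical, since the original classes are not shown, and it assumes each coordinate scales its own axis (x, y and z respectively).

```java
// Hypothetical minimal versions of the classes from the post; the real ones are not shown.
final class Vector3 {
    final float x, y, z;
    Vector3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }

    // Scalar multiply: allocates a new temporary, as described in the post.
    Vector3 multiply(float s) { return new Vector3(x * s, y * s, z * s); }
}

final class Point3 {
    float x, y, z;
    Point3(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    Point3(Point3 other) { this(other.x, other.y, other.z); }

    void translate(Vector3 v) { x += v.x; y += v.y; z += v.z; }
}

final class NewMatrix3 {
    private final Vector3 x, y, z; // axis vectors
    private final Point3 w;        // origin

    NewMatrix3(Vector3 x, Vector3 y, Vector3 z, Point3 w) {
        this.x = x; this.y = y; this.z = z; this.w = w;
    }

    // r = w + p.x * x + p.y * y + p.z * z
    Point3 multiply(Point3 p) {
        Point3 r = new Point3(w);
        r.translate(x.multiply(p.x)); // three short-lived temporaries,
        r.translate(y.multiply(p.y)); // the candidates for escape analysis
        r.translate(z.multiply(p.z));
        return r;
    }
}
```

Written out per component, this gives the same result as the usual row/column product of a 3x3 rotation part plus a translation, which is why the more visual form is mathematically equivalent.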

#2 Frederick   Members   -  Reputation: 140

Posted 02 May 2010 - 12:33 AM

ummpf....

I just tried the Windows snapshot of JDK 7. EA should be enabled by default, but again I have NOOOO results. I am getting a little frustrated...

#3 ppgamedev   Members   -  Reputation: 311

Posted 02 May 2010 - 09:29 AM

r.translate(dx);
return r;

Why do you think dx is not "escaping"?

#4 Frederick   Members   -  Reputation: 140

Posted 02 May 2010 - 11:40 AM

Hmm,

lets see

Quote:

The -XX:+DoEscapeAnalysis option directs HotSpot to look for objects that are created and referenced by a single thread within the scope of a method compilation.


dx is created within the scope, that's "x.multiply(p.getX())"
dx is referenced within the scope, that's "r.translate(dx)"

Potentially dx could escape through "translate", but it is only accessed read-only.
In this case the optimization should kick in. I forgot to provide that code snippet.

If an object "escapes" and is read-only accessed, HotSpot might inline all the code in the method and eliminate the object creation.
You may check this here:

http://java.sun.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html

The thing is, I tried to run the example from this page, but escape analysis didn't kick in either. I have some code snippets now that are clearer:


public void rotate(float angle, Vector3 axis) {
    Matrix3 rotationMatrix = new Matrix3();

    rotationMatrix.loadAxisAngleRotation(angle, axis);
    rotationMatrix.transformOnlyOrientation(this);
}


You can see that "rotationMatrix" isn't even passed as a parameter, so there is no hidden magic this time, but the problem remains. The above example produces lots of object allocations.

I would really like this to work; I don't want to use static methods or hacks like that.

Here is the "defensive copy" example from the page:


class Matrix3 {
    private Vector3 x, y, z;
    private Point3 pos;

    Vector3 getX() {
        // Defensive copy
        return new Vector3(x);
    }
}


You see the matrix has subobjects for its axes and these are returned "safely" in order to prevent manipulation from the outside. The link states this should be optimized, but it isn't.

I would be very grateful if anybody could check that and advise me how to enable the feature: which VM version I need and which parameters. According to the link it should work, but it doesn't.

Btw., I am using the -Xloggc mechanism and the NetBeans memory profiler to check whether EA kicks in. Please tell me if anything might be wrong with that.

my java version is:
java version "1.7.0-ea"
Java™ SE Runtime Environment (build 1.7.0-ea-b91)
Java HotSpot™ Client VM (build 18.0-b03, mixed mode, sharing)

Thank you a lot for your attention,
Frederik


#5 Antheus   Members   -  Reputation: 2397

Posted 03 May 2010 - 08:33 AM

Escape analysis will always be just optional run-time optimization. It may kick in, or it might not.

If performance is important, use different style. If performance is optional and source code is what matters most, then use OO style. If performance is paramount, use C# or C++.

Java, due to the design of its memory model, is worst possible case for this type of operations and will always remain as such. The biggest problems do not come from temporaries, but from lack of in-place array allocations.

For performance sensitive processing, matrices and vectors would be organized in arrays and processed sequentially to maximize cache locality. This isn't possible in Java, so that is a factor 8-20 performance hit vs. C/C++.

Java is simply a wrong abstraction for this type of problems.

#6 ppgamedev   Members   -  Reputation: 311

Posted 03 May 2010 - 08:50 AM

Quote:
code like this is simply not feasible from a performance standpoint ..., but is perfectly fine in c++

[irony]Hmm, interesting.
How much difference in performance?[/irony]

Quote:
You see the matrix has subobjects for its axes and these are returned "safely" in order to prevent manipulation from the outside. The link states this should be optimized, but it isn´t.

The link doesn't say that. It says that "if the compiler determines that the original object is never modified, it may optimize and eliminate the call to make a copy".

Quote:
dx is referenced within the scope, thats "r.translate(dx)"

Potentially dx could escape through "translate" but it is only accessed read only.
In this case the optimization should jump in. I missed to provide that code snipped.

But that implies that the escape analysis has to be performed through n levels of indirection. I am not sure about that. I cannot deduce that from the link. Can you give me any hint?

#7 ppgamedev   Members   -  Reputation: 311

Posted 03 May 2010 - 09:07 AM

Quote:
If performance is paramount, use C# or C++.

If performance is paramount, use a proper algorithm.

#8 nullsquared   Members   -  Reputation: 122

Posted 03 May 2010 - 09:11 AM

Quote:
Original post by ppgamedev
Quote:
If performance is paramount, use C# or C++.

If performance is paramount, use a proper algorithm.


The right man in the wrong place can make all the difference.

#9 Antheus   Members   -  Reputation: 2397

Posted 03 May 2010 - 09:20 AM

Quote:
Original post by ppgamedev
Quote:
If performance is paramount, use C# or C++.

If performance is paramount, use a proper algorithm.


Yep. Optimal data-centric algorithm of this type is 8-20 times slower in Java than in any language that allows control over memory allocations. Usually C or C++, but Fortran would fit the bill as well, it's just a bit hard to find developers for it.

I speak from experience, when running a cluster of machines, doing some memory heavy processing, Java's lack of continuous allocation prevents certain algorithmic optimizations. 8-20 times may not seem much, but think of it this way - instead of needing half a rack in a data center, a single quad core will do. Data centers also make it difficult to do real-time graphics, when dealing with tens of millions of vertices.

In Java, lack of cache locality kills this type of algorithms, and there is no way around it. Pooling and such prevents recycling, but it doesn't allow sequential ordering of elements.

Escape analysis under 1.6 has been shown to provide around factor of 15 performance improvement in best case - then there is still that 8-20 factor.

Until Java introduces C# struct-like type, it's simply not economically viable for this type of tasks.

#10 ppgamedev   Members   -  Reputation: 311

Posted 03 May 2010 - 11:04 AM

http://hal.inria.fr/docs/00/31/20/39/PDF/RT-0353.pdf

Well, in this report the difference with Fortran is not so dramatic.

#11 Frederick   Members   -  Reputation: 140

Posted 03 May 2010 - 11:32 AM

Hey!

Thanks for your answers!

Quote:

Hmm, interesting.
How much difference in performance?


Frankly I can't tell, but I don't like to have an allocation hell going on in my project. Maybe it's not a big deal in this example, but it has the potential to add up quickly. The JMonkeyEngine guys seem to be fine with it, so maybe I should be too... but I don't like allocations running out of control.


Quote:

Java, due to the design of its memory model, is worst possible case for this type of operations and will always remain as such. The biggest problems do not come from temporaries, but from lack of in-place array allocations.


Actually that is not quite right... I use ByteBuffer objects to store all my vertices (same for matrices) and pass them to the JNI C++ layer. ;-)
But you are right. This is an annoying hack... but I got by with it.

So no big performance hit here...

I primarily chose Java because I hoped for more productivity. That has mostly held true so far, thanks to the great refactoring tool support, the lack of header files and so on. I was also annoyed by all the small flaws of C++ that one has to work around... it's simply not a clean language.
It would be a strange insight in my programming career if C++ actually allowed MORE intuitive programming for algebra than Java.

Frankly I like Java and its dynamic nature... There are just a few flaws. EA could solve the stack allocation problem.

Quote:

Escape analysis under 1.6 has been shown to provide around factor of 15 performance improvement in best case


How the heck do I reliably switch that thing on? (Yes, I tried the -XX:+DoEscapeAnalysis flag.) If I am right, it is just as capable as the C++ stack-scope auto-deallocation thing, and may become more capable in the future.
The analysis phase takes startup time, but that doesn't matter.

I have come to believe that EA never worked in any of my examples. (The one I wrote about was most likely a change of scale in the profiler, which looked like a lower allocation level.)

Has anybody seen real results with EA, and can you tell me how to reproduce that: how to enable it and verify that it kicks in?

Lets take this example again:
Quote:

public void rotate(float angle, Vector3 axis) {
Matrix3 rotationMatrix = new Matrix3();

rotationMatrix.loadAxisAngleRotation(angle, axis);
rotationMatrix.transformOnlyOrientation(this);
}


The matrix allocation should clearly be optimized away, or am I wrong about that?

Quote:

But that implies that the escape analysis has to be performed through n-level indirection. I am not sure about that. I cannot deduct that from the link. Can you give me any hint?


OK, you refer to my first, more complicated example; keep in mind that the simpler one does not work for me either. I have no exact source, but I figured that if HotSpot is capable of this:

Quote:

The method makes a copy to prevent modification of the original object by the caller. If the compiler determines that the getPerson() method is being invoked in a loop, it will inline that method. In addition to this, by escape analysis, if the compiler determines that the original object is never modified, it may optimize and eliminate the call to make a copy.


Let's see:

1.) multiply is in a loop and gets inlined.
2.) translate() is in multiply and thus in a loop (the indirection you mentioned)
3.) dx is accessed read-only and its pointer is not stored in a collection

So the case is at least similar... why would HotSpot only inline one call level?
Why wouldn't it bake the second level into one big soup too!? ;-)

Not sure about that, but keep in mind that the more obvious example I provided before doesn't work either.
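
What the hoped-for optimization would amount to can be sketched by hand. Assuming the calls are inlined and the temporary never escapes, a JIT compiler may replace the object by its fields held in locals (scalar replacement). The two methods below are an illustrative sketch with hypothetical names, not HotSpot output; whether the real compiler performs this step depends on the VM version, its inlining decisions and the flags in use.

```java
final class ScalarReplacementSketch {
    // Hypothetical temporary, standing in for the Vector3 from the thread.
    static final class Vec {
        final float x, y, z;
        Vec(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
    }

    // Source-level version: allocates a short-lived Vec that never leaves the method.
    static float lengthSquaredWithTemp(float x, float y, float z, float s) {
        Vec scaled = new Vec(x * s, y * s, z * s);
        return scaled.x * scaled.x + scaled.y * scaled.y + scaled.z * scaled.z;
    }

    // Hand-written equivalent of what scalar replacement aims for: the fields become locals.
    static float lengthSquaredScalarReplaced(float x, float y, float z, float s) {
        float sx = x * s;
        float sy = y * s;
        float sz = z * s;
        return sx * sx + sy * sy + sz * sz;
    }
}
```

Both methods compute the same value; the second simply has no object left to allocate.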

Quote:

Java, due to the design of its memory model, is worst possible case for this type of operations and will always remain as such.


Quote:

Until Java introduces C# struct-like type, it's simply not economically viable for this type of tasks.


Don't be so hard on Java. It's actually a good language and HotSpot is magic. Run-time code loading and reflection are also going to be cool in gamedev. NetBeans is also superior to VS. You don't believe it? It is... I know both ;-)

There are some pieces missing. But... if an "array"-pool allocation mechanism gets added to OpenJDK and EA works reliably, the language would be there.
I am too busy with game development (-:, but some compiler freaks could tackle it...

Till then, use ByteBuffers and wrapper classes to solve the pool problem.
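
The ByteBuffer workaround mentioned here can be sketched as a thin wrapper that stores points contiguously in a direct buffer. The class name PointBuffer and its methods are hypothetical; only the java.nio calls are real API, and the actual wrappers used in the project are not shown in the thread.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Hypothetical wrapper: stores points as x,y,z triples in one contiguous direct buffer.
final class PointBuffer {
    private static final int FLOATS_PER_POINT = 3;
    private final FloatBuffer data;

    PointBuffer(int count) {
        data = ByteBuffer.allocateDirect(count * FLOATS_PER_POINT * 4) // 4 bytes per float
                         .order(ByteOrder.nativeOrder())
                         .asFloatBuffer();
    }

    void set(int i, float x, float y, float z) {
        int base = i * FLOATS_PER_POINT;
        data.put(base, x);
        data.put(base + 1, y);
        data.put(base + 2, z);
    }

    float getX(int i) { return data.get(i * FLOATS_PER_POINT); }
    float getY(int i) { return data.get(i * FLOATS_PER_POINT + 1); }
    float getZ(int i) { return data.get(i * FLOATS_PER_POINT + 2); }

    // Bulk translate over the flat storage; no per-point objects are created.
    void translateAll(float dx, float dy, float dz) {
        for (int base = 0; base < data.capacity(); base += FLOATS_PER_POINT) {
            data.put(base, data.get(base) + dx);
            data.put(base + 1, data.get(base + 1) + dy);
            data.put(base + 2, data.get(base + 2) + dz);
        }
    }
}
```

A direct buffer in native byte order is also what a JNI layer would typically read, which fits the use described above.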

C# is not as cross-platform and I simply won't learn another system.

I like Java and I don't feel like going to C++, at least not yet. Please help me to get that damn thing going!!!

Thank you for your kind attention,
Frederik

#12 Antheus   Members   -  Reputation: 2397

Posted 03 May 2010 - 11:55 AM

Quote:
Original post by ppgamedev
http://hal.inria.fr/docs/00/31/20/39/PDF/RT-0353.pdf

Well, in this report the difference with Fortran is not so dramatic.

From the benchmark:
private static final int datasizes_M[] = {50000,100000,500000};
private static final int datasizes_N[] = {50000,100000,500000};
private static final int datasizes_nz[] = {250000,500000,2500000};
private static final int SPARSE_NUM_ITER = 200;

Random R = new Random(RANDOM_SEED);

double [] x;
double [] y;
double [] val;
int [] col;
int [] row;
int [] lowsum;
int [] highsum;




Like I said - contiguous data. Java only allows it for primitive types.

This is relevant to OP - if you want numeric performance in Java, the bottleneck is not bytecode or CPU, it's memory layout, which means throwing out the OO way. Something which is incredibly unreadable.

As soon as things are wrapped in classes, the instances get scattered all over the memory, incurring huge cache access penalties.

A compromise is SoA layout upon which bulk operations are performed. That way code readability can be preserved, but is only effective if there are large numbers of elements - which is orthogonal to OP's design.
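
A structure-of-arrays (SoA) layout of the kind mentioned here might look like the following sketch. The class Vec3Array and its method are hypothetical names chosen for illustration; the point is only that each component lives in one primitive array, so a bulk operation walks memory sequentially.

```java
// Hypothetical structure-of-arrays container: one primitive array per component.
final class Vec3Array {
    final float[] xs, ys, zs;

    Vec3Array(int n) {
        xs = new float[n];
        ys = new float[n];
        zs = new float[n];
    }

    void set(int i, float x, float y, float z) {
        xs[i] = x; ys[i] = y; zs[i] = z;
    }

    // Bulk operation: this[i] += s * other[i] for every element, with no temporaries.
    void addScaled(Vec3Array other, float s) {
        for (int i = 0; i < xs.length; i++) {
            xs[i] += s * other.xs[i];
            ys[i] += s * other.ys[i];
            zs[i] += s * other.zs[i];
        }
    }
}
```

As the post notes, this pays off mainly when there are many elements to process in one pass.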


And I just love this type of academic articles. First they reject a test they know performs sub-optimally, then they run a few tutorial-grade benchmarks and draw a pretty picture. Whatever happened to actually interpreting results and analyzing the causes, testing hypothesis and similar.

Quote:
How the heck do I reliably switch that thing on? (Yes, I tried the -XX:+DoEscapeAnalysis flag.) If I am right, it is just as capable as the C++ stack-scope auto-deallocation thing, and may become more capable in the future.

It was added in 1.6.14 and disabled in 1.6.18. It is currently available experimentally under 1.7, but it may vary depending on the build - I don't follow beta versions. It might have been disabled or crippled again due to problems with G1 GC.

It is also inevitably up to VM to decide when to kick in. So it might be in, but might not be working in this example.

Quote:
Don´t be so hard to java. Its actually a good language and HotSpot is magic. Also run-time code loading and reflection gonna be cool in gamedev.

There are three alternatives:
- If readability is vital - go for it, forget about performance until profiler shows it's a problem (should be fine for hundreds, or perhaps even thousands of objects). At some point in the future, escape analysis may help - but there is no guarantee. Readability remains constant, VM will only improve.
- If consistent performance is important today, then array-centric approach or using in/out parameters is the way to go. It doesn't rely on 'magic', but works.
- To really push the boundaries, you simply need a language that allows flat memory layout.
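
The in/out parameter style from the second alternative can be sketched as follows. The names OutParamSketch, Vec3 and scale are hypothetical; the idea is that the caller owns a scratch object and the callee writes into it instead of returning a new one.

```java
final class OutParamSketch {
    // Hypothetical mutable vector used as a reusable scratch object.
    static final class Vec3 {
        float x, y, z;

        Vec3 set(float x, float y, float z) {
            this.x = x; this.y = y; this.z = z;
            return this;
        }
    }

    // Writes a * s into out and returns out; nothing is allocated here.
    static Vec3 scale(Vec3 a, float s, Vec3 out) {
        return out.set(a.x * s, a.y * s, a.z * s);
    }

    // Sums the components of i * axis for i in [0, iterations), reusing one scratch vector.
    static float sumScaled(Vec3 axis, int iterations) {
        Vec3 scratch = new Vec3(); // allocated once, outside the loop
        float sum = 0f;
        for (int i = 0; i < iterations; i++) {
            scale(axis, i, scratch);
            sum += scratch.x + scratch.y + scratch.z;
        }
        return sum;
    }
}
```

The cost is the one discussed throughout the thread: call sites get noisier, and the caller has to manage the lifetime of the scratch object by hand.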

[Edited by - Antheus on May 3, 2010 6:55:39 PM]

#13 ppgamedev   Members   -  Reputation: 311

Posted 04 May 2010 - 08:02 AM

Quote:
Original post by Frederick
Quote:
Original post by ppgamedev
Quote:
Original post by Frederick
Thats a thing I really dislike about java... code like this is simply not feasible from a performance standpoint (at least it was), but is perfectly fine in c++

[irony]Hmm, interesting.
How much difference in performance?[/irony]

Frankly I can't tell, but I don't like to have an allocation hell going on in my project.

But unless you change the code, how would you avoid that allocation hell with C++?

Quote:
Original post by Antheus
It is also inevitably up to VM to decide when to kick in. So it might be in, but might not be working in this example.

Not really sure about this because:
Quote:
Original from Frederick link to sun
Escape analysis is a technique by which the Java™ Hotspot Server Compiler can analyze the scope of an object and decide whether to allocate memory on the heap or not.

so it is not a JVM improvement but a compiler one.
Or maybe not?

Quote:
Original post by Antheus
Like I said - contiguous data. Java only allows it for primitive types.

What do you mean with that?
Do you mean that you can create an array of contiguous objects in C++?

Quote:
Original post by Antheus
Until Java introduces C# struct-like type, it's simply not economically viable for this type of tasks.

I am not an expert in C# but I think struct has value semantics, and in C# that means pass-by-value.
Wouldn't it be a heavy penalty for performance?

Out of curiosity, can you try something like this?

public Point3 multiply(Point3 p) {
    Point3 r = new Point3(w);

    Vector3 dx = x.multiply(p.getX());
    Vector3 dy = x.multiply(p.getY());
    Vector3 dz = x.multiply(p.getZ());

    r.translate(dx.getX(), dx.getY(), dx.getZ());
    r.translate(dy.getX(), dy.getY(), dy.getZ());
    r.translate(dz.getX(), dz.getY(), dz.getZ());

    return r;
}


and this

public Point3 multiply(Point3 p) {
    Point3 r = new Point3(w);

    float xx = p.getX();
    float yy = p.getY();
    float zz = p.getZ();
    Vector3 dx = new Vector3(xx*x.getX(), xx*x.getY(), xx*x.getZ());
    Vector3 dy = new Vector3(yy*x.getX(), yy*x.getY(), yy*x.getZ());
    Vector3 dz = new Vector3(zz*x.getX(), zz*x.getY(), zz*x.getZ());

    r.translate(dx.getX(), dx.getY(), dx.getZ());
    r.translate(dy.getX(), dy.getY(), dy.getZ());
    r.translate(dz.getX(), dz.getY(), dz.getZ());

    return r;
}


By the way is it dy = x.multiply(p.getY()); or dy = y.multiply(p.getY()); ???

If there are no positive results, can you try the sample from the Sun page and see if you can get results with that?

Could it be that the profiler is interfering?
Why don't you run two versions, with and without the optimization, and then check the total time for maybe hundreds of thousands of operations?
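
The suggested comparison can be sketched as a small timing harness, run once with -XX:+DoEscapeAnalysis and once with -XX:-DoEscapeAnalysis (the flag named at the start of the thread) and without a profiler attached. The Vec class and the workload are hypothetical stand-ins for the matrix code, and a hand-rolled loop like this gives only a rough indication, not a rigorous benchmark.

```java
final class EaTiming {
    // Hypothetical immutable vector whose scale() allocates a temporary.
    static final class Vec {
        final float x, y, z;
        Vec(float x, float y, float z) { this.x = x; this.y = y; this.z = z; }
        Vec scale(float s) { return new Vec(x * s, y * s, z * s); }
    }

    // Workload: one short-lived Vec per iteration; the sum keeps the loop from being dead code.
    static float run(int iterations) {
        Vec axis = new Vec(1f, 2f, 3f);
        float sum = 0f;
        for (int i = 0; i < iterations; i++) {
            Vec t = axis.scale(i & 7);
            sum += t.x + t.y + t.z;
        }
        return sum;
    }

    public static void main(String[] args) {
        run(1_000_000); // warm-up so the JIT has a chance to compile run()
        long start = System.nanoTime();
        float result = run(100_000_000);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("result=" + result + " time=" + millis + " ms");
    }
}
```

Running `java -XX:+DoEscapeAnalysis EaTiming` and `java -XX:-DoEscapeAnalysis EaTiming` several times each and comparing the reported times is the kind of check proposed here.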

#14 Frederick   Members   -  Reputation: 140

Posted 04 May 2010 - 09:39 AM

Hey...


Quote:

But unless you change the code, how would you avoid that allocation hell with C++?


The cool answer is, you just have to remove the "new" keyword and the object is allocated on the stack, not the heap memory. That's a really cool feature of C++.

This code is perfectly valid (and performant) C++ code:



void Matrix3::rotate(float angle, const Vector3& axis) {
    Matrix3 rotationMatrix; // lives on the stack, no "new"

    rotationMatrix.loadAxisAngleRotation(angle, axis);
    rotationMatrix.transformOnlyOrientation(*this);
}



When the function is called, all local variables are "bundled" together and put on the stack. Allocating objects this way is a matter of moving the stack pointer a little bit further, so the allocation cost is practically zero; only initialization matters. When the function returns, the stack pointer moves back and all objects get deallocated automatically. Very easy, clever & powerful.
But it is also tricky for learners, because you can't return pointers or references to local objects. You have to know what the language does in the background to program in C++. But... it's the same with Java, if you ask me.


Quote:

I am not an expert in C# but I think struct has value semantics, and in C# that means pass-by-value.
Wouldn't it be a heavy penalty for performance?


Yes it would, if you passed the whole struct array by value, but that is something you wouldn't do. You have to copy each struct individually into its cell in the flat-memory array, and that's in principle the same as initializing an object. Potentially it could be faster, because the memory where the object will live is directly at hand: you preallocated it in one chunk.

Think of an array where each cell has the size of a whole object, not just of a pointer as in Java, and you copy the object's properties directly into the cells, not to the location the pointer in the cell refers to.

Quote:

Can you try something like this:


Yes sure. But the thing that matters most is abstraction in programming.
It makes the job really many times easier.

In this simple case, it might be bearable to pass only primitive types, but
this really bloats the code and if you ask me, the right abstraction makes
all the difference.


Quote:

Could it be that the profiler is interfering?
Why don't you run two versions, with and without the optimization, and then check the total time for maybe hundreds of thousands of operations?


Yeah... wahhooo. That showed some real results. I switched back to JDK 1.6.0_16
and did a simple time benchmark with this code:


Point3 p = new Point3();
for(int i = 0;i <= 24000000;i++) {
    matrix.rotate(0.1f * i, new Vector3(1,0,0));
    p.translate(matrix.x());
}


It took 18 seconds without the EA flag and 16 seconds with the flag.
That's quite a difference. Maybe I should do a C++ speed comparison.
I would call this quite a good result. 2 seconds sounds like a reasonable saving when allocations are omitted.

It's true, maybe the profiler interferes. The longer I think about it, the more reasonable it sounds. I believe the profiler does some kind of bytecode instrumentation and maybe replaces every allocation with a call to the profiler.

Just guessing...

Thank you.

#15 ppgamedev   Members   -  Reputation: 311

Posted 04 May 2010 - 11:39 AM

Quote:

The cool answer is, you just have to remove the "new" keyword and the object is allocated on the stack, not the heap memory.

I know about stack vs heap (at least I did a long time ago).
But how are we going to avoid "new" with this code:
Vector3 dx = x.multiply(p.getX());
In this case "new" is hidden inside multiply, but it is still there.
It has to be there, am I right?
It is more than ten years since I was forced to switch from C++ to Java
(pretty angry at the beginning, but very happy now).
That's the reason I am not sure about all those things.

Quote:

Yes it would, if you would pass the whole struct-array by value, but that is a thing you wouldn´t do.

But as soon as the code gets a bit complicated, performance takes a hit.
I mean, I know it is very fast to create an array of C#-structs, but after that,
imagine that every element has to be processed by being passed to a couple of functions,
and maybe calling another one inside these...
Then performance takes a hit, or you are forced to use small structs...
or maybe you can use some C# trick I am not aware of.
Any thoughts?

About the code: oh, I didn't want you to change the code permanently, it was just an attempt to discover the behaviour of the escape analysis performance enhancement.

And I am really happy we caught the bug.
It has been a fun hunt.

Also, I am curious about one more thing: what's the difference in performance between the client and the server JVM?

#16 Antheus   Members   -  Reputation: 2397

Posted 04 May 2010 - 01:30 PM

Quote:
Original post by ppgamedev
I mean, I know it is very fast to create an array of C#-structs, but after that,
imagine that every element has to be processed by being passed to a couple of functions,
and maybe calling another one inside these...
then performance is hit or you are forced to small structs...
or maybe you can use some C# trick I am not aware of.


Arrays are passed by reference. Structs are inside array. In Java, this isn't possible, unless dealing with primitive types.

Comparison.

Quote:
what's the difference in performance between the client and the server JVM?

Last I checked, server is tuned for longer running tasks, and takes longer to invoke certain optimizations, so it gets a better "big picture". In general, server is slightly faster, but nothing drastic. The GC might also be tuned slightly differently. Escape analysis might exist only in server VM for now.

#17 ppgamedev   Members   -  Reputation: 311

Posted 04 May 2010 - 08:56 PM

Quote:
Original post by Antheus
Quote:
Original post by ppgamedev
I mean, I know it is very fast to create an array of C#-structs, but after that,
imagine that every element has to be processed by being passed to a couple of functions,
and maybe calling another one inside these...
then performance is hit or you are forced to small structs...
or maybe you can use some C# trick I am not aware of.

Arrays are passed by reference. Structs are inside array. In Java, this isn't possible, unless dealing with primitive types.

Are C#-structs your golden hammer?
Hi Antheus, I don't know you personally and maybe you are brilliant.
But it seems you just ignore what I say and just keep praising C#-structs.

Anyway, this could be another problem

struct Foo {
    public int x;
}

int modifyFoo(Foo foo) {
    int previousX = foo.x;
    foo.x = 2 * foo.x;
    return previousX;
}

Foo[] foos = new Foo[10];
foos[1].x = 10;
Console.WriteLine("before reset: {0}", foos[1].x);
modifyFoo(foos[1]);
Console.WriteLine("after reset: {0}", foos[1].x);


What's the result from this?
I guess it is 10 and 10.
Value objects are nice, but entities are also.
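
For contrast, here is a sketch of the same experiment in Java, where Foo is a class and the array holds references. The names mirror the C# snippet above, and the class EntitySemantics is a hypothetical holder for the methods. Because the callee receives a reference to the same object, the element stored in the array is modified.

```java
final class Foo {
    int x;
}

final class EntitySemantics {
    // Doubles foo.x in place and returns the previous value.
    static int modifyFoo(Foo foo) {
        int previousX = foo.x;
        foo.x = 2 * foo.x;
        return previousX;
    }

    public static void main(String[] args) {
        Foo[] foos = new Foo[10];
        foos[1] = new Foo(); // unlike a C# struct array, the elements start out as null
        foos[1].x = 10;
        System.out.println("before: " + foos[1].x);
        modifyFoo(foos[1]);
        System.out.println("after: " + foos[1].x);
    }
}
```

Here the two lines print 10 and 20, which is the entity behaviour, at the price of one separately allocated object per array element.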

Quote:
Original post by Antheus
Quote:
Original post by ppgamedev
what's the difference in performance between the client and the server JVM?

Last I checked, server is tuned for longer running tasks, and takes longer to invoke certain optimizations, so it gets a better "big picture". In general, server is slightly faster, but nothing drastic. The GC might also be tuned slightly differently. Escape analysis might exist only in server VM for now.

I was (obviously?) referring to Frederick's code. That particular case.
Also, what I understand from Sun docs is that "escape analysis" is a compile time improvement, completely independent of the VM.





#18 ppgamedev   Members   -  Reputation: 311

Posted 04 May 2010 - 09:50 PM

Hey Frederick

Have you done a C++ comparison?

If you do it don't forget to post the results!!

#19 Frederick   Members   -  Reputation: 140

Posted 05 May 2010 - 10:56 AM

Quote:

Hey Frederick

Have you done a C++ comparison?

If you do it don't forget to post the results!!


Hey !!!

Ah, nooo, maybe that comes later, but I did another interesting test:

Remember my function looked like this:


public void rotate(float angle, Vector3 axis) {
    Matrix3 rotationMatrix = new Matrix3();

    rotationMatrix.loadAxisAngleRotation(angle, axis);
    rotationMatrix.transformOnlyOrientation(this);
}



which was called like this:

Matrix3 matrix = new Matrix3();
Point3 p = new Point3();

for(int i = 0;i <= 24000000;i++) {
    matrix.rotate(0.1f * i, new Vector3(1,0,0));
    p.translate(matrix.x());
}



So basically, we have an allocation of Matrix3 (which has four subobjects x,y,z,w by the way) per loop iteration - right ?

Running time without EA 59 seconds - with EA enabled 52 seconds. I would call that a good improvement, but the exciting thing comes now:

I tried this also:


Matrix3 matrix = new Matrix3();
Point3 p = new Point3();

Vector3 axis = new Vector3(1,0,0); //Moved all allocations out of the loop
Matrix3 rotationMatrix = new Matrix3();

for(int i = 0;i <= 48000000;i++) {
    //Only assignments in here
    rotationMatrix.loadAxisAngleRotation(0.1f * i, axis);
    rotationMatrix.transformOnlyOrientation(matrix);
    p.translate(matrix.x());
}



Now guess the running time!? 55 seconds with EA disabled and 52 seconds with EA enabled. Well, I have only one word for this: MAGIC. HotSpot is real magic.
I am happy and confident now that I can go far with Java. It is a super advanced modern system by now and should get credit for that.

The C++ test will follow if I am not lazy. I would guess 42 seconds or something, let's see ;-)

Yes, Antheus is right: a flat memory layout would be a great addition for handling massive collections of objects like vertices, indices and so on. I hope some game- and media-oriented people will add this to the JDK. It would also make communication with C++ easier. For now I stuff everything into ByteBuffers, which are wrapped into objects on both the C++ and the Java side. That is a hacky workaround, but when I use the wrappers I hardly notice what is going on behind them.


So I am really happy that I have no need to port my project and can stay. The only thing is that I would like a reliable way to find out when allocations are omitted and when not, to get more control over that. So if anybody has a good idea, I am happy to hear about it.
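
One possible way to check whether allocations are actually omitted, without a profiler, is to read the per-thread allocation counter that later HotSpot JDKs expose through com.sun.management.ThreadMXBean. This is a sketch under assumptions: the API is HotSpot-specific, it is newer than the JDK builds discussed in this thread, and AllocationProbe is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;

final class AllocationProbe {
    // Returns the number of bytes the current thread allocated while task ran.
    // Relies on the HotSpot-specific com.sun.management.ThreadMXBean extension.
    static long allocatedBytes(Runnable task) {
        com.sun.management.ThreadMXBean bean =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = bean.getThreadAllocatedBytes(id);
        task.run();
        return bean.getThreadAllocatedBytes(id) - before;
    }
}
```

Wrapping the benchmark loop in `allocatedBytes(...)` after a warm-up, once with escape analysis enabled and once with it disabled, would show whether the number of allocated bytes per iteration actually drops.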

Thank you all for your great participation, I don't think I would have drawn these conclusions by myself (-;

Frederick



