Megamorphic calls

Published March 30, 2015

[font=verdana]I'll probably be using this journal to post random stuff! So...

Megamorphic calls



The JVM distinguishes regular calls from megamorphic calls (actually it distinguishes bimorphic calls too, but whatever).

Regular calls can be optimized: they can be inlined, or at least the vtable round trip can be avoided. What's a "regular call"? Well, it's not necessarily a call to a 'final' method.

HotSpot looks at each call site of a method and gathers statistics on which implementations of that method actually get called there. In most code it's very rare to find call sites where, say, 5 implementations of the same method are called, which is the case where vtable pointer chasing is actually needed.

HotSpot knows this. It can determine whether a particular call site only ever receives objects of a certain type by profiling the code while it's running. In the best case it can inline the call: regardless of how many other implementations of said method are running around, if a call site only ever sees one, it can be inlined.
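As a rough mental model of that profiling (a toy sketch, not HotSpot's actual data structures), you can picture each call site keeping a small set of the receiver classes it has seen. The names `CallSiteProfile`, `record` and `shape` are made up for illustration, and the cutoff of two receiver types before a site counts as megamorphic is an assumption on my part.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Toy model of a per-call-site receiver type profile (hypothetical, not a HotSpot API).
final class CallSiteProfile {
    // Assumed cutoff: more than two receiver types means megamorphic.
    private static final int MAX_TRACKED_TYPES = 2;

    private final Set<Class<?>> seen = new LinkedHashSet<>();

    // Record the concrete class of the object the call was made on.
    void record(Object receiver) {
        seen.add(receiver.getClass());
    }

    // Classify the call site by how many distinct receiver classes it has seen.
    String shape() {
        if (seen.isEmpty()) return "unreached";
        if (seen.size() == 1) return "monomorphic";
        if (seen.size() <= MAX_TRACKED_TYPES) return "bimorphic";
        return "megamorphic";
    }

    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        site.record("a");
        site.record("b");
        System.out.println(site.shape()); // only String seen so far
        site.record(Integer.valueOf(1));
        site.record(Double.valueOf(2.0));
        System.out.println(site.shape()); // String, Integer and Double seen
    }
}
```

The point is that the classification belongs to the call site, not to the method: the same method can be monomorphic at one site and megamorphic at another.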

This has a cost though. It can't inline the call blindly: what happens if you send 10k objects of one type, but the very last one is of another type? You'd be making the wrong call! This is called a "speculative optimization" for a reason: HotSpot can never be 100% sure you won't send another subclass down the same path eventually.

The best thing it can do is inline the call for now and add an "uncommon trap" just in case: a type check that runs before the inlined code. If the type is the expected one, run the inlined code; otherwise, let HotSpot know what happened and dispatch to the correct implementation.
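Written out by hand in Java, the guarded call looks roughly like this. It's only an illustration of the idea: the real check is emitted in compiled machine code, and `GuardedCall`, `Shape`, `Circle`, `Square` and `areaGuarded` are hypothetical names.

```java
// Hand-written illustration of a guarded, speculatively inlined call.
// The JIT does this in machine code; none of these names come from HotSpot.
final class GuardedCall {
    interface Shape {
        double area();
    }

    static final class Circle implements Shape {
        final double r;
        Circle(double r) { this.r = r; }
        public double area() { return Math.PI * r * r; }
    }

    static final class Square implements Shape {
        final double side;
        Square(double side) { this.side = side; }
        public double area() { return side * side; }
    }

    // Counts how often the speculation failed (stand-in for hitting the trap).
    static int slowPathHits = 0;

    static double areaGuarded(Shape s) {
        // Guard: the profile said this call site only ever sees Circle.
        if (s.getClass() == Circle.class) {
            Circle c = (Circle) s;
            return Math.PI * c.r * c.r; // body of Circle.area(), "inlined"
        }
        // Unexpected type: take the slow path and do a normal virtual call.
        slowPathHits++;
        return s.area();
    }

    public static void main(String[] args) {
        System.out.println(areaGuarded(new Circle(1.0)));
        System.out.println(areaGuarded(new Square(2.0)));
        System.out.println("slow path hits: " + slowPathHits);
    }
}
```

In HotSpot itself the failing branch doesn't just make the virtual call like this sketch does. It bails out of the compiled code (deoptimization) so the method can be compiled again later with the updated profile.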

Example



This is a bit of code I was looking at today:[/font]

    public final void batchUpdate ( final T[] items, final int offset, final int limit )
    {
        final ByteBuffer buff = tmpBuffer;
        // Reset position.
        buff.clear();
        for ( int i = offset; i < limit; ++i )
        {
            // For all batched renderables, update.
            updateFunc.accept( items[i], buff );
        }
        // Upload data in buffer to UBO.
        updateUBO();
    }

    // accept signature being:
    void accept ( T t, ByteBuffer u )

[font=verdana]That's a snippet of a batch updater for UBOs. 'updateFunc' is a functional interface, a BiConsumer specifically. We delegate the actual update operation to that function since it depends on the kind of renderable we're fetching data from, so it's usually implemented as a consumer of a specific renderable type (say, a Spatial or a SpotLight).

The function implementation usually looks like this lambda:[/font]

    updateFunc = ( item, buff ) -> {
        float radius = item.pointLight.radius;
        item.viewPosition.putIn( buff );
        item.pointLight.color.putIn( buff );
        buff.putFloat( (1.0f / radius) );
    };

[font=verdana]The 'item' in this case is a point light renderable. The lambda puts the view space position, color and inverse radius in the buffer. Any needed alignment would be done here, but it isn't necessary in this case: viewPosition is a Vec4f (16 bytes), color is a Vec3f (12 bytes), and a float (4 bytes) goes in the last position. 32 bytes in total, perfect for std140 layout.
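The byte count can be checked with a plain `ByteBuffer`. This sketch writes the eight floats by hand instead of going through the `putIn` methods of Vec4f and Vec3f, which aren't shown in this entry. `PointLightLayout` and `writePointLight` are made-up names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PointLightLayout {
    // Writes one point light entry: vec4 position, vec3 color, float inverse radius.
    static void writePointLight(ByteBuffer buff, float[] viewPos, float[] color, float radius) {
        for (int i = 0; i < 4; ++i) buff.putFloat(viewPos[i]); // 16 bytes
        for (int i = 0; i < 3; ++i) buff.putFloat(color[i]);   // 12 bytes
        buff.putFloat(1.0f / radius);                          // 4 bytes, right after the vec3
    }

    public static void main(String[] args) {
        ByteBuffer buff = ByteBuffer.allocate(64).order(ByteOrder.nativeOrder());
        writePointLight(buff, new float[] {0f, 1f, 2f, 1f}, new float[] {1f, 1f, 1f}, 4f);
        System.out.println("bytes per light: " + buff.position());
    }
}
```

Since the float lands in the slot that would otherwise be padding after the vec3, consecutive lights sit back to back on 16 byte boundaries with no extra padding calls.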

For all the different UBOs we have (transforms, spot lights, point lights, animation data, materials and whatever comes next), we'll have one specific updateFunc implementation. And each call to updateFunc.accept is a megamorphic call, since by the time we're inside the RenderPasses, we're uploading all the different kinds of data.

What does this mean? Well, keep in mind that if we're rendering a couple thousand meshes, we're probably making one to four of these updateFunc.accept calls per mesh, so we could end up with anywhere from 10k to 100k updateFunc.accept calls.

It's dangerous to do virtual calls alone, take this!



This has a very tiny and simple fix though: we just need to move the for loop inside the function. For that we only need to add two parameters to the 'accept' signature, 'start' and 'end' (the caller's 'offset' and 'limit'), and replace the single 'item' parameter with an array 'items'.[/font]
    // New interface to use.
    @FunctionalInterface
    public interface UpdateFunc<T>
    {
        public void accept ( T[] items, ByteBuffer buff, int start, int end );
    }

    // New point light update function implementation:
    updateFunc = ( items, buff, start, end ) -> {
        // This is the easiest kind of lambda to resolve for HotSpot btw, it gets the same treatment as static methods,
        // since it doesn't need to capture any state, it just works with the passed parameters.
        for ( int i = start; i < end; ++i )
        {
            PointLightInstance item = items[i];
            float radius = item.pointLight.radius;
            item.viewPosition.putIn( buff );
            item.pointLight.color.putIn( buff );
            buff.putFloat( (1.0f / radius) );
        }
    };

    // And new update call inside the UBO updater:
    public final void batchUpdate ( final T[] items, final int offset, final int limit )
    {
        final ByteBuffer buff = tmpBuffer;
        // Reset buffer.
        buff.clear();
        // Update all the batched renderables.
        updateFunc.accept( items, buff, offset, limit );
        // Upload data in buffer to UBO.
        updateUBO();
    }

[font=verdana]Now, we haven't eliminated the megamorphic call, it's still there, but we have considerably reduced the number of calls we make.[/font]



[font=verdana]If we batch 500 instance updates and we're rendering, say, 10k meshes (disregarding instancing or other batching methods for now), we'll be making just 20 calls in total for that particular updater, and around 100 calls tops once we upload the data for the other kinds of UBOs.
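That arithmetic can be checked by counting interface calls directly. This is a self-contained sketch with hypothetical names (`CallCount`, `RangeFunc`, `perItemCalls`, `perBatchCalls`). It only counts invocations, it doesn't measure time.

```java
import java.util.function.IntConsumer;

final class CallCount {
    // Hypothetical batched callback: processes the items in [start, end).
    interface RangeFunc {
        void accept(int start, int end);
    }

    // One interface call per item, as in the original loop.
    static int perItemCalls(int itemCount, IntConsumer func) {
        int calls = 0;
        for (int i = 0; i < itemCount; ++i) {
            func.accept(i);
            ++calls;
        }
        return calls;
    }

    // One interface call per batch; the loop over the items lives inside func.
    static int perBatchCalls(int itemCount, int batchSize, RangeFunc func) {
        int calls = 0;
        for (int start = 0; start < itemCount; start += batchSize) {
            func.accept(start, Math.min(start + batchSize, itemCount));
            ++calls;
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(perItemCalls(10_000, i -> { }));            // 10000
        System.out.println(perBatchCalls(10_000, 500, (s, e) -> { })); // 20
    }
}
```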

In conclusion



Much better! Cya.[/font]

Comments

TheChubu

Ugh I'm having a really hard time trying to get the editor NOT to eat the entry after the first code block...

March 30, 2015 03:25 AM