Thought I would share a recent micro-optimization I found for Android development (targeting the Dalvik VM, not native code). While trying to optimize some drawing code, I noticed that Dalvik uses 32-bit registers and therefore stores 64-bit data types such as the primitive 'long' in register pairs (i.e. two registers per value). Suspecting that there might be a performance cost to accessing two registers, I decided to benchmark it with the following code:
public void run() {
    int intCounter = 0;
    int i;
    long longCounter = 0;
    long ii;
    long elapseTime;
    long startTime;
    startTime = SystemClock.uptimeMillis();
    for (int counter = 0; counter < 100; counter++) {
        for (i = 0; i < 100000; i++) {
            intCounter += i;
        }
        intCounter += counter;
    }
    elapseTime = SystemClock.uptimeMillis() - startTime;
    Log.d("IntLongTest", "Integer Ending Value - " + intCounter + " / Total Time - " + elapseTime);
    startTime = SystemClock.uptimeMillis();
    for (long counter = 0; counter < 100; counter++) {
        for (ii = 0; ii < 100000; ii++) {
            longCounter += ii;
        }
        longCounter += counter;
    }
    elapseTime = SystemClock.uptimeMillis() - startTime;
    Log.d("IntLongTest", "Long Ending Value - " + longCounter + " / Total Time - " + elapseTime);
}
I came up with the following results:
09-14 23:11:18.567: D/IntLongTest(9491): Integer Ending Value - 1778798614 / Total Time - 123
09-14 23:11:18.728: D/IntLongTest(9491): Long Ending Value - 499995004950 / Total Time - 159
09-14 23:11:27.016: D/IntLongTest(9491): Integer Ending Value - 1778798614 / Total Time - 81
09-14 23:11:27.197: D/IntLongTest(9491): Long Ending Value - 499995004950 / Total Time - 175
09-14 23:11:34.044: D/IntLongTest(9491): Integer Ending Value - 1778798614 / Total Time - 33
09-14 23:11:34.214: D/IntLongTest(9491): Long Ending Value - 499995004950 / Total Time - 163
09-14 23:11:42.653: D/IntLongTest(9491): Integer Ending Value - 1778798614 / Total Time - 39
09-14 23:11:42.843: D/IntLongTest(9491): Long Ending Value - 499995004950 / Total Time - 193
The average execution time for integers is 69 ms, while the average execution time for longs is 172.5 ms, which means that longs are roughly 2.5 times slower than integers on my Galaxy S3 in this test. I do acknowledge it would be fairly unlikely for this to be the sole reason for poor performance in your code, but I thought it might be an interesting little tidbit :)
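For anyone who wants to recheck the arithmetic, here is a small sketch that recomputes the means from the "Total Time" values in the log output above. The class and method names (TimingAverages, mean) are just made up for illustration; they are not part of the original test.

```java
public class TimingAverages {
    // Mean of a set of logged elapsed times, in milliseconds.
    static double mean(long[] timesMs) {
        long total = 0;
        for (long t : timesMs) {
            total += t;
        }
        return (double) total / timesMs.length;
    }

    public static void main(String[] args) {
        // "Total Time" values copied from the four logged runs.
        long[] intRuns = {123, 81, 33, 39};
        long[] longRuns = {159, 175, 163, 193};

        double intMean = mean(intRuns);
        double longMean = mean(longRuns);

        System.out.println("int mean (ms):  " + intMean);
        System.out.println("long mean (ms): " + longMean);
        System.out.println("long/int ratio: " + (longMean / intMean));
    }
}
```

Four runs is a very small sample, so the ratio should be read as a rough indication rather than a precise figure.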
This did help me improve draw performance by roughly 10%, because I was using longs to track animation frame dwell times for each individual sprite on the screen (I was benchmarking with 10,000 sprites).
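Side note: you will notice the integer value is different from the long value. This is due to integer overflow ('roll over'): the true sum, 499995004950, is far larger than the maximum int value of 2147483647, so the int counter wraps around and ends up holding only the low 32 bits of the result.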
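The overflow explanation is easy to check off-device. The sketch below is plain Java (no Android APIs) with hypothetical names (OverflowCheck, sumAsLong, sumAsInt): it repeats the same two loops and compares the int result with the long result truncated to 32 bits.

```java
public class OverflowCheck {
    // Same loop shape as the benchmark, accumulated in a long so nothing wraps.
    static long sumAsLong() {
        long total = 0;
        for (long counter = 0; counter < 100; counter++) {
            for (long i = 0; i < 100000; i++) {
                total += i;
            }
            total += counter;
        }
        return total;
    }

    // Same loops accumulated in an int; additions silently wrap modulo 2^32.
    static int sumAsInt() {
        int total = 0;
        for (int counter = 0; counter < 100; counter++) {
            for (int i = 0; i < 100000; i++) {
                total += i;
            }
            total += counter;
        }
        return total;
    }

    public static void main(String[] args) {
        long asLong = sumAsLong();
        int asInt = sumAsInt();
        System.out.println("long sum:            " + asLong);
        System.out.println("int sum:             " + asInt);
        // A narrowing cast keeps only the low 32 bits, which is what wrapping leaves behind.
        System.out.println("(int) long sum:      " + (int) asLong);
        System.out.println("matches int counter: " + ((int) asLong == asInt));
    }
}
```

The two sums printed here are the same values that appear in the log lines, 499995004950 and 1778798614, and the last line prints true.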
If anybody else sees a different explanation, I would love to hear it. This is just an idle observation; I have not ripped open the dex to see whether the bytecode is behaving differently than expected or whether the JIT is helping integers out somehow.
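On the JIT question: the integer timings in the log drop from 123 ms to 33 ms across runs while the long timings stay roughly flat, which would fit warm-up playing a part, though that is only a guess. One way to reduce that effect is to run both loops a few times untimed before measuring. Below is a minimal sketch of that idea, not the code I actually ran: it is plain Java using System.nanoTime() instead of SystemClock, and the class name, warm-up count and run count are arbitrary assumptions. On a desktop JVM the optimizer behaves very differently from Dalvik, so the numbers only mean something when the same structure is run on the device.

```java
public class IntLongBench {
    // Integer version of the benchmark loop; returns the sum so the work is not discarded.
    static int intLoop() {
        int total = 0;
        for (int counter = 0; counter < 100; counter++) {
            for (int i = 0; i < 100000; i++) {
                total += i;
            }
            total += counter;
        }
        return total;
    }

    // Long version of the same loop.
    static long longLoop() {
        long total = 0;
        for (long counter = 0; counter < 100; counter++) {
            for (long i = 0; i < 100000; i++) {
                total += i;
            }
            total += counter;
        }
        return total;
    }

    public static void main(String[] args) {
        final int warmups = 5;  // untimed passes (arbitrary choice)
        final int runs = 10;    // timed passes (arbitrary choice)
        long sink = 0;          // consumes results so the loops cannot be skipped

        for (int w = 0; w < warmups; w++) {
            sink += intLoop();
            sink += longLoop();
        }

        long intNanos = 0;
        long longNanos = 0;
        for (int r = 0; r < runs; r++) {
            long t0 = System.nanoTime();
            sink += intLoop();
            intNanos += System.nanoTime() - t0;

            t0 = System.nanoTime();
            sink += longLoop();
            longNanos += System.nanoTime() - t0;
        }

        System.out.println("int avg (ms):  " + intNanos / runs / 1e6);
        System.out.println("long avg (ms): " + longNanos / runs / 1e6);
        System.out.println("sink: " + sink);
    }
}
```

Interleaving the int and long passes in the timed loop is deliberate, so that both see similar conditions (thermal state, background activity) rather than one always running first.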