[iPhone] Optimizing a long loop

11 comments, last by wodinoneeye 14 years, 1 month ago
For a game I'm writing for the iPhone I have hit a particular problem. I am rendering md2 models, and each frame the models are animated by interpolating their vertices between the current and next frame. This is how the md2 format stores animation: as a series of frames with each vertex position defined. The game currently has 4 animating models onscreen. Since each model has 918 vertices, multiply that by 4 and you have 3,672 iterations per frame to update the models' animation. When I profile the game in Shark, this takes up a whopping 15-20% of frame time. Drawing only one model takes about 6%. Here is the offending code:
NSInteger vertexCount = _header.triangleCount * 3;

// Interpolate every vertex between the current and next key frame.
for (NSInteger i = 0; i < vertexCount; i++)
{
	_vertices[i].position.x = currentVertices[i].position.x + alpha * (nextVertices[i].position.x - currentVertices[i].position.x);
	_vertices[i].position.y = currentVertices[i].position.y + alpha * (nextVertices[i].position.y - currentVertices[i].position.y);
	_vertices[i].position.z = currentVertices[i].position.z + alpha * (nextVertices[i].position.z - currentVertices[i].position.z);
}
Pretty basic: loop through each vertex and interpolate between the current and next frame. I would really like to optimize this, but I don't see how that would be possible by doing anything different inside the loop. So I would like to hear suggestions (such as updating one model per frame, frame skipping, etc.). What would you do?
I don't know much about the iPhone, but I read the newer ones at least support vertex shaders, which is excellent for things like this.
If not, perhaps you can pre-calculate alpha * (next - current) at every key-frame change, at least if your update rate is constant. You could also choose to only update the model animation at specific intervals, for example half your framerate (and/or force constant update rate for the models).
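One reading of the pre-calculation idea above, as a rough sketch only: when a new key-frame pair becomes active, compute a fixed per-update step for every vertex, then just add that step on each update instead of redoing the full interpolation. The `Vec3` struct, the function names and the `steps` parameter are hypothetical and not taken from the original code, and this only works if the number of updates between key frames is constant.

```c
#include <stddef.h>

/* Hypothetical vertex layout; the thread does not show the real struct. */
typedef struct { float x, y, z; } Vec3;

/* Run once per key-frame change: step = (next - current) / steps. */
void compute_frame_step(const Vec3 *current, const Vec3 *next,
                        unsigned steps, Vec3 *step, size_t count)
{
    float inv = 1.0f / (float)steps;
    for (size_t i = 0; i < count; i++) {
        step[i].x = (next[i].x - current[i].x) * inv;
        step[i].y = (next[i].y - current[i].y) * inv;
        step[i].z = (next[i].z - current[i].z) * inv;
    }
}

/* Run on every update: three additions per vertex, no multiplies. */
void advance_vertices(Vec3 *out, const Vec3 *step, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        out[i].x += step[i].x;
        out[i].y += step[i].y;
        out[i].z += step[i].z;
    }
}
```

The output buffer would be reset to the current key frame's vertices whenever the key frame changes, which also keeps floating-point drift from accumulating across frames.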
You could try looking at this presentation ( http://gamesfromwithin.com/360idev-cranking-up-floating-point-performance-to-11 ), although it might cover some stuff you already know.
Also either arranging your vertices to be more cache efficient/aware or taking advantage of __builtin_prefetch might speed it up a bit.
There's another article from that site about data-oriented design which covers this ( http://gamesfromwithin.com/data-oriented-design ), just take the anti-OO opinion with a grain of salt.
On the 3GS (won't help you if you are targeting old iPhones as well) you can make use of the NEON instructions for this. If my assembler is any good this operation would end up as two (single cycle?) NEON instructions (per iteration). One vsub and one vmuladd, instead of 9 as it is now. Not counting loop instructions.
Simple stuff you can do: unroll the loop for multiple iterations per step, use raw pointers instead of array indexing to access the data, organize your data for better caching or, if you're up to it, use SIMD instructions (http://code.google.com/p/vfpmathlibrary/), etc.

Good Luck!

-ddn
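A minimal sketch of the unrolling and raw-pointer suggestions, assuming the positions are available as one tightly packed array of floats (x, y, z, x, y, z, ...), which the original struct layout may or may not allow. The function name and the unroll factor of 4 are arbitrary choices for illustration; whether this beats the compiler's own output needs to be measured on the device.

```c
#include <stddef.h>

/* Interpolates n floats: out = cur + alpha * (next - cur).
 * Handles four floats per loop iteration, then any remainder. */
void lerp_floats_unrolled(float *out, const float *cur, const float *next,
                          float alpha, size_t n)
{
    size_t blocks = n / 4;
    size_t rest = n % 4;

    while (blocks--) {
        out[0] = cur[0] + alpha * (next[0] - cur[0]);
        out[1] = cur[1] + alpha * (next[1] - cur[1]);
        out[2] = cur[2] + alpha * (next[2] - cur[2]);
        out[3] = cur[3] + alpha * (next[3] - cur[3]);
        out += 4;
        cur += 4;
        next += 4;
    }

    /* Leftover elements when n is not a multiple of 4. */
    while (rest--) {
        *out++ = *cur + alpha * (*next - *cur);
        cur++;
        next++;
    }
}
```

For the model in this thread, n would be the vertex count times 3.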
Thanks for the feedback guys. I have coded in ARM asm before, so I think I could optimize it using the VFP. Thanks for that excellent link. I went through all the slides, although I'm a bit disappointed I can't see the video of Noel doing the talk live. Is there a video somewhere? I'm also having a slight bottleneck processing particles, so that is another thing I need to optimize. I'll keep you posted on how this is resolved.

BTW I was quite shocked that the 3G runs its VFP at half the speed, so that does put me off using the VFP. And obviously I can't use NEON because we are targeting older phones as well.
Moving to Consoles, PDAs and Cell Phones in case you can get any more advice there. :)
Quote:Original post by Zahlman
Moving to Consoles, PDAs and Cell Phones in case you can get any more advice there. :)


Well, I didn't actually think there was much opportunity to optimize the loop itself, so I thought it was more of a general game programming question about how to deal with big loops. I'm not sure moving it will help either; that part of the forum seems quite dead. Perhaps there are more iPhone-related forums on the web. Thanks anyway, I know you're just trying to help!
Quote:Original post by Headkaze
NSInteger vertexCount = _header.triangleCount * 3;
The best solution would be to cut down the number of vertices. Is there any chance you can switch to a format (possibly custom) which supports indexed vertices? You could likely cut down the number of vertices by 50%.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
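To illustrate the indexed-vertex idea, here is a rough sketch that collapses a flat triangle list into a unique vertex array plus an index array. Everything here is an assumption for illustration: the `Vec3` struct, the function name, the exact float comparison (which only merges bit-identical positions) and the simple quadratic search, which would be done once at load time rather than per frame. A real model also has texture coordinates and normals that have to match before two vertices can be merged.

```c
#include <stddef.h>

/* Hypothetical position-only vertex. */
typedef struct { float x, y, z; } Vec3;

/* Collapses duplicate positions in `soup` (3 vertices per triangle).
 * Writes the unique vertices to `unique` and one index per input vertex
 * to `indices`. Returns the number of unique vertices.
 * `unique` must have room for soup_count entries, and the unique count
 * must fit in an unsigned short. */
size_t build_indexed_mesh(const Vec3 *soup, size_t soup_count,
                          Vec3 *unique, unsigned short *indices)
{
    size_t unique_count = 0;

    for (size_t i = 0; i < soup_count; i++) {
        size_t j = 0;
        /* Linear search for an identical vertex already emitted. */
        while (j < unique_count &&
               !(unique[j].x == soup[i].x &&
                 unique[j].y == soup[i].y &&
                 unique[j].z == soup[i].z)) {
            j++;
        }
        if (j == unique_count) {
            unique[unique_count++] = soup[i];
        }
        indices[i] = (unsigned short)j;
    }
    return unique_count;
}
```

Only the unique vertices would then need interpolating each frame, with the index array passed to an indexed draw call such as glDrawElements.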

Okay, problem solved. This was easier than I thought it would be, and it was really staring me in the face: remove the interpolation. Since the models are quite small and low poly, you don't really notice the difference.

So I refactored the code for this, and instead of taking 20% of frame time it now takes roughly 0%.

The mod is really just to assign the _vertices pointer to the current frame of vertices instead of copying them over in a loop. With interpolation removed, no realtime calculations are necessary and I can just use the raw data of each frame.
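In plain C, the pointer swap described above might look something like this sketch. The `Model` struct, its field names and the assumption that all frames sit back to back in one contiguous array are hypothetical, since the actual class is not shown in the thread.

```c
#include <stddef.h>

/* Hypothetical position-only vertex. */
typedef struct { float x, y, z; } Vec3;

/* Hypothetical model: all frames stored back to back in `frames`. */
typedef struct {
    const Vec3 *frames;    /* frame_count * vertex_count vertices */
    size_t frame_count;
    size_t vertex_count;
    const Vec3 *vertices;  /* what the renderer draws this frame */
} Model;

/* Selects a frame by pointing at its data; nothing is copied. */
void model_set_frame(Model *m, size_t frame)
{
    m->vertices = m->frames + (frame % m->frame_count) * m->vertex_count;
}
```

The per-frame cost becomes one multiply and one pointer assignment per model, regardless of vertex count.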

Another optimization I've been considering is changing from GL_TRIANGLES to GL_TRIANGLE_STRIP. Is it relatively easy to convert the data over to this format?
