StiX

iPhone Game Optimizations. Ultimate Guide

StiX    167
Thanks! I plan to write a few articles that will dive deeper into the optimizations described here. Probably one article every weekend or two...

Ed Welch    1008
Thanks for the info, StiX. It's quite a gold mine of information.
One tip sounds a bit odd:
"Driver will pad your NPOT textures to next biggest POT value"
I've never heard of anything like that. Just wondering if that's really true.
I did hear of a different bug regarding NPOT: if you have an NPOT width that isn't a multiple of 4, it will allocate too much memory.

StiX    167
I meant that your NPOT texture of 480x480x32, which takes about 900 KB of client memory, will be padded to 512x512x32 by the driver and will take 1 MB of "video" memory...
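To make the arithmetic concrete, here is a minimal sketch in C that works out the 480x480x32 case. The helper names (next_pot, texture_bytes, padded_texture_bytes) are made up for illustration, and the padding rule is simply the one described in this thread, not something taken from driver documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round up to the next power of two (returns v if it already is one).
   Assumes v <= 2^31, which is far beyond any real texture dimension. */
static uint32_t next_pot(uint32_t v) {
    assert(v <= 0x80000000u);
    uint32_t p = 1;
    while (p < v) {
        p <<= 1;
    }
    return p;
}

/* Size in bytes of an uncompressed texture without mipmaps. */
static size_t texture_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel) {
    return (size_t)width * height * (bits_per_pixel / 8);
}

/* Size in bytes if both dimensions are padded up to a power of two,
   as described above (an assumption about driver behaviour). */
static size_t padded_texture_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel) {
    return texture_bytes(next_pot(width), next_pot(height), bits_per_pixel);
}
```

For 480x480 at 32 bits per pixel, texture_bytes gives 921,600 bytes (900 KB) and padded_texture_bytes gives 1,048,576 bytes (1 MB), so the padding costs roughly 14% extra.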

Ed Welch    1008
[quote name='StiX' timestamp='1350239612' post='4990104']
I meant that your NPOT texture of 480x480x32, which takes about 900 KB of client memory, will be padded to 512x512x32 by the driver and will take 1 MB of "video" memory...
[/quote]
That's crazy. I thought the whole purpose of NPOT textures was to save memory.

L. Spiro    25621
That is correct. Sort only by shader and textures. If you also sort by depth, your FPS will go down.
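
As a rough sketch of what "sort only by shader and textures" can look like in C: the DrawCall struct and its fields are hypothetical, and a real render queue would carry more state, but the comparator shows the idea of using the shader as the primary key, the texture as the secondary key, and leaving depth out entirely.

```c
#include <stdlib.h>

/* Hypothetical render-queue entry; a real engine would store more state. */
typedef struct {
    unsigned shader_id;
    unsigned texture_id;
} DrawCall;

/* Order by shader first, then by texture. Depth is deliberately not compared. */
static int compare_draw_calls(const void *a, const void *b) {
    const DrawCall *x = a;
    const DrawCall *y = b;
    if (x->shader_id != y->shader_id) {
        return (x->shader_id < y->shader_id) ? -1 : 1;
    }
    if (x->texture_id != y->texture_id) {
        return (x->texture_id < y->texture_id) ? -1 : 1;
    }
    return 0;
}

/* Sort the queue in place before submitting the draw calls. */
static void sort_draw_calls(DrawCall *calls, size_t count) {
    qsort(calls, count, sizeof *calls, compare_draw_calls);
}
```

After sorting, consecutive draw calls share a shader (and, within a shader, a texture) as often as possible, so the number of state changes per frame goes down.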

[quote name='Daaark' timestamp='1349807207' post='4988425']
A few slides were a bit ambiguous. You want to align to 4 bytes manually, or let GPU do it for you?
[/quote]
StiX already answered but I wanted to give some more detail and an idea of how critical this is.
When you call glDrawElements() or glDrawArrays(), a few things can push it onto a slower path in which it copies your entire vertex buffer to a new location in “GPU” RAM (of course there is no such thing in a unified memory model, but it is easier to think of memory managed by the driver as GPU RAM).
One way is to simply not use VBOs. Another way is to pass misaligned data (attributes not aligned to 4 bytes).
These copies obviously cost a lot of extra cycles, even though the driver uses an optimized memcpy() when possible (it can’t when realignment is necessary). To give you an idea of just how much that is, on an average game it means the difference between 20 and 45 FPS.

Going into extreme detail, if you benchmark with Time Profiler and you see a function called glDraw[Arrays|Elements]_ACC_ES2Exec() taking a large amount of time, check your vertex alignments.
If you see glDraw[Arrays|Elements]_IMM_ES2Exec() taking a lot of time, then your problem is likely the lack of a VBO.
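
To illustrate the alignment point, here is a minimal sketch of an interleaved vertex layout in which every attribute starts on a 4-byte boundary. The Vertex struct and the helper names are hypothetical; the only rule it encodes is the one above (attribute offsets and stride should be multiples of 4).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex. The color is stored as 4 bytes (RGBA)
   rather than 3 (RGB) so that the attribute after it stays 4-byte aligned. */
typedef struct {
    float   position[3]; /* 12 bytes */
    uint8_t color[4];    /*  4 bytes */
    int16_t uv[2];       /*  4 bytes */
} Vertex;

static bool is_aligned4(size_t n) {
    return (n % 4) == 0;
}

/* True if every attribute offset and the stride are multiples of 4 bytes. */
static bool vertex_layout_is_aligned(void) {
    return is_aligned4(offsetof(Vertex, position))
        && is_aligned4(offsetof(Vertex, color))
        && is_aligned4(offsetof(Vertex, uv))
        && is_aligned4(sizeof(Vertex));
}
```

The same offsetof() and sizeof() values are what you would pass as the pointer offset and stride to glVertexAttribPointer(), so a check like this at startup catches a misaligned layout before it shows up in the profiler.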


L. Spiro
