Large textures are really slow...

38 comments, last by L. Spiro 10 years, 1 month ago

One of my main visual features is the overlay texture I'm using. It's a full-screen image that gives the whole screen a nice transparent pattern. On my PC and Mac builds, the impact on my frame rate is minimal. On my iPad, I immediately lose 15 fps.

I did some Googling, but haven't found a solution yet. The overlay image is 1024x768 and fits the iPad screen exactly. At first I assumed the problem was that the texture wasn't POT (power of two), so I tried splitting it into two textures: 1024x512 and 1024x256. Still too slow. So I searched some more, and it turns out that the fill rate of mobile devices (especially a 1st-gen iPad) is nowhere near that of a PC or Mac video card. That makes sense, because the entire quad is alpha blended and every pixel on the screen has to go through the fragment program, but I still haven't found a solution.
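To put rough numbers on the fill-rate argument: a full-screen blended quad shades every pixel of the screen on every frame, on top of whatever the scene already drew. A back-of-the-envelope sketch, assuming one fragment-shader invocation per covered pixel per layer (no early rejection, which a blended overlay generally doesn't get). The layer counts are illustrative, not measured:

```python
def fragments_per_second(width: int, height: int, layers: float, fps: int) -> int:
    """Fragments shaded per second when `layers` full-screen passes run at `fps`.

    Assumes every pixel of every layer runs the fragment shader once
    (worst case for an alpha-blended overlay: no early-Z or other rejection).
    """
    return int(width * height * layers * fps)


# 1st-gen iPad screen: 1024x768. Assumed: the scene covers the screen once.
scene_only = fragments_per_second(1024, 768, layers=1, fps=60)
with_overlay = fragments_per_second(1024, 768, layers=2, fps=60)
print(f"scene only:   {scene_only:,} fragments/s")
print(f"with overlay: {with_overlay:,} fragments/s")
```

Each full-screen blended layer adds about 47 million fragment invocations per second at 60 fps, and each of those also reads and writes the framebuffer for blending. Splitting the texture into several pieces that still cover the whole screen doesn't change that count.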

One idea I had was to render a few rows of 32x32 textured quads. The other idea is to get a new iPad. To be honest, I've been trying to hold off on a new iPad because I'm saving for other hardware as well as licenses (fortunately, I can easily save more than $1000 USD per month). On top of that, I want my game to run at proper speed on an iPad 2. When my game is ready, I will likely make it iOS 7 exclusive anyway. Lastly, I plan on presenting this game at a public event in Seattle in early April, so I have to get the frame rate to an acceptable level before then (nobody wants to spend too much time fixing a minor problem when there are other features that need to be finished).

I'm sure someone else has had the same problem here, but I still haven't managed to google any solutions. This sucks. Any ideas? Thanks.

Shogun.

Is it this game? http://shogun3d.net/images2/looptil/04.jpg

If so, you could probably save a lot of cycles by drawing it on a mesh with a big hole in the middle instead of a full-screen quad.

The idea is to process as few of the fully transparent pixels as possible.
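A minimal sketch of the "mesh with a hole" idea, assuming the fully transparent centre of the overlay can be approximated by one axis-aligned rectangle (the hole bounds below are made-up numbers, not taken from the game). It builds an indexed triangle list for the border region and checks how much of the screen it still covers:

```python
def frame_mesh(outer, inner):
    """Triangulate the band between two axis-aligned rectangles (x0, y0, x1, y1).

    `inner` must lie strictly inside `outer`. Returns (vertices, indices) for an
    indexed triangle list: 8 vertices, 8 triangles.
    """
    ox0, oy0, ox1, oy1 = outer
    ix0, iy0, ix1, iy1 = inner
    vertices = [
        (ox0, oy0), (ox1, oy0), (ox1, oy1), (ox0, oy1),  # 0-3: outer corners
        (ix0, iy0), (ix1, iy0), (ix1, iy1), (ix0, iy1),  # 4-7: matching inner corners
    ]
    indices = []
    for i in range(4):  # one trapezoid (two triangles) per side of the frame
        j = (i + 1) % 4
        indices += [i, j, 4 + j, i, 4 + j, 4 + i]
    return vertices, indices


def mesh_area(vertices, indices):
    """Total area covered by an indexed triangle list."""
    total = 0.0
    for k in range(0, len(indices), 3):
        (ax, ay), (bx, by), (cx, cy) = (vertices[i] for i in indices[k:k + 3])
        total += abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay)) / 2
    return total


screen = (0, 0, 1024, 768)
hole = (128, 96, 896, 672)  # hypothetical transparent centre
verts, idx = frame_mesh(screen, hole)
print(len(idx) // 3, "triangles,", mesh_area(verts, idx) / (1024 * 768), "of the screen")
```

With these hypothetical bounds the overlay shades 43.75% of the screen instead of all of it, for 8 triangles instead of 2.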

The 1st-gen iPad isn't very powerful, so it's not impossible that fill rate is your limit, even though it doesn't look like your game should need that much...

You can also use the GL debugger and profiler that come with Xcode to find out where the bottlenecks actually are.

Yes, that's the game, although that screenshot is old.

I'll give that a try, and I assume it would work better, but another reason I'm asking this is because I'd also like to release another game that uses post-processing effects on mobile devices. I fear I'd run into the same frame rate issues, in which case I should probably offer an option to enable/disable the effects.

Shogun.

Losing 15 fps doesn't tell us much. Dropping from 1000 fps to 985 fps is a huge meh, and 15 fps dropping to 0 fps is the other extreme. I know you're somewhere in the middle, but no clue where.
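The point is easier to see in milliseconds per frame, since the same fixed cost shows up as a very different fps drop depending on where you start. A small sketch of the arithmetic, using the two cases discussed in this thread (1000 to 985 fps, and 60 to 45 fps):

```python
def frame_ms(fps: float) -> float:
    """Milliseconds available per frame at a given frame rate."""
    return 1000.0 / fps


def added_ms(fps_before: float, fps_after: float) -> float:
    """Extra milliseconds per frame implied by a drop from fps_before to fps_after."""
    return frame_ms(fps_after) - frame_ms(fps_before)


for before, after in [(1000, 985), (60, 45)]:
    print(f"{before} -> {after} fps: +{added_ms(before, after):.3f} ms per frame")
```

A 15 fps drop from 1000 costs about 0.015 ms per frame; the same 15 fps drop from 60 costs about 5.6 ms, a full third of the 16.7 ms budget.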

If your texture is actually 1024x768 instead of 1024x1024, make it 1024x1024 (feel free to atlas with something else if you can) and compress it as much as you're willing for some speedup. If you are using a "bad" texture format, change to a proper texture format (probably PVRTC). If you're using OpenGL ES 1.x instead of 2, use OES_draw_texture. Use the lowest color precision you find acceptable. If you can, do multitexture instead of multipass. Basically, just follow the best practices Apple tells you in their docs. I'm sure I left stuff out.
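For a sense of what compression and a lighter format buy, here is a sketch of the storage cost of the overlay in a few formats, padded to 1024x1024 (PVRTC on iOS requires square power-of-two textures). It only covers size, i.e. memory and texture-fetch bandwidth; the actual fps gain has to be measured. The dictionary keys are labels for this sketch, with the corresponding GL enums in comments:

```python
# Bits per texel for some OpenGL ES 2.0 texture formats on iOS devices.
BITS_PER_TEXEL = {
    "RGBA8888": 32,  # GL_RGBA + GL_UNSIGNED_BYTE
    "RGBA4444": 16,  # GL_RGBA + GL_UNSIGNED_SHORT_4_4_4_4
    "A8": 8,         # GL_ALPHA + GL_UNSIGNED_BYTE
    "PVRTC4": 4,     # GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG
    "PVRTC2": 2,     # GL_COMPRESSED_RGBA_PVRTC_2BPPV1_IMG
}


def next_pot(n: int) -> int:
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p


def texture_bytes(width: int, height: int, fmt: str) -> int:
    """Storage for one mip level, ignoring any padding the driver may add."""
    return width * height * BITS_PER_TEXEL[fmt] // 8


side = next_pot(max(1024, 768))  # 1024x768 padded to a square 1024x1024
for fmt in BITS_PER_TEXEL:
    print(f"{fmt:9} {texture_bytes(side, side, fmt) // 1024:5} KiB")
```

Going from RGBA8888 to 4bpp PVRTC takes the overlay from 4096 KiB to 512 KiB. PVRTC is lossy, so for a black-and-white image an uncompressed single-channel format is also worth comparing.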

I don't think 32x32 textures are faster than a single 1024x1024 one. If you really can't atlas with something else to fill out the 1024x1024, maybe three 512x512 textures would be faster, but I'm not sure, since you're trading one performance area for another and it will probably depend on what you're doing.
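To see where "three 512x512" comes from: 1024x768 splits into power-of-two pieces no larger than 512, and the two 512x256 strips can share one 512x512 texture. A sketch of that split, where the maximum tile size is a parameter you'd choose yourself:

```python
def pot_parts(n: int, max_size: int) -> list:
    """Split a length into power-of-two pieces, largest first, none above max_size."""
    assert max_size > 0 and (max_size & (max_size - 1)) == 0, "max_size must be a power of two"
    parts = []
    while n > 0:
        p = max_size
        while p > n:  # largest power of two that still fits
            p //= 2
        parts.append(p)
        n -= p
    return parts


def pot_tiles(width: int, height: int, max_size: int) -> list:
    """(w, h) of each power-of-two tile covering a width x height image."""
    return [(w, h) for h in pot_parts(height, max_size) for w in pot_parts(width, max_size)]


tiles = pot_tiles(1024, 768, 512)
print(tiles)  # [(512, 512), (512, 512), (512, 256), (512, 256)]
print(sum(w * h for w, h in tiles), "texels vs", 1024 * 1024, "for a padded 1024x1024")
```

With `max_size=1024` the same function gives the 1024x512 + 1024x256 split described at the top of the thread. Either way the screen area drawn is unchanged, which is why this trades memory rather than fill.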

And that recommendation to avoid as much of the transparency as possible is great advice. Try varying numbers of polygons to see where the sweet spot is, but I wouldn't be surprised if you're better off with 100+ polygons to minimize the amount of transparent overlay pixels to render rather than a single fullscreen quad. Still use as few textures as possible though, even if you have a lot of polygons.
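A sketch of finding that sweet spot offline: cut the overlay's alpha channel into a grid, keep only the tiles with any non-zero alpha, and compare triangle count against the fraction of the screen still shaded for a few tile sizes. The 8x8 mask below is a made-up stand-in for the real 1024x768 alpha channel. Because the overlay is static, this would run once at load or build time:

```python
def visible_tiles(alpha, tile):
    """Return (x, y, w, h) for each grid tile that has at least one non-zero alpha."""
    height, width = len(alpha), len(alpha[0])
    tiles = []
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            w, h = min(tile, width - tx), min(tile, height - ty)
            if any(alpha[y][x] for y in range(ty, ty + h) for x in range(tx, tx + w)):
                tiles.append((tx, ty, w, h))
    return tiles


def cost(alpha, tile):
    """(triangles, fraction of screen shaded) when each visible tile is its own quad."""
    tiles = visible_tiles(alpha, tile)
    shaded = sum(w * h for _, _, w, h in tiles)
    return 2 * len(tiles), shaded / (len(alpha) * len(alpha[0]))


# Hypothetical 8x8 alpha mask: a 2-texel opaque border around a transparent centre.
mask = [[255 if min(x, y, 7 - x, 7 - y) < 2 else 0 for x in range(8)] for y in range(8)]
for size in (4, 2, 1):
    print(size, cost(mask, size))
```

Smaller tiles shade less but cost more triangles. On the real image you'd try sizes like 16, 32 and 64 and measure, and adjacent visible tiles in a row can be merged into one quad to cut the triangle count further.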

The tile-based architectures of most mobile GPUs have a different set of optimal use cases compared to more traditional GPUs. Check the advice in Performance Tuning for Tile-Based Architectures from the OpenGL Insights book to make sure you're not doing anything that triggers poor performance.

but another reason I'm asking this is because I'd also like to release another game that uses post-processing effects on mobile devices.

You should aim for later devices than the iPad 1, then. It's almost 4 years old now.

It all depends on what else you do, of course, but I wouldn't plan on anything too fancy in full-screen post effects on devices more than 1-2 years old.

With the ES3 devices coming out now, performance is starting to reach levels where you can do some pretty fancy stuff, but they are still very far from desktop.

That's not that strange considering their difference in size and power consumption though!

How often is the texture being updated?
- If every frame, are you uploading the entire texture every frame, even when parts of the texture are not modified?
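If the texture did change every frame, the usual fix is to upload only the modified rectangle with glTexSubImage2D. A sketch of preparing that upload, assuming you already track which texels changed (that tracking is hypothetical here). OpenGL ES 2.0 has no GL_UNPACK_ROW_LENGTH, so the sub-rectangle is copied into a tightly packed buffer first:

```python
def dirty_rect(changed):
    """Bounding rectangle (x, y, w, h) of a non-empty list of changed (x, y) texels."""
    xs = [x for x, _ in changed]
    ys = [y for _, y in changed]
    x0, y0 = min(xs), min(ys)
    return x0, y0, max(xs) - x0 + 1, max(ys) - y0 + 1


def pack_region(pixels: bytes, tex_width: int, bpp: int, rect) -> bytes:
    """Copy rect out of a row-major texture into a tightly packed buffer."""
    x, y, w, h = rect
    stride = tex_width * bpp
    return b"".join(
        pixels[(y + r) * stride + x * bpp:(y + r) * stride + (x + w) * bpp]
        for r in range(h)
    )


rect = dirty_rect([(100, 200), (131, 215)])
print(rect)  # (100, 200, 32, 16)
print(rect[2] * rect[3] * 4, "bytes instead of", 1024 * 1024 * 4)
# The packed buffer and rect would then be passed to
# glTexSubImage2D(GL_TEXTURE_2D, 0, x, y, w, h, GL_RGBA, GL_UNSIGNED_BYTE, data).
```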

What format is the texture data in?
- If it's a fat format like RGBA, do you actually need 4 channels? Would a 2-channel or 1-channel texture suffice?
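A sketch of checking whether an RGBA8 image actually needs four channels, and repacking it into the smaller OpenGL ES 2.0 formats (GL_ALPHA, GL_LUMINANCE, GL_LUMINANCE_ALPHA: one or two bytes per texel) when it doesn't. The format strings are labels for this sketch, and the input is assumed to be tightly packed RGBA bytes:

```python
def minimal_format(rgba: bytes) -> str:
    """Smallest ES 2.0 unsized format that can hold this RGBA8 image without loss."""
    r, g, b, a = rgba[0::4], rgba[1::4], rgba[2::4], rgba[3::4]
    if not (r == g == b):
        return "RGBA"             # real colour: keep all four channels
    if len(set(r)) == 1:
        return "ALPHA"            # constant grey (supply it via a uniform); only alpha varies
    if all(v == 255 for v in a):
        return "LUMINANCE"        # grey and fully opaque
    return "LUMINANCE_ALPHA"      # grey values and alpha both vary


def repack(rgba: bytes, fmt: str) -> bytes:
    """Drop the channels `fmt` doesn't need."""
    if fmt == "ALPHA":
        return rgba[3::4]
    if fmt == "LUMINANCE":
        return rgba[0::4]
    if fmt == "LUMINANCE_ALPHA":
        out = bytearray(len(rgba) // 2)
        out[0::2] = rgba[0::4]    # luminance taken from the red channel
        out[1::2] = rgba[3::4]    # alpha
        return bytes(out)
    return rgba


white_pattern = bytes([255, 255, 255, 0, 255, 255, 255, 128])
fmt = minimal_format(white_pattern)
print(fmt, repack(white_pattern, fmt))  # ALPHA b'\x00\x80'
```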

There are many possible reasons why your texture upload could be slow, but the solution probably boils down to taking a step back and looking at the requirements. I see you have tried a few optimizations that haven't given you the desired result. Without more information, like the texture format and update frequency, it's kind of difficult to give a concrete answer.

1. My game has to run at a solid 60fps to remain playable. Without the overlay image, I get that. 45fps is not tolerable for my game due to the internal mechanics I've chosen to use. Sorry, should have said that earlier.

2. As I stated earlier, I tried POT-ing the texture, and it had zero impact. It's a fill-rate issue. I'm going to try compressing it with PVRTC soon, because I'm using RGBA8, which I know isn't optimal, especially since the image is black and white. I'm using OpenGL ES 2.0 right now, and now that you mention it, I did forget to switch the precision back to low, thanks. I read the Apple docs, learned quite a bit from them, and implemented the techniques needed to speed up my code, but none of them address my particular issue.

3. No, that's not what I meant. What I meant was similar to your 4th paragraph: use rows of 32x32 quads and skip rendering the fully transparent area.

I saw that article once before, and forgot about it. This time, I'll try reading it more thoroughly.

Yes, well said. I do plan on targeting only iOS 7-compatible devices in the future (an iPad 1 is all I have right now, which is why I'm using it).

I do own an OpenGL ES 3.0 compatible device now, a 2nd Gen Nexus 7 which does nicely. Haven't ported my game to Android yet.

1. The texture is not dynamic, so it's not updated every frame.

2. It's RGBA, and compression wouldn't hurt, of course. I've read posts by others with the same issue, and they said compression only gained them a few fps (around 4). Still worth implementing either way.

Shogun.

1st gen iPads are a nightmare for fill-rate. Switching to PVRTC will help.

If your texture allows it, identify the areas that don't need to be filled, and trade extra geometry for less fill. You could even take that to an extreme: assuming we're talking about the concentric hexagons visible in this image (http://shogun3d.net/images2/looptil/04.jpg), performance will be massively better if they are rendered as geometry with no texture at all. A thousand or so extra triangles is nothing compared to the hundreds of thousands of pixels you'll avoid filling.
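A minimal sketch of that extreme, assuming the overlay really is a set of concentric hexagon outlines: each outline becomes a band between two hexagons, 12 triangles per band, drawn with a flat colour and no texture fetch. The centre, radii and spacing are hypothetical; you'd generate the bands once and keep them in a vertex buffer:

```python
import math


def hex_ring(cx, cy, r_inner, r_outer):
    """Band between two concentric hexagons as an indexed triangle list.

    Vertices 0-5 are the outer hexagon and 6-11 the inner one, at the same
    angles, so each side of the band is a trapezoid split into two triangles.
    """
    vertices = []
    for r in (r_outer, r_inner):
        for k in range(6):
            angle = math.pi / 3 * k
            vertices.append((cx + r * math.cos(angle), cy + r * math.sin(angle)))
    indices = []
    for k in range(6):
        n = (k + 1) % 6
        indices += [k, n, 6 + n, k, 6 + n, 6 + k]
    return vertices, indices


# Hypothetical layout: 20 outlines, 4 px thick, every 20 px, centred on a 1024x768 screen.
rings = [hex_ring(512, 384, r - 4, r) for r in range(40, 440, 20)]
print(sum(len(idx) // 3 for _, idx in rings), "triangles")  # 240 triangles
```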

Just adding this in for clarity.

Make sure you aren't setting the sampler2D uniform every frame. Just set it the first time you need it.
I know you said the texture isn't updated every frame, but uploading and updating can mean two different things, hence why I want to make sure this is clear!
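One way to guarantee redundant uniform sets never reach the driver is a small cache that only forwards a value when it changes. A sketch, where `upload` is a hypothetical hook standing in for glUniform1i so it runs without a GL context. Uniform values belong to a program object, so you'd keep one cache per program and clear it if the program is relinked:

```python
class UniformCache:
    """Forward a uniform value to `upload` only when it differs from the last one sent."""

    def __init__(self, upload):
        self._upload = upload  # hypothetical: e.g. lambda loc, v: glUniform1i(loc, v)
        self._last = {}

    def set(self, location, value):
        if self._last.get(location) != value:
            self._upload(location, value)
            self._last[location] = value

    def clear(self):
        """Forget cached values, e.g. after relinking the program."""
        self._last.clear()


calls = []
cache = UniformCache(lambda location, value: calls.append((location, value)))
for frame in range(3):
    cache.set(0, 0)  # hypothetical sampler location 0 -> texture unit 0, every frame
print(calls)  # [(0, 0)]
```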

Cheers
