Large textures are really slow...


39 replies to this topic

#1 blueshogun96   Crossbones+   -  Reputation: 1053

Posted 28 February 2014 - 11:30 AM

One of my game's main visual features is the overlay texture I'm using.  This overlay texture covers the entire screen and gives it a nice transparent pattern.  On my PC and Mac builds, the impact on my frame rate is minimal.  On my iPad, I lose 15fps automatically.

 

I did some Google searching, but haven't found a solution yet.  The overlay image is 1024x768 and fits the entire iPad screen nicely.  At first I assumed it was because the texture wasn't POT (power of two), so I tried splitting it into two textures: 1024x512 and 1024x256.  Still too slow.  So I tried googling some more.  It turns out that the fill rate of mobile devices (especially a 1st Gen iPad) isn't nearly as high as that of a PC or Mac video card.  That's understandable, because the entire quad uses alpha blending and every pixel on the screen has to be processed by the fragment program, but I still haven't found a solution.

 

One idea I had was to render a few rows of 32x32 textured quads.  The other idea is to get a new iPad.  Tbh, I was trying to hold off on getting a new iPad because I'm saving for other hardware as well as licenses (fortunately for me, I can save up more than $1000 USD per month easily).  On top of that, I want my game to work at proper speeds on an iPad2.  When my game is ready, I will likely make it iOS7 exclusive anyway.  And lastly, I plan on presenting this game at a public event in Seattle in early April, so I have to get the frame rates working acceptably before then (nobody wants to spend too much time on fixing a minuscule problem when there are other features that need to be finished).

 

I'm sure someone else has had the same problem here, but I still haven't managed to google any solutions.  This sucks.  Any ideas?  Thanks.

 

Shogun.


Follow Shogun3D on the official website: http://shogun3d.net

 

 

"Yo mama so fat, she can't be frustum culled." - yoshi_lol


#2 Olof Hedman   Crossbones+   -  Reputation: 2910

Posted 28 February 2014 - 12:32 PM

Is it this game? http://shogun3d.net/images2/looptil/04.jpg

 

If so, you could probably save a lot of cycles by drawing it on a mesh with a big hole in the middle instead of a full screen quad.

The idea is to process as few of the fully transparent pixels as possible.
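
To get a feel for how much a hole can save, here is a minimal sketch in C that counts the pixels the overlay still has to shade. The `Rect` type, the function name, and the 768x512 hole used in the note below are assumptions for illustration only; the real transparent area depends on the artwork.

```c
#include <assert.h>

/* Axis-aligned rectangle in pixels (hypothetical helper type). */
typedef struct { int x, y, w, h; } Rect;

/* Number of pixels the overlay still shades once a fully transparent
   rectangular hole, lying entirely inside the screen, is left undrawn. */
long shaded_pixels(Rect screen, Rect hole)
{
    assert(hole.w >= 0 && hole.h >= 0);
    assert(hole.x >= screen.x && hole.y >= screen.y);
    assert(hole.x + hole.w <= screen.x + screen.w);
    assert(hole.y + hole.h <= screen.y + screen.h);
    return (long)screen.w * screen.h - (long)hole.w * hole.h;
}
```

For a 1024x768 screen and an assumed 768x512 hole, this gives 393216 pixels, half of the 786432 pixels a full screen quad would blend.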

 

The 1st Gen iPad isn't very powerful, so it's not impossible that fill rate is your limit, even though it doesn't look like your game should need that much...

 

You can also check with the GL debugger and profiler that come with Xcode to find out where the bottlenecks are.


Edited by Olof Hedman, 28 February 2014 - 12:33 PM.


#3 blueshogun96   Crossbones+   -  Reputation: 1053

Posted 28 February 2014 - 12:37 PM

Although that screenshot is old, yes, that's the game.

 

I'll give that a try, and I assume it would work better, but another reason I'm asking this is because I'd also like to release another game that uses post-processing effects on mobile devices.  I fear that I would get the same frame rate issues, but in that case, I should probably have options to enable/disable it.

 

Shogun.



#4 richardurich   Members   -  Reputation: 1187

Posted 28 February 2014 - 12:42 PM

Losing 15 fps doesn't tell us much. Dropping from 1000 fps to 985 fps is a huge meh, and 15 fps dropping to 0 fps is the other extreme. I know you're somewhere in the middle, but no clue where.
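
One way to make an fps drop comparable is to convert it to milliseconds per frame. The sketch below is a hedged illustration in C; the function names are made up for this example, and the 60 to 45 fps case in the note is just a sample input.

```c
#include <assert.h>

/* Milliseconds spent on one frame at the given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per frame implied by a drop from fps_before
   to fps_after. Unlike an fps delta, this cost is comparable across
   different starting frame rates. */
double added_cost_ms(double fps_before, double fps_after)
{
    return frame_time_ms(fps_after) - frame_time_ms(fps_before);
}
```

Dropping from 1000 fps to 985 fps costs roughly 0.015 ms per frame, while dropping from 60 fps to 45 fps costs roughly 5.6 ms per frame, even though both are "15 fps".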

 

If your texture is actually 1024x768 instead of 1024x1024, make it 1024x1024 (feel free to atlas with something else if you can) and compress it as much as you're willing for some speedup. If you are using a "bad" texture format, change to a proper texture format (probably PVRTC). If you're using OpenGL ES 1.x instead of 2, use OES_draw_texture. Use the lowest color precision you find acceptable. If you can, do multitexture instead of multipass. Basically, just follow the best practices Apple tells you in their docs. I'm sure I left stuff out.
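
As a rough sketch of what padding and compressing buys in memory terms, the C helpers below compute texture sizes for a single mip level. The function names are hypothetical. The PVRTC size formula follows the IMG_texture_compression_pvrtc extension as I understand it, and iOS is usually described as requiring PVRTC textures to be square with power-of-two dimensions, so treat both as assumptions to check against the docs.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of two that is >= n. */
unsigned next_pow2(unsigned n)
{
    unsigned p = 1;
    assert(n > 0 && n <= 0x80000000u);
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes for one uncompressed mip level. */
size_t raw_bytes(unsigned w, unsigned h, unsigned bytes_per_pixel)
{
    return (size_t)w * h * bytes_per_pixel;
}

/* Bytes for one PVRTC mip level; bpp must be 2 or 4.
   Small levels are padded up to the format's minimum dimensions. */
size_t pvrtc_bytes(unsigned w, unsigned h, unsigned bpp)
{
    unsigned min_w;
    assert(bpp == 2 || bpp == 4);
    min_w = (bpp == 2) ? 16 : 8;
    if (w < min_w) w = min_w;
    if (h < 8) h = 8;
    return ((size_t)w * h * bpp + 7) / 8;
}
```

Under these assumptions, a 768-pixel side pads up to 1024, a 1024x1024 RGBA8 texture takes 4194304 bytes, and the same texture in 4 bpp PVRTC takes 524288 bytes.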

 

I don't think 32x32 textures are faster than 1024x1024. If you really can't atlas with something else to fill out the 1024x1024, maybe 3 512x512 would be faster, but I'm not sure since you're trading one performance area for another and it will probably be dependent on what you're doing.

 

And that recommendation to avoid as much of the transparency as possible is great advice. Try varying numbers of polygons to see where the sweet spot is, but I wouldn't be surprised if you're better off with 100+ polygons to minimize the amount of transparent overlay pixels to render rather than a single fullscreen quad. Still use as few textures as possible though, even if you have a lot of polygons.



#5 dave j   Members   -  Reputation: 595

Posted 28 February 2014 - 12:45 PM

The tile-based architectures of most mobile GPUs have a different set of optimal use cases compared to more traditional GPUs. Check the advice in Performance Tuning for Tile-Based Architectures from the OpenGL Insights book to make sure you're not doing anything that triggers poor performance.

#6 Olof Hedman   Crossbones+   -  Reputation: 2910

Posted 28 February 2014 - 12:49 PM


but another reason I'm asking this is because I'd also like to release another game that uses post-processing effects on mobile devices.

 

You should aim for later devices than the iPad 1, then. It's almost 4 years old now.

 

It all depends on what else you do, of course, but I wouldn't plan on overly fancy full screen post effects on anything more than 1-2 years old.

 

With the ES3 devices coming out now, performance is starting to reach levels where you can do some pretty fancy stuff, but they are still very far from desktop.

That's not that strange considering their difference in size and power consumption though!


Edited by Olof Hedman, 28 February 2014 - 12:55 PM.


#7 cgrant   Members   -  Reputation: 698

Posted 28 February 2014 - 01:02 PM

How often is the texture being updated?
-If every frame, are you uploading the entire texture every frame, even when parts of the texture are not modified?

What format is the texture data in?
-If it's a fat format like RGBA, do you actually need 4 channels? Would a 2-channel or 1-channel texture suffice?

There are a multitude of reasons why your texture upload could be slow, but the solution probably boils down to taking a step back and looking at the requirements. I see you have tried a few optimizations that haven't given you the desired result. Without more information, like texture format and update frequency, it's kind of difficult to give a concrete answer.
 



#8 blueshogun96   Crossbones+   -  Reputation: 1053

Posted 28 February 2014 - 01:27 PM

Losing 15 fps doesn't tell us much. Dropping from 1000 fps to 985 fps is a huge meh, and 15 fps dropping to 0 fps is the other extreme. I know you're somewhere in the middle, but no clue where.

 

If your texture is actually 1024x768 instead of 1024x1024, make it 1024x1024 (feel free to atlas with something else if you can) and compress it as much as you're willing for some speedup. If you are using a "bad" texture format, change to a proper texture format (probably PVRTC). If you're using OpenGL ES 1.x instead of 2, use OES_draw_texture. Use the lowest color precision you find acceptable. If you can, do multitexture instead of multipass. Basically, just follow the best practices Apple tells you in their docs. I'm sure I left stuff out.

 

I don't think 32x32 textures are faster than 1024x1024. If you really can't atlas with something else to fill out the 1024x1024, maybe 3 512x512 would be faster, but I'm not sure since you're trading one performance area for another and it will probably be dependent on what you're doing.

 

And that recommendation to avoid as much of the transparency as possible is great advice. Try varying numbers of polygons to see where the sweet spot is, but I wouldn't be surprised if you're better off with 100+ polygons to minimize the amount of transparent overlay pixels to render rather than a single fullscreen quad. Still use as few textures as possible though, even if you have a lot of polygons.

 

1. My game has to run at a solid 60fps to remain playable.  Without the overlay image, I get that.  45fps is not tolerable for my game due to the internal mechanics I've chosen to use.  Sorry, should have said that earlier.

 

2. As I stated earlier, I tried POT-ing the texture, and it had zero impact.  It's an issue related to fill rate.  I'm going to try compressing it with PVRTC soon, because I'm using RGBA8, which I know isn't optimal, especially since it's black and white.  I'm using OpenGL ES 2.0 right now, and now that I think about it, I did forget to switch the precision back to low, thanks.  I read the Apple docs, learned quite a bit from them, and implemented the necessary techniques to speed up my code, but none of them address my particular issue.

 

3. No, that's not what I was saying.  What I was saying was similar to your 4th paragraph.  Use rows of 32x32 quads and not render the fully transparent area.

 

The tile-based architectures of most mobile GPUs have a different set of optimal use cases compared to more traditional GPUs. Check the advice in Performance Tuning for Tile-Based Architectures from the OpenGL Insights book to make sure you're not doing anything that triggers poor performance.

 

I saw that article once before, and forgot about it.  This time, I'll try reading it more thoroughly.

 

 


but another reason I'm asking this is because I'd also like to release another game that uses post-processing effects on mobile devices.

 

You should aim for later devices than the iPad 1, then. It's almost 4 years old now.

 

It all depends on what else you do, of course, but I wouldn't plan on overly fancy full screen post effects on anything more than 1-2 years old.

 

With the ES3 devices coming out now, performance is starting to reach levels where you can do some pretty fancy stuff, but they are still very far from desktop.

That's not that strange considering their difference in size and power consumption though!

 

 

Yes, well said.  I do plan on aiming for only iOS7 compatible devices in the future (iPad1 is all I have right now, hence the reason why I'm using it).

 

I do own an OpenGL ES 3.0 compatible device now, a 2nd Gen Nexus 7 which does nicely.  Haven't ported my game to Android yet.

 

How often is the texture being updated?
-If every frame, are you uploading the entire texture every frame, even when parts of the texture are not modified?

What format is the texture data in?
-If it's a fat format like RGBA, do you actually need 4 channels? Would a 2-channel or 1-channel texture suffice?

There are a multitude of reasons why your texture upload could be slow, but the solution probably boils down to taking a step back and looking at the requirements. I see you have tried a few optimizations that haven't given you the desired result. Without more information, like texture format and update frequency, it's kind of difficult to give a concrete answer.
 

 

1. The texture is not dynamic, so it's not updated every frame.

 

2. It's RGBA, and compression wouldn't hurt, of course.  I've read some posts by others who have the same issue, and they said that compression only got them a few extra fps (like 4).  Still worth implementing either way.

 

Shogun.



#9 C0lumbo   Crossbones+   -  Reputation: 2405

Posted 28 February 2014 - 01:32 PM

1st gen iPads are a nightmare for fill-rate. Switching to PVRTC will help.

 

If your texture allows it, then identify areas which don't need to be filled, and trade extra geometry for less fill rate. You could even take that to an extreme: assuming we're talking about the concentric hexagons visible in this image (http://shogun3d.net/images2/looptil/04.jpg), the performance will be massively better if they are rendered as geometry with no texture whatsoever. A thousand or so extra triangles is nothing compared to the hundreds of thousands of pixels you'll avoid filling.



#10 Lodeman   Members   -  Reputation: 847

Posted 28 February 2014 - 02:46 PM

Just adding this in for clarity.

 

Make sure you aren't sending in the uniform for the sampler2D each frame. Just send it in the first time you need it.
I know you said the texture isn't updated every frame, but uploading and updating can mean two different things, hence why I want to make sure this is clear!

 

Cheers



#11 swiftcoder   Senior Moderators   -  Reputation: 10242

Posted 28 February 2014 - 05:18 PM


I'm going to try compressing it to use PVRTC soon, because I'm using RGBA8, which I know isn't too optimal, especially since it's black and white.

If that texture is greyscale + alpha, you are currently using 2x the memory bandwidth by loading it as a RGBA8. The win from downsizing that to 2 channels is likely to overshadow any gains due to texture compression.

 

Convert to a LUMINANCE_ALPHA 2-channel setup, and compress it, and you should be cooking (although, as other posters have mentioned, any large holes should be created with geometry, not alpha).
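
A minimal sketch of that conversion in C, assuming the source image is already grey (R == G == B) so the red channel can stand in for luminance. The function name is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Packs 4-byte RGBA8 pixels into 2-byte luminance + alpha pixels.
   Assumption: the image is greyscale, so R is used as the luminance.
   `la` must have room for 2 * pixel_count bytes. */
void rgba8_to_la8(const uint8_t *rgba, uint8_t *la, size_t pixel_count)
{
    size_t i;
    assert(rgba != NULL && la != NULL);
    for (i = 0; i < pixel_count; i++) {
        la[2 * i + 0] = rgba[4 * i + 0]; /* luminance taken from red */
        la[2 * i + 1] = rgba[4 * i + 3]; /* alpha copied unchanged   */
    }
}
```

The packed buffer can then be uploaded with `glTexImage2D` using `GL_LUMINANCE_ALPHA` as both the internal format and the format, with `GL_UNSIGNED_BYTE` as the type. A sampler reading such a texture sees (L, L, L, A), so the existing shader should not need to change.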


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#12 dpadam450   Members   -  Reputation: 934

Posted 28 February 2014 - 05:31 PM

You said "overlay" which implies you are blending the whole thing on top after the rest of the stuff.  Instead render it as a background, glDisable(GL_BLEND).

And then draw the yellow bar and whatever else with glEnable(GL_BLEND); you will be blending far fewer pixels. Even for your black text, obviously nothing blends with black, so when drawing black objects don't blend with the background. You should see the exact same results.



#13 dpadam450   Members   -  Reputation: 934

Posted 28 February 2014 - 05:33 PM

Also, as the other guy suggested about holes: any 100% transparent pixels should be alpha tested out so that they don't blend in. Use GL_ALPHA_TEST and GL_BLEND together.



#14 uglybdavis   Members   -  Reputation: 940

Posted 28 February 2014 - 06:50 PM

Assuming you are using shaders, test the alpha value of your sample: pick a good threshold (I don't know how much alpha blending you have) and just discard any fragment whose alpha is below it.
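
A sketch of what such a shader could look like, embedded as a C string the way it would be handed to `glShaderSource`. The uniform and varying names (`u_texture`, `u_alphaCutoff`, `v_texCoord`) and the helper function are assumptions for this example, not names from the game. Whether `discard` helps or hurts depends on the GPU, so it is worth profiling on the target device.

```c
#include <assert.h>
#include <string.h>

/* GLSL ES 1.00 fragment shader: samples the overlay and throws away
   fragments whose alpha falls below a caller-supplied cutoff. */
static const char *const kOverlayFragmentShader =
    "precision lowp float;\n"
    "uniform sampler2D u_texture;\n"
    "uniform float u_alphaCutoff;\n"
    "varying mediump vec2 v_texCoord;\n"
    "void main()\n"
    "{\n"
    "    vec4 color = texture2D(u_texture, v_texCoord);\n"
    "    if (color.a < u_alphaCutoff)\n"
    "        discard;\n"
    "    gl_FragColor = color;\n"
    "}\n";

/* Returns the shader source and stores its length in bytes, matching
   the string/length pair that glShaderSource accepts. */
const char *overlay_fragment_source(int *length_out)
{
    assert(length_out != NULL);
    *length_out = (int)strlen(kOverlayFragmentShader);
    return kOverlayFragmentShader;
}
```

The cutoff would be set once with `glUniform1f` after linking the program; a small value such as 0.01 only removes pixels that are effectively invisible.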



#15 blueshogun96   Crossbones+   -  Reputation: 1053

Posted 28 February 2014 - 06:56 PM

1st gen iPads are a nightmare for fill-rate. Switching to PVRTC will help.

 

If your texture allows it, then identify areas which don't need to be filled, and trade extra geometry for less fill rate. You could even take that to an extreme: assuming we're talking about the concentric hexagons visible in this image (http://shogun3d.net/images2/looptil/04.jpg), the performance will be massively better if they are rendered as geometry with no texture whatsoever. A thousand or so extra triangles is nothing compared to the hundreds of thousands of pixels you'll avoid filling.

What I ended up doing was rendering 4 quads to avoid drawing the area that was fully transparent.  Now I get 60fps the majority of the time.
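
For anyone wanting to do the same, one possible way to lay out the four quads is sketched below in C. This is not the code from the game; the `Quad` type, the function name, and the assumption that the transparent area is a single rectangle are all illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Axis-aligned rectangle: top-left corner plus size (hypothetical type). */
typedef struct { float x, y, w, h; } Quad;

/* Fills `out` with up to four non-overlapping strips (top, bottom, left,
   right) that cover `screen` except for `hole`. The hole must lie inside
   the screen. Returns how many non-empty strips were written. */
size_t frame_around_hole(Quad screen, Quad hole, Quad out[4])
{
    float sx1 = screen.x + screen.w, sy1 = screen.y + screen.h;
    float hx1 = hole.x + hole.w,     hy1 = hole.y + hole.h;
    size_t n = 0;

    assert(hole.x >= screen.x && hole.y >= screen.y);
    assert(hx1 <= sx1 && hy1 <= sy1);

    /* Full-width strips above and below the hole. */
    if (hole.y > screen.y)
        out[n++] = (Quad){ screen.x, screen.y, screen.w, hole.y - screen.y };
    if (hy1 < sy1)
        out[n++] = (Quad){ screen.x, hy1, screen.w, sy1 - hy1 };

    /* Side strips span only the hole's height, so nothing is drawn twice. */
    if (hole.x > screen.x)
        out[n++] = (Quad){ screen.x, hole.y, hole.x - screen.x, hole.h };
    if (hx1 < sx1)
        out[n++] = (Quad){ hx1, hole.y, sx1 - hx1, hole.h };

    return n;
}
```

To keep the overlay pattern aligned, each quad's texture coordinates can be derived from its position, for example u = x / screen.w and v = y / screen.h at every corner.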

 

 

 


I'm going to try compressing it to use PVRTC soon, because I'm using RGBA8, which I know isn't too optimal, especially since it's black and white.

If that texture is greyscale + alpha, you are currently using 2x the memory bandwidth by loading it as a RGBA8. The win from downsizing that to 2 channels is likely to overshadow any gains due to texture compression.

 

Convert to a LUMINANCE_ALPHA 2-channel setup, and compress it, and you should be cooking (although, as other posters have mentioned, any large holes should be created with geometry, not alpha).

 

 

I have yet to compress the texture (haven't looked into how to do so yet).  The large hole created with geometry worked.

 

 

You said "overlay" which implies you are blending the whole thing on top after the rest of the stuff.  Instead render it as a background, glDisable(GL_BLEND).

And then draw the yellow bar and whatever else with glEnable(GL_BLEND); you will be blending far fewer pixels. Even for your black text, obviously nothing blends with black, so when drawing black objects don't blend with the background. You should see the exact same results.

That defeats the purpose of the overlay image altogether.  Without the alpha blending, it looks awkward and isn't desirable.

 

 

Also, as the other guy suggested about holes: any 100% transparent pixels should be alpha tested out so that they don't blend in. Use GL_ALPHA_TEST and GL_BLEND together.

Alpha testing on mobile devices is generally slower and should be avoided.  This is what I've read many times while googling iOS optimizations.  Second, I'm using OpenGL ES 2.0, and it doesn't appear that GL_ALPHA_TEST is supported.  I get an error when I try to enable it because it's not defined.

 

Anyway, it's working at acceptable speeds now.  Next I'll do some further optimizations such as texture compression, etc.

 

Shogun.



#16 dpadam450   Members   -  Reputation: 934

Posted 28 February 2014 - 10:00 PM

Yeah, I forgot about alpha test. As for the overlay, you missed my point completely.

.5*healthbar + .5*your_background texture   ==  .5*your_background texture + .5*healthbar

If you draw the background first, without alpha blending, it will be in the background. If you then blend your health bar on top, the number of pixels you are blending is only the health bar pixels (far fewer pixels blended).

In your original image, is the background blended on top, or the health bar blended on top? The user won't know, because it's the same result. This doesn't hold true for 99% of games, but if that is the extent of your game's art design, you can pull off doing it in this order instead (drawing the BG without blending first, then blending the sprites with the BG). It doesn't work for the other 99% of games because usually stuff is drawn in front of stuff behind other stuff, where it would blend and blend and blend. But since you only have the BG blending with really 1 sprite at a time (at each pixel), it doesn't matter which of the 2 images is the SRC and which is the DST. Same math.
http://shogun3d.net/images2/looptil/04.jpg


Edited by dpadam450, 28 February 2014 - 10:04 PM.


#17 swiftcoder   Senior Moderators   -  Reputation: 10242

Posted 01 March 2014 - 07:54 AM


I have yet to compress the texture (haven't looked into how to do so yet).

It is literally dead simple. Fire up texturetool on the command line, and profit.



#18 Olof Hedman   Crossbones+   -  Reputation: 2910

Posted 01 March 2014 - 08:24 AM


Yeah, I forgot about alpha test. As for the overlay, you missed my point completely.

.5*healthbar + .5*your_background texture   ==  .5*your_background texture + .5*healthbar

If you draw the background first, without alpha blending, it will be in the background. If you then blend your health bar on top, the number of pixels you are blending is only the health bar pixels (far fewer pixels blended).

 

I'm not sure that would work that well...

 

Maybe with constant alpha, but not with an alpha channel.

 

First you would also need to modify your blending function, and make sure your framebuffer is RGBA.

 

Normally you do srcA*srcRGB + (1-srcA)*dstRGB.

 

If you want to do it the other way around, you have to blend with (1-dstA)*srcRGB + dstA*dstRGB in your second pass, to use the alpha in the overlay.

 

I think it also requires that the game graphics can't have an alpha channel of their own, so no smooth edges (like those clouds).
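
The two blend orders discussed in this exchange can be checked on the CPU with a small sketch in C. It models a single colour channel, the function names are made up for this example, and it assumes the game graphics are fully opaque, which is exactly the restriction described above.

```c
#include <assert.h>

/* Usual order: overlay (src) drawn over the scene (dst).
   result = srcA*src + (1 - srcA)*dst */
float blend_over(float src, float src_a, float dst)
{
    assert(src_a >= 0.0f && src_a <= 1.0f);
    return src_a * src + (1.0f - src_a) * dst;
}

/* Reversed order: overlay already in the framebuffer (dst) together with
   its alpha (dst_a); opaque game graphics are drawn second as src.
   result = (1 - dstA)*src + dstA*dst */
float blend_under(float src, float dst, float dst_a)
{
    assert(dst_a >= 0.0f && dst_a <= 1.0f);
    return (1.0f - dst_a) * src + dst_a * dst;
}
```

Both orders give the same colour for a given overlay pixel and opaque game pixel. In OpenGL ES 2.0 terms the usual order corresponds to `glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)` and the reversed second pass to `glBlendFunc(GL_ONE_MINUS_DST_ALPHA, GL_DST_ALPHA)`, which only works if the framebuffer actually stores alpha.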



#19 blueshogun96   Crossbones+   -  Reputation: 1053

Posted 01 March 2014 - 10:24 AM

 


I have yet to compress the texture (haven't looked into how to do so yet).

It is literally dead simple. Fire up texturetool on the command line, and profit.

 

I've never heard of texturetool.  Does it come with Xcode or something?  Let me google that.

 

Shogun.



#20 swiftcoder   Senior Moderators   -  Reputation: 10242

Posted 01 March 2014 - 10:25 AM


I've never heard of texturetool.  Does it come with Xcode or something?  Let me google that.

 

https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/TextureTool/TextureTool.html

