
Fill rate questions


9 replies to this topic

#1 vladic2000x   Members   -  Reputation: 252


Posted 08 June 2012 - 07:22 AM

Hi there. This question is about fill-rate performance. Which will be faster: rendering a quad with a 1024x1024 texture, or the same quad with a 128x128 texture (scaled with linear interpolation)? I'm not seeing any FPS improvement in my application with 128x128; if anything, there is a small 1 FPS decrease compared to 1024x1024. Is there any performance gain to be had here?

Thanks


#2 Ripiz   Members   -  Reputation: 529


Posted 08 June 2012 - 08:58 AM

It depends on the number of pixels drawn, not the texture size.

You'll see a performance gain only if you are fill-rate limited.

#3 vladic2000x   Members   -  Reputation: 252


Posted 08 June 2012 - 09:32 AM

I didn't quite understand. So regardless of whether I'm using a 1024x1024 texture or a 128x128 texture, if it's placed over the same quad and occupies the same amount of screen space, it will render the same number of pixels, which means no fill-rate gain?

#4 Gavin Williams   Members   -  Reputation: 642


Posted 08 June 2012 - 12:46 PM

My admittedly limited understanding of the situation says 'yes', it will render at roughly the same speed. Your shader has to run for every pixel; whether it fetches the texel data from a 128^2 texture or a 1024^2 texture is of little consequence, since it still must fetch a value for each pixel.

#5 kauna   Crossbones+   -  Reputation: 2158


Posted 08 June 2012 - 02:23 PM

I think such a simple case isn't good for performance analysis. Practically, in your case, you use the same amount of fill rate regardless of texture size. However, I think that at some point, on certain low-end hardware, texture size may become a bottleneck, either through consumed memory bandwidth (reading the texture from memory) or consumed memory (which results in swapping). Memory bandwidth of modern cards is measured in tens to hundreds of gigabytes per second.
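For a sense of scale, here is a minimal Python sketch of the memory side of that argument. It is only arithmetic, not something from the original post: it assumes uncompressed 32-bit RGBA texels and no mipmaps, and the names `texture_bytes` and `worst_case_bytes_per_second` are made up for illustration. Compressed formats (PVRTC on iOS, for example) would shrink both numbers.

```python
def texture_bytes(width, height, bytes_per_texel=4):
    """Size of one uncompressed texture level (assumes 4 bytes per texel, RGBA8)."""
    return width * height * bytes_per_texel


def worst_case_bytes_per_second(width, height, fps, bytes_per_texel=4):
    """Rough upper bound: assumes every texel is read from memory once per frame."""
    return texture_bytes(width, height, bytes_per_texel) * fps


large = texture_bytes(1024, 1024)  # 4 MiB
small = texture_bytes(128, 128)    # 64 KiB
print(large, small, large // small)  # 4194304 65536 64

# Even the large texture, read in full at 60 FPS, is about 0.25 GB/s.
print(worst_case_bytes_per_second(1024, 1024, fps=60))  # 251658240
```

Under those assumptions a single texture is a small fraction of the bandwidth figures quoted above, which fits the point that one quad is too small a test to show a difference.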

Cheers!

#6 vladic2000x   Members   -  Reputation: 252


Posted 09 June 2012 - 01:17 AM

Thanks for the replies. I'm doing this for iPad 1, so I guess it's relevant.

#7 L. Spiro   Crossbones+   -  Reputation: 12241


Posted 09 June 2012 - 01:55 AM

If this is for iOS then it is not for DirectX or XNA, and this is the wrong forum.

Fill-rate limitations involve how many pixels you render to the screen and nothing more. This includes overlapping pixels but you won’t have them here with your test case.
So from a fill-rate standpoint, your results will be completely identical.
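To put a rough number on that, here is a minimal Python sketch of the arithmetic (not from the original post). The function name `pixels_filled_per_second` and the `overdraw` parameter are made up for illustration, and the 1024x768 figure assumes a quad covering the whole iPad 1 screen. Note that the texture dimensions do not appear anywhere in the calculation.

```python
def pixels_filled_per_second(quad_width_px, quad_height_px, fps, overdraw=1.0):
    """Approximate pixels written per second for one screen-space quad.

    overdraw is a hypothetical average of how many times each pixel is
    drawn (1.0 means no overlapping geometry).
    """
    return int(quad_width_px * quad_height_px * overdraw * fps)


# Full-screen quad on a 1024x768 display at 60 FPS, no overlap.
# The result is identical for a 1024x1024 and a 128x128 texture.
print(pixels_filled_per_second(1024, 768, fps=60))  # 47185920
```

Only the covered screen area, the amount of overlap, and the frame rate change the result.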

Then you have bandwidth limitations which dictate how fast it is to send data to the graphics hardware. Normally larger textures are slower to send, but iOS devices use a UMM (Unified Memory Model) which means there is no GPU RAM and nowhere to send the data, so here again you will not see a change.

Then there are cache hits. This applies to every device, regardless of memory model (except to pedantically mention devices with no cache). If you are blitting every pixel in a texture, smaller is faster due to better caching. This is why (among many reasons) mipmaps are so important.
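As an illustration of why mipmaps help with caching, here is a simplified Python sketch of how a mip level might be chosen. It assumes square power-of-two textures and a quad viewed head-on, and it ignores the per-pixel derivatives and filtering modes that real hardware uses. The names `approx_mip_level` and `mip_size` are hypothetical.

```python
import math


def approx_mip_level(texture_size, on_screen_size):
    """Pick a mip level from the ratio of texels to screen pixels.

    Simplified model: one axis only, power-of-two texture sizes.
    """
    texels_per_pixel = texture_size / on_screen_size
    if texels_per_pixel <= 1.0:
        return 0  # magnified or 1:1, so the base level is sampled
    max_level = int(math.log2(texture_size))
    return min(max_level, int(round(math.log2(texels_per_pixel))))


def mip_size(texture_size, level):
    """Edge length of the given mip level (never smaller than 1 texel)."""
    return max(1, texture_size >> level)


# A 1024x1024 texture drawn on a quad 128 pixels wide.
level = approx_mip_level(1024, 128)
print(level, mip_size(1024, level))  # 3 128
```

In this model a mipmapped 1024x1024 texture drawn that small is sampled from its 128x128 level, so neighbouring pixels read neighbouring texels instead of skipping across memory.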


However, as was mentioned above, your test case is useless. You can’t detect the differences between each of these potential bottlenecks with just 4 vertices and a texture being drawn every frame. iOS devices are capped at 60 FPS, and no matter how bad the above limitations are, the frame rate will never drop below 60 FPS with that kind of test case.

You need to add more to your scene until it stays around 30 FPS before you start benchmarking anything.
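A small Python sketch of why the cap hides differences. It assumes a simple vsync model in which a finished frame waits for the next 60 Hz refresh, which is an assumption for illustration and not a description of how iOS schedules frames. The name `displayed_fps` is made up.

```python
import math


def displayed_fps(render_ms, refresh_hz=60):
    """Frame rate seen on screen when each frame waits for the next refresh."""
    # Number of whole refresh intervals the frame occupies (at least one).
    intervals = max(1, math.ceil(render_ms * refresh_hz / 1000.0))
    return refresh_hz / intervals


# A 2 ms frame and a 10 ms frame both show up as 60 FPS,
# although one costs five times as much as the other.
print(displayed_fps(2.0), displayed_fps(10.0))  # 60.0 60.0

# Only once the frame takes longer than one refresh does the number move.
print(displayed_fps(20.0))  # 30.0
```

In this model any change that keeps the frame under about 16.7 ms is invisible in the FPS counter, which is why the scene has to be heavy enough to fall below the cap before the numbers mean anything.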


L. Spiro
It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
I went to my local Subway once to find some guy yelling at the staff. When someone finally came to take my order and asked, “May I help you?”, I replied, “Yeah, I’ll have one asshole to go.”
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums

#8 vladic2000x   Members   -  Reputation: 252


Posted 09 June 2012 - 02:34 AM

Thank you very much, L. Spiro. It makes sense :) I posted on the DX forum because our engine works on both Win/Mac and has separate code paths for rendering, so I was doing the test primarily on PC, thinking about how it would carry over to the iPad.

#9 L. Spiro   Crossbones+   -  Reputation: 12241


Posted 09 June 2012 - 03:01 AM

You should never test that way. They are completely unrelated hardware.

PowerVR (iOS graphics hardware) uses tile-based deferred rendering and the device uses a unified memory model.
There are hundreds of differences between iOS devices and Windows®/Macintosh®.

Tile-based deferred rendering means overdraw is eliminated (in standard cases).
Unified memory model means bus transfers are eliminated.

Then there are differences in the drivers for each API.
For DirectX 9 it is faster not to redundancy-check large shader uniforms such as matrices or arrays.
For OpenGL on desktop it depends on the driver, but usually it is faster to manually redundancy-check large shader uniforms even if they change frequently.
For OpenGL ES 2 on iOS devices, it doesn’t matter either way. Checking if a uniform is redundant or just sending it to the shader is essentially the same speed (again, unified memory model).
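For anyone unsure what "manually redundancy-checking" a uniform looks like, here is a minimal Python sketch of the idea. `UniformCache` and the `upload` callback are hypothetical names; in a real renderer the callback would wrap the API call that actually sets the uniform (something like `glUniformMatrix4fv` in OpenGL).

```python
class UniformCache:
    """Remembers the last value sent for each uniform and skips repeats."""

    def __init__(self, upload):
        self._upload = upload  # hypothetical callback: upload(name, value)
        self._last = {}

    def set(self, name, value):
        """Upload value unless it matches the last one sent. Returns True if uploaded."""
        value = tuple(value)
        if self._last.get(name) == value:
            return False  # redundant, skip the API call
        self._upload(name, value)
        self._last[name] = value
        return True


sent = []
cache = UniformCache(lambda name, value: sent.append(name))

identity = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]
cache.set("u_mvp", identity)  # first time: uploaded
cache.set("u_mvp", identity)  # same value: skipped
print(len(sent))  # 1
```

The comparison itself costs time (16 floats per matrix here), which is the trade-off described above: whether the check pays off depends on how expensive the API call is on that platform and driver.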

And by the way, the same warning applies to the iOS Simulator. Its performance has nothing to do with that of a real device. It doesn’t even try to emulate speed etc. In fact its implementation of OpenGL ES 2 is software-emulated, meaning not hardware accelerated.

Never ever test on anything but the real device.


L. Spiro

#10 vladic2000x   Members   -  Reputation: 252


Posted 11 June 2012 - 09:19 AM

Many, many thanks! A great starting point for investigating my issues :)



