Hi there. This question is about fill rate performance. Which will be faster: rendering a quad with a 1024x1024 texture, or the same quad with a 128x128 texture (scaled with linear interpolation)? I'm not seeing any FPS improvement in my application with the 128x128 texture; if anything, there is a small 1 FPS decrease compared with 1024x1024. Is there any performance gain to be had from using the smaller texture?
Thanks
9 replies to this topic
#3 Members - Reputation: 248
Posted 08 June 2012 - 09:32 AM
I didn't quite understand... so regardless of whether I'm using a 1024x1024 texture or a 128x128 texture, if it's placed over the same quad and occupies the same amount of screen space, it will render the same number of pixels, which means no fill-rate gain?
#4 GDNet+ - Reputation: 479
Posted 08 June 2012 - 12:46 PM
My admittedly limited understanding of the situation says 'yes', it will render at roughly the same speed. Your shader has to run for every pixel, and whether it fetches the texel data from a 128^2 texture or a 1024^2 texture is of little consequence: it still must fetch one sampled value for each pixel.
#5 Members - Reputation: 1204
Posted 08 June 2012 - 02:23 PM
I don't think such a simple case is good for performance analysis. In practice, in your case you use the same amount of fill rate regardless of texture size. However, at some point, on certain low-end hardware, texture size may become a bottleneck, either through consumed memory bandwidth (reading the texture from memory) or consumed memory (which results in swapping). For comparison, the memory bandwidth of modern cards is measured in tens or hundreds of gigabytes per second.
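To put rough numbers on this, here is a small sketch of the arithmetic. It assumes an uncompressed 4-bytes-per-texel format (RGBA8), no mipmaps, and a hypothetical 512x512 pixel on-screen quad; none of these figures come from the original question, and the function names are made up for illustration.

```python
def texture_bytes(width, height, bytes_per_texel=4):
    """Memory used by one uncompressed texture image (no mip chain)."""
    return width * height * bytes_per_texel


def pixels_filled(quad_width_px, quad_height_px, overdraw=1):
    """Pixels written for a screen-space quad; texture size is not an input."""
    return quad_width_px * quad_height_px * overdraw


large = texture_bytes(1024, 1024)   # 4 MiB of texel data
small = texture_bytes(128, 128)     # 64 KiB of texel data
ratio = large // small              # the large texture holds 64x more data

# Fill-rate cost is the same for both textures on the same quad.
fill = pixels_filled(512, 512)
```

The memory footprint differs by a factor of 64, while the number of pixels filled does not depend on the texture at all, which is consistent with seeing no FPS change in a fill-rate-only test.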
Cheers!
#7 Crossbones+ - Reputation: 5345
Posted 09 June 2012 - 01:55 AM
If this is for iOS then it is not for DirectX or XNA and this is the wrong forum.
Fill-rate limitations involve how many pixels you render to the screen and nothing more. This includes overlapping pixels but you won’t have them here with your test case.
So from a fill-rate standpoint, your results will be completely identical.
Then you have bandwidth limitations which dictate how fast it is to send data to the graphics hardware. Normally larger textures are slower to send, but iOS devices use a UMM (Unified Memory Model) which means there is no GPU RAM and nowhere to send the data, so here again you will not see a change.
Then there are cache hits. This applies to every device, regardless of memory model (except to pedantically mention devices with no cache). If you are blitting every pixel in a texture, smaller is faster due to better caching. This is why (among many reasons) mipmaps are so important.
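As a rough illustration of the caching point, the sketch below estimates how many texels are skipped per screen pixel and which mip level would bring that back to about one texel per pixel. It is a simplified model (square texture, axis-aligned quad, uniform scale); real hardware derives the level per pixel from texture-coordinate derivatives. The function names are hypothetical.

```python
import math


def texels_per_pixel(texture_size, on_screen_size):
    """Texel stride between neighbouring pixels along one axis."""
    return texture_size / on_screen_size


def ideal_mip_level(texture_size, on_screen_size):
    """Mip level at which roughly one texel maps to one pixel.

    Clamped at 0 because a magnified texture has no larger level to use.
    """
    return max(0.0, math.log2(texels_per_pixel(texture_size, on_screen_size)))


# A 1024 texture drawn 128 pixels wide skips 8 texels per pixel without
# mipmaps, so neighbouring pixels touch widely separated memory.
stride_large = texels_per_pixel(1024, 128)
level_large = ideal_mip_level(1024, 128)

# A 128 texture drawn 128 pixels wide reads adjacent texels.
stride_small = texels_per_pixel(128, 128)
level_small = ideal_mip_level(128, 128)
```

With mipmaps enabled, the 1024 texture in this example would be sampled from its level 3 image, which is 128x128, so both cases end up with similar access patterns.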
However, as it was mentioned above, your test case is useless. You can’t detect the differences between each of these potential bottlenecks with just 4 vertices and a texture being drawn every frame. Firstly, iOS devices are capped at 60 FPS. No matter how bad the above limitations are, it will never drop below 60 FPS with that kind of test case.
You need to add more to your scene until the frame rate stays around 30 FPS before you start benchmarking anything.
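A minimal sketch of why the 60 FPS cap hides differences, assuming strict vsync where a frame that misses a refresh waits for the next one. The function name and the sample render times are made up for illustration.

```python
import math


def displayed_fps(render_ms, refresh_hz=60.0):
    """Frame rate actually shown when presentation is locked to vsync.

    Assumes each frame occupies a whole number of refresh intervals.
    """
    refresh_ms = 1000.0 / refresh_hz
    intervals = max(1, math.ceil(render_ms / refresh_ms))
    return refresh_hz / intervals


# Two very different workloads that both fit inside one refresh interval
# report the same frame rate, so FPS alone cannot tell them apart.
light = displayed_fps(2.0)
heavy = displayed_fps(10.0)

# Once the work exceeds one interval, the rate drops in steps.
over_budget = displayed_fps(20.0)
```

Under this model a 2 ms frame and a 10 ms frame both show 60 FPS, which is why the scene needs to be heavy enough to leave the cap before any comparison means something. Measuring frame time directly is usually more informative than reading FPS.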
L. Spiro
It is amazing how often people try to be unique, and yet they are always trying to make others be like them. - L. Spiro 2011
I spent most of my life learning the courage it takes to go out and get what I want. Now that I have it, I am not sure exactly what it is that I want. - L. Spiro 2013
L. Spiro Engine: http://lspiroengine.com
L. Spiro Engine Forums: http://lspiroengine.com/forums
#9 Crossbones+ - Reputation: 5345
Posted 09 June 2012 - 03:01 AM
You should never test that way. They are completely unrelated hardware.
PowerVR (iOS graphics hardware) uses deferred tile-based rendering and the device uses a unified memory model.
There are hundreds of differences between iOS devices and Windows®/Macintosh®.
Deferred tile-based rendering means overdraw is eliminated (in standard cases, i.e. for opaque geometry).
Unified memory model means bus transfers are eliminated.
Then there are differences in the drivers for each API.
For DirectX 9 it is faster not to redundancy-check large shader uniforms such as matrices or arrays.
For OpenGL on desktop it depends on the driver, but usually it is faster to manually redundancy-check large shader uniforms even if they change frequently.
For OpenGL ES 2 on iOS devices, it doesn’t matter either way. Checking if a uniform is redundant or just sending it to the shader is essentially the same speed (again, unified memory model).
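For anyone unsure what "redundancy-checking" a uniform means, here is a minimal sketch. The `UniformCache` class and the `upload` callback are hypothetical; in a real renderer the callback would wrap something like `glUniformMatrix4fv`. Whether the check pays off depends on the API and driver, as described above.

```python
class UniformCache:
    """Skips uploads of uniform values that have not changed.

    `upload` is a hypothetical callable(name, value) standing in for the
    real graphics API call. Values must be iterables of numbers.
    """

    def __init__(self, upload):
        self._upload = upload
        self._last = {}      # last value sent for each uniform name
        self.uploads = 0
        self.skipped = 0

    def set(self, name, value):
        value = tuple(value)              # immutable copy for comparison
        if self._last.get(name) == value:
            self.skipped += 1             # redundant: do not touch the API
            return False
        self._last[name] = value
        self._upload(name, value)
        self.uploads += 1
        return True


sent = []
cache = UniformCache(lambda name, value: sent.append(name))

identity = [1, 0, 0, 0,  0, 1, 0, 0,  0, 0, 1, 0,  0, 0, 0, 1]
cache.set("u_mvp", identity)   # first time: uploaded
cache.set("u_mvp", identity)   # unchanged: skipped
cache.set("u_mvp", [2] * 16)   # changed: uploaded
```

The comparison itself costs CPU time proportional to the size of the value, which is the trade-off being discussed: for large arrays the check can cost more than simply sending the data.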
And by the way, this all applies to the iOS Simulator as well. It has nothing to do with a real device. It doesn’t even try to emulate speed etc. In fact its implementation of OpenGL ES 2 is software-emulated, meaning not hardware accelerated.
Never ever test on anything but the real device.
L. Spiro






