Hi there. This question is about fill rate performance. Which will be faster: rendering a quad with a 1024x1024 texture, or the same quad with a 128x128 texture (scaled up with linear interpolation)? I'm not seeing any FPS improvement in my application with 128x128, even a small 1 FPS decrease compared to 1024x1024... Is there some performance gain to be had from this?
Thanks
Fill rate questions
It depends on the number of pixels drawn, not the texture size.
You'll see performance gain only if you are fill rate limited.
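To make that concrete: fill rate is a function of covered screen area only. A quad covering, say, 512x512 pixels generates the same number of fragments no matter which texture is bound, and with bilinear filtering each fragment still fetches a 2x2 texel neighborhood either way. A minimal sketch (the pixel sizes are made-up example numbers):

```cpp
#include <cstddef>

// Fragments shaded for an opaque, screen-aligned quad depend only on
// the screen area it covers; the bound texture's resolution never
// enters this calculation.
std::size_t fragmentsShaded(std::size_t coveredWidthPx, std::size_t coveredHeightPx) {
    return coveredWidthPx * coveredHeightPx;
}

// With bilinear filtering, each fragment samples a 2x2 texel
// neighborhood -- again independent of the texture's resolution.
std::size_t bilinearFetches(std::size_t fragments) {
    return fragments * 4;
}
```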
I didn't quite understand... so, regardless of whether I'm using a 1024x1024 texture or a 128x128 texture, if it's placed over the same quad and visually occupies the same amount of screen, it will render the same number of pixels, which means no fill-rate gain?
My admittedly limited understanding of the situation says 'yes', it will render at roughly the same speed, because your shader has to run for every pixel the quad covers; whether it fetches the texel data from a 128^2 texture or a 1024^2 texture is of little consequence, since it still must fetch a value for each pixel it shades.
I think such a simple case isn't good for performance analysis. Practically, in your case you use the same amount of fill rate regardless of texture size. However, at some point on certain low-end hardware texture size may become the bottleneck, either through consumed memory bandwidth (reading the texture from memory) or through consumed memory itself (which results in swapping). The memory bandwidth of modern cards is measured in tens to hundreds of gigabytes per second.
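A back-of-the-envelope illustration of that memory side (assuming uncompressed 32-bit RGBA and no mipmaps; real formats, compression, and alignment will differ):

```cpp
#include <cstddef>

// Bytes occupied by a square RGBA8 texture with the given edge length.
std::size_t textureBytes(std::size_t edge) {
    return edge * edge * 4;  // 4 bytes per texel: 8 bits each for R, G, B, A
}

// 1024x1024 -> 4 MiB; 128x128 -> 64 KiB. A 64x difference in the data
// that may have to travel over the memory bus, even though the number
// of pixels shaded on screen is identical for both.
```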
Cheers!
If this is for iOS then it is not DirectX or XNA, and this is the wrong forum.
Fill-rate limitations involve how many pixels you render to the screen and nothing more. This includes overlapping pixels but you won’t have them here with your test case.
So from a fill-rate standpoint, your results will be completely identical.
Then you have bandwidth limitations which dictate how fast it is to send data to the graphics hardware. Normally larger textures are slower to send, but iOS devices use a UMM (Unified Memory Model) which means there is no GPU RAM and nowhere to send the data, so here again you will not see a change.
Then there are cache hits. This applies to every device, regardless of memory model (except to pedantically mention devices with no cache). If you are blitting every pixel in a texture, smaller is faster due to better caching. This is why (among many reasons) mipmaps are so important.
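For scale, a full mipmap chain costs only about one third extra memory over the base level (the geometric series 1 + 1/4 + 1/16 + ...), which is cheap insurance for the cache behavior described above. A sketch, again assuming square RGBA8 textures:

```cpp
#include <cstddef>

// Total bytes for a full mipmap chain of a square RGBA8 texture:
// each successive level halves the edge length, down to 1x1.
std::size_t mipChainBytes(std::size_t edge) {
    std::size_t total = 0;
    for (;;) {
        total += edge * edge * 4;  // 4 bytes per texel
        if (edge == 1)
            break;
        edge /= 2;
    }
    return total;
}
```

For a 1024^2 base level (4,194,304 bytes) the whole chain is 5,592,404 bytes, roughly a 33% overhead; and when that texture is minified down to a 128x128 screen area, the sampler reads from the matching small level instead of thrashing the cache across the full-size one.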
However, as it was mentioned above, your test case is useless. You can’t detect the differences between each of these potential bottlenecks with just 4 vertices and a texture being drawn every frame. Firstly, iOS devices are capped at 60 FPS. No matter how bad the above limitations are, it will never drop below 60 FPS with that kind of test case.
You need to add more to your scene, until it hovers around 30 FPS, before you start benchmarking anything.
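A side note on measurement: FPS is a nonlinear unit, so the "1 FPS decrease" from the original post is tiny in absolute terms near the 60 FPS cap. Frame time in milliseconds compares linearly (a generic sketch, not tied to any API):

```cpp
// Convert a frame rate into a frame time; differences in milliseconds
// are directly comparable, while FPS differences are not.
double msPerFrame(double fps) { return 1000.0 / fps; }

// 60 FPS is ~16.67 ms and 59 FPS is ~16.95 ms: that "1 FPS drop"
// is under 0.3 ms, well inside measurement noise for a trivial scene.
```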
L. Spiro
Thank you very much, L. Spiro. It makes sense that I was writing on the DX forum: our engine runs on both Win/Mac and has separate code paths for rendering, so I was doing the test primarily on PC while thinking about how this would carry over to the iPad.
You should never test that way. They are completely unrelated hardware.
PowerVR (iOS graphics hardware) uses deferred tile-based rendering and the device uses a unified memory model.
There are hundreds of differences between iOS devices and Windows®/Macintosh®.
Deferred tile-based rendering means overdraw is eliminated (in standard cases).
Unified memory model means bus transfers are eliminated.
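A simplified model of what "overdraw is eliminated" means for fragment counts (opaque geometry only; alpha blending or discard break this guarantee, hence the "standard cases" caveat):

```cpp
#include <cstddef>

// Fragments shaded when several opaque full-screen layers overlap.
// An immediate-mode renderer shades every layer of every pixel;
// a deferred tile-based renderer (PowerVR-style) resolves visibility
// per tile first and shades each covered pixel once.
std::size_t immediateModeFragments(std::size_t w, std::size_t h, std::size_t layers) {
    return w * h * layers;
}

std::size_t tileBasedFragments(std::size_t w, std::size_t h, std::size_t /*layers*/) {
    return w * h;  // hidden layers are never shaded
}
```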
Then there are differences in the drivers for each API.
For DirectX 9 it is faster not to redundancy-check large shader uniforms such as matrices or arrays.
For OpenGL on desktop it depends on the driver, but usually it is faster to manually redundancy-check large shader uniforms even if they change frequently.
For OpenGL ES 2 on iOS devices, it doesn’t matter either way. Checking if a uniform is redundant or just sending it to the shader is essentially the same speed (again, unified memory model).
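For illustration, the kind of client-side redundancy check being discussed might look like the sketch below. The `uploads` counter is a stand-in for the actual `glUniformMatrix4fv` call, since there is no GL context here; whether the check pays off depends on the driver, as described above.

```cpp
#include <array>

// Hypothetical per-uniform cache: skip the driver call when the new
// value matches what was last sent.
struct UniformCache {
    std::array<float, 16> last{};
    bool hasValue = false;
    int uploads = 0;  // stand-in for the real glUniformMatrix4fv call

    void setMatrix(const std::array<float, 16>& m) {
        if (hasValue && m == last)
            return;  // redundant -- skip the upload entirely
        last = m;
        hasValue = true;
        ++uploads;   // a real renderer would call
                     // glUniformMatrix4fv(location, 1, GL_FALSE, m.data()) here
    }
};
```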
And by the way, all of this applies to the iOS Simulator as well; it tells you nothing about a real device. It doesn't even try to emulate speed, etc. In fact, its implementation of OpenGL ES 2 is software-emulated, meaning not hardware-accelerated.
Never ever test on anything but the real device.
L. Spiro