#ActualHodgman

Posted 01 July 2012 - 09:16 AM

You guys have already derailed the topic, so...

The biggest difference that pops out to me is in regards to clearing.
Clearing buffers on iOS devices just sets a flag.  It doesn’t actually copy memory over the whole buffer etc.
It is instantaneous as long as it does not cause a resolve.  You can avoid resolves by calling glDiscardFramebufferEXT() before glClear().
This means that what makes for a long operation on PC is virtually free on iOS.

Please don't teach people that; it's simply not true. It has exactly the same cost as drawing a full-screen triangle on iOS.

On PC it depends on which GPU you have; if you use one that has HiZ, it will just invalidate some areas.

@L. Spiro - I would expect PC GPUs to still have a "fast clear" optimisation in hardware. This was present in PC hardware 5 years ago, so it should still be around. Whether it's supported will likely depend on the texture format (it's more likely to be supported on depth and 8-bit channel formats).
@Krypt0n - HiZ can't possibly help when clearing a non-depth texture, and not every depth texture will necessarily be assigned a corresponding hierarchical representation (which yes, should support fast clear) -- there might only be a single HiZ "buffer" which the current depth target can make use of (but isn't permanently assigned to that target).
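
To make the "clear just sets a flag" idea concrete, here is a toy CPU-side model of a fast clear. It is only a sketch: the class name, the tile size and the per-tile flag layout are all made up for illustration, and real hardware will differ in the details. The point is that the clear itself touches one flag per tile instead of every pixel, and the fill cost is paid later, only for tiles that actually get written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a flag-based "fast clear". All names and sizes here are
// hypothetical; this is not how any particular GPU lays out its memory.
class FastClearSurface {
public:
    static constexpr std::size_t kTileSize = 64;  // pixels per tile (assumed)

    explicit FastClearSurface(std::size_t tileCount)
        : pixels_(tileCount * kTileSize, 0u),
          tileIsCleared_(tileCount, false) {
        assert(tileCount > 0);
    }

    // Clearing only records the clear value and marks every tile as cleared.
    // No pixel memory is written here.
    void clear(std::uint32_t value) {
        clearValue_ = value;
        tileIsCleared_.assign(tileIsCleared_.size(), true);
    }

    // The first write into a cleared tile pays the deferred cost: the tile is
    // filled with the clear value before the new pixel is stored.
    void write(std::size_t pixel, std::uint32_t value) {
        assert(pixel < pixels_.size());
        const std::size_t tile = pixel / kTileSize;
        if (tileIsCleared_[tile]) {
            const auto first =
                pixels_.begin() + static_cast<std::ptrdiff_t>(tile * kTileSize);
            std::fill_n(first, kTileSize, clearValue_);
            tileIsCleared_[tile] = false;
            ++tilesMaterialized_;
        }
        pixels_[pixel] = value;
    }

    // Reads from a cleared tile return the clear value without touching memory.
    std::uint32_t read(std::size_t pixel) const {
        assert(pixel < pixels_.size());
        return tileIsCleared_[pixel / kTileSize] ? clearValue_ : pixels_[pixel];
    }

    // Number of tiles that had to be filled because of a later write.
    std::size_t tilesMaterialized() const { return tilesMaterialized_; }

private:
    std::vector<std::uint32_t> pixels_;
    std::vector<bool> tileIsCleared_;
    std::uint32_t clearValue_ = 0;
    std::size_t tilesMaterialized_ = 0;
};
```

In this model a surface that is cleared and then only partly drawn never pays for filling the untouched tiles, which is roughly why a format that supports this kind of optimisation can make a clear look nearly free.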

Actually, on all platforms you should first sort by shader (at least if overdraw is not an issue, e.g. due to a z-pass). It's not related to drivers, it's how the hardware works.

On several widely popular (read: older) GPUs, a change to shader constants has the same performance impact as changing the shader program itself (which may or may not be a bottleneck for your scene; only sensible profiling will tell) -- so sorting by shader program isn't going to do anything on these GPUs if you're also changing any shader constants between draw-calls, as those changes cause internal program switches anyway (unless grouping by shader helps you to reduce changes to shader constants).
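
As a rough illustration of that last point, the sketch below sorts draw-calls by shader first and by constant set second, then counts how many shader and constant changes a submission order would cause. The `DrawCall` fields and both helper functions are hypothetical, and treating a constant set as a single integer ID is a simplifying assumption; it says nothing about what any specific GPU does internally.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <vector>

// Hypothetical draw-call record: IDs stand in for real shader programs,
// sets of shader-constant values, and meshes.
struct DrawCall {
    int shaderId;
    int constantsId;
    int meshId;
};

struct StateChanges {
    std::size_t shader = 0;     // number of shader program binds
    std::size_t constants = 0;  // number of constant-set uploads
};

// Sort by shader first, then by constant set, so that calls sharing both
// end up adjacent. stable_sort keeps the original order within a group.
void sortByState(std::vector<DrawCall>& calls) {
    std::stable_sort(calls.begin(), calls.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return std::tie(a.shaderId, a.constantsId) <
                                std::tie(b.shaderId, b.constantsId);
                     });
}

// Count how many times each kind of state differs from the previous call.
// The first call always counts as one change of each kind.
StateChanges countStateChanges(const std::vector<DrawCall>& calls) {
    StateChanges changes;
    for (std::size_t i = 0; i < calls.size(); ++i) {
        const bool first = (i == 0);
        if (first || calls[i].shaderId != calls[i - 1].shaderId) {
            ++changes.shader;
        }
        if (first || calls[i].constantsId != calls[i - 1].constantsId) {
            ++changes.constants;
        }
    }
    return changes;
}
```

Comparing `countStateChanges` before and after `sortByState` on your own scene data is a cheap way to see whether the sort reduces constant changes at all; if it doesn't, then on the GPUs described above the sort is unlikely to buy much, and profiling is still the only real answer.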

You should never update buffers once you start drawing. The PowerVR GPU is deferred, which means it keeps all buffers until the end of the frame, so when you lock, it will have to allocate new memory etc.
It doesn't matter how many times you buffer: if you try to modify a buffer that was already used for drawing in this frame, the driver will have to allocate a temporary new buffer etc.
This performance behavior is true for nearly all mobile GPUs: PowerVR, Adreno, Mali... To my knowledge only Tegra has no deferred rendering and could be fine with updates in the middle of the frame, but even there you could stall until the HW is done with the last drawcall that was using this particular buffer.

All regular PC GPUs do that (buffer your data/commands for at least one frame) -- this has nothing to do with mobile/deferred -- deferred/PowerVR style buffering is a completely different concept implemented between the primitive rasteriser and the pixel shader.
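
Because the GPU may still be reading last frame's data, a common application-side answer is to rotate through several buffers instead of overwriting one. The sketch below models that with plain CPU memory; `FrameBufferRing` and its methods are hypothetical names, and the number of frames in flight is an assumption you would have to tune for your own platform.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical ring of per-frame buffers. Plain byte vectors stand in for
// GPU buffers; the slot rotation is the only thing being illustrated.
class FrameBufferRing {
public:
    FrameBufferRing(std::size_t framesInFlight, std::size_t bytesPerBuffer)
        : buffers_(framesInFlight, std::vector<unsigned char>(bytesPerBuffer)) {
        // With fewer than two slots the CPU would overwrite data that the
        // GPU may still be consuming.
        assert(framesInFlight >= 2);
    }

    // Call once at the start of each frame. Returns the buffer that is safe
    // to write this frame, assuming the GPU is never more than
    // framesInFlight - 1 frames behind the CPU.
    std::vector<unsigned char>& beginFrame() {
        current_ = frameIndex_ % buffers_.size();
        ++frameIndex_;
        return buffers_[current_];
    }

    // Slot handed out by the most recent beginFrame() call.
    std::size_t currentSlot() const { return current_; }

private:
    std::vector<std::vector<unsigned char>> buffers_;
    std::size_t frameIndex_ = 0;
    std::size_t current_ = 0;
};
```

With three slots, the buffer written in frame N is not handed out again until frame N + 3, so neither the driver nor the hardware has to allocate a copy or stall on your behalf, as long as the in-flight assumption holds.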
