Measuring performance - overdraw

17 comments, last by maxest 12 years ago
I think your timing method is suspect; the only real way to find out what is going on is to spin up a performance analysis tool and take a look at it with that.
My timing is purely based on http://www.opengl.org/registry/specs/ARB/timer_query.txt. I start profiling before calling glDrawElements and stop right after. Here's a class I quickly wrote:


#include <cstdio> // printf; the GL headers/loader are assumed to be included elsewhere

class CTimer
{
public:
    GLuint query;
    GLint available;
    GLuint64 timeElapsed;

public:
    CTimer()
    {
        // Create the query object once; generating a new one in start()
        // every frame would leak query objects.
        glGenQueries(1, &query);
    }

    ~CTimer()
    {
        glDeleteQueries(1, &query);
    }

    void start()
    {
        available = 0;
        timeElapsed = 0;

        // Begin the time-elapsed query
        glBeginQuery(GL_TIME_ELAPSED, query);
    }

    void stop()
    {
        glEndQuery(GL_TIME_ELAPSED);

        // Wait for the result to become available
        while (!available)
            glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &available);

        // See how much time the rendering took, in nanoseconds. Note that
        // care should be taken to use all significant bits of the result,
        // not just the least significant 32 bits.
        glGetQueryObjectui64v(query, GL_QUERY_RESULT, &timeElapsed);

        // Print the result in microseconds. %lu can truncate a GLuint64 on
        // 32-bit builds, so cast explicitly.
        printf("%llu\n", (unsigned long long)(timeElapsed / 1000));
    }
};
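For completeness, this is roughly how I wrap a single draw call with it (just a sketch; renderGrass and indexCount are placeholders for the real grass batch):

void renderGrass()
{
    // Constructed on first use, once the GL context already exists.
    static CTimer timer;

    // ... bind shader, textures and buffers for the grass batch ...

    timer.start();
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, 0);
    timer.stop(); // busy-waits for the GPU result and prints it in microseconds
}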


I would happily use a profiling tool if I could finally get one working. I've had a couple of go-rounds with NVPerfHUD but never managed to get it running...
Just to clarify the "why" of early/late Z rejection: Because there are some situations when depth test must be done after the pixel shader (shaders that output depth, in particular) - the hardware needs to be able to do the full depth test logic after that stage. Once you've made that concession, including all of the additional logic in the hardware to also be able to do it before (and switch dynamically, based on things like presence of clip instructions) isn't considered worth it. Instead, they just include the coarse early rejection hardware. Every piece of functionality costs die space - and alternate data paths like that are particularly nasty from a chip complexity standpoint. The GPU is much easier to design and optimize if it's a single long pipeline. (Which obviously isn't entirely true, but for the parts that remain fixed-function, like depth test, it's still a good tradeoff to make).
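To make the first point concrete, here is a minimal sketch (shader sources as C++ string literals, names made up) of the two cases: a fragment shader that leaves depth alone and stays eligible for early Z, and one that writes gl_FragDepth, so the final depth isn't known until after shading and the full test has to run late:

// Doesn't touch depth: the rasterizer's interpolated depth is final,
// so the hardware is free to do the depth test before shading.
const char* plainFS =
    "#version 330\n"
    "uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() { color = texture(tex, uv); }\n";

// Writes gl_FragDepth: the real depth only exists after the shader has
// run, so the full depth test is forced to happen after shading.
const char* depthWritingFS =
    "#version 330\n"
    "uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main()\n"
    "{\n"
    "    color = texture(tex, uv);\n"
    "    gl_FragDepth = gl_FragCoord.z + 0.001; // any write to depth disqualifies early Z\n"
    "}\n";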
Yeah... I guess it's much, much more complex down there at the hardware level than it is when you write your own software rasterizer running on the CPU :)

Just to clarify the "why" of early/late Z rejection: Because there are some situations when depth test must be done after the pixel shader (shaders that output depth, in particular) - the hardware needs to be able to do the full depth test logic after that stage. Once you've made that concession, including all of the additional logic in the hardware to also be able to do it before (and switch dynamically, based on things like presence of clip instructions) isn't considered worth it. Instead, they just include the coarse early rejection hardware.


Except that isn't true, since modern GPUs *do* have the hardware to perform full fine-grained depth/stencil testing before execution of the pixel shader. They still have coarse-grained z/stencil rejection, since it is cheaper to reject entire tiles than it is to perform a full z-test per-pixel.
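For what it's worth, on GL 4.2-class hardware you can even request the fine-grained test explicitly from the shader side. A sketch in the same string-literal style as above; note that with this qualifier any gl_FragDepth write is ignored and depth gets written even for fragments that later discard, so it isn't appropriate for every shader:

// Requests that depth/stencil testing run before this shader executes
// (early_fragment_tests layout qualifier, GLSL 4.20+).
const char* forcedEarlyZFS =
    "#version 420\n"
    "layout(early_fragment_tests) in;\n"
    "uniform sampler2D tex;\n"
    "in vec2 uv;\n"
    "out vec4 color;\n"
    "void main() { color = texture(tex, uv); }\n";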
One thing I have learned is never to assume anything with modern hardware; there is just way too much going on. The only way to be sure is to capture a frame and time it through the GPU (and even that has limited meaning - you need to do captures from many camera locations). At least in the world of consoles you learn a lot about the hardware this way; I don't know how much fine-grained data you get from a PC graphics card these days, and at the end of the day you're probably not going to be trying to squeeze out the absolute last drop of performance anyway. You can only really do your best to find mistakes in what you are sending to be rendered; there is no point trying to start micro-optimizing until you have at least something that represents the final scene you are rendering.
Well, the whole thing started when I decided to add grass rendering to my game, which is a top-down game. I noticed that the game's FPS dropped from 180 (only the ground plane visible) to 60 FPS (alpha blending; ground plane visible with grass on top of it covering the entire screen) or 80 FPS (alpha test). I could have guessed why alpha blending is such a killer here, but the time needed for the alpha-tested version surprised me; I expected the drop to be less dramatic. Then I turned off alpha testing (and alpha blending) - 113 FPS. That difference, 180 vs 113, led me to more accurate profiling, because in that case I expected very similar results. Now I know it's more complex than that, and that a depth pre-pass doesn't guarantee 100% early depth rejection :).
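By the way, those numbers are easier to reason about as frame times than as FPS; a trivial conversion sketch:

#include <cstdio>

int main()
{
    // The FPS figures above, converted to milliseconds per frame (1000 / FPS).
    const float fps[]     = { 180.0f, 113.0f, 80.0f, 60.0f };
    const char* variant[] = { "ground only", "grass, no alpha test/blend",
                              "alpha test", "alpha blending" };

    for (int i = 0; i < 4; i++)
        printf("%-28s %5.2f ms/frame\n", variant[i], 1000.0f / fps[i]);

    // Prints ~5.56, 8.85, 12.50 and 16.67 ms: the grass geometry itself
    // costs ~3.3 ms, alpha test adds ~3.7 ms, and blending ~4.2 ms on top.
    return 0;
}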

Well, the whole thing started when I decided to add grass rendering to my game, which is a top-down game. I noticed that the game's FPS dropped from 180 (only the ground plane visible) to 60 FPS (alpha blending; ground plane visible with grass on top of it covering the entire screen) or 80 FPS (alpha test). I could have guessed why alpha blending is such a killer here, but the time needed for the alpha-tested version surprised me; I expected the drop to be less dramatic. Then I turned off alpha testing (and alpha blending) - 113 FPS. That difference, 180 vs 113, led me to more accurate profiling, because in that case I expected very similar results. Now I know it's more complex than that, and that a depth pre-pass doesn't guarantee 100% early depth rejection :).


Sometimes it's not going to be worth doing a prepass; if there are a lot of batches it could have a negative effect on CPU performance - this depends on your target platforms (with modern hardware and APIs it's less of an issue). Also, you pay something in rasterisation even when rendering fast Z for your prepass. If you instead draw as much as you can in the correct order with no prepass, with some luck you might get better performance. You might also want to look at your grass asset and make sure it is optimal - i.e. make sure there is no wasted zero-alpha area at the top of the quad.

Top-down games are nice to work on in terms of rendering performance.

Top-down games are nice to work on in terms of rendering performance.

Indeed :).

I would argue that a Z-prepass is *mostly* a win, assuming we have enough CPU to spare and a lot of pixel processing. For instance, I use forward rendering and have quite a few lights in the scene. Because I have the depth pre-pass, for alpha-tested geometry I do the alpha test and discard only *once*, in the depth pre-pass. None of the lighting shaders have to do it, because the depth buffer is already filled correctly.
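In GL terms the idea is roughly this (just a sketch; the drawScene* calls, lights array and lightCount are placeholders for my actual scene code):

void renderWithPrepass()
{
    // Pass 1: depth pre-pass. Color writes off; the alpha-tested shader
    // samples the grass texture and discards, so only opaque texels
    // end up writing depth.
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_TRUE);
    glDepthFunc(GL_LEQUAL);
    drawSceneDepthOnlyWithAlphaTest();

    // Passes 2..N: lighting. The depth buffer is already exact, so the
    // lighting shaders skip the alpha test entirely; GL_EQUAL makes sure
    // only the pixels that survived the pre-pass get shaded.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_FALSE);
    glDepthFunc(GL_EQUAL);
    for (int i = 0; i < lightCount; i++)
        drawSceneWithLight(lights[i]);
}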

