VBO Confusion

Started by
2 comments, last by noodleBowl 8 years, 10 months ago

I'm developing in OpenGL ES 2 on Android (my dev device uses Jelly Bean) and I created a scene with 3 full screen backgrounds (two scroll horizontally), 2 particle emitters, a sprite, and some other things dealing with audio and controls. After starting up my app I see the FPS is pretty sad, averaging around 19 FPS.

Based on my other app tests I was pretty sure it was the graphics, so as a shot in the dark I changed the max quads that my sprite batcher can handle before it needs to flush the buffer from 5000 to 1000. I started it back up and bang! I instantly hit 55+ FPS. I changed the max quads again to 500 and I'm back at 60 FPS.

Could someone tell me why this worked?

I mean it's awesome, but I really have no idea how this actually made my FPS skyrocket.
The max quads constant in my batcher does affect how big the VBO/IBO are, but I would think that does not matter, as my call to glBufferSubData only fills the VBO with what it needs and my call to glDrawElements only draws the number of indices required for the quads.

Here is my flush method, which is responsible for actually doing the draw call:


    private void FlushBatch()
    {
        //Back out if there is nothing to draw
        if(vboIndex == 0)
            return;
        
        //Set up everything for the Shader Program to run
        glActiveTexture(GL_TEXTURE0);
        glBindTexture(GL_TEXTURE_2D, batchTextureID);
        currentShaderProgram.UseProgram();
        currentShaderProgram.UseUniform("projectionMatrix", batchProjectionMatrix.data);
        currentShaderProgram.UseUniform("textureSampler", 0);
        currentShaderProgram.EnableAttribute("vertexPos", currentShaderProgram.AttributesTotalByteSize(), 0);
        currentShaderProgram.EnableAttribute("color", currentShaderProgram.AttributesTotalByteSize(), currentShaderProgram.Attribute("vertexPos").byteSize);
        currentShaderProgram.EnableAttribute("texCoords", currentShaderProgram.AttributesTotalByteSize(), currentShaderProgram.Attribute("vertexPos").byteSize + currentShaderProgram.Attribute("color").byteSize);


        //Prepare the vbo buffer that will be used by OpenGL
        vboBuffer.put(vertexData, 0, vboIndex);
        vboBuffer.position(0);

        //Bind the buffer and fill the GPU buffer 
        glBindBuffer(GL_ARRAY_BUFFER, vbos[0]);
        glBufferSubData(GL_ARRAY_BUFFER, 0, vboIndex * BYTES_PER_FLOAT, vboBuffer);

        //Bind the IBO and draw our quads
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibos[0]);
        glDrawElements(GL_TRIANGLES, (int)(INDICES_PER_QUAD * vboIndex * vboFloatsPerQuadRatio), GL_UNSIGNED_SHORT, 0);
        
        //Reset where we are in the vbo buffer
        vboIndex = 0;
    }

Here is my VBO/IBO creation:


    private void CreateVboObjects()
    {
        //Gen the vbo names
        vboCount = 1;
        vbos = new int[vboCount];
        glGenBuffers(vboCount, vbos, 0);

        //setup vbo buffer (the java buffer) to be used by opengl
        //MAX_QUADS_PER_BATCH was changed from 5000 to 500 and the FPS skyrocketed. Why?
        vboBuffer = ByteBuffer.allocateDirect(MAX_QUADS_PER_BATCH * VERTEX_PER_QUAD * FLOAT_COMPONENTS_PER_VERTEX * BYTES_PER_FLOAT).order(ByteOrder.nativeOrder()).asFloatBuffer();
        vboBuffer.position(0);
        vboIndex = 0;
        vboFloatComponentsPerQuad = FLOAT_COMPONENTS_PER_VERTEX * VERTEX_PER_QUAD;
        vboFloatsPerQuadRatio = 1.0f / (float)vboFloatComponentsPerQuad;
        vboCurrentBuffer = 0;

        //Create the intermediate container for the vbo data (used because Put for java buffers is super slow :( )
        vertexData = new float[vboBuffer.limit()];

        //Create the VBO for opengl
        for(int i = 0; i < vbos.length; ++i)
        {
            glBindBuffer(GL_ARRAY_BUFFER, vbos[i]);
            glBufferData(GL_ARRAY_BUFFER, vboBuffer.limit() * BYTES_PER_FLOAT, null, GL_DYNAMIC_DRAW);
        }
    }

    private void CreateIboObject()
    {
        //Gen the ibo name
        ibos = new int[1];
        glGenBuffers(1, ibos, 0);

        //Create the java buffer for the ibo
        ShortBuffer iboData = ByteBuffer.allocateDirect(MAX_QUADS_PER_BATCH * INDICES_PER_QUAD * BYTES_PER_SHORT).order(ByteOrder.nativeOrder()).asShortBuffer();
        iboData.position(0);

        //Generate the ibo data to use
        for(int i = 0, j = 0; i < MAX_QUADS_PER_BATCH * INDICES_PER_QUAD; i += INDICES_PER_QUAD, j += VERTEX_PER_QUAD)
        {
            iboData.put((short) j);
            iboData.put((short)(j + 1));
            iboData.put((short)(j + 2));
            iboData.put((short)(j + 3));
            iboData.put((short) j);
            iboData.put((short)(j + 2));
        }
        iboData.flip();

        //Create the IBO buffer for opengl
        for(int i = 0; i < ibos.length; ++i)
        {
            glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibos[i]);
            glBufferData(GL_ELEMENT_ARRAY_BUFFER, iboData.limit() * BYTES_PER_SHORT, iboData, GL_STATIC_DRAW);
        }
    }

I could imagine that your app doesn't really take advantage of the CPU and GPU working concurrently. A simple example:


1. Fill VBO
2. Draw VBO
3. Wait until rendering has been finished
4. Goto 1

This would be the worst case: first the CPU does all the work while the GPU idles, then the GPU works while the CPU idles.

Your approach would work like this:


1. Fill VBO1
2. Draw VBO1
3. Fill VBO2
4. Draw VBO2
5. Fill VBO3
6. Draw VBO3
7. Wait until rendering has been finished
8. Goto 1

This works much better, because the GPU can start while the CPU is still busy.

Even better would be:


1. Draw VBO1
2. Fill VBO2
3. Wait until rendering has been finished
4. swap VBO1 & VBO2
5. goto 1


Double buffering this way, your CPU and GPU would start to work concurrently on different data sets (one frame lag).
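
The swap step in the last list can be sketched roughly as follows. This is only an illustration, assuming the batcher keeps two VBO names instead of one; `VboRing` and `next()` are hypothetical names that do not appear in the posted code, and the GL calls are left as comments so the sketch runs without a GL context.

```java
// Hypothetical helper: rotates through a fixed set of VBO names so that the
// buffer being filled for this flush is not the one used by the previous flush.
final class VboRing {
    private final int[] names;   // buffer names, e.g. as filled in by glGenBuffers
    private int current = 0;     // index of the buffer to hand out next

    VboRing(int[] names) {
        if (names.length == 0) {
            throw new IllegalArgumentException("need at least one buffer");
        }
        this.names = names.clone();
    }

    // Returns the buffer name to fill and draw from, then advances to the next one.
    int next() {
        int name = names[current];
        current = (current + 1) % names.length;
        return name;
    }

    public static void main(String[] args) {
        VboRing ring = new VboRing(new int[] {7, 9}); // made-up buffer names
        for (int frame = 0; frame < 4; frame++) {
            int vbo = ring.next();
            // In a real flush, this is where one would call:
            //   glBindBuffer(GL_ARRAY_BUFFER, vbo);
            //   glBufferSubData(GL_ARRAY_BUFFER, 0, byteCount, vboBuffer);
            //   glDrawElements(...);
            System.out.println("frame " + frame + " uses VBO " + vbo);
        }
    }
}
```

The posted creation code already loops over `vbos.length` and keeps a `vboCurrentBuffer` field, so raising `vboCount` to 2 and picking the buffer per flush would be the likely place to hook something like this in.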

What are your exact draw timings? <rant> It blows my mind every time a developer makes a post about having a performance issue and then proceeds to use FPS as their measuring metric </rant>. Did you actually time the area of code that you suspect may be causing the issue?

Double buffering this way, your CPU and GPU would start to work concurrently on different data sets (one frame lag).

I actually thought this might be it, but I haven't had the time to go back and test it. Although I don't think this will "fix" it.

I went back and tried a double buffer. It does make performance a little better, but it's not enough to make a huge impact like the change to my MAX_QUADS_PER_BATCH constant.

What are your exact draw timings? <rant> It blows my mind every time a developer makes a post about having a performance issue and then proceeds to use FPS as their measuring metric </rant>. Did you actually time the area of code that you suspect may be causing the issue?

Setting 5000 as my max number of quads gets me around 0.05263 DT (in seconds) (19 FPS 1/19 = 0.05263...)
Setting 500 as my max number of quads gets me a solid 0.01666 DT (in seconds) (60 FPS 1/60 = 0.01666...)
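
The FPS-to-frame-time conversion used in these two lines, as a small sketch. `FrameTime` is a hypothetical helper name, not something from the batcher code.

```java
import java.util.Locale;

// Hypothetical helper for converting an FPS reading into time per frame.
final class FrameTime {
    // Seconds spent on one frame at the given frame rate.
    static double secondsPerFrame(double fps) {
        return 1.0 / fps;
    }

    // Milliseconds spent on one frame at the given frame rate.
    static double millisPerFrame(double fps) {
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        double slow = millisPerFrame(19); // the 5000-quad run
        double fast = millisPerFrame(60); // the 500-quad run
        System.out.println(String.format(Locale.ROOT,
                "19 FPS = %.2f ms, 60 FPS = %.2f ms, difference = %.2f ms",
                slow, fast, slow - fast));
    }
}
```

Expressed this way, the drop from 60 to 19 FPS is roughly 36 ms of extra time per frame, which is a number that can be compared directly against the timings of individual calls.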

When my MAX_QUADS_PER_BATCH is 5000 my glDrawElements call takes 0.004 - 0.008 seconds
When my MAX_QUADS_PER_BATCH is 500 my glDrawElements call takes 0.0000275 - 0.0000488 seconds

Draw code that was timed:

    //Bind the IBO and draw our quads
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, ibos[0]);
    glDrawElements(GL_TRIANGLES, (int)(INDICES_PER_QUAD * vboIndex * vboFloatsPerQuadRatio), GL_UNSIGNED_SHORT, 0);

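
One way a timing like this can be taken is with `System.nanoTime()` around the section. This is a sketch, not the measurement code actually used in this thread; `SectionTimer` is a hypothetical name and the timed workload is a stand-in loop. It only measures the CPU time spent inside the call, and since GL commands are generally queued by the driver, the GPU may finish the actual work later.

```java
// Hypothetical helper: measures how long a section of code takes on the CPU.
final class SectionTimer {
    // Runs the section once and returns the elapsed wall-clock time in seconds.
    static double timeSeconds(Runnable section) {
        long start = System.nanoTime();
        section.run();
        return (System.nanoTime() - start) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        double seconds = timeSeconds(() -> {
            // Stand-in for the glBindBuffer + glDrawElements calls.
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            // Use the result so the loop is not trivially removable.
            if (sum < 0) {
                System.out.println(sum);
            }
        });
        System.out.println("section took " + seconds + " s");
    }
}
```
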
The only difference between the 5K run and the 500 run is the number of quads the VBO has room for and the size of my IBO, since its size is directly tied to the VBO.
My only real question here is why does the draw call perform so poorly in the 5K one? There is no difference in how or what each run is drawing. Everything besides the above is the same in each run.

So what gives? Could it be the buffer size I'm allocating? It is close to 1 MB for the VBO (each quad is worth 160 bytes). The call to glDrawElements is the call that takes up the most time according to the profiler built into Android Studio.
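
The size estimate can be checked with a little arithmetic. The sketch below assumes 160 bytes per quad as stated in the post, and six 2-byte indices per quad as written by the IBO creation loop; `BufferSizes` is a hypothetical name.

```java
// Hypothetical helper: computes the allocated VBO/IBO sizes for a given batch limit.
final class BufferSizes {
    static final int BYTES_PER_QUAD = 160;  // from the post: each quad is worth 160 bytes
    static final int INDICES_PER_QUAD = 6;  // the IBO loop writes six indices per quad
    static final int BYTES_PER_SHORT = 2;   // a Java short is two bytes

    // Bytes allocated for the vertex buffer at the given batch limit.
    static int vboBytes(int maxQuads) {
        return maxQuads * BYTES_PER_QUAD;
    }

    // Bytes allocated for the index buffer at the given batch limit.
    static int iboBytes(int maxQuads) {
        return maxQuads * INDICES_PER_QUAD * BYTES_PER_SHORT;
    }

    public static void main(String[] args) {
        for (int maxQuads : new int[] {5000, 1000, 500}) {
            System.out.println(maxQuads + " quads: VBO " + vboBytes(maxQuads)
                    + " bytes, IBO " + iboBytes(maxQuads) + " bytes");
        }
    }
}
```

Under these assumptions the 5000-quad VBO is 800,000 bytes, which matches the "close to 1 MB" estimate, while the 500-quad VBO is 80,000 bytes.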

