Optimization of many small glDrawElements() calls

9 comments, last by 3TATUK2 10 years, 9 months ago

So I realize there are many better ways to do this in modern OpenGL. But here's what I have:


class World
....
function render() {
   foreach(chunk)
        if (chunk.isReady && chunk.isVisible) {
              chunk.render()
        }
}

class WorldChunk
public int[][][] blocks;
...
function render() {
    if (DISPLAY_LIST != 0) {
         glCallList(DISPLAY_LIST);
    } else {
         buildDisplayList();
    }
}
function buildDisplayList() {
        DISPLAY_LIST = GL11.glGenLists(1);   //store the handle so render() calls the list from now on
        GL11.glNewList(DISPLAY_LIST, GL11.GL_COMPILE);    
        pushVertexData();
        GL11.glEndList();  
}
function pushVertexData() {
        glPushMatrix();
        glTranslatef(x,y,z);

        glBindBuffer(GL_ARRAY_BUFFER, Block.vboVertexHandle);  //From a single static vbo stored in Block class - VBO built once on startup
        glVertexPointer(3, GL_FLOAT, 0, 0L);
        glBindBuffer(GL_ARRAY_BUFFER, Block.vboNormalHandle);  //From a single static vbo stored in Block class - VBO built once on startup
        glNormalPointer(GL_FLOAT, 0, 0L);
        
        GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);
        GL11.glEnableClientState(GL11.GL_NORMAL_ARRAY);

        for (int i = 0; i < sizeX; i++) {
            for (int j = 0; j < sizeY; j++) {
                for (int k = 0; k < sizeZ; k++) {
                    //Determine exposed faces..
                    EXPOSED_FACES = determineExposedFaces(i,j,k);
                    //Contains boolean array [true, true, true, true, true, true] of faces to draw if they are not hidden
                    //
                    Block.render(i, j, k, blocks[i][j][k], EXPOSED_FACES);  //pass the block type to match Block.render's signature
                }
            }
        }
        glDisableClientState(GL_NORMAL_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        glPopMatrix();
}


class Block
...
function render(int x, int y, int z, int type, boolean[] faces) {
if (faces[0] || faces[1] || faces[2] || faces[3] || faces[4] || faces[5]) {
            glPushMatrix();                                        
            glTranslatef(x + size, y + size, z + size);     
            if (faces[0]) {
                glDrawElements(GL_TRIANGLES, frontIndicies);
            }
            if (faces[1]) {
                glDrawElements(GL_TRIANGLES, rightIndicies);
            }
            if (faces[2]) {
                glDrawElements(GL_TRIANGLES, topIndicies);
            }
            if (faces[3]) {
                glDrawElements(GL_TRIANGLES, leftIndicies);
            }
            if (faces[4]) {
                glDrawElements(GL_TRIANGLES, bottomIndicies);
            }
            if (faces[5]) {
                glDrawElements(GL_TRIANGLES, backIndicies);
            }

            glPopMatrix();
        }
}
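determineExposedFaces(i,j,k) is called above but not shown. A minimal sketch of what it might look like, assuming that 0 in blocks means empty space, that faces on the chunk boundary always count as exposed (neighboring chunks are not consulted), and that the face order matches Block.render(): front (+z), right (+x), top (+y), left (-x), bottom (-y), back (-z). The class name ExposedFaces and the extra blocks parameter are hypothetical.

```java
import java.util.Arrays;

final class ExposedFaces {
    // Neighbor offsets in the (assumed) face order used by Block.render():
    // front (+z), right (+x), top (+y), left (-x), bottom (-y), back (-z)
    private static final int[][] NEIGHBOR = {
        {0, 0, 1}, {1, 0, 0}, {0, 1, 0}, {-1, 0, 0}, {0, -1, 0}, {0, 0, -1}
    };

    /** A face is exposed when the neighboring cell is empty (0) or outside the chunk. */
    static boolean[] determineExposedFaces(int[][][] blocks, int i, int j, int k) {
        boolean[] faces = new boolean[6];
        if (blocks[i][j][k] == 0) {
            return faces; // empty cell: nothing to draw
        }
        for (int f = 0; f < 6; f++) {
            int ni = i + NEIGHBOR[f][0];
            int nj = j + NEIGHBOR[f][1];
            int nk = k + NEIGHBOR[f][2];
            boolean outside = ni < 0 || nj < 0 || nk < 0
                    || ni >= blocks.length || nj >= blocks[0].length || nk >= blocks[0][0].length;
            // Short-circuit: the array is only indexed when the neighbor is inside the chunk
            faces[f] = outside || blocks[ni][nj][nk] == 0;
        }
        return faces;
    }

    public static void main(String[] args) {
        int[][][] blocks = new int[3][3][3];
        blocks[1][1][1] = 1;
        blocks[2][1][1] = 1; // solid neighbor on the +x side hides the right face
        System.out.println(Arrays.toString(determineExposedFaces(blocks, 1, 1, 1)));
        // prints [true, false, true, true, true, true]
    }
}
```

Only the faces this returns as true need any geometry at all, whichever way they end up being drawn.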
 

As you can see, I'm basically just storing a single VBO of the cube and using indices to decide which faces to actually draw. I know this is not an optimal way to do things. I am currently:

1) Rendering only those chunks within +/- X, +/- Y units of the camera

2) Doing efficient frustum culling to decide which chunks are outside the viewing area

3) Using display lists to "bake" a chunk's geometry

4) Using a single VBO for all cubes

In any given scene I might have only 20k-40k faces actually drawn as on-GPU geometry, but even this is fairly slow. Even on a GTX 690, I can only sustain about 80fps with about 50k faces on screen.

I know I should switch to shaders and do this the truly modern way, but is there a fundamental concept I'm abusing that causes such poor performance? Profiling shows that performance is not substantially consumed elsewhere in the code. For example, I hold at 1650 fps if I just comment out each of the glDrawElements() calls above so nothing is drawn. I must be overwhelming the card with poorly scheduled draw calls on tiny vertex arrays...
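A rough worked estimate from the numbers above, assuming every visible face still costs its own glDrawElements call (each chunk's display list replays one call per face) and that the entire difference between the two frame rates is spent on those calls:

```latex
\begin{aligned}
t_{\text{draw}} &\approx \frac{1}{80\ \text{fps}} - \frac{1}{1650\ \text{fps}}
  \approx 12.50\ \text{ms} - 0.61\ \text{ms} \approx 11.9\ \text{ms per frame} \\
t_{\text{call}} &\approx \frac{11.9\ \text{ms}}{50\,000\ \text{calls}} \approx 0.24\ \mu\text{s per call}
\end{aligned}
```

The per-call figure is small, but it is paid roughly 50,000 times per frame, which suggests that cutting the number of calls matters more here than cutting the number of vertices.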


Making too many draw calls is not a good thing; it would be much better to create a single vertex buffer from all of those cubes. Maybe use an update function to collect all the visible faces, the same way you're drawing them now, but push them into a vertex buffer instead. Then you can draw the whole vertex buffer in your render function with a single draw call.

Derp


Thanks Sponji. I guess I knew this, but I had read that issuing megalithic draws (entire scene graphs, for example) can be less efficient than making smaller draw calls. What you suggest makes all kinds of sense.

I will revamp it to push the geometry into a buffer and render each chunk in one go. What do you suggest for pushing arbitrary amounts of data into the buffer? A dynamic array where I just keep pushing the ordered vertices one at a time? A fixed-size FloatBuffer of the "worst-case" chunk size, populating only what I need? Or is there a nicer way to push vertex data onto a vertex buffer as I go? The buffers seem to be fixed size.

I'm not really sure which way would be best, but for chunks like these I've usually just uploaded the whole buffer again, and it has worked quite nicely. Of course it would be good to spend some time profiling the different methods, at least if it becomes a problem.
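One way to handle arbitrary amounts of data, in line with the "just upload the whole buffer again" suggestion: append into a growable primitive float array while scanning the chunk, then copy once into an exact-size direct FloatBuffer for glBufferData. This sketch uses only java.nio; FloatArrayBuilder is a hypothetical name, and it illustrates the dynamic-array option rather than the fixed worst-case buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.Arrays;

/** Hypothetical helper: growable float storage, copied once into a direct FloatBuffer. */
final class FloatArrayBuilder {
    private float[] data = new float[1024];
    private int size = 0;

    /** Appends values, e.g. one vertex as add(x, y, z). */
    void add(float... values) {
        ensureCapacity(size + values.length);
        System.arraycopy(values, 0, data, size, values.length);
        size += values.length;
    }

    int size() {
        return size;
    }

    private void ensureCapacity(int needed) {
        if (needed > data.length) {
            // Grow geometrically so repeated appends stay amortized O(1)
            data = Arrays.copyOf(data, Math.max(needed, data.length * 2));
        }
    }

    /** Exact-size, native-order direct buffer, already flipped and ready to upload. */
    FloatBuffer toDirectBuffer() {
        FloatBuffer buf = ByteBuffer.allocateDirect(size * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
        buf.put(data, 0, size);
        buf.flip();
        return buf;
    }
}
```

When the chunk is rebuilt, something like GL15.glBufferData(GL15.GL_ARRAY_BUFFER, builder.toDirectBuffer(), GL15.GL_STATIC_DRAW) re-uploads everything, and for positions-only data the draw count is builder.size() / 3.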


So I have started the process of moving all my rendering out to a single VBO on WorldChunk.

However, I'm pretty baffled as to how I can push all the vertex data into a single FloatBuffer. It's a pretty basic question about OpenGL, but maybe if we talk through it it will make more sense.

The way I was rendering my scene before relied on glTranslatef() to do all the heavy lifting of getting my vertex data into position. World.render() would iterate over the WorldChunks and call render() on each. WorldChunk.render() would glTranslatef(x,y,z) to the chunk's position, then iterate over the Blocks, and each block would in turn glTranslatef(x,y,z) to get itself into place. The nice thing about this is that I could draw faces without worrying about vertex data from an earlier face draw.

Now I need to replicate the same process on raw vertex data stored in a buffer. What I do now is let WorldChunk.generate() run in its own thread, without making any OpenGL calls during this compile pass. The function needs to emit only those faces that are exposed and visible. I do this by determining which faces are exposed and pushing *only those faces* onto an ArrayList<FloatBuffer>. Once the list is done, I copy all those individual FloatBuffers into a single new FloatBuffer, vbuffer.

WorldChunk

--------------------

private void generate() {

.....

buildMesh();

}


public void buildMesh() {
        this.dynamicVertexData = new ArrayList<FloatBuffer>();

        for (int i = 0; i < sizeX; i++) {
            for (int j = 0; j < sizeY; j++) {
                for (int k = 0; k < sizeZ; k++) {
                    if (blocks[i][j][k] != 0) {
                            EXPOSED_FACES = determineExposedFaces(i, j, k);
                            Block.render(i, j, k, 1, EXPOSED_FACES, this);  //Writes each exposed face as its own FloatBuffer in dynamicVertexData
                    }
                }
            }
        }

        this.vbuffer = Util.getFloatBuffer(this.dynamicVertexData.size() * 18);  //18 floats per face: 2 triangles x 3 vertices x (x,y,z)
        //Convert all the individual floatbuffers into one megalithic float buffer
        for (FloatBuffer face : this.dynamicVertexData) {
            this.vbuffer.put(face);
        }
        this.vbuffer.flip();
}


    private void buildVBO() {
        vboVertexHandle = GL15.glGenBuffers();
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, vboVertexHandle);
        GL15.glBufferData(GL15.GL_ARRAY_BUFFER, this.vbuffer, GL15.GL_STATIC_DRAW);
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, 0);
    }


    public void drawMesh(boolean wireframe) {
        GL11.glPushMatrix();
        GL11.glTranslatef(this.worldPositionX, 0, this.worldPositionZ);
        GL11.glPolygonMode(GL11.GL_FRONT, GL11.GL_FILL);

        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, this.vboVertexHandle);
        GL11.glVertexPointer(3, GL11.GL_FLOAT, 0, 0L);

        GL11.glEnableClientState(GL11.GL_VERTEX_ARRAY);

        //Draw now
        if (this.dynamicVertexData.size() > 0) {
            GL11.glDrawArrays(GL11.GL_TRIANGLES, 0, this.dynamicVertexData.size() * 6);
        }
        GL15.glBindBuffer(GL15.GL_ARRAY_BUFFER, 0);
        GL11.glDisableClientState(GL11.GL_VERTEX_ARRAY);
        GL11.glPopMatrix();
    }

Block
---------------------------


    public static FloatBuffer verticiesFront(float x, float y, float z) {
        return Util.getFloatBuffer(new float[]{
                    (x * size), (y * size), (z * size), -(x * size), (y * size), (z * size), -(x * size), -(y * size), (z * size), // v0-v1-v2 (front)
                    -(x * size), -(y * size), (z * size), (x * size), -(y * size), (z * size), (x * size), (y * size), (z * size), // v2-v3-v0
                });
    }
.....

    public static void render(int x, int y, int z, int type, boolean[] faces, WorldChunk chunk) {

        if (faces[0] || faces[1] || faces[2] || faces[3] || faces[4] || faces[5]) {
            if (faces[0]) {
                chunk.dynamicVertexData.add(verticiesFront(x, y, z));
            }
            if (faces[1]) {
                chunk.dynamicVertexData.add(verticiesRight(x, y, z));
            }
            if (faces[2]) {
                chunk.dynamicVertexData.add(verticiesTop(x, y, z));
            }
            if (faces[3]) {
                chunk.dynamicVertexData.add(verticiesLeft(x, y, z));
            }
            if (faces[4]) {
                chunk.dynamicVertexData.add(verticiesBottom(x, y, z));
            }
            if (faces[5]) {
                chunk.dynamicVertexData.add(verticiesBack(x, y, z));
            }
        }
    }

Essentially I will use the same process of pushing points x,y,z into the buffer, one at a time, but I need to make sure that those x,y,z are effectively translated as they would be with glTranslatef(x,y,z).

Do I just need to find a way to translate the points or will I have bigger problems?

For example, imagine a 16x16x16 chunk with only 5 exposed faces for some odd reason, scattered throughout the chunk. When I declare a face in the vertex buffer/mesh, I need it to be entirely self-contained: the preceding point from one face should not break the next face to draw. How can I accomplish this?

When you draw the points, you can never "pick up your pencil" really. I need to draw a face, stop drawing, and then draw again somewhere else, all the while keeping that information in the vertex buffer.

Maybe I am greatly over complicating this. Can I draw using my old method and somehow extract the raw vertex data from OpenGL *after* I have done all the glTranslatef() calls and my mesh is complete inside OpenGL's current frame? It would be slower on the first pass, due to all the push/translate/pop, but once everything is in place, I'd love to just snap a copy of it.

I think your verticiesFront is totally wrong. Just think about the values: they go from -size*x to +size*x. I would use just size*x and size*(x+1). Or, if you really want the tile's center to be at zero: x*size - halfSize and x*size + halfSize.

Those positions are inside the chunk. And when you're rendering you could translate those chunks by tileSize * numberOfTiles (seems that you're already doing that). It probably also helps reading if you keep your tile size as 1.
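A sketch of that suggestion in the thread's Java, assuming a block edge length of size and that the block at grid cell (x, y, z) spans [x*size, (x+1)*size] on each axis. The block's position is baked into the coordinates, so no per-block glTranslatef is needed, and because GL_TRIANGLES treats every three vertices as a separate triangle, faces from different blocks can simply be appended one after another. FaceVertices.front is a hypothetical replacement for verticiesFront; the vertex order follows the v0-v1-v2, v2-v3-v0 pattern used in the thread, counter-clockwise seen from +z.

```java
final class FaceVertices {
    /** Two triangles (18 floats) for the +z face of the block at grid cell (x, y, z). */
    static float[] front(int x, int y, int z, float size) {
        float x0 = x * size, x1 = (x + 1) * size;
        float y0 = y * size, y1 = (y + 1) * size;
        float zf = (z + 1) * size; // the front face lies on the block's +z side
        return new float[]{
            x1, y1, zf,   x0, y1, zf,   x0, y0, zf,   // v0-v1-v2
            x0, y0, zf,   x1, y0, zf,   x1, y1, zf,   // v2-v3-v0
        };
    }
}
```

The other five faces follow the same pattern with the fixed coordinate moved to the matching side of the cell.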

Here is some pseudo code:


// Update chunk
void update(chunk) {
    vertices[];
    foreach(tile) {
        if(face_front) {
            // I keep the tile size as 1 in this case, so the tile's start is at 0 and end is at 1
            // Let's create a quad for the current tile
            Vertex v0(tile.x,   tile.y,   tile.z+1);
            Vertex v1(tile.x+1, tile.y,   tile.z+1);
            Vertex v2(tile.x+1, tile.y+1, tile.z+1);
            Vertex v3(tile.x,   tile.y+1, tile.z+1);
            // And add it into the array
            vertices.add(v0);
            vertices.add(v1);
            vertices.add(v2);
            vertices.add(v3);
        }
    }
    // And finally, create a vertexbuffer from those vertices
    chunk.vbo = create_vbo(vertices);
}

void render(chunk) {
    // chunk.size means the number of tiles per one axis
    translate(chunk.position * chunk.size);
    render(chunk.vbo);
}
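The pseudocode above adds four vertices per quad, which suits GL_QUADS in legacy OpenGL (GL_QUADS is not available in core profiles). To draw the same four-vertex quads with GL_TRIANGLES, one option is a shared index buffer that splits each quad v0-v1-v2-v3 into (v0, v1, v2) and (v2, v3, v0), preserving the winding. A minimal sketch; QuadIndices is a hypothetical helper.

```java
final class QuadIndices {
    /** GL_TRIANGLES indices for quadCount consecutive quads of 4 vertices each. */
    static int[] forQuads(int quadCount) {
        int[] idx = new int[quadCount * 6];
        for (int q = 0; q < quadCount; q++) {
            int base = q * 4; // first vertex of this quad
            int o = q * 6;    // first index slot of this quad
            idx[o]     = base;     idx[o + 1] = base + 1; idx[o + 2] = base + 2; // v0-v1-v2
            idx[o + 3] = base + 2; idx[o + 4] = base + 3; idx[o + 5] = base;     // v2-v3-v0
        }
        return idx;
    }
}
```

Uploaded to a GL_ELEMENT_ARRAY_BUFFER, this would be drawn with glDrawElements(GL_TRIANGLES, quadCount * 6, GL_UNSIGNED_INT, 0). Since the pattern depends only on the quad count, one index buffer sized for the largest chunk can be shared by every chunk.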


Can I draw using my old method and somehow extract the raw vertex data from OpenGL *after* I have done all the glTranslatef() calls and my mesh is complete inside OpenGL's current frame?

Yes, but I wouldn't suggest that, because you should use your own matrices. But just in case, it goes something like this in C:


float matrix[16]; 
glGetFloatv(GL_MODELVIEW_MATRIX, matrix);

Translation part is in the last column, matrix[12], matrix[13], matrix[14].
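Reading the matrix back gives you the transform, not transformed vertices; you would still have to apply it to each vertex on the CPU. OpenGL stores it column-major, so element (row r, column c) is matrix[c * 4 + r], which is why the translation sits in matrix[12..14]. A sketch in Java, assuming an affine modelview (bottom row 0, 0, 0, 1) and a point with w = 1; ColumnMajor is a hypothetical name.

```java
final class ColumnMajor {
    /** Applies a column-major 4x4 affine matrix (as returned by glGetFloatv) to the point (x, y, z, 1). */
    static float[] transformPoint(float[] m, float x, float y, float z) {
        return new float[]{
            m[0] * x + m[4] * y + m[8]  * z + m[12],
            m[1] * x + m[5] * y + m[9]  * z + m[13],
            m[2] * x + m[6] * y + m[10] * z + m[14],
        };
    }
}
```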

Btw, plural of vertex is vertices.


Thanks Sponji. You've been a big help.

After spending most of the night being frustrated, I just took a step back. The solution is now working well. I am using interleaved vertex, normal, color and texcoord arrays, but I'm having trouble dealing with the degenerate vertices.
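With position, normal, color and texcoord interleaved as 11 floats per vertex (the layout in the arrays that follow), each gl*Pointer call needs the same byte stride and its own byte offset. A sketch of that arithmetic; InterleavedLayout is a hypothetical name.

```java
final class InterleavedLayout {
    // Per-vertex layout: position(3) normal(3) color(3) texcoord(2)
    static final int POSITION_FLOATS = 3, NORMAL_FLOATS = 3, COLOR_FLOATS = 3, TEXCOORD_FLOATS = 2;
    static final int FLOATS_PER_VERTEX = POSITION_FLOATS + NORMAL_FLOATS + COLOR_FLOATS + TEXCOORD_FLOATS; // 11
    static final int STRIDE_BYTES = FLOATS_PER_VERTEX * Float.BYTES;                  // 44
    static final long POSITION_OFFSET = 0;
    static final long NORMAL_OFFSET = POSITION_FLOATS * Float.BYTES;                   // 12
    static final long COLOR_OFFSET = NORMAL_OFFSET + NORMAL_FLOATS * Float.BYTES;      // 24
    static final long TEXCOORD_OFFSET = COLOR_OFFSET + COLOR_FLOATS * Float.BYTES;     // 36

    /** Number of vertices in an interleaved float array; rejects partial vertices. */
    static int vertexCount(int floatCount) {
        if (floatCount % FLOATS_PER_VERTEX != 0) {
            throw new IllegalArgumentException(
                    "float count " + floatCount + " is not a multiple of " + FLOATS_PER_VERTEX);
        }
        return floatCount / FLOATS_PER_VERTEX;
    }
}
```

With the VBO bound, these would feed the same offset-taking overloads the original code already uses for glVertexPointer, e.g. GL11.glVertexPointer(3, GL11.GL_FLOAT, STRIDE_BYTES, POSITION_OFFSET), GL11.glNormalPointer(GL11.GL_FLOAT, STRIDE_BYTES, NORMAL_OFFSET), GL11.glColorPointer(3, GL11.GL_FLOAT, STRIDE_BYTES, COLOR_OFFSET) and GL11.glTexCoordPointer(2, GL11.GL_FLOAT, STRIDE_BYTES, TEXCOORD_OFFSET), with the matching client states enabled.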

WorldChunk.buildMesh() calls

verticies.add(Block.generate(i, j, k, EXPOSED_FACES, new float[]{0.2f, 1.0f, 0.2f}));

Block.generate(x,y,z,faces, colors) produces a vertex array including degenerates, like this:


public static FloatBuffer generate(float x, float y, float z, boolean[] faces, float[] color) {
        float[][] cubeFaces = new float[][]{
            //Front face
            new float[]{
                //Vertex                Normals  Colors                        Texcoord
                (x+1), (y+1),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 1f, 0f,
                (x), (y+1),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 0f, 0f,
                (x), (y),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 0f, 1f, // v0-v1-v2 (front)
                (x), (y),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 0f, 1f,
                (x+1), (y),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 1f, 1f,
                (x+1), (y+1),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 1f, 0f, // v2-v3-v0
                (x+1), (y+1),(z+1),0,0,0,0,0,0,0,0
            },
            ...
            //back face
            new float[]{
                (x+1), (y), (z), 0f, 0f, -1f, color[0], color[1], color[2], 0f, 1f,
                (x), (y), (z), 0f, 0f, -1f, color[0], color[1], color[2], 1f, 1f,
                (x), (y+1), (z), 0f, 0f, -1f, color[0], color[1], color[2], 1f, 0f,// v4-v7-v6 (back)
                (x), (y+1), (z), 0f, 0f, -1f, color[0], color[1], color[2], 1f, 0f,
                (x+1), (y+1), (z), 0f, 0f, -1f, color[0], color[1], color[2], 0f, 0f,
                (x+1), (y), (z), 0f, 0f, -1f, color[0], color[1], color[2], 0f, 1f,
                (x+1), (y), (z),0,0,0,0,0,0,0,0
            }
        };
        int faceCount = 0;
        for (int i = 0; i < faces.length; i++) {
            if (faces[i] == true) {
                faceCount++;
            }
        }

        float[] values = new float[faceCount * 11 * 7];

        int ptr = 0;
        float[] degenerate = new float[11];   //store the previous vertex from some other face draw to use as our next degenerate
        boolean degen = false;                    //Have we processed an earlier face and created a degenerate ?
        for (int i = 0; i < faces.length; i++) {  //foreach face
            if (faces[i] == true) {                     //if this face is to be drawn
                float[] face = cubeFaces[i];     //get the vertex data for the face 
                
                for (int j = 0; j < face.length; j++) {    //Copy the vertex data into the return array
                    if (degen && j < 11) {                    //prepend the previous degenerate vertex for the next draw 
                        values[ptr] = degenerate[j];
                    } else {
                        values[ptr] = face[j];                //Otherwise just copy the face vertex data as-is
                    }
                    ptr++;
                    if (j > 66) {                                    //Store a degenerate vertex by copying the last interleaved vertex data from this draw
                        degenerate[j-66] = face[j];
                        degen = true;
                    }
                }
            }
        }
        return Util.getFloatBuffer(values);      //Return the final floatbuffer
    }

Are my degenerate vertices all wrong? I thought all I needed to do was declare the same x,y,z from a previous draw, so that I wouldn't have to do the hacky stuff here, like remembering the last vertex from an earlier face.
Can't I just stick the degenerate in the vertex data like this?

            //Front face
            new float[]{
                //Vertex                Normals  Colors                        Texcoord
                (x+1), (y+1),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 1f, 0f,
                (x), (y+1),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 0f, 0f,
                (x), (y),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 0f, 1f, // v0-v1-v2 (front)
                (x), (y),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 0f, 1f,
                (x+1), (y),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 1f, 1f,
                (x+1), (y+1),(z+1), 0f, 0f, 1f, color[0], color[1], color[2], 1f, 0f, // v2-v3-v0
                (x+1), (y+1),(z+1),0,0,0,0,0,0,0,0 //degenerate
            },

Okay, so I have tweaked this a bit:


//Interleaved array - vertex3f, normal3f, color3f, u,v
            //bottom face
            new float[]{
                x, y, z, 0, 0, 0, 0, 0, 0, 0, 0,
                x, y, z, 0, -1, 0, color[0], color[1], color[2], 0, 1,
                x + 1, y, z, 0, -1, 0, color[0], color[1], color[2], 1, 1,
                x + 1, y, z + 1, 0, -1, 0, color[0], color[1], color[2], 1, 0,// v7-v4-v3bottom
                x + 1, y, z + 1, 0, -1, 0, color[0], color[1], color[2], 1, 0,
                x, y, z + 1, 0, -1, 0, color[0], color[1], color[2], 0, 0,
                x, y, z, 0, -1, 0, color[0], color[1], color[2], 0, 1,// v3-v2-v7
                x, y, z, 0, 0, 0, 0, 0, 0, 0, 0
            },
            //back face
            new float[]{
                x + 1, y, z, 0, 0, 0, 0, 0, 0, 0, 0,
                x + 1, y, z, 0, 0, -1, color[0], color[1], color[2], 0, 1,
                x, y, z, 0, 0, -1, color[0], color[1], color[2], 1, 1,
                x, y + 1, z, 0, 0, -1, color[0], color[1], color[2], 1, 0,// v4-v7-v6back
                x, y + 1, z, 0, 0, -1, color[0], color[1], color[2], 1, 0,
                x + 1, y + 1, z, 0, 0, -1, color[0], color[1], color[2], 0, 0,
                x + 1, y, z, 0, 0, -1, color[0], color[1], color[2], 0, 1,
                x + 1, y, z, 0, 0, 0, 0, 0, 0, 0, 0
            }


As you can see, each face begins with a degenerate and ends with one. I basically stick these faces together in any order, so I start drawing a chunk and draw block1->face1, block1->face 3, block2->face2, ..., blockN->faceN

I start drawing at offset one


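GL11.glDrawArrays(GL11.GL_TRIANGLES, 1, this.numVerts - 1);  //skipping vertex 0 leaves numVerts - 1 vertices; a count of numVerts would read past the end of the buffer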

since I want to draw this array but I probably shouldn't start off by drawing a degenerate.

The problem is that my degenerates are still wrong, or at least they are being toggled in ways I'm not expecting. They seem to be toggling on and off which results in this trippy scene.
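With GL_TRIANGLES, glDrawArrays groups vertices strictly in threes starting at first: (first, first+1, first+2), (first+3, ...), and so on. In the tweaked layout above every face occupies 8 vertices (degenerate, six real, degenerate) and drawing starts at offset 1, so a face's six real vertices only form its own two triangles when they start a multiple of three vertices after first; otherwise its triangles are assembled from vertices of neighboring faces and the zero-normal padding vertices. A small sketch to check a fixed per-face layout; the class name and parameters are hypothetical.

```java
final class TriangleAlignment {
    /**
     * Counts faces whose real triangles do not start on a triangle boundary, assuming every face
     * occupies `stride` vertices with its six real vertices starting at `realOffset` within the face,
     * and the draw is glDrawArrays(GL_TRIANGLES, first, ...).
     */
    static int misalignedFaces(int stride, int realOffset, int first, int faceCount) {
        int misaligned = 0;
        for (int f = 0; f < faceCount; f++) {
            int start = f * stride + realOffset; // index of this face's first real vertex
            if (start < first || (start - first) % 3 != 0) {
                misaligned++;
            }
        }
        return misaligned;
    }
}
```

For the 8-vertex layout, misalignedFaces(8, 1, 1, 6) flags four of six faces (only every third face lines up), which is consistent with faces that seem to toggle on and off. A stride of 6 with no padding vertices is always aligned.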

[Screenshot: Screen_Shot_2013_07_03_at_6_opt.png]

This "degeneration" seems way too complicated; I'm not even sure what you're trying to achieve with it. Are you trying to copy the needed faces into another array, or what?



Without the degeneration, I would have visible triangles drawn between disconnected faces of a block. If I have 3 blocks to draw, like this:

block1 -> front + back + right

block2 -> right + back + top

block3 -> front + left

Imagine trying to compute a closed, single connected triangle fan from that. It would be tricky. You'd need to start your first face on block 1, then stop drawing (degenerate vertex) and move the vertex array "pointer" to a back face corner and start drawing the back face. When you are done with that, you'll need to stop drawing (with a degenerate) and get a vertex pointer over to a corner on block2's right face (another degenerate), then commit to actually start drawing again with yet another degenerate vertex on block2's right face.

It sounds overly complicated, but I don't believe it to be - I think this is how optimal meshes are output and even Blender, when I draw a mesh like this, will output a similar mesh. Computing truly optimal meshes, however, is proven to be an NP-complete problem.
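Degenerate vertices are a triangle-strip (GL_TRIANGLE_STRIP) technique: two strips are joined by repeating the last vertex of one and the first vertex of the next, which produces zero-area triangles that generate no pixels. With GL_TRIANGLES, which is what the glDrawArrays calls in this thread use, every three vertices form an independent triangle, so separate faces can be concatenated with no joining vertices at all. For comparison, a minimal sketch of strip stitching over index lists; StripStitcher is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

final class StripStitcher {
    /** Joins triangle strips (index lists) into a single strip using degenerate triangles. */
    static int[] stitch(List<int[]> strips) {
        List<Integer> out = new ArrayList<>();
        for (int[] strip : strips) {
            if (strip.length == 0) {
                continue;
            }
            if (!out.isEmpty()) {
                out.add(out.get(out.size() - 1)); // repeat the last vertex of the previous strip
                out.add(strip[0]);                // repeat the first vertex of the next strip
                if (out.size() % 2 != 0) {
                    // Keep the next strip starting on an even index so its winding is preserved
                    out.add(strip[0]);
                }
            }
            for (int v : strip) {
                out.add(v);
            }
        }
        int[] result = new int[out.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = out.get(i);
        }
        return result;
    }

    /** Counts strip triangles with three distinct indices, i.e. the non-degenerate ones. */
    static int visibleTriangles(int[] strip) {
        int n = 0;
        for (int i = 0; i + 2 < strip.length; i++) {
            int a = strip[i], b = strip[i + 1], c = strip[i + 2];
            if (a != b && b != c && a != c) {
                n++;
            }
        }
        return n;
    }
}
```

Stitching the strips [0, 1, 2, 3] and [4, 5, 6, 7] gives [0, 1, 2, 3, 3, 4, 4, 5, 6, 7]: eight strip triangles, four of them degenerate. The extra repeat when the output length is odd keeps back-face culling consistent for the next strip.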

I did get it functional with something that looks like this:


            new float[]{
                //Vertex         Normals      Colors                            Texcoord
                Float.NaN, Float.NaN, Float.NaN, 0, 0, 0, 0, 0, 0, 0, 0, //Degenerate reset
                x + 1, y + 1, z + 1, 0, 0, 1, color[0], color[1], color[2], 1, 0,
                x, y + 1, z + 1, 0, 0, 1, color[0], color[1], color[2], 0, 0,
                x, y, z + 1, 0, 0, 1, color[0], color[1], color[2], 0, 1, // v0-v1-v2front
                x, y, z + 1, 0, 0, 1, color[0], color[1], color[2], 0, 1,
                x + 1, y, z + 1, 0, 0, 1, color[0], color[1], color[2], 1, 1,
                x + 1, y + 1, z + 1, 0, 0, 1, color[0], color[1], color[2], 1, 0, // v2-v3-v0
                x + 1, y + 1, z + 1, 0, 0, 1, color[0], color[1], color[2], 1, 0, //End this face
                Float.NaN, Float.NaN, Float.NaN, 0, 0, 0, 0, 0, 0, 0, 0,}, //Degenerate reset
            //Right face
            new float[]{
                Float.NaN, Float.NaN, Float.NaN, 0, 0, 0, 0, 0, 0, 0, 0, //Degenerate reset
                x + 1, y + 1, z + 1, 1, 0, 0, color[0], color[1], color[2], 0, 0,
                x + 1, y, z + 1, 1, 0, 0, color[0], color[1], color[2], 0, 1,
                x + 1, y, z, 1, 0, 0, color[0], color[1], color[2], 1, 1, // v0-v3-v4right
                x + 1, y, z, 1, 0, 0, color[0], color[1], color[2], 1, 1,
                x + 1, y + 1, z, 1, 0, 0, color[0], color[1], color[2], 1, 0,
                x + 1, y + 1, z + 1, 1, 0, 0, color[0], color[1], color[2], 0, 0, // v4-v5-v0
                x + 1, y + 1, z + 1, 1, 0, 0, color[0], color[1], color[2], 0, 0,//End this face
                Float.NaN, Float.NaN, Float.NaN, 0, 0, 0, 0, 0, 0, 0, 0,}, //Degenerate reset

As you can see, there are several degenerates in there. Every time a face is drawn, I move the vertex "pointer" from a known bogus vertex (Float.NaN, Float.NaN, Float.NaN) and start drawing a new face with a real x,y,z. At the end of each face I restore the degenerate back to (Float.NaN, Float.NaN, Float.NaN). I need to ensure that no matter where I am in drawing blocks, I won't send OpenGL a bunch of duplicate values. If I used 0,0,0, for example, and tried to draw a block actually at 0,0,0, this scheme would not work, because OpenGL would be fed even more degenerates and toggle drawing of those vertices off. The easiest thing to do is start and stop each face with degenerate vertices that you know will never actually need to be drawn and that won't collide with any actual drawn vertex in your mesh.

This all works fine at the moment, with a few caveats. Performance is excellent: I get over 300fps on an Intel HD 4000 integrated GPU and over 2000fps on my GTX 690 with about 20 million blocks in the scene. Of course, those 20 million blocks become relatively few faces to actually draw, since most are completely concealed; maybe a couple hundred thousand faces are actually being drawn.

I have one lingering problem that only occurs on Radeon cards. Notice these rogue faces:

[Image: radeon_solid.png]

Each chunk is outlined in the red wires. Notice the "thrashing" of garbage vertex data near the origin of the chunk:

[Image: Radeon_wire.png]

Whereas Intel and Nvidia cards render the scene as expected:

[Image: nvidia_and_intel_1.png]

[Image: nvidia_and_intel_2.png]
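A hedged observation on the Radeon difference: as far as I know, the OpenGL specification leaves the result of passing NaN or infinity to the GL unspecified (it only requires that doing so not crash the GL), so vertices at (Float.NaN, Float.NaN, Float.NaN) may be discarded by one driver and rasterized as garbage by another. Since GL_TRIANGLES needs no separators between faces, a mesh can avoid NaN padding entirely. A small sanity check for an interleaved triangle list, assuming the 11-float layout used above; MeshCheck is a hypothetical name.

```java
final class MeshCheck {
    static final int FLOATS_PER_VERTEX = 11; // position(3) normal(3) color(3) texcoord(2)

    /** Returns a description of the first problem found, or null if the triangle list looks well-formed. */
    static String validate(float[] interleaved) {
        if (interleaved.length % FLOATS_PER_VERTEX != 0) {
            return "length is not a whole number of vertices";
        }
        int vertices = interleaved.length / FLOATS_PER_VERTEX;
        if (vertices % 3 != 0) {
            return "vertex count " + vertices + " is not a multiple of 3";
        }
        for (int v = 0; v < vertices; v++) {
            int base = v * FLOATS_PER_VERTEX;
            for (int c = 0; c < 3; c++) { // check only the position components
                if (!Float.isFinite(interleaved[base + c])) {
                    return "vertex " + v + " has a non-finite position";
                }
            }
        }
        return null;
    }
}
```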

