Sprite batching and other sprite rendering techniques

6 comments, last by 21st Century Moose 11 years, 2 months ago

I'm considering how to efficiently render 2D sprites. I'm trying to keep things forward-compatible with OpenGL 3+, but I'm limited to OpenGL 2.

I'd have a mixture of static and dynamic sprites.

  • Some sprites would be completely static; e.g. map tiles
  • Moving sprites would have dynamic transforms
  • Animated sprites would have dynamic texture coordinates; I'm using a texture atlas
  • A sprite may be both moving and animated (of course)
  • A sprite may be able to move but only does so infrequently; e.g. doors that swing only when opened
  • The lifetime of a sprite may be dynamic; some sprites may exist for the whole duration of the game, others may be added and later removed from the scene mid-game

A strategy I've used is having a global VBO representing a single unit-sized quad. This unit quad is rendered once per sprite, and for each draw I give my shader a transformation matrix as well as offset and scale uniforms for the texture coordinates.

I've read that batching sprites, where I get the world-space coordinates and final texture co-ordinates of all sprites and jam them into a single VBO, is normally the way to go performance-wise. The simplest(?) batching method that I understand is using a GL_STREAM_DRAW VBO that gets the vertex data of all sprites with a glBufferData call each frame, possibly using an additional GL_STATIC_DRAW VBO with all the sprites that I know are static and persistent.

Would sprite batching be significantly more performant than my unit-quad VBO approach? If so, is the method of sprite batching I described an efficient implementation of batching given my requirements for sprite behaviour?
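To make the batching idea concrete, here is a minimal C++ sketch of the CPU-side step: turning one sprite into four world-space vertices with final atlas texture coordinates. The `SpriteVertex` layout, the `Rect` type and the `appendSprite` name are all hypothetical, and the sketch assumes axis-aligned sprites (no rotation) and normalized atlas coordinates.

```cpp
#include <cassert>
#include <vector>

// Hypothetical vertex layout: world-space position plus atlas texture coordinates.
struct SpriteVertex {
    float x, y;
    float u, v;
};

// Hypothetical axis-aligned rectangle: top-left corner plus size.
struct Rect {
    float x, y, w, h;
};

// Appends the four corners of one sprite to the CPU-side vertex array.
// dest is in world units; src is in normalized atlas coordinates (0..1).
void appendSprite(std::vector<SpriteVertex>& out, const Rect& dest, const Rect& src) {
    assert(dest.w >= 0.0f && dest.h >= 0.0f);
    out.push_back({dest.x,          dest.y,          src.x,         src.y});          // top-left
    out.push_back({dest.x + dest.w, dest.y,          src.x + src.w, src.y});          // top-right
    out.push_back({dest.x + dest.w, dest.y + dest.h, src.x + src.w, src.y + src.h});  // bottom-right
    out.push_back({dest.x,          dest.y + dest.h, src.x,         src.y + src.h});  // bottom-left
}
```

The resulting array is what would be uploaded to the streaming VBO each frame. A sprite with a rotation would need its corners transformed on the CPU before being appended.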

The simplest(?) batching method that I understand is using a GL_STREAM_DRAW VBO that gets the vertex data of all sprites with a glBufferData call each frame, possibly using an additional GL_STATIC_DRAW VBO with all the sprites that I know are static and persistent.

Pretty much, just process and store the vertices CPU side until ready to draw. Then set the proper state, upload the vertices to the buffer, and draw.

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

Would sprite batching be significantly more performant than my unit-quad VBO approach?

Probably - reducing the number of separate draw calls is generally a very effective optimization. If you are having performance issues, this would definitely be the first thing to try.

With a simple batch similar to above, written in C# and using OpenTK, and without any real optimization, I can easily get many thousands of sprites on the screen each frame with plenty of both CPU & GPU to spare for other tasks.

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

From what I can tell, since glBufferSubData doesn't allocate space (I think), I would effectively have a maximum size for my VBO?

Yup, so you just fill/draw, fill/draw, multiple times per frame if need be, in groups of that maximum size. It will still run much faster than having to destroy and re-create the buffer each time.
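As a rough illustration of the fill/draw grouping, here is a small C++ sketch that only works out how many sprites go into each pass. The function name `planBatches` is made up for this example, and the actual GL upload and draw calls are left as a comment.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits spriteCount sprites into consecutive groups of at most maxBatch sprites.
// Each returned entry is the number of sprites handled by one fill/draw pass.
std::vector<std::size_t> planBatches(std::size_t spriteCount, std::size_t maxBatch) {
    assert(maxBatch > 0);
    std::vector<std::size_t> batches;
    for (std::size_t first = 0; first < spriteCount; first += maxBatch) {
        // In a real renderer, the buffer upload and the draw call for
        // sprites [first, first + count) would happen here.
        batches.push_back(std::min(maxBatch, spriteCount - first));
    }
    return batches;
}
```

With a maximum of 1000 sprites per buffer, 2500 sprites would come out as three passes of 1000, 1000 and 500.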

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

Yup, so you just fill/draw, fill/draw, multiple times per frame if need be, in groups of that maximum size.

OK. That'll also save me from redefining the index buffer.

Though what's the proper way to fill the vertex buffer when I have fewer than the maximum number of sprites? Would I call glBufferSubData once to upload my remaining vertices and then call glBufferSubData a second time to fill the rest of the buffer with nulls? Would my index buffer be affected?

Just call glBufferSubData once when a batch is ready to be drawn, specifying the count or whatever parameter it is to set the amount of data to upload, and only draw the number of vertices (or indices if using draw elements) you need for the current batch.

The index buffer can be predefined, and will not change.
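For illustration, one way such a fixed index buffer might be generated is sketched below in C++. It assumes each sprite's four vertices are stored in the order top-left, top-right, bottom-right, bottom-left, and it uses 16-bit indices; both choices, and the name `buildQuadIndices`, are assumptions of this sketch rather than anything required.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds indices for maxSprites quads, two triangles (six indices) per quad.
// Assumes vertex order per sprite: top-left, top-right, bottom-right, bottom-left.
std::vector<std::uint16_t> buildQuadIndices(std::size_t maxSprites) {
    // 16-bit indices can address 65536 vertices, i.e. at most 16384 quads.
    assert(maxSprites <= 16384);
    static const std::size_t corners[6] = {0, 1, 2, 2, 3, 0};
    std::vector<std::uint16_t> indices;
    indices.reserve(maxSprites * 6);
    for (std::size_t i = 0; i < maxSprites; ++i) {
        const std::size_t base = i * 4;  // first vertex of sprite i
        for (std::size_t c : corners) {
            indices.push_back(static_cast<std::uint16_t>(base + c));
        }
    }
    return indices;
}
```

Because the pattern depends only on the sprite's slot in the batch, the buffer can be uploaded once with a static usage hint and reused for every batch; only the index count passed to the draw call changes.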

Here is some C#-ish pseudocode from a project of mine that outlines what I do:
SpriteVertex[] vertices = new SpriteVertex[BATCHSIZE * 4];
int[] indices = new int[BATCHSIZE * 6];

MakeBuffers(ref vertBufferID, ref indBufferID, ref vaoID);

// allocate the full-size buffer once, filled with defaults (0s); the usage hint marks it as rewritten every frame
glBufferData(BufferTarget.ArrayBuffer, sizeof(SpriteVertex) * vertices.Length, vertices, BufferUsageHint.StreamDraw);

// fill index buffer with pre-calc'd values
PreCalcIndexBuffer();

...

class TileBatch
{
  public void Begin(Texture2D texture)
  {
    currentSprite = 0;
    currentTexture = texture;
  }

  public void Draw(Rect dest, Rect src, Color color)
  {
    // if we are out of room, flush (draw) the current batch
    if(currentSprite >= BATCHSIZE)
    {
      Flush();
      currentSprite = 0;
    }

    // calc all the vertex attributes for this sprite and store them in our CPU array
    int vertStart = currentSprite * 4;
    vertices[vertStart].position.X = dest.X;
    vertices[vertStart].texcoord.X = src.X;
    ...
    vertices[vertStart + 3].position.Y = dest.Bottom;
    vertices[vertStart + 3].texcoord.Y = src.Bottom;
    
    currentSprite++;
  }

  public void End()
  {
    Flush();
  }

  private void Flush()
  {
    // set program uniforms and state
    shaderProgram.Texture = currentTexture;
    shaderProgram.mvpMatrix = currentMVP;
    ...

    int numberToDraw = currentSprite;

    BindVAO();
    
    // upload only the vertices written for this batch (4 vertices per sprite)
    glBufferSubData(BufferTarget.ArrayBuffer, 0, sizeof(SpriteVertex) * numberToDraw * 4, vertices);
    
    // draw the appropriate number of sprites (6 indices per sprite)
    DrawElements(BeginMode.Triangles, numberToDraw * 6);
  }
}

In use:
MyBatch.Begin(myTexture);

MyBatch.Draw(new Rect(0,0,32,32), new Rect(50,50,32,32), Color.White);
MyBatch.Draw(...)
...

MyBatch.End();

specifying the count or whatever parameter it is to set the amount of data to upload

Oh yeah. I forgot about the count parameter in glDrawElements.

Thanks.

In theory the best way is to use glMapBufferRange, and either write directly to the returned pointer or else memcpy to it from some intermediate struct.

The way to do this is to keep a "current position" counter (initially 0); you map, write, increment current position. When data can no longer fit you invalidate the entire buffer and reset current position to 0. At various points in the process (normally only when state needs to change) you draw anything that's been written since the last draw.

All of that can fit into a nice class to keep things clean in the higher level code using this system.
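A minimal sketch of the bookkeeping part of such a class, in C++, might look like the following. It deliberately contains no GL calls; the class name `StreamCursor` and its methods are invented for this example, and the comments only note where mapping, drawing and invalidation would be expected to happen.

```cpp
#include <cassert>
#include <cstddef>

// Tracks the "current position" in a fixed-size streaming vertex buffer.
// Only the arithmetic lives here; mapping and drawing are left to the caller.
class StreamCursor {
public:
    explicit StreamCursor(std::size_t capacityBytes) : capacity_(capacityBytes) {}

    // True if `bytes` more can be written without wrapping.
    bool fits(std::size_t bytes) const { return position_ + bytes <= capacity_; }

    // Returns the byte offset to map and write at, then advances the position.
    std::size_t advance(std::size_t bytes) {
        assert(fits(bytes));
        const std::size_t offset = position_;
        position_ += bytes;
        return offset;
    }

    // Byte range written since the last draw: starts at pendingBegin().
    std::size_t pendingBegin() const { return drawn_; }
    std::size_t pendingBytes() const { return position_ - drawn_; }

    // Call after issuing the draw for the pending range.
    void markDrawn() { drawn_ = position_; }

    // Call after invalidating the whole buffer; everything must be drawn first.
    void reset() {
        assert(pendingBytes() == 0);
        position_ = 0;
        drawn_ = 0;
    }

private:
    std::size_t capacity_;
    std::size_t position_ = 0;  // next free byte
    std::size_t drawn_ = 0;     // everything before this has been drawn
};
```

In use, the caller would check `fits()` before each write; when it fails, draw whatever is pending, call `markDrawn()`, invalidate the buffer (for example by mapping with GL_MAP_INVALIDATE_BUFFER_BIT), and then `reset()` before writing again from offset 0.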

If glMapBufferRange is unavailable (and I note that you're currently limited to GL 2 so that may be the case) then I'd encourage you to do a comparative benchmark of VBOs versus old-style system memory arrays. The big problem with GL buffer updates pre-MapBufferRange is that they're prone to GPU/CPU synchronization, so while in theory a VBO should be a faster path, in practice for truly dynamic vertex data that needs to change every frame, it may not be. You should consider setting up that nice class I mentioned in a manner that is reasonably transparent to your higher-level code irrespective of which case you use.
