Sign in to follow this  

OpenGL Sprite batching and other sprite rendering techniques

Recommended Posts

I'm considering how to efficiently render 2D sprites. I'm trying to keep things forward compatible for OpenGL 3+, but I'm limited to OpenGL 2.


I'd have a mixture of static and dynamic sprites.

  • Some sprites would be completely static; e.g. map tiles
  • Moving sprites would have dynamic transforms
  • Animated sprites would have dynamic texture coordinates; I'm using a texture atlas
  • A sprite may be both moving and animated (of course)
  • A sprite may be able to move but only does so infrequently; e.g. doors that swing only when opened
  • The lifetime of a sprite may be dynamic; some sprites may exist for the whole duration of the game, others may be added and later removed from the scene mid-game

A strategy I've used is having a global VBO representing a single unit-sized quad. This unit quad is rendered multiple times for each sprite, where I provide my shader a transformation matrix as well as offset and scale uniforms for the texture coordinates.


I've read that batching sprites, where I get the world-space coordinates and final texture co-ordinates of all sprites and jam them into a single VBO, is normally the way to go performance-wise. The simplest(?) batching method that I understand is using a GL_STREAM_DRAW VBO that gets the vertex data of all sprites with a glBufferData call each frame, possibly using an additional GL_STATIC_DRAW VBO with all the sprites that I know are static and persistent.


Would sprite batching be significantly more performant than my unit-quad VBO approach? If so, is the method of sprite batching I described an efficient implementation of batching given my requirements for sprite behaviour?

Share this post

Link to post
Share on other sites

The simplest(?) batching method that I understand is using a GL_STREAM_DRAW VBO that gets the vertex data of all sprites with a glBufferData call each frame, possibly using an additional GL_STATIC_DRAW VBO with all the sprites that I know are static and persistent.

Pretty much, just process and store the vertices CPU side until ready to draw. Then send the proper state and the vertices to the buffer and draw.

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

Would sprite batching be significantly more performant than my unit-quad VBO approach?

Probably - reducing the number of separate draw calls is generally a very effective optimization. If you are having performance issues, this would definitately be the first thing to try.

With a simple batch similar to above, written in C# and using OpenTK, and without any real optimization, I can easily get many thousands of sprites on the screen each frame with plenty of both CPU & GPU to spare for other tasks.

Share this post

Link to post
Share on other sites

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

From what I can tell, since glBufferSubData doesn't allocate space (I think), I would effectively have a maximum size for my VBO?


Yup, so you just fill/draw, fill/draw, multiple times per frame if need be, in groups of that maximum size.  It will still run much faster than having to destroy and re-create the buffer each time.

Share this post

Link to post
Share on other sites

Yup, so you just fill/draw, fill/draw, multiple times per frame if need be, in groups of that maximum size.

OK. That'll also save me from redefining the index buffer.


Though what's the proper way to fill the vertex buffer when I have less than the maximum number of sprites?  Would I call glBufferSubData once to upload my remaining vertices and then call glBufferSubData a second time to fill the rest of the buffer with nulls?  Would my index buffer be affected?

Share this post

Link to post
Share on other sites
Just call glBufferSubData once when a batch is ready to be drawn, specifying the count or whatever parameter it is to set the amount of data to upload, and only draw the number of vertices (or indices if using draw elements) you need for the current batch.

The index buffer can be predifined, and will not change.

Here is some C#-ish pseudocode from a project of mine that outlines what I do:
spriteVertex[] vertices = new spriteVertex[BATCHSIZE * 4];
int[] indices = new int[BATCHSIZE * 6];

MakeBuffers(ref vertBufferID, ref indBufferID, ref vaoID);

// fill the buffer with default (0s) on creation
glBufferData(BufferTarget.ArrayBuffer, sizeof(spriteVertex) * vertices.Length, vertices);

// fill index buffer with pre-calc'd values


class TileBatch
  public void Begin(Texture2D texture)
    currentSprite = 0;
    currentTexture = texture;

  public void Draw(Rect dest, Rect src, Color color)
    // if we are out of room, flush (draw) the current batch
    if(currentSprite > BATCHSIZE)
      currentSprite = 0;

    // calc all the vertex attributes for this sprite and store them in our CPU array
    int vertStart = currentSprite * 4;
    vertices[vertStart].position.X = dest.X;
    vertices[vertStart].texcoord.X = src.X;
    vertices[vertStart + 3].position.Y = destRect.Bottom;
    vertices[vertStart + 3].texcoord.Y = src.Bottom;

  public void End()

  private void Flush()
    // set program uniforms and state
    shaderProgram.Texture = currentTexture;
    shaderProgram.mvpMatrix = currentMVP;

    int numberToDraw = currentSprite;

    // upload our CPU vertex data to GPU
    glBufferSubData(BufferTarget.ArrayBuffer, 0, sizeof(SpriteVertex) * numberToDraw, vertices);
    // draw the appropriate number of sprites
    DrawElements(BeginMode.Triangles, numberToDraw);

In use:

MyBatch.Draw(Rect(0,0,32,32), Rect(50,50,32,32), Color.White);

Edited by laztrezort

Share this post

Link to post
Share on other sites

In theory the best way is to use glMapBufferRange, and either write directly to the returned pointer or else memcpy to it from some intermediate struct.


The way to do this is to keep a "current position" counter (initially 0); you map, write, increment current position.  When data can no longer fit you invalidate the entire buffer and reset current position to 0.  At various points in the process (normally only when state needs to change) you draw anything that's been written since the last draw.


All of that can fit into a nice class to keep things clean in the higher level code using this system.


If glMapBufferRange is unavailable (and I note that you're currently limited to GL 2 so that may be the case) then I'd encourage you to do a comparative benchmark of VBOs versus old-style system memory arrays.  The big problem with GL buffer updates pre-MapBufferRange is that they're prone to GPU/CPU synchronization, so while in theory a VBO should be a faster path, in practice for truly dynamic vertex data that needs to change every frame, it may not be.  You should consider setting up that nice class I mentioned in a manner that is reasonably transparent to your higher-level code irrespective of which case you use.

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this  

  • Announcements

  • Forum Statistics

    • Total Topics
    • Total Posts
  • Similar Content

    • By DejayHextrix
      Hi, New here. 
      I need some help. My fiance and I like to play this mobile game online that goes by real time. Her and I are always working but when we have free time we like to play this game. We don't always got time throughout the day to Queue Buildings, troops, Upgrades....etc.... 
      I was told to look into DLL Injection and OpenGL/DirectX Hooking. Is this true? Is this what I need to learn? 
      How do I read the Android files, or modify the files, or get the in-game tags/variables for the game I want? 
      Any assistance on this would be most appreciated. I been everywhere and seems no one knows or is to lazy to help me out. It would be nice to have assistance for once. I don't know what I need to learn. 
      So links of topics I need to learn within the comment section would be SOOOOO.....Helpful. Anything to just get me started. 
      Dejay Hextrix 
    • By mellinoe
      Hi all,
      First time poster here, although I've been reading posts here for quite a while. This place has been invaluable for learning graphics programming -- thanks for a great resource!
      Right now, I'm working on a graphics abstraction layer for .NET which supports D3D11, Vulkan, and OpenGL at the moment. I have implemented most of my planned features already, and things are working well. Some remaining features that I am planning are Compute Shaders, and some flavor of read-write shader resources. At the moment, my shaders can just get simple read-only access to a uniform (or constant) buffer, a texture, or a sampler. Unfortunately, I'm having a tough time grasping the distinctions between all of the different kinds of read-write resources that are available. In D3D alone, there seem to be 5 or 6 different kinds of resources with similar but different characteristics. On top of that, I get the impression that some of them are more or less "obsoleted" by the newer kinds, and don't have much of a place in modern code. There seem to be a few pivots:
      The data source/destination (buffer or texture) Read-write or read-only Structured or unstructured (?) Ordered vs unordered (?) These are just my observations based on a lot of MSDN and OpenGL doc reading. For my library, I'm not interested in exposing every possibility to the user -- just trying to find a good "middle-ground" that can be represented cleanly across API's which is good enough for common scenarios.
      Can anyone give a sort of "overview" of the different options, and perhaps compare/contrast the concepts between Direct3D, OpenGL, and Vulkan? I'd also be very interested in hearing how other folks have abstracted these concepts in their libraries.
    • By aejt
      I recently started getting into graphics programming (2nd try, first try was many years ago) and I'm working on a 3d rendering engine which I hope to be able to make a 3D game with sooner or later. I have plenty of C++ experience, but not a lot when it comes to graphics, and while it's definitely going much better this time, I'm having trouble figuring out how assets are usually handled by engines.
      I'm not having trouble with handling the GPU resources, but more so with how the resources should be defined and used in the system (materials, models, etc).
      This is my plan now, I've implemented most of it except for the XML parts and factories and those are the ones I'm not sure of at all:
      I have these classes:
      For GPU resources:
      Geometry: holds and manages everything needed to render a geometry: VAO, VBO, EBO. Texture: holds and manages a texture which is loaded into the GPU. Shader: holds and manages a shader which is loaded into the GPU. For assets relying on GPU resources:
      Material: holds a shader resource, multiple texture resources, as well as uniform settings. Mesh: holds a geometry and a material. Model: holds multiple meshes, possibly in a tree structure to more easily support skinning later on? For handling GPU resources:
      ResourceCache<T>: T can be any resource loaded into the GPU. It owns these resources and only hands out handles to them on request (currently string identifiers are used when requesting handles, but all resources are stored in a vector and each handle only contains resource's index in that vector) Resource<T>: The handles given out from ResourceCache. The handles are reference counted and to get the underlying resource you simply deference like with pointers (*handle).  
      And my plan is to define everything into these XML documents to abstract away files:
      Resources.xml for ref-counted GPU resources (geometry, shaders, textures) Resources are assigned names/ids and resource files, and possibly some attributes (what vertex attributes does this geometry have? what vertex attributes does this shader expect? what uniforms does this shader use? and so on) Are reference counted using ResourceCache<T> Assets.xml for assets using the GPU resources (materials, meshes, models) Assets are not reference counted, but they hold handles to ref-counted resources. References the resources defined in Resources.xml by names/ids. The XMLs are loaded into some structure in memory which is then used for loading the resources/assets using factory classes:
      Factory classes for resources:
      For example, a texture factory could contain the texture definitions from the XML containing data about textures in the game, as well as a cache containing all loaded textures. This means it has mappings from each name/id to a file and when asked to load a texture with a name/id, it can look up its path and use a "BinaryLoader" to either load the file and create the resource directly, or asynchronously load the file's data into a queue which then can be read from later to create the resources synchronously in the GL context. These factories only return handles.
      Factory classes for assets:
      Much like for resources, these classes contain the definitions for the assets they can load. For example, with the definition the MaterialFactory will know which shader, textures and possibly uniform a certain material has, and with the help of TextureFactory and ShaderFactory, it can retrieve handles to the resources it needs (Shader + Textures), setup itself from XML data (uniform values), and return a created instance of requested material. These factories return actual instances, not handles (but the instances contain handles).
      Is this a good or commonly used approach? Is this going to bite me in the ass later on? Are there other more preferable approaches? Is this outside of the scope of a 3d renderer and should be on the engine side? I'd love to receive and kind of advice or suggestions!
    • By nedondev
      I 'm learning how to create game by using opengl with c/c++ coding, so here is my fist game. In video description also have game contain in Dropbox. May be I will make it better in future.
    • By Abecederia
      So I've recently started learning some GLSL and now I'm toying with a POM shader. I'm trying to optimize it and notice that it starts having issues at high texture sizes, especially with self-shadowing.
      Now I know POM is expensive either way, but would pulling the heightmap out of the normalmap alpha channel and in it's own 8bit texture make doing all those dozens of texture fetches more cheap? Or is everything in the cache aligned to 32bit anyway? I haven't implemented texture compression yet, I think that would help? But regardless, should there be a performance boost from decoupling the heightmap? I could also keep it in a lower resolution than the normalmap if that would improve performance.
      Any help is much appreciated, please keep in mind I'm somewhat of a newbie. Thanks!
  • Popular Now