Followers 0

# OpenGL Sprite batching and other sprite rendering techniques

## 7 posts in this topic

I'm considering how to efficiently render 2D sprites. I'm trying to keep things forward compatible for OpenGL 3+, but I'm limited to OpenGL 2.

I'd have a mixture of static and dynamic sprites.

• Some sprites would be completely static; e.g. map tiles
• Moving sprites would have dynamic transforms
• Animated sprites would have dynamic texture coordinates; I'm using a texture atlas
• A sprite may be both moving and animated (of course)
• A sprite may be able to move but only does so infrequently; e.g. doors that swing only when opened
• The lifetime of a sprite may be dynamic; some sprites may exist for the whole duration of the game, others may be added and later removed from the scene mid-game

A strategy I've used is having a global VBO representing a single unit-sized quad. This unit quad is rendered multiple times for each sprite, where I provide my shader a transformation matrix as well as offset and scale uniforms for the texture coordinates.

I've read that batching sprites, where I get the world-space coordinates and final texture co-ordinates of all sprites and jam them into a single VBO, is normally the way to go performance-wise. The simplest(?) batching method that I understand is using a GL_STREAM_DRAW VBO that gets the vertex data of all sprites with a glBufferData call each frame, possibly using an additional GL_STATIC_DRAW VBO with all the sprites that I know are static and persistent.

Would sprite batching be significantly more performant than my unit-quad VBO approach? If so, is the method of sprite batching I described an efficient implementation of batching given my requirements for sprite behaviour?

0

##### Share on other sites

The simplest(?) batching method that I understand is using a GL_STREAM_DRAW VBO that gets the vertex data of all sprites with a glBufferData call each frame, possibly using an additional GL_STATIC_DRAW VBO with all the sprites that I know are static and persistent.

Pretty much, just process and store the vertices CPU side until ready to draw. Then send the proper state and the vertices to the buffer and draw.

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

Would sprite batching be significantly more performant than my unit-quad VBO approach?

Probably - reducing the number of separate draw calls is generally a very effective optimization. If you are having performance issues, this would definitately be the first thing to try.

With a simple batch similar to above, written in C# and using OpenTK, and without any real optimization, I can easily get many thousands of sprites on the screen each frame with plenty of both CPU & GPU to spare for other tasks.
1

##### Share on other sites

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

From what I can tell, since glBufferSubData doesn't allocate space (I think), I would effectively have a maximum size for my VBO?

0

##### Share on other sites

Use glBufferSubData though - from what I understand, glBufferData destroys & re-creates the buffer each time with an overhead cost.

From what I can tell, since glBufferSubData doesn't allocate space (I think), I would effectively have a maximum size for my VBO?

Yup, so you just fill/draw, fill/draw, multiple times per frame if need be, in groups of that maximum size.  It will still run much faster than having to destroy and re-create the buffer each time.

1

##### Share on other sites

Yup, so you just fill/draw, fill/draw, multiple times per frame if need be, in groups of that maximum size.

OK. That'll also save me from redefining the index buffer.

Though what's the proper way to fill the vertex buffer when I have less than the maximum number of sprites?  Would I call glBufferSubData once to upload my remaining vertices and then call glBufferSubData a second time to fill the rest of the buffer with nulls?  Would my index buffer be affected?

0

##### Share on other sites
Just call glBufferSubData once when a batch is ready to be drawn, specifying the count or whatever parameter it is to set the amount of data to upload, and only draw the number of vertices (or indices if using draw elements) you need for the current batch.

The index buffer can be predifined, and will not change.

Here is some C#-ish pseudocode from a project of mine that outlines what I do:
spriteVertex[] vertices = new spriteVertex[BATCHSIZE * 4];
int[] indices = new int[BATCHSIZE * 6];

MakeBuffers(ref vertBufferID, ref indBufferID, ref vaoID);

// fill the buffer with default (0s) on creation
glBufferData(BufferTarget.ArrayBuffer, sizeof(spriteVertex) * vertices.Length, vertices);

// fill index buffer with pre-calc'd values
PreCalcIndexBuffer();

...

class TileBatch
{
public void Begin(Texture2D texture)
{
currentSprite = 0;
currentTexture = texture;
}

public void Draw(Rect dest, Rect src, Color color)
{
// if we are out of room, flush (draw) the current batch
if(currentSprite > BATCHSIZE)
{
flush();
currentSprite = 0;
}

// calc all the vertex attributes for this sprite and store them in our CPU array
int vertStart = currentSprite * 4;
vertices[vertStart].position.X = dest.X;
vertices[vertStart].texcoord.X = src.X;
...
vertices[vertStart + 3].position.Y = destRect.Bottom;
vertices[vertStart + 3].texcoord.Y = src.Bottom;

currentSprite++;
}

public void End()
{
Flush();
}

private void Flush()
{
// set program uniforms and state
...

int numberToDraw = currentSprite;

BindVAO();

// upload our CPU vertex data to GPU
glBufferSubData(BufferTarget.ArrayBuffer, 0, sizeof(SpriteVertex) * numberToDraw, vertices);

// draw the appropriate number of sprites
DrawElements(BeginMode.Triangles, numberToDraw);
}
}


In use:
MyBatch.Begin(myTexture);

MyBatch.Draw(Rect(0,0,32,32), Rect(50,50,32,32), Color.White);
MyBatch.Draw(...)
...

MyBatch.End();

Edited by laztrezort
1

##### Share on other sites

specifying the count or whatever parameter it is to set the amount of data to upload

Oh yeah. I forgot about the count parameter in glDrawElements.

Thanks.

0

##### Share on other sites

In theory the best way is to use glMapBufferRange, and either write directly to the returned pointer or else memcpy to it from some intermediate struct.

The way to do this is to keep a "current position" counter (initially 0); you map, write, increment current position.  When data can no longer fit you invalidate the entire buffer and reset current position to 0.  At various points in the process (normally only when state needs to change) you draw anything that's been written since the last draw.

All of that can fit into a nice class to keep things clean in the higher level code using this system.

If glMapBufferRange is unavailable (and I note that you're currently limited to GL 2 so that may be the case) then I'd encourage you to do a comparative benchmark of VBOs versus old-style system memory arrays.  The big problem with GL buffer updates pre-MapBufferRange is that they're prone to GPU/CPU synchronization, so while in theory a VBO should be a faster path, in practice for truly dynamic vertex data that needs to change every frame, it may not be.  You should consider setting up that nice class I mentioned in a manner that is reasonably transparent to your higher-level code irrespective of which case you use.

0

## Create an account

Register a new account

Followers 0

• ### Similar Content

• So it's been a while since I took a break from my whole creating a planet in DX11. Last time around I got stuck on fixing a nice LOD.
A week back or so I got help to find this:
https://github.com/sp4cerat/Planet-LOD
In general this is what I'm trying to recreate in DX11, he that made that planet LOD uses OpenGL but that is a minor issue and something I can solve. But I have a question regarding the code
He gets the position using this row
vec4d pos = b.var.vec4d["position"]; Which is then used further down when he sends the variable "center" into the drawing function:
if (pos.len() < 1) pos.norm(); world::draw(vec3d(pos.x, pos.y, pos.z));
Inside the draw function this happens:
draw_recursive(p3[0], p3[1], p3[2], center); Basically the 3 vertices of the triangle and the center of details that he sent as a parameter earlier: vec3d(pos.x, pos.y, pos.z)
Now onto my real question, he does vec3d edge_center[3] = { (p1 + p2) / 2, (p2 + p3) / 2, (p3 + p1) / 2 }; to get the edge center of each edge, nothing weird there.
But this is used later on with:
vec3d d = center + edge_center[i]; edge_test[i] = d.len() > ratio_size; edge_test is then used to evaluate if there should be a triangle drawn or if it should be split up into 3 new triangles instead. Why is it working for him? shouldn't it be like center - edge_center or something like that? Why adding them togheter? I asume here that the center is the center of details for the LOD. the position of the camera if stood on the ground of the planet and not up int he air like it is now.

Full code can be seen here:
https://github.com/sp4cerat/Planet-LOD/blob/master/src.simple/Main.cpp
If anyone would like to take a look and try to help me understand this code I would love this person. I'm running out of ideas on how to solve this in my own head, most likely twisted it one time to many up in my head
Toastmastern

• I googled around but are unable to find source code or details of implementation.
What keywords should I search for this topic?
Things I would like to know:
A. How to ensure that partially covered pixels are rasterized?
Apparently by expanding each triangle by 1 pixel or so, rasterization problem is almost solved.
But it will result in an unindexable triangle list without tons of overlaps. Will it incur a large performance penalty?
How to ensure proper synchronizations in GLSL?
GLSL seems to only allow int32 atomics on image.
C. Is there some simple ways to estimate coverage on-the-fly?
In case I am to draw 2D shapes onto an exisitng target:
1. A multi-pass whatever-buffer seems overkill.
2. Multisampling could cost a lot memory though all I need is better coverage.
Besides, I have to blit twice, if draw target is not multisampled.

• By mapra99
Hello

I am working on a recent project and I have been learning how to code in C# using OpenGL libraries for some graphics. I have achieved some quite interesting things using TAO Framework writing in Console Applications, creating a GLUT Window. But my problem now is that I need to incorporate the Graphics in a Windows Form so I can relate the objects that I render with some .NET Controls.

To deal with this problem, I have seen in some forums that it's better to use OpenTK instead of TAO Framework, so I can use the glControl that OpenTK libraries offer. However, I haven't found complete articles, tutorials or source codes that help using the glControl or that may insert me into de OpenTK functions. Would somebody please share in this forum some links or files where I can find good documentation about this topic? Or may I use another library different of OpenTK?

Thanks!

• Hello, I have been working on SH Irradiance map rendering, and I have been using a GLSL pixel shader to render SH irradiance to 2D irradiance maps for my static objects. I already have it working with 9 3D textures so far for the first 9 SH functions.
In my GLSL shader, I have to send in 9 SH Coefficient 3D Texures that use RGBA8 as a pixel format. RGB being used for the coefficients for red, green, and blue, and the A for checking if the voxel is in use (for the 3D texture solidification shader to prevent bleeding).
My problem is, I want to knock this number of textures down to something like 4 or 5. Getting even lower would be a godsend. This is because I eventually plan on adding more SH Coefficient 3D Textures for other parts of the game map (such as inside rooms, as opposed to the outside), to circumvent irradiance probe bleeding between rooms separated by walls. I don't want to reach the 32 texture limit too soon. Also, I figure that it would be a LOT faster.
Is there a way I could, say, store 2 sets of SH Coefficients for 2 SH functions inside a texture with RGBA16 pixels? If so, how would I extract them from inside GLSL? Let me know if you have any suggestions ^^.
• By KarimIO
EDIT: I thought this was restricted to Attribute-Created GL contexts, but it isn't, so I rewrote the post.
Hey guys, whenever I call SwapBuffers(hDC), I get a crash, and I get a "Too many posts were made to a semaphore." from Windows as I call SwapBuffers. What could be the cause of this?
Update: No crash occurs if I don't draw, just clear and swap.
static PIXELFORMATDESCRIPTOR pfd = // pfd Tells Windows How We Want Things To Be { sizeof(PIXELFORMATDESCRIPTOR), // Size Of This Pixel Format Descriptor 1, // Version Number PFD_DRAW_TO_WINDOW | // Format Must Support Window PFD_SUPPORT_OPENGL | // Format Must Support OpenGL PFD_DOUBLEBUFFER, // Must Support Double Buffering PFD_TYPE_RGBA, // Request An RGBA Format 32, // Select Our Color Depth 0, 0, 0, 0, 0, 0, // Color Bits Ignored 0, // No Alpha Buffer 0, // Shift Bit Ignored 0, // No Accumulation Buffer 0, 0, 0, 0, // Accumulation Bits Ignored 24, // 24Bit Z-Buffer (Depth Buffer) 0, // No Stencil Buffer 0, // No Auxiliary Buffer PFD_MAIN_PLANE, // Main Drawing Layer 0, // Reserved 0, 0, 0 // Layer Masks Ignored }; if (!(hDC = GetDC(windowHandle))) return false; unsigned int PixelFormat; if (!(PixelFormat = ChoosePixelFormat(hDC, &pfd))) return false; if (!SetPixelFormat(hDC, PixelFormat, &pfd)) return false; hRC = wglCreateContext(hDC); if (!hRC) { std::cout << "wglCreateContext Failed!\n"; return false; } if (wglMakeCurrent(hDC, hRC) == NULL) { std::cout << "Make Context Current Second Failed!\n"; return false; } ... // OGL Buffer Initialization glClear(GL_DEPTH_BUFFER_BIT | GL_COLOR_BUFFER_BIT); glBindVertexArray(vao); glUseProgram(myprogram); glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, (void *)indexStart); SwapBuffers(GetDC(window_handle));

• 22
• 11
• 15
• 16
• 19