OpenGL Drawing lots of 2D boxes; should I batch the draw call, and what's the best way to do that?

When batching draw calls, how does one handle different transform matrices (position/rotation) for each object? Does each object's transformation matrix have to be applied before the batch draw call?

Specifically, I'm using OpenGL ES 2.0 to draw lots of 2D boxes. I can draw each one individually just fine, but seeing as I'm targeting mobile platforms, I'm looking to try to squeeze as much performance out of this as I can. The boxes aren't textured; mostly I'm interested in minimizing draw calls to save CPU time. The more CPU time I save, the larger my physics simulation can be.

[list]
[*]If I draw a lot of boxes and want to batch the draw calls together into one call, does that require that I apply the transformation matrices of each object to its vertices before copying all the objects into a single buffer for drawing?
[*]When batches are drawn, how does one differentiate between each object being drawn (if it's possible)? Perhaps my understanding of batching is wrong, but the way I currently understand it is that you cannot differentiate between each object (because you just take the vertices of all the objects to draw and copy them all into one buffer, so it just looks like a whole bunch of vertices; also this would require using GL_TRIANGLES instead of GL_TRIANGLE_STRIP (unless degenerate triangles were inserted) so as to not connect two different objects with a stray triangle)
[*]If you had a bunch of 2D boxes to draw, each with its own position, rotation, and scaling, but all of the same color (and no textures), how would [i]you[/i] (personally) draw them? As a follow up, what if each had its own color; does that change things?
[/list]

I don't know a lot about drawing optimization techniques. I'm comfortable drawing things to the screen, but I'm not a fancy graphics programmer.

your understanding is correct
you can:
1. position vertices so that you only need 1 (or 2) matrices to determine their position in space
basically, if you have no choice but to transform them each frame, you can use the animation approach:
use a dynamic VBO, and transform each vertex each frame, and render everything in one go
this is a reasonable approach in many cases

2. use several draw calls: use one matrix that you translate back and forth, draw a range of vertices at a time
glDrawArrays takes 3 parameters: mode, first, and count
so you would use the first parameter, and start with 0, then jump to say 4, 8, 12, 16.. if you only draw 1 box at a time
this is very slow though, so if you can, draw 100 boxes at a time :)

i use both, since if you have a lot of vertices it may not be in your best interest to transform all of them each frame
instead using a few extra calls on groups of vertices that belong together is better
but it depends on your data
i'm sure other people can name other solutions, but as long as your boxes don't move, you should be able to do one or the other without problem
Edited by Kaptein
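For reference, the "transform each vertex each frame and render everything in one go" approach above might look roughly like the sketch below. It is plain C with no GL calls; the `Box2` struct and `transform_boxes` function are made-up names for illustration, and it assumes each box supplies a position, a precomputed cos/sin pair, and half-extents.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-box state (names are illustrative only). */
typedef struct {
    float x, y;    /* translation */
    float c, s;    /* cos and sin of the rotation angle */
    float hw, hh;  /* half width and half height */
} Box2;

/* Writes 4 transformed corners (x, y pairs) per box into out and
 * returns the number of floats written. out must hold 8 * count floats. */
size_t transform_boxes(const Box2 *boxes, size_t count, float *out)
{
    static const float corner[4][2] = {
        { -1.0f, -1.0f }, { 1.0f, -1.0f }, { 1.0f, 1.0f }, { -1.0f, 1.0f }
    };
    size_t n = 0;
    assert(boxes != NULL && out != NULL);
    for (size_t i = 0; i < count; ++i) {
        const Box2 *b = &boxes[i];
        for (int k = 0; k < 4; ++k) {
            /* Scale the unit corner, rotate it, then translate it. */
            float lx = corner[k][0] * b->hw;
            float ly = corner[k][1] * b->hh;
            out[n++] = b->x + b->c * lx - b->s * ly;
            out[n++] = b->y + b->s * lx + b->c * ly;
        }
    }
    return n;
}
```

The filled array would then be uploaded once per frame (for example with glBufferData and GL_DYNAMIC_DRAW) and drawn in a single call. Drawing with GL_TRIANGLES keeps separate boxes from being joined by stray triangles.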

I don't have a lot of experience on mobile platforms, but shouldn't the glDraw*Instanced family of calls solve your problem? You have one VBO containing the box geometry. In your shader you then have something like this
[code]
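// Must match the array size below; bounded by the available uniform space.
#define MAX_INSTANCES SomeReasonableValue

uniform mat4 mvp[MAX_INSTANCES];
...
// gl_InstanceID selects this instance's matrix (needs instancing support).
outVertex = mvp[gl_InstanceID] * inVertex;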
[/code]
The only bottleneck there (that is, what determines how many glDraw*Instanced calls you need to draw N boxes) will be how much space you have for uniforms. If the boxes are limited to certain kinds of transformations (for example only translations, uniform scale, and/or rotations around one axis), you could try sending only those parameters to the program and building an instance-specific matrix on the fly. Of course, building that matrix for every vertex might well be more costly than setting the larger uniform matrices. However, if you are limited to translations only, this could be better:
[code]
uniform mat4 mvp;
uniform vec3 translations[MAX_INSTANCES];
...
// w = 0.0 so that adding the offset leaves inVertex.w unchanged.
outVertex = mvp * (inVertex + vec4(translations[gl_InstanceID], 0.0));
[/code]
Edited by BitMaster

[quote name='Kaptein' timestamp='1352298660' post='4998416']
your understanding is correct
you can:
1. position vertices so that you only need 1 (or 2) matrices to determine their position in space
basically, if you have no choice but to transform them each frame, you can use the animation approach:
use a dynamic VBO, and transform each vertex each frame, and render everything in one go
this is a reasonable approach in many cases
[/quote]
That's one option I'm considering.

[quote name='Kaptein' timestamp='1352298660' post='4998416']
2. use several draw calls: use one matrix that you translate back and forth, draw a range of vertices at a time
glDrawArrays takes 3 parameters: mode, first, and count
so you would use the first parameter, and start with 0, then jump to say 4, 8, 12, 16.. if you only draw 1 box at a time
this is very slow though, so if you can, draw 100 boxes at a time :)
[/quote]
Well, each box has its own transformation matrix because they're all independently movable, so I'm guessing this would require drawing one box at a time (using this method)?

[quote name='Kaptein' timestamp='1352298660' post='4998416']
i'm sure other people can name other solutions, but as long as your boxes don't move, you should be able to do one or the other without problem
[/quote]
The boxes certainly move, as they're part of a physics simulation, which unfortunately is what makes this complicated.

[quote name='BitMaster' timestamp='1352301153' post='4998427']
I don't have a lot of experience on mobile platforms, but shouldn't the glDraw*Instanced family of calls solve your problem?
[/quote]
I can certainly draw with them, but I'm trying to find ways to a) minimize the number of draw calls and b) put as much of the computation on the GPU instead of the CPU. I don't know how to draw things in batches without first applying each object's transformation matrix to all of its vertices on the CPU, and then using the transformed data in the draw call. I don't know if there's a different/better way to do this, because right now the options I'm seeing are a) make a draw call for each object and don't apply the transformations on the CPU, or b) apply the transformations on the CPU and make a batched draw call. I'm debating between the two and am interested if a third option exists.

[quote name='BitMaster' timestamp='1352301153' post='4998427']
You have one VBO containing the box geometry. In your shader you then have something like this
[code]
#define MAX_INSTANCES SomeReasonableValue

uniform mat4 mvp[MAX_INSTANCES];
...
outVertex = mvp[gl_InstanceID] * inVertex;
[/code]
[/quote]
That's a neat idea, but glDraw*Instanced() drawing didn't appear until OpenGL ES 3.0, and I'm stuck with 2.0 :(

For GLES 2.0 you have no choice but to do the transformation on CPU and load a dynamic VBO each frame if you have dynamic objects you want to batch.

On the plus side, you have fewer calculations in your shader and can use that extra performance for making the pixels prettier (or draw more boxes before hitting the GPU limit).
Edited by Olof Hedman

What about adding N box geometries to the same VBO, just one after the other? For each geometry, add an additional integer attribute which is constant for each box (0 to N-1). Then you basically have a hand-rolled glDraw*Instanced in batches of at most N, with your integer attribute taking the role of gl_InstanceID.
Edited by BitMaster
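A rough sketch of how such a hand-rolled instancing buffer could be filled, in plain C. The `InstVertex` layout and `build_instanced_quads` are hypothetical names. The slot number is stored as a float here because, as far as I know, GLSL ES 1.00 (the ES 2.0 shading language) only accepts float-based attribute types, so the vertex shader would have to convert it back with int() before indexing a uniform array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout: unit-quad corner plus the box's slot number. */
typedef struct {
    float x, y;      /* untransformed corner of the unit quad */
    float instance;  /* 0 .. n-1, same value for all 4 corners of a box */
} InstVertex;

/* Fills out[0 .. 4*n-1] and returns the vertex count.
 * The buffer is static: it is built once and never touched again. */
size_t build_instanced_quads(InstVertex *out, size_t n)
{
    static const float corner[4][2] = {
        { -0.5f, -0.5f }, { 0.5f, -0.5f }, { 0.5f, 0.5f }, { -0.5f, 0.5f }
    };
    assert(out != NULL);
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < 4; ++k) {
            InstVertex *v = &out[i * 4 + k];
            v->x = corner[k][0];
            v->y = corner[k][1];
            v->instance = (float)i;
        }
    }
    return n * 4;
}
```

With this layout only the per-box uniforms change from frame to frame. The batch size N is capped by the uniform space the device reports (GL_MAX_VERTEX_UNIFORM_VECTORS).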

[quote name='BitMaster' timestamp='1352304076' post='4998448']
What about adding N box geometries to the same VBO, just one after the other. For each geometry, add an additional integer attribute which is constant for each box (0 to N-1). Then you have basically a handrolled glDraw*Instanced in batches of maximal N, with your integer attribute taking the role of gl_InstanceID.
[/quote]

You'd need to add that attribute for each vertex, so a lot of extra integers.
I guess you'd have to put the matrices in a texture too.
My gut says it will be slower than just transforming on the CPU, but I can't say I know.
Edited by Olof Hedman

Hmm, I think I disregarded the case of 2D and only trivial shading.

If so, I guess more unorthodox methods could yield results, especially if the rest of the simulation taxes the CPU.

The vertices of a 2D box are less data than a matrix though, so I still say an efficient CPU transform is probably the best :)

Well, that will largely depend on exactly which transformations are needed. In the pure 2D case you can get away with a mat2x3 and still be completely general. For translation with rotation you can get away with a single vec3 (2D translation and angle) or maybe a vec4 (2D translation and precalculated sin(angle) and cos(angle)). I guess the 'best' solution to this problem will be extremely domain-specific, so the more ideas Cornstalks has lying around, the better.
Edited by BitMaster
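To make the compact options concrete, here is a small C sketch of the vec4 variant (2D translation plus precomputed sin/cos) and of the general six-float 2D affine transform. `Xform2`, `Affine2`, and the function names are invented for this example. Note that, as far as I'm aware, GLSL ES 1.00 has no non-square matrix types, so on ES 2.0 the six floats would have to be passed as something like two vec3s.

```c
#include <assert.h>

/* Translation + rotation packed into 4 floats (fits one vec4). */
typedef struct { float tx, ty, s, c; } Xform2;

/* General 2D affine transform: two rows of (a, b, t). */
typedef struct { float m[2][3]; } Affine2;

/* Rotate (x, y) by the stored angle, then translate. */
void xform2_apply(const Xform2 *t, float x, float y, float *ox, float *oy)
{
    assert(t && ox && oy);
    *ox = t->c * x - t->s * y + t->tx;
    *oy = t->s * x + t->c * y + t->ty;
}

/* Expand the packed form, adding a scale (sx, sy) that is applied first. */
Affine2 affine2_from_xform(const Xform2 *t, float sx, float sy)
{
    Affine2 a;
    assert(t);
    a.m[0][0] = t->c * sx;  a.m[0][1] = -t->s * sy;  a.m[0][2] = t->tx;
    a.m[1][0] = t->s * sx;  a.m[1][1] =  t->c * sy;  a.m[1][2] = t->ty;
    return a;
}

/* Apply the six-float transform to a point. */
void affine2_apply(const Affine2 *a, float x, float y, float *ox, float *oy)
{
    assert(a && ox && oy);
    *ox = a->m[0][0] * x + a->m[0][1] * y + a->m[0][2];
    *oy = a->m[1][0] * x + a->m[1][1] * y + a->m[1][2];
}
```

The packed form costs 4 floats per box and the general form 6, against 16 for a full mat4, which matters when uniform space decides the batch size.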

The one recommendation I haven't seen is to consider jumping up to ES3. Now, that may not be viable for you, but if it is you'll have instancing support, so happy days - you're in the promised land.

If you can't do that then you've got a balancing act between the cost of splitting batches versus the cost of updating your vertex data in a manner that would allow you to take it all in a single batch.

For a desktop implementation without either instancing or glMapBufferRange (which would be required to update VBOs in a reasonable manner and without stalling the pipeline - again, ES3 would make that problem go away too) my gut inclination would be to drop the use of VBOs altogether, use client-side arrays in system memory, and transform on the CPU. Note that I said "[i]desktop[/i]" here; I'm not certain how much of the following is going to apply to a mobile implementation so take it with the appropriate sized grain of salt.

Before proceeding it needs to be noted that ES2 [i]does[/i] allow use of client-side arrays in this manner.

The main rationale behind this is that updating a VBO can be a horribly expensive operation - if you get it wrong it can be orders of magnitude more expensive than just not using VBOs at all. The reason why is that if the VBO is currently in use for drawing your program will not be able to immediately update it - instead it must stall, wait for all pending drawing operations to complete, then the update can happen. Do this a few too many times per frame and some implementations will plunge you to single digit framerates.

I'm guessing that you don't really want that to happen. ;)

So let's look at transforming a box on the CPU. This is not as horrible as it may appear at first glance.

First thing is to use indexed drawing (via glDrawElements) which will reduce the amount of vertices that need to be transformed from 24 to 8 - that's quite a significant saving already.

Second thing is to look at the transformation itself. There are several shortcuts you can take here, with an obvious one being to check if the box needs to be rotated - if it doesn't then the transformation collapses from a full set of matrix calculation/multiplies to 3 additions. Nice! The same applies to scaling; again you can collapse the full transform to something much much simpler (and faster).

One other factor here is that the indices used for drawing many boxes are going to be static - they'll never change, so you can just set them up once and reuse them as needed. You'll need to burn a bit of extra memory to set up indices for multiple boxes, but I believe that the tradeoff is worth it.
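Since the index pattern never changes, it can be generated once at startup. Below is a minimal C sketch for the 2D case (4 vertices and 2 triangles per rectangle). `build_quad_indices` is a made-up helper, and the 16-bit index type is an assumption based on core ES 2.0 only guaranteeing unsigned byte and unsigned short element indices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 65536 addressable vertices / 4 vertices per quad. */
#define MAX_QUADS_16BIT 16384u

/* Writes 6 indices per quad (triangles 0-1-2 and 0-2-3) and
 * returns the number of indices written. */
size_t build_quad_indices(uint16_t *out, size_t quad_count)
{
    static const uint16_t pattern[6] = { 0, 1, 2, 0, 2, 3 };
    assert(out != NULL && quad_count <= MAX_QUADS_16BIT);
    for (size_t q = 0; q < quad_count; ++q) {
        for (size_t k = 0; k < 6; ++k) {
            /* Offset the pattern by this quad's first vertex. */
            out[q * 6 + k] = (uint16_t)(q * 4 + pattern[k]);
        }
    }
    return quad_count * 6;
}
```

Because these are plain GL_TRIANGLES indices, neighbouring boxes in the same batch are never connected to each other.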

You could also get a further reduction in vertex submission by just not bothering to draw cube faces that are facing away from the viewpoint, but that would mess things up a little with your static indices (although you could work around it by collapsing them to 0-area triangles and reusing the same vertex for all of them). I'd maybe save that one for a later avenue of potential optimization if needed.

If my advice about VBOs turns out to be wrong on mobile platforms (i.e. if the cost of updating is lower than I estimate) then you're in a nice position where you can use a dynamic VBO, a static index buffer, and just fill/draw. I'm not sure if I'd be happy mixing client-side vertex data with an index buffer though, but my limited experience of mobile platforms means that I can't really comment further on that one.

Yeah, the best option will likely be very specific to my case. Specifically, this is a 2D game on Android where I'm using Box2D to simulate physical interactions of a large number of 2D boxes/rectangles. The boxes are not textured, and at the moment I'm considering making them all one color. The boxes can freely move and rotate during the simulation. Box2D gives me a translation vector (x and y) and rotation vector (precomputed sin and cos) for each box, and I know each box's size (I'm considering making them all the same size).

[quote name='mhagain' timestamp='1352307780' post='4998470']
The one recommendation I haven't seen is to consider jumping up to ES3. Now, that may not be viable for you, but if it is you'll have instancing support, so happy days - you're in the promised land.
[/quote]
Unfortunately, I can't, as I'm targeting Android devices and the best thing available is ES2.0.

[quote name='mhagain' timestamp='1352307780' post='4998470']
First thing is to use indexed drawing (via glDrawElements) which will reduce the amount of vertices that need to be transformed from 24 to 8 - that's quite a significant saving already.
[/quote]
These are 2D boxes, so the savings are significantly reduced (but still present). Are the savings still significant enough, do you think?

One thought I've had (there's a problem with it though) is to make one buffer that holds transformation matrices (really just vec4s representing the object's translation and rotation vectors) for every object, and then when drawing use an index array to index into the transformation matrix buffer. That way, each vertex can reference the corresponding box's transformation matrix and the box's transformation matrix only needs to be sent once. Each update would require updating the transformation matrix buffer. The problem, however, is another vertex buffer would be needed to define the 4 vertices for each box. This other vertex buffer only needs 4 elements, as all the boxes can be represented with the same vertices and a different transformation matrix. However, I don't think I can specify two index buffers, one which indexes into the transformation matrix buffer and the other which indexes into the little vertex buffer.

I'm seriously considering abandoning batching altogether at this point and just drawing each box individually (and transforming on the GPU using a uniform matrix passed in). Vertex data and buffer indices remain the same for each draw call. The only thing that would change is the uniform matrix. Thoughts?
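For the one-draw-call-per-box route, the only per-box CPU work is filling in the uniform matrix. Here is a hedged C sketch of building a column-major 4x4 matrix from what Box2D hands back (translation plus precomputed sin/cos). `make_box_matrix` is a hypothetical helper, and it assumes any scaling is already baked into the vertex data.

```c
#include <assert.h>
#include <string.h>

/* Fills m (16 floats, column-major as OpenGL expects) with a 2D
 * rotation by (s, c) followed by a translation by (tx, ty). */
void make_box_matrix(float tx, float ty, float s, float c, float m[16])
{
    assert(m != NULL);
    memset(m, 0, 16 * sizeof m[0]);
    m[0]  =  c;   m[1]  = s;    /* first column:  rotated x axis */
    m[4]  = -s;   m[5]  = c;    /* second column: rotated y axis */
    m[10] = 1.0f;               /* z is left untouched */
    m[12] = tx;   m[13] = ty;   /* fourth column: translation */
    m[15] = 1.0f;
}
```

The result would be passed with glUniformMatrix4fv(location, 1, GL_FALSE, m) before each draw call; on ES 2.0 the transpose argument has to be GL_FALSE, which is why the matrix is laid out column-major here.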

Hmmm - when I saw the word "box" I automatically assumed a 3D shape (even if projected onto a 2D view), but could you clarify - are you talking "boxes" as I assumed with 6 sides, 8 corners, or are you talking rectangles? I'd withdraw a huge chunk of my previous post if the latter (and happily accept negative rep on it too).

[quote name='Cornstalks' timestamp='1352313224' post='4998519']
I'm seriously considering abandoning batching altogether at this point and just drawing each box individually (and transforming on the GPU using a uniform matrix passed in). Vertex data and buffer indices remain the same for each draw call. The only thing that would change is the uniform matrix. Thoughts?
[/quote]

Worth benchmarking and seeing how you go. It's incredibly simple to implement and may turn out to be not a problem at all.
Edited by mhagain

[quote name='mhagain' timestamp='1352335222' post='4998663']
Hmmm - when I saw the word "box" I automatically assumed a 3D shape (even if projected onto a 2D view), but could you clarify - are you talking "boxes" as I assumed with 6 sides, 8 corners, or are you talking rectangles? I'd withdraw a huge chunk of my previous post if the latter (and happily accept negative rep on it too).
[/quote]
Boxes as in 2D rectangles and squares. 4 vertices, 2 triangles. I voted you up because even though a good amount of what you were talking about doesn't really apply in my particular case, there are things that you mentioned that I do appreciate because they may be very helpful in future projects.

I've got some basic rendering working now using the method in my last paragraph of my previous post. I plan on doing some stress testing and benchmarking and seeing if the rendering is enough of a bottleneck to try to optimize more, though I doubt it will be at this point.
