OpenGL glBufferSubDataARB performance issues


Can anyone tell me if there is a better way? Is it the number of calls to glBufferSubDataARB() that's killing me, or the number of VBOs?

I'm working on an OpenGL project that has potentially hundreds of objects (around 300-400) animated on the CPU. Each one is roughly 300-900 triangles. Unfortunately I can't batch them into a single draw call, as each object may contain unique shader parameters.

Right now I'm pre-allocating VBO memory for each object using glBufferDataARB( GL_ARRAY_BUFFER_ARB, numBytes, verts, GL_DYNAMIC_DRAW ) and then updating all VBOs using glBufferSubDataARB( GL_ARRAY_BUFFER_ARB, 0, numBytes, verts ) before rendering the objects. I'm told that this is the fastest way to update, but for some reason it is killing my performance. If I comment out this update call and just loop through all the other code (CPU animations, visible object determination, etc.) it runs at crazy frame rates (120+ fps), but including the update drops it down to the teens.

NOTE: To keep fill rate from affecting my tests, I disabled the actual draw calls. Only CPU code and VBO updates were being processed here.

Try mapping buffers, try to have fewer buffers with more than one triangle group in each (and use offset for drawing), and try to have several sets of them or use glBufferDataARB(..., 0, ...).

glBufferSubDataARB may, but is not required to, perform the transfer to the card asynchronously. What it cannot do asynchronously is the memory copy into the driver's own buffers, because you could in theory delete the pointed-to memory the microsecond after the function call returns. It will therefore always incur some additional delay compared to mapping the buffer.

Your second issue is binding buffers, which is not as trivial on the driver side as you may think. Doing that several hundred times per frame can be a problem. Indexing into fewer buffers is cheaper.

The third issue is stalls. Drawing cannot happen before all data has been transferred, and transfers to the same memory can't happen before all draws that use it have finished. The driver will schedule asynchronously as much as it can, but it can't do much in such situations if it doesn't know what's safe to discard and when.
This can be solved by explicitly having 2-3 sets of buffers (so it will draw from one while you upload the other) or simply by calling glBufferDataARB with a NULL data pointer (and the same size) before uploading the next buffer, which tells the driver "I won't be using the old contents any more, so throw them away once you're done, and store my new data elsewhere in the meantime".
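
The "2-3 sets of buffers" idea can be sketched as a small ring that picks which set to write each frame. This is only an illustration: `BufferRing`, `vbo[i][set]`, `numBytes` and `verts` are hypothetical names, and the GL calls appear only in comments because they need a live context.

```cpp
#include <cstddef>

// Hypothetical helper: cycles through N buffer sets so the CPU fills one set
// while the GPU may still be reading the sets used in earlier frames.
template <std::size_t N>
class BufferRing {
public:
    // Index of the buffer set to fill and draw from this frame.
    std::size_t current() const { return index_; }

    // Call once per frame, after all objects have been updated and drawn.
    void advance() { index_ = (index_ + 1) % N; }

private:
    std::size_t index_ = 0;
};

// Per frame, assuming vbo[i][set] was created up front for every object i:
//   std::size_t set = ring.current();
//   glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo[i][set]);
//   glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, numBytes, verts);
//   ... set attribute pointers and draw object i ...
//   ring.advance();   // once, after the last object
```

With N = 3 the set written this frame was last used three frames ago, so the driver is less likely to have to wait for pending draws. The cost is N times the VBO memory.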

Thanks samoth. Maybe I'll try to create one big VBO and reserve partitions for each object, then render using offsets. I wonder if bandwidth is an issue here as well, since I'm using an interleaved vertex structure and not separate arrays.

// vertex structure (68 bytes)
vec3f pos;
vec2f st[2];
vec3f tangents[3];
unsigned char color[4];

At an average of 600 verts x 300 objects x 68 bytes, I'm looking at about 12MB of uploads per frame... =( That sounds like a lot, but I don't know what is considered reasonable for an OGL 1.5+ compatible video card.
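
As a quick check of that estimate, here is a minimal sketch that reproduces the arithmetic. It assumes `vec2f` and `vec3f` are plain 32-bit float arrays; the helper names are made up for illustration.

```cpp
#include <cstddef>

// Vertex layout from the post; float-based vec2f/vec3f are an assumption.
struct Vertex {
    float pos[3];            // 12 bytes
    float st[2][2];          // 16 bytes
    float tangents[3][3];    // 36 bytes
    unsigned char color[4];  //  4 bytes
};
static_assert(sizeof(Vertex) == 68, "expected a tightly packed 68-byte vertex");

// Bytes uploaded per frame if every vertex of every object is rewritten.
constexpr std::size_t bytesPerFrame(std::size_t vertsPerObject, std::size_t objects) {
    return vertsPerObject * objects * sizeof(Vertex);
}

// Sustained upload rate in MB/s (1 MB = 10^6 bytes) at a given frame rate.
constexpr double uploadMBPerSecond(std::size_t bytes, double fps) {
    return static_cast<double>(bytes) * fps / 1.0e6;
}
```

`bytesPerFrame(600, 300)` gives 12,240,000 bytes, i.e. roughly 12MB per frame. At 30 fps that would be about 367MB/s of sustained upload.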

Question: if I group many objects into one VBO, how would I assign the offset?

// object[0]
glVertexAttribPointerARB(..., 0); // vertex.pos
glVertexAttribPointerARB(..., 12); // vertex.texcoord

// object[1] ?? is the following valid ??
glVertexAttribPointerARB(..., 0 + firstVertexOffset); // vertex.pos
glVertexAttribPointerARB(..., 12 + firstVertexOffset); // vertex.texcoord

or would I always bind (0 and 12) and use offsets in my index buffer?

object[0].indexes = { 0, 1, 2, 3 }
object[1].indexes = { 4, 5, 6, 7 }
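
Both variants can work in general. A rough sketch of each, assuming the 68-byte interleaved vertex from earlier in the thread and that `firstVertexOffset` is meant as a byte offset (the helper names below are hypothetical):

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kVertexSize = 68;      // bytes per vertex, from the post
constexpr std::size_t kTexCoordOffset = 12;  // texcoords follow the 12-byte pos

// Option 1: byte offset to pass as the last glVertexAttribPointerARB argument
// for an object whose first vertex sits at index firstVertex in the shared VBO.
constexpr std::size_t attribByteOffset(std::size_t firstVertex, std::size_t attribOffset) {
    return firstVertex * kVertexSize + attribOffset;
}

// Option 2: leave the pointers at 0 and 12 and shift the object's indices instead.
std::vector<unsigned> rebaseIndices(const std::vector<unsigned>& local, unsigned firstVertex) {
    std::vector<unsigned> out;
    out.reserve(local.size());
    for (unsigned i : local) {
        out.push_back(i + firstVertex);
    }
    return out;
}
```

For object[1] starting at vertex 4, option 1 yields byte offsets 272 (pos) and 284 (texcoord), while option 2 turns { 0, 1, 2, 3 } into { 4, 5, 6, 7 }. Option 2 avoids resetting the attribute pointers per object, but the rebased indices have to fit the index type in use.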

FYI for anyone following this thread: unfortunately, combining VBOs didn't make a very big difference in performance. =( I guess the way I was handling the data was fairly optimal to begin with, so batching the objects into a single VBO was a negligible boost at the cost of much less flexibility (it's hard to add or remove objects without another manager or brute-force re-allocations to close the fragmentation).

I think that I may have simply run into a bandwidth issue with my card =(. I did a test with around 300 objects which totaled close to 6.2MB of vertex buffer updates but only a few bytes of index buffer updates. The frame rate was running around 25fps-35fps (no rendering, only updates) which means:

(6.2)MB per frame x (25 to 35)fps = 155MB/s - 217MB/s upload...

video card:
NVidia GeForce 8600M GS

Wikipedia claims a memory bandwidth of 12.8 to 22.4 GB/s... Of course, AGP 1x and PCI are in the range of 150-250MB/s, so the bus is likely the problem and not the video card.

[Edited by - digitalgibs on July 27, 2009 2:05:31 AM]
