Memory-Array vs Dynamic-VBO: which is better?

Started by
27 comments, last by christian h 17 years, 6 months ago
I'm trying to optimize my render code. While doing so I noticed that copying values to the VBO takes quite some time. The VBO is a dynamic one, as the mesh bends around ( creature ). Now I wonder whether I would be faster using a memory array instead of a VBO. I would also like to keep memory consumption in mind. With higher resolution meshes the VBO data can quickly explode, eating precious texture memory. Is it worth dropping a VBO in favor of CPU memory, especially if the data changes every frame?
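To put a rough number on the memory concern, here is a minimal sketch of the arithmetic. The `Vertex` layout is an assumption made for illustration (position, normal, tangent and one UV set, all 32-bit floats), not the poster's actual format, and `vbo_bytes` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vertex layout (an assumption, not the poster's format):
 * position, normal and tangent as 3 floats each, plus one 2-float UV set.
 * That is 11 floats, so 44 bytes with 4-byte floats. */
typedef struct {
    float position[3];
    float normal[3];
    float tangent[3];
    float uv[2];
} Vertex;

/* Bytes needed to hold vertex_count vertices in a single buffer. */
size_t vbo_bytes(size_t vertex_count)
{
    /* Guard against overflow in the multiplication below. */
    assert(vertex_count <= SIZE_MAX / sizeof(Vertex));
    return vertex_count * sizeof(Vertex);
}
```

Under these assumptions a 100,000 vertex mesh needs `vbo_bytes(100000)` = 4,400,000 bytes, roughly 4.4 MB, for one copy of the vertex data. Index data and any extra attributes come on top of that.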

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

You would probably be better off using VBOs for more static data that's displayed a lot. I would recommend trying both methods and seeing which works best and fastest for you.
Quote:Original post by RPTD
The VBO is a dynamic one as the mesh bends around ( creature ).
Instead of using a dynamic mesh (one that is generated in software), it may be possible to use a static mesh and deform it in hardware with matrices (i.e. hardware skinning / skeletal animation).
Always question authority......unless you're on GameDev.net, then it will hurt your rating very badly so just shut the fuck up.
A VBO might stand out as the better approach if you go deeper and do things like caching or multiple passes per frame. You also shouldn't copy an array of vertices into the VBO; calculate the data directly into it using memory mapping. And create the VBO as write-only: read-write mode _will_ kill you. Then there are the performance hints, static/dynamic usage, etc.

For single-pass, non-cached, low-poly models, VAs (vertex arrays) work really well these days at least.

ch.
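As a rough illustration of the advice above (compute directly into the mapped buffer instead of filling a temporary array and copying it), here is a minimal sketch. `write_positions` and its placeholder "animation" (a constant offset) are hypothetical; the destination pointer stands in for whatever `glMapBuffer` returns, and the GL calls appear only in a comment because they need a live context.

```c
#include <assert.h>
#include <stddef.h>

/* Writes count animated positions straight into dst, front to back.
 * dst stands in for the pointer returned by
 * glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY). It is only ever written,
 * never read: reads from write-only mapped memory may be slow or fail.
 * The "animation" is a placeholder offset, not a real skinning step. */
void write_positions(float *dst, const float *rest, size_t count,
                     const float offset[3])
{
    assert(dst != NULL && rest != NULL && offset != NULL);
    for (size_t i = 0; i < count; ++i) {
        /* Compute into locals first so nothing is read back from dst. */
        const float x = rest[3 * i + 0] + offset[0];
        const float y = rest[3 * i + 1] + offset[1];
        const float z = rest[3 * i + 2] + offset[2];
        dst[3 * i + 0] = x;
        dst[3 * i + 1] = y;
        dst[3 * i + 2] = z;
    }
}

/* Intended use inside a GL context (not compiled here):
 *   glBindBuffer(GL_ARRAY_BUFFER, vbo);
 *   float *dst = (float *)glMapBuffer(GL_ARRAY_BUFFER, GL_WRITE_ONLY);
 *   if (dst) {
 *       write_positions(dst, rest, count, offset);
 *       glUnmapBuffer(GL_ARRAY_BUFFER);
 *   }
 */
```

Whether this beats a plain copy depends on the driver, so it is worth measuring on the target hardware.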
@PhilMorton:
The problem is that I use a complex animation system. For example, my dragon player model weighs in at roughly 410 weight matrices ( from over 100 bones with vertex bone weights ). While I could do a float-texture hack there to calculate the vertices, I am at a complete loss as far as normals and tangents go. I have to calculate them all over again from the transformed vertices, as there is no way to produce a weight matrix for those ( a simple example situation shows the impossibility immediately ). Hence transforming on the GPU would not be impossible, but it would require heavy tricks with float textures and various GLSL shaders. This approach would cost a huge amount of texture memory, and I don't know if the speed would really catch up in the end.
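For readers unfamiliar with the step described here, this is a minimal sketch of recalculating normals from already transformed vertices on the CPU. It is one common approach (area-weighted face normals summed per vertex), not necessarily what the poster's engine does. `accumulate_normals` and the flat array layout are assumptions, and tangents are not covered.

```c
#include <assert.h>
#include <stddef.h>

/* Sums area-weighted face normals into each vertex of an indexed
 * triangle mesh.
 * positions: 3 floats per vertex, already transformed (skinned).
 * indices:   3 vertex indices per triangle.
 * normals:   output, 3 floats per vertex, left UNNORMALIZED; normalize
 *            afterwards, for example in the vertex shader. */
void accumulate_normals(const float *positions, size_t vertex_count,
                        const unsigned *indices, size_t triangle_count,
                        float *normals)
{
    for (size_t i = 0; i < 3 * vertex_count; ++i)
        normals[i] = 0.0f;

    for (size_t t = 0; t < triangle_count; ++t) {
        const size_t ia = indices[3 * t + 0];
        const size_t ib = indices[3 * t + 1];
        const size_t ic = indices[3 * t + 2];
        assert(ia < vertex_count && ib < vertex_count && ic < vertex_count);

        const float *a = positions + 3 * ia;
        const float *b = positions + 3 * ib;
        const float *c = positions + 3 * ic;

        /* Two edges of the triangle, both starting at vertex a. */
        const float e1[3] = { b[0] - a[0], b[1] - a[1], b[2] - a[2] };
        const float e2[3] = { c[0] - a[0], c[1] - a[1], c[2] - a[2] };

        /* Cross product: its length is twice the triangle area, so
         * larger faces contribute more to the sum. */
        const float n[3] = {
            e1[1] * e2[2] - e1[2] * e2[1],
            e1[2] * e2[0] - e1[0] * e2[2],
            e1[0] * e2[1] - e1[1] * e2[0]
        };

        for (int k = 0; k < 3; ++k) {
            normals[3 * ia + k] += n[k];
            normals[3 * ib + k] += n[k];
            normals[3 * ic + k] += n[k];
        }
    }
}
```

This touches every triangle each frame, which is part of why the per-frame CPU cost discussed in this thread grows with mesh resolution.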

@christian:
And there I heard before that memory mapping is worse than copying an array: so which is true? Furthermore, I am in OpenGL here. I don't know where there would be a "write-only" mode and such things. You can only set STATIC or STREAM modes ( 3 in total ).

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

Quote:Original post by christian h
VBO might stand out as better approach, if you go deeper and do stuff like caching, multiple passes per frame. And you shouldn't copy an array of vertices to the VBO, but calculate the data directly to it using memory-mapping.
ch.


If you use glBufferSubData, then your latter method is not possible.
I have read that ATI prefers this over glMapBuffer. I don't really know which method is better.

The OP can make a dynamic VBO.

glBindBuffer(...., VBOID);
glBufferData(..., ..., ..., GL_STREAM_DRAW);
or
glBufferData(..., ..., ..., GL_DYNAMIC_DRAW);

STREAM means you will change very often: change, draw, change, draw
DYNAMIC means you will change less often: change, draw, draw, change, draw, draw, draw, draw, draw

but these are hints to the driver.
For some drivers, STREAM and DYNAMIC may be the same thing.
Sig: http://glhlib.sourceforge.net
an open source GLU replacement library. Much more modern than GLU.
float matrix[16], inverse_matrix[16];
glhLoadIdentityf2(matrix);
glhTranslatef2(matrix, 0.0, 0.0, 5.0);
glhRotateAboutXf2(matrix, angleInRadians);
glhScalef2(matrix, 1.0, 1.0, -1.0);
glhQuickInvertMatrixf2(matrix, inverse_matrix);
glUniformMatrix4fv(uniformLocation1, 1, FALSE, matrix);
glUniformMatrix4fv(uniformLocation2, 1, FALSE, inverse_matrix);
Quote:Original post by V-man
Quote:Original post by christian h
VBO might stand out as better approach, if you go deeper and do stuff like caching, multiple passes per frame. And you shouldn't copy an array of vertices to the VBO, but calculate the data directly to it using memory-mapping.
ch.


If you use glBufferSubData, then your latter method is not possible.
I have read that ATI prefers this over glMapBuffer. I don't really know which method is better.


Yes, glBufferSubData is faster on modern hardware with the latest drivers. The map method is older and generally slower. I have actually seen FPS drop because of mapping VBOs.



++ My::Game ++
So if I have a preallocated VBO, is using glBufferSubData for the entire range faster than glBufferData on a modern system?

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

Quote:Original post by RPTD
So if I have a preallocated VBO, is using glBufferSubData for the entire range faster than glBufferData on a modern system?


I will indirectly answer your question because frankly I am not sure how glBufferSubData and glBufferData are managed by the drivers and the GPUs internally.

1. glBufferData allocates memory every time you call it.

2. glBufferSubData updates a part of the data and does no memory allocation or deallocation, so it should be faster.
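The two points above can be pictured with a toy CPU-side model. This is only an illustration of the described behavior, not how any driver is actually implemented; `ToyBuffer`, `toy_buffer_data` and `toy_buffer_sub_data` are hypothetical names that merely mirror the argument order of the GL calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a buffer object's data store (illustration only). */
typedef struct {
    unsigned char *storage;
    size_t size;
    unsigned allocations; /* how often the store was (re)created */
} ToyBuffer;

/* Models point 1: like glBufferData, creates a new data store every call.
 * data may be NULL to allocate without initializing. Returns 0 on success. */
int toy_buffer_data(ToyBuffer *b, size_t size, const void *data)
{
    assert(b != NULL);
    unsigned char *fresh = malloc(size ? size : 1);
    if (fresh == NULL)
        return -1;
    if (data != NULL && size > 0)
        memcpy(fresh, data, size);
    free(b->storage); /* the previous store is thrown away */
    b->storage = fresh;
    b->size = size;
    b->allocations++;
    return 0;
}

/* Models point 2: like glBufferSubData, overwrites a range in place and
 * never allocates. Returns -1 if the range does not fit, where the real
 * call would raise GL_INVALID_VALUE. */
int toy_buffer_sub_data(ToyBuffer *b, size_t offset, size_t size,
                        const void *data)
{
    assert(b != NULL);
    if (offset > b->size || size > b->size - offset)
        return -1;
    if (size > 0)
        memcpy(b->storage + offset, data, size);
    return 0;
}
```

In real code the matching pattern would be one `glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW)` at setup and then `glBufferSubData(GL_ARRAY_BUFFER, 0, size, data)` each frame. Whether that is actually faster is driver dependent, as the test results in this post suggest.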

Now the results. I made some changes in our engine's renderer, which uses VBOs whenever possible and falls back on vertex arrays if VBO support is absent. I made sure the renderer was using VBOs and then tried replacing glBufferSubData and glBufferData. There was a drop in speed, but the results are far from conclusive. Also, I currently have only a single GPU to test the code on, so I can't really say whether glBufferSubData was of any real value. The other reason may be that the engine batches data aggressively, so no real difference is noticeable. I need to test more with animated meshes and a bunch of other stuff.

Then I tried using vertex arrays instead of VBOs on the same GPU, and speed dropped considerably. This again is with the entire engine and not with one particular case like the one you are interested in ("With higher resolution meshes").
++ My::Game ++
What do you mean exactly by "batching" in this context? ( just to see if I have something similar to compare results )

Life's like a Hydra... cut off one problem just to have two more popping out.
Leader and Coder: Project Epsylon | Drag[en]gine Game Engine

This topic is closed to new replies.
