OpenGL 3.0+ And VAOs

5 comments, last by Vincent_M 9 years, 7 months ago

I'm finally learning OpenGL above 2.1, which would require some extra driver knowledge in Linux, and a Hail Mary from Apple in regards to OpenGL 4.0 and above. What I'm wondering is: what are the important feature differences between OpenGL 3.x and OpenGL 4.x? My guess is that OpenGL 4.2 (I think) provides geometry shaders, which allow for hardware batching. In other words, if I had a model of a character, I could render dozens of instances of that character in 1 draw call per mesh in that model. Another is compute shaders in OpenGL 4.3, which are a nice replacement for OpenCL, more of an equivalent to DirectX 11, and possibly VERY useful for processing audio samples for interesting DSP effects that'd typically be handled by the motherboard's audio hardware. There are also 3D textures, and better techniques for rendering volumetric clouds, from what I've heard. Are there any other interesting features to look out for while I learn about OpenGL?

Now, how do vertex array objects (VAOs) work, exactly? From what I've read so far, they preserve vertex state, and by vertex state, I think it means the state of which vertex arrays are enabled. For example, say I have a model that is composed of 5 meshes, and the vertex format for all meshes is the same: position, texture coordinate and normal. When setting up my vertex array, I'd generate a VAO, bind it, enable the first 3 vertex attribute arrays, then unbind. Now, when I wanted to draw the model, I'd just bind that VAO again, bind my VBOs containing the vertex data, and call glDrawElements(). I no longer need to call glEnableVertexAttribArray() or glDisableVertexAttribArray() whenever I draw something because the VAO I've just bound preserves which vertex arrays to enable/disable --effectively batching, or rather, caching those calls into a single gl* call.

Then, there are VBOs... VBOs are completely separate from VAOs. A VBO must be generated per vertex attribute, whether they're separate arrays, or interleaved via structures, blobs, etc. Then, I may have an IBO (index buffer object) if my vertices are indexed, but again, that has nothing to do with VAOs. VAOs only cache which vertex attribute arrays are enabled. Is that correct?

NOTE: If this is correct, would it make sense to not store VAOs on a per-model basis, but on a per-graphics-context basis instead? If I have 5 different models that all happen to have the same number of vertex attribute arrays enabled, then I'd create 1 VAO that enables the first 3 vertex attribute arrays, bind it once, render all instances of all 5 models, then bind another VAO that uses a different number of arrays.

EDIT: I think I just realized something. So, I'd generate a new VAO, then bind it to configure it. At this point, I'd enable all the attribute arrays needed, and then generate, bind and fill my VBOs/IBOs. Then, I'd also set up glVertexAttribPointer() per attribute to specify the starting address for each attribute in the VBO, or VBOs if I'm going the array-per-attribute route. Finally, I'd unbind for safety. Then, when I want to draw something, it's a matter of setting the correct shader, setting the uniforms (probably with UBOs, but I haven't read that far yet), binding the VAO, and then drawing with glDrawArrays() or glDrawElements(). So, VAOs would greatly reduce the number of gl* calls by caching these commands in a VAO, which serves similarly to a mini-command buffer that could be modified or called on the fly. If this is correct, does binding a VAO introduce any type of scope for binding VBOs? For example, if I bound a VBO while a VAO is bound, once I bound VAO zero, would it revert the currently-bound VBO to whatever VBO was bound when I wasn't in VAO scope? Does all of this sound about right?


You don't have to, and in many cases don't want to, use UBOs. I think the support is still very shaky, but I actually don't know the specifics. I have limited environments to test on.

Here is my VAO implementation:

https://github.com/fwsGonzo/library/blob/master/include/library/opengl/vao.hpp

https://github.com/fwsGonzo/library/blob/master/library/opengl/vao.cpp

Just like you said:

generate VAO

bind VAO

note that you don't want to set up attribs here, because you have no VBO bound yet

glVertexAttribPointer records whichever VBO is bound to GL_ARRAY_BUFFER at the time of the call, which means you can have several VBOs with vertex data

generate VBO & IBO

bind VBO

upload data

enable attribs and set their pointers (use offsetof(struct, x) for the offsets)

(potentially bind IBO & upload data)

done. no need to unbind anything.
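As a rough illustration of the offsetof step above, here is a minimal sketch that only computes the stride and byte offsets for an interleaved vertex. The `Vertex` struct, `AttribLayout`, `vertexLayout()` and `layoutFor()` are hypothetical names made up for this example (they are not taken from the linked library), and no GL call is made here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>  // offsetof, std::size_t

// Hypothetical interleaved vertex: position, texture coordinate, normal.
struct Vertex {
    float position[3];
    float texcoord[2];
    float normal[3];
};

// One entry per attribute: the values you would pass on to
// glVertexAttribPointer(index, components, GL_FLOAT, GL_FALSE, stride, offset).
struct AttribLayout {
    unsigned    index;       // attribute location used by the shader
    int         components;  // number of floats in the attribute
    std::size_t stride;      // bytes between consecutive vertices
    std::size_t offset;      // byte offset of the attribute inside Vertex
};

// All three attributes share one stride because they live in the same struct.
inline std::array<AttribLayout, 3> vertexLayout() {
    return {{
        {0, 3, sizeof(Vertex), offsetof(Vertex, position)},
        {1, 2, sizeof(Vertex), offsetof(Vertex, texcoord)},
        {2, 3, sizeof(Vertex), offsetof(Vertex, normal)},
    }};
}

// Convenience lookup for a single attribute.
inline AttribLayout layoutFor(unsigned index) {
    const auto all = vertexLayout();
    assert(index < all.size());
    return all[index];
}
```

With a VAO and the VBO bound, each entry would be passed to glEnableVertexAttribArray and glVertexAttribPointer, with the offset cast to a pointer.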

if you are using a wrapper for VAOs, FBOs, Textures and Shaders, these wrappers should manage this for you

Note that my implementation isn't 100% perfect. I even spotted grey areas just skimming through it right now, e.g. indexes() doesn't do a bind() to guarantee that the IBO bind works correctly. But it will hopefully give you an idea of how it all works.

When you upload data to a VBO you have a choice between GL_STATIC_DRAW and GL_STREAM_DRAW, the former for when your mesh is static and the latter for when you are re-uploading the data frequently. There are other flags, but afaik the drivers don't care.

So, with all that said, here are some tips:

1. You never really disable an attrib array, as you would just instead use a shader that doesn't utilize the specific attribute.

2. You should avoid unbinding anything, unless you absolutely have to.

3. Don't fall into the immediate mode trap for screenspace shaders, as suddenly glEnable(old_shit) matters, like GL_TEXTURE_2D.

I avoided this trap myself by having a very useful createScreenspace() function in my VAO implementation. :) Laziness > all.

Yes, when you unbind a VAO, you are suddenly back in old/VBO territory with gl*Pointer stuff, I guess. If you are in compatibility mode, like most people are.

What I'm wondering is: what are the important feature differences between OpenGL 3.x and OpenGL 4.x?

The main differences between the latest OpenGL 3 and 4 versions off the top of my head are...

  • Tessellation shaders
  • Compute shaders
  • Support for 64-bit floats (doubles) in shaders
  • Separable shader objects - you essentially mix and match shaders in different parts of the graphics pipeline, similar to D3D.
  • Direct state access - no longer have to bind-to-edit
  • Shader storage buffer objects (shader-readable/-writeable memory buffers)
  • Indirect rendering
  • Immutable buffers and textures

Direct state access is part of OpenGL 4.5, which came out just a few weeks ago, so unless you have a relatively new Nvidia card, it won't be available to you. For rendering multiple instances of the same model, you would use regular instancing (e.g. glDrawArraysInstanced, glDrawElementsInstanced, etc). With an array of model matrices as uniforms, and the gl_InstanceID variable in your vertex shader, you can then index into the array of matrices to position each instance differently.
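To make the instancing suggestion a bit more concrete, here is a small sketch of the CPU side plus the matching vertex shader. The names (`instancedVertexShader`, `instanceMatrices`, `viewProj`, `models`) and the limit of 32 instances are assumptions picked for this example. The shader is only shown as a string and is not compiled here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major, as OpenGL expects

// Vertex shader that picks a model matrix per instance via gl_InstanceID.
// The array size of 32 is an arbitrary choice for this sketch.
inline const char* instancedVertexShader() {
    return R"(#version 330 core
layout(location = 0) in vec3 position;
uniform mat4 viewProj;
uniform mat4 models[32];
void main() {
    gl_Position = viewProj * models[gl_InstanceID] * vec4(position, 1.0);
}
)";
}

// Builds a translation matrix in column-major order.
inline Mat4 translation(float x, float y, float z) {
    return Mat4{1.0f, 0.0f, 0.0f, 0.0f,
                0.0f, 1.0f, 0.0f, 0.0f,
                0.0f, 0.0f, 1.0f, 0.0f,
                x,    y,    z,    1.0f};
}

// One model matrix per instance, spaced along the X axis.
inline std::vector<Mat4> instanceMatrices(std::size_t count, float spacing) {
    assert(count <= 32);  // must not exceed the array size in the shader
    std::vector<Mat4> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(translation(spacing * static_cast<float>(i), 0.0f, 0.0f));
    return out;
}
```

The matrices would then be uploaded with glUniformMatrix4fv(location, count, GL_FALSE, data) and drawn with a single glDrawElementsInstanced call whose last argument is the instance count.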

Now, how do vertex array objects (VAOs) work, exactly? [...] Then, there are VBOs..

Think of VAOs as containers for vertex attributes (and I guess for convenience, an index buffer). Each vertex attribute then describes where to fetch its data from, how much data to read each time, how many bytes to skip between each element, and so forth. And you can have multiple vertex attributes, like your position, texture coordinates, or even arbitrary data that is needed per-vertex (or per-instance*). So the VAO contains all of that information. Every time you bind the VAO, all this information is used in the subsequent draw calls until you bind a different VAO. I have found some drivers are a bit buggy in that they don't keep the index buffer, so you might need to rebind your index buffer every time you bind your VAO as well...

* Vertex attributes can be per-instance by using glVertexAttribDivisor, which tells GL to advance the attribute read-pointer every N instances.

Where do VBOs come into it, you ask? Each vertex attribute has a "data source" which is your VBO, so you can use a single vertex buffer for all your attributes, or use a different vertex buffer for each attribute, or a mixture.
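The following toy model is one way to picture which pieces of state live in a VAO and which do not. It is not OpenGL code: `AttribState`, `VaoState` and `ContextState` are hypothetical types that only imitate the state layout. The point it tries to show is that the GL_ARRAY_BUFFER binding is context state that glVertexAttribPointer snapshots into the attribute, while the GL_ELEMENT_ARRAY_BUFFER binding is stored in the VAO itself.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only: these types imitate the state layout, they are not OpenGL.
struct AttribState {
    bool        enabled = false;  // set by glEnableVertexAttribArray
    unsigned    buffer  = 0;      // VBO captured by glVertexAttribPointer
    std::size_t stride  = 0;
    std::size_t offset  = 0;
};

struct VaoState {
    std::array<AttribState, 16> attribs{};  // 16 = minimum GL_MAX_VERTEX_ATTRIBS
    unsigned elementBuffer = 0;             // index buffer binding lives in the VAO
};

struct ContextState {
    unsigned  arrayBuffer = 0;    // GL_ARRAY_BUFFER binding: context state, not VAO state
    VaoState* vao = nullptr;      // currently bound VAO

    void bindVertexArray(VaoState* v) { vao = v; }
    void bindArrayBuffer(unsigned vbo) { arrayBuffer = vbo; }
    void bindElementBuffer(unsigned ibo) { assert(vao); vao->elementBuffer = ibo; }
    void enableAttrib(unsigned i) { assert(vao); vao->attribs.at(i).enabled = true; }

    void attribPointer(unsigned i, std::size_t stride, std::size_t offset) {
        assert(vao);
        AttribState& a = vao->attribs.at(i);
        a.buffer = arrayBuffer;   // snapshot of whichever VBO is bound right now
        a.stride = stride;
        a.offset = offset;
    }
};
```

In this model, switching VAOs leaves `arrayBuffer` untouched, so binding a VAO does not create a scope that restores an earlier VBO binding. The attributes simply remember the buffer they were pointed at.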

Perhaps beyond the scope of what you need or intend to do (but I'll add it anyway because I think it's something to consider), is a different way of thinking about VAOs which I came across a few months ago [1]. If instead of creating a VAO per-object, you create a VAO per-vertex format, you can reduce the number of glBindVertexArray calls (which in the driver would reduce the number of buffer changes). In order to do this, you would need to create a very large vertex buffer (a few tens of megabytes) and store in it all your models that have the same vertex format. Each model (or model sub-mesh) would then also need a base vertex, which is the "offset" in the VBO to start rendering from. So instead of Bind VAO, Draw, Bind VAO, Draw, Bind VAO, Draw, you now end up with Bind VAO, Draw, Draw, Draw, which not only cuts your GL calls in half pretty much, but also the number of potential buffer switches.
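A minimal sketch of the bookkeeping this implies, assuming a simple bump allocator that never frees. `MeshRange` and `SharedMeshBuffer` are hypothetical names for this example. Only the offsets are tracked here, and the actual buffer uploads are left out.

```cpp
#include <cassert>
#include <cstddef>

// Where one mesh lives inside the shared vertex and index buffers.
struct MeshRange {
    std::size_t baseVertex;  // first vertex of the mesh in the shared VBO
    std::size_t firstIndex;  // first index of the mesh in the shared IBO
    std::size_t indexCount;  // number of indices to draw
};

// Hypothetical bump allocator: hands out consecutive ranges, never frees.
class SharedMeshBuffer {
public:
    SharedMeshBuffer(std::size_t vertexCapacity, std::size_t indexCapacity)
        : vertexCapacity_(vertexCapacity), indexCapacity_(indexCapacity) {}

    // Reserves room for one mesh and returns where it was placed.
    MeshRange add(std::size_t vertexCount, std::size_t indexCount) {
        assert(usedVertices_ + vertexCount <= vertexCapacity_);
        assert(usedIndices_ + indexCount <= indexCapacity_);
        const MeshRange range{usedVertices_, usedIndices_, indexCount};
        usedVertices_ += vertexCount;
        usedIndices_ += indexCount;
        return range;
    }

private:
    std::size_t vertexCapacity_;
    std::size_t indexCapacity_;
    std::size_t usedVertices_ = 0;
    std::size_t usedIndices_ = 0;
};
```

Each mesh could then be drawn with glDrawElementsBaseVertex (core since OpenGL 3.2), passing `firstIndex` scaled by the index size as the byte offset and `baseVertex` as the last argument, so the mesh's own indices can stay zero-based.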

Eventually, you see the same can be applied to UBOs as well. Create a large UBO, and describe each 'chunk' with an offset and size. You can take it even further, and allocate a single large buffer, and use different ranges of it as your VBO, IBO and UBO! At this point, you're basically managing your own GPU buffer memory.
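One detail worth keeping in mind when carving a buffer into chunks: when a range is bound as a uniform buffer with glBindBufferRange, its offset has to be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, which is queried with glGetIntegerv. Below is a minimal sketch of an aligned range allocator. `BufferRange`, `alignUp` and `RangeAllocator` are hypothetical names, and the alignment is passed in rather than queried.

```cpp
#include <cassert>
#include <cstddef>

// A chunk of the big buffer, described by offset and size in bytes.
struct BufferRange {
    std::size_t offset;
    std::size_t size;
};

// Rounds value up to the next multiple of alignment.
inline std::size_t alignUp(std::size_t value, std::size_t alignment) {
    assert(alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical bump allocator over one large buffer; it never frees.
class RangeAllocator {
public:
    explicit RangeAllocator(std::size_t capacity) : capacity_(capacity) {}

    // Returns a range whose offset honours the requested alignment.
    BufferRange allocate(std::size_t size, std::size_t alignment) {
        const std::size_t offset = alignUp(used_, alignment);
        assert(offset + size <= capacity_);
        used_ = offset + size;
        return BufferRange{offset, size};
    }

private:
    std::size_t capacity_;
    std::size_t used_ = 0;
};
```

For example, with an alignment of 256 (the real value depends on the driver), a 100-byte chunk followed by a 64-byte chunk would land at offsets 0 and 256.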

[1] http://www.ogre3d.org/forums/viewtopic.php?p=506783&sid=f629b3848582844ecb131a120ba21659#p506783 The poster, gsellers, is Graham Sellers from AMD.

Alright, thanks guys. I think I'm getting the hang of it. I've been busy the last 2 weeks with work and the gym, so I've rarely had the time to reply back, let alone test it out. I was able to try out VAOs and VBOs yesterday, and things are starting to click.


So, with all that said, here are some tips:
1. You never really disable an attrib array, as you would just instead use a shader that doesn't utilize the specific attribute.
2. You should avoid unbinding anything, unless you absolutely have to.
3. Don't fall into the immediate mode trap for screenspace shaders, as suddenly glEnable(old_shit) matters, like GL_TEXTURE_2D.
I avoided this trap myself by having a very useful createScreenspace() function in my VAO implementation. Laziness > all.

Thanks for clarifying about the unbinding part --that makes sense. By "screenspace shader", are you talking about post-processing? Also, does the OpenGL 4.x core spec eventually get rid of glEnable()/glDisable() entirely?

@Xycaleth, you bring up a good point on storing everything on a per-format basis. This could reduce the amount of gl* calls, which is always a good thing. These objects might have to be divided up into draw calls due to other factors such as drawing with/without depth, with/without blending, with/without lighting, etc. Btw,


Perhaps beyond the scope of what you need or intend to do (but I'll add it anyway because I think it's something to consider), is a different way of thinking about VAOs which I came across a few months ago [1]. If instead of creating a VAO per-object, you create a VAO per-vertex format, you can reduce the number of glBindVertexArray calls (which in the driver would reduce the number of buffer changes).

This also brings up another question I was wondering: do VAOs provide more efficiency, or are they there for convenience for programmers? It sounds like VAOs are more of a shortcut for programmers to draw stuff to the screen without having to worry about enabling the correct attribute arrays, setting pointers, binding buffers, etc. Instead, VAOs do that for us, obviously, but under the hood, are VAOs really the equivalent of us doing that ourselves, meaning they increase programmer productivity instead of GPU performance? Or, is it caching the commands in a batched way similar to where GL 4.5's DSA methodology will be taking us?

At this point, I'm all theory though! I've been reading quite a bit online, books and making posts. I really need to make time to sit down, and write code lol.

EDIT: I noticed the Graham Sellers link you posted after writing this, and I'm starting to think that VAOs are in fact what my theory was:


Traditional APIs which generally have a function call per state change encourage bad behavior as seen by the GPU. Wrapping blobs of state into state objects or pushing the work of building them onto other threads only addresses the CPU side of the problem. The GPU still eats the same work. In some cases, it will eat more - the big, monolithic state object approach is likely to push a lot of redundancy into the pipe because a large number of states will be the same between objects.

I should have mentioned this before, but my theory is that if VAOs are merely there for productivity, then there could be more GPU overhead because now you have the VAO buffer that's eating up precious video memory, and yet another buffer swap to deal with, but it should cut down on CPU-side overhead as fewer gl* calls are being made. Is this correct?

EDIT 2: Back in my OpenGL ES 2.0 days, I didn't really mess with gl buffers much. Now that I've read the Graham Sellers article, I'm starting to realize that they can be looked at as just another memory map. My uber shader methodology, as nasty as it was, sounds like it's still the fastest alternative. In fact, it sounds like some of OpenGL 4's features don't really make OpenGL 4 much faster in terms of performance except maybe batching... Does OpenGL 4.3's batching features help with that?

Also, does the OpenGL 4.x core spec eventually get rid of glEnable()/glDisable() entirely?

Starting with the introduction of the core spec, some glEnable/glDisable enums are no longer relevant. The reason for this was the move to a programmable pipeline. Take texturing for example. In a fixed function pipeline, you can bind a texture, specify texture coordinates, specify vertex colours, but it's up to you to tell the API whether you want to use texturing by using glEnable(GL_TEXTURE_2D). Compare this with the programmable pipeline: if you don't want to use texturing, then your shaders will not use any texture sampling functions. If you do want to use texturing, then the shaders will use the sampling functions.

This also brings up another question I was wondering: do VAOs provide more efficiency, or are they there for convenience for programmers?

VAOs are purely a software feature (as far as I've seen), that is, the GPU doesn't have any knowledge of them. They're supposed to cut down on time spent validating the vertex attributes, switching buffers, but YMMV. Here's a good write up on when benefits can be seen or not seen: http://www.openglsuperbible.com/2013/12/09/vertex-array-performance/


In fact, it sounds like some of OpenGL 4's features don't really make OpenGL 4 much faster in terms of performance except maybe batching... Does OpenGL 4.3's batching features help with that?

If by batching, you mean instancing, then this has been available since 3.1. I'm not sure what else you could mean :P

A minor comment I would add:

There are many new things in OpenGL which help you reduce bugs too. The fewer states you worry about the better.

Even so, many of these things were already solved by creating your own wrapper classes that deal with all of this, and it continues to be true now.

The new features in 4.x allow more batching, so you have to investigate whether or not you can rewrite parts of your pipeline to utilize these new features, or whether you should just keep using the old proven way. There are some new ways of batching though, which I think are easier to leverage (short term) than, say, going for a full AZDO approach.

Look at the AZDO presentation (google) to see which order you should render things in, then figure out which features make sense for you and go from there.

Short of using any synchronizing functions (such as glGet*) that stall the entire pipeline, you're going to be fine. AZDO requires GL 4.4 btw. I think.

Advice about minimizing state changes and batching as much as possible is always true, but it's really only to help programmers make good architectural decisions.



Vincent_M, on 31 Aug 2014 - 3:21 PM, said:
This also brings up another question I was wondering: do VAOs provide more efficiency, or are they there for convenience for programmers?
VAOs are purely a software feature (as far as I've seen), that is, the GPU doesn't have any knowledge of them. They're supposed to cut down on time spent validating the vertex attributes, switching buffers, but YMMV. Here's a good write up on when benefits can be seen or not seen: http://www.openglsuperbible.com/2013/12/09/vertex-array-performance/

I did see that post, and it looks like there are efficiency benefits for VAOs, but if it's just software, then I find it kind of unnecessary outside of it being forced upon you in OpenGL 4.x. My own state manager was a wrapper for whenever I switched FBOs, shader programs, VBOs, textures, glEnable/Disable, and enabling/disabling vertex arrays. The way the vertex array portion worked was that whenever I swapped my shader, and my GraphicsContext class recognized it as swapping to a different shader than the one currently in use, it'd enable/disable only the vertex arrays that differ from the last bound shader, because GraphicsContext also keeps its own client-side set of bools to track which attribute arrays are currently active.

For example, let's just say my currently-bound shader only requires 1 vertex attribute array enabled, so only array 0 would be activated. Then, let's say later on in the frame I need to activate my lit-and-textured shader that takes 3 attribute arrays. It'd activate arrays 1 and 2 only since 0 was already activated. Then, when the next frame is drawn, and I need to go back to the single attribute array shader, it'll swap, and deactivate attribute arrays 1 and 2. This is simple to the user drawing something because all they have to do is call GraphicsContext::UseProgram(Shader *shader), and pass in the shader object they require. Now, I'm not sure how efficient the software implementation is, but if my objects were grouped up by shader, then by state, etc you're really not calling glEnableVertexAttribArray()/Disable too much! Now, glVertexAttribPointer() gets called per legit shader swap, however, but there are ways of further optimizing that using the massive VBO buffer mentioned above, and also referenced in Graham Sellers' post above.
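The enable/disable difference described above could be computed roughly like this. This is only a sketch of the idea, not the actual GraphicsContext code: it uses a bitmask instead of a set of bools, and `AttribChanges` and `diffAttribArrays` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Which attribute arrays need a glEnableVertexAttribArray or
// glDisableVertexAttribArray call when switching shaders.
struct AttribChanges {
    std::vector<unsigned> toEnable;
    std::vector<unsigned> toDisable;
};

// Bit i of each mask means "attribute array i is active / is wanted".
inline AttribChanges diffAttribArrays(std::uint32_t active,
                                      std::uint32_t wanted,
                                      unsigned maxAttribs = 16) {
    assert(maxAttribs <= 32);  // the masks only have 32 bits
    AttribChanges changes;
    for (unsigned i = 0; i < maxAttribs; ++i) {
        const std::uint32_t bit = std::uint32_t{1} << i;
        const bool isActive = (active & bit) != 0;
        const bool isWanted = (wanted & bit) != 0;
        if (isWanted && !isActive) changes.toEnable.push_back(i);
        if (!isWanted && isActive) changes.toDisable.push_back(i);
    }
    return changes;
}
```

Going from the 1-attribute shader (mask 0x1) to the 3-attribute shader (mask 0x7) yields arrays 1 and 2 to enable and nothing to disable. Going back yields the reverse.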


Look at the AZDO presentation (google) to see which order you should render things in, then figure out which features make sense for you and go from there.
Short of using any synchronizing functions (such as glGet*) that stalls the entire pipeline, you're going to be fine. AZDO requires GL 4.4 btw. I think.

Ironically, I haven't needed to use any glGet* functions outside of glGetString(GL_VERSION) at startup to print the implementation string for logging purposes. The guys over at Steam mentioned in their video regarding porting their engine over from DirectX to OpenGL that their Source Engine uses glGet* for nearly every state query they need as they believe that all state systems deviate, at least slightly. I can see how this is true in some cases of the OpenGL state, but when it comes to things such as glEnable/Disable, writing a wrapper for setting/getting has always worked for me. Of course, my engine only assumes single-context rendering...

But yeah, GraphicsContext::SetGLState(unsigned int state, bool enable) -> pass in ANYTHING, and internally, it'll check whether that state's value is already in an STL vector. If enabling, and the state isn't in the vector yet, it calls glEnable and adds it to the vector of states. If disabling, and the state is in the vector, it removes it from the vector and calls glDisable. The method even returns a bool indicating whether it actually changed state or not. Same with GraphicsContext::UseProgram(Shader *shader), GraphicsContext::SetActiveTexture(int target, Texture *texture), I have one for FBOs, etc.
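A minimal sketch of that kind of enable/disable cache, under a few assumptions: it uses a std::unordered_set instead of a vector, and the actual glEnable/glDisable calls are injected as callbacks so the sketch does not need a GL context. `StateCache` is a hypothetical name, not the GraphicsContext class described above.

```cpp
#include <cassert>
#include <functional>
#include <unordered_set>
#include <utility>

// Remembers which capabilities are enabled and skips redundant calls.
class StateCache {
public:
    using Call = std::function<void(unsigned)>;

    // In real use the callbacks would forward to glEnable and glDisable.
    StateCache(Call enable, Call disable)
        : enable_(std::move(enable)), disable_(std::move(disable)) {
        assert(enable_ && disable_);
    }

    // Returns true only when a call was actually issued.
    bool set(unsigned state, bool on) {
        if (on) {
            if (!enabled_.insert(state).second) return false;  // already enabled
            enable_(state);
        } else {
            if (enabled_.erase(state) == 0) return false;      // already disabled
            disable_(state);
        }
        return true;
    }

private:
    Call enable_;
    Call disable_;
    std::unordered_set<unsigned> enabled_;
};
```

One caveat with this sketch: it assumes every capability starts out disabled, which is not true for all of them (GL_DITHER and GL_MULTISAMPLE are initially enabled), so a real version would need to seed the set accordingly.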

This cut down quite a bit of gl* calls in general on mobile devices using OpenGL ES 2.0, and I could assume it'll only do more justice on desktop environments with instancing.
