OpenGL Instancing Demystified

By Martin Thomas | Published Aug 08 2013 09:20 AM in OpenGL
Peer Reviewed by (slicer4ever, Dave Hunt, ivan.spasov)


When I tried to implement instancing some time ago, I found that almost no tutorials or articles had been written about it. Much like everything a bit beyond OpenGL 2.1, this seems to be a bit of a taboo among OpenGL programmers. Everyone knows about it, yet few actually use it, despite it being easy to do. Not anymore.

Instancing for everyone


History


Instancing became a core feature of OpenGL with version 3.1 in 2009, by way of the ARB_draw_instanced extension. At that time you could only use Texture Buffer Objects (TBOs) or Uniform Buffer Objects (UBOs) to deliver your per-instance data to the shaders.

A year later, in 2010, OpenGL 3.3 arrived and made the ARB_instanced_arrays extension a core feature. With this addition you could use actual Vertex Buffer Objects (VBOs) to deliver your data. Yay!
There is a restriction, however: the number of vertex attributes you can pass to your vertex shader is limited. The specification guarantees at least 16 (query GL_MAX_VERTEX_ATTRIBS for the actual limit), which makes it 16 * vec4 = 64 float values.

In 2010 ARB_draw_indirect (OpenGL 4.0) also made it to the core. It enables you to pass the parameters of a draw call indirectly, that is, from a buffer object instead of function arguments (glDrawArraysIndirect and glDrawElementsIndirect).

In 2011 ARB_base_instance (OpenGL 4.2) became a core feature too. This allows you to specify a half-open range [x...y) of what instance data you would like to draw.
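
As a rough illustration of the half-open range idea: glDrawElementsInstancedBaseInstance takes an instance count and a base instance rather than a begin/end pair, so a range [x, y) has to be converted first. The struct and function names below are made up for this sketch, and no OpenGL call is actually issued. Keep in mind that the base instance only offsets the instanced vertex attributes; gl_InstanceID in the shader still starts at 0.

```cpp
#include <cassert>
#include <cstdint>

// The two instancing parameters of glDrawElementsInstancedBaseInstance.
// (Hypothetical helper type, not part of OpenGL.)
struct InstanceRange
{
  std::uint32_t base_instance;  // first instance data element to read
  std::uint32_t instance_count; // how many instances to draw
};

// Converts a half-open range [first, last) of instance data elements
// into the (base instance, instance count) pair the draw call expects.
InstanceRange to_instance_range( std::uint32_t first, std::uint32_t last )
{
  assert( first <= last ); // an empty range is fine, a reversed one is not
  return InstanceRange{ first, last - first };
}
```

For example, drawing the instances [100, 250) would mean passing an instance count of 150 and a base instance of 100.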

ARB_transform_feedback_instanced was added in OpenGL 4.2 as well. It lets you draw the data captured by transform feedback with instancing (glDrawTransformFeedbackInstanced).

The Big Concept


So when is it appropriate to use instancing? When you would like to draw the same thing thousands of times. The reason is that a single draw call (glDraw*) costs a lot of CPU time, as the driver needs to do some checking and preparation (magic!) before the function call returns. On an average PC, about 2000 draw calls per frame is usually the most you can do without hurting your frame rate too badly (remember: at 30 frames per second you have at most 33 ms per frame!). So drawing something 1000 times would use up half of your draw calls, and that is bad.
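
To put those numbers into perspective, here is some back-of-the-envelope arithmetic. The 33 ms and 2000 draw calls are the estimates from the paragraph above, not measurements, and the function names are made up for this sketch. It also assumes the whole frame is spent issuing draw calls, which overstates the time you really have.

```cpp
#include <cassert>

// CPU time available per draw call, in microseconds, if the entire
// frame time were spent on draw calls (an optimistic assumption).
double per_call_budget_us( double frame_ms, int draw_calls )
{
  assert( draw_calls > 0 );
  return frame_ms * 1000.0 / draw_calls;
}

// Share of the per-frame draw call budget used by a number of calls.
double budget_share( int calls_used, int calls_per_frame )
{
  assert( calls_per_frame > 0 );
  return static_cast<double>( calls_used ) / calls_per_frame;
}
```

With 33 ms and 2000 calls that is 16.5 microseconds per call, and 1000 calls for a single kind of object take half of the budget.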

Instancing solves this by allowing you to tell the driver: 'Hey, I'd like to draw this piece of geometry 1000 times'. But you would wind up with 1000 objects in the same place, right? To solve this, you can pass data that will be unique to each of the 1000 objects drawn. This is what I call 'Instance Data'. This is usually a (modelview) matrix, but for the sake of simplicity, I will only store one vec4 (position).

Algorithm overview


Normal rendering:

For each frame:
  For each object:
    -upload object specific data to uniforms (or UBOs)
    -render the object


Instancing:

For each frame:
  For each object:
    -prepare instance data, store it in a buffer (no need to do this each frame if the buffer is static)
  -upload that buffer to the GPU
  -render all the objects with a single instanced draw call, using the provided Instance Data


You can clearly see that the number of draw calls is reduced from n to 1 (plus no uniform passing!).
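
The difference can be sketched without a GPU by simply counting the calls each strategy would make. Everything here is a stand-in invented for this illustration: CallCounter replaces the graphics API and draws nothing, and Position replaces the per-object data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Position { float x, y, z, w; };

// Stand-in for the graphics API: it only counts calls.
struct CallCounter
{
  std::size_t uniform_uploads = 0;
  std::size_t buffer_uploads = 0;
  std::size_t draw_calls = 0;
};

// Normal rendering: one uniform upload and one draw call per object.
void render_individually( const std::vector<Position>& objects, CallCounter& gpu )
{
  for( std::size_t i = 0; i < objects.size(); ++i )
  {
    ++gpu.uniform_uploads; // would be glUniform4f( ... )
    ++gpu.draw_calls;      // would be glDrawElements( ... )
  }
}

// Instancing: one buffer upload and one draw call for all objects.
void render_instanced( const std::vector<Position>& objects, CallCounter& gpu )
{
  if( objects.empty() ) return; // nothing to draw
  ++gpu.buffer_uploads; // would be glBufferData( ... )
  ++gpu.draw_calls;     // would be glDrawElementsInstanced( ... )
}

// Usage sketch: run both strategies on the same scene and compare.
void compare( std::size_t object_count )
{
  std::vector<Position> objects( object_count );
  CallCounter normal, instanced;
  render_individually( objects, normal );
  render_instanced( objects, instanced );
  assert( normal.draw_calls == object_count );
  assert( instanced.draw_calls == ( object_count > 0 ? 1u : 0u ) );
}
```

The instanced path still has to fill and upload the buffer, so the CPU work does not vanish entirely; it just no longer grows with the number of draw calls.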

The implementation


I'm going to use a small (~600 lines) framework I wrote for prototyping techniques. This allows me to hide irrelevant code.

We are going to draw cubes. The first step is to create the cube geometry: a VAO that references a VBO containing the vertex data.

GLuint box = frm.create_box(); //Vertex Array Object (VAO) of the box

Then we are going to create the VBO for the instance data: the positions of the cubes. To do this we need a buffer (memory) and a VBO.

First let's bind the fresh VAO

glBindVertexArray( box );

Then create the buffer

vector<vec4> positions;
positions.resize( size * size ); //make some space

Then create the VBO for this data

GLuint position_vbo;
glGenBuffers( 1, &position_vbo ); //gen vbo
glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo

Here comes the interesting part: you need to tell the driver that you are going to use this VBO for instancing. To do this you need to tell it these things:


-which vertex attribute location will you use? (2)
-how many components does each piece of data have? (vec4, so 4)
-what type of data are you passing? (floats)
-is this data normalized? (they are positions, so no)
-how many bytes apart are consecutive pieces of data, that is, what is the stride? (vec4, so 4 * sizeof( float ) )
-if this data consists of more than four components (like a mat4), then where is this specific part located (relative to the start of the whole piece of data, in bytes)?
-is this data instanced?


All this in code:

GLuint location = 2;
GLint components = 4;
GLenum type = GL_FLOAT;
GLboolean normalized = GL_FALSE;
GLsizei datasize = sizeof( vec4 );
char* pointer = 0; //no other components
GLuint divisor = 1; //instanced

glEnableVertexAttribArray( location ); //tell the location
glVertexAttribPointer( location, components, type, normalized, datasize, pointer ); //tell other data
glVertexAttribDivisor( location, divisor ); //is it instanced?

If the data you would like to pass is a mat4, for example, then you end up using 4 vertex attribute locations to pass it. This requires you to set up the VBO a bit differently, telling the driver where each column (vec4) of the matrix is within each piece of data, in bytes. This is required because the offset is passed as a GLvoid*, which carries no element size (so no pointer arithmetic). Therefore you need to work in bytes and convert that to GLvoid*.

In code:

GLuint location = 2;
GLint components = 4;
GLenum type = GL_FLOAT;
GLboolean normalized = GL_FALSE;
GLsizei datasize = sizeof( mat4 );
char* pointer = 0;
GLuint divisor = 1;

/**
Matrix:
float mat[16] =
{
 1, 0, 0, 0, //first column:  location at 0 + 0 * sizeof( vec4 ) bytes into the matrix
 0, 1, 0, 0, //second column: location at 0 + 1 * sizeof( vec4 ) bytes into the matrix
 0, 0, 1, 0, //third column:  location at 0 + 2 * sizeof( vec4 ) bytes into the matrix
 0, 0, 0, 1  //fourth column: location at 0 + 3 * sizeof( vec4 ) bytes into the matrix
};
/**/

//you need to do everything for each vertex attribute location
for( int c = 0; c < 4; ++c )
{
  glEnableVertexAttribArray( location + c ); //location of each column
  glVertexAttribPointer( location + c, components, type, normalized, datasize, pointer + c * sizeof( vec4 ) ); //tell other data
  glVertexAttribDivisor( location + c, divisor ); //is it instanced?
}
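
The byte offsets used in the loop above can be checked on the CPU. This is only a sketch: Vec4 and Mat4 are plain stand-ins written for this example (assumed to be tightly packed, column-major), not the vec4/mat4 types of the math library used in the article.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for vec4 and a column-major mat4.
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; };

static_assert( sizeof( Vec4 ) == 4 * sizeof( float ), "Vec4 must be tightly packed" );
static_assert( sizeof( Mat4 ) == 4 * sizeof( Vec4 ), "Mat4 must be tightly packed" );

// Byte offset of one column inside a single matrix. This is the value
// passed (as a pointer) in the last argument of glVertexAttribPointer.
std::size_t column_offset( int column )
{
  assert( column >= 0 && column < 4 );
  return static_cast<std::size_t>( column ) * sizeof( Vec4 );
}

// Byte position of a column of a given instance, measured from the start
// of the buffer: stride * instance + offset, with stride = sizeof( Mat4 ).
std::size_t column_address( std::size_t instance, int column )
{
  return instance * sizeof( Mat4 ) + column_offset( column );
}
```

With 4-byte floats the columns sit at 0, 16, 32 and 48 bytes, and each instance starts 64 bytes after the previous one.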

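The divisor tells the driver whether the data is instanced. If the divisor is 0 (the default), the data is not instanced: the attribute advances once per vertex. If it is 1, the attribute advances once per instance. For any value N > 1 the attribute advances once every N instances, so the instance with id gl_InstanceID reads element gl_InstanceID / N (integer division). The value of gl_InstanceID seen in the vertex shader itself does not change.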
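
The effect of the divisor can be sketched as a plain integer division. The function name is made up for this example, and it assumes a divisor of at least 1 and no base instance offset.

```cpp
#include <cassert>
#include <cstdint>

// Index of the element of an instanced attribute array that a given
// instance reads, for divisor >= 1 and a base instance of 0.
std::uint32_t instanced_element( std::uint32_t instance, std::uint32_t divisor )
{
  assert( divisor >= 1 ); // a divisor of 0 means "per vertex", not handled here
  return instance / divisor;
}
```

With a divisor of 2, instances 0 and 1 both read element 0, instances 2 and 3 read element 1, and so on. This is handy when several consecutive instances should share the same piece of data.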

Next you need to load up the shaders. I'm using a super-simple deferred shader for the sake of maximizing shading efficiency, and making these shaders simple.

Vertex shader:

#version 330 core

uniform mat4 mvp; //modelviewprojection matrix
uniform mat3 normal_mat;

layout(location=0) in vec4 in_vertex; //cube vertex position
layout(location=1) in vec3 in_normal; //cube face normal
layout(location=2) in vec4 pos; //instance data, unique to each object (instance)

out vec3 normal;

void main()
{
  normal = normal_mat * in_normal;
  gl_Position = mvp * vec4(in_vertex.xyz + pos.xyz, 1); //offset the vertex by the instance position, then project it
}

Pixel shader:

#version 330 core

in vec3 normal;

layout(location=0) out vec4 color; //normals go here

void main()
{
  color = vec4(normal * 0.5 + 0.5, 1);
}

Loading the shaders

GLuint gbuffer_instanced_shader = 0;
frm.load_shader( gbuffer_instanced_shader, GL_VERTEX_SHADER, "../shaders/instancing2/gbuffer_instanced.vs" );
frm.load_shader( gbuffer_instanced_shader, GL_FRAGMENT_SHADER, "../shaders/instancing2/gbuffer.ps" );

GLint gbuffer_instanced_mvp_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "mvp" );
GLint gbuffer_instanced_normal_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "normal_mat" );

Finally all you need to do is render the cubes. Usually this would look something like this:

//regular rendering
glBindVertexArray( box );

for( int c = 0; c < size; ++c )
{
  for( int d = 0; d < size; ++d )
  {
    glUniform4f( gbuffer_pos_loc, c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 )* timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 ); //this gives it some ocean-like movement
    glDrawElements( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0 ); //two triangles per face, that is 6 * 6 = 36 vertices
  }
}

For instancing, however, you first need to update the instance buffer. It looks like this:

//instanced rendering
glBindVertexArray( box );

//store positions in the buffer
for( int c = 0; c < size; ++c )
{
  for( int d = 0; d < size; ++d )
  {
    positions[c * size + d] = vec4( c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 )* timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 );
  }
}

//upload the instance data
glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo 
//you need to upload sizeof( vec4 ) * number_of_cubes bytes, DYNAMIC_DRAW because it is updated per frame
glBufferData( GL_ARRAY_BUFFER, sizeof( vec4 ) * positions.size(), &positions[0][0], GL_DYNAMIC_DRAW );

glDrawElementsInstanced( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0, positions.size() );

This is it. The rest of the code is setting up the deferred shader, and some controls that should be pretty straightforward.
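
If you would like to try the CPU side on its own, the loop that fills the instance buffer can be pulled into a function that needs neither OpenGL nor the framework. This is a sketch under a few assumptions: Vec4 is a plain stand-in for vec4, to_radians replaces radians(), and the elapsed time is passed in as a parameter instead of being read from the timer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the vec4 type used in the article.
struct Vec4 { float x, y, z, w; };

float to_radians( float degrees )
{
  return degrees * 3.14159265358979f / 180.0f;
}

// Fills a size * size grid of cube positions with the same formula as the
// instanced rendering loop; 'seconds' plays the role of the elapsed time.
std::vector<Vec4> make_positions( int size, float seconds )
{
  assert( size >= 0 );
  std::vector<Vec4> positions( static_cast<std::size_t>( size ) * size );

  for( int c = 0; c < size; ++c )
  {
    for( int d = 0; d < size; ++d )
    {
      float y = -2.0f + 0.5f * std::sin( to_radians( ( c + d + 1 ) * seconds ) );
      positions[static_cast<std::size_t>( c ) * size + d] = Vec4{ c * 3.0f - size, y, -d * 3.0f, 0.0f };
    }
  }

  return positions;
}
```

The result is what would be handed to glBufferData: positions.size() * sizeof( Vec4 ) bytes starting at positions.data().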

Interesting Points


Interestingly, doing the simple sin() on the CPU to update the positions became the bottleneck after ~1,000,000 cubes. If I used a matrix, then matrix multiplication became an issue after ~160,000 cubes. This means that even when doing instancing you still need to be clever about the CPU side (doing the matrix multiplications using SIMD instructions, or in the shaders). After all, updating positions for lots of objects is a data-parallel task, the kind the GPU usually likes.
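
If the animation is moved into the vertex shader, the shader has to recover each cube's grid coordinates from gl_InstanceID alone. The mapping is the inverse of the index = c * size + d layout used when filling the positions. Below is a C++ sketch of that mapping; the names are made up, and this is not how the demo project does it, since the demo computes the positions on the CPU.

```cpp
#include <cassert>

// Grid coordinates of a cube, named as in the rendering loops.
struct GridCell { int c, d; };

// Inverse of index = c * size + d: recovers the grid cell from an
// instance id, the way a vertex shader could do it with gl_InstanceID.
GridCell cell_from_instance( int instance_id, int size )
{
  assert( size > 0 );
  assert( instance_id >= 0 && instance_id < size * size );
  return GridCell{ instance_id / size, instance_id % size };
}
```

With that in place only the grid size and the elapsed time would need to be passed as uniforms, and the per-frame buffer upload could go away.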

Conclusion


Instancing is very important to make sure draw calls are not a bottleneck. I hope more and more people will end up using it in the future.

Additional resources:


-project source
controls: WASD, space to toggle between instancing (green) and normal rendering (red)
building: use cmake to generate project (set CMAKE_BUILD_TYPE to "Release")
https://docs.google.com/file/d/0B33Sh832pOdObExOLTRCRF9QWU0/edit?usp=sharing

-OpenGL history
http://www.opengl.org/wiki/History_of_OpenGL

-Instancing on the OpenGL wiki
http://www.opengl.org/wiki/Vertex_Rendering#Instancing
http://www.opengl.org/wiki/Vertex_Specification#Instanced_arrays
http://www.opengl.org/wiki/Vertex_Rendering#Transform_feedback_rendering

-related tutorials I found
http://ogldev.atspace.co.uk/www/tutorial33/tutorial33.html
http://sol.gfxile.net/instancing.html

-instance culling using transform feedback
http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/


Article Update Log


20 Jun 2013: Fixed typo: vec4 --> mat4 at matrix example, usualy --> usually at 'interesting points' part, becuase --> because
15 Jun 2013: Initial release



License


GDOL (Gamedev.net Open License)




Comments

"location at" should be 0 +, 1 +, 2 +, not 0 +, 0 +, 0 +

thanks for the constructive criticism DemonRad!
However, I have to prove you wrong:
what I am doing is incrementing a char* pointer byte-by-byte. What you are referring to would be incrementing a vec4* pointer by sizeof( vec4 ):
vec4* a = 0;
char* b = 0;
a = a + 1;
b = b + sizeof( vec4 ); //should be the same
This would work of course if converted to GLvoid* later.

Here's the project updated with an example showcasing matrix usage as Instance Data:
https://docs.google.com/file/d/0B33Sh832pOdOc2V2LWF6M0hzNGM/edit?usp=sharing

I learned about instancing in OpenGL and knew that it was really useful. But now I also know that it increases performance! Thanks!

That is what I was looking for. Thanks!

It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

Does android support any of this?

I have this exact problem in my current android game.

Only there it is even worse because each draw call is passed through the JNI.

I had some trouble with my setup(GF590 with driver version 320.49).

 

light.ps report that there is no direct cast from vec4 to vec3 and i fixed it by explicit cast it with .xyz .

        vec3 h = 0.5 * (l + normalize(-vs_pos).xyz);

The CG compiler cry about the "layout binding" need #version 440 or #extension GL_ARB_shading_language_420pack .

After changing the version it was working fine.

It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

I think you could expect one to at least read the corresponding wiki article:
http://en.wikipedia.org/wiki/Geometry_instancing

but if you'd really like to see it, I can add it.

Does android support any of this?

I have this exact problem in my current android game.

Only there it is even worse because each draw call is passed through the JNI.

I believe no, it does not. It should have pseudo-instancing though.
Plus if I'm right OGLES 3.0 should have instancing, however the devices supporting it are just coming/came out.

I had some trouble with my setup(GF590 with driver version 320.49).

 

light.ps report that there is no direct cast from vec4 to vec3 and i fixed it by explicit cast it with .xyz .

        vec3 h = 0.5 * (l + normalize(-vs_pos).xyz);

The CG compiler cry about the "layout binding" need #version 440 or #extension GL_ARB_shading_language_420pack .

After changing the version it was working fine.

well yeah I'm on AMD, and their driver allows such things :)

I have a OGL4.x level GPU so I may not notice I'm using layout(binding=...) is not supported on OGL3.x level. I'm just used to it now.

 

It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

I think you could expect one to at least read the corresponding wiki article:
http://en.wikipedia.org/wiki/Geometry_instancing

but if you'd really like to see it, I can add it.

 

 

It's more about how to write an article in general -- it's a good idea to define what it is that you are going to be talking about, particularly if you are 'demystifying' it.

