OpenGL Instancing Demystified

Published August 08, 2013 by Martin Thomas, posted by Yours3!f

When I tried to implement instancing some time ago, I found that almost no tutorials or articles had been written about it. Like everything a bit beyond OpenGL 2.1, it seems to be something of a taboo among OpenGL programmers: everyone knows about it, yet few actually use it, even though it is easy to do. Not anymore.

Instancing for everyone

History

Instanced drawing became a core feature of OpenGL with version 3.1 back in 2009, by way of the ARB_draw_instanced extension. At that time you could only use Texture Buffer Objects (TBOs) or Uniform Buffer Objects (UBOs) to deliver per-instance data to the shaders. A year later, in 2010, OpenGL 3.3 arrived and made the ARB_instanced_arrays extension a core feature. With this addition you could use actual Vertex Buffer Objects (VBOs) to deliver your data. Yay! There is a restriction, however: the specification only guarantees 16 vertex attributes for your vertex shader (query GL_MAX_VERTEX_ATTRIBS for the actual limit), which makes 16 * vec4 = 64 float values. Also in 2010, ARB_draw_indirect (OpenGL 4.0) made it into core. It lets you pass the parameters of a draw command (glDrawArraysIndirect, glDrawElementsIndirect) indirectly, that is, from a buffer object. In 2011 ARB_base_instance (OpenGL 4.2) became a core feature too. It lets you specify the first instance whose data is fetched from the instanced arrays, so together with the instance count you can draw a half-open range [x...y) of your instance data. ARB_transform_feedback_instanced was added in the same version; it lets you draw multiple instances of the geometry captured by transform feedback.

The Big Concept

So what is instancing, and when is it appropriate to use it? Instancing means drawing many copies of the same geometry with a single draw call, and it pays off when you would like to draw the same thing thousands of times. The reason is that a single draw call (glDraw*) costs a lot of CPU time, as the driver needs to do some checking and preparation (magic!) before the call returns. On an average PC, roughly 2000 draw calls per frame is the most you can issue without hurting your frame rate too badly (remember: at 30 frames per second you have at most 33 ms per frame!). So drawing something 1000 times would eat up half of your draw calls, and that is bad. Instancing solves this by allowing you to tell the driver: 'Hey, I'd like to draw this piece of geometry 1000 times'. But you would wind up with 1000 objects in the same place, right? To solve this, you can pass data that is unique to each of the 1000 objects drawn. This is what I call 'Instance Data'. It is usually a (modelview) matrix, but for the sake of simplicity I will only store one vec4 (position).

Algorithm overview

Normal rendering:

For each frame:
  For each object:
    -upload object specific data to uniforms (or UBOs)
    -render the object

Instancing:

For each frame:
  For each object:
    -prepare instance data, store it in a buffer (no need to do this each frame if the buffer is static)
  -upload that buffer to the GPU
  -render all the objects with one instanced draw call, using the provided Instance Data

You can clearly see that the number of draw calls is reduced from n to 1 (plus no uniform passing!).
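
To make the difference concrete, here is a minimal CPU-only sketch that just counts calls. MockRenderer and its methods are hypothetical stand-ins (they are not part of OpenGL or of the framework used below); each one only marks the spot where the real glUniform*, glBufferData and glDraw* calls would go.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a renderer: it counts calls, it does not draw anything.
struct MockRenderer
{
  std::size_t draw_calls = 0;
  std::size_t uniform_uploads = 0;
  std::size_t buffer_uploads = 0;

  void upload_uniform() { ++uniform_uploads; }                    // where glUniform* would go
  void upload_buffer( const std::vector<float>& ) { ++buffer_uploads; } // where glBufferData would go
  void draw() { ++draw_calls; }                                   // where glDrawElements would go
  void draw_instanced( std::size_t ) { ++draw_calls; }            // where glDrawElementsInstanced would go
};

// Normal rendering: one uniform upload and one draw call per object.
inline void render_normal( MockRenderer& r, std::size_t object_count )
{
  for( std::size_t i = 0; i < object_count; ++i )
  {
    r.upload_uniform();
    r.draw();
  }
}

// Instancing: fill a CPU-side buffer, upload it once, issue one draw call.
inline void render_instanced( MockRenderer& r, std::size_t object_count )
{
  std::vector<float> instance_data( object_count * 4 ); // one vec4 per instance
  for( std::size_t i = 0; i < object_count; ++i )
  {
    instance_data[i * 4] = static_cast<float>( i ); // placeholder x position
  }
  r.upload_buffer( instance_data );
  r.draw_instanced( object_count );
}
```

Calling render_normal with 1000 objects leaves the counter at 1000 draw calls and 1000 uniform uploads, while render_instanced leaves it at one draw call and one buffer upload.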

The implementation

I'm going to use a small (~600 lines) framework I wrote for prototyping techniques. This allows me to hide irrelevant code. We are going to draw cubes. The first step is to create the vertex data of a single cube; the framework call below does that and returns the Vertex Array Object (VAO) of the box.


GLuint box = frm.create_box(); //Vertex Array Object (VAO) of the box

Then we are going to create the VBO for the instance data: the positions of the cubes. To do this we need a CPU-side buffer (memory) and a VBO. First let's bind the fresh VAO, so that the attribute setup that follows is recorded in it:


glBindVertexArray( box );

Then create the buffer


vector<vec4> positions; //CPU-side storage, one vec4 per cube
positions.resize( size * size ); //make some space

Then create the VBO for this data


GLuint position_vbo;
glGenBuffers( 1, &position_vbo ); //gen vbo
glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo

Here comes the interesting part: you need to tell the driver that you are going to use this VBO for instancing. To do this you need to tell it these things:

-which vertex attribute location will you use? (2)
-how many components does each piece of data have? (vec4, so 4)
-what type of data are you passing? (floats)
-is this data normalized? (they are positions, so probably no)
-how many bytes is each piece of data? (vec4, so 4 * sizeof( float ) )
-if this data consists of more than four components (like a mat4), then where is this specific data located (relative to the whole piece of data, in bytes)?
-is this data instanced?

All this in code:


GLuint location = 2;
GLint components = 4;
GLenum type = GL_FLOAT;
GLboolean normalized = GL_FALSE;
GLsizei datasize = sizeof( vec4 );
char* pointer = 0; //no other components
GLuint divisor = 1; //instanced

glEnableVertexAttribArray( location ); //tell the location
glVertexAttribPointer( location, components, type, normalized, datasize, pointer ); //tell other data
glVertexAttribDivisor( location, divisor ); //is it instanced?
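
The divisor in the code above decides how often the attribute advances to the next element of the VBO. As a minimal CPU-side sketch of that rule (attribute_index is a hypothetical helper written for illustration, not an OpenGL function, and it assumes a base instance of 0):

```cpp
#include <cstddef>

// Which element of the attribute array the vertex shader reads
// for a given vertex of a given instance.
// divisor == 0: the attribute advances once per vertex (not instanced).
// divisor >= 1: the attribute advances once every 'divisor' instances.
inline std::size_t attribute_index( std::size_t vertex_id, std::size_t instance_id, unsigned divisor )
{
  if( divisor == 0 )
  {
    return vertex_id;
  }
  return instance_id / divisor; // integer division, rounds down
}
```

With a divisor of 1 every instance gets its own element. With a divisor of 2, instances 0 and 1 share element 0, instances 2 and 3 share element 1, and so on.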

If the data you would like to pass is a mat4, for example, then you end up using 4 vertex attribute locations, one per column. This requires setting up the VBO a bit differently: for each column (vec4) you tell the driver where it sits inside each matrix, in bytes. The offset is passed as a GLvoid*, and you cannot do pointer arithmetic on a void pointer because the size of what it points to is unknown. Therefore you need to compute the offset in bytes on a char* and let that convert to GLvoid*. In code:


GLuint location = 2;
GLint components = 4;
GLenum type = GL_FLOAT;
GLboolean normalized = GL_FALSE;
GLsizei datasize = sizeof( mat4 );
char* pointer = 0;
GLuint divisor = 1;

/**
Matrix:
float mat[16] =
{
 1, 0, 0, 0, //first column:  location at 0 + 0 * sizeof( vec4 ) bytes into the matrix
 0, 1, 0, 0, //second column: location at 0 + 1 * sizeof( vec4 ) bytes into the matrix
 0, 0, 1, 0, //third column:  location at 0 + 2 * sizeof( vec4 ) bytes into the matrix
 0, 0, 0, 1  //fourth column  location at 0 + 3 * sizeof( vec4 ) bytes into the matrix
};
/**/

//you need to do everything for each vertex attribute location
for( int c = 0; c < 4; ++c )
{
  glEnableVertexAttribArray( location + c ); //location of each column
  glVertexAttribPointer( location + c, components, type, normalized, datasize, pointer + c * sizeof( vec4 ) ); //tell other data
  glVertexAttribDivisor( location + c, divisor ); //is it instanced?
}
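
If you want to double check the byte offsets used in the loop above, the following sketch computes them on the CPU. Vec4 and Mat4 are hypothetical plain structs standing in for the framework's vec4 and mat4, and the helper names are made up for this example; the sketch assumes the matrix is stored as four tightly packed columns.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the framework's vec4 and mat4.
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; }; // column-major: columns[0] is the first column

static_assert( sizeof( Vec4 ) == 4 * sizeof( float ), "Vec4 must be tightly packed" );
static_assert( sizeof( Mat4 ) == 4 * sizeof( Vec4 ), "Mat4 must be tightly packed" );

// Byte offset of column c inside one matrix: c * sizeof( vec4 ).
inline std::size_t column_offset( int c )
{
  return static_cast<std::size_t>( c ) * sizeof( Vec4 );
}

// The same offset measured on a real object through char pointers,
// which count in bytes just like the char* in the article's code.
inline std::size_t measured_column_offset( const Mat4& m, int c )
{
  const char* base = reinterpret_cast<const char*>( &m );
  const char* column = reinterpret_cast<const char*>( &m.columns[c] );
  return static_cast<std::size_t>( column - base );
}

// Byte offset of column c of a given instance in a tightly packed array of matrices.
inline std::size_t column_offset_in_buffer( std::size_t instance, int c )
{
  return instance * sizeof( Mat4 ) + column_offset( c );
}
```

The stride passed to glVertexAttribPointer ( sizeof( mat4 ) ) is what moves from one instance to the next, so only the per-column offset has to be supplied as the pointer argument.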

The divisor tells the driver whether the data is instanced. If the divisor is 0 (the default), the attribute is not instanced and advances once per vertex. If it is 1, the attribute advances once per instance. For any value N > 1 the attribute advances once every N instances, so instance i reads element i / N (integer division); the instance id (gl_InstanceID) in the vertex shader itself is unchanged.

Next you need to load the shaders. I'm using a super-simple deferred shader, to keep shading efficient and to keep these shaders simple. Vertex shader:


#version 330 core

uniform mat4 mvp; //modelviewprojection matrix
uniform mat3 normal_mat;

layout(location=0) in vec4 in_vertex; //cube vertex position
layout(location=1) in vec3 in_normal; //cube face normal
layout(location=2) in vec4 pos; //instance data, unique to each object (instance)

out vec3 normal;

void main()
{
  normal = normal_mat * in_normal;
  gl_Position = mvp * vec4(in_vertex.xyz + pos.xyz, 1); //write to the depth buffer
}

Pixel shader:


#version 330 core

in vec3 normal;

layout(location=0) out vec4 color; //normals go here

void main()
{
  color = vec4(normal * 0.5 + 0.5, 1);
}

Loading the shaders


GLuint gbuffer_instanced_shader = 0;
frm.load_shader( gbuffer_instanced_shader, GL_VERTEX_SHADER, "../shaders/instancing2/gbuffer_instanced.vs" );
frm.load_shader( gbuffer_instanced_shader, GL_FRAGMENT_SHADER, "../shaders/instancing2/gbuffer.ps" );

GLint gbuffer_instanced_mvp_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "mvp" );
GLint gbuffer_instanced_normal_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "normal_mat" );

Finally, all you need to do is render the cubes. Without instancing this would look something like this:


//regular rendering
glBindVertexArray( box );

for( int c = 0; c < size; ++c )
{
  for( int d = 0; d < size; ++d )
  {
    glUniform4f( gbuffer_pos_loc, c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 )* timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 ); //this gives it some ocean-like movement
    glDrawElements( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0 ); //two triangles per face, that is 6 * 6 = 36 vertices
  }
}

With instancing, however, you update the instance buffer, upload it, and issue a single draw call. It looks like this:


//instanced rendering
glBindVertexArray( box );

//store positions in the buffer
for( int c = 0; c < size; ++c )
{
  for( int d = 0; d < size; ++d )
  {
    positions[c * size + d] = vec4( c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 )* timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 );
  }
}

//upload the instance data
glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo 
//you need to upload sizeof( vec4 ) * number_of_cubes bytes, DYNAMIC_DRAW because it is updated per frame
glBufferData( GL_ARRAY_BUFFER, sizeof( vec4 ) * positions.size(), &positions[0][0], GL_DYNAMIC_DRAW );

glDrawElementsInstanced( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0, positions.size() );
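
The CPU side of that update can be tried out without a GL context. The sketch below fills the position buffer with the same formula and reports how many bytes glBufferData would have to upload. Vec4, fill_positions and upload_size_bytes are hypothetical names chosen for this example, and the elapsed time is passed in as a plain float instead of coming from the timer.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the framework's vec4.
struct Vec4 { float x, y, z, w; };

// Fill one position per cube for a size x size grid, using the article's formula.
inline void fill_positions( std::vector<Vec4>& positions, int size, float seconds )
{
  const float deg_to_rad = 3.14159265f / 180.0f; // what radians() does
  const std::size_t n = static_cast<std::size_t>( size );
  positions.resize( n * n );

  for( int c = 0; c < size; ++c )
  {
    for( int d = 0; d < size; ++d )
    {
      const float angle = deg_to_rad * static_cast<float>( c + d + 1 ) * seconds;
      const std::size_t index = static_cast<std::size_t>( c ) * n + static_cast<std::size_t>( d );
      positions[index] = Vec4{ static_cast<float>( c * 3 - size ),
                               -2.0f + 0.5f * std::sin( angle ),
                               static_cast<float>( -d * 3 ),
                               0.0f };
    }
  }
}

// Number of bytes that have to be uploaded: sizeof( vec4 ) * number_of_cubes.
inline std::size_t upload_size_bytes( const std::vector<Vec4>& positions )
{
  return sizeof( Vec4 ) * positions.size();
}
```

For a 4 x 4 grid this produces 16 positions, and the upload size is 16 * sizeof( Vec4 ) bytes. The whole buffer is recomputed every frame, which is why the article uses GL_DYNAMIC_DRAW.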

This is it. The rest of the code is setting up the deferred shader, and some controls that should be pretty straightforward.

Interesting Points

Interestingly, doing the simple sin() on the CPU to update the positions became the bottleneck after ~1,000,000 cubes. If I used a matrix, then matrix multiplication became an issue after ~160,000 cubes. This means that even when doing instancing you still need to be clever about the CPU side (doing the matrix multiplications with SIMD instructions, or in the shaders). After all, updating positions for lots of data is a data-parallel task, which is exactly what the GPU likes.

Conclusion

Instancing is very important to make sure draw calls are not a bottleneck. I hope more and more people will end up using it in the future.

Additional resources:

-project source
 controls: WASD, space to toggle between instancing (green) and normal rendering (red)
 building: use cmake to generate project (set CMAKE_BUILD_TYPE to "Release")
 https://docs.google.com/file/d/0B33Sh832pOdObExOLTRCRF9QWU0/edit?usp=sharing
-OpenGL history
 http://www.opengl.org/wiki/History_of_OpenGL
-Instancing on the OpenGL wiki
 http://www.opengl.org/wiki/Vertex_Rendering#Instancing
 http://www.opengl.org/wiki/Vertex_Specification#Instanced_arrays
 http://www.opengl.org/wiki/Vertex_Rendering#Transform_feedback_rendering
-related tutorials I found
 http://ogldev.atspace.co.uk/www/tutorial33/tutorial33.html
 http://sol.gfxile.net/instancing.html
-instance culling using transform feedback
 http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/

Article Update Log

20 Jun 2013: Fixed typo: vec4 --> mat4 at matrix example, usualy --> usually at 'interesting points' part, becuase --> because
15 Jun 2013: Initial release

Comments

Dario Oliveri

location at" should be

0 +

1 +

2 +

not

June 20, 2013 06:56 PM

Yours3!f

thanks for the constructive criticism DemonRad!
However, I have to prove you wrong:
what I am doing is incrementing a char* pointer byte-by-byte. What you are referring to would be incrementing a vec4* pointer by sizeof( vec4 ):
vec4* a = 0;
char* b = 0;
a = a + 1;
b = b + sizeof( vec4 ); //should be the same
This would work of course if converted to GLvoid* later.

Here's the project updated with an example showcasing matrix usage as Instance Data:
https://docs.google.com/file/d/0B33Sh832pOdOc2V2LWF6M0hzNGM/edit?usp=sharing

June 20, 2013 08:23 PM

shocobenn

I learned about instancing in OpenGL and knew that it was really useful. But now I also know that it increases performance! Thanks!

June 20, 2013 08:59 PM

Genert

That is what I was looking for. Thanks!

June 21, 2013 08:38 PM

jjd

It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

August 08, 2013 04:46 PM

SillyCow

Does android support any of this?

I have this exact problem in my current Android game.

Only there it is even worse because each draw call is passed through the JNI.

August 08, 2013 08:32 PM

TAK2k4

I had some trouble with my setup(GF590 with driver version 320.49).

light.ps reports that there is no direct cast from vec4 to vec3, and I fixed it with an explicit cast using .xyz:

vec3 h = 0.5 * (l + normalize(-vs_pos).xyz);

The CG compiler also complains that "layout binding" needs #version 440 or #extension GL_ARB_shading_language_420pack.

After changing the version it was working fine.

August 08, 2013 09:19 PM

Yours3!f

> It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

I think you could expect one to at least read the corresponding wiki article:
http://en.wikipedia.org/wiki/Geometry_instancing

but if you'd really like to see it, I can add it.

August 09, 2013 09:50 AM

Yours3!f

> Does android support any of this?
> I'm have this exact problem in my current android game.
> Only there it is even worse because each draw call is passed through the JNI.

I believe no, it does not. It should have pseudo-instancing though.
Plus if I'm right OGLES 3.0 should have instancing, however the devices supporting it are just coming/came out.
http://www.youtube.com/watch?v=dqdUXNdk4us

August 09, 2013 09:55 AM

Yours3!f

> I had some trouble with my setup(GF590 with driver version 320.49).
> light.ps report that there is no direct cast from vec4 to vec3 and i fixed it by explicit cast it with .xyz .
> vec3 h = 0.5 * (l + normalize(-vs_pos).xyz);
> The CG compiler cry about the "layout binding" need #version 440 or #extension GL_ARB_shading_language_420pack .
> After changing the version it was working fine.

Well yeah, I'm on AMD, and their driver allows such things :)

I have an OGL 4.x level GPU, so I did not notice that the layout(binding=...) I'm using is not supported at OGL 3.x level. I'm just used to it now.

August 09, 2013 09:58 AM

jjd

>> It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.
> I think you could expect one to at least read the corresponding wiki article:
> http://en.wikipedia.org/wiki/Geometry_instancing
> but if you'd really like to see it, I can add it.

It's more about how to write an article in general -- it's a good idea to define what it is that you are going to be talking about, particularly if you are 'demystifying' it.

August 09, 2013 10:34 AM

This article is about basic OpenGL instancing. It aims to give you a solid understanding of instancing, so that the next time you do a project, you can take advantage of it.