
  • 08/08/13 03:20 PM

    OpenGL Instancing Demystified

    Graphics and GPU Programming

    Yours3!f
    When I tried to implement instancing some time ago, I found that almost no tutorials or articles had been written about it. Much like everything a bit beyond OpenGL 2.1, this seems to be a bit of a taboo among OpenGL programmers. Everyone knows about it, yet not many are actually using it, despite it being easy to do. Not anymore.

    Instancing for everyone

    History

    Instancing became a core feature of OpenGL starting with version 3.1 back in 2009, promoted from the ARB_draw_instanced extension. At that time you could only use Texture Buffer Objects (TBOs) or Uniform Buffer Objects (UBOs) to actually deliver your data into the shaders. A year later, in 2010, OpenGL 3.3 arrived with the brand new ARB_instanced_arrays extension promoted to a core feature. With this addition you could use actual Vertex Buffer Objects (VBOs) to deliver your data. Yay! There is a restriction, however: the specification only guarantees 16 vertex attributes for your vertex shader (the actual limit can be queried with GL_MAX_VERTEX_ATTRIBS), which makes it 16 * vec4 = 64 float values. In 2010 ARB_draw_indirect (OpenGL 4.0) also made it into core. It enables you to pass the parameters of the draw functions indirectly (glDrawArraysIndirect, glDrawElementsIndirect), that is, from a buffer in memory. In 2011 ARB_base_instance (OpenGL 4.2) became a core feature too. This allows you to specify an offset into the instance data, so you can draw a half-open range [x...y) of it. ARB_transform_feedback_instanced was added in OpenGL 4.2 as well; it allows you to draw the data captured by transform feedback using instancing.

    The Big Concept

    So when is it appropriate to use instancing? When you would like to draw the same thing thousands of times. The reason for this is that a single draw call (glDraw*) costs a lot of CPU time, as the driver needs to do some checking and preparation (magic!) before the function call returns. Usually on an average PC 2000 draw calls per frame is the most you can do without hurting your frame rate too badly (remember: you have at most 33 ms per frame!). So drawing something 1000 times would use up half of your draw call budget, and that is bad. Instancing solves this by allowing you to tell the driver: 'Hey, I'd like to draw this piece of geometry 1000 times'. But you would wind up with 1000 objects in the same place, right? To solve this, you can pass data that will be unique to each of the 1000 objects drawn. This is what I call 'Instance Data'. This is usually a (modelview) matrix, but for the sake of simplicity, I will only store one vec4 (position).

    Algorithm overview

    Normal rendering:

    For each frame:
      For each object:
        -upload object specific data to uniforms (or UBOs)
        -render the object

    Instancing:

    For each frame:
      For each object:
        -prepare instance data, store it in a buffer (no need to do this each frame if the buffer is static)
      -upload that buffer to the GPU
      -render the objects using instancing with the provided Instance Data

    You can clearly see that the number of draw calls is reduced from n to 1 (plus no uniform passing!).
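
    To make the difference concrete, here is a minimal sketch that only counts the calls each approach would issue. CallCounter, Position, render_normal and render_instanced are hypothetical names made up for this illustration; nothing here talks to OpenGL, and the comments name the GL call each counter stands for.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a renderer: it only counts calls.
struct CallCounter {
    std::size_t uniform_uploads = 0;
    std::size_t buffer_uploads = 0;
    std::size_t draw_calls = 0;
};

// Hypothetical per-object data (one vec4 position per object).
struct Position { float x, y, z, w; };

// Normal rendering: one uniform upload and one draw call per object.
CallCounter render_normal(const std::vector<Position>& objects) {
    CallCounter counter;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        ++counter.uniform_uploads; // stands for glUniform4f(...)
        ++counter.draw_calls;      // stands for glDrawElements(...)
    }
    // Invariant: n objects cost n draw calls.
    assert(counter.draw_calls == objects.size());
    return counter;
}

// Instancing: one buffer upload and one draw call for all objects.
CallCounter render_instanced(const std::vector<Position>& objects) {
    CallCounter counter;
    if (!objects.empty()) {
        ++counter.buffer_uploads; // stands for glBufferData(...)
        ++counter.draw_calls;     // stands for glDrawElementsInstanced(...)
    }
    return counter;
}
```

    Calling both functions with the same list of 1000 positions gives 1000 draw calls for the normal path and a single one for the instanced path.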

    The implementation

    I'm going to use a small (~600 lines) framework I wrote for prototyping techniques. This allows me to hide irrelevant code. We are going to draw cubes. The first step to drawing cubes is to create the vertex data; the framework hands back a Vertex Array Object (VAO) that references the cube's vertex buffers:

        GLuint box = frm.create_box(); //Vertex Array Object (VAO) of the box

    Then we are going to create the VBO for the instance data: the positions of the cubes. To do this we need a buffer (CPU memory) and a VBO. First let's bind the fresh VAO:

        glBindVertexArray( box );

    Then create the buffer:

        vector<vec4> positions;
        positions.resize( size * size ); //make some space

    Then create the VBO for this data:

        GLuint position_vbo;
        glGenBuffers( 1, &position_vbo ); //gen vbo
        glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo

    Here comes the interesting part: you need to tell the driver that you are going to use this VBO for instancing. To do this you need to tell it these things:

    -which vertex attribute location will you use? (2)
    -how many components does each piece of data have? (vec4, so 4)
    -what type of data are you passing? (floats)
    -is this data normalized? (they are positions, so no)
    -how many bytes is each piece of data? (vec4, so 4 * sizeof( float ) )
    -if this data consists of more than four components (like a mat4), then where is this specific data located (relative to the whole piece of data, in bytes)?
    -is this data instanced?

    All this in code:

        GLuint location = 2;
        GLint components = 4;
        GLenum type = GL_FLOAT;
        GLboolean normalized = GL_FALSE;
        GLsizei datasize = sizeof( vec4 );
        char* pointer = 0; //no other components
        GLuint divisor = 1; //instanced
        glEnableVertexAttribArray( location ); //tell the location
        glVertexAttribPointer( location, components, type, normalized, datasize, pointer ); //tell other data
        glVertexAttribDivisor( location, divisor ); //is it instanced?

    If the data you would like to pass is a mat4 for example, then you would end up using 4 vertex attribute locations to pass this data. This would require you to set up the VBO a bit differently, telling where each column (vec4) of the matrix is in each piece of data, in bytes. This is required because the offset is passed as a GLvoid*, which carries no element size (so no pointer arithmetic). Therefore you need to work in bytes (here with a char*) and let that convert to GLvoid*. In code:

        GLuint location = 2;
        GLint components = 4;
        GLenum type = GL_FLOAT;
        GLboolean normalized = GL_FALSE;
        GLsizei datasize = sizeof( mat4 );
        char* pointer = 0;
        GLuint divisor = 1;
        /*
        Matrix:
        float mat[16] =
        {
          1, 0, 0, 0, //first column: location at 0 + 0 * sizeof( vec4 ) bytes into the matrix
          0, 1, 0, 0, //second column: location at 0 + 1 * sizeof( vec4 ) bytes into the matrix
          0, 0, 1, 0, //third column: location at 0 + 2 * sizeof( vec4 ) bytes into the matrix
          0, 0, 0, 1  //fourth column: location at 0 + 3 * sizeof( vec4 ) bytes into the matrix
        };
        */
        //you need to do everything for each vertex attribute location
        for( int c = 0; c < 4; ++c )
        {
          glEnableVertexAttribArray( location + c ); //location of each column
          glVertexAttribPointer( location + c, components, type, normalized, datasize, pointer + c * sizeof( vec4 ) ); //tell other data
          glVertexAttribDivisor( location + c, divisor ); //is it instanced?
        }

    The divisor tells the driver whether the data is instanced. If the divisor is 0 (the default), the data is not instanced: the attribute advances once per vertex. If it is 1, the attribute advances once per instance. For any other value N > 1, the attribute advances once every N instances, that is, the instance index used to fetch the data is divided by N (gl_InstanceID in the vertex shader itself is unchanged).

    Next you need to load up the shaders. I'm using a super-simple deferred shader for the sake of maximizing shading efficiency, and making these shaders simple.

    Vertex shader:

        #version 330 core
        uniform mat4 mvp; //modelviewprojection matrix
        uniform mat3 normal_mat;
        layout(location=0) in vec4 in_vertex; //cube vertex position
        layout(location=1) in vec3 in_normal; //cube face normal
        layout(location=2) in vec4 pos; //instance data, unique to each object (instance)
        out vec3 normal;
        void main()
        {
          normal = normal_mat * in_normal;
          gl_Position = mvp * vec4(in_vertex.xyz + pos.xyz, 1); //write to the depth buffer
        }

    Pixel shader:

        #version 330 core
        in vec3 normal;
        layout(location=0) out vec4 color; //normals go here
        void main()
        {
          color = vec4(normal * 0.5 + 0.5, 1);
        }

    Loading the shaders:

        GLuint gbuffer_instanced_shader = 0;
        frm.load_shader( gbuffer_instanced_shader, GL_VERTEX_SHADER, "../shaders/instancing2/gbuffer_instanced.vs" );
        frm.load_shader( gbuffer_instanced_shader, GL_FRAGMENT_SHADER, "../shaders/instancing2/gbuffer.ps" );
        GLint gbuffer_instanced_mvp_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "mvp" );
        GLint gbuffer_instanced_normal_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "normal_mat" );

    Finally all you need to do is render the cubes. Usually this would look something like this:

        //regular rendering
        glBindVertexArray( box );
        for( int c = 0; c < size; ++c )
        {
          for( int d = 0; d < size; ++d )
          {
            //this gives it some ocean-like movement
            glUniform4f( gbuffer_pos_loc, c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 ) * timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 );
            glDrawElements( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0 ); //two triangles per face, six faces: 6 * 2 * 3 = 36 indices
          }
        }

    However, for instancing you need to update the instance buffer. It looks like this:

        //instanced rendering
        glBindVertexArray( box );
        //store positions in the buffer
        for( int c = 0; c < size; ++c )
        {
          for( int d = 0; d < size; ++d )
          {
            positions[c * size + d] = vec4( c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 ) * timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 );
          }
        }
        //upload the instance data
        glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo
        //you need to upload sizeof( vec4 ) * number_of_cubes bytes, DYNAMIC_DRAW because it is updated per frame
        glBufferData( GL_ARRAY_BUFFER, sizeof( vec4 ) * positions.size(), &positions[0][0], GL_DYNAMIC_DRAW );
        glDrawElementsInstanced( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0, positions.size() );

    This is it. The rest of the code is setting up the deferred shader, and some controls that should be pretty straightforward.
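
    The divisor rule and the per-column byte offsets from this section can be checked on the CPU without any OpenGL context. The sketch below is only an illustration: the vec4 and mat4 structs are stand-ins assumed to be tightly packed floats, the two function names are made up for this example, and the base instance is assumed to be 0.

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for the math types used in the article (assumed tightly packed floats).
struct vec4 { float x, y, z, w; };
struct mat4 { vec4 columns[4]; };

// Index of the per-instance element fetched for a given instance.
// divisor must be >= 1 here; a divisor of 0 means the attribute
// advances per vertex instead of per instance.
std::size_t instanced_element_index(std::size_t instance, std::size_t divisor) {
    assert(divisor >= 1);
    return instance / divisor; // integer division: advances once every `divisor` instances
}

// Byte offset of column `column` of the mat4 fetched for `instance`,
// mirroring stride = sizeof( mat4 ) and pointer = column * sizeof( vec4 ).
std::size_t mat4_column_byte_offset(std::size_t instance, std::size_t divisor, std::size_t column) {
    assert(column < 4);
    const std::size_t stride = sizeof(mat4);
    return instanced_element_index(instance, divisor) * stride + column * sizeof(vec4);
}
```

    With 4-byte floats, a divisor of 1 places the third column of instance 0 at byte 32, and a divisor of 2 makes instances 4 and 5 read the same matrix (element index 2).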

    Interesting Points

    Interestingly, doing the simple sin() on the CPU to update the positions became the bottleneck after ~1,000,000 cubes. If I used a matrix, then matrix multiplication became an issue after ~160,000 cubes. This means that even when doing instancing you still need to be clever about the CPU side (doing the matrix muls using SIMD instructions, or in the shaders). After all, updating positions for lots of data is a data parallel task, which is the kind of work the GPU usually likes.
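
    One way to look at that CPU cost is to pull the per-frame position update out of the render loop into a function of its own, so it can be timed, and later vectorized or moved into a shader. This is just a sketch under assumptions: vec4 is a stand-in struct, and radians and update_positions are hypothetical helpers that mirror the loop shown in the implementation, taking the elapsed time as a plain float.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for the article's vec4 (assumed layout: four packed floats).
struct vec4 { float x, y, z, w; };

// Degrees to radians, like the radians() helper used in the article's loop.
inline float radians(float degrees) {
    return degrees * 3.14159265358979f / 180.0f;
}

// Fills `positions` with size * size cube positions for the given time.
// Hypothetical helper: the article does this inline in its render loop.
void update_positions(std::vector<vec4>& positions, std::size_t size, float elapsed_seconds) {
    positions.resize(size * size);
    for (std::size_t c = 0; c < size; ++c) {
        for (std::size_t d = 0; d < size; ++d) {
            const float phase = radians(static_cast<float>(c + d + 1) * elapsed_seconds);
            vec4& p = positions[c * size + d];
            p.x = static_cast<float>(c) * 3.0f - static_cast<float>(size);
            p.y = -2.0f + 0.5f * std::sin(phase); // ocean-like movement
            p.z = -3.0f * static_cast<float>(d);
            p.w = 0.0f;
        }
    }
    assert(positions.size() == size * size);
}
```

    Every iteration is independent of the others, which is why this loop is a natural candidate for SIMD or for a vertex shader.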

    Conclusion

    Instancing is very important to make sure draw calls are not a bottleneck. I hope more and more people will end up using it in the future.

    Additional resources:

    -project source
      controls: WASD, space to toggle between instancing (green) and normal rendering (red)
      building: use cmake to generate project (set CMAKE_BUILD_TYPE to "Release")
      https://docs.google.com/file/d/0B33Sh832pOdObExOLTRCRF9QWU0/edit?usp=sharing
    -OpenGL history
      http://www.opengl.org/wiki/History_of_OpenGL
    -Instancing on the OpenGL wiki
      http://www.opengl.org/wiki/Vertex_Rendering#Instancing
      http://www.opengl.org/wiki/Vertex_Specification#Instanced_arrays
      http://www.opengl.org/wiki/Vertex_Rendering#Transform_feedback_rendering
    -related tutorials I found
      http://ogldev.atspace.co.uk/www/tutorial33/tutorial33.html
      http://sol.gfxile.net/instancing.html
    -instance culling using transform feedback
      http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/

    Article Update Log

    20 Jun 2013: Fixed typo: vec4 --> mat4 at matrix example, usualy --> usually at 'interesting points' part, becuase --> because
    15 Jun 2013: Initial release




    User Feedback


    thanks for the constructive criticism DemonRad!
    However, I have to prove you wrong:
    what I am doing is incrementing a char* pointer byte-by-byte. What you are referring to would be incrementing a vec4* pointer by sizeof( vec4 ):
    vec4* a = 0;
    char* b = 0;
    a = a + 1;
    b = b + sizeof( vec4 ); //should be the same
    This would work of course if converted to GLvoid* later.

    Here's the project updated with an example showcasing matrix usage as Instance Data:
    https://docs.google.com/file/d/0B33Sh832pOdOc2V2LWF6M0hzNGM/edit?usp=sharing


    I had learned about instancing in OpenGL and knew that it was really useful. But now I also know that it increases performance! Thanks!


    It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.


    Does android support any of this?

    I have this exact problem in my current Android game.

    Only there it is even worse because each draw call is passed through the JNI.


    I had some trouble with my setup (GF590 with driver version 320.49).

    light.ps reports that there is no direct cast from vec4 to vec3, and I fixed it by explicitly casting with .xyz:

            vec3 h = 0.5 * (l + normalize(-vs_pos).xyz);

    The Cg compiler complains that "layout binding" needs #version 440 or #extension GL_ARB_shading_language_420pack.

    After changing the version it was working fine.


    It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

    I think you could expect one to at least read the corresponding wiki article:
    http://en.wikipedia.org/wiki/Geometry_instancing

    but if you'd really like to see it, I can add it.


    Does android support any of this?

    I have this exact problem in my current Android game.

    Only there it is even worse because each draw call is passed through the JNI.

    I believe no, it does not. It should have pseudo-instancing though.
    Plus if I'm right OGLES 3.0 should have instancing, however the devices supporting it are just coming/came out.
    http://www.youtube.com/watch?v=dqdUXNdk4us


    I had some trouble with my setup (GF590 with driver version 320.49).

    light.ps reports that there is no direct cast from vec4 to vec3, and I fixed it by explicitly casting with .xyz:

            vec3 h = 0.5 * (l + normalize(-vs_pos).xyz);

    The Cg compiler complains that "layout binding" needs #version 440 or #extension GL_ARB_shading_language_420pack.

    After changing the version it was working fine.

    well yeah I'm on AMD, and their driver allows such things :)

    I have an OGL 4.x level GPU, so I may not have noticed that the layout(binding=...) I'm using is not supported at the OGL 3.x level. I'm just used to it now.


    It would be nice if the introduction provided a brief explanation of what instancing actually is before starting to explain when you should use it.

    I think you could expect one to at least read the corresponding wiki article:
    http://en.wikipedia.org/wiki/Geometry_instancing

    but if you'd really like to see it, I can add it.

    It's more about how to write an article in general -- it's a good idea to define what it is that you are going to be talking about, particularly if you are 'demystifying' it.


