• 08/08/13 03:20 PM

    OpenGL Instancing Demystified

    Graphics and GPU Programming

    Yours3!f
    When I tried to implement instancing some time ago, I found that almost no tutorials or articles had been written about it. Much like everything a bit beyond OpenGL 2.1, it seems to be something of a taboo among OpenGL programmers. Everyone knows about it, yet not many actually use it, even though it is easy to do. Not anymore.

    Instancing for everyone

    History

    Instancing became a core feature of OpenGL with version 3.1 in 2009, when the ARB_draw_instanced extension was promoted to core. At that time you could only use Texture Buffer Objects (TBOs) or Uniform Buffer Objects (UBOs) to deliver per-instance data to the shaders. A year later, in 2010, OpenGL 3.3 made the ARB_instanced_arrays extension a core feature. With this addition you could use actual Vertex Buffer Objects (VBOs) to deliver your data. Yay! There is a restriction, however: the specification only guarantees 16 vertex attributes for your vertex shader (query GL_MAX_VERTEX_ATTRIBS for the actual limit), which makes 16 * vec4 = 64 float values. Also in 2010, ARB_draw_indirect (OpenGL 4.0) made it to the core. It enables you to pass the parameters of the draw call indirectly, that is, from a buffer in memory (glDrawArraysIndirect, glDrawElementsIndirect). In 2011 ARB_base_instance (OpenGL 4.2) became a core feature too. It lets you specify the first instance whose data is fetched, so together with the instance count you select a half-open range [x...y) of the instance data to draw. ARB_transform_feedback_instanced was added in the same release; it allows you to draw the data captured by transform feedback with instancing.

    The Big Concept

    So when is it appropriate to use instancing? When you would like to draw the same thing thousands of times. The reason for this is that a single draw call (glDraw*) costs a lot of CPU time, as the driver needs to do some checking and preparation (magic!) before the function call returns. On an average PC, about 2000 draw calls per frame is usually the most you can do without hurting your frame rate too badly (remember: at 30 frames per second you have at most 33 ms per frame!). So drawing something 1000 times would use up half of your draw calls, and that is bad. Instancing solves this by allowing you to tell the driver: 'Hey, I'd like to draw this piece of geometry 1000 times'. But you would wind up with 1000 objects in the same place, right? To solve this, you can pass data that is unique to each of the 1000 objects drawn. This is what I call 'Instance Data'. It is usually a (modelview) matrix, but for the sake of simplicity, I will only store one vec4 (position).

    Algorithm overview

    Normal rendering:

    For each frame:

      For each object:

        -upload object specific data to uniforms (or UBOs)
        -render the object

    Instancing:

    For each frame:

      For each object:

        -prepare instance data, store it in a buffer (no need to do this each frame if the buffer is static)

      -upload that buffer to the GPU
      -render the objects using instancing using the provided Instance Data

    You can clearly see that the number of draw calls is reduced from n to 1 (plus no uniform passing!).
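
    The difference between the two loops can be sketched without any OpenGL at all. The block below is only an illustration: FakeGpu, Vec4, render_normal and render_instanced are made-up names, and the counters stand in for the work a real driver would receive through glUniform4f, glDrawElements, glBufferData and glDrawElementsInstanced.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the driver: it only counts the requests it receives.
struct FakeGpu {
    std::size_t draw_calls = 0;
    std::size_t uniform_uploads = 0;
    std::size_t buffer_uploads = 0;
    std::size_t cubes_drawn = 0;
};

// Illustrative per-instance data: one position per cube.
struct Vec4 { float x, y, z, w; };

// Normal rendering: one uniform upload and one draw call per object.
void render_normal(FakeGpu& gpu, const std::vector<Vec4>& positions) {
    for (const Vec4& p : positions) {
        (void)p;                // a real renderer would pass p to glUniform4f
        ++gpu.uniform_uploads;
        ++gpu.draw_calls;       // a real renderer would call glDrawElements
        ++gpu.cubes_drawn;
    }
}

// Instanced rendering: one buffer upload and one draw call for all objects.
void render_instanced(FakeGpu& gpu, const std::vector<Vec4>& positions) {
    assert(!positions.empty());
    ++gpu.buffer_uploads;       // a real renderer would call glBufferData
    ++gpu.draw_calls;           // a real renderer would call glDrawElementsInstanced
    gpu.cubes_drawn += positions.size();
}
```

    Feeding the same 1000 positions to both functions leaves the first FakeGpu with 1000 draw calls and the second with a single one, while both report the same number of cubes drawn.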

    The implementation

    I'm going to use a small (~600 lines) framework I wrote for prototyping techniques. This allows me to hide irrelevant code. We are going to draw cubes. The first step to drawing cubes is to create the vertex data of a cube; the framework does this and returns a VAO.

        GLuint box = frm.create_box(); //Vertex Array Object (VAO) of the box

    Then we are going to create the VBO for the instance data: the positions of the cubes. To do this we need a buffer (memory) and a VBO. First let's bind the fresh VAO

        glBindVertexArray( box );

    Then create the buffer

        vector<vec4> positions;
        positions.resize( size * size ); //make some space

    Then create the VBO for this data

        GLuint position_vbo;
        glGenBuffers( 1, &position_vbo ); //gen vbo
        glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo

    Here comes the interesting part: you need to tell the driver that you are going to use this VBO for instancing. To do this you need to tell it these things:

    -which vertex attribute location will you use? (2)
    -how many components does each piece of data have? (vec4, so 4)
    -what type of data are you passing? (floats)
    -is this data normalized? (they are positions, so probably no)
    -how many bytes is each piece of data? (vec4, so 4 * sizeof( float ) )
    -if this data consists of more than four components (like a mat4), then where is this specific data located (relative to the whole piece of data, in bytes)?
    -is this data instanced?

    All this in code:

        GLuint location = 2;
        GLint components = 4;
        GLenum type = GL_FLOAT;
        GLboolean normalized = GL_FALSE;
        GLsizei datasize = sizeof( vec4 );
        char* pointer = 0; //no other components
        GLuint divisor = 1; //instanced

        glEnableVertexAttribArray( location ); //tell the location
        glVertexAttribPointer( location, components, type, normalized, datasize, pointer ); //tell other data
        glVertexAttribDivisor( location, divisor ); //is it instanced?

    If the data you would like to pass is a mat4, for example, then you would end up using 4 vertex attribute locations to pass this data. This requires you to set up the VBO a bit differently, telling where each column (vec4) of the matrix is in each piece of data, in bytes. This is required because you are passing it as a GLvoid*, which means that the size of the data in bytes is unknown (no pointer arithmetic). Therefore you need to work in bytes and convert that to GLvoid*. In code:

        GLuint location = 2;
        GLint components = 4;
        GLenum type = GL_FLOAT;
        GLboolean normalized = GL_FALSE;
        GLsizei datasize = sizeof( mat4 );
        char* pointer = 0;
        GLuint divisor = 1;

        /**
        Matrix:
        float mat[16] =
        {
          1, 0, 0, 0, //first column: location at 0 + 0 * sizeof( vec4 ) bytes into the matrix
          0, 1, 0, 0, //second column: location at 0 + 1 * sizeof( vec4 ) bytes into the matrix
          0, 0, 1, 0, //third column: location at 0 + 2 * sizeof( vec4 ) bytes into the matrix
          0, 0, 0, 1  //fourth column: location at 0 + 3 * sizeof( vec4 ) bytes into the matrix
        };
        /**/

        //you need to do everything for each vertex attribute location
        for( int c = 0; c < 4; ++c )
        {
          glEnableVertexAttribArray( location + c ); //location of each column
          glVertexAttribPointer( location + c, components, type, normalized, datasize, pointer + c * sizeof( vec4 ) ); //tell other data
          glVertexAttribDivisor( location + c, divisor ); //is it instanced?
        }

    The divisor tells the driver whether and how the data is instanced. If the divisor is 0 (the default), the data is not instanced and the attribute advances once per vertex. If it is 1, the attribute advances once per instance. For any other value N > 1, the attribute advances once every N instances, so N consecutive instances read the same piece of data; the instance id (gl_InstanceID) in the vertex shader itself is not changed.

    Next you need to load up the shaders. I'm using a super-simple deferred shader for the sake of maximizing shading efficiency, and making these shaders simple.

    Vertex shader:

        #version 330 core

        uniform mat4 mvp; //modelviewprojection matrix
        uniform mat3 normal_mat;

        layout(location=0) in vec4 in_vertex; //cube vertex position
        layout(location=1) in vec3 in_normal; //cube face normal
        layout(location=2) in vec4 pos; //instance data, unique to each object (instance)

        out vec3 normal;

        void main()
        {
          normal = normal_mat * in_normal;
          gl_Position = mvp * vec4(in_vertex.xyz + pos.xyz, 1); //write to the depth buffer
        }

    Pixel shader:

        #version 330 core

        in vec3 normal;

        layout(location=0) out vec4 color; //normals go here

        void main()
        {
          color = vec4(normal * 0.5 + 0.5, 1);
        }

    Loading the shaders:

        GLuint gbuffer_instanced_shader = 0;
        frm.load_shader( gbuffer_instanced_shader, GL_VERTEX_SHADER, "../shaders/instancing2/gbuffer_instanced.vs" );
        frm.load_shader( gbuffer_instanced_shader, GL_FRAGMENT_SHADER, "../shaders/instancing2/gbuffer.ps" );

        GLint gbuffer_instanced_mvp_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "mvp" );
        GLint gbuffer_instanced_normal_mat_loc = glGetUniformLocation( gbuffer_instanced_shader, "normal_mat" );

    Finally all you need to do is render the cubes. Usually this would look something like this:

        //regular rendering
        glBindVertexArray( box );

        for( int c = 0; c < size; ++c )
        {
          for( int d = 0; d < size; ++d )
          {
            //this gives it some ocean-like movement
            glUniform4f( gbuffer_pos_loc, c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 ) * timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 );
            glDrawElements( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0 ); //two triangles per face, that is 6 * 6 = 36 vertices
          }
        }

    For instancing, however, you need to update the instance buffer first. It looks like this:

        //instanced rendering
        glBindVertexArray( box );

        //store positions in the buffer
        for( int c = 0; c < size; ++c )
        {
          for( int d = 0; d < size; ++d )
          {
            positions[c * size + d] = vec4( c * 3 - size, -2 + 0.5 * sin( radians( ( c + d + 1 ) * timer.getElapsedTime().asSeconds() ) ), -d * 3, 0 );
          }
        }

        //upload the instance data
        glBindBuffer( GL_ARRAY_BUFFER, position_vbo ); //bind vbo
        //you need to upload sizeof( vec4 ) * number_of_cubes bytes, DYNAMIC_DRAW because it is updated per frame
        glBufferData( GL_ARRAY_BUFFER, sizeof( vec4 ) * positions.size(), &positions[0][0], GL_DYNAMIC_DRAW );

        glDrawElementsInstanced( GL_TRIANGLES, 36, GL_UNSIGNED_INT, 0, positions.size() );

    This is it. The rest of the code is setting up the deferred shader, and some controls that should be pretty straightforward.
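
    How the divisor (and, from OpenGL 4.2 on, the base instance) selects a piece of instance data can be written down as a small helper. This is a sketch of the rule as the OpenGL specification describes it for instanced arrays, not code from the project; instanced_element_index is a made-up name, and a divisor of 0 is excluded because such an attribute is indexed per vertex instead.

```cpp
#include <cassert>
#include <cstdint>

// Index of the element that an instanced vertex attribute reads for a given
// instance: floor( instance / divisor ) + base instance.
// Assumption: divisor > 0; a divisor of 0 means the attribute is per-vertex.
std::uint32_t instanced_element_index( std::uint32_t instance_id,
                                       std::uint32_t divisor,
                                       std::uint32_t base_instance = 0 )
{
    assert( divisor > 0 );
    return instance_id / divisor + base_instance; // integer division floors
}
```

    With a divisor of 1 every instance gets its own element, with a divisor of 2 instances 4 and 5 both read element 2, and a base instance of 10 simply shifts the whole range by 10 elements.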

    Interesting Points

    Interestingly, doing the simple sin() on the CPU to update the positions became the bottleneck after ~1,000,000 cubes. If I used a matrix, then matrix multiplication became an issue after ~160,000 cubes. This means that even when doing instancing you still need to be clever about the CPU side (doing the matrix multiplications using SIMD instructions, or in the shaders). After all, updating positions for lots of data is a data-parallel task, which is the kind of work the GPU usually likes.
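
    To look at this CPU cost in isolation, the per-frame position update can be pulled out into a plain function that needs no OpenGL context. The block below is a minimal sketch of that idea, assuming a hand-rolled Vec4 struct and the made-up names to_radians and update_positions; it mirrors the loop from the implementation, except that the elapsed time is read once by the caller and passed in, instead of being queried for every cube.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Illustrative stand-in for the math library's vec4.
struct Vec4 { float x, y, z, w; };

inline float to_radians( float degrees )
{
    return degrees * 3.14159265358979f / 180.0f;
}

// Fills size * size positions with the same ocean-like movement as the article's loop.
// 'seconds' is the elapsed time, sampled once per frame by the caller.
void update_positions( std::vector<Vec4>& positions, int size, float seconds )
{
    assert( size >= 0 );
    positions.resize( static_cast<std::size_t>( size ) * size );

    for( int c = 0; c < size; ++c )
    {
        for( int d = 0; d < size; ++d )
        {
            float y = -2.0f + 0.5f * std::sin( to_radians( ( c + d + 1 ) * seconds ) );
            positions[static_cast<std::size_t>( c ) * size + d] =
                Vec4{ float( c * 3 - size ), y, float( -d * 3 ), 0.0f };
        }
    }
}
```

    A function like this is easy to time on its own for different cube counts, and it is also the part you would move to SIMD code or into the vertex shader.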

    Conclusion

    Instancing is very important to make sure draw calls are not a bottleneck. I hope more and more people will end up using it in the future.

    Additional resources:

    -project source
      controls: WASD, space to toggle between instancing (green) and normal rendering (red)
      building: use cmake to generate project (set CMAKE_BUILD_TYPE to "Release")
      https://docs.google.com/file/d/0B33Sh832pOdObExOLTRCRF9QWU0/edit?usp=sharing
    -OpenGL history
      http://www.opengl.org/wiki/History_of_OpenGL
    -Instancing on the OpenGL wiki
      http://www.opengl.org/wiki/Vertex_Rendering#Instancing
      http://www.opengl.org/wiki/Vertex_Specification#Instanced_arrays
      http://www.opengl.org/wiki/Vertex_Rendering#Transform_feedback_rendering
    -related tutorials I found
      http://ogldev.atspace.co.uk/www/tutorial33/tutorial33.html
      http://sol.gfxile.net/instancing.html
    -instance culling using transform feedback
      http://rastergrid.com/blog/2010/02/instance-culling-using-geometry-shaders/

    Article Update Log

    20 Jun 2013: Fixed typo: vec4 --> mat4 at matrix example, usualy --> usually at 'interesting points' part, becuase --> because
    15 Jun 2013: Initial release

