From the reading I've been doing, it seems like there are a few different ways to supply per-instance data while using instancing in OpenGL. I've tried a couple of these. Here is a rundown, as I understand it. With each example, I'll use the mvp matrix (modelViewProjection) as the per-instance item. I'm hoping that you can help correct any errors in my understanding.
Array Uniforms w/ gl_InstanceID
Example:
layout(location = 0) in vec4 pos;
uniform mat4 mvp[1024];
...
void main() {
    gl_Position = mvp[gl_InstanceID] * pos;
}
With this method, you just declare an array of mat4 as a uniform and use gl_InstanceID to index that array. The main advantage of this method is that it's easy, because it's hardly different from the normal way of using uniforms. However, each element in the array is given its own separate uniform location, and uniform locations are in limited supply (as few as 1024). The array also counts against GL_MAX_VERTEX_UNIFORM_COMPONENTS (16 components per mat4), which can be an even tighter limit: the required minimum is only 1024 components, i.e. 64 mat4s.
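A rough way to estimate how long the mvp array can be on a given implementation is to divide the component budget by 16. This is only a sketch: max_mvp_array_length is a hypothetical helper, it assumes the caller knows how many components the shader's other uniforms use, and it ignores any implementation-specific packing overhead.

```c
#include <assert.h>

/* Components one mat4 uniform occupies in the default uniform block. */
#define MAT4_COMPONENTS 16

/* Hypothetical helper: upper bound on N for `uniform mat4 mvp[N]`, given the
   queried GL_MAX_VERTEX_UNIFORM_COMPONENTS and the components already used by
   the shader's other uniforms. Treat the result as an optimistic estimate. */
static int max_mvp_array_length(int max_vertex_uniform_components, int other_components) {
    assert(other_components >= 0 && other_components <= max_vertex_uniform_components);
    return (max_vertex_uniform_components - other_components) / MAT4_COMPONENTS;
}
```

The first argument would come from glGetIntegerv(GL_MAX_VERTEX_UNIFORM_COMPONENTS, &n); at the spec minimum of 1024 the estimate is 64 matrices.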
Vertex Attributes with Divisor=1
OpenGL example:
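#define MVP_INDEX 2 // first of the 4 consecutive locations used by "in mat4 mvp"
...
// Assumes the VAO used for the instanced draw is currently bound.
glBindBuffer(GL_ARRAY_BUFFER, mvpBuffer);
for (int i = 0; i < 4; ++i) {
    GLuint index = MVP_INDEX + i; // one vec4 attribute per matrix column
    glEnableVertexAttribArray(index);
    // stride = one whole mat4 per instance; offset = start of column i
    glVertexAttribPointer(index, 4, GL_FLOAT, GL_FALSE, sizeof(GLfloat) * 16,
                          (GLvoid*)(sizeof(GLfloat) * i * 4));
    glVertexAttribDivisor(index, 1); // advance once per instance, not per vertex
}
glBindBuffer(GL_ARRAY_BUFFER, 0); // the VAO keeps the buffer recorded for each attribute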
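The stride and offset arithmetic in the loop above can be spelled out as plain C. This is a sketch, assuming each instance record in mvpBuffer is a column-major mat4 of 16 tightly packed floats; InstanceMVP and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instance record: one mat4 as 16 packed floats, column-major,
   so its four columns start at byte offsets 0, 16, 32 and 48. */
typedef struct { float m[16]; } InstanceMVP;

_Static_assert(sizeof(InstanceMVP) == 64, "expected a 64-byte mat4 record");

/* `stride` argument shared by all four glVertexAttribPointer calls. */
static size_t mvp_stride(void) { return sizeof(InstanceMVP); }

/* Byte offset of column `col`; this is the value cast to (GLvoid*). */
static size_t mvp_column_offset(int col) {
    assert(col >= 0 && col < 4);
    return offsetof(InstanceMVP, m) + sizeof(float) * 4u * (size_t)col;
}

/* Attribute location fed by column `col` when `in mat4 mvp` starts at `base`. */
static int mvp_column_location(int base, int col) {
    assert(col >= 0 && col < 4);
    return base + col;
}
```

With MVP_INDEX = 2, column 3 lands at location 5 and starts 48 bytes into each 64-byte record.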
GLSL example:
layout(location = 0) in vec4 pos;
layout(location = 2) in mat4 mvp; // a mat4 input occupies locations 2, 3, 4 and 5
...
void main() {
    gl_Position = mvp*pos;
}
With this method, the mvp matrix just looks like a vertex attribute from the GLSL side of things. However, since a divisor of 1 was specified on the OpenGL side, only one matrix is stored per instance rather than one per vertex. This allows very clean access to a large number of matrices (as many as a buffer object can hold). You also get all of the advantages of other buffer objects, such as streaming with orphaning or mapping strategies. However, each matrix uses four vertex attrib locations, and GL_MAX_VERTEX_ATTRIBS can be as low as 16. If you plan on using a shader that requires multiple sets of UV coordinates, blend weights, etc., then you may not have enough vertex attrib locations left to use this method.
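The location budget can be tallied mechanically. This sketch counts float-based inputs only (double types such as dvec4 or dmat4 can take more locations); the AttribKind enum and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of float-based vertex shader inputs. */
enum AttribKind { ATTRIB_VEC, ATTRIB_MAT2, ATTRIB_MAT3, ATTRIB_MAT4 };

/* Generic attribute locations consumed by one input: one per matrix column. */
static int attrib_locations(enum AttribKind kind) {
    switch (kind) {
        case ATTRIB_MAT2: return 2;
        case ATTRIB_MAT3: return 3;
        case ATTRIB_MAT4: return 4;
        default:          return 1; /* float, vec2, vec3, vec4 */
    }
}

/* Locations still free after declaring `count` inputs, given the queried
   GL_MAX_VERTEX_ATTRIBS. A negative result means the shader will not fit. */
static int attrib_locations_remaining(int max_vertex_attribs,
                                      const enum AttribKind *inputs, size_t count) {
    assert(count == 0 || inputs != NULL);
    int used = 0;
    for (size_t i = 0; i < count; ++i)
        used += attrib_locations(inputs[i]);
    return max_vertex_attribs - used;
}
```

For example, pos + mvp + two UV sets + blend weights + joint indices uses 1 + 4 + 1 + 1 + 1 + 1 = 9 locations, leaving 7 of a 16-location minimum.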
So, I'm trying to find a method that will allow thousands of instances without using up precious vertex attrib locations. I am hoping that Uniform Buffer Objects or SSBOs will come to the rescue. I haven't yet attempted to use them for this purpose, nor have I found many examples of people online using them for this purpose. Maybe there is a reason for that. So here's my current understanding of how it works. I would be much obliged if someone could read it over and tell me where I'm wrong.
Uniform Buffer Objects
OpenGL example:
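GLuint mvpBuffer;
// GenBuffers, BufferData, etc.
glBindBufferBase(GL_UNIFORM_BUFFER, 0, mvpBuffer); // attach the buffer to binding point 0
// Look the block up by its block name ("MVP"), not by an instance name
GLuint uniformBlockIndex = glGetUniformBlockIndex(myProgram, "MVP");
glUniformBlockBinding(myProgram, uniformBlockIndex, 0); // block "MVP" reads from binding point 0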
GLSL example:
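layout(location = 0) in vec4 pos;
// One block holding the whole array. An array of blocks ("} mvps[1024];") would
// declare 1024 separate blocks, far beyond GL_MAX_VERTEX_UNIFORM_BLOCKS.
layout(std140, row_major) uniform MVP {
    mat4 mvp[1024];
};
void main() {
    gl_Position = mvp[gl_InstanceID] * pos;
}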
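Under std140 a mat4 is four 16-byte-aligned vec4s, so each array element is exactly 64 bytes with no padding, and tightly packed float[16] matrices can be copied into the buffer as-is. The sketch below assumes the block's only member is the mvp array; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* std140: 4 vec4 columns (or rows, with row_major) x 16 bytes = 64 bytes,
   and the array stride of mat4 is also 64. */
#define STD140_MAT4_SIZE 64u

_Static_assert(sizeof(float) * 16 == STD140_MAT4_SIZE, "expected 32-bit floats");

/* Byte offset of mvp[i] from the start of the block. */
static size_t std140_mvp_offset(size_t i) { return i * STD140_MAT4_SIZE; }

/* Bytes to allocate (e.g. with glBufferData) for n matrices. */
static size_t std140_mvp_block_size(size_t n) { return n * STD140_MAT4_SIZE; }

/* Copy n matrices into a staging buffer laid out like `mat4 mvp[N]`. */
static void std140_pack_mvps(unsigned char *dst, size_t dst_size,
                             const float (*mats)[16], size_t n) {
    assert(dst_size >= std140_mvp_block_size(n));
    for (size_t i = 0; i < n; ++i)
        memcpy(dst + std140_mvp_offset(i), mats[i], STD140_MAT4_SIZE);
}
```

The row_major qualifier changes how the 16 floats are interpreted, not the 64-byte stride.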
It seems like this could alleviate the restrictions on attrib locations. However, you are limited by GL_MAX_UNIFORM_BLOCK_SIZE, which applies to the whole mvp array. The spec only guarantees 16 kB, though 64 kB is common on desktop hardware; at 64 bytes per mat4, 64 kB allows only 1024 instances (and 16 kB only 256), which is no better than the first method's location limit.
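One common way around the per-block limit, not something from the original post, is to keep all matrices in one large buffer and rebind a block-sized window with glBindBufferRange before each instanced draw. gl_InstanceID restarts at 0 in every draw, so mvp[gl_InstanceID] indexes into whichever window is bound. This sketch only does the offset arithmetic; the helper names are hypothetical, and it assumes the full per-batch range is bound every time (so the buffer is padded for the last batch).

```c
#include <assert.h>
#include <stddef.h>

#define MAT4_BYTES 64u

static size_t align_up(size_t x, size_t align) {
    assert(align > 0);
    return (x + align - 1) / align * align;
}

/* Matrices per draw: what fits in a block of GL_MAX_UNIFORM_BLOCK_SIZE bytes.
   The shader's mvp[] array would be declared with this length. */
static size_t ubo_instances_per_batch(size_t max_uniform_block_size) {
    return max_uniform_block_size / MAT4_BYTES;
}

static size_t ubo_batch_count(size_t total_instances, size_t per_batch) {
    assert(per_batch > 0);
    return (total_instances + per_batch - 1) / per_batch;
}

/* Start of batch b; glBindBufferRange offsets must be multiples of
   GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT. Instance j of batch b lives at
   ubo_batch_offset(b, ...) + j * 64. */
static size_t ubo_batch_offset(size_t b, size_t per_batch, size_t offset_alignment) {
    return b * align_up(per_batch * MAT4_BYTES, offset_alignment);
}

/* Allocation so every batch, including the last, can be bound with the
   full per_batch * 64 byte range. */
static size_t ubo_buffer_size(size_t total_instances, size_t per_batch,
                              size_t offset_alignment) {
    return ubo_batch_count(total_instances, per_batch) *
           align_up(per_batch * MAT4_BYTES, offset_alignment);
}
```

Per batch b, the draw loop would call glBindBufferRange(GL_UNIFORM_BUFFER, 0, mvpBuffer, ubo_batch_offset(b, perBatch, align), perBatch * 64) and then glDrawArraysInstanced with that batch's instance count. This costs one draw call per batch instead of one per frame.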
Shader Storage Buffer Objects
This method would be essentially identical to the Uniform Buffer method, except that the interface block is declared as buffer instead of uniform, and it can use a lot more memory (limited by GL_MAX_SHADER_STORAGE_BLOCK_SIZE). SSBOs require OpenGL 4.3. You can also write to the SSBO from within the shader, but that is not necessary for this application. On the downside, the Wiki says that SSBO access is slower than uniform buffer access. Again, I haven't tested this myself, so I may be mistaken about how this works.
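With an SSBO the last block member may be an unsized array, so the shader no longer hard-codes a 1024-element cap. The sketch below assumes the GLSL side shown in its comment and just checks a requested instance count against the queried limit; the helper names are hypothetical. As far as I know, the OpenGL Wiki cites a guaranteed minimum of 128 MB (2^27 bytes), but the queried value is what counts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GLSL side (GL 4.3+):
     layout(std430) buffer MVPs { mat4 mvp[]; };  // unsized last member
     ...  gl_Position = mvp[gl_InstanceID] * pos;
   A std430 mat4 is still 64 bytes, so the buffer needs N * 64 bytes. */
static size_t ssbo_bytes_needed(size_t instances) {
    assert(instances <= SIZE_MAX / 64u); /* guard against size_t overflow */
    return instances * 64u;
}

/* Compare against the value queried for GL_MAX_SHADER_STORAGE_BLOCK_SIZE. */
static bool ssbo_fits(size_t instances, size_t max_storage_block_size) {
    return ssbo_bytes_needed(instances) <= max_storage_block_size;
}
```

At 2^27 bytes this allows 2,097,152 matrices in a single draw, versus 1024 for a 64 kB uniform block.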