Sign in to follow this  

OpenGL problem writing complex vertex shader

Recommended Posts

eastcowboy    170

Greetings, everyone.


Recently I've been interested in Warcraft3's model system.

I download the War3ModelEditor source code (from:, read it, and rewrite a program witch can render Warcraft3's model using OpenGL ES.

When I run this code on an Android phone, it looks good but, when there're more than 5 models in the screen, the FPS becomes very low.


Currently I do all the bone animation(matrix calculation and vertex position calculation) in CPU side.

I think it might be faster if we can do all these works in GPU side.

But I just don't know how to do it sad.png

The Warcraft3's vertex position calculation is complex for me.


Let me explain a little more.

In a Warcraft3's model, each vertex is linked to one or moe bone.

Here is how the War3ModelEditor calculate the vertex's position:

step1. for each bone[i], calculate matrix_list[i]
step2. for each vertex
           position = (matrix_list[vertex_bone[0]] * v
                    +  matrix_list[vertex_bone[1]] * v
                    +  ...
                    +  matrix_list[vertex_bone[n]] * v) / n

note: n is the length of 'vertex_bone', each vertex may have a different 'vertex_bone'.

Actually, several vertex can share a same 'vertex_bone' array,

while several other vertex share another 'vertex_bone' array.

For example, a model with 500 vertices may have only 35 different 'vertex_bone' arrays.

But I don't know how can I make use of this, to optimize the performance.




The step1 may be easy. Since a typical Warcraft3 model will have less than 30 bones, we can do this step in CPU side without much performance hit.

But step2 is quite complex.


If I write a vertex shader (GLSL) it will be something like this:

uniform mat4 u_matrix_list[50]; /* there might be more ?? */
attribute float a_n;
attribute float a_vertex_bone[4]; /* there might be more ?? */
attribute vec4 a_position;
void main() {
  float i;
  vec4 p = vec4(0.0, 0.0, 0.0, 1.0);
  for (i = 0; i < a_n; ++i) {
    p += u_matrix_list[int(a_vertex_bone[int(i)])] * a_position;
  gl_Position = p / float(a_n);

There're some problems.

1. When I compile the vertex shader above (on my laptop, either than an Android phone), it reports 'success' with a warning message 'OpenGL does not allow attributes of type float[4]'.

And some times (when I change the order of the 3 attributes) it cause my program goes down, with a message 'The NVIDIA OpenGL driver lost connection with the display driver due to exceeding the Windows Time-Out limit and is unable to continue.'

2. The book <OpenGL ES 2.0 Programming Guide> page 83, says that 'many OpenGL ES only mandates that array indexing be supported by constant integral expressions (there is an exception to this, which is the indexing of uniform variables in vertex shaders that is discussed in Chapter 8).', so the statement 'a_vertex_bone[int(i)]' might not work on some OpenGL ES hardware.



Actually I've never write such a complex(?) shader before.

Any one could you give me some advice?

Thank you.

Share this post

Link to post
Share on other sites
Koehler    228

You're on the right track!  A uniform array of bones, and vertex attributes that index into said array is the common way to handle this. 


For your specific problem, I have a solution that should work but will limit you to 4 bones per vertex (I can't imagine this is a problem for WC3 models, but please let me know if it is.)

You could try representing your bone weights as a vec4 instead of an array in the attribute. From there, you could add a second vec4 attribute representing how many bones affect a vertex (such as [1.0, 1.0, 0.0, 0.0] for two bones).


Finally, If you take the dot product of this vector with itself, you conveniently enough get the number of bones out! (if we call the vector above v, then dot(v,v) = (1.0*1.0  + 1.0*1.0 + 0.0*0.0 + 0.0*0.0) = 2.0)


This would change your attribs to: 

attribute vec4 a_position;
attribute vec4 bone_weights;
attribute vec4 bone_mask;

You would also remove the for loop above, and just say 

vec4 p = vec4(0,0,0,1);
p += u_matrix_list[int(bone_weights.x)]* a_position*bone_mask.x;
p += u_matrix_list[int(bone_weights.y)]* a_position*bone_mask.y;
p += u_matrix_list[int(bone_weights.z)]* a_position*bone_mask.z;
p += u_matrix_list[int(bone_weights.w)]* a_position*bone_mask.w;
gl_Position = p / dot(bone_mask,bone_mask);

Hope this helps!

Share this post

Link to post
Share on other sites
eastcowboy    170

Koehler, thank you very much for your reply. It helps me a lot.

Especially the 'dot product', that is wonderful.


But let me point out this.

The code "vec4 p = vec4(0,0,0,1);" you wrote, will actually be "vec4 p = vec4(0,0,0,0);". Or the transformation will not be correct.


Based on your idea, I've changed my source code.

I'm not very famillar about OpenGL version 2.0 and above. Fortunately I did it with a success:).

And there're still some issues that need to be think about.


Let me put my shader source code down here:

(Yes you can see there's something like gl_TextureMatrix and gl_ModelViewProjectionMatrix. That's because the first version of my program is written on an old PC witch only supports OpenGL 1.4. I'll modify these when necessary)

/* vertex shader */
uniform mat4 u_matrix_list[202];
attribute vec3 a_position;
attribute vec2 a_texcoord;
attribute vec4 a_mat_indices;
attribute vec4 a_mat_weights;
varying vec2 v_texcoord;
void main() {
  v_texcoord = (gl_TextureMatrix[0] * vec4(a_texcoord, 0.0, 1.0)).xy;
  vec4 p0 = vec4(a_position, 1.0);
  vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
  p += (u_matrix_list[(int)a_mat_indices[0]] * p0) * a_mat_weights[0];
  p += (u_matrix_list[(int)a_mat_indices[1]] * p0) * a_mat_weights[1];
  p += (u_matrix_list[(int)a_mat_indices[2]] * p0) * a_mat_weights[2];
  p += (u_matrix_list[(int)a_mat_indices[3]] * p0) * a_mat_weights[3];
  p /= dot(a_mat_weights, a_mat_weights);
  gl_Position = gl_ModelViewProjectionMatrix * p;

/* fragment shader */
uniform sampler2D tex;
uniform vec4 u_color;
varying vec2 v_texcoord;
void main() {
  gl_FragColor = u_color * texture2D(tex, v_texcoord);


1. I wrote "uniform mat4 u_matrix_list[202];", this is a very large array for GPU.

    I found that many of Warcraft3's unit model have less than 100 bones. For example a water elemental has 69 bones, and a footman has 49 bones.

    But the buildings' model have many more bones. When I use the model 'AncientOfLore.mdx' for test. I found that it has 202 bones. So I declared such a large array. According to the MDX format, there can be up to 256 nodes(since the node's ID is a BYTE). But when I wrote "uniform mat4 u_matrix_list[256];" the glLinkProgram fails, with an error message "error C6007: Constant register limit exceeded; more than 1024 constant registers needed to compiled program".

   I hear that if we store a mat4 as 3 vec4, it may save some space. But that may not be enough. The OpenGL ES 2.0 only ensure to have 128 vec4 uniform variables (glGetIntegeri with GL_MAX_VERTEX_UNIFORM_VECTORS), so we can only use 128 / 3 = 42 bones or less?

  Or we can try to use a texture to store some more data. The book <OpenGL ES 2.0 Programming Guide> says that "Samplers in a vertex shader are optional". The POWERVR SGX seems to support it. But we need some more information to decide whether or not to use it.


2. Yes, the <Warcraft III Art Tools Documention.pdf> says that "Up to four bones can influence one vertex.". So we can use an vec4 attribute to simulate an float[4] array.

    But I found there're some exceptions. For example a water elemetal has some vertices that are influenced by up to 6 bones. This is not very critical because we can add 2 more attribute to fix it.

    In my test I just use the first 4 bones, and ignore the last 2, it looks fine without any obvious problem. So let's just ignore it for now:)

Share this post

Link to post
Share on other sites
eastcowboy    170

Here's some snapshot of my test program.

I'd like to share my happy feeling with you. Thank you again.





Share this post

Link to post
Share on other sites
Koehler    228

Glad to see you caught my mistake. I was calling "indices" weights, also. Clearly I didn't test that code :/


Those results look good! I am surprised that the ancients have so many bones. If I had to guess, maybe WC3 probably did software skinning so it didn't matter?


As an option, maybe you could look through the model and split the mesh based on the bone indices accessed? (half for indices < 110 or something, half for >110) and do two draw calls for the big guys.. This would work best if pieces don't rely on the root bone bones too much.


Alternatively you could split the model and duplicate the most shared bones into each of the two smaller models' bone arrays, changing the indices on your vertex data appropriately. It still might let you cut down the number enough to fit into your uniform space. 

Edited by Koehler

Share this post

Link to post
Share on other sites
eastcowboy    170

Yes the 'AncientOfLore.mdx' has many bones. When I found this for the first time, I am surprised too.

Once again the Warcraft3's model do not obey the rule they've made in the <Warcraft III Art Tools Documention.pdf>.

According to the documention, a building should have at most 15 bones. And a really big unit should have at most 30 bones.


By the way, OpenGL 2.0 spec is released on the year 2004. Warcraft3 is released before that. So I think Warcraft3 is not using a shader to do the bone animation.


I've noticed that, not all the bones are used by the mesh. Some of the bones are used for attaching another model, or used by a particle emitter, etc.

For example, when an AncientOfLore tree was badly damaged, some places of the tree body will be on fire. Each place uses a particle emitter to draw the fire, and a particle emitter needs a bone. Simply speaking, 6 places of fire will use 6 bones.

We can ignore these bones when we are loading bone matrices to the shader.


There is a concept named "geoset" in the Warcraft3's model. A geoset contains data like vertex positions, texture coords, normals, and the indices of bone matrix. One model may have one or more geoset(s).

Before today I thought that each vertex in each geoset can be linked to any bone of this model. When I see these words "split the mesh" I guess we may make use of the geoset directly, rather than split the mesh by an algorithm.

So I did a simple test.

The 'AncientOfLore.mdx' model has 12 geosets. And in the animation sequence "stand work alternate" there're 6 of them are visible(The documention says that one model should have at most 5 visible geosets!). The number of bones used in each geoset are: 27, 62, 3, 3, 8, 2. All these numbers are much lesser than 202.

But for OpenGL ES, the 62 bones is still too many and will need to split into smaller parts.


So if I need to display an 'AncientOfLore.mdx' on my Android phone, I have to design an algorithm to split a geoset into two or more small geosets.

The next step is to design and implement this algorithm. I think that will not be easy for me. But I'll try it.

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this  

  • Similar Content

    • By Zaphyk
      I am developing my engine using the OpenGL 3.3 compatibility profile. It runs as expected on my NVIDIA card and on my Intel Card however when I tried it on an AMD setup it ran 3 times worse than on the other setups. Could this be a AMD driver thing or is this probably a problem with my OGL code? Could a different code standard create such bad performance?
    • By Kjell Andersson
      I'm trying to get some legacy OpenGL code to run with a shader pipeline,
      The legacy code uses glVertexPointer(), glColorPointer(), glNormalPointer() and glTexCoordPointer() to supply the vertex information.
      I know that it should be using setVertexAttribPointer() etc to clearly define the layout but that is not an option right now since the legacy code can't be modified to that extent.
      I've got a version 330 vertex shader to somewhat work:
      #version 330 uniform mat4 osg_ModelViewProjectionMatrix; uniform mat4 osg_ModelViewMatrix; layout(location = 0) in vec4 Vertex; layout(location = 2) in vec4 Normal; // Velocity layout(location = 3) in vec3 TexCoord; // TODO: is this the right layout location? out VertexData { vec4 color; vec3 velocity; float size; } VertexOut; void main(void) { vec4 p0 = Vertex; vec4 p1 = Vertex + vec4(Normal.x, Normal.y, Normal.z, 0.0f); vec3 velocity = (osg_ModelViewProjectionMatrix * p1 - osg_ModelViewProjectionMatrix * p0).xyz; VertexOut.velocity = velocity; VertexOut.size = TexCoord.y; gl_Position = osg_ModelViewMatrix * Vertex; } What works is the Vertex and Normal information that the legacy C++ OpenGL code seem to provide in layout location 0 and 2. This is fine.
      What I'm not getting to work is the TexCoord information that is supplied by a glTexCoordPointer() call in C++.
      What layout location is the old standard pipeline using for glTexCoordPointer()? Or is this undefined?
      Side note: I'm trying to get an OpenSceneGraph 3.4.0 particle system to use custom vertex, geometry and fragment shaders for rendering the particles.
    • By markshaw001
      Hi i am new to this forum  i wanted to ask for help from all of you i want to generate real time terrain using a 32 bit heightmap i am good at c++ and have started learning Opengl as i am very interested in making landscapes in opengl i have looked around the internet for help about this topic but i am not getting the hang of the concepts and what they are doing can some here suggests me some good resources for making terrain engine please for example like tutorials,books etc so that i can understand the whole concept of terrain generation.
    • By KarimIO
      Hey guys. I'm trying to get my application to work on my Nvidia GTX 970 desktop. It currently works on my Intel HD 3000 laptop, but on the desktop, every bind textures specifically from framebuffers, I get half a second of lag. This is done 4 times as I have three RGBA textures and one depth 32F buffer. I tried to use debugging software for the first time - RenderDoc only shows SwapBuffers() and no OGL calls, while Nvidia Nsight crashes upon execution, so neither are helpful. Without binding it runs regularly. This does not happen with non-framebuffer binds.
      GLFramebuffer::GLFramebuffer(FramebufferCreateInfo createInfo) { glGenFramebuffers(1, &fbo); glBindFramebuffer(GL_FRAMEBUFFER, fbo); textures = new GLuint[createInfo.numColorTargets]; glGenTextures(createInfo.numColorTargets, textures); GLenum *DrawBuffers = new GLenum[createInfo.numColorTargets]; for (uint32_t i = 0; i < createInfo.numColorTargets; i++) { glBindTexture(GL_TEXTURE_2D, textures[i]); GLint internalFormat; GLenum format; TranslateFormats(createInfo.colorFormats[i], format, internalFormat); // returns GL_RGBA and GL_RGBA glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, createInfo.width, createInfo.height, 0, format, GL_FLOAT, 0); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); DrawBuffers[i] = GL_COLOR_ATTACHMENT0 + i; glBindTexture(GL_TEXTURE_2D, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0 + i, textures[i], 0); } if (createInfo.depthFormat != FORMAT_DEPTH_NONE) { GLenum depthFormat; switch (createInfo.depthFormat) { case FORMAT_DEPTH_16: depthFormat = GL_DEPTH_COMPONENT16; break; case FORMAT_DEPTH_24: depthFormat = GL_DEPTH_COMPONENT24; break; case FORMAT_DEPTH_32: depthFormat = GL_DEPTH_COMPONENT32; break; case FORMAT_DEPTH_24_STENCIL_8: depthFormat = GL_DEPTH24_STENCIL8; break; case FORMAT_DEPTH_32_STENCIL_8: depthFormat = GL_DEPTH32F_STENCIL8; break; } glGenTextures(1, &depthrenderbuffer); glBindTexture(GL_TEXTURE_2D, depthrenderbuffer); glTexImage2D(GL_TEXTURE_2D, 0, depthFormat, createInfo.width, createInfo.height, 0, GL_DEPTH_COMPONENT, GL_FLOAT, 0); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST); glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST); glBindTexture(GL_TEXTURE_2D, 0); glFramebufferTexture(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, depthrenderbuffer, 0); } if (createInfo.numColorTargets > 0) glDrawBuffers(createInfo.numColorTargets, DrawBuffers); else glDrawBuffer(GL_NONE); if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE) std::cout << "Framebuffer Incomplete\n"; glBindFramebuffer(GL_FRAMEBUFFER, 0); width = createInfo.width; height = createInfo.height; } // ... // FBO Creation FramebufferCreateInfo gbufferCI; gbufferCI.colorFormats =; gbufferCI.depthFormat = FORMAT_DEPTH_32; gbufferCI.numColorTargets = gbufferCFs.size(); gbufferCI.width = engine.settings.resolutionX; gbufferCI.height = engine.settings.resolutionY; gbufferCI.renderPass = nullptr; gbuffer = graphicsWrapper->CreateFramebuffer(gbufferCI); // Bind glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fbo); // Draw here... // Bind to textures glActiveTexture(GL_TEXTURE0); glBindTexture(GL_TEXTURE_2D, textures[0]); glActiveTexture(GL_TEXTURE1); glBindTexture(GL_TEXTURE_2D, textures[1]); glActiveTexture(GL_TEXTURE2); glBindTexture(GL_TEXTURE_2D, textures[2]); glActiveTexture(GL_TEXTURE3); glBindTexture(GL_TEXTURE_2D, depthrenderbuffer); Here is an extract of my code. I can't think of anything else to include. I've really been butting my head into a wall trying to think of a reason but I can think of none and all my research yields nothing. Thanks in advance!
    • By Adrianensis
      Hi everyone, I've shared my 2D Game Engine source code. It's the result of 4 years working on it (and I still continue improving features ) and I want to share with the community. You can see some videos on youtube and some demo gifs on my twitter account.
      This Engine has been developed as End-of-Degree Project and it is coded in Javascript, WebGL and GLSL. The engine is written from scratch.
      This is not a professional engine but it's for learning purposes, so anyone can review the code an learn basis about graphics, physics or game engine architecture. Source code on this GitHub repository.
      I'm available for a good conversation about Game Engine / Graphics Programming
  • Popular Now