Problem writing a complex vertex shader

4 comments, last by eastcowboy 10 years, 8 months ago

Greetings, everyone.

Recently I've become interested in Warcraft3's model system.

I downloaded the War3ModelEditor source code (from: http://home.magosx.com/index.php?topic=6.0), read it, and wrote a program that renders Warcraft3 models using OpenGL ES.

When I run this code on an Android phone, it looks good, but when there are more than 5 models on the screen, the FPS drops sharply.

Currently I do all the bone animation (matrix calculation and vertex position calculation) on the CPU side.

I think it might be faster if all this work were done on the GPU side.

But I just don't know how to do it.

Warcraft3's vertex position calculation is fairly complex, at least for me.

Let me explain a little more.

In a Warcraft3 model, each vertex is linked to one or more bones.

Here is how War3ModelEditor calculates each vertex's position:


step 1. for each bone[i], calculate matrix_list[i]
step 2. for each vertex
           position = (matrix_list[vertex_bone[0]] * v
                    +  matrix_list[vertex_bone[1]] * v
                    +  ...
                    +  matrix_list[vertex_bone[n-1]] * v) / n

Note: n is the length of 'vertex_bone', and each vertex may have a different 'vertex_bone' array.
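For reference, step 2 could be written in C roughly as follows. This is a minimal sketch, not War3ModelEditor's actual code. It assumes column-major 4x4 matrices (the layout OpenGL uses) and affine bone matrices, so only x, y and z are computed. The names Mat4, Vec3, mat4_transform_point and skin_vertex are hypothetical.

```c
#include <stddef.h>

/* Column-major 4x4 matrix, the same layout OpenGL uses for uniforms. */
typedef struct { float m[16]; } Mat4;
typedef struct { float x, y, z; } Vec3;

/* Transform a point (implicit w = 1) by a column-major affine matrix. */
static Vec3 mat4_transform_point(const Mat4 *a, Vec3 v) {
    Vec3 r;
    r.x = a->m[0] * v.x + a->m[4] * v.y + a->m[8]  * v.z + a->m[12];
    r.y = a->m[1] * v.x + a->m[5] * v.y + a->m[9]  * v.z + a->m[13];
    r.z = a->m[2] * v.x + a->m[6] * v.y + a->m[10] * v.z + a->m[14];
    return r;
}

/* Step 2 for one vertex: the average of v transformed by each of its n bones. */
Vec3 skin_vertex(const Mat4 *matrix_list, const int *vertex_bone, size_t n, Vec3 v) {
    Vec3 sum = {0.0f, 0.0f, 0.0f};
    for (size_t k = 0; k < n; ++k) {
        Vec3 t = mat4_transform_point(&matrix_list[vertex_bone[k]], v);
        sum.x += t.x; sum.y += t.y; sum.z += t.z;
    }
    if (n > 0) { sum.x /= (float)n; sum.y /= (float)n; sum.z /= (float)n; }
    return sum;
}
```

Take a vertex at (1, 1, 1) linked to an identity bone and to a bone that translates by (2, 0, 0). The result is ((1 + 3) / 2, 1, 1) = (2, 1, 1).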

Actually, several vertices can share the same 'vertex_bone' array, while several other vertices share another one.

For example, a model with 500 vertices may have only 35 distinct 'vertex_bone' arrays.

But I don't know how to make use of this to improve performance.
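One possible way to exploit the shared arrays (a suggestion, not something from the original program) relies on matrix-vector multiplication being linear: (M[b0]*v + ... + M[b(n-1)]*v) / n = ((M[b0] + ... + M[b(n-1)]) / n) * v. So you can blend each distinct 'vertex_bone' array into one matrix per frame, for example 35 blends instead of up to n multiplies for each of 500 vertices. After that, each vertex needs a single matrix multiply and a single group index. The BoneGroup type and the function names are hypothetical.

```c
#include <stddef.h>

typedef struct { float m[16]; } Mat4;  /* column-major, as uploaded to OpenGL */

/* Hypothetical record for one distinct 'vertex_bone' array shared by many vertices. */
typedef struct {
    const int *bones;  /* indices into matrix_list */
    size_t count;      /* n for this group */
} BoneGroup;

/* Average the group's bone matrices element by element.
   By linearity, blended * v equals the average of M[b] * v over the group. */
Mat4 blend_group_matrix(const Mat4 *matrix_list, const BoneGroup *g) {
    Mat4 out = {{0}};
    for (size_t k = 0; k < g->count; ++k)
        for (int e = 0; e < 16; ++e)
            out.m[e] += matrix_list[g->bones[k]].m[e];
    if (g->count > 0)
        for (int e = 0; e < 16; ++e)
            out.m[e] /= (float)g->count;
    return out;
}

/* Once per frame: one blended matrix per group. */
void blend_all_groups(const Mat4 *matrix_list, const BoneGroup *groups,
                      size_t group_count, Mat4 *group_matrices) {
    for (size_t i = 0; i < group_count; ++i)
        group_matrices[i] = blend_group_matrix(matrix_list, &groups[i]);
}
```

On the GPU this would mean uploading one matrix per group instead of one per bone. That only pays off while the number of groups stays within the uniform budget.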


Step 1 may be easy. Since a typical Warcraft3 model has fewer than 30 bones, we can do this step on the CPU side without much of a performance hit.
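The part of step 1 that is common to any bone hierarchy is composing each bone's matrix with its parent's. The C sketch below shows only that part. It assumes each bone's local matrix for the current frame has already been computed (anything MDX-specific, such as keyframe interpolation, is treated as folded into it). It also assumes parents are stored before their children, with parent index -1 for roots. All names are hypothetical.

```c
#include <stddef.h>

typedef struct { float m[16]; } Mat4;  /* column-major: element (row, col) is m[col * 4 + row] */

/* r = a * b for column-major 4x4 matrices. */
static Mat4 mat4_mul(const Mat4 *a, const Mat4 *b) {
    Mat4 r;
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row) {
            float s = 0.0f;
            for (int k = 0; k < 4; ++k)
                s += a->m[k * 4 + row] * b->m[col * 4 + k];
            r.m[col * 4 + row] = s;
        }
    return r;
}

/* matrix_list[i] = matrix_list[parent[i]] * local[i].
   Assumes parent[i] < i for every non-root bone and parent[i] == -1 for roots. */
void compute_matrix_list(const Mat4 *local, const int *parent, size_t bone_count,
                         Mat4 *matrix_list) {
    for (size_t i = 0; i < bone_count; ++i)
        matrix_list[i] = parent[i] < 0 ? local[i]
                                       : mat4_mul(&matrix_list[parent[i]], &local[i]);
}
```

With fewer than 30 bones this is a few dozen 4x4 multiplies per model per frame, which agrees with the estimate above that step 1 is cheap on the CPU.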

But step2 is quite complex.

If I write a vertex shader (GLSL), it will be something like this:


uniform mat4 u_matrix_list[50]; /* there might be more ?? */
attribute float a_n;
attribute float a_vertex_bone[4]; /* there might be more ?? */
attribute vec4 a_position;
void main() {
  float i;
  vec4 p = vec4(0.0, 0.0, 0.0, 1.0);
  for (i = 0; i < a_n; ++i) {
    p += u_matrix_list[int(a_vertex_bone[int(i)])] * a_position;
  }
  gl_Position = p / float(a_n);
}

There are some problems.

1. When I compile the vertex shader above (on my laptop, rather than an Android phone), it reports 'success' with the warning 'OpenGL does not allow attributes of type float[4]'.

And sometimes (when I change the order of the 3 attributes) it crashes my program with the message 'The NVIDIA OpenGL driver lost connection with the display driver due to exceeding the Windows Time-Out limit and is unable to continue.'

2. The book <OpenGL ES 2.0 Programming Guide>, page 83, says that 'many OpenGL ES only mandates that array indexing be supported by constant integral expressions (there is an exception to this, which is the indexing of uniform variables in vertex shaders that is discussed in Chapter 8).', so the expression 'a_vertex_bone[int(i)]' might not work on some OpenGL ES hardware.

Actually, I've never written such a complex shader before.

Could anyone give me some advice?

Thank you.


You're on the right track! A uniform array of bones, and vertex attributes that index into said array is the common way to handle this.

For your specific problem, I have a solution that should work but will limit you to 4 bones per vertex (I can't imagine this is a problem for WC3 models, but please let me know if it is.)

You could try representing your bone weights as a vec4 instead of an array in the attribute. From there, you could add a second vec4 attribute representing how many bones affect a vertex (such as [1.0, 1.0, 0.0, 0.0] for two bones).

Finally, if you take the dot product of this vector with itself, you conveniently get the number of bones! (If we call the vector above v, then dot(v,v) = 1.0*1.0 + 1.0*1.0 + 0.0*0.0 + 0.0*0.0 = 2.0.)
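The trick works because every mask entry is 0 or 1. Squaring such an entry leaves it unchanged, so the dot product simply counts the ones. Here is a small CPU-side illustration; make_bone_mask and dot4 are hypothetical helper names, and dot4 mirrors GLSL's dot() for a vec4.

```c
/* Build the 0/1 mask for a vertex influenced by `count` bones (at most 4). */
void make_bone_mask(int count, float mask[4]) {
    for (int k = 0; k < 4; ++k)
        mask[k] = (k < count) ? 1.0f : 0.0f;
}

/* Same result as GLSL dot(a, b) for two vec4 values. */
float dot4(const float a[4], const float b[4]) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}
```

For a two-bone vertex, make_bone_mask(2, m) yields [1, 1, 0, 0], and dot4(m, m) gives 2.0.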

This would change your attribs to:


attribute vec4 a_position;
attribute vec4 bone_weights;
attribute vec4 bone_mask;

You would also remove the for loop above and just write:


vec4 p = vec4(0,0,0,1);
p += u_matrix_list[int(bone_weights.x)]* a_position*bone_mask.x;
p += u_matrix_list[int(bone_weights.y)]* a_position*bone_mask.y;
p += u_matrix_list[int(bone_weights.z)]* a_position*bone_mask.z;
p += u_matrix_list[int(bone_weights.w)]* a_position*bone_mask.w;
gl_Position = p / dot(bone_mask,bone_mask);

Hope this helps!

Koehler, thank you very much for your reply. It helps me a lot.

Especially the 'dot product' trick; that is wonderful.

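But let me point out one thing.

The line "vec4 p = vec4(0,0,0,1);" you wrote should actually be "vec4 p = vec4(0,0,0,0);", otherwise the transformation will not be correct. Each bone term already contributes w = 1 (assuming affine bone matrices), so starting from w = 1 leaves w = (n+1)/n after dividing by n, instead of 1.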
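The effect on w can be checked with a few lines of arithmetic. This sketch tracks only the w component of the accumulated p. It assumes affine bone matrices, so each masked-in term M * vec4(pos, 1.0) adds exactly 1 to w. skinned_w is a hypothetical helper.

```c
/* w of the skinned position after accumulation and the final divide.
   initial_w: 1.0 for vec4(0,0,0,1), 0.0 for vec4(0,0,0,0).
   bone_count must be > 0 (it plays the role of dot(mask, mask)). */
float skinned_w(float initial_w, int bone_count) {
    float w = initial_w;
    for (int k = 0; k < bone_count; ++k)
        w += 1.0f;                 /* each affine bone term contributes w = 1 */
    return w / (float)bone_count;  /* the divide by the bone count */
}
```

With two bones, skinned_w(1.0f, 2) is 1.5, so the later perspective divide would shrink the position. skinned_w(0.0f, 2) is 1.0, as intended.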

Based on your idea, I've changed my source code.

I'm not very familiar with OpenGL 2.0 and above. Fortunately, I got it working :)

But there are still some issues that need thinking about.

Here is my shader source code:

(Yes, you can see built-ins like gl_TextureMatrix and gl_ModelViewProjectionMatrix. That's because the first version of my program was written on an old PC which only supports OpenGL 1.4. These built-ins are not available in OpenGL ES 2.0, so I'll replace them with my own uniforms when necessary.)


/* vertex shader */
uniform mat4 u_matrix_list[202];
attribute vec3 a_position;
attribute vec2 a_texcoord;
attribute vec4 a_mat_indices;
attribute vec4 a_mat_weights;
varying vec2 v_texcoord;
void main() {
  v_texcoord = (gl_TextureMatrix[0] * vec4(a_texcoord, 0.0, 1.0)).xy;
  vec4 p0 = vec4(a_position, 1.0);
  vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
  /* GLSL has no C-style casts; int(x) is the constructor syntax */
  p += (u_matrix_list[int(a_mat_indices[0])] * p0) * a_mat_weights[0];
  p += (u_matrix_list[int(a_mat_indices[1])] * p0) * a_mat_weights[1];
  p += (u_matrix_list[int(a_mat_indices[2])] * p0) * a_mat_weights[2];
  p += (u_matrix_list[int(a_mat_indices[3])] * p0) * a_mat_weights[3];
  p /= dot(a_mat_weights, a_mat_weights);
  gl_Position = gl_ModelViewProjectionMatrix * p;
}

/* fragment shader */
uniform sampler2D tex;
uniform vec4 u_color;
varying vec2 v_texcoord;
void main() {
  gl_FragColor = u_color * texture2D(tex, v_texcoord);
}

Issues:

1. I wrote "uniform mat4 u_matrix_list[202];", which is a very large array for a GPU.

I found that many Warcraft3 unit models have fewer than 100 bones. For example, a water elemental has 69 bones and a footman has 49.

But the building models have many more bones. When I used the model 'AncientOfLore.mdx' for testing, I found that it has 202 bones, so I declared such a large array. According to the MDX format, there can be up to 256 nodes (since a node's ID is a BYTE). But when I wrote "uniform mat4 u_matrix_list[256];", glLinkProgram failed with the error message "error C6007: Constant register limit exceeded; more than 1024 constant registers needed to compiled program". Each mat4 occupies 4 vec4 registers, so 256 matrices alone need 256 × 4 = 1024 registers, leaving no room for the other uniforms.

I've heard that storing each bone matrix as 3 vec4s saves some space: a bone matrix is normally affine, so its fourth row is always (0, 0, 0, 1) and does not need to be uploaded. But that may not be enough. OpenGL ES 2.0 only guarantees 128 vec4 uniform vectors in the vertex shader (query glGetIntegerv with GL_MAX_VERTEX_UNIFORM_VECTORS), so we can only use 128 / 3 = 42 bones or fewer? In practice it is fewer still, because other uniforms such as the model-view-projection matrix use some of those vectors.
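The 3-vec4 layout can be sketched as follows. On the CPU, copy the top three rows of each column-major bone matrix into a flat float array. You could then upload that array with glUniform4fv(location, 3 * bone_count, packed) to a hypothetical "uniform vec4 u_bone_rows[...]". The helper names, the uniform name and the shader lines in the comment are assumptions for illustration. The comment relies on the vertex-shader exception the book mentions, which allows uniform arrays to be indexed with a non-constant expression.

```c
/* Copy the top three rows of a column-major affine mat4 into out[12]:
   out[4*r + c] = element (row r, column c). The fourth row (0, 0, 0, 1)
   is not stored.
   Shader side (GLSL sketch, hypothetical names):
     int b = 3 * int(a_mat_indices.x);
     vec4 p0 = vec4(a_position, 1.0);
     vec3 q = vec3(dot(u_bone_rows[b], p0),
                   dot(u_bone_rows[b + 1], p0),
                   dot(u_bone_rows[b + 2], p0));            */
void pack_bone_rows(const float m[16], float out[12]) {
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 4; ++c)
            out[4 * r + c] = m[4 * c + r];
}

/* Bones that fit when each costs 3 vec4 uniforms and `reserved` vec4s
   are already taken by other vertex-shader uniforms (e.g. 4 for a mat4 MVP). */
int max_bones(int max_vertex_uniform_vectors, int reserved) {
    int free_vectors = max_vertex_uniform_vectors - reserved;
    return free_vectors > 0 ? free_vectors / 3 : 0;
}
```

With the guaranteed minimum of 128 vectors, max_bones(128, 0) gives 42. Reserving 4 vectors for an MVP matrix, max_bones(128, 4) gives 41.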

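Or we can try to use a texture to store more data. The book <OpenGL ES 2.0 Programming Guide> says that "Samplers in a vertex shader are optional". The PowerVR SGX seems to support them, but we need more information to decide whether to use them. Support can be checked at run time by querying GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS; a value of 0 means the vertex shader cannot sample textures.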
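If vertex texture fetch is available, one possible layout is a 1-texel-high RGBA texture holding 3 texels (the 3 rows) per bone, read with NEAREST filtering. On OpenGL ES 2.0, float texels require the OES_texture_float extension. In GLSL ES the vertex shader would read texels with texture2DLod. The layout and helper names below are hypothetical; the sketch only computes where each row sits, sampling at texel centres.

```c
/* u coordinate of the centre of the texel holding row `row` (0..2) of bone
   `bone`, in a hypothetical 1-texel-high texture with 3 texels per bone. */
float bone_texel_u(int bone, int row, int texture_width) {
    return ((float)(3 * bone + row) + 0.5f) / (float)texture_width;
}

/* Texture width needed for bone_count bones at 3 texels each. */
int bone_texture_width(int bone_count) {
    return 3 * bone_count;
}
```

For the 202-bone AncientOfLore this would be a 606 x 1 texture. Sampling at texel centres avoids reading a neighbouring row by accident.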

2. Yes, the <Warcraft III Art Tools Documention.pdf> says that "Up to four bones can influence one vertex." So we can use a vec4 attribute to simulate a float[4] array.

But I found there are some exceptions. For example, a water elemental has some vertices that are influenced by up to 6 bones. This is not critical, because we can add 2 more attributes to handle it.

In my test I just use the first 4 bones and ignore the last 2; it looks fine without any obvious problem. So let's just ignore it for now :)
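When loading a model, the a_mat_indices / a_mat_weights pair for each vertex could be filled like this. The sketch keeps only the first 4 bones, matching the test described above, and returns how many bones were dropped so the loader can log models like the water elemental. pack_vertex_bones is a hypothetical name.

```c
#include <stddef.h>

/* Fill one vertex's indices/mask attributes from its 'vertex_bone' list.
   Unused slots point at bone 0 but get mask 0, so they contribute nothing.
   Returns the number of bones beyond the first 4 that were dropped. */
size_t pack_vertex_bones(const int *vertex_bone, size_t n,
                         float indices[4], float mask[4]) {
    size_t kept = n < 4 ? n : 4;
    for (size_t k = 0; k < 4; ++k) {
        indices[k] = k < kept ? (float)vertex_bone[k] : 0.0f;
        mask[k]    = k < kept ? 1.0f : 0.0f;
    }
    return n - kept;
}
```

Dropping bones changes the average, so a vertex that loses 2 of 6 bones is positioned slightly differently than in War3ModelEditor. As noted above, the difference wasn't visible in this test.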

Here are some screenshots of my test program.

I'd like to share my happiness with you. Thank you again.

[Screenshots: testGL.01.png, testGL.02.png, testGL.03.png, testGL.04.png]

Glad you caught my mistake. I was also calling the indices "weights". Clearly I didn't test that code :/

Those results look good! I am surprised that the ancients have so many bones. If I had to guess, WC3 probably did software skinning, so it didn't matter?

As an option, maybe you could go through the model and split the mesh based on the bone indices it accesses (say, one half for indices < 110 and the other for indices >= 110), then do two draw calls for the big guys. This would work best if the pieces don't rely on the root bones too much.

Alternatively you could split the model and duplicate the most shared bones into each of the two smaller models' bone arrays, changing the indices on your vertex data appropriately. It still might let you cut down the number enough to fit into your uniform space.
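The "duplicate shared bones and change the indices" step could be sketched like this. Each sub-mesh gets its own compact palette of global bone indices, and its vertex bone indices are rewritten to point into that palette. A bone used by two sub-meshes simply appears in both palettes. The input is a hypothetical flat list of the bone indices actually used by the sub-mesh; masked-out slots should be left out, or they would pull bone 0 into every palette. Indices are assumed to be 0..255, since node IDs are a BYTE.

```c
#include <stddef.h>

#define MAX_MODEL_BONES 256  /* MDX node IDs are a BYTE */

/* Rewrites vertex_bones in place to palette slots; palette receives the
   global bone index for each slot. Returns the palette size. */
size_t build_palette(int *vertex_bones, size_t index_count, int *palette) {
    int local_of[MAX_MODEL_BONES];
    size_t size = 0;
    for (int b = 0; b < MAX_MODEL_BONES; ++b)
        local_of[b] = -1;
    for (size_t i = 0; i < index_count; ++i) {
        int global = vertex_bones[i];
        if (local_of[global] < 0) {          /* first time this bone is seen */
            local_of[global] = (int)size;
            palette[size++] = global;
        }
        vertex_bones[i] = local_of[global];  /* remap to the local slot */
    }
    return size;
}
```

Each frame, the draw call for a sub-mesh then uploads matrix_list[palette[k]] into uniform slot k, so the uniform array only needs as many entries as the largest palette.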

Yes, 'AncientOfLore.mdx' has many bones. When I first found this, I was surprised too.

Once again, Warcraft3's models do not obey the rules set out in the <Warcraft III Art Tools Documention.pdf>.

According to the documentation, a building should have at most 15 bones, and a really big unit at most 30.

By the way, the OpenGL 2.0 spec was released in 2004, and Warcraft3 was released before that. So I think Warcraft3 does not use shaders for bone animation.

I've noticed that not all the bones are used by the mesh. Some bones are used for attaching other models, by particle emitters, etc.

For example, when an AncientOfLore tree is badly damaged, some parts of the tree's body catch fire. Each fire is drawn by a particle emitter, and a particle emitter needs a bone. Simply speaking, 6 fires use 6 bones.

We can skip these bones when we load bone matrices into the shader.

There is a concept named "geoset" in Warcraft3 models. A geoset contains data like vertex positions, texture coords, normals, and bone matrix indices. A model may have one or more geosets.

Before today I thought that each vertex in each geoset could be linked to any bone of the model. When I saw the words "split the mesh", I guessed we might use the geosets directly, rather than splitting the mesh with an algorithm.

So I did a simple test.

The 'AncientOfLore.mdx' model has 12 geosets, and in the animation sequence "stand work alternate", 6 of them are visible (the documentation says a model should have at most 5 visible geosets!). The numbers of bones used by these 6 geosets are 27, 62, 3, 3, 8, and 2, all much smaller than 202.

But for OpenGL ES, 62 bones is still too many, so that geoset will need to be split into smaller parts.

So if I want to display 'AncientOfLore.mdx' on my Android phone, I have to design an algorithm that splits a geoset into two or more smaller geosets.

The next step is to design and implement this algorithm. I don't think it will be easy for me, but I'll try.
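A simple starting point for such an algorithm (a greedy heuristic, not an optimal split) is to walk the geoset's triangles in order. Each triangle joins the current batch unless that would push the batch over the bone limit (for example 41 or 42 on a 128-vector device), in which case a new batch starts. The sketch assumes a hypothetical preprocessed input listing the distinct bones used by each triangle's 3 vertices, and that no single triangle needs more bones than the limit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODEL_BONES 256
#define MAX_TRI_BONES 12  /* 3 vertices x up to 4 bones each */

/* Distinct bones used by one triangle (hypothetical preprocessed input). */
typedef struct {
    int bones[MAX_TRI_BONES];
    int count;
} TriBones;

/* Writes a batch number per triangle and returns the number of batches. */
int split_by_bone_limit(const TriBones *tris, size_t tri_count, int limit,
                        int *batch_of_tri) {
    bool in_batch[MAX_MODEL_BONES];
    int used = 0, batch = 0;
    memset(in_batch, 0, sizeof in_batch);
    for (size_t t = 0; t < tri_count; ++t) {
        int extra = 0;  /* bones this triangle would add to the current batch */
        for (int k = 0; k < tris[t].count; ++k)
            if (!in_batch[tris[t].bones[k]])
                ++extra;
        if (used + extra > limit) {  /* over budget: start a fresh batch */
            memset(in_batch, 0, sizeof in_batch);
            used = 0;
            ++batch;
        }
        for (int k = 0; k < tris[t].count; ++k)
            if (!in_batch[tris[t].bones[k]]) {
                in_batch[tris[t].bones[k]] = true;
                ++used;
            }
        batch_of_tri[t] = batch;
    }
    return tri_count > 0 ? batch + 1 : 0;
}
```

The result depends on triangle order, so grouping triangles that share bones before splitting should give fewer batches. Each batch then needs its own compact bone palette and remapped vertex indices, as Koehler suggested.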

