# OpenGL problem writing complex vertex shader

This topic is 1633 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Greetings, everyone.

Recently I've been interested in Warcraft3's model system.

I downloaded the War3ModelEditor source code (from: http://home.magosx.com/index.php?topic=6.0), read it, and wrote a program which can render Warcraft3 models using OpenGL ES.

When I run this code on an Android phone it looks good, but when there are more than 5 models on the screen, the FPS becomes very low.

Currently I do all the bone animation (matrix calculation and vertex position calculation) on the CPU side.

I think it might be faster if we could do all this work on the GPU side.

But I just don't know how to do it.

The Warcraft3 vertex position calculation is complex for me.

Let me explain a little more.

In a Warcraft3 model, each vertex is linked to one or more bones.

Here is how the War3ModelEditor calculates a vertex's position:

step1. for each bone[i], calculate matrix_list[i]
step2. for each vertex v
    position = (matrix_list[vertex_bone[0]] * v
             +  matrix_list[vertex_bone[1]] * v
             +  ...
             +  matrix_list[vertex_bone[n-1]] * v) / n

note: n is the length of 'vertex_bone', each vertex may have a different 'vertex_bone'.
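To make the two steps concrete, here is a minimal CPU-side sketch of step 2 in Python. It is only an illustration, not the War3ModelEditor code: the helper names (`translation`, `transform_point`, `skin_vertex`) and the row-major list-of-rows matrix layout are assumptions made for this example.

```python
def translation(tx, ty, tz):
    """Hypothetical helper: a row-major 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def transform_point(m, v):
    """Multiply a row-major 4x4 matrix by the 3D point v (w is taken as 1)."""
    x, y, z = v
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))


def skin_vertex(matrix_list, vertex_bone, v):
    """Average of v transformed by every bone listed in vertex_bone."""
    n = len(vertex_bone)
    total = [0.0, 0.0, 0.0]
    for bone in vertex_bone:
        p = transform_point(matrix_list[bone], v)
        for axis in range(3):
            total[axis] += p[axis]
    return tuple(c / n for c in total)


# Step 1 would fill matrix_list from the animation; two fixed bones stand in here.
matrix_list = [translation(0.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)]
position = skin_vertex(matrix_list, [0, 1], (1.0, 1.0, 1.0))
```

With one bone leaving the point in place and the other moving it by 2 along x, the averaged result lands halfway between the two.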


Actually, several vertices can share the same 'vertex_bone' array, while several other vertices share another 'vertex_bone' array.

For example, a model with 500 vertices may have only 35 different 'vertex_bone' arrays.

But I don't know how I can make use of this to optimize the performance.
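One possible way to use the sharing, sketched below in Python: because the averaging is linear, the average of `M_i * v` over a group equals `(average of M_i) * v`. So the matrices can be averaged once per distinct 'vertex_bone' array (35 times in the example above), and each vertex then needs a single matrix multiply. This is an assumption-based sketch, not something taken from War3ModelEditor; all function names are hypothetical.

```python
def translation(tx, ty, tz):
    """Hypothetical helper: a row-major 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def transform_point(m, v):
    """Multiply a row-major 4x4 matrix by the 3D point v (w is taken as 1)."""
    x, y, z = v
    return tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3]
                 for r in range(3))


def average_matrices(matrix_list, vertex_bone):
    """Element-wise average of the matrices named by vertex_bone."""
    n = len(vertex_bone)
    return [[sum(matrix_list[b][r][c] for b in vertex_bone) / n
             for c in range(4)] for r in range(4)]


def build_group_matrices(matrix_list, vertex_bone_arrays):
    """One averaged matrix per distinct vertex_bone array."""
    cache = {}
    for bones in vertex_bone_arrays:
        key = tuple(bones)
        if key not in cache:
            cache[key] = average_matrices(matrix_list, key)
    return cache


matrix_list = [translation(0.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)]
vertex_bone_arrays = [[0, 1], [0, 1], [1]]   # three vertices, two distinct arrays
groups = build_group_matrices(matrix_list, vertex_bone_arrays)
position = transform_point(groups[(0, 1)], (1.0, 1.0, 1.0))
```

The per-group matrices still have to be recomputed every frame, but that cost scales with the number of groups rather than the number of vertices.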


Step 1 may be easy. Since a typical Warcraft3 model has fewer than 30 bones, we can do this step on the CPU side without much of a performance hit.

But step2 is quite complex.

If I write a vertex shader (GLSL) it will be something like this:

uniform mat4 u_matrix_list[50]; /* there might be more ?? */
attribute float a_n;
attribute float a_vertex_bone[4]; /* there might be more ?? */
attribute vec4 a_position;
void main() {
    float i;
    vec4 p = vec4(0.0, 0.0, 0.0, 1.0);
    for (i = 0; i < a_n; ++i) {
        p += u_matrix_list[int(a_vertex_bone[int(i)])] * a_position;
    }
    gl_Position = p / float(a_n);
}


There are some problems.

1. When I compile the vertex shader above (on my laptop, rather than on an Android phone), it reports 'success' with the warning message 'OpenGL does not allow attributes of type float[4]'.

And sometimes (when I change the order of the 3 attributes) it causes my program to crash, with the message 'The NVIDIA OpenGL driver lost connection with the display driver due to exceeding the Windows Time-Out limit and is unable to continue.'

2. The book <OpenGL ES 2.0 Programming Guide>, page 83, says that 'OpenGL ES only mandates that array indexing be supported by constant integral expressions (there is an exception to this, which is the indexing of uniform variables in vertex shaders that is discussed in Chapter 8)', so the expression 'a_vertex_bone[int(i)]' might not work on some OpenGL ES hardware.

Actually I've never written such a complex(?) shader before.

Could anyone give me some advice?

Thank you.

##### Share on other sites

You're on the right track!  A uniform array of bones, and vertex attributes that index into said array is the common way to handle this.

For your specific problem, I have a solution that should work but will limit you to 4 bones per vertex (I can't imagine this is a problem for WC3 models, but please let me know if it is.)

You could try representing your bone weights as a vec4 instead of an array in the attribute. From there, you could add a second vec4 attribute representing how many bones affect a vertex (such as [1.0, 1.0, 0.0, 0.0] for two bones).

Finally, if you take the dot product of this vector with itself, you conveniently enough get the number of bones out! (If we call the vector above v, then dot(v,v) = (1.0*1.0 + 1.0*1.0 + 0.0*0.0 + 0.0*0.0) = 2.0.)

This would change your attribs to:

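attribute vec4 a_position;
attribute vec4 bone_weights; /* which bones affect this vertex */
attribute vec4 bone_mask;    /* 1.0 per used slot, e.g. [1.0, 1.0, 0.0, 0.0] */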


You would also remove the for loop above, and just say

vec4 p = vec4(0,0,0,1);
gl_Position = p / dot(bone_mask,bone_mask);
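For anyone who wants to check the arithmetic off the GPU, here is a small Python emulation of the unrolled four-slot version with a 0/1 mask. It is a sketch under assumptions: the function names and the row-major matrix layout are made up for the example, and the accumulator is started at all zeros so that w adds up to the bone count and the final divide brings it back to 1.

```python
def translation(tx, ty, tz):
    """Hypothetical helper: a row-major 4x4 translation matrix."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def dot(a, b):
    """Same meaning as GLSL dot() for equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    """Row-major 4x4 matrix times a 4-component vector."""
    return tuple(dot(row, v) for row in m)


def skin_four_slots(matrices, indices, mask, position):
    """Emulate the loop-free shader: four fixed slots, unused ones masked to 0."""
    p0 = (position[0], position[1], position[2], 1.0)
    p = [0.0, 0.0, 0.0, 0.0]
    for slot in range(4):
        t = mat_vec(matrices[int(indices[slot])], p0)
        for k in range(4):
            p[k] += t[k] * mask[slot]
    count = dot(mask, mask)          # number of used slots for a 0/1 mask
    return tuple(c / count for c in p)


matrices = [translation(0.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)]
indices = (0.0, 1.0, 0.0, 0.0)       # unused slots may point at any valid bone
mask = (1.0, 1.0, 0.0, 0.0)          # two bones affect this vertex
result = skin_four_slots(matrices, indices, mask, (1.0, 1.0, 1.0))
```

The trick only yields the count when every mask entry is exactly 0.0 or 1.0; with fractional weights, `dot(mask, mask)` is no longer the number of bones.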

Hope this helps!

##### Share on other sites

Koehler, thank you very much for your reply. It helps me a lot.

Especially the 'dot product', that is wonderful.

But let me point out this.

The code "vec4 p = vec4(0,0,0,1);" you wrote should actually be "vec4 p = vec4(0,0,0,0);", or the transformation will not be correct.

Based on your idea, I've changed my source code.

I'm not very familiar with OpenGL version 2.0 and above. Fortunately I got it working :).

And there are still some issues that need to be thought about.

Let me put my shader source code down here:

(Yes, you can see there's something like gl_TextureMatrix and gl_ModelViewProjectionMatrix. That's because the first version of my program was written on an old PC which only supports OpenGL 1.4. I'll modify these when necessary.)

/* vertex shader */
uniform mat4 u_matrix_list[202];  /* one matrix per bone */
attribute vec3 a_position;
attribute vec2 a_texcoord;
attribute vec4 a_mat_indices;     /* up to 4 bone indices, stored as floats */
attribute vec4 a_mat_weights;     /* 0/1 mask per slot, following the idea above */
varying vec2 v_texcoord;
void main() {
    v_texcoord = (gl_TextureMatrix[0] * vec4(a_texcoord, 0.0, 1.0)).xy;
    vec4 p0 = vec4(a_position, 1.0);
    /* start from zero so that w adds up to the number of bones */
    vec4 p = vec4(0.0, 0.0, 0.0, 0.0);
    /* GLSL uses constructor syntax int(x); the C-style cast (int)x is not valid GLSL */
    p += (u_matrix_list[int(a_mat_indices[0])] * p0) * a_mat_weights[0];
    p += (u_matrix_list[int(a_mat_indices[1])] * p0) * a_mat_weights[1];
    p += (u_matrix_list[int(a_mat_indices[2])] * p0) * a_mat_weights[2];
    p += (u_matrix_list[int(a_mat_indices[3])] * p0) * a_mat_weights[3];
    /* for a 0/1 mask, dot(w, w) is the number of bones used */
    p /= dot(a_mat_weights, a_mat_weights);
    gl_Position = gl_ModelViewProjectionMatrix * p;
}

/* fragment shader */
uniform sampler2D tex;
uniform vec4 u_color;
varying vec2 v_texcoord;
void main() {
    gl_FragColor = u_color * texture2D(tex, v_texcoord);
}


Issues:

1. I wrote "uniform mat4 u_matrix_list[202];", which is a very large array for a GPU.

I found that many Warcraft3 unit models have fewer than 100 bones. For example, a water elemental has 69 bones, and a footman has 49 bones.

But the building models have many more bones. When I used the model 'AncientOfLore.mdx' for testing, I found that it has 202 bones, so I declared such a large array. According to the MDX format, there can be up to 256 nodes (since the node's ID is a BYTE). But when I wrote "uniform mat4 u_matrix_list[256];", glLinkProgram failed with the error message "error C6007: Constant register limit exceeded; more than 1024 constant registers needed to compiled program".

I hear that if we store a mat4 as 3 vec4s, it may save some space. But that may not be enough. OpenGL ES 2.0 only guarantees 128 vec4 vertex uniforms (query glGetIntegerv with GL_MAX_VERTEX_UNIFORM_VECTORS), so we can only use 128 / 3 = 42 bones or fewer?
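The budget arithmetic can be written down as a tiny Python sketch. The function names are hypothetical, and the 3-row packing assumes every bone matrix is affine, so its fourth row is always (0, 0, 0, 1) and does not need to be uploaded. The `reserved_vectors` parameter is an assumption too: it stands for whatever other uniforms the shader uses.

```python
def max_bones(uniform_vectors, vectors_per_bone, reserved_vectors=0):
    """How many bones fit into a budget counted in vec4 uniform slots."""
    return (uniform_vectors - reserved_vectors) // vectors_per_bone


def pack_affine_rows(m):
    """Keep the first three rows of a row-major 4x4 affine matrix (3 vec4s)."""
    return [tuple(m[r]) for r in range(3)]


bones_as_mat4 = max_bones(128, 4)        # full 4x4 matrices
bones_as_3_rows = max_bones(128, 3)      # three vec4 rows per bone
full_mdx_as_mat4 = 256 * 4               # vec4 slots needed for 256 mat4 bones

identity = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]
packed = pack_affine_rows(identity)
```

256 mat4 bones already take 1024 vec4 slots on their own, which fits the link error above once any other uniform is added. In practice a few slots go to other uniforms, so the usable bone count is a little lower than 128 / 3.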

Or we can try to use a texture to store some more data. The book <OpenGL ES 2.0 Programming Guide> says that "Samplers in a vertex shader are optional". The POWERVR SGX seems to support it. But we need some more information to decide whether or not to use it.

2. Yes, the <Warcraft III Art Tools Documention.pdf> says that "Up to four bones can influence one vertex.". So we can use a vec4 attribute to simulate a float[4] array.

But I found there are some exceptions. For example, a water elemental has some vertices that are influenced by up to 6 bones. This is not very critical because we can add 2 more attributes to fix it.

In my test I just use the first 4 bones and ignore the last 2, and it looks fine without any obvious problem. So let's just ignore it for now :)
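A minimal Python sketch of how the per-vertex attribute data could be prepared on the CPU: pad short bone lists up to four slots, and truncate longer ones to the first four, as in the test described above. The function name `pack_vertex_bones` is hypothetical, and padding unused slots with index 0 is an assumption (any valid index works, since the mask zeroes the contribution).

```python
def pack_vertex_bones(vertex_bone, slots=4):
    """Return (indices, mask) tuples of length `slots` for one vertex.

    Bones beyond `slots` are dropped; unused slots get index 0 and mask 0.0.
    """
    used = list(vertex_bone[:slots])
    pad = slots - len(used)
    indices = tuple(float(b) for b in used) + (0.0,) * pad
    mask = (1.0,) * len(used) + (0.0,) * pad
    return indices, mask


two_bones = pack_vertex_bones([7, 3])
six_bones = pack_vertex_bones([1, 2, 3, 4, 5, 6])   # last two are ignored
```

Dropping bones changes the average for those vertices, so this is only acceptable while the visual difference stays unnoticeable.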

##### Share on other sites

Here are some snapshots of my test program.

I'd like to share my happy feeling with you. Thank you again.

[attachment=16979:testGL.01.png]

[attachment=16980:testGL.02.png]

[attachment=16981:testGL.03.png]

[attachment=16982:testGL.04.png]

##### Share on other sites

Glad to see you caught my mistake. I was calling "indices" weights, also. Clearly I didn't test that code :/

Those results look good! I am surprised that the ancients have so many bones. If I had to guess, WC3 probably did software skinning, so it didn't matter?

As an option, maybe you could look through the model and split the mesh based on the bone indices accessed (half for indices < 110 or something, half for >= 110) and do two draw calls for the big guys. This would work best if the pieces don't rely on the root bones too much.

Alternatively you could split the model and duplicate the most shared bones into each of the two smaller models' bone arrays, changing the indices on your vertex data appropriately. It still might let you cut down the number enough to fit into your uniform space.
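The index remapping for one piece of a split model could look roughly like the Python sketch below. It is only an illustration of the idea, with hypothetical names: each piece gets its own small palette of the global bones it actually uses, and the vertex indices are rewritten to point into that palette.

```python
def build_local_palette(vertex_bones):
    """Remap global bone indices to a compact per-piece palette.

    vertex_bones: one list of global bone indices per vertex of the piece.
    Returns (palette, remapped), where palette[local] is the global index
    and remapped mirrors vertex_bones with local indices.
    """
    palette = []     # local slot -> global bone index
    slot_of = {}     # global bone index -> local slot
    remapped = []
    for bones in vertex_bones:
        local = []
        for b in bones:
            if b not in slot_of:
                slot_of[b] = len(palette)
                palette.append(b)
            local.append(slot_of[b])
        remapped.append(local)
    return palette, remapped


palette, remapped = build_local_palette([[200, 5], [5], [130, 200]])
```

Before drawing a piece, only the matrices listed in its palette need to be uploaded, in palette order. A bone shared by both pieces simply appears in both palettes.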

Edited by Koehler

##### Share on other sites

Yes, 'AncientOfLore.mdx' has many bones. When I found this for the first time, I was surprised too.

Once again the Warcraft3 models do not obey the rules set out in the <Warcraft III Art Tools Documention.pdf>.

According to the documentation, a building should have at most 15 bones, and a really big unit should have at most 30 bones.

By the way, the OpenGL 2.0 spec was released in 2004. Warcraft3 was released before that. So I think Warcraft3 is not using a shader to do the bone animation.

I've noticed that not all the bones are used by the mesh. Some of the bones are used for attaching another model, or are used by a particle emitter, etc.

For example, when an AncientOfLore tree is badly damaged, some places on the tree body will be on fire. Each place uses a particle emitter to draw the fire, and a particle emitter needs a bone. Simply speaking, 6 places of fire will use 6 bones.

There is a concept named "geoset" in Warcraft3 models. A geoset contains data like vertex positions, texture coords, normals, and the bone matrix indices. One model may have one or more geosets.

Before today I thought that each vertex in each geoset could be linked to any bone of the model. When I saw the words "split the mesh", I guessed we might make use of the geosets directly, rather than splitting the mesh with an algorithm.

So I did a simple test.

The 'AncientOfLore.mdx' model has 12 geosets, and in the animation sequence "stand work alternate" 6 of them are visible (the documentation says that one model should have at most 5 visible geosets!). The numbers of bones used by these geosets are: 27, 62, 3, 3, 8, 2. All of these are much less than 202.

But for OpenGL ES, 62 bones is still too many, so that geoset will need to be split into smaller parts.

So if I need to display an 'AncientOfLore.mdx' on my Android phone, I have to design an algorithm to split a geoset into two or more small geosets.

The next step is to design and implement this algorithm. I think that will not be easy for me. But I'll try it.
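As a starting point, here is one possible greedy approach, sketched in Python. It is not taken from any existing tool and is certainly not optimal: each triangle goes into the first part whose bone set can still absorb it without exceeding the budget, and a new part is opened otherwise. All names are hypothetical.

```python
def split_by_bone_budget(triangles, vertex_bones, max_bones):
    """First-fit split of a geoset's triangles into parts under a bone budget.

    triangles: list of (i0, i1, i2) vertex indices.
    vertex_bones: list of bone-index lists, one per vertex.
    Returns a list of (bone_set, triangle_list) parts.
    """
    parts = []
    for tri in triangles:
        needed = set()
        for vi in tri:
            needed.update(vertex_bones[vi])
        if len(needed) > max_bones:
            raise ValueError("a single triangle needs more bones than the budget")
        for bones, tris in parts:
            if len(bones | needed) <= max_bones:
                bones |= needed        # grows the part's bone set in place
                tris.append(tri)
                break
        else:
            parts.append((set(needed), [tri]))
    return parts


vertex_bones = [[0], [0, 1], [1], [2], [2, 3], [3]]
triangles = [(0, 1, 2), (3, 4, 5), (0, 2, 1)]
parts = split_by_bone_budget(triangles, vertex_bones, max_bones=2)
```

Vertices used by triangles in different parts would have to be duplicated into each part's vertex buffer, and each part then needs its own bone index remapping before upload.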

##### Share on other sites
