Cg skinning shader is very slow

4 comments, last by Gumgo 15 years, 10 months ago
Hello. I'm having trouble with this Cg shader for skinning:

struct input // vertex model
{
	float3 position	: POSITION;
	float3 normal	: NORMAL;
	float4 color	: COLOR0;
	float2 texcoord	: TEXCOORD0;
	float4 indices;
	float4 weights;
};

struct output
{
	float4 position	: POSITION;
	float2 texcoord	: TEXCOORD0;
	float4 color	: COLOR0;
};

output main( input IN,				// vertex
	     uniform float4x4 modelViewProj,	// modelview matrix
	     uniform float4 bones[90] )		// bones (30*3)
{
	output OUT;

	float3 pos = float3( 0.0f, 0.0f, 0.0f );

	for (int i = 0; i < 4; i++)
	{
		float3x4 matrix = float3x4( bones[IN.indices[i]*3],
					    bones[IN.indices[i]*3+1],
					    bones[IN.indices[i]*3+2] );
		pos += IN.weights[i] * mul( matrix, float4( IN.position, 1.0f ) );
	}

	OUT.position = mul( modelViewProj, float4( pos, 1.0f ) );
	OUT.color = IN.color;
	OUT.texcoord = IN.texcoord;

	return OUT;
}




With a ~700 vertex model with 22 bones, this runs at ~4800 microseconds per model! Way too slow. However, if I comment out the for loop in the middle, I get around 1300 microseconds (still very slow, but I can probably get it much lower because I haven't done anything to speed up the drawing code in OpenGL). I can't figure out why the matrix multiplications take so long. Any ideas? I can post more information if it is needed (which I'm sure it will be).
Keep in mind that loading lots of uniforms can get pricey. You have 30 bones, which translates to 360 floats per model being passed as uniforms, and if the model is only 700 vertices you can see the problem: about one tenth of your data is being passed as uniforms. The skinning will scale better as the number of vertices grows.

To cut down on the uploading of uniforms, pack each transformation as a quaternion together with a translation (and possibly a sign if the bone transformations can be orientation reversing). One can write a short routine to apply a rotation represented as a quaternion to a vertex.

Also, the way you are filling the matrix might make the GPU unhappy. You can always declare your bones as float3x4:

output main( input IN,				// vertex
	     uniform float4x4 modelViewProj,	// modelview matrix
	     uniform float3x4 bones[30] )	// bones 30


The GPU might be getting unhappy because it copies the uniform parameters to work registers, which can be avoided if you declare the bones as an array of matrices.
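The quaternion-plus-translation packing suggested above could look roughly like the following. This is only a sketch, not tested on the hardware in this thread: the names `boneRot`, `boneTrans` and `rotateByQuat`, and the `BLENDINDICES`/`BLENDWEIGHT` bindings, are assumptions for illustration. It also assumes rigid bones (rotation plus translation, no scale or reflection) and unit quaternions stored as (x, y, z, w). It uses two float4s per bone, so 240 floats for 30 bones instead of 360.

```cg
// Sketch only: quaternion + translation bones instead of 3x4 matrices.
// boneRot, boneTrans, rotateByQuat and the BLENDINDICES/BLENDWEIGHT
// bindings are hypothetical, not taken from the original shader.

struct input
{
	float3 position	: POSITION;
	float4 color	: COLOR0;
	float2 texcoord	: TEXCOORD0;
	float4 indices	: BLENDINDICES;	// one bone index per influence
	float4 weights	: BLENDWEIGHT;	// assumed to sum to 1
};

struct output
{
	float4 position	: POSITION;
	float2 texcoord	: TEXCOORD0;
	float4 color	: COLOR0;
};

// Rotate v by the unit quaternion q, stored as (x, y, z, w).
// Expands q * v * conj(q) into two cross products.
float3 rotateByQuat( float4 q, float3 v )
{
	float3 t = 2.0f * cross( q.xyz, v );
	return v + q.w * t + cross( q.xyz, t );
}

output main( input IN,
	     uniform float4x4 modelViewProj,
	     uniform float4 boneRot[30],	// unit quaternion per bone
	     uniform float4 boneTrans[30] )	// translation in xyz, w unused
{
	output OUT;

	float3 pos = float3( 0.0f, 0.0f, 0.0f );

	for (int i = 0; i < 4; i++)
	{
		int b = (int) IN.indices[i];	// bone used by influence i
		pos += IN.weights[i] * ( rotateByQuat( boneRot[b], IN.position ) + boneTrans[b].xyz );
	}

	OUT.position = mul( modelViewProj, float4( pos, 1.0f ) );
	OUT.color = IN.color;
	OUT.texcoord = IN.texcoord;

	return OUT;
}
```

Each influence costs two cross products instead of three dot products, so this trades a little arithmetic for fewer uniforms. Whether that is a net win depends on the hardware and would need to be measured.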


At 4800 microseconds per model (4.8/1000 of a second per model), you get about 200 models per second, i.e. 3 or 4 models per frame if you want to do 60 fps... that is damn slow! What is your hardware? (I ask because on a GeForce 6600 GT I can do MD5 GPU skinning at 60 fps with over 40 models.)
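Spelling out the estimate above, using only the 4.8 ms per model figure from the first post and a 60 fps target:

```latex
\[
\frac{1}{4.8\times10^{-3}\ \text{s/model}} \approx 208\ \text{models/s},
\qquad
\frac{208\ \text{models/s}}{60\ \text{frames/s}} \approx 3.5\ \text{models/frame}
\]
```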

Close this Gamedev account, I have outgrown Gamedev.
Changing it to 30 float3x4s instead of 90 float4s made it go up to 5.9 ms! I don't think it is a problem with my hardware, as other games display multiple characters at once, but I'll post my specs anyway.
Microsoft Windows XP Home Edition
AMD Athlon 64 Processor 2.39 GHz
ATI Mobility Radeon XPress 200

Also, when I profile this, I only time the drawing itself, not sending over or calculating the matrices.
Err... um, keep in mind that in Cg uniform parameters are shadowed, so often they do not get sent until the actual draw call. I am curious, can you try:

Drawing the same model many times (but at different locations). If the per-model speed is good there, then perhaps the issue is the uniforms...

Also, the ATI Mobility Radeon XPress 200 is not exactly a graphics card for gaming:

Quote:
ATI Radeon Xpress 200M is the shared memory integrated graphic card from ATI. It is a derivate of the Mobility Radeon X300 graphic card, but slower because of the lack of own memory and slower clock speed. Sometimes it is called Mobility Radeon X200. It is a bit faster than the Intel GMA 950 graphic core and threrefore not really suited for actual gaming.


from http://www.notebookcheck.net/ATI-Radeon-Xpress-200M.2175.0.html


Close this Gamedev account, I have outgrown Gamedev.
Oh, it definitely isn't a gaming card, but I can still play games with a fair number of characters.
I will try your suggestion in a bit and edit for the results.
Sorry to take so long, lots of homework.
I set up the skeleton once per frame and drew the model 100 times at different positions. I'm getting about 4.7ms per model...

This topic is closed to new replies.
