DMINATOR

Vertex array and animation


This is a pretty hard one. I am using a vertex array and its performance is pretty good, but only for static meshes. If I add animation, the performance drops dramatically. I used profiling to find the bottleneck; here is the log. The scene is drawn 13 times.
Quote:
Frequncy: 3579545
Total processes: 3

Process: 0
[0] - Total drawing:
[0] - Hits = 13
[0] - Ticks = 2268852
[0] - Ticks avg. = 174527
[0] - Time = 1000
[0] - Time avg.= 76

Process: 1
[1] - Filling vertex and texture array
[1] - Hits = 1300
[1] - Ticks = 1910886
[1] - Ticks avg. = 1469
[1] - Time = 1000
[1] - Time avg.= 0

Process: 2
[2] - Drawing vertex array
[2] - Hits = 1300
[2] - Ticks = 341468
[2] - Ticks avg. = 262
[2] - Time = 100
[2] - Time avg.= 0

So about 84% of the total time (1910886 of 2268852 ticks) is spent filling the arrays with data. Here is the code I use for that:
for(int i = 0 ; i < header.num_face; i++)
{
    //current frame vertex position
    ax1 = framelist[frames].vertexlist[ facelist[i].a ].x * framelist[frames].scale[0] + framelist[frames].translate[0];
    ay1 = framelist[frames].vertexlist[ facelist[i].a ].y * framelist[frames].scale[1] + framelist[frames].translate[1];
    az1 = framelist[frames].vertexlist[ facelist[i].a ].z * framelist[frames].scale[2] + framelist[frames].translate[2];

    bx1 = framelist[frames].vertexlist[ facelist[i].b ].x * framelist[frames].scale[0] + framelist[frames].translate[0];
    by1 = framelist[frames].vertexlist[ facelist[i].b ].y * framelist[frames].scale[1] + framelist[frames].translate[1];
    bz1 = framelist[frames].vertexlist[ facelist[i].b ].z * framelist[frames].scale[2] + framelist[frames].translate[2];

    cx1 = framelist[frames].vertexlist[ facelist[i].c ].x * framelist[frames].scale[0] + framelist[frames].translate[0];
    cy1 = framelist[frames].vertexlist[ facelist[i].c ].y * framelist[frames].scale[1] + framelist[frames].translate[1];
    cz1 = framelist[frames].vertexlist[ facelist[i].c ].z * framelist[frames].scale[2] + framelist[frames].translate[2];

    //next frame vertex position
    ax2 = framelist[framee].vertexlist[ facelist[i].a ].x * framelist[framee].scale[0] + framelist[framee].translate[0];
    ay2 = framelist[framee].vertexlist[ facelist[i].a ].y * framelist[framee].scale[1] + framelist[framee].translate[1];
    az2 = framelist[framee].vertexlist[ facelist[i].a ].z * framelist[framee].scale[2] + framelist[framee].translate[2];

    bx2 = framelist[framee].vertexlist[ facelist[i].b ].x * framelist[framee].scale[0] + framelist[framee].translate[0];
    by2 = framelist[framee].vertexlist[ facelist[i].b ].y * framelist[framee].scale[1] + framelist[framee].translate[1];
    bz2 = framelist[framee].vertexlist[ facelist[i].b ].z * framelist[framee].scale[2] + framelist[framee].translate[2];

    cx2 = framelist[framee].vertexlist[ facelist[i].c ].x * framelist[framee].scale[0] + framelist[framee].translate[0];
    cy2 = framelist[framee].vertexlist[ facelist[i].c ].y * framelist[framee].scale[1] + framelist[framee].translate[1];
    cz2 = framelist[framee].vertexlist[ facelist[i].c ].z * framelist[framee].scale[2] + framelist[framee].translate[2];

    //interpolated vertex coordinates
    ax = ax1 + md2_inter * ( ax2 - ax1 );
    ay = ay1 + md2_inter * ( ay2 - ay1 );
    az = az1 + md2_inter * ( az2 - az1 );

    bx = bx1 + md2_inter * ( bx2 - bx1 );
    by = by1 + md2_inter * ( by2 - by1 );
    bz = bz1 + md2_inter * ( bz2 - bz1 );

    cx = cx1 + md2_inter * ( cx2 - cx1 );
    cy = cy1 + md2_inter * ( cy2 - cy1 );
    cz = cz1 + md2_inter * ( cz2 - cz1 );

    //calculate uv coordinates
    au = (float)uvlist[ faceuvlist[i].auv ].u / header.skinwidth;
    av = 1 - (float)uvlist[ faceuvlist[i].auv ].v / header.skinheight;

    bu = (float)uvlist[ faceuvlist[i].buv ].u / header.skinwidth;
    bv = 1 - (float)uvlist[ faceuvlist[i].buv ].v / header.skinheight;

    cu = (float)uvlist[ faceuvlist[i].cuv ].u / header.skinwidth;
    cv = 1 - (float)uvlist[ faceuvlist[i].cuv ].v / header.skinheight;

    //fill the vertex and texture arrays
    vboar[ cnt ].x = ax;
    vboar[ cnt ].y = ay;
    vboar[ cnt ].z = az;

    tboar[ cnt ].u = au;
    tboar[ cnt++ ].v = av;

    vboar[ cnt ].x = bx;
    vboar[ cnt ].y = by;
    vboar[ cnt ].z = bz;

    tboar[ cnt ].u = bu;
    tboar[ cnt++ ].v = bv;

    vboar[ cnt ].x = cx;
    vboar[ cnt ].y = cy;
    vboar[ cnt ].z = cz;

    tboar[ cnt ].u = cu;
    tboar[ cnt++ ].v = cv;
}


ax1, ay1, az1 - vertex position at the current frame
ax2, ay2, az2 - vertex position at the next frame
ax, ay, az - interpolated vertex position

After interpolating, I calculate the UV coordinates for each vertex and fill the arrays. I use an MD2 model: I unpack the values, interpolate, and fill the arrays every frame, and that is what causes the performance decrease. Maybe someone can help me out? Is there a way to speed it up, or should I use a different system?

Quote:
Original post by DMINATOR
I use an MD2 model: I unpack the values, interpolate, and fill the arrays every frame, and that is what causes the performance decrease.

Maybe someone can help me out? Is there a way to speed it up, or should I use a different system?

For starters, you could probably skip the whole recalculation and redefinition of uv components for every frame, since these don't change? ^^;;
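
A minimal sketch of that idea, building the texture-coordinate array once at load time instead of inside the per-frame loop. The struct names and the `buildTexCoords` function are hypothetical stand-ins; the real layout of `uvlist` and `faceuvlist` in the original code is not shown in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the poster's MD2 structures.
struct Md2TexCoord { short u, v; };                       // texel coordinates as stored
struct Md2FaceUV   { unsigned short auv, buv, cuv; };     // per-face UV indices
struct TexCoord    { float u, v; };                       // normalized, ready for glTexCoordPointer

// Build the texture-coordinate array once, at load time.
// Three entries per face, in the same order the vertex array is filled.
std::vector<TexCoord> buildTexCoords(const std::vector<Md2TexCoord>& uvlist,
                                     const std::vector<Md2FaceUV>& faceuvlist,
                                     int skinwidth, int skinheight)
{
    assert(skinwidth > 0 && skinheight > 0);
    std::vector<TexCoord> out;
    out.reserve(faceuvlist.size() * 3);
    const float invW = 1.0f / skinwidth;
    const float invH = 1.0f / skinheight;
    for (const Md2FaceUV& f : faceuvlist) {
        const unsigned short idx[3] = { f.auv, f.buv, f.cuv };
        for (unsigned short i : idx) {
            const Md2TexCoord& t = uvlist[i];
            // v is flipped, as in the original loop
            out.push_back({ t.u * invW, 1.0f - t.v * invH });
        }
    }
    return out;
}
```

The per-frame loop then only has to touch vertex positions, and the same texture array can be handed to `glTexCoordPointer` every frame.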

I bet you could write some pretty snappy SSE to do that code.

Or you could write a vertex shader and do GPU based morphing, which would move the load off the CPU entirely.
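
A rough sketch of what the SSE route might look like, assuming both frames have already been unpacked into flat `float` arrays of equal length (that layout is an assumption, not something the original code does yet). `lerpFloats` is a hypothetical name; the intrinsics come from `<xmmintrin.h>`, and a scalar loop handles the tail and non-SSE builds.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define LERP_USE_SSE 1
#endif

// out[i] = a[i] + t * (b[i] - a[i]) for n floats.
// a and b are the unpacked start and end frames (assumed flat x,y,z,x,y,z,...).
void lerpFloats(const float* a, const float* b, float t, float* out, std::size_t n)
{
    std::size_t i = 0;
#ifdef LERP_USE_SSE
    const __m128 vt = _mm_set1_ps(t);
    // Four floats per iteration; unaligned loads so no alignment is required.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a + i);
        const __m128 vb = _mm_loadu_ps(b + i);
        const __m128 r  = _mm_add_ps(va, _mm_mul_ps(vt, _mm_sub_ps(vb, va)));
        _mm_storeu_ps(out + i, r);
    }
#endif
    // Scalar tail (or the whole array when SSE is not available).
    for (; i < n; ++i)
        out[i] = a[i] + t * (b[i] - a[i]);
}
```

Because the interpolation is the same for x, y and z, the function does not need to know about vertices at all; it just blends two float arrays.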


[EDIT] Haha, I thought so. This is MD2.

cx2 = framelist[framee].vertexlist[ facelist[i].c ].x * framelist[framee].scale[0] + framelist[framee].translate[0];

This is generally computed beforehand and baked back into the vertex, so that there are no translate and scale attributes.

That alleviates a lot of the stress right off. You can do even better by resolving the UV and vertex indices into a single unindexed triangle list at load time. And if you feel really good, you can reindex them and reorder the indices to be nice and cache friendly.
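
As a sketch of the first two suggestions (baking scale and translate, then expanding to an unindexed triangle list), something like the following could run once per frame at load time. The structs are hypothetical stand-ins that mirror the field names used in the original loop; `bakeFrame` is an illustrative name, not part of any library.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the poster's MD2 structures.
struct Md2Vertex { unsigned char x, y, z, normalIndex; };   // packed vertex
struct Md2Frame  { float scale[3]; float translate[3]; std::vector<Md2Vertex> vertexlist; };
struct Md2Face   { unsigned short a, b, c; };               // vertex indices of one triangle

// Decompress one frame and expand it to an unindexed triangle list:
// 9 floats per face (three vertices, x y z each), scale and translate already applied.
std::vector<float> bakeFrame(const Md2Frame& frame, const std::vector<Md2Face>& facelist)
{
    std::vector<float> out;
    out.reserve(facelist.size() * 9);
    for (const Md2Face& face : facelist) {
        const unsigned short idx[3] = { face.a, face.b, face.c };
        for (unsigned short i : idx) {
            const Md2Vertex& p = frame.vertexlist[i];
            out.push_back(p.x * frame.scale[0] + frame.translate[0]);
            out.push_back(p.y * frame.scale[1] + frame.translate[1]);
            out.push_back(p.z * frame.scale[2] + frame.translate[2]);
        }
    }
    return out;
}
```

With every frame stored this way, the per-frame work shrinks to a straight blend between two float arrays, at the cost of more memory per frame.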

Thanks for your comments. Originally I did uncompress all the values into arrays, but they used quite a lot of memory: 700K for one animated model, not counting the texture. So I saved memory but lost performance. My main goal is to support some of the old video cards. Right now I don't know what the best option is.
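
For a rough feel of that trade-off, here is a small estimate, assuming the usual MD2 layout of 4 bytes per packed vertex (three one-byte coordinates plus a normal index) and six floats of scale/translate per frame, against three floats per unpacked vertex. The function names are hypothetical, the per-frame name field is ignored, and the counts you pass in are your own model's, not figures from this thread.

```cpp
#include <cstddef>

// Bytes for all frames in the packed layout (assumed: 4 bytes per vertex,
// plus 6 floats of scale/translate per frame; the frame name is ignored).
std::size_t packedBytes(std::size_t frames, std::size_t vertsPerFrame)
{
    return frames * (vertsPerFrame * 4 + 6 * sizeof(float));
}

// Bytes for all frames when every vertex is expanded to 3 floats.
std::size_t unpackedBytes(std::size_t frames, std::size_t vertsPerFrame)
{
    return frames * vertsPerFrame * 3 * sizeof(float);
}
```

Under these assumptions the unpacked form is close to three times the packed size, and expanding to an unindexed triangle list increases it further because shared vertices are duplicated.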

Here are some modifications I made. There were some speed problems when using vboar and tboar as private arrays in the model class; if I use them as global variables instead, performance increases by about 40%! Here is the profiling info.

Here I am using tboar and vboar as arrays local to the class:
Quote:
Frequncy: 3579545
Total processes: 3

Process: 0
[0] - Total drawing:
[0] - Hits = 13
[0] - Ticks = 2175138
[0] - Ticks avg. = 167318
[0] - Time = 1000
[0] - Time avg.= 76

Process: 1
[1] - Filling vertex array only
[1] - Hits = 1300
[1] - Ticks = 1194818
[1] - Ticks avg. = 919
[1] - Time = 500
[1] - Time avg.= 0

Process: 2
[2] - Drawing vertex array
[2] - Hits = 1300
[2] - Ticks = 942778
[2] - Ticks avg. = 725
[2] - Time = 333
[2] - Time avg.= 0


And here I am using them as global variables:
Quote:
Frequncy: 3579545
Total processes: 3

Process: 0
[0] - mod-general
[0] - Hits = 13
[0] - Ticks = 1371290
[0] - Ticks avg. = 105483
[0] - Time = 500
[0] - Time avg.= 38

Process: 1
[1] - mod-fill
[1] - Hits = 1300
[1] - Ticks = 926025
[1] - Ticks avg. = 712
[1] - Time = 333
[1] - Time avg.= 0

Process: 2
[2] - mod-draw
[2] - Hits = 1300
[2] - Ticks = 424421
[2] - Ticks avg. = 326
[2] - Time = 125
[2] - Time avg.= 0


VERTEX ARRAY FILLING - 1194818 -> 926025
DRAWING - 942778 -> 424421

TOTAL - 2175138 -> 1371290
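
To relate these tick counts to frame times, the following sketch assumes that the "Frequncy" value in the logs is the timer's rate in ticks per second; the thread does not state this, so treat it as an assumption. Both helper names are hypothetical.

```cpp
// Convert profiler ticks to milliseconds.
// Assumes ticksPerSecond is the "Frequncy" value printed in the log.
double ticksToMs(double ticks, double ticksPerSecond)
{
    return ticks * 1000.0 / ticksPerSecond;
}

// Frames per second implied by an average per-frame tick count.
double framesPerSecond(double avgTicksPerFrame, double ticksPerSecond)
{
    return ticksPerSecond / avgTicksPerFrame;
}
```

With the "Ticks avg." values of process 0 from the two logs (167318 and 105483) and a frequency of 3579545, this works out to roughly 21 and 34 frames per second.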


In FPS terms the increase is from 20 to 30! That's pretty good. It seems that keeping the arrays local to the class slows down the vertex array drawing; maybe someone can explain how this can be?


Here is the code I am using now. The texture array is precomputed, as it is constant. The vertex array is now GLfloat and is uncompressed during model loading.



//I am now using pointers to start and end frame
md2_frame *frame_s = &framelist[frames];
md2_frame *frame_e = &framelist[framee];

//should speed it up a little
//instead of using facelist[i].a all the time
unsigned short facea;
unsigned short faceb;
unsigned short facec;

profiler.ProcStart(1);

//let's move through the vertices at specified frame
for(int i = 0 ; i < header.num_face; i++)
{
    facea = facelist[i].a;
    faceb = facelist[i].b;
    facec = facelist[i].c;

    //vertex array: start frame
    abc[0] = frame_s->vertexlist[ facea ].x;
    abc[1] = frame_s->vertexlist[ facea ].y;
    abc[2] = frame_s->vertexlist[ facea ].z;

    abc[3] = frame_s->vertexlist[ faceb ].x;
    abc[4] = frame_s->vertexlist[ faceb ].y;
    abc[5] = frame_s->vertexlist[ faceb ].z;

    abc[6] = frame_s->vertexlist[ facec ].x;
    abc[7] = frame_s->vertexlist[ facec ].y;
    abc[8] = frame_s->vertexlist[ facec ].z;

    //vertex array: end frame
    abc[9]  = frame_e->vertexlist[ facea ].x;
    abc[10] = frame_e->vertexlist[ facea ].y;
    abc[11] = frame_e->vertexlist[ facea ].z;

    abc[12] = frame_e->vertexlist[ faceb ].x;
    abc[13] = frame_e->vertexlist[ faceb ].y;
    abc[14] = frame_e->vertexlist[ faceb ].z;

    abc[15] = frame_e->vertexlist[ facec ].x;
    abc[16] = frame_e->vertexlist[ facec ].y;
    abc[17] = frame_e->vertexlist[ facec ].z;

    //finding interpolation value
    ax = abc[0] + md2_inter * ( abc[9]  - abc[0] );
    ay = abc[1] + md2_inter * ( abc[10] - abc[1] );
    az = abc[2] + md2_inter * ( abc[11] - abc[2] );

    bx = abc[3] + md2_inter * ( abc[12] - abc[3] );
    by = abc[4] + md2_inter * ( abc[13] - abc[4] );
    bz = abc[5] + md2_inter * ( abc[14] - abc[5] );

    cx = abc[6] + md2_inter * ( abc[15] - abc[6] );
    cy = abc[7] + md2_inter * ( abc[16] - abc[7] );
    cz = abc[8] + md2_inter * ( abc[17] - abc[8] );

    //filling the array
    vboar[ cnt ].x = ax;
    vboar[ cnt ].y = ay;
    vboar[ cnt++ ].z = az;

    vboar[ cnt ].x = bx;
    vboar[ cnt ].y = by;
    vboar[ cnt++ ].z = bz;

    vboar[ cnt ].x = cx;
    vboar[ cnt ].y = cy;
    vboar[ cnt++ ].z = cz;
}




// Drawing the vertex array - the code is the same but performance decreased
glEnableClientState( GL_VERTEX_ARRAY );
glEnableClientState( GL_TEXTURE_COORD_ARRAY );

glTexCoordPointer( 2, GL_FLOAT, 0, tboar );
glVertexPointer( 3, GL_FLOAT, 0, vboar );

//now let's draw it
//glDrawElements(GL_TRIANGLES, header.num_face * 3, GL_UNSIGNED_SHORT, facelist);
glDrawArrays( GL_TRIANGLES, 0, header.num_face * 3 );

glDisableClientState( GL_TEXTURE_COORD_ARRAY );
glDisableClientState( GL_VERTEX_ARRAY );
}








[Edited by - DMINATOR on June 15, 2005 5:31:47 AM]
