Striiideerr

glDrawElements is slow with bsps?


I have been working on a Q3 map viewer for the fun of it (I'm not using the Q3 shaders, though). I finally got the renderer working and everything renders fine: all the geometry is there (except patches) and the textures are correct, but I'm getting around 7 fps.

I am using glDrawElements to render each batch, and I thought this would be fast. I plan on using VBOs later, but I thought plain glDrawElements without any of these fancy extensions should be faster than this. I batch each group of faces by texture, so texture state switches should not be the problem. If I comment out the glDrawElements call, the fps jumps up to the max rate of 85 fps (my refresh rate, since I have vsync on).

My renderer is very much like the one found in the Titan engine, and Titan runs very fast. I changed the renderer over to be just like the one found in the Aftershock engine, but I still get exactly the same framerates. I tried using glDrawArrays with triangle fans, like the tutorial on gametutorials.com does; some of the triangles are missing when I do this, but the framerate is extremely fast regardless. Is glDrawArrays that much superior to glDrawElements now?

Here are snippets of my rendering funcs:

DrawFaces(...)
drawSurf_t *ds;

int last_tex = -1;

int size = m_visibleFace.size();
for( int i = 0; i < size; ++i )
{
	int lf_index = m_bspLumps.pleaffaces[m_visibleFace[i]];

	ds = &m_surfaces[lf_index];

	if( ds->texture != last_tex )
	{
		g_renderer.SetRenderState( &m_currMaterial, ds->texture );
		last_tex = ds->texture;
	}

	g_renderer.PushTriangles( ds );
}
BTW, don't worry about the material thing there; it does nothing yet.

SetRenderState(...) (from above)
if( m_input.num_verts != 0 )
   Emit();

m_currMaterial = mat;
m_iCurrTexture = texture;
PushTriangles(...)
if( surf->num_elems == 0 || surf->num_verts == 0 )
	return;

if( m_input.num_elems + surf->num_elems >= MAX_ELEMS ||
	m_input.num_verts + surf->num_verts >= BUFFER_SIZE )
{
	Emit();
}

// Create the triangle indices for the vertex array

for( int i = 0; i < surf->num_elems; ++i )
	m_input.elems[m_input.num_elems++] = m_input.num_verts + surf->first_elem[i];

// Extract the vertex positions

float *pos  = &m_input.pos[m_input.num_verts << 2];
vertex_t *v = surf->first_vert;
int count   = surf->num_verts;

do
{
	pos[0] = v->position[0];
	pos[1] = v->position[1];
	pos[2] = v->position[2];
	pos += 4;

	++v;
} while( --count );


// Extract the texture coords

float *st = &m_input.st[m_input.num_verts << 1];
	
v     = surf->first_vert;
count = surf->num_verts;

do
{
	st[0] = v->texcoord[0][0];
	st[1] = v->texcoord[0][1];
	st += 2;

	++v;
} while( --count );

m_input.num_verts += surf->num_verts;
Emit(...)
if( m_input.num_verts == 0 )
	return;

glVertexPointer( 3, GL_FLOAT, sizeof( float ) * 4, m_input.pos );
glTexCoordPointer( 2, GL_FLOAT, 0, m_buffer[0].st );
memcpy( m_buffer[0].st, m_input.st, sizeof( float ) * 2 * m_input.num_verts );

glBindTexture( GL_TEXTURE_2D, m_pTextureArray[m_iCurrTexture] );
glEnable( GL_TEXTURE_2D );

glDrawElements( GL_TRIANGLES, m_input.num_elems, GL_UNSIGNED_INT, m_input.elems );
	
glDisable( GL_TEXTURE_2D );

m_input.num_elems = 0;
m_input.num_verts = 0;
Thanks.

quote:
memcpy( m_buffer[0].st, m_input.st, sizeof( float ) * 2 * m_input.num_verts );

That looks very expensive, and very much not needed.

HOW I IMPLEMENTED MY RENDERER (OPTIONAL READING)

I have implemented a full Quake 3 level viewer (shaders, BSP, Beziers, collision and models).

My render device (CRendererOpenGL) uses a push-sort-flush scheme.
It is VERY important that you minimise the time spent in the first two stages (push and sort).

For me, this worked best:

Single Memory Pool
* Vertices
* Texture Coords
* Shaders

This pool is stored in the renderer's native format (so do any conversions at load time).

Instead of copying floats (5 floats per vertex) every frame, copy ints (or shorts, i.e. WORDs): the indices. Then point the renderer at the pool and flush.

So PushTriangles doesn't actually push triangle data, only indices.

In total, I think the memory transfer between my engine and the device is about 10 KB per frame (about 2600 indices), which is quite small as a per-frame cost.
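
A minimal sketch of the index-only push described above, in C++. The names `Surface`, `IndexBatcher`, `push`, `flush` and `pendingBytes` are hypothetical and not taken from either renderer in this thread; the sketch assumes the vertex pool was filled once at load time and that the draw callback is where a single glDrawElements call would go.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical surface record: where its vertices start in the shared,
// load-time vertex pool, plus indices relative to that start.
struct Surface {
    std::uint32_t firstVertex;
    std::vector<std::uint32_t> localIndices;
};

// Collects indices only; vertex positions and texcoords are never copied per frame.
class IndexBatcher {
public:
    // Rebase the surface's local indices onto the shared pool and append them.
    void push(const Surface& surf) {
        for (std::uint32_t local : surf.localIndices)
            indices_.push_back(surf.firstVertex + local);
    }

    // Hand the whole batch to one draw callback, then reset the batch.
    // In a real renderer the callback would wrap one glDrawElements call.
    template <typename DrawFn>
    void flush(DrawFn&& draw) {
        if (indices_.empty())
            return;
        draw(indices_.data(), indices_.size());
        indices_.clear();
    }

    // Bytes that would be handed to the device for the pending batch.
    std::size_t pendingBytes() const {
        return indices_.size() * sizeof(std::uint32_t);
    }

private:
    std::vector<std::uint32_t> indices_;
};
```

With 32-bit indices, 2600 indices come to 10400 bytes, which matches the roughly 10 KB per frame quoted above; 16-bit indices would halve that.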

My renderer runs on a Celeron 400 (66 MHz FSB) with a GeForce2 MX400, with full multipass shaders, at around 70-75 fps.

If you need any help with anything, just post back here.


You're right, that memcpy was not needed. I took it out and passed m_input.st to glTexCoordPointer instead of m_buffer. Unfortunately this didn't change the performance, although it's still good that the memcpy is out of there.

So your single memory pool, is that just a structure with pointers to arrays of indices to vertices, tex coords and shaders?

I'm still surprised it's running so slowly. I look at it and everything looks like it should be pretty fast. I would think what little I'm doing should run well on my system (Athlon XP 2600 and GeForce FX 5600 Ultra). Something weird is going on here. I'd download the trial version of VTune, but it's pretty big and my connection is a slow POS.

Hmm, I played around with the code of my Quake viewer, and I get about 5 fps on a medium-size map when I turn culling off entirely. Is it possible that you are drawing the entire BSP tree, instead of just the visible sections?

I am using glDrawElements(), and it does not seem unusually slow, even on my old machine.
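
For reference, a sketch of the kind of visibility test being asked about, in C++. It assumes the visdata layout commonly documented for Quake 3 BSP files (one row of bits per cluster, bit y of row x set when cluster y is visible from cluster x). `VisData` and `clusterVisible` are hypothetical names, and the handling of negative cluster numbers is an assumption rather than something stated in this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical in-memory copy of the visdata lump.
struct VisData {
    int numClusters;                 // number of bit rows
    int bytesPerCluster;             // size of one row in bytes
    std::vector<std::uint8_t> bits;  // numClusters * bytesPerCluster bytes
};

// Returns true if testCluster may be visible from fromCluster.
bool clusterVisible(const VisData& vis, int fromCluster, int testCluster) {
    // Assumption: with no vis data, or with the camera outside every cluster,
    // fall back to drawing everything.
    if (vis.bits.empty() || fromCluster < 0)
        return true;
    // Assumption: leaves without a valid cluster are never drawn.
    if (testCluster < 0 || testCluster >= vis.numClusters ||
        fromCluster >= vis.numClusters)
        return false;

    const std::size_t byteIndex =
        static_cast<std::size_t>(fromCluster) *
            static_cast<std::size_t>(vis.bytesPerCluster) +
        static_cast<std::size_t>(testCluster >> 3);
    if (byteIndex >= vis.bits.size())
        return false;

    return (vis.bits[byteIndex] & (1u << (testCluster & 7))) != 0;
}
```

Only faces belonging to leaves whose cluster passes this test (and, typically, a frustum test) would be added to the visible-face list.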

Here's to the crazy ones, the misfits, the rebels, the troublemakers, the round pegs in the square holes, the ones who see things differently.

Well, I'm doing the PVS stuff and storing the visible faces in an array. The map I'm using, though, is a small map I made that I estimate, from the face count, to be around 5000 polys or so. It's very small: basically a single room with some pillars and a guard rail between the pillars. There is an engine called "diesel" that runs Q3 maps very well; I ran my map through diesel and it ran at almost 400 fps, compared to my 7.

I have been commenting things out to try to figure out what could be so slow, and every time I do this it still runs slowly. Making it use only one texture doesn't help. If I disable texturing altogether, it only goes up to 22 fps. If I comment out glDrawElements, the framerate maxes out at 85, which is what I SHOULD be seeing when everything is drawn normally. This one has me puzzled.

OK, I slept on it and I just had a thought. I wondered just how many times glDrawElements is being called in this program. Yesterday I made a test program that used glDrawElements to draw a single triangle, but I put the call in a loop that iterated 500 times, so glDrawElements was getting called 500 times to draw this one little triangle. As I expected, the framerate was VERY low. Today I added a counter right after the glDrawElements call (in my Q3 program) and set a breakpoint after all visible faces were drawn. The debugger told me the value of this counter, and it was 1433! No wonder it was so freaking slow! If 500 calls to glDrawElements, each drawing a single untextured triangle with just some colors, is extremely slow, then drawing a Q3 world with textures in 1433 calls is going to crawl.

The moral of the story: limit the number of glDrawElements calls as much as humanly possible.

So the problem is in my batching algorithm; it's not doing a good job at all, it seems.

Thanks, everyone, for your replies. They were helpful.
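
One possible reason for a count like 1433, offered as a guess rather than a diagnosis: the DrawFaces loop at the top of the thread flushes whenever a face's texture differs from the previous face's, so if the visible-face list is not sorted by texture, almost every face can trigger its own flush. The C++ sketch below uses hypothetical names (`FaceRef`, `countBatches`, `sortByTexture`), assumes texture ids are non-negative, and ignores the MAX_ELEMS / BUFFER_SIZE limits.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a visible face: only the texture id matters here.
struct FaceRef {
    int faceIndex;
    int texture;
};

// Counts how many batches a "flush whenever the texture changes" loop emits.
std::size_t countBatches(const std::vector<FaceRef>& faces) {
    std::size_t batches = 0;
    int lastTex = -1;  // assumes real texture ids are >= 0
    for (const FaceRef& f : faces) {
        if (f.texture != lastTex) {
            ++batches;
            lastTex = f.texture;
        }
    }
    return batches;
}

// Sorts the visible list so faces sharing a texture become adjacent.
// stable_sort keeps the original order within each texture group.
void sortByTexture(std::vector<FaceRef>& faces) {
    std::stable_sort(faces.begin(), faces.end(),
                     [](const FaceRef& a, const FaceRef& b) {
                         return a.texture < b.texture;
                     });
}
```

After sorting, the number of batches drops to the number of distinct textures in view (plus any extra flushes forced by the buffer limits), which is the effect the counter at the glDrawElements call should confirm.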

My renderer limits itself to about 200 glDrawElements calls for the entire scene, and if the level is well tessellated, it can be as low as 4 calls for 1700-2000 faces.

q3dm1 is the best performer.
q3dm7/17 (not sure which) is the worst, with a meager 40-50 fps on the Celeron spec.

This is the best way to batch; it's basically a radix sort.



PolygonBuffers pbShader[MaxShaderId];

// sort by shader first
foreach Triangle in Scene
    pbShader[Triangle.ShaderId].add( Triangle );
next

PolygonBuffers pbTexture[MaxTextureId];

// sort by texture now (0 = "default shader", i.e. plain texture)
foreach Triangle in pbShader[0]
    pbTexture[Triangle.TextureId].add( Triangle );
next

// ***** RENDER ******

// now render the default shader, one buffer per texture
for p = 0 to pbTexture.size() - 1
    RenderShaderedPolyBuffer( pbTexture[p] );
next

// draw all other shaders last (avoids transparency problems)
for p = 1 to pbShader.size() - 1
    RenderShaderedPolyBuffer( pbShader[p] );
next



That's pretty much how I do it. I leave the implementation to you; those steps obviously need to be separated into appropriate pushTriangle/pushShader functions, etc.
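
The pseudocode above could be sketched in C++ roughly as follows. This is one reading of it, not the poster's actual code: `Tri`, `SortedScene`, `bucketScene` and `renderScene` are hypothetical names, ids are assumed to be in range, and the draw callback stands in for whatever issues the real draw call for one buffer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical triangle record: only the sort keys are modelled.
struct Tri {
    int shaderId;   // 0 means "default shader" (plain texture)
    int textureId;
};

using Bucket = std::vector<Tri>;

struct SortedScene {
    std::vector<Bucket> byTexture;  // default-shader triangles, one bucket per texture
    std::vector<Bucket> byShader;   // one bucket per shader; slot 0 stays empty
};

// One pass over the scene: each triangle lands directly in its bucket,
// so no comparison sort is needed. Assumes all ids are within range.
SortedScene bucketScene(const std::vector<Tri>& scene,
                        std::size_t shaderCount, std::size_t textureCount) {
    SortedScene out;
    out.byTexture.resize(textureCount);
    out.byShader.resize(shaderCount);
    for (const Tri& t : scene) {
        if (t.shaderId == 0)
            out.byTexture[static_cast<std::size_t>(t.textureId)].push_back(t);
        else
            out.byShader[static_cast<std::size_t>(t.shaderId)].push_back(t);
    }
    return out;
}

// Issues one draw per non-empty bucket: plain textures first, shaders last.
// Returns the number of draw calls made.
template <typename DrawFn>
std::size_t renderScene(const SortedScene& s, DrawFn&& draw) {
    std::size_t calls = 0;
    for (const Bucket& b : s.byTexture) {
        if (!b.empty()) { draw(b); ++calls; }
    }
    for (std::size_t i = 1; i < s.byShader.size(); ++i) {
        if (!s.byShader[i].empty()) { draw(s.byShader[i]); ++calls; }
    }
    return calls;
}
```

The number of draw calls is then bounded by the number of distinct textures and shaders in view rather than by the number of faces.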

[edited by - silvermace on June 4, 2004 8:52:00 PM]
