Seeking maximum performance with OpenGL!

Started by golgoth13 August 29, 2010 02:45 AM

12 comments, last by Aks9 13 years, 7 months ago

122

Author

August 29, 2010 02:45 AM

Greetings,

I once read the analogy that drawing with OpenGL is like crossing an ocean: if you want to get your people to the other side as fast as possible, you are better off with one fully packed cruise ship than with several small voyages. That got me thinking. I'm currently making several draw calls per frame with glDrawRangeElements and VBOs, and I'm wondering what the next step is… Considering a single-pass pipeline, is it possible to make only one draw call per frame? And bam!

Ravyne

14,306

August 29, 2010 03:08 AM

It's probably not possible for a scene of any reasonable complexity, and certain techniques are fundamentally multi-pass. Moreover, trying to pack everything into one draw call isn't going to get you anything -- 4 draw calls don't necessarily take half the time of 8, so it's not like using just one draw call is any sort of ring you should be reaching for.

I'm not really familiar with what the reasonable limits are from personal experience, but I do recall reading an article once which said that anything fewer than a couple hundred draw calls per frame didn't have any real impact. Even that data may be dated, since CPUs and GPUs have moved on, and more recent models for 3D driver interaction have reduced the number of kernel/user mode switches and have begun to embrace multi-threading. The reasons to avoid many draw calls are to avoid those (relatively) slow kernel/user mode switches, and also to avoid changing the GPU's worldview so frequently, which causes GPU caches and pipelines to be flushed more often than necessary.

Sort your draw calls by material so that objects sharing a material are drawn together, which minimizes state changes and draw calls, but don't get caught up trying to chase the 1 draw call game -- though that might make an interesting theme for a coding competition [grin]
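To illustrate the sorting advice above, here is a minimal C++ sketch with no actual GL calls. The DrawItem struct, its fields, and the choice to put the shader in the high bits of the key are assumptions made for the example, not something from this thread or from any engine.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-object draw record; the field names are illustrative only.
struct DrawItem {
    std::uint32_t shaderId;   // program that would be bound
    std::uint32_t textureId;  // texture that would be bound
    std::uint32_t meshId;     // geometry that would be drawn
};

// Pack shader and texture into one key so that sorting groups identical state.
// The shader sits in the high bits on the assumption that switching it costs most.
inline std::uint64_t stateKey(const DrawItem& item) {
    return (static_cast<std::uint64_t>(item.shaderId) << 32) | item.textureId;
}

inline bool stateLess(const DrawItem& a, const DrawItem& b) {
    return stateKey(a) < stateKey(b);
}

// Reorder the draw list so that items sharing render state become adjacent.
// stable_sort keeps the original order among items with identical state.
inline void sortByState(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(), stateLess);
    assert(std::is_sorted(items.begin(), items.end(), stateLess));
}

// Count how many times state would have to be rebound if the list were
// submitted in order (the first item always needs one bind).
inline std::size_t countStateChanges(const std::vector<DrawItem>& items) {
    std::size_t changes = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i == 0 || stateKey(items[i]) != stateKey(items[i - 1])) {
            ++changes;
        }
    }
    return changes;
}
```

In a renderer you would walk the sorted list and rebind shader and texture only when the key changes. Whether that shows up in frame time depends on the driver and the scene, so it is worth measuring rather than assuming.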

throw table_exception("(? ???)? ? ???");

Hodgman

52,717

August 29, 2010 03:17 AM

I've heard a rule of thumb that a draw-call is only "wasted" if there are fewer than ~100 triangles included in it. As long as you're making decent use of your draw-calls (i.e. each one does quite a bit of work) then there's not too much to worry about.

Some anecdotes:
On my GeForce200 I can do about 1M polygons with 1000 draw-calls at 30 Hz (that's 1000 triangles per draw-call).

I know of one proprietary game engine (DX9/360/PS3) that starts displaying warnings to the artists (optimize your meshes!) once the draw-call counter goes over 2000/frame. i.e. too many draw-calls is recognized here as an art problem, not a code problem.

I've worked on a planetary renderer that was doing 150K triangles in 20K draw-calls per frame (that's only 7-8 triangles per draw-call!). By grouping triangles into the same draw-calls (from 20K down to 1K), we improved the per-frame timings from 350ms (~3 fps) to 30ms (~30 fps). That's a 20x reduction in draw-calls for a 10x speed improvement (you can't make a general rule out of that though - there are too many other factors!). n.b. this kind of ties in with the ">100 tri per batch" rule of thumb - we were breaking that rule and had bad performance, then we complied with the rule and had good performance.
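One common way to group triangles into the same draw-call is to merge several meshes that share render state into a single vertex/index array on the CPU. The sketch below is only an illustration of that idea, not the planetary renderer's code. The Vertex and Mesh types are hypothetical, and it assumes the vertices have already been transformed into a common space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal vertex and mesh types for the example.
struct Vertex {
    float x, y, z;
};

struct Mesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices;  // triangle list, 3 indices per triangle
};

// Append every mesh into one shared vertex/index array. Each index is rebased
// by the number of vertices already in the batch, so it still refers to the
// same vertex after merging.
inline Mesh mergeMeshes(const std::vector<Mesh>& meshes) {
    Mesh batch;
    for (const Mesh& mesh : meshes) {
        assert(mesh.indices.size() % 3 == 0);
        const std::uint32_t base = static_cast<std::uint32_t>(batch.vertices.size());
        batch.vertices.insert(batch.vertices.end(),
                              mesh.vertices.begin(), mesh.vertices.end());
        for (std::uint32_t index : mesh.indices) {
            batch.indices.push_back(base + index);
        }
    }
    return batch;
}
```

The merged arrays can then be uploaded to one VBO and one index buffer and drawn with a single glDrawRangeElements call. This only works for objects with the same textures, materials, and shaders, and it suits static geometry best, since moving one object means rewriting its part of the buffer.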

On the PS3, one of the most expensive state changes is switching shaders - this can overshadow draw-call costs.

Quote:Original post by golgoth13
is it possible to make only one draw call per frame? And bam!

Only if every object has the same rendering state (textures, materials, shaders, etc).

. 22 Racing Series .

maxgpgpu

207

August 30, 2010 06:40 PM

Yup. My experiments indicated that the "sweet spot" was somewhere around 4000 triangles per call. Reducing the number of triangles per call below 4000 increased overhead fairly quickly, while increasing the number of triangles per call above 4000 decreased overhead rather slowly.

If you're really crazy about squeezing every last gram of performance this way, you might find it worthwhile to draw somewhere between 16K and 64K triangles at a time. Beyond that, the overhead is insignificant.

golgoth13

122

Author

August 30, 2010 10:16 PM

Interesting, seems like the real rule of thumb here is “It’s all relative”.

Quote:Yup. My experiments indicated that the "sweet spot" was somewhere around 4000 triangles per call. Reducing the number of triangles per call below 4000 increased overhead fairly quickly, while increasing the number of triangles per call above 4000 decreased overhead rather slowly.

I can almost see the light, but there is a missing link: how the polygon count per call is controlled. Even if we have one material for 100 geometries, we still need to draw them one by one with 100 draw calls, right?

If not, how can we render several geometries with the same rendering state in one draw call?

Prune

224

August 31, 2010 05:21 PM

Another way to minimize draw calls might be to do high-level visibility, transform, morphing, etc. calculations on the GPU, storing the result in a buffer using GL_transform_feedback3 support, and use glDrawElementsIndirect or glDrawArraysIndirect. What would really be great though is to be able to switch shaders without an API call.
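For readers who have not seen the indirect draw functions: glDrawElementsIndirect reads its parameters from a buffer bound to GL_DRAW_INDIRECT_BUFFER instead of taking them as arguments, which is what lets the GPU decide what gets drawn. The sketch below fills such a command on the CPU purely to show the layout. The MeshRange struct and makeCommand helper are hypothetical names for the example, and in the approach described above the values would be written by the GPU.

```cpp
#include <cassert>
#include <cstdint>

// Layout of one command as read by glDrawElementsIndirect: five tightly
// packed 32-bit values.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // number of instances; 0 draws nothing
    std::uint32_t firstIndex;     // offset into the index buffer, in indices
    std::int32_t  baseVertex;     // value added to every index
    std::uint32_t baseInstance;   // reserved, must be zero in GL 4.0
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "command must be five packed 32-bit values");

// Hypothetical description of where one mesh lives inside shared buffers.
struct MeshRange {
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Build the command for one mesh. A culled mesh gets instanceCount 0, so the
// draw can still be issued but produces no work.
inline DrawElementsIndirectCommand makeCommand(const MeshRange& range, bool visible) {
    assert(range.indexCount % 3 == 0);  // triangle lists only in this sketch
    DrawElementsIndirectCommand cmd{};
    cmd.count = range.indexCount;
    cmd.instanceCount = visible ? 1u : 0u;
    cmd.firstIndex = range.firstIndex;
    cmd.baseVertex = range.baseVertex;
    cmd.baseInstance = 0;
    return cmd;
}
```

With the command stored in the indirect buffer, the draw itself would be something like glDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, offset), where offset is the byte offset of the command in that buffer. This needs OpenGL 4.0 or the ARB_draw_indirect extension.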

"But who prays for Satan? Who, in eighteen centuries, has had the common humanity to pray for the one sinner that needed it most?" --Mark Twain

~~~~~~~~~~~~~~~Looking for a high-performance, easy to use, and lightweight math library? http://www.cmldev.net/ (note: I'm not associated with that project; just a user)

kilah

514

August 31, 2010 05:35 PM

As I posted on another thread, I would strongly recommend to seek the best structure for your vertex shader cache in order to increase your VBO performance on the drawCalls.

That is the next most interesting step (for normal VBO numbers, not 20K of course).

Regards.

golgoth13

122

Author

August 31, 2010 07:01 PM

Quote:calculations on the GPU, storing the result in a buffer using GL_transform_feedback3 support, and use glDrawElementsIndirect or glDrawArraysIndirect.

This seems to be pushing the envelope to the next step indeed. My guess is that this technique involves geometry shaders. I have never heard of or seen anything like this yet, so it has to be part of GL 4. I'm currently bound to GL 2.0~3.3.

Quote:As I posted on another thread, I would strongly recommend to seek the best structure for your vertex shader cache in order to increase your VBO performance on the drawCalls.

I'm currently working with glBindBuffer, glBufferData, and glBufferSubData (for vertex attributes).

Does anybody have experience with this to share, and advice on how to achieve good results?

Thanks for all your input.

kilah

514

September 01, 2010 04:44 AM

Check the specifications of the most common graphics cards and align your vertex memory structure to their cache block fetch size. Also make sure your data is interleaved. This should help your VBO performance a lot, as all the vertex data for a single pass of your vertex shader will be a cache hit.

Another great optimization you can make on the vertex side is within your vertex shader. For instance on PS3 an "if" statement within Cell or GPU is extremely costly due to the stalls it incurs.

You are worrying too much about how your bandwidth is used, but your bottleneck might be somewhere else. Having the fastest drawcall system on the planet might mean nothing if you are fill rate capped or whatever.
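As a rough illustration of what an interleaved, alignment-friendly vertex could look like, here is a small C++ sketch. The attribute set, the 32-byte vertex size, and the 64-byte block size used in the usage note are assumptions for the example. Real cache block sizes differ between GPUs and are not always documented.

```cpp
#include <cassert>
#include <cstddef>

// One interleaved vertex: all attributes of a vertex are stored next to each
// other, so fetching the vertex touches one contiguous region of memory.
struct InterleavedVertex {
    float position[3];  // 12 bytes
    float normal[3];    // 12 bytes
    float uv[2];        //  8 bytes
};
static_assert(sizeof(InterleavedVertex) == 32,
              "expected a tightly packed 32-byte vertex");

// Byte offset and stride for one attribute inside the interleaved buffer.
struct AttributeLayout {
    std::size_t offset;
    std::size_t stride;
};

inline AttributeLayout positionLayout() {
    return {offsetof(InterleavedVertex, position), sizeof(InterleavedVertex)};
}

inline AttributeLayout normalLayout() {
    return {offsetof(InterleavedVertex, normal), sizeof(InterleavedVertex)};
}

inline AttributeLayout uvLayout() {
    return {offsetof(InterleavedVertex, uv), sizeof(InterleavedVertex)};
}

// True when a whole number of vertices fits in one block, so that no vertex
// straddles a block boundary as long as the buffer itself is block-aligned.
inline bool packsIntoBlock(std::size_t stride, std::size_t blockSize) {
    assert(stride > 0 && blockSize > 0);
    return blockSize % stride == 0;
}
```

The offset and stride of each attribute are what you would pass as the last two arguments of glVertexAttribPointer, with the buffer bound to GL_ARRAY_BUFFER. For example, packsIntoBlock(sizeof(InterleavedVertex), 64) holds for this layout, while adding one more float per vertex (36 bytes) would break it unless you pad the struct.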

golgoth13

122

Author

September 01, 2010 11:57 AM

Quote:For instance on PS3 an "if" statement within Cell or GPU is extremely costly due to the stalls it incurs… You are worrying too much about how your bandwidth is used, but your bottleneck might be somewhere else.

Touchdown. I decided to go for a single shader that does it all, and I have 4-5 if statements in it. I'm guessing this also applies to the GeForce 8 family. It was too good to be true…

So we can't get around the multiple-shaders concept, can we? I despise the idea… damn it.

Quote:Check the specifications of the most common graphics cards and align your vertex memory structure to their cache block fetch size. Also make sure your data is interleaved.

You mean there is an ultimate way to build VBOs? I'm curious to find out how I can optimize this:

void OpenGL::SetVertexBuffer(Geometry *in_geometry)
{
    UInt &l_id = in_geometry->GetVertexBuffer()->GetArrayId();

    if (l_id == 0 || in_geometry->IsState(STATE_RESET_VOB))
    {
        // (Re)create the buffer and upload every attribute array.
        glDeleteBuffers(1, &l_id);
        VertexBuffer *l_vbo = in_geometry->GetVertexBuffer();

        if (l_id == 0)
            glGenBuffers(1, &l_id);

        glBindBuffer(GL_ARRAY_BUFFER, l_id);
        glBufferData(GL_ARRAY_BUFFER, l_vbo->GetArraySize(), NULL, l_vbo->GetType());

        for (UInt i = 0; i < in_geometry->GetUVSets().GetCount(); ++i)
        {
            ArrayBuffer<Vector2f> &l_uvs = in_geometry->GetUVSets()[i]->GetUVs();
            glBufferSubData(GL_ARRAY_BUFFER, l_uvs.GetOffset(), l_uvs.GetSize(), l_uvs.GetData());
        }

        if (in_geometry->GetVertexColors().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetVertexColors().GetOffset(), in_geometry->GetVertexColors().GetSize(), in_geometry->GetVertexColors().GetData());

        if (in_geometry->GetCurvatures().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetCurvatures().GetOffset(), in_geometry->GetCurvatures().GetSize(), in_geometry->GetCurvatures().GetData());

        if (in_geometry->GetEdgeFlags().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetEdgeFlags().GetOffset(), in_geometry->GetEdgeFlags().GetSize(), in_geometry->GetEdgeFlags().GetData());

        if (in_geometry->GetFogCoords().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetFogCoords().GetOffset(), in_geometry->GetFogCoords().GetSize(), in_geometry->GetFogCoords().GetData());

        if (in_geometry->GetNormals().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetNormals().GetOffset(), in_geometry->GetNormals().GetSize(), in_geometry->GetNormals().GetData());

        if (in_geometry->GetUTangents().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetUTangents().GetOffset(), in_geometry->GetUTangents().GetSize(), in_geometry->GetUTangents().GetData());

        if (in_geometry->GetVTangents().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetVTangents().GetOffset(), in_geometry->GetVTangents().GetSize(), in_geometry->GetVTangents().GetData());

        if (in_geometry->GetVertices().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetVertices().GetOffset(), in_geometry->GetVertices().GetSize(), in_geometry->GetVertices().GetData());

        if (in_geometry->GetWeights().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetWeights().GetOffset(), in_geometry->GetWeights().GetSize(), in_geometry->GetWeights().GetData());

        if (in_geometry->GetDeformers().IsValid())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetDeformers().GetOffset(), in_geometry->GetDeformers().GetSize(), in_geometry->GetDeformers().GetData());
    }
    else
    {
        // Buffer already exists: only re-upload positions when they are dynamic.
        glBindBuffer(GL_ARRAY_BUFFER, l_id);

        if (in_geometry->GetVertices().IsDynamic())
            glBufferSubData(GL_ARRAY_BUFFER, in_geometry->GetVertices().GetOffset(), in_geometry->GetVertices().GetSize(), in_geometry->GetVertices().GetData());
    }
}

[Edited by - golgoth13 on September 2, 2010 2:57:01 AM]
