
Using the ARB_multi_draw_indirect command



#1 tmason   Members   -  Reputation: 306


Posted 22 May 2014 - 12:19 PM

Hello,

Has anyone used the "ARB_multi_draw_indirect" extension? Apparently it can make drawing much faster depending on the situation, but I don't know how to use it.

My drawing code is currently a loop: for each item I want to draw, I bind its VAO, send my MVP and normal matrices via glUniformMatrix* functions, upload a UBO with my materials/colors, and then call glDrawElements(*).

But apparently using ARB_multi_draw_indirect is much faster.

Does anyone have an example of its use and how to use it correctly?

Thank you for your time.




#2 Promit   Moderators   -  Reputation: 7921


Posted 22 May 2014 - 01:01 PM

First, read the NVIDIA slides if you haven't already. multi_draw_indirect starts at slide 63.

 

Consider the signature of glDrawArrays:

void glDrawArrays(GLenum  mode, GLint  first, GLsizei  count);

This function takes a mode and two int parameters (basically). Instead of submitting that function, create a buffer:

{ [first | count],
[first | count],
[first | count],
[first | count],
[first | count] }

Now you can call MultiDrawArrays, passing pointers to the first and count arrays, and a single call will submit five draws at once. Or you can call MultiDrawArraysIndirect, with a single pointer and a stride (the real indirect record is four uints in the order count, instanceCount, first, baseInstance). Here's the trick: the Indirect variants understand a buffer binding called DRAW_INDIRECT_BUFFER. Now you can upload that buffer above into GPU memory (a buffer object) and execute it from there. Why would you want to do that? By itself, you wouldn't. But this is the cleverest part: you can use GPU compute to generate the buffer without any copies.

And here's another bit mentioned by the slides: there's an extension called shader_draw_parameters that adds a DrawID (gl_DrawIDARB) to the shader, telling you which draw in the batch is executing (0/1/2/3/4, counting from zero). So you can use that value to select between, let's say, multiple modelview matrices passed into the shader.
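To make the two layouts concrete, here is a minimal C++ sketch (not from the thread) of the data each call consumes: glMultiDrawArrays takes two parallel arrays, while glMultiDrawArraysIndirect reads an array of 16-byte records, which, when a buffer is bound to GL_DRAW_INDIRECT_BUFFER, are addressed by a byte offset into that buffer. The `Range` type and the helper names are hypothetical; the GL calls appear only in comments because they need a live context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One record as read by glMultiDrawArraysIndirect (field order from the
// ARB_draw_indirect / ARB_multi_draw_indirect specs; older spec text names
// the second field primCount and the last one reservedMustBeZero).
struct DrawArraysIndirectCommand {
    std::uint32_t count;          // vertices in this draw
    std::uint32_t instanceCount;  // 1 for an ordinary, non-instanced draw
    std::uint32_t first;          // first vertex in the bound arrays
    std::uint32_t baseInstance;   // keep 0 unless ARB_base_instance is available
};
static_assert(sizeof(DrawArraysIndirectCommand) == 16, "four tightly packed uints");

struct Range { std::uint32_t first, count; };  // hypothetical per-object range

// glMultiDrawArrays(mode, firsts.data(), counts.data(), drawcount) wants two
// separate arrays rather than one interleaved [first | count] buffer.
void buildMultiDrawArrays(const std::vector<Range>& ranges,
                          std::vector<std::int32_t>& firsts,
                          std::vector<std::int32_t>& counts) {
    firsts.clear();
    counts.clear();
    for (const Range& r : ranges) {
        firsts.push_back(static_cast<std::int32_t>(r.first));
        counts.push_back(static_cast<std::int32_t>(r.count));
    }
}

// glMultiDrawArraysIndirect(mode, indirect, drawcount, stride) wants structs.
std::vector<DrawArraysIndirectCommand> buildIndirect(const std::vector<Range>& ranges) {
    std::vector<DrawArraysIndirectCommand> cmds;
    cmds.reserve(ranges.size());
    for (const Range& r : ranges)
        cmds.push_back({r.count, 1u, r.first, 0u});
    return cmds;
}

// With a GL_DRAW_INDIRECT_BUFFER bound, `indirect` is a byte offset into it;
// stride 0 means the records are tightly packed.
std::size_t commandByteOffset(std::size_t index, std::size_t stride) {
    return index * (stride == 0 ? sizeof(DrawArraysIndirectCommand) : stride);
}

// Upload and submit (needs a GL 4.3 context, shown for orientation only):
//   glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
//   glBufferData(GL_DRAW_INDIRECT_BUFFER, cmds.size() * sizeof(cmds[0]),
//                cmds.data(), GL_STATIC_DRAW);
//   glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, (GLsizei)cmds.size(), 0);
```

Once the records live in a buffer object, a compute shader could in principle write the same 16-byte records directly, which is the "no copies" case described above.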

 

The tricky part is setting up all of your input data to leverage as much of this as possible. You need to share buffers and as many shader parameters as possible, and use DrawID cleverly.
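Sharing buffers usually means concatenating every mesh into one vertex buffer and one index buffer behind a single VAO, and remembering where each mesh landed. The sketch below shows one way to do that bookkeeping; `Mesh`, `MeshRange` and `packMeshes` are hypothetical names, and positions are assumed to be 3 floats per vertex. Each resulting range supplies the firstIndex, index count and baseVertex of one draw command, and each mesh's indices can stay 0-based because baseVertex is added to them at draw time.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Mesh {                               // hypothetical CPU-side mesh
    std::vector<float> positions;           // xyz per vertex
    std::vector<std::uint32_t> indices;     // 0-based, local to this mesh
};

struct MeshRange {                          // where a mesh landed in the shared buffers
    std::uint32_t firstIndex;               // in indices, into the shared index buffer
    std::uint32_t indexCount;
    std::int32_t  baseVertex;               // added to each of the mesh's indices at draw time
};

struct SharedGeometry {
    std::vector<float> positions;           // one VBO for every mesh
    std::vector<std::uint32_t> indices;     // one index buffer for every mesh
    std::vector<MeshRange> ranges;          // one entry per mesh
};

SharedGeometry packMeshes(const std::vector<Mesh>& meshes) {
    SharedGeometry g;
    for (const Mesh& m : meshes) {
        MeshRange r;
        r.firstIndex = static_cast<std::uint32_t>(g.indices.size());
        r.indexCount = static_cast<std::uint32_t>(m.indices.size());
        r.baseVertex = static_cast<std::int32_t>(g.positions.size() / 3);
        g.positions.insert(g.positions.end(), m.positions.begin(), m.positions.end());
        g.indices.insert(g.indices.end(), m.indices.begin(), m.indices.end());
        g.ranges.push_back(r);
    }
    return g;
}
```

Keeping the indices local (rather than rewriting them to global values) is a design choice: it lets meshes be repacked without touching their index data, at the cost of requiring the BaseVertex draw variants.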


Edited by Promit, 22 May 2014 - 01:02 PM.


#3 tmason   Members   -  Reputation: 306


Posted 22 May 2014 - 01:22 PM

Thank you for your fast reply!

I reviewed the slides, and together with your response this gives me a starting point.

The thing that immediately comes to mind is that I am using glDrawElements, not glDrawArrays. Should I change the code to use gl(Multi)DrawArrays, or is there a way to use "ARB_multi_draw_indirect" with glDrawElements?

Thanks again.



#4 Promit   Moderators   -  Reputation: 7921


Posted 22 May 2014 - 01:53 PM

There's MultiDrawElements as well. You should go over the spec for the extension carefully:

https://www.opengl.org/registry/specs/ARB/multi_draw_indirect.txt

http://www.opengl.org/registry/specs/ARB/draw_indirect.txt
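For the non-indirect glMultiDrawElements, the arguments are again parallel arrays: one count per draw and one "pointer" per draw, which, with an element array buffer bound, is really a byte offset into that buffer (glMultiDrawElementsBaseVertex adds a third array of base vertices). A small C++ sketch of building those arrays, with `SubRange` and `buildMultiDrawElementsArgs` as hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct SubRange {                 // hypothetical: one draw's slice of the index buffer
    std::uint32_t firstIndex;     // in indices
    std::uint32_t indexCount;
};

struct MultiDrawElementsArgs {
    std::vector<std::int32_t> counts;   // passed as const GLsizei* count
    std::vector<const void*> offsets;   // passed as const void* const* indices
};

// indexSize: 2 for GL_UNSIGNED_SHORT, 4 for GL_UNSIGNED_INT.
MultiDrawElementsArgs buildMultiDrawElementsArgs(const std::vector<SubRange>& ranges,
                                                 std::size_t indexSize) {
    MultiDrawElementsArgs args;
    for (const SubRange& r : ranges) {
        args.counts.push_back(static_cast<std::int32_t>(r.indexCount));
        // With an element array buffer bound, each "pointer" is a byte offset into it.
        const std::uintptr_t bytes = static_cast<std::uintptr_t>(r.firstIndex) * indexSize;
        args.offsets.push_back(reinterpret_cast<const void*>(bytes));
    }
    return args;
}

// Submission (needs a live context, shown for orientation only):
//   glMultiDrawElements(GL_TRIANGLES, args.counts.data(), GL_UNSIGNED_INT,
//                       args.offsets.data(), (GLsizei)args.counts.size());
```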



#5 tmason   Members   -  Reputation: 306


Posted 22 May 2014 - 02:55 PM


Thank you; I'll go over it. I am just so used to doing things one way, and this seems fundamentally different, so it is a little confusing.

My current code consists of:

*Setting up the VAO/VBO/Normals, etc*

    glGenVertexArrays(1, &Pointer_VAO);
    glBindVertexArray(Pointer_VAO);

    // Create Vertex Buffer Object
    glGenBuffers(1, &Vertex_VBO);

    // Save vertex attributes into GPU (4 floats per position, attribute 0)
    glBindBuffer(GL_ARRAY_BUFFER, Vertex_VBO);
    glBufferData(GL_ARRAY_BUFFER, TotalVertexCount * 4 * sizeof(float), Vertices, GL_STATIC_DRAW);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 4, GL_FLOAT, GL_FALSE, 0, 0);

    // glBufferData copies the data, so the CPU-side array can be freed now
    delete[] Vertices;

    if (HasNormals)
    {
        // Normals: 3 floats per vertex, attribute 1
        glGenBuffers(1, &Normal_VBO);
        glBindBuffer(GL_ARRAY_BUFFER, Normal_VBO);
        glBufferData(GL_ARRAY_BUFFER, TotalVertexCount * 3 * sizeof(float), Normals, GL_STATIC_DRAW);
        glEnableVertexAttribArray(1);
        glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, 0, 0);

        delete[] Normals;
    }

    if (HasUVs)
    {
        // Texture coordinates: 2 floats per vertex, attribute 2
        glGenBuffers(1, &UV_VBO);
        glBindBuffer(GL_ARRAY_BUFFER, UV_VBO);
        glBufferData(GL_ARRAY_BUFFER, TotalVertexCount * 2 * sizeof(float), UVs, GL_STATIC_DRAW);
        glEnableVertexAttribArray(2);
        glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, 0, 0);

        delete[] UVs;
    }

    // Index buffer: binding GL_ELEMENT_ARRAY_BUFFER while the VAO is bound stores it in the VAO
    glGenBuffers(1, &Index_VBO);
    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, Index_VBO);
    glBufferData(GL_ELEMENT_ARRAY_BUFFER, TotalPolygonCount * 3 * sizeof(unsigned int), Indices, GL_STATIC_DRAW);

    delete[] Indices;
    glBindVertexArray(0);

And my draw call later on in a loop, per VAO:

    glBindVertexArray(Pointer_VAO);
    glUniformMatrix3fv(CurrentOpenGLController->GetNormalMatrixID(), 1, GL_FALSE,
        glm::value_ptr(NormalMatrix));
    glUniformMatrix4fv(CurrentOpenGLController->GetUniformGlobalPositionID(), 1, GL_FALSE,
        glm::value_ptr(ModelViewProjectionMatrix));
    // One upload + one draw per material; assumes the material UBO is
    // currently bound to GL_UNIFORM_BUFFER
    for (size_t i = 0; i < Materials.size(); i++) {
        glBufferData(GL_UNIFORM_BUFFER, sizeof(Materials[i].ColorProperties),
            Materials[i].ColorProperties, GL_DYNAMIC_DRAW);
        // Offset is counted in indices, so convert it to a byte offset
        glDrawElements(GL_TRIANGLES, Materials[i].TriangleCount * 3, GL_UNSIGNED_INT,
            reinterpret_cast<const GLvoid *>(Materials[i].Offset * sizeof(unsigned int)));
    }
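As a rough illustration of what could change, each iteration of the material loop above corresponds to one DrawElementsIndirectCommand record. This C++ sketch assumes `TriangleCount` and `Offset` behave as in the loop (with `Offset` counted in indices, since the loop multiplies it by sizeof(unsigned int) to get bytes). Storing the material index in baseInstance is a hypothetical choice: it requires ARB_base_instance / GL 4.2 and only reaches shaders through an instanced vertex attribute (divisor 1) or, with ARB_shader_draw_parameters, gl_BaseInstanceARB. The per-material glBufferData would also have to go, with all materials living in one buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One record as read by glMultiDrawElementsIndirect (20 bytes, tightly packed).
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // 1 for a non-instanced draw
    std::uint32_t firstIndex;     // offset in indices, NOT bytes
    std::int32_t  baseVertex;     // added to each index
    std::uint32_t baseInstance;   // needs ARB_base_instance / GL 4.2 to be non-zero
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "five tightly packed 32-bit values");

// Hypothetical mirror of the fields the material loop uses.
struct MaterialRange {
    std::uint32_t TriangleCount;
    std::uint32_t Offset;         // in indices, as in the loop
};

std::vector<DrawElementsIndirectCommand>
commandsFromMaterials(const std::vector<MaterialRange>& materials) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(materials.size());
    for (std::size_t i = 0; i < materials.size(); ++i) {
        DrawElementsIndirectCommand c{};
        c.count = materials[i].TriangleCount * 3;       // same count the loop passes
        c.instanceCount = 1;
        c.firstIndex = materials[i].Offset;             // the loop's byte offset / sizeof(unsigned int)
        c.baseVertex = 0;                               // all materials share one vertex range
        c.baseInstance = static_cast<std::uint32_t>(i); // hypothetical: material index
        cmds.push_back(c);
    }
    return cmds;
}

// Replaces the whole loop (GL 4.3 / ARB_multi_draw_indirect, VAO bound,
// commands uploaded to the buffer bound at GL_DRAW_INDIRECT_BUFFER):
//   glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr,
//                               (GLsizei)cmds.size(), 0);
```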

I am just trying to wrap my head around what needs to change, but I am reading up on it.

Thank you for your time and assistance thus far...



#6 theagentd   Members   -  Reputation: 616


Posted 22 May 2014 - 04:59 PM

You won't gain anything by simply replacing each glDrawElements() call with a glMultiDrawElementsIndirect(). The whole point of glMultiDrawElementsIndirect() is to allow you to upload everything you need for all your draw calls to the GPU (using uniform buffers, texture buffers, bindless textures, sparse textures, etc) and then replace ALL your glDrawElements() calls with a single glMultiDrawElementsIndirect() call. As far as I know, glMultiDrawElementsIndirect() is not faster than glDrawElements() when simply used as a replacement for the latter.
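Concretely, "upload everything you need for all your draw calls" could look like one array of per-draw structs that the shader indexes with the draw ID (or baseInstance). The sketch assumes a GLSL block declared with std140 or std430 layout holding `mat4 mvp; mat3 normalMatrix;`, where each mat3 column is padded to a vec4, so one entry is 112 bytes; `PerDrawData`, `setNormalMatrix` and `perDrawByteOffset` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// CPU mirror of a hypothetical GLSL struct { mat4 mvp; mat3 normalMatrix; }
// declared in a std140 (or std430) block. Matrices are column-major, and each
// mat3 column occupies a full vec4 slot, so one entry is 64 + 48 = 112 bytes.
struct PerDrawData {
    std::array<float, 16> mvp;
    std::array<float, 12> normalMatrix;  // 3 columns x (xyz + padding)
};
static_assert(sizeof(PerDrawData) == 112, "must match the GLSL block layout");

// Copies a column-major 3x3 matrix into the padded layout.
void setNormalMatrix(PerDrawData& d, const float m3[9]) {
    for (int col = 0; col < 3; ++col) {
        for (int row = 0; row < 3; ++row)
            d.normalMatrix[col * 4 + row] = m3[col * 3 + row];
        d.normalMatrix[col * 4 + 3] = 0.0f;  // padding, ignored by the shader
    }
}

// Byte offset of draw i inside the single buffer holding all entries.
std::size_t perDrawByteOffset(std::size_t drawIndex) {
    return drawIndex * sizeof(PerDrawData);
}

// Usage idea: fill a std::vector<PerDrawData> once per frame, upload it with a
// single glBufferSubData, then let the shader index it with gl_DrawIDARB.
```

Uniform blocks have a size limit (query GL_MAX_UNIFORM_BLOCK_SIZE), so for many draws a shader storage buffer may be the more practical home for this array.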

 

I strongly recommend you take a look at this presentation, http://www.slideshare.net/CassEveritt/beyond-porting, which explains both the problems and how to solve them really well.



#7 tmason   Members   -  Reputation: 306


Posted 22 May 2014 - 05:03 PM


Thank you.

That's both what I figured and what I feared; I had just started to understand OpenGL from the standpoint of making each draw call independently, and now this comes along.

I apologize if I am asking questions that seem like repeats; I am just trying to wrap my head around it, as I had understood it differently.



#8 theagentd   Members   -  Reputation: 616


Posted 22 May 2014 - 08:18 PM

Are you sure that you're not simply prematurely optimizing? How exactly is your situation looking? Have you identified the bottleneck?

 

Slightly off-topic: I was inspired by your post and decided to try out glMultiDrawElementsIndirect(), since I had identified a part of my engine where I simply called glDrawElementsInstancedBaseVertex() in a loop. This was for shadow rendering, so no texture switches were required. Depending on how many types of tiles were visible, around 20 draw calls were issued in a row, which I replaced with a single glMultiDrawElementsIndirect() call. That left my code with 3 different modes, depending on OpenGL support:

OGL3: Although all the instance data for all draw calls is packed into the same VBO, the vertex attribute pointer needs to be updated before each draw call so that it reads the correct subset of instances from that buffer.

    // 12 bytes per instance (3 floats); the index offset is in bytes (2 per GL_UNSIGNED_SHORT)
    glVertexAttribPointer(instancePositionLocation, 3, GL_FLOAT, false, 0, baseInstance * 12);
    glDrawElementsInstancedBaseVertex(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, baseIndex*2, numInstances, baseVertex);

ARB_base_instance: If ARB_base_instance is supported, I can simply pass a base instance rather than modifying the instance data pointer, which removes the last state change from the mesh rendering loop:

    glDrawElementsInstancedBaseVertexBaseInstance(GL_TRIANGLES, numIndices, GL_UNSIGNED_SHORT, baseIndex*2, numInstances, baseVertex, baseInstance);

ARB_multi_draw_indirect: If ARB_multi_draw_indirect is supported, I can pack together the above data into an array (an IntBuffer in my case since I'm using Java, hence the weird code), and draw them all with a single draw call:

//In the mesh "rendering" loop
//Five ints per draw, in DrawElementsIndirectCommand order: count, instanceCount, firstIndex, baseVertex, baseInstance
multiDrawBuffer.put(numIndices).put(numInstances).put(baseIndex).put(baseVertex).put(baseInstance);
multiDrawCount++;

//After the loop:
ARBMultiDrawIndirect.glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT, multiDrawBuffer, multiDrawCount, 0);
multiDrawBuffer.clear();
multiDrawCount = 0;
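A C++ counterpart of the Java multiDrawBuffer above might look like the sketch below (the `IndirectBatch` class and helper names are hypothetical). It also spells out the unit difference between the paths: glDrawElementsInstancedBaseVertex takes the index offset in bytes (baseIndex*2 for GL_UNSIGNED_SHORT), while the indirect record's firstIndex is counted in indices. As far as I know, passing client memory as the `indirect` argument, as the Java call does, is only accepted in a compatibility profile; a core profile requires a buffer bound to GL_DRAW_INDIRECT_BUFFER, with `indirect` interpreted as a byte offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mirrors the Java multiDrawBuffer: five 32-bit values per draw, in the order of
// DrawElementsIndirectCommand { count, instanceCount, firstIndex, baseVertex, baseInstance }.
class IndirectBatch {
public:
    void add(std::uint32_t numIndices, std::uint32_t numInstances,
             std::uint32_t baseIndex, std::int32_t baseVertex, std::uint32_t baseInstance) {
        words_.push_back(numIndices);
        words_.push_back(numInstances);
        words_.push_back(baseIndex);                               // in indices, not bytes
        words_.push_back(static_cast<std::uint32_t>(baseVertex));  // same bits as a GLint
        words_.push_back(baseInstance);
        ++drawCount_;
    }
    const std::uint32_t* data() const { return words_.data(); }
    std::size_t drawCount() const { return drawCount_; }
    void clear() { words_.clear(); drawCount_ = 0; }
    // Core-profile submission, after uploading data() to the indirect buffer:
    //   glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_SHORT, nullptr,
    //                               (GLsizei)drawCount(), 0);
private:
    std::vector<std::uint32_t> words_;
    std::size_t drawCount_ = 0;
};

// Byte offsets used by the OGL3 path, for comparison.
constexpr std::size_t indexByteOffset(std::size_t baseIndex) {
    return baseIndex * sizeof(std::uint16_t);   // GL_UNSIGNED_SHORT indices
}
constexpr std::size_t instanceByteOffset(std::size_t baseInstance) {
    return baseInstance * 3 * sizeof(float);    // one vec3 per instance
}
```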

 

 

Performance:
OGL3: 56 FPS
ARB_base_instance: 56 FPS (it seems the overhead of glVertexAttribPointer() is extremely low)
ARB_multi_draw_indirect: 62 FPS
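Converting the FPS figures to frame times makes the size of the saving easier to judge (simple arithmetic on the numbers above, assuming they are averages):

$$
t = \frac{1000\ \text{ms}}{\text{FPS}}:\qquad
t_{\text{OGL3}} = \frac{1000}{56} \approx 17.86\ \text{ms},\qquad
t_{\text{MDI}} = \frac{1000}{62} \approx 16.13\ \text{ms},\qquad
\Delta t \approx 1.73\ \text{ms per frame}.
$$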

 

The scene used was a purposely CPU-intensive scene with 1944 shadow maps being rendered (at extremely low resolution, and most simply had no shadow casters that passed frustum culling). The resolution was intentionally kept very low, and the GPU load was around 69-71%. My Java code was NOT the bottleneck; my OpenGL commands take approximately 8.5 ms to execute, and then an additional ~8 ms is spent blocking on buffer swap (= waiting for the driver to complete the queued commands, e.g. C code (or something) in the driver). My conclusion is that glMultiDrawElementsIndirect() significantly reduced the load on the driver thread, even when batching together just 10-20 draw calls into each glMultiDrawElementsIndirect() call.


Edited by theagentd, 22 May 2014 - 08:20 PM.


#9 Hodgman   Moderators   -  Reputation: 33142


Posted 22 May 2014 - 08:44 PM


theagentd wrote: "then an additional ~8 ms is spent blocking on buffer swap (= waiting for the driver to complete the queued commands, e.g. C code (or something) in the driver)"
As a rule of thumb, if you're blocking on swap/flip/present, you're either waiting for a vblank if you're vsyncing, or you're waiting for the GPU to catch up.

Measurements of "GPU load" are very misleading. You can be bottlenecked by the GPU without seeing it report 100% load...
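One way to see whether the GPU, rather than the driver, is the limit is to time GPU work directly with GL_TIMESTAMP queries (ARB_timer_query, core since GL 3.3). Reading a query result right away would stall, so a common approach is to keep a small ring of query pairs and read each one a few frames later. `GpuTimerRing` below is a hypothetical helper that holds only the slot arithmetic; the GL calls are in comments. Timestamp values are in nanoseconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: kFrames pairs of GL_TIMESTAMP queries used round-robin,
// so each result is read kFrames - 1 frames after it was issued.
template <std::size_t kFrames>
class GpuTimerRing {
    static_assert(kFrames >= 2, "reading the slot just written would stall");
public:
    // Slot to record into this frame:
    //   glQueryCounter(beginQuery[slot], GL_TIMESTAMP);  ...draw...
    //   glQueryCounter(endQuery[slot], GL_TIMESTAMP);
    std::size_t writeSlot(std::uint64_t frame) const {
        return static_cast<std::size_t>(frame % kFrames);
    }

    // Oldest slot still holding results (written at frame - (kFrames - 1));
    // read it before it is reused next frame:
    //   glGetQueryObjectui64v(query, GL_QUERY_RESULT, &nanoseconds);
    // (GL_QUERY_RESULT_AVAILABLE can confirm the result is ready first.)
    bool readSlot(std::uint64_t frame, std::size_t& slot) const {
        if (frame + 1 < kFrames) return false;  // ring not filled yet
        slot = static_cast<std::size_t>((frame + 1) % kFrames);
        return true;
    }

    // GL_TIMESTAMP values are in nanoseconds.
    static double elapsedMs(std::uint64_t beginNs, std::uint64_t endNs) {
        return static_cast<double>(endNs - beginNs) / 1.0e6;
    }
};
```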



#10 Promit   Moderators   -  Reputation: 7921


Posted 22 May 2014 - 09:41 PM

Remember to use timestamp queries for GPU timing. tmason, I think what you want to do is put all of your material parameters together in an array, and load it into a single buffer. Then you want to make a single MultiDraw call, and use DrawID in the shader to choose the correct material. That way you won't have that loop of calls anymore.
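This suggestion amounts to building two tables on the CPU: the distinct materials, uploaded once, and a per-draw material index that the shader reads with gl_DrawIDARB. The sketch below (hypothetical `MaterialGPU` layout and names) also deduplicates identical materials. Note that in a std140 uniform block an array of uint has a 16-byte element stride, so the index table may fit better in a std430 shader storage buffer or be packed four per uvec4.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical material record; two vec4s keep the std140/std430 layout trivial.
struct MaterialGPU {
    std::array<float, 4> diffuse;
    std::array<float, 4> specular;  // e.g. rgb + shininess in w
};
static_assert(sizeof(MaterialGPU) == 32, "two vec4s");

struct MaterialTable {
    std::vector<MaterialGPU> materials;          // uploaded once, one buffer
    std::vector<std::uint32_t> drawToMaterial;   // indexed by gl_DrawIDARB
};

// perDraw[i] is the material used by draw i; identical materials are stored once.
MaterialTable buildMaterialTable(const std::vector<MaterialGPU>& perDraw) {
    MaterialTable table;
    std::map<std::array<float, 8>, std::uint32_t> seen;  // material contents -> slot
    for (const MaterialGPU& m : perDraw) {
        std::array<float, 8> key{};
        for (int k = 0; k < 4; ++k) {
            key[k] = m.diffuse[k];
            key[k + 4] = m.specular[k];
        }
        auto it = seen.find(key);
        if (it == seen.end()) {
            it = seen.emplace(key, static_cast<std::uint32_t>(table.materials.size())).first;
            table.materials.push_back(m);
        }
        table.drawToMaterial.push_back(it->second);
    }
    return table;
}

// Shader side (hypothetical GLSL, ARB_shader_draw_parameters):
//   Material m = materials[drawToMaterial[gl_DrawIDARB]];
```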



#11 phantom   Moderators   -  Reputation: 7922


Posted 23 May 2014 - 02:35 AM

When it comes to performance and usage etc., this is worth a watch: Approaching Zero Driver Overhead.

#12 swiftcoder   Senior Moderators   -  Reputation: 11105


Posted 23 May 2014 - 05:04 AM

phantom wrote: "When it comes to performance and usage etc., this is worth a watch: Approaching Zero Driver Overhead."

That's an awesome presentation.


Tristam MacDonald - Software Engineer @Amazon - [swiftcoding]


#13 theagentd   Members   -  Reputation: 616


Posted 23 May 2014 - 08:15 AM

 


Hodgman wrote: "As a rule of thumb, if you're blocking on swap/flip/present, you're either waiting for a vblank if you're vsyncing, or you're waiting for the GPU to catch up."

Threaded optimization off:

    51 FPS
    Render time: 18.806 ms
    Swap time: 0.277 ms
    Frame time: 19.678 ms (also includes some UI rendering)
    GPU load: ~60%

Threaded optimization on:

    61 FPS
    Render time: 9.185 ms
    Swap time: 5.739 ms
    Frame time: 15.727 ms
    GPU load: ~71%

I did not change a single line in my program. Turning threaded optimization on simply moves the cost of the render calls to the driver's server thread (see the slides I posted above), and if that thread lags behind, the application thread ends up blocking on buffer swaps.



#14 tmason   Members   -  Reputation: 306


Posted 23 May 2014 - 08:19 AM

Replying to Promit (#10):


Great, worth a shot. And this seems like something I can do even without using "ARB_multi_draw_indirect".

Of course, that extension seems awesome, but I can experiment with it slowly.
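Part of this does work without ARB_multi_draw_indirect. One sketch, assuming all materials are uploaded once into a single uniform buffer at padded offsets: replace the per-draw glBufferData with glBindBufferRange pointing at that material's slice. Each slice offset must be a multiple of GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, whose value is queried from the driver at run time; `alignUp` and `materialOffsets` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rounds size up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment) {
    return (size + alignment - 1) / alignment * alignment;
}

// Byte offset of each material's slice in one shared uniform buffer.
// `alignment` is GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT as reported by the driver:
//   GLint a = 0; glGetIntegerv(GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT, &a);
std::vector<std::size_t> materialOffsets(std::size_t materialCount,
                                         std::size_t blockSize,
                                         std::size_t alignment) {
    const std::size_t stride = alignUp(blockSize, alignment);
    std::vector<std::size_t> offsets;
    offsets.reserve(materialCount);
    for (std::size_t i = 0; i < materialCount; ++i)
        offsets.push_back(i * stride);
    return offsets;
}

// Per draw, instead of re-uploading with glBufferData:
//   glBindBufferRange(GL_UNIFORM_BUFFER, materialBinding, materialUbo,
//                     offsets[i], blockSize);
//   glDrawElements(...);
```

This keeps one draw call per material, but removes the per-draw upload, which is usually the more expensive part of the original loop.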



