Minimizing draw calls and passing transform/material data to a shader in OpenGL 3.x+

3 comments, last by TheChubu 7 years, 6 months ago

Since I'm targeting GL3.x, this somewhat limits my options of how to do certain things.

In particular, I'm using glMultiDrawElements(), which has the drawback of not being able to handle instancing info. As far as I can tell, all combined draw calls that do support instancing are GL4.x. Is there something I might be missing? A simple workaround here would be to manually "write out" or expand the instances in the index/count arrays passed to glMultiDrawElements().
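A minimal sketch of that "write out the instances" workaround, CPU side only. The names MeshDraw, MultiDrawArgs and expandInstances are hypothetical, and the layout assumes all meshes share one bound index buffer, so each entry of the indices array is a byte offset cast to a pointer. Note that this alone does not tell the shader which copy it is processing, which is the problem raised in the next paragraph.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one mesh stored in a shared index buffer.
struct MeshDraw {
    std::int32_t   indexCount;     // number of indices in the mesh
    std::uintptr_t firstIndexByte; // byte offset of its first index
    std::uint32_t  instanceCount;  // how many copies to draw
};

// Arrays laid out the way glMultiDrawElements expects them.
struct MultiDrawArgs {
    std::vector<std::int32_t> counts;       // one index count per sub-draw
    std::vector<const void*>  indexOffsets; // byte offsets cast to pointers
};

// Repeat each mesh's (count, offset) pair once per instance.
MultiDrawArgs expandInstances(const std::vector<MeshDraw>& draws) {
    MultiDrawArgs args;
    for (const MeshDraw& d : draws) {
        assert(d.indexCount >= 0);
        for (std::uint32_t i = 0; i < d.instanceCount; ++i) {
            args.counts.push_back(d.indexCount);
            args.indexOffsets.push_back(
                reinterpret_cast<const void*>(d.firstIndexByte));
        }
    }
    return args;
}

// With a GL context and a bound element array buffer, the result would be
// passed along the lines of:
//   glMultiDrawElements(GL_TRIANGLES, args.counts.data(), GL_UNSIGNED_INT,
//                       args.indexOffsets.data(),
//                       static_cast<GLsizei>(args.counts.size()));
```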

The bigger problem I'm having is matching my transform data up with the respective objects. As best I can tell, using a TBO is the simplest way to get my matrices to the shader. Once there, extracting my world matrix is a bit trickier, though, as I still need the index of the range currently being processed (effectively a combination of object and instance ID). This doesn't seem to be passed in via gl_InstanceID or any other way. How can I get around this?

I'd appreciate some input in terms of how to store and update material data in the shader. The standard approach seems to be to store this stuff in a UBO. These are capped to fairly small sizes though. What's your approach for dealing with complex multimaterial objects? More drawcalls and more frequent material lookup rebinding?

Keep in mind that simple implementations of multi-draw will do it on the client (CPU) side, just as a loop that calls the corresponding draw function a bunch of times. GL4-level hardware will be more likely to do this loop on the GPU-side.
Note that on GPUs that do perform multi-draws on the GPU-side, you can performantly emulate instancing by issuing one draw per instance.

To get the best mileage, you want something like ARB_shader_draw_parameters, which gives you gl_DrawIDARB (similar to gl_InstanceID, but per sub-draw).

If this extension is not present, you may actually be better off using GL3/GL2-style instancing...

I'd appreciate some input in terms of how to store and update material data in the shader. The standard approach seems to be to store this stuff in a UBO

Material data tends to be pretty small...

Ah, if only I could point you to some older thread that was about this kind of thing. Sadly I can't seem to search keywords in my own posts only, so I can't find it.

Anyway, ARB_shader_draw_parameters seems to not be supported in any GL 3 hardware (and I've read for some reason it has a non-negligible performance impact). So the idea is to work around that using this beautifully named draw call:

glDrawElementsInstancedBaseVertexBaseInstance

This comes from the ARB_base_instance extension, made core in 4.2, which is also supported by all the GL 3 hardware you should care about.

Now this gives you a couple of things:

A way to specify a vertex offset to start drawing from in a vertex buffer.

A way to specify an index offset to start drawing from in an index buffer.

A way to specify the instance you're starting to draw from in instanced rendering.

With this you can have a single VAO, with all your meshes, and a way to combine instanced rendering with normal rendering in a single draw call (like Vulkan!).

You need a way to bind these single/multiple mesh instances to their respective transform/material data, right? And uploading a single uniform per draw call won't cut it, since you might have instanced calls with more than one instance. So you need a way to upload data for several different instances, preferably without caring if they're of the same mesh or not, so you can issue a single upload and then make draw calls as needed that operate on that uploaded data.

Now the issue here is that the instance ID still works as usual, that is, it's zero-based: with a normal instanced draw call, if you tell it to draw 5 instances, it will start at zero. So if you're drawing 5 instances of mesh 10 in your draw list, you need it to fetch transform data at index 10, 11, 12, 13 and 14, which means you need a way to communicate to the shader that it should start from index 10.

A way to get around this is to specify a separate instanced attribute buffer that contains the indices, from 0 up to the maximum number of instances you can draw in a single uniform buffer upload (usually 4096 with a 64 KB UBO limit).

Now the trick here is to tell the instanced draw call to start at instance 10, which will fetch the instanced attribute at index 10, which will be the real index you want (ten!). gl_InstanceID will still start at zero, but you don't care about that, because you get the index you want automatically fetched from the instanced attribute buffer.
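The fetch described above can be modelled on the CPU. This is only a sketch under assumptions: makeInstanceIndexBuffer and fetchInstancedAttribute are hypothetical names, and the second function mimics the rule GL applies to an instanced attribute (element = instanceID / divisor + baseInstance) rather than calling GL itself.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Contents of the per-instance index buffer: simply 0, 1, 2, ..., maxInstances - 1.
std::vector<std::uint32_t> makeInstanceIndexBuffer(std::uint32_t maxInstances) {
    std::vector<std::uint32_t> indices(maxInstances);
    std::iota(indices.begin(), indices.end(), 0u);
    return indices;
}

// CPU model of which element of an instanced attribute the vertex shader sees.
// instanceID plays the role of gl_InstanceID (always counted from zero);
// baseInstance is the last argument of the base-instance draw call.
std::uint32_t fetchInstancedAttribute(const std::vector<std::uint32_t>& buffer,
                                      std::uint32_t instanceID,
                                      std::uint32_t baseInstance,
                                      std::uint32_t divisor = 1) {
    assert(divisor > 0);
    const std::uint32_t element = instanceID / divisor + baseInstance;
    assert(element < buffer.size());
    return buffer[element];
}

// With a GL context, the matching setup would be along the lines of:
//   glVertexAttribIPointer(loc, 1, GL_UNSIGNED_INT, 0, nullptr);
//   glVertexAttribDivisor(loc, 1);
//   glDrawElementsInstancedBaseVertexBaseInstance(
//       GL_TRIANGLES, indexCount, GL_UNSIGNED_INT, indexByteOffset,
//       5 /* instances */, baseVertex, 10 /* base instance */);
```

Drawing 5 instances with a base instance of 10 makes the shader receive 10, 11, 12, 13 and 14 through the attribute, which it can then use to index the uploaded transform data.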

So your render loop becomes something like:


for (allPassedShaderPrograms) {
    program.bind();
    do {
        // Fill the UBOs as much as you can with the render tasks' data.
        for (allUniformBufferInThisBatch)
            update(renderTask);
        // Draw all the render tasks that had their data uploaded.
        for (allTasksThatCouldBeUpdated)
            draw(renderTask);
        // And repeat while there are tasks to draw.
    } while (thereAreMoreTasksToDraw);
}
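One way to plan the "fill, draw, repeat" loop above is to decide up front which draws go into which upload. The following is a sketch, not the renderer's actual code: RenderTask, PlacedDraw and planBatches are hypothetical names, capacity stands for the number of instance slots that fit in one uniform buffer upload, and splitting a task across two uploads is an assumption made here for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct RenderTask { std::uint32_t instanceCount; };

// One draw call inside a batch: which task, where its data starts in the
// uploaded buffer (usable as the base instance), and how many instances.
struct PlacedDraw {
    std::size_t   task;
    std::uint32_t baseInstance;
    std::uint32_t instanceCount;
};
using Batch = std::vector<PlacedDraw>;

// Pack tasks into batches of at most `capacity` instances each.
std::vector<Batch> planBatches(const std::vector<RenderTask>& tasks,
                               std::uint32_t capacity) {
    assert(capacity > 0);
    std::vector<Batch> batches;
    Batch current;
    std::uint32_t used = 0; // instance slots already taken in `current`
    for (std::size_t t = 0; t < tasks.size(); ++t) {
        std::uint32_t remaining = tasks[t].instanceCount;
        while (remaining > 0) {
            if (used == capacity) { // upload is full: close it, start a new one
                batches.push_back(std::move(current));
                current.clear();
                used = 0;
            }
            const std::uint32_t n = std::min(remaining, capacity - used);
            current.push_back({t, used, n});
            used += n;
            remaining -= n;
        }
    }
    if (!current.empty()) batches.push_back(std::move(current));
    return batches;
}
```

Each batch then corresponds to one buffer update followed by its draw calls, with baseInstance pointing at the first slot of that draw's data.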

You can find a more detailed explanation here if you read the PDF: http://www.gamedev.net/blog/2042/entry-2261259-from-yaml-to-renderer-in-50ms/

Keep in mind that my renderer is pretty basic, with no multi-layered materials or anything, so your mileage may vary. But those are the basics for getting more draw calls per buffer upload.

EDIT: Also, these kinds of approaches were described in nVidia's advanced OpenGL scene rendering presentations from GTC, they did one and updated it each year with different methodologies and benchmarks. You can google those. The idea is more or less the same, how to minimize buffer uploads, how to get the most out of your drawcalls, and how to efficiently upload instance IDs for indexed resources (whether they're UBOs, TBOs, SSBOs, etc).

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

Thanks for the thorough response, TheChubu! I got a chance to look into ARB_shader_draw_parameters and even my GTX 960M doesn't have it, so I don't see a reason to add a codepath for it. I still haven't had the chance to sit down and work on the actual code, but at the very least ARB_base_instance is present, which is indeed there since GL3.1. PS - I appreciate the link to your render loop architecture.

As for materials: frankly, I have been working on the deep innards of my game thus far and the visual side is in dire need of an upgrade from a texture-based to an actual material-enabled approach. I generally try to minimize code iteration and time spent on reimplementing features as much as possible. As such, I'd like to upgrade my render pipeline to draw meshes with multiple materials as well as extend it to handle layered materials in one fell swoop. I may be misguided here, but given the evidently fairly small maximum array sizes in shaders, this introduces a potentially limiting complication: the whole effort of minimizing draw calls could be reduced to constantly remapping materials, which kinda defeats the purpose. In any case, I haven't gotten to adding materials to my shader pipeline yet, so it was more of a preparatory "best practices" type of question :).

64kB UBO is the standard for NVIDIA (Intel varies from 16kB to 64kB; chances are you don't care about the Intel GPUs with 16kB UBOs). GCN supports 2GB UBOs if you want them for some reason. As I mentioned, you could simply use texture buffer objects (TBOs), which are megabyte-sized in most cases (they're often used for skeletal animation data, for example, which is a lot of matrices, more than whatever material data you might have).

Also keep this in mind: 64kB (or whatever) is the maximum range of a buffer you can bind to a UBO slot. That doesn't prevent you from creating a megabyte-sized buffer (which I do) and updating it entirely in a single call. Then you just rebind ranges incrementally and draw as you go (drawcalls themselves aren't the real issue here, but rather the stuff you have to touch to get to that drawcall).
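A small sketch of how the ranges of such a large buffer could be computed before rebinding them one by one. BufferRange and splitIntoBindableRanges are hypothetical names; maxRangeBytes and offsetAlignment are assumed to come from querying GL_MAX_UNIFORM_BLOCK_SIZE and GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT on the target driver.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct BufferRange {
    std::size_t offset; // byte offset into the big buffer
    std::size_t size;   // bytes visible to the shader through this binding
};

// Split a buffer of totalBytes into consecutive ranges that each respect the
// maximum bindable size and start at a properly aligned offset.
std::vector<BufferRange> splitIntoBindableRanges(std::size_t totalBytes,
                                                 std::size_t maxRangeBytes,
                                                 std::size_t offsetAlignment) {
    assert(offsetAlignment > 0 && maxRangeBytes >= offsetAlignment);
    // Largest multiple of the alignment that still fits in one binding,
    // so every following offset stays aligned.
    const std::size_t step = (maxRangeBytes / offsetAlignment) * offsetAlignment;
    std::vector<BufferRange> ranges;
    for (std::size_t offset = 0; offset < totalBytes; offset += step) {
        ranges.push_back({offset, std::min(step, totalBytes - offset)});
    }
    return ranges;
}

// With a GL context, each range would be bound before its draws, e.g.:
//   glBindBufferRange(GL_UNIFORM_BUFFER, bindingIndex, buffer,
//                     range.offset, range.size);
```

For a 1 MB buffer with a 64kB limit and 256-byte alignment this yields 16 ranges, so the buffer is uploaded once and only the bound range changes between groups of draws.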

At least in my profiling, the time is spent updating the buffers rather than issuing the drawcalls. So that's what gets minimized.
