What are your opinions on DX12/Vulkan/Mantle?

Started by
120 comments, last by Ubik 8 years, 10 months ago

Of course, but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. There's no sense doing it if you can only index one thing (i.e., you end up doing what I am doing: one glDraw and one glUniform1i call per mesh drawn).

If I understand correctly, Id comes from the instance data and not from gl_InstanceID. Id is different for two different instances, and a different mesh is a different instance.

Think of this as 2 buffers, one is instance buffer which contains only ID, the other is vertex buffer.

A first draw call would use 10 instances from the instance buffer, starting from BaseInstance 0.
A second draw call would use 1 instance from the instance buffer, starting from BaseInstance 10.

So if you fill your instance buffer with Ids in ascending order, for example, all the Ids will be different.
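
To make the fetch rule concrete, here is a minimal sketch that simulates it on the CPU in Python instead of calling real GL. It assumes the Id attribute is instanced with a divisor of 1, so it is fetched at base instance + instance index, while gl_InstanceID restarts at zero for every draw. The function name and the buffer size are made up for illustration.

```python
def fetch_instance_ids(instance_buffer, base_instance, instance_count):
    """Return (gl_InstanceID, fetched Id) pairs for one simulated instanced draw."""
    pairs = []
    for instance in range(instance_count):
        # gl_InstanceID is zero based; the attribute fetch is offset by base_instance.
        pairs.append((instance, instance_buffer[base_instance + instance]))
    return pairs


# Instance buffer filled with ascending Ids (hypothetical size of 16).
instance_buffer = list(range(16))

# The two draws described above: 10 instances from 0, then 1 instance from 10.
first_draw = fetch_instance_ids(instance_buffer, base_instance=0, instance_count=10)
second_draw = fetch_instance_ids(instance_buffer, base_instance=10, instance_count=1)

print(first_draw[-1])
print(second_draw[0])
```

In this simulation the second draw still sees an instance index of 0, but the Id it fetches continues where the first draw stopped, so no two instances share an Id.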

Use the baseInstance parameter of the glDraw*BaseInstance calls (e.g. glDrawElementsInstancedBaseVertexBaseInstance). gl_InstanceID will still be zero based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra GLSL variable with the value of baseInstance).

And what if you're drawing two different meshes? I.e., not instancing a single mesh.

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffers function that takes offsets until then.

I have no idea about D3D11, but it probably isn't even necessary. Just update the entire buffer in one call. The buffer is defined as an array of structs; index into that to fetch the one that corresponds to the current thing being drawn.

So, he's just saying 'for the next n draws, here are the constants', and then sets indices (somehow? Not sure how he'd track that without also updating a constant. Atomic integers?) to say 'access struct n in the huge constant buffer'?

Honestly, I'd rather update smaller buffers with finer granularity as I wouldn't be stalling on one large copy.

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws?

Yes

IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffers function that takes offsets until then.

That's one way of doing it, and if you do it that way, then you're correct. We don't use D3D11.1 functionality, though. Since OpenGL does support setting constant buffers by offsets, we take advantage of that to further reduce the splitting of batches of draw calls.

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and using StartInstanceLocation.
I attach a "drawId" R32_UINT vertex buffer (instanced buffer) which is filled with 0, 1, 2, 3, 4 ... 4095 (basically, we can't batch more than 4096 draws together in the same call; that limit is not arbitrary: 4096 vectors * 4 floats per vector * 4 bytes per float = 64 KB, aka the const buffer limit).
Hence the "drawId" vertex attribute will always contain the value I want as long as it is in range [0; 4096) and thus index whatever I want correctly.

This is the most compatible way of doing it which works with both D3D11 and OpenGL. There is a GL4 extension that exposes the keywords gl_DrawIDARB & gl_BaseInstanceARB which allows me to do the same without having to use an instanced vertex buffer (thus gaining some performance bits in memory fetching; though I don't know if it's noticeable since the vertex buffer is really small and doesn't consume much bandwidth; also the 4096 draws per call limit can be lifted thanks to this extension).
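
For anyone who wants to check the 4096 figure, the arithmetic can be sketched in a few lines of Python. This is only an illustration, assuming each draw consumes exactly one 4-float vector of the constant buffer (which is how I read the post above); the constant and function names are hypothetical.

```python
FLOATS_PER_VECTOR = 4
BYTES_PER_FLOAT = 4
CONST_BUFFER_LIMIT_BYTES = 64 * 1024  # the 64 KB const buffer limit quoted above

# One vector per draw is an assumption made for this sketch.
bytes_per_vector = FLOATS_PER_VECTOR * BYTES_PER_FLOAT
max_draws = CONST_BUFFER_LIMIT_BYTES // bytes_per_vector

# The instanced "drawId" buffer simply holds 0, 1, 2, ... max_draws - 1.
draw_id_buffer = list(range(max_draws))


def draw_id_for(base_instance):
    """Value a single-instance draw would read at this start instance location."""
    if not 0 <= base_instance < max_draws:
        raise IndexError("base instance outside the drawId buffer")
    return draw_id_buffer[base_instance]


print(bytes_per_vector, max_draws, draw_id_for(max_draws - 1))
```

If a draw needs more than one vector of constants, fewer draws fit before the buffer has to be refilled, so the number that actually limits a batch is the 64 KB buffer size rather than 4096 itself.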

1 is a valid value for the instance count.

Of course, but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. There's no sense doing it if you can only index one thing (i.e., you end up doing what I am doing: one glDraw and one glUniform1i call per mesh drawn).

Doing this for just one instance is completely valid. If you do it the way you said, although it is valid, your API overhead will go through the roof, especially if you have a lot of different meshes.

And you have no noticeable problems with that? A year and a half ago or so I did some quick tests where I just rendered (in OpenGL) all my objects using normal draw calls vs rendering all my objects using instancing with instance count = 1, and it had some truly horrendous CPU overhead. The profiler showed that the GPU fell asleep, but the CPU for some reason took a lot longer for everything. If I remember right, for about 700 total draw calls (Crytek Sponza geometry + shadow pass), I saved something like 3 or 4 ms by switching back to normal draw calls for everything (on an i7 3770K and GTX 770). Granted, the setup was suboptimal at best: I sorted by shaders and textures used and nothing else, every mesh was in its own VB, etc. Maybe that was the reason and there's a much smaller instancing overhead otherwise?

This would depend on how you update the per-instance buffer.

If you have a small buffer - with space for only one instance - and you do a separate buffer update for each instance, then OpenGL is going to perform horribly (D3D won't). If you have a large buffer with space for all your instances, but you update them all together, then it should run well.

The overhead isn't instancing, it's OpenGL's buffer objects API.

Direct3D has need of instancing, but we do not. We have plenty of glVertexAttrib calls.

And you have no noticeable problems with that?

Nope, we are not.

every mesh was in its own VB

There's your problem. Every time you had to switch to the next mesh, you had to respecify the VAO state.
You could be hitting the slow path by doing that per mesh + using instancing. The driver may have been able to detect the VAO only switched buffers with the non-instanced calls; but decided to respecify the whole vertex data when using instancing.
You should keep all your meshes in the same Buffer Object, or have very few Buffer Objects at least.

Also, you obviously compared an instanced version without indexing into one single buffer vs normal draw calls.
You should compare instanced version + indexing into one single buffer vs normal draw calls.
If there is higher overhead from using instancing, it is more than negated by using indexes into a single buffer.

Doing this for just one instance is completely valid. If you do it the way you said, although it is valid, your API overhead will go through the roof, especially if you have a lot of different meshes.
Then with the instanced method, how would you handle drawing different meshes?

I.e., as I see it you'd have two ways of doing it:

  1. Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  2. Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).

So you always end up with one draw call per different mesh. The thing that differs is the UBO updating scheme (no scheme in the first one, a batching scheme in the second one).
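
The difference between the two options can be put into numbers with a rough Python sketch. It only counts API calls and models nothing about driver cost. The 700 meshes echo the draw count mentioned earlier in the thread, while the 256 transforms per UBO is a made-up capacity, and both function names are hypothetical.

```python
import math


def scheme_one(mesh_count):
    """Option 1: update one transform, then draw; repeat for every mesh."""
    return {"buffer_updates": mesh_count, "draw_calls": mesh_count}


def scheme_two(mesh_count, transforms_per_ubo):
    """Option 2: fill the UBO with as many transforms as fit, then draw each
    mesh with an increasing base instance until the UBO is exhausted."""
    batches = math.ceil(mesh_count / transforms_per_ubo)
    return {"buffer_updates": batches, "draw_calls": mesh_count}


one = scheme_one(700)
two = scheme_two(700, transforms_per_ubo=256)
print(one)
print(two)
```

Under these assumptions both options issue the same number of draw calls; only the number of buffer updates changes, which matches the conclusion above.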

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator

I'm starting to be worried by rumors that Google may have its own low-level API too. This would basically mean one API per OS, which defeats the purpose of Vulkan in the first place...

glMultiDraw*Indirect basically iterates a glDraw*Instanced call over all the elements of the bound indirect draw command buffer.
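
As a rough illustration of that iteration, here is a CPU-side Python sketch, not real GL. The field order mirrors the DrawElementsIndirectCommand layout used by glMultiDrawElementsIndirect; the function name, the callback and the mesh numbers are all hypothetical.

```python
from collections import namedtuple

# Same field order as the indexed indirect command struct.
DrawElementsIndirectCommand = namedtuple(
    "DrawElementsIndirectCommand",
    ["count", "instance_count", "first_index", "base_vertex", "base_instance"],
)


def multi_draw_elements_indirect(commands, issue_draw):
    """Simulate the multi-draw: one instanced draw per command in the buffer."""
    for cmd in commands:
        issue_draw(cmd.count, cmd.instance_count, cmd.first_index,
                   cmd.base_vertex, cmd.base_instance)


# Two different meshes packed in one index/vertex buffer (made-up sizes).
# Each is drawn once, and base_instance doubles as the per-draw index.
commands = [
    DrawElementsIndirectCommand(36, 1, 0, 0, 0),
    DrawElementsIndirectCommand(60, 1, 36, 24, 1),
]

issued = []
multi_draw_elements_indirect(commands, lambda *args: issued.append(args))
print(issued)
```

The point of the sketch is that different meshes become different commands in one buffer, so a single API call can cover draws that would otherwise be issued one by one.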
