Seabolt

Vulkan What are your opinions on DX12/Vulkan/Mantle?

This topic is 1169 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.


 

Use the baseInstance parameter from glDraw*BaseVertexBaseInstance. gl_InstanceID will still be zero-based, but you can use an instanced vertex element to overcome this problem (or use an extension that exposes an extra GLSL variable with the value of baseInstance).

And what if you're drawing two different meshes? ie, not instancing a single mesh.

 

 

 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws? IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

I have no idea about D3D11, but it probably isn't even necessary. Just update the entire buffer in one call. The buffer is defined as an array of structs; index into that to fetch the one that corresponds to the current thing being drawn.

 

 

So, he's just saying 'for the next n draws, here are the constants', and then sets indices (somehow? Not sure how he'd track that without also updating a constant. Atomic integers?) to say 'access struct n in the huge constant buffer'?

Honestly, I'd rather update smaller buffers with finer granularity as I wouldn't be stalling on one large copy.


How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws?

Yes
 

IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffer function that takes offsets until then.

That's one way of doing it, and if you do it that way, then you're correct. We don't use D3D11.1 functionality, though. Since OpenGL does support setting constant buffers by offsets, we take advantage of that to further reduce splitting of some batches of draw calls.
 

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and using StartInstanceLocation.
I attach a "drawId" R32_UINT vertex buffer (an instanced buffer) which is filled with 0, 1, 2, 3, 4 ... 4095. Basically, we can't batch more than 4096 draws together in the same call. That limit is not arbitrary: 4096 vectors * 4 floats per vector * 4 bytes per float = 64 KB, aka the const buffer limit.
Hence the "drawId" vertex attribute will always contain the value I want as long as it is in range [0; 4096) and thus index whatever I want correctly.
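A minimal CPU-side sketch of that "drawId" buffer, assuming the 64 KB limit and one float4 per draw as described above. The helper names (makeDrawIdBuffer, drawIdSeenByShader) are made up for illustration, and the actual GPU upload and vertex layout setup are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

static_assert(sizeof(float) == 4, "sketch assumes 32-bit floats");

// Limits taken from the post: a 64 KB constant buffer, one float4 per draw.
constexpr std::size_t kConstBufferBytes = 64 * 1024;
constexpr std::size_t kBytesPerVector   = 4 * sizeof(float);
constexpr std::size_t kMaxDrawsPerBatch = kConstBufferBytes / kBytesPerVector;

// Contents of the instanced "drawId" buffer: 0, 1, 2, ..., count - 1.
// (Hypothetical helper; it only fills CPU memory.)
std::vector<std::uint32_t> makeDrawIdBuffer(std::size_t count = kMaxDrawsPerBatch) {
    std::vector<std::uint32_t> ids(count);
    std::iota(ids.begin(), ids.end(), 0u);
    return ids;
}

// With an instance count of 1, the per-instance attribute fetched for a draw
// is the element at its start instance location, so drawId == baseInstance.
std::uint32_t drawIdSeenByShader(const std::vector<std::uint32_t>& ids,
                                 std::uint32_t baseInstance) {
    assert(baseInstance < ids.size());
    return ids[baseInstance];
}
```

The buffer is filled once and never changes; only the start instance passed to each draw varies.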

This is the most compatible way of doing it, and it works with both D3D11 and OpenGL. There is a GL4 extension that exposes the keywords gl_DrawIDARB & gl_BaseInstanceARB, which allow me to do the same without having to use an instanced vertex buffer. That gains some performance in memory fetching, though I don't know if it's noticeable since the vertex buffer is really small and doesn't consume much bandwidth. The 4096 draws per call limit can also be lifted thanks to this extension.

Edited by Matias Goldberg


1 is a valid value for the instance count.

Of course, but the idea is to batch up data inside the constant/uniform buffers and use the instance ID for indexing. No sense doing it if you can only index one thing (ie, you end up with what I am doing, one glDraw and glUniform1i call per mesh drawn).

Doing this for just one instance is completely valid. If you do it the way you said, although valid, your API overhead will go through the roof, especially if you have a lot of different meshes.

Edited by Matias Goldberg


Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and using StartInstanceLocation.

 

 

And you have no noticeable problems with that? A year and a half ago or so I did some quick tests where I just rendered (in OpenGL) all my objects using normal draw calls vs rendering all my objects using instancing with instance count = 1, and it had some truly horrendous CPU overhead. The profiler showed that the GPU fell asleep, but the CPU for some reason took a lot longer for everything. If I remember right, for about 700 total draw calls (Crytek Sponza geometry + shadow pass), I saved something like 3 or 4 ms by switching back to normal draw calls for everything (on an i7 3770k and GTX 770). Granted, the setup was suboptimal at best: I sorted by shaders and textures used and nothing else, every mesh was in its own VB, etc. Maybe that was the reason and there's a much smaller instancing overhead otherwise?

Edited by agleed


And you have no noticeable problems with that? A year and a half ago or so I did some quick tests where I just rendered (in OpenGL) all my objects using normal draw calls vs rendering all my objects using instancing with instance count = 1, and it had some truly horrendous CPU overhead. The profiler showed that the GPU fell asleep, but the CPU for some reason took a lot longer for everything. If I remember right, for about 700 total draw calls (Crytek Sponza geometry + shadow pass), I saved something like 3 or 4 ms by switching back to normal draw calls for everything (on an i7 3770k and GTX 770). Granted, the setup was suboptimal at best: I sorted by shaders and textures used and nothing else, every mesh was in its own VB, etc. Maybe that was the reason and there's a much smaller instancing overhead otherwise?

 

This would depend on how you update the per-instance buffer.

 

If you have a small buffer - with space for only one instance - and you do a separate buffer update for each instance, then OpenGL is going to perform horribly (D3D won't).  If you have a large buffer with space for all your instances, but you update them all together, then it should run well.

 

The overhead isn't instancing, it's OpenGL's buffer objects API.
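As a rough illustration of the two update patterns, here is a sketch that only counts update calls. FakeBuffer is a made-up stand-in for a GPU buffer object, not a real API, and nothing here measures actual driver cost:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct InstanceData { float world[16]; };  // hypothetical per-instance payload

// Stand-in for a GPU buffer. upload() plays the role of one buffer-update
// API call and counts how often it is invoked.
class FakeBuffer {
public:
    explicit FakeBuffer(std::size_t capacityBytes) : storage_(capacityBytes) {}

    void upload(std::size_t offset, const void* src, std::size_t bytes) {
        assert(offset + bytes <= storage_.size());
        std::memcpy(storage_.data() + offset, src, bytes);
        ++uploadCalls_;
    }

    std::size_t uploadCalls() const { return uploadCalls_; }

private:
    std::vector<unsigned char> storage_;
    std::size_t uploadCalls_ = 0;
};

// Pattern to avoid: one update call per instance.
void updatePerInstance(FakeBuffer& buf, const std::vector<InstanceData>& instances) {
    for (std::size_t i = 0; i < instances.size(); ++i)
        buf.upload(i * sizeof(InstanceData), &instances[i], sizeof(InstanceData));
}

// Preferred pattern: all instances go up in a single update call.
void updateAllAtOnce(FakeBuffer& buf, const std::vector<InstanceData>& instances) {
    if (instances.empty()) return;
    buf.upload(0, instances.data(), instances.size() * sizeof(InstanceData));
}
```

For 700 instances the first function issues 700 update calls and the second issues one; how much that matters in practice depends on the driver.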


And you have no noticeable problems with that?

Nope, we are not.

every mesh was in its own VB

There's your problem. Every time you had to switch to the next mesh, you had to respecify the VAO state.
You could be hitting the slow path by doing that per mesh + using instancing. The driver may have been able to detect the VAO only switched buffers with the non-instanced calls; but decided to respecify the whole vertex data when using instancing.
You should keep all your meshes in the same Buffer Object, or have very few Buffer Objects at least.

Also, you obviously compared an instanced version without indexing into one single buffer vs normal draw calls.
You should compare instanced version + indexing into one single buffer vs normal draw calls.
If there is higher overhead from using instancing, it is more than negated by using indexes into a single buffer.
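One way to picture "all meshes in the same Buffer Object" is a CPU-side allocator that appends every mesh to shared vertex and index arrays and hands back the offsets a draw call would need. This is only a sketch; SharedGeometry, MeshRange and the Vertex layout are hypothetical names, and the upload to the GPU is omitted:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Vertex { float position[3]; };  // assumed minimal vertex layout

// Where one mesh lives inside the shared arrays.
struct MeshRange {
    std::uint32_t firstIndex;  // offset of the mesh's first index
    std::uint32_t indexCount;  // number of indices to draw
    std::int32_t  baseVertex;  // value added to every index when drawing
};

class SharedGeometry {
public:
    // Appends a mesh and returns the offsets to pass to a base-vertex draw.
    MeshRange addMesh(const std::vector<Vertex>& verts,
                      const std::vector<std::uint32_t>& idx) {
        assert(!verts.empty() && !idx.empty());
        MeshRange range;
        range.firstIndex = static_cast<std::uint32_t>(indices_.size());
        range.indexCount = static_cast<std::uint32_t>(idx.size());
        range.baseVertex = static_cast<std::int32_t>(vertices_.size());
        vertices_.insert(vertices_.end(), verts.begin(), verts.end());
        indices_.insert(indices_.end(), idx.begin(), idx.end());
        return range;
    }

    const std::vector<Vertex>& vertices() const { return vertices_; }
    const std::vector<std::uint32_t>& indices() const { return indices_; }

private:
    std::vector<Vertex> vertices_;
    std::vector<std::uint32_t> indices_;
};
```

Because each mesh keeps its own zero-based indices and relies on baseVertex, switching meshes only changes draw parameters, not the bound buffers.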


Doing this for just one instance is completely valid. If you do it the way you said, although valid, your API overhead will go through the roof, especially if you have a lot of different meshes.
Then with the instanced method, how would you handle drawing different meshes?

 

ie, as I see it you'd have two ways of doing it:

  1. Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  2. Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).

So you always end up with one draw call per each different mesh. Thing that differs is UBO updating scheme (no scheme in first one, batching scheme in the second one).
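The bookkeeping for the second scheme could look roughly like the sketch below: split the frame's draws into batches that each fit one UBO fill, and give every draw a base instance equal to its slot in that batch. The names are hypothetical and no GL calls are made:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// A run of consecutive draws that share one fill of the transform UBO.
struct DrawBatch {
    std::size_t firstDraw;  // index of the first draw in the frame's draw list
    std::size_t drawCount;  // number of draws covered by this UBO fill
};

// Splits totalDraws draws into batches of at most transformsPerUbo draws.
std::vector<DrawBatch> splitIntoBatches(std::size_t totalDraws,
                                        std::size_t transformsPerUbo) {
    assert(transformsPerUbo > 0);
    std::vector<DrawBatch> batches;
    for (std::size_t first = 0; first < totalDraws; first += transformsPerUbo) {
        batches.push_back({first, std::min(transformsPerUbo, totalDraws - first)});
    }
    return batches;
}

// Base instance for a draw: its slot inside the batch, which is also the
// index of its transform in the UBO.
std::uint32_t baseInstanceFor(const DrawBatch& batch, std::size_t drawIndex) {
    assert(drawIndex >= batch.firstDraw);
    assert(drawIndex < batch.firstDraw + batch.drawCount);
    return static_cast<std::uint32_t>(drawIndex - batch.firstDraw);
}
```

With 10000 draws and 4096 transforms per UBO, this gives three UBO updates instead of 10000, while the number of draw calls stays the same.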


I'm starting to be worried by rumors that Google may have its own low-level API too. This would basically mean one API per OS, which would defeat the purpose of Vulkan in the first place...


 

Then with the instanced method, how would you handle drawing different meshes?

 

ie, as I see it you'd have two ways of doing it:

  1. Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  2. Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).

So you always end up with one draw call per each different mesh. Thing that differs is UBO updating scheme (no scheme in first one, batching scheme in the second one).

 

 

glMultiDraw*Indirect basically iterates a glDraw*Instanced call over all the elements of the bound indirect draw command buffer.
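To make that concrete, here is a sketch of filling the command array on the CPU, one command per mesh with a single instance and the draw's index as base instance. The struct's field order follows the DrawElementsIndirectCommand layout given in the OpenGL specification; MeshSlice and buildCommands are hypothetical names, and uploading the array and issuing the multi-draw call are not shown:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One indexed indirect draw, five tightly packed 32-bit fields.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices
    std::uint32_t instanceCount;  // 1 for a non-instanced mesh
    std::uint32_t firstIndex;     // offset into the shared index buffer
    std::int32_t  baseVertex;     // added to each index
    std::uint32_t baseInstance;   // used here as the per-draw ID
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "commands must be tightly packed");

// Location of a mesh inside shared vertex/index buffers (assumed layout).
struct MeshSlice {
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
    std::int32_t  baseVertex;
};

// Builds one command per mesh; draw i gets baseInstance i so the shader can
// look up its own constants.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<MeshSlice>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(meshes.size());
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        const MeshSlice& m = meshes[i];
        cmds.push_back({m.indexCount, 1u, m.firstIndex, m.baseVertex,
                        static_cast<std::uint32_t>(i)});
    }
    return cmds;
}
```

Different meshes then go through a single API call, with the per-mesh differences living entirely in the command array.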


ie, as I see it you'd have two ways of doing it:

  • Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetch transform in index 0. Repeat for every single mesh.
  • Update transform UBO with all the transforms that can fit, then issue glDraw*Instanced call, repeat this draw call increasing the base instance ID by one for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned since instance ID is always 0).
So you always end up with one draw call per each different mesh. Thing that differs is UBO updating scheme (no scheme in first one, batching scheme in the second one).
The CPU cost of a draw call depends on the state changes that preceded it.
Apparently setting the base instance ID state is much cheaper than binding a new UBO, which makes sense, as there's a tonne of resource management code that has to run behind the scenes whenever you bind any resource, especially if it's an orphaned resource.

Also, yes, updating one large UBO is going to be much cheaper than updating thousands of small ones. Especially if you use persistent unsynchronized updates.

On the GPU side, draw calls are free. What costs is context/segment switches. If two draw-calls use the same "context", the GPU bundles them together, avoiding stalls.
Certain state changes "roll the context"/"begin a segment"/etc, which means the next draw can't overlap with the previous one.
It would be interesting to find out where base-instance-id state and UBO bindings stand in regards to context rolls on different GPUs...
