Seabolt

Vulkan What are your opinions on DX12/Vulkan/Mantle?


 

Doing this for just one instance is completely valid. If you do it the way you said, it is valid, but your API overhead will go through the roof, especially if you have a lot of different meshes.
Then with the instanced method, how would you handle drawing different meshes?

 

I.e., as I see it, you'd have two ways of doing it:

  1. Update the mesh transform, then issue a glDraw*Instanced call with a single instance, always fetching the transform at index 0. Repeat for every single mesh.
  2. Update the transform UBO with all the transforms that can fit, then issue a glDraw*Instanced call, and repeat this draw call, increasing the base instance ID by one, for every single mesh until you run out of transforms in the UBO (doing the instanced index buffer trick you mentioned, since the instance ID is always 0).

So you always end up with one draw call per different mesh. What differs is the UBO updating scheme (no scheme in the first one, a batching scheme in the second one).
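
To put rough numbers on the two schemes, here is a small Python sketch that only counts calls and makes no GL calls at all. The function names, the one-mat4-per-mesh layout and the 64 KB default UBO size are assumptions made for illustration; the real limit depends on the driver.

```python
import math

MAT4_BYTES = 16 * 4  # one 4x4 float transform per mesh (assumed layout)

def calls_per_frame_single(mesh_count):
    """Scheme 1: update the transform, then draw; repeat for every mesh."""
    return {"ubo_updates": mesh_count, "draw_calls": mesh_count}

def calls_per_frame_batched(mesh_count, ubo_bytes=64 * 1024):
    """Scheme 2: fill the UBO with as many transforms as fit, then draw each mesh."""
    transforms_per_ubo = ubo_bytes // MAT4_BYTES
    return {
        "ubo_updates": math.ceil(mesh_count / transforms_per_ubo),
        "draw_calls": mesh_count,
    }

print(calls_per_frame_single(5000))
print(calls_per_frame_batched(5000))
```

The draw call count is identical in both cases; only the number of buffer updates changes, which is the point being made above.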

 

 

glDrawElementsInstancedBaseInstance - one big per-instance buffer containing all of your instances.  It doesn't have to be a UBO; if the per-instance data is small enough (which it will be if it's just a transform) it can be a VBO and specified in your VAO using glVertexAttribDivisor.  Use the baseinstance parameter of your draw call to specify which instance you're currently drawing.  That will index into your per-instance buffer effectively for free.  gl_InstanceID remains 0-based but in this case it doesn't matter because you're not using it.  You're not touching uniforms, you're not updating state between draws, the only thing that changes is the parameters to your draw calls.
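
The indexing described here can be modelled in a few lines of plain Python (no GL calls). This is only a sketch of the fetch rule for an attribute with a non-zero divisor; the function names and the `transforms` list are made up for illustration.

```python
def instanced_attribute_index(instance_id, base_instance, divisor):
    """Element of the per-instance buffer that the vertex shader receives.

    Models the GL rule for attributes with a non-zero divisor:
    index = base_instance + instance_id // divisor
    """
    if divisor <= 0:
        raise ValueError("divisor 0 means a per-vertex attribute, not per-instance")
    return base_instance + instance_id // divisor

# Hypothetical per-instance buffer: one entry per object in the scene.
transforms = ["tree", "rock", "house", "car"]

def draw_single_instance(base_instance):
    # instanceCount is 1, so gl_InstanceID (instance_id here) is always 0,
    # yet the attribute fetch is still offset by base_instance.
    return transforms[instanced_attribute_index(0, base_instance, divisor=1)]

print([draw_single_instance(i) for i in range(len(transforms))])
```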


I'm not arguing about the impact of UBO updating/binding; we've pretty much established that we are trying to batch UBO updates. Nor am I talking about OpenGL 4.x features, as I mentioned in a previous post (i.e., no indirect draw calls). Strictly GL 3 here.

 

What I am asking is how different doing a glDraw*Instanced call and changing the instance ID offset is from doing a normal glDraw* call while calling glUniform1i for the indices.

 

Mathias said the draw + glUniform1i combination would be detrimental if you have many different meshes (we're still limited to one draw call per different mesh), but I'm trying to figure out whether it would be that much worse than doing a glDraw*Instanced call and changing the base instance ID, fetching the real index from an instanced attribute buffer, since you still have one draw call per different mesh.

 

State isn't changing beyond the glUniform1i call, which requires no binding. If it matters, the same could be accomplished with glVertexAttrib*i and passing the index that way before each draw call (which is what nVidia guys used for their scene graph presentations in 2013 and 2014).

 

In short, for different meshes, we still have one draw call per mesh, we still can batch UBO updates, but there are several ways to get the needed index to the shader (glUniform*i, glVertexAttrib*i, or glDraw*Instanced with indices in instanced attribute).

 

EDIT: Reading mhagain's post:

 

It doesn't have to be a UBO
  Of course, but Mathias said it was a constant buffer, so I'm assuming he is using UBOs.

 

Then again, what you're saying here is basically: don't use UBOs, put everything in instanced attributes? It sounds good, but see this presentation for example; there is a small section on UBO updating and indexing inside the shader. Maybe nVidia hardware doesn't work that well with instanced drawing.

 

EDIT2: Fucking quote blocks and fucking editor for fucks sake I fucking hate it.

Edited by TheChubu



This is the most compatible way of doing it which works with both D3D11 and OpenGL. There is a GL4 extension that exposes the keywords gl_DrawIDARB & gl_BaseInstanceARB which allows me to do the same without having to use an instanced vertex buffer (thus gaining some performance bits in memory fetching; though I don't know if it's noticeable since the vertex buffer is really small and doesn't consume much bandwidth; also the 4096 draws per call limit can be lifted thanks to this extension).

 

 

Not sure if it helps performance-wise; the AZDO slides say that gl_DrawID often cripples performance:

http://fr.slideshare.net/CassEveritt/approaching-zero-driver-overhead (slide 33)

 

There is no word on gl_BaseInstance though.


I'm starting to be worried by rumors that Google may have its own low-level API too. This would basically mean one API per OS, which breaks the purpose of Vulkan in the first place...

Wouldn't that actually justify the purpose of Vulkan?  One cross-platform API that will work regardless of the OS, more or less the same purpose as with OpenGL.  This would be in contrast to it simply being intended to fill the platform gap, providing a high performance 3D graphics API on platforms that don't have a native one.  I'd be highly disappointed if the latter purpose were all that the Vulkan designers ever hoped to achieve.


 

How are you implementing your constant buffers? From what you've written as your #3b, it sounds like you're packing multiple materials'/objects' constants into a single large constant buffer, and perhaps indexing out of it in your draws?

Yes
 

IIRC, that's supported only in D3D11.1+, as there is no *SSetConstantBuffers function that takes offsets until then.

That's one way of doing it, and if you do it that way, you're correct. We don't use D3D11.1 functionality, though; since OpenGL does support binding constant buffers at offsets, we take advantage of that to further reduce how often a batch of draw calls gets split.
 

Otherwise, if you aren't using constant buffers with offsets, how are you avoiding having to set things like object transforms and the like? If you are, how are you handling targets below D3D11.1?

By treating all your draws as instanced draws (even if they're just one instance) and using StartInstanceLocation.
I attach a "drawId" R32_UINT vertex buffer (an instanced buffer) which is filled with 0, 1, 2, 3, 4 ... 4095. Basically, we can't batch more than 4096 draws together in the same call; that limit is not arbitrary: 4096 vectors * 16 bytes per vector (4 floats) = 64 KB, a.k.a. the const buffer limit.
Hence the "drawId" vertex attribute will always contain the value I want as long as it is in the range [0; 4096), and thus I can index whatever I want correctly.
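
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the "drawId" buffer contents and the 4096 limit. It only builds the bytes on the CPU; the function names are hypothetical and nothing here touches D3D11 or OpenGL.

```python
import struct

CONST_BUFFER_BYTES = 64 * 1024  # the 64 KB const buffer limit
BYTES_PER_VECTOR = 4 * 4        # 4 floats, 4 bytes each

MAX_DRAWS = CONST_BUFFER_BYTES // BYTES_PER_VECTOR

def build_draw_id_buffer(count=MAX_DRAWS):
    """Raw bytes for an R32_UINT instanced vertex buffer holding 0, 1, ..., count - 1."""
    return struct.pack(f"<{count}I", *range(count))

def draw_id_seen_by_shader(buffer, start_instance):
    """Value a single-instance draw with this start instance reads as drawId."""
    (value,) = struct.unpack_from("<I", buffer, start_instance * 4)
    return value

buf = build_draw_id_buffer()
print(MAX_DRAWS, len(buf), draw_id_seen_by_shader(buf, 1234))
```

Because element i of the buffer simply contains i, the start instance passed to the draw comes back out in the shader as the drawId.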

This is the most compatible way of doing it which works with both D3D11 and OpenGL. There is a GL4 extension that exposes the keywords gl_DrawIDARB & gl_BaseInstanceARB which allows me to do the same without having to use an instanced vertex buffer (thus gaining some performance bits in memory fetching; though I don't know if it's noticeable since the vertex buffer is really small and doesn't consume much bandwidth; also the 4096 draws per call limit can be lifted thanks to this extension).

 

So, let me reword it to make sure I understand what you're doing. You have a single per-instance "vertex buffer" that just has a sequence of integers from 0 to 4095. Your draws specify which instance offset they are, and because of that when you get this per-instance ID, it matches which overall ID it is of your draws. You then use that ID to access the constants from a 64K constant buffer for that draw.

Right?

Do you have any performance issues with it? I'd be wary of requiring a 64K copy to GPU memory before draws. Is it faster if you use smaller batches?

Also importantly, how scalable is this to next-gen draw bundles? Would you need to use indirect parameters to inject the right instance offset for your draws in the bundles?

I suppose the main benefit here is that you have reduced the number of actual API calls dramatically to actually perform draws, though with the next-gen APIs, isn't the benefit of that going to be somewhat mitigated by the fact that the actual draws themselves can be completely encapsulated in prebuilt bundles?

Edit: I suppose the core question here is: are the multidraws (indirect or not) actually pushed as such onto the GPU's command buffer (that is, is there a 'multidraw' command) or is the driver extracting the draws and inserting them individually (or batched, whatever) onto the command buffer? If the former, then it would certainly be faster. If the latter, I imagine a ton of draw bundles would be faster.

Edited by Ameise



Edit: I suppose the core question here is: are the multidraws (indirect or not) actually pushed as such onto the GPU's command buffer (that is, is there a 'multidraw' command) or is the driver extracting the draws and inserting them individually (or batched, whatever) onto the command buffer? If the former, then it would certainly be faster. If the latter, I imagine a ton of draw bundles would be faster.

AFAIK, it's the former: the application pushes the draws into the command buffer.


 

On Radeon (beginning with the HD 6950) and GeForce, multi draw indirect is a hardware feature. On Intel (up to Haswell at least; I don't know about future chips) it is emulated by a loop in the driver.
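
As a rough picture of what "emulated by a loop" means, here is a Python sketch that packs indirect draw records and walks them on the CPU. The record layout follows the DrawElementsIndirectCommand structure from the GL spec; the function names and the `issue_draw` callback are hypothetical, and no real driver code is implied.

```python
import struct

# DrawElementsIndirectCommand layout:
# count, instanceCount, firstIndex, baseVertex (signed), baseInstance
COMMAND = struct.Struct("<IIIiI")

def pack_commands(commands):
    """Pack a list of 5-tuples into a tightly packed indirect buffer."""
    return b"".join(COMMAND.pack(*c) for c in commands)

def emulate_multi_draw(buffer, draw_count, stride=0, issue_draw=print):
    """Walk the indirect buffer on the CPU and issue one draw per record."""
    stride = stride or COMMAND.size  # stride 0 means tightly packed
    for i in range(draw_count):
        issue_draw(COMMAND.unpack_from(buffer, i * stride))

cmds = [(36, 1, 0, 0, 0), (6, 1, 36, 24, 1)]
emulate_multi_draw(pack_commands(cmds), len(cmds))
```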


 

I'm starting to be worried by rumors that Google may have its own low-level API too. This would basically mean one API per OS, which breaks the purpose of Vulkan in the first place...

Wouldn't that actually justify the purpose of Vulkan?  One cross-platform API that will work regardless of the OS, more or less the same purpose as with OpenGL.  This would be in contrast to it simply being intended to fill the platform gap, providing a high performance 3D graphics API on platforms that don't have a native one.  I'd be highly disappointed if the latter purpose were all that the Vulkan designers ever hoped to achieve.

 

The problem is that Google is the one that controls Android, and it is not an open environment for the user like Windows is (at least for most users), in the sense that you can't install drivers like you do on Windows; the drivers come with the device and its updates (which, for the worse, are in most cases under the control of the carriers).

 

Sure, if you install CyanogenMod or some other custom version you can do whatever you want, but normal users are stuck with what comes with the device, which means that if Google decides not to implement Vulkan, you can't use Vulkan on Android and you are forced to use their API (so it is worse than MS with DX vs OpenGL, since on Windows at least you can always install a driver that implements the latest OpenGL).

 

The thing is, these days Google is the new MS and Android the new Windows; they have the biggest share of the market and they are in a position where they can do whatever they want.

This is the worst-case scenario; I don't think Google will do this, but it is a possibility that worries me.
 

But to be honest, I don't care if I have to implement one or two more APIs, since we already support a bunch (DX11, OpenGL 3.x, 4.x, ES 2.x, ES 3.x, and we have an early implementation for DX12 and we want to add support for PS4 as well). I prefer having to support multiple strong and solid APIs over a bad one (OpenGL, I'm looking at you).
 
Vulkan seems great but I still have reservations; the good thing is that it is just like D3D12, so porting should be very easy (the only thing that I need now is an HLSL-to-SPIR-V compiler).
Edited by Killeak


On Radeon (beginning with the HD 6950) and GeForce, multi draw indirect is a hardware feature. On Intel (up to Haswell at least; I don't know about future chips) it is emulated by a loop in the driver.

 

One would hope that Intel at least validates once only rather than for each draw call in the loop.


**Sigh**

If anyone has lots of questions, you can just compile and try Ogre 2.1, then dissect its source code to see how we're handling it. It's open source, after all.
Doing what I'm saying is not impossible, otherwise we wouldn't be doing it.

To answer TheChubu's question, glDrawElementsInstancedBaseVertexBaseInstance has THREE key parameters:

  • baseInstance: With this I can send an arbitrary index as I explained, which I can use to index whatever I want from a constant (UBO) or texture (TBO) buffer. I can even perform multiple indirections (retrieve an index from an UBO using the index from baseInstance)
  • baseVertex: With this I can store as many meshes as I want in the same buffer object; and select them individually by providing the offset location to the start of the mesh I want to render. With this, I don't need to alter state at all (unless vertex format changes). The meshes don't even need to be contiguous in memory; they just need to be in the same Buffer Object and aligned to the vertex size.
  • indices: With this I can store as many meshes' index data as I want in the same buffer object, and select them individually by providing the offset location to the start of the index data. Remember to keep alignment to 4 bytes. Bonus points: You can keep the vertex and index data in the same buffer object.
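
A minimal Python sketch of the bookkeeping behind baseVertex and indices, assuming all meshes in the pool share one vertex format. The `MeshPool` class and its fields are hypothetical; it only computes the draw parameters and never touches a real buffer object.

```python
class MeshPool:
    """Hypothetical pool: all meshes share one vertex buffer and one index buffer."""

    def __init__(self, index_size=2):
        self.index_size = index_size  # 2 for 16-bit indices, 4 for 32-bit
        self.vertex_count = 0         # vertices stored so far
        self.index_bytes = 0          # bytes of index data stored so far

    def add_mesh(self, num_vertices, num_indices):
        # Keep the start of each mesh's index data 4-byte aligned.
        self.index_bytes = (self.index_bytes + 3) & ~3
        params = {
            "count": num_indices,
            "indices": self.index_bytes,       # byte offset into the index buffer
            "base_vertex": self.vertex_count,  # added to every fetched index
        }
        self.vertex_count += num_vertices
        self.index_bytes += num_indices * self.index_size
        return params

pool = MeshPool()
cube = pool.add_mesh(num_vertices=24, num_indices=36)
tri = pool.add_mesh(num_vertices=3, num_indices=3)
quad = pool.add_mesh(num_vertices=4, num_indices=6)
print(cube, tri, quad)
```

Note that in GL the indices argument is a byte offset, as modelled here, while DX11's StartIndexLocation is counted in indices, so the same pool would divide by the index size on that path.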

 

The DX11 equivalent of this is DrawIndexedInstanced, and the analogous parameters are StartInstanceLocation, BaseVertexLocation & StartIndexLocation respectively.

We treat all of our draws with these functions.

 

The DX11 function works on DX10 hardware just fine. glDrawElementsInstancedBaseVertexBaseInstance was introduced in GL 4.2; however it is available to GL3 hardware via extension. The most notable remark is that OS X doesn't support this extension, at the time of writing.

 

The end result is that we just map the buffer(s) once; write all the data in sequence; bind these buffers and then issue a lot of consecutive glDrawElementsInstancedBaseVertexBaseInstance / DrawIndexedInstanced calls without any other API calls in between.

We only need to perform additional API calls when:

  • We need to bind a different buffer / buffer section (i.e. we've exhausted the 64kb limit)
  • We need to change state (shaders, vertex format, blending modes, rasterizer states; we keep them sorted to reduce this)
  • We're using more than one mesh pool (pool = a buffer where we store all our meshes together), and the next mesh is stored in another pool (we sort by pools though, in order to reduce this switching).
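
The three cases in this list can be sketched as a small planning loop in Python. This is not Ogre's code, just an illustration under simple assumptions: draws are (state, pool, mesh) tuples, sorting by state and then pool is enough, and the command names are invented.

```python
MAX_DRAWS_PER_BUFFER = 4096  # from the 64 KB constant buffer limit

def plan_frame(draws):
    """draws: iterable of (state_id, pool_id, mesh_name) tuples.

    Returns a flat command list; every entry that is not a 'draw' is one of
    the extra API calls from the list above.
    """
    commands = []
    state = pool = None
    slot = MAX_DRAWS_PER_BUFFER  # forces a buffer bind before the first draw
    for draw_state, draw_pool, mesh in sorted(draws):  # sort by state, then pool
        if draw_state != state:
            state = draw_state
            commands.append(("set_state", state))
        if draw_pool != pool:
            pool = draw_pool
            commands.append(("bind_pool", pool))
        if slot == MAX_DRAWS_PER_BUFFER:
            commands.append(("bind_next_const_buffer_section",))
            slot = 0
        commands.append(("draw", mesh, slot))  # slot plays the role of baseInstance
        slot += 1
    return commands

frame = [("opaque", 0, "rock"), ("alpha", 0, "glass"),
         ("opaque", 1, "tree"), ("opaque", 0, "house")]
for command in plan_frame(frame):
    print(command)
```

With everything sorted, consecutive draws that share state, pool and buffer section produce nothing but "draw" entries, which is the back-to-back sequence of draw calls described above.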
