Seabolt

What are your opinions on DX12/Vulkan/Mantle?


I was thinking that if you batched 4,096 draw calls, then before the batch even got submitted to the GPU you'd have to copy your 64 KiB of data into GPU memory. Until that copy is complete, the GPU may not be doing anything at all. However, I am quite possibly vastly underestimating the speed of such a copy (which is probably on the order of microseconds).

If you do it properly, copying data is completely asynchronous from the GPU's point of view - it only involves CPU work; the GPU is unaware of it.
Also, the GPU is almost always one or more frames behind the CPU - i.e. the OS/driver are buffering all your draw calls for a frame or more.
Pausing draw submission to copy some data, run gameplay code, etc. will not starve the GPU, because there's this massive frame-long buffer of work. As long as you submit a frame's worth of work once per frame, the GPU will never starve.
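A minimal sketch of that "CPU-only copy" pattern, assuming a persistently mapped, CPU-visible buffer with one region per in-flight frame; every name here (FrameRing, mappedBase, and so on) is hypothetical, this is just the shape of the technique:

```cpp
// Sketch: per-frame upload ring for draw-call constants (hypothetical names).
// The buffer is assumed to be persistently mapped in CPU-visible memory, so
// "uploading" is just a memcpy; the GPU reads the data when it gets to that
// frame's commands, typically a frame or more later.
#include <cstddef>
#include <cstdint>
#include <cstring>

struct FrameRing {
    uint8_t*    mappedBase     = nullptr; // persistently mapped pointer (from whatever API you use)
    std::size_t bytesPerFrame  = 0;       // e.g. 64 KiB of per-draw constants
    int         framesInFlight = 2;       // double/triple buffering
    int         frameIndex     = 0;

    // Copy this frame's constants into the region the GPU is *not* reading yet.
    std::size_t write(const void* data, std::size_t size) {
        std::size_t offset = std::size_t(frameIndex) * bytesPerFrame;
        std::memcpy(mappedBase + offset, data, size); // pure CPU work, no GPU sync
        return offset;                                // reference this offset in your draws
    }

    // Call once per frame, after submitting the frame's command buffers.
    void nextFrame() { frameIndex = (frameIndex + 1) % framesInFlight; }
};
```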

 

 

I think I'm too used to working on CPU-bound applications to ever actually experience this :)

Well, in a CPU-bound situation the GPU will starve every frame until you manage to get your CPU frame times below your GPU frame times.

As for the copy, say we're lucky enough to have a 20 Gbps bus - that's ~2.33 GiB/s, or ~2.38 MiB per millisecond, or ~2.44 KiB per microsecond!
So, 64 KiB could be transferred in ~26 microseconds.
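For what it's worth, a quick back-of-the-envelope check of those numbers (the 20 Gbps figure is just the assumption from above):

```cpp
// Back-of-the-envelope transfer time for 64 KiB over an assumed 20 Gbps bus.
#include <cstdio>

int main() {
    const double busBitsPerSec = 20e9;                // 20 Gbps (assumed bus speed)
    const double bytesPerSec   = busBitsPerSec / 8.0; // 2.5e9 B/s ~= 2.33 GiB/s
    const double bytesPerMicro = bytesPerSec / 1e6;   // ~2500 B/us ~= 2.44 KiB/us
    const double payloadBytes  = 64.0 * 1024.0;       // 64 KiB of per-draw data
    std::printf("~%.1f microseconds\n", payloadBytes / bytesPerMicro); // ~26.2 us
    return 0;
}
```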

On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!


Well, in a CPU-bound situation the GPU will starve every frame until you manage to get your CPU frame times below your GPU frame times.

As for the copy, say we're lucky enough to have a 20 Gbps bus - that's ~2.33 GiB/s, or ~2.38 MiB per millisecond, or ~2.44 KiB per microsecond!
So, 64 KiB could be transferred in ~26 microseconds.

On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!

 

Yup, on certain projects I've certainly seen map/unmap operations build up.

 

This is a different (and personal) codebase from what I usually work on (which are clients'), so I'm trying to "do things right" - I suspect I'm a bit 'polluted' by other people's codebases that didn't necessarily work well. Forgive my questions if they seem ignorant - I haven't worked on an actual modern, well-performing codebase :(.


I suspect I'm a bit 'polluted' by other people's codebases that didn't necessarily work well. Forgive my questions if they seem ignorant - I haven't worked on an actual modern, well-performing codebase

I'm pretty sure it's very hard to saturate a PCIe x16 bus unless you're doing some very serious graphics (i.e., think Crysis 14 or something) or plain stupid things (i.e., re-uploading all textures every frame or something). If there is a stall from the application's POV, it will probably be caused by a driver synchronization point (in which case you'd have to rework how you are doing things) and/or just pure API overhead (in which case you'd need to minimize API calls).

 

Thanks Mathias for answering my questions!

 

EDIT: For fuck's sake, this editor and its quote blocks. It's broken. BROKEN, YOU HEAR ME!? BROKEN! NOT FROZEN! BROKEN!


On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!

That would only mean ~53 map/unmaps per frame at 60 fps. Can't be right.


 

Relevant to the discussion of instancing on the previous couple of pages: I see that all draw commands in Mantle are instanced, so the "1 is a valid instance count" approach has merit and will likely be the way we're doing things in the future.
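For reference, Vulkan's draw commands work the same way: every draw takes an instance count, and a "plain" draw simply passes 1. A minimal sketch, assuming cmdBuf is already recording with a pipeline and vertex/index buffers bound:

```cpp
// Sketch: in Vulkan every draw is "instanced"; a regular draw just uses
// instanceCount = 1. Assumes cmdBuf is recording with pipeline/buffers bound.
#include <vulkan/vulkan.h>

void drawMesh(VkCommandBuffer cmdBuf, uint32_t indexCount) {
    // indexCount indices, 1 instance, firstIndex 0, vertexOffset 0, firstInstance 0
    vkCmdDrawIndexed(cmdBuf, indexCount, 1, 0, 0, 0);
}

void drawMeshInstanced(VkCommandBuffer cmdBuf, uint32_t indexCount, uint32_t instances) {
    // Same entry point; only the instance count changes.
    vkCmdDrawIndexed(cmdBuf, indexCount, instances, 0, 0, 0);
}
```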
 


Making games for PC, or having a cross-platform implementation that includes PC, is becoming significantly more attractive because of this jump in performance from issuing draw calls efficiently across multiple CPU cores. On a single-core CPU the improvement will be much smaller, so the biggest advantage is on multicore machines.
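To make the multicore point concrete, here is a rough sketch (Vulkan-style C++) of recording draw calls from several CPU cores at once; recordDraws is a hypothetical stand-in for your per-thread draw recording, and all creation/cleanup and error handling are omitted:

```cpp
// Sketch: recording draw calls from multiple CPU cores (Vulkan-style).
// Each thread owns a VkCommandPool and records into its own command buffer;
// the main thread gathers them and submits once per frame.
#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

void recordDraws(VkCommandBuffer cmd, size_t firstDraw, size_t drawCount);  // hypothetical

void recordFrameInParallel(const std::vector<VkCommandBuffer>& perThreadCmd,
                           size_t totalDraws)
{
    const size_t threadCount    = perThreadCmd.size();
    const size_t drawsPerThread = totalDraws / threadCount; // remainder handling omitted

    std::vector<std::thread> workers;
    for (size_t t = 0; t < threadCount; ++t) {
        workers.emplace_back([&, t] {
            VkCommandBufferBeginInfo begin{};
            begin.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
            begin.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;

            vkBeginCommandBuffer(perThreadCmd[t], &begin);
            recordDraws(perThreadCmd[t], t * drawsPerThread, drawsPerThread);
            vkEndCommandBuffer(perThreadCmd[t]);
        });
    }
    for (auto& w : workers) w.join();
    // The main thread then submits perThreadCmd to the queue in one vkQueueSubmit.
}
```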

 

My prediction is that more pressure will be put on computer manufacturers to increase the number of CPU cores, and in some cases to ship two or more CPUs, in the years ahead.

Tablet and smartphone manufacturers might also be indirectly pushed to provide multicore CPUs. Trickle-down performance demand, perhaps?


Has anyone had any luck getting NDA access to the Mantle headers?  Think they'll grant it to a student looking to prepare for Vulkan or will they laugh at me?  The programming guide is fantastic but implementation is worth a thousand words.


You'll have better luck with DX12, although it requires you to install Windows 10 + VS 2015.

I tried but got no response; they're asking for a company name, and I think they won't provide the SDK to individuals or teams that can't work under NDA (read: open-source projects).


On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!

That would only mean ~53 map/unmaps per frame at 60 fps. Can't be right.

Yeah, I was pulling numbers out of thin air, but map/unmap does have a lot of driver overhead, which is probably often similar in cost to the actual memcpy operation.

Has anyone had any luck getting NDA access to the Mantle headers?  Think they'll grant it to a student looking to prepare for Vulkan or will they laugh at me?  The programming guide is fantastic but implementation is worth a thousand words.

I was part of the beta group with full access. I got the impression that the beta was mainly for big companies, so I was very lucky to get access.
I don't know what the future of that program is now, though. I don't expect they'll be giving anyone else access to that version of Mantle; they'll want you to use Vulkan instead.



I was part of the beta group with full access. I got the impression that the beta was mainly for big companies, so I was very lucky to get access.
Preeetty sure it was just a very elaborate "You're fat" joke.



I was part of the beta group with full access. I got the impression that the beta was mainly for big companies, so I was very lucky to get access.
I don't know what the future of that program is now, though. I don't expect they'll be giving anyone else access to that version of Mantle; they'll want you to use Vulkan instead.

 

We're quite a small studio and we got access to the beta group as well; I don't think there was a prerequisite of being a "big" company.

But yeah, it seems like they only want to open up their Mantle tech (or at least a more evolved version of what's available now) to customers with very specific needs. As far as games are concerned, Mantle was just a catalyst for the development of Vulkan. Let's hope the transition from Mantle to Vulkan will be fairly painless; I'm not really in the mood to rip out my Mantle renderer in its entirety.


My (limited) experience with DX12 makes me a little worried about designing code that can run on DX12 and, let's say, DX11 or GL 4.

It should be quite straightforward to port an existing engine built around global state, but it looks much more difficult to build an abstraction that supports the new async facilities of DX12.

For instance, in my app I have a "VAOManager" object in OpenGL, which basically wraps a giant index buffer plus several vertex buffers containing all my (static) meshes.

These buffers can be resized when a new mesh is added to the scene: in that case new buffers are allocated and the contents of the old buffers are copied over. GL ensures that all previous and future draw commands use the correct location.

With DX12 I can block until the command queue has finished, but that introduces a stall. Alternatively I can create a new copy of the vertex buffer using the copy (DMA) queue and delete the old one once the graphics command queue has finished with it. I'm not sure I can modify a command queue on the fly, but if that's possible I could also amend some command lists to use the new vertex buffer.

While the first solution may be possible with a light abstraction, the second and third behaviours are a lot more DX12-specific, and I fear that trying to come up with something that also pleases DX11 or GL4 may introduce some overhead for the older APIs.
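A minimal sketch of the fence-based "retire the old buffer later" part of that second option, in D3D12 terms; ResourceGraveyard and the surrounding flow are made up for illustration, and error handling is omitted:

```cpp
// Sketch: defer destruction of a replaced vertex buffer until the graphics
// queue has finished all work that might still reference it (D3D12-style).
#include <d3d12.h>
#include <wrl/client.h>
#include <utility>
#include <vector>

using Microsoft::WRL::ComPtr;

struct ResourceGraveyard {
    struct Entry { ComPtr<ID3D12Resource> resource; UINT64 fenceValue; };
    std::vector<Entry> entries;

    // Call when newly recorded commands stop referencing the old buffer.
    // fenceValue is the value you will Signal() on the graphics queue at end of frame.
    void retire(ComPtr<ID3D12Resource> oldBuffer, UINT64 fenceValue) {
        entries.push_back({ std::move(oldBuffer), fenceValue });
    }

    // Call once per frame: releases anything the GPU has provably finished with.
    void collect(ID3D12Fence* fence) {
        const UINT64 completed = fence->GetCompletedValue();
        for (size_t i = 0; i < entries.size(); ) {
            if (entries[i].fenceValue <= completed)
                entries.erase(entries.begin() + i); // ComPtr releases the resource here
            else
                ++i;
        }
    }
};

// Per frame, after submitting command lists on the graphics queue:
//   graphicsQueue->Signal(fence, ++currentFenceValue);
//   graveyard.collect(fence);
```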



While the first solution may be possible with a light abstraction, the second and third behaviours are a lot more DX12-specific, and I fear that trying to come up with something that also pleases DX11 or GL4 may introduce some overhead for the older APIs.

 

This may be an inconvenience for now, but in the end it is a very good thing! In a lot of cases DX12 can become a complete replacement for DX11; there's no real need to maintain both renderers if you decide to build a DX12 implementation. The fact that Microsoft departed this drastically from the DX11 API design only shows that they were dedicated to doing things right and didn't feel the need to maintain backwards compatibility.

 

Once DX12 becomes mainstream and developers no longer feel the need to drag a DX11 implementation along in their games/engines, we'll see some very exciting things happen in the PC space. The sooner that time comes, the better!

 

I assume the same will apply to the OpenGL -> Vulkan transition, based on my experience with Mantle.


 

With DX12 I can block until the command queue has finished, but that introduces a stall. Alternatively I can create a new copy of the vertex buffer using the copy (DMA) queue and delete the old one once the graphics command queue has finished with it. I'm not sure I can modify a command queue on the fly, but if that's possible I could also amend some command lists to use the new vertex buffer.

 

The thing is, your second option here is essentially the same as what OpenGL currently does behind the scenes. But with OpenGL the code that does it is in the driver; with the new APIs you have to write that code explicitly yourself.

 

And this is actually a good thing, because now you get to see first-hand what the real effects of OpenGL's higher level of abstraction are, and you get to choose whether or not the behaviour of that abstraction is actually what you want. (Plus, if it blows up or does the wrong thing, it'll be your code rather than driver code, so you'll be able to fix it.)

And this is actually a good thing, because now you get to see first-hand what the real effects of OpenGL's higher level of abstraction are, and you get to choose whether or not the behaviour of that abstraction is actually what you want. (Plus, if it blows up or does the wrong thing, it'll be your code rather than driver code, so you'll be able to fix it.)

 

 

Yes, but what I mean is: in order to fight draw call overhead, I reduced binding calls to a minimum by implementing a giant VAO manager that stores all the (static) geometry of a scene.

With DX12 I can keep the VAO manager concept, but that means reimplementing the allocation scheme the GL driver handles for me, which means dealing with several synchronisation points/fences that don't appear in the equivalent GL code.
Or I can just ditch the VAO manager, since binding a vertex buffer/index buffer costs almost nothing in DX12 (see the sketch below). But then I don't have a 1:1 mapping between the old and new APIs.

 

I also think it's better to have a clean, explicit API, but it's hard to drop support for the older ones; today you get complaints if you target GL 3-only GPUs. I can't imagine a Vulkan-only engine for at least another 5 or 6 years...
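On the "ditch the VAO manager and just bind per mesh" option above, here's a rough D3D12 sketch of what per-draw binding looks like; the MeshViews struct and the surrounding setup are assumptions for illustration, not anything from the thread:

```cpp
// Sketch: per-mesh vertex/index buffer binding in D3D12 (hypothetical MeshViews).
// With command lists recorded up front, re-binding per draw is cheap on the CPU,
// so a single giant shared buffer becomes optional rather than required.
#include <d3d12.h>
#include <vector>

struct MeshViews {
    D3D12_VERTEX_BUFFER_VIEW vbv;  // BufferLocation / SizeInBytes / StrideInBytes
    D3D12_INDEX_BUFFER_VIEW  ibv;  // BufferLocation / SizeInBytes / Format
    UINT indexCount;
};

void recordMeshes(ID3D12GraphicsCommandList* cmd, const std::vector<MeshViews>& meshes) {
    cmd->IASetPrimitiveTopology(D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST);
    for (const MeshViews& m : meshes) {
        cmd->IASetVertexBuffers(0, 1, &m.vbv);  // rebind per mesh
        cmd->IASetIndexBuffer(&m.ibv);
        cmd->DrawIndexedInstanced(m.indexCount, 1, 0, 0, 0);
    }
}
```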


But then I don't have a 1:1 mapping between the old and new APIs.


And this is a good thing; you can't take a system you designed to get around the problems of an old API and map it to a new API where the problems no longer exist and expect it to be optimal.

Sure, you can make Vulkan/DX12 work like DX11/GL3/4-era APIs, but you'll start reproducing driver work and generally make things less than optimal. (I ran into this problem a while back: we designed a new renderer layer based too closely on DX11, so the PS4 version was a nightmare. If we had waited a bit to see the new hardware, chances are it would have leaned more towards the PS4 way of doing things, with DX11 made to emulate that in some way.)

Right now it is like you've worked out a great way to cook burgers, and you can cook burgers really well, but now people want beer and you are trying to work out how to apply your grill to serve them beer without doing any extra work.

This is also a great argument against abstraction layers that hug a particular API too closely, as they don't react well to massive API changes.

Honestly, if you can, I'd take some time to strip your abstraction layers and rebuild so that the Vulkan/DX12 way of doing things is your primary abstraction and make OpenGL work with that instead. If nothing else the old way of doing things IS going away, even if it takes a little while, so getting on top of it sooner rather than later is probably a good move.

If you can't do that, because you have a game to ship or something, then I'd forget DX12 support for now, finish what you need to finish and then go back and rebuild.

At the end of the day, an abstraction built around the current DX11/OpenGL limitations and functionality will be broken and slow going forward; you are probably better off redesigning with the new model in mind.


This brilliant programmer reverse-engineered the Mantle API and wrote a "Hello Triangle" tutorial.  Definitely worth checking out.

 

https://medium.com/@Overv/implementing-hello-triangle-in-mantle-4302450fbcd2

 

Interesting. It seems like a verbose effort to get a triangle up and running (although, from what I remember, my first triangles in DX11 and GL3+ took just as much work), but conceptually it's a lot easier to grasp than I feared.

 

I haven't been active here for that long. What do the forums look like when a new API is released to the masses? A paradigm shift like DX12 and Mantle especially must be fun.

