What are your opinions on DX12/Vulkan/Mantle?

By Seabolt, in Vulkan


Ameise    1148

 

I was thinking that if you batched 4,096 draw calls, then before the batch even got submitted to the GPU you'd have to perform a copy to GPU memory for your 64 KiB of data. Until that copy is complete, the GPU may not be doing anything at all. However, I am quite possibly vastly underestimating the speed of such a copy (which is probably on the order of microseconds).

If you do it properly, copying data is completely asynchronous, in that it only involves CPU work - the GPU is unaware.
Also, the GPU is almost always one or more frames behind the CPU - i.e. the OS/driver are buffering all your draw calls for 1+ frames.
Pausing draw submission to copy some data, run gameplay code, etc., will not starve the GPU, because there's this massive, frame-long buffer of work. As long as you submit a frame's worth of work once per frame, the GPU will never starve.
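To make that concrete, here's a minimal sketch of the usual frames-in-flight pattern, written in D3D12 terms purely for illustration (one direct queue, a fence value per buffered frame; all names here are made up). The CPU only blocks when it gets more than a couple of frames ahead, so pausing to copy data or run gameplay code doesn't starve the GPU:

```cpp
#include <d3d12.h>
#include <windows.h>
#include <cstdint>

constexpr unsigned kFramesInFlight = 2;                 // how far the CPU may run ahead
uint64_t g_frameFenceValue[kFramesInFlight] = {};
uint64_t g_nextFenceValue = 1;

// Call after submitting a frame's worth of command lists.
void EndFrame(ID3D12CommandQueue* queue, ID3D12Fence* fence, unsigned frameIndex)
{
    g_frameFenceValue[frameIndex] = g_nextFenceValue;
    queue->Signal(fence, g_nextFenceValue++);           // mark this frame's position in the queue
}

// Call before reusing this frame slot's resources (command allocators, dynamic buffers, ...).
void BeginFrame(ID3D12Fence* fence, HANDLE fenceEvent, unsigned frameIndex)
{
    // Only wait if the GPU hasn't yet consumed the frame we submitted
    // kFramesInFlight frames ago; usually it has, and we don't block at all.
    if (fence->GetCompletedValue() < g_frameFenceValue[frameIndex])
    {
        fence->SetEventOnCompletion(g_frameFenceValue[frameIndex], fenceEvent);
        WaitForSingleObject(fenceEvent, INFINITE);
    }
}
```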

 

 

I think I'm too used to working on CPU-bound applications to ever actually experience this :)

Hodgman    51234
Well, in a CPU-bound situation the GPU will starve every frame until you can manage to get your CPU frame times below your GPU frame times.

As for the copy, say we're lucky enough to have a 20 Gbps bus - that's ~2.33 GiB/s, or ~2.38 MiB per millisecond, or ~2.44 KiB per microsecond!
So, 64 KiB could be transferred in ~26 microseconds.

On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!
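For reference, this is the kind of map/unmap update being talked about, sketched in D3D11 terms (buffer and data names are placeholders, and the buffer is assumed to have been created with D3D11_USAGE_DYNAMIC / D3D11_CPU_ACCESS_WRITE). The memcpy is roughly the ~26 µs copy estimated above; the Map/Unmap pair is where any driver overhead would live:

```cpp
#include <d3d11.h>
#include <cstring>

// Upload 'size' bytes of CPU data into a dynamic constant/vertex buffer.
void UpdateDynamicBuffer(ID3D11DeviceContext* context,
                         ID3D11Buffer* buffer,
                         const void* cpuData,
                         size_t size)
{
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    // DISCARD lets the driver hand back a fresh region instead of stalling,
    // but the call itself still costs driver bookkeeping time.
    if (SUCCEEDED(context->Map(buffer, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        std::memcpy(mapped.pData, cpuData, size);   // the actual memory copy
        context->Unmap(buffer, 0);
    }
}
```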

Ameise    1148

Well, in a CPU-bound situation the GPU will starve every frame until you can manage to get your CPU frame times below your GPU frame times.

As for the copy, say we're lucky enough to have a 20 Gbps bus - that's ~2.33 GiB/s, or ~2.38 MiB per millisecond, or ~2.44 KiB per microsecond!
So, 64 KiB could be transferred in ~26 microseconds.

On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!

 

Yup, on certain projects I've certainly seen map/unmap operations build up.

 

This is a different (and personal) codebase from what I usually work on (which are clients'), so I'm trying to "do things right" - I suspect I'm a bit 'polluted' by other people's codebases that didn't necessarily work well. Forgive my questions if they seem ignorant - I haven't worked on an actual modern, well-performing codebase :(.

TheChubu    9452

I suspect I'm a bit 'polluted' by other people's codebases that didn't necessarily work well. Forgive my questions if they seem ignorant - I haven't worked on an actual modern, well-performing codebase

I'm pretty sure it's very hard to saturate a PCIe x16 bus unless you're doing some very serious graphics (i.e., think Crysis 14 or something) or plain stupid things (i.e., re-uploading all textures every frame or something). If there is a stall from the application's POV, it will probably be caused by a driver synchronization point (in which case you'd have to rework how you are doing things) and/or just pure API overhead (in which case you'd need to minimize API calls).

 

Thanks, Mathias, for answering my questions. :D

 

EDIT: For fuck's sake, this editor and its fucking quote blocks. It's fucking broken. BROKEN, YOU HEAR ME!? BROKEN! NOT FROZEN! BROKEN!


kalle_h    2464

On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!

That would only mean 53 map/unmaps per frame at 60 fps. Can't be right.

mhagain    13430

 

Relevant to the discussion of instancing on the previous couple of pages, I see that all draw commands in Mantle are instanced, so the "1 is a valid instance count" approach has merit and will likely be the way we're doing things in the future.
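Vulkan kept that design as well: every draw call takes an instance count, and a "plain" draw is literally just instanceCount = 1. A trivial sketch (the command buffer and index count are assumed to have been set up elsewhere):

```cpp
#include <vulkan/vulkan.h>

// A non-instanced indexed draw in Vulkan is simply an instanced draw with one instance.
void DrawSingleInstance(VkCommandBuffer cmd, uint32_t indexCount)
{
    vkCmdDrawIndexed(cmd,
                     indexCount, // indices per instance
                     1,          // instanceCount - 1 is a perfectly valid value
                     0,          // firstIndex
                     0,          // vertexOffset
                     0);         // firstInstance
}
```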
 

3Ddreamer    3826

Making games for PC, or having a cross-platform implementation that includes PCs, is becoming significantly more attractive because of this jump in performance from issuing draw calls efficiently across multicore CPUs. On a single-core CPU the improvement will be much smaller, so the biggest advantage is on multicore machines.

 

My prediction is that more pressure will be put on computer manufacturers to increase the number of CPU cores - and, in some cases, to ship two or more CPUs - in the years to come.

Tablet and smartphone manufacturers might also be indirectly influenced to provide multicore CPUs. Trickle-down performance demand, perhaps?


Klutzershy    1681

Has anyone had any luck getting NDA access to the Mantle headers?  Think they'll grant it to a student looking to prepare for Vulkan or will they laugh at me?  The programming guide is fantastic but implementation is worth a thousand words.

vlj    1070

You'll have better luck with DX12, although it requires you to install Windows 10 + VS 2015.

I tried but got no response; they ask for a company name, and I think they won't provide the SDK to individuals or teams that can't work under NDA (read: open-source projects).

Hodgman    51234

On the other hand, if you have to do a GL/D3D map/unmap operation, that's probably 300 microseconds of driver overhead!

That would only mean 53 map/unmaps per frame at 60 fps. Can't be right.

Yeah, I was pulling numbers out of thin air -- but map/unmap calls do have a lot of driver overhead, which is probably often similar in cost to the actual memcpy operation.

Has anyone had any luck getting NDA access to the Mantle headers?  Think they'll grant it to a student looking to prepare for Vulkan or will they laugh at me?  The programming guide is fantastic but implementation is worth a thousand words.

I was part of the beta group with full access. I got the impression that the beta was mainly for big companies, so I was very lucky to get access.
I don't know what the future of that program is now, though. I don't expect they'll be giving anyone else access to that version of Mantle; they'll want you to use Vulkan instead.

TheChubu    9452


I was part of the beta group with full access. I got the impression that the beta was mainly for big companies, so I was very lucky to get access.
Preeetty sure it was just a very elaborate "You're fat" joke.

Radikalizm    4807


I was part of the beta group with full access. I got the impression that the beta was mainly for big companies, so I was very lucky to get access.
I don't know what the future of that program is now, though. I don't expect they'll be giving anyone else access to that version of Mantle; they'll want you to use Vulkan instead.

 

We're quite a small studio and we got access to the beta group as well; I don't think there was a prerequisite of being a "big" company.

But yeah, it seems like they only want to open up their Mantle tech (or at least a more evolved version of what's available now) to customers with very specific needs. As far as games are concerned, Mantle was just a catalyst for the development of Vulkan. Let's hope the transition from Mantle to Vulkan will be fairly painless; I'm not really in the mood to rip out my Mantle renderer in its entirety.

vlj    1070

My (limited) experience with DX12 makes me a little worried about designing code that can run on DX12 and, let's say, DX11 or GL 4.

It should be quite straightforward to port an existing engine that uses global state, but it looks much more difficult to build an abstraction that supports the new async facilities of DX12.

For instance, in my app I have a "VAOManager" object in OpenGL, which basically embeds a giant index buffer plus several vertex buffers containing all my (static) meshes.

These buffers can be resized when a new mesh is added to the scene; in that case new buffers are allocated and the content of the old buffers is copied over. GL ensures that all previous and future draw commands use the correct location.

With DX12 I can block until the command queue has finished, but that will introduce a stall. Alternatively, I can create a new copy of the vertex buffer using the DMA (copy) queue and delete the old one once the graphics command queue has finished with it. I'm not sure I can modify a command list on the fly, but if that's possible, I could also amend some command lists to use the new vertex buffer.

While the first solution may be possible with a light abstraction, the second and third behaviours are a lot more DX12-specific, and I fear that trying to come up with something that also pleases DX11 or GL4 may introduce overhead for the older APIs.
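For the second option, the usual approach is a small deferred-deletion list keyed on fence values: the old buffer stays alive until the graphics queue has signalled past the point where it was last used. A rough D3D12 sketch, assuming a single queue and one fence (all names invented for illustration):

```cpp
#include <d3d12.h>
#include <wrl/client.h>
#include <deque>
#include <cstdint>

using Microsoft::WRL::ComPtr;

// A retired resource, tagged with the fence value that must be reached
// before the GPU can no longer be referencing it.
struct RetiredBuffer
{
    ComPtr<ID3D12Resource> resource;
    uint64_t               fenceValue;
};

struct DeferredDeleter
{
    ComPtr<ID3D12Fence>       fence;          // created once, alongside the queue
    uint64_t                  lastSignaled = 0;
    std::deque<RetiredBuffer> pending;

    // Call right after the commands referencing the old buffer have been submitted
    // and the new, bigger buffer has taken its place.
    void Retire(ComPtr<ID3D12Resource> oldBuffer, ID3D12CommandQueue* queue)
    {
        ++lastSignaled;
        queue->Signal(fence.Get(), lastSignaled);   // fence this point in the queue
        pending.push_back({ std::move(oldBuffer), lastSignaled });
    }

    // Call once per frame: free anything the GPU has provably finished with.
    void Collect()
    {
        const uint64_t completed = fence->GetCompletedValue();
        while (!pending.empty() && pending.front().fenceValue <= completed)
            pending.pop_front();                    // releasing the ComPtr frees the resource
    }
};
```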

Radikalizm    4807


While the first solution may be possible with a light abstraction, the second and third behaviours are a lot more DX12-specific, and I fear that trying to come up with something that also pleases DX11 or GL4 may introduce overhead for the older APIs.

 

This may be an inconvenience for now, but in the end it is a very good thing! In a lot of cases DX12 can become a complete replacement for DX11; there's no real need to maintain both renderers if you decide to build a DX12 implementation. The fact that Microsoft departed this drastically from the DX11 API design only shows that they were dedicated to doing things right and didn't feel the need to maintain backwards compatibility.

 

Once DX12 becomes mainstream and developers no longer feel the need to drag a DX11 implementation along in their games/engines, we'll see some very exciting things happen in the PC space. The sooner that time comes, the better!

 

I assume the same will apply to the OpenGL -> Vulkan transition, based on my experience with Mantle.

mhagain    13430

 

With DX12 I can block until the command queue has finished, but that will introduce a stall. Alternatively, I can create a new copy of the vertex buffer using the DMA (copy) queue and delete the old one once the graphics command queue has finished with it. I'm not sure I can modify a command list on the fly, but if that's possible, I could also amend some command lists to use the new vertex buffer.

 

The thing is, your second option here is essentially the same as what OpenGL currently does behind the scenes. But with OpenGL the code that does it is in the driver; with the new APIs you have to explicitly write that code yourself.

 

And this is actually a good thing, because now you get to see first-hand what the real effects of OpenGL's higher level of abstraction are, and you get to choose whether or not the behaviour of that abstraction is actually what you want. (Plus, if it blows up or does the wrong thing, it'll be your code, not driver code, so you'll be able to fix it.)

vlj    1070
And this is actually a good thing, because now you get to see first-hand what the real effects of OpenGL's higher level of abstraction are, and you get to choose whether or not the behaviour of that abstraction is actually what you want. (Plus, if it blows up or does the wrong thing, it'll be your code, not driver code, so you'll be able to fix it.)

 

 

Yes, but what I mean is: in order to fight draw-call overhead, I reduced binding calls to a minimum by implementing a giant VAO manager that stores all the (static) geometry of a scene.

With DX12 I can keep the VAO manager concept, but that means reimplementing the allocation scheme found in GL drivers, which means handling several synchronisation points/fences that don't appear in the equivalent GL code.
Or I can just ditch the VAO manager, since binding a vertex/index buffer costs almost nothing in DX12. But then I don't have a 1:1 mapping between the old and new APIs.
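(As a quick illustration of how cheap that bind is: in D3D12 it amounts to recording two small view structs into the command list. The addresses, sizes, and function names below are placeholders.)

```cpp
#include <d3d12.h>

// "Binding" a vertex/index buffer in D3D12 is just recording two small view structs
// into the command list - no driver-side object lookup or validation as in GL.
void BindGeometry(ID3D12GraphicsCommandList* cmdList,
                  D3D12_GPU_VIRTUAL_ADDRESS vbAddress, UINT vbSize, UINT vertexStride,
                  D3D12_GPU_VIRTUAL_ADDRESS ibAddress, UINT ibSize)
{
    D3D12_VERTEX_BUFFER_VIEW vbView = { vbAddress, vbSize, vertexStride };
    D3D12_INDEX_BUFFER_VIEW  ibView = { ibAddress, ibSize, DXGI_FORMAT_R32_UINT };
    cmdList->IASetVertexBuffers(0, 1, &vbView);
    cmdList->IASetIndexBuffer(&ibView);
}
```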

 

I also think it's better to have a clean, explicit API, but it's hard to drop support for the older ones; even today you get complaints if you only target GL 3-class GPUs. I can't see a Vulkan-only engine being viable for at least another 5 or 6 years...

_the_phantom_    11250

But then I don't have a 1:1 mapping between the old and new APIs.


And this is a good thing; you can't take a system you designed to get around the problems of an old API and map it to a new API where the problems no longer exist and expect it to be optimal.

Sure, you can make Vulkan/DX12 work like the DX11/GL3/4-era APIs, but you'll start reproducing driver work and generally make things less than optimal. (I ran into this problem a while back: we designed a new renderer layer based too closely on DX11, so the PS4 version was a nightmare. If we had waited a bit to see the new hardware, chances are it would have leaned more towards the PS4 way of doing things, with DX11 made to emulate that in some way.)

Right now it is like you've worked out a great way to cook burgers, and you can cook burgers really well, but now people want beer and you are trying to work out how to apply your grill to serve them beer without doing any extra work.

This is also a great argument against abstraction layers that hug a particular API too closely, as they don't react well to massive API changes.

Honestly, if you can, I'd take some time to strip your abstraction layers and rebuild them so that the Vulkan/DX12 way of doing things is your primary abstraction, and make OpenGL work with that instead. If nothing else, the old way of doing things IS going away, even if it takes a little while, so getting on top of it sooner rather than later is probably a good move.

If you can't do that, because you have a game to ship or something, then I'd forget DX12 support for now, finish what you need to finish and then go back and rebuild.

At the end of the day, an abstraction based around current DX11/OpenGL limitations and functionality will be broken and slow going forward; you are probably better off redesigning with the new method in mind.

agleed    1013

This brilliant programmer reverse-engineered the Mantle API and wrote a "Hello Triangle" tutorial.  Definitely worth checking out.

 

https://medium.com/@Overv/implementing-hello-triangle-in-mantle-4302450fbcd2

 

Interesting. It seems like a verbose effort to get a triangle up and running (although, from what I remember, my first triangles in DX11 and GL3+ felt much the same), but conceptually it's a lot easier to grasp than I feared.

 

I haven't been active here for that long. What do the forums look like when a new API is released to the masses? A paradigm shift like DX12 and Mantle, especially, must be fun.


