What are your opinions on DX12/Vulkan/Mantle?


Recommended Posts

I'm pretty torn on what to think about it. On one hand, being able to write your own implementation of a lot of the resource management and command processing allows for a lot of gains, and for much better management of rendering across a lot of hardware. And all the threading support is going to be fantastic.

But on the other hand, accepting that burden vastly increases the cost of maintaining a platform and the amount of time needed for a full port. I understand that a lot of it can be ported piece by piece, but it seems like the amount of time necessary to even meet my current performance in, say, DX12 is on the order of man-weeks.

I feel like to fully support these APIs I need to almost abandon support for the previous APIs in my engine, since the veil is so much thinner; otherwise I'll just end up adding the same amount of abstraction that DX11 already does, which kind of defeats the point.

What are your opinions?


> I feel like to fully support these APIs I need to almost abandon support for the previous APIs in my engine, since the veil is so much thinner; otherwise I'll just end up adding the same amount of abstraction that DX11 already does, which kind of defeats the point.

 

 

That's highly unlikely. One of the reasons the classical APIs are so slow is that they have to do a whole lot of rule validation. When you google an OpenGL function and see that whole list of things an argument is allowed to be, what happens when something goes wrong, and so on, the driver has to actually validate all of that at runtime (both to fulfill the spec's requirements and to keep you from crashing the GPU), because it can't make many assumptions about the context in which you execute those API calls. When you write a game engine with DX12 or Vulkan, that's not true: as the programmer you typically have complete knowledge of the relevant context and can make many assumptions, so you can skip a whole lot of the work a classical OpenGL driver has to do.
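To make that concrete, here is a purely hypothetical C++ sketch (not real driver code; every name in it is invented) of the kind of per-call rule checking a classical GL driver has to do before any actual work happens, checks that a Vulkan/DX12 driver drops or moves into optional debug layers:

#include <cstdint>

// Hypothetical error codes and driver state, loosely mirroring the spec's wording.
enum GLError { NO_ERROR, INVALID_VALUE, INVALID_OPERATION };
enum class TexTarget { Tex2D, Tex3D, CubeMap };

struct HypotheticalGLDriver {
    bool     inBeginEnd = false;
    uint32_t bound[3]   = {};

    // Stubs standing in for the driver's real bookkeeping lookups.
    bool isKnownTexture(uint32_t) const { return true; }
    bool targetMismatch(uint32_t, TexTarget) const { return false; }

    // Something like glBindTexture: the spec's list of rules becomes runtime
    // branches that run on every single call, whether or not your code is correct.
    GLError bindTexture(TexTarget target, uint32_t name) {
        if (inBeginEnd)                    return INVALID_OPERATION; // rule check
        if (name && !isKnownTexture(name)) return INVALID_VALUE;     // rule check
        if (targetMismatch(name, target))  return INVALID_OPERATION; // rule check
        bound[static_cast<int>(target)] = name;                      // the real work
        return NO_ERROR;
    }
};

In a Vulkan or DX12 release driver those branches simply aren't there; the engine is trusted to have already obeyed the rules.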

 

In addition to that, multithreading in DX11 and OpenGL 4.x is still very crappy (although I'm not sure why), and with the new APIs you'll be able to actually use multiple cores to do rendering API work for more than just 5% gains.

 

Thinking about it, it's kind of like C++ versus a more managed language like Java or C#: similar concepts, and of course an extremely similar execution context, but one gives you more control and is a more precise abstraction of the hardware, while letting you shoot yourself in the foot more (and in return it's faster).

Edited by agleed


 

 

> In addition to that, multithreading in DX11 and OpenGL 4.x is still very crappy (although I'm not sure why)

 

OpenGL doesn't really provide any means for multithreading, even in 4.4: an OpenGL context belongs to a single thread, and only that thread can issue rendering commands. That's what Vulkan/DX12 tackle with the "command buffer" object, which can be recorded on any thread (although command buffers have to be submitted to the command queue, which effectively belongs to a single thread).
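A minimal C++/Vulkan sketch of that model, assuming the device, queue, and queue family index already exist; error handling and the actual draw recording are omitted. The key rule is that a VkCommandPool is externally synchronized, so each recording thread gets its own pool rather than locking a shared one:

#include <vulkan/vulkan.h>
#include <thread>
#include <vector>

// Each worker thread records into its own pool; no locks needed.
VkCommandBuffer RecordOnThread(VkDevice device, uint32_t queueFamily) {
    VkCommandPoolCreateInfo poolInfo{VK_STRUCTURE_TYPE_COMMAND_POOL_CREATE_INFO};
    poolInfo.queueFamilyIndex = queueFamily;
    VkCommandPool pool;
    vkCreateCommandPool(device, &poolInfo, nullptr, &pool);

    VkCommandBufferAllocateInfo allocInfo{VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO};
    allocInfo.commandPool        = pool;
    allocInfo.level              = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandBufferCount = 1;
    VkCommandBuffer cmd;
    vkAllocateCommandBuffers(device, &allocInfo, &cmd);

    VkCommandBufferBeginInfo beginInfo{VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO};
    vkBeginCommandBuffer(cmd, &beginInfo);
    // ... record this thread's slice of the scene here ...
    vkEndCommandBuffer(cmd);
    return cmd; // a real engine would keep and reset the pool per frame
}

// Recording happens in parallel; submission stays on one thread.
void RecordAndSubmit(VkDevice device, VkQueue queue, uint32_t queueFamily) {
    std::vector<VkCommandBuffer> cmds(4);
    std::vector<std::thread> workers;
    for (size_t i = 0; i < cmds.size(); ++i)
        workers.emplace_back([&, i] { cmds[i] = RecordOnThread(device, queueFamily); });
    for (auto& t : workers) t.join();

    VkSubmitInfo submit{VK_STRUCTURE_TYPE_SUBMIT_INFO};
    submit.commandBufferCount = static_cast<uint32_t>(cmds.size());
    submit.pCommandBuffers    = cmds.data();
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);
}

Recording scales across cores, while the single vkQueueSubmit at the end matches the "queue belongs to one thread" point above.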

 

Actually there are ways to do a kind of multithreading in OpenGL 4: you can share a context between threads, which is used to load textures asynchronously for instance, but I've heard that this is really inefficient. There is also glBufferStorage + indirect draws, which lets you persistently map a buffer of per-instance data and write to it like any other memory, e.g. concurrently from several threads.

But it's not as powerful as Vulkan or DX12, which let you build any command from any thread, not just instanced draw data.
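For reference, the glBufferStorage pattern described above looks roughly like this. This is a sketch only (GL 4.4+; loader setup, fencing, and the indirect command buffer are assumed to exist elsewhere), and all names are invented for illustration:

#include <glad/glad.h> // or any other GL loader exposing 4.4 core

GLuint instanceBuf = 0;
const GLsizeiptr kBufSize = 1024 * 16 * sizeof(float); // e.g. 1024 instance matrices

// Create an immutable, persistently mapped buffer. The returned pointer stays
// valid for the buffer's lifetime, so worker threads can write instance data
// through it without ever touching the GL context.
void* CreateSharedInstanceBuffer() {
    const GLbitfield flags =
        GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    glGenBuffers(1, &instanceBuf);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, instanceBuf);
    glBufferStorage(GL_SHADER_STORAGE_BUFFER, kBufSize, nullptr, flags);
    return glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, kBufSize, flags);
}

// GL thread only: consume whatever the workers wrote with one indirect call.
// Assumes a GL_DRAW_INDIRECT_BUFFER with draw commands is already bound;
// in practice a fence per buffer region is needed to avoid racing the GPU.
void DrawFrame(GLsizei drawCount) {
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, instanceBuf);
    glMultiDrawArraysIndirect(GL_TRIANGLES, nullptr, drawCount, 0);
}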


 

 

 

> In addition to that, multithreading in DX11 and OpenGL 4.x is still very crappy (although I'm not sure why)

> OpenGL doesn't really provide any means for multithreading, even in 4.4 (...). But it's not as powerful as Vulkan or DX12, which let you build any command from any thread, not just instanced draw data.

 

 

Yes, but I'm more interested in what prevented driver implementers from getting proper multithreading into the older APIs in the first place. DX11 has the concept of command lists too, and it kind of works, but the practical gains from it are pretty small. I don't know what it is about those APIs (or about the driver implementations) that prevents proper multithreading from working in DX11 and GL 4.x.


> I feel like to fully support these APIs I need to almost abandon support for the previous APIs in my engine, since the veil is so much thinner; otherwise I'll just end up adding the same amount of abstraction that DX11 already does, which kind of defeats the point.

Yes.
But it depends. For example, if you were doing AZDO OpenGL, many of the concepts will already be familiar to you.
However, AZDO never dealt with textures as thinly as Vulkan or D3D12 do, so you'll need to refactor that part.
If you weren't following AZDO, then it's highly likely that the way you were using the old APIs is incompatible with the new ones.

> Actually there are ways to do a kind of multithreading in OpenGL 4: (...). There is also glBufferStorage + indirect draws, which lets you persistently map a buffer of per-instance data and write to it like any other memory, e.g. concurrently from several threads.
> But it's not as powerful as Vulkan or DX12, which let you build any command from any thread, not just instanced draw data.

Actually, DX12 & Vulkan follow exactly the same path glBufferStorage + indirect draws opened up. It just got easier and thinner, and other miscellaneous work can now be done from multiple cores as well (texture binding, shader compilation, barrier preparation, etc.).

The rest was covered by Promit's excellent post.


> Actually, DX12 & Vulkan follow exactly the same path glBufferStorage + indirect draws opened up. It just got easier and thinner, and other miscellaneous work can now be done from multiple cores as well (texture binding, shader compilation, barrier preparation, etc.).

 

 

There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works. It sounds a lot like an array of bindless texture handles to me.


> There is something I don't really understand in Vulkan/DX12: the "descriptor" object. Apparently it acts as a GPU-readable data chunk that holds texture pointer/size/layout and sampler info, but I don't understand how the descriptor set/pool concept works. It sounds a lot like an array of bindless texture handles to me.

Without going into detail: it's because only AMD & NVIDIA cards support bindless textures in their hardware; there's one major desktop vendor that doesn't support it even though it's DX11 hardware. Also keep in mind that both Vulkan & DX12 want to support mobile hardware as well.
You have to give the API a table of textures grouped by frequency of update: one blob of textures that change per material, one blob that rarely changes (e.g. environment maps), and another blob that doesn't change (e.g. shadow maps).
It's very analogous to how we have been doing constant buffers with shaders (providing different buffers based on frequency of update).
Then you put those blobs into a bigger blob and tell the API "I want to render with this big blob, which is a collection of blobs of textures", so the API can translate this well to all sorts of hardware (mobile, Intel on desktop, and bindless hardware like AMD's and NVIDIA's).
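In Vulkan terms, that "blob of blobs" maps to descriptor set layouts gathered into a pipeline layout. Below is a minimal C++ sketch of the frequency grouping described above; the set order, array counts, and function names are all invented for illustration:

#include <vulkan/vulkan.h>

// One layout describing an array of sampled textures in the fragment stage.
VkDescriptorSetLayout MakeTextureSetLayout(VkDevice device, uint32_t count) {
    VkDescriptorSetLayoutBinding binding{};
    binding.binding         = 0;
    binding.descriptorType  = VK_DESCRIPTOR_TYPE_COMBINED_IMAGE_SAMPLER;
    binding.descriptorCount = count; // an array of textures, one "blob"
    binding.stageFlags      = VK_SHADER_STAGE_FRAGMENT_BIT;

    VkDescriptorSetLayoutCreateInfo info{VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO};
    info.bindingCount = 1;
    info.pBindings    = &binding;

    VkDescriptorSetLayout layout;
    vkCreateDescriptorSetLayout(device, &info, nullptr, &layout);
    return layout;
}

// The "big blob": one set per update frequency, combined into a pipeline layout.
VkPipelineLayout MakePipelineLayout(VkDevice device) {
    VkDescriptorSetLayout sets[3] = {
        MakeTextureSetLayout(device, 4),  // set 0: never changes (shadow maps)
        MakeTextureSetLayout(device, 8),  // set 1: rarely changes (env maps)
        MakeTextureSetLayout(device, 16), // set 2: changes per material
    };
    VkPipelineLayoutCreateInfo info{VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO};
    info.setLayoutCount = 3;
    info.pSetLayouts    = sets;
    VkPipelineLayout layout;
    vkCreatePipelineLayout(device, &info, nullptr, &layout);
    return layout;
}

At draw time you would then bind set 0 once per frame, set 1 only when the rarely-changing textures swap, and set 2 per material, via vkCmdBindDescriptorSets.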

If all hardware were bindless, this set/pool wouldn't be needed, because you could change any one texture anywhere with minimal GPU overhead, like you do in OpenGL 4 with the bindless texture extensions.
Nonetheless, the descriptor pool/set is also useful for non-texture stuff (e.g. anything that requires binding, like constant buffers). It is quite generic.

Edited by Matias Goldberg

