Vulkan New API Rendering Architecture

Recommended Posts

So I'm currently updating my rendering architecture to get more performance out of the newer APIs (DX12/Vulkan) while still supporting D3D11, and I wanted to get some advice on which architecture to use. As far as I know there are two main architectures.

The first uses a single main thread. That thread performs gameplay logic using a task system, then performs visibility and draw-call logic using the task system, and finally submits the commands back on the main thread. The benefit of this approach is reduced input latency, but the consequence is that you have to wait for the rendering tasks to complete before you can start game logic again.

The second uses a main thread and a render thread. After gameplay logic is computed, the main thread syncs its data with the render thread. The render thread then computes visibility, draw calls, and command buffer generation using the task system and submits the command lists itself. The benefit of this approach is that it does not block the computation of gameplay logic, but it adds a frame of latency.
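To make the second option concrete, here's a rough sketch of the kind of main-thread/render-thread split I mean, with one frame of pipelining. It's only an illustration: FrameData, UpdateGameplay and RecordAndSubmitCommandBuffers are placeholder names, not anything from a real engine.

    // Rough sketch of a main-thread/render-thread split with one frame of pipelining.
    // FrameData, UpdateGameplay and RecordAndSubmitCommandBuffers are placeholders.
    #include <atomic>
    #include <condition_variable>
    #include <mutex>
    #include <thread>   // std::thread(MainThreadLoop) / std::thread(RenderThreadLoop) at startup
    #include <vector>

    struct FrameData {
        std::vector<int> drawCalls;   // stand-in for visibility / draw-call output
    };

    void UpdateGameplay(FrameData&) {}                       // gameplay + visibility + draw-call logic
    void RecordAndSubmitCommandBuffers(const FrameData&) {}  // command buffer generation + submit

    FrameData               g_frames[2];        // double buffer: main writes one, render reads the other
    std::mutex              g_mutex;
    std::condition_variable g_cv;
    int                     g_readyIndex = -1;  // frame handed to the renderer, -1 = none
    std::atomic<bool>       g_quit{false};

    void MainThreadLoop()
    {
        int writeIndex = 0;
        while (!g_quit) {
            UpdateGameplay(g_frames[writeIndex]);                 // simulate frame N+1
            std::unique_lock<std::mutex> lock(g_mutex);
            g_cv.wait(lock, [] { return g_readyIndex == -1; });   // wait until renderer took the last one
            g_readyIndex = writeIndex;                            // hand the finished frame over
            lock.unlock();
            g_cv.notify_one();
            writeIndex = 1 - writeIndex;                          // immediately start filling the other buffer
        }
    }

    void RenderThreadLoop()
    {
        while (!g_quit) {
            std::unique_lock<std::mutex> lock(g_mutex);
            g_cv.wait(lock, [] { return g_readyIndex != -1; });
            const int readIndex = g_readyIndex;
            lock.unlock();
            RecordAndSubmitCommandBuffers(g_frames[readIndex]);   // draw frame N
            lock.lock();
            g_readyIndex = -1;                                    // buffer is free again
            lock.unlock();
            g_cv.notify_one();
        }
    }

The render thread is always drawing the frame the main thread finished previously, which is where the extra frame of latency in the second option comes from.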

Using a gameplay / render thread doesn't add any latency as the critical path for a frame remains completely unchanged - it's still update + render.
Say update and render cost 4ms each. Each frame takes 8ms to compute, regardless of whether you're using one thread or two.
With the two-threads plus pipelining solution, you start a new frame once every 4ms, so your framerate doubles, but latency is still 8ms per frame :o

Two threads is a bad architecture though because it doesn't scale. Let's say that a job system gives a 4x speed boost on a 4-core CPU.
The single-threaded version now takes 2ms per frame and has 4x the original framerate.

However, you can have your cake and eat it too. Add the job system to the two-threads pipelined version and it's now starting a new frame once every 1ms, for 8x the original framerate... However, because you're actively using two threads instead of one, let's say the job system now only gives a 2x speed boost instead of 4x: that means a frame takes 4ms total but a new one is started once every 2ms, so it actually ends up with the same framerate as the single-thread-plus-jobs version, but double the latency :o
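To put those numbers side by side (4ms update + 4ms render, an ideal 4x job speed-up when one thread has the whole CPU, roughly 2x per stage when two pipeline threads share it):
Single thread, no jobs: 8ms of work per frame, a new frame every 8ms, 8ms latency.
Two threads, pipelined: 8ms of work per frame, a new frame every 4ms, 8ms latency.
Single thread + jobs (4x): 2ms per frame, a new frame every 2ms, 2ms latency.
Two threads pipelined + jobs (2x per stage): 4ms per frame, a new frame every 2ms, 4ms latency.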

In the real world, a lot of code remains serial and doesn't end up fully utilising the job system though, so personally I do still use the K-threads pipelined plus a job system model. I also place my "per system" threads into the job system's thread pool. e.g. on a 4-core CPU, I have one jobs+gameplay thread, one jobs+render thread, and two jobs-only threads.

The main difference with the new APIs is that the generation of command buffers can benefit from threading/jobs.
You can generate command buffers in jobs/threads on D3D11, but you don't gain any performance by doing so, so there's little point. In D3D12, I've found that you need to be recording a few thousand draw calls at a time to see any benefit from a job system... So in my engine, when I'm about to record a command buffer, I check whether the backend reports that it supports fast threaded command buffers (i.e. is D3D12/Vulkan) and whether the draw-item count is over 1000 or not, and then either record the commands immediately on the render thread, or spawn several jobs.
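Roughly, that decision looks like the sketch below. DrawItem, CommandBuffer and RecordDrawItem are placeholder engine-side types (the real work would end up going through something like vkCmdExecuteCommands or ExecuteBundles); the point is just the threshold check and the per-worker split.

    // Sketch of the "only go wide when it's worth it" decision (placeholder engine types).
    #include <algorithm>
    #include <cstddef>
    #include <future>
    #include <vector>

    struct DrawItem      { /* pipeline, buffers, textures, ranges... */ };
    struct CommandBuffer { /* wraps a VkCommandBuffer / ID3D12GraphicsCommandList */ };

    // Placeholder: records one draw item into one command buffer.
    void RecordDrawItem(CommandBuffer&, const DrawItem&) {}

    std::vector<CommandBuffer> RecordDrawItems(const std::vector<DrawItem>& items,
                                               bool supportsThreadedRecording, // false on D3D11
                                               unsigned workerCount)
    {
        const std::size_t kThreadedThreshold = 1000; // below this, job overhead outweighs the gain

        if (!supportsThreadedRecording || items.size() < kThreadedThreshold) {
            // Small batch, or a D3D11-class backend: record everything on the render thread.
            std::vector<CommandBuffer> cmds(1);
            for (const DrawItem& item : items)
                RecordDrawItem(cmds[0], item);
            return cmds;
        }

        // Large batch on D3D12/Vulkan: split the items across workers, one secondary
        // command buffer per worker, then execute them all from the primary afterwards.
        std::vector<CommandBuffer> cmds(workerCount);
        std::vector<std::future<void>> jobs;
        const std::size_t perJob = (items.size() + workerCount - 1) / workerCount;

        for (unsigned w = 0; w < workerCount; ++w) {
            jobs.push_back(std::async(std::launch::async, [&, w] {
                const std::size_t begin = w * perJob;
                const std::size_t end   = std::min(items.size(), begin + perJob);
                for (std::size_t i = begin; i < end; ++i)
                    RecordDrawItem(cmds[w], items[i]);
            }));
        }
        for (auto& job : jobs)
            job.wait();   // join before handing the secondaries back for execution
        return cmds;
    }

The 1000-item threshold is just the ballpark from above; in practice it's something you'd tune per backend.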

As for the rendering architecture itself, a more interesting question to me is whether to make a state-machine renderer like the underlying APIs, or a stateless renderer like bgfx or mine :D
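For anyone who hasn't seen the distinction before, here's the difference in a nutshell; the types are placeholders and the stateless half is only loosely modelled on how bgfx-style renderers submit work.

    // State-machine style (like the underlying APIs): calls mutate hidden state, order matters.
    //   cmd.SetPipeline(p); cmd.SetVertexBuffer(vb); cmd.SetTexture(0, albedo); cmd.Draw(n);
    //
    // Stateless style: each draw is a self-contained item carrying all of its state, so items
    // can be built from any thread in any order and sorted before being translated to API calls.
    #include <cstdint>
    #include <vector>

    struct Pipeline; struct Buffer; struct Texture;   // placeholder engine handles

    struct DrawItem {
        uint64_t        sortKey;        // e.g. pass | material | depth packed into one key
        const Pipeline* pipeline;
        const Buffer*   vertexBuffer;
        const Texture*  textures[4];
        uint32_t        vertexCount;
    };

    // The renderer collects items from any thread, sorts them by key, then translates them
    // into actual API calls (or command buffers) in one place.
    struct RenderQueue {
        std::vector<DrawItem> items;
        void Submit(const DrawItem& item) { items.push_back(item); }
    };

The usual argument for the stateless approach is that draw items can be generated from any job without touching API state, and the sort key lets the backend reorder work and skip redundant state changes when it translates them.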


Thank you again for another response! I think I may have described my pattern wrong. In the main thread and render thread system, both are using a job system to distribute tasks, so my approach is similar to yours (one game thread, one render thread, and the rest will be job threads). The "only main thread" method I was talking about is based on umbra-ignite-2015-jrmy-virga-dishonored-, where they queue the game logic tasks and render tasks from the main thread.

I guess it makes sense that it would not increase latency; I had just read that studios like id Software, Arkane, and a few others explained that this was the reason they chose the main-thread/task-system model.

