Vulkan Multithreaded renderer & command ordering


Recommended Posts

This is a question that perhaps a seasoned engine programmer could answer for me. In a single-threaded renderer, a possible approach would be something like this:

 

- Collect visible drawable objects

- Create DrawCall instances, containing everything that the renderer needs to visually describe this object, including a sort key.

- Sort those instances based on the sort key

- Go through them all and render them

 

Roughly...
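
To make that concrete, here is a minimal single-threaded sketch of those four steps. All of the type and function names (Renderable, DrawCall, MakeSortKey, IssueDraw) are placeholders, not any particular engine's API:

#include <algorithm>
#include <cstdint>
#include <vector>

// Placeholder type; a real engine would carry mesh, material and constant data here.
struct Renderable { uint64_t materialId = 0; float depth = 0.0f; };

struct DrawCall {
    uint64_t sortKey;       // e.g. material in the high bits, quantised depth in the low bits
    const Renderable* obj;  // everything the renderer needs to draw this object
};

static uint64_t MakeSortKey(const Renderable& r) {
    return (r.materialId << 32) | static_cast<uint32_t>(r.depth * 1000.0f);
}

void RenderFrame(const std::vector<Renderable>& visible) {
    // 1. 'visible' is assumed to be the output of the culling step.

    // 2. Create one DrawCall instance per visible object, including its sort key.
    std::vector<DrawCall> drawCalls;
    drawCalls.reserve(visible.size());
    for (const Renderable& r : visible)
        drawCalls.push_back({ MakeSortKey(r), &r });

    // 3. Sort the instances by key.
    std::sort(drawCalls.begin(), drawCalls.end(),
              [](const DrawCall& a, const DrawCall& b) { return a.sortKey < b.sortKey; });

    // 4. Walk the sorted list and record the actual draws, in order.
    for (const DrawCall& dc : drawCalls) {
        (void)dc;  // IssueDraw(dc) would go here
    }
}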

 

 

However, with a command-list approach (D3D12/Vulkan), I can see some problems. We want to record command lists from multiple threads; let's say we have a task scheduler which we feed rendering tasks, plus a pool of command lists from which each task grabs the next available one. So far so good, we can record our commands. But how would the draw calls be ordered?

 

Thanks!

You can have the main thread actually submit the completed command lists to the queue (or a dedicated job to submit them to the queue).
It's the same as any other data dependency within your job system. You split a task up into an array of jobs (so it can be processed by many threads), but jobs can depend on earlier jobs and are forced to wait until those earlier jobs have completed before executing.
For example, a job graph where three threads do the scene traversal/culling, one thread does the sorting, two threads convert draw-call instances into command lists, and one thread submits those command lists to the queue in the appropriate order might look like:
[Collect Visible Objects Job 1] [Collect Visible Objects Job 2] [Collect Visible Objects Job 3]
                              \                |                /
                              [           Sorting Job           ]
                                 /                           \
           [Draw into command list  Job 1]          [Draw into command list  Job 2]
                                    \                     /
                                    [ Submit to Queue Job ]
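
As a rough sketch, the same graph could be expressed with plain std::async futures standing in for a real job system (the function bodies are empty stubs; a production scheduler would express the same dependencies with job handles or continuations instead):

#include <cstddef>
#include <functional>
#include <future>
#include <utility>
#include <vector>

struct DrawCall {};
struct CommandList {};

// Stubs standing in for the real work done by each job.
std::vector<DrawCall> CollectVisible(int slice)              { (void)slice; return {}; }
std::vector<DrawCall> SortDraws(std::vector<DrawCall> draws) { return draws; }
CommandList RecordCommands(const std::vector<DrawCall>& draws,
                           std::size_t begin, std::size_t end)
{ (void)draws; (void)begin; (void)end; return {}; }
void SubmitInOrder(const std::vector<CommandList>& lists)    { (void)lists; }

void RenderFrameJobs() {
    // Fan out: the three culling jobs run in parallel.
    auto cull0 = std::async(std::launch::async, CollectVisible, 0);
    auto cull1 = std::async(std::launch::async, CollectVisible, 1);
    auto cull2 = std::async(std::launch::async, CollectVisible, 2);

    // Join: the sort job waits on all culling jobs (the data dependency).
    std::vector<DrawCall> all;
    for (auto* f : { &cull0, &cull1, &cull2 }) {
        std::vector<DrawCall> part = f->get();
        all.insert(all.end(), part.begin(), part.end());
    }
    std::vector<DrawCall> sorted = SortDraws(std::move(all));

    // Fan out again: each recording job takes a contiguous slice of the sorted list.
    std::size_t half = sorted.size() / 2;
    auto rec0 = std::async(std::launch::async, RecordCommands, std::cref(sorted), std::size_t{0}, half);
    auto rec1 = std::async(std::launch::async, RecordCommands, std::cref(sorted), half, sorted.size());

    // Join: submit the command lists in slice order, so the GPU sees the sorted draw order.
    SubmitInOrder({ rec0.get(), rec1.get() });
}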


I'd just break up the rendering into jobs per render target - generally you'll have the main scene and N shadow maps, maybe a reflection, maybe a cube map to render. Each one of these has its own Cull, Sort, Draw that is fairly independent. Each can be its own job in the thread system and they just need to be linked together at the end so all the dependent shadow maps / textures are available at the right time.

 

If they need to be linked throughout the main scene render (i.e. shadow map reuse), you just do some up front work to know what the command buffers / render target textures are so you can reference them as needed before they are fully generated and without syncing.
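
A sketch of that per-render-target split, again with std::async standing in for a real job system and BuildViewCommands as a hypothetical "cull + sort + record" job for one view:

#include <functional>
#include <future>
#include <vector>

struct CommandList {};
struct View { /* camera, render target, pass type, ... */ };

// Stub: one job that culls, sorts and records the draws for a single render target.
CommandList BuildViewCommands(const View& view) { (void)view; return {}; }

void RenderAllViews(const std::vector<View>& shadowViews, const View& mainView) {
    // Each shadow map / reflection / cube face is its own independent cull+sort+draw job.
    std::vector<std::future<CommandList>> shadowJobs;
    for (const View& v : shadowViews)
        shadowJobs.push_back(std::async(std::launch::async, BuildViewCommands, std::cref(v)));

    // The main view can be recorded in parallel too: it only needs to reference the
    // shadow map textures, which exist up front, so nothing has to sync while the
    // command buffers are being built.
    auto mainJob = std::async(std::launch::async, BuildViewCommands, std::cref(mainView));

    // Link everything at the end: submit the shadow command lists first so their
    // results are ready by the time the main scene samples them.
    std::vector<CommandList> ordered;
    for (auto& job : shadowJobs)
        ordered.push_back(job.get());
    ordered.push_back(mainJob.get());
    // SubmitInOrder(ordered);  // hypothetical submission helper
}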


Hi Hodgman,

 

Thanks for your reply. That is how I do the Culling and sorting at the moment. DrawCalls are created before the Sort job, where 1 mesh = 1 task, if that makes sense.

Would it be correct to say that it'd be the Sort job's responsibility to cut the sorted DrawCall array into pieces and create N BuildCmdList tasks (where N = the maximum number of command lists)? That way I could guarantee the order of the command list building.
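
Something like this is what I have in mind; a minimal sketch of just the slicing step, where DrawCall, CommandList and BuildCmdList are placeholders. Because each task owns one contiguous slice of the already-sorted array, submitting the finished command lists in slice order 0..N-1 reproduces the global draw order:

#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct DrawCall {};
struct CommandList {};

// Hypothetical task body: records one contiguous slice of the sorted array.
CommandList BuildCmdList(const std::vector<DrawCall>& sorted,
                         std::size_t begin, std::size_t end);

// Run at the end of the Sort job: cut the sorted array into N contiguous ranges,
// one per BuildCmdList task / command list. Assumes numCmdLists > 0.
std::vector<std::pair<std::size_t, std::size_t>>
SliceForTasks(std::size_t drawCount, std::size_t numCmdLists) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t per = (drawCount + numCmdLists - 1) / numCmdLists;  // ceiling division
    for (std::size_t begin = 0; begin < drawCount; begin += per)
        ranges.emplace_back(begin, std::min(begin + per, drawCount));
    // Range i is handed to its own BuildCmdList task; the submit job later pushes
    // the finished command lists to the queue in increasing i.
    return ranges;
}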

 

@Dukus

That is how I have structured my renderer (or well, structuring it now :P). It is based on an article from Blizzard. However, 1 task per scene-view will not parallelise very well; that is why I am trying to add subtasks to each scene-view rendering process.


If you need to break the rendering up into more jobs, here are a few ideas.

 

For culling, if you have a spatial subdivision (quadtree/octree), you can create a job per node of a certain size; say, anything over N meters becomes a separate job.
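
For illustration, a hedged sketch of that idea: recurse through an octree and spawn a separate culling job (std::async as a stand-in for the job system) only for nodes above the size threshold, so the scheduler isn't flooded with tiny jobs. The node layout is made up:

#include <functional>
#include <future>
#include <vector>

struct DrawCall {};

struct OctreeNode {
    float size = 0.0f;                 // edge length in meters
    std::vector<OctreeNode> children;
    // ... bounds, contained objects, etc.
};

// Stub: cull a node and everything below it on the calling thread.
std::vector<DrawCall> CullNodeSerial(const OctreeNode& node) { (void)node; return {}; }

// Recurse through large nodes; anything at or below 'jobSizeMeters' becomes one job.
// The tree must outlive the jobs, since each job only holds a reference to its node.
void SpawnCullJobs(const OctreeNode& node, float jobSizeMeters,
                   std::vector<std::future<std::vector<DrawCall>>>& jobs) {
    if (node.size > jobSizeMeters && !node.children.empty()) {
        for (const OctreeNode& child : node.children)
            SpawnCullJobs(child, jobSizeMeters, jobs);
    } else {
        jobs.push_back(std::async(std::launch::async, CullNodeSerial, std::cref(node)));
    }
}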

 

For sorting, if you know certain things have certain properties and rendering order, such as opaque, transparent, z-sorted, etc., you can bin them together at cull time and then sort them separately as their own job.
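
A small sketch of that binning, assuming the cull step can classify each draw; the bin names and key layout are made up for illustration:

#include <algorithm>
#include <cstdint>
#include <vector>

struct DrawCall { uint64_t sortKey; };

// One bin per broad rendering property / pass.
enum Bin { Opaque, AlphaTest, Transparent, BinCount };

struct CullOutput {
    std::vector<DrawCall> bins[BinCount];
};

// At cull time, drop each visible draw straight into its bin.
void BinDraw(CullOutput& out, Bin bin, const DrawCall& dc) {
    out.bins[bin].push_back(dc);
}

// Later, each bin is sorted by its own job, in parallel with the other bins.
// Opaque would typically sort by material/state and transparent back-to-front by
// depth; here both just reuse a precomputed key for brevity.
void SortBinJob(std::vector<DrawCall>& bin) {
    std::sort(bin.begin(), bin.end(),
              [](const DrawCall& a, const DrawCall& b) { return a.sortKey < b.sortKey; });
}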

 

For drawing, you can certainly take the sorted lists and break them up into N jobs. I'd just make sure each sub-list is big enough (i.e. N isn't too large) that you don't take a GPU hit from having to set all the render state at the beginning of each one, since you don't know what the render state will be between the small command buffers that result.


Hi Dukus,

Those are some good points, thanks. I have a flat scenegraph at the moment, but intend to move to a more spatial approach soon.

 

For the sort bins, are you suggesting that I have a render queue per property batch, say, and sort the objects binned in it using their sort keys? That may parallelise better, as at the moment I am sorting all the objects in one job.
