Silverlan

Vulkan Changing a descriptor set's buffer memory every frame?


Following problem:
I have a bunch of meshes that need to be rendered in one batch (They're not the same, so I can't use instancing).
I've created a secondary command buffer, which does exactly that:

(PseudoCode)
VkCommandBuffer cmdSec = new SecondaryCommandBuffer;
int subPass = 0;
// Render pass, framebuffer and subpass are passed as inheritance info
vkBeginCommandBuffer(cmdSec,VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT,renderPass,framebuffer,subPass);
	vkCmdBindPipeline(cmdSec,pipeline);
	foreach(mesh) {
		vkCmdBindVertexBuffers(cmdSec,...);
		vkCmdDraw(cmdSec,...);
	}
vkEndCommandBuffer(cmdSec);
 
The secondary command buffer is later executed each frame from within the primary command buffer:
VkCommandBuffer cmdPrim = new PrimaryCommandBuffer;
vkCmdBeginRenderPass(cmdPrim,renderPass,framebuffer,VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
	vkCmdExecuteCommands(cmdPrim,cmdSec);
vkCmdEndRenderPass(cmdPrim);
 
So far so good. The problem is, to render the meshes, I also need to push some additional data (e.g. matrix) to the pipeline, and this data changes every frame.
Push constants are not an option, since they can't be used in a render pass with the VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS flag:

The contents parameter describes how the commands in the first subpass will be provided. If it is VK_SUBPASS_CONTENTS_INLINE, the contents of the subpass will be recorded inline in the primary command buffer, and calling a secondary command buffer within the subpass is an error. If contents is VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS, the contents are recorded in secondary command buffers that will be called from the primary command buffer, and vkCmdExecuteCommands is the only valid command on the command buffer until vkCmdNextSubpass or vkCmdEndRenderPass.

(Source: https://www.khronos.org/registry/vulkan/specs/1.0/apispec.html#vkCmdBeginRenderPass)

That means my only(?) option is to use a descriptor set.
The idea is to bind the descriptor set inside the secondary command buffer recording, then update the descriptor set with the new data every frame, right before executing the secondary command buffer.

Now, I'm still new at this, so I'd like someone to confirm whether this is correct or not. There's a couple of things I have to take into account:
  • Since the memory of the descriptor set's buffer changes every frame (=non-coherent) it has to be created without the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT flag.

vkFlushMappedMemoryRanges must be used to guarantee that host writes to non-coherent memory are visible to the device. It must be called after the host writes to non-coherent memory have completed and before command buffers that will read or write any of those memory locations are submitted to a queue.

  • vkFlushMappedMemoryRanges has to be called on the host, after the new data has been written to the mapped memory.

Host-visible memory types that advertise the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT property still require memory barriers between host and device in order to be coherent, but do not require additional cache management operations to achieve coherency. For host writes to be seen by subsequent command buffer operations, a pipeline barrier from a source of VK_ACCESS_HOST_WRITE_BIT and VK_PIPELINE_STAGE_HOST_BIT to a destination of the relevant device pipeline stages and access types must be performed. Note that such a barrier is performed implicitly upon each command buffer submission, so an explicit barrier is only rarely needed (e.g. if a command buffer waits upon an event signaled by the host, where the host wrote some data after submission). For device writes to be seen by subsequent host reads, a pipeline barrier is required to make the writes visible.

  • I'm not sure about this part. Since the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT flag isn't set, do I still need a pipeline barrier? (Or does vkFlushMappedMemoryRanges already take care of that?)
The result would be this:
(PseudoCode)
VkCommandBuffer cmdSec = new SecondaryCommandBuffer;
int subPass = 0;
vkBeginCommandBuffer(cmdSec,VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT,renderPass,framebuffer,subPass);
	vkCmdBindPipeline(cmdSec,pipeline);
	vkCmdBindDescriptorSets(cmdSec,descSet);
	foreach(mesh) {
		vkCmdBindVertexBuffers(cmdSec,...);
		vkCmdDraw(cmdSec,...);
	}
vkEndCommandBuffer(cmdSec);
// Every frame:
vkMapMemory(descSetBufferMemory);
	// Write data to mapped memory
	// Flush while the memory is still mapped
	vkFlushMappedMemoryRanges(descSetBufferMemory);
vkUnmapMemory(descSetBufferMemory);
VkCommandBuffer cmdPrim = new PrimaryCommandBuffer;
vkBeginCommandBuffer(cmdPrim);
	// Pipeline Barrier?
	vkCmdBeginRenderPass(cmdPrim,renderPass,framebuffer,VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS);
		vkCmdExecuteCommands(cmdPrim,cmdSec);
	vkCmdEndRenderPass(cmdPrim);
vkEndCommandBuffer(cmdPrim);
Would that be correct so far?


Another thing I'm wondering about:
The memory of the buffer is updated and used by the pipeline every frame. What happens if a frame has been queued already, but not fully drawn, and I'm updating the buffer for the next frame already?
Would/Could that affect the queued frame? If so, could that be avoided with an additional barrier (source = VK_ACCESS_SHADER_READ_BIT, destination = VK_ACCESS_HOST_WRITE_BIT ("Wait for all shader reads to be completed before allowing the host to write"))?
Would it be better to use more than 1 buffer/descriptor set (+ more than 1 secondary command buffer), and swap between them each frame? If so, would 2 be enough (Even for mailbox present mode), or would I need as many as I have swapchain images?


I'd mostly just like to know if my general idea is correct, or if I'm missing/misinterpreting something.


That means my only(?) option is to use a descriptor set.
The idea is to bind the descriptor set inside the secondary command buffer recording, then update the descriptor set with the new data every frame, right before executing the secondary command buffer.

Just to get the terminology right - a descriptor set is a group of descriptors. A descriptor is a small structure that points to a resource.
You can either update a descriptor to point to a different resource, or just update the data within that existing resource.
 

Since the memory of the descriptor set's buffer changes every frame (=non-coherent) it has to be created without the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT flag.

That's not what coherent/non-coherent means. Memory coherency means that two processors see the same version of events in memory. Coherency is an issue for multi-core CPU design too -- when one core writes to memory, that write might be stored in the core's cache for some time before actually reaching RAM. This means that other cores will see a non-coherent view of RAM. CPU manufacturers solve this by networking the cache of each CPU together, and following a coherency protocol, e.g. MESI.

By default, the CPU and GPU are not coherent because the CPU is accessing RAM via its cache, and the GPU is accessing it directly -- so the GPU won't see any values that are lingering in the CPU's cache.
Programmers can achieve coherency themselves, via functions like vkFlushMappedMemoryRanges/etc (internally, a flush ensures that the CPU's writes have actually reached RAM; invalidating any stale values in the GPU's caches is the job of a barrier on the GPU timeline).
Or, if your hardware supports it, some PCs are capable of auto-magically establishing a coherent view of RAM. For example, these systems may be able to route the GPU's RAM read request to flow via the CPU's L2 cache, so that the latest values are picked up without the need for any flushing/invalidation commands. The downside is that this will be a longer route, so the latency will be increased -- so coherent memory heaps are ok for things like command buffers or some constant updates, but not so good for textures :)

In your case, you should be able to put your data in either coherent or non-coherent heaps, as long as you follow the guidelines to achieve coherency yourself via Flush/etc. when the memory is non-coherent.
 
As for the barrier -- vkFlushMappedMemoryRanges occurs on the CPU timeline and flushes the CPU cache out to RAM. The barrier occurs on the GPU timeline and invalidates any values that already exist in the GPU's cache, so that it will actually fetch fresh values from RAM - but as in your quote, this happens already for each command buffer submission.
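
One practical detail when flushing non-coherent memory: the offset and size passed in VkMappedMemoryRange have to be multiples of VkPhysicalDeviceLimits::nonCoherentAtomSize (or the range has to run to the end of the allocation). Below is a minimal sketch of that rounding, written as plain arithmetic so it builds without the Vulkan headers. FlushRange and alignedFlushRange are hypothetical names for illustration, and the sketch assumes the whole allocation is mapped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper type; mirrors the offset/size fields of VkMappedMemoryRange.
struct FlushRange {
    std::uint64_t offset;
    std::uint64_t size;
};

// Expands [offset, offset + size) so that both ends land on multiples of
// `atom` (the device's nonCoherentAtomSize), clamping the end to the size
// of the allocation. Assumes the whole allocation is currently mapped.
inline FlushRange alignedFlushRange(std::uint64_t offset, std::uint64_t size,
                                    std::uint64_t atom, std::uint64_t allocationSize) {
    assert(atom > 0 && offset + size <= allocationSize);
    const std::uint64_t begin = (offset / atom) * atom;                  // round down
    const std::uint64_t roundedEnd = ((offset + size + atom - 1) / atom) * atom; // round up
    const std::uint64_t end = std::min(roundedEnd, allocationSize);     // clamp to allocation
    return FlushRange{begin, end - begin};
}
```

The resulting offset and size would then be copied into the VkMappedMemoryRange handed to vkFlushMappedMemoryRanges. For memory that has VK_MEMORY_PROPERTY_HOST_COHERENT_BIT set, none of this is needed.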
 

Another thing I'm wondering about:
The memory of the buffer is updated and used by the pipeline every frame. What happens if a frame has been queued already, but not fully drawn, and I'm updating the buffer for the next frame already?
Would/Could that affect the queued frame? If so, could that be avoided with an additional barrier (source = VK_ACCESS_SHADER_READ_BIT, destination = VK_ACCESS_HOST_WRITE_BIT ("Wait for all shader reads to be completed before allowing the host to write"))?
Would it be better to use more than 1 buffer/descriptor set (+ more than 1 secondary command buffer), and swap between them each frame? If so, would 2 be enough (Even for mailbox present mode), or would I need as many as I have swapchain images?

Whenever the CPU is updating data that will be used by the GPU, you need to take care, as the GPU is usually one frame behind the CPU. In practice this means double or even triple-buffering your data: create two (or more) resources and bind a different one each frame. This would also mean creating two descriptor sets, and two of your secondary command buffers...
You also need to use two (or more) fences to make sure that the CPU/GPU don't get too far ahead of each other. e.g. for double buffering, at the start of frame N, you must first wait on the fence that tells you that the GPU has finished frame N-2.
Once you've implemented this fencing scheme, you can use this one mechanism to ensure safe access to all of your per-frame resources.
e.g. once you know for a fact that the GPU is only ever 1 frame behind the CPU, then any resource that's more than 1 frame old is safe for the CPU to recycle/overwrite/reuse... and anything younger than that must be treated as if it's still being used by the GPU...
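
The bookkeeping behind that fencing scheme can be sketched roughly as follows. This is only a model, not Vulkan code: FrameRing, slotFor, beginFrame and kNoFrame are made-up names, and the comments mark where the real fence wait and the per-slot resources would go.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Marker for "this slot has never been submitted" (hypothetical constant).
constexpr std::uint64_t kNoFrame = UINT64_MAX;

// Hypothetical bookkeeping for N frames in flight. In a real renderer each
// slot would own one VkFence, one buffer, one descriptor set and one
// secondary command buffer.
class FrameRing {
public:
    explicit FrameRing(std::uint32_t framesInFlight)
        : lastFrameInSlot_(framesInFlight, kNoFrame) {
        assert(framesInFlight > 0);
    }

    // Slot whose resources the CPU may write while building `frame`.
    std::uint32_t slotFor(std::uint64_t frame) const {
        return static_cast<std::uint32_t>(frame % lastFrameInSlot_.size());
    }

    // Call at the start of `frame`. Returns the earlier frame that last used
    // this slot; its fence must be waited on (vkWaitForFences) before the
    // slot's resources are overwritten. Returns kNoFrame if there is none.
    std::uint64_t beginFrame(std::uint64_t frame) {
        const std::uint32_t slot = slotFor(frame);
        const std::uint64_t mustWaitFor = lastFrameInSlot_[slot];
        lastFrameInSlot_[slot] = frame;
        return mustWaitFor;
    }

private:
    std::vector<std::uint64_t> lastFrameInSlot_;
};
```

With FrameRing ring(2), the first two calls to beginFrame have nothing to wait for, and from then on frame N waits on frame N-2, which matches the double-buffering rule described above.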

So, if you want to edit your descriptor set, or edit the resources that it points to... you're not allowed to until the GPU has finished consuming them. You can solve this by double buffering as above -- two resources, so you can have two sets of values in flight... which means two descriptor sets in flight... which means pre-creating two versions of your command buffer :(

Alternatively, you can use a single descriptor set (not double-buffered, never updated) and a single resource (not double-buffered, but updated on the GPU timeline instead of the CPU timeline) :)
If these updates occur on the GPU timeline, then there's no need to double buffer the resource, which means there's no need for multiple descriptor sets.
However, this also introduces its own pitfalls... To perform this update on the GPU timeline, you now need the "main" version of the resource, which is referenced by the descriptor set and read by your shaders. You also need a double-buffered "upload" resource, which is written to by the CPU each frame. You then submit a command buffer that instructs the GPU to copy from (one of) the upload resources to the "main" resource.
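
To make the ordering argument concrete, here is a toy model of that scheme, under loud assumptions: GpuQueue and Renderer are invented stand-ins, the "GPU" is just a queue of closures executed later in submission order, and a single int plays the role of the buffer contents. In real Vulkan the copy would be a vkCmdCopyBuffer recorded outside the render pass, followed by a barrier from the transfer write to the shader read.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a queue: work runs in submission order, some time after
// the CPU recorded it.
struct GpuQueue {
    std::deque<std::function<void()>> pending;
    void submit(std::function<void()> work) { pending.push_back(std::move(work)); }
    void executeOne() {
        assert(!pending.empty());
        pending.front()();
        pending.pop_front();
    }
};

struct Renderer {
    int upload[2] = {0, 0};  // double-buffered "upload" resources, written by the CPU
    int mainValue = 0;       // single "main" resource, read by the shaders
    std::vector<int> drawn;  // value observed by each frame's draw

    void recordFrame(GpuQueue& queue, std::uint64_t frame, int value) {
        const int slot = static_cast<int>(frame % 2);
        upload[slot] = value;            // CPU timeline: host write
        queue.submit([this, slot] {
            mainValue = upload[slot];    // GPU timeline: the copy into "main"
            drawn.push_back(mainValue);  // GPU timeline: the draw reads "main"
        });
    }
};
```

If the CPU records frames 0 and 1 before the queue executes anything, each draw still sees its own frame's value, because the copy and the draw are ordered on the same timeline. The upload slots still need the fence scheme: recording frame 2 before frame 0 has executed would overwrite slot 0 too early.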

Edited by Hodgman
