Vulkan Techniques for bulk data transfer


I'm having an issue with how to efficiently upload large amounts of data to the GPU with Vulkan.  I'm not talking about small constant buffers or dynamically modified data, but rather the large static assets usually loaded during level loading (textures, models, etc.).


In D3D11 this usually consisted of a loop repeatedly calling CreateTexture or CreateBuffer with a corresponding D3D11_SUBRESOURCE_DATA to indicate the data source.  This seemed pretty quick, and I imagine the driver knew how to copy the data into device memory (whichever heap/type D3D11_USAGE_IMMUTABLE mapped to) in an optimal way.


In Vulkan it seems I have to:

- create a staging VkImage using VK_IMAGE_TILING_LINEAR

- allocate and bind host visible memory to the image

- use map/unmap to upload data

- create final VkImage this time with VK_IMAGE_TILING_OPTIMAL

- allocate and bind device memory to image

- create command buffer

- copy each image subresource from staging to final image

- submit command buffer

- wait on command buffer completion

- repeat 1000x ??
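Each of those per-subresource copies needs a source offset into the staging allocation. Here's a minimal sketch of packing every mip of an RGBA8 texture into one staging allocation so the whole thing can be uploaded with a single map/memcpy; the helper name and the alignment value are hypothetical (the real alignment would come from VkPhysicalDeviceLimits::optimalBufferCopyOffsetAlignment):

```c
#include <stdint.h>

/* Hypothetical helper: compute tightly packed staging offsets for every
 * mip of a 2D RGBA8 texture, so the whole texture can be uploaded with a
 * single map/memcpy and then copied subresource-by-subresource.
 * `alignment` stands in for the device's
 * VkPhysicalDeviceLimits::optimalBufferCopyOffsetAlignment.
 * Returns the total staging bytes needed; offsets[] gets one entry per mip. */
static uint64_t pack_mip_offsets(uint32_t width, uint32_t height,
                                 uint32_t mip_count, uint64_t alignment,
                                 uint64_t offsets[])
{
    uint64_t cursor = 0;
    for (uint32_t mip = 0; mip < mip_count; ++mip) {
        uint32_t w = width  >> mip; if (w == 0) w = 1;
        uint32_t h = height >> mip; if (h == 0) h = 1;
        /* round each subresource up to the required copy alignment */
        cursor = (cursor + alignment - 1) & ~(alignment - 1);
        offsets[mip] = cursor;
        cursor += (uint64_t)w * h * 4; /* 4 bytes per RGBA8 texel */
    }
    return cursor;
}
```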


I understand the need for a staging buffer when working with dynamic resources that are modified per frame (or even multiple times per frame).  But when it comes to bulk data transfers, it seems the extra staging copy, plus the VK_IMAGE_TILING_LINEAR to VK_IMAGE_TILING_OPTIMAL conversion, would significantly hinder performance.  As I see it, there are a number of options, all of which seem to have issues.
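Related to where that host-visible staging memory comes from: Vulkan makes you pick a memory type explicitly. A sketch of the standard selection loop, using plain integers so it stands alone — in real code `type_bits` comes from vkGetBufferMemoryRequirements (memoryTypeBits) and `property_flags[i]` from VkPhysicalDeviceMemoryProperties::memoryTypes[i].propertyFlags:

```c
#include <stdint.h>

/* Sketch of the standard Vulkan memory-type selection loop, mirrored with
 * plain integers. For a staging buffer you would typically ask for
 * HOST_VISIBLE | HOST_COHERENT (0x2 | 0x4 in VkMemoryPropertyFlagBits). */
static int32_t find_memory_type(uint32_t type_bits,
                                const uint32_t property_flags[],
                                uint32_t type_count, uint32_t wanted)
{
    for (uint32_t i = 0; i < type_count; ++i)
        if ((type_bits & (1u << i)) &&            /* type allowed for resource */
            (property_flags[i] & wanted) == wanted) /* has requested properties */
            return (int32_t)i;
    return -1; /* no suitable type: pick a weaker fallback or fail */
}
```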


1) Use a single large staging image/buffer and a single command buffer: submit, wait on a fence, loop.  This has to be slow, since the CPU and GPU must fully sync before each resource is uploaded.


2) Use multiple staging images/buffers as part of a large ring/circular buffer of memory, with multiple command buffers.  This is fairly complex to implement; not a huge deal, but for something that seems so simple it feels over-engineered.  Also, there doesn't seem to be any simple way to predict how much memory to use for staging.  I'd be worried that allocating a large amount of staging memory would affect performance (if I understand the spec correctly, staging memory could also serve as device memory in some implementations) and/or lead to out-of-memory situations.
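If it helps, the bookkeeping for option 2 can stay quite small. A sketch of a staging ring allocator, assuming monotonically increasing head/tail byte counters and a fence that lets you advance the tail once the GPU has consumed earlier uploads (all names hypothetical):

```c
#include <stdint.h>

/* Minimal sketch of option 2's staging ring. `head` and `tail` are
 * monotonically increasing virtual byte counters; the physical offset is
 * head % capacity. The tail advances when a fence signals that the GPU has
 * finished consuming earlier uploads. */
typedef struct {
    uint64_t capacity; /* size of the host-visible staging allocation */
    uint64_t head;     /* next free virtual byte (CPU writes here) */
    uint64_t tail;     /* oldest virtual byte still in flight on the GPU */
} StagingRing;

/* Carve `size` aligned bytes out of the ring; returns the physical offset
 * into the staging memory, or UINT64_MAX if the caller must first wait on
 * a fence and advance r->tail. */
static uint64_t ring_alloc(StagingRing *r, uint64_t size, uint64_t alignment)
{
    uint64_t head = (r->head + alignment - 1) & ~(alignment - 1);
    uint64_t phys = head % r->capacity;
    if (phys + size > r->capacity) { /* don't let a copy straddle the wrap */
        head += r->capacity - phys;
        phys = 0;
    }
    if (head + size - r->tail > r->capacity)
        return UINT64_MAX;           /* ring is full */
    r->head = head + size;
    return phys;
}
```

The sizing question stays a heuristic either way; the allocator just makes "wait on a fence when full" the only sync point instead of a wait per resource.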


3) Use buffers instead of images for staging both image and buffer data.  I don't know if this helps in any way, but I saw it mentioned in a forum online.  This still leaves the 'how much memory is too much memory to allocate' issue.
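For what it's worth, buffer staging does at least collapse the per-mip copies: all subresources can live in one VkBuffer and be transferred with a single vkCmdCopyBufferToImage call that takes an array of VkBufferImageCopy regions. A sketch using minimal local stand-ins for the Vulkan structs so it compiles without vulkan.h (field names match the real API, but only a subset of fields is shown):

```c
#include <stdint.h>

/* Local stand-ins for VkExtent3D and (a subset of) VkBufferImageCopy,
 * so this sketch compiles without vulkan.h. */
typedef struct { uint32_t width, height, depth; } Extent3D;
typedef struct {
    uint64_t bufferOffset; /* where this mip starts in the staging buffer */
    uint32_t mipLevel;
    Extent3D imageExtent;
} BufferImageCopy;

/* Fill one copy region per mip for a tightly packed RGBA8 2D texture and
 * return the total number of staged bytes. */
static uint64_t fill_regions(uint32_t w, uint32_t h, uint32_t mips,
                             BufferImageCopy out[])
{
    uint64_t offset = 0;
    for (uint32_t m = 0; m < mips; ++m) {
        uint32_t mw = (w >> m) ? (w >> m) : 1;
        uint32_t mh = (h >> m) ? (h >> m) : 1;
        out[m].bufferOffset = offset;
        out[m].mipLevel = m;
        out[m].imageExtent = (Extent3D){ mw, mh, 1 };
        offset += (uint64_t)mw * mh * 4; /* 4 bytes per RGBA8 texel */
    }
    /* With the real API, the copy is then one recorded call:
     * vkCmdCopyBufferToImage(cmd, stagingBuffer, image,
     *     VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, mips, regions); */
    return offset;
}
```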


I can't find any write-ups or presentations on this; any links, slides, talks, etc. that I've come across seem to gloss over this whole part.  I was hoping there would be a way to just dump a large amount of data directly to device memory and then cut it up into its various images/buffers after the fact.  How are you guys approaching this problem?
