  1. acerskyline

    Vulkan Sampler Array

    Thanks so much!
  2. I have 10 samplers as a uniform array in a fragment shader. When rendering, I only use 3 of them, so the descriptor set layout and descriptor set contain only 3 combined image samplers. That's why I am getting this validation layer warning: "validation layer: Shader expects at least 10 descriptors for binding 0.1 but only 3 provided". Is there any way to get rid of this other than changing layout(set = 0, binding = 1) uniform sampler2D lightTextureArray[10]; to layout(set = 0, binding = 1) uniform sampler2D lightTextureArray[3];? If I fill the remaining descriptors with default image info (null handles for the view and sampler), the validation layer complains about an invalid combined image sampler/view. The only workaround I can think of is to repeat the existing descriptors. Is there any other way around this? What is the standard way to solve this problem?
  3. acerskyline

    Is vkCmdPushDescriptorSetKHR efficient?

    Very helpful! Thanks!
  4. I am new to Vulkan. Compared to DX12, the resource binding procedure seems a little complicated. For per-draw-call uniform buffers, I currently have two approaches in mind, and both seem to work (I haven't tested/benchmarked them yet). Can someone shed some light on which is the standard way to do this, with pros and cons? You are welcome to suggest other ways, but I am not considering push constants because I want a universal approach that I can also apply to other kinds of uniform buffers (per pass, per frame, per scene, etc.).

     Approach A) One descriptor set and multiple buffers (one buffer per object). Bind the descriptor set to the pipeline only once before rendering. Bind buffers to the single descriptor set between draw calls using vkCmdPushDescriptorSetKHR.

     Approach B) Multiple descriptor sets (one descriptor set per object). (Multiple or single buffers; it doesn't really matter. If a single buffer is used, I think it's called a dynamic uniform buffer, and all I need to do is specify the offset when binding the descriptor set.) Bind the buffer(s) to the descriptor sets only once before rendering. Bind the matching descriptor set to the pipeline between draw calls using vkCmdBindDescriptorSets.

     I am used to DX12, so approach A seems more natural: in my understanding, vkCmdPushDescriptorSetKHR in Vulkan is just like SetGraphicsRootConstantBufferView in DX12, since both bind a buffer to the pipeline and the operation is buffered with other commands. But the Vulkan one does feel a little slower, mainly because 1) it is not a root descriptor, 2) Vulkan does not use GPU addresses, and 3) a descriptor set write operation is needed.
  5. acerskyline

    GPU memory allocator

    Thanks for your answers!
  6. Why do we need a GPU memory allocator? One of the most important reasons for a CPU memory allocator is to minimize memory fragmentation and increase the cache hit ratio. These goals are achieved by allocating one consecutive block of memory and provisioning it out to the application. Now, assuming I do not need dynamic allocation, do I still need a CPU memory allocator? IMO the only potentially plausible advantage is the cache hit ratio. In terms of GPU memory, most of my little demos do not need dynamic buffer or texture allocation (load & unload). Everything is loaded at initialization and never unloaded until the end of the application. Is there any reason to use a memory allocator in this situation?
  7. acerskyline

    D3D12 Fence and Present

    Thank you so much for answering all my questions! All your answers are very helpful. I learned a lot. Thanks again!
  8. acerskyline

    D3D12 Fence and Present

    Oh wait, I think I found a possible reason. Maybe it's because the copy operation in the blt model is not finished; it's holding the front buffer. There ARE 3 buffers (1 front, 2 back), but the display is currently using one of them (the front buffer, to copy from), so the GPU command list is blocked by it until the copy operation finishes. Is this valid?
  9. acerskyline

    D3D12 Fence and Present

    Question 1: does this mean the present-to-render-target barrier is unnecessary (since the entire command list stopped, as opposed to being executed but blocked at the barrier, because of some magic the driver(?) performed)? A separate question: according to the Microsoft DX12 page, the buffer count parameter of DXGI_SWAP_CHAIN_DESC is: So, question 2: in the above example, isn't the actual buffer count 4 (the number you created the swap chain with)? 1 of them is the front buffer and 3 are back buffers. Only this way can it support the point that Because if the top "colored block" is not part of the swapchain (meaning you created the swap chain with buffer count 3), why is the GPU blocked by it?
  10. acerskyline

    D3D12 Fence and Present

    Based on your reply, I changed the original Intel diagram a little, just to make sure I understand what you mean. The first diagram is the original one. The second is the one I made. The third has some marks so you know what I'm talking about. Looking at the third diagram, the red rectangle indicates what I changed: I made the GPU work last longer, which causes some other changes to the pipeline. Indicated by the yellow rectangle, I presume this is what you mean by . The GPU work lasts longer for that frame; consequently, the "present queue" has to wait for the GPU to finish that frame. Also, by I think you are saying that since the "present queue" will wait for the GPU work to finish, we might as well think of the frame as not being put in the "present queue" until the GPU finishes its work for it.
    1. My first question is: which way better visualizes what happens at the hardware level? (Even though they make no difference conceptually - it only changes where a "colored block" in the "present queue" starts, and the start does not matter as much as the end.)
    2. My second question: within the green rectangle, the (light blue) CPU thread is blocked by a fence (dark blue) and then blocked by Present (purple), am I right?
    3. My third question: within the blue rectangle, the brown "GPU thread" (command queue) is blocked by a present-to-render-target barrier, am I right?
  11. acerskyline

    D3D12 Fence and Present

    Yeah! I totally agree - I am waiting for this. So, if Present is a queued operation, why does this diagram indicate that the CPU thread generates two "colored blocks", one in the GPU queue and one in the "present queue", with a timeline that makes the Present look ahead of the actual rendering? Does it make sense?
  12. acerskyline

    D3D12 Fence and Present

    Sorry, I should have given a little explanation. What I'm asking is what will happen if the GPU hasn't finished rendering the frame but the Present is being "executed" to display it. The reason I didn't draw the rest of the pipeline is not that it's drained; I just don't think it's necessary to show the rest, since it's irrelevant, and it's also a lot of work to type ; ).
  13. acerskyline

    D3D12 Fence and Present

    Continuing my previous example - please bear with me. Now, assume one of the previous 3 frames is done - really done, as in on-screen - and the GPU workload for the current frame is very heavy.

    0. We have already completed steps 1 to 8 three times. (i = 1, 2, 3. Now i = 1 again.)
    1. WaitForSingleObject(i)
    2. Barrier(i) present->render target   <---- "GPU thread" (command queue) was here
    3. Record commands...
    4. Barrier(i) render target->present
    5. ExecuteCommandList
    6. Signal
    7. Present   <---- CPU thread was here
    8. Go to step 1

    cpu ... present|
    gpu ... barrier|----------------heavy work----------------|
    -----------------------------------------------------------------------------
    |   3   |   1   |
    |   2   |   3   |   1   |
    |   1   |   2   |   3   |   1   |
    -----------------------------------------------------------------------------
    screen |   3   |   1   |   2   |   3   |   ?   |

    My question is: what should the question mark be in the diagram above? Or can this even happen? Thanks!
  14. acerskyline

    D3D12 Fence and Present

    Doesn't calling WaitForSingleObject on a fence block the CPU thread? Also, I am wondering: does Present block the GPU thread? Assume I have called Present 3 times very quickly; before the 4th call to Present, I called ExecuteCommandList, then Signal, and then Present. So it looks like this:
    0. We have already completed steps 1 to 8 three times. (i = 1, 2, 3. Now i = 1 again.)
    1. WaitForSingleObject(i)
    2. Barrier(i) present->render target
    3. Record commands...
    4. Barrier(i) render target->present
    5. ExecuteCommandList
    6. Signal
    7. Present
    8. Go to step 1
    Under these circumstances, please answer the following questions:
    A. Step 1 may block the CPU thread if the previous work for frame 1 has not finished on the GPU. Am I right?
    B. Assume the previous work for frame 1 has finished on the GPU. Step 7 may block the CPU thread if none of the previous 3 frames is done - really done, as in on-screen. Am I right?
    C. If the answer to B is yes, the CPU thread will be blocked at step 7, but the command list has already been submitted - what happens to the GPU thread? Will it be blocked? If yes, by what? (I suspect by the barrier recorded in step 2, the present->render target barrier.) If no, where will the GPU render to when none of the previous 3 frames is really done, as in on-screen?
  15. I have been trying to figure out how fences and Present synchronize the pipeline when vsync is enabled. I have read https://computergraphics.stackexchange.com/questions/2166/how-does-vsync-affect-fps-exactly-when-not-at-full-vsync-fps, https://www.gamedev.net/forums/topic/677527-dx12-fences-and-swap-chain-present/, https://www.gamedev.net/forums/topic/679050-how-come-changing-dxgi-swap-chain-descbuffercount-has-no-effect/, https://software.intel.com/en-us/articles/sample-application-for-direct3d-12-flip-model-swap-chains and https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgiswapchain-present. But I'm still a little confused. My main question is: assuming triple buffering, will Present block the CPU thread, and if so, when? I made this picture; please tell me which combination is the correct situation for the next frame. In my opinion it should be B, E, H. But if it really is B, E, H, it doesn't conform to what link #4 suggests under the classic-mode section. As a matter of fact, I don't even understand how the GPU thread could be 2 vsyncs behind the CPU thread in the first place in that situation. Also, if it really is B, E, H, it doesn't conform to what Nathan Reed suggested in link #1. In his example the CPU thread doesn't seem throttled by Present or vsync at all; the CPU starts to work right after the GPU finishes its work.