I'm just starting to finish up a simple Vulkan wrapper library for some hobby projects, so anything I say should be taken with a grain of salt. Also I've only played with Vulkan and not DX12...
I've focused my fine-grained synchronization around VkFence. I have a VkFence wrapper with a function ExecuteOnReset(). I can pass it any function object, and when I reset the fence, all the stored functions get executed. When I have resources that need to be released/recycled at a later time (when they are no longer in use), I simply add the cleanup function to their associated fence. At some point in the future I will have to check/wait on that fence. When the fence is signaled, I present the associated swap buffer image and reset the fence, which causes all the associated cleanup functions to execute.
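A minimal sketch of the ExecuteOnReset() idea, for illustration only. The class layout, the Reset() and PendingCount() names, and the choice to leave the actual Vulkan calls as comments (so it compiles without the Vulkan headers) are assumptions, not the wrapper described above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical fence wrapper. A real version would own a VkFence and
// call vkWaitForFences / vkResetFences where the comments indicate.
class Fence {
public:
    // Queue a cleanup action to run the next time the fence is reset.
    void ExecuteOnReset(std::function<void()> fn) {
        assert(fn && "cleanup callback must be callable");
        pending_.push_back(std::move(fn));
    }

    // Call only after the fence has been observed signaled, i.e. the
    // GPU work guarded by it has finished.
    void Reset() {
        // A real wrapper would call vkResetFences(device, 1, &fence) here.
        // Swap into a local list first, so a callback that registers
        // new work does not modify the list being iterated.
        std::vector<std::function<void()>> ready;
        ready.swap(pending_);
        for (auto& fn : ready) {
            fn();
        }
    }

    // Number of callbacks still waiting for the next reset.
    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};
```

Callbacks registered while Reset() is running are kept for the following reset rather than run immediately, which seems the safer default since they presumably guard work that has not been submitted yet.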
It's surprisingly simple and efficient, and handles nearly 95% of my synchronization needs. I tried a couple of other methods and found this was by far the easiest to both implement and use. It was really one of those 'ah-ha' moments. All my other attempts at making a full-blown, all-bells-and-whistles resource manager were either very complex, inefficient, or awkward, and I found that no matter what I did I was always passing around VkFences to synchronize on anyway. So I eventually just decided to stick it all in the fence and be done with it.
I also cache/re-use command pools. My Device class allows manual creation/destruction of pools, but also lets you pull/return pools from a cache, so I'm not constantly re-creating them every frame. Coupled with the above Fence class, drawing is usually as simple as: request a command pool, create command buffers, fill buffers, submit buffers, pass pool to fence to be recycled. If I want to store/reuse the command buffers for later, that's trivial as well. I know a lot of people online are talking about creating the command buffers once, then using draw indirect. I have a hard time believing this will be a better option, but I could be wrong and have no data to go on. I'd love to see a proper benchmark comparing the two styles: dynamic/reused command buffers vs. static buffers with dynamic draw data manually uploaded.
The problem I find with fixing command pools or resources ahead of time is that you really don't know what/how many you'll need beforehand. If you're managing each thread 'by hand' it can probably work (i.e. I need 3 command pools for each thread to rotate on, one thread for physics, one for foreground objects, one for UI, etc...), but I'd rather just throw everything at a thread pool and let things work themselves out. On top of that, sometimes you want to re-use command pools, other times you want to recycle them. I found it quickly became impractical to manage. So the cache system works great. Any thread can pull from the cache, any thread can recycle to the cache. I can just toss all the rendering jobs at a thread/job pool without any pre-planning or thought; the command pools are recycled using the Fences, and the command buffers are returned via Futures from the threads. It's stupidly simple to use and implement, and I like that.
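A rough sketch of what such a pool cache could look like. The names CommandPoolCache, Acquire, Recycle, and CreatedCount are hypothetical, and PoolHandle is a stand-in for VkCommandPool so the example compiles without the Vulkan headers; the real create/reset calls are only indicated in comments.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Stand-in for VkCommandPool (assumption, to avoid the Vulkan headers).
using PoolHandle = std::uint64_t;

class CommandPoolCache {
public:
    // Take a pool from the cache, or create a new one if none is free.
    // Each pool is handed to exactly one caller at a time, which matters
    // because Vulkan command pools are externally synchronized.
    PoolHandle Acquire() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!free_.empty()) {
            PoolHandle pool = free_.back();
            free_.pop_back();
            return pool;
        }
        // A real cache would call vkCreateCommandPool here.
        return ++created_;
    }

    // Return a pool whose command buffers are no longer in flight.
    void Recycle(PoolHandle pool) {
        assert(pool != 0 && "null pool handle");
        // A real cache would call vkResetCommandPool before reuse.
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(pool);
    }

    // How many pools have been created so far (useful for diagnostics).
    std::uint64_t CreatedCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return created_;
    }

private:
    mutable std::mutex mutex_;
    std::vector<PoolHandle> free_;
    std::uint64_t created_ = 0;
};
```

In the workflow described above, the Recycle call would presumably be the function object handed to the fence's ExecuteOnReset(), so a pool only returns to the cache once the GPU is done with it.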
As far as updating dynamic data goes (apart from using push constants whenever possible), for the vast majority of buffer updates (matrices, shader constants, etc...) I'm using vkCmdUpdateBuffer(). This means I only need to allocate the buffer/memory once and can re-use it each frame (no buffer rotations necessary, but you do need pipeline barriers). For the rather rare cases where I actually need to dynamically upload data each frame and can't use push constants or vkCmdUpdateBuffer(), I'm writing two dynamic memory allocators. The first is a very simple slab/queue allocator designed to handle situations where allocations/frees occur in order. The second is a buddy allocator for situations where allocations/frees happen randomly.
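For the in-order case, one common shape is a ring-style sub-allocator that hands out offsets into one large buffer. The sketch below is an assumption about how such a queue allocator might work, not the allocator mentioned above; QueueAllocator, Allocate, and FreeOldest are hypothetical names, and alignment is taken relative to the start of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical FIFO sub-allocator: offsets are handed out in order and
// must be freed in the same order (oldest first).
class QueueAllocator {
public:
    explicit QueueAllocator(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // On success, writes the offset and returns true. Returns false when
    // no contiguous free range is large enough.
    bool Allocate(std::size_t size, std::size_t alignment, std::size_t& offset) {
        assert(size > 0);
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        if (used_ == capacity_) return false;   // completely full
        if (used_ == 0) head_ = tail_ = 0;      // empty: restart at offset 0

        const std::size_t aligned = (head_ + alignment - 1) & ~(alignment - 1);
        // Once head_ has wrapped behind tail_, free space ends at tail_;
        // otherwise it ends at the end of the buffer.
        const std::size_t limit = (head_ < tail_) ? tail_ : capacity_;
        std::size_t consumed = 0;
        if (aligned + size <= limit) {
            consumed = (aligned - head_) + size;    // padding + payload
            offset = aligned;
        } else if (head_ >= tail_ && size <= tail_) {
            consumed = (capacity_ - head_) + size;  // skipped end + payload
            offset = 0;                             // wrap to the start
        } else {
            return false;
        }
        head_ = offset + size;
        used_ += consumed;
        live_.push_back(consumed);
        return true;
    }

    // Release the oldest live allocation, including its padding.
    void FreeOldest() {
        assert(!live_.empty());
        const std::size_t consumed = live_.front();
        live_.pop_front();
        tail_ = (tail_ + consumed) % capacity_;
        used_ -= consumed;
    }

private:
    std::size_t capacity_;
    std::size_t head_ = 0;          // next byte to hand out
    std::size_t tail_ = 0;          // start of the oldest live allocation
    std::size_t used_ = 0;          // live bytes, padding included
    std::deque<std::size_t> live_;  // bytes consumed per allocation, in order
};
```

FreeOldest() fits naturally as a fence cleanup callback, since per-frame uploads become free in the same order the frames were submitted. The buddy allocator for out-of-order frees needs more bookkeeping and is not sketched here.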
I'm not claiming that what I've done is optimal, just thought I'd throw it up for discussion/idea purposes. I'm interested as well to see what others have done/are planning to do.