Yeah, your three-level organization matches up with what I do too. Here's my front-end advice:
Honestly, 99% of the time most of the new features aren't actually required by your high-level layer, so many of them can be used purely as internal implementation details.
I use a D11-style state-setting API when constructing my draw-items (i.e. depth-stencil and blend modes are set individually, not as a PSO), but all state is baked into a PSO as part of that draw-item creation process. This makes the draw-item creation API friendly to users, but the execution of draw-items is still stupidly fast thanks to all the precomputation.
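A minimal sketch of that baking step, assuming a hypothetical builder API (all names here are illustrative, not the author's actual code): states are set individually D11-style, then `Finish()` snapshots them into a key and looks up (or creates) the monolithic PSO once, so draw-item execution later is just "bind handle, draw":

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

enum class BlendMode : uint8_t { Opaque, AlphaBlend, Additive };
enum class DepthTest : uint8_t { Off, Less, LessEqual };

struct PsoKey {  // full state snapshot, used as a cache key
    uint16_t shaderId = 0;
    BlendMode blend = BlendMode::Opaque;
    DepthTest depth = DepthTest::Less;
    bool operator==(const PsoKey& o) const {
        return shaderId == o.shaderId && blend == o.blend && depth == o.depth;
    }
};
struct PsoKeyHash {
    size_t operator()(const PsoKey& k) const {
        return std::hash<uint32_t>()(uint32_t(k.shaderId) << 16 |
                                     uint32_t(k.blend) << 8 | uint32_t(k.depth));
    }
};

struct DrawItem { uint32_t psoHandle; /* + geometry, bindings, ... */ };

class DrawItemBuilder {
public:
    // D11-style: states set one at a time, friendly to the user.
    void SetShader(uint16_t id)    { key_.shaderId = id; }
    void SetBlendMode(BlendMode b) { key_.blend = b; }
    void SetDepthTest(DepthTest d) { key_.depth = d; }

    // Baking: resolve the full state snapshot to one precomputed PSO.
    DrawItem Finish(std::unordered_map<PsoKey, uint32_t, PsoKeyHash>& psoCache) {
        auto it = psoCache.find(key_);
        if (it == psoCache.end())  // real code would create the API PSO here
            it = psoCache.emplace(key_, uint32_t(psoCache.size())).first;
        return DrawItem{ it->second };
    }
private:
    PsoKey key_{};
};
```

The point of the cache is that two draw-items built with identical state resolve to the same PSO handle, so the precomputation cost is paid once per unique state combination.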
D3D11 already has the deferred context, which maps to modern command buffers. GL has some command buffer extensions. In D3D9/GL you can emulate command buffers yourself - obviously with worse performance characteristics than the real thing! So I expose command buffers in my front-end, but also a capability variable saying whether they're the real deal (D12/Vulkan), semi-real (D11) or emulated. If the high level wants to use a command buffer to move an entire large chunk of rendering work onto another thread (e.g. something that does its own computations as well as generating low-level commands), then even emulated command buffers are useful, as they let you move all those non-low-level computations onto another thread. If the high level simply wants to process 1000 draw-items as fast as possible by splitting that low-level-only work over several threads, then emulated command buffers are not helpful. This makes for a uniform API, but the high level is responsible for choosing whether to use the feature, based on how it intends to use it and the reported performance characteristics.
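That capability variable and the policy decision it drives could look something like this (a hedged sketch; the enum, struct and helper are hypothetical names, not a real API):

```cpp
#include <cassert>

// Reported fidelity of the command-buffer feature, per back-end.
enum class CmdBufferSupport {
    Emulated,    // D3D9 / GL software emulation
    SemiNative,  // D3D11 deferred contexts
    Native       // D3D12 / Vulkan
};

struct DeviceCaps { CmdBufferSupport cmdBuffers; };

// The high level's choice from the post: splitting pure low-level draw
// submission across threads only pays off when the command buffers are
// at least semi-real; emulated ones add overhead without parallel gain.
bool ShouldSplitDrawSubmission(const DeviceCaps& caps) {
    return caps.cmdBuffers != CmdBufferSupport::Emulated;
}

// By contrast, moving a whole self-contained rendering job (including
// its own non-API computation) to another thread is worthwhile on any
// tier, so no capability check is needed for that use case.
```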
Resource heaps in D12/Vulkan can be kept entirely as an internal detail without doing you too much harm. If you do want to make full use of them, you can expose them to the high level in a way where resources with common lifetimes are known to the back end. E.g. instead of having the high level create 10 textures that all share the same lifetime (loaded at the start of a level, unloaded at the end) via 10 individual function calls, make a resource creation API where the high level can complete that task with a single function call; the back end can then easily and invisibly put them all into a shared heap and track them as a single allocation (which simplifies your residency management).
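A sketch of what that batched-lifetime creation call might look like (all names hypothetical): one call creates the whole group, so the back end can place everything in one heap and treat it as a single allocation for residency purposes:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct TextureDesc { uint32_t width, height; };
struct TextureHandle { uint32_t heapId, indexInHeap; };

struct Backend {
    uint32_t nextHeapId = 0;

    // One call for resources sharing a lifetime (e.g. one level's
    // textures): they all land in a single shared heap, invisibly.
    std::vector<TextureHandle> CreateTextures(const std::vector<TextureDesc>& descs) {
        uint32_t heap = nextHeapId++;         // one heap for the whole group
        std::vector<TextureHandle> handles;
        for (uint32_t i = 0; i < descs.size(); ++i)
            handles.push_back({heap, i});     // sub-allocate within the heap
        return handles;
    }

    // End of level: one lifetime/residency operation frees everything.
    void DestroyHeap(uint32_t /*heapId*/) { /* release the whole heap */ }
};
```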
Descriptors again can be hidden internally without hurting you too much -- dynamic descriptor management via a ring buffer, if done well, is still faster than D11's binding model :)
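For illustration, the core of such a dynamic descriptor ring buffer might look like this (a minimal sketch, not the author's implementation; real code must also track a fence'd tail so it never overwrites descriptors the GPU is still reading):

```cpp
#include <cassert>
#include <cstdint>

// Per-frame descriptors are written sequentially into a GPU-visible
// heap; allocation is just a bump pointer that wraps at the end.
class DescriptorRing {
public:
    explicit DescriptorRing(uint32_t capacity) : capacity_(capacity) {}

    // Allocate `count` contiguous descriptor slots; returns the first
    // slot index. Wraps to the start when the range would run off the
    // end (a real ring would first wait on the retirement fence here).
    uint32_t Allocate(uint32_t count) {
        if (head_ + count > capacity_)
            head_ = 0;
        uint32_t first = head_;
        head_ += count;
        return first;
    }
private:
    uint32_t capacity_;
    uint32_t head_ = 0;
};
```

The appeal over D11's binding model is that allocation is a couple of integer ops and descriptors are written once per use, with no per-slot hazard tracking by the driver.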
I abstract away reusable/static descriptor sets by exposing a resource binding system that's a slightly modified version of D11's... In D9 we bound individual constants (uniforms) to shaders, and then in D11 we bound constant buffers (UBOs) to shaders instead. I do the same thing for textures -- instead of binding individual textures to a shader, I only allow the user to bind "texture lists", which are collections of texture bindings. In HLSL this maps to a contiguous range of t# registers in the shader; on D11's C++ side it maps to a single call to *SSetShaderResources, but in D12/Vulkan it maps to a reusable descriptor table.
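A rough sketch of the texture-list idea (names hypothetical): the front-end only ever binds a whole list, and each back-end maps that one operation onto its natural primitive -- a single contiguous `PSSetShaderResources` range on D11, a prebuilt descriptor table on D12/Vulkan:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using TextureView = uint64_t;  // stand-in for an SRV / image view

struct TextureList {
    uint32_t baseSlot;                // first t# register it occupies
    std::vector<TextureView> views;   // the contiguous bindings
    uint32_t descriptorTable = 0;     // baked once, reused on D12/Vulkan
};

struct D3D11Backend {
    // One call covers the whole contiguous t# range, as in the post:
    //   ctx->PSSetShaderResources(list.baseSlot,
    //                             UINT(list.views.size()), srvs);
    void Bind(const TextureList& list) {
        lastBoundBase  = list.baseSlot;
        lastBoundCount = uint32_t(list.views.size());
    }
    uint32_t lastBoundBase = 0, lastBoundCount = 0;
};
```

The key design property is that the list is immutable after creation, which is exactly what lets the D12/Vulkan back-end bake it into a reusable descriptor table instead of rebuilding descriptors per draw.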
Internally you'll use fences/etc. to manage internal descriptor ring buffers and upload ring buffers, but the high-level code doesn't have much use for them. However, you've been able to implement fences since D9 -- they are not a new feature, so it's entirely possible to expose them in your front-end. Personally, I have some capability flags that specify whether a back-end allows the CPU/GPU to signal a fence, and whether it allows the CPU/GPU to wait on a fence (4 bits). Every API lets the GPU do the signalling and the CPU do the waiting, which is what you need to build a safe CPU->GPU upload ring buffer, but the new APIs allow all 4 communication options. I expose them in my front-end, but the high level hasn't actually used them yet.
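Those four capability bits could be expressed like so (flag names are illustrative, not the author's actual identifiers):

```cpp
#include <cassert>
#include <cstdint>

// The four fence-capability bits: who may signal, and who may wait.
enum FenceCaps : uint32_t {
    FenceCap_GpuSignal = 1u << 0,  // supported by every API since D9
    FenceCap_CpuWait   = 1u << 1,  // likewise
    FenceCap_CpuSignal = 1u << 2,  // new APIs only
    FenceCap_GpuWait   = 1u << 3,  // new APIs only
};

// GPU-signal + CPU-wait is the minimum needed for a safe CPU->GPU
// upload ring buffer, and every back-end can report at least that.
constexpr uint32_t kUploadRingCaps = FenceCap_GpuSignal | FenceCap_CpuWait;

constexpr bool CanBuildUploadRing(uint32_t caps) {
    return (caps & kUploadRingCaps) == kUploadRingCaps;
}
```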
I dealt with barriers as an internal detail, and then exposed them to the high level... but then decided the impact on the high-level was too great and annoying, so went back to implementing them internally. However, I left behind a "hint" system, where the high-level can optionally inform the back-end about optimal points in the command buffer for transitions to take place.
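The hint mechanism can stay almost invisible to the high level -- something like this sketch (hypothetical names), where barriers remain internal but the high level may optionally mark cheap transition points, e.g. between major passes:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct CommandStream {
    std::vector<uint32_t> barrierHints;  // command indices marked by hints
    uint32_t commandCount = 0;

    void Draw() { ++commandCount; }      // stand-in for real command recording

    // Optional hint: "now would be a good point to flush any pending
    // resource transitions". The back end is free to ignore it.
    void HintTransitionPoint() { barrierHints.push_back(commandCount); }
};
```

The back end still computes and inserts the actual barriers itself, so the high level pays no correctness cost for skipping the hints; it only forgoes some scheduling quality.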
D12 and Vulkan mostly match up to each other when it comes to features. D12 smooths over a few things at a higher level than Vulkan though, plus Vulkan has its whole weird render pass system where you declare the upcoming render target bindings ahead of time, which is important knowledge for any GPU with dedicated render-target memory buffers...