I get the feeling AMD/Nvidia are still scrambling to get this all working properly, but to be fair there have been a lot of changes and there are many APIs to support currently.
Vulkan feels very solid, not the early-beta kind of thing I had expected. Probably because it's similar to the existing DX12/Mantle, and a Vulkan driver requires only a fraction of the complexity of an OpenGL driver.
The downside is that initially you need a LOT of code to get even simple things going, but it does not hide the hardware behind a black box like OpenGL does, so in the end development is faster (less guessing and trial and error to find the fastest of many possible ways).
Validation layers tell you pretty much everything you do wrong in plain English. How often did I get stuck with OpenGL not rendering anything and have no clue why - this never happens with Vulkan.
The only thing I miss is profiler support - AMD CodeXL does not yet work with Vulkan, which is why I still develop in OpenCL first and then port to GLSL with the help of a preprocessor.
(I don't think RenderDoc shows the interesting things like occupancy, register usage etc.)
Maybe it's better on Nvidia, don't know.
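To illustrate the OpenCL-first, port-to-GLSL workflow mentioned above, here is a minimal sketch of what such a preprocessor pass could look like, written in Python. The function name `opencl_to_glsl` and the rule table are assumptions made up for illustration; the actual preprocessor is not shown in this post and very likely works differently. The sketch only rewrites a few well-known built-ins and type names, and it ignores things a real port has to handle (uint vs. int casts, kernel signatures and buffer bindings, `shared` declarations having to live at global scope in GLSL).

```python
import re

_AXES = "xyz"


def _axis_rule(opencl_name, glsl_name):
    """Build a rule mapping e.g. get_global_id(1) to gl_GlobalInvocationID.y."""
    pattern = re.compile(r"\b" + opencl_name + r"\(\s*([012])\s*\)")
    return pattern, lambda m: glsl_name + "." + _AXES[int(m.group(1))]


# Hypothetical rule table: (compiled pattern, replacement) pairs, applied in order.
_RULES = [
    _axis_rule("get_global_id", "gl_GlobalInvocationID"),
    _axis_rule("get_local_id", "gl_LocalInvocationID"),
    _axis_rule("get_group_id", "gl_WorkGroupID"),
    _axis_rule("get_local_size", "gl_WorkGroupSize"),
    # Work-group barrier on local memory becomes a shared-memory barrier.
    (re.compile(r"\bbarrier\(\s*CLK_LOCAL_MEM_FENCE\s*\)"),
     "memoryBarrierShared(); barrier()"),
    # Address space qualifier for work-group local memory.
    (re.compile(r"\b__local\b"), "shared"),
    # Vector types: float4 -> vec4, int2 -> ivec2, uint3 -> uvec3.
    (re.compile(r"\bfloat([234])\b"), r"vec\1"),
    (re.compile(r"\bint([234])\b"), r"ivec\1"),
    (re.compile(r"\buint([234])\b"), r"uvec\1"),
]


def opencl_to_glsl(source):
    """Apply every rewrite rule to the OpenCL C source text and return the result."""
    for pattern, replacement in _RULES:
        source = pattern.sub(replacement, source)
    return source


if __name__ == "__main__":
    line = "float4 v = data[get_global_id(0)]; barrier(CLK_LOCAL_MEM_FENCE);"
    print(opencl_to_glsl(line))
```

The point of a table of regex rules is that the OpenCL source stays the single version you profile and debug, and the GLSL is regenerated from it whenever the kernel changes.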
The persistence of register state surprised me and is quite interesting - I wonder whether AMD can do that in their new generation.
I'd guess any hardware can do this to some extent, but I doubt it will be exposed in the next Vulkan version or SM 6.0.
It's probably used for things like keeping static shader input parameters in registers, keeping the texture cache in LDS for the next pixel, etc.
The same goes for similar things like device-side enqueue (OpenCL 2.0) - AFAIK this is in Mantle, but in neither DX12 nor Vulkan.