D3D vs Vulkan -- for D3D vs GL I'd go with D3D without hesitation, but D3D12 and Vulkan are pretty much the same as each other. D12 is a bit easier, IMHO.
RTX is just NVIDIA's marketing buzzword for "supports real-time ray tracing (RTRT) APIs".
Metalness + color vs diffuse color + specular color, and roughness vs glossiness -- it's much of a muchness. They're two ways of encoding the exact same information, so you can easily support both (which could be useful if you're sourcing artwork from different places) -- see the conversion sketch below. Cavity maps are an additional feature that works with both encodings in the same way.
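For what it's worth, converting between the two encodings is only a couple of lines. This is a rough sketch with made-up names/structs, assuming the common dielectric F0 of ~0.04 -- not any particular engine's code:

```cpp
// Metals take their specular colour from the base colour and have no
// diffuse term; dielectrics get a constant grey specular (~0.04).
#include <cstdio>

struct Vec3 { float r, g, b; };

static Vec3 lerp(Vec3 a, Vec3 b, float t) {
    return { a.r + (b.r - a.r) * t,
             a.g + (b.g - a.g) * t,
             a.b + (b.b - a.b) * t };
}

static void metalnessToSpecular(Vec3 baseColor, float metalness, float roughness,
                                Vec3& diffuse, Vec3& specular, float& glossiness) {
    const Vec3 dielectricF0 = {0.04f, 0.04f, 0.04f};  // common approximation
    diffuse    = lerp(baseColor, {0.0f, 0.0f, 0.0f}, metalness);  // metals: no diffuse
    specular   = lerp(dielectricF0, baseColor, metalness);
    glossiness = 1.0f - roughness;  // roughness/glossiness are just inverses
}

int main() {
    Vec3 diffuse, specular;
    float glossiness;
    metalnessToSpecular({1.0f, 0.8f, 0.3f}, 1.0f, 0.4f, diffuse, specular, glossiness);
    std::printf("specular = %.2f %.2f %.2f, gloss = %.2f\n",
                specular.r, specular.g, specular.b, glossiness);
}
```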
Super-sampling is just rendering at a higher resolution than the screen, e.g. drawing to a 4K texture and then resizing it down to 1080p for display on a 1080p screen. Dynamic resolution is the same thing, except you pick a different intermediate/working resolution each frame based on your framerate. Often it's used to under-sample (render at a lower resolution than the screen). A minimal sketch of the idea is below.
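Something like this is the core of it -- the step factors and clamp range here are made up for illustration, and real engines use smarter controllers (e.g. driven by GPU timings rather than raw frame time):

```cpp
// Nudge the working resolution down when a frame misses its budget,
// and back up when there's headroom.
#include <algorithm>
#include <cstdio>

float updateRenderScale(float scale, float frameMs, float budgetMs) {
    if (frameMs > budgetMs)
        scale *= 0.9f;                     // over budget: under-sample a bit more
    else if (frameMs < budgetMs * 0.8f)
        scale *= 1.1f;                     // lots of headroom: scale back up
    return std::clamp(scale, 0.5f, 2.0f);  // >1 is super-sampling, <1 is under-sampling
}

int main() {
    float scale = 1.0f;
    scale = updateRenderScale(scale, 20.0f, 16.6f);  // a slow frame on a 60Hz budget
    std::printf("new render scale: %.2f\n", scale);  // intermediate target = scale * screen res
}
```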
If money and time aren't an issue, implement them both and see which one performs better on your specific game scenes.
I've been working on an indie game in a custom engine mostly full-time for years, so I'm kind of doing this. Before that I worked professionally as a graphics programmer on a game engine team, so I knew what I wanted -- first and foremost, my new renderer had to be easy for a graphics programmer to work with: easy to experiment with new features, easy to change. No two games that I've worked on have ever used the same renderer, so I knew my ideal renderer had to make switching out algorithms easy.
Our game started off as traditional deferred + traditional forward (able to switch between them at runtime), then tiled deferred + tiled forward (still switchable at runtime), then clustered forward (only).
Other features like shadows (many techniques), SSAO, reflection probes, SSR, motion blur, planar mirrors / portals, etc., occasionally need to be added or experimented with, so there needs to be enough flexibility to slot these (or techniques that haven't yet been invented) into the pipeline.
One of my inspirations for this was Horde3D's data driven rendering pipelines, where you told the engine how to render a scene with an XML file! I managed to convert Horde3D from traditional deferred to Inferred Rendering in a weekend by only writing a little bit of XML and GLSL. That impressed me a lot as a graphics programmer (it was so much nicer than the 'professional' engine I was using at work at the time...)
This concept has largely caught on and is now commonly referred to as a "frame graph". Each step of an algorithm/technique is represented as a single input -> process -> output node, and then a data/configuration/script file uses those nodes to build a graph of instructions for how the frame will be drawn. This makes it very easy to modify the frame rendering algorithms over time and to experiment with new features, but it also allows the engine to perform lots of optimisation when it comes to D3D12 resource transition barriers / VK render passes, render target memory allocation and aliasing, and async compute as well!
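To make the shape of it concrete, here's a toy frame graph in C++17 -- an entirely made-up API sketching the idea, not any engine's real interface:

```cpp
// Each pass declares what it reads and writes up front. This toy just
// runs passes in declaration order; a real compile step would use the
// same declarations to cull dead passes, insert transition barriers,
// alias render-target memory, and schedule async compute.
#include <cstdio>
#include <functional>
#include <set>
#include <string>
#include <vector>

using Resource = int;  // handle to a transient render target

struct Pass {
    std::string name;
    std::set<Resource> reads, writes;
    std::function<void()> execute;
};

struct FrameGraph {
    std::vector<Pass> passes;

    void addPass(std::string name, std::set<Resource> reads,
                 std::set<Resource> writes, std::function<void()> fn) {
        passes.push_back({std::move(name), std::move(reads),
                          std::move(writes), std::move(fn)});
    }

    void execute() {
        for (auto& p : passes) {
            std::printf("pass: %s\n", p.name.c_str());
            p.execute();
        }
    }
};

int main() {
    FrameGraph fg;
    const Resource gbuffer = 0, hdr = 1, backbuffer = 2;
    fg.addPass("gbuffer",  {},        {gbuffer},    [] { /* raster the scene */ });
    fg.addPass("lighting", {gbuffer}, {hdr},        [] { /* shade */ });
    fg.addPass("tonemap",  {hdr},     {backbuffer}, [] { /* post-process */ });
    fg.execute();
}
```

Swapping a technique in or out then becomes a matter of changing which passes get added -- the same property that made Horde3D's XML pipelines so pleasant to experiment with.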