1) How does this work in practical terms?  Is the CPU side, "generating the draw calls", just building the command lists with calls to DrawIndexedInstanced and the like?  And then to actually perform the rendering, the GPU side, you call ExecuteCommandLists?
That's not exactly how it works. Calling ExecuteCommandLists() is closer to the equivalent of calling draw() on an immediate context in the older APIs. It's like a very efficient draw(), because the hard work of building the commands has supposedly been done already.
What happens next is that the driver/OS queues those submissions (draws and ExecuteCommandLists calls), and the GPU processes them in the order they were received. That's the queue you're concerned about.
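To make the recording/submission split concrete, here is a minimal sketch of the D3D12 submission shape (not compilable on its own; cmdList, commandQueue, and indexCount are placeholder names, and error handling, barriers, and pipeline setup are all omitted):

```cpp
// Recording: this is the part that is cheap per call and can be
// parallelized across threads.
cmdList->DrawIndexedInstanced(indexCount, 1, 0, 0, 0);
cmdList->Close();                              // finish recording the list

// Submission: one call hands the whole list to the queue. This is the
// part that plays the role of a very efficient draw().
ID3D12CommandList* lists[] = { cmdList };
commandQueue->ExecuteCommandLists(1, lists);
```

Nothing about ExecuteCommandLists() forces the GPU to start right away; it just appends work to the queue described above.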
So you don't really have to do anything to make the GPU run behind the CPU. It is already behind, and the more GPU-bound you are, the further behind the CPU it will be. (If you are totally CPU-bound, then the GPU is zero steps behind the CPU.)
What you can do is take measures to limit how far ahead the CPU can get, by waiting on the CPU side for the GPU to reach a certain point before submitting commands again (you can use fences, for example). There are two main reasons to limit the CPU's lead. One is memory: you have to keep things alive for as long as the GPU might use them, so for buffers, the higher the number of commands in flight, the bigger the buffers have to be. (We're not talking about vsync and double buffering here, which synchronize the GPU with the screen, but about constant buffer updates, dynamic vertex data, render states and so on, which synchronize the CPU with the GPU.) The other is latency: if you record commands too far in advance, the player sees the result of their actions long after they've performed them.
TL;DR: what I wanted to say is that ExecuteCommandLists() is not a message for the GPU to start immediately. There are more queues involved, and that's what is meant by letting the GPU run behind the CPU.
2) In terms of multi-threaded rendering, is that a misnomer?  Are the other threads just generating draw calls, with the main rendering thread being the only thing that actually calls ExecuteCommandLists?  Or can you simultaneously render to various textures, and then your main rendering thread uses them to generate a frame for the screen?
I don't think anybody (who knows how things work) actually thinks that. Your GPU mostly accepts work in a serial manner (while being able to process it in a massively parallel way). Note: in d3d12 you are also able to submit work to separate engines (which begin and end work on separate queues, but target the same processing units). BUT the word "multithreaded" refers, of course, to the CPU-side building of commands. Building those commands is expensive; the submitting part is much less so. If you factor out the building, you can reduce the serial cost to mostly just submitting, and do the building in parallel (the multithreaded part). On older APIs like d3d9 and d3d10 (discounting multithreaded d3d11, which was supposed to work more like d3d12 does today but didn't, for a variety of reasons), building and submitting were basically one and the same, and because submitting had to happen in a serialized fashion, you couldn't get much advantage from using multiple CPU cores to build the commands (you could get some, but let's not go there).