Yes, something like that. That's a typical wrapper.
Originally, it had a d3d9.dll that got loaded by the game, probably because it has the same name as the actual Windows library. That means that the game would load the "fake" d3d9.dll and then that dll would call the proper library?
Yes and no. A DLL as such does not call functions in a program; it is just a PE image that is loaded into the address space of a process. However, a thread running the executable code in a DLL could certainly call functions in the program, provided that you are able to determine the function's address and know what it does (and what parameters it takes, etc.).
So, with those methods one could load a DLL, but how would that DLL interact with the process? Can the DLL call functions in the .exe?
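To make the "determine the address and know the parameters" part concrete, here is a minimal portable sketch. It is not real injection code: `host_add` is a hypothetical stand-in for a function inside the game's exe, and `call_at_address` is a made-up helper name. In a real target you would only have the address (found by reverse engineering) and would also need to match the calling convention.

```c
#include <stdint.h>

/* Hypothetical stand-in for a function that lives inside the host program.
   In a real target you would not have this source, only its address. */
int host_add(int a, int b)
{
    return a + b;
}

/* What you must know in advance: the exact signature.
   On 32-bit Windows the calling convention would be part of it too. */
typedef int (*host_add_fn)(int, int);

/* Turn a raw address into a typed function pointer and call through it.
   The integer-to-function-pointer cast is implementation-defined in C,
   but it behaves as expected on mainstream desktop platforms. */
int call_at_address(uintptr_t address, int a, int b)
{
    host_add_fn fn = (host_add_fn)address;
    return fn(a, b);
}
```

Calling `call_at_address((uintptr_t)&host_add, 2, 3)` goes through the raw address and ends up in `host_add`. If the assumed signature is wrong, the call will typically crash or corrupt the stack, which is why knowing the parameters matters as much as knowing the address.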
Usually, it works the other way around. Normally, the injected DLL replaces some original functionality, and it's a thread from the original process that calls (without realizing) a function in the DLL.
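As a rough model of that "other way around" flow, the sketch below uses a plain function pointer table as a stand-in for the program's import table. All names (`import_table`, `program_frame`, `install_hook`, and so on) are hypothetical and only illustrate the idea; a real hook would patch the PE import table or the function's code inside the target process.

```c
/* Signature shared by the original function and its replacement. */
typedef int (*scale_fn)(int);

/* The functionality the program originally shipped with. */
int original_scale(int x)
{
    return x * 2;
}

/* Stand-in for the import table: the program always calls through it. */
scale_fn import_table[1] = { original_scale };

/* The program's own code. It has no idea what the table points to. */
int program_frame(int x)
{
    return import_table[0](x);
}

/* Everything below models code that would live in the injected DLL. */
static scale_fn saved_original;
static int hook_calls;

/* Replacement: counts the call, forwards to the original, alters the result. */
int replacement_scale(int x)
{
    ++hook_calls;
    return saved_original(x) + 1;
}

/* Redirect the table entry, remembering the original so we can forward. */
void install_hook(void)
{
    if (import_table[0] != replacement_scale) {
        saved_original = import_table[0];
        import_table[0] = replacement_scale;
    }
}

int hook_call_count(void)
{
    return hook_calls;
}
```

Before `install_hook()` runs, `program_frame(10)` reaches `original_scale`; afterwards the same call from the same thread lands in `replacement_scale`, without the program's code having changed. Forwarding to the saved original is also what a wrapper DLL does for the functions it does not want to alter.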
It is a small separate program that launches the "real" program and debugs it, using CreateProcess, DebugActiveProcess, and WaitForDebugEvent.
Would the debugger need to be compiled into the exe, or is it just another DLL that can be injected?
Of course, most non-trivial titles will take more or less sophisticated anti-debugger measures.
Also note that you are almost certainly breaking the EULA with that (read as: using the software without permission, which can have legal consequences in many countries).
That's all well and good, but it's almost certainly a breach of the EULA. Most probably nobody will care, but in the worst case that guy could find himself in a $50M tortious interference lawsuit.
So this guy comes along with a profiler, tracks down the issue, and writes assembly replacements for some critical functions in Skyrim's exe.
It is highly unlikely that Skyrim for PC shipped as a debug build. The likely reason why the developers of Skyrim didn't do the same optimizations in the first place (you have to assume they're not complete idiots either!) is that performance was deemed "good enough" on the targeted hardware and that spending extra time on optimizing (and delaying the product) was therefore economically not viable.
Two years ago, one might have added "or maybe they didn't want to depend on SSE2 or have an extra code path", but this is unlikely now, seeing how no modern version of Windows runs without SSE2 anyway.
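For anyone wondering what "an extra code path" looks like in practice, here is a small sketch. It has nothing to do with Skyrim's actual code: the function names are made up, and the choice between paths is made at compile time to keep the example short, whereas a shipped game would more likely check the CPU at startup (for example via CPUID).

```c
#include <stddef.h>

/* Detect SSE2 at compile time (GCC/Clang and MSVC macros). */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Baseline path: works on any CPU. */
void add_scalar(double *dst, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] + b[i];
}

#if HAVE_SSE2
/* SSE2 path: adds two doubles per instruction, scalar loop for the tail. */
void add_sse2(double *dst, const double *a, const double *b, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_fn)(double *, const double *, const double *, size_t);

/* Pick one implementation; callers only ever see the function pointer. */
add_fn pick_add(void)
{
#if HAVE_SSE2
    return add_sse2;
#else
    return add_scalar;
#endif
}
```

The cost the quoted remark alludes to is visible even here: two implementations to write, test, and keep in sync, plus the dispatch. Once SSE2 can be assumed everywhere, the baseline path and the dispatch simply go away.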