The problem isn't DirectX itself, the problem is the PC-specific implementations of DirectX that are lacking these low-level features.
And the question really should be: is that low a level even feasible on a PC?
Let's just talk draw calls. The 360 and the PS3 GPUs read the same memory that the CPU writes, so the API level is just blasting the draw call and state information out of the CPU caches to main RAM. On a PC, that data has to be flushed, then DMA'd (which depends on hardware that is neither the GPU nor the CPU) to the GPU's memory, where it can be processed. It may not seem like much, but it's a substantial amount of work to make all of this happen reliably.
Even if the PC version could be stripped down to the barest essentials, you would still see large discrepancies in the number of things you could draw without instancing. The PC draw call limits haven't really gone up in years despite huge performance improvements in GPU chipsets. This is because that type of data transfer (small block DMA) is not what the rest of the PC architecture has been made to handle.
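To make the instancing point concrete, here is a rough sketch of why it helps when the per-call cost is the bottleneck. It is not tied to any real graphics API: `plan_draw_calls` and the scene layout are hypothetical names made up for illustration, and the only thing it measures is how many submissions the CPU would have to push across to the GPU.

```python
from collections import defaultdict

def plan_draw_calls(objects, use_instancing):
    """Return a list of (mesh, material, transforms) submissions.

    Hypothetical helper: each tuple stands in for one draw call.
    """
    if not use_instancing:
        # One submission per object: every object pays the per-call cost.
        return [(o["mesh"], o["material"], [o["transform"]]) for o in objects]

    # Group objects that share mesh and material into one instanced submission.
    batches = defaultdict(list)
    for o in objects:
        batches[(o["mesh"], o["material"])].append(o["transform"])
    return [(mesh, material, transforms)
            for (mesh, material), transforms in batches.items()]

# Made-up scene: 500 trees and 300 rocks, differing only in position.
scene = [{"mesh": "tree", "material": "bark", "transform": (float(x), 0.0, 0.0)}
         for x in range(500)]
scene += [{"mesh": "rock", "material": "stone", "transform": (0.0, 0.0, float(z))}
          for z in range(300)]

print(len(plan_draw_calls(scene, use_instancing=False)))  # 800
print(len(plan_draw_calls(scene, use_instancing=True)))   # 2
```

The same 800 objects get drawn either way; what changes is that the small per-call transfers collapse into two larger ones, which is the kind of transfer the PC path copes with better.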
Either that or we get rid of the GPU and go from ~10 core CPUs to ~1000 core CPUs. I would prefer that tbh, would be more efficient I think. Duplicating information on the GPU that I already have much better organised on the CPU is a pain for a start. Let the GPU die.
I actually like the separation; it makes programming GPUs far easier. Modern GPUs are a nice example of constrained and focused parallelism. Even neophyte programmers can write shaders that run massively parallel and never deadlock the GPU. Most veteran programmers I've met who work on general massively parallel systems still fight with concurrency issues and deadlocks.
Sure, you could constrain your massively parallel system on these new 1000 core CPUs such that you have the same protection, but then you have just re-invented shaders on less focused and probably slightly slower hardware.
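For what it's worth, the constrained model I mean looks roughly like the sketch below. This is only an illustration of the programming model, under the assumption that every kernel is a pure function of one input element. `shade` and `dispatch` are hypothetical names, and Python threads will not give real speedups here; the point is that the kernel author never touches a lock.

```python
from concurrent.futures import ThreadPoolExecutor

def shade(pixel):
    """Hypothetical 'shader': reads one element, writes one result.

    It has no access to its neighbours or to shared mutable state,
    so there is nothing to lock and nothing to deadlock on.
    """
    r, g, b = pixel
    luma = 0.2126 * r + 0.7152 * g + 0.0722 * b  # Rec. 709 luma weights
    return (luma, luma, luma)

def dispatch(kernel, inputs, workers=8):
    """Run the kernel over every input; the scheduling is the runtime's job."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order regardless of completion order.
        return list(pool.map(kernel, inputs))

pixels = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
grey = dispatch(shade, pixels)
```

Once you restrict everyone to writing functions shaped like `shade`, the runtime can fan them out over as many cores as it likes, which is pretty much the shader model again.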
With great power comes great responsibility. My experience with seasoned programmers and dual-core machines has led me to be skeptical that these massively parallel systems will actually be "general" in practice.
My prediction: 95% of the engineers who would use such a thing would subscribe to an API/paradigm as restrictive as the current shader model on GPUs. The other 5% will release marginally better titles at best and go stark raving mad at worst.