Input assembler moved completely into the vertex shader. You'd bind resources of pretty much any type to the vertex shader and access them directly via texture or buffer look-ups. That would make things a lot simpler and more flexible IMHO. Granted, you sort of can do this already, but it'd be nice if the GPUs/drivers were optimized for it.
GPUs already work this way. The driver generates a small bit of shader code that runs before the vertex shader (AMD calls it a fetch shader), and all it does is load data out of the vertex buffer and dump it into registers. If you did it all yourself in the vertex shader, there's not really any reason for it to be any slower.
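As a rough illustration of doing the fetch yourself, here's a minimal HLSL sketch that skips the input layout entirely and pulls vertex data with SV_VertexID. The buffer layout and names (`Vertex`, `gVertices`, `gWorldViewProj`) are made up for the example:

```hlsl
cbuffer PerObject : register(b0)
{
    float4x4 gWorldViewProj;
};

struct Vertex
{
    float3 position;
    float3 normal;
    float2 uv;
};

// Vertex data bound as a plain shader resource instead of a vertex buffer.
StructuredBuffer<Vertex> gVertices : register(t0);

float4 VSMain(uint vertexID : SV_VertexID) : SV_Position
{
    // This load is essentially what the driver-generated fetch shader does.
    Vertex v = gVertices[vertexID];
    return mul(float4(v.position, 1.0f), gWorldViewProj);
}
```

You'd draw with no input layout bound and let SV_VertexID (plus SV_InstanceID, if needed) index whatever buffers you like.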
Depth/stencil/blend stage moved completely into the pixel shader. Sort of like UAVs, but not necessarily with the ability to do 'scatter' operations. It could be exposed by allowing 'SV_Target0', 'SV_Target1', etc. to be both read and written: initially each is loaded with the current value of the target, and it can be read, compared, operated on, and then written if necessary.
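To make the idea concrete, here's what the proposed read-modify-write syntax might look like. This is not valid HLSL today; the `inout` render-target semantic is hypothetical:

```hlsl
// HYPOTHETICAL: 'target' arrives pre-loaded with the current contents of
// render target 0, so blending becomes ordinary shader code.
void PSMain(float4 src : COLOR0, inout float4 target : SV_Target0)
{
    // Classic source-alpha blend, written by hand instead of configured
    // through the fixed-function blend state.
    target = src * src.a + target * (1.0f - src.a);
}
```

Any blend mode the fixed-function unit can't express (e.g. per-channel conditional blends) would just be more shader code.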
Programmable blending isn't happening without completely changing the way desktop GPUs handle pixel shader writes. TBDRs can do it since they work with an on-chip tile cache, but they can't really handle arbitrary numbers of render targets.
Doing depth/stencil in the pixel shader also deprives you of a major optimization: the hardware can no longer reject occluded pixels with early-z/hi-z before shading them. It would be like always writing to SV_Depth.
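You can already see this cost today with a valid HLSL shader that writes SV_Depth. The depth value computed here is arbitrary and just for illustration:

```hlsl
// Because this shader outputs SV_Depth, the depth test must run AFTER
// shading; the GPU can't use early-z/hi-z to skip occluded pixels.
// Pulling the whole depth/stencil stage into the shader would pay this
// cost on every draw.
float4 PSMain(float4 pos : SV_Position,
              out float depth : SV_Depth) : SV_Target0
{
    depth = pos.z * 0.5f;           // arbitrary depth defeats early rejection
    return float4(1.0f, 0.0f, 0.0f, 1.0f);
}
```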