It does not work that way. You have no guarantee about the order of execution (much less the order of completion) within a single draw call.
It's really simple. Multiple execution units --> race conditions. You can see those execution units in the GPU block diagrams that appear in every article each time a new GPU is released.
In my opinion, the only decent way to do order-independent transparency is with per-pixel linked lists in D3D11.
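For anyone who hasn't seen the linked-list approach, here is a minimal CPU-side sketch of it in Python. It is only a model under stated assumptions: the class and method names are hypothetical, larger depth is assumed to mean farther away, and the comments just note which D3D11 pieces (a head-pointer UAV, a RWStructuredBuffer with a counter, IncrementCounter() and InterlockedExchange() in HLSL) each step roughly stands in for.

```python
from dataclasses import dataclass

# Hypothetical "end of list" sentinel; GPU versions typically clear the
# head-pointer buffer to 0xFFFFFFFF for the same purpose.
END = -1


@dataclass
class Fragment:
    rgb: tuple    # straight (non-premultiplied) color
    alpha: float
    depth: float  # assumption: larger value = farther from the camera
    next: int     # index of the next node for the same pixel, or END


class PerPixelLists:
    """Toy CPU model of a per-pixel linked-list OIT buffer."""

    def __init__(self, width, height):
        # One head pointer per pixel (the head-pointer UAV on the GPU).
        self.heads = [[END] * width for _ in range(height)]
        # Shared node pool (a RWStructuredBuffer with a counter on the GPU).
        self.nodes = []

    def insert(self, x, y, rgb, alpha, depth):
        # GPU: IncrementCounter() to allocate a node, then InterlockedExchange()
        # to swap it in as the new head. Sequential code needs no atomics.
        index = len(self.nodes)
        self.nodes.append(Fragment(rgb, alpha, depth, self.heads[y][x]))
        self.heads[y][x] = index

    def resolve(self, x, y, background):
        # Gather this pixel's fragments; list order is just insertion order.
        frags = []
        i = self.heads[y][x]
        while i != END:
            frags.append(self.nodes[i])
            i = self.nodes[i].next
        # Sort far-to-near, then composite with the "over" operator.
        frags.sort(key=lambda f: f.depth, reverse=True)
        color = background
        for f in frags:
            color = tuple(s * f.alpha + d * (1.0 - f.alpha)
                          for s, d in zip(f.rgb, color))
        return color


lists = PerPixelLists(1, 1)
lists.insert(0, 0, (1.0, 0.0, 0.0), 0.5, 0.2)  # near, red
lists.insert(0, 0, (0.0, 0.0, 1.0), 0.5, 0.8)  # far, blue
print(lists.resolve(0, 0, (0.0, 0.0, 0.0)))    # (0.5, 0.0, 0.25)
```

Inserting the same two fragments in the opposite order gives the same result, which is the whole point: the sort happens per pixel at resolve time, so the order in which fragments arrive no longer matters.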
The order in which primitives are rasterized and written to a render target is the same as the order in which you submit them. This is part of the DX spec and is guaranteed by the hardware. In fact, the hardware has to jump through a lot of hoops to maintain this guarantee while still making use of multiple hardware units. This means that if you could perfectly sort all primitives in a mesh by depth, you would get perfect transparency. The same goes for multiple instances in a single draw call. The only case that is impossible to handle without OIT is intersecting primitives, since each one is partly in front of the other and no single submission order is correct.
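To make the sorting argument concrete, here is a tiny single-pixel model in Python. It assumes, hypothetically, that blends are applied in exact submission order (the guarantee described above) and that each primitive covers the pixel at one depth, i.e. no intersections. The function names are made up for illustration.

```python
def over(rgb, alpha, dst):
    """Standard 'over' blend: src * alpha + dst * (1 - alpha)."""
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(rgb, dst))


def draw_in_order(primitives, background):
    # Models in-order output: blends for this pixel are applied in
    # exactly the order the primitives were submitted.
    color = background
    for p in primitives:
        color = over(p["rgb"], p["alpha"], color)
    return color


def sort_back_to_front(primitives):
    # Assumes each primitive has a single depth at this pixel, which
    # breaks down for intersecting primitives.
    return sorted(primitives, key=lambda p: p["depth"], reverse=True)


prims = [
    {"rgb": (1.0, 0.0, 0.0), "alpha": 0.5, "depth": 0.2},  # near, red
    {"rgb": (0.0, 0.0, 1.0), "alpha": 0.5, "depth": 0.8},  # far, blue
]
black = (0.0, 0.0, 0.0)
print(draw_in_order(prims, black))                      # (0.25, 0.0, 0.5): wrong
print(draw_in_order(sort_back_to_front(prims), black))  # (0.5, 0.0, 0.25): correct
```

Because the blend is order-dependent, the same two primitives give different pixels depending on submission order; sorting before submission is enough only because the output order is preserved.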
Are you sure about this behavior? How can it be guaranteed when multiple primitives are being rasterized in parallel? There is also some gray area regarding generated primitives (via tessellation or the geometry shader), since they can be generated by parallel instances of the shaders...
I have always heard that primitives are processed roughly in the order they are submitted, but that they are explicitly not guaranteed to be processed in exact order.
Definitely. You're never guaranteed anything about the order in which vertices, primitives, or pixels are processed in the shader units, but the ROPs guarantee that the final results written to the render target match the triangle submission order (often by buffering and reordering pending writes from the pixel shaders). This is true even for geometry shaders, which is a big part of what makes them so slow.
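A toy Python model of the buffering-and-reordering idea for a single pixel: shader results arrive in a random order, and a hypothetical ReorderingROP holds early arrivals until every earlier primitive has been written. This is only an illustration of the principle under the assumption that primitive IDs are consecutive in submission order; it is not how any particular GPU implements it.

```python
import random


def blend(rgb, alpha, dst):
    return tuple(s * alpha + d * (1.0 - alpha) for s, d in zip(rgb, dst))


class ReorderingROP:
    """Hypothetical model: commits one pixel's writes in submission order."""

    def __init__(self, background):
        self.color = background
        self.next_id = 0   # next primitive ID allowed to write
        self.pending = {}  # results that finished "too early"

    def receive(self, prim_id, rgb, alpha):
        self.pending[prim_id] = (rgb, alpha)
        # Drain every buffered result whose predecessors are all written.
        while self.next_id in self.pending:
            rgb, alpha = self.pending.pop(self.next_id)
            self.color = blend(rgb, alpha, self.color)
            self.next_id += 1


# Shaded fragments for one pixel: (primitive ID in submission order, rgb, alpha).
fragments = [
    (0, (1.0, 0.0, 0.0), 0.5),
    (1, (0.0, 1.0, 0.0), 0.25),
    (2, (0.0, 0.0, 1.0), 0.75),
]

# Reference: blend strictly in submission order.
reference = (0.0, 0.0, 0.0)
for _, rgb, alpha in fragments:
    reference = blend(rgb, alpha, reference)

for seed in range(5):
    # Shader units finish in an arbitrary order...
    completed = fragments[:]
    random.Random(seed).shuffle(completed)
    rop = ReorderingROP((0.0, 0.0, 0.0))
    for prim_id, rgb, alpha in completed:
        rop.receive(prim_id, rgb, alpha)
    # ...but the committed color matches submission order.
    assert rop.color == reference
```

The cost is visible even in the toy: results that finish early sit in a buffer until their predecessors arrive, which hints at why preserving order across many parallel units takes real hardware effort.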