Using two contexts is unlikely to help you since it's a driver issue. Or rather, a deliberate driver "feature".
In theory, transfers are CPU/GPU asynchronous (and render-asynchronous), but in practice they are only client/server asynchronous.
I can't quote a source right now and might be wrong on the exact hardware generation (though I believe it was in Cozzi and Riccio's book?). Basically, pre-Kepler (or was it Fermi? I think it was Kepler) hardware has one dedicated DMA unit that runs independently of the stream processors, so it can do one DMA operation while it is rendering, without you doing anything special. However, only Quadro drivers actually use this feature; consumer-level drivers stall rendering while doing DMA. Kepler and later have two DMA units and could do DMA uploads and downloads in parallel while rendering, but again, only Quadro drivers use the hardware's full potential.
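To make the difference concrete, here is a toy back-of-the-envelope model of the three cases described above. It is only a sketch: the struct, the function names, and the assumption that the costs simply add or overlap perfectly are all made up for illustration, and it measures nothing about real drivers.

```cpp
#include <algorithm>

// Hypothetical per-frame costs in milliseconds (illustrative names only).
struct FrameCost {
    double render;    // time the GPU spends rendering
    double upload;    // host-to-device transfer time
    double download;  // device-to-host transfer time
};

// Driver stalls rendering during DMA: everything runs back to back.
double frameTimeStalled(const FrameCost& c) {
    return c.render + c.upload + c.download;
}

// One DMA unit in use: transfers overlap rendering,
// but upload and download still queue behind each other.
double frameTimeOneDmaUnit(const FrameCost& c) {
    return std::max(c.render, c.upload + c.download);
}

// Two DMA units in use: rendering, upload and download all overlap.
double frameTimeTwoDmaUnits(const FrameCost& c) {
    return std::max({c.render, c.upload, c.download});
}
```

With a frame that renders for 10 ms, uploads for 4 ms and downloads for 3 ms, this model gives 17 ms when the driver stalls and 10 ms in both overlapping cases. The second DMA unit only starts to matter once the transfers together take longer than the rendering.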
As far as I remember (not 100% sure on that), AMD has a similar issue, but there it's not caused by the driver; it's an actual hardware limitation.