Just a reminder that copying data from the GPU to the CPU (using a staging buffer) is usually not very fast. You should check the performance if you need to make a lot of round trips.
The slowness may be an effect of trying to access the copied data right away after the copy. You should do something else before using the results in order to avoid unnecessary GPU-CPU synchronization. I.e., CopyResource is asynchronous, but mapping the data right after it will make the CPU wait for the GPU to finish its tasks.
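To make that concrete, here is a rough sketch of a non-blocking readback, assuming Direct3D 11, an immediate context, and a staging buffer created with D3D11_USAGE_STAGING and D3D11_CPU_ACCESS_READ. The helper names QueueReadback and TryReadback are made up for illustration; only the D3D11 calls and flags are real API. Treat it as a starting point rather than a drop-in solution.

```cpp
#include <d3d11.h>
#include <cstring>
#include <vector>

// Hypothetical helper: queue the GPU -> staging copy and return immediately.
// CopyResource only records the copy; it does not wait for the GPU.
void QueueReadback(ID3D11DeviceContext* ctx,
                   ID3D11Buffer* staging,
                   ID3D11Buffer* gpuBuffer)
{
    ctx->CopyResource(staging, gpuBuffer);
}

// Hypothetical helper: try to read the staging buffer without stalling.
// Returns false if the GPU is not done yet (or on any other Map failure),
// so the caller can do other work and retry later, e.g. next frame.
bool TryReadback(ID3D11DeviceContext* ctx,
                 ID3D11Buffer* staging,
                 std::vector<unsigned char>& out)
{
    D3D11_BUFFER_DESC desc = {};
    staging->GetDesc(&desc);

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    // DO_NOT_WAIT makes Map return instead of blocking the CPU
    // while the GPU is still working on the copy.
    HRESULT hr = ctx->Map(staging, 0, D3D11_MAP_READ,
                          D3D11_MAP_FLAG_DO_NOT_WAIT, &mapped);
    if (hr == DXGI_ERROR_WAS_STILL_DRAWING)
        return false; // copy not finished yet, try again later
    if (FAILED(hr))
        return false; // some other error; real code should report it

    out.resize(desc.ByteWidth);
    std::memcpy(out.data(), mapped.pData, desc.ByteWidth);
    ctx->Unmap(staging, 0);
    return true;
}
```

The idea is to call QueueReadback, carry on with other CPU work (or the rest of the frame), and only then call TryReadback. If it returns false, keep going and try again later instead of forcing a sync.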
Cheers!