[DX11] Handling multiple GPUs

I did a lot of googling on the subject of multiple GPUs. Both NVIDIA and ATI have presentations available, but all they give are guidelines on how to structure your rendering code to make AFR (alternate frame rendering) work efficiently.

My goal is quite different. I would like to perform standard rendering on GPU1 and, at the same time, have compute shader code running on GPU2 (i.e. no AFR).

Does anyone have any pointers / links / advice on how I can test a setup like this? My current hardware configuration consists of two crossfired HD 5750s.

Is this even possible through the DX11 API? If I were able to achieve this, would I even be able to share compute shader results from GPU2 to GPU1 without having to read results back to the CPU first?


Can't you just pass separate adapter identifiers for two D3D 11 devices? That would be the most straightforward way, but it will require you to un-crossfire your cards.
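For anyone who wants to try the one-device-per-adapter route, here is a rough sketch of how it could look. It assumes the cards show up as separate DXGI adapters (so no Crossfire/SLI) and that you link against d3d11.lib and dxgi.lib. `GpuDevice` and `CreateDevicePerAdapter` are made-up names for illustration, not part of the API.

```cpp
// Windows-only sketch: needs the Windows SDK (d3d11.lib, dxgi.lib).
#ifdef _WIN32
#include <d3d11.h>
#include <dxgi.h>
#include <wrl/client.h>
#include <vector>

using Microsoft::WRL::ComPtr;

// Hypothetical helper type: one device plus immediate context per adapter.
struct GpuDevice {
    ComPtr<IDXGIAdapter1>       adapter;
    ComPtr<ID3D11Device>        device;
    ComPtr<ID3D11DeviceContext> context;
};

// Hypothetical helper: walk the adapter list and try to create a
// feature level 11_0 device on each adapter. Adapters that fail are skipped.
std::vector<GpuDevice> CreateDevicePerAdapter()
{
    std::vector<GpuDevice> result;

    ComPtr<IDXGIFactory1> factory;
    if (FAILED(CreateDXGIFactory1(IID_PPV_ARGS(&factory))))
        return result;

    for (UINT i = 0; ; ++i) {
        GpuDevice gpu;
        if (factory->EnumAdapters1(i, &gpu.adapter) == DXGI_ERROR_NOT_FOUND)
            break;  // no more adapters

        const D3D_FEATURE_LEVEL wanted[] = { D3D_FEATURE_LEVEL_11_0 };
        D3D_FEATURE_LEVEL obtained = {};

        // When an explicit adapter is passed, the driver type
        // has to be D3D_DRIVER_TYPE_UNKNOWN.
        HRESULT hr = D3D11CreateDevice(
            gpu.adapter.Get(), D3D_DRIVER_TYPE_UNKNOWN, nullptr, 0,
            wanted, 1, D3D11_SDK_VERSION,
            &gpu.device, &obtained, &gpu.context);

        if (SUCCEEDED(hr))
            result.push_back(gpu);
    }
    return result;
}
#endif
```

With two un-bridged cards you would expect at least two entries back, and could then pick one for rendering and one for compute. Which index maps to which physical card is something you would have to check via the adapter description.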

Hmm... true, that would do the job, though I was hoping for some way to share data between the GPUs through the crossfire / SLI bridge. That is, unless there is a very fast way of getting the data to the other GPU without it?

Not really, that's why there's a bridge. But Crossfire and SLI merge the cards into a single logical GPU, so you can't allocate tasks to one or the other.

Sure thing.

What I plan on doing is testing two setups:

1) Crossfire the cards as normal. Do my compute shader stuff, followed by my rendering stuff in series (paying attention to the guidelines given by ATI for alternate frame rendering).

2) Uncrossfire the cards. Create two separate D3D devices, one per card. Do all my compute stuff on one device, and all my rendering stuff on the other device. As soon as results become available from the compute device, read them back to the CPU, then upload them to the rendering GPU for use.

If you're curious, the "compute stuff" is a patch-based radiosity solver. So while it doesn't NEED to be completely in sync with the rendering, it would certainly be more visually pleasing to see lighting updates as the light moves around, rather than having some delay in getting results.

I expect that 1) will give the most visually pleasing results, but the rendering might get bogged down if the solving step takes longer than there's room for in a frame. 2) would allow me to have the radiosity solved in a "separate thread", so to speak, independent of the rendering, which should make moving the camera around more snappy, but with delayed lighting results (which could turn out to be OK).
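To make the "separate thread" idea in 2) a bit more concrete, here is a minimal CPU-side sketch of the hand-off, assuming the solver and the renderer each run on their own thread. The solver publishes each finished solution, and the renderer polls once per frame and keeps using the old lighting when nothing new has arrived. `LightingResult` and `LatestResult` are hypothetical names, and the plain `std::vector<float>` stands in for whatever the solver really produces.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical container for one finished radiosity solution.
struct LightingResult {
    std::uint64_t      version = 0;     // increases with every published solve
    std::vector<float> patchRadiosity;  // placeholder for the real solver output
};

// Mailbox that holds only the newest result. The solver thread publishes,
// the render thread polls once per frame and never waits for a solve.
class LatestResult {
public:
    // Called by the solver when a solve finishes.
    // An older result that was never fetched is simply overwritten.
    void publish(std::vector<float> radiosity) {
        std::lock_guard<std::mutex> lock(mutex_);
        pending_.version        = ++version_;
        pending_.patchRadiosity = std::move(radiosity);
        hasPending_             = true;
    }

    // Called by the renderer each frame. Returns true and fills `out`
    // only if a result was published since the previous successful fetch.
    bool tryFetch(LightingResult& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (!hasPending_)
            return false;
        out         = std::move(pending_);
        hasPending_ = false;
        return true;
    }

private:
    std::mutex     mutex_;
    LightingResult pending_;
    std::uint64_t  version_    = 0;
    bool           hasPending_ = false;
};
```

The renderer would call `tryFetch` at the start of a frame and upload to the render GPU only when it returns true. Overwriting unread results means the lighting lags behind by at most one solve instead of queueing up stale ones.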

:) This is exactly the project I'm doing. If you uncrossfire the cards, you can iterate through the adapter list, create one device on each card, and then copy buffers between the two through the CPU. You have to copy the UAV buffer to a CPU-readable (staging) buffer on the compute device, read from that, and update the buffer on the render GPU with UpdateSubresource(). If the cards are crossfired together, DirectX sees them as one adapter, and you don't have very much control over how work is distributed between the two.
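A sketch of one way the copy described above could be written, for reference. This is not the snippet from the linked thread. It assumes `srcBuffer` was created on the compute device and `dstBuffer` is a D3D11_USAGE_DEFAULT buffer of the same byte size on the render device. `CopyBufferAcrossDevices` is a made-up helper name.

```cpp
// Windows-only sketch: needs the Windows SDK (d3d11.lib).
#ifdef _WIN32
#include <d3d11.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

// Hypothetical helper: move the contents of a buffer on the compute device
// into a same-sized DEFAULT-usage buffer on the render device via the CPU.
HRESULT CopyBufferAcrossDevices(
    ID3D11Device* computeDevice, ID3D11DeviceContext* computeContext,
    ID3D11Buffer* srcBuffer,
    ID3D11DeviceContext* renderContext, ID3D11Buffer* dstBuffer)
{
    // Describe a CPU-readable staging twin of the source buffer.
    D3D11_BUFFER_DESC desc = {};
    srcBuffer->GetDesc(&desc);
    desc.Usage          = D3D11_USAGE_STAGING;
    desc.BindFlags      = 0;  // staging resources cannot be bound to the pipeline
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
    desc.MiscFlags      = 0;

    ComPtr<ID3D11Buffer> staging;
    HRESULT hr = computeDevice->CreateBuffer(&desc, nullptr, &staging);
    if (FAILED(hr))
        return hr;

    // GPU-side copy on the compute device, then map the copy for CPU reads.
    // Map() waits until the GPU has finished writing the staging buffer.
    computeContext->CopyResource(staging.Get(), srcBuffer);

    D3D11_MAPPED_SUBRESOURCE mapped = {};
    hr = computeContext->Map(staging.Get(), 0, D3D11_MAP_READ, 0, &mapped);
    if (FAILED(hr))
        return hr;

    // Hand the bytes to the render device while the staging buffer is mapped.
    renderContext->UpdateSubresource(dstBuffer, 0, nullptr,
                                     mapped.pData, desc.ByteWidth, 0);

    computeContext->Unmap(staging.Get(), 0);
    return S_OK;
}
#endif
```

Creating the staging buffer on every call keeps the sketch short. In a real loop you would probably create it once and reuse it, and keep in mind that the Map() call stalls the calling thread until the compute GPU has caught up.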

I managed to get the uncrossfired version working with a particle simulation, where you compute on one GPU and render on another GPU. On ATI, the performance boost wasn't very high, but with two NVIDIA cards, you get a huge boost (probably because of the difference in memory speed).

I have the code snippet for copying the buffers between two devices posted here:

[D3D11] Map() from two different devices causing system hangup / freeze

If you run into a problem similar to the one I posted, let me know. I've been pulling my hair out for weeks trying to get two compute devices to work concurrently.

