# SLI GPGPU Question


So, I recently found out that it is possible to render shadow maps using WARP devices on an independent CPU thread. This brought up some questions. Is it possible, if the user has two graphics cards, to use one for GPGPU (e.g. calculating shadow maps and rotating spherical harmonics) while the other one does the primary rendering? If so, can someone provide me with a code example, or at least pseudocode?

---
Using a 2nd card explicitly (as opposed to the SLI/Crossfire way) is done by creating a 2nd device in exactly the same way as with WARP, but specifying the adapter directly. Once that's done, you can use the device however you normally would. Note that shaders, vertex buffers, etc. are all created at the device level, so they would need to be created for both devices.

Resources can be shared between 2 devices, but at this time it's restricted to (IIRC) 2D RGBA8 render targets without mipmaps. Syncing between the 2 devices is not a trivial issue, though, as you really don't want to block either GPU or the CPU, and it's hard to enforce timing across the board without doing so.
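A minimal sketch of the "don't block, tolerate latency" idea, assuming the primary device is allowed to use a result from the secondary device that is up to a few frames old. Only the bookkeeping is modelled; the class and method names are hypothetical, and in a real D3D11 program the `publish` call would be driven by polling an event query (`D3D11_QUERY_EVENT` read with `ID3D11DeviceContext::GetData`) on the secondary device rather than by waiting on it.

```python
class SharedResult:
    """Hypothetical helper tracking the newest finished result (e.g. a shadow
    map) that a secondary device has produced for the primary device."""

    def __init__(self, max_latency_frames):
        self.max_latency = max_latency_frames  # how stale a result may be
        self.last_completed = None             # frame index of newest result

    def publish(self, frame):
        # Record that the secondary device finished its work for `frame`.
        if self.last_completed is None or frame > self.last_completed:
            self.last_completed = frame

    def acquire(self, current_frame):
        # Return the frame index whose result the primary should use now,
        # or None if the newest result is too old (the caller would stall).
        if self.last_completed is None:
            return None
        if current_frame - self.last_completed > self.max_latency:
            return None
        return self.last_completed


share = SharedResult(max_latency_frames=2)
share.publish(3)
print(share.acquire(4), share.acquire(6))
```

Returning `None` is where a real engine has to choose between stalling and, for example, skipping that frame's update; the sketch deliberately leaves that policy open.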

---
Thanks for the reply, just a few more questions.

1) What if the cards have different GPUs (e.g. a GTX 590 and a GTX 460)? Can I use the lower-spec card for computation?

2) Do both devices have to be controlled by the main thread, or can I use each one on the thread it was created on?

3) How do I check for multiple GPUs? Is this built into DirectX, or do I have to check the registry, and if so, what key?

4) Can I use one for CUDA, while the other one uses DirectX, and interpolate between the two cards?

Finally, 5) If the primary card doesn't meet the graphical requirements of my game engine, can I do the rendering on the other one, and display it through the primary card?

Sorry for the barrage of questions; it's just that I never really looked at integrating SLI/Crossfire into my engine until this point.

---
1 & 2) When you use the 2 cards as distinctly different video cards, you aren't restricted in their use.
So yes, you can use different cards and different threads.
However, 5) is the case where you would really want to use SLI/Crossfire, and that does require very similar, if not identical, video cards.

3) You want to enumerate the device adapters, http://msdn.microsoft.com/en-us/library/windows/desktop/ff476877%28v=vs.85%29.aspx
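Once the adapters have been enumerated (with DXGI this is `IDXGIFactory1::EnumAdapters1` followed by `IDXGIAdapter1::GetDesc1`), what remains of questions 1) and 3) is deciding which adapter gets which job. The sketch below models only that decision. The dictionary keys are hypothetical stand-ins for fields of `DXGI_ADAPTER_DESC1` (`Description`, `DedicatedVideoMemory`, and the `DXGI_ADAPTER_FLAG_SOFTWARE` bit) plus whether `IDXGIAdapter::EnumOutputs` found a monitor, and the selection policy is an assumption, not a DirectX rule.

```python
GIB = 1024 ** 3


def assign_adapters(adapters):
    """Choose a rendering adapter and an optional separate compute adapter.

    Assumed policy: ignore software adapters (WARP), render on the adapter
    with the most dedicated memory among those driving a monitor, and give
    compute work to the best remaining hardware adapter.
    """
    hardware = [a for a in adapters if not a["is_software"]]
    if not hardware:
        return None, None  # e.g. fall back to WARP for everything
    by_memory = sorted(hardware, key=lambda a: a["dedicated_video_memory"], reverse=True)
    with_output = [a for a in by_memory if a["has_output"]]
    render = (with_output or by_memory)[0]
    others = [a for a in by_memory if a is not render]
    compute = others[0] if others else None
    return render, compute


# Illustrative values only, not real card specifications.
adapters = [
    {"name": "smaller card", "dedicated_video_memory": 1 * GIB,
     "is_software": False, "has_output": False},
    {"name": "bigger card", "dedicated_video_memory": 3 * GIB,
     "is_software": False, "has_output": True},
    {"name": "WARP", "dedicated_video_memory": 0,
     "is_software": True, "has_output": False},
]
render, compute = assign_adapters(adapters)
print(render["name"], "renders;", compute["name"], "computes")
```

Preferring the adapter with a monitor attached keeps the finished frames on the card that displays them; the memory-based ranking is just one plausible tiebreaker.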

4) If by interpolate you mean share data between them, then yes, but you may need to route the data via the CPU, which can be painfully slow. I've not looked into CUDA myself very much, though.

SLI/Crossfire is largely automatic as long as your game does things in the right way, and the technologies themselves have a few different configurations to support different means of rendering. There are plenty of whitepapers from both NVIDIA and ATI/AMD on the matter.

---
So, I read some of their white papers, but what I don't understand is why one card renders one frame and then the other card renders the next. Doesn't that mean that there is always one card idle, alternating between frames? And can I use that to speed along computation?

---

> So, I read some of their white papers, but what I don't understand is why one card renders one frame and then the other card renders the next. Doesn't that mean that there is always one card idle, alternating between frames? And can I use that to speed along computation?

What would be the purpose of having multiple GPUs if one of them were idle, waiting for the next frame to be rendered?

Consider the moment when the CPU has submitted all the work of a frame to the first GPU: that GPU probably hasn't finished drawing the frame yet, and is still processing a list of commands.
When a second GPU is available, the CPU can continue with the next frame immediately, without waiting for the first GPU to finish. After the second frame, the first GPU has (probably) become available again, and drawing can continue without interruption.

Cheers!
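The argument above can be checked with a toy timeline. This is a simplified model, not how any driver actually schedules work: the CPU needs `cpu_ms` to submit each frame and never waits, frame `i` goes to GPU `i % num_gpus`, and every frame costs `gpu_ms` on whichever GPU gets it.

```python
def afr_completion_times(num_frames, num_gpus, gpu_ms, cpu_ms):
    """Time (ms) at which each frame finishes under alternate-frame rendering."""
    gpu_free = [0.0] * num_gpus  # when each GPU finishes its queued work
    done = []
    for i in range(num_frames):
        submitted = (i + 1) * cpu_ms  # CPU finishes submitting frame i
        g = i % num_gpus              # AFR: frames alternate between GPUs
        start = max(submitted, gpu_free[g])
        gpu_free[g] = start + gpu_ms
        done.append(gpu_free[g])
    return done


# A GPU-bound case: 20 ms of GPU work per frame, 10 ms of CPU work.
print(afr_completion_times(4, 1, gpu_ms=20, cpu_ms=10))  # one GPU
print(afr_completion_times(4, 2, gpu_ms=20, cpu_ms=10))  # two GPUs
```

With one GPU a frame finishes every 20 ms; with two, the CPU's 10 ms submission rate becomes the limit and a frame finishes every 10 ms, because the CPU never has to wait for a GPU that is still busy with the previous frame.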

---
Kauna is right about that mode. Note that this means that if you use a render target from the previous frame in this frame, you will always be stalling your GPUs, and they will run one after the other.
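A small variation of the AFR timeline shows the stall. It is a sketch under simplifying assumptions: the CPU is never the bottleneck, the cost of copying the render target between GPUs is ignored, and when `reads_previous_frame` is true a frame cannot start until the previous frame (rendered on the other GPU) has finished.

```python
def afr_times(num_frames, num_gpus, gpu_ms, reads_previous_frame):
    """Finish time (ms) of each frame under AFR, optionally with a dependency
    on a render target produced by the previous frame."""
    gpu_free = [0.0] * num_gpus
    prev_finish = 0.0
    done = []
    for i in range(num_frames):
        g = i % num_gpus
        start = gpu_free[g]
        if reads_previous_frame:
            # Must wait for the other GPU to finish frame i - 1.
            start = max(start, prev_finish)
        finish = start + gpu_ms
        gpu_free[g] = finish
        prev_finish = finish
        done.append(finish)
    return done


independent = afr_times(6, 2, gpu_ms=20, reads_previous_frame=False)
dependent = afr_times(6, 2, gpu_ms=20, reads_previous_frame=True)
print(independent[-1], dependent[-1])
```

Six frames take 60 ms when they are independent but 120 ms with the dependency, the same as a single GPU: the two cards simply take turns. (Frames also finish in pairs in this toy model because it has no CPU pacing.)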

I also remember reading about an alternate mode where each video card renders half the screen. The downside to this is that if you try to fetch render target memory that's being written by the opposing GPU, you will introduce stalls.
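A toy model of the split-screen mode, assuming the screen is cut at a single row, the top part goes to GPU 1 and the bottom to GPU 2, and `row_costs` holds a made-up rendering cost per row. The frame is done when the slower part is done, so an even split in pixels is not necessarily an even split in work. Real drivers balance this internally; this is not their algorithm.

```python
def split_frame_time(row_costs, split_row):
    top = sum(row_costs[:split_row])     # rows rendered by GPU 1
    bottom = sum(row_costs[split_row:])  # rows rendered by GPU 2
    return max(top, bottom)              # frame waits for the slower GPU


def best_split(row_costs):
    # Try every cut position and keep the one with the shortest frame time.
    return min(range(len(row_costs) + 1),
               key=lambda s: split_frame_time(row_costs, s))


# Hypothetical costs: cheap sky at the top, expensive geometry at the bottom.
rows = [1, 1, 2, 4, 4]
cut = best_split(rows)
print(cut, split_frame_time(rows, cut), split_frame_time(rows, len(rows) // 2))
```

Cutting at row 3 instead of at row 2 brings this frame from 10 cost units down to 8.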

One fun thing to try with SLI is doing your frame rendering on 1 video card and then doing *all* of your post-processing and HUD on the 2nd video card. So, while your 2nd card is processing the previous frame, your 1st video card is rendering out the current frame.
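The render/post pipelining can be sketched with the same kind of toy timeline. Assumptions: GPU 1 never reads anything GPU 2 produces, so it never waits for it; copying each rendered frame to GPU 2 takes `transfer_ms` and overlaps with both GPUs' work; and the per-frame timings are fixed.

```python
def render_post_pipeline(num_frames, render_ms, transfer_ms, post_ms):
    """Finish time (ms) of each frame when GPU 1 renders the scene and
    GPU 2 does all post-processing and HUD."""
    gpu1_free = 0.0
    gpu2_free = 0.0
    done = []
    for _ in range(num_frames):
        rendered = gpu1_free + render_ms   # GPU 1 starts the next frame at once
        gpu1_free = rendered
        arrived = rendered + transfer_ms   # frame reaches GPU 2
        start_post = max(arrived, gpu2_free)
        gpu2_free = start_post + post_ms
        done.append(gpu2_free)
    return done


balanced = render_post_pipeline(3, render_ms=16, transfer_ms=2, post_ms=16)
lopsided = render_post_pipeline(3, render_ms=24, transfer_ms=2, post_ms=8)
print(balanced, lopsided)
```

In both cases a new frame leaves GPU 2 every `max(render_ms, post_ms)`, and each frame's latency is render + transfer + post (34 ms here). The lopsided case already shows the catch: the faster GPU sits idle for part of every frame.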

---

> One fun thing to try with SLI is doing your frame rendering on 1 video card and then doing *all* of your post-processing and HUD on the 2nd video card. So, while your 2nd card is processing the previous frame, your 1st video card is rendering out the current frame.

Doesn't this mean that there is a dependency between the GPUs, and that they need to synchronize the frame buffers? That of course has a negative impact on performance.

As far as I know, with SLI/CrossFire there shouldn't be any dependency between the current and the previous frame; otherwise the GPUs need to synchronize.

Cheers!

---

> One fun thing to try with SLI is doing your frame rendering on 1 video card and then doing *all* of your post-processing and HUD on the 2nd video card. So, while your 2nd card is processing the previous frame, your 1st video card is rendering out the current frame.

I am with Kauna on that; it seems as if there would be a very large amount of syncing.
What I am thinking about doing is having one card calculate all the direct lighting, and then using Wavelet Radiance Transport to compute the indirect lighting
(http://www-ljk.imag.fr/Publications/Basilic/com.lmc.publi.PUBLI_Inproceedings@1172c0fd434_ed088a/index.html).

The only issue is that I need the previous frame so that its direct illumination can be converted to indirect illumination, and then re-apply the indirect illumination. Any ideas how I can approach this?

---

> Doesn't this mean that there is a dependency between the GPUs, and that they need to synchronize the frame buffers? That of course has a negative impact on performance.
>
> As far as I know, with SLI/CrossFire there shouldn't be any dependency between the current and the previous frame; otherwise the GPUs need to synchronize.

Yes, that's true. There's always a small dependency between the two, though, and heavy restrictions on what can be transferred from one to the other without killing performance. Don't forget that SLI/Crossfire cards have a bridge connecting them; to my knowledge this is for direct memory transfer between the cards without passing through the motherboard. There's also the ability to account for some latency between the two.

It's likely that cross-card performance will get faster over the next few generations of cards due to their more frequent use as general-purpose units.

---

> kauna wrote:
>
> > Doesn't this mean that there is a dependency between the GPUs, and that they need to synchronize the frame buffers? That of course has a negative impact on performance.
> >
> > As far as I know, with SLI/CrossFire there shouldn't be any dependency between the current and the previous frame; otherwise the GPUs need to synchronize.
>
> Yes, that's true. There's always a small dependency between the two, though, and heavy restrictions on what can be transferred from one to the other without killing performance. Don't forget that SLI/Crossfire cards have a bridge connecting them; to my knowledge this is for direct memory transfer between the cards without passing through the motherboard. There's also the ability to account for some latency between the two.
>
> It's likely that cross-card performance will get faster over the next few generations of cards due to their more frequent use as general-purpose units.

Remember that your monitor is only connected to one of your cards, so in standard AFR rendering on both Radeon and GeForce GPUs every second frame is sent over the SLI/Crossfire bridge. Synchronization between the GPUs therefore isn't always a problem, since the SLI/Crossfire bridge has an insane amount of bandwidth. If you ask me, the idea of doing post-processing on the second GPU will not have any overhead, since the first GPU will not be touching that data again; it does not have to wait for the second GPU in any way and can immediately begin rendering the next frame.

I do however fail to see how this is going to give any benefit over standard AFR. A single GPU has about 16.666 ms to render a complete frame for a smooth 60 FPS frame rate. With 2 GPUs, each only has to produce 30 frames per second, meaning each has around 33.333 ms. In the optimal case for doing post-processing on another GPU, rendering and post-processing would take exactly the same amount of time, 16.666 ms each. The actual data transfer between the GPUs would not take any effective rendering time away from either GPU, since it can be done in parallel with the first GPU rendering the next frame. The data transfer would, however, add a small overhead to the total time it takes to go from OpenGL/DirectX commands to a complete rendered frame.

Consider this extremely artistic little chart:

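AFR (each frame occupies one GPU for two slots):

| | Slot 1 | Slot 2 | Slot 3 | Slot 4 | Slot 5 | Slot 6 |
|---|---|---|---|---|---|---|
| GPU 1 | Frame0 | Frame0 | Frame2 | Frame2 | Frame4 | Frame4 |
| GPU 2 | | Frame1 | Frame1 | Frame3 | Frame3 | etc |

Postprocessing on a different GPU (FrN = frame N, R = rendering, D = data transfer, P = postprocessing; each D runs at the start of its slot, just before the matching P):

| | Slot 1 | Slot 2 | Slot 3 | Slot 4 | Slot 5 |
|---|---|---|---|---|---|
| GPU 1 | Fr0 R | Fr1 R | Fr2 R | Fr3 R | etc |
| Data | | Fr0 D | Fr1 D | Fr2 D | Fr3 D |
| GPU 2 | | Fr0 P | Fr1 P | Fr2 P | etc |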

After the first frame has been rendered and transmitted by the first GPU, both GPUs would be working 100% of the time without any stalling. The data transfer overhead would, however, make it worse than basic AFR, because the added delay would be perceived as input lag.
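The numbers in this post can be reproduced with a little arithmetic. The sketch assumes the optimal case described above (rendering and post-processing each take exactly one frame interval, and under AFR each GPU uses its full budget), measures latency from the start of rendering to the finished frame, and ignores CPU and display time; `transfer_ms` is a free parameter, not a measured bridge figure.

```python
def afr_vs_split(target_fps, transfer_ms):
    interval = 1000.0 / target_fps        # time between displayed frames
    afr_budget = 2 * interval             # each of 2 GPUs renders every other frame
    afr_latency = afr_budget              # one GPU does the whole frame
    split_latency = interval + transfer_ms + interval  # render, copy, post-process
    return interval, afr_budget, afr_latency, split_latency


interval, budget, afr_lat, split_lat = afr_vs_split(60, transfer_ms=1.5)
print(f"{interval:.3f} ms per frame, {budget:.3f} ms per GPU under AFR")
print(f"latency: AFR {afr_lat:.3f} ms, split {split_lat:.3f} ms")
```

Both setups keep two GPUs fully busy at 60 FPS, but the split pipeline adds the transfer time on top of AFR's 33.333 ms, which is the extra input lag described above.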

---

> I do however fail to see how this is going to give any benefit over standard AFR.

The standard AFR requires (or at least highly recommends) that you do not include any rendering that takes the results of the previous frame. This means that all incremental-rendering tricks, like rain streaks down a texture, surface degradation, reprojection-cache tricks, even delta-based motion blur (as much as I dislike that effect), cause stalls in standard AFR: they introduce a bottleneck on both cards, as every frame depends on the previous frame. Running the 3D scene and the post-processing on different cards, however, bypasses this rule: data is only piped from the 3D-scene GPU to the post-processing GPU, and both can then independently perform tricks based on the previous frame without interference.
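One way to see the difference is to count how often a pass's previous-frame data lives on the other GPU. This is only a counting model: each entry in `passes` lists the GPU that runs a given pass on each frame, and every pass is assumed to read its own output from the previous frame (rain streaks, reprojection caches, and so on).

```python
def cross_gpu_history_reads(passes):
    """Count previous-frame reads that must cross from one GPU to the other."""
    crossings = 0
    for gpu_per_frame in passes.values():
        for prev_gpu, cur_gpu in zip(gpu_per_frame, gpu_per_frame[1:]):
            if prev_gpu != cur_gpu:
                crossings += 1  # history was produced on the other GPU
    return crossings


frames = 6
afr = {"scene + post": [i % 2 for i in range(frames)]}
split = {"scene": [0] * frames, "post": [1] * frames}
print(cross_gpu_history_reads(afr), cross_gpu_history_reads(split))
```

Under AFR every history read after the first frame crosses GPUs; in the split layout none do, and the only cross-GPU traffic is the one-way copy of each rendered frame to the post-processing GPU, which GPU 1 never waits for.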

---

> theagentd wrote:
>
> > I do however fail to see how this is going to give any benefit over standard AFR.
>
> The standard AFR requires (or at least highly recommends) that you do not include any rendering that takes the results of the previous frame. This means that all incremental-rendering tricks, like rain streaks down a texture, surface degradation, reprojection-cache tricks, even delta-based motion blur (as much as I dislike that effect), cause stalls in standard AFR: they introduce a bottleneck on both cards, as every frame depends on the previous frame. Running the 3D scene and the post-processing on different cards, however, bypasses this rule: data is only piped from the 3D-scene GPU to the post-processing GPU, and both can then independently perform tricks based on the previous frame without interference.
That's true, as long as each GPU only reuses its own frame buffers. The first GPU obviously does not have access to the post-processed frame, but that's the only limitation, I guess. It does enable some really nice effects that would otherwise reduce the performance of AFR.

It's all a very interesting idea, but wouldn't it still be hard to load balance between the two GPUs? The first GPU's load would depend a lot on the scene (how many objects are visible, what shaders are used, etc.), while post-processing pretty much only depends on the number of pixels. I suppose it would be pretty easy to find quality settings that make post-processing take around 16.666 ms on the second GPU, but the load on the first GPU can vary wildly. In a game that does not have an equal load between rendering and post-processing, this would not scale well at all...
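The load-balancing worry can be put into numbers. In the render/post split, the pair of GPUs delivers one frame per `max(render_ms, post_ms)`, so the fraction of the two GPUs' combined time spent on useful work is `(render_ms + post_ms) / (2 * max(render_ms, post_ms))`. A minimal sketch, with frame times chosen purely for illustration:

```python
def split_utilisation(render_ms, post_ms):
    """Fraction of two GPUs' time doing useful work in a render/post split."""
    stage_ms = max(render_ms, post_ms)  # the slower stage sets the frame rate
    return (render_ms + post_ms) / (2 * stage_ms)


for render_ms, post_ms in [(16, 16), (24, 8), (30, 2)]:
    print(render_ms, post_ms, round(split_utilisation(render_ms, post_ms), 3))
```

A scene-heavy frame with 24 ms of rendering and 8 ms of post-processing already leaves a third of the combined GPU time idle, whereas independent frames under AFR could in principle keep both cards fully busy.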

---

> In a game that does not have an equal load between rendering and post-processing, this would not scale well at all...

Quite so. It's more of an exercise in learning GPU bottlenecks. Given the design of devices and device contexts in DirectX 11, though, it's quite easy to support both ways and even switch between them.

I found it interesting to design code this way on account of PS3 development, where we commonly offload post-processing onto the SPUs instead.

---

> theagentd wrote:
>
> > In a game that does not have an equal load between rendering and post-processing, this would not scale well at all...
>
> Quite so. It's more of an exercise in learning GPU bottlenecks. Given the design of devices and device contexts in DirectX 11, though, it's quite easy to support both ways and even switch between them.
>
> I found it interesting to design code this way on account of PS3 development, where we commonly offload post-processing onto the SPUs instead.

It could be possible to load balance between the GPUs by moving tasks that do not require data from the previous frame between them. In deferred rendering, if we assume that the first GPU can always manage to set up the G-buffers in the time it has and that the second GPU can always manage the post-processing in time, we can have both GPUs render lights and change the ratio of lights rendered by each GPU in realtime based on timing measurements. The same could be done for tile-based deferred rendering, by changing the ratio of tiles lit by each GPU. This could obviously also work with completely different GPUs, as long as the earlier assumption still holds.
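The light-ratio idea could look something like the sketch below. It is a hypothetical controller, not code from any existing engine: each frame it takes the measured G-buffer time on GPU 1, the post-processing time on GPU 2, and each GPU's light-pass time, estimates what lighting *all* lights would cost on each GPU, and moves the share of lights given to GPU 1 toward the value that makes both GPUs' total frame times equal. `gain` smooths out noisy timings, and the share is clamped (an assumption) so both GPUs keep some lights to measure.

```python
class LightSplitBalancer:
    """Hypothetical controller for the fraction of lights rendered by GPU 1."""

    def __init__(self, share=0.5, gain=0.5, lo=0.05, hi=0.95):
        self.share = share  # fraction of lights currently lit on GPU 1
        self.gain = gain    # 1.0 jumps straight to the target; smaller is smoother
        self.lo, self.hi = lo, hi

    def update(self, gbuffer_ms, lights1_ms, post_ms, lights2_ms):
        # Estimated cost of lighting every light on each GPU.
        full1 = lights1_ms / self.share
        full2 = lights2_ms / (1.0 - self.share)
        # Solve gbuffer + s * full1 == post + (1 - s) * full2 for s.
        target = (post_ms + full2 - gbuffer_ms) / (full1 + full2)
        target = min(max(target, self.lo), self.hi)
        self.share += self.gain * (target - self.share)
        return self.share


balancer = LightSplitBalancer(gain=1.0)
# G-buffer takes 6 ms, post-processing 4 ms, and both GPUs spend 5 ms on
# their half of the lights, so GPU 1 is the busier card.
print(round(balancer.update(6.0, 5.0, 4.0, 5.0), 3))
```

Moving to a 0.4 share gives GPU 1 6 + 4 = 10 ms and GPU 2 4 + 6 = 10 ms in this example; the same update works unchanged for a ratio of tiles instead of lights.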

This sure is interesting. Sadly I don't have access to a multi-GPU computer right now, and I also use OpenGL, not DX...

---
SLI should also work in OpenGL.

I think the really interesting path would be to render screen/view-space effects with one GPU, while the other GPU does world-space calculations and bakes those into textures. It would be kind of like a realtime megatexture. That would actually be quite a nice way to speed up rendering in split-screen gaming on consoles.

I was doing something like that on a PSP. As the PSP does not support multitexturing and fillrate was also quite limited, you wanted to draw all surfaces with just one draw call; on the other hand, there is a "Media Engine" which basically idles all the time. I've been blending up to 16 terrain layers plus a lightmap and DXT1-compressing them into the VMEM.
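The PSP trick boils down to blending many layers into one texture ahead of time so each surface can be drawn in a single pass. A CPU-side sketch under stated assumptions: textures are flat lists of RGB tuples with channels in [0, 1], each layer has a per-texel weight, the weights are normalised per texel, and the lightmap is one brightness value per texel. The DXT1 compression step is left out, and the function name is hypothetical.

```python
MAX_LAYERS = 16


def bake_terrain(layers, weights, lightmap):
    """Blend up to MAX_LAYERS terrain layers and apply a lightmap, per texel."""
    if len(layers) > MAX_LAYERS or len(layers) != len(weights):
        raise ValueError("need one weight map per layer, at most 16 layers")
    baked = []
    for t, light in enumerate(lightmap):
        total = sum(w[t] for w in weights)
        r = g = b = 0.0
        if total > 0:
            for layer, w in zip(layers, weights):
                k = w[t] / total  # normalised blend weight for this texel
                r += k * layer[t][0]
                g += k * layer[t][1]
                b += k * layer[t][2]
        baked.append((r * light, g * light, b * light))
    return baked


grass = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
rock = [(0.5, 0.5, 0.5), (0.5, 0.5, 0.5)]
print(bake_terrain([grass, rock], [[1.0, 0.0], [1.0, 1.0]], [1.0, 0.5]))
```

The result is one ordinary texture per surface, which is what lets the GPU draw it without multitexturing.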

I could imagine a game like Portal, with very simple level shapes yet nice lighting, benefiting a lot from a realtime GI solution that is baked into the textures.

---

> SLI should also work in OpenGL.

> I could imagine a game like Portal, with very simple level shapes yet nice lighting, benefiting a lot from a realtime GI solution that is baked into the textures.

Portal does use GI http://www.valvesoft...ourceEngine.pdf

And that sounds quite interesting, but if you don't manage the GPUs properly and there is some small lag between them, then all of a sudden you have really ugly texture popping. Look at how RAGE implemented its megatextures: at release it failed on some GPUs, and then it took them multiple days to release a patch that degraded performance even further (but notably fixed some of the issues). Megatextures aren't ready for real-world usage without a lot of further development.