SLI GPGPU Question


[quote name='kauna' timestamp='1327045376' post='4904502']
Doesn't this mean that there is a dependency between the GPUs and they need to synchronize the frame buffers, which of course has a negative impact on the performance.

As far as I know, with SLI/CrossFire, there shouldn't be any dependency between the current and the previous frame, otherwise the GPUs need to synchronize.

Cheers!
[/quote]

Yes, that's true. Though there's always a small dependency between the two, and heavy restrictions on what can be transferred from one to the other without killing performance. Don't forget that SLI/CrossFire cards have a bridge connecting them - to my knowledge this is for direct memory transfer between the cards without passing through the motherboard. There's also the ability to account for some latency between the two.

It's likely that cross-card performance will get faster over the next few generations of cards due to their more frequent use as general-purpose units.

Remember that your monitor is only connected to one of your cards, so in standard AFR rendering on both Radeon and GeForce GPUs every second frame is sent over the SLI/CrossFire bridge. Having synchronization between the GPUs isn't always a problem, since the bridge has an insane amount of bandwidth. The idea of doing post-processing on the second GPU should not add any overhead if you ask me, since the first GPU will not be touching that data again, so it does not have to wait for the second GPU in any way and can immediately begin rendering the next frame.

I do however fail to see how this is going to give any benefit over standard AFR. With only one GPU, it has about 16.666 ms to render a complete frame for a smooth 60 FPS frame rate. With 2 GPUs, they only have to put out 30 frames per second each, meaning they have around 33.333 ms each. In an optimal case for doing post-processing on another GPU, both rendering and post-processing would take the exact same amount of time, meaning 16.666 ms each. The actual data transfer between the GPUs would not take any effective rendering time away from either GPU, since it can be done in parallel with the first GPU rendering the next frame. The data transfer would, however, introduce a small overhead to the total time it takes to get from OpenGL/DirectX commands to a completely rendered frame.

Consider this extremely artistic little chart:



AFR:

GPU 1 <---Frame0---> <---Frame2---> <---Frame4---> etc
GPU 2 <---Frame1---> <---Frame3---> etc



Postprocessing on a different GPU

Fr? = Frame no ?
R = rendering
D = data transfer
P = postprocessing


GPU 1 <---Fr0 R---> <---Fr1 R---> <---Fr2 R---> <---Fr3 R---> etc
Data <-Fr0 D-> <-Fr1 D-> <-Fr2 D-> <-Fr3 D->
GPU 2 <---Fr0 P---> <---Fr1 P---> <---Fr2 P---> etc



After the first frame has been rendered and transmitted by the first GPU, both GPUs would be working 100% without any stalling. The data transfer overhead would, however, make it worse than basic AFR because of the added delay, which would be perceived as input lag.
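To make the data-transfer step in the chart concrete, here is a rough sketch (all names made up, D3D11 assumed) of moving a finished frame from one device to another through a CPU-visible staging texture. A real SLI/CrossFire driver moves the data over the bridge on its own; this explicit readback/upload path is only what you would do by hand with two independent devices, and the Map call synchronizes with the first GPU, so in practice you would double-buffer the staging texture to keep the overlap shown above.

[code]
#include <d3d11.h>

// Hypothetical sketch: explicitly moving one rendered frame from GPU 0 to GPU 1.
// gpu0Staging - D3D11_USAGE_STAGING texture on device 0 with D3D11_CPU_ACCESS_READ
// gpu1Texture - D3D11_USAGE_DEFAULT texture on device 1, same size and format
void CopyFrameToOtherGpu(ID3D11DeviceContext* ctx0, ID3D11Texture2D* gpu0Frame,
                         ID3D11Texture2D* gpu0Staging,
                         ID3D11DeviceContext* ctx1, ID3D11Texture2D* gpu1Texture,
                         UINT height)
{
    // GPU 0: copy the finished frame into a CPU-readable staging texture.
    ctx0->CopyResource(gpu0Staging, gpu0Frame);

    // Mapping for read forces the CPU to wait until GPU 0 has finished the copy.
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    if (FAILED(ctx0->Map(gpu0Staging, 0, D3D11_MAP_READ, 0, &mapped)))
        return;

    // GPU 1: upload the pixels into its own default-usage texture.
    ctx1->UpdateSubresource(gpu1Texture, 0, nullptr,
                            mapped.pData, mapped.RowPitch,
                            mapped.RowPitch * height);

    ctx0->Unmap(gpu0Staging, 0);
}
[/code]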

I do however fail to see how this is going to give any benefit over standard AFR.


Standard AFR requires/highly recommends that you do not include any rendering that takes the results of the previous frame. This means all incremental-rendering tricks, like rain streaks down a texture, surface degradation, reprojection-cache tricks, even delta-based motion blur (as much as I dislike that effect), cause stalls in standard AFR: it introduces a bottleneck on both cards, as every frame depends on the previous frame. However, running the 3D scene and the post-processing on different cards bypasses this rule; data is only piped from the 3D-scene GPU to the post-processing GPU, and both can then independently perform tricks based on the previous frame without interference.
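To make that kind of dependency concrete, here is a rough D3D11-style sketch (names made up) of a ping-pong setup where frame N samples frame N-1's result, e.g. for accumulating rain streaks. Under AFR the two GPUs own alternating frames, so that read forces one GPU to wait for a resource the other one produced.

[code]
#include <d3d11.h>

// Hypothetical sketch: an "incremental" effect that reads last frame's result.
// The two history textures are created with both D3D11_BIND_RENDER_TARGET
// and D3D11_BIND_SHADER_RESOURCE.
struct HistoryBuffer
{
    ID3D11RenderTargetView*   rtv[2];
    ID3D11ShaderResourceView* srv[2];
    int                       current = 0;
};

void RenderIncrementalEffect(ID3D11DeviceContext* ctx, HistoryBuffer& h,
                             ID3D11DepthStencilView* depth)
{
    int prev = 1 - h.current;

    // Write into this frame's target...
    ctx->OMSetRenderTargets(1, &h.rtv[h.current], depth);

    // ...while sampling the *previous* frame's result in the pixel shader.
    // This is exactly the cross-frame dependency that serializes AFR: the GPU
    // rendering frame N needs data the other GPU wrote for frame N-1.
    ctx->PSSetShaderResources(0, 1, &h.srv[prev]);

    // (issue the draw calls for the effect here)

    h.current = prev;  // ping-pong for the next frame
}
[/code]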

[quote name='theagentd' timestamp='1327229883' post='4905076']
I do however fail to see how this is going to give any benefit over standard AFR.


Standard AFR requires/highly recommends that you do not include any rendering that takes the results of the previous frame. This means all incremental-rendering tricks, like rain streaks down a texture, surface degradation, reprojection-cache tricks, even delta-based motion blur (as much as I dislike that effect), cause stalls in standard AFR: it introduces a bottleneck on both cards, as every frame depends on the previous frame. However, running the 3D scene and the post-processing on different cards bypasses this rule; data is only piped from the 3D-scene GPU to the post-processing GPU, and both can then independently perform tricks based on the previous frame without interference.
[/quote]
That's true, as long as each GPU only reuses its own frame buffers. The first GPU obviously does not have access to the post-processed frame, but that's the only limitation, I guess. It does enable some really nice effects that would otherwise hurt the performance of AFR.

It's all a very interesting idea, but wouldn't it still be hard to load balance between the two GPUs? The first GPU's load depends a lot on the scene: how many objects are visible, what shaders are used, etc., while post-processing pretty much only depends on the number of pixels. I suppose it would be pretty easy to find quality settings that make post-processing take around 16.666 ms on the second GPU, but the load on the first GPU can vary wildly. In a game that does not have an equal load between rendering and post-processing this would not scale well at all...

In a game that does not have an equal load between rendering and post-processing this would not scale well at all...


Quite so. It's more of an exercise in learning GPU bottlenecks. Given the design of devices and device contexts in DirectX 11, though, it's quite easy to support both ways and even switch between them.
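For reference, a minimal sketch of that D3D11 side (variable names made up): enumerate the adapters and create an independent device and immediate context per GPU, then route the scene rendering to one and the post-processing to the other. Note that with SLI/CrossFire enabled in the driver, the linked GPUs may show up as a single adapter, so this explicit path mainly applies when the driver exposes them separately.

[code]
#include <d3d11.h>
#include <dxgi.h>
#include <vector>

// Hypothetical sketch: one D3D11 device + immediate context per physical adapter.
struct GpuDevice
{
    IDXGIAdapter1*       adapter = nullptr;
    ID3D11Device*        device  = nullptr;
    ID3D11DeviceContext* context = nullptr;
};

std::vector<GpuDevice> CreateDevicePerAdapter()
{
    std::vector<GpuDevice> gpus;

    IDXGIFactory1* factory = nullptr;
    if (FAILED(CreateDXGIFactory1(__uuidof(IDXGIFactory1), (void**)&factory)))
        return gpus;

    IDXGIAdapter1* adapter = nullptr;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        GpuDevice gpu;
        gpu.adapter = adapter;

        // When an explicit adapter is passed, the driver type must be UNKNOWN.
        if (SUCCEEDED(D3D11CreateDevice(adapter, D3D_DRIVER_TYPE_UNKNOWN, nullptr, 0,
                                        nullptr, 0, D3D11_SDK_VERSION,
                                        &gpu.device, nullptr, &gpu.context)))
        {
            gpus.push_back(gpu);
        }
        else
        {
            adapter->Release();
        }
    }

    factory->Release();
    return gpus;
}
[/code]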

I found it interesting to design code this way on account of PS3 development, where it's common to offload post-processing onto the SPUs instead.

[quote name='theagentd' timestamp='1327287236' post='4905308']
In a game that does not have an equal load between rendering and post-processing this would not scale well at all...


Quite so. It's more of an exercise in learning GPU bottlenecks. Given the design of devices and device contexts in DirectX 11, though, it's quite easy to support both ways and even switch between them.

I found it interesting to design code this way on account of PS3 development, where it's common to offload post-processing onto the SPUs instead.
[/quote]

It could be possible to load balance between the GPUs by moving tasks that do not require data from the previous frame between them. In deferred rendering, if we assume that the first GPU can always manage to set up the G-buffers in the time it has and that the second GPU can always manage the post-processing in time, we can have both GPUs render lights and change the ratio of lights rendered by each GPU in real time based on timing measurements. The same could be done for tile-based deferred rendering by changing the ratio of tiles lit by each GPU. This could obviously also work with completely different GPUs as long as the earlier assumption still holds.
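Purely as an illustration (the function and thresholds are made up), the rebalancing could be as simple as measuring each GPU's frame time, e.g. with timestamp queries, and nudging the fraction of lights (or tiles) shaded by the first GPU every frame:

[code]
#include <algorithm>

// Hypothetical sketch: adjust the share of deferred lights shaded on GPU 0,
// based on the last measured per-GPU frame times (e.g. from timestamp queries).
float RebalanceLightSplit(float gpu0Share,    // fraction of lights shaded on GPU 0
                          double gpu0Millis,  // GPU 0: G-buffer + its share of lights
                          double gpu1Millis)  // GPU 1: post-processing + its share
{
    const float  step      = 0.02f; // move at most 2% of the lights per frame
    const double tolerance = 1.05;  // ignore differences smaller than ~5%

    if (gpu0Millis > gpu1Millis * tolerance)
        gpu0Share -= step;          // GPU 0 is the bottleneck, hand lights to GPU 1
    else if (gpu1Millis > gpu0Millis * tolerance)
        gpu0Share += step;          // GPU 1 is the bottleneck, take lights back

    // Keep some work on both GPUs so neither ever idles completely.
    return std::clamp(gpu0Share, 0.1f, 0.9f);
}
[/code]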

This sure is interesting. Sadly I don't have access to a multi-GPU computer right now, and I also use OpenGL, not DX...
SLI should also work in OpenGL.

I think the really interesting path would be to render screen/view-space effects with one GPU while the other GPU does world-space calculations and bakes those into textures. It would be kind of like a realtime megatexture. That would actually be quite a nice way to speed up rendering in split-screen gaming on consoles :)

I was doing something like that on a PSP: since the PSP does not support multitexturing and fillrate was also quite limited, you wanted to draw all surfaces with just one draw call, while on the other side there is a "Media Engine" which basically idles all the time. So I was blending up to 16 terrain layers plus a lightmap there and DXT1-compressing the result into VRAM.

I could imagine that a game like e.g. Portal, with very simple level shapes yet nice lighting, could benefit a lot from a realtime GI solution that is baked into the textures.
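A rough CPU-side sketch of that "bake many layers into one texture" idea (names made up, the DXT1 compression step left out): blend the weighted terrain layers, multiply in the lightmap, and get one composite texture that the main renderer can sample with a single fetch.

[code]
#include <cstdint>
#include <vector>

// Hypothetical sketch: bake N weighted terrain layers + a lightmap into one
// RGB8 texture so the renderer only needs a single texture fetch per pixel.
// Every input is width*height pixels; color data is 3 bytes per pixel.
std::vector<uint8_t> BakeComposite(const std::vector<std::vector<uint8_t>>& layers,
                                   const std::vector<std::vector<float>>&   weights,
                                   const std::vector<uint8_t>&              lightmap,
                                   int width, int height)
{
    std::vector<uint8_t> result(width * height * 3);

    for (int p = 0; p < width * height; ++p)
    {
        for (int c = 0; c < 3; ++c)
        {
            // Weighted blend of all layers at this pixel/channel.
            float blended = 0.0f;
            for (size_t l = 0; l < layers.size(); ++l)
                blended += weights[l][p] * layers[l][p * 3 + c];

            // Apply the lightmap (stored as RGB8, so normalize to [0,1]).
            float lit = blended * (lightmap[p * 3 + c] / 255.0f);

            result[p * 3 + c] = static_cast<uint8_t>(lit > 255.0f ? 255.0f : lit);
        }
    }
    return result;  // would be DXT1-compressed and uploaded from here
}
[/code]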

[quote]
SLI should also work in OpenGL.

I think the really interesting path would be to render screen/view-space effects with one GPU while the other GPU does world-space calculations and bakes those into textures. It would be kind of like a realtime megatexture. That would actually be quite a nice way to speed up rendering in split-screen gaming on consoles :)

I was doing something like that on a PSP: since the PSP does not support multitexturing and fillrate was also quite limited, you wanted to draw all surfaces with just one draw call, while on the other side there is a "Media Engine" which basically idles all the time. So I was blending up to 16 terrain layers plus a lightmap there and DXT1-compressing the result into VRAM.

I could imagine that a game like e.g. Portal, with very simple level shapes yet nice lighting, could benefit a lot from a realtime GI solution that is baked into the textures.
[/quote]

Portal does use GI: http://www.valvesoft...ourceEngine.pdf

And that sounds quite interesting, but if you don't manage the GPUs properly and there is some small lag between them, then all of a sudden you have really ugly texture popping. Look at how RAGE implemented its megatexture: at release it failed on some GPUs, and then it took multiple days to release a patch that degraded performance even further (but notably fixed some of the issues). Megatextures aren't ready for real-world usage without a lot of further development.
www.wirezapp.net - Bringing Power to the indie developer
