Understanding VRAM use / bandwidth

Started by
8 comments, last by MJP 11 years, 11 months ago
Greetings.

As I design my rendering engine, I'm thinking a lot about off-screen buffers for things like reflection textures, g-buffers, etc. Specifically, I'm wondering how many I can get away with. All of my textures, vertex buffers, etc., consume less than 400MB per scene. It seems that with today's cards with 1GB+ of video memory, it would be difficult to fill the VRAM with off-screen buffers (even if they're 32-bit float buffers, which I believe consume less than 40MB when you include depth/stencil). I'm interested in learning more about the other performance considerations, though. I can certainly experiment to see how much I can get away with, but I was hoping someone might have an article or two that would explain more of the theory -- things like bandwidth and cache and how data is moved around in the pipeline -- so that I can get a better idea of the smartest way to manage these things.
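(For a rough sanity check on that figure, here's the back-of-the-envelope arithmetic, assuming a 1920x1080 target with an RGBA32F colour buffer and a 32-bit depth/stencil surface; the exact number obviously depends on resolution and format.)

```cpp
#include <cstdio>

// Back-of-the-envelope size of one full-screen off-screen buffer (no MSAA),
// assuming 1920x1080 with RGBA32F colour (16 bytes/pixel) plus a 32-bit
// depth/stencil surface (4 bytes/pixel).
int main()
{
    const double pixels = 1920.0 * 1080.0;
    const double bytes  = pixels * (16.0 + 4.0);
    std::printf("~%.1f MB per target\n", bytes / (1024.0 * 1024.0)); // ~39.6 MB
    return 0;
}
```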

Thanks!

All of my textures, vertex buffers, etc., consume less than 400MB per scene. It seems that with today's cards with 1GB+ of video memory, it would be difficult to fill the VRAM with off-screen buffers (even if they're 32-bit float buffers, which I believe consume less than 40MB when you include depth/stencil).

Lucky you. My planet renderer is teetering on the edge of oblivion, paging in-and-out ~6 GB of texture data alone...

Also consider that mobile devices have much less VRAM available. Even the latest iPad only sports a gigabyte of RAM total, and I doubt you can use more than 512MB for GPU data.

I was hoping someone might have an article or two that would explain more of the theory -- things like bandwidth and cache and how data is moved around in the pipeline -- so that I can get a better idea of the smartest way to manage these things.
GPU memory is fast, bandwidth to main memory is limited - that's the long and short of it.

Fit all the data you can on the card, and make sure to absolutely minimise the data going back and forth across the bus.
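To make that concrete, here's a minimal sketch of the "keep it on the card" half, assuming D3D11 (the function name and parameters are just placeholders): static geometry created as IMMUTABLE is uploaded once at creation and then lives entirely in VRAM, whereas anything you map and rewrite every frame is crossing the bus every frame.

```cpp
#include <d3d11.h>

// Sketch: a GPU-resident vertex buffer that is uploaded once and never
// written again, so it generates no per-frame traffic across the bus.
ID3D11Buffer* CreateStaticVertexBuffer(ID3D11Device* device,
                                       const void* vertexData,
                                       UINT vertexDataSize)
{
    D3D11_BUFFER_DESC desc = {};
    desc.ByteWidth = vertexDataSize;
    desc.Usage     = D3D11_USAGE_IMMUTABLE;   // GPU-only; the CPU never touches it again
    desc.BindFlags = D3D11_BIND_VERTEX_BUFFER;

    D3D11_SUBRESOURCE_DATA init = {};
    init.pSysMem = vertexData;                // one-time upload at creation

    ID3D11Buffer* buffer = nullptr;
    device->CreateBuffer(&desc, &init, &buffer);
    return buffer;
}
```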

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

swiftcoder, I've seen your planet renderer and that doesn't surprise me in the least. That stuff looks amazing. Thanks for the advice.
Render targets can consume a lot of memory very quickly. It's quite common to have a lot of them, and they can have a large footprint if they use a floating-point format or MSAA. An MSAA floating-point G-buffer can be hundreds of megabytes at 1920x1080.
Thanks. Is there an easy way to calculate the memory increase that occurs when MSAA is enabled for a frame buffer?

Thanks. Is there an easy way to calculate the memory increase that occurs when MSAA is enabled for a frame buffer?

MSAA basically stores N samples for each pixel, resulting in Nx the memory usage: 8x MSAA takes 8x the memory, 16x MSAA takes 16x the memory.

For a 1920x1080 G-buffer with floating point colour and normals, that's about 500 MB just for the G-buffer.
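If you want the arithmetic spelled out, here's a minimal sketch; the G-buffer layout is an assumption on my part (RGBA32F colour, RGBA32F normals, 32-bit depth/stencil), so plug in whatever formats and sample counts you actually use.

```cpp
#include <cstdio>
#include <cstdint>

// Bytes for one full-screen render target: width * height * bytesPerPixel,
// scaled by the MSAA sample count (each pixel stores N subsamples).
std::uint64_t targetBytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t bytesPerPixel, std::uint64_t samples)
{
    return width * height * bytesPerPixel * samples;
}

int main()
{
    const std::uint64_t w = 1920, h = 1080, msaa = 8;

    // Hypothetical layout: RGBA32F colour (16 B) + RGBA32F normals (16 B)
    // + 32-bit depth/stencil (4 B) per sample.
    const std::uint64_t total = targetBytes(w, h, 16, msaa)
                              + targetBytes(w, h, 16, msaa)
                              + targetBytes(w, h, 4,  msaa);

    std::printf("%.1f MB\n", total / (1024.0 * 1024.0)); // ~569 MB at 8x MSAA
    return 0;
}
```

Set the sample count to 1 and the same function gives you the non-MSAA footprint, which is why the rule of thumb really is just "N times whatever you started with".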

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

Yikes. What effect does that have on fill rate? I was under the impression that MSAA didn't affect the number of rendered pixels and that it used some sort of edge-detection technique.

Yikes. What effect does that have on fill rate?

It kills it. You need one hell of a beefy card to run 16x MSAA on a modern game.

I was under the impression that MSAA didn't affect the number of rendered pixels and that it used some sort of edge-detection technique.
Perhaps you are thinking of one of the MLAA/SMAA variants?

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]

That's probably what I was thinking of. I'll have to do some more reading on it.

Yikes. What effect does that have on fill rate? I was under the impression that MSAA didn't affect the number of rendered pixels and that it used some sort of edge-detection technique.


In terms of render target memory storage there are generally N subsamples per pixel based on the MSAA rate, as swiftcoder explained. This is true of both color buffers and depth buffers. You have to be a little careful though, because Nvidia/AMD can be a little tricky with their nomenclature. For instance, Nvidia has special MSAA modes that only use 4 or 8 subsamples per pixel, but they advertise them as "16x MSAA" due to using a special decoupled method of storing pixel coverage that uses 16 coverage samples.

In terms of bandwidth and pixel shading it's more complicated. The way MSAA works is that the pixel shader will *usually* only shade one sample per pixel, which is similar to the non-MSAA case. However, for cases where multiple triangles partially overlap a pixel, you'll end up executing the pixel shader more than once per pixel. In terms of bandwidth, the pixel shader still has to write to all subsamples even if it only shades once. To mitigate this problem, GPUs employ complex lossless compression schemes that take advantage of the fact that the color is typically the same for all subsamples. The end result is that performance is much, much better than the supersampling case, with the memory footprint being the same.

EDIT: Just to clarify, I'm talking about modern dedicated PC GPUs here. Things are different for the TBDR GPUs used on mobile hardware, as well as for the Xbox 360's GPU.

