How 3D engines manage so many textures without running out of VRAM?

11 comments, last by 21st Century Moose 8 years, 11 months ago

Hello everyone,

I have done a few 2D games and have a solid understanding of most of the techniques used in 2D games, but I haven't worked with 3D yet. I'm wondering: how do game engines deal with loading so many textures without running out of VRAM?

So let's say you have 100 models, and each model has a 4K texture. Each uncompressed texture is roughly 67 MB (4096 x 4096 x 32 bits). So if we rendered all 100 models with their 4K textures, that would come to about 6.7 GB (100 x 67 MB). Most video cards have 1-2 GB of VRAM. So how do engines deal with that amount of data?

I understand that engines use occlusion culling to draw only the models the camera can actually see. Is that how they deal with textures too? Do they load only the textures that are visible to the camera? Do they load them at runtime? That might be slow. Do they use some kind of level-of-detail algorithm and load the 4K texture only when you're close to the model? I don't know.

How is this done exactly?


Not all models will have a 4k texture. A lot of models will be for bullets, smaller scene decoration objects, etc. 256x256 or even lower is perfectly reasonable for them.

Not all models are used on every map. Of your hypothetical 100 models, the current map may only use 20 or so. So you don't load all 100, you just load the 20 or so that the current map uses. When the player transitions to a new map you destroy the textures and models which are not needed for the new map, which frees up memory, and again only load the textures and models that the new map uses.


Side note - besides texture streaming, advanced resource management, and virtual/sparse/mega textures - a 4K texture is only 8 MB (not 64 MB) when using DXT1/BC1, so loading 100 of them is feasible on a 1+ GB GPU :)


That makes sense.


Ooh wow. I hadn't heard of texture streaming or DXT1/BC1 compression before. That's really interesting.


They're the tiny .dds files you see in desktop games (usually, that is; DDS is a container, so it can hold either DXT-compressed data or raw data). Mobile devices have their own formats too (PVRTC, ETC2, etc.). And there is ASTC, although it has only started to get support recently, AFAIK.

It's not often that you send uncompressed data to the GPU since, as you mentioned, it's simply too much data. Then again, in 2D games, if you're doing some sort of pixel-art look, DXT compression might screw it up.

"I AM ZE EMPRAH OPENGL 3.3 THE CORE, I DEMAND FROM THEE ZE SHADERZ AND MATRIXEZ"

My journals: dustArtemis ECS framework and Making a Terrain Generator



Correct me if I'm wrong, but you also have to factor in texture compression. A 4K x 4K texture is 16 MB compressed, last I checked; add mipmapping and it's about 21 MB.

At the same time, more than textures is stored on the video card. You also have to factor in that the average player will probably want at least 4x AA at 1920x1080, which leads to 50-100 MB of framebuffer memory being used.

But the best way to look at it is this: few games have 100 unique humans in the world at once, and if they do, each human's texture probably doesn't exceed 1K x 1K. Either that, or they have a major engine and have completely engineered a way around this problem.


The driver does this for you in DX11 and GL.

When you submit a draw call (glDraw*), the driver will check which inputs the program needs and, for each texture, whether it's resident (i.e. in GPU-accessible memory); if it's not, the driver will make it so.

If there is not enough memory, the driver will typically evict unused data from VRAM (using, for instance, a least-recently-used table), copying it to main memory if it's not already there, and use the freed memory for the texture that needs it.

Any texture in main memory may itself be paged out. The algorithm used to decide which texture to replace obviously has a strong impact on performance.



I will note that on Windows there is still a finite amount of system memory that it will use for paging out GPU memory, and so you can still exhaust your resources if you have too much data. Also in general, you really want to avoid having the driver page things in and out mid-frame. It's a great way to kill your performance in unpredictable ways.


But is there a way to prevent it?
I would think destroying unused textures and creating new ones mid-frame would be even worse, unless it's possible to do that on a different thread. Or wait, is that exactly how so-called streaming textures work?

The thing is, you would never actually destroy and create textures at runtime. What you would do instead is replace the contents of already existing textures.

So you would have a pool of, say, 20 textures at a given resolution (let's pick 512x512 for the purposes of this example). If one of those textures is no longer needed at runtime, rather than destroy it, you would just reuse the texture and replace its contents. In OpenGL this means a glTexSubImage call; in D3D, a LockRect or Map.

Now, you might think that with a truly generic system you can't guarantee that your hypothetical pool of 20 512x512 textures is appropriate to every situation, but you don't build a truly generic system. You adapt your texture management system to the requirements of your program. A generic system that can handle any arbitrary thing you throw at it sounds nice in theory, but just isn't practical. In practice you're going to have knowledge of your content, you're going to know that 95% of the textures you use are of a given fixed resolution, and you build something that works for your specific case using that knowledge.

That's all assuming that you even need to go down that route, which only really applies to low-memory situations. If you have enough memory for all of the current map's requirements then you load everything that the current map needs one-time-only, but you don't load anything that the current map doesn't need.


