Aqua Costa

Texture Size And Performance



I know that large textures decrease performance because of low memory bandwidth of the GPU, but I was wondering if the size also affects performance in other ways like: Is sampling a smaller texture faster? (Both with the same format)

And regarding DX texture compression: Do modern games still use compressed formats? Because I never used them in my projects and I was wondering if I should start using them.

"I know that large textures decrease performance because of low memory bandwidth of the GPU, but I was wondering if the size also affects performance in other ways like: Is sampling a smaller texture faster? (Both with the same format)"
It usually is, assuming the samples are coherent and not random. It's not simply that smaller == faster though.

This isn't exactly accurate, but as a roughly concrete example, let's imagine a GPU that works like this:
A batch of vertices is transformed into primitives. This batch of primitives is rasterized into a batch of pixels. The batch of 'N' pixels has to be shaded (using your pixel shader) by only 'S' hardware shader units. Each shader unit can process 'X' pixels at a time.
Say N=192, S=4 and X=16.
So, we take our batch of 192 pixels to be shaded, split them into 4 groups (of 48 each), and then split each group into 3 sub-groups of 16.
Now each shader unit has 3 sub-groups of pixels to shade. It starts running the pixel shader on the first sub-group of 16 pixels until it hits an instruction that stalls/causes latency. When there's latency, it switches to the 2nd sub-group and executes that until it runs into latency again, and so on, cycling through the sub-groups. This repeats until all 3 sub-groups are finished, at which point the next batch of vertices can be rasterized/shaded.
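
To make the numbers above concrete, here is a minimal Python sketch of that imaginary GPU. The function names (`split_batch`, `schedule`) and the idea of a shader as a list of "alu"/"tex" instructions are made up for illustration. It also ignores whether a fetch has actually completed by the time a sub-group gets its turn again, so treat it as a toy model, not a description of real hardware.

```python
def split_batch(n_pixels, n_units, lanes):
    """Return (pixels per shader unit, lock-step sub-groups per unit)."""
    per_unit = n_pixels // n_units        # e.g. 192 / 4 = 48
    groups_per_unit = per_unit // lanes   # e.g. 48 / 16 = 3
    return per_unit, groups_per_unit


def schedule(groups):
    """Round-robin over sub-groups on one shader unit.

    Each sub-group is a list of instruction names. "tex" stands for a
    texture fetch that stalls; anything else is assumed to finish at once.
    Returns the order in which sub-groups were given the shader unit.
    """
    pcs = [0] * len(groups)   # per-sub-group program counter
    order = []
    current = 0
    while any(pc < len(g) for pc, g in zip(pcs, groups)):
        prog = groups[current]
        if pcs[current] < len(prog):
            order.append(current)
            # Run until a stalling instruction or the end of the shader.
            while pcs[current] < len(prog):
                op = prog[pcs[current]]
                pcs[current] += 1
                if op == "tex":
                    break
        current = (current + 1) % len(groups)
    return order


per_unit, groups_per_unit = split_batch(192, 4, 16)
shader = ["alu", "tex", "alu", "tex", "alu"]   # hypothetical pixel shader
order = schedule([shader] * groups_per_unit)
print(per_unit, groups_per_unit, order)
```

Each "tex" hands the shader unit over to the next sub-group, which is how the latency of one fetch gets hidden behind work from the others.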

Texture sampling instructions cause latency, as they're basically memory fetches. When you issue a texture sample, the L2 cache is instructed to fetch some texture data from VRAM. Because the shader unit is operating on 16 pixels at once (in lock-step), it's going to get (up to) 16 simultaneous requests to fetch data into L2. Once all of these requests are satisfied, the sub-group of 16 pixels that issued them is no longer blocked, and the shader unit can continue processing it.

With all of that out of the way, we can now see why smaller textures are "faster" -- with a lower resolution, the probability that two pixels in a batch will attempt to fetch the same texel becomes higher.
In the extreme case - a 1x1 texture - every texture fetch is going to ask for the same data to be downloaded into the L2 cache, and after the first fetch, all other fetches will be complete instantly.

With higher resolution textures, there's more chance that different pixels will require very different texels, making the probability of a cache-hit very low.
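
A rough way to see this effect is to count how many distinct chunks of texture memory one 4x4 group of pixels touches as the texture gets bigger. The sketch below is only a toy: the 64x64 on-screen quad, point sampling, and the cache storing texels in 4x4 tiles are all assumptions picked for illustration, and `tiles_touched` is a made-up name.

```python
def tiles_touched(texture_size, screen_size=64, block=4, tile=4):
    """Count distinct cache tiles fetched by one block x block pixel group.

    Assumptions (hypothetical): the texture is stretched over a
    screen_size x screen_size quad, sampled at pixel centres with point
    sampling, and the cache holds texels in tile x tile squares.
    """
    touched = set()
    for py in range(block):
        for px in range(block):
            # Map the pixel centre to an integer texel coordinate.
            tx = int((px + 0.5) / screen_size * texture_size)
            ty = int((py + 0.5) / screen_size * texture_size)
            touched.add((tx // tile, ty // tile))
    return len(touched)


for size in (1, 64, 128, 1024):
    print(size, tiles_touched(size))
```

With these assumptions, the 1x1 and 64x64 textures need a single tile for all 16 pixels, the 128x128 texture needs 4, and the 1024x1024 texture needs 16, i.e. every pixel in the group misses separately.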

"And regarding DX texture compression: Do modern games still use compressed formats? Because I never used them in my projects and I was wondering if I should start using them."
Yes. A DXT1 texture is 8 times smaller than its uncompressed 32-bit equivalent, which means (in theory) 8 times as many pixels can be stored in the L2 cache at once (increasing the chances of a cache-hit), and they can be fetched 8 times faster. It also means you can have 8 times more pixels loaded for the same VRAM budget (though, obviously, with compression artefacts).
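
The 8x figure falls out of the block layout: DXT1 stores each 4x4 block of pixels in 8 bytes (4 bits per pixel), against 4 bytes per pixel for 32-bit RGBA. A quick sketch of the arithmetic, with made-up helper names and ignoring mip levels:

```python
def rgba8_bytes(width, height):
    """Uncompressed 32-bit RGBA: 4 bytes per pixel."""
    return width * height * 4


def dxt1_bytes(width, height):
    """DXT1: 8 bytes per 4x4 block, dimensions rounded up to whole blocks."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * 8


w, h = 1024, 1024
print(rgba8_bytes(w, h), dxt1_bytes(w, h), rgba8_bytes(w, h) // dxt1_bytes(w, h))
```

Against a 24-bit RGB source the same calculation gives 6x rather than 8x.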

Also note that mip-maps are a huge performance win here too. Mip-maps make sure that your textures are never "too high resolution" for the area they cover on screen, by sampling from lower-resolution versions where necessary. This reduces fetch latencies greatly, just like compression does.
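
As a simplified sketch of how the "where necessary" part works: the mip level is roughly log2 of how many texels land on one pixel. The function below assumes a square power-of-two texture viewed face-on, and it truncates to a whole level where real hardware would typically filter between levels, so it is only an approximation; `mip_level` is a made-up name.

```python
import math


def mip_level(texture_size, pixels_covered):
    """Pick a mip level for a texture_size-wide texture drawn pixels_covered wide.

    Assumes a square power-of-two texture viewed face-on. Truncates to a
    whole level; real hardware usually blends or rounds instead.
    """
    texels_per_pixel = texture_size / pixels_covered
    max_level = int(math.log2(texture_size))           # the 1x1 level
    lod = math.log2(max(texels_per_pixel, 1.0))        # never below level 0
    return min(int(lod), max_level)


print(mip_level(1024, 64))    # 16 texels per pixel
print(mip_level(1024, 2048))  # magnified, stays on the full-size level
```

So a 1024x1024 texture drawn 64 pixels wide ends up sampling level 4, which is the 64x64 version, and neighbouring pixels fetch neighbouring texels again.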

This also shows why changing textures between draw-calls can degrade performance -- if the same textures are used across consecutive draw-calls, then the L2 cache likely already contains valid texture data, so fetches complete more quickly. If the textures change on every draw-call, then the L2 cache may as well be empty at the beginning of each batch.
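
One common way to act on this is to sort draw-calls by texture before submitting them. The sketch below just counts texture changes before and after sorting. The draw list and the `texture_switches` helper are hypothetical, and sorting like this is only safe where draw order doesn't affect the result (e.g. opaque geometry with depth testing).

```python
def texture_switches(draws):
    """Count how many times the bound texture changes over a draw list."""
    switches = 0
    bound = None
    for _mesh, texture in draws:
        if texture != bound:
            switches += 1
            bound = texture
    return switches


# Hypothetical (mesh, texture) pairs in submission order.
draws = [("crate", "wood"), ("wall", "brick"), ("barrel", "wood"), ("floor", "brick")]
sorted_draws = sorted(draws, key=lambda d: d[1])   # group by texture name

print(texture_switches(draws), texture_switches(sorted_draws))
```

Here the unsorted list changes texture on every draw (4 times), while the sorted one changes only twice.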


"Also note, that using mip-maps is also a huge performance win here."


When using mip-maps fetching is faster, but what about uploading textures to the GPU? Because the memory used by a chain of mipmaps that needs to be uploaded to the GPU is larger than a texture without mipmaps, right?

A full mip chain adds about 33% to the memory required for the largest mip level, because each level is a quarter the size of the one before it (1/4 + 1/16 + ... ≈ 1/3). So if you had a 1MB 512x512 texture, adding mips would bring you up to ~1.33MB. That's not really much, and yet the performance savings can be huge due to the increased texture cache efficiency.
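
Summing the levels directly for the 512x512 example gives the overhead. This sketch assumes 4 bytes per texel and a full chain down to 1x1; `mip_chain_bytes` is a made-up helper name.

```python
def mip_chain_bytes(size, bytes_per_texel=4):
    """Total bytes for a square texture plus every mip level down to 1x1."""
    total = 0
    while size >= 1:
        total += size * size * bytes_per_texel
        size //= 2   # each level halves the width and height
    return total


base = 512 * 512 * 4
full = mip_chain_bytes(512)
print(base, full, round(full / base, 3))
```

The base level is 1,048,576 bytes and the full chain is 1,398,100 bytes, so the mips cost roughly one third extra.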
