DXT compressors, good vs bad

7 comments, last by dpadam450 7 years, 3 months ago

I never really compressed my textures in my engine because I didn't have gigantic levels or a full game's worth of texture content. If I let my GPU or GIMP create DXT textures, they come out about the same, and both look terrible.

I was digging up really old threads about DXT and realized that certain tools are actually way better than others. The difference between GIMP and the Nvidia Texture Tools is drastic. I noticed that the Nvidia tools also have a -fast compression option, which is pretty bad, but still not as bad as GIMP.

So what gives? We have 4x4 blocks being compressed. Of those 16 pixels you select a high and low and interpolate everything in between. My guess then is they must be determining actual contrast/averages to throw out some outliers and have a better high and low value for the 16 pixel block? Maybe this contrast check is happening per r,g,b channel?

*Edit: I just tried the "use dithering option" in GIMP which produced results much closer to the nvidia tool, but still not as good.

Nvidia tool being used:

https://code.google.com/archive/p/nvidia-texture-tools/downloads

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal


You can download the NV texture tools source code and check it out :)

If I remember how DXT works... Each 4x4 block has a palette of 4 colours - 2 are explicit R5G6B5 values, and the other two are implicitly generated at 1/3rd and 2/3rds interpolation of the two explicit colours. Each of the 16 pixels in the block then has a 2-bit color index.
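To make that concrete, here's a minimal sketch of decoding one DXT1/BC1 block as described above. This is my own illustration (not code from the NV tools), the names are made up, and it ignores the color0 <= color1 "3-colour / 1-bit alpha" mode for brevity:

```cpp
#include <cstdint>
#include <array>

struct RGB8 { uint8_t r, g, b; };

// Expand a packed R5G6B5 colour to 8 bits per channel.
static RGB8 Expand565(uint16_t c)
{
    uint8_t r5 = (c >> 11) & 0x1F;
    uint8_t g6 = (c >> 5)  & 0x3F;
    uint8_t b5 =  c        & 0x1F;
    // Replicate the top bits so 31 -> 255 and 63 -> 255 exactly.
    return { uint8_t((r5 << 3) | (r5 >> 2)),
             uint8_t((g6 << 2) | (g6 >> 4)),
             uint8_t((b5 << 3) | (b5 >> 2)) };
}

// Decode one 64-bit DXT1 block: 2 explicit 565 endpoints + 16 two-bit indices.
static void DecodeDXT1Block(uint16_t c0, uint16_t c1, uint32_t indices,
                            std::array<RGB8, 16>& outPixels)
{
    RGB8 e0 = Expand565(c0);
    RGB8 e1 = Expand565(c1);

    // The 4-entry palette: two explicit endpoints plus two implicit colours
    // interpolated at 1/3 and 2/3 between them.
    RGB8 palette[4];
    palette[0] = e0;
    palette[1] = e1;
    palette[2] = { uint8_t((2 * e0.r + e1.r) / 3),
                   uint8_t((2 * e0.g + e1.g) / 3),
                   uint8_t((2 * e0.b + e1.b) / 3) };
    palette[3] = { uint8_t((e0.r + 2 * e1.r) / 3),
                   uint8_t((e0.g + 2 * e1.g) / 3),
                   uint8_t((e0.b + 2 * e1.b) / 3) };

    // Each of the 16 pixels picks one of the 4 palette entries via 2 bits.
    for (int i = 0; i < 16; ++i)
        outPixels[i] = palette[(indices >> (2 * i)) & 0x3];
}
```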

So firstly, 565 creates some challenges. Shitty encoders can't encode grey, because middle-grey in 565 (16/31, 32/63, 16/31) isn't quite grey. This adds characteristic green/purple colour shifts...
Smart encoders can pick two explicit colours such that one of the implicit colours will land exactly on middle grey and won't suffer as many hue shifts.
Likewise, picking the min/max RGB for your two explicit colours doesn't always produce the best encoding. Picking other colours can produce a better overall palette, minimizing per-pixel error within the block.
Better encoders are often slower as they will evaluate several strategies per block to find which produces the least error.
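As a toy illustration of that "evaluate several candidates, keep the least error" idea (my own sketch, nothing like a production encoder; it reuses RGB8 and Expand565 from the decode sketch above, and Pack565 plus the candidate list are made up for illustration):

```cpp
#include <cstdint>
#include <algorithm>
#include <array>
#include <limits>
#include <utility>
#include <vector>

static uint16_t Pack565(const RGB8& c)
{
    return uint16_t(((c.r >> 3) << 11) | ((c.g >> 2) << 5) | (c.b >> 3));
}

static int SquaredError(const RGB8& a, const RGB8& b)
{
    int dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

// Total error of encoding 'src' with the 4-colour palette built from c0/c1:
// every pixel snaps to its nearest palette entry.
static long BlockError(uint16_t c0, uint16_t c1, const std::array<RGB8, 16>& src)
{
    RGB8 e0 = Expand565(c0), e1 = Expand565(c1);
    RGB8 palette[4] = {
        e0, e1,
        { uint8_t((2 * e0.r + e1.r) / 3), uint8_t((2 * e0.g + e1.g) / 3), uint8_t((2 * e0.b + e1.b) / 3) },
        { uint8_t((e0.r + 2 * e1.r) / 3), uint8_t((e0.g + 2 * e1.g) / 3), uint8_t((e0.b + 2 * e1.b) / 3) },
    };

    long total = 0;
    for (const RGB8& p : src)
    {
        int best = std::numeric_limits<int>::max();
        for (const RGB8& entry : palette)
            best = std::min(best, SquaredError(p, entry));
        total += best;
    }
    return total;
}

// Pick the best endpoint pair from a list of candidate strategies
// (e.g. min/max of the block, endpoints nudged so an interpolated entry
// lands on grey, etc.). Assumes 'candidates' is non-empty.
static void ChooseEndpoints(const std::array<RGB8, 16>& src,
                            const std::vector<std::pair<RGB8, RGB8>>& candidates,
                            uint16_t& outC0, uint16_t& outC1)
{
    long bestErr = std::numeric_limits<long>::max();
    for (const auto& cand : candidates)
    {
        uint16_t c0 = Pack565(cand.first), c1 = Pack565(cand.second);
        long err = BlockError(c0, c1, src);
        if (err < bestErr) { bestErr = err; outC0 = c0; outC1 = c1; }
    }
}
```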

Also, compression isn't just so that you can load data faster and fit more data in RAM -- many GPUs will keep texture data compressed in the L2 cache, which means that sampling from a compressed texture can be faster than sampling from an uncompressed one, which means your pixel shaders might even execute faster.

Here's a great article about texture samplers, and it has some information about DXT too.

It says:

So, small L1 cache, long pipeline. What about the “additional smarts”? Well, there’s compressed texture formats. The ones you see on PC – S3TC aka DXTC aka BC1-3, then BC4 and 5 which were introduced with D3D10 and are just variations on DXT, and finally BC6H and 7 which were introduced with D3D11 – are all block-based methods that encode blocks of 4×4 pixels individually. If you decode them during texture sampling, that means you need to be able to decode up to 4 such blocks (if your 4 bilinear sample points happen to land in the worst-case configuration of straddling 4 blocks) per cycle and get a single pixel from each. That, frankly, just sucks. So instead, the 4×4 blocks are decoded when it’s brought into the L1 cache: in the case of BC3 (aka DXT5), you fetch one 128-bit block from texture L2, and then decode that into 16 pixels in the texture cache. And suddenly, instead of having to partially decode up to 4 blocks per sample, you now only need to decode 1.25/(4*4) = about 0.08 blocks per sample, at least if your texture access patterns are coherent enough to hit the other 15 pixels you decoded alongside the one you actually asked for :). Even if you only end up using part of it before it goes out of L1 again, that’s still a massive improvement. Nor is this technique limited to DXT blocks; you can handle most of the differences between the >50 different texture formats required by D3D11 in your cache fill path, which is hit about a third as often as the actual pixel read path – nice. For example, things like UNORM sRGB textures can be handled by converting the sRGB pixels into a 16-bit integer/channel (or 16-bit float/channel, or even 32-bit float if you want) in the texture cache. Filtering then operates on that, properly, in linear space. Mind that this does end up increasing the footprint of texels in the L1 cache, so you might want to increase L1 texture size; not because you need to cache more texels, but because the texels you cache are fatter. As usual, it’s a trade-off.

Our discussion is about the compression stage of generating a DXT texture offline, not the decompression stage. Different tools produce DXT output of varying quality from the same input image.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

Have you checked out AMD's Compressonator yet? It's open source now. http://gpuopen.com/gaming-product/compressonator/

-potential energy is easily made kinetic-

Compressonator is pretty good in my experience, and is also fairly quick. For BC6 and BC7 I would recommend the ISPC Texture Compressor. It also does BC1 and BC3, but I've personally only used it for BC6 and BC7.

@MJP

I dug up an old thread of yours from years ago, I think talking about Compressonator. It brought me to the whole concept of the algorithm behind what these tools are actually doing.

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

DXT compression is not at all trivial, both in how it works and in what logic people attach to it. Finding the best possible coefficients is a hard problem, but this is usually not necessary. On the other hand, a couple of years ago, a person named Rich Geldreich (I remember the name because it was so funny, Geldreich means "rich in money") posted a link to a compressor which deliberately compressed sub-optimally. My first thought was "WTF?!", but on second thought this turned out to be a really ingenious idea, since the bad compressor's output was such that it compressed very well when fed into an LZ compressor afterwards. So you had only slightly worse quality but a lot less data to transmit over the wire.
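For illustration only (this is my own hand-wavy sketch, not Rich's actual algorithm), the core rate/distortion idea can look something like this: when several encodings of a block are close in quality, prefer one the following LZ stage has already seen, trading a little distortion for fewer bits after LZ:

```cpp
#include <cstdint>
#include <limits>
#include <unordered_set>
#include <vector>

struct Candidate
{
    uint64_t encodedBlock;  // the 64-bit DXT1 block this candidate produces
    double   distortion;    // e.g. summed squared error against the source pixels
};

// Pick a candidate minimizing distortion + lambda * estimatedBits, where a
// block byte-identical to one we already emitted is assumed to be nearly
// free for the LZ coder that runs afterwards. Assumes 'candidates' is non-empty;
// the bit estimates are placeholders, not a real LZ model.
uint64_t PickCandidate(const std::vector<Candidate>& candidates,
                       std::unordered_set<uint64_t>& seenBlocks,
                       double lambda)
{
    double bestCost = std::numeric_limits<double>::max();
    uint64_t best = candidates.front().encodedBlock;

    for (const Candidate& c : candidates)
    {
        double estimatedBits = seenBlocks.count(c.encodedBlock) ? 4.0 : 64.0;
        double cost = c.distortion + lambda * estimatedBits;
        if (cost < bestCost) { bestCost = cost; best = c.encodedBlock; }
    }
    seenBlocks.insert(best);
    return best;
}
```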

Rich (and Stephanie Hurlburt) now have a startup called Binormal that's continuing this work, trying to make a universal lossy/lossless hybrid texture format -- an LZ-type lossless algorithm over a lossy block-based algorithm that can quickly transcode to native GPU formats such as DXT/ETC/etc.
They've also got the kind of DXT compressor that you're talking about, which optimizes the DXT coefficients such that LZMA will compress the results better.

*Binomial

NBA2K, Madden, Maneater, Killing Floor, Sims http://www.pawlowskipinball.com/pinballeternal

This topic is closed to new replies.
