S3TC mipmaps without DDS

1 comment, last by cr88192 11 years ago

I'm trying to use S3 texture compression in my OpenGL game, but it seems that if I want mipmaps I either have to load from a pre-compressed DDS file or generate the mipmaps manually; I can't just use glGenerateMipmapEXT. I don't want to load from a DDS because I want to offer the user the option of disabling texture compression, provided they have enough VRAM. I want the texture to be loaded from a lossless PNG and then compressed at load time if texture compression is enabled. If I do need to generate mipmaps manually, is there a simple library that can do this, or are there some straightforward code samples or tutorials detailing how to do it?


There are a lot of libraries that can do the compression for you, e.g.

http://developer.amd.com/resources/archive/archived-tools/gpu-tools-archive/ati_compress/

https://code.google.com/p/nvidia-texture-tools/

https://code.google.com/p/libsquish/

https://code.google.com/p/crunch/

Some forum members here have also written their own: LSpiro's is available as an exe tool and, IIRC, cr88192 recently posted some code for a fast compressor designed for on-load use.

However, high-quality texture compression is a slow process. For example, the NVIDIA library lets you choose between quality and speed, but you can't have both. This is the main reason that textures are usually pre-compressed. So if you use fast on-load texture compression, the quality impact of texture compression will be even worse than usual!

If you want fast loading times in both cases, you could store both the DDS and PNG files on disk ;)

DXT5 compression has a 4:1 ratio (relative to 32-bit RGBA), so another option would be to always use texture compression, but simply use higher resolutions on better GPUs. E.g. use 4096x4096 textures on GPUs with more VRAM, and 2048x2048 textures on other GPUs (this is the same VRAM saving as turning compression on or off). You can do this by simply ignoring the first mip in the DDS file.
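To make the arithmetic above concrete, here is a minimal sketch in C. The function names are hypothetical helpers, not part of any library, and the offset helper assumes the usual DDS layout for a single 2D texture (mip levels stored back to back after the header, largest first).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one mip level of uncompressed RGBA8 (4 bytes per pixel). */
size_t rgba8_level_bytes(size_t w, size_t h)
{
    assert(w > 0 && h > 0);
    return w * h * 4;
}

/* Bytes for one mip level of DXT5: 16 bytes per 4x4 block,
   with partial blocks rounded up to a whole block. */
size_t dxt5_level_bytes(size_t w, size_t h)
{
    assert(w > 0 && h > 0);
    return ((w + 3) / 4) * ((h + 3) / 4) * 16;
}

/* Byte offset of mip `level` within a tightly packed DXT5 mip chain
   whose level 0 is w x h. Hypothetical helper: the offset is relative
   to the start of the pixel data, i.e. after the DDS header. */
size_t dxt5_offset_of_level(size_t w, size_t h, unsigned level)
{
    size_t offset = 0;
    for (unsigned i = 0; i < level; i++) {
        offset += dxt5_level_bytes(w, h);
        if (w > 1) w /= 2;   /* each level halves the size, down to 1 */
        if (h > 1) h /= 2;
    }
    return offset;
}
```

With these, `dxt5_level_bytes(4096, 4096)` and `rgba8_level_bytes(2048, 2048)` both come to 16 MiB, and "ignoring the first mip" amounts to starting the upload at `dxt5_offset_of_level(w, h, 1)` with the width and height halved.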

As a side note, texture compression doesn't just save VRAM, but also saves bandwidth when performing texture fetch instructions in your shaders (which makes your shaders run faster if they're bound by fetch latencies).

yes, for example (for DXT5):

http://pastebin.com/8rq3z5F5

http://pastebin.com/8Sga1sd6

which basically try to do DXT5 compression reasonably quickly by reducing the process mostly to fixed-point arithmetic and bit-twiddling.

however, there is no real internal feedback or search for an optimal encoding, so the quality is basically "whatever the fixed-point math spits out".

there was also some fiddling with a non-DDS DXTn based format (which added a secondary LZ77 compression stage), mostly intended for video-decoding usage (to be cheaper to decode than my current usage of M-JPEG for video-map textures).

for the most part, I have just been loading PNG and JPEG images and converting them to DXTn on load (after mipmap generation though, at least at present), as it has generally been "good enough"...

naively, LZ77 can reduce DXT5 to 1/2 its size by generally compressing down the often highly-redundant alpha-channel, but a lossy block-reduction filter can further increase compression... however, for size/quality tradeoffs (with on-disk storage), JPEG and PNG still have an advantage here in my tests.

ADD:

note that mipmaps aren't actually hard to generate.

basically, at each level you are dividing the X and Y resolution in half, and roughly averaging 4 pixels into a single pixel.

say, pixels:

A B

C D

become

P

so: P=(A+B+C+D)/4

in the simple case, this is basically done for all of the image pixels and for each component, and is basically repeated each time to generate each new mip-level.
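a minimal sketch of that simple case in C, assuming tightly packed 8-bit RGBA data. the function name is made up for illustration, and the +2 before dividing is just round-to-nearest (an addition here, not something the formula above requires):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Downsample one RGBA8 image (sw x sh) to the next mip level.
   dst must hold max(sw/2,1) * max(sh/2,1) * 4 bytes.
   Works best with power-of-two sizes; for odd sizes the last
   row/column is simply dropped. */
void mip_downsample_box(const uint8_t *src, int sw, int sh, uint8_t *dst)
{
    assert(src && dst && sw > 0 && sh > 0);
    int dw = sw > 1 ? sw / 2 : 1;
    int dh = sh > 1 ? sh / 2 : 1;

    for (int y = 0; y < dh; y++) {
        int y0 = 2 * y;
        int y1 = (y0 + 1 < sh) ? y0 + 1 : y0;   /* clamp for 1-pixel-high images */
        for (int x = 0; x < dw; x++) {
            int x0 = 2 * x;
            int x1 = (x0 + 1 < sw) ? x0 + 1 : x0; /* clamp for 1-pixel-wide images */
            const uint8_t *a = src + 4 * ((size_t)y0 * sw + x0);
            const uint8_t *b = src + 4 * ((size_t)y0 * sw + x1);
            const uint8_t *c = src + 4 * ((size_t)y1 * sw + x0);
            const uint8_t *d = src + 4 * ((size_t)y1 * sw + x1);
            uint8_t *p = dst + 4 * ((size_t)y * dw + x);
            for (int k = 0; k < 4; k++)           /* P = (A+B+C+D)/4 per component */
                p[k] = (uint8_t)((a[k] + b[k] + c[k] + d[k] + 2) / 4);
        }
    }
}
```

calling this repeatedly, feeding each output back in as the next input until the image is 1x1, gives the whole mip chain.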

slightly higher quality for alpha-blended images can be gained like this:

Ta=Aa+Ba+Ca+Da (where Aa=A's alpha, and Ta=Total alpha).

P=A*(Aa/Ta)+B*(Ba/Ta)+C*(Ca/Ta)+D*(Da/Ta)

or alternatively (Argb = RGB for A):

Prgb=Argb*(Aa/Ta)+Brgb*(Ba/Ta)+Crgb*(Ca/Ta)+Drgb*(Da/Ta)

Pa=Ta/4

then, each mipmap level can be converted into DXTn.

in actual code, this typically results in 2 levels of "for" loops (one for Y and one for X), with some math in the middle.

note that in the latter strategy (where Ta is used), a few special cases may need special handling: Ta=0 and Ta=1020, where a fallback to the original (simple average) strategy may be used. this also requires either fixed or floating point arithmetic (since with plain integer arithmetic, say, 240/960 will always give 0), or possibly alternatively something like (Argb*Aa)/Ta.
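a rough sketch of the alpha-weighted version for a single 2x2 group, using the (Argb*Aa)/Ta form so that plain integer arithmetic works. the function name and the rounding terms are assumptions for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Combine four RGBA8 pixels (A, B, C, D) into one pixel P,
   weighting each pixel's RGB by its alpha. */
void average4_alpha_weighted(const uint8_t a[4], const uint8_t b[4],
                             const uint8_t c[4], const uint8_t d[4],
                             uint8_t out[4])
{
    assert(a && b && c && d && out);
    uint32_t ta = (uint32_t)a[3] + b[3] + c[3] + d[3];   /* Ta, total alpha */

    /* Ta=0 would divide by zero; Ta=1020 (all opaque) gives the same
       result as the simple average, so both fall back to it. */
    if (ta == 0 || ta == 1020) {
        for (int k = 0; k < 4; k++)
            out[k] = (uint8_t)((a[k] + b[k] + c[k] + d[k] + 2) / 4);
        return;
    }

    for (int k = 0; k < 3; k++) {
        /* multiply first, divide last: avoids 240/960 truncating to 0 */
        uint32_t sum = (uint32_t)a[k] * a[3] + (uint32_t)b[k] * b[3]
                     + (uint32_t)c[k] * c[3] + (uint32_t)d[k] * d[3];
        out[k] = (uint8_t)((sum + ta / 2) / ta);
    }
    out[3] = (uint8_t)((ta + 2) / 4);                    /* Pa = Ta/4 */
}
```

for example, one opaque red pixel next to three fully transparent green ones comes out as pure red at roughly quarter alpha, rather than the greenish tint the simple average would give.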
