misc: BTLZA (LZ77+Arithmetic), ...
what is it?...
basically, an extended form of Deflate, intended mostly to improve compression while having only a "modest" impact on decoding speed.
in its simplest mode, it is basically just Deflate, and is binary compatible;
otherwise, the decoder remains backwards compatible with Deflate.
its extensions are mostly as such:
bigger maximum match length (64KiB);
bigger maximum dictionary size (theoretical 4GB, likely smaller due to implementation limits);
optional arithmetic coded modes.
the idea was partly to have a compromise between Deflate and LZMA, with the encoder able to make some tradeoffs WRT compression settings (speed vs ratio, ...). the hope basically being to have something which could compress better than Deflate but decode faster than LZMA.
the arithmetic coder is currently applied after the Huffman and VLC coding.
this speeds things up slightly by reducing the number of bits which have to be fed through the (otherwise slow) arithmetic coder, while at the same time still offering some (modest) compression benefit from the arithmetic coder.
otherwise, the arithmetic coder can be left disabled (and bits are read/written more directly), in which case decoding will be somewhat faster (enabling it generally seems to make around a 10-15% size difference, but around a 2x decoding-speed difference).
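as a rough illustration of the "arithmetic coder after the Huffman coding" idea, here is a minimal sketch of an adaptive binary arithmetic coder which takes an already-coded bitstream and squeezes it a bit further. note that this is not the BTLZA coder or its bitstream format: the carry-less 32-bit coder design (similar to what PAQ-style compressors use), the 8-bit context of previous bits, the adaptation rate, and the names `BitModel`, `encode_bits`, and `decode_bits` are all assumptions made for the example.

```python
PROB_BITS = 12   # probabilities are 12-bit fixed point (0..4095)
CTX_BITS = 8     # assumed: context is the previous 8 bits of the stream
MASK32 = 0xFFFFFFFF


class BitModel:
    """Adaptive estimate of P(bit == 1), selected by the previous bits."""

    def __init__(self):
        self.p1 = [2048] * (1 << CTX_BITS)   # start every context at 50%
        self.ctx = 0

    def predict(self):
        return self.p1[self.ctx]

    def update(self, bit):
        p = self.p1[self.ctx]
        # move the estimate 1/16 of the way towards the observed bit
        p = p + ((4096 - p) >> 4) if bit else p - (p >> 4)
        self.p1[self.ctx] = p
        self.ctx = ((self.ctx << 1) | bit) & ((1 << CTX_BITS) - 1)


def encode_bits(bits):
    """Arithmetic-code a list of 0/1 values (e.g. Huffman output bits)."""
    model, lo, hi, out = BitModel(), 0, MASK32, bytearray()
    for bit in bits:
        mid = lo + ((hi - lo) >> PROB_BITS) * model.predict()
        if bit:
            hi = mid
        else:
            lo = mid + 1
        model.update(bit)
        # emit leading bytes once lo and hi agree on them
        while ((lo ^ hi) & 0xFF000000) == 0:
            out.append(hi >> 24)
            lo = (lo << 8) & MASK32
            hi = ((hi << 8) & MASK32) | 0xFF
    out += lo.to_bytes(4, "big")   # flush enough bytes to pin down the range
    return bytes(out)


def decode_bits(data, count):
    """Inverse of encode_bits; 'count' is the number of bits to recover."""
    model, lo, hi = BitModel(), 0, MASK32
    x, pos, bits = int.from_bytes(data[:4], "big"), 4, []
    for _ in range(count):
        mid = lo + ((hi - lo) >> PROB_BITS) * model.predict()
        bit = 1 if x <= mid else 0
        if bit:
            hi = mid
        else:
            lo = mid + 1
        model.update(bit)
        bits.append(bit)
        while ((lo ^ hi) & 0xFF000000) == 0:
            lo = (lo << 8) & MASK32
            hi = ((hi << 8) & MASK32) | 0xFF
            nxt = data[pos] if pos < len(data) else 0   # pad with zeros
            pos += 1
            x = ((x << 8) & MASK32) | nxt
    return bits
```

every bit costs a multiply, a compare, and a model update, which is why feeding fewer (already Huffman-coded) bits through it helps speed, and why bypassing it entirely is faster still.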
ADD: in the tests with video stuff, overall I am getting around a 30% compression increase (vs Deflate).
what am I using it for?
mostly as a Deflate alternative for the BTIC family of video codecs (many of which had used Deflate as their back-end entropy coder);
possibly other use cases (compressing voxel region files?...).
otherwise, I am now much closer to being able to switch BTIC1C over to full RGB colors;
most of the relevant logic has been written, so it is mostly finishing up and testing it at this point.
this should improve the image-quality at higher quality settings for BC7 and RGBA output (but will have little effect on DXTn output).
most of the work here has been on the encoder end, mostly due to the original choice of representation for pixel-blocks, and there being almost no abstraction over the block format (it is sad when "move some of this crap into predicate functions and similar" is a big step forwards; a lot of this logic is basically decision trees, raw pointer arithmetic, and bit-twiddling). yeah, probably not a great implementation strategy in retrospect.
the current choice of blocks looks basically like:
so, each new encoder-side block is 256 bits, and spreads the color over the primary ColorBlock and ExtColorBlock.
in total, there are currently about 60 bits for color data, which are currently used to (slightly inefficiently) encode a pair of 24-bit colors (had thought "maybe I can use the other 32 bits for something else", but may reconsider; had used a strategy where ExtColorBlock holds a delta from the "canonical decoded color").
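to make the "delta from the canonical decoded color" strategy a bit more concrete, here is a small sketch. it assumes the primary block stores 5 bits per channel and that the canonical decode is the usual bit-replication expansion to 8 bits; neither detail is stated above, and the names `expand5`, `split_color`, and `join_color` are made up for the example.

```python
def expand5(v5):
    """Canonical 5-bit -> 8-bit expansion by bit replication (assumed)."""
    return (v5 << 3) | (v5 >> 2)


def split_color(rgb):
    """Split a 24-bit color into a 5-bit-per-channel base plus a signed delta."""
    base = tuple(c >> 3 for c in rgb)
    delta = tuple(c - expand5(b) for c, b in zip(rgb, base))
    return base, delta


def join_color(base, delta):
    """Rebuild the full 24-bit color from the base color and the delta."""
    return tuple(expand5(b) + d for b, d in zip(base, delta))
```

under these assumptions each per-channel delta stays within -7..7, so it fits in 4 bits, and a decoder which ignores the extension block still gets the normal low-precision color.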
for 31F colors, I may need to use the block to hold the color-points directly:
had also recently gained some quality improvement mostly by tweaking the algorithm for choosing color endpoints:
rather than simply using a single gamma function and simply picking the brightest and darkest endpoints, it now uses 4 gamma functions.
roughly, by fiddling, I got the best results with a CYGM (Cyan, Yellow, Green, Magenta) based color-space, where each gamma function is an impure form of these colors (permutations of 0.5, 0.35, 0.15). the block encoder then chooses the function (and endpoints) which generated the highest contrast.
this basically improved quality with less impact on encoder speed than with some other options (it can still be done in a single pass over the input pixels).
it generally improves the quality of sharp color transitions (reducing obvious color bleed), but in these cases it does seem to come at the cost of slightly less accurate brightness.
this change was then also applied to my BC7 encoder and similar with good effect.
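the endpoint selection described above can be sketched roughly as follows. the exact weight table is an assumption (the description only gives "permutations of 0.5, 0.35, 0.15"), as is the assignment of each row to a CYGM color, and `choose_endpoints` is a made-up name; the real encoder works on its own block layout rather than on Python tuples.

```python
# hypothetical (R, G, B) weights, one row per "gamma function"
WEIGHTS = [
    (0.15, 0.35, 0.50),  # cyan-leaning (assumed)
    (0.50, 0.35, 0.15),  # yellow-leaning (assumed)
    (0.35, 0.50, 0.15),  # green-leaning (assumed)
    (0.50, 0.15, 0.35),  # magenta-leaning (assumed)
]


def choose_endpoints(pixels):
    """Pick block endpoints in a single pass over the pixels.

    For each weight function, track the darkest and brightest pixel,
    then keep the function whose min/max pair has the highest contrast.
    Returns (function index, dark endpoint, bright endpoint).
    """
    n = len(WEIGHTS)
    lo_val = [float("inf")] * n
    hi_val = [float("-inf")] * n
    lo_px = [None] * n
    hi_px = [None] * n
    for px in pixels:
        r, g, b = px
        for i, (wr, wg, wb) in enumerate(WEIGHTS):
            y = wr * r + wg * g + wb * b
            if y < lo_val[i]:
                lo_val[i], lo_px[i] = y, px
            if y > hi_val[i]:
                hi_val[i], hi_px[i] = y, px
    best = max(range(n), key=lambda i: hi_val[i] - lo_val[i])
    return best, lo_px[best], hi_px[best]
```

for example, a block split between pure red and pure green has fairly low contrast under a yellow-leaning function (red and green both score high), but the magenta-leaning function separates the two strongly, so that one gets picked and the endpoints land on the two actual colors rather than on whatever happened to be brightest and darkest.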