more textures: YUVA, and new image format...

6 comments, last by cr88192 11 years, 1 month ago
sorry in advance if all this isn't very interesting...


was fiddling more with textures, and am now messing around with DXT5 YUVA textures...
I ended up using essentially the same colorspace as JPEG-XR just mostly with U and V swapped.

I did a bunch of testing, and basically the JPEG-XR derived colorspace more or less won out in terms of preserving the most quality when shoved into a DXT5 texture (this being followed by another YCoCg variant and RCT). YCbCr managed to give fairly lame results (falling in last place in terms of RMSE), so was mostly excluded fairly early.
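for reference, the colorspace family in question is the lifting-based YCoCg-R-style transform that JPEG-XR is built around. a minimal sketch of that transform follows (the exact variant described here, with U and V swapped plus a UV scale factor, is not reproduced):

```python
# Sketch of a lifting-based YCoCg-R-style transform, the kind of integer
# colorspace JPEG-XR uses. The post's actual variant (U/V swapped, plus a
# UV scale factor shared with alpha) is an implementation detail not shown.

def rgb_to_ycocg_r(r, g, b):
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg

def ycocg_r_to_rgb(y, co, cg):
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b
```

the lifting structure makes the transform exactly invertible in integer arithmetic, which is part of why it tends to hold up better under quantization than a rounded YCbCr matrix.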


I have managed to get images here with generally higher quality (in terms of RMSE) than the straight DXT5-RGBA versions.

currently, in the image tested (full quality), I am getting an RMSE of 1.68 for G and ~2.07 for R and B, in contrast to 2.6 for G and 3.2 for R and B for the RGBA versions.

note: RMSE=Root Mean Square Error, measures how far the pixels are off on-average (with pixel component values being in the range of 0..255).
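the metric is straightforward to compute; a minimal sketch:

```python
import math

def rmse(a, b):
    # Root mean square error between two equal-length sequences of
    # pixel component values (0..255).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```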

note that all this does come at a cost to the accuracy of the alpha-channel, which in this format only has about 4-bits of accuracy (due to sharing the B channel with the UV scale factor).


I decided to include the images as PNG, but am using smaller versions here (the versions I was mostly testing with were 4096x4096), mostly because the 4096x4096 versions were giving roughly 40MB PNG files.

(the DXT1 image is converted from a JPEG dump, so isn't strictly max quality, otherwise would need to dump the image from the tool as a PNG...).



I also have a preliminary spec for an image-format being developed here:
http://pastebin.com/mnxLUJ7D

the format doesn't (yet) have an encoder/decoder.
note: it is not really intended either for file interchange or as an exercise in design elegance.

It sounds somewhat interesting, but I'm still not sure what you are talking about.

the format doesn't (yet) have an encoder/decoder.

So, are you developing a new DXT5 encoder, so that you get higher-quality DXT5 RGB texture compression while still using the standard hardware-supported DXT5 decoder?

Ashaman73, on 27 Feb 2013 - 01:48, said:
It sounds somewhat interesting, but I'm still not sure what you are talking about.


cr88192, on 26 Feb 2013 - 21:15, said:
the format doesn't (yet) have an encoder/decoder.

So, are you developing a new DXT5 encoder, so that you get higher-quality DXT5 RGB texture compression while still using the standard hardware-supported DXT5 decoder?


(sorry if all this doesn't make much sense)


currently I already have the DXT5 encoders / ...

the YUVA textures are still DXT5, and can still use the existing hardware, but do require some shader support (shimming in some math to convert things back into RGBA, generally as a copy/pasted function, and feeding color values through this function prior to making use of them in the shader logic).

partly, though, I was looking for a new colorspace that would deliver slightly higher image quality, as previously the image quality wasn't really much better than plain RGBA (kind of hard to justify using an alternate colorspace if it doesn't notably improve the image quality). I am also trying to fine-tune the math to improve the image quality, ...


mostly, for the file-format part, what I have left to write is the stuff for the headers and packaging (not likely difficult, I just haven't gotten to it).
the packaging format, however, is based on another past (not implemented) format I had called "BTJ-NBCES", which was itself loosely based on JPEG and my own BTJ format ("NBCES" mostly had the drawback that it fully broke with JPEG compatibility, limiting its usefulness vs BTJ, which is at least partly backwards compatible).

the main potential advantage the new design has is that it can allow preparing images for faster decoding (by being able to more directly decode data into a form usable by the GPU), and at the same time allow potentially higher quality.

for example, my decoder for LZ77 compressed DXT5 is pulling off around 2GB/s (~ 2Gp/s), and in my tests the compression isn't really all that bad (in contrast to around 38Mp/s for my existing JPEG decoder).

the design may also potentially piggyback on some of my existing code (still debating on whether or not it should be its own independent codec, or a BTJ sub-mode). (as a BTJ sub-mode, I would avoid needing any additional 3D engine support, though the cruft-factor is a little higher, basically as it would amount mostly to decoder-hacks).


but, yes, it does make use of the existing graphics hardware (part of the whole point of using DXT5 FWIW).

the conversion from YUVA back into RGBA is currently handled inside of shaders, basically by telling the shader about the use of the alternate colorspace (via a variable). in my tests, this doesn't really seem to make a big difference regarding performance, but does allow for potentially higher image quality. (shader support is needed, since the hardware sort of assumes RGBA, and "fingerprinting" the textures isn't really practical).

as-is though, this variable applies to all textures passed into the shader, so basically it generally forces all textures in a given "tag-layer" to use the same colorspace (where tag-layers are basically like compound textures which can be referenced by material definitions).

eg:
"textures/base_foo/myimage::SomeLayer" allows referencing a layer within an image, which may be drawn with various blend-options, ...
this also applies to AVI videos, where a layer within an AVI can be referenced via a material definition (much like with a standalone image).

technically, it is all a bit hairy/evil in some ways though, as there is a bit of tight coupling between the image/video codec and the renderer (sort of like having a mess of code with one side making OpenGL calls, and the other side mucking around in the internals of the image codec...).


or such...


the YUVA textures are still DXT5, and can still use the existing hardware, but do require some shader support (shimming in some math to convert things back into RGBA, generally as a copy/pasted function, and feeding color values through this function prior to making use of them in the shader logic).

Still having a hard time wrapping my head around this ...

Just to clarify my point of view: DXT5, or any similar hardware-supported format, has the one real benefit of being, well, hardware supported. The disk footprint isn't really important; only the hardware-supported access, mipmapping, and filtering are. Does your decoding still work with the hardware-supported filtering mechanism and, e.g., with sRGB?

Here are two topics about DXTn compression tool by L.Spiro, which might follow a similar goal:

topic 1

topic 2




all the usual hardware-supported stuff still works (mipmaps, filtering, ...), apart from the need for a final color-conversion in the fragment shaders. if a person just sort of naively uses the texture on some geometry without using the shader-side color-conversion, then it will look weird. so, this does require doing most of the rendering via shaders.

as-is, it is something like (in the fragment shader):
pix0 = texture2D(texBase, texCoord);
pix0 = ConvTex2RGBA(pix0);

with ConvTex2RGBA() as the conversion function.

the main reason a person might do this is because it can give quality more like that of uncompressed textures.

yes, I also have support for plain RGBA DXT5 as well, and have already spent a while fiddling with this.


there is also special-purpose secondary compression (LZ77-based, so basically loosely comparable to Deflate) for offline storage.
its main purpose is mostly to allow quickly decoding the on-disk form of the textures, so they can be passed to the hardware (which can't directly use LZ77 compressed textures). the goal here is mostly to decompress quickly, but luckily LZ77 variants can be pretty fast here (in the GB/s range).
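the reason LZ77 decoding can hit GB/s-range speeds is that the decode loop is just literal appends and back-reference copies. a minimal sketch of the idea (a toy token stream, not the actual BlockPack format):

```python
def lz77_decode(tokens):
    # tokens: list of either a literal byte (int) or an (offset, length)
    # back-reference into the output produced so far.
    out = bytearray()
    for t in tokens:
        if isinstance(t, tuple):
            off, length = t
            for _ in range(length):      # byte-at-a-time so overlapping
                out.append(out[-off])    # copies (runs) work correctly
        else:
            out.append(t)
    return bytes(out)
```

a real decoder would read offsets/lengths from a packed bitstream and use bulk memory copies, but the structure is the same.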

the main reason here is that naively storing raw DXT textures on disk makes them kind of bulky (which can be counter-productive, as disk-I/O speeds are also an enemy here).


as-is, I am typically doing most DXT encoding during the process of passing textures to OpenGL (like "glCompressedTexImage2D()").
supporting a file-format which is DXT based does allow the possibility of offline encoding though.

this is more relevant for video though, partly to reduce the CPU load of streaming concurrent video streams into the GPU.
additional note:
I may also go around and mess with BC7, which also has some potentially interesting properties.

http://msdn.microsoft.com/en-us/library/windows/desktop/hh308953%28v=vs.85%29.aspx
http://msdn.microsoft.com/en-us/library/windows/desktop/hh308954%28v=vs.85%29.aspx
http://www.opengl.org/registry/specs/ARB/texture_compression_bptc.txt


some of its modes should likely give generally better quality than DXT5, though the format is a little more complicated. if the encoder supports more than a single block-mode, it would likely require either doing some level of block-analysis, or checking and weighting each mode individually (expensive), or, worse, doing a brute-force check with every possible option...

granted, a naive encoder could probably just use mode 5 or 6 for everything.
a slightly less naive encoder could maybe detect per-block alpha and alternate between modes 4/5/6 or similar.

mode 6, with two 7:7:7:7 endpoints and 4-bit indices, looks nifty. (can be lazy and just treat it as a single unified Y curve).
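for reference, BC7 interpolates between endpoints in fixed point using a weight table defined in the BPTC spec; for 4-bit indices the per-component blend looks roughly like:

```python
# 4-bit index interpolation weights as given in the BC7/BPTC spec;
# endpoints are blended as (e0*(64-w) + e1*w + 32) >> 6.
BC7_WEIGHTS4 = [0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64]

def bc7_interp4(e0, e1, index):
    # e0, e1: expanded endpoint component values; index: 0..15.
    w = BC7_WEIGHTS4[index]
    return (e0 * (64 - w) + e1 * w + 32) >> 6
```

index 0 and index 15 reproduce the endpoints exactly, which is what makes the "treat it as a single unified Y curve" shortcut workable.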

a lot of the other modes look inconvenient (not well suited to direct forward calculation via fixed-point arithmetic).

(my existing encoders work pretty much entirely by direct forward calculations via fixed point arithmetic and without any sort of internal feedback, whereas the design of BC7 appears more aimed towards encoders which make use of internal feedback to encode their decisions).

(or, if there is a straightforward fixed-point solution to modes 0/1/2/3/7, dunno... looks like it would probably require looping over a table and calculating weights for each possibility or similar...).

well, that and maybe also get around to the code for encoding/decoding the serialized format.


well, not like I am only doing this (have also been working on other stuff as well...).


ADD, thoughts for encoding BC7 partitioned modes:
calculate average value of 4x4 pixel block;
calculate correlation vector for block, use this as division plane;
split the pixels into each camp via the color plane (call this axis X), storing the X values for each pixel;
for each camp, calculate the averages and correlation vectors (Y0 and Y1), and store the Y values for all pixels via each vector;
loop over each possible partition, taking the dot-product of this mode-vector (*1), and the pixel X values, choosing the option with the highest value;
do the actual encoding part (using the chosen vector to indicate which Y values to use).

*1: probably fixed-point dot-product, with the vector consisting mostly of 1 and -1 values (values on the same side of the plane will always generate positive products, and opposite sides negative, with a larger product value indicating a better match).

for a 3-subset block-mode, we could also allow for encoding along Z, essentially treating the space as an H-shaped region.

expectation:
probably will be fairly slow (those loops and dot-products aren't free...).
might make sense for batch encoding or similar though.
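the partition-matching step above can be sketched as follows (hypothetical helper; assumes the per-pixel X projections onto the division plane have already been computed, and uses a toy candidate set rather than BC7's fixed 64 partition patterns):

```python
def choose_partition(xs, partitions):
    # xs: 16 signed projections of the block's pixels onto the principal
    # axis (pixel minus block average, dotted with the correlation vector).
    # partitions: candidate sign patterns, 16 entries of +1/-1 each.
    # Pixels on the matching side of the plane contribute positive
    # products, so the best-aligned pattern gets the highest score.
    best, best_score = None, float("-inf")
    for i, pat in enumerate(partitions):
        score = sum(s * x for s, x in zip(pat, xs))
        if score > best_score:
            best, best_score = i, score
    return best
```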
status update:

new preliminary spec here:
http://pastebin.com/4fwQPRAc

the changes are mostly minor, mostly related to the effort of trying to implement it.

it has ended up handling layers a bit differently than my existing BTJ format: each layer and image gets its own individually-allocated structure, rather than a single big context with everything stored in lots of arrays. it also parses the file via a recursive top-down parsing strategy, rather than linear forward scanning (the primary strategy used for decoding BTJ images), hence the addition of end-markers. it is going slowly, as I have ended up writing a lot more new code than originally expected.

will probably either need to gloss over some of these differences, or end up writing new engine code to target the new interface (probably would also mean giving AVI videos their own dedicated FOURCC).

not specified is a mechanism for storing materials and shaders inside of the images, but I never really did this with BTJ images either (generally stayed with shaders/... defined elsewhere).

currently the implemented format is a subset (doesn't support filtered RGBA or YUVA modes yet, only DXT), and currently it hasn't really been tested.


ADD: has been tested now (after a little more implementation work and debugging), but is still a DXT-only subset.
idly wondering about a "good" way to add a quality slider while staying clear of existing IP claims and without making the compression very slow (such as by using approximate threshold matching on the blocks).

a few possible algorithms are being considered, mostly based on matching blocks against RMSE thresholds, likely in RGBA space (say, 90% quality matches any block where RMSE < 15, 80% where RMSE < 30, ..., or similar).
Lossy Block Reduction Filter:
http://pastebin.com/105GZST6

this was mostly a result of an experiment for adding a more aggressive lossy mode to my DXT encoding.
it basically merges together similar blocks, such that in the output they have combined/averaged pixel values.
this is mostly to improve the effectiveness of subsequent LZ compression of the DXT output.

it accepts a quality factor to indicate how much of an error tolerance there is when matching blocks.
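the basic idea can be sketched like this (greedy matching of each block against already-emitted representatives under an RMSE tolerance; the real filter presumably operates on DXT block structures rather than raw pixel tuples):

```python
import math

def merge_similar_blocks(blocks, max_rmse):
    # blocks: list of pixel-value tuples (e.g. 16 luma values per 4x4 block).
    # Each block is replaced by an earlier representative when their RMSE is
    # within tolerance, so the subsequent LZ stage sees exact repeats.
    reps, out = [], []
    for blk in blocks:
        for rep in reps:
            err = math.sqrt(sum((a - b) ** 2 for a, b in zip(rep, blk))
                            / len(blk))
            if err <= max_rmse:
                out.append(rep)
                break
        else:
            reps.append(blk)
            out.append(blk)
    return out
```

this linear scan over representatives is O(n*m); a practical version would hash or bucket blocks, but the quality/size trade-off is the same.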


ADD: example image after being reduced to 50% quality (and encoded/decoded from my new "BTIC" format). image quality is "roughly" comparable to that of a JPEG at 50% quality (my JPEG codec), though the file size is a bit bigger (190kB vs 62kB). at 90% quality, they have roughly comparable sizes (~ 500kB). (note that the only compression done on the resultant DXT5 image is BlockPack).

(edit, add: a libjpeg produced version of the source image at 50% quality is 127kB, but the quality is slightly higher...).


recently also ended up fixing a few bugs and adding in some optimizations for my BlockPack encoder (mostly adding in hash-chains).

updated BlockPack:
http://pastebin.com/QN8PzPQC

