BGBTech: The Status Update

Status: BTIC1C + BC6H and BC7, Expanded RGB depth.

Posted 24 December 2013 · 812 views

well, first off, recently did a test showing the image quality for BTIC1C:

this test was for a video at 1024x1024 with 8.6 Mbps and 0.55 bpp.

as noted, the quality degradation is noticeable, but "mostly passable".
much of it is due to the conversion to RGB555 rather than to actual quantization artifacts (partly because video compression and dithering don't really mix well in my tests), though some quantization artifacts are visible.

as usual, working spec:

other recent changes:

I have split apart BTIC1C and RPZA into different codecs, mostly as 1C has diverged sufficiently from RPZA that keeping them as a single codec was becoming problematic.

BTIC1C now has BC6H and BC7 decode routes, with single-thread decode speeds of around 320-340 Mpix/sec for BC7, and around 400 Mpix/sec for BC6H (the speed difference is mostly due to the lack of an alpha channel in 6H, and slightly awkward handling of alpha in BC7).

as-is, both effectively use a subset of the format (currently Mode 5 for BC7, and Mode 11 for 6H).

the (theoretical) color depth has been expanded, as it now supports 23-bit RGB and 31-bit RGB.
RGB23 will give (approximately) a full 24-bit color depth (mostly for BC7, possibly could be used for RGBA).

RGB31 will support HDR (for BC6H), and comes in signed and unsigned variants. as-is, it stores 10-bits per component (as floating-point).

likewise, the 256-color indexed block-modes have been expanded to support 23 and 31 bit RGB colors.

these modes are coerced to RGB565 for DXTn decoding, as well as RGB555 still being usable with BC7 and BC6H, ...
this means that video intended for one format can still be decoded for another if-needed (though videos will still have a "preferred format").

as-is, it will still require some work on the encoder end to be able to generate output supporting these color depths (likely moving from 128-bit to 256-bit blocks on the encoder end).

the current encoder basically uses a hacked form of DXT5 for its intermediate form, where:
(AlphaA>AlphaB) && (ColorA>ColorB): basically the same as DXT5;
(AlphaA<=AlphaB) || (ColorA<=ColorB): special cases (flat colors, skip blocks, ...).
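
as a rough illustration of the test above, here is a minimal Python sketch that classifies a 16-byte block by its endpoint ordering. it assumes the standard DXT5 layout (2 alpha endpoints + 6 index bytes, then 2 little-endian RGB565 endpoints + 4 index bytes); the name `classify_block` is made up for this example, and how the special cases are actually encoded is not shown.

```python
import struct

def classify_block(block: bytes) -> str:
    """Classify a 16-byte DXT5-style block by its endpoint ordering."""
    if len(block) != 16:
        raise ValueError("expected a 16-byte block")
    # Assumed DXT5 layout: alpha0, alpha1, 6 bytes of alpha indices,
    # then color0, color1 (little-endian RGB565), 4 bytes of color indices.
    alpha_a, alpha_b = block[0], block[1]
    color_a, color_b = struct.unpack_from("<HH", block, 8)
    if alpha_a > alpha_b and color_a > color_b:
        return "dxt5"     # handled like an ordinary DXT5 block
    return "special"      # flat color, skip block, ... (details not shown)
```

the two conditions are exact complements of each other, so every block falls into one case or the other.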

however, there are no free bits for more color data (at least while keeping block-complexity "reasonable").
so, likely, it will be necessary to expand the block size to 256 bits and probably use a 128-bit color block.

64-bits: tag and metadata
64-bits: alpha block
128-bits: expanded color block.
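
the layout above could be sketched roughly like this in Python. the field order follows the list, but the contents of each field are treated as opaque bytes, since the internal layout is not pinned down here; `pack_meta_block` and `unpack_meta_block` are hypothetical names.

```python
import struct

# 64-bit tag/metadata + 64-bit alpha block + 128-bit color block = 256 bits.
META_BLOCK = struct.Struct("<8s8s16s")

def pack_meta_block(tag_meta: bytes, alpha: bytes, color: bytes) -> bytes:
    """Pack the three opaque fields into one 32-byte intermediate block."""
    if (len(tag_meta), len(alpha), len(color)) != (8, 8, 16):
        raise ValueError("expected 8, 8 and 16 byte fields")
    return META_BLOCK.pack(tag_meta, alpha, color)

def unpack_meta_block(block: bytes):
    """Split a 32-byte intermediate block back into (tag_meta, alpha, color)."""
    return META_BLOCK.unpack(block)
```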

this would not affect the output format, as these blocks are purely intermediate (used for frame conversion/quantization/encoding), but would require a bit of alteration to the encoder-side logic.

it sort of works I guess...


video-texture, now with audio...

had an idea here for how to do a DXTn-space deblocking filter, but it would likely come with a bit of a speed cost.
may try it out and see if it works ok though.

misc: added more features for BTIC1C...

Posted 19 December 2013 · 700 views

well, the BTIC3A effort also kind of stalled out, mostly as the format turns out to be overly complex to implement (particularly on the encoder). I may revive the effort later, or maybe try again with a simpler design (leaving blocks in raster order and probably designing it to be easier to encode with a multi-stage encoder).

so, I ended up for now just going and doing something lazier:
gluing a few more things onto my existing BTIC1C format.

these are:
predicted / differential colors (saves bits by storing many colors as an approximate delta value);
support for 2x2 pixel blocks (as a compromise between flat-color blocks and 4x4 pixel blocks, a 2x2 pixel block needs 8 bits rather than 32 bits);
simplistic motion compensation (blocks from prior frames may be translated into the new frame).

all were pretty lazy, most worked ok.

the differential colors are a bit problematic though, as they are prone to messing up and producing graphical glitches (blocks which seem to overflow/underflow the color values, or which show up as miscolored splotches).

basically, it uses a Paeth filter (like in PNG), and tries to predict the block colors from adjacent blocks, which allows (in principle) the use of 7-bit color deltas (as a 5x5x5 cube) instead of full RGB555 colors in many cases.
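
a minimal Python sketch of the idea, not the actual BTIC1C code: `paeth` is the standard PNG predictor, while the mapping of the 5x5x5 cube to per-component deltas of -2..+2 (125 values, fitting in 7 bits) is an assumption made for this example, as are the names `encode_delta` and `decode_delta`.

```python
def paeth(a: int, b: int, c: int) -> int:
    """PNG Paeth predictor: a = left, b = above, c = upper-left."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c

def encode_delta(actual, left, above, upleft):
    """Return a 0..124 delta index, or None if a full color is needed."""
    index = 0
    for x, a, b, c in zip(actual, left, above, upleft):
        d = x - paeth(a, b, c)
        if not -2 <= d <= 2:
            return None          # delta does not fit in the 5x5x5 cube
        index = index * 5 + (d + 2)
    return index

def decode_delta(index, left, above, upleft):
    """Rebuild the color from a delta index and the same three neighbors."""
    deltas = []
    for _ in range(3):
        deltas.append(index % 5 - 2)
        index //= 5
    deltas.reverse()             # digits come out last component first
    neighbors = zip(left, above, upleft)
    return tuple(paeth(a, b, c) + d for (a, b, c), d in zip(neighbors, deltas))
```

the decoder only reproduces the encoder's colors if both sides predict from identical neighbor values, so the prediction has to be made from the reconstructed (decoder-visible) colors rather than the original ones.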

I suspect there is a divergence between the encoder-side blocks and decoder-side blocks, which would account for the colors screwing up (the blocks as they come out of the quantizer look fine, implying that the deltas and quantization are not themselves at fault).

the 2x2 blocks and motion compensation were each a little more effective. while not pixel-accurate, the motion compensation can at least sort of deal with general movement and seems better than having nothing at all.

I suspect in general it is doing "ok" with size/quality in that I can have a 2 minute video in 50MB at 512x512 and not have it look entirely awful.

decided to run a few benchmarks, partly to verify some of my new features didn't kill decode performance.

non-Deflated version:
decode speed to RGBA: ~ 140 Mpix/sec;
decode speed to DXT5: ~ 670 Mpix/sec.

Deflated version:
decode speed to RGBA: ~ 118 Mpix/sec;
decode speed to DXT5: ~ 389 Mpix/sec.

then started wondering what would be the results of trying a multi-threaded decoder (with 4 decoder threads):
420 Mpix/sec to RGBA;
2100 Mpix/sec DXT5 (IOW: approx 2.1 gigapixels per second).

this is for a non-Deflated version, as for the Deflated version, performance kind of goes to crap as the threads end up all ramming into a mutex protecting the inflater (not currently thread safe).

or such...

BTIC1C spec (working draft):

BTIC3A partial spec (idea spec):
(doesn't seem like much, but the issues are more subtle).

well, it looks like 3A may not be entirely dead, there are a few parts I am considering trying to "generalize out", so it may not be a total loss. for example, the bitstream code was originally generalized somewhat (mostly as I was like "you know what, copy-pasting a lot of this is getting stupid", as well as it still shares some structures with BTIC2C).

likewise, I may generalize out the use of 256-bit meta-blocks on the encoder end (rather than a 128-bit block format), partly as the format needs to deal both with representing pixel data, and also some amount of internal metadata (mostly related to the block quantizer), and 256-bits provides a little more room to work with.

don't know yet if this could lead to a (probably less ambitious) 3B effort, or what exactly this would look like (several possibilities exist). partly tempted by thoughts of maybe using a PNG-like or DWT-based transform for the block colors.

Possible: BTIC1E (BPTC / BC6H / BC7 + Video Codec)

Posted 08 December 2013 · 1,126 views

yes, yet more codec wackiness...

seeing as how my graphics hardware has a limited number of options for (non DXTn / S3TC) compressed texture formats, but does support BPTC / BC6H / BC7, which hinder effective real-time encoding (*), it may make sense to consider developing a video codec specifically for this.

*: though there is always the option of "just pick a block type and run with it", like always encoding BC7 in mode 5 or BC6H in mode 11 or something.
note: BPTC here will be used (in the OpenGL sense) to refer both to BC6H and BC7.
structurally, they are different formats, and need to be distinguished in-use.
when relevant, BC6H and BC7 (their DirectX names) will be used (mostly because names like "RGBA_BPTC_UNORM" kind of suck...).

basic design:
essentially fairly similar to BTIC1C and BTIC1D (which in turn both derive from Apple Video / RPZA).


unlike 1C and 1D, it (mostly) sidesteps a lot of the complexities of these texture formats, and essentially treats the blocks mostly as raw data. this should still allow a moderately simple and fast decoder (into BPTC or similar).
also this stage of the process will be lossless.

this encoding allows a fairly arbitrary split between block-header and block data, which an encoder should be able to try to optimize for (and search for the "greatest savings" in terms of where to split up the block at). this also includes the ability to do "simple RLE runs" for repeating block-patterns, as well as to store raw/unencoded runs of blocks.

note that it isn't really viable to cleanly split between the header and index portions of a block given the way the blocks work.

Encode Process:
RGB(A) Source Image -> Pixel Block Quantizer + BPTC Encoder -> BTIC1E Frame Encoder -> Deflate -> Packaging/Container.

Decode Process:
Container/Packaging -> Inflate -> BTIC1E Decoder -> BPTC (passed to GL or similar).

the "Pixel Block Quantizer" step will basically try to fudge blocks to reduce the encoded image size; it is unclear exactly how it will tie in with the BPTC encoders. as-is, it is looking mostly like a tradeoff between an RGBA-space quantizer ("pre-cooking" the image) and a naive "slice and dice" quantizer (hack bits between blocks coming out of the BPTC encoder and see what it can get away with within the error threshold, basically by decoding the blocks to RGBA and comparing the results).

an issue: I have rather mixed feelings about BPTC.
namely, it is only available in newer desktop-class GPUs, and could be rendered less relevant if ETC2 becomes widespread in upcoming GPUs (both having been promoted to core in OpenGL).

some of this could potentially lead to cases of needing multiple redundant animated-texture videos, which would be kind of lame (and would waste disk space and similar), though potentially still better than wasting video memory by always using an RGBA16F or RGB9_E5 version.

could almost be a case of needing to implement it and determine whether or not it sucks...

figured the likelihood of BTIC1E sucking was just too high.

started working on another design:

which would be intended as a format to hopefully target both DXT and a BPTC subset, with other goals of being faster for getting to DXTn than BTIC2C, and compressing better than BTIC1C, target speed = 300 Mpix/sec for a single threaded decoder.

going and checking, the gap isn't quite as drastic as I had thought (if I can reduce the bitrate to 1/2 or 1/3 that of 1C, I will be doing pretty good, never mind image quality for the moment).

I guess the reason many videos can fit 30 minutes in 200MB is mostly because of lower resolutions (640x360 has a lot fewer pixels than 1024x1024 or 2048x1024...).

misc: FRIR2, (Possible) Alpha + Theora and XviD

Posted 02 December 2013 · 804 views

recently was working some on a new interpreter design I was calling FRIR2.

what is it?

basically a Three-Address-Code Statically-Typed bytecode format;
the current intention was mostly to try to make a bytecode at least theoretically viable to JIT compile into a form which could be more performance competitive with native code, mostly for real-time audio/video stuff, while still allowing readily changing scripts (not requiring a rebuild, and possibly interactively being able to tweak things).

made some progress implementing it, but it still has a ways to go before it could be usable (and considerably more work before it is likely to be within the target range WRT performance).

not an immediate priority though.

ADD: FWIW, as-is FRIR2 ASM syntax will look something like:
neg.i r13, r9;            //2 byte instruction
add.i r14, r7, r11;    //3 byte instruction
neg.i r19, r23;              //5 byte instruction
add.i r42, r37, r119;    //6 byte instruction
neg.v3f r19, r23;              //6 byte instruction
add.v3f r42, r37, r119;    //7 byte instruction
mov.ic r3, 0
jmp_ge.ic r3, 10, L1
inc.i r3, r3
jmp L0


//with declarations:
var someVar:i;    //someVar is an integer
function SomeFunc:i(x:f, y:f)    //int SomeFunc(float, float)
    var z:f;
    add.f z, x, y;
    convto.f t0, z, 'i';
    ret.i t0;

otherwise, more idle thoughts for how to do alpha blending with Theora and XviD (within an AVI).

previously, I had tried the use of out-of-gamut colors, which while able to encode transparency, would do so with some ugly artifacts and limitations (namely violet bands and an inability to accurately encode colors for alpha-blended areas).

another possibility is to utilize some tricks similar to those used by Google for WebM, namely one of:
encode a secondary video channel containing alpha data (implementation PITA, little idea how existing video players will respond);
double the vertical resolution, encoding the extended information in the lower half, and indicating somehow that this has been done (would be handled via a special hack in the image decoder).

current leaning is toward the resolution-doubling strategy, as it is likely to be less effort.

the main issue is likely how to best encode the use of the hack:
somehow hacking it into one of the existing headers (how to best avoid breaking something?...);
possibly add an extra chunk which would mostly have the role of indicating certain format extensions (would need to be handled in the AVI code and passed back to the codec code).

contents of the extended components:
most likely, DAE (Depth, Alpha, Exponent).

Depth: used for bump-maps, possibly also for generating normal-maps via a Sobel filter (or cheaper analogue), ignored otherwise;
Alpha: obvious enough;
Exponent: Exponent for HDR images, ignored for LDR.
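
as a rough sketch of the Sobel idea for the Depth channel (plain Python, no image library): the function name `normals_from_depth`, the `strength` parameter, the edge clamping, and the sign convention for the normal are all assumptions for illustration.

```python
import math

def normals_from_depth(depth, strength=1.0):
    """depth: list of rows of numbers. Returns rows of unit (nx, ny, nz)."""
    h, w = len(depth), len(depth[0])

    def at(x, y):
        # Clamp coordinates so edge pixels reuse their nearest neighbor.
        return depth[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Sobel kernels for the horizontal and vertical gradients.
            gx = (at(x + 1, y - 1) + 2 * at(x + 1, y) + at(x + 1, y + 1)
                  - at(x - 1, y - 1) - 2 * at(x - 1, y) - at(x - 1, y + 1))
            gy = (at(x - 1, y + 1) + 2 * at(x, y + 1) + at(x + 1, y + 1)
                  - at(x - 1, y - 1) - 2 * at(x, y - 1) - at(x + 1, y - 1))
            # The normal tilts away from the direction of increasing depth.
            nx, ny, nz = -gx * strength, -gy * strength, 1.0
            n = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / n, ny / n, nz / n))
        out.append(row)
    return out
```

a flat depth plane gives (0, 0, 1) everywhere; a "cheaper analogue" would be something like plain central differences in place of the 3x3 kernels.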

likely, DAE would still be subject to RGB/YUV conversions (could be skipped if only alpha were used).

resolution doubling at least should work without too much issue for existing video players and similar, but would double the height of the video for normal players (leaving all the alpha-related stuff in the bottom of the screen).
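
the resolution-doubling layout can be sketched in Python as below. frames here are just lists of rows of pixel tuples, and the lower half carries alpha replicated as gray; with DAE, the three channels of the lower half would carry depth, alpha, and exponent instead (which channel holds which is not decided here). `stack_alpha` and `unstack_alpha` are hypothetical names.

```python
def stack_alpha(rgba_rows):
    """RGBA frame -> double-height RGB frame (color on top, alpha below)."""
    color = [[(r, g, b) for (r, g, b, a) in row] for row in rgba_rows]
    extra = [[(a, a, a) for (r, g, b, a) in row] for row in rgba_rows]
    return color + extra

def unstack_alpha(rgb_rows):
    """Inverse of stack_alpha: rebuild RGBA from a double-height RGB frame."""
    if len(rgb_rows) % 2:
        raise ValueError("expected an even number of rows")
    half = len(rgb_rows) // 2
    return [[(r, g, b, ext[0]) for (r, g, b), ext in zip(top, bottom)]
            for top, bottom in zip(rgb_rows[:half], rgb_rows[half:])]
```

the decoder still needs some out-of-band flag to know the frame has been stacked, which is the header/chunk question discussed above.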

Theora and XviD compress a little better than my BTIC2C format, so this could offer a better size/quality tradeoff, but likely worse decoding speeds (BTIC2C is roughly on-par with XviD as-is while already using an alpha channel);
unlike some other options, this would still not support specular or glow maps.

most likely, this is more relevant to video sequences than to animated textures, where raw RGB or RGBA is more likely to be sufficient.

still not sure if this is a big enough use-case to really bother with though.

this could potentially require a fairly significant increase in the cost of the color-conversion, doubling the number of pixels handled and potentially adding some extra filtering cost for normal-maps;
this should still be fast enough for 720p-equivalent resolutions though.

FWIW, a similar cost is implied as with the BGBTech-JPEG format (which supports alpha and normal maps via additional images embedded within the main image).

otherwise, went and added more video textures (to my game project):
water and slime now are video-mapped (using the BTIC1C codec, *);
ended up using 256x256 for the video-textures (was going to use 512x512, figured this was overkill);
discovered and fixed a few bugs (some engine related, a few minor decoder bugs in 1C discovered and fixed, ...);
made a lot of minor cosmetic tweaks (scaling textures, ...);

a minor tweak is that 1C will now try to "guess" the missing green and alpha bits based on the other bits;
basically, 1C normally stores RGB in 555 format (vs 565 as DXTn uses), so there is a missing bit;
likewise, for alpha, which is stored using 7 bits, vs the usual 8.

in both cases, the guess is currently made by assuming that the low bit depends on the high bit, so it copies the bit, which while naive, seems to be better than just leaving it as 0.
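
the bit-copying guess looks roughly like this in Python. the RGB555 layout (red in bits 10-14, green in bits 5-9, blue in bits 0-4) is an assumption for this sketch, as are the function names.

```python
def rgb555_to_rgb565(c: int) -> int:
    """Expand RGB555 to RGB565, guessing the extra green bit."""
    r = (c >> 10) & 0x1F
    g = (c >> 5) & 0x1F
    b = c & 0x1F
    g6 = (g << 1) | (g >> 4)   # new low bit = copy of green's high bit
    return (r << 11) | (g6 << 5) | b

def alpha7_to_alpha8(a: int) -> int:
    """Expand 7-bit alpha to 8 bits, copying the high bit into the low bit."""
    a &= 0x7F
    return (a << 1) | (a >> 6)
```

compared with leaving the low bit at 0, this lets full-intensity values map to full intensity (alpha 127 becomes 255 rather than 254), while 0 still maps to 0.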

the other option is preserving these bits, but the quality gain is not particularly noticeable vs the image size increase.

*: note, 1C and 2C are different formats. 1C uses an RPZA-based format (RPZA + Deflate + more features), whereas 2C is loosely JPEG-based (and does RGBA mostly by encoding 4-component YUVA images).

1C is primarily focused on decoding to DXTn. it is effectively LDR only (HDR is theoretically possible, but the size and quality from some tests is "teh suck"). while decoding to DXTn it is drastically faster than most other options.

2C is mostly intended for intermediate video and HDR (it can do HDR mostly by encoding images filled with 16-bit half-floats, and/or using one of several fixed-point formats). speed and perceptual size/quality are a little worse than XviD or Theora, but the image quality is much higher at higher bitrates (ex: 30-70 Mbps).

decode speeds are "similar" to those of XviD (both are fast enough to do 1080p30, but 2C can do 1080p30 with HFloat+Alpha). generally, it is ~80 Mpix/sec for 2C vs ~105 Mpix/sec for XviD.

if XviD were used at 2x resolution to do alpha, this would likely cut the effective speed to around 53 Mpix/sec.
similar applies to Theora.

note: BTJPEG is around 90 Mpix/sec for raw RGB images, and around 60 for RGB+Alpha, for similar reasons.

this leaves the advantage of XviD and Theora mostly in terms of better image quality at lower bitrates (IOW: not throwing 30+ Mbps at the problem...).