
Possible: BTIC1E (BPTC / BC6H / BC7 + Video Codec)

Posted by BGB, 08 December 2013 · 1,026 views

yes, yet more codec wackiness...

seeing as how my graphics hardware has a limited number of options for (non DXTn / S3TC) compressed texture formats, but does support BPTC / BC6H / BC7, which hinder effective real-time encoding (*), it may make sense to consider developing a video codec specifically for this.

*: though there is always the option of "just pick a block type and run with it", like always encoding BC7 in mode 5 or BC6H in mode 11 or something.
note: BPTC here will be used (in the OpenGL sense) to refer both to BC6H and BC7.
structurally, they are different formats, and need to be distinguished in-use.
when relevant, BC6H and BC7 (their DirectX names) will be used (mostly because names like "RGBA_BPTC_UNORM" kind of suck...).

basic design:
essentially fairly similar to BTIC1C and BTIC1D (which in turn both derive from Apple Video / RPZA).


unlike 1C and 1D, it (mostly) sidesteps a lot of the complexities of these texture formats, and essentially treats the blocks mostly as raw data. this should still allow a moderately simple and fast decoder (into BPTC or similar).
also this stage of the process will be lossless.

this encoding allows a fairly arbitrary split between block-header and block data, which an encoder should be able to optimize for (searching for the "greatest savings" in terms of where to split the block). this also includes the ability to do "simple RLE runs" for repeating block-patterns, as well as to store raw/unencoded runs of blocks.
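
as a rough illustration of the "RLE runs" and "raw runs" idea (not the actual BTIC1E bitstream), here is a minimal Python sketch. the tag-byte layout is a made-up assumption for the example: a tag with the high bit set means "repeat the next 16-byte block N times", and a tag with the high bit clear means "N raw blocks follow". it does not attempt the header/data split at all.

```python
BLOCK = 16  # bytes per BC6H / BC7 block (one 4x4 pixel block)

def encode_runs(data: bytes) -> bytes:
    """Encode whole blocks as repeat runs or raw runs (hypothetical tag layout)."""
    assert len(data) % BLOCK == 0
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    out = bytearray()
    i = 0
    while i < len(blocks):
        # count how many times the current block repeats (max 128 per run)
        run = 1
        while i + run < len(blocks) and run < 128 and blocks[i + run] == blocks[i]:
            run += 1
        if run >= 2:
            out.append(0x80 | (run - 1))  # repeat run: tag + one block
            out += blocks[i]
            i += run
            continue
        # raw run: collect blocks until the next repeat starts (max 128)
        start = i
        i += 1
        while i < len(blocks) and i - start < 128:
            if i + 1 < len(blocks) and blocks[i + 1] == blocks[i]:
                break
            i += 1
        out.append(i - start - 1)  # raw run: tag + N blocks
        for b in blocks[start:i]:
            out += b
    return bytes(out)

def decode_runs(stream: bytes) -> bytes:
    """Inverse of encode_runs; output is raw block data ready for upload."""
    out = bytearray()
    pos = 0
    while pos < len(stream):
        tag = stream[pos]
        pos += 1
        count = (tag & 0x7F) + 1
        if tag & 0x80:
            out += stream[pos:pos + BLOCK] * count
            pos += BLOCK
        else:
            out += stream[pos:pos + BLOCK * count]
            pos += BLOCK * count
    return bytes(out)
```

the decoder side is just copies, which is the point: it never needs to understand what is inside a block.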

note that it isn't really viable to cleanly split between the header and index portions of a block given the way the blocks work.

Encode Process:
RGB(A) Source Image -> Pixel Block Quantizer + BPTC Encoder -> BTIC1E Frame Encoder -> Deflate -> Packaging/Container.

Decode Process:
Container/Packaging -> Inflate -> BTIC1E Decoder -> BPTC (passed to GL or similar).
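
the lossless back half of both pipelines (Deflate on the way in, Inflate on the way out) can be sketched with Python's zlib module. the container here is an assumption made up for the example: each frame is stored as a 4-byte little-endian length followed by the deflated payload. the payloads stand in for whatever the BTIC1E frame encoder would emit.

```python
import struct
import zlib

def pack_frames(frames):
    """Deflate each frame payload and prefix it with its compressed length."""
    out = bytearray()
    for payload in frames:
        z = zlib.compress(payload, 6)
        out += struct.pack("<I", len(z))  # hypothetical container: u32 length
        out += z
    return bytes(out)

def unpack_frames(stream):
    """Walk the length-prefixed stream and inflate each frame payload."""
    frames = []
    pos = 0
    while pos < len(stream):
        (n,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        frames.append(zlib.decompress(stream[pos:pos + n]))
        pos += n
    return frames
```

since everything after the BPTC encoder is lossless, unpacking should give back exactly the bytes that were packed.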

the "Pixel Block Quantizer" step will basically try to fudge blocks to reduce the encoded image size; it is unclear exactly how it will tie in with the BPTC encoders. as-is, it is looking mostly like a tradeoff between an RGBA-space quantizer ("pre-cooking" the image) and a naive "slice and dice" quantizer (hack bits between blocks coming out of the BPTC encoder and see what it can get away with within the error threshold, basically by decoding the blocks to RGBA and comparing the results).
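
one very simple form the "slice and dice" idea could take is sketched below, as an assumption rather than the actual design: if a block decodes to nearly the same pixels as the previously emitted block, just reuse the previous block, which creates repeats that a later run-length or Deflate stage can exploit. `decode` is a caller-supplied function (a real BPTC block decoder is assumed to exist elsewhere), and `max_mse` is a hypothetical error threshold.

```python
def mse(a: bytes, b: bytes) -> float:
    """Mean squared error between two equally sized decoded pixel buffers."""
    assert len(a) == len(b) and len(a) > 0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def reuse_previous_blocks(blocks, decode, max_mse):
    """Replace a block with the previously emitted one when the decoded
    difference stays within max_mse; otherwise keep the block unchanged."""
    out = []
    for blk in blocks:
        # error is always measured against the original block's pixels,
        # so it does not accumulate across a chain of reused blocks
        if out and mse(decode(out[-1]), decode(blk)) <= max_mse:
            out.append(out[-1])
        else:
            out.append(blk)
    return out
```

this only looks at the immediately preceding block; a real quantizer would presumably have more options to try.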

an issue: I have rather mixed feelings about BPTC.
namely, it is only available in newer desktop-class GPUs, and could be rendered less relevant if ETC2 becomes widespread in upcoming GPUs (both having been promoted to core in OpenGL).

some of this could potentially lead to cases of needing multiple redundant animated-texture videos, which would be kind of lame (and would waste disk space and similar), though potentially still better than wasting video memory by always using an RGBA16F or RGB9_E5 version.

could almost be a case of needing to implement it and determine whether or not it sucks...

figured the likelihood of BTIC1E sucking was just too high.

started working on another design:

which would be intended as a format to hopefully target both DXT and a BPTC subset, with other goals of being faster for getting to DXTn than BTIC2C, and compressing better than BTIC1C; target speed = 300 Mpix/sec for a single threaded decoder.

going and checking, the gap isn't quite as drastic as I had thought (if I can reduce the bitrate to 1/2 or 1/3 that of 1C, I will be doing pretty good, never mind image quality for the moment).

I guess the reason many videos can fit 30 minutes in 200MB is mostly because of lower resolutions (640x360 has a lot fewer pixels than 1024x1024 or 2048x1024...).

I am a 'file format' kind of guy, and I get happy reading these posts about various storage and encoding methods.


Might be time to go to meetings... ;)

yeah, and in my case it is mostly trying to find a "good" solution to the general set of issues I keep running into.


like, wanting videos to go into textures while getting good performance and not producing overly huge files (vs "real" codecs), ...



luckily, most of these codecs are fairly simple and consist largely of reused code, mostly with tweaks being made to the format structure and encoding details in an attempt to adjust for things like compressor efficiency and performance.


the result then is a small "forest" of various experimental codecs.



my newer BTIC3A codec could at least potentially/hopefully address a few issues (should compress a lot better than 1C at least).


I am less certain about the decoding performance, since, in trying to implement it, there are areas where it is both simpler and more complex than 1C in terms of per-block overhead.


but, as long as it goes faster than a typical DCT codec, it is probably good, and I suspect it probably will.


won't really know until it is fully implemented though...

It may have been said before; but I didn't catch it; what is the eventual goal of having actual video being displayed on a texture?

When I dealt with this stuff years ago, it was just to render cutscenes full screen; but it seems like you might have different ends?

mostly it would be for the sake of:

better animated textures, which already work acceptably (already used for things like lava/slime/water/fire).


basically, more traditional animated textures have fairly severe size and/or length limit issues, limiting them mostly to being low-res and either very short or very low framerate (if a person tries to stretch 10 or 16 frames over multiple seconds).


also in-game TVs and displays.

for example, you could have an in-game TV which could be triggered to play a video-sequence on some event.




a lot of this is done in various places in Doom 3, which uses RoQ (another VQ-based format); however, Doom 3 generally uses fairly low resolutions, and still has fairly large file sizes.


cutscenes are also possible, but as-is, conventional cutscenes could be better served by a more conventional video format (since you can get by using a lot more of the CPU time for a cutscene).


so, it is basically trying to do things like get multiple video streams to decode at the same time using relatively low CPU load.


elsewhere I did a test:

videos are 512x512 (and 30Hz), but I did subsequent tests with 1024x1024 videos, and framerates still held up. (note: the video decoding is happening inside the render thread in this case...).


but, then noted a problem: ~ 10 minutes of video at 1024x1024 works out to around 450MB with the codec (1C) even with fairly poor video quality.

so, this gave me reason to look at maybe trying to address the bitrate issue.

10 minutes of video at 200MB at better quality would be a little more passable.
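
to put those two figures on the same scale, here is a small Python calculation. assumptions: 1 MB is taken as 10^6 bytes, and the 1024x1024 video is assumed to run at 30Hz like the 512x512 test.

```python
def mbit_per_sec(total_bytes, seconds):
    """Average bitrate in megabits per second."""
    return total_bytes * 8 / seconds / 1e6

def bits_per_pixel(total_bytes, seconds, width, height, fps):
    """Average number of stored bits per displayed pixel."""
    return total_bytes * 8 / (seconds * width * height * fps)

SECONDS = 10 * 60
for label, size in (("450MB", 450e6), ("200MB", 200e6)):
    rate = mbit_per_sec(size, SECONDS)
    bpp = bits_per_pixel(size, SECONDS, 1024, 1024, 30)
    print(f"{label}: {rate:.2f} Mbit/s, {bpp:.3f} bits/pixel")

# 450MB works out to about 6.0 Mbit/s, or about 0.19 bits/pixel
# 200MB works out to about 2.67 Mbit/s, or about 0.085 bits/pixel
```

so the 200MB target is a bit under half the current bitrate, which lines up with the "1/2 or 1/3 that of 1C" goal.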



note: BTIC1C is essentially a Deflate-compressed variant of the Apple Video / RPZA format (historically RPZA was used by applications based on Apple QuickTime). the Deflate generally reduces bitrate by about 40% with a "modest" performance impact. (I made other extensions to the format though, such as for an alpha channel and mipmaps).



so, more fiddling, and designing (and implementing) a more aggressively compressed, but hopefully still fast to decode, format.
