Video codecs: compressing delta values

3 comments, last by maxest 6 years, 5 months ago

I'm implementing a video codec. One way of improving efficiency is to compress the delta between two frames. The problem with computing a difference, though, is that it increases the range of values: if my input image has RGB values in the [0, 255] range, the diff can be in the [-255, 255] range. My solution is to calculate (in floats, with values normalized to [0, 1]) clamp(diff, -0.5, 0.5) + 0.5. This brings the result back into the [0, 255] range but cuts off larger differences, which is actually not a problem; at least I don't see much difference.
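To make the clamp-and-bias step concrete, here is a minimal sketch of an integer equivalent for 8-bit channels. The function names and the use of integers (bias 128, clamp to [-128, 127]) instead of normalized floats are assumptions made for illustration, not the code from my codec.

```python
def clamp(value, low, high):
    """Restrict value to the inclusive range [low, high]."""
    return max(low, min(high, value))


def encode_clamped_delta(prev, cur):
    """Map the difference cur - prev into [0, 255] by clamping and biasing."""
    return clamp(cur - prev, -128, 127) + 128


def decode_clamped_delta(prev, code):
    """Undo the bias, add the difference to prev, keep the result in [0, 255]."""
    return clamp(prev + (code - 128), 0, 255)


# Small differences survive exactly.
small = decode_clamped_delta(20, encode_clamped_delta(20, 7))

# Large differences are cut off: going from 0 to 255 only reaches 127.
large = decode_clamped_delta(0, encode_clamped_delta(0, 255))
```

Differences inside [-128, 127] round-trip exactly; anything larger is lost, which is the "cuts off" behaviour described above.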

It was suggested to me, without much further explanation, that instead of using the "raw" difference between the input frames' pixels I should use xor. I seriously doubt that: xoring the input RGB values and then applying conversion to luma-chroma, DCT and quantization to the result does not yield good results (I see severe artifacts). Anyway, I've tried different approaches and here are my findings. As a test case I took two similar pictures.

1. Compression of a single picture, no delta, JPEG-like (conversion of RGB to luma-chroma, DCT, quantization and finally Huffman-based RLE), gives a compression ratio of x26.

2. Computing xor of the input frames and then compressing that (JPEG-like) gives a compression ratio of x39 and the aforementioned artifacts.

3. Computing clamp(difference, -0.5, 0.5) + 0.5, followed by JPEG-like compression, results in a compression ratio of x76.

4. Since xor seemed to make more sense when applied to the *DCTs* of the input frames rather than to the RGB values, I tried that. Storing the xor of the DCTs and running RLE on that gave me a compression ratio of x72.

So as you can see, I did achieve some nice compression using xor, but only with approach 4. Approach 3 gives the best compression ratio and has some other advantages. Since differences are more "natural" than xor, nothing prevents blurring the difference from 3 and thus achieving an even better compression ratio at the cost of a noticeable, but not severe, loss of quality.
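One possible way to see why xor on raw pixel values is less "natural" than a difference: two nearly identical values can produce a very large xor. The snippet below is only an illustration with hand-picked values, not a measurement from my test pictures.

```python
a, b = 127, 128  # two pixel values that differ by 1

sub_residual = b - a   # arithmetic difference: 1
xor_residual = a ^ b   # 0b01111111 ^ 0b10000000 = 0b11111111 = 255

# Worst-case xor residual over all pairs of neighbouring 8-bit values.
worst_xor = max(v ^ (v + 1) for v in range(255))
```

A tiny change in brightness can therefore show up as a full-range value in the xor image, which a DCT-based coder then sees as strong detail.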

Do you have any thoughts on improving delta compression of images? I'm asking because in extreme cases delta compression produces blocks, and eventually whole frames, which have a *worse* compression ratio than when compressed without delta.
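One common way to guard against that worst case is to encode each block both ways and keep whichever is smaller, with a one-byte flag telling the decoder which mode was used. The sketch below is hypothetical: it uses zlib as a lossless stand-in for the JPEG-like coder and a byte-wise wrapped difference as the delta, and the names encode_block and decode_block are made up for illustration.

```python
import zlib


def encode_block(prev: bytes, cur: bytes) -> bytes:
    """Return the smaller of the delta-coded and directly coded block."""
    delta = bytes((c - p) % 256 for c, p in zip(cur, prev))
    intra = zlib.compress(cur)     # block compressed on its own
    inter = zlib.compress(delta)   # block compressed as a delta
    if len(inter) < len(intra):
        return b"D" + inter        # "D": payload is a delta
    return b"I" + intra            # "I": payload is the block itself


def decode_block(prev: bytes, payload: bytes) -> bytes:
    """Invert encode_block using the one-byte mode flag."""
    mode, body = payload[:1], zlib.decompress(payload[1:])
    if mode == b"D":
        return bytes((p + d) % 256 for p, d in zip(prev, body))
    return body


prev_block = bytes(range(64))
cur_block = bytes((v + 1) % 256 for v in range(64))
restored = decode_block(prev_block, encode_block(prev_block, cur_block))
```

With this fallback a block never costs more than its non-delta encoding plus the flag byte.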


I don't know much about the topic, but you don't actually need one extra bit for the difference if you wrap the difference around at the 1-byte border:

The difference from 255 to 0 is -255, but can be expressed as +1: (255 + 1) % 256 = 0.

The difference from 0 to 255 is +255 and can be expressed as that: (0+255) % 256 = 255.

 

PNG does that for some row filter methods, if I'm not mistaken.

At first I did not understand how this was supposed to work. My thought was: hey, I still have the [-255, 255] range to cover, so how can I skip the extra bit? But your suggestion enlightened me on that: I *don't* have that range. I do have negative values to cover, but the total number of values I need to represent is 256, not 511. I'll give an example to elaborate a bit more.

Let's say we have a pixel which in frame1 has a value of 20 (in the [0..255] range) and in frame2 the same pixel has a value of 7. The difference to encode here is 7 - 20 = -13. Now, since frame2's value has to be in the range [0..255], our limits are:

0 - 20 = -20
255 - 20 = 235

So for that case the only viable differences to encode are in range [-20, 235]. So as you see we still have only 256 values so 8 bits is enough.

So back to our example, we got 7 - 20 = -13. We take -13 % 256 = 243 and store that.

On the decoding side we have previous, frame1's value of 20. We decode by summing 20 + 243 = 263. Taking 263 % 256 = 7.
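The worked example above can be sketched in a few lines. The function names are hypothetical. Note that Python's % always returns a non-negative result for a positive modulus; in C, -13 % 256 evaluates to -13, so there one would use unsigned 8-bit arithmetic or mask with & 0xFF instead.

```python
def encode_wrapped_delta(prev, cur):
    """Store the difference modulo 256 so it always fits in one byte."""
    return (cur - prev) % 256


def decode_wrapped_delta(prev, delta):
    """Add the stored delta to the previous value and wrap back into [0, 255]."""
    return (prev + delta) % 256


stored = encode_wrapped_delta(20, 7)        # -13 % 256
decoded = decode_wrapped_delta(20, stored)  # (20 + 243) % 256

# The scheme is lossless for every pair of 8-bit values.
lossless = all(
    decode_wrapped_delta(p, encode_wrapped_delta(p, c)) == c
    for p in range(256)
    for c in range(256)
)
```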

Not only does this solution let me encode the whole range of differences, but, to my surprise, the compression ratio also increased by a measurable few percent. Not sure why that happened but I won't complain :).

Thank you rnlf_in_space immensely for making me realize what I have just described :).

Huh, my joy was premature.

I made a mistake in my code and the change we're talking about here was not applied at all. Once I made sure it was applied, artifacts showed up.

I did make use of the trick described above though. I used it to store the differences between DC coefficients more efficiently.

I think it is not possible to apply any non-linear operator (like xor or modulo) *before* running DCT/quantization on the data and expect correct results on the decoding side. I might be wrong here but that is my intuition so far.
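A toy illustration of that intuition follows. The lossy stage here is only a crude stand-in for DCT plus quantization (it replaces each adjacent pair of samples by its mean, roughly what happens when high frequencies are quantized away), and all names and values are assumptions for the sake of the example.

```python
def lossy_pairs(values):
    """Crude lossy stage: replace each adjacent pair of samples by its mean."""
    out = []
    for i in range(0, len(values), 2):
        mean = round((values[i] + values[i + 1]) / 2)
        out += [mean, mean]
    return out


prev = [100, 100, 100, 100]
cur = [101, 99, 101, 99]  # differs from prev by +1, -1, +1, -1

signed = [c - p for c, p in zip(cur, prev)]           # [1, -1, 1, -1]
wrapped = [(c - p) % 256 for c, p in zip(cur, prev)]  # [1, 255, 1, 255]

dec_signed = [p + d for p, d in zip(prev, lossy_pairs(signed))]
dec_wrapped = [(p + d) % 256 for p, d in zip(prev, lossy_pairs(wrapped))]

err_signed = max(abs(a - b) for a, b in zip(dec_signed, cur))
err_wrapped = max(abs(a - b) for a, b in zip(dec_wrapped, cur))
```

With signed differences the lossy stage costs an error of 1 per sample. With wrapped differences the values 1 and 255 average to 128, and the decoded pixels land far from the originals. The wrap turns a small negative difference into a large positive one, and a lossy stage that averages or smooths values cannot tell the two apart.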

