data compression

37 comments, last by iMalc 10 years, 1 month ago


The Fourier transform does not change the amount of information. For, say, a 1024 point FFT, you input 1024 complex values and you get 1024 complex values, so no change in information.

If you instead input 1024 real values, you still get 1024 complex values out, but 511 of them are redundant due to complex conjugate symmetry. That leaves us with 513 values, out of which the first one and the last one have an imaginary part of zero, and thus represent the same amount of information as just one complex number. In the end, 1024 real values in gives you 511 complex values and 2 real values out.

Exactly the same amount of data storage needed for both the input and the output. The amount of unique and redundant information is the same in both the input and the output, but the information is often transformed to a different representation that is more suitable for a later compression stage.
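To make that concrete, here's a quick numpy sketch (just an illustration) showing the counts for a 1024-point real-input FFT:

```python
import numpy as np

# a made-up real-valued test signal, 1024 samples
x = np.random.randn(1024)

# rfft returns only the non-redundant half: N/2 + 1 = 513 bins
X = np.fft.rfft(x)
print(len(X))                      # 513

# bin 0 (DC) and bin 512 (Nyquist) are purely real for real input
print(X[0].imag, X[-1].imag)       # both 0.0 (up to rounding)

# unique content: 511 full complex values + 2 real values
# = 511*2 + 2 = 1024 real numbers, the same storage as the input
x_back = np.fft.irfft(X, n=1024)
print(np.allclose(x, x_back))      # True: the transform loses nothing
```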

Yes, that is what I said.



Yeah, sorry, I was originally agreeing with you, making an extension to your post and replying to SunDog at the same time, but I edited the text a few times before submitting it and in the end it became just the reply. Unfortunate, and as I said now, I do agree with your point.

Actually a common way to compress a signal is to transform it from the time domain to the frequency domain with a Fourier transform. Then you can throw away half of the signal, because the Fourier transform produces both positive and negative frequencies, and for a real signal these are mirror images of each other: for a real signal S(t) with Fourier transform F, F(-f) = F*(f), i.e. the negative-frequency half is just the complex conjugate of the positive-frequency half (note the change of variables from time t to frequency f). Something like this is used in quadrature amplitude modulation (QAM).

Doesn't the Fourier transform of the real signal contain both real and imaginary components (amplitude and phase) though? Then half of the (complex) transform is still the same size as the (real) input signal, so it's not compression per se as far as I can tell. But yes, there are schemes that compress in the frequency domain, since for many types of data, in particular sound and images, the frequency domain is much more predictable than the spatial domain and is more readily compressed. It's also way easier to quantize a signal without losing too much perceived accuracy (audio, video, ...) by working in the frequency domain rather than in the spatial domain. JPEG does this for instance, as well as MP3, though I don't know all the details.
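As a toy illustration of why quantizing in the frequency domain pays off (a rough sketch of the general idea, with a made-up test tone, not how JPEG or MP3 literally do it):

```python
import numpy as np

def quantize(c, step):
    """Coarse uniform quantization: keep only a small integer per coefficient."""
    return np.round(c / step).astype(int)

def dequantize(q, step):
    return q * step

# made-up test signal: a single low-frequency tone, 64 samples
n = np.arange(64)
signal = np.sin(2 * np.pi * 3 * n / 64)

coeffs = np.fft.rfft(signal)              # 33 complex bins
step = 0.5
q_re = quantize(coeffs.real, step)        # quantize real and imaginary parts separately
q_im = quantize(coeffs.imag, step)

# almost every quantized value is zero -> trivially cheap for a later entropy coder
print(np.count_nonzero(q_re) + np.count_nonzero(q_im),
      "non-zero integers out of", 2 * len(coeffs))

recon = np.fft.irfft(dequantize(q_re, step) + 1j * dequantize(q_im, step), n=64)
print(np.max(np.abs(recon - signal)))     # tiny reconstruction error
```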

yes.

typically these don't use the actual Fourier transform, but other frequency-based transforms:

for example, JPEG (and the MPEG family) uses the DCT (Discrete Cosine Transform);

MP3, Vorbis, and AAC use the MDCT (Modified Discrete Cosine Transform).

another option (used, for example, in HDPhoto / JPEG-XR) is the WHT (Walsh Hadamard Transform).

the WHT has an advantage in terms of being faster and more easily made lossless (while the DCT can be made fully reversible / lossless, doing so is computationally expensive). however, the WHT has a drawback in terms of not compressing as well as the DCT.
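for reference, here is the 1D DCT-II and its inverse written naively, straight from the definition, just to show what these transforms actually compute (real codecs use fast factorized or integer versions, and apply it to rows and then columns for 2D blocks):

```python
import math

def dct_ii(x):
    """Naive O(N^2) DCT-II, straight from the definition."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            for k in range(N)]

def idct_ii(X):
    """Matching inverse (a scaled DCT-III)."""
    N = len(X)
    return [X[0] / N + (2.0 / N) * sum(X[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                                       for k in range(1, N))
            for n in range(N)]

block = [52, 55, 61, 66, 70, 61, 64, 73]       # one made-up 8-sample row
coeffs = dct_ii(block)
print([round(c, 1) for c in coeffs])            # most of the energy sits in the low-frequency coefficients
print([round(v, 1) for v in idct_ii(coeffs)])   # recovers the original row
```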

...

the advantage of these transforms is, as noted, that they compress well and are fairly easy to quantize.

they also offer a reasonable tradeoff between conceptual elegance and computational performance.

another possible strategy is Vector Quantization, where the values are instead approximated via either replacing them with indices into lookup tables (similar in concept to a font or color-palette), or via using very coarse numerical approximations (such as selecting between a palette of interpolated values, ...).

the advantage of VQ is that decoders can be made to be very fast (compared to what is generally possible with DCT or WHT).

the drawback is generally a tradeoff: either the format has poor quality, poor compression, or both, or alternatively it needs to be overly complex and requires a computationally expensive encoder to get good compression (pretty much the entire codec may turn into an ugly mass of special cases and hair...). so, it may become a lot of tables used to build tables that index other tables, with lots of special cases for how to represent the table indices, lots of dedicated special-case code branches, ...
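a minimal sketch of the lookup-table idea (with a made-up 4-entry codebook for 2x2 grayscale blocks, not taken from any real codec):

```python
# made-up codebook of 4 "patterns" for 2x2 grayscale blocks (row-major pixel order)
codebook = [
    (0, 0, 0, 0),          # flat dark
    (255, 255, 255, 255),  # flat bright
    (0, 0, 255, 255),      # horizontal edge
    (0, 255, 0, 255),      # vertical edge
]

def encode_block(block):
    """Pick the codebook entry with the smallest squared error."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], block)))

def decode_block(index):
    """Decoding is just a table lookup -- this is why VQ decoders are fast."""
    return codebook[index]

block = (10, 240, 5, 250)          # looks most like the "vertical edge" pattern
idx = encode_block(block)          # 2 bits to store instead of 4 bytes of pixels
print(idx, decode_block(idx))
```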

going the other direction, you find things like the DWT (Discrete Wavelet Transform), which tends to offer some interesting properties (it was used, for example, in JPEG-2000 and in some video codecs, such as Dirac) and has the advantage of being conceptually elegant, but the drawback of generally (in practice) being a bit slower than what is possible with either the DCT or WHT transforms (while not really offering much of a compression advantage).

the result, then, is that thus far DCT-based formats have generally held the dominant position in the image/audio/video space.

I think nobody here described compression the way I understand it, so here it goes.

When we are dealing with strings of bits that describe something (text, sound, an image...), not every string of a particular length is equally likely. Sometimes the distribution of probability is extremely far from uniform, and then we should be able to find an encoding of the data that will use shorter strings to describe probable things and longer strings to describe improbable things, in such a way that the expected length of the encoding of a string is smaller than the original fixed size. You can actually find an encoding that minimizes that quantity, and you'll get something like arithmetic coding.
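The quantity that puts a floor under this is the entropy of the source. A small sketch with a made-up symbol distribution, comparing the fixed-length cost with the Shannon lower bound that arithmetic coding approaches:

```python
import math

# made-up, very skewed distribution over 4 symbols
probs = {"a": 0.7, "b": 0.2, "c": 0.05, "d": 0.05}

fixed_bits = math.log2(len(probs))                        # 2 bits/symbol with a fixed-length code
entropy = -sum(p * math.log2(p) for p in probs.values())  # ~1.26 bits/symbol lower bound

print(f"fixed-length: {fixed_bits:.2f} bits/symbol")
print(f"entropy:      {entropy:.2f} bits/symbol")
# an arithmetic coder driven by these probabilities gets arbitrarily close to the
# entropy, i.e. roughly a third smaller than the fixed-length encoding in this example
```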

I have only perused this, but it seems very reasonable: http://mattmahoney.net/dc/dce.html

going the other direction, you find things like the DWT (Discrete Wavelet Transform), which tends to offer some interesting properties (it was used, for example, in JPEG-2000 and in some video codecs, such as Dirac) and has the advantage of being conceptually elegant, but the drawback of generally (in practice) being a bit slower than what is possible with either the DCT or WHT transforms (while not really offering much of a compression advantage).
DWT certainly has significant advantages over DCT.

Using DWT, your compression artefacts are distributed over the entire image rather than over fixed-size blocks and are much less noticeable (assuming you use a no-crap basis function), and you can therefore compress at a considerably higher rate before it becomes disturbing.

Technically, you could of course achieve a similar thing with a full-frame DCT as well, but the computational effort would be unacceptable. The DCT in e.g. JPEG runs on small, fixed-size blocks where O(n²), O(n log n) and O(1) are pretty much all the same. But what's true for 8x8 is not necessarily true for 2048x2048.
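To make the wavelet part concrete, here is one level of the simplest wavelet transform (Haar) in 1D, purely as an illustration with made-up data; JPEG 2000 itself uses longer 5/3 or 9/7 filters applied recursively in 2D:

```python
def haar_forward(x):
    """One level of the 1D Haar transform: averages (low-pass) + differences (high-pass)."""
    assert len(x) % 2 == 0
    lo = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    hi = [(x[2 * i] - x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    return lo, hi

def haar_inverse(lo, hi):
    out = []
    for a, d in zip(lo, hi):
        out += [a + d, a - d]
    return out

row = [100, 102, 101, 180, 182, 181, 179, 178]   # smooth regions with one edge
lo, hi = haar_forward(row)
print(lo)   # coarse half-resolution version of the row
print(hi)   # mostly near zero -> cheap to code; large only around the edge
print(haar_inverse(lo, hi) == row)   # True: exactly reversible (the halves are exact here)
```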

The reason why DWT has not been (and likely will not be) successful is not that it isn't considerably better. It certainly is.

The reason is that JPEG works, is universally supported, and DCT/JPEG is an embarrassingly parallel problem, both for compression and decompression. Which means it's trivial to implement in dedicated hardware and to write multi-core friendly code for. You process one isolated 8x8 block at a time, and you don't care about the rest of the image. Also, there is no urgent need (other than "yeah, nice to have") for something different.

15-20 years ago, it greatly mattered whether 10,000 images on your hard disk took 4 MiB each or 400 kiB. That's why JPEG was so hugely successful. It made the difference between "unusable, unaffordable", and "works fine, and doesn't even look too bad".

If you had had a readily working algorithm (say, DWT) that used only 250-300kiB instead of 400kiB at comparable quality back then, then that would have been a bummer.

Today, most people can easily afford to store uncompressed images (or they compress with max quality, anyway). Of course, smaller files are generally better than large files, but the sad truth is that hardly anyone cares any more if a file is 3MiB or 30MiB, not even when it has to be downloaded.

I wonder what the OP was smoking to start this thread :) He's just trolling... So many nice and relevant answers, just in vain :(

I wonder what the OP was smoking to start this thread :) He's just trolling... So many nice and relevant answers, just in vain :(


We can enjoy each other's comments about the subject. I don't give a flying fig about the OP's part in this thread. :)

Here's another awesome (not entirely serious!) two-step approach to lossless data compression:

Encode data using spherical harmonics. Now that we have data on a "sphere", do a Banach-Tarski decomposition/reassembly.

Since we can reassemble the sphere into two identical spheres of the same size, but we only need one sphere to reconstruct the original data, we know that 50% of the information can be discarded!

However, since SH is only an equivalent representation of the original data, we can optimize the process and immediately throw away 50% of the original input. The great thing about this method is that it guarantees a 50% compression ratio independently of what you input, and at a very low computational overhead. Which of course means you can apply it recursively as often as you like... :P

Actually a common way to compress a signal is to transform it from the time domain to the frequency domain with a Fourier transform. Then you can throw away half of the signal, because the Fourier transform produces both positive and negative frequencies, and for a real signal these are mirror images of each other: for a real signal S(t) with Fourier transform F, F(-f) = F*(f), i.e. the negative-frequency half is just the complex conjugate of the positive-frequency half (note the change of variables from time t to frequency f). Something like this is used in quadrature amplitude modulation (QAM).

Doesn't the Fourier transform of the real signal contain both real and imaginary components (amplitude and phase) though? Then half of the (complex) transform is still the same size as the (real) input signal, so it's not compression per se as far as I can tell (otherwise you would be able to violate information theory by taking the FT of a uniformly random signal, and "compressing" it to half its size :D ).

The Fourier transform does not change the amount of information. For, say, a 1024 point FFT, you input 1024 complex values and you get 1024 complex values, so no change in information.

If you instead input 1024 real values, you still get 1024 complex values out, but 511 of them are redundant due to complex conjugate symmetry. That leaves us with 513 values, out of which the first one and the last one have an imaginary part of zero, and thus represent the same amount of information as just one complex number. In the end, 1024 real values in gives you 511 complex values and 2 real values out.

Exactly the same amount of data storage needed for both the input and the output. The amount of unique and redundant information is the same in both the input and the output, but the information is often transformed to a different representation that is more suitable for a later compression stage.

Aaargh, I forgot about phase. Yes, that is correct: you would have to store both the magnitude and phase (not equivalent to real/imaginary, but easily convertible, i.e. mag = sqrt(real^2 + imag^2), phase = tan^-1(imag/real)), so for a signal of length 1024, say, you would have to store 512 magnitude and 512 phase points, so you gain nothing.
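(Quick numpy sketch of that rectangular-to-polar round trip, just for illustration with random data:)

```python
import numpy as np

X = np.fft.rfft(np.random.randn(1024))   # 513 complex bins for 1024 real samples

mag = np.abs(X)                # sqrt(real^2 + imag^2)
phase = np.angle(X)            # atan2(imag, real)

X_back = mag * np.exp(1j * phase)
print(np.allclose(X, X_back))  # True: magnitude+phase carries the same information
```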

The "compression" in QAM comes from the fact you only need use half the bandwidth, but from an information theory perspective the data is not "compressed" because in the discrete case you have to store the same amount.

As other posters have noted, there are lossy compression techniques that transform from the time to the frequency domain.

going the other direction, you find things like the DWT (Discrete Wavelet Transform), which tends to offer some interesting properties (it was used, for example, in JPEG-2000 and in some video codecs, such as Dirac) and has the advantage of being conceptually elegant, but the drawback of generally (in practice) being a bit slower than what is possible with either the DCT or WHT transforms (while not really offering much of a compression advantage).

DWT certainly has significant advantages over DCT.

Using DWT, your compression artefacts are distributed over the entire image rather than over fixed-size blocks and are much less noticeable (assuming you use a no-crap basis function), and you can therefore compress at a considerably higher rate before it becomes disturbing.


I had found the compression differences to be fairly modest in past tests, especially at higher quality levels.

Technically, you could of course achieve a similar thing with a full-frame DCT as well, but the computational effort would be unacceptable. The DCT in e.g. JPEG runs on small, fixed-size blocks where O(n²), O(n log n) and O(1) are pretty much all the same. But what's true for 8x8 is not necessarily true for 2048x2048.

usually I had used 8x8 or 4x4 DCT blocks.
4x4 DCT has the advantage of being faster than an 8x8 DCT.

there is also 16x16 DCT, but the problem is that 16x16 is somewhat slower than 8x8.
an experimental compromise was using a 2-level 4x4 DCT over a 16x16 block, which is a little faster than a 16x16 DCT (and more directly comparable to an 8x8 DCT).

I had also experimented at one point using a 2-level 8x8 DCT (over a 64x64 block), but found that it was detrimental to compression (for some reason), as well as being slower.

typically, I had used integer/fixed-point DCT transforms.

another factor is the back-end VLC and entropy coding, where it is possible to make some improvements over JPEG in this area, along with fiddling with the algorithms for generating quantization tables and similar.

The reason why DWT has not been (and likely will not be) successful is not that it isn't considerably better. It certainly is.

The reason is that JPEG works, is universally supported, and DCT/JPEG is an embarrassingly parallel problem, both for compression and decompression. Which means it's trivial to implement in dedicated hardware and to write multi-core friendly code for. You process one isolated 8x8 block at a time, and you don't care about the rest of the image. Also, there is no urgent need (other than "yeah, nice to have") for something different.

could be.

my current JPEG decoder uses a macroblock-at-a-time strategy for some of its decode-routes.
so, when decoding each 16x16 pixel macroblock, it will go straight from the bitstream to the output.

this is moderately faster than decoding into Y/Cb/Cr planes and then converting the planes to RGB or DXTn in a separate pass.

then there are also tricks like manually unrolling some of the loops, and adding special cases to detect-and-skip the inverse-quantization and inverse DCT in certain cases (namely flagging cases where the DC coefficient was directly followed by an EOB, and then flood-filling the output block with a single value, ...).
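a rough sketch of that DC-only shortcut (my own paraphrase of the idea, assuming JPEG-style 8x8 blocks with the usual +128 level shift, not the actual decoder code):

```python
def decode_dc_only_block(dc_coeff, quant_dc):
    """If the DC coefficient is immediately followed by an EOB (no AC coefficients),
    the inverse DCT of the block is a constant, so the dequantization of the other
    63 coefficients and the full IDCT can be skipped: just flood-fill one value.

    Assumes a JPEG-style 8x8 block with the usual +128 level shift."""
    value = round(dc_coeff * quant_dc / 8) + 128
    value = max(0, min(255, value))          # clamp to the valid sample range
    return [[value] * 8 for _ in range(8)]   # 8x8 block of identical samples

# made-up quantized DC value of -20 with a DC quantizer step of 16
block = decode_dc_only_block(-20, 16)
print(block[0][0])   # 88; every sample in the block is the same
```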

some of my experimental DCT-based video codecs had worked similarly.

decode performance has generally been in the same area as XviD (~110-115 megapixels/second on my PC), with me typically getting around 90-105 Mpix/sec from JPEG-like designs for RGBA output (for a single-threaded decoder).

my VQ codecs have tended to be a bit faster.

I hadn't had much success trying to get similar sorts of speeds out of either DWT or PNG based designs, but more investigation could be warranted here (past experiments with DWT had focused mostly on the Haar wavelet, with comparable speeds from both DWT and PNG-style Paeth filtering).


I had designed some experimental DCT/VQ hybrids, but thus far none has been fully implemented.

a recent, much lazier, and moderately-successful compromise was using a DCT-based coding for I-Frames (and for some P-Frames), and using a VQ-based strategy for most other P-Frames. this allowed some size/quality improvements with a less severe performance impact (only about 1/16 of the frames were DCT-based, with VQ for most others).

this design does seem a bit like an ugly hack though.

15-20 years ago, it greatly mattered whether 10,000 images on your hard disk took 4 MiB each or 400 kiB. That's why JPEG was so hugely successful. It made the difference between "unusable, unaffordable", and "works fine, and doesn't even look too bad".
If you had had a readily working algorithm (say, DWT) that used only 250-300kiB instead of 400kiB at comparable quality back then, then that would have been a bummer.
Today, most people can easily afford to store uncompressed images (or they compress with max quality, anyway). Of course, smaller files are generally better than large files, but the sad truth is that hardly anyone cares any more if a file is 3MiB or 30MiB, not even when it has to be downloaded.



could be, but my past tests had generally found the compression differences to be not quite so drastic.

though, mostly this was testing at higher quality settings (typically around 90% JPEG quality or similar).

generally I have found 90% to be a good compromise quality; past around 95% there is a large increase in file sizes with no real visible improvement in image quality.

OTOH, much below about 70% JPEG quality, the drop-off in quality is fairly rapid compared with the modest reduction in file sizes.

I guess maybe the DWT could have a bit more of an advantage in sub-70% territory...

