audio/video compression

Started by ironfroggy. 6 comments, last by ironfroggy 22 years, 9 months ago
I know how to do a lot of the methods, but some of them are a bit unclear, so I have a few questions for anyone who knows about these things. 1) Is my understanding correct that there are certain Hz in the sound you can cut out without losing data, and how would you do this? 2) Is there more to video compression than just compressing each frame by itself?
(http://www.ironfroggy.com/)(http://www.ironfroggy.com/pinch)
quote:Original post by ironfroggy

2) is there more to video compression than just compressing each frame by itself?



Although JPEG is sometimes used to compress video (in some TV production settings), there is usually more to it than just compressing each frame individually.

The MPEG format works on both individual frames and the similarities/differences between frames. In MPEG, you first compress what is called an I-frame, which is basically an entire frame of video compressed on its own. You also have to choose a group of pictures (GOP) length, which determines how many frames you have between each I-frame.

The intermediate frames between I-frames are called P-frames and B-frames. Usually, the GOP goes something like IBBPBBPBBP... In the TV industry, a long GOP would be 16 frames from I to I.

In the P- and B-frames, compression is based upon changes and movement in the image. The closer each frame is to the frame before it, the better the compression rate.

So, to make a short story long, MPEG basically compresses one whole frame of video, then for the next several frames compresses the changes and differences, then repeats. The actual compression algorithms are pretty complex for MPEG, but if you are just interested in the general principles, hopefully this will help you out.
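Just to make that concrete, here's a toy sketch of the difference-between-frames idea (nothing like a real MPEG encoder; the function names and the numpy approach are just made up for illustration):

# Toy illustration of inter-frame compression: store the first frame whole
# (the "I-frame") and only the differences to the previous frame after that.
# A real codec would motion-compensate and transform-code these differences.
import numpy as np

def encode_gop(frames):
    """frames: list of 2-D uint8 arrays, all the same shape."""
    keyframe = frames[0]
    deltas = [frames[i + 1].astype(np.int16) - frames[i].astype(np.int16)
              for i in range(len(frames) - 1)]
    return keyframe, deltas            # deltas are mostly zero for static scenes

def decode_gop(keyframe, deltas):
    frames = [keyframe]
    for d in deltas:
        frames.append((frames[-1].astype(np.int16) + d).astype(np.uint8))
    return frames

The point is simply that mostly-static frames produce deltas that are mostly zeros, which compress far better than the raw frames would.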

You'd probably want to take some signal analysis/processing courses for a more in-depth explanation, but I'll try to explain some basics. Also, the signal processing field is *very* math intensive, so be warned...

Most signal processing - video or audio - happens in the frequency domain. When you want to compress data, the first step obviously is to convert the data to the frequency domain with a DFT/FFT (discrete/fast Fourier transform) or (M)DCT ((modified) discrete cosine transform). There is usually quite a lot of redundant data, which is more easily removed when working with frequencies instead of sampled data. MP3, MPEG, JPEG and other 'lossy' algorithms all work in the frequency domain - which is quite math-intensive, resulting in slow processing, especially if the algorithms used aren't optimal. I have only worked with JPEG and MP3 (it's really hard to get any good documentation on the latter, so I kinda gave up), so I can't give more specific information on the formats - most of my information is second hand, and most probably partially wrong.
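To give a rough feel for that "transform, then throw away redundant detail" step, here's a little sketch (nothing like what MP3/JPEG actually do; the function name and the 20% figure are just made up for illustration):

# Sketch of the lossy transform idea: FFT a block of samples, zero the
# weakest coefficients, and transform back. Real codecs use DCT/MDCT blocks,
# perceptual models and quantization instead of a crude threshold like this.
import numpy as np

def lossy_block(samples, keep_fraction=0.2):
    spectrum = np.fft.rfft(samples)                          # time -> frequency
    threshold = np.percentile(np.abs(spectrum), (1.0 - keep_fraction) * 100)
    spectrum[np.abs(spectrum) < threshold] = 0               # drop weak detail
    return np.fft.irfft(spectrum, n=len(samples))            # frequency -> time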

For the audio part... Yes, it is possible to remove a frequency (or frequencies). The "easiest" way to do this is to calculate the Fourier transform, remove the desired frequencies from the spectrum and then calculate the inverse Fourier transform. However, this is *very* expensive (mathematically). When only filtering is required, most DSPs use the convolution theorem (you might want to familiarize yourself with the term and the principle behind it - it's very important in signal processing). Basically this means that you take a set of samples (the amount depends on the filter used), multiply it with the impulse response data, sum it up and get the resulting filtered data.
(Note: I'm talking about FIR (finite impulse response) filters here. Although they are more (computationally) expensive than IIR (infinite impulse response) filters, they are easier to create and don't suffer from stability problems.)
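For what it's worth, that "easiest" FFT approach looks roughly like this with numpy (offline, on a whole buffer at once; the sample rate and notch width are just example numbers):

# FFT the whole signal, zero the bins near the unwanted frequency, inverse FFT.
# Simple, but expensive and not something you'd do sample-by-sample in real time.
import numpy as np

def fft_notch(signal, sample_rate=44100.0, notch_hz=50.0, width_hz=2.0):
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[np.abs(freqs - notch_hz) <= width_hz] = 0.0     # kill the band
    return np.fft.irfft(spectrum, n=len(signal))

# The FIR/convolution route is instead something like
#   y = np.convolve(signal, impulse_response, mode='same')
# where impulse_response comes from whatever filter design method you use.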
The filtering, however, is not cheap (again, I'm speaking computationally here). If you want to remove one frequency (as in 1 Hz) from CD-quality audio in real time -- well, good luck. You'll need a dedicated DSP processor for that; probably even a 1.5 GHz TBird won't be powerful enough -- my 500 MHz Athlon can barely process data with 150 Hz resolution. (That code isn't optimized, though, and even then the amount of math is immense; approximately 80 million floating point multiplies alone per second!)
Also, cheaper filters introduce 'artifacts' in the sound. Usually they take the form of frequency/phase nonlinearity -- that is, the frequency response isn't exactly what you wanted. Designing a filter is always a tradeoff: you want something (say, a specific 50 Hz cut out, but nothing else); it's too expensive, so you have to give something up (say, your filter cuts out those 50 Hz, but with a 50 Hz transition band on either side and a little (say, 1 dB) ripple in the passbands). And even that might require a quite expensive DSP, or more processing power than your PC has...!

As I said, this whole topic is quite complex. I tried to make sense here, but probably ended up confusing you people even more. Look for more information about signal processing. And prepare to be overwhelmed with information...

Disclaimer: I have only limited knowledge of the signal processing field. Some information here might be wrong or misleading.

~~~ "'impossible' is a word in the dictonary of fools" --Napoleon
The signal processing side is relatively tame compared to the statistical and optimization problem side of things (joint optimization of lots of parameters is a bitch, as anyone who has ever tried to optimally encode with, say, DivX will know).

Basically I'd say that to understand it (not apply it, mind) you would need a good grasp of linear algebra (to understand transforms and the concept of energy compaction) and a little probability and information theory (to understand the concept of entropy and modelling/predictive coding) to get to grips with how audio/image coding approximately works (if you understand predictive and transform coding, motion compensation is a trivial step). Nothing too advanced... so if you like to learn autodidactically there are plenty of sources on the net to educate yourself if you wish, or you can just wait till you get to college like me :/ (I'm a lazy bum)

BTW Hway, if you want to remove "a single frequency" you are probably better off using a sliding DFT/FFT to lock onto the signal and determine the instantaneous amplitude at any given sample. For a certain size of history you keep the correlation with a sine and cosine of the given frequency, determine from that the amplitude at the present sample to subtract, and then update the correlation for the next sample and repeat. Because the correlation is simply the sum of the multiplication of the sine/cosine and the signal, you can subtract the multiplications for the oldest sample in the history and add them for the new one to get a sliding calculation (you can do that either by keeping a circular buffer with the multiplications along with the sums, or by just calculating them again... 2 more flops per sample could be cheaper than a lookup in a circular buffer). The width of the filter is inversely proportional to the history size. I'm pretty sure that should work... but I could be wrong.


Yes, the processing itself is simple. Grasping the ideas behind the process is the hard part. I've taken some 3 or 4 signal processing courses, from the basics to DSP building, and some things are still quite vague.

Generally I see no point in removing one single frequency (except maybe in cases where mains 50/60 Hz 'hum' is a problem, or a similar situation, but there some kind of PLL/VCO combination might be a better solution anyway).

When it comes to MPEG-2/4 and motion compensation and other fancy stuff like that, I haven't bothered myself with them at all. I have no interest in that field, for now at least. I'm no mathmo, so most of the time I just get a formula (preferably already mathematically optimized) and am told to implement it. Getting the bits in the proper order is more fun anyway, especially at low level...

And as for your explanation of frequency removal... I have absolutely no idea what you're saying. I read it five times but it still makes no sense to me...
~~~ "'impossible' is a word in the dictonary of fools" --Napoleon
Yeah, I knew it was pretty obscure, but I wanted to keep it short... let me try that again.

I shouldn't have made the connection to the FFT at the start, really.

Here's what I was suggesting: for a certain history before your present sample you determine the phase and amplitude of the 50 Hz sine you assume is there (which you do by calculating the correlation of that history with a sine and a cosine). Once you have that, you can determine the amplitude of the 50 Hz sine at the present sample and subtract it to remove it.

You could implement this quite efficiently because most of the work is in the correlation calculation, which can be implemented in a sliding fashion... it only requires subtracting the correlation for the oldest sample and adding it for the newest one to update it.
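If it helps, here's roughly how I picture that in code (only a sketch, using numpy; the function name, sample rate, target frequency and history length are example values, and the 2/N scaling assumes the history covers whole periods of the tone):

# Sliding-correlation removal of a single tone: keep running correlations of
# the last `history` samples with a sine and cosine at the target frequency,
# estimate the tone's value at the current sample from them, and subtract it.
import numpy as np

def remove_tone(x, sample_rate=44100.0, freq=50.0, history=4410):
    x = np.asarray(x, dtype=float)
    w = 2.0 * np.pi * freq / sample_rate
    n = np.arange(len(x))
    cos_w, sin_w = np.cos(w * n), np.sin(w * n)
    out = np.empty(len(x))
    c = s = 0.0                                    # sliding correlation sums
    for i in range(len(x)):
        c += x[i] * cos_w[i]                       # add the newest sample
        s += x[i] * sin_w[i]
        if i >= history:                           # drop the oldest one again
            c -= x[i - history] * cos_w[i - history]
            s -= x[i - history] * sin_w[i - history]
        N = min(i + 1, history)
        tone = (2.0 / N) * (c * cos_w[i] + s * sin_w[i])   # tone's value at sample i
        out[i] = x[i] - tone
    return out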

That was just my spur-of-the-moment idea (probably not my idea, but it's been a long time since I did DSP... so I forget my sources), but then it occurred to me that this is actually entirely equivalent to brick-wall filtering at 50 Hz in the frequency domain, with a bin spacing of sampling-rate/history-size. It's almost like filtering with a sliding DFT/FFT (that's pretty obscure BTW, not too many people use it), but instead of calculating the inverse of the filtered signal you calculate the amplitude of the component you want to filter out and subtract it in the time domain, since that's less work. The DFT is actually pretty fast if you only want to calculate it for a single (or even a couple of) frequencies.

Of course brick-wall filters give quite horrible amplitude and phase fluctuations in the continuous spectrum, so I don't know if this would work well; you could use the same procedure for multiple frequency bins, though, to implement a slightly smoother filter. Of course there's a point where, if you change enough components, it's faster to just do the filtering entirely in the frequency domain (through a sliding FFT, or through overlap-add/save techniques).

Or for that matter a normal FIR filter, but low-frequency, highly selective filters are usually going to be cheaper in the frequency domain. Only short, simple filters make sense in the time domain, unless you need low latency (and even then you can break up the impulse response and implement the parts for which latency is no problem through FFTs of increasing size; unfortunately this method is patented... I consider it smart but obvious if you work on the problem for a while, and it doesn't deserve a patent).

Or if you want a really perverse solution and nothing but FIR filters, there's multirate.
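On that frequency-domain FIR route mentioned above, a reasonably recent scipy will do the block-FFT convolution for you. A quick sketch (the filter length and band edges are placeholder values, not a serious design):

# Design a narrow band-stop FIR around 50 Hz and apply it with FFT-based
# convolution instead of direct time-domain convolution. Newer scipy versions
# also have signal.oaconvolve for proper overlap-add.
import numpy as np
from scipy.signal import firwin, fftconvolve

fs = 44100.0
t = np.arange(int(fs)) / fs
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)  # tone + hum

taps = firwin(numtaps=4001, cutoff=[45.0, 55.0], fs=fs)  # band-stop (passes DC)
y = fftconvolve(x, taps, mode='same')                    # FFT-based filtering

A notch this narrow and this low needs a very long filter to be selective, which is exactly the cost being discussed above.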

Edited by - MfA on June 26, 2001 7:47:51 AM
I'm not an expert or anything in this field... but wouldn't it compress best if you just saved the updates for the pixels rather than saving all the pictures? I'm sure it's used, I just wonder how common it is...

thx,
cya,
Phil


Visit Rarebyte! and no!, there are NO kangaroos in Austria (I got this question a few times over in the states ;) )
quote:Original post by phueppl1
I'm not an expert or anything in this field... but wouldn't it compress best if you just saved the updates for the pixels rather than saving all the pictures? I'm sure it's used, I just wonder how common it is...



That's what I said. Guess I was too long-winded and boring. What you said is pretty much how MPEG video compression works (with lots of extra tricks, of course).
