how is audio represented in data?

Started by
18 comments, last by ironfroggy 22 years, 9 months ago
Ok so if I had a format where each byte was a value for the wave, and I wanted to sample 1024 values to get the frequencies, how would I do that? How would I measure all the different frequencies? Can I see code? Can I stop asking questions?

(http://www.ironfroggy.com/)(http://www.ironfroggy.com/pinch)
For simplicity, let's normalize your audio samples so that each is a floating-point number in the range [-1, 1]. Now you have a buffer f of those (f is widely used to denote time-domain data, so I use it here).

float f[1024];

For these samples, you need to do a Discrete Fourier Transform (DFT). The mathematical formula can easily be found with a simple search, so I'll just give you a simple implementation:
(note: I assume here that you are fairly experienced with C++ and its standard libraries, so you can fill in the missing #includes and other stuff without anyone holding your hand there...)

float F[1024];

for (int n = 0; n < 1024; n++)
{
    std::complex<float> value = 0;
    for (int i = 0; i < 1024; i++)
    {
        // phase of the n-th basis sinusoid at sample i
        float angle = (-2 * M_PI * i * n) / 1024.0;
        value += f[i] * std::complex<float>(cos(angle), sin(angle));
    }
    F[n] = std::abs(value);   // keep the magnitude, drop the phase
}

Note: The result of the transform is complex; that is, it has both amplitude and phase. Here I just ignored the phase spectrum, because the amplitude spectrum is usually the more interesting one.

Note: This is not the most accurate way to calculate DFT. It is not the only way to calculate DFT either, and as you can see, it is quite slow. I picked up a simple (and in this case, "accurate enough") DFT formula and applied it, so no bitching about that, please.

After the above is run (assuming I didn't make any mistakes there), the first half of F (indices 0..511) will contain the frequency spectrum of the audio data. Each value in the array represents a span of Fs/1024 Hz (where Fs is your sample rate, for example 44100 for CD sound, giving a resolution of about 43Hz per array element).

The upper half of F (512..1023) will contain basically the same information as the lower half, only mirrored (this is always the case when the input samples are real numbers), so most of the time you'll probably want to just ignore it (or not calculate it at all).
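Since the upper half only mirrors the lower one, here is a minimal sketch that computes just bins 0..N/2-1 and avoids std::complex entirely, using plain real and imaginary accumulators. The function name magnitude_spectrum is made up for this example, and it is still the slow O(N^2) method, not an FFT.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: magnitudes of DFT bins 0 .. N/2-1 of a real signal.
// Direct O(N^2) evaluation, no std::complex needed.
std::vector<float> magnitude_spectrum(const std::vector<float>& f)
{
    const std::size_t N = f.size();
    assert(N >= 2);
    const double two_pi = 2.0 * std::acos(-1.0);

    std::vector<float> F(N / 2);
    for (std::size_t n = 0; n < N / 2; ++n) {
        double re = 0.0, im = 0.0;
        for (std::size_t i = 0; i < N; ++i) {
            // Reduce i*n modulo N first so the angle stays small and accurate.
            const double angle = -two_pi * static_cast<double>((i * n) % N)
                                 / static_cast<double>(N);
            re += f[i] * std::cos(angle);
            im += f[i] * std::sin(angle);
        }
        // Magnitude only; the phase would be atan2(im, re).
        F[n] = static_cast<float>(std::sqrt(re * re + im * im));
    }
    return F;
}
```

Feeding it a pure sine that completes exactly 5 cycles in the buffer should give one large value at index 5 and values near zero everywhere else.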



Edited by - Hway on June 29, 2001 5:41:49 AM
~~~ "'impossible' is a word in the dictonary of fools" --Napoleon
That's interesting merlin, because my book says "advanced". *shrug* Then again it's a Microsoft book and what do they know. "Adaptive Delta" does make more sense, in light of the algorithm used to get from PCM to ADPCM.

All the .wav files I've had to deal with have been ADPCM. I have very rarely dealt with pure PCM data, and most of those have been .au files.

-fel
~ The opinions stated by this individual are the opinions of this individual and not the opinions of her company, any organization she might be part of, her parrot, or anyone else. ~
Hway: to get this board to cooperate with your postings, you should enclose code samples in the [ source ] [ /source ] tags (without spaces).

Ironfroggy: be careful, because 8-bit PCM WAV files are represented differently from 16-bit PCM WAV files. 8-bit and 16-bit refer to the size of each sample.

16-bit is a "normal" signed short, so if you read the value of 0, it would be ~0 volts in the waveform, and +32767 would be 1 volt in the waveform, etc.

8-bit is an "offset" unsigned char; the values range from 0 to 255, with 128 being about the zero point; in other words, with 8-bit PCM, 0 is -1V and 255 is +1V. I don't know whose stupid idea this was, and I seem to complain about it way too much, but there you have it.
You know, fel, I've never encountered ADPCM WAV files, except for ones I explicitly made. For example, all of the WAVs that come with Windows are raw PCM, as are most of what you'll find on the Internet. So...ADPCM...that's weird.
Well, when I actually cared what format it was in, I was working for a voiceprint software company. Basically I had to write the libraries to convert between ADPCM .wav's and a few proprietary encrypted/watermarked formats. For games, I mostly just dump it into the mmio and DirectSound stuff and let that deal with it however it wants.
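To make the difference concrete, here is a rough sketch of turning both sample formats into floats in [-1, 1]. The function names are invented for this example, and it assumes you have already pulled the raw sample data out of the file (no header parsing here). Dividing by 128 and 32768 means the largest positive value lands slightly below +1.0, which is a common simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 8-bit PCM: unsigned, zero point at 128.
std::vector<float> normalize_pcm8(const std::uint8_t* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    std::vector<float> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = (static_cast<int>(data[i]) - 128) / 128.0f;
    }
    return out;
}

// 16-bit PCM: signed short, zero point at 0.
std::vector<float> normalize_pcm16(const std::int16_t* data, std::size_t count)
{
    assert(data != nullptr || count == 0);
    std::vector<float> out(count);
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = data[i] / 32768.0f;
    }
    return out;
}
```

Once everything is a float, the rest of the processing no longer cares which format the file used.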

-fel
I just had to chime in because I've worked on something similar recently.

The majority of the wav files I've worked with have also been raw PCM (44kHz, 16-bit, packed L/R shorts), which is exactly how you envisioned them, ironfroggy. ADPCM stores the difference from the previous sample instead of the sample itself, using fewer bits. The idea is that the wave rarely varies by more than (say) 12 bits between samples, so storing 12-bit differences instead of 16-bit samples saves 25%. Note that this is only lossless while the differences actually fit; real ADPCM codecs quantize the difference with an adaptive step size and are lossy. The advantage is that those huge wav files get smaller, but the code is a little more complicated because you now have packed 12-bit values instead of (reasonably) nice 16-bit samples. There's a pile of ADPCM codecs using different numbers of bits and methods.
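The "store the difference from the previous point" idea can be sketched like this. Be warned that this is plain delta coding, not a real ADPCM codec: it keeps the differences in full-width integers, does no bit packing, and does none of the adaptive quantization an actual codec performs. The function names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Replace each sample with its difference from the previous one.
std::vector<std::int32_t> delta_encode(const std::vector<std::int16_t>& pcm)
{
    std::vector<std::int32_t> deltas;
    deltas.reserve(pcm.size());
    std::int32_t previous = 0;  // the first sample is stored relative to 0
    for (std::int16_t sample : pcm) {
        deltas.push_back(sample - previous);
        previous = sample;
    }
    return deltas;
}

// Rebuild the samples by accumulating the differences.
std::vector<std::int16_t> delta_decode(const std::vector<std::int32_t>& deltas)
{
    std::vector<std::int16_t> pcm;
    pcm.reserve(deltas.size());
    std::int32_t current = 0;
    for (std::int32_t d : deltas) {
        current += d;
        assert(current >= -32768 && current <= 32767);
        pcm.push_back(static_cast<std::int16_t>(current));
    }
    return pcm;
}
```

For smooth audio most of the differences are small numbers, which is what makes it possible to store them in fewer bits than the original samples.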

I submitted a snippet to GameDev on FFTing; if you're interested in making a spectrum analyzer (like the one in WinAmp) you need to do a bit more than just an FFT. You can also download butt-quick FFTs and other DSP functions from the Intel and AMD websites.

If you take 1024 waveform points and run them through an APS (Auto Power Spectrum) you get 512 spectrum points out (aka a 'spectrum trace' or just a 'trace'). Each of those points is a relative measurement of how much power (how much of the wave's energy) falls in the corresponding frequencies. Each spectrum point (aka 'bin') represents a range of frequencies. That range is determined by the sampling frequency of the waveform and the number of points used in the APS.

n = waveform points
f = sampling rate
bin size = f / n

So a 1024-point APS from a 44.1kHz wave will yield a bin size of 43.06640625Hz, i.e. the first bin goes from 0-43, the second bin from 43-86, etc. To get finer bins you need to take the APS with more points (or sample at a lower rate, which also lowers the highest frequency you can measure).

When you display the data you'll want to display it on a logarithmic scale, like dB = 10*log10(APS), since the APS is a power value (for raw amplitudes it would be 20*log10). The absolute dB numbers only mean something if the device you're using is scaled correctly, i.e. calibrated.
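Here is the bin arithmetic as a small sketch. The function names are made up for this example, and it follows the convention used in this post that bin k runs from k * (f / n) up to (k + 1) * (f / n); strictly speaking, DFT bin k is centred on k * (f / n).

```cpp
#include <cassert>
#include <cstddef>

// Width of one spectrum bin in Hz: sampling rate divided by waveform points.
double bin_width_hz(double sample_rate, std::size_t points)
{
    assert(points > 0);
    return sample_rate / static_cast<double>(points);
}

// Frequency at which a given bin starts.
double bin_start_hz(std::size_t bin, double sample_rate, std::size_t points)
{
    return static_cast<double>(bin) * bin_width_hz(sample_rate, points);
}

// Index of the bin that a frequency falls into (below half the sample rate).
std::size_t bin_for_frequency(double freq_hz, double sample_rate, std::size_t points)
{
    assert(freq_hz >= 0.0 && freq_hz < sample_rate / 2.0);
    return static_cast<std::size_t>(freq_hz / bin_width_hz(sample_rate, points));
}
```

With a 44.1kHz wave and 1024 points, a 440Hz tone falls in bin 10, and doubling the point count to 2048 halves the bin width.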
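A minimal sketch of the decibel conversion, using the usual convention of 10*log10 for power values and 20*log10 for amplitude values. The function names and the -120 dB floor (used so that silence doesn't turn into minus infinity) are assumptions for this example, and the result is relative, not calibrated.

```cpp
#include <cassert>
#include <cmath>

// Convert a power value (e.g. one APS bin) to decibels, clamped at floor_db.
double power_to_db(double power, double floor_db = -120.0)
{
    assert(power >= 0.0);
    if (power <= 0.0) {
        return floor_db;  // log10(0) would be -infinity
    }
    const double db = 10.0 * std::log10(power);
    return db < floor_db ? floor_db : db;
}

// Convert an amplitude (magnitude) value to decibels, clamped at floor_db.
double amplitude_to_db(double amplitude, double floor_db = -120.0)
{
    assert(amplitude >= 0.0);
    // Power is proportional to amplitude squared, hence 20 instead of 10.
    return power_to_db(amplitude * amplitude, floor_db);
}
```

A bin with 100 times the power of another shows up 20 dB higher, and so does a bin with 10 times the amplitude.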

If you display the spectrum characteristics of an 128kbps mp3, it will drop off sharply at 16kHz; it's an indication that you've got it working.


Magmai Kai Holmlor
- The disgruntled & disillusioned


Edited by - Magmai Kai Holmlor on June 29, 2001 8:23:32 PM
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
I don't know all that C++ template stuff so I am having a little trouble understanding that. Could anyone redo that without the templates? Possibly even just in C, because I would like to optimize in assembly and I find it much easier to go C -> Assembly than C++ -> Assembly.

I also have some more questions. Am I to understand that there are two forms of spectrums? For one it's the different wave height ranges and for others it's the frequency ranges? How exactly are they measured? Not code, actual explanations?

For the wave height one I would guess that the actual wave data is just fed into that statistically? But how do you find a full wave (peak to trough to peak) from the data to get the frequency spectrum?

(http://www.ironfroggy.com/)(http://www.ironfroggy.com/pinch)
There's only one spectrum, but many ways to calculate or estimate it.

Check out the Fastest Fourier Transform in the West (FFTW) for C code.
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
quote:Original post by ironfroggy
I also have some more questions. Am I to understand that there are two forms of spectrums? For one it's the different wave height ranges and for others it's the frequency ranges? How exactly are they measured? Not code, actual explanations?

There are two domains that are commonly used when discussing signals: time domain and frequency domain.

To visualize a signal in the time domain, you draw the linear amplitude of the signal vs. linear time. This is called a waveform display, or just a wave display. For instance, if you plotted the value of each sample on the y axis vs. the sample number on the x axis, you would have a waveform display.

The other domain is the frequency domain, where you talk about the spectral content of the signal. One method to get frequency domain data is to apply an FFT over some amount of the time domain data. What you get out is a complex (real & imaginary, magnitude and phase) representation of the data spectrum from the lowest frequencies to the highest frequencies over that period of time (also called a "window"). Generally, a "spectrum" graph is a graph of the magnitude of the spectrum data vs. frequency. The magnitude is usually presented in decibels, so you need to take the log of the data before displaying it in order for it to look right. For audio, the phase is usually ignored since the human ear is deaf to almost all forms of phase distortion.

As for how the spectrum is calculated, it's easiest to look at the DFT. If you multiply a signal that is a combination of sinusoids by a single sinusoid and add up the product over the whole window, every sinusoid except the one you're multiplying by cancels out. The reason is that, for a window of N samples and whole-number frequencies n and m (both strictly between 0 and N/2),
sum over k = 0..N-1 of A*sin(2*pi*n*k/N) * B*sin(2*pi*m*k/N) = A*B*N/2 if n=m, 0 otherwise.
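You can check this cancellation numerically. The sketch below multiplies two unit-amplitude sinusoids sample by sample and adds the products up over one window of N samples; the function name is made up for the example. The cancellation only shows up in the sum over the whole window, not in any individual product.

```cpp
#include <cassert>
#include <cmath>

// Sum over one window of sin(2*pi*n*k/N) * sin(2*pi*m*k/N), k = 0 .. N-1.
// n and m are whole numbers of cycles per window.
double sine_product_sum(int n, int m, int N)
{
    assert(N > 0);
    const double two_pi = 2.0 * std::acos(-1.0);
    double sum = 0.0;
    for (int k = 0; k < N; ++k) {
        sum += std::sin(two_pi * n * k / N) * std::sin(two_pi * m * k / N);
    }
    return sum;
}
```

With N = 1024, matching frequencies (say n = m = 5) give N/2 = 512, while mismatched ones (say n = 5, m = 7) give a result that is zero up to rounding error.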

The DFT filters your signal at each multiple of some small frequency resolution, which gives you a representation of the signal as a sum of sinusoids. The FFT exploits the symmetry and repetition in those calculations when the number of samples is a power of 2, cutting the work from roughly N^2 operations to roughly N*log2(N) while still giving the same answer.
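For the curious, here is a bare-bones sketch of how a radix-2 FFT gets away with less work: it splits the samples into even-indexed and odd-indexed halves, transforms each half recursively, and combines the two results. This is a textbook recursive version written for readability, with names chosen for the example; it is nowhere near as fast as a tuned library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Recursive radix-2 decimation-in-time FFT. Size must be a power of two.
std::vector<cplx> fft(const std::vector<cplx>& x)
{
    const std::size_t N = x.size();
    assert(N > 0 && (N & (N - 1)) == 0);
    if (N == 1) {
        return x;  // the transform of a single sample is the sample itself
    }

    // Split into even-indexed and odd-indexed samples.
    std::vector<cplx> even(N / 2), odd(N / 2);
    for (std::size_t i = 0; i < N / 2; ++i) {
        even[i] = x[2 * i];
        odd[i] = x[2 * i + 1];
    }
    const std::vector<cplx> E = fft(even);
    const std::vector<cplx> O = fft(odd);

    // Combine: each half-size result is reused for two output bins.
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<cplx> X(N);
    for (std::size_t k = 0; k < N / 2; ++k) {
        const cplx twiddle = std::polar(
            1.0, -two_pi * static_cast<double>(k) / static_cast<double>(N));
        X[k] = E[k] + twiddle * O[k];
        X[k + N / 2] = E[k] - twiddle * O[k];
    }
    return X;
}
```

Taking std::abs of each output element gives the same magnitudes as the direct DFT, and the mirrored upper half is visible too: a cosine with one cycle in an 8-sample window shows up in bins 1 and 7.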

These are pretty advanced mathematical topics, so if you want to learn more about this you should probably do some independent research.

This topic is closed to new replies.