Archived

This topic is now archived and is closed to further replies.

ironfroggy

how is audio represented in data?


The only way I can think of is that it's a series of values giving the offset of the current point on the wave from 0. I know I'm wrong and there is probably more to it. So, could someone please help me understand this?

Depends what type of audio data. MIDI data is simply hardcoded into the format. It will read something like:
quarter note at pitch x in measure x and beat x
Wav, mp3, and most other formats save the sound data itself. Each piece of data describes the sound at that point, and the timing can be inferred from the resolution it was saved at (like 44kHz).
It's not quite that simple, but that should give you a rough idea. For exact file formats check out www.wotsit.org.

HHSDrum@yahoo.com

Actually you're pretty much right. It really is the value of the waveform from a base point. The sampling frequency indicates how often along the waveform you take the measurement. This is, of course, for standard pulse code modulation (.wav is ADPCM, Advanced Pulse Code Modulation, which has a little compression work done but is still recognizable as sampling)... some nifty circuits and our ears make it "sound" smooth. Granted, you can run this set of samples from the waveform through a few algorithms to compress it, at the loss of some quality.
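
If it helps to see it concretely, here's a rough sketch (just an illustration, not taken from any particular file format) of how a 440Hz tone would be sampled into 16-bit PCM at 44.1kHz:

#include <cmath>
#include <cstdint>
#include <vector>

// Sample one second of a 440Hz sine wave as 16-bit PCM:
// literally "measure the waveform at regular intervals".
std::vector<int16_t> sampleSine()
{
    const int   sampleRate = 44100;  // measurements per second
    const float frequency  = 440.0f; // pitch of the tone
    const float amplitude  = 0.8f;   // stay a bit below full scale

    std::vector<int16_t> samples(sampleRate); // one second's worth
    for (int i = 0; i < sampleRate; i++)
    {
        float t = (float)i / sampleRate; // time of this sample in seconds
        float v = amplitude * std::sin(2.0f * 3.14159265f * frequency * t);
        samples[i] = (int16_t)(v * 32767.0f); // scale to the 16-bit range
    }
    return samples;
}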

-fel

In PCM and ADPCM formats, stereo data is stored in marching steps, L|R|L|R|L|R, through the data stream (the channels are interleaved).
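
For example, a quick sketch (my own helper, not from any particular API) of splitting such a stream back into separate channels:

#include <cstdint>
#include <vector>

// Split an interleaved L|R|L|R|L|R stream into two separate channels.
void deinterleave(const std::vector<int16_t>& stream,
                  std::vector<int16_t>& left,
                  std::vector<int16_t>& right)
{
    for (size_t i = 0; i + 1 < stream.size(); i += 2)
    {
        left.push_back(stream[i]);      // even positions are left
        right.push_back(stream[i + 1]); // odd positions are right
    }
}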

--------------------
-WarMage
...www.wotsit.org will have some links to .wav and PCM decoding

So what is the relation between the waveform and the spectrum? How do I get spectrum data from the waveform, and what exactly is the spectrum? Know any sites?


The relation between the waveform ("spatial domain") and the spectrum ("frequency domain") is a bit... well, not necessarily the easiest thing to grasp. But basically it boils down to this: you can represent arbitrary data ("waveform") with a proper sum of sinusoidal signals ("frequencies"). I won't start proving it, so you just have to take my word for it (or the word of anyone who's done even a bit of signal processing...)

By "spectrum" you probably mean the amplitude spectrum of the data. It basically shows what frequencies are present in the waveform data (just like the signal analyzer in Winamp or on your stereo set).

I won't start lecturing more here; if you are interested, you might want to look up, for example, "Fourier transform" (spatial->frequency) and "inverse Fourier transform" (frequency->spatial).

Be warned, however: signal processing can be... a bit of an elusive topic, and it will probably take you a while to even grasp the basics. It'll be one step towards a larger world, however...
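
If you want to see the claim in symbols anyway (a rough statement, in the same spirit as the code later in this thread): any buffer of N samples f[0..N-1] can be written as

f[i] = sum for k = 0..N/2 of A[k] * cos(2*pi*k*i/N + phase[k])

where each term is one sinusoid; A[k] is its amplitude and phase[k] its phase. Finding those A[k] values is exactly what the Fourier transform does.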

The only thing I can't understand is how, for instance, a single value of the wave height can be transformed into all those different frequencies. Or is it something over time? Measure the different frequencies over time t and each one is put into the spectrum?

You can't get more information out of something than you put into it. If you're doing a Fourier analysis of a signal, the maximum number of frequencies you can resolve is N/2, where N is the length of your time series (the waveform). The reason for N/2 instead of N frequencies is that there are both positive and negative frequencies in a real signal, and they are always identical.

For N = one data point, you only have the magnitude of one frequency (0Hz).

For N = 2, you get the DC value and the magnitude of the frequency at the Nyquist rate (half the sampling rate). You can also say Fs/2, where Fs is the sampling frequency.

For N = 4, you get frequencies 0, Fs/4, -Fs/4 and Fs/2.

Your frequency resolution is going to be Fs/N; that is, you'll get N evenly spaced frequency results along the negative and positive sides of the spectrum. This gives you N/2 real data points.

There is a way to get higher resolution with fewer samples, but it requires using autoregressive modeling & is a bit more complicated than straight Fourier analysis.
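
A trivial sketch (my own, just for illustration) that prints the resolvable non-negative frequencies for a given N and Fs, matching the N = 4 example above:

#include <cstdio>

// For an N-point analysis at sampling rate Fs, the resolvable
// non-negative frequencies are k*Fs/N for k = 0..N/2.
// N = 4, Fs = 44100 prints 0, 11025 (Fs/4) and 22050 (Fs/2).
void printFrequencies(int N, float Fs)
{
    for (int k = 0; k <= N / 2; k++)
        printf("bin %d: %g Hz\n", k, k * Fs / N);
}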

First, ADPCM stands for Adaptive Differential Pulse Code Modulation.
Second, WAV is not intrinsically ADPCM as felisandria suggests. In fact, WAV can use many different formats (as defined in mmreg.h), which is why the WAV header contains a field to identify the format. Usually, WAV files are raw PCM; it's rare to find any other types in practice.


Edited by - merlin9x9 on June 28, 2001 3:23:54 PM

Ok so if I had a format where each byte was a value for the wave, and I wanted to sample 1024 values to get the frequencies, how would I do that? How would I measure all the different frequencies? Can I see code? Can I stop asking questions?

For simplicity, let's normalize your audio samples so that each is a floating point number in the range [-1,1]. Now, you have a buffer f of those (f is widely used to represent spatial, i.e. time domain, data, so I use it here).

float f[1024];

For these samples, you need to do a Discrete Fourier Transform. The mathematical formula can easily be found with a simple search, so I'll just give you a simple implementation.
(Note: I assume here that you are fairly experienced with C++ and its standard library, so you can fill in the remaining details without anyone holding your hand there...)
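
For reference, the sum the code below computes is

F[n] = | sum for i = 0..N-1 of f[i] * ( cos(-2*pi*i*n/N) + j*sin(-2*pi*i*n/N) ) |

with N = 1024 and j the imaginary unit; only the magnitude is kept.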


#include <cmath>
#include <complex>

float F[1024]; // output: amplitude spectrum

for (int n = 0; n < 1024; n++)
{
    std::complex<float> value(0.0f, 0.0f);

    for (int i = 0; i < 1024; i++)
    {
        // multiply sample i by the n-th basis sinusoid, e^(-j*2*pi*i*n/N)
        float angle = (-2 * M_PI * i * n) / 1024.0f;
        value += f[i] * std::complex<float>( cos(angle), sin(angle) );
    }

    F[n] = std::abs( value );
}

Note: The result of the transform is complex; that is, it has both amplitude and phase. Here I just ignored the phase spectrum, because the amplitude spectrum is usually the more interesting one.

Note: This is not the most accurate way to calculate the DFT. It is not the only way to calculate the DFT either, and as you can see, it is quite slow. I picked a simple (and in this case, "accurate enough") DFT formula and applied it, so no bitching about that, please.

After the above is run (assuming I didn't make any mistakes there), the first half of F (indices 0..511) will contain the frequency spectrum of the audio data. Each value in the array will represent a span of (Fs/1024) frequencies (where Fs is your sample rate, for example 44100 for CD sound, thus giving you a resolution of about 43Hz per bin).

The upper half of F (512..1023) will contain basically the same information as the lower half, only mirrored, so most of the time you'll probably just want to ignore it (or not calculate it at all).
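
As a quick usage sketch (same buffers, and a 44.1kHz source assumed): once F is filled in, you could find the loudest frequency like this:

// Scan the lower half of the spectrum for the strongest bin.
int best = 1; // skip bin 0, which is just the DC offset
for (int n = 2; n < 512; n++)
    if (F[n] > F[best])
        best = n;

float Fs = 44100.0f;                 // sample rate of the source
float loudest = best * Fs / 1024.0f; // center frequency of that bin, in Hz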



Edited by - Hway on June 29, 2001 5:41:49 AM

That's interesting, merlin, because my book says "advanced". *shrug* Then again, it's a Microsoft book, and what do they know. "Adaptive differential" does make more sense, in light of the algorithm used to get from PCM to ADPCM.

All the .wav files I've had to deal with have been ADPCM. I have very rarely dealt with pure PCM data, and most of those have been .au files.

-fel

Hway: to get this board to cooperate with your postings, you should enclose code samples in the [ code ] [ /code ] or the [ source ] [ /source ] tags (without spaces).

Ironfroggy: be careful, 8-bit PCM WAV files are represented differently than 16-bit PCM WAV files. 8-bit and 16-bit refer to the size of each sample.

16-bit is a "normal" signed short, so if you read a value of 0, it would be ~0 volts in the waveform, +32767 would be 1 volt in the waveform, etc.

8-bit is an "offset" unsigned char; the values range from 0 to 255, with 128 being about the zero point; in other words, with 8-bit PCM, 0 is -1V and 255 is +1V. I don't know whose stupid idea this was, and I seem to complain about it way too much, but there you have it.
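
A little sketch of normalizing both flavors to floats (my own helpers, assuming the layouts just described):

// 16-bit signed sample -> float in [-1, 1]
float from16bit(short s)
{
    return s / 32768.0f;
}

// 8-bit "offset" unsigned sample -> float in [-1, 1]
float from8bit(unsigned char u)
{
    return (u - 128) / 128.0f;
}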

You know, fel, I've never encountered ADPCM WAV files, except for ones I explicitly made. For example, all of the WAVs that come with Windows are raw PCM, as are most of what you'll find on the Internet. So... ADPCM... that's weird.

Well, when I actually cared what format it was in, I was working for a voiceprint software company. Basically I had to write the libraries to convert between ADPCM .wavs and a few proprietary encrypted/watermarked formats. For games, I mostly just dump it into the mmio and DirectSound stuff and let that deal with it however it wants.

-fel

I just had to chime in because I've worked on something similar recently.

The majority of the wav files I've worked with have also been raw PCM (44kHz 16-bit L/R packed shorts), which is exactly how you envisioned them, ironfroggy. ADPCM saves the offset from the previous point (using fewer bits). The idea is that the wave will never vary by more than (say) 12 bits' worth between points, so by using this technique you get 25% lossless compression. The advantage is that those huge wav files are now smaller, but the code's a little more complicated because you now have packed 12-bit samples instead of (reasonably) nice 16-bit samples. There's a pile of ADPCM codecs using different amounts of bits and methods.
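
Here's a bare-bones sketch of the delta idea (an illustration only; real ADPCM codecs also adapt the step size and bit-pack the deltas instead of storing full shorts):

#include <cstdint>
#include <vector>

// Store each sample as its difference from the previous sample.
// Neighboring samples are usually close, so the deltas fit in fewer bits.
std::vector<int16_t> deltaEncode(const std::vector<int16_t>& samples)
{
    std::vector<int16_t> deltas;
    int16_t previous = 0;
    for (size_t i = 0; i < samples.size(); i++)
    {
        deltas.push_back((int16_t)(samples[i] - previous));
        previous = samples[i];
    }
    return deltas;
}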

I submitted a snippet to GameDev on FFTing; if you're interested in making a spectrum analyzer (like the one in WinAmp) you need to do a bit more than just FFT. You can also download butt-quick FFTs and other DSP functions from the Intel & AMD websites.

If you take 1024 waveform points and run them through an APS (Auto Power Spectrum) you get 512 spectrum points out (aka a 'spectrum trace' or just a 'trace'). Each of those points is a relative measurement indicating how much power (how much of the wave's energy) the corresponding frequencies contribute. Each spectrum point (aka 'bin') represents a range of frequencies. That range is determined by the sampling frequency of the waveform and the number of points used in the APS.

n = waveform points
f = sampling rate
bin size = f / n

So a 1024 point APS from a 44.1kHz wave will yield a bin size of 43.06640625Hz, i.e. the first bin goes from 0-43Hz, the second bin from 43-86Hz, etc... To get finer bin sizes you need to sample at a higher rate or take the APS with more points.

When you display the data you'll want to display it on a logarithmic scale, like in dB = 10*log10(APS) (so long as the device you're using is scaled correctly, i.e. calibrated).

If you display the spectrum characteristics of a 128kbps mp3, it will drop off sharply at 16kHz; that's a good indication that you've got it working.
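
A rough sketch of that display step (reusing the F[] magnitudes from the DFT code earlier in the thread; the 1e-12f guard is just to avoid taking the log of zero):

#include <cmath>

// Square the 512 usable magnitude bins to get power, then convert to dB.
void toDecibels(const float magnitudes[512], float trace[512])
{
    for (int n = 0; n < 512; n++)
    {
        float power = magnitudes[n] * magnitudes[n]; // auto power spectrum
        trace[n] = 10.0f * log10f(power + 1e-12f);   // decibel scale
    }
}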


Magmai Kai Holmlor
- The disgruntled & disillusioned


Edited by - Magmai Kai Holmlor on June 29, 2001 8:23:32 PM

I don't know all that C++ template stuff, so I am having a little trouble understanding that. Could anyone redo that without the templates? Possibly even just in C, because I would like to optimize in assembly and I find it much easier to go C -> Assembly than C++ -> Assembly.

I also have some more questions. Am I to understand that there are two forms of spectrums? For one it's the different wave height ranges and for the other it's the frequency ranges? How exactly are they measured? Not code, actual explanations?

For the wave height one, I would guess that the actual wave data is just fed into that statistically? But how do you find a full wave (peak to trough to peak) from the data to get the frequency spectrum?

quote:
Original post by ironfroggy
I also have some more questions. Am I to understand that there are two forms of spectrums? For one it's the different wave height ranges and for the other it's the frequency ranges? How exactly are they measured? Not code, actual explanations?


There are two domains that are commonly used when discussing signals: time domain and frequency domain.

To visualize a signal in the time domain, you draw the linear amplitude of the signal vs. linear time. This is called a waveform display, or just a wave display. For instance, if you plotted the value of each sample on the y axis vs. the sample number on the x axis, you would have a waveform display.

The other domain is the frequency domain, where you talk about the spectral content of the signal. One method to get frequency domain data is to apply an FFT over some amount of the time domain data. What you get out is a complex (real & imaginary, magnitude and phase) representation of the data spectrum from the lowest frequencies to the highest frequencies over that period of time (also called a "window"). Generally, a "spectrum" graph is a graph of the magnitude of the spectrum data vs. frequency. The magnitude is usually presented in decibels, so you need to take the log of the data before displaying it in order for it to look right. For audio, the phase is usually ignored since the human ear is deaf to almost all forms of phase distortion.

As for how the spectrum is calculated, it's easiest to look at the DFT. If you multiply some signal that is a combination of sinusoids by a single sinusoid and sum the result over the window, it filters out all sinusoids except the one by which you're multiplying. The reason for this is that
A*sin(n*f) * B*sin(m*f) = A*B*sin^2(n*f) if n = m, which sums to something nonzero, while for n != m the product sums to 0 over a whole number of periods.

The DFT filters your signal at each multiple of some small frequency resolution, which gives you a representation of the signal as a sum of sinusoids. The FFT exploits the structure within discrete data sets whose length is a power of 2 to significantly reduce the number of computations needed, while still giving the same answer.
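
And since a template-free version was asked for earlier: here's a rough C-style sketch of the same DFT from earlier in the thread (untested, same normalized 1024-sample f[] buffer, with the complex accumulator split into two plain floats):

#include <math.h>

#define N 1024

float f[N];     /* input samples, normalized to [-1, 1] */
float F[N / 2]; /* output amplitude spectrum */

void dft(void)
{
    for (int n = 0; n < N / 2; n++)
    {
        float re = 0.0f, im = 0.0f;
        for (int i = 0; i < N; i++)
        {
            float angle = -2.0f * 3.14159265f * i * n / N;
            re += f[i] * cosf(angle); /* real part of the product */
            im += f[i] * sinf(angle); /* imaginary part */
        }
        F[n] = sqrtf(re * re + im * im); /* magnitude, like std::abs */
    }
}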

These are pretty advanced mathematical topics, so if you want to learn more about this you should probably do some independent research.
