raw PCM data.

Started by
10 comments, last by Wilhelm van Huyssteen 13 years, 7 months ago
Hi

I'm not 100% sure this is the right forum for this. I considered posting it in "Sound and Music", but these are technical questions.

I've just started playing with sound and OpenAL, so my understanding is still limited. I have an array of bytes representing an audio stream. I use OpenAL to play it, but I want to do visualizations by processing the byte array as the music plays. The stream's sampling rate is 48000 Hz and it's 16-bit stereo (I'm assuming that means two 8-bit channels). How does PCM represent this? Would the first byte in the stream be the value of the first sample for the "right" channel and the second byte be the first sample for the "left" channel, and so forth? I'm just guessing here.

Second question: I want stereo playback, but I want my visualizations to use the "average" (if that's the right term) of the two channels. Can I combine the two channels into one simply by adding their sample values together and dividing by 2?

Third question: if I take a slice of the audio stream, how would I determine its "average frequency"? I have found information about "FFT methods" (I have no idea what FFT really is) but nothing specific enough for what I need. Any explanations or pointers to good resources appreciated.

Thanks in advance!
I just had the joy of writing an audio file converter for work a couple of weeks ago...

Quote:Original post by EternityZA
The stream's sampling rate is 48000 Hz and it's 16-bit stereo (I'm assuming that means two 8-bit channels). How does PCM represent this? Would the first byte in the stream be the value of the first sample for the "right" channel and the second byte be the first sample for the "left" channel, and so forth? I'm just guessing here.
Each channel is 16-bit, and the channels are interleaved.
So the first 2 bytes represent the left channel, the next 2 bytes the right, and so on. That means that your PCM data will be a multiple of 4 bytes (and if it's not, something is wrong) :)
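To make the layout concrete, here is a rough sketch that splits such a byte array into left and right samples. It assumes the samples are signed and little-endian, which is the usual case on PC hardware but is not stated in the thread; `deinterleave_stereo16` is just a made-up helper name.

```python
import struct

def deinterleave_stereo16(pcm: bytes):
    """Split interleaved 16-bit stereo PCM into (left, right) sample lists."""
    if len(pcm) % 4 != 0:
        raise ValueError("16-bit stereo PCM length must be a multiple of 4 bytes")
    count = len(pcm) // 2  # total number of 16-bit values (both channels)
    # '<' = little-endian (assumed), 'h' = signed 16-bit integer
    values = struct.unpack("<%dh" % count, pcm)
    left = list(values[0::2])   # even positions: left channel
    right = list(values[1::2])  # odd positions: right channel
    return left, right

# Two frames: (L=1, R=-1) and (L=256, R=32767)
frames = bytes([0x01, 0x00, 0xFF, 0xFF, 0x00, 0x01, 0xFF, 0x7F])
left, right = deinterleave_stereo16(frames)
print(left, right)  # [1, 256] [-1, 32767]
```

If the data turns out to be big-endian, changing `<` to `>` in the format string is the only change needed.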

Quote:Original post by EternityZA
Second question: I want stereo playback, but I want my visualizations to use the "average" (if that's the right term) of the two channels. Can I combine the two channels into one simply by adding their sample values together and dividing by 2?
Yup.
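A minimal sketch of that averaging, assuming the two channels are already decoded into equal-length lists of numbers (`mix_to_mono` is a hypothetical name). One thing to watch in languages with fixed-width integers such as Java or C: do the addition in a wider type than 16 bits, because the sum of two 16-bit samples can overflow before the division.

```python
def mix_to_mono(left, right):
    """Average two equal-length channels sample by sample."""
    if len(left) != len(right):
        raise ValueError("channels must have the same number of samples")
    # Python integers do not overflow, so the sum is safe here
    return [(l + r) / 2 for l, r in zip(left, right)]

mono = mix_to_mono([32767, -100], [32767, 100])
print(mono)  # [32767.0, 0.0]
```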

Quote:Original post by EternityZA
Third question: if I take a slice of the audio stream, how would I determine its "average frequency"? I have found information about "FFT methods" (I have no idea what FFT really is) but nothing specific enough for what I need. Any explanations or pointers to good resources appreciated.
I'm not really sure about this, but FFT is probably what you want. It converts a stream of data into a series of waveforms with various frequencies. Then you can average the frequencies and that's your output (I think, not entirely sure).
Quote:Third question: if I take a slice of the audio stream, how would I determine its "average frequency"? I have found information about "FFT methods" (I have no idea what FFT really is) but nothing specific enough for what I need. Any explanations or pointers to good resources appreciated.


Here's the most straightforward thing:
1 - Take FFT of a window of data.
2 - Compute magnitude of FFT. This gives you the power spectrum.
3 - Whichever frequency bin has the largest power is the one you report.

Some people have found that this is insufficient for, say, determining what note is being played, and use the cepstrum magnitude instead.

Another related way to do this, which is efficient if you have a fairly small set of tones you want to detect, is using a filter bank. This is also nice for real-time stuff since, while FFTs are "batch" operations that work on windows of data, filter banks are more compatible with a "streaming" view of data. If you've taken physics, you're familiar with the concept of resonant frequency; more-or-less, the idea here is to have a bunch of damped oscillators with different frequencies and you see which one is vibrating the most (though you use systems with more precise frequency response characteristics than simple harmonic oscillators).
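One common way to build a small filter bank like this is a set of Goertzel filters, one per tone of interest. That is a swapped-in technique chosen for illustration, not necessarily what was meant above. Each filter is a second-order resonator that is fed one sample at a time and reports the power at its own frequency. The function names and the test tone below are assumptions for the sketch.

```python
import math

def goertzel_power(samples, sample_rate, target_hz):
    """Power of `samples` at the DFT bin nearest to `target_hz`."""
    n = len(samples)
    k = round(n * target_hz / sample_rate)  # nearest DFT bin index
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = 0.0
    s_prev2 = 0.0
    for x in samples:  # one update per sample, so it suits streaming input
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    return s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2

def strongest_tone(samples, sample_rate, candidates_hz):
    """Return the candidate frequency whose filter responds the most."""
    return max(candidates_hz,
               key=lambda f: goertzel_power(samples, sample_rate, f))

# Hypothetical test signal: a 1500 Hz sine sampled at 48000 Hz
rate = 48000
tone = [math.sin(2.0 * math.pi * 1500 * i / rate) for i in range(1024)]
strongest = strongest_tone(tone, rate, [750, 1500, 3000])
print(strongest)  # 1500
```

This only pays off when the list of candidate tones is short; for a full spectrum the FFT is cheaper.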
Thanks for clearing everything up for me.
I'm going with FFT, and after googling a bit I think I mostly understand what I need to do. I just can't figure out how to compute the magnitude of the FFT.

Among others I read through this page:
http://wikijava.org/wiki/The_Fast_Fourier_Transform_in_Java_(part_1)

For now I want to copy that method and use it exactly as it is. I've read somewhere else that for what I want to achieve I need to make the array of imaginary numbers all zeros. So if I want to process 1024 samples I'll pass the samples in the first array and have 1024 zeros in the other one (is this correct? I'm not sure).

Then I need to calculate the magnitude of this? How do I do that?

[Edited by - EternityZA on August 26, 2010 3:12:36 AM]
Normally, the input to an FFT is 2^N complex numbers. In your case, you insert 1024 real-valued samples, so the assumption will be that the imaginary part is 0. Depending on your FFT library, you may have to convert your values to complex values before using the FFT.
The output of the FFT is always 2^N complex numbers, so if you want to find the largest bin, you need to take the absolute value of all complex numbers.
|a+bi| = sqrt((a+bi)*(a-bi)) = sqrt(a*a + b*b)
BUT since you are only interested in which of the bins is the largest, you don't need to do the sqrt operation, since the bin containing the largest value is the same before and after the sqrt operation.

Ok, now you know which bin is the largest, so now you need to figure out what frequency it corresponds to.

The first thing is to get the resolution, which is sampling frequency/number of samples, which for you is 48000/1024 = 46.875Hz.

If you are able to plot the output you will notice that, for real-valued input, the second half (samples 513-1023, if index starts at 0) is a mirrored version of the first half (samples 511 down to 1).
Sample 0 corresponds to frequency 0, 1 is 46.875Hz, 2 is 93.75Hz ... and 512 is 24000Hz.
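For reference, with the usual FFT output ordering (bin 0 is 0 Hz), bin k of an N-point FFT sits at k * sample rate / N, and for real input bin N - k carries the same magnitude as bin k. A tiny sketch of that mapping, using the 48000 Hz and 1024 samples from the original question (the function names are made up):

```python
def bin_frequency(k, sample_rate, n):
    """Centre frequency in Hz of bin k in an n-point FFT."""
    return k * sample_rate / n

def mirror_bin(k, n):
    """Index of the bin that mirrors bin k for real-valued input (0 < k < n)."""
    return n - k

print(bin_frequency(1, 48000, 1024))    # 46.875
print(bin_frequency(512, 48000, 1024))  # 24000.0
print(mirror_bin(511, 1024))            # 513
```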


You can feed PCM data into an FFT once the bytes are decoded into sample values. Plain 16-bit PCM like yours is linear; only companded formats (such as μ-law or A-law) are logarithmic and need converting to linear first.

Averaging the two signals is probably fine for you.
Sorry, but this is a bit over my head. I don't get your explanation at all.

Emergent mapped out the steps I need to take.

Quote:Original post by Emergent
Here's the most straightforward thing:
1 - Take FFT of a window of data.
2 - Compute magnitude of FFT. This gives you the power spectrum.
3 - Whichever frequency bin has the largest power is the one you report.


I need a breakdown of these steps.

Input: a 1-dimensional array containing 1024 samples
Output: 1 float value representing the average frequency of the input.

Please bear with me :D

Thanks for the effort.


1 - Take FFT of a window of data.
input: 1024 complex valued data.
output: 1024 complex valued data.

2 - Compute magnitude of FFT. This gives you the power spectrum.
input: 1024 complex valued data. (i.e. output step 1)
output: 1024 real (positive) valued data
Algorithm: real(input) * real(input) + imag(input) * imag(input)


3 - Find maximum bin
input: 1024 real (positive) valued data (i.e. output step 2)
output: 1 integer
Algorithm: index the input from 0-1023.
Ignore the values with index 513-1023 (for real input they mirror indices 1-511).
Find maximum value of the indices 0-512 (skip index 0 if you don't care about the DC offset).
Report the index. (i.e. a value between 0 and 512)

4 - Convert index to frequency
input: index from step 3.
output: frequency in Hz
Algorithm: input*48000/1024
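Putting the four steps together, here is a rough, self-contained sketch. It uses a plain recursive radix-2 FFT written out for illustration (a real project would normally use a library), assumes the standard output ordering where bin 0 is 0 Hz, and skips bin 0 when searching so a DC offset cannot win. All names and the test tone are assumptions.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a non-zero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest non-DC bin of a real signal."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least 2 samples")
    # Step 1: FFT, with the imaginary parts of the input set to 0
    spectrum = fft([complex(s, 0.0) for s in samples])
    # Step 2: squared magnitude (no sqrt needed just to find the peak)
    power = [c.real * c.real + c.imag * c.imag for c in spectrum]
    # Step 3: largest bin among 1..n/2 (bin 0 skipped, upper half is a mirror)
    peak = max(range(1, n // 2 + 1), key=lambda k: power[k])
    # Step 4: convert the bin index to Hz
    return peak * sample_rate / n

# Hypothetical test signal: a 1500 Hz sine sampled at 48000 Hz
rate = 48000
tone = [math.sin(2.0 * math.pi * 1500 * i / rate) for i in range(1024)]
result = dominant_frequency(tone, rate)
print(result)  # 1500.0
```

Note the result is the frequency of the strongest bin, so it can only land on multiples of sample rate / N (46.875 Hz here).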

Thanks a lot. But just to be clear: the "input: 1024 complex valued data" that you refer to in step 1, would that be 2 parallel 1024-length arrays, with the real one containing my sample data and the other one containing zeros? Or am I still missing something?
Quote:Original post by EternityZA
Thanks a lot. But just to be clear: the "input: 1024 complex valued data" that you refer to in step 1, would that be 2 parallel 1024-length arrays, with the real one containing my sample data and the other one containing zeros? Or am I still missing something?


That would depend on your FFT library. Once you have selected one, look at its documentation to see what type of data it wants.

edit: But yes. Somehow you will set the imaginary part to 0, whether it be a Complex class or two arrays (one representing the real values, i.e. your data, the other the imaginary values with all 0s).
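Both layouts carry the same information. A small sketch of each, with made-up sample values; which one you need depends entirely on the FFT library you pick:

```python
samples = [0.0, 1.0, 0.0, -1.0]  # hypothetical decoded audio samples

# Layout 1: two parallel arrays, real parts and imaginary parts
real = list(samples)
imag = [0.0] * len(samples)

# Layout 2: one array of complex numbers with zero imaginary part
as_complex = [complex(s, 0.0) for s in samples]

# The two layouts describe exactly the same input
same = all(c.real == r and c.imag == i
           for c, r, i in zip(as_complex, real, imag))
print(same)  # True
```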

This topic is closed to new replies.
