DrunkMonkey25

FFT data does not look correct...


Hello everyone!

I'm not quite sure where to begin... this might be a long one.

For the past few weeks I've been working on using sound in my Tetris clone to make gameplay more difficult. I've created a DirectShow filter that passes sound samples to my game, which performs a DFT on each block of samples (using FFTW) and displays the result as a sort of oscilloscope. Here is a short video of my results: [no longer active] http://www.youtube.c...h?v=3DfQhAfn3Fs
The ring on startup is a 100Hz test frequency (more on that later).

To me, that does not look correct. At the beginning of the song I should be seeing most of the action on the left side of the spectrum, but that isn't the case. You probably need to see some source code, so let's start from the top.

My DShow filter is derived from CTransInPlaceFilter, and all my ::Transform() function does is copy data to an array of floats:

Edit 4/4/11: updated snippet
BYTE *data;
pSample->GetPointer(&data);

int j = 0;
for (int i = 0; i != MY_SAMPLE_SIZE; i += 4) //MY_SAMPLE_SIZE = 4096
{
//left channel
signed short temp = 0;
temp |= data[i + 1];
temp = temp << 8;
temp |= data[i];
//right channel
signed short temp1 = 0;
temp1 |= data[i + 3];
temp1 = temp1 << 8;
temp1 |= data[i + 2];

m_sample[j] = float(temp + temp1);
m_sample[j] /= scale; //scale = pow(2.0f, 15);
++j;


}

At any time my app can retrieve m_sample from the filter, and it does, every frame. I have tried different time intervals with the same results.

Setup of FFTW. I'm not sure if this is the right plan, but I'm no math whiz, and deciphering the FFTW documentation always sends me on a long tangent of research:

m_fftPlan = fftwf_plan_dft_r2c_1d(MY_REAL_SIZE, m_audioSample, m_fftOut, FFTW_MEASURE);


Hanning window generation. Without this there is much more "noise" in my o-scope; however, the test frequency then produces solid spikes in the wave (video on request), as opposed to the "ocean" that is displayed.

//MY_REAL_SIZE = MY_SAMPLE_SIZE / 4 (2 channels, 2 bytes apiece consolidated into one value)
for (int i = 0; i != MY_REAL_SIZE; ++i)
m_fftWindow[i] = 0.5f * (1 - cos((2 * D3DX_PI * i) / (MY_REAL_SIZE - 1)));

//application of hanning window (never saw any example of this, I just winged it)
for (int i = 0; i != MY_REAL_SIZE; ++i)
m_audioSample[i] *= m_fftWindow[i];


FFT data retrieval:

//pos[0] is real; pos[1] is imaginary
return sqrt(m_fftOut[pos][0]*m_fftOut[pos][0] + m_fftOut[pos][1]*m_fftOut[pos][1]);


rendering of o-scope:

//saw this in a stackoverflow.com post
//supposed to convert to decibels
... log10(g_mp3Player->GetFFTData(i)) * 20.0f;


Well, I think that is all that is pertinent to my problem. If you need to see more, just let me know. I don't know what I'm doing wrong.

Thank you very much for taking the time to get through all of that. If you have any criticism, lay it on me! And if you want, I can upload the executable so you can watch in real time, although I DO NOT guarantee that it will work on your machine. For now, the DShow sample consolidation is completely hardcoded, so if your particular setup does not pass at least 4096 samples, or passes mono or 8-bit samples, something bad will happen.

Once again, thank you for your time!

Edit:
After some tweaking and using a different test frequency, I've discovered that small, fast moving wave spikes emanate from the origin and bounce back and forth in my representation of the spectrum. I was thinking there might be a problem with my sample acquisition in the DirectShow filter.

If anyone has any input or can point me in the right direction, please do so!

From the Wild Guess Department: you're treating the buffer data as unsigned. Some sound data formats are signed, i.e. the data doesn't range from 0 to 65535 but from -32768 to +32767, while unsigned formats have a center offset (bias). As a debugging step, you might plot the wave data itself (and/or m_sample) to see if you're interpreting it correctly.

You might also try using just one channel. I'm not sure that analysing the sum of the right and left channels makes sense.
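For example, something like this (untested) decodes one left-channel sample both ways, assuming 16-bit little-endian stereo PCM; plot both, and the one that looks like a waveform centered on zero is the correct interpretation:

//untested: decode the left sample of frame i both ways (16-bit little-endian assumed)
signed short asSigned = (signed short)(data[i] | (data[i + 1] << 8));
unsigned short asUnsigned = (unsigned short)(data[i] | (data[i + 1] << 8));
float fromSigned = (float)asSigned / 32768.0f; //signed data is already centered on zero
float fromUnsigned = (float)asUnsigned / 65536.0f - 0.5f; //unsigned data needs the bias removed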

I've experimented a bit with using signed data and I don't really see any appreciable difference. I've also changed how the data is plotted on my scope, but I'm still seeing constant activity on the left side of the spectrum, especially at index zero (which should be 0 Hz, right? How is that possible?). If anyone can shed some light on the situation, I'd appreciate it!

*bump*

Here is another video of my progress. The sound is low quality and a little strange (and annoying) in the beginning, but please bear with me. Also, my laptop speakers suck, which is why you can't hear anything during the low frequency portion of the test.
http://www.youtube.c...h?v=vqvlIQNTUnM

Bombs away!

A couple questions:
1. Looking at the way the spike moves through the spectrum, it seems that the frequencies contained in each bin are not evenly divided. Is that normal? Why do you think that is?

2. How could I use what I have to get specific sound data? What I mean is, how should I go about getting bass data, mid-range data and so on? Would it be a good idea to divide my FFT data into 8 (non-linear) sections and sum each chunk into one value, as in the sketch below?
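Something like this (completely untested; the sample rate and band edges are numbers I made up) is what I'm picturing:

//untested idea: sum non-linear chunks of bins into 8 band values
//bin k covers roughly k * 44100 / MY_REAL_SIZE Hz, assuming a 44100 Hz stream
int bandEdges[9] = { 0, 2, 4, 8, 16, 32, 64, 128, 256 }; //bin indices, roughly doubling per band
float bands[8];
for (int b = 0; b != 8; ++b)
{
    bands[b] = 0.0f;
    for (int k = bandEdges[b]; k != bandEdges[b + 1]; ++k)
        bands[b] += g_mp3Player->GetFFTData(k);
}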

I'm feeling a little lost and completely unconfident at the moment, and I would appreciate any helpful advice/links/whatever you can give me.

Thanks for your time!

I've made significant progress on my problem... Check it out here.

Not too shabby, eh? The white line is my FFT data; the green line at the top is the sum of sections of that data. On the left are the sum of each section since the song began and the average for that section (updated 16 times per second). If the current sample is above the average, "PEAK" is displayed below that. It's crude, but I think it's usable.
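Roughly, the per-section logic looks like this (simplified; the names here are placeholders, not my actual code):

//simplified sketch of the per-section peak logic (names are placeholders)
float current = SumSection(s); //sum of the FFT bins in section s for this update
m_total[s] += current;         //running total since the song began
++m_updateCount;               //this runs 16 times per second
float average = m_total[s] / m_updateCount;
if (current > average)
    DrawPeakLabel(s);          //"PEAK" is displayed below the averages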

Thanks to several members of StackOverflow for the helping hand, and to everyone here who took a look!

The sound and the spectrum seem to have nothing in common, to be honest. The shape and amplitudes of the spectrum are roughly the same throughout the song, yet there are significant changes in the song that aren't reflected in the spectrum. Some examples:

The spectrum starts at 0:03, but the song doesn't start until 0:05; silence should not produce any amplitude in the frequency domain.
During the silence in the song from 0:12 to 0:14, the spectrum keeps showing the same shape as in the ten-second intro.
The guitar at 0:43 introduces no amplitude changes to the spectrum; there should be significant activity in the higher frequencies.

I would say something is really broken with your spectrum, since what is shown really doesn't correlate with the sound you hear.

...well, it looks much better in real time. CamStudio was recording at around 10 FPS, whereas my app is running at 300+.

Despite that, you are correct. It certainly has its inconsistencies. In the frequency domain, I plot peaks as follows (edited for appearance):

float x = log10(real^2 + imaginary^2);
if (x < 0.0f)
return 0.0f;
return x;

If I remove the logarithm, there is a little more activity, and it corresponds with the intensity of the sound a little better. And in the absence of a window function, there's shit everywhere.

As far as seeing activity during silence, I think that's from spectral leakage. I have a test frequency that sweeps from 20kHz to 20Hz (check my stackoverflow post for a link) in which the leakage is highly visible. I know that I need to perform low-pass or band-pass filtering, but I can't find any good examples of doing that in the time domain. In the frequency domain it's easy, but then I'm not removing the problem, just making it look better around the edges. If you have any suggestions, I'd greatly appreciate it.
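The closest thing I've pieced together so far is a windowed-sinc FIR applied directly to the samples, something like this (untested; the cutoff and tap count are placeholders I picked arbitrarily, and m_filtered is a hypothetical second buffer):

//untested sketch: time-domain FIR low-pass via a Hamming-windowed sinc kernel
const int TAPS = 63;    //placeholder tap count
const float fc = 0.1f;  //placeholder cutoff, as a fraction of the sample rate
float kernel[TAPS];
float sum = 0.0f;
for (int n = 0; n != TAPS; ++n)
{
    int m = n - TAPS / 2;
    float sinc = (m == 0) ? 2.0f * fc : sinf(2.0f * D3DX_PI * fc * m) / (D3DX_PI * m);
    kernel[n] = sinc * (0.54f - 0.46f * cosf((2.0f * D3DX_PI * n) / (TAPS - 1)));
    sum += kernel[n];
}
for (int n = 0; n != TAPS; ++n)
    kernel[n] /= sum; //normalize for unity gain at DC

//convolve the kernel with the samples (edges clamped for simplicity)
for (int i = 0; i != MY_REAL_SIZE; ++i)
{
    float acc = 0.0f;
    for (int k = 0; k != TAPS; ++k)
        acc += m_audioSample[(i - k < 0) ? 0 : i - k] * kernel[k];
    m_filtered[i] = acc;
}

Does that look like a sane direction?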

Spectral leakage lives in the frequency domain, not the time domain, and it does not produce significant frequency activity when there are long periods of silence. You can have maximum spectral leakage (basically a tone hitting between two frequency bins) and still a very rapid time response. Unless you do long-time averaging, the spectrum should follow the amplitude in time rather quickly, despite the leakage; after all, the DFT of an all-zero block is exactly zero, leakage or no leakage. But the amplitudes of your spectrum act nowhere near what the song sounds like.

I think you should post a complete example to show your current code. The code above is mostly fragments (somewhat relevant fragments, admittedly), and now that you've made adjustments we should see the current state of the code.

Ok, first up is my DirectShow filter. The major change from what I was using is that instead of trying to merge the left and right channels into one value, I just store them next to each other in my sample. I figured that was where I was probably going horribly wrong, so I simplified it.

HRESULT CDrunkenFilter::Transform(IMediaSample *pSample)
{
CheckPointer(pSample, E_POINTER);

//for now, I do not check my actual data length; I know that it's around 4099 bytes
//or something like that.
//if I ever release this I'll make it more friendly with different setups
//(I just want it to work for now)
BYTE *data;
pSample->GetPointer(&data);

int j = 0;
for (int i = 0; i != MY_SAMPLE_SIZE; i += 4) //MY_SAMPLE_SIZE = 4096
{
//left channel
signed short temp = 0;
temp |= data[i + 1]; //retrieve high order byte
temp = temp << 8; //set to correct position
temp |= data[i]; //add low order byte
//right channel
signed short temp1 = 0;
temp1 |= data[i + 3];
temp1 = temp1 << 8;
temp1 |= data[i + 2];

m_sample[j] = (float)temp;
m_sample[j] /= scale; //scale = pow(2.0f, 15);
++j;
m_sample[j] = (float)temp1;
m_sample[j] /= scale;
++j;
}

return NOERROR;
}

//application calls this to retrieve sample
//it does so every frame (probably erroneously, I should check the times
//on the samples in ::Transform, there might be some overlap)
STDMETHODIMP CDrunkenFilter::GetSample(float *ptr)
{
memcpy(ptr, m_sample, MY_REAL_SIZE * sizeof(float)); //MY_REAL_SIZE = (MY_SAMPLE_SIZE / 2)
return NOERROR;
}


The rest of the code in this post comes from the host app.

Various sound sample related things:

//window generation, done on initialization
//blackman-harris
for (int i = 0; i != MY_REAL_SIZE; ++i)
m_fftWindow[i] = 0.35875f -
(0.48829f * cos((2 * D3DX_PI * i) / (MY_REAL_SIZE - 1))) +
(0.14128f * cos((4 * D3DX_PI * i) / (MY_REAL_SIZE - 1))) -
(0.01168f * cos((6 * D3DX_PI * i) / (MY_REAL_SIZE - 1)));

...

//fftw stuff, also done on init
//the output from the fft plan I have selected is N/2 + 1 elements
//in case you were curious about the last bit on the next line
m_fftOut = (fftwf_complex*)fftwf_malloc(sizeof(fftwf_complex) * (MY_REAL_SIZE/2 + 1));
m_fftPlan = fftwf_plan_dft_r2c_1d(MY_REAL_SIZE, m_audioSample, m_fftOut, FFTW_MEASURE);
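//(side note: FFTW_MEASURE is allowed to overwrite both arrays while it benchmarks,
//which is one more reason this plan gets created before any real samples are loaded)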

...

//sample retrieval, window application and fft
// performed every frame
//m_drunk is the interface to my dshow filter
void cMP3Player::DoFFT(float dt)
{
m_drunk->GetSample(m_audioSample);
for (int i = 0; i != MY_REAL_SIZE; ++i)
m_audioSample[i] *= m_fftWindow[i];
fftwf_execute(m_fftPlan);
}


Scope rendering

//FFT'd data retrieval
//commenting out the negative value check
//made the largest difference in the visualization
float cMP3Player::GetFFTData(int pos)
{
float x = log10(m_fftOut[pos][0]*m_fftOut[pos][0] + m_fftOut[pos][1]*m_fftOut[pos][1]);
//if (x < 0.0f)
//return 0.0f;
return x * 25;
}
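One thing I should probably guard against here is log10 of a completely silent bin returning -inf; something like this (untested, and the floor value is arbitrary) would cap it:

//untested tweak: floor the power so log10 never sees zero on silent bins
float power = m_fftOut[pos][0]*m_fftOut[pos][0] + m_fftOut[pos][1]*m_fftOut[pos][1];
if (power < 1e-12f)
    power = 1e-12f; //arbitrary floor, roughly -120 dB
return log10(power) * 25;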

...

//set vertices for 'o-scope'
//nothing changed, just thought it might belong
//GetRawData() retrieves the raw sound sample
//looks (roughly) like a sine wave when rendered
for (int i = 0; i != MY_REAL_SIZE/2 + 1; ++i)
{
fftLinePoints[i].y = g_graphics->GetBackBufferHeight() * 0.5f;
fftLinePoints[i].y -= g_mp3Player->GetFFTData(i);
//fftLinePoints[i].y -= g_mp3Player->GetRawData(i);
}


Results:
Test Frequency 20kHz to 20Hz (the 'phasing' at the start is not present in real time; I guess my laptop microphone doesn't like high frequencies)
FFT WIP (I change the song around the 1 minute mark, which has a gradual fade in)
No Logarithm (I remove the log10 call and just plot the sum of the squares; much less activity)
Edit: Oh yeah, the sound is terrible in every video. Please bear with me.

The main difference here (as you can see from previous videos) comes from using separate samples for the right and left channels. I didn't think I'd get a result like that, although it seems to respond to the music a little better.

Well, there it is. What do you think?

From what I can see from the comments, it looks like you've just removed the clamping of the log values to zero in this post, is that correct? It does look much more like it now, but there is still an observation that bothers me. You calculate a single-sided DFT and appear to plot a single-sided DFT, but the output looks like a double-sided DFT. However, it is not a perfectly symmetric double-sided DFT, which is what you should get if it really were a double-sided DFT.

I think there is something wrong with your conversion from the raw sample data to your float buffer. I have experienced similar problems before on many DSPs, with similar observations. Are you sure that the sample data you get is signed, and not unsigned? Try making the temporary variables unsigned short and offsetting the zero level. In this case, scale should probably be 2^16, not 2^15, since the unsigned short is likely 16 bits with no sign.

unsigned short *data;
pSample->GetPointer((BYTE **)&data);

for (int i = 0; i != MY_REAL_SIZE; ++i)
{
m_sample[i] = (float)data[i] / scale - 0.5f; //scale = pow(2.0f, 16)
}

I even condensed your code a bit, although it's untested. The idea of an unsigned variable with scaling and offsetting is shown, though, so that should get you started even if this code is not entirely correct. But verify, at least twice, the format of the raw sample data.

One more thing to keep in mind: from your comments, it appears that the stream is stereo. That means your float sample buffer now contains interleaved left and right channels, alternating on a per-sample basis. This is absolutely not desired. Either sum the channels as you initially did, or stick to just one of the channels, as sketched below.
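For example, keeping only the left channel out of the interleaved buffer could look like this (untested, and note the FFT length would then be MY_REAL_SIZE / 2):

//untested: step through the interleaved shorts two at a time, keeping only the left channel
for (int j = 0; j != MY_REAL_SIZE / 2; ++j)
{
m_sample[j] = (float)data[2 * j] / scale - 0.5f;
}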
