FFT DSP Signal processing?

Started by
23 comments, last by Thread 20 years ago
Hello, who can help me with C/C++ source code for FFT, DSP, or other filter functions for analyzing analog audio waveforms? Thanks
Cogito Ergo Sum
quote:Original post by Thread
Hello, who can help me with C/C++ source code for FFT, DSP, or other filter functions for analyzing analog audio waveforms?


Can you be more specific about what you need to do? There are too many signal processing and time-series techniques to list here.

-Predictor
http://will.dwinnell.com



Hi Predictor,
To be more specific about what I want to do:
I am working on an application that analyzes the analog wave shape (speech, not music), for example the waveform of "Hello".
The samples are unsigned bytes: 128 is silence, 0 is the maximum low (negative) swing, and 255 is the maximum high (positive) swing, so a sample value of zero is the wave at its maximum low amplitude.
The total energy is the low and high parts added together: low is 128-45 and high is 235-128. You cannot add 45 and 235 directly, since that is bigger than the 8-bit sample maximum of 255. The energy is (235-128) + (128-45) = 107 + 83 = 190. This value must open the mouth (lips). Now I am working on the frequency of the wave: how many waves there are in a second.
I want the energy of the analog waveform.
This energy must drive the mouth of a 2D/3D character on the screen and, in the future, a servo in a real 3D character.
How many samples of the analog sound do I need in order to translate the wave (frequency) into opening the mouth?
How should I filter this sound[n] data? FFT, DSP, or something else?
Now I take 50 Hz samples: 22050 Hz / 50 Hz = 441 samples.
Maybe a Fast Fourier Transform of this data[441] is better?
An example of my energy source code:

//--------------------------------------------------------------
// Name: ShowEnergy()
// Desc: Display the energy level. Called once per sample.
//       Eng, Freq, nWav, tWav, TempBuffer and dwThresHold are
//       not declared here; they are assumed to be globals.
//--------------------------------------------------------------
void ShowEnergy(HWND hDlg, DWORD nIndex)
{
    if (Eng < Freq) // Freq = 22050 samples / 50 Hz = 441
    {
        nWav = TempBuffer[nIndex]; // soundbuffer[n]

        if (nWav >= 128)           // high-amplitude half of the wave
        {
            nWav = nWav - 128;     // example: 245 - 128 = 117
        }
        else
        {
            nWav = 128 - nWav;     // low-amplitude half, inverted
        }
        // only count samples above the threshold (0 to 128 for 8-bit sound)
        if (nWav > (int)dwThresHold) tWav += nWav;

        Eng++;
    }
    else
    {
        Eng = 1;
        if (tWav > Freq)
        {
            tWav = tWav / Freq; // divide the summed energy by the 441 samples
            SendMessage(GetDlgItem(hDlg, IDC_PROGRESS1), PBM_SETPOS, tWav, 0);
            //Sleep(100);
            SetDlgItemInt(hDlg, IDC_ENERGY, tWav, FALSE);
        }
        else
        {
            SendMessage(GetDlgItem(hDlg, IDC_PROGRESS1), PBM_SETPOS, 0, 0);
            SetDlgItemInt(hDlg, IDC_ENERGY, 0, FALSE);
        }
        tWav = 0; // bug fix: start the next 441-sample window from zero
    }
} // end ShowEnergy

regards Thread,

Cogito Ergo Sum
I'm moving this thread to the Maths & Physics forum... while there is an aspect of AI in the problem, the specific question is far more directed at M&P, and thus more help is likely to come from that forum.

Timkin
Yes, it is more maths than AI. Thanks, Thread.
Cogito Ergo Sum
First: I've never dealt with this kind of problem, nor even thought about it before, but since you haven't got any reply yet, I'll try to help by offering a few remarks:

a) Generally the energy of a wave is the integral over the squared amplitude. The squared amplitude thus is the energy density.

b) I don't see any connection between the energy of a sound wave and the shape of the mouth. In fact, you can emit quite a range of sounds without even opening your mouth. The energy density should mainly depend on how much you exhale.

c) The same goes for frequency. I'd even say the main frequency (now talking about the FT) is generated in the vocal cords.

d) To make things worse: I'd even say the vocal cords don't emit a single sine wave but also subfrequencies. But if you're lucky, you can assume that those are highly suppressed (amplitude-wise).

e) What goes for the vocal cords also goes for the tongue. It also plays an important part in generating sound.

f) To make things even worse: I wouldn't count on the mouth shape depending only on the sound you are emitting right now; it might also depend on the sound you emitted before. You have two "parameters" that both move with a finite "speed": mouth shape and fundamental tone. Two different parameterizations might still end up producing a similar sound. Remember that speaking is a process that takes even the human brain quite some time to learn, even though it's optimized for that. A linguist might be able to answer that question.
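
Remark a) can be made concrete with a small sketch. It assumes unsigned 8-bit samples centred on 128, as described in Thread's post; the function name frameEnergy is hypothetical and does not come from anyone's code in this thread. The discrete version of "the integral over the squared amplitude" is simply a sum of squares over one frame:

```cpp
#include <cstdint>
#include <vector>

// Sum of squared amplitudes over one frame of unsigned 8-bit audio.
// Assumption: 128 is the zero line (silence).
double frameEnergy(const std::vector<std::uint8_t>& frame)
{
    double energy = 0.0;
    for (std::uint8_t sample : frame)
    {
        const double amplitude = static_cast<double>(sample) - 128.0;
        energy += amplitude * amplitude; // squared amplitude = energy density
    }
    return energy;
}
```

Unlike a sum of absolute deviations, squaring weights loud samples more heavily; dividing the result by the frame length would give an average power per sample.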


If you really want to analytically derive a mouth shape for a given sound, I'd try (just a guess of mine; will probably not work) the following:
1) Get the FT of the sound.
2) Determine the main frequency.
3) Extract the relative positions of the most relevant subfrequency(ies).
4) Hope you find a rule for how to shape the mouth depending on the relative position(s) (the higher the subfrequencies, the more the mouth is opened, I'd guess).
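
Steps 1) and 2) could be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a naive O(N^2) discrete Fourier transform rather than a real FFT (fine for a 441-sample frame, but an FFT library would be the practical choice), the samples are assumed to be already converted to floating point, and the name dominantFrequency is made up for this example.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the frequency (in Hz) of the strongest non-DC bin of a naive DFT.
// Returns 0.0 if the frame is too short or contains no signal.
double dominantFrequency(const std::vector<double>& samples, double sampleRate)
{
    const std::size_t n = samples.size();
    if (n < 2) return 0.0;

    const double twoPi = 6.283185307179586;
    std::size_t bestBin = 0;
    double bestPower = 0.0;

    for (std::size_t k = 1; k <= n / 2; ++k) // skip bin 0 (the DC offset)
    {
        double re = 0.0;
        double im = 0.0;
        for (std::size_t t = 0; t < n; ++t)
        {
            const double angle = twoPi * static_cast<double>(k)
                               * static_cast<double>(t) / static_cast<double>(n);
            re += samples[t] * std::cos(angle);
            im -= samples[t] * std::sin(angle);
        }
        const double power = re * re + im * im; // squared magnitude of bin k
        if (power > bestPower)
        {
            bestPower = power;
            bestBin = k;
        }
    }
    // Bin k corresponds to k * sampleRate / n Hz.
    return static_cast<double>(bestBin) * sampleRate / static_cast<double>(n);
}
```

With 441 samples at 22050 Hz the bins are 50 Hz apart, so the main frequency can only be located to within that resolution. Step 3) would then inspect the powers of the other bins relative to the strongest one.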

My assumption on how other programs deal with this:
They have a set of different mouth shapes, each with a (list of) sound(s) related to it, derived empirically. Then they compare the actual sound (perhaps by FFT?) with the list and choose the mouth shape whose associated sound fits best.
That's more straightforward and probably gets better results.
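
As a rough sketch of that comparison, assuming each stored sound is represented by a magnitude spectrum of the same length as the measured one, the "best fit" could be the stored spectrum with the smallest squared Euclidean distance. Both the representation and the name closestMouthShape are assumptions made for this example; real programs may well use a different similarity measure.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Index of the stored spectrum closest to the measured one, using squared
// Euclidean distance. Returns storedSpectra.size() if nothing is comparable
// (empty list, or no stored spectrum with a matching length).
std::size_t closestMouthShape(const std::vector<double>& measured,
                              const std::vector<std::vector<double>>& storedSpectra)
{
    std::size_t best = storedSpectra.size();
    double bestDistance = std::numeric_limits<double>::infinity();

    for (std::size_t i = 0; i < storedSpectra.size(); ++i)
    {
        if (storedSpectra[i].size() != measured.size()) continue;

        double distance = 0.0;
        for (std::size_t k = 0; k < measured.size(); ++k)
        {
            const double diff = measured[k] - storedSpectra[i][k];
            distance += diff * diff;
        }
        if (distance < bestDistance)
        {
            bestDistance = distance;
            best = i;
        }
    }
    return best;
}
```

The returned index would then select the mouth shape recorded alongside that stored spectrum.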

Summary:
Your post first sounded very weird to me, and the use of unsigned bytes really made things look unnecessarily unreadable. But I think I started understanding what you were trying to ask, so my answer would be:

Try FTs and try to find the best-fitting mouth shape from a list which you generate by making several vowels, hums, and hisses and watching your mouth in a mirror.
Atheist, thanks.
Maybe, as in your example, I could compare the actual sound with sounds in memory, mapping them together with some threshold(n),
and then figure out the mouth opening from this.
Thread
Cogito Ergo Sum
quote:Original post by Thread
Now I take 50 Hz samples: 22050 Hz / 50 Hz = 441 samples.


A 50 Hz sampling rate is nowhere near enough. To accurately represent a voice, you need at least an 8 kHz sampling rate (well, you *need* a bandwidth of 3.4 kHz, so technically you could get away with using 6.8 kHz, but you add a little extra to make sure). It won't be perfect, but it is enough to distinguish voice (8 kHz is what the UK telecommunications network uses (the landline one; the GSM networks use something less than this, which is why it isn't quite the same quality)). Look up Nyquist's sampling theorem, and aliasing.
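
The arithmetic behind those numbers can be sketched in a few lines. The function names below are hypothetical, and the aliasing helper assumes an ideal sampler looking at a single pure tone with a non-negative frequency:

```cpp
#include <cmath>

// Minimum sampling rate needed to capture a signal of the given bandwidth
// (Nyquist: at least twice the highest frequency present).
double nyquistRate(double bandwidthHz)
{
    return 2.0 * bandwidthHz;
}

// Apparent frequency of a pure tone after sampling at sampleRateHz.
// Tones above sampleRateHz / 2 fold back ("alias") into the lower band.
// Assumes toneHz >= 0 and sampleRateHz > 0.
double aliasedFrequency(double toneHz, double sampleRateHz)
{
    const double folded = std::fmod(toneHz, sampleRateHz); // 0 <= folded < rate
    return (folded <= sampleRateHz / 2.0) ? folded : sampleRateHz - folded;
}
```

For the figures above, nyquistRate(3400.0) gives 6800.0, and a 5 kHz tone sampled at 8 kHz would show up at 3 kHz. The 22050 Hz rate quoted in Thread's post can represent content up to 11025 Hz, comfortably above the 3.4 kHz voice band.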

If you want to do a sort of voice recognition thing, then I suggest you do an FFT and compare it to a base FFT (look for similarities in the distance between harmonics, and so forth). Firstly, I'd try to get it to recognise a simple single-frequency signal, and then try to move to something like voice. If you want to model a mouth by streaming an arbitrary waveform through a system of some sort, then it gets FAR more complicated.

Moving the mouth up and down is a pretty simple process (e.g. Half-Life): you can just use the power of the signal at any particular point (you can average a bit if you want), and set the mouth position accordingly (so, more power -> more open mouth). To do this you'll need to look up power density spectra of transient signals, and other related things. I can recommend a few books on the matter if you want.

However, if you want to actually model the mouth realistically (i.e. have it morph and change shape, rather than just open and close), I really don't know how you'd do this. You'll need some sort of predefined table of mouth actions for certain types of sound, and some sort of preprocessing to see how much of each sound is in the signal, and perhaps merge these weights against each of the animations to provide a net output. Dunno really what I'm talking about there, but it seems an application for some fuzzy variables.
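
The simple "more power -> more open mouth" approach might look something like the sketch below. It assumes unsigned 8-bit samples with 128 as silence (as in Thread's post), uses the RMS level of one frame as the power measure, and averages with a simple exponential smoothing factor. The name mouthOpening, the 0.0 to 1.0 output range, and the smoothing scheme are all assumptions for illustration, not how Half-Life or any other program actually does it.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Maps the RMS level of one frame of unsigned 8-bit audio to a mouth opening
// between 0.0 (closed) and 1.0 (fully open), blended with the previous value.
// smoothing is expected in [0, 1): 0 means no averaging, values near 1 react slowly.
double mouthOpening(const std::vector<std::uint8_t>& frame,
                    double previousOpening, double smoothing)
{
    if (frame.empty()) return previousOpening;

    double sumSquares = 0.0;
    for (std::uint8_t sample : frame)
    {
        const double amplitude = static_cast<double>(sample) - 128.0;
        sumSquares += amplitude * amplitude;
    }
    const double rms = std::sqrt(sumSquares / static_cast<double>(frame.size()));

    // 128 is the largest possible deviation, so rms / 128 lies in [0, 1].
    const double target = std::min(rms / 128.0, 1.0); // more power -> more open
    return smoothing * previousOpening + (1.0 - smoothing) * target;
}
```

Calling this once per 441-sample frame and feeding the result back in as previousOpening gives a value that could be scaled to a progress bar position or a servo angle; normal speech rarely reaches full scale, so some gain before the clamp would probably be needed in practice.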

You have to remember that you're unique, just like everybody else.
If at first you don't succeed, redefine success.
Hi python_regious,
What you say about the power:
"you can just use the power of the signal at any particular point (you can average a bit if you want), and set the mouth position accordingly (so, more power -> more open mouth)."
Yes, this is just what I want.
Voice recognition is not necessary.
I am building a character (puppet) like Kermit (c) J. Henson,
the green frog from Sesamstraat (Sesame Street in Dutch).
Only it is driven not by hand but by servo actuators, and
it interacts with captured webcam input...
regards Thread. Cogito Ergo Sum, I think
Cogito Ergo Sum
Ah cool, that idea would tie in nicely with the input of a digital control system then. Well, it wouldn't be much harder to implement it with an analogue design either.

You have to remember that you're unique, just like everybody else.
If at first you don't succeed, redefine success.

This topic is closed to new replies.
