Audio Programming - Find pitch (note) from sound file

5 comments, last by TheAdmiral 17 years, 2 months ago
Hey all, I am wondering if anyone knows how to go about finding the note / frequency of a sound file being played, and further, how to find the note / frequency of a sound coming live through an input. For example, I play an MP3 of 5 notes (A B C D E) and a person with a microphone has to sing / hum in tune with the recording. How would I go about checking the input from the player and knowing whether it matches what it should be? It's getting the info from the input (mic) that's stumping me. I'm guessing I have to take the frequency of the signal and compare, etc., but how? Any help appreciated.
[ rIK ][ rik@ram-solutions.co.uk ]
I'm guessing you would do something like...

Get your reference tone sound file.

Have the person record their version.

Analyze the frequencies of the recorded file.

This guy has some apparent values for tone frequencies:

From: David Levine (lunar@sunsite.unc.edu)
Date: 4-Apr-95 (21:1:5 GMT)
Subj: Tone Frequencies

Tone A = 73.179 Hz
Tone B = 85.375 Hz
Tone C = 97.572 Hz
Tone D = 109.768 Hz
Tone E = 121.965 Hz
Tone F = 134.161 Hz
And so on. Each tone is about 12.1965 Hz higher in freq.
than the prev. tone. Each tone is 0.492 seconds long, plus
or minus 5 percent, pretty much randomly. AU file, anyone?

I would try to go from there. Depending on how scientific it needs to be, you could also try overlapping the relative tones and listening for the beat frequency, or just for variation in general, to glean some new insight into the problem.

To move up to realtime, you'll just have to take whatever strategy you use (say, comparing the incoming frequency to those listed, within a % margin of error) and run it on the input stream as it comes in.
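The margin-of-error comparison could be sketched roughly as below. This is only an illustration: the table is copied from the quoted post, while the function name `match_tone` and the 3 % default tolerance are assumptions made up for the example.

```python
# Tone table copied from the quoted post; the tolerance is an assumed value.
TONES = {
    "A": 73.179,
    "B": 85.375,
    "C": 97.572,
    "D": 109.768,
    "E": 121.965,
    "F": 134.161,
}


def match_tone(freq_hz, tones=TONES, tolerance=0.03):
    """Return the name of the closest tone if it lies within the relative
    tolerance (0.03 means 3 %), otherwise None."""
    # Pick the table entry whose frequency is nearest to the measurement.
    name, ref = min(tones.items(), key=lambda item: abs(item[1] - freq_hz))
    if abs(freq_hz - ref) / ref <= tolerance:
        return name
    return None
```

For realtime use, the same function would be called on each frequency estimate as it arrives from the input stream.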
You might want to check out music-dsp. I don't know if you'll find anything in their source archive, but I'm sure you'll find someone on either their forums or mailing list that could help you out with this.
A simple method is to apply a bandpass filter to the signal, centred around the pitch that it should be (or that you think it is), and then use a zero crossing detector, which just looks at all the times the signal crosses the zero. The spacing between successive zero crossings in the same direction is the period of the signal, so you can get the frequency from that.
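A minimal sketch of the zero-crossing part, assuming the band-pass filtering has already been done elsewhere so that each cycle produces exactly one rising crossing. The function name and the linear interpolation step are choices made for this example, not something from the thread.

```python
import math


def zero_crossing_pitch(samples, sample_rate):
    """Estimate frequency (Hz) from the spacing of rising zero crossings.

    Assumes the signal is already band-pass filtered, so one cycle gives
    one rising crossing. Returns None if fewer than two crossings exist.
    """
    crossings = []
    for i in range(1, len(samples)):
        prev, cur = samples[i - 1], samples[i]
        if prev < 0.0 <= cur:
            # Linear interpolation for the sub-sample crossing position.
            crossings.append(i - 1 + prev / (prev - cur))
    if len(crossings) < 2:
        return None
    # Average period: span from first to last crossing over the cycle count.
    period = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return sample_rate / period
```

Only rising crossings are counted, because consecutive crossings in opposite directions are half a period apart.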

You could also do a Fourier transform and look at the frequency spectrum, e.g. find the frequency with the highest peak and call that the pitch.
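The spectrum-peak idea might look something like this. It is a deliberately naive O(N^2) DFT in plain Python so that it runs without any library; a real program would use an FFT library instead. The function name and the Hann window are assumptions for the sketch.

```python
import cmath
import math


def dft_peak_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude DFT bin.

    Naive O(N^2) DFT for illustration only. The frequency resolution is
    sample_rate / len(samples).
    """
    n = len(samples)
    # Hann window reduces leakage from tones that fall between bins.
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):  # skip DC, stop below Nyquist
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n
```

The answer can only be as fine as the bin spacing, so a longer window (or interpolating around the peak) is needed to tell neighbouring low notes apart.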
Quote: Original post by Monder
A simple method is to apply a bandpass filter to the signal, centred around the pitch that it should be (or that you think it is), and then use a zero crossing detector, which just looks at all the times the signal crosses the zero. The spacing between successive zero crossings in the same direction is the period of the signal, so you can get the frequency from that.

You could also do a Fourier transform and look at the frequency spectrum, e.g. find the frequency with the highest peak and call that the pitch.


I'll be upfront and say I don't know a whole lot of DSP theory, but it seems like your first suggestion wouldn't work. Correct me if I'm wrong, but it seems like you're saying to do this all in the time domain (going by the "look at all the times the signal crosses the zero" comment). But in the time domain, applying a band-pass filter doesn't make sense, because the BPF is, as you said, centered about a specific frequency, i.e., it's defined in the frequency domain, on the spectrum you obtain after transforming the signal. Plus, if you are in the time domain and count the zero crossings in a windowed portion of the signal, that might not give you a good estimate if the signal contains many partials, which will interfere and produce many zero crossings within one period of the fundamental.

The second idea, though, should work. Your fundamental ought to be the largest peak in your frequency spectrum, which, as was said, is obtained by FFTing the signal. Using your list of frequencies for notes, you can see whether the peak you found matches one of those frequencies, or a frequency that's n octaves above/below your reference frequency. I have no idea about the efficiency of such an algorithm, though.
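The "n octaves above/below" check can be done with a base-2 logarithm, since each octave doubles the frequency. A rough sketch, where the function name and the 50-cent default tolerance (half a semitone) are assumptions:

```python
import math


def octave_offset(peak_hz, reference_hz, tolerance_cents=50.0):
    """If peak_hz is a whole number of octaves away from reference_hz
    (within tolerance_cents), return that number of octaves, else None."""
    octaves = math.log2(peak_hz / reference_hz)
    nearest = round(octaves)
    # One octave is 1200 cents, so convert the leftover fraction to cents.
    error_cents = abs(octaves - nearest) * 1200.0
    if error_cents <= tolerance_cents:
        return nearest
    return None
```

For example, with a reference of 440 Hz, a peak at 880 Hz gives 1, a peak at 110 Hz gives -2, and a peak at 660 Hz (a fifth above) gives None.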
I once made a simple tuner for my guitar. I used FFTW to apply a Fourier transform and get the frequency spectrum, then took the frequency with the highest amplitude. At least, IIRC. If you search here in GD you can find my old topics and the answers I got.
Indeed, the only 'safe' way to do this is to analyse the frequency spectrum. There are many resources out there regarding the Fourier transform (in particular, the FFT), which you can cannibalise to reasonable effect, but I recommend you first search for what is known as the Q-transform (usually called the constant-Q transform), as it's precisely what you need.

The Q-transform is very similar to the discrete Fourier transform, only the filters are geometrically spaced, so that the resulting function represents notes rather than frequencies. Given a Q-transform of your sample, you need only find the peak value, and you'll know which note is dominant.
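To make the geometric spacing concrete, here is a brute-force sketch of a constant-Q style analysis with one bin per semitone. It is not an efficient implementation and not taken from any particular library; the function name, the 110 Hz starting frequency, the bin count and the Hann window are all assumptions for illustration.

```python
import cmath
import math

NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def dominant_note(samples, sample_rate, f_min=110.0, n_bins=24):
    """Brute-force constant-Q analysis with one bin per semitone.

    f_min (110 Hz, an A) and n_bins are assumed values. Returns the name
    of the note whose bin has the largest magnitude.
    """
    q = 1.0 / (2.0 ** (1.0 / 12.0) - 1.0)  # centre frequency / bandwidth
    best_bin, best_mag = 0, 0.0
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / 12.0)  # geometrically spaced centres
        # Window length shrinks as frequency rises, keeping Q constant
        # (clamped to the data length if the input is too short).
        n_k = min(len(samples), math.ceil(q * sample_rate / f_k))
        acc = 0j
        for n in range(n_k):
            hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / n_k)
            acc += hann * samples[n] * cmath.exp(
                -2j * math.pi * f_k * n / sample_rate)
        mag = abs(acc) / n_k  # normalise so long and short windows compare
        if mag > best_mag:
            best_bin, best_mag = k, mag
    return NOTE_NAMES[best_bin % 12]
```

Because every bin sits exactly on a semitone, the index of the peak bin maps straight to a note name with no frequency lookup table.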

However, I have no idea how much source code you'll find on the topic, so you may have to make do with an FFT. The Fourier transform of a data set can be converted quickly into a Q-transform, as described in this paper, but you may get away with a rough approximation such as summing the first few octaves under respective band-pass filters.
There are a good few fast Fourier transform libraries out there, though I never found one I was truly happy with. If I had to recommend one, I'd choose FFTW for power and flexibility, or Laurent de Soras's FFTReal for ease of use.

Admiral
Ring3 Circus - Diary of a programmer, journal of a hacker.
