heuristically comparing mp3 files ( ideas needed )

3 comments, last by cyanide 18 years, 1 month ago
Hi,

I'm working on a hobby project in which I plan to correlate songs (mp3 files) by analysing the waveform data. It's something like pandora.com, except that I plan to make it 100% automatic, distributed and open source.

So far I have been successful in decoding the waveform data as follows:

1) Convert the mp3 to wav using the lame decoder.
2) Read the waveform samples and optionally plot them to graphs (example) with a Perl script (the output is similar to a GoldWave graph).

Now I'm looking for the best approach to proceed. I know this requires a lot of experimentation, but here are a few options:

1. Figure out the BPM for primary classification (like iTunes). This looks interesting...
2. Try some fingerprint matching algorithm: overlap 10-second graphs (split from the wave file) with the existing song database, with the output sorted by the song that got the maximum number of matches (hopefully taking into account scaling harmonics, tempo, etc. as required to get the match).

I'd like to hear what you guys have to say about this, or whether you have any ideas to add to it. I hope this isn't a 100% crazy project (though 99% is acceptable <G>). Awaiting your input!

Regards,
San.

P.S. For example, if you like "No Quarter" (Led Zeppelin), the application would be able to automatically recommend "Electric Glide in Blue" (Apollo 440) or other similar songs.
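One possible way to approach option 1 is sketched below: compute a coarse energy envelope and pick the lag at which it best correlates with itself. This is only a minimal sketch under stated assumptions, not what iTunes or any particular tool does. The function names, the frame size and the 60-180 BPM search range are all hypothetical choices, and the input is a synthetic click track rather than decoded mp3 data. Real music will often produce half-tempo or double-tempo answers with an approach this simple.

```python
def energy_envelope(samples, frame_size):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def estimate_bpm(samples, sample_rate, frame_size=10, min_bpm=60, max_bpm=180):
    """Estimate tempo from the autocorrelation peak of the energy envelope.

    frame_size, min_bpm and max_bpm are illustrative assumptions.
    """
    env = energy_envelope(samples, frame_size)
    mean = sum(env) / len(env)
    env = [e - mean for e in env]  # remove the constant offset

    frames_per_sec = sample_rate / frame_size
    min_lag = int(frames_per_sec * 60 / max_bpm)  # shortest beat period, in frames
    max_lag = int(frames_per_sec * 60 / min_bpm)  # longest beat period, in frames

    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(env[i] * env[i + lag] for i in range(len(env) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return 60.0 * frames_per_sec / best_lag


# Synthetic input: 8 seconds at 1000 samples/s with a short click every 0.5 s.
clicks = [1.0 if i % 500 < 10 else 0.0 for i in range(8000)]
print(estimate_bpm(clicks, 1000))
```

The resulting number could serve as one coarse feature for grouping songs, alongside whatever fingerprinting is used for option 2.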
----
#!/usr/bin/perl
print length "The answer to life,universe and everything";
Curiosity: can the beat detection algorithm figure out the BPM of a piece in 7/8 time, with beats going like |---|---|---|-- instead of the usual 4/4 ( |---|---|---|--- )? It's an irregular rhythmic pattern, or whatever you want to call it.
This isn't my algorithm, but it looks like it does ( see here )

Also, one more thing: can some enlightened soul tell me what taking a fast Fourier transform (FFT) of the waveform data (from the .wav file) gives me? All I can figure out from the wiki is that the FFT converts time-domain signals into the frequency domain, but I'm still not sure what that means in practice. I'd appreciate it if somebody could explain this in simple terms. Thanks.

Regards,
San
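To make the question concrete, here is a minimal sketch of what "frequency domain" means: feed in a pure tone and the transform shows one large peak at the bin matching the tone's frequency. The sample rate, tone and function names are assumptions chosen for illustration. The naive DFT below is slow but computes the same values an FFT would.

```python
import cmath
import math


def dft(samples):
    """Naive O(N^2) discrete Fourier transform (an FFT gives the same result, faster)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin below half the sample rate."""
    spectrum = dft(samples)
    half = len(samples) // 2
    magnitudes = [abs(c) for c in spectrum[:half]]
    peak_bin = max(range(1, half), key=lambda k: magnitudes[k])
    # Bin k corresponds to k * sample_rate / N hertz.
    return peak_bin * sample_rate / len(samples)


# Hypothetical input: a 100 Hz sine sampled 800 times per second.
sample_rate = 800
n = 200
tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(n)]
print(dominant_frequency(tone, sample_rate))
```

For music, the same transform applied to a short window shows how much energy each pitch range carries at that moment, which is the raw material for fingerprinting.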
I asked a similar question some months ago without getting any easy answers.
My problem was that I wanted to find out if a short piece of music was cut out from a larger piece.
Something like taking a sample from a radio station and automatically comparing the tune with your mp3s to get the name of the song, etc.
Note, this is NOT my intended use for what I asked, it's just something that's closely related to my problem.
My problem (and probably yours) is that the two inputs can be from different sources with different quality, bit rate, volume etc.

For me it's all about simplifying an already working process, and if it takes too much time to implement the "automatic version" it's not worth it.

I think that you should transform your two songs to the frequency domain and compare them there (using a sliding window of samples).
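A rough sketch of that idea, assuming the clip and the song are already decoded to lists of samples at the same sample rate: slide a window the length of the clip over the song, take the magnitude spectrum of each window, and score it against the clip's spectrum with cosine similarity. Cosine similarity ignores overall scale, which helps with the volume differences mentioned above. All names here are hypothetical, and a real system would use an FFT library and many short windows instead of one long one.

```python
import cmath
import math


def magnitude_spectrum(window):
    """Magnitudes of the lower half of a naive DFT of the window."""
    n = len(window)
    return [
        abs(sum(window[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]


def cosine_similarity(a, b):
    """1.0 for spectra with the same shape, regardless of loudness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match_offset(song, clip, hop):
    """Slide the clip over the song in steps of `hop` samples; return (offset, score)."""
    clip_spec = magnitude_spectrum(clip)
    best_offset, best_score = 0, -1.0
    for offset in range(0, len(song) - len(clip) + 1, hop):
        window = song[offset:offset + len(clip)]
        score = cosine_similarity(magnitude_spectrum(window), clip_spec)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def tone(cycles, length=64):
    """Synthetic stand-in for audio: a sine with `cycles` periods in `length` samples."""
    return [math.sin(2 * math.pi * cycles * t / length) for t in range(length)]


song = tone(4) + tone(8) + tone(12)
clip = [0.25 * s for s in tone(8)]  # same material, quieter
print(best_match_offset(song, clip, hop=32))
```

Because only magnitudes are compared, this tolerates a volume change but not a tempo or pitch change; those would need a more robust fingerprint.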

Good luck and please keep me updated.
Thanks for the reply. Can you please help me understand what the numbers (from -65535 to 65535) in the waveform data represent? Do they represent the frequency that is then sent to the sound card 44,000 times per second (say, for a 44 kHz file)? Also, what does taking the FFT of this mean? Isn't the FFT used to work with complex numbers? If so, what do I use for the imaginary (iota) part? I'm quite confused right now. Maybe I need to dig more and pester you guys with fewer questions. Thanks anyway.

Regards,
San
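On the questions above, a small sketch may help, assuming 16-bit mono PCM. Each number in the wav data is an amplitude (the instantaneous signal level), not a frequency, and for 16-bit signed samples the range is -32768 to 32767. For the FFT, real samples are simply treated as complex numbers with an imaginary part of zero. The tone, sample rate and in-memory wav file below are illustrative assumptions, not part of the original script.

```python
import cmath
import io
import math
import struct
import wave


def dft(signal):
    """Naive O(N^2) discrete Fourier transform of a complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


sample_rate = 8000
n = 64
# Hypothetical test tone: 1000 Hz at half of the 16-bit full scale.
amplitudes = [int(16384 * math.sin(2 * math.pi * 1000 * t / sample_rate)) for t in range(n)]

# Write a tiny 16-bit mono wav file into memory, then read it back.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
    wav.setframerate(sample_rate)
    wav.writeframes(struct.pack(f"<{n}h", *amplitudes))

buffer.seek(0)
with wave.open(buffer, "rb") as wav:
    raw = wav.readframes(wav.getnframes())
samples = list(struct.unpack(f"<{len(raw) // 2}h", raw))  # signed 16-bit amplitudes

# Real samples become complex numbers with a zero imaginary part.
signal = [complex(s, 0.0) for s in samples]
spectrum = dft(signal)

peak_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
print(peak_bin * sample_rate / n)  # frequency of the strongest bin, in Hz
```

With real input, the upper half of the spectrum mirrors the lower half (each bin is the complex conjugate of its mirror), which is why only the first N/2 bins are usually inspected.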

