The Fourier transform has the unique capability of taking functions from the time domain to the frequency domain. A few hundred years ago French mathematician Jean Baptiste Joseph Fourier developed the calculus based transform to solve heat diffusion problems. A number of decades ago a method to quickly calculate discrete FFT's of the power was developed, and first implemented in Fortran. If you are interested in more history or details on implementing an FFT check out Numerical Recipes. I would be lying if I said Numerical Recipes (popular & comprehensive numerical algorithm book) had absolutely no impact on my FFT, but I would also be lying if I didn't find their C source revolting (it's a direct port of their algorithm from Fortran). Their licensing arrangements are also unacceptable (royalties & no dlls allowed). So I decided to write my own in C++; one that didn't use any gotos and that was faster by design.
Nearly every FFT algorithm I have reviewed has been concerned with preserving the amount of memory being used. They always use a so-called "in place" FFT. For the application I was concerned with, I had memory to blow, and needed an additional copy of the data. I needed to preserve the original time waveform and have the spectrum waveform data at the same time. So my FFT is an out-of-place variant. The first thing that is done during an FFT is a bit reserve sort. An in-place FFT must swap a good deal of the elements around whereas my FFT makes a copy, which is slightly faster (and preserves the original data). Further, I pre-calculate a large table of coefficients that many FFTs recalculate as needed. So my FFT algorithm is only suited to a high memory environment (well, it may need up to a few megs, which in the PC world isn't exactly "high" but in the embedded world its huge.) My unoptimized code out performs many in-place FFTs (such as the one included with LabView v5.x, it should be noted that LabView 6.x has an entirely new set of DSP algorithms which use modern techniques and are orders of magnitude faster at spectrum analysis than classic FFTs - or so they claim). Oh yeah, my FFT only works on powers of 2, which is rather limiting, but I haven't digested radix based FFTing yet, which is how you FFT other powers and combine them. Adding more radixes would also dramatically increase the memory requirements of my method as well. Its really good at FFTing sets of 512, 1024, 2048 points, which is what I needed to do in my application.
I think my code is slightly more readable than most available on the Internet, made so by its limited use and the FFTing methodology I implemented. It may not be clear why it accomplishes a FFT, but what the code does should be clear. The function of interest is _ROFFT (Real Out-of-place Fast Fourier Transform). _CISFT (Complex In-place Slow Fourier Transform) is a naive implementation, but is the simplest to implement & understand. The source is a VC6 workspace and has two projects, one to create a dsp.dll and a simple test program.
Incidentally, Intel has a DSP package available for download here