I have implemented FFT in my project (I know what is the FT and its uses)
This is what I have understood:
1) First we need to calculate the Phillips' Spectrum, in a frequency domain.
2) Then we use the inverse FFT to convert it into a spatial domain.
I have trouble understanding where your problem lies then. You basically use the iFFT to efficiently sum a large amount of sine waves. That's it. The paper covers mainly how to calculate the complex coefficients of those waves.
The Phillips spectrum really only provides a weight for each wave depending on its frequency/wavelength. Therefore, you get a radially symmetric frequency space texture at this point (considering the weights/coefficients of the low-freq waves are towards the middle of the tex. A 1D slice of the Phillips spectrum looks something like this then: __...----==._.==----...__ ). Next you add some Gaussian noise since nature is never perfectly regular. You might also want to multiply with a directional weight to give the resulting waves a primary propagation direction. Up to this point you have calculated the magnitude of the complex coefficients - the height of each sine wave. But you also want the waves to actually move (I assume). For that you have to alter the phase of the wave by rotating its complex coefficient depending on a time variable. Here you want to consider that dispersion stuff which describes the relation of propagation speed to wavelength (low-freq waves travel faster). Also, adding a constant random offset to the phase angle might be a good idea (to break up regularities). Now perform an iFFT and you have got your height field.
If instead of just sine waves you want to to sum Gerstner waves (which you totally want) you have some extra work to do. Basically you need two more textures + iFFTs to compute a horizontal offset vector in addition to the scalar height offset you already calculated (if I remember correctly the paper mentions something of multiplication with i in this context - this is really just a 90° rotation of the phase angle, nothing fancy).
To ensure your FFT works correctly you could try to reproduce the low-pass filter results from the Fourier Transform section of this talk: http://www.slideshare.net/TexelTiny/phase-preserving-denoising-of-images
It might also help if you could show us your frequency space results.