
Stretch audio without distortion or pitch shift


Hi all! I think this is the right section to ask my question.

It's just a curiosity, but I'll try to phrase it as a question xD.

Anyway, I was reading about the Nyquist–Shannon theorem, which says that the minimum sampling frequency for a signal is twice the highest frequency in that signal. Human ears can hear up to about 20 kHz, so now I understand why most audio files are encoded at a sampling rate of 44.1 or 48 kHz.

Now, we know that if we stretch an audio sample to make it longer, it becomes distorted. This effect can be reduced by pitch-shifting the audio down.

But if I record an audio sample at a sampling rate of 96 kHz (as professional studio equipment does), can I stretch the audio by slowing it down by a factor of two without distortion? I can't test this myself, because I don't have a good microphone or a professional audio card, but I think that after slowing down the audio, the effective sampling rate (48 kHz) would still be high enough to avoid distortion, right?


I think it's possible, by analogy with video: to give the illusion of motion, we need around 23 FPS (23 Hz). If we double that, we get solid motion at 46 FPS (like Thomas Edison said, anything less will strain the eye xD), which is pretty good (I can notice the difference between 46 and 60 FPS, but it's not big, at least for me).

If we take a slow-motion video captured at 120 FPS and slow it down by a factor of two, the video is still smooth, because the resulting frame rate (60 FPS) is still above that doubled rate of 46 FPS.


So please let me know if it is possible to stretch audio without distortion...


Thanks for any help!

Excuse my bad English :(

Edited by FonzTech


Yes, if you are doing signal processing it is quite easy to stretch audio without changing pitch.


A naive approach of changing the playback speed of the recording, as was done with physical media, actually changes the frequencies as well as the duration. Speeding it up makes the frequencies higher; slowing it down makes them lower.
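To make the point about naive speed changes concrete, here is a minimal sketch in plain Python. It builds a 440 Hz test tone and estimates its frequency by counting zero crossings, once at the original playback rate and once at half that rate. The function names (`sine`, `estimate_frequency`) and the 48 kHz rate are just illustrative choices, not part of any audio library.

```python
import math

def sine(freq_hz, sample_rate, n_samples):
    """Generate n_samples of a sine tone recorded at sample_rate."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def estimate_frequency(samples, playback_rate):
    """Rough frequency estimate: rising zero crossings per second of playback."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration_s = len(samples) / playback_rate
    return crossings / duration_s

tone = sine(440.0, 48000, 48000)          # one second recorded at 48 kHz
print(estimate_frequency(tone, 48000))    # played at the recorded rate
print(estimate_frequency(tone, 24000))    # same samples played at half speed
```

The samples themselves are untouched in the second call; only the rate at which they are played changes, which is why both the duration and the pitch move together.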


With digital signals you can preserve the frequencies being played but play them for twice as long. If you want pitch shifting, you adjust the frequencies being played and keep the playback speed the same. Exactly how you do that depends on the audio system you are using.


Many systems provide mechanisms for both; they've got both a tempo adjustment and a pitch adjustment.


There are two main ways to accomplish time stretching, and both will cause some artifacts:

  • Interpolation in the time domain - resample the audio by the stretch factor, interpolating between nearby samples (with sinc, cubic, or linear interpolation). This causes pitch shifting because the waveform itself is stretched out or compressed.
  • Frequency-domain stretching - the input audio is windowed into chunks whose size is a power of two, and each chunk is converted to the frequency domain using an FFT. You can then repeat every chunk twice to stretch by a factor of 2, or do something more sophisticated to handle non-integer stretch factors. This method doesn't shift the pitch, but it smears transients in the input audio (with a big FFT size) or loses low-frequency resolution (with a small FFT size). It also introduces latency if you are processing streaming audio in real time.
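As a rough illustration of the chunk-based idea, here is a sketch of plain overlap-add (OLA) stretching in pure Python. Note that this is a simpler time-domain relative of the FFT method described above, not that method itself: it cuts Hann-windowed chunks out of the input and lays them down further apart in the output, without any frequency-domain processing. The name `ola_stretch`, the grain size of 1024, and the 50% output overlap are all assumptions made for the example.

```python
import math

def ola_stretch(samples, factor, grain=1024):
    """Time-stretch by `factor` using plain overlap-add of windowed grains.

    Grains are read from the input every hop_in samples and written to the
    output every hop_out samples, so the output is about `factor` times longer.
    """
    if len(samples) < grain:
        raise ValueError("input is shorter than one grain")
    hop_out = grain // 2                       # 50% overlap in the output
    hop_in = max(1, round(hop_out / factor))   # smaller read hop => longer output
    # Periodic Hann window: copies spaced grain/2 apart sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / grain)
              for i in range(grain)]
    n_grains = (len(samples) - grain) // hop_in + 1
    out = [0.0] * ((n_grains - 1) * hop_out + grain)
    for g in range(n_grains):
        src = g * hop_in
        dst = g * hop_out
        for i in range(grain):
            out[dst + i] += window[i] * samples[src + i]
    return out

def rising_zero_crossings(samples):
    """Count negative-to-non-negative sign changes (about one per cycle)."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
```

Plain OLA only sounds clean when overlapping grains happen to line up in phase; when they don't, you get amplitude wobble and slight detuning. That is one form of the artifacts mentioned in this post, and it is what more elaborate methods such as WSOLA or a phase vocoder try to reduce.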

Generally, some artifacts will always be introduced due to the limitations of math/signal processing/causality. The challenge in designing these sorts of DSP algorithms is finding the sweet spot in terms of quality/performance for your intended application. There's no silver bullet as far as I am aware.


If you take a recording made at 96 kHz and reduce the playback speed by 2x, ultrasonic frequencies in the 22-44 kHz range are shifted to the 11-22 kHz range and become audible (though probably quiet, because most audio gear and microphones are not designed for recording ultrasound). Normal audible frequencies are shifted an octave lower.
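The arithmetic behind that can be sketched in a few lines of Python. The helper names are made up for the example, and the 20 kHz audible limit is the approximate figure from the original question, not an exact threshold.

```python
AUDIBLE_MAX_HZ = 20_000.0  # assumed rough upper limit of human hearing

def nyquist_limit(sample_rate_hz):
    """Highest frequency a given sampling rate can represent."""
    return sample_rate_hz / 2.0

def after_slowdown(freq_hz, factor):
    """Frequency heard when playback is slowed down by `factor`."""
    return freq_hz / factor

def becomes_audible(freq_hz, factor):
    """True if an ultrasonic component lands in the audible range."""
    return freq_hz > AUDIBLE_MAX_HZ and after_slowdown(freq_hz, factor) <= AUDIBLE_MAX_HZ

print(nyquist_limit(96_000))             # bandwidth of a 96 kHz recording
for f in (440.0, 22_000.0, 30_000.0, 44_000.0):
    print(f, "->", after_slowdown(f, 2), becomes_audible(f, 2))
```

So the extra bandwidth of a 96 kHz recording doesn't prevent the pitch drop; it only means there is content above 20 kHz that can slide down into the audible range when the recording is slowed.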

Edited by Aressera
