Sound Programming from Scratch

13 comments, last by Aressera 11 years, 2 months ago

I have written a software 3D sound engine during college.

As I was targeting windows only, I used waveOutOpen (http://msdn.microsoft.com/en-us/library/windows/desktop/dd743866(v=vs.85).aspx) to send the raw PCM data to the speakers.

This is a good starting point : http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=4422&lngWId=3.

Once you get a basic sound coming from the speakers, you can simply keep adding features and build a nice architecture around it.

Mixing sounds is as simple as adding them together, resampling is simply interpolating, and effects like echo or low/high-pass filters aren't that hard and are fairly well documented.

Great info, I've been meaning to look into this for a while myself.

A question if you come back around -- did you find it difficult to keep the buffer full for gap-less playback? Even though audio processing isn't terribly intensive, I've always been concerned about Windows' ability to keep up with the real-time constraints while also having fast response time to sound events. Audio is far more susceptible to even tiny gaps in playback -- the ear notices micro-second gaps, while entire video frames can slip by. What were your experiences?

In my experience with the waveOutOpen() family of functions you need a really big buffer to achieve gapless playback, making low-latency audio impossible. The reason is that this API is not callback-based, whereas more advanced APIs like WASAPI on Windows and CoreAudio on OS X let you register a callback that is called from the main system audio thread whenever output audio is needed. The OS/driver maintains the buffer for you and synchronizes the callback so that there are only a few milliseconds of latency between your code and the hardware.


If you are interested in doing raw device I/O, check out WASAPI. It is intended for use by modern professional audio applications, has low latency, and gives you access to all of the device's channels, sample rates, and capabilities. It is the successor to waveOutOpen() and related functions on Vista and later.

ASIO is another pro-level option supported by a lot of hardware drivers, but it isn't as widely supported as the above.

Thanks. This is definitely going to help. :)

I think, therefore I am. I think? - "George Carlin"
My Website: Indie Game Programming

My Twitter: https://twitter.com/indieprogram

My Book: http://amzn.com/1305076532

For lowest latency, you should look at WASAPI exclusive mode (with callbacks) or even ASIO (but that will not work with all sound cards).

WaveOutOpen is callback-based (see the last three parameters); combined with a dedicated thread, it worked pretty well.

I didn't really have issues with gaps. I used two 64 KB buffers (not sure if that is considered big for audio programming) and the effects weren't really compute-intensive (low/high-pass filter, echo, ...).

I was able to play 20+ sounds at the same time without a problem.

Although there is a latency of 1-2 buffers before a sound actually starts playing, I didn't notice it.

Oops, my mistake, I was going from memory. And yes, 64 KB is a very large buffer: for 16-bit stereo sound, that's 16384 frames, or almost 370 ms at 44.1 kHz. It's not surprising it only took a few buffers to get gapless playback. In my implementation, I needed a similarly sized delay buffer (but split over multiple smaller buffers) to avoid gaps. Most devices work with buffers of 512 frames or fewer (for ~10 ms of latency), so you're really not getting anywhere close to good latency.

The other reason that API is not the best option is that it internally performs sample-rate conversion and other lossy processing on the audio before it is sent to the device. This is probably fine for simple playback, but it's not desirable for more demanding audio work.

This topic is closed to new replies.
