
Sound Programming from Scratch



#1 Glass_Knife   Moderators   -  Reputation: 4760


Posted 27 January 2013 - 10:24 AM

I previously posted this question here, but got the musician's response: http://www.gamedev.net/topic/637873-sound-programming-from-scratch/

 

I'd also like the programmer's response.

 

 

Not sure if this question is better asked here or in the programming forum.  I've been playing with sound programming, but I find the APIs hide too much stuff.  When I learned 3D graphics, I started by learning from scratch.  I would like to know if anyone has any resources (books or online) that teach audio programming from scratch.  Much like writing a software renderer from scratch to learn about the algorithms.

 

I've googled and amazoned, but I don't really know enough about the subject to decide whether the books I found are any good.  And man, I thought software books were expensive.  Audio books are not cheap.  :)

 

Thanks,




#2 irreversible   Crossbones+   -  Reputation: 1376


Posted 27 January 2013 - 08:52 PM

Here's the definitive book to get you started with the basics: The Scientist and Engineer's Guide to Digital Signal Processing. It's hefty (600+ pages), covers everything from audio to image processing to compression, and it's FREE. It's written for absolute beginners but works up to intermediate difficulty as you keep reading, with sparse code listings and very few of the equations DSP is infamous for. If you're just starting out, then IMO this is the place to build your basic knowledge.

 

"Audio programming from scratch" is a very broad term and if you want more specific advice, I'm afraid you're going to have to be more specific with your question!



#3 Glass_Knife   Moderators   -  Reputation: 4760


Posted 28 January 2013 - 06:39 AM

"Audio programming from scratch" is a very broad term and if you want more specific advice, I'm afraid you're going to have to be more specific with your question!

 

I guess what I would like to understand is how to write DirectSound or OpenAL from scratch.



#4 irreversible   Crossbones+   -  Reputation: 1376


Posted 28 January 2013 - 10:08 AM

DirectSound is essentially a driver that bridges the gap between third-party libraries (such as OpenAL, FMOD, etc.) and hardware. DSound does emulate some effects and provides access to hardware acceleration if possible, but at ground level it's precisely that and nothing more: a driver.

 

Now, I'm not too familiar with OpenAL overall, but it's likely just a library like FMOD, which builds on top of native drivers depending on what operating system you're compiling on and what is available. OpenAL and FMOD (and other libraries) also provide additional functionality, like time-to-frequency domain conversion (essentially raw FFT and IFFT calls), effects (reverb, delay, etc) and format support (easy loading of audio file formats).

 

In short, you're probably not thinking of writing a driver, in which case "writing DirectSound from scratch" doesn't really make much sense. You are probably thinking of implementing various library functionality, such as effects and the like (just to be clear: if you do - for whatever reason - want to write a driver, then I can't help you). Assuming it's the library functionality you're after, I would suggest two things:

 

1) start by reading the book I linked to. I'm sorry to say, but it's kind of apparent that you're not really aware of what you even want to do, so building a knowledge base to work from is the place to start. DSP is one of the most comprehensive and demanding fields out there and touches everything from circuit design to programming synthesizers to implementing an enormous slew of effects in code.

2) if you don't feel comfortable simply reading up on things and really, really want to do some coding, try an icebreaker assignment: keep reading and start writing something like a really simple additive synthesizer (say, 2 oscillators using a few wavetables and a couple of filters; there's a minimal sketch below). You will never figure out how this stuff works from code alone (which is why reading is so important), but conversely, implementing things like filters purely from theory is also highly technical. My approach, which I deem pretty healthy, is that it's essential to understand what each knob (on a synthesizer or audio control panel) does and how it affects the signal, but it isn't imperative to understand the underlying mathematics. The same applies to code, unless you really want to over-compensate.
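To make the icebreaker idea concrete, here's a rough sketch in plain C of a toy two-oscillator additive patch run through a single one-pole low-pass filter. Everything below (names, parameter values) is illustrative rather than taken from any particular synth; it just renders one second of samples to a raw PCM file you can import into an audio editor.

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>

#define SAMPLE_RATE 44100
#define TWO_PI      6.28318530717958647692

/* One sine oscillator: phase advances by freq/sampleRate each sample. */
typedef struct { double phase, freq, amp; } Osc;

static double osc_next(Osc *o)
{
    double s = sin(o->phase * TWO_PI) * o->amp;
    o->phase += o->freq / SAMPLE_RATE;
    if (o->phase >= 1.0) o->phase -= 1.0;
    return s;
}

/* Very crude one-pole low-pass: output chases the input with smoothing a. */
typedef struct { double z, a; } OnePoleLP;

static double lp_next(OnePoleLP *f, double in)
{
    f->z += f->a * (in - f->z);
    return f->z;
}

int main(void)
{
    Osc a = { 0.0, 220.0, 0.50 };    /* fundamental            */
    Osc b = { 0.0, 440.0, 0.25 };    /* one octave up, quieter */
    OnePoleLP lp = { 0.0, 0.1 };     /* larger a = brighter    */

    /* Render one second of mono 16-bit PCM into a buffer. */
    static int16_t buffer[SAMPLE_RATE];
    for (int i = 0; i < SAMPLE_RATE; ++i) {
        double mixed  = osc_next(&a) + osc_next(&b);  /* additive mix   */
        double shaped = lp_next(&lp, mixed);          /* filter it      */
        if (shaped >  1.0) shaped =  1.0;             /* clamp to range */
        if (shaped < -1.0) shaped = -1.0;
        buffer[i] = (int16_t)(shaped * 32767.0);
    }

    /* Dump raw samples; import as 44.1 kHz mono 16-bit PCM in an editor. */
    FILE *out = fopen("tone.raw", "wb");
    if (out) { fwrite(buffer, sizeof buffer, 1, out); fclose(out); }
    return 0;
}
```

Every "knob" mentioned above maps onto one of these numbers: oscillator frequency and amplitude, and the filter's smoothing factor.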

 

On a related note, you might want to start by examining how software synthesizers work and what all the different knobs do. Let me know if you would like some suggestions.

 

As for code, here are two invaluable resources to get you started: KVR Audio (check out the forums for active DSP-related discussions) and musicdsp.org (check out the wide variety of user-submitted source code listings).

 

If you're wondering what a software synth has in common with an audio library, the answer is that a synth generally boils down to a DSP library in its own right; the distinction is that its modules are specialized and chained to manipulate sound in a specific sequence rather than being standalone functions.

Hopefully I understood you correctly and what I wrote helps!



#5 Glass_Knife   Moderators   -  Reputation: 4760


Posted 28 January 2013 - 12:49 PM

DirectSound is essentially a driver that bridges the gap between third-party libraries (such as OpenAL, FMOD, etc.) and hardware. DSound does emulate some effects and provides access to hardware acceleration if possible, but at ground level it's precisely that and nothing more: a driver.
 
Now, I'm not too familiar with OpenAL overall, but it's likely just a library like FMOD, which builds on top of native drivers depending on what operating system you're compiling on and what is available. OpenAL and FMOD (and other libraries) also provide additional functionality, like time-to-frequency domain conversion (essentially raw FFT and IFFT calls), effects (reverb, delay, etc) and format support (easy loading of audio file formats).
 
In short, you're probably not thinking of writing a driver, in which case "writing DirectSound from scratch" doesn't really make much sense. You are probably thinking of implementing various library functionalities, such as effects and the like (just to be clear: if you do - for whatever reason - want to write a driver, then I can't help you).

Yes, I don't really know what I want out of this :). I've done lots of programming, and I've written a software renderer from scratch to learn about graphics. The last two books I purchased about 3D engine programming didn't cover sound. That seemed strange, because I figured the sound stuff would be important. The more I learn about this, however, the more it seems like sound and graphics are very different areas.

So yes, I shouldn't have said I want to write DirectSound. What I mean is that I would like to be able to do things in software like mixing, reverb, panning, high- and low-pass filters, and that kind of thing. I don't really know what I need to learn, because if I already knew that, I wouldn't need to ask. :D

I will check out the book. It looks like a good place to start.



#6 Yourself   Crossbones+   -  Reputation: 1144


Posted 29 January 2013 - 04:01 AM

I wrote a software 3D sound engine in college.

As I was targeting Windows only, I used waveOutOpen (http://msdn.microsoft.com/en-us/library/windows/desktop/dd743866(v=vs.85).aspx) to send the raw PCM data to the speakers.

This is a good starting point: http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=4422&lngWId=3.

Once you have a basic sound coming from the speakers, you can simply keep adding features and build a nice architecture around it.

Mixing sounds is as simple as adding them together, resampling is just interpolating, and effects like echo or low/high-pass filters aren't that hard and are fairly well documented.
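For anyone who wants the "fill a buffer and hand it to the hardware" experience right away, here's a minimal sketch of the waveOutOpen path (Win32 C, link against winmm.lib). It synthesizes one second of a 440 Hz sine and queues it with waveOutWrite; it blocks until playback finishes rather than streaming, so treat it as a starting point only, not as how a real engine should be structured.

```c
#include <windows.h>
#include <mmsystem.h>   /* waveOut* API; link with winmm.lib */
#include <math.h>
#include <stdint.h>

#define SAMPLE_RATE 44100

int main(void)
{
    /* One second of mono 16-bit PCM: a 440 Hz sine at half amplitude. */
    static int16_t samples[SAMPLE_RATE];
    for (int i = 0; i < SAMPLE_RATE; ++i) {
        double t = (double)i / SAMPLE_RATE;
        samples[i] = (int16_t)(0.5 * 32767.0 *
                               sin(2.0 * 3.14159265358979 * 440.0 * t));
    }

    /* Describe the PCM format to the device. */
    WAVEFORMATEX fmt = {0};
    fmt.wFormatTag      = WAVE_FORMAT_PCM;
    fmt.nChannels       = 1;
    fmt.nSamplesPerSec  = SAMPLE_RATE;
    fmt.wBitsPerSample  = 16;
    fmt.nBlockAlign     = fmt.nChannels * fmt.wBitsPerSample / 8;
    fmt.nAvgBytesPerSec = fmt.nSamplesPerSec * fmt.nBlockAlign;

    HWAVEOUT hwo;
    if (waveOutOpen(&hwo, WAVE_MAPPER, &fmt, 0, 0, CALLBACK_NULL) != MMSYSERR_NOERROR)
        return 1;

    /* Hand the buffer to the device. */
    WAVEHDR hdr = {0};
    hdr.lpData         = (LPSTR)samples;
    hdr.dwBufferLength = sizeof(samples);
    waveOutPrepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutWrite(hwo, &hdr, sizeof(hdr));

    /* Crude wait-for-completion; real code would use a callback or event. */
    while (!(hdr.dwFlags & WHDR_DONE))
        Sleep(10);

    waveOutUnprepareHeader(hwo, &hdr, sizeof(hdr));
    waveOutClose(hwo);
    return 0;
}
```

Mixing several sounds into that same buffer really is just summing their samples (and clamping), as the post says; double buffering plus a callback turns this into a streaming engine.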

 



#7 Glass_Knife   Moderators   -  Reputation: 4760


Posted 29 January 2013 - 06:36 AM

I wrote a software 3D sound engine in college.

As I was targeting Windows only, I used waveOutOpen (http://msdn.microsoft.com/en-us/library/windows/desktop/dd743866(v=vs.85).aspx) to send the raw PCM data to the speakers.

This is a good starting point: http://www.planet-source-code.com/vb/scripts/ShowCode.asp?txtCodeId=4422&lngWId=3.

Once you have a basic sound coming from the speakers, you can simply keep adding features and build a nice architecture around it.

Mixing sounds is as simple as adding them together, resampling is just interpolating, and effects like echo or low/high-pass filters aren't that hard and are fairly well documented.

 

YES!!!  This is what I was looking for.  The sound equivalent of getting a buffer and setting each pixel value.  This, along with the DSP book, is a great starting point.

 

Thanks!



#8 Ashaman73   Crossbones+   -  Reputation: 7793


Posted 29 January 2013 - 08:01 AM

As I was targeting Windows only, I used waveOutOpen (http://msdn.microsof...6(v=vs.85).aspx) to send the raw PCM data to the speakers.

You should also consider the XAudio2 Windows API.



#9 Aressera   Members   -  Reputation: 1458


Posted 29 January 2013 - 09:03 PM

If you are interested in doing raw device I/O, check out WASAPI. It is intended for use by modern professional audio applications, has low latency, and gives you access to all of the device's channels/sample rates/capabilities. It is the successor to waveOutOpen() and related functions on Windows Vista and later.

 

ASIO is another pro-level option offered by a lot of hardware drivers, but it isn't as widely supported as the above.



#10 Ravyne   GDNet+   -  Reputation: 7738


Posted 29 January 2013 - 09:47 PM

I wrote a software 3D sound engine in college.

As I was targeting Windows only, I used waveOutOpen (http://msdn.microsoft.com/en-us/library/windows/desktop/dd743866(v=vs.85).aspx) to send the raw PCM data to the speakers.

 

Great info, I've been meaning to look into this for a while myself.

 

A question if you come back around -- did you find it difficult to keep the buffer full for gapless playback? Even though audio processing isn't terribly intensive, I've always been concerned about Windows' ability to keep up with the real-time constraints while also responding quickly to sound events. Audio is far more susceptible to even tiny gaps in playback -- the ear picks up even sub-millisecond dropouts as clicks, while an entire dropped video frame can slip by unnoticed. What were your experiences?



#11 Aressera   Members   -  Reputation: 1458


Posted 29 January 2013 - 09:54 PM

A question if you come back around -- did you find it difficult to keep the buffer full for gapless playback? Even though audio processing isn't terribly intensive, I've always been concerned about Windows' ability to keep up with the real-time constraints while also responding quickly to sound events. Audio is far more susceptible to even tiny gaps in playback -- the ear picks up even sub-millisecond dropouts as clicks, while an entire dropped video frame can slip by unnoticed. What were your experiences?

 

In my experience with the waveOutOpen() family of functions you need a really big buffer to get gapless playback, which makes low-latency audio impossible. The reason is that it is not a callback-based API, whereas more advanced APIs like WASAPI on Windows and CoreAudio on OS X let you register a callback that is invoked from the system audio thread whenever output audio is needed. The OS/driver maintains the buffer for you and synchronizes the callback so that there are only a few ms of latency between your code and the hardware.
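To sketch what that callback model looks like in code: this is a generic illustration, not the actual WASAPI or CoreAudio API; the voice structure and the function name are made up. The key point is that the backend calls you for small chunks of audio, and your render function has to fill them quickly.

```c
#include <stddef.h>

/* Hypothetical voice: a mono clip plus a playback cursor. */
typedef struct {
    const float *data;   /* samples in [-1, 1] */
    size_t       length;
    size_t       cursor;
    int          active;
} Voice;

#define MAX_VOICES 32
static Voice g_voices[MAX_VOICES];

/*
 * The callback a (hypothetical) backend invokes from its audio thread
 * whenever it needs `frames` more samples. It must fill `out` and return
 * quickly -- no locks, allocation, or file I/O here, or you get dropouts.
 */
void render_callback(float *out, size_t frames)
{
    for (size_t i = 0; i < frames; ++i)
        out[i] = 0.0f;                                 /* start from silence */

    for (int v = 0; v < MAX_VOICES; ++v) {
        Voice *voice = &g_voices[v];
        if (!voice->active)
            continue;
        for (size_t i = 0; i < frames && voice->cursor < voice->length; ++i)
            out[i] += voice->data[voice->cursor++];    /* mixing = summing */
        if (voice->cursor >= voice->length)
            voice->active = 0;
    }

    for (size_t i = 0; i < frames; ++i) {              /* clamp the mix */
        if (out[i] >  1.0f) out[i] =  1.0f;
        if (out[i] < -1.0f) out[i] = -1.0f;
    }
}
```

With a callback like this, the system only ever asks for a buffer's worth of samples at a time (often a few hundred frames), which is where the few-milliseconds latency figure comes from.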




#12 Glass_Knife   Moderators   -  Reputation: 4760


Posted 29 January 2013 - 10:45 PM

If you are interested in doing raw device I/O, check out WASAPI. It is intended for use by modern professional audio applications, has low latency, and gives you access to all of the device's channels/sample rates/capabilities. It is the successor to waveOutOpen() and related functions on Windows Vista and later.

 

ASIO is another pro-level option offered by a lot of hardware drivers, but it isn't as widely supported as the above.

 

Thanks.  This is definitely going to help.  :)



#13 l0calh05t   Members   -  Reputation: 771


Posted 30 January 2013 - 03:49 AM

For the lowest latency, you should look at WASAPI exclusive mode (with callbacks) or even ASIO (though that won't work with all sound cards).



#14 Yourself   Crossbones+   -  Reputation: 1144


Posted 30 January 2013 - 03:59 AM

 

In my experience with the waveOutOpen() family of functions you need a really big buffer to get gapless playback, which makes low-latency audio impossible. The reason is that it is not a callback-based API, whereas more advanced APIs like WASAPI on Windows and CoreAudio on OS X let you register a callback that is invoked from the system audio thread whenever output audio is needed. The OS/driver maintains the buffer for you and synchronizes the callback so that there are only a few ms of latency between your code and the hardware.

 

waveOutOpen is callback-based (see the last three parameters); combined with a dedicated thread, it worked pretty well.

I didn't really have issues with gaps. I used two 64 KB buffers (not sure if that is considered a big buffer for audio programming) and the effects weren't very compute-intensive (low/high-pass filter, echo, ...).

I was able to play 20+ sounds at the same time without a problem.

Although there is a latency of 1-2 buffers before a sound actually starts playing, I didn't notice it.



#15 Aressera   Members   -  Reputation: 1458


Posted 30 January 2013 - 01:56 PM

waveOutOpen is callback-based (see the last three parameters); combined with a dedicated thread, it worked pretty well.
I didn't really have issues with gaps. I used two 64 KB buffers (not sure if that is considered a big buffer for audio programming) and the effects weren't very compute-intensive (low/high-pass filter, echo, ...).
I was able to play 20+ sounds at the same time without a problem.
Although there is a latency of 1-2 buffers before a sound actually starts playing, I didn't notice it.

 

Oops, my mistake, I was going from memory. And yes, 64 KB is a very large buffer. For 16-bit stereo sound, that's 16384 samples, or almost 370 ms at 44.1 kHz. It's not surprising it only took a few buffers to get gapless playback. In my implementation, I needed a similarly sized delay buffer (but split over multiple smaller buffers) to avoid gaps. Most devices work on <= 512 samples per buffer (for ~10 ms of latency), so you're really not getting anywhere close to good latency.
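If you want to sanity-check those numbers, the arithmetic is just frames = bytes / bytes-per-frame and latency = frames / sample rate. A tiny throwaway helper (illustrative only, using the sizes quoted above):

```c
#include <stdio.h>

/* Latency of one buffer in milliseconds, given its size in bytes. */
static double buffer_latency_ms(double bytes, int channels,
                                int bits_per_sample, double sample_rate)
{
    double bytes_per_frame = channels * (bits_per_sample / 8.0);
    double frames = bytes / bytes_per_frame;
    return 1000.0 * frames / sample_rate;
}

int main(void)
{
    /* 64 KB of 16-bit stereo at 44.1 kHz -> about 371 ms per buffer. */
    printf("64 KB buffer: %.1f ms\n",
           buffer_latency_ms(64.0 * 1024.0, 2, 16, 44100.0));

    /* A 512-frame device buffer -> about 11.6 ms, the "~10 ms" figure above. */
    printf("512 frames:   %.1f ms\n", 1000.0 * 512.0 / 44100.0);
    return 0;
}
```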

 

The other reason that API is not the best option is that it internally does sample-rate conversion and other lossy processing on the audio before it is sent to the device. That's probably fine for simple playback, but it's not desirable for more demanding audio tasks.





