Audio Programming


Hello everyone,

I am a sound designer, and as a way to become more valuable to the industry I'd like to learn audio programming too, so I'm here for some advice.

A little bit of background first: I'm not doing a sound design degree (I learn that outside of classes with a different professor) but a "computer music" master's degree. In this degree there is actually a lot of programming and a lot of material about DSP and synthesis.

The big problem (for me) is that we learn it mostly from an artist's POV, and as a result we do it mostly with high-level tools like Max/MSP, though we do use a bit of C too.

I also learn C/C++ on my own because I like it, and even though I'm definitely not a pro, I can handle code. What I lack most is probably math and physics.

I'd like to understand what an audio programmer exactly does, and how I could teach myself this skill. (Note that it's a long-term project, not something I expect to do in 2 months.) I already got "The Audio Programming Book", and I'd like to make things related to game audio as soon as possible. Any tips?

Thank you!


I've only known a couple of guys (neither of whom worked for game developers) who might have been classifiable as "audio programmers" -- one worked on making old compression algorithms more efficient and portable, and on implementing those algorithms on FPGA chips. The other guy was more like a sound librarian, concerned with cataloging audio, checking specs (sample rate, max volume), and handling details related to distribution. Both had similar backgrounds and qualifications: a college degree in comp sci and decent C/C++ skills.


How to answer?...

In a basic sense, working with audio is mostly about loops, arrays of numbers, interpolation, and doing basic calculations on those numbers.

Beyond this, and maybe dealing with issues like getting audio to/from the sound hardware or handling specific audio file formats, there doesn't really seem to be a whole lot that's particularly notable, in my experience.

Things are made simpler mostly in that computers are fast enough that one can generally get along pretty well using fairly naive math and lots of floating-point calculations (in contrast to, say, real-time video processing, which is much more speed-critical).

Dealing with more specific tasks, like writing a mixer for a 3D world, writing a MIDI synthesizer, or doing text-to-speech, may involve a few more things specific to the use case or situation, but these aren't fundamentally much different from the above. It would be things like calculating how loud or quiet something should be based on distance, how much to Doppler-shift and delay a sound based on its current velocity and distance, or playing an instrument's sound clip faster or slower depending on the note being played (a rough sketch of this math follows below).
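To make that concrete, here is a sketch of the distance and Doppler math in C. The formulas are standard acoustics; the function names, the reference-distance clamp, and the use of 343 m/s for the speed of sound in air are my own assumptions for the example:

    #include <math.h>

    #define SPEED_OF_SOUND 343.0f   /* m/s, approximate speed of sound in air */

    /* Simple inverse-distance attenuation, clamped so the gain never exceeds 1. */
    float distanceGain(float distance, float refDistance)
    {
        return refDistance / fmaxf(distance, refDistance);
    }

    /* Doppler pitch factor for a source moving at radialVelocity m/s relative
       to a stationary listener (positive = approaching, raising the pitch). */
    float dopplerFactor(float radialVelocity)
    {
        return SPEED_OF_SOUND / (SPEED_OF_SOUND - radialVelocity);
    }

    /* Propagation delay in seconds for a source 'distance' meters away. */
    float propagationDelay(float distance)
    {
        return distance / SPEED_OF_SOUND;
    }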

Trying to find information about various audio topics online can be made more difficult by a lot of it being presented either as opaque mathematical notation or as depictions of analog electronic devices, but this is a secondary issue, I think.

In games, though, a lot of people side-step all of this by using audio libraries.


This is not about a full-time job; I'm trying to better myself at things around audio since I am a sound designer. So it's more about what I could do on my own to improve my skills in this area.

Anyway, I do think there are audio programmers in the game industry; I've seen some job openings for it. It's not the most common role, though.


Do you happen to know a good book about it? I ordered "The Audio Programming Book", and I already know a fair bit of DSP and audio stuff thanks to my degree. I think what I'll lack most is the math and some of the programming.

Again, this is not for a full career or for a degree; it's just to learn on the side.

After writing my own sampler and mixer with XAudio2, I can tell you that there was little information easily found on the internet at all. I have previously noted that gamedev does not have a specific forum devoted to audio programming.

I assume that your prior knowledge of DSP is not code-related, but about how these processes work on waveforms. If that's the case, I would look into how to manipulate waveforms and attempt to recreate the tools of the trade as used in the studio: amplitude effects such as compressors/limiters or expanders, delay effects like flanging and reverb, and filter effects like phasers, EQs, and wahs. This would tap into your knowledge of DSP and give you things to work on in your hobby coding; a minimal delay effect is sketched below.
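For instance, a basic feedback delay (echo) is little more than a circular buffer. A minimal mono sketch in C, assuming float samples; the names, the 44.1 kHz rate, and the roughly 300 ms delay length are all just illustrative:

    #include <stddef.h>

    #define SAMPLE_RATE   44100
    #define DELAY_SAMPLES (SAMPLE_RATE * 3 / 10)   /* roughly 300 ms */

    static float  delay_buf[DELAY_SAMPLES];
    static size_t delay_pos = 0;

    /* Process one block of mono float samples in place. */
    void processDelay(float *samples, size_t count, float feedback, float mix)
    {
        for (size_t i = 0; i < count; i++) {
            float delayed = delay_buf[delay_pos];
            /* feed the input plus scaled feedback back into the delay line */
            delay_buf[delay_pos] = samples[i] + delayed * feedback;
            /* blend the dry and delayed signals */
            samples[i] = samples[i] * (1.0f - mix) + delayed * mix;
            delay_pos = (delay_pos + 1) % DELAY_SAMPLES;
        }
    }

Modulating the delay length with a slow oscillator turns the same structure into a flanger; compressors and limiters are similar per-sample loops, driven by an envelope follower instead of a buffer.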




I have little idea about books here.

I mostly just learned by writing code, using a good deal of guesstimation and trial and error, and in some cases trying to scavenge information off the internet.

But, yeah, as another poster noted, relevant information is hard to find...

Even then, one may still end up with issues that need to be solved in slightly inelegant ways.

One example: I recently noticed, while recording in-game video, that the audio and video were out of sync.

The video was the frames as seen by the renderer, and the audio was whatever was coming out of the in-game mixer at that particular moment; however, the audio was slightly ahead of the video. The solution was basically to insert an audio delay into the video recording, then tune the delay values until they matched up. Why? Because it apparently takes a little bit of time between when audio is mixed in-game and when it comes out of the speakers.

But, yeah, the major thing about audio, I suspect, is mostly knowing basic programming and being generally familiar with working with arrays.

For example, your audio data will typically be in the form of arrays of "samples" at a particular "sample rate".

If you want to produce output samples, you will typically have a loop that calculates each sample and puts it into the output array.

Typically, the input audio is also in the form of arrays of samples, so the position of the current sample being mixed can be used to calculate the position of the input samples you want to mix.

However, the desired input position often doesn't land exactly on a sample, so we interpolate. A common strategy is linear interpolation, or "lerp":

lerp(a, b, t) = (1 - t)*a + t*b

where a and b are the adjacent input samples, and t is the fractional position between a and b.
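As a concrete illustration, here is a minimal resampling loop in C built on lerp. The names are made up for the example, and the edge handling is deliberately naive (at least two input samples are assumed, and the caller is assumed to size the output to match the rate ratio):

    #include <stddef.h>

    static float lerp(float a, float b, float t)
    {
        return (1.0f - t) * a + t * b;
    }

    /* Resample 'in' (inCount samples at inRate) into 'out'
       (outCount samples at outRate) using linear interpolation.
       Assumes inCount >= 2. */
    void resampleLerp(const float *in, size_t inCount, double inRate,
                      float *out, size_t outCount, double outRate)
    {
        for (size_t i = 0; i < outCount; i++) {
            double pos = i * (inRate / outRate);      /* position in input samples */
            size_t i0  = (size_t)pos;
            if (i0 >= inCount - 1) i0 = inCount - 2;  /* clamp near the end */
            float t = (float)(pos - (double)i0);
            out[i] = lerp(in[i0], in[i0 + 1], t);
        }
    }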

Another option is a spline; for example, one possible spline function:

splerp(a, b, c, d, t) = lerp(lerp(b, 2*b - a, t), lerp(2*c - d, c, t), t)

where a, b, c, d are the adjacent input samples, and the desired value lies between b and c.

2*b - a and 2*c - d are linear extrapolations; for example, c' = 2*b - a, where c' is the value of c as predicted by extrapolating from a and b.

We effectively form a pair of predictions, and then interpolate between them to get an answer.

The idea is basically that a series of points implies some curve passing through them, and it makes sense to be able to answer the question: given these points, approximately what is the value in between?
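The same spline written out in C, reusing the lerp helper from the earlier sketch:

    /* Spline interpolation as described above: a, b, c, d are four adjacent
       samples, and t in [0,1] places the result between b and c. */
    static float splerp(float a, float b, float c, float d, float t)
    {
        float p0 = lerp(b, 2.0f * b - a, t);  /* prediction extrapolated from a, b */
        float p1 = lerp(2.0f * c - d, c, t);  /* prediction extrapolated from d, c */
        return lerp(p0, p1, t);
    }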

This works pretty well if the input and output sample rates are "similar", but if there are considerable sample-rate differences (such as in a MIDI synth), the audio quality may suffer (due to "aliasing" or similar).

One strategy is to start with audio at a higher sample rate (say, 48 kHz or 44.1 kHz), and then recursively downsample it by factors of 1/2 (say, by averaging pairs of samples), creating versions of the audio at various sample rates:

44.1 kHz, 22.05 kHz, 11.025 kHz, 5.5125 kHz, ...

Then you can calculate approximately which target sample rate you need, interpolate the sample position within the adjacent sample rates, and then interpolate between those rates to get the desired sample (if you're familiar with the idea of mipmapping in graphics, this is very similar).
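A sketch of building such a chain of half-rate versions in C; the names (buildMipChain, MAX_LEVELS) are hypothetical, and real code would handle odd lengths and allocation failures more carefully:

    #include <stdlib.h>
    #include <string.h>
    #include <stddef.h>

    #define MAX_LEVELS 8

    /* Build successively half-rate copies of 'in' by averaging sample pairs.
       levels[0] is a copy of the original; levels[k] has counts[k] samples.
       Returns the number of levels built. */
    size_t buildMipChain(const float *in, size_t count,
                         float *levels[MAX_LEVELS], size_t counts[MAX_LEVELS])
    {
        levels[0] = malloc(count * sizeof(float));
        memcpy(levels[0], in, count * sizeof(float));
        counts[0] = count;

        size_t n = 1;
        while (n < MAX_LEVELS && counts[n - 1] >= 2) {
            size_t half = counts[n - 1] / 2;
            levels[n] = malloc(half * sizeof(float));
            for (size_t i = 0; i < half; i++)
                levels[n][i] = 0.5f * (levels[n - 1][2*i] + levels[n - 1][2*i + 1]);
            counts[n] = half;
            n++;
        }
        return n;
    }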

Most of this can be wrapped up in a function, though, as normally you don't want to deal with all of this every time you want a sample.

Example:

float patchSamplerInterpolate(patchSampler patch, double sampleBase, double targetRate);

where patchSampler here may represent a given piece of audio (a patch, waveform, or whatever term is used for it).
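One possible body for such a function, assuming the patch carries the mip chain built in the earlier sketch and that sampleBase is a position in level-0 samples; this is purely illustrative (reusing MAX_LEVELS, buildMipChain, and lerp from above), not how any particular engine does it:

    /* Hypothetical patch type wrapping the mip chain from buildMipChain. */
    typedef struct {
        float  *levels[MAX_LEVELS];
        size_t  counts[MAX_LEVELS];
        size_t  numLevels;
        double  baseRate;             /* sample rate of levels[0], in Hz */
    } patchSampler_s;
    typedef patchSampler_s *patchSampler;

    float patchSamplerInterpolate(patchSampler p, double sampleBase, double targetRate)
    {
        /* Pick the mip level whose rate best covers targetRate. */
        double ratio = p->baseRate / targetRate;
        size_t lvl = 0;
        while (lvl + 1 < p->numLevels && ratio >= 2.0) { ratio *= 0.5; lvl++; }

        /* Rescale the level-0 sample position into this level, then lerp.
           Assumes each level holds at least two samples. */
        double pos = sampleBase / (double)(1 << lvl);
        size_t i0  = (size_t)pos;
        if (i0 >= p->counts[lvl] - 1) i0 = p->counts[lvl] - 2;
        float t = (float)(pos - (double)i0);
        return lerp(p->levels[lvl][i0], p->levels[lvl][i0 + 1], t);
    }

A fuller version would also blend between the two adjacent levels, exactly as mipmapped texture sampling does.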

Other various thoughts:

Consider picking some unit other than samples as your unit of time measure; for example, it may make sense to do a lot of calculations in terms of seconds (when dealing with lots of audio at different sample rates, or with a lot of scaling, seconds can make more sense as the basic time unit).

Use double (and not float) for audio sample positions and time-based calculations; when it comes to sub-sample accuracy over time frames of many minutes or more, float doesn't really hold up well.

It may be useful to consider what happens when time values fall before the start or after the end of a given patch/waveform: for example, does it loop, or is it followed by silence? The interpolation function might take a flag indicating whether the sound is discrete (non-looping) or continuous (looping), and then generate sane values for out-of-range sample positions (the sketch after these notes shows one way to do this).

Potentially, various effects may be implemented as functions, either using raw sample arrays or building on top of abstracted interpolation functions (there are tradeoffs here: raw arrays can be faster, but tend to be a little more hairy/nasty).

Another tradeoff is, for example, whether to store audio data as 16-bit shorts or as floats.

In my case, I tend to use 16-bit PCM (or sometimes compressed representations) for storing raw audio data (sound effects, ...), but floating-point arrays for intermediate audio data (stuff currently being mixed, ...).
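A small sketch pulling a few of these notes together: a fetch helper that takes time in seconds as a double, handles out-of-range positions via a looping flag, and converts from 16-bit PCM storage to float for mixing (all the names here are hypothetical):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        const int16_t *samples;   /* raw audio stored as 16-bit PCM */
        size_t         count;     /* assumed nonzero */
        double         rate;      /* sample rate in Hz */
        int            looping;   /* 1 = wrap around, 0 = silence outside the clip */
    } clip_s;

    /* Fetch one sample at 'timeSec' seconds, as a float in roughly [-1, 1]. */
    float clipFetch(const clip_s *clip, double timeSec)
    {
        double pos = timeSec * clip->rate;    /* seconds -> sample position */
        long   idx = (long)pos;

        if (clip->looping) {
            idx %= (long)clip->count;
            if (idx < 0) idx += (long)clip->count;
        } else if (idx < 0 || idx >= (long)clip->count) {
            return 0.0f;                      /* out of range: silence */
        }
        return clip->samples[idx] * (1.0f / 32768.0f);
    }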

But, yeah, otherwise, dunno...

Thank you for all of the information. I see a little bit more how it works now; it is helpful.

I actually did a bit of this in class using C. We used arrays like you said; the problem is that, since it's mostly an artistic program of study, our professor had already done all the "talking to drivers" part, which is, I guess, the first thing I need to learn if I want to do audio on my own. My program needs to talk to some audio device.

I guess the same answer goes to Burnt_Fyr, thanks for your tips. I don't know about XAudio2; is it something used for what I call "talking to drivers"?

My experience is basically with high-level code and general knowledge about DSP. Like I said, we would do programming, but on premade code with all the low-level, close-to-the-machine parts already done. We would also make reverbs, flangers, and that kind of thing, but in Max/MSP, for example. I remade the ARP Odyssey using Csound, which is a little closer to C but with a lot already done for you (like the VCOs). So it's not like I don't have any clue about what goes on below the effect (I know more than just the effect on the sound and the waveform, basically), but I indeed don't have very deep knowledge (not at all) of how it's done in code when you start from an empty file.

And this is basically the gap I'd like to fill: from zero up to where my knowledge starts. Again, you might think "well, good luck", but it's not something I plan to do super fast.

