Audio Programming: Deeper than you might think

posted in A Keyboard and the Truth for project 96 Mill
Published July 23, 2009
Audio system development is a hugely under-discussed topic. Naively implemented audio systems can seriously hamper developers of non-trivial games and force them to settle for a less audio-rich game.

I'm going to talk about a few different aspects of audio in games; there is a surprising amount to cover.

Audio is state

Most naive implementations will simply provide a facility to play (via a command or function) a single audio file, or maybe a handful of simultaneous audio files.

playSound("mysound.ogg");

Once started, the sound (or sounds) will play until they are finished and then stop. For trivial games this is fine: tic-tac-toe, minesweeper, maybe even Tetris. But when we're talking about RPGs, adventure games, FPSs, or anything non-trivial, it won't fly, and using it won't impress anyone.

Why not?

Well, because there is a severe lack of control. It's the graphical equivalent of only being able to draw a maximum of 20 white squares on a black background; you'll see something, but it is horribly uninteresting. So, what kind of control do we need? First off, it would be nice if sounds could loop, which is very handy for background music or ambient audio... but then you would need a way to stop a looping sound... hmm... and that means you would need a way to reference a playing sound. It all gets complicated very quickly, but fear not, I can demystify it.

For developers who get past one-shot sound-playing commands, chances are they end up with something where you can play multiple sounds, specify each as one-shot or looping, and give each playing sound an alias so you can reference it and stop it later. This is certainly an improvement, but it still falls short.
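In code, that intermediate stage usually ends up looking something like the sketch below (the names SoundHandle, playSound, and stopSound here are just illustrative, not any particular engine's API):

// Hypothetical handle-based interface for the 'improved' stage described above.
typedef int SoundHandle;                                // opaque reference to a playing sound

SoundHandle playSound(const char* file, bool looping);  // start a sound, get a handle back
void        stopSound(SoundHandle handle);              // stop a sound started earlier

void example()
{
    // Start a looping ambient track and hang on to the handle...
    SoundHandle rain = playSound("rain_loop.ogg", true);
    // ...so we can stop it later when the scene changes.
    stopSound(rain);
}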

Disembodied Audio

In some games it is quite fun to hear the sounds of spirits from beyond; however, if your finely crafted NPC is whistling a merry tune and then walks off-screen, chances are his haunting whistling will remain even though he has gone.

Audio is state - revisited

Sound is never disembodied; something always causes it, an 'emitter' of sound. But the process of playing audio on a computer doesn't require this: a computer 'renders' sound, it does not own the sound emission. Sound familiar? To many, it will feel like the same realization that our graphics are not our characters, but rather a rendering of the character data.

Sounds should be members of your game state. For each object that lives in your game state, be it the World, a Room, or an Actor, the game should know what kinds of sounds it could potentially play (as constant data); but most importantly, the state should know whether any of those sounds are currently playing.
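As a rough sketch (the names here are mine for illustration, not actual Selenite code), the sound data attached to a state object might look like this; the channel pointer and the audible flag only come into play in the sections further down:

#include <vector>

struct Channel;  // opaque handle from the audio renderer (see 'Streaming' below)

// Constant data: which sounds an object *could* play.
// State data: whether each one is logically playing, and whether it is
// currently being rendered on a channel.
struct StateSound
{
    const char* file;     // audio file for this sound (constant)
    bool        loop;     // one-shot or looping (constant)
    bool        playing;  // is this sound logically playing? (state)
    Channel*    channel;  // non-null only while the sound is actually rendering
};

struct GameObject  // the World, a Room, an Actor...
{
    bool                    audible;  // filled in by the audibility test described below
    std::vector<StateSound> sounds;   // everything this object could emit
};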

If a tree falls in the woods and no one is around, does it make a sound?

No, it doesn't. The data of what sounds are Logically Playing should be kept in state. This is often very different from what sounds are actually heard.

Things which determine audibility:

  • Is the sound supposed to be playing? (sound logical state)

  • Is the player (viewer) within coarse audibility (in the same room)?

  • Is the actor within fine audibility of the player character (spatially, for 3D sound)?

  • Has the user enabled a particular sound category (music/sfx/voice)?

  • Is there an open sound channel to actually render another audio file?

  • Does the user even have sound hardware?



So, if a tree falls in our game and we're not in the same room, or even near it, the state says it makes a sound, but nothing is actually rendered. There is a lot of good wisdom in that old question.
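Rolled into code, that checklist might look something like the sketch below; the helper functions and the Player type are assumptions standing in for whatever your engine and platform layer actually provide:

struct Player;

bool soundHardwareAvailable();                                        // does the user even have a device?
bool categoryEnabled(const StateSound& sound);                        // music/sfx/voice toggles
bool inSameRoom(const GameObject& emitter, const Player& player);     // coarse audibility
bool withinEarshot(const GameObject& emitter, const Player& player);  // fine (spatial) audibility
bool freeChannelAvailable();                                          // can we render another file?

bool isAudible(const GameObject& emitter, const StateSound& sound, const Player& player)
{
    if (!sound.playing)                   return false;  // not logically playing
    if (!soundHardwareAvailable())        return false;
    if (!categoryEnabled(sound))          return false;
    if (!inSameRoom(emitter, player))     return false;
    if (!withinEarshot(emitter, player))  return false;
    return freeChannelAvailable();
}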

How Selenite Rolls

In Selenite, each first-class object can have sounds associated with it, and the audibility of those sounds depends on the object type.

Note: when I say 'sounds are heard' I mean barring user settings or hardware limitations.

  • Game - All playing sounds are always heard

  • Room - All playing sounds are always heard if the player's current room is this room.

  • Actor - All playing sounds are heard if the actor is within the current player's room and, for 3D sound, within close proximity.



What this means to the developer/designer

As the player changes the structure of the state (that is, changes the current room, the current actor, or which actors exist in which rooms), an audibility test is run on the state, and each state object is marked as audible or not.

Thus, as an actor walks into the room you're in, it is marked as audible. This marking then makes the object check whether any of its sounds are supposed to be playing; if they are, it will actually render them.

Similarly, if the actor walks out of the current room, it will be marked as inaudible, and any sounds it is rendering will cease to render, whether or not they are still logically playing.

This is very important for ease of development: you don't have to micro-manage the audio state. With this weight off our backs we can feel free to add lots more audio and create a richer environment.
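In sketch form, that structural-change pass might look like the code below; World, AudioRenderer, passesAudibilityRules, and allObjects are all assumptions, and markAudible is sketched in 'Tying it all together' further down:

// Re-test audibility for every state object whenever the player changes rooms
// or actors move between rooms.
void onStateStructureChanged(World& world, const Player& player, AudioRenderer& audio)
{
    for (GameObject* obj : world.allObjects())
    {
        bool nowAudible = passesAudibilityRules(*obj, player);  // the Game/Room/Actor rules above
        if (nowAudible != obj->audible)
            markAudible(*obj, nowAudible, audio);  // object acquires or releases channels
    }
}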


Audio Rendering

Now that we've talked exhaustively about how the concept of audio should be structured, we need to talk about how to actually render this audio. Not so much how to render it in great detail (most of you will know that you'll ultimately be sending PCM samples to the sound card via some API), but some high-level concepts.

Streaming

I am a big fan of streaming audio, mainly because, if you have digital music of significant length, you're going to need to stream; and streaming is very memory friendly. For SeleniteWin32 there exists a high-level streaming interface for audio, and here is how it works:

Requesting a Channel

When you've decided it is time to render some audio, you first request a channel. A channel in this case is an object or handle that represents a currently streaming audio file. When you request it, you pass in the audio file you would like loaded into it and whether or not it should loop. Let's deal with the worst case first: due to resource limitations it is very possible that you will get back a null handle. This is the audio renderer's way of telling you "I can't play this audio right now." In such a case you should honor it; as far as the state is concerned, your audio is still playing. This can happen under high audio load and is fine.

Assuming that you do get back a valid channel, your audio will now be playing and the channel is your responsibility.
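As a sketch (AudioRenderer, Channel, requestChannel, and friends are stand-ins for the real streaming interface, not its actual names), the request step might look like:

// Try to start rendering one logically playing sound.
void startRendering(StateSound& sound, AudioRenderer& audio)
{
    Channel* channel = audio.requestChannel(sound.file, sound.loop);
    if (channel == nullptr)
    {
        // The renderer is telling us "I can't play this right now" (all channels busy).
        // Honor it: the sound stays logically playing in the state; it just isn't
        // rendered at the moment. That's fine under heavy audio load.
        return;
    }
    // The file is now streaming; this channel is ours to watch and release.
    sound.channel = channel;
}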

Keeping an eye on the channel

Once you have a valid channel, you'll want to periodically check whether it is still rendering. If it is, no worries, business as usual; if it's not, it means your sound has completed (this will never happen for looping sounds). At that point you should mark your state sound as no longer playing, release the channel, and raise any 'sound done' events you'd like.

Releasing a Channel

Releasing a channel, playing or not, puts it back into the pool of available channels (I use 16 channels). The idea is that while your state may have hundreds or thousands of logically playing sounds, at any given time you should only need to hear about 16 of them; any more than that would likely be a great cacophony.
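Per update, the watch-and-release step might look like this, using the same made-up API as above:

// Poll a channel we own; when a one-shot finishes, fix up the logical state
// and return the channel to the pool.
void updateSound(StateSound& sound, AudioRenderer& audio)
{
    if (sound.channel == nullptr)
        return;                               // nothing rendering for this sound

    if (!audio.isRendering(sound.channel))    // finished (never true for looping sounds)
    {
        sound.playing = false;                // logical state: this sound is done
        audio.releaseChannel(sound.channel);  // back into the pool of available channels
        sound.channel = nullptr;
        // fire any 'sound done' events here
    }
}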


Tying it all together

As a state object is marked audible, it goes through all of its sounds and, for the ones that should be playing, requests a channel for each. It checks these channels each update loop to see if they've stopped; if they have, it marks those sounds as not playing and releases their channels. If the object gets marked as inaudible (say, a character leaves the room), all valid channels of its sounds are swiftly released, but the sound states are kept as they were.
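Put together, the per-object reaction to being marked audible or inaudible is roughly the sketch below; this is the markAudible the audibility pass above calls, with illustrative names throughout:

void markAudible(GameObject& obj, bool nowAudible, AudioRenderer& audio)
{
    obj.audible = nowAudible;
    for (StateSound& s : obj.sounds)
    {
        if (obj.audible)
        {
            // Became audible: try to actually render every logically playing sound.
            if (s.playing && s.channel == nullptr)
                s.channel = audio.requestChannel(s.file, s.loop);  // may come back null; that's fine
        }
        else if (s.channel != nullptr)
        {
            // Became inaudible: stop rendering, but leave s.playing exactly as it was.
            audio.releaseChannel(s.channel);
            s.channel = nullptr;
        }
    }
}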



As far as I can tell this is the only post I've ever seen on audio programming of this nature; I'd love to hear people's opinions on it, and about the systems you use.

Comments

QRebound
How do you handle prioritizing sounds? For example, say 16 unique sounds are playing (possibly looping), (1 from the game, 1 from the room, and 14 different actors, or whatever). All of a sudden, an actor enters screaming at the top of his lungs. Shouldn't this force another emitter to give up its channel so that the scream can be heard?
July 23, 2009 01:47 PM
EDI
While a priority system could be used, in most games the likelihood is that 16 simultaneously audible sounds would result in such a racket that adding another wouldn't likely be noticed. I've personally never encountered a situation with 16 simultaneous sounds where I could pick out not hearing the 17th :D

But good point, and for those concerned: by adding a simple 'must be heard' flag and time-stamping channel acquisitions, if a must-be-heard sound needs to realize itself you could invalidate a non-must-be-heard channel to fulfill it.


Addition:

You can also use a higher number of channels; some folks might create new channels on demand, with a higher limit, say 32, 64, or 128.

You could also implement channel-acquisition qualifiers: if more than 4 channels are already playing a single sound, don't give out another channel for the same sound.
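Something along these lines, roughly (every name here is made up for illustration, not real code):

// Time-stamp acquisitions inside the renderer; when an important sound can't
// get a channel, evict the oldest channel that isn't itself flagged.
Channel* requestMustBeHeard(AudioRenderer& audio, const char* file, bool loop)
{
    Channel* c = audio.requestChannel(file, loop);
    if (c == nullptr)
    {
        Channel* victim = audio.oldestChannelNotMustBeHeard();  // hypothetical query
        if (victim != nullptr)
        {
            audio.releaseChannel(victim);
            c = audio.requestChannel(file, loop);
        }
    }
    if (c != nullptr)
        audio.flagMustBeHeard(c);  // so this channel won't be evicted in turn
    return c;
}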
July 23, 2009 06:57 PM
EDI
Anyone else? Anyone look at this and say:

"Damn, that's a good idea!"

or

"Heh. this guy doesn't know what he's talking about."


maybe

"Duh, everyone knows this already."


July 24, 2009 07:57 AM
QRebound
Well, say these 16 sounds consist of:
game music A (outside, fading away)
game music B (inside the bar, fading in)
flickering of a torch
someone walking across the room
someone in the corner mumbling under their breath
GUI sounds, the "hover over" chimes that seem to be popular
Someone's chair scraping back as they stand up
the chink of coins as they pass hands at the bar
someone sipping from their mug
the bartender asking what someone wants to drink
that person responding
the door to the bar opening
random hustle bustle from outside (carts etc)
the town clock ringing 8 o'clock
your own character talking about what he needs to do for a quest
the player is dropping his armor so it makes a clink on the ground

All of these are low volume (thus low-priority sounds), but even with all of these playing I should hear the lady who bursts into the bar screaming that orcs are attacking. Just thought I should bring it up. The "must be heard" flag is a bit too binary for me. I'd probably implement a byte that stored a 0 to 255 "priority"; a new sound finds the lowest-priority sound, and if that sound's priority is lower than its own, it force-stops that sound and plays itself. Or it might even use some audio rendering to mix them into the same stream to play. *shrug* Just speculating here. (Sorry for horrible punctuation/capitalization, this keyboard is stiff and difficult to type on.)
July 24, 2009 02:22 PM
EDI
Quote:Original post by QRebound
Well, say these 16 sounds consist of:


First off, this scenario is highly unlikely, both time-wise and content-wise.

Quote:Original post by QRebound
game music A (outside, fading away)


Extremely transient, 1-2 seconds at most?


Quote:Original post by QRebound
game music B (inside the bar, fading in)


Okay, game music; that's 1.


Quote:Original post by QRebound
flickering of a torch


Normally baked into a single 'environment soundscape', but I'll give you this one; that's 2.

Quote:Original post by QRebound
someone walking across the room


This one is valid, though if we're talking about a lot of folks, these sounds should be finely spatially culled (motion can influence channel acquisition); having 20 people all moving at once would be a much greater cacophony than in real life.
That's 3.

Quote:Original post by QRebound
someone in the corner mumbling under their breath


Who are you, Spider-Man? This would be an extremely small audibility radius for fine positioning. Not giving you this one.

Quote:Original post by QRebound
GUI sounds, the "hover over" chimes that seem to be popular


Sure, though extremely transient, 1 second or less usually; that's 4.

Quote:Original post by QRebound
Someone's chair scraping back as they stand up


This level of detail is normally not included in games; this would again be done with a 'bar scene' soundscape.

Quote:Original post by QRebound
the chink of coins as they pass hands at the bar


Again, way too much detail; at the very least you should have to be right next to them.

Quote:Original post by QRebound
someone sipping from their mug
the bartender asking what someone wants to drink
that person responding


I'm not even going to go into these, ditto.


Quote:Original post by QRebound
the door to the bar opening


Valid; that's 5.


Quote:Original post by QRebound
random hustle bustle from outside (carts etc)


*melodic* Sooouunnndddssscaaapppeee :)

Quote:Original post by QRebound
the town clock ringing 8 o'clock


Maybe; we've got sounds left, so I'll give you this one; that's 6.


Quote:Original post by QRebound
your own character talking about what he needs to do for a quest
the player is dropping his armor so it makes a clink on the ground


Talking while dropping your armor? I don't think so. Possible, but I don't think so.
I'll give you one; that's 7.

That's seven extremely unlikely simultaneous occurrences, and you still have a good surplus; and as I mentioned, a sound channel cache of 32 is perfectly acceptable.

Quote:Original post by QRebound
All of these are low volume (thus low-priority sounds),


If you know of a game with this kind of sonic detail, I'd love to know about it.


Quote:Original post by QRebound
but even with all of these playing I should hear the lady who bursts into the bar screaming that orcs are attacking. Just thought I should bring it up. The "must be heard" flag is a bit too binary for me. I'd probably implement a byte that stored a 0 to 255 "priority"; a new sound finds the lowest-priority sound, and if that sound's priority is lower than its own, it force-stops that sound and plays itself. Or it might even use some audio rendering to mix them into the same stream to play. *shrug* Just speculating here. (Sorry for horrible punctuation/capitalization, this keyboard is stiff and difficult to type on.)


The issue is that you're offloading the concept of priority onto the game developer; with an 8-bit scale things can get muddied really quickly... (is this a 128 sound or a 127 sound, hmmm?).

I deem this an unlikely scenario, potentially solved by a larger channel cache or grow-only channels, without the need to bother the developer with sound priority on any given sound.
July 24, 2009 10:13 PM
QRebound
=P Like I said, I was speculating. Plus, my examples weren't the most solid because I honestly have never written a single piece of code for sound. Just seemed like something that might be an issue. For example, in my project, an RTS, there could be lots of battle sounds going on that can't be baked into a single environment sound, but a "5 minutes remaining" announcement or a super weapon charging up would need to be heard. I was exploring avenues that I might want to go down later; I have yet to tackle sound in my design, so we'll see.
July 29, 2009 01:11 AM
EDI
I think your concerns were/are very common ones. I remember way back when (2001 maybe?) using wavemix32.dll, which offered only 8 channels. I said to myself, "8 channels isn't nearly enough!"; and while I never implemented anything robustly multi-sound with wavemix, the fear stuck with me.

The idea is that generally you only need channels for the sounds which truly should be audible; with 8, 16, or 32 sounds all playing at once it will sound like a mess anyway.

You will actually likely find that the logically ideal method of making each and every soldier in an army of 1,000 have their own footstep sounds simply won't work for the game world; in such a scenario, having this 'mob' governed by some sort of controller which deals with 'group dynamics' would be more likely.

As I said before, a binary flag can be used to tag sounds important to the scenario.
July 29, 2009 07:52 AM