
Compressed sound, what format to use?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

15 replies to this topic

#1 happytrooper   Members   -  Reputation: 102

Posted 02 January 2012 - 03:21 AM

Hi!

I'm about to start implementing support for playing compressed sound for our game engine. Unsurprisingly we're on a tight budget both performance-wise and memory-wise. I've been looking at the ADPCM-wav-compression and it could be a possible solution, although it would require a lot of extra hacking. We use OpenAL, and unfortunately it doesn't natively support ADPCM which would mean we'd have to 'manually' decompress the sound before sending it to OpenAL.

Right, so my question to you is: what sound compression would you recommend we use? As I said, performance is a big thing for us, so it can't have a large impact on that.

Cheers!
/ Freddy

PS: an audio section here in the forums would be nice!

#2 prh99   Members   -  Reputation: 501

Posted 02 January 2012 - 12:08 PM

Have you looked at Ogg Vorbis from xiph.org?
Patrick

#3 jeroenb   Members   -  Reputation: 257

Posted 02 January 2012 - 01:06 PM

I would also vote for the Ogg Vorbis format. It is a good format that is easy to use in combination with OpenAL. The encoder and decoder are also not encumbered by weird licenses, so you can use them freely.

Crafter 2D: the open source 2D game framework

Blog: Crafter 2D
Twitter: @crafter_2d


#4 Sik_the_hedgehog   Crossbones+   -  Reputation: 1601

Posted 02 January 2012 - 11:50 PM

While I would also recommend Vorbis (what's usually miscalled OGG, we're talking about the encoding format here, guys), take into account he is worried about performance. I guess he's expecting a large amount of sound sources all having to get decompressed at the same time.

I have used Vorbis in the past in some games to store both the background music and sound effects. By this I mean the sound data was loaded into memory as-is (still Vorbis) and then decompressed during playback. This was done to save memory (even though it may have been overkill for sound effects). It didn't seem to have any impact on performance at all (and it was quite an old computer by today's standards: the CPU was a 2.4 GHz Pentium IV, and the game was software rendered), but then again at most you had the background music and a couple of sound effects going on.

I have absolutely no idea of a good compression format that has good decompression performance. ADPCM is very fast to decompress, but the compression ratio isn't all that good in comparison to newer encoding methods.

EDIT: also it may be worth a shot to use Vorbis for background music and something more lightweight for sound effects. There would be only one music track playing, so performance isn't much of an issue there, while sound effects are short, so space usage isn't much of an issue there. Sounds like it could be a good trade-off (even better would be to let users specify what format to use for each sound source if possible, ideally explaining the best way to use each format).

EDIT 2: also in case you wonder, ADPCM is fast. I know of a homebrew sound engine that can do ADPCM playback at 22 kHz on a 3.58 MHz Z80 (and that's an 8-bit processor!), so in the case of ADPCM performance would be the least of your worries. Just don't expect all that much compression from it (although it's still significant compared to uncompressed).
Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#5 happytrooper   Members   -  Reputation: 102

Posted 04 January 2012 - 09:04 AM

Thank you all for your replies!

Sik, you're spot on. We have a lot of sound loaded into memory, and all the larger sound files are already Vorbis-compressed. It's all the smaller (400 KB and less) uncompressed sounds we're trying to reduce the size of. I did some testing and came to the conclusion that Ogg Vorbis is not a viable solution for us, as it would be too expensive performance-wise to decode.

ADPCM seems to be the way to go here. We'd be able to compress all of our uncompressed WAVs down to 1/4 of their size, which is quite nice. But, as I mentioned, we're using OpenAL, which doesn't support ADPCM on Windows. This means I'll have to write a hack that decodes the ADPCM sound before sending it to OpenAL.

Please reply if you think there's anything I should keep in mind when doing this.

Again, thank you all!

#6 Matias Goldberg   Crossbones+   -  Reputation: 3148

Posted 04 January 2012 - 10:44 AM

Hi, it would help if you tell us what kind of CPU budget you have (a Pentium II processor? an Intel Core 2 Duo? Single or Dual core? An ARM for cellphones? iPhone? Android?) and what is your memory budget (16MB? 128MB? 256? 512? 4GB?) and how much memory you're using already without sounds loaded, and how much time in seconds you have in sounds.

I personally use raw PCM for sounds (most of them are

Edit: Somehow half of my post was cut. A GD.Net bug? I have to go, I'll repost later. Please still answer the question above.

#7 GameCodingNinja   Members   -  Reputation: 162

Posted 04 January 2012 - 11:21 AM

Hi, it would help if you tell us what kind of CPU budget you have (a Pentium II processor? an Intel Core 2 Duo? Single or Dual core? An ARM for cellphones? iPhone? Android?)


If it's a Windows app, I would look into XACT. It doesn't matter what the source file format is, because everything is compressed down into an XACT-specific wave bank file (*.xwb). The amount of compression can be set.

#8 Ed Welch   Members   -  Reputation: 471

Posted 04 January 2012 - 01:19 PM

I wrote some code to decode ADPCM, but I found a bad flaw in the specification. It introduces unnecessary padding at the end of the file.

#9 Sik_the_hedgehog   Crossbones+   -  Reputation: 1601

Posted 04 January 2012 - 03:30 PM

Please reply if you think there's anything I should keep in mind when doing this.

Well, here's another suggestion but I guess you'll kill me for this =P

Basically you could try reducing the quality of sound effects when size becomes problematic. People can't distinguish between 8-bit and 16-bit samples unless they're audiophiles, and many sound effects can be downsampled without much distortion (how much you can downsample depends on the sound effect - low pitched ones aren't affected much, high pitched ones are less tolerant). I have tried this before and it worked pretty well.
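
A minimal sketch of the 16-bit to 8-bit reduction suggested above, assuming signed 16-bit source samples and an unsigned 8-bit target (the layout 8-bit WAV data uses). The helper name pcm16_to_pcm8 is made up for illustration, and plain truncation like this adds quantization noise, so it is worth listening to the result per sound effect.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert signed 16-bit PCM to unsigned 8-bit PCM.
   Halves the memory used by a sound effect at the cost of precision. */
void pcm16_to_pcm8(const int16_t *src, uint8_t *dst, size_t count)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < count; ++i) {
        /* Move the range from [-32768, 32767] to [0, 65535],
           then keep only the top 8 bits. */
        dst[i] = (uint8_t)(((int32_t)src[i] + 32768) >> 8);
    }
}
```

Silence (0 in 16-bit) maps to 128 here, which is the midpoint of the unsigned 8-bit range.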

One thing to take into account though is that if you downsample you should avoid interpolation at all costs, because that's what makes the sounds worse. It doesn't help that interpolation usually happens at the hardware level, so it's hard to avoid... Generally you avoid it by setting the audio output to a higher sample rate and then repeating samples when playing back (e.g. if the sound is 11025 Hz and the output is 44100 Hz you'd repeat each sample four times). This ensures the audio output sounds clear and not muffled.
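
The sample-repetition idea could be sketched roughly like this, assuming 16-bit mono samples and an output rate that is an exact integer multiple of the source rate (as in the 11025 Hz to 44100 Hz case). The function name upsample_repeat is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: upsample by repeating each source sample `factor`
   times (no interpolation). dst must have room for src_count * factor
   samples. Returns the number of samples written. */
size_t upsample_repeat(const int16_t *src, size_t src_count,
                       int16_t *dst, size_t factor)
{
    assert(src != NULL && dst != NULL && factor > 0);
    size_t out = 0;
    for (size_t i = 0; i < src_count; ++i) {
        for (size_t r = 0; r < factor; ++r) {
            dst[out++] = src[i];
        }
    }
    return out;
}
```

For the 11025 Hz sound played on a 44100 Hz output, factor would be 4.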

Besides that, yeah, not much to say. ADPCM is extremely fast to process so you probably shouldn't worry all that much about it, and most likely Vorbis is being decompressed in software on most computers anyways.

#10 happytrooper   Members   -  Reputation: 102

Posted 25 January 2012 - 11:01 AM

Right!

So I've implemented a decoder (http://wiki.multimedia.cx/index.php?title=IMA_ADPCM) to decode my IMA ADPCM sound.. buuuut the sound gets errrmm.. I'll let this picture talk for me:

Posted Image

See the 'chunks' in the wave? (Oh, and it sounds like it looks btw) The first 'chunk' is perfect but then it goes dooownhill.. Any clue from just looking at this what I might be doing wrong? If not I'll post some code.

Cheers!

#11 Adam_42   Crossbones+   -  Reputation: 2442

Posted 25 January 2012 - 12:11 PM

I suspect you're not resetting the parameters at the start of each block. See http://wiki.multimedia.cx/index.php?title=Microsoft_IMA_ADPCM for how it does that for MS ADPCM.

#12 happytrooper   Members   -  Reputation: 102

Posted 30 January 2012 - 06:00 AM

I suspect you're not resetting the parameters at the start of each block. See http://wiki.multimed...osoft_IMA_ADPCM for how it does that for MS ADPCM.


That sounds about right! Thanks for your reply! Although... I'm having a bit of trouble getting my head around this. Does this mean I should do the decoding in chunks? Right now I'm applying the decoding algorithm to the entire data:

int decode(int16* dst, const uint8* src, uint srcSize)

where srcSize is the size of the encoded data in bits and my predictedValue and stepIndex continuously follow through to the next nibble-iteration. Do I need to instead do this per block (block align in fmt-chunk?)?

perhaps something like this:
int decode(int16* dst, const uint8* src, int srcOffset, uint srcSize)

where srcOffset is the current offset into the source data?

Thanks in advance!

#13 Adam_42   Crossbones+   -  Reputation: 2442

Posted 30 January 2012 - 06:37 AM

That sounds about right to me. Note that the MS-ADPCM format stores the predictedValue and stepIndex at the start of each block. I suspect your data will be similar.

You should be able to work out how much extra header data there is - if there was no block header the compression ratio would be exactly 4:1.
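
To put numbers on that, here is a small sketch of the arithmetic, assuming mono Microsoft IMA ADPCM where each block is a 4-byte header followed by 4-bit codes, and where the header's predictor counts as the block's first sample. The function names and the 512-byte block size used below are only illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Samples produced by one mono block of `block_align` bytes:
   one sample from the header predictor, plus two per data byte. */
size_t ima_samples_per_block(size_t block_align)
{
    assert(block_align >= 4);
    return 1 + (block_align - 4) * 2;
}

/* Ratio of decoded 16-bit PCM bytes to encoded bytes for one block. */
double ima_compression_ratio(size_t block_align)
{
    size_t pcm_bytes = ima_samples_per_block(block_align) * 2;
    return (double)pcm_bytes / (double)block_align;
}
```

With a 512-byte block this gives 1017 samples, i.e. 2034 bytes of 16-bit PCM per 512 encoded bytes, so slightly under 4:1 because of the header.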

#14 happytrooper   Members   -  Reputation: 102

Posted 09 February 2012 - 09:47 AM

The audio I'm working with comes from .wav-files which I presume means I should follow this: http://wiki.multimed...osoft_IMA_ADPCM
I'm a bit confused on the "This field reveals the size of a block of IMA-encoded data." and that it then says "an individual chunk of data begins with the following preamble:". I'm guessing they both refer to the same thing?

So, from what I've interpreted I should do something like this:

foreach(adpcm_block in raw_audio_data)
{
  var predictedValue = bytes 0 - 1 of adpcm_block
  var stepIndex = byte 2 of adpcm_block
  // ignore byte 3 of adpcm_block
  foreach(4bit nibble in adpcm_block) // this would start on byte 4 (the fifth byte) in the block?
  {
    decompress(nibble)
  }
}

Does that look correct? If it does I'll have to post some code because I can't get it bloody right!
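
For comparison, a C sketch of that per-block loop for mono data. It is not taken from anyone's engine. It assumes the Microsoft IMA ADPCM layout from the wiki page (2-byte little-endian predictor, 1-byte step index, 1 reserved byte, then nibbles), that the low nibble of each byte is decoded first, that the header predictor is emitted as the first sample of the block, and that each block is nBlockAlign bytes long. All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard IMA ADPCM index adjustment and step size tables. */
static const int8_t kIndexTable[16] = {
    -1, -1, -1, -1, 2, 4, 6, 8,
    -1, -1, -1, -1, 2, 4, 6, 8
};

static const int16_t kStepTable[89] = {
    7, 8, 9, 10, 11, 12, 13, 14, 16, 17,
    19, 21, 23, 25, 28, 31, 34, 37, 41, 45,
    50, 55, 60, 66, 73, 80, 88, 97, 107, 118,
    130, 143, 157, 173, 190, 209, 230, 253, 279, 307,
    337, 371, 408, 449, 494, 544, 598, 658, 724, 796,
    876, 963, 1060, 1166, 1282, 1411, 1552, 1707, 1878, 2066,
    2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871, 5358,
    5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
    15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

/* Decode one 4-bit code, updating the running predictor and step index. */
static int16_t ima_decode_nibble(uint8_t nibble, int32_t *predictor, int *index)
{
    int32_t step = kStepTable[*index];
    int32_t diff = step >> 3;
    if (nibble & 4) diff += step;
    if (nibble & 2) diff += step >> 1;
    if (nibble & 1) diff += step >> 2;
    if (nibble & 8) *predictor -= diff; else *predictor += diff;

    /* Clamp to the 16-bit sample range. */
    if (*predictor > 32767) *predictor = 32767;
    if (*predictor < -32768) *predictor = -32768;

    *index += kIndexTable[nibble];
    if (*index < 0) *index = 0;
    if (*index > 88) *index = 88;
    return (int16_t)*predictor;
}

/* Decode one mono block. The predictor and step index are reset from the
   block header, which is the step that is missing when the whole data
   chunk is decoded as one stream. dst needs room for
   1 + (block_size - 4) * 2 samples. Returns samples written. */
size_t ima_decode_block_mono(const uint8_t *block, size_t block_size, int16_t *dst)
{
    assert(block != NULL && dst != NULL && block_size >= 4);
    int32_t predictor = (int16_t)(uint16_t)(block[0] | (block[1] << 8));
    int index = block[2];            /* block[3] is reserved and ignored */
    if (index > 88) index = 88;

    size_t out = 0;
    dst[out++] = (int16_t)predictor; /* assumption: header sample is output */
    for (size_t i = 4; i < block_size; ++i) {
        dst[out++] = ima_decode_nibble(block[i] & 0x0F, &predictor, &index);
        dst[out++] = ima_decode_nibble(block[i] >> 4, &predictor, &index);
    }
    return out;
}

/* Walk the data chunk one block at a time; block_align is nBlockAlign
   from the fmt chunk. A trailing partial block is decoded as far as it goes. */
size_t ima_decode_mono(const uint8_t *src, size_t src_size,
                       size_t block_align, int16_t *dst)
{
    assert(block_align >= 4);
    size_t out = 0;
    for (size_t off = 0; off + 4 <= src_size; off += block_align) {
        size_t n = src_size - off;
        if (n > block_align) n = block_align;
        out += ima_decode_block_mono(src + off, n, dst + out);
    }
    return out;
}
```

The caller is expected to size dst for every block before calling ima_decode_mono. Stereo data interleaves the channels differently and is not handled here.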

#15 phantom   Moderators   -  Reputation: 7047

Posted 09 February 2012 - 12:20 PM

I don't think block and chunk are the same thing, if you follow the link to the WAVEFORMATEX structure details and then onto the MSDN page you'll find the following;

Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, nBlockAlign must equal (nChannels × wBitsPerSample) / 8. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.


So for a 16bit stereo sample you get (2 x 16) / 8 or 4 bytes per sample.

A 'chunk' on the other hand looks to be header + audio data, which will of course be larger than 4 bytes ;)
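
One way to check what nBlockAlign actually is for a given file is to read it straight out of the fmt chunk. Below is a rough sketch, assuming the first 16 bytes of the fmt chunk body follow the usual WAVEFORMATEX field order in little-endian. The struct and function names are invented for this example, and the two format tag values are the standard ones for PCM (0x0001) and IMA ADPCM (0x0011). For IMA ADPCM files the value found there is normally the size of one encoded block (preamble plus nibbles), not a per-sample size as in the PCM formula.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    FORMAT_TAG_PCM = 0x0001,       /* WAVE_FORMAT_PCM */
    FORMAT_TAG_IMA_ADPCM = 0x0011  /* WAVE_FORMAT_IMA_ADPCM */
};

/* Hypothetical container for the fields of interest. */
typedef struct {
    uint16_t format_tag;
    uint16_t channels;
    uint32_t samples_per_sec;
    uint32_t avg_bytes_per_sec;
    uint16_t block_align;
    uint16_t bits_per_sample;
} WaveFormat;

static uint16_t read_u16le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse the body of a fmt chunk (the bytes after the chunk id and size).
   Returns 1 on success, 0 if the body is too short. */
int parse_fmt_chunk(const uint8_t *body, size_t size, WaveFormat *out)
{
    assert(body != NULL && out != NULL);
    if (size < 16) {
        return 0;
    }
    out->format_tag        = read_u16le(body + 0);
    out->channels          = read_u16le(body + 2);
    out->samples_per_sec   = read_u32le(body + 4);
    out->avg_bytes_per_sec = read_u32le(body + 8);
    out->block_align       = read_u16le(body + 12);
    out->bits_per_sample   = read_u16le(body + 14);
    return 1;
}
```

If format_tag comes back as FORMAT_TAG_IMA_ADPCM, block_align is the number to step through the data chunk with.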

#16 happytrooper   Members   -  Reputation: 102

Posted 09 February 2012 - 04:37 PM

Hey phantom and thanks for your reply!

I don't think block and chunk are the same thing, if you follow the link to the WAVEFORMATEX structure details and then onto the MSDN page you'll find the following;


Block alignment, in bytes. The block alignment is the minimum atomic unit of data for the wFormatTag format type. If wFormatTag is WAVE_FORMAT_PCM, nBlockAlign must equal (nChannels × wBitsPerSample) / 8. For non-PCM formats, this member must be computed according to the manufacturer's specification of the format tag.


So for a 16bit stereo sample you get (2 x 16) / 8 or 4 bytes per sample.

A 'chunk' on the other hand looks to be header + audio data, which will of course be larger than 4 bytes ;)


That's exactly what caused my confusion in the first place! Pulling apart wav-files I've been extracting various chunks (look here), which makes the following

If the IMA data is monaural, an individual chunk of data begins with the following preamble:
bytes 0-1: initial predictor (in little-endian format)
byte 2: initial index
byte 3: unknown, usually 0 and is probably reserved


make no sense to me, at least if they are referring to the header+data chunks. But I would think they're actually referring to the blocks. On the other hand, I heard/read (can't remember where) that IMA ADPCM cannot be decompressed per block, which would contradict each block having a header. That could've been in a dream, though... CONFUSED!

Edit: Uh, I'm running so many parallel possible solutions to this little problem. I'll go back to the one I was working on when posting the images above and apply my new knowledge. This thread can be put on ice for now.



