fixed block audio codec...

well, this thing has eaten up several days of work thus far, but it has at least mostly moved from core algorithm-level work to cleaning up the code (cleanup being needed because it was "designed" largely via iteration and experimentation).

the most basic idea is doing something like DXTn / BCn, but for audio.
this effectively means: fixed number of samples into fixed-size blocks.

it was initially planned to be like DXT, with only a single universal type of block, but experiments showed that no single block encoding really worked well "for everything".

so, core idea:
a fixed number of mono or stereo samples (currently 64) is encoded into a fixed-size block (currently 256 bits, or 32 bytes). this works out to 4 bits per sample (for both mono and stereo), and a bit-rate of about 176 kbps (176.4 kbps) for 44.1 kHz audio.
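the arithmetic behind those numbers can be written out as a small helper. this is only a sketch: the function and parameter names are made up for illustration and are not taken from the actual codec source.

```c
#include <assert.h>

/* Bit rate in bits per second for a fixed-block codec.
   The function and parameter names are illustrative assumptions,
   not names from the author's code. */
static double block_bitrate(double sample_rate_hz,
                            int samples_per_block,
                            int bits_per_block)
{
    assert(sample_rate_hz > 0 && samples_per_block > 0 && bits_per_block > 0);

    /* 256 bits / 64 samples = 4 bits per sample position */
    double bits_per_sample = (double)bits_per_block / samples_per_block;

    return sample_rate_hz * bits_per_sample;
}
```

with the current layout, block_bitrate(44100.0, 64, 256) gives 176400 bits per second, which is the 176 kbps figure quoted above.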

part of the idea here is to allow fast random access to sample blocks, without needing to first decode all of the audio (instead, a fixed-size cache can probably be used). each block can then be decoded independently and stored in the cache. also, a design priority was that everything be a nice power-of-2 size.
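as a rough sketch of why the power-of-2 sizes help with random access: finding the block for any sample is just a divide and a multiply (which reduce to shifts). the direct-mapped cache slot function is an assumption about how such a cache might be indexed, and all names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_SAMPLES = 64, BLOCK_BYTES = 32 };

/* Where a given sample lives in the encoded stream (hypothetical helper type). */
typedef struct {
    size_t   block_index;      /* which 32-byte block holds the sample */
    size_t   byte_offset;      /* where that block starts in the stream */
    unsigned sample_in_block;  /* 0..63, position inside the decoded block */
} BlockPos;

static BlockPos locate_sample(size_t sample_index)
{
    BlockPos p;
    p.block_index     = sample_index / BLOCK_SAMPLES;
    p.byte_offset     = p.block_index * BLOCK_BYTES;
    p.sample_in_block = (unsigned)(sample_index % BLOCK_SAMPLES);
    return p;
}

/* Assumed cache policy: direct-mapped, with a power-of-2 number of slots,
   so the slot is just the low bits of the block index. */
static unsigned cache_slot(size_t block_index, unsigned slot_count)
{
    assert(slot_count != 0 && (slot_count & (slot_count - 1)) == 0);
    return (unsigned)(block_index & (slot_count - 1));
}
```

a mixer wanting sample N would locate the block, check the cache slot, and decode the 32 bytes at byte_offset only on a miss.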

other options would have drawbacks:
for example, Ogg/Vorbis and MP3 aren't really well suited to random-access (likely requiring storing sound-effects decoded in advance into PCM form);
many traditional ADPCM variants also have a similar issue.

in both cases, it implies either linear stream-style playback, or needing to decode in advance.

also, if storing stereo audio, ADPCM (at 4 bits per sample) requires about 352 kbps at 44.1 kHz (because it stores the left and right channels separately, rather than using joint-stereo).

staying at <= the bitrate of (mono) ADPCM, ideally with comparable or better audio quality, was a goal.
setting aside the exact audio quality, the bitrate goal does seem to have been met.

(I can't say I have beaten ADPCM quality, but can at least say that in its present form it is not significantly worse...).
(my initial code I put up was just after I had started getting it working, and has considerably worse sound quality than the current form...).

/*
Experimental block-based audio codec.

Encodes blocks of 64 samples into 256 bits (32 bytes).
At 44.1kHz this is 176kbps.

It can encode stereo using a "naive joint stereo" encoding.
Most block formats will encode a single center channel and will offset it for the left/right channel.

Basic Format 0:
    4 bit: Block-Mode (0)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    64 Samples, 1 bit/sample (64 bits)
    16x 4-bit min (64 bits)
    16x 4-bit max (64 bits)

The 4-bit values interpolate between the full min/max for the block.
The 1-bit samples select between the min and max value for each sample.

Note: Interpolated values are linear, thus 0=0/15, 1=1/15, 2=2/15, ..., 14=14/15, 15=15/15

Bit packing is in low-high order, and multibyte values are little-endian.

Basic Format 1:
    4 bit: Block-Mode (1)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    32x 2-bit sample (64 bits)
    32x 4-bit sample (128 bits)

This directly codes all samples, with the 4-bit values encoding even samples, and the 2-bit values encoding odd samples.
The 4-bit samples are encoded between the block min/max values, and the 2-bit samples between the prior/next sample.

Sample interpolation (2 bit samples):
    0=prior sample, 1=next sample, 2=average, 3=quadratic interpolated value.

Basic Format 2:
    4 bit: Block-Mode (2)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    32x 6-bit samples (192 bits)

This directly codes samples, with the 6-bit values encoding samples.
The 6-bit samples are encoded between the block min/max values.
This mode encodes even samples, with odd-samples being interpolated.
The last sample is extrapolated.

Stereo Format 3:
    4 bit: Block-Mode (3)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    32x 2-bit pan (64 bits)
    32x 4-bit sample (128 bits)

This directly codes samples, with the 4-bit values encoding even samples.
The 2-bit pan value encodes the relative pan of the sample.
The 4-bit samples are encoded between the block min/max values.

The 2-bit pan values represent:
    0=center pan (offset): The sample will be offset for left/right channels.
    1=center-pan (duplicate): The sample will be the same (center) value for both channels.
    2=left-pan: The sample will be panned towards the left.
    3=right pan: The sample will be panned towards the right.

This mode encodes even samples, with odd-samples being interpolated.

Basic Format 4:
    4 bit: Block-Mode (4)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    8x 4-bit min (32 bits)
    8x 4-bit max (32 bits)
    64x 2-bit sample (128 bits)

The 4-bit values interpolate between the full min/max for the block.
The 2-bit samples interpolate between the min and max values for each sub-block (0=min, 1=1/3, 2=2/3, 3=max).
*/
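to make the layout a bit more concrete, here is a minimal sketch of reading the shared 64-bit header and decoding the center channel of a Format 4 block. several things are assumptions on my reading of the spec rather than confirmed details: that the fields are packed back-to-back in the listed order, that the 16-bit and 8-bit fields are signed, that the 8 sub-blocks each cover 8 consecutive samples, and that integer division is used for the interpolation. the left/right reconstruction from the left-center fields is not spelled out, so it is left out here. all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_SAMPLES = 64, BLOCK_BYTES = 32 };

typedef struct {
    unsigned mode;   /* 4-bit block mode */
    int center_min;  /* 16-bit min sample (center), assumed signed */
    int center_max;  /* 16-bit max sample (center), assumed signed */
    int left_min;    /* raw 8-bit left-center min field, assumed signed */
    int left_max;    /* raw 8-bit left-center max field, assumed signed */
} BlockHeader;

/* Read `count` bits starting at bit `pos`, low bits first ("low-high order").
   For byte-aligned fields this also yields little-endian values. */
static unsigned get_bits(const uint8_t *blk, unsigned pos, unsigned count)
{
    unsigned v = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned bit = (blk[(pos + i) >> 3] >> ((pos + i) & 7)) & 1u;
        v |= bit << i;
    }
    return v;
}

/* Interpret the low `bits` bits of v as a two's complement value. */
static int sign_extend(unsigned v, unsigned bits)
{
    unsigned sign = 1u << (bits - 1);
    return (int)(v ^ sign) - (int)sign;
}

static BlockHeader parse_header(const uint8_t *blk)
{
    BlockHeader h;
    h.mode       = get_bits(blk, 0, 4);   /* bits 4..15 are unused */
    h.center_min = sign_extend(get_bits(blk, 16, 16), 16);
    h.center_max = sign_extend(get_bits(blk, 32, 16), 16);
    h.left_min   = sign_extend(get_bits(blk, 48, 8), 8);
    h.left_max   = sign_extend(get_bits(blk, 56, 8), 8);
    return h;
}

/* Decode only the center channel of a Format 4 block into out[64]. */
static void decode_format4_center(const uint8_t *blk, int16_t out[BLOCK_SAMPLES])
{
    BlockHeader h = parse_header(blk);
    assert(h.mode == 4);

    int range = h.center_max - h.center_min;
    for (unsigned g = 0; g < 8; g++) {
        /* per-sub-block min/max, each a 4-bit position within the block range */
        int lo = h.center_min + range * (int)get_bits(blk, 64 + 4 * g, 4) / 15;
        int hi = h.center_min + range * (int)get_bits(blk, 96 + 4 * g, 4) / 15;

        for (unsigned i = 0; i < 8; i++) {
            unsigned s = g * 8 + i;
            int sel = (int)get_bits(blk, 128 + 2 * s, 2);  /* 0=min .. 3=max */
            out[s] = (int16_t)(lo + (hi - lo) * sel / 3);
        }
    }
}
```

the other formats would share parse_header and get_bits, and differ only in how the remaining 192 bits are interpreted.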

note that some things which seem like they would do better, actually do worse.
for example, 16x 12-bit samples with interpolated intermediate values actually did poorly (the increase in sample precision did not make up for the reduction in the number of samples actually coded).

likewise, in early tests, storing all samples directly as 3-bit interpolated values didn't really do well (vs the use of 4-bit min-max values over groups, with 1 or 2 bits per sample selecting each value).
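for reference, the min-max approach for one group can be sketched roughly as below, in the style of Format 0 (1 select bit per sample). the group size of 4 is inferred from 64 samples over 16 min/max pairs, and the rounding, the "1 selects max" convention, and all names are assumptions rather than what the actual encoder does.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

enum { GROUP_SAMPLES = 4 };  /* assumed: 64 samples / 16 min-max pairs */

/* Map a 4-bit index to a value between the block min and max (0=min, 15=max). */
static int dequant4(int idx, int bmin, int bmax)
{
    assert(idx >= 0 && idx <= 15);
    return bmin + (bmax - bmin) * idx / 15;
}

/* Nearest 4-bit index for value v within the block range [bmin, bmax]. */
static int quant4(int v, int bmin, int bmax)
{
    int range = bmax - bmin;
    if (range <= 0)
        return 0;
    int idx = ((v - bmin) * 15 + range / 2) / range;
    if (idx < 0)  idx = 0;
    if (idx > 15) idx = 15;
    return idx;
}

/* Encode one group: writes the 4-bit min/max indices and returns the
   select bits (bit i set means sample i is closer to the group max). */
static unsigned encode_group(const int16_t s[GROUP_SAMPLES], int bmin, int bmax,
                             int *min_idx, int *max_idx)
{
    int lo = s[0], hi = s[0];
    for (int i = 1; i < GROUP_SAMPLES; i++) {
        if (s[i] < lo) lo = s[i];
        if (s[i] > hi) hi = s[i];
    }
    *min_idx = quant4(lo, bmin, bmax);
    *max_idx = quant4(hi, bmin, bmax);

    /* compare against the values the decoder will actually reconstruct */
    int qlo = dequant4(*min_idx, bmin, bmax);
    int qhi = dequant4(*max_idx, bmin, bmax);

    unsigned bits = 0;
    for (int i = 0; i < GROUP_SAMPLES; i++) {
        if (abs(s[i] - qhi) < abs(s[i] - qlo))
            bits |= 1u << i;
    }
    return bits;
}
```

the 2-bit variants work the same way, except that each sample picks one of four points between the group min and max instead of one of two.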

on the other hand, block-mode 4 did pretty well, as it seems to partly overlap with the ranges covered by modes 0 and 1/2. however, no mode does clearly better across the various songs tested, as different songs seem to give different breakdowns of relative filter choices.

granted, a simpler filter would probably need to pick a single option which generally does fairly well, and at the moment the usage is split mostly between 1, 2, and 4.

mode 0 seems biased mostly toward "noisy" sounds, and mode 3 is only really used much when there is a more significant left/right divergence.
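one plausible way to make the per-block mode choice is to encode the block in every mode, decode each candidate, and keep whichever has the lowest squared error against the original samples. this is only a sketch of that general idea, not necessarily how the actual encoder chooses; the function names and the "array of already-decoded candidates" interface are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { BLOCK_SAMPLES = 64 };

/* Sum of squared differences between the original and a decoded block. */
static uint64_t block_sse(const int16_t *orig, const int16_t *decoded)
{
    uint64_t sum = 0;
    for (int i = 0; i < BLOCK_SAMPLES; i++) {
        int32_t d = (int32_t)orig[i] - (int32_t)decoded[i];
        sum += (uint64_t)((int64_t)d * d);
    }
    return sum;
}

/* candidates[m] holds the block as it would decode in mode m.
   Returns the index of the candidate with the lowest error
   (the lowest-numbered mode wins ties). */
static int pick_best_mode(const int16_t *orig,
                          int16_t candidates[][BLOCK_SAMPLES],
                          int mode_count)
{
    assert(mode_count > 0);
    int best = 0;
    uint64_t best_err = block_sse(orig, candidates[0]);
    for (int m = 1; m < mode_count; m++) {
        uint64_t err = block_sse(orig, candidates[m]);
        if (err < best_err) {
            best_err = err;
            best = m;
        }
    }
    return best;
}
```

a cheaper encoder could skip the search and always use one of the generally good modes, at some cost in quality.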

I can't just do an ADPCM block, as I don't have enough bits to make this work out well (unless it were ADPCM + odd-sample interpolation, which is at least possible, but it is uncertain how well it would work compared with the range-based approach).

this is actually closer to what I had initially imagined though, but I couldn't think up any good way to fit ADPCM in power-of-2 sized blocks with a power-of-2 number of samples while using fewer bits than other ADPCM strategies. (things are a lot less pretty at ~ 3-bits / sample).

also, I was initially working at a lower target bit-rate, 88 kbps, but I soon doubted I could actually pull off the whole "doesn't sound like total crap" part, so I "upgraded" the design to 176 kbps (by halving the target number of samples per block), which was the next step up while still keeping everything power-of-2. (there is still some code from the earlier 128-samples-in-256-bits form, which is most closely related to block-type 0, just with 128 1-bit samples and a smaller number of groups each addressing a larger number of samples, namely: 8 groups of 16 samples).

also, the current choice of 64 samples was specific:
much larger, and the waveform within a block generally starts to actually look like a wave (as opposed to a shaky line, *1);
much smaller, and block overhead would eat up pretty much everything else.

64 samples in 256 bits seemed to be roughly the "local minimum". it was also chosen because 176 kbps is fairly close to the "standard" 128 kbps used for Ogg and MP3, so it would produce "similar" sized files to MP3, albeit with somewhat worse quality... (vs 352 kbps, which would be a bit steep...).

*1: a curve is a bit more of a problem than a relatively flat line, and a full cycle is just bad (as then we have to deal with a much larger value range).

core code has been made available:

yes, it is a little bigger/more complex than would be ideal...



How does this compare in quality/compression to A-law and mu-law encodings?



quick testing:

mu-law and A-law have considerably higher audio quality, but use around 705 kbps (for 44.1 kHz stereo at 8 bits per sample), vs 1.41 Mbps for raw 16-bit PCM.


so, effectively, they use about 4x as many bits... (though are still about 1/2 the size of raw 16-bit PCM).


I don't currently have a numerical estimate for the relative quality difference, but it is fairly noticeable.


basically, my codec is giving (in this case) 8:1 compression vs raw PCM, and 4:1 vs mu-law.

for mono audio, compression is 4:1 and 2:1 though.
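those ratios follow directly from the bit rates. a quick sketch of the arithmetic (the function names are just for illustration):

```c
#include <assert.h>

/* Bit rate in bits per second when every channel is coded separately
   at a fixed number of bits per sample (raw PCM, mu-law, A-law). */
static long plain_bitrate(long sample_rate_hz, int channels, int bits_per_sample)
{
    assert(sample_rate_hz > 0 && channels > 0 && bits_per_sample > 0);
    return sample_rate_hz * channels * bits_per_sample;
}

/* Bit rate of the block codec: 256 bits per 64 sample positions,
   the same for mono and stereo. */
static long block_codec_bitrate(long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return sample_rate_hz * 256 / 64;
}
```

at 44.1 kHz stereo this gives 1411200 (16-bit PCM), 705600 (mu-law) and 176400 (block codec) bits per second, hence 8:1 and 4:1; for mono the first two halve while the block codec stays the same, hence 4:1 and 2:1.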



ADD: it is also a considerably worse size/quality tradeoff than Ogg/Vorbis or MP3: although it falls in a similar bitrate window, the quality is somewhat worse. the primary difference though is that these codecs are designed more for stream-decoding (and not random access), and involve a somewhat more expensive audio encoding/decoding process (MDCT + Huffman).


something like DCT or Hadamard could be used for something like this, but I am less certain how well it would work with the use of fixed-format blocks (traditional DCT based codecs typically rely on the use of entropy coding).


don't really want to mess with something like this at the moment though...

