
misc experiment: BCn-like block compression, of audio...


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

3 replies to this topic

#1 BGB   Crossbones+   -  Reputation: 1554


Posted 23 April 2013 - 03:40 AM

earlier today I had an idea, and was compelled by this idea:

what if something like ADPCM and DXTn were hybridized?...

 

after a little mental jostling, the ADPCM parts were dropped, but it did remain a goal to use no more bits than ADPCM while having comparable or better audio quality (I think this much has been achieved... at least). (EDIT: with this initial form, not really: while it uses fewer bits, the quality is a bit worse, at least vs IMA ADPCM. however, for the songs tested, MS-ADPCM seems to occasionally go into segments of full-on white noise, and sometimes messes up pretty badly, which counts against it IMO...)

 

Goal:

basically, 44.1kHz 16-bit mono or stereo stored with at most 4 bits/sample (average).

(ADD: and keeping everything a power-of-2 size and allowing random access to any point inside a sound-effect, like can be done when working with raw PCM).
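
The random-access property follows from every block having the same power-of-2 size. A minimal sketch, assuming the 64-sample, 256-bit (32-byte) blocks of the design below; the names `SampleLocation` and `locate_sample` are made up for illustration and do not come from the posted code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed block geometry: 64 samples (2^6) per 32-byte (2^5) block. */
enum { BLOCK_SAMPLES_LOG2 = 6, BLOCK_BYTES_LOG2 = 5 };

typedef struct {
    size_t   byte_offset;    /* where the containing block starts in the stream */
    unsigned index_in_block; /* 0..63, position of the sample inside that block */
} SampleLocation;

/* Map an absolute sample position to its block using only shifts and masks,
   which works because both sizes are powers of two. */
static SampleLocation locate_sample(size_t sample_pos)
{
    SampleLocation loc;
    size_t block = sample_pos >> BLOCK_SAMPLES_LOG2;

    loc.byte_offset = block << BLOCK_BYTES_LOG2;
    loc.index_in_block =
        (unsigned)(sample_pos & (((size_t)1 << BLOCK_SAMPLES_LOG2) - 1));
    return loc;
}
```

A mixer seeking to an arbitrary sample would then decode just the one block at `byte_offset`, with no dependence on earlier blocks (unlike ADPCM, which carries predictor state forward).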

 

 

design I ended up settling with:

  • 16 bit min/max sample (center, 32 bits)
  • 8 bit left-center min/max (16 bits)
  • unused (16 bits)
  • 64 Samples, 1 bit/sample (64 bits)
  • 16x 4-bit min/max (128 bits)

or, stated alternatively:

  • 16 bit center min
  • 16 bit center max
  • 8 bit left-center min
  • 8 bit left-center max
  • unused 16-bits
  • 64 bits at 1 bit per sample
  • 16x 4 bit min (per 4 samples)
  • 16x 4 bit max (per 4 samples)

 

this encodes 64 samples into a 256 bit block, working out to an average of 4 bits/sample.

the 4-bit values interpolate between the main min/max values, and the 1 bit values choose between the min and max values.
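
A rough sketch of how an encoder could produce these fields for one block of the center channel. This is not the code from the pastebin links: the rounding, the tie-breaking (ties go to the sub-block min), and all names (`QuantBlock`, `quantize_block`, `level_from_index`, `index_from_value`) are assumptions, and the stereo offsets are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reconstruction level for a 4-bit index: linear between block min and max. */
static int level_from_index(int bmin, int bmax, int idx)
{
    return bmin + (bmax - bmin) * idx / 15;
}

/* Nearest 4-bit index (0..15) for a value v inside [bmin, bmax]. */
static int index_from_value(int bmin, int bmax, int v)
{
    int range = bmax - bmin;
    if (range <= 0)
        return 0;
    return ((v - bmin) * 15 + range / 2) / range;
}

typedef struct {
    int16_t  bmin, bmax;   /* 16-bit min/max for the whole block */
    uint8_t  sub_min[16];  /* 4-bit index per 4-sample sub-block */
    uint8_t  sub_max[16];
    uint64_t select;       /* bit i set => sample i uses its sub-block max */
} QuantBlock;

static void quantize_block(const int16_t s[64], QuantBlock *q)
{
    int bmin = s[0], bmax = s[0];
    for (int i = 1; i < 64; i++) {
        if (s[i] < bmin) bmin = s[i];
        if (s[i] > bmax) bmax = s[i];
    }
    q->bmin = (int16_t)bmin;
    q->bmax = (int16_t)bmax;
    q->select = 0;

    for (int g = 0; g < 16; g++) {
        /* Min/max of this 4-sample sub-block. */
        int lo = s[g * 4], hi = s[g * 4];
        for (int k = 1; k < 4; k++) {
            int v = s[g * 4 + k];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
        q->sub_min[g] = (uint8_t)index_from_value(bmin, bmax, lo);
        q->sub_max[g] = (uint8_t)index_from_value(bmin, bmax, hi);

        /* Pick, per sample, whichever quantized level is closer. */
        int lo_q = level_from_index(bmin, bmax, q->sub_min[g]);
        int hi_q = level_from_index(bmin, bmax, q->sub_max[g]);
        for (int k = 0; k < 4; k++) {
            int i = g * 4 + k;
            if (abs(s[i] - hi_q) < abs(s[i] - lo_q))
                q->select |= (uint64_t)1 << i;
        }
    }
}
```

Each sample ends up as one of only two levels per sub-block, which is probably why noisy material survives better than smooth low-frequency material in this mode.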

 

the stereo is basically sort of a naive joint-stereo scheme.

 

at 44.1 kHz, this works out to about 176 kbps (44100 samples/s x 4 bits/sample = 176400 bits/s).
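
The bitrate figures in this post all come from the same arithmetic, sketched here with a hypothetical helper (`block_codec_bitrate` is not part of the posted code):

```c
#include <assert.h>

/* Bits per second for a fixed-size block codec, where each block of
   `block_bits` bits covers `block_samples` samples. */
static long block_codec_bitrate(long sample_rate_hz, long block_bits,
                                long block_samples)
{
    return sample_rate_hz * block_bits / block_samples;
}
```

`block_codec_bitrate(44100, 256, 64)` gives 176400 bits/s, and packing 128 samples into the same 256 bits halves that to 88200 bits/s.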

 

the quality loss isn't particularly noticeable (apart from at low-frequency notes, where there seems to be a slight added "rumble").

 

I had tried another variant that got 88 kbps at 44.1 kHz (128 samples in 256 bits), but the quality was worse (it used 1 bit per group of 16 samples), and it sounded grainy.

 

down-sampling is another possible option (it will get 88 kbps at 22.05 kHz, or 44 kbps at 11.025 kHz, but the quality hasn't really been tested for these rates).

 

granted, size/quality is much worse than something like Vorbis or MP3, but it is simpler at least...

 

 

yet to be seen is if there is much possible practical use for something like this...

 

 

current leaning is partly for storing things like background music in a mixer, which if stored as raw PCM data can eat a big chunk of RAM.

could also be used for sound effects on the off-chance that there are enough to actually matter (could be length-triggered, say, for sounds > 65536 samples or similar).
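
To put rough numbers on the RAM savings, here is a small sketch. It assumes 32-byte blocks covering 64 sample frames each, that a stereo sound shares one block per 64 frames via the joint-stereo scheme, and that the last block is padded; the helper names and the exact threshold constant are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_SAMPLES = 64, BLOCK_BYTES = 32, COMPRESS_THRESHOLD = 65536 };

/* Bytes needed to hold a sound as raw 16-bit PCM. */
static size_t raw_pcm_bytes(size_t frames, unsigned channels)
{
    return frames * channels * 2u;
}

/* Bytes needed when stored as 32-byte blocks (last block padded). */
static size_t block_coded_bytes(size_t frames)
{
    return (frames + BLOCK_SAMPLES - 1) / BLOCK_SAMPLES * BLOCK_BYTES;
}

/* Length trigger: only sounds longer than the threshold get compressed. */
static int should_compress(size_t frames)
{
    return frames > COMPRESS_THRESHOLD;
}
```

For a 3-minute stereo track at 44.1 kHz (7938000 frames), raw PCM takes 31752000 bytes while the block-coded form takes 3969024 bytes, roughly an 8:1 reduction (4:1 for mono).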

 

 

ADD:

http://pastebin.com/iCv3cvQ4

 

ADD 2:

code for a newer version:
http://pastebin.com/yL2ZWMcR

 

 

thoughts / comments?...


Edited by cr88192, 28 April 2013 - 09:39 PM.



#2 Anthony Serrano   Members   -  Reputation: 1243


Posted 23 April 2013 - 03:58 PM

I'm not sure that there's much use for this.

 

It results in a larger data stream than Vorbis, and is a bit more complex to decode than XA-Audio without providing a significantly better data rate, while providing lower audio quality than either.  (XA-Audio is a compressed audio format used on the PlayStation consoles, based on the compressed audio format used on the SNES, that compresses audio to 4.57 bits per sample.)



#3 BGB   Crossbones+   -  Reputation: 1554


Posted 23 April 2013 - 06:13 PM

I am not really familiar with XA-Audio, and haven't had much luck looking up a spec for this...

 

(I could evaluate this more, if I could find any real information about it...).

 

(ok, found out about BRR, which is apparently what this is...).

 

 

I was still messing with the audio quality, and have gotten it a bit better than it was earlier (with minor changes to the block format, and a lot of minor fiddling with the arithmetic).

 

(added a block-mode field, sort of like BC6H / BC7, with a slightly tweaked version of the original format as mode 0).

 

as-is, it is still fairly "experimental" though, which is not to say I will necessarily use it at this point.

 

 

in my tests, it still seems a bit hit-or-miss vs 11 kHz 16-bit mono and 22kHz 8-bit mono.

I could consider a few block-modes to address a few weak cases (12-bit and 6-bit PCM-like modes).

that is tempting, as is the possibility of a split-stereo block.

 

 

for on-disk storage I am currently using a mix of WAV, Ogg/Vorbis, and FLAC.

this would be probably more for compression in-RAM, rather than on-disk.

(granted, yes, stream-decoding of Vorbis is possible, vs as-is, decoding everything in advance...).

 

as-is, everything in RAM (in the mixer) is 44.1 kHz 16-bit mono or stereo and stored in raw PCM.

 

 

size vs quality does still seem better than ADPCM in my tests though... (which has lots of popping and hissing).

checking against Microsoft ADPCM and IMA ADPCM. (actually, IMA ADPCM sounds a little better for the vocals, but other things sound very washed-out).

 

likewise vs 22kHz 8-bit-mono, which seems to add a lot more grain and hiss (but does preserve some of the sounds a little better).

 

 

granted, yes, I have currently been using a few songs as tests (mostly testing with the "GiTS: SAC" intro songs).

 

so, yeah, later on, checking against other possibilities could make sense...


Edited by cr88192, 23 April 2013 - 06:46 PM.


#4 BGB   Crossbones+   -  Reputation: 1554


Posted 24 April 2013 - 12:50 PM

updated format:

/*
Experimental block-based audio codec.
Encodes blocks of 64 samples into 256 bits (32 bytes).
At 44.1kHz this is 176kbps.
It can encode stereo using a "naive joint stereo" encoding.

Most block formats will encode a single center channel and will offset it for the left/right channel.

Basic Format 0:
    4 bit: Block-Mode (0)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    64 Samples, 1 bit/sample (64 bits)
    16x 4-bit min (64 bits)
    16x 4-bit max (64 bits)
 
The 4-bit values interpolate between the full min/max for the block (creating a 4-sample sub-block).
The 1-bit samples select between the min and max value for each sub-block.
 
Note: Interpolated values are linear, thus 0=0/15, 1=1/15, 2=2/15, ..., 14=14/15, 15=15/15
 
Bit packing is in low-high order, and multibyte values are little-endian.

Basic Format 1:
    4 bit: Block-Mode (1)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    32x 2-bit sample (64 bits)
    32x 4-bit sample (128 bits)

This directly codes all samples, with the 4-bit values encoding even samples, and the 2-bit values encoding odd samples.

The 4-bit samples are encoded between the block min/max values, and the 2-bit samples between the prior/next sample. (EDIT: 2-bit interpolation: 0=prior sample, 1=next sample, 2=linear average, 3=quadratic).

Basic Format 2:
    4 bit: Block-Mode (2)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    32x 6-bit samples (192 bits)

This directly codes samples, with the 6-bit values encoding samples.
The 6-bit samples are encoded between the block min/max values.

This mode encodes even samples, with odd-samples being interpolated (quadratic).
The last sample is extrapolated.

Stereo Format 3:
    4 bit: Block-Mode (3)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    32x 2-bit pan (64 bits)
    32x 4-bit sample (128 bits)

This directly codes samples, with the 4-bit values encoding even samples.
The 2-bit pan value encodes the relative pan of the sample.

The 4-bit samples are encoded between the block min/max values.

The 2-bit pan values are interpreted as:
0=center-pan (offset):
    The sample will be offset for left/right channels.
1=center-pan (duplicate):
    The sample will be the same (center) value for both channels.
2=left-pan:
    The sample will be panned towards the left.
3=right-pan:
    The sample will be panned towards the right.

This mode encodes even samples, with odd-samples being interpolated (quadratic).

Basic Format 4:
    4 bit: Block-Mode (4)
    currently unused (12 bits, zeroed)
    16 bit min sample (center)
    16 bit max sample (center)
    8 bit left-center min (truncated)
    8 bit left-center max
    8x 4-bit min (32 bits)
    8x 4-bit max (32 bits)
    64x 2-bit sample (128 bits)

The 4-bit values interpolate between the full min/max for the block.
The 2-bit samples interpolate between the min and max values for each sub-block (0=min, 1=1/3, 2=2/3, 3=max).
 */
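
As an illustration of the spec above, here is a minimal decoder sketch for Basic Format 0, center channel only. The byte layout is an assumption: fields are taken to be packed in the order listed, low bits first, so the mode sits in the low nibble of byte 0, min/max in bytes 2..5, the 1-bit samples in bytes 8..15, and the 4-bit min and max indices in bytes 16..23 and 24..31. The left-center offsets (bytes 6..7) are ignored because the spec does not say how they are applied. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian signed 16-bit value. */
static int rd_s16le(const uint8_t *p)
{
    return (int16_t)(uint16_t)(p[0] | (p[1] << 8));
}

/* Read the i-th 4-bit value; the low nibble of each byte comes first. */
static int rd_nibble(const uint8_t *p, int i)
{
    return (p[i >> 1] >> ((i & 1) * 4)) & 15;
}

/* Read the i-th bit; bit 0 of each byte comes first. */
static int rd_bit(const uint8_t *p, int i)
{
    return (p[i >> 3] >> (i & 7)) & 1;
}

/* Decode one 32-byte mode-0 block into 64 center-channel samples.
   Returns 0 on success, -1 if the block is not mode 0. */
static int decode_mode0_center(const uint8_t blk[32], int16_t out[64])
{
    if ((blk[0] & 15) != 0)
        return -1;

    int bmin = rd_s16le(blk + 2);
    int bmax = rd_s16le(blk + 4);

    for (int g = 0; g < 16; g++) {
        /* Sub-block min/max: linear steps of 1/15 between block min and max. */
        int lo = bmin + (bmax - bmin) * rd_nibble(blk + 16, g) / 15;
        int hi = bmin + (bmax - bmin) * rd_nibble(blk + 24, g) / 15;

        for (int k = 0; k < 4; k++) {
            int i = g * 4 + k;
            /* The 1-bit sample selects the sub-block min (0) or max (1). */
            out[i] = (int16_t)(rd_bit(blk + 8, i) ? hi : lo);
        }
    }
    return 0;
}
```

A full decoder would switch on the mode nibble and handle the other formats the same way; formats 1 to 3 additionally need the quadratic interpolation, whose exact formula is not given in the spec.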

 

 

in current tests, block-types 2 and 3 seem to be chosen most aggressively (EDIT: 1 and 2, after more fine-tuning).

 

block type 0 seems to be picked most often for "noisy" sounds, but rarely for vocals.

block types 1 and 2 are most often picked for vocals it seems.

block type 3 is mostly when there is a stronger left/right divergence.

 

I had tested a block-format with 16x 12-bit samples (and interpolation for everything else), but it ended up mostly unused in tests, so I dropped it.

 

( ADD: the encoder basically chooses whichever format gives the smallest RMSE for a given block. I am going on both RMSE and perceived sound quality... ).
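
The selection step can be sketched as follows, assuming each candidate format has already been encoded and decoded back into samples. Since every candidate covers the same number of samples, comparing sums of squared errors picks the same format as comparing RMSE, so the square root is skipped. The names `squared_error` and `pick_best_mode` are illustrative, not taken from the posted code.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared differences between original and decoded samples. */
static int64_t squared_error(const int16_t *orig, const int16_t *dec, int n)
{
    int64_t sum = 0;
    for (int i = 0; i < n; i++) {
        int64_t d = (int64_t)orig[i] - dec[i];
        sum += d * d;
    }
    return sum;
}

/* candidates[m] points to the n samples decoded from a trial encoding in
   mode m. Returns the mode with the smallest error (lowest index on ties). */
static int pick_best_mode(const int16_t *orig,
                          const int16_t *const *candidates,
                          int num_modes, int n)
{
    int best = 0;
    int64_t best_err = squared_error(orig, candidates[0], n);

    for (int m = 1; m < num_modes; m++) {
        int64_t err = squared_error(orig, candidates[m], n);
        if (err < best_err) {
            best_err = err;
            best = m;
        }
    }
    return best;
}
```

This covers only the numeric half of the choice; the perceived-quality half is done by ear and is not something this sketch tries to model.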

 

another considered possibility:

block type with all samples as 3-bits. (IIRC, I tried this earlier on, but initially this lost out quality-wise to what became block-type 0).

 

 

overall:

current quality seems to now beat out both 8-bit (mono) and MS-ADPCM (mono).

(for full stereo, at the same sample rate ADPCM takes 2x as much space, and at 22kHz sounds worse...).

 

it currently seems a bit closer compared with IMA ADPCM (which seems to do a good job with vocals but makes noisy sounds sound "muddy"...).

 

 

a drawback is that there still seems to be some "grit" at the moment.

may try to work more on this...

(some noise seems due to the handling of stereo though... but short of forcing mono there isn't really a good fix at the moment).

 

 

Deflate test:

none really compress well with deflate, but of these, raw 8bit and MS-ADPCM seem to compress the most.

my codec doesn't really seem to deflate very well though...

 

 

EDIT: now testing with more songs, mostly Skrillex...

( these songs seem to mostly favor block-types 0 and 1. )

 

 

ADD:

Added block mode 4.

more code cleanup still needed before I might make it available.

added a split-stereo mode, but this uses 352 kbps (vs 176 kbps), and I may consider an 88 kbps sub-mode (basically would halve the samples and use quadratic interpolation for the rest).

 

also defined a file-header (using a BMP-like strategy), and working on making the thing usable as a command-line tool.

 

 

ADD 2:

code for a newer version:
http://pastebin.com/yL2ZWMcR


Edited by cr88192, 28 April 2013 - 03:40 PM.




