Compression method

I've thought of what might be a good way to compress sound, and I'm posting the idea here to see what comes of it. If it's rubbish, let me know, but please don't take this as an excuse to do some flaming. Here we go...

Instead of using 8-bit or 16-bit samples to represent the voltage level at each point, a sort of floating sample could be used. It would be a 4-bit value (0 to 15) that gives a voltage level and also allows for scaling. Because sound voltage levels are both positive and negative, it would start off centered on 0v (&b1000 = 0v (VB notation for hex and binary)). Each value gives the voltage level, except that &b0000 and &b1111 tell the program to 'step' instead.

If we use increments of 0.1v, then a signal going from 0v up to 1v, down to -1v and back to 0v in steps of 0.1v would look like this in hex:

&h89ABCDEF12343210EDCBA9876543210EDCDEF12345678

Note: &b0000 means step down, &b1111 means step up, and &b0001 to &b1110 are levels 0 to 13 in that 'octave'. After a 'step' the next value gives the level (e.g. 0.7v, starting from octave 0 where &b1000 = 0.0v, is &b1111,&b0001).

So representing 41 level changes took just 23 bytes (45 four-bit values: 41 levels plus 4 steps, not counting headers), whereas 8-bit mono would have taken 41 bytes and 16-bit mono 82. In stereo, 4step (or whatever name I think up for it) would take 46 bytes (assuming stereo alternates left and right samples), 8-bit 82 and 16-bit 164! This does not count headers (which would add to it), but there is flexibility in this design in that any 'virtual' bit level can be used (e.g. 128-bit quality is possible) without adding more than a small proportion to the file size. It should be possible to chain octave changes (e.g. &hFFFF1 to go up 4 octaves and then use the bottom level of that octave).

Another possibility is error checking. For example, every 1k there could be a 'check byte' which contains the current octave level. The reader could compare that to its internal level, and if there is any difference then there is an error somewhere.

I'm guessing that I'm not the first person to think of this. If I wasn't the first then I won't try to make it *my* idea (I didn't copy it off anyone, if that's what you're thinking). This is currently just a theory, but I'll try to put some code together (most likely in VB, but I might port it to C++) and upload some sound files encoded in both standard WAVE and this format so people can compare them. I think I can trust you lot not to go and pirate it yet. If someone else thought of it first, and can PROVE it, then I'll stop. Otherwise, it's my idea and I reserve the right to patent it or otherwise as I see fit.

IDEA: I've just realised that this will probably make the file more compressible (there are going to be an awful lot of &hF1 sequences in it). I'll just have to see...

Note on notation: "&h" means that the number is in hex, and "&b" means that the number is in binary.

--Thomas McCorkell

Just what is Karma? Is it a way to rate people? A way of assigning privilege levels? Or is karma just an anti-spam system?
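
For anyone who wants to experiment, here is a minimal Python sketch of the scheme as it can be read from the worked example in the post. It assumes integer levels in units of 0.1v and that each step code moves the window ("octave") by 14 levels, which is what the hex string implies. The names `encode`, `decode`, `to_hex`, `SPAN` and `CENTER` are made up for illustration and are not the author's code.

```python
STEP_DOWN, STEP_UP = 0x0, 0xF   # reserved nibbles from the post
CENTER = 0x8                    # &b1000 is 0v in window ("octave") 0
SPAN = 14                       # nibbles &h1..&hE carry 14 levels per window (inferred)


def encode(levels):
    """Turn integer levels (units of 0.1v) into a list of 4-bit codes."""
    codes, window = [], 0
    for level in levels:
        # Window containing this level: offsets -7..+6 around its centre.
        target = (level + 7) // SPAN
        while window < target:
            codes.append(STEP_UP)
            window += 1
        while window > target:
            codes.append(STEP_DOWN)
            window -= 1
        codes.append(level - window * SPAN + CENTER)
    return codes


def decode(codes):
    """Inverse of encode: rebuild the integer levels from 4-bit codes."""
    levels, window = [], 0
    for code in codes:
        if code == STEP_UP:
            window += 1
        elif code == STEP_DOWN:
            window -= 1
        else:
            levels.append(window * SPAN + code - CENTER)
    return levels


def to_hex(codes):
    """Render a list of 4-bit codes as an uppercase hex string."""
    return "".join(format(c, "X") for c in codes)


# The 0v -> 1v -> -1v -> 0v ramp from the post, in 0.1v units (41 levels).
ramp = list(range(0, 11)) + list(range(9, -11, -1)) + list(range(-9, 1))
```

Under these assumptions, `to_hex(encode(ramp))` reproduces the hex string given in the post, and `decode(encode(ramp))` returns the original ramp.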

Guest Anonymous Poster
What happens when the difference between two samples is larger than 15 (or rather 7: 7 up and 7 down)? You still need more data for one of those samples... And what if you don't know how many of those you need for the next sample? You can't tell, or you need to waste bits on that as well...

A better idea might be Huffman coding or something? Better: take a look at Gamasutra; there must be an article about vector quantization. I read it a week ago... it has some ideas.

You could represent a series of 8 samples as an eight-dimensional vector, pick 256 vectors that occur frequently, and map an 8-sample "string" to one of those 256 vectors; that would be like palettizing sound. The trick is to choose a good set of (256) indices (please excuse my bad understanding of the terminology). That's what you need the article for.
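
A rough Python sketch of the "palettizing" idea, for illustration only. It builds the codebook the naive way described above (the most frequent 8-sample blocks); real vector quantizers usually train the codebook with a clustering method instead. All function names here are hypothetical.

```python
from collections import Counter


def blocks(samples, size=8):
    """Split samples into consecutive tuples of `size` (drops a short tail)."""
    return [tuple(samples[i:i + size])
            for i in range(0, len(samples) - size + 1, size)]


def build_codebook(vectors, entries=256):
    """Naive 'palette': the most frequent vectors, up to `entries` of them."""
    return [v for v, _ in Counter(vectors).most_common(entries)]


def nearest(codebook, vector):
    """Index of the codebook entry with the smallest squared distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2
                                 for a, b in zip(codebook[i], vector)))


def quantize(samples, codebook, size=8):
    """Replace each block of `size` samples with a one-byte codebook index."""
    return bytes(nearest(codebook, v) for v in blocks(samples, size))


def reconstruct(indices, codebook):
    """Lossy decode: concatenate the codebook vectors the indices point to."""
    return [s for i in indices for s in codebook[i]]
```

With 256 entries each 8-sample block becomes a single byte, so 8-bit audio shrinks by a factor of 8 before counting the codebook itself. The quality depends entirely on how well the codebook matches the material.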

quote:
Original post by Anonymous Poster
What happens when the difference between two samples is larger than 15 (or rather 7: 7 up and 7 down)? You still need more data for one of those samples... And what if you don't know how many of those you need for the next sample? You can't tell, or you need to waste bits on that as well...



quote:
Original post by BoggyB
It should be possible to chain octave changes (e.g. &hFFFF1 to go up 4 octaves and then use bottom level for that octave).



As in if there is a change of 20 levels up then assuming &b1000 is level 0 we get:

&hF (up one level +7=7)
&hF (up one level +7=14)
&h7 (actual level +6=20)

Final value: &hFF7

I think the above levels are right... as I said, this is just the basic theory at the moment. Anyway, you're not likely to get that kind of level change in a standard audio file unless you're using 64 or 128 bits to represent each sample, and who would use that? I just included 128 as an example.
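
As a cross-check, here is a small Python sketch of chained steps. It assumes the layout implied by the hex example in the first post: &b1000 is the centre of window 0 and each step code moves the window by 14 levels. That is an inference from the example, not something stated outright. Under that reading a jump to level 20 comes out as &hFE rather than &hFF7, so the hand calculation above may be using a different convention. `encode_jump` is a hypothetical helper name.

```python
SPAN = 14      # levels per window: nibbles &h1..&hE (inferred from the first post)
CENTER = 0x8   # &b1000 is the middle of window 0


def encode_jump(level):
    """Hex codes needed to reach `level`, starting from window 0."""
    window = (level + 7) // SPAN            # which window holds this level
    step = "F" if window > 0 else "0"       # step up or step down
    return step * abs(window) + format(level - window * SPAN + CENTER, "X")
```

With the same assumptions, `encode_jump(49)` gives `FFFF1`, which matches the "up 4 octaves, then bottom level" example quoted above.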

Don't get me wrong, I do like having input; it's just frustrating when people fail to read the whole post before replying.

Oh, and could you give me the address of the website?

--Thomas McCorkell

I didn't try to read and understand your WHOLE post (as I
don't have the time), but you may want to look up something
called "Arithmetic Coding". It sounds like that's what
you're doing. It also uses floating point numbers.



Kami no Itte ga ore ni zettai naru!

Guest Anonymous Poster
I'm terribly sorry for not reading your whole post... I made a few incorrect assumptions (sorry).

The address of that article would be http://www.gamasutra.com/features/20010416/ivanov_01.htm

Actually the method you suggested is not such a bad idea (please excuse me again; I still haven't read your post but I will...).

How well this works, I think, depends on the "bit depth" of the sound; noise, for example, wouldn't work. For low-frequency sound, however, it should IMHO work.

Again sorry.

It's all right... I've done the same thing myself sometimes :-)

The link is on a members-only site... (I HATE MEMBERS-ONLY) ...and it asks me for things like my address so I'm not going to sign up to it (I HATE SPAM). Could you e-mail me the article please? (Or just give me the general gist of it.) Thanks.

tangentz: I will look that up at some point. Thanks.

Speaking of compression, one way is to use a look-up method with variable bit lengths (a bit like Morse code - E (the most common letter) gets the shortest signal: a single short beep/dot/flash/pulse/whatever).

Please note that this isn't a full thesis or whatever; it's just chucking ideas together in the hope that something good comes of it. Feel free to come up with your own ideas to add to this.

--Thomas McCorkell

quote:
Original post by BoggyB
Speaking of compression, one way is to use a look-up method with variable bit lengths (a bit like Morse code - E (the most common letter) gets the shortest signal: a single short beep/dot/flash/pulse/whatever).




Well, this is basically what Huffman encoding does.

-Marten
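
To make the Morse-code comparison concrete, here is a compact Python sketch of Huffman code construction, where frequent symbols end up with shorter bit strings. It is a textbook-style illustration using only the standard library; `huffman_codes` is a name chosen for this example, and the code does not cover writing the bits or the code table to a file.

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Map each symbol in `data` to a prefix-free bit string."""
    counts = Counter(data)
    if not counts:                        # nothing to encode
        return {}
    if len(counts) == 1:                  # degenerate case: one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, low = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, high = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

For example, in `huffman_codes("EEEEEEEETTTTAAN")` the most common symbol `E` gets a 1-bit code while the rare `N` gets a 3-bit code.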

A note here: have you actually studied various samples of wave data? A while ago I experimented with an algorithm that would vectorize the samples in two or three levels, and then apply LZW encoding to the result. It worked great on the one sample I tested with, which was some smooth Dolly Parton song. I then tried it on a Metallica song, and the result was horrible!!! There it was very common for samples to go from +31232 to -26403 etc. And by "common" I mean that every second sample was around +16,000 and the others around -16,000.
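
A quick Python illustration of why this matters for any scheme that relies on small sample-to-sample changes. The two signals below are synthetic stand-ins invented for this sketch (a single slow sine wave and a worst-case alternating signal), not real song data.

```python
import math


def max_step(samples):
    """Largest absolute change between consecutive samples."""
    return max(abs(b - a) for a, b in zip(samples, samples[1:]))


# Hypothetical test signals in the 16-bit range, 64 samples each.
smooth = [round(16000 * math.sin(2 * math.pi * i / 64)) for i in range(64)]
harsh = [16000 if i % 2 == 0 else -16000 for i in range(64)]
```

Here `max_step(harsh)` is 32000, while `max_step(smooth)` stays below 1600. If one level were one 16-bit unit and each step code moved a 14-level window, a 32000-level jump would need more than two thousand step codes for a single sample.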

I'm not sure what you mean, but you were talking of octaves. That's totally irrelevant here, as a sample does not have an "octave", which is defined by frequency over several samples. I could have misunderstood you though.

quote:
Original post by CWizard
A note here: have you actually studied various samples of wave data? A while ago I experimented with an algorithm that would vectorize the samples in two or three levels, and then apply LZW encoding to the result. It worked great on the one sample I tested with, which was some smooth Dolly Parton song. I then tried it on a Metallica song, and the result was horrible!!! There it was very common for samples to go from +31232 to -26403 etc. And by "common" I mean that every second sample was around +16,000 and the others around -16,000.

No compression algorithm works well on everything. There was a topic once about a "super-dooper" compression method. Hold on... I'll post a link...
MP3-Beating Compression: http://www.gamedev.net/community/forums/topic.asp?topic_id=11259
quote:
I'm not sure what you mean, but you were talking of octaves. That's totally irrelevant here, as a sample does not have an "octave", which is defined by frequency over several samples. I could have misunderstood you though.

You did. I only used "octave" because I couldn't think of a better word. Don't worry, you're not the first and you won't be the last. By "octave", I meant a range of values in the sample. The "octave" can be changed up or down by &hF or &h0. I didn't use the word in relation to a musical "octave".

Sorry for the confusion.


--Thomas McCorkell

The point I was making is that I think your algorithm, as well as mine, assumes smooth waves (like synthetic sine waves), which in reality are very uncommon, especially when more than one tone or noise is mixed on one channel.

And I've read the entire MP3-beating compression algorithm thread, but it's not very related to this.

A tip for your algorithm though: implement a single-bit code that can negate the voltage. It is not very uncommon to have waves that go like this:


X X
X X X X
X X X X X X
XXXXXXXXXXXXXX
X X X X X X
X X X X
X X

You get the idea...
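
One way to read the negate-bit tip, sketched in Python: store a one-bit flag per sample and keep the magnitude separately, so a wave that flips sign on every sample still has a smoothly changing magnitude. This is only an interpretation of the suggestion, and `split_sign`, `join_sign` and the sample wave are made up for illustration.

```python
def split_sign(samples):
    """Separate each sample into a negate flag (1 bit) and a magnitude."""
    flags = [1 if s < 0 else 0 for s in samples]
    magnitudes = [abs(s) for s in samples]
    return flags, magnitudes


def join_sign(flags, magnitudes):
    """Inverse of split_sign: reapply the negate flags."""
    return [-m if f else m for f, m in zip(flags, magnitudes)]


# Hypothetical wave: the sign alternates while the envelope grows and
# then shrinks, similar in spirit to the diagram above.
envelope = [0, 1, 2, 3, 2, 1, 0]
wave = [m if i % 2 == 0 else -m for i, m in enumerate(envelope)]
```

For this wave the raw samples jump by up to 5 levels between neighbours, but the magnitudes returned by `split_sign(wave)` never change by more than 1.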

quote:
Original post by CWizard
The point I was making is that I think your algorithm, as well as mine, assumes smooth waves (like synthetic sine waves), which in reality are very uncommon, especially when more than one tone or noise is mixed on one channel.

I'll have to test it on different files to see how it does, but you're probably right.
quote:
And I've read the entire MP3-beating compression algorithm thread, but it's not very related to this.

Me too. I actually thought that he might have an idea, but about 20-30 posts were "you can't compress 100MB to 3 bytes" and 50 were "you're joking, it can't be done". The theory looks like it might work but we never got the code for it. At least I've put enough up to let someone finish it off.
quote:
A tip for your algorithm though: implement a single-bit code that can negate the voltage. It is not very uncommon to have waves that go like this:


X X
X X X X
X X X X X X
XXXXXXXXXXXXXX
X X X X X X
X X X X
X X

You get the idea...

Would be tricky, but maybe using 5-bit blocks with one bit to indicate a negation might work... but it would trash file handling. Will have to look into it. Thanks.
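
The file-handling worry is presumably that 5-bit blocks do not line up with byte boundaries. A minimal Python sketch of one way around that is to pack the codes into a continuous bit stream and store the code count separately. `pack` and `unpack` are hypothetical helpers, not part of any existing format.

```python
def pack(codes, width=5):
    """Pack fixed-width codes into bytes, most significant bit first."""
    acc, nbits, out = 0, 0, bytearray()
    for code in codes:
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1          # keep only the bits not yet written
    if nbits:                            # pad the final partial byte with zeros
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack(data, count, width=5):
    """Read back `count` codes of `width` bits from packed bytes."""
    acc, nbits, codes = 0, 0, []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width and len(codes) < count:
            nbits -= width
            codes.append((acc >> nbits) & ((1 << width) - 1))
    return codes
```

Five 5-bit codes take 25 bits, so `pack` returns 4 bytes for them, and `unpack(packed, 5)` recovers the original values. The count has to be stored somewhere (a header, for instance) because the padding bits would otherwise look like data.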

--Thomas McCorkell
