Compression method

Started by
11 comments, last by BoggyB 21 years, 7 months ago
I've thought of what might be a good way to compress sound, and I'm posting my idea here to see what comes of it. If it's rubbish, then let me know, but please don't take this as an excuse to do some flaming. Here we go...

Instead of using 8-bit or 16-bit samples to represent the voltage level at each point, a sort of floating sample could be used. It would be a 4-bit value (so between 0 and 15) which gives a voltage level and also allows for scaling. Because sound voltage levels are both positive and negative, it would start off centred on 0v (&b1000 = 0v, using VB notation for hex and binary). Each value would give the voltage level, but if it was &b0000 or &b1111 it would tell the program to 'step' instead.

If we use increments of 0.1v, then a signal going from 0v to 1v, down to -1v and back to 0v in steps of 0.1v would look like this in hex:

&h89ABCDEF12343210EDCBA9876543210EDCDEF12345678

Note: &b0000 means step down, &b1111 means step up, and &b0001 to &b1110 are levels 0 to 13 in that 'octave'. After a 'step' the next value gives the level (e.g. 0.7v, starting from octave 0 where &b1000 = 0.0v, is &b1111,&b0001).

So, to represent 41 level changes took us just 23 bytes (45 four-bit values, not counting headers), when with 8-bit mono it would have been 41 bytes and 16-bit mono would have been 82. In stereo, 4step (or whatever name I think up for it) would be 46 (assuming stereo alternates left and right samples), 8-bit is 82 and 16-bit is 164! This does not count headers (which would add to it), but there is flexibility in this design in that any 'virtual' bit level can be used (e.g. 128-bit quality is possible) without adding more than a small proportion to the file size. It should be possible to chain octave changes (e.g. &hFFFF1 to go up 4 octaves and then use the bottom level of that octave).

Another possibility is error checking. For example, every 1k there could be a 'check byte' which contains the current octave level. The reader could compare that to its internal level, and if there is any difference then there is an error somewhere.

I'm guessing that I'm not the first person to think of this. If I wasn't the first then I won't try to make it *my* idea (I didn't copy it off anyone, if that's what you're thinking). This is currently just a theory, but I'll try to put some code together (most likely in VB, but I might port it to C++) and upload some sound files encoded in both standard WAVE and this format so people can compare them. I think I can trust you lot not to go and pirate it yet. If someone else thought of it first, and can PROVE it, then I'll stop. Otherwise, it's my idea and I reserve the right to patent it or otherwise as I see fit.

IDEA: I've just realised that this will probably make the file more compressible (there are going to be an awful lot of &hF1 sequences in it). I'll just have to see...

Note on notation: "&h" means that the number is in hex, and "&b" means that the number is in binary.

--Thomas McCorkell

Just what is Karma? Is it a way to rate people? A way of assigning privilege levels? Or is karma just an anti-spam system?
This piece of randomly insane randomness was brought to you by Thomas
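
The scheme in the post above can be sketched in a few lines. This is only one reading of it, not the poster's own code (which was planned for VB): it assumes levels are integers in units of 0.1v, that each 'octave' spans 14 levels, and that &h8 is level 0 while in octave 0. The names encode, to_hex and the constants are made up for illustration.

```python
STEP_DOWN, STEP_UP = 0x0, 0xF   # the two reserved 'step' codes
LEVELS_PER_OCTAVE = 14          # codes 0x1..0xE carry actual levels
CENTRE_CODE = 0x8               # assumed: 0x8 means level 0 in octave 0


def encode(levels):
    """Turn integer levels into a list of 4-bit codes with step markers."""
    octave = 0
    codes = []
    for level in levels:
        # Position of this level inside the current octave (valid: 1..14).
        code = level - LEVELS_PER_OCTAVE * octave + CENTRE_CODE
        while code > 0xE:        # above the octave: emit step-up markers
            codes.append(STEP_UP)
            octave += 1
            code -= LEVELS_PER_OCTAVE
        while code < 0x1:        # below the octave: emit step-down markers
            codes.append(STEP_DOWN)
            octave -= 1
            code += LEVELS_PER_OCTAVE
        codes.append(code)
    return codes


def to_hex(codes):
    """Render the 4-bit codes as one hex digit each."""
    return "".join("%X" % c for c in codes)


# 0v up to 1v, down to -1v and back to 0v in 0.1v steps (41 samples).
signal = list(range(0, 11)) + list(range(9, -11, -1)) + list(range(-9, 1))
print(to_hex(encode(signal)))
# prints 89ABCDEF12343210EDCBA9876543210EDCDEF12345678
```

Under these assumptions the output reproduces the hex string from the post: 41 samples plus 4 step markers, i.e. 45 four-bit values.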
What happens when the difference between two samples is larger than 15 (or rather 7: 7 up and 7 down)? You still need more data for one of those samples... and what if you don't know how many of those you need for the next sample? You can't tell, or you need to waste bits on that as well...

A better idea might be Huffman coding or something? Better: take a look at Gamasutra; there must be an article about vector quantization. I read it a week ago... it has some ideas.

You could represent a series of 8 samples as an eight-dimensional vector, pick 256 vectors that occur frequently, and map an 8-sample "string" to one of those 256 vectors; that would be like palettizing sound. The trick is to choose a good set of (256) indices (please excuse me for my bad understanding of the terminology). That's what you need the article for.
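
A rough sketch of the "palettizing" idea described above, assuming 8-sample blocks and a codebook picked simply as the most frequent blocks. Real vector quantizers usually train the codebook with a clustering method instead, so treat build_codebook as a stand-in; all function names here are hypothetical.

```python
from collections import Counter

BLOCK = 8  # samples per vector, as suggested in the post


def blocks(samples, size=BLOCK):
    """Split samples into whole blocks of `size` (a trailing partial block is dropped)."""
    return [tuple(samples[i:i + size])
            for i in range(0, len(samples) - size + 1, size)]


def build_codebook(samples, entries=256):
    """Naive codebook: the `entries` most frequently occurring blocks."""
    counts = Counter(blocks(samples))
    return [vec for vec, _ in counts.most_common(entries)]


def nearest(codebook, vec):
    """Index of the codebook vector closest to `vec` (squared distance)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(codebook[i], vec))
    return min(range(len(codebook)), key=dist)


def quantize(samples, codebook):
    """Replace every block with the index of its nearest codebook vector."""
    return [nearest(codebook, b) for b in blocks(samples)]


def reconstruct(indices, codebook):
    """Expand indices back into (approximate) samples."""
    out = []
    for i in indices:
        out.extend(codebook[i])
    return out


samples = [0] * 8 + [10] * 8 + [0] * 8 + [10] * 8 + [1] * 8
codebook = build_codebook(samples, entries=2)
indices = quantize(samples, codebook)
print(indices, reconstruct(indices, codebook)[-8:])
```

With 256 entries each 8-sample block costs one byte, so the saving comes at the price of approximation: the last block of ones in the usage above is replaced by whichever codebook vector is nearest.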
quote:Original post by Anonymous Poster
What happens when the difference between two samples is larger than 15 (or rather 7: 7 up and 7 down)? You still need more data for one of those samples... and what if you don't know how many of those you need for the next sample? You can't tell, or you need to waste bits on that as well...


quote:Original post by BoggyB
It should be possible to chain octave changes (e.g. &hFFFF1 to go up 4 octaves and then use bottom level for that octave).


As in, if there is a change of 20 levels up, then assuming &b1000 is level 0 we get:

&hF (up one level +7=7)
&hF (up one level +7=14)
&h7 (actual level +6=20)

Final value: &hFF7

I think the above levels are right... as I said this is just the basic theory at the moment. Anyway, you're not likely to get that kind of level change in a standard audio file unless you're using 64 or 128 bits to represent each sample, and who would use that? I just included 128 as an example.
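
For checking chained steps by hand, here is a small decoder sketch. It follows one reading of the first post (each octave spans 14 levels, &h8 is level 0 in octave 0), which is an assumption; the worked arithmetic above counts 7 per step, so the two do not give the same numbers. The name decode is made up.

```python
def decode(hex_string):
    """Decode a string of hex digits into integer levels.

    Assumed convention: F steps up one octave, 0 steps down one octave,
    1..E give a level inside the current octave of 14 levels.
    """
    octave = 0
    levels = []
    for ch in hex_string:
        code = int(ch, 16)
        if code == 0xF:
            octave += 1
        elif code == 0x0:
            octave -= 1
        else:
            levels.append(code - 8 + 14 * octave)
    return levels


print(decode("FE"))     # prints [20]
print(decode("FF7"))    # prints [27]
print(decode("FFFF1"))  # prints [49]
```

Under this reading a jump from level 0 to level 20 needs only one step marker (&hFE), while &hFF7 lands on level 27.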

Don't get me wrong, I do like having input; it's just annoying when people fail to read the whole post before replying.

Oh, and could you give me the address of the website?

--Thomas McCorkell

I didn't try to read and understand your WHOLE post (as I
don't have the time), but you may want to look up something
called "Arithmetic Coding". It sounds like that's what
you're doing. It also uses floating point numbers.


Kami no Itte ga ore ni zettai naru!
神はサイコロを振らない!
I'm terribly sorry for not reading your whole post... I made a few incorrect assumptions (sorry).

The address of that article would be http://www.gamasutra.com/features/20010416/ivanov_01.htm

Actually the method you suggested is not such a bad idea. (Please excuse me again; I still haven't read your post but I will...)

How well this works, I think, depends on the "bit depth" of the sound; noise, for example, wouldn't work. For low-frequency sound, however, it should IMHO work.

Again sorry.
It's all right... I've done the same thing myself sometimes :-)

The link is on a members-only site... (I HATE MEMBERS-ONLY) ...and it asks me for things like my address so I'm not going to sign up to it (I HATE SPAM). Could you e-mail me the article please? (Or just give me the general gist of it) Thanks.

tangentz: I will look that up at some point. Thanks.

Speaking of compression, one way is to use a look-up method with variable bit lengths (a bit like Morse code - E (the most common letter) gets the shortest signal: a single short beep/dot/flash/pulse/whatever).

Please note that this isn't a full thesis or whatever, it's just chucking ideas together in the hope that something good comes of it. Feel free to come up with your own ideas to add to this.

--Thomas McCorkell

quote:Original post by BoggyB
Speaking of compression, one way is to use a look-up method with variable bit lengths (a bit like Morse code - E (the most common letter) gets the shortest signal: a single short beep/dot/flash/pulse/whatever).



Well, this is basically what Huffman encoding does.

-Marten
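
To make the Morse-code comparison concrete, here is a minimal Huffman sketch: frequent symbols end up with short bit strings, rare ones with longer ones. It is a textbook-style construction using Python's standard heapq module, not code from anyone in this thread, and the function name huffman_codes is made up.

```python
import heapq
from collections import Counter


def huffman_codes(symbols):
    """Return a dict mapping each symbol to its Huffman bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie-breaker, {symbol: code so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two lightest subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


text = "EEEEEEEETTTTAAZ"
codes = huffman_codes(text)
total_bits = sum(len(codes[ch]) for ch in text)
print(codes, total_bits)
```

In the usage above, E (the most common symbol) gets a 1-bit code and the whole string takes 25 bits instead of 30 at a fixed 2 bits per symbol.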
Sounds like temporal difference coding...
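
A minimal sketch of difference (delta) coding, assuming integer samples and a starting reference of 0: each sample is stored as its change from the previous one. The function names are hypothetical.

```python
def delta_encode(samples):
    """Store each sample as the difference from the previous one (first vs. 0)."""
    prev = 0
    deltas = []
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas


def delta_decode(deltas):
    """Rebuild the samples by accumulating the differences."""
    total = 0
    samples = []
    for d in deltas:
        total += d
        samples.append(total)
    return samples


smooth = [100, 102, 105, 107, 106]
harsh = [16000, -16000, 16000, -16000]
print(delta_encode(smooth))  # prints [100, 2, 3, 2, -1]
print(delta_encode(harsh))   # prints [16000, -32000, 32000, -32000]
```

The differences are small only when the signal changes slowly; for a signal that flips sign every sample they are larger than the samples themselves, so the approach gains nothing there.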

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
A note here: Have you actually studied various samples of wave data? A while ago I experimented with an algorithm that would vectorize the samples in two or three levels, and then apply LZW encoding to the result. It worked great on the one sample I tested with, which was some smooth Dolly Parton song. I then tried it on a Metallica song, and the result was horrible!!! There it was very common that samples went from +31232 to -26403 etc. And by "common" I mean that every second sample was around +16,000 and the others around -16,000.

I'm not sure what you mean, but you were talking of octaves. That's totally irrelevant here, as a single sample does not have an "octave"; an octave is defined in terms of frequency, which only exists over several samples. I could have misunderstood you though.
I love listening to .ogg music. CDs in about CD quality at 20 to 30 MB.. hehe :D

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud
