Jump to content
  • Advertisement
Sign in to follow this  
Lode

deflate spec

This topic is 4682 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

If you intended to correct an error in the post then please contact us.

Recommended Posts

First of all, I don't ask anyone to read and understand the complete deflate spec for me, rather, I'm just hoping that someone accidently used it too once and remembers something about the bit order. In the following spec about the deflate format: http://ietfreport.isoc.org/idref/draft-deutsch-deflate-spec/#page-11 it's extremely hard for me to find out what now just the bit order is. A file is a series of bytes, but the compressed data is in the form of huffman code bits, mixed with bits to be interpreted as 3-bit integers and 5-bit integers. This whole thing is independent of the bytes in which those bits are stored, so deflate block headers might start right in the middle of a byte. I understood that you go through it byte by byte, and that you read the bits from LSB to MSB of the byte, so if the file is a series of bits as you would normally read them (with the MSB bits first), you read it in this order: 8 7 6 5 4 3 2 1 - 16 15 14 13 12 11 10 9 - 24 23 22 21 20 19 18 17 and that is the order in which you interpret bits of the huffman codes. I also understood that if you encounter a 3-bit integer in the code, that you keep reading the bits in the same order as I gave above, and that in that order, the first bit you encounter is the MSB of the 3-bit integer, then the second one, and finally the LSB of it. Am I correct? Thanks!

Share this post


Link to post
Share on other sites
Advertisement
Quote:
* Data elements are packed into bytes in order of
increasing bit number within the byte, i.e., starting
with the least- significant bit of the byte.
* Data elements other than Huffman codes are packed
starting with the least-significant bit of the data
element.
* Huffman codes are packed starting with the most-
significant bit of the code.

Looking at that, huffman codes are MSB-LSB. Otherwise its LSB-MSB. I didn't read the spec in detail though.

Edit: Actually I think it might be the other way around. Bits are normally interpreted as MSB=bit 0, LSB=bit 7. So if, for regular data elements, we're taking bit 0 from the LSB (bit 7) of the data and putting it in the LSB (bit 7) of byte, then bit 0, taken from data bit 7 will end up in bit 7 of the byte. *brain overload*

Maybe you should examine some real deflate data and look at the bit order of the header.

Share this post


Link to post
Share on other sites
For any fixed length codes you have to read they are stored LSB first, so if you read a three bit fixed length code from the (LSB first) byte: 10011001 you read the three bits 1, 0 and 0, which are interpreted as the binary number 001 with decimal value 1. Whenever you read a huffman code from the stream it is read MSB first, so if you had the huffman tree:
  root
/ \
0 1
[0] / \
0 1
/ \ [3]
0 1
[1] [2]

(where [x] indicates the code number) and read a huffman code from the same byte you would again read the bits 1, 0 and 0 but they would be interpreted as the binary number 100 with decimal value 4, corresponding to Huffman code 1 in the above tree.

Enigma

Share this post


Link to post
Share on other sites
Quote:
Original post by Enigma
byte: 10011001


I'm sorry but I'm still a bit confused because the byte is symmetric, could you do the same example for byte 10010110 instead please?

Thanks.

Share this post


Link to post
Share on other sites
Indeed, probably wasn't a good idea to pick a symmetric byte. OK, using LSB first byte 10010110, which printed normally in hexadecimal as an MSB byte would be 0x69, it would be exactly the same as my previous post. I.e.:
#include <iostream>

std::string binary(int value, unsigned int minDigits)
{
char codes[2] = {'0', '1'};
std::string result;
while (value > 0)
{
result = codes[value & 0x1] + result;
value >>= 1;
}
while (result.size() < minDigits)
{
result = '0' + result;
}
return result;
}

class Bitstream
{
public:
Bitstream(unsigned char c)
:
c_(c),
b_(0)
{
}
unsigned char getBit()
{
unsigned char r = (c_ & (1 << b_)) >> b_;
++b_;
return r;
}
private:
unsigned char c_, b_;
};

void readFixedLengthCode(Bitstream bs, unsigned char numberOfBits)
{
unsigned char fixedLengthCode = 0;
unsigned char bitNumber = 0;
while (bitNumber < numberOfBits)
{
fixedLengthCode += bs.getBit() << bitNumber;
++bitNumber;
}
std::cout << int(numberOfBits) << " bit fixed length code: " << binary(fixedLengthCode, numberOfBits) << '\n';
}

void readHuffmanCode(Bitstream bs, std::string code1, std::string code2, std::string code3, std::string code4)
{
char codes[2] = {'0', '1'};
std::string code;
while (code != code1 && code != code2 && code != code3 && code != code4)
{
code += codes[bs.getBit()];
}
std::cout << "huffman code: " << code << '\n';
}

int main()
{
unsigned char msbByte = 0x69;
std::cout << binary(msbByte, 8) << '\n';
readFixedLengthCode(Bitstream(msbByte), 3);
readHuffmanCode(Bitstream(msbByte), "0", "100", "101", "11");
}

Try playing around with that.

Enigma

Share this post


Link to post
Share on other sites
Sign in to follow this  

  • Advertisement
×

Important Information

By using GameDev.net, you agree to our community Guidelines, Terms of Use, and Privacy Policy.

Participate in the game development conversation and more when you create an account on GameDev.net!

Sign me up!