Huffman Coding: Decoding via Bitshifts...?

Started by
3 comments, last by SaintGreg 19 years ago
Hi, I have a question regarding the decoding of a huffman coded file and the bitshifts required... If I read in the file one character at a time I was thinking it would be possible to shift bits into a temporary character and then look for the correct code but I am unsure as to shift one bit at a time from one character to another...?
Advertisement
you can always do something like this:

temp_byte = buffer [bit_index >> 3];temp_bit = (temp_byte >> (bit_index & 0x7)) & 1;


then you just keep incrementing bit_index. Of course, reloading the temp_byte each time might slow down your algorithm, but doing an "if (!(bit_index & 0x7))" is probably even slower.
Can you explain that a little more please? I am still very new to all this stuff and that code just kind of went over my head a couple miles up.

thanks,
john
sure! (BTW, I'm assuming you're using C/C++)

step 1: read all of the compressed data into a buffer (where buffer is an array of unsigned char)

to find any desired bit (i.e. bit_index) you do the following:

step 2: isolate the byte in which that bit is located, ergo:
"temp_byte = buffer [bit_index >> 3];"
note that >>3 means divide by 8, (look up bit-shifting) so you could say:
"temp_byte = buffer [bit_index / 8];"
this is because there are 8 bits in a byte, so dividing by 8 and rounding down gives you the byte that houses the bit at "bit_index"

step 3: grab the sellected bit from that byte, ergo:
"temp_bit = (temp_byte >> (bit_index & 0x7)) & 1;"
there are 3 parts here:
a) bit_index & 0x7 gets the remainder of bit_index / 8. You could also use modulo (in C/C++ it is the "%" character) for any generic number, but since 8 is a power of two, we can use "& (binary)111" instead of "% 8", as it's faster
b) temp_byte >> X means shift the bits right by X (where x is our bit remainder from step 3a). if X=0, there is no shifting, if X=0 it shifts all the bits by one, so the 2-bit moves into the 1-bit place, the 4-bit into the 2-bit place, etc.
c) Y & 1 simply grabs the least significant bit of the number Y. in the previous step, we made sure that the bit of interest was acually in that location

so to get the 1st bit in the whole buffer, set bit_index to 0, and call the code. to find the 1000th bit, set bit_index to 999, and call the code. You should also do error checking to make sure you never read more than was put into the buffer.

hope it helps
You need to access bits one at a time, so you read in the file to a temporary char, or int, or whatever, char is easiest. Then you acess each bit at a time:

for(int i=0;i<howeverBigTheFileIs;i++){  char temp_char;  temp_char = readInAcharFromFile(whateverFile);  for(int j=0;j<sizeof(char)*8;j++)  {    doWhatYouNeedToDoWithTheMostRecentBit(temp_char & (0x01 << j));  }}


And doing lonestar's way by reading from a char buffer, I would use "buffer[bit_index / 8]" not "buffer[bit_index >> 3]" since the compiler will perform the optimization to do the divide faster with bit shifts, and it looks alot clearer when you see that you are dividing by 8 to go from bit_index to byte_index.

[Edited by - SaintGreg on March 29, 2005 4:29:59 PM]

This topic is closed to new replies.

Advertisement