fast bitwise parsing in C++

Started by
4 comments, last by JohnBolton 17 years, 12 months ago
When reading something like a huffman encoded file, since the huffman symbols have a length of a certain amount of bits in such a way that the symbols aren't nicely divided over bites, you need to read the thing bit per bit. Because everything works with bytes and C++ pointers can only point to bytes, I need to do things like this to access bits in a way as if I'm reading through a buffer of bits instead of a buffer of bytes:
int bit = (in[bp / 8] >> (bp % 8)) & 1; bp++;
Where in is the in buffer (which is an unsigned char buffer), bp is the "bit pointer". What I'd now like to ask is: Are there faster things than this to read bits out of unsigned char buffers? The "/ 8", ">>", "&" and "% 8" for every single bit seems so wasted to me... but I can't think of another way. Thanks!
Advertisement
map

Kuphryn
I don't have much experience in this realm but you could try:

int bit = (in[bp >> 3] >> (bp & 0x7)) & 1; bp++;


That atleast gets rid of the division and remainder ops, but I wouldnt be able to tell you how much faster it is, probably fairly insignificant. If it's ease of access you want, you could use a bit field:

struct OneByte{unsigned char bit0 : 1;unsigned char bit1 : 1;unsigned char bit2 : 1;unsigned char bit3 : 1;unsigned char bit4 : 1;unsigned char bit5 : 1;unsigned char bit6 : 1;unsigned char bit7 : 1;};


Again, like I said, I'm not really familiar with huffman encoding, but the best thing to do (if possible) is not to work with single bits directly. If you know which bits you need in each byte, mask those out and recombine them into a full byte (I'm assuming you will need the full byte anyway). Hopefully, that was atleast somewhat useful.

--
Kory Spansel
korys@pipeworks.com
"The pioneers of a warless world are the youth who refuse military service" - Albert Einstein
Quote:Original post by kuphryn
map

Kuphryn


What do you mean by map?
Division is still pretty expensive on Intel CPUs, so I would avoid it.

In anything you do, you want to write the case for doing N things first, and then implement doing a single thing by setting N == 1. For example, implementing puts() in terms of putchar() is inefficient; it's better to implement putchar() in terms of puts().

When it comes to bits, you want to keep a pointer to your buffer, and a counter for how many bits are left in the current byte (or, better, word). When asked for N bits, you will shift out that many bits, and return them; updating your pointers.

enum Bool { True, False, FileNotFound };
Ands and shifts are the way to do it. However, you can speed it up a lot by extracting several bits at a time instead of 1 at a time. I'm feeling generous, and I like solving little coding problems like this:
    // Extracts an n-bit (n <= 32) value from a buffer starting at offset bits    // and updates the offset value    unsigned int Extract( unsigned char const * buffer, int & offset, int n )    {        assert( n <= 32 );        int byteOffset = offset / 8;        int r = offset % 8;        unsigned char const * p = buffer + byteOffset;        unsigned int value = 0;                // If r > 0, then we are starting in the middle of a byte, so extract        // the rest of the byte first.        if ( r > 0 )        {            int k = ( r + n > 8 ) ? 8 - r : n; // Number of bits to be extracted from the current byte            value <<= k;            value |= ExtractFromByte( *p, r, k );            ++p;            offset += k;            n -= k;        }                // As long as n >= 8, then we can extract entire bytes        while ( n >= 8 )        {            value <<= 8;            value |= *p;            ++p;            offset += 8;            n -= 8;        }                // Get the remaining bits        if ( n > 0 )        {            value <<= n;            value |= ExtractFromByte( *p, 0, n );            offset += n;        }    }    // Extracts n bits at offset r from a byte. r+n must be <= 8.    inline unsigned char ExtractFromByte( unsigned char b, int r, int n )    {        assert( r + n <= 8 );        return ( b >> ( 8 - ( r + n ) ) ) & ( ( 1 << n ) - 1 );    } 
Note: This code is untested.
John BoltonLocomotive Games (THQ)Current Project: Destroy All Humans (Wii). IN STORES NOW!

This topic is closed to new replies.

Advertisement