Sign in to follow this  

fast bitwise parsing in C++

Recommended Posts

When reading something like a huffman encoded file, since the huffman symbols have a length of a certain amount of bits in such a way that the symbols aren't nicely divided over bites, you need to read the thing bit per bit. Because everything works with bytes and C++ pointers can only point to bytes, I need to do things like this to access bits in a way as if I'm reading through a buffer of bits instead of a buffer of bytes:
int bit = (in[bp / 8] >> (bp % 8)) & 1; bp++;
Where in is the in buffer (which is an unsigned char buffer), bp is the "bit pointer". What I'd now like to ask is: Are there faster things than this to read bits out of unsigned char buffers? The "/ 8", ">>", "&" and "% 8" for every single bit seems so wasted to me... but I can't think of another way. Thanks!

Share this post

Link to post
Share on other sites
I don't have much experience in this realm but you could try:

int bit = (in[bp >> 3] >> (bp & 0x7)) & 1; bp++;

That atleast gets rid of the division and remainder ops, but I wouldnt be able to tell you how much faster it is, probably fairly insignificant. If it's ease of access you want, you could use a bit field:

struct OneByte
unsigned char bit0 : 1;
unsigned char bit1 : 1;
unsigned char bit2 : 1;
unsigned char bit3 : 1;
unsigned char bit4 : 1;
unsigned char bit5 : 1;
unsigned char bit6 : 1;
unsigned char bit7 : 1;

Again, like I said, I'm not really familiar with huffman encoding, but the best thing to do (if possible) is not to work with single bits directly. If you know which bits you need in each byte, mask those out and recombine them into a full byte (I'm assuming you will need the full byte anyway). Hopefully, that was atleast somewhat useful.

Kory Spansel

Share this post

Link to post
Share on other sites
Division is still pretty expensive on Intel CPUs, so I would avoid it.

In anything you do, you want to write the case for doing N things first, and then implement doing a single thing by setting N == 1. For example, implementing puts() in terms of putchar() is inefficient; it's better to implement putchar() in terms of puts().

When it comes to bits, you want to keep a pointer to your buffer, and a counter for how many bits are left in the current byte (or, better, word). When asked for N bits, you will shift out that many bits, and return them; updating your pointers.

Share this post

Link to post
Share on other sites
Ands and shifts are the way to do it. However, you can speed it up a lot by extracting several bits at a time instead of 1 at a time. I'm feeling generous, and I like solving little coding problems like this:
    // Extracts an n-bit (n <= 32) value from a buffer starting at offset bits
// and updates the offset value
unsigned int Extract( unsigned char const * buffer, int & offset, int n )
assert( n <= 32 );
int byteOffset = offset / 8;
int r = offset % 8;
unsigned char const * p = buffer + byteOffset;
unsigned int value = 0;

// If r > 0, then we are starting in the middle of a byte, so extract
// the rest of the byte first.
if ( r > 0 )
int k = ( r + n > 8 ) ? 8 - r : n; // Number of bits to be extracted from the current byte
value <<= k;
value |= ExtractFromByte( *p, r, k );
offset += k;
n -= k;

// As long as n >= 8, then we can extract entire bytes
while ( n >= 8 )
value <<= 8;
value |= *p;
offset += 8;
n -= 8;

// Get the remaining bits
if ( n > 0 )
value <<= n;
value |= ExtractFromByte( *p, 0, n );
offset += n;

// Extracts n bits at offset r from a byte. r+n must be <= 8.
inline unsigned char ExtractFromByte( unsigned char b, int r, int n )
assert( r + n <= 8 );
return ( b >> ( 8 - ( r + n ) ) ) & ( ( 1 << n ) - 1 );
Note: This code is untested.

Share this post

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

Sign in to follow this