questions about dealing with bits (compression)

Started by
7 comments, last by umbrae 18 years, 8 months ago
I'm writing an implementation of the Lempel-Ziv-Welch compression for a uni assignment - no, I'm not asking you to write it for me :P. I'm going to use bit packing to output the pattern id's, that is, for the first 256 numbers I can just write 9 bits (not 8 - reason is in the implementation of lzw) and increase it to 10 bits, then 11 etc. I want to output these numbers to a binary stream without having to write some crazy algorithm that fills up a fixed size data type but as far as I can tell the minimum size of data to output is a byte. Is there a way to just output data one bit at a time? That would be the best solution because it also means that I don't need to know how big it's going to be, but then again I could probably figure out how to find it's size... Anyway, another way would be if there was a function or something that allowed you to just randomly write bits to a buffer. Say I had a buffer:
unsigned int* buffer = new unsigned int[sizeOfData];
and I had the last 13 bits of an unsigned int that I wanted in the first 13 bits of the buffer, I would just use bit shifting and make it work. But if the position of the data that I wanted to write spanned 2 unsigned int's... I would have to do a lot of algorithmic genius and I don't really want to do it. Is there a way to write data at an arbitary offset (in bits!) in a buffer? I can write something crazy that fills up an unsigned int with data and works out the bits left over, but I'm a Lazy Programmer(tm), that is "Do as little work as possible to get the task completed, but no less". If someone else has implemented this, or (more importantly) it's already in the stdlib or the basic functionality of C/C++ I want to use it. Any help or pointers would be grear. A simple "No that's not possible" would also be good :).
Advertisement
You can set an arbitrary bit in a buffer like this:

void setBit(unsigned char*buffer,unsigned int BitNumber,bool value){    static const unsigned char masks[] = {128,64,32,16,8,4,2,1};        unsigned int byteNumber = BitNumber/8;    unsigned int bitNumber  = BitNumber%8;    if(value)        buffer[byteNumber] |= masks[bitNumber];    else        buffer[byteNumber] &= 255-masks[bitNumber];}


Quote:Is there a way to just output data one bit at a time?

No. You can write a simple wrapper to simulate it however. Eg:

void WriteData(const int& data){	static int bitCount = 0;	static unsigned char storedData;	storedData = (storedData << 1) | data;	if(++bitCount == 8)	{		output << storedData;		bitCount = 0;	}}

Simplistic, and has problems (eg, no way to flush the output buffer), but that's the basic idea. See if you can design a better one. Enjoy. :)
Make buffer = ciel[Data / 8]

You know which element, by going data >> 3
Which bit in the element by data & 7
Use a mask to figure out what to do.

From,
Ncie coder
Click here to patch the mozilla IDN exploit, or click Here then type in Network.enableidn and set its value to false. Restart the browser for the patches to work.
Thanks a bundle everyone.

All valid and good methods, I was just hoping for something that didn't use a bunch of bitshifting and/or a function call for each bit...

I'm guessing that this isn't possible...


Thanks again :)
It's always going to be bit shifting and/or bit masks at low level, but you could easily take the function Igave you and create higher level wrappers around it for other types:


//use with POD types onlytemplate<class T>void writeUnalligned(unsigned char* buffer,unsigned int BitNumber,const T& data){    static const unsigned char masks[] = {128,64,32,16,8,4,2,1};    const byte* rawData = reinterpret_cast<const byte*>(&data);    for(int i=0;i<sizeof(T);++i)    {       for(int j=0;j<8;++j)       {         setBit(buffer,BitNumber+(i*8)+j,bool(rawData & masks[j]);       }    }}


That wraps up setBit for use with any pod type.
I can see no way to do this generically with a POD type.

Please check this code before using it - I only spent a couple of minutes on it and I haven't tested it.
Thanks again.

For my purpose I can get away with further simplifications and optimisations. Currently with my quick and dirty implementation (I don't even know if it can be uncompressed yet) I'm happy. I just used a temp buffer and wrote it out when it was full, remembering how much of the last chunk of data was left over.

Original Text File:                     578,178Unix compress (adaptive Lempel-Ziv)     253,435My Code:                                246,236OSX Zip:                                219,652Unix gzip (LZ77):                       219,308Unix zip:                               219,058Unix bzip2 (block-sorting):             180,775


Not bad for one night's work :)... it beats compress at least.
How about:
#include <bitset>#include <iostream>#include <limits>#include <stdexcept>#include <string>#include <vector>class BitVector{	private:		typedef std::vector< unsigned char > BitVectorImpl;	public:		BitVector(unsigned char undefinedBit = 0);		BitVector(std::string initialContents, unsigned char undefinedBit = 0);		void append(BitVector const & bitVector);		BitVectorImpl::iterator begin();		BitVectorImpl::const_iterator begin() const;		BitVectorImpl::iterator end();		BitVectorImpl::const_iterator end() const;		template < typename CHAR_TYPE, typename CHAR_TRAITS >		void printBinary(std::basic_ostream< CHAR_TYPE, CHAR_TRAITS > & stream) const;	private:		BitVectorImpl bitVector_;		unsigned int numberOfBits_;		unsigned char undefinedBit_;};BitVector::BitVector(unsigned char undefinedBit)	:	numberOfBits_(0),	undefinedBit_(undefinedBit ? 1 : 0){}BitVector::BitVector(std::string initialContents, unsigned char undefinedBit)	:	bitVector_((initialContents.size() + std::numeric_limits< unsigned char >::digits - 1) / std::numeric_limits< unsigned char >::digits),	numberOfBits_(initialContents.size()),	undefinedBit_(undefinedBit ? 1 : 0){	initialContents.insert(initialContents.end(), std::numeric_limits< unsigned char >::digits - (initialContents.size() % std::numeric_limits< unsigned char >::digits), undefinedBit_ ? '1' : '0');	for (unsigned int bitIndex = 0; bitIndex < initialContents.size(); bitIndex += std::numeric_limits< unsigned char >::digits)	{		std::bitset< std::numeric_limits< unsigned char >::digits > chunk(initialContents.substr(bitIndex, std::numeric_limits< unsigned char >::digits));		bitVector_[bitIndex / std::numeric_limits< unsigned char >::digits] = chunk.to_ulong();	}}void BitVector::append(BitVector const & bitVector){	if (numberOfBits_ + bitVector.numberOfBits_ < numberOfBits_)	{		throw std::overflow_error("bit vector overflow");	}	unsigned int extraBytesRequired = ((numberOfBits_ + bitVector.numberOfBits_ + std::numeric_limits< unsigned char >::digits - 1) / std::numeric_limits< unsigned char >::digits) - bitVector_.size();	bitVector_.insert(bitVector_.end(), extraBytesRequired, undefinedBit_ ? std::numeric_limits< unsigned char >::max() : std::numeric_limits< unsigned char >::min());	if (numberOfBits_ % std::numeric_limits< unsigned char >::digits == 0)	{		std::copy(bitVector.begin(), bitVector.end() - 1, bitVector_.begin() + (numberOfBits_ / std::numeric_limits< unsigned char >::digits));		unsigned char leftOverBits = bitVector.numberOfBits_ % std::numeric_limits< unsigned char >::digits;		unsigned char mask = ((1 << leftOverBits) - 1) << (std::numeric_limits< unsigned char >::digits - leftOverBits);		bitVector_.back() = (bitVector_.back() & ~mask) | (bitVector.bitVector_.back() & mask);	}	else	{		unsigned char remainingBitsInTargetByte = ((bitVector_.size() - extraBytesRequired) * std::numeric_limits< unsigned char >::digits) - numberOfBits_;		unsigned char remainingBitsInSourceByte = std::numeric_limits< unsigned char >::digits - remainingBitsInTargetByte;		unsigned char sourceOffsetMask = ((1 << remainingBitsInTargetByte) - 1) << remainingBitsInSourceByte;		unsigned char targetOffsetMask = ~((1 << remainingBitsInTargetByte) - 1);		BitVectorImpl::iterator targetByte = bitVector_.begin() + (numberOfBits_ / std::numeric_limits< unsigned char >::digits);		BitVectorImpl::const_iterator sourceByte = bitVector.begin();		unsigned int bitsCopied = 0;		unsigned int fullByteBits = bitVector.numberOfBits_ - (bitVector.numberOfBits_ % std::numeric_limits< unsigned char >::digits);		while (bitsCopied < fullByteBits)		{			*targetByte = (*targetByte & targetOffsetMask) | ((*sourceByte & sourceOffsetMask) >> remainingBitsInSourceByte);			if (bitsCopied + remainingBitsInTargetByte < bitVector.numberOfBits_)			{				++targetByte;				*targetByte = (*targetByte & ~targetOffsetMask) | ((*sourceByte & ~sourceOffsetMask) << remainingBitsInTargetByte);			}			bitsCopied += 8;			++sourceByte;		}		if (bitsCopied < bitVector.numberOfBits_)		{			if (static_cast< unsigned int >(remainingBitsInTargetByte) > bitVector.numberOfBits_ - bitsCopied)			{				unsigned char bitsRemaining = bitVector.numberOfBits_ - bitsCopied;				sourceOffsetMask &= ((1 << bitsRemaining) - 1) << (std::numeric_limits< unsigned char >::digits - bitsRemaining);				targetOffsetMask |= (1 << (static_cast< unsigned int >(remainingBitsInTargetByte) - bitsRemaining)) - 1;			}			*targetByte = (*targetByte & targetOffsetMask) | ((*sourceByte & sourceOffsetMask) >> remainingBitsInSourceByte);			if (bitsCopied + remainingBitsInTargetByte < bitVector.numberOfBits_)			{				if (static_cast< unsigned int >(remainingBitsInSourceByte) > bitVector.numberOfBits_ - (bitsCopied + remainingBitsInTargetByte))				{					unsigned char bitsRemaining = bitVector.numberOfBits_ - (bitsCopied + remainingBitsInTargetByte);					sourceOffsetMask |= ((1 << bitsRemaining) - 1) << (std::numeric_limits< unsigned char >::digits - bitsRemaining);					targetOffsetMask &= (1 << (static_cast< unsigned int >(remainingBitsInTargetByte) - bitsRemaining)) - 1;				}				++targetByte;				*targetByte = (*targetByte & ~targetOffsetMask) | ((*sourceByte & ~sourceOffsetMask) << remainingBitsInTargetByte);			}		}	}	numberOfBits_ += bitVector.numberOfBits_;}BitVector::BitVectorImpl::iterator BitVector::begin(){	return bitVector_.begin();}BitVector::BitVectorImpl::const_iterator BitVector::begin() const{	return bitVector_.begin();}BitVector::BitVectorImpl::iterator BitVector::end(){	return bitVector_.end();}BitVector::BitVectorImpl::const_iterator BitVector::end() const{	return bitVector_.end();}template < typename CHAR_TYPE, typename CHAR_TRAITS >void BitVector::printBinary(std::basic_ostream< CHAR_TYPE, CHAR_TRAITS > & stream) const{	BitVectorImpl::const_iterator chunkIterator = bitVector_.begin();	unsigned int bitIndex = 0;	for (; bitIndex + std::numeric_limits< unsigned char >::digits <= numberOfBits_; bitIndex += std::numeric_limits< unsigned char >::digits, ++chunkIterator)	{		std::bitset< std::numeric_limits< unsigned char >::digits > chunk(*chunkIterator);		stream << chunk.template to_string< CHAR_TYPE, CHAR_TRAITS, std::allocator< CHAR_TYPE > >();	}	for (unsigned char mask = 1 << (std::numeric_limits< unsigned char >::digits - 1); bitIndex < numberOfBits_; ++bitIndex, mask >>= 1)	{		stream << (*chunkIterator & mask ? '1' : '0');	}}BitVector & operator<<(BitVector & lhs, BitVector & rhs){	lhs.append(rhs);	return lhs;}int main(){	BitVector nineBitCode("010101111");	BitVector elevenBitCode("01000001111");	BitVector threeBitCode("111");	BitVector result;	result << nineBitCode << nineBitCode << threeBitCode << threeBitCode << nineBitCode << threeBitCode << elevenBitCode;	result.printBinary(std::cout);	std::cout << '\n';}


Enigma

EDIT: bug fix

[Edited by - Enigma on August 1, 2005 10:29:45 AM]
Quote:Original post by Enigma
How about:
*** Huge Source Snippet that's ripping the page in half :P ***


Thanks, but that's probably too *industrial* for my current project. I'm definitely copy pasting it though - thanks :).

This topic is closed to new replies.

Advertisement