How would I convert a number into binary bits, truncate or pad them to a given size, and then insert them into a bit container?


Hi... I have tried to do this again and again, but I'm failing to arrive at a solution, so I am posting here. Please bear with me. I want to take a number (declared preferably as int, char, or std::uint8_t), convert it into its binary representation, truncate or pad it to a given, variable number of bits, and then insert it into a bit container (preferably std::vector<bool>, because the container size needs to vary with the number of bits). For example, say I have int a = 2, b = 3, and I have to write these as three bits and six bits respectively into the container; that is, I have to put 010 and 000011 into the bit container. So, how would I go from 2 to 010, or from 3 to 000011, using normal STL methods? I tried every possible thing that came to my mind, but I got nothing. Please help. Thank you.

You can use the >>, <<, &, |, and ~ operators to do bitwise manipulation on integers.

Beware: The >> operator can* do arithmetic shifting if the type is signed (i.e. when the integer is signed, the sign bit may be copied into the vacated high bits instead of zeros).

Since this sounds a lot like homework, I'm not sure if I can give any more hints than this.


*edit: Apparently whether a signed right shift is arithmetic (ASR) or logical (LSR) is implementation-defined... making this even more of a pain in the ass than usual.
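
For illustration, a minimal sketch of that caveat (the exact result of the signed shift is implementation-defined, as the edit above says):

#include <cstdio>

int main() {
    signed char   s = -8;   // bit pattern 11111000
    unsigned char u = 0xF8; // the same bit pattern, but unsigned

    // After promotion to int, the signed value may shift arithmetically
    // (sign bit replicated), while the unsigned value shifts in zeros.
    std::printf("%d\n", s >> 2); // commonly -2 (11111110), implementation-defined
    std::printf("%d\n", u >> 2); // 62 (00111110)
}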

This is not homework. I'm trying to generate canonical Huffman codes as defined in http://en.wikipedia.org/wiki/Canonical_Huffman_code.

I did this logic on paper so I'll write it here. Let's say I have Huffman symbols and bit lengths (in order),


A	3
K	3
H	4
S	4
L	5
M	5
N	5
O	5

Since the pseudocode given in the link does some addition operations, I am using an integer, starting with code = 0 (see the sketch after the next table). So, the final canonical codes will be:


A	0
K	1
H	4
S	5
L	12
M	13
N	14
O	15
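
For reference, a minimal sketch of how those codes fall out of the linked pseudocode (assuming the symbols are already sorted by bit length, as above):

#include <cstdio>
#include <utility>
#include <vector>

int main() {
    // (symbol, bit length) pairs, sorted by bit length.
    std::vector<std::pair<char, int>> lengths = {
        {'A', 3}, {'K', 3}, {'H', 4}, {'S', 4},
        {'L', 5}, {'M', 5}, {'N', 5}, {'O', 5},
    };

    int code = 0;
    for (std::size_t i = 0; i < lengths.size(); ++i) {
        std::printf("%c\t%d\n", lengths[i].first, code);

        // Next code: add one, then shift left by however much the bit length grows.
        if (i + 1 < lengths.size())
            code = (code + 1) << (lengths[i + 1].second - lengths[i].second);
    }
    // Prints: A 0, K 1, H 4, S 5, L 12, M 13, N 14, O 15.
}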

Now, I want to convert these numbers into binary having their corresponding bit lengths as above so that I can use them to encode my file. So, these will become,


A	000
K	001
H	0100
S	0101
L	01100
M	01101
N	01110
O	01111

As you can see, the numbers from H to O do not need the leading 0 at the MSB to be represented. But it is required for correctly encoding the file, and this is where I'm stuck. I hope I made myself clear. If anyone can think of a solution (or an alternative), I would really appreciate it. Thank you very much.
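
In other words, the conversion needed here takes the integer code and its bit length and emits exactly that many bits, MSB first, so the leading zeros survive. A minimal sketch of that conversion:

#include <vector>

// Emit exactly 'length' bits of 'code', most significant bit first,
// so leading zeros are kept (e.g. code 4, length 4 -> 0,1,0,0).
std::vector<bool> to_bits(unsigned code, int length) {
    std::vector<bool> bits;
    for (int i = length - 1; i >= 0; --i)
        bits.push_back(((code >> i) & 1u) != 0);
    return bits;
}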

If you want to work with what C++ has to offer, then std::bitset is probably the best way to do it. It has a constructor that takes an unsigned long and extracts the individual bits for you, and once you've created the bitset object you can just use operator[] to access the bits you want.
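
For example (a sketch; the width must be a compile-time constant, 8 here):

#include <bitset>
#include <iostream>

int main() {
    std::bitset<8> bits(5); // extracts the bits of the value 5: 00000101

    std::cout << bits << '\n';    // prints 00000101 (MSB first)
    std::cout << bits[0] << '\n'; // operator[]: index 0 is the LSB, so this prints 1
}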

You can use the >>, <<, &, |, and ~ operators to do bitwise manipulation on integers.


This.
If you don't want to use std::bitset, then you'll need to manually buffer your bits into bytes: there is no way to read or write anything smaller than a byte from or to a stream.


#include <ostream>

struct foobar {
    unsigned buffer = 0;
    int offset = sizeof(unsigned)*8 - 1;

    std::ostream& out; // streams aren't copyable, so hold a reference

    explicit foobar(std::ostream& os) : out(os) {}
    void add_bit(bool b);
};

void foobar::add_bit(bool b) {
    // Place the bit at the current position, filling the word from the MSB down.
    this->buffer |= (b ? 1u : 0u) << this->offset;

    if ( this->offset == 0 ) {
        // The word is full: flush it to the stream as raw bytes and start over.
        this->out.write(reinterpret_cast<const char*>(&this->buffer), sizeof(this->buffer));

        this->offset = sizeof(this->buffer)*8 - 1;
        this->buffer = 0;
    } else
        this->offset--;
}
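
A sketch of how that might be used (note that a real writer would also need a flush for a partially filled final word; that is left out here):

#include <fstream>
#include <initializer_list>

int main() {
    std::ofstream file("encoded", std::ios::binary);
    foobar writer(file);

    // Write A's code 000 followed by H's code 0100, bit by bit.
    for (bool b : {false, false, false, false, true, false, false})
        writer.add_bit(b);
}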

I tried using std::bitset. However, while iterating through the list of symbols, I am not able to change the size of the bitset on each iteration to match the symbol's corresponding bit length, since the bitset's size must be a compile-time constant.


for(every symbol, bit length)
{
	std::bitset<bit length> Bits(code);
}

Error: expression must have constant value.

Is there any alternative to this?

EDIT: fastcall22, that is not my problem. I can write bits to the file fine. I want to put these bits of variable sizes into a list (or map) of vector<bool> for each symbol so that I can use that to encode the corresponding symbol to the file.


convert it into its binary representation

Also, this sounds a bit weird, because computers store numbers in binary. Your int/char/whatever ALREADY is in its binary representation (because that's what the computer uses).

So, if you want the first three bits of an int, you just read the first 3 bits. No need to 'convert to binary' or anything.

As in:

int number = 5;
int num_bits = 3; // how many bits you want to read from number (renamed so it doesn't clash with the container below)

std::vector<bool> bits;

// Bits stored in order 1st, 2nd, 3rd (LSB first). Iterate in the other direction for the opposite order.
for (int i = 0; i < num_bits; ++i)
{
    // Shift i bits to the right, then read the lowest bit
    // (e.g. shift 2 to the right, and the lowest bit will be the 3rd one).
    bool bit = static_cast<bool>((number >> i) & 1);

    bits.push_back(bit); // push it into the container
}

o3o


Also, this sounds a bit weird, because computers store numbers in binary. Your int/char/whatever ALREADY is in its binary representation (because thats what the computer uses).

Well, I thought so too... But when I write the number itself to the file, it writes the ASCII value to the file. I assumed it might do the same thing even when I store it in a bit container, so I thought I might need to do a conversion or something. For writing to the file, I figured out how to pack bits into bytes so the exact number gets written; the bit container part, however, I wasn't sure of. Thanks for the code, though. I'll check it out and see how it goes.

Well, I thought so too... But when I write the number itself to the file, it is writing the ASCII value to the file. I assumed it might do the same thing even when I store it in a bit container


There's a difference between stream << uint32 and stream.write(reinterpret_cast<const char*>(&uint32),sizeof(uint32)), regardless of whether or not you open a stream in binary or text mode.

In text mode, newlines are translated to a unified format: any combination of 0x0D and 0x0A (Windows, Mac, and Unix line endings) is translated to just 0x0A (IIRC).
In binary mode, there are no newline translations: "\r\n" remains as 0x0D 0x0A.

If you actually want to read and write binary data, then use stream.write/read:
#include <fstream>

int main() {
    unsigned short val = 1234; // (0x04D2)

    std::ofstream fs("out", std::ios::binary);

    fs << val << '\n';                                          // formatted: writes the characters "1234"
    fs.write(reinterpret_cast<const char*>(&val), sizeof(val)); // raw: writes the two bytes of val
}

/* out contains (on a little-endian machine):
HEX                     ASCII
31 32 33 34 0A D2 04    1234.Ò.
*/

Well, I thought so too... But when I write the number itself to the file, it is writing the ASCII value to the file.


Sounds like you're writing a uint8_t, not realizing that in C/C++ it's typically a typedef for unsigned char, and ostream treats a char type as a, well, char (and hence interprets the number as a character in the current locale, usually resulting in ASCII output).

You can use the .write() method to output a specific value as raw bytes.
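
A minimal sketch of the difference (assuming uint8_t is an alias for unsigned char, as it is on common implementations):

#include <cstdint>
#include <iostream>

int main() {
    std::uint8_t v = 65;

    std::cout << v << '\n';                   // treated as a char: prints A
    std::cout << static_cast<int>(v) << '\n'; // prints the number: 65

    // Raw output: writes the single byte 0x41.
    std::cout.write(reinterpret_cast<const char*>(&v), sizeof(v));
}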

Now, I want to convert these numbers into binary having their corresponding bit lengths as above so that I can use them to encode my file. So, these will become,


C++ doesn't have any built-in mechanism for dealing with that. You can store your bit pattern in a uint8_t, also store the number of bits in the pattern separately (e.g. as a pair<uint8_t, uint8_t> or a simple struct), and then use that for all the manual shifting you'll need to do.
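
For instance (the names here are hypothetical):

#include <cstdint>

// Storage for one canonical code: the bit pattern is right-aligned in
// 'bits', and 'length' says how many of its low bits are significant,
// so the leading zeros are implied.
struct huff_code {
    std::uint8_t bits;   // e.g. O's code 01111 stored as 15
    std::uint8_t length; // e.g. 5
};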

There are libraries like Boost's dynamic_bitset that might come in handy, too.

But for the most part, this kind of low-level bit manipulation is (surprisingly) not something that C or C++ really excel at or make particularly easy.

As a note, keep in mind that you're often better off working with integer types larger than char (like a regular int or long) when doing a lot of this kind of work, as it makes better use of the CPU's registers and instructions. Only work with 8-bit types when you really need to conserve memory; work with native machine integers everywhere you can.

Sean Middleditch – Game Systems Engineer – Join my team!

