Komal Shashank

How would I convert a number into binary bits, then truncate or enlarge their size, and then insert into a bit container?


Hi... I have tried to do this again and again, but I'm failing to arrive at a solution, so I am posting here. Please bear with me. I want to take a number (declared preferably as int, char, or std::uint8_t), convert it into its binary representation, then truncate or pad it to a given, variable number of bits, and then insert it into a bit container (preferably std::vector<bool>, because I need the container size to vary with the number of bits). For example, I have int a = 2, b = 3, and let's say I have to write these as three bits and six bits respectively into the container. So I have to put 010 and 000011 into the bit container. How would I go from 2 to 010 or 3 to 000011 using normal STL facilities? I tried every possible thing that came to my mind, but I got nothing. Please help. Thank you.

You can use the >>, <<, &, |, and ~ operators to do bitwise manipulation on integers.

Beware: The >> operator may do an arithmetic shift if the type is signed (i.e. for a signed integer, the sign bit is replicated into the high bits as you shift right).
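A small sketch of the difference (note the edit below: before C++20, right-shifting a negative value is implementation-defined, though it is commonly arithmetic):

#include <cstdint>
#include <iostream>

int main() {
    std::int8_t  s = -128; // bit pattern 1000'0000
    std::uint8_t u = 128;  // bit pattern 1000'0000

    // Signed: typically an arithmetic shift that replicates the sign bit.
    std::cout << (s >> 1) << '\n'; // usually -64 (1100'0000); implementation-defined before C++20

    // Unsigned: always a logical shift that fills with zeros.
    std::cout << (u >> 1) << '\n'; // 64 (0100'0000)
}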

Since this sounds a lot like homework, I'm not sure I can give any more hints than this.


*edit: Apparently whether a signed right shift is arithmetic or logical is implementation-defined... making this even more of a pain in the ass than usual.

Edited by Nypyren


This is not homework. I'm trying to generate canonical Huffman codes as defined in http://en.wikipedia.org/wiki/Canonical_Huffman_code.

 

I did this logic on paper so I'll write it here. Let's say I have Huffman symbols and bit lengths (in order),

A	3
K	3
H	4
S	4
L	5
M	5
N	5
O	5

Since the pseudocode given in the link does some addition operations, I am using an integer, starting with code = 0. So, the final canonical codes will be,

A	0
K	1
H	4
S	5
L	12
M	13
N	14
O	15

Now, I want to convert these numbers into binary at their corresponding bit lengths above so that I can use them to encode my file. So, these will become,

A	000
K	001
H	0100
S	0101
L	01100
M	01101
N	01110
O	01111

As you can see, the numbers from H to O do not need the leading 0 at the MSB to be represented, but it is required for correctly encoding the file, and this is where I'm stuck. I hope I made myself clear. If anyone can think of a solution (or an alternative), I would really appreciate it. Thank you very much.
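A minimal sketch of that assignment, assuming the bit lengths come sorted as in the table above (shortest first); the function name is just illustrative:

#include <cstddef>
#include <cstdint>
#include <vector>

// lengths[i] is the bit length of the i-th symbol, sorted ascending.
std::vector<std::uint32_t> canonical_codes(const std::vector<int>& lengths) {
    std::vector<std::uint32_t> codes(lengths.size());
    std::uint32_t code = 0;
    for (std::size_t i = 0; i < lengths.size(); ++i) {
        if (i > 0)
            code = (code + 1) << (lengths[i] - lengths[i - 1]); // increment, then pad when the length grows
        codes[i] = code;
    }
    return codes;
}

// e.g. canonical_codes({3,3,4,4,5,5,5,5}) yields {0,1,4,5,12,13,14,15}, matching the table.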

Edited by WDRKKS


If you want to work with what C++ has to offer, then std::bitset is probably the best way to do it. It has a constructor that takes an unsigned long and extracts the individual bits for you, and once you've created the bitset object you can just use operator[] to access the bits you want.
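Something like this (a minimal sketch):

#include <bitset>
#include <iostream>

int main() {
    std::bitset<6> bits(3); // 3 -> 000011, already padded to six bits

    std::cout << bits << '\n';    // prints 000011 (most significant bit first)
    std::cout << bits[0] << '\n'; // operator[] counts from the least significant bit: prints 1
}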


You can use the >>, <<, &, |, and ~ operators to do bitwise manipulation on integers.


This.
If you don't want to use std::bitset, then you'll need to manually buffer your bits into bytes; there is no way to read or write anything smaller than a byte from or to a stream:

#include <ostream>

struct foobar {
    unsigned buffer = 0;
    int offset = sizeof(unsigned)*8 - 1;

    std::ostream& out;

    explicit foobar(std::ostream& os) : out(os) {}

    void add_bit(bool b);
};

void foobar::add_bit(bool b) {
    this->buffer |= (b ? 1u : 0u) << this->offset;

    if ( this->offset == 0 ) {
        // The word is full: flush it to the underlying stream as raw bytes and start over.
        this->out.write(reinterpret_cast<const char*>(&this->buffer), sizeof(this->buffer));

        this->offset = sizeof(this->buffer)*8 - 1;
        this->buffer = 0;
    } else
        this->offset--;
}


I tried using std::bitset. However, while iterating through the list of symbols, I am not able to change the size of the bitset each iteration to match the symbol's corresponding bit length, since the size of a bitset must be a compile-time constant.

for(every symbol, bit length)
{
	std::bitset<bit length> Bits(code);
}

Error: expression must have constant value.

Is there any alternative to this?

 

EDIT: fastcall22, that is not my problem. I can write bits to the file fine. I want to put these bits of variable sizes into a list (or map) of vector<bool>, one per symbol, so that I can use it to encode the corresponding symbol to the file.

Edited by WDRKKS


convert it into its binary representation

 

Also, this sounds a bit weird, because computers store numbers in binary. Your int/char/whatever ALREADY is in its binary representation (because that's what the computer uses).

 

So, if you want the first three bits of an int, you just read the first three bits. No need to 'convert to binary' or anything.

 

As in:

#include <vector>

int number = 5;
int bit_count = 3; // how many bits you want to read from number

std::vector<bool> bits;

for (int i = 0; i < bit_count; ++i) // bits stored in order 1st, 2nd, 3rd (LSB first); iterate in the other direction for the opposite order
{
    bool bit = ((number >> i) & 1) != 0; // shift i bits to the right, then read the lowest bit (e.g. shift 2 to the right, and the lowest bit will be the 3rd one)
    bits.push_back(bit);                 // push it into the container
}

Edited by Waterlimon


Also, this sounds a bit weird, because computers store numbers in binary. Your int/char/whatever ALREADY is in its binary representation (because that's what the computer uses).

 

Well, I thought so too... But when I write the number itself to the file, it is writing the ASCII value to the file. I assumed it might do the same thing even when I store it in a bit container. So, I thought I might need to do a conversion or something. For writing to the file, I figured out a way of packing bits into bytes to write the exact number. However, this I wasn't sure of. Thanks for the code, though. I'll check it out and see how it goes.

Edited by WDRKKS


Well, I thought so too... But when I write the number itself to the file, it is writing the ASCII value to the file. I assumed it might do the same thing even when I store it in a bit container


There's a difference between stream << uint32 and stream.write(reinterpret_cast<const char*>(&uint32), sizeof(uint32)), regardless of whether you open the stream in binary or text mode.

In text mode, newlines are translated to a unified format: any combination of 0x0D and 0x0A (Windows, Mac, and Unix line endings) is translated to just 0x0A (IIRC).
In binary mode, there is no newline translation: "\r\n" remains 0x0D 0x0A.

If you actually want to read and write binary data, then use stream.write/read:
#include <fstream>

int main() {
    unsigned short val = 1234; // (0x04D2)

    std::ofstream fs("out", std::ios::binary);

    fs << val << '\n';                                          // formatted: writes the text "1234"
    fs.write(reinterpret_cast<const char*>(&val), sizeof(val)); // raw: writes the bytes D2 04
}

/* out contains (on a little-endian machine):
HEX                     ASCII
31 32 33 34 0A D2 04    1234.Ò.
*/
Edited by fastcall22


Well, I thought so too... But when I write the number itself to the file, it is writing the ASCII value to the file.


Sounds like you're writing a uint8_t, not realizing that in C/C++ that's typically a typedef for unsigned char, and ostream treats a char type as a, well, character (and hence interprets the number as a character in the current locale, usually resulting in ASCII output).

You can use the .write() method to output a specific value as raw bytes.

Now, I want to convert these numbers into binary having their corresponding bit lengths as above so that I can use them to encode my file. So, these will become,


C++ doesn't have any built-in mechanism for dealing with that. You can store your bit pattern in a uint8_t and then also store the number of bits in each pattern separately (e.g. as a pair<uint8_t, uint8_t> or a simple struct) and then use that for all the manual shifting you'll need to do.
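For instance, a minimal sketch (the type name is just illustrative):

#include <cstdint>

// One code: the bit pattern, right-aligned in `bits`, plus how many
// of its low bits are actually significant.
struct huffman_code {
    std::uint8_t bits;
    std::uint8_t length;
};

huffman_code h{0b0100, 4}; // H = 0100 from the table earlier in the thread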

There are libraries like Boost's dynamic_bitset that might come in handy, too.

But for the most part, this kind of low-level bit manipulation is (surprisingly) not something that C or C++ really excel at or make particularly easy.

As a note, keep in mind that you're often better off working with larger integer types (like a regular int or long) than with char when doing lots of this kind of work, as it makes better use of the CPU's registers and instructions. Only work with 8-bit types when you really need to conserve memory; work with native machine integers everywhere you can.


Keep in mind that an alphabet with N symbols can require codes of up to N - 1 bits, so unless your symbol set is small, the longest codes won't fit in a single integer. How are you creating the codes in the first place? If you have a Huffman tree then you can just traverse it to build a vector of bools for every symbol.
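A minimal sketch of that traversal, assuming a conventional binary Huffman tree (the node type here is hypothetical):

#include <map>
#include <vector>

struct node {
    node* left = nullptr;  // the 0 branch
    node* right = nullptr; // the 1 branch
    char symbol = 0;       // meaningful only for leaves
    bool is_leaf() const { return !left && !right; }
};

// Depth-first walk that records the path to each leaf as that symbol's code.
void build_codes(const node* n, std::vector<bool>& path,
                 std::map<char, std::vector<bool>>& codes) {
    if (n->is_leaf()) {
        codes[n->symbol] = path;
        return;
    }
    path.push_back(false);
    build_codes(n->left, path, codes);
    path.back() = true;
    build_codes(n->right, path, codes);
    path.pop_back();
}

// Usage: std::map<char, std::vector<bool>> codes; std::vector<bool> path;
//        build_codes(root, path, codes);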


Thanks for the replies, guys. I really appreciate it.

 

@Sean: I know that uint8_t is a typedef for unsigned char, but since I was using it to pack my bits to write my bytes, I thought I could use it for this too. Apparently not. However, I now plan on using long for the canonical codes and then putting them into the vector<bool>. I'm trying to master the STL for now, so I'm avoiding the Boost libraries. But thanks for your advice and the very informative post; I did not find this information by googling.

 

@fastcall22: OK... So, if I want to write the bit lengths (int or long) to the file header, how would I do it? I want to write them in such a way that, regardless of how I write them, I read back the same values as were written.

 

@SOL: I do have a tree, and it generates a vector<bool> for each symbol. But the codes are not canonical, and that's why I want to generate the canonical codes for each symbol.

Decoding: Huffman tree.
Encoding: map of char to list of bools, dynamic_bitset, or a Huffman tree that supports fast leaf lookups.
File I/O: Bit stream (such as http://www.drdobbs.com/bitstream-parsing-in-c/184402014 )

Bit streams are easy to write. The idea is:

- Make a class that holds a reference to the underlying stream to write to.
- Give the class two unsigned integers: "position" and "value".
- When you write a bit, value |= (bit << position); and position++; // do not shift the value itself - it makes decoding much harder.
- When your integer fills up (position reaches 32), write 'value' to the underlying stream and reset position and value to zero.
- When you have written all bits, if position != 0, write the value to the underlying stream.
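A minimal sketch of a writer following that recipe (names are illustrative):

#include <cstdint>
#include <ostream>

class bit_writer {
    std::ostream& out;
    std::uint32_t value = 0;
    unsigned position = 0;

public:
    explicit bit_writer(std::ostream& os) : out(os) {}

    void write_bit(bool bit) {
        value |= (bit ? 1u : 0u) << position; // do not shift the value itself
        if (++position == 32)
            flush();
    }

    // Call once at the end, too, in case a partially filled word remains.
    void flush() {
        if (position != 0)
            out.write(reinterpret_cast<const char*>(&value), sizeof(value));
        position = 0;
        value = 0;
    }
};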

The decoding stream works the same way but in reverse.

Edited by Nypyren


 


Also, this sounds a bit weird, because computers store numbers in binary. Your int/char/whatever ALREADY is in its binary representation (because that's what the computer uses).

 

Well, I thought so too... But when I write the number itself to the file, it is writing the ASCII value to the file. I assumed it might do the same thing even when I store it in a bit container. So, I thought I might need to do a conversion or something. For writing to the file, I figured out a way of packing bits into bytes to write the exact number. However, this I wasn't sure of. Thanks for the code, though. I'll check it out and see how it goes.

 

 

This statement makes me think that you are getting confused by the representation, not the methods. ASCII is nothing more than a human-readable representation of an 8-bit binary string. For instance, the binary string "0100 0001" is 65 in decimal, 41 in hex, and 'A' in ASCII. What hex editor are you using to view the binary files? The hex editor I use shows two panels, the left being the hexadecimal representation and the right being the ASCII representation; I can click on any hex value or ASCII character to get the binary string.

 

It seems to me that you are not fully understanding what is being written to the file, even when it is correct. Don't get hung up on the representation; it is just there to make the data easier for you to read. Understand what the representation is representing.
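To see this concretely, a small sketch printing the same byte two ways:

#include <iostream>

int main() {
    unsigned char c = 65; // bit pattern 0100 0001

    std::cout << c << '\n';                   // prints: A  (character representation)
    std::cout << static_cast<int>(c) << '\n'; // prints: 65 (numeric representation)
}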

Edited by MarkS

