How would I read/write bit by bit from/to a file?

Started by
17 comments, last by Bacterius 9 years ago

OK... My code is not going to be portable. It's just for Windows. I can do what you have suggested, but is that the only reason why my output bits are not matching the file bits, or is it something related to my code? Am I making a mistake in the code above? Please tell me. Thanks.


The number of bits doesn't match because you only write output once 8 bits have accumulated, so you'll need to account for the leftover bits somehow. The output bits not matching the file bits is a separate issue, though. I don't know what your 'output bits' code is, or how you're comparing it with your 'file bits', so the problem may lie there.


This is my output bits code... It just overloads the << operator so that I can print a vector<bool> using std::cout.


std::ostream& operator<<(std::ostream& os, const std::vector<bool>& Vec) // const reference avoids copying the vector
{
	std::copy(Vec.begin(), Vec.end(), std::ostream_iterator<bool>(os, ""));
	return os;
}

And since I'm not writing any header or anything and just writing bits directly from the vector<bool>, I'm comparing the first 2-3 bytes of the file in the Hex Editor with the first 16-24 bits in the console output. It doesn't match. It worked with the test program I did above where I just write one byte to the file 256 times to create a 256 byte file. There was a match. Somehow, it's not working here.

However, please clarify this for me... Is the code I wrote above correct? Am I missing anything, like a reversing operation or something like that? Or is the function perfectly fine? Please tell me, thank you for your time.

Your code looks okay. I suggest you write a function that prints out your Huffman code as a string so you can compare the code before and after it has been written to a file. One thing to pay attention to is the bit order per byte. If someone else is to read your Huffman code, then you need to document the order in which the bits in each byte should be read. I don't know how a vector of bools is implemented by your compiler, but it could very well use the reverse of the bit order you're using, or even use an unsigned int as storage or similar. That could be why the bit patterns in memory and on file don't match each other exactly.
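A minimal sketch of such a helper, assuming the bytes have been read back from the file into a std::vector<std::uint8_t>. The name bytes_to_bit_string is hypothetical, and most-significant-bit-first order is an assumption chosen to match the packing loop in this thread:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Render each byte as eight '0'/'1' characters, most significant bit first,
// with a single space between bytes, so the result can be compared by eye
// with the bits printed on the console.
std::string bytes_to_bit_string(const std::vector<std::uint8_t>& bytes)
{
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (i != 0)
            out += ' ';
        for (int bit = 7; bit >= 0; --bit)
            out += ((bytes[i] >> bit) & 1u) ? '1' : '0';
    }
    return out;
}
```

Printing the bytes this way before writing and again after reading them back makes a bit-order mismatch visible immediately.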

As others have already said, you really shouldn't be using a vector of bools to begin with, since std::vector<bool> is widely regarded as a broken part of the C++ standard library (it is a packed specialization that doesn't behave like a normal container). It's not hard to implement your own container if you want a dynamic bitset.

SOL, I did a couple of tests and found out that the vector<bool> works fine (at least in this particular instance). I tested to see which part of the code is causing the problem and found out it was the Bit_Counter part of the loop. Somehow it is not packing the bits correctly. I tried to write just one byte in the same loop and it works and everything fits. The output window, the hex editor, everything matches. I'm not able to see what's wrong. The operation syntax is correct. Here's the edited code, which writes only one byte but writes it perfectly,


for(int i = 0; i < Encoded_Data.size() /*this is 8 bits long now*/; ++i)
{
	if(Encoded_Data[i] == 1)
	{
		Packed_Byte |= 1;
	}

	if(i < Encoded_Data.size() - 1 /*naturally, this will be 7*/)
	{
		Packed_Byte <<= 1;
	}

	/*++Bit_Counter;

	if(Bit_Counter == 8)
	{
		Output_File << Packed_Byte;

		Bit_Counter = 0;
	}*/                        //this I removed completely
}

Output_File << Packed_Byte;

Any idea why? I'm clueless.

What are you testing with? This does not do what you think it does (taken from the code in your first post):


std::uint8_t F = 10111001;
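To spell out why, as a small sketch (the variable names here are made up for illustration): 10111001 is a decimal literal of roughly ten million, and converting it to an 8-bit unsigned type keeps only the value modulo 256. The bit pattern 1011 1001 has to be written in hex, or as a binary literal if the compiler supports C++14:

```cpp
#include <cstdint>

// Decimal literal 10111001 reduced modulo 256 when converted to 8 bits.
constexpr std::uint8_t from_decimal = static_cast<std::uint8_t>(10111001);

// The intended bit pattern 1011 1001, as a hex literal and as a
// binary literal (binary literals require C++14 or later).
constexpr std::uint8_t from_hex = 0xB9;
constexpr std::uint8_t from_binary = 0b10111001;

static_assert(from_decimal == 25, "10111001 % 256 == 25, i.e. 0001 1001");
static_assert(from_hex == 185, "0xB9 == 185");
static_assert(from_binary == from_hex, "both spell the same bit pattern");
```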




This is wrong:


if (i < Encoded_Data.size() - 1 /*naturally, this will be 7*/)
{
    Packed_Byte <<= 1;
}

You should be testing against the number of bits in a byte and not the number of bits in the entire bit stream. Otherwise you'll end up with one extra shift. You need to detect the last byte separately.
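One way to structure this, as a sketch rather than a drop-in fix: shift before adding each bit, count bits per byte, and flush the partial last byte after the loop. The function name pack_bits and the choice to pad the final byte with zero bits on the right are assumptions for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack bits into bytes, most significant bit first. If the bit count is not
// a multiple of 8, the final byte is padded with zero bits on the right.
std::vector<std::uint8_t> pack_bits(const std::vector<bool>& bits)
{
    std::vector<std::uint8_t> bytes;
    std::uint8_t packed = 0;
    int bit_count = 0; // bits accumulated in the current byte, 0..7

    for (std::size_t i = 0; i < bits.size(); ++i)
    {
        // Shift first, then add the new bit: no special case for the 8th bit.
        packed = static_cast<std::uint8_t>((packed << 1) | (bits[i] ? 1 : 0));
        if (++bit_count == 8)
        {
            bytes.push_back(packed);
            packed = 0;    // start the next byte from a clean state
            bit_count = 0;
        }
    }

    if (bit_count != 0)
    {
        // Last byte handled separately: left-align the leftover bits.
        bytes.push_back(static_cast<std::uint8_t>(packed << (8 - bit_count)));
    }
    return bytes;
}
```

When the result is written to disk on Windows, the stream should be opened with std::ios::binary. In text mode every 0x0A byte is written as 0x0D 0x0A, which by itself makes the file differ from the bits in memory.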

What are you testing with? This does not do what you think it does (taken from the code in your first post):


std::uint8_t F = 10111001;

Yeah, I've moved on past that. I've already mentioned that I've been corrected on that matter. That's fine.

This is wrong:


if (i < Encoded_Data.size() - 1 /*naturally, this will be 7*/)
{
    Packed_Byte <<= 1;
}

You should be testing against the number of bits in a byte and not the number of bits in the entire bit stream. Otherwise you'll end up with one extra shift. You need to detect the last byte separately.

I thought about this a little... And came up with this. I don't know whether it is correct.


for(int i = 0; i < Encoded_Data.size(); ++i)
{
	if(Encoded_Data[i] == 1)
	{
		Packed_Byte |= 1;
	}

	++Bit_Counter;

	if(Bit_Counter != 8)
	{
		Packed_Byte <<= 1;
	}

	if(Bit_Counter == 8)
	{
		Output_File << Packed_Byte;

		Bit_Counter = 0;
	}
}

Why I say this is because, even though I am getting my desired output now, somehow still some bytes are not matching. Here's a sample,


From Hex Editor,
10100000 00100101 10001011 10110011 11010101

From Console Output,
10100000 00100101 10001011 10110011 01010101

As you can see, the fifth byte's first bit is 1 instead of 0. Any idea why?

EDIT: I forgot to reset the packed byte (Packed_Byte = 0) when I reset the bit counter. It was reusing a bit from the previous byte. Now it works perfectly. I also wrote the code for appending extra bits in order to write the last byte. I just have one last thing to do. I have to write the eof bit into the file. How would I do this? I couldn't find anything useful about it by googling.
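On the end-of-data question, one common alternative to an in-band EOF marker is to store the number of valid bits in front of the packed bytes, so the reader knows where the padding starts. A minimal sketch, assuming a made-up layout (an 8-byte little-endian bit count followed by the payload) and hypothetical function names; nothing in this thread prescribes this format:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Prepend the bit count (8 bytes, little-endian) to the packed payload.
std::vector<std::uint8_t> add_length_prefix(std::uint64_t bit_count,
                                            const std::vector<std::uint8_t>& packed)
{
    std::vector<std::uint8_t> data;
    for (int i = 0; i < 8; ++i)
        data.push_back(static_cast<std::uint8_t>((bit_count >> (8 * i)) & 0xFF));
    data.insert(data.end(), packed.begin(), packed.end());
    return data;
}

// Read the prefix back and unpack exactly that many bits (most significant
// bit first), so the padding bits in the last byte are ignored.
std::vector<bool> unpack_with_prefix(const std::vector<std::uint8_t>& data)
{
    std::vector<bool> bits;
    if (data.size() < 8)
        return bits; // no complete prefix

    std::uint64_t bit_count = 0;
    for (int i = 0; i < 8; ++i)
        bit_count |= static_cast<std::uint64_t>(data[i]) << (8 * i);

    for (std::uint64_t i = 0; i < bit_count; ++i)
    {
        const std::size_t byte_index = 8 + static_cast<std::size_t>(i / 8);
        if (byte_index >= data.size())
            break; // truncated payload
        const unsigned shift = 7u - static_cast<unsigned>(i % 8);
        bits.push_back(((data[byte_index] >> shift) & 1u) != 0);
    }
    return bits;
}
```

The other usual option is to give the Huffman tree a dedicated end-of-stream symbol and stop decoding when it appears; the length prefix is shown here only because it is the shorter of the two to sketch.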

Do you really have a need to work on the generated "binary" file at all, e.g. somehow modify or inspect the file through your Hex editor?

If no then why not just store the binaries as std::uint8_t and later read them back as std::uint8_t and convert to 0s and 1s as needed?

Do you really have a need to work on the generated "binary" file at all, e.g. somehow modify or inspect the file through your Hex editor?

If no then why not just store the binaries as std::uint8_t and later read them back as std::uint8_t and convert to 0s and 1s as needed?

Wouldn't that defeat the purpose of *compressing* said file which as I understand it is what Huffman coding is mainly for? Unless I misunderstood the problem.

