BeanDog

[.net] Reading strings written by BinaryWriter.Write(string)?

My game's toolset is written in C#. When I "compile" a set of content to a file, I use .NET's BinaryWriter class to write strings to the file. They're length-prefixed, but not (apparently) by a normal 4-byte int. My game itself is written in C++, and it needs to read these strings in. I can't seem to dig up on Google the format .NET uses for its length prefix on strings. Does someone have a link to sample code for reading such strings or a specification on the format?

When you write out a string using BinaryWriter, the string is length-prefixed with a 7-bit encoded integer. This saves space whenever the length does not need a full 32-bit integer (and most won't).

You know better than I do, but on the page it says:

Quote:
A length-prefixed string represents the string length by prefixing to the string a single byte or word that contains the length of that string.

According to the current version of MSDN on BinaryWriter.Write(String) -
Quote:
This method first writes the length of the string as a four-byte unsigned integer, and then writes that many characters to the stream.
So far we have arguments for a byte, a word, an int, and even a variable-length prefix. [smile]

I know it is not constant--experimentation showed that a 50-character string gets a 1-byte prefix, and that longer strings get more bytes. I'd rather not examine the output of the function for every possible length of string, so the search for documentation continues.

Actually, I see byte, short, and int all in one paragraph:

Quote:
A length-prefixed string represents the string length by prefixing to the string a single byte or word that contains the length of that string. This method first writes the length of the string as a four-byte unsigned integer, and then writes that many characters to the stream. This method writes a length-prefixed string to this stream using the BinaryWriter instance's current Encoding.


Nice.

You can use Reflector to inspect the code of BinaryWriter.Write(string). That will settle it.

Edit: according to the code, a string whose encoded length is less than 128 bytes gets a single-byte prefix; at 128 or more the prefix grows to two or more bytes (at most five), seven bits of the length per byte.

Guest Anonymous Poster
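protected void Write7BitEncodedInt(int value)
{
    // Treat the value as unsigned so the shift below never sign-extends.
    uint num = (uint) value;
    while (num >= 0x80)
    {
        // Emit the low 7 bits with the high bit set: more bytes follow.
        this.Write((byte) (num | 0x80));
        num = num >> 7;
    }
    // The final byte has its high bit clear.
    this.Write((byte) num);
}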
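
For testing a C++ reader without round-tripping through the C# tool, the same loop ports almost line for line. This is only a sketch: the function name `encodeLengthPrefix` is made up for illustration and is not part of .NET or any C++ library, and it assumes the prefix is collected into a `std::vector` rather than written to a stream.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a library function): produces the same bytes as
// the Write7BitEncodedInt loop, least significant 7 bits first.
std::vector<std::uint8_t> encodeLengthPrefix(std::int32_t length)
{
    assert(length >= 0); // string byte counts are never negative
    std::uint32_t num = static_cast<std::uint32_t>(length);
    std::vector<std::uint8_t> out;
    while (num >= 0x80) {
        // Low 7 bits, with the high bit set to signal that another byte follows.
        out.push_back(static_cast<std::uint8_t>((num & 0x7F) | 0x80));
        num >>= 7;
    }
    // Last byte: remaining bits, high bit clear.
    out.push_back(static_cast<std::uint8_t>(num));
    return out;
}
```

With this, `encodeLengthPrefix(127)` gives the single byte 0x7F, while `encodeLengthPrefix(128)` gives the two bytes 0x80 0x01.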


You can work this out by writing code that tests all the boundary cases.


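// Requires: using System.IO;
using (FileStream fs = new FileStream("c:\\moo.bin", FileMode.Create))
using (BinaryWriter bw = new BinaryWriter(fs))
{
    // One string per length of interest; inspect the prefix bytes in a hex editor.
    bw.Write(new string('X', 5));
    bw.Write(new string('Y', 127));    // last length that fits in one prefix byte
    bw.Write(new string('Z', 128));    // first length that needs two prefix bytes
    bw.Write(new string('A', 256));
    bw.Write(new string('B', 32767));
    bw.Write(new string('C', 32768));
}   // disposing the writer flushes and closes the file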



It seems to me that bit 7 of each byte determines whether there is another byte to read. So lengths 0-127 take 1 byte, and a length of 128 requires two bytes: bit 7 of the first byte is set to 1 as a continuation flag, and the real bit 7 of the length moves to bit 0 of the next byte. In this manner 256 comes out as 0x80 0x02.

Or I might be talking complete rubbish...

Jans.

And BinaryReader uses this, courtesy of Reflector:


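protected internal int Read7BitEncodedInt()
{
    byte num3;
    int num = 0;    // decoded value so far
    int num2 = 0;   // bit position for the next 7-bit group
    do
    {
        // 0x23 == 35: five groups of 7 bits already read, so a sixth byte is invalid.
        if (num2 == 0x23)
        {
            throw new FormatException(Environment.GetResourceString("Format_Bad7BitInt32"));
        }
        num3 = this.ReadByte();
        // The low 7 bits carry data, least significant group first.
        num |= (num3 & 0x7f) << num2;
        num2 += 7;
    }
    // A set high bit means another byte follows.
    while ((num3 & 0x80) != 0);
    return num;
}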
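
On the C++ side, the reader could look roughly like the sketch below. The names `read7BitEncodedInt`, `readDotNetString` and `decodeFromMemory` are invented for illustration, not taken from any library. It assumes the stream was opened in binary mode, and that the decoded length is the number of bytes in the string body under the writer's encoding (UTF-8 unless the BinaryWriter was constructed with a different one), so the result is returned as raw bytes in a `std::string`.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes the 7-bit encoded length prefix.
std::uint32_t read7BitEncodedInt(std::istream& in)
{
    std::uint32_t result = 0;
    // At most five bytes: 5 groups of 7 bits cover a 32-bit value.
    for (int shift = 0; shift < 35; shift += 7) {
        const int c = in.get();
        if (c == std::istream::traits_type::eof())
            throw std::runtime_error("stream ended inside length prefix");
        result |= static_cast<std::uint32_t>(c & 0x7F) << shift;
        if ((c & 0x80) == 0)   // high bit clear: this was the last byte
            return result;
    }
    throw std::runtime_error("length prefix longer than 5 bytes");
}

// Hypothetical helper: reads one length-prefixed string as raw bytes.
// A production reader would also want to cap byteCount before allocating.
std::string readDotNetString(std::istream& in)
{
    const std::uint32_t byteCount = read7BitEncodedInt(in);
    std::string text(byteCount, '\0');
    assert(text.size() == byteCount); // sanity check on the buffer just sized
    in.read(&text[0], static_cast<std::streamsize>(byteCount));
    if (static_cast<std::uint32_t>(in.gcount()) != byteCount)
        throw std::runtime_error("stream ended inside string body");
    return text;
}

// Hypothetical convenience wrapper for data that is already in memory.
std::string decodeFromMemory(const std::string& bytes)
{
    std::istringstream in(bytes);
    return readDotNetString(in);
}
```

For a file, open it with `std::ifstream file(path, std::ios::binary)` and call `readDotNetString(file)` once per string, in the order the strings were written.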


Quote:
Original post by Niksan2
And BinaryReader uses this, courtesy of Reflector:
*** Source Snippet Removed ***

And we have a winner! Anyone else find it strange that you have to decompile a system library to determine non-secret, defined behavior?

Quote:
MSDN Docs
Reads a string from the current stream. The string is prefixed with the length, encoded as an integer seven bits at a time.

Lo and behold, it is! I'd probably seen that earlier, but it didn't mean anything to me--especially since I'd just read the Write(string) documentation that said it was either a byte... or a word... or a 4-byte int... or something else.

Quote:
Original post by BeanDog
Quote:
Original post by Niksan2
And BinaryReader uses this, courtesy of Reflector:
*** Source Snippet Removed ***

And we have a winner! Anyone else find it strange that you have to decompile a system library to determine non-secret, defined behavior?


http://blogs.msdn.com/sburke/archive/2008/01/16/configuring-visual-studio-to-debug-net-framework-source-code.aspx

By the way, with that set up you can see exactly how it's encoded.
