
# On .Net serialization, Part 2

Last time I decomposed the format of the serialized data that BinaryFormatter emits for a simple type. Picking up where we left off, we are at offset 0x87 in the hex dump.

Quote:

    00000000 00 01 00 00 00 FF FF FF FF 01 00 00 00 00 00 00 ................
    00000010 00 0C 02 00 00 00 44 53 68 61 72 65 64 2C 20 56 ......DShared,.V
    00000020 65 72 73 69 6F 6E 3D 31 2E 30 2E 31 39 31 30 2E ersion.1.0.1910.
    00000030 32 39 34 38 36 2C 20 43 75 6C 74 75 72 65 3D 6E 29486,.Culture.n
    00000040 65 75 74 72 61 6C 2C 20 50 75 62 6C 69 63 4B 65 eutral,.PublicKe
    00000050 79 54 6F 6B 65 6E 3D 6E 75 6C 6C 05 01 00 00 00 yToken.null.....
    00000060 26 4B 65 6E 74 2E 53 68 61 72 65 64 2E 50 61 63 &Kent.Shared.Pac
    00000070 6B 65 74 73 2E 43 6C 69 65 6E 74 2E 4A 6F 69 6E kets.Client.Join
    00000080 52 65 71 75 65 73 74 02 00 00 00 07 56 65 72 73 Request.....Vers
    00000090 69 6F 6E 0A 50 6C 61 79 65 72 4E 61 6D 65 00 01 ion.PlayerName..
    000000A0 08 02 00 00 00 01 00 00 00 06 03 00 00 00 05 57 ...............W
    000000B0 61 73 68 75 0B                                  ashu.

Quote:

    Offsets relative to the end of the typename.
    00  4  Member count

Now, you should be able to guess what the next field is from the output alone: it's the number of fields in the class. Following it are that many length-prefixed strings, which are the names of the fields. The lengths, again, are encoded in a 7-bit integer format. After the field names comes a series of 1-byte fields, one per member, each holding a BinaryTypeEnum value that describes the field's type in general terms.

Quote:

    internal enum BinaryTypeEnum
    {
        Primitive = 0,
        String = 1,
        Object = 2,
        ObjectUrt = 3,
        ObjectUser = 4,
        ObjectArray = 5,
        StringArray = 6,
        PrimitiveArray = 7
    }

    internal enum InternalPrimitiveTypeE
    {
        Invalid = 0,
        Boolean = 1,
        Byte = 2,
        Char = 3,
        Currency = 4,
        Decimal = 5,
        Double = 6,
        Int16 = 7,
        Int32 = 8,
        Int64 = 9,
        SByte = 10,
        Single = 11,
        TimeSpan = 12,
        DateTime = 13,
        UInt16 = 14,
        UInt32 = 15,
        UInt64 = 16,
        Null = 17,
        String = 18
    }
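
The 7-bit integer format used for the string lengths is easy to decode by hand. The following is a minimal sketch, not the formatter's own code: `Read7BitEncodedInt` here is a local helper written for illustration, and the two-byte input `AC 02` is a made-up example (the lengths in our dump all fit in one byte).

```csharp
using System;
using System.IO;

// Illustrative helper (not BinaryFormatter's code): each byte carries 7 bits
// of the value, low-order group first; a set high bit means "more bytes follow".
int Read7BitEncodedInt(Stream stream)
{
    int result = 0;
    for (int shift = 0; shift < 35; shift += 7)
    {
        int b = stream.ReadByte();
        if (b < 0) throw new EndOfStreamException();
        result |= (b & 0x7F) << shift;
        if ((b & 0x80) == 0) return result;
    }
    throw new FormatException("7-bit encoded integer is too long.");
}

// 0x07 is the length prefix of "Version" in the dump.
int shortLength = Read7BitEncodedInt(new MemoryStream(new byte[] { 0x07 }));
// Made-up two-byte example: 0xAC 0x02 -> 0x2C | (0x02 << 7) = 300.
int longLength = Read7BitEncodedInt(new MemoryStream(new byte[] { 0xAC, 0x02 }));

Console.WriteLine($"{shortLength} {longLength}"); // prints "7 300"
```

Any length below 128 costs a single byte, which is why short member names like ours only pay one byte of overhead each.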

So, based on the above list, we have a Primitive (00) followed by a String (01). This matches the order in which the field names were emitted, which is no coincidence. Following that comes a byte indicating the type of the primitive, which in this case is 08, or Int32 (see the InternalPrimitiveTypeE enumeration). If the type is a string, an object, an object array or a string array, then this byte is omitted.

Following that comes a 4-byte field containing the assembly ID (see offset 0xA1). This number should match the assembly ID assigned earlier in the stream, which in our case is 02 00 00 00 (aka 2).
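
To make the layout concrete, here is a rough sketch that walks the bytes from offset 0x87 through 0xA4 of the dump with a plain `BinaryReader`. It is only an illustration under assumptions: the variable names are mine, and it only handles the type tags discussed in this post (Primitive, plus the tags that carry no extra byte), rejecting everything else.

```csharp
using System;
using System.IO;
using System.Text;

// Bytes 0x87..0xA4 of the dump.
byte[] memberInfo =
{
    0x02, 0x00, 0x00, 0x00,                                           // member count
    0x07, 0x56, 0x65, 0x72, 0x73, 0x69, 0x6F, 0x6E,                   // "Version"
    0x0A, 0x50, 0x6C, 0x61, 0x79, 0x65, 0x72, 0x4E, 0x61, 0x6D, 0x65, // "PlayerName"
    0x00, 0x01,                                                       // Primitive, String
    0x08,                                                             // Int32
    0x02, 0x00, 0x00, 0x00                                            // assembly ID
};

var reader = new BinaryReader(new MemoryStream(memberInfo), Encoding.UTF8);

int memberCount = reader.ReadInt32();
var names = new string[memberCount];
for (int i = 0; i < memberCount; i++)
    names[i] = reader.ReadString(); // 7-bit length prefix, then the characters

byte[] typeTags = reader.ReadBytes(memberCount); // one BinaryTypeEnum per member

// Only Primitive (0) is followed by an InternalPrimitiveTypeE byte here.
// String, Object, ObjectArray and StringArray carry nothing extra;
// the remaining tags are outside the scope of this sketch.
var primitiveTypes = new byte?[memberCount];
for (int i = 0; i < memberCount; i++)
{
    switch (typeTags[i])
    {
        case 0: primitiveTypes[i] = reader.ReadByte(); break;
        case 1: case 2: case 5: case 6: break;
        default: throw new NotSupportedException($"Tag {typeTags[i]} is not handled in this sketch.");
    }
}

int assemblyId = reader.ReadInt32();

for (int i = 0; i < memberCount; i++)
{
    string primitive = primitiveTypes[i].HasValue ? primitiveTypes[i].Value.ToString() : "none";
    Console.WriteLine($"{names[i]}: tag {typeTags[i]}, primitive type {primitive}");
}
Console.WriteLine($"assembly ID {assemblyId}");
```

Note that the type tags are read as a group before any of the primitive type bytes, which is the order the dump shows: `00 01` first, then the single `08`.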

You will note that even now we still haven't written any of our actual data yet. Everything up to this point has just been building a graph of the various types that we are serializing so that we might be able to rebuild the type. We have the assembly name where the type resides, we have the names of the members, and an idea of what their types are (a primitive of type Int32 and a String). However, as you may have noticed, there isn't a whole lot left, so we must be getting down to SOME of our data...we hope.

Well, it turns out that this is the case. The next 4 bytes (01 00 00 00) are the value stored in our Version field. Following that is a one-byte ID (the 06) which indicates that the next value is a length-prefixed string; it is followed by a 4-byte object ID (03 00 00 00). The next byte (the 05) is a 7-bit encoded integer giving the length of the string, and finally the string itself is written. There is one last byte to account for: the 0B, which is the Serialization End Header. It marks the end of the serialized chunk and is written after the entire object graph has been serialized. The value isn't unique: the same byte can appear elsewhere in the stream as ordinary data without being escaped.
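
Reading the tail of the dump (offsets 0xA5 through 0xB4) can be sketched the same way. One assumption to flag: the dump shows four bytes (03 00 00 00) between the 06 marker and the string length, and this sketch reads them as a 32-bit object ID, which is how I understand the published binary format. The variable names are illustrative only.

```csharp
using System;
using System.IO;
using System.Text;

// Bytes 0xA5..0xB4 of the dump.
byte[] values =
{
    0x01, 0x00, 0x00, 0x00,             // Version = 1
    0x06,                               // "a string follows" marker
    0x03, 0x00, 0x00, 0x00,             // assumed: object ID of the string
    0x05, 0x57, 0x61, 0x73, 0x68, 0x75, // length 5, "Washu"
    0x0B                                // Serialization End Header
};

var reader = new BinaryReader(new MemoryStream(values), Encoding.UTF8);

int version = reader.ReadInt32();
byte stringMarker = reader.ReadByte();
int objectId = reader.ReadInt32();
string playerName = reader.ReadString(); // 7-bit length prefix + characters
byte endMarker = reader.ReadByte();

Console.WriteLine($"Version={version}, PlayerName={playerName}, end=0x{endMarker:X2}");
```

Because the end marker is not escaped, a reader cannot find the end of the message by scanning for 0B; a Version of 11, for instance, would put the same byte in the middle of the data. The stream has to be parsed record by record.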

Of course, now that we've seen everything it sends, we can also understand why it must send this much information. The simple explanation: because .Net serialization is generic, it must provide all of the information required to identify the types contained within the stream, along with how they are laid out and what types they reference or require. Consider what this means for a simple PositionOrientation packet:

Quote:

    public struct Vertex {
        public Vertex(float x, float y, float z) {
            this.X = x;
            this.Y = y;
            this.Z = z;
        }
        public float X;
        public float Y;
        public float Z;
    }

    public struct PositionOrientation {
        public PositionOrientation(Vertex position, Vertex orientation) {
            this.Position = position;
            this.Orientation = orientation;
        }
        public Vertex Position;
        public Vertex Orientation;
    }

For this packet, the serialized output skyrockets to 298 bytes, more than ten times the 24 bytes of float data it actually carries (six 4-byte floats). In fact, this is one case where shorter member and type names save space, although not enough to justify sacrificing readability.
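
For comparison, here is a small sketch of what the raw payload of a PositionOrientation costs when the six floats are written directly with `BinaryWriter`. The sample values are arbitrary, and the 298 is simply the figure quoted above rather than something this snippet measures.

```csharp
using System;
using System.IO;

// Position (x, y, z) followed by orientation (x, y, z); values are arbitrary.
float[] positionOrientation = { 1f, 2f, 3f, 0f, 90f, 0f };

var stream = new MemoryStream();
var writer = new BinaryWriter(stream);
foreach (float component in positionOrientation)
    writer.Write(component); // 4 bytes each, no type information
writer.Flush();

long rawSize = stream.Length;   // 6 floats * 4 bytes = 24
const int formatterSize = 298;  // BinaryFormatter figure quoted in the post
double ratio = (double)formatterSize / rawSize;

Console.WriteLine($"raw: {rawSize} bytes, BinaryFormatter: {formatterSize} bytes, ratio: {ratio:F1}x");
```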

Of course, this doesn't mean that the type information will always outweigh the actual data. For instance, a large enough array of bytes can easily outgrow the type information. But it does mean that for most game data, far more type information is sent than we need, especially since we will most likely know a great deal about the data we expect to arrive. Thus we must look to other ways of serializing the data.

One way of doing this would be to use a pluggable factory. If you've ever read the articles on pluggable factories [1], then the idea of sending an ID and using it to look up a factory that creates the original type is not new to you. Of course, this requires that we add each new type to the factory by hand. It also means that we still end up doing a lot of serialization work by hand, not a prospect we relish. Even wrapping that up in the packet structures and using an interface to encode and decode the data isn't going to be very pretty. Ideally we should have a solution that is as easy to use as the BinaryFormatter, but geared towards size efficiency. Now that we have our customer's requirements, the next step, of course, is going to be testing and building our new serializer. But I'll leave that to the next post.
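
As a rough idea of the ID-plus-factory approach described above, here is a minimal sketch. Everything in it is hypothetical: the packet ID value, the `EncodeJoinRequest` and `Decode` helpers, and the use of a tuple in place of a real packet type are all assumptions made to keep the example short, and it is not the serializer the next post builds.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical packet ID; each ID maps to a hand-written decoder (the "factory").
const byte JoinRequestId = 1;

var factories = new Dictionary<byte, Func<BinaryReader, object>>
{
    // A tuple stands in for a real JoinRequest type in this sketch.
    [JoinRequestId] = r => (r.ReadInt32(), r.ReadString()),
};

byte[] EncodeJoinRequest(int version, string playerName)
{
    var stream = new MemoryStream();
    var writer = new BinaryWriter(stream, Encoding.UTF8);
    writer.Write(JoinRequestId); // 1-byte ID instead of assembly, type and member names
    writer.Write(version);       // 4 bytes
    writer.Write(playerName);    // 7-bit length prefix + UTF-8 bytes
    writer.Flush();
    return stream.ToArray();
}

object Decode(byte[] packet)
{
    var reader = new BinaryReader(new MemoryStream(packet), Encoding.UTF8);
    byte id = reader.ReadByte();
    if (!factories.TryGetValue(id, out var create))
        throw new InvalidDataException($"Unknown packet ID {id}.");
    return create(reader);
}

byte[] wire = EncodeJoinRequest(1, "Washu");
var (decodedVersion, decodedName) = ((int, string))Decode(wire);

Console.WriteLine($"{wire.Length} bytes on the wire: version {decodedVersion}, player {decodedName}");
```

The same JoinRequest that took 181 bytes in the dump fits in 11 bytes here, but every new packet type needs its own entry in the table and its own hand-written encode and decode code, which is exactly the drawback mentioned above.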

1. Industrial Strength Pluggable Factories and also Why pluggable factories rock my world

## 1 Comment

Very interesting Washu, this is good to know.

~Graham
