On .Net serialization, Part 1

Washu


Having noticed a couple of recent posts on sending data across a network connection, I've decided to cover serialization, and how you can use it to send and receive network packets across a stream-based, or even datagram-based, protocol. First, I'm going to cover a bit about the BinaryFormatter and how it works, along with what it must send so that the object can be reconstructed from the raw data. After that, I'll cover how you can write your own serializer that is as simple to use as the BinaryFormatter and, although much more limited in scope and functionality, generates much more compact serialized objects.

Now, one of the first things you may be thinking is that TCP isn't used in games. I've seen this misconception in a
lot of places, and to be frank, it's a wee bit irritating. Some of your favorite MMORPGs (if you have one) use TCP:
Lineage 2, Asheron's Call, Ultima Online, and EverQuest I and II, for instance. More than that, many non-MMO games
use TCP as well. UDP is only really needed in a game when you like to spam packets, such as in FPS games. More
importantly, you can still use serialization with datagram-based protocols.

Now, when building a client and a server in .Net, you will typically want to use a shared assembly containing the
packets that you will exchange back and forth. This way you have only a single copy of the code that must be
maintained. More importantly, if you are going to use the serialization facilities that come with .Net by default,
then both sides must be using the same assembly (more on this in a bit). So, let's define a simple packet in our
shared assembly.
Quote:

using System;

namespace Kent.Shared.Packets.Client {
    [Serializable]
    public struct JoinRequest {
        public JoinRequest(int version, string playerName) {
            this.Version = version;
            this.PlayerName = playerName;
        }

        public int Version;
        public string PlayerName;
    }
}

Of course, this packet is sent by our imaginary client to the server to join the game. It contains the version of
the client and the player's name. You will also note that it has been marked as being serializable. This is required
so that we can serialize the packet to a stream.

Now, assuming our client were to create and serialize this packet out to a stream, like so:
Quote:

// Requires: using System.Runtime.Serialization.Formatters.Binary;
BinaryFormatter formatter = new BinaryFormatter();

JoinRequest joinReq = new JoinRequest(1, "Washu");

formatter.Serialize(someStream, joinReq);

then you could simply receive it by doing:
Quote:

BinaryFormatter formatter = new BinaryFormatter();

object thing = formatter.Deserialize(someStream);

JoinRequest joinReq = (JoinRequest)thing;


Seems simple enough, which is how it's supposed to be. One of the nice things about .Net is that, using its
reflection capabilities, you can easily query for information about an object, which is what serialization uses to
decompose and recompose the object at each end of the stream. However, if you look at what's being sent, you'll
notice something rather shocking: it's 181 bytes long! What, you say? How can that be, when it's just sending the
version and name? At most you would expect it to be 4+4+5, or 13 bytes long (4 bytes for the version, 4 for the
string length, and 5 for the string "Washu").
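
The 13-byte estimate can be sketched by packing the fields by hand. This is only an illustration of that arithmetic, not the serializer built later in this series; the CompactJoinRequest class, its Pack method, and the choice of ASCII for the name are assumptions made for the example.

```csharp
using System;
using System.IO;
using System.Text;

static class CompactJoinRequest
{
    // Hypothetical hand-rolled layout: int32 version, int32 name length, ASCII name bytes.
    public static byte[] Pack(int version, string playerName)
    {
        byte[] name = Encoding.ASCII.GetBytes(playerName);
        using (MemoryStream stream = new MemoryStream())
        using (BinaryWriter writer = new BinaryWriter(stream))
        {
            writer.Write(version);      // 4 bytes, little endian
            writer.Write(name.Length);  // 4 bytes, little endian
            writer.Write(name);         // raw bytes, one per character, no extra prefix
            writer.Flush();
            return stream.ToArray();
        }
    }

    static void Main()
    {
        byte[] packed = Pack(1, "Washu");
        Console.WriteLine(packed.Length);                 // 13
        Console.WriteLine(BitConverter.ToString(packed)); // 01-00-00-00-05-00-00-00-57-61-73-68-75
    }
}
```

Note that nothing in those 13 bytes says which type they belong to, so the receiver would have to know that already.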

Well, that's not quite true. See, .Net serialization is a generic solution. Because of this it must use a
generic format for the serialized data that can be used across many machines, so it must also transmit type
information about the data contained.
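
To get a feel for the kind of type information involved, here is a minimal sketch that uses reflection to list what a generic serializer could discover about the packet. The FieldDump class is a made-up name for this example, and the struct is repeated (without its constructor) only so the snippet compiles on its own.

```csharp
using System;
using System.Reflection;

namespace Kent.Shared.Packets.Client
{
    [Serializable]
    public struct JoinRequest
    {
        public int Version;
        public string PlayerName;
    }
}

static class FieldDump
{
    static void Main()
    {
        Type type = typeof(Kent.Shared.Packets.Client.JoinRequest);

        // The full type name and the full assembly name both appear as text in the hex dump.
        Console.WriteLine(type.FullName);
        Console.WriteLine(type.Assembly.FullName);

        // So do the field names, each paired with information about its type.
        BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance;
        foreach (FieldInfo field in type.GetFields(flags))
            Console.WriteLine("{0} : {1}", field.Name, field.FieldType);
    }
}
```

The assembly name printed here will differ from the one in the dump, since it depends on how the sample is built.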

A quick hex dump of the serialized data gives us:
Quote:

00000000 00 01 00 00 00 FF FF FF FF 01 00 00 00 00 00 00 ................
00000010 00 0C 02 00 00 00 44 53 68 61 72 65 64 2C 20 56 ......DShared,.V
00000020 65 72 73 69 6F 6E 3D 31 2E 30 2E 31 39 31 30 2E ersion.1.0.1910.
00000030 32 39 34 38 36 2C 20 43 75 6C 74 75 72 65 3D 6E 29486,.Culture.n
00000040 65 75 74 72 61 6C 2C 20 50 75 62 6C 69 63 4B 65 eutral,.PublicKe
00000050 79 54 6F 6B 65 6E 3D 6E 75 6C 6C 05 01 00 00 00 yToken.null.....
00000060 26 4B 65 6E 74 2E 53 68 61 72 65 64 2E 50 61 63 &Kent.Shared.Pac
00000070 6B 65 74 73 2E 43 6C 69 65 6E 74 2E 4A 6F 69 6E kets.Client.Join
00000080 52 65 71 75 65 73 74 02 00 00 00 07 56 65 72 73 Request.....Vers
00000090 69 6F 6E 0A 50 6C 61 79 65 72 4E 61 6D 65 00 01 ion.PlayerName..
000000A0 08 02 00 00 00 01 00 00 00 06 03 00 00 00 05 57 ...............W
000000B0 61 73 68 75 0B ashu.

A goodly amount of information there. So, let's see if we can dissect this. I'm going to mention this right
away: anything I say about the format of the data the BinaryFormatter outputs is volatile. It can
change at any time. This is a format used internally by the .Net framework, so you shouldn't
assume that it will be the same in previous or future versions of the framework.

The first part of the format:
Quote:

Offset  Size  Description

00      1     SerializedStreamHeader (00)
01      4     Internal ID (01 00 00 00)
05      4     Header ID (FF FF FF FF)
09      4     Binary Formatter Major Version (01 00 00 00)
0D      4     Binary Formatter Minor Version (00 00 00 00)

Seems pretty simple. You have a byte at offset 0 that determines the format of what follows, in this case a
Serialized Stream Header. Following that is an ID for internal use, then the header ID, which starts at -1 and
decrements. Finally, we have the version of the BinaryFormatter used. I should note that the values are stored in
the machine native format, so little endian in my case.
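
As a rough check of that layout, the first 17 bytes of the hex dump can be read back with a BinaryReader. The StreamHeader class and its Read method are names invented for this sketch, and BinaryReader always reads little endian, which matches the dump.

```csharp
using System;
using System.IO;

static class StreamHeader
{
    // Returns { record type, internal ID, header ID, major version, minor version }.
    public static int[] Read(byte[] data)
    {
        using (BinaryReader reader = new BinaryReader(new MemoryStream(data)))
        {
            int recordType = reader.ReadByte();   // offset 00, 1 byte
            int internalId = reader.ReadInt32();  // offset 01, 4 bytes
            int headerId = reader.ReadInt32();    // offset 05, 4 bytes
            int major = reader.ReadInt32();       // offset 09, 4 bytes
            int minor = reader.ReadInt32();       // offset 0D, 4 bytes
            return new int[] { recordType, internalId, headerId, major, minor };
        }
    }

    static void Main()
    {
        // Bytes 00 through 10 (hex) of the dump above.
        byte[] header = {
            0x00,
            0x01, 0x00, 0x00, 0x00,
            0xFF, 0xFF, 0xFF, 0xFF,
            0x01, 0x00, 0x00, 0x00,
            0x00, 0x00, 0x00, 0x00
        };

        int[] f = Read(header);
        // Prints: record type 0, internal ID 1, header ID -1, version 1.0
        Console.WriteLine("record type {0}, internal ID {1}, header ID {2}, version {3}.{4}",
            f[0], f[1], f[2], f[3], f[4]);
    }
}
```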

Assembly header:
Quote:

11      1     0x0A if there is a single null
              0x0D if there are < 256 nulls
              0x0E if there are >= 256 nulls
[12     1,4   1 byte if previous field is 0x0D, 4 bytes if previous field is 0x0E,
              otherwise this field is omitted]

Offsets relative to previous field:
00      1     0x0C assembly information follows
01      4     Assembly ID
[05     1-4   7 bit encoded assembly string length]

Following that header comes an optional field which indicates the number of nulls in the Assembly string. This value
can be 0A, 0D, or 0E, or it can be omitted entirely, as it is in our packet. If the value is 0D then the following
byte gives the number of nulls, and if it's 0E then the next 4 bytes (an int) give the number of nulls. If it's 0A
then there is only a single null, and hence the count is not needed. After the optional field comes 0C, indicating
that the next part is an assembly name. Then comes an Assembly ID and finally the length of the assembly name
string, 7 bit encoded (0x44 in our case, or 68 characters).
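
The 7 bit encoding deserves a quick illustration: each byte carries 7 bits of the length, low bits first, and a set high bit means another byte follows. The sketch below decodes the assembly record from our packet that way. The AssemblyRecord class and Read7BitEncodedLength helper are names made up for this example, and treating the string bytes as UTF-8 is an assumption (for this all-ASCII name it makes no difference).

```csharp
using System;
using System.IO;
using System.Text;

static class AssemblyRecord
{
    // Decodes a 7 bit encoded integer: 7 value bits per byte, low bits first,
    // with the high bit set on every byte except the last.
    public static int Read7BitEncodedLength(BinaryReader reader)
    {
        int value = 0;
        for (int shift = 0; shift < 35; shift += 7)
        {
            byte b = reader.ReadByte();
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0)
                return value;
        }
        throw new FormatException("7 bit encoded length is too long.");
    }

    static void Main()
    {
        string name = "Shared, Version=1.0.1910.29486, Culture=neutral, PublicKeyToken=null";
        byte[] text = Encoding.UTF8.GetBytes(name);

        // Rebuild bytes 11 through 5A (hex) of the dump: marker, assembly ID, length, string.
        MemoryStream stream = new MemoryStream();
        stream.WriteByte(0x0C);
        stream.Write(new byte[] { 0x02, 0x00, 0x00, 0x00 }, 0, 4);
        stream.WriteByte(0x44);
        stream.Write(text, 0, text.Length);
        stream.Position = 0;

        using (BinaryReader reader = new BinaryReader(stream))
        {
            byte marker = reader.ReadByte();
            int assemblyId = reader.ReadInt32();
            int length = Read7BitEncodedLength(reader);
            string decoded = Encoding.UTF8.GetString(reader.ReadBytes(length));

            // Prints: marker 0x0C, assembly ID 2, length 68
            Console.WriteLine("marker 0x{0:X2}, assembly ID {1}, length {2}", marker, assemblyId, length);
            Console.WriteLine(decoded);
        }
    }
}
```

A length that does not fit in 7 bits spills into a second byte: 200, for example, is encoded as C8 01.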

Type header:
Quote:

Offsets relative to end of assembly name:
00      1     ObjectWithMapTypedAssemId
01      4     Object ID
[05     1-4   7 bit encoded type string length]

The type header starts with another byte, identifying the format of the type header, in this case an object with a
known assembly (the assembly previously defined). Then comes the object ID, and another string which is the full
type name of the type. Using just this information, we can create an instance of a type from any assembly, as long
as it has been loaded into the Application Domain.
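
Here is a minimal sketch of that last point: given only the full type name, search the assemblies loaded into the current Application Domain and create an instance. The TypeLookup class and its Find method are invented for this example, and unlike the real formatter it ignores the assembly name entirely. The struct is repeated only so the snippet compiles on its own.

```csharp
using System;
using System.Reflection;

namespace Kent.Shared.Packets.Client
{
    public struct JoinRequest
    {
        public int Version;
        public string PlayerName;
    }
}

static class TypeLookup
{
    // Searches every assembly loaded into the current AppDomain for the given type name.
    // Returns null if no loaded assembly defines it.
    public static Type Find(string fullTypeName)
    {
        foreach (Assembly assembly in AppDomain.CurrentDomain.GetAssemblies())
        {
            Type type = assembly.GetType(fullTypeName);
            if (type != null)
                return type;
        }
        return null;
    }

    static void Main()
    {
        Type type = Find("Kent.Shared.Packets.Client.JoinRequest");

        // For a struct this yields a boxed, zero-initialized instance.
        object instance = Activator.CreateInstance(type);
        Console.WriteLine(instance.GetType().FullName); // Kent.Shared.Packets.Client.JoinRequest
    }
}
```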

Well, that's about it for this post. Over the next few posts I'll cover what the rest of the data in our serialized
packet means, and then begin to implement a serializer based on a few assumptions that will shrink the amount of
data we have to send for that packet by a factor of up to 25.


9 Comments



I actually started reading this until I realized that my OS doesn't support the .NET framework. Oh well, another couple months to go... [sad]

Question: can mods delete other user's posts in their journal? I would assume so...

Excuse me, Mushu San, but what kind of OS is that?
Mono isn't available for your system?

Washu: Thanks for the informative entry [smile]

Edit: And yes, mods can delete posts anywhere AFAIK. Haven't tried.

Is there some reason you still shouldn't read it mushu? Even if your operating system is some obscure custom built turkey with rockets instead of wings?

Rawr. Ignorance is bliss.

Every time someone writes a bad line of code, a kitten dies. For every kitten that dies, I get $1US in a secret Paypal account. So please, think of my paypal account...

(EDIT: I'm going to go ahead and read it [wink])

(EDIT2: That was actually pretty informative. I was wondering a couple weeks ago what all that extra crap in the packets were.)

(EDIT3: lol - I just noticed some of the stuff you did with the journal template. Awesome)

Quote:

Rawr. Ignorance is bliss.

Not really.
Quote:

Every time someone writes a bad line of code, a kitten dies. For every kitten that dies, I get $1US in a secret Paypal account. So please, think of my paypal account...

I don't really see how this is relevant to the journal...but ok.
Quote:

(EDIT: I'm going to go ahead and read it [wink])

It's good solid information. No reason one shouldn't read it. If nothing else it can give you an insight into the inner workings of the .Net framework.
Quote:

(EDIT2: That was actually pretty informative. I was wondering a couple weeks ago what all that extra crap in the packets were.)

Told ya so.
Quote:

(EDIT3: lol - I just noticed some of the stuff you did with the journal template. Awesome)

Hush, that's a secret. Don't do this at home kids.

Wow, was this ever long in coming?

Type Traits
So, what are type traits? Well, type traits are essentially a compile-time mechanism that allows you to:

  • Obtain information about types

  • Modify types, such as adding or removing cv qualifiers

  • Make compile-time decisions using metafunctions to direct the compiler down certain code paths



In case you have been in a box for a while, there is a new extension to the standard, named “Technical Report on C++ Library Extensions” (TR1). This extension adds a great many facilities to the existing standard library, amongst which are type traits.

In the following few posts I'm going to examine some of the abilities of type traits and demonstrate how you can use them for a variety of purposes.

Type Information
The first thing I'm going to focus on is the ability to query for type information. To facilitate this ability, TR1 defines the following set of metafunctions:
Quote:

template <class T, T v>
struct integral_constant {
    static const T value = v;
    typedef T value_type;
    typedef integral_constant<T,v> type;
};
typedef integral_constant<bool, true> true_type;
typedef integral_constant<bool, false> false_type;
// [4.5.1] primary type categories:
template <class T> struct is_void;
template <class T> struct is_integral;
template <class T> struct is_floating_point;
template <class T> struct is_array;
template <class T> struct is_pointer;
template <class T> struct is_reference;
template <class T> struct is_member_object_pointer;
template <class T> struct is_member_function_pointer;
template <class T> struct is_enum;
template <class T> struct is_union;
template <class T> struct is_class;
template <class T> struct is_function;
// [4.5.2] composite type categories:
template <class T> struct is_arithmetic;
template <class T> struct is_fundamental;
template <class T> struct is_object;
template <class T> struct is_scalar;
template <class T> struct is_compound;
template <class T> struct is_member_pointer;
// [4.5.3] type properties:
template <class T> struct is_const;
template <class T> struct is_volatile;
template <class T> struct is_pod;
template <class T> struct is_empty;
template <class T> struct is_polymorphic;
template <class T> struct is_abstract;
template <class T> struct has_trivial_constructor;
template <class T> struct has_trivial_copy;
template <class T> struct has_trivial_assign;
template <class T> struct has_trivial_destructor;
template <class T> struct has_nothrow_constructor;
template <class T> struct has_nothrow_copy;
template <class T> struct has_nothrow_assign;
template <class T> struct has_virtual_destructor;
template <class T> struct is_signed;
template <class T> struct is_unsigned;
template <class T> struct alignment_of;
template <class T> struct rank;
template <class T, unsigned I = 0> struct extent;

All of these metafunctions derive (either directly or indirectly) from the base class integral_constant, with the T in integral_constant being bool, and v either true or false. Basically, these classes will derive from true_type or false_type depending on whether the type T provided matches the type trait.
All of these classes also provide an operator type() const; which just returns an instance of the integral_constant<T,v>::type typedef.

The function of the majority of these is fairly clear, although the function of the last three may not be obvious at first. alignment_of essentially will return the minimum alignment requirement of the type passed to the metafunction (in integral_constant<T, v>::value, the alignment is of type size_t). The rank function will return the number of dimensions an array has. For instance: rank<int[][2][3]>::value will return 3, as the array has 3 dimensions. The extent metafunction will return the number of elements a particular dimension can hold (in an array). An example usage would be: extent<int[][2][3], 1>::value which will return 2 (0 based counter).

So, of what use is this information to us? Well, let's look at a simple example of an optimized copy algorithm. First of all, we know that if a type has a trivial assignment operator, then assigning an object of that type is equivalent to copying its bytes, which is the case for fundamental types and POD types. For such types we can use a memcpy (or memmove) to do a bulk copy. So, let's start off with a basic copy function:

// Generic fallback: element-by-element copy for any iterator types.
template<class T1, class T2, bool b>
T2 internal_copy(T1 start, T1 stop, T2 dest, integral_constant<bool, b> const&) {
    while(start != stop) {
        *dest = *start;
        ++start;
        ++dest;
    }
    return dest;
}
// Pointer overload: bulk copy when assignment is trivial.
template<class T>
T* internal_copy(T* start, T* stop, T* dest, true_type const&) {
    memcpy(dest, start, (stop - start) * sizeof(T));
    return dest + (stop - start);
}
// Dispatches on whether the value type has a trivial assignment operator.
template<class T1, class T2>
T2 copy(T1 start, T1 stop, T2 dest) {
    typedef typename std::iterator_traits<T1>::value_type value_type;
    return internal_copy(start, stop, dest, has_trivial_assign<value_type>());
}


Seems simple enough... basically, if it has a trivial assignment operator and if T1 and T2 are pointers, it will dispatch to the second internal_copy function, otherwise it will fall back to the first internal_copy function. I am using iterator traits here because it will properly dereference pointers to give me the base type.

Well, this seems like a rather nice copy function, however we can do even better. If for the first internal copy we again dispatch to another copy, in this case depending on if the iterator is a random access iterator or not, then we can provide the compiler with an opportunity for loop unrolling. So, changing the first internal_copy, we get:

template<class T1, class T2>
T2 internal_copy_2(T1 start, T1 stop, T2 dest, std::random_access_iterator_tag const&) {
    typedef typename std::iterator_traits<T1>::difference_type difference_type;
    for(difference_type count = stop - start; count > 0; --count) {
        *dest = *start;
        ++start;
        ++dest;
    }
    return dest;
}

A Short and Relatively Simple Quiz

  1. Given the following three lines of code, answer these questions
    int* p = new int[10];
    int* j = p + 11;
    int* k = p + 10;

    • Is the second line valid?

    • If the second line is valid, where does the pointer point to?

    • What are some of the legal operations that can be performed on the third pointer?



  2. What output should the following line of code produce?
    int a = 10; std::cout<<a<<a++<<--a;

  3. Assuming the function called in the following block of code has no default
    parameters, how many parameters does it take? Which objects are passed to it?
    f((a, b, c), d, e, ((g, h), i));

  4. Assuming the function called in the following block of code takes an A* and a B*, what is wrong with the code?
    f(new A(), new B());

[grin][grin][grin][grin][grin][grin]
[flaming][flaming][flaming][flaming][flaming][flaming]
[grin][flaming][grin][grin][flaming][grin]
[grin][flaming][grin][grin][flaming][grin]
[grin][flaming][grin][grin][flaming][grin]
[grin][flaming][grin][grin][flaming][grin]
[grin][flaming][grin][grin][flaming][grin]
