[.net] Deserialization problem

Started by CyberSlag5k
10 comments, last by Nypyren
My Deserialize method:


public static Object Deserialize(Byte[] bytes)
{
	Object returnObject = null;

	if (bytes != null)
	{
		BinaryFormatter formatter = new BinaryFormatter();
		using (Stream stream = new MemoryStream())
		{
			stream.Write(bytes, 0, bytes.Length);
			stream.Position = 0;

			returnObject = formatter.Deserialize(stream);
		}
	}

	return returnObject;
}


is throwing an exception that says "End of Stream encountered before parsing was completed.", when I pass it a Byte array containing two bytes (145 and 0). Can anyone see why that might be? [Edited by - CyberSlag5k on March 12, 2009 12:44:47 PM]
Without order nothing can exist - without chaos nothing can evolve.
This may sound like a stupid question, but what are you trying to do, and what object do you expect the function to return? The serializer expects the stream to contain information about the object to deserialize; I'd get confused too if you handed me two bytes and told me to deserialize that. :)
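For illustration, here is a minimal sketch (not from the thread, and assuming the same usings as the Deserialize method: System.IO and System.Runtime.Serialization.Formatters.Binary) of a Serialize counterpart whose output Deserialize can actually parse:

public static Byte[] Serialize(Object obj)
{
	// Illustrative sketch: BinaryFormatter writes its own header plus full
	// type metadata around the value, so even an Int16 comes out far larger
	// than two bytes.
	BinaryFormatter formatter = new BinaryFormatter();
	using (MemoryStream stream = new MemoryStream())
	{
		formatter.Serialize(stream, obj);
		return stream.ToArray();
	}
}

With that pairing, Deserialize(Serialize((Int16)145)) round-trips, while Deserialize(new Byte[] { 145, 0 }) fails with exactly the "End of Stream" exception above, because the formatter's header is missing.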
That makes sense. In this instance, it's an Int16 (a short), which I will have to unbox. Basically, I wrote this method, which I'm now trying to debug:

public static Object ReadObject(this NetworkStream stream, Byte[] readBuffer, Int32 size)
{
	Object returnObject = null;
	Int32 count;
	Int32 bytesRead = 0;
	List<Byte> receivedBytes = new List<Byte>();

	while (bytesRead < size)
	{
		// Read no more than the bytes still outstanding for this message.
		count = stream.Read(readBuffer, 0, Math.Min(readBuffer.Length, size - bytesRead));

		if (count != 0)
		{
			bytesRead += count;

			Byte[] copyBytes = new Byte[count];
			Buffer.BlockCopy(readBuffer, 0, copyBytes, 0, count);
			receivedBytes.AddRange(copyBytes);	// append the bytes received so far
		}
		else
		{
			// The connection closed before the full message arrived.
			return null;
		}
	}

	Byte[] array = receivedBytes.ToArray();
	returnObject = MAILUtility.Deserialize(array);

	return returnObject;
}


The idea is that I can tell it how many bytes to read from a TCP socket connection, and it gives me back a completed object. I send my data preceded by two bytes that give the length of the data, so I know how much to expect when receiving it.

Basically, I'm trying to address the issues discussed in this thread. I want to read my two bytes, and then read however much data they say is coming, and I don't process anything until I've received the correct amount of data.
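A sketch of the send side of that framing might look like the following (MAILUtility.Serialize is an assumed counterpart to the MAILUtility.Deserialize used in ReadObject; it is not shown in the thread):

public static void WriteObject(this NetworkStream stream, Object obj)
{
	Byte[] payload = MAILUtility.Serialize(obj);			// assumed helper
	Byte[] lengthPrefix = BitConverter.GetBytes((Int16)payload.Length);

	stream.Write(lengthPrefix, 0, lengthPrefix.Length);	// two bytes: payload size
	stream.Write(payload, 0, payload.Length);				// the serialized data
}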
Without order nothing can exist - without chaos nothing can evolve.
I've decided to create a wrapper object that contains the information I wish to send ahead of the data I'm sending. Right now, that object looks like this:

[Serializable]
public class DataHeader
{
	public Int32 DataSize { get; set; }

	public DataHeader(Int32 dataSize)
	{
		DataSize = dataSize;
	}
}


I've tried sending just that, without attaching the actual data at the end, but I'm getting an exception when I attempt to send the serialized version of the object that says "Binary stream '0' does not contain a valid BinaryHeader. Possible causes are invalid stream or object version change between serialization and deserialization."

I have 3 questions:

1. Why might I be getting this exception? The class is very simple, and it's marked as Serializable.

2. When I check the length of the serialized byte array, it's 176 bytes long. Why is this so big? It contains just an Int32, which is only 4 bytes on its own. I'm presuming that the serialization process appends some data to describe the type, as itachi pointed out. Is it reasonable that the overhead would be over 170 bytes?

3. Right now, I'm forced to simply hard-code that value of 176 when I read in the header data. I can't seem to dynamically determine the size of my DataHeader object, unless I instantiate one, serialize it, and then check the resulting array's length, which seems hackish. Is there a better way, and if not, is it better to hard-code the value or create a dummy object?
Without order nothing can exist - without chaos nothing can evolve.
BinaryFormatter is for complete serialization of type meta-data and the values. Do not use it for raw network communication, as you will find (and my blog covers this) that the overhead is quite significant. Basically BinaryFormatter will embed the type data and formatting of the entire object so that it may be properly deserialized on the destination side. It operates under the same principles as the Soap Formatter.
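To see that overhead concretely, a quick sketch: serialize a value and check the stream length. Even a boxed Int16 comes out many times larger than its two bytes of payload, and a user-defined class drags in assembly and type names on top of that.

BinaryFormatter formatter = new BinaryFormatter();
using (MemoryStream stream = new MemoryStream())
{
	formatter.Serialize(stream, (Int16)145);
	Console.WriteLine(stream.Length);	// far more than the 2 bytes of actual data
}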

Finally, given a stream, say a NetworkStream, you can read binary data from it using a BinaryReader.

BinaryReader reader = new BinaryReader(freakingNetworkStream);
int dataLength = reader.ReadInt16();
byte[] data = reader.ReadBytes(dataLength);
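The matching write side is just as short with a BinaryWriter (a sketch; data here stands for the already-encoded payload bytes):

BinaryWriter writer = new BinaryWriter(freakingNetworkStream);
writer.Write((short)data.Length);	// two-byte length prefix
writer.Write(data);					// the payload bytes
writer.Flush();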

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Quote:Original post by Washu
BinaryFormatter is for complete serialization of type meta-data and the values. Do not use it for raw network communication, as you will find (and my blog covers this) that the overhead is quite significant. Basically BinaryFormatter will embed the type data and formatting of the entire object so that it may be properly deserialized on the destination side. It operates under the same principles as the Soap Formatter.


What should I use instead? Your journal suggests rolling your own; is there nothing in .NET that will suffice? And is the BinaryFormatter what's causing my exception?

Quote:
Finally, given a stream, say a NetworkStream, you can read binary data from it using a BinaryReader.

BinaryReader reader = new BinaryReader(freakingNetworkStream);
int dataLength = reader.ReadInt16();
byte[] data = reader.ReadBytes(dataLength);


Most excellent. Is that guaranteed to bring back the correct amount of data? In other words, if I ask for 512 bytes, when the method returns, will I have all 512?

Thank you for your response.
Without order nothing can exist - without chaos nothing can evolve.
Thank you both, itachi and Washu. My client and server are talking better than ever! I was able to dump my new ReadObject method entirely, in favor of the BinaryReader:

if (m_Client.Connected == true)
{
	NetworkStream stream = m_Client.GetStream();
	BinaryReader reader = new BinaryReader(stream);
	Int16 messageLength;
	Byte[] messageBytes;

	while (true)
	{
		messageLength = reader.ReadInt16();
		messageBytes = reader.ReadBytes(messageLength);

		m_ReceivedQueue.Enqueue(messageBytes);
	}
}


And I'm now able to send very large messages (though I won't often have the need, it's good to know that partial messages are getting properly handled via that ReadBytes method), as well as lots of short messages in succession (which I know aren't getting missed, as I sent numbers counting up to 100 and they were all there). The only thing left to do is to optimize it, as sending the fast, small messages takes longer than I would like. I'm assuming this is due to the overhead you mentioned, Washu, with the BinaryFormatter.

Thanks again, guys!
Without order nothing can exist - without chaos nothing can evolve.
Quote:Original post by CyberSlag5k
Thank you both, itachi and Washu. My client and server are talking better than ever! I was able to dump my new ReadObject method entirely, in favor of the BinaryReader:

*** Source Snippet Removed ***

And I'm now able to send very large messages (though I won't often have the need, it's good to know that partial messages are getting properly handled via that ReadBytes method), as well as lots of short messages in succession (which I know aren't getting missed, as I sent numbers counting up to 100 and they were all there). The only thing left to do is to optimize it, as sending the fast, small messages takes longer than I would like. I'm assuming this is due to the overhead you mentioned, Washu, with the BinaryFormatter.

Thanks again, guys!


Uh, no, chances are small messages take longer because you have TCP_NODELAY turned off. With it off, Nagle's algorithm buffers small messages until the buffer holds more data, then sends (or a timeout is reached). The best thing to do would be to enable TCP_NODELAY. Assuming you're using the TcpClient class, look at the TcpClient.NoDelay property.
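As a one-line sketch, assuming the m_Client field from the earlier snippet is the TcpClient:

m_Client.NoDelay = true;	// enable TCP_NODELAY: small writes go out immediately instead of sitting in Nagle's buffer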

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.

Quote:Original post by Washu
Uh, no, chances are small messages take longer because you have TCP_NODELAY turned off. With it off, Nagle's algorithm buffers small messages until the buffer holds more data, then sends (or a timeout is reached). The best thing to do would be to enable TCP_NODELAY. Assuming you're using the TcpClient class, look at the TcpClient.NoDelay property.


I don't really see a significant difference with it on or off. It takes about 10-12 seconds to process 100 messages either way. Mind you, there's a lot more going on: XML is being constructed and parsed, the display is getting updated, etc., so the slowdown may not be on the communications side of things. I just thought there was ground to be made up because of what you said about the BinaryFormatter. Should I still be looking into that, and if so, is there a simple solution?
Without order nothing can exist - without chaos nothing can evolve.
If you want to write your own serialization engine, look at the FormatterServices class, in particular the FormatterServices.GetObjectData function. A full engine is a LOT more complicated than that one call, especially if you optimize type IDs over the network (all networked clients need to agree on the same optimized type ID for the same in-code type).

There are some optimizations you should do, such as a cache for type info (in particular, the FieldInfo elements that are references and the ones you want to serialize).

The way I do it is:

- During startup, use reflection to find all serializable types
- For each serializable type, add all of its serializable members to a lookup table
- Determine the minimum number of bytes for type IDs.

- When serializing an object graph, walk the graph of referenced objects and assign a unique ID to each one. Determine the minimum number of bytes for object IDs.
- Using a MemoryStream, write out a placeholder DWORD for the total number of bytes.
- Write a byte which indicates the typeID and objectID sizes (usually 1-4 bytes each, so one size fits in each nibble).
- Write the number of objects.
- Write the array of objectIDs.
- For each object, write its data. References to other objects are serialized as their integer objectID.

- When deserializing an object graph, read in all of the objectIDs and create placeholder objects in a lookup array.
- Deserialize each object's data, and when you encounter an objectID, you can set the reference to the object placeholder in your lookup table (even if you haven't yet set that object's fields).

- If you encounter a type during runtime which does not already have a known typeID (any derived type that you didn't notice during the startup reflection scan), you must transfer the new typeID and full typename to all other networked clients. Also, keep the fact that this is a new type in a table so that you can immediately inform clients when they connect.


That's the general overview; a minimal sketch of the FormatterServices piece follows. There are some complications when dealing with arrays, structs, and generic types, but you should be able to figure those out.
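A minimal sketch of the FormatterServices piece (System.Reflection and System.Runtime.Serialization; the type-ID tables, graph walking, and caching from the overview are all omitted):

// Dump the serializable field values of one object, in member order.
public static object[] GetValues(object obj, out MemberInfo[] members)
{
	members = FormatterServices.GetSerializableMembers(obj.GetType());
	return FormatterServices.GetObjectData(obj, members);
}

// Rebuild an instance from those values without running any constructor.
public static object Rebuild(Type type, object[] values)
{
	MemberInfo[] members = FormatterServices.GetSerializableMembers(type);
	object blank = FormatterServices.GetUninitializedObject(type);
	return FormatterServices.PopulateObjectMembers(blank, members, values);
}

The values array is what actually gets encoded on the wire, with references to other objects replaced by their integer objectIDs as described above.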
