Theory - Handling TCP data that is split on recv calls

17 comments, last by Drew_Benton 12 years, 11 months ago
Hi Guys,

I understand that (with TCP) data sent with one send() call may take multiple recv() calls to receive in full.

But, is there any delay between recv() calls or will the data always be there on the next call?

For example, will this happen?

Data sent = "Hello world"

Recv() = "Hell"
Recv() = "o world"

Or is it possible that this may happen?

Recv() = "Hell"
Recv() = NO DATA
Recv() = NO DATA
Recv() = "o world"

Thanks in advance :)
With a blocking socket, the first will always happen, because recv() blocks until some data has been received.
You haven't specified which language you're using, but assuming you're using C++, you should use an asynchronous solution like IOCP so that your main application thread doesn't have to block in order to wait for data.
This way, you can easily handle data asynchronously. Just put a packet-ID at the start of each packet and use it to look up the packet size (which is predetermined for each ID). Once you've received PACKETSIZE bytes, the packet has been fully received. If the current buffer contains any data before the packet-ID, that data belongs to the previously received packet. If you've received more than PACKETSIZE bytes, you've also received the start of the next packet.
Thanks for the reply.

Yes, I am using C++. But, I am using non-blocking sockets.

So, is it possible the latter may happen? Or, will the data always be there on the second call?
In principle, yes, the second can happen if the socket is set to non-blocking mode. In that case, recv() returns SOCKET_ERROR and sets an error code if no data is available. For that model, you are just "polling" the socket for data every loop.
recv() is documented as returning the number of bytes read on success, the value 0 if the connection has been closed, and -1 (SOCKET_ERROR on Winsock) on failure, with the actual error code available from errno or WSAGetLastError(). If you set your socket into non-blocking mode and no data is available, the call fails with an error code such as EAGAIN, EWOULDBLOCK or WSAEWOULDBLOCK.
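To make those return values concrete, here is a minimal sketch of how the result of one recv() call could be sorted into cases. RecvStatus and classify_recv are made-up names for illustration, not part of any sockets API, and the sketch assumes a POSIX-style errno; on Winsock you would pass in WSAGetLastError() and compare against WSAEWOULDBLOCK instead.

```cpp
#include <cassert>
#include <cerrno>

// Hypothetical names for illustration; not part of any sockets API.
enum class RecvStatus { Data, Closed, WouldBlock, Error };

// n   : the value returned by recv()
// err : errno captured immediately after the call
inline RecvStatus classify_recv(long n, int err) {
    assert(n >= -1);                         // recv() reports failure as -1
    if (n > 0) return RecvStatus::Data;      // n bytes were copied into the buffer
    if (n == 0) return RecvStatus::Closed;   // the peer closed the connection
    // n == -1: the call failed and the reason is in err
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return RecvStatus::WouldBlock;       // nothing to read right now, try again later
    }
    return RecvStatus::Error;                // a real error, e.g. ECONNRESET
}
```

Only the WouldBlock case means "no data yet"; Closed and Error both mean the connection is finished and should be cleaned up.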
Actually, that is a good point. So, if I haven't received all of the data I should be getting WSAEWOULDBLOCK and then I can act accordingly. :cool:
What's more, you are also likely to get part of the next block of data. Or part of the next three or four or five data blocks, all at once.


This is one big reason for the design of most networking systems. Look at not just the IP protocol, but most of them up and down the OSI model, and you will see this: {header { data } } The header is simply the size of the payload and enough data to know what to do with it.

That gets wrapped up at another level, where the block becomes data for the next level, with its own message {header { data } } where data = {header { data } }.

So by the time you get to the wire, you end up with:
{ Ethernet header { IP header { TCP header { App data header { your game packet header { actual data } } } } } }

It is generally best to use this model for your own code.

Read a small header of known size that includes the total size of the block. Keep buffering until you have that much data or more, and then pull off a full block and process it. Repeat until you have extracted and processed every complete block in the buffer.
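The buffering loop described above can be sketched roughly as follows. This is only an illustration under assumed conventions: FrameBuffer is a made-up class, and the framing (a 4-byte big-endian payload length followed by the payload) is an assumption, not something fixed by this thread. A real server would also want to reject absurdly large sizes so a client cannot make the buffer grow without bound.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Assumed framing: 4-byte big-endian payload length, then the payload itself.
constexpr std::size_t kHeaderSize = 4;

class FrameBuffer {
public:
    // Append whatever bytes the last recv() call produced.
    void append(const char* data, std::size_t len) {
        assert(data != nullptr || len == 0);
        buf_.insert(buf_.end(), data, data + len);
    }

    // Pull one complete message out of the buffer if there is one.
    // Returns false when more data is needed; out is left untouched then.
    bool next(std::string& out) {
        if (buf_.size() < kHeaderSize) return false;         // header incomplete
        std::uint32_t size = 0;
        for (std::size_t i = 0; i < kHeaderSize; ++i) {
            size = (size << 8) | static_cast<unsigned char>(buf_[i]);
        }
        if (buf_.size() - kHeaderSize < size) return false;  // payload incomplete
        const auto first = buf_.begin() + static_cast<std::ptrdiff_t>(kHeaderSize);
        const auto last = first + static_cast<std::ptrdiff_t>(size);
        out.assign(first, last);
        buf_.erase(buf_.begin(), last);  // keep any bytes of the next message
        return true;
    }

private:
    std::vector<char> buf_;
};
```

After every successful recv() you would call append() once and then call next() in a loop until it returns false, which handles both a message split across several reads and several messages arriving in one read.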

[quote]Actually, that is a good point. So, if I haven't received all of the data I should be getting WSAEWOULDBLOCK and then I can act accordingly. :cool:[/quote]


I recommend reading my description of one solution to the same problem in this other thread:
http://www.gamedev.net/topic/601279-sending-and-receiving-structs-i-read-faq/page__view__findpost__p__4807012
enum Bool { True, False, FileNotFound };
Thanks for the link hplus0603, checking it out now.

@frob - that was pretty much what I was thinking. So, getting too much data won't be an issue for me.

This is what I had in mind: a header (struct) containing one int for the data type and one for the data size, where the data might be another struct, compressed data, or raw data. Something like:

struct header
{
    int nType; // identifies what kind of payload follows
    int nSize; // size of the payload in bytes
};

So, nType will identify to the app what sort of data to expect and what to do with it.

Does this sound feasible?


[edit]
I am also going to look at IOCP as suggested earlier (or at least a multi-threaded approach), as I foresee problems with waiting around for data too long - like application jitter, if the data gets caught up somewhere.

[quote]So, nType will identify to the app what sort of data to expect and what to do with it.

Does this sound feasible?[/quote]


Yes, that's how it's usually done in practice for binary protocols. My advice is to keep it simple, and don't try to be clever with it.
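One thing worth keeping simple but explicit is how the header goes onto the wire, because sending a raw struct depends on the size of int, padding and byte order on both ends. A minimal sketch, assuming fixed 32-bit fields written in big-endian order; Header, encode, decode and the 8-byte layout are illustrative choices, not anything agreed in this thread:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed wire layout: 4-byte type, then 4-byte payload size, both big-endian.
struct Header {
    std::uint32_t type;
    std::uint32_t size;
};

constexpr std::size_t kWireHeaderSize = 8;

// Write v into p[0..3], most significant byte first.
inline void put_u32(unsigned char* p, std::uint32_t v) {
    p[0] = static_cast<unsigned char>(v >> 24);
    p[1] = static_cast<unsigned char>(v >> 16);
    p[2] = static_cast<unsigned char>(v >> 8);
    p[3] = static_cast<unsigned char>(v);
}

// Read a big-endian 32-bit value from p[0..3].
inline std::uint32_t get_u32(const unsigned char* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

inline std::array<unsigned char, kWireHeaderSize> encode(const Header& h) {
    std::array<unsigned char, kWireHeaderSize> out{};
    put_u32(out.data(), h.type);
    put_u32(out.data() + 4, h.size);
    return out;
}

// p must point at kWireHeaderSize readable bytes.
inline Header decode(const unsigned char* p) {
    assert(p != nullptr);
    return Header{get_u32(p), get_u32(p + 4)};
}
```

Because the bytes are written one at a time, the same code produces the same 8 bytes regardless of the host's native byte order.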

I've seen some variations, such as writing a size as a byte, then data, and then a continued size byte for more data and so on (the last size byte is 0 to signal no more data), but I've never seen the point of those implementations. In the end, they use 2 bytes the majority of the time and are not saving anything.

I prefer having the size there to give more flexibility to the protocol rather than determining the size based on the ID alone. The reason is that fixed size packets usually mean fixed size strings, and those can really take a toll on a system. I've seen games use 512 byte fixed size strings for every chat packet, which is pretty painful to think about...

[quote]I am also going to look at IOCP as suggested earlier (or at least a multi-threaded approach), as I foresee problems with waiting around for data too long - like application jitter, if the data gets caught up somewhere.[/quote]

Any problems you would have with IOCP in that regard would also happen with any other network implementation. This issue belongs to the application protocol processing layer, not the underlying network layer. If all you check is that the client is still sending data, then a client could send your ping packet over and over while not actually doing anything. Of course, you would not know that if you only handle it at the network layer.

If instead you handle it at the application protocol processing level, you can measure the time between different logical packets. Only then can you know that client A has not sent a login packet in 1 minute, so you should disconnect them. Or that a client has not finished sending the packets required to complete an in-game transaction within 10 seconds, so you should roll back its actions and disconnect them. And so on. You have to be aware of such things if you want to avoid timing exploits in your system. It's surprising how many systems are actually vulnerable to such things. I've come across a lot of flaws in games that were directly related to this issue.
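As a rough illustration of timing logical packets rather than raw socket activity, here is a small sketch. PacketDeadline is a hypothetical helper, and the idea of passing the current time in explicitly (instead of reading the clock inside the class) is just a choice that keeps it easy to test.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: tracks when the last complete logical packet arrived.
class PacketDeadline {
public:
    using Clock = std::chrono::steady_clock;

    PacketDeadline(Clock::time_point start, Clock::duration limit)
        : last_(start), limit_(limit) {
        assert(limit > Clock::duration::zero());
    }

    // Call only when a full logical packet has been parsed,
    // not every time recv() happens to return some bytes.
    void on_packet(Clock::time_point now) { last_ = now; }

    // True once the client has gone longer than the limit
    // without completing a packet.
    bool expired(Clock::time_point now) const { return now - last_ > limit_; }

private:
    Clock::time_point last_;
    Clock::duration limit_;
};
```

You might keep one of these per connection for the login step with a 60 second limit and another per in-game transaction with a 10 second limit, checking expired(Clock::now()) on each pass through the main loop.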
