[TCP socket] Data not reaching the server occasionally

9 comments, last by hiigara 12 years, 6 months ago
My client connects to the server and sends the login data. About 1 in 50 times the connect succeeds but the server does not receive any data, so the client hangs forever waiting for the server reply.
I checked both client and server logs and they show a successful connect.

I have TCP_NODELAY set.
The strangest thing is that when I terminate the server, with the client still hung, the client detects that the socket was closed by the server.
I don't know how TCP works under the hood.
My guess is that my connection was broken at some intermediate router but neither side has detected it. Is this possible?
Is it normal for a TCP connection to break like this or am I doing something wrong?
It does not sound normal.. try posting your code if it's short.

Another thing you can do is download a packet sniffer and see what data actually goes on the network. Try http://www.wireshark.org/ for example.
Analyzing the traffic does require some knowledge of how TCP works, but if you have some time it's an essential thing to learn when working with networking. Start by setting up a filter that captures all packets sent over TCP using the particular port you use in your program. This should give you only a few packets to check, and you can see if some of them are missing when the data goes missing.

My guess is that you do not do proper packetization of the TCP stream.
TCP is a _stream_ protocol, like a file on disk, so there are no "boundaries" detectable between calls to send().
A call to recv() may return anything between 1 byte and your full buffer, and that may be part of a send(), or all of a send(), or some data from the end of one send() and the beginning of another send(), or any combination thereof.

Thus, when sending TCP data that is "messages" rather than just a binary stream (such as a file download or whatnot) you do one of two things:
1) use a packet terminator (such as linefeed) after each message. On the receiver, keep receiving data into a buffer until you have a linefeed, process that data, remove the data you processed (and the terminator) from the buffer. Repeat.
2) prefix each message with a length field. On the receiver, keep receiving into a buffer. When you have at least X bytes of data, where X is the size of the length field, calculate the length Y, and see if you have Y bytes. If so, process those bytes, then remove the length field and the data from the buffer. Repeat.
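
Option 1 could be sketched roughly like this. `LineFramer` is a made-up name for illustration, not something from the original code, and it assumes a linefeed never appears inside a message:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): buffers raw bytes from recv()
// and splits out complete '\n'-terminated messages.
class LineFramer {
public:
    // Append whatever a single recv() call returned, however TCP split it.
    void feed(const char* data, std::size_t len) { buffer_.append(data, len); }

    // Return every complete message buffered so far, terminator removed.
    // An incomplete tail stays in the buffer until more bytes arrive.
    std::vector<std::string> take_messages() {
        std::vector<std::string> out;
        std::size_t start = 0;
        for (;;) {
            const std::size_t nl = buffer_.find('\n', start);
            if (nl == std::string::npos) break;
            out.emplace_back(buffer_, start, nl - start);
            start = nl + 1;
        }
        buffer_.erase(0, start);  // drop processed data and terminators
        return out;
    }

    // Number of bytes still waiting for their terminator.
    std::size_t pending() const { return buffer_.size(); }

private:
    std::string buffer_;
};
```

The receiver calls `feed()` with each recv() result and then `take_messages()`, which may return zero, one, or several messages per call.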

Note that, in both cases, the buffer will likely contain either a non-full packet, or more than one packet (data after the terminator/length), that is waiting to later become a full packet. This is expected and natural, and an outcome of the way TCP works.
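
A minimal sketch of the length-prefix variant, to show the "partial packet or several packets in the buffer" situation. The names `LengthFramer` and `make_frame` are hypothetical, and it assumes a 4-byte big-endian length that does not count itself. A real server would also cap the length before trusting it:

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX)
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: each frame is a 4-byte big-endian payload length
// followed by the payload (assumption: the length excludes its own 4 bytes).
class LengthFramer {
public:
    // Append whatever a single recv() call returned.
    void feed(const char* data, std::size_t len) {
        buffer_.insert(buffer_.end(), data, data + len);
    }

    // If a complete frame is buffered, move its payload into msg, remove the
    // frame from the buffer and return true. Otherwise leave the buffer alone.
    bool next(std::string& msg) {
        std::uint32_t be = 0;
        if (buffer_.size() < sizeof be) return false;        // length field incomplete
        std::memcpy(&be, buffer_.data(), sizeof be);
        const std::size_t len = ntohl(be);
        if (buffer_.size() - sizeof be < len) return false;  // payload incomplete
        msg.assign(buffer_.data() + sizeof be, len);
        buffer_.erase(buffer_.begin(),
                      buffer_.begin() + static_cast<std::ptrdiff_t>(sizeof be + len));
        return true;
    }

private:
    std::vector<char> buffer_;
};

// Sender side: prepend the length field to the payload.
std::string make_frame(const std::string& payload) {
    const std::uint32_t be = htonl(static_cast<std::uint32_t>(payload.size()));
    std::string frame(reinterpret_cast<const char*>(&be), sizeof be);
    frame += payload;
    return frame;
}
```

Calling `next()` in a loop until it returns false drains every complete message; whatever is left is the start of a later one.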
enum Bool { True, False, FileNotFound };

Packetization is a possibility. My packetization code is already quite sophisticated and I want to believe it is stable. I am going to add more log messages to the receiving end of the packetization, so even if a single byte arrives I will know.

This function is the heart of the receiving end. Whenever select() returns socket readable, I call it.


void Clientsocket::main()
{
    switch( State ){
    case SIZE:{

        unsigned int _Packetsize;
        ssize_t _ret = Socketsp->recv( &_Packetsize, sizeof _Packetsize,
                                       MSG_PEEK );

        if( _ret != sizeof _Packetsize ){

            throw next_client_Exception();
        }

        _Packetsize = ::ntohl( _Packetsize );

        if( _Packetsize > MAX_CLIENT_PACKET_SIZE ){

            throw Exception("Packet too big: %u.", _Packetsize);
        }

        Recvbuffero.resize( _Packetsize );
        Bytesreceived = 0;
        State = PACKET;
    }

    case PACKET:{

        ssize_t _ret = Socketsp->recv( Recvbuffero.data() + Bytesreceived,
                                       Recvbuffero.size() - Bytesreceived, 0 );

        if ( _ret > 0 ) {

            Bytesreceived += _ret;
        }

        if( Bytesreceived < Recvbuffero.size() ){

            throw next_client_Exception();
        }

        State = SIZE;
    }
    }
}


[quote name='hiigara' timestamp='1318614262' post='4872601']
My client connects to the server and sends the login data. About 1 in 50 times the connect succeeds but the server does not receive any data, so the client hangs forever waiting for the server reply.
I checked both client and server logs and they show a successful connect.


My guess is that you do not do proper packetization of the TCP stream.
TCP is a _stream_ protocol, like a file on disk, so there are no "boundaries" detectable between calls to send().
A call to recv() may return anything between 1 byte and your full buffer, and that may be part of a send(), or all of a send(), or some data from the end of one send() and the beginning of another send(), or any combination thereof.

Thus, when sending TCP data that is "messages" rather than just a binary stream (such as a file download or whatnot) you do one of two things:
1) use a packet terminator (such as linefeed) after each message. On the receiver, keep receiving data into a buffer until you have a linefeed, process that data, remove the data you processed (and the terminator) from the buffer. Repeat.
2) prefix each message with a length field. On the receiver, keep receiving into a buffer. When you have at least X bytes of data, where X is the size of the length field, calculate the length Y, and see if you have Y bytes. If so, process those bytes, then remove the length field and the data from the buffer. Repeat.
[/quote]

Some use both sentinel values and prefixes as an extra precaution. I'm contemplating going that route also.
I have a couple of examples of what you mention in the second item in some code here -- http://webEbenezer.n.../direct.tar.bz2 .


Brian Wood
Ebenezer Enterprises
http://webEbenezer.net
Why do you use MSG_PEEK, and are you sure it does what you want it to?
Also, for debugging purposes make sure to check if you receive any data, even if it doesn't match the length or contents you expect.
After a second inspection of the log file I can confirm that either not even a single byte reaches the server, or select() is not firing when it should.

Here is some more code:


while ( 1 ) {

    /*
    fd_set is a structure. `=` works.
    */
    fd_set _Readfdset = Readfdset ;
    fd_set _Writefdset = Writefdset ;

    int _Ret = ::select ( Maxfd + 1, & _Readfdset, & _Writefdset,
                          NULL, NULL );
    xx( 7, "select() returned %d. Maxfd %d", _Ret, Maxfd );

    if ( _Ret == -1 ) {

        char _Buffer [ 1024 ] ;
        xx ( 0, "::select(): %s.", ::strerror_r ( errno, _Buffer,
                                                  sizeof _Buffer ) ) ;

        break;
    }

    if( FD_ISSET( Clientacceptsocket->get_fd (), &_Readfdset ) ){

        while( accept_client() );
    }

    process_clients(_Readfdset, _Writefdset);
    calculate_max_fd();

    xx( 7, "Current # of clients: %d.", Clientsocketsplist.size() );

    /*
    Comment when admin thread coded
    */
    g_Logp->flush();
}




And here is the log:

01:22:20 | select() returned 1. Maxfd 12
01:22:20 | accept_client(): accepted connection from [89.155.52.73]. Socket fd: 11.
01:22:20 | accept_client(): ::getsockopt returned 0. New socket's TCP_NODELAY: 1.
01:22:20 | accept_client(): New socket's O_NONBLOCK: 0x800.
01:22:20 | Current # of clients: 1.
01:25:38 | select() returned 1. Maxfd 12
01:25:38 | Businesssocketsp ready.
01:25:38 | Businesssocketsp: socket closed gracefully.
01:25:38 | initialize_business_server(): entering ::select().
01:25:53 | Terminating threads.
01:25:53 | accept_business_server(): ::accept(): Invalid argument. Likely a normal termination.
01:25:53 | Threads terminated.


The client connects at 1:22:20, and the login data should have reached the server at that time, but it doesn't.
The server goes back to sleep at 1:22:20 and only wakes up from select() at 1:25:38, 3 minutes later.
And it only wakes up because at 1:25:38 I initiate the shutdown sequence. The server effectively terminates at 1:25:53.
And as I said the client detects when the server closes the socket during shutdown.
If there is data to read, select() should return, right?
Maxfd has a good value. The client socket is 11.

In 49 out of 50 connections everything works as expected. Here is the log for a successful connection:

00:57:20 | select() returned 1. Maxfd 12
00:57:20 | accept_client(): accepted connection from [89.155.52.73]. Socket fd: 11.
00:57:20 | accept_client(): ::getsockopt returned 0. New socket's TCP_NODELAY: 1.
00:57:20 | accept_client(): New socket's O_NONBLOCK: 0x800.
00:57:20 | Current # of clients: 1.
00:57:20 | select() returned 1. Maxfd 12
00:57:20 | New Sessionid 1695357897.
00:57:20 | Forwarding packet 101 from socket 11.
00:57:20 | process_rpc_packet(): _Cmd 5.
00:57:20 | Current # of clients: 1.


As you can see, select() wakes up immediately after the successful connection.

[quote]
Why do you use MSG_PEEK, and are you sure it does what you want it to?
Also, for debugging purposes make sure to check if you receive any data, even if it doesn't match the length or contents you expect.
[/quote]


I use MSG_PEEK so I don't have to buffer the length field myself, which I could, but with MSG_PEEK it just looks prettier.

[quote]
It does not sound normal.. try posting your code if it's short.

Another thing you can do is download a packet sniffer and see what data actually goes on the network. Try http://www.wireshark.org/ for example.
Analyzing the traffic does require some knowledge of how TCP works, but if you have some time it's an essential thing to learn when working with networking. Start by setting up a filter that captures all packets sent over TCP using the particular port you use in your program. This should give you only a few packets to check, and you can see if some of them are missing when the data goes missing.
[/quote]


I will try that at some point. For now it seems like a lot of work. I think I can live with this problem for the time being. If the connection hangs I will have a restart button, or an automatic heartbeat of some sort.
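
For the "client hangs forever" part, one possible stopgap is a receive with a timeout, so the client can give up and reconnect. This is only a sketch: `recv_with_timeout` and `kTimedOut` are invented names, and it is not a full heartbeat protocol.

```cpp
#include <sys/select.h>  // select, fd_set
#include <sys/socket.h>  // recv
#include <sys/time.h>    // timeval
#include <sys/types.h>   // ssize_t
#include <cstddef>

// Hypothetical sentinel for "no data arrived within the timeout".
constexpr ssize_t kTimedOut = -2;

// Waits up to timeout_ms for fd to become readable, then reads once.
// Returns bytes received (> 0), 0 if the peer closed the connection,
// -1 on error (errno set by select or recv), or kTimedOut.
ssize_t recv_with_timeout(int fd, void* buf, std::size_t len, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    const int ready = ::select(fd + 1, &readfds, nullptr, nullptr, &tv);
    if (ready < 0) return -1;
    if (ready == 0) return kTimedOut;
    return ::recv(fd, buf, len, 0);
}
```

On `kTimedOut` the client could close the socket and retry the login instead of waiting indefinitely for the reply.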

[quote]
I use MSG_PEEK so I don't have to buffer the length field myself, which I could, but with MSG_PEEK it just looks prettier.
[/quote]


You should not throw an exception if the length data is not complete -- what's exceptional about that? It's just a normal situation. Just ignore that socket for this time and move on. Note that, if you're using MSG_PEEK, if someone sends three bytes and nothing more, the socket will keep saying "ready" forever, and you will ignore it each time through the loop, leading to a form of Denial Of Service.
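
As a rough illustration of the alternative: consume the length bytes as they arrive and return a status instead of throwing. `HeaderReader` and `ReadStatus` are hypothetical names, and a 4-byte big-endian length field is assumed, as in the posted code. Because the bytes are actually read, a peer that sends three bytes and stops no longer keeps the socket "ready" forever.

```cpp
#include <arpa/inet.h>   // htonl, ntohl (POSIX)
#include <sys/socket.h>  // recv
#include <sys/types.h>   // ssize_t
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>

enum class ReadStatus { NeedMore, HeaderReady, Closed, Error };

// Hypothetical stand-in for the MSG_PEEK step: buffers the length field
// itself, so a partial header is a normal state rather than an exception.
class HeaderReader {
public:
    // Call when select() reports fd readable.
    ReadStatus on_readable(int fd) {
        if (have_ == sizeof bytes_) return ReadStatus::HeaderReady;
        const ssize_t n = ::recv(fd, bytes_ + have_, sizeof bytes_ - have_, 0);
        if (n == 0) return ReadStatus::Closed;  // peer closed the connection
        if (n < 0) {
            const bool retry =
                (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR);
            return retry ? ReadStatus::NeedMore : ReadStatus::Error;
        }
        have_ += static_cast<std::size_t>(n);
        return have_ == sizeof bytes_ ? ReadStatus::HeaderReady
                                      : ReadStatus::NeedMore;
    }

    // Only meaningful after on_readable() has returned HeaderReady.
    std::uint32_t length() const {
        std::uint32_t be = 0;
        std::memcpy(&be, bytes_, sizeof be);
        return ntohl(be);
    }

    // Start over for the next packet.
    void reset() { have_ = 0; }

private:
    unsigned char bytes_[4] = {};
    std::size_t have_ = 0;
};
```

On `NeedMore` the caller simply moves on to the next socket and comes back when select() fires again.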


Also, you don't seem to be reading the length again when you're in the PACKET state, unless the packet is defined to include the length itself, and the minimum legal length value is then 4. If that's the case, you SHOULD throw when you get a value < 4, else someone can DOS your server by sending four bytes of 0, which will cause you to go into an infinite loop of receiving 0-byte packets.
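
The length check could look something like the sketch below. It assumes the length field counts its own 4 bytes, and `kMaxPacketSize` is a made-up limit standing in for MAX_CLIENT_PACKET_SIZE, whose real value was not posted.

```cpp
#include <cstdint>

// Assumption: the length field counts its own 4 bytes, so 4 is the
// smallest legal value (an empty payload).
constexpr std::uint32_t kLengthFieldSize = 4;
// Hypothetical limit; the real MAX_CLIENT_PACKET_SIZE is not shown in the thread.
constexpr std::uint32_t kMaxPacketSize = 64 * 1024;

enum class LengthCheck { Ok, TooSmall, TooBig };

// Validate a declared packet length before resizing any buffer to it.
LengthCheck check_length(std::uint32_t declared) {
    if (declared < kLengthFieldSize) return LengthCheck::TooSmall;
    if (declared > kMaxPacketSize) return LengthCheck::TooBig;
    return LengthCheck::Ok;
}

// Payload bytes that follow the length field; call only after check_length()
// returned Ok, otherwise the subtraction would wrap around.
std::uint32_t payload_size(std::uint32_t declared) {
    return declared - kLengthFieldSize;
}
```

Anything other than `Ok` would be grounds for dropping that client, since a well-behaved sender never produces such a value.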

Also, the log messages you show cannot have been generated by only the code that you posted: the while() loop never mentions "business" sockets, but the log does.
enum Bool { True, False, FileNotFound };

This topic is closed to new replies.
