hiigara

[TCP socket] Data not reaching the server occasionally


My client connects to the server and sends the login data. About 1 in 50 times the connect succeeds but the server does not receive any data, so the client hangs forever waiting for the server reply.
I checked both client and server logs and they show a successful connect.

I have TCP_NODELAY set.
The strangest thing is that when I terminate the server while the client is still hung, the client detects that the socket was closed by the server.
I don't know how TCP works under the hood.
My guess is that an intermediate router somewhere along the route broke my connection, but neither side detected it. Is this possible?
Is it normal for a TCP connection to break like this or am I doing something wrong?

That does not sound normal. Try posting your code if it's short.

Another thing you can do is download a packet sniffer and see what data actually goes on the network. Try [url="http://www.wireshark.org/"]http://www.wireshark.org/[/url] for example.
Analyzing the traffic does require some knowledge of how TCP works, but if you have some time it's an essential thing to learn when working with networking. Start by setting up a filter that captures all packets sent over TCP using the particular port you use in your program. This should give you only a few packets to check, and you can see if some of them are missing when the data goes missing.

[quote name='hiigara' timestamp='1318614262' post='4872601']
My client connects to the server and sends the login data. About 1 in 50 times the connect succeeds but the server does not receive any data, so the client hangs forever waiting for the server reply.
I checked both client and server logs and they show a successful connect.
[/quote]

My guess is that you do not do proper packetization of the TCP stream.
TCP is a _stream_ protocol, like a file on disk, so there are no "boundaries" detectable between calls to send().
A call to recv() may return anything between 1 byte and your full buffer, and that may be part of a send(), or all of a send(), or some data from the end of one send() and the beginning of another send(), or any combination thereof.

Thus, when sending TCP data that is "messages" rather than just a binary stream (such as a file download or whatnot) you do one of two things:
1) use a packet terminator (such as linefeed) after each message. On the receiver, keep receiving data into a buffer until you have a linefeed, process that data, remove the data you processed (and the terminator) from the buffer. Repeat.
2) prefix each message with a length field. On the receiver, keep receiving into a buffer. When you have at least X bytes of data, where X is the size of the length field, calculate the length Y, and see if you have Y bytes. If so, process those bytes, then remove the length field and the data from the buffer. Repeat.

Note that, in both cases, the buffer will likely contain either a non-full packet, or more than one packet (data after the terminator/length), that is waiting to later become a full packet. This is expected and natural, and an outcome of the way TCP works.
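
Option 2 above can be sketched roughly as follows. This is only an illustration, not code from the thread: `FrameBuffer` and `make_frame` are hypothetical names, and the sketch assumes a 4-byte big-endian length prefix that counts the payload only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: accumulates raw bytes from recv() and hands back
// complete messages framed as [4-byte big-endian length][payload].
class FrameBuffer {
public:
    // Append whatever recv() returned, however small or large.
    void append(const char* data, std::size_t size) {
        buffer_.insert(buffer_.end(), data, data + size);
    }

    // Extract the next complete payload into `out`.
    // Returns false if more bytes are needed first.
    bool next(std::string& out) {
        const std::size_t kHeader = 4;
        if (buffer_.size() < kHeader) return false;

        // Decode the length byte by byte so host endianness does not matter.
        std::uint32_t length = 0;
        for (std::size_t i = 0; i < kHeader; ++i)
            length = (length << 8) | static_cast<unsigned char>(buffer_[i]);

        if (buffer_.size() - kHeader < length) return false;

        // Copy the payload out, then drop the length field and the payload.
        out.assign(buffer_.begin() + kHeader,
                   buffer_.begin() + kHeader + length);
        buffer_.erase(buffer_.begin(), buffer_.begin() + kHeader + length);
        return true;
    }

private:
    std::vector<char> buffer_;
};

// Sender side: prefix the payload with its length.
std::string make_frame(const std::string& payload) {
    const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
    std::string frame;
    frame.push_back(static_cast<char>((n >> 24) & 0xFF));
    frame.push_back(static_cast<char>((n >> 16) & 0xFF));
    frame.push_back(static_cast<char>((n >> 8) & 0xFF));
    frame.push_back(static_cast<char>(n & 0xFF));
    frame += payload;
    return frame;
}
```

The receiver would call `append()` with the bytes from each recv() and then call `next()` in a loop until it returns false, since one recv() may deliver several messages at once. A real server would also want to reject lengths above some maximum before buffering them.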

[quote name='hplus0603' timestamp='1318622320' post='4872648']
My guess is that you do not do proper packetization of the TCP stream.
TCP is a _stream_ protocol, like a file on disk, so there are no "boundaries" detectable between calls to send().
A call to recv() may return anything between 1 byte and your full buffer, and that may be part of a send(), or all of a send(), or some data from the end of one send() and the beginning of another send(), or any combination thereof.

Thus, when sending TCP data that is "messages" rather than just a binary stream (such as a file download or whatnot) you do one of two things:
1) use a packet terminator (such as linefeed) after each message. On the receiver, keep receiving data into a buffer until you have a linefeed, process that data, remove the data you processed (and the terminator) from the buffer. Repeat.
2) prefix each message with a length field. On the receiver, keep receiving into a buffer. When you have at least X bytes of data, where X is the size of the length field, calculate the length Y, and see if you have Y bytes. If so, process those bytes, then remove the length field and the data from the buffer. Repeat.

Note that, in both cases, the buffer will likely contain either a non-full packet, or more than one packet (data after the terminator/length), that is waiting to later become a full packet. This is expected and natural, and an outcome of the way TCP works.
[/quote]
Packetization is a possibility. My packetization code is already quite sophisticated and I want to believe it is stable. I am going to add more log messages to the receiving end of the packetization, so even if a single byte arrives I will know.

This function is the heart of the receiving end. Whenever select() returns socket readable, I call it.
[code]

void Clientsocket::main()
{
    switch( State ){
    case SIZE:{
        // Peek at the length prefix without consuming it from the socket.
        unsigned int _Packetsize;
        ssize_t _ret = Socketsp->recv( &_Packetsize, sizeof _Packetsize,
                                       MSG_PEEK );

        if( _ret != sizeof _Packetsize ){
            // Anything other than a full length prefix: skip this client for now.
            throw next_client_Exception();
        }

        _Packetsize = ::ntohl( _Packetsize );

        if( _Packetsize > MAX_CLIENT_PACKET_SIZE ){
            throw Exception("Packet too big: %u.", _Packetsize);
        }

        Recvbuffero.resize( _Packetsize );
        Bytesreceived = 0;
        State = PACKET;
    }
    // No break here: execution falls through to PACKET.

    case PACKET:{
        ssize_t _ret = Socketsp->recv( Recvbuffero.data() + Bytesreceived,
                                       Recvbuffero.size() - Bytesreceived, 0 );

        if ( _ret > 0 ) {
            Bytesreceived += _ret;
        }

        if( Bytesreceived < Recvbuffero.size() ){
            // Packet still incomplete: skip this client for now.
            throw next_client_Exception();
        }

        State = SIZE;
    }
    }
}

[/code]

[quote name='hplus0603' timestamp='1318622320' post='4872648']
[quote name='hiigara' timestamp='1318614262' post='4872601']
My client connects to the server and sends the login data. About 1 in 50 times the connect succeeds but the server does not receive any data, so the client hangs forever waiting for the server reply.
I checked both client and server logs and they show a successful connect.
[/quote]

My guess is that you do not do proper packetization of the TCP stream.
TCP is a _stream_ protocol, like a file on disk, so there are no "boundaries" detectable between calls to send().
A call to recv() may return anything between 1 byte and your full buffer, and that may be part of a send(), or all of a send(), or some data from the end of one send() and the beginning of another send(), or any combination thereof.

Thus, when sending TCP data that is "messages" rather than just a binary stream (such as a file download or whatnot) you do one of two things:
1) use a packet terminator (such as linefeed) after each message. On the receiver, keep receiving data into a buffer until you have a linefeed, process that data, remove the data you processed (and the terminator) from the buffer. Repeat.
2) prefix each message with a length field. On the receiver, keep receiving into a buffer. When you have at least X bytes of data, where X is the size of the length field, calculate the length Y, and see if you have Y bytes. If so, process those bytes, then remove the length field and the data from the buffer. Repeat.
[/quote]

Some use both sentinel values and prefixes as an extra precaution. I'm contemplating going that route also.
I have a couple of examples of what you mention in the second item in some code here -- [url="http://webEbenezer.net/misc/direct.tar.bz2"]http://webEbenezer.n.../direct.tar.bz2[/url] .


Brian Wood
Ebenezer Enterprises
[url="http://webEbenezer.net"]http://webEbenezer.net[/url]

Why do you use MSG_PEEK, and are you sure it does what you want it to?
Also, for debugging purposes make sure to check if you receive any data, even if it doesn't match the length or contents you expect.

After a second inspection of the log file I can confirm that either not even a single byte reaches the server, or select() is not returning when it should.

Here is some more code:
[code]

while ( 1 ) {

    /*
    fd_set is a structure. `=` works.
    */
    fd_set _Readfdset = Readfdset ;
    fd_set _Writefdset = Writefdset ;

    int _Ret = ::select ( Maxfd + 1, & _Readfdset, & _Writefdset,
                          NULL, NULL );
    xx( 7, "select() returned %d. Maxfd %d", _Ret, Maxfd );

    if ( _Ret == -1 ) {

        char _Buffer [ 1024 ] ;
        xx ( 0, "::select(): %s.", ::strerror_r ( errno, _Buffer,
                                                  sizeof _Buffer ) ) ;

        break;
    }

    if( FD_ISSET( Clientacceptsocket->get_fd (), &_Readfdset ) ){

        while( accept_client() );
    }

    process_clients(_Readfdset, _Writefdset);
    calculate_max_fd();

    xx( 7, "Current # of clients: %d.", Clientsocketsplist.size() );

    /*
    Comment when admin thread coded
    */
    g_Logp->flush();
}

[/code]


And here is the log:
[code]
01:22:20 | select() returned 1. Maxfd 12
01:22:20 | accept_client(): accepted connection from [89.155.52.73]. Socket fd: 11.
01:22:20 | accept_client(): ::getsockopt returned 0. New socket's TCP_NODELAY: 1.
01:22:20 | accept_client(): New socket's O_NONBLOCK: 0x800.
01:22:20 | Current # of clients: 1.
01:25:38 | select() returned 1. Maxfd 12
01:25:38 | Businesssocketsp ready.
01:25:38 | Businesssocketsp: socket closed gracefully.
01:25:38 | initialize_business_server(): entering ::select().
01:25:53 | Terminating threads.
01:25:53 | accept_business_server(): ::accept(): Invalid argument. Likely a normal termination.
01:25:53 | Threads terminated.
[/code]

The client connects at 1:22:20, and the login data should have reached the server at that time, but it never arrives.
The server goes back to sleep at 1:22:20 and only wakes up from select() at 1:25:38, 3 minutes later.
And it only wakes up because at 1:25:38 I initiate the shutdown sequence. The server effectively terminates at 1:25:53.
And as I said the client detects when the server closes the socket during shutdown.
If there is data to read, select() should return, right?
Maxfd has a good value. The client socket is 11.

In 49 out of 50 connections everything works as expected. Here is the log for a successful connection:
[code]
00:57:20 | select() returned 1. Maxfd 12
00:57:20 | accept_client(): accepted connection from [89.155.52.73]. Socket fd: 11.
00:57:20 | accept_client(): ::getsockopt returned 0. New socket's TCP_NODELAY: 1.
00:57:20 | accept_client(): New socket's O_NONBLOCK: 0x800.
00:57:20 | Current # of clients: 1.
00:57:20 | select() returned 1. Maxfd 12
00:57:20 | New Sessionid 1695357897.
00:57:20 | Forwarding packet 101 from socket 11.
00:57:20 | process_rpc_packet(): _Cmd 5.
00:57:20 | Current # of clients: 1.
[/code]

As you can see select() awakes immediately after the successful connection.

[quote name='Erik Rufelt' timestamp='1318676329' post='4872788']
Why do you use MSG_PEEK, and are you sure it does what you want it to?
Also, for debugging purposes make sure to check if you receive any data, even if it doesn't match the length or contents you expect.
[/quote]

I use MSG_PEEK so I don't have to buffer the length field myself, which I could, but with MSG_PEEK it just looks prettier.

[quote name='Erik Rufelt' timestamp='1318615473' post='4872615']
It does not sound normal.. try posting your code if it's short.

Another thing you can do is download a packet sniffer and see what data actually goes on the network. Try [url="http://www.wireshark.org/"]http://www.wireshark.org/[/url] for example.
Analyzing the traffic does require some knowledge of how TCP works, but if you have some time it's an essential thing to learn when working with networking. Start by setting up a filter that captures all packets sent over TCP using the particular port you use in your program. This should give you only a few packets to check, and you can see if some of them are missing when the data goes missing.
[/quote]

I will try that at some point. For now it seems like a lot of work. I think I can live with this problem for the time being. If the connection hangs I will have a restart button, or an automatic heartbeat of some sort.
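
One possible building block for such a heartbeat or restart is a client-side wait that gives up after a timeout instead of blocking forever on the reply. The sketch below is an assumption about how that might look, not the poster's code; `wait_readable` is a hypothetical helper built on the standard select() call.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper: returns true if fd becomes readable within
// timeout_ms milliseconds, false on timeout or on a select() error.
bool wait_readable(int fd, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);

    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    // select() returns the number of ready descriptors, 0 on timeout, -1 on error.
    const int ret = ::select(fd + 1, &readfds, nullptr, nullptr, &tv);
    return ret > 0 && FD_ISSET(fd, &readfds);
}
```

A client could call `wait_readable(sock, 5000)` after sending the login data and reconnect if it returns false. Note that a socket also becomes readable when the peer closes it, in which case the following recv() returns 0.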

[quote name='hiigara' timestamp='1318682650' post='4872820']
I use MSG_PEEK so I don't have to buffer the length field myself, which I could, but with MSG_PEEK it just looks prettier.
[/quote]

You should not throw an exception if the length data is not complete -- what's exceptional about that? It's just a normal situation. Just ignore that socket for this time and move on. Note that, if you're using MSG_PEEK, if someone sends three bytes and nothing more, the socket will keep saying "ready" forever, and you will ignore it each time through the loop, leading to a form of Denial Of Service.


Also, you don't seem to be reading the length again when you're in the PACKET state, unless the packet is defined to include the length itself, and the minimum legal length value is then 4. If that's the case, you SHOULD throw when you get a value < 4, else someone can DOS your server by sending four bytes of 0, which will cause you to go into an infinite loop of receiving 0-byte packets.

The log messages you show cannot possibly have been generated by only the code that you posted, too, because there is a while() loop that doesn't mention "business" sockets, but the log message does talk about it.
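
To make the first two points concrete, here is a rough sketch of a reader that consumes the length bytes with a plain recv() instead of MSG_PEEK, returns a status instead of throwing, and rejects lengths below 4. It assumes the length field counts itself; `PacketReader`, `ReadResult`, and the 64 KiB limit are hypothetical names and values chosen for illustration.

```cpp
#include <arpa/inet.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// All names and limits below are hypothetical, chosen for illustration.
enum class ReadResult { NeedMore, Complete, Closed, Invalid };

constexpr std::uint32_t kHeaderSize = 4;             // length field, counts itself
constexpr std::uint32_t kMaxPacketSize = 64 * 1024;  // assumed upper bound

struct PacketReader {
    std::vector<char> buffer = std::vector<char>(kHeaderSize);
    std::size_t received = 0;
    bool have_size = false;

    // Prepare for the next packet after a Complete result.
    void reset() {
        buffer.assign(kHeaderSize, 0);
        received = 0;
        have_size = false;
    }

    // Call once each time select() reports fd as readable.
    ReadResult read_some(int fd) {
        // Plain recv(): the bytes are consumed, so a sender that stalls after
        // a partial header does not leave the socket permanently readable.
        const ssize_t n = ::recv(fd, buffer.data() + received,
                                 buffer.size() - received, 0);
        if (n == 0) return ReadResult::Closed;  // orderly shutdown by the peer
        if (n < 0) {
            const bool retry =
                errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR;
            return retry ? ReadResult::NeedMore : ReadResult::Closed;
        }
        received += static_cast<std::size_t>(n);

        if (!have_size && received == kHeaderSize) {
            std::uint32_t wire = 0;
            std::memcpy(&wire, buffer.data(), kHeaderSize);
            const std::uint32_t total = ntohl(wire);
            // A length below 4 (including 0) or above the limit is rejected.
            if (total < kHeaderSize || total > kMaxPacketSize)
                return ReadResult::Invalid;
            buffer.resize(total);  // header bytes stay at the front
            have_size = true;
        }
        return (have_size && received == buffer.size()) ? ReadResult::Complete
                                                        : ReadResult::NeedMore;
    }
};
```

On `Complete` the caller would process `buffer` and call `reset()`; on `Closed` or `Invalid` it would drop the client. Because select() is level-triggered, returning `NeedMore` right after the header is harmless: if the payload has already arrived, the next select() call reports the socket readable immediately.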

[quote name='hplus0603' timestamp='1318819269' post='4873299']
You should not throw an exception if the length data is not complete -- what's exceptional about that? It's just a normal situation. Just ignore that socket for this time and move on.
[/quote]
next_client_Exception does exactly that: it silently moves on to the next client socket.
The coding style I have adopted is to throw exceptions instead of returning values from functions. It makes the code smaller because I don't have to check return values.
If you saw the full code you would understand my decision. In this case returning from the function means that I have a complete packet, everything else is "exceptional".

[quote name='hplus0603' timestamp='1318819269' post='4873299']
Note that, if you're using MSG_PEEK, if someone sends three bytes and nothing more, the socket will keep saying "ready" forever, and you will ignore it each time through the loop, leading to a form of Denial Of Service.
[/quote]
I will remove MSG_PEEK. That never occurred to me.


[quote name='hplus0603' timestamp='1318819269' post='4873299']Also, you don't seem to be reading the length again when you're in the PACKET state, unless the packet is defined to include the length itself, and the minimum legal length value is then 4. If that's the case, you SHOULD throw when you get a value < 4, else someone can DOS your server by sending four bytes of 0, which will cause you to go into an infinite loop of receiving 0-byte packets.
[/quote]
Yes, the packet includes the length itself. Again, that exploit did not occur to me. I will rewrite Clientsocket::main.


[quote name='hplus0603' timestamp='1318819269' post='4873299']The log messages you show cannot possibly have been generated by only the code that you posted, too, because there is a while() loop that doesn't mention "business" sockets, but the log message does talk about it.
[/quote]
The code is too big to post here. The messages that you do not see in the code are generated inside accept_client() and process_clients(). But the code of those two functions does not seem relevant to the incident, since I am setting the arguments of select() correctly and select() is not waking up. I could have a buffer overrun that occasionally corrupts Readfdset. I will add a log message with the contents of Readfdset.


Thanks for the feedback.
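
A log message with the contents of Readfdset could be produced by something like the sketch below. `describe_fd_set` is a hypothetical helper, not part of the posted server. It only uses the standard FD_ISSET macro to list which descriptors are set.

```cpp
#include <sys/select.h>
#include <cassert>
#include <string>

// Hypothetical logging helper: returns the descriptors that are set in
// `set`, from 0 up to and including max_fd, as a space-separated string.
// The set is taken by value, so the caller's copy is never modified.
std::string describe_fd_set(fd_set set, int max_fd) {
    std::string out;
    for (int fd = 0; fd <= max_fd && fd < FD_SETSIZE; ++fd) {
        if (FD_ISSET(fd, &set)) {
            if (!out.empty()) out += ' ';
            out += std::to_string(fd);
        }
    }
    return out;
}
```

Logging the master set just before select() and the returned copy just after it would show whether the client socket (fd 11 in the logs above) was actually being watched when the data failed to arrive.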
