[solved] Calling recv() in a loop

Started by
3 comments, last by Mitchell314 10 years, 4 months ago

When reading a TCP socket for arbitrarily sized packets, is it safe to keep calling recv() until it returns a negative number? For simplicity's sake, assume no other error is raised: a return value of 0 means the connection was closed, and -1 means there's no data left to read at the moment.

For example, would the following code properly read all of the socket's contents? The other end of the connection sends data in the format "<text> \n <text> \n <text> \n .... <text> \n"


/* Make believe string structure */
struct resizableString string;

int bufferSize = 100;
int socket = connectToOtherHost();

while (1) {
    char *pBuffer = calloc(1, bufferSize);
    int bytesRead = recv(socket, pBuffer, bufferSize, 0);

    if (bytesRead == 0) {
        printf("Client closed connection\n");

    } else if (bytesRead > 0) {
        /* Appends fixed-length buffer to a string */
        appendBufferToString(string, pBuffer, bufferSize);

    } else {
        /* Done reading */
        free(pBuffer);
        break;
    }
}
parsePacket(&string);
First, if you loop more than once, then you will leak a pBuffer per iteration except for the last.

Second, a recv() return value of 0 means the remote side performed an orderly close; -1 means an error, and some errors (such as a connection reset) also mean the connection is gone. On a non-blocking socket, -1 with errno set to EWOULDBLOCK just means no data is available yet, so treating every -1 as "done reading" conflates errors with an empty buffer.

Third, recv() will block if there is no data waiting in the receive buffer. And, if a client crashes out or somehow becomes "non-nicely" disconnected, that means recv() will block forever (or until the connection times out, which will typically take hours.)

To know whether recv() will block or not, use the select() call. Generally, a networked program will replace whatever the main loop top-level thing is with select(). If there is a delay to wait for simulating or rendering, for example, then that delay is done using select(). If everything just runs as fast as possible, call select() with a 0 timeout. When data is available to be read, call recv() once on that socket, into whatever buffer you're currently working on filling. You may need to call recv() more than once to fill the buffer you want. Also, you may receive the end of one buffer, and the beginning of the next buffer, in the same call to recv().
enum Bool { True, False, FileNotFound };

Ah thanks. That code was made up for this thread for simplicity's sake, the actual code does use select() and handles the memory buffers. The strategy I'm using is to keep a linked list of buffers per connection, where each recv() reads data into a new buffer, and if any data is written it is added to the end of the linked list. The buffers are freed when the parser processes the list later. According to the man pages for recv(2):

All three routines return the length of the message on successful completion. If a message is too long to fit in the supplied buffer, excess bytes may be discarded depending on the type of socket the message is received from.

But it doesn't say which. And since I've been having networking-related bugs, I want to make sure that I'm not accidentally dropping data while not being susceptible to buffer overflow bugs/exploits. Blocking isn't too much of an issue; the program is supposed to be a daemon anyways. The program is connecting to game server instances, which may stream small or large amounts of data, and connecting to admin tools, which stream short commands. However, there is no set packet size, so I want to know if I can play it safe by reading from a socket into a growing list of buffers until it is dry.


But it doesn't say which. And since I've been having networking-related bugs, I want to make sure that I'm not accidentally dropping data while not being susceptible to buffer overflow bugs/exploits. Blocking isn't too much of an issue; the program is supposed to be a daemon anyways. The program is connecting to game server instances, which may stream small or large amounts of data, and connecting to admin tools, which stream short commands. However, there is no set packet size, so I want to know if I can play it safe by reading from a socket into a growing list of buffers until it is dry.

TCP (SOCK_STREAM) never drops data from the internal buffer, and will only give you what you request, leaving the rest available for subsequent calls to recv(). (Internally, the receiver advertises a shrinking window so the remote host slows down when you can't keep up, and lost segments are retransmitted, so nothing is silently discarded.) So in that case you don't need to worry about losing bytes.

What you do need to worry about is your stream getting fragmented, so you need to maintain some state during the connection, e.g. "ok, so I've received 50 bytes of this 100-byte string, I still need 50 bytes...", because simply checking recv()'s return value as an indicator that a send() call on the remote host has arrived whole is unreliable. TCP is stream-oriented, so it's possible that a remote host sends "hello" and then "test", and you, in your recv() loop, receive first "he", then "l", then "lote" and finally "st". (This is a simplification - in reality this exact scenario rarely happens, because writes that small usually arrive in a single segment, but it occurs all the time for longer streams.) You cannot use recv()'s return value as an indicator that all data has made it to you, unless the remote host immediately closes afterwards (and even then I am not sure - someone will have to confirm this, but I believe it's possible for the socket to close while data is still in transit to you, so you need some synchronization).

On the other hand UDP (SOCK_DGRAM) gives you no guarantees. The packet may have been discarded anywhere between source and destination, and that includes your system's internal buffer if it happens to run out of space. That's part of the deal, and it is a reality you will have to accept if you do UDP networking.

“If I understand the standard right it is legal and safe to do this but the resulting value could be anything.”

Thanks, that clears up my question. The protocols I'm using are just based around arbitrary-length string messages separated by delimiters; message size is not communicated by any process. I've been burnt quite badly in the past by assuming that TCP preserves packet boundaries, hence the rotating buffers. :P

I've just never seen anybody else do it that way.

