Theory - Handling TCP data that is split on recv calls

Thanks for the advice. I agree 100% with the variable packet sizes. That's why I was thinking that 'chat' might have an ID of 50 (for example) and then a variable size, to avoid wasting bandwidth with a bunch of zeros.

Client-wise, I was also toying with the thought of using blocking sockets and having a separate thread that monitors incoming data. This way the main loop of the app will never block, but will check the network class on every loop to see if a complete set of data has come in.

The recv thread could set a private variable when there is data ready to be used by the main app, and once the main app has used the data, it could reset the variable (via a public function) to tell the class it is ready for the next data structure.

That way the recv thread can hang around all it likes (within the application rules etc.) without stopping the flow of the main program at all.

What are your thoughts on this?
Since I've not seen it mentioned yet in this thread, I'll throw out the obligatory suggestion: consider using boost::asio, ACE, POCO, or any other known and established library that is designed to take care of these things for you. I'd recommend looking into boost::asio first myself. Once you learn how to use libraries like those, you will rarely find the need to work at this low level again.
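
To give a flavor of what that buys you, here is a minimal sketch of an asio-style read loop. Everything here (the buffer size, the callback wiring) is illustrative rather than taken from this thread, and older Boost versions name io_context io_service:

#include <array>
#include <iostream>
#include <boost/asio.hpp>

using boost::asio::ip::tcp;

// asio owns the event loop; you just get called back with however many
// bytes arrived, and you feed them into your own framing buffer.
// (sock and buf must outlive the pending read.)
void start_read(tcp::socket& sock, std::array<char, 4096>& buf)
{
    sock.async_read_some(boost::asio::buffer(buf),
        [&sock, &buf](const boost::system::error_code& ec, std::size_t n) {
            if (ec) { std::cerr << ec.message() << "\n"; return; }
            // append buf[0..n) to your message-framing buffer here, then...
            start_read(sock, buf);   // ...queue up the next read
        });
}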

But if you still want to do it yourself, I would really start out with the simple design that hplus linked to in his other reply. The client would simply call select with a 0 timeout so it does not block. If there is data to receive, you can then begin pulling it out and processing any complete messages. Depending on how much traffic and how many messages take place in your game, you might not really need to separate the networking logic from the client thread.

They can coexist without any issues as long as you take some precautions in your design. Namely, you only check select at a fixed rate rather than every update cycle, as checking for data each loop is a waste of resources. Once you begin message processing, you keep track of processing time so you can bail out if too much time is being spent. On the next update cycle, you would continue processing messages.
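
For reference, the zero-timeout select check might look something like this (a Winsock-flavored sketch; error handling omitted):

#include <winsock2.h>

// Returns true if a recv on 'sock' would not block.
// The zero timeval makes select a pure poll, so the main loop never stalls.
bool socketHasData(SOCKET sock)
{
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(sock, &readSet);

    timeval timeout = { 0, 0 };          // poll: return immediately
    // the first parameter (nfds) is ignored by Winsock
    return select(0, &readSet, nullptr, nullptr, &timeout) == 1;
}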

That should be about it, really. You should be able to implement and use a single-threaded solution that will last you a good while. You can consider changing the approach if you determine that running the network logic in your main thread is causing real performance issues on the client. Usually, though, such issues are not related to the actual networking I/O as much as to how you deserialize and process the messages.

To get back to your question, I personally don't like using a synchronized variable to let the system know data is pending in this context. It's far too easy to come up with an implementation that suffers from a race condition or does not properly handle multiple events at the same time. Your actual implementation will vary based on how many producer threads and how many consumer threads you have. In the end, you still have to take a lock somewhere, unless you are using a lockless queue.

Because of this, I'd implement a solution like this on Windows (due to the way CRITICAL_SECTION works):

global message queue
global lock (critical section)

Network Thread

while connected
- locals: buffer, size, index, messages
- if we have room left to store messages (we never want the network thread to flood the client)
-- recv to buffer
-- perform protocol-specific logic to split buffer into messages
--- running this logic in a thread only gives benefits when there is overhead from
--- packet decryption or other expensive deserialization calls

- if we have messages
-- global lock (enter critical section)
--- for each message in messages
---- add message to global message queue
-- global unlock (leave critical section)

- sleep only if the message queue is full; the client thread is not consuming
-- fast enough, so let it catch up some

loop

Main Client Thread

while running
- everything but network stuff

- if ready to check for network events (say you check every 1/60 of a second; some games more, others less)
-- global lock (enter critical section)
--- copy global message queue to local var
--- clear global message queue
-- global unlock (leave critical section)

-- process all messages now, or queue them into a client message queue to
--- process over the next several update cycles in which network logic is not called

loop
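
In actual C++ that pseudocode might come out roughly like the following. This is only a sketch (Message, PublishMessages, and DrainMessages are made-up names), and it uses a std::vector swap in place of the fixed array described just below:

#include <windows.h>
#include <vector>

struct Message { /* opcode, payload, ... */ };

std::vector<Message> g_messageQueue;   // global message queue
CRITICAL_SECTION     g_queueLock;      // global lock; call
                                       // InitializeCriticalSection once at startup

// Network thread: hand off parsed messages to the shared queue.
void PublishMessages(std::vector<Message>& messages)
{
    EnterCriticalSection(&g_queueLock);
    for (const Message& m : messages)
        g_messageQueue.push_back(m);
    LeaveCriticalSection(&g_queueLock);
    messages.clear();
}

// Client thread: grab everything pending in one shot. The lock is held
// only long enough to swap the queue's internals out: no copying.
void DrainMessages(std::vector<Message>& local)
{
    EnterCriticalSection(&g_queueLock);
    local.swap(g_messageQueue);
    LeaveCriticalSection(&g_queueLock);
}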


But globals are bad!!1! Not always. In this case, you just want to be able to pass data from one thread to another, and the easiest, most efficient way with the lowest overhead (on Windows) is a design like this. Simple and straightforward gets the job done. Since the client loop only attempts to acquire the lock at a fixed rate, the maximum number of lock contentions is bounded by the inverse of that interval (i.e. imagine the threads happen to sync perfectly, so both contend for the lock each loop).

All of the CRITICAL_SECTION locks the network thread takes when adding messages have no effect on the client thread until the client thread tries to acquire the lock. By that time, the lock is held for a very short period, as the only operations taking place are pointer copies (assuming you are using a fixed-size array as the copying medium, which means you limit the producer thread to at most N stored messages at a time). Say your global packet queue is just a fixed-size array (or vector with a pre-allocated size) of 1024 elements or so, and you store an int for the size. The operations that take place under the lock are far too small to cause any client thread delays (no allocations are happening at this time!). I just threw out a number; you'd probably want to make it a bit smaller depending on the expected number of messages.

There are certainly other ways to do this, but then you are unnecessarily complicating what otherwise should be a very simple design. Other approaches include using PostThreadMessage to post the message objects to the client thread, using QueueUserAPC to invoke a function from the client thread context, attempting to create your own lockless queue, or making use of a library, like boost.

If you wanted something cross-platform, this approach would not work as efficiently, since it relies on CRITICAL_SECTION being so 'cheap' to use. It would still work with other locking mechanisms, but you have to be careful which one is chosen, as some carry very high overhead. It should be noted that these things are not premature optimizations; it is simply a matter of choosing the right tools. While the 1024 array might be borderline premature optimization, its other purpose is to let you gauge packet throughput, so you know if something is going wrong in either the network thread or your client thread. If each update cycle you have a lot of messages to process, it means you might need to allocate more time for the network processing logic. Alternatively, if the time between network logic runs in the main thread is greater than the rate you set, something else in your system is eating up more than its fair share of time.


My final advice would be to start simple and get that to work. A single threaded solution should be possible and more than enough to handle what you want to throw at it at this stage from what I have read. It also simplifies a lot of other things so if you can't get a viable solution with it, chances are you are doing something wrong (or perhaps even using the wrong protocol!). If you want to make use of more efficient networking methods right off the bat, then it is strongly recommended to use an existing library. There are tradeoffs to each approach, but I think you just need to get more familiar with each method to understand how they could help you out (or not help you at all). Multithreaded programming in C++ should not be taken lightly, so be careful before just jumping right in!
Excellent post.

I'll do a bit of playing over the next few days to see what I can come up with. I think I am at the point where I can come up with something workable and see how it goes.

Thanks again guys! :cool:
You may also want to check out my solution to the problem (including source code), which is very similar to what hplus0603 proposed in a different thread, but handles both sending and receiving for you using non-blocking TCP sockets.
Maciej Sawitus
my blog | my games
You cannot make any assumptions about how many recv calls will be needed and how those received blocks of data will be split up.
You should never wait for data except in a task that does nothing else; or otherwise use nonblocking IO.
Do not forget that your send calls may block too, unless you are using nonblocking send calls.
You cannot ever make any assumptions, however reasonable, about the buffering capacity of the channel or the time delays that may be involved in immediate delivery of the data.
That's TCP. It's the price you pay for guaranteed delivery.


---visit my game site http://www.boardspace.net - free online strategy games

The way I do this is to update a buffer whenever recv returns something (so recv is called in a loop somewhere; 10 times a second works OK for me at the moment).
The first 2 bytes of the buffer are always meant to be an unsigned short representing an op-code. Each op-code relates to a command (such as login, update health, spawn zoned-in player, etc.).

Every command has either a known exact length or a variable length. If it has a variable length, the 2 bytes after the op-code describe it.

so:
[ushort op-code][ushort length][bytes for message data]

or for a message of known length:
[ushort op-code][exact number of bytes for message data]

the buffer could be something like this:
[op][len][arguments][op][arguments][op][arguments][op][len][arguments][op][op][len][arguments]

(nothing stopping you from making messages without any arguments; all that is needed is the op-code)

The only way the buffer can become corrupted (i.e. the first 2 bytes are not in fact an op-code, but any other 2 bytes) is if there is bad programming somewhere, or the packets leaving the client machine are being manipulated (so a custom client application, a man in the middle, etc.). Input validation should result in notification of a corrupted buffer; my applications' buffers only get corrupted if I purposely make them do so to test the robustness of my code. My code always detects the corrupted buffer, but sometimes there are artifacts (i.e. the first 2 bytes happen to look like a [moveto] message when in fact they were meant to be part of something completely different, so the bytes after them get parsed as the moveto location. At the moment, my clients' input for some things is instantly trusted, so this can cause the player to teleport to an inaccessible location, rather than the server noticing that the requested location is impossible!).


With this format, it can always be known how many bytes are expected for a message, so the char array buffer can be clipped one message at a time:

First, decide how many bytes following the first 2 are part of the message (e.g. if login is known to send an unsigned long playerID and a 32-byte md5 sum).
Second, check whether the buffer is at least this many bytes long.
Third, extract your data and trim it off the start of the buffer; the 2 bytes now at the start of the buffer should be the op-code for the next waiting message. (A sketch of this loop follows below.)
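
A sketch of that clipping loop, using the mixed fixed/variable framing from above; knownLength and handleMessage are hypothetical helpers, and byte-order handling is omitted for brevity:

#include <cstdint>
#include <cstring>
#include <vector>

std::vector<uint8_t> buffer;   // bytes accumulated from recv calls

// Hypothetical lookup: fixed payload sizes for known op-codes,
// -1 means the op-code carries a [ushort length] field instead.
int knownLength(uint16_t opcode)
{
    switch (opcode) {
    case 1:  return 36;   // e.g. login: 4-byte playerID + 32-byte md5
    case 2:  return 0;    // e.g. an op-code-only message
    default: return -1;   // variable length
    }
}

// Hypothetical dispatcher; in real code, a switch or handler table.
void handleMessage(uint16_t, const uint8_t*, int) { /* dispatch here */ }

// Clips one complete message off the front of the buffer.
// Returns false when no complete message is available yet.
bool extractNextMessage()
{
    if (buffer.size() < 2) return false;          // need the op-code first
    uint16_t opcode;
    std::memcpy(&opcode, buffer.data(), 2);

    size_t headerLen = 2;
    int payloadLen = knownLength(opcode);
    if (payloadLen < 0) {                         // variable-length message
        if (buffer.size() < 4) return false;      // wait for the [len] field
        uint16_t len;
        std::memcpy(&len, buffer.data() + 2, 2);
        payloadLen = len;
        headerLen = 4;
    }

    if (buffer.size() < headerLen + payloadLen)   // message incomplete
        return false;

    handleMessage(opcode, buffer.data() + headerLen, payloadLen);
    buffer.erase(buffer.begin(), buffer.begin() + headerLen + payloadLen);
    return true;
}

You would call it in a loop (while (extractNextMessage());) after each recv, capping the iterations if you adopt the "max messages in one go" idea mentioned below.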

You may want to incorporate things like the maximum number of messages to be resolved in one go, and the maximum buffer size to be expected from a well-behaved device on the other end (if you discover your game code only leaves about 200 bytes in the server's buffer on average, cap it at 1500 or something; if a connection has more than 1500 bytes in its buffer for any length of time, kill it).

I also like to have a message that denotes how many messages from that device have been handled, so the client knows the server is keeping up with its message output; this works if you count messages out and in on both sides of the connection.

One problem I find doing things this way is that it is hard to build a proper object-oriented system to handle messages; I tend to resort to hard-coding most messages with switch statements... messy. Complex things, like spawning new players that have just logged in, send the argument bytes to a deserialise method on my object. I don't have a command/event system yet, but one would help make this easier.
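
One sketch of a way out of the switch statements is a table mapping op-codes to handler callbacks (all names here are illustrative, not from this thread):

#include <cstdint>
#include <functional>
#include <unordered_map>

using Handler = std::function<void(const uint8_t* data, int len)>;
std::unordered_map<uint16_t, Handler> g_handlers;

// Each system registers for the op-codes it owns, instead of one
// giant switch that has to know about everything.
void registerHandler(uint16_t opcode, Handler h)
{
    g_handlers[opcode] = std::move(h);
}

void handleMessage(uint16_t opcode, const uint8_t* data, int len)
{
    auto it = g_handlers.find(opcode);
    if (it != g_handlers.end())
        it->second(data, len);
    // else: unknown op-code, treat as a corrupted buffer
}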

[ushort op-code][ushort length][bytes for message data]

or for a message of known length:
[ushort op-code][exact number of bytes for message data]


I find that it's more robust to always send the length. This allows you to do things like record/replay, proxying, and better structuring of the network stack.


Specifically, you might want your network stack to have a layer that lets you say "get next complete packet if there is one." If the opcode tells you whether the data is known length or not, then that layer actually has to understand what the opcodes mean. This is a high degree of coupling that's generally not very robust. If you always send the opcode and length in a known format, then that layer can give you a complete packet, without needing to know anything about what might be in that packet, or what opcodes "mean."
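
As an illustration of that layering, here is a sketch under the assumption that every message is [ushort opcode][ushort length][payload]; the class name is invented:

#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

struct Packet {
    uint16_t opcode;
    std::vector<uint8_t> payload;
};

// The framing layer understands only the wire format, never what any
// opcode means, so it stays fully decoupled from game logic.
class PacketStream {
public:
    void append(const uint8_t* data, size_t n)
    {
        buf_.insert(buf_.end(), data, data + n);
    }

    // "Get next complete packet if there is one."
    std::optional<Packet> next()
    {
        if (buf_.size() < 4) return std::nullopt;
        uint16_t op, len;
        std::memcpy(&op,  &buf_[0], 2);
        std::memcpy(&len, &buf_[2], 2);
        if (buf_.size() < 4u + len) return std::nullopt;

        Packet p{ op, { buf_.begin() + 4, buf_.begin() + 4 + len } };
        buf_.erase(buf_.begin(), buf_.begin() + 4 + len);
        return p;
    }

private:
    std::vector<uint8_t> buf_;
};
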
enum Bool { True, False, FileNotFound };

If the opcode tells you whether the data is known length or not, then that layer actually has to understand what the opcodes mean. This is a high degree of coupling that's generally not very robust. If you always send the opcode and length in a known format, then that layer can give you a complete packet, without needing to know anything about what might be in that packet, or what opcodes "mean."


That makes sense. I have been struggling to find a proper way to form my messaging system into a true object-oriented way of working: I knew that programming a way to "get next complete packet if there is one" would be awkward; if all messages have a size it will be cleaner and simpler.

That makes sense. I have been struggling to find a proper way to form my messaging system into a true object-oriented way of working: I knew that programming a way to "get next complete packet if there is one" would be awkward; if all messages have a size it will be cleaner and simpler.


Take a look at this thread and this thread whenever you get a chance.

The main concepts discussed in those threads allow you to keep an OOP driven design while still allowing you to implement the practical aspects of network communication. You serialize your objects to a stream first. Once that is done, you now have the length of the payload. You can then just build a second 'packet' for the header data (opcode, size, anything else) and send the header followed by the payload. If you wanted to implement encryption or compression, you just have to work with an intermediate buffer, but everything else stays the same. When you receive data, you process the header first then the payload. You pass the payload to the deserialize function and out comes your object! It's a pretty nice design, imo, and it works well.
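
A rough sketch of that send path, assuming a hypothetical Player object and serialize() helper that appends fields to a byte buffer (Winsock send shown; a real implementation must loop on partial sends):

#include <cstdint>
#include <cstring>
#include <vector>
#include <winsock2.h>

struct Player { uint32_t id; uint16_t hp; };   // hypothetical game object

// Hypothetical per-type serializer: append fields to the byte stream.
void serialize(const Player& p, std::vector<uint8_t>& out)
{
    const uint8_t* id = (const uint8_t*)&p.id;
    out.insert(out.end(), id, id + sizeof p.id);
    const uint8_t* hp = (const uint8_t*)&p.hp;
    out.insert(out.end(), hp, hp + sizeof p.hp);
}

bool sendObject(SOCKET sock, uint16_t opcode, const Player& obj)
{
    std::vector<uint8_t> payload;
    serialize(obj, payload);                       // payload first...

    uint16_t length = (uint16_t)payload.size();    // ...which yields the size
    uint8_t header[4];
    std::memcpy(header + 0, &opcode, 2);           // [opcode][length] header
    std::memcpy(header + 2, &length, 2);

    // header then payload; see the note above about partial sends
    return send(sock, (const char*)header, 4, 0) == 4
        && send(sock, (const char*)payload.data(), (int)payload.size(), 0)
               == (int)payload.size();
}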

The downside to the approach is the same as with any generic de/serialization code: wasted bytes. Since you control the actual types that are written, you might consider optimizations over the longer run to reduce bandwidth costs, after you carefully measure bandwidth usage. For example, rather than write a size_t for a string length (which is never recommended anyway, since the size can change between 32-bit and 64-bit architectures), you write a UInt16 and then check to make sure the size is within an acceptable range.
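
For instance, the string case might be sketched like this (writeString and the little-endian layout are illustrative choices):

#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Length-prefix with a uint16_t rather than size_t, so the wire format
// is identical between 32-bit and 64-bit builds.
void writeString(std::vector<uint8_t>& out, const std::string& s)
{
    if (s.size() > UINT16_MAX)
        throw std::length_error("string too long for wire format");

    uint16_t len = (uint16_t)s.size();
    out.push_back((uint8_t)(len & 0xFF));   // little-endian length prefix
    out.push_back((uint8_t)(len >> 8));
    out.insert(out.end(), s.begin(), s.end());
}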

