Questions about some Source Engine 2007 code.

6 comments, last by frob 5 years, 1 month ago

I saw some confusing things in the Source Engine 2007 code and was wondering if anyone knows why they are in here.

https://github.com/VSES/SourceEngine2007/blob/master/se2007/engine/cl_rcon.cpp#L559-L563

Why are they verifying that the read length is at least 4 bytes?

If a packet was sent with a length of 20 bytes and only the first 18 bytes were transmitted on the first send() attempt, wouldn't this class's recv path then ignore those last 2 bytes until another packet with at least 2 more bytes arrives (because 2 + 2 = 4)?

 

https://github.com/VSES/SourceEngine2007/blob/master/se2007/engine/cl_rcon.cpp#L568-L572

Why are they allocating a minimum recv buffer of 1024 bytes instead of just the read length on the stream, and attempting to read at least 1024 bytes instead of readLen?


Look down the file a bit, and you'll see that after successfully reading data they call ParseReceivedData().

 

Inside the parsing code you can see that every packet starts with an int for the size of the packet, followed by a series of details beginning with a request ID int, a command ID int, and then the details of the command.

Without looking deeper into how they manage the buffer, that suggests that if they don't have enough data to actually constitute a packet they don't bother processing it. Processing requires memory allocations and releases and hardware access, which are time consuming.

It could be an issue if their buffering or their commands happen to move fewer than 4 bytes and then stall for a time, but in most games there is a constant flow of data, so that isn't a problem. Data will continue to flow continuously, so it would likely reach the four-byte threshold before the next update is run. If the buffering were configured that way then it might be better to process any amount of data that has been received in case they were mid-packet before, but that's a bigger question about the game's performance cost for waiting on data and the time between updates.
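To make that concrete, here is a rough, hypothetical sketch of such a guard (made-up names, not the engine's actual code): with packets laid out as [int size][int request ID][int command ID][command details], fewer than four buffered bytes cannot even tell you the packet size, so there is nothing worth allocating for or parsing yet.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical check: is there at least one complete [size][payload] packet buffered?
bool HaveCompletePacket(const std::vector<char>& buffered)
{
    if (buffered.size() < sizeof(int32_t))
        return false;                              // no complete size field yet; wait for the next update

    int32_t packetSize = 0;                        // number of bytes that follow the size field
    std::memcpy(&packetSize, buffered.data(), sizeof(packetSize));

    return packetSize >= 0 &&
           buffered.size() >= sizeof(packetSize) + static_cast<size_t>(packetSize);
}
```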

 

As for the minimum buffer size of 1024 bytes, that is probably because time passes and because the network hardware runs separately from the program. They query how much data is ready at one moment, but the actual recv() call is at least six lines and potentially many loop iterations later, and the gap includes a memory allocation and the loop that processes data buffers. Many things may have happened in between, including the OS using that time to swap to different processes, so a significant amount of time may have passed between the query and the read.

Rather than only reading the bytes that had arrived by the time of the earlier "is there any data?" query, they use a buffer of at least 1024 bytes. Thus even if only 32 bytes were available during the initial query, if more bytes are available by the time recv() is called it will pull in up to 1024 bytes, rather than reading only part of the data and leaving the rest queued until the next update runs. They might have benefited from a second test inside the loop to update how many bytes are available to read, but again, that's a bigger design question about the cost of waiting for data and the time between updates.
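For illustration, here is a sketch of that pattern using Winsock-style calls and made-up names (not a copy of the engine code): query how much is pending, bail out below the 4-byte threshold, then hand recv() a buffer of at least 1024 bytes so it can also pick up anything that arrived after the query.

```cpp
#include <winsock2.h>
#include <algorithm>
#include <vector>

// Hypothetical per-update pump: read whatever has arrived into an oversized buffer.
void PumpSocket(SOCKET sock)
{
    u_long pending = 0;
    if (ioctlsocket(sock, FIONREAD, &pending) != 0 || pending < sizeof(int))
        return;                                   // nothing useful yet; try again next update

    std::vector<char> buf(std::max<u_long>(1024, pending));
    const int received = recv(sock, buf.data(), static_cast<int>(buf.size()), 0);
    if (received > 0)
    {
        // Hand `received` bytes to the packet parser. Note that `received` can exceed
        // `pending`, because more data may arrive between the query and the recv() call.
    }
}
```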

Yes, servers generally drop packets, or sessions, or clients, that do not successfully send a complete, self-contained initiation header to be read. Over TCP that only happens with faulty client software.

It is totally possible, within the TCP spec, for a packet to be split across multiple IP datagrams, or for multiple packets to be combined into a single datagram. When such a datagram makes it to the other end, whatever data is in that datagram (which "glues on" to the end of what was previously received) will be made available to the reading program through the recv() call.

This is why recv() on a socket is different from read() on a regular file -- read() will keep reading until it fills the buffer, unless it knows the buffer can never be filled (such as at end of file), whereas recv() will return "whatever is there right now", even if it's less than you asked for, and even if more may be available later.

This means that a network receiver thread for TCP that wants to decode individual length-delimited "packets" needs to read whatever it can into some kind of cyclic buffer, and then, once recv() has returned whatever it's going to return, check repeatedly whether it has enough data to decode a length field; if so, decode the length field and see whether there's enough data in the receive buffer for the full message; if so, remove that message from the receive buffer, and then check again whether another message is also there. Any other behavior (and any assumptions about particular pieces of data arriving together) is a bug that will cause unwanted behavior under varying network conditions (lossy WiFi? cell phone? dial-up? super-fast optical fiber wired to your datacenter? squirrels gnawing on the DSL wire? lightning in Ohio? satellite connection from a sailboat in the Pacific ocean? Who knows!)
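Here is a sketch of that receive-side decoding, assuming 4-byte native-endian length prefixes and using a growable std::vector in place of a true cyclic buffer (all names are made up):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

class StreamDecoder
{
public:
    // Append whatever recv() returned this time, however little that may be.
    void OnBytesReceived(const char* data, size_t len)
    {
        m_buf.insert(m_buf.end(), data, data + len);
    }

    // Hand every complete length-delimited message to `handle`, keeping any partial tail.
    template <typename Handler>
    void DrainMessages(Handler&& handle)
    {
        size_t offset = 0;
        while (m_buf.size() - offset >= sizeof(int32_t))
        {
            int32_t msgLen = 0;
            std::memcpy(&msgLen, m_buf.data() + offset, sizeof(msgLen));
            if (msgLen < 0 ||
                m_buf.size() - offset - sizeof(msgLen) < static_cast<size_t>(msgLen))
                break;                            // length field is here, the body isn't yet
            handle(m_buf.data() + offset + sizeof(msgLen), static_cast<size_t>(msgLen));
            offset += sizeof(msgLen) + static_cast<size_t>(msgLen);
        }
        m_buf.erase(m_buf.begin(), m_buf.begin() + offset);  // keep the incomplete remainder
    }

private:
    std::vector<char> m_buf;
};
```

Feed OnBytesReceived() with each recv() result and then call DrainMessages() with a handler that parses the request ID, command ID, and payload; anything left over simply waits in the buffer for the next recv().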

enum Bool { True, False, FileNotFound };

But there are heavy-duty HTTP servers that do not bother waiting if the initial read does not contain enough header information to establish the request, and instead send response code 400 Bad Request. They cannot afford to do otherwise, because they would be prone to flooding and more easily exposed to low-scale DDoS attacks if they tried to buffer data from unknown requests/connections and let it stack up.

In other words, it always depends, right?

there are heavy-duty HTTP servers that do not bother waiting if the initial read does not contain enough header information to establish the request, and instead send response code 400 Bad Request.

All kinds of people violate all kinds of specifications in a variety of ways for a variety of reasons. In 99% of cases, the violations are actually misguided. But I'm not sure the case you're describing is a specification violation.

The HTTP server time-out behavior I know of and have seen doesn't require "a single datagram" to contain all the headers; rather they impose a timeout limit on requests to complete, and if the necessary data isn't available by that deadline, the request times out and gets an error back. This is necessary to avoid slow or malicious clients blocking server resources permanently. Because the TCP spec is not violated (multiple datagrams that arrive are glued together and served in a stream) and the HTTP spec is silent on the behavior of timing, this could be considered conformant behavior.

When it comes to games, you similarly want to kick clients that haven't sent any messages for many seconds, but the point I'm making above is that a single call to recv() will not necessarily return a "well-formed" packet; you need to glue the results of multiple calls to recv() together to re-form the stream that the sender sent. And because network transmission is what it is, those multiple calls may be separated by some amount of time.
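As a small, hedged sketch of that idle-kick policy (names invented here, not from the engine): record the time of the last successful recv() for each client and drop anyone who stays silent past some deadline.

```cpp
#include <chrono>

struct ClientConn
{
    std::chrono::steady_clock::time_point lastRecvTime = std::chrono::steady_clock::now();
};

// Call whenever recv() returns data for this client.
inline void NoteActivity(ClientConn& c)
{
    c.lastRecvTime = std::chrono::steady_clock::now();
}

// Call once per server update; returns true if the client should be dropped.
inline bool ShouldKick(const ClientConn& c,
                       std::chrono::seconds timeout = std::chrono::seconds(30))
{
    return std::chrono::steady_clock::now() - c.lastRecvTime > timeout;
}
```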

enum Bool { True, False, FileNotFound };
1 hour ago, hplus0603 said:

and because network transmission is what it is, those multiple calls may be separated by some amount of time.

Which gets us back to the original code question. There is some amount of time between the query for how much data has been received and the request to fill the buffer with that data.

The buffer is extra large to accommodate any data that may have been received during that time window. On a modern computer you have no idea whether the time between two lines of code is best measured in nanoseconds or milliseconds, or, in the case of power management, potentially even years between two adjacent instructions...

Ultimately you have little or no say over what the hardware will do between here and there. There is a gap, and things happen in gaps. You don't know when data will arrive, or how much will arrive and be available at any given time. So grab everything that is available, process everything you've received that you can, and hold the rest for the next update.

