
UDP network layer must-haves?


Old topic!
Guest, the last post of this topic is over 60 days old and at this point you may not reply in this topic. If you wish to continue this conversation start a new topic.

33 replies to this topic

#1 lerno   Members   -  Reputation: 209


Posted 30 July 2013 - 01:52 PM

I've been trying to gather requirements for a UDP network protocol with virtual connections in a client-server scenario.

 

This is what I have so far:

  1. Glenn Fiedler's series: Use a protocol prefix to filter packets without the prefix (no real motivation given)
  2. Quake 3: Handle the case where the player's port may randomly change due to NAT behaviour. I asked on serverfault.com and heard that this may happen in cases other than just old NATs. I notice this is in Enet as well.
  3. Packet sequencing (obvious)
  4. Acks (implicit or explicit), some protocols use timeouts as well.
  5. Some sort of bandwidth / congestion handling (most implementations have this)
  6. Fragment large packets

What's missing?
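For illustration, items 1-4 above could map onto a small fixed header carried in every packet. This is just a sketch in Python; the field widths and the prefix value are my own assumptions, not anything from the sources cited:

```python
import struct

# Minimal per-packet header covering items 1-4: protocol prefix (item 1),
# sequence number (item 3), and an ack plus ack bitfield (item 4).
# Field sizes are illustrative assumptions, not a standard layout.
HEADER = struct.Struct("!IHHI")  # prefix, sequence, ack, ack_bits
PROTOCOL_PREFIX = 0xBADC0FFE     # arbitrary example value

def pack_header(sequence, ack, ack_bits):
    return HEADER.pack(PROTOCOL_PREFIX, sequence & 0xFFFF, ack & 0xFFFF, ack_bits)

def unpack_header(datagram):
    """Returns (sequence, ack, ack_bits), or None if the packet should be dropped."""
    if len(datagram) < HEADER.size:
        return None
    prefix, sequence, ack, ack_bits = HEADER.unpack_from(datagram)
    if prefix != PROTOCOL_PREFIX:
        return None  # item 1: silently filter packets without the prefix
    return sequence, ack, ack_bits
```

The ack bitfield is the usual trick for acking the 32 packets preceding `ack` in one go, so acks survive packet loss without explicit retransmission of the acks themselves.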




#2 hplus0603   Moderators   -  Reputation: 5309


Posted 30 July 2013 - 03:08 PM

7. Good statistics/metrics from the running system, and ideally reported back from the clients for sampling


enum Bool { True, False, FileNotFound };

#3 lerno   Members   -  Reputation: 209


Posted 30 July 2013 - 03:18 PM

7. Good statistics/metrics from the running system, and ideally reported back from the clients for sampling

 

Anything different from a TCP/IP implementation, aside from monitoring packet loss?



#4 hplus0603   Moderators   -  Reputation: 5309


Posted 30 July 2013 - 08:27 PM

You need more than packet loss to prove you're getting any value out of UDP, I would assume.

 

Being able to actually track down cases where NAT punch-through doesn't work right, or users try to fake connections, or whatever, is also pretty useful.

 

Actually, I'm not seeing "NAT introduction and punch-through" on the list. That's probably expected from a commercial network library these days.


enum Bool { True, False, FileNotFound };

#5 Sik_the_hedgehog   Crossbones+   -  Reputation: 1748


Posted 30 July 2013 - 10:15 PM

The problem is that NAT punchthrough requires an external server as far as I know (you punch the hole in the NAT by trying to connect to that server). If it's for direct connections between players, you're most likely screwed if they aren't using a VPN or something like that.


Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

#6 hplus0603   Moderators   -  Reputation: 5309


Posted 30 July 2013 - 10:44 PM

The problem is that NAT punchthrough requires an external server as far as I know


Yes, it does! Any UDP networking library that doesn't actually provide at least some support for it is unlikely to be really all that useful these days.
The support could be as simple as "here's the function to call, and here's a sample server you can run on Amazon EC2 or Interserver or whatever, and punch-through will be yours!"
As long as it's actually built into the library.
enum Bool { True, False, FileNotFound };

#7 lerno   Members   -  Reputation: 209


Posted 31 July 2013 - 02:05 AM

Well, I deliberately excluded it as I didn't think it necessary for a strict client-server protocol where the server uses a static IP. But it's useful to mention.

 

In regards to statistics - both TCP and UDP would measure some sort of latency, so that would not be unique to UDP.



#8 samoth   Crossbones+   -  Reputation: 4783


Posted 31 July 2013 - 03:29 AM

I'm somewhat unsure about what (1) is good for.

 

In order to discriminate non-malicious UDP datagrams from some different protocol, you would, in my opinion, just use a different destination port number; that's by all means good enough. How many different UDP protocols do you have in use on your server anyway? Surely not 2,000 or 3,000 of them all at the same time -- it should be quite possible to use a port number (or a range of port numbers) that doesn't conflict.

On the other hand, prefixing a well-known 32 bit number isn't a very serious challenge for someone maliciously sending fake datagrams.

 

If anything, I'd negotiate a random per-connection ID and use that to quickly filter out the most stupid attackers. It's still no big challenge for someone who really means to send you fake packets, but at least it isn't totally trivial to circumvent, and it's more or less a no-op to verify (so it's a good initial check to comb out noise).
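A minimal sketch of that random per-connection ID idea (all names and the 4-byte field size are my own, purely for illustration):

```python
import os
import struct

# Sketch: negotiate a random per-connection ID at connect time, then use it
# as a near-free first filter on incoming datagrams. Illustrative only.
connections = {}  # (ip, port) -> connection_id

def new_connection_id():
    return struct.unpack("!I", os.urandom(4))[0]  # cryptographically random 32 bits

def accept(addr):
    cid = new_connection_id()
    connections[addr] = cid
    return cid  # would be sent back to the client in the connect response

def check(addr, datagram):
    """Cheap initial check: compare the first 4 bytes to the stored ID."""
    cid = connections.get(addr)
    if cid is None or len(datagram) < 4:
        return False
    return struct.unpack_from("!I", datagram)[0] == cid
```

As noted above, this won't stop an attacker who can sniff the traffic, but it combs out noise and the most trivial spoofing at almost no cost per packet.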

 

About (6), I'm also not sure. You primarily use UDP to have low latency. If you mean to send bulk data, you use TCP (easier, more reliable, and exactly as fast as UDP). Insofar, "large data" is somewhat contradictory. Large data cannot have low latency by definition.

I would, on the contrary, require that individual messages within datagrams have a size no larger than 255 bytes (serialization layer). This allows you to length-prefix them with a single byte, which is convenient and also prevents some denial of service attacks. If someone really needs to send several kilobytes or megabytes, they can still split up data at a higher level and send a few dozen/hundred messages, which will go into many datagrams.
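That single-byte length-prefix scheme might look like this (a sketch of the idea above, not anyone's actual serialization layer):

```python
def pack_messages(messages):
    """Concatenate messages, each prefixed by a single length byte (max 255)."""
    out = bytearray()
    for msg in messages:
        if len(msg) > 255:
            raise ValueError("message exceeds 255 bytes; split at a higher level")
        out.append(len(msg))
        out += msg
    return bytes(out)

def unpack_messages(payload):
    """Split a datagram payload back into its length-prefixed messages."""
    messages, i = [], 0
    while i < len(payload):
        n = payload[i]
        i += 1
        if i + n > len(payload):
            raise ValueError("truncated message")  # cheap guard against malformed input
        messages.append(payload[i:i + n])
        i += n
    return messages
```

The bounds check in `unpack_messages` is where the denial-of-service protection mentioned above comes from: a length byte can never make the parser read past the datagram.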

 

Also, UDP/IP already supports fragmenting natively, in case one of your datagrams should really be too big.

 

(7) as proposed by hplus0603 is harder to get right than it sounds, but it is equally important. Nothing stinks more than "network that doesn't work" when nobody can tell for sure what exactly doesn't work or why. You really really really want good metrics when there are problems, both to make sure your software works as expected (during beta and stress testing) and for being able to tell a user who is complaining whether it's a problem at their end (and what they can do).


Edited by samoth, 31 July 2013 - 03:42 AM.


#9 lerno   Members   -  Reputation: 209


Posted 31 July 2013 - 03:36 AM

I'm somewhat unsure about what (1) is good for.

 

Me too, and like you say, it appears trivial to circumvent: if you're mounting a DDoS, you've likely already used Wireshark or similar to determine what a valid payload looks like. That said, Glenn Fiedler seems to know what he's talking about in regards to UDP, so I wasn't about to dismiss it outright. It's not really clear from his articles why he put it there.

 

Any suggestions on metrics?



#10 lerno   Members   -  Reputation: 209


Posted 02 August 2013 - 01:34 PM


About (6), I'm also not sure. You primarily use UDP to have low latency. If you mean to send bulk data, you use TCP (easier, more reliable, and exactly as fast as UDP). Insofar, "large data" is somewhat contradictory. Large data cannot have low latency by definition.
I would, on the contrary, require that individual messages within datagrams have a size no larger than 255 bytes (serialization layer). This allows you to length-prefix them with a single byte, which is convenient and also prevents some denial of service attacks. If someone really needs to send several kilobytes or megabytes, they can still split up data at a higher level and send a few dozen/hundred messages, which will go into many datagrams.

 

Both the Q3 networking code and Enet have their own fragmentation code, fragmenting things below an arbitrary, configurable MTU guess. Isn't the advantage that if you do your own fragmentation, you can be fairly sure (given that you've taken care to select a good MTU) there won't be any unnecessary defragmentation->fragmentation happening anywhere except at the final networking layer? And if each fragment is correctly tagged, it might be possible to avoid wasting time waiting for the remaining fragments of an out-of-date message.

 

Maybe there are other reasons as well.



#11 hplus0603   Moderators   -  Reputation: 5309


Posted 02 August 2013 - 10:26 PM

For metrics, you want various kind of parameters for gross trouble-shooting:

- customer class (if you have it)

- session state (not-established, negotiating, lobby, game, etc)

- size of packet

- number of payload messages per packet

- payload message types

- packet direction (to server or from server)

- number of dropped packets detected

- number of duplicate packets detected

- number of reordered packets detected

- measured round-trip latency

- number of malformed packets

 

In the best of worlds, you chuck all of this at some reporting infrastructure, and generate an online cube where you can slice your traffic across arbitrary dimensions (each of the things above classifies packets along some dimension.) This doesn't necessarily need to be real-time, because it's useful for finding things you didn't know about your game. Each individual dimension could separately be real-time, and the drill-down would be relegated to a baked cube, for example.

At that point, you can get reports like "number of packets that are re-ordered, containing the 'fire weapon' payload."
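One cheap way to start collecting those dimensions is a flat counter keyed by (dimension, value) pairs, which can later be fed into whatever reporting cube you build. A sketch, with dimension names that are purely illustrative:

```python
from collections import Counter

# Sketch of per-dimension packet counters feeding the kind of sliceable
# reporting described above. Dimension and parameter names are illustrative.
stats = Counter()

def record_packet(direction, session_state, message_type, size, dropped=False):
    stats[("direction", direction)] += 1      # to server / from server
    stats[("state", session_state)] += 1      # lobby, game, negotiating, ...
    stats[("type", message_type)] += 1        # payload message type
    stats[("size_bucket", size // 256)] += 1  # coarse packet-size histogram
    if dropped:
        stats[("dropped", direction)] += 1
```

Counters like these are cheap enough to leave on in production, which matters for the "customer on the line" scenario described below: you can't turn on metrics after the fact.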

 

Now, there's a second level of diagnosis, where you can get this data sliced, in real-time, based on specific identifiers. Specific IP. Specific game instance. Specific player. Random sampling. Etc. Being able to have a customer on the line and turn on tracing of that particular customer's traffic is super helpful while debugging.

 

Another useful feature is the ability to capture and play back traffic, both for traffic analysis and for debugging the system itself. If you can capture all packets that have come in since server start, you can reproduce any server state offline for debugging!


enum Bool { True, False, FileNotFound };

#12 lerno   Members   -  Reputation: 209


Posted 03 August 2013 - 01:23 AM

hplus0603: Thanks, that's a great list to start out with.



#13 Washu   Senior Moderators   -  Reputation: 5195


Posted 03 August 2013 - 03:25 AM

Such a system should also have, ideally, an ability to simulate various network conditions and failures. Examples include:

  • Being able to simulate various forms of latency (client to server, server to client, bidirectional)
  • Dropped packets
  • Partial or corrupted packets (it can happen, even with TCP; the CRC only has so many bits)

Obviously, all these things should work with the metrics that are gathered, to allow you to diagnose and mitigate any issues found.
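A drop/corruption simulator can be as simple as a wrapper around the real send function. This sketch only covers drops and bit corruption (simulating latency would additionally need a delay queue); the parameter names are illustrative:

```python
import random

# Sketch of a lossy "link" wrapped around the real send function, for
# testing how the protocol copes with drops and corruption. Illustrative.
def make_lossy_send(send, drop_rate=0.05, corrupt_rate=0.01, rng=random.random):
    def lossy_send(datagram):
        if rng() < drop_rate:
            return  # simulated drop: silently discard
        if rng() < corrupt_rate:
            i = int(rng() * len(datagram))  # pick a byte to mangle
            datagram = datagram[:i] + bytes([datagram[i] ^ 0xFF]) + datagram[i + 1:]
        send(datagram)
    return lossy_send
```

Injecting the `rng` function makes the failure modes deterministic in tests, which is exactly what you want when pairing this with the metrics discussed above.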


In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.
ScapeCode - Blog | SlimDX


#14 hplus0603   Moderators   -  Reputation: 5309


Posted 08 August 2013 - 11:01 AM

it can happen, even with TCP; the CRC only has so many bits


Although, luckily, pretty much all physical transport layers have their own error detection/correction codes, which significantly improve robustness. TCP over a physical link with a 1-in-10,000 bit error rate would be terrible. Luckily, typical links have bit error rates much, much lower than that.
enum Bool { True, False, FileNotFound };

#15 Washu   Senior Moderators   -  Reputation: 5195


Posted 08 August 2013 - 12:13 PM

it can happen, even with TCP; the CRC only has so many bits


Although, luckily, pretty much all physical transport layers have their own error detection/correction codes, which significantly improve robustness. TCP over a physical link with a 1-in-10,000 bit error rate would be terrible. Luckily, typical links have bit error rates much, much lower than that.


Yes, and no...

Older data link layer protocols have quite significant error checking capabilities, being from a time when data lines were usually noisy and not very good. However, newer data link layer protocols, thanks to the increased quality of lines and equipment, have significantly reduced error correction, preferring to defer it to higher-layer protocols. They haven't completely eliminated it; you usually still have fairly simple error checking (like parity bits). But yes, there is error checking at many different levels. Nevertheless, it is something you should be able to test on your platform, to ensure that your code handles it correctly.

In time the project grows, the ignorance of its devs it shows, with many a convoluted function, it plunges into deep compunction, the price of failure is high, Washu's mirth is nigh.
ScapeCode - Blog | SlimDX


#16 Dave Weinstein   Members   -  Reputation: 505


Posted 09 August 2013 - 09:01 AM

Protocol prefixes exist for a couple of reasons (neither of which is to block a malicious attacker).

 

One is to easily filter out another program which happens to be squatting on your registered port (you are going to register your port, right?). People should not re-use IANA registered ports, but they do.

 

The other is to allow you to do breaking revisions to your own protocol later, and have that simply filtered out.



#17 hplus0603   Moderators   -  Reputation: 5309


Posted 09 August 2013 - 09:56 AM

you are going to register your port, right?


You're kidding, right? If every custom protocol invented were registered, and ports never re-used, we would have run out of ports in 1982. (Or earlier.)

And, given that you provide the server, and manage the server, how could another service be squatting on the same port? The only reason that could happen would be if some third-party user accidentally puts in the wrong hostname/port information in some other program. Or they did it maliciously -- this is known as a "fuzz attack."

It does make sense to include protocol version information in your preamble, though, so you can reject clients that are too old. This may not need to be part of every packet -- making it part of the initial credentials packet, and making successive packets just rely on successful authentication, might be good enough.
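A sketch of that version check on the initial credentials packet (the version numbers and the two-byte field are illustrative assumptions):

```python
import struct

# Sketch: carry the protocol version only in the first (credentials) packet
# and reject clients that are too old. All values are illustrative.
PROTOCOL_VERSION = 7   # version this server speaks
OLDEST_SUPPORTED = 5   # oldest client version still accepted

def check_hello(hello):
    """Accept the connection only if the version in the first two bytes is supported."""
    if len(hello) < 2:
        return False
    (version,) = struct.unpack_from("!H", hello)
    return OLDEST_SUPPORTED <= version <= PROTOCOL_VERSION
```

After this handshake succeeds, subsequent packets can omit the version field entirely, as suggested above, since a successfully authenticated peer is known to speak a compatible protocol.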


Edited by hplus0603, 10 August 2013 - 12:24 PM.

enum Bool { True, False, FileNotFound };

#18 Dave Weinstein   Members   -  Reputation: 505


Posted 09 August 2013 - 10:51 PM


And, given that you provide the server, and manage the server, how could another service be squatting on the same port? The only reason that could happen would be if some third-party user accidentally puts in the wrong hostname/port information in some other program. Or they did it maliciously -- this is known as a "fuzz DDoS attack."

 

It is more of an issue for LAN play, where broadcast packets become problematic if multiple applications are using the same port. But I've seen lots of cases where companies (including large companies) decide to use a port that is already registered by someone else for some completely different purpose and just set up shop.

 

As to your first question, if you go look at the IANA list, you'll see that I registered a set of ports for all the Red Storm games back in the '90s.



#19 samoth   Crossbones+   -  Reputation: 4783


Posted 10 August 2013 - 04:17 AM

Both the Q3 networking code and Enet have their own fragmentation code, fragmenting things below an arbitrary, configurable MTU guess. Isn't the advantage that if you do your own fragmentation, you can be fairly sure (given that you've taken care to select a good MTU) there won't be any unnecessary defragmentation->fragmentation happening anywhere except at the final networking layer? And if each fragment is correctly tagged, it might be possible to avoid wasting time waiting for the remaining fragments of an out-of-date message.

 

Maybe there are other reasons as well.

 

There are mainly two reasons why you would implement your own fragmentation layer:

1. You have no clue (rather unlikely for Q3)

2. You know that IP does fragmentation but want to avoid it

 

Why would you want to avoid it? There is, at least in theory, one good reason: IP discards the whole datagram if any of its fragments is lost. Assume you send a 4-kilobyte datagram with an MTU of 1280, i.e. 4 fragments. If you do your own fragmentation, those "fragments" are complete datagrams; if one is lost, you still get the other three. Relying on IP fragmentation means that if one is lost, you lose all four.
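The scheme above, where each fragment is a self-contained datagram tagged with (message id, index, count), might be sketched like this; the header layout and MTU default are assumptions for illustration:

```python
import struct

# Sketch of application-level fragmentation: each fragment carries a tag of
# (message_id, fragment_index, fragment_count), so losing one fragment does
# not invalidate its siblings. Header layout is illustrative.
FRAG_HEADER = struct.Struct("!HBB")  # message_id, fragment_index, fragment_count

def fragment(message_id, payload, mtu=1280):
    """Split payload into tagged fragments that each fit within the MTU."""
    chunk = mtu - FRAG_HEADER.size
    chunks = [payload[i:i + chunk] for i in range(0, len(payload), chunk)] or [b""]
    return [FRAG_HEADER.pack(message_id, i, len(chunks)) + c
            for i, c in enumerate(chunks)]

def reassemble(fragments):
    """Returns the payload if all fragments of the message arrived, else None."""
    parts, count = {}, None
    for frag in fragments:
        message_id, index, total = FRAG_HEADER.unpack_from(frag)
        parts[index] = frag[FRAG_HEADER.size:]
        count = total
    if count is None or len(parts) != count:
        return None
    return b"".join(parts[i] for i in range(count))
```

Tagging each fragment also enables the earlier point about out-of-date messages: a receiver can discard a half-assembled message as soon as a newer message id supersedes it, instead of waiting for stragglers.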

 

So much for the theory. In reality, you do not lose datagrams at all. Except when you lose them, and then you lose them in dozens, not just one.

 

Losing individual datagrams because of "noise" just doesn't happen nowadays (except maybe on a pathetic low signal wireless, but you wouldn't want to play a game in such a setup anyway). When you lose packets, it's because some router's queue is temporarily full and it discards every incoming packet until it gets a breather, maybe 0.1 seconds or so later. Insofar, there is no visible difference between losing one fragment and losing all of them, as it's the same observable end result either way.



#20 hplus0603   Moderators   -  Reputation: 5309


Posted 10 August 2013 - 12:29 PM

then you lose them in dozens, not just one

 

Very true!

 

In fact, most networking hardware seems to have too much buffering, rather than too little, these days, which leads to all kinds of bad oscillation behavior. (Google "buffer bloat" for examples from the TCP world.)

 

 

 

you wouldn't want to play a game in such a setup anyway

 

You might not want to, but your users are quite likely to try. (And over a 3G mobile connection. And over satellite internet. And over tin-cans-with-string using a carrier pigeon back-up.)

Guess who they will blame when the game doesn't work? Unless you have a very clear quality meter that clearly shows how connection quality affects whether the game works, they will blame you.


Edited by hplus0603, 10 August 2013 - 12:29 PM.

enum Bool { True, False, FileNotFound };



