UDP network layer must-haves?

Started by
26 comments, last by hplus0603 10 years, 7 months ago

I've been trying to gather requirements for a UDP network protocol with virtual connections in a client-server scenario.

This is what I have so far:

  1. Glenn Fiedler's series: Use a protocol prefix to filter out packets lacking the prefix (no real motivation given)
  2. Quake 3: Handle the case where the player's port may randomly change due to NAT behaviour. I asked on serverfault.com and heard that this may happen in cases other than with old NATs. I notice this is in Enet as well.
  3. Packet sequencing (obvious)
  4. Acks (implicit or explicit), some protocols use timeouts as well.
  5. Some sort of bandwidth / congestion handling (most implementations have this)
  6. Fragment large packets
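To make points 1, 3 and 4 concrete, here is a minimal sketch of what a per-packet header might look like. The field names, sizes, and the little-endian byte order are my own assumptions, not taken from Fiedler's articles or any specific library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header covering points 1, 3 and 4 above. */
typedef struct {
    uint32_t protocol_id; /* (1) filter datagrams from other apps */
    uint16_t sequence;    /* (3) local sequence number */
    uint16_t ack;         /* (4) most recent remote sequence seen */
    uint32_t ack_bits;    /* (4) bitfield acking the 32 prior sequences */
} PacketHeader;

/* Serialize the header in little-endian order (an arbitrary choice). */
static size_t write_header(uint8_t *buf, const PacketHeader *h)
{
    size_t i = 0;
    uint32_t v = h->protocol_id;
    buf[i++] = v & 0xff; buf[i++] = (v >> 8) & 0xff;
    buf[i++] = (v >> 16) & 0xff; buf[i++] = (v >> 24) & 0xff;
    buf[i++] = h->sequence & 0xff; buf[i++] = (h->sequence >> 8) & 0xff;
    buf[i++] = h->ack & 0xff;      buf[i++] = (h->ack >> 8) & 0xff;
    v = h->ack_bits;
    buf[i++] = v & 0xff; buf[i++] = (v >> 8) & 0xff;
    buf[i++] = (v >> 16) & 0xff; buf[i++] = (v >> 24) & 0xff;
    return i; /* 12 bytes */
}
```

The ack_bits field lets a single packet implicitly ack the 32 packets preceding `ack`, which is one common way to get acks "for free" on every outgoing packet.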

What's missing?


7. Good statistics/metrics from the running system, and ideally reported back from the clients for sampling

enum Bool { True, False, FileNotFound };

7. Good statistics/metrics from the running system, and ideally reported back from the clients for sampling

Anything different from a TCP/IP implementation, aside from monitoring packet loss?

You need more than packet loss to prove you're getting any value out of UDP, I would assume.

Being able to actually track down cases where NAT punch-through doesn't work right, or users try to fake connections, or whatever, is also pretty useful.

Actually, I'm not seeing "NAT introduction and punch-through" on the list. That's probably expected from a commercial network library these days.

enum Bool { True, False, FileNotFound };

The problem is that NAT punchthrough requires an external server as far as I know (you punch the hole in the NAT by trying to connect to that server). If it's for connections between players directly, you're most likely screwed if they aren't using a VPN or something like that.

Don't pay much attention to "the hedgehog" in my nick, it's just because "Sik" was already taken =/ By the way, Sik is pronounced like seek, not like sick.

The problem is that NAT punchthrough requires an external server as far as I know


Yes, it does! Any UDP networking library that doesn't actually provide at least some support for it is unlikely to be really all that useful these days.
The support could be as simple as "here's the function to call, and here's a sample server you can run on Amazon EC2 or Interserver or whatever, and punch-through will be yours!"
As long as it's actually built into the library.
enum Bool { True, False, FileNotFound };

Well, I deliberately excluded it as I didn't think it necessary for a strict client-server protocol where the server uses a static IP. But it's useful to mention.

In regards to statistics - both TCP and UDP would measure some sort of latency, so that would not be unique to UDP.

I'm somewhat unsure about what (1) is good for.

In order to discriminate non-malicious UDP datagrams from some different protocol, you would in my opinion just use a different destination port number, that's by all means good enough. How many different UDP protocols do you have in use on your server anyway? Surely not 2,000 or 3,000 of them all at the same time -- it should be quite possible to use a port number (or a range of port numbers) that doesn't conflict.

On the other hand, prefixing a well-known 32 bit number isn't a very serious challenge for someone maliciously sending fake datagrams.

If anything, I'd negotiate a random per-connection ID and use that to quickly filter out the most stupid attackers. It's still no big challenge for someone who really means to send you fake packets, but at least it isn't totally trivial to circumvent, and it's more or less a no-op to verify (so it's a good initial check to comb out noise).
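In code, that initial check can be as cheap as comparing the first four bytes of each datagram against the ID negotiated at handshake time. This is just a sketch; the names and the little-endian layout are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cheap first-pass filter: does this datagram carry the per-connection
   ID we negotiated? Near-free to verify, combs out the dumbest noise. */
static int passes_id_check(const uint8_t *pkt, size_t len, uint32_t conn_id)
{
    if (len < 4)
        return 0; /* too short to even carry the ID */
    uint32_t got = (uint32_t)pkt[0]
                 | ((uint32_t)pkt[1] << 8)
                 | ((uint32_t)pkt[2] << 16)
                 | ((uint32_t)pkt[3] << 24);
    return got == conn_id;
}
```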

About (6), I'm also not sure. You primarily use UDP to have low latency. If you mean to send bulk data, you use TCP (easier, more reliable, and exactly as fast as UDP). In that sense, "large data" is somewhat contradictory. Large data cannot have low latency by definition.

I would, on the contrary, require that individual messages within datagrams have a size no larger than 255 bytes (serialization layer). This allows you to length-prefix them with a single byte, which is convenient and also prevents some denial of service attacks. If someone really needs to send several kilobytes or megabytes, they can still split up data at a higher level and send a few dozen/hundred messages, which will go into many datagrams.
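A sketch of that length-prefixed packing; the helper name and layout are my own, but the scheme is just "one length byte, then up to 255 bytes of message":

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one length-prefixed message to a datagram under construction.
   Returns the new write position, or 0 if the message is too long or
   wouldn't fit (which doubles as a cheap denial-of-service guard). */
static size_t append_message(uint8_t *dgram, size_t pos, size_t cap,
                             const uint8_t *msg, size_t msg_len)
{
    if (msg_len > 255 || pos + 1 + msg_len > cap)
        return 0;
    dgram[pos] = (uint8_t)msg_len; /* single-byte length prefix */
    memcpy(dgram + pos + 1, msg, msg_len);
    return pos + 1 + msg_len;
}
```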

Also, UDP/IP already supports fragmentation natively, in case one of your datagrams really is too big.

(7) as proposed by hplus0603 is harder to get right than it sounds, but it is equally important. Nothing stinks more than "network that doesn't work" when nobody can tell for sure what exactly doesn't work or why. You really really really want good metrics when there are problems, both to make sure your software works as expected (during beta and stress testing) and for being able to tell a user who is complaining whether it's a problem at their end (and what they can do).
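For what it's worth, here's a sketch of the kind of per-connection counters worth keeping; the names are my own suggestions, not from any library, and the RTT smoothing constant is the one TCP traditionally uses:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-connection statistics block. */
typedef struct {
    uint64_t packets_sent, packets_received;
    uint64_t bytes_sent, bytes_received;
    uint64_t packets_lost; /* inferred from gaps in the ack stream */
    double   rtt_ms;       /* smoothed round-trip time estimate */
} ConnStats;

/* Exponentially weighted moving average of the RTT, as in TCP;
   alpha = 0.125 is an assumption, tune to taste. */
static double update_rtt(double rtt_ms, double sample_ms)
{
    const double alpha = 0.125;
    return rtt_ms == 0.0 ? sample_ms
                         : (1.0 - alpha) * rtt_ms + alpha * sample_ms;
}
```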

I'm somewhat unsure about what (1) is good for.

Me too, and like you say, it appears trivial to circumvent: if you're mounting a DDoS, you've likely already used Wireshark or similar to determine what a valid payload looks like. That said, Glenn Fiedler seems to know what he's talking about in regards to UDP, so I wasn't about to dismiss it outright. It's just not clear from his articles why he put it there.

Any suggestions on metrics?


About (6), I'm also not sure. You primarily use UDP to have low latency. If you mean to send bulk data, you use TCP (easier, more reliable, and exactly as fast as UDP). Insofar, "large data" is somewhat contradictory. Large data cannot have low latency by definition.
I would, on the contrary, require that individual messages within datagrams have a size no larger than 255 bytes (serialization layer). This allows you to length-prefix them with a single byte, which is convenient and also prevents some denial of service attacks. If someone really needs to send several kilobytes or megabytes, they can still split up data at a higher level and send a few dozen/hundred messages, which will go into many datagrams.

Both the Q3 networking code and Enet have their own fragmentation code, fragmenting things below an arbitrary guessed MTU that you can set up. Isn't the advantage that, if you do your own fragmentation (and have taken care to select a good MTU), you can be fairly sure there won't be any unnecessary reassembly and re-fragmentation happening anywhere except at the final network layer? If each fragment is correctly tagged, it might also be possible to avoid wasting time waiting for the remaining fragments of an out-of-date packet.

Maybe there are other reasons as well.
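For what it's worth, a sketch of the fragment bookkeeping described above; the 1200-byte fragment payload is a conservative guess (well under typical path MTUs), and all names are made up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conservative per-fragment payload size, assumed, not measured. */
#define FRAG_PAYLOAD 1200

/* Tagging each fragment like this lets the receiver group fragments by
   message and discard partial sets belonging to out-of-date messages. */
typedef struct {
    uint16_t msg_seq;    /* which large message this fragment belongs to */
    uint8_t  frag_index; /* position within the message */
    uint8_t  frag_total; /* total fragment count for the message */
} FragmentHeader;

/* How many fragments a payload of the given size splits into. */
static size_t fragment_count(size_t payload_len)
{
    return (payload_len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD;
}
```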

This topic is closed to new replies.
