UDP Replication


Quote
WinIOCP, ePoll or KEvent
This is an excellent tip for TCP, but less so for UDP. With TCP, you have many sockets in an "unknown" state, but you can only do one thing at a time, so you need some special magic that tells you which one is ready to read from (or which overlapped operation has just finished).

Using UDP, you have a single socket, no more. And that single socket either has something to read, or it doesn't. Instead of blocking on a function that tells you "it's now ready" and then calling recvfrom, you can just as well block on recvfrom, which will conveniently return when something came in. Then push that packet on a queue (making the processing someone else's problem!) and immediately call recvfrom again.
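For what it's worth, that receive loop boils down to something like the sketch below (POSIX sockets; "Queue" is any thread-safe queue type with a push(Packet) method, such as the queue sketches further down the thread, and error handling is reduced to "try again"):

```cpp
// Dedicated UDP receive thread: block in recvfrom, hand the datagram to a
// thread-safe queue, then immediately go back to receiving.
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <utility>
#include <vector>

struct Packet {
    sockaddr_in       from;   // who sent it
    std::vector<char> data;   // the raw datagram
};

template <typename Queue>
void receiveLoop(int sock, Queue& queue)
{
    char buf[2048];                       // larger than any datagram we expect
    for (;;) {
        sockaddr_in from{};
        socklen_t fromLen = sizeof(from);
        ssize_t n = recvfrom(sock, buf, sizeof(buf), 0,
                             reinterpret_cast<sockaddr*>(&from), &fromLen);
        if (n < 0)
            continue;                     // real code would inspect errno
        Packet p;
        p.from = from;
        p.data.assign(buf, buf + n);
        queue.push(std::move(p));         // processing is someone else's problem
    }
}
```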


Actually, it applies equally well to both TCP and UDP. From the OS side of things it doesn't matter whether it is TCP or UDP, multiple sockets or a single socket; these systems are just going to generate events which will wake a thread to pull the data. Basically, what you are describing with waiting on recvfrom and then pushing to a queue is exactly what the underlying OS async solutions would be doing for you. The benefit, even with UDP, is that you can throw more threads at the OS solution to wait on work without writing the queue portion yourself. Additionally, in WinIOCP at least, you will bypass much of the normal user-space buffering of packet data; instead the data is written directly to your working buffers.

This is getting into fairly arcane and difficult coding territory, but the general point is that the OS-level interaction is effective in both cases. In fact, I tend to think that for UDP these systems are even more effective: unlike TCP, you will get more, smaller events, so in the long run the more efficient use of resources adds up to a major win.
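To make that concrete, here is a rough sketch of the IOCP flavour on Windows (Winsock2), under the assumption that the socket has already been associated with the completion port via CreateIoCompletionPort((HANDLE)sock, iocp, (ULONG_PTR)sock, 0). Post an overlapped WSARecvFrom, let whichever worker thread is woken in GetQueuedCompletionStatus handle the completion, then re-arm. Error handling and shutdown are omitted.

```cpp
// IOCP sketch: the datagram is written directly into the buffer posted with
// WSARecvFrom, and any worker thread blocked in GetQueuedCompletionStatus may
// be woken to handle the completion.
#include <winsock2.h>
#include <ws2tcpip.h>
#include <windows.h>

struct RecvOp {                   // one outstanding receive operation
    OVERLAPPED  ov{};
    WSABUF      wsaBuf{};
    char        buf[2048]{};
    sockaddr_in from{};
    int         fromLen = sizeof(from);
};

void postReceive(SOCKET sock, RecvOp* op)
{
    op->wsaBuf.buf = op->buf;
    op->wsaBuf.len = sizeof(op->buf);
    op->fromLen    = sizeof(op->from);
    DWORD flags = 0;
    // Completion (success or failure) is reported later on the IOCP.
    WSARecvFrom(sock, &op->wsaBuf, 1, nullptr, &flags,
                reinterpret_cast<sockaddr*>(&op->from), &op->fromLen,
                &op->ov, nullptr);
}

DWORD WINAPI workerThread(LPVOID iocpHandle)
{
    HANDLE iocp = static_cast<HANDLE>(iocpHandle);
    for (;;) {
        DWORD bytes = 0;
        ULONG_PTR key = 0;
        OVERLAPPED* ov = nullptr;
        if (!GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE))
            continue;                               // real code would check why
        RecvOp* op   = CONTAINING_RECORD(ov, RecvOp, ov);
        SOCKET  sock = static_cast<SOCKET>(key);    // socket was used as the key
        // ... process op->buf[0 .. bytes) received from op->from ...
        postReceive(sock, op);                      // re-arm the receive
    }
}
```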

How would one go about not locking the queues as they are being read from and added to in different threads?


In the simplest case, just use a lock to protect the queue, and use a linked list or the like to queue your items.
Because the queue is only locked for a very short amount of time (to enqueue or dequeue a single item,) there is no real risk of contention being a problem.
If you want to get fancier, look into "lockless FIFOs," which can be implemented incredibly cheaply, as long as you have only a single reader and a single writer per queue (which is actually the common case.)
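For reference, a minimal sketch of such a single-producer/single-consumer FIFO: a fixed-size ring buffer over two std::atomic indices, where only the producer writes the head and only the consumer writes the tail, so no lock is needed. The capacity must be a power of two, and one slot is sacrificed to tell "full" from "empty".

```cpp
// Single-producer/single-consumer lock-free ring buffer. push() must only be
// called from the producer thread and pop() only from the consumer thread.
#include <atomic>
#include <cstddef>

template <typename T, std::size_t Capacity>
class SpscQueue {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
public:
    bool push(const T& item)                     // producer side
    {
        std::size_t head = head_.load(std::memory_order_relaxed);
        std::size_t next = (head + 1) & (Capacity - 1);
        if (next == tail_.load(std::memory_order_acquire))
            return false;                        // queue is full
        slots_[head] = item;
        head_.store(next, std::memory_order_release);
        return true;
    }

    bool pop(T& out)                             // consumer side
    {
        std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return false;                        // queue is empty
        out = slots_[tail];
        tail_.store((tail + 1) & (Capacity - 1), std::memory_order_release);
        return true;
    }

private:
    T slots_[Capacity];
    std::atomic<std::size_t> head_{0};           // next slot the producer writes
    std::atomic<std::size_t> tail_{0};           // next slot the consumer reads
};
```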

However, seeing as you're still in school, I *highly* recommend avoiding threads in this server, at least for now. You simply do not need them, until you can show that the game server processing you do really does need multiple cores to achieve performance goals.
And, if you absolutely HAVE to use threads (this might be a mandatory project or something,) I'd highly recommend just using a simple mutex or other light-weight lock (CRITICAL_SECTION in Windows for example) to protect adding to and removing from each queue.
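For completeness, that simple locked queue might look roughly like this, with std::mutex as the portable stand-in for CRITICAL_SECTION; note that the lock is held only for the push or pop itself:

```cpp
// Mutex-protected queue: the lock is only held while enqueueing or dequeueing
// a single item, so contention between the network thread and the game thread
// stays negligible.
#include <mutex>
#include <optional>
#include <queue>
#include <utility>

template <typename T>
class LockedQueue {
public:
    void push(T item)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        queue_.push(std::move(item));
    }

    std::optional<T> tryPop()                    // returns nothing if empty
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.empty())
            return std::nullopt;
        T item = std::move(queue_.front());
        queue_.pop();
        return item;
    }

private:
    std::mutex    mutex_;
    std::queue<T> queue_;
};
```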
enum Bool { True, False, FileNotFound };


And, if you absolutely HAVE to use threads (this might be a mandatory project or something,) I'd highly recommend just using a simple mutex or other light-weight lock (CRITICAL_SECTION in Windows for example) to protect adding to and removing from each queue.

I completely agree; as I also stated, don't go and thread things until you can justify it. But, as a note, as of Vista a critical section is no longer notably faster than a mutex: Windows moved the first-order checks into user space like other OSes, which makes a mutex behave basically the same as a critical section these days.

Actually, it applies equally well to both TCP and UDP. From the OS side of things it doesn't matter whether it is TCP or UDP, multiple sockets or a single socket; these systems are just going to generate events which will wake a thread to pull the data.

Yes and no. The hardware generates interrupts of course, and the OS eventually wakes a thread, but not necessarily (indeed, not usually) in a 1:1 correlation. However, with poll+receive you have two guaranteed user-kernel-user transitions instead of one.

For TCP that makes sense, since you somehow must multiplex between many sockets; there is not much of a choice if you wish to be responsive. For UDP, though, it's wasting 10k cycles per received packet for nothing, since there is only one socket and nothing to multiplex. You can just as well receive right away instead of doing another round trip. The same goes for IOCP, where you have two round trips: one for kicking off the operation, and one for checking completeness.
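To put the round-trip point into code, a sketch of both variants for a single UDP socket on Linux (assuming the socket has already been registered with the epoll instance via epoll_ctl; error handling omitted):

```cpp
// Variant A: poll first, then receive  -- two kernel transitions per packet.
// Variant B: just block in recvfrom    -- one kernel transition per packet.
#include <sys/epoll.h>
#include <sys/socket.h>
#include <netinet/in.h>

void pollThenReceive(int epfd, int sock)        // Variant A
{
    epoll_event ev{};
    epoll_wait(epfd, &ev, 1, -1);               // transition 1: "it's ready now"
    char buf[2048];
    recvfrom(sock, buf, sizeof(buf), 0, nullptr, nullptr);  // transition 2: fetch it
}

void justReceive(int sock)                      // Variant B
{
    char buf[2048];
    recvfrom(sock, buf, sizeof(buf), 0, nullptr, nullptr);  // blocks until data arrives
}
```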

Throwing more threads at the problem doesn't help, by the way. Operating systems even try to do the opposite and coalesce interrupts. The network card DMAs several packets into memory, and sends a single interrupt. A single kernel thread does the checksumming and other stuff (like re-routing, fragment assembly), being entirely bound by memory bandwidth, not ALU. It eventually notifies whoever wants some.

A single user thread can easily receive data at the maximum rate any existing network device is capable of delivering, using APIs from the early 1980s. Several threads will receive the same amount of data in smaller chunks, none of them faster, but with many more context switches.

Basically, what you are describing with waiting on recvfrom and then pushing to a queue is exactly what the underlying OS async solutions would be doing for you. The benefit, even with UDP, is that you can throw more threads at the OS solution to wait on work without writing the queue portion yourself.

Yes, this is more or less how Windows overlapped I/O or the glibc userland AIO implementation works, but not traditional Unix-style nonblocking I/O (or socket networking as such). Of course, in reality there is no queue at all; it exists only conceptually, insofar as the worker thread reads into "some location" and then signals another thread via some means.

Additionally, in WinIOCP at least, you will bypass much of the normal user-space buffering of packet data; instead the data is written directly to your working buffers.

Yes, although overlapped I/O is troublesome, and in some cases considerably slower than non-overlapped I/O. I have not benchmarked it for sockets since I deem that pointless, but for file I/O, for example, overlapped is roughly half the speed on every system I've measured (for no apparent reason). The copy to the userland buffer does not seem to be a performance issue at all, surprising as that is. The same holds for Linux: try one of the complicated zero-copy APIs like tee/splice and you'll see that while they look great on paper, in reality they're much more complicated and more troublesome, but no faster. Sometimes they're even slower than APIs that simply copy the buffer contents -- don't ask me why.

But even disregarding the performance issue (if it exists for overlapped sockets, it likely does not really matter), overlapped I/O is complicated and full of quirks. If it just worked as you expect, without exceptions and special cases, then it would be great, but it doesn't. Windows is a particular piss-head in that respect, but e.g. Linux is sometimes not much better.

Normally, when you do async I/O then your expectation is that you tell the OS to do something, and it doesn't block or stall or delay more than maybe a few dozen cycles, just to record the request. It may then take nanoseconds, milliseconds, or minutes for the request to complete (or fail) and then you are notified in some way. That's what happens in your dreams, at least.

Reality has it that Windows will sometimes just decide that it can serve the request "immediately", even though it has a very weird idea of what "immediately" means. I've had "immediately" take several milliseconds in extreme cases, which is a big "WTF?!!" when you expect that stuff happens asynchronously and thus your thread won't block. Also, there is no way of preventing Windows from doing that, nor is there anything you can do (since it's already too late!) when you realize it has happened.

Linux, on the other hand, has some obscure, undocumented limits that you will usually not run into, but when you do, submitting a command just blocks for an arbitrarily long time -- bang, you're dead. Since this isn't even documented, it is actually an even bigger "WTF?!!" than on the Windows side (although you can't do anything about it, at least Microsoft tells you right away about the quirks in their API).

In summary, I try to stay away from async APIs since they not only require more complicated program logic but also cause much more trouble than they're worth compared to just having one I/O thread of yours perform the work using a blocking API (with select/(e)poll/kqueue for TCP, and with nothing else for UDP).

For a FPS game with smallish levels and smallish number of players (such as Quake) sending a single packet with information about all players for each network tick is totally fine, simple, and will perform well.

Well yes, from a pure performance point of view it's OK for a Quake-style game (not so for something much bigger, though).

But my point about knowledge remains. In a game where several people compete, it can be troublesome to provide information to people that they shouldn't be able to know: shoot-through-wall aimbots, for example, or other things. Imagine someone implements a minimap where enemies show up as little dots (when nobody using the genuine client has such a minimap). Imagine knowing what weapon, how much armour, and how much health your opponents have (when you shouldn't!), and where they hide. Imagine knowing which door they'll come through before they know (because you can "see through" the door).

No player should ever know the whole world, normally. Not unless it doesn't matter anyway.

So have a little flag that tells you whether or not the receiving client can see this player (to tell them if the player should be visible at their end) and don't send the new data about that player if they aren't visible. Problem solved.
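A minimal sketch of that flag-and-filter idea, with hypothetical PlayerState and canSee names; players the client may not know about are simply left out of its snapshot:

```cpp
// Per-client snapshot filtering: only include players this client is allowed
// to know about. PlayerState and canSee are hypothetical placeholders.
#include <vector>

struct PlayerState { int id; float x, y, z; /* weapon, armour, health, ... */ };

bool canSee(const PlayerState& viewer, const PlayerState& target);  // game-specific test

std::vector<PlayerState> buildSnapshotFor(const PlayerState& client,
                                          const std::vector<PlayerState>& players)
{
    std::vector<PlayerState> visible;
    for (const PlayerState& p : players) {
        if (p.id == client.id || canSee(client, p))
            visible.push_back(p);   // only replicate what this client may know
    }
    return visible;                 // serialized and sent on the next network tick
}
```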

Indeed, you can still keep it simple. UDK uses relevancy checks for each object when it is replicated to a client, so you could simply set an exclusion radius, and/or perform a line of sight test.

and don't send the new data about that player if they aren't visible. Problem solved.


... and also tell the player when you're going to stop sending updates about an entity, so there are no "dead" entities left visible on the player's screen. And re-check each entity against each other entity each time they move (enough) to perhaps change relevancy. And re-send all relevant data about an entity when it is re-introduced by becoming relevant again, because the entity may have changed clothing or spell effects or whatever.

THEN "problem solved." But for games which don't have massive player counts, sending all the entities is typically easier and not less efficient. So do that if your goal is to get to a working game sooner, rather than later.
enum Bool { True, False, FileNotFound };

Wow, just a few days and so many amazing posts, haha. Anyways, I rewrote my server as per the several suggestions here, namely no threading and a "network tick" type of functionality which sends comprehensive updates of all players to every client. This has proven to work far better than any previous iteration of the server I had made. We ran 5 clients today and it worked amazingly. Of course, on the internet there will be a lot more latency than on my school's intranet, but we will combat one problem at a time.
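For the curious, that per-tick broadcast boils down to something like the sketch below (POSIX sockets, made-up PlayerState/Client types; a real wire format would be more compact and versioned than a raw memcpy):

```cpp
// Fixed-rate "network tick" broadcast: serialize every player's state into a
// single datagram and send it to each connected client over the one UDP socket.
#include <cstring>
#include <vector>
#include <sys/socket.h>
#include <netinet/in.h>

struct PlayerState { int id; float x, y, z; };
struct Client      { sockaddr_in addr; };

void sendSnapshot(int sock,
                  const std::vector<PlayerState>& players,
                  const std::vector<Client>& clients)
{
    if (players.empty())
        return;

    // One packet containing every player's state for this tick.
    std::vector<char> packet(players.size() * sizeof(PlayerState));
    std::memcpy(packet.data(), players.data(), packet.size());

    for (const Client& c : clients)
        sendto(sock, packet.data(), packet.size(), 0,
               reinterpret_cast<const sockaddr*>(&c.addr), sizeof(c.addr));
}
// Called once per network tick (for example every 50 ms) from the main loop.
```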

The next step, I suppose, is that we can finally move forward to implementing the asynchronous functionality that was mentioned here, as well as "lockless FIFOs", and figuring out how many threads we will spawn initially and what each one will handle. But as of right now, the team is delighted to see our avatars running around in virtual space, haha. Thank you so much!

