

UDP Replication


Old topic!
Guest, the last post of this topic is over 60 days old, and at this point you may not reply in this topic. If you wish to continue this conversation, start a new topic.

17 replies to this topic

#1 PranavSathy   Members   -  Reputation: 122


Posted 24 April 2014 - 06:53 PM

Hey Guys,

 

So, new question. I have implemented a simple ZoneServer in C++ and I am curious how to go about handling my ONRPG player replication. Currently, what I do is this:

 

1) Create and bind a UDP socket to my desired port on the server

2) Wait for incoming data using recvfrom; when data arrives, a new thread is created and the packet is passed to it.

3) The thread examines the information within the packet and, if it is a request for a position update, determines the position and starts a new "replicate" thread.

4) The replicate thread then replicates the new user's position to all clients.

 

For those who are familiar with Unity, (which is what I am using for my client), I simply have a thread listening for packets and push them to a buffer, and in the main Update loop of my NetworkController I process ~4000 packets from the top of the queue at a time, and Lerp my remote players to their desired position.

 

The issue is that although there is no lag for the local player, there is a lot of lag for the remote players on each client once more than 3-4 people are connected to the server. Is there any way I can improve my server end?


Edited by PranavSathy, 24 April 2014 - 06:57 PM.



#2 hplus0603   Moderators   -  Reputation: 6666


Posted 24 April 2014 - 07:10 PM

First, do not create new threads in response to receiving messages.
Second, do not create new threads in response to more/fewer users connecting.
Third, because you only have one network interface, it makes sense to only have one thread that talks to the network.
Fourth, users shouldn't have to "request updates" -- if you're a connected user, you want updates, right?
Fifth, when you say there is "lag," you will need to quantify that.
How much lag?
How many packets are you sending per second?
How many messages are there in each packet?

In general, a network "replication" server might look something like:
 
int udpsock = socket( ... );
bind(udpsock, ... );
fcntl(udpsock, ... ); // make non-blocking
double last_send_time = 0;
const double tick_interval = 0.05; // e.g. 20 network ticks per second
while (true) {
  int n;
  char buf[1536];
  sockaddr_in sin;
  socklen_t slen = sizeof(sin);
  // drain every pending datagram without blocking
  while ((n = recvfrom(udpsock, buf, sizeof(buf), 0, (sockaddr *)&sin, &slen)) > 0) {
    if (is_new_address(sin)) {
      add_new_remote_player(sin);
    }
    handle_update_from_player(sin, buf, n);
    slen = sizeof(sin); // reset before the next recvfrom
  }
  // once per tick, broadcast the authoritative state to everyone
  if (time_now() > last_send_time + tick_interval) {
    last_send_time = time_now();
    n = get_position_of_all_players(buf, sizeof(buf));
    send_packet_to_all_players(buf, n);
  }
}
This loop could obviously be improved significantly: use select() with a timeout to avoid busy-waiting, detect clients that have stopped sending updates and time them out, etc. But it should point you in the right direction towards building a basic, functional, high-performance replication server.
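For instance, the select()-with-timeout variant suggested above could be sketched like this (a minimal sketch; error handling omitted, and the helper name `wait_readable` is made up for illustration):

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

// Block until udpsock has data to read, or until timeout_ms elapses.
// Returns true if there is data pending, false on timeout.
bool wait_readable(int udpsock, int timeout_ms) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(udpsock, &readfds);
    timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;
    // select() returns the number of ready descriptors, 0 on timeout
    return select(udpsock + 1, &readfds, nullptr, nullptr, &tv) > 0;
}
```

The outer loop would then call `wait_readable(udpsock, ms_until_next_tick)` instead of spinning, so the process sleeps between ticks when no packets arrive.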

Edited by hplus0603, 24 April 2014 - 07:11 PM.

enum Bool { True, False, FileNotFound };

#3 PranavSathy   Members   -  Reputation: 122


Posted 24 April 2014 - 08:02 PM

Wow hplus, this is invaluable. I am in the process of ridding myself of my threads, so in this case my question becomes: that last function, send_packet_to_all_players, is that ONE packet that has information about all players in the zone?



#4 SeanMiddleditch   Crossbones+   -  Reputation: 9893


Posted 24 April 2014 - 09:11 PM

Wow hplus, this is like invaluable, so I am in the process of ridding myself of my threads, so in this case my question becomes then that last function, send_packet_to_all_players, is that ONE packet that has information about all players in the zone?


Send as few packets as possible. You can put multiple messages into a single packet. Each object's position update will often be a single message. Maybe it'll be combined with orientation and velocity data or maybe a single message will include information about a group of objects. It all depends on your game (there is not a single correct answer).
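In code, packing several per-object messages into one datagram might look like this (a sketch; the 16-byte record layout is a hypothetical wire format, with host byte order assumed for brevity):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical message: one position update for one object.
struct PosMsg { uint32_t id; float x, y, z; };

// Append as many position messages as fit into buf; returns bytes written.
// Messages that do not fit are simply left for the next packet.
size_t pack_positions(const std::vector<PosMsg>& msgs, char* buf, size_t cap) {
    size_t off = 0;
    for (const PosMsg& m : msgs) {
        if (off + sizeof(PosMsg) > cap) break;   // never overflow one datagram
        std::memcpy(buf + off, &m, sizeof(PosMsg));
        off += sizeof(PosMsg);
    }
    return off;
}
```

The sender then makes one sendto() call per tick per client with the packed buffer, instead of one datagram per message.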

You do not generally need or want to send information about an entire zone to each player. If player A is standing 100 units away from player B and the in-game visibility is only 10 units, why would the players need to know anything about each other? This is generally referred to as "area of interest filtering." Figure out which players care about which objects and only send updates about those objects. This filtering can range from very simple radius checks to some very complex queries, depending on your needs.
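The simple radius-check end of that spectrum can be sketched in a few lines (a 2D sketch; the `Entity` type and names are made up for illustration):

```cpp
#include <vector>

struct Entity { int id; float x, y; };

// Simple area-of-interest filter: return the ids of all entities within
// `radius` units of the observer, skipping the observer itself.
std::vector<int> entities_in_range(const Entity& observer,
                                   const std::vector<Entity>& all,
                                   float radius) {
    std::vector<int> visible;
    float r2 = radius * radius;  // compare squared distances, no sqrt needed
    for (const Entity& e : all) {
        if (e.id == observer.id) continue;
        float dx = e.x - observer.x, dy = e.y - observer.y;
        if (dx * dx + dy * dy <= r2) visible.push_back(e.id);
    }
    return visible;
}
```

The per-player update packet is then built only from this filtered list rather than from the whole zone.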

#5 PranavSathy   Members   -  Reputation: 122


Posted 24 April 2014 - 11:18 PM

Right, thanks Sean! And hplus too! I have implemented this new server and will conduct my tests tomorrow; hopefully my lag problem will be fixed, and hplus, I will try to quantify the data if this does not work. Just out of curiosity, why is multithreading looked down upon?



#6 AllEightUp   Moderators   -  Reputation: 4413


Posted 25 April 2014 - 06:10 AM

Right, thanks Sean! and hplus too!! I have implemented this new server, will conduct my tests tomorrow, hopefully my lag problem will be fixed, and hplus I will try and quantify the data if this does not work for you o.0. Just out of curiosity, why is multithreading looked down upon?


Multithreading is not the problem here; the usage you described is the problem. Spinning up new threads is something to be avoided because it is exceptionally slow and costly. In your case it is actually reasonable to use threads, but only once you determine they are needed, and you don't want to repeatedly "start" threads: they should all be created up front and never shut down, simply idling until there is work.

If/when you get to the point where it makes sense, there are many ways to go about things, but I tend to move straight to the OS specifics. There is very little point running your system multithreaded through the generic APIs when the WinIOCP, epoll, or kevent APIs do a significant portion of the thread communication for you and cut out a fair portion of the overhead involved. Of course, while epoll and kevents are fairly simple and give you great benefits, WinIOCP is a PITA to get correct. Either way, when you need the threaded solution, the OS specifics cut out a lot of intermediate bits, which lets you reduce latency and maintain high performance. But, again, doing this is really only valid for a pretty high amount of traffic per process; it is up to you to decide when to switch over.

#7 samoth   Crossbones+   -  Reputation: 5494


Posted 25 April 2014 - 06:59 AM

threads

 

In addition to what hplus0603 already said, you should generally never spawn threads, except when your program starts up. And then, you should spawn a fixed amount of them (typically equal to the number of CPU cores, or one less). Then assign tasks to the threads via one or several queues (lockfree ideally, but queues with a lock work just fine too, if you manage tasks properly). Note that when I say "task" then that does not mean something like "add two numbers", but something like "process 1,000 items".

 

The reason for that is that spawning threads is a lengthy, expensive operation which you do not want to do while receiving (or rather, not receiving, but dropping) UDP packets, and spawning a thread per task is generally a bad design, which is neither efficient nor scales well. Many threads means many context switches, which means a lot of CPU wasted for nothing.
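A fixed pool along those lines, with workers created once at startup and a locked task queue, might be sketched like this (a minimal sketch; a lock-free queue could replace the mutex later):

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size worker pool: threads are spawned once and then idle
// on a condition variable until tasks arrive; nothing is spawned per task.
class WorkerPool {
public:
    explicit WorkerPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~WorkerPool() {                       // drains remaining tasks, then joins
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void post(std::function<void()> task) {
        { std::lock_guard<std::mutex> lk(m_); tasks_.push(std::move(task)); }
        cv_.notify_one();
    }
private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !tasks_.empty(); });
                if (done_ && tasks_.empty()) return;
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();                        // run outside the lock
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};
```

Each posted task should be coarse, as described above: a batch of items, not a single tiny operation, so the queue and wakeup overhead stays negligible.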

 

You definitely want receiving UDP traffic to happen on one thread that does nothing else, since if you don't receive datagrams "immediately", your receive buffer will quickly fill up and you will drop packets. So you definitely do not want to do anything like reading a file, processing complicated game logic, or spawning threads in the same thread which handles the network. You don't need more than one thread for that either, though. One thread is absolutely capable of doing that task (even with plain old blocking API calls and nothing special!).

 

ONE packet that has information about all players in the zone?

This depends. While that may be a very efficient solution (for some rare cases), it may not be the correct one. Every player in the zone (probably) does not see everything, but you are still sending every player the complete information. That may be acceptable, or it may be exploitable and therefore something you need to forbid (depends on your game).

 

Also, not all information is equally important to each player. Depending on the amount of updates, you may have to send a considerable amount of data to every player. Bandwidth is not only limited (both at your end and at the other end!) but also costs money. You will therefore wish to reduce bandwidth by sending each player only

a) what they can actually see

b) what, in this subset, matters most

c) no more than a fixed so-and-so-much budget per second

 

It matters big time if someone who is 2 meters away makes a side step or changes clothes. This is immediately obvious. However, changing clothes may not be as important as pulling a gun.

It doesn't matter at all if someone 250 meters away makes a step or changes clothes. You likely won't notice at all.

 

Since the number of updates that you need to transmit scales at least quadratically with distance (according to the area of a disk for 2D/pseudo-3D, or if it's real 3D the volume of a sphere), you usually need to apply some "importance metric" that is somehow related to distance for each receiving user.
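As an illustration, such an importance metric can start out as a simple mapping from distance to update interval (the thresholds below are purely illustrative, not tuned values):

```cpp
// Map an entity's distance from the receiving player to a send interval,
// measured in network ticks: near entities update every tick, far ones
// progressively less often.
int ticks_between_updates(float distance) {
    if (distance < 25.0f)  return 1;    // close by: every tick
    if (distance < 100.0f) return 4;    // visible: quarter rate
    if (distance < 250.0f) return 10;   // distant: occasional refresh
    return 30;                          // barely relevant
}
```

The server then sends an entity's state to a given player only on ticks divisible by this interval, cutting bandwidth roughly in proportion to how much of the world is far away.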

 

WinIOCP, ePoll or KEvent

This is an excellent tip for TCP, but less so for UDP. With TCP, you have many sockets in an "unknown" state, but you can only do one thing at a time, so you need some special magic that tells you which one is ready to read from (or which overlapped operation has just finished).

 

Using UDP, you have a single socket, no more. And that single socket either has something to read, or it doesn't. Instead of blocking on a function that tells you "it's now ready" and then calling recvfrom, you can just as well block on recvfrom, which will conveniently return when something came in. Then push that packet on a queue (making the processing someone else's problem!) and immediately call recvfrom  again.
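That blocking receive-and-hand-off loop could be sketched as follows (a sketch; the `packets_wanted` bound and the std::string payloads are just for illustration, and the consumer side is omitted):

```cpp
#include <mutex>
#include <queue>
#include <string>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

// Dedicated receive loop: block in recvfrom, push each datagram onto a
// locked queue, and go straight back to receiving. Processing the packet
// is someone else's problem. Runs until `packets_wanted` datagrams arrive.
void receive_loop(int udpsock, std::queue<std::string>& q, std::mutex& m,
                  int packets_wanted) {
    char buf[1536];
    while (packets_wanted-- > 0) {
        sockaddr_in from;
        socklen_t flen = sizeof(from);
        ssize_t n = recvfrom(udpsock, buf, sizeof(buf), 0,
                             (sockaddr*)&from, &flen);
        if (n <= 0) break;                       // error: bail out
        std::lock_guard<std::mutex> lk(m);
        q.emplace(buf, (size_t)n);               // hand off to the consumer
    }
}
```

A real server would loop forever rather than counting packets, and would keep the sender's address alongside the payload so the consumer knows which client it came from.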


Edited by samoth, 25 April 2014 - 07:10 AM.


#8 hplus0603   Moderators   -  Reputation: 6666


Posted 25 April 2014 - 10:10 AM

For a FPS game with smallish levels and smallish number of players (such as Quake) sending a single packet with information about all players for each network tick is totally fine, simple, and will perform well. You only need to generate the contents of this packet once per tick, too, which is a bonus.

When the number of players goes up (say, above 30) and the sizes of levels goes up (so not everybody can possibly snipe everybody) then you can start doing interest management, where "close" or "important" entities are sent every network tick, but "far" or "unimportant" entities are sent less often. These packets need to be generated differently for each player, because each player has a different viewpoint.

When it comes to threading, threads are great to use multiple CPU cores. Thus, the ideal program/system has exactly one thread per CPU core. To make sure that those threads always have work to do, you should be using some kind of notified, asynchronous, or non-blocking I/O, so that threads don't get stalled blocking on I/O. For things that don't have convenient asynchronous APIs, like file I/O on Linux, you can spin up an additional thread, which receives requests for I/O, performs the requests, and then responds back, basically implementing async I/O at user level. You'd use some kind of queue between the other threads posting requests, and responses getting queued.

Similarly, there are physical hardware limitations. Each hard disk can only read from one track on the spinning platter (or one sector of flash) at a time. Each network interface can only send one network packet at a time. Thus, having more threads waiting for each particular piece of hardware at the same time is inefficient. Over-threading a program is very likely to run into this problem, where the threads don't give you any performance, but end up costing in resources and complexity (and bugs!)

Now, this is how high-performance servers end up being structured. If you're currently testing with 4 players, chances are that you don't need to implement this structure. You could get away with a single thread for a long while! And, once you start adding threads, adding one thread per subsystem (collision detection, networking, disk I/O, interest management, ...) is generally easier to debug and optimize than trying to add one thread per user, where each thread can potentially "do all the things" and the number of threads is not bounded or even managed.
enum Bool { True, False, FileNotFound };

#9 PranavSathy   Members   -  Reputation: 122


Posted 25 April 2014 - 10:34 AM

Wow, I did not even know how this works; learning something new every day. Sounds like a fun challenge! Thanks for all the help; can't wait to see the performance difference once classes are over today. I suppose after this, my team and I will have a discussion about some of the game mechanics we would like to see in the game, and then about how to use all of your advice to restructure the way we are thinking about our server. But I understand now that the best way to go about it is a manageable and bounded number of threads dedicated to certain tasks, which do not wait on the same hardware and use async I/O to the best of their capabilities, plus a queue-based system for passing tasks to threads. I will post back here once I see how well it worked out. Thanks so much!

 

Just out of curiosity, actually: how would one go about not locking the queues as they are being read from and added to in different threads? Isn't that dangerous? At least from my admittedly rudimentary understanding of threads, mutexes are required for synchronized communication, but I should be going for async. However, accessing the same memory in two places at the same time is dangerous, no?


Edited by PranavSathy, 25 April 2014 - 10:35 AM.


#10 samoth   Crossbones+   -  Reputation: 5494


Posted 25 April 2014 - 12:48 PM

For a FPS game with smallish levels and smallish number of players (such as Quake) sending a single packet with information about all players for each network tick is totally fine, simple, and will perform well.

Well yes, from a pure performance point of view, it's OK for a Quake-style of game (not so for something much bigger, though).

 

But my point about knowledge remains. In a game where several people compete, it can be troublesome to provide information to people that they actually can't know. Such as those shoot-through-wall aimbots, or other things. Imagine someone implements a minimap where enemies show as little dots (and nobody using the genuine client has such a mini-map). Imagine knowing what weapon, how much armour, and how much health your opponents have (when you shouldn't!), and where they hide. Imagine knowing which door they'll come through before they know (because you can "see through" the door).

No player should ever know the whole world, normally. Not unless it doesn't matter anyway.


Edited by samoth, 25 April 2014 - 12:48 PM.


#11 AllEightUp   Moderators   -  Reputation: 4413


Posted 25 April 2014 - 04:25 PM

Quote
WinIOCP, ePoll or KEvent
This is an excellent tip for TCP, but less for UDP.  With TCP, you have many sockets in an "unknown" state, but you can only do one thing at a time, so you need some special magic that tells you which one is ready so you can read from (or which overlapped operation has just finished).
 
Using UDP, you have a single socket, no more. And that single socket either has something to read, or it doesn't. Instead of blocking on a function that tells you "it's now ready" and then calling recvfrom, you can just as well block on recvfrom, which will conveniently return when something came in. Then push that packet on a queue (making the processing someone else's problem!) and immediately call recvfrom  again.


Actually, it applies equally well to both TCP and UDP. From the OS side of things it doesn't matter whether it is TCP or UDP, multiple sockets or a single socket: these systems just generate events which wake a thread to pull the data. Basically, what you are describing with waiting on recvfrom and then pushing to a queue is exactly what the underlying OS async solutions would be doing for you. The benefit, even with UDP, is that you can throw more threads at the OS solution to wait on work without writing the queue portion yourself. Additionally, in WinIOCP at least, you bypass much of the normal user-space buffering of packet data; instead, the data is written directly to your working buffers.

This is getting into fairly arcane and difficult coding areas, but the general point is that the OS-level interaction is effective in both cases. In fact, I tend to think that for UDP these systems are even more effective, since unlike TCP you will get more, smaller events, so in the long run the more efficient use of resources adds up to a major win.

#12 hplus0603   Moderators   -  Reputation: 6666


Posted 25 April 2014 - 07:10 PM

how would one go about not locking the queues as they are being read from and added to in different threads


In the simplest case, just use a lock to protect the queue, and use a linked list or the like to queue your items.
Because the queue is only locked for a very short amount of time (to enqueue or dequeue a single item,) there is no real risk of contention being a problem.
If you want to get fancier, look into "lockless FIFOs," which can be implemented incredibly cheaply, as long as you have only a single reader and a single writer per queue (which is actually the common case.)
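A minimal single-producer/single-consumer ring buffer along those lines might be sketched like this (a sketch; it is lock-free only because exactly one thread pushes and exactly one thread pops, and the capacity must be a power of two):

```cpp
#include <atomic>
#include <cstddef>

// SPSC ring buffer: the producer is the only writer of head_, the consumer
// the only writer of tail_, so no lock is needed. Indices grow forever and
// are masked into the array (N must be a power of two).
template <typename T, size_t N>
class SpscQueue {
public:
    bool push(const T& v) {                       // producer thread only
        size_t h = head_.load(std::memory_order_relaxed);
        if (h - tail_.load(std::memory_order_acquire) == N) return false; // full
        buf_[h & (N - 1)] = v;
        head_.store(h + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {                            // consumer thread only
        size_t t = tail_.load(std::memory_order_relaxed);
        if (head_.load(std::memory_order_acquire) == t) return false;     // empty
        out = buf_[t & (N - 1)];
        tail_.store(t + 1, std::memory_order_release);
        return true;
    }
private:
    T buf_[N];
    std::atomic<size_t> head_{0}, tail_{0};
};
```

The acquire/release pairs are what make the hand-off safe: a payload written before the release store to head_ is guaranteed visible to the consumer after its acquire load.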

However, seeing as you're still in school, I *highly* recommend avoiding threads in this server, at least for now. You simply do not need them, until you can show that the game server processing you do really does need multiple cores to achieve performance goals.
And, if you absolutely HAVE to use threads (this might be a mandatory project or something,) I'd highly recommend just using a simple mutex or other light-weight lock (CRITICAL_SECTION in Windows for example) to protect adding to and removing from each queue.
enum Bool { True, False, FileNotFound };

#13 AllEightUp   Moderators   -  Reputation: 4413


Posted 25 April 2014 - 10:51 PM


And, if you absolutely HAVE to use threads (this might be a mandatory project or something,) I'd highly recommend just using a simple mutex or other light-weight lock (CRITICAL_SECTION in Windows for example) to protect adding to and removing from each queue.

 

I completely agree; as I stated also, don't go and thread things until you can justify it. But, as a note, as of Vista a critical section is no longer notably faster than a mutex. Windows moved the first-order checks into user space like all the other OSes, which makes a mutex perform basically the same as a critical section now.



#14 samoth   Crossbones+   -  Reputation: 5494


Posted 27 April 2014 - 08:23 AM

Actually, it applies equally well to both tcp and udp. From the os side of things it doesn't matter if it is tcp or udp, multiple sockets or single sockets, these systems are just going to generate events which will wake a thread to pull the data.

Yes and no. The hardware generates interrupts of course, and the OS eventually wakes a thread, but not necessarily (even not usually) in a 1:1 correlation. However, given poll+receive you have two guaranteed user-kernel-user transitions instead of one.

For TCP that makes sense, since you somehow must multiplex between many sockets; there is not much of a choice if you wish to be responsive. For UDP, it's wasting 10k cycles per packet received for nothing, since there is only one socket and nothing to multiplex. You can just as well receive right away instead of doing another round trip. Same for IOCP, where you have two round trips: one for kicking off the operation, and one for checking completeness.

 

Throwing more threads at the problem doesn't help, by the way. Operating systems even try to do the opposite and coalesce interrupts. The network card DMAs several packets into memory, and sends a single interrupt. A single kernel thread does the checksumming and other stuff (like re-routing, fragment assembly), being entirely bound by memory bandwidth, not ALU. It eventually notifies whoever wants some.

A single user thread can easily receive data at the maximum rate any existing network device is capable delivering, using APIs from the early 1980s. Several threads will receive the same amount of data in smaller chunks, none faster, but with many more context switches.

 

 

Basically what you are describing with waiting on recvfrom and then pushing to a queue is exactly what the underlying OS async solutions would be doing for you. The benefit, even with UDP, is you can throw more threads to wait on work at the OS solution without writing the queue portion yourself.

Yes, this is more or less how Windows overlapped I/O or the GLibc userland aio implementation works, but not traditional Unix-style nonblocking I/O (or socket networking as such). Of course in reality there is no queue at all, only conceptually insofar as the worker thread reads into "some location" and then signals another thread via some means.

 

 

Additionally, in WinIOCP at least, you will bypass much of the normal user space buffering of packet data and instead the data is directly written to your working buffers.

Yes, albeit overlapped I/O is troublesome, and in some cases considerably slower than non-overlapped I/O. I have not benchmarked it for sockets since I deem that pointless, but e.g. for file I/O, overlapped is roughly half the speed on every system I've measured (for no apparent reason). The copy to the userland buffer does not seem to be a performance issue at all, surprising as it is (that's also true for Linux, try one of the complicated zero-copy APIs like tee/splice, and you'll see that while they look great on paper, in reality they're much more complicated and more troublesome, but none faster. Sometimes they're even slower than APIs that simply copy the buffer contents -- don't ask me why).

 

But even disregarding the performance issue (if it exists for overlapped sockets, it likely does not really matter), overlapped I/O is complicated and full of quirks. If it just worked as you expect, without exceptions and special cases, then it would be great, but it doesn't. Windows is a particular piss-head in that respect, but e.g. Linux is sometimes not much better.

Normally, when you do async I/O then your expectation is that you tell the OS to do something, and it doesn't block or stall or delay more than maybe a few dozen cycles, just to record the request. It may then take nanoseconds, milliseconds, or minutes for the request to complete (or fail) and then you are notified in some way. That's what happens in your dreams, at least.

 

Reality has it that Windows will sometimes just decide that it can serve the request "immediately", even though they have a very weird idea of what "immediately" means. I've had "immediately" take several milliseconds in extreme cases, which is a big "WTF?!!" when you expect that stuff happens asynchronously and thus your thread won't block. Also there is no way of preventing Windows from doing that, nor is there anything you can do (since it's already too late!) when you realize it happened.

Linux on the other hand, has some obscure undocumented limits that you will usually not run into, but when you do, submitting a command just blocks for an arbitrarily long time, bang you're dead. Since this isn't even documented, it is actually an even bigger "WTF?!!" than on the Windows side (although you can't do anything about it, at least Microsoft tells you right away about the quirks in their API).

 

In summary, I try to stay away from async APIs since they not only require more complicated program logic but also cause much more trouble than they're worth compared to just having one I/O thread of yours perform the work using a blocking API (with select/(e)poll/kqueue for TCP, and with nothing else for UDP).


Edited by samoth, 27 April 2014 - 08:24 AM.


#15 OandO   Members   -  Reputation: 958


Posted 27 April 2014 - 08:33 AM

 

For a FPS game with smallish levels and smallish number of players (such as Quake) sending a single packet with information about all players for each network tick is totally fine, simple, and will perform well.

Well yes, from a pure performance point of view, it's OK for a Quake-style of game (not so for something much bigger, though).

 

But my point about knowledge remains. In a game where several people compete, it can be troublesome to provide information to people that they actually can't know. Such as those shoot-through-wall aimbots, or other things. Imagine someone implements a minimap where enemies show as little dots (and nobody using the genuine client has such a mini-map). Imagine knowing what weapon, how much armour, and how much health your opponents have (when you shouldn't!), and where they hide. Imagine knowing which door they'll come through before they know (because you can "see through" the door).

No player should ever know the whole world, normally. Not unless it doesn't matter anyway.

 

So have a little flag that tells you whether or not the receiving client can see this player (to tell them if the player should be visible at their end) and don't send the new data about that player if they aren't visible. Problem solved.



#16 Angus Hollands   Members   -  Reputation: 778


Posted 27 April 2014 - 02:26 PM

 

For a FPS game with smallish levels and smallish number of players (such as Quake) sending a single packet with information about all players for each network tick is totally fine, simple, and will perform well.

Well yes, from a pure performance point of view, it's OK for a Quake-style of game (not so for something much bigger, though).

 

But my point about knowledge remains. In a game where several people compete, it can be troublesome to provide information to people that they actually can't know. Such as those shoot-through-wall aimbots, or other things. Imagine someone implements a minimap where enemies show as little dots (and nobody using the genuine client has such a mini-map). Imagine knowing what weapon, how much armour, and how much health your opponents have (when you shouldn't!), and where they hide. Imagine knowing which door they'll come through before they know (because you can "see through" the door).

No player should ever know the whole world, normally. Not unless it doesn't matter anyway.

 

 

 

 

 


So have a little flag that tells you whether or not the receiving client can see this player (to tell them if the player should be visible at their end) and don't send the new data about that player if they aren't visible. Problem solved.

 

 

Indeed, you can still keep it simple. UDK uses relevancy checks for each object when it is replicated to a client, so you could simply set an exclusion radius, and/or perform a line of sight test.



#17 hplus0603   Moderators   -  Reputation: 6666


Posted 27 April 2014 - 06:56 PM

and don't send the new data about that player if they aren't visible. Problem solved.


... and also tell the player when you're going to stop sending updates about the entity, so there are no "dead" entities visible on the player's screen. And re-check each entity against each other entity each time they move (enough) to perhaps change relevancy. And re-send all relevant data about the entities when they are re-introduced by again becoming relevant, because the entity may have changed clothing or spell effects or whatever.

THEN "problem solved." But for games which don't have massive player counts, sending all the entities is typically easier and not less efficient. So do that if your goal is to get to a working game sooner, rather than later.
enum Bool { True, False, FileNotFound };

#18 PranavSathy   Members   -  Reputation: 122


Posted 29 April 2014 - 11:33 PM

Wow, just a few days and so many amazing posts, haha. Anyways, I rewrote my server per the several suggestions here, namely zero threading and a "network tick" which sends comprehensive updates of all players to every client. This has proven to work far better than any previous iteration of the server I had made. We ran 5 clients today and it worked amazingly. Of course, on the internet there will be a lot more latency than on my school's intranet, but we will combat one problem at a time.

 

The next step, I suppose, is that we can finally move forward to implementing the asynchronous functionality that was mentioned here, as well as "lockless FIFOs", and figuring out how many threads we will spawn initially and what each one will handle. But as of right now, the team is delighted to see our avatars running around in virtual space, haha. Thank you so much!





