Concern over input jitter

5 comments, last by Angus Hollands 9 years, 11 months ago

Hi everyone. I recently moved over to a sort of dejitter buffer, which has been working wonderfully in LAN play tests; my previous implementation struggled there because of slight timing problems.


consume_move = self.buffer.popleft

# When we run out of moves, wait till we have enough
buffer_length = len(self.buffer)

# Ensure we don't over fill due to network jitter
buffer_limit = (self.buffer_length + self.buffer_padding)

# If we have 0 moves in the buffer
if not buffer_length:
	print("Waiting for enough inputs ...")
	self.buffer_filling = True

# Prevent too many items filling buffer
# Otherwise we may move to the past slowly and it causes long-term issues
elif buffer_length > buffer_limit:
	print("Received too many inputs, dropping ...")
	for _ in range(buffer_length - self.buffer_length):
		consume_move()

# Clear buffer filling status when we have enough
if self.buffer_filling:
	if len(self.buffer) < self.buffer_length:
		return
	self.buffer_filling = False

	# New debug feature, needs clamping at safe maximum
	# Ping is the RTT, and we add 1 to ensure the time is at least 1 tick
	self.buffer_length += WorldInfo.to_ticks(self.info.ping / 2) + 1

try:
	buffered_move = self.buffer[0]

except IndexError:
	print("Ran out of moves!")
	return

move_tick = buffered_move.tick

# Run the move at the present time (it's valid)
consume_move()

The code above is the update code from the main update function for a player (controller).
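For readers who want to experiment with the buffering logic above outside the game, here is a minimal stand-alone sketch. The class name `DejitterBuffer`, its method names, and the `max_length` cap are assumptions made for illustration (the cap is one way to do the clamping that the debug comment says is still missing), and growth is triggered by the caller here rather than inside the refill branch.

```python
from collections import deque


class DejitterBuffer:
    """Hypothetical stand-alone version of the buffering logic above."""

    def __init__(self, target_length, padding, max_length):
        self.buffer = deque()
        self.target_length = target_length  # moves to hold before consuming
        self.padding = padding              # slack allowed above the target
        self.max_length = max_length        # assumed hard cap on the target
        self.filling = True                 # start by filling the buffer

    def push(self, move):
        self.buffer.append(move)

    def grow(self, extra_ticks):
        """Raise the target (e.g. after an underrun), clamped to max_length."""
        self.target_length = min(self.target_length + extra_ticks,
                                 self.max_length)

    def pop(self):
        """Return the next move, or None while the buffer is refilling."""
        if not self.buffer:
            # Ran dry: wait until the target length is reached again
            self.filling = True

        elif len(self.buffer) > self.target_length + self.padding:
            # Drop the oldest moves so we do not drift into the past
            while len(self.buffer) > self.target_length:
                self.buffer.popleft()

        if self.filling:
            if len(self.buffer) < self.target_length:
                return None
            self.filling = False

        return self.buffer.popleft()
```

With the cap in place, a single large RTT sample can no longer push the target length beyond a value chosen in advance.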

When running over the internet (using my NAT's external IP and connecting from the machine), it isn't happy with 100 ms of dejittering. When I added the simple increment in the "if self.buffer_filling" branch, it settled at around 13 to 16 ticks, which is around 400 ms. Surely this isn't reasonable?

This seems far too high for a) my internet connection and b) most internet connections. I could have reason to suspect my provider as they're not the best in my opinion, but it seems unusual that so many packets are delayed, as they are each sent individually.

I printed out the number of items in the buffer each tick and it would read something like:


13
12
13
12
12
13
13
13
12
12
13
12
11
10
9
8
7
6
5
4
3
2
1
0
No moves!
14
13
12
13
13

Also, I do seem to notice that every so often a packet is dropped. What would be an expected packet loss rate for a 4 Mb/s internet connection with 60 ms latency in the UK?

I'm trying to determine if it is some deeper level network code issue (in my codebase) or just life.

0) Are you using TCP or UDP?
1) Did you turn on TCP_NODELAY?
2) Do you handle the case where you receive a "packet" in multiple separate receives, or receive more than one "packet" in one receive call? TCP is stream based, like a file, not packet based, like structs, so you have to figure out where the end of one packet is and the start of the next packet is yourself -- one send() does not necessarily map to one recv().
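To illustrate point 2, here is a minimal sketch of length-prefixed framing over a byte stream. The 4-byte big-endian length header and the names `frame` and `MessageReader` are assumptions chosen for the example, not anything from the original code; the point is only that the receiver has to reassemble messages itself.

```python
import struct

# Assumed wire format: 4-byte big-endian length, followed by the payload
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length so the receiver can find its end."""
    return HEADER.pack(len(payload)) + payload


class MessageReader:
    """Reassembles whole messages from arbitrary chunks of a byte stream."""

    def __init__(self):
        self._pending = bytearray()

    def feed(self, data: bytes) -> list:
        """Add received bytes and return every message completed so far."""
        self._pending += data
        messages = []
        while len(self._pending) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._pending)
            end = HEADER.size + length
            if len(self._pending) < end:
                break  # payload not fully received yet
            messages.append(bytes(self._pending[HEADER.size:end]))
            del self._pending[:end]
        return messages
```

Each chunk returned by a receive call would be passed to `feed`, which copes with both split and merged messages.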
enum Bool { True, False, FileNotFound };

Sorry, I omitted some rather crucial information.

I use UDP for all network transfers.

From the first post, I have a seemingly normal range of about 1-3 ticks, which seems perfectly acceptable (< 100 ms, and typically less), but there are anomalous drops which cause up to 300 ms of delay. This worries me, as I can't immediately think why my code would cause this, and hence it seems I need to be able to handle such cases.

Every time this happens, the server has to accumulate n moves before continuing simulation for that client, meaning that they are corrected on the client.

Yes, when using UDP, you have to account for random packet loss, delay, and re-ordering. This does happen in the wild on the internet, sometimes more than other times. WiFi is worse than wired; different ISPs have different levels of quality; even rainstorms or solar flares may affect various networks.

So, with that in mind, what might be my best solution? It seems somewhat unusual that the delay delays all successive packets, even though they should be sent along potentially different routes across the network.

The issue with arbitrarily delaying commands is that the delay creates noticeable lag when projectiles are spawned (they are not predicted) and widens the window within which a player may shoot another player yet be killed by them before their own commands are processed. So it is sensible to delay commands as little as possible, but enough to protect against typical connection jitter. To do this, I'm defining a base dejitter time and allowing the server to grow the buffer for individual clients up to a maximum of 300 ms total delay. However, this means that freak, large delays will cause the buffer to grow, when for 80-90% of the time it doesn't need to be so delayed.
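One possible way to keep a single freak delay from growing the buffer is to size the delay from a high percentile of recent lateness samples rather than from the worst case. This is a different technique from the one described above, sketched under assumptions: the class name `AdaptiveDelay`, the window size, and the 90th percentile are all illustrative choices, with only the 100 ms base and 300 ms cap taken from the thread.

```python
from collections import deque


class AdaptiveDelay:
    """Picks a dejitter delay from a percentile of recent lateness samples."""

    def __init__(self, base_ms=100.0, max_ms=300.0, window=200, percentile=0.9):
        self.base_ms = base_ms        # never delay less than this
        self.max_ms = max_ms          # never delay more than this
        self.percentile = percentile  # fraction of samples the delay should cover
        self.samples = deque(maxlen=window)  # sliding window of lateness in ms

    def add_sample(self, lateness_ms):
        """Record how late a move arrived relative to when it was expected."""
        self.samples.append(max(0.0, lateness_ms))

    def delay_ms(self):
        """Return the delay to apply, clamped between base_ms and max_ms."""
        if not self.samples:
            return self.base_ms
        ordered = sorted(self.samples)
        index = min(len(ordered) - 1, int(self.percentile * len(ordered)))
        return min(self.max_ms, max(self.base_ms, ordered[index]))
```

Because old samples fall out of the window, the delay shrinks again once conditions improve, at the cost of occasionally missing moves during rare spikes.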

In essence, I am happy to introduce artificial latency for a command buffer, but:

  1. 300-350 ms seems too high for an unsaturated connection (as the server and client are not locally saturated)
  2. I wonder whether network conditions are really what is primarily to blame.

Another confusing thing is that the send and receive rates on the server and client do not roughly mirror each other, even though both have only one peer.

Are there any resources on the subject I might visit? I cannot think of a clean way of handling this, and at the moment it is not enjoyable to play with.

For UDP, delays can be caused by many things. For example:

- thread scheduling delays in the receiving or sending programs

- prioritization/buffering in any equipment between sender and receiver

- wireless resend

- other use of available up/downlinks (Windows Update, BitTorrent, Dropbox, etc.)

To start looking into this, your best bet is to hook a network analyzer of some sort as close as possible to your uplink on each side. A Linux laptop with tcpdump, or a Windows laptop with Wireshark would probably work, for example. For best effect, use one that has two Ethernet ports and works as the gateway for all devices on the internal link.

To track down whether the delay is in threading/scheduling, you can use Wireshark or tcpdump on the local machine and try to correlate the received packets with receive times within the application.
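On the application side, a small log of arrival times makes that correlation easier. The sketch below is an assumption about how one might do it: `ArrivalLog` and its fields are hypothetical names, and the clock defaults to wall-clock `time.time` so the recorded timestamps can be lined up against capture timestamps.

```python
import time


class ArrivalLog:
    """Records when packets reach the application and how late each gap was."""

    def __init__(self, expected_interval_s, clock=time.time):
        self.expected = expected_interval_s  # nominal time between packets
        self.clock = clock                   # injectable for testing
        self.last_arrival = None
        self.records = []  # (sequence, arrival_time, gap minus expected gap)

    def on_packet(self, sequence):
        """Call as soon as the application reads a packet from the socket."""
        now = self.clock()
        if self.last_arrival is not None:
            excess = (now - self.last_arrival) - self.expected
            self.records.append((sequence, now, excess))
        self.last_arrival = now

    def worst(self, count=5):
        """Return the records with the largest excess gaps, worst first."""
        return sorted(self.records, key=lambda r: r[2], reverse=True)[:count]
```

If a gap shows up here but not in the capture on the same machine, the delay is most likely inside the application (scheduling or a blocked receive loop) rather than on the network.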


I have spent some time addressing the issue. Because I changed several things along the way, exactly what was wrong isn't entirely clear, but I have a more usable solution now.

I've written a JitterBuffer class that is a fixed-length (scrolling window) container. It has two callbacks, one that gets the id of an item, and the other that will find the previous item. I send two moves at a time and therefore if the buffer cannot find a move (because the packet is dropped, which was happening a number of times per minute) it looks up the next move's previous move. This does two things, firstly ensuring I have the move at simulation time, and secondly avoiding a slowly decrementing number of items in the buffer (which leads to a large delay whilst we fill the buffer back to 100ms buffering).
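A rough sketch of the recovery idea, not the actual class: it keeps the two callbacks described above but omits the fixed-length window for brevity. The `Move` tuple, the integer tick ids, and the method names are assumptions made for the example.

```python
from collections import namedtuple

# Hypothetical move type: each packet carries a move plus the one before it
Move = namedtuple("Move", ["tick", "inputs", "previous"])


class JitterBuffer:
    """Stores items by id and recovers a missing one from its successor."""

    def __init__(self, get_id, get_previous):
        self.get_id = get_id              # callback: item -> integer id
        self.get_previous = get_previous  # callback: item -> previous item
        self.items = {}

    def insert(self, item):
        self.items[self.get_id(item)] = item

    def pop(self, item_id):
        """Return the item for item_id, falling back to the next item's copy."""
        item = self.items.pop(item_id, None)
        if item is None:
            # The packet was lost: ask the following item for its predecessor
            successor = self.items.get(item_id + 1)
            if successor is not None:
                item = self.get_previous(successor)
        return item
```

A buffer for moves could then be created as `JitterBuffer(lambda m: m.tick, lambda m: m.previous)`; a lost move is recovered as long as the packet after it arrives in time.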

At the moment snapshots are sent every network tick, whilst RPC calls are sent every tick if available (batched together in one packet). I recognise that this incurs an overhead (50 bytes) per tick (on the client), but it is useful because I tend to lose one packet alone (rather than several, but this isn't the important point), and losing 1/60 s of information is more recoverable than 1/network_tick_rate's worth (1/25 s).
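The trade-off can be put into numbers with a small calculation, assuming 60 simulation ticks per second and 25 network ticks per second as the figures above suggest, and treating the 50 bytes as a fixed per-packet cost. The function names are made up for the example.

```python
def overhead_bytes_per_second(header_bytes, packets_per_second):
    """Fixed per-packet overhead summed over one second of sending."""
    return header_bytes * packets_per_second


def loss_window_ms(packets_per_second):
    """Amount of input time (in ms) that a single lost packet covers."""
    return 1000.0 / packets_per_second


# Assumed rates: 60 Hz simulation tick, 25 Hz network tick
per_tick = overhead_bytes_per_second(50, 60)
per_network_tick = overhead_bytes_per_second(50, 25)
print(per_tick - per_network_tick, "extra bytes per second")
print(round(loss_window_ms(60), 1), "ms lost per packet at 60 Hz")
print(round(loss_window_ms(25), 1), "ms lost per packet at 25 Hz")
```

Under those assumptions, sending every tick costs a few extra kilobytes per second but cuts the input lost with each dropped packet by more than half.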

So, in essence I believe the largest problem was the loss of moves due to packet loss, which did not leave enough time for them to be reliably re-sent. I cannot account for freak heavy packet loss, but there's little one can do there, and it seems to be running a lot smoother already. What was interesting is that it was difficult to discern the problem because there were a number of issues interwoven.

