Lockstep RTS: 90% done, 90% to go


The good news is that I have multiple instances of my game successfully running in lockstep across the loopback interface. Each instance runs in a small 400x300 window that allows me to issue commands as any player I want. I can select a few of one player's ships and give them a move order, and a very short time later the ships are moving on all the other players' screens.

The bad news is that each game stalls fairly frequently. I'm still analyzing the data, but I'm having trouble coming up with any decent theories. The game instances just can't seem to communicate fast enough to keep ahead of the simulation updates. I'd like to hear any general advice or thoughts on what my problem might be. I doubt what I've said so far will be very useful, so I'll post a concise summary of my logic below.

First, some background: my game is being developed in Python with PyGame and PyOpenGL. I'm using Python's socket module for my low-level communication. This will eventually be replaced with something like enet or RakNet, but for now plain sockets let me easily send messages between players via TCP/IP. I'm working on a 1.6 GHz laptop; running two instances of my app yields frame rates of roughly 80 FPS. Ping is usually around 10 ms, but I've seen values as high as 20 ms across the loopback. The game attempts to update the world every 50 ms (i.e. 20 "ticks" per second), and player commands are scheduled two ticks in the future.

Without further ado, here is my main loop, with special emphasis on the lockstep networking code:
-------------------------------------

receive messages from other players over network

process local input and create game commands

sysTime = store local system time

if local input commands have been created:
    schedule commands in the local world simulation for world tick + 2
    send commands to other players

if player is host:
    tickLength = calculate best tick length that can be achieved by each player (looks at pings and frame rates)
    send "all ready" message to all players; the message includes the calculated tick length

if sysTime > nextTickTime:
    nextTickTime = sysTime + tickLength
    stalled = lastAllReadyTick < world tick

    if not stalled:
        if local player is a client and hasn't sent a "ready" message for world tick + 2:
            bestTickLength = highest average ping value to any other player OR local frame length (whichever is greater)

        update world simulation
        world tick += 1

render graphics
-------------------------------------
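A minimal Python sketch of the tick-gating part of that pseudocode (the class, method names, and constants here are illustrative placeholders, not the actual game code):

```python
import time

TICK_LENGTH = 0.05      # 50 ms per simulation tick
COMMAND_DELAY = 2       # commands execute two ticks in the future

class LockstepLoop:
    def __init__(self):
        self.world_tick = 0
        self.next_tick_time = time.time()
        self.last_all_ready_tick = 0
        self.scheduled = {}              # tick -> list of commands

    def schedule_command(self, cmd):
        # Commands are delayed so every player can receive them in time.
        target = self.world_tick + COMMAND_DELAY
        self.scheduled.setdefault(target, []).append(cmd)

    def try_tick(self, now):
        if now < self.next_tick_time:
            return False                 # not time for a tick yet
        self.next_tick_time = now + TICK_LENGTH
        if self.last_all_ready_tick < self.world_tick:
            return False                 # stalled: waiting on peers
        for cmd in self.scheduled.pop(self.world_tick, []):
            cmd()                        # apply commands due this tick
        self.world_tick += 1
        return True
```

Note that, like the pseudocode above, this anchors the next tick to the current time (`now + TICK_LENGTH`) rather than to the previous deadline; that detail becomes important later in the thread.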

Share on other sites
Kylotan    9981
So, are you basically counting the game as 'stalled' if you don't get a response within the 50ms? My concentration span's too short to read all your code and explanations to try and work out what you mean, I'm afraid.

How much data do you send each time? And have you tried "mySocket.setsockopt(IPPROTO_TCP, TCP_NODELAY, 1)"?
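With Python's socket module that call looks like the following (a minimal sketch; the socket itself is just a placeholder and is never connected):

```python
import socket

# Create a TCP socket and disable Nagle's algorithm, so small
# command packets are sent immediately instead of being coalesced.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect (non-zero means enabled).
assert sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```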

Share on other sites
The game is "stalled" if a response isn't received within 100ms, since I'm delaying the commands by two ticks.

I'll give TCP_NODELAY a shot.

Share on other sites
I can't tell if it made a difference or not. I'm still getting stall messages. I'll analyze the log messages some more tonight after work and see if I can figure anything out.

Share on other sites
ShynDarkly    133
If you're only sending small amounts of data relatively infrequently over TCP, then even with Nagle coalescing disabled you could be suffering from the way most stacks now utilise delayed ACKs. As insane as it sounds, see if the stalls happen less often if you increase the amount of data you're sending.
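One way to experiment with that is to pad each message up to a fixed minimum size, using a length prefix so the receiver can recover the real payload (a sketch; `frame_message`, `unframe_message`, and `MIN_PAYLOAD` are invented names, not part of the poster's code):

```python
import struct

MIN_PAYLOAD = 64   # vary this to see whether larger sends reduce stalls

def frame_message(payload: bytes) -> bytes:
    # Length-prefix the real payload, then zero-pad it to a fixed
    # minimum so each send is large enough to sidestep delayed ACKs.
    padded = payload.ljust(MIN_PAYLOAD, b"\x00")
    return struct.pack("!I", len(payload)) + padded

def unframe_message(data: bytes) -> bytes:
    # Read the length prefix, then strip the padding.
    (length,) = struct.unpack("!I", data[:4])
    return data[4:4 + length]
```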

Share on other sites
hplus0603    11356
If you are running multiple games on a single CPU, OS scheduling (of threads and processes) may come into play. The Windows pre-emption quantum can be as big as 150 milliseconds, depending on system settings.

Try adding a Sleep(10) after processing each game tick, in each client. This will reduce frame rate, but will probably improve responsiveness.

Last, for play over the internet, a two-step delay between command and action is not enough for practical play. I would suggest counting on at least a 250 ms latency for robustness.
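In Python that would be a `time.sleep(0.010)` at the end of each pass through the loop (a sketch; `run_frame` and its callbacks are placeholders for the real tick and render steps):

```python
import time

def run_frame(process_tick, render):
    process_tick()
    render()
    # Yield the CPU briefly so other local game instances get scheduled;
    # 10 ms is a starting point, not a tuned value.
    time.sleep(0.010)
```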

Share on other sites
Thanks for the ideas, guys. I'll try them all tonight.

I suppose I should have mentioned that my laptop is a dual-core Turion 64. How would that affect scheduling CPU time slices?

Share on other sites
Ozymandias43    158
It'd probably help, unless they both end up getting scheduled onto the same core. Do you have any problems if just two copies are playing? If you hook it up to a second computer, do you have any problem with two instances on the one and a third on the other?

Share on other sites
Kylotan    9981
You're running a PyGame app; are you emptying the event queue every single frame (as you should be)?

Share on other sites
Quote:
 Original post by Kylotan
You're running a PyGame app; are you emptying the event queue every single frame (as you should be)?

I believe so. At the start of the main loop, I have a large "for event in pygame.event.get():" loop. That should clear the event list, right?

I'm starting to think I have a fundamental flaw in my logic. The host instance on my laptop was stalling, even when the client instance was running on my desktop machine. Tick length was set to 100ms, and command delay was set to 5. Stalls seemed to occur less frequently, but they still happened fairly regularly.

Another idea I just had was to schedule ticks based on the scheduled tick time + tick length. Right now, I'm scheduling the next tick based on the current system time. If tick X was scheduled to occur at time 1000, it might not actually be processed until 1020, depending on how quickly the main loop is running. Tick X + 1 would then be scheduled for 1120, which significantly disrupts the "one tick every 100ms" rule. If I scheduled the next tick time for 1000 + 100, I would get a more consistent tick rate on all machines, which might help eliminate the stalls.
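That change amounts to advancing the deadline by a fixed increment instead of re-anchoring it to the current time. The two policies can be sketched as follows (hypothetical helper names, using the 100 ms figures from the example):

```python
# Drifting version: a tick that runs 20 ms late pushes every later
# tick back by 20 ms, so the "one tick every 100 ms" rule erodes.
def schedule_from_now(now, tick_length):
    return now + tick_length              # processed at 1020 -> next at 1120

# Fixed version: deadlines stay on the 100 ms grid even when a tick
# is processed late, so all machines keep a consistent tick rate.
def schedule_from_deadline(scheduled_time, tick_length):
    return scheduled_time + tick_length   # scheduled for 1000 -> next at 1100
```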

I didn't try padding my messages with extra data, so I'll try that tonight, too.

- Mike

Share on other sites
Quote:
 Original post by doctorsixstring
I believe so. At the start of the main loop, I have a large "for event in pygame.event.get():" loop. That should clear the event list, right?

Actually, that isn't quite right. Here is my main loop:

	while not quit:
		timer.update()
		input.update()
		if net != None:		# net object is None in single-player games
			net.run()
		for player in players:
			player.update()
		currentState.update()
		scheduler.update()
		console.update()
		renderGraphics()

1) net.run() checks for incoming messages from the network.

2) currentState.update() loops through pygame.event.get().

3) scheduler.update() processes the lockstep logic and updates the world simulation.

Would the order of that logic matter? I'm not sure if there is a problem with doing local input processing between receiving network messages in net.run() and processing the lockstep logic in scheduler.update(). It all needs to be processed either way, right?

Share on other sites
Quote:
 Original post by doctorsixstring
Another idea I just had was to schedule ticks based on the scheduled tick time + tick length. Right now, I'm scheduling the next tick based on the current system time. If tick X was scheduled to occur at time 1000, it might not actually be processed until 1020, depending on how quickly the main loop is running. Tick X + 1 would then be scheduled for 1120, which significantly disrupts the "one tick every 100ms" rule. If I scheduled the next tick time for 1000 + 100, I would get a more consistent tick rate on all machines, which might help eliminate the stalls.

This appears to have made a big difference. I've been testing on two separate machines with 50 ms ticks and a command delay of 5, and I've eliminated pretty much all stalling. A command delay of 2 shows minimal stalling. I currently have both tick length and command delay stored in a config file, and I'll probably keep it that way. Maybe I could even add a feature where the scheduler automatically scales the command delay the way it already scales the tick length.

I've still got some other work to do with my multiplayer code, but this has helped a lot. Thanks for all the ideas, guys!

- Mike