Recovering from a login server crash

7 comments, last by KnolanCross 11 years, 6 months ago
I have an issue with my login server and I'm not sure how to go about fixing it.
My architecture supports multiple login servers accessing the same database. To prevent double login of the same account, my account table has a flag that says whether the account is logged on. When the user disconnects, the flag is set back to false. I do all of this in a stored procedure so it happens in one database call. It's worth noting that my game does not support "force login", where a new login kicks out the currently logged-in player.

Now, if my login server happens to crash for whatever reason, the accounts stay flagged as logged on and I have to fix the flags manually. It's even worse if my other login server is still up, because I can't tell which accounts were on the crashed server.

One solution I thought of was to add a time token that the login server updates every x minutes (say 5 minutes). If, on a login attempt, the login flag is set to true but the time token is more than x minutes old, I consider the account to be logged out. I feel this would work, but it opens the possibility of a double login if for some reason the token isn't updated when it is supposed to be. It also adds a few database calls.
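A minimal sketch of that time-token idea, assuming a table named `account` with `logged_on` and `last_seen` columns (all names here are hypothetical, and SQLite stands in for the real database and stored procedure). The login check is a single conditional UPDATE, so the test and the flag change still happen in one database call:

```python
import sqlite3

LEASE_SECONDS = 300  # the "x minutes" from the post; 5 minutes here (assumption)


def make_db():
    # In-memory stand-in for the shared account database.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "logged_on INTEGER NOT NULL DEFAULT 0, "
        "last_seen REAL NOT NULL DEFAULT 0)"
    )
    return db


def try_login(db, name, now):
    # Succeeds only if the account is logged off, or its time token is
    # older than the lease (i.e. its login server stopped refreshing it).
    cur = db.execute(
        "UPDATE account SET logged_on = 1, last_seen = ? "
        "WHERE name = ? AND (logged_on = 0 OR last_seen < ?)",
        (now, name, now - LEASE_SECONDS),
    )
    db.commit()
    return cur.rowcount == 1


def heartbeat(db, name, now):
    # Called periodically by the login server that owns the session.
    db.execute(
        "UPDATE account SET last_seen = ? WHERE name = ? AND logged_on = 1",
        (now, name),
    )
    db.commit()
```

Refreshing the token several times per lease period (for example every minute against a 5 minute lease) would make a single missed update much less likely to cause a double login, at the cost of more database calls.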

Thanks,
- Chindril
Why not a keep alive tracking on the login servers themselves?

Generally, duplicate logins are not allowed. For each login, you hand out a ticket, and every subsequent client transaction is checked against that ticket. When a new login occurs, the ticket changes, and if a client with an expired ticket attempts to negotiate, that particular client gets kicked out.
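Roughly, the ticket scheme could look like the sketch below. `TicketRegistry` and its methods are made-up names for illustration, and an in-memory dict stands in for wherever the tickets would really be stored:

```python
import secrets


class TicketRegistry:
    """Keeps exactly one valid ticket per account (hypothetical helper)."""

    def __init__(self):
        self._current = {}  # account name -> currently valid ticket

    def login(self, account):
        # Issuing a new ticket implicitly invalidates the previous one.
        ticket = secrets.token_hex(16)
        self._current[account] = ticket
        return ticket

    def is_valid(self, account, ticket):
        # Every client transaction presents its ticket; a stale one
        # means the account has since logged in somewhere else.
        current = self._current.get(account)
        return current is not None and secrets.compare_digest(current, ticket)
```

With this layout a crashed login server needs no cleanup: the next login simply overwrites the old ticket.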

Everything is better with Metal.

From what I understand of your solution, if Player A logs in with account Bob, and 5 minutes later Player B also logs in with account Bob, then Player A gets kicked. This is a big no-no. Player B needs to receive a "Player already logged on" message.
I find that kicking out the previous login is generally much more robust than trying to favor the "first" login. If you try to log in from more than one place, whoever logged in last wins. Also, if several players try to share the same account, whoever logs in last wins, which is great for developers: a shared account means interrupted game sessions, which means more incentive not to share accounts.

If you have to support "first login wins" then you need to also store the location (game server) of that login, and when you find a clash, you have to separately contact that game server, verify whether a client is still connected, and if not, allow an override of the login from the new location. That can race with other simultaneous login requests, btw. The complexity and risk of races means that it's generally less robust.

enum Bool { True, False, FileNotFound };

From what I understand of your solution, if Player A logs in with account Bob, and 5 minutes later Player B also logs in with account Bob, then Player A gets kicked. This is a big no-no. Player B needs to receive a "Player already logged on" message.


You can mitigate that. When B wants to log in, you need to check whether A's login is still valid.

That means A is still connected to its login server, and also that the login server itself is alive (i.e. not crashed). Otherwise, you can't tell when A is no longer logged on. Keep in mind that the login server may crash and then recover later.

The "alive" part is straightforward: player A sends "I'm alive" messages to its login server at regular intervals, and the login server does the same with the database / login coordinator server.

There's a lot of logic you can run. The main thing is that the database / coordinator needs to know whether A's login server is still alive. The login server, not the database / coordinator, should be the only thing responsible for keeping track of A's logged-on status.

The login server may be down. In that case you either kick A and let B through, or stall B until the login server has recovered and can give a yes or no about the status of A.
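As a rough sketch of the coordinator side of this (class, method, and enum names are all invented for the example), the coordinator only tracks which login server owns an account and when that server last reported in. Whether A is actually still connected remains the login server's call:

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"              # nobody holds the account
    ASK_SERVER = "ask_server"    # owner server is alive; it knows A's status
    SERVER_DOWN = "server_down"  # policy choice: kick A, or stall B


class Coordinator:
    def __init__(self, timeout):
        self.timeout = timeout      # seconds without a heartbeat = down
        self.server_last_seen = {}  # server id -> time of last heartbeat
        self.account_server = {}    # account -> owning login server id

    def server_heartbeat(self, server_id, now):
        self.server_last_seen[server_id] = now

    def record_login(self, account, server_id):
        self.account_server[account] = server_id

    def record_logout(self, account):
        self.account_server.pop(account, None)

    def check_login(self, account, now):
        server_id = self.account_server.get(account)
        if server_id is None:
            return Decision.ALLOW
        last = self.server_last_seen.get(server_id)
        if last is not None and now - last <= self.timeout:
            return Decision.ASK_SERVER
        return Decision.SERVER_DOWN
```

What to do on `SERVER_DOWN` is exactly the choice described above: let B through, or hold B until the server comes back.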

Everything is better with Metal.


This is a big no-no. Player B needs to receive a "Player already logged on" message.
No online services that I know of do this.

You need to think very carefully, because a "first player wins" can leave a single user locked out of their *own* account. For example, if the user's computer were to crash, they cannot log out of their original session, and it may take some time for the current server to register that the player is no longer active (depending on the type of game).

Worse, if the player accidentally leaves the game open in one location (e.g. a desktop at home), they may be locked out from playing the game on the go using their phone/tablet/laptop - unless your game also promptly kicks "idle" players (which can lead to a different kind of problem).
If you have to support "first login wins" then you need to also store the location (game server) of that login, and when you find a clash, you have to separately contact that game server, verify whether a client is still connected, and if not, allow an override of the login from the new location. That can race with other simultaneous login requests, btw. The complexity and risk of races means that it's generally less robust.

I have to do this either way. In my previous example, when Player B logs in, I need to get Player A's login server to forcefully disconnect him. Therefore it really makes no difference which method I use.
when Player B logs in, I need to get Player A's login server to forcefully disconnect him

You could defer that until the player A service later checks back with the database, and notices that the player is now on another server, and thus should be kicked.
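A small sketch of that deferred kick, with invented names (`SessionDirectory` stands in for the database, `ServiceServer` for whichever server holds the session). Nothing has to contact Player A's server at login time; it notices on its next database round trip:

```python
class SessionDirectory:
    """Stand-in for the database: records which server owns each account."""

    def __init__(self):
        self._owner = {}

    def claim(self, account, server_id):
        self._owner[account] = server_id  # last login wins

    def owner(self, account):
        return self._owner.get(account)


class ServiceServer:
    def __init__(self, server_id, directory):
        self.server_id = server_id
        self.directory = directory
        self.connected = set()

    def login(self, account):
        self.directory.claim(account, self.server_id)
        self.connected.add(account)

    def checkpoint(self, account):
        # Runs whenever this server next talks to the database for the
        # account (saving state, etc.). If another server now owns the
        # account, drop the local session instead of saving.
        if self.directory.owner(account) != self.server_id:
            self.connected.discard(account)
            return False
        return True
```

The trade-off is that the old session lingers until its next check, so the check interval bounds how long both sessions can overlap.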

Btw: I like splitting servers into "game" servers (with simulation) and "service" servers, which are basically non-real-time application servers that talk to the database. Thus, things like "checkpoint player state" or whatever would live on the application server/s, just like "login," so there would be more opportunity to notice that a player session is no longer valid.

Also, that way, the simulation servers never talk directly to a database (or file system), and can thus be entirely non-blocking even if your database API is not. That assumes you have non-blocking HTTP or something similar for game -> application server calls, but that's easy to write if you don't have it.
enum Bool { True, False, FileNotFound };
I would tell you to implement the "second login kicks the former player" approach, but if you really want to keep your current login policy you will probably need to implement two strategies: a fault handler in the servers and a monitor process.

I believe those are the steps:
1) Add an exception (or signal) handler so the login server won't simply die without marking its players as logged off.
2) Add a column to the player table to hold the pid of the login server process the player logged in through (plus the server name or some other id, if you have multiple machines).
3) The login server must create a file containing its pid (and server id, if needed) before it starts accepting connections.
4) When a login server exits normally (or after handling an exception), it should unlink the file it created.
5) Create a monitor process that checks the files created by the login servers. If a login server died for some reason (and the exception couldn't be handled), the file will still exist but the process won't be running. The monitor process should look up the players that were logged in through the dead server and mark them as not logged on. After all files are checked, the monitor process sleeps for a few seconds, then repeats.

I believe this way is thread-safe and will work even if one of the servers shuts down due to a hardware problem, an OS crash, or the host losing power.
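The monitor's check in step 5 might look something like the sketch below. It assumes one file per login server named `<server id>.pid` in a shared directory (a made-up convention), and uses the POSIX trick of sending signal 0 to test whether a pid exists:

```python
import os
from pathlib import Path


def pid_is_alive(pid):
    try:
        os.kill(pid, 0)  # signal 0 sends nothing; it only checks existence (POSIX)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def find_dead_servers(pid_dir, is_alive=pid_is_alive):
    """Return the server ids whose pid file is present but whose process is gone."""
    dead = []
    for pid_file in sorted(Path(pid_dir).glob("*.pid")):
        try:
            pid = int(pid_file.read_text().strip())
        except (OSError, ValueError):
            continue  # file vanished or is half-written; retry on the next pass
        if not is_alive(pid):
            dead.append(pid_file.stem)
    return dead
```

For each id returned, the monitor would clear the logged-on flag of that server's players and unlink the stale file, then sleep and repeat. Two caveats: the signal 0 check only sees processes on the local host, so this assumes the monitor runs on the same machine as the login servers it watches, and a recycled pid can make a dead server look alive.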

Currently working on a scene editor for ORX (http://orx-project.org), using kivy (http://kivy.org).
