alnite

Managing connected clients

I have a TCP server listening, and hundreds of clients connecting to the server.

Whenever a client connects, it sends a unique ID (say "123") to server "X". Server "X" authenticates the ID against the database and, if it checks out, stores an entry in the cache indicating that "123" is connected to server "X". This lets me run many server instances and still know which client is connected to which server.

 

When a client disconnects, the server removes the entry from the cache.

 

Now I am running into a race condition in how these events reach the cache when a client connects and disconnects repeatedly: the client can end up shown as disconnected on the server side even though it is connected. Detecting the disconnect on the old connection can take several minutes (timeouts, flushing and closing the TCP stream, etc.), so the new connection's entry is posted first, only to be removed later when the old connection finally closes.

 

How do I handle this flow so that the client's connected status is stored correctly?

Edited by alnite

I think your design is inherently unreliable here. For example, what if the client connects to a server, the server marks the client as connected, and then the server crashes? Nobody is there to clean up the database.

Also, what's to prevent some random person on the internet from connecting and sending "I'm client 101!", which the server will then approve? Or, if it doesn't, that person reconnects and says "I'm client 102!" -- repeat until success.

However, we can't recommend specific solutions until we know what you want to use this connection management for. For example, if what you're trying to do is "presence," as in "so-and-so is online" in a chat client, that's actually one of the hardest features to scale! A database won't give you the right options there -- it's not at all the right choice.


Everything protected with timeouts...

"Keep alive" messages sent frequently from client to server (maybe 3 per connection timeout period?).

As someone else said: ACK (acknowledge) messages back to the client, to indicate that the server is active (initially) and got the connect message.

A 3-way handshake protocol (for something as important as connect/disconnect):

Msg          ------------------>
             <------------------  Msg_ACK
Msg_ACK_ACK  ------------------>

Usually the disconnect request should also be acknowledged with a 3-way handshake, with a resend after timeout on the client/server side, just to make it unambiguous.

Disconnects can happen from either side (e.g. the server going down), so use similar handshake protocols for both client-initiated and server-initiated disconnects.
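
To make the handshake-with-resend idea concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: `send_with_ack`, `LossyPeer`, and the `_ACK` / `_ACK_ACK` message suffixes are hypothetical names, and a reply of `None` stands in for "the timeout expired with no answer". A real implementation would sit on top of a socket with an actual timer.

```python
from typing import Callable, List, Optional


def send_with_ack(send: Callable[[str], Optional[str]], msg: str,
                  max_attempts: int = 3) -> bool:
    """Send msg, resend on timeout, and confirm the peer's ACK.

    `send` is a hypothetical transport hook: it returns the peer's reply,
    or None when the reply timed out.
    """
    for _ in range(max_attempts):
        reply = send(msg)
        if reply == msg + "_ACK":
            # Third leg: tell the peer we saw its acknowledgement.
            send(msg + "_ACK_ACK")
            return True
        # No (or unexpected) reply: fall through and resend.
    return False


class LossyPeer:
    """Stand-in for the remote side that drops the first few messages."""

    def __init__(self, drop_first: int) -> None:
        self.drop_remaining = drop_first
        self.received: List[str] = []

    def __call__(self, msg: str) -> Optional[str]:
        if self.drop_remaining > 0:
            self.drop_remaining -= 1
            return None  # simulate a lost message / timeout
        self.received.append(msg)
        if msg.endswith("_ACK_ACK"):
            return None  # final leg needs no reply
        return msg + "_ACK"
```

With `LossyPeer(2)` the first two sends are lost and the third attempt completes the handshake; if every attempt is lost, the caller gets `False` and can treat the peer as unreachable.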

Edited by wodinoneeye


The heartbeat from the client (or from the server on behalf of the client) is only half the equation. The other half is a dedicated process running on the database server that can occasionally connect to the database and time out clients, and potentially perform other maintenance tasks. The exact implementation depends on the application, but at the very least you need those two elements.
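
A rough sketch of those two elements in Python, assuming an in-memory table in place of the real database. `PresenceTable`, `heartbeat`, and `reap` are hypothetical names, and timestamps are passed in explicitly so the logic is easy to follow; a real maintenance process would read the clock and run `reap` on a schedule.

```python
from typing import Dict, List


class PresenceTable:
    """Tracks the last heartbeat per client and expires silent ones."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}  # client_id -> last heartbeat time

    def heartbeat(self, client_id: str, now: float) -> None:
        # Element 1: each heartbeat refreshes the client's timestamp.
        self.last_seen[client_id] = now

    def reap(self, now: float) -> List[str]:
        # Element 2: the periodic sweep removes clients that went quiet
        # for longer than the timeout, and reports which ones it removed.
        expired = [cid for cid, seen in self.last_seen.items()
                   if now - seen > self.timeout_s]
        for cid in expired:
            del self.last_seen[cid]
        return expired
```

The sweep also covers the crashed-server case, since entries left behind by a dead server simply stop being refreshed.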

Edited by Zipster


"The other half is a dedicated process running on the database server that can occasionally connect to the database and time out clients, and potentially perform other maintenance tasks."


I know of a company that did that, and scaled it up to a surprisingly high number of users. That being said, it is tremendously wasteful of machine resources, AND it is slow to respond to client disconnect. Using specific solutions for presence is a better user experience and a cheaper, more reliable way to run the service. For medium user counts, a pre-existing tool like ejabberd can do this for you. For massive user counts, you're going to end up writing/re-writing this on your own, because everyone ends up doing that (Google, Facebook, Riot, Steam, as well as us at IMVU.)


Well, when I said server, I actually simplified what we really have.

 

This is closer to what we have.

 

CLIENT <---> TCP SERVER <---> RECORDS SERVER <---> DB/CACHE

 

The client talks to the TCP Server, whose only job is to manage active TCP connections. The TCP Server isn't aware of any DB/cache. It talks to the Records Server for authentication and to post the 'connected' status.

 

The TCP Server tells the Records Server, "hey, user X is online, and it's connected to me."

So you can increase the number of TCP servers independently of the Records Server.

Everything else in the cloud talks to the Records Server to determine whether a user is online.

 

I changed the way we store this connected status to include the client's address (IP and port) as part of the check. If the disconnection isn't detected until later, it won't remove the current connection's entry, since the client's port will be different (at least in our case, where the client isn't pinned to one port). This alleviates a lot of the problems for now, but I'd like to hear better options.
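
In code, that check amounts to a compare-and-delete: a disconnect only removes the entry if it names the same connection that is currently stored. A minimal Python sketch, where `ConnectionRegistry` and the `conn_id` strings are hypothetical and an in-process dict stands in for the cache. In a shared cache the compare and the delete would have to happen as one atomic step, which this sketch does not show.

```python
from typing import Dict, Optional, Tuple


class ConnectionRegistry:
    """Maps each client to the connection that currently owns it."""

    def __init__(self) -> None:
        # client_id -> (server_id, conn_id); conn_id could be "ip:port"
        self._entries: Dict[str, Tuple[str, str]] = {}

    def connect(self, client_id: str, server_id: str, conn_id: str) -> None:
        # The newest connection always wins.
        self._entries[client_id] = (server_id, conn_id)

    def disconnect(self, client_id: str, server_id: str, conn_id: str) -> bool:
        # Remove the entry only if it still belongs to this connection.
        if self._entries.get(client_id) == (server_id, conn_id):
            del self._entries[client_id]
            return True
        return False  # late disconnect from an older connection: ignore it

    def lookup(self, client_id: str) -> Optional[str]:
        entry = self._entries.get(client_id)
        return entry[0] if entry else None
```

So if "123" reconnects from a new port before the old socket's disconnect is processed, the late disconnect finds a different `conn_id` and leaves the new entry alone.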


"So you can increase the number of TCP servers independently of the Records Server."


Why is that necessary? (I'm not saying it isn't, just saying that "managing TCP connections" is never a bottleneck on modern systems.)
When the TCP server crashes, who is responsible for telling the record server that the user isn't connected?

"If the disconnection isn't detected until later,"


How? Through a periodic cleaning operation?

"I'd like to hear better options."


If all you need is online status ("presence") then you don't need a database at all; you can keep it all in RAM.

If you need to scale to truly massive user counts, you need to shard that RAM, but for almost any reasonable "indie" user counts (up to hundreds of thousands of online users,) a single box that does this is enough. Plus a hot standby that can take over when the first box dies.

Each TCP server would stay actively connected to the RAM server, so the RAM server could invalidate all users hosted by that server when the server stopped responding/broke the connection.
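
As a sketch of that invalidation step, assuming a single in-RAM process: `PresenceServer`, `server_lost`, and the `tcp-1` style IDs below are hypothetical. Detecting that a TCP server's link has dropped is left out; `server_lost` is what you would call once it has.

```python
from collections import defaultdict
from typing import Dict, Set


class PresenceServer:
    """In-RAM presence, indexed both by user and by hosting TCP server."""

    def __init__(self) -> None:
        self.server_of: Dict[str, str] = {}                  # user_id -> server_id
        self.users_on: Dict[str, Set[str]] = defaultdict(set)  # server_id -> users

    def user_online(self, server_id: str, user_id: str) -> None:
        old = self.server_of.get(user_id)
        if old is not None:
            self.users_on[old].discard(user_id)  # user moved to another server
        self.server_of[user_id] = server_id
        self.users_on[server_id].add(user_id)

    def user_offline(self, server_id: str, user_id: str) -> None:
        # Ignore reports from a server that no longer hosts this user.
        if self.server_of.get(user_id) == server_id:
            del self.server_of[user_id]
            self.users_on[server_id].discard(user_id)

    def server_lost(self, server_id: str) -> Set[str]:
        # Called when the link to a TCP server breaks: drop all its users.
        dropped = self.users_on.pop(server_id, set())
        for user_id in dropped:
            if self.server_of.get(user_id) == server_id:
                del self.server_of[user_id]
        return dropped

    def is_online(self, user_id: str) -> bool:
        return user_id in self.server_of
```

Keeping the per-server index is what makes a server failure a single cheap operation instead of a scan over every user.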

You probably also want to use publish/subscribe rather than polling to propagate online information -- when I log on, I subscribe to the online status of all my friends, rather than having to poll for that for each friend at some interval. Again, both for user responsiveness (polling has to be slow) and for implementation efficiency (polling uses many orders of magnitude more resources.)
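
A minimal publish/subscribe sketch in Python, assuming everything lives in one process: `PresenceHub` and its callback signature are hypothetical, and a real system would deliver these notifications over the network rather than through direct function calls.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set

Callback = Callable[[str, bool], None]  # (user_id, is_online)


class PresenceHub:
    """Pushes online/offline transitions to whoever subscribed to a user."""

    def __init__(self) -> None:
        self.online: Set[str] = set()
        self.subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, watched: str, callback: Callback) -> None:
        self.subscribers[watched].append(callback)
        # Deliver the current state immediately so no poll is needed.
        callback(watched, watched in self.online)

    def set_online(self, user_id: str, is_online: bool) -> None:
        if (user_id in self.online) == is_online:
            return  # no transition, nothing to publish
        if is_online:
            self.online.add(user_id)
        else:
            self.online.discard(user_id)
        for callback in list(self.subscribers.get(user_id, ())):
            callback(user_id, is_online)
```

Subscribers hear about a friend only when the status actually changes, so the work scales with the number of transitions rather than with the number of friends times the poll rate.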


"I know of a company that did that, and scaled it up to a surprisingly high number of users. That being said, it is tremendously wasteful of machine resources, AND it is slow to respond to client disconnect. Using specific solutions for presence is a better user experience and a cheaper, more reliable way to run the service. For medium user counts, a pre-existing tool like ejabberd can do this for you. For massive user counts, you're going to end up writing/re-writing this on your own, because everyone ends up doing that (Google, Facebook, Riot, Steam, as well as us at IMVU.)"

 

That assumes all your services can be connection-driven, like presence, which is often not the case. We have a handful of RESTful services in our game, for instance, that require the database to be "pumped" every few seconds or so. It all just depends on the full extent of what you're trying to accomplish.

Edited by Zipster


"Why is that necessary?"

 

It's not strictly necessary. We had both services tightly coupled together (the TCP service knew about the Records service, and the Records service maintained a list of TCP server nodes), so changes to one required changes to the other. We thought it was a good idea to decouple them and make it mostly a one-way connection from the TCP server to the Records server. Part of it is maintenance, and part is that it'd be less of a headache if we do need to scale.

 

 

 

"When the TCP server crashes, who is responsible for telling the record server that the user isn't connected?"

 

If the server app crashes, we have an auto-restart policy. When the server comes back up, the connected status of its clients is updated, since every client that was connected to it is by then invalid.

If the hardware crashes, we have no disaster plan for that so far. The rest of the services won't know until an attempt is made to ping the client. Any recommendations?

 

 

 


"How? Through a periodic cleaning operation?"

 

There isn't any periodic cleaning operation, at least not at this time. Perhaps it's something we should add in the near future.

 

Our TCP server is a dumb server: it doesn't try to act smart by pinging clients once in a while, or even try to make sense of the protocol beyond what authentication requires. We tried to add something like this, but the biggest problem seems to be correctly detecting that clients have disconnected. We have had cases where the server still thinks clients are connected, and data sent down the stream doesn't trigger any error until minutes later.

 

 


"You probably also want to use publish/subscribe rather than polling to propagate online information -- when I log on, I subscribe to the online status of all my friends, rather than having to poll for that for each friend at some interval. Again, both for user responsiveness (polling has to be slow) and for implementation efficiency (polling uses many orders of magnitude more resources.)"

Thank you for this.  Will certainly keep this in mind.  Once we reach hundreds of users, we would need to redesign the communication.

Edited by alnite


"We have a handful of RESTful services in our game, for instance, that require the database to be 'pumped' every few seconds or so."


Don't use a database for this. Really! Anything that is not a durable edge transition, should not be in a database; it should be in some kind of in-RAM "game" server.
Well, sure. You can use databases for this. And if you ever get big, you will suddenly have a very high pressure to figure out how to NOT use databases for those calls :-)


"ACK messages and 3-way handshake does not solve the problem that, if a server crashes, the client will still be marked as 'active' in the database."

 

 

 

If the server crashed, then NO clients would still be active, and any logged that way in the DB would have to be cleaned up as part of the initial server restart processing?

Recovery processing... possibly attempting to reattach to still-'active' clients if the server can come back up fast enough (?) and state data corruption isn't an issue.


"We have a handful of RESTful services in our game, for instance, that require the database to be 'pumped' every few seconds or so."

"Don't use a database for this. Really! Anything that is not a durable edge transition, should not be in a database; it should be in some kind of in-RAM 'game' server. Well, sure. You can use databases for this. And if you ever get big, you will suddenly have a very high pressure to figure out how to NOT use databases for those calls :-)"

The database is only for persistence. We have a public server that services client requests, and a back-end process for handling the side effects of these requests, such as updating third party services with client information in the DB. We simply chose to go with a separate process for this work as opposed to stuffing it into the front-end server. At which point, since we already have a little daemon running, we can put a few other maintenance tasks in there.
