Managing connected clients

Started by
13 comments, last by Zipster 8 years, 6 months ago

I have a TCP server listening, and hundreds of clients connecting to the server.

Whenever a client is connected, it sends a unique ID (say "123") to the server "X", the server "X" authenticates the unique ID "123" against the database, and if it checks out, it stores another entry in the cache to indicate that "123" is connected to server "X". This is so I can put many instances of the servers and know which client is connected to which server.

When a client disconnects, the server removes the entry from the cache.

Now I am running into a race condition when a client connects and disconnects repeatedly: it may show as disconnected on the server side even though it is connected. Detecting the disconnect can take several minutes (timeouts, flushing and closing the TCP stream, etc.), so the new connection is posted to the cache first, only to be removed later when the old connection's disconnect arrives.

How do I handle this flow so that a client's connected status is stored correctly?

I think your design is inherently unreliable here. For example, what if the client connects to a server, the server marks the client connected, and then the server crashes? Nobody is there to clean up the database.

Also, what's to prevent some random person on the internet from connecting and sending "I'm client 101!" and having the server approve that? Or, if it doesn't, said someone re-connects and says "I'm client 102!" -- repeat until success.

However, we can't recommend specific solutions until we know what you want to use this connection management for. For example, if what you're trying to do is "presence," as in "so-and-so is online" in a chat client, that's actually one of the hardest features to scale! A database won't give you the right options there -- it's not at all the right choice.
enum Bool { True, False, FileNotFound };

Send ACK packets to mark if a client is connected or disconnected.

Everything should be protected with timeouts.

'Keep Alive' messages from client to server, sent frequently (perhaps 3 times within each connection timeout period)

As someone else said, ACK (acknowledge) messages back to the client to indicate that the server is active (initially) and got the connect message

A 3-way handshake protocol (for something as important as Connect/Disconnect)

Msg ---------------------------->

.....................<------------------ Msg_ACK

Msg_ACK_ACK ------------->

Usually the disconnect request should also be acknowledged with a 3-way handshake, with a resend after timeout on both the client and server side, so that the outcome is unambiguous

Disconnects can happen from either side (e.g. the server going down), so use similar handshake protocols for both client-initiated and server-initiated disconnects
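
For illustration only, the Msg / Msg_ACK / Msg_ACK_ACK exchange with resend-on-timeout could be sketched as two small state machines. The class names, state names and the way timeouts are delivered are all assumptions made up for this sketch; it runs in-process with plain function calls rather than real sockets.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    AWAITING_ACK = "awaiting_ack"          # initiator has sent Msg
    AWAITING_ACK_ACK = "awaiting_ack_ack"  # responder has sent Msg_ACK
    CONFIRMED = "confirmed"


class Initiator:
    """Side that starts the exchange (hypothetical helper, not a real library)."""

    def __init__(self):
        self.state = State.IDLE

    def start(self):
        self.state = State.AWAITING_ACK
        return "Msg"

    def on_timeout(self):
        # Resend the request only while we are still waiting for Msg_ACK.
        return "Msg" if self.state is State.AWAITING_ACK else None

    def on_receive(self, packet):
        if packet == "Msg_ACK" and self.state in (State.AWAITING_ACK, State.CONFIRMED):
            self.state = State.CONFIRMED
            return "Msg_ACK_ACK"  # also re-sent when a duplicate Msg_ACK arrives
        return None


class Responder:
    """Side that answers the exchange (hypothetical helper)."""

    def __init__(self):
        self.state = State.IDLE

    def on_receive(self, packet):
        if packet == "Msg":
            if self.state is State.IDLE:
                self.state = State.AWAITING_ACK_ACK
            return "Msg_ACK"  # duplicate requests are simply re-acknowledged
        if packet == "Msg_ACK_ACK" and self.state is State.AWAITING_ACK_ACK:
            self.state = State.CONFIRMED
        return None


client, server = Initiator(), Responder()
first = client.start()            # assume this first Msg is lost in transit
retry = client.on_timeout()       # the timeout fires, so Msg is sent again
ack = server.on_receive(retry)
ack_ack = client.on_receive(ack)
server.on_receive(ack_ack)
```

The same pair of classes could be reused for a disconnect exchange, with either side playing the initiator.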

--------------------------------------------
Ratings are Opinion, not Fact
ACK messages and a 3-way handshake do not solve the problem that, if a server crashes, the client will still be marked as "active" in the database.
enum Bool { True, False, FileNotFound };

The heartbeat from the client (or from the server on behalf of the client) is only half the equation. The other half is a dedicated process running on the database server that can occasionally connect to the database and timeout clients, and potentially perform other maintenance tasks. The exact implementation depends on the application, but at the very least you need those two elements.
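
A minimal sketch of those two elements, assuming an in-memory table standing in for the database: `heartbeat` is what the client (or the server on its behalf) would call, and `sweep` is what the dedicated maintenance process would run periodically. The class name, the 30-second timeout and the injectable clock are assumptions chosen for the example.

```python
import time


class HeartbeatTable:
    """Last-seen timestamps per client, with a periodic sweep (hypothetical)."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self.clock = clock            # injectable so the example can fake time
        self.last_seen = {}           # client_id -> time of last heartbeat

    def heartbeat(self, client_id):
        self.last_seen[client_id] = self.clock()

    def is_online(self, client_id):
        seen = self.last_seen.get(client_id)
        return seen is not None and self.clock() - seen <= self.timeout

    def sweep(self):
        # Maintenance pass: drop every client whose heartbeat is too old.
        now = self.clock()
        expired = [cid for cid, seen in self.last_seen.items()
                   if now - seen > self.timeout]
        for cid in expired:
            del self.last_seen[cid]
        return expired


fake_now = [0.0]
table = HeartbeatTable(timeout_seconds=30, clock=lambda: fake_now[0])
table.heartbeat("123")
fake_now[0] = 10.0
table.heartbeat("456")
fake_now[0] = 35.0                # "123" has now been silent for 35 seconds
expired = table.sweep()
```

Because entries expire on their own, a crashed server no longer leaves clients marked as connected forever; the cost is that a disconnect is only noticed after the timeout.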

"The other half is a dedicated process running on the database server that can occasionally connect to the database and timeout clients, and potentially perform other maintenance tasks."


I know of a company that did that, and scaled it up to a surprisingly high number of users. That being said, it is tremendously wasteful of machine resources, AND it is slow to respond to client disconnect. Using specific solutions for presence is a better user experience and a cheaper, more reliable way to run the service. For medium user counts, a pre-existing tool like ejabberd can do this for you. For massive user counts, you're going to end up writing/re-writing this on your own, because everyone ends up doing that (Google, Facebook, Riot, Steam, as well as us at IMVU.)
enum Bool { True, False, FileNotFound };

Well, when I said server, I actually simplified what we really have.

This is closer to what we have.

CLIENT <---> TCP SERVER <---> RECORDS SERVER <---> DB/CACHE

The client talks to the TCP Server, whose only job is to manage active TCP connections. The TCP Server isn't aware of any DB/cache. The TCP Server talks to the Records Server for authentication and for posting the 'connected' status.

The TCP Server tells the Records Server, "hey, user X is online, and it's connected to me."

So you can increase the number of TCP servers independently of the Records Server.

Everything else in the cloud talks to the Records Server to determine whether a user is online.

I changed the way we store this connected status to include the client's IP address and port as part of the check. If the disconnection isn't detected until later, it won't remove the current connection, since the client's port will be different (at least in our case, where the client isn't pinned to one port). This alleviates a lot of the problems for now, but I'd like to hear better options.
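
The idea generalizes to "only delete the entry if it still belongs to the connection that is closing." Here is a rough sketch using a random per-connection token instead of the address; the `PresenceCache` class and its method names are invented for illustration, and a plain dict stands in for the real cache. With a shared cache, the compare and the delete would need to happen as one atomic operation.

```python
import uuid


class PresenceCache:
    """In-memory stand-in for the shared cache (hypothetical, not our real code)."""

    def __init__(self):
        self._entries = {}  # client_id -> (server_name, connection_token)

    def mark_connected(self, client_id, server_name):
        # A fresh token identifies this particular connection.
        token = uuid.uuid4().hex
        self._entries[client_id] = (server_name, token)
        return token

    def mark_disconnected(self, client_id, token):
        # Remove the entry only if it still belongs to the closing connection.
        entry = self._entries.get(client_id)
        if entry is not None and entry[1] == token:
            del self._entries[client_id]
            return True
        return False  # stale disconnect: a newer connection owns the entry

    def lookup(self, client_id):
        entry = self._entries.get(client_id)
        return entry[0] if entry else None


cache = PresenceCache()
old = cache.mark_connected("123", "X")
new = cache.mark_connected("123", "Y")      # reconnect before the old socket times out
late = cache.mark_disconnected("123", old)  # the old disconnect finally arrives
```

Here the late disconnect is ignored and "123" stays registered against server "Y".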

"So you can increase the number of TCP servers independently of the Records Server."


Why is that necessary? (I'm not saying it isn't, just saying that "managing TCP connections" is never a bottleneck on modern systems.)
When the TCP server crashes, who is responsible for telling the Records Server that the user isn't connected?

"If the disconnection isn't detected until later, ..."


How? Through a periodic cleaning operation?

"I'd like to hear better options."


If all you need is online status ("presence") then you don't need a database at all; you can keep it all in RAM.

If you need to scale to truly massive user counts, you need to shard that RAM, but for almost any reasonable "indie" user counts (up to hundreds of thousands of online users,) a single box that does this is enough. Plus a hot standby that can take over when the first box dies.

Each TCP server would stay actively connected to the RAM server, so the RAM server could invalidate all users hosted by that server when the server stopped responding/broke the connection.
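
As a rough sketch of that bookkeeping (the `RamPresence` name and its methods are made up for illustration, and detecting the broken connection itself is left out): keep a reverse index from server to users, so that losing a server drops all of its users in one step.

```python
from collections import defaultdict


class RamPresence:
    """All presence state held in memory on one box (hypothetical sketch)."""

    def __init__(self):
        self.server_of = {}               # user_id -> server_id
        self.users_on = defaultdict(set)  # server_id -> set of user_ids

    def user_online(self, user_id, server_id):
        previous = self.server_of.get(user_id)
        if previous is not None:
            self.users_on[previous].discard(user_id)
        self.server_of[user_id] = server_id
        self.users_on[server_id].add(user_id)

    def user_offline(self, user_id, server_id):
        # Ignore reports from a server that no longer hosts the user.
        if self.server_of.get(user_id) == server_id:
            del self.server_of[user_id]
            self.users_on[server_id].discard(user_id)

    def server_lost(self, server_id):
        # Called when a TCP server stops responding or its connection breaks.
        dropped = self.users_on.pop(server_id, set())
        for user_id in dropped:
            del self.server_of[user_id]
        return dropped


presence = RamPresence()
presence.user_online("alice", "tcp-1")
presence.user_online("bob", "tcp-1")
presence.user_online("carol", "tcp-2")
dropped = presence.server_lost("tcp-1")
```

After `server_lost("tcp-1")`, only the user hosted on "tcp-2" is still listed as online.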

You probably also want to use publish/subscribe rather than polling to propagate online information -- when I log on, I subscribe to the online status of all my friends, rather than having to poll for each friend at some interval. Again, this is both for user responsiveness (polling has to be slow) and for implementation efficiency (polling uses many orders of magnitude more resources).
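
A toy, in-process sketch of what I mean by publish/subscribe here (the `PresenceHub` class and the callback signature are invented for the example; a real system would push these events over the network):

```python
from collections import defaultdict


class PresenceHub:
    """Pushes status changes to subscribers instead of being polled (hypothetical)."""

    def __init__(self):
        self.online = set()
        self.watchers = defaultdict(list)  # watched user_id -> list of callbacks

    def subscribe(self, user_id, callback):
        self.watchers[user_id].append(callback)
        # Deliver the current state right away so the subscriber never polls.
        callback(user_id, user_id in self.online)

    def set_online(self, user_id, is_online):
        was_online = user_id in self.online
        if is_online == was_online:
            return  # no change, nothing to publish
        if is_online:
            self.online.add(user_id)
        else:
            self.online.discard(user_id)
        for callback in list(self.watchers.get(user_id, [])):
            callback(user_id, is_online)


events = []
hub = PresenceHub()
hub.subscribe("bob", lambda user, state: events.append((user, state)))
hub.set_online("bob", True)
hub.set_online("bob", True)   # duplicate report, not re-published
hub.set_online("bob", False)
```

Work is only done when a status actually changes, which is where the savings over polling come from.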
enum Bool { True, False, FileNotFound };

"I know of a company that did that, and scaled it up to a surprisingly high number of users. That being said, it is tremendously wasteful of machine resources, AND it is slow to respond to client disconnect. Using specific solutions for presence is a better user experience and a cheaper, more reliable way to run the service. For medium user counts, a pre-existing tool like ejabberd can do this for you. For massive user counts, you're going to end up writing/re-writing this on your own, because everyone ends up doing that (Google, Facebook, Riot, Steam, as well as us at IMVU.)"

That assumes all your services can be connection-driven, like presence. But that's often not the case. We have a handful of RESTful services in our game, for instance, that require the database to be "pumped" every few seconds or so. It all just depends on the full extent of what you're trying to accomplish :)

This topic is closed to new replies.
