_winterdyne_

Thread-heavy kludge or reasonable system?

Right, (final) sign-off on the transport / basic services portion of my library, but there's a mechanism I think might be a bit of a kludge. Before I spend a week or so porting the code to a Linux project, I thought I'd throw this to the wolves to see if the overall structure is OK, and to see if what I think is a kludge gets picked up on.

WSNet is the portion of the primogen library dealing with network transport. It's basically a service / service user pair, of which an application will implement one or the other. Overall behaviour (as far as raised events and API go) is identical between UDP and TCP. UDP delivery is reliable (ack-based) and non-sequential (messages can arrive out of order). TCP delivery is obviously sequential and so has some advantage (especially if large multi-packet messages are being sent).

There are three main objects: a WSNet (singleton), a WSNetService, and a WSNetServiceUser.

The WSNet singleton provides platform-level information and settings (max packet size, latency settings, timeout settings) and maintains a register of operating services, service users and what ports they're operating on. It also initialises any platform-specific APIs (Winsock), and prepares a log file which most network-related objects write to.

A WSNetServiceUser is the client-side member of the pair. It provides a single wrapper for the login and authentication procedure, configured through an automatically checked initialisation function: the user is expected to provide their initialisation requirements by means of a (pure) virtual function, and the state is checked for validity afterwards.
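A minimal sketch of the checked initialisation idea, written in Python for brevity rather than in the library's own language. The class names, the `on_initialise` hook and the three checked fields are assumptions made for illustration; they are not WSNet's actual API.

```python
from abc import ABC, abstractmethod


class ServiceUser(ABC):
    """Hypothetical stand-in for a service user with checked initialisation."""

    def __init__(self):
        self.host = None
        self.port = None
        self.protocol = None  # expected to be "tcp" or "udp"

    @abstractmethod
    def on_initialise(self):
        """Application hook: fill in host, port and protocol."""

    def initialise(self):
        # Run the application's hook, then verify the resulting state.
        self.on_initialise()
        problems = []
        if not self.host:
            problems.append("host not set")
        if not isinstance(self.port, int) or not 0 < self.port < 65536:
            problems.append("port out of range")
        if self.protocol not in ("tcp", "udp"):
            problems.append("protocol must be 'tcp' or 'udp'")
        if problems:
            raise ValueError("invalid initialisation: " + ", ".join(problems))
        return True


class GameClient(ServiceUser):
    def on_initialise(self):
        self.host, self.port, self.protocol = "127.0.0.1", 4000, "udp"


class BrokenClient(ServiceUser):
    def on_initialise(self):
        self.host = "127.0.0.1"  # forgets port and protocol
```

Calling `GameClient().initialise()` passes the check, while `BrokenClient().initialise()` raises a `ValueError` before any connection is attempted, which is the point of validating the state straight after the user-supplied hook.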
The WSNetServiceUser creates two threads. One remains active for the lifetime of the object, should the object have automatic reconnection enabled (on by default). The other remains active only for the duration of a connection: it opens, authenticates, logs in, and watches a connection to a specified host, using either TCP or UDP, generating application-level events (via a subscriber pattern). The reconnection thread watches for an unexpected drop event (raised by the connection thread), and responds to it by waiting for the connection to die (socket cleanup, imminent in the case of a drop) and then effectively restarting it.

A WSNetService is the server side of the pair. This is more complex than the WSNetServiceUser. It performs pretty much the same function, watching connections and generating application events, but does so through the specification of 2 or 3 aggregated client processor classes in a checked initialisation stage. There will be a WSNetClientManager and a WSNetAuthenticator. There may or may not be a WSNetLoginManager, the presence of which alters behaviour for the service. All of these classes watch clients assigned to them for timeout, and automatically cull linkdead clients. (Timeout values for each are specified in WSNet.) They are all updated by the WSNetService's thread.

Incoming (unknown) connections / packets are passed to the WSNetAuthenticator. This will generate an internal client object for the connection (by IP address). Sequence resets are also passed to the authenticator (since they imply a new connection process). Non-zero sequences from unknown connections are responded to with a special reauthenticate packet, which causes the connecting WSNetServiceUser to disconnect and reauthenticate (without causing the drop event). It is possible that encryption is being used on initial connections - this will NOT be key negotiated (closed key).
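The routing rule for unknown connections and sequence resets can be sketched roughly as follows. This is Python for illustration only; the `Authenticator` class, the `route` method and the return constants are hypothetical, and it assumes that a sequence reset shows up as sequence number 0.

```python
NEW_CLIENT, REAUTHENTICATE, KNOWN = "new_client", "reauthenticate", "known"


class Authenticator:
    """Hypothetical sketch of the routing rule for incoming packets."""

    def __init__(self):
        self.clients = {}  # address -> last sequence number seen

    def route(self, address, sequence):
        if sequence == 0:
            # A sequence reset implies a new connection process,
            # whether or not the address was seen before.
            self.clients[address] = 0
            return NEW_CLIENT
        if address not in self.clients:
            # Mid-stream packet from a stranger: ask it to start over.
            return REAUTHENTICATE
        self.clients[address] = sequence
        return KNOWN
```

A server that restarts and loses its client table would answer every mid-stream packet with `REAUTHENTICATE`, so existing service users reconnect cleanly instead of raising a drop event.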
The application-specified client type is checked, and Diffie-Hellman key negotiation may be performed. If keys are exchanged correctly, or D-H negotiation is not specified (application option), the client is passed to the WSNetLoginManager.

Encryption, when specified, is only applied to the payload portion of the message structure - the header (basically just length) and packet headers are sent open. This allows us to have some special packets - ack, reauth, and logout. Encryption is done by means of a two-class pair: a blockcipher and a blockinitialiser (which may be far more than a simple key). If D-H was used, clients will each have a unique authenticated initialiser. This is 'owned' by the client. If D-H wasn't used, but an authenticated scheme was specified, the initialiser for that is owned by the authenticator, and a simple pointer is held by the clients.

The WSNetLoginManager is an abstract base class, which an application must override. It will be sent incoming network messages from clients assigned to it until it returns true from the handler function. Messages are expected to conform to the 'authenticated' encryption scheme specified in the authenticator. Once the login manager has finished checking messages from a client, it returns true from the message handler, having assigned an account reference (checked for validity!) to the client, which is then passed to the WSNetClientManager.

The client manager is the simplest of the lot - it simply passes incoming messages to the event pump for the WSNetService. Logout packets from a client in the client manager trigger the logout process to start (this is a timed, graceful exit from the service). WSNet specifies the logout duration. Immediate disconnects cause socket errors (either ICMP_PORTUNREACHABLE or host down type messages).
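The hand-off from login manager to client manager might look something like the sketch below. It is in Python purely for illustration; `Client`, `PasswordLogin`, `Pipeline` and the 'user:password' message format are all invented here, and encryption is left out entirely.

```python
from abc import ABC, abstractmethod


class Client:
    def __init__(self, address):
        self.address = address
        self.account = None  # set by the login manager on success


class LoginManager(ABC):
    @abstractmethod
    def handle_message(self, client, message):
        """Return True once the client has finished logging in."""


class PasswordLogin(LoginManager):
    """Toy application override: message is 'user:password'."""

    def __init__(self, accounts):
        self.accounts = accounts

    def handle_message(self, client, message):
        user, _, password = message.partition(":")
        if self.accounts.get(user) == password:
            client.account = user
            return True
        return False


class Pipeline:
    def __init__(self, login_manager):
        self.login_manager = login_manager
        self.logging_in = {}  # address -> Client, held by the login stage
        self.logged_in = {}   # address -> Client, held by the client manager
        self.events = []      # stand-in for the service's event pump

    def add_authenticated(self, client):
        self.logging_in[client.address] = client

    def on_message(self, address, message):
        if address in self.logged_in:
            # Client manager stage: forward straight to the event pump.
            self.events.append((self.logged_in[address].account, message))
        elif address in self.logging_in:
            client = self.logging_in[address]
            done = self.login_manager.handle_message(client, message)
            # Promote only if the handler finished AND set an account.
            if done and client.account is not None:
                del self.logging_in[address]
                self.logged_in[address] = client
```

The extra `client.account is not None` test mirrors the "checked for validity" step: a handler that returns true without assigning an account does not get its client promoted.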
The WSNetService is driven by a single thread, created when the service is started and destroyed just after the service stops. It polls the incoming sockets, queuing messages for handling, spends some time processing and cleaning up old packets and messages, updates the 2 or 3 client processors, and finally pumps the awaiting events to their subscribers. The service can be paused (stopping handling of incoming packets, but allowing sending of queued messages) and resumed.

It all works well enough - can anybody see any major problems with it layout-wise? Anything it should do that it doesn't? Anything that should really be in a higher-level library? Note that there may well be more than one service / service user pair in an application (certainly in clustering systems). Is having a thread-heavy design like this going to cost me? The main interface to the network layer will be through queuing messages / events to send (threadsafe) and receiving events / messages (again, threadsafe).
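One pass of such a service thread, including the pause behaviour, could be sketched like this. It is a simplified Python illustration under assumed names (`ServiceLoop`, `tick`, the three queues); real socket polling, packet cleanup and client processor updates are omitted.

```python
import queue


class ServiceLoop:
    """Hypothetical body of a single-threaded service loop."""

    def __init__(self):
        # queue.Queue is thread-safe, so other threads may put() freely.
        self.inbound = queue.Queue()   # packets read from sockets
        self.outbound = queue.Queue()  # messages queued by other threads
        self.events = queue.Queue()    # events awaiting subscribers
        self.sent = []                 # stand-in for the wire
        self.subscribers = []
        self.paused = False

    def tick(self):
        # 1. Handle incoming packets unless the service is paused.
        if not self.paused:
            while not self.inbound.empty():
                self.events.put(("message", self.inbound.get()))
        # 2. Always flush queued outgoing messages, even when paused.
        while not self.outbound.empty():
            self.sent.append(self.outbound.get())
        # 3. Pump the awaiting events to their subscribers.
        while not self.events.empty():
            event = self.events.get()
            for callback in self.subscribers:
                callback(event)
```

Because only the service thread ever takes items off these queues, checking `empty()` before `get()` is safe here; with several consumers it would not be.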

Certainly looks interesting. Do you think a single thread is enough for the WSNetService? Will that thread be handling authentication, encryption, packet queuing, etc?

I have been working on the communications infrastructure for my own project, and it has some similarities with the scheme you've described -- I am using UDP only, however. Since I consider myself more of a hardcore Windows programmer, I decided to capitalize on my strengths. I use BindIoCompletionCallback, QueueUserWorkItem and CreateTimerQueueTimer for the network subsystem.

For authentication, I opted not to have a resync. I figure, if the client got hosed up enough to lose its encryption key and require a resync, then the client should really just shut down and restart from scratch.

My authentication sequence went something like this:
- Client sends gateway a request for the gateway's public key
- Gateway returns client a public key
- Client generates a public/private keypair, and using the gateway's public key, encrypts its own public key and the login credentials, and sends them to the gateway.
- Gateway decrypts the credentials and validates them. If valid, the gateway generates a symmetric key, encrypts it using the client's supplied public key, and sends it back to the client.
- Client now uses the symmetric key for all subsequent communications with the gateway.

The only unencrypted packets I allow are the initial request for a gateway's public key -- and the only packets that are partially encrypted are the client's credentials packet and the symmetric key response packet -- those two have unencrypted headers, but encrypted bodies.

After that, all comms between client and gateway are encrypted, including the header. Each gateway will have its own public/private keypair and each gateway cycles its keypair at random time intervals (never more than an hour, never less than a few minutes). Finally, all packet IDs are stored in a database and the header file for the packet IDs is generated during the build process -- so I can very easily rehash the entire packet ID subsystem with a single SQL query. I keep all the language-specific texts in the database too, and all the resource files are generated from there as well during a build.

Clients are required to ping the gateway periodically -- the gateway will automatically log out and clean up any client that hasn't pinged within a given timeout, or that doesn't complete the login procedure in a timely manner.
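A rough sketch of that housekeeping in Python, for illustration only. The `SessionTable` class, its method names and the two timeout values are assumptions, and time is passed in explicitly so the logic is easy to follow.

```python
class SessionTable:
    """Hypothetical gateway-side table of client deadlines."""

    def __init__(self, login_timeout=10.0, ping_timeout=30.0):
        self.login_timeout = login_timeout
        self.ping_timeout = ping_timeout
        self.sessions = {}  # address -> {"logged_in": bool, "last_seen": float}

    def connect(self, address, now):
        self.sessions[address] = {"logged_in": False, "last_seen": now}

    def login_complete(self, address, now):
        self.sessions[address] = {"logged_in": True, "last_seen": now}

    def ping(self, address, now):
        # Pings only count once login is done, so they cannot be used
        # to stretch the login deadline.
        session = self.sessions.get(address)
        if session and session["logged_in"]:
            session["last_seen"] = now

    def cull(self, now):
        """Remove and return every client past its deadline."""
        dead = []
        for address, s in self.sessions.items():
            limit = self.ping_timeout if s["logged_in"] else self.login_timeout
            if now - s["last_seen"] > limit:
                dead.append(address)
        for address in dead:
            del self.sessions[address]
        return dead
```

Calling `cull` from a periodic timer keeps the check off the packet-handling path.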

In order to execute a DDoS, you'd have to hit all the gateways, and the zombie machines would spend a lot of time generating keypairs and bogus credentials -- without which the gateway would just kick out the packet and not even submit it to the database, which would be where a true DDoS might do the most harm.

I figure, I won't be able to stop the hackers, but I'm not gonna lie down either.

Robert

[Edited by - rmsimpson on April 24, 2006 1:55:11 PM]

Yes, I think a single thread PER service is fine - but there's the distinction.
The service is a single transport channel; a real-world application would use more than one service to perform discrete stages in a complex login procedure.

Whilst these *can* be performed by a single service, it's more likely that a service would be set up to introduce clients to the master servers of game clusters and to provide a symmetric session key.

As far as DoS and DDoS attacks go, so long as you have an external address which you're open (accepting) on, you're vulnerable. You can have some breathing room by having the port(s) opened for game sessions vary by session or by time, which makes a non-adaptive DoS's job difficult (other than on your gateway, which should be non-essential to the game service itself), or you can employ a blacklist principle which simply closes connections from blacklisted IPs without going through the rigmarole of authenticating. Unless the DoS is distributed and large enough in scale to hammer your socket pool, you might get away with it.

I'm not sure I'd want to run a query adding a packet ID to a database for every incoming packet though! Just locally logging or caching and submitting every so often (at housekeeping time) might be a better idea.

I was unclear ... I meant that all packet type IDs (movement, login, etc.) are stored in the database and can be cycled during the build process. Thus a login packet that was identified by a 2-byte value of "0x0100" is in a subsequent patch identified by a 2-byte value of "0x10f1". Just another way of making more work for the hacker.
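As a rough illustration of that build step, here is a Python sketch that assigns fresh 2-byte IDs and renders a header. The function names, the `PACKET_` prefix and the use of a plain list instead of a database table are all assumptions; the real system keeps the IDs in a database and rehashes them with SQL.

```python
import random


def assign_packet_ids(names, seed):
    """Give every packet type a distinct pseudo-random 16-bit ID."""
    rng = random.Random(seed)
    ids = rng.sample(range(1, 0x10000), len(names))
    return dict(zip(names, ids))


def render_header(ids):
    """Emit a C/C++ header defining one constant per packet type."""
    lines = ["#pragma once", "// Generated at build time; do not edit.", ""]
    for name, value in sorted(ids.items()):
        lines.append(f"#define PACKET_{name.upper()} 0x{value:04x}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Changing the seed per patch reshuffles every ID in one go.
    print(render_header(assign_packet_ids(["login", "movement", "logout"], seed=42)))
```

Client and server must of course be built from the same generated header, otherwise they stop understanding each other.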

Robert

Since I'm using UDP, the gateway opens up the same port all the time and I never have to worry about running out of sockets. The gateways are multi-homed, so the only real way to DDoS the system is to hit the gateways. Impact is minimized by the gateway's logic ... the ability to detect duplicate public keys in an authentication packet, duplicate logins with the same credentials, etc. Anything unrecognized is automatically kicked out, and the checks take less time to perform than the attacker needs to spend formulating the packets to get past those checks. Once past, a database hit will occur. Usually that database hit will result in the hacker getting blacklisted, and no more database hits would occur for that blacklisted IP. The goal is really to avoid hitting the database until it's absolutely necessary, to minimize the impact of a DDoS.
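The ordering of cheap checks before the database hit can be sketched as below. This is illustrative Python with invented names (`Gateway`, `handle_login`, `check_credentials`); the real gateway performs more checks than the two shown.

```python
class Gateway:
    """Hypothetical filter that keeps junk away from the database."""

    def __init__(self, check_credentials):
        self.check_credentials = check_credentials  # the expensive call
        self.blacklist = set()
        self.seen_keys = set()
        self.db_hits = 0

    def handle_login(self, address, public_key, credentials):
        # Cheap, in-memory checks first.
        if address in self.blacklist:
            return "dropped"
        if public_key in self.seen_keys:
            return "dropped"  # duplicate key in an authentication packet
        self.seen_keys.add(public_key)
        # Only now pay for the database lookup.
        self.db_hits += 1
        if self.check_credentials(credentials):
            return "accepted"
        self.blacklist.add(address)  # no further hits for this address
        return "rejected"
```

Each attacking address costs at most one database hit in this sketch; everything after that is answered from memory.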

It's not perfect of course ... no system really is.

Hm, just realised I didn't answer your question about the service's thread. This applies equally to the main service user's thread.

Both begin by accepting any incoming connections (TCP), adding sockets if necessary to the socket pool.

Then the socket pool is checked for incoming data, and a ready socket is processed, reading and queuing a packet, or handling it if it is a special system-level packet (reauth / ack / logout).

Packets are assembled into messages, or time out. Duplicate packets and obviously invalid packets are dumped, potentially blacklisting the sender.
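A minimal Python sketch of that assembly step, for illustration. It assumes each packet carries a message ID, its index and the total packet count; the actual WSNet packet header layout is not described here, so those fields and the `Reassembler` class are hypothetical.

```python
class Reassembler:
    """Hypothetical reassembly of out-of-order packets into messages."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.partial = {}  # message_id -> (first_seen, total, {index: data})

    def add(self, message_id, index, total, data, now):
        """Store one packet; return the full message once complete."""
        if total < 1 or not 0 <= index < total:
            return None  # obviously invalid: dump it
        first_seen, expected, parts = self.partial.setdefault(
            message_id, (now, total, {}))
        if expected != total or index in parts:
            return None  # inconsistent or duplicate packet
        parts[index] = data
        if len(parts) < total:
            return None
        del self.partial[message_id]
        return b"".join(parts[i] for i in range(total))

    def expire(self, now):
        """Drop messages that did not complete within the timeout."""
        stale = [m for m, (seen, _, _) in self.partial.items()
                 if now - seen > self.timeout]
        for m in stale:
            del self.partial[m]
        return stale
```

Packets may arrive in any order; the message is only released once every index has been seen exactly once.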

Complete messages are queued and handled by the appropriate client processor or, on the client side, by the service user itself. Game-level messages are delivered to a registered subscriber (in a different thread) for handling.

The thread alternates sending from the outbound packet queue (packets are added by whatever thread dispatches the message) and receiving, until either both jobs are finished (queues empty) or a timeout occurs.

The thread then spends some time cleaning up old incoming packets and resending non-acked old packets, before looping.
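The resend part of that housekeeping might be sketched as follows, in Python for illustration. `ReliableSender`, the resend interval and the attempt limit are assumptions; nothing above says whether WSNet gives up after a fixed number of retries.

```python
class ReliableSender:
    """Hypothetical ack tracking for reliable, unordered UDP delivery."""

    def __init__(self, send, resend_after=0.5, max_attempts=5):
        self.send = send  # callable(sequence, payload)
        self.resend_after = resend_after
        self.max_attempts = max_attempts
        self.pending = {}  # sequence -> [payload, last_sent, attempts]
        self.next_sequence = 0

    def queue(self, payload, now):
        sequence = self.next_sequence
        self.next_sequence += 1
        self.pending[sequence] = [payload, now, 1]
        self.send(sequence, payload)
        return sequence

    def ack(self, sequence):
        self.pending.pop(sequence, None)  # acks may arrive in any order

    def housekeeping(self, now):
        """Resend overdue packets; return sequences that were given up on."""
        failed = []
        for sequence, entry in list(self.pending.items()):
            payload, last_sent, attempts = entry
            if now - last_sent < self.resend_after:
                continue
            if attempts >= self.max_attempts:
                failed.append(sequence)
                del self.pending[sequence]
            else:
                entry[1], entry[2] = now, attempts + 1
                self.send(sequence, payload)
        return failed
```

Sequences returned by `housekeeping` are the natural trigger for treating the peer as linkdead.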
