Federated Identity: A world of pain
Probably the biggest change to the way user accounts get handled is that we're supporting federated identity in V5. No longer will you need a username and password with us; if you'd prefer, you'll be able to sign in using a Facebook or LinkedIn account, or a generic OpenID. This is potentially a pretty big deal for our signup rate. Given that pretty much every piece of content on the site will have a 'share' button that can spam a link to the social network of your choice, we're going to get an increase in people coming from sites like Facebook, and we want to make it as easy as possible for them to interact with the site, post comments, and so on.
Simply authenticating with these external sites would be easy, but we also want to fetch profile data from them... and while that's technically straightforward, their data privacy policies complicate matters. You're not allowed to store any data you retrieve from Facebook for more than 24 hours (with a few exceptions, like the numeric user ID); LinkedIn has a similar, though less explicit, policy. But if you come to the site from Facebook and post a comment, who do we attribute that comment to a week later? We can't store your name, avatar, or anything like that for more than a day.
What we have to do is simply re-fetch the data from Facebook when we need it. We can cache whatever we've fetched for up to 24 hours, but after that we drop it from our cache and wait until somebody needs it again. As well as storing your Facebook user ID, we also store the session key needed to talk to Facebook about you. The session has the special 'offline access' permission set on it, so we can keep using the same session key even when you're signed out of Facebook - it lasts until you 'disconnect' us (remove us from your FB applications listing).
So, all we need is a table of (facebookUserID, facebookSessionKey, expires, ...) to store all our cached Facebook data. We can run a job every 10 minutes or so, and for any entry that's approaching the 24-hour limit, we wipe all the data except the user ID and the session key. When the profile data is needed again, we go and refetch it from Facebook. Simples.
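To make the purge-and-refetch cycle concrete, here's a minimal sketch in Python. It is not the real implementation: the row class, the `profile` dict, the 20-minute safety margin, and the `fetch` callback (standing in for whatever actually talks to Facebook) are all assumptions made up for illustration. Only the user ID, session key, and `expires` columns come from the table described above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional

MAX_AGE = timedelta(hours=24)           # the storage limit described above
SAFETY_MARGIN = timedelta(minutes=20)   # assumed: purge early, since the job only runs every ~10 minutes


@dataclass
class FacebookCacheRow:
    """One row of the hypothetical cache table."""
    facebook_user_id: int
    facebook_session_key: str
    expires: Optional[datetime] = None            # None means nothing is cached
    profile: Dict[str, str] = field(default_factory=dict)


def purge_stale(rows: List[FacebookCacheRow], now: datetime) -> int:
    """Wipe cached data that is approaching the limit; return how many rows were wiped."""
    purged = 0
    for row in rows:
        if row.expires is not None and row.expires - now <= SAFETY_MARGIN:
            # Keep only the user ID and session key; drop everything else.
            row.profile = {}
            row.expires = None
            purged += 1
    return purged


def get_profile(
    row: FacebookCacheRow,
    now: datetime,
    fetch: Callable[[int, str], Dict[str, str]],
) -> Dict[str, str]:
    """Return the cached profile, refetching it first if the cache is empty."""
    if row.expires is None:
        row.profile = fetch(row.facebook_user_id, row.facebook_session_key)
        row.expires = now + MAX_AGE
    return row.profile
```

The periodic job would just call `purge_stale` over the table, and anything that needs a name or avatar would go through `get_profile`, so a purged row gets refilled lazily the next time somebody asks for it.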
What's in a name?
One of the problems this is going to make much more acute is duplicate names. At the moment, it's no big deal to ask every user to pick a unique nickname, but if you're coming from Facebook or LinkedIn then it's much more natural to just use your real name. But we can't ask people to pick unique real names! What happens when two John Smiths both come to use the site?
Also, plenty of users won't want to go by their real names. Just because you've come to the site from Facebook or LinkedIn doesn't mean you're happy advertising who you are.
The end requirement is that we want every user to have a unique 'display name,' which can be constructed from their first/last name, their nickname, or a combination thereof. The rules will be something like:
- Offer the user the option to display their real name. If they turn it down, they have to pick a nickname that doesn't match any of the existing display names, and the nickname will be their display name.
- If they enter their real name, and there's no other user with that real name, their real name can be their display name, and a nickname is optional.
- If their real name is already in use as a display name, then they have to pick a nickname that makes their display name unique.
A user going by both real name and nickname will probably be displayed like:
Richard "Superpig" Fine
while going by real names or nicknames would just be what you'd expect - "Richard Fine" and "Superpig" respectively.
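The rules above can be sketched roughly as follows. This is only an illustration under a few assumptions: the function and exception names are made up, uniqueness is checked case-insensitively against a set of already-taken display names, and a nickname supplied alongside a real name always produces the combined form. The real rules are still "something like" this.

```python
from typing import Optional, Set


class DisplayNameError(ValueError):
    """Raised when the chosen names don't produce a usable display name."""


def format_display_name(
    first: str, last: str, nickname: Optional[str], use_real_name: bool
) -> str:
    """Build the display name from whichever parts the user has chosen to show."""
    if not use_real_name:
        if not nickname:
            raise DisplayNameError("a nickname is required without a real name")
        return nickname
    if nickname:
        return f'{first} "{nickname}" {last}'
    return f"{first} {last}"


def choose_display_name(
    first: str,
    last: str,
    nickname: Optional[str],
    use_real_name: bool,
    taken: Set[str],
) -> str:
    """Return a display name, or raise if it collides with one already in use.

    `taken` is assumed to hold existing display names, already case-folded.
    """
    name = format_display_name(first, last, nickname, use_real_name)
    if name.casefold() in taken:
        raise DisplayNameError(f"{name!r} is taken; pick a (different) nickname")
    return name
```

So with "Richard Fine" and "Superpig" both already in use, a second Richard Fine couldn't go by his real name alone, but could still end up as Richard "Superpig" Fine, because only the combined display name has to be unique.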
Cobbling bits together
A further complication is that the user might get some of their profile information from the external site, but not all. LinkedIn, for example, doesn't provide any kind of email address. And what if the user wants to present a slightly different identity on GDNet? Maybe they go by 'T-Bird Smith' on Facebook, but they'd rather go by the slightly more professional 'Tom Smith' on GDNet.
Enter the 'profile map.' The map specifies, for each field of the profile, where it comes from: LinkedIn, Facebook, GDNetV4, GDNetV5, and so on. Whenever the site needs to load somebody's profile into memory, the accounts service begins by fetching the profile map, then fetches the necessary LinkedIn/Facebook/V4/V5 database rows and combines fields across them to populate the user profile data structure. (This structure is then cached in-memory to avoid having to assemble stuff from the DB every time.)
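As a rough sketch of the assembly step, assume the profile map is a dict from field name to source name and each source's row is a dict of its own fields. The field names, the sample data, and the `assemble_profile` function are all hypothetical; the real accounts service will look different.

```python
from typing import Dict, Mapping, Optional

ProfileMap = Mapping[str, str]                  # field name -> source name
SourceRows = Mapping[str, Mapping[str, str]]    # source name -> that source's row


def assemble_profile(
    profile_map: ProfileMap, rows: SourceRows
) -> Dict[str, Optional[str]]:
    """Populate each profile field from the source the profile map names for it."""
    profile: Dict[str, Optional[str]] = {}
    for field_name, source in profile_map.items():
        row = rows.get(source, {})
        # A source may not provide every field; leave those as None.
        profile[field_name] = row.get(field_name)
    return profile


# Tom goes by 'T-Bird Smith' on Facebook but 'Tom Smith' on GDNet.
profile_map = {"display_name": "GDNetV5", "avatar": "Facebook", "email": "GDNetV4"}
rows = {
    "Facebook": {"display_name": "T-Bird Smith", "avatar": "fb-avatar.png"},
    "GDNetV5": {"display_name": "Tom Smith"},
    "GDNetV4": {"email": "tom@example.com"},
}
profile = assemble_profile(profile_map, rows)
```

Here `profile` takes the display name from the V5 row, the avatar from Facebook, and the email from the V4 row, so the Facebook name never leaks into the GDNet identity.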
Here comes the new stuff, same as the old stuff
One other thing about this architecture is that it finally answers the question of how to handle existing (V4) user accounts: they just get treated like another identity provider, same as Facebook or LinkedIn. At some point we'll convert every V4 account into a V5 account, but treating it like an external identity provider for now will make it very easy to run the two sites side-by-side until that time.