
But tonight, on this small planet, on Earth, we're going to rock civilization.

## members.gamedev.net

I'm pleased to announce that the GDNet+ member webspace is now fully back online - and, for the first time in about 3 years, accessible by FTP once more!

Just FTP into members.gamedev.net, using your regular username and password, to access your personal space. Old GDNet+ members should find all their files ready and waiting for them. We've also increased the space quota to 100MB per user, and we'll look at increasing this further as things get settled in.

Anything you upload into your webspace is accessible over HTTP, too...

Old GDNet+ members can browse to the same addresses that they've always used. We'll probably be retiring these addresses at some point, but we'll make sure we let you know before we do.

New GDNet+ members, log in, and browse your way to https://www.gamedev.net/subscribe - take a look at the bottom of the page for your address info.

## GDNet Slim

For the past three days or so, I've taken some time away from working on V5 to see if there aren't some things I can do for the current site, V4. As you're no doubt aware, we're in a bit of a tight spot on cashflow right now - much like everyone else in the industry - so I figured I'd see if there wasn't anything I could do to bring down our hosting costs. Messing with our hardware and datacenter setup is beyond my remit; I'm only the software guy here, but that software has been churning out an average of 15 terabytes of data every month, and bandwidth ain't free. Not to mention that it makes the site load more slowly for you.

So, what exactly have I done about it? 97 commits to Subversion in the past three days, that's what [grin]

- I spent about 4 hours optimizing and refactoring the site's CSS. Historically the site's had one large (28kb) CSS file per theme, with lots of duplication between themes; this is now one shared (16kb) and one theme-specific (11kb) file. A whopping 1kb saving, hurrah! It might not seem like much, but now that all the common stuff is in one file it's easier to optimize, and the optimizations will be picked up by people on every theme.
- I totally rewrote the markup (and CSS) for the header banner you see up top there. It used to be this big 3-row table, with 0-height cells, lots of sliced-up background imagery, etc. It's now 4 divs. Much, much cleaner.
- I put all the little icons from the nav bar into a sprite map, and got them all displayed by CSS. So now, instead of making 15 separate requests to the server, you only make 1, and there are no image tags in the header of every page.
- I rewrote the little popup you get when you mouse over the 'recent topics' on the front page. The javascript library we were using to do this weighed in at 50KB (!!!); even minified it was still 23KB. I had a look into a jQuery solution, as we can embed a version of jQuery hosted by one of the big CDNs, but then realised that the whole thing could just be CSS instead. So, it is. That's a 50KB saving on bandwidth for every brand new visitor to our site's front page right there, which is substantial.
- I stripped a bunch of `<br>` tags out of the markup and replaced them with margins (specified in the cached CSS files, naturally).
- I updated our Google Analytics code. This wasn't strictly necessary, but I wanted to do it, and in the process I discovered that none of the forum pages have actually been including it properly up until now. The visitor graph in Analytics since I fixed it has a spike that looks like we've just been featured on CNN or something [grin]
- I tidied up the breadcrumb, search box, and footer code. Again, mostly just getting rid of tables and replacing them with CSS.
- I killed some of the 'xmlns' attributes that get left in our output due to the way we're using XSLT. There's still a bunch of them around, but I covered forum topics, which are the most popular offender. At some point I'll go back in and do all the other cases.
- I redid the markup for the headers in 'printable version' articles. The gain from this won't be too huge, but it's often where Google searches end up, so it won't be nothing either. Also because I HATE TABLES AND WILL MAKE LOVE TO CSS IF IT IS EVER INCARNATE AS A TANGIBLE ENTITY.
- I made us a new version of the OpenGL|ES logo. It's shinier!

That's pretty much everything for now. It's a little difficult to get a picture of how much total change it's made, but the HTML for the site front page has dropped from 95kb to 85kb. I guess I'll find out if it's actually made a serious dent when I hear the bandwidth figures in a few days.

What's the downside to all this? I've been acting with basically no regard to old versions of IE. Chrome is my primary development browser now, with Firefox a close second; I check that things work in IE8, particularly when using unusual CSS pseudoclasses like :hover and :first-child, but anything prior to IE8 - and especially anything prior to IE6 - can go die in a fire, basically. I know, I know, you can't do anything about it, your machine is locked down by corporate, I understand... and I don't care. These days, I think I'd be comfortable accusing any sysadmin who hasn't upgraded all their machines to at least IE7 of criminal negligence.

I guess the site will probably still work in old versions of IE. I'm not actively trying to shoot them down. Yet. By and large, things should degrade gracefully.

To end, here are some excerpts from my SVN logs that you may enjoy.

2010-07-15 00:29:18 dropped prototype and clientscripts.js from the page header. (over 120kb for a new visitor!)
2010-07-15 00:32:50 also dropped menu.js, as the menus have been CSS powered for some time now

2010-07-15 03:24:27 killed the empty child! \m/

2010-07-15 04:33:49 tidied up breadcrumb + search boxes
2010-07-15 04:34:38 oops
2010-07-15 04:35:45 added a floatclearer
2010-07-15 04:37:03 try again

2010-07-16 02:21:38 updated 'printable' articles to use GAM
2010-07-16 02:23:11 forgot the

## Activity streams

Today's chocolate-chunk-o'-LINQ:

```csharp
public ActivityLogEntry[] GetActivitiesFiltered(DateTime? startTime, DateTime? endTime,
    Uri[] actors, Uri[] actorTypes, Uri[] objects, Uri[] objectTypes, Uri[] verbs, int? maxToFetch)
{
    using (var context = new ActivityEntities())
    {
        var entries = context.Activities.AsQueryable();

        if (startTime.HasValue)
            entries = entries.Where(act => act.timestamp >= startTime.Value);
        if (endTime.HasValue)
            entries = entries.Where(act => act.timestamp <= endTime.Value);

        if (actors != null)
            entries = actors.Length == 1
                ? entries.Where(ent => ent.actorUri == actors.First().ToString())
                : entries.Join(actors, ent => ent.actorUri, act => act.ToString(), (ent, act) => ent);
        if (actorTypes != null)
            entries = actorTypes.Length == 1
                ? entries.Where(ent => ent.actorType == actorTypes.First().ToString())
                : entries.Join(actorTypes, ent => ent.actorType, act => act.ToString(), (ent, act) => ent);

        if (objects != null)
            entries = objects.Length == 1
                ? entries.Where(ent => ent.objectUri == objects.First().ToString())
                : entries.Join(objects, ent => ent.objectUri, act => act.ToString(), (ent, act) => ent);
        if (objectTypes != null)
            entries = objectTypes.Length == 1
                ? entries.Where(ent => ent.objectType == objectTypes.First().ToString())
                : entries.Join(objectTypes, ent => ent.objectType, act => act.ToString(), (ent, act) => ent);

        if (verbs != null)
            entries = entries.Where(
                act => act.ActivityVerbs.Join(verbs, v => v.verb, w => w.ToString(), (v, w) => w).Any());

        if (maxToFetch.HasValue)
            entries = entries.Take(maxToFetch.Value);

        return entries.Select(MakeFromEntity).ToArray();
    }
}
```

## V5: User accounts and profiles

At the moment I'm working on the code for managing user accounts. This encompasses logging into accounts, creating new accounts, changing your password, and so on. There are some interesting features and design requirements that make this a non-trivial thing to do, so maybe it'll be interesting for you to read about it.

### Federated Identity: A world of pain

Probably the biggest thing affecting the way user accounts get handled is the fact that we're supporting federated identity in V5. No longer will you need a username and password with us; if you'd prefer, you'll be able to sign in using a Facebook or LinkedIn account, or a generic OpenID. This is potentially a pretty big deal for our signup rate; given that pretty much every piece of content on the site will have a 'share' button that can spam a link to the social network of your choice, we're going to get an increase in people coming from sites like Facebook, and we want to make it as easy as possible for them to interact with the site, post comments, and so on.

Simply authenticating with these external sites would be easy, but we also want to fetch profile data from them... and while technically straightforward, the data privacy policies complicate matters. You're not allowed to store any data you retrieve from Facebook for more than 24 hours (with a few exceptions, like the numeric user ID); LinkedIn has a similar, though less explicit, policy. But if you come to the site from Facebook and post a comment, who do we attribute that comment to a week later? We can't store your name, avatar, or anything like that for more than a day.

What we have to do is simply re-fetch the data from Facebook when we need it. We can cache whatever we've fetched for up to 24 hours, but after that we drop it from our cache and wait until somebody needs it again. As well as storing your Facebook user ID, we also store the session key needed to talk to Facebook about you. The session has the special 'offline access' permission set on it, so we can keep using the same session key even when you're signed out of Facebook - it lasts until you 'disconnect' us (remove us from your FB applications listing).

So, all we need is a table of (facebookUserID, facebookSessionKey, expires, ...) to store all our cached Facebook data. We can run a job every 10 minutes or so, and for any entry that's approaching the 24 hour limit, we wipe all the data except the user ID and the session key. When the profile data is needed again, we go and refetch it from Facebook. Simples.
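The sweep job described above can be sketched in a few lines. This is a language-agnostic illustration in Python, and the row layout and field names (facebook_user_id, facebook_session_key, expires) are assumptions for illustration, not the real GDNet schema:

```python
import time

# Fields Facebook permits us to keep beyond 24 hours (an assumption
# matching the text above, not Facebook's actual policy document).
KEEP_FIELDS = {"facebook_user_id", "facebook_session_key", "expires"}

def sweep(rows, now=None):
    """Wipe cached profile fields from any row past its expiry time,
    keeping only the bits needed to refetch the data from Facebook later."""
    now = now if now is not None else time.time()
    for row in rows:
        if row["expires"] <= now:
            for field in row:
                if field not in KEEP_FIELDS:
                    row[field] = None  # drop cached profile data
    return rows
```

When a wiped profile is next requested, the surviving session key is used to refetch it, and `expires` is reset to 24 hours ahead.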

A similar approach is taken with LinkedIn - using OAuth, rather than a proprietary platform - and will be taken with OpenID, using some of the extensions to the standard. There are no explicit privacy policy concerns with OpenID, but another benefit of doing this is that we'll automatically be synchronising our data with external sites - so if you change your profile information on your OpenID site, it'll update here too.

### What's in a name?

One of the problems this is going to make much more acute is duplicate names. At the moment, it's no big deal to ask every user to pick a unique nickname, but if you're coming from Facebook or LinkedIn then it's much more natural to just use your real name. But we can't ask people to pick unique real names! What happens when two John Smiths both come to use the site?

Also, plenty of users won't want to go by their real names. Just because you've come to the site from Facebook or LinkedIn doesn't mean you're happy advertising who you are.

The end requirement is that we want every user to have a unique 'display name,' which can be constructed from their first/last name, their nickname, or a combination thereof. The rules will be something like:

- Offer the user the option to display their real name. If they turn it down, they have to pick a nickname that doesn't match any of the existing display names, and the nickname will be their display name.
- If they enter their real name, and there's no other user with that real name, their real name can be their display name, and a nickname is optional.
- If their real name is already in use as a display name, then they have to pick a nickname that will cause their display name to be unique.
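The rules above could be sketched roughly like this. It's a hypothetical Python illustration; the function name and signature are mine, not the site's:

```python
def choose_display_name(real_name, nickname, use_real_name, existing_names):
    """Pick a unique display name per the three rules above, or raise
    ValueError when the user needs to choose a (different) nickname."""
    if not use_real_name:
        # Rule 1: nickname-only users need a nickname that isn't taken.
        if not nickname or nickname in existing_names:
            raise ValueError("pick a different nickname")
        return nickname
    if real_name not in existing_names:
        # Rule 2: real name is free, so it can be the display name.
        return real_name
    # Rule 3: real name is taken; the combined form must be unique.
    if not nickname:
        raise ValueError("that name is taken; pick a nickname")
    first, _, rest = real_name.partition(" ")
    combined = '%s "%s" %s' % (first, nickname, rest)
    if combined in existing_names:
        raise ValueError("pick a different nickname")
    return combined
```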

Going by both real name and nickname will probably be displayed like:

Richard "Superpig" Fine

while going by real names or nicknames would just be what you'd expect - "Richard Fine" and "Superpig" respectively.

### Cobbling bits together

A further complication is that the user might get some of their profile information from the external site, but not all. LinkedIn, for example, doesn't provide any kind of email address. And what if the user wants to present a slightly different identity on GDNet? Maybe they go by 'T-Bird Smith' on Facebook, but they'd rather go by the slightly more professional 'Tom Smith' on GDNet.

Enter the 'profile map.' The map specifies, for each field of the profile, where it comes from: LinkedIn, Facebook, GDNetV4, GDNetV5, and so on. Whenever the site needs to load somebody's profile into memory, the accounts service begins by fetching the profile map, and then the necessary LinkedIn/Facebook/V4/V5 database rows, combining fields across them to populate the user profile data structure. (This structure is then cached in-memory to avoid having to assemble stuff from the DB every time).
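A minimal sketch of that merge step, assuming hypothetical field and provider names (the real accounts service obviously does this against database rows, not dicts):

```python
def assemble_profile(profile_map, sources):
    """Combine per-provider rows into one profile. profile_map says which
    provider each field comes from; sources maps provider name to that
    provider's fetched data."""
    profile = {}
    for field, provider in profile_map.items():
        profile[field] = sources.get(provider, {}).get(field)
    return profile
```

A field whose provider's data hasn't been fetched (or has been expired from the cache) simply comes back empty, which is the signal to go refetch from that provider.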

### Here comes the new stuff, same as the old stuff

One other thing about this architecture is that it finally answers the question of how to handle existing (V4) user accounts: they just get treated like another identity provider, same as Facebook or LinkedIn. At some point we'll convert every V4 account into a V5 account, but treating it like an external identity provider for now will make it very easy to run the two sites side-by-side until that time.

## V5: What I've been working on recently

Well, I could tell you, but maybe it'd be easier just to show you.

(Let me know if you get any errors out of it. I'm aware of two issues at the moment: one, that ads don't load in IE; and two, that sometimes a page displays a generic 'something went wrong' message which goes away when you refresh. I'm fairly sure the second is something to do with an idle timeout somewhere because it only happens after nobody's touched the pages for a bit).

More to come.

EDIT: Here's another one.

## Service process account install gotcha

Here's a little something that had me stumped for 15 mins. The info on the net about it is pretty sparse so maybe this will help somebody.

I was trying to install the GDNet service processes on the backend server. Every service process needs its own user account - it makes security, auditing, and SQL Server access a lot neater. Normally when you install a service process that uses a user account, you get prompted for the username and password of the account the service should use. I want the installs to be unattended, so I hardcoded the usernames and passwords into each service process:

```csharp
[RunInstaller(true)]
public class RendererServiceInstaller : System.Management.Instrumentation.DefaultManagementInstaller
{
    public RendererServiceInstaller()
    {
        var process = new ServiceProcessInstaller
        {
            Account = ServiceAccount.User,
            Username = "GDNET\v5_render",   // the hardcoded service account
            Password = "..."                // redacted
        };
        var service = new ServiceInstaller
        {
            DisplayName = "GDNet V5 Rendering Service",
            Description = "Service that renders GDNet XML into XHTML for output to users.",
            ServiceName = "V5Renderer"
        };

        var evtLog = new EventLogInstaller { Source = RendererService.EventLogSourceName, Log = "GDNet" };

        Installers.AddRange(new Installer[] { process, service, evtLog });
    }
}
```

When I tried running InstallUtil to install the service, though, I got this error:

System.ComponentModel.Win32Exception: No mapping between account names and security IDs was done

I'd granted the accounts in question the 'Log on as a service' and 'Log on locally' rights, and I could start processes as them by hand over Remote Desktop - so what was the problem?

Look at the username I'm using: it's a domain account, so it's in the form DOMAIN\accountname. Look at what's separating those two components. It's a backslash. Backslashes are special in C# (and many other languages). As far as the compiler was concerned, the account name wasn't 'GDNET\v5_render', it was 'GDNET', then a vertical tab, then "5_render".

Sticking an @ on the front of the username string has fixed it.
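The same gotcha is easy to demonstrate in Python, which shares C#'s `\v` escape; a raw string plays the role of C#'s `@` prefix:

```python
# "\v" is silently interpreted as a vertical tab, exactly as in C#:
naive = "GDNET\v5_render"        # GDNET + vertical tab + 5_render
fixed = r"GDNET\v5_render"       # raw string: the backslash survives

assert "\\" not in naive                          # no backslash left at all
assert naive == "GDNET" + "\x0b" + "5_render"     # \x0b is the vertical tab
assert fixed == "GDNET" + "\\" + "v5_render"      # the intended account name
```

The account-name-to-SID mapping then fails because no account named `GDNET<VT>5_render` exists, hence the unhelpful Win32 error.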

## V5: Fun with MSBuild

Now that Pulse is churning away at the codebase, I've spent time today doing further tidying of the build and deploy process. Once the site goes live I want to be able to get changes deployed quickly and safely, and I want to be able to start deploying builds for the Staff to look at within the next few days, so I'm doing what I can to get the pipeline right now. Fortunately, the tooling in this area is all really pretty good.

The first problem is versioning. As I mentioned in my last journal entry, I wanted to get MSBuild to stamp all my executables (I should have said 'assemblies' because a lot of these are libraries) with the build number. When dealing with rapidly-changing, opaque binaries spread across multiple computers, being able to ensure that your files are in sync is critical.

MSBuild, the standard build engine used for .NET projects, is highly flexible and extensible; it's very easy to just drop in new kinds of build task and add them to your project file, and best of all, Visual Studio is fine with it - it can't show them in the UI most of the time, but it can respect them, execute them, and generally not screw them up while working with the project file like normal. There's also a lot of drop-in build tasks freely available all over the net. For this, I'm using the AssemblyInfoTask (though I may upgrade to the MSBuild Extension Pack). The task takes the necessary version parts - major, minor, build, revision, plus any of the product/company name, copyright info etc that you usually find in a Win32 file version resource - and updates the project's AssemblyInfo.cs file with them prior to build. That's a little skeevy as it means the AssemblyInfo.cs file - which is under SVN - keeps getting local changes, but I can live with it. I've written a .targets file that incorporates the task just before the main compile phase, like so:

```xml
<PropertyGroup>
  <AssemblyMajorVersion>1</AssemblyMajorVersion>
  <AssemblyMinorVersion>0</AssemblyMinorVersion>
  <AssemblyBuildNumber>$(PulseBuildNumber)</AssemblyBuildNumber>
  <AssemblyRevision>$(PulseSvnRevision)</AssemblyRevision>
  <AssemblyBuildNumberType>NoIncrement</AssemblyBuildNumberType>
  <AssemblyBuildNumberFormat>D</AssemblyBuildNumberFormat>
  <AssemblyRevisionType>NoIncrement</AssemblyRevisionType>
  <AssemblyRevisionFormat>D</AssemblyRevisionFormat>
</PropertyGroup>

<ItemGroup>
  <AssemblyInfoFiles Include="**\AssemblyInfo.*" Exclude="**\.svn\**"/>
</ItemGroup>

<PropertyGroup>
  <CoreCompileDependsOn>$(CoreCompileDependsOn);UpdateAssemblyInfoFiles</CoreCompileDependsOn>
</PropertyGroup>

<Target Name="UpdateAssemblyInfoFiles"
        Inputs="$(MSBuildAllProjects);
                @(Compile);
                @(ManifestResourceWithNoCulture);
                $(ApplicationIcon);$(AssemblyOriginatorKeyFile);
                @(ManifestNonResxWithNoCultureOnDisk);
                @(ReferencePath);
                @(EmbeddedDocumentation);
                @(AssemblyInfoFiles)"
        Outputs="@(AssemblyInfoFiles);@(IntermediateAssembly)">
  <AssemblyInfo AssemblyInfoFiles="@(AssemblyInfoFiles)"
                AssemblyMajorVersion="$(AssemblyMajorVersion)" AssemblyMinorVersion="$(AssemblyMinorVersion)"
                AssemblyBuildNumber="$(AssemblyBuildNumber)" AssemblyRevision="$(AssemblyRevision)"
                AssemblyBuildNumberType="$(AssemblyBuildNumberType)" AssemblyBuildNumberFormat="$(AssemblyBuildNumberFormat)"
                AssemblyRevisionType="$(AssemblyRevisionType)" AssemblyRevisionFormat="$(AssemblyRevisionFormat)">
    <Output TaskParameter="MaxAssemblyVersion" PropertyName="MaxAssemblyVersion"/>
    <Output TaskParameter="MaxAssemblyFileVersion" PropertyName="MaxAssemblyFileVersion"/>
  </AssemblyInfo>
</Target>
```
This could be made somewhat more efficient - I don't strictly need to pull the version bits out into a separate PropertyGroup, for example, and could just write them directly into the attributes on the AssemblyInfo element. Still, it gets the job done. All I then need to do is add an Import statement into my .csproj file pointing at this .targets file, and the build step is magically included.

Note how the build and revision numbers are actually variables - PulseBuildNumber and PulseSvnRevision. I'm passing those in as arguments to MSBuild when I launch it. You can do this on the command-line using the /p switch, though because I'm using Pulse, it's actually got an XML config file that I use to feed inputs to MSBuild.

$(build.number) and $(build.revision) are, in turn, built-in variables defined by Pulse whenever it launches a build. See the data pipeline!

I had a good question from @naim_kingston on Twitter, asking why I use both the build number and the SVN revision number - aren't they redundant? In theory, yes; I should only need the SVN revision number, and then should be able to check out that revision of the code, build it, and always get the same result. In practice, though, I might not always get the same result because there are elements of the environment that may have changed. For example, maybe I'm using a different version of the compiler, or of the build tasks library. Storing the build number as well allows me to more quickly correlate a particular binary to its entry in Pulse's build log, so I can very quickly go to Pulse and download the right .pdb files, MSBuild output files, and so on, and always be confident that what I'm getting is from exactly the same build, rather than just one that used the same code.

So, that's got versioning sorted. I need to add the Import element to more of my project files, but I've got the main service projects covered for now. I'll add more as I go along.

Next, app.config files. It's common to want to change stuff in these files, such as the address at which a service can be found (e.g. from "db-server.gamedev.net" to "localhost"), but changing the app.config file directly means you have to remember not to check it into SVN, and it's kinda pesky to have it always showing up as 'modified' in the Pending Changes window. What would be better would be if I could have a second file of 'local overrides' that should be used in preference to the app.config file, falling back to app.config for stuff I don't care to change.

MSBuild to the rescue once more. This time I've used the MSBuild Community Tasks, which includes a task called "XmlMassUpdate" - given two XML files, it takes the nodes from one, and adds, inserts, or replaces them into the other. There's also some custom attributes for removing nodes from the target file. Another .targets file integrates the task into my build pipeline, and presto: I have an app.local.config file in each project, svn:ignored to stop it from pestering me, that MSBuild neatly integrates on every local build.
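The overlay idea can be sketched without the real XmlMassUpdate task. This toy Python version just replaces same-named child nodes - a simplification, since the actual task matches nodes more cleverly and supports removal attributes:

```python
import xml.etree.ElementTree as ET

def overlay(base_xml, local_xml):
    """Merge a 'local overrides' XML fragment into a base config fragment:
    a node in the local file replaces the same-tag node in the base file;
    everything else falls through from the base unchanged."""
    base = ET.fromstring(base_xml)
    local = ET.fromstring(local_xml)
    for node in local:
        existing = base.find(node.tag)
        if existing is not None:
            base.remove(existing)
        base.append(node)
    return ET.tostring(base, encoding="unicode")
```

So an `app.local.config` containing only a `dbServer` override would flip that one setting to "localhost" while leaving the rest of `app.config` alone.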

The next challenge I face is how to get each service from a ZIP file in Pulse to a correctly installed and registered presence on the relevant server. There's more to this than just XCOPY - most of the services need to be registered as Windows Services, have event log sources and WMI classes created, etc - and ideally it should happen without me logging into each machine by hand and copying files around + running InstallUtil. The answer is probably going to be to build MSI files. Anyway, that's for later. For now, I'll sleep.

## V5: Continuous Integration and Deployment

I spent a bit of time recently doing some work on V5's build pipeline, implementing continuous integration and making the deploy-to-servers process a bit more formal. Unlike most web developers, I'm a big fan of pre-deployment testing and verification, so a well-established build process is a key part of that.

Continuous Integration, for those who aren't familiar with it, is the simple idea that your code should be getting built all the time: every change you check into source control gets compiled, packaged, and tested on all your target platforms - automatically, of course. It's a great way to catch build errors in other configurations, or on platforms other than the one you're developing on.

Many people go for CI servers built around CruiseControl, but after researching the options when I was back at NaturalMotion, I selected, used, and fell in love with Zutubi Pulse. So, it's now running on GDNet, a nice complement to our issue tracker and source control system.

Pulse is great. It's got an easy-to-understand but elegant and powerful web UI, built-in support for a bunch of external build systems (such as MSBuild), it's trivial to install... but the best thing, really, is the support. Zutubi is, as far as I can tell, two guys in Australia - Jason and Daniel. Yet, between them, forum questions get answered within minutes, with detailed and helpful responses; feature requests get logged and show up in a point release a week later; their JIRA instance is publicly accessible; and they still, somehow, find time to blog about build systems, agile programming, unit testing, and so on. If I ever meet these men, I am buying them a drink. Each.

Two further things that are more relevant to the average GDNetter: Firstly, they have free licenses available for open-source projects and for small teams (2 people / 2 projects), and secondly, I'm told they've got a number of game developers as customers... so they've got quite a lot of familiarity with our use-cases, and Pulse handles things like '4GB of art assets' pretty well. I'd definitely recommend checking Pulse out if you've got the hardware to spare.

The other nice thing about having a CI server is it provides an authoritative 'release provider' within the GDNet LAN: a clear, single source for new releases of the site software to be deployed to our machines. I've done some work tonight to have Pulse capture the executables and content directories as zip-file 'artifacts;' next I'll get MSBuild to actually stamp the executables with the build number, and I'll look into ways to quickly and efficiently deploy the artifacts to the machines that need them. Eventually, doing a new release of the GDNet site will just be a question of clicking a 'trigger' button, and watching the progress bar tick for a bit [grin]

## V5 Guts: Text Sanitizer

One of the biggest causes of security issues in sites - XSS attacks, SQL injection, etc - is a failure to properly handle user input, making sure that it doesn't contain undesirable elements.

This is potentially a very complex task, and it gets more complex the more the user's allowed to do and the more you care about the output. In V5, I want to expand the capabilities of the markup users can include through things like attributes; I also want to keep the data on the server end in a highly flexible format, making it easy to do things like strip out smilies, find posts associated by quotations, and so on. XML seemed the obvious choice.

Another thing I really, really wanted to fix is the way HTML entities get handled. At the moment, if you make a post with &lt; and &gt; entities in it, they get turned back into raw < and > characters when you edit the post, and then treated as HTML when you save the post again... there are also problems with how to encode things when putting them out as RSS or similar. I wanted to put a stop to all these encoding issues.

Happily, we've now got a pretty solid pipeline in place. A combination of HTML Agility and OWASP AntiSamy, with my own extensions and modifications, provide the bulk of the work.

HTML Agility takes the tag soup you guys will throw at the site and turns it into an XML document. At its core is a normal state-machine based parser that generates DOM nodes as it encounters them. Agility also handles encoding issues, turning HTML entities like &trade; into their actual characters. I've also extended it to allow tag names that have namespace prefixes, so namespaced tags make it through the parser intact.

The output from Agility is a near-as-dammit-valid XML document that I feed to AntiSamy.NET. Now, AntiSamy I have made some fairly extensive changes to, updating it for C# 2 and multithreading it all. Still, the core concept remains the same: AntiSamy has a 'policy' of which tags are allowed, and which attributes and CSS properties are allowed on them (along with regexps defining the values those attributes and properties can take). When something isn't allowed, it can be dropped entirely - such as I might do to script tags - or it can be 'filtered,' removing the tag but leaving its contents. I've set it up to support multiple policies, so I can permit one set of tags when writing articles, another when writing journal entries, another when writing forum posts, and so on.
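The drop-versus-filter distinction might look something like this in miniature. It's a toy Python sketch, not AntiSamy's actual algorithm, and the tag sets are assumptions:

```python
import xml.etree.ElementTree as ET

def apply_policy(fragment, allowed, dropped):
    """Allowed tags are kept as-is; 'dropped' tags vanish along with their
    contents; any other tag is 'filtered': the tag goes, its contents stay."""
    root = ET.fromstring("<root>%s</root>" % fragment)

    def walk(node):
        out = node.text or ""
        for child in node:
            tail = child.tail or ""
            if child.tag in dropped:
                out += tail                      # drop tag AND contents
            elif child.tag in allowed:
                out += "<%s>%s</%s>%s" % (child.tag, walk(child), child.tag, tail)
            else:
                out += walk(child) + tail        # filter: keep contents only
        return out

    return walk(root)
```

A real policy would also whitelist attributes and validate their values against regexps, per tag, as described above.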

The result is a neatly-filtered XML fragment that I can quickly and easily perform XPath queries against, or feed to the renderer for processing by the XSLT stylesheets and outputting.

## Wheeee

There's a horribly subtle little bug here. Can you spot it?

```csharp
public IEnumerable<ContentItemHeader> GetContentItemHeaders(IEnumerable<Uri> uris,
    ContentItemHeader.Fields fieldsToGet, int actorSid)
{
    using (var context = new ArticlesDataContext())
    {
        return uris.Select(delegate(Uri u)
            {
                var identifier = PrefixUri.MakeRelativeUri(u).ToString();
                try
                {
                    var g = new Guid(identifier);
                    return context.articles.Where(a => a.ID == g).FirstOrDefault();
                }
                catch (FormatException)
                {
                    return context.articles.Where(a => a.UrlTitle == identifier).FirstOrDefault();
                }
            })
            .Where(g => g != null)
            .Select(g => new ContentItemHeader
            {
                ObjectSID = 0,
                Uri = new Uri(PrefixUri, g.UrlTitle)
            });
    }
}
```

(Hint: It manifests as an ObjectDisposedException).
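As a nudge in the right direction, Python has a directly analogous trap: a generator defers all its work until iteration, by which time the resource it captured may already have been disposed. A hedged sketch:

```python
def read_lines(path):
    """Buggy: the generator is lazy, so nothing reads the file until the
    caller iterates - after the 'with' block has already closed it."""
    with open(path) as f:
        return (line.strip() for line in f)   # deferred! nothing runs yet

def read_lines_fixed(path):
    """Fixed: the list comprehension forces evaluation while the file
    (like a LINQ DataContext) is still alive."""
    with open(path) as f:
        return [line.strip() for line in f]
```

Iterating `read_lines(...)` raises ValueError ("I/O operation on closed file") - the moral twin of LINQ's ObjectDisposedException.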

## GDNet V5 Concepts: User Ratings

One of the questions from a previous entry was what's happening to user ratings in V5. I don't have funky screenshots to show you this time, but I'll talk about what the plan is.

### The present system
The present user rating system, visible under every post as a number, was created to solve a set of problems:

- How do users distinguish the people that should be listened to from the people that shouldn't?
- How do we identify users who are contributing to the site and community?
- How do we identify users who are detracting from the site and community?

These problems were all solvable, but they required a lot of time investment and effort. We wanted to shift away from solutions that relied on users and moderators spending lots of time watching site activity. The solution was to seek to recruit the entire userbase to help solve the problem, by giving everybody a means to indicate who should and shouldn't be listened to. That, in turn, needed some kind of balancing to determine which people were good judges, which is why higher-rated users have a larger effect on the ratings of others than lower-rated users.

It's true that in general, the rating system has worked. The top-rated users are, pretty much uniformly, good contributors to the site. The lowest-rated users are generally incoherent, in(s)ane, and unwanted - though I think that exceptions exist. And users do pay some attention to the ratings of those they read, though only around 1% of registered user accounts actually filter out posts with ratings below a given threshold.

We do definitely see some undesirable behaviour. For example:

- People getting upset about their rating dropping a few points and posting threads about it. This wouldn't happen if people were less sensitive, of course, but we have to face the fact that they are this sensitive. It doesn't help that there's not much one can tell those people except "be nicer."
- Bandwagoning - people voting somebody down partly because they've got a low rating, and That's What This Thread Is All About Anyway. Group dynamics can be bizarre at times.
- People who are great technical contributors, ending up with low(er) ratings because they got a bit ranty in the Lounge, and therefore start to be ignored in technical discussions.
- Similarly, people who are really funny in Lounge threads get high ratings, and then when posting in technical threads perhaps get given more authority and credit than they're due.
- People who get low ratings can have trouble recovering that rating, partly because people aren't inclined to vote low-rated users up, and if the filters are in play then their posts won't even get seen. This usually leads to the low-rated poster either creating a new account (which is a policy violation) or just leaving the site altogether. Sometimes they'll stay and just not care about their rating, but whether or not they care doesn't change the fact that we then have a user who is making positive contributions but has a low rating.

At the heart of the current rating system's design rests a few fundamental assumptions. Firstly, it assumes that if a user is good in any one way recognised by the community, then they're good in all ways - or at least are smart enough to disclaim themselves in areas where they're not good. Secondly, it assumes that users will fully consider a user and the contributions they've made to the site as a whole before rationally rating them. Thirdly, it assumes that users have good ideas about how to respond to changes in their rating - that they don't just keep doing exactly what they've been doing (albeit with an added air of bafflement and indignation) expecting a different result.

It also encourages a bad philosophical assumption on the part of the reader: that something is right because a particular person said so. Smart users won't read the ratings that way; but some users, when given two answers to their question, will pick the answer from the higher-rated user because the user is higher-rated, rather than because the answer is better.

None of these assumptions are good. They hold true often enough that we can point to some corroborating accounts and say, "look, the system works!" - but that doesn't tell us whether the system works as well as it could.

I'm the highest-rated user on this site, so it's not something I consider lightly [grin] but in V5 I'm planning to replace the present rating system with an approach that is less susceptible - albeit not totally immune - to the above problems.

### The V5 Rating Strategy

#### Tagging
The first problem I set out to solve was this: How do we make the rating better convey the ways or areas in which a person is good?

The solution to this one seemed fairly obvious. A mechanism by which users can express their support of a person in arbitrary, user-defined categories? Sounds like a job for tagging to me! By letting users tag users as another kind of site content, we go from having a single rating axis, to as many axes as you want - be they subject-area tags like 'Python' or 'object oriented,' or style tags like 'funny' or 'friendly.' Reconciling the different ways users tag content is already something the tagging engine has to do.

Immediately this also defeats the assumption that 'good in one area == good in all areas.' It becomes very easy to identify when a user is participating in something that matches their tags - i.e. when they're talking about what they're good at.

#### Thanks
How do we defeat the second assumption - that users will think long and hard before selecting tags for a user? In reality, people don't do that - they read one post, have a strong reaction to it, and rate accordingly; they don't go "well, this post is obnoxious, but maybe the guy's just having a bad day. I'll check out his other stuff to be sure." If we embrace the strong-reaction-to-a-single-post idea instead of denying it, what we get is this: let people express that reaction with a single click, and then aggregate those reactions to get a feel for where the user is best received.

This will be implemented via a 'thanks' button on every content item that a user can contribute to. It lets you express that strong reaction quickly. Then, over time, the posts that a user is 'thanked' for will start to contribute their tags to the user - if the user receives lots of 'thanks' in threads that are tagged 'Python performance pygame,' for example, then they'll start to acquire those tags themselves. This also gives users more feedback on what they're doing right.
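The aggregation this describes can be sketched in a few lines of Python (illustrative only - the real site is .NET, and the function and data shapes here are mine): each 'thanks' event carries the tags of the thread it occurred in, and the counts accumulate per user.

```python
from collections import defaultdict

def propagate_tags(thanks_events):
    """Turn raw 'thanks' events into per-user tag weights.

    thanks_events: iterable of (user, thread_tags) pairs, one per thank
    received. Returns {user: {tag: count}}, where a tag's count is the
    number of thanks the user received in threads carrying that tag.
    """
    user_tags = defaultdict(lambda: defaultdict(int))
    for user, thread_tags in thanks_events:
        for tag in thread_tags:
            user_tags[user][tag] += 1
    return {user: dict(tags) for user, tags in user_tags.items()}
```

A user thanked twice in 'Python'-tagged threads thus starts to carry the 'Python' tag themselves, and the growing weight doubles as feedback on what they're doing right.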

Will there be a 'No thanks!' button? I'm not sure, but I think probably not. If you don't like a contribution, just don't thank the author. If it's really necessary, you can still tag the author explicitly, or even report the post to a moderator.

#### Decay
How do we deal with the fact that a user's expertise will change over time? Maybe they were a game programming guru 10 years ago, but they've not kept up and their advice is out of date now. This is a fairly simple one, actually: have tags 'decay' over time. Tags that are still frequently applied to a user will 'refresh' and will decay more slowly than tags that aren't. This also solves the 'idiot' problem - how to handle people tagging each other as 'idiot' - because if the user stops being an idiot, the tag will fade away; and it mitigates the lack of a 'no thanks' button, because posting without receiving thanks will cause your tags to fade away.
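A decay rule along these lines is easy to sketch in Python - the half-life constant here is an arbitrary number for illustration, not anything the site has settled on:

```python
# Exponential decay: a tag's weight halves every HALF_LIFE_DAYS unless
# it gets re-applied, which resets days_since_last_applied to zero.
HALF_LIFE_DAYS = 90.0  # illustrative constant, not a real site setting

def decayed_weight(weight, days_since_last_applied):
    """Current effective weight of a tag last applied N days ago."""
    return weight * 0.5 ** (days_since_last_applied / HALF_LIFE_DAYS)
```

A tag that keeps being re-applied keeps resetting the clock, so it stays near full strength, while a stale one fades smoothly towards zero.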

#### Getting input
How do we get people to actually use this stuff? That's one of the bigger problems with tagging in general. Step one is to make things as easy to use as possible - single-click to 'thanks' a post, two clicks to get to adding more complex tags. Step two is to get users to at least tag their own stuff; users will be encouraged to 'self-assess' by tagging themselves, to tag their own threads and entries, and so on. Step three is to incentivize. Now, there's a limited amount we can do here - we're not about to start paying people to tag content. What you saw in my last post, though, was the 'badges' system in userboxes; what we can quite easily do is grant a badge to people who tagged 100 content items in the past month, or something like that.

#### Using the output
Lastly, how do we help users find the best possible content, instead of wasting their time with incoherent in(s)anity - without encouraging them to trust an answer just because it's from a highly rated user? This is a balancing act to be sure, because most of the time the best content is produced by the high-rated users.

The first trick here is to make the way that ratings are displayed be subtle; no more four-digit numbers on each post. Instead, we're considering things like changing the background colour of the post, or the thickness of the post border, to indicate when a user is strongly aligned (tagged the same way as) a thread. Making the display subtle in this way will still make the post stand out a little in the thread, without providing such a clear and definitive thing that people can get overexcited about.

What we will probably display clearly on a post is the number of times it's been thanked (perhaps only within the past X weeks). This makes the number that people latch onto be about individual posts, rather than about users, and that's a lot safer - posts are easier to talk about without people taking things personally.

The second trick is to use the information on a broader level to bias search results. When you're searching for content on a particular topic, the search can elevate threads that have good alignment, or that contain lots of 'thanked' posts. This still acts, to an extent, on the idea that content will be right 'because a smart person said it,' but by elevating it to the per-thread level instead of the per-post level, lower-rated users still have a good opportunity to point out when the higher-rated user isn't making sense.

You'll notice I've not talked about 5-star ratings at all so far. We're still deciding exactly how they'll be integrated. The advantage that 5-star ratings offer is that they are coarse; tagging a thread with particular tokens might capture what the thread is about, but maybe you just want to convey some overall impression that the thread is awesome (or terrible), without figuring out exactly which tags would express that; they might be more applicable to, say, gallery entries. They've got their fair share of problems, of course, as comments on my previous post about the rating UI pointed out. We'll have to do some more thinking about them.

### Conclusion
The new system doesn't quite solve the problems that the original rating system set out to solve. Instead, it focuses on the deeper problems of how to get the best content into your hands as quickly as possible and how to describe users; they're harder problems, naturally, but I think more worthwhile.

So, what do you think? I expect that quite a lot of people might have strong feelings about this topic [smile]

## Aaargh

This may be one of the most annoying developments in web technology I've encountered so far.

I guess it's not the access control spec itself per se, as much as it is Firefox and Firebug's implementation of it. Though it is frustrating that a request from the http version of a site to the https version of a site is considered cross-domain; the domain name and thus the set of IP addresses used in each case is the same, so I don't see any scenario in which you could control one but not the other, at least, not one that would otherwise be winnable (e.g. malicious router on the backbone routing things at the application level - it could do that for different file paths within an application, let alone different protocols).

When a cross-domain XHR fails under Firefox, there's no feedback as to why. There's no exception, no console message. The Net panel shows the requests that have been made - so you might see the OPTIONS preflight request, if one is made - but it doesn't tell you if/when it's discarding the results of a request (in response to security policy bits not being satisfied), or why. All you get is an empty XHR object in an error state.

Which is... difficult... to debug.

## Text sanitization

My work over the past few days has mostly been on the text sanitizer.

The sanitizer is an interesting beast. The basic task it faces is to take a chunk of text that only approximately approaches XHTML (annotated with custom GDNet extensions), lex and parse it into an XML tree, strip away any elements or attributes that aren't permitted, and ensure that the result is valid XHTML (or that it would be when wrapped inside a DIV).

The first part - generating the XML tree - is actually the simplest. I'm using HTML Tidy, an open-source library designed for exactly this kind of thing: it can take arbitrary input and return valid XML, adding closing tags and the like where necessary.

The next step - stripping forbidden elements and attributes - is harder. The sanitizer supports different sanitization 'profiles' that describe what is and is not allowed for a given chunk of text; this means we can, for example, set a profile for the forums that only grants basic text and formatting tags, but set a profile for the journals that grants things like tables and embedded video.

One significant decision is whether to take an inclusive approach (only the named tags are allowed; everything else is removed) or an exclusive approach (only the named tags are removed; everything else is kept). Inclusive is better in that it's more secure, but it also means that the sanitizer needs to know about every possible tag you might want to use, including the attributes permitted on each. The exclusive approach is much easier to write - I just 'blacklist' the disallowed tags and attributes - but it's much more open to abuse, in that if I forget a tag then we've got problems. Things are complicated further by the way in which the children of removed tags should be handled - if you've used the bold tag and it's not allowed, then the tag should be removed without removing the text within it; a script-like tag, on the other hand, should be removed along with everything inside it.
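To make the inclusive approach concrete, here's a minimal whitelist sanitizer sketched in Python over an already-parsed tree - a stand-in for the real HTML Tidy plus profile machinery, with the profile contents and names all illustrative:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative profile: tags we allow (with their permitted attributes),
# and tags whose entire contents must be dropped, not just the tag.
ALLOWED = {"div": set(), "p": set(), "b": set(), "i": set(), "a": {"href"}}
DROP_CONTENTS = {"script", "style", "object"}

def sanitize(el):
    """Re-serialize `el`, keeping only whitelisted tags and attributes.

    Unknown tags are 'unwrapped' (their text survives, the tag doesn't);
    script-like tags are removed along with everything inside them.
    """
    if el.tag in DROP_CONTENTS:
        return ""
    allowed = el.tag in ALLOWED
    parts = []
    if allowed:
        attrs = "".join(f' {k}="{escape(v)}"' for k, v in el.attrib.items()
                        if k in ALLOWED[el.tag])
        parts.append(f"<{el.tag}{attrs}>")
    if el.text:
        parts.append(escape(el.text))
    for child in el:
        parts.append(sanitize(child))
        if child.tail:
            parts.append(escape(child.tail))  # tail text sits outside the child
    if allowed:
        parts.append(f"</{el.tag}>")
    return "".join(parts)
```

Feeding it `<div>hi <b>bold</b><script>evil()</script></div>` keeps the bold text intact but drops the script wholesale, which is exactly the two removal behaviours described above.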

One thing I'm doing to ease the development burden is to use unit tests. I'm building a collection of bits of malformed or malicious text, coupled with the result that the sanitizer should produce.

This is where you can help. What test cases should I have? What finicky tricks and traps do you think the sanitizer should be watching out for?

## Search, don't Sort

One of the major philosophical elements of the V5 design is one taken from Google: Search, don't sort.

The problems with rigid categorization - sorting content items into distinct categories as 'containers' - are fairly well-known:

- How do you decide what categories there should be? GDNet only creates new forums when there's sufficient traffic in one area to warrant it; we do this for good reason, but until the traffic reaches critical mass, the category on a topic isn't as precise as it could be.
- How do you decide which category something should be in? When you've got categories as vaguely defined as 'Game Programming' and 'General Programming,' it's easy to see how people get confused.
- What do you do when a content item should appear in more than one category? And what if it should appear in each category to unequal extents?
- How do categories relate to one another? If something in one category is commonly in another category, perhaps they should be nested? If something is in the nested category, is it always also in the parent category?

A different approach is flexible category annotations, or 'tags.' Instead of viewing categories as containers that content items are sorted into, they're viewed as indexes into the content pool, fuzzy sets that describe the data rather than housing it.

What am I telling you this for? It's pretty well-known stuff by now, I guess. I'm bringing it up because over the past few days I've been working mostly on the tagging and search engines for V5.

The tagging engine has a pretty simple set of responsibilities:

- Store and retrieve the tags associated by a user with a given resource.
- Calculate some set of 'aggregated' tags for a resource, using the tags applied to the item by all users.
- Find the resources most relevant to a tag or set of tags.

The implementation I've written so far is a naive one, but it'll suffice for the time being. The aggregation process is simply the average of all user-applied tags - crude, but open to tweaking later. Finding the most relevant resources is little more than a SELECT query, scoring relevance by the mean squared error between each resource's tag set and the supplied search tags. There are problems, but they can be fixed later.
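Those two operations - averaging and scoring - might look like this (a Python sketch under my own naming; the real thing lives in SQL):

```python
def aggregate(user_tagsets):
    """Average per-user tag weights into one tagset for a resource.

    Tags a user didn't apply are treated as weight 0 in their set.
    """
    totals, n = {}, len(user_tagsets)
    for tagset in user_tagsets:
        for tag, weight in tagset.items():
            totals[tag] = totals.get(tag, 0.0) + weight
    return {tag: total / n for tag, total in totals.items()}

def relevance(resource_tags, query_tags):
    """Mean squared error between two tag-weight vectors; lower is better."""
    tags = set(resource_tags) | set(query_tags)
    return sum((resource_tags.get(t, 0.0) - query_tags.get(t, 0.0)) ** 2
               for t in tags) / len(tags)
```

A resource tagged exactly as the query scores 0, and the score grows as the weight vectors diverge, which is what a relevance-ordered SELECT would sort by.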

One nice trick resulting from the RESTful schema for the site is that each resource has a nice, clear URI - ideal for using as a key. So each tagset is the association of a set of (Tag, Weight) pairs with a Uri. The result is completely content-agnostic; the tagging engine knows nothing about the kinds of content the site offers.

The tagging engine's last responsibility - finding resources - is obviously highly related to the search engine. Not all searches are tag-related; for example, Active Topics is a search for all discussion threads updated in the past 24 hours, while it's easy to imagine other searches based around the author of the content or similar. So, there is a separate search service that stores, maintains, and performs all saved and transient searches, using the tagging engine when appropriate.

## IE8 fixes, and chat client

Not much to report today.

IE support is now better, though not on a par with the other platforms by a long shot. Funnily enough, the problem wasn't the mime type - I've been serving it up as text/html for IE for a long time - but the actual document content itself. Specifically, benryves drew my attention to the part of the XHTML standard which states that certain tags should be written as explicit open/close pairs (rather than the minimized <tag/> style). Under XML these are equivalent, but given that we're pretending for IE's sake that the document is HTML, it causes problems. What's most interesting is that it also breaks Firefox - Firefox does not seem to like the minimized forms either.

## V5 Pre-alpha launch

Happy 10th birthday, GDNet! I got you a present. It's not much. I'd hoped, planned, for so much more, but you know how these things go.

Yes, folks, the V5 codebase is finally at a point where I can start putting bits of it up for public dissection, consumption, digestion, and *ahem* feedback!

There's not much to show you today, but I'm planning on pushing out new stuff very quickly at this point; much of the infrastructure is now in place, reasonably solid, so I can really focus on things that you can see.

Things to note before we start:

Firstly, I've been developing it primarily in Firefox; it also mostly works in Chrome. It's broken in IE - I think the problem is the content-type - and I've not tested it in Opera. Eventually, the site will be supported in FF3, IE7 or later, Chrome, Safari, and Opera. I'm aiming for it to degrade gracefully in older browsers, but that's not a top priority and it probably won't be pretty.

Secondly, I've been doing all of the graphic design work myself, and I'm no artist. I'm focusing mostly on the functionality of the UI; consider the way that it looks to be 'programmer art' for now. Somebody with actual aesthetic sensibilities will look at it later, I promise [grin]

Thirdly, speed-wise, what you can see today is an unoptimized debug build, sharing a server with the current site (and the current site does not like to share). I've not had a chance to properly stress-test it, which is partly what taking it public is for. So, performance will improve drastically as the bugs are ironed out and I can start turning off the debugging flags.

Lastly, it should all be valid XHTML, CSS, and javascript; it should all work correctly when you are increasing or decreasing the text size; and the URI schema should be generally RESTful.

- You can use your regular GDNet username/password for login. It's all connected up to the current site DB through an adaptor layer that maps V4 database records to the new schema formats to as great an extent as possible.
- Submission of username/password info is now done over SSL, for greatly improved security. (Maybe you don't care that much about your GDNet account being secure right now, but this is an absolute requirement for some of the services we want to offer in the future.)
- Once you're logged in, you should see a little bug icon next to the welcome message in the bottom right corner. Click it, and you'll get a box that lets you submit bugs and feedback, right from the browser; reports go automatically straight into my bugtracker. This icon should appear on every page of the site for logged-in users. Go ahead and use it liberally over the coming weeks. (Please don't abuse it; all you do is make more work for me.)
A forum topic

- I've tried to minimize the amount of extra cruft displayed on each post, so you can focus on the content. Extra user info can be revealed by hitting the chevrons at the right end of the post header.
- Avatars don't work yet. They're going to be hard to sync between the current site and the new site...
- You can see a few people have badges next to their name. More info about their badges is displayed in the expanded info. At the moment there are only two kinds of badge - Moderator and GDNet+ - but it's easy to think of other badges we might create and apply.

So, yeah. Not much to look at for now, but gimme feedback. I should have some more stuff for you in the next couple of days.

## Almost done

So yeah, I've been a bit quiet recently. This is partly due to life events, partly due to being busy working on V5, but mostly due to my Master's degree. A degree which, as of Monday at noon, I will have finished.

Monday noon is the final deadline for the last piece of work I have due in: my dissertation, my magnum opus, my fourth-year project. Weighing in at 8510 words plus a sizeable chunk of code, the project has reached the point where I consider it done. I could keep picking over it, tweaking words and phrasing here and there, but all that would do is drive me slowly insane.

What might be more productive would be if I released it for some peer review, so that's what I'm doing.

An Object-Oriented Spline Library

I'll get the code up here as well in short order, but if any of you happen to feel like taking the time to review the report itself - it should be pretty comprehensible on its own without the code anyway - I'll give you love, and possibly cookies.

I want to freeze this by the end of the day so I've got time to get it printed and everything, so there's a limited window that I can use the feedback in - but as I'm planning on releasing the library (under a permissive license) once everything's done, it's all potentially useful for later anyway.

(Feedback collected from #gamedev so far: try some different layout bits; define what C0 continuity means; fix the spelling mistake in the ToC ("tesselation"); replace some of the code snippets with neater pseudocode. Thanks clb, Shadowdancer, Zao, sirob, and everyone else...)

## GDC: Day One

Right, you know the drill by now I'm sure. I'm in the Press Lounge with Dave and Kevin, preparing to head out for the day's sessions. Today and tomorrow I'll be covering the Casual Games Summit. There's a curious 'red ocean versus blue ocean' theme going on, but in general it looks like there's a nice review of the state of the industry, and a fair chunk of talk about social media and multiplayer.

Fingers crossed that the Moscone wireless will be better than last year. Press Lounge seems OK - but then it always does until Wednesday. We'll see.

## V5 misc

V5 work continues. Today I added the IRC client; rather than using the Java applet again, we're going with Mibbit, which is more fully featured, and runs purely on javascript/AJAX. It's also under active development, and we just embed it, so as they add new features to it those features should just magically manifest themselves at our end. Lurvely.

Let's see, what else have I done? Some refactoring... logging in is now a much simpler codepath. I've added support for reporting bugs directly from the browser page (for logged-in users, at least), which automatically includes useful information in the report like the page you were looking at or various page-level javascript vars. Should make the beta process much smoother. There are also now both RSS 2.0 and Atom feeds for Active Topics in the code.

I've also done more work on the URI schema - the actual addresses you'll use to access resources on the site. I'm going for a RESTful approach with all this, so getting the URI schema right is less about organising files on the webserver and more about usability; for example, /community/forums/topic.asp?id=123456 becomes /discuss/topic/123456 and so on. It also motivates the design of service contracts going forward.
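The V4-to-V5 address translation can be pictured as a rewrite table - a Python sketch with a single rule; the only mapping shown is the one from the text, and the real schema covers far more routes:

```python
import re

# Old V4 address -> new RESTful V5 address. One illustrative entry.
V4_TO_V5 = [
    (re.compile(r"^/community/forums/topic\.asp\?id=(\d+)$"),
     r"/discuss/topic/\1"),
]

def rewrite(old_uri):
    """Return the V5 form of a V4 URI, or the URI unchanged if unknown."""
    for pattern, replacement in V4_TO_V5:
        if pattern.match(old_uri):
            return pattern.sub(replacement, old_uri)
    return old_uri
```

The point of the exercise is that the new addresses name resources, not script files - which is also what makes them usable as keys elsewhere in the system.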

I'm generally pretty happy about my choice of WCF - the documentation is good, the framework generally follows the principle of least surprise, integration with third party tech is fine (any .NET or COM library is trivial to work with, plus of course any other web services), and the more I dig into customizing the WCF stack itself - such as for my XSRF filters - the more I feel that I am bending the framework to my will, rather than being forced to conform to its way of working.

I've only really got one complaint about it, at this stage: while it's very easy to swap out framework pieces for custom components, often those pieces are larger than you really want. That would be fine if it was easy to recreate functionality offered by the part you're replacing, but MS keep most of the relevant helper classes and methods as internal to the WCF assemblies. I'd really like to reuse their code for extracting the body of a POST request as a Stream, for example, but the relevant class (HttpStreamFormatter, I believe) is marked as internal. I can understand that every class they expose publicly is one they have to document, support, and change control, but I think it would be worth it, particularly for people building HTTP apps with WCF.

## V5: XSRF Prevention

Looks like I just missed Gaiiden's weekly journal roundup. Oh well.

I've spent today and yesterday implementing a security measure against cross-site request forgery attacks, otherwise known as XSRF attacks. These are a slightly terrifying class of attack, not least because so few people seem to be paying attention to them; an estimated 70% of sites on the web are vulnerable to - and have done nothing to guard against - this kind of attack.

XSRF is an attack in which a malicious site causes your browser to make a request to another site, in a way that takes advantage of the cookies or session key you've already got open with that other site.

Say you've got a banking website which allows you to conduct some transactions online. They've got a web form for sending money from your account to another one; it submits data to /actions/do_transaction?to=XXXX&amt=YYYY, where XXXX is the target account number and YYYY is the amount. When you're logged into the site, your session is maintained through the use of a cookie stored on your machine.

All that I have to do is embed a 1x1 image in my page that is sourced from '//your.bank/actions/do_transaction?to=1234&amt=1000', and if you view my page while you're logged in, then presto - you've transferred $1000 to account number 1234. Your browser sees the URI that the image is supposed to come from, and issues a request for it - sending any cookies necessary to keep the session alive. It's like 'remote controlling' a session - there's no need to ever actually steal the session cookie when you can just make the browser that already holds it do what you want. It's known as a "confused deputy" attack.

So, some protections that don't work:

- Check the referrer: easily faked, plus some users don't send referrer headers.
- Use POST requests instead of GET requests: while this would defeat the IMG tag approach, it's trivial to get around using javascript and XmlHttpRequest.
- SSL: at no point is the connection between you and your bank actually attacked here, so securing that connection doesn't help.
- Encrypted cookies: again, the cookie is never actually stolen, so encrypting it won't help.

Ultimately, there is only one possible defence: Require that the request contain some information that is not stored in cookies and that malicious sites cannot know ahead of time. When your bank presents the 'transfer money' page, it includes that information in the page itself - in the HTML, or in the javascript - and submits it straight back again when you've finished filling out the form. So, if a malicious site wants to obtain that information, it can only do it while you've got the actual page open - and in theory the browser security model should prevent that.

As for the information itself, something as simple as a hash of the request URI with the session ID is enough to shut down most (if not all) attack scenarios. It's got the advantage of being easily testable - all the information you need is in the request itself.
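As a sketch of that token scheme in Python - I've used an HMAC keyed on the session ID rather than a bare hash of the concatenation, a slightly hardened variant of what's described (plain hashes of concatenated values invite length-extension tricks); the function names are mine:

```python
import hashlib
import hmac

def xsrf_token(session_id, request_uri):
    """Derive the per-URI token embedded in the served page.

    Nothing needs to be stored server-side: the check below recomputes
    the token from information already present in the request.
    """
    return hmac.new(session_id.encode(), request_uri.encode(),
                    hashlib.sha256).hexdigest()

def token_is_valid(session_id, request_uri, supplied_token):
    """Recompute and compare in constant time."""
    expected = xsrf_token(session_id, request_uri)
    return hmac.compare_digest(expected, supplied_token)
```

A malicious site can't compute the token without the session ID, and can't read it out of the page thanks to the browser security model - so its forged request arrives tokenless and gets rejected.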

So. What I've built over the past couple of days is a WCF extension that can test messages for the XSRF-prevention token prior to the message even reaching the service operation itself. In short, all I have to do is add a couple of attributes to my service contract:

```csharp
[ServiceContract]
[XsrfAwareBehavior]
interface IDiscussionService
{
    [OperationContract]
    [WebGet(UriTemplate = "", BodyStyle = WebMessageBodyStyle.Bare)]
    Stream GetDiscussionOverviewPage();

    [OperationContract]
    [WebGet(UriTemplate = "activeTopics", BodyStyle = WebMessageBodyStyle.Bare)]
    Stream GetActiveTopicsPage();

    [OperationContract]
    [WebGet(UriTemplate = "activeTopics.json", BodyStyle = WebMessageBodyStyle.Bare, ResponseFormat = WebMessageFormat.Json)]
    [XsrfAwareOperation]
    Stream GetActiveTopicsJson();

    [OperationContract]
    [WebGet(UriTemplate = "{id}", BodyStyle = WebMessageBodyStyle.Bare)]
    Stream GetTopicPage(string id); // method name reconstructed - the original listing was truncated here
}
```

You can see one of them at the beginning - indicating that this service contract needs to be checked for XSRF-aware operations - and then the actual operation marker on the GetActiveTopicsJson() method. XsrfAwareBehavior invokes a service contract behavior I've written, which scans the contract for methods marked as XsrfAwareOperation, and inserts my token-checker into the formatting pipeline for each one.

Actually inserting the tokens into HTML is still a bit clunky - I've got a method available to my XSLT which takes the URI for a link and returns the appropriate token. It'll do for now.

Note that this doesn't protect against script injection attacks. If somebody manages to run an unauthorized javascript on a page from actually within the site, then they'll have access to the cookie containing the session ID and could quite easily hash it themselves to issue requests elsewhere. V5 is not going to be quite as permissive as V4 is when it comes to custom javascript, though [wink]