
But tonight, on this small planet, on Earth, we're going to rock civilization.

## members.gamedev.net

I'm pleased to announce that the GDNet+ member webspace is now fully back online - and, for the first time in about 3 years, accessible by FTP once more!

Just FTP into members.gamedev.net, using your regular username and password, to access your personal space. Old GDNet+ members should find all their files ready and waiting for them. We've also increased the space quota to 100MB per user, and we'll look at increasing this further as things get settled in.

Old GDNet+ members can browse to the same addresses that they've always used. We'll probably be retiring these addresses at some point, but we'll make sure we let you know before we do.

## V5: What I've been working on recently

Well, I could tell you, but maybe it'd be easier just to show you.

(Let me know if you get any errors out of it. I'm aware of two issues at the moment: one, that ads don't load in IE; and two, that sometimes a page displays a generic 'something went wrong' message which goes away when you refresh. I'm fairly sure the second is something to do with an idle timeout somewhere because it only happens after nobody's touched the pages for a bit).

More to come.

EDIT: Here's another one.

## GDNet Slim

For the past three days or so, I've taken some time away from working on V5 to see if there aren't some things I can do for the current site, V4. As you're no doubt aware, we're in a bit of a tight spot on cashflow right now - much like everyone else in the industry - so I figured I'd see if there wasn't anything I could do to bring down our hosting costs. Messing with our hardware and datacenter setup is beyond my remit; I'm only the software guy here, but that software has been churning out an average of 15 terabytes of data every month, and bandwidth ain't free. Not to mention that it makes the site load more slowly for you.

So, what exactly have I done about it? 97 commits to Subversion in the past three days, that's what [grin]

- I spent about 4 hours optimizing and refactoring the site's CSS. Historically the site's had one large (28kb) CSS file per theme, with lots of duplication between themes; this is now one shared (16kb) file and one theme-specific (11kb) file. A whopping 1kb saving, hurrah! It might not seem like much, but now that all the common stuff is in one file it's easier to optimize, and the optimizations will be picked up by people on every theme.
- I totally rewrote the markup (and CSS) for the header banner you see up top there. It used to be this big 3-row table, with 0-height cells, lots of sliced-up background imagery, etc. It's now 4 divs. Much, much cleaner.
- I put all the little icons from the nav bar into a sprite map, and got them all displayed by CSS. So now, instead of making 15 separate requests to the server, you only make 1, and there are no image tags in the header of every page.
- I rewrote the little popup you get when you mouse over the 'recent topics' on the front page. The javascript library we were using to do this weighed in at 50KB (!!!); even minified it was still 23KB. I had a look into a jQuery solution, as we can embed a version of jQuery hosted by one of the big CDNs, but then realised that the whole thing could just be CSS instead. So, it is. That's a 50KB saving on bandwidth for every brand new visitor to our site's front page right there, which is substantial.
- I stripped a bunch of `<br>` tags out of the markup and replaced them with margins (specified in the cached CSS files, naturally).
- I updated our Google Analytics code. This wasn't strictly necessary, but I wanted to do it, and in the process I discovered that none of the forum pages had actually been including it properly up until now. The visitor graph in Analytics since I fixed it has a spike that looks like we've just been featured on CNN or something [grin]
- I tidied up the breadcrumb, search box, and footer code. Again, mostly just getting rid of tables and replacing them with CSS.
- I killed some of the 'xmlns' attributes that get left in our output due to the way we're using XSLT. There's still a bunch of them around, but I covered forum topics, which are the most popular offender. At some point I'll go back in and do all the other cases.
- I redid the markup for the headers in 'printable version' articles. The gain from this won't be too huge, but it's often where Google searches end up, so it won't be nothing either. Also because I HATE TABLES AND WILL MAKE LOVE TO CSS IF IT IS EVER INCARNATE AS A TANGIBLE ENTITY.
- I made us a new version of the OpenGL|ES logo. It's shinier!

That's pretty much everything for now. It's a little difficult to get a picture of how much total change it's made, but the HTML for the site front page has dropped from 95kb to 85kb. I guess I'll find out if it's actually made a serious dent when I hear the bandwidth figures in a few days.

What's the downside to all this? I've been acting with basically no regard to old versions of IE. Chrome is my primary development browser now, with Firefox a close second; I check that things work in IE8, particularly when using unusual CSS pseudoclasses like :hover and :first-child, but anything prior to IE8 - and especially anything prior to IE6 - can go die in a fire, basically. I know, I know, you can't do anything about it, your machine is locked down by corporate, I understand... and I don't care. These days, I think I'd be comfortable accusing any sysadmin who hasn't upgraded all their machines to at least IE7 of criminal negligence.

I guess the site will probably still work in old versions of IE. I'm not actively trying to shoot them down. Yet. By and large, things should degrade gracefully.

To end, here are some excerpts from my SVN logs that you may enjoy.

```
2010-07-15 00:29:18  dropped prototype and clientscripts.js from the page header. (over 120kb for a new visitor!)
2010-07-15 00:32:50  also dropped menu.js, as the menus have been CSS powered for some time now

2010-07-15 03:24:27  killed the empty child! \m/

2010-07-15 04:33:49  tidied up breadcrumb + search boxes
2010-07-15 04:34:38  oops
2010-07-15 04:37:03  try again

2010-07-16 02:21:38  updated 'printable' articles to use GAM
2010-07-16 02:23:11  forgot the
```

## V5: Fun with MSBuild

Now that Pulse is churning away at the codebase, I've spent time today doing further tidying of the build and deploy process. Once the site goes live I want to be able to get changes deployed quickly and safely, and I want to be able to start deploying builds for the Staff to look at within the next few days, so I'm doing what I can to get the pipeline right now. Fortunately, the tooling in this area is all really pretty good.

The first problem is versioning. As I mentioned in my last journal entry, I wanted to get MSBuild to stamp all my executables (I should have said 'assemblies' because a lot of these are libraries) with the build number. When dealing with rapidly-changing, opaque binaries spread across multiple computers, being able to ensure that your files are in sync is critical.

MSBuild, the standard build engine used for .NET projects, is highly flexible and extensible; it's very easy to just drop in new kinds of build task and add them to your project file, and best of all, Visual Studio is fine with it - it can't show them in the UI most of the time, but it can respect them, execute them, and generally not screw them up while working with the project file like normal. There's also a lot of drop-in build tasks freely available all over the net. For this, I'm using the AssemblyInfoTask (though I may upgrade to the MSBuild Extension Pack). The task takes the necessary version parts - major, minor, build, revision, plus any of the product/company name, copyright info etc that you usually find in a Win32 file version resource - and updates the project's AssemblyInfo.cs file with them prior to build. That's a little skeevy as it means the AssemblyInfo.cs file - which is under SVN - keeps getting local changes, but I can live with it. I've written a .targets file that incorporates the task just before the main compile phase, like so:

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <AssemblyMajorVersion>1</AssemblyMajorVersion>
    <AssemblyMinorVersion>0</AssemblyMinorVersion>
    <AssemblyBuildNumber>$(PulseBuildNumber)</AssemblyBuildNumber>
    <AssemblyRevision>$(PulseSvnRevision)</AssemblyRevision>
    <AssemblyBuildNumberType>NoIncrement</AssemblyBuildNumberType>
    <AssemblyBuildNumberFormat>D</AssemblyBuildNumberFormat>
    <AssemblyRevisionType>NoIncrement</AssemblyRevisionType>
    <AssemblyRevisionFormat>D</AssemblyRevisionFormat>
  </PropertyGroup>

  <ItemGroup>
    <AssemblyInfoFiles Include="**\AssemblyInfo.*" Exclude="**\.svn\**"/>
  </ItemGroup>

  <PropertyGroup>
    <CoreCompileDependsOn>$(CoreCompileDependsOn);UpdateAssemblyInfoFiles</CoreCompileDependsOn>
  </PropertyGroup>

  <Target Name="UpdateAssemblyInfoFiles"
          Inputs="$(MSBuildAllProjects);
                  @(Compile);
                  @(ManifestResourceWithNoCulture);
                  $(ApplicationIcon);$(AssemblyOriginatorKeyFile);
                  @(ManifestNonResxWithNoCultureOnDisk);
                  @(ReferencePath);
                  @(EmbeddedDocumentation);
                  @(AssemblyInfoFiles)"
          Outputs="@(AssemblyInfoFiles);@(IntermediateAssembly)">
    <AssemblyInfo AssemblyInfoFiles="@(AssemblyInfoFiles)"
                  AssemblyMajorVersion="$(AssemblyMajorVersion)" AssemblyMinorVersion="$(AssemblyMinorVersion)"
                  AssemblyBuildNumber="$(AssemblyBuildNumber)" AssemblyRevision="$(AssemblyRevision)"
                  AssemblyBuildNumberType="$(AssemblyBuildNumberType)" AssemblyBuildNumberFormat="$(AssemblyBuildNumberFormat)"
                  AssemblyRevisionType="$(AssemblyRevisionType)" AssemblyRevisionFormat="$(AssemblyRevisionFormat)">
      <Output TaskParameter="MaxAssemblyVersion" PropertyName="MaxAssemblyVersion"/>
      <Output TaskParameter="MaxAssemblyFileVersion" PropertyName="MaxAssemblyFileVersion"/>
    </AssemblyInfo>
  </Target>
</Project>
```
This could be made somewhat more efficient - I don't strictly need to pull the version bits out into a separate PropertyGroup, for example, and could just write them directly into the attributes on the AssemblyInfo element. Still, it gets the job done. All I then need to do is add an `<Import>` statement to my .csproj file pointing at this .targets file, and the build step is magically included.

Note how the build and revision numbers are actually variables - PulseBuildNumber and PulseSvnRevision. I'm passing those in as arguments to MSBuild when I launch it. You can do this on the command line using the /p switch, though because I'm using Pulse, it's actually an XML config file that I use to feed inputs to MSBuild.

$(build.number) and $(build.revision) are, in turn, built-in variables defined by Pulse whenever it launches a build. See the data pipeline!

I had a good question from @naim_kingston on Twitter, asking why I use both the build number and the SVN revision number - aren't they redundant? In theory, yes; I should only need the SVN revision number, and then should be able to check out that revision of the code, build it, and always get the same result. In practice, though, I might not always get the same result because there are elements of the environment that may have changed. For example, maybe I'm using a different version of the compiler, or of the build tasks library. Storing the build number as well allows me to more quickly correlate a particular binary to its entry in Pulse's build log, so I can very quickly go to Pulse and download the right .pdb files, MSBuild output files, and so on, and always be confident that what I'm getting is from exactly the same build, rather than just one that used the same code.

So, that's got versioning sorted. I need to add the `<Import>` element to more of my project files, but I've got the main service projects covered for now. I'll add more as I go along.

Next, app.config files. It's common to want to change stuff in these files, such as the address at which a service can be found (e.g. from "db-server.gamedev.net" to "localhost"), but changing the app.config file directly means you have to remember not to check it into SVN, and it's kinda pesky to have it always showing up as 'modified' in the Pending Changes window. What would be better would be if I could have a second file of 'local overrides' that should be used in preference to the app.config file, falling back to app.config for stuff I don't care to change.

MSBuild to the rescue once more. This time I've used the MSBuild Community Tasks, which includes a task called "XmlMassUpdate" - given two XML files, it takes the nodes from one, and adds, inserts, or replaces them into the other. There's also some custom attributes for removing nodes from the target file. Another .targets file integrates the task into my build pipeline, and presto: I have an app.local.config file in each project, svn:ignored to stop it from pestering me, that MSBuild neatly integrates on every local build.
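
For illustration, here's roughly what such a pair of files might look like. The values are invented, and - if I have the task's conventions right - the xmu:key attribute from XmlMassUpdate's control namespace tells it to match `<add>` elements by their key attribute rather than by document position:

```xml
<!-- app.config (checked into SVN) -->
<configuration>
  <appSettings>
    <add key="DbServer" value="db-server.gamedev.net"/>
  </appSettings>
</configuration>

<!-- app.local.config (svn:ignored; merged over app.config at build time) -->
<configuration xmlns:xmu="urn:msbuildcommunitytasks-xmlmassupdate">
  <appSettings>
    <add xmu:key="key" key="DbServer" value="localhost"/>
  </appSettings>
</configuration>
```

Anything not mentioned in app.local.config falls through to the value in app.config, which is exactly the override behaviour I wanted.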

The next challenge I face is how to get each service from a ZIP file in Pulse to a correctly installed and registered presence on the relevant server. There's more to this than just XCOPY - most of the services need to be registered as Windows Services, have event log sources and WMI classes created, etc - and ideally it should happen without me logging into each machine by hand and copying files around + running InstallUtil. The answer is probably going to be to build MSI files. Anyway, that's for later. For now, I'll sleep.

## V5: User accounts and profiles

At the moment I'm working on the code for managing user accounts. This encompasses logging into accounts, creating new accounts, changing your password, and so on. There are some interesting features and design requirements that make this a non-trivial thing to do, so maybe it'll be interesting for you to read about it.

### Federated Identity: A world of pain

Probably the biggest thing affecting the way user accounts get handled is the fact that we're supporting federated identity in V5. No longer will you need a username and password with us; if you'd prefer, you'll be able to sign in using a Facebook or LinkedIn account, or a generic OpenID. This is potentially a pretty big deal for our signup rate; given that pretty much every piece of content on the site will have a 'share' button that can spam a link to the social network of your choice, we're going to get an increase in people coming from sites like Facebook, and we want to make it as easy as possible for them to interact with the site, post comments, and so on.

Simply authenticating with these external sites would be easy, but we also want to fetch profile data from them... and while technically straightforward, the data privacy policies complicate matters. You're not allowed to store any data you retrieve from Facebook for more than 24 hours (with a few exceptions, like the numeric user ID); LinkedIn has a similar, though less explicit, policy. But if you come to the site from Facebook and post a comment, who do we attribute that comment to a week later? We can't store your name, avatar, or anything like that for more than a day.

What we have to do is simply re-fetch the data from Facebook when we need it. We can cache whatever we've fetched for up to 24 hours, but after that we drop it from our cache and wait until somebody needs it again. As well as storing your Facebook user ID, we also store the session key needed to talk to Facebook about you. The session has the special 'offline access' permission set on it, so we can keep using the same session key even when you're signed out of Facebook - it lasts until you 'disconnect' us (remove us from your FB applications listing).

So, all we need is a table of (facebookUserID, facebookSessionKey, expires, ...) to store all our cached Facebook data. We can run a job every 10 minutes or so, and for any entry that's approaching the 24 hour limit, we wipe all the data except the user ID and the session key. When the profile data is needed again, we go and refetch it from Facebook. Simples.
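
A sketch of that sweep job, with an invented schema (these aren't the real GDNet tables or class names):

```csharp
// Hypothetical sketch of the 24-hour cache sweep described above.
using System;
using System.Collections.Generic;
using System.Linq;

class CachedFacebookProfile
{
    public long FacebookUserId;      // may be kept indefinitely
    public string SessionKey;        // 'offline access' session key - also kept
    public DateTime FetchedAt;       // when the profile data was pulled from Facebook
    public string Name;              // cached profile data - must go within 24h
    public string AvatarUrl;
}

static class ProfileCacheSweeper
{
    // Run every ~10 minutes; wipe cached fields on entries nearing the 24h limit.
    // Returns the number of entries wiped.
    public static int Sweep(IEnumerable<CachedFacebookProfile> cache, DateTime now)
    {
        int wiped = 0;
        foreach (var entry in cache.Where(e => e.Name != null
                                            && now - e.FetchedAt > TimeSpan.FromHours(23)))
        {
            entry.Name = null;       // drop everything except the ID and session key
            entry.AvatarUrl = null;
            wiped++;
        }
        return wiped;
    }
}
```

The next time somebody's profile is needed, a null Name is the signal to go back to Facebook, refetch, and reset FetchedAt.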

A similar approach is taken with LinkedIn - using OAuth, rather than a proprietary platform - and will be taken with OpenID, using some of the extensions to the standard. There are no explicit privacy policy concerns with OpenID, but another benefit of doing this is that we'll automatically be synchronising our data with external sites - so if you change your profile information on your OpenID site, it'll update here too.

### What's in a name?

One of the problems this is going to make much more acute is duplicate names. At the moment, it's no big deal to ask every user to pick a unique nickname, but if you're coming from Facebook or LinkedIn then it's much more natural to just use your real name. But we can't ask people to pick unique real names! What happens when two John Smiths both come to use the site?

Also, plenty of users won't want to go by their real names. Just because you've come to the site from Facebook or LinkedIn doesn't mean you're happy advertising who you are.

The end requirement is that we want every user to have a unique 'display name,' which can be constructed from their first/last name, their nickname, or a combination thereof. The rules will be something like:

- Offer the user the option to display their real name. If they turn it down, they have to pick a nickname that doesn't match any of the existing display names, and the nickname will be their display name.
- If they enter their real name, and there's no other user with that real name, their real name can be their display name, and a nickname is optional.
- If their real name is already in use as a display name, then they have to pick a nickname that will cause their display name to be unique.

Going by both real name and nickname will probably be displayed like:

Richard "Superpig" Fine

while going by real names or nicknames would just be what you'd expect - "Richard Fine" and "Superpig" respectively.
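
Those rules could be sketched roughly like this (all names here are hypothetical; this isn't the real site code):

```csharp
// Illustrative sketch of the display-name uniqueness rules above.
using System;
using System.Collections.Generic;

static class DisplayNames
{
    // existingDisplayNames: every display name already taken on the site.
    public static string Resolve(string realName, string nickname, bool showRealName,
                                 HashSet<string> existingDisplayNames)
    {
        if (!showRealName)
        {
            // Going by nickname alone: it must be unique among display names.
            if (nickname == null || existingDisplayNames.Contains(nickname))
                throw new ArgumentException("A unique nickname is required.");
            return nickname;
        }

        if (!existingDisplayNames.Contains(realName))
            return realName; // real name is free, nickname optional

        // Real name is taken: combine it with a nickname, e.g. Richard "Superpig" Fine.
        if (nickname == null)
            throw new ArgumentException("Real name is taken; a nickname is required.");

        var parts = realName.Split(new[] { ' ' }, 2);
        string combined = parts.Length > 1
            ? string.Format("{0} \"{1}\" {2}", parts[0], nickname, parts[1])
            : string.Format("{0} \"{1}\"", parts[0], nickname);
        if (existingDisplayNames.Contains(combined))
            throw new ArgumentException("Pick a different nickname.");
        return combined;
    }
}
```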

### Cobbling bits together

A further complication is that the user might get some of their profile information from the external site, but not all. LinkedIn, for example, doesn't provide any kind of email address. And what if the user wants to present a slightly different identity on GDNet? Maybe they go by 'T-Bird Smith' on Facebook, but they'd rather go by the slightly more professional 'Tom Smith' on GDNet.

Enter the 'profile map.' The map specifies, for each field of the profile, where it comes from: LinkedIn, Facebook, GDNetV4, GDNetV5, and so on. Whenever the site needs to load somebody's profile into memory, the accounts service begins by fetching the profile map, and then the necessary LinkedIn/Facebook/V4/V5 database rows, combining fields across them to populate the user profile data structure. (This structure is then cached in-memory to avoid having to assemble stuff from the DB every time).
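
A rough sketch of how that per-field assembly might look (types and field names are invented for illustration):

```csharp
// Hypothetical sketch of assembling a profile from a per-field source map.
using System.Collections.Generic;

enum ProfileSource { GDNetV4, GDNetV5, Facebook, LinkedIn }

static class ProfileAssembler
{
    // map: which source each profile field should come from.
    // providerData: one field->value dictionary per source we fetched.
    public static Dictionary<string, string> Assemble(
        Dictionary<string, ProfileSource> map,
        Dictionary<ProfileSource, Dictionary<string, string>> providerData)
    {
        var profile = new Dictionary<string, string>();
        foreach (var field in map)
        {
            Dictionary<string, string> source;
            string value;
            if (providerData.TryGetValue(field.Value, out source) &&
                source.TryGetValue(field.Key, out value))
                profile[field.Key] = value; // take this field from its mapped provider
        }
        return profile;
    }
}
```

So a user could map their name to GDNetV5 ('Tom Smith') while still pulling, say, their employment history from LinkedIn.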

### Here comes the new stuff, same as the old stuff

One other thing about this architecture is that it finally answers the question of how to handle existing (V4) user accounts: they just get treated like another identity provider, same as Facebook or LinkedIn. At some point we'll convert every V4 account into a V5 account, but treating it like an external identity provider for now will make it very easy to run the two sites side-by-side until that time.

## I'm coming to get you, Barbara

Finished watching NotLD. Much fun. It's given me some things to think about (such as: what if zombies only took damage from shots to the head? And human players may well construct plans that depend on every one of them playing their part perfectly; what happens when they don't?), as well as a great soundbite from the guy at the beginning: "I'm coming to get you, Barbara." Superb. Remind me to put that in as an easter egg somewhere.

Ali and Fed watched it with me, with Paddy (our new head of house, and one of my closest friends) popping in and out (he was "on duty," though, seeing as how it's the last Saturday night of term. Plus he wasn't feeling well, poor chap). Many silly comments were made, and Ali was surprised at how gory it was. It is really quite a gory film, especially for black-and-white (because brought up as I was on color films, gore really needs red to be effective); some very nice clips of zombies wrestling with lengths of intestine, or squeezing kidneys to get all the juice out. Lovely stuff.

It's also given me a few more ideas about how the dynamics of the human players will work. I'd kinda just assumed that if you throw a load of human players into a room together with the zombies outside, they'll all work together to defend the place - but it occurs to me that such a thing is not necessarily the case. When you consider that the human players are the most "hardcore" of all the players in the world (they survived longest), it's less likely that they'll be willing to cooperate and follow orders. Of course, if they're hardcore players, they'll know how best to play the game and how best to defend, and so may well follow orders simply because that's how they'd do things if they were in charge.

Lack of attachment to other characters may be a problem - "every man for himself" could cause the dynamics to break down a bit. Mr Cooper, in the film, was acting partly out of cowardice, but partly out of a wish to protect the other members of his family. Without unwell daughters to defend, I'm not sure the conflict factor will be strong enough.

I'm also wondering about the ongoing nature of the game. Is it really workable to have a constant, ongoing scenario in which both sides struggle but neither wins? It'd be hell to balance. So instead, I'm wondering if the game would work better in "simulation cycles."

Here's how it works. We get rid of our single, huge, connected world (awww...) and replace it with a large number of smaller ones. Only five to ten of these are actually running at any given time, but the content exists for, say, 20. Every few days, one of the worlds is ended - once all the action's died down, sort of thing - and is restarted, either with the same content set or a new one.

It's kinda like the cinema. "The next screening of 'Deserted Town' will be in 3 days on server #8." That way, the simulation can go either way - perhaps some final assessment of a "winning side" could be employed, though I'm not sure about that - and then stop+start all over again.

That might screw with the save/load system, though. Dammit, why can't people just play games 24/7? That'd make my life so much easier. [grin]

People would need to be in some kind of "entry hall" before the simulation starts; when it actually does start, they're in it, and nobody else can join. That limits the numbers and ensures that the simulation can't go on forever. Perhaps save/load could work by auto-killing anyone who's out of the game for more than 24 hours? That way you just run the simulation until (a) everyone's dead, or (b) most people are dead and everyone else has reached a stalemate. Then you restart.

It's quite possible that both systems could be in use at the same time. Have one "constant simulation," which is a single huge world, and alongside you have many "periodic simulations", like I've described.

I'm also wondering about the 'flowing' aspect of the game world. Emerson Best once showed me something he thought was crap about Medal of Honor - at the edge of the game world, there's a small wall which you can't step up on. The fact that you can't go any further isn't a problem - sad though it may be, a game such as Medal of Honor does need to have limits to its game world.

What was crap was that you, this hardcore elite soldier, couldn't climb up this little step.

Game limits can be done in a number of ways, and each one breaks immersion to a different degree. A pure and simple invisible wall breaks immersion. A pathetic little unclimbable ledge not only breaks immersion, but is out of character, too. In Sniper, the general method was to use things like barricades as impassable barriers (oh the irony); it seemed somewhat artificial, but was still in character for the game world. In some Sniper levels, the barricades weren't even necessary - there simply wasn't an exit from the area. That's the best kind of approach.

But it's not always practical. If you've got things like roads and rivers in your level, it can be quite hard to have them just... stop. So I'm wondering about stealing a technique from some of the older games. I'm wondering about having my game worlds wrap.

What's wrapping? For those who have never encountered it, wrapping is basically where you walk off one side of the map, and come back on the other. And suddenly, all problems are resolved - no world barriers. Roads reach the edge of the map and mysteriously meet up with... themselves! It might not be great from a usefulness-of-the-road point of view, but hey.

It does present technical problems, for the renderer at least. If you could climb a tall tower and look straight ahead towards the edge of the world, you could in theory see all the way round until you were looking at the very tower you're standing in. And it doesn't stop there - you'd see the map repeated an infinite number of times.

Well, I guess I'll be avoiding tall towers in the world designs, and ensuring some kind of visibility-limiting factor like fog or the far clip plane [wink]
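
The wrapping itself is just modular arithmetic on world coordinates. A minimal sketch, with an invented world size:

```csharp
// Minimal sketch of coordinate wrapping on a toroidal map (values illustrative).
static class WrappedWorld
{
    public const float WorldSize = 1024.0f;

    // Wrap a coordinate back into [0, WorldSize).
    public static float Wrap(float x)
    {
        float r = x % WorldSize;
        return r < 0 ? r + WorldSize : r;
    }

    // Shortest signed distance between two wrapped coordinates - needed so
    // AI and visibility checks work correctly across the seam.
    public static float ShortestDelta(float from, float to)
    {
        float d = Wrap(to) - Wrap(from);
        if (d > WorldSize / 2) d -= WorldSize;
        if (d < -WorldSize / 2) d += WorldSize;
        return d;
    }
}
```

The ShortestDelta half matters as much as the Wrap half: a zombie standing just across the seam is a few metres away, not a whole map away.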

So, we get these small(ish) maps that just loop, being replayed over and over. There's something deep and poetic in that, but I'm not sure what.

It might frustrate some players - particularly human ones - who try and survive by escaping the area. That could just be par for the course, though - weigh up the number of people frustrated when they're pinned against the edge of the world, against the number of people frustrated when they can't drive away from the zombie-infested city, and I think I can guess which is the smaller number. It'd just mean that the game can never be won by simply running away - you've got to stay and fight, to make a stand - which is more interesting anyway. Ensure plenty of defensible positions, a decent amount of weaponry, and you're set.

As the sister enters the house near the beginning of the film, I made some comments about how she'd moved into a more defensible position, and was securing the house. I then realised that some day, I might be watching my game through an observation camera, making comments on the play technique of some female human player as she seeks shelter somewhere. That was kinda cool. :)

## Activity streams

Today's chocolate-chunk-o'-LINQ:

```csharp
public ActivityLogEntry[] GetActivitiesFiltered(DateTime? startTime, DateTime? endTime,
                                                Uri[] actors, Uri[] actorTypes,
                                                Uri[] objects, Uri[] objectTypes,
                                                Uri[] verbs, int? maxToFetch)
{
    using (var context = new ActivityEntities())
    {
        var entries = context.Activities.AsQueryable();

        if (startTime.HasValue)
            entries = entries.Where(act => act.timestamp >= startTime.Value);
        if (endTime.HasValue)
            entries = entries.Where(act => act.timestamp <= endTime.Value);

        if (actors != null)
            entries = actors.Length == 1
                ? entries.Where(ent => ent.actorUri == actors.First().ToString())
                : entries.Join(actors, ent => ent.actorUri, act => act.ToString(), (ent, act) => ent);
        if (actorTypes != null)
            entries = actorTypes.Length == 1
                ? entries.Where(ent => ent.actorType == actorTypes.First().ToString())
                : entries.Join(actorTypes, ent => ent.actorType, act => act.ToString(), (ent, act) => ent);

        if (objects != null)
            entries = objects.Length == 1
                ? entries.Where(ent => ent.objectUri == objects.First().ToString())
                : entries.Join(objects, ent => ent.objectUri, act => act.ToString(), (ent, act) => ent);
        if (objectTypes != null)
            entries = objectTypes.Length == 1
                ? entries.Where(ent => ent.objectType == objectTypes.First().ToString())
                : entries.Join(objectTypes, ent => ent.objectType, act => act.ToString(), (ent, act) => ent);

        if (verbs != null)
            entries = entries.Where(
                act => act.ActivityVerbs.Join(verbs, v => v.verb, w => w.ToString(), (v, w) => w).Any());

        if (maxToFetch.HasValue)
            entries = entries.Take(maxToFetch.Value);

        return entries.Select(MakeFromEntity).ToArray();
    }
}
```

## More dev

OK, post icons - on threads - are safe for now. I've left them off individual posts, though they might go back on; I can see them sometimes being useful for communicating the overall tone of a post (I often used to use the roll-eyes smiley when being sarcastic). We'll see. Certainly, where icons are kept, I think we'll roll out more of them.

Design work continues... most recently I've written in the stuff about subscriptions. Paypal will still be supported, and eventually I want to look into the possibility of supporting transactions through other means, maybe such as Google Checkout.

There are still a few things on my to-do list - some smaller than others. The one that I guess is amongst the most contentious is the rating system.

It's been established that the site will be getting the ability to tag a user with one or more keywords. If you think that a particular person is "all about" graphics, or neural networks, or whatever, you can tag them accordingly. Then, when you're searching for information on a particular topic, the site will be able to point you at people who are heavily involved in that topic - the idea is that they will be the 'experts in the field.' The search will also be able to do things like identifying threads or articles that have involvement from those experts (handy if you're looking for answers), versus things that do not (handy if you're looking for questions).

The tagging system won't just be limited to technical topics. If somebody's just a really nice guy, you can tag them with 'nice guy' (or just 'nice'). If they're good at explaining things, you can tag them with 'good teacher' or 'good at explaining.' If they're impatient and ungrateful, you can tag them with 'impatient' and 'ungrateful.'

The question is, is that enough to be useful?

Remember that the site has no concept of what a tag means. It has no inherent distinction between 'idiot' and 'guru' - they're both just words. As such it's difficult for the site to 'take action' against people who are being rude and abusive; it doesn't know which tags indicate that.

Furthermore, I'm not sure how many people will be comfortable tagging somebody as an idiot. It is, perhaps, a bit too negative, a bit too damning, and it lacks eloquence - 'idiot' isn't very descriptive. The system won't prevent it for those people who /are/ comfortable with it, naturally, but I fear that it may simply not be used by people who just want to express a vague feeling of displeasure with a person.

Add to this another oft-cited issue with the existing rating system - that people too often don't know or understand what they've been rated up or down for. We've said in the past that ratings should be awarded based on a holistic consideration of a person's contribution - that you shouldn't rate somebody without looking at their profile and seeing their other posts. Maybe they're just having a bad day. After observing the system for several years, I don't think people do this - so maybe it's worth abandoning the approach.

What I'm considering is a variation on a system I've seen at some other forums - specifically I'm thinking of TCE, though I'm sure it's elsewhere as well. Simply put, on every post, there'd be a "thanks!" and "no thanks!" button. You press the former if you want to thank a user for their contribution; you press the latter if you feel the opposite. The total number of 'thanks' and 'no thanks' are weighed up and used to calculate a karma rating for the post. A user's total karma rating is then calculated as a function of the karma ratings of all their posts.

To be clear, unlike these other systems, it would still be anonymous. You would not see who has thanked/blamed somebody for a given post, only the number of people who have done so. I'm thinking as well that it would be displayed on a colour scale rather than a numeric value, so that people don't go apeshit over tiny changes in value.
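
To make the idea concrete, here's a rough sketch; the averaging and the colour thresholds are invented for illustration, not a final design:

```csharp
// Sketch of the thanks/no-thanks karma idea described above.
using System.Collections.Generic;
using System.Linq;

static class Karma
{
    // Net karma for a single post: thanks minus no-thanks.
    public static int PostKarma(int thanks, int noThanks)
    {
        return thanks - noThanks;
    }

    // A user's karma as a function of the karma of all their posts -
    // here, simply the average (0 for a user with no posts).
    public static double UserKarma(IEnumerable<int> postKarmas)
    {
        return postKarmas.DefaultIfEmpty(0).Average();
    }

    // Display karma as a coarse colour band rather than a raw number,
    // so nobody obsesses over tiny changes in value.
    public static string ColourBand(double karma)
    {
        if (karma >= 5) return "green";
        if (karma <= -5) return "red";
        return "grey";
    }
}
```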

What do you think?

## GDNet V5 Concepts: User Ratings

One of the questions from a previous entry was what's happening to user ratings in V5. I don't have funky screenshots to show you this time, but I'll talk about what the plan is.

### The present system
The present user rating system, visible under every post as a number, was created to solve a set of problems:

- How do users distinguish the people that should be listened to from the people that shouldn't?
- How do we identify users who are contributing to the site and community?
- How do we identify users who are detracting from the site and community?

These problems were all solvable, but they required a lot of time investment and effort. We wanted to shift away from solutions that relied on users and moderators spending lots of time watching site activity. The solution was to seek to recruit the entire userbase to help solve the problem, by giving everybody a means to indicate who should and shouldn't be listened to. That, in turn, needed some kind of balancing to determine which people were good judges, which is why higher-rated users have a larger effect on the ratings of others than lower-rated users.

It's true that in general, the rating system has worked. The top-rated users are, pretty much uniformly, good contributors to the site. The lowest-rated users are generally incoherent, in(s)ane, and unwanted - though I think that exceptions exist. And users do pay some attention to the ratings of those they read, though only around 1% of registered user accounts actually filter out posts with ratings below a given threshold.

We do definitely see some undesirable behaviour. For example:

- People getting upset about their rating dropping a few points and posting threads about it. This wouldn't happen if people were less sensitive, of course, but we have to face the fact that they are this sensitive. It doesn't help that there's not much one can tell those people except "be nicer."
- Bandwagoning - people voting somebody down partly because they've got a low rating, and That's What This Thread Is All About Anyway. Group dynamics can be bizarre at times.
- People who are great technical contributors, ending up with low(er) ratings because they got a bit ranty in the Lounge, and therefore start to be ignored in technical discussions.
- Similarly, people who are really funny in Lounge threads get high ratings, and then when posting in technical threads perhaps get given more authority and credit than they're due.
- People who get low ratings can have trouble recovering that rating, partly because people aren't inclined to vote low-rated users up, and if the filters are in play then their posts won't even get seen. This usually leads to the low-rated poster either creating a new account (which is a policy violation) or just leaving the site altogether. Sometimes they'll stay and just not care about their rating, but whether or not they care doesn't change the fact that we then have a user who is making positive contributions but has a low rating.

At the heart of the current rating system's design rest a few fundamental assumptions. Firstly, it assumes that if a user is good in any one way recognised by the community, then they're good in all ways - or at least are smart enough to disclaim themselves in areas where they're not good. Secondly, it assumes that users will fully consider a user and the contributions they've made to the site as a whole before rationally rating them. Thirdly, it assumes that users have good ideas about how to respond to changes in their rating - that they don't just keep doing exactly what they've been doing (albeit with an added air of bafflement and indignation) expecting a different result.

It also contributes to a bad philosophical assumption on the part of the user, and that is: that something is right because a particular person said so. Smart users won't read the ratings in this way; but some users will, when given two answers to their question, pick the answer from the higher-rated user because the user is higher-rated rather than because the answer is better.

None of these assumptions are good. They're true enough of the time that we can point to some corroborating accounts and say, "look, the system works!" but that doesn't tell us whether the system works as well as it could do.

I'm the highest-rated user on this site, so it's not something I consider lightly [grin] but in V5 I'm planning to replace the present rating system with an approach that is less susceptible - albeit not totally immune - to the above problems.

### The V5 Rating Strategy

#### Tagging
The first problem I set out to solve was this: How do we make the rating better convey the ways or areas in which a person is good?

The solution to this one seemed fairly obvious. A mechanism by which users can express their support of a person in arbitrary, user-defined categories? Sounds like a job for tagging to me! By letting users tag users as another kind of site content, we go from having a single rating axis, to as many axes as you want - be they subject-area tags like 'Python' or 'object oriented,' or style tags like 'funny' or 'friendly.' Reconciling the different ways users tag content is already something the tagging engine has to do.

Immediately this also defeats the assumption that 'good in one area == good in all areas.' It becomes very easy to identify when a user is participating in something that matches their tags - i.e. when they're talking about what they're good at.

#### Thanks
How do we defeat the second assumption - that users will think long and hard before selecting tags for a user? In reality, people don't do that - they read one post, have a strong reaction to it, and then rate accordingly; they don't go "well, this post is obnoxious, but maybe the guy's just having a bad day. I'll check out his other stuff to be sure." If we embrace the strong-reaction-to-a-single-post idea instead of denying it, what we get is: Let people express that reaction with a single click, and then aggregate those reactions to get a feeling for where the user is most well received.

The way this'll be implemented will be via a 'thanks' button on every content item that a user can contribute to. It lets you express that strong reaction quickly. Then, over time, the posts that a user is 'thanked' for will start to contribute their tags to the user - if the user receives lots of 'thanks' in threads that are tagged 'Python performance pygame' for example, then they'll start to acquire those tags themselves. This also gives users more feedback on what they're doing right.
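A sketch of how that aggregation might look - the function name and the weighting are my own illustration, not the real implementation:

```python
from collections import Counter

def acquired_tags(thanked_posts: list, top_n: int = 5) -> list:
    """Aggregate the tags of the threads a user was thanked in,
    weighted by the number of thanks each post received.

    thanked_posts: list of (tags_of_containing_thread, thanks_count) pairs.
    Returns the user's strongest acquired tags, strongest first.
    """
    weights = Counter()
    for tags, thanks in thanked_posts:
        for tag in tags:
            weights[tag] += thanks
    return [tag for tag, _ in weights.most_common(top_n)]
```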

Will there be a 'No thanks!' button? I'm not sure, but I think probably not. If you don't like a contribution, just don't thank the author. If it's really necessary, you can still tag the author explicitly, or even report the post to a moderator.

#### Decay
How do we deal with the fact that a user's expertise will change over time? Maybe they were a game programming guru 10 years ago, but they've not kept up and their advice is out of date now. This is a fairly simple one, actually: have tags 'decay' over time. Tags that are still frequently applied to a user will 'refresh' and will decay more slowly than tags that aren't. This also solves the 'idiot' problem - how to handle people tagging each other as 'idiot' - because if the user stops being an idiot, the tag will fade away; and it mitigates the lack of a 'no thanks' button, because posting without receiving thanks will cause your tags to fade away.
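The decay could be as simple as an exponential half-life, with the clock reset each time the tag is re-applied - the half-life here is a made-up number, just to show the shape of it:

```python
HALF_LIFE_DAYS = 90.0  # illustrative; the real rate is undecided

def tag_strength(initial: float, days_since_last_applied: float) -> float:
    """Exponential decay of a tag's strength. Re-applying the tag
    resets days_since_last_applied to zero, 'refreshing' it."""
    return initial * 0.5 ** (days_since_last_applied / HALF_LIFE_DAYS)
```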

#### Getting input
How do we get people to actually use this stuff? That's one of the bigger problems with tagging in general. Step one is to make things as easy to use as possible - single-click to 'thanks' a post, two clicks to get to adding more complex tags. Step two is to get users to at least tag their own stuff; users will be encouraged to 'self-assess' by tagging themselves, to tag their own threads and entries, and so on. Step three is to incentivize. Now, there's a limited amount we can do here - we're not about to start paying people to tag content. What you saw in my last post, though, was the 'badges' system in userboxes; what we can quite easily do is grant a badge to people who tagged 100 content items in the past month, or something like that.

#### Using the output
Lastly, how do we help users find the best possible content, instead of wasting their time with incoherent in(s)anity - without encouraging them to trust an answer just because it's from a highly rated user? This is a balancing act to be sure, because most of the time the best content is produced by the high-rated users.

The first trick here is to make the way that ratings are displayed be subtle; no more four-digit numbers on each post. Instead, we're considering things like changing the background colour of the post, or the thickness of the post border, to indicate when a user is strongly aligned (tagged the same way as) a thread. Making the display subtle in this way will still make the post stand out a little in the thread, without providing such a clear and definitive thing that people can get overexcited about.
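As a sketch of what 'subtle' might mean in practice - the thresholds and colours are entirely illustrative:

```python
def post_style(alignment: float) -> dict:
    """Map a 0..1 user/thread alignment score to small CSS tweaks
    rather than an explicit number on the post."""
    alignment = max(0.0, min(1.0, alignment))  # clamp to the valid range
    return {
        "border-width": f"{1 + round(alignment * 2)}px",                # 1-3px border
        "background": f"rgb({245 - round(alignment * 10)}, 245, 245)",  # faint tint
    }
```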

What we will probably display clearly on a post is the number of times it's been thanked (perhaps only within the past X weeks). This makes the number that people latch onto be about individual posts, rather than about users, and that's a lot safer - posts are easier to talk about without people taking things personally.

The second trick is to use the information on a broader level to bias search results. When you're searching for content on a particular topic, the search can elevate threads that have good alignment, or that have lots of 'thanked' posts in. This is still sort of acting on this idea that that content will be right 'because a smart person said it,' but by elevating it to the per-thread level instead of the per-post level, lower-rated users will still have a good opportunity to point out when the higher-rated user isn't making sense.
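A sketch of the kind of biasing I mean - the weights are entirely illustrative:

```python
def thread_score(text_relevance: float, alignment: float, thanked_posts: int) -> float:
    """Boost a thread's textual search relevance by thread-level signals:
    how well its tags align with the query, and how many thanked posts it has
    (capped, so a single monster thread doesn't dominate)."""
    return text_relevance * (1.0 + 0.3 * alignment + 0.1 * min(thanked_posts, 10) / 10)
```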

You'll notice I've not talked about 5-star ratings at all so far. We're still deciding exactly how they'll be integrated. The advantage that 5-star ratings offer is that they are coarse; tagging a thread with particular tokens might capture what the thread is about, but maybe you just want to convey some overall impression that the thread is awesome (or terrible), without figuring out exactly which tags would express that; they might be more applicable to, say, gallery entries. They've got their fair share of problems, of course, as comments on my previous post about the rating UI pointed out. We'll have to do some more thinking about them.

### Conclusion
The new system doesn't quite solve the problems that the original rating system set out to solve. Instead, it focuses on the deeper problems of how to get the best content into your hands as quickly as possible and how to describe users; they're harder problems, naturally, but I think more worthwhile.

So, what do you think? I expect that quite a lot of people might have strong feelings about this topic [smile]

## Text sanitization

My work over the past few days has mostly been on the text sanitizer.

The sanitizer is an interesting beast. The basic task it faces is to take a chunk of something approximating XHTML (annotated with custom GDNet extensions), lex and parse it into an XML tree, strip away any elements or attributes that aren't permitted, and ensure that the result is valid XHTML (or that it would be when wrapped inside a DIV).

The first part - generating the XML tree - is actually the simplest. I'm using HTML Tidy, an open-source library for exactly this kind of thing, which can take arbitrary input and return valid XML, adding closing tags and the like where necessary.

The next steps - stripping forbidden elements and attributes - are harder. The sanitizer supports different sanitization 'profiles' that describe what is and is not allowed for a given chunk of text; this means we can, for example, set a profile for the forums that only grants basic text and formatting tags, but set a profile for the journals that grants things like tables and embedded video.

One significant decision is whether to take an inclusive approach (only the named tags are allowed; everything else is removed) or an exclusive one (only the named tags are removed; everything else is kept). Inclusive is better in that it's more secure, but it also means that the sanitizer needs to know about every possible tag you might want to use, including the attributes permitted on each. The exclusive approach is much easier to write - I just 'blacklist' the disallowed tags and attributes - but it's much more open to abuse, in that if I forget a tag then we've got problems. Things are complicated further by the way in which the children of removed tags should be handled - if you've used the `b` tag and it's not allowed, then the tag should be removed without removing the text within it. A `script` tag, on the other hand, should be removed along with everything inside it.
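To illustrate the inclusive approach and the remove-tag-but-keep-contents behaviour, here's a toy sketch in Python - toy only: the real sanitizer is .NET and works on a parsed tree, and regexes alone are emphatically not a safe way to sanitize HTML:

```python
import re

ALLOWED = {"b", "i", "p"}        # inclusive: anything not listed here is stripped
DROP_WITH_CONTENTS = {"script"}  # dangerous tags lose their children too

def sanitize(html: str) -> str:
    # Drop dangerous tags together with everything inside them.
    for tag in DROP_WITH_CONTENTS:
        html = re.sub(rf"<{tag}\b.*?</{tag}>", "", html, flags=re.S | re.I)
    # 'Filter' any remaining non-whitelisted tag: remove the tag, keep its text.
    def keep_or_strip(match):
        return match.group(0) if match.group(1).lower() in ALLOWED else ""
    return re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>", keep_or_strip, html)
```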

One thing I'm doing to ease the development burden is to use unit tests. I'm building a collection of bits of malformed or malicious text, coupled with the result that the sanitizer should produce.

This is where you can help. What test cases should I have? What finicky tricks and traps do you think the sanitizer should be watching out for?

## Integration

So, the main thing I'm working on at the moment is the design document for the next version of the site. It's not a small document - it outlines everything I plan to bring to the site, in terms of functionality, across the entire V5 line - and will probably guide development for at least a year. So it's fairly important that I get it right.

The announcement I posted - collecting user stories - was the first step in this. The entire first chapter is dedicated to information about our audience, from user stories to group statistics.

One of the things I'm particularly interested in is which other sites you use on a regular basis - partly to add more background info to that first chapter, and partly to look for opportunities to integrate this site with others.

So, which sites do you use on a regular basis?

Are there any times you've been using GDNet and have thought "Hmm, it'd be good if I could use such-and-such a service here"? For example, somebody on IRC suggested integrating Twitter status feeds into the user profiles, which I think is a nice idea.

## Aaargh

This may be one of the most annoying developments in web technology I've encountered so far.

I guess it's not the access control spec itself per se, so much as Firefox and Firebug's implementation of it. Though it is frustrating that a request from the http version of a site to the https version of the same site is considered cross-domain; the domain name, and thus the set of IP addresses, is the same in each case, so I don't see any scenario in which you could control one but not the other - at least, not one that wouldn't already be lost anyway (e.g. a malicious router on the backbone routing things at the application level, which could just as easily discriminate between file paths within an application, let alone between protocols).

When a cross-domain XHR fails under Firefox, there's no feedback as to why. There's no exception, no console message. The Net panel shows the requests that have been made - so you might see the OPTIONS preflight request, if one is made - but it doesn't tell you if/when it's discarding the results of a request (in response to security policy bits not being satisfied), or why. All you get is an empty XHR object in an error state.

Which is... difficult... to debug.

## Service process account install gotcha

Here's a little something that had me stumped for 15 mins. The info on the net about it is pretty sparse so maybe this will help somebody.

I was trying to install the GDNet service processes on the backend server. Every service process needs its own user account - it makes security, auditing, and SQL Server access a lot neater. Normally when you install a service process that uses a user account, you get prompted for the username and password of the account the service should use. I want the installs to be unattended, so I hardcoded the usernames and passwords into each service process:

```csharp
[RunInstaller(true)]
public class RendererServiceInstaller : System.Management.Instrumentation.DefaultManagementInstaller
{
    public RendererServiceInstaller()
    {
        var process = new ServiceProcessInstaller
        {
            Account = ServiceAccount.User,
            Username = "GDNET\v5_render",  // hardcoded credentials (password elided)
            Password = "...",
        };
        var service = new ServiceInstaller
        {
            DisplayName = "GDNet V5 Rendering Service",
            Description = "Service that renders GDNet XML into XHTML for output to users.",
            ServiceName = "V5Renderer"
        };
        var evtLog = new EventLogInstaller { Source = RendererService.EventLogSourceName, Log = "GDNet" };

        // The installers must be added to the Installers collection,
        // or InstallUtil will ignore them entirely.
        Installers.AddRange(new Installer[] { process, service, evtLog });
    }
}
```

When I tried running InstallUtil to install the service, though, I got this error:

```
System.ComponentModel.Win32Exception: No mapping between account names and security IDs was done
```

I'd granted the accounts in question the 'Log on as a service' and 'Log on locally' rights, and I could start processes as them by hand over Remote Desktop - so what was the problem?

Look at the username I'm using: it's a domain account, so it's in the form DOMAIN\accountname. Look at what's separating those two components. It's a backslash. Backslashes are special in C# (and many other languages). As far as the compiler was concerned, the account name wasn't 'GDNET\v5_render'; it was 'GDNET', then a vertical tab (the \v escape), then '5_render'.

Sticking an @ on the front of the username string has fixed it.
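The same trap exists in plenty of other languages. Here it is reproduced in Python, where \v is also a vertical-tab escape and raw strings play the role of C#'s @-prefix:

```python
# The backslash pitfall: \v silently becomes a vertical tab (0x0B).
broken = "GDNET\v5_render"     # the 'v' is swallowed by the escape
fixed = r"GDNET\v5_render"     # raw string: the backslash is literal

assert broken == "GDNET\x0b5_render"
assert fixed == "GDNET" + "\\" + "v5_render"
```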

## IPHONE'D!

I write this from my shiny new iPhone [grin]

## V5: Continuous Integration and Deployment

I spent a bit of time recently doing some work on V5's build pipeline, implementing continuous integration and making the deploy-to-servers process a bit more formal. Unlike most web developers, I'm a big fan of pre-deployment testing and verification, so a well-established build process is a key part of that.

Continuous Integration, for those who aren't familiar with it, is the simple idea that your code should be continually being built. Every change you check into source control should get compiled, packaged, and tested on all your target platforms - automatically, of course. It's a great way to catch build errors in other configurations or on platforms other than the one you're developing on.

Many people go for CI servers built around CruiseControl, but after researching the options when I was back at NaturalMotion, I selected, used, and fell in love with Zutubi Pulse. So, it's now running on GDNet, a nice complement to our issue tracker and source control system.

Pulse is great. It's got an easy-to-understand but elegant and powerful web UI, built-in support for a bunch of external build systems (such as MSBuild), it's trivial to install... but the best thing, really, is the support. Zutubi is, as far as I can tell, two guys in Australia - Jason and Daniel. Yet, between them, forum questions get answered within minutes, with detailed and helpful responses; feature requests get logged and show up in a point release a week later; their JIRA instance is publicly accessible; and they still, somehow, find time to blog about build systems, agile programming, unit testing, and so on. If I ever meet these men, I am buying them a drink. Each.

Two further things that are more relevant to the average GDNetter: Firstly, they have free licenses available for open-source projects and for small teams (2 people / 2 projects), and secondly, I'm told they've got a number of game developers as customers... so they've got quite a lot of familiarity with our use-cases, and Pulse handles things like '4GB of art assets' pretty well. I'd definitely recommend checking Pulse out if you've got the hardware to spare.

The other nice thing about having a CI server is it provides an authoritative 'release provider' within the GDNet LAN: a clear, single source for new releases of the site software to be deployed to our machines. I've done some work tonight to have Pulse capture the executables and content directories as zip-file 'artifacts;' next I'll get MSBuild to actually stamp the executables with the build number, and I'll look into ways to quickly and efficiently deploy the artifacts to the machines that need them. Eventually, doing a new release of the GDNet site will just be a question of clicking a 'trigger' button, and watching the progress bar tick for a bit [grin]

## V5: XSRF Prevention

Looks like I just missed Gaiiden's weekly journal roundup. Oh well.

I've spent today and yesterday implementing a security measure against cross-site request forgery attacks, otherwise known as XSRF (or CSRF) attacks. These are a slightly terrifying class of attack, not least because so few people seem to be paying attention to them; an estimated 70% of sites on the web are vulnerable to - and have done nothing to guard against - this kind of attack.

XSRF is an attack in which a malicious site causes your browser to make a request to another one, in such a way that it takes advantage of the fact that you've got some cookies or some kind of session key open with that other site.

Say you've got a banking website which allows you to conduct some transactions online. They've got a web form for sending money from your account to another one; it submits data to /actions/do_transaction?to=XXXX&amt=YYYY, where XXXX is the target account number and YYYY is the amount. When you're logged into the site, your session is maintained through the use of a cookie stored on your machine.

All that I have to do is embed a 1x1 image in my page that is sourced from '//your.bank/actions/do_transaction?to=1234&amt=1000', and if you view my page while you're logged in, then presto - you've transferred $1000 to account number 1234. Your browser sees the URI that the image is supposed to come from, and issues a request for it - sending any cookies necessary to keep the session alive. It's like 'remote controlling' a session - there's no need to ever actually steal the session cookie when you can just make the browser that already holds it do what you want to do. This is known as a "confused deputy" attack.

So, some protections that don't work:

- Check the referrer: easily faked, plus some users don't send referrer headers.
- Use POST requests instead of GET requests: while this would defeat the IMG tag approach, it's trivial to get around using javascript and XmlHttpRequest.
- SSL: at no point is the connection between you and your bank site ever actually attacked here, so securing that connection doesn't help.
- Encrypted cookies: again, the cookie is never actually stolen, so encrypting it won't help.

Ultimately, there is only one possible defence: Require that the request contain some information that is not stored in cookies and that malicious sites cannot know ahead of time. When your bank presents the 'transfer money' page, it includes that information in the page itself - in the HTML, or in the javascript - and submits it straight back again when you've finished filling out the form. So, if a malicious site wants to obtain that information, it can only do it while you've got the actual page open - and in theory the browser security model should prevent that.

As for the information itself, something as simple as a hash of the request URI with the session ID is enough to shut down most (if not all) attack scenarios. It's got the advantage of being easily testable - all the information you need is in the request itself.
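As an illustration of that scheme - here in Python rather than .NET, and using an HMAC keyed on the session ID rather than a bare hash, which is a slightly stronger variant of the same idea:

```python
import hashlib
import hmac

def xsrf_token(session_id: str, request_uri: str) -> str:
    """Token derived only from data already in the request plus the session ID.
    A malicious site that can't read the session cookie can't compute it."""
    return hmac.new(session_id.encode(), request_uri.encode(), hashlib.sha256).hexdigest()

def check(session_id: str, request_uri: str, supplied: str) -> bool:
    """Recompute the token server-side and compare in constant time."""
    return hmac.compare_digest(xsrf_token(session_id, request_uri), supplied)
```

The server embeds `xsrf_token(...)` in the page it serves, and `check(...)` rejects any request that doesn't carry it back.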

So. What I've built over the past couple of days is a WCF extension that can test messages for the XSRF-prevention token prior to the message even reaching the service operation itself. In short, all I have to do is add a couple of attributes to my service contract:

```csharp
[ServiceContract]
[XsrfAwareBehavior]
interface IDiscussionService
{
    [OperationContract]
    [WebGet(UriTemplate = "", BodyStyle = WebMessageBodyStyle.Bare)]
    Stream GetDiscussionOverviewPage();

    [OperationContract]
    [WebGet(UriTemplate = "activeTopics", BodyStyle = WebMessageBodyStyle.Bare)]
    Stream GetActiveTopicsPage();

    [OperationContract]
    [WebGet(UriTemplate = "activeTopics.json", BodyStyle = WebMessageBodyStyle.Bare, ResponseFormat = WebMessageFormat.Json)]
    [XsrfAwareOperation]
    Stream GetActiveTopicsJson();

    [OperationContract]
    [WebGet(UriTemplate = "{id}", BodyStyle = WebMessageBodyStyle.Bare)]
    Stream GetTopicPage(string id);
}
```

You can see one of them at the beginning - indicating that this service contract needs to be checked for XSRF-aware operations - and then the actual operation marker on the GetActiveTopicsJson() method. XsrfAwareBehavior invokes a service contract behavior I've written, which scans the contract for methods marked as XsrfAwareOperation, and inserts my token-checker into the formatting pipeline for each one.

Actually inserting the tokens into HTML is still a bit clunky - I've got a method available to my XSLT which takes the URI for a link and returns the appropriate token. It'll do for now.

Note that this doesn't protect against script injection attacks. If somebody manages to run an unauthorized javascript on a page from actually within the site, then they'll have access to the cookie containing the session ID and could quite easily hash it themselves to issue requests elsewhere. V5 is not going to be quite as permissive as V4 is when it comes to custom javascript, though [wink]

## V5 Guts: Text Sanitizer

One of the biggest causes of security issues in sites - XSS attacks, SQL injection, etc - is a failure to properly handle user input, making sure that it doesn't contain undesirable elements.

This is potentially a very complex task, and it gets more complex the more the user's allowed to do and the more you care about the output. In V5, I want to expand the capabilities of the markup users can include through things like attributes; I also want to keep the data on the server end in a highly flexible format, making it easy to do things like strip out smilies, find posts associated by quotations, and so on. XML seemed the obvious choice.

Another thing I really, really wanted to fix is the way HTML entities get handled. At the moment, if you make a post containing `&lt;` and `&gt;` entities, they get turned back into literal `<` and `>` characters when you edit the post, and are then treated as HTML when you save it again... there are also problems with how to encode things when putting them out as RSS or similar. I wanted to put a stop to all these encoding issues.
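Python's html module makes the round-trip problem easy to demonstrate:

```python
from html import escape, unescape

stored = "use the &lt;b&gt; tag"  # what the database holds after encoding
edit_box = unescape(stored)       # what the user sees in the edit box
assert edit_box == "use the <b> tag"
# If this is saved again without re-escaping, <b> is suddenly live markup.
# The fix: re-escape on every save - or, as in V5, store a parsed tree instead.
assert escape(edit_box, quote=False) == stored
```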

Happily, we've now got a pretty solid pipeline in place. A combination of HTML Agility and OWASP AntiSamy, with my own extensions and modifications, provide the bulk of the work.

HTML Agility takes the tag soup you guys will throw at the site and turns it into an XML document. At its core is a normal state-machine-based parser that generates DOM nodes as it encounters them. Agility also handles encoding issues, turning HTML entities like `&trade;` into their actual characters. I've also extended it to allow tag names that have namespace prefixes - so it will accept, for example, the custom GDNet-namespaced tags.

The output from Agility is a near-as-dammit-valid XML document that I feed to AntiSamy.NET. Now, AntiSamy I have made some fairly extensive changes to, updating it for C# 2 and multithreading it all. Still, the core concept remains the same: AntiSamy has a 'policy' of which tags are allowed, and which attributes and CSS properties are allowed on them (along with regexps defining the values those attributes and properties can take). When something isn't allowed, it can be dropped entirely - such as I might do to `script` tags - or it can be 'filtered,' removing the tag but leaving its contents. I've set it up to support multiple policies, so I can permit one set of tags when writing articles, another when writing journal entries, another when writing forum posts, and so on.

The result is a neatly-filtered XML fragment that I can quickly and easily perform XPath queries against, or feed to the renderer for processing by the XSLT stylesheets and outputting.

## Word to the wise

I had an email yesterday from a guy with some questions about working in game testing. He had some very... optimistic impressions about what working in the field could be like. After straightening out some of his misconceptions, I asked him where he'd picked up the idea that he'd be able to make a living testing games out of his home...

GameTesterGuide.net is at best very poor quality and at worst a total scam, and should be avoided at all costs.

Google can show you some threads around the place where people have actually bought in and posted about what they received. Most notably, one guy recommended using a chargeback to reclaim the money on his credit card.

Here's hoping you read this before it's too late...

## IE8 fixes, and chat client

Not much to report today.

IE support is now better, though not on a par with the other platforms by a long shot. Funnily enough, the problem wasn't the MIME type - I've been serving it up as text/html for IE for a long time - but the actual document content itself. Specifically, benryves drew my attention to the part of the XHTML standard which states that certain tags should be written as explicit open/close pairs (rather than in the minimized `<tag />` style). Under XML these are equivalent, but given that we're pretending for IE's sake that the document is HTML, it causes problems. What's most interesting is that this also breaks Firefox - Firefox does not seem to like the minimized forms either.

## V5 misc

V5 work continues. Today I added the IRC client; rather than using the Java applet again, we're going with Mibbit, which is more fully featured, and runs purely on javascript/AJAX. It's also under active development, and we just embed it, so as they add new features to it those features should just magically manifest themselves at our end. Lurvely.

Let's see, what else have I done? Some refactoring... logging in is now a much simpler codepath. I've added support for reporting bugs directly from the browser page (for logged-in users, at least), which automatically includes useful information in the report like the page you were looking at or various page-level javascript vars. Should make the beta process much smoother. There are also now both RSS 2.0 and Atom feeds for Active Topics in the code.

I've also done more work on the URI schema - the actual addresses you'll use to access resources on the site. I'm going for a RESTful approach with all this, so getting the URI schema right is less about organising files on the webserver and more about usability; for example, /community/forums/topic.asp?id=123456 becomes /discuss/topic/123456 and so on. It also motivates the design of service contracts going forward.

I'm generally pretty happy about my choice of WCF - the documentation is good, the framework generally follows the principle of least surprise, integration with third party tech is fine (any .NET or COM library is trivial to work with, plus of course any other web services), and the more I dig into customizing the WCF stack itself - such as for my XSRF filters - the more I feel that I am bending the framework to my will, rather than being forced to conform to its way of working.

I've only really got one complaint about it, at this stage: while it's very easy to swap out framework pieces for custom components, often those pieces are larger than you really want. That would be fine if it was easy to recreate functionality offered by the part you're replacing, but MS keep most of the relevant helper classes and methods as internal to the WCF assemblies. I'd really like to reuse their code for extracting the body of a POST request as a Stream, for example, but the relevant class (HttpStreamFormatter, I believe) is marked as internal. I can understand that every class they expose publicly is one they have to document, support, and change control, but I think it would be worth it, particularly for people building HTTP apps with WCF.

## V5 Pre-alpha launch

Happy 10th birthday, GDNet! I got you a present. It's not much. I'd hoped, planned, for so much more, but you know how these things go.

Yes, folks, the V5 codebase is finally at a point where I can start putting bits of it up for public dissection, consumption, digestion, and *ahem* feedback!

There's not much to show you today, but I'm planning on pushing out new stuff very quickly at this point; much of the infrastructure is now in place, reasonably solid, so I can really focus on things that you can see.

Things to note before we start:

Firstly, I've been developing it primarily in Firefox; it also mostly works in Chrome. It's broken in IE - I think the problem is the content-type - and I've not tested it in Opera. Eventually, the site will be supported in FF3, IE7 or later, Chrome, Safari, and Opera. I'm aiming to downgrade gracefully to older browsers, but it's not a top priority and it probably won't be pretty.

Secondly, I've been doing all of the graphic design work myself, and I'm no artist. I'm focusing mostly on the functionality of the UI; consider the way that it looks to be 'programmer art' for now. Somebody with actual aesthetic sensibilities will look at it later, I promise [grin]

Thirdly, speed-wise, what you can see today is an unoptimized debug build, sharing a server with the current site (and the current site does not like to share). I've not had a chance to properly stress-test it, which is partly what taking it public is for. So, performance will improve drastically as the bugs are ironed out and I can start turning off the debugging flags.

Lastly, it should all be valid XHTML, CSS, and javascript; it should all work correctly when you are increasing or decreasing the text size; and the URI schema should be generally RESTful.

- You can use your regular GDNet username/password for login. It's all connected up to the current site DB through an adaptor layer that maps V4 database records to the new schema formats to as great an extent as possible.
- Submission of username/password info is now done over SSL, for greatly improved security. (Maybe you don't care that much about your GDNet account being secure right now, but this is an absolute requirement for some of the services we want to offer in the future).
- Once you're logged in, you should see a little bug icon next to the welcome message in the bottom right corner. Click it, and you'll get a box that lets you submit bugs and feedback, right from the browser; reports go automatically straight into my bugtracker. This icon should appear on every page of the site for logged-in users. Go ahead and use it liberally over the coming weeks. (Please don't abuse it; all you do is make more work for me).
A forum topic

- I've tried to minimize the amount of extra cruft displayed on each post, so you can focus on the content. Extra user info can be revealed by hitting the chevrons at the right end of the post header.
- Avatars don't work yet. They're going to be hard to sync between the current site and the new site...
- You can see a few people have badges next to their name. More info about their badges is displayed in the expanded info. At the moment there are only two kinds of badge - Moderator and GDNet+ - but it's easy to think of other badges we might create and apply.

So, yeah. Not much to look at for now, but gimme feedback. I should have some more stuff for you in the next couple of days.

## Search, don't Sort

One of the major philosophical elements of the V5 design is one taken from Google: Search, don't sort.

The problems with rigid categorization - sorting content items into distinct categories as 'containers' - are fairly well-known:

- How do you decide what categories there should be? GDNet only creates new forums when there's sufficient traffic in one area to warrant it; we do this for good reason, but until the traffic reaches critical mass, the category on a topic isn't as precise as it could be.
- How do you decide which category something should be in? When you've got categories as vaguely defined as 'Game Programming' and 'General Programming,' it's easy to see how people can get confused.
- What do you do when a content item should appear in more than one category? And what if it should appear in each category to unequal extents?
- How do categories relate to one another? If something in one category is commonly in another category, perhaps they should be nested? If something is in the nested category, is it always also in the parent category?

A different approach is flexible category annotations, or 'tags.' Instead of viewing categories as containers that content items are sorted into, they're viewed as indexes into the content pool, fuzzy sets that describe the data rather than housing it.

Why am I telling you this? It's pretty well-known stuff by now, I guess. I'm bringing it up because over the past few days I've been working mostly on the tagging and search engines for V5.

The tagging engine has a pretty simple set of responsibilities:

- Store and retrieve the tags associated by a user with a given resource.
- Calculate some set of 'aggregated' tags for a resource, using the tags applied to the item by all users.
- Find the resources most relevant to a tag or set of tags.

The implementation I've written so far is a naive one, but it'll suffice for the time being. The aggregation process simply averages all user-applied tags - crude, but open to tweaking later. Finding the most relevant resources is little more than a SELECT query, scoring relevance by the mean squared error between each tag set and the supplied search tags. There are problems, but they can be fixed later.

One nice trick resulting from the RESTful schema for the site is that each resource has a nice, clear URI - ideal for using as a key. So each tagset is the association of a set of (Tag, Weight) pairs with a Uri. The result is completely content-agnostic; the tagging engine knows nothing about the kinds of content the site offers.
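To make that concrete, here's a toy sketch of those three responsibilities. It's Python for brevity rather than the real C#/SQL, every name in it is invented, and the aggregation takes one plausible reading of "average" (mean over the users who applied each tag):

```python
# Toy sketch of the tagging engine. All names are invented; the real
# implementation is C# and a SELECT query, not an in-memory scan.
from collections import defaultdict

class TaggingEngine:
    def __init__(self):
        # (user, uri) -> {tag: weight}
        self.user_tags = {}

    def set_tags(self, user, uri, tags):
        """Store the tags a user has applied to a resource."""
        self.user_tags[(user, uri)] = dict(tags)

    def aggregate(self, uri):
        """Aggregated tags: for each tag, the mean weight across the
        users who applied it to this resource."""
        totals, counts = defaultdict(float), defaultdict(int)
        for (user, u), tags in self.user_tags.items():
            if u != uri:
                continue
            for tag, weight in tags.items():
                totals[tag] += weight
                counts[tag] += 1
        return {t: totals[t] / counts[t] for t in totals}

    def find(self, query):
        """Rank resources by mean squared error between their
        aggregated tags and the query tags (lower = more relevant)."""
        uris = {u for (_, u) in self.user_tags}
        def error(uri):
            agg = self.aggregate(uri)
            tags = set(query) | set(agg)
            return sum((query.get(t, 0.0) - agg.get(t, 0.0)) ** 2
                       for t in tags) / len(tags)
        return sorted(uris, key=error)
```

Note that, as in the real thing, the engine only ever sees opaque URIs and (tag, weight) pairs - it knows nothing about what kind of content lives behind them.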

The tagging engine's last responsibility - finding resources - is obviously highly related to the search engine. Not all searches are tag-related; for example, Active Topics is a search for all discussion threads updated in the past 24 hours, while it's easy to imagine other searches based around the author of the content or similar. So, there is a separate search service that stores, maintains, and performs all saved and transient searches, using the tagging engine when appropriate.

## Wheeee

There's a horribly subtle little bug here. Can you spot it?

```
{
    using (var context = new ArticlesDataContext())
    {
        return uris.Select(delegate(Uri u)
        {
            var identifier = PrefixUri.MakeRelativeUri(u).ToString();
            try
            {
                var g = new Guid(identifier);
                return context.articles.Where(a => a.ID == g).FirstOrDefault();
            }
            catch (FormatException)
            {
                return context.articles.Where(a => a.UrlTitle == identifier).FirstOrDefault();
            }
        })
        .Where(g => g != null)
        .Select(g => new
        {
            ObjectSID = 0,
            Uri = new Uri(PrefixUri, g.UrlTitle)
        });
    }
}
```

(Hint: It manifests as an ObjectDisposedException).

## This is relevant to my interests

So, one of the things that we at Gamedev Towers want to bring to the site in the future is a tagging system. I've spent the day so far working on a basic prototype.

Tags are very easy to implement, but difficult to design. Here's the basic idea of tags:

- Allow users to attach a bunch of tags to things.
- Um...

The basic notion is that tags can be used to establish a 'semantic network' of content, making information easier to find. Instead of taking a user's search phrase and matching it against all the text in your database, you take each chunk of text at authoring time and pull the keywords out then, making searches faster later. Furthermore, rather than trying to pull the keywords out automatically, you encourage the author to provide the keywords themselves.

Closely related is the notion of incidental search - things like "related content." You do a search for the tags that the current item is annotated with, ignoring the current item itself, and offer the results as a "see also" section. For this to work well, you need to do more than just basic string matching on your tags: synonyms and spelling mistakes would cripple such a simple implementation.
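As a strawman, the naive string-matching version of "related content" is only a few lines - a Python sketch with invented names, which mostly shows where the real work lies, in the synonym and misspelling handling it doesn't do:

```python
# Naive "related content": score every other item by how many tags it
# shares with the current one. Invented names; no synonym handling.
def related(current_uri, tags_by_uri, limit=5):
    current = tags_by_uri[current_uri]
    scored = []
    for uri, tags in tags_by_uri.items():
        if uri == current_uri:
            continue  # ignore the current item itself
        overlap = len(set(current) & set(tags))
        if overlap:
            scored.append((overlap, uri))
    scored.sort(reverse=True)  # most shared tags first
    return [uri for _, uri in scored[:limit]]
```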

Who gets to tag content, and at what granularity should content be tagged? Youtube allows the author to set the tags, and only per-video. Del.icio.us allows each user to provide their own set of tags for a bookmark, but they're only per-bookmark. Most blogs, on the other hand, only allow the author to tag, but tag each individual post. Which approach is right for GDNet? Do we tag posts, threads, entire forums? Do we rely on the authors to tag their content correctly, or do we encourage the community to do it en masse? How do we structure the system so that it can't be broken by incorrect tagging?

The model employed by del.icio.us is the one that I think seems the most promising, at least in part. Del.icio.us, if you don't know it, is a social bookmarking site - you store your bookmarks in the cloud, annotated with descriptions and tags, and other people can browse or search through them. Now, if a site is good, there's a reasonable chance that lots of people will all bookmark it independently - and they'll use similar tags. Once 10 people have bookmarked the same resource, you'll have a pretty good idea of what the correct tags for it are. Once 100 people have done it, you're solid; you'll have covered most synonyms, spelling mistakes, etc. Languages are a thornier issue but I'm not super concerned about addressing that quite yet.

So, we could use that model. We actually already have a bookmarking system, so that would be the logical thing to expand. Let people quickly add threads - or even individual posts - to their bookmarks to form a "personal search store" of useful content. That would be a good starting point for guiding searches, even for those people who don't bookmark anything. We could even add support for bookmarking external links. And if we were to implement something like del.icio.us, why would people use it instead of just using del.icio.us? Integration. Del.icio.us doesn't do things like tracking when pages update; while for us, providing last-post information with each bookmarked forum thread is trivial. We have insider knowledge on most of the content.

So that would be a start. Would it be enough? I'm not sure, but I think probably not. Under that system, some content would acquire tags that could aid later searches - that works out quite well, in fact, because the content that people tag will be the content most likely to be useful. Still, it leaves a lot of content untagged, and doesn't help change the way people find content in the first place.

One small extension to the system might improve things significantly: when a user posts a new content item, consider it "auto-bookmarked." While posting, have the user set up the tags that it should use. By folding this into the bookmarking system - not explicitly, of course, but internally - all new content items are guaranteed to receive tags. Question is, if this were enforced - posters had to supply tags - would they actually use it? It's an approach that leads to people using tags like "asdfasdf" just to satisfy the software. That's not helpful. There are two things that may help, though.

The first is automatic tag suggestion. It's a nontrivial task, but it may be possible to take a content item - I'm thinking primarily of text here - and identify key words automatically. To take a page out of Google's book, extra weight would be given to things like the title or to hypertext links. Clicking a few tags in a "suggested tags" list is easier than typing junk into a text field, so while people might apply the wrong tags, it would help stop the system getting polluted with junk ones. Automatic tag suggestion is also the only realistic way of generating tags for all our archived content...
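A crude version of that suggestion pass might be little more than term frequency with a boost for title words. Here's a Python sketch - the stopword list and boost value are made-up placeholders, nothing like production-ready:

```python
# Toy automatic tag suggestion: term frequency, with extra weight for
# words in the title. The boost and stopwords are invented values.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "my"}
TITLE_BOOST = 3  # hypothetical weighting; would need real tuning

def suggest_tags(title, body, limit=5):
    def words(text):
        return [w for w in re.findall(r"[a-z]+", text.lower())
                if w not in STOPWORDS]
    counts = Counter(words(body))
    for w in words(title):
        counts[w] += TITLE_BOOST
    return [w for w, _ in counts.most_common(limit)]
```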

The second is to take advantage of the path the user took to create the content item. Take the saved search or forum that gets used to post a new thread. If that forum has some tags associated with it, then the new thread could automatically have those tags applied. That would ensure that anything posted in Graphics Programming and Theory would at least get a "graphics" tag, for example. This leads neatly to the next aspect of the system...

Currently there are a number of predefined forums on GDNet - "For Beginners," "Graphics Programming and Theory," and so on. These are categories for topics that have been defined by the GDNet Overlords over a long period of time, and are fairly resistant to change - new forums are only created in response to a surge of discussion on one subject that distorts the focus of an existing forum and drowns out discussion about other topics.

But who's to say that we're right? Many of the forums have poorly defined boundaries - where do you draw the line between General Programming and Game Programming, after all? Or Math and Physics and Graphics Programming and Theory? We don't permit cross-posting, so if you've got something in the grey area, you just have to pick one and go with it, likely costing you the expertise of people in the other one. Ideally your topic should be marked (*cough* TAGGED *cough*) for both forums.

Thing is, if we've got all our content tagged, rigid categories aren't necessary. Instead we have the concept of saved searches - a set of search parameters, the results of which are used to generate a set of topics. We flip things upside down and allow topics to self-select into "forums" instead of having to explicitly associate them. Want a forum dedicated entirely to shadow-mapping? Just set up a saved search for that. And of course, anything that the search can do, this can do too - for example, you could edit your search to exclude topics started by a particular poster that you don't like. If you start connecting it to user profile data, too - like, say, a user's stated "proficiency level" in given topics - then you can quickly construct a beginners-only (or experts-only) view.

There's obviously still a lot of value in having predefined categories. And that's one of the great things - we can still keep those, even with a search-based system; a saved search for the "offtopic" tag, titled "GDNet Lounge", and you've got your Lounge. It's self-supporting, too, as I noted above - if you go to the create-thread interface via that Lounge saved-search, then your topic will receive the "offtopic" keyword automatically, so what you've posted in the Lounge will appear to stay there.
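In sketch form, a saved search is just a stored predicate over topics - here in Python, with every field name invented:

```python
# Toy saved search acting as a "forum": a stored predicate over
# topics. All field names here are invented.
def saved_search(required_tags=(), exclude_authors=()):
    def matches(topic):
        return (all(t in topic["tags"] for t in required_tags)
                and topic["author"] not in exclude_authors)
    return matches

def run(search, topics):
    # A "forum page" is then whatever currently matches the search.
    return [t["title"] for t in topics if search(t)]

# The Lounge: nothing more than a saved search for the "offtopic" tag.
lounge = saved_search(required_tags=["offtopic"])
```

The `exclude_authors` parameter stands in for the "filter out topics by a poster you don't like" idea above; profile-based filters would slot in the same way.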

There are other details I'm thinking about. For example, should all tags be considered equal? This post is mostly about tagging, somewhat about GDNet, a bit about forum structure... yet just tagging it "tagging, gdnet, forum structure" wouldn't capture that information. Perhaps users could choose to specify weights for their tags if they so desire - it would have to be a simple UI, like a slider bar for each tag. You no longer have to decide whether or not it's worth using a particular tag; you can just use it at a low weighting.

I realise this is a long post. If you made it this far, well done! Care to round off your journey by leaving me some feedback?