Tags are very easy to implement, but difficult to design. Here's the basic idea of tags:
- Allow users to attach a bunch of tags to things.
The basic notion is that tags can be used to establish a 'semantic network' of content, making information easier to find. Instead of taking a user's search phrase and matching it against all the text in your database, you pull the keywords out of each chunk of text at authoring time, which makes searches faster later. Furthermore, rather than trying to pull the keywords out automatically, you encourage the author to provide the keywords themselves.
Secondary to that idea is the notion of incidental search - things like "related content." You search for the tags that the current item is annotated with, ignore the current item itself, and offer the results as a "see also" section. For this to work well you need to do more than basic string matching on your tags: synonyms and spelling mistakes would cripple such a simple implementation.
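To make the "see also" idea concrete, here is a minimal sketch in Python. The function name, the thread ids and the tags are all made up for illustration, and it deliberately uses the naive exact-match comparison, so it has exactly the synonym and spelling weaknesses described above.

```python
from collections import Counter

def related_items(current_id, item_tags, limit=5):
    """Rank other items by how many tags they share with the current item."""
    current = item_tags[current_id]
    scores = Counter()
    for item_id, tags in item_tags.items():
        if item_id == current_id:
            continue  # ignore the item being viewed
        shared = len(current & tags)  # exact string match only
        if shared:
            scores[item_id] = shared
    return [item_id for item_id, _ in scores.most_common(limit)]

# Hypothetical data: item id -> set of tags
item_tags = {
    "thread-1": {"shadow-mapping", "opengl", "graphics"},
    "thread-2": {"shadow-mapping", "graphics", "directx"},
    "thread-3": {"pathfinding", "ai"},
    "thread-4": {"opengl", "textures"},
}

print(related_items("thread-1", item_tags))
```

Note that a thread tagged "shadowmapping" would score zero against "shadow-mapping" here, which is the failure mode that plain string matching invites.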
Who gets to tag content, and at what granularity should content be tagged? YouTube allows the author to set the tags, and only per-video. Del.icio.us allows each user to provide their own set of tags for a bookmark, but they're only per-bookmark. Most blogs, on the other hand, only allow the author to tag, and tags apply to each individual post. Which approach is right for GDNet? Do we tag posts, threads, entire forums? Do we rely on the authors to tag their content correctly, or do we encourage the community to do it en masse? How do we structure the system so that it can't be broken by incorrect tagging?
The model employed by del.icio.us is the one that I think seems the most promising, at least in part. Del.icio.us, if you don't know it, is a social bookmarking site - you store your bookmarks in the cloud, annotated with descriptions and tags, and other people can browse or search through them. Now, if a site is good, there's a reasonable chance that lots of people will all bookmark it independently - and they'll use similar tags. Once 10 people have bookmarked the same resource, you'll have a pretty good idea of what the correct tags for it are. Once 100 people have done it, you're solid; you'll have covered most synonyms, spelling mistakes, etc. Languages are a thornier issue but I'm not super concerned about addressing that quite yet.
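One way to picture how independent taggers converge is to count, per resource, how many users applied each tag and keep the ones that enough of them agree on. This is only a sketch: the function name, the 50% threshold and the sample data are my own assumptions, not anything del.icio.us is known to do.

```python
from collections import Counter

def consensus_tags(bookmarks, min_share=0.5):
    """bookmarks: one set of tags per user who bookmarked the resource.

    Returns the tags used by at least `min_share` of those users,
    most popular first (ties broken alphabetically).
    """
    if not bookmarks:
        return []
    counts = Counter(tag for tags in bookmarks for tag in set(tags))
    threshold = min_share * len(bookmarks)
    agreed = [tag for tag, count in counts.items() if count >= threshold]
    return sorted(agreed, key=lambda tag: (-counts[tag], tag))

# Hypothetical bookmarks of one resource by four users
bookmarks = [
    {"shadows", "graphics"},
    {"shadows", "opengl"},
    {"shadow", "graphics"},               # spelling variant
    {"shadows", "graphics", "asdfasdf"},  # junk tag
]

print(consensus_tags(bookmarks))
```

The rare tags that fall below the threshold need not be thrown away; they could be kept as candidate synonyms or misspellings of the agreed tags.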
So, we could use that model. We actually already have a bookmarking system, so that would be the logical thing to expand. Let people quickly add threads - or even individual posts - to their bookmarks to form a "personal search store" of useful content. That would be a good starting point for guiding searches, even for those people who don't bookmark anything. We could even add support for bookmarking external links. And if we were to implement something like del.icio.us, why would people use it instead of just using del.icio.us? Integration. Del.icio.us doesn't do things like tracking when pages update; while for us, providing last-post information with each bookmarked forum thread is trivial. We have insider knowledge on most of the content.
So that would be a start. Would it be enough? I'm not sure, but I think probably not. Under that system, some content would acquire tags that could aid later searches - that works out quite well, in fact, because the content that people tag will be the content most likely to be useful. Still, it leaves a lot of content untagged, and doesn't help change the way people find content in the first place.
One small extension to the system might improve things significantly: when a user posts a new content item, consider it "auto-bookmarked." While posting, have the user set up the tags that it should use. By folding this into the bookmarking system - not explicitly, of course, but internally - all new content items are guaranteed to receive tags. Question is, if this were enforced - posters had to supply tags - would they actually use it? It's an approach that leads to people using tags like "asdfasdf" just to satisfy the software. That's not helpful. There are two things that may help, though.
The first is automatic tag suggestion. It's a nontrivial task, but it may be possible to take a content item - I'm thinking primarily of text, here - and identify key words automatically. To take a page out of Google's book, extra weight would be given to things like the title or to hypertext links. Clicking a few tags in a "suggested tags" list is easier than typing junk into a text field, so while people might apply the wrong tags, it would help stop the system getting polluted with junk tags. Automatic tag suggestion is also the only realistic way of generating tags for all our archived content...
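As a very rough sketch of what tag suggestion could look like, the snippet below counts words in the body and gives title words extra weight. The stopword list, the weight of 3 and the function names are arbitrary assumptions, link weighting is left out, and a real system would need something much smarter than word counting.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be far longer.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "it", "my", "of",
             "to", "on", "for", "with", "this", "that"}

def words(text):
    """Lowercase the text and split it into candidate keywords."""
    return [w for w in re.findall(r"[a-z][a-z0-9+#-]*", text.lower())
            if w not in STOPWORDS]

def suggest_tags(title, body, limit=5, title_weight=3):
    """Suggest tags by word frequency, weighting title words more heavily."""
    scores = Counter()
    for word in words(body):
        scores[word] += 1
    for word in words(title):
        scores[word] += title_weight
    return [word for word, _ in scores.most_common(limit)]

title = "Shadow mapping artifacts"
body = "My shadow mapping pass shows acne. The shadow bias is set in the shader."
print(suggest_tags(title, body, limit=3))
```

The suggestions would be offered as clickable choices rather than applied automatically, so the poster still has the final say.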
The second is to take advantage of the path the user took to creating the content item. Take the
Currently there are a number of predefined forums on GDNet - "For Beginners," "Graphics Programming and Theory," and so on. These are categories for topics that have been defined by the GDNet Overlords over a long period of time, and are fairly resistant to change - new forums are only created in response to a surge of discussion on one subject that distorts the focus of an existing forum and drowns out discussion about other topics.
But who's to say that we're right? Many of the forums have poorly defined boundaries - where do you draw the line between General Programming and Game Programming, after all? Or Math and Physics and Graphics Programming and Theory? We don't permit cross-posting, so if you've got something in the grey area, you just have to pick one and go with it, likely costing you the expertise of people in the other one. Ideally your topic should be marked (*cough* TAGGED *cough*) for both forums.
Thing is, if we've got all our content tagged, rigid categories aren't necessary. Instead we have the concept of saved searches - a set of search parameters, the results of which are used to generate a set of topics. We flip things upside down and allow topics to self-select into "forums" instead of having to explicitly associate them. Want a forum dedicated entirely to shadow-mapping? Just set up a saved search for that. And of course, anything that the search can do, this can do too - for example, you could edit your search to exclude topics started by a particular poster that you don't like. If you start connecting it to user profile data, too - like, say, a user's stated "proficiency level" in given topics - then you can quickly construct a beginners-only (or experts-only) view.
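Here is a minimal sketch of a saved search acting as a "forum": a set of required tags plus an optional list of posters to exclude. The class names, fields and sample topics are hypothetical, and a real version would presumably run as a database query rather than a Python loop.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Topic:
    title: str
    author: str
    tags: frozenset

@dataclass(frozen=True)
class SavedSearch:
    title: str
    required_tags: frozenset
    excluded_authors: frozenset = frozenset()

    def matches(self, topic):
        """A topic self-selects into this 'forum' if it carries every required tag."""
        return (self.required_tags <= topic.tags
                and topic.author not in self.excluded_authors)

    def results(self, topics):
        return [t for t in topics if self.matches(t)]

topics = [
    Topic("Cascaded shadow maps", "alice", frozenset({"shadow-mapping", "graphics"})),
    Topic("Shadow acne help", "bob", frozenset({"shadow-mapping", "beginner"})),
    Topic("A* on a hex grid", "carol", frozenset({"pathfinding", "ai"})),
]

forum = SavedSearch("Shadow Mapping", frozenset({"shadow-mapping"}),
                    excluded_authors=frozenset({"bob"}))
print([t.title for t in forum.results(topics)])
```

Filtering on profile data, such as a stated proficiency level, would just be one more condition inside `matches`.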
There's obviously still a lot of value in having predefined categories. And that's one of the great things - we can still keep those, even with a search-based system; a saved search for the "offtopic" tag, titled "GDNet Lounge", and you've got your Lounge. It's self-supporting, too, as I noted above - if you go to the create-thread interface via that Lounge saved-search, then your topic will receive the "offtopic" keyword automatically, so what you've posted in the Lounge will appear to stay there.
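The self-supporting part can be sketched in a few lines: when a topic is created from inside a saved search, it inherits that search's tags, so it shows up in the same view afterwards. The function names and the dictionary layout are assumptions for illustration only.

```python
def create_topic(title, author, tags, via_search_tags=()):
    """Build a new topic, folding in the tags of the saved search it was posted from."""
    return {"title": title, "author": author,
            "tags": set(tags) | set(via_search_tags)}

def in_saved_search(topic, search_tags):
    """True if the topic carries every tag the saved search asks for."""
    return set(search_tags) <= topic["tags"]

# The Lounge is just a titled saved search for the "offtopic" tag.
LOUNGE = {"title": "GDNet Lounge", "tags": {"offtopic"}}

topic = create_topic("Favourite coffee?", "some_user", {"coffee"},
                     via_search_tags=LOUNGE["tags"])
print(sorted(topic["tags"]))
print(in_saved_search(topic, LOUNGE["tags"]))
```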
There are other details I'm thinking about. For example, should all tags be considered equal? This post is mostly about tagging, somewhat about GDNet, a bit about forum structure... yet just tagging it "tagging, gdnet, forum structure" wouldn't capture that information. Perhaps users could choose to specify weights for their tags if they so desire; it would have to be a simple UI, like a slider bar for each tag. You then no longer have to decide whether or not it's worth using a particular tag; you can just use it at a low weighting.
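A sketch of how weighted tags might feed into search ranking: each item stores a weight per tag (what the slider would set), and an item's score for a query is the sum of the weights of the tags it matches. The 0-to-1 scale, the function names and the sample weights are all assumptions.

```python
def weighted_score(query_tags, tag_weights):
    """Sum the weights of the query tags that the item carries."""
    return sum(tag_weights.get(tag, 0.0) for tag in query_tags)

def rank(query_tags, items):
    """Return ids of matching items, best score first."""
    scored = [(weighted_score(query_tags, weights), item_id)
              for item_id, weights in items.items()]
    scored = [(score, item_id) for score, item_id in scored if score > 0]
    return [item_id for score, item_id in sorted(scored, key=lambda s: -s[0])]

# Hypothetical weights, as a per-tag slider might record them
items = {
    "this-post": {"tagging": 1.0, "gdnet": 0.6, "forum structure": 0.3},
    "other-post": {"forum structure": 1.0, "moderation": 0.5},
}

print(rank(["forum structure"], items))
print(rank(["tagging"], items))
```

With unweighted tags both posts would tie for "forum structure"; with weights, the post that is mainly about forum structure comes first.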
I realise this is a long post. If you made it this far, well done! Care to round off your journey by leaving me some feedback?