Upcoming downtime

Published December 04, 2007
Advertisement
Before I start: Anyone out there interested in translating your game into Russian? A friend of mine is a Russian living in the UK and she's interested in doing some translation work. She's not currently looking for money from it, just experience and some references she can put on her CV. She can do straight textual translation, and also cultural awareness stuff (e.g. she can tell you if a hand gesture your character makes means something really rude in Russian or something). If anyone's interested, let me know.

OK, so... downtime in the future? Oh yes. But it's for a good cause!

Washu and I were looking over the indexes on the site's DB recently and we realised that the indexes on the thread and post tables really needed a bit of work.

Perhaps you don't know what an index is. Consider an index in a book: It's an alphabetically-sorted 'summary' of the book, constructed from the titles of topics discussed in each section, with page references that let you navigate from the 'summary' to the 'full' text. Well, an index in a database is exactly the same thing - a sorted 'summary' of a table, constructed from data in one or more columns, that includes pointers onwards to the full rows that the data was drawn from.

Indexes speed up searches for information. Without an index, you just have to start at the beginning and work your way through all the data until you find what you're looking for - in a databasing context, this means a lot of unnecessary disk reads and wasted CPU time. But with the right index - one that is constructed from columns you're interested in searching on - it can be much faster.

For example, consider an index on the Topics table that is constructed from the "Author" column. It's effectively a list of every user who's ever created a topic, and under each user, a pointer to each row that represents a topic that they've started. If I want to find all the topics that a user has started, I can just look their ID up in the index (very efficiently, because the index is sorted, so I can use techniques like binary search or even more complex structures like B-Trees to store the index) and get the rows I need to read straight out of it. However, if I want to find all the topics in a particular forum, the index doesn't help me.

Why not just create an index for every possible kind of information? The downsides to every index you have are (a) the space they take up and (b) the extra time required to update the index whenever you change the data in the table. If you can minimize the number of indexes you use, your updates/inserts/deletes go faster, and your indexes are smaller which means you stand a better chance of keeping them in memory for fast queries.

So, you can see how indexes become a bit of a balancing act - you want to be able to use indexes to speed up your queries wherever possible, but you also want to minimize the total number and size of indexes that you use.

There are two further ideas to complicate things.

Firstly, on any table, one can select a single index as the "Clustered" index. This means that beyond simply being an index into existing data, this index will determine the physical layout of data on the disk - effectively letting you store things in a sorted order, which can be very useful. For example, if I create a clustered index for the topics table that indexes the table by forum ID and then by last post time, then the data on disk will be grouped into one big block for each forum, and sorted by most recent post time within that block - so when you view the forum listing page, all the database needs to do is start somewhere in the block and read one contiguous lump of data from disk. Much faster than having to seek all over the place for rows that are scattered around.

Secondly, it's possible to avoid having to read the table at all by storing extra information in the index. For example, say I've got an index on the replies table, that indexes by reply ID, then topic ID, then author. Reply ID is unique to every row in the table, so having the topic ID and author in the index doesn't help to organise the data at all - but if my query only wants to know the topic and author for a given reply, it can find all that information in the index without having to actually visit the row containing the reply itself. Storing unnecessary information in an index like this is known as "covering" a query, and can speed things for queries at the cost of index size and update speed.

What I'm currently looking at is a clustered index on the topics table by forum and last reply time, and a clustered index on the replies table by topic ID and timestamp - so on disk, threads will be grouped into forums, replies will be grouped into threads, and both will be sorted in chronological order. Then there are a bunch of nonclustered indexes that account for some of the other ways topics and replies are accessed (e.g. by user, by IP, by time (regardless of forum), etc).

Once we've settled on a new indexing setup, the site will be going down - hopefully not for very long, like an hour or two - to apply the new indexes. The clustered indexes are the big ones because they'll have to physically sort the data on disk.
Previous Entry HI GUUYS
Next Entry Syntax hilighting
0 likes 3 comments

Comments

jollyjeffers
Sounds good to me!

I managed to speed up queries by ridiculous amounts by indexing. We did have 60 million records with only 3000 distinguishing attributes though [lol]

Jack
December 05, 2007 04:39 AM
johnhattan
I could probably use a pro-bono Russian translator, but I know if I do that it'll improve my viewership by 1% in exchange for people emailing me tech support help in Russian :)

Thus far, people from other countries seem forgiving of foreign text as long as they can figure the game out.
December 06, 2007 07:13 PM
Moe
Indexing, eh? I certainly have a greater appreciation for the DB end of things on Gamedev.net. I have/had a database class this semester, and for our final project we built a small online forum. It looked quite horrible, but it did actually work which was cool to see. I certainly have a new-found respesct for the GDNet database!

I probably wouldn't hesitate to put a news article on the front page stating when the downtime is going to be. It's always a bit annerving to swing over to gamedev only to find that it's down.
December 12, 2007 11:14 PM
You must log in to join the conversation.
Don't have a GameDev.net account? Sign up!
Profile
Author
Advertisement
Advertisement