PHP/MySQL highscore optimization

6 comments, last by Anonymous P 16 years, 6 months ago
Hello, I'm currently designing an online strategy game. I've completed the design document and am working on implementing several features; however, I'm not sure how I would optimize the highscores for fast sorting/retrieval of 100k+ players. Highscores will be ranked by level, then experience.

Edit: I should have mentioned that I'm only using one table at the moment; the level and experience columns are indexed.
Create a "SELECT [whatever] FROM [table] ORDER BY [foo] LIMIT [max]" view on your table. Ideally, your database engine should optimize it out by only reevaluating the view when the underlying table is modified.
I'm already doing that.

Is there no other way to make sorting and retrieval faster that also scales?
This is the general-case optimization. If you wish to optimize further, you'll need to profile the typical usage of your application and determine which query sequences cause the biggest performance hits, in order to optimize those queries at the cost of slowing down other, less important queries.

So, which ones?
If you have placed the right indexes, you should be okay for the moment. Hundreds of thousands of records sound like a lot, but a proper database engine should be able to handle that. Having said that, my only experience with huge databases involved MS SQL, but I am fairly confident MySQL has grown into a good alternative. As far as I know, you have the option to choose from different storage engines; you may want to research their advantages and disadvantages in several scenarios.
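Switching a table's storage engine, should you want to experiment, is a single statement; the table name here is just a placeholder for your own schema:

ALTER TABLE players ENGINE = InnoDB;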

In general, the problem with heavy indexes is not the retrieval of the data but the update of existing data and more importantly the insertion of new data. This can become slow as the engine has to physically reorder the records (on a clustered index) or update the index tables pointing to the original data (non-clustered index). In that respect, profiling and analyzing its performance is absolutely paramount.
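In MySQL, the usual first profiling step is EXPLAIN, which shows whether a query actually hits your indexes; a sketch, again assuming a players table with level and experience columns:

EXPLAIN SELECT id, level, experience FROM players
ORDER BY level DESC, experience DESC LIMIT 20;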

Placing indexed tables in separate file groups on separate hard disks is something that works well on MS SQL Server, but I don't know if this is possible in MySQL. If worse comes to worst, you could generate additional tables optimized for certain tasks. This would go against a properly normalized architecture, so I wouldn't go there if not strictly necessary.
Quote:
Edit: I should have mentioned that I'm only using one table at the moment, the level and experience columns are indexed.


You shouldn't have any problems with those volumes of data, if you properly normalize your db. Keeping everything in one big table is generally a very bad idea for several reasons, among them performance.


1) Normalize your data. Most relational databases (such as MySQL) keep their data chunked by record; this means that a query that only requires three fields (id, level, experience) will pull lots of unnecessary information from disk. Disk seeks are the first thing that kills db performance.

Also, depending on the database's transaction safety handling mechanism (I don't know about MySQL), normalization can be a big boost to update and insert performance by splitting frequently updated rows from the rest of the data.

Let's be generous and say that experience is a 32-bit integer, id is a 64-bit integer and level is a 32-bit integer. That's 16 bytes per player, or roughly 3 megabytes for 200,000 users, which amply fits in RAM and will stay there if it's accessed frequently.

Split off your level and experience into a separate table, and do the same for other things that are queried separately; this lets the database's paging system take maximum advantage of available RAM. Unless you need most fields in most of your queries, this will substantially improve performance.
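Queries that do need the wider player data can still join the narrow table back in; a sketch, assuming the split table is the level_exp table shown further down and the main table is player_table:

SELECT p.*, le.level, le.exp
FROM player_table p
JOIN level_exp le ON le.id = p.id
WHERE p.id = 12345;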


2) Your query should look like

SELECT id, level, experience FROM big_ass_table ORDER BY level DESC, experience DESC LIMIT 20;

Your DB's query planner MAY use the level index (not the experience index) to optimize this query, but maybe not.

A double index on (level, experience) makes this query blindingly fast, but as mentioned above, it increases the insert and update costs. Double indexes are big. It depends. Which brings us to the third thing to try if everything else fails (trust me, it won't):


3) Denormalize ;) Let's say you've split the level and experience values for each player into a separate table that looks something like this, depending on the SQL dialect:

CREATE TABLE level_exp (
    id BIGINT PRIMARY KEY REFERENCES player_table(id),
    level INT NOT NULL,
    exp INT NOT NULL
);

CREATE INDEX level_idx ON level_exp(level);


Thing is, you're running it on your grandma's underclocked 286 and your query planner isn't very good. A double index is too bulky and slows you down on inserts. What to do?

Well, you might notice that you can create a single index that fulfills your needs; the 64-bit integer idx, created by concatenating level and exp, is exactly the ordering you need:

idx = (level << 32) | exp
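
For instance, a player at level 5 with 1000 exp gets idx = (5 << 32) | 1000 = 21,474,837,480, while any level-6 player has an idx of at least 6 << 32 = 25,769,803,776; sorting by idx therefore gives the same order as sorting by (level, exp), provided both values are non-negative and exp fits in 32 bits.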

So you drop the index on level, add an idx column to the level_exp table, index that, and write a trigger that takes care of maintaining it in sync with the records:

CREATE TABLE level_exp (
    id BIGINT PRIMARY KEY REFERENCES player_table(id),
    level INT NOT NULL,
    exp INT NOT NULL,
    idx BIGINT NOT NULL
);

CREATE INDEX level_exp_idx ON level_exp(idx);
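
The triggers mentioned above could look roughly like this in MySQL (the trigger names are mine, and this assumes level and exp are non-negative so the packed value sorts correctly):

CREATE TRIGGER level_exp_set_idx_ins BEFORE INSERT ON level_exp
    FOR EACH ROW SET NEW.idx = (NEW.level << 32) | NEW.exp;

CREATE TRIGGER level_exp_set_idx_upd BEFORE UPDATE ON level_exp
    FOR EACH ROW SET NEW.idx = (NEW.level << 32) | NEW.exp;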

Now you've made all the DBAs kill kittens, but even the most boneheaded query planner will perform an (optimal in this case) index scan backwards using this query:

SELECT id, level, exp FROM level_exp ORDER BY idx DESC LIMIT 20;
Technically, splitting a table into two or more parallel tables with the same primary key is not actually normalising anything. That's not to discount the potential performance benefits of such an approach however.
Blargh, I meant to work something about normalization into the post (one big table screams issues) but only mixed the term in somewhere it doesn't belong. Sorry.
