[web] Optimizing SQL Highscore

This topic is 4322 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

Recommended Posts

Optimizing SQL: Our webhost recently pulled the plug on our PHP scripts for over-using CPU ([rant], I'd advice avoiding LunarPages like the plague. Worst user-treatment I've ever seen, they just yanked our entire website, email, FTP, control panel access, everything offline without warning.[/rant]) I was hoping someone here could give us some tips on how to optimize the SQL to reduce load (so we don't upset the NEXT web-host as well :) It's a fairly simple script, just used for downloading top-5 highscores. The highscores get hit at the end of every level, ~18K entries for the last 10 days). The SQL/PHP code looks like this:
$query = "SELECT GameUser.Name, GameUser.Location, GameScore.Score FROM GameScore,GameUser WHERE GameUser.UserID = GameScore.UserID AND GameScore.Level = '$Level'
AND GameScore.GameID = $GameID ORDER BY GameScore.Score DESC LIMIT 5"; /*echo$query;*/

$result = mysql_query($query, $db) or die("@@Error@@ executing query: ".mysql_error()); while ($myrow = mysql_fetch_row($result)) { printf("%s@%s@%d@ ",$myrow[0], $myrow[1],$myrow[2]);
}

The table looks like this
CREATE TABLE GameScore (
Score BIGINT,
Level TEXT,
GameID BIGINT,
Date DATETIME,
VersionID BIGINT,
UserID BIGINT NOT NULL ,
INDEX ( UserID)
)

CREATE TABLE GameUser (
Name TEXT,
Location TEXT,
Created DATETIME,
UserID BIGINT NOT NULL AUTO_INCREMENT,
INDEX ( UserID )
)


Specifically, they're complaining about the Select query. Is it the table-join that's getting the server down, or just bad use of indexing? Thanks for any ideas! Allan

Share on other sites
The query itself is simple and shouldn't be the cause of the problem, but it may be the number of rows that are being scanned. SQL may only be returning five rows, but it has to sort through tens of thousands to get those five.

If you're using MySQL, it doesn't perform well for larger databases, where as PostgreSQL, which is a little slower, performs well with larger databases.

One other note: your table structure is setup to contain much more bit space than what is needed.

Change your TEXT fields to VARCHAR(255) fields. TEXT is better suited for comments, descriptions, etc. and not for short items, such as name, address, phone number, etc. I would also set the BIGINT's to UNSIGNED, unless you want to track negative numbers (if not, definitely go with UNSIGNED). If any of the BIGINT's contain small numbers, less than or equal to 4,294,967,295, then INT should suffice (you may even be able to use TINYINT or SMALLINT, depending on the amount being stored).

Hope this helps!

Share on other sites
First off, I hate joins written like that. It's too hard to read. Use the JOIN keyword so it's obvious what's going on:
SELECT u.Name, u.Location, s.Score FROM GameScore AS sJOIN GameUser AS u ON u.UserID = s.UserIDWHERE s.Level = '$Level' AND s.GameID =$GameID ORDER BY s.Score DESCLIMIT 5

By looking at this, it's easy to see that both UserIDs must be indexed, which you do. Then, either Level or GameID should be - or possibly a key on both together. Without knowing the data distribution, it's hard to recommend.

There is even another way to optimize the query, if you know something about the data. If you know you will have a lot of duplicate s.Level/s.GameID's then you may be better off indexing s.Score and doing something like this:
SELECT u.Name, u.Location, s.Score FROM GameScore AS sJOIN GameUser AS u ON u.UserID = s.UserIDWHERE s.Level = '$Level' AND s.GameID =$GameID AND s.Score > $scoreORDER BY s.Score DESCLIMIT 5 Notice that$score is now being added to the query. You can start with a value that is probably going to give back 5 records. If it doesn't, you can re-run the query with a lower $score value. This is usually not needed in MyISAM queries, but it can make all the difference with InnoDB. But it all depends on your data... MySQL will only use one index per table, so you'll need to figure out which of those three is the best candidate. Share this post Link to post Share on other sites Thanks for the points on the variable sizes, Mathachew. Would it be worthwhile to duplicate name and location in the score table to avoid joining? Also; it's a good point that duplication might be troublesome. Our GameID's are all identical (since only one game is using the system so far. Once we add a second game, it'll presumably reuse the same code and table). There's around 90 levelIDs or so. 18.000 entries, so the search may actually be what's killing us. Does anyone have recommendations for optimizing highscore tables? Should we look at using a Cron job or similar to trim the highscore every day? Allan Share this post Link to post Share on other sites Quote:  Thanks for the points on the variable sizes, Mathachew. You are welcome! Quote:  Would it be worthwhile to duplicate name and location in the score table to avoid joining? Considering that the amount of fields you have between the two tables is low, you could merge them into a single table and create an Index based on the UserID and GameID. Doing this eliminates one table, and creates a unique entry per user per game, and omits the need of a JOIN in your query. I conceed that doing this may not give you any better results because I simply do not know. Quote:  Does anyone have recommendations for optimizing highscore tables? Should we look at using a Cron job or similar to trim the highscore every day? Depends on what you want to do with the lower scores. You could remove them or do some kind of archiving of the older/lower score data by using a CRON job. However, I haven't done anything like this and anything that immediately comes to mind wouldn't be effecient. Share this post Link to post Share on other sites Since we only really care about the top 10 scores, I ended up setting up a cron-job to execute across each unique level and delete un-needed ones. Right now (since we're at the TryGames 'new game' bar), we're getting ~100 highscore entries / minute, so clearing them out each hour or so should be good enough. Thanks for all your help, Allan Share this post Link to post Share on other sites How unique are the levels? Would adding an index on them help? If they aren't very unique, then adding an index on the score and adding it to the WHERE would be the best. I would keep the tables separated. I doubt the JOIN is hurting things, because it shouldn't be applied until after the 5 records are found. Another thing you could do, which is similar to your current approach is simply to build a high-score table once every hour: INSERT INTO high_score (GameID, LevelID, name, location, score)SELECT$GameID, $Level, u.Name, u.Location, s.Score FROM GameScore AS sJOIN GameUser AS u ON u.UserID = s.UserIDWHERE s.Level = '$Level' AND s.GameID = \$GameID ORDER BY s.Score DESCLIMIT 5

The drawback is the data isn't live, but the benefits are that it is as fast as can be to access and you don't have to throw away old records.

Would it be possible to provide us (even privately) with a SQL dump of those two tables (with live data)? I'd be interested in taking a look at it.

Share on other sites
As most people have already mentioned, You don't have the proper indexes.

I would take a look at
http://dev.mysql.com/doc/refman/5.0/en/explain.html

I would also consider breaking values into seperate tables.. Do you have old scores in there that are rarely accessed? Maybe move old values into another table for holding.

Share on other sites
Quote:
 Original post by konForceHow unique are the levels? Would adding an index on them help? If they aren't very unique, then adding an index on the score and adding it to the WHERE would be the best.I would keep the tables separated. I doubt the JOIN is hurting things, because it shouldn't be applied until after the 5 records are found.

Best as I can see, the things I should have done differently:

1. Assign a unique value instead of a string for the level-name. Could have gotten away with just hashing the name.
2. Avoid the join; it's a waste of energy.
3. Look at cleaning up the data more often.

Right now we're generating ~1000 highscores / minute (12K unique users), since we just went live at BigFish and TryMedia... doing a data-dump of that is probably going to be pretty unreadable :)

For the next iteration I'll streamline the process a bit.

The highscore needs to remain realtime; our users like to see how they stack up.

Allan Simonsen

Share on other sites
Why not have a small table *just* for the current high scores? Rather than insert new scores into the giant high scores list and then querying against it, write a script that will take in the new scores and, if they are high enough, insert them into the small high score table, bumping out the lowest current high score. You can make even make an optimization here by comparing first against the current 5th place score... all scores lower than this don't need to be compared to any of the other values.

1. 1
2. 2
3. 3
4. 4
Rutin
18
5. 5

• 14
• 12
• 9
• 12
• 37
• Forum Statistics

• Total Topics
631423
• Total Posts
3000003
×