[web] Optimizing SQL Highscore

Optimizing SQL: Our webhost recently pulled the plug on our PHP scripts for over-using CPU ([rant]I'd advise avoiding LunarPages like the plague. Worst user treatment I've ever seen; they yanked our entire website, email, FTP, and control panel access offline without warning.[/rant]). I was hoping someone here could give us some tips on how to optimize the SQL to reduce load (so we don't upset the NEXT web host as well :)).

It's a fairly simple script, just used for downloading the top-5 highscores. The highscores get hit at the end of every level (~18K entries for the last 10 days). The SQL/PHP code looks like this:
$query = "SELECT GameUser.Name, GameUser.Location, GameScore.Score 
          FROM `GameScore`,`GameUser`
               WHERE GameUser.UserID = GameScore.UserID
               AND GameScore.Level = '$Level' 
               AND GameScore.GameID = $GameID 
          ORDER BY GameScore.Score DESC LIMIT 5";
/*echo  $query;*/
		
$result = mysql_query($query, $db) or die("@@Error@@ executing query:  ".mysql_error());	

while ($myrow = mysql_fetch_row($result)) 
{
    printf("%s@%s@%d@ ", $myrow[0], $myrow[1], $myrow[2]); 
}	
The tables look like this:
CREATE TABLE `GameScore` (
`Score` BIGINT,
`Level` TEXT,
`GameID` BIGINT,
`Date` DATETIME,
`VersionID` BIGINT,
`UserID` BIGINT NOT NULL ,
INDEX ( `UserID`)
)


CREATE TABLE `GameUser` (
`Name` TEXT,
`Location` TEXT,
`Created` DATETIME,
`UserID` BIGINT NOT NULL AUTO_INCREMENT,
INDEX ( `UserID` )
)

Specifically, they're complaining about the SELECT query. Is it the table join that's dragging the server down, or just bad use of indexing? Thanks for any ideas! Allan

The query itself is simple and shouldn't be the cause of the problem, but it may be the number of rows that are being scanned. MySQL may only be returning five rows, but it has to sort through tens of thousands to find those five.

If you're using MySQL, it doesn't perform as well for larger databases, whereas PostgreSQL, which is a little slower in general, handles larger databases well.

One other note: your table structure is set up to use much more storage space than it needs.

Change your TEXT fields to VARCHAR(255) fields. TEXT is better suited for comments, descriptions, etc., not for short items such as a name, address, or phone number. I would also make the BIGINTs UNSIGNED unless you want to track negative numbers (if not, definitely go with UNSIGNED). If any of the BIGINTs contain small numbers, less than or equal to 4,294,967,295, then INT should suffice (you may even be able to use TINYINT or SMALLINT, depending on the amount being stored).
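For example, something along these lines might do it (using the table and column names from your post; the exact sizes depend on your real data):

-- Rough sketch: tighten the column types on the existing tables.
ALTER TABLE `GameUser`
    MODIFY `Name`     VARCHAR(255),
    MODIFY `Location` VARCHAR(255),
    MODIFY `UserID`   INT UNSIGNED NOT NULL AUTO_INCREMENT;

ALTER TABLE `GameScore`
    MODIFY `Score`  INT UNSIGNED,
    MODIFY `Level`  VARCHAR(255),
    MODIFY `GameID` SMALLINT UNSIGNED,
    MODIFY `UserID` INT UNSIGNED NOT NULL;

Keep the UserID type identical in both tables so the join still compares like with like.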

Hope this helps!

First off, I hate joins written like that. It's too hard to read. Use the JOIN keyword so it's obvious what's going on:

SELECT u.Name, u.Location, s.Score
FROM `GameScore` AS s
JOIN `GameUser` AS u ON u.UserID = s.UserID
WHERE s.Level = '$Level' AND s.GameID = $GameID
ORDER BY s.Score DESC
LIMIT 5

By looking at this, it's easy to see that both UserID columns must be indexed, which you already have. Then either Level or GameID should be indexed as well, or possibly a key on both together. Without knowing the data distribution, it's hard to recommend one.
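As a rough sketch, something like this could cover both the filter and the sort (Level is currently TEXT, so MySQL needs a prefix length on it; if you convert it to VARCHAR as suggested above, the prefix isn't needed):

-- Rough sketch: one composite index covering the WHERE filter and the ORDER BY,
-- so the top 5 can be read straight off the index.
ALTER TABLE `GameScore`
    ADD INDEX `idx_game_level_score` (`GameID`, `Level`(32), `Score`);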

There is even another way to optimize the query, if you know something about the data. If you know you will have a lot of duplicate s.Level/s.GameID values, then you may be better off indexing s.Score and doing something like this:

SELECT u.Name, u.Location, s.Score
FROM `GameScore` AS s
JOIN `GameUser` AS u ON u.UserID = s.UserID
WHERE s.Level = '$Level' AND s.GameID = $GameID AND s.Score > $score
ORDER BY s.Score DESC
LIMIT 5

Notice that $score is now being added to the query. You can start with a value that is probably going to give back 5 records. If it doesn't, you can re-run the query with a lower $score value. This is usually not needed in MyISAM queries, but it can make all the difference with InnoDB. But it all depends on your data...

MySQL will only use one index per table, so you'll need to figure out which of those three is the best candidate.

Thanks for the points on the variable sizes, Mathachew.

Would it be worthwhile to duplicate name and location in the score table to avoid joining?

Also, it's a good point that duplication might be troublesome. Our GameIDs are all identical (since only one game is using the system so far; once we add a second game, it'll presumably reuse the same code and tables). There are around 90 level IDs or so, and 18,000 entries, so the search may actually be what's killing us.

Does anyone have recommendations for optimizing highscore tables? Should we look at using a Cron job or similar to trim the highscore every day?

Allan

Quote:
Thanks for the points on the variable sizes, Mathachew.


You are welcome!

Quote:
Would it be worthwhile to duplicate name and location in the score table to avoid joining?


Considering that the number of fields you have between the two tables is low, you could merge them into a single table and create an index based on the UserID and GameID. Doing this eliminates one table, creates a unique entry per user per game, and removes the need for a JOIN in your query. I concede that doing this may not give you any better results, because I simply do not know.
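A rough sketch of what such a merged table might look like (the table name is made up, and whether it actually helps depends on your data):

-- Rough sketch: one denormalized table, keyed per user per game.
CREATE TABLE `GameHighScore` (
    `UserID`   INT UNSIGNED NOT NULL,
    `GameID`   SMALLINT UNSIGNED NOT NULL,
    `Name`     VARCHAR(255),
    `Location` VARCHAR(255),
    `Level`    VARCHAR(255),
    `Score`    INT UNSIGNED,
    `Date`     DATETIME,
    INDEX (`UserID`, `GameID`)
);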

Quote:
Does anyone have recommendations for optimizing highscore tables? Should we look at using a Cron job or similar to trim the highscore every day?


Depends on what you want to do with the lower scores. You could remove them, or do some kind of archiving of the older/lower score data using a cron job. However, I haven't done anything like this, and nothing that immediately comes to mind would be particularly efficient.
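That said, a rough sketch of the archiving idea could be as simple as the following (this assumes a GameScoreArchive table with the same columns; the 30-day cutoff is arbitrary):

-- Rough sketch: copy old scores into an archive table, then remove them.
INSERT INTO `GameScoreArchive`
SELECT * FROM `GameScore`
WHERE `Date` < DATE_SUB(NOW(), INTERVAL 30 DAY);

DELETE FROM `GameScore`
WHERE `Date` < DATE_SUB(NOW(), INTERVAL 30 DAY);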

Since we only really care about the top 10 scores, I ended up setting up a cron job to execute across each unique level and delete the unneeded rows. Right now (since we're at the TryGames 'new game' bar), we're getting ~100 highscore entries per minute, so clearing them out every hour or so should be good enough.

Thanks for all your help,

Allan

How unique are the levels? Would adding an index on them help? If they aren't very unique, then adding an index on the score and adding it to the WHERE would be the best.

I would keep the tables separated. I doubt the JOIN is hurting things, because it shouldn't be applied until after the 5 records are found.

Another thing you could do, which is similar to your current approach, is simply to build a high-score table once every hour:

INSERT INTO high_score (GameID, LevelID, name, location, score)
SELECT $GameID, $Level, u.Name, u.Location, s.Score
FROM `GameScore` AS s
JOIN `GameUser` AS u ON u.UserID = s.UserID
WHERE s.Level = '$Level' AND s.GameID = $GameID
ORDER BY s.Score DESC
LIMIT 5

The drawback is the data isn't live, but the benefits are that it is as fast as can be to access and you don't have to throw away old records.

Would it be possible to provide us (even privately) with a SQL dump of those two tables (with live data)? I'd be interested in taking a look at it.

As most people have already mentioned, you don't have the proper indexes.

I would take a look at
http://dev.mysql.com/doc/refman/5.0/en/explain.html

This might help you greatly.
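For example, something like this against the rewritten query from earlier in the thread (the level name and game ID here are just placeholder values):

-- Prefix the slow query with EXPLAIN to see which index MySQL picks
-- and how many rows it expects to examine.
EXPLAIN SELECT u.Name, u.Location, s.Score
FROM `GameScore` AS s
JOIN `GameUser` AS u ON u.UserID = s.UserID
WHERE s.Level = 'Level01' AND s.GameID = 1
ORDER BY s.Score DESC
LIMIT 5;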

I would also consider splitting the data into separate tables. Do you have old scores in there that are rarely accessed? Maybe move the old rows into another table for holding.

Quote:
Original post by konForce
How unique are the levels? Would adding an index on them help? If they aren't very unique, then adding an index on the score and adding it to the WHERE would be the best.

I would keep the tables separated. I doubt the JOIN is hurting things, because it shouldn't be applied until after the 5 records are found.


As best I can see, these are the things I should have done differently:

1. Assign a unique value instead of a string for the level-name. Could have gotten away with just hashing the name.
2. Avoid the join; it's a waste of energy.
3. Look at cleaning up the data more often.

Right now we're generating ~1000 highscores / minute (12K unique users), since we just went live at BigFish and TryMedia... doing a data-dump of that is probably going to be pretty unreadable :)

For the next iteration I'll streamline the process a bit.

The highscore needs to remain realtime; our users like to see how they stack up.

Thanks for all the advice,

Allan Simonsen


Why not have a small table *just* for the current high scores? Rather than inserting new scores into the giant high-score list and then querying against it, write a script that takes in the new score and, if it is high enough, inserts it into the small high-score table, bumping out the lowest current high score. You can even make an optimization here by comparing first against the current 5th-place score: any score lower than that doesn't need to be compared against the other values.
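A rough sketch of that idea (the current_top5 table and the $-placeholders are made up; your PHP script would fill them in and only run steps 2 and 3 when the new score actually qualifies):

-- 1) Find the current 5th-place score (and how many rows exist) for this game/level.
SELECT COUNT(*) AS cnt, MIN(`Score`) AS fifth
FROM `current_top5`
WHERE `GameID` = $game_id AND `Level` = '$level';

-- 2) Only if cnt < 5 OR the new score beats fifth: insert the new row ...
INSERT INTO `current_top5` (`GameID`, `Level`, `UserID`, `Score`)
VALUES ($game_id, '$level', $user_id, $new_score);

-- 3) ... and, if there were already five rows, drop the lowest so only five remain.
DELETE FROM `current_top5`
WHERE `GameID` = $game_id AND `Level` = '$level'
ORDER BY `Score` ASC
LIMIT 1;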

If you're using that query with large tables, you should have indexes on at least:

- Anything you're joining on (e.g. foreign keys that don't currently have indexes; all primary keys are indexed automatically)
- Anything you're ordering by (e.g. score)

I cannot begin to explain how much difference having the right indexes makes on larger data sets. If, for example, you have 10k+ rows, the difference can be dramatic: you could see queries go from taking maybe hundreds of milliseconds to just a few ms.

Mark

Guest Anonymous Poster
You should read konForce's posts again. He's got many good points.

Quote:
1. Assign a unique value instead of a string for the level-name. Could have gotten away with just hashing the name.

Very true. I'd simply use an auto-incrementing integer as the level identifier. When the user enters a level you must fetch some data anyway, so getting one more integer costs almost nothing here.
On top of that, you can make your highscore table fixed-size by removing the text/varchar column, which is good, and despite indices, looking up an integer is probably still an order of magnitude faster.
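A rough sketch of that change (the names are made up; after backfilling LevelID from the old Level text, the text column could be dropped):

-- Rough sketch: give each level an integer ID and store that in GameScore.
CREATE TABLE `GameLevel` (
    `LevelID` SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `Name`    VARCHAR(255) NOT NULL,
    PRIMARY KEY (`LevelID`)
);

ALTER TABLE `GameScore`
    ADD `LevelID` SMALLINT UNSIGNED NOT NULL,
    ADD INDEX (`GameID`, `LevelID`, `Score`);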

Quote:
2. Avoid the join; it's a waste of energy.

What's it good for anyway, if only the top ten is wanted and you already know the user's ID? :)
Never fetch data that you already know (by the way, do you absolutely need "Location"?).

Quote:
Original post by __ODIN__
Since we only really care about the top 10 scores, I ended up setting up a cron-job to execute across each unique level
This can also be done using a stored procedure (or, with some hackery, using a trigger... a stored procedure is better).
Your dataset will remain constant (10 elements), and the query string has already run through the parser and the query optimizer has already done its work, too (otherwise that re-runs every time).
Lastly, you save one client connection and the associated overhead.
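A rough sketch of such a procedure (MySQL 5.0+; the name is made up), keeping only the top 10 per game/level:

-- Rough sketch: trim one game/level down to its top 10 scores.
DELIMITER //
CREATE PROCEDURE trim_level_scores(IN p_game_id BIGINT, IN p_level VARCHAR(255))
BEGIN
    DELETE FROM `GameScore`
    WHERE `GameID` = p_game_id
      AND `Level`  = p_level
      AND `Score`  < (SELECT MIN(`Score`)
                      FROM (SELECT `Score`
                            FROM `GameScore`
                            WHERE `GameID` = p_game_id AND `Level` = p_level
                            ORDER BY `Score` DESC
                            LIMIT 10) AS top10);
END //
DELIMITER ;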

You are using a BIGINT as your user ID; that's a waste of memory. A MEDIUMINT UNSIGNED offers enough room for about 16 million entries.

Looking at your tables, I guess you have a one-to-one relationship between GameUser and GameScore, so you could insert an entry with Score = 0 into GameScore as soon as a player is registered.

Apply a PRIMARY KEY to UserID in both tables (somehow primary keys seem to be faster than plain indices on large tables).
I had a table with x, y, z coordinates as indices and wanted to do an interval search on them.

The table had 262,144 = 512*512 entries; the (x, y) coordinates were the map coordinates, and z was the continent.

A simple search:

-- select an 8x8 region from the map
SELECT * FROM province
WHERE z = zsearch
  AND x >= minx AND x <= maxx
  AND y >= miny AND y <= maxy
LIMIT 64;

this query took 1.12 seconds

Now, since each of my maps is 512*512 (so always 262,144 entries), I removed the z column, got rid of the auto_increment primary key (provinceid), and mapped the x, y, z values onto the primary key (provinceid) at insertion time.

Since all my primary keys are static and predefined, I modified the SELECT to this:

-- inside a stored procedure
SELECT * FROM province
WHERE provinceid >= get_provinceid(minx, miny, z)
  AND provinceid <= get_provinceid(maxx, maxy, z)
  AND x >= minx AND x <= maxx
  AND y >= miny AND y <= maxy
LIMIT 64;

this query took only 0.00-0.01 seconds.
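For reference, here's a guess at what a get_provinceid mapping function could look like; the original isn't shown in the thread, so this is only a sketch of the encoding (row-major packing for 512x512 maps):

-- Rough sketch: pack (x, y, z) into a single integer key.
CREATE FUNCTION get_provinceid(px INT, py INT, pz INT)
RETURNS INT UNSIGNED DETERMINISTIC
RETURN (pz * 512 * 512) + (py * 512) + px;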

I still have to test why the indices performed so poorly.

Guest Anonymous Poster
I just wanted to say, it is nice to see good advice being shared here on the boards, totally focused on working together to help people as opposed to bashing each other's ideas. The community needs more positive posts like this. Anyway, keep up the good work, and I'll stop hijacking this thread.

Doing some research, I think I found the problem: MySQL seems to use only one index per table in a query, the one that results in the fewest rows. I could use a multi-column index on (x, y, z), but I have to try this tonight, so don't take my word on that.
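If it helps, a rough sketch of that index (putting the equality column z first usually lets the range condition on x use the index too):

-- Rough sketch: one composite index the region query can use.
ALTER TABLE `province` ADD INDEX `idx_zxy` (`z`, `x`, `y`);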
