
Musing out loud.


Working on the standalone daily puzzles (which need a catchier name than that). I have a design issue, though.

Let's take the Shi Sen puzzle. In it, you have one million possible boards (it's just a random seed, so don't get impressed). After you play a game, it logs the following into the high score table:

Your name, board number, time, and number of tiles remaining.

If you play a particular board (say, board 1000) and finish it with fewer tiles remaining than the previous time you played board 1000, then it'll replace that entry in the table with your new score. If not, it'll tell you that you already got a better score on a previous play.

These entries are stored in an XML file.

The problem is gonna be size. If there are one million (or ONE MEEEELION) boards, and you play two or three games a day during coffee breaks, that table and XML file are gonna get big and slow. And, having seen my target audience, these are games that people leave on their machines for YEARS, so the potential is that the table will get so huge that the game will either crash trying to load it or will get so slow with the saving and loading that you won't wanna play it.

The solutions seem to be:

- Make the number of possible boards smaller, like 1,000 or 10,000. While it'd be a marketing hit not to have as many possible games, the practical reality is that you'll never pop up a game and immediately say "Hey, I remember this layout".

- Keep the big number of boards, but throttle the size of the high score table to 200 or so entries, throwing out the worst score in the table when you make a better one. While this does fix the practical part, it diminishes the potential of saying "hey, let's see if I can get a better score on number 1000 than I did yesterday," as yesterday's score might not even be there for you to see.

- Only keep scores for boards where you eliminated all of the tiles. While a very good player who plays constantly could still conceivably "clog the board", it'll fill up much more slowly.



As an example, I'm looking at my favorite wargame, Slay by Sean O'Connor. Like my games, it has 10,000 "islands" to defeat. The high score table only stores the 20 best entries (fewest turns before the enemies surrender) and does replace entries that are "on the board" if you beat 'em in fewer turns.

The difference, though, is that Slay doesn't have a true "perfect score". While there's a practical limit as to how few turns you can make before losing, there's no real concept of a perfect game.

Shi Sen, OTOH, has a perfect game. If you eliminate all 144 tiles on a board, you have played a perfect game. You can never do better than that.

I asked Shelly, and her response was to do whatever's least obtrusive, because she uses the game as an executive time-killer and has no desire at all to ever try to beat the same board with fewer tiles and/or a better time. And, honestly, that's probably what I'll be dealing with for most of my audience. The competitive types aren't gonna be playing my games. My audience is certainly competing in the Daily Puzzle arena, but it's about the most casual kind of competition you can imagine. We're talking anonymous 24-hour turns here :)



My current solution's gonna be a melange of all the possible solutions:

1. Reduce the possible games to 10,000.
2. Throttle the high score list to only 100 or 200 entries, depending on performance.

I was gonna store scores only for perfect games, but doing so really won't affect performance. Non-perfect scores will get shoved off the table by perfect ones eventually, so there's no real hit for keeping non-perfect scores other than that the table will hit its limit sooner.
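The rule described above (one entry per board, replace only on improvement, table capped with worst-score eviction) could look something like this rough Python sketch. The field names, the 200-entry cap, and the time tiebreaker are my assumptions for illustration, not anything from the actual game:

```python
MAX_ENTRIES = 200

def better(a, b):
    # Fewer tiles remaining wins; ties broken by faster time.
    return (a["tiles_left"], a["seconds"]) < (b["tiles_left"], b["seconds"])

def record_score(table, entry):
    """Try to store `entry` in `table` (a list of score dicts).
    Returns True if the entry was stored or replaced an old one."""
    for i, old in enumerate(table):
        if old["board"] == entry["board"]:
            if better(entry, old):
                table[i] = entry      # beat your own previous score
                return True
            return False              # "you already did better last time"
    if len(table) < MAX_ENTRIES:
        table.append(entry)
        return True
    worst = max(table, key=lambda e: (e["tiles_left"], e["seconds"]))
    if better(entry, worst):
        table[table.index(worst)] = entry  # evict the worst score
        return True
    return False
```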


9 Comments


Recommended Comments

I'd vote for a limited top-score list, like the top 200 entries.

If at all possible, can you determine board difficulty mathematically? If so, have 3 difficulty tiers (each with a top 20-50) where you keep track of the top scores for each category. (Would also allow the user to choose a board difficulty: "Give me an easy board, normal board, hard board, any board".)

IMHO you've chosen the completely wrong direction to solve your problem. Effectively, you've chosen to cripple the game to suit your choice of tools (XML) instead of firing the offending tool that's holding you back and getting one with less suck involved.

Consider writing a simple binary format to store your score tables.

With one million boards, you need roughly 20 bits of storage (max) to record the board's ID number. You could save a bit more in the long term by using a variable-length encoding instead. I'm too lazy to run the numbers at the moment, but you should be able to store an average board ID number in around 16 bits easily. In any case you can get good compression at this step.

The next thing to do is store the number of cleared tiles. Since there are 144 tiles in play, we only need one byte to store the "score." Time measured in seconds can easily be recorded in 2 bytes (giving us up to 65,535 seconds of play, or roughly 18 hours).

For ease of use, let's just stick with 24-bit (3-byte) fixed-width storage. (Especially since a variable-length encoding would cost us the constant size of each entry, which as we'll see shortly is very nice to have.) One byte for score and two bytes for time bring us to 6 bytes per entry. Name is tricky, but we can cheat with a hash lookup: use CRC16 to map each name to a 16-bit code into a separate string table that holds the actual names. That means we need only 8 bytes per entry, plus the string table.
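As a concrete sketch of that 8-byte layout: the commenter doesn't specify which CRC16 variant, so this uses Python's `binascii.crc_hqx` (CRC-16-CCITT) as a stand-in, and assumes an unsigned little-endian board ID:

```python
import binascii

ENTRY_SIZE = 8  # 3-byte board ID + 1-byte score + 2-byte time + 2-byte name hash

def pack_entry(board_id, tiles_cleared, seconds, name):
    """Pack one score entry into exactly 8 bytes, little-endian throughout."""
    name_hash = binascii.crc_hqx(name.encode("utf-8"), 0)  # CRC-16-CCITT
    return (board_id.to_bytes(3, "little")
            + tiles_cleared.to_bytes(1, "little")
            + seconds.to_bytes(2, "little")
            + name_hash.to_bytes(2, "little"))

def unpack_entry(raw):
    """Inverse of pack_entry; returns (board_id, tiles_cleared, seconds, name_hash).
    The hash is looked up in the separate string table to recover the name."""
    return (int.from_bytes(raw[0:3], "little"),
            raw[3],
            int.from_bytes(raw[4:6], "little"),
            int.from_bytes(raw[6:8], "little"))
```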

After playing all 1 million boards, my score file is going to be about 7.5MB plus spare change for the string table. We'll assume the user is very verbose with their names (and/or lots of people played on the same machine) and say the file takes 8MB.


Now, we have the facts laid out before us. Let's look at the implications:

  • 8MB is freaking peanuts.

  • XML will blow past the 8MB mark after far fewer than 1 million boards.

  • It is extremely unlikely that anyone will play all 1 million boards. Let's take the totally arbitrary guess that the average player, over a span of say 2 years, plays about 3000 boards. Their file size is 23KB plus the string table. 23KB for 3000 boards is tiny. I doubt you could get 300 boards in 23KB using XML.

  • Binary formats are less likely to get hacked by goofy kids trying to play with the high scores. (Yes, I was one of those kids. Yes, binary formats are still trivial to crack for someone who has the skills. Most people don't have the skills to reverse engineer a binary file format, but just about any bored 14-year-old kid can use Notepad and jack up an XML file.)

  • You don't actually have to hold the score table in memory. You can use a binary search to locate the relevant board ID in the file, which is easy, since each entry is a constant size (8 bytes). Even after 1 million games you only need a worst-case of 20 hops using a binary search to find a relevant entry. 20 reads out of a file is fricken peanuts.

  • Since each entry is constant-size, you can replace existing high scores (for boards that have already been played) without copying or rewriting any of the data file (i.e. in-place).

  • The worst possible case is having to insert new board scores at the beginning of the file. Say board 1 is the last of the million boards played, and we have to insert it at the top of the file (in keeping with our sorting requirement for doing binary search lookups). Worst case we are moving 8MB of data. Even on ten year old computers, shuffling 8MB of file takes less than a few seconds. Motherfricken peanuts, man.

  • If you use a variable-length encoding and some other hashing tricks you can probably shave each score entry down by a byte or two, maybe even three, giving you even better final results.

  • You don't need an XML parsing library's overhead to do any of this. Writing the file I/O and encoding stuff is trivial: a few hundred lines, tops. Even binding to MSXML, doing a node-tree parse and traversal, and all the assorted XML jazz is going to cost you far more than that. And the performance is going to suck, period, for any large number of scores.



Now you retain the potential of 1 million games, at the sacrifice of a tiny bit of brain effort in coding the file format, and with the benefit of freeing yourself from a none-too-lightweight library dependency.

In XML, you have to load the entire file into memory to do a parse and construct the node tree. In the binary format, you never have to load more than a few bytes at a time if you don't want to. Even in the case of inserting new scores into the file, you can do tiny batch-chunk copies to shift the data around, as small as a couple of KB. If you take advantage of the tremendously smart caching in modern file I/O systems, you can just blob-transfer the entire damn thing and you will never, ever, ever have even remotely strenuous memory or time requirements.


There, now... see what difference it can make when you use the best tool for a job, instead of redefining the job to fit the tools? XML is a hammer. Not every single data storage problem in the world is a nail. In fact, most of them aren't.


Afterthought: I just re-read this to proof it (insert obligatory whining about lack of preview feature) and realized it comes off a little harsh. This isn't intentional. I'm just a little weary of seeing XML thrown around every single time someone wants to put some data on disk, when it is exceedingly rare for XML to even make sense as a solution - let alone for XML to be the optimal solution. At the risk of sounding like a whiny nerd past his prime, things were nice in Ye Olden Days when programmers had to know, as a matter of course, how to critically analyze these kinds of considerations.

Overusing XML makes baby kittens kill Jesus. Or something.

Have you considered just writing a cron job that clears out old entries from the XML file every so often?

ApochPIQ has the right idea technically, I think.

My thought:

Keep the top 100 scores, perfect or not, dupe or not.

Keep the perfect scores in a hall of fame. Only keep the 20 best, and don't allow dupe boards in the HoF (i.e. if a dupe board enters the hall of fame again, eliminate the old value). I'm assuming that in the case of perfect boards, the tiebreaker is time taken.

On the other hand, scores that are low AND 2 years old are likely to be forgotten. Boards that have been played only once, 12 months ago, will probably be forgotten too. Keeping old scores around is good, but keeping them forever isn't very useful IMHO. You can take advantage of the limited memory of your players to remove older low scores (high scores are harder to remove because they'll probably be displayed somewhere).

Actually, that's not a bad idea. You could periodically parse the score data, fit it to a normal curve, and chop off (say) the lower 60% of scores. Maybe weight based on age so that extremely old scores are considered "lower" than newer scores, but old-and-really-good scores shift towards the keeper end of the distribution. Should be a simple implementation with a little algebraic foolery.

Implement that, and bam, instant data savings. Combine that with a good binary format and you can have reasonable scorekeeping for 1 million boards in 3.5MB tops.
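The age-weighting idea above could be sketched like this; I've simplified the normal-curve fit to a straight percentile cut after an exponential age decay, and the half-life and keep ratio are made-up knobs:

```python
import time

def prune_scores(entries, keep_ratio=0.4, half_life_days=180, now=None):
    """entries: list of (score, timestamp) where higher score = better.
    Each score's weight halves every `half_life_days`, so old-but-great
    scores still beat recent mediocre ones; then only the best
    keep_ratio fraction of the age-weighted entries survive."""
    now = time.time() if now is None else now

    def weighted(entry):
        score, ts = entry
        age_days = (now - ts) / 86400.0
        return score * 0.5 ** (age_days / half_life_days)

    ranked = sorted(entries, key=weighted, reverse=True)
    keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:keep]
```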

How about indexing scores based on game number?

i.e.:

For each game number (or maybe decade or century of numbers), keep one file of scores. Finding the appropriate file is fast, and only files for games that have actually been played will exist.
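The bucketing scheme suggested above is mostly a naming convention; here's a minimal sketch with one file per "century" of board numbers (paths and bucket size are illustrative):

```python
BUCKET_SIZE = 100  # one score file per hundred board numbers

def bucket_path(board_id, base_dir="scores"):
    """Map a board number to its bucket file. Buckets for boards that
    have never been played simply never get created on disk."""
    start = (board_id // BUCKET_SIZE) * BUCKET_SIZE
    return f"{base_dir}/boards_{start:06d}-{start + BUCKET_SIZE - 1:06d}.dat"
```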

I think Mith's idea is the best so far. Apoch's idea was good, but seems a bit overcomplicated for a small game like this IMO.

Quote:
How about indexing scores based on game number?

i.e.:

For each game number (or maybe decade or century of numbers), keep one file of scores. Finding the appropriate file is fast, and only files for games that have actually been played will exist.


You may have a speed penalty from managing many small files, depending on what your filesystem is -- something like XFS would solve that problem, though, and I don't think your file writing is at a critical mass of traffic.

Alternatively, you could just use a relational database. [grin] There exist many libraries for interfacing with an SQL-based database and I'm sure you can find one to install somewhere. A decent SQL implementation solves the problems of locking, data retention (you can easily write SQL queries to pick out entries which aren't significantly high-scored and haven't been updated in a long time and delete them) and serialization (can easily back up your database).

Although, you mentioned a "table" so perhaps you are already using a database solution. If so, I'd look at some help for scaling whatever database solution you are using and ditching XML altogether.
