The blessing of running an MMO

So I came home today after 8 hours of work (I had to stand most of the time, so I was dead tired), opened my ICQ client, and the first thing I saw was a message from one of the admins (the only one besides me and my wife) telling me things were FUBAR because today was a random event day (the day of rare manufacturing) and the chances of producing rare items were very high.
At first I thought he was exaggerating, but then people were saying stuff like: "I created 80 enriched fire essences in 5 minutes". Now, that was not supposed to happen. Normally, the chance of creating an enriched fire essence is 1 in 1,000. The way I designed it, with some special perk and the rare manufacturing day, the chance should have been maybe 1 in 500, but not 80%!
One admin command was used to change the random day to some other day, only 16 minutes after it started, but it was already too late.
I took a quick look at the item stats log (which shows how many items entered and exited the game) and, to my surprise, about 6K enriched fire essences had been created. Had we left things as they were, that would have totally and irrevocably fucked up the whole ingame economy.
So we had to do a rollback to the last backups (about 18 hours lost, 2 days lost for some other players who were online continuously for more than 18 hours).
Most of the players understood the need to do that, but a few were bitching (which was not really so unexpected). Half an hour before the rollback, I told them that since we would have a rollback anyway, they might as well get their best items and go have some fun in the main PK map. Which they did, and it was interesting to watch them all wearing their best gear that is usually only for show :)
After the rollback was completed, I manually set the current day to the "day of more experience", with a 40% experience increase, in order to somehow compensate for the damage.

Now everything is OK, but still, I couldn't do anything useful today besides solving this problem (fixing the server and doing the rollback).
Recommended Comments

Yikes! Did you find out what was causing the problem? A more cynical question would be how did you test this before it went live?

I think the economy getting screwed up is a common theme across MMORPGs so why not write an economy monitor? You could monitor things such as amount of each resource in and out of the system, trade flow etc. Run this for a few weeks to collect baseline data and then you could forward predict where things are heading and also detect abnormal conditions (such as lots of fire essence being created)

Rig it up to an e-mail/sms alert system and allow it to disable certain resource actions (e.g. creating rare things, harvesting etc) and then you can catch things before they go wrong.

Yes, of course we found out what was wrong and fixed it.
And we did test it beforehand, but unfortunately most of the players do not enjoy going on the test server and testing stuff there, so we can't do really extensive testing.

As for the 'monitoring' system, the moderators and players can do that very well. But what good is it if I find out that things are fucked up when I am not even home and have no Internet access?

Do you not have a cell phone, Radu? I know that you can rig up a Linux box to send SMS messages on the cheap. It might be a good idea for the future, but I don't know how closely you want to be tied to this bastard thing, and I suppose it won't come in handy at all if you are away from your PC for too long.

I have a cellphone, but there is little you can do with a cellphone that is not actually a PDA.

ack, sorry to hear about the bug that made it to the live server. Such is the life of a programmer.

BTW, i saw that you are working on a new game. Did you give up on making barren moon?



I didn't give up per se, but I just postponed it, because it's a huge amount of work, and only very trusted people can be given the source code for the server, so currently I just don't have time to do it.

*ouch* I wasn't trying to be insulting in any way.

I think you missed the point about my post though, it wasn't just an alert system to tell you to get home (when blatantly you can't jump up at work and do this) but it was an automatic healing system to stop damage escalating.

Such an alert system is impossible to design.
For example, how would you design it? To monitor what exactly?

If these are being created/destroyed using a global manager, it's easy to add some hooks to monitor what is happening.

Here are a few methods you could use to make a monitoring server.

1. Have a separate process that runs queries on your RTD to determine things such as amount of items in the game, amount of each resource etc.
2. Track objects at creation and deletion, much like you might implement a memory manager with logging.

Create a rule for each piece of data based on a generic template. e.g.

1. Amount of item x is within the range 80% to 120% of daily average. As you collect more data you can refine this to better detect abnormal conditions.
2. Amount of item x created within a period of 1 minute is within the range 10-20 (for instance).

When you roll out a change, create temporary rules to make sure that the change doesn't massively affect the server, e.g. if you know that the chance of creating rares doubles, then allow double the range. Once the changes are gold, make the rules permanent too.

If a rule is broken trigger an action.

1. E-mail/SMS admins.
2. Suspend creation of that resource or make its production time-limited.
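
To make the rule-and-action idea above a bit more concrete, here is a rough Python sketch of the "items created per minute" rule. Everything in it is an assumption made for illustration: the `RateRule` class, its parameters, and the `alert` callback (which stands in for the e-mail/SMS step) are hypothetical names, not anything from an actual game server.

```python
from collections import deque


class RateRule:
    """Hypothetical rule: at most `max_per_window` creations of one item
    within a sliding window of `window_seconds`."""

    def __init__(self, item, max_per_window, window_seconds, alert):
        self.item = item
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.alert = alert          # assumed callback, e.g. an e-mail/SMS sender
        self.timestamps = deque()   # creation times still inside the window
        self.suspended = False

    def try_create(self, now):
        """Call from the central creation hook; returns True if creation is allowed."""
        if self.suspended:
            return False
        self.timestamps.append(now)
        # Forget creations that have fallen out of the sliding window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) > self.max_per_window:
            # Rule broken: suspend this resource and notify the admins once.
            self.suspended = True
            self.alert(f"{self.item}: more than {self.max_per_window} "
                       f"created in {self.window_seconds:.0f}s, creation suspended")
            return False
        return True
```

An admin would still have to lift the suspension by hand (here, by resetting `suspended`), which keeps the automatic part limited to stopping the damage from escalating.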

We don't have objects in the game, it's a procedural game.
As for a "determine if an item goes more than 120% of the average quantity"... Really, you never worked on an MMORPG.
That is simply not feasible. There is no average quantity of daily items in an MMO; it is just like in real life. The quantity fluctuates drastically every day depending on a lot of variables.

I work with systems with comparable data loads to low-end MMORPGs. We're talking 4GB+ memory-resident databases here and hundreds of transactions a second (if not more). There is no usual day for this either; however, we do spot unusual loads, as they could mean something is wrong that costs a customer thousands of pounds and, if it's our fault, ultimately costs us thousands of pounds.

I can tell you're not keen on the idea, I certainly wasn't pushing it on your mmorpg, I was merely suggesting it as an idea for discussion.

The fact that it is procedural makes no difference to tracking when an item (e.g. armour) is produced in game (and we're talking real objects not oop objects). At some point you change some data to say this exists and presumably that's done centrally for all possible items in your world, this is where you track it.

Saying that there is no normal amount for an event to happen does not mean that you can't make predictions or spot exceptions. You said yourself that the normal chance is 1 in 1000, and if, for instance, it's 1 in 500 when this event occurs, then you know that if there are 500 players online there should be around 1 created if *everyone* was making them (500 attempts at 1 in 500). If you find that there are 20 being created per tick and this is constant, then you can pretty much guess something is wrong. I'm talking about using statistics here to detect probable errors.
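
As a rough sketch of that statistical check: if the server counts manufacturing attempts (an assumption; the function names and the 4-sigma cut-off below are made up for illustration too), the number of rares produced should follow a binomial distribution, so you can compare what was observed against the expected count plus a few standard deviations.

```python
import math


def expected_rares(attempts, chance):
    """Mean number of rare items for `attempts` tries at probability `chance`."""
    return attempts * chance


def looks_abnormal(observed, attempts, chance, sigmas=4.0):
    """True if `observed` is more than `sigmas` standard deviations above
    the binomial mean. The default of 4 sigmas is an arbitrary choice."""
    mean = expected_rares(attempts, chance)
    std = math.sqrt(attempts * chance * (1.0 - chance))
    return observed > mean + sigmas * std
```

With 500 attempts at 1 in 500, the expected count is about 1 and the cut-off lands just under 5, so 2 rares in a tick passes while 20 in a tick is flagged. The point is that the check depends only on the number of attempts and the designed chance, not on any "normal" daily quantity.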

And remember I was discussing this not just in relation to your mmorpg.

For some systems it can be predicted, such as a database load. However, an MMORPG is different because every day is different. There is no way to determine the 'normal' quantity of something, not even within a general limit. It fluctuates a lot.
Besides, a system like that wouldn't help; if something is terribly wrong, the players will let us know anyway.

