Why is scaling an MMO so difficult?

Started by
22 comments, last by wodinoneeye 9 years, 1 month ago

Why am I being downvoted for the above comment? Just a bit of fun.

Hi! I was one of the people who downvoted you. Upon reflection, I may have misunderstood the intention of your post - it can either be understood as a good-natured and friendly ribbing, or a rude jab at someone for your own personal entertainment.

The latter is how I interpreted it - possibly incorrectly. Anyway, I sent you an absurdly long PM listing my reasons in excruciatingly ridiculous detail (I'm not exactly known for my brevity); feel free to respond to the PM if you like. I sent it as a PM so as to not derail an otherwise interesting discussion on MMO scaling.

haha Carrot's rep was at 666 at the time of posting, just a bit of fun :)


The true advantage of being cloud-hosted is the ability to be flexible in how much computing resource you employ in order to meet demand.


And you can only take advantage of that elasticity if you build your entire architecture to take that into account. Which goes back to game design!

Really, cloud is great for avoiding the hassle of buying servers, leasing co-lo space, and negotiating ISPs. It's also great when you really do have elastic demand, and have a software architecture that takes that into account. However, if you have predictable day/night cycles, and don't mind waiting a week for new servers to be built by your local screwdriver shop, then rolling your own hosting is going to be long-term cheaper (assuming you have the people who know how to do it.)
enum Bool { True, False, FileNotFound };

MMOs are kind of the perfect storm of hard software problems. Generally, the ones you have to deal with in an MMO fall into the following areas.

- Concurrency. Understanding how basic threading and mutexes work gets you maybe 5% of the way to where you need to be in order to do concurrency well. Go read up on fork/join and the LMAX Disruptor for starters. Read up on lock-free algorithms. That's the kind of material you need to know to do this well.
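To make the Disruptor reference concrete, here is a minimal sketch of the core idea: a single-producer/single-consumer ring buffer where the two sides coordinate only through atomic counters, with no mutex on the hot path. All names here are illustrative, not from any particular library, and a real disruptor adds batching, multiple consumers, and cache-line padding.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spscQueue: one writer and one reader share a power-of-two ring and
// coordinate through two monotonically increasing atomic counters.
type spscQueue struct {
	buf  []int64
	mask uint64
	head atomic.Uint64 // next slot to read (consumer owns)
	tail atomic.Uint64 // next slot to write (producer owns)
}

func newSPSC(sizePow2 int) *spscQueue {
	return &spscQueue{buf: make([]int64, sizePow2), mask: uint64(sizePow2 - 1)}
}

// Push is called only by the producer; returns false when full.
func (q *spscQueue) Push(v int64) bool {
	t := q.tail.Load()
	if t-q.head.Load() == uint64(len(q.buf)) {
		return false // ring full
	}
	q.buf[t&q.mask] = v
	q.tail.Store(t + 1) // publish the slot after writing it
	return true
}

// Pop is called only by the consumer; returns false when empty.
func (q *spscQueue) Pop() (int64, bool) {
	h := q.head.Load()
	if h == q.tail.Load() {
		return 0, false // ring empty
	}
	v := q.buf[h&q.mask]
	q.head.Store(h + 1)
	return v, true
}

func main() {
	q := newSPSC(1024)
	var sum int64
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // consumer goroutine
		defer wg.Done()
		for got := 0; got < 1000; {
			if v, ok := q.Pop(); ok {
				sum += v
				got++
			}
		}
	}()
	for i := int64(1); i <= 1000; { // producer: push 1..1000
		if q.Push(i) {
			i++
		}
	}
	wg.Wait()
	fmt.Println(sum) // 500500
}
```

The point of the exercise is that the producer and consumer never contend on a lock; each side only writes its own counter, which is what makes this pattern fast under load.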

- Persistence. Games are typically 50/50 read/write when it comes to the database. Most software is read-heavy, and most databases are optimized for read-heavy apps. You need to know how to implement write-behind caches and various other caching mechanisms to handle this problem, and to do it with consistently low latency.
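A minimal sketch of the write-behind idea mentioned above: writes land in memory immediately and are marked dirty, and a separate flush step batches them to the backing store later, keeping database latency off the hot path. The names and the map standing in for the database are assumptions for illustration.

```go
package main

import "fmt"

// writeBehindCache: Put is a memory write plus a dirty mark; Flush
// pushes dirty entries to the store in one batch, coalescing repeated
// writes to the same key.
type writeBehindCache struct {
	data  map[string]int
	dirty map[string]bool
	store map[string]int // stands in for the database
}

func newCache() *writeBehindCache {
	return &writeBehindCache{
		data:  map[string]int{},
		dirty: map[string]bool{},
		store: map[string]int{},
	}
}

// Put is fast: no database round trip on the game-loop path.
func (c *writeBehindCache) Put(k string, v int) {
	c.data[k] = v
	c.dirty[k] = true
}

// Get reads through to the store on a cache miss.
func (c *writeBehindCache) Get(k string) (int, bool) {
	if v, ok := c.data[k]; ok {
		return v, true
	}
	v, ok := c.store[k]
	if ok {
		c.data[k] = v
	}
	return v, ok
}

// Flush writes all dirty entries in one batch; a real server would run
// this on a timer or when the dirty set grows large.
func (c *writeBehindCache) Flush() int {
	n := 0
	for k := range c.dirty {
		c.store[k] = c.data[k]
		n++
	}
	c.dirty = map[string]bool{}
	return n
}

func main() {
	c := newCache()
	c.Put("player:42:gold", 100)
	c.Put("player:42:gold", 150) // coalesced: only one store write
	fmt.Println(c.Flush())       // 1
	fmt.Println(c.store["player:42:gold"])
}
```

The coalescing is where the win comes from: a player's gold might change fifty times a minute, but the database only sees the latest value once per flush interval. The trade-off is that a crash loses whatever was dirty.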

- Networking. This is the thing most people associate with game servers, and it's also by far the simplest problem to solve, as it's mostly already solved. Not really worth going into.

- Scalability. Games are stateful, and global state synchronization doesn't scale. Problem! It's the same reason you don't use transactions for everything in your database and only use them when needed: the cost of synchronizing state at scale is huge. There are known solutions - it's a solved problem, but information on it is harder to find. It also ties directly into concurrency.
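The standard way around global synchronization is to partition the state, as a rough sketch: cut the world into zones, give each zone one owning node, and only pay a coordination cost when an entity crosses a boundary. Zone size, node count, and the hash are made-up numbers for illustration; a real cluster would use a directory service rather than a hash.

```go
package main

import "fmt"

const zoneSize = 256 // world units per zone cell (illustrative)
const numNodes = 4   // simulation nodes in the cluster (illustrative)

// zoneOf maps a world position to its zone cell.
func zoneOf(x, y float64) (int, int) {
	return int(x) / zoneSize, int(y) / zoneSize
}

// nodeFor maps a zone to its owning node.
func nodeFor(zx, zy int) int {
	return ((zx*31+zy)%numNodes + numNodes) % numNodes
}

func main() {
	// A small move inside a zone: same owner, no cross-node traffic.
	zx1, zy1 := zoneOf(100, 100)
	zx2, zy2 := zoneOf(120, 110)
	fmt.Println(zx1 == zx2 && zy1 == zy2) // true: update stays local

	// Crossing a zone boundary: ownership may change hands, and that
	// handoff is the expensive, carefully synchronized path.
	zx3, zy3 := zoneOf(300, 100)
	fmt.Println(nodeFor(zx1, zy1), nodeFor(zx3, zy3))
}
```

The design point is that the common case (moving around inside a zone) touches exactly one node's memory, so the heavy synchronization machinery only runs for the rare zone-crossing case, mirroring the "transactions only when needed" analogy above.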

As far as cloud services go, it just depends on the type of game and the type of cloud. Realtime multiplayer games eat up a ton of bandwidth, more than you would think. Most cloud services are priced for web stuff, and that pricing just doesn't work for realtime multiplayer. You need to get down around $0.02 per GB before it starts to make sense.
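A back-of-envelope calculation shows why the per-GB rate dominates. The player count and per-player rate below are made-up but plausible numbers, and the $0.09/GB figure is only a stand-in for typical cloud egress pricing of the era:

```go
package main

import "fmt"

// Rough monthly egress cost for a realtime game server fleet.
func main() {
	const players = 5000.0          // concurrent players (assumed)
	const kbPerSecPerPlayer = 20.0  // downstream state updates (assumed)
	const secondsPerMonth = 30 * 24 * 3600.0

	gbPerMonth := players * kbPerSecPerPlayer * secondsPerMonth / 1e6 // KB -> GB
	fmt.Printf("%.0f GB/month\n", gbPerMonth)
	fmt.Printf("at $0.09/GB: $%.0f/month\n", gbPerMonth*0.09) // web-style cloud rate
	fmt.Printf("at $0.02/GB: $%.0f/month\n", gbPerMonth*0.02) // negotiated/colo rate
}
```

With these assumptions that's roughly 259 TB a month, and the gap between the two rates is tens of thousands of dollars per month for one modest fleet, which is the whole argument in one number.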

Most cloud providers also overprovision. If you run your own VMs, virtualization can work great, but you won't get consistent performance out of most cloud virtualization.

Overall, the cloud is overrated, especially if you have your own devops team. When you do the math on buying the hardware yourself and either colocating or even just outsourcing the management of your hardware, the cloud starts to look really expensive. Contrary to what Amazon and others want you to think, they have huge margins on this stuff. They count on cloud hosting being taken for granted and on people not actually doing the math.

The sweet spot I found, which we used on almost a dozen games, was to use a company like SoftLayer that could provision real hardware to our specs and manage the network/hardware-level admin, while our small devops team handled provisioning and the like. It was cost-effective and we kept good performance.

FYI, the current trend is for more companies to use hybrids and more real hardware. People are tired of cloud providers overprovisioning, and are starting to actually do the math and see how consistently they are getting ripped off. There is a reason cloud providers don't tell you how many VMs they run per core.

These problems are easy enough to solve if you have enough money.

Scalable hardware and enterprise level equipment and staff to run it 24/7 costs serious money. We aren't talking bargain basement hosting here.

This is why it's hard to scale: programmatically it's simple enough for someone with experience of networking code and general IT; it's just that indies and newbies can't afford to scale that high, and get stuck after about 1,000 users because all of those users are crammed onto a couple of cheap HostGator servers...

It's not quite that simple; there really are three big problems:

1) The resources available on a single server node are limited regardless of how much money you have. Data access over InfiniBand and other enterprisey interconnects is (and always will be) several orders of magnitude slower than a cache miss, so scaling a realtime simulation horizontally across nodes is not easy. Splitting the simulation into several partially or mostly independent simulations and then distributing those (which most of today's MMOs do) is a lot easier, but restricts the game design quite heavily. Splitting it into fully independent simulations (bringing the scaling problem down to the level of most enterprise services) takes you straight out of MMO territory and back into normal multiplayer.

2) Bandwidth scaling for a multiplayer game is always worse than linear; the worst case for a naive system is n·(n−1) messages per tick. The only way to scale linearly is to remove all direct interaction between players (but then it's not really a multiplayer game anymore).
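The worst case above is easy to tabulate: with n players each broadcasting to the other n−1, doubling the player count roughly quadruples the traffic.

```go
package main

import "fmt"

// Worst-case message count per tick for a naive broadcast system:
// n senders, each reaching n-1 recipients.
func messagesPerTick(n int) int { return n * (n - 1) }

func main() {
	for _, n := range []int{10, 100, 1000} {
		fmt.Println(n, "players ->", messagesPerTick(n), "messages/tick")
	}
	// 10 players  ->     90 messages/tick
	// 100 players ->   9900 messages/tick
	// 1000 players -> 999000 messages/tick
}
```
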

3) In order to avoid or minimize problems 1 and 2, the players have to be spread out, and that requires a large game world with a metric crapton of content. This is primarily a resource/money problem, but it is also one that can be solved in incremental steps as your player base (and thus, hopefully, income) grows.

Scaling a game server is simply a far harder problem (one that thus far has not been fully solved by any existing AAA MMO) than scaling a stateless enterprise service, where each request can be sent off to a suitable node and handled in isolation.

I don't suffer from insanity, I'm enjoying every minute of it.
The voices in my head may not be real, but they have some good ideas!

A very basic 'why': it is simply more complicated to program a solution when the same processes have to be shared across multiple machines.

Multithreading was one step harder; making the threads interact between physically separate server machines (via a local network) is another step harder, and requires a much more rigorous (imposing) data-locking model if you also need speed.

Then there is the old N^2 problem of when many players are all within interaction distance of each other (a blizzard of state updates for the simulation to handle/process and spew out to the clients). Many games 'handle it' by just not allowing it to happen -- e.g. via separate instances which players get tossed into in the especially congested areas (town/bank areas, usually). It still requires special-case programming, which I've recently seen big-label games botch.
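The other standard mitigation for the N^2 blizzard, besides instancing, is interest management: only send a player's update to clients within interaction range, instead of broadcasting to everyone. A minimal sketch, with made-up names and a linear scan (a real server would use a spatial index such as a grid or tree):

```go
package main

import (
	"fmt"
	"math"
)

type player struct {
	id   int
	x, y float64
}

// recipients returns the ids of every other player within radius of
// 'from' -- the only clients that need to hear about from's update.
func recipients(all []player, from player, radius float64) []int {
	var ids []int
	for _, p := range all {
		if p.id == from.id {
			continue
		}
		if math.Hypot(p.x-from.x, p.y-from.y) <= radius {
			ids = append(ids, p.id)
		}
	}
	return ids
}

func main() {
	world := []player{{1, 0, 0}, {2, 5, 0}, {3, 300, 300}}
	// Only player 2 is close enough to hear about player 1's update;
	// player 3 is far away and gets nothing.
	fmt.Println(recipients(world, world[0], 50)) // [2]
}
```

This keeps per-update fan-out proportional to local crowd density rather than total population, which is exactly why games then clamp crowd density with instancing when it spikes anyway.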

--------------------------------------------
Ratings are Opinion, not Fact

