Kylotan

MMOs and modern scaling techniques

I'm in a somewhat similar situation: I'm currently involved in designing scalable game systems. I come from a "big data"/DevOps background and have been facing many of the same issues. I have the same impression you got, that what those of us writing "large applications" do is completely alien to most game developers who only have a game-dev background.

My take-away so far:
  • The game itself being designed for scalability is probably the biggest thing: garbage in, garbage out. Some types of gameplay are just inherently bad for scalability, so you're often left with a trade-off between latency and scalability.

  • In many cases, there are ways to accept more complexity in how events are handled, breaking them apart so they scale better. Where that is impractical or impossible, you need to design out the local machine state. That is to say, if events must be processed sequentially, design them so that the sequential steps can be processed on any server; borrow from REST where you can (see the sketch after this list). The trade-off is that you often increase your processing or network overhead, sometimes as much as 3x, but honestly, it's often worth it. It's counter-intuitive, especially for game developers, to design purposefully wasteful systems.

  • Another problem for game devs in particular: OOP (the design principles, not necessarily the languages) is the enemy. It requires an extreme paradigm shift; the more state you can design away, the more scalability you get.
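
To make the "design out the local machine state" point concrete, here's a minimal sketch (Scala, with made-up names and a stand-in store interface, not our actual code): each event handler loads what it needs from shared storage, applies a pure function, and writes the result back, so the next step can run on any server. The extra round-trips are exactly the "purposefully wasteful" overhead mentioned above.

```scala
import scala.concurrent.{ExecutionContext, Future}

final case class PlayerState(id: String, gold: Long, version: Long)

// Assumed interface to a shared store (Riak, memcached, etc.); not a real client API.
trait SharedStore {
  def load(id: String): Future[PlayerState]
  def save(state: PlayerState): Future[Unit]
}

final case class GoldEarned(playerId: String, amount: Long)

object StatelessEventProcessor {
  // Pure function: no in-process caches or singletons, so the same event
  // can be handled by whichever node happens to receive it.
  def apply(state: PlayerState, event: GoldEarned): PlayerState =
    state.copy(gold = state.gold + event.amount, version = state.version + 1)

  def handle(store: SharedStore, event: GoldEarned)(implicit ec: ExecutionContext): Future[Unit] =
    for {
      before <- store.load(event.playerId)  // extra round-trips are the accepted overhead
      after   = apply(before, event)
      _      <- store.save(after)
    } yield ()
}
```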


 

We're using a system very similar to what you're describing, asynchronous message passing across geographically diverse clusters (implemented in Clojure and Erlang, if anyone's curious, with Riak as our back-end database), for large parts of our game system. And it works really well.

 

I'd say you can use these methods for just about any game system outside of FPS, fighting games, and maybe racing. Anything that is extremely latency-sensitive is going to be a bad fit, but for everything else, I think it's a major win.

 

 

I guess I am generally interested in whether we're missing any tricks from web and app development, by being stuck in our ways and developing servers in much the same way we did 10 years ago. For example, some people suggest holding all data in persistent storage and manipulating it via memcached; along similar lines, Cryptic once insisted that every change to a character needed to hit the DB (resulting in them writing their own database to make this feasible), rather than the usual "change in memory, serialise to DB later" approach. But do these methods pay off?

 

 

This, IMHO, is actually a really powerful method for games, and similar to how we're doing things: every action is recorded to persistent storage (Riak), and other players are then essentially reading each other's writes, with the memcache or application layer managing it.

 

Depending on the DB setup, this can be fast enough for real-time, but as with anything it's a trade-off, à la the CAP theorem. For us, a little bit more latency in exchange for more scalability was worthwhile.
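
As a rough illustration of the "players reading other players' writes" pattern, here's a hedged sketch; the ActionStore and Cache traits below are stand-ins for Riak and memcached, not their real client APIs.

```scala
import scala.concurrent.{ExecutionContext, Future}

final case class Action(playerId: String, kind: String, payload: String, at: Long)

trait ActionStore {                       // stands in for Riak
  def append(a: Action): Future[Unit]
  def recentFor(zoneId: String): Future[Seq[Action]]
}

trait Cache {                             // stands in for memcached / an app-level cache
  def get(key: String): Future[Option[Seq[Action]]]
  def put(key: String, v: Seq[Action], ttlSeconds: Int): Future[Unit]
}

final class ZoneFeed(store: ActionStore, cache: Cache)(implicit ec: ExecutionContext) {
  // Write path: every action is made durable first.
  def record(a: Action): Future[Unit] = store.append(a)

  // Read path: other clients poll the zone feed; slightly stale data is
  // acceptable, which is the CAP trade-off mentioned above.
  def read(zoneId: String): Future[Seq[Action]] =
    cache.get(zoneId).flatMap {
      case Some(actions) => Future.successful(actions)
      case None =>
        store.recentFor(zoneId).flatMap { actions =>
          cache.put(zoneId, actions, ttlSeconds = 1).map(_ => actions)
        }
    }
}
```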

Edited by VFe


Your best bet is to just look at documentation and GDC presentations from companies already doing single-shard MMOs, like CCP (with EVE Online).

 

EVE's architecture - as far as I can tell - is not really any different from most of the others, in that it's geographically partitioned (specifically, one process per Solar System). They have some pretty hefty hardware on the back-end for persistence, which presumably is why they don't need to run multiple shards. I'm guessing that they don't have a great need for low latency either, which helps.

 

I don't know any other single-shard MMOs that are of a significant size; I'd be interested to learn of them (and especially of their architecture).

Edited by Kylotan



Some types of game-play are just inherently bad for scalability.

 

Sure. My hypothesis is that the traditional MMORPG is bad for scalability. Lots of actions depend on being able to query, and conditionally modify, more than one entity simultaneously. If I want to trade gold for an NPC's sword, how do we do that without being able to lock both characters? It's not an intractable problem - there are algorithms for coordinating trades between 2 distributed agents - but they are 10x more complex to implement than if a single process had exclusive access to them both.

 

(The flippant answer is usually to delegate this sort of problem to the database; but while that can make the exchange atomic, it doesn't help you much with ensuring that what is in memory is consistent, unless you opt for basically not storing this information in memory, which brings back the latency problem... etc.)
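
For what it's worth, here is one hedged sketch of how such a trade could be coordinated without a single process owning both characters: a reserve/commit exchange with compensation on failure. The interfaces are illustrative, and a production version would also need a durable coordinator so a crash mid-trade doesn't leave items held forever.

```scala
import scala.concurrent.{ExecutionContext, Future}

trait EntityOwner {
  // Try to place a hold on an item or amount; returns a reservation token if held.
  def reserve(entityId: String, what: String): Future[Option[String]]
  def commit(reservation: String): Future[Unit]
  def release(reservation: String): Future[Unit]   // compensation if the other side fails
}

final class TradeCoordinator(implicit ec: ExecutionContext) {
  def trade(buyerOwner: EntityOwner, buyerId: String, gold: String,
            sellerOwner: EntityOwner, sellerId: String, sword: String): Future[Boolean] =
    buyerOwner.reserve(buyerId, gold).flatMap {
      case None => Future.successful(false)                 // buyer can't pay
      case Some(goldHold) =>
        sellerOwner.reserve(sellerId, sword).flatMap {
          case None =>
            buyerOwner.release(goldHold).map(_ => false)    // roll back the gold hold
          case Some(swordHold) =>
            for {                                           // both held: commit both sides
              _ <- buyerOwner.commit(goldHold)
              _ <- sellerOwner.commit(swordHold)
            } yield true
        }
    }
}
```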

 


every action is recorded to persistent storage (Riak), and other players are then essentially reading each other's writes, with the memcache or application layer managing it.

 

This sounds interesting, but I would love to hear some insights into how complex multi-player interactions are implemented. Queries that can be high-performance one-liners when the data is in memory become slow queries when you call out to memcached. And aren't there still potential race conditions in these cases?


modern online games are already using web scaling approaches


There really are two kinds of things going on in most games.
One is the real-time component, where many other players see my real-time movements and my real-time actions.
The other is the persistent game state component, which is not that different from any other web services application.
The trick is that, if you try to apply web services approaches to the real-time stuff, you will fail, and if you try to apply the real-time approach to the web services stuff, you will pay way too much to build whatever you want to build.

Some projects have tried to unify these two worlds in one way or another (for example, Project Darkstar) but it's never really worked out well. In my opinion, doing that is a bit like trying to unify real-time voice communication with one-way streamed HD video delivery -- the requirements are so drastically different, it just doesn't make any sense to do that.
That analogy isn't entirely bad: One-way streamed video is a lot more like "web apps" than real-time voice communication, which is more like interactive games, except without the N-squared interactions and physics rules.

So, one way of approaching the scalability problem is to dumb down the gameplay until it's no longer a real-time, interactive, shared world. That doesn't make for fun games for those people who want those things. The other way is to be careful about only building the bits that you really do need for your desired gameplay, and optimizing those parts to the best of the state of the art.
Getting the game developers and the web developers to understand each other and work under the same roof, AND making sure that the right tool is used for the right problem, while delivering the needed infrastructure for gameplay development, is really hard to pull off; I've seen many projects die because they miss one of those "legs" and don't even know what they're missing. (A bit of the Dunning-Kruger effect, coupled with "I've got a great hammer!")



EVE's architecture - as far as I can tell - is not really any different from most of the others, in that it's geographically partitioned (specifically, one process per Solar System).

Yes, one solar system per process; however, they have the ability to move individual systems between machines to provide extra processing power when a system needs it. When you've got 7.5k players participating in a single battle in a single system, it requires a lot of processing power; the physics simulation alone takes a huge amount of time.

 


They have some pretty hefty hardware on the back-end for persistence, which presumably is why they don't need to run multiple shards.

The reason they don't run multiple shards is that it doesn't fit the game. The economy and the resources used to build everything are the core of the design; multiple shards would defeat that purpose. As such, it was architected around the idea of a single unified system.

 


I'm guessing that they don't have a great need for low latency either, which helps.

Up until the introduction of time dilation, latency and system resources were actually the major factors in determining the winner of a fight. When they switched from standard blocking sockets to IOCP they reduced the system-resource issue, but latency was still a major factor in determining who won a fight. Prior to time dilation and the IOCP switch, the dreaded black screen of death (as it was known) was the major issue facing players in determining the winner of a fight. In essence, whoever got the most players into a system first... won. Players with lower latency (mainly Brits and Europeans) had a distinct advantage here, due to the location of the EVE data center.

 

I do recommend reviewing their GDC presentations, especially on their back-end architecture; it's quite flexible. Now, it wouldn't necessarily work in a zoneless system, although I bet that through some clever trickery and the use of player locality you could engineer a system of virtual boxes or spheres of players that overlapped and allowed you to grow or shrink on demand.
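
Purely as a thought experiment on that last idea, here's a toy sketch of overlapping player regions that split when they exceed a capacity budget; the names, numbers, and splitting rule are all invented for illustration.

```scala
final case class Vec2(x: Double, y: Double)
final case class Player(id: String, pos: Vec2)
final case class Region(center: Vec2, radius: Double, players: Vector[Player])

object RegionManager {
  val Capacity = 500      // assumed per-process player budget
  val Overlap  = 50.0     // overlap margin so border players are visible to both regions

  def contains(r: Region, p: Player): Boolean = {
    val dx = p.pos.x - r.center.x
    val dy = p.pos.y - r.center.y
    math.sqrt(dx * dx + dy * dy) <= r.radius + Overlap
  }

  // Grow/shrink on demand: split an overloaded region into two halves along x.
  def rebalance(r: Region): Vector[Region] =
    if (r.players.size <= Capacity) Vector(r)
    else {
      val (left, right) = r.players.sortBy(_.pos.x).splitAt(r.players.size / 2)
      def mk(ps: Vector[Player]) = {
        val cx = ps.map(_.pos.x).sum / ps.size
        val cy = ps.map(_.pos.y).sum / ps.size
        Region(Vec2(cx, cy), r.radius / 2 + Overlap, ps)
      }
      Vector(mk(left), mk(right))
    }
}
```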


Ok so I'll chime in here.

 

Distributed architectures are the right tool for the job.  What I've seen is that the game industry is just not where most of the innovation in server-side architecture is happening, so it's a bit behind.  That has been my observation from working in the industry and from looking around and talking to people.

 

When it comes to performance, it's all about architecture.  And the tools available on the server side have been moving fast in recent years.  Traditionally, large multiplayer games weren't really great with architecture anyway, and for a variety of reasons they just kind of stuck with what they had.  This is changing, but slowly.

 

I think the game industry also had other bad influences, like trying to apply the lessons learned about client-side performance to the server.

 

The secret to current distributed systems is that almost all of them are based on message passing, usually using the actor model.  No shared state, messages are immutable, and there is no reliability at the message or networking level.  Threading models are also entirely different.  For example, the platform I work a lot with, Akka, in simple terms passes actors between threads instead of locking each one to a specific thread.  It can use amazingly small thread pools to achieve high levels of concurrency.
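
For anyone unfamiliar with the model, here's a minimal Akka (classic) sketch of those points: immutable messages, state kept private to the actor, and fire-and-forget sends with no delivery guarantee. The dispatcher/thread-pool sizing lives in configuration and isn't shown.

```scala
import akka.actor.{Actor, ActorSystem, Props}

final case class Damage(amount: Int)   // immutable message
final case class Healed(amount: Int)

class PlayerActor extends Actor {
  private var hp = 100                 // state is private to the actor, never shared

  def receive: Receive = {
    case Damage(a) => hp -= a
    case Healed(a) => hp += a
  }
}

object Demo extends App {
  val system = ActorSystem("game")
  val player = system.actorOf(Props[PlayerActor](), "player-1")
  player ! Damage(10)                  // fire-and-forget, no delivery guarantee
  player ! Healed(5)
}
```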

 

What you get out of all that is a system that scales with very deterministic performance, and you have a method to distribute almost any workload over a large number of servers.

 

Another thing to keep in mind is that absolute performance usually matters very little on the server side.  This is just counter-intuitive for many game developers.  For an average request to a server, the response-time difference between a server written in a slow vs. a fast language is sub-1ms; usually it's in microseconds.  And when you factor in network and disk I/O latencies, it's white noise.  That's why scaling with commodity hardware using productive languages is commonplace on the server side.  The reason you don't see more productive languages used for highly concurrent work is not that they aren't performant enough; it's that almost all of them still have a GIL (global interpreter lock) that limits them to basically running on a single CPU in a single process.  My favorite model now for being productive is to use the JVM, but use JVM languages such as JRuby or Clojure when possible, and drop to Java only when I really need to.

 

For some of the specific techniques used in distributed systems, consistent hashing is a common tool.  You can use it to spread workloads over a cluster, and when a node goes down, messages just get hashed to another node and things move on.
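
A small self-contained sketch of a consistent hash ring, for anyone who hasn't seen one (illustrative only, not taken from any particular library): keys hash onto a ring of virtual nodes, and when a node is removed, only the keys that hashed to it move to the next node clockwise.

```scala
import java.security.MessageDigest
import scala.collection.immutable.TreeMap

final class HashRing(nodes: Set[String], vnodesPerNode: Int = 64) {
  require(nodes.nonEmpty, "ring needs at least one node")

  private def hash(s: String): Long = {
    val d = MessageDigest.getInstance("MD5").digest(s.getBytes("UTF-8"))
    // Use the first 8 bytes as a position on the ring.
    d.take(8).foldLeft(0L)((acc, b) => (acc << 8) | (b & 0xffL))
  }

  private val ring: TreeMap[Long, String] =
    TreeMap.from(for {
      n <- nodes.toSeq
      v <- 0 until vnodesPerNode     // virtual nodes smooth out the distribution
    } yield hash(s"$n#$v") -> n)

  def nodeFor(key: String): String = {
    val h = hash(key)
    ring.rangeFrom(h).headOption.getOrElse(ring.head)._2   // wrap around the ring
  }

  def removeNode(n: String): HashRing = new HashRing(nodes - n, vnodesPerNode)
}

// Usage: route a player's messages to whichever node currently owns their key.
// val ring  = new HashRing(Set("node-a", "node-b", "node-c"))
// val owner = ring.nodeFor("player:1234")
```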

 

Handling things like transactions is not difficult; I do it fairly often.  I use an actor with an FSM, and it handles the negotiation between the parties.  You write the code so all messages are idempotent, and from there it's straightforward.
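
Here's a hedged sketch of that shape using Akka classic and context.become rather than the FSM DSL; the message names and parties are made up, and a real version would persist progress and deduplicate retries to keep things idempotent.

```scala
import akka.actor.{Actor, ActorRef, Props}

object TradeFsm {
  final case class Start(buyer: ActorRef, seller: ActorRef)
  case object OfferAccepted
  case object ItemsEscrowed
  case object Commit
  case object Abort
  def props: Props = Props(new TradeFsm)
}

class TradeFsm extends Actor {
  import TradeFsm._

  def receive: Receive = idle

  private def idle: Receive = {
    case Start(buyer, seller) =>
      buyer ! OfferAccepted                 // ask both sides to escrow their goods
      seller ! OfferAccepted
      context.become(awaitingEscrow(buyer, seller, pending = 2))
  }

  private def awaitingEscrow(buyer: ActorRef, seller: ActorRef, pending: Int): Receive = {
    case ItemsEscrowed if pending > 1 =>
      context.become(awaitingEscrow(buyer, seller, pending - 1))
    case ItemsEscrowed =>                    // both sides escrowed: commit the trade
      buyer ! Commit
      seller ! Commit
      context.stop(self)
    case Abort =>                            // either side can abort before commit
      buyer ! Abort
      seller ! Abort
      context.stop(self)
  }
}
```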

 

Handling persistence in general is fairly straightforward in a distributed system.  I use Akka a lot, and I basically have a large distributed memory store based on actors in a distributed hash ring, backed by NoSQL databases with an optional write-behind cache in between.  Because every unique key you store is mapped to a single actor, everything is serialized.  For atomic updates I use an approach that's similar to a stored procedure.  Note that I didn't say this was necessarily easy.  There are very few off-the-shelf solutions for stuff like this.  You can find the tools to build it all, but you have to wire it up yourself.
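
And a simplified sketch of the "one actor per key" idea: all updates to a key are messages handled by its owning actor, so they apply serially, and an update can carry a function to run against the current value, similar in spirit to a stored procedure. The names are assumptions, not the actual system, and the write-behind flush to the database is only hinted at.

```scala
import akka.actor.{Actor, ActorRef, Props}

object KeyActor {
  final case class Get(replyTo: ActorRef)
  // The "stored procedure": a function applied to the current value by the owning actor.
  final case class Update(f: Map[String, String] => Map[String, String])
  def props(key: String): Props = Props(new KeyActor(key))
}

class KeyActor(key: String) extends Actor {
  import KeyActor._

  private var value: Map[String, String] = Map.empty  // in-memory copy of this key
  private var dirty = false                           // a write-behind tick would flush this later

  def receive: Receive = {
    case Update(f) =>
      value = f(value)                                 // applied one message at a time: atomic per key
      dirty = true
    case Get(replyTo) =>
      replyTo ! value
  }
}
```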

 

Having worked on large games before, I've found my recent experiences with distributed systems to be very positive.  A lot of it comes down to how concurrency is handled; having good abstractions for that in the actor model makes so many things simpler.  That's not to say there aren't any challenges left.  You hit walls with any system; I'm just hitting them much later now with almost everything.
