MMOs and modern scaling techniques


If you have GDC Vault access, look up a talk by Pat Wyatt from GDC 2013 I think it was... maybe 2012.

If you don't have GDC Vault access, what's wrong with you?! :-P


It's far too expensive for me, unfortunately.

I wouldn't put so much trust in "how web developers approach scalability".


I appreciate that a lot of what used to be considered the state-of-the-art is now not considered best practice. But still, there are sites today deploying technology that services many more concurrent clients than single-shard MMOs can manage. The question is whether it would be possible for MMOs to do the same... or not. From what people are saying, the answer appears to be "Yes, of course... if you can overcome the complexity... which nobody is going to talk about in any detail". ;)

But still the problem remains that we have one socket per TCP connection and that sucks hard.

That's not really the problem I am trying to discuss though. Firstly, because you can distribute the front end servers quite easily. And secondly, because you don't necessarily need to use TCP for your main client connection anyway.

My experience of MMOs is that once you've done your optimisation at the I/O level - e.g. getting your buffers the right size, perhaps using a proxy so that your game server is not spending half its time servicing network interrupts, etc. - you'll hit a CPU cap with the gameplay before you hit a cap imposed by networking delays between the server and the player clients. Character interactions are O(N²) whereas your number of connections is only O(N), and the value of N is greater for character interactions because it includes NPCs. The cap in my experience seems to be around 500-1500 players per process, depending on how complex the computations for each player are.
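
To make that scaling difference concrete, here is a rough back-of-envelope sketch (my own, with purely illustrative numbers, not anything from the thread) showing how per-second connection work grows linearly while naive pairwise interaction checks grow quadratically:

# Illustrative sketch: connection handling is O(N) per tick, while naive
# character/character interaction checks are O(N^2) in the number of actors
# (players + NPCs). All figures below are made up for illustration.

def per_second_costs(players, npcs, ticks_per_second=10):
    actors = players + npcs
    connection_events = players * ticks_per_second                # O(N)
    pair_checks = actors * (actors - 1) // 2 * ticks_per_second   # O(N^2)
    return connection_events, pair_checks

for players in (500, 1000, 1500):
    conn, pairs = per_second_costs(players, npcs=players)   # assume roughly as many NPCs as players
    print("%5d players: ~%12s connection events/s, ~%15s pair checks/s"
          % (players, format(conn, ","), format(pairs, ",")))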

Sure, at a very high level with distributed servers like Amazon EC2, these paradigms work. But beware that a user waiting 5 seconds for the search results of their long-lost friend on Facebook is acceptable(*). A game with a 5-second lag for casting a spell is not.


Sure. That's the backbone of my suspicion. The argument I had which inspired me to start this thread included the other guy saying that my traditional approach is quite obviously not widely used because it would cost half a million dollars per month on Amazon EC2. Trying to tell him that most MMOs - in the original meaning of the word - do not and will not run on EC2, would probably have been futile.

There are two reasons not to run games on EC2:

1) Amazon charges an arm and a leg for bandwidth. You can buy it MUCH cheaper in a co-lo facility.

2) Virtualization induces scheduling jitter, which impacts real-time physics simulation. If your CPU suddenly goes away for 100 milliseconds, that's a six-frame stutter, which is quite noticeable. When I measured this, it could get as bad as 1500 milliseconds.
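
For what it's worth, a minimal sketch (mine, not from the post) of how you could measure that kind of scheduler jitter yourself: sleep one simulation frame at a time and record how late each wakeup is. On a quiet bare-metal box the worst lateness stays small; on a noisy virtualized host you can see spikes like the ones described above.

import time

FRAME = 1.0 / 60.0                    # target 60 Hz simulation step
worst_late = 0.0
next_tick = time.monotonic()
for _ in range(60 * 60):              # run for roughly one minute
    next_tick += FRAME
    delay = next_tick - time.monotonic()
    if delay > 0:
        time.sleep(delay)
    # How late did we actually wake up relative to the intended tick?
    late = time.monotonic() - next_tick
    worst_late = max(worst_late, late)

print("worst wakeup lateness: %.1f ms" % (worst_late * 1000))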

The real world performance of network infrastructure is not even slightly approaching 50% of the speed of light



15:48 ~ jwatte@AF002000$ traceroute www.interserver.net
traceroute to www.interserver.net (198.41.189.28), 30 hops max, 60 byte packets
 1  * * *
 2  208.71.159.129 (208.71.159.129)  0.299 ms  0.573 ms  0.492 ms
 3  117.Vl117-Cr01-PAIX-PAL.unwiredltd.net (204.11.106.45)  2.014 ms  1.902 ms  1.864 ms
 4  209.63.145.114 (209.63.145.114)  3.698 ms  3.710 ms  3.687 ms
 5  be-1.br02.chcgildt.integra.net (209.63.82.186)  56.066 ms  56.075 ms  56.052 ms
 6  xe-1-2-0.edge01.ord02.as13335.net (206.223.119.180)  54.250 ms  53.702 ms  53.516 ms
 7  198.41.189.28 (198.41.189.28)  53.469 ms  53.513 ms  53.503 ms
15:48 ~ jwatte@AF002000$

The light transmission time from Oakland to New York and back (2,900 miles each way) in copper (about 2/3 the speed of light in vacuum) is about 46 milliseconds. In this case (from well-connected data center to well-connected data center) we are substantially CLOSER to the speed of light than 50%: 46/53 is about 87% of that theoretical best.
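
The arithmetic behind that 46 ms figure, as a quick sketch using the same assumed numbers (2,900 miles each way, signal speed about 2/3 of c):

MILES_EACH_WAY = 2900
METERS_ROUND_TRIP = MILES_EACH_WAY * 1609.34 * 2     # out and back, in meters
SIGNAL_SPEED = (2.0 / 3.0) * 299792458.0             # ~2e8 m/s in copper/fiber
best_rtt_ms = METERS_ROUND_TRIP / SIGNAL_SPEED * 1000
print("theoretical best RTT: %.0f ms" % best_rtt_ms)  # prints ~47 ms, matching the ~46 ms estimate above
# Measured ~53 ms in the traceroute, so roughly 46/53 = 87% of the wire-limited best case.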

Most of the delay comes from slow residential "last mile" connection issues, and WiFi access points, which may vary from sub-millisecond to dozens-of-milliseconds.

enum Bool { True, False, FileNotFound };

3) The storage throughput is terrible. If you write anything to disk, it's a colossal pain. Unfortunately, even with much friendlier providers (DigitalOcean, Linode), storage pools are tied to server size, with an absolute maximum.

Also, I won't argue data-center-to-data-center speed; you're absolutely correct. I wouldn't personally ignore last-mile infrastructure when discussing gaming, though, which is the assumption I made in my response.

Kylotan wrote:

The question is whether it would be possible for MMOs to do the same... or not. From what people are saying, the answer appears to be "Yes, of course... if you can overcome the complexity... which nobody is going to talk about in any detail". ;)


I agree with you, except I see no "of course" there.

As you said, character/character interaction is N-squared; connections (and web architecture) is all built around scaling out the N problem. No real-time physics simulation engine exists that scales out across machines along the axis of the number of cross-interacting entities, although they exist (expensively, see DIS) for making each separate actor extremely complex.

If your needs match those of Farmville, an EC2 based web solution is great. The developers were quoted as saying "we're glad we had scripted bringing up more EC2 instances, because we couldn't have done so manually to keep up with the growth in demand."
I would love for there to exist a similarly flexible solution and architecture for the N-squared character interaction problem. But there doesn't, for rather deep technical reasons as described above.
enum Bool { True, False, FileNotFound };


Are ping results like those above physically reliable? I mean, when I ping from city A to city B and ping reports 50 ms, does that really mean the data is available in B after a 50 ms delay? (Does ping give the one-way travel time? How does it synchronise the clocks?)

As far as I know, it may show this value as a somewhat theoretical one, and the real physical times may be larger (though I know very little about this; I'm just investigating out of curiosity).

Ping is answered at a very low layer of the protocol stack; no application is involved. So it is nearly the time needed for the physical transport from A to B.

I mean, when I ping from city A to city B and ping reports 50 ms, does that really mean the data is available in B after a 50 ms delay? (Does ping give the one-way travel time?)


Ping gives the two-way travel time. The time from A to B is roughly half the ping time.

If the sending application and the receiving application are written properly, and are running on servers that are properly provisioned (not overloaded,) then the application-to-application time will be very similar to the ping time. However, if there are problems in the implementation of the application, or the management of the server, application-to-application time may be a lot longer.
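
If you want to check that for your own client/server pair, here is a hedged sketch of an application-level round-trip measurement over UDP; "echo.example.net" and port 9999 are placeholders for an echo server you run yourself (one that simply sends each datagram straight back):

import socket
import time

HOST, PORT = "echo.example.net", 9999   # hypothetical echo server you control
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(2.0)

samples = []
for _ in range(10):
    sent = time.monotonic()
    sock.sendto(b"ping", (HOST, PORT))
    try:
        sock.recvfrom(64)
        samples.append((time.monotonic() - sent) * 1000.0)
    except socket.timeout:
        pass                             # UDP: the packet may simply be dropped

if samples:
    best = min(samples)                  # the best sample is closest to the wire latency
    print("RTT ~%.1f ms, one-way ~%.1f ms" % (best, best / 2))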
enum Bool { True, False, FileNotFound };


Well, at least that's some very good news.

I have a radio-wave internet connection and get pings of 150-250 ms (occasionally 400 or 600 ms). Is that because of the radio connection? Do people with wired (buried cable) connections usually have it much lower? Is there a reasonable average ping to assume, and an average ping for a fast connection?

Do MMO games work at a much slower rate than those pings?

Yes, for radio-based internet, it's typically frequency arbitration and occasionally drops/collisions that increase the latency.

For wired connections, how much your ISP adds depends on the quality of their network and their willingness to peer with well-connected back ends.

Comcast (my home ISP) adds about 20 milliseconds going 10 miles, and adds between 0.01% and 10% packet drop depending on who I'm trying to talk to.

MMOs can live with seconds of latency; it depends on the play style. If the play style focuses on physics simulation and player/player interaction, low latency is very important (as it is for an FPS).

enum Bool { True, False, FileNotFound };

