# SpatialOS single shard MMO

drainedman    104

Has anyone tried SpatialOS?

I'd like a different perspective; I don't pick up a lot just by thinking about it myself.

Is this stuff quite similar to Pikkoserver though?

Are workers/services really the way forward? I don't see any detailed explanation or case studies, just presentations.

##### Share on other sites
khawk    2924

I know of two members here who may be able to help, and there may be more but I'll tag them. @JWalsh's studio Soulbound Studios has partnered with SpatialOS.

@riuthamus was talking to them at one point as well for their Greenlit game, but I don't know if anything came of it. Perhaps there are others.

For large scale simulation SpatialOS is probably a strong future since it's based on the fundamental concept of distributed simulation architectures. Pikkoserver sounds similar, but to be honest I had never heard of it until your post.

The concept that SpatialOS provides has been around a long time. In the simulation world, distributed simulations are a common architecture. SpatialOS takes that idea to the next level by moving the distributed concept to "the cloud", where computation is relatively cheap and easy to scale. They then take that to the next level with gaming by offering engine integrations and other technology that makes it easier to work with their platform.

As I understand it, in SpatialOS you define behaviors for entities in the world, and these entities are running on their servers. From a backend perspective the servers scale as the number of entities scale. In theory you could simulate every detail in a world by just adding more computational power.

##### Share on other sites
hplus0603    11356

Every five years, some company with expertise from some "adjacent" technology area (finance, mil/sim, telecom, geospatial, etc) believes that they can do games better! They will win with their superior tech!

I have not seen a single one of those actually manage to get anything real and lasting into the marketplace. Not for lack of trying!

Is Improbable different? Perhaps. They got a giant investment from SoftBank, which might mean they have something neat and new. Or it may mean they have good connections to investors who have different investment criteria.

Quote

moving the distributed concept to "the cloud", where computation is relatively cheap and easy to scale

You mean, in the "cloud," where latencies cannot be managed, where noisy neighbors can flood your network, and where two processes that communicate intimately end up being placed on different floors of a mile-long data center, and that charges 10x mark-ups on bulk network capacity? That "cloud"? Or is this some other "cloud" that actually works for non-trivial low-latency real-time use cases?

In the end, though, most games just aren't well-funded and big enough to actually make a lot of sense for more business-focused companies. And those that are (OverGears of BattleDuty and such) end up distinguishing their games from others by integrating gameplay, networking, and infrastructure really tightly, and at that point, a "one size fits all" solution looks more like "one size fits none."

Don't get me wrong. Distributed simulation is a fun area to work in, and there are likely large gains to be had through innovative whole-stack approaches. History just shows that the over/under on any one particular entrant in the market is "not gonna make it."

##### Share on other sites
khawk    2924
3 minutes ago, hplus0603 said:

That "cloud"? Or is this some other "cloud" that actually works for non-trivial low-latency real-time use cases?

That's "the cloud". I don't see SpatialOS being ready for low-latency real-time any time soon and not necessarily of their own fault.

FWIW, I did some press time with them at GDC. It was an interesting visit since I have experience in developing the types of distributed entity simulation platforms they're building. The demo was to showcase intelligent behavior across thousands of entities in a "virtual world" setting. It was fairly slow paced, when it worked - true to startup style they didn't account for poor internet at a trade show. So, not much of a demo.

##### Share on other sites
drainedman    104
23 minutes ago, khawk said:

That's "the cloud". I don't see SpatialOS being ready for low-latency real-time any time soon and not necessarily of their own fault.

FWIW, I did some press time with them at GDC. It was an interesting visit since I have experience in developing the types of distributed entity simulation platforms they're building. The demo was to showcase intelligent behavior across thousands of entities in a "virtual world" setting. It was fairly slow paced, when it worked - true to startup style they didn't account for poor internet at a trade show. So, not much of a demo.

Ah pretty disappointing if it can't do low latency. They kind of sold it as that though.

##### Share on other sites
hplus0603    11356

Also, a variety of multi-entity architectures have been tried. The most famous failure is probably Sun Darkstar, which ended up supporting fewer entities in cluster mode than in single-server mode :-) They used tuple spaces, which ends up being an instance of "shard by ID." The other main approach is "shard by geography."

A "massively multiplayer" kind of server ends up with, in the worst case, every entity wanting to interact with every other entity. For example, everyone tries to pile into the same auction area or GM quest or whatever. (Or all the mages gather in one place and all try to manabolt each other / all soldiers try to grenade each other / etc.)

N-squared, as we know, leads to an upper limit on the number of objects that can go into a single server. Designing your game to avoid this helps not just servers, but also gameplay. When there's a single auction area that EVERYBODY wants to be in, it's not actually a great auction experience (too spammy), so there's something to be said for spreading the design out. (Same thing for instanced dungeons/quests, etc.)
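To put a number on the quadratic growth described above, here is a small Python sketch that counts how many entity pairs could interact when everyone is within range of everyone else. The function name `pair_count` is made up for illustration.

```python
def pair_count(n: int) -> int:
    """Number of unordered pairs among n entities: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Ten times the entities means roughly a hundred times the pairs.
for n in (100, 1_000, 10_000):
    print(n, pair_count(n))
```

Going from 1,000 to 10,000 entities in one place raises the pair count from about half a million to about fifty million, which is why spreading players out helps the server as much as the gameplay.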

Anyway, once you need more than one server, you can allocate different servers to different parts of the world (using level files, or quad trees, or Voronoi diagrams, or some other spatial index). To support people interacting across borders, you need to duplicate an entity across the border for as far as the "perception range" is. This, in turn, means that you really want the minimum size of the geographic areas to be larger than the perception range, so you don't need to duplicate a single entity across very many servers. If by default you have chessboard distribution, and the view range is two squares, you have to duplicate the entity across 9 servers all the time. That means you need 10 servers just to get up to the capacity range of a single non-sharded server! The drawback then is that you have a maximum density per area, and a minimum area size per server, which means your world has to spread out somewhat evenly. Because the server/server communication is "local" (only neighbors), you can easily scale this to as large an area as you want, as long as players keep under the designated maximum limit. Many games have used methods similar to this (There.com, Asheron's Call, and several others).
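As a rough illustration of the border-duplication cost, the Python sketch below assumes a uniform square grid with one server per cell and lists the cells that need a copy of an entity. The function name, the grid layout, and the square (rather than circular) view footprint are all assumptions made for the example, not a description of any particular game's implementation.

```python
import math


def cells_needing_copy(x, y, cell_size, view_range):
    """Return the set of (col, row) cells overlapped by the square
    [x - view_range, x + view_range] x [y - view_range, y + view_range].
    Each returned cell stands for one server that must hold the entity."""
    min_col = math.floor((x - view_range) / cell_size)
    max_col = math.floor((x + view_range) / cell_size)
    min_row = math.floor((y - view_range) / cell_size)
    max_row = math.floor((y + view_range) / cell_size)
    return {
        (col, row)
        for col in range(min_col, max_col + 1)
        for row in range(min_row, max_row + 1)
    }


# Cells much larger than the view range: usually one copy, four near a corner.
print(len(cells_needing_copy(500, 500, cell_size=1000, view_range=100)))
print(len(cells_needing_copy(950, 950, cell_size=1000, view_range=100)))
# View range as large as a cell: a 3x3 block of copies.
print(len(cells_needing_copy(150, 150, cell_size=100, view_range=100)))
```

With cells ten times the view range, most entities live on a single server. With a view range as large as a cell, the same entity is held by 9 servers, which matches the 9-server figure in the chessboard example if the view range is read as reaching one square in each direction.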

The other option is to allocate by ID, or just randomly by load on entity instantiation. Each server simulates the entities allocated to it. You have to load the entire static world into each server, which may be expensive if your world is really large, but on modern servers, that's not a problem. Then, to interact between servers, each server broadcasts the state of its entities using something like UDP broadcast, and all other servers decode the packets and forward entities that would "interact with" entities that are in their own memory. This obviously lets you add servers in linear relation to the number of players, and instead you are limited by the speed at which servers can process incoming UDP broadcast updates to filter for interactions with their own entities, and by available bandwidth on the network. 100 Gbps Ethernet starts looking really exciting if you want to run simulation at 60 Hz for hundreds of thousands of entities across a number of servers! (In reality, you might not even get there, depending on a number of factors -- Amdahl's Law ends up being a real opponent.)
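A back-of-the-envelope Python sketch shows why bandwidth becomes the limit in the broadcast approach. The entity count (200,000) and the payload size (64 bytes per state update) are assumed values picked for illustration; packet headers and any compression are ignored.

```python
def broadcast_gbps(entities: int, hz: int, bytes_per_update: int) -> float:
    """Gigabits per second that every server must receive if each entity
    broadcasts one state update per simulation tick."""
    return entities * hz * bytes_per_update * 8 / 1e9


# Assumed numbers: 200,000 entities, 60 Hz, 64-byte updates.
print(broadcast_gbps(200_000, 60, 64))
```

Under these assumptions each server has to ingest a little over 6 Gbps of raw payload before doing any simulation work, so a 10 Gbps interface is already close to saturated.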

None of this is new. The military did it in the '80s on top of DIS. And then again in the late '90s / early '00s on top of HLA. It's just that their scale stops at how many airplanes and boats and tanks they own, and they also end up accepting that they have to buy one computer per ten simulated entities or whatever the salespeople come up with. There's only so many billion-dollar airplanes in the air at one time, anyway.

For games, the challenge is much more around designing your game really tightly around the challenges and opportunities of whatever technology you choose, and then optimizing the constant factor such that you can run a real-size game world on reasonable hardware. (For more on single-hardware versus large-scale, see for example http://www.frankmcsherry.org/assets/COST.pdf )

##### Share on other sites
drainedman    104

What if you had just one server with 128 cores and a huge shared memory?

Surely that would support hundreds of thousands of players.

##### Share on other sites
hplus0603    11356

Cores talk to memory through a memory bus. That memory bus has some fixed latency for filling cache misses. Surely, it will be faster than an external network, but on the other hand, all of those users need packets generated to/from themselves, going into/out of the core, too.

It's hard to get more than four-channel memory into a single socket, and price goes up by something like the square of the socket count (dual-socket much more expensive than single-socket; quad-socket much more expensive than dual-socket.)

And, once you have CPUs with different sockets, the different CPUs should be thought of as different network nodes -- a cache miss filled in 400 cycles on a local RAM module may take 4000 cycles when filling from a remote CPU NUMA node. (usually, the difference is less stark, but it's easily noticeable if you measure it.)

So, let's assume there are four sockets, each with 50 GB/s throughput. Split 25,000 users per socket, at 60 Hz. That gives 33 kilobytes per player per frame, and this is assuming that you will use memory optimally. (Most cache-miss-based algorithms would be happy to get past half the theoretical throughput.)

Can you do all the processing you need to do for a single player for a single frame, touching only 33 kilobytes of RAM? (Physics, AI, rules, network input, network output, interest management, and so on all go into this budget.) It's quite possible, if you know what you're doing, and carefully tune the code for the system that's running it, but it's no sure slam-dunk winner. Write your code in Java or Python or some other language that ends up chasing too many pointers, and you lose your edge very quickly.
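The 33-kilobyte figure can be checked with a few lines of Python. The arithmetic works out when 25,000 players share one socket's 50 GB/s of memory bandwidth at 60 Hz; the function name is made up for this sketch.

```python
def bytes_per_player_per_frame(bandwidth_bytes_per_s: float,
                               players: int, hz: int) -> float:
    """Memory traffic budget available to one player in one frame."""
    return bandwidth_bytes_per_s / (players * hz)


budget = bytes_per_player_per_frame(50e9, 25_000, 60)
print(round(budget / 1000, 1), "KB per player per frame")
# At half the theoretical throughput the budget halves as well.
print(round(budget / 2 / 1000, 1), "KB at 50% efficiency")
```

With four such sockets that is 100,000 players in total, each with roughly 33 KB of memory traffic per frame at best and closer to 17 KB if only half the theoretical throughput is reached.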

I just priced a PowerEdge with quad 16 core/32 thread Xeons, four 8-way 16 GB DIMMs per socket, and dual 10 Gb network interfaces; it's about $50k (plus tax and if you need storage disks and such.) You'd also want at least two, because if you have a single server and it dies, your game is as dead as the server. (You'd also need data center rack space/power/cooling and routers/switches/uplink and so on.) Although, still, $100k isn't that bad; the multiplayer networking systems have to compete with this offering and make it worthwhile, which limits their ability to charge upwards, which in turn means they can't solve problems that are too advanced or too fancy, and thus have to simplify their solution to the wider masses of developers. This, coupled with the incredible importance of designing game and network behavior hand-in-hand, probably is one of the explanations why there isn't a plethora of successful game multiplayer hosting/middleware solutions out there, and why each of the ones that are surviving actually fills a different niche -- there simply isn't space to survive for two different competitors in the same niche.

##### Share on other sites
drainedman    104

I can't afford $50k but I am planning on building a Beowulf cluster to support a kind of MMO with 100k+ entities (not all human but all are persistent). I'm just going to go for a simple method: each 2 km sq region is handled by one process. Every process sends sync messages to its neighbours via network messaging. Only edges of the regions are kept in sync. I'd like to run this over a VLAN to separate network channels. Of course this has lots of limitations, but if I get this far then I will implement some load balancing, e.g. split the regions with heavy loads into 1 km sq regions, etc. So long as not too many objects converge in one spot and do too many interactions, it might be OK.

##### Share on other sites
Kylotan    10005

I worked for an MMO company who had very similar technology up and running back in 2005. We didn't even use that tech in the end because it wasn't a useful selling point and the market had moved away from the single-shared-world model anyway due to World of Warcraft. (Arguably this was because their networking model was so primitive that they had no choice, but turning limitations into opportunity is always a good idea.)

As mentioned above the main draw of this new service is that you don't have to manage your own servers like we did. Cloud services are probably low latency enough for most MMO usages now, but probably not shooter-style ones.

Potential downsides include hosting and operation costs, since you can probably get it cheaper via a specialised solution (although you also run the risk of over-specifying and paying for capacity you don't need), and development costs, because you have to write the entire game using their paradigm, which may not suit you.

##### Share on other sites
ggambett    145

Hi!
I'm Gabriel, former game dev and very interested in niche topics such as client/server network architectures or pathfinding for games, now working at Improbable. I hope I can clarify a few things about SpatialOS - apologies for the long reply but there’s loads of great stuff to discuss here!

@khawk your understanding of SpatialOS is pretty much spot-on. An alternative explanation that I really like is this: imagine the traditional Entity-Component-System architecture, where each system is a distributed system rather than a thread on a server. SpatialOS lets you do this, without having to actually write a distributed system.

Quote

FWIW, I did some press time with them at GDC

I was there!!! From your description it sounds like you tried Worlds Adrift by Bossa Studios, or some of the other games by our partners (like Chronicles of Elyria by @JWalsh, whom I also had the pleasure to meet!). Did anyone give you a 1:1 tour of our Wizards demo game? We're now offering this tour on the website, and I can't recommend it enough - it should make the core concepts and the workflow clear to everyone (if it doesn't, please message me - I run the team responsible for this content, so feedback is more than welcome!)

Quote

latencies cannot be managed, where noisy neighbors can flood your network,

Yes, but this is also the case for any client-server game, and it depends on your internet connection at home; whether there's a single server or a swarm of workers on the other side can't improve that.

Quote

processes that communicate intimately end up being placed on different floors of a mile-long data center

The internal latencies in a mile-long data center are so small compared to the latency from your home to the datacenter, that the latency you experience as a gamer will be dominated by the latter (as I said above, this is no different to connecting to a game with a single server).
That said, SpatialOS is called SpatialOS for a reason. Locality of reference is one of the core concepts of how the load balancing and worker allocation algorithms work. We go to great lengths to make sure that entities in close proximity in the virtual world are physically close in the datacenter - usually within the same physical server.

To see all this in action, I'd suggest you take a look at Worlds Adrift, which has to be pretty much real-time because it's a very physics-heavy game. Every single thing in the game is physically simulated, including individual ship parts - as you can see in that video!

Quote

N-squared, as we know, leads to an upper limitation to the number of objects that can go into a single server. Designing your game to avoid this, helps not just servers, but also gameplay.

Yep, agreed. Just like we can't work around the speed of light to provide zero end-to-end latency, we can't do much about the way O(n²) works. As you point out, this has to be solved with a mixture of game design and clever software techniques.

Note that O(n²) appears in different places. Interactions of objects within a worker are one, as you point out, but there's also O(n²) network communication between workers, and we can and do something about it - smart distribution and migration of entities between physical servers. This is another non-trivial problem that SpatialOS solves, and which is invisible to the developer and the player (see this for an example).

@hplus0603 your thoughts about the impact of the perception range are very interesting. SpatialOS does this in a different way, though. In SpatialOS, the allocation of workers is dynamic, and it follows the workload of the simulation around. This minimises the migrations that need to happen. There is co-simulation where the areas overlap, but we make this invisible both to developers and players. Distributed physics is a particularly fun example of this - we've written about that in a blog post.
Second, SpatialOS does this at the component level rather than the entity level. So you could have a game world simulated by 100 physics workers (e.g. instances of PhysX) but only 10 game logic workers if it's a physics-heavy game. Or 10 physics workers, 50 AI workers and 5 pathfinding workers. The point is that every kind of worker needs to "see" just a narrow subset of the components of an entity (generally, the ones it is able to simulate), and this reduces the bandwidth requirements tremendously. Workers follow the workload, and they do this in a layered way, so in practice there are far fewer physical migrations than you may think.

Quote

I can't afford $50k but I am planning on building a beowulf cluster to support a kind of MMO with 100k+ entities (not all human but all are persistent).

Sounds like a job for SpatialOS. You get the benefits of running a cluster, but without any of the complexity of setting up and running a cluster. In fact, you don't have to write networking code at all - you see this in practice in our Wizards demo.

About the $50k, are you aware of our Games Innovation Program? We understand this is a concern for a lot of developers, so in a nutshell, we've partnered with Google Cloud to offer subsidies to usage costs to users enrolled in the Program. Read more details in this article. Also, development on your local machine is always free, so you can try SpatialOS with your SDK of choice, try the Wizards demo or the Pirates tutorial, etc.

Quote

Potential downsides include hosting and operation costs, since you can probably get it cheaper via a specialised solution (although you also run the risk of over-specifying and paying for capacity you don't need), and development costs, because you have to write the entire game using their paradigm, which may not suit you.

You would be surprised! We ran the numbers in detail, and the economics may not be so different, especially when you really scale your game up. And at that point you also need to consider the cost and expertise of having a dedicated infrastructure / DevOps team. Spilt Milk, the creators of Lazarus, touch upon these topics in this article. As a small team with no previous large scale networking experience, they went from zero to a continuous playable alpha with 3000 concurrent players in 4 months.

Once again, this was a great thread to read! Happy to address any other questions you may have.

##### Share on other sites
Scouting Ninja    3968

1 hour ago, ggambett said:

Yes, but this is also the case for any client-server game, and it depends on your internet connection at home;

1 hour ago, ggambett said:

The internal latencies in a mile-long data center are so small compared to the latency from your home to the datacenter, that the latency you experience as a gamer will be dominated by the latter

Interesting, so what range of networking would be ideal from the player's point of view?
I mean if only players with the highest bandwidth can play then what is the point of a huge game world?

How do you plan to deal with the view range? It is notable that in the Wizards demo the camera is set up to prevent the player from seeing into the distance. The Pirate tutorial has nothing there, only a flat plane.

Isn't the point of making a large world to have enough space for a huge player base, for players to see they are in a large open world and to have hundreds of players interacting with each other at once?

At the moment it still looks like having servers for each region is a better idea than having one large server where you lump players. Then there also is the fact that the further a player is from the server the more they will lag.

##### Share on other sites
Kylotan    10005

(Replying to ggambett, 2 posts up)

Hi Gabriel, thanks for coming on here and answering some questions.

I can see why the product is an attractive one for people who aren't able or willing to manage their own hosting. That is one reason why it is a more attractive proposition than what my former company was offering 10 years ago, because although we provided a very similar architecture, the expectation there was that the game developer would provide and manage their own servers. So things have changed there.

I still think the paradigm shift necessary to use such a system can be complex. The "zero networking code" aspect is not a big deal these days since most game engines offer that to some degree with state replication - but learning to write code with fewer 'shared-everything' assumptions and more message passing is tricky for many game developers. That's not a criticism of your tech, as the problem is intrinsic to running a distributed simulation. But it's also why some games have gone down the WoW route and simply decided it wasn't a problem that was worth solving.
Additionally, trying to describe it as like an entity-component system for the cloud is a negative in this regard because it's clear from posts on Gamedev.net that most developers aren't comfortable with that approach and struggle significantly with creating clear partitions between components and in handling complex multi-entity/component interactions. That's arguably why the 2 major engines allow full communication between arbitrary entities and components - the alternatives make otherwise easy interactions quite complex to handle.

##### Share on other sites
hplus0603    11356

Quote

Quote

latencies cannot be managed, where noisy neighbors can flood your network,

Yes, but this is also the case for any client-server game, and it depends on your internet connection at home; whether there's a single server or a swarm of workers on the other side can't improve that.

I think you misunderstood me. I'm talking entirely about things that go on inside a virtualized, cloud-hosted data center. Because it uses virtualization for the machine hosts, you are subject to the requirements of the virtualization platform, and that often introduces significant (many milliseconds) latencies in scheduling, because the VM hosts are all optimized for batch throughput, not for low-latency real-time processing. For real-time simulation running close to full machine utilization, a physics step time that goes from 15.5 milliseconds to 17.5 milliseconds will make you miss your deadline. For real-time physics games, I much prefer bare metal for this reason.

It's also interesting that you mention co-simulation across visibility borders and PhysX at the same time. PhysX is not deterministic, so any co-simulation across borders will diverge. With enough authoritative network state snapshots, you can mash that with brute force, of course.
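The deadline point made earlier in this post (a physics step going from 15.5 ms to 17.5 ms) can be made concrete with a short Python sketch. It assumes a 60 Hz tick, which the post implies but does not state outright, and the function name is hypothetical.

```python
def misses_deadline(step_ms: float, hz: int = 60) -> bool:
    """True if one simulation step takes longer than one tick period."""
    frame_ms = 1000.0 / hz  # about 16.67 ms at the assumed 60 Hz
    return step_ms > frame_ms


print(misses_deadline(15.5))  # step fits inside the tick
print(misses_deadline(17.5))  # 2 ms of scheduling jitter pushes it over
```

At 60 Hz the slack on a 15.5 ms step is only a little over one millisecond, so scheduling delays of a few milliseconds are enough to miss a tick.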
Regarding the "borders moving with load," that's something we looked at and implemented for There.com, but it ended up not being useful for real gameplay, because players tended to gather in the same kind of gathering places.

Meanwhile, the view distance across borders (i.e., how much you need to co-simulate) has to be determined by the "visibility range" of your objects. If your object is a missile cruiser with a range of 150 kilometers, you have to have an instance of the object on any server that touches this radius, so that it can do target acquisition. (Either you have an instance of the cruiser on each server within the radius, OR you bring a copy of each object within that radius to the cruiser's server -- if there are fewer cruisers than targets, you want the former, for hopefully obvious reasons.) If you have a soldier with a sniper rifle with a 2 kilometer scope, you have to be able to see each object within two kilometers, or the player will be sad.

I'm pointing this out, not to cast shade on SpatialOS, but to show that any distributed server framework has to be used with gameplay design that goes hand-in-hand with the networking/simulation capabilities, and each solution will bring with it specific limitations you have to accept as a game designer. Using words such as "invisible to the developer" or "without having to think about distribution" sounds great in marketing, but ends up not actually being helpful to the end developer. And, honestly, is actually untrue for all but the most trivial kinds of games.

I've found that the companies that end up doing the best in gaming are those that are clear about pros and cons about their systems, and that do not make over-simplified promises they cannot actually deliver on (without tons of caveats) in their marketing.
##### Share on other sites
ggambett    145

1 hour ago, Scouting Ninja said:

I mean if only players with the highest bandwidth can play then what is the point of a huge game world?

I didn't say that; apologies if I expressed myself in an unclear way. What I tried to say is that the player's latency to a datacenter is the same regardless of whether there's a single server inside the datacenter running the game, or a cluster of a hundred (because the internal latency is minimal). So the network requirements of a SpatialOS game are no more strict than a regular game running on a single server. On the flip side, the SpatialOS game is not limited by whatever that single server and engine can handle (and servers can get only so big).

1 hour ago, Scouting Ninja said:

How do you plan to deal with the view range,

Each worker has a configurable "checkout radius", which is effectively the view range. There are interesting LOD techniques you can apply to updates to minimise the bandwidth impact of a big checkout radius. We also have the concept of streaming queries, which allows workers to "see" entities that wouldn't normally fall within their checkout radius. This is used in Worlds Adrift, for example, where islands are visible from vast distances, much farther away from where a worker would need 60 Hz position updates for things on their surface. Hopefully this also answers @hplus0603's question (although this is the only instance of "view across borders", because the border of the region of interest of a client is the view range).

2 hours ago, Scouting Ninja said:

Isn't the point of making a large world to have enough space for a huge player base, for players to see they are in a large open world and to have hundreds of players interacting with each other at once.
Absolutely, and this is exactly the kind of experience that SpatialOS enables - take a look at Worlds Adrift for an example of exactly that.

2 hours ago, Scouting Ninja said:

At the moment it still looks like having servers for each region is a better idea than having one large server where you lump players.

That may be more appropriate for some types of games. SpatialOS offers different load-balancing modes, and one of them puts workers in a static configuration. Note that even in this case, workers aren't equivalent to servers handling regions; all the workers combine to simulate a single continuous game world with no hard boundaries, regardless of whether you choose a static or a dynamic worker allocation setup.

2 hours ago, Kylotan said:

I can see why the product is an attractive one for people who aren't able or willing to manage their own hosting.

Making massively distributed systems is not exactly simple, especially for non-embarrassingly-parallelizable problems such as physics. Of course as a game developer you want to spend most of your time and effort making a game and exploring the creative possibilities of large worlds, not a spatially distributed cloud compute platform! But even studios that are experts in, and famous for, long-running MMOs see the value in using SpatialOS. The clearest example I can offer of this is Jagex, the creators of Runescape, with whom we've recently partnered.

2 hours ago, Kylotan said:

I still think the paradigm shift necessary to use such a system can be complex. [...] That's not a criticism of your tech, as the problem is intrinsic to running a distributed simulation.

There is a bit of a paradigm shift, absolutely. But what you get for your investment in learning how to work with SpatialOS is the possibility of building games of a scale and richness that is currently beyond the reach of most game developers.

2 hours ago, Kylotan said:

most developers [...] struggle significantly with creating clear partitions between components [...] That's arguably why the 2 major engines allow full communication between arbitrary entities and components - the alternatives make otherwise easy interactions quite complex to handle.

I would argue that's not a limitation of the ECS (or ECW) architecture. In fact, this limitation doesn't exist in SpatialOS - any component of any entity can communicate with any other component of any other entity by sending a command (essentially RPCs), no matter where it is (in the game world, or in computational terms).

1 hour ago, hplus0603 said:

PhysX is not deterministic, so any co-simulation across borders will diverge. With enough authoritative network state snapshots, you can mash that with brute force, of course.

I have limited knowledge of physics simulation, but I understand stable simulation is not trivial, even on a single instance of a physics engine; forcing more than one physics engine to cooperate, where they aren't even aware of the other's existence, is not a matter of brute force. I refer again to this experiment we made (the video is pretty cool!). But the broader point is that a SpatialOS game developer doesn't even have to think about this - no discussion about whether to use brute force or something more subtle, no code to deal with this or with anything related to co-simulation.

1 hour ago, hplus0603 said:

Using words such as "invisible to the developer" or "without having to think about distribution" sounds great in marketing, but ends up not actually being helpful to the end developer.

But this is pretty much true in the case of SpatialOS, and I say this both as a game developer and as a hardcore software engineer, not as a marketing person.
As discussed above, there is a bit of a paradigm shift required, but you really don't have to think in terms of implementing or running a distributed system; game logic involves workers receiving state updates, doing whatever computation they need, and sending back state updates. There really isn't any need to write networking code, or any kind of manual synchronization code, or, in general, even to be aware that you're making a massively distributed game rather than a single-player game running on a single machine, except in the broadest terms (e.g. commands may fail, so you may need to add some custom retry logic if you're not happy with the defaults).

Don't take my word for this; you can play actual games built on SpatialOS right now (Worlds Adrift, Lazarus). You can download the SDK and play with the Wizards demo or the Pirates tutorial right now. It's all freely available, fully documented, in production, and with enough examples and starter projects to get you started (github.com/spatialos).

##### Share on other sites
hplus0603    11356

Quote

game logic involves workers receiving state updates, doing whatever computation they need, and sending back state updates

I know what you're talking about. I built a system that had many similar properties, including this programming model for entities. Our sales people used the same marketing claims.

It turns out, there are things developers do "as a matter of course" that end up generating way too much RPC traffic to scale well. Developers need to know what the distribution decomposition is, if they want to get anywhere near (say, within an order of magnitude of) the theoretical maximum performance of the system. Naive developers, even using your carefully crafted API that attempts to make developing distributed objects easy, and "hiding" RPC/messaging, WILL flood your system to the point where scalability is 1/100th of what it should be.

Further, developers will assume that RPC or events are reliable AND bounded in time. As you know you can't get both of those at the same time across a lossy network ("two generals problem.")

If, in the context of "paradigm shift," you suggest that developers also need to train themselves to know about these things, then yes, inside the paradigm you live inside the paradigm. But that paradigm includes limitations that are imposed by your particular distribution model. That's an unavoidable outcome of distributed games, and it's what makes distributed games (and other systems) an order of magnitude harder to work with than in-RAM single-player games, and pretending that they're the same does nobody any favors. (Except possibly salespeople on commission who would prefer to close deals early over closing the right deals -- luckily, I've managed to avoid most of those in my life!)

##### Share on other sites
drainedman    104

This is how my tiny brain understands the problem. Ultimately locality is at the core of it (server side).

Suppose an entity is processed 30 times in 1 second. In order for this entity to process its behaviour within that slice of time it must take in information of its nearby space. For example a brick falling through space will need to know about its neighbours so it can go bumpity-bump-bump with other bricks. We can cheat a bit by generally disregarding entities a long way away to reduce the number of interactions from squared to linear.

When working within a single process we can do entity neighbour lookups very quickly as RAM access is pretty quick (we have seen this used to great effect with CUDA PhysX demos and so on). 10,000 entities will mean 300,000 neighbour queries on top of physics, behaviour calculations etc per second. Quite manageable within one process.
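For anyone who wants to see the in-process neighbour lookup spelled out, here is a minimal sketch in Python. It is not taken from any engine or from SpatialOS; the uniform-grid approach, the cell size and every function name are assumptions made purely for illustration. It also reproduces the 10,000 entities x 30 Hz = 300,000 queries per second arithmetic.

```python
from collections import defaultdict
from itertools import product

CELL = 10.0  # assumed cell size in metres (hypothetical value)


def cell_of(pos, cell=CELL):
    """Map a 2D position to integer grid-cell coordinates."""
    return (int(pos[0] // cell), int(pos[1] // cell))


def build_grid(positions, cell=CELL):
    """Bucket entity ids by the grid cell they fall in."""
    grid = defaultdict(list)
    for eid, pos in positions.items():
        grid[cell_of(pos, cell)].append(eid)
    return grid


def neighbours(eid, positions, grid, radius, cell=CELL):
    """Return ids of entities within `radius` of `eid`.

    Only cells that can overlap the query radius are scanned,
    so far-away entities are never touched.
    """
    x, y = positions[eid]
    cx, cy = cell_of((x, y), cell)
    reach = int(radius // cell) + 1  # how many cells to scan in each direction
    found = []
    for dx, dy in product(range(-reach, reach + 1), repeat=2):
        for other in grid.get((cx + dx, cy + dy), ()):
            if other == eid:
                continue
            ox, oy = positions[other]
            if (ox - x) ** 2 + (oy - y) ** 2 <= radius ** 2:
                found.append(other)
    return found


def queries_per_second(entities, tick_hz):
    """One neighbour query per entity per tick."""
    return entities * tick_hz


positions = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (50.0, 50.0)}
grid = build_grid(positions)
print(neighbours(1, positions, grid, radius=5.0))
print(queries_per_second(10_000, 30))
```

Note that the cost of each query still grows with the number of entities actually found nearby, so the query count alone understates the work in crowded areas.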
However, to scale up to more entities we want to split the workload across two or more nodes and the processes cannot access each other's RAM. Non-local entities (entities from a different process) must talk to each other by some other medium. Quick, large distributed shared memory isn't an option with current hardware (although surely some hardware guru could build it), so we use something like standard networking. Because of this our communication speed between non-local entities has dropped by a factor of about 200 or worse.

To compound this latency we find that highly dynamic environments such as MMOs will vary the load which can mean high volumes of traffic in concentrated areas. Some entities will travel insanely fast across multiple nodes (speeding bullets, airplanes, cars). Also some very selfish entities are particularly inconsiderate and want to exchange HIGH volumes of data and want to do it instantly. Transactions in market places spring to mind. Or perhaps a car with a 10 hour pre-planned route. Or an entity with a daily schedule.

We can't really ever get around these limitations until we (somehow) increase the speed of access across all nodes to be as quick as if they were all a single unified node. Meanwhile designing the game/simulation is critical to making the whole experience balance out. I don't even think it's possible in the generic sense to make a scalable MMO - with current hardware.

Actors, services, workers, etc I view as a kind of syntax flim-flam, froo-froo. It does little to address the limitations.

My personal pet favourite tech to tackle this problem is MPI https://en.wikipedia.org/wiki/Message_Passing_Interface

Edited by drainedman
speling

##### Share on other sites
hplus0603    11356

You are not wrong :-)

Quote

We can cheat a bit by generally disregarding entities a long way away to reduce the number of interactions from squared to linear.

It's actually still quadratic, but quadratic in a smaller number (number of entities divided by number of servers, times number of entity copies needed for cross-border resolution.) Similarly, a single locality query for "nearby" objects is not a constant cost, but actually has a cost that is linear in number of neighbors. While there may be 300,000 queries for 10,000 entities at 30 Hz, each query may return more than one entity, and thus may cost more than "1" along a few cost metrics (storage, memory touched, entities to check against, etc.)

Quote

My personal pet favourite tech to tackle this problem is MPI

MPI lets you send messages between processes, using non-lossy but also not-real-time-aware TCP RPC. This is a useful primitive to use when building distributed systems, but it doesn't really get at the real question, which is "how do you structure your game design to make best use of distributed servers, and avoid placing undue burden on the server system that you have?"

It seems like SpatialOS implements a particular kind of trade-off and an API to support developing entities for that trade-off. Other systems do a similar thing, reaching different conclusions with different base assumptions. This is why "how can I compare these different platforms?" is such a hard question, because it totally depends on the specifics of your game. Farmville works great on plain Amazon EC2 web server instances. Unreal Tournament, not as much.

##### Share on other sites
drainedman    104

13 minutes ago, hplus0603 said:

This is a useful primitive to use when building distributed systems, but it doesn't really get at the real question, which is "how do you structure your game design to make best use of distributed servers, and avoid placing undue burden on the server system that you have?" It seems like SpatialOS implements a particular kind of trade-off and an API to support developing entities for that trade-off.
Other systems do a similar thing, reaching different conclusions with different base assumptions. This is why "how can I compare these different platforms?" is such a hard question, because it totally depends on the specifics of your game. Farmville works great on plain Amazon EC2 web server instances. Unreal Tournament, not as much.

Indeed MPI is just a tool and not a complete solution. SpatialOS is a "solution" platform but it remains to be seen that it can do the more demanding applications such as a huge Unreal Tournament, for example. Of course as you pointed out we have seen this kind of tech many times before. Pikkoserver, Shinra spring to mind as the latest failures.

I don't think it's an impossible feat as such, just a bit limiting with current hardware. I don't really know! I suspect the answer lies more in a hardware solution than in software, however.

##### Share on other sites
Kylotan    10005

14 hours ago, ggambett said:

17 hours ago, Kylotan said:

most developers [...] struggle significantly with creating clear partitions between components [...] That's arguably why the 2 major engines allow full communication between arbitrary entities and components - the alternatives make otherwise easy interactions quite complex to handle.)

I would argue that's not a limitation of the ECS (or ECW) architecture. In fact, this limitation doesn't exist in SpatialOS - any component of any entity can communicate with any other component of any other entity by sending a command (essentially RPCs), no matter where it is (in the game world, or in computational terms).

I think hplus0603 already touched upon this, but the problem is that this decomposition never comes for free. When you split logic over multiple objects you have to decide how and when those objects communicate, and what gets communicated. It's rare that any desired communication is impossible but it's common that certain communications become more verbose, more complex, slower, or all of the above.

With a distributed simulation the problem gets worse. If one component has to send an asynchronous message to another to get data, that is quick but complex. If you wrap the asynchronous message and the receipt of a response into an RPC call, that is simple but slow. I see that your commands use a Request/Response pair, which is basically the former approach, and is exactly what we used at the first MMO company I worked at. It is elegant at the networking and simulation level, and complex at the game logic level.

Again, I'm not criticising the SpatialOS technology - just stating that it really does require a mental shift to implement things using such a model and that developers still have to think like a network programmer even if they never have to worry about packets or serialisation.

##### Share on other sites
flodihn    281

On 20/06/2017 at 7:31 AM, drainedman said:

I can't afford $50k but I am planning on building a beowulf cluster to support a kind of MMO with 100k+ entities (not all human but all are persistent).

I'm just going to go for a simple method - each 2 km sq region is handled by one process. Every process sends sync messages to its neighbours via network messaging. Only edges of the regions are kept in sync. I'd like to run this over a VLAN to separate network channels.

Of course this has lots of limitations but if I get this far then I will implement some load balancing e.g. split the regions with heavy loads into 1km sq regions, etc.

So long as not too many objects converge in one spot and do too many interactions it might be ok.

I am currently building a mini cluster of Raspberry Pis. So far I have two of them but plan to expand to four.

Each Raspberry Pi has 4 cores, 1 GB RAM and a 16 GB SSD drive.

You could easily support 20,000 - 40,000+ concurrent players on that, but it heavily depends on the game design.
When doing single shard multiplayer games, it is critical to avoid choke points, such as having one capital city or one location for the auction house where players tend to group together.

I made my own distributed server in Erlang, and on an 8 core desktop computer I could support congestion of about 2 groups of 2000 players each at the very same spot. This would be similar to players forming a warband in WoW and running around killing stuff.
With those 2 groups of 2000 players each, the server was under heavy load but still responsive. I tried to fire up a third group of that size, but then things started to lag badly.

In another test, I spread the players evenly about the world. I had divided the world (1024x1024 m) into smaller chunks of 64x64 meters. With about 60 players in each such chunk the server was running smoothly, handling around 12,000 concurrent players with latencies between 10 - 50 ms.
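To make the chunking concrete, here is a small Python sketch of that kind of partitioning. It is not the Erlang server itself; the function names are hypothetical and only the 1024 m world size and 64 m chunk size come from the test described above. A 1024x1024 m world cut into 64x64 m chunks gives 16 x 16 = 256 chunks, and each chunk only needs to exchange state with the chunks touching it.

```python
WORLD_SIZE = 1024  # metres per side, from the test above
CHUNK_SIZE = 64    # metres per side, from the test above


def chunks_per_side(world=WORLD_SIZE, chunk=CHUNK_SIZE):
    """Number of chunks along one side of the square world."""
    return world // chunk


def chunk_of(x, y, world=WORLD_SIZE, chunk=CHUNK_SIZE):
    """Return the (column, row) of the chunk that owns position (x, y)."""
    if not (0 <= x < world and 0 <= y < world):
        raise ValueError("position is outside the world")
    return (int(x // chunk), int(y // chunk))


def adjacent_chunks(cx, cy, world=WORLD_SIZE, chunk=CHUNK_SIZE):
    """Chunks sharing an edge or corner with (cx, cy), clipped to the world."""
    side = chunks_per_side(world, chunk)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue
            nx, ny = cx + dx, cy + dy
            if 0 <= nx < side and 0 <= ny < side:
                result.append((nx, ny))
    return result


print(chunks_per_side() ** 2)      # total number of chunks
print(chunk_of(100.0, 700.0))      # owning chunk for one player position
print(len(adjacent_chunks(0, 0)))  # a corner chunk has fewer neighbours
```

A corner chunk has three neighbours and an interior chunk has eight, which is why a player standing near a chunk border is more expensive than one in the middle.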

I have not yet tested the performance on my Raspberry Pi cluster, but I am working on it now and should have some interesting results to share later this summer.

Supporting 100k+ entities where most of them are static or mostly inactive should not be very hard, but it is time consuming to build such software unless you use something already existing.

Edited by flodihn

##### Share on other sites
hplus0603    11356

You could easily support 20,000 - 40,000+ concurrent players on that

While the Raspberry Pi is amazing, it's not THAT amazing. For example, the Ethernet is a 100 Mbps interface sitting on an internal USB hub. And calling the MicroSD card a "SSD" is ... a little optimistic :-)

I bet, if players aren't doing much, and you don't have real-time simulation requirements, and bandwidth is carefully managed, you could do thousands per Pi. That's still pretty amazing!
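A back-of-envelope version of that bandwidth argument, as a Python sketch. Only the 100 Mbps link speed comes from the post above; the payload size, update rate and usable fraction of the link are made-up assumptions, and the function name is hypothetical. The 28 bytes of per-packet overhead is the IPv4 plus UDP header size (20 + 8); Ethernet framing and any server-to-server traffic are ignored.

```python
def players_per_link(link_mbps, payload_bytes, updates_per_sec,
                     utilisation=0.5, header_bytes=28):
    """Rough upper bound on players one network link can feed.

    link_mbps       -- link speed in megabits per second
    payload_bytes   -- assumed game-state bytes per update packet
    updates_per_sec -- assumed update packets sent to each player per second
    utilisation     -- assumed usable fraction of the link
    header_bytes    -- IPv4 (20) + UDP (8) header bytes per packet
    """
    usable_bytes_per_sec = int(link_mbps * 1_000_000 * utilisation) // 8
    bytes_per_player = (payload_bytes + header_bytes) * updates_per_sec
    return usable_bytes_per_sec // bytes_per_player


# 100 Mbps link, assumed 100-byte updates at 10 Hz, half the link usable.
print(players_per_link(100, payload_bytes=100, updates_per_sec=10))
```

With those assumed numbers the link alone caps out in the low thousands of players, before CPU or simulation cost is considered. Raising the update rate to 30 Hz cuts that by roughly a factor of three.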

##### Share on other sites
drainedman    104

I have always generally preferred to roll my own solutions where there is not a proven technology available. I think the reason behind this is that I like to fully understand how my system works. I also like to deal in precise quantities, I like to know what is and isn't possible. Using 3rd party stuff I can never be sure if I'm using it wrong or if the "technology" is limited.

For example Erlang gets recommended a lot but I just don't see the speed advantage of using Erlang for low latency distributed simulation for instance. Perhaps I just don't like Erlang. Maybe the supposed advantage is that it's easier to use, but I'm already invested in my own pet technology, so "it's easier" doesn't really hold true for me either.

In any case my approach is not treading new ground, which gives me some optimism. I have read papers on similar stuff done years ago.

##### Share on other sites
hplus0603    11356

For example Erlang gets recommended a lot but I just don't see the speed advantage of using Erlang for low latency distributed simulation for instance.

Erlang gives you the advantages of immutable data. It also gives you the advantages of ultra-cheap "processes" (more like "fibers" but the immutable data means you can't accidentally corrupt someone else's data.) It also gives you the advantages of being able to upgrade the running code in-place. No rolling re-starts, no socket close and reconnects, just keep on going with the new version of the code. (The immutable-isolated-data concept makes writing these in-RAM migrations possible.) Finally, it gives you very small-size, threaded garbage collection, because each little micro-process has its own heap that's collected in isolation.

The upgrade-in-place is actually quite hard to do with DLL re-loads in C/C++, or in most other languages. The other bits are possible in other languages, with slightly different trade-offs. If you're interested in systems, in general, and haven't used most of those features (especially immutable-data functional programming) on previous projects, it's at least worth checking out for learning. It is a very different paradigm, though (if you haven't already used ML/Haskell/OCaml/F#) so expect the going to be very hard and uphill initially. It really takes time and effort to learn, like anything else that is actually big, different, and worthwhile.

That doesn't mean Erlang is the right choice for you. The IPC is low-latency, but cross-node communication uses TCP. The VM generates native code for execution, but the constant overhead is noticeable (Erlang is slightly slower than Java in my experience.) Not every project needs 100% uptime even through deploys and rollbacks. But, it represents a fundamentally different approach to the problems faced by distributed systems developers, and thus, it's quite worth learning in detail.

##### Share on other sites
flodihn    281
9 hours ago, hplus0603 said:

While the Raspberry Pi is amazing, it's not THAT amazing. For example, the Ethernet is a 100 Mbps interface sitting on an internal USB hub. And calling the MicroSD card a "SSD" is ... a little optimistic :-)

I bet, if players aren't doing much, and you don't have real-time simulation requirements, and bandwidth is carefully managed, you could do thousands per Pi. That's still pretty amazing!

I was referring to my cluster of 4 Raspberry Pis; maybe that was not clear in my post, sorry about that. But yeah, the numbers heavily depend on the type of gameplay you have in your game.