How Galaxy, an In-Memory Data Grid for MMO Games, Works

Hi.
A new blog post explains the internal workings of Galaxy, an open-source, in-memory data grid for MMO games.
Has to handle transaction rollback to prevent deadlock/livelock.

Failure protection via master-node redundancy and cohesive sync imaging are needed as well.

Row sizing policy for different data types to give the system a hint on how large to make the 'cache rows' also would be a useful addition.
Ratings are Opinion, not Fact
Maybe I'm getting something wrong, but I don't understand the scalability claims. The way the system relies on broadcasts both to enforce ordering and to fetch cache lines means that it won't scale better than your network. It looks to me as if you have moved the serialization bottleneck of a single CPU/memory bus, into a single Ethernet broadcast network.

Unfortunately, Ethernet networks are slower than memory buses. I would expect that, for data-limited workloads, the system you propose will top out SOONER than an equivalent system built using unified memory.

This might be very similar to the problems that hampered and ultimately killed Project Darkstar, and that have prevented most Tuple Space implementations from gaining much traction.

I may be missing something though. Do you know of a real-world system using a similar mechanism that has actually scaled well? What previous experience made you build the system the way it is now?

enum Bool { True, False, FileNotFound };

Has to handle transaction rollback to prevent deadlock/livelock.


Done. See docs.
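Roughly, user code follows a rollback-and-retry idiom like this (a simplified sketch; see the docs for the exact API and signatures):

import co.paralleluniverse.galaxy.Grid;
import co.paralleluniverse.galaxy.Store;
import co.paralleluniverse.galaxy.StoreTransaction;
import co.paralleluniverse.galaxy.TimeoutException;

public class RetrySketch {
    // Update one item atomically, retrying if Galaxy times out the transaction
    // to break a potential deadlock/livelock with another node.
    static void update(Store store, long id, byte[] newValue) throws InterruptedException {
        for (;;) {
            StoreTransaction txn = store.beginTransaction();
            try {
                store.getx(id, txn);           // acquire the line for exclusive (write) access
                store.set(id, newValue, txn);  // modify it
                store.commit(txn);
                return;                        // success
            } catch (TimeoutException e) {
                store.rollback(txn);           // undo our changes...
                store.abort(txn);              // ...release the lines, then retry
            }
        }
    }
}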


Failure protection via master-node redundancy and cohesive sync imaging are needed as well.


Done. See this blog post.


Row sizing policy for different data types to give the system a hint on how large to make the 'cache rows' also would be a useful addition.


This is not required, as cache rows are dynamically sized (each row contains one object).
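For example (again a simplified sketch; see the docs for the exact API), each put allocates a row sized to the object you store in it:

import co.paralleluniverse.galaxy.Grid;
import co.paralleluniverse.galaxy.Store;
import co.paralleluniverse.galaxy.StoreTransaction;

public class RowSizeSketch {
    public static void main(String[] args) throws Exception {
        Store store = Grid.getInstance().store();       // the grid's data store
        StoreTransaction txn = store.beginTransaction();
        // Each put() allocates a new item (a cache line, or "row") sized to its data:
        long smallId = store.put(new byte[16], txn);    // this row holds 16 bytes
        long bigId   = store.put(new byte[4096], txn);  // this one holds 4KB; no fixed row size
        store.commit(txn);
    }
}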

Maybe I'm getting something wrong, but I don't understand the scalability claims. The way the system relies on broadcasts both to enforce ordering and to fetch cache lines means that it won't scale better than your network. It looks to me as if you have moved the serialization bottleneck of a single CPU/memory bus, into a single Ethernet broadcast network.


That's not the way Galaxy works. Broadcasts are very rare, and are required only the first time a line is requested.


This might be very similar to the problems that hampered and ultimately killed Project Darkstar, and that have prevented most Tuple Space implementations from gaining much traction.


Project Darkstar never got to the point of supporting multiple nodes. Galaxy was designed precisely to target issues with tuple spaces. Tuple spaces use a distributed hash table (DHT), which requires 1 network hop in the common case (and in the worst case). Galaxy requires 0 network hops in the common case, for the price of more than one network hop in the worst case.
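To make the hop counts concrete, here is a schematic sketch in plain Java (Node is a made-up stand-in for a remote peer, not Galaxy's actual internals):

import java.util.HashMap;
import java.util.Map;

interface Node { byte[] fetch(long id); /* one network round-trip */ }

class HopCountSketch {
    // DHT-style get: the key's hash pins the item to one owner node, so every
    // access from any other node pays 1 network hop -- even for hot, read-mostly data.
    static byte[] dhtGet(long id, Node[] ring) {
        Node owner = ring[(int) Math.floorMod(id, ring.length)];
        return owner.fetch(id);   // always a round-trip
    }

    // Galaxy-style get: lines migrate to the nodes that use them, so repeated
    // access is a local hit (0 hops); only the first access touches the network.
    private final Map<Long, byte[]> localCache = new HashMap<>();

    byte[] cachedGet(long id, Node ownerOrBroadcast) {
        byte[] line = localCache.get(id);
        if (line == null) {                      // first access: 1+ hops
            line = ownerOrBroadcast.fetch(id);   // (a broadcast, only if nobody knows the owner)
            localCache.put(id, line);            // from now on: 0 hops
        }
        return line;
    }
}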

See here for the rationale behind Galaxy's architecture.

Note that Galaxy is just a distribution layer. Its real power comes from distributed data structures built on top of it. I will soon write a blog post at highscalability.com analyzing the efficiency of a well-behaving Galaxy data-structure.
That's not the way Galaxy works. Broadcasts are very rare, and are required only the first time a line is requested.


Even so, Ethernet is slower than RAM.

Project Darkstar never got to the point of supporting multiple nodes.

Actually, they did, but the scalability coefficient was negative. One of the reasons why I think trying to run "shared state" systems over networks is almost always the wrong solution.

enum Bool { True, False, FileNotFound };

Even so, Ethernet is slower than RAM.


I'm not quite sure what you mean. The whole point behind Galaxy is reducing network access and making sure most operations access RAM only. Galaxy is meant to use the network less than other IMDGs/centralized DBs, not more.


Actually, they did, but the scalability coefficient was negative.


I don't want this to turn into a factual argument, but this is simply not true. They did run a simple experiment where they used their single-node logic in a multi-node setting (I think simply by accessing a single BerkeleyDB instance without any caching) and got negative scalability as expected. Full multi-node operation, however, was never implemented. Here's a question asked on the Red Dwarf (Darkstar's successor after Oracle had shut down Darkstar) forums on Mar 29, 2011, and answered by cyberqat, Darkstar's lead developer:
Q: Is it possible with RedDwarf server part to connect multiple servers, so that they work as a single logical unit ? I think its about clustering and load balancing but I'm not sure...
A: What you are referring to is "multi-node" operation. This is designed and intended, but not yet fully implemented.


One of the reasons why I think trying to run "shared state" systems over networks is almost always the wrong solution.


That's the big question. The first thing to understand, though, is what is meant by "shared state". After all, all shared-state solutions, even the shared RAM inside a single machine, are implemented using message passing. Messages are sent as a result of contention, so contention always leads to communication, which increases latency. If you have a conceptual shared state where no contention ever occurs (suppose you have a world divided into 4 quadrants distributed to 4 machines, but, by pure chance, no object ever crosses the quadrant borders and objects across quadrants don't see or affect each other), then no messages will ever be passed and you'll have perfect linear scalability. On the other hand, if you ever need to scale any data processing (be it a game or any other application) beyond the capabilities of a single node, then the problem domain may absolutely require message passing, if more than one node would ever need to access the same data item. So sometimes contention can never be fully eliminated, simply by the nature of your problem.
Now, whether you use "message passing" or "shared state", contention can and will occur, and will invariably increase latency. It may be true that if you use the shared-state concept to model your problem, you may be drawn to increase contention, simply because it is better hidden by the model than in the message-passing idiom. However, I'd like to claim that for some problems the concept of shared state is much more natural. If you have one big game world with lots of objects that simply cannot be simulated by a single machine, you will need to break it up into pieces somehow, and being able to still treat it as a single world is quite natural. In order to scale, though, you will still need to reduce contention. If you build your distributed world data structure just right, Galaxy will help you reduce contention and decrease the number of messages passed among nodes, so that nearly all operations are executed on a single node.
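To put the quadrant example in code (a hypothetical sketch, not Galaxy API):

// The 4-quadrant world: each object lives on the node owning its quadrant,
// and a message is needed only when an object crosses a quadrant border.
class QuadrantSketch {
    static int ownerNode(double x, double y) {
        return (x < 0 ? 0 : 1) + (y < 0 ? 0 : 2);   // quadrants map to nodes 0..3
    }

    // Contention (and thus communication) arises only on a border crossing;
    // if no object ever crosses, no messages are sent and scaling is linear.
    static boolean needsMessage(double oldX, double oldY, double newX, double newY) {
        return ownerNode(oldX, oldY) != ownerNode(newX, newY);
    }
}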

Even so, Ethernet is slower than RAM.


I'm not quite sure what you mean. The whole point behind Galaxy is reducing network access and making sure most operations access RAM only. Galaxy is meant to use the network less than other IMDGs/centralized DBs, not more.

My point is that IMDGs and centralized DBs are the wrong approach. You want to physically shard in some smart way, and reduce interactions between shards, for actual scalability.


Actually, they did, but the scalability coefficient was negative.


I don't want this to turn into a factual argument, but this is simply not true. They did run a simple experiment

Interesting! That goes against what Jeff what's-his-face told me in person at GDC a few years ago, and also said publicly in their forums. I'm not surprised at all, though, as "Darkstar" seems to have been 90% marketing fluff anyway.

If you build your distributed world data structure just right, Galaxy will help you reduce contention and decrease the number of messages passed among nodes, so that nearly all operations are executed on a single node.

That sounds like a very reasonable and balanced statement that I can believe!
enum Bool { True, False, FileNotFound };

You want to physically shard in some smart way, and reduce interactions between shards, for actual scalability.


Well, I believe that sufficiently advanced technology can automate manual tasks, making them easier and doing a better job in most cases, while also making what was previously impossible or extremely difficult quite possible, if not easy. Otherwise, what's the use of computer science?

The problem, as I see it, is that with physical sharding, you end up forcing interactions onto a RAM bus, which is fast.
With "object soup" -- multiple overlapping object realms, or one big shared-state domain -- at least some of the interactions will go over a network, and a network is 100x slower than a memory bus, or worse.
Thus, if only 1% of your interactions are across a network, you become network-limited instead of RAM-limited. For some topologies, the allowable interaction fraction is even smaller, such as 0.1%.
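To put ballpark numbers on that (the latency figures here are assumptions, not measurements):

public class NetworkFraction {
    public static void main(String[] args) {
        final double ramNs = 100;       // one RAM-bound interaction: ~100 ns (assumed)
        final double netNs = 50_000;    // one LAN round-trip: ~50 us (assumed)
        final double fraction = 0.01;   // 1% of interactions cross the network

        double avg = (1 - fraction) * ramNs + fraction * netNs;
        System.out.printf("average: %.0f ns (%.1fx the pure-RAM cost)%n", avg, avg / ramNs);
        // prints: average: 599 ns (6.0x the pure-RAM cost) -- the rare network
        // hop dominates the average, which is why the fraction must stay tiny.
    }
}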

Now, consider a "radar" that wants to scan all objects within a particular geographic area. Suppose that each player has a radar. This means that you will, by necessity, transfer information about most players, to most players. This means that your network interaction fraction will quite possibly go over the 1% limit, and thus your overall networked system is more constrained than if you had just fit it all into RAM. There are many other use cases where you get similar interactions; radar is just the most obvious.

You can work around this, by building a "radar manager" that collects local radar information into a single data item, that is then shared with others, so you scale by O(radar managers) instead of O(players) -- in effect, you're micro-sharding the particular use case of radars. Then you end up with inefficiencies in the amount of data shared, because the memory-optimizing systems generally aren't distributed according to locality. After a while of doing this for each and every kind of subsystem, though, you start longing for the straightforward days of physical sharding :-)
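A sketch of the radar-manager idea (hypothetical types, just to show the shape of it):

import java.util.ArrayList;
import java.util.List;

// One manager per region aggregates its local players' positions into a single
// data item; remote nodes read O(radar managers) aggregates instead of making
// O(players^2) individual queries.
class RadarManager {
    static final class Blip {
        final double x, y;
        Blip(double x, double y) { this.x = x; this.y = y; }
    }

    final List<Blip> localBlips = new ArrayList<>();  // players hosted on this node

    // Built once per tick and shared as one item with every interested node.
    String buildSnapshot() {
        StringBuilder sb = new StringBuilder();
        for (Blip b : localBlips)
            sb.append(b.x).append(',').append(b.y).append(';');
        return sb.toString();
    }
}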

A system for distributed, scalable interactions I like better, is one where you are only allowed to depend on the state of other objects from one step ago. This means that everyone simulates their next step based on the previous state of anything they interact with. Then, state is exchanged between entities that depend on other entities (this is a networked graph operation.) This still may become network limited on the amount of object state, but the limitation is very manageable, because you can manage exactly what goes where, when, and it's done at a very precise and well-defined point in time (and space).
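In code, that model is essentially double-buffering the world state (a minimal single-node sketch; the distributed version exchanges the published state over the network at the swap):

// Each entity may read ANY state from one step ago (prev), but writes only its
// own slot of the next step (next) -- so a step never contends with itself.
class DoubleBufferedWorld {
    double[] prev;   // the published state everyone reads this tick
    double[] next;   // the state being computed for the coming tick

    DoubleBufferedWorld(int entities) {
        prev = new double[entities];
        next = new double[entities];
    }

    void step() {
        for (int i = 0; i < next.length; i++) {
            double neighbor = prev[(i + 1) % prev.length];  // depends on last step's state
            next[i] = 0.5 * (prev[i] + neighbor);           // some interaction rule
        }
        // Publish: swap buffers. In the distributed version, this is the precise,
        // well-defined point where state is exchanged between dependent entities.
        double[] tmp = prev; prev = next; next = tmp;
    }
}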

Anyway, I look forward to Galaxy going down this discovery path, and coming up with "just right" data structures to minimize the cost of network limitations, and I look forward to seeing each step shared on this forum :-)
enum Bool { True, False, FileNotFound };
