
How Galaxy, an In-Memory Data Grid for MMO Games, Works



#1 pronpu   Members   -  Reputation: 139


Posted 26 July 2012 - 11:56 AM

Hi.
A new blog post explains the internal workings of Galaxy, an open-source, in-memory data grid for MMO games.

Edited by pronpu, 26 July 2012 - 11:56 AM.



#2 wodinoneeye   Members   -  Reputation: 746


Posted 26 July 2012 - 10:51 PM

It has to handle transaction rollback to prevent deadlock/livelock.

Failure protection via master-node redundancy and cohesive sync imaging is needed as well.

A row-sizing policy for different data types, giving the system a hint about how large to make the 'cache rows', would also be a useful addition.
-------------------------------------------- Ratings are Opinion, not Fact

#3 hplus0603   Moderators   -  Reputation: 5155


Posted 27 July 2012 - 03:52 AM

Maybe I'm getting something wrong, but I don't understand the scalability claims. The way the system relies on broadcasts both to enforce ordering and to fetch cache lines means that it won't scale better than your network. It looks to me as if you have moved the serialization bottleneck of a single CPU/memory bus, into a single Ethernet broadcast network.

Unfortunately, Ethernet networks are slower than memory buses. I would expect that, for data-limited workloads, the system you propose will top out SOONER than an equivalent system built using unified memory.

This might be very similar to the problems that hampered and ultimately killed Project Darkstar, and that have prevented most Tuple Space implementations from gaining much traction.

I may be missing something though. Do you know of a real-world system using a similar mechanism that has actually scaled well? What previous experience made you build the system the way it is now?


enum Bool { True, False, FileNotFound };

#4 pronpu   Members   -  Reputation: 139


Posted 09 August 2012 - 03:24 AM

It has to handle transaction rollback to prevent deadlock/livelock.


Done. See docs.

Failure protection via master-node redundancy and cohesive sync imaging is needed as well.


Done. See this blog post.

A row-sizing policy for different data types, giving the system a hint about how large to make the 'cache rows', would also be a useful addition.


This is not required as cache rows are dynamically sized (each row contains one object).

#5 pronpu   Members   -  Reputation: 139


Posted 09 August 2012 - 03:31 AM

Maybe I'm getting something wrong, but I don't understand the scalability claims. The way the system relies on broadcasts both to enforce ordering and to fetch cache lines means that it won't scale better than your network. It looks to me as if you have moved the serialization bottleneck of a single CPU/memory bus, into a single Ethernet broadcast network.


That's not the way Galaxy works. Broadcasts are very rare, and are only required the first time a line is requested.

This might be very similar to the problems that hampered and ultimately killed Project Darkstar, and that have prevented most Tuple Space implementations from gaining much traction.


Project Darkstar never got to the point of supporting multiple nodes. Galaxy was designed precisely to target issues with tuple spaces. Tuple spaces use a distributed hash table, requiring 1 network hop in the common case (and in the worst case). Galaxy requires 0 network hops in the common case (at the price of more than one network hop in the worst case).
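
To make the hop-count difference concrete, here's a toy back-of-the-envelope model in plain Java. This is not Galaxy code, and the numbers and names are made up for illustration: in a DHT the item's home node is fixed by its hash, so a node that isn't the home pays one hop on every access, whereas in an ownership-based design the item migrates to whoever uses it and repeated access is free after the first transfer.

    class HopModelDemo {
        public static void main(String[] args) {
            int nodes = 16;
            long itemId = 42;
            int accessesByNode3 = 100_000;

            // DHT: the item's home node is fixed by hashing, so node 3 pays one hop
            // per access unless it happens to be the home node itself.
            int home = (int) (itemId % nodes);
            int dhtHops = (home == 3) ? 0 : accessesByNode3;

            // Ownership-based: the first access migrates the item to node 3;
            // every later access is a local RAM read.
            int ownerHops = 1;

            System.out.println("DHT hops for " + accessesByNode3 + " accesses:   " + dhtHops);
            System.out.println("Owner hops for " + accessesByNode3 + " accesses: " + ownerHops);
        }
    }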

See here for the rationale behind Galaxy's architecture.

Note that Galaxy is just a distribution layer. Its real power comes from distributed data structures built on top of it. I will soon write a blog post at highscalability.com analyzing the efficiency of a well-behaving Galaxy data-structure.

#6 hplus0603   Moderators   -  Reputation: 5155


Posted 09 August 2012 - 09:47 AM

That's not the way Galaxy works. Broadcasts are very rare, and are only required the first time a line is requested.


Even so, Ethernet is slower than RAM.

Project Darkstar never got to the point of supporting multiple nodes.


Actually, they did, but the scalability coefficient was negative. It's one of the reasons why I think trying to run "shared state" systems over networks is almost always the wrong solution.


enum Bool { True, False, FileNotFound };

#7 pronpu   Members   -  Reputation: 139


Posted 09 August 2012 - 02:30 PM

Even so, Ethernet is slower than RAM.


I'm not quite sure what you mean. The whole point behind Galaxy is reducing network access and making sure most operations access RAM only. Galaxy is meant to use the network less than other IMDGs/centralized DBs, not more.

Actually, they did, but the scalability coefficient was negative.


I don't want this to turn into a factual argument, but this is simply not true. They did run a simple experiment where they used their single-node logic in a multi-node setting (I think simply by accessing a single BerkeleyDB instance without any caching) and got negative scalability as expected. Full multi-node operation, however, was never implemented. Here's a question asked on the Red Dwarf (Darkstar's successor after Oracle had shut down Darkstar) forums on Mar 29, 2011, and answered by cyberqat, Darkstar's lead developer:
Q: Is it possible with RedDwarf server part to connect multiple servers, so that they work as a single logical unit ? I think its about clustering and load balancing but I'm not sure...
A: What you are referring to is "multi-node" operation. This is designed and intended, but not yet fully implemented.

It's one of the reasons why I think trying to run "shared state" systems over networks is almost always the wrong solution.


That's the big question. The first thing to understand, though, is what is meant by "shared state". After all, all shared-state solutions, even the shared RAM inside a single machine, are implemented using message passing. Messages are always sent as a result of contention, so contention always leads to communication, which increases latency. If you have a conceptual shared state where no contention ever occurs (suppose you have a world divided into 4 quadrants distributed to 4 machines, but, by pure chance, no object ever crosses the quadrant borders and objects across quadrants don't see or affect each other), then no messages will ever be passed and you'll have perfect linear scalability. On the other hand, if you ever need to scale any data processing (be it a game or any other application) beyond the capabilities of a single node, then the problem domain may absolutely require message passing, if more than one node would ever require access to the same data item. So contention sometimes cannot be eliminated fully, simply by the nature of your problem.
Now, whether you use "message passing" or "shared state", contention can and will occur, and will invariably increase latency. It may be true that if you use the shared-state concept to model your problem, you may be drawn to increase contention simply because it is better hidden by the model than in the message-passing idiom. However, I'd like to claim that for some problems the concept of shared state is much more natural. If you have one big game world with lots of objects that simply cannot be simulated by a single machine, you will need to break it up into pieces somehow, and being able to still treat it as a single world will be quite natural. In order to scale, though, you will still need to reduce contention. If you build your distributed world-data-structure just right, Galaxy will help you reduce contention, and decrease the number of messages passed among nodes, so that nearly all operations are executed on a single node.

Edited by pronpu, 09 August 2012 - 02:31 PM.


#8 hplus0603   Moderators   -  Reputation: 5155


Posted 10 August 2012 - 02:15 AM


Even so, Ethernet is slower than RAM.


I'm not quite sure what you mean. The whole point behind Galaxy is reducing network access and making sure most operations access RAM only. Galaxy is meant to use the network less than other IMDGs/centralized DBs, not more.


My point is that IMDGs and centralized DBs are the wrong approach. You want to physically shard in some smart way, and reduce interactions between shards, for actual scalability.


Actually, they did, but the scalability coefficient was negative.


I don't want this to turn into a factual argument, but this is simply not true. They did run a simple experiment


Interesting! That goes against what Jeff what's-his-face told me in person at GDC a few years ago, and also said publicly in their forums. I'm not surprised at all, though, as "Darkstar" seems to have been 90% marketing fluff anyway.

If you build your distributed world-data-structure just right, Galaxy will help you reduce contention, and decrease the number of messages passed among nodes, so that nearly all operations are executed on a single node.


That sounds like a very reasonable and balanced statement that I can believe!

enum Bool { True, False, FileNotFound };

#9 pronpu   Members   -  Reputation: 139


Posted 10 August 2012 - 06:35 PM

You want to physically shard in some smart way, and reduce interactions between shards, for actual scalability.


Well, I believe that sufficiently advanced technology can automate manual tasks, making them easier and usually doing a better job, while also making what was previously impossible or extremely difficult quite possible, if not easy. Otherwise, what's the use of computer science?

Edited by pronpu, 10 August 2012 - 08:27 PM.


#10 hplus0603   Moderators   -  Reputation: 5155


Posted 11 August 2012 - 05:21 AM

The problem, as I see it, is that, with physical sharding, you end up forcing interactions onto a RAM bus, which is fast.
With "object soup" -- multiple overlying objects realms, or a big shared state domain -- at least some of the interactions will be over a network, and a network is 100x slower than a memory bus, or worse.
Thus, if only 1% of your interactions are across a network, you become network limited instead of RAM limited. For some topologies, the allowable interaction fraction is even smaller, such as 0.1%.
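
A quick back-of-the-envelope on that threshold, using the 100x figure from above (the absolute cost units are arbitrary assumptions; only the ratio matters):

    class NetworkFractionDemo {
        public static void main(String[] args) {
            double local = 1.0;      // cost of a local (RAM) interaction, arbitrary units
            double remote = 100.0;   // a networked interaction, ~100x slower per the post above
            for (double p : new double[]{0.001, 0.01, 0.1}) {
                double avg = (1 - p) * local + p * remote;   // expected cost per interaction
                double netShare = (p * remote) / avg;        // fraction of time spent on the network
                System.out.printf("p=%.1f%%  avg cost=%.2f  time on network=%.0f%%%n",
                                  p * 100, avg, netShare * 100);
            }
            // At p = 1% the average cost roughly doubles and about half of all time is
            // already spent waiting on the network, which is also a shared, serializing resource.
        }
    }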

Now, consider a "radar" that wants to scan all objects within a particular geographic area. Suppose that each player has a radar. This means that you will, by necessity, transfer information about most players, to most players. This means that your network interaction fraction will quite possibly go over the 1% limit, and thus your overall networked system is more constrained than if you had just fit it all into RAM. There are many other use cases where you get similar interactions; radar is just the most obvious.

You can work around this, by building a "radar manager" that collects local radar information into a single data item, that is then shared with others, so you scale by O(radar managers) instead of O(players) -- in effect, you're micro-sharding the particular use case of radars. Then you end up with inefficiencies in the amount of data shared, because the memory-optimizing systems generally aren't distributed according to locality. After a while of doing this for each and every kind of subsystem, though, you start longing for the straightforward days of physical sharding :-)
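
A minimal sketch of that "radar manager" aggregation, with entirely made-up class names (nothing here is a real engine or Galaxy API): each region's manager folds local positions into one snapshot object, and only snapshots cross node boundaries, so cross-node reads scale with the number of managers rather than the number of players.

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    final class RadarManager {
        private final Map<String, double[]> positions = new HashMap<>();

        // Called locally, on this manager's node, for players in its region.
        void report(String playerId, double x, double y) {
            positions.put(playerId, new double[]{x, y});
        }

        // One aggregated object; this is the only thing other nodes need to fetch.
        Map<String, double[]> snapshot() {
            return new HashMap<>(positions);
        }
    }

    class RadarDemo {
        public static void main(String[] args) {
            RadarManager east = new RadarManager(), west = new RadarManager();
            east.report("alice", 10, 3);
            west.report("bob", -7, 2);

            // A player's radar merges its own region's snapshot with the neighbouring
            // one: one cross-node read per neighbouring manager, not one per player.
            List<Map<String, double[]>> visible = new ArrayList<>();
            visible.add(east.snapshot());
            visible.add(west.snapshot());   // in a real deployment this would be the remote read
            visible.forEach(s -> s.forEach((id, pos) ->
                    System.out.println(id + " at (" + pos[0] + ", " + pos[1] + ")")));
        }
    }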

A system for distributed, scalable interactions that I like better is one where you are only allowed to depend on the state of other objects from one step ago. This means that everyone simulates their next step based on the previous state of anything they interact with. Then, state is exchanged between entities that depend on other entities (this is a networked graph operation). This still may become network limited on the amount of object state, but the limitation is very manageable, because you can manage exactly what goes where, when, and it's done at a very precise and well-defined point in time (and space).
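
Here is a rough sketch of that step model in plain Java (hypothetical names, not any particular engine): every entity computes tick N+1 strictly from the tick-N snapshot, and those snapshots are the only thing that would ever need to cross the network, at one well-defined point per tick.

    import java.util.HashMap;
    import java.util.Map;

    class LockstepDemo {
        // State at the previous tick: the only state any entity is allowed to read.
        static Map<String, Double> previous = new HashMap<>();

        static double step(String id, Map<String, Double> prev) {
            // Toy rule: drift toward the average of everyone else's last-known position.
            double me = prev.get(id);
            double sum = 0; int n = 0;
            for (Map.Entry<String, Double> e : prev.entrySet()) {
                if (!e.getKey().equals(id)) { sum += e.getValue(); n++; }
            }
            return n == 0 ? me : me + 0.1 * (sum / n - me);
        }

        public static void main(String[] args) {
            previous.put("a", 0.0);
            previous.put("b", 10.0);
            for (int tick = 0; tick < 3; tick++) {
                Map<String, Double> next = new HashMap<>();
                for (String id : previous.keySet()) next.put(id, step(id, previous));
                previous = next;   // the exchange point: in a distributed run, ship these states now
                System.out.println("tick " + tick + ": " + previous);
            }
        }
    }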

Anyway, I look forward to Galaxy going down this discovery path, and coming up with "just right" data structures to minimize the cost of network limitations, and I look forward to seeing each step shared on this forum :-)

enum Bool { True, False, FileNotFound };

#11 pronpu   Members   -  Reputation: 139


Posted 21 August 2012 - 12:25 AM

Thus, if only 1% of your interactions are across a network, you become network limited instead of RAM limited. For some topologies, the allowable interaction fraction is even smaller, such as 0.1%.


I wrote a blog post on highscalability.com today analyzing the performance of a data structure distributed on top of Galaxy. It proves that if you use a data structure that fits well with Galaxy's design, far, far fewer than 0.1% of transactions will require any network hops.

#12 hplus0603   Moderators   -  Reputation: 5155


Posted 21 August 2012 - 09:59 AM

if you use a data structure that fits well with Galaxy's design


I also have no argument with that. The question is whether you can implement the actual gameplay you want with that data structure.

enum Bool { True, False, FileNotFound };

#13 andyaustin   Members   -  Reputation: 108


Posted 21 August 2012 - 02:50 PM

@pronpu, great post at HS. I have one question though - many scenarios would require dealing with border objects (items that are in one node but are spatially close to items in another node, close enough to cause interaction), so won't that require network IO? Even if we assume those types of items are shared across the two nodes, wouldn't updating them frequently result in network IO?

Edited by andyaustin, 21 August 2012 - 02:51 PM.


#14 pronpu   Members   -  Reputation: 139


Posted 22 August 2012 - 06:58 PM

wouldn't updating them frequently result in network IO?


Well, after an object is updated (once or multiple times), a read from any other node would require a network roundtrip. But items that are merely close to one another "across the border" won't cause network IO, even if clients need to see items on both sides. You can just send the data to the client from both nodes.
It's only when the items interact that inter-node IO would be required. Now, if one node initiates a transaction, all items in that transaction are transferred to that node (to be "owned" by it), and then they stay there. So, for example, if a character stands right on the border and touches an object on one side, that object will be transferred to the character's node; if the character then touches an object on the other side, this, too, will be migrated to the same node, and any further interaction with those two objects will no longer require network IO.

So, potentially, you'll have a problem if two characters stand on either side of the border, kicking, say, a ball back and forth between them. But how fast could that happen? Would your game allow them to pass the ball hundreds of times a second?
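
To put the two cases above into a toy model (hypothetical classes, not Galaxy's actual API): repeated interaction by one node pays a single transfer, while a ball kicked back and forth across the border pays one transfer per touch.

    import java.util.HashMap;
    import java.util.Map;

    final class Node {
        private final Map<Long, Object> owned = new HashMap<>();
        int transfers = 0;

        void own(long id, Object item) { owned.put(id, item); }

        // Touch an item: if another node currently owns it, migrate it here first.
        Object touch(long id, Node owner) {
            if (!owned.containsKey(id)) {
                transfers++;                              // one simulated network round trip
                owned.put(id, owner.owned.remove(id));
            }
            return owned.get(id);
        }
    }

    class OwnershipDemo {
        public static void main(String[] args) {
            Node a = new Node(), b = new Node();

            // Case 1: node A keeps touching an object owned by B: one transfer, then all local.
            b.own(1L, "chest");
            for (int i = 0; i < 1000; i++) a.touch(1L, b);
            System.out.println("transfers for 1000 touches: " + a.transfers);             // 1

            // Case 2: the ball is passed back and forth across the border: one transfer per pass.
            a.transfers = 0;
            a.own(2L, "ball");
            for (int i = 0; i < 1000; i++) { b.touch(2L, a); a.touch(2L, b); }
            System.out.println("transfers for 2000 passes:  " + (a.transfers + b.transfers)); // 2000
        }
    }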

#15 andyaustin   Members   -  Reputation: 108


Posted 22 August 2012 - 09:53 PM



Thanks for the explanation, pronpu! I'm thinking along the lines of a fast-paced game (let's say it is similar to Quake, but not exactly an FPS). If for some reason two players are handled by two nodes, and they need to know about each other's position (they both are running around, so their positions update many times per second, and the game's engine needs to do some processing on these objects), then with Galaxy the nodes would need to read data from each other every time the position changes, if I understand it correctly. I was wondering if there are some standard techniques to reduce this communication.

#16 pronpu   Members   -  Reputation: 139


Posted 27 August 2012 - 07:11 PM

If for some reason two players are handled by two nodes, and they need to know about each other's position... then with Galaxy the nodes would need to read data from each other every time the position changes, if I understand it correctly. I was wondering if there are some standard techniques to reduce this communication.


Well, if the two characters just see each other, then there's no reason why each wouldn't get updates about the other from the other's node. If they interact, then, the way Galaxy works, after the first interaction they would both be handled on the same node. So the scenario you describe could really only happen if the two interact indirectly, say by passing a ball back and forth between them. In that case, the ball would migrate from one node to the other and back. That would entail one network roundtrip for each pass. But how fast could the two characters pass the ball between them?

Edited by pronpu, 27 August 2012 - 07:12 PM.




