Completely distributed game logic design

Antheus · 2007-08-04T15:37:47

While trying to find efficient and scalable distributed application design, I got somewhat stuck on MPI approach. The idea here is to try to avoid any kind of explicit synchronization. [message queue] --> Object 1 // thread [message queue] --> Object 2 // thread [message queue] --> Object 3 // thread ... [message queue] --> Object n // thread with usual message loop: while (this->isAlive()) { Message *msg = next_message(); handle( msg ); } The threading doesn't really imply a thread per object (given hundreds of thousands of objects that's unviable), nor does it imply true threads (objects can be distributed over multiple machines and/or processes. With this design, each object represents equivalent of thread's own stack space. Queues would be implemented using one of asynchronous dispatching strategies (Reactor/Proactor), with central demuxer assigning objects with non-empty message queues to available worker threads. Demuxer ensures that a single objects is never handled by more than one thread at the same time (in a way similar to fibers, co-routines). But this would have perhaps the greatest impact on programming model. While setting values becomes trivial (SetValueMessage( id, value )), querying the value becomes problematic for objects that are not currently handled by the same thread (1 thread per object). This would devolve into rather verbose message passing semantics (typical combat scenario): // In player object on_attack( target, spell ) { if (this->has_spell( spell ) ) { if (this->get_energy() > spell.energy_cost() ) { send( AttackMessage, this->current_weapon(), this, target, spell ); } } } // In weapon object on_attack( attacker, target, weapon, spell ) { int base_damage = rand( this->minimum_damage(), this->maximum_damage() ); send( DealDamageMessage, target, attacker, base_damage, this->extra_damage(), spell); } // In target object on_deal_damage( attacker, base_damage, extra_damage ) { send( AbsorbDamageMessage, this->current_armor(), attacker, this, base_damage, extra_damage ); } // In armor object on_absorb_damage( attacker, target, base_damage, extra_damage ) { int adjusted_damage = base_damage * this->armor_rating() + extra_damage; send( DealAdjustedDamage, target, attacker, adjusted_damage ); } // Back in target object on_deal_adjusted_damage( attacker, adjusted_damage ) { // this also triggers the change in health value to be propagated // to remote nodes health.set( this.health.get() - adjusted_damage ); } send() has the signature of send( MessageType, SendDestination, ... ); Explicit use of this-> indicates that it's only possible to query values belonging to the object which is currently handled by the thread. While other objects may exist in same address space, their values cannot be obtained, since they might be modified by some other thread (or running in parallel on another machine, causing data inconsistency. This would be equivalent to all get_ functions being protected or private, and message handlers member functions. While this approach appears to solve concurrency issues, as well as data consistency problems (events will occur in proper sequence, but several events from different clients would not have guaranteed order of execution, possibly just bucket synchronization), the programming model is somewhat cumbersome. The number of different messages is rather large, and the current syntax isn't really well suited (but that's solvable). However, I fail to see an alternative to this. Allowing values on other objects to be queried directly requires synchronization on per-object basis. While usually not very drastic, it does represent overhead. It also cannot be done across multiple processes. If implemented in C++, individual actions would likely be in form of functions (or functors), that would pass the object reference via IOC, guaranteeing thread-safety. When passing messages between cluster nodes, the order of events would be preserved. Regardless of where the objects physically reside, there is always only one single message queue handler responsible, so this eliminates authority issues. I realize that this is merely a re-invention of similar MPI solutions or lambda calculus, but it's turned out somewhat hard to find practical alternatives to classic single-threaded design, or at least any attempt at reducing required locking. There are obvious problems even with this design. One example would be buying an item from a vendor: A ------ HasItemMessage ---> Vendor | A <-- ItemAvailableMessage ----+ | // Here another player buys item +---- BuyItemMessage ------> Vendor | + // Although vendor said they have the itme, it's no longer available, // making the whole query process somewhat redundant, and we could go with // BuyItemMessage from the start But I consider these higher level problems. Either that, or allowing for some form of soft locks, similar to way files are sometimes used to limit access to a directory or file, where presence of a lock. file means the resource is currently in use. Anyway, I'm just brainstorming, perhaps someone will find a fatal flaw in this whole concept that makes this useless in the first place.

Antheus

2,410

Author

July 19, 2007 06:42 AM

Quote:I was going to object that this scales by n-squared in total number of objects on the cluster, which isn't good for scalability, but you already noted that. This is the main reason why I think geographic division is still a necessity for anything physically based and real-time.

This might not be necessary.

Fully distributed collision detection is likely not viable. But there's frequently little need to do so.

For a semi-real-time application, an approximation of CD will likely suffice (with clients at 250ms round-trip time, anything more is unrealistic). CD can then be performed on dedicated servers, each of which serves a geographic area (these can be co-located alongside other nodes).

When an entity is moved, its host notifies the appropriate server about where the movement will take place. CD Server verifies for collisions, and notifies clients back. Calculation of CD is still O(n^2), but network traffic in this case is only O(2n).

But perhaps more importantly, CD is a problem well suited for different model - MapReduce for example. It operates on a fixed set of input data, performs a monolithic calculation depending on the input set only, and produces a set for output. Unlike game logic, which operates on a large variable set of objects, each of which is active only for a fraction of time.

The model I proposed was actually somewhat mis-leading with respect to MPI. While messages are used in communication, the control flow is the same as in agent based architectures. Each command a controller invokes creates an artifact (an agent), which then moves between objects, operating on two sets of data - command's own state (parameters), and object's state.

I was primarily concerned with lower level details of finding a solution suitable for both, multi-core and networked distribution, so I kinda lost the big picture view. There hasn't been much association of agent based systems with MMOs, so that connection escaped me at first.

In the above combat example, a CombatAgent would be sent between objects, it would accumulate the resolution of combat, and travel between all the affected nodes (player->weapon->target->armor->target). Fortunately, agent systems are fairly well documented, making further design somewhat simpler, or at least provide reference points. It also makes the logic and message semantics somewhat more intuitive.

hplus0603

11,916

July 19, 2007 07:57 AM

Quote:with clients at 250ms round-trip time, anything more is unrealistic

Our OLIVE(tm) architecture does 30 Hz physical updates in an MMO context, at client round trip latencies up to 1200 milliseconds. We keep messaging within the server cluster down to less than a step duration, total, but we also don't need to send interactions across the servers, due to the way we design our zoning and ghosting. But it's also somewhat overkill for the kind of scenario you describe, which sounds much more RPG-like.

enum Bool { True, False, FileNotFound };

Antheus

2,410

Author

July 19, 2007 05:29 PM

Quote:Original post by hplus0603
Quote:with clients at 250ms round-trip time, anything more is unrealistic

Our OLIVE(tm) architecture does 30 Hz physical updates in an MMO context, at client round trip latencies up to 1200 milliseconds. We keep messaging within the server cluster down to less than a step duration, total, but we also don't need to send interactions across the servers, due to the way we design our zoning and ghosting. But it's also somewhat overkill for the kind of scenario you describe, which sounds much more RPG-like.

Perhaps I steered off track by mentioning DHT - I merely wanted to point out the transparency of the design.

For all practical purposes the server runs typical zoned architecture, with logic most likely running in a single thread per process (possible like a co-routine, so it can be suspended and resumed as needed).

As such, there's no reason for parallel tasks running different types of processing.

The latency of 250ms I mentioned applied more to practical problems (how long till player is rubber-banded back after collision has been resolved). For a car simulation, latency would inevitably play a role - or at very least, I'm not aware of any latency independent mechanism that would allow true real-time physics without encountering some artifacts (apart from fixed step simulation).

Antheus

2,410

Author

July 21, 2007 05:03 PM

Meh, guess I invented Actor model

Oh well, at least I can look at Erlang's syntax, and map its concepts into Lua and C++.

The only benefit lies with implementation, whereas erlang's threads supposedly occupy 300 bytes, I can currently get by with under 80 bytes/thread. Guess I'll toy around with this a bit more to see how it works out.

hplus0603

11,916

July 22, 2007 09:53 AM

An alternative is to bind your messaging to fibers (a.k.a. cooperative threads, or co-routines). That way, you can send a message, and block until you get a reply.

However, no matter how you wrap it, synchronous cross-object calls will lead to deadlocks unless you enforce pretty stringent ordering rules (lock hierarchies). As you've discovered, you don't want to put the actions in the objects anymore; instead, the actions are visitors that get marshaled around.

Which, in turn, gets into a whole other bag of transactional update problems, which in turn have been researched in the clustered database space for a long time. You'll probably end up with not only actors, but also quite possibly transaction monitors.

enum Bool { True, False, FileNotFound };

Antheus

2,410

Author

July 22, 2007 10:42 AM

Quote:An alternative is to bind your messaging to fibers (a.k.a. cooperative threads, or co-routines). That way, you can send a message, and block until you get a reply.

I've got that already. It's software based lock-free event demuxer (software-based, since I strongly doubt any OS would look kindly upon allocating millions of kernel objects for this kind of tasks), that ensures that each object is executed by a single thread at a time only.

Quote:However, no matter how you wrap it, synchronous cross-object calls will lead to deadlocks unless you enforce pretty stringent ordering rules (lock hierarchies). As you've discovered, you don't want to put the actions in the objects anymore; instead, the actions are visitors that get marshaled around.

This is why I'm disallowing synchronous calls altogether. Message dispatch is non-blocking, uni-directional, and foreign objects aren't accessible, outside of their handle. So all objects references that individual objects hold are just pointers that cannot be dereferenced, but can be sent a message.

The final version of the model is essentially equivalent to stackless python, and the programming model is identical to organization presented in an example, covering even collision detection and other aspects.

Quote:Which, in turn, gets into a whole other bag of transactional update problems, which in turn have been researched in the clustered database space for a long time. You'll probably end up with not only actors, but also quite possibly transaction monitors.

I've noted transaction problems in the beginning. What I still cannot find is an example of where I'd need transactions. While I'm sure there are examples of certain actions which might require some strict sequencing, I haven't been able to find examples of it in the game logic I've currently worked out. This means I'm either missing a huge thing, or that such situations are rare (perhaps not in general case, but in rpg-ish oriented logic).

Bank problem: "5 people want to withdraw $100 each from an account with $275". 5 messages are sent to bank object. First 2 that arrive generate a return message containing amount withdrawn (receivers then increase their own money value in response to on_receive_money message), other 3 receive "not enough funds" message.

Whether the order here was correct is unanswered. Did the right 2 people receive the money? Since the client latency is not defined, I cannot say who really did something first. Like I said, perhaps I'm missing something... When a resource is accessed on first-come-first-serve basis, there's simply no way to be fair about it (it could come to nanoseconds or instruction cache ordering on the client-side).

Then again, agent-based systems usually provide a blackboard. Even if explicitly or implicitly locked, it would only be needed to cover special cases.

Part of the reasoning I'm applying here (it is a non-deterministic system after-all), is that as long as client's actions make sense to client (kick, punch, move doesn't become punch, die, move, kick), the very latency of such worlds adds non-determinism to "absolute" sequence of events.

hplus0603

11,916

July 25, 2007 11:35 AM

That's the simplest bank problem, with only one resource. However, it still needs transactions. Consider:

1) Action A withdraws $100.
2) system crashes

Where did the $100 go? They went into thin air.

A better example, very RPG-ish: Two players want to trade; $100 for a Vorpal Blade of Pwnage.

So, the operation needs to do the following things, with a partial ordering:

1) withdraw $100 from player 1
2) withdraw a Vorpal Blade of Pwnage from player 2
3) deposit $100 to player 2
4) deposit a Vorpal Blad of Pwnage to player 1

Now, the system can crash anywhere after #1 has executed, but #4 has not committed to hard storage. Given that player 1 and player 2 can actually live on different machines, it's even a distributed transaction.

Btw: a similar bank problem is "user A wants to transfer $100 from acct 1 to acct 2; user B wants to transfer $100 from acct 2 to acct 1" which may lead to deadlock, or may lead to distributed transaction failure.

enum Bool { True, False, FileNotFound };

Antheus

2,410

Author

July 25, 2007 01:34 PM

Unlike pure actor models, all data I have exists in tangible space, not in some network ether. As such, money can't disappear, unless deliberately doing so.

In many ways, almost everything should be a transaction. Firing a gun disposes a bullet locally, and converts the bullet into damage at the client. Unfortunately, while databases offer full transactional support, logic in a database isn't viable for lower-end designs.
This is why I separate into 3 types of transactions:
- Irrelevant (actions which may as well get lost, movement, chat, many combat actions)
- Important (item consumption, firing a gun, losing a bullet). These can be handled through basic roll-back techniques, for performance reasons, these can also be handled in a soft manner for each type of transaction independently, possibly with acks.
- Critical (cross-process object migration, trade) - handled individually through lower-level RPC-style messagin, or through creation of transaction-like objects (TradeBroker, persisted, handles the transaction and commit logic). These must explicitly succeed before commits are made.

While developing a fully transacted system would be nice, even the highest end databases would struggle at typical number of interactions. Perhaps Project Darkstar will prove this wrong, or perhaps I over-estimate the exact workload. Then again, that project hasn't yet been tested in a true virtual world style model.

I'm sure that a fully distributed transacted system would be great. But there's also another type of problem. Virtual worlds, unless explicitly zoned, act as one single application. Failure of any part of cluster, for the purpose of consistency, would need to block entire application. If for nothing else, then simply for butterfly effect. How can I know, in general case, that if a mob doesn't spawn 100 miles away, I will not be affected at some later point?

For example, space simulation. If the earth node goes down, what happens to moon? Does it continue rotating or does it continue in straight line? May I assume that Earth is still there where it was? What if node didn't go down, but it was Vorgon fleet? Is it even possible to assume anything here, or is the only solution to freeze entire cluster?

My system is based on some assumptions. Interactions between objects are on a very local scale. Trades will almost exclusively be performed on same node. Objects beyond LoS will not affect me. Most actions carry no long-term effect. Most actions are short-lived, with well known side-effects.

Special class of problems I do need to take care off fall into containment category. Trades, cluster migration and object deletion are all perfect examples. Fortunately, they are also quite limited, and most of the system can be oblivious to this.

hplus0603

11,916

July 25, 2007 02:48 PM

Quote:all data I have exists in tangible space

Ah, but then you have to either centralize, and have a limitation on scalability, or solve the distributed transaction problem.

In OLIVE, we federate all the data across multiple database instances, each of which stores a horizontal slice of the database. If one server goes down, then transactions involving users on that server will temporarily stall or fail, but the rest of the system works fine. I bet some of our users have 20+ separate database instances, so only 5% of users would be affected by a single machine failure.

Project Darkstar, as it currently stands, uses a single database back-end, and won't even do object inflation caching -- each object is fully de-serialized, acted on, and then re-serialized, for each operation. I doubt that'll scale all that well. They've certainly indicated that they don't expect real-time changes (like terrain queries, physics, etc) to be part of the Darkstar back-end, but instead solved "outside the box."

enum Bool { True, False, FileNotFound };

Antheus

2,410

Author

August 04, 2007 09:27 AM

Quote:Original post by hplus0603Btw: a similar bank problem is "user A wants to transfer $100 from acct 1 to acct 2; user B wants to transfer $100 from acct 2 to acct 1" which may lead to deadlock, or may lead to distributed transaction failure.

I've worked out a seemingly reliable solution to transactions.

All logic used during run-time exists in transient memory (read from database, then kept on servers in memory).
Objects that exist are based on component model, with each object possibly being a separate process.

From practical perspective, each message handler is atomic. A message on an object is either executed fully (and possibly committed to database), or it didn't happen.

However, object ownership transfers are inherently non-atomic, and require, by definition touching multiple objects.

As such, I define a transaction over set of tuples [ TRANSFER( X, A, B ) ], where X is the item being moved from parent A to new parent B. These transactions are transmitted to a transient transaction manager object. TM is responsible for committing this set of transactions to database, and notifying the clients about changes after this has succeeded or failed.

The containment table is a standalone table, that handles only 3 types of transactions :
- object creation (just creation, not populating with values)
- hierarchy changes (transferring items between containers)
- object destruction (special case of transfer where destination container is a dummy pre-defined container).

Trading scenario: A TM object is created (in any process, on any machine). Plyaers submit item references for trading, and confirm. TM commits the transaction for all items (pre-conditions A owns X, post-condition B owns X) for every tuple in transaction. After transaction succeeds, all peers participating in transaction are notified.

If any of the peers crashes at this point, it doesn't matter. They'll need to restore state from DB anyway. If either network or TM fail after committing, but before peers received confirmation, state is temporarily de-synched. However - even if clients were to think they still own the items, they can't trade them anymore - on next attempt, the TRANSFER would fail on pre-condition.

Last problem to address here is the asynchronous nature of transactions. Obviously, after an object has initiated TRANSFER, but before they receive the confirmation, they are still free to modify the object. A somewhat trivial solution to this is locking. When an object is submitted for transaction, it's flagged with transaction ID. If such ID exists, all messages, with exception of transactions will fail. This can be incorporated at very component level.

There is only one type of lock, and object can only hold one lock at a time (no dead-lock, only timeout, no system-wide blocking either).

This leaves me with two types of atomic operations.
- Operations on a single object
- Ownership change for a set of objects (synchronized group CAS, pre-condition checked in database)

These two primitives are sufficient to maintain referential integrity of transient logic state, and, even in case of crash, ensure that possibly inconsistent state has no effect on persistent storage.

Logic inconsistencies can still potentially occur - but that would be due to coding errors, by deliberately breaking the object ownership (passing persisted data by value to another object which stores it locally). However, as long as the ownership is respected, I believe that this scheme provides adequate integrity.

As for scalability - ownership change is assumed to not be the dominant operation, and it resides on a separate table, clustered database (if necessary), separately of actual object data. Assuming some 100 bytes for each transaction message per object TRANSFERred, it allows for 100,000 transactions per second before saturating the network (100Mbit). At this point, this sounds reasonable (Eve apparently performs 1250 transactions per second during peak times). With additional knowledge of types of transactions, this can probably be optimized further.

Ownership transfer for non-persisted objects doesn't need to go through database. Money (or other discrete transferable units) is treated as object for purposes of this mechanism.

Completely distributed game logic design

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Completely distributed game logic design

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines