Completely distributed game logic design

While trying to find an efficient and scalable distributed application design, I ended up gravitating toward an MPI-style approach. The idea here is to avoid any kind of explicit synchronization:
[message queue] --> Object 1 // thread
[message queue] --> Object 2 // thread
[message queue] --> Object 3 // thread
              ...
[message queue] --> Object n // thread

with the usual message loop:
while (this->isAlive()) {
  Message *msg = next_message();
  handle( msg );
}

The threading doesn't really imply a thread per object (given hundreds of thousands of objects, that's unviable), nor does it imply true threads (objects can be distributed over multiple machines and/or processes). With this design, each object represents the equivalent of a thread's own stack space. Queues would be implemented using one of the asynchronous dispatching strategies (Reactor/Proactor), with a central demuxer assigning objects with non-empty message queues to available worker threads. The demuxer ensures that a single object is never handled by more than one thread at the same time (in a way similar to fibers or co-routines).

But this has perhaps the greatest impact on the programming model. While setting values becomes trivial (SetValueMessage( id, value )), querying a value becomes problematic for objects that are not currently handled by the same thread (one thread per object). This devolves into rather verbose message-passing semantics (typical combat scenario):
// In player object
on_attack( target, spell ) {
  if (this->has_spell( spell ) ) {
    if (this->get_energy() > spell.energy_cost() ) {
      send( AttackMessage, this->current_weapon(), this, target, spell );
    }
  }
}
// In weapon object
on_attack( attacker, target, spell ) {
  int base_damage = rand( this->minimum_damage(), this->maximum_damage() );
  send( DealDamageMessage, target, attacker, base_damage, this->extra_damage() );
}
// In target object
on_deal_damage( attacker, base_damage, extra_damage )
{
  send( AbsorbDamageMessage, this->current_armor(), attacker, this, base_damage, extra_damage );
}
// In armor object
on_absorb_damage( attacker, target, base_damage, extra_damage )
{
  int adjusted_damage = base_damage * this->armor_rating() + extra_damage;
  send( DealAdjustedDamageMessage, target, attacker, adjusted_damage );
}
// Back in target object
on_deal_adjusted_damage( attacker, adjusted_damage )
{
  // this also triggers the change in health value to be propagated
  // to remote nodes
  health.set( this->health.get() - adjusted_damage );
}


send() has the signature send( MessageType, SendDestination, ... ). The explicit use of this-> indicates that it's only possible to query values belonging to the object currently handled by the thread. While other objects may exist in the same address space, their values cannot be obtained, since they might be modified by some other thread (or be running in parallel on another machine, causing data inconsistency). This would be equivalent to all get_ functions being protected or private, and message handlers being member functions.

While this approach appears to solve the concurrency issues as well as the data-consistency problems (events will occur in proper sequence, though several events from different clients would not have a guaranteed order of execution; possibly just bucket synchronization), the programming model is somewhat cumbersome. The number of different messages is rather large, and the current syntax isn't really well suited (but that's solvable). However, I fail to see an alternative. Allowing values on other objects to be queried directly requires synchronization on a per-object basis. While usually not very drastic, it does represent overhead, and it cannot be done across multiple processes. If implemented in C++, individual actions would likely take the form of functions (or functors) that would receive the object reference via IoC, guaranteeing thread safety.

When passing messages between cluster nodes, the order of events would be preserved. Regardless of where the objects physically reside, there is always only one message-queue handler responsible, which eliminates authority issues. I realize that this is merely a re-invention of similar MPI solutions or lambda calculus, but it's turned out somewhat hard to find practical alternatives to the classic single-threaded design, or at least any attempt at reducing the required locking.

There are obvious problems even with this design. One example would be buying an item from a vendor:
A ------ HasItemMessage ----> Vendor
                                |
A <-- ItemAvailableMessage ----+
|                       // here another player buys the item
+---- BuyItemMessage -------> Vendor
                                |
                                +
// Although the vendor said they have the item, it's no longer available,
// making the whole query process somewhat redundant -- we could go with
// BuyItemMessage from the start.
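A minimal sketch of that optimistic variant, resolving the purchase entirely at the vendor (handler and message names here are illustrative, not fixed):

// In vendor object
on_buy_item( buyer, item_id ) {
  if ( this->has_item( item_id ) ) {
    this->remove_item( item_id );
    send( ItemSoldMessage, buyer, item_id );
  } else {
    send( ItemUnavailableMessage, buyer, item_id ); // lost the race
  }
}

Since only the vendor's own thread ever runs this handler, the check-then-remove pair is atomic by construction; the race simply resolves in arrival order.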


But I consider these higher-level problems. Either that, or allow some form of soft locks, similar to the way files are sometimes used to limit access to a directory or file, where the presence of a lock file means the resource is currently in use. Anyway, I'm just brainstorming; perhaps someone will find a fatal flaw in this whole concept that makes it useless in the first place.

I'd stick with messages given the requirements. Maybe a different syntax. Dunno if you could use RPC calls instead to reduce the bloat, but that's an option.

"When passing messages between cluster nodes"

Replication of the world state (updates) on the separate machines in a cluster can become significant overhead (multi-cores could share the same memory space).
Spatial partitioning, residence of objects, and probably 'local' execution queues would likely be more efficient (though this adds boundary-transition problems).


The project I'm working on likely has much heavier AI processing, so the communications-overhead ratio would be much lower. With the large number of cluster nodes I expect, it would be costly for each to maintain the world state, so each high-AI object has a window onto the 'chunks' of the world surrounding it (I'm still investigating the trade-offs of sharing the chunks between objects that overlap in the same space). The central 'bookkeeping' master copy of the world space itself is spread over many machines. I'm still investigating splitting the simulation mechanism across multi-cores as well as multiple machines.

Multi-cores help when you are compute bound, but beware that the multi-cores share a single memory bus to main RAM. That means that for the cases where you are memory access bound, multiple cores are no faster than a single core (and may be slower because of contention).

To take advantage of heavily multi-cored CPUs (Sun Niagara2 -- 64 hardware threads!) I believe we need a totally different programming paradigm (message passing / state machines), and we don't really have the languages and tools (like debuggers!) to make good use of that paradigm yet. Also, I think you'll want to use L2 cache as a message queue, so you can avoid going through main RAM for the messaging; currently there's not quite enough control to make sure that that happens. Lockable/assignable L2 cache is probably in our future.

Quote:
To take advantage of heavily multi-cored CPUs (Sun Niagara2 -- 64 hardware threads!) I believe we need a totally different programming paradigm (message passing / state machines), and we don't really have the languages and tools (like debuggers!) to make good use of that paradigm yet. Also, I think you'll want to use L2 cache as a message queue, so you can avoid going through main RAM for the messaging; currently there's not quite enough control to make sure that that happens. Lockable/assignable L2 cache is probably in our future.


I spent some time verifying and implementing the proposed system, and while I do not have such huge hardware, that doesn't affect the solution itself.

The problem is formalized with a very simple primitive:

class ObjectPtr {
public:
  void send( Message msg ) {
    if ( is_local ) {
      m_object->messagequeue.push( msg );
      MARK_PENDING( m_object ); // notify demuxer that m_object needs attention
    } else {
      socket.send( msg );       // remote object: forward over the network
    }
  }

private:
  bool    is_local;
  Socket  socket;   // connection to the node hosting the remote object
  Object *m_object;
};

class Object {
public:
  void process_pending() {
    int n = 10; // process at most n messages, then yield to others
    while ( !messagequeue.empty() && n-- > 0 ) {
      handlemessage( messagequeue.pop() );
    }
  }

  MessageQueue messagequeue;
  State        object_local_state;
};


The processing model results in this:

---Socket-->[Demuxer]-------->{Objects}
     ^                           | // send
     +---------------------------+------>[Serializer]--->{Net}


The serializer runs in its own thread, and the demuxer hands out worker threads to objects with non-empty message queues -- most likely only one or two workers per processor. (As a special case, a single worker can run the whole thing as a completely single-threaded implementation with no change to the code.)
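To make that concrete, here is a minimal single-process sketch of the demuxer/worker arrangement (my reconstruction: it uses a plain mutex and condition variable for clarity where the design above calls for a lock-free queue, and it assumes Object gains an atomic<bool> scheduled flag):

#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>

class Demuxer {
public:
  // Called via MARK_PENDING from ObjectPtr::send(). The exchange() ensures
  // an object is queued at most once, so no two workers ever run the same
  // object in parallel.
  void mark_pending( Object *obj ) {
    if ( obj->scheduled.exchange( true ) )
      return; // already queued or running
    std::lock_guard<std::mutex> lock( m_mutex );
    m_pending.push( obj );
    m_cv.notify_one();
  }

  // Each worker thread runs this loop.
  void worker_loop() {
    for (;;) {
      Object *obj;
      {
        std::unique_lock<std::mutex> lock( m_mutex );
        m_cv.wait( lock, [this] { return !m_pending.empty(); } );
        obj = m_pending.front();
        m_pending.pop();
      }
      obj->process_pending();           // at most n messages, then yield
      obj->scheduled.store( false );
      if ( !obj->messagequeue.empty() )
        mark_pending( obj );            // re-queue fairly behind the others
    }
  }

private:
  std::mutex              m_mutex;
  std::condition_variable m_cv;
  std::queue<Object *>    m_pending;
};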

While this would work fine in theory, the problem becomes message storage. If used for true IPC, messages need to be persisted (either on the heap, or via the network). And with lock-free queues, allocating and de-allocating becomes a challenge in itself. Using a lock-free allocator solves this problem, but then causes a livelock if the allocator is exhausted (e.g. an object produces messages at a 1:2 ratio; when generating a message it will spin on the allocator waiting to receive an allocation, while the demuxer waits for its processing to complete). There might be a solution for this at some higher level.
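One higher-level escape hatch (a sketch of mine, not a claim about the final design): when the pool is exhausted, let the producing object drain some of its own pending messages -- freeing the slots they occupy -- instead of spinning:

Message *allocate_or_drain( MessagePool &pool, Object &self ) {
  Message *msg = pool.try_allocate(); // lock-free; nullptr when exhausted
  while ( msg == nullptr ) {
    // Consuming our own backlog returns message slots to the pool. This
    // only helps when the spinner itself holds unconsumed messages, as in
    // the 1:2 ratio example above.
    self.process_pending();
    msg = pool.try_allocate();
  }
  return msg;
}

MessagePool and try_allocate() here are placeholders for whatever lock-free allocator is actually used.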

One side-effect of the design I hadn't noticed at first is that there is no longer any need for object shadowing. Since all calculations are performed either on the object that owns the queue, or on data passed through a message to another object, the execution context is always local to the thread to which the object has been assigned.

In many ways I feel this is close to the state-machine model mentioned. In this example, each state machine has its own (logical) thread, and messages form both the input and the output of other state machines.

One problem that I suspect is the most problematic is acknowledgment, or feedback. There are no guarantees as to when a certain action will be acknowledged (one node may be very busy and take seconds to respond when the expected time is milliseconds). Then again, since this is about real-time gameplay, if such a disconnect occurs the solution would likely lie elsewhere, such as re-distributing the objects across nodes.

Also, this design reduces inter-node traffic (no shadowed copies), albeit at the expense of larger inter-object messages. So far, I've found that the most complex action I need to express takes 4 messages (per user command), with most actions being executable in direct response to the command with no additional messages.

I won't know about real-world behaviour until the implementation is in workable form and split across a few machines.

Another consequence of such a model is that it becomes trivial to use a DHT as storage. Real-world analysis will need to show the need for traffic balancing and explicit object grouping on a single machine -- I suspect that with the moderately complex logic of a typical MMO, this overhead would be less than originally expected.

That sounds very encouraging.

Some questions:

1) Can you guarantee latencies between objects? It seems like, with a guaranteed maximum workload, you might be able to.
2) How do you do collision detection between objects in different processes? I think this one is harder.
3) If you put objects onto servers in a random or round-robin fashion (often one of the better load balancing strategies for disconnected loads), then all of the processors may need to load all of the static data about the world (terrain db, etc).

Quote:
1) Can you guarantee latencies between objects? It seems like, with a guaranteed maximum workload, you might be able to.
2) How do you do collision detection between objects in different processes? I think this one is harder.
3) If you put objects onto servers in a random or round-robin fashion (often one of the better load balancing strategies for disconnected loads), then all of the processors may need to load all of the static data about the world (terrain db, etc).


1) The guarantee is somewhat tricky. If a particular action takes a long time (hundreds of ms, or seconds), then the system breaks down. This is of course no different than with other approaches (even with multiple worker threads, a monolithic task that takes a long time will bog down any system if it's executing in all worker threads).

That aside, if each action is itself well behaved, or offloads its work to external workers (which do not need to lock the object), then it's possible to establish per-object quotas and use them to balance objects across nodes.

2) This one is trickier. I won't pretend I can offer a reasonable solution to the general problem, but I believe a solution with bucket synchronization could work.

- For the duration of a time slice, collect (in an independent, out-of-thread object) the intended movements. For n-dimensional movement, that is an (n+1)-dimensional object: for objects represented by a circle, an extruded cylinder represents the intended movement. This can be further simplified to a point extended into a line. Desired movements can be broadcast within the cluster (barely manageable with a reasonably small number of objects). A caching/grouping/delta-compression strategy could also be applied for cases where a large number of collisions needs to be resolved (1000+). (A sketch of such a swept test follows after point 3.)

At the end of the time slice, calculate all possible intersections and check for collisions. Send messages back to each object, signaling either allow or deny movement. These calculations would be done locally in each process, and collision information would be known for the entire space, so the final allow/deny notifications could be sent locally only, adding no extra traffic.

Since this calculation is out-of-thread (it doesn't need the object's thread safety), the messages can be sent with higher priority. Once again, this only works under certain load guarantees -- or perhaps it's not even that dependent on the load; I'd have to look into this in more detail.

In the end, the problem will be the n(n-1)/2 complexity of collision detection itself, not so much the cluster scalability. Given realistic loads, it's unlikely that more than about 1000 players would ever be close enough (the collision detector has knowledge of spatial relations and can cull out objects that cannot possibly collide, even if individual objects can't). Additionally, static geometry can be safely checked on a per-object basis, and non-player entities can probably use simplified tests.

3) I believe there was an article about DHTs addressing this. One solution is to assign each sub-part of the world to m different nodes, where m << n and n is the total number of nodes. This way, different parts of the world are permuted across multiple nodes, requiring a single node to know only a small number of subsets of the world. This still implies geographical grouping. In the case of m = n, every node contains the entire world data. I'll have to look it up; IIRC the Colossus or Mercury implementations use precalculated connections between nodes to statically balance regions.

Given a Zipf or Gaussian distribution of activity across the world, simply splitting one geographic location across 3-6 nodes would likely be sufficient to take care of the worst overload (within limits, obviously).
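For concreteness, here is the swept test from point 2 for the 2D circle case -- the standard quadratic for two linearly moving circles over one time slice (names and layout are mine):

#include <cmath>

struct Vec2 { float x, y; };
static Vec2  sub( Vec2 a, Vec2 b ) { return { a.x - b.x, a.y - b.y }; }
static float dot( Vec2 a, Vec2 b ) { return a.x * b.x + a.y * b.y; }

// Two circles move linearly during the slice t in [0,1]. They collide iff
// |d + t*v| = r1 + r2 has a root in [0,1], where d and v are the relative
// position and relative motion. This is the "extruded cylinder" test,
// reduced to 2D.
bool swept_circles_collide( Vec2 p1, Vec2 v1, float r1,
                            Vec2 p2, Vec2 v2, float r2, float *t_hit )
{
  Vec2  d = sub( p2, p1 );                        // relative position
  Vec2  v = sub( v2, v1 );                        // relative motion
  float R = r1 + r2;
  float a = dot( v, v );
  float b = dot( d, v );
  float c = dot( d, d ) - R * R;
  if ( c < 0.0f ) { *t_hit = 0.0f; return true; } // already overlapping
  if ( a == 0.0f || b >= 0.0f ) return false;     // not approaching
  float disc = b * b - a * c;
  if ( disc < 0.0f ) return false;                // paths never close enough
  float t = ( -b - std::sqrt( disc ) ) / a;       // first time of contact
  if ( t > 1.0f ) return false;                   // contact after this slice
  *t_hit = t;
  return true;
}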

Quote:
Original post by hplus0603
Multi-cores help when you are compute bound, but beware that the multi-cores share a single memory bus to main RAM. That means that for the cases where you are memory access bound, multiple cores are no faster than a single core (and may be slower because of contention).

To take advantage of heavily multi-cored CPUs (Sun Niagara2 -- 64 hardware threads!) I believe we need a totally different programming paradigm (message passing / state machines), and we don't really have the languages and tools (like debuggers!) to make good use of that paradigm yet. Also, I think you'll want to use L2 cache as a message queue, so you can avoid going through main RAM for the messaging; currently there's not quite enough control to make sure that that happens. Lockable/assignable L2 cache is probably in our future.



I will have to do some testing on the quad-core system I am planning to buy soon. The Q6600 has 8 MB of cache (2 x 4 MB) and hopefully the data-flow to number-crunching ratio won't be too high. Additional cores could only increase the need for data. I haven't yet seen how, architecturally, they might increase the flow from shared main memory to actually allow efficient use of so many cores for more than limited applications (i.e. number-crunching pipelines).

Quote:
Desired movements can be broadcast within the cluster


I was going to object that this scales as n-squared in the total number of objects on the cluster, which isn't good for scalability, but you already noted that. This is the main reason why I think geographic division is still a necessity for anything physically based and real-time.

Note that you can work around the problem with game design, instead. An RPG really doesn't need physics (or, why EverQuest mobs will suddenly warp through trees and rocks that are in their path).

Quote:
I was going to object that this scales as n-squared in the total number of objects on the cluster, which isn't good for scalability, but you already noted that. This is the main reason why I think geographic division is still a necessity for anything physically based and real-time.


This might not be necessary.

Fully distributed collision detection is likely not viable. But there's frequently little need for it.

For a semi-real-time application, an approximation of CD will likely suffice (with clients at 250ms round-trip time, anything more is unrealistic). CD can then be performed on dedicated servers, each of which serves a geographic area (these can be co-located alongside other nodes).

When an entity moves, its host notifies the appropriate server about where the movement will take place. The CD server checks for collisions, and notifies the clients back. Calculation of CD is still O(n^2), but network traffic in this case is only O(n): one notification out and one back per entity.

But perhaps more importantly, CD is a problem well suited to a different model -- MapReduce, for example. It operates on a fixed set of input data, performs a monolithic calculation depending only on the input set, and produces an output set -- unlike game logic, which operates on a large, variable set of objects, each of which is active only for a fraction of the time.

The model I proposed was actually somewhat misleading with respect to MPI. While messages are used for communication, the control flow is the same as in agent-based architectures. Each command a controller invokes creates an artifact (an agent), which then moves between objects, operating on two sets of data: the command's own state (parameters) and the object's state.

I was primarily concerned with the lower-level details of finding a solution suitable for both multi-core and networked distribution, so I kinda lost the big-picture view. There hasn't been much association of agent-based systems with MMOs, so that connection escaped me at first.

In the combat example above, a CombatAgent would be sent between objects; it would accumulate the resolution of the combat while traveling between all the affected nodes (player -> weapon -> target -> armor -> target). Fortunately, agent systems are fairly well documented, making further design somewhat simpler, or at least providing reference points. It also makes the logic and message semantics somewhat more intuitive.
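In the earlier pseudocode style, the agent might look like this (field and message names are illustrative):

// The agent is itself the message payload; each hop lets the current
// object add its contribution, then forwards the agent onward.
struct CombatAgent {
  ObjectPtr attacker, target;
  Spell     spell;
  int       base_damage;     // filled in by the weapon
  int       adjusted_damage; // filled in by the armor
};

// In weapon object
on_combat_agent( CombatAgent agent ) {
  agent.base_damage = rand( this->minimum_damage(), this->maximum_damage() );
  send( CombatAgentMessage, agent.target, agent ); // next hop: target
}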

Quote:
with clients at 250ms round-trip time, anything more is unrealistic


Our OLIVE(tm) architecture does 30 Hz physical updates in an MMO context, at client round trip latencies up to 1200 milliseconds. We keep messaging within the server cluster down to less than a step duration, total, but we also don't need to send interactions across the servers, due to the way we design our zoning and ghosting. But it's also somewhat overkill for the kind of scenario you describe, which sounds much more RPG-like.

Quote:
Original post by hplus0603
Quote:
with clients at 250ms round-trip time, anything more is unrealistic


Our OLIVE(tm) architecture does 30 Hz physical updates in an MMO context, at client round trip latencies up to 1200 milliseconds. We keep messaging within the server cluster down to less than a step duration, total, but we also don't need to send interactions across the servers, due to the way we design our zoning and ghosting. But it's also somewhat overkill for the kind of scenario you describe, which sounds much more RPG-like.


Perhaps I steered off track by mentioning DHT - I merely wanted to point out the transparency of the design.

For all practical purposes, the server runs a typical zoned architecture, with logic most likely running in a single thread per process (possibly as a co-routine, so it can be suspended and resumed as needed).

As such, there's no need for parallel tasks running different types of processing.

The latency of 250ms I mentioned applies more to practical problems (how long until the player is rubber-banded back after a collision has been resolved). For a car simulation, latency would inevitably play a role -- or at the very least, I'm not aware of any latency-independent mechanism that would allow true real-time physics without encountering some artifacts (apart from fixed-step simulation).

Meh, guess I re-invented the Actor model.

Oh well, at least I can look at Erlang's syntax and map its concepts onto Lua and C++.

The only benefit lies in the implementation: whereas Erlang's threads supposedly occupy 300 bytes each, I can currently get by with under 80 bytes per thread. Guess I'll toy around with this a bit more to see how it works out.

An alternative is to bind your messaging to fibers (a.k.a. cooperative threads, or co-routines). That way, you can send a message, and block until you get a reply.

However, no matter how you wrap it, synchronous cross-object calls will lead to deadlocks unless you enforce pretty stringent ordering rules (lock hierarchies). As you've discovered, you don't want to put the actions in the objects anymore; instead, the actions are visitors that get marshaled around.

Which, in turn, gets into a whole other bag of transactional update problems, which in turn have been researched in the clustered database space for a long time. You'll probably end up with not only actors, but also quite possibly transaction monitors.

Quote:
An alternative is to bind your messaging to fibers (a.k.a. cooperative threads, or co-routines). That way, you can send a message, and block until you get a reply.


I've got that already. It's a software-based lock-free event demuxer (software-based, since I strongly doubt any OS would look kindly upon allocating millions of kernel objects for this kind of task) that ensures each object is executed by only a single thread at a time.

Quote:
However, no matter how you wrap it, synchronous cross-object calls will lead to deadlocks unless you enforce pretty stringent ordering rules (lock hierarchies). As you've discovered, you don't want to put the actions in the objects anymore; instead, the actions are visitors that get marshaled around.


This is why I'm disallowing synchronous calls altogether. Message dispatch is non-blocking and uni-directional, and foreign objects aren't accessible outside of their handle. So all object references that individual objects hold are just pointers that cannot be dereferenced, but can be sent a message.
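Such a handle can be made impossible to misuse at the type level. A sketch (router(), ObjectId and Message are placeholders of mine):

// An opaque handle: it stores only an id and exposes nothing but send(),
// so foreign state can never be read, only messaged.
class Handle {
public:
  explicit Handle( ObjectId id ) : m_id( id ) {}
  void send( const Message &msg ) const { router().route( m_id, msg ); }
private:
  ObjectId m_id; // no way to reach the Object behind it
};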

The final version of the model is essentially equivalent to Stackless Python, and the programming model is identical to the organization presented in that example, covering even collision detection and other aspects.

Quote:
Which, in turn, gets into a whole other bag of transactional update problems, which in turn have been researched in the clustered database space for a long time. You'll probably end up with not only actors, but also quite possibly transaction monitors.


I noted the transaction problems at the beginning. What I still cannot find is an example of where I'd need transactions. While I'm sure there are certain actions that might require strict sequencing, I haven't been able to find examples in the game logic I've worked out so far. This means I'm either missing something huge, or such situations are rare (perhaps not in the general case, but in RPG-ish logic).

Bank problem: "5 people want to withdraw $100 each from an account with $275". Five messages are sent to the bank object. The first 2 that arrive generate a return message containing the amount withdrawn (the receivers then increase their own money value in response to the on_receive_money message); the other 3 receive a "not enough funds" message.
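In the message-passing pseudocode from earlier, that handler is short (message names are illustrative); no lock is needed because only one thread ever runs this object's handlers:

// In bank-account object
on_withdraw( requester, amount ) {
  if ( this->balance() >= amount ) {
    this->set_balance( this->balance() - amount );
    send( ReceiveMoneyMessage, requester, amount );
  } else {
    send( NotEnoughFundsMessage, requester, amount );
  }
}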

Whether the order here was correct is unanswered. Did the right 2 people receive the money? Since client latency is undefined, I cannot say who really acted first. Like I said, perhaps I'm missing something... When a resource is accessed on a first-come-first-served basis, there's simply no way to be fair about it (it could come down to nanoseconds, or to instruction-cache ordering on the client side).

Then again, agent-based systems usually provide a blackboard. Even if explicitly or implicitly locked, it would only be needed to cover special cases.

Part of the reasoning I'm applying here (it is a non-deterministic system, after all) is that as long as the client's actions make sense to the client (kick, punch, move doesn't become punch, die, move, kick), the very latency of such worlds adds non-determinism to the "absolute" sequence of events.

That's the simplest bank problem, with only one resource. However, it still needs transactions. Consider:

1) Action A withdraws $100.
2) system crashes

Where did the $100 go? They went into thin air.

A better example, very RPG-ish: Two players want to trade; $100 for a Vorpal Blade of Pwnage.

So, the operation needs to do the following things, with a partial ordering:

1) withdraw $100 from player 1
2) withdraw a Vorpal Blade of Pwnage from player 2
3) deposit $100 to player 2
4) deposit a Vorpal Blade of Pwnage to player 1

Now, the system can crash anywhere after #1 has executed but before #4 has committed to hard storage. Given that player 1 and player 2 may actually live on different machines, it's even a distributed transaction.

Btw: a similar bank problem is "user A wants to transfer $100 from acct 1 to acct 2; user B wants to transfer $100 from acct 2 to acct 1" which may lead to deadlock, or may lead to distributed transaction failure.

Unlike pure actor models, all data I have exists in tangible space, not in some network ether. As such, money can't disappear unless it is deliberately made to.

In many ways, almost everything should be a transaction. Firing a gun expends a bullet locally, and converts the bullet into damage at the client. Unfortunately, while databases offer full transactional support, putting the logic in a database isn't viable for lower-end designs.
This is why I separate transactions into 3 types:
- Irrelevant (actions that may as well get lost: movement, chat, many combat actions)
- Important (item consumption, firing a gun, losing a bullet). These can be handled through basic roll-back techniques; for performance reasons they can also be handled in a soft manner for each type of transaction independently, possibly with acks.
- Critical (cross-process object migration, trade). These are handled individually through lower-level RPC-style messaging, or through the creation of transaction-like objects (a persisted TradeBroker that handles the transaction and commit logic), and must explicitly succeed before commits are made.

While developing a fully transacted system would be nice, even the highest-end databases would struggle with the typical number of interactions. Perhaps Project Darkstar will prove this wrong, or perhaps I over-estimate the exact workload. Then again, that project hasn't yet been tested in a true virtual-world style model.

I'm sure that a fully distributed, transacted system would be great. But there's also another type of problem. Virtual worlds, unless explicitly zoned, act as one single application. A failure of any part of the cluster would, for the purposes of consistency, need to block the entire application -- if for nothing else, then simply because of the butterfly effect. How can I know, in the general case, that if a mob doesn't spawn 100 miles away, I will not be affected at some later point?

For example, a space simulation: if the Earth node goes down, what happens to the Moon? Does it continue orbiting, or does it continue in a straight line? May I assume that Earth is still where it was? What if the node didn't go down, but it was a Vogon fleet? Is it even possible to assume anything here, or is the only solution to freeze the entire cluster?

My system is based on some assumptions: interactions between objects are on a very local scale, trades will almost exclusively be performed on the same node, objects beyond LoS will not affect me, most actions carry no long-term effect, and most actions are short-lived with well-known side-effects.

A special class of problems I do need to take care of falls into the containment category. Trades, cluster migration and object deletion are all perfect examples. Fortunately, they are also quite limited, and most of the system can remain oblivious to them.

Quote:
all data I have exists in tangible space


Ah, but then you have to either centralize, and accept a limitation on scalability, or solve the distributed-transaction problem.

In OLIVE, we federate all the data across multiple database instances, each of which stores a horizontal slice of the database. If one server goes down, then transactions involving users on that server will temporarily stall or fail, but the rest of the system works fine. I bet some of our users have 20+ separate database instances, so only 5% of users would be affected by a single machine failure.

Project Darkstar, as it currently stands, uses a single database back-end, and won't even do object inflation caching -- each object is fully de-serialized, acted on, and then re-serialized, for each operation. I doubt that'll scale all that well. They've certainly indicated that they don't expect real-time changes (like terrain queries, physics, etc) to be part of the Darkstar back-end, but instead solved "outside the box."

Quote:
Original post by hplus0603
Btw: a similar bank problem is "user A wants to transfer $100 from acct 1 to acct 2; user B wants to transfer $100 from acct 2 to acct 1" which may lead to deadlock, or may lead to distributed transaction failure.


I've worked out a seemingly reliable solution to transactions.

All logic used at run-time exists in transient memory (read from the database, then kept in memory on the servers).
Objects are based on a component model, with each object possibly living in a separate process.

From a practical perspective, each message handler is atomic. A message on an object is either executed fully (and possibly committed to the database), or it didn't happen.

However, object ownership transfers are inherently non-atomic, and by definition require touching multiple objects.

As such, I define a transaction over a set of tuples [ TRANSFER( X, A, B ) ], where X is the item being moved from parent A to new parent B. These transactions are transmitted to a transient transaction manager object. The TM is responsible for committing this set of transfers to the database, and for notifying the clients about changes after it has succeeded or failed.

The containment table is a standalone table that handles only 3 types of operations:
- object creation (just creation, not populating with values)
- hierarchy changes (transferring items between containers)
- object destruction (a special case of transfer where the destination container is a dummy pre-defined container)

Trading scenario: a TM object is created (in any process, on any machine). Players submit item references for trading, and confirm. The TM commits the transaction for all items (pre-condition: A owns X; post-condition: B owns X) for every tuple in the transaction. After the transaction succeeds, all peers participating in it are notified.

If any of the peers crashes at this point, it doesn't matter; they'll need to restore state from the DB anyway. If either the network or the TM fails after committing, but before the peers receive confirmation, state is temporarily de-synched. However, even if clients were to think they still own the items, they can't trade them anymore -- on the next attempt, the TRANSFER would fail on its pre-condition.

The last problem to address here is the asynchronous nature of transactions. Obviously, after an object has initiated a TRANSFER, but before it receives the confirmation, it is still free to modify the object. A somewhat trivial solution is locking: when an object is submitted for a transaction, it's flagged with the transaction ID. While such an ID exists, all messages except transaction messages fail. This can be incorporated at the component level.

There is only one type of lock, and an object can hold only one lock at a time (no deadlock, only timeout, and no system-wide blocking either).
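A sketch of the two pieces -- the TRANSFER tuple and the per-object transaction flag (all names here are mine, illustrative only):

#include <cstdint>
#include <vector>

using ObjectId      = std::uint64_t;
using TransactionId = std::uint64_t;

struct Transfer {
  ObjectId item; // X: the object being moved
  ObjectId from; // A: pre-condition  -- A currently owns X
  ObjectId to;   // B: post-condition -- B owns X
};

struct TxnFlag {
  TransactionId id = 0; // 0 == not part of any transaction

  // Set while a transaction is in flight; component-level message handling
  // rejects all non-transaction messages while the flag is set.
  bool try_lock( TransactionId txn ) {
    if ( id != 0 ) return false; // one lock per object, no nesting
    id = txn;
    return true;
  }
  void unlock() { id = 0; }
};

// TM side: the whole tuple set commits only if every pre-condition still
// holds in the containment table (the group CAS from the list below).
bool commit( TransactionId txn, const std::vector<Transfer> &tuples );
// (database round-trip elided)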

This leaves me with two types of atomic operations:
- operations on a single object
- ownership change for a set of objects (a synchronized group CAS, with pre-conditions checked in the database)

These two primitives are sufficient to maintain referential integrity of the transient logic state and, even in case of a crash, to ensure that a possibly inconsistent state has no effect on persistent storage.

Logic inconsistencies can still potentially occur, but that would be due to coding errors deliberately breaking object ownership (passing persisted data by value to another object, which then stores it locally). As long as ownership is respected, I believe this scheme provides adequate integrity.

As for scalability: ownership change is assumed not to be the dominant operation, and it resides in a separate table, on a clustered database (if necessary), separate from the actual object data. Assuming some 100 bytes per transaction message per TRANSFERred object, a 100 Mbit network (~12.5 MB/s) saturates at roughly 125,000 such messages per second, so budgeting 100,000 transactions per second leaves some headroom. At this point, that sounds reasonable (Eve apparently performs 1,250 transactions per second during peak times). With additional knowledge of the types of transactions, this can probably be optimized further.

Ownership transfer for non-persisted objects doesn't need to go through the database. Money (and other discrete transferable units) is treated as an object for the purposes of this mechanism.

Congratulations, you have just re-invented the basics of a Transaction Monitor! Granted, IBM has known about those things for 40 years, but better late than never :-)

Seriously, though, you are correct that transactions involving multiple parties will need a separate arbiter, who first commits locally what he/she will do, then goes out and tells the others to do it. Google for "marriage protocol" or "two-phase commit" for more good information about that mechanism.

Quote:
Original post by hplus0603
Congratulations, you have just re-invented the basics of a Transaction Monitor! Granted, IBM has known about those things for 40 years, but better late than never :-)

Seriously, though, you are correct that transactions involving multiple parties will need a separate arbiter, who first commits locally what he/she will do, then goes out and tells the others to do it. Google for "marriage protocol" or "two-phase commit" for more good information about that mechanism.


I've researched all of those, as well as alternatives, and after prototyping settled on a middle ground based on those approaches.

Ideally, I'd have everything transacted, and everything solved by third-party solutions. Unfortunately, from a practical perspective, 95% of that would be overkill, while the remaining 5% is crucial in case of failure. This is why I also have a custom TM (which relies solely on the database to handle the real data-integrity checks, clustering, and the rest of that side of persistence).

From the perspective of application logic, I deliberately allow certain states to be non-synchronized, or to fail. After analyzing and prototyping some common and some detailed actions, I found that those two primitives cover all the operations that need to be performed, or that the operations can be realized through this set of primitives.

Everything could obviously be done entirely in the database. But one still cannot escape the need for some proxy front-end. And this is just one way of getting light-weight partitioning and ease of development (giving external languages performant access to the full game state, while retaining all the benefits of the underlying system), and of taking as much benefit as possible from existing full-scale solutions that offer the reliable, but much heavier, approach.
