# Distributed transaction & message queue

This topic is 1108 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Recommended Posts

Recently I've been working on finding a proper solution for distributed transaction, and I found this: http://www.codeofhonor.com/blog/wp-content/uploads/2012/04/Patrick-Wyatt-Writing-reliable-online-game-services.pdf

The pdf file's content comes from a lecture by Patrick Wyatt in GDC 2012, and I believe many guys have read it.

In this lecture, Patrick showed how he solved the consistency of trade between two players distributed in two different databases(not by using XA), the approach is to achieve 'eventual consistency' by using a consistent 'transaction queue'. I will explain this approach below briefly.

------------------>>>>

1. Let's suppose player1 is in db1, player2 is in db2. And player1 wanna give \$10 to player2.

2. We first commit a transaction to db1, like this

begin_transaction_in_db1
update player set money = money - 10 where id = money_giver_id;
insert into transaction_queue values(db2, 'increase_player_money', money_receiver_id, 10);
end_transaction


What we insert into the table transaction_queue is essentially a promise-- we will increase the money_receiver's money in the future.

3. A worker process will scan the table transaction_queue in db1, and when it gets the new record inserted above, it will commit a transaction to db2, like this

begin_transaction_in_db2
update player set money = money + 10 where id = money_receiver_id;
end_transaction


4. By implementing like above, we can acheive eventual consistency.

---------------<<<<

I have found many similar idea like this, to avoid using the high-latency XA function provided by mysql.

But I'm really wondering: what if transaction2 fails while transaction1 succeeds? Apparently, we will be unable to rollback transaction1 when we find transaction2 has failed, because at that time transaction1 has been committed already...

Edited by iwitggwg

##### Share on other sites

what if transaction2 fails(the money_receiver does not exist any more when transaction2 is executed, e.g.)

There are two kinds of fails. One is, when the communication fails (servers crashes, networkcommunication aborted, harddisk crahed etc.) and when the data consistency fails. The first idea is, to ensure data consistency (just check it upfront), then execute the transaction. If the second (db2) transaction failed, you try to repeat it (therefor you need idempotent transactions). If it fails because of some data inconsistency, you need to revert your first transaction. Eventually there should be always some other security systems installed like don't payout any real cash until all transactions in the second db have been successfully commited.

There is always a trade off, you can't have 100% transaction save behavior most of the time, but you can have 99.99%. If one case of the 0.01% fails, you need to handle it in an other way, most often manually. It is just important, that

1. you need a (hi-performance) way of handling the transaction for 99.99% of the data.

2. you need a way to get informations about "hard" fails.

3. you need a way to (manually) fix these hard fails.

##### Share on other sites
When you use message queuing, you have to only use messages that are eventually fulfillable.
"Give X credits to player Y" is always eventually fulfillable, unless you have some business rule that forbids a balance over a limit.
If you do have such a limit, the message would be "Give X credits to player Y, but if the limit is hit, send the remaining credits to location F and send an email to the giver and recipient" or something like that.

However, "give value from A to B" isn't a very interesting use case, because it is inherently always fulfillable.
What's more interesting is two-party things, like "give X credits from A to B while transferring the flaming sword of fire from B to A."
That would have to be formulated something like:
Message A->B: "Here's X credits, give them to B and enqueue a message to give the sword to A, OR enqueue a message to refund the credits."
Message B->A: "Here is the sword." OR "Here are the credits back."
Keep re-formulating the messages and the protocol until the likelihood of failure is very small.
Also, "eventual consistency" matters here. If system B is down for a while, that's OK, as long as it eventually comes back and processes the message.
If you're in an environment where that can't be guaranteed, then the message becomes a little more complex:
Message A->B: "Here's X credits, give them to B and enqueue a message to give the sword to A, OR enqueue a message to refund the credits, OR if timing out after T time, refund the credits."
Message B->A: "Here's a sword. If timing out after time T, you're screwed and need manual intervention."

At some point, any system will run into some limitation. Maybe there's a rule that you can only own one elemental sword, and when the sword returns, A already owns the freezing sword of frost. Maybe you saved space in the database by only allocating 16 bits of credits storage, and B went over the 65,535 credit limit. Maybe player A gets banned and deleted from the system before the sword makes it there.
At some point, you will have to throw an exception (metaphorically -- not necessarily using your language's exception mechanism) and kick the action up to an operator/human. When this happens, it's generally super helpful if the operations in the system leave good audit trails, using some traceable unique ID, and you have good tools to actually see each step that was taken in the chain. Edited by hplus0603

##### Share on other sites

THX to @Ashaman73 and @hplus0603, I like the point 'There is always a trade off' and 'At some point, any system will run into some limitation', they are helpful to me

##### Share on other sites
Escrow-style services can be useful for that kind of transaction if you fear a multi-part transaction can fail.

This becomes important if you really do need audit trails as hplus mentioned.

For example, the {give money from A to B, give sword from B to A} can work well with a queue, as the escrow service becomes the queue consumer. Neither A nor B performs the transaction, the work is done through the escrow service.

Depending on your level of confidence you may even need additional verification that the sub-steps are atomic and idempotent. It all depends on how far down the rabbit hole you want to go.

An enormous number of games suffer from exploits due to poor handling of these transactions, where you can find ways to duplicate items or add the difference between items by doing an undo-style transfer (e.g. sell then buyback from shopkeeper before closing the screen) or rollback (e.g. start doing the thing that consumes resources then cancel back out to recover resources). It doesn't make much sense to spend much time on this kind of thing for local games, but for online games where in-game resources can have cash value on marketplaces it becomes much more significant.

##### Share on other sites

The one problem i have is that this does nothing to prevent a double spend exploit.

In fact, doing everything in one transaction doesn't do the trick either.

When your player trade starts, the first thing that happens is some sort of logic test to make sure that the player should be allowed to perform the trade (ie does the player have enough gold?).

Then it actually performs the transaction(s).

If a second trade spins up between these two steps then the user will have the ability to spend into the negative since your logic tests will see old data.

You would have to have some way to prevent other database sessions from reading this data as well as editing it so that only one trade happens at a time. (maybe just locking the records?).

Database transactions aren't a silver bullet.

##### Share on other sites

The one problem i have is that this does nothing to prevent a double spend exploit.

Sorry, but it really is a solved problem.

It was solved centuries ago in the physical world with escrow services, and was solved decades ago in the electronic financial industry with a variety of different algorithms, including the one presented in my post above.

You have two different systems, each with the potential to fail, or not satisfy the requirements at the time of transaction, or be fraudulent, or be cancelled in the future. Transactions like {buy ticket on ticketing system, transfer funds with banking system} happen all the time around the globe. Others like {remove item from inventory tracking system, transfer funds with banking system} happen thousands of times every second at major retail chains.

You start with both systems using databases that satisfy ACID requirements, then you use your own intermediate system with idempotent transactions that handle duplicate events, auditing, and enable implementing product returns or mid-transaction exceptions.

This is very much a solved problem. It requires quite a few small steps, but done correctly the steps can be safely re-run multiple times, safely be interrupted and resumed, safely be cancelled mid-process during an exceptional process like insufficient funds or discovery of fraud, and can safely be cancelled through a second later transaction. It can also generate audit trails adequate for everything including government investigation and insurance investigation.

##### Share on other sites

The one problem i have is that this does nothing to prevent a double spend exploit.

Sorry, but it really is a solved problem.

It was solved centuries ago in the physical world with escrow services, and was solved decades ago in the electronic financial industry with a variety of different algorithms, including the one presented in my post above.

You have two different systems, each with the potential to fail, or not satisfy the requirements at the time of transaction, or be fraudulent, or be cancelled in the future. Transactions like {buy ticket on ticketing system, transfer funds with banking system} happen all the time around the globe. Others like {remove item from inventory tracking system, transfer funds with banking system} happen thousands of times every second at major retail chains.

You start with both systems using databases that satisfy ACID requirements, then you use your own intermediate system with idempotent transactions that handle duplicate events, auditing, and enable implementing product returns or mid-transaction exceptions.

This is very much a solved problem. It requires quite a few small steps, but done correctly the steps can be safely re-run multiple times, safely be interrupted and resumed, safely be cancelled mid-process during an exceptional process like insufficient funds or discovery of fraud, and can safely be cancelled through a second later transaction. It can also generate audit trails adequate for everything including government investigation and insurance investigation.

However, it is NOT solved by the example provided.

This is why YOU said

Depending on your level of confidence you may even need additional verification that the sub-steps are atomic and idempotent. It all depends on how far down the rabbit hole you want to go.
Edited by Gl2eenDl2agon

##### Share on other sites
Yes, if you are not confident in your underlying systems, particularly that they satisfy ACID requirements, then you need to verify those as well.

Most businesses are quite willing to trust that their database servers from Postres or MySQL or Oracle or whatever satisfy those requirements. Others will create test plans that include testing those systems, and they occasionally will discover database system bugs.

##### Share on other sites

Yes, if you are not confident in your underlying systems, particularly that they satisfy ACID requirements, then you need to verify those as well.

Most businesses are quite willing to trust that their database servers from Postres or MySQL or Oracle or whatever satisfy those requirements. Others will create test plans that include testing those systems, and they occasionally will discover database system bugs.

You are forgetting that ACID databases do not lock reads while you have a transaction open.

If something else reads the value during your transaction it will see the old value prior to your modification since your changes are queued up in a working copy of the modified records until you commit.

If you don't take that into consideration, you will end up with a situation where you can spend some gold while the prior action of spending gold is still in process. The ACID compliant database won't help you there.

multiversion concurrency control, in which the database provides each reading transaction the prior, unmodified version of data that is being modified by another active transaction. This allows readers to operate without acquiring locks, i.e. writing transactions do not block reading transactions, and readers do not block writers. Going back to the example, when user A's transaction requests data that user B is modifying, the database provides A with the version of that data that existed when user B started his transaction. User A gets a consistent view of the database even if other users are changing data.

Locking the data from read access while you are writing to it in a transaction is not the default action of ACID databases. You have to manually lock it.

(For some reason I can't remove that VVVVV )

##### Share on other sites

You are forgetting that ACID databases do not lock reads while you have a transaction open.

Locking the data from read access while you are writing to it in a transaction is not the default action of ACID databases. You have to manually lock it.

This is not a true statement.

The DEFAULT behavior of MySQL might be "REPEATABLE READ" transaction isolation level, which has some of the behavior you're suggesting. However, this does *not* count as ACID, because it's missing the "I" value.
In MySQL, you should make sure to run all transactions in "SERIALIZABLE" transaction isolation level, which does fully implement ACID and behaves properly with regard to read-after-read hazards (and all other hazards described here.)
More documentation on MySQL transaction isolation levels: http://dev.mysql.com/doc/refman/5.0/en/server-options.html#option_mysqld_transaction-isolation

Other databases (Postgres, Oracle, DB/2, SQL Server, etc) mostly have similar trade-offs available, although some of them may default to different levels.

Note that, at the DBA level, there are many other things that could turn your "durable" transaction non-durable. Hard disks with caches that don't survive power loss (this may include SSDs) and disk chipset drivers that play fast and loose with out-of-order command queueing come to mind for example.
Running a real, online, system of any kind (game or not) requires a separate set of knowledge, skill, and actions that are totally different from developing a game or application. Arming developers (and operators) with understanding of both sides of this equation is what the DevOps movement is about. Edited by hplus0603

##### Share on other sites

This is not a true statement.

The DEFAULT behavior of MySQL might be "REPEATABLE READ" transaction isolation level, which has some of the behavior you're suggesting. However, this does *not* count as ACID, because it's missing the "I" value.

But you see my frustration with the popular opinion that a developer can be completely ignorant of whats going on inside of the magical RDBMS software because it magically handles everything.

Considering EVERY triple-A online game has struggled with duping exploits you'd think more attention would be paid to this issue.

##### Share on other sites

Considering EVERY triple-A online game has struggled with duping exploits you'd think more attention would be paid to this issue.

Traditional game development studios have previously had a culture that rewards and attracts different skills.

Once FarmVille made its billion, however, I think we're seeing that starting to shift. Like all cultural shifts, this will take time.

##### Share on other sites

This topic is 1108 days old which is more than the 365 day threshold we allow for new replies. Please post a new topic.

## Create an account

Register a new account

• ### Forum Statistics

• Total Topics
628734
• Total Posts
2984444

• 25
• 11
• 10
• 16
• 14