Collusion Strategy Wins Iterated Prisoner's Dilemma Competition

Started by
20 comments, last by Kylotan 19 years, 4 months ago
I came across this article and thought gamedev might find it interesting. Researchers in England used an agent collusion strategy to win an AI competition: A New Way Out of the Prisoner’s Dilemma: Cheat.
Quote: ... “The Prisoner’s Dilemma encapsulates the essence of how cooperation can emerge when you have self-interested behavior,” said Nick Jennings, a computer science professor at Southampton University and member of the winning team. “It’s very, very simple, and yet very powerful.” The winning strategy, Tit for Tat, worked like this: a player began by cooperating each round, until his opponent defected. After this point, the player echoed his opponent’s last move, defecting after the other player had betrayed him, and cooperating when his opponent began to cooperate again. The Southampton University researchers, who entered a team of 60 software robots, or agents, into the competition, found that an even better strategy was to submit a group of robots that behaved in the tournament either as masters or slaves. A robot’s particular role was encoded in its pattern of opening moves, allowing any other robot that knew what to listen for to recognize its opponent as another Southampton player. When a Southampton “master” was pitted against a Southampton “slave”, the slave would sacrifice itself, cooperating every round to allow the master to defect repeatedly and rack up a huge score. In the end, the individual master robots scored more points than the robots playing Tit for Tat. “It was almost like a cult,” said Graham Kendall, a professor at the University of Nottingham, in England, who organized the 20th-anniversary competition, referring to the culture the Southampton entries created. “There were superagents who exploited all their friends.” While technically this strategy violates the spirit of the Prisoner’s Dilemma, which assumes that the two prisoners cannot communicate with one another, the Southampton system is not without its uses. In reality, collusion between competitors is generally impossible to prevent, and a whole field of research has sprung up to study how intelligent agents can use what’s called coding theory to communicate covertly, and how such collusion can in turn be prevented. For example, the U.S. Federal Communications Commission’s radio spectrum auctions have historically been plagued by cases of bidders colluding by coding hidden signals in their bids, and so the FCC has brought in researchers to advise it on how to modify the rules of its auctions to try to mitigate collusion. ... A second round of the Iterated Prisoner’s Dilemma competition will be held in April of 2005 at the IEEE Symposium on Computational Intelligence and Games hosted by the University of Essex, in England. Armchair game theorists may submit an entry by visiting http://www.prisoners-dilemma.com/.
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Advertisement
I've seen this article as well... and while I thought it was interesting, I also have this to say...

...just because an algorithm cheats doesn't mean it's better than some other algorithm that doesn't cheat. In other words, IF you kept the constraint that algorithms couldn't cheat, then Tit-For-Tat is still an optimal strategy.

Personally I don't think they've done anything really noteworthy, because all they have done is change (ignore) the rules of the game to ones which benefit their algorithm and done so in an environment where no other agents were permitted to act according to their rules. If we suddenly decided that cheating was a viable strategy in game theory problems, then many of the current optimal solutions would suddenly become sub-optimal.

Besides, I hate cheaters! ;)

Timkin
The story did strike me as a puff piece advertising the next round of competition. While I agree that new insights can be gained by studying the various ways that collusion leads to success in the game, I don't think Southampton should have been awarded any kind of prize for cheating. It seems to me that they ruined the credibility of the competition. Who would enter now knowing that the other contestants are probably cheating?
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
why isn't cheating a valid solution, as long as that cheat has no legal and/or damaging social reprecussions? For example, given person A forcing person B to play a game of russian roulette, isn't cheating a valid option for person B? The prisoner's dilemma is similar, major reprecussions for the agents would come to fruition if things don't go correctly for them, so why not cheat?

cheating is a bad word, in this case, because the word cheating connotates academic dishonesty, which IS wrong, it creates a situation in which you can falsely represent yourself as being knowledgeable, which could later lead to problems with your career (specifically, losing money for the company that you work for). Cheating on a test hurts other people around you.

But in the case of the prisoner's dilemma, the "game" is really just "cops vs robbers". The fact that the formal game has the restriction that it is impossible for the robbers to communicate just points out a fundamental flaw in the game, it doesn't represent anything real.

So, it's not "cheating" to get the prisoners to communicate. It's a competition between the prisoners and cops, and the so called "cheating" is just counter-manding the efforts of the cops, which is exactly what they want to do.

Really, what is cheating? When you get into the real world, there is breaking the law, there is violating the trust of your friends and family, but other than that, what is there? This extends even to sporting competitions, cheating is "illegal" in the sense that, if you are caught, there are serious reprecussions. It is also a violation of your teammates trust (that is, if you cheat independantly without their knowledge).

In this specific case, the Southampton team was clever enough to get around the rules. Their achievment should be recognized, they are deserving of this win.

[Formerly "capn_midnight". See some of my projects. Find me on twitter tumblr G+ Github.]

I agree. It seems like everyone is labeling this "cheating" only because they didn't think of it first. Collusion is a very beneficial strategy to the winners when used correctly, but the losers pay big.
- k2"Choose a job you love, and you'll never have to work a day in your life." — Confucius"Logic will get you from A to B. Imagination will get you everywhere." — Albert Einstein"Money is the most egalitarian force in society. It confers power on whoever holds it." — Roger Starr{General Programming Forum FAQ} | {Blog/Journal} | {[email=kkaitan at gmail dot com]e-mail me[/email]} | {excellent webhosting}
Its 'cheating' because it isn't solving the problem at all. The point is for each one to do as well as possible, not for teams to pollute the sample space to make one of theirs seem better.

Imagine how angry people would be if one person owned 5 pro (american) football teams and decided that team 5 would always win against the other 4, team 4 against the remaining 3, 3 against the other 2, and 2 would always win against 1.

It isn't playing the game with that kind of collusion, its just artifically inflating scores.
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Captain_midnight:

Cheating in this context is the 'breaking of established and agreed upon rules for the conduct of a game'. What you seem to be missing here is not that collusion is a better strategy for maximising the reward of a few previously chosen individuals, but rather that:

1) The point is to try and solve the iterated prisoners dilemna. Not some other variant of the game;

2) It's not the same game if you change the rules for some players and not for others;

3) It's not supposed to be anything like a real 'cops and robbers' scenario as you suggested. It's an elucidation of principles of rationality and utility theory in a 'game' environment. Hence the name 'game theory'; and finally,

4) If you look at the utility of the colluding group as a whole, I would expect that it would be worse than that of the average player of the game, since at least half of the colluders need to be submissive (and accepting the penalty) to ensure that the other half win often enough to beat tit-for-tat.


My post above had nothing to do with 'academic cheating' or dishonesty. We're talking about breaking the game rules here and what that means (that there isn't a 'more optimal' solution to IDP). Now sure, this is just a game and the consequences aren't that high (well, actually, there are academic consequences... like the gaining of research funding, publications, etc), but the whole point of the IPD is that there is no collusion of communication; that each prisoner needs to have an individual strategy that maximises its own reward over the iterations of the game. Placing agents into the game with the sole purpose of trying to maximise their opponents reward breaks the concept of the game, because now the agent in the other room isn't their opponent, but rather friend (and remember, this ISN'T real cops and robbers, so we're not talking about two stooges nicked for stealing handbags).


Ask yourself this simple question. What if the game were played without the two prisoners being able to identify themselves. Would this 'collusion strategy' still be optimal? (You can assume you know the probability that the opponent is actually one of your friends and the probability that they are more senior to you or less senior).

Timkin
It seems to me that this talk of cheating is missing the purpose of the contest. These researchers could hardly care less about which team wins a Prisoner's Dilemma tournament; what they want to know is which strategy wins, in order to better understand what sort of strategies humans and animals apply in real life.

There is an obvious analogy here to social systems : The Southampton people have re-invented hierarchical societies, in which a small minority does very well for itself, while the vast mass suffers. The other researchers, in contrast, are trying to do anarchic capitalism, where every-AI-for-itself leads to a much more even distribution of resources.

There is no cheating, only an interesting demonstration of complicated strategies.
To win one hundred victories in one hundred battles is not the acme of skill. To subdue the enemy without fighting is the acme of skill.
What is also of great importance to realize is that the strategy which won may not be the best overall strategy, because the *environment* (the strategies other competitors use) is just as important as the winner's strategy to make the winner win! What if Tit For Tat was not present in the environment, which is entirely possible? Then surely another strategy would be best!

But here is the most interesting environmental point -- can a strategy be devised that figures out the patterns of the new winner and circumvents them? Tit for Tat is most stable because it didn't have a competitor beat it for a long time, even though they all realized exactly how TFT operates! It is thus a sort of stable strategy ... what if this new winner can be easily taken advantage of -- then it wouldn't survive at all! In fact, in the second competition, robots that performed well in the first were utterly taken advantage of by "mean" robots that only tried to win by playing devilishly -- even so, TFT survived!

For anyone more interested in the theoretical discussion of the contest and why different strategies win, there's an entire chapter devoted to it in Charles Dawkin's book "The Selfish Gene" which also emphasizes the important role of the environment.
h20, member of WFG 0 A.D.
Quote:Original post by King of Men
It seems to me that this talk of cheating is missing the purpose of the contest. These researchers could hardly care less about which team wins a Prisoner's Dilemma tournament; what they want to know is which strategy wins


So are you suggesting that a strategy that ignores the rules and performs better than those that obey the rules is a better strategy?

If all we wanted to do was find the most optimal strategy then of course we'd ignore all of the rules that lead us to penalty an impose lots of new rules that ensured we won whatever competition we were in. But then thats not the point of a game, is it (whether it be for social research or entertainment). This has important relevance to computer games too. Much of the fun (and thus optimal utility) obtained in winning a game is doing so within the bounds of the games rules. Sure, there are some who want to get to the end in whatever means they can. Most though want to beat the game without resorting to cheating. That's why Game AI is such a huge challenge!

Getting back to the IPD: I think we can all agree that using collusion/cheating strategies is likely to increase the expected utility of certain individuals. However, the point of such competitions is NOT to find any strategy that wins, but rather a strategy that wins within the constraints of the game. We can see this by considering another strategy that is guaranteed to beat Tit-For-Tat and the new collusion strategy: Shoot-The-Cop. In this strategy, the prisoner pulls out a hidden firearm, kills the cop and walks free, thinking they're all nuts for offering him a choice! Clearly he obtained the maximum reward - freedom - yet some would say this is an absurd strategy because he cheated by having a hidden gun and we added a rule that permits the killing of the jailers. Where do we draw the line at which rules we can ignore/omit and which we must maintain to preseve the 'game'?

The point I'm trying to get across is that if you change the rules then you have changed the game and it's no longer the Iterated Prisoners Dilemna, but rather something else... something else probably less interesting and less important too.

Cheers,

Timkin

This topic is closed to new replies.

Advertisement