
Collusion Strategy Wins Iterated Prisoner's Dilemma Competition



I came across this article and thought GameDev might find it interesting. Researchers in England used an agent-collusion strategy to win an AI competition: A New Way Out of the Prisoner’s Dilemma: Cheat.
Quote:
... “The Prisoner’s Dilemma encapsulates the essence of how cooperation can emerge when you have self-interested behavior,” said Nick Jennings, a computer science professor at Southampton University and member of the winning team. “It’s very, very simple, and yet very powerful.”

The winning strategy, Tit for Tat, worked like this: a player began by cooperating each round, until his opponent defected. After this point, the player echoed his opponent’s last move, defecting after the other player had betrayed him, and cooperating when his opponent began to cooperate again.

The Southampton University researchers, who entered a team of 60 software robots, or agents, into the competition, found that an even better strategy was to submit a group of robots that behaved in the tournament either as masters or slaves. A robot’s particular role was encoded in its pattern of opening moves, allowing any other robot that knew what to listen for to recognize its opponent as another Southampton player. When a Southampton “master” was pitted against a Southampton “slave”, the slave would sacrifice itself, cooperating every round to allow the master to defect repeatedly and rack up a huge score. In the end, the individual master robots scored more points than the robots playing Tit for Tat.

“It was almost like a cult,” said Graham Kendall, a professor at the University of Nottingham, in England, who organized the 20th-anniversary competition, referring to the culture the Southampton entries created. “There were superagents who exploited all their friends.”

While technically this strategy violates the spirit of the Prisoner’s Dilemma, which assumes that the two prisoners cannot communicate with one another, the Southampton system is not without its uses.
In reality, collusion between competitors is generally impossible to prevent, and a whole field of research has sprung up to study how intelligent agents can use what’s called coding theory to communicate covertly, and how such collusion can in turn be prevented. For example, the U.S. Federal Communications Commission’s radio spectrum auctions have historically been plagued by cases of bidders colluding by coding hidden signals in their bids, and so the FCC has brought in researchers to advise it on how to modify the rules of its auctions to try to mitigate collusion. ... A second round of the Iterated Prisoner’s Dilemma competition will be held in April of 2005 at the IEEE Symposium on Computational Intelligence and Games hosted by the University of Essex, in England. Armchair game theorists may submit an entry by visiting http://www.prisoners-dilemma.com/.
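For anyone who wants to play with these ideas, here's a minimal sketch of Tit for Tat and the master/slave scheme. Note that the handshake sequence and payoff values here are my own illustrative choices - the article doesn't publish Southampton's actual opening-move encoding:

```python
# Minimal Iterated Prisoner's Dilemma machinery. 'C' = cooperate, 'D' = defect.
# Standard payoffs: mutual cooperation 3 each, mutual defection 1 each,
# lone defector 5, betrayed cooperator 0.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(my_history, their_history):
    """Cooperate on the first move, then echo the opponent's last move."""
    return 'C' if not their_history else their_history[-1]

# Hypothetical handshake: a fixed opening sequence lets teammates recognize
# each other. (Southampton's real encoding was not published in the article.)
HANDSHAKE = ['C', 'D', 'C', 'D', 'D']

def southampton(role):
    """Build a master or slave strategy around the handshake."""
    def play(my_history, their_history):
        turn = len(my_history)
        if turn < len(HANDSHAKE):
            return HANDSHAKE[turn]          # announce identity
        if their_history[:len(HANDSHAKE)] == HANDSHAKE:
            # Recognized a teammate: master exploits, slave sacrifices.
            return 'D' if role == 'master' else 'C'
        return tit_for_tat(my_history, their_history)  # fallback vs outsiders
    return play

def match(strat_a, strat_b, rounds=200):
    """Play two strategies against each other; return their total scores."""
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strat_a(hist_a, hist_b)
        move_b = strat_b(hist_b, hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        hist_a.append(move_a); hist_b.append(move_b)
        score_a += pay_a; score_b += pay_b
    return score_a, score_b
```

With these payoffs, the master collects the maximum 5 points per round against its slave once the handshake completes, while two Tit for Tat players settle at 3 points per round each.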

I've seen this article as well... and while I thought it was interesting, I also have this to say...

...just because an algorithm cheats doesn't mean it's better than some other algorithm that doesn't cheat. In other words, IF you kept the constraint that algorithms couldn't cheat, then Tit-For-Tat is still an optimal strategy.

Personally, I don't think they've done anything really noteworthy, because all they have done is change (ignore) the rules of the game to ones which benefit their algorithm, in an environment where no other agents were permitted to act according to their rules. If we suddenly decided that cheating was a viable strategy in game theory problems, then many of the current optimal solutions would suddenly become sub-optimal.

Besides, I hate cheaters! ;)

Timkin

The story did strike me as a puff piece advertising the next round of competition. While I agree that new insights can be gained by studying the various ways that collusion leads to success in the game, I don't think Southampton should have been awarded any kind of prize for cheating. It seems to me that they ruined the credibility of the competition. Who would enter now knowing that the other contestants are probably cheating?

Why isn't cheating a valid solution, as long as the cheat has no legal and/or damaging social repercussions? For example, if person A forces person B to play a game of Russian roulette, isn't cheating a valid option for person B? The prisoner's dilemma is similar: the agents face major repercussions if things don't go their way, so why not cheat?

Cheating is a bad word in this case, because it connotes academic dishonesty, which IS wrong: it creates a situation in which you can falsely represent yourself as knowledgeable, which could later lead to problems in your career (specifically, losing money for the company you work for). Cheating on a test hurts the people around you.

But in the case of the prisoner's dilemma, the "game" is really just "cops vs robbers". The fact that the formal game has the restriction that it is impossible for the robbers to communicate just points out a fundamental flaw in the game, it doesn't represent anything real.

So it's not "cheating" to get the prisoners to communicate. It's a competition between the prisoners and the cops, and the so-called "cheating" is just countering the efforts of the cops, which is exactly what the prisoners want to do.

Really, what is cheating? When you get into the real world, there is breaking the law, and there is violating the trust of your friends and family, but other than that, what is there? This extends even to sporting competitions: cheating is "illegal" in the sense that, if you are caught, there are serious repercussions. It is also a violation of your teammates' trust (that is, if you cheat independently without their knowledge).

In this specific case, the Southampton team was clever enough to get around the rules. Their achievement should be recognized; they deserve this win.

I agree. It seems like everyone is labeling this "cheating" only because they didn't think of it first. Collusion is a very beneficial strategy to the winners when used correctly, but the losers pay big.

It's 'cheating' because it isn't solving the problem at all. The point is for each agent to do as well as possible on its own, not for teams to pollute the sample space to make one of theirs look better.

Imagine how angry people would be if one person owned 5 pro (american) football teams and decided that team 5 would always win against the other 4, team 4 against the remaining 3, 3 against the other 2, and 2 would always win against 1.

It isn't playing the game with that kind of collusion; it's just artificially inflating scores.

Captain_midnight:

Cheating in this context is the 'breaking of established and agreed upon rules for the conduct of a game'. What you seem to be missing here is not that collusion is a better strategy for maximising the reward of a few previously chosen individuals, but rather that:

1) The point is to try and solve the iterated prisoner's dilemma, not some other variant of the game;

2) It's not the same game if you change the rules for some players and not for others;

3) It's not supposed to be anything like a real 'cops and robbers' scenario as you suggested. It's an elucidation of principles of rationality and utility theory in a 'game' environment. Hence the name 'game theory'; and finally,

4) If you look at the utility of the colluding group as a whole, I would expect that it would be worse than that of the average player of the game, since at least half of the colluders need to be submissive (and accepting the penalty) to ensure that the other half win often enough to beat tit-for-tat.
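To put rough numbers on point 4, here's a quick sketch using the standard Axelrod payoff values, which may differ from the competition's actual scoring:

```python
# Standard Axelrod payoffs: T (temptation) = 5, R (reward) = 3,
# P (punishment) = 1, S (sucker) = 0.
T, R, P, S = 5, 3, 1, 0

# Per-round totals for a pair of players:
collusion_pair = T + S   # master defects (5) while the slave cooperates (0)
tft_pair = R + R         # two Tit-for-Tat players settle into cooperation

# The colluding pair earns less in total than a cooperating pair, even
# though the master individually out-scores any single TFT player.
assert collusion_pair < tft_pair
print(collusion_pair / 2, tft_pair / 2)  # 2.5 vs 3.0 points per player
```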


My post above had nothing to do with 'academic cheating' or dishonesty. We're talking about breaking the game rules and what that means (that there isn't a 'more optimal' solution to the IPD). Now sure, this is just a game and the consequences aren't that high (well, actually, there are academic consequences... like the gaining of research funding, publications, etc.), but the whole point of the IPD is that there is no collusion or communication; each prisoner needs an individual strategy that maximises its own reward over the iterations of the game. Placing agents into the game with the sole purpose of maximising their opponents' reward breaks the concept of the game, because now the agent in the other room isn't their opponent, but rather a friend (and remember, this ISN'T real cops and robbers, so we're not talking about two stooges nicked for stealing handbags).


Ask yourself this simple question: what if the game were played without the two prisoners being able to identify themselves? Would this 'collusion strategy' still be optimal? (You can assume you know the probability that the opponent is actually one of your friends and the probability that they are more senior to you or less senior.)

Timkin

It seems to me that this talk of cheating is missing the purpose of the contest. These researchers could hardly care less about which team wins a Prisoner's Dilemma tournament; what they want to know is which strategy wins, in order to better understand what sort of strategies humans and animals apply in real life.

There is an obvious analogy here to social systems : The Southampton people have re-invented hierarchical societies, in which a small minority does very well for itself, while the vast mass suffers. The other researchers, in contrast, are trying to do anarchic capitalism, where every-AI-for-itself leads to a much more even distribution of resources.

There is no cheating, only an interesting demonstration of complicated strategies.

What is also of great importance to realize is that the strategy which won may not be the best overall strategy, because the *environment* (the strategies the other competitors use) matters just as much as the winner's own strategy in making the winner win! What if Tit for Tat were not present in the environment, which is entirely possible? Then surely another strategy would be best!

But here is the most interesting environmental point -- can a strategy be devised that figures out the patterns of the new winner and circumvents them? Tit for Tat has proven remarkably stable: for a long time no competitor managed to beat it, even though everyone knew exactly how TFT operates. What if this new winner can be easily taken advantage of -- then it wouldn't survive at all! In fact, in the second of Axelrod's original competitions, robots that performed well in the first were utterly taken advantage of by "mean" robots that tried to win by playing devilishly -- and even so, TFT survived!
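To make the environment point concrete, here's a toy round-robin sketch; the payoffs and field compositions below are purely illustrative:

```python
# Toy round-robin showing that the 'best' strategy depends on the field.
PAYOFF = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
          ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def always_cooperate(mine, theirs):
    return 'C'

def always_defect(mine, theirs):
    return 'D'

def tit_for_tat(mine, theirs):
    return 'C' if not theirs else theirs[-1]

def match(strat_a, strat_b, rounds=100):
    ha, hb, sa, sb = [], [], 0, 0
    for _ in range(rounds):
        a, b = strat_a(ha, hb), strat_b(hb, ha)
        pa, pb = PAYOFF[(a, b)]
        ha.append(a); hb.append(b)
        sa += pa; sb += pb
    return sa, sb

def tournament(field):
    """Round-robin every pair once; return each player's total score."""
    scores = [0] * len(field)
    for i in range(len(field)):
        for j in range(i + 1, len(field)):
            si, sj = match(field[i], field[j])
            scores[i] += si
            scores[j] += sj
    return scores

# Against naive cooperators, Always Defect tops the table...
naive = [always_defect] + [always_cooperate] * 4
# ...but in a field of Tit for Tat players it finishes dead last.
hostile = [always_defect] + [tit_for_tat] * 4
```

The same strategy wins one tournament and loses the other without changing a line of its own code; only the field changed.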

For anyone more interested in the theoretical discussion of the contest and why different strategies win, there's an entire chapter devoted to it in Richard Dawkins's book "The Selfish Gene", which also emphasizes the important role of the environment.

Quote:
Original post by King of Men
It seems to me that this talk of cheating is missing the purpose of the contest. These researchers could hardly care less about which team wins a Prisoner's Dilemma tournament; what they want to know is which strategy wins


So are you suggesting that a strategy that ignores the rules and performs better than those that obey the rules is a better strategy?

If all we wanted to do was find the most optimal strategy then of course we'd ignore all of the rules that penalise us and impose lots of new rules that ensured we won whatever competition we were in. But then that's not the point of a game, is it (whether it be for social research or entertainment)? This has important relevance to computer games too. Much of the fun (and thus optimal utility) obtained in winning a game comes from doing so within the bounds of the game's rules. Sure, there are some who want to get to the end by whatever means they can. Most, though, want to beat the game without resorting to cheating. That's why game AI is such a huge challenge!

Getting back to the IPD: I think we can all agree that using collusion/cheating strategies is likely to increase the expected utility of certain individuals. However, the point of such competitions is NOT to find any strategy that wins, but rather a strategy that wins within the constraints of the game. We can see this by considering another strategy that is guaranteed to beat Tit-For-Tat and the new collusion strategy: Shoot-The-Cop. In this strategy, the prisoner pulls out a hidden firearm, kills the cop and walks free, thinking they're all nuts for offering him a choice! Clearly he obtained the maximum reward - freedom - yet some would say this is an absurd strategy, because he cheated by carrying a hidden gun and because we added a rule that permits the killing of the jailers. Where do we draw the line between the rules we can ignore/omit and those we must maintain to preserve the 'game'?

The point I'm trying to get across is that if you change the rules then you have changed the game, and it's no longer the Iterated Prisoner's Dilemma but something else... something probably less interesting and less important too.

Cheers,

Timkin

Timkin, I think the disagreements here are a result of different goals from this experiment. You, and other academic types, are interested in finding a better strategy in terms of game theory, while others are more interested in finding an optimal strategy for the real world. For instance, consider a drug lord that could work to gain a ruthless reputation, and then he could be reasonably sure that the other prisoner would choose to spend more time in jail than to rat him out, because being such a ruthless person the drug lord will surely have this other prisoner tortured and killed, then kill his entire family. Then the optimal strategy extends beyond the walls of the jail house. But you are interested in something different, and "playing by the rules" can mean different things depending upon the situation.

Quote:
Original post by Timkin
So are you suggesting that a strategy that ignores the rules and performs better than those that obey the rules is a better strategy?


In a word, yes. Evolution doesn't care how weird your niche is, it only cares whether your genes get passed on.

But more to the point, I don't think the Southampton team broke any rules. They made a set of bots that altered the environment in their own favour, true; but they were still playing IPD. That's hardly in the same class as pulling out a gun. Plenty of previous strategies have relied on trying to figure out which algorithm the opponent is following, without anybody calling it cheating.

Quote:
Original post by Russell
Timkin, I think the disagreements here are a result of different goals from this experiment. You, and other academic types, are interested in finding a better strategy in terms of game theory, while others are more interested in finding an optimal strategy for the real world. For instance, consider a drug lord that could work to gain a ruthless reputation, and then he could be reasonably sure that the other prisoner would choose to spend more time in jail than to rat him out, because being such a ruthless person the drug lord will surely have this other prisoner tortured and killed, then kill his entire family. Then the optimal strategy extends beyond the walls of the jail house. But you are interested in something different, and "playing by the rules" can mean different things depending upon the situation.


It seems to me that the real world goal isn't to devise better strategies for drug lords, but to devise better understanding of the strategies that drug lords use. The example given in the article is business collusion in the bidding process. The idea being that a better understanding of how 'agents' collude can provide better tools for identifying and prosecuting collusion in situations where it is illegal - such as on government contracts or in setting prices.

Quote:
Original post by King of Men
In a word, yes. Evolution doesn't care how weird your niche is, it only cares whether your genes get passed on.

But more to the point, I don't think the Southampton team broke any rules. They made a set of bots that altered the environment in their own favour, true; but they were still playing IPD. That's hardly in the same class as pulling out a gun. Plenty of previous strategies have relied on trying to figure out which algorithm the opponent is following, without anybody calling it cheating.


I first read about IPD in Richard Dawkins's _Selfish Gene_, and so I look at this from the evolutionary point of view. Just like you I believe.
I am trying to find an equivalent of this master/slave collusion in nature, and the only thing I can think of is social insects such as ants. One ant reaps all the benefits while a gazillion slaves labour for her - not because she's better or something, but because she will carry on the species, the genes; and that's all that really matters.
Why is it cheating ?
I am more interested in the overall result for the Southampton team: did the winnings of the master compensate for the losses of the slaves? If they did, then surely that's a valid strategy.

All we have to see now is other strategies that will keep an eye on other bots trying to apply this tactic and exploit the fact.
Something that says "this guy's a slave, abuse him!".
Think of ants stroking aphids to collect honeydew, or something like that.

I was watching an amazing documentary on blackjack card counters the other night, and this article reminds me very much of it: the best tactic that evolved to "cheat" at blackjack was to have teams of players working together, most of them doing the counting work without trying to exploit it, while another would wait for a signal from the counters to bet high and collect shitloads of money, the winnings compensating for the money invested by the counters.
How was this countered? Simply by getting lists of known card-counting teams and kicking them out before they could get started...

So you could imagine that the tournament organisers would try to detect such master/slave teams and exclude them.
Or better, as I suggested above, have a bot that observes other bots and tries to determine what strategy they are using - and if it finds itself paired with one of the slaves, emits the correct signals to pass for a master and reap the reward! :)

Just a thought.

LessBread -

Awesome post. Nash was one of my heroes growing up, and it doesn't surprise me to see a post like this from you. Bravo-Zulu.


I saw this post last night, but didn't read it until this morning. That was a bad move, since I thought about it all night. lol. I came up with a similar solution to the Southampton crew's, complete with the proposed strategy, and was ready to present a better solution until I read the article...

Story of my life... always a day late and a dollar short. <sigh>.


I honestly don’t think you can refer to this as 'cheating', since the ‘word’ of the rules was still being adhered to, although I can see the argument for the spirit of the game being broken. The rules of the game do not allow *direct* communication, and no robot communicated directly. Having a plan of predefined moves to identify each other does not break this rule, nor does the self-sacrifice part of it. Feedback is communication, and therefore a viable alternative for identifying the motives of the other prisoner. Ironically, it also demonstrates that collaboration and teamwork are ultimately a better solution to many problems than solitude - a lesson that many walks of life need to review in this world imho.

Okay... if we're going to discuss this as a viable strategy in the 'Real World (TM)', then that's one matter. From the game theory perspective, I still stand by my stance that collusion is cheating.

Just a quick response on the drug lord example: Russell, you've changed the utility function of the IPD in your example by stating that the penalty for ratting out is now extremely bad no matter what the other party does. I.e., you've removed any and all positive return that might be obtained, by implying that 'one day you'll pay for ratting the drug lord out'.

From an evolutionary or even social perspective (and this includes business as well), yes, I certainly accept that collusion is going to be a viable strategy in certain problems, since for the few agents at the top of the hierarchy, it pays more. Heck, that's how most Western economies work! I don't though, accept that collusion is a globally better strategy when you consider both of the colluding parties in the IPD. That is, (and I accept I haven't seen numeric results, so this is based on my experience/knowledge/intuition), if you consider the expected utility of both colluders over many iterations of the game, it would be lower than the expected utility of a TFT player. In other words, one of the pair is taking a maximum utility hit so the other can make a moderate gain that is, on average, more than the gains obtained by TFT.

I don't see that collusion in the bidding/tendering processes of business is a reasonable example though, since there is no actual penalty applied to companies who aren't successful in that process. Auctions are not equivalent to the IPD, nor are tender processes.

Timkin

Quote:
Original post by Timkin
I don't though, accept that collusion is a globally better strategy when you consider both of the colluding parties in the IPD. That is, (and I accept I haven't seen numeric results, so this is based on my experience/knowledge/intuition), if you consider the expected utility of both colluders over many iterations of the game, it would be lower than the expected utility of a TFT player. In other words, one of the pair is taking a maximum utility hit so the other can make a moderate gain that is, on average, more than the gains obtained by TFT.

I suppose that's the real problem right there. The master's maximum winnings would have to compensate for the maximum loss incurred by the slave(s).
In the blackjack example, this is exactly what happened. Out of three people, only one won, but he more than compensated for the money "lost" (I think "invested" is more appropriate, really) by the two wingmen.

Quote:

I don't see that collusion in the bidding/tendering processes of business is a reasonable example though, since there is no actual penalty applied to companies who aren't successful in that process. Auctions are not equivalent to the IPD, nor are tender processes.
Timkin


That's why I mentioned blackjack. Although it's true you can't draw a direct parallel between blackjack and the IPD (it would depend on the reward/loss numbers, I think).

Quote:
Original post by xiuhcoatl
LessBread -

Awesome post. Nash was one of my heroes growing up, and it doesn't surprise me to see a post like this from you. Bravo-Zulu.

Thanks! [grin]


I came across a press release today that I think relates to this topic: UCLA study points to evolutionary roots of altruism, moral outrage. The study was performed by anthropologists, and while the article doesn't mention the use of computers, it does describe the study as being based on a mathematical model and uses language that suggests computer simulation. Unfortunately, the press release doesn't provide much in the way of technical details.

Here's the portion from the press release that describes the model and some of the outcomes:

Quote:

In his mathematical model, Panchanathan pitted three types of society members against one another:

* "Cooperators," or people who always contribute to the public good and who always assist individual community members in the group with the favors that are asked of them.
* "Defectors," who never contribute to the public good nor assist other community members who ask for help.
* "Shunners," or hard-nosed types who contribute to the public good, but only lend aid to those individuals with a reputation for contributing to the public good and helping other good community members who ask for help. For members in bad standing, shunners withhold individual assistance.

During the course of the game, both cooperators and shunners helped to clear the swamp. The benefits from the mosquito-free swamp, however, flowed to the whole community, including defectors. When the researcher took only this behavior into account, the defectors came out on top because they enjoyed the same benefits as the other types, but they paid no costs for those benefits.

But when it came to getting help in home repair, the defectors didn't always do so well. The cooperators helped anyone who asked, but the shunners were selective; they only help those with a reputation for clearing the swamp and helping good community members in home repair. By not helping defectors when they ask for help, shunners were able to save time and resources, thus improving their score. If the loss that defectors experienced from not being helped by shunners was greater than the cost they would have paid to clear the swamp, then defectors lost out.

After these social interactions went on for a period of time that might approximate a generation, individuals were allowed to reproduce based on accumulated scores, so that those with more "fitness points" had more children. Those children were assumed to have adopted their parents' strategy.

Eventually, Panchanathan found that communities end up with either all defectors or all shunners.

"Both of those end points represent 'evolutionarily stable equilibriums'; no matter how much time passes, the make-up of the population does not change," Panchanathan said.

In a community with just cooperators and defectors, defectors -- not surprisingly -- always won. Also when shunners were matched against cooperators, shunners won.

"The cooperators were too nice; they died out," Panchanathan said. "In order to survive, they had to be discriminate about the help they gave."

But when shunners were matched against defectors, the outcome was either shunners or defectors. The outcome depended on the initial frequency of shunners. If enough shunners were present at the beginning of the exercise, then shunners prevailed. Otherwise, defectors prevailed, potentially pointing to the precarious nature of cooperative society.
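The shunner-versus-defector dynamics described above can be sketched as a toy replicator-style simulation. All of the parameter values below are my own inventions for illustration; the press release gives none:

```python
# Toy replicator-style sketch of the shunner-vs-defector outcome.
# Shunners pay COST to clear the swamp and grant private HELP only to
# fellow contributors (paying HCOST to give it); defectors contribute
# nothing and, with no indiscriminate cooperators around, receive no help.
BASE, COST, HELP, HCOST = 10.0, 1.0, 3.0, 0.5

def evolve(shunner_freq, generations=200):
    """Iterate fitness-proportional reproduction; return final shunner share."""
    s = shunner_freq
    for _ in range(generations):
        # The public benefit of the cleared swamp flows to everyone equally,
        # so it cancels out of the fitness comparison and is omitted.
        f_shun = BASE + (HELP - HCOST) * s - COST  # helped by shunners, pays costs
        f_def = BASE                               # free-rides, gets no help
        avg = s * f_shun + (1 - s) * f_def
        s = s * f_shun / avg                       # replicator update
    return s
```

With these numbers the tipping point sits at a shunner frequency of COST / (HELP - HCOST) = 0.4: starting above it, shunning takes over the population; starting below it, defectors prevail, matching the "initial frequency" result in the press release.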


I suppose what is novel about this study are the conclusions it draws about the possible origins of social morality - because back in college I read studies about real people in real situations that described the same outcomes. The example I remember involved clearing weeds from an irrigation ditch rather than clearing a swamp but I suppose the kind of collective action isn't really important.

This brings up something about the IPD in a computer science context that I don't fully get, since I come at the IPD from a social science perspective. In a social science perspective, at least as far as the PD model is applied to the real world, it is assumed that an agent's reputation follows them to some extent. So it's tit for tat, but, as in the UCLA example, defectors get a reputation for defection and end up being shunned. Maybe it's the adherence to the abstract formulation of the model (the stricture against communication) that I'm not getting, or maybe I do get it and well - I don't know. Anyway.

Do you think it's possible to apply the findings of the UCLA study to the Southampton strategy, and do you think that would improve the Southampton results? It seems clear that Southampton's 'slaves' correspond with UCLA's 'cooperators', but do Southampton's 'masters' correspond with UCLA's 'shunners' or 'defectors'?

Well, to be really fair, the *iterated* prisoner's dilemma isn't really a classic prisoner's dilemma, since the very fact that it is iterated gives the possibility of communication. Of course, this reflects social situations more accurately, since you tend to interact with plenty of people more than once. However, I don't think the result is that interesting from a theoretical game-theoretic standpoint, which is why academics like Timkin think it's cheating. And trying to link this to feudal systems and the like seems kind of far-fetched. The good thing is it has brought exposure to game theory and the prisoner's dilemma!

Quote:
Original post by xiuhcoatl
I honestly don’t think you can refer to this as 'cheating', since the ‘word’ of the rules was still being adhered to, although I can see the argument for the spirit of the game being broken. The rules of the game do not allow *direct* communication, and no robot communicated directly. Having a plan of predefined moves to identify each other does not break this rule, nor does the self-sacrifice part of it. Feedback is communication, and therefore a viable alternative for identifying the motives of the other prisoner.


It does violate the rules to which TFT adhered, though in a different sense. To quote Axelrod, "there is no mechanism available to the players to make enforceable threats or commitments." Lack of communication is actually a secondary concern. Even if direct communication is allowed, I can't guarantee that the other prisoner will keep up his end of any bargain we might establish. We may agree to both cooperate whenever we can identify each other, but nothing is keeping him from double crossing me, or vice versa. On the other hand, the master agent *knows* that the slave agent will fulfill his end of their 'bargain.'
Even if you disregard the above, you still can't directly compare the Southampton strategy to TFT, because the Southampton entry is essentially *two* strategies, only one of which actually performs well, and which requires the other to be present. Drop in the master agent by itself and I imagine we'll find it to be average at best, depending on what its backup strategy is.
Like others have already stated, I'm not saying this is an uninteresting or useless find, but I don't think it warrants the kind of comparisons it's inspired.

I don't see how this is inaccurate to life at all. If you look at societies, the tendency is for a few to take all the control they can and for the majority to submit to what the leaders decide.

I'm not really sure I'd call this a solution to the problem if you define the problem as "how can both do the best?", but if you define the problem as racking up the highest score, I don't see a problem.

To me, what you are really seeing here is artificial intelligence simulating part of life.

I don't even see the point of the research. Putting the issue of cheating aside, surely it should come as no surprise that if one agent sacrifices itself to help a second, then that second one is likely to be in a better position than any third agent that has received no such consideration.

