Collusion Strategy Wins Iterated Prisoner's Dilemma Competition
Timkin, I think the disagreements here result from different goals for this experiment. You, and other academic types, are interested in finding a better strategy in terms of game theory, while others are more interested in finding an optimal strategy for the real world. For instance, consider a drug lord who works to gain a ruthless reputation: he can then be reasonably sure that the other prisoner would choose to spend more time in jail rather than rat him out, because such a ruthless man would surely have him tortured and killed, and then kill his entire family. The optimal strategy then extends beyond the walls of the jailhouse. But you are interested in something different, and "playing by the rules" can mean different things depending upon the situation.
Quote:Original post by Timkin
So are you suggesting that a strategy that ignores the rules and performs better than those that obey the rules is a better strategy?
In a word, yes. Evolution doesn't care how weird your niche is, it only cares whether your genes get passed on.
But more to the point, I don't think the Southampton team broke any rules. They made a set of bots that altered the environment in their own favour, true; but they were still playing IPD. That's hardly in the same class as pulling out a gun. Plenty of previous strategies have relied on trying to figure out which algorithm the opponent is following, without anybody calling it cheating.
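For anyone curious what such a colluding pair might look like in code, here's a minimal sketch. The handshake sequence, the payoff matrix, and the tit-for-tat fallback are my own illustrative choices, not the actual Southampton implementation:

```python
C, D = "C", "D"
HANDSHAKE = [C, D, D, C, D]   # hypothetical recognition sequence, not Southampton's real one

class Colluder:
    def __init__(self, role):
        self.role = role      # "master" or "slave"
        self.history = []     # opponent's moves so far

    def move(self):
        t = len(self.history)
        if t < len(HANDSHAKE):
            return HANDSHAKE[t]                 # open by playing the code sequence
        if self.history[:len(HANDSHAKE)] == HANDSHAKE:
            # opponent echoed the code: assume it's a teammate
            return C if self.role == "slave" else D
        return self.history[-1]                 # stranger: fall back to tit-for-tat

    def observe(self, opp_move):
        self.history.append(opp_move)

# Standard Axelrod payoffs: (my points, their points) for each move pair
PAYOFF = {(C, C): (3, 3), (C, D): (0, 5), (D, C): (5, 0), (D, D): (1, 1)}

def play(a, b, rounds=100):
    sa = sb = 0
    for _ in range(rounds):
        ma, mb = a.move(), b.move()
        a.observe(mb)
        b.observe(ma)
        pa, pb = PAYOFF[(ma, mb)]
        sa += pa
        sb += pb
    return sa, sb

print(play(Colluder("master"), Colluder("slave")))   # master farms points off the slave
```

Against a teammate the master scores the temptation payoff nearly every round while the slave scores almost nothing, which is exactly the "altering the environment in their own favour" being debated here.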
Quote:Original post by Russell
Timkin, I think the disagreements here result from different goals for this experiment. You, and other academic types, are interested in finding a better strategy in terms of game theory, while others are more interested in finding an optimal strategy for the real world. For instance, consider a drug lord who works to gain a ruthless reputation: he can then be reasonably sure that the other prisoner would choose to spend more time in jail rather than rat him out, because such a ruthless man would surely have him tortured and killed, and then kill his entire family. The optimal strategy then extends beyond the walls of the jailhouse. But you are interested in something different, and "playing by the rules" can mean different things depending upon the situation.
It seems to me that the real world goal isn't to devise better strategies for drug lords, but to devise better understanding of the strategies that drug lords use. The example given in the article is business collusion in the bidding process. The idea being that a better understanding of how 'agents' collude can provide better tools for identifying and prosecuting collusion in situations where it is illegal - such as on government contracts or in setting prices.
Quote:Original post by King of Men
In a word, yes. Evolution doesn't care how weird your niche is, it only cares whether your genes get passed on.
But more to the point, I don't think the Southampton team broke any rules. They made a set of bots that altered the environment in their own favour, true; but they were still playing IPD. That's hardly in the same class as pulling out a gun. Plenty of previous strategies have relied on trying to figure out which algorithm the opponent is following, without anybody calling it cheating.
I first read about IPD in Richard Dawkins's _The Selfish Gene_, so I look at this from the evolutionary point of view. Just like you, I believe.
I am trying to find an equivalent of this master/slave collusion in nature, and the only thing I can think of is swarm insects such as ants. One ant reaps all the benefits, while a gazillion slaves labour for her. Not because she's better or something, but because she will carry on the species, the genes; and that's all that really matters.
Why is it cheating?
I am more interested in the overall result for the Southampton team: did the winnings of the master compensate for the losses of the slaves? If they did, then surely that's a valid strategy.
All we need to see now are other strategies that keep an eye out for bots trying to apply this tactic, and exploit the fact.
Something that says "this guy's a slave, abuse him!".
Think of ants milking aphids for honeydew, or something like that.
I was watching an amazing documentary on blackjack card counters the other night, and this article reminds me very much of it: the best tactic that evolved to "cheat" at blackjack was to have teams of players working together, most of them doing the counting work without acting on it themselves, while another would wait for a signal from the counters to bet high and collect shitloads of money, the winnings compensating for the money invested by the counters.
How was this countered? Simply by getting lists of known card-counting teams and kicking them out of a tournament before they could get started...
So you could imagine that the tournament organisers would try to detect such master/slave teams and exclude them.
Or better, as I suggested above, have a bot that observes other bots and tries to determine what strategy they are using, and if they are paired with one of the slaves, emits the correct signals so that you pass for a master and reap the reward! :)
Just a thought.
LessBread -
Awesome post. Nash was one of my heroes growing up, and it doesn't surprise me to see a post like this from you. Bravo-Zulu.
I saw this post last night, but didn't read it until this morning. That was a bad move, since I thought about it all night. lol. I came up with a solution similar to the Southampton crew's, complete with the proposed strategy, and was ready to present it as a better solution until I read the article...
Story of my life... always a day late and a dollar short. <sigh>.
I honestly don't think you can refer to this as 'cheating', since the 'word' of the rules was still being adhered to, although I can see the argument for the spirit of the game being broken. The rules of the game do not allow for *direct* communication, and no robot communicated directly. Having a plan of predefined moves to identify each other does not break this rule, nor does the self-sacrifice part of it. Feedback is communication, and therefore a viable alternative for identifying the motives of the other prisoner. Ironically, it also demonstrates that collaboration and teamwork are ultimately a better solution to many problems than going it alone - a lesson that many walks of life need to review in this world imho.
Okay... if we're going to discuss this as a viable strategy in the 'Real World (TM)', then that's one matter. From the game theory perspective, I still stand by my stance that collusion is cheating.
Just a quick response on the drug lord example: Russell, you've changed the utility function of the IPD in your example by stating that the penalty for ratting out is now extremely bad no matter what the other party does. I.e., you've removed any and all positive return that might be obtained, by implying that 'one day you'll pay for ratting the drug lord out'.
From an evolutionary or even social perspective (and this includes business as well), yes, I certainly accept that collusion is going to be a viable strategy in certain problems, since for the few agents at the top of the hierarchy, it pays more. Heck, that's how most Western economies work! I don't, though, accept that collusion is a globally better strategy when you consider both of the colluding parties in the IPD. That is (and I accept I haven't seen numeric results, so this is based on my experience/knowledge/intuition), if you consider the expected utility of both colluders over many iterations of the game, it would be lower than the expected utility of a TFT player. In other words, one of the pair is taking a maximum utility hit so the other can make a moderate gain that is, on average, more than the gains obtained by TFT.
I don't see that collusion in the bidding/tendering processes of business is a reasonable example, though, since there is no actual penalty applied to companies who aren't successful in that process. Auctions are not equivalent to IPD, nor are tender processes.
Timkin
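Timkin's pair-versus-pair point can be put in rough numbers using the standard Axelrod payoff matrix (T=5, R=3, P=1, S=0; the tournament's actual values may differ). A locked-in master/slave pair averages 2.5 points per player per round, while two TFT players in mutual cooperation average 3, so taken as a pair the colluders really do come out behind:

```python
# Standard Axelrod per-round payoffs: temptation, reward, punishment, sucker
T, R, P, S = 5, 3, 1, 0

# Master defects while the slave cooperates, every round after recognition
colluding_pair_avg = (T + S) / 2.0   # (5 + 0) / 2 = 2.5

# Two TFT players settle into permanent mutual cooperation
tft_pair_avg = (R + R) / 2.0         # (3 + 3) / 2 = 3.0

print(colluding_pair_avg, tft_pair_avg)   # 2.5 3.0
```

The collusion only "wins" because the league table ranks individuals rather than teams: the master's surplus sits at the top of the table while the slave's deficit is buried at the bottom.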
Quote:Original post by Timkin
I don't, though, accept that collusion is a globally better strategy when you consider both of the colluding parties in the IPD. That is (and I accept I haven't seen numeric results, so this is based on my experience/knowledge/intuition), if you consider the expected utility of both colluders over many iterations of the game, it would be lower than the expected utility of a TFT player. In other words, one of the pair is taking a maximum utility hit so the other can make a moderate gain that is, on average, more than the gains obtained by TFT.
I suppose that's the real problem right there. The maximum winning would have to compensate the maximum loss incurred by the slave(s).
In the Black Jack example, this was exactly what happened. Out of three people, only one won, but he more than compensated the money "lost" (I think invested is more appropriate, really) by the two wingmen.
Quote:
I don't see that collusion in the bidding/tendering processes of business is a reasonable example, though, since there is no actual penalty applied to companies who aren't successful in that process. Auctions are not equivalent to IPD, nor are tender processes.
Timkin
That's why I mention blackjack. Although it's true you can't make a direct parallel between blackjack and an IPD (it would depend on the reward/loss numbers, I think).
Quote:Original post by xiuhcoatl
LessBread -
Awesome post. Nash was one of my heroes growing up, and it doesn't surprise me to see a post like this from you. Bravo-Zulu.
Thanks! [grin]
I came across a press release today that I think relates to this topic: UCLA study points to evolutionary roots of altruism, moral outrage. The study was performed by anthropologists and while the article doesn't mention the use of computers, it does describe the study as being based on a mathematical model and uses language that suggests the use of computers. Unfortunately, the press release doesn't provide much in the way of technical details.
Here's the portion from the press release that describes the model and some of the outcomes:
Quote:
In his mathematical model, Panchanathan pitted three types of society members against one another:
* "Cooperators," or people who always contribute to the public good and who always assist individual community members in the group with the favors that are asked of them.
* "Defectors," who never contribute to the public good nor assist other community members who ask for help.
* "Shunners," or hard-nosed types who contribute to the public good, but only lend aid to those individuals with a reputation for contributing to the public good and helping other good community members who ask for help. For members in bad standing, shunners withhold individual assistance.
During the course of the game, both cooperators and shunners helped to clear the swamp. The benefits from the mosquito-free swamp, however, flowed to the whole community, including defectors. When the researcher took only this behavior into account, the defectors came out on top because they enjoyed the same benefits as the other types, but they paid no costs for the benefits.
But when it came to getting help in home repair, the defectors didn't always do so well. The cooperators helped anyone who asked, but the shunners were selective; they only helped those with a reputation for clearing the swamp and helping good community members in home repair. By not helping defectors when they asked for help, shunners were able to save time and resources, thus improving their score. If the loss that defectors experienced from not being helped by shunners was greater than the cost they would have paid to clear the swamp, then defectors lost out.
After these social interactions went on for a period of time that might approximate a generation, individuals were allowed to reproduce based on accumulated scores, so that those with more "fitness points" had more children. Those children were assumed to have adopted their parents' strategy.
Eventually, Panchanathan found that communities end up with either all defectors or all shunners.
"Both of those end points represent 'evolutionarily stable equilibriums'; no matter how much time passes, the make-up of the population does not change," Panchanathan said.
In a community with just cooperators and defectors, defectors -- not surprisingly -- always won. Also when shunners were matched against cooperators, shunners won.
"The cooperators were too nice; they died out," Panchanathan said. "In order to survive, they had to be discriminate about the help they gave."
But when shunners were matched against defectors, the outcome was either shunners or defectors. The outcome depended on the initial frequency of shunners. If enough shunners were present at the beginning of the exercise, then shunners prevailed. Otherwise, defectors prevailed, potentially pointing to the precarious nature of cooperative society.
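Out of curiosity, the cooperator/defector/shunner dynamic described above can be sketched as a mean-field replicator simulation. The payoff parameters below (contribution cost, help benefit, help cost, baseline fitness) are my own invented numbers, not Panchanathan's, so only the qualitative outcomes should be trusted:

```python
def step(x, c=1.0, b=2.0, h=3.0, k=0.2, w0=5.0):
    """One generation over (cooperator, defector, shunner) population shares."""
    xC, xD, xS = x
    g = xC + xS                       # fraction contributing to the public good (the swamp)
    fC = -c + b * g + h * g - k       # contributes, helped by C and S, helps any asker
    fD = b * g + h * xC               # free-rides, helped only by cooperators
    fS = -c + b * g + h * g - k * g   # contributes, helps only good-standing askers
    w = (w0 + fC, w0 + fD, w0 + fS)   # baseline fitness keeps all weights positive
    avg = xC * w[0] + xD * w[1] + xS * w[2]
    # Replicator update: each type grows in proportion to its relative fitness
    return (xC * w[0] / avg, xD * w[1] / avg, xS * w[2] / avg)

def run(x, gens=3000):
    for _ in range(gens):
        x = step(x)
    return x

# Cooperators vs defectors: defectors always take over.
print(run((0.5, 0.5, 0.0)))
# Shunners vs defectors: the outcome depends on the initial shunner share.
print(run((0.0, 0.4, 0.6)))   # enough shunners -> shunners fix
print(run((0.0, 0.8, 0.2)))   # too few shunners -> defectors fix
```

With cooperators and shunners mixed and a few defectors seeded in, the shunners also slowly pull ahead of the cooperators, because cooperators keep paying the cost of helping defectors - which matches the study's "the cooperators were too nice" result.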
I suppose what is novel about this study is the conclusions it draws about the possible origins of social morality, because back in college I read studies about real people in real situations that described the same outcomes. The example I remember involved clearing weeds from an irrigation ditch rather than clearing a swamp, but I suppose the kind of collective action isn't really important.
This brings up something about IPD in a computer science context that I don't fully get, since I come at IPD from a social science perspective. In a social science perspective, at least as far as the PD model is applied to the real world, it is assumed that an agent's reputation follows them to some extent. So it's tit for tat, but, as in the UCLA example, defectors get a reputation for defection and end up being shunned. Maybe it's the adherence to the abstract formulation of the model (the stricture against communication) that I'm not getting, or maybe I do get it and, well - I don't know. Anyway.
Do you think it possible to apply the findings of the UCLA study to the Southampton strategy, and do you think that would improve the Southampton results? It seems clear that Southampton's 'slaves' correspond with UCLA's 'cooperators', but do Southampton's 'masters' correspond with UCLA's 'shunners' or 'defectors'?
Well, to be really fair, the *iterated* prisoner's dilemma isn't really a classic prisoner's dilemma, since the very fact that it is iterated creates the possibility of communication. Of course, this reflects social situations more accurately, since you tend to interact with plenty of people more than once. However, I don't think the result is that interesting from a theoretical game theory standpoint, which is why academics like Timkin think it's cheating. And trying to link this to feudal systems and the like seems kind of far-fetched. The good thing is it has brought exposure to game theory and the prisoner's dilemma!
Quote:Original post by xiuhcoatl
I honestly don't think you can refer to this as 'cheating', since the 'word' of the rules was still being adhered to, although I can see the argument for the spirit of the game being broken. The rules of the game do not allow for *direct* communication, and no robot communicated directly. Having a plan of predefined moves to identify each other does not break this rule, nor does the self-sacrifice part of it. Feedback is communication, and therefore a viable alternative for identifying the motives of the other prisoner.
It does violate the rules to which TFT adhered, though in a different sense. To quote Axelrod, "there is no mechanism available to the players to make enforceable threats or commitments." Lack of communication is actually a secondary concern. Even if direct communication is allowed, I can't guarantee that the other prisoner will keep up his end of any bargain we might establish. We may agree to both cooperate whenever we can identify each other, but nothing is keeping him from double crossing me, or vice versa. On the other hand, the master agent *knows* that the slave agent will fulfill his end of their 'bargain.'
Even if you disregard the above, you still can't directly compare the Southampton strategy to TFT, because SH is essentially *two* strategies, only one of which actually performs well, and which requires the other to be present. Drop in the master agent by itself and I imagine we'll find it to be average at best, depending on what its backup strategy is.
Like others have already stated, I'm not saying this is an uninteresting or useless find, but I don't think it warrants the kind of comparisons it's inspired.