Category Archives: Game Theory

“Epistemic Game Theory,” E. Dekel & M. Siniscalchi (2014)

Here is a handbook chapter that is long overdue. The theory of epistemic games concerns a fairly novel justification for solution concepts under strategic uncertainty – that is, situations where what I want to do depends on what other people do, and vice versa. We generally analyze these as games, and have a bunch of equilibrium (Nash, subgame perfection, etc.) and nonequilibrium (Nash bargaining, rationalizability, etc.) solution concepts. So which should you use? I can think of four classes of justification for a game solution. First, the solution might be stable: if you told each player what to do, no one person (or sometimes group) would want to deviate. Maskin mentions that this justification is particularly worthy when it comes to mechanism design. Second, the solution might be the outcome of a dynamic selection process, such as evolution or a particular learning rule. Third, the solution may be justified by certain axiomatic first principles; the Shapley value is a good example in this class. The fourth class, however, is the one we most often teach students: a solution concept is good because it is justified by individual behavior assumptions. Nash, for example, is often thought to be justified by “rationality plus correct beliefs”. Backward induction is similarly justified by “common knowledge of rationality at all states.”

Those are informal arguments, however. The epistemic games (or sometimes, “interactive epistemology”) program seeks to formally analyze assumptions about the knowledge and rationality of players and what they imply for behavior. There remain many open questions (for instance, I asked around and could only come up with one paper on the epistemics of coalitional games), but the results proven so far are actually fascinating. Let me give you three: rationality and common belief in rationality imply that only rationalizable strategies are played, the requirements for Nash are different depending on how many players there are, and backward induction is surprisingly difficult to justify on epistemic grounds.

First, rationalizability. Take a game and remove any strictly dominated strategy for each player. Now in the reduced game, remove anything that is strictly dominated. Continue doing this until nothing is left to remove. The remaining strategies for each player are “rationalizable”. If players can hold any belief they want about what potential “types” opponents may be – where a given (Harsanyi) type specifies what an opponent will do – then as long as we are all rational, we all believe the opponents are rational, we all believe the opponents all believe that we all are rational, ad infinitum, the only possible outcomes of the game are the rationalizable ones. Proving this is actually quite involved: if we take as primitive the “hierarchy of beliefs” of each player (what do I believe my opponents will do, what do I believe they believe I will do, and so on), then we need to show that any hierarchy of beliefs can be written down in a type structure, and then we need to be careful about how we define “rational” and “common belief” on a type structure; but all of this can be done. Note that many rationalizable strategies are not Nash equilibria.
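
To fix ideas, here is a minimal sketch of the elimination loop. For simplicity it only checks domination by pure strategies, whereas full rationalizability requires domination by mixed strategies (equivalently, never-best-response arguments); the payoff matrices and function name are my own illustration.

```python
# Iterated elimination of strictly dominated strategies in a two-player game.
# Pure-strategy domination only; a full rationalizability solver would also
# check domination by mixed strategies.
import numpy as np

def iterated_elimination(u1, u2):
    """u1[i, j], u2[i, j]: payoffs to players 1 and 2 when 1 plays i and 2 plays j."""
    rows = list(range(u1.shape[0]))
    cols = list(range(u1.shape[1]))
    changed = True
    while changed:
        changed = False
        # Remove row strategies strictly dominated by another remaining row.
        for i in rows[:]:
            if any(all(u1[k, j] > u1[i, j] for j in cols) for k in rows if k != i):
                rows.remove(i); changed = True
        # Remove column strategies strictly dominated by another remaining column.
        for j in cols[:]:
            if any(all(u2[i, k] > u2[i, j] for i in rows) for k in cols if k != j):
                cols.remove(j); changed = True
    return rows, cols

# Example: prisoner's dilemma, rows/columns ordered (Cooperate, Defect).
u1 = np.array([[3, 0], [4, 1]])
u2 = u1.T
print(iterated_elimination(u1, u2))  # ([1], [1]): only mutual defection survives
```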

So what further assumptions do we need to justify Nash? Recall the naive explanation: “rationality plus correct beliefs”. Nash takes us from rationalizability, where play is based on conjectures about opponents’ play, to an equilibrium, where play is based on correct conjectures. But which beliefs need to be correct? With two players and no uncertainty, the result is actually fairly straightforward: if our first order beliefs are (f,g), we mutually believe our first order beliefs are (f,g), and we mutually believe we are rational, then beliefs (f,g) represent a Nash equilibrium. You should notice three things here. First, we only need mutual belief (each of us believes X), not common belief (the full infinite hierarchy of “I believe that you believe that I believe…”), in rationality and in our first order beliefs. Second, the result is that our first-order beliefs are that a Nash equilibrium strategy will be played by all players; the result is about beliefs, not actual play. Third, with more than two players, we are clearly going to need assumptions about how my beliefs about our mutual opponent are related to your beliefs; that is, Nash will require more, epistemically, than “basic strategic reasoning”. Knowing these conditions can be quite useful. For instance, Terri Kneeland at UCL has investigated experimentally the extent to which each of the required epistemic conditions is satisfied, which helps us to understand situations in which Nash is harder to justify.

Finally, how about backward induction? Consider a centipede game. The backward induction rationale is that if we reached the final stage, the final player would defect, hence if we are in the second-to-last stage I should see that coming and defect before her, hence if we are in the third-to-last stage she will see that coming and defect before me, and so on. Imagine, however, that player 1 does not defect in the first stage. What am I to infer? Was this a mistake, or am I perhaps facing an irrational opponent? Backward induction requires that I never make such an inference, and hence that I defect in stage 2.
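
For concreteness, here is a tiny backward induction solver for a toy centipede game; the exponentially growing payoffs are a common textbook-style parameterization of my own, not the one analyzed in the chapter.

```python
# Backward induction in a toy centipede game. At stage t (0-indexed), the
# mover can Take, ending the game with payoffs take_payoffs[t], or Pass; if
# the last mover passes, the game ends with pass_payoffs.

def solve_centipede(take_payoffs, pass_payoffs):
    """Return the backward-induction action and continuation value at each stage.

    take_payoffs[t] = (payoff to player 1, payoff to player 2) if the stage-t
    mover takes; player 1 moves at even stages, player 2 at odd stages.
    """
    n = len(take_payoffs)
    value = pass_payoffs              # value of reaching the end (everyone passed)
    plan = []
    for t in reversed(range(n)):
        mover = t % 2                 # 0: player 1 moves, 1: player 2 moves
        if take_payoffs[t][mover] >= value[mover]:
            value, action = take_payoffs[t], "take"
        else:
            action = "pass"
        plan.append((t, action, value))
    return list(reversed(plan))

# Six-stage example with an exponentially growing pot.
take = [(2, 1), (1, 4), (8, 2), (4, 16), (32, 8), (16, 64)]
for stage, action, value in solve_centipede(take, pass_payoffs=(64, 32)):
    print(stage, action, value)   # every stage: take (the induction unravels)
```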

Here is a better justification for defection in the centipede game, though. If player 1 doesn’t defect in the first stage, then I “try my best” to retain a belief in his rationality. That is, if it is possible for him to have some belief about my actions in the second stage which rationally justified his first stage action, then I must believe that he holds those beliefs. For example, he may believe that I believe he will continue again in the third stage, hence that I will continue in the second stage, hence he will continue in the first stage, planning to defect in the third stage. Given his beliefs about me, his actions in the first stage were rational. But if that plan to defect in stage three were his justification, then I should defect in stage two. He realizes I will make these inferences, hence he will defect in stage 1. That is, the backward induction outcome is justified by forward induction. Now, it can be proven that rationality and common “strong belief in rationality” as loosely explained above, along with a suitably rich type structure for all players, generate the backward induction outcome. But the epistemic justification is completely based on the equivalence between forward and backward induction under those assumptions, not on any epistemic justification for backward induction reasoning per se. I think that’s a fantastic result.

Final version, prepared for the new Handbook of Game Theory. I don’t see a version on RePEc IDEAS.

“The Tragedy of the Commons in a Violent World,” P. Sekeris (2014)

The prisoner’s dilemma is one of the great insights in the history of the social sciences. Why would people ever take actions that make everyone worse off? Because we all realize that if everyone took the socially optimal action, we would each be better off individually by cheating and doing something else. Even if we interact many times, that incentive to cheat will remain in our final interaction, hence cooperation will unravel all the way back to the present. In the absence of some ability to commit or contract, then, it is no surprise we see things like oligopolies who sell more than the quantity which maximizes industry profit, or countries who exhaust common fisheries faster than they would if the fishery were wholly within national waters, and so on.

But there is a wrinkle: the dreaded folk theorem. As is well known, if we play frequently enough, and the probability that any given interaction is the last is low enough, then any feasible outcome which is better than what players can guarantee themselves regardless of other players’ actions can be sustained as an equilibrium; this, of course, includes the socially optimal outcome. And the punishment strategies necessary to get to that social optimum are often fairly straightforward. Consider oligopoly: if your firm produces more than half the monopoly output, then I produce the Cournot duopoly quantity in the next period. If you think I will produce Cournot, your best response is also to produce Cournot, and we will do so forever. Therefore, if we are choosing quantities frequently enough, the benefit to you of cheating today is not enough to overcome the lower profits you will earn in every future period, and hence we are able to collude at the monopoly level of output.
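
To see the arithmetic behind that logic, here is a sketch of the grim-trigger sustainability condition in a symmetric linear Cournot duopoly; the linear inverse demand and constant marginal cost are an illustrative parameterization of mine, not taken from the paper.

```python
# Grim-trigger collusion in a symmetric linear Cournot duopoly: inverse demand
# P = a - Q, constant marginal cost c. Collusion means each firm produces half
# the monopoly quantity; deviation triggers Cournot reversion forever.

def critical_discount_factor(a=10.0, c=1.0):
    m = a - c
    pi_collude = m**2 / 8       # each firm's share of monopoly profit
    pi_nash = m**2 / 9          # static Cournot profit per firm
    pi_deviate = 9 * m**2 / 64  # best response to the rival's collusive quantity
    # Collusion is sustainable iff pi_collude/(1-d) >= pi_deviate + d*pi_nash/(1-d),
    # which rearranges to d >= (pi_deviate - pi_collude) / (pi_deviate - pi_nash).
    return (pi_deviate - pi_collude) / (pi_deviate - pi_nash)

print(critical_discount_factor())  # 9/17, about 0.529, independent of a and c
```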

Folk theorems are really robust. What if we only observe some random public signal of what each of us did in the last period? The folk theorem holds. What if we only privately observe some random signal of what the other people did last period? No problem, the folk theorem holds. There are many more generalizations. Any applied theorist has surely run into the folk theorem problem – how do I let players use “reasonable” strategies in a repeated game but disallow crazy strategies which might permit tacit collusion?

This is Sekeris’ problem in the present paper. Consider two nations sharing a common pool of resources like fish. We know from Hotelling how to solve the optimal resource extraction problem if there is only one nation. With more than one nation, each party has an incentive to overfish today because they don’t take sufficient account of the fact that their fishing today lowers the amount of fish left for the opponent tomorrow, but the folk theorem tells us that we can still sustain cooperation if we interact frequently enough. Indeed, Ostrom won the Nobel a few years ago for showing how such punishments operate in many real world situations. But, but! – why then do we see fisheries and other common pool resources overdepleted so often?

There are a few ways to get around the folk theorem. First, it may just be that players do not interact forever, even probabilistically; some firms may last longer than others, for instance. Second, it may be that firms cannot change their strategies frequently enough, so that deviations from the cooperative optimum cannot be punished quickly or harshly enough. Third, Mallesh Pai and coauthors show in a recent paper that with a large number of players and sufficient differential obfuscation of signals, it becomes too difficult to “catch cheaters” and hence the stage game equilibrium is retained. Sekeris proposes an alternative to all of these: allow players to take actions which change the form of the stage game in the future. In particular, he allows players to fight for control of a bigger share of the common pool if they wish. Fighting requires expending resources from the pool building arms, and the fight itself also diminishes the size of the pool by destroying resources.

As the remaining resource pool gets smaller and smaller, each player is willing to waste fewer resources arming themselves in a fight over that smaller pool. This means that if conflict does break out, fewer resources will be destroyed in the “low intensity” fight. Because fighting is less costly when the pool is small, as the pool is depleted through cooperative extraction, eventually the players will fight over what remains. Since players will have asymmetric access to the pool following the outcome of the fight, there are fewer ways for the “smaller” player to harm the bigger one after the fight, and hence less ability to use threats of such harm to maintain folk-theorem cooperation before the fight. Therefore, the cooperative equilibrium partially unravels and players do not fully cooperate even at the start of the game, when the common pool is big.

That’s a nice methodological trick, but also somewhat reasonable in the context of common resource pool management. If you don’t overfish today, it must be because you fear I will punish you by overfishing myself tomorrow. If you know I will enact such punishment, then you will just invade me tomorrow (perhaps metaphorically via trade agreements or similar) before I can enact such punishment. This possibility limits the type of credible threats that can be made off the equilibrium path.

Final working paper (RePEc IDEAS version). Published in the Fall 2014 issue of the RAND Journal of Economics.

“Competition in Persuasion,” M. Gentzkow & E. Kamenica (2012)

How’s this for fortuitous timing: I’d literally just gone through this paper by Gentzkow and Kamenica yesterday, and this morning it was announced that Gentzkow is the winner of the 2014 Clark Medal! More on the Clark in a bit, but first, let’s do some theory.

This paper is essentially the multiple sender version of the great Bayesian Persuasion paper by the same authors (discussed on this site a couple years ago). There is a group of experts who can, under commitment to a signal structure chosen in advance, send costless signals about the realization of the state. Given the information received, the agent makes a decision, and each expert gets some utility depending on that decision. For example, the senders might be a prosecutor and a defense attorney who know the guilt of a suspect, and the agent a judge. The judge convicts if p(guilty)>=.5, the prosecutor wants to maximize convictions regardless of underlying guilt, and vice versa for the defense attorney. Here’s the question: if we have more experts, or less collusive experts, or experts with less aligned interests, is more information revealed?

A lot of our political philosophy is predicated on more competition in information revelation leading to more information actually being revealed, but this is actually a fairly subtle theoretical question! For one, John Stuart Mill and others of his persuasion would need some way of discussing how people competing to reveal information strategically interact, and to the extent that this strategic interaction has multiple possible outcomes, they would need a way of “ordering” the sets of potentially revealed information. We are lucky in 2014, thanks to our friends Nash and Topkis, to be able to deal nicely with each of those concerns.

The trick to solving this model (basically every proof in the paper comes down to algebra and some simple results from set theory; they are clever but not technically challenging) is the main result from the Bayesian Persuasion paper. Draw a graph with the agent’s posterior belief on the x-axis, and the utility (call this u) the sender gets from actions based on each posterior on the y-axis. Now draw the smallest concave function (call it V) that is everywhere greater than u. If V is strictly greater than u at the prior p, then a sender can improve her payoff by revealing information. Take the case of the judge and the prosecutor. If the judge has the prior that everyone brought before them is guilty with probability .6, then the prosecutor never reveals information about any suspect, and the judge always convicts (giving the prosecutor utility 1 rather than the 0 she would get from an acquittal). If, however, the judge’s prior is that everyone is guilty with probability .4, then the prosecutor can do better by judiciously revealing information so that 80 percent of suspects are convicted. How? Pool all of the guilty people with 2/3 of the innocent people and send the signal “guilty”, so that the judge’s posterior on anyone receiving that signal is exactly p=.5, and send the signal “innocent” for the remaining 1/3 of innocent people, who are then known to be innocent with probability 1. These posteriors average back to the prior, so the signal is Bayes plausible. The judge will convict all of the folks where p(guilty)=.5, meaning 80 percent of all suspects are convicted. If you draw the graph described above with u=1 when the judge convicts and u=0 otherwise, it is clear that V>u exactly when the prior lies strictly between 0 and .5, hence information is only revealed in that case.
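
Here is a small numerical sketch of that concavification logic; the brute-force grid search over pairs of posteriors is my own illustration of the argument, not the authors’ construction.

```python
# The sender's payoff as a function of the judge's posterior is u(p) = 1 if
# p >= .5 (convict) and 0 otherwise. With two signals, the sender splits the
# prior into posteriors p_lo <= prior <= p_hi with Bayes-plausible weights;
# her value is the upper concave envelope V evaluated at the prior.
import numpy as np

def sender_payoff(p):
    return 1.0 if p >= 0.5 else 0.0

def concavified_value(prior, grid_size=501):
    grid = np.linspace(0.0, 1.0, grid_size)
    best = sender_payoff(prior)            # value of revealing nothing
    for p_lo in grid[grid <= prior]:
        for p_hi in grid[grid >= prior]:
            if p_hi > p_lo:
                lam = (p_hi - prior) / (p_hi - p_lo)   # weight placed on p_lo
                value = lam * sender_payoff(p_lo) + (1 - lam) * sender_payoff(p_hi)
                best = max(best, value)
    return best

for prior in (0.4, 0.6):
    print(prior, concavified_value(prior))
# prior 0.4: value 0.8 (pool posteriors at 0 and .5, convicting 80% of suspects)
# prior 0.6: value 1.0 (already above the conviction threshold, reveal nothing)
```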

What about when there are multiple senders with different utilities u? It is somewhat intuitive: more information is always, almost by definition, weakly valuable for the agent (remember Blackwell!). If there is any sender who can improve her payoff by revealing information given what has been revealed thus far, then we are not in equilibrium, and some sender has the incentive to deviate by revealing more information. Therefore, adding more senders increases the amount of information revealed and “shrinks” the set of beliefs that the agent might wind up holding (and, further, the authors show that any Bayes plausible set of beliefs from which no sender can profitably reveal further information is an equilibrium). We still have a number of technical details concerning multiplicity of equilibria to deal with, but the authors show that these results hold in a set order sense as well. This theorem is actually great: to check equilibrium information revelation, I only need to check where V and u diverge sender by sender, without worrying about complex strategic interactions. Because of that simplicity, it ends up being very easy to show that removing collusion among senders, or increasing the number of senders, will improve information revelation in equilibrium.

September 2012 working paper (IDEAS version). A brief word on the Clark medal. Gentzkow is a fine choice, particularly for his Bayesian persuasion papers, which are already very influential. I have no doubt that 30 years from now, you will still see the 2011 paper on many PhD syllabi. That said, the Clark medal announcement is very strange. It focuses very heavily on his empirical work on newspapers and TV, and mentions his hugely influential theory as a small aside! This means that five of the last six Clark medal winners, everyone but Levin and his relational incentive contracts, have been cited primarily for MIT/QJE-style theory-light empirical microeconomics. Even though I personally am primarily an applied microeconomist, I still see this as a very odd trend: no prizes for Chernozhukov or Tamer in metrics, or Sannikov in theory, or Farhi and Werning in macro, or Melitz and Costinot in trade, or Donaldson and Nunn in history? I understand these papers are harder to explain to the media, but it is not a good thing when the second most prominent prize in our profession is essentially ignoring 90% of what economists actually do.

“The Explanatory Relevance of Nash Equilibrium: One-Dimensional Chaos in Boundedly Rational Learning,” E. Wagner (2013)

The top analytic philosophy journals publish a surprising amount of interesting game and decision theory; the present article, by Wagner in the journal Philosophy of Science, caught my eye recently.

Nash equilibria are stable in a static sense, we have long known: no player wishes to deviate given what others do. Nash equilibria also require fairly weak epistemic conditions: if all players are rational and believe the other players will play the actual strategies they play with probability 1, then the set of outcomes is the Nash equilibrium set. A huge amount of work in the 80s and 90s considered whether players would “learn” to play Nash outcomes, and the answer is by and large positive, at least if we expand from Nash equilibria to correlated equilibria: fictitious play (where I best respond to the belief that you will play each action with the frequency you have played it in the past) works pretty well, regret-based rules that compare how various strategies performed in the past converge to correlated equilibria, and a type of Bayesian learning given initial beliefs about the strategy paths that might be used generates Nash in the limit, though note the important followup on that paper by Nachbar in Econometrica 2005. (Incidentally, a fellow student pointed out that the Nachbar essay is a great example of how poor citation measures are for theory. The paper has 26 citations on Google Scholar mainly because it helped kill a literature; the number of citations drastically underestimates how well-known the paper is among the theory community.)
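
As an illustration of the first of these rules, here is a minimal sketch of two-player fictitious play; the choice of matching pennies and the uniform initial beliefs are my own. The empirical frequencies converge to the mixed equilibrium (1/2, 1/2) even though period-by-period play keeps cycling.

```python
# Fictitious play: each period, each player best responds to the empirical
# frequency of the opponent's past actions (initialized with one phantom
# observation per action).
import numpy as np

def fictitious_play(u1, u2, periods=5000):
    n_rows, n_cols = u1.shape
    counts_of_2 = np.ones(n_cols)    # player 1's counts of player 2's actions
    counts_of_1 = np.ones(n_rows)    # player 2's counts of player 1's actions
    for _ in range(periods):
        a1 = int(np.argmax(u1 @ (counts_of_2 / counts_of_2.sum())))
        a2 = int(np.argmax((counts_of_1 / counts_of_1.sum()) @ u2))
        counts_of_1[a1] += 1
        counts_of_2[a2] += 1
    return counts_of_1 / counts_of_1.sum(), counts_of_2 / counts_of_2.sum()

# Matching pennies: the unique Nash equilibrium mixes 50/50 for both players.
u1 = np.array([[1, -1], [-1, 1]])
u2 = -u1
print(fictitious_play(u1, u2))   # both empirical frequency vectors approach (0.5, 0.5)
```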

A caution, though! It is not the case that every reasonable evolutionary or learning rule leads to an equilibrium outcome. Consider the “continuous time imitative-logic dynamic”. A continuum of agents exist. At some exponentially distributed time for each agent, a buzzer rings, at which point he is randomly matched to play another agent. The agent imitates the other agent in the future with probability proportional to exp(beta*pi(j)), where beta is some positive number and pi(j) is the payoff to the opponent; if imitation doesn’t occur, a new strategy is chosen at random from all available strategies. A paper by Hofbauer and Weibull shows that as beta grows large, this dynamic is approximately a best-response dynamic, where strictly dominated strategies are driven out; as beta grows small, it looks a lot like a replicator dynamic, where imitation depends on the myopic relative fitness of a strategy. A discrete version of the continuous dynamic above can be generated (all agents update simultaneously rather than individually), which similarly “ranges” from something like the myopic replicator to something like a best response dynamic as beta grows. Note that, for any beta, strictly dominated strategies are eventually not played under both the continuous and discrete time i-logic dynamics.

Now consider a simple two strategy game with the following payoffs:

          Left     Right
Left     (1,1)    (a,2)
Right    (2,a)    (1,1)

The unique symmetric Nash equilibrium is mixed, with x=1/a. Let, say, a=3. When beta is very low (say, beta=1), so players are “relatively myopic”, and the initial condition is x=.1, the discrete time i-logic dynamic converges to x=1/a. But if beta is higher, say beta=5, then players are “more rational”, yet the dynamic neither converges nor cycles: the population shares playing Left and Right follow a chaotic system! This property can be generated for many initial points x and values of a.
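
Here is a rough simulation of the flavor of this result. The update below is one natural discrete imitative-logit (exponential replicator) map, with x the population share playing Right; Wagner’s exact specification may differ, but the qualitative pattern matches: for small beta the share settles near 1/a, while for large beta it bounces around erratically and never settles down.

```python
# Discrete imitative-logit map for the 2x2 game above, single population,
# x = share of the population playing Right. This functional form is my own
# discretization for illustration, not necessarily the one in the paper.
import numpy as np

def simulate(a=3.0, beta=5.0, x0=0.1, periods=200):
    x = x0
    path = [x]
    for _ in range(periods):
        pi_right = 2 * (1 - x) + 1 * x      # expected payoff to playing Right
        pi_left = 1 * (1 - x) + a * x       # expected payoff to playing Left
        w_right = x * np.exp(beta * pi_right)
        w_left = (1 - x) * np.exp(beta * pi_left)
        x = w_right / (w_right + w_left)    # imitative-logit update
        path.append(x)
    return np.array(path)

print(np.round(simulate(beta=1.0)[-5:], 3))  # low beta: settles near x = 1/a
print(np.round(simulate(beta=5.0)[-5:], 3))  # high beta: erratic, no convergence
```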

The dynamic here doesn’t seem crazy, and making agents “more rational” in a particular sense makes convergence properties worse, not better. And since play is chaotic, a player hoping to infer what the population will play next is required to know the initial conditions with certainty. Nash or correlated equilibria may have some nice dynamic properties for wide classes of reasonable learning rules, but the point that some care is needed concerning what “reasonable learning rules” might look like is well taken.

Final 2013 preprint. Big thumbs up to Wagner for putting all of his papers on his website, a real rarity among philosophers. Actually, a number of his papers look quite interesting: Do cooperation and fair bargaining evolve in tandem? How do small world networks help the evolution of meaning in Lewis-style sender-receiver games? How do cooperative “stag hunt” equilibria evolve when 2-player stag hunts have such terrible evolutionary properties? I think this guy, though a recent philosophy PhD in a standard philosophy department, would be a very good fit in many quite good economic theory programs…

“Long Cheap Talk,” R. Aumann & S. Hart (2003)

I wonder if Crawford and Sobel knew just what they were starting when they wrote their canonical cheap talk paper – it is incredible how much more we now know about the value of cheap communication even when agents are biased. Most importantly, it is not true that bias or self-interest means we must always require people to “put skin in the game” or perform some costly action in order to credibly convey their private information. A colleague passed along this paper by Aumann and Hart, which addresses a question that has long bedeviled students of communication in games: why doesn’t the talking end right away? (And fair notice: we once had a full office shrine, complete with votive candles, to Aumann, he of the antediluvian beard and two volume tome, so you could say we’re fans!)

Take a really simple cheap talk game, where only one agent has any useful information. Row knows what game we are playing, and Column only knows the probability distribution over such games. In the absence of conflict (say, where there are two symmetric games, each of which has one Pareto optimal equilibrium), Row first tells Column which game is the true one; this is credible, and so Column plays the Pareto optimal action. In other cases, we know from Crawford-Sobel logic that partial revelation may be useful even when there are conflicts of interest: Row tells Column with some probability what the true game is. We can also create new equilibria by using talk to reach “compromise”. Take a Battle of the Sexes, with LL payoff (6,2), RR (2,6) and LR=RL=(0,0). The equilibria of the simultaneous game without cheap talk are LL, RR, or randomizing 3/4 on your preferred location and 1/4 on the opponent’s preferred location. But a new equilibrium is possible if we can use talk to create a public randomization device. We both write down 1 or 2 on a piece of paper, then show the papers to each other. If the sum is even, we both go LL. If the sum is odd, we both go RR. This gives ex-ante payoff (4,4), which is not an equilibrium payoff without the cheap talk.

So how do multiple rounds help us? They allow us to combine these motives for cheap talk. Take an extended Battle of the Sexes, with a third action A available to Column. LL still pays off (6,2), RR still (2,6) and LR=RL=(0,0). RA or LA pays off (3,0). Before we begin play, we may be playing extended Battle of the Sexes, or we may be playing a game Last Option that pays off 0 to both players unless Column plays A, in which case both players get 4; both games are equally probable ex-ante, and only Row learns which game we are actually in. Here, we can enforce a payoff of (4,4) if, when the game is actually extended Battle of the Sexes, we randomize between L and R as in the previous paragraph, but if the game is Last Option, Column always plays A. But the order in which we publicly randomize and reveal information matters! If we first randomize, then reveal which game we are playing, then whenever the public randomization causes us to play RR (giving Row a payoff of 2 in Battle of the Sexes), Row will afterwards have the incentive to claim we are actually playing Last Option. But if Row first reveals which game we are playing, and we then randomize only if we are playing extended Battle of the Sexes, we indeed enforce ex-ante expected payoff (4,4).

Aumann and Hart show precisely what can be achieved with arbitrarily long strings of cheap talk, using a clever geometric proof which is far too complex to even summarize here. But a nice example of how really long cheap talk of this fashion can be used is in a paper by Krishna and Morgan called The Art of Conversation. Take a standard Crawford-Sobel model. The true state of the world is drawn uniformly from [0,1]. I know the true state, and get utility which is maximized when you take an action on [0,1] as close as possible to the true state of the world plus .1. Your utility is maximized when you take an action as close as possible to the true state of the world. With this “bias”, there is a partially informative one-shot cheap talk equilibrium: I tell you whether we are in [0,.3] or [.3,1] and you in turn take action either .15 or .65. How might we do better with a string of cheap talk? Try the following: first I tell you whether we are in [0,.2] or [.2,1]. If I say we are in the low interval, you take action .1. If I say we are in the high interval, we perform a public randomization which ends the game (with you taking action .6) with probability 4/9 and continues the game with probability 5/9; for example, to publicly randomize we might both shout out numbers between 1 and 9, and continue if the difference, mod 9, is 4 or less, a jointly controlled lottery that neither of us can manipulate unilaterally. If we continue, I tell you whether we are in [.2,.4] or [.4,1]. If I say [.2,.4], you take action .3, else you take action .7. It is easy to calculate that both players are better off ex-ante than in the one-shot cheap talk game. The probabilities 4/9 and 5/9 were chosen so as to make each player willing to follow the proposed play after the randomization rather than deviate.
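
A quick way to check that claim is to compute the ex-ante expected losses directly, assuming the standard uniform-quadratic specification (quadratic loss with bias b=.1); the quadratic form is the usual parameterization but is my assumption here, since the post only says “as close as possible”.

```python
# Expected losses under quadratic preferences: receiver loss (a - s)^2,
# sender loss (a - s - b)^2, state s uniform on [0,1], bias b = .1.
b = 0.1

def losses(intervals_and_actions):
    """Expected (receiver loss, sender loss) for a partition-plus-lottery play.

    intervals_and_actions: list of ((lo, hi), action, probability_weight).
    """
    rl = sl = 0.0
    for (lo, hi), a, w in intervals_and_actions:
        mean = (lo + hi) / 2
        var = (hi - lo) ** 2 / 12
        rl += w * ((a - mean) ** 2 + var)
        sl += w * ((a - b - mean) ** 2 + var)
    return rl, sl

# One-shot cheap talk: reveal [0,.3] vs [.3,1].
one_shot = losses([((0, .3), .15, .3), ((.3, 1), .65, .7)])

# Long cheap talk: reveal [0,.2] vs [.2,1]; after the high message, the lottery
# stops the game (action .6) with probability 4/9, else reveal [.2,.4] vs [.4,1].
long_talk = losses([
    ((0, .2), .1, .2),
    ((.2, 1), .6, .8 * 4 / 9),
    ((.2, .4), .3, .8 * 5 / 9 * .25),
    ((.4, 1), .7, .8 * 5 / 9 * .75),
])

print(one_shot)   # roughly (0.0308, 0.0408)
print(long_talk)  # roughly (0.0300, 0.0400): both expected losses are lower
```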

The lotteries interspersed with the partial revelations are what let the sender credibly reveal more information. If there were no lottery, and instead we always continued with probability 1, look at what happens when the true state of nature is .19. The sender knows he can say in the first revelation that, actually, we are on [.2,1], then in the second revelation that, actually, we are on [.2,.4], in which case the receiver plays .3 (which is almost exactly the sender’s ideal point of .29). Hence without the lotteries, the sender has an incentive to lie at the first revelation stage. That is, cheap talk can serve to give us jointly controlled lotteries in between successive revelations of information, and in so doing, improve our payoffs.

Final published Econometrica 2003 copy (IDEAS). Sequential cheap talk has had many interesting uses. I particularly enjoyed this 2008 AER by Alonso, Dessein and Matouschek. The gist is the following: it is often thought that the tradeoff between decentralized and centralized firms is more local control in exchange for more difficult coordination. But think hard about what information will be transmitted by regional managers who only care about their own division’s profits. As coordination becomes more important, the optimal strategy in my division is more closely linked to the optimal decision in other divisions. Hence I, the regional manager, have a greater incentive to freely share information with other regional managers than in the situation where coordination is less important. You may prefer centralized decision-making when coordination is least important, because this is when individual managers are least likely to freely share useful information with each other.

“Between Proof and Truth,” J. Boyer & G. Sandu (2012)

In the previous post, I promised that game theory has applications in pure philosophy. Some of these applications are economic in nature – a colleague has written a paper using a branch of mechanism design to link the seemingly disjoint methodologies of induction and falsification – but others really are pure. In particular, there is a branch of zero-sum games which deals with the fundamental nature of truth itself!

Verificationist theories of truth, like those proposed by Michael Dummett, suggest that a statement is true if we can prove it to be true. This may seem trivial, but it absolutely is not: for one, to the extent that there are unprovable statements, as in some mathematical systems, we ought to say “that statement is neither true nor false”, not simply “we cannot prove that statement true or false”. Another school of thought, beginning with Tarski’s formal logics and expanded by Hintikka, says that truth is a property held by sentences. The statement “Snow is white” is true if and only if the thing denoted by “snow” is in fact white. The statement “A or B” is true if and only if the thing denoted by “A” is true or the thing denoted by “B” is true.

These may seem very different conceptions. The first is a property requiring action – something is true if someone can verify it. The second seems more in the air – something is true if its sentence has certain properties. But take any sentence in formal logic, like “There exists A such that for all B, either C or D is true”. We can play a game between Verifier and Falsifier. Move left to right across the sentence, letting the Verifier choose at all existentials and “or” statements, and the Falsifier choose at all universals and “and” statements. Verifier wins if he can get to the end of the sentence with the remaining statement true. That is, semantic games take Tarski truth and make it playable by someone with agency, at least in principle.
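
As a toy illustration, here is a tiny semantic game evaluator over a finite structure; over a finite domain, checking whether Verifier has a winning strategy is just a max/min recursion. The tuple encoding of sentences is my own.

```python
# Semantic game over a finite domain: Verifier picks witnesses at 'exists' and
# 'or' nodes, Falsifier picks challenges at 'forall' and 'and' nodes. Verifier
# has a winning strategy exactly when this recursion returns True.

def verifier_wins(sentence, env=None):
    env = env or {}
    op = sentence[0]
    if op == "atom":                       # ("atom", predicate on the environment)
        return sentence[1](env)
    if op == "or":                         # Verifier picks a disjunct
        return any(verifier_wins(s, env) for s in sentence[1:])
    if op == "and":                        # Falsifier picks a conjunct
        return all(verifier_wins(s, env) for s in sentence[1:])
    if op == "exists":                     # ("exists", var, domain, body): Verifier picks a witness
        _, var, domain, body = sentence
        return any(verifier_wins(body, {**env, var: d}) for d in domain)
    if op == "forall":                     # Falsifier picks the element to challenge with
        _, var, domain, body = sentence
        return all(verifier_wins(body, {**env, var: d}) for d in domain)
    raise ValueError(op)

# "There exists x such that for all y, x <= y" over the domain {0,...,4}.
D = range(5)
sentence = ("exists", "x", D,
            ("forall", "y", D,
             ("atom", lambda e: e["x"] <= e["y"])))
print(verifier_wins(sentence))  # True: Verifier's winning strategy is to pick x = 0
```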

The paper by Boyer and Sandu takes this as a starting point, and discusses when Dummett’s truth coincides with Tarski and Hintikka’s truth, restricting attention to semantic games played on recursive structures (nonconstructive winning strategies in the semantic game seem problematic if we want to relate truth in semantic games to verificationist truth!). Take statements in Peano arithmetic where all objects chosen are natural numbers (it happens to be true that every recursive model of PA is isomorphic to the standard natural numbers). Is every statement I can prove also true in the sense of a winning strategy in the recursive semantic game? Conversely, can the semantic-game truth of a sentence always be witnessed by a proof? The answer to both is negative. For the first, consider the sentence stating that for all programs x1 and inputs x2, there exists a number of steps y such that either the program halts within y steps or it never halts. This is the halting problem. It is not decidable, hence Verifier has no recursive winning strategy, but the sentence is trivially provable in Peano arithmetic by the law of the excluded middle.

Boyer and Sandu note (as was known in an earlier literature) that we can relate the two types of truth by extending the semantic game to allow backward moves. That is, at any node, or at the end of the game, Verifier can go back to any node she played and change her action; Verifier wins if she can force a win in finitely many moves. It turns out that Verifier can win in the game with backward moves if and only if she can win in the standard game. Further, if a statement can be proven, Verifier can win in the game with backward moves using a recursive strategy. This has some interesting implications for Gödel sentences (“This sentence is not provable within the current system.”) which I don’t wish to discuss here.

Note that all of this is just the use of game theory in “games against nature”. We usually think of game theory as a tool for analyzing situations with strategic interaction, but the assumption that players are perfectly rational optimizers means that, in zero-sum games, checking whether something is achievable by some player just involves checking whether a player called Nature has a winning strategy against him. This technique is so broadly applicable, in economics and otherwise, that we ought to be careful about defining game theory as solely a tool for analyzing the strategies of multiple “actual” agents; e.g., Wikipedia quotes Myerson’s definition that game theory is “the study of mathematical models of conflict and cooperation between intelligent rational decision-makers”. This is too limiting.

Final copy. This article appeared in Synthese 187/March 2012. Philosophers seem to rarely put their working papers online, but Springer has taken down the paywall on Synthese throughout December, so you can read the above link even without a subscription.

Game Theory and History, A. Greif & Friends (1993, 1994)

(This post refers to A. Greif, “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, AER 1993, and A. Greif, P. Milgrom & B. Weingast, “Coordination, Commitment and Enforcement: The Case of the Merchant Guild,” JPE 1994.)

Game theory, after a rough start, may actually be fulfilling its role as proposed by Herbert Gintis: unifier of the sciences. It goes without saying that game theoretic analysis is widespread in economics, political science (e.g., voter behavior), sociology (network games), law (antitrust), computer science (defending networks against attacks), biology (evolutionary strategies), pure philosophy (more on this in a post tomorrow!), with occasional appearances in psychology, religion (recall Aumann’s Talmud paper), physics (quantum games), etc. But history? Surely game theory, particularly the more complex recent results, has no place there? Yet Avner Greif, an economic historian at Stanford, has shown that games can play a very interesting role indeed in understanding historical events.

Consider first his Maghribi traders paper. In the 11th and 12th centuries, a group of Judeo-Arabic traders called the Maghribis traded across the Mediterranean. Two institutional aspects of their trade are interesting. First, they all hired agents in foreign cities to carry out their trade, and second, they generally used other Maghribi merchants as their agents. This is quite different from, for instance, Italy, where merchants tended to hire agents in foreign cities who were not themselves merchants. What explains that difference, and more generally, how can a long distance trader ensure that his agents do not rip him off? For instance, how do I keep an agent from claiming he sold at a low price when actually he sold at a high one?

To a theorist, this looks like a repeated reputational game with imperfect monitoring. Greif doesn’t go the easy route and just assume there are trustworthy and untrustworthy types. Rather, he assumes that there is a set of potential agents who can be hired in each period, that agents are exogenously separated from merchants with probability p in each period, and that merchants can choose to hire and fire at any wage they choose. You probably know from the economics of reputation or from the efficiency wage literature that I need to offer wages higher than the agent’s outside option to keep him from stealing: the value of the continuation game must exceed the value of stealing now. Imagine that I fire anyone who steals and never hire him again. How do I ensure that other merchants do not then hire that same agent (perhaps the agent will say, “Look, give me a second chance and I will work at a lower wage”)? Well, an agent who has cheated one merchant will never be hired by that merchant again. This means that when he is in the unemployed pool, even if other merchants are willing to hire him, his probability of getting hired is lower, since one merchant will definitely not hire him. That lowers his continuation value from behaving honestly in the future. Therefore, the efficiency wage I must pay him to keep him from stealing is higher than the efficiency wage I can pay someone who hasn’t ever stolen, so I strictly prefer to hire agents who have never stolen. This allows the whole coalition to coordinate. Note that the fewer agents there are, the higher the continuation value from not stealing, and hence the lower the efficiency wage I can pay: it is optimal to keep the set of potential agents small.
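
Here is a toy calculation of that efficiency-wage logic, under a hypothetical parameterization of my own rather than Greif’s actual model: the harder it is for a known cheater to be rehired, the lower the wage needed to keep agents honest.

```python
# Toy efficiency-wage model. An employed agent earns wage w each period and
# can steal a one-off amount g, after which he is fired and rehired from the
# unemployment pool with per-period probability h_cheat; honest agents are
# separated with probability s and rehired with probability h_honest.
# Value functions: V_e = w + delta*[(1-s)*V_e + s*V_u], V_u = delta*[h*V_e + (1-h)*V_u].
# Honesty requires V_e >= w + g + delta*V_u_cheat.

def min_honest_wage(g=1.0, delta=0.9, s=0.1, h_honest=0.5, h_cheat=0.1):
    def unemployment_factor(h):
        return delta * h / (1 - delta * (1 - h))   # V_u = factor * V_e
    # Values are linear in w: V_e = A*w, V_u = B*w, V_u_cheat = C*w.
    A = 1 / (1 - delta * (1 - s) - delta * s * unemployment_factor(h_honest))
    B = unemployment_factor(h_honest) * A
    C = unemployment_factor(h_cheat) * A
    # Rearranging the honesty constraint gives the minimal wage below.
    return g / (delta * ((1 - s) * A + s * B - C))

print(min_honest_wage(h_cheat=0.5))  # about 0.79: cheaters rehired as easily as anyone
print(min_honest_wage(h_cheat=0.1))  # about 0.25: the coalition shuns cheaters, a lower wage suffices
```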

What of the Italian merchants? Why do they not hire only each other? Maghribi merchants tended to be involved only in long distance trade, while Italian merchants were also involved in real estate and other pursuits. This means the outside option (continuation value after cheating if no one hires me again) is higher for Italian merchants than Maghribi merchants, which means that hiring merchants at the necessary efficiency wage will be relatively more expensive for Italians than Maghribis.

A followup by Greif, with Milgrom and Weingast, considers the problem of long distance trade from the perspective of cities. Forget about keeping your agent from ripping you off: how do you keep the city from ripping you off? For instance, Genoans in Constantinople had their district overrun by a mob at one point, with no compensation offered. Sicilians raised taxes on sales by Jews at one point after they had brought their goods for sale. You may naively think that reputation alone will be enough; I won’t rip anyone off because I want a reputation of being a safe and fair city for trade.

But again, the literature on repeated games tells us this will not work. Generally, I need to punish deviations from the efficient set of strategies, and punish those who themselves do not punish deviators. In terms of medieval trade, to keep a city from ripping me off, I need not only to punish the city by bringing it less trade, but I also need to make sure the city doesn’t make up for my lost trade by offering a special deal to some other trader. That is, I need to get information about a violation against a single trader to other traders, and I need to make sure they are willing to punish the deviating city.

The merchant guild was the institution that solved this problem. Merchant guilds were able to punish their own members by, for example, keeping them from earning rents from special privileges in their own city. In the most general setting, when a guild orders a boycott, cities may be able to attract some trade, but less than the efficient amount, because only a particularly good deal will entice merchants to come during a boycott and to credibly believe the city will not steal from them.

This is all to say that strong guilds may be in the best interest of cities, since they allow the city to solve its commitment problem. The historical record offers many examples of cities encouraging guilds to come trade, and encouraging the strengthening of guilds. Only a reputational model like the one above can explain such city behavior; if guilds were merely extracting rents with monopoly privilege, cities would not encourage them at all. Both of these papers, I think, are quite brilliant.

1993 AER (IDEAS version) and 1994 JPE (IDEAS version). Big thumbs up to Avner for having the final published versions of these papers on his website.

“Unbeatable Imitation,” P. Duersch, J. Oechssler & B. C. Schipper (2012)

People, particularly in relatively unimportant situations, rely on heuristics rather than completely rational foresight. But building heuristics into our models seems undesirable at first glance, because players using heuristics can easily be abused by more strategic players. For instance, consider the game of fighter pilot Chicken as in the movie Top Gun. Both players prefer going straight while the opponent swerves to swerving when the opponent swerves (hence showing lack of nerve) to swerving when the opponent goes straight (hence showing a unique lack of nerve) to going straight when the opponent goes straight (hence crashing). Consider playing Chicken over and over against a heuristic-based opponent. Perhaps the opponent simply best responds to whatever you did in the previous period. In this case, if I go straight in period 1, the opponent swerves in the next period, and if I swerve, the opponent goes straight. Therefore, I’ll go straight in periods 1 through infinity, knowing my opponent will swerve in every period except possibly the first. The sophisticated player will earn a much higher payoff than the unsophisticated one.

Duersch et al show that, in every 2×2 symmetric game and in a large class of N-by-N symmetric two-player games, a simple heuristic called “imitation” has an undiscounted average payoff identical to that which can be achieved by an opponent playing any strategy at all. In imitation, I retain my strategy each period unless the opponent earned strictly more than I did in the previous period, in which case I copy him. Consider Chicken again. If I go straight and the opponent swerves, then I know he will go straight in the following period. In the next period, then, I can either crash into him (causing him to swerve two periods on) or swerve myself (causing him to go straight two periods on). In any case, I can at best get my opponent to swerve while I go straight once every two periods. By symmetry, in the periods where this doesn’t happen, I can at best get the payoff from swerving when my opponent goes straight, meaning my average payoff is no better than my heuristic-based opponent’s! This is true no matter what strategy is used against the imitating opponent.
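
Below is a quick simulation sketch of an imitator in Chicken against a few opponent strategies; the specific payoff numbers (4, 3, 1, 0 in the preference order above) and the strategies tried are my own illustrative choices, not the paper’s. The highest average the opponent achieves among these comes from always swerving, which merely ties the imitator.

```python
# An imitator in repeated Chicken: she keeps her action unless the opponent
# earned strictly more last period, in which case she copies him.
import numpy as np

STRAIGHT, SWERVE = 0, 1
U = np.array([[0, 4],          # U[my action][opponent action] = my payoff
              [1, 3]])

def play(opponent_rule, periods=1000, imitator_start=SWERVE):
    imitator = imitator_start
    my_total = imit_total = 0.0
    last = None                              # (opponent action, imitator action) last period
    for t in range(periods):
        mine = opponent_rule(t, last)
        imit_action = imitator
        my_pay, imit_pay = U[mine, imit_action], U[imit_action, mine]
        my_total += my_pay
        imit_total += imit_pay
        if my_pay > imit_pay:                # copy a strictly more successful opponent
            imitator = mine
        last = (mine, imit_action)
    return my_total / periods, imit_total / periods

rules = {
    "always straight": lambda t, last: STRAIGHT,
    "always swerve": lambda t, last: SWERVE,
    "alternate": lambda t, last: STRAIGHT if t % 2 == 0 else SWERVE,
    "exploit last action": lambda t, last: SWERVE if last and last[1] == STRAIGHT else STRAIGHT,
}
for name, rule in rules.items():
    print(name, play(rule))
# The best the opponent can do is tie at (3, 3) by always swerving; the other
# strategies either leave the opponent far behind the imitator or drive both
# players' average payoffs toward the crash payoff of zero.
```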

Now imitation will fail in many games, of course. Consider Rock-Paper-Scissors. If I play Rock when you play Scissors, then since you imitate, you will switch to Rock in the next period. Knowing this, I will play Paper, and so on, winning every period. Games with this type of cycling possibility allow me to extract an arbitrarily higher payoff than the imitating opponent. What’s interesting is that, in finite symmetric two-player games between an imitator and an agent with perfect rationality, games with a possibility of cycling in some subgame are the only ones in which the imitator does not earn the same average payoff per period as the rational player! Checking this condition is difficult, but games with no pure equilibrium in the relative payoff game (i.e., the game in which each player’s payoff is the difference between his payoff and his opponent’s in the original game, which is by construction zero-sum) always have a cycle, and games which are quasiconcave never do. Many common games (oligopoly competition, Nash bargaining, etc.) can be written as quasiconcave games.

Imitation is really pretty unique. The authors give the example of a 3×3 symmetric oligopoly game, where strategy 1 is “produce the Cournot quantity”, strategy 2 is “produce the Stackelberg follower quantity” and strategy 3 is “produce the Stackelberg leader quantity.” The game has no subgames with cycles as defined above, and hence imitators and the rational player earn the same average payoff (if the rational player plays the Stackelberg leader quantity and I play something else, then he earns more than me, hence I imitate him next period, hence he best responds by playing the Stackelberg follower quantity). Other heuristics do much worse than imitation. A heuristic where you simply best reply to last period’s play ends up producing the Stackelberg follower quantity forever, for example.

This result is quite interesting, and the paper is short; the “useful insight on the worst page” test of a quality paper is easily satisfied. I like this work too because it is related to some ideas I have about the benefits of going first. Consider shifting a symmetric simultaneous game to a symmetric sequential game. Going first has no benefit except that it allows me to commit to my action (and many negatives, of course, including the inability to mix strategies). Likewise a heuristic rule allows the heuristic player to commit to actions without assuming perfection of the equilibrium. So there is a link between “optimal” heuristics and the desire of a rational player to commit to his action in advance if he could so choose.

November 2011 Working Paper (IDEAS version). Final version published in the September 2012 issue of Games and Economic Behavior.

“Aggregate Comparative Statics,” D. Acemoglu & M. K. Jensen (2011)

Enormous parts of economic theory are devoted to “comparative statics”: if one variable changes (one chickpea firm’s cost structure decreases, demand for pineapples goes up, firms receive better information about the size of an oil field before bidding on rights to that field), does another variable increase or decrease (chickpea prices rise or fall, equilibrium supply of pineapples rises or falls, revenue for the oil field auctioneer rises or falls)?

There are a couple ways we can check comparative statics. The traditional way is using the implicit function theorem. Call t the variable you wish to change, and x the endogenous variable whose comparative static you wish to check. If you have a model in which you can solve explicitly for x as a function of t, then finding the sign of dx/dt (or the nondifferential analogue) is straightforward. This is rare indeed in many common economic models; for instance, if x is the supply of firm 1 in a Cournot duopoly, and t is the marginal cost of firm 2, and all I know is that demand satisfies certain properties, then since I haven’t even specified the functional form of demand, I will not be able to take an explicit derivative. We may still be in luck, however. Let f(x,t) be the function an agent is maximizing, which depends on x, which the agent chooses, and on t, which is exogenous. If the optimal choice x* lies in the interior of a convex strategy set, and f is concave and twice differentiable, then the first order approach applies, and at any optimum the first-order condition f_x(x*,t)=0 holds. Differentiating that condition with respect to t gives f_xx(dx*/dt)+f_xt=0, or dx*/dt=-f_xt/f_xx; since f_xx<0 at an interior maximum, the sign of dx*/dt is just the sign of the cross-partial f_xt.

So far, so good. But often our maximand is neither differentiable nor concave, our strategy set is not convex, or our solution is not interior. What are we to do? Here, the monotone comparative statics approach (Topkis, Milgrom and Shannon are the early references, with important extensions by Athey) allows us to sign comparative statics with far fewer assumptions. We still don’t have many good results for games of strategic substitutes, however; in these types of games, my best response is decreasing in your action, so if a parameter change causes you to increase your action, I will want to decrease my action. Further, there are many games where player actions are neither strategic complements nor strategic substitutes. A good example is a patent race. If you increase your effort, I may best respond by increasing my effort (in order to “stay close”) or decreasing my effort (because I am now so far behind that I have no chance of inventing first).

Acemoglu and Jensen show that both types of games can be handled nicely if they are “aggregative games”, where my payoff depends only on my own action and on an aggregate of all players’ actions, rather than on the individual actions of each other player. My optimal production in a Cournot oligopoly depends only on aggregate production by all firms, and my optimal effort in an independent Poisson patent race depends only on the cumulative hazard rate of invention. In such a case, if the game is one of strategic substitutes in the choice variable x, and the aggregate quantity is some increasing function of the sum of all x and t, then an increase in t leads to a decrease in equilibrium choices x, and entry by another player increases the aggregate quantity. If t(i) is an idiosyncratic exogenous variable that affects i’s choice, and affects other players’ actions only through the aggregate, then an increase in t(i) increases x(i) and decreases x(j) for all other j. [When I say “increase” and “decrease” for the equilibrium quantities, when there are multiple equilibria, I mean that the entire set of equilibrium quantities shifts up or down.]

For example, take a Cournot game, where my profits are s(i)*P(X+t)-c(i)(s(i),t(i)), where s(i) is my supply choice, P(X+t) is price given total industry supply plus a demand shifter t, and c(i) is my cost function for producing s(i) given some idiosyncratic cost shifter t(i), such that s(i) and t(i) obey the single crossing property. Assume P is twice differentiable. Then taking the cross-partial of the profit function with respect to my own supply and other players’ supply, we have that my supply is a strategic substitute for others’ supply iff P'+s(i)P''<0. Note that this condition does not depend on any assumption about the cost function c. Then using the theorem in the last paragraph, we have that, in a Cournot game with *any* cost function, a downward shift in the demand curve decreases total industry supply, an additional entrant decreases equilibrium supply from existing firms, and a decrease in costs for one firm increases that firm's equilibrium output and decreases the output of all other firms. All we assumed to get this was the strategic substitutes property, twice-differentiability of P, and the single crossing property of the cost function in s and t. As with many comparative statics results, checking strategic substitutability and making sure the signs of s and t are correct for the interpretation of results is often easier than taking implicit derivatives, even if your assumptions are such that the implicit approach could conceivably work.
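
As a quick numerical check of those comparative statics, here is a Cournot example with concrete functional forms of my own choosing (linear inverse demand P = a - Q and constant marginal costs), which satisfy the strategic substitutes condition P' + s(i)P'' = -1 < 0.

```python
# Linear Cournot with heterogeneous constant marginal costs c_i. With these
# functional forms the Nash equilibrium has a closed form (assuming every
# firm produces a positive quantity).
import numpy as np

def cournot_equilibrium(costs, a=10.0):
    c = np.array(costs, dtype=float)
    n = len(c)
    q = (a - n * c + (c.sum() - c)) / (n + 1)   # firm i's equilibrium quantity
    assert (q > 0).all()
    return q

base = cournot_equilibrium([1.0, 1.0, 1.0])
cheaper = cournot_equilibrium([0.5, 1.0, 1.0])      # cut firm 0's marginal cost
entry = cournot_equilibrium([1.0, 1.0, 1.0, 1.0])   # add a fourth firm

print(base, base.sum())        # [2.25 2.25 2.25], total 6.75
print(cheaper, cheaper.sum())  # firm 0 expands, rivals contract, total output rises
print(entry, entry.sum())      # incumbents contract, aggregate output rises
```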

If the strategic substitutes property does not hold, as in a patent race, then compact and convex strategy sets, a payoff function pseudoconcave in a player’s own strategy, a boundary condition, and a “local solvability” condition are together sufficient to get similar results. Local solvability is roughly the condition which guarantees that a player’s own effect on the aggregate, when she reallocates after a change in t or t(i), does not alter the logic from the previous paragraph enough to flip the comparative static’s sign.

April 2011 Working Paper, as yet unpublished (IDEAS version).

“Evolutionary Dynamics and Backward Induction,” S. Hart (2002)

Let’s follow up yesterday’s post with an older paper on evolutionary games by the always-lucid Sergiu Hart. As noted in the last post, there are many evolutionary dynamics for which the rest points of the evolutionary game played by completely myopic agents and the Nash equilibria of the equivalent static game played by strategic agents coincide, which is really quite phenomenal (and since you know there are payoff-suboptimal Nash equilibria, results of this kind have, since Maynard Smith (1973), fundamentally changed our understanding of biology). Nash equilibrium is a low bar, however. Since Kuhn (1953), we have also known that every finite game of perfect information has a backward induction equilibrium, what we now call the subgame perfect equilibrium, in pure strategies. When does the invariant limit distribution of an extensive form evolutionary game coincide with the backward induction equilibrium? (A quick mathematical note: an evolutionary system with mutation, allowing any strategy to “mutate” on some agent with some probability in each state, means that by pure luck the system can move from any state to any other state. We also allow evolutionary systems to have selection, meaning that with some probability in each state an agent switches from his current strategy to one with a higher payoff. This process defines a Markov chain, and since the game is finite and the mutations allow us to reach any state from any other, it is a finite irreducible Markov chain. Such Markov chains have a unique invariant distribution in the limit.)

In general, we can have limit distributions of evolutionary processes that are not the backward induction equilibrium. Consider the following three-step game. Agent 1 chooses C or B; then, if C was chosen, agent 2 (in the agent-normal form) chooses C2 or B2; then, if C and B2 were chosen, agent 3 chooses C3 or B3. The payoffs to the three agents are (4,0,0) when B is chosen, (5,9,0) when C and C2 are chosen, (0,0,0) when C, B2 and C3 are chosen, and (0,10,1) when C, B2 and B3 are chosen. You can see that (C,C2,C3) and (B,B2,B3) are both Nash, but only (B,B2,B3) is subgame perfect, and hence the backward induction equilibrium. Is (B,B2,B3) the limit distribution of the evolutionary game? In the backward induction equilibrium, agent 1 chooses B at the first node, and hence nodes 2 and 3 are never reached, meaning only mutation, and not selection, affects the distribution of strategies at those nodes. Since the Markov chain is ergodic, with probability 1 the proportion of agents at node 2 playing B2 will eventually fall below .2; when that happens, selection at node 1 will push agents toward C instead of B. When this happens, both nodes 2 and 3 are now reached with positive probability. If less than .9 of the agents at node 3 are playing B3, then selection will push agents at node 2 toward C2. Selection can therefore push the percentage of agents playing B2 down to 0, and hence (C,C2,C3) can be part of the limit invariant distribution even though it is not the backward induction solution.

So is backward induction unjustifiable from an evolutionary perspective? No! Hart shows that if the number of agents goes to infinity as the probability of mutation goes to zero, then the backward induction solution, when it is unique, is also the only element in the limit invariant distribution of the evolutionary game. How does letting the number of agents go to infinity help? Let Bi be an element of the backward induction equilibrium at node i somewhere in the game tree. Bi must be a best reply in the subgame beginning with i if Bj is played in all descendant nodes by a sufficiently high proportion of the population, so if Bi is not a best reply (and hence selection does not push us toward Bi) it must be that Bj is not being played further down the game tree. If Bi is a best reply in the subgame beginning with i, then most of the population will play Bi because of selection pressures.

Now here’s the trick. Consider the problematic case in the example, when node i is not being reached in a hypothesized limit distribution (if i is reached, then since the probability of mutation goes to zero, selection is much stronger than mutation, and hence non-best replies will go away in the limit). Imagine that there is another node g preceding i which is also not reached, and that i is only reached when some strategy outside the backward induction equilibrium is played at g. When g and i are not reached, there is no selection pressure, and hence no reason that the backward induction action will be played at those nodes. Large populations help here. With some small probability, an agent at g mutates so that he plays the action which reaches i. This still has no effect unless there is also a mutation at the node before g that causes g to be reached. The larger the population, the lower the probability that the specific individual who mutated at g mutates back before any individual at the node before g mutates. Hence larger populations make it more likely that rare mutations at unreached nodes will “coordinate” in the way needed for selection to take over.

Final GEB version (IDEAS version). Big up to Sergiu for posting final pdfs of all of his papers on his personal website.
