Category Archives: Game Theory

“Competition in Persuasion,” M. Gentzkow & E. Kamenica (2012)

How’s this for fortuitous timing: I’d literally just gone through this paper by Gentzkow and Kamenica yesterday, and this morning it was announced that Gentzkow is the winner of the 2014 Clark Medal! More on the Clark in a bit, but first, let’s do some theory.

This paper is essentially the multiple-sender version of the great Bayesian Persuasion paper by the same authors (discussed on this site a couple years ago). There is a group of experts who can (under commitment to sending only true signals) send costless signals about the realization of the state. Given the information received, the agent makes a decision, and each expert gets some utility depending on that decision. For example, the senders might be a prosecutor and a defense attorney who know the guilt of a suspect, and the agent a judge. The judge convicts if p(guilty)>=.5, the prosecutor wants to maximize convictions regardless of underlying guilt, and the defense attorney wants to minimize them. Here’s the question: if we have more experts, or less collusive experts, or experts with less aligned interests, is more information revealed?

A lot of our political philosophy is predicated on more competition in information revelation leading to more information actually being revealed, but this is actually a fairly subtle theoretical question! For one, John Stuart Mill and others of his persuasion would need some way of describing how people competing to reveal information strategically interact, and to the extent that the equilibria of this strategic interaction are non-unique, they would need a way of “ordering” sets of potentially revealed information. We are lucky in 2014, thanks to our friends Nash and Topkis, to be able to deal nicely with each of those concerns.

The trick to solving this model (basically every proof in the paper comes down to algebra and some simple results from set theory; they are clever but not technically challenging) is the main result from the Bayesian Persuasion paper. Draw a graph with the agent’s posterior belief on the x-axis, and the utility (call this u) the sender gets from actions based on each posterior on the y-axis. Now draw the smallest concave function (call it V) that is everywhere greater than u. If V is strictly greater than u at the prior p, then a sender can improve her payoff by revealing information. Take the case of the judge and the prosecutor. If the judge has the prior that everyone brought before them is guilty with probability .6, then the prosecutor never reveals information about any suspect, and the judge always convicts (giving the prosecutor utility 1 rather than the 0 she gets from an acquittal). If, however, the judge’s prior is that everyone is guilty with probability .4, then the prosecutor can judiciously reveal information so that 80 percent of suspects are convicted. How? Just pool all of the guilty people with 2/3 of the innocent people under one signal, which leaves the judge with posterior p(guilty)=.5 for anyone in that pool, and reveal the remaining 1/3 of innocent people to be innocent with probability 1. This signal structure is Bayes plausible: the posteriors average back to the prior. The judge will convict everyone for whom p(guilty)=.5, meaning 80 percent of all suspects are convicted. If you draw the graph described above with u=1 when the judge convicts and u=0 otherwise, it is clear that V>u if and only if p<.5, hence information is only revealed in that case.
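
If you want to check the arithmetic of the prosecutor example, here is a quick sketch in Python (mine, not the authors’; the prior and the 2/3 split are just the numbers from the example above):

```python
from fractions import Fraction as F

# Prosecutor's persuasion problem (judge convicts iff posterior >= 1/2).
# Sender utility u(posterior) = 1 if posterior >= 1/2 else 0.
# For priors p < 1/2 the concave envelope is V(p) = 2p: the best the
# prosecutor can do is obtain conviction with probability 2p.

prior = F(2, 5)            # P(guilty) = 0.4

# Signal "g" is sent for all guilty suspects and for 2/3 of innocent ones;
# signal "i" is sent for the remaining 1/3 of innocent suspects.
p_g_given_guilty = F(1)
p_g_given_innocent = F(2, 3)

p_signal_g = prior * p_g_given_guilty + (1 - prior) * p_g_given_innocent
posterior_after_g = prior * p_g_given_guilty / p_signal_g
posterior_after_i = F(0)   # only innocents ever receive signal "i"

print("P(signal g) =", p_signal_g)                 # 4/5: convictions 80% of the time
print("posterior after g =", posterior_after_g)    # exactly 1/2, so the judge convicts
print("V(prior) = 2 * prior =", 2 * prior)         # matches the conviction probability

# Bayes plausibility check: the posteriors average back to the prior.
assert p_signal_g * posterior_after_g + (1 - p_signal_g) * posterior_after_i == prior
```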

What about when there are multiple senders with different utilities u? It is somewhat intuitive: more information is always, in the Blackwell sense, weakly valuable for the agent. If there is any sender who can improve her payoff by revealing information given what has been revealed thus far, then we are not in equilibrium: some sender has the incentive to deviate by revealing more information. Therefore, adding more senders increases the amount of information revealed and “shrinks” the set of beliefs that the agent might wind up holding (and, further, the authors show that any Bayes-plausible distribution of beliefs under which no sender can profitably reveal further information is an equilibrium). We still have a number of technical details concerning multiplicity of equilibria to deal with, but the authors show that these results hold in a set-order sense as well. This theorem is actually great: to check equilibrium information revelation, I only need to check where V and u diverge sender by sender, without worrying about complex strategic interactions. Because of that simplicity, it ends up being very easy to show that removing collusion among senders, or increasing the number of senders, will improve information revelation in equilibrium.

September 2012 working paper (IDEAS version). A brief word on the Clark medal. Gentzkow is a fine choice, particularly for his Bayesian persuasion papers, which are already very influential. I have no doubt that 30 years from now, you will still see the 2011 paper on many PhD syllabi. That said, the Clark medal announcement is very strange. It focuses very heavily on his empirical work on newspapers and TV, and mentions his hugely influential theory as a small aside! This means that five of the last six Clark medal winners, everyone but Levin and his relational incentive contracts, have been cited primarily for MIT/QJE-style theory-light empirical microeconomics. Even though I personally am primarily an applied microeconomist, I still see this as a very odd trend: no prizes for Chernozhukov or Tamer in metrics, or Sannikov in theory, or Farhi and Werning in macro, or Melitz and Costinot in trade, or Donaldson and Nunn in history? I understand these papers are harder to explain to the media, but it is not a good thing when the second most prominent prize in our profession is essentially ignoring 90% of what economists actually do.

“The Explanatory Relevance of Nash Equilibrium: One-Dimensional Chaos in Boundedly Rational Learning,” E. Wagner (2013)

The top analytic philosophy journals publish a surprising amount of interesting game and decision theory; the present article, by Wagner in the journal Philosophy of Science, caught my eye recently.

Nash equilibria are stable in a static sense, we have long known; no player wishes to deviate given what others do. Nash equilibria also require fairly weak epistemic conditions: if all players are rational and believe the other players will play the actual strategies they play with probability 1, then the set of outcomes is the Nash equilibrium set. A huge amount of work in the 80s and 90s considered whether players would “learn” to play Nash outcomes, and the answer is by and large positive, at least if we expand from Nash equilibria to correlated equilibria: fictitious play (where I best respond to the empirical frequency of my opponents’ past actions) works pretty well, regret-based rules that condition on the relative payoffs of various strategies in the past converge to the set of correlated equilibria, and a type of Bayesian learning given initial beliefs about the strategy paths that might be used generates Nash in the limit, though note the important followup on that paper by Nachbar in Econometrica 2005. (Incidentally, a fellow student pointed out that the Nachbar essay is a great example of how poor citation measures are for theory. The paper has 26 citations on Google Scholar mainly because it helped kill a literature; the number of citations drastically underestimates how well-known the paper is among the theory community.)

A caution, though! It is not the case that every reasonable evolutionary or learning rule leads to an equilibrium outcome. Consider the “continuous time imitative logit dynamic”. A continuum of agents exists. At some exponentially distributed time for each agent, a buzzer rings, at which point she is randomly matched with another agent. The agent imitates the other agent in the future with probability proportional to exp(beta*pi(j)), where beta is some positive number and pi(j) is the payoff to the opponent; if imitation doesn’t occur, a new strategy is chosen at random from all available strategies. A paper by Hofbauer and Weibull shows that as beta grows large, this dynamic is approximately a best-response dynamic, where strictly dominated strategies are driven out; as beta grows small, it looks a lot like a replicator dynamic, where imitation depends on the myopic relative fitness of a strategy. A discrete-time version of the continuous dynamic above can be generated (all agents update simultaneously rather than individually) which similarly “ranges” from something like the myopic replicator to something like a best response dynamic as beta grows. Note that strictly dominated strategies are not played for any beta in both the continuous and discrete time i-logit dynamics.

Now consider a simple two strategy game with the following payoffs:

        Left    Right
Left    (1,1)   (a,2)
Right   (2,a)   (1,1)

The unique Nash equilibrium is the mixed one in which the proportion of the population playing Right is X=1/a. Let, say, a=3. When beta is very low (say, beta=1) and players are “relatively myopic”, the discrete time i-logit dynamic started from X=.1 converges to X=1/a. But if beta gets higher, say beta=5, then players are “more rational” yet the dynamic does not converge or cycle at all: indeed, whether the population plays Left or Right follows a chaotic system! This property can be generated for many initial conditions X and values of a.
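
Wagner’s exact specification of the discrete-time dynamic is more careful than what follows, but a bare-bones imitation-with-logit map in the same spirit shows the qualitative point. Everything here (the functional form of the map, the starting point, the claim about which beta settles down) is my own illustration rather than a reproduction of the paper; for this particular map the interior rest point is attracting at beta=1 and no rest point is stable at beta=5, so the population keeps oscillating, and whether that oscillation is a cycle or chaos depends on the details, which is exactly Wagner’s point:

```python
import math

a = 3.0                      # payoff parameter from the 2x2 game above
x_star = 1.0 / a             # mixed Nash: fraction of the population playing Right

def payoff_left(x):          # x = current fraction playing Right
    return (1 - x) * 1 + x * a

def payoff_right(x):
    return (1 - x) * 2 + x * 1

def step(x, beta):
    """One period of a simple imitation-with-logit map (illustrative only)."""
    wr = x * math.exp(beta * payoff_right(x))
    wl = (1 - x) * math.exp(beta * payoff_left(x))
    return wr / (wr + wl)

for beta in (1.0, 5.0):
    x = 0.1
    traj = []
    for _ in range(2000):
        x = step(x, beta)
        traj.append(x)
    tail = ", ".join(f"{v:.3f}" for v in traj[-5:])
    print(f"beta={beta}: Nash x*={x_star:.3f}, last 5 iterates: {tail}")
```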

The dynamic here doesn’t seem crazy, and making agents “more rational” in a particular sense makes convergence properties worse, not better. And since play is chaotic, a player hoping to infer what the population will play next is required to know the initial conditions with certainty. Nash or correlated equilibria may have some nice dynamic properties for wide classes of reasonable learning rules, but the point that some care is needed concerning what “reasonable learning rules” might look like is well taken.

Final 2013 preprint. Big thumbs up to Wagner for putting all of his papers on his website, a real rarity among philosophers. Actually, a number of his papers look quite interesting: Do cooperation and fair bargaining evolve in tandem? How do small world networks help the evolution of meaning in Lewis-style sender-receiver games? How do cooperative “stag hunt” equilibria evolve when 2-player stag hunts have such terrible evolutionary properties? I think this guy, though a recent philosophy PhD in a standard philosophy department, would be a very good fit in many quite good economic theory programs…

“Long Cheap Talk,” R. Aumann & S. Hart (2003)

I wonder if Crawford and Sobel knew just what they were starting when they wrote their canonical cheap talk paper – it is incredible how much more we know about the value of cheap communication even when agents are biased. Most importantly, it is not true that bias or self-interest means we must always require people to “put skin in the game” or perform some costly action in order to prove the true state of their private information. A colleague passed along this paper by Aumann and Hart, which addresses a question that has long bedeviled students of communication games: why doesn’t the talk end right away? (And fair notice: we once had a full office shrine, complete with votive candles, to Aumann, he of the antediluvian beard and two volume tome, so you could say we’re fans!)

Take a really simple cheap talk game, where only one agent has any useful information. Row knows what game we are playing, and Column only knows the probability distribution over such games. In the absence of conflict (say, where there are two symmetric games, each of which has one Pareto optimal equilibrium), Row tells Column which game is the true one, this is credible, and so Column plays the Pareto optimal action. In other cases, we know from Crawford-Sobel logic that partial revelation may be useful even when there are conflicts of interest: Row tells Column with some probability what the true game is. We can also create new equilibria by using talk to reach “compromise”. Take a Battle of the Sexes, with LL payoff (6,2), RR (2,6) and LR=RL=(0,0). The equilibria of the simultaneous game without cheap talk are LL, RR, or randomizing 3/4 on your preferred location and 1/4 on the opponent’s preferred location. But a new equilibrium is possible if we can use talk to create a public randomization device. We both write down 1 or 2 on a piece of paper, then show the papers to each other. If the sum is even, we both go LL. If the sum is odd, we both go RR. This gives ex-ante payoff (4,4), which is not an equilibrium payoff without the cheap talk.

So how do multiple rounds help us? They allow us to combine these motives for cheap talk. Take an extended Battle of the Sexes, with a third action A available to Column. LL still pays off (6,2), RR still (2,6) and LR=RL=(0,0). RA or LA pays off (3,0). Before we begin play, we may be playing extended Battle of the Sexes, or we may be playing a game Last Option that pays off 0 to both players unless Column plays A, in which case both players get 4; both games are equally probable ex-ante, and only Row learns which game we are actually in. Here, we can enforce a payoff of (4,4) if, when the game is actually extended Battle of the Sexes, we randomize between LL and RR as in the previous paragraph, while if the game is Last Option, Column always plays A. But the order in which we publicly randomize and reveal information matters! If we first randomize and then reveal which game we are playing, then whenever the public randomization tells us to play RR (giving Row a payoff of only 2 in Battle of the Sexes), Row will afterwards have the incentive to claim we are actually playing Last Option. But if Row first reveals which game we are playing, and we then randomize if we are playing extended Battle of the Sexes, we indeed enforce ex-ante expected payoff (4,4).

Aumann and Hart show precisely what can be achieved with arbitrarily long strings of cheap talk, using a clever geometric proof which is far too complex to even summarize here. But a nice example of how really long cheap talk of this fashion can be used is in a paper by Krishna and Morgan called The Art of Conversation. Take a standard Crawford-Sobel model. The true state of the world is drawn uniformly from [0,1]. I know the true state, and get utility which is maximized when you take an action on [0,1] as close as possible to the true state of the world plus .1. Your utility is maximized when you take an action as close as possible to the true state of the world. With this “bias”, there is a partially informative one-shot cheap talk equilibrium: I tell you whether we are in [0,.3] or [.3,1], and you in turn take action either .15 or .65. How might we do better with a string of cheap talk? Try the following: first I tell you whether we are in [0,.2] or [.2,1]. If I say we are in the low interval, you take action .1. If I say we are in the high interval, we perform a public randomization which ends the game (with you taking action .6) with probability 4/9 and continues the game with probability 5/9; for example, to publicly randomize we might each shout out a number between 1 and 9 and continue if the sum mod 9 is 4 or less, which happens with probability 5/9 as long as at least one of us randomizes uniformly. If we continue, I tell you whether we are in [.2,.4] or [.4,1]. If I say [.2,.4], you take action .3, else you take action .7. It is easy to calculate that both players are better off ex-ante than in the one-shot cheap talk game. The probabilities 4/9 and 5/9 were chosen so as to make each player indifferent between following the proposed equilibrium after the randomization and not.
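
Here is a quick check of that “easy to calculate” claim, assuming the standard uniform-quadratic Crawford-Sobel setup (state uniform on [0,1], quadratic loss, sender bias .1); the intervals, actions, and probabilities are the ones from the paragraph above, while the code itself is just my own arithmetic:

```python
def loss(a, b, action, bias=0.0):
    """E[(state + bias - action)^2] contribution over states uniform on [a, b]
    (not divided by b - a, so the pieces add up to the ex-ante expectation)."""
    c = action - bias
    return ((b - c) ** 3 - (a - c) ** 3) / 3.0

BIAS = 0.1

# One-shot Crawford-Sobel equilibrium: reveal [0,.3] vs [.3,1], actions .15 / .65.
receiver_one = loss(0, .3, .15) + loss(.3, 1, .65)
sender_one   = loss(0, .3, .15, BIAS) + loss(.3, 1, .65, BIAS)

# Multi-stage protocol from the text: reveal [0,.2] vs [.2,1]; if low, action .1;
# if high, stop with action .6 w.p. 4/9, otherwise reveal [.2,.4] vs [.4,1]
# and take action .3 or .7.
receiver_multi = (loss(0, .2, .1)
                  + (4/9) * loss(.2, 1, .6)
                  + (5/9) * (loss(.2, .4, .3) + loss(.4, 1, .7)))
sender_multi   = (loss(0, .2, .1, BIAS)
                  + (4/9) * loss(.2, 1, .6, BIAS)
                  + (5/9) * (loss(.2, .4, .3, BIAS) + loss(.4, 1, .7, BIAS)))

print(f"receiver expected loss: one-shot {receiver_one:.4f}  multi-stage {receiver_multi:.4f}")
print(f"sender   expected loss: one-shot {sender_one:.4f}  multi-stage {sender_multi:.4f}")
# Both losses come out (slightly) smaller under the multi-stage protocol.
```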

The usefulness of the lotteries interspersed with the partial revelation is to let the sender credibly reveal more information. If there were no lottery, and instead we always continued with probability 1, look at what happens when the true state of nature is .19. The sender knows he can say in the first revelation that, actually, we are on [.2,1], and then in the second revelation that, actually, we are on [.2,.4], in which case the receiver plays .3 (which is almost exactly the sender’s ideal point of .29). Hence without the lotteries, the sender has an incentive to lie at the first revelation stage. That is, cheap talk can serve to give us jointly controlled lotteries in between successive revelations of information, and in so doing, improve our payoffs.

Final published Econometrica 2003 copy (IDEAS). Sequential cheap talk has had many interesting uses. I particularly enjoyed this 2008 AER by Alonso, Dessein and Matouschek. The gist is the following: it is often thought that the tradeoff between decentralized firms and centralized firms is more local control in exchange for more difficult coordination. But think hard about what information will be transmitted by regional managers who only care about their own division’s profits. As coordination becomes more important, the optimal strategy in my division is more closely linked to the optimal decision in other divisions. Hence I, the regional manager, have a greater incentive to freely share information with other regional managers than in the situation where coordination is less important. You may prefer centralized decision-making when cooperation is least important because this is when individual managers are least likely to freely share useful information with each other.

“Between Proof and Truth,” J. Boyer & G. Sandu (2012)

In the previous post, I promised that game theory has applications in pure philosophy. Some of these applications are economic in nature – a colleague has written a paper using a branch of mechanism design to link the seemingly disjoint methodologies of induction and falsification – but others really are pure. In particular, there is a branch of zero-sum games which deals with the fundamental nature of truth itself!

Verificationist theories of truth like those proposed by Michael Dummett suggest that a statement is true if we can prove it to be true. This may seem trivial, but it absolutely is not: for one, to the extent that there are unprovable statements, as in some mathematical systems, we ought to say “that statement is neither true nor false” and not simply “we cannot prove that statement true or false”. Another school of thought, beginning with Tarski’s formal logics and expanded by Hintikka, says that truth is a property held by sentences. The statement “Snow is white” is true if and only if the thing denoted by “snow” is indeed white. The statement “A or B” is true if and only if the thing denoted by “A” is true or the thing denoted by “B” is true.

These may seem very different conceptions. The first is a property requiring action – something is true if someone can verify it. The second seems more in the air – something is true if its sentence has certain properties. But take any sentence in formal logic, like “There exists A such that for all B either C or D is true”. We can play a game between Verifier and Falsifier. Move left to right across the sentence, letting the Verifier choose at all existentials and “or” statements, and the Falsifier choose at all universals and “and” statements. Verifier wins if she can get to the end of the sentence with the sentence remaining true. That is, semantic games take Tarski truth and make it playable by someone with agency, at least in principle.
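
To make the idea concrete, here is a toy game-theoretic-semantics evaluator in Python over a small finite structure (a truncated fragment of the naturals); the formulas and the domain are my own examples, and Verifier “having a winning strategy” is computed by brute force:

```python
DOMAIN = range(6)   # a tiny finite structure standing in for a fragment of N

# Formulas are nested tuples:
#   ('exists', var, phi)  ('forall', var, phi)
#   ('or', phi, psi)      ('and', phi, psi)
#   ('atom', predicate)   where predicate maps an assignment dict to True/False

def verifier_wins(phi, env=None):
    """True iff Verifier has a winning strategy: she moves at 'exists'/'or',
    Falsifier moves at 'forall'/'and'."""
    env = env or {}
    tag = phi[0]
    if tag == 'atom':
        return phi[1](env)
    if tag == 'exists':
        _, v, body = phi
        return any(verifier_wins(body, {**env, v: d}) for d in DOMAIN)
    if tag == 'forall':
        _, v, body = phi
        return all(verifier_wins(body, {**env, v: d}) for d in DOMAIN)
    if tag == 'or':
        return verifier_wins(phi[1], env) or verifier_wins(phi[2], env)
    if tag == 'and':
        return verifier_wins(phi[1], env) and verifier_wins(phi[2], env)
    raise ValueError(tag)

# "For every x there exists y with y = x + 1": false on the truncated domain
# (Falsifier picks the top element), though true on all of N.
succ = ('forall', 'x', ('exists', 'y', ('atom', lambda e: e['y'] == e['x'] + 1)))
# "There exists x such that for all y, x <= y": a least element exists.
least = ('exists', 'x', ('forall', 'y', ('atom', lambda e: e['x'] <= e['y'])))

print("Verifier wins successor formula:", verifier_wins(succ))      # False here
print("Verifier wins least-element formula:", verifier_wins(least)) # True
```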

The paper by Boyer and Sandu takes this as a starting point, and discusses when Dummett’s truth coincides with Tarski and Hintikka’s truth, restricting ourselves to semantic games played on recursive structures (nonconstructive winning strategies in the semantic game seem problematic if we want to relate truth in semantic games to verificationist truth!). Take statements in Peano arithmetic where all objects chosen are natural numbers (it happens to be true that every recursive model of PA is isomorphic to the standard natural numbers). When is every statement I can prove also true in the sense of a winning strategy in the recursive semantic game? Conversely, when can the semantic-game truth of a sentence be given by a proof? The answer to both is negative. For the first, consider the sentence stating that for all programs x1 and inputs x2, there exists a number of steps y such that the program either halts within y steps or never halts. This is the halting problem. It is not decidable, hence Verifier has no recursive winning strategy, but the sentence is trivially provable in Peano arithmetic by the law of the excluded middle.

Boyer and Sandu note (as known in an earlier literature) that we can relate the two types of truth by extending the semantic game to allow backward moves. That is, at any node, or at the end of the game, Verifier can go back to any node she played and change her action. Verifier wins if she has a finite winning strategy. It turns out that Verifier can win in the game with backward moves if and only if she can win in the standard game. Further, if a statement can be proven, Verifier can win in the game with backward moves using a recursive strategy. This has some interesting implications for Godel sentences (“This sentence is not provable within the current system.”) which I don’t wish to discuss here.

Note that all of this is just the use of game theory in “games against nature”. We usually think of game theory as being a tool for the analysis of situations with strategic interaction, but the condition that players are rational perfect optimizers means that, in zero sum games, checking whether something is possible for some player just involves checking whether a player called Nature has a winning strategy against him. This technique is so broadly applicable, in economics and otherwise, that we ought really be careful about defining game theory as solely a tool for analyzing the strategies of multiple “actual” agents; e.g., Wikipedia quotes Myerson’s definition that game theory is “the study of mathematical models of conflict and cooperation between intelligent rational decision-makers”. This is too limiting.

Final copy. This article appeared in Synthese 187/March 2012. Philosophers seem to rarely put their working papers online, but Springer has taken down the paywall on Synthese throughout December, so you can read the above link even without a subscription.

Game Theory and History, A. Greif & Friends (1993, 1994)

(This post refers to A. Greif, “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, AER 1993, and A. Greif, P. Milgrom & B. Weingast, “Coordination, Commitment and Enforcement: The Case of the Merchant Guild,” JPE 1994.)

Game theory, after a rough start, may actually be fulfilling its role as proposed by Herbert Gintis: unifier of the sciences. It goes without saying that game theoretic analysis is widespread in economics, political science (e.g., voter behavior), sociology (network games), law (antitrust), computer science (defending networks against attacks), biology (evolutionary strategies), pure philosophy (more on this in a post tomorrow!), with occasional appearances in psychology, religion (recall Aumann’s Talmud paper), physics (quantum games), etc. But history? Surely game theory, particularly the more complex recent results, has no place there? Yet Avner Greif, an economic historian at Stanford, has shown that games can play a very interesting role indeed in understanding historical events.

Consider first his Maghribi traders paper. In the 11th and 12th centuries, a group of Judeo-Arabic traders called the Maghribis traded across the Mediterranean. Two institutional aspects of their trade are interesting. First, they all hired agents in foreign cities to carry out their trade, and second, they generally used other Maghribi merchants as their agents. This is quite different from, for instance, Italy, where merchants tended to hire agents in foreign cities who were not themselves merchants. What explains that difference, and more generally, how can a long distance trader ensure that his agents do not rip him off? For instance, how do I keep an agent from claiming he sold at a low price when actually he sold at a high one?

To a theorist, this looks like a repeated reputational game with imperfect monitoring. Greif doesn’t go the easy route and just assume there are trustworthy and untrustworthy types. Rather, he assumes that there is a set of potential agents who can be hired in each period, that agents are exogenously separated from merchants with probability p in each period, and that merchants can hire and fire at any wage they choose. You probably know from the economics of reputation or from the efficiency wage literature that I need to offer wages higher than the agent’s outside option to keep him from stealing, so that the value of the continuation game exceeds the value of stealing now. Imagine that I fire anyone who steals and never hire him again. How do I ensure that other merchants do not then hire that same agent (perhaps the agent will say, “Look, give me a second chance and I will work at a lower wage”)? Well, an agent who has cheated one merchant will never be hired by that merchant again. This means that when he is in the unemployed pool, even if other merchants are willing to hire him, his probability of getting hired is lower, since one merchant will definitely not hire him. That means that the continuation value of the game if he doesn’t steal from me is lower. Therefore, the efficiency wage I must pay him to keep him from stealing is higher than the efficiency wage I can pay someone who has never stolen, so I strictly prefer to hire agents who have never stolen. This allows the whole coalition to coordinate. Note that the fewer agents there are, the higher the continuation value from not stealing, and hence the lower the efficiency wage I can pay: it is optimal to keep the set of potential agents small.
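
To see the mechanics, here is a stylized back-of-the-envelope version in Python, emphatically not Greif’s model: I assume a one-time gain g from cheating, an exogenous separation probability s, a per-period reservation payoff r while unemployed, a rehire probability that depends on the agent’s record, and (for simplicity) that a rehired cheater behaves thereafter. The point it illustrates is the one in the paragraph above: a past cheater, or an agent in a larger pool with slower rehiring, requires a higher efficiency wage.

```python
def values(w, delta, s, r, g, h_sep, h_fire):
    """Solve the linear value equations for an employed agent facing wage w.
    h_sep:  rehire probability after an honest (exogenous) separation
    h_fire: rehire probability after being fired for cheating
    Returns (value of honest employment, value of cheating right now)."""
    # V_U(h) = r + delta*(h*V_E + (1-h)*V_U(h))  =>  V_U = (r + delta*h*V_E) / (1 - delta*(1-h))
    # V_E    = w + delta*((1-s)*V_E + s*V_U(h_sep))
    def vu(h, ve):
        return (r + delta * h * ve) / (1 - delta * (1 - h))
    denom_u = 1 - delta * (1 - h_sep)
    a = delta * (1 - s) + delta * s * delta * h_sep / denom_u
    b = w + delta * s * r / denom_u
    ve = b / (1 - a)                      # honest employment value, solved in closed form
    deviate = w + g + delta * vu(h_fire, ve)
    return ve, deviate

def min_honest_wage(delta=0.9, s=0.1, r=0.0, g=1.0, h_sep=0.5, h_fire=0.0):
    """Smallest wage (found by bisection) at which honesty weakly beats cheating."""
    lo, hi = 0.0, 100.0
    for _ in range(80):
        mid = (lo + hi) / 2
        ve, dev = values(mid, delta, s, r, g, h_sep, h_fire)
        if ve >= dev:
            hi = mid
        else:
            lo = mid
    return hi

print("clean record (rehired after separation w.p. .5):", round(min_honest_wage(h_sep=0.5, h_fire=0.0), 3))
print("past cheater (rehired w.p. .2 either way):      ", round(min_honest_wage(h_sep=0.2, h_fire=0.2), 3))
print("smaller pool (rehired w.p. .8 if honest):       ", round(min_honest_wage(h_sep=0.8, h_fire=0.0), 3))
```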

What of the Italian merchants? Why do they not hire only each other? Maghribi merchants tended to be involved only in long distance trade, while Italian merchants were also involved in real estate and other pursuits. This means the outside option (the continuation value after cheating, if no one hires me again) is higher for Italian merchants than for Maghribi merchants, which means that hiring fellow merchants as agents at the necessary efficiency wage is relatively more expensive for Italians than for Maghribis.

A followup by Greif, with Milgrom and Weingast, considers the problem of long distance trade from the perspective of cities. Forget about keeping your agent from ripping you off: how do you keep the city from ripping you off? For instance, Genoans in Constantinople had their district overrun by a mob at one point, with no compensation offered. Sicilians raised taxes on sales by Jews at one point after they had brought their goods for sale. You may naively think that reputation alone will be enough; I won’t rip anyone off because I want a reputation of being a safe and fair city for trade.

But again, the literature on repeated games tells us this will not work. Generally, I need to punish deviations from the efficient set of strategies, and punish those who themselves do not punish deviators. In terms of medieval trade, to keep a city from ripping me off, I not only need to punish the city by bringing it less trade, but I also need to make sure the city doesn’t make up for my lost trade by offering a special deal to some other trader. That is, I need to get information about a violation against a single trader to other traders, and I need to make sure they are willing to punish the deviating city.

The merchant guild was the institution that solved this problem. Merchant guilds were able to punish their own members by, for example, keeping them from earning rents from special privileges in their own city. In the most general setting, when a guild orders a boycott, cities may be able to attract some trade, but less than the efficient amount, because only a particularly good deal will entice merchants to come during a boycott and let them credibly believe the city will not steal from them.

This is all to say that strong guilds may be in the best interest of cities, since they allow the city to solve its commitment problem. The historical record contains many examples of cities encouraging guilds to come trade, and encouraging the strengthening of guilds. Only a reputational model like the one above can explain such city behavior; if guilds were merely extracting rents with monopoly privilege, cities would not encourage them at all. Both of these papers, I think, are quite brilliant.

1993 AER (IDEAS version) and 1994 JPE (IDEAS version). Big thumbs up to Avner for having the final published versions of these papers on his website.

“Unbeatable Imitation,” P. Duersch, J. Oechssler & B. C. Schipper (2012)

People, particularly in relatively unimportant situations, rely on heuristics rather than completely rational foresight. But using heuristics in modeling seems to me undesirable, because players using heuristics can easily be abused by more strategic players. For instance, consider the game of fighter pilot Chicken as in the movie Top Gun. Both players prefer going straight while the opponent swerves, to swerving when the opponent swerves (hence showing lack of nerve), to swerving when the opponent goes straight (hence showing a unique lack of nerve), to going straight when the opponent goes straight (hence crashing). Consider playing Chicken over and over against a heuristic-based opponent. Perhaps the opponent simply best responds to whatever you did in the previous period. In this case, if I go straight in period 1, the opponent swerves in the next period, and if I swerve, the opponent goes straight. Therefore, I’ll go straight in periods 1 through infinity, knowing my opponent will swerve in every period except possibly the first. The sophisticated player will earn a much higher payoff than the unsophisticated one.

Duersch et al show that, in every 2×2 symmetric game and in a large class of N-by-N symmetric two-player games, a simple heuristic called “imitation” is essentially unbeatable: its undiscounted average payoff is at least as high as that which can be achieved by an opponent playing any strategy at all. In imitation, I retain my strategy each period unless the opponent earned strictly more than I did in the previous period, in which case I copy him. Consider Chicken again. If I go straight while the imitator swerves, I earn strictly more, so he copies me and goes straight in the following period. Against his straight, I can either go straight myself (we crash, earn the same, and he keeps going straight) or swerve (he earns strictly more and again keeps going straight). Either way, from the second period on my per-period payoff is no higher than his, and in the undiscounted average the single period in which I exploited him washes out: my average payoff is no better than my heuristic-based opponent’s! This is true no matter what strategy is used against the imitating opponent.

Now imitation will fail in many games, of course. Consider Rock-Paper-Scissors. If I play Rock when you play Scissors, then since you imitate, you will switch to Rock in the next period. Knowing this, I will play Paper, and so on, winning every period. Games that have this type of cycling possibility allow me to extract an arbitrarily higher payoff than the imitating opponent. What’s interesting is that, in finite symmetric two-player games between an imitator and an agent with perfect rationality, games with a possibility of cycling in some subgame are the only ones in which the imitator does not earn the same average payoff per period as the rational player! Checking this condition is difficult, but games with no pure equilibrium in the relative payoff game (i.e., the game where each player’s payoff equals the difference between his payoff and his opponent’s in the original game, a game which is by construction zero-sum) always have a cycle, and games which are quasiconcave never do. Many common games (oligopoly competition, Nash bargaining, etc.) can be written as quasiconcave games.
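
A quick simulation makes both points. The payoff numbers for Chicken, the opponents I pit against the imitator, and the win/lose normalization in Rock-Paper-Scissors are my own choices, so treat this as an illustration of the imitation rule rather than the paper’s construction:

```python
def chicken(a, b):
    """Payoffs for (my action, opponent's action); 'S' = swerve, 'D' = straight."""
    table = {('S', 'S'): (3, 3), ('S', 'D'): (2, 4),
             ('D', 'S'): (4, 2), ('D', 'D'): (0, 0)}
    return table[(a, b)]

def run_chicken(opponent, periods=10_000):
    """Imitator keeps its last action unless the opponent earned strictly more
    last period, in which case it copies the opponent's last action."""
    imit, imit_total, opp_total = 'S', 0, 0
    for _ in range(periods):
        opp = opponent(imit)         # the sophisticated player sees what the imitator will do
        pi, po = chicken(imit, opp)
        imit_total += pi
        opp_total += po
        if po > pi:
            imit = opp               # the imitation step
    return imit_total / periods, opp_total / periods

greedy = lambda imit_action: 'D' if imit_action == 'S' else 'S'   # myopically exploit
dove = lambda imit_action: 'S'                                    # always swerve

print("Chicken vs greedy exploiter (imitator, opponent):", run_chicken(greedy))
print("Chicken vs always-swerve    (imitator, opponent):", run_chicken(dove))

# Rock-Paper-Scissors: the imitator copies whatever just beat it, so an opponent
# who stays one step ahead of the cycle wins every single period.
beats = {'R': 'P', 'P': 'S', 'S': 'R'}    # beats[x] is the action that beats x
def run_rps(periods=10_000):
    imit, imit_total, opp_total = 'R', 0, 0
    for _ in range(periods):
        opp = beats[imit]            # opponent wins this period: 1 for it, 0 for the imitator
        opp_total += 1
        imit = opp                   # opponent earned strictly more, so the imitator copies
    return imit_total / periods, opp_total / periods

print("Rock-Paper-Scissors         (imitator, opponent):", run_rps())
```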

Imitation is really pretty unique. The authors give the example of a 3×3 symmetric oligopoly game, where strategy 1 is “produce the Cournot quantity”, strategy 2 is “produce the Stackelberg follower quantity” and strategy 3 is “produce the Stackelberg leader quantity.” The game has no subgames with cycles as defined above, and hence imitators and the rational player earn the same average payoff (if the rational player plays the Stackelberg leader quantity and I play something else, then he earns more than I do, hence I imitate him next period, and he then best responds by playing the Stackelberg follower quantity). Other heuristics do much worse than imitation. A heuristic where you simply best reply to last period’s play just produces the Stackelberg follower quantity forever, for example.

This result is quite interesting, and the paper is short; the “useful insight on the worst page” test of a quality paper is easily satisfied. I like this work too because it is related to some ideas I have about the benefits of going first. Consider shifting a symmetric simultaneous game to a symmetric sequential game. Going first has no benefit except that it allows me to commit to my action (and many negatives, of course, including the inability to mix strategies). Likewise a heuristic rule allows the heuristic player to commit to actions without assuming perfection of the equilibrium. So there is a link between “optimal” heuristics and the desire of a rational player to commit to his action in advance if he could so choose.

November 2011 Working Paper (IDEAS version). Final version published in the September 2012 issue of Games and Economic Behavior.

“Aggregate Comparative Statics,” D. Acemoglu & M. K. Jensen (2011)

Enormous parts of economic theory are devoted to “comparative statics”: if one variable changes (one chickpea firm’s cost structure decreases, demand for pineapples goes up, firms receive better information about the size of an oil field before bidding on rights to that field), does another variable increase or decrease (chickpea prices rise or fall, equilibrium supply of pineapples rises or falls, revenue for the oil field auctioneer rises or falls)?

There are a couple ways we can check comparative statics. The traditional way is using the implicit function theorem. Call t the variable you wish to change, and x the endogenous variable whose comparative static you wish to check. If you have a model for which you can solve explicitly for x as a function of t, then finding the sign of dx/dt (or the nondifferential analogue) is straightforward. This is rare indeed in many common economic models; for instance, if x is the supply of firm 1 in a Cournot duopoly, and t is the marginal cost of firm 2, and all I know is that demand satisfies certain properties, then since I haven’t even specified the functional form of demand, I will not be able to take an explicit derivative. We may still be in luck, however. Let f(x(t),t) be the function an agent is maximizing, which depends on x, which the agent can choose, and on t, which is exogenous. If the optimal choice x* lies in the interior of a convex strategy set, and f is concave and twice differentiable, then the first order approach applies, and hence at any optimum f_x(x*,t)=0. Differentiating that identity in t and applying the chain rule gives f_xx*(dx*/dt)+f_xt=0, or dx*/dt=-f_xt/f_xx; since concavity makes f_xx negative, the sign of dx*/dt is just the sign of the cross-partial f_xt.
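
As a sanity check on that formula, here is a tiny numerical example with a made-up concave objective (the specific function and parameter values are mine): the implicit-function-theorem slope matches a finite-difference estimate of how the maximizer moves.

```python
def f(x, t):
    """A concrete strictly concave objective in x, with a cost shifter t."""
    return x * (10 - x) - t * x ** 2

def argmax(t, lo=0.0, hi=10.0, iters=100):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if f(m1, t) < f(m2, t):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

t0, eps = 1.0, 1e-4

# Implicit function theorem at the FOC f_x(x*, t) = 0:
#   dx*/dt = -f_xt / f_xx,  here f_xt = -2*x*, f_xx = -2 - 2*t.
x_star = argmax(t0)                        # analytically x* = 5 / (1 + t) = 2.5
ift = -(-2 * x_star) / (-2 - 2 * t0)       # = -5 / (1 + t)^2 = -1.25

finite_diff = (argmax(t0 + eps) - argmax(t0 - eps)) / (2 * eps)
print(f"x*(1) = {x_star:.3f}, IFT slope = {ift:.3f}, finite difference = {finite_diff:.3f}")
```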

So far, so good. But often our maximand is neither differentiable nor concave, our strategy set is not convex, or our solution is not interior. What are we to do? Here, the monotone comparative statics approach (Athey and Milgrom are early references here) allows us to sign comparative statics without many of those assumptions. We still don’t have many good results for games of strategic substitutes, however; in these types of games, my best response is decreasing in your action, so if a parameter change causes you to increase your action, I will want to decrease my action. Further, there are many games where player actions are neither strategic complements nor strategic substitutes. A good example is a patent race. If you increase your effort, I may best respond by increasing my effort (in order to “stay close”) or by decreasing my effort (because I am now so far behind that I have no chance of inventing first).

Acemoglu and Jensen show that both types of games can be handled nicely if the games are “aggregative games”, where my payoff depends only on my own action and on an aggregate of all players’ actions rather than on the individual actions of other players. My optimal production in a Cournot oligopoly depends only on aggregate production by all firms, and my optimal effort in an independent Poisson patent race depends only on the cumulative hazard rate of invention. In such a case, if the game is one of strategic substitutes in the choice variable x, and the aggregate quantity is some increasing function of the sum of all x and t, then an increase in t leads to a decrease in equilibrium choices x, and entry by another player increases the aggregate quantity. If t(i) is an idiosyncratic exogenous variable that affects i’s choice, and only affects other players’ actions through the aggregate x, then an increase in t(i) increases x(i) and decreases x(j) for all other j. [When I say "increase" and "decrease" for the equilibrium quantities when there are multiple equilibria, I mean that the entire set of equilibrium quantities shifts up or down.]

For example, take a Cournot game, where my profits are s(i)*P(X+t)-c(i)(s(i),t(i)), where s(i) is my supply choice, P(X+t) is price given total industry supply plus a demand shifter t, and c(i) is my cost function for producing s(i) given some idiosyncratic cost shifter t(i), such that s(i) and t(i) obey the single crossing property. Assume P is twice differentiable. Then taking the cross-partial of the profit function in my own supply and any other player’s supply, we have that supplies are strategic substitutes iff P’+s(i)P”<0. Note that this condition does not depend on any assumption about the cost function c. Then using the theorem in the last paragraph, we have that, in a Cournot game with *any* cost function satisfying the above, a decrease in demand decreases total industry supply, an additional entrant decreases equilibrium supply from existing firms, and a decrease in costs for one firm increases that firm's equilibrium output and decreases the output of all other firms. All we assumed to get this was the strategic substitutes property, twice-differentiability of P, and the single crossing property of the cost function in s and t. As with many comparative statics results, checking strategic substitutability and making sure the signs of s and t are right for the interpretation of results is often easier than taking implicit derivatives, even when your assumptions are such that the implicit approach could conceivably work.
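
Here is a little numerical illustration of those three comparative statics; the linear demand, the convex quadratic costs, and the damped best-response iteration used as a solver are all my own choices, not anything from the paper:

```python
def best_response(i, s, costs, t):
    """Firm i's best reply to the others' supplies, with inverse demand P = 10 - t - X
    and cost c_i * q^2 (FOC: 10 - t - X_{-i} - 2q - 2*c_i*q = 0)."""
    x_others = sum(s) - s[i]
    return max(0.0, (10.0 - t - x_others) / (2.0 + 2.0 * costs[i]))

def cournot_equilibrium(costs, t=0.0, damping=0.3, iters=5000):
    """Damped simultaneous best-response iteration (a heuristic solver; with these
    convex costs and strategic substitutes it settles down quickly)."""
    s = [1.0] * len(costs)
    for _ in range(iters):
        br = [best_response(i, s, costs, t) for i in range(len(costs))]
        s = [(1 - damping) * si + damping * bi for si, bi in zip(s, br)]
    return s

base = cournot_equilibrium([0.5, 0.5, 0.5])
entry = cournot_equilibrium([0.5, 0.5, 0.5, 0.5])
demand_down = cournot_equilibrium([0.5, 0.5, 0.5], t=1.0)
cost_cut = cournot_equilibrium([0.2, 0.5, 0.5])

fmt = lambda s: "[" + ", ".join(f"{q:.3f}" for q in s) + f"]  total {sum(s):.3f}"
print("baseline:            ", fmt(base))
print("one more entrant:    ", fmt(entry))        # incumbents shrink, total rises
print("demand shifted down: ", fmt(demand_down))  # every firm and the total shrink
print("firm 1's cost falls: ", fmt(cost_cut))     # firm 1 expands, rivals shrink
```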

If the strategic substitutes property does not hold, as in a patent race, then compact, convex strategy sets, payoff functions pseudoconcave in each player’s own strategy, a boundary condition, and a “local solvability” condition are together sufficient to get the same results. Local solvability is roughly the condition which guarantees that the player’s own effect on the aggregate, when she reallocates after a change in t or t(i), does not alter the result from the previous paragraph enough to flip the comparative static’s sign.

April 2011 Working Paper, as yet unpublished (IDEAS version).

“Evolutionary Dynamics and Backward Induction,” S. Hart (2002)

Let’s follow up yesterday’s post with an older paper on evolutionary games by the always-lucid Sergiu Hart. As noted in the last post, there are many evolutionary dynamics for which the rest points of the evolutionary game played by completely myopic agents and the Nash equilibria of the equivalent static game played by strategic agents coincide, which is really quite phenomenal (and since you know there are payoff-suboptimal Nash equilibria, results of this kind have, since Maynard Smith (1973), fundamentally changed our understanding of biology). Nash equilibrium is a low bar, however. Since Kuhn (1953), we have also known that every finite game of perfect information has a backward induction equilibrium, what we now call the subgame perfect equilibrium, in pure strategies. When does the invariant limit distribution of an extensive form evolutionary game coincide with the backward induction equilibrium? (A quick mathematical note: an evolutionary system with mutation, allowing any strategy to “mutate” onto some agent with some probability in each state, means that by pure luck the system can move from any state to any other state. We also allow evolutionary systems to have selection, meaning that with some probability in each state an agent switches from his current strategy to one with a higher payoff. This process defines a Markov chain, and since the game is finite and the mutations allow us to reach any state, it is a finite irreducible Markov chain. Such Markov chains have a unique invariant distribution.)

In general, we can have limit distributions of evolutionary processes that are not the backward induction equilibrium. Consider the following three-stage game. Agent 1 chooses C or B; then, if C was chosen, agent 2 (in the agent-normal form) chooses C2 or B2; then, if C and B2 were chosen, agent 3 chooses C3 or B3. The payoff vector across the three agents when B is chosen is (4,0,0), when C and C2 are chosen is (5,9,0), when C, B2 and C3 are chosen is (0,0,0), and when C, B2 and B3 are chosen is (0,10,1). You can see that (C,C2,C3) and (B,B2,B3) are both Nash, but only (B,B2,B3) is subgame perfect, and hence the backward induction equilibrium. Is (B,B2,B3) the limit distribution of the evolutionary game? In the backward induction equilibrium, agent 1 chooses B at the first node, and hence nodes 2 and 3 are never reached, meaning only mutation, and not selection, affects the distribution of strategies at those nodes. Since the Markov chain is ergodic, with probability 1 the proportion of agents at node 2 playing B2 will eventually fall below .2; when that happens, selection at node 1 will push agents toward C instead of B. When this happens, both nodes 2 and 3 are now reached with positive probability. If fewer than .9 of the agents at node 3 are playing B3, then selection will push agents at node 2 toward C2. Selection can therefore push the percentage of agents playing B2 down to 0, and hence (C,C2,C3) can be part of the limit invariant distribution even though it is not the backward induction solution.
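
The two thresholds in that example (.2 at node 2, .9 at node 3) come straight out of the payoffs; here is the arithmetic as a tiny script, with the population fractions at each node treated as given (the specific fractions I plug in are just for illustration, and payoffs are conditional on the node being reached):

```python
def node1_payoffs(frac_C2):
    """Expected payoff to a node-1 agent from C vs B, given the fraction of the
    node-2 population playing C2 (C pays 5 only if node 2 follows with C2)."""
    return 5 * frac_C2, 4

def node2_payoffs(frac_B3):
    """Expected payoff to a node-2 agent from C2 vs B2, given the fraction of the
    node-3 population playing B3 (B2 pays 10 only if node 3 follows with B3)."""
    return 9, 10 * frac_B3

for q in (0.75, 0.8, 0.85):
    c, b = node1_payoffs(q)
    print(f"fraction playing C2 = {q}: node-1 payoff from C = {c}, from B = {b}")
for r in (0.85, 0.9, 0.95):
    c2, b2 = node2_payoffs(r)
    print(f"fraction playing B3 = {r}: node-2 payoff from C2 = {c2}, from B2 = {b2}")
# Selection favors C at node 1 once fewer than 20% of node-2 agents play B2, and
# favors C2 at node 2 once fewer than 90% of node-3 agents play B3 -- exactly the
# thresholds used in the example above.
```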

So is backward induction unjustifiable from an evolutionary perspective? No! Hart shows that if the number of agents goes to infinity as the probability of mutation goes to zero, then the backward induction solution, when it is unique, is also the only element in the limit invariant distribution of the evolutionary game. How does letting the number of agents go to infinity help? Let Bi be an element of the backward induction equilibrium at node i somewhere in the game tree. Bi must be a best reply in the subgame beginning with i if Bj is played in all descendant nodes by a sufficiently high proportion of the population, so if Bi is not a best reply (and hence selection does not push us toward Bi) it must be that Bj is not being played further down the game tree. If Bi is a best reply in the subgame beginning with i, then most of the population will play Bi because of selection pressures.

Now here’s the trick. Consider the problematic case in the example, when node i is not being reached in a hypothesized limit distribution (if i is reached, then since the probability of mutation goes to zero, selection is much stronger than mutation, and hence non best replies will go away in the limit). Imagine that there is another node g preceding i which is also not reached, and that i is only reached when some strategy outside the backward induction equilibrium is played in g. When g and i are not reached, there is no selection pressure, and hence no reason that the backward induction equilibrium node will be played. Large populations help here. With some small probability, an agent in g mutates such that he plays the node which reaches i. This still has no effect unless there is also a mutation in the node before g that causes g to be reached. The larger the population, the lower the probability the specific individual who mutated in g mutates back before any individual in the node before g mutates. Hence larger populations make it more likely that rare mutations in unreached nodes will “coordinate” in the way needed for selection to take over.

Final GEB version (IDEAS version). Big up to Sergiu for posting final pdfs of all of his papers on his personal website.

“Survival of Dominated Strategies Under Evolutionary Dynamics,” J. Hofbauer & W. Sandholm (2011)

There is a really interesting tension in a lot of economic rhetoric. On the one hand, we have results that derive from optimal behavior by agents with rational foresight: “price equals marginal cost in competitive markets because of profit-maximizing behavior” or “Policy A improves welfare in a dynamic general equilibrium setting with utility-maximizers”. Alternatively, though, we have explanations that rely on dynamic consequences to even non-maximizing agents: “price equals marginal cost in competitive markets because firms who price above MC are driven out by competition” or “Policy A improves welfare in a dynamic general equilibrium, and the dynamic equilibrium is sensible because firms adjust myopically as if in a tatonnement process.”

These two types of explanation, without further proof, are not necessarily the same. Profit-maximizing firms versus firms disciplined by competition give completely different welfare results under monopoly, since the non-profit-maximizing monopolist can be very wasteful and yet still make positive profits. In a dynamic context, firms adjusting myopically to excess demand in some markets, rather than profit-maximizing according to rational expectations, will not necessarily converge to equilibrium (a friend mentioned that Lucas made precisely this point in a paper in the 1970s).

How can we square the circle? At least in static games, there has been a lot of work here. Nash and other strategic equilibrium concepts are well known. There is also a branch of game theory going back to the 1950s, evolutionary games, where rather than agents choosing strategically, a probability vector lists what portion of the players are playing each strategy at a given time, with payoffs determined accordingly. A revision rule, perhaps stochastic to allow for “mutations” as in biology, then tells us how the vector of strategies updates conditional on payoffs in the previous round. Fudenberg and Kreps’ learning model from the 1980s is a special case.

Amazingly, it is true for almost all sensible revision rules that the set of rest points of the dynamic includes every Nash equilibrium of the underlying static game, and further that for many revision rules the dynamic rest points are exactly equivalent to the set of Nash equilibria. We have one problem, however: dynamic systems needn’t converge to points at all, but rather may converge to cycles or other outcomes.
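
The non-convergence problem is easy to see in the most classical example: under the replicator dynamic, the unique (mixed) Nash equilibrium of standard Rock-Paper-Scissors is a rest point, but interior trajectories orbit around it forever rather than converging. A minimal Euler-discretized sketch (payoff matrix, starting point, and step size are my choices):

```python
# Replicator dynamic on standard Rock-Paper-Scissors (win 1, lose -1, tie 0).
A = [[0, -1, 1],
     [1, 0, -1],
     [-1, 1, 0]]

def replicator_step(x, dt):
    f = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]   # fitness (Ax)_i
    avg = sum(x[i] * f[i] for i in range(3))                        # mean fitness
    return [x[i] + dt * x[i] * (f[i] - avg) for i in range(3)]

x, dt = [0.6, 0.25, 0.15], 0.001
for t in range(30001):
    if t % 10000 == 0:
        print(f"t={t * dt:5.1f}  x={tuple(round(v, 3) for v in x)}  "
              f"x1*x2*x3={x[0] * x[1] * x[2]:.4f}")
    x = replicator_step(x, dt)
# The equilibrium (1/3, 1/3, 1/3) is the interior rest point, but the trajectory
# keeps circling it: x1*x2*x3 is exactly conserved by the continuous-time dynamic
# (any small drift in the printout is Euler discretization error).
```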

Hofbauer and Sandholm – Sandholm being both a graduate of my institution and probably the top economist in the world today on evolutionary games – show that for any revision rule satisfying a handful of common properties, we can construct a game in which strictly dominated strategies survive, being played by a nonvanishing fraction of the population forever. This includes any dynamic meeting the following four properties: the population law of motion is continuous in payoffs and the current population vector, there is positive correlation between strategy growth rates and current payoffs, the dynamic is at rest iff the strategy vector is a Nash equilibrium of the underlying static game, and if an unplayed strategy has sufficiently high rewards, then with positive probability some agents begin using it. These criteria are satisfied by “excess payoff dynamics” like BNN, where strategies with higher than average payoffs have higher than average growth rates, and by “pairwise comparison dynamics”, where agents switch with positive probability to strategies which have higher payoffs than their own current payoff. The myopic best response dynamic is not continuous, and indeed, best response dynamics have been shown to eliminate strictly dominated strategies.

The proof involves a quite difficult topological construction which I won’t discuss here, but it’s worth discussing the consequences of this result. In strategic situations where we think agents may lack full rationality or rational foresight, and where we observe cycles or other non-rest behavior over time, we should be hesitant to ignore strictly dominated actions (particularly ones that are only dominated by a small amount) in our analysis of the situation. There is also scope for policy improvements: if agents are learning using a dynamic which does not rule out strictly dominated strategies, we may be able to provide information which induces an alternative dynamic that does rule out such strategies.

Final version in Theoretical Economics 6 (IDEAS version). Another big thumbs up to the journal Theoretical Economics, the Lake Wobegon of econ, where the submissions are free, the turnaround time is fast, and all the articles are totally ungated.

“The Nash Bargaining Solution in Economic Modeling,” K. Binmore, A. Rubinstein & A. Wolinsky (1986)

If we form a joint venture, our two firms will jointly earn a profit of N dollars. If our two countries agree to this costly treaty, total world welfare will increase by the equivalent of N dollars. How should we split the profit in the joint venture case, or the costs in the case of the treaty? There are two main ways of thinking about this problem: the static axiomatic approach developed first by John Nash, and bargaining outcomes that arise as the perfect equilibrium of an explicit strategic game, a field that Rubinstein (1982) really opened up.

The Nash solution says the following. Let us have some pie of size 1 to divide. Let each of us have a threat point, S1 and S2. Then if certain axioms are satisfied (symmetry, invariance to positive affine transformations of the utility functions, Pareto optimality and something called the IIA condition), the bargain is the one that maximizes (u1(p)-u1(S1))*(u2(1-p)-u2(S2)), where p is the share of the pie of size 1 that accrues to player 1. So if we both have linear utility, player 1 can leave and collect .3, player 2 can leave and collect 0, and a total of 1 is earned by our joint venture, then the Nash bargaining solution is the p that maximizes (p-.3)*(1-p-0); that is, p=.65. This is pretty intuitive: 1-.3-0=.7 of surplus is generated by the joint venture, and we each get our outside option plus half of that surplus.

The static outcome is not very compelling, however, as Tom Schelling long ago pointed out. In particular, the outside option looks like a noncredible threat: If player 2 refused to offer player 1 more than .31, then Player 1 would accept given his outside option is only .3. That is, in a one-shot bargaining game, any p between .3 and 1 looks like an equilibrium. It is also not totally clear how we should interpret the utility functions u1 and u2, and the threat points S1 and S2.

Rubinstein bargaining began to fix this. Let players make offers back and forth, with a time period D between each offer and with payoffs discounted as time passes; if no agreement is ever reached, we both get our outside options. Under some pretty mild conditions on preferences, there is a unique perfect equilibrium whereby player 1 gets p* if he makes the first offer, and p** if player 2 makes the first offer. Roughly, if the time between offers is D, player 1 must offer player 2 a high enough share that player 2 is indifferent between that share today and the amount he could earn when he makes an offer in the next period. Note that the outside options do not come into play unless, say, player 1’s outside option is higher than min{p*,p**}. Note also that as D goes to 0, all of the difference in bargaining power has to do with who is more patient. Binmore et al modify this game so that, instead of discounting the future, there is a small chance that the gains from negotiation will disappear (“breakdown”) in between every period; for instance, we may want to form a joint venture to invent some product, but while we negotiate, another firm may swoop in and invent it. It turns out that this model, with von Neumann-Morgenstern utility functions for each player (though perhaps differing levels of risk aversion), is a special case of Rubinstein bargaining.

Binmore et al prove that as D goes to zero, both strategic cases above have unique perfect equilibria equal to a Nash bargaining solution. But a Nash solution for what utility functions and threat points? The Rubinstein game limits to Nash bargaining where the difference in utilities has to do with time preference, and the threat points S1 and S2 are equal to zero. The breakdown game limits to Nash bargaining where the difference in utilities has to do with risk aversion, and the threat points S1 and S2 are equal to whatever utility we would get from the world after breakdown.

Two important points: first, it was well known that a concave transformation of a player’s utility function leads to a worse Nash bargaining outcome for that player. But we know from the previous paragraph that, in the Rubinstein interpretation, such a concave transformation corresponds to a more impatient bargainer: a concave transformation of the utilities in the Nash outcome has to do with changing the patience, not the risk aversion, of players. Second, Schelling was right when he argued that the Nash threat points involve noncredible threats. As long as players prefer their Rubinstein equilibrium outcome to their outside option, the outside option does not matter for the bargaining outcome. Take the example above where one player could leave the joint venture and still earn .3. The limit of Rubinstein bargaining between equally patient players is for each to earn .5 from the joint venture, not .65 and .35. The fact that one player could leave the joint venture and still earn .3 is totally inconsequential to the negotiation, since the other player knows that this threat is not credible whenever the first player could earn at least .31 by staying. This point is often wildly misunderstood when people apply Nash bargaining solutions: properly defining the threat point matters!
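
Both claims are easy to check numerically; here is a short sketch (the common discount rate, the coarse grid, and the use of the textbook proposer-share formula 1/(1+d) for equally patient Rubinstein bargainers are my own simplifications):

```python
import math

# 1) Static Nash bargaining with linear utilities and threat point (.3, 0):
#    maximize (p - .3) * (1 - p) over p in [.3, 1].
best_p = max((i / 10000 for i in range(3000, 10001)),
             key=lambda p: (p - 0.3) * (1 - p))
print("Nash bargaining share for player 1:", round(best_p, 3))   # 0.65

# 2) Rubinstein alternating offers with common discount rate r and offer lag D:
#    the proposer's share is (1 - d) / (1 - d**2) = 1 / (1 + d), with d = exp(-r * D).
r = 1.0
for D in (1.0, 0.1, 0.01, 0.001):
    d = math.exp(-r * D)
    print(f"offer lag D={D:>5}: proposer gets {1 / (1 + d):.4f}")
# As D -> 0 the split goes to (.5, .5); an outside option worth .3 < .5 never
# binds, so it does not move the split to (.65, .35).
```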

Final RAND version (IDEAS). There has been substantial work since the 80s on the problem of bargaining, particularly in trying to construct models where delay is generated, since Rubinstein guarantees immediate agreement while real-world bargaining rarely ends in one step; unsurprisingly, these newer papers tend to rely on asymmetric information and on some fairly involved technical machinery.
