
Operations Research and the Rise of Applied Game Theory – A Nobel for Milgrom and Wilson

Today’s Nobel Prize to Paul Milgrom and Robert Wilson is the capstone of an incredibly fruitful research line which began in the 1970s in a few small departments of Operations Research. Game theory, or the mathematical study of strategic interaction, dates back to work by Zermelo, Borel and von Neumann in the early 20th century. The famed book by von Neumann and Morgenstern was published in 1944, and widely reviewed as one of the most important social scientific works of the century. And yet, it would be three decades before applications of game theory revolutionized antitrust, organizational policy, political theory, trade, finance, and more. Alongside the “credibility revolution” of causal econometrics, and to a lesser extent behavioral economics, applied game theory has been the most important development in economics in the past half century. The prize to Milgrom and Wilson is likely the final one that will be given for early applied game theory, joining those in 1994, 2005, 2007, 2014 and 2016 that elevated the abstractions of the 1940s to today’s rigorous interpretations of so many previously disparate economic fields.

Neither Wilson nor Milgrom was trained in a pure economics department. Wilson came out of the decision sciences program of Howard Raiffa at Harvard, and Milgrom was a student of Wilson’s at Stanford Business School. However, the link between operations research and economics is a long one, with the former field often serving as a vector for new mathematical tools before the latter was quite ready to accept them. In the middle of the century, the mathematics of optimal control and dynamic programming – how to solve problems where today’s action affects tomorrow’s possibilities – were applied to resource allocation by Kantorovich in the Soviet Union and to market economics problems in the West by Koopmans, Samuelson, Solow, and Dorfman. Luce and Raiffa explained how the theory of games and the ideas of Bayesian decision theory apply to social scientific problems. Stan Reiter’s group, first at Purdue, then later with Nancy Schwartz at Kellogg MEDS, formally brought operations researchers and economists into the same department to apply these new mathematics to economic problems.

The real breakthrough, however, was the arrival of Bayesian games and subgame perfection from Harsanyi (1967-68) and Selten (1965, 1975). These tools in combination allow us to study settings where players signal, make strategic moves, bluff, attempt to deter, and so on. From the perspective of an institutional designer, they allow us, alongside Myerson’s revelation principle, to follow Hayek’s ideas formally and investigate how we should organize an economic activity given the differing information and possible actions of each player. Indeed, the Wilson Doctrine argues that practical application of game theory requires attention to these informational features. There remains a more complete intellectual history to be written here, but Paul Milgrom and Al Roth’s mutual interview in the JEP provides a great sense of the intellectual milieu of the 1970s as they developed their ideas. Wilson, the Teacher, and Milgrom, the Popularizer, were at the heart of showing just how widely these new tools in game theory could be applied.

Let us begin with the Popularizer. Milgrom was born and raised in Michigan, taking part in anti-war and anti-poverty protests as a radical student in Ann Arbor in the late 1960s. The 1960s were a strange time, and so Milgrom went straight from the world of student activism to the equally radical world of…working as an actuary for an insurance company. After enrolling in the MBA program at Stanford in the mid-1970s, he was invited to pursue a PhD under his co-laureate Robert Wilson, who, as we shall see, was pursuing an incredibly fruitful combination of operations research and economics with his students. It is hard to overstate how broad Milgrom’s contributions have been, both theoretically and in practice. But we can get a good taste by looking at four: the multitasking problem and the no-trade theorem on the theoretical side, and medieval guilds and modern spectrum auctions on the applied side.

It is perhaps surprising that Milgrom’s most-cited paper was published in the JLEO, well into his career. But the famed multitasking paper is so incredibly informative. The idea is simple: you can motivate someone either with direct rewards or by changing their opportunity cost. For instance, if you want a policeman to walk the beat more often, then make their office particularly dull and full of paperwork. Workers generally have many tasks they can work on, however, which vary in their relative costs. For example, a cop can slack off, arrest people on nonsense charges, or solve murders. Making their office dull will cause them to sit at their office desk for fewer hours, but it likely won’t cause them to solve murders rather than arrest people on nonsense charges. Why not just pay for the solved murders directly? Often it is impossible to measure or observe everything you want done.

If you “reward A while hoping for B”, as Steven Kerr’s famous management paper puts it, you are likely to get a lot of A. If you pay rewards for total arrests, your cops will cite people for riding bikes with no lights. So what can be done? Milgrom and Holmstrom give a simple model where workers exert effort, do some things you can measure and some which you cannot, and you get a payoff depending on both. If a job has some things you care about which are hard to measure, you should use weaker incentives on the things you can measure: by paying cops for arrests, you raise the opportunity cost of solving murders for the cops who like doing so, because every time they work a murder they give up the reward they would have gotten from arresting the bicyclists! Further, you should give workers on hard-to-measure tasks little job flexibility. The murder cop paid on salary should need to show her face in the office, while the meter maid getting paid based on how many tickets she gives already has a good reason not to shirk while on the clock. Once you start thinking about multitasking and the interaction of incentives with opportunity costs, you start seeing perverse incentives absolutely everywhere.
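To see the mechanics, here is a minimal numerical sketch of the multitasking logic; the functional forms and parameters are my own illustration, not those in the Holmstrom-Milgrom paper. Task 1 (“arrests”) is measurable and paid a piece rate b, task 2 (“solving murders”) is unmeasurable but intrinsically enjoyable to the agent, and efforts substitute through a convex cost of total effort:

```python
import numpy as np

# Illustrative multitasking sketch (assumed parameters, not from the paper).
# Task 1 ("arrests") is measured and paid piece rate b; task 2 ("murders")
# is unmeasured but gives the agent intrinsic benefit a per unit of effort.
# Efforts substitute: cost depends only on total effort e1 + e2.

a = 1.0             # agent's intrinsic marginal benefit from the unmeasured task
v1, v2 = 2.0, 3.0   # principal's marginal value of each task

def agent_efforts(b, grid=np.linspace(0, 3, 301)):
    """Agent picks (e1, e2) to maximize b*e1 + a*e2 - 0.5*(e1 + e2)**2."""
    best, best_u = (0.0, 0.0), -np.inf
    for e1 in grid:
        for e2 in grid:
            u = b * e1 + a * e2 - 0.5 * (e1 + e2) ** 2
            if u > best_u:
                best_u, best = u, (e1, e2)
    return best

for b in [0.0, 0.5, 1.5]:
    e1, e2 = agent_efforts(b)
    print(f"b={b:.1f}: arrests effort={e1:.2f}, murder effort={e2:.2f}, "
          f"principal payoff={v1*e1 + v2*e2 - b*e1:.2f}")
# Once b exceeds a, all effort tips into the measured task: paying for
# arrests crowds out murder-solving, so weak incentives (b=0) win here.
```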

Milgrom’s writings on incentives within organizations are without a doubt the literature I draw on most heavily when teaching strategic management. It is a shame that the textbook written alongside John Roberts never caught on. For a taste of their basic view of management, check out “The Firm as an Incentive System”, which lays out formal incentives, asset ownership, and task assignments as a system of complements which make organizations function well. The field now known as organizational economics has grown to incorporate ideas like information transmission (Garicano 2000 JPE) and the link between relational contracts and firm culture (e.g., Gibbons and Henderson 2011). Yet there remain many open questions about why firms are organized the way they are, ready for an enterprising graduate student with a good theoretical background.

Multitasking has a similar feel to many of Milgrom’s great papers: they provide a framework that improves our intuition about some effect in the world, rather than just showing a mathematical curiosity. The same is true of his most famous finance paper, the “no-trade theorem” developed with Nancy Stokey. The idea is ex-post obvious but ex-ante incredibly surprising. Imagine that in the market for corn, there is free exchange, and all trades anyone wants to make (to mitigate risk, for use, to try to trade on private information, etc.) have been made. A farmer one day notices a blight on his crop, and suspects this blight is widespread in the region. Therefore, the supply of corn will fall. Can he profit from this insight? Milgrom-Stokey’s answer is no!

How could this be? Even if everyone had identical prior beliefs about corn supply, conditional on getting this information, the farmer definitely has a higher posterior belief about the corn price come harvest season than everyone else. However, we assumed that before the farmer saw the blight, all gains from trade had been exhausted, and that it was common knowledge that this was so. The farmer offering to buy corn at a higher price is therefore informative that the farmer has learned something. If the prevailing price was $5/bushel, and the farmer offers you $7, then you know that he has received private information that the corn will be worth even more than $7, hence you should not sell him any. Now, of course there is trade on information all the time; indeed, there are huge sums spent collecting information so that it can be traded on! However, Milgrom-Stokey makes clear just how careful we have to be about what causes the “common knowledge that all gains from trade were exhausted” assumption to fail. Models with “noise” traders, or models with heterogeneous prior beliefs (a very subtle philosophical issue), have built on Milgrom-Stokey to understand everything from asset bubbles to the collapse in trade in mortgage-backed securities in 2008.
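The logic is easy to verify numerically. Here is a minimal sketch with an assumed uniform prior over the value of corn: the informed farmer offers to buy only in states where his information says the corn is worth more than the price, so conditional on receiving the offer, the uninformed party should walk away.

```python
import numpy as np

# Minimal no-trade sketch (assumed U[0,10] prior, not from the paper).
# The informed farmer offers to buy at price p only when his private
# information says the corn is worth more: v > p.
rng = np.random.default_rng(0)
v = rng.uniform(0, 10, 1_000_000)   # true value per bushel under the prior

for p in [5.0, 7.0, 9.0]:
    offered = v > p                 # states of the world where the offer arrives
    e_value = v[offered].mean()     # what the corn is worth given the offer
    print(f"offer at p={p}: E[value | offer] = {e_value:.2f}, "
          f"seller's loss from accepting = {e_value - p:.2f}")
# Conditional on the offer arriving, the corn is worth more than the price,
# so the uninformed party refuses and no trade occurs.
```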

When it comes to practical application, Milgrom’s work on auctions is well-known, and formed the basis of his Nobel citation. How did auctions become so “practical”? There is no question that the rise of applied auction theory, with the economist as designer, has its roots in the privatization wave of the 1990s that followed the end of the Cold War. Governments held valuable assets: water rights, resource tracts, spectrum that was proving important for new technologies like the cell phone. Who was to be given these assets, and at what price? Milgrom’s 1995 Churchill lectures formed the basis for a book, “Putting Auction Theory to Work”, which is now essential reading, alongside Klemperer’s “Auctions: Theory and Practice”, for theorists and practitioners alike. Where it is unique is in its focus on the practical details of running auctions.

This focus is no surprise. Milgrom’s most famous theoretical work is his 1982 Econometrica with Robert Weber on optimal auctions which are partly common-value and partly private-value. That is, consider selling a house, where some of the value is your idiosyncratic taste, and some of the value is whether the house has mold. Milgrom and Weber show a seller should reduce uncertainty as much as possible about the “common” part of the value. If the seller does not know this information or can’t credibly communicate it, then unlike in auctions without that common component, it matters a lot how you run the auction. For instance, with a first-price auction, you may bid low even though you like the house because you worry about winning when other bidders noticed the mold and you didn’t. In a second-price auction, the price you pay incorporates in part that information from others, hence leads to more revenue for the homeseller.

In practical auctions more broadly, complements across multiple goods being sold separately, private information about common components, the potential to collude or form bidder rings, and the regularity with which auctions are held (and hence the number of expected bidders) are all incredibly important to auctioneer revenue and efficient allocation of the object being sold. I omit further details of precisely what Milgrom did in the many auctions he consulted on, as the popular press will cover this aspect of his work well, but it is no exaggeration to say that the social value of better allocation of things like wireless spectrum is on the order of tens of billions of dollars.

One may wonder why we care about auctions at all. Why not just assign the item to whoever we wish, and then let the free market settle things such that the person with the highest willingness-to-pay winds up with the good? It seems natural to think that how the good is allocated matters for how much revenue the government earns – selling the object is better on this count than giving it away – but it turns out that the free market will not in general allocate goods efficiently when sellers and buyers are uncertain about who is willing to pay how much for a given object.

For instance, imagine you own a car, and you think I am willing to pay somewhere between $10,000 and $20,000 to buy it from you. I think you are willing to give up the car for somewhere between $5,000 and $15,000. I know my own valuation, so let’s consider the case where I am willing to pay exactly $10,000. If you are willing to sell for $8,000, it seems reasonable that we can strike a deal. Not necessarily: since all you know is that I am willing to pay somewhere between $10,000 and $20,000, you do know you can always get a $2,000 profit by selling at $10,000, but you also know it’s incredibly unlikely that I will say no if you charge $10,001, or $11,000, or even more. You therefore will be hesitant to strike the deal to sell for $10,000 flat. This essential tension is the famed Myerson-Satterthwaite Theorem, and it occurs precisely because the buyer and seller do not know each other’s value for the object. A government auctioning off an object initially, however, can do so efficiently in a much wider set of contexts (see Maskin 2004 JEL for details). The details of auction design cannot be fixed merely by letting the market sort things out ex-post: the post-Cold War asset sales had issues not just of equity, but also of efficiency. Since auctions today are used to allocate everything from the right to extract water to the carbon emissions permits at the heart of global climate change policy, ensuring we get their design right is not just a minor theoretical concern!
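A quick simulation makes the tension concrete. To be clear, this is only a posted-price sketch of the intuition, not the Myerson-Satterthwaite theorem itself (which covers every conceivable mechanism), using the value distributions from the car example above:

```python
import numpy as np

# Posted-price sketch of the Myerson-Satterthwaite tension (illustrative,
# not the theorem itself). Buyer values ~ U[10k, 20k]; seller values
# ~ U[5k, 15k]; the seller makes a take-it-or-leave-it offer.
rng = np.random.default_rng(1)
n = 1_000_000
buyer = rng.uniform(10_000, 20_000, n)
seller = rng.uniform(5_000, 15_000, n)

# Given cost c, the seller maximizes (p - c) * P(buyer >= p); with buyer
# values U[10k, 20k] the first-order condition gives p* = (20_000 + c) / 2.
p_star = (20_000 + seller) / 2

trade = buyer >= p_star              # trades that actually happen: ~50%
efficient = buyer >= seller          # trades that should happen: ~87.5%
print(f"efficient trades:               {efficient.mean():.1%}")
print(f"trades at optimal posted price: {trade.mean():.1%}")
print(f"efficient trades missed:        {(efficient & ~trade).mean():.1%}")
```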

The problem of auction design today is, partly because of Milgrom’s efforts, equally prominent in computer science. Many allocation problems are computational, with players being algorithms. This is true of electricity markets in practice, as well as the allocation of online advertisements, the design of blockchain-like mechanisms for decentralized exchange and record-keeping, and methods for preventing denial of service attacks while permitting legitimate access to internet-connected servers. Even when humans remain in the loop to some extent, we need to guarantee not just an efficient algorithm, but a practically-computable equilibrium. Leyton-Brown, Milgrom and Segal discuss this in the context of a recent spectrum auction. The problem of computability turns out to be an old one: Robert Wilson’s early work was on precisely the problem of computing equilibria. Nonetheless, given its importance in algorithmic implementation of mechanisms, it would be no surprise to see many important results in applied game theory come from computer scientists and not just economists and mathematicians in coming years. This pattern of techniques flowing from their originating field to the one where they have important new applications looks a lot like the trail of applied game theory arriving in economics by way of operations research, does it not?

That deep results in game theory can inform the real world goes beyond cases like auctions, where the economic setting is easy to understand. Consider the case of the long distance trade in the Middle Ages. The fundamental problem is that of the Yuan dynasty folk song: when “heaven is high and the emperor is far away”, what stops the distant city you arrive in from confiscatory taxation, or outright theft, of your goods? Perhaps the threat that you won’t return to trade? This is not enough – you may not return, but other traders will be told, “we had to take the goods from the last guy because he broke some rules, but of course we will treat you fairly!” It was quite common for confiscation to be targeted only at one group – the Genoans in Constantinople, the Jews in Sicily – with all other traders being treated fairly.

The theory of repeated games can help explain what to do. It is easiest to reach efficiency when you punish not only the cheaters, but also those who do not themselves punish cheaters. That is, the Genoans need to punish not just the Turks by withdrawing business, but also the Saracens who would try to make up the trade after the Genoans pull out. The mechanism to do so is a merchant guild, a monopoly which can enforce boycotts in distant cities by taking away a given merchant’s rights in their own city. Greif, Milgrom and Weingast suggest that because merchant guilds allow cities to credibly commit to avoid confiscation, they benefit the cities themselves by increasing the amount of trade. This explains why cities encouraged the formation of guilds – you do not normally encourage your sellers to form a monopsony!
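A back-of-the-envelope calculation shows why the multilateral threat has teeth where the bilateral one does not. The numbers below are made up purely for illustration; the logic, not the calibration, is Greif, Milgrom and Weingast’s:

```python
# Back-of-envelope sketch of the guild logic (made-up numbers). The city
# gains G once by confiscating one merchant group's goods, then loses a
# per-period trade surplus from whoever boycotts, discounted by delta.
delta = 0.8          # city's discount factor
G = 50.0             # one-time gain from confiscation
pi_group = 4.0       # per-period trade surplus each merchant group generates
n_groups = 4         # merchant groups trading in the city

def pv(per_period):  # present value of a perpetual per-period loss
    return per_period * delta / (1 - delta)

print(f"one-time gain from cheating:      {G:.0f}")
print(f"loss if only the victims boycott: {pv(pi_group):.0f}")             # 16 < 50
print(f"loss under a guild-wide boycott:  {pv(pi_group * n_groups):.0f}")  # 64 > 50
# Bilateral punishment cannot deter the city -- other traders fill the gap --
# but a guild that also punishes non-punishers makes confiscation unprofitable.
```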

Enough on the student – let us turn to Milgrom’s advisor, Robert Wilson. Wilson was born in the tiny hamlet of Geneva, Nebraska. As discussed above, his doctoral training at Harvard was from Howard Raiffa and the decision theorists, after which he was hired at Stanford, where he has spent his career. As Milgrom is now also back at Stanford, their paths are so intertwined that the two men now quite literally live on the same street.

Wilson is most famous for his early work applying the advances of game theory in the 1970s to questions in auction design and reputation. His three-page paper, written in 1966 and published in Management Science in 1969, gives an early application of Harsanyi’s theory of Bayesian games to the “winner’s curse”. The winner’s curse means that the winner of an auction for a good with a “common value” – for instance, a tract of land that either has oil or does not – must optimally bid less in a first-price auction than what they believe that good to be worth, or else lose money on average.

One benefit of being an operations researcher is that there is a tight link in that field between academia and industry. Wilson consulted with the Department of the Interior on oil licenses, and with private oil companies on how they bid in these auctions. What he noticed was that managers often shaded down their engineers’ best estimates of the value of an oil tract. The reason why is, as the paper shows, very straightforward. Assume we both get a signal uniformly distributed on [x-1,x+1] about the value of the tract, where x is the true value. Unconditionally, my best estimate of the value of the plot is exactly my signal. However, conditional on winning the auction, my signal was higher than my rival’s. Therefore, if I knew my rival’s signal, I would have bid exactly halfway between the two. Of course, I don’t know her signal. But since my payoff is 0 if I don’t win, and my payoff is my bid minus x if I win, there is a straightforward formula, which depends on the distribution of the signals, for how much I should shade my bid. Many teachers have drawn on Bob’s famous example of the winner’s curse by auctioning off a jar of coins in class, the winner inevitably being the poor student who doesn’t realize they should have shaded their bid!
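The curse shows up in a few lines of Monte Carlo. The signal structure is the one from the text; the two-bidder setup and the naive strategy of bidding your raw estimate are my added assumptions:

```python
import numpy as np

# Winner's curse sketch: true value x, two bidders with signals i.i.d.
# uniform on [x-1, x+1], and the naive strategy of bidding your own signal.
rng = np.random.default_rng(42)
n, x = 1_000_000, 10.0
s1 = x + rng.uniform(-1, 1, n)    # my signal
s2 = x + rng.uniform(-1, 1, n)    # rival's signal
win = s1 > s2                     # first-price auction: high bid wins

print(f"E[x - s1]         = {(x - s1).mean():+.3f}")       # ~0: unbiased
print(f"E[x - s1 | I win] = {(x - s1[win]).mean():+.3f}")  # ~-1/3: the curse
# My estimate is fine unconditionally, but conditional on winning it is the
# larger of two draws and overstates the value; optimal bids shade down.
```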

Wilson not only applied these new game theoretic tools, but also developed many of them. This is particularly true in 1982, when he published all three of his most cited papers: a resolution of the “Chain store paradox”, the idea of sequential equilibria, and the “Gang of Four” reputation paper with Kreps, Roberts, and Milgrom. To understand these, we need to understand the problem of non-credible threats.

The chain store paradox goes like this. Consider Walmart facing a sequence of potential competitors. If they stay out, Walmart earns monopoly profits in the town. If they enter, Walmart can either fight (in which case both make losses) or accept the entry (in which case they both earn duopoly profits, lower than what Walmart made as a monopolist). It seems intuitive that Walmart should fight a few early potential competitors to develop a reputation for toughness. Once they’ve done it, no one will enter. But if you think through the subgame perfect equilibrium here, the last firm who could enter knows that after they enter, Walmart is better off accepting the entry. Hence the second-to-last firm reasons that Walmart won’t benefit from establishing a reputation for deterrence, and hence won’t fight it. And likewise for the third-to-last entrant and so on up the line: Walmart never fights because it can’t “credibly” threaten to fight future entrants regardless of what it did in the past.
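The unraveling argument is mechanical enough to write down directly. Here is a minimal backward-induction sketch with assumed stage payoffs (monopoly 2, duopoly 1, fighting -1, entrant’s outside option 0):

```python
# Backward-induction sketch of the chain store paradox (assumed payoffs:
# entrant stays out -> incumbent gets 2; enter + accept -> 1; enter + fight -> -1).
T = 10  # number of towns / potential entrants

def incumbent_value(t):
    """Incumbent's subgame-perfect payoff from stage t onward."""
    if t > T:
        return 0
    # The continuation is the same whether the incumbent fights or accepts
    # today -- future entrants solve their own subgames -- so only stage
    # payoffs matter: Accept (1) beats Fight (-1), the threat isn't credible,
    # and anticipating Accept, the entrant enters (1 > 0 from staying out).
    return 1 + incumbent_value(t + 1)

print(incumbent_value(1))   # 10: every entrant enters, Walmart never fights,
                            # and the monopoly payoff 2*T is never reached
```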

This seems odd. Kreps and Wilson (JET 1982) make an important contribution to reputation building by assuming there are two types of Walmart CEOs: a tough one who enjoys fighting, and a weak one with the normal payoffs above. Competitors don’t know which Walmart they are facing. If there is even a small chance the rivals think Walmart is tough, then even the weak Walmart may want to fight early rivals by “pretending” to be tougher than they are. Can this work as an equilibrium? We really need a new concept, because we want the equilibrium to be both perfect, where at any point players play Nash equilibria from then forward, and Bayesian, where players have beliefs about the others’ types and update those beliefs according to the hypothesized equilibrium play. Kreps and Wilson show how to do this in their Econometrica introducing sequential equilibria. The idea here is that equilibria involve strategies and beliefs at every node of the game tree, with both being consistent along the equilibrium path. Beyond having the nice property of allowing us to specifically examine the beliefs at any node, even off the equilibrium path, sequential equilibria are much simpler to compute than similar ideas like trembling hand perfection. Looking back to Wilson’s early work on how to compute Nash equilibria, and forward to Milgrom’s later work on practical mechanism design, it is not surprising to see the idea of practical tractability appear even back in 1982.

This type of reputation-building applies even to cooperation – or collusion, as cooperating when it is in your interest to cheat and colluding when it is in your interest to undercut are the same mathematical problem. The Gang of Four paper by Kreps, Wilson, Milgrom, and Roberts shows that in finite prisoner’s dilemmas, you can get quite a bit of cooperation just with a small probability that your rival is an irrational type who always cooperates as long as you do so. Indeed, the Gang of Four show precisely how close to the end of the game players will cooperate for a given small belief that a rival is the naturally-cooperative type. Now, one may worry that allowing types in this way gives too much leeway for the modeler to justify any behavior, and indeed this is so. Nonetheless, the 1982 papers kicked off an incredibly fruitful search for justifications for reputation building – and given the role of reputation in everything from antitrust to optimal central bank policy, a rigorous justification is incredibly important to understanding many features of the economic world.

I introduced Robert Wilson as The Teacher. This is not meant to devalue his pure research contributions, but rather to emphasize just how critical he was in developing students at the absolute forefront of applied games. Bengt Holmstrom did his PhD under Wilson in 1978, went to Kellogg MEDS after a short detour in Finland, then moved to Yale and MIT before winning the Nobel Prize. Al Roth studied with Wilson in 1974, was hired at the business school at Illinois, then Pittsburgh, then Harvard and Stanford before winning a Nobel Prize. Paul Milgrom was a 1979 student of Wilson’s, beginning also at MEDS before moving to Yale and Stanford, and winning his own Nobel Prize. This is to say nothing of the students he developed later, including the great organizational theorist Bob Gibbons, or his earliest students like Armando Ortega Reichert, whose unpublished 1969 dissertation contains important early results in auction theory and was an important influence on the analysis of limit pricing under incomplete information in Milgrom and Roberts (1982). It is one thing to write papers of Nobel quality. It is something else altogether to produce (at least!) three students who have done the same. And as any teacher is proud of their successful students, surely little is better than winning a Nobel alongside one of them!


Alberto Alesina and Oliver Williamson: Taking Political and Economic Frictions Seriously

Very sad news this week for the economics community: both Oliver Williamson and Alberto Alesina have passed away. Williamson had been in poor health for some time, but Alesina’s death is a greater shock: he apparently had a heart attack while on a hike with his wife, at the young age of 63. While one is most famous for the microeconomics of the firm, and the other for political economy, there is in fact a tight link between their research agendas. Both attempted to open “black boxes” in economic modeling – about why firms organize the way they do, and about the nature of political constraints on economic activity – to clarify otherwise strange differences in how firms and governments behave.

First, let us discuss Oliver Williamson, the 2009 Nobel winner (alongside Elinor Ostrom), and student of Ken Arrow and later the Carnegie School. He grew up in Superior, Wisconsin, next to Duluth at the frigid tip of Lake Superior, as the son of two schoolteachers. Trained as an engineer before returning to graduate school, he had a strong technical background. However, he also possessed, in the words of Arrow, the more important trait of “asking good questions”.

Industrial organization in the 1960s was a field that needed a skeptical mind. To a first approximation, any activity that was unusual was presumed to be anti-competitive. Vertical integration as anticompetitive was high on this list. While Williamson was first thinking about the behavior of firms, the famous case of U.S. v. Arnold, Schwinn reached the Supreme Court. Schwinn, the bicycle company, owned neither distributors nor retailers. However, it did contractually limit distributors from selling bikes to retailers that were not themselves partnered with Schwinn. In 1967, the Supreme Court ruled these contracts an antitrust violation.

Williamson was interested in why a firm might limit these distributors. Let’s start with the ideas of Mr. Coase. Coase argued that transactions in a market are not free: we need to find suppliers, evaluate quality, and so on. The organization of economic activity therefore attempts to economize on these “transaction costs”. In the Coasean world, transaction costs were nebulous, and attracted a great deal of critique. As Williamson, among many others, points out, both buying from a supplier and vertical integration involve transaction costs: I need to haggle over the price of the component, or else over the price of the whole company! Therefore, in an unchanging world, it is not clear that integration does anything to reduce the transaction costs of evaluating what my partner – in procurement or in merger – is capable of. In the case of Schwinn, the transaction costs must be incurred whether we are debating how to split profits with a particular retailer for the upcoming year, or the price of a pallet of bicycles sold to that retailer.

Williamson’s model is richer. He takes change in the relationship as first order: the famous “unprogrammed adaptations”. The relationship between Schwinn and its retailers requires actions by both over time. Because we are not omniscient, no contract will cover every eventuality. When something unexpected happens, and we both want to renegotiate our contract, we are said to be facing an unprogrammed adaptation. For instance, if advertising is useful, and e-scooters unexpectedly become popular after Schwinn and their retailer sign their initial contract, then we will need to renegotiate who pays for those ads. Of course, we will only bother to negotiate at all if Schwinn and the retailer jointly profit from their relationship compared to their next best options, generating so-called “appropriable quasi-rents”.

We now have an explanation for Schwinn’s behavior. They expect frequent haggling with their retailer about which bicycles to advertise, service standards for repairs, employee training, and so on. If these negotiations fail, the next best option is pretty bad – many small towns might only have one full-service bicycle shop, the Schwinn bikes are more popular than alternatives, and Schwinn itself has neither the resources nor the knowledge to run its own full-service chain of retailers efficiently. Schwinn therefore uses exclusive retail contracts to limit the number of retailers it must negotiate with over service standards, advertising, and the like.

While we have focused on the application of transaction costs to antitrust, Williamson’s basic framework extends much further. He saw the problem as one of “choice” versus “contract”. The canonical topic of study in economics is choice: “Economics is the science which studies human behavior as a relationship between ends and scarce means which have alternative uses,” as Lionel Robbins famously put it. However, constraints also matter. Agents can act only within the bounds of the law, as a function of what other firms are capable of, and so on. Some of these constraints are public – e.g., what tariff rate do we face, are we allowed to put a price on kidneys for exchange, and so on. Williamson focused our attention on private constraints: the contracts, governance structures, and tools to align incentives which help us reach efficiency when information is asymmetric and contracts are incomplete. The timing was perfect: both Williamson and his professor Ken Arrow, along with Alchian, Demsetz, Klein and others, saw how important this “private ordering” was in their work in the 1960s, but that work was largely qualitative. The formal advances in game theory in the 1970s gave us the tools to analyze contracting rigorously, and let us transform these ideas into a modern field of industrial organization.

Williamson was in no way an ideologue who ignored the possibility of anticompetitive behavior. Indeed, many canonical anticompetitive strategies, such as “raising rivals’ costs” – whereby a firm encourages legal restrictions which raise its own costs but raise rivals’ costs to an even greater degree – originate with Williamson. I also particularly like that Williamson not only wrote serious economics, but also frequently translated those results for law journals in order to reach a wider audience. Erik Hovenkamp and I tried to follow this legacy recently in our work on the antitrust of startup acquisitions, where we wrote both a theoretical version and a law review article on the implications of this theory for existing legal practice.

Transaction cost economics is now a huge field, and both the benefits and the critiques of this approach are serious (for more, see my course notes on the theory of the firm). Every economist, when looking at “unusual” contracts or mergers, now follows Williamson in simultaneously looking for the strategic anticompetitive explanation and the cost-saving explanation. The name of this balance? Literally, the Williamson tradeoff!

—————–

If Williamson was interested in “private ordering”, Alesina was focused on the public constraints on behavior. He was without question at the head of the table when it came to winning a Nobel for political economy. Economists, by and large, are technocrats. We have models of growth, of R&D, of fiscal policy, of interstate coordination, and so on. These models imply useful policies. The “public choice” critique – that the politicians and bureaucrats implementing these policies may muck things up – is well known. The “political business cycle” approach of Nordhaus has politicians taking advantage of myopic voters by, for instance, running expansionary, inflation-inducing policy right before an election, generating lower unemployment today but higher inflation tomorrow.

Alesina’s research goes further than either of these approaches. Entering the field after the rational expectations revolution arrived, Alesina saw how skeptical economists were of the idea that politicians could, each election cycle, take advantage of voters in the same way. I like to explain rational expectations to students as the Bob Marley rule: “You can fool some people sometimes, but you can’t fool all the people all the time.” Rather than myopic voters, we have voters who do not perfectly observe the government’s actions or information. Politicians wish to push their preferences (“ideology”) and also to get re-elected (“career concerns”). Voters have differing preferences. We then want to ask: to what extent can politicians use their private information to push preferences that “society” does not necessarily want, and how does that affect the feasibility of political unions, monetary policy, fiscal policy, and so on?

One important source of uncertainty is who will win the next election. Consider a government which can spend on the military or on education (“guns” or “butter”), and can finance this through debt if they like. A benevolent social planner uses debt to finance investment such that the tax burden is distributed over time. In a political macro model, however, Alesina and Tabellini (RESTUD 1990) show that there will be too much debt, especially when elections are close. If I favor military spending more than education, I can run up debt-financed military spending while I am in power. This not only gets me more military today, but also constrains the other party from spending so much on education tomorrow, since society’s debt load will be too high. In equilibrium, both parties try to constrain their rival’s future actions by using debt spending today. The model makes clear predictions about how debt relates to the fundamentals of a society – political polarization, and so on – without requiring irrationality on the part of any actor, whether voter or politician.
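A two-period toy version of this mechanism is easy to compute. The log utility and the assumption that each party values only its own spending category are my simplifications, not the setup of the Alesina-Tabellini paper:

```python
import numpy as np

# Two-period strategic debt sketch (assumed functional forms). The "guns"
# party in power borrows d against period 2, spends 1+d on guns today, and
# keeps power with probability p; if it loses, the remaining 1-d is spent on
# butter, which it values at zero. It maximizes log(1+d) + p*log(1-d).
def optimal_debt(p, grid=np.linspace(0, 0.99, 10_000)):
    return grid[np.argmax(np.log(1 + grid) + p * np.log(1 - grid))]

for p in [1.0, 0.75, 0.5, 0.25]:
    print(f"re-election probability {p:.2f}: "
          f"debt = {optimal_debt(p):.2f} (closed form: {(1 - p)/(1 + p):.2f})")
# A party sure of staying in power (p=1) borrows nothing; the closer the
# election, the more debt it issues today to tie its rival's hands tomorrow.
```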

It is not hard to see how the interests of economists are so heavily linked to their country of origin. Many of our best macroeconomists come from Argentina, home of a great deal of macroeconomic instability. Americans are overrepresented in applied micro, no surprise given the salience of health, education, and labor issues in U.S. political debates. The French, with their high level of technical training in schools and universities, have many great theorists. And no surprise, the Italians are often interested in how political incentives affect and limit economic behavior. Once you start applying Alesina’s ideas, the behavior of politicians and the implications for society become clear. Why do politicians delegate some tasks to bureaucrats and not others? The hard ones the politicians might be blamed for if they fail get delegated, while the ones that allow control of distribution do not (Alesina and Tabellini 2007 AER). Why doesn’t the US have a strong welfare state compared to Europe? The distortions from taxation, relative income mobility, and the political power of the poor are relatively unimportant compared to racial fractionalization, which also explains changes in European preferences over time (Alesina, Glaeser and Sacerdote, Brookings 2001 and Alesina, Miano and Stantcheva 2018).

Perhaps the most salient of Alesina’s questions is one of his oldest (Alesina and Spolaore, QJE 1997): why are there so many countries? Are there “too many”, and what could this mean? In a crisis like Covid, would we be better off with a European fiscal union rather than a bunch of independent countries? Big countries can raise funds with less distortion, public goods often have economies of scale, and transfers within countries can handle idiosyncratic regional shocks – these are both assumptions and empirical facts. On the other hand, the bigger the country, the less agreement on how to value public goods. Consider a region on the outskirts of an existing country – say, Sudtirol in Italy. If they secede, they pay higher taxes for their public goods, but the public goods provided are much closer to their preferences. In a democratic secession, these Sudtirol voters do not account for how their secession causes the cost of government in the remaining rump of Italy to rise. Hence they are too likely to secede, versus what a social planner prefers.
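The externality is just arithmetic. Here is a sketch with made-up numbers: a fixed cost of government shared equally, and a border region that counts its own preference gain and tax change but ignores the higher per-capita burden left on the rump country:

```python
# Secession externality sketch (made-up numbers). A fixed government cost K
# is shared equally by N citizens; a border region of size m considers leaving.
K, N, m = 30.0, 10.0, 5.0
pref_gain = 4.0   # per-person value of policy closer to the region's tastes

tax_before = K / N             # 3.0 per person in the unified country
tax_region = K / m             # 6.0: the seceding region funds its own state
tax_rump = K / (N - m)         # 6.0: the rump's per-person burden rises too

region_net = m * (pref_gain - (tax_region - tax_before))   # +5: secede!
externality = (N - m) * (tax_rump - tax_before)            # 15, ignored
print(f"region's private gain from seceding: {region_net:+.0f}")
print(f"loss imposed on the rump country:    {-externality:+.0f}")
print(f"planner's total:                     {region_net - externality:+.0f}")
# The region rationally secedes (+5) even though society loses (-10):
# democratic borders produce "too many" countries relative to the optimum.
```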

We can see this effect in the EU right now. An EU fiscal union would reduce the cost of providing some public goods, insurance against shocks among them. However, the Germans and Dutch have very different public goods preferences from the Italians and Greeks. A planner would balance the marginal cost of lower alignment for the average EU citizen against the marginal benefit of lower public goods costs. A German elected leader will instead weigh the marginal cost of lower alignment for the average German citizen (worse than that of the EU median citizen!) against the marginal benefit of lower public goods costs (understated, because it doesn’t account for cheaper public goods for Greeks and Italians when Germany joins them to borrow funds jointly). We therefore get too little coordinated fiscal action. This lack of action on public goods makes some Europeans skeptical of other aspects of the EU project: one of Alesina’s final op-eds was on the disastrously nationalistic EU response to Covid. Luis Garicano, the well-known Spanish economist and current MEP, has a very interesting discussion with Luigi Zingales on precisely this point.

It is impressive enough that Alesina’s work was well-respected in political science and not just economics. What I especially like about Alesina, though, is how ideologically confusing his policy advice is, especially for an American. He simultaneously supported a lower tax rate for women on the basis of intrafamily dynamics, and was the leading proponent of expansionary austerity, or spending cuts during recessions! The tax rate idea is based on women’s greater elasticity of labor supply, and hence is a direct application of the Ramsey rule. Expansionary austerity is based on a serious review of austerity policies over many decades. He pushed these ideas and many others in at least 10 books and dozens of op-eds (including more than 30 for VoxEU). Agree with these ideas or not – and I object to both! – Alesina nonetheless argued for these positions from a base of serious theory and empirics, rather than from ideology. What worthier legacy could there be for an academic?

The 2018 Fields Medal and its Surprising Connection to Economics!

The Fields Medal and Nevanlinna Prizes were given out today. They represent the highest honor possible for young mathematicians and theoretical computer scientists, and are granted only once every four years. The mathematics involved is often very challenging for outsiders. Indeed, the most prominent of this year’s winners, the German Peter Scholze, is best known for his work on “perfectoid spaces”, and I honestly have no idea how to begin explaining them aside from saying that they are useful in a number of problems in algebraic geometry (the lovely field mapping results in algebra – what numbers solve y=2x – and geometry – noting that those solutions to y=2x form a line). Two of this year’s prizes, however, the Fields given to Alessio Figalli and the Nevanlinna to Constantinos Daskalakis, have a very tight connection to an utterly core question in economics. Indeed, both of those men have published work in economics journals!

The problem of interest concerns how best to sell an object. If you are a monopolist hoping to sell one item to one consumer, where the consumer’s valuation of the object is only known to the consumer but commonly known to come from a distribution F, the mechanism that maximizes revenue is of course the Myerson auction from his 1981 paper in Math OR. The solution is simple: make a take it or leave it offer at a minimum price (or “reserve price”) which is a simple function of F. If you are selling one good and there are many buyers, then revenue is maximized by running a second-price auction with the exact same reserve price. In both cases, no potential buyer has any incentive to lie about their true valuation (the auction is “dominant strategy incentive compatible”). And further, seller revenue and expected payments for all players are identical to the Myerson auction in any other mechanism which allocates goods the same way in expectation, with minor caveats. This result is called “revenue equivalence”.
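To make “a simple function of F” concrete, take values drawn uniformly on [0,1] and a seller who values the good at zero: the reserve price r solves r = (1-F(r))/f(r), giving r = 1/2. A quick Monte Carlo (my own illustration, not from the paper) confirms that the reserve raises revenue over a plain second-price auction:

```python
import numpy as np

# Myerson reserve price sketch for values ~ U[0,1] and a zero-value seller:
# the reserve solves r = (1 - F(r)) / f(r), i.e. r = 1 - r, so r = 1/2.
rng = np.random.default_rng(7)
r, n_auctions, n_bidders = 0.5, 1_000_000, 2
v = np.sort(rng.uniform(0, 1, (n_auctions, n_bidders)), axis=1)
high, second = v[:, -1], v[:, -2]

rev_plain = second                              # second-price, no reserve
rev_reserve = np.where(high < r, 0.0,           # good goes unsold
                       np.maximum(second, r))   # winner pays max(2nd bid, r)
print(f"revenue without reserve: {rev_plain.mean():.3f}")    # ~1/3
print(f"revenue with reserve:    {rev_reserve.mean():.3f}")  # ~5/12
# The reserve raises revenue but sometimes leaves the good unsold even though
# a bidder values it -- exactly the inefficiency discussed in the next paragraph.
```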

The Myerson paper is an absolute blockbuster. The revelation principle, the revenue equivalence theorem, and a solution to the optimal selling mechanism problem all in the same paper? I would argue it’s the most important result in economics since Arrow-Debreu-McKenzie, with the caveat that many of these ideas were “in the air” in the 1970s with the early ideas of mechanism design and Bayesian game theory. The Myerson result is also really worrying if you are concerned with general economic efficiency. Note that the reserve price means that the seller is best off sometimes not selling the good to anyone, in case all potential buyers have private values below the reserve price. But this is economically inefficient! We know that there exists an allocation mechanism which is socially efficient even when people have private information about their willingness to pay: the Vickrey-Clarke-Groves mechanism. This means that market power plus asymmetric information necessarily destroys social surplus. You may be thinking we know this already: an optimal monopoly price in classic price theory generates deadweight loss. But recall that a perfectly-price-discriminating monopolist sells to everyone whose willingness-to-pay exceeds the seller’s marginal cost of production, hence the only reason monopoly generates deadweight loss in a world with perfect information is that we constrain the monopolist to a “mechanism” called a fixed price. Myerson’s result is much worse: even letting a monopolist use any mechanism, and price discriminate however they like, asymmetric information necessarily destroys surplus!

Despite this great result, there remain two enormous open problems. First, how should we sell a good when we will interact with the same buyer(s) in the future? Recall the Myerson auction involves bidders truthfully revealing their willingness to pay. Imagine that tomorrow, the seller will sell the same object. Will I reveal my willingness to pay truthfully today? Of course not! If I did, tomorrow the seller would charge the bidder with the highest willingness-to-pay exactly that amount. Ergo, today bidders will shade down their bids. This is called the “ratchet effect”, and despite a lot of progress in dynamic mechanism design, we have still not fully solved for the optimal dynamic mechanism in all cases.

The other challenging problem is one seller selling many goods, where willingness to pay for one good is related to willingness to pay for the others. Consider, for example, selling cable TV. Do you bundle the channels together? Do you offer a menu of possible bundles? This problem is often called “multidimensional screening”, because you are attempting to “screen” buyers such that those with high willingness to pay for a particular good actually pay a high price for that good. The optimal multidimensional screen is a devil of a problem. And it is here that we return to the Fields and Nevanlinna prizes, because they turn out to speak precisely to this problem!

What could possibly be the connection between high-level pure math and this particular pricing problem? The answer comes from the 18th century mathematician Gaspard Monge, founder of the Ecole Polytechnique. He asked the following question: what is the cheapest way to move mass from X to Y, such as moving apples from a bunch of distribution centers to a bunch of supermarkets? It turns out that without convexity or linearity assumptions, this problem is very hard, and it was not solved until the late 20th century. Leonid Kantorovich, the 1975 Nobel winner in economics, paved the way for this result by showing that there is a “dual” problem where instead of looking for the map from X to Y, you look for the probability that a given mass in Y comes from X. This dual turns out to be useful in that there exists an object called a “potential” which characterizes the solution of the optimal transport problem in a much more tractable way than searching across all possible maps.
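In the discrete case, Kantorovich’s relaxation is just a linear program that we can solve directly; the tiny cost matrix below is made up for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Discrete Kantorovich sketch: ship apples from 2 centers to 3 supermarkets
# at minimum cost (made-up costs). The dual of this LP is exactly where the
# "potentials" live.
supply = np.array([0.6, 0.4])           # mass at each distribution center
demand = np.array([0.3, 0.5, 0.2])      # mass required at each supermarket
cost = np.array([[1.0, 3.0, 5.0],
                 [4.0, 2.0, 1.0]])      # cost of shipping center i -> market j

# Variables: the flattened transport plan x[i, j] >= 0, with row sums equal
# to supply and column sums equal to demand.
A_eq, b_eq = [], []
for i in range(2):
    row = np.zeros(6); row[3 * i:3 * i + 3] = 1
    A_eq.append(row); b_eq.append(supply[i])
for j in range(3):
    row = np.zeros(6); row[j::3] = 1
    A_eq.append(row); b_eq.append(demand[j])

res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
print(res.x.reshape(2, 3))              # the optimal transport plan
print(f"minimum transport cost: {res.fun:.2f}")
```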

Note the link between this problem and our optimal auction problem above, though! Instead of moving mass most cheaply from X to Y, we are looking to maximize revenue by assigning objects Y to people with willingness-to-pay drawn from X. So no surprise, the solution to the optimal transport problem when X has a particular structure and the solution to the revenue maximizing mechanism problem are tightly linked. And luckily for us economists, many of the world’s best mathematicians, including 2010 Fields winner Cedric Villani, and this year’s winner Alessio Figalli, have spent a great deal of effort working on exactly this problem. Ivar Ekeland has a nice series of notes explaining the link between the two problems in more detail.

In a 2017 Econometrica, this year’s Nevanlinna winner Daskalakis and his coauthors Alan Deckelbaum and Christos Tzamos show precisely how to use strong duality in the optimal transport problem to solve the general optimal mechanism problem when selling multiple goods. The paper is very challenging, requiring some knowledge of measure theory, duality theory, and convex analysis. That said, the conditions they give to check an optimal solution, and the method to find the optimal solution, involve a reasonably straightforward series of inequalities. In particular, the optimal mechanism involves dividing the hypercube of potential types into (perhaps infinitely many) regions which get assigned the same prices and goods (for example, “you get good A and good B together with probability p at price X”, or “if you are unwilling to pay p1 for A, p2 for B, or p for both together, you get nothing”).

This optimal mechanism has some unusual properties. Remember that the Myerson auction for one buyer is “simple”: make a take it or leave it offer at the reserve price. You may think that if you are selling many items to one buyer, you would likewise choose a reserve price for the whole bundle, particularly when the number of goods with independently distributed values becomes large. For instance, if there are 1000 cable channels, and a buyer has value distributed uniformly between 0 and 10 cents for each channel, then by a limit theorem type argument it’s clear that the willingness to pay for the whole bundle is quite close to 50 bucks. So you may think: just price at a bit lower than 50. However, Daskalakis et al show that when there are sufficiently many goods with i.i.d. uniformly-distributed values, it is never optimal to just set a price for the whole bundle! It is also possible to show that the best mechanism often involves randomization, where buyers who report that they are willing to pay X for item a and Y for item b will only get the items with probability less than 1 at a specified price. This is quite contrary to my intuition, which is that in most mechanism problems, we can restrict focus to deterministic assignment. It was well-known that multidimensional screening has weird properties; for example, Hart and Reny show that an increase in buyer valuations can cause seller revenue from the optimal mechanism to fall. The techniques Daskalakis and coauthors develop allow us to state exactly what we ought to do in situations previously beyond the literature’s reach, such as when we know we need mechanisms more complicated than “sell the whole bundle at price p”.
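The law-of-large-numbers intuition in the cable example is easy to check by simulation (the prices and sample sizes below are my own illustration):

```python
import numpy as np

# Bundling sketch: 1000 channels, each valued i.i.d. U[$0.00, $0.10], so a
# buyer's value for the full bundle concentrates near $50. A posted bundle
# price a couple of dollars below $50 captures most surplus -- even though,
# per Daskalakis et al., pure bundling is never exactly optimal here.
rng = np.random.default_rng(3)
values = rng.uniform(0.0, 0.10, size=(10_000, 1000))
bundle = values.sum(axis=1)   # each simulated buyer's value for all channels
print(f"bundle value: mean ${bundle.mean():.2f}, std ${bundle.std():.2f}")

for price in [45.0, 48.0, 49.5]:
    revenue = price * (bundle >= price).mean()
    print(f"posted bundle price ${price}: expected revenue ${revenue:.2f}")
```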

The history of economics has been a long series of taking tools from the frontier of mathematics, from the physics-based analogues of the “marginalists” in the 1870s, to the fixed point theorems of the early game theorists, the linear programming tricks used to analyze competitive equilibrium in the 1950s, and the tropical geometry recently introduced to auction theory by Elizabeth Baldwin and Paul Klemperer. We are now making progress on pricing issues that have stumped some of the great theoretical minds in the history of the field. Multidimensional screening is an incredibly broad topic: how ought we regulate a monopoly with private fixed and marginal costs, how ought we tax agents who have private costs of effort and opportunities, how ought a firm choose wages and benefits, and so on. Knowing the optimum is essential when it comes to understanding when we can use simple, nearly-correct mechanisms. Just in the context of pricing, using tricks related to those of Daskalakis, Gabriel Carroll showed in a recent Econometrica that bundling should be avoided when the principal has limited knowledge about the correlation structure of types, and my old grad school friend Nima Haghpanah has shown, in a paper with Jason Hartline, that firms should only offer high-quality and low-quality versions of their products if consumers’ values for the high-quality good and their relative value for the low versus high quality good are positively correlated. Neither of these results is trivial to prove. Nonetheless, a hearty cheers to our friends in pure mathematics who continue to provide us with the tools we need to answer questions at the very core of economic life!

“Eliminating Uncertainty in Market Access: The Impact of New Bridges in Rural Nicaragua,” W. Brooks & K. Donovan (2018)

It’s NBER Summer Institute season, when every bar and restaurant in East Cambridge, from Helmand to Lord Hobo, is filled with our tribe. The air hums with discussions of Lagrangians and HANKs and robust estimators. And the number of great papers presented, discussed, or otherwise floating around is inspiring.

The paper we’re discussing today, by Wyatt Brooks at Notre Dame and Kevin Donovan at Yale SOM, uses a great combination of dynamic general equilibrium theory and a totally insane quasi-randomized experiment to help answer an old question: how beneficial is it for villages to be connected to the broader economy? The fundamental insight requires two ideas that are second nature for economists, but are incredibly controversial outside our profession.

First, going back to Nobel winner Arthur Lewis if not much earlier, economists have argued that “structural transformation”, the shift out of low-productivity agriculture to urban areas and non-ag sectors, is fundamental to economic growth. Recent work by Hicks et al is a bit more measured – the individuals who benefit from leaving agriculture generally already have, so Lenin-type forced industrialization is a bad idea! – but nonetheless barriers to that movement are still harmful to growth, even when those barriers are largely cultural, as in the forthcoming JPE by Melanie Morten and the well-named Gharad Bryan. What’s so bad about the ag sector? In the developing world, it tends to be small-plot, quite-inefficient, staple-crop production, unlike the growth-generating, positive-externality-filled, increasing-returns-type sectors (on this point, Romer 1990). There are zero examples of countries becoming rich without their labor force shifting dramatically out of agriculture. The intuition of many in the public, that Gandhi was right about the village economy and that structural transformation just means dreadful slums, is the intuition of people who lack respect for individual agency. The slums may be bad, but look how they fill up everywhere they exist! Ergo, how bad must the alternative be?

The second related misunderstanding of the public is that credit is unimportant. For folks near subsistence, the danger of economic shocks pushing you below that subsistence cutpoint is so fundamental that it leads to all sorts of otherwise odd behavior. Consider the response of my ancestors (and presumably the ancestors of the author of today’s paper, given that he is a Prof. Donovan) when potato blight hit. Potatoes are an input to growing more potatoes tomorrow, but near subsistence, you have no choice but to eat your “savings” away after bad shocks. This obviously causes problems in the future, prolonging the famine. But even worse, to avoid getting into a situation where you eat all your savings, you save more and invest less than you otherwise would. Empirically, Karlan et al QJE 2014 show large demand for savings instruments in Ghana, and Cynthia Kinnan shows why insurance markets in the developing world are incomplete despite large welfare gains. Indeed, many countries, including India, make it illegal to insure oneself against certain types of negative shocks, as Mobarak and Rosenzweig show. The need to save for low-probability, really negative shocks may even lead people to invest in assets with highly negative annual returns; on this, see the wonderfully-titled Continued Existence of Cows Disproves Central Tenets of Capitalism? This is all to say: the rise of credit and insurance markets unlocks much more productive activity, especially in the developing world, and it is not merely the den of exploitative lenders.

Ok, so insurance against bad shocks matters, and getting out of low-productivity agriculture may matter as well. Let’s imagine you live in a tiny village which is often separated from bigger towns, geographically. What would happen if you somehow lowered the cost of reaching those towns? Well, we’d expect goods-trade to radically change – see the earlier post on Dave Donaldson’s work, or the nice paper on Brazilian roads by Morten and Oliveira. But the benefits of reducing isolation go well beyond just getting better prices for goods.

Why? In the developing world, most people have multiple jobs. They farm during the season, work in the market on occasion, do construction, work as a migrant, and so on. Imagine that in the village, most jobs are just farmwork, and outside, there is always the chance of day work at a fixed wage. In autarky, I just work on the farm, perhaps my own. I need to keep a bunch of savings because sometimes farms get a bunch of bad shocks: a fire burns my crops, or an elephant stomps on them. Running out of savings risks death, and there is no crop insurance, so I save precautionarily. Saving means I don’t have as much to spend on fertilizer or pesticide, so my yields are lower.

If I can access the outside world, then when my farm gets bad shocks and my savings run low, I leave the village and take day work to build them back up. Since I know I will have that option, I don’t need to save as much, and hence I can buy more fertilizer. Now, the wage for farmers in the village (including the implicit wage that would keep me on my own farm) needs to be higher, since some of these ex-farmers will go work in town, shifting village labor supply left. This higher wage pushes the amount of fertilizer I will buy down, since high wages reduce the marginal productivity of farm improvements. Whether fertilizer use goes up or down is therefore an empirical question, but at least we can say that those who use more fertilizer, those who react more to bad shocks by working outside the village, and those whose savings drop the most should be the same farmers. Either way, the village winds up richer, both for the direct reason of having an outside option, and for the indirect reason of being able to reduce precautionary savings. That is, the harm of isolation comes not just from the first moment, the average shock to agricultural productivity, but also from the second moment, its variance.
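A stripped-down two-period version of this logic isolates the investment channel. Every parameter below is an assumption for illustration; the paper itself builds a full dynamic general equilibrium model:

```python
import numpy as np

# Two-period sketch of the precautionary channel (assumed parameters). A
# farmer splits one unit between fertilizer x and a buffer 1-x. With
# probability p the crop fails; a bridge adds day wages w in that bad state.
p, A, w, c_min = 0.25, 2.0, 0.5, 0.3   # failure prob, yield, wage, subsistence

def expected_utility(x, bridge):
    good = (1 - x) + A * x                   # consumption after a good harvest
    bad = (1 - x) + (w if bridge else 0.0)   # consumption after crop failure
    def u(c):                                # log utility, ruin below subsistence
        return np.log(c) if c >= c_min else -1e9
    return (1 - p) * u(good) + p * u(bad)

grid = np.linspace(0.0, 0.99, 1_000)
for bridge in (False, True):
    x_star = grid[np.argmax([expected_utility(x, bridge) for x in grid])]
    print(f"bridge={bridge}: optimal fertilizer share = {x_star:.2f}")
# Without the bridge the farmer self-insures with a big buffer (x ~ 0.50);
# with the outside option he invests far more (x ~ 0.88), raising yields.
```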

How much does this matter in practice? Brooks and Donovan worked with an NGO that physically builds bridges in remote areas. In Nicaragua, floods during the harvest season are common, isolating villages for days at a time when the riverbed along the path to market turns into a raging torrent. In this area, bridges are unnecessary when the riverbed is dry: the land is fairly flat, and the bridge barely reduces travel time when the riverbed isn’t flooded. These floods generally occur exactly during the growing season, after fertilizer is bought, but before crops are harvested, so the goods market in both inputs and outputs is essentially unaffected. And there is nice quasirandom variation: of 15 villages which the NGO selected as needing a bridge, 9 were ruled out after a visit by a technical advisor found the soil and topography unsuitable for the NGO’s relatively inexpensive bridge.

The authors survey villages the year before and the two years after the bridges are built, as well as surveying a subset of villagers with cell phones every two weeks in a particular year. Although N=15 seems worrying for power, the within-village differences in labor market behavior are sufficient that properly bootstrapped estimates can still infer interesting effects. And what do they find? Villages with bridges have many men shift from working in the village to outside in a given week, the percentage of women working outside nearly doubles (with most of these women entering the labor force in order to work), wages inside the village rise while wages outside the village do not, the use of fertilizer rises, village farm profits rise 76%, and the effect of all this is most pronounced on poorer households physically close to the bridge.

All this is exactly in line with the dynamic general equilibrium model sketched out above. If you assumed that bridges were just about market access for goods, you would have missed all of this. If you assumed the only benefit was additional wages outside the village, you would miss a full 1/3 of the benefit: the general equilibrium effect of shifting out workers who are particularly capable of working outside the village causes wages to rise for the farm workers who remain at home. These particular bridges show an internal rate of return of nearly 20% even though they do nothing to improve market access for either inputs or outputs! And there are, of course, further utility benefits from reducing risk, even when that risk reduction does not show up in income through the channel of increased investment.

November 2017 working paper, currently R&R at Econometrica (RePEc IDEAS version). Both authors have a number of other really interesting drafts, of which I’ll mention two. Brooks, in a working paper with Joseph Kaboski and Yao Li, identifies a really interesting harm of industrial clusters, but one that Adam Smith would surely have identified: they make collusion easier. Put all the firms in an industry in the same place, and establish regular opportunities for their managers to meet, and you wind up with much less variance in markups among firms which are induced to locate in these clusters! Donovan, in a recent RED with my friend Chris Herrington, calibrates a model to explain why both college attendance and the relative cognitive ability of college grads rose during the 20th century. It’s not as simple as you might think: a decrease in costs, through student loans or otherwise, only affects marginal students, who are cognitively worse than the average existing college student. It turns out you also need a rising college premium and more precise signals of high schoolers’ academic abilities to get both patterns. Models doing work to extract insight from data – as always, this is the fundamental reason why economics is the queen of the social sciences.

Nobel Prize 2016 Part II: Oliver Hart

The Nobel Prize in Economics was given yesterday to two wonderful theorists, Bengt Holmstrom and Oliver Hart. I wrote a day ago about Holmstrom’s contributions, many of which are simply foundational to modern mechanism design and its applications. Oliver Hart’s contribution is more subtle and hence more of a challenge to describe to a nonspecialist; I am sure of this because no concept gives my undergraduate students more headaches than Hart’s “residual control right” theory of the firm. Even stranger, much of Hart’s recent work repudiates the importance of his most famous articles, a point that appears to have been entirely lost on every newspaper discussion of Hart that I’ve seen (including otherwise very nice discussions like Appelbaum’s in the New York Times). A major reason he has changed his beliefs, and his research agenda, so radically is not simply the whims of age or the pressures of politics, but rather the impact of a devastatingly clever, and devastatingly esoteric, argument made by the Nobel winners Eric Maskin and Jean Tirole. To see exactly what’s going on in Hart’s work, and why there remain many very important unsolved questions in this area, let’s quickly survey what economists mean by “theory of the firm”.

The fundamental strangeness of firms goes back to Coase. Markets are amazing. We have wonderful theorems going back to Hurwicz about how competitive market prices coordinate activity efficiently even when individuals only have very limited information about how various things can be produced by an economy. A pencil somehow involves graphite being mined, forests being explored and exploited, rubber being harvested and produced, the raw materials brought to a factory where a machine puts the pencil together, ships and trains bringing the pencil to retail stores, and yet this decentralized activity produces a pencil costing ten cents. This is the case even though not a single individual anywhere in the world knows how all of those processes up the supply chain operate! Yet, as Coase pointed out, a huge amount of economic activity (including the majority of international trade) is not coordinated via the market, but rather through top-down Communist-style bureaucracies called firms. Why on Earth do these persistent organizations exist at all? When should firms merge and when should they divest themselves of their parts? These questions make up the theory of the firm.

Coase’s early answer is that something called transaction costs exist, and that they are particularly high outside the firm. That is, market transactions are not free. Firm size is determined at the point where the problems of bureaucracy within the firm overwhelm the benefits of reducing transaction costs from regular transactions. There are two major problems here. First, who knows what a “transaction cost” or a “bureaucratic cost” is, and why they differ across organizational forms: the explanation borders on tautology. Second, as the wonderful paper by Alchian and Demsetz in 1972 points out, there is no reason we should assume firms have some special ability to direct or punish their workers. If your supplier does something you don’t like, you can keep them on, or fire them, or renegotiate. If your in-house department does something you don’t like, you can keep them on, or fire them, or renegotiate. The problem of providing suitable incentives – the contracting problem – does not simply disappear because some activity is brought within the boundary of the firm.

Oliver Williamson, a recent Nobel winner jointly with Elinor Ostrom, has a more formal transaction cost theory: some relationships generate joint rents higher than could be generated if we split ways, unforeseen things occur that make us want to renegotiate our contract, and the cost of that renegotiation may be lower if workers or suppliers are internal to a firm. “Unforeseen things” may include anything which cannot be measured ex-post by a court or other mediator, since that is ultimately who would enforce any contract. It is not that everyday activities have different transaction costs, but that the negotiations which produce contracts themselves are easier to handle in a more persistent relationship. As in Coase, the question of why firms do not simply grow to an enormous size is largely dealt with by off-hand references to “bureaucratic costs” whose nature remains largely informal. Though informal, the idea that something like transaction costs might matter seemed intuitive and had some empirical support – firms are larger in the developing world because weaker legal systems mean more “unforeseen things” will occur outside the scope of a contract, hence the differential costs of holdup or renegotiation inside and outside the firm are first order when deciding on firm size. That said, the Alchian-Demsetz critique, and the question of what a “bureaucratic cost” is, are worrying. And as Eric van den Steen points out in a 2010 AER, can anyone who has tried to order paper through their procurement office versus just popping in to Staples really believe that the reason firms exist is to lessen the cost of intrafirm activities?

Grossman and Hart (1986) argue that the distinction that really makes a firm a firm is that it owns assets. They retain the idea that contracts may be incomplete – at some point, I will disagree with my suppliers, or my workers, or my branch manager, about what should be done, either because a state of the world has arrived not covered by our contract, or because it is in our first-best mutual interest to renegotiate that contract. They retain the idea that there are relationship-specific rents, so I care about maintaining this particular relationship. But rather than rely on transaction costs, they simply point out that the owner of the asset is in a much better bargaining position when this disagreement occurs. Therefore, the owner of the asset will get a bigger percentage of rents after renegotiation. Hence the person who owns an asset should be the one whose incentive to improve the value of the asset is most sensitive to that future split of rents.

Baker and Hubbard (2004) provide a nice empirical example: when on-board computers to monitor how long-haul trucks were driven began to diffuse, ownership of those trucks shifted from owner-operators to trucking firms. Before the computer, if the trucking firm owns the truck, it is hard to contract on how hard the truck will be driven or how poorly it will be treated by the driver. If the driver owns the truck, it is hard to contract on how much effort the trucking firm dispatcher will exert ensuring the truck isn’t sitting empty for days, or following a particularly efficient route. The computer solves the first problem, meaning that only the trucking firm is taking actions relevant to the joint relationship which are highly likely to be affected by whether they own the truck or not. In Grossman and Hart’s “residual control rights” theory, then, the introduction of the computer means the truck ought, post-computer, be owned by the trucking firm. If these residual control rights are unimportant – there is no relationship-specific rent and no incompleteness in contracting – then the ability to shop around for the best relationship is more valuable than the control rights asset ownership provides. Hart and Moore (1990) extend this basic model to the case where there are many assets and many firms, suggesting critically that sole ownership of assets which are highly complementary in production is optimal. Asset ownership affects outside options when the contract is incomplete by changing bargaining power, and splitting ownership of complementary assets gives multiple agents weak bargaining power and hence little incentive to invest in maintaining the quality of, or improving, the assets. Hart, Shleifer and Vishny (1997) provide a great example of residual control rights applied to the question of why governments should run prisons but not garbage collection. (A brief aside: note the role that bargaining power plays in all of Hart’s theories. We do not have a “perfect” – in a sense that can be made formal – model of bargaining, and Hart tends to use bargaining solutions from cooperative game theory like the Shapley value. After Shapley’s prize alongside Roth a few years ago, this makes multiple prizes heavily influenced by cooperative games applied to unexpected problems. Perhaps the theory of cooperative games ought still be taught with vigor in PhD programs!)
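
The underinvestment logic is easy to illustrate numerically. Here is a minimal sketch with toy functional forms of my own, not Grossman and Hart’s actual model: contracts are incomplete, so the investing party anticipates keeping only a bargaining share alpha of the surplus, and ownership matters precisely because it raises that share.

```python
# A minimal sketch, assuming a toy surplus function 2*sqrt(i): with
# incomplete contracts, ex-post bargaining gives the investor share alpha
# of the surplus, so ex-ante investment maximizes alpha*2*sqrt(i) - i.
import numpy as np

def best_investment(alpha, grid=np.linspace(0, 4, 4001)):
    payoff = alpha * 2 * np.sqrt(grid) - grid    # investor keeps share alpha
    i = grid[np.argmax(payoff)]
    return i, 2 * np.sqrt(i) - i                 # investment, total net surplus

for label, alpha in [("first best (alpha=1)", 1.0),
                     ("investor owns the asset (alpha=0.7)", 0.7),
                     ("other party owns the asset (alpha=0.3)", 0.3)]:
    i, s = best_investment(alpha)
    print(f"{label}: investment={i:.2f}, total surplus={s:.2f}")
```

The output matches the verbal argument: the further the investor’s bargaining share falls below one, the worse the underinvestment, so ownership should go to the party whose investment responds most to the split of rents.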

There are, of course, many other theories of the firm. The idea that firms in some industries are big because there are large fixed costs to enter at the minimum efficient scale goes back to Marshall. The agency theory of the firm going back at least to Jensen and Meckling focuses on the problem of providing incentives for workers within a firm to actually profit maximize; as I noted yesterday, Holmstrom and Milgrom’s multitasking is a great example of this, with tasks being split across firms so as to allow some types of workers to be given high powered incentives and others flat salaries. More recent work by Bob Gibbons, Rebecca Henderson, Jon Levin and others on relational contracting discusses how the nexus of self-enforcing beliefs about how hard work today translates into rewards tomorrow can substitute for formal contracts, and how the credibility of these “relational contracts” can vary across firms and depend on their history.

Here’s the kicker, though. A striking blow was dealt to all theories which rely on the incompleteness or nonverifiability of contracts by a brilliant paper of Maskin and Tirole (1999) in the Review of Economic Studies. Theories relying on incomplete contracts generally just hand-waved that there are always events which are unforeseeable ex-ante or impossible to verify in court ex-post, and hence there will always be scope for disagreement about what to do when those events occur. But, as Maskin and Tirole correctly point out, agents don’t care about anything in these unforeseeable/unverifiable states except for what the states imply about our mutual valuations from carrying on with a relationship. Therefore, every “incomplete contract” should just involve the parties deciding in advance that if a state of the world arrives where you value keeping our relationship in that state at 12 and I value it at 10, then we should split that joint value of 22 at whatever level induces optimal actions today. Do this same ex-ante contracting for all future profit levels, and we are done. Of course, there is still the problem of ensuring incentive compatibility – why would the agents tell the truth about their valuations when that unforeseen event occurs? I will omit the details here, but you should read the original paper, where Maskin and Tirole show a (somewhat convoluted but still working) mechanism that induces truthful revelation of private value by each agent. Taking the model’s insight seriously but the exact mechanism less seriously, the paper basically suggests that incomplete contracts don’t matter if we can truthfully figure out ex-post who values our relationship at what amount, and there are many real-world institutions, like mediators, who do precisely that. If, as Maskin and Tirole prove (and Maskin described more simply in a short note), incomplete contracts aren’t a real problem, we are back to square one – why have persistent organizations called firms?
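
To see the flavor of the argument in numbers (my own toy figures; the actual Maskin-Tirole mechanism that guarantees truthful announcements is far more intricate, and truthfulness is simply assumed here): the ex-ante contract never describes states at all, only how announced ex-post valuations will be split.

```python
# A minimal sketch, assuming truthful announcements (which the Maskin-Tirole
# mechanism is built to ensure): the contract fixes a split rule over the
# announced joint valuation, so no description of future states is needed.

def split_joint_value(v_you, v_me, your_share=0.5):
    joint = v_you + v_me                    # e.g., 12 + 10 = 22 in the text
    return your_share * joint, (1 - your_share) * joint

# Whichever indescribable state arrives, only the valuations it induces matter:
for state, (v_you, v_me) in {"state A": (12, 10), "state B": (3, 9)}.items():
    yours, mine = split_joint_value(v_you, v_me)
    print(f"{state}: joint value={v_you + v_me}, you get {yours}, I get {mine}")
```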

What should we do? Some theorists have tried to fight off Maskin and Tirole by suggesting that their precise mechanism is not terribly robust to, for instance, assumptions about higher-order beliefs (e.g., Aghion et al (2012) in the QJE). But these quibbles do not contradict the far more basic insight of Maskin and Tirole: situations we think of empirically as “hard to describe” or “unlikely to occur or be foreseen” are not sufficient to justify the relevance of incomplete contracts unless we also have some reason to think that all mechanisms which split rent on the basis of future profit, like a mediator, are unavailable. Note that real-world contracts regularly include provisions that describe ex-ante how contractual disagreement ex-post should be handled.

Hart’s response, and this is both clear from his CV and from his recent papers and presentations, is to ditch incompleteness as the fundamental reason firms exist. Hart and Moore’s 2007 AER P&P and 2006 QJE are very clear:

Although the incomplete contracts literature has generated some useful insights about firm boundaries, it has some shortcomings. Three that seem particularly important to us are the following. First, the emphasis on noncontractible ex ante investments seems overplayed: although such investments are surely important, it is hard to believe that they are the sole drivers of organizational form. Second, and related, the approach is ill suited to studying the internal organization of firms, a topic of great interest and importance. The reason is that the Coasian renegotiation perspective suggests that the relevant parties will sit down together ex post and bargain to an efficient outcome using side payments: given this, it is hard to see why authority, hierarchy, delegation, or indeed anything apart from asset ownership matters. Finally, the approach has some foundational weaknesses [pointed out by Maskin and Tirole (1999)].

To my knowledge, Oliver Hart has written zero papers since Maskin-Tirole was published which attempt to explain any policy or empirical fact on the basis of residual control rights and the incomplete contracts they require. Instead, he has been primarily working on theories which depend on reference points: a behavioral idea that ex-ante contracts are useful when disagreements occur because they suggest “fair” divisions of rent, and that they induce shading and other destructive actions when those divisions are not honored. These behavioral agents may very well disagree about what the ex-ante contract means for “fairness” ex-post. The primary result is that flexible contracts (e.g., contracts which deliberately leave lots of incompleteness) can adjust easily to changes in the world but will induce spiteful shading by at least one agent, while rigid contracts do not permit this shading but do cause parties to pursue suboptimal actions in some states of the world. This perspective has been applied by Hart to many questions over the past decade, such as why it can be credible to delegate decision-making authority to agents: if you try to seize it back, the agent will feel aggrieved and will shade effort. These responses are hard, or perhaps impossible, to justify when agents are perfectly rational, which is precisely what insulates the theory from the Maskin-Tirole critique.

So where does all this leave us concerning the initial problem of why firms exist in a sea of decentralized markets? In my view, we have many clever ideas, but still do not have the perfect theory. A perfect theory of the firm would need to be able to explain why firms are the size they are, why they own what they do, why they are organized as they are, why they persist over time, and why interfirm incentives look the way they do. It almost certainly would need its mechanisms to work if we assumed all agents were highly, or perfectly, rational. Since patterns of asset ownership are fundamental, it needs to go well beyond the type of hand-waving that makes up many “resource” type theories. (Firms exist because they create a corporate culture! Firms exist because some firms just are better at doing X and can’t be replicated! These are outcomes, not explanations.) I believe that there are reasons why the costs of maintaining relationships – transaction costs – endogenously differ within and outside firms, and that Hart is correct in focusing our attention on how asset ownership and decision-making authority affect incentives to invest, but these theories even in their most endogenous form cannot do everything we wanted a theory of the firm to accomplish. I think that somehow reputation – and hence relational contracts – must play a fundamental role, and that the nexus of conflicting incentives among agents within an organization, as described by Holmstrom, must as well. But we still lack the precise insight to clear up this muddle, and give us a straightforward explanation for why we seem to need “little Communist bureaucracies” to assist our otherwise decentralized and almost magical market system.

Nobel Prize 2016 Part I: Bengt Holmstrom

The Nobel Prize in Economics has been announced, and what a deserving prize it is: Bengt Holmstrom and Oliver Hart have won for the theory of contracts. The name of this research weblog is “A Fine Theorem”, and it would be hard to find two economists whose work is more likely to elicit such a description! Both are incredibly deserving; more than five years ago on this site, I discussed how crazy it was that Holmstrom had yet to win. The only shock is the combination: a more natural prize would have been Holmstrom with Paul Milgrom and Robert Wilson for modern applied mechanism design, and Oliver Hart with John Moore and Sandy Grossman for the theory of the firm. The contributions of Holmstrom and Hart are so vast that I’m splitting this post into two, so as to properly cover the incredible intellectual accomplishments of these two economists.

The Finnish economist Bengt Holmstrom did his PhD in operations research at Stanford, advised by Robert Wilson, and began his career at my alma mater, the tiny department of Managerial Economics and Decision Sciences at Northwestern’s Kellogg School. To say MEDS struck gold with their hires in this era is an extreme understatement: in 1978 and 1979 alone, they hired Holmstrom and his classmate Paul Milgrom (another Wilson student from Stanford), hired Nancy Stokey, promoted Nobel laureate Roger Myerson to Associate Professor, and tenured an adviser of mine, Mark Satterthwaite. And this list doesn’t even include other faculty in the late 1970s and early 1980s like eminent contract theorist John Roberts, behavioralist Colin Camerer, mechanism designer John Ledyard or game theorist Ehud Kalai. This group was essentially put together by two senior economists at Kellogg, Nancy Schwartz and Stanley Reiter, who had the incredible foresight to realize both that applied game theory was finally showing promise of tackling first-order economic questions in a rigorous way, and that the folks with the proper mathematical background to tackle these questions were largely going unhired since they often did their graduate work in operations or mathematics departments rather than traditional economics departments. This market inefficiency, as it were, allowed Nancy and Stan to hire essentially every young scholar in what would become the field of mechanism design, and to develop a graduate program which combined operations, economics, and mathematics in a manner unlike any other place in the world.

From that fantastic group, Holmstrom’s contribution lies most centrally in the area of formal contract design. Imagine that you want someone – an employee, a child, a subordinate division, an aid contractor, or more generally an agent – to perform a task. How should you induce them to do this? If the task is “simple”, meaning the agent’s effort and knowledge about how to perform the task most efficiently are known and observable, you can simply pay a wage, cutting off payment if effort is not being exerted. When only the outcome of work can be observed, if there is no uncertainty in how effort is transformed into outcomes, knowing the outcome is equivalent to knowing effort, and hence optimal effort can be achieved via a bonus payment made on the basis of outcomes. All straightforward so far. The trickier situations, which Holmstrom and his coauthors analyzed at great length, are when effort is not directly observable and outcomes are only noisy signals of that effort.

Consider paying a surgeon. You want to reward the doctor for competent, safe work. However, it is very difficult to observe perfectly what the surgeon is doing at all times, and basing pay on outcomes has a number of problems. First, the patient outcome depends on the effort of not just one surgeon, but on others in the operating room and prep table: team incentives must be provided. Second, the doctor has many ways to shift the balance of effort between reducing costs to the hospital, increasing patient comfort, increasing the quality of the medical outcome, and mentoring young assistant surgeons, so paying on the basis of one or two tasks may distort effort away from other harder-to-measure tasks: there is a multitasking problem. Third, the number of medical mistakes, or the cost of surgery, that a hospital ought expect from a competent surgeon depends on changes in training and technology that are hard to know, and hence a contract may want to adjust payments for its surgeons on the performance of surgeons elsewhere: contracts ought take advantage of relevant information when it is informative about the task being incentivized. Fourth, since surgeons will dislike risk in their salary, the fact that some negative patient outcomes are just bad luck means that you will need to pay the surgeon very high bonuses to overcome their risk aversion: when outcome measures involve uncertainty, optimal contracts will weigh “high-powered” bonuses against “low-powered” insurance against risk. Fifth, the surgeon can be incentivized either by payments today or by keeping their job tomorrow, and worse, these career concerns may cause the surgeon to waste the hospital’s money on tasks which matter to the surgeon’s career beyond the hospital.

Holmstrom wrote the canonical paper on each of these topics. His 1979 paper in the Bell Journal of Economics shows that any information which reduces the uncertainty about what an agent actually did should feature in a contract, since by reducing uncertainty, you reduce the risk premium needed to incentivize the agent to accept the contract. It might seem strange that contracts in many cases do not satisfy this “informativeness principle”. For instance, CEO bonuses are often not indexed to the performance of firms in the same industry. If oil prices rise, essentially all oil firms will be very profitable, and this is true whether or not a particular CEO is a good one. Bertrand and Mullainathan argue that this is because many firms with diverse shareholders are poorly governed!
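
A quick simulation makes the informativeness principle concrete. This is a minimal sketch with invented parameters, not anything from Holmstrom’s paper: peer-firm performance is informative about the common industry shock, so netting it out shrinks the risk a CEO must bear for the same incentive power.

```python
# A minimal sketch, assuming toy variances: CEO output mixes effort, an
# industry-wide shock, and idiosyncratic noise. Peer firms share the common
# shock, so indexing pay to the peer average filters it out.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
industry = rng.normal(0, 2.0, n)              # common shock (e.g., oil price)
own_noise = rng.normal(0, 0.5, n)
peer_avg = industry + rng.normal(0, 0.1, n)   # peers share the common shock
effort = 1.0

raw_measure = effort + industry + own_noise
indexed_measure = raw_measure - peer_avg      # relative performance evaluation

print(f"variance of raw performance measure:  {raw_measure.var():.2f}")
print(f"variance of indexed measure:          {indexed_measure.var():.2f}")
# A risk-averse CEO demands a premium proportional to this variance, so the
# indexed contract is cheaper for shareholders -- yet indexing is rare.
```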

The simplicity of contracts in the real world may have more prosaic explanations. Jointly with Paul Milgrom, the famous “multitasking” paper published in JLEO in 1991 notes that contracts shift incentives across different tasks in addition to serving as risk-sharing mechanisms and as methods for inducing effort. Since bonuses on task A will cause agents to shift effort away from hard-to-measure task B, it may be optimal to avoid strong incentives altogether (just pay teachers a salary rather than a bonus based only on test performance) or to split job tasks (pay bonuses to teacher A who is told to focus only on math test scores, and pay a salary to teacher B who is meant to serve as a mentor). That outcomes are generated by teams also motivates simpler contracts. Holmstrom’s 1982 article on incentives in teams, published in the Bell Journal, points out that if both my effort and yours are required to produce a good outcome, then the marginal product of each of our efforts is equal to the entire value of what is produced, hence there is not enough output to pay each of us our marginal product. What can be done? Alchian and Demsetz had noticed this problem in 1972, arguing that firms exist to monitor the effort of individuals working in teams. With perfect knowledge of who does what, you can simply pay the workers a wage sufficient to induce the optimal effort, then collect the residual as profit. Holmstrom notes that the monitoring isn’t the important bit: rather, even shareholder-controlled firms where shareholders do no monitoring at all are useful. The reason is that shareholders can be residual claimants for profit, and hence there is no need to fully distribute profit to members of the team. Free-riding can therefore be eliminated by simply paying team members a wage of X if the team outcome is optimal, and 0 otherwise. Even a slight bit of shirking by a single agent drops their payment precipitously (which is impossible if all profits generated by the team are shared by the team), so the agents will not shirk. Of course, when there is uncertainty about how team effort transforms into outcomes, this harsh penalty will not work, and hence incentive problems may require team sizes to be smaller than that which is first-best efficient. A third justification for simple contracts is career concerns: agents work hard today to try to signal to the market that they are high quality, and do so even if they are paid a fixed wage. This argument had been made less formally by Fama, but Holmstrom (in a 1982 working paper finally published in 1999 in RESTUD) showed that this concern about the market only completely mitigates moral hazard if outcomes within a firm are fully observable to the market, or the future is not discounted at all, or there is no uncertainty about agents’ abilities. Indeed, career concerns can make effort provision worse; for example, agents may take actions to signal quality to the market which are negative for their current firm! A final explanation for simple contracts comes from Holmstrom’s 1987 paper with Milgrom in Econometrica. They argue that simple “linear” contracts, with a wage and a bonus based linearly on output, are more “robust” methods of solving moral hazard because they are less susceptible to manipulation by agents when the environment is not perfectly known. Michael Powell, a student of Holmstrom’s now at Northwestern, has a great set of PhD notes providing details of these models.
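
The team production result in particular rewards a worked example. Here is a minimal sketch with toy functional forms of my own: under budget-balanced equal sharing each worker internalizes only half of the marginal product and shirks, while a principal serving as budget breaker sustains first-best effort with a threshold payment.

```python
# A minimal sketch, assuming toy functional forms: output is e1 + e2 and
# effort costs e^2/2 for each worker, so first-best effort is e = 1.
import numpy as np

efforts = np.linspace(0, 2, 201)

# Budget-balanced equal sharing: I choose my effort taking yours as given.
def best_reply_sharing(e_other):
    payoff = (efforts + e_other) / 2 - efforts ** 2 / 2
    return efforts[np.argmax(payoff)]

e = 1.0
for _ in range(50):                            # iterate to the Nash equilibrium
    e = best_reply_sharing(e)
print(f"equal sharing: equilibrium effort = {e:.2f} (first best is 1.00)")

# Budget breaker: the principal pays each worker w = 0.6 iff output reaches
# the first-best level of 2, else 0, and keeps the residual. Any unilateral
# shirking drops output below 2 and the payment to zero.
def best_reply_threshold(e_other, w=0.6):
    payoff = np.where(efforts + e_other >= 2.0, w, 0.0) - efforts ** 2 / 2
    return efforts[np.argmax(payoff)]

print(f"budget breaker: best reply to e=1 is {best_reply_threshold(1.0):.2f}")
```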

These questions are reasonably intuitive, but the way Holmstrom answered them is not. Think about how an economist before the 1970s, like Adam Smith in his famous discussion of the inefficiency of sharecropping, might have dealt with these problems. These economists had few tools to deal with asymmetric information, so although economists like George Stigler analyzed the economic value of information, the question of how to elicit information useful to a contract could not be discussed in any systematic way. These economists would have been burdened by the fact that the number of contracts one could write is infinite, so beyond saying that a contract of type X does not equate marginal cost to marginal revenue, the question of which “second-best” contract is optimal is extraordinarily difficult to answer in the absence of beautiful tricks like the revelation principle, partially developed by Holmstrom himself. To develop those tricks, a theory of how individuals would respond to changes in their joint incentives over time was needed; the ideas of Bayesian equilibria and subgame perfection, developed by Harsanyi and Selten, were unknown before the 1960s. The accretion of tools developed by pure theory finally permitted, in the late 1970s and early 1980s, an absolute explosion of developments of great use to understanding the economic world. Consider, for example, the many results in antitrust provided by Nobel winner Jean Tirole, discussed here two years ago.

Holmstrom’s work has provided me with a great deal of understanding of why innovation management looks the way it does. For instance, why would a risk neutral firm not work enough on high-variance moonshot-type R&D projects, a question Holmstrom asks in his 1989 JEBO Agency Costs and Innovation? Four reasons. First, in Holmstrom and Milgrom’s 1987 linear contracts paper, optimal risk sharing leads to more distortion by agents the riskier the project being incentivized, so firms may choose lower expected value projects even if they themselves are risk neutral. Second, firms build reputation in capital markets just as workers do with career concerns, and high variance output projects are more costly in terms of the future value of that reputation when the interest rate on capital is lower (e.g., when firms are large and old). Third, when R&D workers can potentially pursue many different projects, multitasking suggests that workers should be given small and very specific tasks so as to lessen the potential for bonus payments to shift worker effort across projects. Smaller firms with fewer resources may naturally have limits on the types of research a worker could pursue, which surprisingly makes it easier to provide strong incentives for research effort on the remaining possible projects. Fourth, multitasking suggests agents’ tasks should be limited, and that high variance tasks should be assigned to the same agent, which provides a role for decentralizing research across firm types, with large firms providing incremental, safe research, and small firms performing high-variance research. That many aspects of firm organization depend on the swirl of conflicting incentives the firm and the market provide is a topic Holmstrom has also discussed at length, especially in his beautiful paper “The Firm as an Incentive System”; I shall reserve discussion of that paper for a subsequent post on Oliver Hart.

Two final light notes on Holmstrom. First, he is the source of one of my favorite stories about Paul Samuelson, the greatest economic theorist of all time. Samuelson was known for having a steel trap of a mind. At a light trivia session during a house party for young faculty at MIT, Holmstrom snuck in a question, as a joke, asking for the name of the third President of independent Finland. Samuelson not only knew the name, but apparently was also able to digress on the man’s accomplishments! Second, I mentioned at the beginning of this post the illustrious roster of theorists who once sat at MEDS. Business school students are often very hesitant to deal with formal models, partially because they lack a technical background but also because there is a trend of “dumbing down” in business education whereby many schools (of course, not including my current department at The University of Toronto Rotman!) are more worried about student satisfaction than student learning. With perhaps Stanford GSB as an exception, it is inconceivable that any school today, Northwestern included, would gather such an incredible collection of minds working on abstract topics whose applicability to tangible business questions might lie years in the future. Indeed, I could name a number of so-called “top” business schools who have nobody on their faculty who has made any contribution of note to theory! There is a great opportunity for a Nancy Schwartz or Stan Reiter of today to build a business school whose students will have the ultimate reputation for rigorous analysis of social scientific questions.

Yuliy Sannikov and the Continuous Time Approach to Dynamic Contracting

The John Bates Clark Award, given to the best economist in the United States under 40, went to Princeton’s Yuliy Sannikov today. The JBC has, in recent years, been tilted quite heavily toward applied empirical microeconomics, but the prize for Sannikov breaks that streak in striking fashion. Sannikov, it can be fairly said, is a mathematical genius and a high theorist of the first order. He is one of a very small number of people to win three gold medals at the International Math Olympiad – perhaps only Gabriel Carroll, another excellent young theorist, had an equally impressive mathematical background in his youth. Sannikov’s most famous work is in the pure theory of dynamic contracting, which I will spend most of this post discussing, but the methods he has developed turn out to have interesting uses in corporate finance and in macroeconomic models that wish to incorporate a financial sector without using linearization techniques that rob such models of much of their richness. A quick warning: Sannikov’s work is not for the faint of heart, and certainly not for those scared of an equation or two. Economists – and I count myself among this group – are generally scared of differential equations, as they don’t appear in most branches of economic theory (with exceptions, of course: Romer’s 1986 work on endogenous growth, the turnpike theorems, the theory of evolutionary games, etc.). As his work is incredibly technical, I will do my best to provide an overview of his basic technique and its uses without writing down a bunch of equations, but there really is no substitute for going to the mathematics itself if you find these ideas interesting.

The idea of dynamic contracting is an old one. Assume that a risk-neutral principal can commit to a contract that pays an agent on the basis of observed output, with that output being generated this year, next year, and so on. A risk-averse agent takes an unobservable action in every period, which affects output subject to some uncertainty. Payoffs in the future are discounted. Take the simplest possible case: there are two periods, an agent can either work hard or not, output is either 1 or 0, and the probability it is 1 is higher if the agent works hard than otherwise. The first big idea in the dynamic moral hazard of the late 1970s and early 1980s (in particular, Rogerson 1985 Econometrica, Lambert 1983 Bell J. Econ, Lazear and Moore 1984 QJE) is that the optimal contract will condition period 2 payoffs on whether there was a good or bad outcome in period 1; that is, payoffs are history-dependent. The idea is that you can use payoffs in period 2 to induce effort in period 1 (because continuation value increases) and in period 2 (because there is a gap between the payment following good or bad outcomes in that period), getting more bang for your buck. Get your employee to work hard today by dangling a chance at a big promotion opportunity tomorrow, then actually give them the promotion if they work hard tomorrow.

The second big result is that dynamic moral hazard (caveat: at least in cases where saving isn’t possible) isn’t such a problem. In a one-shot moral hazard problem, there is a tradeoff between risk aversion and high powered incentives. I either give you a big bonus when things go well and none if things go poorly (in which case you are induced to work hard, but may be unhappy because much of the bonus is based on things you can’t control), or I give you a fixed salary and hence you have no incentive to work hard. The reason this tradeoff disappears in a dynamic context is that when the agent takes actions over and over and over again, the principal can, using a Law of Large Numbers type argument, figure out exactly the frequency at which the agent has been slacking off. Further, when the agent isn’t slacking off, the uncertainty in output each period is just i.i.d., hence the principal can smooth out the agent’s bad luck, and hence as the discount rate goes to zero there is no tradeoff between providing incentives and the agent’s dislike of risk. Both of these results will hold even in infinite period models, where we just need to realize that all the agent cares about is her expected continuation value following every action, and hence we can analyze infinitely long problems in a very similar way to two period problems (Spear and Srivastava 1987).
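
The Law of Large Numbers argument fits in a few lines of simulation. A minimal sketch with invented success probabilities, not a model from the papers above: one period of output says almost nothing about effort, but the success frequency over many periods cleanly separates diligent workers from even part-time shirkers.

```python
# A minimal sketch, assuming toy probabilities: success arrives with
# probability 0.6 under effort and 0.4 under shirking. A single period is
# nearly uninformative; long histories reveal even occasional shirking.
import numpy as np

rng = np.random.default_rng(1)
for T in [1, 10, 100, 10_000]:
    honest = rng.random(T) < 0.6
    shirker = rng.random(T) < (0.5 * 0.6 + 0.5 * 0.4)   # shirks half the time
    print(f"T={T:>6}: honest frequency={honest.mean():.2f}, "
          f"part-time shirker frequency={shirker.mean():.2f}")
# As T grows the frequencies separate, so harsh punishments can be reserved
# for statistically damning histories while i.i.d. luck is smoothed away.
```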

Sannikov revisited this literature by solving for optimal or near-to-optimal contracts when agents take actions in continuous rather than discrete time. Note that the older literature generally used dynamic programming arguments and took the discount rate to a limit of zero in order to get interesting results. These dynamic programs generally were solved using approximations that formed linear programs, and hence precise intuition about why the model was generating particular results in particular circumstances was hard to come by. Comparative statics in particular were tough – I can tell you whether an efficient contract exists, but it is tough to know how that efficient contract changes as the environment changes. Further, situations where discounting is positive are surely of independent interest – workers generally get performance reviews every year, contractors generally do not renegotiate continuously, etc. Sannikov wrote a model where an agent takes actions that continuously control the drift of output, which otherwise evolves as a Brownian motion (a nice analogue of the discrete-time setting where the agent’s action each period generates output that depends on the action and some random term). The agent has the usual decreasing marginal utility of income, so as the agent gets richer over time, it becomes tougher to incentivize the agent with a few extra bucks of payment.

Solving for the optimal contract essentially involves solving two embedded dynamic optimization problems. The agent optimizes effort over time given the contract the principal committed to, and the principal chooses an optimal dynamic history-dependent contract given what the agent will do in response. The space of possible history-dependent contracts is enormous. Sannikov shows that you can massively simplify, and solve analytically, for the optimal contract using a four step argument.

First, as in the discrete time approach, we can simplify things by noting that the agent only cares about their continuous-time continuation value following every action they make. The continuation value turns out to be a martingale (conditioning on history, my expectation of the continuation value tomorrow is just my continuation value today), and is basically just a ledger of the promises that I have made to the agent about the future on the basis of what happened in the past. Therefore, to solve for the optimal contract, I should just solve for the optimal stochastic process that determines the continuation value over time. The Martingale Representation Theorem tells me exactly and uniquely what that stochastic process must look like, under the constraint that the continuation value accurately “tracks” past promises. This stochastic process turns out to have a particular analytic form with natural properties (e.g., if you pay flow utility today, you can pay less tomorrow) that depend on the actions the agent takes. Second, plug the agent’s incentive compatibility constraint into our equation for the stochastic process that determines the continuation value over time. Third, we just maximize profits for the principal given the stochastic process determining continuation payoffs that must be given to the agent. The principal’s problem determines an HJB equation which can be solved using Ito’s rule plus some effort in checking boundary conditions – I’m afraid these details are far too complex for a blog post. But the basic idea is that we wind up with an analytic expression for the optimal way to control the agent’s continuation value over time, and we can throw all sorts of comparative statics right at that equation.
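
For readers who want to see the controlled object explicitly, here is a rough Euler discretization of the continuation-value dynamics in Sannikov’s 2008 ReStud model. It is only a sketch: I fix effort and consumption at arbitrary constants rather than solving the principal’s HJB equation, and the functional forms are simplified stand-ins.

```python
# A rough sketch, not a solution of the model: output follows
# dX = a dt + sigma dZ, and promise-keeping forces the continuation value to
# follow dW = r(W - u(c) + h(a)) dt + r*Y*(dX - a dt), with the sensitivity
# Y pinned down by incentive compatibility at Y = h'(a). Effort a and
# consumption c are held constant here purely for illustration.
import numpy as np

r, sigma, dt, T = 0.1, 1.0, 0.01, 10.0
a, c = 0.5, 0.3                           # arbitrary fixed effort/consumption
u = lambda c: np.sqrt(c)                  # agent's utility of consumption
h, h_prime = lambda a: 0.5 * a ** 2, lambda a: a   # effort cost and its slope

rng = np.random.default_rng(2)
W = 1.0                                   # initial promised value
for _ in range(int(T / dt)):
    dX = a * dt + sigma * rng.normal(0, np.sqrt(dt))  # observed output
    Y = h_prime(a)                        # IC: sensitivity to output surprises
    W += r * (W - u(c) + h(a)) * dt + r * Y * (dX - a * dt)

print(f"continuation value after {T} years: {W:.3f}")
# The principal's real problem is to choose (a, c) as functions of W alone,
# which is what reduces the contracting problem to a tractable ODE.
```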

What does this method give us? Because the continuation value and the flow payoffs can be constructed analytically even for positive discount rates, we can actually answer questions like: should you use long-term incentives (continuation value) or short-term incentives (flow payoffs) more when, e.g., your workers have a good outside option? What happens as the discount rate increases? What happens if the uncertainty in the mapping between the agent’s actions and output increases? Answering questions of these types is very challenging, if not impossible, in a discrete time setting.

Though I’ve presented the basic Sannikov method in terms of incentives for workers, dynamic moral hazard – settings where unobservable actions control prices, output, or other economic parameters, and where we care how various institutions or contracts affect those unobservable actions – is a widespread problem. Brunnermeier and Sannikov have a nice recent AER which builds on the intuition of Kiyotaki-Moore models of the macroeconomy with financial acceleration. The essential idea is that small shocks in the financial sector may cause bigger real economy shocks due to deleveraging. Brunnermeier and Sannikov use the continuous-time approach to show important nonlinearities: minor financial shocks don’t do very much since investors and firms rely on their existing wealth, but major shocks off the steady state require capital sales which further depress asset prices and lead to further fire sales. A particularly interesting result is that if exogenous risk is low – the economy isn’t very volatile – then there isn’t much precautionary saving, and so a shock that hits the economy will cause major harmful deleveraging and hence endogenous risk. That is, the very calmness of the world economy since 1983 may have made the eventual recession in 2008 worse due to endogenous choices of cash versus asset holdings. Further, capital requirements may actually be harmful if they aren’t reduced following shocks, since those very capital requirements will force banks to deleverage, accelerating the downturn started by the shock.

Sannikov’s entire oeuvre is essentially a graduate course in a new technique, so if you find the results described above interesting, it is worth digging deep into his CV. He is a great choice for the Clark medal, particularly given the deep and rigorous applications he has found for his theory in recent years. There really is no simple version of his results, but his 2012 survey, his recent working paper on moral hazard in labor contracts, and his dissertation work published in Econometrica in 2007 are most relevant. In related work, we’ve previously discussed on this site David Rahman’s model of collusion with continuous-time information flow, a problem very much related to work by Sannikov and his coauthor Andrzej Skrzypacz, as well as Aislinn Bohren’s model of reputation, which is related to the single longest theory paper I’ve ever seen, Faingold and Sannikov’s Econometrica on the possibility of “fooling people” by pretending to be a type that you are not. I also like that this year’s JBC makes me look like a good prognosticator: Sannikov is one of a handful of names I’d listed as particularly deserving just two years ago when Gentzkow won!

“The Contributions of the Economics of Information to Twentieth Century Economics,” J. Stiglitz (2000)

There have been three major methodological developments in economics since 1970. First, following the Lucas Critique we are reluctant to accept policy advice which is not the result of directed behavior on the part of individuals and firms. Second, developments in game theory have made it possible to reformulate questions like “why do firms exist?”, “what will result from regulating a particular industry in a particular way?”, “what can I infer about the state of the world from an offer to trade?”, among many others. Third, imperfect and asymmetric information was shown to be of first-order importance for analyzing economic problems.

Why is information so important? Prices, Hayek taught us, solve the problem of asymmetric information about scarcity. The price vector is a sufficient statistic for everything one needs to know about production processes in every firm, as far as generating efficient behavior is concerned. The simple existence of asymmetric information, then, is not obviously a problem for economic efficiency. And if asymmetric information about big things like scarcity across society does not obviously matter, then how could imperfect information about minor things matter? A shopper, for instance, may not know exactly the price of every car at every dealership. But “Natura non facit saltum”, Marshall once claimed: nature does not make leaps. Tiny deviations from the assumptions of general equilibrium, on this view, do not have large consequences.

But Marshall was wrong: nature does make leaps when it comes to information. The search model of Peter Diamond, most famously, showed that arbitrarily small search costs lead to firms charging the monopoly price in equilibrium, hence a welfare loss completely out of proportion to the search costs. That is, information costs and asymmetries, even very small ones, can theoretically be very problematic for the Arrow-Debreu welfare properties.
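
The logic of Diamond’s result fits in a few lines. A minimal sketch, with toy numbers of my own: if every firm charges a price below the monopoly level, one firm can raise its price by (almost) the search cost without losing a single customer, since no shopper will pay the cost to save less than the cost. Iterating that deviation walks the price all the way up, no matter how small the search cost.

```python
# A minimal sketch, assuming toy numbers: starting from any common price,
# a deviation of size s (the search cost) is profitable until the monopoly
# price is reached, so the equilibrium price is the monopoly price for any
# s > 0.
def equilibrium_price(p_start, s, p_monopoly):
    p = p_start
    while p < p_monopoly:
        p = min(p + s, p_monopoly)        # profitable unilateral deviation
    return p

for s in [1.0, 0.1, 0.001]:
    print(f"search cost {s}: equilibrium price = "
          f"{equilibrium_price(p_start=0.0, s=s, p_monopoly=10.0)}")
```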

Even more interesting, we learned that prices are more powerful than we’d believed. They convey information about scarcity, yes, but also information about other people’s own information or effort. Consider, for instance, efficiency wages. A high wage is not merely a signal of scarcity for a particular type of labor, but is simultaneously an effort inducement mechanism. Given this dual role, it is perhaps not surprising that general equilibrium is no longer Pareto optimal, even if the planner is as constrained informationally as each agent.

How is this? Decentralized economies may, given information cost constraints, exert too much effort searching, or generate inefficient separating equilibria that unravel trades. The beautiful equity/efficiency separation of the Second Welfare Theorem does not hold in a world of imperfect information. A simple example on this point is that it is often useful to allow some agents suffering moral hazard worries to “buy the firm”, mitigating the incentive problem, but limited liability means this may not happen unless those particular agents begin with a large endowment. That is, a different endowment, where the agents suffering extreme moral hazard problems begin with more money and are able to “buy the firm”, leads to more efficient production (potentially in a Pareto sense) than an endowment where those workers must be provided with information rents in an economy-distorting manner.

It is a strange fact that many social scientists feel economics to some extent stopped progressing by the 1970s. All the important basic results were, in some sense, known. How untrue this is! Imagine labor without search models, trade without monopolistically competitive equilibria, IO or monetary policy without mechanism design, finance without formal models of price discovery and equilibrium noise trading: all would be impossible given the tools we had in 1970. The explanations that preceded modern game theoretic and information-laden explanations are quite extraordinary: Marshall observed that managers have interests different from owners, yet nonetheless are “well-behaved” in running firms in a way acceptable to the owner. His explanation was to credit British upbringing and morals! As Stiglitz notes, this is not an explanation we would accept today. Rather, firms have used a number of intriguing mechanisms to structure incentives in a way that limits agency problems, and we now possess the tools to analyze these mechanisms rigorously.

Final 2000 QJE (RePEc IDEAS)

“Optimal Contracts for Experimentation,” M. Halac, N. Kartik & Q. Liu (2013)

Innovative activities have features not possessed by more standard modes of production. The eventual output, and its value, are subject to a lot of uncertainty. Effort can be difficult to monitor – it is often the case that the researcher knows more than management about what good science should look like. The inherent skill of the scientist is hard to observe. Output is generally only observed in discrete bunches.

These features make contracting for researchers inherently challenging. The classic reference here is Holmstrom’s 1989 JEBO, which just applies his great 1980s incentive contract papers to innovative activities. Take a risk-neutral firm. They should just work on the highest expected value project, right? Well, if workers are risk averse and supply unobserved effort, the optimal contract balances moral hazard (I would love to just pay you based on your output) and risk insurance (I would have to pay you to bear risk about the eventual output of the project). It turns out that the more uncertainty a project has, the more inefficient the information-constrained optimal contract becomes, so that even risk-neutral firms are biased toward relatively safe, lower expected value projects. Incentives within the firm matter in many other ways, as Holmstrom also points out: giving an employee multiple tasks when effort is unobserved makes it harder to provide proper incentives because the opportunity cost of any given project goes up, firms with a good reputation in capital markets will be reluctant to pursue risky projects since the option value of variance in reputation is lower (a la Doug Diamond’s 1989 JPE), and so on. Nonetheless, the first order problem of providing incentives for a single researcher on a single project is hard enough!

Holmstrom’s model doesn’t have any adverse selection, however: both employer and employee know what expected output will result from a given amount of effort. Nor is Holmstrom’s problem dynamic. Marina Halac, Navin Kartik and Qingmin Liu have taken up the unenviable task of solving the dynamic researcher contracting problem under adverse selection and moral hazard. Let a researcher be either a high type or a low type. In every period, the researcher can work on a risky project at cost c, or shirk at no cost. The project is either feasible or not, with prior probability b that it is feasible. If the employee shirks, or the project is bad, there will be no invention this period. If the employee works, the project is feasible, and the employee is a high type, the project succeeds with probability L1, and if the employee is low type, with probability L2<L1. Note that as time goes on, if the employee works on the risky project, they continually update their beliefs about b. If enough time passes without an invention, the belief about b becomes low enough that everyone (efficiently) stops working on the risky project. The firm’s goal is to get employees to exert optimal effort for the optimal number of periods given their type.

Here’s where things really get tricky. Who, in expectation and assuming efficient behavior, stops working on the risky project earlier conditional on not having finished the invention, the high type or the low type? On the one hand, for any belief about b, the high type is more likely to invent, hence since costs are identical for both types, the high type should expect to keep working longer. On the other hand, the high type learns more quickly whether the project is bad, and hence his belief about b declines more rapidly, so he ought expect to work for less time. That either case is possible makes solving for the optimal contract a real challenge, because I need to write the contracts for each type such that the low type does not ever prefer the high type’s payoffs and vice versa. To know whether these contracts are incentive compatible, I have to know what agents will do if they deviate to the “wrong” contract. The usual trick here is to use a single crossing result along the lines of “for any contract with properties P, action Y is more likely for higher types”. In the dynamic researcher problem, since efficient stopping times can vary nonmonotonically with researcher type, the single crossing trick doesn’t look so useful.
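
Both orderings are easy to generate numerically. Below is a minimal sketch with made-up parameters, not calibrations from the paper: beliefs fall after each failure by Bayes’ rule, work continues while the expected benefit of effort covers its cost, and the two parameter cases deliver opposite rankings of the efficient stopping times.

```python
# A minimal sketch, assuming toy parameters: belief b that the project is
# feasible falls after each failure (faster for the high type, whose success
# probability L is larger). Working is efficient while b * L * V >= c.
def stopping_time(b0, L, V, c, max_t=100):
    b, t = b0, 0
    while b * L * V >= c and t < max_t:   # expected benefit covers effort cost
        b = b * (1 - L) / (b * (1 - L) + (1 - b))  # Bayes update after failure
        t += 1
    return t

for label, (b0, V, c, L1, L2) in {
    "case 1": (0.5, 10.0, 1.0, 0.9, 0.3),    # high type stops first
    "case 2": (0.95, 1.0, 0.4, 0.5, 0.45),   # low type stops first
}.items():
    print(f"{label}: high type works {stopping_time(b0, L1, V, c)} periods, "
          f"low type works {stopping_time(b0, L2, V, c)} periods")
```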

The “simple” (where simple means a 30 page proof) case is when the higher types efficiently work longer in expectation. The information-constrained optimum involves inducing the high type to work efficiently, while providing the low type too little incentive to work for the efficient amount of time. Essentially, the high type is willing to work for less money per period if only you knew who he was. Asymmetric information means the high type can extract information rents. By reducing the incentive for the low type to work in later periods, the high type information rent is reduced, and hence the optimal mechanism trades off lower total surplus generated by the low type against lower information rents paid to the high type.

This constrained-optimal outcome can be implemented by paying scientists up front, and then letting them choose either a contract with progressively increasing penalties for lack of success each period, or a contract with a single large penalty if no success is achieved by the socially efficient high-type stopping time. Penalty contracts are also nice because they remain optimal even if scientists can keep their results secret: since secrecy just means paying more penalties, everyone has an incentive to reveal their invention as soon as they create it. The proof is worth going through if you’re into dynamic mechanism design; essentially, the authors are using a clever set of relaxed problems where a form of single crossing will hold, then showing that the resulting mechanism is feasible even under the actual problem constraints.

Finally, note that if there is only moral hazard (scientist type is observable) or only adverse selection (effort is observable), the efficient outcome is easy. With moral hazard, just make the agent pay the expected surplus up front, and then provide a bonus to him each period equal to the firm’s profit from an invention occurring then; we usually say in this case that “the firm is sold to the employee”. With adverse selection, we can contract on optimal effort, using total surplus to screen types as in the correlated information mechanism design literature. Even though the “distortion only at the bottom” result looks familiar from static adverse selection, the rationale here is different.

Sept 2013 working paper (No RePEc IDEAS version). The article appears to be under R&R at ReStud.

“Competition in Persuasion,” M. Gentzkow & E. Kamenica (2012)

How’s this for fortuitous timing: I’d literally just gone through this paper by Gentzkow and Kamenica yesterday, and this morning it was announced that Gentzkow is the winner of the 2014 Clark Medal! More on the Clark in a bit, but first, let’s do some theory.

This paper is essentially the multiple sender version of the great Bayesian Persuasion paper by the same authors (discussed on this site a couple years ago). There are a group of experts who can (under commitment to only sending true signals) send costless signals about the realization of the state. Given the information received, the agent makes a decision, and each expert gets some utility depending on that decision. For example, the senders might be a prosecutor and a defense attorney who know the guilt of a suspect, and the agent a judge. The judge convicts if p(guilty)>=.5, the prosecutor wants to maximize convictions regardless of underlying guilt, and vice versa for the defense attorney. Here’s the question: if we have more experts, or less collusive experts, or experts with less aligned interests, is more information revealed?

A lot of our political philosophy is predicated on more competition in information revelation leading to more information actually being revealed, but this is actually a fairly subtle theoretical question! For one, John Stuart Mill and others of his persuasion would need some way of discussing how people competing to reveal information strategically interact, and to the extent that this strategic interaction is non-unique, they would need a way of “ordering” sets of potentially revealed information. We are lucky in 2014, thanks to our friends Nash and Topkis, to be able to deal nicely with each of those concerns.

The trick to solving this model (basically every proof in the paper comes down to algebra and some simple results from set theory; they are clever but not technically challenging) is the main result from the Bayesian Persuasion paper. Draw a graph with the agent’s posterior belief on the x-axis and the utility the sender gets from actions taken under each posterior (call this u) on the y-axis. Now draw the smallest concave function (call it V) that is everywhere weakly greater than u. If V is strictly greater than u at the prior p, then the sender can improve her payoff by revealing information. Take the case of the judge and the prosecutor. If the judge’s prior is that everyone brought before them is guilty with probability .6, then the prosecutor reveals no information about any suspect, and the judge always convicts (giving the prosecutor utility 1 rather than the 0 she gets from an acquittal). If, however, the judge’s prior is that everyone is guilty with probability .4, then the prosecutor can judiciously reveal information such that 80 percent of suspects are convicted. How? Pool all of the guilty people with 2/3 of the innocent people and send a single signal that leaves the judge with posterior p(guilty)=.5 on each member of that pool, while revealing the remaining 1/3 of innocent people to be innocent with probability 1. This signal structure is Bayes plausible: the posteriors average back to the prior. The judge convicts everyone with p(guilty)=.5, meaning 80 percent of all suspects are convicted. If you draw the graph described above with u=1 when the judge convicts and u=0 otherwise, it is clear that V>u exactly when 0<p<.5, hence information is revealed only in that case.
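
Here is a quick numerical check of both the Bayes plausibility arithmetic and the concavification; the grid and the monotone-chain envelope construction are implementation choices of mine, not anything in the paper:

```python
import numpy as np

# Verify the prosecutor's optimal signal at prior p(guilty) = 0.4.
prior = 0.4
pr_G = 0.4 + (2 / 3) * 0.6        # "looks guilty" signal: all guilty + 2/3 of innocent
post_G = 0.4 / pr_G               # posterior on that signal = 0.5, so the judge convicts
pr_I = (1 / 3) * 0.6              # "innocent" signal: the remaining innocents
post_I = 0.0
assert abs(pr_G * post_G + pr_I * post_I - prior) < 1e-12   # Bayes plausible
print(f"conviction rate: {pr_G:.0%}")                        # 80% of all suspects

# Concavification: V is the smallest concave function weakly above u, here
# computed as the upper concave envelope of u on a grid (monotone chain).
grid = np.linspace(0, 1, 1001)
u = (grid >= 0.5).astype(float)   # prosecutor's payoff at each posterior

hull = []                         # indices of the envelope's kink points
for i in range(len(grid)):
    while len(hull) >= 2 and (
        (u[hull[-1]] - u[hull[-2]]) * (grid[i] - grid[hull[-2]])
        <= (u[i] - u[hull[-2]]) * (grid[hull[-1]] - grid[hull[-2]])
    ):
        hull.pop()                # drop points lying on or below the chord
    hull.append(i)
V = np.interp(grid, grid[hull], u[hull])

gain = grid[V > u + 1e-9]
print(f"persuasion has value for priors in ({gain.min():.3f}, {gain.max():.3f})")
```

The printed interval matches the claim above: persuasion has value for the prosecutor exactly at interior priors below 1/2.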

What about when there are multiple senders with different utilities u? The result is somewhat intuitive: more information is always weakly better for the agent, almost by definition (remember Blackwell!). If any sender can improve her payoff by revealing information given what has been revealed so far, we are not in equilibrium: that sender has an incentive to deviate by revealing more. Therefore, adding senders increases the amount of information revealed and “shrinks” the set of beliefs the agent might wind up holding; further, the authors show that any Bayes plausible distribution of beliefs at which no sender can profitably reveal more information is an equilibrium outcome. There remain a number of technical details concerning multiplicity of equilibria, but the authors show these results hold in a set-order sense as well. This characterization is great in practice: to check equilibrium information revelation, I only need to check where V and u diverge sender by sender, without worrying about complex strategic interactions. Because of that simplicity, it ends up being very easy to show that removing collusion among senders, or increasing the number of senders, will improve information revelation in equilibrium.
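
Here is a grid-search sketch of that sender-by-sender equilibrium check for the courtroom example; the brute-force search over binary splits is my own shortcut for “some sender can profitably reveal more information”:

```python
import numpy as np

# Judge convicts iff p >= 0.5; prosecutor wants convictions, defense acquittals.
grid = np.linspace(0, 1, 101)
u_pros = lambda p: float(p >= 0.5)
u_def = lambda p: 1.0 - float(p >= 0.5)

def gains_by_splitting(u, p):
    """Can a sender with payoff u profit at belief p by splitting it into a
    Bayes-plausible pair of posteriors q1 < p < q2?"""
    for q1 in grid[grid < p]:
        for q2 in grid[grid > p]:
            lam = (q2 - p) / (q2 - q1)        # weight on posterior q1
            if lam * u(q1) + (1 - lam) * u(q2) > u(p) + 1e-9:
                return True
    return False

stable = [float(p) for p in grid
          if not gains_by_splitting(u_pros, p) and not gains_by_splitting(u_def, p)]
print(stable)   # with both senders present, only p = 0.0 and p = 1.0 survive:
                # competition between opposed senders yields full revelation
```

Running the same check with the prosecutor alone leaves every belief at or above .5 stable, so adding the opposed sender strictly expands revelation, which is the flavor of the paper’s comparative static.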

September 2012 working paper (IDEAS version).

A brief word on the Clark medal. Gentzkow is a fine choice, particularly for his Bayesian persuasion papers, which are already very influential; I have no doubt that 30 years from now you will still see the 2011 paper on many PhD syllabi. That said, the Clark medal announcement is very strange: it focuses heavily on his empirical work on newspapers and TV and mentions his hugely influential theory only as a small aside! This means that five of the last six Clark medal winners, everyone but Levin and his relational incentive contracts, have been cited primarily for MIT/QJE-style theory-light empirical microeconomics. Even though I am primarily an applied microeconomist myself, I still see this as a very odd trend: no prizes for Chernozhukov or Tamer in metrics, or Sannikov in theory, or Farhi and Werning in macro, or Melitz and Costinot in trade, or Donaldson and Nunn in history? I understand these papers are harder to explain to the media, but it is not a good thing when the second most prominent prize in our profession essentially ignores 90% of what economists actually do.
