Category Archives: History of Economic Thought

Operations Research and the Rise of Applied Game Theory – A Nobel for Milgrom and Wilson

Today’s Nobel Prize to Paul Milgrom and Robert Wilson is the capstone of an incredibly fruitful research line which began in the 1970s in a few small departments of Operations Research. Game theory, or the mathematical study of strategic interaction, dates back to work by Zermelo, Borel and von Neumann in the early 20th century. The famed book by von Neumann and Morgenstern was published in 1944, and widely reviewed as one of the most important social scientific works of the century. And yet, it would be three decades before applications of game theory revolutionized antitrust, organizational policy, political theory, trade, finance, and more. Alongside the “credibility revolution” of causal econometrics, and to a lesser extent behavioral economics, applied game theory has been the most important development in economics in the past half century. The prize to Milgrom and Wilson is likely the final one that will be awarded for early applied game theory, joining those in 1994, 2005, 2007, 2014 and 2016 that elevated the abstractions of the 1940s to today’s rigorous interpretations of so many previously disparate economic fields.

Neither Wilson nor Milgrom was trained in a pure economics department. Wilson came out of the decision sciences program of Howard Raiffa at Harvard, and Milgrom was a student of Wilson’s at Stanford Business School. However, the link between operations research and economics is a long one, with the former field often serving as a vector for new mathematical tools before the latter field was quite ready to accept them. In the middle of the century, the mathematics of optimal control and dynamic programming – how to solve problems where today’s action affects tomorrow’s possibilities – were applied to resource allocation by Kantorovich in the Soviet Union and to market economics problems in the West by Koopmans, Samuelson, Solow, and Dorfman. Luce and Raiffa explained how the theory of games and the ideas of Bayesian decision theory apply to social scientific problems. Stan Reiter’s group first at Purdue, then later with Nancy Schwartz at Kellogg MEDS, formally brought operations researchers and economists into the same department to apply these new mathematics to economic problems.

The real breakthrough, however, was the arrival of Bayesian games and subgame perfection from Harsanyi (1968) and Selten (1965, 1975). These tools in combination allow us to study settings where players signal, make strategic moves, bluff, attempt to deter, and so on. From the perspective of an institutional designer, they allow us, alongside Myerson’s revelation principle, to follow Hayek’s ideas formally and investigate how we should organize an economic activity given the differing information and possible actions of each player. Indeed, the Wilson Doctrine argues that practical application of game theory requires attention to these informational features. There remains a more complete intellectual history to be written here, but Paul Milgrom and Al Roth’s mutual interview in the JEP provides a great sense of the intellectual milieu of the 1970s as they developed their ideas. Wilson, the Teacher, and Milgrom, the Popularizer, were at the heart of showing just how widely these new tools in game theory could be applied.

Let us begin with the Popularizer. Milgrom was born and raised in Michigan, taking part in anti-war and anti-poverty protests as a radical student in Ann Arbor in the late 1960s. The 1960s were a strange time, and so Milgrom went straight from the world of student activism to the equally radical world of…working as an actuary for an insurance company. After enrolling in the MBA program at Stanford in the mid-1970s, he was invited to pursue a PhD under his co-laureate Robert Wilson, who, as we shall see, was pursuing an incredibly lucrative combination of operations research and economics with his students. It is hard to overstate how broad Milgrom’s contributions have been, both theoretically and in practice. But we can get a good taste by looking at four: the multitasking problem and the no-trade theorem on the theoretical side, and medieval guilds and modern spectrum auctions on the applied side.

It is perhaps surprising that Milgrom’s most-cited paper was published in the JLEO, well into his career. But the famed multitasking paper is so incredibly informative. The idea is simple: you can motivate someone either with direct rewards or by changing their opportunity cost. For instance, if you want a policeman to walk the beat more often, then make their office particularly dull and full of paperwork. Workers generally have many tasks they can work on, however, which vary in their relative costs. For example, a cop can slack off, arrest people on nonsense charges, or solve murders. Making their office dull will cause them to sit at their office desk for fewer hours, but it likely won’t cause them to solve murders rather than arrest people on nonsense charges. Why not just pay for the solved murders directly? Often it is impossible to measure or observe everything you want done.

If you “reward A while hoping for B”, as Steven Kerr’s famous management paper puts it, you are likely to get a lot of A. If you pay rewards for total arrests, your cops will cite people for riding bikes with no lights. So what can be done? Milgrom and Holmstrom give a simple model where workers exert effort, do some things you can measure and some which you cannot, and you get a payoff depending on both. If a job has some things you care about which are hard to measure, you should use weaker incentives on the things you can measure: by paying cops for arrests, you raise the opportunity cost of solving murders for the cops who like doing so, because now they give up the reward they would have earned from citing bicyclists every time they work a murder! Further, you should give workers working on hard-to-measure tasks little job flexibility. The murder cop paid on salary should need to show her face in the office, while the meter maid getting paid based on how many tickets she gives already has a good reason not to shirk while on the clock. Once you start thinking about multitasking and the interaction of incentives with opportunity costs, you start seeing perverse incentives absolutely everywhere.
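
To see the mechanism in miniature, here is a toy numerical sketch in Python. Every parameter value is invented, and this is a cartoon in the spirit of the multitasking model rather than the actual Holmstrom-Milgrom setup: the agent splits effort between a measured task that pays a piece rate and an unmeasured task she cares about intrinsically, with a convex cost of total effort, and we ask how the principal fares as the piece rate rises.

```python
import numpy as np

# Toy multitasking model (illustrative only, not the actual Holmstrom-Milgrom 1991 setup).
# Agent chooses effort a on a measurable task and b on an unmeasurable task.
# Agent payoff: piece_rate * a + intrinsic * b - 0.5 * (a + b)**2  (cost depends on total
# effort, so the two tasks compete for the agent's attention).
# Principal payoff: v_a * a + v_b * b - piece_rate * a  (pays only for measured output).

intrinsic = 0.5       # agent's intrinsic return to the unmeasurable task (assumed)
v_a, v_b = 1.0, 2.0   # principal's value of each task (assumed); the hard-to-measure task matters more

grid = np.linspace(0, 3, 301)      # candidate effort levels
A, B = np.meshgrid(grid, grid)     # all (a, b) pairs

def principal_payoff(piece_rate):
    agent_utility = piece_rate * A + intrinsic * B - 0.5 * (A + B) ** 2
    i = np.unravel_index(np.argmax(agent_utility), agent_utility.shape)  # agent's best response
    a_star, b_star = A[i], B[i]
    return v_a * a_star + v_b * b_star - piece_rate * a_star, a_star, b_star

for piece_rate in [0.0, 0.25, 0.5, 0.75, 1.0]:
    profit, a_star, b_star = principal_payoff(piece_rate)
    print(f"piece rate {piece_rate:.2f}: effort on measured task {a_star:.2f}, "
          f"on unmeasured task {b_star:.2f}, principal payoff {profit:.2f}")
# Once the piece rate exceeds the intrinsic return, effort on the unmeasured task collapses
# to zero and the principal is worse off: weak incentives on measurable tasks can be optimal.
```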

Milgrom’s writings on incentives within organizations are without a doubt the literature I draw on most heavily when teaching strategic management. It is a shame that the textbook written alongside John Roberts never caught on. For a taste of their basic view of management, check out “The Firm as an Incentive System”, which lays out formal incentives, asset ownership, and task assignments as a system of complements which make organizations function well. The field now known as organizational economics has grown to incorporate ideas like information transmission (Garicano 2000 JPE) and the link between relational contracts and firm culture (e.g., Gibbons and Henderson 2011). Yet there remain many questions on why firms are organized the way they are which are open to an enterprising graduate student with a good theoretical background.

Multitasking has a similar feel to many of Milgrom’s great papers: they provide a framework improving our intuition about some effect in the world, rather than just showing a mathematical curiosity. The same is true of his most famous finance paper, the “no-trade theorem” developed with Nancy Stokey. The idea is ex-post obvious but ex-ante incredibly surprising. Imagine that in the market for corn, there is free exchange, and all trades anyone wants to make (to mitigate risk, for use, to try to trade on private information, etc.) have been made. A farmer one day notices a blight on his crop, and suspects this blight is widespread in the region. Therefore, the supply of corn will fall. Can he profit from this insight? Milgrom-Stokey’s answer is no!

How could this be? Even if everyone had identical prior beliefs about corn supply, conditional on getting this information, the farmer definitely has a higher posterior belief about corn price come harvest season than everyone else. However, we assumed that before the farmer saw the blight, all gains from trade had been exhausted, and that it was common knowledge that this was so. The farmer offering to buy corn at a higher price is informative that the farmer has learned something. If the prevailing price was $5/bushel, and the farmer offers you $7, then you know that he has received private information that the corn will be worth even more than $7, hence you should not sell him any. Now, of course there is trade on information all the time; indeed, there are huge sums spent collecting information so that it can be traded on! However, Milgrom-Stokey makes clear just how careful we have to be about what causes the “common knowledge that all gains from trade were exhausted” assumption to fail. Models with “noise” traders, or models with heterogeneous prior beliefs (a very subtle philosophical issue), have built on Milgrom-Stokey to understand everything from asset bubbles to the collapse in trade in mortgage backed securities in 2008.
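
The logic can be seen in a stripped-down simulation (illustrative numbers only: a pure common value uniform on [0, 10] dollars per bushel, and a farmer who learns that value exactly). Conditional on the informed farmer wanting to buy at some price, the corn is on average worth more than that price, so the uninformed seller should refuse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative numbers only: common value V ~ Uniform[0, 10] $/bushel, the informed
# farmer observes V perfectly and offers to buy at price p whenever V > p.
V = rng.uniform(0, 10, size=1_000_000)

for p in [5.0, 7.0, 9.0]:
    wants_to_buy = V > p
    posterior = V[wants_to_buy].mean()   # seller's expectation of V *given the offer*
    print(f"offer price {p:.0f}: unconditional E[V] = {V.mean():.2f}, "
          f"E[V | farmer offers {p:.0f}] = {posterior:.2f}  -> selling at {p:.0f} loses money")
# Conditional on the informed side wanting to trade, the asset is worth more than the
# offered price, so the uninformed side should decline: the Milgrom-Stokey logic.
```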

When it comes to practical application, Milgrom’s work on auctions is well-known, and formed the basis of his Nobel citation. How did auctions become so “practical”? There is no question that the rise of applied auction theory, with the economist as designer, has its roots in the privatization wave of the 1990s that followed the end of the Cold War. Governments held valuable assets: water rights, resource tracts, spectrum that was proving important for new technologies like the cell phone. Who was to be given these assets, and at what price? Milgrom’s 1995 Churchill lectures formed the basis for a book, “Putting Auction Theory to Work”, which is now essential reading, alongside Klemperer’s “Auctions: Theory and Practice”, for theorists and practitioners alike. Where it is unique is in its focus on the practical details of running auctions.

This focus is no surprise. Milgrom’s most famous theoretical work is his 1982 Econometrica with Robert Weber on auctions which are partly common-value and partly private-value. That is, consider selling a house, where some of the value is your idiosyncratic taste, and some of the value is whether the house has mold. Milgrom and Weber show a seller should reduce uncertainty as much as possible about the “common” part of the value. If the seller does not know this information or can’t credibly communicate it, then unlike in auctions which don’t have that common component, it matters a lot how you run the auction. For instance, with a first-price auction, you may bid low even though you like the house because you worry about winning when other bidders noticed the mold and you didn’t. In a second-price auction, the price you pay incorporates in part that information from others, hence leads to more revenue for the homeseller.

In practical auctions more broadly, complements across multiple goods being sold separately, private information about common components, the potential to collude or form bidder rings, and the regularity with which auctions are held and hence the number of expected bidders are all incredibly important to auctioneer revenue and efficient allocation of the object being sold. I omit further details of precisely what Milgrom did in the many auctions he consulted on, as the popular press will cover this aspect of his work well, but it is not out of the question to say that the social value of better allocation of things like wireless spectrum is on the order of tens of billions of dollars.

One may wonder why we care about auctions at all. Why not just assign the item to whoever we wish, and then let the free market settle things such that the person with the highest willingness-to-pay winds up with the good? It seems natural to think that how the good is allocated matters for how much revenue the government earns – selling the object is better on this count than giving it away – but it turns out that the free market will not in general allocate goods efficiently when sellers and buyers are uncertain about who is willing to pay how much for a given object.

For instance, imagine you own a car, and you think I am willing to pay somewhere between $10,000 and $20,000 to buy it from you. I think you are willing to give up the car for somewhere between $5,000 and $15,000. I know my own valuation, so let’s consider the case where I am willing to pay exactly $10,000. If you are willing to sell for $8,000, it seems reasonable that we can strike a deal. This is not the case: since all you know is that I am willing to pay somewhere between $10,000 and $20,000, you do know you can always get a $2,000 profit by selling at $10,000, but also that it’s incredibly unlikely that I will say no if you charge $10,001, or $11,000, or even more. You will therefore be hesitant to strike a deal to sell for $10,000 flat. This essential tension is the famed Myerson-Satterthwaite Theorem, and it occurs precisely because the buyer and seller do not know each other’s value for the object. A government auctioning off an object initially, however, can do so efficiently in a much wider set of contexts (see Maskin 2004 JEL for details). The details of auction design cannot be fixed merely by letting the market sort things out ex-post: the post-Cold War asset sales had issues not just of equity, but also efficiency. Since auctions today are used to allocate everything from the right to extract water to carbon emissions permits at the heart of global climate change policy, ensuring we get their design right is not just a minor theoretical concern!
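
For a rough numerical illustration of the car example, consider the simplest possible trading protocol: the seller posts a take-it-or-leave-it price given her beliefs. Myerson-Satterthwaite is a statement about every possible mechanism, which this sketch of course does not prove; it just shows the inefficiency in the easiest case, using the numbers from the paragraph above.

```python
import numpy as np

# Numbers from the car example in the text: the seller's true reservation value is $8,000
# and she believes the buyer's value is Uniform[$10,000, $20,000]. As one illustrative
# mechanism, let the seller post a take-it-or-leave-it price (Myerson-Satterthwaite says
# *no* mechanism fixes the inefficiency; this just shows it for the simplest one).
seller_cost = 8_000
lo, hi = 10_000, 20_000

prices = np.linspace(lo, hi, 10_001)
prob_sale = (hi - prices) / (hi - lo)                 # P(buyer value >= posted price)
expected_profit = (prices - seller_cost) * prob_sale  # seller's expected gain from trade

best = prices[np.argmax(expected_profit)]
print(f"profit-maximizing posted price: ${best:,.0f}")   # about $14,000
print(f"a buyer worth $10,000 walks away, even though trade at, say, $9,000 "
      f"would create ${10_000 - seller_cost:,} of surplus")
# Private information leads the seller to price out low-value (but still profitable) buyers.
```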

The problem of auction design today is, partly because of Milgrom’s efforts, equally prominent in computer science. Many allocation problems are computational, with players being algorithms. This is true of electricity markets in practice, as well as the allocation of online advertisements, the design of blockchain-like mechanisms for decentralized exchange and record-keeping, and methods for preventing denial of service attacks while permitting legitimate access to internet-connected servers. Even when humans remain in the loop to some extent, we need to guarantee not just an efficient algorithm, but a practically-computable equilibrium. Leyton-Brown, Milgrom and Segal discuss this in the context of a recent spectrum auction. The problem of computability turns out to be an old one: Robert Wilson’s early work was on precisely the problem of computing equilibria. Nonetheless, given its importance in algorithmic implementation of mechanisms, it would be no surprise to see many important results in applied game theory come from computer scientists and not just economists and mathematicians in coming years. This pattern of techniques flowing from their originating field to the one where they have important new applications looks a lot like the trail of applied game theory arriving in economics by way of operations research, does it not?

That deep results in game theory can inform the real world goes beyond cases like auctions, where the economic setting is easy to understand. Consider the case of the long distance trade in the Middle Ages. The fundamental problem is that of the Yuan dynasty folk song: when “heaven is high and the emperor is far away”, what stops the distant city you arrive in from confiscatory taxation, or outright theft, of your goods? Perhaps the threat that you won’t return to trade? This is not enough – you may not return, but other traders will be told, “we had to take the goods from the last guy because he broke some rules, but of course we will treat you fairly!” It was quite common for confiscation to be targeted only at one group – the Genoese in Constantinople, the Jews in Sicily – with all other traders being treated fairly.

The theory of repeated games can help explain what to do. It is easiest to reach efficiency when you punish not only the cheaters, but also punish those who do not themselves punish cheaters. That is, the Genoese need to punish not just the Turks by withdrawing business, but also punish the Saracens who would try to make up the trade after the Genoese pull out. The mechanism to do so is a merchant guild, a monopoly which can enforce boycotts in distant cities by taking away a given merchant’s rights in their own city. Greif, Milgrom and Weingast suggest that because merchant guilds allow cities to credibly commit to avoid confiscation, they benefit the cities themselves by increasing the amount of trade. This explains why cities encouraged the formation of guilds – one does not normally encourage one’s sellers to form a monopsony!

Enough on the student – let us turn to Milgrom’s advisor, Robert Wilson. Wilson was born in the tiny hamlet of Geneva, Nebraska. As discussed above, his doctoral training at Harvard was from Howard Raiffa and the decision theorists, after which he was hired at Stanford, where he has spent his career. As Milgrom is now also back at Stanford, their paths are so intertwined that the two men now quite literally live on the same street.

Wilson is most famous for his early work applying the advances of game theory in the 1970s to questions in auction design and reputation. His three-page paper written in 1966 and published in Management Science in 1969 gives an early application of Harsanyi’s theory of Bayesian games to the “winner’s curse”. The winner’s curse arises because the winner of an auction for a good with a “common value” – for instance, a tract of land that either has oil or does not – tends to be the bidder with the most optimistic estimate of that value; a bidder therefore optimally bids less in a first-price auction than what they believe the good to be worth, or else loses money on average.

One benefit of being an operations researcher is that there is a tight link in that field between academia and industry. Wilson consulted with the Department of the Interior on oil licenses, and with private oil companies on how they bid in these auctions. What he noticed was that managers often shaded down their engineers’ best estimates of the value of an oil tract. The reason why is, as the paper shows, very straightforward. Assume we both get a signal uniformly distributed on [x-1,x+1] about the value of the tract, where x is the true value. Unconditionally, my best estimate of the value of the plot is exactly my signal. However, conditional on winning the auction, my signal was higher than my rival’s. Therefore, if I knew my rival’s signal, my best estimate of the value would be exactly halfway between the two signals. Of course, I don’t know her signal. But since my payoff is 0 if I don’t win, and my payoff is x minus my bid if I win, there is a straightforward formula, which depends on the distribution of the signals, for how much I should shade my bid. Many teachers have drawn on Bob’s famous example of the winner’s curse by auctioning off a jar of coins in class, the winner inevitably being the poor student who doesn’t realize they should have shaded their bid!
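
A quick Monte Carlo of that uniform-signal example (the true value and the number of bidders are made up for illustration) shows why unshaded bids lose money, and why the required shading grows with the number of rivals.

```python
import numpy as np

rng = np.random.default_rng(1)

true_value = 10.0      # x, the common value of the tract (illustrative)
n_draws = 200_000

for n_bidders in [2, 5, 10]:
    # Each bidder's signal is Uniform[x-1, x+1]; suppose everyone naively bids their signal.
    signals = rng.uniform(true_value - 1, true_value + 1, size=(n_draws, n_bidders))
    winning_bid = signals.max(axis=1)           # first-price auction: highest bid wins, pays it
    winner_profit = true_value - winning_bid    # value received minus price paid
    print(f"{n_bidders:2d} bidders: average winner's profit from bidding one's signal "
          f"= {winner_profit.mean():+.3f}")
# The average loss is (n-1)/(n+1): conditional on winning, your signal was the most
# optimistic of n draws, so an unshaded bid overpays, and more rivals means more shading.
```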

Wilson not only applied these new game theoretic tools, but also developed many of them. This is particularly true in 1982, when he published all three of his most cited papers: a resolution of the “Chain store paradox”, the idea of sequential equilibria, and the “Gang of Four” reputation paper with Kreps, Roberts, and Milgrom. To understand these, we need to understand the problem of non-credible threats.

The chain store paradox goes like this. Consider Walmart facing a sequence of potential competitors. If they stay out, Walmart earns monopoly profits in the town. If they enter, Walmart can either fight (in which case both make losses) or accept the entry (in which case they both earn duopoly profits, lower than what Walmart made as a monopolist). It seems intuitive that Walmart should fight a few early potential competitors to develop a reputation for toughness. Once they’ve done it, no one will enter. But if you think through the subgame perfect equilibrium here, the last firm who could enter knows that after they enter, Walmart is better off accepting the entry. Hence the second-to-last firm reasons that Walmart won’t benefit from establishing a reputation for deterrence, and hence won’t fight it. And likewise for the third-to-last entrant and so on up the line: Walmart never fights because it can’t “credibly” threaten to fight future entrants regardless of what it did in the past.
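
The backward induction can be written out mechanically. Here is a minimal sketch with invented payoffs (accommodation beats fighting in any single market), which solves the finite complete-information game from the last entrant backwards and confirms that Walmart accommodates every single one.

```python
# Backward induction in the finite chain store game with complete information.
# Illustrative payoffs per market: monopoly 10 for the incumbent if the entrant stays out,
# duopoly 3 for each if entry is accommodated, -1 for each if the incumbent fights.
MONOPOLY, DUOPOLY, FIGHT = 10, 3, -1
N_MARKETS = 20

def solve(markets_left):
    """Return (incumbent total payoff, list of outcomes) from this point on."""
    if markets_left == 0:
        return 0, []
    future_value, future_play = solve(markets_left - 1)
    # If this entrant enters, the incumbent compares fighting vs accommodating.
    # The continuation is the same either way (future entrants re-solve the same problem),
    # so only the current-market payoff matters: accommodate, since DUOPOLY > FIGHT.
    incumbent_if_enter = DUOPOLY + future_value
    entrant_if_enter = DUOPOLY               # entrant correctly anticipates accommodation
    if entrant_if_enter > 0:                 # entering beats staying out (payoff 0)
        return incumbent_if_enter, ["enter -> accommodate"] + future_play
    return MONOPOLY + future_value, ["stay out"] + future_play

total, play = solve(N_MARKETS)
print(play[:3], "... every market looks the same")
print("incumbent's subgame perfect payoff:", total)   # 20 * 3 = 60, never monopoly
```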

This seems odd. Kreps and Wilson (JET 1982) make an important contribution to reputation building by assuming there are two types of Walmart CEOs: a tough one who enjoys fighting, and a weak one with the normal payoffs above. Competitors don’t know which Walmart they are facing. If there is even a small chance the rivals think Walmart is tough, then even the weak Walmart may want to fight early rivals by “pretending” to be tougher than they are. Can this work as an equilibrium? We really need a new concept, because we want the game to be both perfect, where at any time players play Nash equilibria from that point forward, and Bayesian, where players have beliefs about the others’ type and update those beliefs according to the hypothesized equilibrium play. Kreps and Wilson show how to do this in their Econometrica introducing sequential equilibria. The idea here is that equilibria involve strategies and beliefs at every node of the game tree, with strategies optimal given beliefs and beliefs consistent with the hypothesized play. Beyond having the nice property of allowing us to specifically examine the beliefs at any node, even off the equilibrium path, sequential equilibria are much simpler to compute than similar ideas like trembling hand perfection. Looking back to Wilson’s early work on how to compute Nash equilibria, and forward to Milgrom’s later work on practical mechanism design, it is not surprising to see the idea of practical tractability appear even back in 1982.

This type of reputation-building applies even to cooperation – or collusion, as cooperating when it is in your interest to cheat and colluding when it is in your interest to undercut are the same mathematical problem. The Gang of Four paper by Kreps, Wilson, Milgrom, and Roberts shows that in finite prisoner’s dilemmas, you can get quite a bit of cooperation just with a small probability that your rival is an irrational type who always cooperates as long as you do so. Indeed, the Gang of Four show precisely how close to the end of the game players will cooperate for a given small belief that a rival is the naturally-cooperative type. Now, one may worry that allowing types in this way gives too much leeway for the modeler to justify any behavior, and indeed this is so. Nonetheless, the 1982 papers kicked off an incredibly fruitful search for justifications for reputation building – and given the role of reputation in everything from antitrust to optimal policy from central banks, a rigorous justification is incredibly important to understanding many features of the economic world.

I introduced Robert Wilson as The Teacher. This is not meant to devalue his pure research contributions, but rather to emphasize just how critical he was in developing students at the absolute forefront of applied games. Bengt Holmstrom did his PhD under Wilson in 1978, went to Kellogg MEDS after a short detour in Finland, then moved to Yale and MIT before winning the Nobel Prize. Al Roth studied with Wilson in 1974, was hired at the business school at Illinois, then Pittsburgh, then Harvard and Stanford before winning a Nobel Prize. Paul Milgrom was a 1979 student of Wilson’s, beginning also at MEDS before moving to Yale and Stanford, and winning his own Nobel Prize. This is to say nothing of the students he developed later, including the great organizational theorist Bob Gibbons, or his earliest students like Armando Ortega Reichert, whose unpublished dissertation in 1969 contains important early results in auction theory and was an important influence on the limit pricing under incomplete information in Milgrom and Roberts (1982). It is one thing to write papers of Nobel quality. It is something else altogether to produce (at least!) three students who have done the same. And as any teacher is proud of their successful students, surely little is better than winning a Nobel alongside one of them!

Alberto Alesina and Oliver Williamson: Taking Political and Economic Frictions Seriously

Very sad news this week for the economics community: both Oliver Williamson and Alberto Alesina have passed away. Williamson had been in poor health for some time, but Alesina’s death is a greater shock: he apparently had a heart attack while on a hike with his wife, at the young age of 63. While one is most famous for the microeconomics of the firm, and the other for political economy, there is in fact a tight link between their research agendas. Both attempted to open “black boxes” in economic modeling – about why firms organize the way they do, and the nature of political constraints on economic activity – to clarify otherwise strange differences in how firms and governments behave.

First, let us discuss Oliver Williamson, the 2009 Nobel winner (alongside Elinor Ostrom), and student of Ken Arrow and later the Carnegie School. He grew up in Superior, Wisconsin, next to Duluth at the frigid tip of Lake Superior, as the son of two schoolteachers. Trained as an engineer before returning to graduate school, he had a strong technical background. However, he also possessed, in the words of Arrow, the more important trait of “asking good questions”.

Industrial organization in the 1960s was a field that needed a skeptical mind. To a first approximation, any activity that was unusual was presumed to be anti-competitive. Vertical integration was high on this list. While Williamson was first thinking about the behavior of firms, the famous case of U.S. vs. Arnold, Schwinn reached the Supreme Court. Schwinn, the bicycle company, owned neither distributors nor retailers. However, it did contractually limit distributors from selling bikes to retailers that were not themselves partnered with Schwinn. In 1967, the Supreme Court ruled these contracts an antitrust violation.

Williamson was interested in why a firm might limit these distributors. Let’s start with the ideas of Mr. Coase. Coase argued that transactions in a market are not free: we need to find suppliers, evaluate quality, and so on. The organization of economic activity therefore attempts to economize on these “transaction costs”. In the Coasean world, transaction costs were nebulous, and attracted a great deal of critique. As Williamson, among many others, points out, both buying from a supplier and vertical integration require transaction costs: I need to haggle over the price of the component or else the price of the whole company! Therefore, in an unchanging world, it is not clear that integration does anything to reduce the transaction costs of evaluating what my partner – in procurement or in merger – is capable of. In the case of Schwinn, the transaction costs must be incurred whether we are debating how to split profits with a particular retailer for the upcoming year, or the price of a pallet of bicycles sold to that retailer.

Williamson’s model is richer. He takes change in the relationship as first order: the famous “unprogrammed adaptations”. The relationship between Schwinn and its retailers requires actions by both over time. Because we are not omniscient, no contract will cover every eventuality. When something unexpected happens, and we both want to renegotiate our contract, we are said to be facing an unprogrammed adaptation. For instance, if advertising is useful, and e-scooters unexpectedly become popular after Schwinn and their retailer sign their initial contract, then we will need to renegotiate who pays for those ads. Of course, we will only bother to negotiate at all if Schwinn and the retailer jointly profit from their relationship compared to their next best options, generating so-called “appropriable quasi-rents”.

We now have an explanation for Schwinn’s behavior. They expect frequent haggling with their retailer about which bicycles to advertise, service standards for repairs, employee training, and so on. If these negotiations fail, the next best option is pretty bad – many small towns might only have one full-service bicycle shop, the Schwinn bikes are more popular than alternatives, and Schwinn itself has neither the resources nor the knowledge to run its own full-service chain of retailers efficiently. Schwinn therefore uses exclusive retail contracts to limit the number of retailers it must negotiate with over service standards, advertising, and the like.

While we have focused on the application of transaction costs to antitrust, Williamson’s basic framework extends much further. He saw the problem as one of “choice” versus “contract”. The canonical topic of study in economics is choice: “Economics is the science which studies human behavior as a relationship between ends and scarce means which have alternative uses,” as Lionel Robbins famously puts it. However, constraints also matter. Agents can act only within the bounds of the law, as a function of what other firms are capable of, and so on. Some of these constraints are public – e.g., what tariff rate do we face, are we allowed to put a price on kidneys for exchange, and so on. Williamson focused our attention on private constraints: the contracts, governance structures, and tools to align incentives which help us reach efficiency when information is asymmetric and contracts are incomplete. The timing was perfect: both Williamson and his professor Ken Arrow, along with Alchian, Demsetz, Klein and others, saw how important this “private ordering” was in their work in the 1960s, but that work was largely qualitative. The formal advancements in game theory in the 1970s gave us the tools that permitted formal analyses of contracting and let us transform these ideas into a modern field of industrial organization.

Williamson was in no way an ideologue who ignored the possibility of anticompetitive behavior. Indeed, many canonical anticompetitive strategies, such as “raising rivals’ costs” whereby a firm encourages legal restrictions which raise its own costs but raise rivals’ costs to an even greater degree, originate with Williamson. I also particularly like that Williamson not only wrote serious economics, but also frequently translated those results for law journals in order to reach a wider audience. Erik Hovenkamp and I tried to follow this legacy recently in our work on the antitrust of startup acquisitions, where we wrote both a theoretical version and a law review article on the implications of this theory for existing legal practice.

Transaction cost economics is now a huge field, and both the benefits and critiques of this approach are serious (for more, see my course notes on the theory of the firm). Every economist, when looking at “unusual” contracts or mergers, now follows Williamson in simultaneously looking for the strategic anticompetitive explanation and the cost-saving explanation. The name of this balance? Literally, the Williamson tradeoff!

—————–

If Williamson was interested in “private ordering”, Alesina was focused on the public constraints on behavior. He was without question first in line for a Nobel in political economy. Economists, by and large, are technocrats. We have models of growth, of R&D, of fiscal policy, of interstate coordination, and so on. These models imply useful policies. The “public choice” critique – that the politicians and bureaucrats implementing these policies may muck things up – is well known. The “political business cycle” approach of Nordhaus has politicians taking advantage of myopic voters by, for instance, running expansionary, inflation-inducing policy right before an election, generating lower unemployment today but higher inflation tomorrow.

Alesina’s research goes further than either of these approaches. Entering the field after the rational expectations revolution arrived, Alesina saw how skeptical economists were of the idea that politicians could, each election cycle, take advantage of voters in the same way. I like to explain rational expectations to students as the Bob Marley rule: “You can fool some people sometimes, but you can’t fool all the people all the time.” Rather than myopic voters, we have voters who do not perfectly observe the government’s actions or information. Politicians wish to push their preferences (“ideology”) and also to get re-elected (“career concerns”). Voters have differing preferences. We then want to ask: to what extent can politicians use their private information to push preferences that “society” does not necessarily want, and how does that affect the feasibility of political unions, monetary policy, fiscal policy, and so on?

One important source of uncertainty is who will win the next election. Consider a government which can spend on the military or on education (“guns” or “butter”), and can finance this through debt if they like. A benevolent social planner uses debt to finance investment such that the tax burden is distributed over time. In a political macro model, however, Alesina and Tabellini (RESTUD 1990) show that there will be too much debt, especially when elections are close. If I favor military spending more than education, I can jack up the debt while I am in power and spend it on the military. This not only gets me more military today, but also constrains the other party from spending so much on education tomorrow since society’s debt load will be too high. In equilibrium, both parties try to constrain their rival’s action in the future by using debt-financed spending today. The model makes clear predictions about how debt relates to fundamentals of society – political polarization, and so on – without requiring irrationality on the part of any actor, whether voter or politician.
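
A two-period toy version of this logic (all functional forms and numbers are mine, and the actual Alesina-Tabellini model is much richer) shows debt rising as the incumbent’s chance of re-election falls.

```python
import numpy as np

# Two-period toy of strategic debt. The incumbent party cares only about "guns".
# Period 1: it spends income y plus debt d on guns. Period 2: whoever wins spends y - d
# on their preferred good; with probability p the incumbent is re-elected and spends on guns.
# Incumbent's expected utility (log utility, invented for illustration):
#   log(y + d) + p * log(y - d)
y = 1.0
debt_grid = np.linspace(0, 0.99, 1_000)

for p in [0.9, 0.5, 0.1]:
    utility = np.log(y + debt_grid) + p * np.log(y - debt_grid)
    d_star = debt_grid[np.argmax(utility)]
    print(f"re-election probability {p:.1f}: chosen debt = {d_star:.2f}")
# With log utility the optimum is d = y(1-p)/(1+p): the less likely you are to stay in
# power, the more you borrow today to spend on your own priorities and to tie the hands
# of tomorrow's government.
```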

It is not hard to see how the interests of economists are so heavily linked to their country of origin. Many of our best macroeconomists come from Argentina, home of a great deal of macroeconomic instability. Americans are overrepresented in applied micro, no surprise given the salience of health, education, and labor issues in U.S. political debates. The French, with their high level of technical training in schools and universities, have many great theorists. And no surprise, the Italians are often interested in how political incentives affect and limit economic behavior. Once you start applying Alesina’s ideas, the behavior of politicians and implications for society become clear. Why do politicians delegate some tasks to bureaucrats and not others? The hard ones the politicians might be blamed for if they fail get delegated, and the ones that allow control of distribution do not (Alesina and Tabellini 2007 AER). Why doesn’t the US have a strong welfare state compared to Europe? The distortions from taxation, relative income mobility, or the political power of the poor matter less than racial fractionalization, which also explains changes in European preferences over time (Alesina, Glaeser and Sacerdote, Brookings 2001 and Alesina, Miano and Stantcheva 2018).

Perhaps the most salient of Alesina’s questions is one of his oldest (Alesina and Spolaore, QJE 1997): why are there so many countries? Are there “too many”, and what could this mean? In a crisis like Covid, would we be better off with a European fiscal union rather than a bunch of independent countries? Big countries can raise funds with less distortion, public goods often have economies of scale, and transfers within countries can handle idiosyncratic regional shocks – these are both assumptions and empirical facts. On the other hand, the bigger the country, the less agreement on how to value public goods. Consider a region on the outskirts of an existing country – say, Sudtirol in Italy. If they secede, they pay higher taxes for their public goods, but the public goods provided are much closer to their preferences. In a democratic secession, these Sudtirol voters do not account for how their secession causes the cost of government in the remaining rump of Italy to rise. Hence they are too likely to secede, versus what a social planner prefers.
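
Here is a back-of-the-envelope version of that tradeoff, with entirely invented numbers: citizens live on a line, each government has a fixed cost shared by its taxpayers, and citizens dislike distance from their government. For intermediate levels of heterogeneity the border region votes to secede even though a planner would keep the country together, precisely because the seceders ignore the higher per-capita tax they leave behind.

```python
import numpy as np

# Stylized Alesina-Spolaore arithmetic (all numbers invented). Citizens are uniform on [0, 1],
# a government costs K (shared equally by its taxpayers) and locates at its population's
# midpoint; citizen i bears a * |i - government location| in "preference distance" costs.
K = 1.0
x = np.linspace(0, 1, 100_001)           # citizens
border = x >= 0.8                        # the region contemplating secession

def avg_cost(members, capital, population_share, a):
    """Average per-capita cost (tax + distance) for `members` of a country of given size."""
    tax = K / population_share
    return tax + a * np.abs(x[members] - capital).mean()

for a in [10.0, 12.0, 14.0]:
    stay   = avg_cost(border, 0.5, 1.0, a)   # union: capital at 0.5, tax shared by everyone
    secede = avg_cost(border, 0.9, 0.2, a)   # own small country: closer capital, higher tax
    union_total = K + a * np.abs(x - 0.5).mean()
    split_total = (2 * K
                   + 0.8 * a * np.abs(x[~border] - 0.4).mean()
                   + 0.2 * a * np.abs(x[border] - 0.9).mean())
    print(f"a = {a:4.1f}: region secedes? {secede < stay}; "
          f"planner prefers split? {split_total < union_total}")
# For intermediate heterogeneity the region votes to secede even though the planner would
# not split the country: the seceders ignore the higher per-capita tax they impose on the rest.
```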

We can see this effect in the EU right now. An EU fiscal union would reduce the cost of providing some public goods, insurance to shocks among them. However, the Germans and Dutch have very different public goods preferences from the Italians and Greeks. A planner would balance the marginal cost of lower alignment for the average EU citizen against the marginal benefit of lower public goods costs. A German elected leader will weigh the marginal cost of lower alignment for the average German citizen (worse than that of the EU median citizen!) against the marginal benefit of lower public goods costs (less important, because it doesn’t account for cheaper public goods for Greeks and Italians when Germany joins them to borrow funds jointly). We therefore get too little coordinated fiscal action. This lack of action on public goods makes some Europeans skeptical of other aspects of the EU project: one of Alesina’s final op-eds was on the disastrously nationalistic EU response to Covid. Luis Garicano, the well-known Spanish economist and current MEP, has a very interesting discussion with Luigi Zingales on precisely this point.

It is impressive enough that Alesina’s work was well-respected in political science and not just economics. What I especially like about Alesina, though, is how ideologically confusing his policy advice is, especially for an American. He simultaneously supported a lower tax rate for women on the basis of intrafamily dynamics, and was the leading proponent of expansionary austerity, or spending cuts during recessions! The tax rate idea is based on the greater elasticity of labor supply of women, hence is a direct application of the Ramsey rule. Expansionary austerity is based on a serious review of austerity policies over many decades. He pushed these ideas and many others in at least 10 books and dozens of op-eds (including more than 30 for VoxEU). Agree with these ideas or not – and I object to both! – Alesina nonetheless argued for these positions from a base of serious theory and empirics, rather than from ideology. What worthier legacy could there be for an academic?

What Randomization Can and Cannot Do: The 2019 Nobel Prize

It is Nobel Prize season once again, a grand opportunity to dive into some of our field’s most influential papers and to consider their legacy. This year’s prize was inevitable, an award to Abhijit Banerjee, Esther Duflo, and Michael Kremer for popularizing the hugely influential experimental approach to development. It is only fitting that my writeup this year has been delayed by the anti-government road blockades here in Ecuador, which kept me from the internet-enabled world – developing countries face many barriers to reaching prosperity, and rarely have I been so personally aware of the effects of place on productivity as I was this week!

The reason for the prize is straightforward: an entire branch of economics, development, looks absolutely different from what it looked like thirty years ago. Development used to be essentially a branch of economic growth. Researchers studied topics like the productivity of large versus small farms, the nature of “marketing” (or the nature of markets and how economically connected different regions in a country are), or the necessity of exports versus industrialization. Studies were almost wholly observational, deep data collections with throwaway references to old-school growth theory. Policy was largely driven by the subjective impression of donors or program managers about projects that “worked”. To be a bit too honest – it was a dull field, and hence a backwater. And worse than dull, it was a field where scientific progress was seriously lacking.

Banerjee has a lovely description of the state of affairs back in the 1990s. Lots of probably-good ideas were funded, informed deeply by history, but with very little convincing evidence that highly-funded projects were achieving their stated aims. In the World Bank Sourcebook of recommended projects, everything from scholarships for girls to vouchers for poor children to citizens’ report cards was recommended. Did these actually work? Banerjee quotes a program providing computer terminals in rural areas of Madhya Pradesh which explains that due to a lack of electricity and poor connectivity, “only a few of the kiosks have proved to be commercially viable”, then notes, without irony, that “following the success of the initiative,” similar programs would be funded. Clearly this state of affairs is unsatisfactory. Surely we should be able to evaluate the projects we’ve funded already? And better, surely we should structure those evaluations to inform future projects? Banerjee again: “the most useful thing a development economist can do in this environment is stand up for hard evidence.”

And where do we get hard evidence? If by this we mean internal validity – that is, whether the effect we claim to have seen is actually caused by a particular policy in a particular setting – applied econometricians of the “credibility revolution” in labor in the 1980s and 1990s provided an answer. Either take advantage of natural variation with useful statistical properties, like the famed regression discontinuity, or else randomize treatment like a medical study. The idea here is that the assumptions needed to interpret a “treatment effect” are often less demanding than those needed to interpret the estimated parameter of an economic model, hence more likely to be “real”. The problem in development is that most of what we care about cannot be randomized. How are we, for instance, to randomize whether a country adopts import substitution industrialization or not, or randomize farm size under land reform – and at a scale large enough for statistical inference?

What Banerjee, Duflo, and Kremer noticed is that much of what development agencies do in practice has nothing to do with those large-scale interventions. The day-to-day work of development is making sure teachers show up to work, vaccines are distributed and taken up by children, corruption does not deter the creation of new businesses, and so on. By breaking down the work of development on the macro scale to evaluations of development at micro scale, we can at least say something credible about what works in these bite-size pieces. No longer should the World Bank Sourcebook give a list of recommended programs, based on handwaving. Rather, if we are to spend 100 million dollars sending computers to schools in a developing country, we should at least be able to say “when we spent 5 million on a pilot, we designed the pilot so as to learn that computers in that particular setting led to a 12% decrease in dropout rate, and hence a 34%-62% return on investment according to standard estimates of the link between human capital and productivity.” How to run those experiments? How should we set them up? Who can we get to pay for them? How do we deal with “piloting bias”, where the initial NGO we pilot with is more capable than the government we expect to act on evidence learned in the first study? How do we deal with spillovers from randomized experiments, econometrically? Banerjee, Duflo, and Kremer not only ran some of the famous early experiments, they also established the premier academic institution for running these experiments – J-PAL at MIT – and further wrote some of the best known practical guides to experiments in development.

Many of the experiments written by the three winners are now canonical. Let’s start with Michael Kremer’s paper on deworming, with Ted Miguel, in Econometrica. Everyone agreed that deworming kids infected with things like hookworm has large health benefits for the children directly treated. But since worms are spread by outdoor bathroom use and other poor hygiene practices, one infected kid can also harm nearby kids by spreading the disease. Kremer and Miguel suspected that one reason school attendance is so poor in some developing countries is because of the disease burden, and hence that reducing infections among one kid benefits the entire community, and neighboring ones as well, by reducing overall infection. By randomizing mass school-based deworming, and measuring school attendance both at the focal and at neighboring schools, they found that villages as far as 4km away saw higher school attendance (4km rather than the 6km reported in the original paper, due to a correction of an error in the analysis). Note the good economics here: a change from individual to school-based deworming helps identify spillovers across schools, and some care goes into handling the spatial econometric issue whereby density of nearby schools equals density of nearby population equals differential baseline infection rates at these schools. An extra year of school attendance could therefore be “bought” by a donor for $3.50, much cheaper than other interventions such as textbook programs or additional teachers. Organizations like GiveWell still rate deworming among the most cost-effective educational interventions in the world: in terms of short-run impact, surely this is one of the most important pieces of applied economics of the 21st century.

The laureates have also used experimental design to learn that some previously highly-regarded programs are not as important to development as you might suspect. Banerjee, Duflo, Rachel Glennerster and Cynthia Kinnan studied microfinance rollout in Hyderabad, randomizing the neighborhoods which received access to a major first-gen microlender. These programs are generally woman-focused, joint-responsibility, high-interest loans a la the Nobel Peace Prize winning Grameen Bank. 2800 households across the city were initially surveyed about their family characteristics, lending behavior, consumption, and entrepreneurship, then followups were performed a year after the microfinance rollout, and then three years later. While women in treated areas were 8.8 percentage points more likely to take a microloan, and existing entrepreneurs do in fact increase spending on their business, there is no long-run impact on education, health, or the likelihood women make important family decisions, nor does it make businesses more profitable. That is, credit constraints, at least in poor neighborhoods in Hyderabad, do not appear to be the main barrier to development; this is perhaps not very surprising, since higher-productivity firms in India in the 2000s already have access to reasonably well-developed credit markets, and surely they are the main driver of national income (followup work does see some benefits for very-high-talent, very poor entrepreneurs, but the key long-run result remains).

Let’s realize how wild this paper is: a literal Nobel Peace Prize was awarded for a form of lending that had not really been rigorously analyzed. This form of lending effectively did not exist in rich countries at the time they developed, so it is not a necessary condition for growth. And yet enormous amounts of money went into a somewhat-odd financial structure because donors were nonetheless convinced, on the basis of very flimsy evidence, that microlending was critical.

By replacing conjecture with evidence, and showing randomized trials can actually be run in many important development settings, the laureates’ reformation of economic development has been unquestionably positive. Or has it? Before returning to the (truly!) positive aspects of Banerjee, Duflo and Kremer’s research program, we must take a short negative turn. Because though Banerjee, Duflo, and Kremer are unquestionably the leaders of the field of development, and the most influential scholars for young economists working in that field, there is much more controversy about RCTs than you might suspect if all you’ve seen are the press accolades of the method. Donors love RCTs, as they help select the right projects. Journalists love RCTs, as they are simple to explain (Wired, in a typical example of this hyperbole: “But in the realm of human behavior, just as in the realm of medicine, there’s no better way to gain insight than to compare the effect of an intervention to the effect of doing nothing at all. That is: You need a randomized controlled trial.”) The “randomista” referees love RCTs – a tribe is a tribe, after all. But RCTs are not necessarily better for those who hope to understand economic development! The critiques are three-fold.

First, that while the method of random trials is great for impact or program evaluation, it is not great for understanding how similar but not exact replications will perform in different settings. That is, random trials have no specific claim to external validity, and indeed are worse than other methods on this count. Second, it is argued that development is much more than program evaluation, and that the reason real countries grow rich has essentially nothing to do with the kinds of policies studied in the papers we discussed above: the “economist as plumber” famously popularized by Duflo, who rigorously diagnoses small problems and proposes solutions, is a fine job for a World Bank staffer, but a crazy use of the intelligence of our otherwise-leading scholars in development. Third, even if we only care about internal validity, and only care about the internal validity of some effect that can in principle be studied experimentally, the optimal experimental design is generally not an RCT.

The external validity problem is often seen to be one related to scale: well-run partner NGOs are just better at implementing any given policy than, say, a government, so the benefit of scaled-up interventions may be much lower than that identified by an experiment. We call this “piloting bias”, but it isn’t really the core problem. The core problem is that the mapping from one environment or one time to the next depends on many factors, and by definition the experiment cannot replicate those factors. A labor market intervention in a high-unemployment country cannot inform in an internally valid way about a low-unemployment country, or a country with different outside options for urban laborers, or a country with an alternative social safety net or cultural traditions about income sharing within families. Worse, the mapping from a partial equilibrium to a general equilibrium world is not at all obvious, and experiments do not inform as to the mapping. Giving cash transfers to some villagers may make them better off, but giving cash transfers to all villagers may cause land prices to rise, or cause more rent extraction by corrupt governments, or cause all sorts of other changes in relative prices.

You can see this issue in the Scientific Summary of this year’s Nobel. Literally, the introductory justification for RCTs is that, “[t]o give just a few examples, theory cannot tell us whether temporarily employing additional contract teachers with a possibility of re-employment is a more cost-effective way to raise the quality of education than reducing class sizes. Neither can it tell us whether microfinance programs effectively boost entrepreneurship among the poor. Nor does it reveal the extent to which subsidized health-care products will raise poor people’s investment in their own health.”

Theory cannot tell us the answers to these questions, but an internally valid randomized control trial can? Surely the wage of the contract teacher vis-a-vis more regular teachers and hence smaller class sizes matters? Surely it matters how well-trained these contract teachers are? Surely it matters what the incentives for investment in human capital by students in the given location are? To put this another way: run literally whatever experiment you want to run on this question in, say, rural Zambia in grade 4 in 2019. Then predict the cost-benefit ratio of having additional contract teachers versus more regular teachers in Bihar in high school in 2039. Who would think there is a link? Actually, let’s be more precise: who would think there is a link between what you learned in Zambia and what will happen in Bihar which is not primarily theoretical? Having done no RCT, I can tell you that if the contract teachers are much cheaper per unit of human capital, we should use more of them. I can tell you that if the students speak two different languages, there is a greater benefit in having a teacher assistant to translate. I can tell you that if the government or other principal has the ability to undo outside incentives with a side contract, hence is not committed to the mechanism, dynamic mechanisms will not perform as well as you expect. These types of statements are theoretical: good old-fashioned substitution effects due to relative prices, or a priori production function issues, or basic mechanism design.

Things are worse still. It is not simply that an internally valid estimate of a treatment effect often tells us nothing about how that effect generalizes, but that the important questions in development cannot be answered with RCTs. Everyone working in development has heard this critique. But just because a critique is oft-repeated does not mean it is wrong. As Lant Pritchett argues, national development is a social process involving markets, institutions, politics, and organizations. RCTs have focused on, in his reckoning, “topics that account for roughly zero of the observed variation in human development outcomes.” Again, this isn’t to say that RCTs cannot study anything. Improving the function of developing world schools, figuring out why malaria nets are not used, investigating how to reintegrate civil war fighters: these are not minor issues, and it’s good that folks like this year’s Nobelists and their followers provide solid evidence on these topics. The question is one of balance. Are we, as economists are famously wont to do, simply looking for keys underneath the streetlight when we focus our attention on questions which are amenable to a randomized study? Has the focus on internal validity diverted effort from topics that are much more fundamental to the wealth of nations?

But fine. Let us consider that our question of interest can be studied in a randomized fashion. And let us assume that we do not expect piloting bias or other external validity concerns to be first-order. We still have an issue: even on internal validity, randomized control trials are not perfect. They are certainly not a “gold standard”, and the econometricians who push back against this framing have good reason to do so. Two primary issues arise. First, to predict what will happen if I impose a policy, I am concerned that what I have learned in the past is biased (e.g., the people observed to use schooling subsidies are more diligent than those who would go to school if we made these subsidies universal). But I am also concerned about statistical inference: with small sample sizes, even an unbiased estimate will not predict very well. I recently talked with an organization doing recruitment who quasi-randomly recruited at a small number of colleges. On average, they attracted a handful of applicants in each college. They stopped recruiting at the colleges with two or fewer applicants after the first year. But of course random variation means the difference between two and four applicants is basically nil.
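
A quick simulation with made-up numbers shows how little information that difference carries: give every college the identical underlying applicant rate, apply the “two or fewer” cutoff after one year, and you throw out a large share of perfectly good colleges on noise alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Made-up numbers: every college has the SAME true applicant rate (Poisson mean 3),
# so any difference in first-year counts is pure noise.
true_rate = 3.0
n_colleges = 1_000
first_year = rng.poisson(true_rate, size=n_colleges)

dropped = first_year <= 2                      # the "two or fewer applicants" cutoff
second_year = rng.poisson(true_rate, size=n_colleges)

print(f"share of identical colleges dropped by the rule: {dropped.mean():.0%}")
print(f"avg applicants next year at dropped colleges:    {second_year[dropped].mean():.2f}")
print(f"avg applicants next year at kept colleges:       {second_year[~dropped].mean():.2f}")
# Roughly 40% of colleges fall at or below the cutoff by chance alone, and the 'bad'
# colleges look just like the 'good' ones the following year: the 2-vs-4 gap was noise.
```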

In this vein, randomized trials tend to have very small sample sizes compared to observational studies. When this is combined with high “leverage” of outlier observations when multiple treatment arms are evaluated, particularly for heterogeneous effects, randomized trials often predict poorly out of sample even when unbiased (see Alwyn Young in the QJE on this point). Observational studies allow larger sample sizes, and hence often predict better even when they are biased. The theoretical assumptions of a structural model permit parameters to be estimated even more tightly, as we use a priori theory to effectively restrict the nature of economic effects.

We have thus far assumed the randomized trial is unbiased, but that is often suspect as well. Even if I randomly assign treatment, I have not necessarily randomly assigned spillovers in a balanced way, nor have I restricted untreated agents from rebalancing their effort or resources. A PhD student of ours on the market this year, Carlos Inoue, examined the effect of random allocation of a new coronary intervention in Brazilian hospitals. Following the arrival of this technology, good doctors moved to hospitals with the “randomized” technology. The estimated effect is therefore nothing like what would have been found had all hospitals adopted the intervention. This issue can be stated simply: randomizing treatment does not in practice hold all relevant covariates constant, and if your response is just “control for the covariates you worry about”, then we are back to the old setting of observational studies where we need a priori arguments about what these covariates are if we are to talk about the effects of a policy.

The irony is that Banerjee, Duflo and Kremer are often quite careful in how they motivate their work with traditional microeconomic theory. They rarely make grandiose claims of external validity when nothing of the sort can be shown by their experiment. Kremer is an ace theorist in his own right, Banerjee often relies on complex decision and game theory particularly in his early work, and no one can read the care with which Duflo handles issues of theory and external validity and think she is merely punting. Most of the complaints about their “randomista” followers do not fully apply to the work of the laureates themselves.

And none of the critiques above should be taken to mean that experiments cannot be incredibly useful to development. Indeed, the proof of the pudding is in the tasting: some of the small-scale interventions by Banerjee, Duflo, and Kremer have been successfully scaled up! To analogize to a firm, consider a plant manager interested in improving productivity. She could read books on operations research and try to implement ideas, but it surely is also useful to play around with experiments within her plant. Perhaps she will learn that it’s not incentives but rather lack of information that is the biggest reason workers are, say, applying car door hinges incorrectly. She may then redo training, and find fewer errors in cars produced at the plant over the next year. This evidence – not only the treatment effect, but also the rationale – can then be brought to other plants at the same company. All totally reasonable. Indeed, would we not find it insane for a manager not to try things out, and make minor changes on the margin, before implementing a huge change to incentives or training? And of course the same goes, or should go, when the World Bank or DFID or USAID spend tons of money trying to solve some development issue.

On that point, what would even a skeptic agree a development experiment can do? First, it is generally better than other methods at identifying internally valid treatment effects, though still subject to the caveats above.

Second, it can fine-tune interventions along margins where theory gives little guidance. For instance, do people not take AIDS drugs because they don’t believe they work, because they don’t have the money, or because they want to continue having sex and no one will sleep with them if they are seen picking up antiretrovirals? My colleague Laura Derksen suspected that people are often unaware that antiretrovirals prevent transmission, hence in locations with high rates of HIV, it may be safer to sleep with someone taking antiretrovirals than with the population at large. She shows that informational interventions informing villagers about this property of antiretrovirals meaningfully increase takeup of medication. We learn from her study that it may be important in the case of AIDS prevention to correct this particular set of beliefs. Theory, of course, tells us little about how widespread these incorrect beliefs are, hence about the magnitude of the effect of this informational shift on drug takeup.

Third, experiments allow us to study policies that no one has yet implemented. Ignoring the problem of statistical identification in observational studies, there may be many policies we wish to implement which are wholly different in kind from those seen in the past. The negative income tax experiments of the 1970s are a classic example. Experiments give researchers more control. This additional control is of course balanced against the fact that we should expect the most obviously valuable interventions to have already occurred, and we may have to perform experiments at relatively low scale due to cost. We should not be too small-minded here. There are now experimental development papers on topics thought to be outside the bounds of experiment. I’ve previously discussed on this site Kevin Donovan’s work randomizing the placement of roads and bridges connecting remote villages to urban centers. What could be “less amenable” to randomization than the literal construction of a road and bridge network?

So where do we stand? It is unquestionable that a lot of development work in practice was based on the flimsiest of evidence. It is unquestionable that the armies of researchers Banerjee, Duflo, and Kremer have sent into the world via J-PAL and similar institutions have brought much more rigor to program evaluation. Some of these interventions are now literally improving the lives of millions of people with clear, well-identified, nonobvious policy. That is an incredible achievement! And there is something likeable about the desire of the ivory tower to get into the weeds of day-to-day policy. Michael Kremer on this point: “The modern movement for RCTs in development economics…is about innovation, as well as evaluation. It’s a dynamic process of learning about a context through painstaking on-the-ground work, trying out different approaches, collecting good data with good causal identification, finding out that results do not fit pre-conceived theoretical ideas, working on a better theoretical understanding that fits the facts on the ground, and developing new ideas and approaches based on theory and then testing the new approaches.” No objection here.

That said, we cannot ignore that there are serious people who seriously object to the J-PAL style of development. Deaton, who won the Nobel Prize only four years ago, writes the following, in line with our discussion above: “Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as “hard” while other methods are “soft”… [T]he analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work.” Lant Pritchett argues that despite success persuading donors and policymakers, the evidence that RCTs lead to better policies at the governmental level, and hence better outcomes for people, is far from clear. The barrier to the adoption of better policy is bad incentives, not a lack of knowledge about how given policies will perform. I think these critiques are quite valid, and the randomization movement in development often wildly overstates what it has learned, and could in principle learn. But let’s give the last word to Chris Blattman on the skeptic’s case for randomized trials in development: “if a little populist evangelism will get more evidence-based thinking in the world, and tip us marginally further from Great Leaps Forward, I have one thing to say: Hallelujah.” Indeed. No one, randomista or not, longs to go back to the days of unjustified advice on development, particularly “Great Leap Forward” type programs without any real theoretical or empirical backing!

A few remaining bagatelles:

1) It is surprising how early this award was given. Though incredibly influential, the earliest published papers by any of the laureates mentioned in the Nobel scientific summary are from 2003 and 2004 (Miguel-Kremer on deworming, Duflo-Saez on retirement plans, Chattopadhyay and Duflo on female policymakers in India, Banerjee and Duflo on health in Rajasthan). This seems shockingly recent for a Nobel – I wonder if there are any other Nobel winners in economics who won entirely for work published so close to the prize announcement.

2) In my field, innovation, Kremer is most famous for his paper on patent buyouts (we discussed that paper on this site way back in 2010). How do we both incentivize new drug production and also get these drugs sold at marginal cost once invented? We think the drugmakers have better knowledge about how to produce and test a new drug than some bureaucrat, so we can’t finance drugs directly. If we give a patent, then high-value drugs return more to the inventor, but at the cost of massive deadweight loss. What we want to do is offer inventors some large fraction of the social return to their invention ex-post, in exchange for making production perfectly competitive. Kremer proposes patent auctions where, with some probability, the government buys out the patent at a multiple of the winning private bid and places the invention in the public domain; otherwise the patent is sold to the high bidder, which keeps the bids honest. The auction reveals the market value, and the multiple allows the government to account for consumer surplus and deadweight loss as well. There are many practical issues, but I have always found this an elegant, information-based attempt to solve the problem of innovation production, and it has been quite influential on those grounds. (A toy sketch of the mechanism appears just after these bagatelles.)

3) Somewhat ironically, Kremer also has a great 1990s growth paper with RCT-skeptics Pritchett, Easterly and Summers. The point is simple: growth rates by country vacillate wildly decade to decade. Knowing the 2000s, you likely would not have predicted countries like Ethiopia and Myanmar as growth miracles of the 2010s. Yet things like education, political systems, and so on are quite constant within-country across any two decade period. This necessarily means that shocks of some sort, whether from international demand, the political system, nonlinear cumulative effects, and so on, must be first-order for growth. A great, straightforward argument, well-explained.

4) There is some irony that two of Duflo’s most famous papers are not experiments at all. Her most cited paper by far is a piece of econometric theory on standard errors in difference-in-difference models, written with Marianne Bertrand. Her next most cited paper is a lovely study of the quasi-random school expansion policy in Indonesia, used to estimate the return on school construction and on education more generally. Nary a randomized experiment in sight in either paper.

5) I could go on all day about Michael Kremer’s 1990s essays. In addition to Patent Buyouts, two more of them appear on my class syllabi. The O-Ring theory is an elegant model of complementary inputs and labor market sorting, where slightly better “secretaries” earn much higher wages. The “One Million B.C.” paper notes that growth must have been low for most of human history, and that it was limited because low human density limited the spread of nonrivalrous ideas. It is the classic Malthus plus endogenous growth paper, and always a hit among students.

6) Ok, one more for Kremer, since “Elephants” is my favorite title in economics. Theoretically, future scarcity increases prices. When people think elephants will go extinct, the price of ivory therefore rises, making extinction more likely as poaching incentives go up. What to do? Hold a government stockpile of ivory and commit to selling it if the stock of living elephants falls below a certain point. Elegant. And I can’t help but think: how would one study this particular general equilibrium effect experimentally? I both believe the result and suspect that randomized trials are not a good way to understand it!
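
As promised in bagatelle 2, here is a toy sketch of the buyout mechanism; the bids, markup, and buyout probability are all invented for illustration, and Kremer’s actual proposal handles complications (collusion among bidders, how to choose the markup) that are ignored here.

```python
# A toy sketch of a patent buyout: an auction reveals the private value of the
# patent, and the government usually buys it out at a markup over that value.
import random

def patent_buyout(private_bids, markup=2.0, buyout_prob=0.9, rng=random.Random(0)):
    """Run a sealed-bid auction, then buy out the patent with some probability.

    The winning private bid estimates the patent's private value. With high
    probability the government pays that bid times a markup (meant to cover
    consumer surplus and avoided deadweight loss) and places the invention in
    the public domain; with the remaining probability the high bidder actually
    buys the patent, which is what keeps the private bids honest.
    """
    winner_bid = max(private_bids)
    if rng.random() < buyout_prob:
        return {"outcome": "public domain", "payment_to_inventor": markup * winner_bid}
    return {"outcome": "sold to high bidder", "payment_to_inventor": winner_bid}

print(patent_buyout(private_bids=[40, 55, 62]))
```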

The Price of Everything, the Value of the Economy: A Clark Medal for Emi Nakamura!

Fantastic and well-deserved news this morning with the Clark Medal being awarded to Emi Nakamura, who has recently moved from Columbia to Berkeley. Incredibly, Nakamura’s award is the first Clark to go to a macroeconomist in the 21st century. The Great Recession, the massive changes in global trade patterns, the rise of monetary areas like the Eurozone, the “savings glut” and its effect on interest rates, the change in openness to hot financial flows: it has been a wild two decades for the macroeconomy since Andrei Shleifer won the Clark. It’s hard to imagine what could be more important for an economist to understand than these patterns.

Something unusual has happened in macroeconomics over the past twenty years: it has become more like Industrial Organization! A brief history may be useful. The term macroeconomics is due to Ragnar Frisch, in his 1933 article on the propagation of economic shocks. He writes,

“The macro-dynamic analysis…tries to give an account of the whole economic system taken in its entirety. Obviously in this case it is impossible to carry through the analysis in great detail. Of course, it is always possible to give even a macro-dynamic analysis in detail if we confine ourselves to a purely formal theory. Indeed, it is always possible by a suitable system of subscripts and superscripts, etc., to introduce practically all factors which we may imagine…Such a theory, however, would have only a rather limited interest. It would hardly be possible to study such fundamental problems as the exact time shape of the solution, [etc.]. These latter problems are just the essential problems in business cycle analysis. In order to attack these problems on a macro-dynamic basis…we must deliberately disregard a considerable amount of the details of the picture.”

And so we did. The Keynesians collapsed the microfoundations of the macroeconomy into a handful of relevant market-wide parameters. The Lucas Critique argued that we can collapse some things – many agents into a representative agent, for instance – but we ought always begin our analysis with the fundamental parameters of tastes, constraints, and technologies. The neoclassical synthesis combined these raw parameters with nominal rigidities – sticky prices, limited information, and so on. But Frisch’s main point nonetheless held strong: to what use are these deeper theoretical parameters if we cannot estimate their value and their effect on the macroeconomy? As Einstein taught us, the goal of the scientist should be to make things as simple as possible, but no simpler.

What has changed recently in macroeconomics is twofold. First, computational power now makes it possible to estimate or calibrate very complex dynamic and stochastic models, with forward looking agents, with price paths in and out of equilibrium, with multiple frictions – it is in this way that macro begins to look like industrial organization, with microeconomic parameters at the base. But second, and again analogous to IO, the amount of data available to the researcher has grown enormously. We now have price scanner data that tells us exactly when and how prices change, how those changes propagate across supply chains and countries, how they interact with taxes, and so on. Frisch’s problem has in some sense been solved: we no longer have the same trade-off between usefulness and depth when studying the macroeconomy.

Nakamura is best known for using this deep combination of data and theory to understand how exactly firms set prices. Price rigidities play a particularly important role in theories of the macroeconomy that potentially involve inefficiency. Consider a (somewhat bowdlerized) version of real business cycle theory. Here, shocks hit the economy: for instance, an oil cartel withholds supply for political reasons. Firms must react to this “real” supply-side shock by reorganizing economic activity. The real shock then propagates across industries. The role of monetary policy in such a world is limited: a recession simply reflects industries reacting to real change in the economic environment.

When prices are “sticky”, however, that is no longer true. The speed by which real shocks propagate, and the distortion sticky prices introduce, can be affected by monetary policy, since firms will react to changes in expected inflation by changing the frequency with which they update prices. Famously, Golosov and Lucas in the JPE argued, theoretically and empirically, that the welfare effects of “sticky prices” or “menu costs” are not terribly large. Extracting these welfare effects is quite sensitive to a number of features in the data and in the theory. To what extent is there short-term price dispersion rather than an exogenous chance for all firms in an industry to change their prices? Note that price dispersion is difficult to maintain unless we have consumer search costs – otherwise, everyone buys from the cheapest vendor – so price dispersion adds a non-trivial technical challenge. How much do prices actually change – do we want to sweep out short-term sales, for example? When inflation is higher, do firms adjust prices equally often but with bigger price jumps (consider the famous doubling of the price of Coca-Cola), or do they adjust prices more often, keeping the percentage change similar to low-inflation environments? How much heterogeneity is there in price-setting practices across industries, and to what extent do these differences affect the welfare consequences of sticky prices given the links across industries?

Nakamura has pushed us very far into answering these questions. She has built insane price datasets, come up with clever identification strategies to separate pricing models, and used these tools to vastly increase our understanding of the interaction between price rigidities and the business cycle. Her “Five Facts” paper uses BLS microdata to show that sales were roughly half of the “price changes” earlier researchers had found, that prices change more rapidly when inflation is higher, and that there is huge heterogeneity across industries in price change behavior. Taking that data back to the 1970s, Nakamura and coauthors also show that high inflation environments do not cause more price dispersion: rather, firms update their prices more often. Bob Lucas in his Macroeconomic Priorities made a compelling argument that business cycle welfare costs are much smaller than the costs of inflation, and inflation costs are themselves much smaller than the costs of tax distortions. As Nakamura points out, if you believe this, no wonder you prioritize price stability and tax policy! (Many have quibbled with Lucas’ basic argument, but even adding heterogeneous agents, it is tough to get business cycles to have large economic consequences; see, e.g., Krusell et al RED 2009.) Understanding better the true costs of inflation, via the feedback of monetary expansion on pricesetting, goes a great deal toward helping policymakers calibrate the costs and benefits of price stability vis-a-vis other macroeconomic goals.
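
To get a feel for the sales measurement point, here is a toy calculation on an invented price series; the crude one-period “sale filter” below is a stand-in for the far more careful algorithms applied to the BLS microdata.

```python
# How often do prices "change" if we count temporary sales, versus if we filter
# them out and track only regular prices? Series and filter are simplified toys.

prices = [5.00, 5.00, 3.99, 5.00, 5.00, 5.00, 3.99, 5.00, 5.50, 5.50, 5.50, 4.49, 5.50]

def change_frequency(series):
    """Fraction of periods in which the posted price differs from last period."""
    changes = sum(1 for a, b in zip(series, series[1:]) if a != b)
    return changes / (len(series) - 1)

def strip_sales(series):
    """Replace one-period dips that revert to the prior price with that price."""
    out = list(series)
    for t in range(1, len(out) - 1):
        if out[t] < out[t - 1] and out[t + 1] == out[t - 1]:
            out[t] = out[t - 1]
    return out

print(f"raw price-change frequency:     {change_frequency(prices):.2f}")
print(f"regular-price change frequency: {change_frequency(strip_sales(prices)):.2f}")
```

In this invented series most of the apparent flexibility is temporary discounting; the regular price hardly moves, which is the flavor of the “Five Facts” finding.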

Though generally known as an empirical macroeconomist, Nakamura also has a number of papers, many with her husband Jon Steinsson, on the theory of price setting. For example, why are prices sticky and yet also subject to frequent sales? In a clever paper in the JME, Nakamura and Steinsson model a firm pricing to habit-forming consumers. If the firm does not constrain itself, it has the incentive to raise prices once consumers form their habit for a given product (as a Cheez-It fan, I understand the model well – my willingness to pay for a box shipped up from the US to the Cheez-It-free land of Canada is absurdly high). To avoid this time inconsistency problem, firms would like to commit to a price path with some flexibility to respond to changes in demand. An equilibrium in this relational contract-type model involves a price cap with sales when demand falls: rigid prices plus sales, as we see in the data! In a second theoretical paper with Steinsson and Alisdair McKay, Nakamura looks into how much communication about future nominal interest rates can affect behavior. In principle, a ton: if you tell me the Fed will keep the real interest rate low for many years (low rates in the future raise consumption in the future, which raises inflation in the future, which lowers real rates today), I will borrow away. Adding borrowing constraints and income risk, however, means that I will never borrow too much money: I might get a bad shock tomorrow and wind up on the street. Giving five years of forward guidance about interest rates rather than a year, therefore, doesn’t really affect my behavior that much: the desire to have precautionary savings is what limits my borrowing, not the interest rate.

Nakamura’s prize is a well-deserved award, going to a leader in the shift toward a style of macro that is more empirical and more deeply “microeconomic” in its theory. Her focus is keenly targeted toward some of the key puzzles relevant to macroeconomic policymakers. There is no way to cover such a broad field in one post – this is not one of those awards given for a single paper – but luckily Nakamura has two great easily-readable summaries of her core work. First, in the Annual Review of Economics, she lays out the new empirical facts on price changes, the attempts to identify the link between monetary policy and price changes, and the implications for business cycle theory. Second, in the Journal of Economic Perspectives, she discusses how macroeconomists have attempted to more credibly identify theoretical parameters. In particular, external validity is so concerning in macro – remember the Lucas Critique! – that the essence of the problem involves combining empirical variation for identification with theory mapping that variation into broader policy guidance. I hesitate to stop here since Nakamura has so many influential papers, but let us take just two more quick tasters that are well worth your deeper exploration. On the government spending side, she uses local spending shocks and a serious model to figure out the national fiscal multiplier from government spending. And she has recently linked the end of the large-scale movement of women from home production into the labor force to recessions that last longer.

How We Create and Destroy Growth: A Nobel for Romer and Nordhaus

Occasionally, the Nobel Committee gives a prize which is unexpected, surprising, yet deft in how it points out underappreciated research. This year, they did no such thing. Both William Nordhaus and Paul Romer have been running favorites for years in my Nobel betting pool with friends at the Federal Reserve. The surprise, if anything, is that the prize went to both men together: Nordhaus is best known for his environmental economics, and Romer for his theory of “endogenous” growth.

On reflection, the connection between their work is obvious. But it is the connection that makes clear how inaccurate many of today’s headlines – “an economic prize for climate change” – really are. Because it is not the climate that both winners build on, but rather a more fundamental economic question: economic growth. Why are some places and times rich and others poor? And what is the impact of these differences? Adam Smith’s “The Wealth of Nations” is formally titled “An Inquiry into the Nature and Causes of the Wealth of Nations”, so these are certainly not new questions in economics. Yet the Classical economists did not have the same conception of economic growth that we have; they largely lived in a world of cycles, of ebbs and flows, with income per capita facing the constraint of agricultural land. Schumpeter, who certainly cared about growth, notes that Smith’s discussion of the “different progress of opulence in different nations” is “dry and uninspired”, perhaps only a “starting point of a sort of economic sociology that was never written.”

As each generation became richer than the one before it – at least in a handful of Western countries and Japan – economists began to search more deeply for the reason. Marx saw capital accumulation as the driver. Schumpeter certainly saw innovation (though not invention, as he always made clear) as important, though he had no formal theory. It was two models that appeared during and soon after World War II – that of Harrod-Domar, and Solow-Swan-Tinbergen – which began to make real progress. In Harrod-Domar, economic output is a function of capital Y=f(K), nothing is produced without capital f(0)=0, the economy is constant returns to scale in capital df/dK=c, and the change in capital over time depends on what is saved from output minus what depreciates, dK/dt=sY-zK, where z is the rate of depreciation. Put those assumptions together and you will see that the growth rate of output is (1/Y)dY/dt=sc-z. Since c and z are fixed, the only way to grow faster is to crank up the savings rate, Soviet style. And no doubt, capital deepening has worked in many places.
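
For concreteness, here is the one-line algebra behind that claim, using the linear technology Y=cK implied by the assumptions above:

```latex
% One-line Harrod-Domar algebra under the assumptions above
\[
Y = cK, \qquad \frac{dK}{dt} = sY - zK
\;\;\Longrightarrow\;\;
\frac{dY}{dt} = c\,\frac{dK}{dt} = c\,(sY - zK) = (sc - z)\,Y,
\qquad\text{so}\qquad
\frac{1}{Y}\frac{dY}{dt} = sc - z .
\]
```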

Solow-type models push further. They let the economy be a function of “technology” A(t), the capital stock K(t), and labor L(t), where output Y(t)=K^a*(A(t)L(t))^(1-a) – that is, that production is constant returns to scale in capital and labor. Solow assumes capital depends on savings and depreciation as in Harrod-Domar, that labor grows at a constant rate n, and that “technology” grows at constant rate g. Solving this model, and writing k=K/(AL) and y=Y/(AL) for capital and output per effective worker, gets you that capital evolves as dk/dt=sy-(n+z+g)k, and that along the balanced growth path output is exactly proportional to capital. You can therefore just do growth accounting: we observe the growth of labor and capital, and Solow shows that there is not enough growth in those factors to explain U.S. growth. Instead, growth seems to be largely driven by change in A(t), what Abramovitz called “the measure of our ignorance” but which we often call “technology” or “total factor productivity”.
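
Spelled out, with capital share a, the per-effective-worker dynamics and the steady state are:

```latex
% Solow in per-effective-worker terms, with k = K/(AL), y = Y/(AL), capital share a
\[
y = k^{a}, \qquad
\frac{dk}{dt} = s\,k^{a} - (n + g + z)\,k, \qquad
k^{*} = \left(\frac{s}{n + g + z}\right)^{\frac{1}{1-a}} .
\]
```

On the balanced growth path k and y are constant, so total output grows at n+g and output per capita grows at g alone: all long-run per capita growth comes from A(t), which is why the residual carries so much weight.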

Well, who can see that fact, as well as the massive corporate R&D facilities of the post-war era throwing out inventions like the transistor, and not think: surely the factors that drive A(t) are endogenous, meaning “from within”, to the profit-maximizing choices of firms? If firms produce technology, what stops other firms from replicating these ideas, a classic positive externality which would lead the rate of technology in a free market to be too low? And who can see the low level of convergence of poor country incomes to rich, and not think: there must be some barrier to the spread of A(t) around the world, since otherwise the return to capital must be extraordinary in places with access to great technology, really cheap labor, and little existing capital to combine with it. And another question: if technology – productivity itself! – is endogenous, then ought we consider not just the positive externality that spills over to other firms, but also the negative externality of pollution, especially climate change, that new technologies both induce and help fix? Finally, if we know how to incentivize new technology, and how growth harms the environment, what is the best way to mitigate the great environmental problem of our day, climate change, without stopping the wondrous increase in living standards growth keeps providing? It is precisely for helping answer these questions that Romer and Nordhaus won the Nobel.

Romer and Endogenous Growth

Let us start with Paul Romer. You know you have knocked your Ph.D. thesis out of the park when the great economics journalist David Warsh writes an entire book hailing your work as solving the oldest puzzle in economics. The two early Romer papers, published in 1986 and 1990, have each been cited more than 25,000 times, which is an absolutely extraordinary number by the standards of economics.

Romer’s achievement was writing a model where inventors spend money to produce inventions with increasing returns to scale, other firms use those inventions to produce goods, and a competitive Arrow-Debreu equilibrium still exists. If we had such a model, we could investigate what policies a government might wish to pursue if it wanted to induce firms to produce growth-enhancing inventions.

Let’s be more specific. First, innovation is increasing returns to scale because ideas are nonrival. If I double the amount of labor and capital, holding technology fixed, I double output, but if I double technology, labor, and capital, I more than double output. That is, give one person a hammer, and they can build, say, one staircase a day. Give two people two hammers, and they can build two staircases by just performing exactly the same tasks. But give two people two hammers, and teach them a more efficient way to combine nail and wood, and they will be able to build more than two staircases. Second, if capital and labor are constant returns to scale and are paid their marginal product in a competitive equilibrium, then there is no output left to pay inventors anything for their ideas. That is, it is not tough to model in partial equilibrium the idea of nonrival ideas, and indeed the realization that a single invention improves productivity for all is also an old one: as Thomas Jefferson wrote in 1813, “[h]e who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.” The difficulty is figuring out how to get these positive spillovers yet still have “prices” or some sort of rent for the invention. Otherwise, why would anyone pursue costly invention?
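
Both points can be seen in one line with the Cobb-Douglas production function used in the Solow discussion above: doubling capital and labor doubles output, doubling the stock of ideas as well more than doubles it, and marginal-product factor payments exhaust output, leaving nothing with which to pay inventors.

```latex
% Nonrivalry makes the full function increasing returns; Euler's theorem
% exhausts output when K and L are paid their marginal products
\[
Y = F(A, K, L) = K^{a}\,(A L)^{1-a}: \qquad
F(A, 2K, 2L) = 2\,Y, \qquad
F(2A, 2K, 2L) = 2^{\,2-a}\,Y > 2\,Y,
\]
\[
\frac{\partial Y}{\partial K}\,K + \frac{\partial Y}{\partial L}\,L
= a\,Y + (1-a)\,Y = Y
\quad\Longrightarrow\quad \text{nothing is left over to pay for } A .
\]
```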

We also need to ensure that growth is not too fast. There is a stock of existing technology in the world. I use that technology to create new innovations which grow the economy. With more people over time and more innovations over time, you may expect the growth rate to be higher in bigger and more technologically advanced societies. It is in part, as Michael Kremer points out in his One Million B.C. paper. Nonetheless, the rate of growth is not asymptotically increasing by any stretch (see, e.g., Ben Jones on this point). Indeed, growth is nearly constant, abstracting from the business cycle, in the United States, despite a big growth in population and the stock of existing technology.

Romer’s first attempt at endogenous growth was based on his thesis and published in the JPE in 1986. Here, he adds “learning by doing” to Solow: technology is a function of the capital stock A(t)=bK(t). As firms use capital, they generate learning which spills over to other firms. Even if population is constant, with appropriate assumptions on production functions and capital depreciation, capital, output, and technology grow over time. There is a problem here, however, and one that is common to any model based on learning-by-doing which partially spills over to other firms. As Dasgupta and Stiglitz point out, if there is learning-by-doing which only partially spills over, the industry is a natural monopoly. And even if it starts competitively, as I learn more than you, dynamically I can produce more efficiently, lower my prices, and take market share from you. A decentralized competitive equilibrium with endogenous technological growth is unsustainable!
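
In its simplest textbook form (a deliberate simplification of Romer’s 1986 setup), substituting the learning-by-doing rule into the aggregate production function shows why capital, output, and technology can keep growing even with a fixed population:

```latex
% Learning-by-doing turns the Solow production function into an "AK" model
\[
A(t) = b\,K(t) \;\Longrightarrow\;
Y = K^{a}\,(A L)^{1-a} = b^{\,1-a} L^{\,1-a}\, K,
\qquad
\frac{1}{K}\frac{dK}{dt} = \frac{sY - zK}{K} = s\,b^{\,1-a} L^{\,1-a} - z .
\]
```

With L fixed, the growth rate on the right is constant and positive whenever savings are high enough, with no exogenous technical change at all.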

Back to the drawing board, then. We want firms to intentionally produce technology in a competitive market as they would other goods. We want technology to be nonrival. And we want technology production to lead to growth. Learning-by-doing allows technology to spill over, but would simply lead to a monopoly producer. Pure constant-returns-to-scale competitive production, where technology is just an input like capital produced with a “nonconvexity” – only the initial inventor pays the fixed cost of invention – means that there is no output left to pay for invention once other factors get their marginal product. A natural idea, well known to Arrow 1962 and others, emerges: we need some source of market power for inventors.

Romer’s insight is that inventions are nonrival, yes, but they are also partially excludable, via secrecy, patents, or other means. In his blockbuster 1990 JPE Endogenous Technological Change, he lets inventions be given an infinite patent, but also be partially substitutable by other inventions, constraining price (this is just a Spence-style monopolistic competition model). The more inventions there are, the more efficiently final goods can be made. Future researchers can use present technology as an input to their invention for free. Invention is thus partially excludable in the sense that my exact invention is “protected” from competition, but also spills over to other researchers by making it easier for them to invent other things. Inventions are therefore neither public nor private goods, and also not “club goods” (nonrival but excludable) since inventors cannot exclude future inventors from using their good idea to motivate more invention. Since there is free entry into invention, the present value of the infinite stream of monopoly rents from an invention is exactly equal to the cost of creating it.

From the perspective of final goods producers, there are just technologies I can license as inputs, which I then use in a constant returns to scale way to produce goods, as in Solow. Every factor is paid its marginal product, but inventions are sold for more than their marginal cost due to monopolistic excludability from secrecy or patents. The model is general equilibrium, and gives a ton of insight about policy: for instance, if you subsidize capital goods, do you get more or less growth? In Romer (1986), where all growth is learning-by-doing, cheaper capital means more learning means more growth. In Romer (1990), capital subsidies can be counterproductive!

There are some issues to be worked out: the Romer models still have “scale effects”, in which the growth rate rises with the size of the population, whereas growth in the modern world has been roughly constant despite large changes in population and the stock of technology (see Chad Jones’ 1995 and 1999 papers). The neo-Schumpeterian models of Aghion-Howitt and Grossman-Helpman add the important idea that new inventions don’t just add to the stock of knowledge, but also make old inventions less valuable. And really critically, the idea that institutions and not just economic fundamentals affect growth – meaning laws, culture, and so on – is a massive field of research at present. But it was Romer who first cracked the nut of how to model invention in general equilibrium, and I am unaware of any later model which solves this problem in a more satisfying way.

Nordhaus and the Economic Solution to Pollution

So we have, with Romer, a general equilibrium model for thinking about why people produce new technology. The connection with Nordhaus comes in a problem that is both caused by, and potentially solved by, growth. In 2018, even an ignoramus knows the terms “climate change” and “global warming”. This was not at all the case when William Nordhaus began thinking about how the economy and the environment interrelate in the early 1970s.

Growth was fairly unobjectionable as a policy goal in 1960: indeed, a greater capability of making goods, and of making war, seemed a necessity for both the Free and Soviet worlds. But by the early 1970s, environmental concerns arose. The Club of Rome warned that we were going to run out of resources if we continued to use them so unsustainably: resources are of course finite, and there are therefore “limits to growth”. Beyond just running out of resources, growth could also be harmful because of negative externalities on the environment, particularly the newfangled idea of global warming that an MIT report warned about in 1970.

Nordhaus treated those ideas both seriously and skeptically. In a 1974 AER P&P, he notes that technological progress or adequate factor substitution allow us to avoid “limits to growth”. To put it simply, whales are limited in supply, and hence whale oil is as well, yet we light many more rooms than we did in 1870 due to new technologies and substitutes for whale oil. Despite this skepticism, Nordhaus does show concern for the externalities of growth on global warming, giving a back-of-the-envelope calculation that along a projected Solow-type growth path, the amount of carbon in the atmosphere will reach a dangerous 487ppm by 2030, surprisingly close to our current estimates. In a contemporaneous essay with Tobin, and in a review of an environmentalist’s “system dynamics” predictions of future economic collapse, Nordhaus reaches a similar conclusion: substitutable factors mean that running out of resources is not a huge concern, but rather the exact opposite, that we will have access to and use too many polluting resources, should worry us. That is tremendous foresight for someone writing in 1974!

Before turning back to climate change, can we celebrate again the success of economics against the Club of Rome ridiculousness? There were widespread predictions, from very serious people, that growth would not just slow but reverse by the end of the 1980s due to “unsustainable” resource use. Instead, GDP per capita has nearly doubled since 1990, with the most critical change coming for the very poorest. There would have been no greater disaster for the twentieth century than had we attempted to slow the progress and diffusion of technology, in agriculture, manufacturing and services alike, in order to follow the nonsense economics being promulgated by prominent biologists and environmental scientists.

Now, being wrong once is no guarantee of being wrong again, and the environmentalists appear quite right about climate change. So it is again a feather in the cap of Nordhaus to both be skeptical of economic nonsense, and also sound the alarm about true environmental problems where economics has something to contribute. As Nordhaus writes, “to dismiss today’s ecological concerns out of hand would be reckless. Because boys have mistakenly cried ‘wolf’ in the past does not mean that the woods are safe.”

Just as we can refute Club of Rome worries with serious economics, so too can we study climate change. The economy affects the climate, and the climate affects the economy. What we need is an integrated model to assess how economic activity, including growth, affects CO2 production and therefore climate change, allowing us to back out the appropriate Pigouvian carbon tax. This is precisely what Nordhaus did with his two celebrated “Integrated Assessment Models”, which built on his earlier simplified models (e.g., 1975’s Can We Control Carbon Dioxide?). These models have Solow-type endogenous savings, and make precise the tradeoffs of lower economic growth against lower climate change, as well as making clear the critical importance of the social discount rate and the micro-estimates of the cost of adjustment to climate change.

The latter goes well beyond the science of climate change holding the world constant: the Netherlands, in a climate sense, should be underwater, but they use dikes to restrain the ocean. Likewise, the cost of adjusting to an increase in temperature is something to be estimated empirically. Nordhaus takes climate change very seriously, but he is much less concerned about the need for immediate action than the famous Stern report, which takes fairly extreme positions about the discount rate (1000 generations in the future are weighed the same as us, in Stern) and the costs of adjustment.

Consider the “optimal path” for carbon from Nordhaus’ most recent run of the model, plotted in his published figure as the blue line.

Note that he permits much more carbon than Stern or a policy which mandates temperatures stay below a 2.5 C rise forever. The reason is that the costs to growth in the short term are high: the world is still very poor in many places! There was a vitriolic debate following the Stern report about who was correct: whether the appropriate social discount rate is zero or something higher is a quasi-philosophical debate going back to Ramsey (1928). But you can see here how important the calibration is.
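
To see just how important, here is a two-line calculation; the $1 trillion damage figure and the 80-year horizon are placeholders I made up, chosen only to show the order-of-magnitude gap that the discount rate alone creates.

```python
# Present value today of a hypothetical $1 trillion climate damage 80 years out,
# under a near-zero annual discount rate versus a markedly higher one.
damage, years = 1_000_000_000_000, 80

for rate in (0.001, 0.03):
    pv = damage / (1 + rate) ** years
    print(f"annual discount rate {rate:.1%}: present value ≈ ${pv / 1e9:,.0f} billion")
```

The near-zero rate leaves nearly the full trillion on the books today, while the higher rate shrinks the same damage by roughly an order of magnitude, which is the heart of the Stern versus Nordhaus disagreement.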

There are other minor points of disagreement between Nordhaus and Stern, and my sense is that there has been some, though not full, convergence in their beliefs about optimal policy. But there is no disagreement whatsoever between the economic and environmental community that the appropriate way to estimate the optimal response to climate change is via an explicit model incorporating some sort of endogeneity of economic reaction to climate policy. The power of the model is that we can be extremely clear about what points of disagreement remain, and we can examine the sensitivity of optimal policy to factors like climate “tipping points”.

There is one other issue: in Nordhaus’ IAMs, and in Stern, you limit climate change by imposing cap and trade or carbon taxes. But carbon harms cross borders. How do you stop free riding? Nordhaus, in a 2015 AER, shows theoretically that there is no way to generate optimal climate abatement without sanctions for non-participants, but that relatively small trade penalties work quite well. This is precisely what Emmanuel Macron is currently proposing!

Let’s wrap up by linking Nordhaus even more tightly back to Romer. It should be noted that Nordhaus was very interested in the idea of pure endogenous growth, as distinct from any environmental concerns, from the very start of his career. His thesis was on the topic (leading to a proto-endogenous growth paper in the AER P&P in 1969), and he wrote a skeptical piece in the QJE in 1973 about the then-leading theories of what factors induce certain types of innovation (objections which I think have been fixed by Acemoglu 2002). Like Romer, Nordhaus has long worried that inventors do not receive enough of the return to their invention, and that we measure innovation poorly – see his classic NBER chapter on inventions in lighting, and his attempt to estimate how much of society’s output goes to innovators.

The connection between the very frontier of endogenous growth models, and environmental IAMs, has not gone unnoticed by other scholars. Nordhaus’ IAMs tend to have limited incorporation of endogenous innovation in dirty or clean sectors. But a fantastic paper by Acemoglu, Aghion, Bursztyn, and Hemous combines endogenous technical change with Nordhaus-type climate modeling to suggest a middle ground between Stern and Nordhaus: use subsidies to get green energy close to the technological frontier, then use taxes once their distortion is relatively limited because a good green substitute exists. Indeed, since this paper first started floating around 8 or so years ago, massive subsidies to green energy sources like solar by many countries have indeed made the “cost” of stopping climate change much lower than if we’d relied solely on taxes, since now production of very low cost solar, and mass market electric cars, is in fact economically viable.

It may indeed be possible to solve climate change – what Stern called “the greatest market failure” man has ever seen – by changing the incentives for green innovation, rather than just by making economic growth more expensive by taxing carbon. Going beyond just solving the problem of climate change, to solving it in a way that minimizes economic harm, is a hell of an accomplishment, and more than worthy of the Nobel prizes Romer and Nordhaus won for showing us this path!

Some Further Reading

In my PhD class on innovation, the handout I give on the very first day introduces Romer’s work and why non-mathematical models of endogenous innovation mislead. Paul Romer himself has a nice essay on climate optimism, and the extent to which endogenous invention matters for how we stop global warming. On why anyone signs climate change abatement agreements, instead of just free riding, see the clever incomplete contracts insight of Battaglini and Harstad. Romer has also been greatly interested in the policy of “high-growth” places, pushing the idea of Charter Cities. Charter Cities involve Hong Kong-like exclaves of a developing country where the institutions and legal systems are farmed out to a more stable nation. Totally reasonable, but in fact quite controversial: a charter city proposal in Madagascar led to a coup, and I can easily imagine that the Charter City controversy delayed Romer’s well-deserved Nobel laurel. The New York Times points out that Nordhaus’ brother helped write the Clean Air Act of 1970. Finally, as is always true with the Nobel, the official scientific summary is lucid and deep in its exploration of the two winners’ work.

The 2017 Nobel: Richard Thaler

A true surprise this morning: the behavioral economist Richard Thaler from the University of Chicago has won the Nobel Prize in economics. It is not a surprise because the award is undeserved; rather, it is a surprise because only four years ago, Thaler’s natural co-laureate Bob Shiller won while Thaler was left the bridesmaid. But Thaler’s influence on the profession, and the world, is unquestionable. There are few developed governments that do not have a “nudge” unit of some sort trying to take advantage of behavioral insights to push people a touch in one way or another, including here in Ontario via my colleagues at BEAR. I will admit, perhaps under the undue influence of too many dead economists, that I am skeptical of nudging and behavioral finance on both positive and normative grounds, so this review will be one of friendly challenge rather than hagiography. I trust that there will be no shortage of wonderful positive reflections on Thaler’s contribution to policy, particularly because he is the rare economist whose work is totally accessible to laymen and, more importantly, journalists.

Much of my skepticism is similar to how Fama thinks about behavioral finance: “I’ve always said they are very good at describing how individual behavior departs from rationality. That branch of it has been incredibly useful. It’s the leap from there to what it implies about market pricing where the claims are not so well-documented in terms of empirical evidence.” In other words, surely most people are not that informed and not that rational much of the time, but repeated experience, market selection, and other aggregative factors mean that this irrationality may not matter much for the economy at large. It is very easy to claim that since economists model “agents” as “rational”, we would, for example, “not expect a gift on the day of the year in which she happened to get married, or be born” and indeed “would be perplexed by the idea of gifts at all” (Thaler 2015). This type of economist caricature is both widespread and absurd, I’m afraid. In order to understand the value of Thaler’s work, we ought first look at situations where behavioral factors matter in real world, equilibrium decisions of consequence, then figure out how common those situations are, and why.

The canonical example of Thaler’s useful behavioral nudges is his “Save More Tomorrow” pension plan, with Benartzi. Many individuals in defined contribution plans save too little, both because they are not good at calculating how much they need to save and because they are biased toward present consumption. You can, of course, force people to save a la Singapore, but we dislike these plans because individuals vary in their need and desire for saving, and because we find the reliance on government coercion to save heavy-handed. Alternatively, you can default defined-contribution plans to involve some savings rate, but it turns out people do not vary their behavior from the default throughout their career, and hence save too little solely because they didn’t want too much removed from their first paycheck. Thaler and Benartzi have companies offer plans where you agree now to having your savings rate increased when you get raises – for instance, if your salary goes up 2%, half of that raise is diverted into your savings plan starting tomorrow – until you reach a savings rate that is sufficiently high. In this way, no one takes a nominal post-savings paycut. People can, of course, leave this plan whenever they want. In their field experiments, savings rates did in fact soar (with takeup varying hugely depending on how information about the plan was presented), and attrition in the future from the plan was low.
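
Here is a minimal sketch of those mechanics, with invented salary and plan parameters; the point is simply that part of each raise is diverted into the contribution rate, capped at a target, so take-home pay never falls in nominal terms.

```python
# Toy illustration of an escalating-contribution plan of the sort described above.
# Salary, raises, and plan parameters are invented for illustration.
salary, save_rate = 50_000.0, 0.03                   # starting salary and contribution rate
raise_rate, escalator, cap = 0.03, 0.5, 0.12         # annual raise, share of raise saved, target rate

prev_take_home = salary * (1 - save_rate)
for year in range(1, 9):
    salary *= 1 + raise_rate                                    # annual raise arrives
    save_rate = min(cap, save_rate + escalator * raise_rate)    # divert part of the raise
    take_home = salary * (1 - save_rate)
    assert take_home >= prev_take_home                          # no nominal pay cut, the key selling point
    print(f"year {year}: contribution rate {save_rate:.1%}, take-home ${take_home:,.0f}")
    prev_take_home = take_home
```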

This policy is what Thaler and Sunstein call “libertarian paternalism”. It is paternalistic because, yes, we think that you may make bad decisions from your own perspective because you are not that bright, or because you are lazy, or because you have many things which require your attention. It is libertarian because there is no compulsion, in that anyone can opt out at their leisure. Results similar to Thaler and Benartzi’s have been found by Ashraf et al in a field experiment in the Philippines, and by Karlan et al in three countries, where just sending reminder messages that make savings goals more salient modestly increases savings.

So far, so good. We have three issues to unpack, however. First, when is this nudge acceptable on ethical grounds? Second, why does nudging generate such large effects here, and if the effects are large, why doesn’t the market simply provide them? Third, is the 401k savings case idiosyncratic or representative? The idea that homo economicus, the rational calculator, misses important features of human behavior, and could do with some insights from psychology, is not new, of course. Thaler’s prize is, at minimum, the fifth Nobel to go to someone pushing this general idea, since Herb Simon, Maurice Allais, Daniel Kahneman, and the aforementioned Bob Shiller have all already won. Copious empirical evidence, and indeed simple human observation, implies that people have behavioral biases, that they are not perfectly rational – as Thaler has noted, we see what looks like irrationality even in the composition of 100 million dollar baseball rosters. The more militant behavioralists insist that ignoring these psychological factors is unscientific! And yet, and yet: the vast majority of economists, all of whom are by now familiar with these illustrious laureates and their work, still use fairly standard expected utility maximizing agents in nearly all of our papers. Unpacking the three issues above will clarify how that could possibly be so.

Let’s discuss ethics first. Simply arguing that organizations “must” make a choice (as Thaler and Sunstein do) is insufficient; we would not say a firm that defaults consumers into an autorenewal for a product they rarely renew when making an active choice is acting “neutrally”. Nudges can be used for “good” or “evil”. Worse, whether a nudge is good or evil depends on the planner’s evaluation of the agent’s “inner rational self”, as Infante and Sugden, among others, have noted many times. That is, claiming paternalism is “only a nudge” does not excuse the paternalist from the usual moral philosophic critiques! Indeed, as Chetty and friends have argued, the more you believe behavioral biases exist and are “nudgeable”, the more careful you need to be as a policymaker about inadvertently reducing welfare. There is, I think, less controversy when we use nudges rather than coercion to reach some policy goal. For instance, if a policymaker wants to reduce energy usage, and is worried about distortionary taxation, nudges may (depending on how you think about social welfare with non-rational preferences!) be a better way to achieve the desired outcomes. But this goal is very different from the common justification that nudges somehow are pushing people toward policies they actually like in their heart of hearts. Carroll et al have a very nice theoretical paper trying to untangle exactly what “better” means for behavioral agents, and exactly when the imprecision of nudges or defaults given our imperfect knowledge of individual’s heterogeneous preferences makes attempts at libertarian paternalism worse than laissez faire.

What of the practical effects of nudges? How can they be so large, and in what contexts? Thaler has very convincingly shown that behavioral biases can affect real world behavior, and that understanding those biases means two policies which are identical from the perspective of a homo economicus model can have very different effects. But many economic situations involve players doing things repeatedly with feedback – where heuristics approximated by rationality evolve – or involve players who “perform poorly” being selected out of the game. For example, I can think of many simple nudges to get you or me to play better basketball. But when it comes to Michael Jordan, the first-order effects are surely how well he takes care of his health, the teammates he has around him, and so on. I can think of many heuristics useful for understanding how simple physics will operate, but I don’t think I can find many that would improve Einstein’s understanding of how the world works. The 401k situation is unusual because it is a decision with limited short-run feedback, taken by unsophisticated agents who will learn little even with experience. The natural alternative, of course, is to have agents outsource the difficult parts of the decision, to investment managers or the like. And these managers will make money by improving people’s earnings. No surprise that robo-advisors, index funds, and personal banking have all become more important as defined contribution plans have become more common! If we worry about behavioral biases, we ought worry especially about market imperfections that prevent the existence of designated agents who handle the difficult decisions for us.

The fact that agents can exist is one reason that irrationality in the lab may not translate into irrationality in the market. But even without agents, we might reasonably be skeptical of some claims of widespread irrationality. Consider Thaler’s famous endowment effect: how much you are willing to pay for, say, a coffee mug or a pen is much less than how much you would accept to have the coffee mug taken away from you. Indeed, it is not unusual in a study to find willingness-to-accept amounts three or more times larger than willingness to pay. But, of course, if these were “preferences”, you could be money pumped (see Yaari, applying a theorem of de Finetti, on the mathematics of the pump). Say you value the mug at ten bucks when you own it and five bucks when you don’t. Do we really think I can regularly get you to pay twice as much by loaning you the mug for free for a month? Do we see car companies letting you take a month-long test drive of a $20,000 car then letting you keep the car only if you pay $40,000, with some consumers accepting? Surely not. Now the reason why is partly what Laibson and Yariv argue, that money pumps do not exist in competitive economies since market pressure will compete away rents: someone else will offer you the car at $20,000 and you will just buy from them. But even if the car company is a monopolist, surely we find the magnitude of the money pump implied here to be, on its face, ridiculous.

Even worse are the dictator games introduced in Thaler’s 1986 fairness paper. Students were asked, upon being given $20, whether they wanted to give an anonymous student half of their endowment or 10%. Many of the students gave half! This experiment has been repeated many, many times, with similar effects. Does this mean economists are naive to neglect the social preferences of humans? Of course not! People are endowed with money and gifts all the time. They essentially never give any of it to random strangers – I feel confident assuming you, the reader, have never been handed some bills on the sidewalk by an officeworker who just got a big bonus! Worse, the context of the experiment matters a ton (see John List on this point). Indeed, despite hundreds of lab experiments on dictator games, I feel far more confident predicting real world behavior following windfalls if we use a parsimonious homo economicus model than if we use the results of dictator games. Does this mean the games are useless? Of course not – studying what factors affect other-regarding preferences is interesting, and important. But how odd to have a branch of our field filled with people who see armchair theorizing of homo economicus as “unscientific”, yet take lab experiments so literally even when they are so clearly contrary to data?

To take one final example, consider Thaler’s famous model of “mental accounting”. In many experiments, he shows people have “budgets” set aside for various tasks. I have my “gas budget” and adjust my driving when gas prices change. I only sell stocks when I am up overall on that stock since I want my “mental account” of that particular transaction to be positive. But how important is this in the aggregate? Take the Engel curve. Budget shares devoted to food fall with income. This is widely established historically and in the cross section. Where is the mental account? Farber (2008 AER) even challenges the canonical account of taxi drivers working just enough hours to make their targeted income. As in the dictator game and the endowment effect, there is a gap between what is real, psychologically, and what is consequential enough to be first-order in our economic understanding of the world.

Let’s sum up. Thaler’s work is brilliant – it is a rare case of an economist taking psychology seriously and actually coming up with policy-relevant consequences like the 401k policy. But Thaler’s work is also dangerous to young economists who see biases everywhere. Experts in a field, and markets with agents and mechanisms and all the other tricks they develop, are very, very good at ferreting out irrationality, and economists’ core skill lies in not missing those tricks.

Some remaining bagatelles: 1) Thaler and his PhD advisor, Sherwin Rosen, have one of the first papers on measuring the “statistical” value of a life, a technique now widely employed in health economics and policy. 2) Beyond his academic work, Thaler has won a modicum of fame as a popular writer (Nudge, written with Cass Sunstein, is canonical here) and for his brief turn as an actor alongside Selena Gomez in “The Big Short”. 3) Dick has a large literature on “fairness” in pricing, a topic which goes back to Thomas Aquinas, if not earlier. Many of the experiments Thaler performs, like the thought experiments of Aquinas, come down to the fact that many perceive market power to be unfair. Sure, I agree, but I’m not sure there’s much more that can be learned than this uncontroversial fact. 4) Law and econ has been massively influenced by Thaler. As a simple example, if endowment effects are real, then the assignment of property rights matters even when there are no transaction costs. Jolls et al 1998 go into more depth on this issue. 5) Thaler’s precise results in so-called behavioral finance are beyond my area of expertise, so I defer to John Cochrane’s comments following the 2013 Nobel. Eugene Fama is, I think, correct when he suggests that market efficiency generated by rational traders with risk aversion is the best model we have of financial behavior, where best is measured by “is this model useful for explaining the world.” The number of behavioral anomalies at the level of the market which persist and are relevant in the aggregate does not strike me as large, while the number of investors and policymakers who make dreadful decisions because they believe markets are driven by behavioral sentiments is large indeed!

William Baumol: Truly Productive Entrepreneurship

It seems this weblog has become an obituary page rather than a simple research digest of late. I am not even done writing on the legacy of Ken Arrow (don’t worry – it will come!) when news arrives that yet another product of the World War 2 era in New York City, an alumnus of the CCNY system, has passed away: the great scholar of entrepreneurship and one of my absolute favorite economists, William Baumol.

But we oughtn’t draw the line on his research simply at entrepreneurship, though I will walk you through his best piece in the area, a staple of my own PhD syllabus, on “productive, unproductive, and destructive” entrepreneurship. Baumol was also a great scholar of the economics of the arts, performing and otherwise, which were the motivation for his famous cost disease argument. He was a very skilled micro theorist, a talented economic historian, and a deep reader of the history of economic thought, a nice example of which is his 2000 QJE on what we have learned since Marshall. In all of these areas, his papers are a pleasure to read, clear, with elegant turns of phrase and the casual yet erudite style of an American who’d done his PhD in London under Robbins and Viner. That he has passed without winning his Nobel Prize is a shame – how great would it have been had he shared a prize with Nate Rosenberg before it was too late for them both?

Baumol is often naively seen as a Schumpeter-esque defender of the capitalist economy and the heroic entrepreneur, and that is only half right. Personally, his politics were liberal, and as he argued in a recent interview, “I am well aware of all the very serious problems, such as inequality, unemployment, environmental damage, that beset capitalist societies. My thesis is that capitalism is a special mechanism that is uniquely effective in accomplishing one thing: creating innovations, applying those innovations and using them to stimulate growth.” That is, you can find in Baumol’s work many discussions of environmental externalities, of the role of government in funding research, and of the nature of optimal taxation. You can find many quotes where Baumol expresses interest in the policy goals of the left (though often solved with the mechanism of the market, and hence the right). Yet the core running through much of Baumol’s work is a rigorous defense, historically and theoretically grounded, of the importance of getting incentives correct for socially useful innovation.

Baumol differs from many other prominent economists of innovation because he is at his core a neoclassical theorist. He is not an Austrian like Kirzner or an evolutionary economist like Sid Winter. Baumol’s work stresses that entrepreneurs and the innovations they produce are fundamental to understanding the capitalist economy and its performance relative to other economic systems, but that the best way to understand the entrepreneur methodologically was to formalize her within the context of neoclassical equilibria, with innovation rather than price alone being “the weapon of choice” for rational, competitive firms. I’ve always thought of Baumol as being the lineal descendant of Schumpeter, the original great thinker on entrepreneurship and one who, nearing the end of his life and seeing the work of his student Samuelson, was convinced that his ideas should be translated into formal neoclassical theory.

A 1968 essay in the AER P&P laid out Baumol’s basic idea that economics without the entrepreneur is, in a line he would repeat often, like Hamlet without the Prince of Denmark. He clearly understood that we did not have a suitable theory for oligopoly and entry into new markets, or for the supply of entrepreneurs, but that any general economic theory needed to be able to explain why growth is different in different countries. Solow’s famous essay convinced much of the profession that the residual, interpreted then primarily as technological improvement, was the fundamental variable explaining growth, and Baumol, like many, believed those technological improvements came mainly from entrepreneurial activity.

But what precisely should the theory look like? Ironically, Baumol made his most productive step in a beautiful 1990 paper in the JPE which contains not a single formal theorem nor statistical estimate of any kind. Let’s define entrepreneurs, as Baumol does, as “persons who are ingenious or creative in finding ways to add to their wealth, power, or prestige”. These people may introduce new goods, or new methods of production, or new markets, as Schumpeter supposed in his own definition. But are these ingenious and creative types necessarily going to do something useful for social welfare? Of course not – the norms, institutions, and incentives in a given society may be such that the entrepreneurs perform socially unproductive tasks, such as hunting for new tax loopholes, or socially destructive tasks, such as channeling their energy into ever-escalating forms of warfare.

With the distinction between productive, unproductive, and destructive entrepreneurship in mind, we might imagine that the difference in technological progress across societies may have less to do with the innate drive of the society’s members, and more to do with the incentives for different types of entrepreneurship. Consider Rome, famously wealthy yet with very little in the way of useful technological diffusion: certainly the Romans appear less innovative than either the Greeks or Europe of the Middle Ages. How can a society both invent a primitive steam engine – via Hero of Alexandria – and yet see it used for nothing other than toys and religious ceremonies? The answer, Baumol notes, is that status in Roman society required one to get rich via land ownership, usury, or war; commerce was a task primarily for slaves and former slaves! And likewise in Song dynasty China, where the imperial examinations were the source of status and officialdom retained the ability to expropriate any useful invention or business that happened to appear. In the European Middle Ages, the incentives facing the clever shifted from developing implements of war, to diffusing technology like the water-mill under the Cistercians, and then back to weapons again. These examples were expanded to every society from Ancient Mesopotamia to the Dutch Republic to the modern United States by a series of economically minded historians in a wonderful collection of essays called “The Invention of Enterprise” which was edited by Baumol alongside Joel Mokyr and David Landes.

Now we are approaching a sort of economic theory of entrepreneurship – no need to rely on the whims of character, but instead focus on relative incentives. But we are still far from Baumol’s 1968 goal: incorporating the entrepreneur into neoclassical theory. The closest Baumol comes is in his work in the early 1980s on contestable markets, summarized in his 1981 AEA Presidential Address. The basic idea is this. Assume industries have scale economies, so oligopoly is their natural state. How worried should we be? Well, if there are no sunk costs and no entry barriers for entrants, and if entrants can siphon off customers more quickly than incumbents can respond, then Baumol and his coauthors claimed that the market was contestable: the threat of entry is sufficient to keep the incumbent from exerting its market power. On the one hand, fine, we all agree with Baumol now that industry structure is endogenous to firm behavior, and the threat of entry clearly can restrain market power. But on the other hand, is this “ultra-free entry” model the most sensible way to incorporate entry and exit into a competitive model? Why, as Dixit argued, should it be quicker to enter a market than for the incumbent to change its price? Why, as Spence argued, does the unrealized threat of entry change equilibrium behavior if the threat is truly unrealized along the equilibrium path?
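
To see what the contestability claim delivers in the simplest case, here is a toy numerical sketch (my own hypothetical demand and cost numbers, not anything from Baumol’s papers): with a fixed cost generating scale economies, hit-and-run entry pins the incumbent to the lowest break-even price rather than the monopoly price.

```python
import numpy as np

# Toy contestable-market illustration (hypothetical numbers, not from Baumol).
# Linear demand Q(p) = a - b*p, one incumbent with cost C(q) = F + c*q.
a, b, F, c = 100.0, 1.0, 600.0, 10.0

prices = np.linspace(c, a / b, 100001)
quantity = a - b * prices
profit = (prices - c) * quantity - F

# Unconstrained monopoly: pick the profit-maximizing price.
p_monopoly = prices[np.argmax(profit)]

# Perfectly contestable market: hit-and-run entry drives profit to zero,
# so the incumbent charges the lowest break-even price.
p_contestable = prices[profit >= 0][0]

print(f"monopoly price    ~ {p_monopoly:.2f}")     # about 55
print(f"contestable price ~ {p_contestable:.2f}")  # about 17.25, i.e. average cost
```

The Dixit and Spence critiques amount to asking whether real entrants can actually execute the hit-and-run strategy that pins the incumbent to that break-even price.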

It seems that what Baumol was hoping this model would lead to was a generalized theory of perfect competition that permitted competition for the market rather than just in the market, since competition for the market is naturally the domain of the entrepreneur. Contestable markets are too flawed to get us there. But the basic idea – that market structure is endogenous and game-theoretic, rather than the old-fashioned chain in which industry structure determines conduct determines performance – is clearly here to stay: antitrust is essentially applied game theory today. And once you have the idea of competition for the market, the natural theoretical model is one where firms compete to innovate in order to push out incumbents, incumbents innovate to keep away from potential entrants, and profits depend on the equilibrium time until the dominant firm shifts: I speak, of course, about the neo-Schumpeterian models of Aghion and Howitt. These models, still a very active area of research, are finally allowing us to rigorously investigate the endogenous rewards to innovation via a completely neoclassical model of market structure and pricing.

I am not sure why Baumol did not find these neo-Schumpeterian models to be the Holy Grail he’d been looking for; in his final book, he credits them for being “very powerful” but in the end holding different “central concerns”. He may have been mistaken in this interpretation. Giving Baumol’s corpus on entrepreneurship a careful second read proved quite interesting, and I have to say it disappoints in part: the questions he asked were right, the theoretical acumen he possessed was up to the task, the understanding of history and qualitative intuition was second to none, but he appears to have been just as stymied by the idea of endogenous neoclassical entrepreneurship as the many other doyens of our field who took a crack at modeling this problem without, in the end, generating the model they’d hoped they could write.

Where Baumol has more success, and again it is unusual for a theorist that his most well-known contribution is largely qualitative, is in the idea of cost disease. The concept comes from Baumol’s work with William Bowen (see also this extension with a complete model) on the economic problems of the performing arts. It is a simple idea: imagine productivity in industry rises 4% per year, but “the output per man-hour of a violinist playing a Schubert quartet in a standard concert hall” remains fixed. In order to attract workers into music rather than industry, wages must rise in music at something like the rate they rise in industry. But then costs are increasing while productivity is not, and the arts look “inefficient”. The same, of course, is said for education, and health care, and other necessarily labor-intensive industries. Baumol’s point is that rising costs in sectors with stagnant productivity reflect necessary shifts in equilibrium wages rather than, say, growing wastefulness.
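
The arithmetic behind the argument fits in a few lines. A minimal sketch, with made-up numbers – 4% annual productivity growth in industry, none for the quartet – shows the relative cost of the stagnant sector roughly tripling over thirty years even though nothing about that sector has changed:

```python
# Cost disease back-of-the-envelope (hypothetical numbers).
years = 30
productivity_growth = 0.04     # annual productivity growth in "industry"
wage = 1.0                     # economy-wide wage, which must track industry productivity
output_per_hour_industry = 1.0
output_per_hour_quartet = 1.0  # a Schubert quartet takes the same player-hours forever

for _ in range(years):
    wage *= 1 + productivity_growth
    output_per_hour_industry *= 1 + productivity_growth

unit_cost_industry = wage / output_per_hour_industry  # unchanged
unit_cost_quartet = wage / output_per_hour_quartet    # rises one-for-one with the wage

print(round(unit_cost_industry, 2))  # 1.0
print(round(unit_cost_quartet, 2))   # about 3.24 after 30 years
```

Nothing about the arts has become more wasteful in this sketch; relative costs rise purely because wages are set economy-wide.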

How much can cost disease explain? The concept is so widely known by now that it is, in fact, used to excuse stagnant industries. Teaching, for example, requires some labor, but does anybody believe that it is impossible for R&D and complementary inventions (like the internet, for example) to produce massive productivity improvements? Is it not true that movie theaters now show opera live from the world’s great halls on a regular basis? Is it not true that my Google Home can, activated by voice, call up two seconds from now essentially any piece of recorded music I desire, for free? Distinguishing industries that are necessarily labor-intensive (and hence grow slowly) from those where rapid technological progress is possible is a very difficult game, and one we ought hesitate to play. But equally, we oughtn’t forget Baumol’s lesson: in some cases, in some industries, what appears to be fixable slack is in fact simply cost disease. We may ask, how was it that Ancient Greece, with its tiny population, put on so many plays, while today we hustle ourselves to small ballrooms in New York and London? Baumol’s answer, rigorously shown: cost disease. The “opportunity cost” of recruiting a big chorus was low, as those singers would otherwise have been idle or working unproductive fields gathering olives. The difference between Athens and our era is not simply that they were “more supportive of the arts”!

Baumol was incredibly prolific, so these suggestions for further reading are but a taste: An interview by Alan Krueger is well worth reading for the anecdotes alone, like the fact that apparently one used to do one’s PhD oral defense “over whiskies and sodas at the Reform Club”. I also love his defense of theory, where if he is very lucky, his initial intuition “turn[s] out to be totally wrong. Because when I turn out to be totally wrong, that’s when the best ideas come out. Because if my intuition was right, it’s almost always going to be simple and straightforward. When my intuition turns out to be wrong, then there is something less obvious to explain.” Every theorist knows this: formalization has this nasty habit of refining our intuition and convincing us our initial thoughts actually contain logical fallacies or rely on special cases! Though known as an applied micro theorist, Baumol also wrote a canonical paper, with Bradford, on optimal taxation: essentially, if you need to raise $x in tax, how should you optimally deviate from marginal cost pricing? The history of thought is nicely diagrammed, and of course this 1970 paper was very quickly followed by the classic work of Diamond and Mirrlees. Baumol wrote extensively on environmental economics, drawing in many of his papers on the role nonconvexities in the social production possibilities frontier play when they are generated by externalities – a simple example of this effect, and the limitations it imposes on Pigouvian taxation, is in the link. More recently, Baumol has been writing on international trade with Ralph Gomory (the legendary mathematician behind a critical theorem in integer programming, and later head of the Sloan Foundation); their main theorems are not terribly shocking to those used to thinking in terms of economies of scale, but the core example in the linked paper is again a great example of how nonconvexities can overturn a lot of our intuition, in this case concerning comparative advantage. Finally, beyond his writing on the economics of the arts, Baumol proved that there is no area in which he personally had stagnant productivity: an art major in college, he was also a fantastic artist in his own right, picking up computer-generated art while in his 80s and teaching for many years a course on woodworking at Princeton!

Kenneth Arrow Part II: The Theory of General Equilibrium

The first post in this series discussed Ken Arrow’s work in the broad sense, with particular focus on social choice. In this post, we will dive into his most famous accomplishment, the theory of general equilibrium (1954, Econometrica). I beg the reader to offer some sympathy for the approximations and simplifications that will appear below: the history of general equilibrium is, by this point, well-trodden ground for historians of thought, and the interpretation of history and theory in this area is quite contentious.

My read of the literature on GE following Arrow is as follows. First, the theory of general equilibrium is an incredible proof that markets can, in theory and in certain cases, work as efficiently as an all-powerful planner. That said, the three other hopes of general equilibrium theory since the days of Walras are, in fact, disproven by the work of Arrow and his followers. Market forces will not necessarily lead us toward these socially optimal equilibrium prices. Aggregate Walrasian demand inherits essentially no empirical content from basic ordinal utility maximization at the individual level. We cannot rigorously perform comparative statics on general equilibrium economic statistics without assumptions that go beyond simple utility maximization. From my read of Walras and the early general equilibrium theorists, all three of those results would have come as a real shock.

Let’s start at the beginning. There is an idea going back to Adam Smith and the invisible hand, an idea that individual action will, via the price system, lead to an increase or even maximization of economic welfare (as an aside, Smith’s own use of the “invisible hand” trope is overstated, as William Grampp among others has convincingly argued). The kind of people who denigrate modern economics – the neo-Marxists, the back-of-the-room scribblers, the wannabe-contrarian dilettantes – see Arrow’s work, and the idea of using general equilibrium theory to “prove that markets work”, as a barbarism. We know, and have known well before Arrow, that externalities exist. We know, and have known well before Arrow, that the distribution of income depends on the distribution of endowments. What Arrow was interested in was examining not only whether the invisible hand argument is true, but whether it could be true. That is, if we are to claim markets are uniquely powerful at organizing economic activity, we ought formally show that the market could work in such a manner, and understand the precise conditions under which it won’t generate these claimed benefits. How ought we do this? Prove the precise conditions under which there exists a price vector where markets clear, show the outcome satisfies some welfare criterion that is desirable, and note exactly why each of the conditions is necessary for such an outcome.

The question is, how difficult is it to prove these prices exist? The term “general equilibrium” has had many meanings in economics. Today, it is often used to mean “as opposed to partial equilibrium”, meaning that we consider economic effects allowing all agents to adjust to a change in the environment. For instance, a small randomized trial of guaranteed incomes has, as its primary effect, an impact on the incomes of the recipients; the general equilibrium effects on the labor market of making such a policy widespread will be difficult to discern. In the 19th and early 20th centuries, however, the term was much more concerned with the idea of the economy as a self-regulating system. Arrow put it very nicely in an encyclopedia chapter he wrote in 1966: general equilibrium is both “the simple notion of determinateness, that the relations which describe the economic system must form a system sufficiently complete to determine the values of its variables and…the more specific notion that each relation represents a balance of forces.”

If you were a classical economist – a Smith or a Marx or a Ricardo – the problem of what price will obtain in a market is simple to solve: ignore demand. Prices are implied by costs and a zero profit condition, essentially free entry. And we more or less think like this now in some markets. With free entry and every firm producing at the identical minimum efficient scale, price is entirely determined by the supply side, and only quantity is determined by demand. With one factor – labor, where the Malthusian condition plays the role of free entry – or with labor and land in the Ricardian system, this classical model of value is well-defined. How to handle capital and differentiated labor is a problem to be assumed away, or handled informally; Samuelson has many papers where he is incensed by Marx’s handling of capital as embodied labor.

The French mathematical economist Leon Walras finally cracked the nut by introducing demand and price-taking. There are households who produce and consume. Equilibrium involves supply and demand equating in each market, with price settling where the supply and demand curves cross. Walras famously (and informally) proposed a method by which prices might actually reach equilibrium: the tatonnement. An auctioneer calls out a price vector: in some markets there is excess demand and in some excess supply. Prices are then adjusted one at a time. Of course each price change will affect excess demand and supply in other markets, but you might imagine things can “converge” if you adjust prices just right. Not bad for the 1870s – there is a reason Schumpeter calls this the “Magna Carta” of economic theory in his History of Economic Analysis. But Walras was mistaken on two counts: first, knowing whether there even exists an equilibrium that clears every market simultaneously is, it turns out, equivalent to a problem in Poincare’s analysis situs beyond the reach of mathematics in the 19th century, and second, the conditions under which tatonnement actually converges are a devilish problem.
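
A minimal sketch of what Walras had in mind, using a hypothetical two-good, two-consumer Cobb-Douglas exchange economy of my own construction: the auctioneer nudges each price in the direction of its excess demand, and in this well-behaved case the process does converge (Scarf’s example, discussed below, shows it need not).

```python
# Toy Walrasian tatonnement in a two-good exchange economy (hypothetical example).
# Consumer 1 owns one unit of good 1 and spends share a of income on good 1;
# consumer 2 owns one unit of good 2 and spends share b of income on good 1.
a, b = 0.3, 0.6

def excess_demand(p1, p2):
    income1, income2 = p1, p2                     # value of each consumer's endowment
    d1 = a * income1 / p1 + b * income2 / p1      # Cobb-Douglas demands for good 1
    d2 = (1 - a) * income1 / p2 + (1 - b) * income2 / p2
    return d1 - 1.0, d2 - 1.0                     # one unit of each good is supplied

p1, p2 = 0.5, 0.5
for _ in range(2000):
    z1, z2 = excess_demand(p1, p2)
    p1 += 0.1 * z1       # the auctioneer raises prices of goods in excess demand
    p2 += 0.1 * z2
    total = p1 + p2      # only relative prices matter, so keep them on the simplex
    p1, p2 = p1 / total, p2 / total

print(round(p1, 4), round(p2, 4))  # approaches roughly (0.4615, 0.5385), the clearing prices
```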

The equilibrium existence problem is easy to understand. Take the simplest case, where each of j goods is produced as a linear combination of k factors. Let A be the input-output matrix whose entry A(k,j) gives the amount of factor k needed to make a unit of good j. Demand equals supply then just says that Aq=e, where q is the vector of quantities produced and e the vector of factor endowments; zero profit in every market implies A’p(k)=p(j), where A’ is the transpose of A, p(k) are the factor prices and p(j) the good prices. It was pointed out that even in this simple system where everything is linear, it is not at all trivial to ensure that prices and quantities are not negative. It would not be until Abraham Wald in the mid-1930s – later Arrow’s professor at Columbia and, like Arrow’s family, Romanian by origin, links that are surely not a coincidence! – that formal conditions were shown giving existence of general equilibrium in a simple system like this one, though Wald’s proof greatly simplified the general problem by imposing implausible restrictions on aggregate demand.
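
A tiny numerical sketch of the worry (the matrix and endowments are hypothetical, chosen purely for illustration): counting equations and unknowns delivers a solution in the linear-algebra sense, but nothing stops that solution from being economically meaningless.

```python
import numpy as np

# Hypothetical 2-good, 2-factor linear (Cassel-type) system.
# A[k, j] = units of factor k needed per unit of good j.
A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
e = np.array([1.0, 4.0])          # factor endowments
p_goods = np.array([1.0, 1.0])    # take goods prices as given for the zero-profit step

q = np.linalg.solve(A, e)         # market clearing: A q = e
w = np.linalg.solve(A.T, p_goods) # zero profit: A' w = p, i.e. unit cost equals price

print(q)  # [ 2.33..., -0.66...]  -- a negative quantity: the "solution" is nonsense
print(w)  # [ 0.33...,  0.33...]
```

This is exactly the gap Wald’s conditions (and later the fixed-point and linear-programming machinery) were designed to close: existence of a nonnegative, economically sensible solution does not come for free.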

Mathematicians like Wald, trained in the Vienna tradition, were aghast at the state of mathematical reasoning in economics at the time. Oskar Morgenstern absolutely hammered the great economist John Hicks in a 1941 review of Hicks’ Value and Capital, particularly over the crazy assertion (similar to Walras!) that the number of unknowns and equations being identical in a general equilibrium system sufficed for a solution to exist (if this isn’t clear to you in a nonlinear system, a trivial example with two equations and two unknowns is here). Von Neumann apparently said (p. 85) to Oskar, in reference to Hicks and those of his school, “if those books are unearthed a hundred years hence, people will not believe they were written in our time. Rather they will think they are about contemporary with Newton, so primitive is the mathematics.” And Hicks was quite technically advanced compared to his contemporary economists, bringing Keynesian macroeconomics and the microeconomics of indifference curves and demand analysis together masterfully. Arrow and Hahn even credit their initial interest in the problems of general equilibrium to the serendipity of coming across Hicks’ book.

Mathematics had advanced since Walras, however, and those trained at the mathematical frontier finally had the tools to tackle Walras’ problem seriously. Let D(p) be a vector of demand for all goods given price p, and e be initial endowments of each good. Then we simply need D(p)=e or D(p)-e=0 in each market. To make things a bit harder, we can introduce intermediate and factor goods with some form of production function, but the basic problem is the same: find whether there exists a vector p such that a nonlinear equation is equal to zero. This is the mathematics of fixed points, and Brouwer had, in 1912, given a nice theorem: every continuous function from a compact convex subset of Euclidean space to itself has a fixed point. Von Neumann used this in the 1930s to prove a result similar to Wald’s. A mathematician named Shizuo Kakutani, inspired by von Neumann, extended the Brouwer result to set-valued mappings called correspondences, and John Nash in 1950 used that result to show, in a trivial proof, the existence of mixed equilibria in noncooperative games. The math had arrived: we had the tools to formally state when non-trivial non-linear demand and supply systems had a fixed point, and hence a price that cleared all markets. We further had techniques for handling “corner solutions” where demand for a given good was zero at some price, surely a common outcome in the world: the idea of the linear program and complementary slackness, and its origin in convex set theory as applied to the dual, provided just the mathematics Arrow and his contemporaries would need.
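
To make the fixed-point connection concrete, here is a sketch using the standard construction on the price simplex, applied to the toy Cobb-Douglas economy from the sketch above: raise the weight on any good in excess demand, renormalize, and note that the fixed points of this map are exactly the market-clearing prices.

```python
import numpy as np

# Excess demand for the toy two-good economy above (a = 0.3, b = 0.6).
def z(p):
    p1, p2 = p
    z1 = 0.3 + 0.6 * p2 / p1 - 1.0
    z2 = 0.7 * p1 / p2 + 0.4 - 1.0
    return np.array([z1, z2])

# The Brouwer-style map T on the price simplex: a fixed point of T clears all markets.
def T(p):
    bump = np.maximum(z(p), 0.0)
    return (p + bump) / (1.0 + bump.sum())

p_star = np.array([6 / 13, 7 / 13])    # analytic equilibrium of this toy economy
print(np.allclose(T(p_star), p_star))  # True: the equilibrium is a fixed point of T

p = np.array([0.5, 0.5])
print(np.allclose(T(p), p))            # False: non-clearing prices get moved by T
```

Brouwer guarantees such a continuous map of the simplex into itself has a fixed point; it says nothing about how to compute one, which is where the stability troubles discussed below come in.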

So here we stood in the early 1950s. The mathematical conditions necessary to prove that a set-valued function has a fixed point had been worked out. Hicks, in Value and Capital, had given Arrow the idea that relating the future to today is simple: just put a date on every commodity and enlarge the commodity space. Indeed, adding state-contingency is easy: put an index for state in addition to date on every commodity. So we need not only zero excess demand in apples, or in apples delivered in May 1955, but in apples delivered in May 1955 if Eisenhower loses his reelection bid. Complex, it seems, but no matter: the conditions for the existence of a fixed point will be the same in this enlarged commodity space.

With these tools in mind, Arrow and Debreu can begin their proof. They first define a generalization of an n-person game where the feasible set of actions for each player depends on the actions of every other player; think of the feasible set as “what can I afford given the prices that will result for the commodities I am endowed with?” Each action is a vector of purchases whose length is the number of date- and state-indexed commodities a player could buy. Debreu showed in a 1952 PNAS paper that these generalized games have an equilibrium as long as each payoff function varies continuously with other players’ actions, the feasible set of choices is convex and varies continuously in other players’ actions, and the set of actions which improve a player’s payoff is convex for every action profile. Arrow and Debreu then show that the usual assumptions on individual demand are sufficient to aggregate up to the conditions Debreu’s earlier paper requires. This method is much, much different from what is done by McKenzie or other early general equilibrium theorists: excess demand is never taken as a primitive. This allows the Arrow-Debreu proof to provide substantial economic intuition, as Duffie and Sonnenschein point out in a 1989 JEL. For instance, showing that the Arrow-Debreu equilibrium exists even with taxation is trivial using their method but much less so in methods that begin with excess demand functions.

This is already quite an accomplishment: Arrow and Debreu have shown that there exists a price vector that clears all markets simultaneously. The nature of their proof, as later theorists would point out, relies less on convexity of preferences and production sets than on the fact that every agent is “small” relative to the market (convexity is used to get continuity in the Debreu game, and you can get this equally well by making all consumers infinitesimal and then randomizing allocations to smooth things out; see Duffie and Sonnenschein above for an example). At this point, it’s the mid-1950s, heyday of the Neoclassical synthesis: surely we want to be able to answer questions like, when there is a negative demand shock, how will the economy best reach a Pareto-optimal equilibrium again? How do different speeds of adjustment due to sticky prices or other frictions affect the rate at which the optimum is regained? Those types of questions implicitly assume that the equilibrium is unique (at least locally) so that we actually can “return” to where we were before the shock. And of course we know some of the assumptions needed for the Arrow-Debreu proof are unrealistic – e.g., no fixed costs in production – but we would at least like to work out how to manipulate the economy in the “simple” case before figuring out how to deal with those issues.

Here is where things didn’t work out as hoped. Uzawa (RESTUD, 1960) proved that not only could Brouwer’s theorem be used to prove the existence of general equilibrium, but that the opposite was true as well: the existence of general equilibrium is logically equivalent to Brouwer’s theorem. A result like this certainly makes one worry about how much one could say about prices in general equilibrium. The 1970s brought us the Sonnenschein-Mantel-Debreu “Anything Goes” theorem: aggregate excess demand functions do not inherit all the properties of individual excess demand functions because of wealth effects (when relative prices change, the value of one’s endowment changes as well). For any aggregate excess demand function satisfying a couple of minor restrictions, there exists an economy with individual preferences generating that function; aggregate excess demand, in other words, satisfies far fewer restrictions than individual excess demand derived from preference maximization. This tells us, importantly, that there is no generic reason for equilibria to be unique in an economy.

Multiplicity of equilibria is a problem: if the goal of GE was to be able to take underlying primitives like tastes and technology, calculate “the” prices that clear the market, then examine how those prices change (“comparative statics”), then we essentially lose the ability to do all but local comparative statics, since large changes in the environment may cause the economy to jump to a different equilibrium (luckily, Debreu (1970, Econometrica) shows that, at least generically, there are only finitely many equilibria, so we may be able to say something about local comparative statics for very small shocks). Indeed, these analyses are tough without an equilibrium selection mechanism, which we don’t really have even now. Some would say this is no big deal: of course the same technology and tastes can generate many equilibria, just as cars may wind up all driving on either the left or the right in equilibrium. And true, all of the Arrow-Debreu equilibria are Pareto optimal. But it is still far afield from what might have been hoped for in the 1930s when this quest for a modern GE theory began.

Worse yet is stability, as Arrow and his collaborators (1958, Ecta; 1959, Ecta) would help discover. Even if we have a unique equilibrium, Herbert Scarf (IER, 1960) showed, via many simple examples, how Walrasian tatonnement can lead to cycles which never converge. Despite a great deal of intellectual effort in the 1960s and 1970s, we do not have a good model of price adjustment even now. I should think we are unlikely to ever have such a theory: as many theorists have pointed out, if we are in a period of price adjustment and not in an equilibrium, then the zero profit condition ought not apply, ergo why should there be “one” price rather than ten or a hundred or a thousand?
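
Scarf’s construction is simple enough to simulate. The sketch below uses his three-good, three-consumer economy – each consumer owns one unit of one good and has Leontief preferences over that good and the next – where the unique equilibrium has equal relative prices, yet tatonnement just orbits around it (the step size and starting point are my own choices):

```python
import numpy as np

# Scarf-style economy: consumer i owns one unit of good i and wants goods i and i+1
# in fixed proportions, so she demands p[i] / (p[i] + p[i+1]) units of each.
def excess_demand(p):
    z = -np.ones(3)                       # one unit of each good is supplied
    for i in range(3):
        j = (i + 1) % 3
        c = p[i] / (p[i] + p[j])          # consumer i's demand for goods i and j
        z[i] += c
        z[j] += c
    return z

p_eq = np.ones(3) / 3                     # the unique equilibrium: equal relative prices
p = np.array([0.40, 0.33, 0.27])          # start nearby
dt = 0.002
distances = []
for _ in range(20000):
    p = p + dt * excess_demand(p)         # Euler-discretized tatonnement dp/dt = z(p)
    p = p / p.sum()                       # only relative prices matter
    distances.append(np.linalg.norm(p - p_eq))

print(round(min(distances), 3), round(max(distances), 3))
# The distance to equilibrium stays bounded away from zero: the price path
# circles the equilibrium rather than converging to it.
```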

The problem of multiplicity and instability for comparative static analysis ought be clear, but it should also be noted how problematic they are for welfare analysis. Consider the Second Welfare Theorem: under the Arrow-Debreu system, for every Pareto optimal allocation, there exists an initial endowment of resources such that that allocation is an equilibrium. This is literally the main justification for the benefits of the market: if we reallocate endowments, free exchange can get us to any Pareto optimal point, ergo can get us to any reasonable socially optimal point no matter what social welfare function you happen to hold. How valid is this justification? Call x* the allocation that maximizes some social welfare function. Let e* be an initial endowment for which x* is an equilibrium outcome – such an endowment must exist via Arrow-Debreu’s proof. Does endowing agents with e* guarantee we reach that social welfare maximum? No: x* may not be the only equilibrium arising from e*. Even if it is unique, will we reach it? No: if it is not a stable equilibrium, it is only by dint of luck that our price adjustment process will ever reach it.

So let’s sum up. In the 1870s, Walras showed us that demand and supply, with agents as price takers, can generate supremely useful insights into the economy. Since demand matters, changes in demand in one market will affect other markets as well. If the price of apples rises, demand for pears will rise, as will their price, whose secondary effect should be accounted for in the market for apples. By the 1930s we have the beginnings of a nice model of individual choice based on constrained preference maximization. Taking prices as given, individual demands have well-defined forms, and excess demand in the economy can be computed by a simple summing up. So we now want to know: is there in fact a price that clears the market? Yes, Arrow and Debreu show, there is, and we needn’t assume anything strange about individual demand to generate this. These equilibrium prices always give Pareto optimal allocations, as had long been known, but there also always exist endowments such that every Pareto optimal allocation is an equilibrium. It is a beautiful and important result, and a triumph for the intuition of the invisible hand in its most formal sense.

Alas, it is there we reach a dead end. Individual preferences alone do not suffice to tell us which equilibrium we are at, nor that any equilibrium will be stable, nor that any equilibrium will be reached by an economically sensible adjustment process. To say anything meaningful about aggregate economic outcomes, or about comparative statics after modest shocks, or about how technological changes change price, we need to make assumptions that go beyond individual rationality and profit maximization. This is, it seems to me, a shock for the economists of the middle of the century, and still a shock for many today. I do not think this means “general equilibrium is dead” or that the mathematical exploration in the field was a waste. We learned a great deal about precisely when markets could even in principle achieve the first best, and that education was critical for the work Arrow would later do on health care, innovation, and the environment, which I will discuss in the next two posts. And we needn’t throw out general equilibrium analysis because of uniqueness or stability problems, any more than we would throw out game theoretic analysis because of the same problems. But it does mean that individual rationality as the sole paradigm of economic analysis is dead: it is mathematically proven that postulates of individual rationality will not allow us to say anything of consequence about economic aggregates or game theoretic outcomes in the frequent scenarios where we do not have a unique equilibrium with a well-defined way to get there (via learning in games, or a tatonnement process in GE, or something of a similar nature). Arrow himself (1986, J. Business) accepts this: “In the aggregate, the hypothesis of rational behavior has in general no implications.” This is an opportunity for economists, not a burden, and we still await the next Arrow who can guide us on how to proceed.

Some notes on the literature: For those interested in the theoretical development of general equilibrium, I recommend General Equilibrium Analysis by Roy Weintraub, a reformed theorist who now works in the history of thought. Wade Hands has a nice review of the neoclassical synthesis and the ways in which Keynesianism and GE analysis were interrelated. On the battle for McKenzie to be credited alongside Arrow and Debreu, and the potentially scandalous way Debreu may have secretly been responsible for the Arrow and Debreu paper being published first, see the fine book Finding Equilibrium by Weintraub and Duppe; both Debreu and McKenzie have particularly wild histories. Till Duppe, a scholar of Debreu, also has a nice paper in the JHET on precisely how Arrow and Debreu came to work together, and what the contribution of each to their famous ’54 paper was.

The Greatest Living Economist Has Passed Away: Notes on Kenneth Arrow Part I

It is amazing how quickly the titans of the middle of the century have passed. Paul Samuelson and his mathematization, Ronald Coase and his connection of law to economics, Gary Becker and his incorporation of choice into the full sphere of human behavior, John Nash and his formalization of strategic interaction, Milton Friedman and his defense of the market in the precarious post-war period, Robert Fogel and his cliometric revolution: the remaining titan was Kenneth Arrow, the only living economist who could have won a second Nobel Prize without a whit of complaint from the gallery. These figures ruled as economics grew from a minor branch of moral philosophy into the most influential, most prominent, and most advanced of the social sciences. It is hard to imagine our field will ever again have such a collection of scholars rise in one generation, and with the tragic news that Ken has now passed away as well, we have, with great sadness and great rapidity, lost the full set.

Though he was 95 years old, Arrow was still hard at work; his paper with Kamran Bilir and Alan Sorensen was making its way around the conference circuit just last year. And beyond incredible productivity, Arrow had a legendary openness with young scholars. A few years ago, a colleague and I were debating a minor point in the history of economic thought, one that Arrow had played some role in; with the debate deadlocked, it was suggested that I simply email the protagonist to learn the truth. No reply came; perhaps no surprise, given how busy he was and how unknown I was. Imagine my surprise when, two months later, a large manila envelope showed up in my mailbox at Northwestern, with a four-page letter Ken had written inside! Going beyond a simple answer, he patiently walked me through his perspective on the entire history of mathematical economics, the relative centrality of folks like Wicksteed and Edgeworth to the broader economic community, the work he did under Hotelling and the Cowles Commission, and the nature of formal logic versus price theory. Mind you, this was his response to a complete stranger.

This kindness extended beyond budding economists: Arrow was a notorious generator of petitions on all kinds of social causes, and remained so late in life, signing the Economists Against Trump letter that many of us supported last year. You will be hard-pressed to find an open letter or amicus curiae brief, on any issue from copyright term extension to the use of nuclear weapons, which Arrow was unaware of. The Duke Library holds the papers of both Arrow and Paul Samuelson – famously they became brothers-in-law – and the frequency with which their correspondence involves this petition or that, with Arrow in general the instigator and Samuelson the deflector, is unmistakable. I recall a great series of letters where Arrow queried Samuelson as to who had most deserved the Nobel but had died too early to receive it. Arrow at one point proposed Joan Robinson, which sent Samuelson into convulsions. “But she was a communist! And besides, her theory of imperfect competition was subpar.” You get the feeling in these letters of Arrow making gentle comments and rejoinders while Samuelson exercises his fists in the way he often did when battling everyone from Friedman to the Marxists at Cambridge to (worst of all, for Samuelson) those who were ignorant of their history of economic thought. Their conversation goes way back: you can find in one of the Samuelson boxes his recommendation that the University of Michigan bring in this bright young fellow named Arrow, a missed chance the poor Wolverines must still regret!

Arrow is so influential, in so many areas of economics, that it is simply impossible to discuss his contributions in a single post. For this reason, I will break the post into four parts, with one posted each day this week. We’ll look at Arrow’s work in choice theory today, his work on general equilibrium tomorrow, his work on innovation on Thursday, and some selected topics where he made seminal contributions (the economics of the environment, the principal-agent problem, and the economics of health care, in particular) on Friday. I do not lightly say that Arrow was the greatest living economist, and in my reckoning second only to Samuelson for the title of greatest economist of all time. Arrow wrote the foundational paper of general equilibrium analysis, the foundational paper of social choice and voting, the foundational paper justifying government intervention in innovation, and the foundational paper in the economics of health care. His legacy is the greatest legacy possible for the mathematical approach pushed by the Cowles Commission, the Econometric Society, Irving Fisher, and the mathematician-cum-economist Harold Hotelling. And so it is there that we must begin.

Arrow was born in New York City, a CCNY graduate like many children of the Great Depression, who went on to study mathematics in graduate school at Columbia. Economics in the United States in the 1930s was not a particularly mathematical science. The formalism of von Neumann, the late-life theoretical conversion of Schumpeter, Samuelson’s Foundations, and the soft nests at Cowles and the Econometric Society were in their infancy.

The usual story is that Arrow’s work on social choice came out of his visit to RAND in 1948. But this misstates the intellectual history: Arrow’s actual inspiration came from his engagement with a new form of mathematics, the expansions of formal logic beginning with people like Peirce and Boole. While a high school student, Arrow read Bertrand Russell’s text on mathematical logic, and was enthused with the way that set theory permitted logic to go well beyond the syllogisms of the Greeks. What a powerful tool for the generation of knowledge! In his senior year at CCNY, Arrow took the advanced course on relational logic taught by Alfred Tarski, where the eminent philosopher took pains to reintroduce the ideas of Charles Sanders Peirce, the greatest yet most neglected American philosopher. The idea of a relation is familiar to economists: state some links between elements of a set (e.g., xRy and yRz) and some properties of the relation (e.g., that it is well-ordered), and you can then perform logical operations on the relation to derive further properties. Every trained economist sees an example of this when first learning about choice and utility, but of course things like “greater than” and “less than” are relations as well. In 1940, one would have had to be extraordinarily lucky to encounter this theory: Tarski’s own books had not even been translated into English.

But what great training this would be! For Arrow joined a graduate program in mathematical statistics at Columbia, where one of the courses was taught by Hotelling from the economics department. Hotelling was an ordinalist, rare in those days, and taught his students demand theory from a rigorous basis in ordinal preferences. But what are these? Simply relations with certain properties! Combined with a statistician’s innate ability to write proofs using inequalities, Arrow greatly impressed Hotelling, and switched to a PhD in economics, inspired by the then-new subfield of mathematical economics that Hotelling, Samuelson, and Hicks were helping to expand.

After his wartime service doing operations research related to weather and flight planning, and a two-year detour into capital theory with little to show for it, Arrow took a visiting position at the Cowles Commission, a center of research in mathematical economics then at the University of Chicago. In 1948, Arrow spent the summer at RAND, still yet to complete his dissertation, or even to strike upon a worthwhile idea. RAND in Santa Monica was the world center for applied game theory: philosophers, economists, and mathematicians prowled the halls working through the technical basics of zero-sum games, but also the application of strategic decision theory to problems of serious global importance. Arrow had been thinking about voting a bit, and had written a draft of a paper, similar to that of Duncan Black’s 1948 JPE, essentially suggesting that majority voting “works” when preferences are single-peaked; that is, if everyone can rank options from “left to right” and voters simply differ on which point is their “peak” of preference, then majority voting reflects individual preferences in a formal sense. At RAND, the philosopher Olaf Helmer pointed out that a similar concern mattered in international relations: how are we to say that the Soviet Union or the United States has preferences? They are collections of individuals, not individuals themselves.

Right, Arrow agreed. But economists had thought about collective welfare, from Pareto to Bergson-Samuelson. The Bergson-Samuelson idea is simple. Let all individuals in society have preferences over states of the world. If we all prefer state A to state B, then the Pareto criterion suggests society should as well. Of course, tradeoffs are inevitable, so what are we to do? We could assume cardinal utility (e.g., “how much money would you need to be paid to accept A if you prefer B to A and society moves toward A?”) as in the Kaldor-Hicks criterion (though the technically minded will know that Kaldor-Hicks does not define an order on states of the world, so isn’t really great for social choice). But let’s assume that all people have is their own ordinal utility, their own rank-order of states, an order that is naturally hard to compare across people. Let’s assume for some pairs we have Pareto dominance: we all prefer A to C, and Q to L, and Z to X, but for other pairs there is no such dominance. A great theorem due to the Polish mathematician Szpilrajn, and I believe popularized among economists by Blackwell, says that if you have a quasiorder R that is transitive, then there exists a complete order R’ which extends it. In simple terms, if you can rank some pairs, and the pairs you do rank do not have any intransitivity, then you can generate a complete ranking of all pairs which respects the original incomplete ordering. Since individuals have transitive preferences, Pareto ranks are transitive, and hence we know there exist social welfare functions which “extend” Pareto. The implications of this are subtle: for instance, as I discuss in the link earlier in this paragraph, it implies that pure monetary egalitarianism can never be socially optimal even if the only requirement is to respect Pareto dominance.
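
In the finite case, Szpilrajn’s construction is just the topological sort familiar from computer science. A sketch (the social states and dominance pairs here are hypothetical, matching the A, Q, Z examples above):

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# A few Pareto dominance judgments over hypothetical social states:
# A is unanimously preferred to C, Q to L, and Z to X; other pairs are unranked.
states = {"A", "B", "C", "L", "Q", "X", "Z"}
dominates = [("A", "C"), ("Q", "L"), ("Z", "X")]

# Build predecessor sets: each state points to the states that dominate it.
preds = {s: set() for s in states}
for better, worse in dominates:
    preds[worse].add(better)

# Szpilrajn in miniature: any topological order of an acyclic relation is a
# complete ranking that respects every original (Pareto) comparison.
complete_ranking = list(TopologicalSorter(preds).static_order())
print(complete_ranking)
# e.g. ['A', 'B', 'Q', 'Z', 'C', 'L', 'X'] -- any valid order works, so long as
# A precedes C, Q precedes L, and Z precedes X.
```

The full theorem handles infinite sets of alternatives (with the help of the axiom of choice), but the economic content is the same: respecting Pareto dominance never forces an intransitivity on the social ranking.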

So aren’t we done? We know what it means, via Bergson-Samuelson, for the Soviet Union to “prefer” X to Y. But alas, Arrow was clever and attacked the problem from a different angle. Rather than taking the preference orderings of individuals as given and constructing a social ordering, he asked whether there is any mechanism for constructing a social ordering from arbitrary individual preferences that satisfies certain criteria. For instance, you may want to rule out a rule that says “whatever Kevin prefers most is what society prefers, no matter what other preferences are” (non-dictatorship). You may want to require Pareto dominance to be respected so that if everyone likes A more than B, A must be chosen (Pareto criterion). You may want to ensure that “irrelevant options” do not matter, so that if giving an option to choose “orange” in addition to “apple” and “pear” does not affect any individual’s ranking of apples and pears, then the orange option also oughtn’t affect society’s rankings of apples and pears (IIA). Arrow famously proved that if we do not restrict what types of preferences individuals may have over social outcomes, there is no system that can rank outcomes socially and still satisfy those three criteria. It has been known that majority voting suffers a problem of this sort since Condorcet in the 18th century, but the general impossibility was an incredible breakthrough, and a straightforward one once Arrow was equipped with the ideas of relational logic.
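
Condorcet’s cycle, the ancestor of Arrow’s result, takes only a few lines to verify. This is the standard three-voter example, not Arrow’s proof:

```python
from itertools import combinations

# The classic Condorcet profile: each ranking is a rotation of the previous one.
voters = [("a", "b", "c"),   # voter 1: a > b > c
          ("b", "c", "a"),   # voter 2: b > c > a
          ("c", "a", "b")]   # voter 3: c > a > b

def majority_prefers(x, y):
    """True if a strict majority of voters ranks x above y."""
    wins = sum(1 for ranking in voters if ranking.index(x) < ranking.index(y))
    return wins > len(voters) / 2

for x, y in combinations("abc", 2):
    print(x, ">" if majority_prefers(x, y) else "<", y)
# a > b and b > c, and yet c > a: the "social preference" produced by majority
# voting cycles, so it is not an ordering at all.
```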

It was with this result, in the 1951 book-length version of the idea, that social choice as a field distinct from welfare economics really took off. It is a startling result in two ways. First, in pure political theory, it rather simply killed off two centuries of blather about what the “best” voting system was: majority rule, Borda counts, rank-order voting, or whatever you like, every system must violate one of the Arrow axioms. And indeed, subsequent work has shown that the axioms can be relaxed and still generate impossibility. In the end, we do need to make social choices, so what should we go with? If you’re Amartya Sen, drop the Pareto condition. Others have quibbled with IIA. The point is that there is no right answer. The second startling implication is that welfare economics may be on pretty rough footing. Kaldor-Hicks conditions, which in practice motivate all sorts of regulatory decisions in our society, both rely on the assumption of cardinal or interpersonally-comparable utility, and do not generate an order over social options. Any Bergson-Samuelson social welfare function, a really broad class, must violate some pretty natural conditions on how it treats “equivalent” people (see, e.g., Kemp and Ng 1976). One questions whether we are back in the pre-Samuelson state where, beyond Pareto dominance, we can’t say much with any rigor about whether something is “good” or “bad” for society without dictatorially imposing our ethical standard, individual preferences be damned. Arrow’s theorem is a remarkable achievement for a man as young as he was when he conceived it, one of those rare philosophical ideas that will enter the canon alongside the categorical imperative or Hume on induction, an idea that will without question be read and considered decades and centuries hence.

Some notes to wrap things up:

1) Most call the result “Arrow’s Impossibility Theorem”. After all, he did prove the impossibility of a certain form of social choice. But Tjalling Koopmans actually convinced Arrow to call the theorem a “Possibility Theorem” out of pure optimism. Proof that the author rarely gets to pick the eventual name!

2) The confusion between Arrow’s theorem and the existence of social welfare functions in Samuelson has a long and interesting history: see this recent paper by Herrada Igersheim. Essentially, as I’ve tried to make clear in this post, Arrow’s result does not prove that Bergson-Samuelson social welfare functions do not exist, but rather implicitly imposes conditions on the indifference curves which underlie the B-S function. Much more detail in the linked paper.

3) So what is society to do in practice given Arrow? How are we to decide? There is much to recommend in Posner and Weyl’s quadratic voting when preferences can be assumed to have some sort of interpersonally comparable cardinal structure, yet are unknown. When interpersonal comparisons are impossible and we do not know people’s preferences, the famous Gibbard-Satterthwaite Theorem says that any non-dictatorial voting system choosing among three or more alternatives will sometimes give voters an incentive to vote strategically (a toy illustration using the Borda count appears just after these notes). We might then ask, ok, fine, what voting or social choice system works “the best” (e.g., satisfies some desiderata) over the broadest possible sets of individual preferences? Partha Dasgupta and Eric Maskin recently proved that, in fact, good old fashioned majority voting works best! But the true answer as to the “best” voting system depends on the distribution of underlying preferences you expect to see – it is a far less simple question than it appears.

4) The conditions I gave above for Arrow’s Theorem are actually different from the 5 conditions in the original 1950 paper. The reason is that Arrow’s original proof is actually incorrect, as shown by Julian Blau in a 1957 Econometrica. The basic insight of the proof is of course salvageable.

5) Among the more beautiful simplifications of Arrow’s proof is Phil Reny’s “side by side” proof of Arrow and Gibbard-Satterthwaite, where he shows just how related the underlying logic of the two concepts is.
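
As promised in note 3 above, here is a toy illustration of the Gibbard-Satterthwaite logic – my own example, using the Borda count with an alphabetical tie-break: voter 2 gets a strictly better outcome by misreporting their ranking.

```python
from itertools import permutations

candidates = ("a", "b", "c")

def borda_winner(ballots):
    """Borda count: top gets 2 points, middle 1, bottom 0; ties broken alphabetically."""
    scores = {c: 0 for c in candidates}
    for ballot in ballots:
        for points, c in enumerate(reversed(ballot)):
            scores[c] += points
    return max(sorted(candidates), key=lambda c: scores[c])

true_prefs = [("a", "b", "c"),   # voter 1
              ("b", "a", "c")]   # voter 2: truly prefers b most

honest = borda_winner(true_prefs)                   # 'a' wins on the tie-break
print("honest outcome:", honest)

# Voter 2 searches for a profitable misreport, holding voter 1 fixed.
for lie in permutations(candidates):
    outcome = borda_winner([true_prefs[0], lie])
    if true_prefs[1].index(outcome) < true_prefs[1].index(honest):
        print("misreport", lie, "elects", outcome)  # ('b', 'c', 'a') elects 'b'
        break
```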

We turn to general equilibrium theory tomorrow. And if it seems excessive to need four days to cover the work of one man – even in part! – that is only because I understate the breadth of his contributions. Like Samuelson’s obscure knowledge of Finnish ministers which I recounted earlier this year, Arrow’s breadth of knowledge was also notorious. There is a story Eric Maskin has claimed to be true, where some of Arrow’s junior colleagues wanted to finally stump the seemingly all-knowing Arrow. They all studied the mating habits of whales for days, and then, when Arrow was coming down the hall, faked a vigorous discussion on the topic. Arrow stopped and turned, remaining silent at first. The colleagues had found a topic he didn’t fully know! Finally, Arrow interrupted: “But I thought Turner’s theory was discredited by Spenser, who showed that the supposed homing mechanism couldn’t possibly work!” And even this intellectual feat hardly matches Arrow’s well-known habit of sleeping through the first half of seminars, waking up to make the most salient point of the whole lecture, then falling back asleep again (as averred by, among others, my colleague Joshua Gans, a former student of Ken’s).

Nobel Prize 2016 Part II: Oliver Hart

The Nobel Prize in Economics was given yesterday to two wonderful theorists, Bengt Holmstrom and Oliver Hart. I wrote a day ago about Holmstrom’s contributions, many of which are simply foundational to modern mechanism design and its applications. Oliver Hart’s contribution is more subtle and hence more of a challenge to describe to a nonspecialist; I am sure of this because no concept gives my undergraduate students more headaches than Hart’s “residual control rights” theory of the firm. Even stranger, much of Hart’s recent work repudiates the importance of his most famous articles, a point that appears to have been entirely lost on every newspaper discussion of Hart that I’ve seen (including otherwise very nice discussions like Applebaum’s in the New York Times). A major reason he has changed his beliefs, and his research agenda, so radically is not simply the whims of age or the pressures of politics, but rather the impact of a devastatingly clever, and devastatingly esoteric, argument made by the Nobel winners Eric Maskin and Jean Tirole. To see exactly what’s going on in Hart’s work, and why there remain many very important unsolved questions in this area, let’s quickly survey what economists mean by “theory of the firm”.

The fundamental strangeness of firms goes back to Coase. Markets are amazing. We have wonderful theorems going back to Hurwicz about how competitive market prices coordinate activity efficiently even when individuals only have very limited information about how various things can be produced by an economy. A pencil somehow involves graphite being mined, forests being explored and exploited, rubber being harvested and produced, the raw materials brought to a factory where a machine puts the pencil together, ships and trains bringing the pencil to retail stores, and yet this decentralized activity produces a pencil costing ten cents. This is the case even though not a single individual anywhere in the world knows how all of those processes up the supply chain operate! Yet, as Coase pointed out, a huge amount of economic activity (including the majority of international trade) is not coordinated via the market, but rather through top-down Communist-style bureaucracies called firms. Why on Earth do these persistent organizations exist at all? When should firms merge and when should they divest themselves of their parts? These questions make up the theory of the firm.

Coase’s early answer is that something called transaction costs exist, and that they are particularly high outside the firm. That is, market transactions are not free. Firm size is determined at the point where the problems of bureaucracy within the firm overwhelm the benefits of reducing transaction costs from regular transactions. There are two major problems here. First, who knows what a “transaction cost” or a “bureaucratic cost” is, and why they differ across organizational forms: the explanation borders on tautology. Second, as the wonderful paper by Alchian and Demsetz in 1972 points out, there is no reason we should assume firms have some special ability to direct or punish their workers. If your supplier does something you don’t like, you can keep them on, or fire them, or renegotiate. If your in-house department does something you don’t like, you can keep them on, or fire them, or renegotiate. The problem of providing suitable incentives – the contracting problem – does not simply disappear because some activity is brought within the boundary of the firm.

Oliver Williamson, a recent Nobel winner jointly with Elinor Ostrom, has a more formal transaction cost theory: some relationships generate joint rents higher than could be generated if we split ways, unforeseen things occur that make us want to renegotiate our contract, and the cost of that renegotiation may be lower if workers or suppliers are internal to a firm. “Unforeseen things” may include anything which cannot be measured ex-post by a court or other mediator, since that is ultimately who would enforce any contract. It is not that everyday activities have different transaction costs, but that the negotiations which produce contracts themselves are easier to handle in a more persistent relationship. As in Coase, the question of why firms do not simply grow to an enormous size is largely dealt with by off-hand references to “bureaucratic costs” whose nature remains largely informal. Though informal, the idea that something like transaction costs might matter seemed intuitive and had some empirical support – firms are larger in the developing world because weaker legal systems mean more “unforeseen things” will occur outside the scope of a contract, hence the differential costs of holdup or renegotiation inside and outside the firm are first order when deciding on firm size. That said, the Alchian-Demsetz critique, and the question of what a “bureaucratic cost” is, are worrying. And as Eric van den Steen points out in a 2010 AER, can anyone who has tried to order paper through their procurement office versus just popping in to Staples really believe that the reason firms exist is to lessen the cost of intrafirm activities?

Grossman and Hart (1986) argue that the distinction that really makes a firm a firm is that it owns assets. They retain the idea that contracts may be incomplete – at some point, I will disagree with my suppliers, or my workers, or my branch manager, about what should be done, either because a state of the world has arrived not covered by our contract, or because it is in our first-best mutual interest to renegotiate that contract. They retain the idea that there are relationship-specific rents, so I care about maintaining this particular relationship. But rather than rely on transaction costs, they simply point out that the owner of the asset is in a much better bargaining position when this disagreement occurs. Therefore, the owner of the asset will get a bigger percentage of rents after renegotiation. Hence the person who owns an asset should be the one whose incentive to improve the value of the asset is most sensitive to that future split of rents.
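
To fix ideas, here is a minimal textbook-style sketch of the hold-up logic behind this argument; the notation is mine, not Grossman and Hart’s. Let agent 1 make a relationship-specific investment $i \geq 0$ at cost $i$, generating joint surplus $v(i)$ with $v' > 0$ and $v'' < 0$. Because the contract is incomplete, the surplus is split by ex-post bargaining, with agent 1 receiving a share $\lambda \in (0,1)$ that is larger when agent 1 owns the asset. Agent 1 then solves

\[
\max_i \; \lambda v(i) - i \quad \Rightarrow \quad \lambda v'(i^*) = 1 \quad \Rightarrow \quad v'(i^*) = \frac{1}{\lambda} > 1 = v'(i^{FB}),
\]

so investment falls short of the first best, and by less the larger is $\lambda$. Assigning ownership to the party whose noncontractible investment matters most at the margin minimizes this distortion, which is exactly the sensitivity logic described above.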

Baker and Hubbard (2004) provide a nice empirical example: when on-board computers to monitor how long-haul trucks were driven began to diffuse, ownership of those trucks shifted from owner-operators to trucking firms. Before the computer, if the trucking firm owns the truck, it is hard to contract on how hard the truck will be driven or how poorly it will be treated by the driver. If the driver owns the truck, it is hard to contract on how much effort the trucking firm’s dispatcher will exert ensuring the truck isn’t sitting empty for days, or following a particularly efficient route. The computer solves the first problem, meaning that only the trucking firm is still taking noncontractible actions relevant to the joint relationship, actions whose value depends on who owns the truck. In Grossman and Hart’s “residual control rights” theory, then, the truck ought, post-computer, to be owned by the trucking firm. If these residual control rights are unimportant – there is no relationship-specific rent and no incompleteness in contracting – then the ability to shop around for the best relationship is more valuable than the control rights asset ownership provides. Hart and Moore (1990) extend this basic model to the case where there are many assets and many firms, with the critical suggestion that sole ownership of assets which are highly complementary in production is optimal. Asset ownership affects outside options when the contract is incomplete by changing bargaining power, and splitting ownership of complementary assets gives multiple agents weak bargaining power and hence little incentive to invest in maintaining the quality of, or improving, the assets. Hart, Shleifer and Vishny (1997) provide a great example of residual control rights applied to the question of why governments should run prisons but not garbage collection. (A brief aside: note the role that bargaining power plays in all of Hart’s theories. We do not have a “perfect” – in a sense that can be made formal – model of bargaining, and Hart tends to use bargaining solutions from cooperative game theory like the Shapley value. After Shapley’s prize alongside Roth a few years ago, this makes multiple prizes heavily influenced by cooperative games applied to unexpected problems. Perhaps the theory of cooperative games ought still to be taught with vigor in PhD programs!)

There are, of course, many other theories of the firm. The idea that firms in some industries are big because there are large fixed costs to enter at the minimum efficient scale goes back to Marshall. The agency theory of the firm, going back at least to Jensen and Meckling, focuses on the problem of giving workers within a firm incentives to actually maximize profits; as I noted yesterday, Holmstrom and Milgrom’s multitasking is a great example of this, with tasks split across firms so as to allow some types of workers to be given high-powered incentives and others flat salaries. More recent work by Bob Gibbons, Rebecca Henderson, Jon Levin and others on relational contracting discusses how a nexus of self-enforcing beliefs about how hard work today translates into rewards tomorrow can substitute for formal contracts, and how the credibility of these “relational contracts” can vary across firms and depend on their history.
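
A rough illustration of the self-enforcement logic, a stylized sketch in the spirit of this literature rather than any particular paper’s model: suppose each period the firm promises a discretionary bonus $b$ for good performance, a promise no court will enforce. Reneging saves $b$ today but destroys the worker’s trust and hence the future surplus of the relationship. With discount factor $\delta$ and per-period continuation surplus $s$ to the firm, the promise is credible only if

\[
b \;\leq\; \frac{\delta}{1-\delta}\, s,
\]

which is why the power of informal incentives a firm can sustain depends on how much future surplus is at stake and on its history of actually honoring such promises.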

Here’s the kicker, though. A striking blow was dealt to all theories which rely on the incompleteness or nonverifiability of contracts by a brilliant paper of Maskin and Tirole (1999) in the Review of Economic Studies. Theories relying on incomplete contracts generally just hand-waved that there are always events which are unforeseeable ex-ante or impossible to verify in court ex-post, and hence there will always be scope for disagreement about what to do when those events occur. But, as Maskin and Tirole correctly point out, agents don’t care about anything in these unforeseeable/unverifiable states except for what the states imply about our mutual valuations from carrying on with a relationship. Therefore, every “incomplete contract” should just involve the parties deciding in advance that if a state of the world arrives where you value keeping our relationship in that state at 12 and I value it at 10, then we should split that joint value of 22 at whatever level induces optimal actions today. Do this same ex-ante contracting for all future profit levels, and we are done. Of course, there is still the problem of ensuring incentive compatibility – why would the agents tell the truth about their valuations when that unforeseen event occurs? I will omit the details here, but you should read the original paper, where Maskin and Tirole show a (somewhat convoluted but still working) mechanism that induces truthful revelation of private value by each agent. Taking the model’s insight seriously but the exact mechanism less seriously, the paper basically suggests that incomplete contracts don’t matter if we can truthfully figure out ex-post who values our relationship at what amount, and there are many real-world institutions, like mediators, who do precisely that. If, as Maskin and Tirole prove (and Maskin described more simply in a short note), incomplete contracts aren’t a real problem, we are back to square one – why have persistent organizations called firms?
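
To make the logic concrete, here is a stripped-down illustration using the numbers above, not Maskin and Tirole’s actual mechanism. Some state $\omega$ arrives in which continuing the relationship is worth $v_{\text{you}}(\omega) = 12$ to you and $v_{\text{me}}(\omega) = 10$ to me. Neither of us could describe $\omega$ ex-ante, but we did not need to: the ex-ante contract simply specifies, for every possible pair of valuations, a division of the total $v_{\text{you}} + v_{\text{me}} = 22$, say

\[
\big(\, v_{\text{you}} - t(v_{\text{you}}, v_{\text{me}}), \;\; v_{\text{me}} + t(v_{\text{you}}, v_{\text{me}}) \,\big),
\]

with the transfer schedule $t(\cdot)$ chosen ex-ante so that each party’s expected share rewards efficient ex-ante behavior. The hard part, which is where the Maskin-Tirole mechanism comes in, is inducing us to truthfully announce $(12, 10)$ once $\omega$ actually occurs.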

What should we do? Some theorists have tried to fight off Maskin and Tirole by suggesting that their precise mechanism is not terribly robust to, for instance, assumptions about higher-order beliefs (e.g., Aghion et al (2012) in the QJE). But these quibbles do not contradict the far more basic insight of Maskin and Tirole: situations we think of empirically as “hard to describe” or “unlikely to occur or be foreseen” are not sufficient to justify the relevance of incomplete contracts unless we also have some reason to think that all mechanisms which split rent on the basis of future profit, like a mediator, are unavailable. Note that real-world contracts regularly include provisions which describe ex-ante how contractual disagreements ex-post should be handled.

Hart’s response, and this is both clear from his CV and from his recent papers and presentations, is to ditch incompleteness as the fundamental reason firms exist. Hart and Moore’s 2007 AER P&P and 2006 QJE are very clear:

Although the incomplete contracts literature has generated some useful insights about firm boundaries, it has some shortcomings. Three that seem particularly important to us are the following. First, the emphasis on noncontractible ex ante investments seems overplayed: although such investments are surely important, it is hard to believe that they are the sole drivers of organizational form. Second, and related, the approach is ill suited to studying the internal organization of firms, a topic of great interest and importance. The reason is that the Coasian renegotiation perspective suggests that the relevant parties will sit down together ex post and bargain to an efficient outcome using side payments: given this, it is hard to see why authority, hierarchy, delegation, or indeed anything apart from asset ownership matters. Finally, the approach has some foundational weaknesses [pointed out by Maskin and Tirole (1999)].

To my knowledge, Oliver Hart has written zero papers since Maskin-Tirole was published which attempt to explain any policy or empirical fact on the basis of residual control rights and the incomplete contracts they require. Instead, he has been primarily working on theories which depend on reference points, a behavioral idea that when disagreements occur between parties, the ex-ante contracts are useful because they suggest “fair” divisions of rent, and that shading and other destructive actions follow when those divisions are not delivered. These behavioral agents may very well disagree about what the ex-ante contract means for “fairness” ex-post. The primary result is that flexible contracts (e.g., contracts which deliberately leave lots of incompleteness) can adjust easily to changes in the world but will induce spiteful shading by at least one agent, while rigid contracts do not permit this shading but do cause parties to pursue suboptimal actions in some states of the world. This perspective has been applied by Hart to many questions over the past decade, such as why it can be credible to delegate decision-making authority to agents: if you try to seize it back, the agent will feel aggrieved and will shade effort. These responses are hard, or perhaps impossible, to justify when agents are perfectly rational – and with perfectly rational agents, of course, the Maskin-Tirole critique would bite once more.
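
A back-of-the-envelope way to see the flexible-versus-rigid tradeoff, my stylization rather than Hart’s formal model: let $v^*(\omega)$ be the surplus from the ex-post optimal action in state $\omega$, and $v_{\bar a}(\omega)$ the surplus from an action $\bar a$ fixed rigidly ex-ante. Then, roughly,

\[
W_{\text{flexible}} \;=\; \mathbb{E}\big[ v^*(\omega) \big] - \text{(expected shading loss)}, \qquad W_{\text{rigid}} \;=\; \mathbb{E}\big[ v_{\bar a}(\omega) \big],
\]

so the flexible contract is preferred when the environment is volatile enough that $\mathbb{E}[v^*(\omega) - v_{\bar a}(\omega)]$ is large relative to the surplus destroyed by aggrieved parties shading their performance.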

So where does all this leave us concerning the initial problem of why firms exist in a sea of decentralized markets? In my view, we have many clever ideas, but still no perfect theory. A perfect theory of the firm would need to explain why firms are the size they are, why they own what they do, why they are organized as they are, why they persist over time, and why interfirm incentives look the way they do. It almost certainly would need its mechanisms to work if we assumed all agents were highly, or perfectly, rational. Since patterns of asset ownership are fundamental, it needs to go well beyond the type of hand-waving that makes up many “resource” type theories. (Firms exist because they create a corporate culture! Firms exist because some firms just are better at doing X and can’t be replicated! These are outcomes, not explanations.) I believe that there are reasons why the costs of maintaining relationships – transaction costs – endogenously differ within and outside firms, and that Hart is correct in focusing our attention on how asset ownership and decision-making authority affect incentives to invest, but these theories, even in their most endogenous form, cannot do everything we wanted a theory of the firm to accomplish. I think that somehow reputation – and hence relational contracts – must play a fundamental role, and that the nexus of conflicting incentives among agents within an organization, as described by Holmstrom, must as well. But we still lack the precise insight to clear up this muddle, and to give us a straightforward explanation for why we seem to need “little Communist bureaucracies” to assist our otherwise decentralized and almost magical market system.
