Who Got What, and Why? A Nobel for Claudia Goldin

How have women been paid for their work? More broadly, how are different skills in general rewarded in the labor market? The prices of different things are at the core of economics, and wages – that is, the price of labor – are the most important prices. Claudia Goldin, 2023’s Nobel Laureate in Economics, uses cliometrics, the combined tools of economics and history, to understand changes in the wages of women and men, of the less- and more-educated, of the part-time worker and late-night-at-the-office striver. This history is backward-looking in its evidence, but not in its usefulness. Goldin’s work helps us understand whose wages will rise, will fall, will equalize going forward. Not entirely unfairly, she will be described in much of today’s coverage as an economist who studies the gender gap. This description misses two critical pieces. The question of female wages is a direct implication of her earlier work on the return to different skills as the structure of the economy changes, and that structure is the subject of her earliest work on the development of the American economy. Further, her diagnosis of the gender gap is much more optimistic, and more subtle, than the majority of popular discourse on the topic.

Let us then begin at the start of Goldin’s career. She was trained as a cliometrician by fellow Laureate Robert Fogel at the University of Chicago, in the early days of this new field. To say that cliometrics was controversial among historians is a vast understatement. A horde of brash young economists, wielding alien theoretical models and talking about the most forbidden subject in history, the counterfactual (“History only occurs once,” the detractors cried), presumed to overturn some of the most widely-accepted facts in the field. Slavery was not naturally dying out due to its lack of productivity! Since canals could have been built in the flat Midwest, the railroad could not have been fundamental to US prosperity! To give a taste of the outsider nature of cliometrics, that very term was coined by a pure economic theorist, Stanley Reiter, who had no training whatsoever in history. Goldin’s review of the impact of cliometrics following Fogel and North’s Nobel makes clear her sympathies: a critical element in this research was uncovering “the interplay between structural change and technological change”, an interplay we can study with theoretical and empirical rigor. The structure of the economy changes rewards, which changes incentives, which cause us to study, to save, to work long hours or retire early, to benefit from increasing our knowledge or our brawn.

Chicago in the 1970s was, of course, the center of the field, and is sometimes seen in retrospect as the center of the orthodoxy. Not true at all! Beyond the cliometric revolution, Friedman was questioning the then-dominant beliefs about the role of the state in policy, Griliches was proving the importance of innovation and R&D for the economy, and Becker was freeing us to study what happens inside the family. If anything, Goldin arrived with an orthodox background – Bronx Science, that cradle of future laureates, then a shift from microbiology to antitrust at Cornell, before Hyde Park broadened the view of what a scientist, an economist, could do.

Goldin’s early work was straight history, though with a Beckerian tinge in its focus on the household. The old saw is that real economic history is dusty. Your hands must dig through archives, long-forgotten boxes with handwritten records that can barely be read, a spider web laced onto the corner of pages protected by neglect and darkness for decades. Her archaeology dug into urban work in the late 19th century (more daughters to help in the house allowed mothers to work elsewhere). She combined censuses from the early 19th century with a Lewis-type dual economy model to argue that the low relative productivity of women and children in the North (dairy and wheat) versus the South (cotton and tobacco, easier to monitor with piece rates) created surplus labor that could fill the Lowell mills and, over time, drive up female wages. She found city-level records to look into black and white women’s labor force participation just after emancipation, arguing that norms related to slavery led to higher working rates among black women than could be explained by wages or incomes alone. She argued that the indirect economic costs of the Civil War, especially in the South, exceeded the direct costs of damaged capital and deaths. It is perhaps unsurprising given this early history work that Goldin is a delightful academic writer, with carefully constructed turns of phrase and straightforward language. The influence of an insistence on clean prose by McCloskey, who was also in the history group at Chicago when Goldin studied there, seems evident.

There should be no surprise that work on the late 19th century, right before the American Century, draws one’s thoughts to how exactly the prosperity of the twentieth century came to be. Why did education soar, first at the high school level and later in universities? Why did women’s wages take so long to catch up (and in what ways have they still not done so)? How did societal changes, like gender norms or family size, matter relative to real economic factors like the development of new “second Industrial Revolution” technologies such as electricity and modern chemistry?

The economic theory of the early and mid-20th century in America is straightforward. American industry became highly advanced. An increased demand for skill led to widespread schooling, making the US by far the world’s most educated large society. This supply shock combined with assembly line automation, or the incorporation of skilled work into machines, drove down the relative wages of white-collar workers, reducing inequality. Manufacturing work became highly efficient and more complex, requiring more skilled work over time, raising inequality and reducing the share of workers in that sector. Service-sector work grew in relative importance. As “brain” jobs replaced “brawn” jobs, women deciding on how to balance family and career increasingly weighed career more heavily, gaining expensive credentials to advance in the workplace.
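To make the supply-and-demand logic of that paragraph concrete, here is a minimal sketch in the spirit of the constant-elasticity "race between education and technology" framework that Goldin and Katz work with; the elasticity and the supply and demand paths below are toy numbers of my own, not estimates from their work.

```python
import numpy as np

# Log skill premium in a two-factor CES world: ln(w_S/w_U) = (D_t - ln(S_t/U_t)) / sigma,
# where D_t indexes relative demand for skill and S_t/U_t is relative supply.
# The premium compresses when the supply of educated workers outruns demand
# (the high school movement), and fans out again when demand pulls ahead.
sigma = 1.6                                  # assumed elasticity of substitution
years = np.arange(1910, 1991, 20)
rel_demand = 0.02 * (years - 1910)           # steady growth in relative demand (toy path)
rel_supply = np.where(years < 1950,
                      0.03 * (years - 1910),              # schooling surges early...
                      1.2 + 0.01 * (years - 1950))        # ...then slows after mid-century
premium = (rel_demand - rel_supply) / sigma
for y, p in zip(years, premium):
    print(y, round(p, 3))                    # falls toward mid-century, then recovers
```

The point of the invented numbers is only the shape: compression when supply wins the race, widening inequality when demand does.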

But is this theory true? Largely due to the work of Goldin, we know it is. Consider schooling. In 1910, less than 10% of Americans graduated from high school, and those who did were generally on a track to attend university. Latin and Greek were on the syllabus. Between 1910 and 1940, high school graduation soared to five times that rate in response to a growing demand for skilled labor, wages for bookkeeping jobs fell in response to this new supply, and the most productive firms began paying a premium for graduates even in blue-collar occupations. And perhaps surprisingly, women were more likely to graduate than men throughout this period. This increase in schooling was not seen worldwide. In the United Kingdom, only 15% of 17-year-olds were attending school even in 1960. The decentralized, republican (in the classic sense), secular United States had a unique source of state capacity to react to the opportunity of increased demand for skill.

World War II greatly increased the demand for less skilled manufacturing labor, and induced the temporary entry of women into the labor force. Following the war, inequality remained low and female labor force participation continued to increase, even though the war itself was only a temporary shock. Could World War II have caused some of these first-order twentieth century labor market patterns? Goldin argues no, returning again to basic economic theory. Inequality remained low in the middle of the century because of a surplus of skilled labor. And when it comes to Rosie the Riveter, Goldin shows no particular break in the entry of women into paid work surrounding the war, nor any important change in discrimination against married women by employers in its aftermath. That said, the middle of the century was taking us to the end of the era of the Steel Belt, the assembly line, and the powerful blue-collar union – even in 1940, Goldin and Katz show that the most technologically advanced manufacturing firms were recruiting higher-skilled workers at higher wages, restarting the skill-technology complementarity that would exacerbate inequality in the later decades of the twentieth century.

It should now be clear how Goldin arrived at the question of the gender gap. Let us begin by positing that workers are paid their marginal productivity – that is, holding all the capital and tasks and other workers constant, if adding you to the firm increases profits by $20, then that is your marginal productivity. No one ought to pay you more than that, and competition among employers ought to push your wage to that level, in the standard economic model. Now many quibble with this model, sometimes correctly and sometimes not, but let us return to that complaint later; indeed, Goldin shows, using a Women’s Bureau survey from the Great Depression, that large firms discriminated wildly by assigning certain jobs only to certain sexes. Nonetheless, taking the standard economic model as a baseline, we can define wage discrimination as “otherwise identical workers are paid different wages”, or alternatively, “controlling for field, experience, education, and other factors which affect productivity, gender should not affect wages.”

Two obvious objections arise. First, why hold all else constant? The structure of work – the type of coworkers one has, the hours the office is open – directly affects a given worker’s marginal productivity. For instance, let there be two types of workers, Early Bird and Night Owl. When workers work at the same time, they produce a combined $10 of profit. If they work at different times, they produce $3 in profit each. If most workers in an industry are Night Owls, Early Bird either earns a lower wage or suffers the disutility of working an uncomfortable shift for which they must be compensated. And likewise, if most are Early Birds, then Night Owl earns less or gets more disutility from their work schedule. Alongside Larry Katz, Goldin gives the canonical example of the pharmacist, whose gender gap is smaller than almost every other high-wage profession. Why? Wages are largely “linear in hours”. Today, though not historically, pharmacists generally work in teams at offices where they can substitute for each other. No one is always “on call”. Hence a pharmacist who wants to work late nights while young, then shorter hours with a young kid at home, then a longer workday when older can do so. If pharmacists were instead independent contractors working for themselves, as they were historically, the marginal productivity of one who wanted this type of flexibility would be lower. The structure of the profession affects marginal productivity, hence wages and the gender gap, particularly given the different demand for steady and shorter hours among women. Now, not all jobs can be turned from ones with convex wages for long and unsteady hours to ones with linear wages, but as Goldin points out, it’s not at all obvious that academia or law or other high-wage professions can’t make this shift. Where these changes can be made, we all benefit from high-skilled women remaining in high-productivity jobs: Goldin calls this “the last chapter” of gender convergence.
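The Early Bird and Night Owl example is worth spelling out with its own numbers; a trivial sketch, using exactly the payoffs from the paragraph above, shows where the wage gap comes from.

```python
# Two workers on the same shift jointly produce $10; on different shifts they
# produce $3 each, or $6 jointly, using the numbers from the example above.
def pair_output(shift_a, shift_b):
    return 10 if shift_a == shift_b else 3 + 3

print(pair_output("Night Owl", "Night Owl"))   # 10
print(pair_output("Night Owl", "Early Bird"))  # 6
# In an office full of Night Owls, hiring an Early Bird adds less output than
# hiring another Night Owl, so the Early Bird earns less or must be compensated
# for working a shift she dislikes. The gap comes from the structure of work,
# not from any difference in underlying skill.
```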

The second objection to the definition of wage discrimination as “identical workers get identical wages” is that education and careers endogenously determine one’s marginal productivity. If women are banned from going to school, then otherwise identical women and men at birth will have different wages in adulthood. Likewise, if “marriage bars” are enacted preventing married women from working in a field, these women will receive less training from their early-career bosses and be put on “dead-end” tracks. And indeed the same is true of policies we may think of as beneficial: the option for young mothers to take temp work, or a longer parental leave policy, may harm all women’s careers by changing job tracks and training in exactly the same way as a marriage bar. Goldin refers to a revolution in women’s work from one where women worked out of necessity, or temporarily before marriage, to one where they expected to have a career rather than a job, and hence made education and career advancement decisions under that assumption, just as men had. Both technological and economic changes led to this revolution. The availability of the pill led to sharp increases in women attending university and professional programs in the 1970s. Women in the office in the early 20th century worked neither the lowest-paid nor the highest-paid jobs, but instead worked the jobs with no prospect of advancement, a pattern Goldin credits to an attempt to signal to men that even the “bad jobs” they get can still become a “career”. Yet regardless of birth control and changes in societal norms, women’s investment in career, and the extent to which they are groomed for advancement, still depends on how likely they are to stay in the workforce at long hours. A particularly intriguing followup to Goldin’s evidence has been the “child penalty” literature of Kleven et al., showing that a huge fraction of the remaining gender gap in the developed world appears only after a child is born.

You may notice a wild difference in how Goldin the economic historian discusses the gender gap compared to most other writers. Holding experience, education and field constant, the gender gap has shrunk to the low single digits. But understand what this implies! How women negotiate pay raises, whether bosses discriminate in promotion within a given firm, whether a firm has a really inclusive set of benefits – all of this is absolutely on the margin. What matters for the gender gap in the long run is technology and societal norms, particularly in their interaction with family. If high-productivity jobs require brawn, or require sixty-hour work weeks while society pressures women to raise children at home, then the gender gap will be large. If women can rationally expect to work a longer career due to better birth control or fewer bars (legal or normative) on married women working, or if white-collar jobs become more common relative to manufacturing, women’s wages will rise. That is, the definition of wage discrimination as “identical workers get different pay” is useful for ruling out some of the most common popular explanations for the gender gap. It does not mean that economists like Goldin are unaware that women may disproportionately avoid fields with long hours, or more physical danger, or that women may be put on “mommy tracks” in their career with less training and room for advancement. She is quite aware of this and wants to focus your attention on these concerns!

Economic history helps us understand the past. But just as structural and technological changes in the past affected who got what, similar changes today will affect who gets what tomorrow. Current trends like onshoring, work-from-home, increasing parental leave lengths, flexible work locations, and growing gender gaps in schooling will not have innocuous distributional consequences. There is a lot to consider about how these changes both directly affect marginal productivity and indirectly affect societal norms, the expectations of young people, the training decisions of managers, and the speed of skill acquisition. As Goldin has shown, that progress benefits all people equally is not something we should assume. When evaluating her work, I can think of no stronger commendation than that I have no idea what Goldin will show me when I begin reading a paper; rather, she is always thoughtful, follows the data, reconciles what she finds with theory, and feels no compunction about slaughtering the occasional sacred cow – again, the legacy of 1970s Chicago rears its head. Especially on a topic as politically loaded as gender, this intellectual honesty is the source of her influence and a delight to the reader trying to understand such an important topic.

Statistics for Strategic Scientists – A Clark for Isaiah Andrews

Today’s 2021 Clark Medal goes to the Harvard econometrician Isaiah Andrews, and no surprise. Few young econometricians have produced such a volume of work so quickly. And while Andrews has a number of papers on traditional econometric topics – how to do high-powered inference on non-linear models, for instance – I want to focus here on his work on what you might call “strategic statistics”.

To understand what we mean by that term, we need to first detour a bit and understand what econometrics is anyway. The great Joseph Schumpeter, in a beautiful short introduction to the new Econometric Society in 1933, argues that economics is not only the most mathematical of the social or moral sciences, but of all sciences. How can that be? Concepts in physics like mass or velocity are surely quantitative, but must be measured before we can put a number on them. Concepts in economics, however, are fundamentally quantitative: our basic building blocks are prices and quantities. From these numerical concepts comes the natural desire to investigate the relationship between them: estimates of demand curves go back at least to Gregory King and Charles D’Avenant in the 17th century! The point is not simply that economics is amenable to theoretical investigation. Rather, from King onward through von Thünen, Cournot, Walras, Fisher and many more, economics is a science where numerical data from the practice of markets are combined with theory.

Econometrics, then, is not simply statistics, a problem of computing standard errors and developing new estimators. Rather, as historians of thought like Mary Morgan and Roy Epstein have pointed out, econometrics is a unique subfield of statistics because of its focus on identification in models where different variables are simultaneously determined. Consider estimating how a change in the price of iron will affect sales. You observe a number of points of data from prior years: at 20 dollars per ton, 40 tons were sold; at 30 dollars per ton, 45 tons were sold. Perhaps you have similar data from different countries with disconnected markets. Even in the 1800s, the tool of linear regression with controls was known. Look at the numbers above: 40 tons sold at $20, and 45 tons at $30. The demand curve slopes up! The naïve analyst goes to Mr. Carnegie and suggests, on the basis of past data, that if he increases the price of iron, he will sell even more!

The problem with running this regression, though, should be clear if one starts with theory. The price of iron depends on the conjunction of supply and demand, on Marshall’s famous “scissors”. Our observational data cannot tell us whether changes in the observed price-quantity pairs happened because demand shifted or because supply did. This conflation is common in public reasoning: we observe that house prices are rising very quickly at the same time as many new condos are being built in the neighborhood, and conclude the latter is causing the former. In fact, both the price increase and the new construction can occur if demand for living in the neighborhood increases and the marginal cost of construction is increasing. Supply and demand is not the only simultaneous equations model in economics, of course: anything where strategic behavior determines an equilibrium has the same structure.
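A quick simulation makes the iron example concrete. The numbers below are invented for illustration: both the demand curve and the supply curve are hit by shocks, prices and quantities are determined jointly, and the naive regression of quantity on price recovers neither curve (here it even comes out upward-sloping, just like the analyst advising Mr. Carnegie).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
# Toy structural model: demand q = 100 - 2p + u_d, supply q = 10 + 3p + u_s.
u_d = rng.normal(0, 10, n)          # demand shocks (tastes, incomes)
u_s = rng.normal(0, 10, n)          # supply shocks (costs, weather)
p = (90 + u_d - u_s) / 5            # market-clearing price in each market
q = 100 - 2 * p + u_d               # market-clearing quantity

cov = np.cov(p, q)
ols_slope = cov[0, 1] / cov[0, 0]
print(f"OLS slope of quantity on price: {ols_slope:.2f}")
# Comes out near +0.5: neither the true demand slope (-2) nor the supply slope (+3),
# because the observed price-quantity pairs mix movements of both curves.
```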

This causal identification problem goes back at least to Trygve Haavelmo, who pointed it out in 1943: the past relationship between prices and quantities sold is not informative about what will happen if I choose to raise prices. Likewise, though rising prices and new construction are correlated, if we choose to increase construction in an area, prices will fall. Though there is an empirical link between rising prices and a strong economy, we cannot generate a strong economy in the long run just by inflating the currency. Econometrics, then, is largely concerned with the particular statistical problem of identifying parameters that tell us what will happen if we change one part of the system through policy, or when we place people with different known preferences, costs, and so on in the same strategic situation.

How we can do that is an oft-told story, but roughly we can identify a parameter in a simultaneously determined model with statistical assumptions or with structural assumptions. In the context of supply and demand, if we randomly increase the price a certain good is sold at in a bunch of markets, that experiment can help identify the price elasticity of demand, holding demand constant (but it tells us nothing about what happens to price and quantity if consumer demand changes!). If we use “demand” or “supply” shifters – random factors that affect price only via their effect on firm costs or consumer demand – these “instrumental variables” allow us to separate the supply and demand curves in past observational data. If we assume more structure, such as that there is a set of firms who price according to Cournot, then we can back out firm costs and look at counterfactuals like “what if a merger reduced the number of firms in this market by one”. The important thing to realize is that no matter where an empirical technique lies on the spectrum from purely statistical to heavily theory-driven, underlying assumptions are being made by the econometrician to identify the parameters of interest. The exclusion restriction in an IV, that the shifter in question only affects price via the supply or demand side, is as much an untestable assumption as the argument that firms are profit-maximizing Cournot players.
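Continuing the simulated iron market from above, here is what an instrumental variable buys you: an observed cost shifter that moves supply but (by assumption) has nothing to do with demand lets you trace out the demand curve that plain OLS cannot. Again, all numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(0, 1, n)                  # observed cost shifter (affects supply only)
u_d = rng.normal(0, 10, n)               # unobserved demand shocks
u_s = 8 * z + rng.normal(0, 5, n)        # supply shocks, partly driven by z
p = (90 + u_d - u_s) / 5                 # same market-clearing structure as before
q = 100 - 2 * p + u_d

ols = np.cov(p, q)[0, 1] / np.cov(p, p)[0, 1]
iv = np.cov(z, q)[0, 1] / np.cov(z, p)[0, 1]   # Wald ratio: demand slope from supply-driven variation
print(f"OLS: {ols:.2f}   IV: {iv:.2f}   (true demand slope: -2)")
# The instrument isolates price variation coming from the supply side, so the IV
# estimate sits near -2; OLS is still contaminated by the simultaneous demand shocks.
```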

This brings us back to Isaiah Andrews. How do scientists communicate their results to the public, particularly when different impossible-to-avoid assumptions give different results? How can we ensure the powerful statistical tools we use for internal validity, meaning causally-relevant insight in the particular setting from which the data are drawn, do not mislead about external validity, the potential for applying those estimates when participants have scope for self-selection or researchers select convenient non-representative times or locations for their study? When our estimation is driven by the assumptions of a model, what does it mean when we say our model “fits the data” or “explains key variation in the data”? These questions are interesting essentially because of the degrees of freedom the researcher holds in moving from a collection of observations to a “result”. Differences of opinion in economics are not largely about the precision of estimated data, a la high energy physics, but about the particular assumptions used by the analyst to move from data to estimated parameters of interest. Taking this seriously is what I mean above by “strategic statistics”: the fact that identification in economics requires choices by the analyst means we need to take the implications of those choices seriously. Andrews’ work has touched on each of the questions above in highly creative ways. I should also note that, by the standards of high-rigor econometrics, his papers tend to be quite readable and also quite concise.

Let’s begin with scientific communication. As we are all aware from the disastrous Covid-related public science in the past year (see Zeynep Tufekci’s writing for countless examples), there is often a tension between reporting results truthfully and the decisions taken based on those results. Andrews and Shapiro model this as a Wald-style game where scientists collect data and provide an estimate of some parameter, then a decision is made following that report. The estimate is of course imprecise: science involves uncertainty. The critical idea is that the “communications model” – where scientists report an estimate and different agents take actions based on that report – differs from the “decision model” where the scientist selects the actions (or, alternatively, the government chooses a common policy for all end-users on the basis of scientist recommendations). Optimal communication depends heavily on which setting you are in. Imagine that a costly drug is known to weakly improve health, but the exact benefit is unknown. When they can choose, users take the drug if the benefit exceeds their personal cost of taking it. In an RCT, because of sampling error, you will sometimes estimate that the drug is harmful even though it cannot be. In a communications model, readers adjust for sampling error, so you should just report truthfully: there is still useful information in that “negative” estimate, since the more negative the point estimate, the closer to zero the true benefit is likely to be. No reason to hide that from readers! In a “decision model”, reporting the negative estimate would essentially force a tax on the drug just because of sampling error, even though you know this is harmful, so optimally you censor the reporting and just give “no effect” in your scientific communications. There is a really interesting link between decision theory and econometrics going back to Wald’s classic paper. The tension between open communication of results to users with different preferences, and recommended decisions to those same users, is well worth further investigation.
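A small simulation helps show why truthful reporting loses nothing in the communications model. This is my own toy version of the drug example, with an assumed half-normal prior on the (weakly positive) benefit and normal sampling error; the point is just that a negative point estimate still carries real information for a reader who adjusts for noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
true_benefit = np.abs(rng.normal(0, 1, n))          # benefit is weakly positive by assumption
estimate = true_benefit + rng.normal(0, 1, n)       # noisy RCT point estimate

# Among studies whose point estimate came out negative, what is the average TRUE benefit?
for lo, hi in [(-3, -2), (-2, -1), (-1, 0)]:
    mask = (estimate >= lo) & (estimate < hi)
    print(f"estimate in [{lo},{hi}): avg true benefit = {true_benefit[mask].mean():.2f}")
# The average true benefit shrinks toward zero as the estimate gets more negative:
# the "negative" report is informative, and censoring it to "no effect" throws that away.
```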

How to communicate results also hinges on internal versus external validity. A study done in Tanzania on schooling may find that paying parents to send kids to school increases attendance by 16%. This study may be perfectly randomized within the region where it was run. What would the effect be of paying parents in Norway? If the only differences across families depend on observables within the support of the data in the experiment, we can simply reweight results. This seems implausible, though – there are many unobservable differences between Oslo and Dodoma. In theory, though, if all those unobservables were known, we would again just have a reweighting problem. Emily Oster and Andrews show that bounds on the externally valid effect of a policy can be constructed if you are willing to take a stand on how strongly selection into the study covaries with treatment effect heterogeneity, relative to the covariance explained by observables (the idea here is not far off from the well-known Oster bounds for omitted variable bias). For instance, in the Bloom et al. work-from-home-in-China paper, call center workers who choose to work from home see a nontrivial increase in productivity. Perhaps they select into working from home because they know they can do so efficiently, however. Using the Oster-Andrews bound, to get a negative effect of work-from-home for this call center, unobservable differences across workers would have to be 14.7 times more informative about treatment effect heterogeneity than observables.

In addition to unobservables making our estimates hard to apply outside very specific contexts, structural assumptions can also “drive” results. Structural models often use a complex set of assumptions to identify a model, where “identify” in the traditional sense means that distinct values of the parameters of interest imply distinct distributions of the observable data. But which assumptions are critical? What changes if we modify one of them? This is a very hard question to answer: as every structural economist knows, we often don’t know how to “close” the model so that it can be estimated if we change the assumptions. Many authors loosely say that “x is identified by y” when the estimated x is very sensitive to changes in y, where y might be an a priori assumption, or a particular type of data. In that sense, asking “what is critical to the estimate in this structural model” is asking “how can I trust the author that y in fact identifies x”? In a paper in the JBES, Andrews and coauthors sum up this problem in a guide to practical sensitivity analysis in structural models: “A reader who accepted the full list of assumptions could walk away having learned a great deal. A reader who questioned even one of the assumptions might learn very little, as they would find it hard or impossible to predict how the conclusions might change under alternative assumptions.” Seems like a problem! However, Andrews has shown, in a 2017 QJE with Gentzkow and Shapiro, that formal sensitivity analysis of structural estimates is in fact possible.

The starting point is that some statistical techniques are transparent: if you regress wages on education, we all understand that omitting skill biases this relationship upward, and that if we know the covariance of skill and education, we have some idea of the magnitude of the bias. Andrews’ insight is to extend that transparency to any moment-based estimate. If you have some guess about how an assumption affects particular moments of the data, then you can use a particular matrix to approximate how changes in those moments affect the parameters we care about. Consider this example. In a well-known paper, DellaVigna et al. find that door-to-door donations to charity are often just based on social pressure. That is, we give a few bucks to get this person off our doorstep without being a jerk, not because we care about the charity. The model uses variation in whether you knew the solicitor was coming to your doorstep, alongside an assumption that, basically, social pressure drives small donations with a different distribution from altruistic/warm-glow donations. In particular, the estimate of social pressure turns out to be quite sensitive to donations of exactly ten dollars. Using the easy-to-compute matrix in Andrews et al., you can easily answer, as a reader, questions like “how does the estimate of social pressure change if 10% of households just default to giving ten bucks because it is a single bill, regardless of social pressure vs. warm glow?” I think there will be much more room for ex-post dashboard/webapp-type analyses by readers in the future: why should a paper restrict itself to the particular estimates and robustness checks the authors choose? Just as open data is now often required, I wouldn’t be surprised if “open analysis” in the style of this paper becomes common as well.
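To see the mechanics, here is a stripped-down version of that sensitivity calculation for a toy minimum-distance problem rather than the DellaVigna et al. model itself. The matrix S below is, up to sign convention, the sensitivity matrix of the Andrews, Gentzkow and Shapiro paper: it tells you, to first order, how the estimate moves when each matched moment moves.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def model_moments(theta):
    # Toy model: one parameter, two moments the model is asked to match.
    return np.array([theta, theta ** 2])

m_data = np.array([1.10, 0.90])        # pretend these were estimated from the data
W = np.eye(2)                          # weighting matrix

def estimate(m):
    obj = lambda t: (model_moments(t) - m) @ W @ (model_moments(t) - m)
    return minimize_scalar(obj, bounds=(0.0, 5.0), method="bounded").x

theta_hat = estimate(m_data)

# Local sensitivity of theta_hat to the data moments: S = (G'WG)^{-1} G'W,
# where G is the Jacobian of the model moments at theta_hat.
G = np.array([[1.0], [2.0 * theta_hat]])
S = np.linalg.inv(G.T @ W @ G) @ G.T @ W
print("theta_hat:", round(theta_hat, 4), " sensitivity to each moment:", S.round(3))

# Check the linear approximation against brute-force re-estimation with a nudged moment.
delta = np.array([0.05, 0.0])
print("re-estimated:", round(estimate(m_data + delta), 4),
      "linear prediction:", round(theta_hat + (S @ delta).item(), 4))
```

A reader armed with S can ask the “what if 10% of households default to a ten dollar bill” question herself, by translating that hypothetical into a shift in the relevant moments.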

A few remaining bagatelles: Andrews’ work is much broader than just what has been discussed here, of course. First, in a very nice applied theory paper in the AER, Andrews and Dan Barron show how a firm like Toyota can motivate its suppliers to work hard even when output is not directly contractible. Essentially, recent high performers become “favored suppliers” who are chosen whenever the principal believes their productivity in the current period is likely quite high. Payoffs to the firm with this rule are strictly higher than just randomly choosing some supplier that is expected to be productive today, due to the need to dynamically provide incentives to avoid moral hazard. Second, in work with his dissertation advisor Anna Mikusheva, Andrews has used results from differential geometry to perform high-powered inference when the link between structural parameters and the outcome of interest is highly non-linear. Third, in work with Max Kasy, Andrews shows a much more powerful way to identify the effect of publication bias than simple comparisons of the distribution of p-values around “significance” cutoffs. Fourth, this is actually the second major prize for econometrics this year, as Silvana Tenreyro won the “European Clark”, the Yrjö Jahnsson Award, alongside Ricardo Reis. Tenreyro is well-known for the PPML estimator in her “log of gravity” paper with Santos Silva. One wonders who will be the next Nobel winner in pure econometrics, however: a prize has not gone to that subfield since Engle and Granger in 2003. I could see it going two ways: a more “traditional” prize to someone like Manski, Hausman, or Phillips, or a “modern causal inference” prize to any number of contributors to that influential branch. Finally, I realize I somehow neglected to cover the great Melissa Dell’s Clark prize last year – to be rectified soon!

Operations Research and the Rise of Applied Game Theory – A Nobel for Milgrom and Wilson

Today’s Nobel Prize to Paul Milgrom and Robert Wilson is the capstone of an incredibly fruitful research line which began in the 1970s in a few small departments of Operations Research. Game theory, or the mathematical study of strategic interaction, dates back to work by Zermelo, Borel and von Neumann in the early 20th century. The famed book by von Neumann and Morgenstern was published in 1944, and widely reviewed as one of the most important social scientific works of the century. And yet, it would be three decades before applications of game theory revolutionized antitrust, organizational policy, political theory, trade, finance, and more. Alongside the “credibility revolution” of causal econometrics, and to a lesser extent behavioral economics, applied game theory has been the most important development in economics in the past half century. The prize to Milgrom and Wilson is likely the final one that will go to early applied game theory, joining those in 1994, 2005, 2007, 2014 and 2016 that elevated the abstractions of the 1940s into today’s rigorous interpretations of so many previously disparate economic fields.

Neither Wilson nor Milgrom were trained in pure economics departments. Wilson came out of the decision sciences program of Howard Raiffa at Harvard, and Milgrom was a student of Wilson’s at Stanford Business School. However, the link between operations research and economics is a long one, with the former field often serving as a vector for new mathematical tools before the latter field was quite ready to accept them. In the middle of the century, the mathematics of optimal control and dynamic programming – how to solve problems where today’s action affects tomorrow’s possibilities – were applied to resource allocation by Kantorovich in the Soviet Union and to market economics problems in the West by Koopmans, Samuelson, Solow, and Dorfman. Luce and Raiffa explained how the theory of games and the ideas of Bayesian decision theory apply to social scientific problems. Stan Reiter’s group first at Purdue, then later with Nancy Schwartz at Kellogg MEDS, formally brought operations researchers and economists into the same department to apply these new mathematics to economic problems.

The real breakthrough, however, was the arrival of Bayesian games and subgame perfection from Harsanyi (1967-68) and Selten (1965, 1975). These tools in combination allow us to study settings where players signal, make strategic moves, bluff, attempt to deter, and so on. From the perspective of an institutional designer, they allow us, alongside Myerson’s revelation principle, to follow Hayek’s ideas formally and investigate how we should organize an economic activity given the differing information and possible actions of each player. Indeed, the Wilson Doctrine argues that practical application of game theory requires attention to these informational features. There remains a more complete intellectual history to be written here, but Paul Milgrom and Al Roth’s mutual interview in the JEP provides a great sense of the intellectual milieu of the 1970s as they developed their ideas. Wilson, the Teacher, and Milgrom, the Popularizer, were at the heart of showing just how widely these new tools in game theory could be applied.

Let us begin with the Popularizer. Milgrom was born and raised in Michigan, taking part in anti-war and anti-poverty protests as a radical student in Ann Arbor in the late 1960s. The 1960s were a strange time, and so Milgrom went straight from the world of student activism to the equally radical world of…working as an actuary for an insurance company. After enrolling in the MBA program at Stanford in the mid-1970s, he was invited to pursue a PhD under his co-laureate Robert Wilson, who, as we shall see, was pursuing an incredibly lucrative combination of operations research and economics with his students. It is hard to overstate how broad Milgrom’s contributions have been, both theoretically and in practice. But we can get a good taste by looking at four: the multitasking problem and the no-trade theorem on the theoretical side, and medieval guilds and modern spectrum auctions on the applied side.

It is perhaps surprising that Milgrom’s most-cited paper was published in the JLEO, well into his career. But the famed multitasking paper is so incredibly informative. The idea is simple: you can motivate someone either with direct rewards or by changing their opportunity cost. For instance, if you want a policeman to walk the beat more often, then make their office particularly dull and full of paperwork. Workers generally have many tasks they can work on, however, which vary in their relative costs. For example, a cop can slack off, arrest people on nonsense charges, or solve murders. Making their office dull will cause them to sit at their office desk for fewer hours, but it likely won’t cause them to solve murders rather than arrest people on nonsense charges. Why not just pay for the solved murders directly? Often it is impossible to measure or observe everything you want done.

If you “reward A while hoping for B”, as Steven Kerr’s famous management paper puts it, you are likely to get a lot of A. If you pay rewards for total arrests, your cops will cite people for riding bikes with no lights. So what can be done? Milgrom and Holmstrom give a simple model where workers exert effort, do some things you can measure and some which you cannot, and you get a payoff depending on both. If a job has some things you care about which are hard to measure, you should use weaker incentives on the things you can measure: by paying cops for arrests, you raise the opportunity cost of solving murders for the cops who like doing so, because every murder they work means giving up the reward they would get from arresting the bicyclists! Further, you should give workers working on hard-to-measure tasks little job flexibility. The murder cop paid on salary should need to show her face in the office, while the meter maid getting paid based on how many tickets she gives already has a good reason not to shirk while on the clock. Once you start thinking about multitasking and the interaction of incentives with opportunity costs, you start seeing perverse incentives absolutely everywhere.
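Here is a deliberately stark toy version of that logic, with numbers and functional forms of my own choosing: the agent's two efforts compete for the same time (perfect substitutes in a convex cost), only the first task is paid a piece rate, and the agent values the second task intrinsically. Once the piece rate on the measurable task exceeds that intrinsic value, effort on the unmeasurable task collapses entirely.

```python
import numpy as np

def agent_choice(b, v, grid=np.linspace(0, 5, 201)):
    # e1: effort on the measured task (arrests), paid b per unit.
    # e2: effort on the unmeasured task (solving murders), valued at v per unit by the agent.
    # Total effort is costly and the two tasks draw on the same time budget.
    E1, E2 = np.meshgrid(grid, grid)
    U = b * E1 + v * E2 - 0.5 * (E1 + E2) ** 2
    i = np.unravel_index(U.argmax(), U.shape)
    return E1[i], E2[i]

for b in [0.5, 0.9, 1.1, 2.0]:
    e1, e2 = agent_choice(b, v=1.0)
    print(f"piece rate {b:.1f}: measured effort {e1:.2f}, unmeasured effort {e2:.2f}")
# Raising the piece rate past v = 1 does not just add measured effort; it raises the
# opportunity cost of the unmeasured task and crowds it out completely.
```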

Milgrom’s writings on incentives within organizations are without a doubt the literature I draw on most heavily when teaching strategic management. It is a shame that the textbook written alongside John Roberts never caught on. For a taste of their basic view of management, check out “The Firm as an Incentive System”, which lays out formal incentives, asset ownership, and task assignments as a system of complements which make organizations function well. The field now known as organizational economics has grown to incorporate ideas like information transmission (Garicano 2000 JPE) and the link between relational contracts and firm culture (e.g., Gibbons and Henderson 2011). Yet there remain many questions about why firms are organized the way they are which are open to an enterprising graduate student with a good theoretical background.

Multitasking has a similar feel to many of Milgrom’s great papers: they provide a framework improving our intuition about some effect in the world, rather than just showing a mathematical curiosity. The same is true of his most famous finance paper, the “no-trade theorem” developed with Nancy Stokey. The idea is ex-post obvious but ex-ante incredibly surprising. Imagine that in the market for corn, there is free exchange, and all trades anyone wants to make (to mitigate risk, for use, to try to trade on private information, etc.) have been made. A farmer one day notices a blight on his crop, and suspects this blight is widespread in the region. Therefore, the supply of corn will fall. Can he profit from this insight? Milgrom-Stokey’s answer is no!

How could this be? Even if everyone had identical prior beliefs about corn supply, conditional on getting this information, the farmer definitely has a higher posterior belief about the corn price come harvest season than everyone else. However, we assumed that before the farmer saw the blight, all gains from trade had been exhausted, and that it was common knowledge that this was so. The farmer offering to buy corn at a higher price is itself informative that the farmer has learned something. If the prevailing price was $5/bushel, and the farmer offers you $7, then you know that he has received private information that the corn will be worth even more than $7, hence you should not sell him any. Now, of course there is trade on information all the time; indeed, there are huge sums spent collecting information so that it can be traded on! However, Milgrom-Stokey makes clear just how careful we have to be about what causes the “common knowledge that all gains from trade were exhausted” assumption to fail. Models with “noise” traders, or models with heterogeneous prior beliefs (a very subtle philosophical issue), have built on Milgrom-Stokey to understand everything from asset bubbles to the collapse in trade in mortgage-backed securities in 2008.

When it comes to practical application, Milgrom’s work on auctions is well-known, and formed the basis of his Nobel citation. How did auctions become so “practical”? There is no question that the rise of applied auction theory, with the economist as designer, has its roots in the privatization wave of the 1990s that followed the end of the Cold War. Governments held valuable assets: water rights, resource tracts, spectrum that was proving important for new technologies like the cell phone. Who was to be given these assets, and at what price? Milgrom’s 1995 Churchill lectures formed the basis for a book, “Putting Auction Theory to Work”, which is now essential reading, alongside Klemperer’s “Auctions: Theory and Practice”, for theorists and practitioners alike. Where Milgrom’s book is unique is in its focus on the practical details of running auctions.

This focus is no surprise. Milgrom’s most famous theoretical work is his 1982 Econometrica with Robert Weber on optimal auctions which are partly common-value and partly private-value. That is, consider selling a house, where some of the value is your idiosyncratic taste, and some of the value is whether the house has mold. Milgrom and Weber show that a seller should reduce uncertainty as much as possible about the “common” part of the value. If the seller does not know this information or can’t credibly communicate it, then unlike in auctions without that common component, it matters a lot how you run the auction. For instance, with a first-price auction, you may bid low even though you like the house because you worry about winning when other bidders noticed the mold and you didn’t. In a second-price auction, the price you pay incorporates in part that information from others, and hence leads to more revenue for the homeseller.

In practical auctions more broadly, complements across multiple goods being sold separately, private information about common components, the potential to collude or form bidder rings, and the regularity with which auctions are held and hence the number of expected bidders are all incredibly important to auctioneer revenue and efficient allocation of the object being sold. I omit further details of precisely what Milgrom did in the many auctions he consulted on, as the popular press will cover this aspect of his work well, but it is not out of the question to say that the social value of better allocation of things like wireless spectrum is on the order of tens of billions of dollars.

One may wonder why we care about auctions at all. Why not just assign the item to whoever we wish, and then let the free market settle things such that the person with the highest willingness-to-pay winds up with the good? It seems natural to think that how the good is allocated matters for how much revenue the government earns – selling the object is better on this count than giving it away – but it turns out that the free market will not in general allocate goods efficiently when sellers and buyers are uncertain about who is willing to pay how much for a given object.

For instance, imagine you own a car, and you think I am willing to pay somewhere between $10,000 and $20,000 to buy it from you. I think you are willing to give up the car for somewhere between $5,000 and $15,000. I know my own valuation, so let’s consider the case where I am willing to pay exactly $10,000. If you are willing to sell for $8,000, it seems reasonable that we can strike a deal. Alas, it is not so simple: since all you know is that I am willing to pay somewhere between $10,000 and $20,000, you do know you can always lock in a $2,000 profit by selling at $10,000, but you also know it is incredibly unlikely that I will say no if you charge $10,001, or $11,000, or even more. You therefore will be hesitant to strike the deal to sell at $10,000 flat. This essential tension is the famed Myerson-Satterthwaite Theorem, and it occurs precisely because the buyer and seller do not know each other’s value for the object. A government auctioning off an object initially, however, can do so efficiently in a much wider set of contexts (see Maskin 2004 JEL for details). The details of auction design cannot be fixed merely by letting the market sort things out ex-post: the post-Cold War asset sales had issues not just of equity, but also efficiency. Since auctions today are used to allocate everything from the right to extract water to the carbon emissions permits at the heart of global climate change policy, ensuring we get their design right is not just a minor theoretical concern!
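A back-of-the-envelope version of the car example shows the tension. Suppose the seller simply posts a take-it-or-leave-it price given her beliefs; this is only a posted-price illustration of the information problem, not the full mechanism-design argument behind Myerson-Satterthwaite, but it makes the failure to trade concrete.

```python
import numpy as np

# Seller values the car at $8,000 and believes the buyer's value is uniform on
# [$10,000, $20,000]. What take-it-or-leave-it price maximizes her expected profit?
cost = 8_000
prices = np.arange(10_000, 20_001)
prob_accept = (20_000 - prices) / 10_000        # chance a uniform buyer accepts price p
expected_profit = (prices - cost) * prob_accept
best_price = prices[expected_profit.argmax()]
print("profit-maximizing posted price:", best_price)    # $14,000

# A buyer actually worth exactly $10,000 walks away at that price, even though any
# trade between $8,000 and $10,000 would have made both sides better off.
```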

The problem of auction design today is, partly because of Milgrom’s efforts, equally prominent in computer science. Many allocation problems are computational, with players being algorithms. This is true of electricity markets in practice, as well as the allocation of online advertisements, the design of blockchain-like mechanisms for decentralized exchange and record-keeping, and methods for preventing denial of service attacks while permitting legitimate access to internet-connected servers. Even when humans remain in the loop to some extent, we need to guarantee not just an efficient algorithm, but a practically-computable equilibrium. Leyton-Brown, Milgrom and Segal discuss this in the context of a recent spectrum auction. The problem of computability turns out to be an old one: Robert Wilson’s early work was on precisely the problem of computing equilibria. Nonetheless, given its importance in algorithmic implementation of mechanisms, it would be no surprise to see many important results in applied game theory come from computer scientists and not just economists and mathematicians in coming years. This pattern of techniques flowing from their originating field to the one where they have important new applications looks a lot like the trail of applied game theory arriving in economics by way of operations research, does it not?

That deep results in game theory can inform the real world goes beyond cases like auctions, where the economic setting is easy to understand. Consider the case of long-distance trade in the Middle Ages. The fundamental problem is that of the Yuan dynasty folk song: when “heaven is high and the emperor is far away”, what stops the distant city you arrive in from confiscatory taxation, or outright theft, of your goods? Perhaps the threat that you won’t return to trade? This is not enough – you may not return, but other traders will be told, “we had to take the goods from the last guy because he broke some rules, but of course we will treat you fairly!” It was quite common for confiscation to be targeted at only one group – the Genoese in Constantinople, the Jews in Sicily – with all other traders being treated fairly.

The theory of repeated games can help explain what to do. It is easiest to reach efficiency when you punish not only the cheaters, but also those who do not themselves punish cheaters. That is, the Genoese need to punish not just the Turks by withdrawing business, but also the Saracens who would try to pick up the trade after the Genoese pull out. The mechanism to do so is a merchant guild, a monopoly which can enforce boycotts in distant cities by taking away a given merchant’s rights in their own city. Greif, Milgrom and Weingast suggest that because merchant guilds allow cities to credibly commit to avoid confiscation, they benefit the cities themselves by increasing the amount of trade. This explains why cities encouraged the formation of guilds – a city does not normally encourage its sellers to form a monopsony!

Enough on the student – let us turn to Milgrom’s advisor, Robert Wilson. Wilson was born in the tiny hamlet of Geneva, Nebraska. As discussed above, his doctoral training at Harvard was from Howard Raiffa and the decision theorists, after which he was hired at Stanford, where he has spent his career. As Milgrom is now also back at Stanford, their paths are so intertwined that the two men now quite literally live on the same street.

Wilson is most famous for his early work applying the advances of game theory in the 1970s to questions in auction design and reputation. His three-page paper, written in 1966 and published in Management Science in 1969, gives an early application of Harsanyi’s theory of Bayesian games to the “winner’s curse”. The winner’s curse means that the winner of an auction for a good with a “common value” – for instance, a tract of land that either has oil or does not – should optimally bid less in a first-price auction than their unconditional estimate of the good’s worth, or else lose money on average.

One benefit of being an operations researcher is that there is a tight link in that field between academia and industry. Wilson consulted with the Department of the Interior on oil licenses, and with private oil companies on how they bid in these auctions. What he noticed was that managers often shaded down their engineers’ best estimates of the value of an oil tract. The reason why is, as the paper shows, very straightforward. Assume we both get a signal uniformly distributed on [x-1,x+1] about the value of the tract, where x is the true value. Unconditionally, my best estimate of the value of the plot is exactly my signal. However, conditional on winning the auction, my signal was higher than my rival’s. Therefore, if I knew my rival’s signal, my best estimate of the value would be exactly halfway between the two. Of course, I don’t know her signal. But since my payoff is 0 if I don’t win, and my payoff is x minus my bid if I win, there is a straightforward formula, which depends on the distribution of the signals, for how much I should shade my bid. Many teachers have drawn on Bob’s famous example of the winner’s curse by auctioning off a jar of coins in class, the winner inevitably being the poor student who doesn’t realize they should have shaded their bid!
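The uniform-signal example is easy to check by simulation. In the toy Monte Carlo below, each of four bidders observes the true value plus uniform noise on [-1, 1]; a bidder who naively bids her signal systematically overpays when she wins, which is exactly the shading Wilson saw the oil managers doing.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bidders, n_auctions = 4, 200_000

x = rng.uniform(10, 20, n_auctions)                       # true common value of each tract
signals = x[:, None] + rng.uniform(-1, 1, (n_auctions, n_bidders))

winning_signal = signals.max(axis=1)                      # naive bidders bid their signals
print("average winner's signal minus true value:", round((winning_signal - x).mean(), 3))
print("average profit from naively bidding your signal:", round((x - winning_signal).mean(), 3))
# The winner's signal overshoots the truth by about 0.6 on average (the expected
# maximum of four uniform errors), so the naive winner loses money; shading the bid
# by at least that conditional overestimate is what avoids the curse.
```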

Wilson not only applied these new game theoretic tools, but also developed many of them. This is particularly true in 1982, when he published all three of his most cited papers: a resolution of the “Chain store paradox”, the idea of sequential equilibria, and the “Gang of Four” reputation paper with Kreps, Roberts, and Milgrom. To understand these, we need to understand the problem of non-credible threats.

The chain store paradox goes like this. Consider Walmart facing a sequence of potential competitors. If they stay out, Walmart earns monopoly profits in the town. If they enter, Walmart can either fight (in which case both make losses) or accept the entry (in which case they both earn duopoly profits, lower than what Walmart made as a monopolist). It seems intuitive that Walmart should fight a few early potential competitors to develop a reputation for toughness. Once they’ve done it, no one will enter. But if you think through the subgame perfect equilibrium here, the last firm who could enter knows that after it enters, Walmart is better off accepting the entry. The second-to-last firm therefore reasons that Walmart has nothing to gain from fighting it to establish a reputation for deterrence, and so will not fight. And likewise for the third-to-last entrant and so on up the line: Walmart never fights because it can’t “credibly” threaten to fight future entrants regardless of what it did in the past.
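The backward-induction logic is mechanical enough to put in a few lines of code. This is a stripped-down version with toy per-period payoffs, and it lets continuation play depend only on the number of entrants remaining, which is all the unique subgame perfect equilibrium needs here: fighting never changes future play, so it is never worth the cost, and every entrant enters.

```python
# Per-period payoffs (toy numbers): entrant stays out -> incumbent 10, entrant 0;
# enter and accommodate -> 5 each; enter and fight -> -1 each.
def chain_store(periods):
    incumbent_value = 0.0                         # continuation payoff with no entrants left
    for _ in range(periods):                      # fold back from the last entrant to the first
        fight = -1 + incumbent_value              # future play depends only on entrants remaining,
        accommodate = 5 + incumbent_value         # so fighting today buys no deterrence at all
        will_accommodate = accommodate >= fight   # always true here
        entrant_enters = will_accommodate         # entrant gets 5 if accommodated, -1 if fought, 0 if out
        if entrant_enters:
            incumbent_value = accommodate if will_accommodate else fight
        else:
            incumbent_value = 10 + incumbent_value
    return incumbent_value

print(chain_store(10))   # 50.0: entry and accommodation in every period, never a fight
```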

This seems odd. Kreps and Wilson (JET 1982) make an important contribution to reputation building by assuming there are two types of Walmart CEOs: a tough one who enjoys fighting, and a weak one with the normal payoffs above. Competitors don’t know which Walmart they are facing. If there is even a small chance the rivals think Walmart is tough, then even the weak Walmart may want to fight early rivals by “pretending” to be tougher than they are. Can this work as an equilibrium? We really need a new concept, because we want the equilibrium to be both perfect, meaning that at any point players play Nash equilibria from that point forward, and Bayesian, meaning that players have beliefs about the others’ types and update those beliefs according to the hypothesized equilibrium play. Kreps and Wilson show how to do this in their Econometrica introducing sequential equilibria. The idea here is that equilibria involve strategies and beliefs at every node of the game tree, with both being consistent along the equilibrium path. Beyond having the nice property of allowing us to specifically examine the beliefs at any node, even off the equilibrium path, sequential equilibria are much simpler to compute than similar ideas like trembling hand perfection. Looking back both to Wilson’s early work on how to compute Nash equilibria and forward to Milgrom’s work on practical mechanism design, it is not surprising to see the idea of practical tractability appear even back in 1982.

This type of reputation-building applies even to cooperation – or collusion, as cooperating when it is in your interest to cheat and colluding when it is in your interest to undercut are the same mathematical problem. The Gang of Four paper by Kreps, Wilson, Milgrom, and Roberts shows that in finitely repeated prisoner’s dilemmas, you can get quite a bit of cooperation just with a small probability that your rival is an irrational type who always cooperates as long as you do so. Indeed, the Gang of Four show precisely how close to the end of the game players will cooperate for a given small belief that a rival is the naturally-cooperative type. Now, one may worry that allowing types in this way gives too much leeway for the modeler to justify any behavior, and indeed this is so. Nonetheless, the 1982 papers kicked off an incredibly fruitful search for justifications for reputation building – and given the role of reputation in everything from antitrust to optimal policy from central banks, a rigorous justification is incredibly important to understanding many features of the economic world.

I introduced Robert Wilson as The Teacher. This is not meant to devalue his pure research contributions, but rather to emphasize just how critical he was in developing students at the absolute forefront of applied games. Bengt Holmstrom did his PhD under Wilson in 1978, went to Kellogg MEDS after a short detour in Finland, then moved to Yale and MIT before winning the Nobel Prize. Al Roth studied with Wilson in 1974, was hired at the business school at Illinois, then Pittsburgh, then Harvard and Stanford before winning a Nobel Prize. Paul Milgrom was a 1979 student of Wilson’s, beginning also at MEDS before moving to Yale and Stanford, and winning his own Nobel Prize. This is to say nothing of the students Wilson developed later, including the great organizational theorist Bob Gibbons, or his earliest students like Armando Ortega Reichert, whose unpublished 1969 dissertation contains important early results in auction theory and was an important influence on the limit-pricing-under-incomplete-information model of Milgrom and Roberts (1982). It is one thing to write papers of Nobel quality. It is something else altogether to produce (at least!) three students who have done the same. And as any teacher is proud of their successful students, surely little is better than winning a Nobel alongside one of them!

Alberto Alesina and Oliver Williamson: Taking Political and Economic Frictions Seriously

Very sad news this week for the economics community: both Oliver Williamson and Alberto Alesina have passed away. Williamson has been in poor health for some time, but Alesina’s death is a greater shock: he apparently had a heart attack while on a hike with his wife, at the young age of 63. While one is most famous for the microeconomics of the firm, and the other for political economy, there is in fact a tight link between their research agendas. They have attempted to open “black boxes” in economic modeling – about why firms organize the way they do, and the nature of political constraints on economic activity – to clarify otherwise strange differences in how firms and governments behave.

First, let us discuss Oliver Williamson, the 2009 Nobel winner (alongside Elinor Ostrom), and student of Ken Arrow and later the Carnegie School. He grew up in Superior, Wisconsin, next to Duluth at the frigid tip of Lake Superior, as the son of two schoolteachers. Trained as an engineer before returning to graduate school, he had a strong technical background. However, he also possessed, in the words of Arrow, the more important trait of “asking good questions”.

Industrial organization in the 1960s was a field that needed a skeptical mind. To a first approximation, any activity that was unusual was presumed to be anti-competitive. Vertical integration and vertical restraints were high on this list. While Williamson was first thinking about the behavior of firms, the famous case of U.S. vs. Arnold, Schwinn reached the Supreme Court. Schwinn, the bicycle company, owned neither distributors nor retailers. However, it did contractually bar its distributors from selling bikes to retailers that were not themselves partnered with Schwinn. In 1967, the Supreme Court ruled these contracts an antitrust violation.

Williamson was interested in why a firm might limit these distributors. Let’s start with the ideas of Mr. Coase. Coase argued that transactions in a market are not free: we need to find suppliers, evaluate quality, and so on. The organization of economic activity therefore attempts to economize on these “transaction costs”. In the Coasean world, transaction costs were nebulous, and attracted a great deal of critique. As Williamson, among many others, points out, both buying from a supplier and vertical integration require transaction costs: I need to haggle over the price of the component or else the price of the whole company! Therefore, in an unchanging world, it is not clear that integration does anything to reduce the transaction costs of evaluating what my partner – in procurement or in merger – is capable of. In the case of Schwinn, the transaction costs must be incurred whether we are debating how to split profits with a particular retailer for the upcoming year, or the price of a pallet of bicycles sold to that retailer.

Williamson’s model is richer. He takes change in the relationship as first order: the famous “unprogrammed adaptations”. The relationship between Schwinn and its retailers requires actions by both over time. Because we are not omniscient, no contract will cover every eventuality. When something unexpected happens, and we both want to renegotiate our contract, we are said to be facing an unprogrammed adaptation. For instance, if advertising is useful, and e-scooters unexpectedly become popular after Schwinn and their retailer sign their initial contract, then we will need to renegotiate who pays for those ads. Of course, we will only bother to negotiate at all if Schwinn and the retailer jointly profit from their relationship compared to their next best options, generating so-called “appropriable quasi-rents”.

We now have an explanation for Schwinn’s behavior. They expect frequent haggling with their retailer about which bicycles to advertise, service standards for repairs, employee training, and so on. If these negotiations fail, the next best option is pretty bad – many small towns might only have one full-service bicycle shop, the Schwinn bikes are more popular than alternatives, and Schwinn itself has neither the resources nor the knowledge to run its own full-service chain of retailers efficiently. Schwinn therefore uses exclusive retail contracts to limit the number of retailers it must negotiate with over service standards, advertising, and the like.

While we have focused on the application of transaction costs to antitrust, Williamson’s basic framework extends much further. He saw the problem as one of “choice” versus “contract”. The canonical topic of study in economics is choice: “Economics is the science which studies human behavior as a relationship between ends and scarce means which have alternative uses,” as Lionel Robbins famously put it. However, constraints also matter. Agents can act only within the bounds of the law, as a function of what other firms are capable of, and so on. Some of these constraints are public – e.g., what tariff rate do we face, are we allowed to put a price on kidneys for exchange, and so on. Williamson focused our attention on private constraints: the contracts, governance structures, and tools to align incentives which help us reach efficiency when information is asymmetric and contracts are incomplete. The timing was perfect: both Williamson and his professor Ken Arrow, along with Alchian, Demsetz, Klein and others, saw how important this “private ordering” was in their work in the 1960s, but that work was largely qualitative. The formal advances in game theory in the 1970s gave us the tools for rigorous analyses of contracting, and let us transform these ideas into a modern field of industrial organization.

Williamson was in no way an ideologue who ignored the possibility of anticompetitive behavior. Indeed, many canonical anticompetitive strategies, such as “raising rivals’ costs”, whereby a firm encourages legal restrictions which raise its own costs but raise rivals’ costs to an even greater degree, originate with Williamson. I also particularly like that Williamson not only wrote serious economics, but also frequently translated those results for law journals in order to reach a wider audience. Erik Hovenkamp and I tried to follow this legacy recently in our work on the antitrust of startup acquisitions, where we wrote both a theoretical version and a law review article on the implications of this theory for existing legal practice.

Transaction cost economics is now huge, and both the benefits and the critiques of this approach are serious (for more, see my course notes on the theory of the firm). Every economist, when looking at “unusual” contracts or mergers, now follows Williamson in simultaneously looking for the strategic anticompetitive explanation and the cost-saving explanation. The name of this balance? Literally, the Williamson tradeoff!

—————–

If Williamson was interested in “private ordering”, Alesina was focused on the public constraints on behavior. He was, without question, first in line for a Nobel in political economy. Economists, by and large, are technocrats. We have models of growth, of R&D, of fiscal policy, of interstate coordination, and so on. These models imply useful policies. The “public choice” critique – that the politicians and bureaucrats implementing these policies may muck things up – is well known. The “political business cycle” approach of Nordhaus has politicians taking advantage of myopic voters by, for instance, running expansionary, inflation-inducing policy right before an election, generating lower unemployment today but higher inflation tomorrow.

Alesina’s research goes further than either of these approaches. Entering the field after the rational expectations revolution arrived, Alesina saw how skeptical economists were of the idea that politicians could, each election cycle, take advantage of voters in the same way. I like to explain rational expectations to students as the Bob Marley rule: “You can fool some people sometimes, but you can’t fool all the people all the time.” Rather than myopic voters, we have voters who do not perfectly observe the government’s actions or information. Politicians wish to push their preferences (“ideology”) and also to get re-elected (“career concerns”). Voters have differing preferences. We then want to ask: to what extent can politicians use their private information to push preferences that “society” does not necessarily want, and how does that affect the feasibility of political unions, monetary policy, fiscal policy, and so on?

One important source of uncertainty is who will win the next election. Consider a government which can spend on the military or on education (“guns” or “butter”), and can finance this through debt if it likes. A benevolent social planner uses debt so that the tax burden is smoothed over time. In a political macro model, however, Alesina and Tabellini (RESTUD 1990) show that there will be too much debt, especially when elections are close. If I favor military spending more than education, I can jack up the debt while I am in power by spending on the military. This not only gets me more military today, but also constrains the other party from spending as much on education tomorrow, since society’s debt load will be too high. In equilibrium, both parties try to constrain their rival’s future actions by issuing debt today. The model makes clear predictions about how debt relates to fundamentals of society – political polarization, how contested elections are, and so on – without requiring irrationality on the part of any actor, whether voter or politician.
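
A stylized two-period sketch, with my own illustrative functional forms rather than the full Alesina–Tabellini model, captures the logic. Suppose the incumbent values only its preferred good, is re-elected with probability pi, collects tax revenue tau each period, and chooses debt b to maximize log(tau + b) + pi*log(tau − b); the first-order condition gives b = tau*(1 − pi)/(1 + pi), so debt is zero when re-election is certain and rises as power becomes more contested:

```python
# Two-period "strategic debt" sketch in the spirit of Alesina-Tabellini (1990);
# functional forms and numbers are my own illustrative choices. The incumbent
# values only its preferred public good, is re-elected with probability pi, and
# chooses debt b to maximize log(tau + b) + pi * log(tau - b).
# First-order condition: b = tau * (1 - pi) / (1 + pi).

tau = 1.0  # per-period tax revenue (normalized)

def optimal_debt(pi):
    """Debt chosen by an incumbent who keeps power with probability pi."""
    return tau * (1.0 - pi) / (1.0 + pi)

for pi in (1.0, 0.75, 0.5, 0.25):
    print(f"re-election probability {pi:.2f} -> debt {optimal_debt(pi):.2f}")
# A party certain of staying in power (pi = 1) smooths taxes and issues no debt;
# the more contested the election, the more debt it issues to tie its rival's hands.
```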

It is not hard to see how the interests of economists are so heavily linked to their country of origin. Many of our best macroeconomists come from Argentina, home of a great deal of macroeconomic instability. Americans are overrepresented in applied micro, no surprise given the salience of health, education, and labor issues in U.S. political debates. The French, with their high level of technical training in schools and universities, have many great theorists. And no surprise, the Italians are often interested in how political incentives affect and limit economic behavior. Once you start applying Alesina’s ideas, the behavior of politicians and the implications for society become clear. Why do politicians delegate some tasks to bureaucrats and not others? The hard ones the politicians might be blamed for if they fail get delegated, and the ones that allow control of distribution do not (Alesina and Tabellini 2007 AER). Why doesn’t the US have a strong welfare state compared to Europe? The distortions from taxation, relative income mobility, and the political power of the poor matter much less than racial fractionalization, which also explains changes in European preferences over time (Alesina, Glaeser and Sacerdote, Brookings 2001 and Alesina, Miano and Stantcheva 2018).

Perhaps the most salient of Alesina’s questions is one of his oldest (Alesina and Spolaore, QJE 1997): why are there so many countries? Are there “too many”, and what could that mean? In a crisis like Covid, would we be better off with a European fiscal union rather than a bunch of independent countries? Big countries can raise funds with less distortion, public goods often have economies of scale, and transfers within countries can handle idiosyncratic regional shocks – these are both assumptions and empirical facts. On the other hand, the bigger the country, the less agreement on how to value public goods. Consider a region on the outskirts of an existing country – say, Sudtirol in Italy. If it secedes, its residents pay higher taxes for their public goods, but the public goods provided are much closer to their preferences. In a democratic secession, these Sudtirol voters do not account for how their exit causes the cost of government in the remaining rump of Italy to rise. Hence they are too likely to secede, relative to what a social planner prefers.
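
A minimal numerical sketch of this externality, using the standard linear-country setup with purely illustrative parameters (not Alesina and Spolaore’s own calibration): citizens live on the segment [0,1], each country pays a fixed government cost shared equally by its citizens, the government sits at the country’s midpoint, and a citizen’s loss is her distance from that government plus her tax share. The outer region gains from seceding while total costs rise, because it ignores the higher per-capita cost it imposes on the rump country:

```python
# Alesina-Spolaore style secession sketch: citizens on [0,1], a fixed government
# cost k per country split equally among its citizens, the government at the
# country's midpoint, and a citizen's loss = distance to government + tax share.
# The parameter values below are purely illustrative.

k = 0.11   # fixed cost of running a government
a = 1.0    # loss per unit of distance from the government
s = 0.30   # size of the outer region [1 - s, 1] considering secession

def total_cost_union():
    # One country: average distance to the midpoint is 1/4, one government to fund.
    return a * 1.0 / 4.0 + k

def total_cost_secession(s):
    # Two countries of sizes (1 - s) and s, each with its own government.
    return a * ((1 - s) ** 2 + s ** 2) / 4.0 + 2 * k

def region_cost_in_union(s):
    # Residents of [1 - s, 1] face an average distance of (1 - s)/2 to a government
    # at 1/2, and pay a per-capita tax of k shared with the whole unit-mass country.
    return a * s * (1 - s) / 2.0 + k * s

def region_cost_if_secede(s):
    # Own government at the region's midpoint: average distance s/4, but the fixed
    # cost k is now spread over a population of only s.
    return a * s * s / 4.0 + k

print(f"region's cost:  union {region_cost_in_union(s):.3f}  vs  secession {region_cost_if_secede(s):.3f}")
print(f"society's cost: union {total_cost_union():.3f}  vs  secession {total_cost_secession(s):.3f}")
# With these numbers the region is better off seceding even though total costs rise:
# it ignores the higher per-capita government cost it leaves behind in the rump country.
```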

We can see this effect in the EU right now. An EU fiscal union would reduce the cost of providing some public goods, insurance against shocks among them. However, the Germans and Dutch have very different public goods preferences from the Italians and Greeks. A planner would balance the marginal cost of lower alignment for the average EU citizen against the marginal benefit of lower public goods costs. A German elected leader will instead weigh the marginal cost of lower alignment for the average German citizen (worse than that of the EU median citizen!) against the marginal benefit of lower public goods costs (understated, because it doesn’t account for cheaper public goods for Greeks and Italians when Germany joins them to borrow funds jointly). We therefore get too little coordinated fiscal action. This lack of action on public goods makes some Europeans skeptical of other aspects of the EU project: one of Alesina’s final op-eds was on the disastrously nationalistic EU response to Covid. Luis Garicano, the well-known Spanish economist and current MEP, has a very interesting discussion with Luigi Zingales on precisely this point.

It is impressive enough that Alesina’s work was well-respected in political science and not just in economics. What I especially like about Alesina, though, is how ideologically confusing his policy advice is, especially for an American. He simultaneously supported a lower tax rate for women on the basis of intrafamily dynamics, and was the leading proponent of expansionary austerity, or spending cuts during recessions! The tax rate idea is based on the greater elasticity of labor supply of women, hence is a direct application of the Ramsey rule. Expansionary austerity is based on a serious review of austerity policies over many decades. He pushed these ideas and many others in at least 10 books and dozens of op-eds (including more than 30 for VoxEU). Agree with these ideas or not – and I object to both! – Alesina nonetheless argued for these positions from a base of serious theory and empirics, rather than from ideology. What worthier legacy could there be for an academic?

Covid-19 Innovation – Are We on the Right Track?

I never discuss my own research on this website – it’s more fun (for me at the very least!) to dive into the great results the rest of the economics community produces. So I hope you’ll forgive me for breaking this rule today, as I want to show a few interesting, very time-sensitive results Jorge Lemus, Guillermo Marshall and I have developed about Covid-19 innovation.

Many of us in the innovation economics world have been asked by governments how they should handle R&D right now. The basic problem is clear. There is a pandemic. Stopping this has enormous economic benefits – a vaccine that arrived tomorrow would literally be among the most economically valuable inventions ever made. Treatments which allow normal economic activity are incredibly valuable as well. As always, governments have limited knowledge about who is able to invent what. There is tremendous uncertainty about how various R&D projects will pan out. Should government be running huge prizes for specific inventions? General subsidies for medical R&D? Precommitments to buy certain remedies? Perhaps laissez faire alone is enough to induce this invention? Should patents be stronger, to increase the returns to R&D, or should they be weaker, to encourage the WHO’s global access to remedies?

These are very challenging questions. Let’s instead narrow down to a simpler one: given existing policy, is the rate and direction of Covid-19 R&D worrying in any way? Our basic finding is that the rate of Covid innovation is incredibly rapid, but competitive forces are pushing that research in a very short-term direction. The policy implications are subtle – many ideas that we normally think of as useful for R&D, especially on global health issues, may actually be counterproductive.

Here we compare the number of therapies somewhere in the development pipeline, and the number of academic publications, for Covid-19 against other epidemics like Zika, Ebola, and H1N1, and against breast cancer, the most heavily-funded long-run disease. The pipeline data is from BioMedTracker, a standardized commercial research database that independently validates reports of new projects on a given indication. Note two things. First, the rate of Covid research far exceeds the long-term average for breast cancer or the post-epidemic rate of research on Ebola, H1N1, or Zika. Second, this gap grows even larger after the globalization of the pandemic in early March 2020, as indicated by the vertical line in the figures. Covid therapies are entering the pipeline at a rate 15 to 80 times faster than in any previous epidemic, with over 4 new therapies entering the commercial pipeline every single day. The number of these therapies in clinical trials within four months of the early December beginning of the epidemic exceeds the entire first-year number of trials for H1N1, Zika, and Ebola combined. This figure tracks through April 22, but the rate of new drugs entering the pipeline since that date has continued at nearly the same pace. The expected return on Covid innovation is high enough to induce an incredible amount of entry.

Alas, there is a downside. Let’s split the pipeline into vaccines vs. other drugs, and repurposed drugs vs. novel compounds. The relative share of “short term” solutions – non-vaccines and repurposed drugs – is unusually high. Only 23 percent of Covid therapies are vaccines, versus at least half for the three recent, less severe epidemics. Over 60 percent of Covid therapies are repurposed drugs, versus no more than a quarter of those for Ebola, Zika, or H1N1. The tilt toward short-term projects is particularly strong after Covid explodes globally in early March – the rate at which new vaccines enter the pipeline is essentially the same in February and April! Broadly, as the epidemic gets worse, a greater share of R&D goes to projects that can be developed quickly.

How should we interpret this? We need a model to understand what the direction of invention “should be”. Theoretically, let there be firms of different sizes considering paying a fixed cost to enter the market for drug therapies on a particular disease. After entering, these firms choose whether to work on short-term therapies, which can be developed quickly but are not as valuable, or long-term ones, which take time to invent but are quite valuable. Once some firm invents a remedy, the marginal value of other remedies changes: for instance, a vaccine is more valuable than a treatment, but the marginal value of the vaccine if a reasonable treatment exists is lower than it would have been otherwise.

Let’s model an intensifying epidemic as a multiplicative increase in the payoff to all inventions related to that disease. This increased payoff for successful invention, holding the number of firms constant, increases each firm’s expected payoff from R&D. The higher expected payoff induces more firms to enter, particularly firms with limited specialized research capacity who otherwise wouldn’t bother with a disease outside their wheelhouse.

Increased entry means a more fractured market for R&D, with many small firms doing research instead of just a few big ones. Assume it would be efficient for most firms to try to invent a vaccine. A small firm – say, one which represents 1% of total research capacity in the industry – will reason as follows. “I can try to invent the vaccine, but the Sanofis and the Modernas of the world are likely to get their vaccine way before I do. However, since their projects will take many months to develop and validate, I can instead try to quickly develop a marginally useful treatment. The existence of that treatment, once invented, lowers the marginal payoff to working on vaccines, but who cares – I am very unlikely to invent a vaccine in any case.” The increased entry driven by huge payoffs to any successful Covid therapies causes entrants to inefficiently race toward lower-value therapies. If enough small firms begin to race in this way, even the large firms that otherwise would have worked on vaccines will give up. And note that this pattern appears empirically: a more severe epidemic leads to more entry by small firms, more work on short-term projects, and a decreasing share of long-term projects even from large firms as that competitive racing gets worse.
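
A back-of-the-envelope sketch of that reasoning, with entirely made-up payoffs and capacities rather than anything from our paper: suppose the chance of being first within a race is proportional to research capacity, a repurposed therapy arrives before any vaccine if someone works on one, and its existence shaves value off the eventual vaccine.

```python
# Back-of-the-envelope sketch of the racing logic; all payoffs and capacities are
# made-up illustrative numbers, not the calibration in our paper. Win probabilities
# within a race are taken to be proportional to research capacity, and a short-term
# (repurposed) therapy is assumed to arrive before any vaccine if someone works on it.

V_NO_TREATMENT = 100.0   # payoff to the first vaccine if no treatment exists yet
V_WITH_TREATMENT = 60.0  # payoff to the first vaccine once a decent treatment exists
T = 20.0                 # payoff to the first short-term (repurposed) therapy

big_caps = [10.0, 10.0]  # two large firms committed to the vaccine race
small_cap = 1.0          # the marginal small entrant deciding what to do

# The small firm's comparison:
p_win_vaccine = small_cap / (small_cap + sum(big_caps))
payoff_long = p_win_vaccine * V_NO_TREATMENT   # "the Sanofis will get there way before I do"
payoff_short = T                               # quick, and first by assumption
print(f"small firm: expected payoff long-term {payoff_long:.1f} vs short-term {payoff_short:.1f}")

# The externality it ignores: once a treatment exists, each large firm's expected
# payoff from the vaccine race falls.
for prize in (V_NO_TREATMENT, V_WITH_TREATMENT):
    p_big = big_caps[0] / sum(big_caps)
    print(f"large firm's expected vaccine payoff with prize {prize:.0f}: {p_big * prize:.1f}")
# With enough such entrants racing to quick, low-value therapies, even the large
# firms' case for the long-term project erodes.
```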

The policy implication is somewhat worrying. Normally, we worry in global public health that there isn’t enough incentive to work on a disease class at all, since firms worry that poor countries would either be unable to afford, or would expropriate ex-post, any useful invention. Things like the advance market commitments supported by last year’s Nobel winner Michael Kremer have richer governments precommit to buying successful therapies for diseases like malaria. In the case of Covid, however, the main problem isn’t the rate of innovation, but the direction – the expected financial return on innovation may be too high for short-run, partial solutions, yet too low for vaccines or novel therapies.

So how can society both encourage a ton of Covid-19 R&D so we can escape this pandemic, while also discouraging racing behavior toward minor solutions? We propose three solutions. First, limited antitrust enforcement on research joint ventures can help by causing firms to partially internalize the racing externality they would otherwise have imposed on their joint venture partner. Second, targeted subsidies specifically for vaccines or promising novel therapies shift the balance back toward the social optimum. It’s amazing how little of this is being done: a recent Guardian roundup of leading vaccine candidates proudly notes that 40 million pounds of support has been given to the Imperial College London team. 40 million in support for a vaccine that, if successful, probably has a net present value of over a trillion pounds! Third, we suggest AMCs can be useful as long as the value of the AMC is based on the ex-ante value of the therapy being developed, rather than the ex-post value – doing so allows firms with the ability to work on long-term, high-value solutions to ignore the racing being done by their smaller rivals.

April 2020 Draft: “Innovation During a Crisis: Evidence from Covid-19”. We worked very hard – literally many all nighters – to get an analysis done quickly which nonetheless uses the best available Covid R&D data, and provides policy advice based on rigorous innovation theory. Since we finished the draft, in my work with an entrepreneurship program here at the University of Toronto, where we consult with very experienced biotech VCs, I’ve actually noticed our proposed mechanism in action – everyone is aware of the competitive nature of Covid research, and the first-order questions are always whether a firm has the capacity to work at high speed, and whether its therapy could be modified into something less significant but quicker to develop. I truly do not believe the externality we suggest is merely a theoretical curiosity.

The Simple Economics of Social Distancing and the Coronavirus

“Social distancing” – reducing the number of daily close contacts individuals have – is being encouraged by policymakers and epidemiologists. Why it works, and why now rather than for other diseases, is often left unstated. Economists have two important contributions here. First, game theoretic models of behavior are great for thinking through where government mandates are needed and where they aren’t. Second, economists are used to thinking through tradeoffs, such as the relative cost and benefit of shutting down schools versus the economic consequences of doing so. The most straightforward epidemiological model of infection – the SIR model dating back to the 1920s – is actually quite commonly used in economic models of innovation or information diffusion, so it is one we are often quite familiar with. Let’s walk through the simple economics of epidemic policy.

We’ll start with three assumptions. First, an infected person will infect B other people before recovering if we make no social changes and no one is immune. Second, people who have recovered from coronavirus do not get sick again (which appears to be roughly true). Third, coronavirus patients tend to be infectious before they show up in a hospital. We will relax these assumptions shortly. Finally, we will let d represent the amount of social distancing. If d=1, we are all just living our normal lives. If d=0, we are completely isolated in bubbles and no infections transmit. The cost of distancing level d is c(d), where distancing grows ever more costly the more of it you do – for the mathematically inclined, c(1)=0, c(d)>0 for d<1, and the cost rises at an increasing rate as d falls toward 0.

In the classic SIR model, people are either susceptible, infected, or recovered (or “removed”, depending on the author). Let S, I, and R be the fraction of the population in each group at any given time t. Only group S can be infected by someone new. When an infected person encounters someone, they pass the disease along with probability dB, where d is the distancing level and B is the infection rate. If an infected person interacts with another infected or recovered person, they do not get sick, or at least not more sick.

Define one unit of time as the period someone is infectious. We can then define how the proportion of people in each group changes over time as

dS/dt=-dBSI
dI/dt=dBSI-I
dR/dt=I

The second equation, for instance, says the change in the proportion of the population that is infected equals the infection rate given distancing, dB, times the number of possible interactions between infected and susceptible people, SI, minus the number of people who recover in the current period, I (remember, we define one unit of time as the period in which you are sick after being infected, so today’s sick are tomorrow’s recovered). If dI/dt>0, then the number of infected people is growing. When the infection is very young, almost everyone is in group S (hence S is close to 1) and no distancing is happening (so d=1), so the epidemic spreads if (dBS-1)I>0, and therefore when B>1. Intuitively, if 1 sick person infects on average more than 1 person, the epidemic grows. To stop the epidemic, we need to slow that transmission so that dI/dt<0. The B in this model is the "R0" you may see in the press, incidentally. With coronavirus, B is something like 2.
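
For readers who like to see the dynamics, here is a minimal discrete-time simulation of exactly these three equations; B=2 matches the text, while the step size, initial infection level, and the particular values of d shown are illustrative choices of mine.

```python
# Minimal discrete-time simulation of the SIR equations in the text:
#   dS/dt = -d*B*S*I,  dI/dt = d*B*S*I - I,  dR/dt = I
# where one unit of time is the infectious period. B = 2 follows the text;
# the initial infection level, step size, and values of d are illustrative.

def simulate_sir(B=2.0, d=1.0, I0=0.001, steps=2000, dt=0.1):
    S, I, R = 1.0 - I0, I0, 0.0
    peak_I = I
    for _ in range(steps):
        new_infections = d * B * S * I * dt
        recoveries = I * dt
        S -= new_infections
        I += new_infections - recoveries
        R += recoveries
        peak_I = max(peak_I, I)
    return peak_I, R

for d in (1.0, 0.6, 0.45):
    peak, total = simulate_sir(d=d)
    print(f"d = {d:.2f}: peak share infected {peak:.1%}, share ever infected {total:.1%}")
# With d = 1 the epidemic grows until S falls below 1/(dB) = 1/2 (herd immunity);
# with d < 1/2 the infection count shrinks from the very start, since dBS - 1 < 0.
```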

How can we end the epidemic, then? Two ways. First, the epidemic dies out because of “herd immunity”. This means that the number of people in bin R, those already infected and recovered, gets high enough that infected people interact with too few susceptible people to keep the disease growing. With B=2 and no distancing (d=1), we would need dI/dt=(dBS-1)I=(2S-1)I<0, or S<0.5. In that case, half of society has been infected by the time the epidemic stops growing, and smaller numbers continue to get infected until the disease peters out. Claims that coronavirus will infect “70%” of society are based on this math – it will not happen, because people would pursue serious distancing policies well before we reached anywhere near this point.

The alternative is distancing. I use distancing to mean any policy that reduces infectiousness – quarantine, avoiding large groups, washing your hands, etc. The math is simple. Again, let B=2. To stop coronavirus before large numbers (say, more than 10% of society) are infected requires (dBS-1)I=(2dS-1)I<0. For S roughly equal to 1, we therefore need d<1/2. That is, we need to cut the average number of infections a sick person passes on at least in half. Frequent handwashing might reduce infection by 20% or so, though with huge error bars – not nearly enough on its own. To stop the epidemic, then, we need fairly costly interventions like cancelling large events and work-from-home policies. Note that traditional influenza has a B of maybe 1.2, so small behavior changes like staying home when sick and less indoor interaction in the summer may be enough to stop epidemic spread. This is not true of coronavirus, as far as we know.
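
The same condition, dBS-1<0 with S close to 1, pins down how much behavior has to change for different diseases. A two-line check, using the rough B values from the text:

```python
# Required reduction in contacts for the epidemic to shrink at the outset (S ~ 1):
# we need d*B - 1 < 0, i.e. d < 1/B.
for disease, B in (("coronavirus (B ~ 2)", 2.0), ("seasonal influenza (B ~ 1.2)", 1.2)):
    print(f"{disease}: need d < {1/B:.2f}, i.e. cut infectious contacts by more than {1 - 1/B:.0%}")
```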

Ok, let’s turn to the economics. We have two questions here: what will individuals do in an epidemic, and what should society compel them to do? The first question involves looking for the d(s,t,z) chosen in a symmetric subgame perfect equilibrium, where s is your state (sick or not), z is society’s state (what fraction of people are sick), and t is time. The externality here is clear: Individuals care about not being sick themselves, but less about how their behavior affects the spread of disease to others. That is, epidemic prevention is a classic negative externality problem! There are two ways to solve these problems, as Weitzman taught us: Prices or Quantities. Prices means taxing behavior that spreads disease. In the coronavirus context, it might be something like “you can go to the ballgame, conditional on paying a tax of $x which is enough to limit attendance”. Quantities means limiting that behavior directly. As you might imagine, taxation is quite difficult to implement in this context, hence quantity limits (you can’t have events with more than N people) are more common.

It turns out that solving for the equilibrium in the SIR epidemic game is not easy and generally has to be done numerically. But we can say some things. Let m be the marginal cost of distancing, normalized by the cost of being infected (roughly, m=c'/C, where C is the cost of being infected). If distancing is expensive relative to infection (m is high) or if transmission is weak (B is not much more than 1), then in equilibrium no one will distance themselves, and the epidemic will spread. People will also not distance themselves until the epidemic has already spread quite a bit – early on, the cost of social distancing must be paid even though the benefit, in terms of avoided infection risk, is quite low, and you can always “hide from other people” later on.

Where, then, is mandated social distancing useful? If individuals do not sufficiently account for the externality of their social distancing, they will avoid some contact to keep from getting sick right away, but not enough to prevent the epidemic from continuing to spread. If the epidemic is super dangerous (very high costs of getting sick, a la Ebola, or B very high), individuals will distance in equilibrium without being forced to. If, instead, the cost of being sick is low relative to the economic and social disruption of distancing, it is better even from a social planner’s point of view just to risk getting sick. We don’t attempt to prevent the common cold with anything more extreme than covering our mouths.

However, if B is not too high, and the cost of being sick is high relative to the cost of distancing but not so high that people fully distance on their own, it can be optimal for the government to impose social distancing. In the case of coronavirus, we need d<1/2. Since some people, like doctors, cannot reduce their contacts, the rest of us need to cut the number of close contacts we have every day by even more than half. That is a bigger reduction than the difference between a workday and a Sunday for the average person!

This model can be extended in a few useful ways. Three are particularly relevant: 1) what if we know who is sick before they are contagious, 2) what if people have different costs of being sick, and 3) what if the network of contacts is more complex than “any given person is likely to run into any other”.

If we can observe that people are sick before they are infectious, then quarantine can reduce d. Imagine a disease where everyone who gets sick turns blue, and can only infect you two days later. Surely we can all see the very low cost method of preventing an epidemic – lock up the blue people! Ebola and leprosy are not far off from this case, and SARS also had the nice property that people are visibly sick well before they are infectious. It seems coronavirus is quite infectious even when you only feel mildly ill, so pure testing plus quarantine is unlikely to lower d – the distancing parameter – enough to sufficiently reduce the number of infections caused by each sick person. This is especially true once the number of infected is too large to trace all of their contacts and test them before they become infectious themselves.

If people have different costs of being sick, the case for government mandates is stronger. Let young people be only mildly sick, and old people much more so, even though each group is equally contagious. In equilibrium, the young will take only minor distancing precautions, and the old major ones. Since the cost of distancing is convex, it is neither efficient nor an equilibrium for the old to pursue extreme distancing while the young do relatively little. This convexity should increase the set of parameters where government-mandated distancing is needed. As far as I am aware, there is not a good published model explicitly showing this in an SIR differential game, however (economists trapped at home this weekend – let’s work this out and get it on ArXiv!).

Finally, the case of more “realistic” networks is interesting. In the real world, social contacts tend to have a “small world” property – we are tightly connected with a group of people who all know each other, not randomly connected to strangers. High clustering reduces the rate of early diffusion (see, e.g., this review) and makes quarantine more effective. For instance, if a wife is infected, the husband can be quarantined, as he is much more likely to be infected than some random person in society. “Brokerage” type contacts which connect two highly clustered groups are also important to separate, since they are the only way that disease spreads from one group to another. This is the justification for travel restrictions during epidemics – however, once most clusters have infected people, travel restrictions are no longer important.

What Randomization Can and Cannot Do: The 2019 Nobel Prize

It is Nobel Prize season once again, a grand opportunity to dive into some of our field’s most influential papers and to consider their legacy. This year’s prize was inevitable, an award to Abhijit Banerjee, Esther Duflo, and Michael Kremer for popularizing the hugely influential experimental approach to development. It is only fitting that my writeup this year has been delayed by the anti-government road blockades here in Ecuador, which kept me from the internet-enabled world – developing countries face many barriers to reaching prosperity, and rarely have I been so personally aware of the effects of place on productivity as I was this week!

The reason for the prize is straightforward: an entire branch of economics, development, looks absolutely different from what it looked like thirty years ago. Development used to be essentially a branch of economic growth. Researchers studied topics like the productivity of large versus small farms, the nature of “marketing” (or the nature of markets and how economically connected different regions in a country are), or the necessity of exports versus industrialization. Studies were almost wholly observational, deep data collections with throwaway references to old-school growth theory. Policy was largely driven by the subjective impression of donors or program managers about projects that “worked”. To be a bit too honest – it was a dull field, and hence a backwater. And worse than dull, it was a field where scientific progress was seriously lacking.

Banerjee has a lovely description of the state of affairs back in the 1990s. Lots of probably-good ideas were funded, informed deeply by history, but with very little convincing evidence that highly-funded projects were achieving their stated aims. In the World Bank Sourcebook of recommended projects, everything from scholarships for girls to vouchers for poor children to citizens’ report cards was recommended. Did these actually work? Banerjee quotes a program providing computer terminals in rural areas of Madhya Pradesh which explains that due to a lack of electricity and poor connectivity, “only a few of the kiosks have proved to be commercially viable”, then notes, without irony, that “following the success of the initiative,” similar programs would be funded. Clearly this state of affairs is unsatisfactory. Surely we should be able to evaluate the projects we’ve funded already? And better, surely we should structure those evaluations to inform future projects? Banerjee again: “the most useful thing a development economist can do in this environment is stand up for hard evidence.”

And where do we get hard evidence? If by this we mean internal validity – that is, whether the effect we claim to have seen is actually caused by a particular policy in a particular setting – applied econometricians of the “credibility revolution” in labor in the 1980s and 1990s provided an answer. Either take advantage of natural variation with useful statistical properties, like the famed regression discontinuity, or else randomize treatment like a medical study. The idea here is that the assumptions needed to interpret a “treatment effect” are often less demanding than those needed to interpret the estimated parameter of an economic model, hence more likely to be “real”. The problem in development is that most of what we care about cannot be randomized. How are we, for instance, to randomize whether a country adopts import substitution industrialization or not, or randomize farm size under land reform – and at a scale large enough for statistical inference?

What Banerjee, Duflo, and Kremer noticed is that much of what development agencies do in practice has nothing to do with those large-scale interventions. The day-to-day work of development is making sure teachers show up to work, vaccines are distributed and taken up by children, corruption does not deter the creation of new businesses, and so on. By breaking down the work of development on the macro scale to evaluations of development at micro scale, we can at least say something credible about what works in these bite-size pieces. No longer should the World Bank Sourcebook give a list of recommended programs, based on handwaving. Rather, if we are to spend 100 million dollars sending computers to schools in a developing country, we should at least be able to say “when we spent 5 million on a pilot, we designed the pilot so as to learn that computers in that particular setting led to a 12% decrease in dropout rate, and hence a 34%-62% return on investment according to standard estimates of the link between human capital and productivity.” How to run those experiments? How should we set them up? Who can we get to pay for them? How do we deal with “piloting bias”, where the initial NGO we pilot with is more capable than the government we expect to act on evidence learned in the first study? How do we deal with spillovers from randomized experiments, econometrically? Banerjee, Duflo, and Kremer not only ran some of the famous early experiments, they also established the premier academic institution for running these experiments – J-PAL at MIT – and further wrote some of the best known practical guides to experiments in development.

Many of the experiments run by the three winners are now canonical. Let’s start with Michael Kremer’s paper on deworming, with Ted Miguel, in Econometrica. Everyone agreed that deworming kids infected with things like hookworm has large health benefits for the children directly treated. But since worms are spread by outdoor bathroom use and other poor hygiene practices, one infected kid can also harm nearby kids by spreading the disease. Kremer and Miguel suspected that one reason school attendance is so poor in some developing countries is because of the disease burden, and hence that reducing infections among one kid benefits the entire community, and neighboring ones as well, by reducing overall infection. By randomizing mass school-based deworming, and measuring school attendance both at the focal and at neighboring schools, they found that villages as far as 4km away saw higher school attendance (4km rather than the 6km reported in the original paper, due to a correction of an error in the analysis). Note the good economics here: a change from individual to school-based deworming helps identify spillovers across schools, and some care goes into handling the spatial econometric issue whereby density of nearby schools equals density of nearby population equals differential baseline infection rates at these schools. An extra year of school attendance could therefore be “bought” by a donor for $3.50, much cheaper than other interventions such as textbook programs or additional teachers. Organizations like GiveWell still rate deworming among the most cost-effective educational interventions in the world: in terms of short-run impact, surely this is one of the single most important pieces of applied economics of the 21st century.

The laureates have also used experimental design to learn that some previously highly-regarded programs are not as important to development as you might suspect. Banerjee, Duflo, Rachel Glennerster and Cynthia Kinnan studied microfinance rollout in Hyderabad, randomizing the neighborhoods which received access to a major first-gen microlender. These programs are generally woman-focused, joint-responsibility, high-interest loans a la the Nobel Peace Prize-winning Grameen Bank. 2800 households across the city were initially surveyed about their family characteristics, lending behavior, consumption, and entrepreneurship, then followups were performed a year after the microfinance rollout, and then three years later. While women in treated areas were 8.8 percentage points more likely to take a microloan, and existing entrepreneurs do in fact increase spending on their business, there is no long-run impact on education, health, or the likelihood women make important family decisions, nor does it make businesses more profitable. That is, credit constraints, at least in poor neighborhoods in Hyderabad, do not appear to be the main barrier to development; this is perhaps not very surprising, since higher-productivity firms in India in the 2000s already have access to reasonably well-developed credit markets, and surely they are the main driver of national income (followup work does see some benefits for very high-talent, very poor entrepreneurs, but the key long-run result remains).

Let’s realize how wild this is: a literal Nobel Peace Prize was awarded for a form of lending that had not really been rigorously analyzed. This form of lending effectively did not exist in rich countries at the time they developed, so it is not a necessary condition for growth. And yet enormous amounts of money went into a somewhat-odd financial structure because donors were nonetheless convinced, on the basis of very flimsy evidence, that microlending was critical.

By replacing conjecture with evidence, and by showing that randomized trials can actually be run in many important development settings, the laureates have reformed economic development in a way that is unquestionably positive. Or have they? Before returning to the (truly!) positive aspects of Banerjee, Duflo and Kremer’s research program, we must take a short negative turn. Because though Banerjee, Duflo, and Kremer are unquestionably the leaders of the field of development, and the most influential scholars for young economists working in that field, there is much more controversy about RCTs than you might suspect if all you’ve seen are the press accolades of the method. Donors love RCTs, as they help select the right projects. Journalists love RCTs, as they are simple to explain (Wired, in a typical example of this hyperbole: “But in the realm of human behavior, just as in the realm of medicine, there’s no better way to gain insight than to compare the effect of an intervention to the effect of doing nothing at all. That is: You need a randomized controlled trial.”) The “randomista” referees love RCTs – a tribe is a tribe, after all. But RCTs are not necessarily better for those who hope to understand economic development! The critiques are three-fold.

First, while the method of random trials is great for impact or program evaluation, it is not great for understanding how similar but not identical interventions will perform in different settings. That is, random trials have no specific claim to external validity, and indeed are worse than other methods on this count. Second, it is argued that development is much more than program evaluation, and that the reason real countries grow rich has essentially nothing to do with the kinds of policies studied in the papers we discussed above: the “economist as plumber” famously popularized by Duflo, who rigorously diagnoses small problems and proposes solutions, is a fine job for a World Bank staffer, but a crazy use of the intelligence of our otherwise-leading scholars in development. Third, even if we only care about internal validity, and only about the internal validity of some effect that can in principle be studied experimentally, the optimal experimental design is generally not an RCT.

The external validity problem is often seen to be one related to scale: well-run partner NGOs are just better at implementing any given policy than, say, a government, so the benefit of scaled-up interventions may be much lower than that identified by an experiment. We call this “piloting bias”, but it isn’t really the core problem. The core problem is that the mapping from one environment or one time to the next depends on many factors, and by definition the experiment cannot replicate those factors. A labor market intervention in a high-unemployment country cannot inform in an internally valid way about a low-unemployment country, or a country with different outside options for urban laborers, or a country with an alternative social safety net or cultural traditions about income sharing within families. Worse, the mapping from a partial equilibrium to a general equilibrium world is not at all obvious, and experiments do not inform as to the mapping. Giving cash transfers to some villagers may make them better off, but giving cash transfers to all villagers may cause land prices to rise, or cause more rent extraction by corrupt governments, or cause all sorts of other changes in relative prices.

You can see this issue in the Scientific Summary of this year’s Nobel. Literally, the introductory justification for RCTs is that, “[t]o give just a few examples, theory cannot tell us whether temporarily employing additional contract teachers with a possibility of re-employment is a more cost-effective way to raise the quality of education than reducing class sizes. Neither can it tell us whether microfinance programs effectively boost entrepreneurship among the poor. Nor does it reveal the extent to which subsidized health-care products will raise poor people’s investment in their own health.”

Theory cannot tell us the answers to these questions, but an internally valid randomized control trial can? Surely the wage of the contract teacher vis-a-vis more regular teachers and hence smaller class sizes matters? Surely it matters how well-trained these contract teachers are? Surely it matters what the incentives for investment in human capital by students in the given location are? To put this another way: run literally whatever experiment you want to run on this question in, say, rural Zambia in grade 4 in 2019. Then predict the cost-benefit ratio of having additional contract teachers versus more regular teachers in Bihar in high school in 2039. Who would think there is a link? Actually, let’s be more precise: who would think there is a link between what you learned in Zambia and what will happen in Bihar which is not primarily theoretical? Having done no RCT, I can tell you that if the contract teachers are much cheaper per unit of human capital, we should use more of them. I can tell you that if the students speak two different languages, there is a greater benefit in having a teacher assistant to translate. I can tell you that if the government or other principal has the ability to undo outside incentives with a side contract, hence are not committed to the mechanism, dynamic mechanisms will not perform as well as you expect. These types of statements are theoretical: good old-fashioned substitution effects due to relative prices, or a priori production function issues, or basic mechanism design.

Things are worse still. It is not simply that an internally valid estimate of a treatment effect often tells us nothing about how that effect generalizes, but that the important questions in development cannot be answered with RCTs. Everyone working in development has heard this critique. But just because a critique is oft-repeated does not mean it is wrong. As Lant Pritchett argues, national development is a social process involving markets, institutions, politics, and organizations. RCTs have focused on, in his reckoning, “topics that account for roughly zero of the observed variation in human development outcomes.” Again, this isn’t to say that RCTs cannot study anything. Improving the function of developing world schools, figuring out why malaria nets are not used, investigating how to reintegrate civil war fighters: these are not minor issues, and it’s good that folks like this year’s Nobelists and their followers provide solid evidence on these topics. The question is one of balance. Are we, as economists are famously wont to do, simply looking for keys underneath the spotlight when we focus our attention on questions which are amenable to a randomized study? Has the focus on internal validity diverted effort from topics that are much more fundamental to the wealth of nations?

But fine. Let us consider that our question of interest can be studied in a randomized fashion. And let us assume that we do not expect piloting bias or other external validity concerns to be first-order. We still have an issue: even on internal validity, randomized control trials are not perfect. They are certainly not a “gold standard”, and the econometricians who push back against this framing have good reason to do so. Two primary issues arise. First, to predict what will happen if I impose a policy, I am concerned that what I have learned in the past is biased (e.g., the people observed to use schooling subsidies are more diligent than those who would go to school if we made these subsidies universal). But I am also concerned about statistical inference: with small sample sizes, even an unbiased estimate will not predict very well. I recently talked with an organization doing recruitment who quasi-randomly recruited at a small number of colleges. On average, they attracted a handful of applicants in each college. They stopped recruiting at the colleges with two or fewer applicants after the first year. But of course random variation means the difference between two and four applicants is basically nil.
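
A quick simulation shows how little information is in counts that small. A hypothetical setup of mine: give every college an identical true applicant rate (Poisson with mean 3) and apply the “two or fewer applicants” cutoff after one year.

```python
# Every college below has the SAME true applicant rate; the only variation is luck.
import math
import random

random.seed(0)

def poisson(lam):
    # Knuth's inversion-by-multiplication algorithm, to avoid external dependencies.
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= random.random()
        if p <= threshold:
            return k - 1

n_colleges, true_mean, cutoff, trials = 30, 3.0, 2, 10_000
dropped = 0
for _ in range(trials):
    dropped += sum(1 for _ in range(n_colleges) if poisson(true_mean) <= cutoff)
print(f"share of identical colleges dropped by the cutoff: {dropped / (trials * n_colleges):.1%}")
# Roughly 42 percent of colleges with an identical underlying rate report two or
# fewer applicants in a given year - the observed gap between two and four is noise.
```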

In this vein, randomized trials tend to have very small sample sizes compared to observational studies. When this is combined with high “leverage” of outlier observations when multiple treatment arms are evaluated, particularly for heterogeneous effects, randomized trials often predict poorly out of sample even when unbiased (see Alwyn Young in the QJE on this point). Observational studies allow larger sample sizes, and hence often predict better even when they are biased. The theoretical assumptions of a structural model permit parameters to be estimated even more tightly, as we use a priori theory to effectively restrict the nature of economic effects.

We have thus far assumed the randomized trial is unbiased, but that is often suspect as well. Even if I randomly assign treatment, I have not necessarily randomly assigned spillovers in a balanced way, nor have I restricted untreated agents from rebalancing their effort or resources. A PhD student of ours on the market this year, Carlos Inoue, examined the effect of random allocation of a new coronary intervention in Brazilian hospitals. Following the arrival of this technology, good doctors moved to hospitals with the “randomized” technology. The estimated effect is therefore nothing like what would have been found had all hospitals adopted the intervention. This issue can be stated simply: randomizing treatment does not in practice hold all relevant covariates constant, and if your response is just “control for the covariates you worry about”, then we are back to the old setting of observational studies where we need a priori arguments about what these covariates are if we are to talk about the effects of a policy.

The irony is that Banerjee, Duflo and Kremer are often quite careful in how they motivate their work with traditional microeconomic theory. They rarely make grandiose claims of external validity when nothing of the sort can be shown by their experiment. Kremer is an ace theorist in his own right, Banerjee often relies on complex decision and game theory particularly in his early work, and no one can read the care with which Duflo handles issues of theory and external validity and think she is merely punting. Most of the complaints about their “randomista” followers do not fully apply to the work of the laureates themselves.

And none of the critiques above should be taken to mean that experiments cannot be incredibly useful to development. Indeed, the proof of the pudding is in the tasting: some of the small-scale interventions by Banerjee, Duflo, and Kremer have been successfully scaled up! To analogize to a firm, consider a plant manager interested in improving productivity. She could read books on operations research and try to implement ideas, but it surely is also useful to play around with experiments within her plant. Perhaps she will learn that it’s not incentives but rather lack of information that is the biggest reason workers are, say, applying car door hinges incorrectly. She may then redo training, and find fewer errors in cars produced at the plant over the next year. This evidence – not only the treatment effect, but also the rationale – can then be brought to other plants at the same company. All totally reasonable. Indeed, would we not find it insane for a manager to implement a huge change to incentives or training without first trying things out and making minor changes on the margin? And of course the same goes, or should go, when the World Bank or DFID or USAID spend tons of money trying to solve some development issue.

On that point, what would even a skeptic agree a development experiment can do? First, it is generally better than other methods at identifying internally valid treatment effects, though still subject to the caveats above.

Second, it can fine-tune interventions along margins where theory gives little guidance. For instance, do people not take AIDS drugs because they don’t believe they work, because they don’t have the money, or because they want to continue having sex and no one will sleep with them if they are seen picking up antiretrovirals? My colleague Laura Derksen suspected that people are often unaware that antiretrovirals prevent transmission, hence in locations with high rates of HIV, it may be safer to sleep with someone taking antiretrovirals than with a random member of the population at large. She shows that an information campaign telling villagers about this property of antiretrovirals meaningfully increases takeup of medication. We learn from her study that it may be important in the case of AIDS prevention to correct this particular set of beliefs. Theory, of course, tells us little about how widespread these incorrect beliefs are, hence about the magnitude of this informational shift on drug takeup.

Third, experiments allow us to study policies that no one has yet implemented. Ignoring the problem of statistical identification in observational studies, there may be many policies we wish to implement which are wholly different in kind from those seen in the past. The negative income tax experiments of the 1970s are a classic example. Experiments give researchers more control. This additional control is of course balanced against the fact that we should expect super meaningful interventions to have already occurred, and we may have to perform experiments at relatively low scale due to cost. We should not be too small-minded here. There are now experimental development papers on topics thought to be outside the bounds of experiment. I’ve previously discussed on this site Kevin Donovan’s work randomizing the placement of roads and bridges connecting remote villages to urban centers. What could be “less amenable” to randomization than the literal construction of a road and bridge network?

So where do we stand? It is unquestionable that a lot of development work in practice was based on the flimsiest of evidence. It is unquestionable that the armies Banerjee, Duflo, and Kremer have sent into the world via J-PAL and similar institutions have brought much more rigor to program evaluation. Some of these interventions are now literally improving the lives of millions of people with clear, well-identified, nonobvious policy. That is an incredible achievement! And there is something likeable about the desire of the ivory tower to get into the weeds of day-to-day policy. Michael Kremer on this point: “The modern movement for RCTs in development economics…is about innovation, as well as evaluation. It’s a dynamic process of learning about a context through painstaking on-the-ground work, trying out different approaches, collecting good data with good causal identification, finding out that results do not fit pre-conceived theoretical ideas, working on a better theoretical understanding that fits the facts on the ground, and developing new ideas and approaches based on theory and then testing the new approaches.” No objection here.

That said, we cannot ignore that there are serious people who seriously object to the J-PAL style of development. Deaton, who won the Nobel Prize only four years ago, writes the following, in line with our discussion above: “Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as “hard” while other methods are “soft”… [T]he analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work.” Lant Pritchett argues that despite success persuading donors and policymakers, the evidence that RCTs lead to better policies at the governmental level, and hence better outcomes for people, is far from established. The barrier to the adoption of better policy is bad incentives, not a lack of knowledge about how given policies will perform. I think these critiques are quite valid, and the randomization movement in development often badly overstates what it has learned, and what it could in principle learn. But let’s give the last word to Chris Blattman on the skeptic’s case for randomized trials in development: “if a little populist evangelism will get more evidence-based thinking in the world, and tip us marginally further from Great Leaps Forward, I have one thing to say: Hallelujah.” Indeed. No one, randomista or not, longs to go back to the days of unjustified advice on development, particularly “Great Leap Forward” type programs without any real theoretical or empirical backing!

A few remaining bagatelles:

1) It is surprising how early this award was given. The work is incredibly influential, yet the earliest published papers by any of the laureates mentioned in the Nobel scientific summary are from 2003 and 2004 (Miguel-Kremer on deworming, Duflo-Saez on retirement plans, Chattopadhyay and Duflo on female policymakers in India, Banerjee and Duflo on health in Rajasthan). This seems shockingly recent for a Nobel – I wonder if there are any other Nobel winners in economics who won entirely for work published so close to the prize announcement.

2) In my field, innovation, Kremer is most famous for his paper on patent buyouts (we discussed that paper on this site way back in 2010). How do we both incentivize new drug production and also get those drugs sold at marginal cost once invented? We think drugmakers have better knowledge about how to produce and test a new drug than some bureaucrat, so we can’t finance drugs directly. If we grant a patent, then high-value drugs return more to the inventor, but at the cost of massive deadweight loss. What we want is to offer inventors some large fraction of the social return to their invention ex-post, in exchange for making production perfectly competitive. Kremer proposes patent auctions where the government pays a multiple of the winning bid with some probability, putting the drug in the public domain. The auction reveals the market value, and the multiple allows the government to account for consumer surplus and deadweight loss as well. There are many practical issues, but I have always found this an elegant, information-based attempt to solve the problem of innovation production, and it has been quite influential on those grounds (a toy version of the mechanism is sketched in code just after this list).

3) Somewhat ironically, Kremer also has a great 1990s growth paper with RCT-skeptics Pritchett, Easterly and Summers. The point is simple: growth rates by country vacillate wildly decade to decade. Knowing the 2000s, you likely would not have predicted countries like Ethiopia and Myanmar as growth miracles of the 2010s. Yet things like education, political systems, and so on are quite constant within-country across any two-decade period. This necessarily means that shocks of some sort, whether from international demand, the political system, nonlinear cumulative effects, and so on, must be first-order for growth. A great, straightforward argument, well-explained.

4) There is some irony that two of Duflo’s most famous papers are not experiments at all. Her most cited paper by far is a piece of econometric theory on standard errors in difference-in-differences models, written with Marianne Bertrand and Sendhil Mullainathan (a placebo simulation of that paper’s central point appears in code after this list). Her next most cited paper is a lovely study of the quasi-random school expansion policy in Indonesia, used to estimate the return on school construction and on education more generally. Nary a randomized experiment in sight in either paper.

5) I could go on all day about Michael Kremer’s 1990s essays. In addition to Patent Buyouts, two more of them appear on my class syllabi. The O-Ring theory is an elegant model of complementary inputs and labor market sorting, where slightly better “secretaries” earn much higher wages (a stylized numerical version appears in code after this list). The “One Million B.C.” paper notes that growth must have been low for most of human history, and that it was limited because low human density limited the spread of nonrivalrous ideas. It is the classic Malthus plus endogenous growth paper, and always a hit among students.

6) Ok, one more for Kremer, since “Elephants” is my favorite title in economics. Theoretically, future scarcity increases prices. When people think elephants will go extinct, the price of ivory therefore rises, making extinction more likely as poaching incentives go up. What to do? Hold a government stockpile of ivory and commit to selling it if the stock of living elephants falls below a certain point. Elegant. And I can’t help but think: how would one study this particular general equilibrium effect experimentally? I both believe the result and suspect that randomized trials are not a good way to understand it!
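For the patent buyout mechanism in 2) above, here is a minimal sketch in Python. The markup, the buyout probability, and the sealed-bid auction format are my own illustrative simplifications rather than the paper’s exact design; the point is only to show why the occasional private sale keeps the revealed value honest.

```python
import numpy as np

rng = np.random.default_rng(0)

def patent_buyout(bids, markup=2.0, buyout_prob=0.9):
    """Toy Kremer-style patent buyout (markup and buyout_prob are illustrative).
    An auction among private bidders reveals the patent's private market value;
    the government then pays the inventor a multiple of the winning bid with some
    probability and places the patent in the public domain. The occasional sale
    to the winning bidder is what keeps bids honest."""
    winner = int(np.argmax(bids))
    market_value = float(np.max(bids))                    # the winning bid
    if rng.random() < buyout_prob:
        return {"patent_goes_to": "public domain",
                "inventor_receives": markup * market_value}
    return {"patent_goes_to": f"bidder {winner}",
            "inventor_receives": market_value}

print(patent_buyout(rng.uniform(10, 100, size=5)))        # five hypothetical private bidders
```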
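For the standard-errors point in 4) above, a minimal placebo simulation in the spirit of that paper (the state and year counts, the AR(1) coefficient, and the use of statsmodels are my own illustrative choices): with serially correlated outcomes and a fake “law” that has no true effect, conventional OLS standard errors on the treatment dummy are typically too small, while errors clustered by state are appropriately larger.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_states, n_years = 50, 20

# Serially correlated state-level outcomes (AR(1)): the feature that breaks naive errors
rows = []
for s in range(n_states):
    e = 0.0
    for t in range(n_years):
        e = 0.8 * e + rng.normal()
        rows.append({"state": s, "year": t, "y": e})
df = pd.DataFrame(rows)

# A placebo "law": half of the states become "treated" at a random year; the true effect is zero
treated_start = {int(s): int(rng.integers(5, 15))
                 for s in rng.choice(n_states, n_states // 2, replace=False)}
df["D"] = [int(t >= treated_start.get(s, n_years)) for s, t in zip(df["state"], df["year"])]

model = smf.ols("y ~ D + C(state) + C(year)", data=df)
naive = model.fit()
clustered = model.fit(cov_type="cluster", cov_kwds={"groups": df["state"]})
print("SE on D, naive OLS:         ", round(naive.bse["D"], 3))
print("SE on D, clustered by state:", round(clustered.bse["D"], 3))
```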
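And for the O-Ring model in 5) above, a stylized numerical version (my own illustrative numbers, not Kremer’s calibration): when output requires every one of n tasks to succeed, expected output is proportional to the product of per-task quality, so small skill differences compound into large output, wage, and sorting differences.

```python
def oring_output(q, n_tasks, scale=1.0):
    """Stylized O-Ring production: the product fails if any task fails,
    so expected output scales with worker quality raised to the number of tasks."""
    return scale * q ** n_tasks

n = 10
for q in (0.90, 0.95, 0.99):
    print(f"quality {q:.2f}: expected output {oring_output(q, n):.3f}")
# 0.90 -> 0.349, 0.95 -> 0.599, 0.99 -> 0.904: a five-point quality gap becomes
# roughly a 70% output gap, which is why slightly better workers can earn much
# higher wages and why high-quality workers sort into the same firms.
```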

The Price of Everything, the Value of the Economy: A Clark Medal for Emi Nakamura!

Fantastic and well-deserved news this morning with the Clark Medal being awarded to Emi Nakamura, who has recently moved from Columbia to Berkeley. Incredibly, Nakamura’s award is the first Clark to go to a macroeconomist in the 21st century. The Great Recession, the massive changes in global trade patterns, the rise of monetary areas like the Eurozone, the “savings glut” and its effect on interest rates, the change in openness to hot financial flows: it has been a wild two decades for the macroeconomy since Andrei Shleifer won the Clark. It’s hard to imagine what could be more important for an economist to understand than these patterns.

Something unusual has happened in macroeconomics over the past twenty years: it has become more like Industrial Organization! A brief history may be useful. The term macroeconomics is due to Ragnar Frisch, in his 1933 article on the propagation of economic shocks. He writes,

“The macro-dynamic analysis…tries to give an account of the whole economic system taken in its entirety. Obviously in this case it is impossible to carry through the analysis in great detail. Of course, it is always possible to give even a macro-dynamic analysis in detail if we confine ourselves to a purely formal theory. Indeed, it is always possible by a suitable system of subscripts and superscripts, etc., to introduce practically all factors which we may imagine…Such a theory, however, would have only a rather limited interest. It would hardly be possible to study such fundamental problems as the exact time shape of the solution, [etc.]. These latter problems are just the essential problems in business cycle analysis. In order to attack these problems on a macro-dynamic basis…we must deliberately disregard a considerable amount of the details of the picture.”

And so we did. The Keynesians collapsed the microfoundations of the macroeconomy into a handful of relevant market-wide parameters. The Lucas Critique argued that we can collapse some things – many agents into a representative agent, for instance – but we ought always begin our analysis with the fundamental parameters of tastes, constraints, and technologies. The new neoclassical synthesis combined these raw parameters with nominal rigidities – sticky prices, limited information, and so on. But Frisch’s main point nonetheless held strong: to what use are these deeper theoretical parameters if we cannot estimate their value and their effect on the macroeconomy? As Einstein taught us, the goal of the scientist should be to make things as simple as possible, but no simpler.

What has changed recently in macroeconomics is twofold. First, computational power now makes it possible to estimate or calibrate very complex dynamic and stochastic models, with forward looking agents, with price paths in and out of equilibrium, with multiple frictions – it is in this way that macro begins to look like industrial organization, with microeconomic parameters at the base. But second, and again analogous to IO, the amount of data available to the researcher has grown enormously. We now have price scanner data that tells us exactly when and how prices change, how those changes propagate across supply chains and countries, how they interact with taxes, and so on. Frisch’s problem has in some sense been solved: we no longer have the same trade-off between usefulness and depth when studying the macroeconomy.

Nakamura is best known for using this deep combination of data and theory to understand how exactly firms set prices. Price rigidities play a particularly important role in theories of the macroeconomy that potentially involve inefficiency. Consider a (somewhat bowdlerized) version of real business cycle theory. Here, shocks hit the economy: for instance, an oil cartel withholds supply for political reasons. Firms must react to this “real” supply-side shock by reorganizing economic activity. The real shock then propagates across industries. The role of monetary policy in such a world is limited: a recession simply reflects industries reacting to real change in the economic environment.

When prices are “sticky”, however, that is no longer true. The speed with which real shocks propagate, and the distortion sticky prices introduce, can be affected by monetary policy, since firms will react to changes in expected inflation by changing the frequency with which they update prices. Famously, Golosov and Lucas in the JPE argued, theoretically and empirically, that the welfare effects of “sticky prices” or “menu costs” are not terribly large. Extracting these welfare effects is quite sensitive to a number of features in the data and in the theory. To what extent is there short-term price dispersion rather than an exogenous chance for all firms in an industry to change their prices? Note that price dispersion is difficult to maintain unless we have consumer search costs – otherwise, everyone buys from the cheapest vendor – so price dispersion adds a non-trivial technical challenge. How much do prices actually change – do we want to sweep out short-term sales, for example? When inflation is higher, do firms adjust prices equally often but with bigger price jumps (consider the famous doubling of the price of Coca-Cola), or do they adjust prices more often, keeping the percentage change similar to that in low-inflation environments? How much heterogeneity is there in price-setting practices across industries, and to what extent do these differences affect the welfare consequences of prices given the links across industries?
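To see the frequency-versus-size question in the simplest possible setting, here is a toy Ss pricing simulation; the inaction band, shock volatility, and the two inflation rates are all made-up parameters, not estimates from any of the BLS data discussed below. In this toy, higher trend inflation makes the firm hit its adjustment band more often, so prices change more frequently, while the typical size of a change stays pinned down by the band.

```python
import numpy as np

def simulate_ss_pricing(inflation, band=0.10, sigma=0.02, periods=100_000, seed=0):
    """Toy Ss pricing rule: the firm resets its price whenever the gap between its
    posted (log) price and its frictionless desired price leaves [-band, band].
    All parameters are illustrative, not estimated."""
    rng = np.random.default_rng(seed)
    gap, changes = 0.0, []
    for _ in range(periods):
        gap -= inflation                 # desired price drifts up with trend inflation
        gap += rng.normal(0.0, sigma)    # idiosyncratic cost or demand shock
        if abs(gap) > band:
            changes.append(abs(gap))     # size of the adjustment that closes the gap
            gap = 0.0
    return len(changes) / periods, float(np.mean(changes))

for pi in (0.002, 0.010):                # low vs. high trend inflation per period
    freq, size = simulate_ss_pricing(pi)
    print(f"inflation {pi:.3f}: adjustment frequency {freq:.3f}, avg |price change| {size:.3f}")
```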

Nakamura has pushed us very far into answering these questions. She has built insane price datasets, come up with clever identification strategies to separate pricing models, and used these tools to vastly increase our understanding of the interaction between price rigidities and the business cycle. Her “Five Facts” paper uses BLS microdata to show that sales were roughly half of the “price changes” earlier researchers had found, that prices change more rapidly when inflation is higher, and that there is huge heterogeneity across industries in price change behavior. Taking that data back to the 1970s, Nakamura and coauthors also show that high inflation environments do not cause more price dispersion: rather, firms update their prices more often. Bob Lucas in his Macroeconomic Priorities made a compelling argument that business cycle welfare costs are much smaller than the costs of inflation, and that inflation costs are themselves much smaller than the costs of tax distortions. As Nakamura points out, if you believe this, no wonder you prioritize price stability and tax policy! (Many have quibbled with Lucas’ basic argument, but even adding heterogeneous agents, it is tough to get business cycles to have large economic consequences; see, e.g., Krusell et al RED 2009.) Understanding better the true costs of inflation, via the feedback of monetary expansion on price-setting, goes a great deal toward helping policymakers calibrate the costs and benefits of price stability vis-a-vis other macroeconomic goals.

Though generally known as an empirical macroeconomist, Nakamura also has a number of papers, many with her husband Jon Steinsson, on the theory of price setting. For example, why are prices sticky yet also subject to frequent sales? In a clever paper in the JME, Nakamura and Steinsson model a firm pricing to habit-forming consumers. If the firm does not constrain itself, it has the incentive to raise prices once consumers form their habit for a given product (as a Cheez-It fan, I understand the model well – my willingness to pay for a box shipped up from the US to the Cheez-It-free land of Canada is absurdly high). To avoid this time inconsistency problem, firms would like to commit to a price path with some flexibility to respond to changes in demand. An equilibrium in this relational contract-type model involves a price cap with sales when demand falls: rigid prices plus sales, as we see in the data! In a second theoretical paper, with Steinsson and Alisdair McKay, Nakamura looks into how much communication about future nominal interest rates can affect behavior. In principle, a ton: if you tell me the Fed will keep the real interest rate low for many years (low rates in the future raise consumption in the future, which raises inflation in the future, which lowers real rates today), I will borrow away. Adding borrowing constraints and income risk, however, means that I will never borrow too much money: I might get a bad shock tomorrow and wind up on the street. Giving five years of forward guidance about interest rates rather than one, therefore, doesn’t really affect my behavior that much: the desire for precautionary savings is what limits my borrowing, not the interest rate.

Nakamura’s prize is a well-deserved award, going to a leader in the shift toward a macroeconomics that is more empirical and more deeply “microeconomic” in its theory. Her focus is keenly targeted toward some of the key puzzles relevant to macroeconomic policymakers. There is no way to cover such a broad field in one post – this is not one of those awards given for a single paper – but luckily Nakamura has two great easily-readable summaries of her core work. First, in the Annual Review of Economics, she lays out the new empirical facts on price changes, the attempts to identify the link between monetary policy and price changes, and the implications for business cycle theory. Second, in the Journal of Economic Perspectives, she discusses how macroeconomists have attempted to more credibly identify theoretical parameters. In particular, external validity is so concerning in macro – remember the Lucas Critique! – that the essence of the problem involves combining empirical variation for identification with theory that maps that variation into broader policy guidance. I hesitate to stop here since Nakamura has so many influential papers, but let us take just two more quick tasters that are well worth deeper exploration. On the government spending side, she uses local spending shocks and a serious model to figure out the national fiscal multiplier of government spending. Second, she has recently linked the end of the large-scale movement of women from home production into the labor force to longer-lasting recessions.

How We Create and Destroy Growth: A Nobel for Romer and Nordhaus

Occasionally, the Nobel Committee gives a prize which is unexpected, surprising, yet deft in how it points out underappreciated research. This year, they did no such thing. Both William Nordhaus and Paul Romer have been running favorites for years in my Nobel betting pool with friends at the Federal Reserve. The surprise, if anything, is that the prize went to both men together: Nordhaus is best known for his environmental economics, and Romer for his theory of “endogenous” growth.

On reflection, the connection between their work is obvious. But it is the connection that makes clear how inaccurate many of today’s headlines – “an economic prize for climate change” – really are. Because it is not the climate that both winners build on, but rather a more fundamental economic question: economic growth. Why are some places and times rich and others poor? And what is the impact of these differences? Adam Smith’s “The Wealth of Nations” is formally titled “An Inquiry into the Nature and Causes of the Wealth of Nations”, so these are certainly not new questions in economics. Yet the Classical economists did not have the same conception of economic growth that we have; they largely lived in a world of cycles, of ebbs and flows, with income per capita facing the constraint of agricultural land. Schumpeter, who certainly cared about growth, notes that Smith’s discussion of the “different progress of opulence in different nations” is “dry and uninspired”, perhaps only a “starting point of a sort of economic sociology that was never written.”

As each generation became richer than the one before it – at least in a handful of Western countries and Japan – economists began to search more deeply for the reason. Marx saw capital accumulation as the driver. Schumpeter certainly saw innovation (though not invention, as he always made clear) as important, though he had no formal theory. It was two models appearing during and soon after World War II – those of Harrod-Domar and of Solow-Swan-Tinbergen – which began to make real progress. In Harrod-Domar, economic output is a function of capital Y=f(K), nothing is produced without capital f(0)=0, the economy is constant returns to scale in capital df/dK=c, and the change in capital over time depends on what is saved from output minus what depreciates dK/dt=sY-zK, where z is the rate of depreciation. Put those assumptions together: since Y=cK, we have dY/dt=c*dK/dt=c(sY-zK)=(sc-z)Y, so the growth rate of output is sc-z. Since c and z are fixed, the only way to grow is to crank up the savings rate, Soviet style. And no doubt, capital deepening has worked in many places.

Solow-type models push further. They let the economy be a function of “technology” A(t), the capital stock K(t), and labor L(t), where output Y(t)=K^a*(A(t)L(t))^(1-a) – that is, production is constant returns to scale in capital and labor. Solow assumes capital depends on savings and depreciation as in Harrod-Domar, that labor grows at a constant rate n, and that “technology” grows at constant rate g. Solving this model in per-effective-worker terms, with k=K/(AL) and y=Y/(AL), gives dk/dt=sy-(n+z+g)k, and along the balanced growth path output is exactly proportional to capital. You can then take the model to the data: we observe the growth of labor and capital, and Solow showed that there is not enough growth in those factors to explain U.S. growth. Instead, growth seems to be largely driven by change in A(t), what Abramovitz called “the measure of our ignorance” but which we often call “technology” or “total factor productivity”.
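As a back-of-the-envelope version of that accounting exercise (the growth rates below are round, illustrative numbers rather than actual U.S. data), the residual falls straight out of the production function:

```python
def solow_residual(g_Y, g_K, g_L, alpha=0.33):
    """Growth accounting with Y = K^alpha (A L)^(1-alpha):
    TFP growth is what remains after subtracting the input contributions."""
    return g_Y - alpha * g_K - (1 - alpha) * g_L

g_Y, g_K, g_L = 0.03, 0.03, 0.01        # illustrative annual growth rates, not actual data
g_A = solow_residual(g_Y, g_K, g_L)
per_worker_growth = g_Y - g_L
print(f"TFP growth: {g_A:.3f}")          # ~0.013
print(f"share of per-worker growth not explained by capital deepening: "
      f"{g_A / per_worker_growth:.2f}")  # ~0.67
```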

Well, who can see that fact, as well as the massive corporate R&D facilities of the post-war era throwing out inventions like the transistor, and not think: surely the factors that drive A(t) are endogenous, meaning “from within”, to the profit-maximizing choices of firms? If firms produce technology, what stops other firms from replicating these ideas, a classic positive externality which would lead the rate of technology in a free market to be too low? And who can see the low level of convergence of poor country incomes to rich, and not think: there must be some barrier to the spread of A(t) around the world, since otherwise the return to capital must be extraordinary in places with access to great technology, really cheap labor, and little existing capital to combine with it. And another question: if technology – productivity itself! – is endogenous, then ought we consider not just the positive externality that spills over to other firms, but also the negative externality of pollution, especially climate change, that new technologies both induce and help fix? Finally, if we know how to incentivize new technology, and how growth harms the environment, what is the best way to mitigate the great environmental problem of our day, climate change, without stopping the wondrous increase in living standards growth keeps providing? It is precisely for helping answer these questions that Romer and Nordhaus won the Nobel.

Romer and Endogenous Growth

Let us start with Paul Romer. You know you have knocked your Ph.D. thesis out of the park when the great economics journalist David Warsh writes an entire book hailing your work as solving the oldest puzzle in economics. The two early Romer papers, published in 1986 and 1990, have each been cited more than 25,000 times, which is an absolutely extraordinary number by the standards of economics.

Romer’s achievement was writing a model where inventors spend money to produce inventions with increasing returns to scale, other firms use those inventions to produce goods, and a competitive Arrow-Debreu equilibrium still exists. If we had such a model, we could investigate what policies a government might wish to pursue if it wanted to induce firms to produce growth-enhancing inventions.

Let’s be more specific. First, innovation is increasing returns to scale because ideas are nonrival. If I double the amount of labor and capital, holding technology fixed, I double output, but if I double technology, labor, and capital, I more than double output. That is, give one person a hammer, and they can build, say, one staircase a day. Give two people two hammers, and they can build two staircases by just performing exactly the same tasks. But give two people two hammers, and teach them a more efficient way to combine nail and wood, and they will be able to build more than two staircases. Second, if capital and labor are constant returns to scale and are paid their marginal product in a competitive equilibrium, then there is no output left to pay inventors anything for their ideas. That is, it is not tough to model in partial equilibrium the idea of nonrival ideas, and indeed the realization that a single invention improves productivity for all is also an old one: as Thomas Jefferson wrote in 1813, “[h]e who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.” The difficulty is figuring out how to get these positive spillovers yet still have “prices” or some sort of rent for the invention. Otherwise, why would anyone pursue costly invention?
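A two-line check of the increasing-returns claim, using a standard Cobb-Douglas form with arbitrary numbers: doubling the rival inputs doubles output, while also doubling the nonrival idea more than doubles it.

```python
def output(A, K, L, alpha=0.33):
    """Cobb-Douglas production with a nonrival technology term A."""
    return A * K ** alpha * L ** (1 - alpha)

base = output(1.0, 100.0, 100.0)
print(output(1.0, 200.0, 200.0) / base)   # 2.0: double capital and labor, double output
print(output(2.0, 200.0, 200.0) / base)   # 4.0: double the idea as well, quadruple output
```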

We also need to ensure that growth is not too fast. There is a stock of existing technology in the world. I use that technology to create new innovations which grow the economy. With more people over time and more innovations over time, you may expect the growth rate to be higher in bigger and more technologically advanced societies. It is, in part, as Michael Kremer points out in his One Million B.C. paper. Nonetheless, the rate of growth is not asymptotically increasing by any stretch (see, e.g., Ben Jones on this point). Indeed, growth is nearly constant, abstracting from the business cycle, in the United States, despite big growth in population and the stock of existing technology.

Romer’s first attempt at endogenous growth was based on his thesis and published in the JPE in 1986. Here, he adds “learning by doing” to Solow: technology is a function of the capital stock A(t)=bK(t). As each firm uses capital, they generate learning which spills over to other firms. Even if population is constant, with appropriate assumptions on production functions and capital depreciation, capital, output, and technology grow over time. There is a problem here, however, and one that is common to any model based on learning-by-doing which partially spills over to other firms. As Dasgupta and Stiglitz point out, if there is learning-by-doing which only partially spills over, the industry is a natural monopoly. And even if it starts competitively, as I learn more than you, dynamically I can produce more efficiently, lower my prices, and take market share from you. A decentralized competitive equilibrium with endogenous technological growth is unsustainable!

Back to the drawing board, then. We want firms to intentionally produce technology in a competitive market as they would other goods. We want technology to be nonrival. And we want technology production to lead to growth. Learning-by-doing allows technology to spill over, but would simply lead to a monopoly producer. Pure constant-returns-to-scale competitive production, where technology is just an input like capital produced with a “nonconvexity” – only the initial inventor pays the fixed cost of invention – means that there is no output left to pay for invention once other factors get their marginal product. A natural idea, well known to Arrow 1962 and others, emerges: we need some source of market power for inventors.

Romer’s insight is that inventions are nonrival, yes, but they are also partially excludable, via secrecy, patents, or other means. In his blockbuster 1990 JPE Endogenous Technological Change, he gives inventions an infinitely-lived patent, but lets them be partially substitutable by other inventions, constraining price (this is just a Spence-style monopolistic competition model). The more inventions there are, the more efficiently final goods can be made. Future researchers can use present technology as an input to their invention for free. Invention is thus partially excludable in the sense that my exact invention is “protected” from competition, but also spills over to other researchers by making it easier for them to invent other things. Inventions are therefore neither public nor private goods, and also not “club goods” (nonrival but excludable) since inventors cannot exclude future inventors from using their good idea to motivate more invention. Since there is free entry into invention, the present value of the infinite stream of monopoly rents from an invention is exactly equal to its opportunity cost.

From the perspective of final goods producers, there are just technologies I can license as inputs, which I then use in a constant returns to scale way to produce goods, as in Solow. Every factor is paid its marginal product, but inventions are sold for more than their marginal cost due to monopolistic excludability from secrecy or patents. The model is general equilibrium, and gives a ton of insight about policy: for instance, if you subsidize capital goods, do you get more or less growth? In Romer (1986), where all growth is learning-by-doing, cheaper capital means more learning means more growth. In Romer (1990), capital subsidies can be counterproductive!
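The engine of growth in the 1990 model is the knowledge accumulation equation, and a toy discretization makes both the mechanism and the “scale effects” discussed just below easy to see. The value of delta and the research labor counts are arbitrary illustrative choices: with A-dot = delta*L_A*A, the growth rate is delta*L_A, so doubling the research workforce doubles long-run growth, while the Jones-style semi-endogenous modification (phi < 1) softens this.

```python
import numpy as np

def knowledge_path(L_A, delta=2e-5, phi=1.0, A0=1.0, T=200):
    """Discretized knowledge accumulation A_{t+1} = A_t + delta * L_A * A_t^phi.
    phi = 1 is the Romer (1990) case; phi < 1 is the Jones (1995) modification."""
    A = np.empty(T)
    A[0] = A0
    for t in range(1, T):
        A[t] = A[t - 1] + delta * L_A * A[t - 1] ** phi
    return A

for L_A in (1000, 2000):                  # double the research workforce
    A = knowledge_path(L_A)
    print(f"L_A = {L_A}: long-run growth rate ≈ {A[-1] / A[-2] - 1:.3f}")
```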

There are some issues to be worked out: the Romer models still have “scale effects”, implying that growth should rise with the size of the population and the stock of technology, whereas growth has stayed roughly constant in the modern world despite big changes in both (see Chad Jones’ 1995 and 1999 papers). The neo-Schumpeterian models of Aghion-Howitt and Grossman-Helpman add the important idea that new inventions don’t just add to the stock of knowledge, but also make old inventions less valuable. And really critically, the idea that institutions and not just economic fundamentals affect growth – meaning laws, culture, and so on – is a massive field of research at present. But it was Romer who first cracked the nut of how to model invention in general equilibrium, and I am unaware of any later model which solves this problem in a more satisfying way.

Nordhaus and the Economic Solution to Pollution

So we have, with Romer, a general equilibrium model for thinking about why people produce new technology. The connection with Nordhaus comes in a problem that is both caused by, and potentially solved by, growth. In 2018, even an ignoramus knows the terms “climate change” and “global warming”. This was not at all the case when William Nordhaus began thinking about how the economy and the environment interrelate in the early 1970s.

Growth was a fairly unobjectionable policy goal in 1960: indeed, a greater capability of making goods, and of making war, seemed a necessity for both the Free and Soviet worlds. But by the early 1970s, environmental concerns arose. The Club of Rome warned that we were going to run out of resources if we continued to use them so unsustainably: resources are of course finite, and there are therefore “limits to growth”. Beyond just running out of resources, growth could also be harmful because of negative externalities on the environment, particularly the newfangled idea of global warming that an MIT report warned about in 1970.

Nordhaus treated those ideas both seriously and skeptically. In a 1974 AER P&P, he notes that technological progress or adequate factor substitution allow us to avoid “limits to growth”. To put it simply, whales are limited in supply, and hence whale oil is as well, yet we light many more rooms than we did in 1870 due to new technologies and substitutes for whale oil. Despite this skepticism, Nordhaus does show concern for the externalities of growth on global warming, giving a back-of-the-envelope calculation that along a projected Solow-type growth path, the amount of carbon in the atmosphere will reach a dangerous 487ppm by 2030, surprisingly close to our current estimates. In a contemporaneous essay with Tobin, and in a review of an environmentalist’s “system dynamics” predictions of future economic collapse, Nordhaus reaches a similar conclusion: with substitutable factors, running out of resources is not a huge concern; rather, the exact opposite, that we will have access to and use too many polluting resources, should worry us. That is tremendous foresight for someone writing in 1974!

Before turning back to climate change, can we celebrate again the success of economics against the Club of Rome ridiculousness? There were widespread predictions, from very serious people, that growth would not just slow but reverse by the end of the 1980s due to “unsustainable” resource use. Instead, GDP per capita has nearly doubled since 1990, with the most critical change coming for the very poorest. There would have been no greater disaster for the twentieth century than had we attempted to slow the progress and diffusion of technology, in agriculture, manufacturing and services alike, in order to follow the nonsense economics being promulgated by prominent biologists and environmental scientists.

Now, being wrong once is no guarantee of being wrong again, and the environmentalists appear quite right about climate change. So it is again a feather in the cap of Nordhaus to both be skeptical of economic nonsense, and also sound the alarm about true environmental problems where economics has something to contribute. As Nordhaus writes, “to dismiss today’s ecological concerns out of hand would be reckless. Because boys have mistakenly cried ‘wolf’ in the past does not mean that the woods are safe.”

Just as we can refute Club of Rome worries with serious economics, so too can we study climate change. The economy affects the climate, and the climate affects the economy. What we need is an integrated model to assess how economic activity, including growth, affects CO2 production and therefore climate change, allowing us to back out the appropriate Pigouvian carbon tax. This is precisely what Nordhaus did with his two celebrated “Integrated Assessment Models”, which built on his earlier simplified models (e.g., 1975’s Can We Control Carbon Dioxide?). These models have Solow-type endogenous savings, and make precise the tradeoffs of lower economic growth against lower climate change, as well as making clear the critical importance of the social discount rate and the micro-estimates of the cost of adjustment to climate change.
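To fix ideas about what “integrated” means here, below is a deliberately crude, loosely DICE-flavored loop in Python. Every parameter is made up, the carbon and temperature blocks are caricatures, and nothing is optimized, but it shows the structure Nordhaus pioneered: production generates emissions, emissions feed back into damages, and abatement trades current output against future climate harm, with the social discount rate governing how that trade is valued.

```python
def toy_iam(abatement, T=50, A=1.0, K=100.0, L=100.0, alpha=0.3, s=0.22, dep=0.1,
            g_A=0.02, emis_intensity=0.3, temp_per_carbon=0.001, damage_coef=0.01,
            abate_cost=0.05, discount=0.97):
    """A caricature of an integrated assessment model; every parameter is illustrative.
    Each period: produce -> emit -> accumulate carbon -> suffer damages -> save."""
    cum_carbon, welfare = 0.0, 0.0
    for t in range(T):
        gross = A * K ** alpha * L ** (1 - alpha)
        temp = temp_per_carbon * cum_carbon            # stand-in for the carbon/climate block
        net = gross * (1 - damage_coef * temp ** 2)    # quadratic climate damages
        net *= 1 - abate_cost * abatement ** 2         # convex cost of abatement effort
        cum_carbon += emis_intensity * gross * (1 - abatement)
        K = (1 - dep) * K + s * net                    # Solow-style capital accumulation
        A *= 1 + g_A
        welfare += discount ** t * net                 # discounted "consumption" proxy
    return welfare

for mu in (0.0, 0.5, 1.0):
    print(f"abatement {mu:.1f}: discounted welfare proxy = {toy_iam(mu):.1f}")
```

Even in this caricature you can see why the discount rate and the damage and abatement cost parameters drive everything: change them and the ranking of abatement paths flips, which is exactly the Nordhaus-Stern dispute discussed below.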

The latter goes well beyond the science of climate change, which holds human adaptation constant: the Netherlands, in a climate sense, should be underwater, but they use dikes to restrain the ocean. Likewise, the cost of adjusting to an increase in temperature is something to be estimated empirically. Nordhaus takes climate change very seriously, but he is much less concerned about the need for immediate action than the famous Stern report, which takes fairly extreme positions about the discount rate (1000 generations in the future are weighted the same as us, in Stern) and the costs of adjustment.

Consider the following “optimal path” for carbon from Nordhaus’ most recent run of the model, where the blue line is his optimum.

Note that he permits much more carbon than Stern or a policy which mandates temperatures stay below a 2.5 C rise forever. The reason is that the costs to growth in the short term are high: the world is still very poor in many places! There was a vitriolic debate following the Stern report about who was correct: whether the appropriate social discount rate is zero or something higher is a quasi-philosophical debate going back to Ramsey (1928). But you can see here how important the calibration is.

There are other minor points of disagreement between Nordhaus and Stern, and my sense is that there has been some, though not full, convergence in their beliefs about optimal policy. But there is no disagreement whatsoever between the economic and environmental communities that the appropriate way to estimate the optimal response to climate change is via an explicit model incorporating some sort of endogeneity of economic reaction to climate policy. The power of the model is that we can be extremely clear about what points of disagreement remain, and we can examine the sensitivity of optimal policy to factors like climate “tipping points”.

There is one other issue: in Nordhaus’ IAMs, and in Stern, you limit climate change by imposing cap and trade or carbon taxes. But carbon harms cross borders. How do you stop free riding? Nordhaus, in a 2015 AER, shows theoretically that there is no way to generate optimal climate abatement without sanctions for non-participants, but that relatively small trade penalties work quite well. This is precisely what Emmanuel Macron is currently proposing!

Let’s wrap up by linking Nordhaus even more tightly back to Romer. It should be noted that Nordhaus was very interested in the idea of pure endogenous growth, as distinct from any environmental concerns, from the very start of his career. His thesis was on the topic (leading to a proto-endogenous growth paper in the AER P&P in 1969), and he wrote a skeptical piece in the QJE in 1973 about the then-leading theories of what factors induce certain types of innovation (objections which I think have been fixed by Acemoglu 2002). Like Romer, Nordhaus has long worried that inventors do not receive enough of the return to their invention, and that we measure innovation poorly – see his classic NBER chapter on inventions in lighting, and his attempt to estimate how much of society’s output goes to innovators.

The connection between the very frontier of endogenous growth models and environmental IAMs has not gone unnoticed by other scholars. Nordhaus’ IAMs tend to have limited incorporation of endogenous innovation in dirty or clean sectors. But a fantastic paper by Acemoglu, Aghion, Bursztyn, and Hemous combines endogenous technical change with Nordhaus-type climate modeling to suggest a middle ground between Stern and Nordhaus: use subsidies to get green energy close to the technological frontier, then use taxes once their distortion is relatively limited because a good green substitute exists. Indeed, since this paper first started floating around 8 or so years ago, massive subsidies to green energy sources like solar by many countries have made the “cost” of stopping climate change much lower than if we had relied solely on taxes: very low cost solar and mass-market electric cars are now economically viable.

It may indeed be possible to solve climate change – what Stern called “the greatest market failure” man has ever seen – by changing the incentives for green innovation, rather than just by making economic growth more expensive by taxing carbon. Going beyond just solving the problem of climate change, to solving it in a way that minimizes economic harm, is a hell of an accomplishment, and more than worthy of the Nobel prizes Romer and Nordhaus won for showing us this path!

Some Further Reading

In my PhD class on innovation, the handout I give on the very first day introduces Romer’s work and why non-mathematical models of endogenous innovation mislead. Paul Romer himself has a nice essay on climate optimism, and the extent to which endogenous invention matters for how we stop global warming. On why anyone signs climate change abatement agreements, instead of just free riding, see the clever incomplete contracts insight of Battaglini and Harstad. Romer has also been greatly interested in the policy of “high-growth” places, pushing the idea of Charter Cities. Charter Cities involve Hong Kong-like exclaves of a developing country where the institutions and legal systems are farmed out to a more stable nation. Totally reasonable, but in fact quite controversial: a charter city proposal in Madagascar led to a coup, and I can easily imagine that the Charter City controversy delayed Romer’s well-deserved Nobel laurel. The New York Times points out that Nordhaus’ brother helped write the Clean Air Act of 1970. Finally, as is always true with the Nobel, the official scientific summary is lucid and deep in its exploration of the two winners’ work.

The 2018 Fields Medal and its Surprising Connection to Economics!

The Fields Medal and Nevanlinna Prizes were given out today. They represent the highest honor possible for young mathematicians and theoretical computer scientists, and are granted only once every four years. The mathematics involved is often very challenging for outsiders. Indeed, the most prominent of this year’s winners, the German Peter Scholze, is best known for his work on “perfectoid spaces”, and I honestly have no idea how to begin explaining them aside from saying that they are useful in a number of problems in algebraic geometry (the lovely field mapping results in algebra – what numbers solve y=2x – and geometry – noting that those solutions to y=2x form a line). Two of this year’s prizes, however, the Fields given to Alessio Figalli and the Nevanlinna to Constantinos Daskalakis, have a very tight connection to an utterly core question in economics. Indeed, both of those men have published work in economics journals!

The problem of interest concerns how best to sell an object. If you are a monopolist hoping to sell one item to one consumer, where the consumer’s valuation of the object is known only to the consumer but is commonly known to come from a distribution F, the mechanism that maximizes revenue is of course the Myerson auction from his 1981 paper in Math OR. The solution is simple: make a take-it-or-leave-it offer at a minimum price (or “reserve price”) which is a simple function of F. If you are selling one good and there are many buyers, then revenue is maximized by running a second-price auction with the exact same reserve price. In both cases, no potential buyer has any incentive to lie about their true valuation (the auction is “dominant strategy incentive compatible”). And further, seller revenue and expected payments for all players are identical to the Myerson auction in any other mechanism which allocates goods the same way in expectation, with minor caveats. This result is called “revenue equivalence”.
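To make the “simple function of F” concrete: with values uniform on [0,1], the virtual value is phi(v) = v - (1-F(v))/f(v) = 2v - 1, so the optimal reserve is 1/2 regardless of the number of bidders. A quick Monte Carlo sketch (my own, not from the paper) confirms that a second-price auction with that reserve beats one without it.

```python
import numpy as np

rng = np.random.default_rng(0)

def second_price_revenue(n_bidders, reserve, n_sims=200_000):
    """Expected revenue of a second-price auction with a reserve, values i.i.d. U[0,1].
    The item sells only if the highest value clears the reserve; the winner then pays
    the larger of the reserve and the second-highest value."""
    v = rng.uniform(0.0, 1.0, size=(n_sims, n_bidders))
    v.sort(axis=1)
    high, second = v[:, -1], v[:, -2]
    price = np.where(high >= reserve, np.maximum(second, reserve), 0.0)
    return price.mean()

# For U[0,1] the revenue-maximizing reserve solves phi(r) = 2r - 1 = 0, i.e. r = 0.5.
for r in (0.0, 0.5):
    print(f"reserve {r:.1f}, 2 bidders: expected revenue ≈ {second_price_revenue(2, r):.3f}")
# roughly 0.333 without a reserve versus 0.417 with the Myerson reserve
```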

The Myerson paper is an absolute blockbuster. The revelation principle, the revenue equivalence theorem, and a solution to the optimal selling mechanism problem all in the same paper? I would argue it’s the most important result in economics since Arrow-Debreu-McKenzie, with the caveat that many of these ideas were “in the air” in the 1970s with the early ideas of mechanism design and Bayesian game theory. The Myerson result is also really worrying if you are concerned with general economic efficiency. Note that the reserve price means that the seller is best off sometimes not selling the good to anyone, in case all potential buyers have private values below the reserve price. But this is economically inefficient! We know that there exists an allocation mechanism which is socially efficient even when people have private information about their willingness to pay: the Vickrey-Clarke-Groves mechanism. This means that market power plus asymmetric information necessarily destroys social surplus. You may be thinking we know this already: an optimal monopoly price in classic price theory generates deadweight loss. But recall that a perfectly-price-discriminating monopolist sells to everyone whose willingness-to-pay exceeds the seller’s marginal cost of production, hence the only reason monopoly generates deadweight loss in a world with perfect information is that we constrain it to a “mechanism” called a fixed price. Myerson’s result is much worse: even letting a monopolist use any mechanism, and price discriminate however it likes, asymmetric information necessarily destroys surplus!

Despite this great result, there remain two enormous open problems. First, how should we sell a good when we will interact with the same buyer(s) in the future? Recall the Myerson auction involves bidders truthfully revealing their willingness to pay. Imagine that tomorrow, the seller will sell the same object. Will I reveal my willingness to pay truthfully today? Of course not! If I did, tomorrow the seller would charge the bidder with the highest willingness-to-pay exactly that amount. Ergo, today bidders will shade down their bids. This is called the “ratchet effect”, and despite a lot of progress in dynamic mechanism design, we have still not fully solved for the optimal dynamic mechanism in all cases.

The other challenging problem is one seller selling many goods, where willingness to pay for one good is related to willingness to pay for the others. Consider, for example, selling cable TV. Do you bundle the channels together? Do you offer a menu of possible bundles? This problem is often called “multidimensional screening”, because you are attempting to “screen” buyers such that those with high willingness to pay for a particular good actually pay a high price for that good. The optimal multidimensional screen is a devil of a problem. And it is here that we return to the Fields and Nevanlinna prizes, because they turn out to speak precisely to this problem!

What could possibly be the connection between high-level pure math and this particular pricing problem? The answer comes from the 18th century mathematician Gaspard Monge, founder of the Ecole Polytechnique. He asked the following question: what is the cheapest way to move mass from X to Y, such as moving apples from a bunch of distribution centers to a bunch of supermarkets. It turns out that without convexity or linearity assumptions, this problem is very hard, and it was not solved until the late 20th century. Leonid Kantorovich, the 1975 Nobel winner in economics, paved the way for this result by showing that there is a “dual” problem where instead of looking for the map from X to Y, you look for the probability that a given mass in Y comes from X. This dual turns out to be useful in that there exists an object called a “potential” which helps characterize the optimal transport problem solution in a much more tractable way than searching across any possible map.
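The discrete version of the Monge-Kantorovich problem is just a linear program, which makes the “moving apples” story easy to compute directly; the supplies, demands, and costs below are invented numbers. We minimize total shipping cost subject to each center shipping out its supply and each store receiving its demand, and the dual values of those constraints are exactly the Kantorovich “potentials” mentioned above.

```python
import numpy as np
from scipy.optimize import linprog

# Invented example: three distribution centers, two supermarkets
supply = np.array([30.0, 40.0, 30.0])       # apples available at each center
demand = np.array([55.0, 45.0])             # apples needed at each store
cost = np.array([[4.0, 6.0],
                 [2.0, 5.0],
                 [7.0, 1.0]])               # cost of shipping one unit from i to j

m, n = cost.shape
c = cost.flatten()                          # variables x_ij in row-major order

A_eq = np.zeros((m + n, m * n))
for i in range(m):
    A_eq[i, i * n:(i + 1) * n] = 1.0        # row sums equal supply
for j in range(n):
    A_eq[m + j, j::n] = 1.0                 # column sums equal demand
b_eq = np.concatenate([supply, demand])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.x.reshape(m, n))                  # optimal shipment plan
print(res.fun)                              # minimal total transport cost
```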

Note the link between this problem and our optimal auction problem above, though! Instead of moving mass most cheaply from X to Y, we are looking to maximize revenue by assigning objects Y to people with willingness-to-pay drawn from X. So no surprise, the solution to the optimal transport problem when X has a particular structure and the solution to the revenue maximizing mechanism problem are tightly linked. And luckily for us economists, many of the world’s best mathematicians, including 2010 Fields winner Cedric Villani, and this year’s winner Alessio Figalli, have spent a great deal of effort working on exactly this problem. Ivar Ekeland has a nice series of notes explaining the link between the two problems in more detail.

In a 2017 Econometrica, this year’s Nevanlinna winner Daskalakis and his coauthors Alan Deckelbaum and Christos Tzamos show precisely how to use strong duality in the optimal transport problem to solve the general optimal mechanism problem when selling multiple goods. The paper is very challenging, requiring some knowledge of measure theory, duality theory, and convex analysis. That said, the conditions they give to check an optimal solution, and the method to find the optimal solution, involve a reasonably straightforward series of inequalities. In particular, the optimal mechanism involves dividing the hypercube of potential types into (perhaps infinitely many) regions which get assigned the same prices and goods (for example, “you get good A and good B together with probability p at price X”, or “if you are unwilling to pay p1 for A, p2 for B, or p for both together, you get nothing”).

This optimal mechanism has some unusual properties. Remember that the Myerson auction for one buyer is “simple”: make a take-it-or-leave-it offer at the reserve price. You may think that if you are selling many items to one buyer, you would likewise choose a reserve price for the whole bundle, particularly when the number of goods with independently distributed values becomes large. For instance, if there are 1000 cable channels, and a buyer has value distributed uniformly between 0 and 10 cents for each channel, then by a limit-theorem-type argument it’s clear that the willingness to pay for the whole bundle is quite close to 50 bucks. So you may think, just price at a bit lower than 50. However, Daskalakis et al show that when there are sufficiently many goods with i.i.d. uniformly-distributed values, it is never optimal to just set a price for the whole bundle! It is also possible to show that the best mechanism often involves randomization, where buyers who report that they are willing to pay X for item a and Y for item b will only get the items with probability less than 1 at a specified price. This is quite contrary to my intuition, which is that in most mechanism problems, we can restrict focus to deterministic assignment. It was well-known that multidimensional screening has weird properties; for example, Hart and Reny show that an increase in buyer valuations can cause seller revenue from the optimal mechanism to fall. The techniques Daskalakis and coauthors develop allow us to state exactly what we ought to do in situations previously unresolved in the literature, such as when we know we need mechanisms more complicated than “sell the whole bundle at price p”.
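A quick simulation makes the limit argument vivid (the channel count and value distribution come from the example above; the buyer count and candidate prices are my own): the bundle’s value concentrates tightly around $50, so a posted price a little below that captures nearly the whole surplus and dwarfs selling channels one at a time, even though, as Daskalakis and coauthors show, pure bundling is still not exactly optimal.

```python
import numpy as np

rng = np.random.default_rng(0)
n_goods, n_buyers = 1000, 10_000
values = rng.uniform(0.0, 0.10, size=(n_buyers, n_goods))   # per-channel WTP in dollars

# Selling channels separately: for U[0, 0.10] the optimal posted price per channel is
# 0.05, which sells half the time, so expected revenue is 1000 * 0.05 * 0.5 = $25.
separate_revenue = n_goods * 0.05 * 0.5

# Pure bundling: total WTP concentrates near $50 by the law of large numbers.
bundle_values = values.sum(axis=1)
for price in (47.0, 48.0, 49.0):
    revenue = price * (bundle_values >= price).mean()
    print(f"bundle at ${price:.0f}: expected revenue ≈ ${revenue:.2f} "
          f"(separate sales ≈ ${separate_revenue:.2f})")
```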

The history of economics has been a long series of taking tools from the frontier of mathematics, from the physics-based analogues of the “marginalists” in the 1870s, to the fixed point theorems of the early game theorists, the linear programming tricks used to analyze competitive equilibrium in the 1950s, and the tropical geometry recently introduced to auction theory by Elizabeth Baldwin and Paul Klemperer. We are now making progress on pricing issues that have stumped some of the great theoretical minds in the history of the field. Multidimensional screening is an incredibly broad topic: how ought we regulate a monopoly with private fixed and marginal costs, how ought we tax agents who have private costs of effort and opportunities, how ought a firm choose wages and benefits, and so on. Knowing the optimum is essential when it comes to understanding when we can use simple, nearly-correct mechanisms. Just in the context of pricing, using tricks related to those of Daskalakis, Gabriel Carroll showed in a recent Econometrica that bundling should be avoided when the principal has limited knowledge about the correlation structure of types, and my old grad school friend Nima Haghpanah has shown, in a paper with Jason Hartline, that firms should only offer high-quality and low-quality versions of their products if consumers’ values for the high-quality good and their relative value for the low versus high quality good are positively correlated. Neither of these results is trivial to prove. Nonetheless, a hearty cheers to our friends in pure mathematics who continue to provide us with the tools we need to answer questions at the very core of economic life!