Author Archives: afinetheorem

Nobel Prize 2016 Part I: Bengt Holmstrom

The Nobel Prize in Economics has been announced, and what a deserving prize it is: Bengt Holmstrom and Oliver Hart have won for the theory of contracts. The name of this research weblog is “A Fine Theorem”, and it would be hard to find two economists whose work is more likely to elicit such a description! Both are incredibly deserving; more than five years ago on this site, I discussed how crazy it was that Holmstrom had yet to win! The only shock is the combination: a more natural prize would have been Holmstrom with Paul Milgrom and Robert Wilson for modern applied mechanism design, and Oliver Hart with John Moore and Sandy Grossman for the theory of the firm. The contributions of Holmstrom and Hart are so vast that I’m splitting this post into two, so as to properly cover the incredible intellectual accomplishments of these two economists.

The Finnish economist Bengt Holmstrom did his PhD in operations research at Stanford, advised by Robert Wilson, and began his career at my alma mater, the tiny department of Managerial Economics and Decision Sciences at Northwestern’s Kellogg School. To say MEDS struck gold with their hires in this era is an extreme understatement: in 1978 and 1979 alone, they hired Holmstrom and his classmate Paul Milgrom (another Wilson student from Stanford), hired Nancy Stokey, promoted Nobel laureate Roger Myerson to Associate Professor, and tenured an adviser of mine, Mark Satterthwaite. And this list doesn’t even include other faculty in the late 1970s and early 1980s like eminent contract theorist John Roberts, behavioralist Colin Camerer, mechanism designer John Ledyard or game theorist Ehud Kalai. This group was essentially put together by two senior economists at Kellogg, Nancy Schwartz and Stanley Reiter, who had the incredible foresight to realize both that applied game theory was finally showing promise of tackling first-order economic questions in a rigorous way, and that the folks with the proper mathematical background to tackle these questions were largely going unhired since they often did their graduate work in operations or mathematics departments rather than traditional economics departments. This market inefficiency, as it were, allowed Nancy and Stan to hire essentially every young scholar in what would become the field of mechanism design, and to develop a graduate program which combined operations, economics, and mathematics in a manner unlike any other place in the world.

From that fantastic group, Holmstrom’s contribution lies most centrally in the area of formal contract design. Imagine that you want someone – an employee, a child, a subordinate division, an aid contractor, or more generally an agent – to perform a task. How should you induce them to do this? If the task is “simple”, meaning the agent’s effort and knowledge about how to perform the task most efficiently is known and observable, you can simply pay a wage, cutting off payment if effort is not being exerted. When only the outcome of work can be observed, if there is no uncertainty in how effort is transformed into outcomes, knowing the outcome is equivalent to knowing effort, and hence optimal effort can be achieved via a bonus payment made on the basis of outcomes. All straightforward so far. The trickier situations, which Holmstrom and his coauthors analyzed at great length, are when neither effort nor outcomes are directly observable.

Consider paying a surgeon. You want to reward the doctor for competent, safe work. However, it is very difficult to observe perfectly what the surgeon is doing at all times, and basing pay on outcomes has a number of problems. First, the patient outcome depends on the effort of not just one surgeon, but on others in the operating room and prep table: team incentives must be provided. Second, the doctor has many ways to shift the balance of effort between reducing costs to the hospital, increasing patient comfort, increasing the quality of the medical outcome, and mentoring young assistant surgeons, so paying on the basis of one or two tasks may distort effort away from other harder-to-measure tasks: there is a multitasking problem. Third, the number of medical mistakes, or the cost of surgery, that a hospital ought expect from a competent surgeon depends on changes in training and technology that are hard to know, and hence a contract may want to adjust payments for its surgeons on the performance of surgeons elsewhere: contracts ought take advantage of relevant information when it is informative about the task being incentivized. Fourth, since surgeons will dislike risk in their salary, the fact that some negative patient outcomes are just bad luck means that you will need to pay the surgeon very high bonuses to overcome their risk aversion: when outcome measures involve uncertainty, optimal contracts will weigh “high-powered” bonuses against “low-powered” insurance against risk. Fifth, the surgeon can be incentivized either by payments today or by keeping their job tomorrow, and worse, these career concerns may cause the surgeon to waste the hospital’s money on tasks which matter to the surgeon’s career beyond the hospital.

Holmstrom wrote the canonical paper on each of these topics. His 1979 paper in the Bell Journal of Economics shows that any information which reduces the uncertainty about what an agent actually did should feature in a contract, since by reducing uncertainty, you reduce the risk premium needed to incentivize the agent to accept the contract. It might seem strange that contracts in many cases do not satisfy this “informativeness principle”. For instance, CEO bonuses are often not indexed to the performance of firms in the same industry. If oil prices rise, essentially all oil firms will be very profitable, and this is true whether or not a particular CEO is a good one. Bertrand and Mullainathan argue that this is because many firms with diverse shareholders are poorly governed!
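To see the informativeness principle at work, here is a minimal sketch in a stylized setting that is my own assumption rather than the general model of the 1979 paper: a CARA agent, normal noise, and a linear wage a + b·signal. Output is effort plus a common industry shock plus idiosyncratic noise, and a peer index observes the common shock with error. All parameter values are hypothetical. Netting out the index shrinks the noise in the performance measure, which both lowers the risk premium for any given bonus and lets the firm offer a stronger bonus.

```python
# Illustrative CARA-normal sketch of the informativeness principle.
# Functional forms and numbers are assumptions, not taken from Holmstrom (1979).

def optimal_bonus(risk_aversion, effort_cost, signal_var):
    """Bonus rate b* = 1 / (1 + r * k * var) when the wage is a + b * signal,
    effort costs k * e^2 / 2, and the signal is effort plus normal noise."""
    return 1.0 / (1.0 + risk_aversion * effort_cost * signal_var)


def risk_premium(risk_aversion, bonus, signal_var):
    """Compensation a CARA agent requires for bearing wage risk: r * b^2 * var / 2."""
    return 0.5 * risk_aversion * bonus ** 2 * signal_var


def filtered_variance(var_common, var_idio, var_peer_noise):
    """Noise variance of x - g*z, where x = effort + common + idio and
    z = common + peer_noise, using the variance-minimizing weight g."""
    weight = var_common / (var_common + var_peer_noise)
    return (1 - weight) ** 2 * var_common + var_idio + weight ** 2 * var_peer_noise


R, K = 2.0, 1.0  # hypothetical risk aversion and effort-cost parameters
VAR_COMMON, VAR_IDIO, VAR_PEER_NOISE = 1.0, 0.25, 0.25  # hypothetical variances

raw_var = VAR_COMMON + VAR_IDIO  # pay on own output only
indexed_var = filtered_variance(VAR_COMMON, VAR_IDIO, VAR_PEER_NOISE)

for label, var in [("own output only", raw_var), ("indexed to peers", indexed_var)]:
    b = optimal_bonus(R, K, var)
    print(f"{label}: noise variance={var:.2f}, bonus={b:.3f}, "
          f"risk premium={risk_premium(R, b, var):.3f}")
```

In this sketch the oil-price example corresponds to a large common shock: the bigger `VAR_COMMON` is relative to `VAR_PEER_NOISE`, the more a contract gains by indexing.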

The simplicity of contracts in the real world may have more prosaic explanations. The famous “multitasking” paper, written jointly with Paul Milgrom and published in JLEO in 1991, notes that contracts shift incentives across different tasks in addition to serving as risk-sharing mechanisms and as methods for inducing effort. Since bonuses on task A will cause agents to shift effort away from hard-to-measure task B, it may be optimal to avoid strong incentives at all (just pay teachers a salary rather than a bonus based only on test performance) or to split job tasks (pay bonuses to teacher A who is told to focus only on math test scores, and pay salary to teacher B who is meant to serve as a mentor). That outcomes are generated by teams also motivates simpler contracts. Holmstrom’s 1982 article on incentives in teams, published in the Bell Journal, points out that if both my effort and yours are required to produce a good outcome, then the marginal products of our efforts are each equal to the entire value of what is produced, hence there is not enough output to pay each of us our marginal product. What can be done? Alchian and Demsetz had noticed this problem in 1972, arguing that firms exist to monitor the effort of individuals working in teams. With perfect knowledge of who does what, you can simply pay the workers a wage sufficient to make the optimal effort, then collect the residual as profit. Holmstrom notes that the monitoring isn’t the important bit: rather, even shareholder-controlled firms where shareholders do no monitoring at all are useful. The reason is that shareholders can be residual claimants for profit, and hence there is no need to fully distribute profit to members of the team. Free-riding can therefore be eliminated by simply paying team members a wage of X if the team outcome is optimal, and 0 otherwise.
Even a slight bit of shirking by a single agent drops their payment precipitously (which is impossible if all profits generated by the team are shared by the team), so the agents will not shirk. Of course, when there is uncertainty about how team effort transforms into outcomes, this harsh penalty will not work, and hence incentive problems may require team sizes to be smaller than that which is first-best efficient. A third justification for simple contracts is career concerns: agents work hard today to try to signal to the market that they are high quality, and do so even if they are paid a fixed wage. This argument had been made less formally by Fama, but Holmstrom (in a 1982 working paper finally published in 1999 in RESTUD) showed that this concern about the market only completely mitigates moral hazard if outcomes within a firm are fully observable to the market, or the future is not discounted at all, or there is no uncertainty about agents’ abilities. Indeed, career concerns can make effort provision worse; for example, agents may take actions to signal quality to the market which are negative for their current firm! A final explanation for simple contracts comes from Holmstrom’s 1987 paper with Milgrom in Econometrica. They argue that simple “linear” contracts, with a wage and a bonus based linearly on output, are more “robust” methods of solving moral hazard because they are less susceptible to manipulation by agents when the environment is not perfectly known. Michael Powell, a student of Holmstrom’s now at Northwestern, has a great set of PhD notes providing details of these models.
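The team-incentives argument above can be checked numerically. The following is a toy sketch under assumptions of my own choosing, not the model of the 1982 paper: two agents, output equal to the sum of efforts, and effort cost e²/2, so first-best effort is 1 each. It compares an agent’s best response under a budget-balanced equal split with the best response under a “wage of X if the team hits the target, 0 otherwise” contract, where an outside residual claimant absorbs any shortfall penalty.

```python
# Toy team-production example; technology, costs, and wage are illustrative assumptions.
# Output x = e1 + e2, effort cost e^2 / 2, so first-best effort is 1 for each agent.

GRID = [i / 100 for i in range(0, 201)]  # candidate effort levels 0.00, 0.01, ..., 2.00


def cost(effort):
    return 0.5 * effort * effort


def best_response(pay_rule, other_effort):
    """Effort on the grid that maximizes pay minus effort cost, given the teammate's effort."""
    return max(GRID, key=lambda e: pay_rule(e, other_effort) - cost(e))


def equal_share(own, other):
    """Budget-balanced partnership: each agent receives half of team output."""
    return 0.5 * (own + other)


def forcing_contract(own, other, target=2.0, wage=1.0):
    """Pay the wage only if team output reaches the first-best target, else nothing."""
    return wage if own + other >= target - 1e-9 else 0.0


share_effort = best_response(equal_share, 1.0)
forcing_effort = best_response(forcing_contract, 1.0)
print("best response under equal sharing:", share_effort)
print("best response under forcing contract:", forcing_effort)
```

Under equal sharing the agent captures only half of the marginal product and so supplies half the efficient effort, even when the teammate works at the first-best level. Under the forcing contract, working at the first-best level is a best response to a teammate doing the same. One caveat the sketch makes easy to verify: if the teammate shirks completely, so does the best response, so this contract supports the efficient outcome as an equilibrium rather than guaranteeing it.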

These ideas are reasonably intuitive, but the way Holmstrom formalized them is not. Think about how an economist before the 1970s, like Adam Smith in his famous discussion of the inefficiency of sharecropping, might have dealt with these problems. These economists had few tools to deal with asymmetric information, so although economists like George Stigler analyzed the economic value of information, the question of how to elicit information useful to a contract could not be discussed in any systematic way. These economists would have been burdened by the fact that the number of contracts one could write is infinite, so beyond saying that a contract of type X does not equate marginal cost to marginal revenue, the question of which “second-best” contract is optimal is extraordinarily difficult to answer in the absence of beautiful tricks like the revelation principle partially developed by Holmstrom himself. To develop those tricks, a theory of how individuals would respond to changes in their joint incentives over time was needed; the ideas of Bayesian equilibria and subgame perfection, developed by Harsanyi and Selten, were unknown before the 1960s. The accretion of tools developed by pure theory finally permitted, in the late 1970s and early 1980s, an absolute explosion of developments of great use to understanding the economic world. Consider, for example, the many results in antitrust provided by Nobel winner Jean Tirole, discussed here two years ago.

Holmstrom’s work has provided me with a great deal of understanding of why innovation management looks the way it does. For instance, why would a risk neutral firm not work enough on high-variance moonshot-type R&D projects, a question Holmstrom asks in his 1989 JEBO paper “Agency Costs and Innovation”? Four reasons. First, in Holmstrom and Milgrom’s 1987 linear contracts paper, optimal risk sharing leads to more distortion by agents the riskier the project being incentivized, so firms may choose lower expected value projects even if they themselves are risk neutral. Second, firms build reputation in capital markets just as workers do with career concerns, and high variance output projects are more costly in terms of the future value of that reputation when the interest rate on capital is lower (e.g., when firms are large and old). Third, when R&D workers can potentially pursue many different projects, multitasking suggests that workers should be given small and very specific tasks so as to lessen the potential for bonus payments to shift worker effort across projects. Smaller firms with fewer resources may naturally have limits on the types of research a worker could pursue, which surprisingly makes it easier to provide strong incentives for research effort on the remaining possible projects. Fourth, multitasking suggests agents’ tasks should be limited, and that high variance tasks should be assigned to the same agent, which provides a role for decentralizing research into large firms providing incremental, safe research, and small firms performing high-variance research. That many aspects of firm organization depend on the swirl of conflicting incentives the firm and the market provide is a topic Holmstrom has also discussed at length, especially in his beautiful paper “The Firm as an Incentive System”; I shall reserve discussion of that paper for a subsequent post on Oliver Hart.
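The first of those four reasons is easy to illustrate with a back-of-the-envelope calculation. The sketch below assumes a textbook CARA-normal linear-contract setup (output = productivity × effort + noise, effort cost k·e²/2, wage a + b·output); the two projects and every number are hypothetical. With a risk neutral principal, the optimal bonus is b* = m²/(m² + r·k·σ²) and total surplus is m²·b*/(2k), so noise eats into the value of a project even though the firm itself does not mind risk.

```python
# Project choice under a linear incentive contract: an illustrative CARA-normal sketch.
# Output = productivity * effort + noise; all project parameters are hypothetical.

def linear_contract(productivity, noise_var, risk_aversion=1.0, effort_cost=1.0):
    """Return (optimal bonus rate, total surplus) under the optimal linear contract."""
    m2 = productivity ** 2
    bonus = m2 / (m2 + risk_aversion * effort_cost * noise_var)
    surplus = m2 * bonus / (2.0 * effort_cost)
    return bonus, surplus


def first_best_surplus(productivity, effort_cost=1.0):
    """Surplus if effort were contractible, so that no risk need be imposed on the agent."""
    return productivity ** 2 / (2.0 * effort_cost)


PROJECTS = {
    "incremental": {"productivity": 1.0, "noise_var": 0.5},
    "moonshot": {"productivity": 1.2, "noise_var": 4.0},
}

results = {}
for name, params in PROJECTS.items():
    bonus, surplus = linear_contract(**params)
    results[name] = surplus
    print(f"{name}: first-best={first_best_surplus(params['productivity']):.3f}, "
          f"bonus={bonus:.3f}, surplus with agency={surplus:.3f}")
```

With these made-up numbers the moonshot is the better project when effort is contractible, yet the incremental project generates more surplus once the researcher must be insured against noise she cannot control.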

Two final light notes on Holmstrom. First, he is the source of one of my favorite stories about Paul Samuelson, the greatest economic theorist of all time. Samuelson was known for having a steel trap of a mind. At a light trivia session during a house party for young faculty at MIT, Holmstrom snuck in a question, as a joke, asking for the name of the third President of independent Finland. Samuelson not only knew the name, but apparently was also able to digress on the man’s accomplishments! Second, I mentioned at the beginning of this post the illustrious roster of theorists who once sat at MEDS. Business school students are often very hesitant to deal with formal models, partially because they lack a technical background but also because there is a trend of “dumbing down” in business education whereby many schools (of course, not including my current department at The University of Toronto Rotman!) are more worried about student satisfaction than student learning. With perhaps Stanford GSB as an exception, it is inconceivable that any school today, Northwestern included, would gather such an incredible collection of minds working on abstract topics whose applicability to tangible business questions might lie years in the future. Indeed, I could name a number of so-called “top” business schools who have nobody on their faculty who has made any contribution of note to theory! There is a great opportunity for a Nancy Schwartz or Stan Reiter of today to build a business school whose students will have the ultimate reputation for rigorous analysis of social scientific questions.


“Patents as a Spur to Subsequent Innovation: Evidence from Pharmaceuticals,” D. Gilchrist (2016)

Many economists of innovation are hostile to patents as they currently stand: they do not seem to be important drivers of R&D in most industries, the market power they lead to generates substantial deadweight loss, the legal costs around enforcing patents are incredible, and the effect on downstream innovation can be particularly harmful. The argument for patents seems most clear cut in industries where the invention requires large upfront fixed costs of R&D that are paid only by the first inventor, where the invention is clearly delineated, where novelty is easy to understand, and where alternative means of inducing innovation (such as market power in complementary markets, or a large first mover advantage) do not exist. The canonical example of an industry of this type is pharma.

Duncan Gilchrist points out that the market power a patentholder obtains also affects the rents of partial substitutes which might be invented later. Imagine there is a blockbuster statin on patent. If I invent a related drug, the high price of the existing patented drug means I can charge a fairly high price too. If the blockbuster drug were off patent, though, my competitors would be generics whose low price would limit how much I can charge. In other words, the “effective” patent strength in terms of the markup I can charge depends on whether alternatives to my new drug are on patent or are generic. Therefore, the profits I will earn from my drug will be lower when alternative generics exist, and hence my incentive to pay a fixed cost to create the new drug will also be lower.

What does this mean for welfare? A pure “me-too” imitation drug, which generates very little social value compared to the existing patented drug, will never enter if its class is going to see generics in a few years anyway; profits will be competed down to zero. That same drug might find it worthwhile to pay a fixed cost of invention and earn duopoly profits if the existing on-patent alternative had many years of patent protection remaining. On the other hand, a drug so much better than existing drugs that even at the pure monopoly price most consumers would prefer it to the existing alternative priced at marginal cost will be developed no matter what, since it faces no de facto restriction on its markup from whether the alternatives in its drug class are generics or otherwise. Therefore, longer patent protection from existing drugs increases entry of drugs in the same class, but mainly those that are only a bit better than existing drugs. This may be better or worse for welfare: there is a wasteful cost of entering with a drug only slightly better than what exists (the private return includes the business stealing, while social welfare doesn’t), but there are also lower prices and perhaps some benefit from variety.
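The entry logic in the last two paragraphs can be captured in a few lines. This is a deliberately crude sketch of my own, not Gilchrist’s model: the entrant can charge at most the rival’s price plus its quality advantage, capped at its own monopoly price, it serves a fixed number of patients, and it enters if the resulting margin covers a fixed R&D cost. Every number is hypothetical.

```python
# Crude entry model for a follow-on drug; the pricing rule and all numbers are assumptions.

MONOPOLY_PRICE = 10.0  # price the entrant would charge with no substitute
MARGINAL_COST = 1.0    # production cost, also the assumed generic price
PATENT_PRICE = 8.0     # price of the incumbent drug while still on patent
PATIENTS = 100
FIXED_COST = 300.0     # upfront R&D cost of the new drug


def entrant_price(quality_gap, rival_price):
    """Highest price at which patients still prefer the entrant, capped at monopoly."""
    return min(rival_price + quality_gap, MONOPOLY_PRICE)


def enters(quality_gap, rival_price):
    """Enter only if operating profit covers the fixed R&D cost."""
    margin = entrant_price(quality_gap, rival_price) - MARGINAL_COST
    return margin * PATIENTS >= FIXED_COST


for label, gap in [("me-too", 0.5), ("breakthrough", 9.0)]:
    print(label,
          "| rival on patent:", enters(gap, PATENT_PRICE),
          "| rival generic:", enters(gap, MARGINAL_COST))
```

The me-too drug enters only while the incumbent is on patent, whereas the breakthrough drug’s price cap never binds and it enters either way, which is the asymmetry the priority-review evidence speaks to.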

I should note a caveat that really should have been noted in the existing model: changes in de facto patent length for the first drug in class also affect the entry decision of that drug. Longer patent protection may actually cause shorter effective monopoly by inducing entry of imitators! This paper is mainly empirical, so no need for a full Aghion Howitt ’92 model of creative destruction, but it is at least worth noting that the welfare implications of changes in patent protection are somewhat misstated because of this omission.

Empirically, Gilchrist shows clearly that the beginning of new clinical trials for drugs falls rapidly as the first drug in their class has less time remaining on patent: fear of competition with generic partial substitutes dulls the incentive to innovate. The results are clear in straightforward scatterplots, but there is also an IV, to help confirm the causal interpretation, using the gap between the first potentially-defensive patent on the fulcrum patent of the eventual drug, and the beginning of clinical trials, a gap that is driven by randomness in things like unexpected delays in in-house laboratory progress. Using the fact that particularly promising drugs get priority FDA review, Gilchrist also shows that these priority review entrants do not seem to be worried at all about competition from generic substitutes: the “me-too” type of drugs are the ones for whom alternatives going off patent is most damaging to profits.

Final published version in AEJ: Applied 8(4) (No RePEc IDEAS version). Gilchrist is a rare example of a well-published young economist working in the private sector; he has a JPE on social learning and a Management Science on behavioral labor in addition to the present paper, but works at robo-investor Wealthfront. In my now six-year dataset of the economics job market (which I should discuss again at some point), roughly 2% of “job market stars” wind up outside academia. Budish, Roin and Williams used the similar idea of investigating the effect of patents on innovation by taking advantage of the differing effective patent length drugs for various maladies get as a result of differences in the length of clinical trials following the patent grant. Empirical work on the effect of patent rules is, of course, very difficult since de jure patent strength is very similar in essentially every developed country and every industry; taking advantage of differences in de facto strength is surely a trick that will be applied more broadly.

“Scale versus Scope in the Diffusion of New Technology,” D. Gross (2016)

I am spending part of the fall down at Duke University visiting the well-known group of innovation folks at Fuqua and co-teaching a PhD innovation course with Wes Cohen, who you may know via his work on Absorptive Capacity (EJ, 1989), the “Carnegie Mellon” survey of inventors with Dick Nelson and John Walsh, and his cost sharing R&D argument (article gated) with Steven Klepper. Last week, the class went over a number of papers on the diffusion of technology over space and time, a topic of supreme importance in the economics of innovation.

There are some canonical ideas in diffusion. First, cumulative adoption on the extensive margin – are you or your firm using technology X – follows an S-curve, rising slowly, then rapidly, then slowly again until peak adoption is reached. This fact is known to economists thanks to Griliches 1957 but the idea was initially developed by social psychologists and sociologists. Second, there are massive gaps in the ability of firms and nations to adopt and quickly diffuse new technologies – Diego Comin and Burt Hobijn have written a great deal on this problem. Third, the reason why technologies are slow to adopt depends on many factors, including social learning (e.g., Conley and Udry on pineapple growing in Ghana), pure epidemic-style network spread (the “Bass model”), capital replacement, “appropriate technologies” arriving once conditions are appropriate, and many more.

One that is very much underrated, however, is that technologies diffuse because they and their complements change over time. Dan Gross from HBS, another innovation scholar who likes delving into history, has a great example: the early tractor. The tractor was, in theory, invented in the 1800s, but was uneconomical and not terribly useful. With an invention by Ford in the 1910s, tractors began to spread, particularly in the US wheat belt. The tractor eventually spread to the rest of the Midwest in the late 1920s and 1930s. A back-of-the-envelope calculation by Gross suggests the latter diffusion saved something like 10% of agricultural labor in the areas where it spread. Why, then, was there such a lag in many states?

There are many hypotheses in the literature: binding financial constraints, differences in farm sizes that make tractors feasible in one area and not another, geographic spread via social learning, and so on. Gross’ explanation is much more natural: early tractors could not work with crops like corn, and it wasn’t until after a general purpose tractor was invented in the 1920s that complementary technologies were created allowing the tractor to be used on a wide variety of farms. The charts are wholly convincing on this point: tractor diffusion time is very much linked to dominant crop, the early tractor “skipped” geographies where it was inappropriate, and farms in areas where tractors diffused late nonetheless had substantial diffusion of automobiles, suggesting capital constraints were not the binding factor.

But this leaves one more question: why didn’t someone modify the tractor to make it general purpose in the first place? Gross gives a toy model that elucidates the reason quite well. Assume there is a large firm that can innovate on a technology, and can develop either a general purpose version or applied versions of the technology. Assume that there is a fringe of firms that can develop complementary technology to the general purpose one (a corn harvester, for instance). If the large firm is constrained in how much innovation it can perform at any one time, it will first work on the project with highest return. If the large firm could appropriate the rents earned by complements – say, via a licensing fee – it would like to do so, but that licensing fee would decrease the incentive to develop the complements in the first place. Hence the large firm may first work on direct applications where it can capture a larger share of rents. This will imply that technology diffuses slowly at first because applications are very specialized, then only as the high-return specialties have all been developed will it become worthwhile to shift researchers over to the general purpose technology. The general purpose technology will induce complements and hence rapid diffusion. As adoption becomes widespread, the rate of adoption slows down again. That is, the S-curve is merely an artifact of differing incentives to change the scope of an invention. Much more convincing than reliance on behavioral biases!

2016 Working Paper (RePEc IDEAS version). I have a paper with Jorge Lemus at Illinois on the problem of incentivizing firms to work on the right type of project, and the implications thereof. We didn’t think in terms of product diffusion, but the incentive to create general purpose technologies can absolutely be added straight into a model of that type.

Reinhard Selten and the making of modern game theory

Reinhard Selten, it is no exaggeration, is a founding father of two massive branches of modern economics: experiments and industrial organization. He passed away last week after a long and idiosyncratic life. Game theory as developed by the three co-Nobel laureates Selten, Nash, and Harsanyi is so embedded in economic reasoning today that, to a great extent, it has replaced price theory as the core organizing principle of our field. That this would happen was not always so clear, however.

Take a look at some canonical papers before 1980. Arrow’s Possibility Theorem simply assumed true preferences can be elicited; not until Gibbard and Satterthwaite do we answer the question of whether there is even a social choice rule that can elicit those preferences truthfully! Rothschild and Stiglitz’s celebrated 1976 essay on imperfect information in insurance markets defines equilibria in terms of individual rationality, best responses in the Cournot sense, and free entry. How odd this seems today – surely the natural equilibrium in an insurance market depends on beliefs about the knowledge held by others, and beliefs about those beliefs! Analyses of bargaining before Rubinstein’s 1982 breakthrough nearly always rely on axioms of psychology rather than strategic reasoning. Discussions of predatory pricing until the 1970s, at the very earliest, relied on arguments that we now find unacceptably loose in their treatment of beliefs.

What happened? Why didn’t modern game-theoretic treatment of strategic situations – principally those that involve more than one agent but fewer than an infinite number, although even situations of perfect competition now often are motivated game theoretically – arrive soon after the proofs of von Neumann, Morgenstern, and Nash? Why wasn’t the Nash program, of finding justification in self-interested noncooperative reasoning for cooperative or axiom-driven behavior, immediately taken up? The problem was that the core concept of the Nash equilibrium simply permits too great a multiplicity of outcomes, some of which feel natural and others of which are less so. As such, a long search, driven essentially by a small community of mathematicians and economists, attempted to find the “right” refinements of Nash. And a small community it was: I recall Drew Fudenberg telling a story about a harrowing bus ride at an early game theory conference, where a fellow rider mentioned offhand that should they crash, the vast majority of game theorists in the world would be wiped out in one go!

Selten’s most renowned contribution came in the idea of perfection. The concept of subgame perfection was first proposed in a German-language journal in 1965 (making it one of the rare modern economic classics inaccessible to English speakers in the original, alongside Maurice Allais’ 1953 French-language paper in Econometrica which introduces the Allais paradox). Selten’s background up to 1965 is quite unusual. A young man during World War II, raised Protestant but with one Jewish parent, Selten fled Germany to work on farms, and only finished high school at 20 and college at 26. His two interests were mathematics, for which he worked on the then-unusual extensive form game for his doctoral degree, and experimentation, inspired by the small team of young professors at Frankfurt trying to pin down behavior in oligopoly through small lab studies.

In the 1965 paper, on demand inertia (paper is gated), Selten wrote a small game theoretic model to accompany the experiment, but realized there were many equilibria. The term “subgame perfect” was not introduced until 1974, also by Selten, but the idea itself is clear in the ’65 paper. He proposed that attention should focus on equilibria where, after every action, each player continues to act rationally from that point forward; that is, he proposed that in every “subgame”, or every game that could conceivably occur after some actions have been taken, equilibrium actions must remain an equilibrium. Consider predatory pricing: a firm considers lowering price below cost today to deter entry. It is a Nash equilibrium for entrants to believe the price would continue to stay low should they enter, and hence to not enter. But it is not subgame perfect: the entrant should reason that after entering, it is not worthwhile for the incumbent to continue to lose money once the entry has already occurred.
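The gap between Nash equilibrium and subgame perfection in the entry example can be verified by brute force. Here is a small sketch with payoffs I have made up for illustration (entrant payoff first, incumbent payoff second): the entrant chooses whether to enter, and the incumbent’s strategy says what it would do if entry occurred.

```python
# One-shot entry deterrence game; the payoff numbers are hypothetical.
# Keys are (entrant action, incumbent plan after entry); values are (entrant, incumbent) payoffs.
PAYOFFS = {
    ("out", "fight"): (0, 5),
    ("out", "accommodate"): (0, 5),
    ("in", "fight"): (-1, -1),
    ("in", "accommodate"): (2, 2),
}
ENTRANT_ACTIONS = ["out", "in"]
INCUMBENT_PLANS = ["fight", "accommodate"]


def is_nash(entrant, incumbent):
    """No player gains by deviating unilaterally in the strategic form."""
    u_e, u_i = PAYOFFS[(entrant, incumbent)]
    entrant_ok = all(PAYOFFS[(e, incumbent)][0] <= u_e for e in ENTRANT_ACTIONS)
    incumbent_ok = all(PAYOFFS[(entrant, i)][1] <= u_i for i in INCUMBENT_PLANS)
    return entrant_ok and incumbent_ok


def subgame_perfect():
    """Backward induction: solve the post-entry subgame first, then the entry decision."""
    reply = max(INCUMBENT_PLANS, key=lambda i: PAYOFFS[("in", i)][1])
    entry = max(ENTRANT_ACTIONS, key=lambda e: PAYOFFS[(e, reply)][0])
    return entry, reply


nash_equilibria = [(e, i) for e in ENTRANT_ACTIONS for i in INCUMBENT_PLANS if is_nash(e, i)]
print("Nash equilibria:", nash_equilibria)
print("Subgame perfect equilibrium:", subgame_perfect())
```

The threat to fight survives as a Nash equilibrium only because it is never tested; once the post-entry subgame is solved on its own, the threat is not credible and only the enter-and-accommodate outcome remains.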

Complicated strings of deductions which rule out some actions based on faraway subgames can seem paradoxical, of course, and did even to Selten. In his famous Chain Store paradox, he considers a firm with stores in many locations choosing whether to price aggressively to deter entry, with one potential entrant in each town choosing one at a time whether to enter. Entrants prefer to enter if pricing is not aggressive, but prefer to remain out otherwise; incumbents prefer to price nonaggressively whether or not entry occurs. Reasoning backward, in the final town we have the simple one-shot predatory pricing case analyzed above, where we saw that entry is the only subgame perfect equilibrium. Therefore, the entrant in the second-to-last town knows that the incumbent will not fight entry aggressively in the final town, hence there is no benefit to doing so in the second-to-last town, hence entry occurs again. Reasoning similarly, entry occurs everywhere. But if the incumbent could commit in advance to pricing aggressively in, say, the first 10 towns, it would deter entry in those towns and hence its profits would improve. Such commitment may not be possible, but what if the incumbent’s reasoning ability is limited, and it doesn’t completely understand why aggressive pricing in early stages won’t deter the entrant in the 16th town? And what if entrants reason that the incumbent’s reasoning ability is not perfectly rational? Then aggressive pricing to deter entry can occur.

That behavior may not be perfectly rational but rather bounded had been an idea of Selten’s since he read Herbert Simon as a young professor, but in his Nobel Prize biography, he argues that progress on a suitable general theory of bounded rationality has been hard to come by. The closest Selten comes to formalizing the idea is in his paper on trembling hand perfection in 1974, inspired by conversations with John Harsanyi. The problem with subgame perfection had been noted: if an opponent takes an action off the equilibrium path, it is “irrational”, so why should rationality of the opponent be assumed in the subgame that follows? Harsanyi assumes that tiny mistakes can happen, putting even rational players into subgames. Taking the limit as mistakes become infinitesimally rare produces the idea of trembling-hand perfection. The idea of trembles implicitly introduces the idea that players have beliefs at various information sets about what has happened in the game. Kreps and Wilson’s sequential equilibrium recasts trembles as beliefs under uncertainty, and they showed that a slight modification of the trembling hand leads to an easier decision-theoretic interpretation of trembles, an easier computation of equilibria, and an outcome that is nearly identical to Selten’s original idea. Sequential equilibrium, of course, goes on to become the workhorse solution concept in dynamic economics, a concept which underscores essentially all of modern industrial organization.

That Harsanyi, inventor of the Bayesian game, is credited by Selten for inspiring the trembling hand paper is no surprise. The two had met at a conference in Jerusalem in the mid-1960s, and they’d worked together both on applied projects for the US military, and on pure theory research while Selten was visiting Berkeley. A classic 1972 paper of theirs on Nash bargaining with incomplete information (article is gated) begins the field of cooperative games with incomplete information. And this was no minor field: Roger Myerson, in his paper introducing mechanism design under incomplete information – the famous Bayesian revelation principle paper – shows that there exists a unique Selten-Harsanyi bargaining solution under incomplete information which is incentive compatible.

Myerson’s example is amazing. Consider building a bridge which costs $100. Two people will use the bridge. One values the bridge at $90. The other values the bridge at $90 with probability .9, and $30 with probability p=.1, where that valuation is the private knowledge of the second person. Note that in either case, the bridge is worth building. But who should pay? If you propose a 50/50 split, the bridge will simply not be built 10% of the time. If you propose an 80/20 split, where even in their worst case situation each person gets a surplus value of ten dollars, the outcome is unfair to player one 90% of the time (where “unfair” will mean, violates certain principles of fairness that Nash, and later Selten and Harsanyi, set out axiomatically). What of the 53/47 split that gives each party, on average, the same surplus? Again, this is not “interim incentive compatible”, in that player two will refuse to pay in the case he is the type that values the bridge only at $30. Myerson shows mathematically that both players will agree once they know their private valuations to the following deal, and that the deal satisfies the Selten-Nash fairness axioms: when player 2 claims to value at $90, the payment split is 49.5/50.5 and the bridge is always built, but when player 2 claims to value at $30, the entire cost is paid by player 1 but the bridge is built with only probability .439. Under this split, there are correct incentives for player 2 to always reveal his true willingness to pay. The mechanism means that there is a 5.61 percent chance the bridge isn’t built, but the split of surplus from the bridge nonetheless does better than any other split which satisfies all of Harsanyi and Selten’s fairness axioms.
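The arithmetic behind those numbers is easy to check. The sketch below assumes that “49.5/50.5” means player 1 pays $49.50 and player 2 pays $50.50 (the same ordering as the 80/20 split); the variable and function names are mine, not Myerson’s.

```python
COST = 100
V1 = 90                     # player 1's known value
V2_HIGH, V2_LOW = 90, 30    # player 2's possible values
P_LOW = 0.1                 # probability player 2 is the $30 type

# Assumed reading of the deal: after a $90 claim player 2 pays 50.5 and the
# bridge is built for sure; after a $30 claim player 1 pays everything.
PAY2_IF_HIGH = 50.5
PAY2_IF_LOW = 0.0


def payoff_player2(true_value, claim, q_low):
    """Player 2's expected surplus; q_low is the build probability after a low claim."""
    if claim == "high":
        return true_value - PAY2_IF_HIGH
    return q_low * (true_value - PAY2_IF_LOW)


# Build probability that leaves the $90 type exactly indifferent about lying.
q_star = (V2_HIGH - PAY2_IF_HIGH) / V2_HIGH

# Chance the bridge is not built, using the rounded probability from the text.
prob_not_built = P_LOW * (1 - round(q_star, 3))

# The 53/47 split equalizes expected surplus before player 2 learns his type.
surplus_53_47 = (V1 - 53, (1 - P_LOW) * V2_HIGH + P_LOW * V2_LOW - 47)
```

Under that reading, the build probability that makes the $90 type indifferent is 39.5/90, which rounds to the .439 quoted above, and the 5.61 percent figure is just .1 × (1 − .439). The $30 type strictly prefers to tell the truth, since claiming $90 would mean paying more than the bridge is worth to him.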

Selten’s later work is, it appears to me, more scattered. His attempt with Harsanyi to formalize “the” equilibrium refinement, in a 1988 book, was a valiant but in the end misguided attempt. His papers on theoretical biology, inspired by his interest in long walks among the wildflowers, are rather tangential to his economics. And what of his experimental work? To understand Selten’s thinking, read this fascinating dialogue with himself that Selten gave as a Schwartz Lecture at Northwestern MEDS. In this dialogue, he imagines a debate between a Bayesian economist, an experimentalist, and an evolutionary biologist. The economist argues that “theory without theorems” is doomed to fail, that Bayesianism is normatively “correct”, and that Bayesian reasoning can easily be extended to include costs of reasoning or reasoning mistakes. The experimentalist argues that ad hoc assumptions are better than incorrect ones: just as human anatomy is complex and cannot be reduced to a few axioms, neither can social behavior. The biologist argues that learning a la Nelson and Winter is descriptively accurate as far as how humans behave, whereas high level reasoning is not. The “chairman”, perhaps representing Selten himself, sums up the argument as saying that experiments which simply contradict Bayesianism are a waste of time, but that human social behavior surely depends on bounded rationality and hence empirical work ought to be devoted to constructing a foundation for such a theory (shall we call this the “Selten program”?). And yet, this essay was from 1990, and we seem no closer to having such a theory, nor does it seem to me that behavioral research has fundamentally contradicted most of our core empirical understanding derived from theories with pure rationality. Selten’s program, it seems, remains not only incomplete, but perhaps not even first order; the same cannot be said of his theoretical constructs, as without perfection a great part of modern economics simply could not exist.

“The Gift of Moving: Intergenerational Consequences of a Mobility Shock,” E. Nakamura, J. Sigurdsson & J. Steinsson (2016)

The past decade has seen interesting work in many fields of economics on the importance of misallocation for economic outcomes. Hsieh and Klenow’s famous 2009 paper suggested that misallocation of labor and capital in the developing world costs countries like China and India the equivalent of many years of growth. The same two authors have a new paper with Erik Hurst and Chad Jones suggesting that a substantial portion of the growth in the US since 1960 has been via better allocation of workers. In 1960, they note, 94 percent of doctors and lawyers were white men, versus 62 percent today, and we have no reason to believe the innate talent distribution in those fields had changed. Therefore, there were large numbers of women and minorities who would have been talented enough to work in these high-value fields in 1960, but due to misallocation (including in terms of who is educated) did not. Lucia Foster, John Haltiwanger and Chad Syverson have a famous paper in the AER on how to think about reallocation within industries, and the extent to which competition reallocates production from less efficient to more efficient producers; this is important because it is by now well-established that there is an enormous range of productivity within each industry, and hence potentially enormous efficiency gains from proper reallocation away from low-productivity producers.

The really intriguing misallocation question, though, is misallocation of workers across space. Some places are very productive, and others are not. Why don’t workers move? Part of the explanation, particularly in the past few decades, is that due to increasing land use regulation, local changes in total factor productivity increase housing costs, meaning that only high skilled workers gain much by mobility in response to shocks (see, e.g., Ganong and Shoag on the direct question of who benefits from moving, and Hornbeck and Moretti on the effects of productivity shocks on rents and incomes).

A second explanation is that people, quite naturally, value their community. They value their community both because they have friends and often family in the area, and also because they make investments in skills that are well-matched to where they live. For this reason, even if Town A is 10% more productive for the average blue-collar worker, a particular worker in Town B may be reluctant to move if it means giving up community connections or trying to relearn a different skill. This effect appears to be important particularly for people whose original community is low productivity: Deryugina, Kawano and Levitt showed how those induced out of poor areas of New Orleans by Hurricane Katrina wound up with higher wages than those whose neighborhoods were not flooded, and (the well-surnamed) Bryan, Chowdhury and Mobarak find large gains in income when they induce poor rural Bangladeshis to temporarily move to cities.

Today’s paper, by Nakamura et al, is interesting because it shows these beneficial effects of being forced out of one’s traditional community can hold even if the community is rich. The authors look at the impact of the 1973 volcanic eruption which destroyed a large portion of the main town, a large fishing village, on Iceland’s Westman Islands. Though the town had only 5200 residents, this actually makes it large by Icelandic standards: even today, there is only one town on the whole island which is both larger than that and located more than 45 minutes drive from the capital. Further, though the town is a fishing village, it was then and is now quite prosperous due to its harbor, a rarity in Southern Iceland. Residents whose houses were destroyed were compensated by the government, and could have either rebuilt on the island or moved away: those with destroyed houses wind up 15 percentage points more likely to move away than islanders whose houses remained intact.

So what happened? If you were a kid when your family moved away, the instrumental variables estimation suggests you got an average of 3.6 more years of schooling and mid-career earnings roughly 30,000 dollars higher than if you’d remained! Adults who left saw, if anything, a slight decrease in their lifetime earnings. Remember that the Westman Islands were and are wealthier than the rest of Iceland, so moving would really only benefit those whose dynasties had comparative advantage in fields other than fishing. In particular, parents with college educations were more likely to move, conditional on their house being destroyed, than those without. So why did those parents need to be induced by the volcano to pack up? The authors suggest some inability to bargain as a household (the kids benefited, but not the adults), as well as uncertainty (naturally, whether moving would increase kids’ wages forty years later may have been unclear). From the perspective of a choice model, however, the outcome doesn’t seem unusual: parents, due to their community connections and occupational choice, would have considered moving very costly, even if they knew it was in their kids’ best long-term interests.

There is a lesson in the Iceland experience, as well as in the Katrina papers and other similar results: economic policy should focus on people, and not communities. Encouraging closer community ties, for instance, can make reallocation more difficult, and can therefore increase long-run poverty, by increasing the subjective cost of moving. When we ask how to handle long-run poverty in Appalachia, perhaps the answer is to provide assistance for groups who want to move, therefore gaining the benefit of reallocation across space while lessening the perceived cost of moving (my favorite example of clustered moves is that roughly 5% of the world’s Marshall Islanders now live in Springdale, Arkansas!). Likewise, limits on the movement of parolees across states can entrench poverty at precisely the time the parolee likely has the lowest moving costs.

June 2016 Working Paper (No RePEc IDEAS version yet).

Yuliy Sannikov and the Continuous Time Approach to Dynamic Contracting

The John Bates Clark Award, given to the best economist in the United States under 40, was given to Princeton’s Yuliy Sannikov today. The JBC has, in recent years, been tilted quite heavily toward applied empirical microeconomics, but the prize for Sannikov breaks that streak in striking fashion. Sannikov, it can be fairly said, is a mathematical genius and a high theorist of the first order. He is one of a very small number of people to win three gold medals at the International Math Olympiad – perhaps only Gabriel Carroll, another excellent young theorist, has an equally impressive mathematical background in his youth. Sannikov’s most famous work is in the pure theory of dynamic contracting, which I will spend most of this post discussing, but the methods he has developed turn out to have interesting uses in corporate finance and in macroeconomic models that wish to incorporate a financial sector without using linearization techniques that rob such models of much of their richness. A quick warning: Sannikov’s work is not for the faint of heart, and certainly not for those scared of an equation or two. Economists – and I count myself among this group – are generally scared of differential equations, as they don’t appear in most branches of economic theory (with exceptions, of course: Romer’s 1986 work on endogenous growth, the turnpike theorems, the theory of evolutionary games, etc.). As his work is incredibly technical, I will do my best to provide an overview of his basic technique and its uses without writing down a bunch of equations, but there really is no substitute for going to the mathematics itself if you find these ideas interesting.

The idea of dynamic contracting is an old one. Assume that a risk-neutral principal can commit to a contract that pays an agent on the basis of observed output, with that output being generated this year, next year, and so on. A risk-averse agent takes an unobservable action in every period, which affects output subject to some uncertainty. Payoffs in the future are discounted. Take the simplest possible case: there are two periods, an agent can either work hard or not, output is either 1 or 0, and the probability it is 1 is higher if the agent works hard than otherwise. The first big idea in the dynamic moral hazard of the late 1970s and early 1980s (in particular, Rogerson 1985 Econometrica, Lambert 1983 Bell J. Econ, Lazear and Moore 1984 QJE) is that the optimal contract will condition period 2 payoffs on whether there was a good or bad outcome in period 1; that is, payoffs are history-dependent. The idea is that you can use payoffs in period 2 to induce effort in period 1 (because continuation value increases) and in period 2 (because there is a gap between the payment following good or bad outcomes in that period), getting more bang for your buck. Get your employee to work hard today by dangling a chance at a big promotion opportunity tomorrow, then actually give them the promotion if they work hard tomorrow.

The second big result is that dynamic moral hazard (caveat: at least in cases where saving isn’t possible) isn’t such a problem. In a one-shot moral hazard problem, there is a tradeoff between risk aversion and high powered incentives. I either give you a big bonus when things go well and none if things go poorly (in which case you are induced to work hard, but may be unhappy because much of the bonus is based on things you can’t control), or I give you a fixed salary and hence you have no incentive to work hard. The reason this tradeoff disappears in a dynamic context is that when the agent takes actions over and over and over again, the principal can, using a Law of Large Numbers type argument, figure out exactly the frequency at which the agent has been slacking off. Further, when the agent isn’t slacking off, the uncertainty in output each period is just i.i.d., hence the principal can smooth out the agent’s bad luck, and hence as the discount rate goes to zero there is no tradeoff between providing incentives and the agent’s dislike of risk. Both of these results will hold even in infinite period models, where we just need to realize that all the agent cares about is her expected continuation value following every action, and hence we can analyze infinitely long problems in a very similar way to two period problems (Spear and Srivastava 1987).
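The Law of Large Numbers logic can be illustrated with a toy simulation. Everything here is an assumption made for illustration: output is 1 or 0 each period, the success probabilities `P_HARD` and `P_SHIRK` are invented, and the agent is modeled as working hard in a fixed share of periods.

```python
import random

P_HARD, P_SHIRK = 0.8, 0.4  # hypothetical success probabilities


def simulate_mean_output(effort_share, periods, seed=0):
    """Average output when the agent works hard in a fraction effort_share of periods."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(periods):
        works = rng.random() < effort_share
        p_success = P_HARD if works else P_SHIRK
        successes += rng.random() < p_success
    return successes / periods


def inferred_effort_share(mean_output):
    """Invert mean = s * P_HARD + (1 - s) * P_SHIRK to recover the effort share s."""
    return (mean_output - P_SHIRK) / (P_HARD - P_SHIRK)


few_periods = inferred_effort_share(simulate_mean_output(0.7, 20))
many_periods = inferred_effort_share(simulate_mean_output(0.7, 200_000))
```

With only a handful of periods the principal’s estimate of the effort share is noisy, so luck and shirking are hard to tell apart. With many periods the estimate settles close to the true share, which is the sense in which repeated interaction lets the principal separate slacking from bad luck.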

Sannikov revisited this literature by solving for optimal or near-to-optimal contracts when agents take actions in continuous rather than discrete time. Note that the older literature generally used dynamic programming arguments and took the discount rate to a limit of zero in order to get interesting results. These dynamic programs generally were solved using approximations that formed linear programs, and hence precise intuition of why the model was generating particular results in particular circumstances wasn’t obvious. Comparative statics in particular were tough – I can tell you whether an efficient contract exists, but it is tough to know how that efficient contract changes as the environment changes. Further, situations where discounting is positive are surely of independent interest – workers generally get performance reviews every year, contractors generally do not renegotiate continuously, etc. Sannikov wrote a model where an agent takes actions that control the mean of output continuously over time with Brownian motion drift (a nice analogue of the agent taking an action that each period generates some output that depends on the action and some random term). The agent has the usual decreasing marginal utility of income, so as the agent gets richer over time, it becomes tougher to incentivize the agent with a few extra bucks of payment.

Solving for the optimal contract essentially involves solving two embedded dynamic optimization problems. The agent optimizes effort over time given the contract the principal committed to, and hence the principal chooses an optimal dynamic history-dependent contract given what the agent will do in response. The space of possible history-dependent contracts is enormous. Sannikov shows that you can massively simplify, and solve analytically, for the optimal contract using a four step argument.

First, as in the discrete time approach, we can simplify things by noting that the agent only cares about their continuous-time continuation value following every action they make. The continuation value turns out to be a martingale (conditioning on history, my expectation of the continuation value tomorrow is just my continuation value today), and is basically just a ledger of my promises that I have made to the agent in the future on the basis of what happened in the past. Therefore, to solve for the optimal contract, I should just solve for the optimal stochastic process that determines the continuation value over time. The Martingale Representation Theorem tells me exactly and uniquely what that stochastic process must look like, under the constraint that the continuation value accurately “tracks” past promises. This stochastic process turns out to have a particular analytic form with natural properties (e.g., if you pay flow utility today, you can pay less tomorrow) that depend on the actions the agents take. Second, plug the agent’s incentive compatibility constraint into our equation for the stochastic process that determines the continuation value over time. Third, we just maximize profits for the principal given the stochastic process determining continuation payoffs that must be given to the agent. The principal’s problem determines an HJB equation which can be solved using Ito’s rule plus some effort checking boundary conditions – I’m afraid these details are far too complex for a blog post. But the basic idea is that we wind up with an analytic expression for the optimal way to control the agent’s continuation value over time, and we can throw all sorts of comparative statics right at that equation.
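For readers who do want a glimpse of the equations, here is a rough sketch of the objects involved, written in notation that is standard for the continuous-time principal-agent problem. This is my recollection of the usual setup rather than a quotation from any one paper, so treat the symbols as assumptions: output X, effort a, consumption c, utility u, effort cost h, discount rate r, Brownian motion Z, and the sensitivity process Y. In this notation the martingale is the agent’s expected lifetime payoff given the history so far, and the representation theorem pins down how the continuation value W must then evolve.

```latex
% Output: effort controls the drift of a Brownian motion
dX_t = a_t \, dt + \sigma \, dZ_t

% Agent's continuation value (the "ledger" of promises)
W_t = \mathbb{E}\left[ \int_t^\infty r e^{-r(s-t)} \bigl( u(c_s) - h(a_s) \bigr) \, ds \,\middle|\, \mathcal{F}_t \right]

% Step 1: martingale representation gives the law of motion of W
dW_t = r \bigl( W_t - u(c_t) + h(a_t) \bigr) \, dt + r Y_t \bigl( dX_t - a_t \, dt \bigr)

% Step 2: incentive compatibility ties effort to the sensitivity Y
a_t \in \arg\max_{a} \; Y_t \, a - h(a)

% Step 3: the principal's value F(W) solves an HJB equation
r F(W) = \max_{a, c} \; r (a - c) + F'(W) \, r \bigl( W - u(c) + h(a) \bigr) + \tfrac{1}{2} F''(W) \, r^2 Y(a)^2 \sigma^2
```

Here Y(a) denotes the smallest sensitivity that makes effort a incentive compatible (h'(a) at an interior optimum). The drift term is the ledger property from the text: paying flow utility u(c) today lowers what is still owed tomorrow.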

What does this method give us? Because the continuation value and the flow payoffs can be constructed analytically even for positive discount rates, we can actually answer questions like: should you use long-term incentives (continuation value) or short-term incentives (flow payoffs) more when, e.g., your workers have a good outside option? What happens as the discount rate increases? What happens if the uncertainty in the mapping between the agent’s actions and output increases? Answering questions of these types is very challenging, if not impossible, in a discrete time setting.

Though I’ve presented the basic Sannikov method in terms of incentives for workers, dynamic moral hazard – that certain unobservable actions control prices, or output, or other economic parameters, and hence how various institutions or contracts affect those unobservable actions – is a widespread problem. Brunnermeier and Sannikov have a nice recent AER which builds on the intuition of Kiyotaki-Moore models of the macroeconomy with financial acceleration. The essential idea is that small shocks in the financial sector may cause bigger real economy shocks due to deleveraging. Brunnermeier and Sannikov use the continuous-time approach to show important nonlinearities: minor financial shocks don’t do very much since investors and firms rely on their existing wealth, but major shocks off the steady state require capital sales which further depress asset prices and lead to further fire sales. A particularly interesting result is that if exogenous risk is low – the economy isn’t very volatile – then there isn’t much precautionary savings, and so a shock that hits the economy will cause major harmful deleveraging and hence endogenous risk. That is, the very calmness of the world economy since 1983 may have made the eventual recession in 2008 worse due to endogenous choices of cash versus asset holdings. Further, capital requirements may actually be harmful if they aren’t reduced following shocks, since those very capital requirements will force banks to deleverage, accelerating the downturn started by the shock.

Sannikov’s entire oeuvre is essentially a graduate course in a new technique, so if you find the results described above interesting, it is worth digging deep into his CV. He is a great choice for the Clark medal, particularly given the deep and rigorous applications to which he has put his theory in recent years. There really is no simple version of his results, but his 2012 survey, his recent working paper on moral hazard in labor contracts, and his dissertation work published in Econometrica in 2007 are most relevant. In related work, we’ve previously discussed on this site David Rahman’s model of collusion with continuous-time information flow, a problem very much related to work by Sannikov and his coauthor Andrzej Skrzypacz, as well as Aislinn Bohren’s model of reputation which is related to the single longest theory paper I’ve ever seen, Sannikov and Feingold’s Econometrica on the possibility of “fooling people” by pretending to be a type that you are not. I also like that this year’s JBC makes me look like a good prognosticator: Sannikov is one of a handful of names I’d listed as particularly deserving just two years ago when Gentzkow won!

“Costly Search and Consideration Sets in Storable Goods Markets,” T. Pires (2015)

Terrible news arrived today in our small community of economists: the bright young Portuguese economist and an old friend from my Northwestern days, Tiago Pires, passed away suddenly over the weekend. Tiago was a structural IO economist at the University of North Carolina-Chapel Hill who wrote on demand estimation, particularly in the face of search costs. Everyone who met him can tell you that he was always in good spirits, and that he was a true friend and useful sounding board for many of us. Friends tell me that Tiago had been making the rounds at the industrial organization conference IIOC just this week, and seemed to be in perfect health. To honor Tiago, let’s discuss Tiago’s job market paper from a couple years ago.

The basic idea, which runs through much of Tiago’s work, is that properly evaluating demand for products, and hence the effects of mergers or other IO policies, depends fundamentally on costly search. The basic idea is not new – it can be seen as far back as the great George Stigler’s 1961 paper on the economics of information – but the implications are still not fully drawn out.

Consider shopping for laundry detergent. Rare is the shopper who, like Honey Boo Boo’s family, searches for coupons and compares relative prices every week. Rather, most weeks you likely just show up at your usual store, perhaps glancing at the price of your usual detergent as you pass the aisle; there could be a great sale on some other detergent, but you’d never know it. As you start to run low on detergent at home, you’re more likely to actually stroll down the whole detergent aisle, perhaps checking the price of a few more options. On occasion, the detergent makers sponsor an ad or a promotion at the end of the aisle, and you learn the price of that particular product cheaply. If the price is good and you know the price, you might buy some detergent, though not too much since the cost of searching in the future must be traded off against the cost of storing a bunch of detergent in your closet.

Tiago models shoppers who proceed exactly in that fashion: on the basis of how much detergent you have left, you search a handful of detergent prices, and you buy if the price is right. When you are almost out of detergent, you might search a bunch of prices. When you have quite a bit of detergent, you rationally only buy if you happen to see your usual favorite on sale. The data match the basics of the model: in particular, you are more likely to buy your “usual” brand when you have a lot of detergent left than when you are almost out, since it’s not worth bothering to search prices in the former case. This combination of costly search plus changing household “inventory” means that standard static analysis gives a very misleading portrait of what consumers do. First, elasticity estimates will be screwed up: if rivals shift their price up and down, and I don’t even notice the changes, you may think my demand is very inelastic, but really it’s just that I am not searching. Second, price promotions in conjunction with ads that lower search costs aren’t actually that useful for revenue or profit: the shopper would have eventually checked prices when their detergent stock ran low, and the ad just causes them to check prices early and buy if there is a good enough sale, stealing sales away from future shopping trips. Third, popular brands should do what they can to keep consumers from running low in their stock, such as making it obvious via packaging how much detergent is left, or trying to sell bigger packages. The reason is that only the consumer who is low on stock will bother to search the prices of competitors.

Tiago has used search costs in a number of other papers. With Guillermo Marshall, he studied how stores trade off convenience (being located near consumers, roughly) with quality (being a nice supermarket rather than a convenience store, roughly): as travel costs increase because of traffic or bad weather, you see more stores invest in increasing convenience rather than quality, in surprisingly big economic magnitudes. Terrible convenience stores in the ‘hood are partially driven by market frictions due to high transportation costs, not just differences in products demanded or in income! With Fernando Luco and Mahraz Parsanasab, he studies how the Internet has affected the film industry by changing both search costs for learning about what movies might be worth seeing, as well as changing the market structure of the film industry via Netflix, piracy and similar. Looking across countries, internet access improves film industry revenue and decreases market concentration as the internet becomes common, but broadband access has no such revenue effect, and actually makes market concentration worse as it becomes common. Here’s to Tiago’s memory, and to the continued study of markets using our most powerful tool: theoretically-sound models of structural choice combined with data about the operation of real markets!

The costly search paper described above can be found in its most recent working paper version here: November 2015 working paper (No RePEc IDEAS version).

“Does Regression Produce Representative Estimates of Causal Effects?,” P. Aronow & C. Samii (2016)

A “causal empiricist” turn has swept through economics over the past couple decades. As a result, many economists are primarily interested in internally valid treatment effects according to the causal models of Rubin, meaning they are interested in credible statements of how some outcome Y is affected if you manipulate some treatment T given some covariates X. That is, to the extent that full functional form Y=f(X,T) is impossible to estimate because of unobserved confounding variables or similar, it turns out to still be possible to estimate some feature of that functional form, such as the average treatment effect E(f(X,1))-E(f(X,0)). At some point, people like Angrist and Imbens will win a Nobel prize not only for their applied work, but also for clarifying precisely what various techniques are estimating in a causal sense. For instance, an instrumental variable regression under a certain exclusion restriction (let’s call this an “auxiliary assumption”) estimates the average treatment effect along the local margin of people induced into treatment. If you try to estimate the same empirical feature using a different IV, and get a different treatment effect, we all know now that there wasn’t a “mistake” in either paper, but rather that the margins upon which the two different IVs operate may not be identical. Great stuff.

This causal model emphasis has been controversial, however. Social scientists have quibbled because causal estimates generally require the use of small, not-necessarily-general samples, such as those from a particular subset of the population or a particular set of countries, rather than national data or the universe of countries. Many statisticians have gone even further, suggesting that multiple regression with its linear parametric form does not take advantage of enough data in the joint distribution of (Y,X), and hence better predictions can be made with so-called machine learning algorithms. And the structural economists argue that the parameters we actually care about are much broader than regression coefficients or average treatment effects, and hence a full structural model of the data generating process is necessary. We have, then, four different techniques to analyze a dataset: multiple regression with control variables, causal empiricist methods like IV and regression discontinuity, machine learning, and structural models. What exactly does each of these estimate, and how do they relate?

Peter Aronow and Cyrus Samii, two hotshot young political economists, take a look at old fashioned multiple regression. Imagine you want to estimate y=a+bX+cT, where T is a possibly-binary treatment variable. Assume away any omitted variable bias, and more generally assume that all of the assumptions of the OLS model (linearity in covariates, etc.) hold. What does that coefficient c on the treatment indicator represent? This coefficient is a weighted combination of the individual estimated treatment effects, where more weight is given to units whose treatment status is not well explained by covariates. Intuitively, if you are regressing, say, the probability of civil war on participation in international institutions, then if a bunch of countries with very similar covariates all participate, the “treatment” of participation will be swept up by the covariates, whereas if a second group of countries with similar covariates all have different participation status, the regression will put a lot of weight toward those countries since differences in outcomes can be related to participation status.
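A tiny made-up dataset shows the weighting at work. Assume, purely for illustration, that the only covariate is membership in one of three groups (controlled for with group dummies), that outcomes have no noise, and that the treatment effect differs by group. With group dummies as controls, the Frisch-Waugh-Lovell theorem says the OLS coefficient on treatment equals the slope from regressing group-demeaned outcomes on group-demeaned treatment, which is what the sketch computes.

```python
from statistics import mean

# Hypothetical units: (group, treated, unit-level treatment effect).
UNITS = (
    [("A", 1, 10.0)] * 4                             # everyone treated
    + [("B", 1, 2.0)] * 2 + [("B", 0, 2.0)] * 2      # half treated
    + [("C", 1, 6.0)] * 1 + [("C", 0, 6.0)] * 3      # a quarter treated
)
BASELINE = {"A": 1.0, "B": 3.0, "C": 5.0}            # group intercepts, no noise

groups = [g for g, _, _ in UNITS]
treat = [t for _, t, _ in UNITS]
effect = [tau for _, _, tau in UNITS]
outcome = [BASELINE[g] + tau * t for g, t, tau in UNITS]


def demean_by_group(values):
    """Subtract each unit's group mean (the residual from regressing on group dummies)."""
    means = {g: mean(v for v, h in zip(values, groups) if h == g) for g in set(groups)}
    return [v - means[g] for v, g in zip(values, groups)]


t_resid = demean_by_group(treat)
y_resid = demean_by_group(outcome)

# OLS coefficient on treatment, via Frisch-Waugh-Lovell.
ols_coef = sum(t * y for t, y in zip(t_resid, y_resid)) / sum(t * t for t in t_resid)

# Regression weights: squared residual of treatment given covariates.
weights = [t * t for t in t_resid]
weighted_effect = sum(w * tau for w, tau in zip(weights, effect)) / sum(weights)
simple_average = mean(effect)
```

In this example the regression coefficient coincides with the weighted average of unit effects, and it sits well below the simple average effect of 6. Group A, where treatment status is fully explained by the covariate, gets zero weight even though it makes up a third of the nominal sample.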

This turns out to be quite consequential: Aronow and Samii look at one paper on FDI and find that even though the paper used a broadly representative sample of countries around the world, about 10% of the countries weighed more than 50% in the treatment effect estimate, with very little weight on a number of important regions, including all of the Asian tigers. In essence, the sample was general, but the effective sample once you account for weighting was just as limited as some of “nonrepresentative samples” people complain about when researchers have to resort to natural or quasinatural experiments! It turns out that similar effective vs. nominal representativeness results hold even with nonlinear models estimated via maximum likelihood, so this is not a result unique to OLS. Aronow and Samii’s result matters for interpreting bodies of knowledge as well. If you replicate a paper adding in an additional covariate, and get a different treatment effect, it may not reflect omitted variable bias! The difference may simply result from the additional covariate changing the effective weighting on the treatment effect.

So the “externally valid treatment effects” we have been estimating with multiple regression aren’t so representative at all. So when, then, is old fashioned multiple regression controlling for observable covariates a “good” way to learn about the world, compared to other techniques? I’ve tried to think through this in a uniform way; let’s see if it works. First consider machine learning, where we want to estimate y=f(X,T). Assume that there are no unobservables relevant to the estimation. The goal is to estimate the functional form f nonparametrically but to avoid overfitting, and statisticians have devised a number of very clever ways to do this. The proof that they work is in the pudding: cars drive themselves now. It is hard to see any reason why, if there are no unobservables, we wouldn’t want to use these machine learning/nonparametric techniques. However, at present the machine learning algorithms people use literally depend only on data in the joint distribution (X,Y), and not on any auxiliary assumptions. To interpret the marginal effect of a change in T as some sort of “treatment effect” that can be manipulated with policy, if estimated without auxiliary assumptions, requires some pretty heroic assumptions about the lack of omitted variable bias which essentially will never hold in most of the economic contexts we care about.

Now consider the causal model, where y=f(X,U,T) and you are interested in what would happen with covariates X and unobservables U if treatment T was changed to a counterfactual. The techniques used to answer this question each require a particular set of auxiliary assumptions: randomization requires the SUTVA assumption that treatment of one unit does not affect the outcome of another unit, IV requires the exclusion restriction, diff-in-diff requires the parallel trends assumption, and so on. In general, auxiliary assumptions will only hold in certain specific contexts, and hence by construction the result will not be representative. Further, these assumptions are very limited in that they can’t recover every conditional aspect of y, but rather recover only summary statistics like the average treatment effect. Techniques like multiple regression with covariate controls, or machine learning nonparametric estimates, can draw on a more general dataset, but as Aronow and Samii pointed out, the marginal effect on treatment status they identify is not necessarily effectively drawing on a more general sample.

Structural folks are interested in estimating y=f(X,U,V(t),T), where U and V are unobserved, and the nature of the unobserved variables V is affected by t. For example, V may be inflation expectations, T may be the interest rate, y may be inflation today, and X and U are observable and unobservable country characteristics. Put another way, the functional form of f may depend on how exactly T is modified, through V(t). This Lucas Critique problem is assumed away by the auxiliary assumptions in causal models. In order to identify a treatment effect, then, additional auxiliary assumptions generally derived from economic theory are needed in order to understand how V will change in response to a particular treatment type. Even more common is to use a set of auxiliary assumptions to find a sufficient statistic for the particular parameter desired, which may not even be a treatment effect. In this sense, structural estimation is similar to causal models in one way and different in two. It is similar in that it relies on auxiliary assumptions to help extract particular parameters of interest when there are unobservables that matter. It is different in that it permits unobservables to be functions of policy, and that it uses auxiliary assumptions whose credibility leans more heavily on non-obvious economic theory. In practice, structural models often also require auxiliary assumptions which do not come directly from economic theory, such as assumptions about the distribution of error terms which are motivated on the basis of statistical arguments, but in principle this distinction is not a first order difference.

We then have a nice typology. Even if you have a completely universal and representative dataset, multiple regression controlling for covariates does not generally give you a “generalizable” treatment effect. Machine learning can try to extract treatment effects when the data generating process is wildly nonlinear, but has the same nonrepresentativeness problem and the same “what about omitted variables” problem. Causal models can extract some parameters of interest from nonrepresentative datasets where it is reasonable to assume certain auxiliary assumptions hold. Structural models can extract more parameters of interest, sometimes from more broadly representative datasets, and even when there are unobservables that depend on the nature of the policy, but these models require auxiliary assumptions that can be harder to defend. The so-called sufficient statistics approach tries to retain the former advantages of structural models while reducing the heroics that auxiliary assumptions need to perform.

Aronow and Samii is forthcoming in the American Journal of Political Science; the final working paper is at the link. Related to this discussion, Ricardo Hausmann caused a bit of a stir online this week with his “constant adaptation rather than RCT” article. His essential idea was that, unlike with a new medical drug, social science interventions vary drastically depending on the exact place or context; that is, external validity matters so severely that slowly moving through “RCT: Try idea 1”, then “RCT: Try idea 2”, is less successful than smaller, less precise explorations of the “idea space”. He received a lot of pushback from the RCT crowd, but I think for the wrong reason: the constant iteration is less likely to discover underlying mechanisms than even an RCT, as it is still far too atheoretical. The link Hausmann makes to “lean manufacturing” is telling: GM famously (Henderson and Helper 2014) took photos of every square inch of NUMMI, their joint venture plant with Toyota, and tried to replicate this plant in their other plants. But the underlying reason NUMMI and Toyota worked has to do with the credibility of various relational contracts, rather than the (constantly iterated) features of the shop floor. Iterating without attempting to glean the underlying mechanisms at play is not a rapid route to good policy.

Edit: A handful of embarrassing typos corrected, 2/26/2016

“Firm Dynamics, Persistent Effects of Entry Conditions, and Business Cycles,” S. Moreira (2016)

Business cycle fluctuations have long run effects on a number of economic variables. For instance, if you enter the labor force during a recession, your wages are harmed for many years afterward. Many other economic parameters revert to trend, leaving a past recession just a blip on the horizon. Sara Moreira, a job candidate from the University of Chicago, investigates in her job market paper whether entrepreneurship changes induced by recessions persist in the long run.

New firm formation is procyclical: entrepreneurship fell roughly 20 percent during the recent recession. Looking back at the universe of private firms since the late 1970s, Moreira shows that this procyclicality is common, and that the firms that do form during recessions tend to be smaller than those which form during booms. Incredibly, this size gap persists for at least a decade after the firms are founded! At first glance, this is crazy: if my firm is founded during the 2001 recession, surely any effects from my founding days should have worn off after a decade of introducing new products, hiring new staff, finding new funding sources, etc. And yet Moreira finds this effect no matter how you slice the data, using overall recessions, industry-specific shocks, shocks based on tradable versus nontradable commodities, and so on, and it remains even when accounting for the autocorrelation of the business cycle. The effect is not small: the average firm born during a year with above trend growth is roughly 2 percent larger 10 years later than the average firm born during below trend growth years.

This gap is doubly surprising if you think about how firms are founded. Imagine we are in the middle of a recession, and I am thinking of forming a new construction company. Bank loans are probably tough to get, I am unlikely to be flush with cash to start a new spinoff, I may worry about running out of liquidity before demand picks up, and so on. Because of these negative effects, you might reasonably believe that only very high quality ideas will lead to new firms during recessions, and hence the average firms born during recessions will be the very high quality, fast growing, firms of the future, whereas the average firms born during booms will be dry cleaners and sole proprietorships and local restaurants. And indeed this is the case! Moreira finds that firms born during recessions have high productivity, are more likely to be in high innovation sectors, and are less likely to be (low-productivity) sole proprietorships. We have a real mystery, then: how can firms born during a recession both be high quality and find it tough to grow?

Moreira considers two stories. It may be that adjustment costs matter, and firms born small because the environment is recessionary find it too costly to ramp up in size when the economy improves. Moreira finds no support for this idea: capital-intensive industries show the same patterns as industries using little capital.

Alternatively, customers need to be acquired, and this acquisition process may generate persistence in firm size. Naturally, firms start small because it takes time to teach people about products and for demand to grow: a restaurant chain does not introduce 1000 restaurants in one go. If you start really small because of difficulty in getting funded, low demand, or any other reason, then in year 2 you have fewer existing customers and less knowledge about what consumers want. This causes you to grow slower in year 2, and hence in year 3, you remain smaller than firms that initially were large, and the effect persists every year thereafter. Moreira finds support for this effect: among other checks, industries whose products are more differentiated are the ones most likely to see persistence of size differences.
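
A toy illustration of why customer acquisition generates persistence, under assumptions that are mine and not Moreira's: each year a firm keeps a fixed share of its customers and gains new ones in proportion to its existing base. The function name, the `retention` and `referral_rate` parameters, and the starting sizes are all hypothetical. Because next year's size is proportional to this year's, the relative gap at birth never closes.

```python
def simulate_customers(initial_customers, years, retention=0.9, referral_rate=0.25):
    """Customer base by year when growth is proportional to the existing base.

    Hypothetical toy dynamics, not the calibrated model in the paper.
    """
    path = [float(initial_customers)]
    for _ in range(years):
        current = path[-1]
        # Retained customers plus new customers reached through the existing base.
        path.append(retention * current + referral_rate * current)
    return path

boom_born = simulate_customers(initial_customers=100, years=10)
recession_born = simulate_customers(initial_customers=80, years=10)

for year in (0, 5, 10):
    ratio = recession_born[year] / boom_born[year]
    print(f"year {year}: recession-born firm is {ratio:.2f} times the boom-born firm")
```

The sketch deliberately leaves out the productivity selection and mean reversion that the calibrated model layers on top of this channel.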

Taking this intuition to a Hopenhayn-style calibrated model, the data tells us the following. First, it is not guaranteed that recessions lead to smaller firms initially, since the selection of only high productivity ideas into entrepreneurship during recessions, and the problem of low demand, operate in opposite directions, but empirically the latter seems to dominate. Second, if the productivity distribution of new firms were identical during booms and recessions, the initial size difference between firms born during booms and recessions would be double what we actually observe, so the selection story does in fact moderate the effect of the business cycle on new firm size. Third, the average size gap does not close even though the effect of the initial demand shock, hence fewer customers in the first couple years and slower growth thereafter, begins to fade as many years go by. The reason is that idiosyncratic productivity is mean reverting, so the average (relatively low quality at birth) firm born during booms that doesn’t go out of business becomes more like an average overall firm, and the average (relatively high productivity at birth) firm born during recessions sees its relative productivity get worse. Therefore, the advantage recession-born firms get from being born with high quality fades, countering the fading harm to the size of these firms from the persistent demand channel. Fourth, the fact that high productivity firms born during recessions grow slowly due to the historic persistence of customer acquisition means that temporary recessions will still affect the job market many years later: the Great Recession, in Moreira’s calibration, will a decade later still be chewing up 600,000 jobs that firms from the 2008-2009 cohort would have employed. Really enjoyed this paper: it’s a great combination of forensic digging through the data, as well as theoretically well-founded rationalization of the patterns observed.

January 2016 working paper. Moreira also has interesting slides showing how to link the skilled wage premium to underlying industry-level elasticities in skilled and unskilled labor. She notes that as services become more important, where labor substitutability is more difficult, the effect of technological change on the wage premium will become more severe.

“Ranking Firms Using Revealed Preference,” I. Sorkin (2015)

Roughly 20 percent of earnings inequality is not driven by your personal characteristics or the type of job you work at, but by the precise firm you work for. This is odd. In a traditional neoclassical labor market, every firm should offer the same wage to workers with the same marginal productivity. If a firm doesn’t do so, surely their workers will quit and go to firms that pay better. One explanation is that since search frictions make it hard to immediately replace workers, firms with market power will wind up sharing rents with their employees. It is costly to search for jobs, but as your career advances, you try to move “up the job ladder” from positions that pay just your marginal product to positions that pay a premium: eventually you wind up as the city bus driver with the six figure contract and once there you don’t leave. But is this all that is going on?

Isaac Sorkin, a job market candidate from Michigan, correctly notes that workers care about the utility their job offers, not the wage. Some jobs stink even though they pay well: 80 hour weeks, high pressure bosses, frequent business travel to the middle of nowhere, low levels of autonomy, etc. We can’t observe the utility a job offers, of course, but this is a problem that always comes up in demand analysis. If a Chipotle burrito and a kale salad cost the same, but you buy the burrito, then you have revealed that you get more utility from the former; this is the old theory of revealed preference. Even though we rarely observe a single person choosing from a set of job offers, we do observe worker flows between firms. If we can isolate workers who leave their existing job for individual reasons, as distinct from those who leave because their entire firm suffers a negative shock, then their new job is “revealed” better. Intuitively, we see a lot of lawyers quit to run a bed and breakfast in Vermont, but basically zero lawyers quitting to take a mining job that pays the same as running a B&B, hence the B&B must be a “better job” than mining, and further if we don’t see any B&B owners quitting to become lawyers, the B&B must be a “better job” than corporate law even if the pay is lower.

A sensible idea, then: the same worker may be paid different amounts in relation to marginal productivity either because they have moved up the job ladder and luckily landed at a firm with market power and hence pay above marginal product (a “good job”), or because different jobs offer different compensating differentials (in which case high paying jobs may actually be “bad jobs” with long hours and terrible work environments). To separate the two rationales, we need to identify the relative attractiveness of jobs, for which revealed preference should work. The problem in practice is both figuring out which workers are leaving for individual reasons, and getting around the problem that it is unusual to observe in the data a nonzero number of people going from firm A to firm B and vice versa.

Sorkin solves these difficulties in a very clever way. Would you believe the secret is to draw on the good old Perron-Frobenius theorem, a trusted tool of microeconomists interested in network structure? How could that be? Workers meet firms in a search process, with firms posting offers in terms of a utility bundle of wages plus amenities. Each worker also has idiosyncratic tastes about things like where to live, how they like the boss, and so on. The number of folks that move voluntarily from job A to job B depends on how big firm A is (bigger firms have more workers that might leave), how frequently A has no negative productivity shocks (in which case moves are voluntary), and the probability a worker from A is offered a job at B when matched and accepts it, which depends on the relative utilities of the two jobs including the individual idiosyncratic portion. An assumption about the distribution of idiosyncratic utility across jobs allows Sorkin to translate probabilities of accepting a job into relative utilities.

What is particularly nice is that the model gives a linear restriction on any two job pairs: the relative probability of moving from A to B instead of B to A depends on the relative utility (abstracting from idiosyncratic portions) adjusted for firm size and offer probability. That is, if M(A,B) is the number of moves from A to B, and V(A) is a (defined in the paper) function of the non-idiosyncratic utility of job A, then

M(A,B)/M(B,A) = V(B)/V(A)

and hence

M(A,B)V(A) = M(B,A)V(B)

Taking this to data is still problematic because we need to restrict to job changes that are not just “my factory went out of business”, and because M(A,B) or M(B,A) are zero for many firm pairs. The first problem is solved by estimating the probability a given job switch is voluntary using the fact that layoff probability is related to the size and growth rate of a firm. The second problem can be solved by noting that if we sum the previous equation over all firms B not equal to A, we have

sum(B!=A)M(A,B)*V(A) = sum(B!=A)M(B,A)*V(B)

and hence, since V(A) does not depend on B and can be pulled out of the sum on the left,

V(A) = sum(B!=A)M(B,A)*V(B)/sum(B!=A)M(A,B)

The numerator is the number of hires A makes weighted for the non-idiosyncratic utility of firms the hires come from, and the denominator is the number of people that leave firm A. There is one such linear restriction per firm, but the utility of firm A depends on the utility of all firms. How to avoid this circularity? Write the linear restrictions in matrix form, and use the Perron-Frobenius theorem to see that the relative values of V are determined by a particular eigenvector as long as the matrix of moves is strongly connected! Strongly connected just means that there is at least one chain of moves between employers that can get me from firm A to B and vice versa, for all firm pairs. All that’s left to do now is to take this to the data (not a trivial computation task, since there are so many firms in the US data that calculating eigenvectors will require some numerical techniques).
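
Here is a minimal sketch of the eigenvector step on a made-up three-firm example. The move counts and the `firm_values` helper are hypothetical, and the dense eigendecomposition is only sensible at toy scale; it is not Sorkin's actual implementation. Stacking the per-firm restriction gives V = S V, where S[A,B] = M(B,A) divided by the total number of exits from A, so V is an eigenvector of S with eigenvalue one.

```python
import numpy as np

def firm_values(moves):
    """Relative firm values V implied by V(A) = sum_B M(B,A) V(B) / sum_B M(A,B).

    moves[a, b] is the number of voluntary moves from firm a to firm b
    (zero diagonal). Assumes the move graph is strongly connected.
    """
    moves = np.asarray(moves, dtype=float)
    exits = moves.sum(axis=1)              # sum over B != A of M(A, B)
    S = moves.T / exits[:, None]           # S[a, b] = M(b, a) / exits[a]
    eigvals, eigvecs = np.linalg.eig(S)
    k = np.argmin(np.abs(eigvals - 1.0))   # pick the eigenvalue equal to one
    v = np.real(eigvecs[:, k])
    return v / v.sum()                     # normalize to sum to one (also fixes the sign)

# Hypothetical move counts between three firms: row = origin, column = destination.
M = np.array([
    [0, 10, 5],
    [20, 0, 5],
    [10, 15, 0],
])

V = firm_values(M)
print(V)
```

Only relative values are pinned down, so the normalization is arbitrary. With millions of firms one would presumably switch to sparse matrices and an iterative method such as power iteration rather than a full eigendecomposition.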

So what do we learn? Industries like education offer high utility compared to pay, and industries like mining offer the opposite, as you’d expect. Many low paying jobs offer relatively high nonpay utility, and many female-dominated sectors do as well, implying the measured earnings inequality and gender gaps may be overstating the true extent of utility inequality. That is, a teacher making half what a miner makes is partly reflective of the fact that mining is a job that requires compensating differentials to make up for long hours in the dark and dangerous mine shaft. Further, roughly two thirds of the earnings inequality related to firms seems to be reflecting compensating differentials, and since just over 20% of earnings inequality in the US is firm related, this means that about 15% of earnings inequality is just reflecting the differential perceived quality of jobs. This is a surprising result, and it appears to be driven by differences in job amenities that are not easy to measure. Goldman Sachs is a “good job” despite relatively low pay compared to other finance firms because they offer good training and connections. This type of amenity is hard to observe, but Sorkin’s theoretical approach based on revealed preference allows the econometrician to “see” these types of differences across jobs, and hence to more properly understand which jobs are desirable. This is another great example of a question – how does the quality of jobs differ and what does that say about the nature of earnings inequality – that is fundamentally unanswerable by methodological techniques that are unwilling to inject some theoretical assumptions into the analysis.

November 2015 Working Paper. Sorkin has done some intriguing work using historical data on the minimum wage as well. Essentially, minimum wage changes that are not indexed to inflation are only temporary in real terms, so if it is costly to switch from labor to machines, you might not do so in response to a “temporary” minimum wage shock. But a permanent increase does appear to cause long run shifts away from labor, something Sorkin sees in industries from apparel in the early 20th century to fast food restaurants. Simon Jäger, a job candidate from Harvard, also has an interesting purely empirical paper about friction in the labor market, taking advantage of early deaths of German workers. When these deaths happen, workers in similar roles at the firm see higher wages and lower separation probability for many years, whereas other coworkers see lower wages, with particularly large effects when the dead worker has unusual skills. All quite intuitive from a search model theory of labor, where workers are partial substitutes for folks with the same skills, but complements for folks with firm-specific capital but dissimilar skills. Add these papers to the evidence that efficiency in the search-and-matching process of labor to firms is a first order policy problem.
