“The Gift of Moving: Intergenerational Consequences of a Mobility Shock,” E. Nakamura, J. Sigurdsson & J. Steinsson (2016)

The past decade has seen interesting work in many fields of economics on the importance of misallocation for economic outcomes. Hsieh and Klenow’s famous 2009 paper suggested that misallocation of labor and capital in the developing world costs countries like China and India the equivalent of many years of growth. The same two authors have a new paper with Erik Hurst and Chad Jones suggesting that a substantial portion of the growth in the US since 1960 has been via better allocation of workers. In 1960, they note, 94 percent of doctors and lawyers were white men, versus 62 percent today, and we have no reason to believe the innate talent distribution in those fields had changed. Therefore, there were large numbers of women and minorities who would have been talented enough to work in these high-value fields in 1960, but due to misallocation (including in terms of who is educated) did not. Lucia Foster, John Haltiwanger and Chad Syverson have a famous paper in the AER on how to think about reallocation within industries, and the extent to which competition reallocates production from less efficient to more efficient producers; this is important because it is by now well-established that there is an enormous range of productivity within each industry, and hence potentially enormous efficiency gains from proper reallocation away from low-productivity producers.

The really intriguing misallocation question, though, is misallocation of workers across space. Some places are very productive, and others are not. Why don’t workers move? Part of the explanation, particularly in the past few decades, is that due to increasing land use regulation, local increases in total factor productivity largely feed into higher housing costs, meaning that only high-skilled workers gain much from moving in response to shocks (see, e.g., Ganong and Shoag on the direct question of who benefits from moving, and Hornbeck and Moretti on the effects of productivity shocks on rents and incomes).

A second explanation is that people, quite naturally, value their community. They value their community both because they have friends and often family in the area, and also because they make investments in skills that are well-matched to where they live. For this reason, even if Town A is 10% more productive for the average blue-collar worker, a particular worker in Town B may be reluctant to move if it means giving up community connections or trying to relearn a different skill. This effect appears to be particularly important for people whose original community is low productivity: Deryugina, Kawano and Levitt showed how those induced out of poor areas of New Orleans by Hurricane Katrina wound up with higher wages than those whose neighborhoods were not flooded, and (the well-surnamed) Bryan, Chowdhury and Mobarak find large gains in income when they induce poor rural Bangladeshis to temporarily move to cities.

Today’s paper, by Nakamura et al, is interesting because it shows these beneficial effects of being forced out of one’s traditional community can hold even if the community is rich. The authors look at the impact of the 1973 volcanic eruption on Iceland’s Westman Islands, which destroyed a large portion of the main town, a large fishing village. Though the town had only 5200 residents, this actually makes it large by Icelandic standards: even today, there is only one town in all of Iceland which is both larger than that and located more than a 45 minute drive from the capital. Further, though the town is a fishing village, it was then and is now quite prosperous due to its harbor, a rarity in Southern Iceland. Residents whose houses were destroyed were compensated by the government, and could either rebuild on the island or move away: those with destroyed houses wound up 15 percentage points more likely to move away than islanders whose houses remained intact.

So what happened? If you were a kid when your family moved away, the instrumental variables estimation suggests you got an average of 3.6 more years of schooling and mid-career earnings roughly 30,000 dollars higher than if you’d remained! Adults who left saw, if anything, a slight decrease in their lifetime earnings. Remember that the Westman Islands were and are wealthier than the rest of Iceland, so moving would really only benefit those whose dynasties had comparative advantage in fields other than fishing. In particular, parents with college educations were more likely to move, conditional on their house being destroyed, than those without. So why did those parents need to be induced by the volcano to pack up? The authors suggest some inability to bargain as a household (the kids benefited, but not the adults), as well as uncertainty (naturally, whether moving would increase kids’ wages forty years later may have been unclear). From the perspective of a choice model, however, the outcome doesn’t seem unusual: parents, due to their community connections and occupational choice, would have considered moving very costly, even if they knew it was in their kids’ best long-term interest.

There is a lesson in the Iceland experience, as well as in the Katrina papers and other similar results: economic policy should focus on people, and not communities. Encouraging closer community ties, for instance, can make reallocation more difficult, and can therefore increase long-run poverty, by increasing the subjective cost of moving. When we ask how to handle long-run poverty in Appalachia, perhaps the answer is to provide assistance for groups who want to move, therefore gaining the benefit of reallocation across space while lessening the perceived cost of moving (my favorite example of clustered moves is that roughly 5% of the world’s Marshall Islanders now live in Springdale, Arkansas!). Likewise, limits on the movement of parolees across states can entrench poverty at precisely the time the parolee likely has the lowest moving costs.

June 2016 Working Paper (No RePEc IDEAS version yet).

Yuliy Sannikov and the Continuous Time Approach to Dynamic Contracting

The John Bates Clark Award, given to the best economist in the United States under 40, was given to Princeton’s Yuliy Sannikov today. The JBC has, in recent years, been tilted quite heavily toward applied empirical microeconomics, but the prize for Sannikov breaks that streak in striking fashion. Sannikov, it can be fairly said, is a mathematical genius and a high theorist of the first order. He is one of a very small number of people to win three gold medals at the International Math Olympiad – perhaps only Gabriel Carroll, another excellent young theorist, has an equally impressive mathematical background in his youth. Sannikov’s most famous work is in the pure theory of dynamic contracting, which I will spend most of this post discussing, but the methods he has developed turn out to have interesting uses in corporate finance and in macroeconomic models that wish to incorporate a financial sector without using linearization techniques that rob such models of much of their richness. A quick warning: Sannikov’s work is not for the faint of heart, and certainly not for those scared of an equation or two. Economists – and I count myself among this group – are generally scared of differential equations, as they don’t appear in most branches of economic theory (with exceptions, of course: Romer’s 1986 work on endogenous growth, the turnpike theorems, the theory of evolutionary games, etc.). As his work is incredibly technical, I will do my best to provide an overview of his basic technique and its uses without writing down a bunch of equations, but there really is no substitute for going to the mathematics itself if you find these ideas interesting.

The idea of dynamic contracting is an old one. Assume that a risk-neutral principal can commit to a contract that pays an agent on the basis of observed output, with that output being generated this year, next year, and so on. A risk-averse agent takes an unobservable action in every period, which affects output subject to some uncertainty. Payoffs in the future are discounted. Take the simplest possible case: there are two periods, an agent can either work hard or not, output is either 1 or 0, and the probability it is 1 is higher if the agent works hard than otherwise. The first big idea in the dynamic moral hazard literature of the late 1970s and early 1980s (in particular, Rogerson 1985 Econometrica, Lambert 1983 Bell J. Econ, Lazear and Moore 1984 QJE) is that the optimal contract will condition period 2 payoffs on whether there was a good or bad outcome in period 1; that is, payoffs are history-dependent. The idea is that you can use payoffs in period 2 to induce effort in period 1 (because continuation value increases) and in period 2 (because there is a gap between the payment following good or bad outcomes in that period), getting more bang for your buck. Get your employee to work hard today by dangling a chance at a big promotion opportunity tomorrow, then actually give them the promotion if they work hard tomorrow.
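
To make the first channel concrete, here is a back-of-the-envelope calculation with my own illustrative numbers (not taken from any of the papers above): the period-1 incentive constraint involves the utility gap between a good and a bad outcome counting both today’s wages and tomorrow’s promised continuation value, so promising a larger continuation-value gap lets the principal shrink the risky within-period bonus.

```python
# Illustrative two-outcome incentive constraint (all numbers made up).
# Effort raises P(high output) from p_L to p_H at utility cost c. Incentive
# compatibility in period 1 requires
#   (p_H - p_L) * [(u_H - u_L) + delta * (V_H - V_L)] >= c,
# where u_H - u_L is the within-period utility spread from wages and V_H - V_L
# is the spread in promised continuation values after a high vs. low outcome.
p_H, p_L, c, delta = 0.7, 0.4, 0.1, 0.9

def required_wage_spread(continuation_spread):
    """Smallest within-period utility spread consistent with working hard."""
    return max(c / (p_H - p_L) - delta * continuation_spread, 0.0)

for dV in [0.0, 0.1, 0.2, 0.4]:
    print(f"promised continuation spread {dV:.1f} -> "
          f"required wage-utility spread {required_wage_spread(dV):.3f}")
# Bigger promised gaps in future value mean the risk-averse agent needs less
# risky pay today: dangle the promotion, and the bonus can shrink.
```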

The second big result is that dynamic moral hazard (caveat: at least in cases where saving isn’t possible) isn’t such a problem. In a one-shot moral hazard problem, there is a tradeoff between risk aversion and high powered incentives. I either give you a big bonus when things go well and none if things go poorly (in which case you are induced to work hard, but may be unhappy because much of the bonus is based on things you can’t control), or I give you a fixed salary and hence you have no incentive to work hard. The reason this tradeoff disappears in a dynamic context is that when the agent takes actions over and over and over again, the principal can, using a Law of Large Numbers type argument, figure out exactly the frequency at which the agent has been slacking off. Further, when the agent isn’t slacking off, the uncertainty in output each period is just i.i.d., hence the principal can smooth out the agent’s bad luck, and hence as the discount rate goes to zero there is no tradeoff between providing incentives and the agent’s dislike of risk. Both of these results will hold even in infinite period models, where we just need to realize that all the agent cares about is her expected continuation value following every action, and hence we can analyze infinitely long problems in a very similar way to two period problems (Spear and Srivastava 1987).
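
A tiny Monte Carlo makes the Law of Large Numbers point vivid (all parameters invented for illustration): any single period’s output is noisy, but the long-run frequency of good outcomes pins down how often the agent has been shirking.

```python
import numpy as np

rng = np.random.default_rng(0)
p_work, p_shirk = 0.7, 0.4   # P(high output) with and without effort (made up)
shirk_freq = 0.25            # the agent secretly shirks a quarter of the time

for T in [10, 100, 10_000, 1_000_000]:
    shirk = rng.random(T) < shirk_freq
    high = rng.random(T) < np.where(shirk, p_shirk, p_work)
    # E[share of high outcomes] = p_work - shirk_freq * (p_work - p_shirk),
    # so the observed share reveals the shirking frequency as T grows.
    implied = (p_work - high.mean()) / (p_work - p_shirk)
    print(f"T = {T:>9,}: implied shirking frequency = {implied:5.2f}")
```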

Sannikov revisited this literature by solving for optimal or near-to-optimal contracts when agents take actions in continuous rather than discrete time. Note that the older literature generally used dynamic programming arguments and took the discount rate to a limit of zero in order to get interesting results. These dynamic programs generally were solved using approximations that formed linear programs, and hence precise intuition about why the model was generating particular results in particular circumstances wasn’t obvious. Comparative statics in particular were tough – I can tell you whether an efficient contract exists, but it is tough to know how that efficient contract changes as the environment changes. Further, situations where discounting is positive are surely of independent interest – workers generally get performance reviews every year, contractors generally do not renegotiate continuously, etc. Sannikov wrote a model where an agent takes actions that continuously control the drift of output, which evolves as a Brownian motion (a nice analogue of a discrete-time model where each period’s output depends on the agent’s action and some random term). The agent has the usual decreasing marginal utility of income, so as the agent gets richer over time, it becomes tougher to incentivize the agent with a few extra bucks of payment.

Solving for the optimal contract essentially involves solving two embedded dynamic optimization problems. The agent optimizes effort over time given the contract the principal committed to, and the principal chooses an optimal dynamic history-dependent contract given what the agent will do in response. The space of possible history-dependent contracts is enormous. Sannikov shows that you can massively simplify, and solve analytically for the optimal contract, using a four-step argument.

First, as in the discrete time approach, we can simplify things by noting that the agent only cares about their continuous-time continuation value following every action they take. The continuation value turns out to be a martingale (conditioning on history, my expectation of the continuation value tomorrow is just my continuation value today), and is basically just a ledger of the promises of future payoffs I have made to the agent on the basis of what happened in the past. Therefore, to solve for the optimal contract, I should just solve for the optimal stochastic process that determines the continuation value over time. Second, the Martingale Representation Theorem tells me exactly and uniquely what that stochastic process must look like, under the constraint that the continuation value accurately “tracks” past promises. This stochastic process turns out to have a particular analytic form with natural properties (e.g., if you pay flow utility today, you can pay less tomorrow) that depend on the actions the agent takes. Third, plug the agent’s incentive compatibility constraint into our equation for the stochastic process that determines the continuation value over time. Fourth, we just maximize profits for the principal given the stochastic process determining continuation payoffs that must be given to the agent. The principal’s problem determines an HJB equation which can be solved using Ito’s rule plus some care in checking boundary conditions – I’m afraid these details are far too complex for a blog post. But the basic idea is that we wind up with an analytic expression for the optimal way to control the agent’s continuation value over time, and we can throw all sorts of comparative statics right at that equation.
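
For concreteness, here is a minimal Euler-Maruyama simulation of the kind of continuation-value law of motion this representation delivers: a promise-keeping drift plus a loading on output surprises. Every functional form and constant below is my own illustrative choice, and the controls are held fixed rather than read off a solved HJB equation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Output:             dX_t = a dt + sigma dZ_t
# Continuation value: dW_t = r (W_t - u(c) + h(a)) dt + r * y * (dX_t - a dt)
# The drift is "promise keeping" (interest accrues on value promised but not yet
# paid out as flow utility); the diffusion loads on the output surprise with
# sensitivity y, which is what the incentive constraint pins down.
r, sigma, dt, T = 0.1, 1.0, 0.01, 10.0
a, c, y = 0.5, 1.27, 0.4                 # effort, consumption, sensitivity (made up)
u = lambda x: np.sqrt(x)                 # agent's utility of consumption
h = lambda x: 0.5 * x ** 2               # agent's cost of effort

steps = int(T / dt)
W = np.empty(steps + 1)
W[0] = 1.0                               # initial promised value to the agent
for t in range(steps):
    surprise = sigma * rng.normal(0.0, np.sqrt(dt))   # dX - a dt
    W[t + 1] = W[t] + r * (W[t] - u(c) + h(a)) * dt + r * y * surprise

print(f"promised value moves from {W[0]:.2f} to {W[-1]:.2f} as luck accumulates")
```

In the optimal contract, the consumption, effort, and sensitivity would all be functions of W coming out of the HJB step; holding them fixed here is just meant to show the structure of the process being controlled.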

What does this method give us? Because the continuation value and the flow payoffs can be constructed analytically even for positive discount rates, we can actually answer questions like: should you use long-term incentives (continuation value) or short-term incentives (flow payoffs) more when, e.g., your workers have a good outside option? What happens as the discount rate increases? What happens if the uncertainty in the mapping between the agent’s actions and output increases? Answering questions of these types is very challenging, if not impossible, in a discrete time setting.

Though I’ve presented the basic Sannikov method in terms of incentives for workers, dynamic moral hazard – that certain unobservable actions control prices, or output, or other economic parameters, and hence how various institutions or contracts affect those unobservable actions – is a widespread problem. Brunnermeier and Sannikov have a nice recent AER which builds on the intuition of Kiyotaki-Moore models of the macroeconomy with financial acceleration. The essential idea is that small shocks in the financial sector may cause bigger real economy shocks due to deleveraging. Brunnermeier and Sannikov use the continuous-time approach to show important nonlinearities: minor financial shocks don’t do very much since investors and firms rely on their existing wealth, but major shocks off the steady state require capital sales which further depress asset prices and lead to further fire sales. A particularly interesting result is that if exogenous risk is low – the economy isn’t very volatile – then there isn’t much precautionary saving, and so a shock that does hit the economy will cause major harmful deleveraging and hence endogenous risk. That is, the very calmness of the world economy since 1983 may have made the eventual recession in 2008 worse due to endogenous choices of cash versus asset holdings. Further, capital requirements may actually be harmful if they aren’t reduced following shocks, since those very capital requirements will force banks to deleverage, accelerating the downturn started by the shock.

Sannikov’s entire oeuvre is essentially a graduate course in a new technique, so if you find the results described above interesting, it is worth digging deep into his CV. He is a great choice for the Clark medal, particularly given the deep and rigorous applications he has found for his theory in recent years. There really is no simple version of his results, but his 2012 survey, his recent working paper on moral hazard in labor contracts, and his dissertation work published in Econometrica in 2007 are most relevant. In related work, we’ve previously discussed on this site David Rahman’s model of collusion with continuous-time information flow, a problem very much related to work by Sannikov and his coauthor Andrzej Skrzypacz, as well as Aislinn Bohren’s model of reputation which is related to the single longest theory paper I’ve ever seen, Faingold and Sannikov’s Econometrica on the possibility of “fooling people” by pretending to be a type that you are not. I also like that this year’s JBC makes me look like a good prognosticator: Sannikov is one of a handful of names I’d listed as particularly deserving just two years ago when Gentzkow won!

“Costly Search and Consideration Sets in Storable Goods Markets,” T. Pires (2015)

Terrible news arrived today in our small community of economists: the bright young Portuguese economist and an old friend from my Northwestern days, Tiago Pires, passed away suddenly over the weekend. Tiago was a structural IO economist at the University of North Carolina-Chapel Hill who wrote on demand estimation, particularly in the face of search costs. Everyone who met him can tell you that he was always in good spirits, and that he was a true friend and a useful sounding board for many of us. Friends tell me that Tiago had been making the rounds at the industrial organization conference IIOC just this week, and seemed to be in perfect health. To honor Tiago, let’s discuss his job market paper from a couple of years ago.

The basic idea, which runs through much of Tiago’s work, is that properly evaluating demand for products, and hence the effects of mergers or other IO policies, depends fundamentally on costly search. The idea is not new – it can be seen as far back as the great George Stigler’s 1961 paper on the economics of information – but the implications are still not fully drawn out.

Consider shopping for laundry detergent. Rare is the shopper who, like Honey Boo Boo’s family, searches for coupons and compares relative prices every week. Rather, most weeks you likely just show up at your usual store, perhaps glancing at the price of your usual detergent as you pass the aisle; there could be a great sale on some other detergent, but you’d never know it. As you start to run low on detergent at home, you’re more likely to actually stroll down the whole detergent aisle, perhaps checking the prices of a few more options. On occasion, the detergent makers sponsor an ad or a promotion at the end of the aisle, and you learn the price of that particular product cheaply. If the price is good and you know the price, you might buy some detergent, though not too much since the cost of searching in the future must be traded off against the cost of storing a bunch of detergent in your closet.

Tiago models shoppers who proceed exactly in that fashion: on the basis of how much detergent you have left, you search a handful of detergent prices, and you buy if the price is right. When you are almost out of detergent, you might search a bunch of prices. When you have quite a bit of detergent, you rationally only buy if you happen to see your usual favorite on sale. The data match the basics of the model: in particular, you are more likely to buy your “usual” brand when you have a lot of detergent left than when you are almost out, since it’s not worth bothering to search prices in the former case. This combination of costly search plus changing household “inventory” means that standard static analysis gives a very misleading portrait of what consumers do. First, elasticity estimates will be screwed up: if rivals shift their price up and down, and I don’t even notice the changes, you may think my demand is very inelastic, but really it’s just that I am not searching. Second, price promotions in conjunction with ads that lower search costs aren’t actually that useful for revenue or profit: the shopper would have eventually checked prices when their detergent stock ran low, and the ad just causes them to check prices early and buy if there is a good enough sale, stealing sales away from future shopping trips. Third, popular brands should do what they can to keep consumers from running low in their stock, such as making it obvious via packaging how much detergent is left, or trying to sell bigger packages. The reason is that only the consumer who is low on stock will bother to search the prices of competitors.
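
A toy simulation (mine, with every parameter invented, and nothing like Tiago’s actual structural estimator) illustrates the first implication: a household that only checks prices when the pantry is nearly empty looks noticeably less price sensitive to a naive static regression than a household that sees every price, even though both follow the same purchase rules.

```python
import numpy as np

rng = np.random.default_rng(2)

# One unit of detergent is consumed per week; 30% of weeks the store runs a sale.
weeks = 50_000
regular, sale = 3.0, 2.0
prices = np.where(rng.random(weeks) < 0.3, sale, regular)

def simulate(search_prob):
    """Household checks prices when nearly out of stock, or by chance."""
    inventory, purchases = 2.0, np.zeros(weeks)
    for t in range(weeks):
        sees_price = inventory <= 1 or rng.random() < search_prob
        if sees_price:
            if prices[t] == sale and inventory <= 6:
                purchases[t] = 4.0       # stock up at the low price
            elif inventory < 1:
                purchases[t] = 1.0       # forced purchase at whatever price
        inventory = max(inventory + purchases[t] - 1.0, 0.0)
    return purchases

for label, s in [("always searches", 1.0), ("searches mostly when low", 0.1)]:
    q = simulate(s)
    slope = np.polyfit(prices, q, 1)[0]              # naive static "demand slope"
    on_sale = q[prices == sale].sum() / q.sum()      # share of units bought on sale
    print(f"{label}: naive price slope {slope:.2f}, share bought on sale {on_sale:.2f}")
```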

Tiago used search costs in a number of other papers. With Guillermo Marshall, he studied how stores trade off convenience (being located near consumers, roughly) with quality (being a nice supermarket rather than a convenience store, roughly): as travel costs increase because of traffic or bad weather, you see more stores invest in increasing convenience rather than quality, in surprisingly big economic magnitudes. Terrible convenience stores in the ‘hood are partially driven by market frictions due to high transportation costs, not just differences in products demanded or in income! With Fernando Luco and Mahraz Parsanasab, he studied how the Internet has affected the film industry by changing both search costs for learning about what movies might be worth seeing, as well as the market structure of the film industry via Netflix, piracy and similar. Looking across countries, internet access improves film industry revenue and decreases market concentration as the internet becomes common, but broadband access has no such revenue effect, and actually makes market concentration worse as it becomes common. Here’s to Tiago’s memory, and to the continued study of markets using our most powerful tool: theoretically-sound models of structural choice combined with data about the operation of real markets!

The costly search paper described above can be found in its most recent working paper version here: November 2015 working paper (No RePEc IDEAS version).

“Does Regression Produce Representative Estimates of Causal Effects?,” P. Aronow & C. Samii (2016)

A “causal empiricist” turn has swept through economics over the past couple decades. As a result, many economists are primarily interested in internally valid treatment effects according to the causal models of Rubin, meaning they are interested in credible statements of how some outcome Y is affected if you manipulate some treatment T given some covariates X. That is, to the extent that the full functional form Y=f(X,T) is impossible to estimate because of unobserved confounding variables or similar, it turns out to still be possible to estimate some feature of that functional form, such as the average treatment effect E(f(X,1))-E(f(X,0)). At some point, people like Angrist and Imbens will win a Nobel prize not only for their applied work, but also for clarifying precisely what various techniques are estimating in a causal sense. For instance, an instrumental variable regression under a certain exclusion restriction (let’s call this an “auxiliary assumption”) estimates the average treatment effect along the local margin of people induced into treatment. If you try to estimate the same empirical feature using a different IV, and get a different treatment effect, we all know now that there wasn’t a “mistake” in either paper, but rather that the margins upon which the two different IVs operate may not be identical. Great stuff.

This causal model emphasis has been controversial, however. Social scientists have quibbled because causal estimates generally require the use of small, not-necessarily-general samples, such as those from a particular subset of the population or a particular set of countries, rather than national data or the universe of countries. Many statisticians have gone even further, suggesting that multiple regression with its linear parametric form does not take advantage of enough data in the joint distribution of (Y,X), and hence better predictions can be made with so-called machine learning algorithms. And the structural economists argue that the parameters we actually care about are much broader than regression coefficients or average treatment effects, and hence a full structural model of the data generating process is necessary. We have, then, four different techniques to analyze a dataset: multiple regression with control variables, causal empiricist methods like IV and regression discontinuity, machine learning, and structural models. What exactly does each of these estimate, and how do they relate?

Peter Aronow and Cyrus Samii, two hotshot young political economists, take a look at old fashioned multiple regression. Imagine you want to estimate y=a+bX+cT, where T is a possibly-binary treatment variable. Assume away any omitted variable bias, and more generally assume that all of the assumptions of the OLS model (linearity in covariates, etc.) hold. What does that coefficient c on the treatment indicator represent? This coefficient is a weighted combination of the individual estimated treatment effects, where more weight is given to units whose treatment status is not well explained by covariates. Intuitively, if you are regressing, say, the probability of civil war on participation in international institutions, then if a bunch of countries with very similar covariates all participate, the “treatment” of participation will be swept up by the covariates, whereas if a second group of countries with similar covariates is split between participants and nonparticipants, the regression will put a lot of weight toward those countries since differences in outcomes can be related to participation status.
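
You can see the result in a few lines of simulation (all numbers invented): stratify the data into groups, make treatment nearly universal in some groups and a coin flip in others, and the OLS coefficient on T lands, up to sampling noise, on the variance-weighted average of the group-specific effects rather than the straight average.

```python
import numpy as np

rng = np.random.default_rng(3)

# Three equal-sized strata; within each, treatment is as good as random with
# propensity p_g, and the true treatment effect is tau_g (all numbers made up).
n_per = 2000
p = np.array([0.05, 0.50, 0.95])     # share treated in each stratum
tau = np.array([1.0, 5.0, 1.5])      # stratum-specific treatment effects

G = np.repeat(np.arange(3), n_per)
T = rng.binomial(1, p[G])
y = 0.5 * G + tau[G] * T + rng.normal(0, 1, G.size)

# OLS of y on stratum dummies plus T.
X = np.column_stack([np.ones(G.size), G == 1, G == 2, T]).astype(float)
c_ols = np.linalg.lstsq(X, y, rcond=None)[0][-1]

# Aronow-Samii style weights: w_g proportional to n_g * p_g * (1 - p_g), so
# strata where covariates nearly determine treatment get almost no weight.
w = n_per * p * (1 - p)
print("OLS coefficient on T:             ", round(c_ols, 2))
print("Variance-weighted average of tau: ", round(np.sum(w * tau) / w.sum(), 2))
print("Unweighted average of tau:        ", round(tau.mean(), 2))
print("Weight on the p=0.5 stratum (1/3 of the sample):", round(w[1] / w.sum(), 2))
```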

This turns out to be quite consequential: Aronow and Samii look at one paper on FDI and find that even though the paper used a broadly representative sample of countries around the world, about 10% of the countries accounted for more than 50% of the weight in the treatment effect estimate, with very little weight on a number of important regions, including all of the Asian tigers. In essence, the sample was general, but the effective sample once you account for weighting was just as limited as some of the “nonrepresentative samples” people complain about when researchers have to resort to natural or quasinatural experiments! It turns out that similar effective vs. nominal representativeness results hold even with nonlinear models estimated via maximum likelihood, so this is not a result unique to OLS. Aronow and Samii’s result matters for interpreting bodies of knowledge as well. If you replicate a paper adding in an additional covariate, and get a different treatment effect, it may not reflect omitted variable bias! The difference may simply result from the additional covariate changing the effective weighting on the treatment effect.

So the “externally valid treatment effects” we have been estimating with multiple regression aren’t so representative at all. When, then, is old fashioned multiple regression controlling for observable covariates a “good” way to learn about the world, compared to other techniques? I’ve tried to think through this in a uniform way; let’s see if it works. First consider machine learning, where we want to estimate y=f(X,T). Assume that there are no unobservables relevant to the estimation. The goal is to estimate the functional form f nonparametrically but to avoid overfitting, and statisticians have devised a number of very clever ways to do this. The proof that they work is in the pudding: cars drive themselves now. It is hard to see any reason why, if there are no unobservables, we wouldn’t want to use these machine learning/nonparametric techniques. However, at present the machine learning algorithms people use literally depend only on data in the joint distribution (X,Y), and not on any auxiliary assumptions. To interpret the marginal effect of a change in T as some sort of “treatment effect” that can be manipulated with policy, if estimated without auxiliary assumptions, requires some pretty heroic assumptions about the lack of omitted variable bias which essentially will never hold in most of the economic contexts we care about.
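
As a concrete illustration of that last sentence (a sketch on simulated data using one generic off-the-shelf learner; nothing here is specific to any particular algorithm): when an unobserved confounder drives both T and y, reading a “treatment effect” off a flexible fit of y on (X,T) goes badly wrong, while the identical procedure is fine once T is actually randomized.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(4)
n = 5000

# Unobserved confounder U raises both the chance of treatment and the outcome;
# the true causal effect of T is zero. X is an observed but irrelevant covariate.
U = rng.normal(size=n)
X = rng.normal(size=n)
T = (U + rng.normal(size=n) > 0).astype(float)
y = 2.0 * U + 0.0 * T + rng.normal(size=n)

def ml_effect_of_T(T_col, outcome):
    """Fit y = f(X, T) flexibly, then average f(X, 1) - f(X, 0)."""
    model = GradientBoostingRegressor().fit(np.column_stack([X, T_col]), outcome)
    f1 = model.predict(np.column_stack([X, np.ones(n)]))
    f0 = model.predict(np.column_stack([X, np.zeros(n)]))
    return (f1 - f0).mean()

print("Confounded T, naive ML 'effect':", round(ml_effect_of_T(T, y), 2))   # far from 0

T_rand = rng.binomial(1, 0.5, n).astype(float)
y_rand = 2.0 * U + 0.0 * T_rand + rng.normal(size=n)
print("Randomized T, same procedure:   ", round(ml_effect_of_T(T_rand, y_rand), 2))  # near 0
```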

Now consider the causal model, where y=f(X,U,T) and you are interested in what would happen with covariates X and unobservables U if treatment T were changed to a counterfactual. All of these techniques require a particular set of auxiliary assumptions: randomization requires the SUTVA assumption that the treatment of one unit does not affect the outcomes of another unit, IV requires the exclusion restriction, diff-in-diff requires the parallel trends assumption, and so on. In general, auxiliary assumptions will only hold in certain specific contexts, and hence by construction the result will not be representative. Further, these assumptions are very limited in that they can’t recover every conditional aspect of y, but rather recover only summary statistics like the average treatment effect. Techniques like multiple regression with covariate controls, or machine learning nonparametric estimates, can draw on a more general dataset, but as Aronow and Samii pointed out, the marginal effect on treatment status they identify is not necessarily effectively drawing on a more general sample.

Structural folks are interested in estimating y=f(X,U,V(t),T), where U and V are unobserved, and the nature of the unobserved variables V is affected by the treatment t. For example, V may be inflation expectations, T may be the interest rate, y may be inflation today, and X and U are observable and unobservable country characteristics. Put another way, the functional form of f may depend on how exactly T is modified, through V(t). This Lucas Critique problem is assumed away by the auxiliary assumptions in causal models. In order to identify a treatment effect, then, additional auxiliary assumptions generally derived from economic theory are needed in order to understand how V will change in response to a particular treatment type. Even more common is to use a set of auxiliary assumptions to find a sufficient statistic for the particular parameter desired, which may not even be a treatment effect. In this sense, structural estimation is similar to causal models in one way and different in two. It is similar in that it relies on auxiliary assumptions to help extract particular parameters of interest when there are unobservables that matter. It is different in that it permits unobservables to be functions of policy, and that it uses auxiliary assumptions whose credibility leans more heavily on non-obvious economic theory. In practice, structural models often also require auxiliary assumptions which do not come directly from economic theory, such as assumptions about the distribution of error terms which are motivated on the basis of statistical arguments, but in principle this distinction is not a first order difference.

We then have a nice typology. Even if you have a completely universal and representative dataset, multiple regression controlling for covariates does not generally give you a “generalizable” treatment effect. Machine learning can try to extract treatment effects when the data generating process is wildly nonlinear, but has the same nonrepresentativeness problem and the same “what about omitted variables” problem. Causal models can extract some parameters of interest from nonrepresentative datasets where it is reasonable to assume certain auxiliary assumptions hold. Structural models can extract more parameters of interest, sometimes from more broadly representative datasets, and even when there are unobservables that depend on the nature of the policy, but these models require auxiliary assumptions that can be harder to defend. The so-called sufficient statistics approach tries to retain the former advantages of structural models while reducing the heroics that auxiliary assumptions need to perform.

Aronow and Samii is forthcoming in the American Journal of Political Science; the final working paper is at the link. Related to this discussion, Ricardo Hausmann caused a bit of a stir online this week with his “constant adaptation rather than RCT” article. His essential idea was that, unlike with a new medical drug, social science interventions vary drastically depending on the exact place or context; that is, external validity matters so severely that slowly moving through “RCT: Try idea 1”, then “RCT: Try idea 2”, is less successful than smaller, less precise explorations of the “idea space”. He received a lot of pushback from the RCT crowd, but I think for the wrong reason: the constant iteration is less likely to discover underlying mechanisms than even an RCT, as it is still far too atheoretical. The link Hausmann makes to “lean manufacturing” is telling: GM famously (Henderson and Helper 2014) took photos of every square inch of NUMMI, their joint venture plant with Toyota, and tried to replicate this plant in their other plants. But the underlying reason NUMMI and Toyota worked has to do with the credibility of various relational contracts, rather than the (constantly iterated) features of the shop floor. Iterating without attempting to glean the underlying mechanisms at play is not a rapid route to good policy.

Edit: A handful of embarrassing typos corrected, 2/26/2016

“Firm Dynamics, Persistent Effects of Entry Conditions, and Business Cycles,” S. Moreira (2016)

Business cycle fluctuations have long run effects on a number of economic variables. For instance, if you enter the labor force during a recession, your wages are harmed for many years afterward. Many other economic parameters revert to trend, leaving a past recession just a blip on the horizon. Sara Moreira, a job candidate from the University of Chicago, investigates in her job market paper whether entrepreneurship changes induced by recessions persist in the long run.

New firm formation is procyclical: entrepreneurship fell roughly 20 percent during the recent recession. Looking back at the universe of private firms since the late 1970s, Moreira shows that this procyclicality is common, and that the firms that do form during recessions tend to be smaller than those which form during booms. Incredibly, this size gap persists for at least a decade after the firms are founded! At first glance, this is crazy: if my firm is founded during the 2001 recession, surely any effects from my founding days should have worn off after a decade of introducing new products, hiring new staff, finding new funding sources, etc. And yet Moreira finds this effect no matter how you slice the data, using overall recessions, industry-specific shocks, shocks based on tradable versus nontradable commodities, and so on, and it remains even when accounting for the autocorrelation of the business cycle. The effect is not small: the average firm born during a year with above trend growth is roughly 2 percent larger 10 years later than the average firm born during below trend growth years.

This gap is doubly surprising if you think about how firms are founded. Imagine we are in the middle of a recession, and I am thinking of forming a new construction company. Bank loans are probably tough to get, I am unlikely to be flush with cash to start a new spinoff, I may worry about running out of liquidity before demand picks up, and so on. Because of these negative effects, you might reasonably believe that only very high quality ideas will lead to new firms during recessions, and hence the average firms born during recessions will be the very high quality, fast growing, firms of the future, whereas the average firms born during booms will be dry cleaners and sole proprietorships and local restaurants. And indeed this is the case! Moreira finds that firms born during recessions have high productivity, are more likely to be in high innovation sectors, and are less likely to be (low-productivity) sole proprietorships. We have a real mystery, then: how can firms born during a recession both be high quality and find it tough to grow?

Moreira considers two stories. It may be that adjustment costs matter, and firms born small because the environment is recessionary find it too costly to ramp up in size when the economy improves. Moreira finds no support for this idea: capital-intensive industries show the same patterns as industries using little capital.

Alternatively, customers need to be acquired, and this acquisition process may generate persistence in firm size. Naturally, firms start small because it takes time to teach people about products and for demand to grow: a restaurant chain does not introduce 1000 restaurants in one go. If you start really small because of difficulty in getting funded, low demand, or any other reason, then in year 2 you have fewer existing customers and less knowledge about what consumers want. This causes you to grow slower in year 2, and hence in year 3, you remain smaller than firms that initially were large, and the effect persists every year thereafter. Moreira finds support for this effect: among other checks, industries whose products are more differentiated are the ones most likely to see persistence of size differences.
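
One way to see the arithmetic of this channel is a toy accumulation process (mine, not Moreira’s model): if new customers arrive at a rate that is concave in the existing base, a firm born 20% smaller keeps growing from a smaller base and closes the gap only very slowly.

```python
# Toy customer-base accumulation; the acquisition function is purely illustrative.
def grow(base, years, rate=0.3, concavity=0.8):
    path = [base]
    for _ in range(years):
        base = base + rate * base ** concavity   # word of mouth, reputation, shelf space
        path.append(base)
    return path

boom_born = grow(100.0, 10)       # firm born with a "normal" initial customer base
recession_born = grow(80.0, 10)   # identical firm, but born 20% smaller

for year in range(11):
    gap = 100 * (1 - recession_born[year] / boom_born[year])
    print(f"year {year:2d}: recession-born firm is {gap:4.1f}% smaller")
```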

Taking this intuition to a Hopenhayn-style calibrated model, the data tells us the following. First, it is not guaranteed that recessions lead to smaller firms initially, since the selection of only high productivity ideas into entrepreneurship during recessions, and the problem of low demand, operate in opposite directions, but empirically the latter seems to dominate. Second, if the productivity distribution of new firms were identical during booms and recessions, the initial size difference between firms born during booms and recessions would be double what we actually observe, so the selection story does in fact moderate the effect of the business cycle on new firm size. Third, the average size gap does not close even though the effect of the initial demand shock, hence fewer customers in the first couple years and slower growth thereafter, begins to fade as many years go by. The reason is that idiosyncratic productivity is mean reverting, so the average (relatively low quality at birth) firm born during booms that doesn’t go out of business becomes more like an average overall firm, and the average (relatively high productivity at birth) firm born during recessions sees its relative productivity get worse. Therefore, the advantage recession-born firms get from their high quality at birth fades, countering the fading harm to their size from the persistent demand channel. Fourth, the fact that high productivity firms born during recessions grow slowly due to the historic persistence of customer acquisition means that temporary recessions will still affect the job market many years later: the Great Recession, in Moreira’s calibration, will a decade later still be chewing up 600,000 jobs that firms from the 2008-2009 cohort would have employed. Really enjoyed this paper: it’s a great combination of forensic digging through the data, as well as theoretically well-founded rationalization of the patterns observed.

January 2016 working paper. Moreira also has interesting slides showing how to link the skilled wage premium to underlying industry-level elasticities in skilled and unskilled labor. She notes that as services become more important, where labor substitutability is more difficult, the effect of technological change on the wage premium will become more severe.

“Ranking Firms Using Revealed Preference,” I. Sorkin (2015)

Roughly 20 percent of earnings inequality is not driven by your personal characteristics or the type of job you work at, but by the precise firm you work for. This is odd. In a traditional neoclassical labor market, every firm should offer the same wage to workers with the same marginal productivity. If a firm doesn’t do so, surely their workers will quit and go to firms that pay better. One explanation is that since search frictions make it hard to immediately replace workers, firms with market power will wind up sharing rents with their employees. It is costly to search for jobs, but as your career advances, you try to move “up the job ladder” from positions that pay just your marginal product to positions that pay a premium: eventually you wind up as the city bus driver with the six figure contract and once there you don’t leave. But is this all that is going on?

Isaac Sorkin, a job market candidate from Michigan, correctly notes that workers care about the utility their job offers, not the wage. Some jobs stink even though they pay well: 80 hour weeks, high pressure bosses, frequent business travel to the middle of nowhere, low levels of autonomy, etc. We can’t observe the utility a job offers, of course, but this is a problem that always comes up in demand analysis. If a Chipotle burrito and a kale salad cost the same, but you buy the burrito, then you have revealed that you get more utility from the former; this is the old theory of revealed preference. Even though we rarely observe a single person choosing from a set of job offers, we do observe worker flows between firms. If we can isolate workers who leave their existing job for individual reasons, as distinct from those who leave because their entire firm suffers a negative shock, then their new job is “revealed” better. Intuitively, we see a lot of lawyers quit to run a bed and breakfast in Vermont, but basically zero lawyers quitting to take a mining job that pays the same as running a B&B, hence the B&B must be a “better job” than mining, and further if we don’t see any B&B owners quitting to become lawyers, the B&B must be a “better job” than corporate law even if the pay is lower.

A sensible idea, then: the same worker may be paid different amounts in relation to marginal productivity either because they have moved up the job ladder and luckily landed at a firm with market power and hence pay above marginal product (a “good job”), or because different jobs offer different compensating differentials (in which case high paying jobs may actually be “bad jobs” with long hours and terrible work environments). To separate the two rationales, we need to identify the relative attractiveness of jobs, for which revealed preference should work. The problem in practice is both figuring out which workers are leaving for individual reasons, and getting around the problem that it is unusual to observe in the data a nonzero number of people going from firm A to firm B and vice versa.

Sorkin solves these difficulties in a very clever way. Would you believe the secret is to draw on the good old Perron-Frobenius theorem, a trusted tool of microeconomists interested in network structure? How could that be? Workers meet firms in a search process, with firms posting offers in terms of a utility bundle of wages plus amenities. Each worker also has idiosyncratic tastes about things like where to live, how they like the boss, and so on. The number of folks that move voluntarily from job A to job B depends on how big firm A is (bigger firms have more workers that might leave), how frequently A has no negative productivity shocks (in which case moves are voluntary), and the probability a worker from A is offered a job at B when matched and accepts it, which depends on the relative utilities of the two jobs including the individual idiosyncratic portion. An assumption about the distribution of idiosyncratic utility across jobs allows Sorkin to translate probabilities of accepting a job into relative utilities.

What is particularly nice is that the model gives a linear restriction for any pair of jobs: the relative probability of moving from A to B instead of B to A depends on the relative utility (abstracting from idiosyncratic portions) adjusted for firm size and offer probability. That is, if M(A,B) is the number of moves from A to B, and V(A) is a (defined in the paper) function of the non-idiosyncratic utility of job A, then

M(A,B)/M(B,A) = V(B)/V(A)

and hence

M(A,B)V(A) = M(B,A)V(B)

Taking this to data is still problematic because we need to restrict to job changes that are not just “my factory went out of business”, and because M(A,B) or M(B,A) are zero for many firm pairs. The first problem is solved by estimating the probability a given job switch is voluntary using the fact that layoff probability is related to the size and growth rate of a firm. The second problem can be solved by noting that if we sum the previous equation over all firms B not equal to A, we have

sum(B!=A)M(A,B)*V(A) = sum(B!=A)M(B,A)*V(B)

or

V(A) = sum(B!=A)M(B,A)*V(B)/sum(B!=A)M(A,B)

The numerator is the number of hires A makes weighted by the non-idiosyncratic utility of the firms the hires come from, and the denominator is the number of people that leave firm A. There is one such linear restriction per firm, but the utility of firm A depends on the utility of all firms. How to avoid this circularity? Write the linear restrictions in matrix form, and use the Perron-Frobenius theorem to see that the relative values of V are determined by a particular eigenvector as long as the matrix of moves is strongly connected! Strongly connected just means that there is at least one chain of moves between employers that can get me from firm A to B and vice versa, for all firm pairs. All that’s left to do now is to take this to the data (not a trivial computational task, since there are so many firms in the US data that calculating eigenvectors will require some numerical techniques).
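
Mechanically, the fixed point is easy to compute once you have a matrix of moves; here is a minimal sketch on a made-up four-firm example (fake move counts, and none of Sorkin’s corrections for involuntary separations), solving the linear system above by taking the appropriate eigenvector.

```python
import numpy as np

# M[i, j] = number of voluntary moves from firm i to firm j (made-up counts).
firms = ["CorpLaw", "Mining", "BnB", "School"]
M = np.array([
    [0,  2, 30, 10],
    [8,  0, 25, 12],
    [1,  1,  0,  6],
    [2,  1,  9,  0],
], dtype=float)

out_moves = M.sum(axis=1)          # total departures from each firm
# The restriction V(A) * sum_B M(A,B) = sum_B M(B,A) * V(B), in matrix form:
# V is the positive eigenvector (eigenvalue 1) of Q, where Q[A, B] = M[B, A] / out(A).
# Perron-Frobenius guarantees it exists and is unique up to scale when the move
# matrix is strongly connected.
Q = M.T / out_moves[:, None]
eigvals, eigvecs = np.linalg.eig(Q)
V = np.real(eigvecs[:, np.argmax(np.abs(eigvals))])
V = V / V.sum()                    # normalize; only relative values are identified

assert np.allclose(Q @ V, V)       # check the fixed point actually holds
for name, value in sorted(zip(firms, V), key=lambda x: -x[1]):
    print(f"{name:8s} V = {value:.3f}")
```

Firms that attract many voluntary movers relative to their own departures come out with high V; Sorkin’s contribution is doing this for the universe of US firms, with the corrections for involuntary moves described above.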

So what do we learn? Industries like education offer high utility compared to pay, and industries like mining offer the opposite, as you’d expect. Many low paying jobs offer relatively high nonpay utility, and many female-dominated sectors do as well, implying the measured earnings inequality and gender gaps may be overstating the true extent of utility inequality. That is, a teacher making half what a miner makes is partly reflective of the fact that mining is a job that requires compensating differentials to make up for long hours in the dark and dangerous mine shaft. Further, roughly two thirds of the earnings inequality related to firms seems to be reflecting compensating differentials, and since just over 20% of earnings inequality in the US is firm related, this means that about 15% of earnings inequality is just reflecting the differential perceived quality of jobs. This is a surprising result, and it appears to be driven by differences in job amenities that are not easy to measure. Goldman Sachs is a “good job” despite relatively low pay compared to other finance firms because they offer good training and connections. This type of amenity is hard to observe, but Sorkin’s theoretical approach based on revealed preference allows the econometrician to “see” these types of differences across jobs, and hence to more properly understand which jobs are desirable. This is another great example of a question – how does the quality of jobs differ and what does that say about the nature of earnings inequality – that is fundamentally unanswerable by methodological techniques that are unwilling to inject some theoretical assumptions into the analysis.

November 2015 Working Paper. Sorkin has done some intriguing work using historical data on the minimum wage as well. Essentially, minimum wage changes that are not indexed to inflation are only temporary in real terms, so if it is costly to switch from labor to machines, you might not do so in response to a “temporary” minimum wage shock. But a permanent increase does appear to cause long run shifts away from labor, something Sorkin sees in industries from apparel in the early 20th century to fast food restaurants. Simon Jäger, a job candidate from Harvard, also has an interesting purely empirical paper about friction in the labor market, taking advantage of early deaths of German workers. When these deaths happen, workers in similar roles at the firm see higher wages and lower separation probability for many years, whereas other coworkers see lower wages, with particularly large effects when the dead worker has unusual skills. All quite intuitive from a search model theory of labor, where workers are partial substitutes for folks with the same skills, but complements for folks with firm-specific capital but dissimilar skills. Add these papers to the evidence that efficiency in the search-and-matching process of labor to firms is a first order policy problem.

“Estimating Equilibrium in Health Insurance Exchanges,” P. Tebaldi (2016)

After a great visit to San Francisco for AEA and a couple weeks reading hundreds of papers while hiding out in the middle of the Pacific, it’s time to take a look at some of the more interesting job market papers this year. Though my home department isn’t directly hiring, I’m going to avoid commenting on work by candidates being flown out to Toronto in general, though a number of those are fantastic as well. I also now have a fourth year of data on job market “stars” which I’ll present in the next week or so.

Let’s start with a great structural IO paper by Pietro Tebaldi from Stanford. The Affordable Care Act in the United States essentially set up a version of universal health care that relies on subsidizing low income buyers, limiting prices via price caps and age rating limits (the elderly can only be charged a certain multiple of what the young are charged), and providing a centralized comparison system (“Bronze” or “Silver” or whatever plans essentially cover the same medical care, with only the provider and hospital bundle differing). The fundamental fact about American health care is less that it is a non-universal, privately-provided system than that it is enormously expensive: the US government, and this is almost impossible to believe until you look at the numbers, spends roughly the same percentage of GDP on health care as Canada, even though coverage is universal up north. Further, there is quite a bit of market power both on the insurer side, with a handful of insurers in any given market, and on the hospital side. Generally, there are only a handful of hospitals in any region, with the legacy of HMOs making many insurers very reluctant to drop hospitals from their network since customers will complain about losing “their” doctor. Because of these facts, a first-order concern for designing a government health care expansion must be controlling costs.

Tebaldi points out theoretically that the current ACA design inadvertently leads to high insurer markups. In nearly all oligopoly models, markup over costs depends on the price elasticity of demand: if buyers have inelastic demand in general, markups are high. In health care, intuitively, young buyers are more price sensitive and have lower expected costs than older buyers; many young folks just won’t go to the doctor if it is too pricey to do so, and young folks are less likely to have a long-time doctor they require in their insurance network. Age rating, which limits the maximum price difference between young and old buyers, means that anything that leads to lower prices for the young will also lead to lower prices for the old. Hence, the more young people you get in the insurance pool, the lower the markups are, and hence the lower the cost becomes for everyone. The usual explanation for why you need young buyers is that they subsidize the old, high-cost buyers; the rationale here is that young buyers help even other young buyers by making aggregate demand more elastic and hence dragging down oligopoly insurer prices.

How can you get more young buyers in the pool while retaining their elastic demand? Give young buyers a fixed amount subsidy voucher that is bigger than what you give older buyers. Because young buyers have low expected costs, they will only buy insurance if it is cheap, hence will enter the insurance market if you subsidize them a lot. Once they enter, however, they remain very price sensitive. With lots of young folks as potential buyers, insurers will lower their prices for young buyers in order to attract them, which due to age rating also lowers prices for older buyers. It turns out that the government could lower the subsidy given to older buyers, so that total government subsidies fall, and yet out of pocket spending for the old would still be lower due to the price drop induced by the highly elastic young buyers entering the market.
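
The pooled-pricing half of that argument is easy to see in a toy grid search (a sketch with made-up linear demands, a single monopoly insurer, and a hard-coded 3x age-rating rule; the voucher itself is not modeled here, it is simply what pulls more young buyers into the potential pool):

```python
import numpy as np

# Young buyers only purchase if the premium is low (choke price 400); old buyers
# will pay far more but cost more to cover. Age rating forces p_old = 3 * p_young.
v_y, v_o = 400.0, 1500.0        # choke prices for young and old demand (made up)
c_y, c_o = 100.0, 600.0         # expected claims cost per enrollee (made up)
N_o = 300.0                     # potential old buyers in the market

def optimal_prices(N_y):
    """Grid-search the insurer's profit-maximizing young premium."""
    p = np.arange(50.0, 500.0, 0.5)
    q_y = N_y * np.clip(1 - p / v_y, 0, None)           # linear young demand
    q_o = N_o * np.clip(1 - 3 * p / v_o, 0, None)       # linear old demand at 3p
    profit = (p - c_y) * q_y + (3 * p - c_o) * q_o
    best = p[np.argmax(profit)]
    return best, 3 * best

for N_y in [100.0, 600.0]:
    p_y, p_o = optimal_prices(N_y)
    print(f"{N_y:4.0f} potential young buyers -> young premium {p_y:5.1f}, old premium {p_o:6.1f}")
```

With more price-sensitive young buyers in the pool, the profit-maximizing young premium falls, and the rating rule drags the old premium down with it, which is exactly the channel described above.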

Now that’s simply a theoretical result. Tebaldi also estimates what would happen in practice, using California data. Different regions of California have different age distributions. You can immediately see that prices are higher for young folks in regions where there are a lot of potential old buyers, and lower in regions with fewer potential old buyers, for exactly the elasticity difference plus age rating reason given above. These regional differences permit identification of the demand curve, using age-income composition to instrument for price. The marginal costs of insurance companies are tougher to identify, but the essential idea just uses optimal pricing conditions as in Berry, Levinsohn and Pakes. The exact identification conditions are not at all straightforward in selection markets like insurance, since insurer marginal costs depend on who exactly their buyers are in addition to characteristics of the bundle of services they offer. The essential trick is that since insurers are pricing to set marginal revenue equal to marginal cost, the demand curve already estimated tells us whether most marginal customers are old or young in a given region, and hence we can back out what costs may be on the basis of pricing decisions across regions.
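
To see the flavor of the cost inversion in the simplest possible non-selection case (so a sketch of the BLP-style logic the paragraph mentions, not Tebaldi’s actual procedure, since his marginal costs depend on exactly who enrolls): suppose single-plan insurers face logit demand with a known price coefficient; then observed premiums and shares pin down implied markups and marginal costs.

```python
import numpy as np

alpha = 0.02                                # estimated price sensitivity (hypothetical)
premiums = np.array([320.0, 410.0, 275.0])  # observed monthly premiums (hypothetical)
shares = np.array([0.20, 0.10, 0.30])       # observed market shares (hypothetical)

# Logit own-price derivative: ds_j/dp_j = -alpha * s_j * (1 - s_j), so the
# first-order condition of a single-plan insurer (MR = MC) implies
#   p_j - mc_j = 1 / (alpha * (1 - s_j)).
markups = 1.0 / (alpha * (1.0 - shares))
mc = premiums - markups
for p, s, m, c in zip(premiums, shares, markups, mc):
    print(f"premium {p:5.0f}, share {s:.2f} -> implied markup {m:5.1f}, implied mc {c:5.1f}")
```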

After estimating marginal costs and demand curves for insurance, Tebaldi can run a bunch of counterfactuals motivated by the theory discussed above. Replacing price-linked subsidies, where buyers get a subsidy linked to the second-lowest priced plan in their area, with vouchers, where buyers get a fixed voucher regardless of the prices set, essentially makes insurers act as if buyers are more price-elastic: raising insurance prices under price-linked subsidies will also raise the amount of the subsidy, and hence the price increase is not passed 1-for-1 to buyers. Tebaldi estimates insurance prices would fall $200 on average if the current price-linked subsidies were replaced with vouchers of an equivalent size. Since young buyers have elastic demand, coverage of young buyers increases as a result. The $200 fall in prices, then, results first from insurers realizing that all buyers are more sensitive to price changes when they hold vouchers rather than pay a semi-fixed amount determined by a subsidy, and second from the composition of buyers therefore including more elastic young buyers, lowering the optimal insurer markup. The harm, of course, is that vouchers do not guarantee year-to-year that subsidized buyers pay no more than a capped amount, since in general the government does not know every insurer’s cost curve and every buyer’s preferences.

Better still is to make vouchers depend on age. If young buyers get big vouchers, even more of them will buy insurance. This will drag down the price set by insurers since aggregate elasticity increases, and hence old buyers may be better off as well even if they see a reduction in their government subsidy. Tebaldi estimates that a $400 increase in subsidies for those under 45, and a $200 decrease for those over 45, will lead to 50% more young folks buying insurance, a 20% decrease in insurer markup, a post-subsidy price for those over 45 that is unchanged since their lower subsidy is made up for by lower insurer prices, and a 15% decrease in government spending since decreased subsidies for the old more than make up for increased subsidies for the young. There is such a thing as a free lunch!

Now, in practice, governments do not make changes around the margins like this. Certain aspects of the subsidy program are, crazily, set by law and not by bureaucrats. Note how political appearance and equilibrium effect differ in Tebaldi’s estimates: we decrease subsidies for the old and yet everyone including the old is better off due to indirect effects. Politicians, it goes without saying, do not win elections on the basis of indirect effects. A shame!

January 2016 working paper. The paper is quite concisely written, which I appreciate in our era of 80-page behemoths. If you are still reluctant to believe in the importance of insurer market power, Tebaldi and coauthors also have a paper in last year’s AER P&P showing in a really clean way the huge price differences in locations with limited insurer competition. On the macro side, Tebaldi and network-extraordinaire Matt Jackson have a paper about deep recessions that makes a simple point. In labor search models, the better the boom times, the lower the productivity of the workers you will settle for hiring. Thus, when negative economic shocks follow long expansions, they will lead to more unemployment, simply because there will be more relatively low-productivity workers at every firm. Believable history-dependence in macro models is always a challenge, but this theory makes perfect sense.

Douglass North, An Economist’s Historian

Sad news arrives today, as we hear that Douglass North has passed away, outliving his two great peers in Cliometrics (Robert Fogel) and New Institutional Economics (Ronald Coase) by only a short while. There will be many lovely pieces today, I’m sure, on North’s qualitative and empirical exploration of the rise of institutions as solutions to agency and transaction cost problems, a series of ideas that continues to be enormously influential. No economist today denies the importance of institutions. If economics is the study of the aggregation of rational choice under constraints, as it is sometimes thought to be, then North focused our minds on the origin of the constraints rather than on the choice or its aggregation. Why do states develop? Why do guilds, and trade laws, and merchant organizations, and courts, appear, and when? How does organizational persistence negatively affect the economy over time, a question pursued at great length by Daron Acemoglu and his coauthors? All important questions, and it is not clear that there are better answers than the ones North provided.

But North was not, first and foremost, a historian. His PhD was in economics, and even late in life he continued to apply the most cutting-edge economic tools to his studies of institutions. I want to discuss today a beautiful piece of his, “The Role of Institutions in the Revival of Trade”, written jointly with Barry Weingast and Paul Milgrom in 1990. This is one of the fundamental papers in “Analytic Narratives”, as the school would later be called, which applied formal economic theory to historical questions; I have previously discussed here a series of papers by Avner Greif and his coauthors which are the canonical examples.

Here is the essential idea. In the late Middle Ages, long-distance trade, particularly at “Fairs” held in specific places at specific times, arose again in Western Europe. Agency problems must have been severe: how do you keep people from cheating you, from stealing, from selling defective goods, or from reneging on credit they have been granted? A harmonized body of rules, the Law Merchant, appeared across many parts of Western Europe, with local courts granting judgments on the basis of this law. In the absence of nation-states, someone with a negative judgment could simply leave the local city where the verdict was given. The threat of not being able to sell in the future may have been sufficient to keep merchants fair, but if the threat of future lost business was the only credible punishment, then why were laws and courts needed at all? Surely merchants could simply let it be known that Johann or Giuseppe is a cheat, and that one shouldn’t deal with them? There is a puzzle here, then: it appears that the set of punishments the Law Merchant could impose is identical to the set of “punishments” one receives for having a bad reputation, so why did anybody bother with courts and formal rules? In terms of modern theory, if relational contracts and formal contracts can offer identical punishments for deviating from cooperation, and formal contracts are costly, then why doesn’t everyone simply rely on relational contracts?

Milgrom, North and Weingast consider a simple repeated Prisoner’s Dilemma. Two agents with a sufficiently high discount factor can sustain cooperation in a Prisoner’s Dilemma using tit-for-tat: if you cheat me today, I cheat you tomorrow. Of course, the Folk Theorem tells us that cooperation can be sustained using potentially more complex punishment strategies in infinitely repeated games with any number of players, although a fundamental idea in the repeated games literature is that it may be necessary to punish people who do not themselves punish when they are meant to do so. In a repeated Prisoner’s Dilemma with an arbitrary number of players who randomly match each period, cooperation can be sustained in a simple way: you cheat anyone you match with if they cheated their previous trading partner and that partner had not themselves cheated their own partner two rounds ago, and you cooperate otherwise.
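As a refresher on the workhorse calculation behind such claims (standard textbook repeated-game material, not a result specific to this paper): with stage payoffs T > R > P > S and discount factor δ, a one-shot deviation from grim trigger gains T − R today and forfeits R − P in every future period, so two-player cooperation is sustainable when

$$ \delta\,(R - P) \;\ge\; (1-\delta)\,(T - R) \quad\Longleftrightarrow\quad \delta \;\ge\; \frac{T-R}{T-P}. $$

The community-enforcement version with random matching needs an analogous, and typically more demanding, condition, since punishment is delegated to future strangers.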

The trick, though, is that you need to know the two-periods-back history of your current trading partner and their last trading partner. Particularly with long-distance trade, you might frequently encounter traders you don’t know even indirectly. Imagine that every period you trade with someone you have never met before, and who you will never meet again (the “Townsend turnpike”, with two infinite lines of traders moving in opposite directions), and imagine that you do not know the trading history of anyone you match with. In this incomplete information game, there is no punishment for cheating: you cheat the person you match with today, and no one you meet with tomorrow will ever directly or indirectly learn about this. Hence cooperation is not sustained.

What we need, then, is an institution that first collects a sufficient statistic for the honesty of traders you might deal with, that incentivizes merchants to bother to check this sufficient statistic and punish people who have cheated, and that encourages people to report if they have been cheated even if this reporting is personally costly. That is, “institutions must be designed both to keep the traders adequately informed of their responsibilities and to motivate them to do their duties.”

Consider an institution LM. When you are matched with a trading partner, you can query LM at cost Q to find out if there are any “unpaid judgments” against your trading partner, and this query is common knowledge to you and your partner. You and your partner then play a trading game which is a Prisoner’s Dilemma. After trading, and only if you paid the query cost Q, you can, if you have been cheated, pay another cost C to take your trading partner to trial. If your partner cheated you in the Prisoner’s Dilemma and you took them to trial, you win a judgment penalty of J, which the cheater can either voluntarily pay you at cost c(J) or simply ignore. If the cheater doesn’t pay a judgment, LM lists them as having “unpaid judgments”.

Milgrom, North and Weingast show that, under certain conditions, the following is an equilibrium where everyone always cooperates: if you have no unpaid judgments, you always query LM. If no one queries LM, or if there are unpaid judgments against your trading partner, you defect in the Prisoner’s Dilemma, else you cooperate. If both parties queried LM and only one defects in the Prisoner’s Dilemma, the other trader pays cost C and takes the cheater to the LM for judgment. The conditions needed for this to be an equilibrium are that penalties for cheating are high enough, but not so high that cheaters prefer to retire to the countryside rather than pay them, and that the cost of querying LM is not too high. Note how the LM equilibrium encourages anyone to pay the personal cost of checking their trading partner’s history: if you don’t check, then you can’t go to LM for judgment if you are cheated, hence you will definitely be cheated. The LM also encourages people to pay the personal cost of putting a cheater on trial, because that is the only way to get a judgment decision, and that judgment is actually paid in equilibrium. Relying on reputation in the absence of an institution may not work if communicating reputation of someone who cheated you is personally costly: if you need to print up posters that Giuseppe cheated you, but can otherwise get no money back from Giuseppe, you are simply “throwing good money after bad” and won’t bother. The LM institution provides you an incentive to narc on the cheats.
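Here is a purely illustrative check of that flavor of conditions, in code, with made-up payoffs; the exact inequalities in Milgrom, North and Weingast differ in detail, so treat this only as a sketch of the incentive logic.

```python
# Illustrative feasibility check with made-up payoffs; the exact conditions in
# Milgrom, North and Weingast differ in detail, so treat this purely as a
# sketch of the incentive logic. Stage-game payoffs: R = mutual cooperation,
# T = temptation from cheating an honest partner, P = mutual defection.
# Q = query cost, C = cost of going to trial, J = judgment awarded,
# cJ = the cheater's cost of voluntarily paying the judgment, delta = discount factor.
R, T, P = 10.0, 16.0, 2.0
Q, C, J, cJ = 1.0, 3.0, 8.0, 9.0
delta = 0.9

value_good_standing = delta / (1 - delta) * (R - Q)  # trade and query forever
value_bad_standing = delta / (1 - delta) * P         # everyone defects on you

checks = {
    # Cheating and then paying the judgment should not beat honest trade today:
    "judgment deters cheating": T - cJ <= R,
    # Paying c(J) should beat carrying an unpaid judgment forever:
    "judgments get paid": cJ <= value_good_standing - value_bad_standing,
    # Skipping the query forfeits the right to trial, so your partner defects;
    # paying Q to keep cooperation going must be worth it:
    "querying is worthwhile": Q <= R - P,
    # The victim goes to trial because the judgment actually gets paid:
    "trial is worthwhile": C <= J,
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```

The point of the toy is just that each inequality maps to one of the design features discussed above: judgments deter cheating, judgments get paid, and querying and going to trial are privately worthwhile.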

Note also that in equilibrium, the only cost of the system is the cost of querying, since no one cheats. That is, in the sense of transactions costs, the Law Merchant may be a very low-cost institution: it generates cooperation even though only one piece of information, the existence of unpaid judgments, needs to be aggregated and communicated, and it generates cooperation among a large set of traders that never personally interact by using a single centralized “record-keeper”. Any system that induces cooperation must, at a minimum, inform a player whether their partner has cheated in the past. The Law Merchant system does this with no other costs in equilibrium, since in equilibrium, no one cheats, no one goes for judgment, and no resources are destroyed paying fines.

That historical institutions develop largely to limit transactions costs is a major theme in North’s work, and this paper is a beautiful, highly formal, explication of that broad Coasean idea. Our motivating puzzle – why use formal institutions when reputation provides precisely the same potential for punishment? – can be answered simply by noting that reputation requires information, and the cost-minimizing incentive-compatible way to aggregate and share that information may require an institution. The Law Merchant arises not because we need a way to punish offenders, since in the absence of the nation-state the Law Merchant offers no method for involuntary punishment beyond those that exist in its absence; and yet, in its role reducing costs in the aggregation of information, the Law proves indispensable. What a beautiful example of how theory can clarify our observations!

“The Role of Institutions in the Revival of Trade” appeared in Economics and Politics 2.1, March 1990, and extensions of these ideas to long-distance trade with many centers are considered in the papers by Avner Greif and his coauthors linked at the beginning of this post. A broad philosophical defense of the importance of transaction costs to economic history is North’s 1984 essay in the Journal of Institutional and Theoretical Economics. Two other titans of economics have recently passed away as well, I’m afraid. Herbert Scarf, the mathematician whose work is of fundamental importance to modern market design, was eulogized by Ricky Vohra and Al Roth. Nate Rosenberg, who with Zvi Griliches was the most important thinker on the economics of invention, was memorialized by Joshua Gans and Joel West.

“Valuing Diversity,” G. Loury & R. Fryer (2013)

The old chair of my alma mater’s economics department, Glenn Loury, is, somehow, wrapped up in a kerfuffle related to the student protests that have broken out across the United States. Loury, who is now at Brown, wrote an op-ed in the student paper which, to an economist, just says that the major racial problem in the United States is statistical discrimination rather than taste-based discrimination, and hence that the types of protests and the desired recourse of the student protesters are wrongheaded. After being challenged about “what type of a black scholar” he is, Loury wrote a furious response pointing out that he is, almost certainly, the world’s most prominent scholar on the topic of racial discrimination and potential remedies, and has been thinking about how policy can remedy racial injustice since before the students’ parents were even born.

An important aspect of his work is that, under statistical discrimination, there is huge scope for perverse and unintended effects of policies. This idea has been known since Ken Arrow’s famous 1973 paper, but Glenn Loury and Stephen Coate worked it out in greater detail in 1993. Imagine there are black and white workers, and high-paid good jobs, which require skill, and low-paid bad jobs, which do not. Workers make an unobservable investment in skill, and the firm sees only a proxy: sometimes unskilled workers “look like” skilled workers, sometimes skilled workers “look like” unskilled workers, and sometimes we aren’t sure. As in Arrow’s paper, there can be multiple equilibria. When firms aren’t sure of a worker’s skill, if they assume all such workers are unskilled, then equilibrium investment in skill will be low enough that the indeterminate workers indeed can’t profitably be placed in skilled jobs; but if firms assume all indeterminate workers are skilled, then there is enough skill investment to make it worthwhile for firms to place those workers in high-skill, high-wage jobs. Since there are multiple equilibria, if race or some other proxy is observable, we can be in the low-skill-job, low-investment equilibrium for one group and the high-skill-job, high-investment equilibrium for a different group. That is, even with no ex-ante difference across groups and no taste-based bias, we still wind up with a discriminatory outcome.
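A stripped-down numerical version of this multiplicity, with functional forms and parameters of my own choosing rather than Coate and Loury’s, can be checked in a few lines:

```python
# Toy version of the Coate-Loury multiplicity (functional forms and parameters
# are mine, chosen only for illustration). Workers draw an investment cost
# c ~ Uniform[0, 1]. Skilled workers emit a "good" signal with probability a,
# otherwise an ambiguous one; unskilled workers emit a "bad" signal with
# probability b, otherwise an ambiguous one. Firms place ambiguous workers in
# the skilled job only if the posterior that they are skilled is at least r.
w, a, b, r = 1.0, 0.2, 0.7, 0.5   # w = wage premium of the skilled job

def investment_rate(lenient):
    # Gain in P(skilled job) from investing: b if ambiguity is treated kindly
    # (investing avoids the damning "bad" signal), a if it is treated harshly
    # (only the "good" signal gets you the job).
    benefit = w * (b if lenient else a)
    return min(benefit, 1.0)          # fraction of workers with cost below benefit

def posterior_ambiguous(pi):
    return pi * (1 - a) / (pi * (1 - a) + (1 - pi) * (1 - b))

for lenient in (True, False):
    pi = investment_rate(lenient)
    post = posterior_ambiguous(pi)
    self_confirming = (post >= r) == lenient
    print(f"lenient beliefs={lenient}: investment rate={pi:.2f}, "
          f"posterior on ambiguous={post:.2f}, self-confirming={self_confirming}")
```

Both belief systems are self-confirming: lenient treatment of the ambiguous signal supports high investment, and harsh treatment supports low investment, for the very same distribution of underlying talent.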

The question Coate and Loury ask is whether affirmative action can fix this negative outcome. Let an affirmative action rule state that the proportion of each group assigned to the skilled job must be equal. Ideally, affirmative action would generate equilibrium beliefs by firms about workers that are the same no matter what group those workers come from, and hence equal skill investment across groups. Will this happen? Not necessarily. Assume we are in the equilibrium where one group is assumed low-skill when their signal is indeterminate, and the other group is assumed high-skill.

In order to meet the affirmative action rule, either more of the discriminated group needs to be assigned to the high-skill job, or more of the favored group needs to be assigned to the low-skill job. Note that in the equilibrium without affirmative action, the discriminated group invests less in skills, and hence the proportion of the discriminated group that tests as unskilled is higher than the proportion of the favored group that does so. The firms can meet the affirmative action rule, then, by keeping the assignment rule for the favored group as before, and by assigning all proven-skilled and indeterminate workers from the discriminated group, as well as some random proportion of its proven-unskilled workers, to the skilled task. This rule decreases the discriminated group’s incentive to invest in skills, and hence it is no surprise not only that it can be an equilibrium, but that Coate and Loury can show the dynamics of this policy lead to fewer and fewer workers from the discriminated group investing in skills over time: despite identical potential at birth, affirmative action policies can lead to “patronizing equilibria” that exacerbate, rather than fix, differences across groups. The growing skill difference between previously-discriminated-against “Bumiputra” Malays and Chinese Malays following affirmative action policies in the 1970s fits this narrative nicely.

The broader point here, and one that comes up in much of Loury’s theoretical work, is that because policies affect beliefs even of non-bigoted agents, statistical discrimination is a much harder problem to solve than taste-based or “classical” bias. Consider the job market for economists. If women or minorities have trouble finding jobs because of an “old boys’ club” that simply doesn’t want to hire those groups, then the remedy is simple: require hiring quotas and the like. If, however, the problem is that women or minorities don’t enter economics PhD programs because of a belief that it will be hard to be hired, and that difference in entry leads to fewer high-quality women or minorities come graduation, then remedies like simple quotas may lead to perverse incentives.

Moving beyond perverse incentives, there is also the question of how affirmative action programs should be designed if we want to equate outcomes across groups that face differential opportunities. This question is taken up in “Valuing Diversity”, a recent paper Loury wrote with recent John Bates Clark medal winner Roland Fryer. Consider Dalits in India or African-Americans: for a variety of reasons, from historic social network persistence to neighborhood effects, the cost of increasing skill may be higher for these groups. We have an opportunity which is valuable, such as slots at a prestigious college. Simply providing equal opportunity may not be feasible because the social reasons why certain groups face higher costs of increasing skill are very difficult to solve. Brown University, or even the United States government as a whole, may be unable to fix the persistent social differences in upbringing between blacks and whites. So what to do?

There are two natural fixes. We can provide a lower bar for acceptance for the discriminated group at the prestigious college, or subsidize skill acquisition for the discriminated group by providing special summer programs, tutoring, etc. If policy can be conditioned on group identity, then the optimal policy is straightforward. First, note that in a laissez faire world, individuals invest in skill until the cost of investment for the marginal accepted student exactly equates to the benefit the student gets from attending the fancy college. That is, the equilibrium is efficient: students with the lowest cost of acquiring skill are precisely the ones who invest and are accepted. But precisely that weighing of marginal benefit and costs holds within group if the acceptance cutoff differs by group identity, so if policy can condition on group identity, we can get whatever mix of students from different groups we want while still ensuring that the students within each group with the lowest cost of upgrading their skill are precisely the ones who invest and are accepted. The policy change itself, by increasing the quota of slots for the discriminated group, will induce marginal students from that group to upgrade their skills in order to cross the acceptance threshold; that is, quotas at the assignment stage implicitly incentivize higher investment by the discriminated group.
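In stylized notation of my own (not Fryer and Loury’s), with V the value of a slot, F_g the distribution of skill-acquisition costs in group g, and q_g the share of group g to be admitted (assume the quota is small enough that the marginal admit still finds the slot worth its cost), the group-g standard pins down a cost cutoff c_g* with

$$ F_g(c_g^{*}) = q_g, \qquad c_g^{*} \le V, $$

and exactly the students with c_i ≤ c_g* invest and attend: whatever mix (q_A, q_B) the planner picks, within each group it is the lowest-cost students who upgrade their skills and fill the slots.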

The trickier problem is when policy cannot condition on group identity, as is the case in the United States under current law. I would like somehow to accept more students from the discriminated-against group, and to ensure that those students invest in their skill, but the policy I set needs to treat the favored and discriminated-against groups equally. Since discriminated-against students make up a bigger proportion of those with a high cost of skill acquisition than of those with a low cost of skill acquisition, any “blind” policy that does not condition on group identity will induce identical investment activity and acceptance probability among agents with identical costs of skill upgrading. Hence any blind policy that induces more discriminated-against students to attend college must somehow be accepting students with higher costs of skill acquisition than the marginal accepted student under laissez faire, and must not be accepting some students whose costs of skill acquisition were at the laissez faire margin. Fryer and Loury show, by solving the relevant linear program, that we can best achieve this by allowing the most productive students to buy their slots, and then randomly assigning slots to everyone else.

Under that policy, students with a very low cost of effort still invest, so that their skill is high enough that buying a guaranteed slot is worth it. I then use either a tax or a subsidy on skill investment to affect how many people find it worthwhile to invest in skill and then buy the guaranteed slot, and hence, in conjunction with the randomized slot assignment, to achieve the desired mix of accepted students across groups.

This result resembles certain results in dynamic pricing. How do I get people to pay a high price for airplane tickets while still hoping to sell would-be-empty seats later at a low price? The answer is that I make high-value people worried that if they don’t buy early, the plane may sell out. The high-value people then trade off paying a high price and getting a seat with probability 1 against waiting for a low price and perhaps not getting on the plane at all. Likewise, how do I induce people to invest in skills even when some lower-skill people will be admitted? Ensure that lower-skill people are only admitted with some randomness. The folks who can get perfect grades and test scores fairly easily will still exert the effort to do so, guaranteeing themselves a place at their top-choice college rather than hoping to be admitted subject to some random luck. This type of intuition is non-obvious, which is precisely Loury’s point: racial and other forms of injustice are often due to factors much more subtle than outright bigotry, and the optimal responses to these more subtle causes do not fit easily on a placard or a bullhorn slogan.

Final working paper (RePEc IDEAS version), published in the JPE, 2013. Hanming Fang and Andrea Moro have a nice handbook chapter on theoretical explorations of discrimination. Loury and John McWhorter also have an interesting and provocative dialog on the recent student protests at Bloggingheads.

Angus Deaton, 2015 Nobel Winner: A Prize for Structural Analysis?

Angus Deaton, the Scottish-born, Cambridge-trained Princeton economist, best known for his careful work on measuring changes in the wellbeing of the world’s poor, has won the 2015 Nobel Prize in economics. His data collection work is fairly easy to understand, so I will leave larger discussion of exactly what he has found to the general news media; Deaton’s book “The Great Escape” provides a very nice summary of what he has found as well. I think a fair reading of his development preferences is that he much prefers the currently en vogue idea of just giving cash to the poor and letting them spend it as they wish.

Essentially, when one carefully measures consumption, health, or generic characteristics of wellbeing, there has been tremendous improvement indeed in the state of the world’s poor. National statistics do not measure these ideas well, because developing countries do not tend to track data at the level of the individual. Indeed, even in the United States, we have only recently begun work on localized measures of the price level, and hence of the poverty rate. Deaton claims, as in his 2010 AEA Presidential Address (previously discussed briefly on two occasions on AFT), that many of the measures of global inequality and poverty used by the press are fundamentally flawed, largely because of the weak theoretical justification for how they link prices across regions and countries. Careful non-aggregate measures of consumption, health, and wellbeing, like those generated by Deaton, Tony Atkinson, Alwyn Young, Thomas Piketty and Emmanuel Saez, are essential for understanding how human welfare has changed over time and space, and this work is a deserving rationale for a Nobel.

The surprising thing about Deaton, however, is that despite his great data-collection work and his interest in development, he is famously hostile to the “randomista” trend which proposes that randomized control trials (RCT) or other suitable tools for internally valid causal inference are the best way of learning how to improve the lives of the world’s poor. This mode is most closely associated with the enormously influential J-PAL lab at MIT, and there is no field in economics where you are less likely to see traditional price theoretic ideas than modern studies of development. Deaton is very clear on his opinion: “Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as “hard” while other methods are “soft”… [T]he analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work.” I would argue that Deaton’s work is much closer to more traditional economic studies of development than to RCTs.

To understand this point of view, we need to go back to Deaton’s earliest work. Among Deaton’s most famous early papers was his development of the Almost Ideal Demand System (AIDS) in 1980 with Muellbauer, a paper chosen as one of the 20 best published in the first 100 years of the AER. It has long been known that individual demand equations which come from utility maximization must satisfy certain properties. For example, a rational consumer’s demand for food should not depend on whether the consumer’s equivalent real salary is paid in American or Canadian dollars. These restrictions turn out to be useful: if you want to know how demand for various products depends on changes in income, among many other questions, the restrictions of utility theory simplify estimation greatly by reducing the number of free parameters. The problem is in specifying a form for aggregate demand, such as how demand for cars depends on the incomes of all consumers and the prices of other goods. It turns out that, in general, aggregate demand generated by utility-maximizing households does not satisfy the same restrictions as individual demand; you can’t simply assume that there is a “representative consumer” with some utility function whose demand equals the sum of each individual agent’s demand. What form should we write for aggregate demand, and how congruent is that form with economic theory? Surely an important question if we want to estimate how a shift in taxes on some commodity, or a policy of giving some agricultural input to some farmers, is going to affect demand for output, its price, and hence welfare!

Let q(j)=D(p,c,e) say that the quantity of good j consumed, in aggregate, is a function of the prices of all goods p and of total consumption (or average consumption) c, plus perhaps some random error e. This can be tough to estimate: if D(p,c,e)=Ap+e, where demand is just a linear function of relative prices, then we have a k-by-k matrix to estimate, where k is the number of goods. Worse, that demand function imposes an enormous restriction on what individual demand functions, and hence utility functions, look like, in a way that theory does not necessarily support. The AIDS of Deaton and Muellbauer combines two facts: Taylor expansions approximately linearize nonlinear functions, and individual demands can be aggregated, even when heterogeneous across individuals, if the restrictions of Muellbauer’s PIGLOG papers are satisfied. The result is a functional form for aggregate demand D which is consistent with aggregated individual rational behavior and which can sometimes be estimated via OLS. They use British data to argue that aggregate demand violates testable assumptions of the model, and hence that factors like credit constraints or price expectations are fundamental in explaining aggregate consumption.
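For concreteness, the AIDS budget-share equations take the familiar form (stated here from memory of the 1980 paper, so consult the original for exact notation): the share of spending on good i is

$$ w_i \;=\; \alpha_i + \sum_j \gamma_{ij}\,\log p_j + \beta_i \,\log\!\left(\frac{x}{P}\right), $$

where x is total expenditure and log P is a translog price index; replacing log P with the Stone index Σ_k w_k log p_k yields the “linear approximate” AIDS that can be run by OLS, and the theory shows up as the restrictions Σ_i α_i = 1, Σ_i β_i = 0, Σ_i γ_ij = 0 (adding up), Σ_j γ_ij = 0 (homogeneity), and γ_ij = γ_ji (symmetry), which can be tested against the data.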

This exercise brings up a number of first-order questions for a development economist. First, it shows clearly the problem with estimating aggregate demand as a purely linear function of prices and income, as if society were a single consumer. Second, it shows the importance of how we measure the overall price level for figuring out the effects of taxes and other policies. Third, it combines theory and data to convincingly suggest that models which estimate demand solely as a function of current prices and current income are necessarily going to give misleading results, even when demand is allowed to take on very general forms as in the AIDS model. A huge body of research since 1980 has investigated how we can better model demand in order to credibly evaluate demand-affecting policy. All of this is very different from how a certain strand of development economist today might investigate something like a subsidy. Rather than taking observational data, these economists might look for a random or quasirandom experiment where such a subsidy was introduced, and estimate the “effect” of that subsidy directly on some quantity of interest, without concern for how exactly that subsidy generated the effect.

To see the difference between randomization and more structural approaches like AIDS, consider the following example from Deaton. You are asked to evaluate whether China should invest more in building railway stations if it wishes to reduce poverty. Many economists trained in a manner influenced by the randomization movement would say, well, we can’t just regress a measure of city-by-city poverty on the existence of a railway station. The existence of a railway station depends on both things we can control for (the population of a given city) and things we can’t control for (a subjective belief that a town is “growing” when the railway is plopped there). Let’s find something that is correlated with rail station building but uncorrelated with the unobserved factors that link rail station building to poverty: for instance, a city may lie on a geographically natural path between two large cities. If certain assumptions hold, it turns out that a two-stage “instrumental variable” approach can use that “quasi-experiment” to generate the LATE, or local average treatment effect. This effect is the average benefit of a railway station for poverty reduction, at the local margin of cities which are just induced by the instrument to build a railway station. Similar techniques, like difference-in-differences and randomized controlled trials, can generate credible LATEs under slightly different assumptions. In development work today, it is very common to see a paper where large portions are devoted to showing that the assumptions (often untestable) of a given causal inference model are likely to hold in a given setting, and which then finally claims that the treatment effect of X on Y is Z. That LATEs can be identified outside of purely randomized contexts is incredibly important and valuable, and the economists and statisticians who did the heavy statistical lifting on this so-called Rubin model will absolutely and justly win an Economics Nobel sometime soon.
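A toy simulation (hypothetical numbers, not Deaton’s or anyone’s actual data) makes the mechanics concrete: the confounded OLS estimate overstates the poverty-reducing effect of a station, while the instrument recovers it.

```python
import numpy as np

# Purely illustrative simulation (hypothetical numbers, nobody's actual data).
# A station is more likely where unobserved "growth" sentiment is high (the
# confounder) and where a city sits on a natural rail path between two large
# cities (the instrument z). The true effect of a station on poverty is -1.
rng = np.random.default_rng(0)
n = 100_000
growth = rng.normal(size=n)                          # unobserved confounder
z = rng.binomial(1, 0.5, size=n)                     # on a natural rail path?
station = (0.8 * z + 0.8 * growth + rng.normal(size=n) > 0.8).astype(float)
poverty = 2.0 - 1.0 * station - 1.5 * growth + rng.normal(size=n)

# Naive OLS conflates the station effect with the confounding "growth" factor.
ols_slope = np.polyfit(station, poverty, 1)[0]

# IV with a binary instrument reduces to the Wald estimator
# (equivalently, 2SLS with z as the single instrument).
iv_slope = (poverty[z == 1].mean() - poverty[z == 0].mean()) / (
            station[z == 1].mean() - station[z == 0].mean())

print(f"true effect: -1.0, naive OLS: {ols_slope:.2f}, IV/Wald: {iv_slope:.2f}")
```

With homogeneous effects, as in this toy, the IV estimate recovers the single causal effect; with heterogeneous effects it would instead be a LATE for the cities whose station decision is swung by the path instrument, which is exactly the interpretive subtlety Deaton emphasizes.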

However, this use of instrumental variables would surely seem strange to the old Cowles Commission folks; Deaton is correct that “econometric analysis has changed its focus over the years, away from the analysis of models derived from theory towards much looser specifications that are statistical representations of program evaluation. With this shift, instrumental variables have moved from being solutions to a well-defined problem of inference to being devices that induce quasi-randomization.” The traditional use of instrumental variables was that after writing down a theoretically justified model of behavior or aggregates, certain parameters – not treatment effects, but parameters of a model – are not identified. For instance, price and quantity transacted are determined by the intersection of aggregate supply and aggregate demand. Knowing, say, that price and quantity were (a,b) today and are (c,d) tomorrow does not let me figure out the shape of either the supply or the demand curve. If price and quantity both rise, it may be that demand alone has increased, pushing the demand curve to the right, or that demand has increased while the supply curve has also shifted to the right a small amount, or many other combinations. An instrument that shifts supply without changing demand, or vice versa, can be used to “identify” the curves: an exogenous change in the price of oil will shift the supply of gasoline without much of an effect on the demand curve, and hence we can examine price and quantity transacted before and after the oil supply shock to trace out the slope of the demand curve.
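In the simplest linear version (my notation, purely to fix ideas), write demand as q = α_d − β·p + u and supply as q = α_s + γ·p + θ·z + v, where z is the oil-price shifter and is uncorrelated with the demand disturbance u. Then

$$ \beta \;=\; -\,\frac{\operatorname{cov}(q, z)}{\operatorname{cov}(p, z)}, $$

so the supply shifter identifies the demand slope precisely because it slides the supply curve along a fixed demand curve.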

Note the difference between the supply and demand system and the treatment-effects use of instrumental variables. In the former case, we have a well-specified system of supply and demand, based on economic theory. Once the supply and demand curves are estimated, we can then perform all sorts of counterfactual and welfare analysis. In the latter case, we generate a treatment effect (really, a LATE), but we do not really know why we got the treatment effect we got. Are rail stations useful because they reduce price variance across cities, because they allow increasing returns to scale in industry to be exploited, or for some other reason? Once we know the “why”, we can ask questions like: is there a cheaper way to generate the same benefit? Is heterogeneity in the benefit important? Ought I expect the results from my quasiexperiment in place A and time B to still operate in place C and time D (a famous example being the drug Opren, which was very successful in RCTs but turned out to be particularly deadly when used widely by the elderly)? Worse, the whole idea of LATE is backwards. We traditionally choose a parameter of interest, which may or may not be a treatment effect, and then choose an estimation technique that can credibly estimate that parameter. Quasirandom techniques instead start by specifying the estimation technique and then hunt for a quasirandom setting, or randomize appropriately by “dosing” some subjects and not others, in order to fit the assumptions necessary to generate a LATE. It is often the case that even policymakers do not care principally about the LATE; rather, they care about some measure of welfare impact which is rarely immediately interpretable even if the LATE is credibly known!

Given these problems, why are random and quasirandom techniques so heavily endorsed by the dominant branch of development? Again, let’s turn to Deaton: “There has also been frustration with the World Bank’s apparent failure to learn from its own projects, and its inability to provide a convincing argument that its past activities have enhanced economic growth and poverty reduction. Past development practice is seen as a succession of fads, with one supposed magic bullet replacing another—from planning to infrastructure to human capital to structural adjustment to health and social capital to the environment and back to infrastructure—a process that seems not to be guided by progressive learning.” This is to say, the conditions necessary to estimate theoretical models are so stringent that development economists have been writing noncredible models, estimating them, generating some fad of programs that is used in development for a few years until it turns out not to be a silver bullet, then abandoning the fad for some new technique. Better, the randomistas argue, to forget about external validity for now, and instead just evaluate the LATEs on a program-by-program basis, iterating on what types of programs we evaluate until we have a suitable list of interventions that we feel confident work. That is, development should operate like medicine.

We have something of an impasse here. Everyone agrees that on many questions theory is ambiguous in the absence of particular types of data, hence more and better data collection is important. Everyone agrees that many parameters of interest for policymaking require certain assumptions, some more justifiable than others. Deaton’s position is that the parameters of interest to economists by and large are not LATEs, and cannot be generated in a straightforward way from LATEs. Thus, following Nancy Cartwright’s delightful phrasing, if we are to “use” causes rather than just “hunt” for what they are, we have no choice but to specify the minimal economic model which is able to generate the parameters we care about from the data. Glen Weyl’s attempt to rehabilitate price theory and Raj Chetty’s sufficient statistics approach are both attempts to combine the credibility of random and quasirandom inference with the benefits of external validity and counterfactual analysis that model-based structural designs permit.

One way to read Deaton’s prize, then, is as an award for the idea that effective development requires theory if we even hope to compare welfare across space and time or to understand why policies like infrastructure improvements matter for welfare and hence whether their beneficial effects will remain when moved to a new context. It is a prize which argues against the idea that all theory does is propose hypotheses. For Deaton, going all the way back to his work with AIDS, theory serves three roles: proposing hypotheses, suggesting which data is worthwhile to collect, and permitting inference on the basis of that data. A secondary implication, very clear in Deaton’s writing, is that even though the “great escape” from poverty and want is real and continuing, that escape is almost entirely driven by effects which are unrelated to aid and which are uninfluenced by the type of small bore, partial equilibrium policies for which randomization is generally suitable. And, indeed, the best development economists very much understand this point. The problem is that the media, and less technically capable young economists, still hold the mistaken belief that they can infer everything they want to infer about “what works” solely using the “scientific” methods of random- and quasirandomization. For Deaton, results that are easy to understand and communicate, like the “dollar-a-day” poverty standard or an average treatment effect, are less virtuous than results which carefully situate numbers in the role most amenable to answering an exact policy question.

Let me leave you with three side notes and some links to Deaton’s work. First, I can’t help but laugh at Deaton’s description of his early career in one of his famous “Notes from America”. Deaton, despite being a student of the 1984 Nobel laureate Richard Stone, graduated from Cambridge essentially unaware of how one ought to publish in the big “American” journals like Econometrica and the AER. Cambridge had gone from being the absolute center of economic thought to something of a disconnected backwater, and Deaton, despite writing a paper that would win a prize as one of the best papers published in Econometrica in the late 1970s, had essentially no understanding of the norms of publishing in such a journal! When the history of modern economics is written, the rise of a handful of European programs and their role in reintegrating economics on both sides of the Atlantic will be fundamental. Second, Deaton’s prize should be seen as something of a callback to the ’84 prize to Stone and the ’77 prize to Meade, two of the least known Nobel laureates. I don’t think it is an exaggeration to say that the majority of new PhDs from even the very best programs will have no idea who those two men are, or what they did. But as Deaton mentions, Stone in particular was one of the early “structural modelers,” in that he was interested in estimating the so-called “deep” or behavioral parameters of economic models in a way that is absolutely universal today, as well as being a pioneer in the creation and collection of novel economic statistics whose value was proposed on the basis of economic theory. Quite a modern research program! Third, of the 19 papers in the AER “Top 20 of all time” whose authors were alive during the era of the economics Nobel, 14 have had at least one author win the prize. Should this be a cause for hope for the living outliers, Anne Krueger, Harold Demsetz, Stephen Ross, John Harris, Michael Todaro and Dale Jorgenson?

For those interested in Deaton’s work beyond this short essay, his methodological essay, quoted often in this post, is here. The Nobel Prize technical summary, always a great and well-written read, can be found here.
