Category Archives: Education

What Randomization Can and Cannot Do: The 2019 Nobel Prize

It is Nobel Prize season once again, a grand opportunity to dive into some of our field’s most influential papers and to consider their legacy. This year’s prize was inevitable, an award to Abhijit Banerjee, Esther Duflo, and Michael Kremer for popularizing the hugely influential experimental approach to development. It is only fitting that my writeup this year has been delayed due to the anti-government road blockades here in Ecuador which delayed my return to the internet-enabled world – developing countries face many barriers to reaching prosperity, and rarely have I been so personally aware of the effects of place on productivity as I was this week!

The reason for the prize is straightforward: an entire branch of economics, development, looks absolutely different from what it looked like thirty years ago. Development used to be essentially a branch of economic growth. Researchers studied topics like the productivity of large versus small farms, the nature of “marketing” (or the nature of markets and how economically connected different regions in a country are), or the necessity of exports versus industrialization. Studies were almost wholly observational, deep data collections with throwaway references to old-school growth theory. Policy was largely driven by the subjective impression of donors or program managers about projects that “worked”. To be a bit too honest – it was a dull field, and hence a backwater. And worse than dull, it was a field where scientific progress was seriously lacking.

Banerjee has a lovely description of the state of affairs back in the 1990s. Lots of probably-good ideas were funded, informed deeply by history, but with very little convincing evidence that highly-funded projects were achieving their stated aims. Of the World Bank Sourcebook of recommended projects, everything from scholarships to girls to vouchers for poor children to citizens’ report cards were recommended. Did these actually work? Banerjee quotes a program providing computer terminals in rural areas of Madhya Pradesh which explains that due to a lack of electricity and poor connectivity, “only a few of the kiosks have proved to be commercially viable”, then notes, without irony, that “following the success of the initiative,” similar programs would be funded. Clearly this state of affairs is unsatisfactory. Surely we should be able to evaluate the projects we’ve funded already? And better, surely we should structure those evaluations to inform future projects? Banerjee again: “the most useful thing a development economist can do in this environment is stand up for hard evidence.”

And where do we get hard evidence? If by this we mean internal validity – that is, whether the effect we claim to have seen is actually caused by a particular policy in a particular setting – applied econometricians of the “credibility revolution” in labor in the 1980s and 1990s provided an answer. Either take advantage of natural variation with useful statistical properties, like the famed regression discontinuity, or else randomize treatment like a medical study. The idea here is that the assumptions needed to interpret a “treatment effect” are often less demanding than those needed to interpret the estimated parameter of an economic model, hence more likely to be “real”. The problem in development is that most of what we care about cannot be randomized. How are we, for instance, to randomize whether a country adopts import substitution industrialization or not, or randomize farm size under land reform – and at a scale large enough for statistical inference?

What Banerjee, Duflo, and Kremer noticed is that much of what development agencies do in practice has nothing to do with those large-scale interventions. The day-to-day work of development is making sure teachers show up to work, vaccines are distributed and taken up by children, corruption does not deter the creation of new businesses, and so on. By breaking down the work of development on the macro scale to evaluations of development at micro scale, we can at least say something credible about what works in these bite-size pieces. No longer should the World Bank Sourcebook give a list of recommended programs, based on handwaving. Rather, if we are to spend 100 million dollars sending computers to schools in a developing country, we should at least be able to say “when we spent 5 million on a pilot, we designed the pilot so as to learn that computers in that particular setting led to a 12% decrease in dropout rate, and hence a 34%-62% return on investment according to standard estimates of the link between human capital and productivity.” How to run those experiments? How should we set them up? Who can we get to pay for them? How do we deal with “piloting bias”, where the initial NGO we pilot with is more capable than the government we expect to act on evidence learned in the first study? How do we deal with spillovers from randomized experiments, econometrically? Banerjee, Duflo, and Kremer not only ran some of the famous early experiments, they also established the premier academic institution for running these experiments – J-PAL at MIT – and further wrote some of the best known practical guides to experiments in development.

Many of the experiments written by the three winners are now canonical. Let’s start with Michael Kremer’s paper on deworming, with Ted Miguel, in Econometrica. Everyone agreed that deworming kids infected with things like hookworm has large health benefits for the children directly treated. But since worms are spread by outdoor bathroom use and other poor hygiene practices, one infected kid can also harm nearby kids by spreading the disease. Kremer and Miguel suspected that one reason school attendance is so poor in some developing countries is because of the disease burden, and hence that reducing infections among one kid benefits the entire community, and neighboring ones as well, by reducing overall infection. By randomizing mass school-based deworming, and measuring school attendance both at the focal and at neighboring schools, they found that villages as far as 4km away saw higher school attendance (4km rather 6km in the original paper due to a correction of an error in the analysis). Note the good economics here: a change from individual to school-based deworming helps identify spillovers across schools, and some care goes into handling the spatial econometric issue whereby density of nearby schools equals density of nearby population equals differential baseline infection rates at these schools. An extra year of school attendance could therefore be “bought” by a donor for $3.50, much cheaper than other interventions such as textbook programs or additional teachers. Organizations like GiveWell still rate deworming among the most cost-effective educational interventions in the world: in terms of short-run impact, surely this is one of the single most important pieces of applied economics of the 21st century.

The laureates have also used experimental design to learn that some previously highly-regarded programs are not as important to development as you might suspect. Banerjee, Duflo, Rachel Glennerster and Cynthia Kinnan studied microfinance rollout in Hyderabad, randomizing the neighborhoods which received access to a major first-gen microlender. These programs are generally woman-focused, joint-responsibility, high-interest loans a la the Nobel Peace Prize winning Grameen Bank. 2800 households across the city were initially surveyed about their family characteristics, lending behavior, consumption, and entrepreneurship, then followups were performed a year after the microfinance rollout, and then three years later. While women in treated areas were 8.8 percentage points more likely to take a microloan, and existing entrepreneurs do in fact increase spending on their business, there is no long-run impact on education, health, or the likelihood women make important family decisions, nor does it make businesses more profitable. That is, credit constraints, at least in poor neighborhoods in Hyderabad, do not appear the main barrier to development; this is perhaps not very surprising, since higher-productivity firms in India in the 2000s already have access to reasonably well-developed credit markets, and surely they are the main driver of national income (followup work does see some benefits for very high talent, very poor entrepreneurs, but the long run key result remains).

Let’s realize how wild this paper is: a literal Nobel Peace Prize was awarded for a form of lending that had not really been rigorously analyzed. This form of lending effectively did not exist in rich countries at the time they developed, so it is not a necessary condition for growth. And yet enormous amounts of money went into a somewhat-odd financial structure because donors were nonetheless convinced, on the basis of very flimsy evidence, that microlending was critical.

By replacing conjecture with evidence, and showing randomized trials can actually be run in many important development settings, the laureates’ reformation of economic development has been unquestionably positive. Or has it? Before returning to the (truly!) positive aspects of Banerjee, Duflo and Kremer’s research program, we must take a short negative turn. Because though Banerjee, Duflo, and Kremer are unquestionably the leaders of the field of development, and the most influential scholars for young economists working in that field, there is much more controversy about RCTs than you might suspect if all you’ve seen are the press accolades of the method. Donors love RCTs, as they help select the right projects. Journalists love RCTs, as they are simple to explain (Wired, in a typical example of this hyperbole: “But in the realm of human behavior, just as in the realm of medicine, there’s no better way to gain insight than to compare the effect of an intervention to the effect of doing nothing at all. That is: You need a randomized controlled trial.”) The “randomista” referees love RCTs – a tribe is a tribe, after all. But RCTs are not necessarily better for those who hope to understand economic development! The critiques are three-fold.

First, that while the method of random trials is great for impact or program evaluation, it is not great for understanding how similar but not exact replications will perform in different settings. That is, random trials have no specific claim to external validity, and indeed are worse than other methods on this count. Second, it is argued that development is much more than program evaluation, and that the reason real countries grow rich has essentially nothing to do with the kinds of policies studied in the papers we discussed above: the “economist as plumber” famously popularized by Duflo, who rigorously diagnoses small problems and proposes solutions, is a fine job for a World Bank staffer, but a crazy use of the intelligence of our otherwise-leading scholars in development. Third, even if we only care about internal validity, and only care about the internal validity of some effect that can in principle be studied experimentally, the optimal experimental design is generally not an RCT.

The external validity problem is often seen to be one related to scale: well-run partner NGOs are just better at implementing any given policy that, say, a government, so the benefit of scaled-up interventions may be much lower than that identified by an experiment. We call this “piloting bias”, but it isn’t really the core problem. The core problem is that the mapping from one environment or one time to the next depends on many factors, and by definition the experiment cannot replicate those factors. A labor market intervention in a high-unemployment country cannot inform in an internally valid way about a low-unemployment country, or a country with different outside options for urban laborers, or a country with an alternative social safety net or cultural traditions about income sharing within families. Worse, the mapping from a partial equilibrium to a general equilibrium world is not at all obvious, and experiments do not inform as to the mapping. Giving cash transfers to some villagers may make them better off, but giving cash transfers to all villagers may cause land prices to rise, or cause more rent extraction by corrupt governments, or cause all sorts of other changes in relative prices.

You can see this issue in the Scientific Summary of this year’s Nobel. Literally, the introductory justification for RCTs is that, “[t]o give just a few examples, theory cannot tell us whether temporarily employing additional contract teachers with a possibility of re-employment is a more cost-effective way to raise the quality of education than reducing class sizes. Neither can it tell us whether microfinance programs effectively boost entrepreneurship among the poor. Nor does it reveal the extent to which subsidized health-care products will raise poor people’s investment in their own health.”

Theory cannot tell us the answers to these questions, but an internally valid randomized control trial can? Surely the wage of the contract teacher vis-a-vis more regular teachers and hence smaller class sizes matters? Surely it matters how well-trained these contract teachers are? Surely it matters what the incentives for investment in human capital by students in the given location is? To put this another way: run literally whatever experiment you want to run on this question in, say, rural Zambia in grade 4 in 2019. Then predict the cost-benefit ratio of having additional contract teachers versus more regular teachers in Bihar in high school in 2039. Who would think there is a link? Actually, let’s be more precise: who would think there is a link between what you learned in Zambia and what will happen in Bihar which is not primarily theoretical? Having done no RCT, I can tell you that if the contract teachers are much cheaper per unit of human capital, we should use more of them. I can tell you that if the students speak two different languages, there is a greater benefit in having a teacher assistant to translate. I can tell you that if the government or other principal has the ability to undo outside incentives with a side contract, hence are not committed to the mechanism, dynamic mechanisms will not perform as well as you expect. These types of statements are theoretical: good old-fashioned substitution effects due to relative prices, or a priori production function issues, or basic mechanism design.

Things are worse still. It is not simply that an internally valid estimate of a treatment effect often tells us nothing about how that effect generalizes, but that the important questions in development cannot be answered with RCTs. Everyone working in development has heard this critique. But just because a critique is oft-repeated does not mean it is wrong. As Lant Pritchett argues, national development is a social process involving markets, institutions, politics, and organizations. RCTs have focused on, in his reckoning, “topics that account for roughly zero of the observed variation in human development outcomes.” Again, this isn’t to say that RCTs cannot study anything. Improving the function of developing world schools, figuring out why malaria nets are not used, investigating how to reintegrate civil war fighters: these are not minor issues, and it’s good that folks like this year’s Nobelists and their followers provide solid evidence on these topics. The question is one of balance. Are we, as economists are famously wont to do, simply looking for keys underneath the spotlight when we focus our attention on questions which are amenable to a randomized study? Has the focus on internal validity diverted effort from topics that are much more fundamental to the wealth of nations?

But fine. Let us consider that our question of interest can be studied in a randomized fashion. And let us assume that we do not expect piloting bias or other external validity concerns to be first-order. We still have an issue: even on internal validity, randomized control trials are not perfect. They are certainly not a “gold standard”, and the econometricians who push back against this framing have good reason to do so. Two primary issues arise. First, to predict what will happen if I impose a policy, I am concerned that what I have learned in this past is biased (e.g., the people observed to use schooling subsidies are more diligent than those who would go to school if we made these subsidies universal). But I am also concerned about statistical inference: with small sample sizes, even an unbiased estimate will not predict very well. I recently talked with an organization doing recruitment who quasi-randomly recruited at a small number of colleges. On average, they attracted a handful of applicants in each college. They stopped recruiting at the colleges with two or fewer applicants after the first year. But of course random variation means the difference between two and four applicants is basically nil.

In this vein, randomized trials tend to have very small sample sizes compared to observational studies. When this is combined with high “leverage” of outlier observations when multiple treatment arms are evaluated, particularly for heterogeneous effects, randomized trials often predict poorly out of sample even when unbiased (see Alwyn Young in the QJE on this point). Observational studies allow larger sample sizes, and hence often predict better even when they are biased. The theoretical assumptions of a structural model permit parameters to be estimated even more tightly, as we use a priori theory to effectively restrict the nature of economic effects.

We have thus far assumed the randomized trial is unbiased, but that is often suspect as well. Even if I randomly assign treatment, I have not necessarily randomly assigned spillovers in a balanced way, nor have I restricted untreated agents from rebalancing their effort or resources. A PhD student of ours on the market this year, Carlos Inoue, examined the effect of random allocation of a new coronary intervention in Brazilian hospitals. Following the arrival of this technology, good doctors moved to hospitals with the “randomized” technology. The estimated effect is therefore nothing like what would have been found had all hospitals adopted the intervention. This issue can be stated simply: randomizing treatment does not in practice hold all relevant covariates constant, and if your response is just “control for the covariates you worry about”, then we are back to the old setting of observational studies where we need a priori arguments about what these covariates are if we are to talk about the effects of a policy.

The irony is that Banerjee, Duflo and Kremer are often quite careful in how they motivate their work with traditional microeconomic theory. They rarely make grandiose claims of external validity when nothing of the sort can be shown by their experiment. Kremer is an ace theorist in his own right, Banerjee often relies on complex decision and game theory particularly in his early work, and no one can read the care with which Duflo handles issues of theory and external validity and think she is merely punting. Most of the complaints about their “randomista” followers do not fully apply to the work of the laureates themselves.

And none of the critiques above should be taken to mean that experiments cannot be incredibly useful to development. Indeed, the proof of the pudding is in the tasting: some of the small-scale interventions by Banerjee, Duflo, and Kremer have been successfully scaled up! To analogize to a firm, consider a plant manager interested in improving productivity. She could read books on operations research and try to implement ideas, but it surely is also useful to play around with experiments within her plant. Perhaps she will learn that it’s not incentives but rather lack of information that is the biggest reason workers are, say, applying car door hinges incorrectly. She may then redo training, and find fewer errors in cars produced at the plant over the next year. This evidence – not only the treatment effect, but also the rationale – can then be brought to other plants at the same company. All totally reasonable. Indeed, would we not find it insane for a manager to try things out, and make minor changes on the margin, before implementing a huge change to incentives or training? And of course the same goes, or should go, when the World Bank or DFID or USAID spend tons of money trying to solve some development issue.

On that point, what would even a skeptic agree a development experiment can do? First, it is generally better than other methods at identifying internally valid treatment effects, though still subject to the caveats above.

Second, it can fine-tune interventions along margins where theory gives little guidance. For instance, do people not take AIDS drugs because they don’t believe they work, because they don’t have the money, or because they want to continue having sex and no one will sleep with them if they are seen picking up antiretrovirals? My colleague Laura Derksen suspected that people are often unaware that antiretrovirals prevent transmission, hence in locations with high rates of HIV, it may be safer to sleep with someone taking antiretrovirals than the population at large. She shows that informational interventions informing villagers about this property of antiretrovirals meaningfully increases takeup of medication. We learn from her study that it may be important in the case of AIDS prevention to correct this particular set of beliefs. Theory, of course, tells us little about how widespread these incorrect beliefs are, hence about the magnitude of this informational shift on drug takeup.

Third, experiments allow us to study policies that no one has yet implemented. Ignoring the problem of statistical identification in observational studies, there may be many policies we wish to implement which are wholly different in kind from those seen in the past. The negative income tax experiments of the 1970s are a classic example. Experiments give researchers more control. This additional control is of course balanced against the fact that we should expect super meaningful interventions to have already occurred, and we may have to perform experiments at relatively low scale due to cost. We should not be too small-minded here. There are now experimental development papers on topics thought to be outside the bounds of experiment. I’ve previously discussed on this site Kevin Donovan’s work randomizing the placement of roads and bridges connected remote villages to urban centers. What could be “less amenable” to randomization that the literal construction of a road and bridge network?

So where do we stand? It is unquestionable that a lot of development work in practice was based on the flimsiest of evidence. It is unquestionable that armies Banerjee, Duflo, and Kremer have sent into the world via J-PAL and similar institutions have brought much more rigor to understanding program evaluation. Some of these interventions are now literally improving the lives of millions of people with clear, well-identified, nonobvious policy. That is an incredible achievement! And there is something likeable about the desire of the ivory tower to get into the weeds of day-to-day policy. Michael Kremer on this point: “The modern movement for RCTs in development economics…is about innovation, as well as evaluation. It’s a dynamic process of learning about a context through painstaking on-the-ground work, trying out different approaches, collecting good data with good causal identification, finding out that results do not fit pre-conceived theoretical ideas, working on a better theoretical understanding that fits the facts on the ground, and developing new ideas and approaches based on theory and then testing the new approaches.” No objection here.

That said, we cannot ignore that there are serious people who seriously object to the J-PAL style of development. Deaton, who won the Nobel Prize only four years ago, writes the following, in line with our discussion above: “Randomized controlled trials cannot automatically trump other evidence, they do not occupy any special place in some hierarchy of evidence, nor does it make sense to refer to them as “hard” while other methods are “soft”… [T]he analysis of projects needs to be refocused towards the investigation of potentially generalizable mechanisms that explain why and in what contexts projects can be expected to work.” Lant Pritchett argues that despite success persuading donors and policymakers, the evidence that RCTs lead to better policies at the governmental level, and hence better outcomes for people, is far from the case. The barrier to the adoption of better policy is bad incentives, not a lack of knowledge on how given policies will perform. I think these critiques are quite valid, and the randomization movement in development often way overstates what they have, and could have in principle, learned. But let’s give the last word to Chris Blattman on the skeptic’s case for randomized trials in development: “if a little populist evangelism will get more evidence-based thinking in the world, and tip us marginally further from Great Leaps Forward, I have one thing to say: Hallelujah.” Indeed. No one, randomista or not, longs to go back to the day of unjustified advice on development, particularly “Great Leap Forward” type programs without any real theoretical or empirical backing!

A few remaining bagatelles:

1) It is surprising how early this award was given. Though incredibly influential, the earliest published papers by any of the laureates mentioned in the Nobel scientific summary are from 2003 and 2004 (Miguel-Kremer on deworming, Duflo-Saez on retirement plans, Chattopadhyay and Duflo on female policymakers in India, Banerjee and Duflo on health in Rajathstan). This seems shockingly recent for a Nobel – I wonder if there are any other Nobel winners in economics who won entirely for work published so close to the prize announcement.

2) In my field, innovation, Kremer is most famous for his paper on patent buyouts (we discussed that paper on this site way back in 2010). How do we both incentivize new drug production but also get these drugs sold at marginal cost once invented? We think the drugmakers have better knowledge about how to produce and test a new drug than some bureaucrat, so we can’t finance drugs directly. If we give a patent, then high-value drugs return more to the inventor, but at massive deadweight loss. What we want to do is offer inventors some large fraction of the social return to their invention ex-post, in exchange for making production perfectly competitive. Kremer proposes patent auctions where the government pays a multiple of the winning bid with some probability, giving the drug to the public domain. The auction reveals the market value, and the multiple allows the government to account for consumer surplus and deadweight loss as well. There are many practical issues, but I have always found this an elegant, information-based attempt to solve the problem of innovation production, and it has been quite influential on those grounds.

3) Somewhat ironically, Kremer also has a great 1990s growth paper with RCT-skeptics Pritchett, Easterly and Summers. The point is simple: growth rates by country vacillate wildly decade to decade. Knowing the 2000s, you likely would not have predicted countries like Ethiopia and Myanmar as growth miracles of the 2010s. Yet things like education, political systems, and so on are quite constant within-country across any two decade period. This necessarily means that shocks of some sort, whether from international demand, the political system, nonlinear cumulative effects, and so on, must be first-order for growth. A great, straightforward argument, well-explained.

4) There is some irony that two of Duflo’s most famous papers are not experiments at all. Her most cited paper by far is a piece of econometric theory on standard errors in difference-in-difference models, written with Marianne Bertrand. Her next most cited paper is a lovely study of the quasi-random school expansion policy in Indonesia, used to estimate the return on school construction and on education more generally. Nary a randomized experiment in sight in either paper.

5) I could go on all day about Michael Kremer’s 1990s essays. In addition to Patent Buyouts, two more of them appear on my class syllabi. The O-Ring theory is an elegant model of complementary inputs and labor market sorting, where slightly better “secretaries” earn much higher wages. The “One Million B.C.” paper notes that growth must have been low for most of human history, and that it was limited because low human density limited the spread of nonrivalrous ideas. It is the classic Malthus plus endogenous growth paper, and always a hit among students.

6) Ok, one more for Kremer, since “Elephants” is my favorite title in economics. Theoretically, future scarcity increases prices. When people think elephants will go extinct, the price of ivory therefore rises, making extinction more likely as poaching incentives go up. What to do? Hold a government stockpile of ivory and commit to selling it if the stock of living elephants falls below a certain point. Elegant. And I can’t help but think: how would one study this particular general equilibrium effect experimentally? I both believe the result and suspect that randomized trials are not a good way to understand it!

The 2018 John Bates Clark: Parag Pathak

The most prestigious award a young researcher can receive in economics, the John Bates Clark medal, has been awarded to the incredibly productive Parag Pathak. His CV is one that could only belong a true rocket of a career: he was a full professor at MIT 11 years after he started his PhD, finishing the degree in four years before going to Paul Samuelson route through the Harvard Society of Fellows to become the top young theorist in Cambridge.

Pathak is best known for his work on the nature and effects of how students are allocated to schools. This is, of course, an area where theorists have had incredible influence on public policy, notably via Pathak’s PhD Advisor, the Nobel prize winner Al Roth, the group of Turkish researchers including Atila Abdulkadiroglu, Utku Unver, and Tayfun Sonmez, as well as the work of 2015 Clark medal winner Roland Fryer. Indeed, this group’s work on how to best allocate students to schools in an incentive-compatible way – that is, in a way where parents need only truthfully state which schools they like best – was adopted by the city of Boston, to my knowledge the first-time this theoretically-optimal mechanism was used by an actual school district. As someone born in Boston’s contentious Dorchester neighborhood, it is quite striking how much more successful this reform was than the busing policies of the 1970s which led to incredible amounts of bigoted pushback.

Consider the old “Boston mechanism”. Parents list their preferred schools in order. Everyone would be allocated their first choice if possible. If a school is oversubscribed, some random percentage get their second choice, and if still oversubscribed, their third, and so on. This mechanism gives clear reason for strategic manipulation: you certainly don’t want to list a very popular school as your second choice, since there is almost no chance that it won’t fill up with first choices. The Boston mechanism was replaced by the Gale-Shapley mechanism, which has the property that it is never optimal for a parent to misrepresent preferences. In theory, not only is this more efficient but it is also fairer: parents who do not have access to sophisticated neighbors helping them game the system are, you might reason, most likely to lose out. And this is precisely what Pathak and Sonmez show in a theoretical model: sophisticated parents may prefer the old Boston mechanism because it makes them better off at the expense of the less sophisticated! The latter concern is a tough one for traditional mechanism design to handle, as we generally assume that agents act in their self-interest, including taking advantage of the potential for strategic manipulation. There remains some debate about what is means for a mechanism to be “better” when some agents are unsophisticated or when they do not have strict preferences over all options.

Competition for students may also affect the quality of underlying schools, either because charter schools and for-profits compete on a profit-maximizing basis, or because public schools somehow respond to the incentive to get good students. Where Pathak’s work is particularly rigorous is that he notes how critical both the competitive environment and the exact mechanism for assigning students are for the responsiveness of schools. It is not that “charters are good” or “charters are bad” or “test schools produce X outcome”, but rather the conditionality of these statements on how the choice and assignment mechanism works. A pure empirical study found charters in Boston performed much better for lottery-selected students than public schools, and that attending an elite test school in Boston or New York doesn’t really affect student outcomes. Parents appear unable to evaluate school quality except in terms of peer effects, which can be particularly problematic when good peers enroll is otherwise bad schools.

Pathak’s methodological approach is refreshing. He has a theorist’s toolkit and an empiricist’s interest in policy. For instance, imagine we want to know how well charter schools perform. The obvious worry is that charter students are better than the average student. Many studies take advantage of charter lotteries, where oversubscribed schools assign students from the application class by lottery. This does not identify the full effect of charters, however: whether I enter a lottery at all depends on how I ranked schools, and hence participants in a lottery for School A versus School B are not identical. Pathak, Angrist and Walters show how to combine the data from specific school choice mechanisms with the lottery in a way that ensures we are comparing like-to-like when evaluating lotteries at charters versus non-charters. In particular, they find that Denver’s charter schools do in fact perform better.

Indeed, reading through Pathak’s papers this afternoon, I find myself struck by how empirical his approach has become over time: if a given question requires the full arsenal of choice, he deploys it, and if it is unnecessary, he estimates things in a completely reduced form way. Going beyond the reduced form often produces striking results. For instance, what would be lost if affirmative action based on race were banned in school choice? Chicago’s tier-based plans, where many seats at test schools were reserved for students from low-SES tiers, works dreadfully: not only does it not actually select low-SES students (high-SES students in low-income neighborhoods are selected), but it would require massively dropping entrance score criteria to get a pre-established number of black and hispanic students to attend. This is particularly true for test schools on the north side of the city. Answering the question of what racial representation looks like in a counterfactual world where Chicago doesn’t have to use SES-based criteria to indirectly choose students to get a given racial makeup, and the question of whether the problem is Chicago’s particular mechanism or whether it is fundamental to any location-based selection mechanism, requires theory, and Pathak deploys it wonderfully. Peng Shi and Pathak also do back-testing on their theory-based discrete-choice predictions of the impact of a change in Boston’s mechanism meant to reduce travel times, showing that to the extent the model missed, it was because the student characteristics were unexpected, not because there were not underlying structural preferences. If we are going to deploy serious theoretical methods to applied questions, rather than just to thought experiments, this type of rigorous combination of theory, design-based empirics, and back-testing is essential.

In addition to his education research, Pathak has also contributed, alongside Fuhito Kojima, to the literature on large matching markets. The basic idea is the following. In two-sided matching, where both sides have preferences like in a marriage market, there are no stable matching mechanisms where both sides want to report truthfully. For example, if you use the mechanism currently in place in Boston where students rank schools, schools themselves have the incentive to manipulate the outcome by changing how many slots they offer. Pathak and Kojima show that when the market is large, it is (in a particular sense) an equilibrium for both sides to act truthfully; roughly, even if I screw one student I don’t want out of a slot, in a thick market it is unlikely I wind up with a more-preferred student to replace them. There has more recently been a growing literature on what really matters in matching markets: is it the stability of the matching mechanism, or the thickness of the market, or the timing, and so on.

This award strikes me as the last remaining award, at least in the near term, from the matching/market design boom of the past 20 years. As Becker took economics out of pure market transactions and into a wider world of rational choice under constraints, the work of Al Roth and his descendants, including Parag Pathak, has greatly expanded our ability to take advantage of choice and local knowledge in situations like education and health where, for many reasons, we do not use the price mechanism. That said, there remains quite a bit to do on understanding how to get the benefits of decentralization without price – I am deeply interested in this question when it comes to innovation policy – and I don’t doubt that two decades from now, continued inquiry along these lines will have fruitfully exploited the methods and careful technique that Parag Pathak embodies.

One final note, and this in no way takes away from how deserving Pathak and other recent winners have been. Yet: I would be remiss if I didn’t point out, again, how unusually “micro” the Clark medal has been of late. There literally has not been a pure macroeconomist or econometrician – two of the three “core” fields of economics – since 1999, and only Donaldson and Acemoglu are even arguably close. Though the prize has gone to three straight winners with a theoretical bent, at least in part, the prize is still not reflecting our field as a whole. Nothing for Emi Nakamura, or Victor Chernozhukov, or Emmanuel Farhi, or Ivan Werning, or Amir Sufi, or Chad Syverson, or Marc Melitz? These folks are incredibly influential on our field as a whole, and the Clark medal is failing to reflect the totality of what economists actually do.

“Designing Efficient College and Tax Policies,” S. Findeisen & D. Sachs (2014)

It’s job market season, which is a great time of year for economists because we get to read lots of interesting papers. The one, by Dominik Sachs from Cologne and his coauthor Sebastian Findeisen, is particularly relevant given the recent Obama policy announcement about further subsidizing community college. The basic facts of marginal college students are fairly well-known: there is a pretty substantial wage bump for college grads (including ones who are not currently attending but who would attend if college was a little cheaper), many do not go to college even given this wage bump, there are probably externalities both in the economic and social realm from having a more education population though these are quite hard to measure, borrowing constraints bind for some potential college students but don’t appear to be that important, and it is very hard to design policies which benefit only marginal college candidates without also subsidizing those who would go whether or not the subsidy existed.

The naive thought might be “why should we subsidize college in the absence of borrowing constraints? By revealed preference, people choose not to go to college even given the wage bump, which likely implies that for many people studying and spending time going to class gives negative utility. Given the wage bump, these people are apparently willing to pay a lot of money to avoid spending time in college. The social externalities of college probably exist, but in general equilibrium more college-educated workers might drive down the return to college for people who are currently going. Therefore, we ought not distort the market.”

However, Sachs and Findeisen point out that there is also a fiscal externality: higher wages equals higher tax revenue in the future, and only the government cares about that revenue. Even more, the government is risk-neutral, or at least less risk-averse than individuals, about that revenue; people might avoid going to college if, along with bumping up their expected future wages, college also introduces uncertainty into their future wage path. If a subsidy could be targeted largely to students on the margin rather than those currently attending college, and if those marginal students see a big wage bump, and if government revenue less transfers back to the taxpayer is high, then it may be worth it for the government to subsidize college even if there are no other social benefits!

The authors write a nice little structural model. People choose to go to college or not depending on their innate ability, their parent’s wealth, the cost of college, the wage bump they expect (and the variance thereof), and their personal taste or distaste for studying as opposed to working (“psychic costs”). All of those variables aside from personal taste and innate ability can be pulled out of U.S. longitudinal data, performance on the army qualifying test can proxy for innate ability, and given distributional assumptions, we can identify the last free parameter, personal taste, by assuming that people go to college only if their lifetime discounted utility from attendance, less psychic costs, exceeds the lifetime utility from working instead. A choice model of this type seems to match data from previous studies with quasirandom variation concerning the returns to college education.

The direct benefit to the government from higher tax revenue from a subsidy policy, then, is the cost of the subsidy times the number subsidized, minus the proportion of subsidized students who would not have gone to college but for the subsidy times the discounted lifetime wage bump for those students times government tax revenue as a percent of that wage bump. The authors find that a general college subsidy program nearly pays for itself: if you subsidize everyone there aren’t many marginal students, but even for those students the wage bump is substantial. Targeting low income students is even better. Though the low income students affected on the margin tend to be less academically gifted, and hence to earn a lower (absolute) increase in wages from going to college, subsidies targeted at low income students do not waste as much money subsidizing students who would go to college anyway (i.e., a large percentage of high income kids). Note that the subsidies are small enough in absolute terms that the distortion on parental labor supply, from working less in order to qualify for subsidies, is of no quantitative importance, a fact the authors show rigorously. Merit-based subsidies will attract better students who have more to gain from going to college, but they also largely affect people who would go to college anyway, hence offer less bang for the buck to government compared to need-based grants.

The authors have a nice calibrated model in hand, so there are many more questions they ask beyond the direct partial equilibrium benefits of college attendance. For example, in general equilibrium, if we induce people to go to college, the college wage premium will fall. But note that wages for non-college-grads will rise in relative terms, so the net effect of the grants discussed in the previous paragraph on government revenue is essentially unchanged. Further, as Nate Hilger found using quasirandom variation in income due to layoffs, liquidity constraints do not appear to be terribly important for the college making decision: it is increasing grants, not changing loan eligibility, that will do anything of any importance to college attendance.

November 2014 working paper (No IDEAS version). The authors have a handful of other very interesting papers in the New Dynamic Public Finance framework, which is blazing hot right now. As far as I understand the project of NDPF, essentially we can simplify the (technically all-but-impossible-to-solve) dynamic mechanism problem of designing optimal taxes and subsidies under risk aversion and savings behavior to an equivalent reduced form that essentially only depends on simple first order conditions and a handful of elasticities. Famously, it is not obvious that capital taxation should be zero.

%d bloggers like this: