This article, I hope, will be widely read by economists working on field experiments. And it comes with David Card’s name right on the title page; this is certainly not a name that one associates with structural modeling!
Field experiments and randomized controlled trials are booming at the moment. Until the past decade, an average year saw a single field experiment published across the top five journals. Now, 8 to 10 appear each year. The vast majority of these papers are atheoretical, though I have a small complaint about the definition of “theoretical” which I’ll leave for the final paragraph of this post. The same atheoretical nature is largely true of lab experiments; I am generally very receptive to field experiments and much less so to lab experiments, so I’ll leave out discussion of the lab for now.
(That said, I’m curious, for the lab types out there: are there any good examples of lab experiments which have overturned a key economic insight? By overturned, I mean the reversal was accepted as valid by many economists. I don’t mean “behavioral theory” like Kahneman-Tversky. I mean an actual lab experiment in the style of the German School, as we ought to call it at this point. It just seems to me that many of the “surprising” results turn out not to be true once we move to economically relevant behavior in the market. The “gift reciprocity” paper by Fehr and coauthors is a great example, and Card, DellaVigna and Malmendier discuss it. In the lab, people “work” much harder when they get paid a surprisingly high wage. In field and natural experiments trying to replicate this, with Gneezy and List (2006) being the canonical example, there is no such economically relevant effect. I would love some counterexamples of this phenomenon, though: I’m trying my best to keep an open mind!)
But back to field experiments. After noting the paucity of theory in most experimental papers, the authors give three examples of where theory could have played a role. In the gift reciprocity/wages literature mentioned above, there are many potential explanations for what is going on in the lab. Perhaps workers feel inequity aversion, and don’t want to “rip off” unprofitable employers. Perhaps they simply act under reciprocity: if you pay me a high wage, I’ll work hard even in a one-shot game. A properly designed field experiment can distinguish between the two. An even better example is charitable giving. List and Lucking-Reiley ran a famous 2002 field experiment where they examined whether giving to charity could be affected by, for example, claiming in the brochure that the goal of the fundraising drive was already almost reached. But can’t we learn much more about charity? Do people give because of warm glow? Or because of social pressure? Or some other reason? List, DellaVigna and Malmendier have a wonderful 2010 paper that writes down a basic structural model of gift-giving, and introduces just enough randomization into the experimental design to identify all of the parameters. They find that social pressure is important, and that door-to-door fundraising can actually lower total social welfare, even taking into account the gain from purchasing whatever public good the charity is raising money for. And their results have a great link back to earlier theory and to future experiments along similar lines. Now that’s great work!
The complaints against structural models always seemed hollow to me. As Card, DellaVigna and Malmendier note, when interpreting results, every paper, structural or not, is making implicit assumptions. Why not make them in a way that is both clear and guided by the huge body of theoretical knowledge that social science has already developed? The authors note a turn away from structural models in experiments after the negative income tax papers of the 70s and 80s were thought to be failures in some sense due to the difficulty of interpreting their results. This argument was always a bit ridiculous: all social science results are hard to interpret, and there’s no way around this. Writing up research so that it seems more clear-cut to a policy audience does not mean that the evidence actually is clear-cut.
I do have one quibble with this paper, though, and I think the authors will sympathize with this complaint given their case studies. The authors divide experimental papers into four groups: descriptive, single model, competing model and parameter estimation. Single model, to take one example, is defined as a paper that lays out a formal model and tests one or more implications thereof. Similar definitions are given for competing models and parameter estimation. Once we get over Friedman’s 1953 model of economic methodology, though, we’ve got to realize that “testing” models is far, far away from the only link between theory and data. Theory is useful to empirics because it can point to interesting and nonobvious questions to ask, because it can be used to justify nontestable econometric assumptions, because it allows for reasonable discussion of counterfactuals, because it allows empirical studies to be linked into a broader conception of knowledge, because it allows for results to be interpreted correctly, etc. I’d argue that checking whether papers “test” models is almost irrelevant for knowing whether empirical papers properly use theory. Let me give my favorite example, which I used in a presentation to empirical economists last year. Imagine you study government-mandated hospital report cards, and find that two years into the program, there is no evidence that hospitals or patients are changing behavior based on the ratings, but that 20% of patients were looking at the report cards at some point. An atheoretical paper might suggest that these report card programs are a waste of money. A theoretically guided paper would note that game theorists have shown reputational equilibria are often discontinuous, and that perhaps if more patients were induced to look at the report cards (maybe by directly mailing them to each household once a year), hospitals would begin to react by giving better care.
There is no testing of a theoretical model or anything similar, but there is certainly great use of theory! (Perhaps of interest: my two favorite job market papers of the last couple years, those of Ben Handel and Heidi Williams, both use theory in one of the ways above rather than in the direct “let’s use data to test a theoretical model” framework…)
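To make the report card logic concrete, here is a toy sketch in Python. It is not a model from the paper or from the reputation literature: it just assumes a hospital improves care only when the share of patients reading report cards is large enough to cover the cost of doing so. The function names and the numbers for the gain and the cost are made up for illustration; only the 20% readership figure comes from the example above.

```python
def hospital_improves(share_informed, gain_per_informed_patient=1.0, cost_of_quality=0.35):
    """Toy threshold rule (an assumption, not an estimated model):
    the hospital improves care only if the payoff from informed
    patients covers the cost of higher quality."""
    return share_informed * gain_per_informed_patient >= cost_of_quality


def quality_response(share_informed, **kwargs):
    """Return 1.0 if the hospital improves care, 0.0 otherwise."""
    return 1.0 if hospital_improves(share_informed, **kwargs) else 0.0


# Share of patients who look at the report cards, from 0% to 50%.
shares = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
responses = [quality_response(s) for s in shares]

for share, response in zip(shares, responses):
    print(f"share informed = {share:.0%}, hospital response = {response}")
```

Under these made-up numbers, hospitals do nothing at 20% readership and respond fully at 40%. A study that observes only the 20% world sees a zero effect, even though pushing readership past the threshold would change behavior, which is exactly why the atheoretical "waste of money" reading could mislead.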
Similar comments apply to theorists’ use of empirical research, of course, but let’s save that for another day.
http://elsa.berkeley.edu/~sdellavi/wp/FieldExperimentJEPFeb11Tris.pdf (February 2011 working paper – forthcoming in the JEP)