“The Role of Theory in Field Experiments,” D. Card, S. Dellavigna & U. Malmendier (2011)

This article, I hope, will be widely read by economists working on field experiments. And it comes with David Card’s name right on the title page; this is certainly not a name that one associates with structural modeling!

Field experiments and randomized controlled trials are booming at the moment. Until the past decade, an average year saw a single field experiment published across the top five journals. Now, 8 to 10 appear each year. The vast majority of these papers are atheoretical, though I have a small complaint about the definition of “theoretical” which I’ll leave for the final paragraph of this post. The same is largely true of lab experiments; I am generally very receptive to field experiments and much less so to lab experiments, so I’ll leave out discussion of the lab for now.

(That said, a question for the lab types out there: are there any good examples of lab experiments which have overturned a key economic insight? By overturned, I mean the reversal was accepted as valid by many economists. I don’t mean “behavioral theory” like Kahneman-Tversky. I mean an actual lab experiment in the style of the German School – we ought to call it that at this point. It just seems to me that many of the “surprising” results turn out not to be true once we move to economically relevant behavior in the market. The “gift reciprocity” paper by Fehr and coauthors is a great example, and Card, Dellavigna and Malmendier discuss it. In the lab, people “work” much harder when they are paid a surprisingly high wage. In field and natural experiments trying to replicate this, with Gneezy and List (2006) being the canonical example, there is no such economically relevant effect. I would love some counterexamples of this phenomenon, though: I’m trying my best to keep an open mind!)

But back to field experiments. After noting the paucity of theory in most experimental papers, the authors give three examples of where theory could have played a role. In the gift reciprocity/wages literature mentioned above, there are many potential explanations for what is going on in the lab. Perhaps workers feel inequity aversion, and don’t want to “rip off” unprofitable employers. Perhaps they simply act under reciprocity – if you pay me a high wage, I’ll work hard even in a one-shot game. A properly designed field experiment can distinguish between the two. An even better example is charitable giving. List and Lucking-Reiley ran a famous 2002 field experiment where they examined whether giving to charity could be affected by, for example, claiming in the brochure that the goal of the fundraising drive was already almost reached. But can’t we learn much more about charity? Do people give because of warm glow? Or because of social pressure? Or some other reason? List, Dellavigna and Malmendier have a wonderful 2010 paper that writes down a basic structural model of gift-giving, and introduces just enough randomization into the experimental design to identify all of the parameters. They find that social pressure is important, and that door-to-door fundraising can actually lower total social welfare, even taking into account the gain from purchasing whatever public good the charity is raising money for. And their results have a great link back to earlier theory and to future experiments along similar lines. Now that’s great work!
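To see how a little structure plus a little randomization can separate warm glow from social pressure, here is a toy simulation in the spirit of that design – my own stylized version with made-up parameters, not the paper’s actual model. Households give either because giving is worth it to them or because refusing a solicitor face to face is unpleasant; a randomized flyer announcing the solicitor’s visit lets pressure-motivated households avoid the door, so giving falls under the flyer treatment exactly when social pressure matters:

```python
import random

random.seed(0)

# Hypothetical parameters, purely illustrative (not estimates from the paper):
SOCIAL_PRESSURE = 2.0   # utility cost of saying "no" to a solicitor in person
GIFT = 5.0              # fixed donation size, for simplicity
AVOID_COST = 0.5        # small cost of not opening the door after a flyer warning

def simulate(n=100_000, flyer=False):
    """Share of households donating, with or without an advance-warning flyer."""
    donations = 0
    for _ in range(n):
        altruism = random.uniform(0, 10)   # warm-glow value of the gift
        net_give = altruism - GIFT         # utility of donating at the door
        net_refuse = -SOCIAL_PRESSURE      # utility of refusing face to face
        # A warned household stays away if even its best option at the door
        # is worse than the small cost of avoiding the solicitor entirely.
        if flyer and max(net_give, net_refuse) < -AVOID_COST:
            continue
        if net_give >= net_refuse:
            donations += 1
    return donations / n

base = simulate(flyer=False)
warned = simulate(flyer=True)
# Giving drops under the flyer treatment precisely when SOCIAL_PRESSURE > 0:
# households that gave only to avoid saying "no" now avoid the door instead.
print(base, warned)
```

With social pressure switched off (set `SOCIAL_PRESSURE = 0.0`), no one opts out and the two treatment arms coincide, which is the sense in which the flyer randomization identifies the pressure parameter separately from altruism.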

The complaints against structural models always seemed hollow to me. As Card, Dellavigna and Malmendier note, every paper, structural or not, makes implicit assumptions when interpreting its results. Why not make those assumptions in a way that is clear and guided by the huge body of theoretical knowledge that social science has already developed? The authors note a turn away from structural models in experiments after the negative income tax papers of the 70s and 80s were judged failures, in some sense, due to the difficulty of interpreting their results. This argument was always a bit ridiculous: all social science results are hard to interpret, and there’s no way around that. Writing up research so that it seems more clear-cut to a policy audience does not mean that the evidence actually is clear-cut.

I do have one quibble with this paper, though – and I think the authors will sympathize with this complaint given their case studies. The authors divide experimental papers into four groups: descriptive, single model, competing model and parameter estimation. Single model, to take one example, is defined as a paper that lays out a formal model and tests one or more implications thereof. Similar definitions are given for competing models and parameter estimation. Once we get over Friedman’s 1953 model of economic methodology, though, we’ve got to realize that “testing” models is far, far from the only link between theory and data. Theory is useful to empirics because it can point us toward interesting and nonobvious questions, because it can be used to justify nontestable econometric assumptions, because it allows for reasonable discussion of counterfactuals, because it allows empirical studies to be linked into a broader conception of knowledge, because it allows results to be interpreted correctly, and so on. I’d argue that checking whether papers “test” models is almost irrelevant for knowing whether empirical papers properly use theory. Let me give my favorite example, which I used in a presentation to empirical economists last year. Imagine you study government-mandated hospital report cards and find that, two years into the program, there is no evidence that hospitals or patients are changing behavior based on the ratings, but that 20% of patients looked at the report cards at some point. An atheoretical paper might suggest that these report card programs are a waste of money. A theoretically guided paper would note that game theorists have shown reputational equilibria are often discontinuous, and that perhaps if more patients were induced to look at the report cards (maybe by mailing them directly to each household once a year), hospitals would begin to react by giving better care. There is no testing of a theoretical model or anything similar, but there is certainly great use of theory! (Perhaps of interest: my two favorite job market papers of the last couple of years, those of Ben Handel and Heidi Williams, both use theory in one of the ways above rather than in the direct “let’s use data to test a theoretical model” framework…)
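The discontinuity point can be made concrete with a one-line cartoon – entirely my own illustration, not a model from the paper. Suppose a hospital improves care only when the reputational benefit, which scales with the share of patients reading the report cards, exceeds a fixed cost. Then observing no response at 20% readership tells you nothing about what would happen at 40%:

```python
def hospital_invests(informed_share, cost=0.3):
    """Stylized best response: the hospital improves care only if the
    reputational benefit (here simply proportional to the share of
    patients reading report cards) exceeds the fixed cost of better care.
    The threshold value 0.3 is an arbitrary illustrative choice."""
    return informed_share > cost

print(hospital_invests(0.20))  # below the threshold: no response to ratings
print(hospital_invests(0.40))  # past the threshold: behavior flips discretely
```

The best response is a step function of the informed share, so small policy changes near the threshold (like mailing the report cards to every household) can produce large behavioral jumps that the two-year null result cannot rule out.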

Similar comments apply to theorists’ use of empirical research, of course, but let’s save that for another day.

http://elsa.berkeley.edu/~sdellavi/wp/FieldExperimentJEPFeb11Tris.pdf (February 2011 working paper – forthcoming in the JEP)


3 thoughts on ““The Role of Theory in Field Experiments,” D. Card, S. Dellavigna & U. Malmendier (2011)”

  1. Paul says:

Hi, I’m a recent PhD from a “behavioral economics” graduate program – just wanted to reply to your quip about lab experiments.

First, one funny story: Norbert Schwarz, a fairly famous social psychologist, was giving a talk at Harvard’s experimental/behavioral econ seminar about 2 years ago. He was presenting some findings on accessibility (e.g. that stocks with easy-to-pronounce ticker names did better at IPO – how’s that for market relevance?) and received the standard econ question about incentives and whether they would change his observed lab results. Upon which he remarked, “have incentives ever really been shown to truly change a bias once it is demonstrated in the lab? Can you take even one important bias and show that it goes away?” Apparently he’s not familiar with John List’s work :-P I don’t believe anyone in the audience replied – Ed Glaeser didn’t come that day, else I’m sure he would have made a caustic retort.

But I would direct a similar critique back at you. There are numerous examples of biases that don’t change in the face of market incentives. Medical decision making is one area where it’s very difficult to square people’s choices with a rational model; I would check out some of Peter Ubel’s work in that vein. But to offer a better example, how about the QJE paper of Ariely, Loewenstein and Prelec on Coherent Arbitrariness: a lab study that essentially creates a market and shows that anchoring effects are not altered by market exchange. In general, many of the canonical biases in Kahneman and Tversky don’t seem to change very much in the face of incentives – Holt and Laury’s paper on risk aversion comes to mind. So I think there are problems on both sides.

(Not to mention the methodological issues with trusting studies in each camp – there’s plenty of con in econometrics, and plenty of file-drawer problems and other dirty secrets in social psych. The researcher-degrees-of-freedom problem, to me, presents the most serious obstacle to understanding which papers have the most external validity.)

    Coherent Arbitrariness:
    (though here’s one replication failure for more common market goods: http://levine.sscnet.ucla.edu/archive/refs4661465000000000312.pdf
    I’ve run anchoring studies and found them to be pretty stable–friends as well)

    Risk Aversion:

    I’d be curious to know what you think…

  2. Paul says:

    Oh yeah, one more obvious example comes to mind…all of Thaler’s stuff on default effects on retirement decisions. People have huge incentives to be rational there, but don’t seem to be…

    • afinetheorem says:

Thanks for the citations, Paul. I’ll definitely check them out. Concerning Thaler and retirement, though, I question the extent to which it was lab studies about anchoring that “convinced” the profession, rather than simply the evidence of the choices people actually make with their retirement accounts. Two followups to that statement: first, I have nothing against “behavioral economics” per se – it’s clearly useful – but I’m less sold on the lab experimental method of behavioral social science. Second, you might wonder why I care about whether A or B is found convincing by the profession. I’ve written a few times here that I subscribe to a philosophy of science that puts a ton of weight on practitioners’ subjective judgments of research quality, so I actually think the question of credit is pretty important here.

I’m with you that there certainly need to be more conversations between different methodological schools, however. I’m sure Ed would have been caustic upon hearing Schwarz’s remark, and I recall a lecture here at NW by Ernst Fehr where that éminence grise was nearly driven out of the room with pitchforks.


