Category Archives: Experimentation

“The Role of Theory in Field Experiments,” D. Card, S. Dellavigna & U. Malmendier (2011)

This article, I hope, will be widely read by economists working on field experiments. And it comes with David Card’s name right on the title page; this is certainly not a name that one associates with structural modeling!

Field experiments and randomized control trials are booming at the moment. Until the past decade, an average year saw a single field experiment published across the top five journals; now eight to ten appear each year. The vast majority of these papers are atheoretical, though I have a small complaint about the definition of “theoretical” which I’ll leave for the final paragraph of this post. The same atheoretical nature is largely true of lab experiments; I am generally very receptive to field experiments and much less so to lab experiments, so I’ll leave discussion of the lab aside for now.

(That said, a question for the lab types out there: are there any good examples of lab experiments which have overturned a key economic insight? By overturned, I mean the reversal was accepted as valid by many economists. I don’t mean “behavioral theory” like Kahneman-Tversky. I mean an actual lab experiment in the style of the German School – we ought to call it that at this point. It just seems to me that many of the “surprising” results turn out not to be true once we move to economically relevant behavior in the market. The “gift reciprocity” paper by Fehr and coauthors is a great example, and Card, Dellavigna and Malmendier discuss it. In the lab, people “work” much harder when they get paid a surprisingly high wage. In field and natural experiments trying to replicate this, with Gneezy and List (2006) being the canonical example, there is no such economically relevant effect. I would love some counterexamples of this phenomenon, though: I’m trying my best to keep an open mind!)

But back to field experiments. After noting the paucity of theory in most experimental papers, the authors give three examples of where theory could have played a role. In the gift reciprocity/wages literature mentioned above, there are many potential explanations for what is going on in the lab. Perhaps workers feel inequity aversion and don’t want to “rip off” unprofitable employers. Perhaps they simply act out of reciprocity – if you pay me a high wage, I’ll work hard even in a one-shot game. A properly designed field experiment can distinguish between the two. An even better example is charitable giving. List and Lucking-Reiley ran a famous 2002 field experiment where they examined whether giving to charity could be affected by, for example, claiming in the brochure that the goal of the fundraising drive was already almost reached. But can’t we learn much more about charity? Do people give because of warm glow? Or because of social pressure? Or some other reason? List, Dellavigna and Malmendier have a wonderful 2010 paper that writes down a basic structural model of gift-giving and introduces just enough randomization into the experimental design to identify all of the parameters. They find that social pressure is important, and that door-to-door fundraising can actually lower total social welfare, even taking into account the gain from purchasing whatever public good the charity is raising money for. And their results have a great link back to earlier theory and to future experiments along similar lines. Now that’s great work!
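To make that identification logic concrete, here is a minimal, purely illustrative simulation. The setup, the “flyer” treatment, and every parameter below are my own stylized assumptions for exposition, not the model or the numbers in the Dellavigna, List and Malmendier paper; the point is only that randomizing whether households get advance notice of a solicitor’s visit lets pressure-driven giving be separated from warm-glow giving.

```python
import numpy as np

# A stylized sketch of the identification logic, with made-up parameters.
rng = np.random.default_rng(0)
n = 100_000

g = rng.normal(-0.5, 1.0, n)    # net warm-glow payoff from donating
s = rng.uniform(0.0, 2.0, n)    # social-pressure cost of refusing at the door

# No advance notice: everyone is home when the solicitor knocks.
# A household gives if warm glow alone justifies it, or if refusing is
# more painful than a small donation (threshold 0.5 is arbitrary).
gives_unannounced = (g > 0) | (s > 0.5)

# Flyer treatment: households learn the visit time in advance, so
# pressure-only givers (g <= 0) can simply avoid answering the door.
gives_flyer = g > 0

print(f"giving rate, unannounced: {gives_unannounced.mean():.3f}")
print(f"giving rate, with flyer:  {gives_flyer.mean():.3f}")
# The gap is the share who give only because of social pressure --
# the kind of parameter the extra randomization is designed to pin down.
```

The gap between the two arms is exactly the sort of structural quantity that cannot be read off a single treatment-control comparison without a model in the background.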

The complaints against structural models always seemed hollow to me. As Card, Dellavigna and Malmendier note, when interpreting results, every paper, structural or not, is making implicit assumptions. Why not make them in a way that is both clear and guided by the huge body of theoretical knowledge that social science has already developed? The authors note a turn away from structural models in experiments after the negative income tax papers of the 70s and 80s were judged failures of a sort because their results were so difficult to interpret. This argument was always a bit ridiculous: all social science results are hard to interpret, and there’s no way around this. Writing up research so that it seems more clear-cut to a policy audience does not mean the evidence actually is clear-cut.

I do have one quibble with this paper, though – and I think the authors will sympathize with this complaint given their case studies. The authors divide experimental papers into four groups: descriptive, single model, competing models and parameter estimation. Single model, to take one example, is defined as a paper that lays out a formal model and tests one or more implications thereof; similar definitions are given for competing models and parameter estimation. Once we get over Friedman’s 1953 model of economic methodology, though, we’ve got to realize that “testing” models is far from the only link between theory and data. Theory is useful to empirics because it suggests interesting and nonobvious questions to ask, because it can justify nontestable econometric assumptions, because it allows for reasonable discussion of counterfactuals, because it links individual empirical studies into a broader body of knowledge, because it allows results to be interpreted correctly, and so on. I’d argue that checking whether papers “test” models is almost irrelevant for knowing whether empirical papers properly use theory.

Let me give my favorite example, which I used in a presentation to empirical economists last year. Imagine you study government-mandated hospital report cards and find that two years into the program there is no evidence that hospitals or patients are changing behavior based on the ratings, but that 20% of patients looked at the report cards at some point. An atheoretical paper might suggest that these report card programs are a waste of money. A theoretically-guided paper would note that game theorists have shown reputational equilibria are often discontinuous, and that perhaps if more patients were induced to look at the report cards (maybe by mailing them directly to each household once a year), hospitals would begin to react by giving better care. There is no testing of a theoretical model or anything similar, but there is certainly great use of theory! (Perhaps of interest: my two favorite job market papers of the last couple of years, those of Ben Handel and Heidi Williams, both use theory in one of the ways above rather than in the direct “let’s use data to test a theoretical model” framework…)

Similar comments apply to theorists’ use of empirical research, of course, but let’s save that for another day.

http://elsa.berkeley.edu/~sdellavi/wp/FieldExperimentJEPFeb11Tris.pdf (February 2011 working paper – forthcoming in the JEP)

“Reviews, Reputation and Revenue: The Case of Yelp.com,” M. Luca (2010)

I’m doing some work related to social learning, and a friend passed along the present paper by a recent job market candidate. It’s quite clever, and a great use of the wealth of data now available to the empirically-minded economist.

Here’s the question: there are tons of ways products, stores and restaurants develop reputation. One of these ways is reviews. How important is that extra Michelin star, or higher Zagat rating, or better word of mouth? And how could we ever separate the effect of reputation from the underlying quality of the restaurant?

Luca scrapes restaurant review data from Yelp, which really began penetrating Seattle in 2005; Yelp data is great because it includes review dates, so you can go back in time and reconstruct, with some error due to deleted reviews, what the review profile used to look like. Luca also has, incredibly, 7 years of restaurant revenue data from the city of Seattle. Just put the two together and you can track how restaurant reviews are correlated with revenue.

But what of causality? Here’s the clever bit. He notes that Yelp aggregates reviews into a star rating. So a restaurant with average review 3.24 gets 3 stars, and one with 3.25 gets 3.5 stars. Since no one actually reads all of a given restaurant’s reviews (there may be 200 of them), the displayed star rating can be said to represent reputation, while the actual review average captures underlying restaurant quality. It’s 2011, so this calls for some regression discontinuity (apparently, some grad students at Harvard call the empirical publication gatekeepers “the identification Taliban”; at least the present paper gets the internal validity right and doesn’t seem to have too many interpretive problems with external validity).
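To fix ideas, here is a hedged sketch of that design in code. The rounding rule matches the 3.24/3.25 example above, but the simulated data, the effect size baked into it, and the exact regression specification are my own illustrative assumptions, not Luca’s data or code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def displayed_stars(avg_rating):
    # Round the raw average to the nearest half star, ties rounding up,
    # so 3.24 -> 3.0 and 3.25 -> 3.5 as in the example above.
    return np.floor(avg_rating * 2 + 0.5) / 2

# Toy data standing in for the scraped review histories and revenue records.
rng = np.random.default_rng(1)
n = 5_000
avg = rng.uniform(1.0, 5.0, n)          # underlying average review (quality)
stars = displayed_stars(avg)            # what a casual Yelp browser sees
log_rev = 0.5 * avg + 0.09 * stars + rng.normal(0.0, 0.3, n)  # fake DGP

df = pd.DataFrame({"log_rev": log_rev, "avg": avg, "stars": stars})

# RD-flavored regression: the displayed rating is the "treatment", and a
# smooth control in the raw average absorbs underlying quality.
fit = smf.ols("log_rev ~ stars + avg + I(avg**2)", data=df).fit()
print(f"estimated revenue jump per extra half star: {0.5 * fit.params['stars']:.3f}")
```

The discontinuity comes entirely from the step function mapping raw averages to displayed stars, which is what lets reputation jump while underlying quality is held (smoothly) constant.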

Holding underlying quality constant, the discontinuous jump of a half star is worth a 4.5% increase in revenue in the relevant quarter. This is large, but not crazy: effects of similar magnitude have been found in recent work on moving from a “B” to an “A” sanitary score, or on changes in consumption after calorie information was posted in New York City. The effect is close to zero for chains – one way to interpret this is that no one Yelps restaurants they are already familiar with. I would have liked to see some sort of demographic check here as well: is the “Yelp effect” stronger in neighborhoods with younger, more internet-savvy consumers, as you might expect? Also, you may wonder whether restaurant owners manipulate their ratings, given the large gains from a tiny jump in stars. A quick and dirty distributional check doesn’t find any evidence of manipulation, but that may change after this paper gets published!
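On the manipulation point, the usual diagnostic is a density check of the running variable: if owners were gaming their way just over the rounding thresholds, raw review averages would bunch immediately above the cutoffs. Below is a crude sketch of such a check; it is my own toy version, not necessarily the test run in the paper.

```python
import numpy as np

def bunching_check(avg_ratings, width=0.01):
    """Compare the mass of raw averages just below vs. just above each
    half-star rounding threshold (1.25, 1.75, ..., 4.75)."""
    cutoffs = np.arange(1.25, 4.80, 0.5)
    below = above = 0
    for c in cutoffs:
        below += np.sum((avg_ratings >= c - width) & (avg_ratings < c))
        above += np.sum((avg_ratings >= c) & (avg_ratings < c + width))
    return below, above

# With honestly generated ratings the two counts should be similar;
# a large excess just above the cutoffs would hint at manipulation.
rng = np.random.default_rng(2)
fake_avgs = rng.uniform(1.0, 5.0, 50_000)
print(bunching_check(fake_avgs))
```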

You may also be wondering why reputation matters at all: why don’t I just go to a good restaurant? The answer is social learning plus costs of experimentation. The paper I’m working on now follows this line of thought toward what I think is a rather surprising policy implication: more on this at a future date.

http://people.bu.edu/mluca/JMP.pdf (Working paper version – Luca was hired at HBS, so savvy use of a great dataset pays off!)
