Category Archives: Experimentation

The 2017 Nobel: Richard Thaler

A true surprise this morning: the behavioral economist Richard Thaler from the University of Chicago has won the Nobel Prize in economics. It is not a surprise because it is undeserved; rather, it is a surprise because only four years ago, Thaler’s natural co-laureate Bob Shiller won while Thaler was left the bridesmaid. But Thaler’s influence on the profession, and the world, is unquestionable. There are few developed governments that do not have a “nudge” unit of some sort trying to take advantage of behavioral nudges to push people a touch in one way or another, including here in Ontario via my colleagues at BEAR. I will admit, perhaps under the undue influence of too many dead economists, that I am skeptical of nudging and behavioral finance on both positive and normative grounds, so this review will be one of friendly challenge rather than hagiography. I trust that there will be no shortage of wonderful positive reflections on Thaler’s contribution to policy, particularly because he is the rare economist whose work is totally accessible to laymen and, more importantly, journalists.

Much of my skepticism is similar to how Fama thinks about behavioral finance: “I’ve always said they are very good at describing how individual behavior departs from rationality. That branch of it has been incredibly useful. It’s the leap from there to what it implies about market pricing where the claims are not so well-documented in terms of empirical evidence.” In other words, surely most people are not that informed and not that rational much of the time, but repeated experience, market selection, and other aggregative factors mean that this irrationality may not matter much for the economy at large. It is very easy to claim that since economists model “agents” as “rational”, we would, for example, “not expect a gift on the day of the year in which she happened to get married, or be born” and indeed “would be perplexed by the idea of gifts at all” (Thaler 2015). This caricature of economists is both widespread and absurd, I’m afraid. In order to understand the value of Thaler’s work, we ought first look at situations where behavioral factors matter in real-world, equilibrium decisions of consequence, then figure out how common those situations are, and why.

The canonical example of Thaler’s useful behavioral nudges is his “Save More Tomorrow” pension plan, with Benartzi. Many individuals in defined contribution plans save too little, both because they are not good at calculating how much they need to save and because they are biased toward present consumption. You can, of course, force people to save a la Singapore, but we dislike these plans because individuals vary in their need and desire for saving, and because we find the reliance on government coercion to save heavy-handed. Alternatively, you can default defined-contribution plans to involve some savings rate, but it turns out people do not vary their behavior from the default throughout their career, and hence save too little solely because they didn’t want too much removed from their first paycheck. Thaler and Benartzi have companies offer plans where you agree now to having your savings rate increased when you get raises – for instance, if your salary goes up 2%, you will have half of that set aside in a savings plan tomorrow, until you reach a savings rate that is sufficiently high. In this way, no one takes a nominal post-savings pay cut. People can, of course, leave this plan whenever they want. In their field experiments, savings rates did in fact soar (with takeup varying hugely depending on how information about the plan was presented), and attrition in the future from the plan was low.
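For concreteness, here is a minimal sketch of the escalation rule, with entirely hypothetical parameters (a 3% annual raise, half of each raise diverted, a 10% cap) rather than anything from the actual plan documents:

```python
# Save More Tomorrow escalation, sketched: each raise bumps the savings
# rate by (half the raise / new salary), capped at 10%, so nominal
# take-home pay never falls. All numbers are illustrative.
def save_more_tomorrow(salary, years, raise_pct=0.03,
                       diverted_share=0.5, start_rate=0.02, cap=0.10):
    rate = start_rate
    for year in range(years):
        print(f"year {year}: salary {salary:>9,.0f}, "
              f"rate {rate:.2%}, take-home {salary * (1 - rate):>9,.0f}")
        raise_amt = salary * raise_pct
        salary += raise_amt
        rate = min(cap, rate + diverted_share * raise_amt / salary)

save_more_tomorrow(salary=50_000, years=6)
```

Each year the printed take-home figure rises even as the savings rate ratchets up, which is exactly why the plan avoids the loss-aversion sting of a visible pay cut.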

This policy is what Thaler and Sunstein call “libertarian paternalism”. It is paternalistic because, yes, we think that you may make bad decisions from your own perspective because you are not that bright, or because you are lazy, or because you have many things which require your attention. It is libertarian because there is no compulsion, in that anyone can opt out at their leisure. Results similar to Thaler and Benartzi’s have been found by Ashraf et al in a field experiment in the Philippines, and by Karlan et al in three countries where simply sending reminder messages that make savings goals more salient modestly increased savings.

So far, so good. We have three issues to unpack, however. First, when is this nudge acceptable on ethical grounds? Second, why does nudging generate such large effects here, and if the effects are large, why doesn’t the market simply provide them? Third, is the 401k savings case idiosyncratic or representative? The idea that homo economicus, the rational calculator, misses important features of human behavior, and could do with some insights from psychology, is not new, of course. Thaler’s prize is, at minimum, the fifth Nobel to go to someone pushing this general idea, since Herb Simon, Maurice Allais, Daniel Kahneman, and the aforementioned Bob Shiller have all already won. Copious empirical evidence, and indeed simple human observation, implies that people have behavioral biases, that they are not perfectly rational – as Thaler has noted, we see what looks like irrationality even in the composition of 100 million dollar baseball rosters. The more militant behavioralists insist that ignoring these psychological factors is unscientific! And yet, and yet: the vast majority of economists, all of whom are by now familiar with these illustrious laureates and their work, still use fairly standard expected utility maximizing agents in nearly all of our papers. Unpacking the three issues above will clarify how that could possibly be so.

Let’s discuss ethics first. Simply arguing that organizations “must” make a choice (as Thaler and Sunstein do) is insufficient; we would not say a firm is acting “neutrally” when it defaults consumers into autorenewal of a product they rarely renew when making an active choice. Nudges can be used for “good” or “evil”. Worse, whether a nudge is good or evil depends on the planner’s evaluation of the agent’s “inner rational self”, as Infante and Sugden, among others, have noted many times. That is, claiming paternalism is “only a nudge” does not excuse the paternalist from the usual moral philosophic critiques! Indeed, as Chetty and friends have argued, the more you believe behavioral biases exist and are “nudgeable”, the more careful you need to be as a policymaker about inadvertently reducing welfare. There is, I think, less controversy when we use nudges rather than coercion to reach some policy goal. For instance, if a policymaker wants to reduce energy usage, and is worried about distortionary taxation, nudges may (depending on how you think about social welfare with non-rational preferences!) be a better way to achieve the desired outcomes. But this goal is very different from the common justification that nudges somehow are pushing people toward policies they actually like in their heart of hearts. Carroll et al have a very nice theoretical paper trying to untangle exactly what “better” means for behavioral agents, and exactly when the imprecision of nudges or defaults given our imperfect knowledge of individuals’ heterogeneous preferences makes attempts at libertarian paternalism worse than laissez faire.

What of the practical effects of nudges? How can they be so large, and in what contexts? Thaler has very convincingly shown that behavioral biases can affect real world behavior, and that understanding those biases means two policies which are identical from the perspective of a homo economicus model can have very different effects. But many economic situations involve players doing things repeatedly with feedback – where heuristics that approximate rationality evolve – or involve players who “perform poorly” being selected out of the game. For example, I can think of many simple nudges to get you or me to play better basketball. But when it comes to Michael Jordan, the first order effects are surely how well he takes care of his health, the teammates he has around him, and so on. I can think of many heuristics useful for understanding how simple physics will operate, but I don’t think I can find many that would improve Einstein’s understanding of how the world works. The 401k situation is unusual because it is a decision with limited short-run feedback, taken by unsophisticated agents who will learn little even with experience. The natural alternative, of course, is to have agents outsource the difficult parts of the decision, to investment managers or the like. And these managers will make money by improving people’s earnings. No surprise that robo-advisors, index funds, and personal banking have all become more important as defined contribution plans have become more common! If we worry about behavioral biases, we ought worry especially about market imperfections that prevent the existence of designated agents who handle the difficult decisions for us.

The fact that agents can exist is one reason that irrationality in the lab may not translate into irrationality in the market. But even without agents, we might reasonably be skeptical of some claims of widespread irrationality. Consider Thaler’s famous endowment effect: how much you are willing to pay for, say, a coffee mug or a pen is much less than how much you would accept to have the coffee mug taken away from you. Indeed, it is not unusual in a study to find a ratio of three times or greater between the willingness to pay and willingness to accept amounts. But, of course, if these were “preferences”, you could be money pumped (see Yaari, applying a theorem of de Finetti, on the mathematics of the pump). Say you value the mug at ten bucks when you own it and five bucks when you don’t. Do we really think I can regularly get you to pay twice as much by loaning you the mug for free for a month? Do we see car companies letting you take a month-long test drive of a $20,000 car then letting you keep the car only if you pay $40,000, with some consumers accepting? Surely not. Now the reason why is partly what Laibson and Yariv argue, that money pumps do not exist in competitive economies since market pressure will compete away rents: someone else will offer you the car at $20,000 and you will just buy from them. But even if the car company is a monopolist, surely we find the magnitude of the money pump implied here to be on its face ridiculous.

Even worse are the dictator games introduced in Thaler’s 1986 fairness paper. Students were asked, upon being given $20, whether they wanted to give an anonymous student half of their endowment or 10%. Many of the students gave half! This experiment has been repeated many, many times, with similar effects. Does this mean economists are naive to neglect the social preferences of humans? Of course not! People are endowed with money and gifts all the time. They essentially never give any of it to random strangers – I feel confident assuming you, the reader, have never been handed some bills on the sidewalk by an office worker who just got a big bonus! Worse, the context of the experiment matters a ton (see John List on this point). Indeed, despite hundreds of lab experiments on dictator games, I feel far more confident predicting real world behavior following windfalls if we use a parsimonious homo economicus model than if we use the results of dictator games. Does this mean the games are useless? Of course not – studying what factors affect other-regarding preferences is interesting, and important. But how odd to have a branch of our field filled with people who see armchair theorizing about homo economicus as “unscientific”, yet take lab experiments so literally even when they are so clearly contrary to the data!

To take one final example, consider Thaler’s famous model of “mental accounting”. In many experiments, he shows people have “budgets” set aside for various tasks. I have my “gas budget” and adjust my driving when gas prices change. I only sell stocks when I am up overall on that stock since I want my “mental account” of that particular transaction to be positive. But how important is this in the aggregate? Take the Engel curve. Budget shares devoted to food fall with income. This is widely established historically and in the cross section. Where is the mental account? Farber (2008 AER) even challenges the canonical account of taxi drivers working just enough hours to make their targeted income. As in the dictator game and the endowment effect, there is a gap between what is real, psychologically, and what is consequential enough to be first-order in our economic understanding of the world.

Let’s sum up. Thaler’s work is brilliant – it is a rare case of an economist taking psychology seriously and actually coming up with policy-relevant consequences like the 401k policy. But Thaler’s work is also dangerous to young economists who see biases everywhere. Experts in a field, and markets with agents and mechanisms and all the other tricks they develop, are very, very good at ferreting out irrationality, and economists’ core skill lies in not missing those tricks.

Some remaining bagatelles: 1) Thaler and his PhD advisor, Sherwin Rosen, have one of the first papers on measuring the “statistical” value of a life, a technique now widely employed in health economics and policy. 2) Beyond his academic work, Thaler has won a modicum of fame as a popular writer (Nudge, written with Cass Sunstein, is canonical here) and for his brief turn as an actor alongside Selena Gomez in “The Big Short”. 3) Dick has a large literature on “fairness” in pricing, a topic which goes back to Thomas Aquinas, if not earlier. Many of the experiments Thaler performs, like the thought experiments of Aquinas, come down to the fact that many perceive market power to be unfair. Sure, I agree, but I’m not sure there’s much more that can be learned than this uncontroversial fact. 4) Law and econ has been massively influenced by Thaler. As a simple example, if endowment effects are real, then the assignment of property rights matters even when there are no transaction costs. Jolls et al 1998 go into more depth on this issue. 5) Thaler’s precise results in so-called behavioral finance are beyond my area of expertise, so I defer to John Cochrane’s comments following the 2013 Nobel. Eugene Fama is, I think, correct when he suggests that market efficiency generated by rational traders with risk aversion is the best model we have of financial behavior, where best is measured by “is this model useful for explaining the world.” The number of behavioral anomalies at the level of the market which persist and are relevant in the aggregate does not strike me as large, while the number of investors and policymakers who make dreadful decisions because they believe markets are driven by behavioral sentiments is large indeed!


Reinhard Selten and the making of modern game theory

Reinhard Selten, it is no exaggeration, is a founding father of two massive branches of modern economics: experiments and industrial organization. He passed away last week after a long and idiosyncratic life. Game theory as developed by the three co-Nobel laureates Selten, Nash, and Harsanyi is so embedded in economic reasoning today that, to a great extent, it has replaced price theory as the core organizing principle of our field. That this would happen was not always so clear, however.

Take a look at some canonical papers before 1980. Arrow’s Possibility Theorem simply assumed true preferences can be elicited; not until Gibbard and Satterthwaite do we answer the question of whether there is even a social choice rule that can elicit those preferences truthfully! Rothschild and Stiglitz’s celebrated 1976 essay on imperfect information in insurance markets defines equilibria in terms of individual rationality, best responses in the Cournot sense, and free entry. How odd this seems today – surely the natural equilibrium in an insurance market depends on beliefs about the knowledge held by others, and beliefs about those beliefs! Analyses of bargaining before Rubinstein’s 1982 breakthrough nearly always rely on axioms of psychology rather than strategic reasoning. Discussions of predatory pricing until the 1970s, at the very earliest, relied on arguments that we now find unacceptably loose in their treatment of beliefs.

What happened? Why didn’t modern game-theoretic treatment of strategic situations – principally those involving more than one agent but less than an infinite number, although even situations of perfect competition now often are motivated game theoretically – arrive soon after the proofs of von Neumann, Morgenstern, and Nash? Why wasn’t the Nash program, of finding justification in self-interested noncooperative reasoning for cooperative or axiom-driven behavior, immediately taken up? The problem was that the core concept of the Nash equilibrium simply permits too great a multiplicity of outcomes, some of which feel natural and others of which are less so. As such, a long search, driven essentially by a small community of mathematicians and economists, attempted to find the “right” refinements of Nash. And a small community it was: I recall Drew Fudenberg telling a story about a harrowing bus ride at an early game theory conference, where a fellow rider mentioned offhand that should they crash, the vast majority of game theorists in the world would be wiped out in one go!

Selten’s most renowned contribution came in the idea of perfection. The concept of subgame perfection was first proposed in a German-language journal in 1965 (making it one of the rare modern economic classics inaccessible to English speakers in the original, alongside Maurice Allais’ 1953 French-language paper in Econometrica which introduces the Allais paradox). Selten’s background up to 1965 is quite unusual. A young man during World War II, raised Protestant but with one Jewish parent, Selten fled Germany to work on farms, and only finished high school at 20 and college at 26. His two interests were mathematics, for which he worked on the then-unusual extensive form game for his doctoral degree, and experimentation, inspired by the small team of young professors at Frankfurt trying to pin down behavior in oligopoly through small lab studies.

In the 1965 paper, on demand inertia (paper is gated), Selten wrote a small game theoretic model to accompany the experiment, but realized there were many equilibria. The term “subgame perfect” was not introduced until 1974, also by Selten, but the idea itself is clear in the ’65 paper. He proposed that attention should focus on equilibria where, after every action, each player continues to act rationally from that point forward; that is, he proposed that in every “subgame”, or every game that could conceivably occur after some actions have been taken, equilibrium actions must remain an equilibrium. Consider predatory pricing: a firm considers lowering price below cost today to deter entry. It is a Nash equilibrium for entrants to believe the price would continue to stay low should they enter, and hence to not enter. But it is not subgame perfect: the entrant should reason that after entering, it is not worthwhile for the incumbent to continue to lose money once the entry has already occurred.

Complicated strings of deductions which rule out some actions based on faraway subgames can seem paradoxical, of course, and did even to Selten. In his famous Chain Store paradox, he considers a firm with stores in many locations choosing whether to price aggressively to deter entry, with one potential entrant in each town choosing one at a time whether to enter. Entrants prefer to enter if pricing is not aggressive, but prefer to remain out otherwise; incumbents prefer to price nonaggressively whether or not entry occurs. Reasoning backward, in the final town we have the simple one-shot predatory pricing case analyzed above, where we saw that entry is the only subgame perfect equilibrium. Therefore, the entrant in the second-to-last town knows that the incumbent will not fight entry aggressively in the final town, hence there is no benefit to doing so in the second-to-last town, hence entry occurs again. Reasoning similarly, entry occurs everywhere. But if the incumbent could commit in advance to pricing aggressively in, say, the first 10 towns, it would deter entry in those towns and hence its profits would improve. Such commitment may not be possible, but what if the incumbent’s reasoning ability is limited, and it doesn’t completely understand why aggressive pricing in early stages won’t deter the entrant in the 16th town? And what if entrants reason that the incumbent’s reasoning ability is not perfectly rational? Then aggressive pricing to deter entry can occur.
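To see how short the backward-induction argument really is, here is a minimal sketch with hypothetical per-town payoffs (entrant: 0 out, 1 if accommodated, -1 if fought; incumbent: 2 as monopolist, 1 accommodating, 0 fighting), not Selten’s exact numbers:

```python
# Backward induction in the chain store game: each town's subgame is
# solved identically, so entry unravels everywhere regardless of the
# number of towns. Payoffs are illustrative.
def chain_store(towns):
    outcomes = []
    for town in range(towns):
        # Subgame after entry: the incumbent compares fighting (0) with
        # accommodating (1) in this town alone; accommodating wins.
        incumbent_fights = 0 > 1
        # Anticipating accommodation, the entrant compares staying out (0)
        # with entering (1 if accommodated, -1 if fought).
        entrant_payoff = -1 if incumbent_fights else 1
        outcomes.append("enter" if entrant_payoff > 0 else "stay out")
    return outcomes

print(chain_store(20))  # 'enter' in all 20 towns, however long the chain
```

The punchline is that nothing in the loop depends on the town’s position in the sequence, which is precisely the unraveling Selten found paradoxical.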

That behavior may not be perfectly rational but rather bounded had been an idea of Selten’s since he read Herbert Simon as a young professor, but in his Nobel Prize biography, he argues that progress on a suitable general theory of bounded rationality has been hard to come by. The closest Selten comes to formalizing the idea is in his paper on trembling hand perfection in 1974, inspired by conversations with John Harsanyi. The problem with subgame perfection had been noted: if an opponent takes an action off the equilibrium path, it is “irrational”, so why should rationality of the opponent be assumed in the subgame that follows? Harsanyi assumes that tiny mistakes can happen, putting even rational players into subgames. Taking the limit as mistakes become infinitesimally rare produces the idea of trembling-hand perfection. The idea of trembles implicitly introduces the idea that players have beliefs at various information sets about what has happened in the game. Kreps and Wilson’s sequential equilibrium recast trembles as beliefs under uncertainty, and showed that a slight modification of the trembling hand leads to an easier decision-theoretic interpretation of trembles, an easier computation of equilibria, and an outcome that is nearly identical to Selten’s original idea. Sequential equilibrium, of course, went on to become the workhorse solution concept in dynamic economics, a concept which underlies essentially all of modern industrial organization.

That Harsanyi, inventor of the Bayesian game, is credited by Selten for inspiring the trembling hand paper is no surprise. The two had met at a conference in Jerusalem in the mid-1960s, and they’d worked together both on applied projects for the US military, and on pure theory research while Selten was visiting Berkeley. A classic 1972 paper of theirs on Nash bargaining with incomplete information (article is gated) begins the field of cooperative games with incomplete information. And this was no minor field: Roger Myerson, in his paper introducing mechanism design under incomplete information – the famous Bayesian revelation principle paper – shows that there exists a unique Selten-Harsanyi bargaining solution under incomplete information which is incentive compatible.

Myerson’s example is amazing. Consider building a bridge which costs $100. Two people will use the bridge. One values the bridge at $90. The other values the bridge at $90 with probability .9, and $30 with probability p=.1, where that valuation is the private knowledge of the second person. Note that in either case, the bridge is worth building. But who should pay? If you propose a 50/50 split, the bridge will simply not be built 10% of the time. If you propose an 80/20 split, where even in their worst case situation each person gets a surplus value of ten dollars, the outcome is unfair to player one 90% of the time (where “unfair” will mean, violates certain principles of fairness that Nash, and later Selten and Harsanyi, set out axiomatically). What of the 53/47 split that gives each party, on average, the same expected surplus? Again, this is not “interim incentive compatible”, in that player two will refuse to pay in the case he is the type that values the bridge only at $30. Myerson shows mathematically that both players, once they know their private valuations, will agree to the following deal, and that the deal satisfies the Selten-Nash fairness axioms: when player 2 claims to value at $90, the payment split is 49.5/50.5 and the bridge is always built, but when player 2 claims to value at $30, the entire cost is paid by player 1 but the bridge is built with only probability .439. Under this split, there are correct incentives for player 2 to always reveal his true willingness to pay. The mechanism means that there is a 5.61 percent chance the bridge isn’t built, but the split of surplus from the bridge nonetheless does better than any other split which satisfies all of Harsanyi and Selten’s fairness axioms.
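The arithmetic is easy to verify directly; the snippet below just re-checks the incentive constraints and the 5.61% figure from the numbers in the text (the build probability .439 is rounded, which is why the high type comes out indifferent only up to rounding):

```python
# Re-checking Myerson's bridge mechanism: cost 100, player 1 values the
# bridge at 90; player 2 values it at 90 (prob .9) or 30 (prob .1).
q = 0.439   # probability the bridge is built when player 2 claims 30

def u2(value, claim):
    """Player 2's expected surplus from reporting `claim`."""
    if claim == 90:
        return value - 50.5        # 49.5/50.5 split, always built
    return q * value               # player 1 pays everything if built

print(u2(30, 30), u2(30, 90))      # 13.17 vs -20.5: low type reports truth
print(u2(90, 90), u2(90, 30))      # 39.5 vs ~39.5: high type indifferent
print(0.1 * (1 - q))               # 0.0561: the bridge goes unbuilt
                                   # 5.61% of the time
```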

Selten’s later work is, it appears to me, more scattered. His attempt with Harsanyi to formalize “the” equilibrium refinement, in a 1988 book, was valiant but in the end misguided. His papers on theoretical biology, inspired by his interest in long walks among the wildflowers, are rather tangential to his economics. And what of his experimental work? To understand Selten’s thinking, read this fascinating dialogue with himself that Selten gave as a Schwartz Lecture at Northwestern MEDS. In this dialogue, he imagines a debate among a Bayesian economist, an experimentalist, and an evolutionary biologist. The economist argues that “theory without theorems” is doomed to fail, that Bayesianism is normatively “correct”, and that Bayesian reasoning can easily be extended to include costs of reasoning or reasoning mistakes. The experimentalist argues that ad hoc assumptions are better than incorrect ones: just as human anatomy is complex and cannot be reduced to a few axioms, neither can social behavior. The biologist argues that learning a la Nelson and Winter is descriptively accurate as far as how humans behave, whereas high level reasoning is not. The “chairman”, perhaps representing Selten himself, sums up the argument as saying that experiments which simply contradict Bayesianism are a waste of time, but that human social behavior surely depends on bounded rationality and hence empirical work ought be devoted to constructing a foundation for such a theory (shall we call this the “Selten program”?). And yet, this essay was from 1990, and we seem no closer to having such a theory, nor does it seem to me that behavioral research has fundamentally contradicted most of our core empirical understanding derived from theories with pure rationality. Selten’s program, it seems, remains not only incomplete, but perhaps not even first order; the same cannot be said of his theoretical constructs, as without perfection a great part of modern economics simply could not exist.

“Optimal Contracts for Experimentation,” M. Halac, N. Kartik & Q. Liu (2013)

Innovative activities have features not possessed by more standard modes of production. The eventual output, and its value, are subject to a lot of uncertainty. Effort can be difficult to monitor – it is often the case that the researcher knows more than management about what good science should look like. The inherent skill of the scientist is hard to observe. Output is generally only observed in discrete bunches.

These features make contracting for researchers inherently challenging. The classic reference here is Holmstrom’s 1989 JEBO, which just applies his great 1980s incentive contract papers to innovative activities. Take a risk-neutral firm. They should just work on the highest expected value project, right? Well, if workers are risk averse and supply unobserved effort, the optimal contract balances moral hazard (I would love to just pay you based on your output) and risk insurance (I would have to pay you to bear risk about the eventual output of the project). It turns out that the more uncertainty a project has, the more inefficient the information-constrained optimal contract becomes, so that even risk-neutral firms are biased toward relatively safe, lower expected value projects. Incentives within the firm matter in many other ways, as Holmstrom also points out: giving an employee multiple tasks when effort is unobserved makes it harder to provide proper incentives because the opportunity cost of a given project goes up, firms with a good reputation in capital markets will be reluctant to pursue risky projects since the option value of variance in reputation is lower (a la Doug Diamond’s 1989 JPE), and so on. Nonetheless, the first order problem of providing incentives for a single researcher on a single project is hard enough!
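The uncertainty point can be seen in the textbook linear contract model – not Holmstrom’s exact setup, but the standard CARA-normal stand-in for the same tradeoff: output is effort plus noise of variance s2, the wage is linear in output, effort costs e²/2, and the agent has risk aversion r, so the optimal piece rate is 1/(1 + r·s2):

```python
# The moral hazard / insurance tradeoff in the linear CARA-normal model:
# more output variance s2 means a lower-powered contract and lower total
# surplus, pushing even a risk-neutral firm toward safer projects.
# Parameters are illustrative.
def optimal_contract(r, s2):
    b = 1.0 / (1.0 + r * s2)                      # optimal piece rate
    e = b                                         # agent's chosen effort
    surplus = e - e**2 / 2 - 0.5 * r * b**2 * s2  # certainty equivalent
    return b, surplus

for s2 in [0.0, 1.0, 4.0]:
    b, ts = optimal_contract(r=2.0, s2=s2)
    print(f"variance {s2}: piece rate {b:.2f}, surplus {ts:.3f}")
# variance 0.0: piece rate 1.00, surplus 0.500  (first best)
# variance 1.0: piece rate 0.33, surplus 0.167
# variance 4.0: piece rate 0.11, surplus 0.056
```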

Holmstrom’s model doesn’t have any adverse selection, however: both employer and employee know what expected output will result from a given amount of effort. Nor is Holmstrom’s problem dynamic. Marina Halac, Navin Kartik and Qingmin Liu have taken up the unenviable task of solving the dynamic researcher contracting problem under adverse selection and moral hazard. Let a researcher be either a high type or a low type. In every period, the researcher can work on a risky project at cost c, or shirk at no cost. The project is either feasible or not, with probability b. If the employee shirks, or the project is bad, there will be no invention this period. If the employee works, the project is feasible, and the employee is a high type, the project succeeds with probability L1, and if the employee is a low type, with probability L2 < L1. Note that as time goes on, if the employee works on the risky project, they continually update their beliefs about b. If enough time passes without an invention, belief about b becomes low enough that everyone (efficiently) stops working on the risky project. The firm’s goal is to get employees to exert optimal effort for the optimal number of periods given their type.
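A quick way to see the belief dynamics is to write down the posterior on feasibility after t fruitless periods of work; the notation below is mine, with lam standing in for the type-specific success probability (L1 or L2 in the text) and all numbers hypothetical:

```python
# Posterior that the project is feasible after working t periods with no
# success: no-success is evidence against feasibility, and the evidence
# accumulates faster for the type with the higher success probability.
def posterior_feasible(b0, lam, t):
    alive = b0 * (1 - lam) ** t        # feasible, yet t failures anyway
    return alive / (alive + (1 - b0))  # vs. simply infeasible

for t in range(4):
    print(t, round(posterior_feasible(0.5, 0.4, t), 3),   # high type
             round(posterior_feasible(0.5, 0.2, t), 3))   # low type
# 0: 0.5 0.5 / 1: 0.375 0.444 / 2: 0.265 0.390 / 3: 0.178 0.339
```

The high type’s belief falls faster even though his success probability is higher, and these two forces pull the efficient stopping time in opposite directions.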

Here’s where things really get tricky. Who, in expectation and assuming efficient behavior, stops working on the risky project earlier conditional on not having finished the invention, the high type or the low type? On the one hand, for any belief about b, the high type is more likely to invent, hence since costs are identical for both types, the high type should expect to keep working longer. On the other hand, the high type learns more quickly whether the project is bad, and hence his belief about b declines more rapidly, so he ought expect to work for less time. That either case is possible makes solving for the optimal contract a real challenge, because I need to write the contracts for each type such that the low type does not ever prefer the high type payoffs and vice versa. To know whether these contracts are incentive compatible, I have to know what agents will do if they deviate to the “wrong” contract. The usual trick here is to use a single crossing result along the lines of “for any contract with properties P, action Y is more likely for higher types”. In the dynamic researcher problem, since efficient stopping times can vary nonmonotonically with researcher type, the single crossing trick doesn’t look so useful.

The “simple” (where simple means a 30 page proof) case is when the higher types efficiently work longer in expectation. The information-constrained optimum involves inducing the high type to work efficiently, while providing the low type too little incentive to work for the efficient amount of time. Essentially, the high type is willing to work for less money per period if only you knew who he was. Asymmetric information means the high type can extract information rents. By reducing the incentive for the low type to work in later periods, the high type information rent is reduced, and hence the optimal mechanism trades off lower total surplus generated by the low type against lower information rents paid to the high type.

This constrained-optimal outcome can be implemented by paying scientists up front, and then letting them choose either a contract with progressively increasing penalties for lack of success each period, or a contract with a single large penalty if no success is achieved by the socially efficient high type stopping time. Also, “penalty contracts” are nice because they remain optimal even if scientists can keep their results secret: since secrecy just means paying more penalties, everyone has an incentive to reveal their invention as soon as they create it. The proof is worth going through if you’re into dynamic mechanism design; essentially, the authors use a clever set of relaxed problems where a form of single crossing will hold, then show that the resulting mechanism is feasible even under the actual problem constraints.

Finally, note that if there is only moral hazard (scientist type is observable) or only adverse selection (effort is observable), the efficient outcome is easy. With moral hazard, just make the agent pay the expected surplus up front, and then provide a bonus to him each period equal to the firm’s profit from an invention occurring then; we usually say in this case that “the firm is sold to the employee”. With adverse selection, we can contract on optimal effort, using total surplus to screen types as in the correlated information mechanism design literature. Even though the “distortion only at the bottom” result looks familiar from static adverse selection, the rationale here is different.

Sept 2013 working paper (No RePEc IDEAS version). The article appears to be under R&R at ReStud.

“Dynamic Commercialization Strategies for Disruptive Technologies: Evidence from the Speech Recognition Industry,” M. Marx, J. Gans & D. Hsu (2014)

Disruption. You can’t read a book about the tech industry without Clayton Christensen’s Innovator’s Dilemma coming up. Jobs loved it. Bezos loved it. Economists – well, they were a bit more confused. Here’s the story at its most elemental: in many industries, radical technologies are introduced. They perform very poorly initially, and so are ignored by the incumbent. These technologies rapidly improve, however, and the previously ignored entrants go on to dominate the industry. The lesson many tech industry folks take from this is that you ought to “disrupt yourself”. If there is a technology that can harm your most profitable business, then you should be the one to develop it; take Amazon’s “Lab126” Kindle skunkworks as an example.

There are a couple problems with this strategy, however (well, many problems actually, but I’ll save the rest for Jill Lepore’s harsh but lucid takedown of the disruption concept which recently made waves in the New Yorker). First, it simply isn’t true that all innovative industries are swept by “gales of creative destruction” – consider automobiles or pharma or oil, where the major players are essentially all quite old. Gans, Hsu and Scott Stern pointed out in a RAND article many years ago that if the market for ideas worked well, you would expect entrants with good ideas to just sell to incumbents, since the total surplus would be higher (less duplication of sales assets and the like) and since rents captured by the incumbent would be higher (less product market competition). That is, there’s no particular reason that highly innovative industries require constant churn of industry leaders.

The second problem concerns disrupting oneself or waiting to see which technologies will last. Imagine it is costly to investigate potentially disruptive technologies for the incumbent. For instance, selling mp3s in 2002 would have cannibalized existing CD sales at a retailer with a large existing CD business. Early on, the potentially disruptive technology isn’t “that good”, hence it is not in and of itself that profitable. Eventually, some of these potentially disruptive technologies will reveal themselves to actually be great improvements on the status quo. If that is the case, then, why not just let the entrant make these improvements/drive down costs/learn about market demand, and then buy them once they reveal that the potentially disruptive product is actually great? Presumably the incumbent even by this time still retains its initial advantage in logistics, sales, brand, etc. By waiting and buying instead of disrupting yourself, you can still earn those high profits on the CD business in 2002 even if mp3s had turned out to be a flash in the pan.
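As a toy version of that logic, with entirely made-up numbers: suppose the disruptive technology succeeds with probability q and is worth V to whoever owns it, the legacy business is worth P, disrupting yourself now cannibalizes part of P and costs development money, while waiting preserves P and means paying an acquisition premium only in the success state:

```python
# "Disrupt yourself" vs. "wait and acquire", toy expected values.
# All parameters are hypothetical; the point is only that waiting can
# dominate when success is unlikely and the legacy business is valuable.
def disrupt_now(q, V, P, cannibalized=0.5, dev_cost=5.0):
    return q * V + (1 - cannibalized) * P - dev_cost

def wait_and_buy(q, V, P, premium=40.0):
    return q * (V - premium) + P    # keep P; buy the entrant on success

q, V, P = 0.2, 100.0, 30.0
print(disrupt_now(q, V, P))   # 20 + 15 - 5 = 30.0
print(wait_and_buy(q, V, P))  # 12 + 30     = 42.0
```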

This is roughly the intuition in a new paper by Matt Marx – you may know his work on non-compete agreements – Gans and Hsu. Matt has also collected a great dataset from industry journals on every firm that ever operated in automated speech recognition. Using this data, the authors show that a policy by entrants of initial competition followed by licensing or acquisition is particularly common when the entrants come in with a “disruptive technology”. You should see these strategies, where the entrant proves the value of their technology and the incumbent waits to acquire, in industries where ideas are not terribly appropriable (why buy if you can steal?) and entry is not terribly expensive (in an area like biotech, clinical trials and the like are too expensive for very small firms). I would add that you also need complementary assets to be relatively hard to replicate; if they aren’t, the incumbent may well wind up being acquired rather than the entrant should the new technology prove successful!

Final July 2014 working paper (RePEc IDEAS). The paper is forthcoming in Management Science.

“Strategic Experimentation with Poisson Bandits,” G. Keller & S. Rady (2010)

The multiarmed bandit is a true workhorse of modern mathematical economics. In a bandit problem, there are multiple arms you can pull, as in some types of slot machines. You have beliefs about the distribution of payoffs you receive when you pull a given arm. For instance, there may be a safe arm which yields an expectation of one coin every time you pull it, and a risky arm which yields an expectation of 2 coins with prior probability 1/3, and 0 coins with prior 2/3. Returns are generally discounted. There is often a “value of experimentation” where agents will pull an arm with a lower current expected value than another arm because, for instance, learning that the second arm above is the type with expected value 2 will increase my payoff from now until infinity, while I only pay the cost of experimenting now; in many single-person bandit problems, the optimal arm to pull can be found simply using a formula called the Gittins index, derived by J.C. Gittins in the 1970s.
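The two-armed example above is simple enough to value directly if we additionally assume (my simplification, not part of the general setup) that a single pull of the risky arm reveals its type:

```python
# Value of experimentation in the two-armed example: safe arm pays 1 per
# pull; risky arm pays 2 with prior 1/3 and 0 with prior 2/3, and one
# pull reveals its type. Discount factor is hypothetical.
delta = 0.9

v_safe = 1 / (1 - delta)       # pull the safe arm forever: 10.0

# Pull risky once, then pull whichever arm is revealed to be better.
v_risky_first = (1/3) * 2 + delta * ((1/3) * 2 + (2/3) * 1) / (1 - delta)

print(v_safe, v_risky_first)   # 10.0 vs 12.67: experimenting is optimal
                               # even though the risky arm pays only 2/3
                               # in expectation today
```

The Gittins index formalizes exactly this comparison arm by arm, without having to enumerate continuation strategies.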

As far as I know, the first explicitly economic bandit paper is Rothschild’s 1974 JET “A two-armed bandit theory of market pricing.” Rothschild tries to explain why prices may be disperse for the same product over time. His explanation is simple: consumer demand is unknown, and I try to learn demand by experimenting with prices. This produces a very particular form of price dispersion. Since Rothschild, a huge amount of economics work on bandit problems involves externalities: experimenting with the risky arm is socially valuable, but I bear all the cost privately, and don’t get all the benefit. This has been used, in many forms, extensively in the R&D literature by a number of economists you may know, like Bergemann, Besanko and Hopenhayn. Keller and Rady, along with Cripps, have a famous 2005 Econometrica involving exponential bandits (i.e., a safe arm and an arm that is either a total failure or a success, with the success learned while pulling that arm according to an exponential time distribution).

This 2010 paper, in Theoretical Economics, expands the R&D model to Poisson bandits. There are two arms being pulled by N firms in continuous time. One is a safe arm which pays a flow rate of s, and one is an arm which either gives an expectation of s'>s, or s''<s. The risky arm gives payoffs in lumps, so the only difference between the risky arm of type s' and the one of type s'' is that the Poisson arrival rate is slower for s''. This means that a single "success" on the risky arm does not tell me conclusively whether the arm is the good type or the bad type.
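The non-conclusiveness of a lump is just Bayes’ rule on Poisson arrival rates; in the sketch below the rates and prior are hypothetical, with lam_good and lam_bad standing in for the arrival rates of the two possible risky-arm types:

```python
from math import exp

# After watching the risky arm for time t and seeing n lumps, update the
# odds that the arm is the good (fast-arrival) type. A single lump moves
# the belief, but nowhere near certainty.
def posterior_good(p0, lam_good, lam_bad, n, t):
    odds = (p0 / (1 - p0)) * (lam_good / lam_bad) ** n \
           * exp(-(lam_good - lam_bad) * t)
    return odds / (1 + odds)

print(posterior_good(0.5, 2.0, 0.5, n=1, t=1.0))  # ~0.47: one lump in one
                                                  # unit of time is weak
                                                  # (even slightly bad) news
print(posterior_good(0.5, 2.0, 0.5, n=5, t=1.0))  # ~0.996: five lumps are
                                                  # nearly conclusive
```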

For the usual free-riding reasons discussed above, experimentation will be suboptimal. But there is another interesting effect here. Let p1* be the belief about the risky arm being of type s' such that if there were only one firm, he would pull the risky arm if his belief were above p1* and pull the safe arm if it was below p1*. Keller and Rady prove that in any Markov perfect equilibrium with N firms, I am willing to spend some of my time pulling the risky arm even when my belief is below p1*. Why? They call this the “encouragement effect.” If there is just me, then the only benefit of pulling the risky arm when I am near p1* is that I might learn the risky arm is better than I previously thought by getting a Poisson success. But with N firms, getting a Poisson success both gives me this information and, by improving everyone’s belief about the quality of the risky arm, encourages others to also experiment with the risky arm in the future. Since payoffs exhibit strategic complementarity, I will benefit from their future experimentation.

There is one other neat result, which involves some technical tricks as well. We usually solve just for the symmetric MPE, for simplicity. In the symmetric MPE, which is unique, we all mix between the safe and risky arm as long as we are above some cutoff belief P. But as we get closer and closer to P, we are spending arbitrarily close to zero effort on the risky arm, so our posterior, given bad news, decreases only very slowly and we never reach P in finite time. This suggests that an asymmetric MPE may do better, even in a Pareto sense. Consider the following: near P, have one person experiment with full effort if the current belief is in some set B1, and have the other person experiment if the current belief is in B2. If it is my turn to experiment, I have multiple reasons to exert full effort: most importantly, because B1 and B2 are set up so that if I change the belief enough through my experimentation, the other person will take over the cost of experimenting. Characterizing the full set of MPE is difficult, of course. (Final version in TE issue 5, 2010. Theoretical Economics is an amazing journal. It is completely open access, allows reuse and republication under a generous CC license, doesn’t charge any publication fee, doesn’t charge any submission fee as of now, and has among the fastest turnaround time in the business. Is it any surprise that TE has, by many accounts, passed Elsevier’s JET as the top field journal in micro theory?)

“Nuclear Power Reactors: A Study in Technological Lock-In,” R. Cowan (1990)

If you want to start a heated debate among historians of technology, just express an opinion about the importance of path dependence and then watch the sparks fly. Do “bad” technologies prevail because of random factors, what Brian Arthur calls “historical small events”? Or are what look like bad technologies actually good ones that prevailed for sensible reasons? Could optimal policy improve things? More on that last question in the final paragraph.

Robin Cowan, now at Maastricht, is an economist right in my sweet spot: a theorist interested in technology who enjoys the occasional dig through historical archives. His PhD dissertation concerns conditions for technological lock-in. Basically, increasing returns to scale (learning-by-using, for example) and unknown future benefits of a given research line (here is where the multiarmed bandit comes in) generally will lead to 1) a sole technology dominating the market, 2) each technology, regardless of underlying quality, having a positive probability of being the dominant one, and 3) cycling between technologies early in the lifecycle. In the present paper, Cowan examines the history of nuclear power reactors through this framework; apropos of the previous post on this site, I think what Cowan does is a much more sensible test of a theory than any sort of statistical process.
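Arthur’s mechanism is easy to simulate with an urn-style model (the parameters below are mine, purely illustrative): each adopter picks the technology with the higher sum of intrinsic quality, learning-by-using benefits proportional to the installed base, and an idiosyncratic taste shock:

```python
import random

# Arthur-style lock-in: sequential adopters, increasing returns to the
# installed base, and random "historical small events". Early on the lead
# can cycle; eventually one technology locks in, and the lower-quality
# technology (index 0) can be the one that wins. Parameters illustrative.
def adoption_run(adopters=2000, quality=(1.0, 1.1), learning=0.01):
    base = [0, 0]
    for _ in range(adopters):
        payoffs = [quality[i] + learning * base[i] + random.random()
                   for i in (0, 1)]
        base[payoffs.index(max(payoffs))] += 1
    return base

random.seed(1)
print([adoption_run() for _ in range(5)])  # each run ends lopsided
```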

Nuclear power is interesting because, at least as of 1990, light water nuclear power reactors are dominant despite the fact that many other types of reactors appear to have underlying quality/cost combinations as good or better. How did light water come to dominate? After WW2, the US had a monopoly on enriched uranium production, and was unwilling to share because of national security concerns. Development of nuclear power technology was also driven by military concerns: nuclear submarines could stay underwater longer, for example. A military focus led policymakers to focus research effort on small reactors which could be developed quickly.

In the 1950s, following the Soviet atomic bomb, US nuclear power policy shifted somewhat toward developing power for civilians. There was a belief that the Soviets would win allies by offering Soviet nuclear power plants, and so the US began pushing “Atoms for Peace” civilian nuclear power to counter that threat. There was an urgency to such development, and because light water reactors had already been developed and accepted for submarine use, they were the quickest to develop for civilian power plant export. A handful of US firms with experience in light water heavily subsidized the capital cost of their plants, which led to rapid adoption in the early 1960s. Because of learning-by-doing, light water plant costs quickly decreased, and because of network effects – more users means more knowledge of potential safety risks, for example – a number of nations adopted light water plants soon after. During this period, other technologies like heavy water and gas graphite suffered temporary setbacks which you can think of as a bad draw in a multiarmed bandit. Because of future uncertainty in the bandit model, and learning-by-using, light water plants locked themselves in. As of 1990, at least, Cowan notes that experts both then, as well as in the 50s and 60s, did not believe that light water was necessarily the best civilian nuclear power technology.

Much more detail is found in the paper. One thing to worry about when reading, though, is the conflation of path dependence in general and socially suboptimal path dependence. Imagine two technologies with identical output and marginal cost, but one with fixed research cost 7 and one with fixed research cost 10. If the second is adopted by everyone, it appears naively that the “wrong” technology has won out. But what if the cost of 10 was already borne by military researchers developing a similar product? In that case, the second technology is socially optimal. The multiarmed bandit has similar issues – in the face of uncertainty about nuclear power technology quality, it is not obvious that a social planner would have done anything different; indeed, many important decisions were made by the US Navy. I only mention this distinction because a friend and I have a model of technology that generates similar path dependence, but in a way that can absolutely be countered by better policy, and I’m not sure how Cowan’s historical example speaks to our model. (Final Journal of Economic History 1990 version)

“How Demanding is the Revealed Preference Approach to Demand?,” T. Beatty & I. Crawford (2011)

If you’ve read this site at all, you know that I see little value in “testing” economic theories, but if we’re going to do it, we ought at least do it in a way that makes a bit of sense. There are a ton of studies testing whether agents (here meaning not just humans; Chen and coauthors have a series of papers about revealed preference and other forms of maximizing behavior in Capuchin monkeys!) have preferences that can be described by the standard model: a concave, monotonic, continuous utility function that is time-invariant. Generally, the studies do find such maximizing behavior. But this may mean nothing: a theory that is trivially satisfied will never be shown to violate utility maximization, and indeed lots of experiments and empirical datasets see so little variation in prices that nearly any set of choices can be rationalized.

Beatty and Crawford propose a simple fix here. Consider an experiment with only two goods, and two price/income bundles. There is a feasible mixture among those two goods for each bundle. Consider the share of income under each price/income bundle spent on each of the two goods. If, say, 75% of income is spent on Good A under price/income bundle 1, then, for example, utility maximization may be consistent with spending anywhere between 0 and 89% of income on Good A under price/income bundle 2. Imagine drawing a square with “income share spent on Good A under price/income bundle 1” on the x-axis, and “income share on A under bundle 2” on the y-axis. Some sets of choices will lie in a part of that square which is incompatible with utility maximization. The greater the proportion of total area which is incompatible with utility maximization, the more restrictive a test of utility maximizing behavior will be. The idea extends in a straightforward way to tests with N goods and M choices.
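In the two-good, two-budget case the area is easy to approximate by Monte Carlo; the prices and incomes below are hypothetical, and the check is the standard WARP condition (each bundle affordable when the other was chosen):

```python
import random

# Fraction of the budget-share square that violates WARP: this is how
# often a uniform-random chooser would fail, i.e. the test's difficulty.
def violates_warp(w1, w2, p1, m1, p2, m2):
    x1 = (w1 * m1 / p1[0], (1 - w1) * m1 / p1[1])  # choice under budget 1
    x2 = (w2 * m2 / p2[0], (1 - w2) * m2 / p2[1])  # choice under budget 2
    # x1 revealed preferred to x2, yet x2 strictly revealed preferred to x1
    return (p1[0] * x2[0] + p1[1] * x2[1] <= m1 and
            p2[0] * x1[0] + p2[1] * x1[1] < m2)

random.seed(0)
p1, m1, p2, m2 = (1.0, 2.0), 10.0, (2.0, 1.0), 10.0
fails = sum(violates_warp(random.random(), random.random(), p1, m1, p2, m2)
            for _ in range(100_000))
print(fails / 100_000)   # ~0.11 for these prices: a weak test, since a
                         # random chooser passes 89% of the time
```

With little variation in prices the two budget lines barely cross, the violating region shrinks toward zero, and “passing” revealed preference means almost nothing, which is exactly the point of the Spanish data below.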

Beatty and Crawford assume you want a measure of “how well” agents do in a test of revealed preference as a function of both the pass rate (what proportion of the sample does not reject utility maximizing behavior) and the test difficulty (how often a random number generator selecting bundles would pass); if this all sounds like redefining the concept of statistical power, it should. It turns out that r minus a, where r is the pass rate and a is the test difficulty, has some nice axiomatic properties; I’m not totally convinced this part of the paper is that important, so I’ll leave it for you to read. The authors then apply this idea to some Spanish consumption data, where households were tracked for eight quarters. They find that about 96% of households in the sample pass: they show no purchases which violate utility maximizing behavior. But the variation in prices and quarterly income is so minimal that utility maximizing behavior imposes almost no constraints: 91% of random number generators would “pass” given the same variation in prices and incomes.

What do we learn from an exercise like this? There is definitely some benefit: if you want to design experiments concerning revealed preference, the measure in the present paper is useful indeed for helping choose precisely what variation in incomes and prices to use in order to subject revealed preference to a “tough” test. But this assumes you want to test at all. “Science is underdetermined,” they shout from the rooftops! Even if people showed behavior that “rejected” utility maximization, we would surely ask, first, by how much; second, are you sure “budget” and “price” are determined correctly (there is Varian’s error in price measurement, and no one is using lifetime income adjusted for credit constraints when talking about “budgets”); third, are you just rejecting concaveness and not maximizing behavior?; fourth, are there not preference shocks over a two year period, such as my newfound desire to buy diapers after a newborn arrives?; and so on. I think such critiques would be accepted by essentially any economist. Those of the philosophic school that I like to discuss on this site would further note that the model of utility maximization is not necessarily meant to be predictive, that we know it is “wrong” in that clearly people do not always act as if they are maximizers, and that the Max U model is nonetheless useful as an epistemic device for social science researchers. (Final working paper – final version published in AER October 2011)

“Stakes Matter in Ultimatum Games,” S. Andersen, S. Ertac, U. Gneezy, M. Hoffman & J. List (2011)

[Update, 9/7/2011: A comment at Cheap Talk mentioned a new paper by Nicholas Bardsley which I find quite relevant to the final paragraph of this post. Essentially, Bardsley is able to completely change (as far as I’m concerned) the “sharing” characteristic of the dictator game just by changing the action set available to players; if the dictator can also “take” money, and not simply share, then they do take indeed. The Hawthorne Effect Is Real, shout the villagers from the mountaintop.]

Here is one more experimental paper, which I believe is forthcoming in the AER as well. Experimentalists love the Ultimatum Game. In the Ultimatum Game, two anonymous people are matched and one of them is given X dollars. She is told to propose a split of the money between herself and the other player. The other player can then either accept his share of the split, or reject, in which case both parties get nothing. Tons of experiments over the past 20 years, everywhere from U.S. undergraduate labs to tribes in the Amazon, have shown offers that tend to be rather high (30-50%) and also high rejection rates on low offers. This is “strange” (more on this shortly) to economists because the unique subgame perfect Nash equilibrium is to offer one penny, and for the responder to accept. Even if you think that the so-called paradox is nothing of the sort – rather, people are unused to one-shot games and are instead trying to develop reputation in a repeated game called Life – there is an even stranger stylized fact: changing stakes doesn’t seem to affect behavior. That is, if the stakes are 1 dollar, 10 dollars or 100 dollars, people still reject. Why aren’t people responding to incentives at all?

I remember a study a few years ago, from Indonesia perhaps, where many days’ worth of wages were being rejected seemingly out of spite. (And speaking of spite, ultimatum game papers are great examples of economists abusing language. One man’s “unfair offers were consistently rejected” is another man’s “primitive spite seems more important to responders than rational thought.”)

Andersen et al (more on this also in a second) play the ultimatum game in India using stakes that range up to a year’s income. And unsurprisingly, stakes matter a lot. No matter how low the split, only once is an offer rejected at the year’s-income stakes, and that offer was less than 10% of the stake. As stakes increase from 20 rupees up to 20,000, the rejection rate for a given split falls, though it seems to fall fastest when stakes get very large. The takeaway: even given all of the experimental results on the Ultimatum game, spite is probably not terribly important vis-a-vis more standard incentives across the range of “very important economic phenomena.” None of this is to say that CEOs won’t cost their firm millions out of spite – surely they sometimes do – but rather claims that human nature is hardwired for fairness or spite or whatever you want to call it even at the expense of standard maximizing behavior are limited claims indeed.

Two final notes here. First, I think economists need to come to some conclusion concerning norms on experimental papers. Econ has long had a standard of giving author billing only to those who were essential for the idea and the completion of a paper – rarely has this meant more than three authors. Credit for data collection, straightforward math, coding, etc. has generally been given in the acknowledgments. A lot of econ psych and experimental work strikes me as fighting that norm: five and six authors have become standard. (I should caveat this by saying that in the present paper, I have no idea how workload was divided; rather, I think it’s undeniable that more generally the work expected of a coauthor in experimental papers is lower than that which was traditional in economics.)

Second, and I’m sure someone has done this but I don’t have a cite, the “standard” instructions in ultimatum games seem to prime the results to a ridiculous degree. Imagine the following exercise. Give 100 dollars to a research subject (Mr. A). Afterwards, tell some other subject (Ms. B) that 100 dollars was given to Mr. A. Tell Mr. A that the other subject knows he was given the money, but don’t prime him to “share” or “offer a split” or anything similar. Later, tell Ms. B that she can, if she wishes, reverse the result and take the 100 dollars away from A – if she does so, had Mr. A happened to have given her some of the money, that would also be taken. I hope we can agree that if you did such an experiment, A would share no money and B would show no spite, as neither has been primed to see the 100 dollars as something that should have been shared in the first place. One doesn’t normally expect anonymous strangers to share their good fortune with you, surely. That is, feelings of spite, jealousy and fairness can be, and are, primed by researchers. I think this is worth keeping in mind when trying to apply the experimental results on ultimatum games to the real economy. (January 2011 working paper, forthcoming in the AER)

“A Continuous Dilemma,” D. Friedman & R. Oprea (2011)

I feel pretty confident that the two lab experiment papers I will write about today will be the only such posts on that field here for quite a while. Both results are interesting, but as an outsider to experimental econ, I’m quite surprised that these represent the “state of the art” – and at some level both must be, since both are forthcoming in the AER.

In the present paper, Friedman and Oprea run three versions of the prisoner’s dilemma: a one-shot game, a one-minute continuous-time game where players must “wait” 7.5 seconds to react to an opponent’s change of strategy, and a one-minute continuous-time game with no limit on reaction speed aside from human reaction time. We’ve known since Nash that finitely-repeated prisoner’s dilemmas can only support defection every period as an equilibrium (by a simple backward induction unraveling argument), but that infinitely-repeated prisoner’s dilemmas can support any payoffs from the cooperation payoff down to the defection payoff in equilibrium (by the Fudenberg-Maskin folk theorem). Two results from the 1980s save us a bit here. First, as the underrated Radner pointed out, if you can react quickly to an opponent’s deviation, then you can lose only a tiny bit by cooperating and hoping your opponent cooperates also. That is, with a very high number of periods, “cooperate until almost the very end” is an “almost” dominant equilibrium. If your opponent defects, you defect almost immediately afterward, and thereafter both players play the “unique” equilibrium of defect-defect. If your opponent does not defect, you both continue to cooperate until the very end. Regardless of your opponent’s strategy, “cooperate until the opponent defects for the first time” earns only a tiny bit less than the maximum you could have earned by defecting every period. Second, Simon and Stinchcombe (1989) show that in continuous-time games, backward induction cannot get started and something like the folk theorem applies.
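Radner’s logic is easy to verify directly. With illustrative per-period payoffs of my choosing (not the paper’s), the best deviation against a grim-trigger opponent in a T-period prisoner’s dilemma is to cooperate for T-1 periods and defect once at the end, so the per-period gain from deviating shrinks like 1/T:

```python
# A quick check of Radner's logic (illustrative payoffs, mine): temptation
# t=5, mutual cooperation c=3, mutual defection d=1 per period.
t, c, d = 5.0, 3.0, 1.0

def per_period_epsilon(T):
    """Both players use grim trigger in a T-period prisoner's dilemma.
    Deviating at period k earns (k-1)*c + t + (T-k)*d; with c > d this is
    maximized at k = T, so the total gain over T*c is the constant t - c,
    and the per-period gain shrinks like 1/T."""
    grim_payoff = T * c
    best_deviation = max((k - 1) * c + t + (T - k) * d for k in range(1, T + 1))
    return (best_deviation - grim_payoff) / T

for T in [10, 100, 1000]:
    print(T, per_period_epsilon(T))   # 0.2, 0.02, 0.002: an epsilon-equilibrium
```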

Friedman and Oprea test this in the lab. Essentially none of their subjects cooperate in the one-shot game, and cooperation steadily increases as the minimum wait to react drops from 30 seconds toward continuous time. In the treatment where the only restriction on reaction time is human response time, cooperation occurs 80-90% of the time, essentially encompassing the entire game except for the last few seconds. A modification of Radner’s insight shows that this type of cutoff strategy is an epsilon-equilibrium, and that the amount of cooperation one would expect given the limits on reaction time is reasonable. The authors do not fully solve for the (epsilon-)equilibria of their game – I have no idea how they got away with this, but I would love to know what they said to the referees! In any case, the intuition for why cutoff strategies are nearly dominant equilibria seems reasonable, although it should be noted that this intuition is essentially Radner’s and not anything novel to the present paper.
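The reaction-lag logic can be made concrete with a back-of-the-envelope calculation (my parametrization, not the authors’): defecting early buys the temptation payoff only for the lag window, then dooms you to mutual defection for the rest of the game, so deviation is profitable only very near the end.

```python
# A back-of-the-envelope version of the reaction-lag argument (my own
# parametrization, not the authors'). Flow payoffs: t=5 while exploiting
# a cooperator, c=3 under mutual cooperation, d=1 under mutual defection.
# The game lasts 1 unit of time; the opponent switches to defection delta
# units of time after observing your defection.

def deviation_gain(x, delta, t=5.0, c=3.0, d=1.0):
    """Net gain from defecting at time x rather than cooperating to the
    end: earn t for delta units until the opponent reacts, then d instead
    of c for whatever remains of the game."""
    remaining = 1.0 - x
    exploit = min(delta, remaining)
    return exploit * (t - c) + (remaining - exploit) * (d - c)

for delta in [0.125, 0.01]:  # a 7.5s lag and a fast human reaction in a 60s game
    first_profitable = min(x / 1000 for x in range(1001)
                           if deviation_gain(x / 1000, delta) > 0)
    print(delta, first_profitable)   # defection pays only near the very end
```

Under these numbers, a 7.5-second lag makes defection profitable only in roughly the last quarter of the game, and a near-instant reaction pushes that window to the last couple of seconds, which is broadly the pattern the subjects produce.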

So what’s the takeaway? For a theoretically-minded reader, I think the experimental results here are simply more justification for taking care in interpreting Nash predictions in lengthy, finitely-repeated games. Even for modeling purposes, it might be reasonable to see more work on epsilon-equilibria in, say, oligopoly behavior; cartel pricing is much easier to support when prices and quantities are reported very quickly, if we look at that type of equilibrium. I still find it a bit strange that the authors do not, as far as I can tell, attempt to distinguish between different theoretical explanations for high rates of cooperation in repeated games. Is there infection from beliefs a la Kreps et al’s “Gang of Four” paper? (This does not appear to be the case to me, since I believe the Gang of Four construction can sustain cooperation all the way to the horizon.) Would bounded rationality matter? (Both players’ complete action profiles over time are available throughout the game in the present paper.) There are many other explanations that could be tested here. (Indeed, Bigoni et al have a new paper following up the present results with some discussion of infinite- versus finite-horizon continuous-time games.) (Dec 2010 working paper. Final version forthcoming in the AER. If you’re coming from a theory background, there are many norms in experimental econ that will strike you as strange – writing about an experiment with 36 American undergraduates who self-select into lab studies as if it were representative of human behavior, for example – but I’m afraid that battle has already been lost. Best just to read experimental work for what it is; some interesting insights for theory lie inside despite these peccadillos.)

“The Role of Theory in Field Experiments,” D. Card, S. Dellavigna & U. Malmendier (2011)

This article, I hope, will be widely read by economists working on field experiments. And it comes with David Card’s name right on the title page; this is certainly not a name that one associates with structural modeling!

Field experiments and randomized control trials are booming at the moment. Until the past decade, an average year saw a single field experiment published across the top five journals; now 8 to 10 appear each year. The vast majority of these papers are atheoretical, though I have a small complaint about the definition of “theoretical” which I’ll leave for the final paragraph of this post. The same atheoretical nature largely holds for lab experiments as well; since I am generally very receptive to field experiments and much less so to lab experiments, I’ll leave discussion of the lab aside for now.

(That said, I’m curious for the lab types out there: are there any good examples of lab experiments which have overturned a key economic insight? By overturned, I mean that the reversal was accepted as valid by many economists. I don’t mean “behavioral theory” like Kahneman-Tversky; I mean an actual lab experiment in the style of the German School – we ought to call it that at this point. It just seems to me that many of the “surprising” results turn out not to be true once we move to economically relevant behavior in the market. The “gift reciprocity” paper by Fehr and coauthors is a great example, and Card, Dellavigna and Malmendier discuss it. In the lab, people “work” much harder when they are paid a surprisingly high wage. In field and natural experiments trying to replicate this, with Gneezy and List (2006) being the canonical example, there is no such economically relevant effect. I would love some counterexamples of this phenomenon, though: I’m trying my best to keep an open mind!)

But back to field experiments. After noting the paucity of theory in most experimental papers, the authors give three examples of where theory could have played a role. In the gift reciprocity/wages literature mentioned above, there are many potential explanations for what is going on in the lab. Perhaps workers feel inequity aversion and don’t want to “rip off” unprofitable employers. Perhaps they simply act out of reciprocity: if you pay me a high wage, I’ll work hard even in a one-shot game. A properly designed field experiment can distinguish between the two. An even better example is charitable giving. List and Lucking-Reiley ran a famous 2002 field experiment where they examined whether giving to charity could be affected by, for example, claiming in the brochure that the goal of the fundraising drive was already almost reached. But can’t we learn much more about charity? Do people give because of warm glow? Or because of social pressure? Or some other reason? DellaVigna, List and Malmendier have a wonderful 2010 paper that writes down a basic structural model of gift-giving and introduces just enough randomization into the experimental design to identify all of the parameters. They find that social pressure is important, and that door-to-door fundraising can actually lower total social welfare, even taking into account the gain from purchasing whatever public good the charity is raising money for. And their results link back nicely to earlier theory and forward to future experiments along similar lines. Now that’s great work!
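To give a flavor of what such a structural model looks like, here is a heavily stripped-down sketch of my own; the functional forms, names and numbers are mine, not DellaVigna, List and Malmendier’s actual specification.

```python
# A stripped-down sketch in the spirit of that model (all names and numbers
# are mine, not DellaVigna, List and Malmendier's specification). A
# household facing the solicitor picks a gift g to maximize warm glow
# a*log(1+g) minus the cost g, but giving less than a socially expected
# amount g_bar incurs a pressure cost S; a flyer announcing the visit lets
# the household simply not answer the door.

import math

def gift_at_door(a, S, g_bar=10.0):
    """Optimal gift conditional on facing the solicitor."""
    g_warm_glow = max(0.0, a - 1.0)    # first-order condition of a*log(1+g) - g
    def u(g):
        return a * math.log(1 + g) - g - (S if g < g_bar else 0.0)
    return max([0.0, g_warm_glow, g_bar], key=u)

def answers_door(a, S, g_bar=10.0):
    """With advance warning, answer only if the visit beats staying home."""
    g = gift_at_door(a, S, g_bar)
    return a * math.log(1 + g) - g - (S if g < g_bar else 0.0) >= 0

print(gift_at_door(a=2.0, S=6.0))   # 10.0: gives g_bar purely to dodge pressure
print(answers_door(a=2.0, S=6.0))   # False: once warned, the household opts out
```

The opt-out margin is exactly what lets the randomization identify social pressure: donors who give only when caught at the door are made worse off by the fundraising drive.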

The complaints against structural models have always seemed hollow to me. As Card, Dellavigna and Malmendier note, every paper, structural or not, makes implicit assumptions when interpreting results. Why not make them in a way that is both clear and guided by the huge body of theoretical knowledge that social science has already developed? The authors note a turn away from structural models in experiments after the negative income tax papers of the 70s and 80s were thought to be failures of a sort, due to the difficulty of interpreting their results. That argument was always a bit ridiculous: all social science results are hard to interpret, and there is no way around this. Writing up research in a way that seems more clearcut to a policy audience does not mean the evidence actually is clearcut.

I do have one quibble with this paper, though – and I think the authors will sympathize with this complaint given their case studies. The authors divide experimental papers into four groups: descriptive, single model, competing models and parameter estimation. Single model, to take one example, is defined as a paper that lays out a formal model and tests one or more implications thereof; similar definitions are given for competing models and parameter estimation. Once we get over Friedman’s 1953 model of economic methodology, though, we’ve got to realize that “testing” models is far, far from the only link between theory and data. Theory is useful to empirics because it can suggest interesting and nonobvious questions, because it can be used to justify nontestable econometric assumptions, because it allows for reasonable discussion of counterfactuals, because it allows empirical studies to be linked into a broader conception of knowledge, because it allows results to be interpreted correctly, and so on. I’d argue that checking whether papers “test” models is almost irrelevant for knowing whether empirical papers properly use theory. Let me give my favorite example, which I used in a presentation to empirical economists last year. Imagine you study government-mandated hospital report cards and find that, two years into the program, there is no evidence that hospitals or patients are changing behavior based on the ratings, but that 20% of patients looked at the report cards at some point. An atheoretical paper might suggest that these report card programs are a waste of money. A theoretically-guided paper would note that game theorists have shown reputational equilibria are often discontinuous, and that perhaps if more patients were induced to look at the report cards (say, by mailing them directly to each household once a year), hospitals would begin to react by giving better care; a toy version of this threshold logic is sketched below. There is no testing of a theoretical model or anything similar, but there is certainly great use of theory! (Perhaps of interest: my two favorite job market papers of the last couple of years, those of Ben Handel and Heidi Williams, both use theory in one of the ways above rather than in the direct “let’s use data to test a theoretical model” framework…)
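Here is the promised toy threshold model (entirely mine, purely illustrative):

```python
# A toy threshold model (entirely mine, purely illustrative): a hospital
# pays a fixed cost C to deliver high quality, and earns extra revenue R
# per informed patient who sees a good rating. With N patients of whom a
# fraction f read the report card, investing pays iff f*N*R > C, so the
# equilibrium response to ratings jumps discontinuously in f.

def invests_in_quality(f, N=1000, R=50.0, C=15000.0):
    return f * N * R > C    # threshold here: f* = C / (N * R) = 0.30

for f in [0.1, 0.2, 0.3, 0.4]:
    print(f, invests_in_quality(f))  # 20% readership: no response; 40%: response
```

At 20% readership the program looks worthless; double the readership and equilibrium behavior flips, which is exactly why the atheoretical reading of the evidence is dangerous.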

Similar comments apply to theorists’ use of empirical research, of course, but let’s save that for another day. (February 2011 working paper – forthcoming in the JEP)
