Category Archives: Decision Theory

“Das Unsicherheitsmoment in der Wirtlehre,” K. Menger (1934)

Every economist surely knows the St. Petersburg Paradox described by Daniel Bernoulli in 1738 in a paper which can fairly claim to be the first piece of theoretical economics. Consider a casino offering a game of sequential coinflips that pays 2^(n-1) if the first heads arrives on the nth flip of the coin. That is, if there is a heads on the first flip, you receive 1. If there is a tails on the first flip and a heads on the second, you receive 2, and 4 if TTH, and 8 if TTTH, and so on. It is quite immediate that this game has an expected payoff of infinity. Yet, Bernoulli points out, no one would pay anywhere near infinity for such a game. Why not? Perhaps they have what we would now call logarithmic utility, in which case I value the gamble at .5*ln(1)+.25*ln(2)+.125*ln(4)+…, a finite sum.
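The two valuations are easy to check numerically. A quick sketch: the partial sums of the expected payoff grow without bound, while the log-utility series .5*ln(1)+.25*ln(2)+… converges to ln 2:

```python
from math import log

# St. Petersburg gamble: the first heads on flip n (probability 0.5**n)
# pays 2**(n-1). Compare partial sums of expected payoff and expected
# log utility.
def partial_sums(n_terms):
    ev = eu = 0.0
    for n in range(1, n_terms + 1):
        p = 0.5 ** n
        payoff = 2 ** (n - 1)
        ev += p * payoff       # each term is exactly 0.5: divergence
        eu += p * log(payoff)  # terms shrink geometrically: convergence
    return ev, eu

ev, eu = partial_sums(60)
# ev grows by 0.5 per term (30.0 after 60 terms), while eu has already
# converged to ln 2, about 0.693.
```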

Now, here’s the interesting bit. Karl Menger proved in 1927 that the standard response to the St. Petersburg paradox is insufficient (note that Karl with a K is the mathematically inclined son and mentor to Morgenstern, rather than the relatively qualitative father, Carl, who somewhat undeservingly joined Walras and Jevons on the Mt. Rushmore of Marginal Utility). For instance, if the casino pays out e^(2^(n-1)) rather than 2^(n-1), then even an agent with logarithmic utility has infinite expected utility from such a gamble. This, nearly 200 years after Bernoulli’s original paper! Indeed, such a construction is possible for any unbounded utility function; let the casino pay out U^-1(2^(n-1)) when the first heads arrives on the nth flip, where U^-1 is the inverse utility function.
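To see Menger's construction numerically: with payoffs e^(2^(n-1)), a log-utility agent gets utility 2^(n-1) from the nth outcome, so every term of the expected-utility sum equals one half and the partial sums diverge, exactly as in the original paradox with linear utility. A sketch:

```python
# Menger's "super-Petersburg" gamble: pay exp(2**(n-1)) if the first
# heads arrives on flip n. Under log utility the nth term of the
# expected-utility sum is 0.5**n * 2**(n-1) = 0.5, so the partial
# sums grow without bound.
def menger_partial_eu(n_terms):
    return sum(0.5 ** n * 2 ** (n - 1) for n in range(1, n_terms + 1))

# After k terms the partial sum is k/2: 20.0 after 40 terms, and so on.
```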

Things are worse, Menger points out. One can construct a thought experiment where, for any finite amount C and an arbitrarily small probability p, there is a bounded utility function such that an agent will prefer a gamble paying some finite amount D with probability p to a sure payment of C. [Sentence edited as suggested in the comments.] So bounding the utility function does not kill off all paradoxes of this type.

The 1927 lecture and its response are discussed at length in Rob Leonard’s “Von Neumann, Morgenstern, and the Creation of Game Theory.” Apparently, Oskar Morgenstern was at the Vienna Kreis meeting where Menger first presented this result, and was quite taken with it, a fact surely interesting given Morgenstern’s later development of expected utility theory. Indeed, one of Machina’s stated aims in his famous paper on EU without the Independence Axiom is providing a way around Menger’s result while salvaging EU analysis. If you are unfamiliar with Machina’s paper, one of the most cited in decision theory in the past 30 years, it may be worthwhile to read the New School HET description of the “fanning out” hypothesis which relates Machina to vN-M expected utility. (Unfortunately, the paper above is both gated and in German, as the original publication was in the formerly-famous journal Zeitschrift für Nationalökonomie. The first English translation is in Shubik’s festschrift for Morgenstern published in 1967, but I don’t see any online availability.)

“A Bayesian Model of Risk and Uncertainty,” N. al-Najjar & J. Weinstein (2012)

In a Bayesian world with expected utility maximizers, you have a prior belief on the chance that certain events will occur, and you maximize utility subject to those beliefs. But what if you are “uncertain” about what your prior even is? Perhaps you think that with 60 percent probability, peace negotiations will commence and there will be a .5 chance of war and a .5 chance of peace, but with 40 percent probability, war is guaranteed to occur. It turns out these types of compound lotteries don’t affect your decision if you’re just making a single choice: simply combine the compound lottery and use that as your prior. In this case, you think war will occur with .6*.5+.4*1=.7 probability. That is, the Bayesian world is great for discussing risk – decisionmaking with concave utility and known distributions – but not that useful for talking about one-shot Knightian uncertainty, or decisionmaking when the distributions are not well-known.
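The collapsing step is mechanical; a minimal sketch using the war/peace numbers from above:

```python
# A compound lottery as (weight, distribution) pairs: with probability
# .6 negotiations commence (war/peace 50-50), with probability .4 war
# is certain.
priors = [(0.6, {"war": 0.5, "peace": 0.5}),
          (0.4, {"war": 1.0, "peace": 0.0})]

def collapse(priors):
    """Reduce a prior-over-priors to a single prior over outcomes."""
    out = {}
    for weight, dist in priors:
        for state, p in dist.items():
            out[state] = out.get(state, 0.0) + weight * p
    return out

# collapse(priors) gives war with probability .6*.5 + .4*1 = .7; for a
# single decision only this reduced lottery matters.
```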

al-Najjar and Weinstein show, however, that this logic does not hold when you take multiple decisions that depend on a parameter that is common (or at least correlated) across those decisions. Imagine a risk-averse agent buys a stock whose daily return is determined by some IID process, and imagine that the agent doesn’t have a single prior about that process’s parameter, but rather a prior over the set of possible priors. For instance, as above, with probability .6 you have a .5 chance of a 1 percent increase and a .5 chance of a 1 percent decrease, but with probability .4, a 1 percent increase is assured. Every period, I can update my “prior over priors”. Does the logic about the compound lottery collapsing still hold, or does this uncertainty matter for decisionmaking?

If utility is linear or separable over time, then uncertainty doesn’t matter, but otherwise it does. Why? Call the prior over your priors “uncertainty.” Mathematically, the expected utility is a double integral: the outer integral is over possible priors with respect to your uncertainty, and the inner integral is just standard expected utility over N time periods with respect to each prior currently being summed in the outer integral. In the linear or separable utility case, I can swap the position of the integrals with the summation of utility over time, making the problem equivalent to adding up N one-period decision problems; as before, having priors over your prior when only one decision is being made cannot affect the decision you make, since you can just collapse the compound lottery.
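A small exact calculation (my own illustration, not the authors') makes the integral-swapping point concrete. Take the .6/.4 prior-over-priors from above, returns of plus or minus 1 per period, two periods, and utility defined over terminal wealth. With linear utility, the compound and collapsed lotteries give the same expected utility; with a concave (square-root) utility, which is not separable across periods, they do not:

```python
from itertools import product
from math import sqrt

W0 = 4  # illustrative initial wealth, kept positive for sqrt utility

def eu_iid(p_up, u, n=2):
    """Expected utility of terminal wealth with n iid +1/-1 returns."""
    total = 0.0
    for path in product([1, -1], repeat=n):
        prob = 1.0
        for step in path:
            prob *= p_up if step == 1 else 1 - p_up
        total += prob * u(W0 + sum(path))
    return total

def eu_compound(priors, u, n=2):
    """Outer sum over priors, inner iid expected utility per prior."""
    return sum(w * eu_iid(p, u, n) for w, p in priors)

priors = [(0.6, 0.5), (0.4, 1.0)]    # (weight, per-period up-probability)
collapsed_p = 0.6 * 0.5 + 0.4 * 1.0  # the collapsed prior: 0.7

linear = lambda w: w
concave = sqrt

# eu_compound(priors, linear) equals eu_iid(collapsed_p, linear), but
# eu_compound(priors, concave) differs from eu_iid(collapsed_p, concave):
# uncertainty matters only when utility is neither linear nor separable.
```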

If utility is not linear or separable over time, uncertainty will affect your decision. In particular, with concave utility, you will be uncertainty averse in addition to being risk averse. Al-Najjar and Weinstein use a modified Dirichlet distribution to talk about this more concretely. In particular, assuming a uniform prior-over-priors is actually equivalent to assuming very little uncertainty: the uniform prior-over-priors will respond very slowly to information learned during the first few periods. Alternatively, if you have a lot of uncertainty (a low Dirichlet parameter), your prior-over-priors, and hence your decisions, will change rapidly in the first few periods.
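The concentration-parameter point can be sketched with the two-outcome case of a Dirichlet, i.e. a Beta prior (a stand-in for the authors' modified Dirichlet; the numbers here are purely illustrative):

```python
# Posterior mean of the per-period up-probability under a symmetric
# Beta(a, a) prior after k ups in n periods: (a + k) / (2a + n).
def posterior_mean(a, k, n):
    return (a + k) / (2 * a + n)

# After three ups in three periods:
# high concentration ("little uncertainty"): barely moves from 0.5,
#   posterior_mean(50, 3, 3) is about 0.515;
# low concentration ("much uncertainty"): jumps almost all the way,
#   posterior_mean(0.1, 3, 3) is about 0.969.
```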

So what’s the use of this model? First, it allows you to talk about dynamic uncertainty without invoking any of the standard explanations for ambiguity – the problems with the ambiguity models are discussed in a well-known 2009 article by the authors of the present paper. If you’re, say, an observer of people’s behavior on the stock market, and see actions in some sectors that suggest purchase variability far exceeding the known ex-post underlying variability of the asset, you might want to infer that the prior-over-priors exhibited a lot of uncertainty during the time examined; the buyers were not necessarily irrational. In particular, during regime shifts or periods of new financial product introduction, even if the ex-post level of risk does not change, assets may move with much more variance than expected due to the underlying uncertainty. Alternatively, for new assets whose underlying parameters are likely to be subject to much Knightian uncertainty, this model gives you a perfectly Bayesian explanation for why returns on those assets are higher than seems justified given known levels of risk aversion.

December 2011 Working Paper

“Fact, Fiction and Forecast,” N. Goodman (1954)

Fact, Fiction and Forecast is one of the seminal texts of 20th century philosophy: you may know it from the famous “grue/bleen” example. The text deals principally with two problems, the meaning of counterfactuals and a “new riddle” of induction, where the first is essential for any social scientist to understand, and the second has, I think, some interesting implications for decision theory. I will discuss each in turn. My notes are from the 4th edition, including the foreword by the legendary Hilary Putnam.

The first involves counterfactual conditionals, or sentences of the type “If X were true, then Y would obtain” asserted along with the fact that X is not actually true. Counterfactual conditionals are the focus of a huge number of economics papers (“If the Fed had done X, then GDP would have done Y”, “If deworming had been expanded to 100% in this village, school attendance would have been Y”, etc.). Counterfactuals are also, I would argue, the concept which has been forefront in the minds of the world’s leading philosophers over the past 60 years.

When economists use counterfactuals, I think they are naively trying to say something like “If the world is precisely the same, except that also X is true, then Y would hold.” There are a ton of problems with this. First, if everything in the world is precisely the same, then Not X is true, and since X and Not X are both true, by the principle of explosion, everything is true, including Not Y. So we must mean that everything in the world is precisely the same, except that X holds and Not X does not. Call this counterfactual set of true statements S’. But here we have more problems: S’ may contain a logical inconsistency, in that X may deductively imply some statement Z which is logically incompatible with something else in S’. Getting around that problem presents even more difficulties; David Lewis has the most famous resolution with his possible-worlds logic, but even that is far from unproblematic.

Ignoring this basic problem of what is meant by a counterfactual, it is not well-known among social scientists that counterfactual conditionals are absolutely not strictly defined by their logical content, in the way that standard deductive logic is. That is, consider the statement If A then B, where A is a counterfactual. Let A’ be logically equivalent to A. It is easy to construct an example where you intuitively accept that A implies B, but not that A’ implies B. For instance, let A be “Bill Clinton were the same person as Julius Caesar,” A’ be “Julius Caesar were the same person as Bill Clinton” and B be “Bill Clinton would be emperor of Rome.” Given the importance of counterfactual logic to economics, there is a lot to be gained for our science from a better understanding of the philosophic issues here.

The more interesting point in Goodman for the decision theorist concerns induction. Hume showed in the 18th century why induction is invalid; the validity of induction involves assuming some sort of continuity of nature, and such an assumption is an induction itself. Even probabilistic induction – “The sun has risen every day, so I think it probable the sun will rise tomorrow” – is invalid for the same reason. There are many arguments contra Hume, but I hope you’ll take my word that they have all failed, and that the validity of induction is no longer an open question. That said, the wisdom of induction certainly is. Though we know induction is invalid reasoning, we nonetheless rely on it trivially every day (I get on a bus going north to my office, and not south, on the inductive assumption that my office is still north of my apartment) and less trivially on important policy issues (acceptance of “science” as a valid method for learning truth, rather than reading sacred books, is implicitly an acceptance of the wisdom of induction). What exactly do we mean when we say induction is wise? We mean that there exist regularities for which the past existence of the regularity is evidence that we should expect the regularity in the future.

What Goodman points out is that the interesting question is not whether induction is valid – it isn’t – but rather what we mean by a “regularity” anyway. This problem of induction is precisely analogous to a problem in counterfactuals. Consider the regularity that every object in my pocket is a coin made of metal. I have investigated this many times, and every object I check is a metal coin. Now consider the counterfactual “If I were to put a piece of chocolate in my pocket,” or an induction over objects in my pocket on a day when the only thing in my pocket is a chocolate. Surely we don’t think we should induct that the chocolate will be a metal coin when I take it from my pocket. Alternatively, consider the regularity that all metal coins conduct electricity. I have investigated this many times also, and every metal coin I check conducts. If I check another coin, I do believe it will conduct. What is the difference between the chocolate example and the coin example? It is that I trust induction when I believe a law holds for some regularity, and do not trust induction when I believe past draws are simply random. The “grue/bleen” example, if you know it, is even stronger: I interpret it to mean that whatever rationale we use to delineate coincidences from regularities depends on more than how we selected instances in the past, or on the type of the property (say, color, or conductivity) we are examining. Goodman proposes some thoughts on how we know what histories are evidence of laws and what aren’t, but the exact delineation remains controversial.

So what does this mean for decision theory? Decision theory is heavily influenced by de Finetti and Savage, and somewhat by Carnap, and less so by other massive philosophy figures in this literature like Ayer, Goodman, Putnam, and Quine. That is, we conceive of the world as having states over which agents have a prior, and evidence changing that prior according to Bayes’ rule. Let Ω be the state space, where states are a countably infinite product space of potential observations. Let a “lawlike” set of hypotheses be a set of (infinite-length) observations that are compatible with some law, where the nature of possible laws is given exogenously. For instance, a lawlike set might be “all metals conduct” and the state space simply made up of tests of conductivity of various metals in each period plus a draw from the set {0,1}. The nature of the set of possible laws in the prior is that either all metals conduct, or the conductivity properties of various metals are not linked. Imagine in periods 1 and 2 that all metals conduct and we draw a 0 each time, and that in a second possible world, in periods 1 and 2 all metals conduct except copper in period 2, and we draw a 0 each time. What can we conclude as a Savage-style Bayesian? Think about what conditions on the prior are imposed.

There is one further worry for the standard econ model. How we induct, in Goodman, depends on what predicates we have as potential sources of laws: how ought we set up the state space? If we, say, put 0 prior on the world where all emeralds are grue, and positive prior on the world where all emeralds are green – and the standard model of state space means that we must include both possibilities as states – then we are violating Carnap’s “principle of total evidence,” since we rule out grue before even seeing any evidence, and we are violating any of the standard rationales for putting positive probability on all possible states in the prior. (The Google Books preview contains the entire introduction plus the foreword by Putnam, which should give a good taste of the content. Among economists, Itzhak Gilboa seems to have done the most work on expanding Goodman-style ideas to decision theory.)

“Common Knowledge and Equilibria Switching,” N.A. Dalkiran & M. Hoffman (2011)

It is not unusual that, at 2 A.M. on any given Saturday morning, a less-than-forthright gentleman will ask his acquaintance whether “she would like to come up for some coffee.” To those just learning game theory, there is something strange here. Both parties are fully aware that no coffee will be served at such a late hour. Both are fully capable of translating the innuendo into its real meaning: there is no uncertainty here. But why, then, will nobody just ask for sex? And how is this question related to financial crises?

But perhaps these situations are not that strange. We all know from Rubinstein’s Electronic Mail game (you may know this as the story of the two coordinating generals) that mutual knowledge is not common knowledge. Imagine two generals on different ridges are planning an attack, and the attack will only succeed if both get a “good” signal; if either draws a bad signal, the attack will fail. The generals can communicate with each other by a messenger on horseback, but with probability epsilon close to zero, the messenger falls off his horse and never delivers the message. When I send a horse out, I know my signal and that’s it. When I receive the first horseman, I know the other general’s signal and my own. When he receives a message back, he knows his signal, he knows my signal, and he knows that I know his signal. And so on. After two horsemen, we both know the other got a good signal, but we do not know that the other knows we know this. So “almost” common knowledge is not almost at all, since common knowledge requires the “I know that he knows that I know” chain to continue infinitely, and that will happen with probability zero. Similar “contagion” arguments have been explored by many others (writeups on similar papers by Morris, Rob and Shin and by Weinstein and Yildiz can be found on this site).
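A stylized way to see the probability-zero point: if each messenger arrives with probability 1 - epsilon and each successful trip adds one level to the knowledge chain, then establishing k levels requires k consecutive deliveries (a simplification of the game, for illustration only):

```python
# Probability that the first k levels of the "I know that he knows
# that I know..." chain are established, if each level needs one more
# successful delivery and each delivery succeeds with prob 1 - eps.
def prob_k_levels(eps, k):
    return (1 - eps) ** k

# For any eps > 0 this goes to zero as k grows, so full common
# knowledge (every level of the chain at once) is a probability-zero
# event even though each individual message almost surely arrives.
```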

Dalkiran and Hoffman explore a similar question: when do similar tricky issues concerning higher-order knowledge lead to “switching” of equilibria? More precisely, consider a two-player, two-action game where (A,A) and (B,B) are the only pure strategy Nash equilibria: in other words, a coordination game. Let one equilibrium be a high payoff equilibrium, and the other be a low payoff equilibrium. Let there be a number of states of the world, with each agent endowed with an information partition in the standard way. Does there exist an equilibrium set of strategies where (A,A) is played with probability 1 in at least one state, and (B,B) with probability 1 in another state? That is, what conditions on priors, payoffs and the information partitions allow for equilibrium strategies where the “focal point” varies across states even when the payoff matrix is not state-dependent? And what might that tell us about “customs” or behavior like the “would you like to come up for a drink” scenario? (Trivially, of course, such an equilibrium exists if we can both identify state 1 and state 2 with probability 1; the interesting situations are those where our knowledge of the current state is imperfect and heterogeneous, though I hope you’ll agree that such a situation is the most natural one!)

The authors provide necessary and sufficient conditions for arbitrary games, but the following example of theirs works nicely. The exact conditions rely on definitions of evident events, common p-belief and other such technical terms which will be familiar to decision theorists but are a bit too tricky to explain to a general audience in this blog post; if you read this paper and want to know more about those concepts, Aumann’s two-part “Interactive Epistemology” articles and Larry Samuelson’s 2004 JEL survey are good places to start.

Imagine one agent (Aygun, in their example) is a bouncer at a whorehouse, and another agent (Moshe – the authors have a footnote explaining that they use their own names in this disreputable example so as not to defame the good name of readers with common game theory names like Ann and Bob!) is an occasional john. Aygun sometimes reads and doesn’t notice who walks into the brothel, and Moshe occasionally looks at the ground and doesn’t notice whether the bouncer sees him. It is a social convention that people should not have close friendships with anyone if it is common knowledge that they attend a brothel. There are then two coordination equilibria: (A,A) for future close friendships and (B,B) for future weak friendships, which are coordinating in the sense that unequal friendships are worth less than equal friendships for both parties. There are then five states: H, (R,G), (R’,G), (R,G’) and (R’,G’), where H is the state in which Moshe stays home, (R,G) is the state where Moshe goes to the brothel, he looks at the Ground, and Aygun Reads, (R’,G) is the state where Moshe goes to the brothel, he looks at the Ground, and Aygun does not Read, and so on. Both Moshe and Aygun have a common prior about the probabilities of looking at the ground, of staying home, and of reading.

The interesting potential equilibrium here is the one where agents play (A,A) in state H and play (B,B) in state (R’,G’), the state where eye contact is made at the brothel. In such an equilibrium, would Moshe do better to avoid eye contact, meaning that (A,A) is the equilibrium strategy in states (R,G) and (R’,G)? Using the main theorem of the paper, a simple sufficiency condition obtains, which essentially says that the interesting equilibrium exists if Aygun reads with sufficiently high probability, and if Aygun, conditional on reading, does not expect Moshe to be at the brothel with high probability. If those conditions hold, then when Moshe looks at the ground, he will reason that Aygun is likely to be reading, and since Aygun is likely to be reading, Aygun is likely to believe Moshe is at home; therefore Moshe expects that Aygun expects that Moshe will play A, hence Moshe expects Aygun will play A, hence Moshe plays A. And Aygun reasons in exactly the same manner, so (A,A) is played in all states where eye contact is not made. But remember what is going on in (R’,G), the state where Aygun is not reading and Moshe is looking at the ground. Aygun knows Moshe is going to the brothel because he sees him, and Moshe of course knows that he himself is going to the brothel. So there is mutual knowledge here, but not common knowledge. And yet moving from mutual to common knowledge would break the “good” payoffs!

Now it goes without saying that in these types of coordination games, there are always equilibria where either (A,A) is played in every state or (B,B) in every state. But to the extent that certain states are associated with certain “focal points”, the ways in which customs or focal points can or can’t change equilibria across states are totally non-trivial in situations where agents have different information partitions. For instance, the authors give an example of the focal point at a traffic light where the color of the light is obscured to the drivers with some probability. They also generate a simple model of a bank run where switching depends on how much we expect other people to be following the news. Given the importance of discontinuous jumps and expectations to the financial world, I don’t doubt that understanding how and why equilibria switch is supremely relevant to understanding how stable or fragile a given financial regime is. Who knew politely asking a girl up to your apartment after a date was so related to the stability of the international financial system! (November 2011 working paper – this paper is the job market paper of N. Aygun Dalkiran, a colleague of mine at Kellogg MEDS. If your department is looking for a good theorist, give him a call!)

“How Demanding is the Revealed Preference Approach to Demand?,” T. Beatty & I. Crawford (2011)

If you’ve read this site at all, you know that I see little value in “testing” economic theories, but if we’re going to do it, we ought at least do it in a way that makes a bit of sense. There are a ton of studies testing whether agents (here meaning not just humans; Chen and coauthors have a series of papers about revealed preference and other forms of maximizing behavior in Capuchin monkeys!) have preferences that can be described by the standard model: a concave, monotonic, continuous utility function that is time-invariant. Generally, the studies do find such maximizing behavior. But this may mean nothing: a theory that is trivially satisfied will never be shown to violate utility maximization, and indeed lots of experiments and empirical datasets see so little variation in prices that nearly any set of choices can be rationalized.

Beatty and Crawford propose a simple fix here. Consider an experiment with only two goods and two price/income bundles. Under each bundle, there is a feasible set of mixtures of the two goods. Consider the share of income under each price/income bundle spent on each of the two goods. If, say, 75% of income is spent on Good A under price/income bundle 1, then, for example, utility maximization may be consistent with spending anywhere between 0 and 89% of income on Good A under price/income bundle 2. Imagine drawing a square with “income share spent on Good A under price/income bundle 1” on the x-axis, and “income share on A under bundle 2” on the y-axis. Some sets of choices will lie in a part of that square which is incompatible with utility maximization. The greater the proportion of total area which is incompatible with utility maximization, the more restrictive a test of utility maximizing behavior will be. The idea extends in a straightforward way to tests with N goods and M choices.
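The two-good, two-budget case is easy to simulate. The sketch below (my own construction with made-up prices and incomes, not the authors' code) draws random income shares under each budget and reports the fraction of random behavior that passes the revealed preference test, which is essentially the test-difficulty calculation:

```python
import random

def bundle(share_a, prices, income):
    """Quantities of goods A and B when share_a of income goes to A."""
    return (share_a * income / prices[0], (1 - share_a) * income / prices[1])

def violates_warp(x1, x2, p1, m1, p2, m2):
    """With two observations, the revealed preference test is this WARP check."""
    dot = lambda p, x: p[0] * x[0] + p[1] * x[1]
    # x1 is weakly (strictly) revealed preferred to x2 if x2 was weakly
    # (strictly) affordable when x1 was chosen, and symmetrically.
    w12, s12 = dot(p1, x2) <= m1, dot(p1, x2) < m1
    w21, s21 = dot(p2, x1) <= m2, dot(p2, x1) < m2
    return (w12 and s21) or (w21 and s12)

def pass_rate_of_random_behavior(p1, m1, p2, m2, draws=50_000, seed=0):
    rng = random.Random(seed)
    bad = sum(
        violates_warp(bundle(rng.random(), p1, m1),
                      bundle(rng.random(), p2, m2), p1, m1, p2, m2)
        for _ in range(draws))
    return 1 - bad / draws

# Nearly identical budgets: essentially all random behavior passes.
# A big relative-price swing, e.g. prices (1,2) then (2,1) at income 10,
# rules out about 1/9 of the share-square, so roughly 89% still passes.
```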

Beatty and Crawford assume you want a measure of “how well” agents do in a test of revealed preference as a function of both the pass rate (what proportion of the sample does not reject utility maximizing behavior) and the test difficulty (how often a random number generator selecting bundles would pass); if this all sounds like redefining the concept of statistical power, it should. It turns out that r minus a, where r is the pass rate and a is the test difficulty, has some nice axiomatic properties; I’m not totally convinced this part of the paper is that important, so I’ll leave it for you to read. The authors then apply this idea to some Spanish consumption data, where households were tracked for eight quarters. They find that about 96% of households in the sample pass: they show no purchases which violate utility maximizing behavior. But the variation in prices and quarterly income is so minimal that utility maximizing behavior imposes almost no constraints: 91% of random number generators would “pass” given the same variation in prices and incomes.

What do we learn from an exercise like this? There is definitely some benefit: if you want to design experiments concerning revealed preference, the measure in the present paper is useful indeed for helping choose precisely what variation in incomes and prices to use in order to subject revealed preference to a “tough” test. But this assumes you want to test at all. “Science is underdetermined,” they shout from the rooftops! Even if people showed behavior that “rejected” utility maximization, we would surely ask, first, by how much; second, are you sure “budget” and “price” are measured correctly (there is Varian’s point about error in price measurement, and no one is using lifetime income adjusted for credit constraints when talking about “budgets”); third, are you just rejecting concavity and not maximizing behavior?; fourth, are there not preference shocks over a two year period, such as my newfound desire to buy diapers after a newborn arrives?; and so on. I think such critiques would be accepted by essentially any economist. Those of the philosophic school that I like to discuss on this site would further note that the model of utility maximization is not necessarily meant to be predictive, that we know it is “wrong” in that clearly people do not always act as if they are maximizers, and that the Max U model is nonetheless useful as an epistemic device for social science researchers. (Final working paper – final version published in AER October 2011)

“How (Not) to Do Decision Theory,” E. Dekel & B. Lipman (2009)

Economics has a very strong methodological paradigm, but economists on the whole are incapable of expressing what it is. And this can get us in trouble. Chris Sims and Tom Sargent have both been shooting around the media echo chamber the last week because they have, by and large, refused to answer questions like “What will happen to the economy?” or “What will be the impact of policy X?” Not having an answer is fine, of course: I’m sure Sims would gladly answer any question about the econometric techniques he pioneered, but not being an expert on the details of policy X, he doesn’t feel it’s his place to give (relatively) uninformed comment on such a policy. Unfortunately, parts of the media take his remarks as an excuse to take potshots at “useless” mathematical formalization and axiomatization. What, then, is the point of our models?

Dekel and Lipman answer this question with respect to the most theoretical of all economics: decision theory. Why should we care that, say, the Savage axioms imply subjective expected utility maximization? We all (aside from Savage, perhaps) agree that the axioms are not always satisfied in real life, nor should they necessarily be satisfied on normative grounds. Further, the theory, strictly speaking, makes few if any predictions that the statement “People maximize subjective expected utility” does not.

I leave most of the details of their exposition to the paper, but I found the following very compelling. It concerns Gilboa-Schmeidler preferences. These preferences give a utility function where, in the face of ambiguity about probabilities, agents always assume the worst. Dekel and Lipman:

The importance of knowing we have all the implications is particularly clear when the story of the model is potentially misleading about its predictions. For example, the multiple priors model seems to describe an extraordinarily pessimistic agent. Yet the axioms that characterize behavior in this model do not have this feature. The sufficiency theorem ensures that there is not some unrecognized pessimism requirement.

And this is the point. You might think, seeing only the utility representation, that Gilboa-Schmeidler agents are super pessimistic. This turns out not to be necessary at all – the axioms give seemingly mild conditions on choice under ambiguity which lead to such seeming pessimism. Understanding this gives us a lot of insight into what might be going on when we see Ellsberg-style pessimism in the face of ambiguity.

My problem with Dekel and Lipman here, though, is that, like almost all economists, they are implicitly infected by the most damaging economics article ever written: Milton Friedman’s 1953 “Methodology of Positive Economics.” That essay roughly says that the goal of an economic model is not to be true, but to predict within a limited sphere of things we want to predict. Such a belief suggests that we can “test” models by checking whether predictions in their given sphere are true. I think both of these concepts are contrary both to how we should use models in economics and to how we do use them; if you like appeals to authority, I should note that philosophers of social science are as dismayed as I am by Friedman ’53.

So how should we judge and use models? My standard is that a model is good if end users of the model find that it helps guide their intuition. You might also say that a model is good if it is “subjectively compelling.” Surely prediction of the future is a nice property a model might have, but it is by no means necessary, nor does “refuting” the predictions implicit in a model mean the model is worthless. What follows is a list of what I would consider subjectively useful uses of a model, accepting that how you weight these uses is entirely subjective, but keeping in mind that our theory has end users and we ought keep some guess at how the model will be used in mind when we write it:

1) Dealing with unforeseen situations. The vast majority of social situations that could be modeled by an economist will not be so modeled. That is, we don’t even claim to make predictions in essentially every situation. There are situations that are inconceivable at the time a paper is written – who knows what the world will care about in 50 years? Does this mean economics is useless in these unforeseen situations? Of course not. Theoretical models can still be useful: Sandeep Baliga has a post at Cheap Talk today where he gains intuition into Pakistan-US bargaining from the Shapiro-Stiglitz model of equilibrium unemployment. The thought experiments, the why of the model, are as relevant, if not more relevant, than the consequences or predictions of the model. Indeed, look at the introduction – often a summary of results – of your favorite theory paper. Rarely are the theorems stated alone. Instead, the theorems and the basic intuition behind the proofs are usually given. If we knew a theorem to be true given its assumptions, but the proof were in a black box, the paper would be judged much less compelling by essentially all economists, even though such a paper could “predict” equally well as a paper with proofs.

2) Justifying identification restrictions and other unfalsifiable assumptions in empirical work. Sometimes these are trivial and do not need to be formally modeled. Sometimes less so: I have an old note, which I've mentioned here a few times, that gives an example from health care. A paper found that hospital report cards which were mandated at a subset of hospitals and otherwise voluntary were totally ineffective in changing patient or hospital behavior. A simple game-theoretic model (well known in reputational games) shows that such effects are discontinuous: a sufficiently large number of patients must pay attention to the report cards before we (discontinuously) begin to see real effects. Such theoretical intuition guides the choice of empirical model in many, many cases.

3) Counterfactual analysis. By assumption, no "predictions" can or will ever be checked in counterfactual worlds, yet counterfactual analysis is the basis of a ton of policy work. Even if you care about predictions, however defined, on a counterfactual space, surely we agree that such predictions cannot be tested. Which brings us to…

4) Model selection. Even within the class of purely predictive theories, it is trivial to create theories that "overfit" the past so as to match observed data perfectly. How do I choose among the infinitely large class of models which predict all data seen thus far perfectly? "Intuition" is the only reasonable answer: the explanations in Model A are more compelling than those in Model B. And good economic models can help guide this intuition in future papers. The Quine-Duhem thesis is relevant here as well: when a model I have is "refuted" by new data, which part of the proposed explanation was wrong? Quine-Duhem essentially says there is no procedure that will answer that question. (I only write this because there are some Popperians left in economics, despite the fact that every philosopher of science after Popper has pointed out how ridiculous his model of how science should work is: it says nothing about prediction in a stochastic world, it says nothing about how to select which questions to work on, etc.)

Obviously these aren't the only non-predictive uses of theory – theory helps tie the literature together, letting economics as a science progress rather than stand as a series of independent papers; theory can serve to check qualitative intuition, since many seemingly obvious arguments turn out to be much less obvious when written down formally (more on this point in Dekel and Lipman). Nonetheless they are enough, I hope, to make the point that prediction is but one goal among many in good social science modeling. I think the Friedman idea about methodology would be long gone in economics if graduate training required the type of methodology/philosophy course, taught by faculty well read in philosophical issues, that every other social and policy science requires. Would that it were so! (2009 Working Paper; final version in the 2010 Annual Review of Economics)

“On the Evolution of Attitudes Toward Risk in Winner-Take-All Games,” E. Dekel & S. Scotchmer (1999)

How about a couple of posts about evolution of preferences? Informal evolutionary arguments are everywhere in economics. People will do X in a market because if they didn’t, they would lose money and be forced out. Firms will profit maximize because if they don’t, they will be selected away. Many of these informal arguments are probably wrong: rare is the replicator dynamic with random matching that gives a trivial outcome! But they are important. If I have one heterodox crusade, it’s to get profit maximization by firms replaced by selection arguments: if you think firms in some sense luck into optimal pricing, or quantity setting, or marketing, rather than always minimizing costs, then you will be much more hesitant to support policies like patents that lead to monopoly power. I heard second-hand that a famed micro professor used to teach that he was more worried about the “big rectangles” of efficiency loss when monopolies don’t cost minimize than the “small triangles” of deadweight loss; the irony is that when I heard the story, the worried professor was Harberger of the Harberger Triangle himself!

But back to the Dekel and Scotchmer paper. The question here is whether, in a winner-take-all world, preference for risk will come to dominate. This is an informal argument both for what people will do in general situations (men, in particular, take a lot of risks, and there are casual evolutionary-biology arguments that this is a result of winner-take-all mating in our distant past) and for which firms will survive situations like a patent race. This makes intuitive sense: if only the best of a group survive to the next generation, and we can choose the random variable that represents our skill, we should choose one with high variance. What could be wrong with that argument?

Quite a bit, it turns out. I use "men" from now on to mean whatever agent is being selected in winner-take-all contests each generation. Each man is genetically programmed to choose some lottery from a finite set. In each period, groups of size m meet. Each man realizes an outcome from his lottery, and the highest outcome "wins" and reproduces in the next period. Here's the trick. If a distribution (call it F) first order stochastically dominates another distribution, then it is "favored," meaning that the measure of players with distribution F will be higher in the next period. But risk-loving behavior has to do with second order stochastic dominance; distributions that are second order stochastically dominated are more risky. And here the ranking is much less straightforward. Consider groups of size 2. Let F pay 1 with probability 1. Let G pay 1/4 with probability 2/3 and 2.5 with probability 1/3. F SOSD G – F and G have the same mean, while G in a specific sense has more "spread" – but F is also favored by evolution over G.
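The size-2 contest in that example is easy to verify in a few lines of code; the snippet below is my own arithmetic check, not anything from the paper:

```python
from fractions import Fraction

# F pays 1 for sure; G pays 1/4 w.p. 2/3 and 5/2 w.p. 1/3 (the example above).
F = [(Fraction(1), Fraction(1))]
G = [(Fraction(1, 4), Fraction(2, 3)), (Fraction(5, 2), Fraction(1, 3))]

def mean(dist):
    return sum(x * p for x, p in dist)

def win_prob(a, b):
    """Probability that an independent draw from `a` strictly beats one from `b`."""
    return sum(pa * pb for xa, pa in a for xb, pb in b if xa > xb)

print(mean(F), mean(G))   # both 1: G is a mean-preserving spread of F
print(win_prob(F, G))     # 2/3: the "safe" type F wins the contest more often
```

Since there are no ties here, the safe type F reproduces with probability 2/3 each generation despite G's upside tail: the extra spread in G is wasted, because winning by a lot is no better than winning by a little.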

The intuition of that example is that increasing risk in a way that just expands the tails is not very useful: in a contest, winning by epsilon is just as good as winning by a million. So you might imagine that some condition on the possible tail distributions is necessary to make risk loving evolutionarily dominant. And indeed there is. This condition requires the group size to be sufficiently large, though, so if the contests are played in small groups, even restricting the possible lotteries may not be enough to make risk loving dominate over time.

What if everybody plays in a contest against everybody else? Without mutations, this game ends in one period: whichever type draws the highest number in the single period the game is played will make it to the next generation. Adding a small number of mutations in the usual way allows us to examine this scenario, though. And surprisingly, it's even harder to get risk-loving behavior to dominate than in the case where contests were played in small groups. The authors give an example where a distribution first order stochastically dominates and yet is still not successful. The exact condition needed to link SOSD to evolutionary success when contests are played among the whole population turns out to be a strengthening of the tail condition described above.

I don't know that there's a moral about evolution here, but there certainly is a good warning against believing informal evolutionary arguments! More on this point in tomorrow's post, on a new and related working paper. (Final JET version; big thumbs up to Suzanne Scotchmer for putting final, published versions of her papers online.)

“Apparent Overconfidence,” J.-P. Benoit & J. Dubra (2011)

Here's one of those papers that make you wonder: why didn't I think of that? Particularly now that it's been published in Econometrica! Benoit and Dubra noticed that many, many psychological and experimental papers make statements like "more than 50% of the subjects believed their skill at X to be better than the median" and then try to explain such irrational overconfidence. Further, a handful of papers have noted that for some difficult and rare accomplishments (like living to 100), people often underrate their chances of success. Irrational underconfidence, perhaps?

Not necessarily. While it is true that, by definition, only fifty percent of people can be better than the median, it is not true that only fifty percent of people can rationally believe themselves better than the median. Here's a quick example from the paper. In 1990, there was about 1 car crash for every 10 young people in the US. Assume car crashes follow some sort of 80-20 rule of thumb, where 80 percent of crashes are caused by 20 percent of the population. That implies "good" drivers have a 2.5% chance, and bad drivers a 40% chance, of being in an accident in any given year. Assume no one knows when they start driving whether they are good or bad, but simply updates via Bayes' rule each year depending on whether they crashed or not. Working the numbers out for three years, 79% of young drivers will hold beliefs about their own ability that first order stochastically dominate the population distribution. And they will hold these beliefs rationally!
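The arithmetic is easy to reproduce. This is my own back-of-the-envelope version of the calculation using the figures quoted above, not the paper's exact computation:

```python
# Reconstructing the driving example from the numbers in the text.
p_good = 0.8           # prior share of "good" drivers (80-20 rule)
crash_good = 0.025     # annual crash probability for good drivers
crash_bad = 0.40       # annual crash probability for bad drivers
years = 3

# Share of drivers with no crash over three years:
clean = p_good * (1 - crash_good) ** years + (1 - p_good) * (1 - crash_bad) ** years

# Posterior belief of a crash-free driver that she is "good", by Bayes' rule:
posterior_good = p_good * (1 - crash_good) ** years / clean

print(round(clean, 3))           # ~0.785: roughly 79% of drivers stay crash-free
print(round(posterior_good, 3))  # ~0.945: well above the 0.8 prior
```

Every crash-free driver's posterior probability of being good rises from 0.8 to roughly 0.945, so nearly four in five drivers rationally believe themselves better than the population at large.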

The question, then, is the following: given survey answers about how we compare to the median, or about which decile of the distribution we think we fall in, which answers are firm evidence of irrationality, and which can be explained by Bayesian updating on the population distribution? Benoit and Dubra construct these bounds, and call answers fitting the latter explanation median-rationalizable. They show that "rare success" population distributions can rationally lead to underconfidence. They then give examples from psychological studies. A Swedish study of driver confidence which asked drivers to rate the decile in which they believe their driving to fall is median-rationalizable even though only 5.7% of drivers put themselves in the bottom 30 percent of the distribution. A similar study in America found 46% of drivers putting themselves in the top 20%, which is not median-rationalizable (and thus is evidence of overconfidence), since the upper bound on the share of drivers who can rationally believe they are in the top two deciles is only 40%.

I also really like the conclusion. The authors are not claiming that the Swedish data shows no evidence of overconfidence and the American data does. Rather, they are providing “a proper framework with which to analyze” the data. In that framework, the Swedish data may not be evidence of overconfidence. Complaints that the approach in the present paper is nonsense because, for instance, individuals do not use Bayes’ rule are insufficient. If you buy that argument, then the psychological papers may just be evidence that people are bad at math, not that they are overconfident.

(One last note: as is the case with 99% of economics papers published today, this one is too long. I imagine that Samuelson would have written this in 10 pages, proofs included. Would that editors become firmer with their chopping blocks!) (Final Econometrica version – big thumbs up to Juan Dubra for putting final published versions of his papers on his personal website!)

“Why is Language Vague?,” B. Lipman (2009)

Bart Lipman, a great micro theorist back at my alma mater Boston University, has argued that the lack of interest among economists in language is puzzling. We write volumes on signaling games, on contracts and their incompleteness, on game-theoretic message spaces, yet in the world we are describing, all of these things are expressed in natural language. The present paper concerns vagueness, where a vague statement describes a set that is not well-defined, and a precise statement is the opposite. That is, "obese" meaning "BMI of 30+" is precise (intervals can be precise!), whereas "large" is not. Lipman's paper has been floating around for at least a decade and is still unpublished, despite a lot of interesting ideas. Perhaps the fantastic abstract has something to do with it: "Abstract: I don't know."

I have three points about vagueness that are worth keeping in mind before discussing Lipman’s result. First, vagueness is not a property of language itself. That is, we generally choose to use vague terms like “tall” or “red” even though the English language allows us to say “6 foot 5.2 inches” or to define a color in terms of CMYK. Second, vagueness is often determined by who is speaking to whom. Two NBA scouts will describe a potential signee as “7 foot 2” whereas you or I would just say he is tall. We would probably just say he is tall even if we knew his actual height in inches. Third, vagueness is not solely determined by the identity of the speaker and listener, but by the context. Two graphic designers working on a project may describe precisely the color they wish to use, but when out for a walk may simply say that the sunset was a lovely red that evening.

Lipman imagines the following model. I have a message space of some fixed cardinality, say 2. I observe a signal h from a distribution H (say, someone's height). My observation may be a point or a subjective distribution, so we are not assuming that the speaker knows the height precisely. The speaker then chooses a message from the message space and sends it to the listener. The listener then takes an action conditional on the message. For example, the speaker may need the listener to pick up a friend at the airport, and want to describe the friend's height. Both players have the same utility function, so there is no conflict of interest, though see Blume and Board (2009) for an interesting discussion of "strategic" vagueness.

In this story, a vague statement is just a mixed strategy conditional on the signal. For example, imagine the message space is "tall" and "short". A vague language has the speaker say "tall" when the friend is above six feet, say "short" if below five foot eight, and play a mixed strategy if the height is in between. A precise language picks some cutoff, says "tall" above the cutoff, and "short" otherwise. A one-line proof shows that a precise language always gives both players at least as much utility as a vague one.
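To see the content of that proof concretely, here is a toy numerical version, entirely my own construction: four equally likely heights, a shared quadratic loss, and a listener who best-responds to each message with the conditional mean.

```python
# Toy Lipman-style model (my own construction, not the paper's):
# heights are uniform on four values; both players get -(guess - height)^2.
heights = [64, 68, 70, 74]

def expected_loss(p_tall):
    """Expected squared error when the speaker says 'tall' w.p. p_tall[h]
    and the listener best-responds with the conditional mean height."""
    w_tall = [p_tall[h] for h in heights]
    w_short = [1 - p for p in w_tall]
    def act(w):  # conditional mean given the message's induced weights
        tot = sum(w)
        return sum(h * wi for h, wi in zip(heights, w)) / tot if tot else 0.0
    a_t, a_s = act(w_tall), act(w_short)
    n = len(heights)
    return sum(p_tall[h] * (h - a_t) ** 2 + (1 - p_tall[h]) * (h - a_s) ** 2
               for h in heights) / n

vague = expected_loss({64: 0.0, 68: 0.5, 70: 0.5, 74: 1.0})    # mix in the middle
precise = expected_loss({64: 0.0, 68: 0.0, 70: 1.0, 74: 1.0})  # cutoff at 69
print(precise, vague)  # 4.0 6.75 -- the precise cutoff does strictly better
```

The general argument runs the same way: holding the listener's response fixed, the speaker's payoff is linear in her mixing probabilities, so some pure (hence precise) strategy does at least as well, and the listener's re-optimized response only helps further.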

So what explains vagueness when interests are aligned? It's not a matter of using a more limited vocabulary: in the example above, "tall" and "short" are the only words in both the vague and the precise cases. It's also not a matter of context-dependent flexibility: in both the vague and precise cases, we still need some sense of what tall means when referring to coffee and what tall means when referring to NBA players. It's not even a matter of the impossibility of precision: the phrase "tall" can precisely refer to an interval, or precisely refer to a distribution; in any case, the first and third points about vagueness mentioned above militate against the idea that vagueness simply happens because we can't measure or speak precisely.

There are a few better stories. First, a lot of vagueness is not really vague, as the computer scientist Kees van Deemter, among others, has pointed out. When we say exercise is "good for young and old", the phrases "young" and "old" are vague in and of themselves, but as a whole the phrase precisely means "everybody". Nouns affect the meaning of adjectives. A second response is that people use vague speech because they have a vague understanding of the world; that is, people do not actually form, say, a Savage-style subjective probability distribution over the height of the person they are talking about. This is roughly Lipman's best stab at an explanation, but given that the same people often alternate between vague and precise language, I don't find it terribly convincing. A better reason, also due to van Deemter and the game theorist Rohit Parikh, is that vagueness can actually help search. Imagine asking someone to grab a blue book for you, and imagine that we have slight perceptual differences in how we see color. If blue is precisely defined, then your friend will first look through your blue books (as he perceives them), and if he does not see the book you want, will have to search through the rest of your collection at random. If blue is vaguely defined, your friend will first look through all the books he considers "bluish", and only after doing that will search the rest of your collection. When there is sufficient mismatch between our conceptions of blue, the vague search will be quicker.
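Here is a minimal sketch of that search story, again entirely my own toy model rather than van Deemter's or Parikh's: books lined up by color value, with the listener perceiving every color shifted by a fixed offset.

```python
# Toy search model: 100 books, one per color value 0..99; the speaker's
# book is at color 58, but the listener perceives colors shifted by +5.
colors = list(range(100))
target = 58
offset = 5

def books_searched(lo, hi):
    """Number of books the listener examines before finding the target,
    scanning books he perceives in [lo, hi] first, then the rest in order."""
    first = [c for c in colors if lo <= c + offset <= hi]
    rest = [c for c in colors if c not in set(first)]
    return (first + rest).index(target) + 1

precise_search = books_searched(40, 60)  # strict "blue" misses the target book
vague_search = books_searched(30, 70)    # a wider "bluish" pass catches it
print(precise_search, vague_search)      # 59 34: the vague instruction is faster
```

Under the precise instruction the listener exhausts his (shifted) blue band without success and then wades through everything else; the vague "bluish" band is wide enough to absorb the perceptual mismatch and contains the target on the first pass.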

So where to go from here? The intuition about "bluishness" is very tough to incorporate into a standard state-space model of knowledge and learning – somehow we need to incorporate fuzzy logic, perhaps. A lot of useful results could come from this, though. Applying the above idea to, for example, communication within firms, I think you can learn a lot about why certain types of communication are used at different times. I have the beginnings of a paper along these lines, but any comments about modeling vagueness are vastly appreciated. (Nov. 2009 working paper. Much of the discussion here is informed by a great followup by Kees van Deemter, a computer scientist who has written a lot on vague speech; his paper is called "Utility and Language Generation: The Case of Vagueness". Ariel Rubinstein has also written an interesting (and free to download!) book called Economics and Language; it is worth looking through, but I wouldn't say it's Rubinstein's best work. Admittedly, matching the quality of Rubinstein's best is difficult for anyone, even Rubinstein himself!)

“Akrasia, Instincts and Revealed Preferences,” A. Sandroni (2011)

Akrasia is a Greek term referring to choices that are not in the best interest of the person making them. Standard utility theory does not accommodate such choices, of course – indeed, we often go back and "read" underlying preferences from WARP-satisfying choices. A ton of work on preferences that allow choice cycles (A>B, B>C and C>A) or other violations of WARP has been published (relatively) recently: Gul and Pesendorfer on temptation, of course, but also the "menu choice" papers leading up to Rubinstein and Salant's "Choice from Lists" in TE 2006. Unlike most of those papers, Sandroni considers akratic choice only in a static, not a dynamic, context, and shows that with some assumptions it is possible to identify non-akratic choice even in the static context. Further, one can identify an agent whose preferences deviate from this akratic model using only choice data.

Now this sounds like some sort of trick: if non-akratic choice is choice made in accordance with an agent's preferences, and akratic choices are those violating those inherent preferences, how am I able to look at choice data and refute this model? The trick is the following. Let non-akratic choice follow some strict preference relation R that represents "deep thought" preferences. Let akratic choice follow some other strict preference relation S that represents "instinct". Assume that instinctual preferences satisfy WARP. Now give a list of alternatives – a subset of the set of alternatives, not of its power set – which specifies which alternatives trigger instinctive choice; this list can be arbitrary. If the list is empty, we are in the standard classical utility model, since all choice is non-akratic. The researcher sees only the final choice made; that is, I see the choice made using R if the choice is made from a set of alternatives not on the instinctive list, and otherwise I see the choice made using S. In either case, I don't know what the instinctive list contains. Choice is said to be revealed akratic if there is no single preference relation R which can explain the observed choices.

Let a super-issue B* of an issue B be a set of alternatives containing B as a subset. Sandroni shows that, within the instinct/reflection paradigm, choice C(B) is revealed akratic iff there is a super-issue B* of B such that C(B*) and C(B) together violate WARP. Further, a choice function C falls within the instinct/reflection paradigm iff, given any two pairs of nested issues (B1,B1*) and (B2,B2*), each of which evidences a violation of WARP, there is some preference relation that resolves B1, B2 and the union of B1 and B2 in a consistent way. The proofs are fairly tricky, though by decision theory standards you might consider them basic!
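The "revealed akratic iff nested WARP violation" condition is mechanical enough to state in code. A minimal sketch with hypothetical choice data; the function and the data are my own illustration, not Sandroni's notation:

```python
# Toy check of revealed akrasia: choice from issue B is revealed akratic
# iff some super-issue B* of B produces a WARP violation together with B.
def warp_violation(B, choice_B, Bstar, choice_Bstar):
    """True iff the nested issues B <= B* jointly violate WARP: the choice
    from the larger issue is available in B yet differs from B's choice."""
    assert B <= Bstar and choice_B in B and choice_Bstar in Bstar
    return choice_Bstar in B and choice_Bstar != choice_B

# Hypothetical data: from {a,b,c} the agent picks a (say, instinctively),
# but from the sub-issue {a,b} her reflective preference picks b.
print(warp_violation(frozenset('ab'), 'b', frozenset('abc'), 'a'))  # True
print(warp_violation(frozenset('ab'), 'b', frozenset('abc'), 'c'))  # False: c unavailable in B
```

No single preference relation can rank a above b (as the choice from {a,b,c} requires) and b above a (as the choice from {a,b} requires), which is exactly what makes the first pair of choices revealed akratic.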

This is an interesting result: essentially, Sandroni constructs a model of choice that allows for multiple selves where the “self” making the decision is exogenously determined by the choice being considered, rather than endogenously determined as in a temptation-avoidance model. Indeed, such a model has the nice property of being falsifiable in the same manner as standard choice theory.

I don’t see this mentioned in the paper, but my intuition is that if the agent has 3 (or more) separate preference relations depending on the choice set, the nice results in this paper will fall apart. Essentially, set inclusion is driving the representation theorem, and we need “larger” and “smaller” sets with WARP-violating anomalies. I would have liked to see a more direct connection to the choice under frames literature as well – I can’t really make the theoretical connection at first glance, though it must be fairly straightforward. (Final version in forthcoming issue of Synthese – GATED COPY, I’m afraid. I don’t see a working paper version online.)
