Category Archives: Decision Theory

“The Axiomatic Structure of Empirical Content,” C. Chambers, F. Echenique & E. Shmaya (2013)

Here's a particularly interesting article at the intersection of philosophy of science and economic theory. Economic theorists have, for much of the twentieth century, linked high theory to observable data using the technique of axiomatization. Many axiomatizations operate by proving that if an agent has such-and-such behavioral properties, their observed actions will satisfy certain other properties, and vice versa. For example, demand functions over convex budget sets satisfy the strong axiom of revealed preference if and only if they are generated by the usual restrictions on preference.

You may wonder, however: to what extent is the axiomatization interesting when you care about falsification (not that you should care, necessarily, but if you did)? Note first that we only observe partial data about the world. I can observe that you choose apples when apples and oranges are available (A>=B or B>=A, perhaps strictly if I offer you a bit of money as well) but not whether you prefer apples or bananas when those are the only two options. This shows that a theory may be falsifiable in principle (I may observe that you strictly prefer A to B, B to C and C to A, violating transitivity and hence falsifying rational preferences) yet still make nonfalsifiable statements (rational preferences also require completeness, yet with only partial data, I can't observe that you either weakly prefer apples to bananas, or weakly prefer bananas to apples).

Note something interesting here, if you know your Popper. The theory of rational preferences (complete and transitive, with strict preferences defined as the strict part of the >= relation) is universal in Popper’s sense: these axioms can be written using the “for all” quantifier only. So universality under partial observation cannot be all we mean if we wish to consider only the empirical content of a theory. And partial observability is yet harsher on Popper. Consider the classic falsifiable statement, “All swans are white.” If I can in principle only observe a subset of all of the swans in the world, then that statement is not, in fact, falsifiable, since any of the unobserved swans may actually be black.

What Chambers et al do is show that you can take any theory (a set of data generating processes which can be examined with your empirical data) and reduce it to stricter and stricter theories, in the sense that any data which would reject the original theory still reject the restricted theory. The strongest restriction has the following property: every axiom is UNCAF, meaning it can be written as a universal ("for all") statement whose body is the negation of a conjunction of atomic formulas. So "for all swans s, the swan is white" is not UNCAF (since it lacks a negation). In economics, the strict preference transitivity axiom "for all x,y,z, not (x>y and y>z and z>x)" is UNCAF, while the completeness axiom "for all x,y, x>=y or y>=x" is not, since it is an "or" statement and cannot be reduced to the negation of a conjunction. It is straightforward to extend this to checking for empirical content relative to a technical axiom like continuity.
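
In symbols, an UNCAF axiom has the schematic shape below (my rendering of the definition just described, not a quotation from the paper):

```latex
\forall x_1 \dots \forall x_n \;\; \neg\bigl(\varphi_1(x_1,\dots,x_n) \wedge \dots \wedge \varphi_k(x_1,\dots,x_n)\bigr),
\qquad \varphi_i \text{ atomic},
```

so transitivity of strict preference, $\forall x,y,z\; \neg(x \succ y \wedge y \succ z \wedge z \succ x)$, qualifies, while completeness, $\forall x,y\; (x \succsim y \vee y \succsim x)$, does not.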

Proving this result requires some technical complexity, but the result itself is very easy to use for consumers and creators of axiomatizations. Very nice. The authors also note that Samuelson, in his rejoinder to Friedman's awful '53 methodology paper, more or less got things right. Friedman claimed that the truth of axioms is not terribly important. Samuelson pointed out that either all of a theory can be falsified, in which case, since the axioms themselves are implied by the theory, Friedman's argument is in trouble, or the theory makes some non-falsifiable claims, in which case attempts to test the theory as a whole are uninformative. Either way, if you care about predictive theories, you ought to choose the weakest theory that generates some given empirical content. In Chambers et al's result, this means you had better be choosing theories whose axioms are UNCAF with respect to technical assumptions. (And of course, if you are writing a theory for explanation, or lucidity, or simplicity, or whatever non-predictive goal you have in mind, continue not to worry about any of this!)

Dec 2012 Working Paper (no IDEAS version).

Paul Samuelson’s Contributions to Welfare Economics, K. Arrow (1983)

I happened to come across a copy of a book entitled "Paul Samuelson and Modern Economic Theory" when browsing the library stacks recently. Clear evidence of his incredible breadth is in the section titles: Arrow writes about his work on social welfare, Houthakker on consumption theory, Patinkin on money, Tobin on fiscal policy, Merton on financial economics, and so on. Arrow's chapter on welfare economics was particularly interesting. This book comes from the early 80s, which is roughly the end of social welfare as a major field of study in economics. I was never totally clear on the reason for this – is it simply that Arrow's Possibility Theorem, Sen's Liberal Paradox, and the Gibbard-Satterthwaite Theorem were so devastating to any hope of "general" social choice rules?

In any case, social welfare is today little studied, but Arrow mentions a number of interesting results which really ought to be better known. Bergson-Samuelson, conceived when the two were in graduate school together, is rightfully famous. After a long interlude of confused utilitarianism, Pareto had us all convinced that we should dismiss cardinal utility and interpersonal utility comparisons. This seems to suggest that all we can say about social welfare is that we should select a Pareto-optimal state. Bergson and Samuelson were unhappy with this – we require individuals to have preferences which form an order (complete and transitive) over states, and the old utilitarians had a rule which assigned a real number to society's value of any state (hence an order). Being able to order states from a social point of view seems necessary if we are to make decisions. Some attempts to extend Pareto did not give us an order. (Why is an order important? Arrow does not discuss this, but consider earlier attempts at extending Pareto like Kaldor-Hicks efficiency: going from state s to state s' is KH-efficient if there exist ex-post transfers under which the change is Paretian. Let person a value the bundle (1,1)>(2,0)>(1,0)>all else, and person b value the bundle (1,1)>(0,2)>(0,1)>all else. In state s, person a is allocated (2,0) and person b (0,1). In state s', person a is allocated (1,0) and person b is allocated (0,2). Note that going from s to s' is a Kaldor-Hicks improvement, but going from s' to s is also a Kaldor-Hicks improvement!)
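
To make the reversal concrete, here is a small brute-force check of the example above; the bundles and rankings are exactly those in the text, and "KH improvement" is implemented, as a sketch, as "some integer reallocation of the destination state's total endowment weakly Pareto-dominates the origin state, strictly for at least one person."

```python
from itertools import product

# Rank bundles for each person: higher = better; unlisted bundles are worst (rank 0).
rank_a = {(1, 1): 3, (2, 0): 2, (1, 0): 1}
rank_b = {(1, 1): 3, (0, 2): 2, (0, 1): 1}

def u_a(x): return rank_a.get(x, 0)
def u_b(x): return rank_b.get(x, 0)

# State s: a holds (2,0), b holds (0,1).  State s': a holds (1,0), b holds (0,2).
states = {"s": ((2, 0), (0, 1)), "s_prime": ((1, 0), (0, 2))}

def kaldor_hicks_improvement(frm, to):
    """Does some reallocation of `to`'s total endowment Pareto-dominate state `frm`?"""
    (xa, xb) = states[frm]
    (ya, yb) = states[to]
    total = (ya[0] + yb[0], ya[1] + yb[1])
    for i, j in product(range(total[0] + 1), range(total[1] + 1)):
        za, zb = (i, j), (total[0] - i, total[1] - j)
        weakly_better = u_a(za) >= u_a(xa) and u_b(zb) >= u_b(xb)
        strictly_for_someone = u_a(za) > u_a(xa) or u_b(zb) > u_b(xb)
        if weakly_better and strictly_for_someone:
            return True
    return False

print(kaldor_hicks_improvement("s", "s_prime"))  # True
print(kaldor_hicks_improvement("s_prime", "s"))  # True: the reversal
```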

Bergson and Samuelson wanted to respect individual preferences – society can't prefer s to s' if s' is a Pareto improvement on s in the individual preference relations. Take the relation RU. We will say that sRUs' if all individuals weakly prefer s to s'. Note that though RU is not complete, it is transitive. Here's the great, and non-obvious, trick. The Polish mathematician Szpilrajn has a great 1930 theorem which says that if R is a transitive relation, then there exists a complete relation R2 which extends R; that is, if sRs' then sR2s', plus we complete the relation by adding some more comparisons. This is not a terribly easy proof, it turns out. Applied to RU, it tells us that there exist social welfare orders which are entirely ordinal and which respect Pareto dominance. Of course, there may be lots of them, and which you pick is a problem of philosophy more than economics, but they exist nonetheless. Note why Arrow's theorem doesn't apply: we are starting with given sets of preferences and constructing a social preference, rather than attempting to find a rule that maps any individual preferences into a social rule. There have been many papers arguing that this difference doesn't matter, so all I can say is that Arrow himself, in this very essay, accepts that difference completely. (One more sidenote here: if you wish to start with individual utility functions, we can still do everything in an ordinal way. It is not obvious that every indifference map can be mapped to a utility function, and indeed it is not even true without some type of continuity assumption, especially if we want the utility functions themselves to be continuous. A nice proof of how we can do so using a trick from probability theory is in Neuefeind's 1972 paper, which was followed up in more generality by Mount and Reiter here at MEDS and then by Chichilnisky in a series of papers. Now just sum up these mapped individual utilities, and I have a Paretian social utility function which was constructed entirely in an ordinal fashion.)
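
For intuition, here is a minimal finite-state sketch of the Szpilrajn step (the actual theorem covers arbitrary sets and needs transfinite machinery; the states and individual utilities below are hypothetical): build the Pareto relation RU and then find complete orderings that extend it.

```python
from itertools import permutations

states = ["w", "x", "y", "z"]
individual_utilities = [
    {"w": 4, "x": 3, "y": 2, "z": 1},   # person 1
    {"w": 4, "x": 2, "y": 3, "z": 1},   # person 2
]

def pareto_weakly_better(s, t):
    """s RU t: every individual weakly prefers s to t."""
    return all(u[s] >= u[t] for u in individual_utilities)

def extends_pareto(order):
    """Does a complete order (best to worst) rank s at least as high as t whenever s RU t?"""
    pos = {s: i for i, s in enumerate(order)}
    return all(pos[s] <= pos[t]
               for s in states for t in states if pareto_weakly_better(s, t))

# Brute force over all complete orders; Szpilrajn guarantees at least one exists.
social_orders = [order for order in permutations(states) if extends_pareto(order)]
print(social_orders)  # [('w','x','y','z'), ('w','y','x','z')]: both respect Pareto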

Now, this Bergson-Samuelson approach seems pretty unusable. What do we learn that we don't know from a naive Pareto property? Here are two great insights. First, choose any social welfare function from the set we have constructed above. Let individuals have non-identical utility functions. In general, there is no social welfare function which is maximized by always keeping every individual's income identical in all states of the world! The proof of this is very easy if we use Harsanyi's extension of Bergson-Samuelson: if agents are Expected Utility maximizers, then any B-S social welfare function can be written as a weighted linear combination of individual utility functions. As relative prices or the social production possibilities frontier changes, the weights are constant, but the individual marginal utilities are (generically) not. Hence if it was socially optimal to endow everybody with equal income before the relative price change, it (generically) is not afterward, no matter which Pareto-respecting measure of social welfare your society chooses to use! That is, I think, an astounding result for naive egalitarianism.
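
A sketch of the argument in symbols, using the Harsanyi linear form just mentioned (the notation is mine: $m_i$ is person $i$'s income, $p$ the price vector, $M$ total income):

```latex
W = \sum_i \lambda_i U_i(m_i, p), \qquad
\max_{\{m_i\}:\ \sum_i m_i = M} W
\;\Longrightarrow\;
\lambda_i \frac{\partial U_i}{\partial m_i}(m_i, p) = \lambda_j \frac{\partial U_j}{\partial m_j}(m_j, p) \quad \forall\, i, j.
```

Equal incomes satisfy this first-order condition at a given price vector only if the weighted marginal utilities happen to coincide at the equal split; when $p$ changes, the $\partial U_i/\partial m_i$ generically move by different amounts across people, so the equal split generically stops being optimal.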

Here’s a second one. Surely any good economist knows policies should be evaluated according to cost-benefit analysis. If, for instance, the summed willingness-to-pay for a public good exceeds the cost of the public good, then society should buy it. When, however, does a B-S social welfare function allow us to make such an inference? Generically, such an inference is only possible if the distribution of income is itself socially optimal, since willingness-to-pay depends on the individual budget constraints. Indeed, even if demand estimation or survey evidence suggests that there is very little willingness-to-pay for a public good, society may wish to purchase the good. This is true even if the underlying basis for choosing the particular social welfare function we use has nothing at all to do with equity, and further since the B-S social welfare function respects individual preferences via the Paretian criterion, the reason we build the public good also has nothing to do with paternalism. Results of this type are just absolutely fundamental to policy analysis, and are not at all made irrelevant by the impossibility results which followed Arrow’s theorem.
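
In the same notation (again a back-of-the-envelope sketch, not the chapter's formal statement), the welfare effect of a small project with individual willingness-to-pay $WTP_i$, total cost $C$, and cost shares $c_i$ summing to $C$ is roughly

```latex
\Delta W \;\approx\; \sum_i \lambda_i \frac{\partial U_i}{\partial m_i}\,\bigl( WTP_i - c_i \bigr),
```

which has the same sign as $\sum_i WTP_i - C$ only when the social marginal utilities of income $\lambda_i\,\partial U_i/\partial m_i$ are equal across people, that is, only when the income distribution is already socially optimal.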

This is a book chapter, so I'm afraid I don't have an online version. The book is here. Arrow is amazingly still publishing at the age of 91; he had an interesting article with the underrated Partha Dasgupta in the EJ a couple years back. In surveys, people claim that relative consumption a la Veblen matters to them. Yet it is hard to find such effects in the data. Why is this? Assume I wish to keep up with the Joneses when I move to a richer place. If I increase consumption today, I am decreasing savings, which decreases consumption even more tomorrow. How much I want to change consumption today when I have richer peers therefore depends on that dynamic tradeoff, which Arrow and Dasgupta completely characterize.

“The Meaning of Utility Measurement,” A. Alchian (1953)

Armen Alchian, one of the dons from UCLA's glory days, passed away today at 98. His is, for me, a difficult legacy to interpret. On the one hand, Alchian-Demsetz 1972 is among the most famous economics papers ever written, and it can fairly be considered the precursor to mechanism design, the most important new idea in economics in the past 50 years. People produce more by working together. It is difficult to know who shirks when we work as a team. A firm creates a residual claimant (an owner) who then has an incentive to monitor shirking, and since only one person needs to do the monitoring, this is much less costly than a market in which each member of the team would somehow need to monitor whether the others shirk. Firms are deluded if they think that they can order their labor inputs to do whatever they want – agency problems exist both within and outside the firm. Such an agency theory of the firm is very modern indeed. That said, surely this can't explain things like horizontally integrated firms, with different divisions producing wholly different products (or, really, any firm behavior where output is a separable function of each input in the firm).

Alchian's other super famous work is his 1950 paper on evolution and the firm. As Friedman would later argue, Alchian suggested that we are justified in treating firms as if they are profit maximizers in our analyses, since the nature of competition means that non-profit-maximizing firms will disappear in the long run. I am a Nelson/Winter fan, so of course I like the second half of the argument, but if I want to suggest that firms partially seek opportunities and partially are driven out by selection (one bit Lamarck, one bit Darwin), then why not just drop the profit maximization axiom altogether and try to write a parsimonious description of firm behavior which doesn't rely on such maximization?

It turns out that if you do the math, profit maximization is not generally equivalent to selection. Using an example from Sandroni 2000, take two firms. There are two equally likely states of nature, Good and Bad. There are two things a firm can do: the risky action, which returns profit 3 in Good states and 0 in Bad states, and a risk-free action, which always returns 1. Maximizing expected profit means always investing all capital in the risky action, and hence eventually going bankrupt, since a single Bad draw wipes the firm out. A firm that doesn't profit maximize (say, it has incorrect beliefs and thinks we are always in the Bad state, hence always takes the risk-free action) can survive. This example is far too simple to be of much worth, but it does at least remind us of a lesson from the St. Petersburg paradox: expected value maximization and survival have very little to do with each other.
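
A quick simulation of the flavor of this example (a minimal sketch with the payoffs given in the text, not Sandroni's actual model): the expected-profit maximizer reinvests everything in the risky action each period and is essentially certain to be ruined, while the wrongly pessimistic firm survives forever.

```python
import random

random.seed(1)

def run(strategy, periods=200, capital=1.0):
    """Return terminal capital; 0 means the firm went bankrupt along the way."""
    for _ in range(periods):
        good = random.random() < 0.5          # Good and Bad states equally likely
        if strategy == "expected-profit maximizer":
            capital *= 3.0 if good else 0.0   # all-in on the risky action every period
        else:
            capital *= 1.0                    # always the risk-free action
        if capital <= 0:
            return 0.0
    return capital

for strategy in ["expected-profit maximizer", "risk-free (wrongly pessimistic) firm"]:
    survivors = sum(run(strategy) > 0 for _ in range(10_000))
    print(f"{strategy}: fraction surviving 200 periods = {survivors / 10_000:.4f}")
```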

More interesting is the case with random profits, as in Radner and Dutta 2003. Firms invest their capital stock, choosing some mean-variance profits pair as a function of capital stock. The owner can, instead of reinvesting profits into the capital stock, pay out to herself or investors. If the marginal utility of a dollar of capital stock falls below a dollar, the profit-maximizing owner will not reinvest that money. But a run of (random) losses can drive the firm to bankruptcy, and does so eventually with certainty. A non-profit maximizing firm may just take the lowest variance earnings in every period, pay out to investors a fraction of the capital stock exactly equal to the minimum earnings that period, and hence live forever. But why would investors ever invest in such a firm? If investment demand is bounded, for example, and there are many non profit-maximizing firms from the start, it is not the highest rate of return but the marginal rate of return which determines the market interest rate paid to investors. A non profit-maximizer that can pay out to investors at least that much will survive, and all the profit maximizers will eventually fail.

The paper in the title of this post is much simpler: it is merely a very readable description of von Neumann expected utility, of when utility can be associated with a number and when it cannot, and of the possibility of interpersonal utility comparison. Alchian, it is said, was a very good teacher, and from this article, I believe it. What's great is the timing: 1953. That's one year before Savage's theory, the most beautiful in all of economics. Given that Alchian was associated with RAND, where Savage was fairly often, I imagine he must have known at least some of the rudiments of Savage's subjective theory, though nothing of it appears in this particular article. 1953 is also two years before Herbert Simon's behavioral theory. When describing the vN-M axioms, Alchian gives situations which might contradict each axiom, except for the first (a complete and transitive order over bundles of goods), an assumption which is consistent with all but "totally unreasonable behavior"!

1953 AER final version (No IDEAS version).

“Until the Bitter End: On Prospect Theory in a Dynamic Context,” S. Ebert & P. Strack (2012)

Let's kick off job market season with an interesting paper by Sebastian Ebert, a post-doc at Bonn, and Philipp Strack, who is on the job market from Bonn (though this doesn't appear to be his main job market paper). The paper concerns the implications of Tversky and Kahneman's prospect theory in its 1992 form. This form of utility is nothing obscure: the 1992 paper has over 5,000 citations, and the original prospect theory paper has substantially more. Roughly, cumulative prospect theory (CPT) says that agents have utility which is concave above a reference point, convex below it, with big losses and gains that occur with small probability weighted particularly heavily. Such loss aversion is thought to explain, for example, the simultaneous existence of insurance and gambling, or the difference in willingness to pay for objects you possess versus objects you don't possess.

As Machina, among others, pointed out a couple decades ago, once you leave expected utility, you are definitely going to be writing down preferences that generate strange behavior at least somewhere. This is a direct result of Savage's theorem. If you are not an EU-maximizer, then you are violating at least one of Savage's axioms, and those axioms in their totality are proven to avoid many types of behavior that we find normatively unappealing, such as falling for the sunk cost fallacy. Ebert and Strack write down a really general version of CPT, even more general than the rough definition I gave above. They then note that these preferences mean I can always construct a right-skewed gamble with negative expected payoff that the CPT agent will accept. Why? Agents like big gains that occur with small probability. Right-skew the gamble so that a big gain occurs with a tiny amount of probability, and otherwise the agent loses a tiny amount. An agent with CPT preferences will accept this gamble. Such a gamble exists at any wealth level, no matter what the reference point. Likewise, there is a left-skewed, positive expected payoff gamble that is rejected at any wealth level.
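
To see the construction numerically, here is a quick check using the original Tversky-Kahneman (1992) functional forms and parameter estimates; the particular gamble (win 500 with probability 0.001, otherwise lose 1) is my own illustrative choice, not one from Ebert and Strack's paper.

```python
def value(x, alpha=0.88, lam=2.25):
    """TK-1992 value function: x^0.88 for gains, -2.25*(-x)^0.88 for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** alpha

def weight(p, c):
    """TK-1992 probability weighting: w(p) = p^c / (p^c + (1-p)^c)^(1/c)."""
    return p ** c / (p ** c + (1 - p) ** c) ** (1 / c)

def cpt_binary(gain, p_gain, loss):
    """CPT value of: win `gain` with prob `p_gain`, lose `loss` otherwise (reference point 0)."""
    return weight(p_gain, 0.61) * value(gain) + weight(1 - p_gain, 0.69) * value(-loss)

gain, p, loss = 500.0, 0.001, 1.0
ev = p * gain - (1 - p) * loss
print(f"expected value = {ev:.3f}")                       # about -0.5: a bad bet on average
print(f"CPT value      = {cpt_binary(gain, p, loss):.3f}")  # positive: the CPT agent accepts it
```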

If you take a theory-free definition of risk aversion to mean “Risk-averse agents never accept gambles with zero expected payoff” and “Risk-loving agents always accept a risk with zero expected payoff”, then the theorem in the previous paragraph means that CPT agents are neither risk-averse, nor risk-loving, at any wealth level. This is interesting because a naive description of the loss averse utility function is that CPT agents are “risk-averse above the reference point, and risk-loving below it”. But the fact that small probability events are given more weight, in Ebert and Strack’s words, dominates whatever curvature the utility function possesses when it comes to some types of gambles.

So what does this mean, then? Let's take CPT agents into a dynamic framework, and let them be naive about their time inconsistency (since they are non-EU-maximizers, they will be time inconsistent). Bring them to a casino where a random variable moves with negative drift. Give them an endowment of money and any reference point. The CPT agent gambles at any time t as long as she has some strategy which (naively) increases her CPT utility. By the skewness result above, we know she can, at the very least, gamble a very small amount, plan to stop if she loses, and plan to keep gambling if she wins. There is always such a bet. If she does lose, then tomorrow she will bet again, since there is a gamble with positive expected utility gain no matter her wealth level. Since the process has negative drift, she will continue gambling until she goes bankrupt. This result isn't relying on any strange properties of continuous time or infinite state spaces; the authors construct an example on a 37-number roulette wheel, using the original parameterization of Kahneman and Tversky, in which the CPT agent bets all the way to bankruptcy.

What do we learn? Two things. First, a lot of what is supposedly explained by prospect theory may, in fact, be explained by the skewness preference that the heavy weighting on low-probability events in CPT generates, a fact mentioned in a number of papers the authors cite. Second, not to go all Burke on you, but when dealing with qualitative models, we have good reason to stick to the orthodoxy in many cases. The logical consequences of orthodox models will generally have been explored in great depth. The logical consequences of alternatives will not have been explored in the same way. All of our models of dynamic utility are problematic: expected utility falls to the Rabin critique, ambiguity aversion implies sunk cost fallacies, and prospect theory is vulnerable in the ways described here. But any theory which has been used for a long time will have its flaws shown more visibly than newer, alternative theories. We shouldn't mistake the lack of visible flaws for their lack more generally.

SSRN Feb. 2012 working paper (no IDEAS version).

“Das Unsicherheitsmoment in der Wertlehre,” K. Menger (1934)

Every economist surely knows the St. Petersburg Paradox described by Daniel Bernoulli in 1738 in a paper which can fairly claim to be the first piece of theoretical economics. Consider a casino offering a game of sequential coinflips that pays 2^(n-1) if the first heads arrives on the nth flip of the coin. That is, if there is a heads on the first flip, you receive 1. If there is a tails on the first flip and a heads on the second, you receive 2, and 4 if TTH, and 8 if TTTH, and so on. It is quite immediate that this game has an expected payoff of infinity. Yet, Bernoulli points out, no one would pay anywhere near infinity for such a game. Why not? Perhaps they have what we would now call logarithmic utility, in which case I value the gamble at .5*ln(1)+.25*ln(2)+.125*ln(4)+…, a finite sum.
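
Written out, the log-utility valuation is a convergent series:

```latex
\sum_{n=1}^{\infty} \frac{1}{2^{n}} \ln\!\left(2^{\,n-1}\right)
= \ln 2 \sum_{n=1}^{\infty} \frac{n-1}{2^{n}}
= \ln 2 .
```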

Now, here's the interesting bit. Karl Menger proved in 1927 that the standard response to the St. Petersburg paradox is insufficient (note that Karl with a K is the mathematically inclined son and mentor to Morgenstern, rather than the relatively qualitative father, Carl, who somewhat undeservingly joined Walras and Jevons on the Mt. Rushmore of Marginal Utility). For instance, if the casino pays out e^(2^(n-1)) rather than 2^(n-1), then even an agent with logarithmic utility has infinite expected utility from such a gamble. This, nearly 200 years after Bernoulli's original paper! Indeed, such a construction is possible for any unbounded utility function: let the casino pay out U^-1(2^(n-1)) when the first heads arrives on the nth flip, where U^-1 is the inverse utility function.
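
Concretely, with the super-Petersburg payout the same series diverges, and the inverse-utility recipe does the job for any unbounded U (my rendering of the construction just described):

```latex
\sum_{n=1}^{\infty} \frac{1}{2^{n}} \ln\!\left(e^{\,2^{\,n-1}}\right)
= \sum_{n=1}^{\infty} \frac{2^{\,n-1}}{2^{n}}
= \sum_{n=1}^{\infty} \frac{1}{2}
= \infty,
\qquad
\sum_{n=1}^{\infty} \frac{1}{2^{n}}\, U\!\left(U^{-1}\!\left(2^{\,n-1}\right)\right) = \infty .
```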

Things are worse, Menger points out. One can construct a thought experiment where, for any finite amount C and an arbitrarily small probability p, there is a bounded utility function where an agent will prefer the gamble to win some finite amount D with probability p to getting a sure thing of C [Sentence edited as suggested in the comments.] So bounding the utility function does not kill off all paradoxes of this type.

The 1927 lecture and its response are discussed at length in Rob Leonard's "Von Neumann, Morgenstern, and the Creation of Game Theory." Apparently, Oskar Morgenstern was at the Vienna Kreis where Menger first presented this result, and was quite taken with it, a fact surely interesting given Morgenstern's later development of expected utility theory. Indeed, one of Machina's stated aims in his famous paper on EU without the Independence Axiom is providing a way around Menger's result while salvaging EU analysis. If you are unfamiliar with Machina's paper, one of the most cited in decision theory in the past 30 years, it may be worthwhile to read the New School HET description of the "fanning out" hypothesis which relates Machina to vN-M expected utility.

http://www.springerlink.com/content/m7q803520757q700/fulltext.pdf (Unfortunately, the paper above is both gated and in German, as the original publication was in the formerly-famous journal Zeitschrift fur Nationalokonomie. The first English translation is in Shubik's festschrift for Morgenstern published in 1967, but I don't see any online availability.)

“A Bayesian Model of Risk and Uncertainty,” N. al-Najjar & J. Weinstein (2012)

In a Bayesian world with expected utility maximizers, you have a prior belief about the chance that certain events will occur, and you maximize utility subject to those beliefs. But what if you are "uncertain" about what your prior even is? Perhaps you think that with 60 percent probability, peace negotiations will commence and there will be a .5 chance of war and a .5 chance of peace, but that with 40 percent probability, war is guaranteed to occur. It turns out these types of compound lotteries don't affect your decision if you're just making a single choice: simply reduce the compound lottery and use that as your prior. In this case, you think war will occur with .6*.5+.4*1=.7 probability. That is, the Bayesian world is great for discussing risk – decisionmaking with concave utility and known distributions – but not that useful for talking about one-shot Knightian uncertainty, or decisionmaking when the distributions are not well-known.

al-Najjar and Weinstein show, however, that this logic does not hold when you make multiple decisions that depend on a parameter that is common (or at least correlated) across those decisions. Imagine that a risk-averse agent buys a stock whose daily return is generated by some IID process governed by an unknown parameter, and imagine that the agent doesn't have a single prior about that parameter, but rather a prior over the set of possible priors. For instance, as above, with probability .6 you have a .5 chance of a 1 percent increase and a .5 chance of a 1 percent decrease, but with probability .4, a 1 percent increase is assured. Every period, I can update my "prior over priors". Does the logic about the compound lottery collapsing still hold, or does this uncertainty matter for decisionmaking?

If utility is linear or separable over time, then uncertainty doesn't matter, but otherwise it does. Why? Call the prior over your priors "uncertainty." Mathematically, the expected utility is a double integral: the outer integral is over possible priors with respect to your uncertainty, and the inner integral is just standard expected utility over N time periods with respect to each prior being integrated over in the outer integral. In the linear or separable utility case, I can swap the position of the integrals with the summation of utility over time, making the problem equivalent to adding up N one-period decision problems; as before, having a prior over your priors when only one decision is being made cannot affect the decision you make, since you can just collapse the compound lottery.
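
In symbols, a sketch of the separable case (my notation: μ is the prior over priors P_θ, c_t is period-t consumption, and P̄ is the reduced one-shot prior):

```latex
\int_{\Theta}\!\left[\int \sum_{t=1}^{N} u(c_t)\, dP_\theta\right] d\mu(\theta)
\;=\; \sum_{t=1}^{N} \int u(c_t)\, d\bar{P},
\qquad
\bar{P} \equiv \int_{\Theta} P_\theta \, d\mu(\theta),
```

so each period's problem depends only on the collapsed prior P̄, exactly as in the one-shot case; when utility is not additively separable, the sum cannot be pulled outside the integrals and the prior over priors starts to matter.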

If utility is not linear or separable over time, uncertainty will affect your decision. In particular, with concave utility, you will be uncertainty averse in addition to being risk averse. Al-Najjar and Weinstein use a modified Dirichlet distribution to talk about this more concretely. In particular, assuming a uniform prior-over-priors is actually equivalent to assuming very little uncertainty: the uniform prior-over-priors will respond very slowly to information learned during the first few periods. Alternatively, if you have a lot of uncertainty (a low Dirichlet parameter), your prior-over-priors, and hence your decisions, will change rapidly in the first few periods.
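
Here is a toy Beta-Bernoulli analogue of that point (the Beta is the two-outcome special case of a Dirichlet; the specific parameter values are mine, purely for illustration): a "uniform" Beta(1,1) prior over the unknown success probability moves slowly after a few observations, while a low-concentration Beta(0.1,0.1) prior, which encodes much more uncertainty, jumps around quickly.

```python
def posterior_mean(alpha, successes, trials):
    """Posterior mean of a Beta(alpha, alpha) prior after `successes` out of `trials`."""
    return (alpha + successes) / (2 * alpha + trials)

for alpha in (1.0, 0.1):
    # Observe 0, 1, 2, 3 straight successes and watch how fast beliefs move.
    means = [posterior_mean(alpha, s, s) for s in range(4)]
    print(f"Beta({alpha},{alpha}): posterior means after 0-3 successes:",
          [round(m, 3) for m in means])
# Beta(1,1):     0.5, 0.667, 0.75, 0.8   -- slow movement
# Beta(0.1,0.1): 0.5, 0.917, 0.955, 0.969 -- rapid movement in early periods
```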

So what's the use of this model? First, it allows you to talk about dynamic uncertainty without invoking any of the standard explanations for ambiguity – the problems with the ambiguity models are discussed in a well-known 2009 article by the authors of the present paper. If you're, say, an observer of people's behavior on the stock market, and see actions in some sectors that suggest purchase variability far exceeding the known ex-post underlying variability of the asset, you might want to infer that the prior-over-priors exhibited a lot of uncertainty during the time examined; the buyers were not necessarily irrational. In particular, during regime shifts or periods when new financial products are introduced, even if the ex-post level of risk does not change, assets may move with much more variance than expected due to the underlying uncertainty. Alternatively, for new assets whose underlying parameters are likely to be subject to much Knightian uncertainty, this model gives you a perfectly Bayesian explanation for why returns on those assets are higher than seems justified given known levels of risk aversion.

December 2011 Working Paper

“Fact, Fiction and Forecast,” N. Goodman (1954)

Fact, Fiction and Forecast is one of the seminal texts of 20th century philosophy: you may know it from the famous "grue/bleen" example. The text deals principally with two problems: the meaning of counterfactuals and a "new riddle" of induction. The first is essential for any social scientist to understand, and the second has, I think, some interesting implications for decision theory. I will discuss each in turn. My notes are from the 4th edition, including the foreword by the legendary Hilary Putnam.

The first involves counterfactual conditionals, or sentences of the type "If X were true, then Y would obtain" along with the fact that X is not actually true. Counterfactual conditionals are the focus of a huge number of economics papers ("If the Fed had done X, then GDP would have done Y", "If deworming had been expanded to 100% in this village, school attendance would have been Y", etc.). Counterfactuals are also, I would argue, the concept which has been foremost in the minds of the world's leading philosophers over the past 60 years.

When economists use counterfactuals, I think they are naively trying to say something like "If the world were precisely the same, except that also X were true, then Y would hold." There are a ton of problems with this. First, if everything in the world is precisely the same, then Not X is true, and since X and Not X are both true, by the principle of explosion, everything is true, including Not Y. So we must mean that everything in the world is precisely the same, except that X holds and Not X does not. Call the counterfactual set of true statements S'. But here we have more problems: S' may contain a logical inconsistency, in that X may deductively imply some statement Z which is logically incompatible with something else carried over into S' from the actual world. Getting around that problem presents even more difficulties; David Lewis has the most famous resolution with his possible worlds logic, but even that is far from unproblematic.

Ignoring this basic problem of what is meant by a counterfactual, it is not well-known among social scientists that counterfactual conditionals are absolutely not strictly defined by their logical content, in the way that standard deductive logic is. That is, consider the statement If A then B, where A is a counterfactual. Let A’ be logically equivalent to A. It is easy to construct an example where you intuitively accept that A implies B, but not that A’ implies B. For instance, let A be “Bill Clinton were the same person as Julius Caesar,” A’ be “Julius Caesar were the same person as Bill Clinton” and B be “Bill Clinton would be emperor of Rome.” Given the importance of counterfactual logic to economics, there is a lot to be gained for our science from a better understanding of the philosophic issues here.

The more interesting point in Goodman for the decision theorist concerns induction. Hume showed in the 18th century why induction is invalid; the validity of induction involves assuming some sort of continuity of nature, and such an assumption is itself an induction. Even probabilistic induction – "The sun has risen every day, so I think it probable the sun will rise tomorrow" – is invalid for the same reason. There are many arguments contra Hume, but I hope you'll take my word that they have all failed, and that the validity of induction is no longer an open question. That said, the wisdom of induction certainly is. Though we know induction is invalid reasoning, we nonetheless rely on it trivially every day (I get on a bus going north to my office, and not south, on the inductive assumption that my office is still north of my apartment) and less trivially on important policy issues (acceptance of "science" as a valid method for learning truth, rather than reading sacred books, is implicitly an acceptance of the wisdom of induction). What exactly do we mean when we say induction is wise? We mean that there exist regularities for which the past existence of the regularity is evidence that we should expect the regularity in the future.

What Goodman points out is that the interesting question is not whether induction is valid – it isn't – but rather what we mean by a "regularity" anyway. This problem of induction is precisely parallel to a problem about counterfactuals. Consider the regularity that every object in my pocket is a coin made of metal. I have investigated this many times, and every object I check is a metal coin. Now consider the counterfactual "If I were to put a piece of chocolate in my pocket," or the induction when the only thing in my pocket today is a chocolate. Surely we don't think we should induct that the chocolate will be a metal coin when I take it from my pocket. Alternatively, consider the regularity that all metal coins conduct electricity. I have investigated this many times also, and every metal coin I check conducts. If I check another coin, I do believe it will conduct. What is the difference between the chocolate example and the coin example? It is that I trust induction when I believe a law holds for some regularity, and do not trust induction when I believe past draws are simply random. The "grue/bleen" example, if you know it, is even stronger: I interpret it to mean that whatever rationale we use to delineate coincidences from regularities depends on more than how we selected instances in the past, or on the type of the property (say, color, or conductivity) we are examining. Goodman proposes some thoughts on how we know what histories are evidence of laws and what aren't, but the exact delineation remains controversial.

So what does this mean for decision theory? Decision theory is heavily influenced by de Finetti and Savage, and somewhat by Carnap, and less so by other massive philosophy figures in this literature like Ayer, Goodman, Putnam, and Quine. That is, we conceive of the world as having states over which agents have a prior, and evidence changing that prior according to Bayes' rule. Let Ω be the state space, where states are a countably infinite product space of potential observations. Let a "lawlike" set of hypotheses be a set of (infinite-length) observations that are compatible with some law, where the nature of possible laws is given exogenously. For instance, a lawlike set might be "all metals conduct" and the state space simply made up of tests of conductivity of various metals in each period plus a draw from the set {0,1}. The nature of the set of possible laws in the prior is that either all metals conduct, or the conductivity properties of various metals are not linked. Imagine in periods 1 and 2 that all metals conduct and we draw a 0 each time, and that in a second possible world, in periods 1 and 2 all metals conduct except copper in period 2, and we draw a 0 each time. What can we conclude as a Savage-style Bayesian? Think about what conditions on the prior are imposed.

There is one further worry for the standard econ model. How we induct in Goodman depends on what predicates we have as potential sources of laws: how ought we set up the state space? If we, say, put 0 prior on the world where all emeralds are grue, and positive prior on the world where all emeralds are green – and the standard model of state space means that we must include both possibilities as states – then we are violating Carnap's "principle of total evidence," since we rule out grue before even seeing any evidence, and we are violating any of the standard rationales for putting positive probability on all possible states in the prior.

http://books.google.com/books?hl=en&lr=&id=i97_LdPXwrAC (The Google Books preview contains the entire introduction plus the foreword by Putnam, which should give a good taste of the content. Among economists, Itzhak Gilboa seems to have done the most work on expanding Goodman-style ideas to decision theory.)

“Common Knowledge and Equilibria Switching,” N.A. Dalkiran & M. Hoffman (2011)

It is not unusual that, at 2 A.M. on any given Saturday morning, a less-than-forthright gentleman will ask his acquaintance whether "she would like to come up for some coffee." To those just learning game theory, there is something strange here. Both parties are fully aware that no coffee will be served at such a late hour. We are both fully capable of translating the innuendo into its real meaning: there is no uncertainty here. But why, then, will nobody just ask for sex? And how is this question related to financial crises?

But perhaps these situations are not that strange. We all know from Rubinstein’s Electronic Mail game (you may know this as the story of the two coordinating generals) that mutual knowledge is not common knowledge. Imagine two generals on different ridges are planning an attack, and the attack will only succeed if both parties get a “good” signal; if either of us draws a bad signal, we know the attack will fail. The generals can communicate with each other by a messenger on horseback, but with probability epsilon close to zero, the messenger falls off his horse and never delivers the message. When I send a horse out, I know my signal and that’s it. When I receive the first horsemen, I know the other general’s signal and my own. When he receives a message back, he knows his signal, he knows my signal, and he knows that I know his signal. And so on. After two horsemen, we both know the other got a good signal, but we do not know that the other person knows we know this. So “almost” common knowledge is not almost at all, since common knowledge requires the “I know that he knows that I know” chain to continue infinitely, and that will happen with probability zero. Similar “contagion” arguments have been explored by many others (writeups on similar papers by Morris, Rob and Shin and Weinstein and Yildiz can be found on this site).
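
One way to see the punchline in a single line (a simplification of Rubinstein's actual setup): if each message independently arrives with probability 1 − ε, then

```latex
\Pr(\text{at least } k \text{ messages arrive}) = (1-\epsilon)^{k} \;\longrightarrow\; 0
\quad \text{as } k \to \infty ,
```

and since common knowledge requires the entire infinite chain of confirmations, it is an event of probability zero for any ε > 0, no matter how small.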

Dalkiran and Hoffman explore a related question: when do these tricky issues concerning higher-order knowledge lead to "switching" of equilibria? More precisely, consider a two-player, two-action game, where (A,A) and (B,B) are the only pure strategy Nash equilibria: in other words, a coordination game. Let one equilibrium be a high payoff equilibrium, and the other be a low payoff equilibrium. Let there be a number of states of the world, with each agent endowed with an information partition in the standard way. Does there exist an equilibrium set of strategies where (A,A) is played with probability 1 in at least one state, and (B,B) with probability 1 in another state? That is, what conditions on priors, payoffs and the information partitions allow for equilibrium strategies where the "focal point" varies across states even when the payoff matrix is not state-dependent? And what might that tell us about "customs" or behavior like the "would you like to come up for a drink" scenario? (Trivially, of course, such an equilibrium exists if we can both identify state 1 and state 2 with probability 1; the interesting situations are those where our knowledge of the current state is imperfect and heterogeneous, though I hope you'll agree that such a situation is the most natural one!)

The authors provide necessary and sufficient conditions for arbitrary games, but the following example they give works nicely; the exact conditions rely on definitions of evident events, common p-belief, and other such technical terms which will be familiar to decision theorists but are a bit too tricky to explain to a general audience in this blog post. If you read this paper and want to know more about those concepts, Aumann's two-part "Interactive Epistemology" articles and Larry Samuelson's 2004 JEL survey are good places to start.

Imagine one agent (Aygun, in their example) is a bouncer at a whorehouse, and another agent (Moshe – the authors have a footnote explaining that they use their own names in this disreputable example so as not to defame the good name of readers with common game theory names like Ann and Bob!) is an occasional john. Aygun sometimes reads and doesn't notice who walks in the brothel, and Moshe occasionally looks at the ground and doesn't notice whether the bouncer sees him. It is a social convention that people should not have close friendships with anyone if it is common knowledge that they attend a brothel. There are then two coordinating equilibria: (A,A) for future close friendships and (B,B) for future weak friendships, which are coordinating in the sense that unequal friendships are worth less than equal friendships for both parties. There are then five states: H, (R,G), (R',G), (R,G'), and (R',G'), where H is the state in which Moshe stays home, (R,G) is the state where Moshe goes to the brothel, he looks at the Ground, and Aygun Reads, (R',G) is the state where Moshe goes to the brothel, he looks at the Ground, and Aygun does not Read, and so on. Both Moshe and Aygun have a common prior about the probability of looking at the ground, of staying home, and of reading.

The interesting potential equilibrium here is the one where agents play (A,A) in state H and play (B,B) in state (R',G'), the state where eye contact is made at the brothel. In such an equilibrium, would Moshe do better to avoid eye contact, meaning that (A,A) is the equilibrium strategy in states (R,G) and (R',G)? Using the main theorem of the paper, a simple sufficiency condition obtains, which essentially says that the interesting equilibrium exists if Aygun reads with sufficiently high probability, and Aygun does not expect Moshe to be at the brothel with sufficiently high probability given that he is reading. If those conditions hold, then when Moshe looks at the ground, he will reason that Aygun is likely to be reading, and since Aygun is likely to be reading, Aygun is likely to believe Moshe is at home; therefore Moshe expects that Aygun expects that Moshe will play A, hence Moshe expects Aygun will play A, hence Moshe plays A. And Aygun reasons in exactly the same manner, so (A,A) is played in all states where eye contact is not made. But remember what is going on in (R',G), the state where Aygun is not reading and Moshe is looking at the ground. Aygun knows Moshe is going to the brothel because he sees him, and Moshe of course knows that he himself is going to the brothel. So there is mutual knowledge here, but not common knowledge. And yet moving from mutual to common knowledge would break the "good" payoffs!

Now it goes without saying that in these types of coordination games, there are always equilibria where either (A,A) is played in every state or (B,B) in every state. But to the extent that certain states are associated with certain “focal points”, the ways in which customs or focal points can or can’t change equilibria across states are totally non-trivial in situations where agents have different information partitions. For instance, the authors give an example of the focal point at a traffic light where the color of the light is obscured to the drivers with some probability. They also generate a simple model of a bank run where switching depends on how much we expect other people to be following the news. Given the importance of discontinuous jumps and expectations to the financial world, I don’t doubt that understanding how and why equilibria switch is supremely relevant to understanding how stable or fragile a given financial regime is. Who knew politely asking a girl up to your apartment after a date was so related to the stability of the international financial system!

http://www.kellogg.northwestern.edu/faculty/dalkiran/dalkiran-jmp.pdf (November 2011 working paper – this paper is the job market paper of N. Aygun Dalkiran, a colleague of mine at Kellogg MEDS. If your department is looking for a good theorist, give him a call!)

“How Demanding is the Revealed Preference Approach to Demand?,” T. Beatty & I. Crawford (2011)

If you've read this site at all, you know that I see little value in "testing" economic theories, but if we're going to do it, we ought at least do it in a way that makes a bit of sense. There are a ton of studies testing whether agents (here meaning not just humans; Chen and coauthors have a series of papers about revealed preference and other forms of maximizing behavior in Capuchin monkeys!) have preferences that can be described by the standard model: a concave, monotonic, continuous utility function that is time-invariant. Generally, the studies do find such maximizing behavior. But this may mean nothing: a test that is trivially passed can never reject utility maximization, and indeed lots of experiments and empirical datasets contain so little variation in prices that nearly any set of choices can be rationalized.

Beatty and Crawford propose a simple fix here. Consider an experiment with only two goods and two price/income bundles, where under each bundle the consumer chooses some feasible mixture of the two goods. Consider the share of income spent on each of the two goods under each price/income bundle. If, say, 75% of income is spent on Good A under price/income bundle 1, then, for example, utility maximization may be consistent with spending anywhere between 0 and 89% of income on Good A under price/income bundle 2. Imagine drawing a square with "income share spent on Good A under price/income bundle 1" on the x-axis, and "income share spent on Good A under bundle 2" on the y-axis. Some sets of choices will lie in a part of that square which is incompatible with utility maximization. The greater the proportion of total area which is incompatible with utility maximization, the more restrictive a test of utility maximizing behavior will be. The idea extends in a straightforward way to tests with N goods and M choices.
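
Here is a minimal Monte Carlo sketch of that idea for the two-good, two-budget case; the prices, incomes, and the simplified revealed-preference check are my own illustrative choices, not the paper's data or exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical observed budgets: prices for goods A and B, and income.
p = np.array([[1.0, 2.0],   # prices in observation 1
              [2.0, 1.0]])  # prices in observation 2
m = np.array([10.0, 10.0])  # incomes

def violates_revealed_preference(share_a):
    """share_a[t] = share of income spent on good A in observation t (budget exhausted)."""
    x = np.empty((2, 2))
    for t in range(2):
        x[t, 0] = share_a[t] * m[t] / p[t, 0]        # quantity of good A
        x[t, 1] = (1 - share_a[t]) * m[t] / p[t, 1]  # quantity of good B
    # If each chosen bundle was affordable at the other budget (and they differ), each is
    # revealed preferred to the other: inconsistent with maximizing one utility function.
    # (Knife-edge indifference cases are measure zero and ignored in this sketch.)
    return (p[0] @ x[1] <= m[0]) and (p[1] @ x[0] <= m[1]) and not np.allclose(x[0], x[1])

# Fraction of the unit square of budget-share pairs a utility maximizer could never
# generate: the larger it is, the more demanding the revealed preference test.
draws = rng.uniform(size=(20_000, 2))
frac = np.mean([violates_revealed_preference(s) for s in draws])
print(f"share of random behavior the test would reject: {frac:.3f}")  # about 1/9 here
```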

Beatty and Crawford assume you want a measure of "how well" agents do in a test of revealed preference as a function of both the pass rate (what proportion of the sample does not reject utility maximizing behavior) and the test difficulty (how often randomly generated choices would pass); if this all sounds like redefining the concept of statistical power, it should. It turns out that r minus a, where r is the pass rate and a is the probability that random behavior would pass, has some nice axiomatic properties; I'm not totally convinced this part of the paper is that important, so I'll leave it for you to read. The authors then apply this idea to some Spanish consumption data, where households were tracked for eight quarters. They find that about 96% of households in the sample pass: they show no purchases which violate utility maximizing behavior. But the variation in prices and quarterly income is so minimal that utility maximizing behavior imposes almost no constraints: 91% of randomly generated choice sequences would "pass" given the same variation in prices and incomes.

What do we learn from an exercise like this? There is definitely some benefit: if you want to design experiments concerning revealed preference, the measure in the present paper is useful indeed for helping choose precisely what variation in incomes and prices to use in order to subject revealed preference to a "tough" test. But this assumes you want to test at all. "Science is underdetermined," they shout from the rooftops! Even if people showed behavior that "rejected" utility maximization, we would surely ask, first, by how much; second, are you sure "budget" and "price" are measured correctly (there is Varian's point about errors in price measurement, and no one is using lifetime income adjusted for credit constraints when talking about "budgets"); third, are you just rejecting concavity and not maximizing behavior?; fourth, are there not preference shocks over a two-year period, such as my newfound desire to buy diapers after a newborn arrives?; and so on. I think such critiques would be accepted by essentially any economist. Those of the philosophic school that I like to discuss on this site would further note that the model of utility maximization is not necessarily meant to be predictive, that we know it is "wrong" in that clearly people do not always act as if they are maximizers, and that the Max U model is nonetheless useful as an epistemic device for social science researchers.

http://www.tc.umn.edu/~tbeatty/working_papers/revisedpowerpaper.pdf (Final working paper – final version published in AER October 2011)

“How (Not) to Do Decision Theory,” E. Dekel & B. Lipman (2009)

Economics has a very strong methodological paradigm, but economists on the whole are incapable of expressing what it is. And this can get us in trouble. Chris Sims and Tom Sargent have both been shooting around the media echo chamber the last week because they have, by and large, refused to answer questions like “What will happen to the economy?” or “What will be the impact of policy X?” Not having an answer is fine, of course: I’m sure Sims would gladly answer any question about the econometric techniques he pioneered, but not being an expert on the details of policy X, he doesn’t feel it’s his place to give (relatively) uninformed comment on such a policy. Unfortunately, parts of the media take his remarks as an excuse to take potshots at “useless” mathematical formalization and axiomatization. What, then, is the point of our models?

Dekel and Lipman answer this question with respect to the most theoretical of all economics: decision theory. Why should we care that, say, the Savage axioms imply subjective expected utility maximization? We all (aside from Savage, perhaps) agree that the axioms are not always satisfied in real life, nor should they necessarily be satisfied on normative grounds. Further, the theory, strictly speaking, makes few if any predictions that the statement "People maximize subjective expected utility" does not.

I leave most of the details of their exposition to the paper, but I found the following very compelling. It concerns Gilboa-Schmeidler preferences. These preferences give a utility function where, in the face of ambiguity about probabilities, agents always assume the worst. Dekel and Lipman:

The importance of knowing we have all the implications is particularly clear when the story of the model is potentially misleading about its predictions. For example, the multiple priors model seems to describe an extraordinarily pessimistic agent. Yet the axioms that characterize behavior in this model do not have this feature. The sufficiency theorem ensures that there is not some unrecognized pessimism requirement.

And this is the point. You might think, seeing only the utility representation, that Gilboa-Schmeidler agents are super pessimistic. This turns out not to be necessary at all – the axioms give seemingly mild conditions on choice under ambiguity which lead to such seeming pessimism. Understanding this gives us a lot of insight into what might be going on when we see Ellsberg-style pessimism in the face of ambiguity.
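
For reference, the multiple-priors (maxmin expected utility) representation over a set C of priors is

```latex
V(f) \;=\; \min_{p \in C} \int u\bigl(f(\omega)\bigr)\, dp(\omega),
```

which certainly reads like pure worst-case pessimism on its face; the point above is that the axioms delivering this formula contain no explicit pessimism requirement, and the sufficiency direction of the representation theorem guarantees none is hiding.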

My problem with Dekel and Lipman here, though, is that, like almost all economists, they are implicitly infected by the most damaging economics article ever written: Milton Friedman's 1953 Methodology of Positive Economics. That essay roughly says that the goal of an economic model is not to be true, but to predict within a limited sphere of things we want to predict. Such a belief suggests that we can "test" models by checking whether predictions in their given sphere are true. I think both of these ideas are contrary to how we should use models in economics and to how we actually use them; if you like appeals to authority, I should note that philosophers of social science are as dismayed by Friedman '53 as I am.

So how should we judge and use models? My standard is that a model is good if end users of the model find that it helps guide their intuition. You might also say that a model is good if it is "subjectively compelling." Surely prediction of the future is a nice property a model might have, but it is by no means necessary, nor does "refuting" the predictions implicit in a model mean the model is worthless. What follows is a list of what I would consider subjectively useful uses of a model, accepting that how you weight these uses is entirely subjective, but keeping in mind that our theory has end users and we ought to keep some guess about how the model will be used in mind when we write it:

1) Dealing with unforeseen situations. The vast majority of social situations that could be modeled by an economist will not be so modeled. That is, we don't even claim to make predictions in essentially every situation. There are situations that are inconceivable at the time a paper is written – who knows what the world will care about in 50 years. Does this mean economics is useless in these unforeseen situations? Of course not. Theoretical models can still be useful: Sandeep Baliga has a post at Cheap Talk today where he gains intuition into Pakistan-US bargaining from a Shapiro-Stiglitz model of equilibrium unemployment. The thought experiments, the why of the model, are at least as relevant as the consequences and predictions of the model. Indeed, look at the introduction – often a summary of results – of your favorite theory paper. Rarely are the theorems stated alone. Instead, the theorems and the basic intuition behind the proofs are usually given. If we knew a theorem to be true given its assumptions, but the proof were in a black box, the paper would be judged much less compelling by essentially all economists, even though such a paper could "predict" as well as a paper with proofs.

2) Justifying identification restrictions and other unfalsifiable assumptions in empirical work. Sometimes these are trivial and do not need to be formally modeled. Sometimes less so: I have an old note, which I've mentioned here a few times, that gives an example from health care. A paper found that hospital report cards that were mandated at a subset of hospitals and otherwise voluntary were totally ineffective in changing patient or hospital behavior. A simple game theoretic model (well known in reputational games) shows that such effects are discontinuous: I need a sufficiently large number of patients to pay attention to the report cards before I (discontinuously) begin to see real effects. Such theoretical intuition guides the choice of empirical model in many, many cases.

3) Counterfactual analysis. By assumption, no "predictions" can or will ever be checked in counterfactual worlds. Counterfactual analysis is nonetheless the basis of a ton of policy work. Even if you care about predictions, somehow defined, on a counterfactual space, surely we agree that such predictions cannot be tested. Which brings us to…

4) Model selection. Even within the class of purely predictive theories, it is trivial to create theories which "overfit" the past such that they match past data perfectly. How do I choose among the infinitely large class of models which predict all data thus far seen perfectly? "Intuition" is the only reasonable answer: the explanations in Model A are more compelling than in Model B. And good economic models can help guide this intuition in future papers. The Quine-Duhem thesis is relevant here as well: when a model is "refuted" by new data, which part of the proposed explanation was wrong? Quine-Duhem essentially says there is no procedure that will answer that question. (I only write this because there are some Popperians left in economics, despite the fact that every philosopher of science after Popper has pointed out how inadequate his model of how science should work is: it says nothing about prediction in a stochastic world, it says nothing about how to select what questions to work on, etc.)

Obviously these aren't the only non-predictive uses of theory – theory helps tie the literature together, letting economics progress as a science rather than stand as a series of independent papers, and theory can serve to check qualitative intuition, since many seemingly obvious arguments turn out to be much less obvious when written down formally (more on this point in Dekel and Lipman). Nonetheless they are enough, I hope, to make the point that prediction is but one goal among many in good social science modeling. I think the Friedman idea about methodology would be long gone in economics if graduate training required the type of methodology/philosophy course, taught by faculty well read in philosophical issues, that every other social and policy science requires. Would that it were so!

http://people.bu.edu/blipman/Papers/dekel-lipman2.pdf (2009 Working Paper; final version in the 2010 Annual Review of Economics)
