
Some Results Related to Arrow’s Theorem

Arrow’s (Im)possibility Theorem is, and I think this is universally acknowledged, one of the great social science theorems of all time. I particularly love it because of its value when arguing with Popperians and other anti-theory types: the theorem is “untestable” in that it quite literally does not make any predictions, yet surely all would consider it a valuable scientific insight.

In this post, I want to talk about a couple of new papers using Arrow’s result in unusual ways. First, a philosopher has shown exactly how Arrow’s result is related to the general philosophical problem of choosing which scientific theory to accept. Second, a pair of computer scientists have used AI techniques to generate an interesting new method for proving Arrow.

The philosophic problem is the following. A good theory should satisfy a number of criteria; for Kuhn, these included accuracy, consistency, breadth, simplicity and fruitfulness. Imagine now there is a group of theories (about, e.g., how galaxies form, why birds have wings, etc.) and we ordinally rank them on each of these criteria. Also imagine that we all agree on the rankings. Which theory ought we accept? Arrow applied to theory choice gives us the worrying result that not only is there no unique method of choosing among theories but also that there may not exist any such method at all, at least if we want to satisfy unanimity, non-dictatorship and independence of irrelevant alternatives. That is, even if you and I agree about how each theory ranks according to different desirability criteria, we still don’t have a good, general method of aggregating among criteria.

So what to do? Davide Rizza, in a new paper in Synthese (gated, I’m afraid), discusses a number of solutions. Of course, if we have more than just ordinal information about each criterion, then we can construct aggregated orders. For instance, if we assigned a number for the relative rankings on each criterion, we could just add these up for each theory and hence have an order. Note that this theory choice rule can be used even if we just have ordinal data – if there are N theories, then on criterion C, give the best theory on that criterion N points, the second best N-1, and so on, then add up the scores. This is the famous Borda Count.
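The scoring rule just described can be sketched in a few lines of Python. The three theories (T1, T2, T3) and their rankings on Kuhn's five criteria are made up purely for illustration; nothing here comes from Rizza's paper.

```python
from collections import defaultdict

def borda_scores(rankings):
    """Sum Borda points over criteria.

    rankings: list of lists, each ordering the same theories from best to worst.
    With N theories, the best gets N points, the second best N-1, and so on.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, theory in enumerate(ranking):
            scores[theory] += n - position
    return dict(scores)

# Hypothetical rankings, invented for this example.
criteria_rankings = {
    "accuracy":     ["T1", "T2", "T3"],
    "consistency":  ["T2", "T1", "T3"],
    "breadth":      ["T1", "T3", "T2"],
    "simplicity":   ["T3", "T2", "T1"],
    "fruitfulness": ["T1", "T2", "T3"],
}

scores = borda_scores(list(criteria_rankings.values()))
aggregate_order = sorted(scores, key=scores.get, reverse=True)
print(scores)           # T1: 12, T2: 10, T3: 8
print(aggregate_order)  # ['T1', 'T2', 'T3']
```

Only the positions in each ranking are used, so the rule needs nothing beyond ordinal data.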

Why can’t we choose theories by the Borda Count or similar, then? Well, Borda (and any other rule that could construct an aggregate order while satisfying unanimity and non-dictatorship) must be violating the IIA assumption in Arrow. Unanimity, which insists a rule accept a theory if it is considered best along every criterion, and non-dictatorship, where more than one criterion can at least matter in principle, seem totally unobjectionable. So maybe we ought just toss IIA from our theory choice rule, as perhaps Donald Saari would wish us to do. And IIA is a bit strange indeed. If I rank A>B>C, and if you require me to have transitive preferences, then just knowing the binary rankings A>B and B>C is enough to tell you that I prefer A>C even if you didn’t know that particular binary relationship. In this case, adding B isn’t “irrelevant”; there is information in the binary pairs generated by transitivity which IIA does not allow me to take advantage of. Some people call the IIA assumption “binary independence” since it aggregates using only binary relations, an odd thing given that the individual orders contain, by virtue of being orders, more than just binary relations. It turns out that there are aggregation rules which generate an order if we loosen IIA to an alternative restriction on how to use information in sequences. IIA, rather than ordinal rankings across criteria, is where Arrow poses a problem for theory choice. Now, Rizza points out that these aggregation rules needn’t be unique, so we still can have situations where we all agree about how different theories rank according to each criterion, and agree on the axiomatic properties we want in an aggregation rule, yet nonetheless disagree about which theory to accept. Still worrying, though not for Kuhn, and certainly not for us crazier Feyerabend and Latour fans!
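To see concretely how Borda violates IIA, here is a small sketch with two hypothetical five-criterion profiles over options A, B and C. In both profiles every criterion ranks A against B the same way; only the position of C moves. The profiles are invented for illustration.

```python
def borda_scores(rankings):
    """Borda points: with N options, best gets N, next N-1, and so on."""
    n = len(rankings[0])
    scores = {}
    for ranking in rankings:
        for position, option in enumerate(ranking):
            scores[option] = scores.get(option, 0) + (n - position)
    return scores

def pairwise_pattern(rankings, x, y):
    """For each criterion, record whether x is ranked above y."""
    return [r.index(x) < r.index(y) for r in rankings]

# Hypothetical profiles: three criteria say A>B>C in both;
# the other two say B>C>A in the first profile and B>A>C in the second.
profile_1 = [list("ABC")] * 3 + [list("BCA")] * 2
profile_2 = [list("ABC")] * 3 + [list("BAC")] * 2

same_ab_information = (
    pairwise_pattern(profile_1, "A", "B") == pairwise_pattern(profile_2, "A", "B")
)
s1, s2 = borda_scores(profile_1), borda_scores(profile_2)
a_beats_b_1 = s1["A"] > s1["B"]   # A=11, B=12
a_beats_b_2 = s2["A"] > s2["B"]   # A=13, B=12

print(same_ab_information, a_beats_b_1, a_beats_b_2)  # True False True
```

The A-versus-B information is identical across the two profiles, yet the aggregate ranking of A and B flips, which is exactly what IIA forbids.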

(A quick aside: How strange it is that Arrow’s Theorem is so heavily associated with voting? That every voting rule is subject to tactical behavior is Gibbard-Satterthwaite, not Arrow, and this result about strategic voting imposes nothing like an IIA assumption. Arrow’s result is about the far more general problem of aggregating orders, a problem which fundamentally has nothing to do with individual behavior. Indeed, I seem to recall that Arrow came up with his theorem while working one summer as a grad student at RAND on the problem of what, if anything, it could mean for a country to have preferences when voting on behalf of its citizens in bodies like the UN. The story also goes that when he showed his advisor – perhaps Hotelling? – what he had been working on over the summer, he was basically told the result was so good that he might as well just graduate right away!)

The second paper today comes from two computer scientists. There are lots of proofs of Arrow’s theorem – the original proof in Arrow’s 1951 book is actually incorrect! – but the CS guys use a technique I hadn’t seen before. Essentially, they first prove with a simple induction that you can find a case with N>=2 voters and M>=3 options that satisfies the Arrow axioms if and only if you can find such a case with 2 voters and 3 options. This doesn’t actually narrow the problem a great deal: there are still 3!=6 ways to order 3 options, hence 6^2=36 possible joint votes of the 2 voters, hence 6^36 functions mapping the voter orders to a social order. Nonetheless, the problem is small enough to be tackled by a Constraint Satisfaction algorithm which checks IIA and unanimity and finds only two social welfare functions not violating one of those constraints, which are just the cases where Agents 1 and 2 are dictators. Their algorithm took one second to run on a standard computer (clearly they are better algorithm writers than the average economist!). Sen’s theorem and Muller-Satterthwaite can also be proven using a similar restriction to the base case followed by algorithmic search.

Of course, algorithmic proofs tend to lack the insight and elegance of standard proofs. But they have benefits as well. Just as you can show that only 2 social welfare functions with N=2 voters and M=3 options satisfy IIA and unanimity, you can also show that only 94 (out of 6^36!) satisfy IIA. That is, it is IIA rather than other assumptions which is doing most of the work in Arrow. Inspecting those 94 remaining social welfare functions by hand can help elucidate alternative sets of axioms which also generate aggregation possibility or impossibility.
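For the curious, the 2-voter, 3-option base case is small enough to check with a brute-force sketch. This is not the authors' constraint satisfaction algorithm. It assumes strict rankings everywhere and exploits the fact that, under IIA, the social ranking of each pair depends only on how the two voters rank that pair, so a candidate rule is just a choice of one of 16 Boolean "pair rules" for each of the 3 pairs (16^3 = 4096 candidates rather than 6^36). All names below are my own.

```python
from itertools import permutations, product

ORDERS = list(permutations("abc"))            # the 3! = 6 strict rankings, best first
PROFILES = list(product(ORDERS, repeat=2))    # the 6^2 = 36 two-voter profiles
PAIRS = [("a", "b"), ("b", "c"), ("a", "c")]
INPUTS = list(product([True, False], repeat=2))

def prefers(order, x, y):
    return order.index(x) < order.index(y)

# A pair rule maps (voter 1 ranks x over y, voter 2 ranks x over y)
# to whether society ranks x over y.
PAIR_RULES = [dict(zip(INPUTS, outs)) for outs in product([True, False], repeat=4)]

def social_relation(swf, profile):
    return {
        (x, y): swf[(x, y)][(prefers(profile[0], x, y), prefers(profile[1], x, y))]
        for (x, y) in PAIRS
    }

def is_order(rel):
    """A strict relation on three options is an order unless it cycles."""
    ab, bc, ac = rel[("a", "b")], rel[("b", "c")], rel[("a", "c")]
    cycle_1 = ab and bc and not ac            # a > b > c > a
    cycle_2 = (not ab) and (not bc) and ac    # a > c > b > a
    return not (cycle_1 or cycle_2)

def unanimous(swf):
    return all(rule[(True, True)] and not rule[(False, False)] for rule in swf.values())

def dictator_of(swf):
    for voter in (0, 1):
        if all(rule[inp] == inp[voter] for rule in swf.values() for inp in INPUTS):
            return voter + 1
    return None

candidates = [dict(zip(PAIRS, rules)) for rules in product(PAIR_RULES, repeat=3)]
iia_swfs = [s for s in candidates if all(is_order(social_relation(s, p)) for p in PROFILES)]
arrow_swfs = [s for s in iia_swfs if unanimous(s)]

print(len(candidates), len(iia_swfs), len(arrow_swfs))
print(sorted(dictator_of(s) for s in arrow_swfs))
```

Under this encoding the search should recover the 94 IIA-satisfying functions and the 2 dictatorships quoted above.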

(And a third paper, just for fun: it turns out that Kiribati and Nauru actually use Borda counts in their elections, and that there does appear to be strategic candidate nomination behavior designed to take advantage of the non-IIA nature of Borda! IIA looks in many ways like a restriction on tactical behavior by candidates or those nominating issues, rather than a restriction on tactical behavior by voters. If you happen to teach Borda counts, this is a great case to give students.)

“Epistemic Game Theory,” E. Dekel & M. Siniscalchi (2014)

Here is a handbook chapter that is long overdue. The theory of epistemic games concerns a fairly novel justification for solution concepts under strategic uncertainty – that is, situations where what I want to do depends on what other people do, and vice versa. We generally analyze these as games, and have a bunch of equilibrium (Nash, subgame perfection, etc.) and nonequilibrium (Nash bargain, rationalizability, etc.) solution concepts. So which should you use? I can think of four classes of justification for a game solution. First, the solution might be stable: if you told each player what to do, no one person (or sometimes group) would want to deviate. Maskin mentions this justification is particularly worthy when it comes to mechanism design. Second, the solution might be the outcome of a dynamic selection process, such as evolution or a particular learning rule. Third, the solution may be justified by certain axiomatic first principles; Shapley value is a good example in this class. The fourth class, however, is the one we most often teach students: a solution concept is good because it is justified by individual behavior assumptions. Nash, for example, is often thought to be justified by “rationality plus correct beliefs”. Backward induction is similarly justified by “common knowledge of rationality at all states.”

Those are informal arguments, however. The epistemic games (or sometimes, “interactive epistemology”) program seeks to formally analyze assumptions about the knowledge and rationality of players and what they imply for behavior. There remain many results we don’t know (for instance, I asked around and could only come up with one paper on the epistemics of coalitional games), but the results proven so far are actually fascinating. Let me give you three: rationality and common belief in rationality imply rationalizable strategies are played, the requirements for Nash are different depending on how many players there are, and backward induction is surprisingly difficult to justify on epistemic grounds.

First, rationalizability. Take a game and remove any strictly dominated strategy for each player. Now in the reduced game, remove anything that is strictly dominated. Continue doing this until nothing is left to remove. The remaining strategies for each player are “rationalizable”. If players can hold any belief they want about what potential “types” opponents may be – where a given (Harsanyi) type specifies what an opponent will do – then as long as we are all rational, we all believe the opponents are rational, we all believe the opponents all believe that we all are rational, ad infinitum, the only possible outcomes to the game are the rationalizable ones. Proving this is actually quite complex: if we take as primitive the “hierarchy of beliefs” of each player (what do I believe my opponents will do, what do I believe they believe I will do, and so on), then we need to show that any hierarchy of beliefs can be written down in a type structure, then we need to be careful about how we define “rational” and “common belief” on a type structure, but all of this can be done. Note that many rationalizable strategies are not Nash equilibria.
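The elimination procedure can be sketched as follows. One caveat: this sketch only checks dominance by pure strategies, whereas the full definition of rationalizability also removes strategies strictly dominated by mixed strategies, so treat it as a simplification. The 2x3 game is a made-up example.

```python
def strictly_dominated(payoff, own, others, candidate):
    """True if some other pure strategy does strictly better against every opponent strategy."""
    return any(
        all(payoff(alt, o) > payoff(candidate, o) for o in others)
        for alt in own
        if alt != candidate
    )

def iterated_elimination(u1, u2, rows, cols):
    """u1, u2: dicts mapping (row, col) to each player's payoff."""
    rows, cols = list(rows), list(cols)
    changed = True
    while changed:
        changed = False
        for r in list(rows):
            if strictly_dominated(lambda a, b: u1[(a, b)], rows, cols, r):
                rows.remove(r)
                changed = True
        for c in list(cols):
            if strictly_dominated(lambda a, b: u2[(b, a)], cols, rows, c):
                cols.remove(c)
                changed = True
    return rows, cols

# Hypothetical game: row player picks U or D, column player picks L, M or R.
payoffs = {
    ("U", "L"): (1, 0), ("U", "M"): (1, 2), ("U", "R"): (0, 1),
    ("D", "L"): (0, 3), ("D", "M"): (0, 1), ("D", "R"): (2, 0),
}
u1 = {k: v[0] for k, v in payoffs.items()}
u2 = {k: v[1] for k, v in payoffs.items()}

surviving = iterated_elimination(u1, u2, ["U", "D"], ["L", "M", "R"])
print(surviving)  # (['U'], ['M'])
```

Here R is dominated by M, after which D is dominated by U, after which L is dominated by M, so a single profile survives.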

So what further assumptions do we need to justify Nash? Recall the naive explanation: “rationality plus correct beliefs”. Nash takes us from rationalizability, where play is based on conjectures about opponents’ play, to an equilibrium, where play is based on correct conjectures. But which beliefs need to be correct? With two players and no uncertainty, the result is actually fairly straightforward: if our first order beliefs are (f,g), we mutually believe our first order beliefs are (f,g), and we mutually believe we are rational, then beliefs (f,g) represent a Nash equilibrium. You should notice three things here. First, we only need mutual belief (I know X, and you know I know X), not common belief, in rationality and in our first order beliefs. Second, the result is that our first-order beliefs are that a Nash equilibrium strategy will be played by all players; the result is about beliefs, not actual play. Third, with more than two players, we are clearly going to need assumptions about how my beliefs about our mutual opponent are related to your beliefs; that is, Nash will require more, epistemically, than “basic strategic reasoning”. Knowing these conditions can be quite useful. For instance, Terri Kneeland at UCL has investigated experimentally the extent to which each of the required epistemic conditions is satisfied, which helps us to understand situations in which Nash is harder to justify.

Finally, how about backward induction? Consider a centipede game. The backward induction rationale is that if we reached the final stage, the final player would defect, hence if we are in the second-to-last stage I should see that coming and defect before her, hence if we are in the third-to-last stage she will see that coming and defect before me, and so on. Imagine that, however, player 1 does not defect in the first stage. What am I to infer? Was this a mistake or am I perhaps facing an irrational opponent? Backward induction requires that I never make such an inference, and hence I defect in stage 2.
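The unraveling argument is mechanical enough to write down. A minimal sketch, assuming a four-stage centipede with made-up payoffs in which players alternate moves and a mover defects when indifferent (both assumptions are mine):

```python
def backward_induction(take_payoffs, final_payoffs):
    """Solve a centipede game from the last stage back to the first.

    take_payoffs[t]: (payoff to player 1, payoff to player 2) if the mover defects at stage t.
    final_payoffs:   payoffs if nobody ever defects.
    Players alternate, with player 1 moving at stage 0.
    """
    continuation = final_payoffs
    plan = []
    for stage in reversed(range(len(take_payoffs))):
        mover = stage % 2
        # Defect whenever doing so is at least as good as what follows (tie-break: defect).
        if take_payoffs[stage][mover] >= continuation[mover]:
            plan.append("defect")
            continuation = take_payoffs[stage]
        else:
            plan.append("continue")
    plan.reverse()
    return plan, continuation

# Hypothetical payoffs: the pot grows, but the mover always gains by stopping now.
take_payoffs = [(1, 0), (0, 2), (3, 1), (2, 4)]
final_payoffs = (5, 3)

plan, outcome = backward_induction(take_payoffs, final_payoffs)
print(plan)     # ['defect', 'defect', 'defect', 'defect']
print(outcome)  # (1, 0)
```

Note that the computation assigns "defect" to every stage, including stages that are never reached if player 1 defects immediately, which is precisely where the epistemic trouble starts.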

Here is a better justification for defection in the centipede game, though. If player 1 doesn’t defect in the first stage, then I “try my best” to retain a belief in his rationality. That is, if it is possible for him to have some belief about my actions in the second stage which rationally justified his first stage action, then I must believe that he holds those beliefs. For example, he may believe that I believe he will continue again in the third stage, hence that I will continue in the second stage, hence he will continue in the first stage then plan to defect in the third stage. Given his beliefs about me, his actions in the first stage were rational. But if that plan to defect in stage three were his justification, then I should defect in stage two. He realizes I will make these inferences, hence he will defect in stage 1. That is, the backward induction outcome is justified by forward induction. Now, it can be proven that rationality and common “strong belief in rationality” as loosely explained above, along with a suitably rich type structure for all players, generates a backward induction outcome. But the epistemic justification is completely based on the equivalence between forward and backward induction under those assumptions, not on any epistemic justification for backward induction reasoning per se. I think that’s a fantastic result.

Final version, prepared for the new Handbook of Game Theory. I don’t see a version on RePEc IDEAS.

“The Axiomatic Structure of Empirical Content,” C. Chambers, F. Echenique & E. Shmaya (2013)

Here’s a particularly interesting article at the intersection of philosophy of science and economic theory. Economic theorists have, for much of the twentieth century, linked high theory to observable data using the technique of axiomatization. Many axiomatizations operate by proving that if an agent has such-and-such behavioral properties, their observed actions will encompass certain other properties, and vice versa. For example, demand functions over convex budget sets satisfy the strong axiom of revealed preference if and only if they are generated by the usual restrictions on preference.

You may wonder, however: to what extent is the axiomatization interesting when you care about falsification (not that you should care, necessarily, but if you did)? Note first that we only observe partial data about the world. I can observe that you choose apples when apples and oranges are available (A>=B or B>=A, perhaps strictly if I offer you a bit of money as well) but not whether you prefer apples or bananas when those are the only two options. This shows that a theory may be falsifiable in principle (I may observe that you prefer strictly A to B, B to C and C to A, violating transitivity, falsifying rational preferences) yet still make nonfalsifiable statements (rational preferences also require completeness, yet with only partial data, I can’t observe that you either weakly prefer apples to bananas, or weakly prefer bananas to apples).
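The asymmetry between the two axioms can be made concrete with a short sketch. It takes a hypothetical list of observed strict choices, looks for a cycle (which would falsify transitivity), and lists the pairs never observed (about which completeness says something the data cannot contradict). The data and function names are invented for illustration.

```python
from itertools import combinations

def find_strict_cycle(observed_strict):
    """Return a cycle among observed strict preferences (better, worse), or None."""
    graph = {}
    for better, worse in observed_strict:
        graph.setdefault(better, []).append(worse)
        graph.setdefault(worse, [])
    finished = set()

    def visit(node, path):
        if node in path:                      # came back to a node on the current path
            return path[path.index(node):] + [node]
        if node in finished:                  # already explored, no cycle through it
            return None
        finished.add(node)
        for nxt in graph[node]:
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in graph:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

def unobserved_pairs(items, observed):
    """Pairs never compared in the data: completeness cannot be rejected on these."""
    seen = {frozenset(pair) for pair in observed}
    return [p for p in combinations(sorted(items), 2) if frozenset(p) not in seen]

cyclic_data = [("apple", "orange"), ("orange", "banana"), ("banana", "apple")]
partial_data = [("apple", "orange")]

print(find_strict_cycle(cyclic_data))    # ['apple', 'orange', 'banana', 'apple']
print(find_strict_cycle(partial_data))   # None
print(unobserved_pairs(["apple", "orange", "banana"], partial_data))
# [('apple', 'banana'), ('banana', 'orange')]
```

No dataset of this form can ever reject completeness, while a single cycle suffices to reject transitivity.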

Note something interesting here, if you know your Popper. The theory of rational preferences (complete and transitive, with strict preferences defined as the strict part of the >= relation) is universal in Popper’s sense: these axioms can be written using the “for all” quantifier only. So universality under partial observation cannot be all we mean if we wish to consider only the empirical content of a theory. And partial observability is yet harsher on Popper. Consider the classic falsifiable statement, “All swans are white.” If I can in principle only observe a subset of all of the swans in the world, then that statement is not, in fact, falsifiable, since any of the unobserved swans may actually be black.

What Chambers et al do is show that you can take any theory (a set of data generating processes which can be examined with your empirical data) and reduce it to stricter and stricter theories, in the sense that any data which would reject the original theory still reject the restricted theory. The strongest restriction has the following property: every axiom is UNCAF, meaning it can be written using only “for all” operators which negate a conjunction of atomic formulas. So “for all swans s, the swan is white” is not UNCAF (since it lacks a negation). In economics, the strict preference transitivity axiom “for all x,y,z, not x>y and y>z and z>x” is UNCAF and the completeness axiom “for all x,y, x>=y or y>=x” is not, since it is an “or” statement and cannot be reduced to the negation of a conjunction. It is straightforward to extend this to checking for empirical content relative to a technical axiom like continuity.

Proving this result requires some technical complexity, but the result itself is very easy to use for consumers and creators of axiomatizations. Very nice. The authors also note that Samuelson, in his rejoinder to Friedman’s awful ’53 methodology paper, more or less got things right. Friedman claimed that the truth of axioms is not terribly important. Samuelson pointed out that either all of a theory can be falsified, in which case, since the axioms themselves are always implied by a theory, Friedman’s arguments are in trouble, or the theory makes some non-falsifiable claims, in which case attempts to test the theory as a whole are uninformative. Either way, if you care about predictive theories, you ought to choose the weakest theory that generates some given empirical content. In Chambers et al’s result, this means you better be choosing theories whose axioms are UNCAF with respect to technical assumptions. (And of course, if you are writing a theory for explanation, or lucidity, or simplicity, or whatever non-predictive goal you have in mind, continue not to worry about any of this!)

Dec 2012 Working Paper (no IDEAS version).

Paul Samuelson’s Contributions to Welfare Economics, K. Arrow (1983)

I happened to come across a copy of a book entitled “Paul Samuelson and Modern Economic Theory” when browsing the library stacks recently. Clear evidence of his incredible breadth is in the section titles: Arrow writes about his work on social welfare, Houthakker on consumption theory, Patinkin on money, Tobin on fiscal policy, Merton on financial economics, and so on. Arrow’s chapter on welfare economics was particularly interesting. This book comes from the early 80s, which is roughly the end of social welfare as a major field of study in economics. I was never totally clear on the reason for this – is it simply that Arrow’s Possibility Theorem, Sen’s Liberal Paradox, and the Gibbard-Satterthwaite Theorem were so devastating to any hope of “general” social choice rules?

In any case, social welfare is today little studied, but Arrow mentions a number of interesting results which really ought be better known. Bergson-Samuelson, conceived when the two were in graduate school together, is rightfully famous. After a long interlude of confused utilitarianism, Pareto had us all convinced that we should dismiss cardinal utility and interpersonal utility comparisons. This seems to suggest that all we can say about social welfare is that we should select a Pareto-optimal state. Bergson and Samuelson were unhappy with this – we suggest individuals should have preferences which represent an order (complete and transitive) over states, and the old utilitarians had a rule which imposed a real number for society’s value of any state (hence an order). Being able to order states from a social point of view seems necessary if we are to make decisions. Some attempts to extend Pareto did not give us an order. (Why is an order important? Arrow does not discuss this, but consider earlier attempts at extending Pareto like Kaldor-Hicks efficiency: going from state s to state s’ is KH-efficient if there exist ex-post transfers under which the change is Paretian. Let person a value the bundle (1,1)>(2,0)>(1,0)>all else, and person b value the bundle (1,1)>(0,2)>(0,1)>all else. In state s, person a is allocated (2,0) and person b (0,1). In state s’, person a is allocated (1,0) and person b is allocated (0,2). Note that going from s to s’ is a Kaldor-Hicks improvement, but going from s’ to s is also a Kaldor-Hicks improvement!)
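The Kaldor-Hicks reversal in that parenthetical can be checked by brute force. A small sketch, assuming goods are indivisible (so transfers move whole units), that every bundle not listed is tied at the bottom of each person's ranking, and that "Paretian" means nobody worse off and someone strictly better off; those readings are mine.

```python
from itertools import product

# Ordinal ranks from the example: higher is better, unlisted bundles rank 0.
RANK_A = {(1, 1): 3, (2, 0): 2, (1, 0): 1}
RANK_B = {(1, 1): 3, (0, 2): 2, (0, 1): 1}

def rank(table, bundle):
    return table.get(bundle, 0)

def kaldor_hicks_improvement(old, new):
    """old, new: (bundle of a, bundle of b).

    True if the goods available in `new` can be reallocated so that nobody is
    worse off than in `old` and at least one person is strictly better off.
    """
    total = (new[0][0] + new[1][0], new[0][1] + new[1][1])
    old_a, old_b = rank(RANK_A, old[0]), rank(RANK_B, old[1])
    for x, y in product(range(total[0] + 1), range(total[1] + 1)):
        ra = rank(RANK_A, (x, y))
        rb = rank(RANK_B, (total[0] - x, total[1] - y))
        if ra >= old_a and rb >= old_b and (ra > old_a or rb > old_b):
            return True
    return False

s = ((2, 0), (0, 1))
s_prime = ((1, 0), (0, 2))

print(kaldor_hicks_improvement(s, s_prime))  # True
print(kaldor_hicks_improvement(s_prime, s))  # True
```

Both directions pass the test, so the criterion cannot be generating an order over states.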

Bergson and Samuelson wanted to respect individual preferences – society can’t prefer s to s’ if s’ is a Pareto improvement on s in the individual preference relations. Take the relation RU. We will say that sRUs’ if all individuals weakly prefer s to s’. Note that though RU is not complete, it is transitive. Here’s the great, and non-obvious, trick. The Polish mathematician Szpilrajn has a great 1930 theorem which says that if R is a transitive relation, then there exists a complete (and still transitive) relation R2 which extends R; that is, if sRs’ then sR2s’, plus we complete the relation by adding some more elements. This is not a terribly easy proof, it turns out. That is, there exist social welfare orders which are entirely ordinal and which respect Pareto dominance. Of course, there may be lots of them, and which you pick is a problem of philosophy more than economics, but they exist nonetheless. Note why Arrow’s theorem doesn’t apply: we are starting with given sets of preferences and constructing a social preference, rather than attempting to find a rule that maps any individual preferences into a social rule. There have been many papers arguing that this difference doesn’t matter, so all I can say is that Arrow himself, in this very essay, accepts that difference completely. (One more sidenote here: if you wish to start with individual utility functions, we can still do everything in an ordinal way. It is not obvious that every indifference map can be mapped to a utility function, and not even true without some type of continuity assumption, especially if we want the utility functions to themselves be continuous. A nice proof of how we can do so using a trick from probability theory is in Neuefeind’s 1972 paper, which was followed up in more generality by Mount and Reiter here at MEDS then by Chichilnisky in a series of papers. Now just sum up these mapped individual utilities, and I have a Paretian social utility function which was constructed entirely in an ordinal fashion.)

Now, this Bergson-Samuelson seems pretty unusable. What do we learn that we don’t know from a naive Pareto property? Here are two great insights. First, choose any social welfare function from the set we have constructed above. Let individuals have non-identical utility functions. In general, there is no social welfare function which is maximized by always keeping every individual’s income identical in all states of the world! The proof of this is very easy if we use Harsanyi’s extension of Bergson-Samuelson: if agents are Expected Utility maximizers, then any B-S social welfare function can be written as the weighted linear combination of individual utility functions. As relative prices or the social production possibilities frontier change, the weights are constant, but the individual marginal utilities are (generically) not. Hence if it was socially optimal to endow everybody with equal income before the relative price change, it (generically) is not later, no matter which Pareto-respecting measure of social welfare your society chooses to use! That is, I think, an astounding result for naive egalitarianism.

Here’s a second one. Surely any good economist knows policies should be evaluated according to cost-benefit analysis. If, for instance, the summed willingness-to-pay for a public good exceeds the cost of the public good, then society should buy it. When, however, does a B-S social welfare function allow us to make such an inference? Generically, such an inference is only possible if the distribution of income is itself socially optimal, since willingness-to-pay depends on the individual budget constraints. Indeed, even if demand estimation or survey evidence suggests that there is very little willingness-to-pay for a public good, society may wish to purchase the good. This is true even if the underlying basis for choosing the particular social welfare function we use has nothing at all to do with equity, and further since the B-S social welfare function respects individual preferences via the Paretian criterion, the reason we build the public good also has nothing to do with paternalism. Results of this type are just absolutely fundamental to policy analysis, and are not at all made irrelevant by the impossibility results which followed Arrow’s theorem.

This is a book chapter, so I’m afraid I don’t have an online version. The book is here. Arrow is amazingly still publishing at the age of 91; he had an interesting article with the underrated Partha Dasgupta in the EJ a couple years back. People claim that relative consumption a la Veblen matters in surveys. Yet it is hard to find such effects in the data. Why is this? Assume I wish to keep up with the Joneses when I move to a richer place. If I increase consumption today, I am decreasing savings, which decreases consumption even more tomorrow. How much I want to change consumption today if I have richer peers then depends on that dynamic tradeoff, which Arrow and Dasgupta completely characterize.

“The Meaning of Utility Measurement,” A. Alchian (1953)

Armen Alchian, one of the dons from UCLA’s glory days, passed away today at 98. His is, for me, a difficult legacy to interpret. On the one hand, Alchian-Demsetz 1972 is among the most famous economics papers ever written, and it can fairly be considered the precursor to mechanism design, the most important new idea in economics in the past 50 years. People produce more by working together. It is difficult to know who shirks when we work as a team. A firm creates a residual claimant (an owner) who then has an incentive to monitor shirking, and as only one person needs to monitor the shirking, this is much less costly than a market where each member of the team production would need somehow to monitor whether other parts of the team shirk. Firms are deluded if they think that they can order their labor inputs to do whatever they want – agency problems exist both within and outside the firm. Such an agency theory of the firm is very modern indeed. That said, surely this can’t explain things like horizontally integrated firms, with different divisions producing wholly different products (or, really, any firm behavior where output is a separable function of each input in the firm).

Alchian’s other super famous work is his 1950 paper on evolution and the firm. As Friedman would later argue, Alchian suggested that we are justified treating firms as if they are profit maximizers when we do our analyses since the nature of competition means that non-profit maximizing firms will disappear in the long run. I am a Nelson/Winter fan, so of course I like the second half of the argument, but if I want to suggest that firms partially seek opportunities and partially are driven out by selection (one bit Lamarck, one bit Darwin), then why not just drop the profit maximization axiom altogether and try to write a parsimonious description of firm behavior which doesn’t rely on such maximization?

It turns out that if you do the math, profit maximization is not generally equivalent to selection. Using an example from Sandroni 2000, take two firms. There are two equally likely states of nature, Good and Bad. There are two things a firm can do, the risky one, which returns profit 3 in good states and 0 in bad states, and a risk-free one, which always returns 1. Maximizing expected profit means always investing all capital in the risky action, hence eventually going bankrupt. A firm that doesn’t profit maximize (say, it has incorrect beliefs and thinks we are always in the Bad state, hence always takes the risk-free action) can survive. This example is far too simple to be of much worth, but it does at least remind us of the lesson in the St. Petersburg paradox: expected value maximization and survival have very little to do with each other.
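A toy version of this example is easy to run. The sketch below assumes the stated returns are gross returns per unit of capital invested (so 3 triples the stake, 0 wipes it out, and 1 leaves it unchanged); that reading, and the particular sequence of states, are my assumptions rather than anything taken from Sandroni.

```python
def wealth_path(states, risky_share, start=1.0):
    """Wealth after each period.

    'G' pays 3 per unit on the risky action and 'B' pays 0;
    the risk-free action pays 1 per unit in either state.
    """
    wealth, path = start, []
    for state in states:
        risky_return = 3.0 if state == "G" else 0.0
        wealth = wealth * (risky_share * risky_return + (1 - risky_share) * 1.0)
        path.append(wealth)
    return path

def survival_probability(periods, p_good=0.5):
    """Chance that an all-in risky firm has not yet hit a Bad state."""
    return p_good ** periods

states = "GGBG"  # hypothetical draw of states
maximizer = wealth_path(states, risky_share=1.0)   # expected-profit maximizer
pessimist = wealth_path(states, risky_share=0.0)   # always expects the Bad state

print(maximizer)                  # [3.0, 9.0, 0.0, 0.0]
print(pessimist)                  # [1.0, 1.0, 1.0, 1.0]
print(survival_probability(10))   # 0.0009765625
```

The maximizer's expected wealth grows every period, yet the probability that it is still alive shrinks geometrically.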

More interesting is the case with random profits, as in Radner and Dutta 2003. Firms invest their capital stock, choosing some mean-variance profits pair as a function of capital stock. The owner can, instead of reinvesting profits into the capital stock, pay out to herself or investors. If the marginal utility of a dollar of capital stock falls below a dollar, the profit-maximizing owner will not reinvest that money. But a run of (random) losses can drive the firm to bankruptcy, and does so eventually with certainty. A non-profit maximizing firm may just take the lowest variance earnings in every period, pay out to investors a fraction of the capital stock exactly equal to the minimum earnings that period, and hence live forever. But why would investors ever invest in such a firm? If investment demand is bounded, for example, and there are many non profit-maximizing firms from the start, it is not the highest rate of return but the marginal rate of return which determines the market interest rate paid to investors. A non profit-maximizer that can pay out to investors at least that much will survive, and all the profit maximizers will eventually fail.

The paper in the title of this post is much simpler: it is merely a very readable description of von Neumann expected utility, when utility can be associated with a number and when it cannot, and the possibility of interpersonal utility comparison. Alchian, it is said, was a very good teacher, and from this article, I believe it. What’s great is the timing: 1953. That’s one year before Savage’s theory, the most beautiful in all of economics. Given that Alchian was associated with RAND, where Savage was fairly often, I imagine he must have known at least some of the rudiments of Savage’s subjective theory, though nothing appears in this particular article. 1953 is also two years before Herbert Simon’s behavioral theory. When describing the vN-M axioms, Alchian gives situations which might contradict each, except for the first, a complete and transitive order over bundles of goods, an assumption which is consistent with all but “totally unreasonable behavior”!

1953 AER final version (No IDEAS version).

“Until the Bitter End: On Prospect Theory in a Dynamic Context,” S. Ebert & P. Strack (2012)

Let’s kick off job market season with an interesting paper by Sebastian Ebert, a post-doc at Bonn, and Philipp Strack, who is on the job market from Bonn (though this doesn’t appear to be his main job market paper). The paper concerns the implications of Tversky and Kahneman’s prospect theory in its 1992 form. This form of utility is nothing obscure: the 1992 paper has over 5,000 citations, and the original prospect theory paper has substantially more. Roughly, cumulative prospect theory (CPT) says that agents have utility which is concave above a reference point, convex below it, with big losses and gains that occur with small probability weighed particularly heavily. Such loss aversion is thought to explain, for example, the simultaneous existence of insurance and gambling, or the difference in willingness to pay for objects you possess versus objects you don’t possess.

As Machina, among others, pointed out a couple decades ago, once you leave expected utility, you are definitely going to be writing down preferences that generate strange behavior at least somewhere. This is a direct result of Savage’s theorem. If you are not an EU-maximizer, then you are violating at least one of Savage’s axioms, and those axioms in their totality are proven to avoid many types of behavior that we find normatively unappealing such as falling for the sunk cost fallacy. Ebert and Strack write down a really general version of CPT, even more general than the rough definition I gave above. They then note that loss aversion means I can always construct a right-skewed gamble with negative expected payout that the loss averse agent will accept. Why? Agents like big gains that occur with small probability. Right-skew the gamble so that a big gain occurs with a tiny amount of probability, and otherwise the agent loses a tiny amount. An agent with CPT preferences will accept this gamble. Such a gamble exists at any wealth level, no matter what the reference point. Likewise, there is a left-skewed, positive expected payoff gamble that is rejected at any wealth level.
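To get a feel for the skewness result, here is a sketch that evaluates one right-skewed binary gamble using the Tversky-Kahneman (1992) functional forms and their published parameter estimates (curvature 0.88, loss aversion 2.25, weighting parameters 0.61 for gains and 0.69 for losses). The gamble itself (win 900 with probability 0.001, otherwise lose 1, reference point at current wealth) is my own illustration, not an example from Ebert and Strack.

```python
def weight(p, gamma):
    """Tversky-Kahneman (1992) inverse-S probability weighting function."""
    return p**gamma / (p**gamma + (1 - p) ** gamma) ** (1 / gamma)

def value(x, alpha=0.88, lam=2.25):
    """Value function: concave over gains, convex and steeper over losses."""
    return x**alpha if x >= 0 else -lam * (-x) ** alpha

def cpt_binary(gain, p_gain, loss, gamma_gain=0.61, gamma_loss=0.69):
    """CPT value of winning `gain` with probability p_gain, else losing `loss`."""
    return (
        weight(p_gain, gamma_gain) * value(gain)
        + weight(1 - p_gain, gamma_loss) * value(-loss)
    )

gain, p_gain, loss = 900.0, 0.001, 1.0   # hypothetical lottery-like gamble
expected_value = p_gain * gain - (1 - p_gain) * loss
cpt_value = cpt_binary(gain, p_gain, loss)

print(expected_value < 0)  # True: the gamble loses money on average
print(cpt_value > 0)       # True: the CPT agent still prefers it to not gambling
```

The overweighting of the 0.001 chance of winning is what carries the gamble, despite loss aversion pulling the other way.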

If you take a theory-free definition of risk aversion to mean “Risk-averse agents never accept gambles with zero expected payoff” and “Risk-loving agents always accept a risk with zero expected payoff”, then the theorem in the previous paragraph means that CPT agents are neither risk-averse, nor risk-loving, at any wealth level. This is interesting because a naive description of the loss averse utility function is that CPT agents are “risk-averse above the reference point, and risk-loving below it”. But the fact that small probability events are given more weight, in Ebert and Strack’s words, dominates whatever curvature the utility function possesses when it comes to some types of gambles.

So what does this mean, then? Let’s take CPT agents into a dynamic framework, and let them be naive about their time inconsistency (since they are not EU-maximizers, they will be time inconsistent). Bring them to a casino where a random variable moves with negative drift. Give them an endowment of money and any reference point. The CPT agent gambles at any time t as long as she has some strategy which (naively) increases her CPT utility. By the skewness result above, we know she can, at the very least, gamble a very small amount, plan to stop if she loses, and plan to keep gambling if she wins. There is always such a bet. If she does lose, then tomorrow she will bet again, since there is a gamble with positive expected utility gain no matter her wealth level. Since the process has negative drift, she will continue gambling until she goes bankrupt. This result isn’t relying on any strange properties of continuous time or infinite state spaces; the authors construct an example on a 37-number roulette wheel using the original parameterization of Kahneman and Tversky which has the CPT agent bet all the way to bankruptcy.
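A rough check of the roulette intuition, under stated assumptions: the sketch below values a single straight-up bet (one number, paying 35 to 1) on a 37-pocket wheel using the Tversky-Kahneman (1992) parameters, with the reference point reset to current wealth before each spin. This one-spin bet is a simplification chosen for illustration; it is not claimed to be the authors' exact construction.

```python
ALPHA, LAMBDA, GAMMA, DELTA = 0.88, 2.25, 0.61, 0.69  # Tversky-Kahneman (1992)


def weight(p, c):
    """Inverse-S probability weighting function with parameter c."""
    return p ** c / (p ** c + (1 - p) ** c) ** (1 / c)


def straight_up_bet(stake):
    """Return (expected value, CPT value) of a one-number bet of size `stake`."""
    p_win = 1 / 37
    ev = p_win * 35 * stake - (1 - p_win) * stake  # equals -stake / 37
    cpt = (weight(p_win, GAMMA) * (35 * stake) ** ALPHA
           - LAMBDA * weight(1 - p_win, DELTA) * stake ** ALPHA)
    return ev, cpt


for stake in (0.01, 1.0, 100.0):
    ev, cpt = straight_up_bet(stake)
    print(f"stake={stake}: EV={ev:.4f}, CPT={cpt:.4f}")
```

Because the power value function is homogeneous, the sign of the CPT value does not depend on the stake: under these assumptions the bet looks attractive at every wealth level even though each spin loses 1/37 of the stake on average.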

What do we learn? Two things. First, a lot of what is supposedly explained by prospect theory may, in fact, be explained by the skewness preference which the heavy weighting on low probability events in CPT induces, a fact mentioned by a number of papers the authors cite. Second, not to go all Burke on you, but when dealing with qualitative models, we have good reason to stick to the orthodoxy in many cases. The logical consequences of orthodox models will generally have been explored in great depth. The logical consequences of alternatives will not have been explored in the same way. All of our models of dynamic utility are problematic: expected utility falls to the Rabin critique, ambiguity aversion implies sunk cost fallacies, and prospect theory is vulnerable in the ways described here. But any theory which has been used for a long time will have its flaws shown more visibly than newer, alternative theories. We shouldn’t mistake the lack of visible flaws for their lack more generally.

SSRN Feb. 2012 working paper (no IDEAS version).

“Das Unsicherheitsmoment in der Wertlehre,” K. Menger (1934)

Every economist surely knows the St. Petersburg Paradox described by Daniel Bernoulli in 1738 in a paper which can fairly claim to be the first piece of theoretical economics. Consider a casino offering a game of sequential coinflips that pays 2^(n-1) as a payoff if the first heads arrives on the nth flip of the coin. That is, if there is a heads on the first flip, you receive 1. If there is a tails on the first flip, and a heads on the second, you receive 2, and 4 if TTH, and 8 if TTTH, and so on. It is quite immediate that this game has expected payoff of infinity. Yet, Bernoulli points out, no one would pay anywhere near infinity for such a game. Why not? Perhaps they have what we would now call logarithmic utility, in which case I value the gamble at .5*ln(1)+.25*ln(2)+.125*ln(4)+…, a finite sum.
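The two sums in this paragraph are easy to check numerically. The sketch below (function names are mine) computes partial sums of the expected payoff, which grow without bound, and of the expected logarithmic utility, which converges to ln 2, so a log-utility agent would value the game at a sure payment of 2.

```python
import math


def expected_payoff(n_terms):
    """Partial sum of sum_n 2^-n * 2^(n-1); every term contributes 1/2."""
    return sum(0.5 ** n * 2 ** (n - 1) for n in range(1, n_terms + 1))


def expected_log_utility(n_terms):
    """Partial sum of sum_n 2^-n * ln(2^(n-1))."""
    return sum(0.5 ** n * (n - 1) * math.log(2) for n in range(1, n_terms + 1))


# The expected payoff grows like n_terms / 2, so it diverges.
print(expected_payoff(10), expected_payoff(100))

# The expected log utility settles near ln 2; exponentiating gives the
# certainty equivalent, which is near 2.
eu = expected_log_utility(200)
print(eu, math.exp(eu))
```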

Now, here’s the interesting bit. Karl Menger proved in 1927 that the standard response to the St. Petersburg paradox is insufficient (note that Karl with a K is the mathematically inclined son and mentor to Morgenstern, rather than the relatively qualitative father, Carl, who somewhat undeservingly joined Walras and Jevons on the Mt. Rushmore of Marginal Utility). For instance, if the casino pays out e^(2^(n-1)) rather than 2^(n-1), then even an agent with logarithmic utility has infinite expected utility from such a gamble. This, nearly 200 years after Bernoulli’s original paper! Indeed, such a construction is possible for any unbounded utility function; let the casino pay out U^-1(2^(n-1)) when the first heads arrives on the nth flip, where U^-1 is inverse utility.
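Written out, Menger's construction works as follows, reading the modified payout as e^(2^(n-1)) on the nth flip. The second chain assumes U is increasing and unbounded above, so that the inverse is defined at each 2^(n-1); the symbols X and X_U for the two payouts are my notation.

```latex
\[
\mathbb{E}\left[\ln X\right]
  = \sum_{n=1}^{\infty} 2^{-n} \ln\!\left(e^{2^{n-1}}\right)
  = \sum_{n=1}^{\infty} 2^{-n}\, 2^{n-1}
  = \sum_{n=1}^{\infty} \tfrac{1}{2}
  = \infty,
\qquad
\mathbb{E}\left[U(X_U)\right]
  = \sum_{n=1}^{\infty} 2^{-n}\, U\!\left(U^{-1}\!\left(2^{n-1}\right)\right)
  = \sum_{n=1}^{\infty} \tfrac{1}{2}
  = \infty .
\]
```

In both cases the payout is chosen so that its utility is exactly 2^(n-1), which recreates the original divergent sum in utility terms.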

Things are worse, Menger points out. One can construct a thought experiment where, for any finite amount C and an arbitrarily small probability p, there is a bounded utility function under which an agent will prefer the gamble to win some finite amount D with probability p to getting a sure thing of C. [Sentence edited as suggested in the comments.] So bounding the utility function does not kill off all paradoxes of this type.

The 1927 lecture and its response are discussed at length in Rob Leonard’s “Von Neumann, Morgenstern, and the Creation of Game Theory.” Apparently, Oskar Morgenstern was at the Vienna Kreis where Menger first presented this result, and was quite taken with it, a fact surely interesting given Morgenstern’s later development of expected utility theory. Indeed, one of Machina’s stated aims in his famous paper on EU without the Independence Axiom is providing a way around Menger’s result while salvaging EU analysis. If you are unfamiliar with Machina’s paper, one of the most cited in decision theory in the past 30 years, it may be worthwhile to read the New School HET description of the “fanning out” hypothesis which relates Machina to vN-M expected utility. (Unfortunately, the paper above is both gated and in German, as the original publication was in the formerly-famous journal Zeitschrift für Nationalökonomie. The first English translation is in Shubik’s festschrift for Morgenstern published in 1967, but I don’t see any online availability.)

“A Bayesian Model of Risk and Uncertainty,” N. al-Najjar & J. Weinstein (2012)

In a Bayesian world with expected utility maximizers, you have a prior belief on the chance that certain events will occur, and you maximize utility subject to those beliefs. But what if you are “uncertain” about what your prior even is? Perhaps you think with 60 percent probability, peace negotiations will commence and there will be a .5 chance of war and a .5 chance of peace, but with 40 percent probability, war is guaranteed to occur. It turns out these types of compound lotteries don’t affect your decision if you’re just making a single choice: simply combine the compound lottery and use that as your prior. In this case, you think war will occur with .6*.5+.4*1=.7 probability. That is, the Bayesian world is great for discussing risk – decisionmaking with concave utility and known distributions – but not that useful for talking about one-shot Knightian uncertainty, or decisionmaking when the distributions are not well-known.

al-Najjar and Weinstein show, however, that this logic does not hold when you take multiple decisions that depend on a parameter that is common (or at least correlated) across those decisions. Imagine that a risk-averse agent buys a stock whose daily return is determined by some IID process, and imagine that the agent doesn’t have a single prior about the parameter governing that process, but rather a prior over the set of possible priors. For instance, as above, with probability .6 there is a .5 chance of a 1 percent increase and a .5 chance of a 1 percent decrease, but with probability .4, a 1 percent increase is assured. Every period, the agent can update this “prior over priors”. Does the logic about the compound lottery collapsing still hold, or does this uncertainty matter for decisionmaking?

If utility is linear or separable over time, then uncertainty doesn’t matter, but otherwise it does. Why? Call the prior over your priors “uncertainty.” Mathematically, the expected utility is a double integral: the outer integral is over possible priors with respect to your uncertainty, and the inner integral is just standard expected utility over N time periods with respect to each prior currently being summed in the outer integral. In the linear or separable utility case, I can swap the position of the integrals with the summation of utility over time, making the problem equivalent to adding up N one-period decision problems; as before, having priors over your prior when only one decision is being made cannot affect the decision you make, since you can just collapse the compound lottery.
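A toy numerical check of this claim, not the authors' model: the sketch below takes the .6/.4 prior over priors from the stock example, treats each period's return as +1 or -1, and compares the double sum (outer over priors, inner over paths) with the expected utility under the collapsed IID prior of .7. The concave utility `-exp(-w)` over the total return is a hypothetical stand-in for preferences that are not separable over time.

```python
import math
from itertools import product

# Prior over priors from the example: P(up) is 0.5 w.p. 0.6 and 1.0 w.p. 0.4.
PRIORS = [(0.6, 0.5), (0.4, 1.0)]
P_COLLAPSED = sum(w * p for w, p in PRIORS)  # the one-shot prior, 0.7


def iid_expected_utility(p_up, n_periods, utility):
    """E[utility(total return)] when each period is +1 w.p. p_up, else -1."""
    total = 0.0
    for path in product((1, -1), repeat=n_periods):
        prob = math.prod(p_up if x == 1 else 1 - p_up for x in path)
        total += prob * utility(sum(path))
    return total


def mixture_expected_utility(n_periods, utility):
    """Outer sum over priors, inner expectation under each prior."""
    return sum(w * iid_expected_utility(p, n_periods, utility)
               for w, p in PRIORS)


def linear(w):
    return w


def concave(w):
    """Hypothetical concave utility over the total return."""
    return -math.exp(-w)


for name, u in (("linear", linear), ("concave", concave)):
    print(name,
          mixture_expected_utility(2, u),
          iid_expected_utility(P_COLLAPSED, 2, u))
```

With linear utility the two numbers coincide, and with a single period they coincide for any utility. With the concave utility over two periods, the mixture is valued strictly below the collapsed lottery, which is the sense in which the prior over priors starts to matter.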

If utility is not linear or separable over time, uncertainty will affect your decision. In particular, with concave utility, you will be uncertainty averse in addition to being risk averse. Al-Najjar and Weinstein use a modified Dirichlet distribution to talk about this more concretely. In particular, assuming a uniform prior-over-priors is actually equivalent to assuming very little uncertainty: the uniform prior-over-priors will respond very slowly to information learned during the first few periods. Alternatively, if you have a lot of uncertainty (a low Dirichlet parameter), your prior-over-priors, and hence your decisions, will change rapidly in the first few periods.
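The point about the Dirichlet parameter can be illustrated with the two-outcome special case, a symmetric Beta(a, a) prior over the probability of an up-move. This plain Beta prior is a stand-in for the paper's modified Dirichlet, and the function name is mine. The posterior mean after `ups` up-moves in `n` periods is (a + ups) / (2a + n).

```python
def posterior_mean_up(a, ups, n):
    """Posterior mean of P(up) under a symmetric Beta(a, a) prior after
    observing `ups` up-moves in `n` periods."""
    return (a + ups) / (2 * a + n)


# Beliefs after 0, 1, 2 and 3 consecutive up-moves, for three values of a.
# a = 1.0 is the uniform prior; a smaller a means more uncertainty.
for a in (0.1, 1.0, 10.0):
    path = [round(posterior_mean_up(a, k, k), 3) for k in range(4)]
    print(f"a={a}: {path}")
```

A small `a` lets the first few observations move beliefs a long way, while the uniform case (a = 1) moves more slowly and a large `a` barely moves at all.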

So what’s the use of this model? First, it allows you to talk about dynamic uncertainty without invoking any of the standard explanations for ambiguity – the problems with the ambiguity models are discussed in a well-known 2009 article by the authors of the present paper. If you’re, say, an observer of people’s behavior on the stock market, and see actions in some sectors that suggest purchase variability that far exceeds the known ex-post underlying variability of the asset, you might want to infer that the prior-over-priors exhibited a lot of uncertainty during the time examined; the buyers were not necessarily irrational. In particular, during regime shifts or periods with new financial product introduction, even if the ex-post level of risk does not change, assets may move with much more variance than expected due to the underlying uncertainty. Alternatively, if new assets have underlying parameters that are likely to be subject to much Knightian uncertainty, this model gives you a perfectly Bayesian explanation for why returns on those assets are higher than seems justified given known levels of risk aversion.

December 2011 Working Paper

“Fact, Fiction and Forecast,” N. Goodman (1954)

Fact, Fiction and Forecast is one of the seminal texts of 20th century philosophy: you may know it from the famous “grue/bleen” example. The text deals principally with two problems, the meaning of counterfactuals and a “new riddle” of induction, where the first is essential for any social scientist to understand, and the second has, I think, some interesting implications for decision theory. I will discuss each in turn. My notes are from the 4th edition, including the foreword by the legendary Hilary Putnam.

The first involves counterfactual conditionals, or sentences of the type “If X were true, then Y would obtain” along with the fact that X is not actually true. Counterfactual conditionals are the focus of a huge number of economics papers (“If the Fed had done X, then GDP would have done Y”, “If deworming had been expanded to 100% in this village, school attendance would have been Y”, etc.). Counterfactuals are also, I would argue, the concept which has been forefront in the minds of the world’s leading philosophers over the past 60 years.

When economists use counterfactuals, I think they are naively trying to say something like “If the world is precisely the same, except that also X is true, then Y would hold.” There are a ton of problems with this. First, if everything in the world is precisely the same, then Not X is true, and since X and Not X are both true, by the principle of explosion, everything is true, including Not Y. So we must mean that everything in the world is precisely the same, except that X holds and Not X does not. Call the set of statements true in the actual world S, and the counterfactual set of true statements S’. But here we have more problems: S’ may contain a logical inconsistency, in that X may deductively imply some statement Z which is logically incompatible with something carried over from S. Getting around that problem presents even more difficulties; David Lewis has the most famous resolution with his possible worlds logic, but even that is far from unproblematic.

Ignoring this basic problem of what is meant by a counterfactual, it is not well-known among social scientists that counterfactual conditionals are absolutely not strictly defined by their logical content, in the way that standard deductive logic is. That is, consider the statement If A then B, where A is a counterfactual. Let A’ be logically equivalent to A. It is easy to construct an example where you intuitively accept that A implies B, but not that A’ implies B. For instance, let A be “Bill Clinton were the same person as Julius Caesar,” A’ be “Julius Caesar were the same person as Bill Clinton” and B be “Bill Clinton would be emperor of Rome.” Given the importance of counterfactual logic to economics, there is a lot to be gained for our science from a better understanding of the philosophic issues here.

The more interesting point in Goodman for the decision theorist concerns induction. Hume showed in the 18th century why induction is invalid; the validity of induction involves assuming some sort of continuity of nature, and such an assumption is an induction itself. Even probabilistic induction – “The sun has risen every day, so I think it probable the sun will rise tomorrow” – is invalid for the same reason. There are many arguments contra Hume, but I hope you’ll take my word that they have all failed, and that the validity of induction is no longer an open question. That said, the wisdom of induction certainly is. Though we know induction is invalid reasoning, we nonetheless rely on it trivially every day (I get on a bus going north to my office, and not south, on the inductive assumption that my office is still north of my apartment) and less trivially on important policy issues (acceptance of “science” as a valid method for learning truth, rather than reading sacred books, is implicitly an acceptance of the wisdom of induction). What exactly do we mean when we say induction is wise? We mean that there exist regularities for which the past existence of the regularity is evidence that we should expect the regularity in the future.

What Goodman points out is that the interesting question is not whether induction is valid – it isn’t – but rather what do we mean by a “regularity” anyway? This problem of induction is precisely the same as a problem in counterfactuals. Consider the regularity that every object in my pocket is a coin made of metal. I have investigated this many times, and every object I check is a metal coin. Consider the counterfactual “If I put a piece of chocolate in my pocket” or the induction on objects in my pocket where the only thing in my pocket today is a chocolate. Surely we don’t think we should induct that the chocolate will be a metal coin when I take it from my pocket. Alternatively, consider the regularity that all metal coins conduct electricity. I have investigated this many times also, and every metal coin I check conducts. If I check another coin, I do believe it will conduct. What is the difference between the chocolate example and the coin example? It is that I trust induction when I believe a law holds for some regularity, and do not trust induction when I believe past draws are simply random. The “grue/bleen” example, if you know it, is even stronger: I interpret it to mean that whatever rationale we use to delineate coincidences from regularities depends on more than how we selected instances in the past, or on the type of the property (say, color, or conductivity) we are examining. Goodman proposes some thoughts on how we know what histories are evidence of laws and what aren’t, but the exact delineation remains controversial.

So what does this mean for decision theory? Decision theory is heavily influenced by de Finetti and Savage, and somewhat by Carnap, and less so by other massive philosophy figures in this literature like Ayer, Goodman, Putnam, and Quine. That is, we conceive of the world as having states over which agents have a prior, and evidence changing that prior according to Bayes’ rule. Let Ω be the state space, where states are a countably infinite product space of potential observations. Let a “lawlike” set of hypotheses be a set of (infinite-length) observations that are compatible with some law, where the nature of possible laws is given exogenously. For instance, a lawlike set might be “all metals conduct” and the state space simply made up of tests of conductivity of various metals in each period plus a draw from the set {0,1}. The nature of the set of possible laws in the prior is that either all metals conduct, or the conductivity properties of various metals are not linked. Imagine in periods 1 and 2 that all metals conduct and we draw a 0 each time, and that in a second possible world, in periods 1 and 2 all metals conduct except copper in period 2, and we draw a 0 each time. What can we conclude as a Savage-style Bayesian? Think about what conditions on the prior are imposed.

There is one further worry for the standard econ model. How we induct in Goodman depends on what predicates we have as potential sources of laws: how ought we set up the state space? If we, say, put 0 prior on the world where all emeralds are grue, and positive prior on the world where all emeralds are green – and the standard model of state space means that we must include both possibilities as states – then we are violating Carnap’s “principle of total evidence” since we rule out grue before even seeing any evidence, and we are violating any of the standard rationales for putting positive probability on all possible states in the prior. (The Google Books preview contains the entire introduction plus the foreword by Putnam, which should give a good taste of the content. Among economists, Itzhak Gilboa seems to have done the most work on expanding Goodman-style ideas to decision theory.)

“Common Knowledge and Equilibria Switching,” N.A. Dalkiran & M. Hoffman (2011)

It is not unusual that, at 2 A.M. on any given Saturday morning, a less-than-forthright gentleman will ask his acquaintance whether “she would like to come up for some coffee.” To those just learning game theory, there is something strange here. Both parties are fully aware that no coffee will be served at such a late hour. We are both fully capable of translating the innuendo into its real meaning: there is no uncertainty here. But why, then, will nobody just ask for sex? And how is this question related to financial crises?

But perhaps these situations are not that strange. We all know from Rubinstein’s Electronic Mail game (you may know this as the story of the two coordinating generals) that mutual knowledge is not common knowledge. Imagine two generals on different ridges are planning an attack, and the attack will only succeed if both parties get a “good” signal; if either of us draws a bad signal, we know the attack will fail. The generals can communicate with each other by a messenger on horseback, but with probability epsilon close to zero, the messenger falls off his horse and never delivers the message. When I send a horse out, I know my signal and that’s it. When I receive the first horseman, I know the other general’s signal and my own. When he receives a message back, he knows his signal, he knows my signal, and he knows that I know his signal. And so on. After two horsemen, we both know the other got a good signal, but we do not know that the other person knows we know this. So “almost” common knowledge is not almost at all, since common knowledge requires the “I know that he knows that I know” chain to continue infinitely, and that will happen with probability zero. Similar “contagion” arguments have been explored by many others (writeups on similar papers by Morris, Rob and Shin and Weinstein and Yildiz can be found on this site).
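The "probability zero" step is simple arithmetic. A minimal sketch, assuming each messenger is lost independently with probability epsilon (the function names are mine): the chance that the first k messengers all arrive is (1 - epsilon)^k, which shrinks to zero as k grows, and the expected number of deliveries before the first loss is (1 - epsilon)/epsilon, which is finite.

```python
def prob_chain_reaches(k, eps):
    """Probability that the first k messengers all arrive (independent losses)."""
    return (1 - eps) ** k


def expected_deliveries(eps):
    """Expected number of messages delivered before the first one is lost."""
    return (1 - eps) / eps


eps = 0.01
for k in (1, 10, 100, 1000):
    print(k, prob_chain_reaches(k, eps))
print("expected deliveries:", expected_deliveries(eps))
```

So however small epsilon is, the generals reach only a finite depth of "I know that he knows", never the infinite chain that common knowledge requires.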

Dalkiran and Hoffman explore a similar question: when do similar tricky issues concerning higher order knowledge lead to “switching” of equilibria? More precisely, consider a two player, two action game, where (A,A) and (B,B) are the only pure strategy Nash equilibria: in other words, a coordination game. Let one equilibrium be a high payoff equilibrium, and the other be a low payoff equilibrium. Let there be a number of states of the world, with each agent endowed with an information partition in the standard way. Does there exist an equilibrium set of strategies where (A,A) is played with probability 1 in at least one state, and (B,B) with probability 1 in another state? That is, what conditions on priors, payoffs and the information partitions allow for equilibrium strategies where the “focal point” varies in different states even when the payoff matrix is not state-dependent? And what might that tell us about “customs” or behavior like the “would you like to come up for a drink” scenario? (Trivially, of course, such an equilibrium exists if we can both identify state 1 and state 2 with probability 1; the interesting situations are those where our knowledge of the current state is imperfect and heterogeneous, though I hope you’ll agree that such a situation is the most natural one!)

The authors provide necessary and sufficient conditions for arbitrary games, but the following example they give works nicely; the exact conditions rely on definitions of evident events and common p-belief and other such technical terms which will be familiar to decision theorists but are a bit too tricky to explain to a general audience in this blog post – if you read this paper and want to know more about those concepts, Aumann’s 2-part “Interactive Epistemology” articles and Larry Samuelson’s 2004 JEL are good places to start.

Imagine one agent (Aygun, in their example) is a bouncer at a whorehouse, and another agent (Moshe – the authors have a footnote explaining that they use their own names in this disreputable example so as not to defame the good name of readers with common game theory names like Ann and Bob!) is an occasional john. Aygun sometimes reads and doesn’t notice who walks in the brothel, and Moshe occasionally looks at the ground and doesn’t notice whether the bouncer sees him. It is a social convention that people should not have close friendships with anyone if it is common knowledge that they attend a brothel. There are then two coordinating equilibria: (A,A) for future close friendships and (B,B) for future weak friendships, which are coordinating in the sense that unequal friendships are worth less than equal friendships for both parties. There are then five states: H, (R,G), (R’,G), (R,G’) and (R’,G’), where H is the state in which Moshe stays home, (R,G) is the state where Moshe goes to the brothel, he looks at the Ground, and Aygun Reads, (R’,G) is the state where Moshe goes to the brothel, he looks at the Ground, and Aygun does not Read, etc. Both Moshe and Aygun have a common prior about the probability of looking at the ground, of staying home, and of reading.

The interesting potential equilibrium here is the one where agents play (A,A) in state H and play (B,B) in state (R’,G’), the state where eye contact is made at the brothel. In such an equilibrium, would Moshe do better to avoid eye contact, meaning that (A,A) is the equilibrium strategy in states (R,G) and (R’,G)? Using the main theorem of the paper, a simple sufficiency condition obtains, which essentially says that the interesting equilibrium exists if Aygun reads with sufficiently high probability, and that Aygun does not expect Moshe to be at the brothel with sufficiently high probability given that he is reading. If those conditions hold, then when Moshe looks at the ground, he will reason that Aygun is likely to be reading, and since Aygun is likely to be reading, he is likely to believe Moshe is at home, and therefore Moshe expects that Aygun expects that Moshe will play A, hence Moshe expects Aygun will play A, hence Moshe plays A. And Aygun reasons in exactly the same manner, so (A,A) is played in all states where eye contact is not made. But remember what is going on in (R’,G), the state where Aygun is not reading and Moshe is looking at the ground. Aygun knows Moshe is going to the brothel because he sees him, and Moshe of course knows that he himself is going to the brothel. So there is mutual knowledge here, but not common knowledge. And yet moving from mutual to common knowledge will break the “good” payoffs!
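The gap between mutual and common knowledge in state (R’,G) can be checked mechanically with information partitions. The partitions below are my own reading of the story and are an assumption, not taken from the paper: a reading Aygun cannot tell H, (R,G) and (R,G') apart, and a ground-gazing Moshe cannot tell (R,G) from (R',G). The sketch uses the standard knowledge operator (an agent knows an event in a state if his information cell lies inside it) and computes common knowledge as the fixed point of iterated "everyone knows".

```python
# Hypothetical information partitions (an assumption, not from the paper).
AYGUN = [{"H", "(R,G)", "(R,G')"}, {"(R',G)"}, {"(R',G')"}]
MOSHE = [{"H"}, {"(R,G)", "(R',G)"}, {"(R,G')"}, {"(R',G')"}]

# The event "Moshe is at the brothel": every state except H.
AT_BROTHEL = {"(R,G)", "(R',G)", "(R,G')", "(R',G')"}


def knows(partition, event):
    """States in which the agent's information cell lies inside `event`."""
    return {s for cell in partition if cell <= event for s in cell}


def common_knowledge(partitions, event):
    """Iterate 'everyone knows' until the set of states stops shrinking."""
    current = set(event)
    while True:
        nxt = set(current)
        for partition in partitions:
            nxt &= knows(partition, current)
        if nxt == current:
            return current
        current = nxt


mutual = knows(AYGUN, AT_BROTHEL) & knows(MOSHE, AT_BROTHEL)
common = common_knowledge([AYGUN, MOSHE], AT_BROTHEL)
print("mutual knowledge in:", sorted(mutual))
print("common knowledge in:", sorted(common))
```

Under these assumed partitions, both agents know Moshe is at the brothel in (R',G) and (R',G'), but the fact is common knowledge only in (R',G'), the eye-contact state, which matches the verbal argument above.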

Now it goes without saying that in these types of coordination games, there are always equilibria where either (A,A) is played in every state or (B,B) in every state. But to the extent that certain states are associated with certain “focal points”, the ways in which customs or focal points can or can’t change equilibria across states are totally non-trivial in situations where agents have different information partitions. For instance, the authors give an example of the focal point at a traffic light where the color of the light is obscured to the drivers with some probability. They also generate a simple model of a bank run where switching depends on how much we expect other people to be following the news. Given the importance of discontinuous jumps and expectations to the financial world, I don’t doubt that understanding how and why equilibria switch is supremely relevant to understanding how stable or fragile a given financial regime is. Who knew politely asking a girl up to your apartment after a date was so related to the stability of the international financial system! (November 2011 working paper – this paper is the job market paper of N. Aygun Dalkiran, a colleague of mine at Kellogg MEDS. If your department is looking for a good theorist, give him a call!)
