## “The Well-Calibrated Bayesian,” A.P. Dawid (1982)

I’m helping run a reading group on a subfield of economics called “expert testing” this semester, so I’ll be posting a number of notes on the topic over the next couple months, as well as an annotated reading list at the end of March. The subfield in many ways develops out of the calibration literature in statistics, of which this 1982 JASA article by Dawid is the classic reference.

Consider a forecaster who tries to predict a stream of draws by nature. For instance, let nature choose a probability of rain every period, and let a weatherman likewise try to predict this probability (measurable with respect to the weather observed through yesterday). How should we judge the accuracy of such forecasts? This turns out to be a totally nontrivial problem. “How close you are to being right about the true distribution” doesn’t work, since we only see the draw from nature’s mixed strategy – rain or no rain – and not the underlying strategy itself.

Dawid proposes calibration as a rule. Let nature play and the forecaster predict an arbitrarily long sequence of days. Consider all the days where the forecaster projects rain with some probability x, say 30 percent. Then a forecaster is well-calibrated if, in fact, it rained on 30 percent of those days. Calibration is, at best, a minimal property for good forecasting. For instance, just predicting the long-run probability of rain, every day, will ensure a forecaster is well-calibrated.
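
To see what calibration means in practice, here is a minimal sketch (my own toy code, not from the paper): nature rains i.i.d. with probability .3, and the lazy forecaster who announces the long-run frequency every day comes out well-calibrated despite being useless for telling a rainy day from a dry one.

```python
import random

def calibration_table(forecasts, outcomes):
    """Group days by announced forecast and compute the empirical
    frequency of rain within each group."""
    groups = {}
    for p, y in zip(forecasts, outcomes):
        groups.setdefault(round(p, 2), []).append(y)
    return {p: sum(ys) / len(ys) for p, ys in groups.items()}

random.seed(0)
n = 100_000
rain = [1 if random.random() < 0.3 else 0 for _ in range(n)]  # i.i.d. nature
lazy_forecasts = [0.3] * n      # always announce the long-run frequency
table = calibration_table(lazy_forecasts, rain)
# table[0.3] is close to 0.3: the lazy forecaster is well-calibrated.
```

This is exactly why calibration is only a minimal property: the check above cannot distinguish this forecaster from one with genuine predictive skill.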

It is proven that Bayesian agents cannot be subjectively miscalibrated, assuming that forecasters sequentially make predictions from a fixed distribution that is conditional on all past data (i.e., on when it has rained and when not in the past), and assuming that forecasters are coherent, a term due to de Finetti that essentially means the forecaster’s subjective probabilities follow the normal rules of probability. That is, after making sufficiently many predictions, a forecaster must believe the empirical event “rain” will occur with frequency p on days where rain was predicted to occur with probability p. Forecasters cannot believe themselves miscalibrated, no matter what evidence they see to the contrary. The basic reason is that, at time zero, the forecaster had already computed in a coherent way what he will predict conditional on seeing the history that he in fact sees. If he wants to “change his mind” when later making his sequential forecasts – say, after predicting snow over and over when it has in fact not snowed – he would essentially need to hold two different subjective probabilities in his head, the original conditional one and the new conditional one. This would violate coherence.

Now, this might not be a problem: perhaps there is a joint distribution over histories which an agent can play that can never become miscalibrated. That is, my forecasting technique is so good that whatever stream of weather nature throws my way, my predictions are calibrated with that reality. Unfortunately, it is very easy to construct a “malevolent” nature – that is, a nature playing minimax against an agent trying to predict it – who will cause any forecasting system to miscalibrate. Dawid and Oakes, both in 1985 articles in JASA, produce simple examples. Basically, if an agent’s probability distribution forecast conditional on history A says to predict rain with probability more than .5, then nature plays sun, and vice versa. In this way, the agent is always poorly calibrated. The implication, essentially, is that no forecasting system – that is, no econometric technique no matter how sophisticated – will be able to always model natural processes. The implication for learning in games is particularly devastating, because even if we think nature doesn’t try to fool us, we certainly can believe that opposing players will try to fool our forecasting algorithm in competitive games of learning.
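
The adversarial construction is easy to simulate (again a sketch of my own, with a Laplace-rule Bayesian standing in for an arbitrary forecaster; neither Dawid nor Oakes uses this exact example): nature looks at each forecast and plays the opposite, and the forecaster ends up maximally miscalibrated.

```python
def malevolent_nature(forecaster, periods=1001):
    """Nature sees each forecast and plays the opposite: sun whenever rain
    is predicted with probability above .5, rain otherwise."""
    history, forecasts = [], []
    for _ in range(periods):
        p = forecaster(history)
        forecasts.append(p)
        history.append(0 if p > 0.5 else 1)   # 1 = rain, 0 = sun
    return forecasts, history

def laplace(history):
    """A textbook Bayesian forecaster: Laplace's rule of succession."""
    return (sum(history) + 1) / (len(history) + 2)

forecasts, outcomes = malevolent_nature(laplace)
rained_when_high = [y for p, y in zip(forecasts, outcomes) if p > 0.5]
rained_when_low  = [y for p, y in zip(forecasts, outcomes) if p <= 0.5]
# By construction it never rains when rain is forecast as likely, and
# always rains when it is not: calibration fails as badly as possible.
```

Nothing here depends on the Laplace rule: swap in any forecasting function of the history and the same argument bites.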

http://fitelson.org/seminar/dawid.pdf (Final version – published in JASA 1982)

## “Actualist Rationality,” C. Manski (2009)

I should warn you straight away that multiple decision theorists have told me they think this article is bonkers. Nonetheless, I enjoy Chuck Manski’s take on econometrics – roughly, that analysts should be completely agnostic when it comes to missing or spoiled data, and therefore report only best and worst-case ranges for results – and the present paper essentially takes that philosophy into the world of decision theory. Let’s first note, however, some discomfort with the term “actualist rationality”. Manski defines an actualist rational agent simply as one who does not select a weakly dominated action when faced with a decision. He rejects all axioms related to consistency across counterfactual choices. The problem with the term is that “actualism” already has a totally different meaning in philosophy: actualism is the rejection of possible things or worlds (like those in David K. Lewis’ counterfactuals) which do not actually exist. Why not just call actualist rationality “consequentialist rationality”?

The heart of the article, fittingly, is an ultraconsequentialist viewpoint of “should”, particularly from the standpoint of an analyst. No behavioral axiom, such as the transitivity among decisions which Savage argues ought to be held by rational decisionmakers, is allowed. Independence of irrelevant alternatives is also rejected. Clearly, IIA does not allow potential regret to affect decisionmakers, and Manski has long been a fan of utility functions like minimax expected regret.

I’m afraid parts of the article are a red herring. No decision theorist, save perhaps Savage himself, believes that decisionmakers always and everywhere should behave consistently with the Savage axioms. Indeed, the entire field over the past half century has more or less been devoted to exploring what happens when you alter some of the Savage axioms. Many decision theorists, on the other hand, view the Savage axioms as more or less reasonable, and therefore see subjective expected utility maximization as reasonable. In line with this, many decision theorists find EU or SEU useful tools for investigating the world, without prescribing any “ought” on other people’s decisions. In any case, whether you’re convinced by Manski or not, it’s surely worth understanding the types of assumptions you make about your own assumptions: is it valid for economists to make non-consequentialist claims about human behavior? In what contexts? For a number of reasons, I think the answer is yes, deontology and ethics and so on have much to contribute to economics, even from the viewpoint of an analyst. Check out this paper for the converse argument.

http://faculty.wcas.northwestern.edu/~cfm754/actualist_rationality.pdf (WP, final version in Theory and Decision 2009)

## “On Consensus through Communication with a Commonly Known Protocol,” E. Tsakas & M. Voorneveld (2010)

(Site note: I will be down in Cuba until Dec. 24, so posting will be light until then, though I do have a few new papers to discuss. I’m going to meet with some folks there about the recent economic reforms and their effects, so perhaps I’ll have something interesting to pass along on that front.)

A couple weeks ago, I posted about the nice result of Parikh and Krasucki (1990), who show that when communication is pairwise, beliefs can fail to converge under many types of pre-specified orders of communication. In their paper, and in every paper following it that I know of, common knowledge of the order of communication is always assumed. For instance, if Amanda talks with Bob and then Bob talks with Carol, since only common knowledge of the original information partitions is assumed, for Carol to update “properly” she needs to know whether Bob has talked to Amanda previously.

In a paper pointed out by a commenter, Tsakas and Voorneveld show through a counterexample just how strict this requirement is. They expand the state space to include knowledge of the order of communication (using knowledge in the standard Aumann way). It turns out that with all of the necessary conditions of Parikh and Krasucki holding, and uncertainty about whether a single act of communication occurred, consensus can fail to be reached. What’s worrying here from a modeling perspective is that it is really convenient to model communication as a directed graph, where A links to B if A talks to B infinitely many times. I see the Tsakas and Voorneveld result as giving some pause to that assumption. In particular, in the example, all agents have common knowledge of the communication graph, since the only uncertainty concerns a single act of communication and not the structure of the graph itself.

There is no positive result here: we don’t have useful conditions guaranteeing belief convergence under uncertainty about the protocol. In the paper I’m working on, I restrict all results to “regular” communication, meaning the only communication is through formal channels used infinitely often, and because of this I only need to assume knowledge of the graph.

http://edocs.ub.unimaas.nl/loader/file.asp?id=1490 (Working Paper. Tsakas and Voorneveld also have a 2007 paper on this topic that corrects some erroneous earlier work: https://gupea.ub.gu.se/dspace/bitstream/2077/4576/1/gunwpe0255.pdf. In particular, even if consensus is reached, information only becomes common knowledge among the agents under really restrictive assumptions. This is important if, for instance, you are studying mechanisms on a network, since many results in game theory require common knowledge about what opponents will do: see Dekel and Brandenburger (1987) and Aumann and Brandenburger (1995), for instance. I’ll have more to say about this once I get a few more results proved.)

## “Communication, Consensus & Knowledge,” R. Parikh & P. Krasucki (1990)

Aumann (1976) famously proved that with common priors and common knowledge of a posterior, individuals cannot agree to disagree about a fact. Geanakoplos and Polemarchakis (1982) explained how one might reach a common posterior, by showing that if two agents can communicate their posterior, and then reupdate, they will in finite transfers of information converge on a posterior belief (of course, it might not be the true state of the world that they converge on, but converge they will), and hence will not agree to disagree. This fact turns out to generalize to signal functions other than Bayesian updates, as in Cave (1983).
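
The Geanakoplos-Polemarchakis back-and-forth can be sketched in a few lines (my own toy example under a uniform prior; the partitions and event are hypothetical, not from any of the papers): each announcement tells the listener that the true state must be one that would have generated that same announcement, shrinking the public event, and the announced posteriors eventually agree.

```python
from fractions import Fraction

def posterior(event, info):
    """P(event | info) under a uniform prior, as an exact fraction."""
    return Fraction(len(event & info), len(info))

def cell_of(partition, w):
    """The partition cell containing state w."""
    return next(c for c in partition if w in c)

def dialogue(omega, partitions, event, states, max_rounds=20):
    """Agents take turns announcing P(event | own cell ∩ public info).
    Each announcement shrinks the public event to the states that would
    have produced it. Stops when consecutive announcements agree."""
    public = set(states)
    announcements = []
    for t in range(max_rounds):
        part = partitions[t % len(partitions)]
        q = posterior(event, cell_of(part, omega) & public)
        announcements.append(q)
        public = {w for w in public
                  if posterior(event, cell_of(part, w) & public) == q}
        if len(announcements) >= 2 and announcements[-1] == announcements[-2]:
            break
    return announcements

# Toy example: four equally likely states, true state 1, event E = {1, 4}.
states = {1, 2, 3, 4}
partitions = [[{1, 2}, {3, 4}], [{1, 2, 3}, {4}]]
event = {1, 4}
announcements = dialogue(1, partitions, event, states)
# Announced posteriors: 1/2, 1/3, 1/2, 1/2 -- agreement is reached.
```

Note the first announcement here reveals nothing (both of agent 1’s cells give the same posterior), yet the dialogue still converges in a few rounds, exactly the finite-convergence point of Geanakoplos and Polemarchakis.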

One might wonder, then: does this result hold for more than two people? The answer is that it does not. Parikh and Krasucki define a communication protocol among N agents as a sequence of names r(t) and s(t) specifying who is speaking and who is listening in every period t. Note that all communication is pairwise. As long as communication is “fair” (more on this shortly), meaning that everyone communicates with everyone else either directly or indirectly an infinite number of times, and as long as the information update function satisfies a convexity property (Bayesian updating does), then beliefs will converge, although unlike in Cave, Aumann and Geanakoplos, the posterior may not be common knowledge.

There is no very simple example of beliefs not converging, but a long(ish) counterexample is found in the paper. A followup by Houy and Menager notes that even when information updates are Bayesian, different orders of communication can lead to beliefs converging to different points, and proves results about how much information can be gleaned when we can first discuss in which order we wish to discuss our evidence; if it is common knowledge that two groups of agents disagree about which protocol will make them better off (in the sense of giving them the finest information partition after all updates have been done), then any order of communication, along with the knowledge about who “wanted to speak first”, will lead to beliefs converging to the same point. That is, if Jim and Joe both wish to speak second, and this is common knowledge, then no matter who speaks first, beliefs will converge to the same point.

One important point about Parikh and Krasucki: the result is “wrong” in the sense that the method of updating beliefs is problematic. In particular, when agents get new information, the method of updating beliefs turns out to ignore some valuable information. I will make this statement clearer in a post tomorrow.

This entire line of reasoning makes you wonder whether, under common topologies of communication, we can guarantee convergence of beliefs in groups: or indeed, whether we can guarantee that “the boss”, somehow defined, knows at least as much as everyone else. This is the project I’m currently working on, and I hope to have results to share here by the end of the year.

## “Is Arbitrage Socially Beneficial?,” E. Glen Weyl (2007)

Job market time is coming up, which means I’m sure I’ll be posting more about some of the top job market papers. This short paper on arbitrage is not Glen’s JMP – more on that in a few days, as it is right up my alley and also the best JMP I’ve read this year – but interesting nonetheless. If you’re not familiar with Weyl, you may remember him as the guy who finished his PhD in one year at Princeton a few years back before signing on as a Harvard Fellow.

This paper concerns whether arbitrage is useful. The standard explanation is simple: arbitrage, by creating a market where one may not already exist, allows more efficient allocation of risk, and to the extent that such efficient allocation is useful in determining which projects are socially useful to pursue, allows more efficient production across the economy as a whole: a useful function rather than the biblical “den of thieves”.

But what if consumers disagree, subjectively, about the payoff of an asset? One may believe it is distributed N(-r,1) and another N(r,1), when ex-post, we will learn the true distribution is N(0,1). Assume that the two consumers cannot trade, and both hold zero units of the asset. If an arbitrageur comes along, the asset will be relatively cheaper in the first market, and so the arbitrageur will buy cheap there and sell dear in the second market. Since risk-averse individuals would not want any of the asset if they knew the asset was N(0,1), this arbitrage-induced trade will lower ex-post “objective” utility. If the model is broadened to allow that asset prices also help effectively allocate capital to the right projects, and hence the role of arbitragers in collecting information is incorporated into the model, it is still easy to construct examples where arbitrage lowers utility, ex-post.
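
A back-of-the-envelope version of this logic, using the standard CARA-normal certainty equivalent rather than anything specifically from Weyl’s paper (all parameter values are mine):

```python
def certainty_equivalent(q, mu, sigma2, a):
    """Certainty equivalent of holding q units of a N(mu, sigma2) asset
    for a CARA investor with risk aversion a (standard closed form)."""
    return q * mu - 0.5 * a * q ** 2 * sigma2

a, r, sigma2, price = 2.0, 0.5, 1.0, 0.0
q_star = (r - price) / (a * sigma2)   # optimist's demand at this price: 0.25
ce_subjective = certainty_equivalent(q_star, r, sigma2, a)    # +0.0625
ce_objective  = certainty_equivalent(q_star, 0.0, sigma2, a)  # -0.0625
# Under his own belief the position looks good; under the true N(0,1),
# any nonzero position in a zero-mean asset strictly lowers welfare.
```

The arbitrageur makes both sides happier ex-ante, by their own lights, while loading risk-averse agents with risk that is, ex-post, pure loss; that is the “objective utility” tension in one line.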

The point is well-taken, and Weyl strikes me as having a Samuelsonian writing style that skips the fluff and goes straight to the most salient aspects of a theoretical model. That said, I have two worries. First, I don’t really know what “objective utility” is. If consumers are subjective expected utility maximizers, then nothing in this model suggests that their ex-ante maximizing selves are made worse off by arbitrageurs. Indeed, as discussed on this site, there is an interesting line of research about how social planners should deal with the problem of “contradictory” subjective beliefs; I think of this as the “Problem of the Duel”, reflecting Gilboa’s example of two sharpshooters in a potential duel who (impossibly) both believe themselves to be the quicker draw.

The second problem concerns Aumann-style information flow. Why don’t I update my beliefs in this model when I see the existence of an arbitrageur willing to buy or sell the asset? I don’t even think you need Myerson-Satterthwaite to generate a no-trade situation here. A simple model of asking an (honest) arbitrageur if he is still willing to buy, then updating, then asking again, ad infinitum, should cause convergence of beliefs as in Geanakoplos. If the arbitrageur is not honest, then some sort of mechanism problem needs to be written out. In any case, even a single and unavoidable information update would change subjective beliefs and therefore change the welfare calculations.

Regardless, the basic point is that there are massive methodological issues with “objective” social welfare which are not easy to paper over; credit to Weyl, at least, for acknowledging this, and for pointing out that even if you reject the existence of “objective” social welfare, the fact that his model leads risk to be allocated to people other than the most risk-loving is itself troubling in many contexts beyond consumer welfare.

## “Axioms for Deferred Acceptance,” F. Kojima & M. Manea (2010)

Many allocation problems in economics deal with divisible objects that can be allocated using “money” in the broadest sense of the word. Not everything is this way, however. There are many interesting problems that involve indivisible goods (such as slots in a magnet school) and for which there are legal/ethical problems with using money to allocate (think of kidney exchanges). Given preferences over objects, “optimal” allocation given individual preferences is called the Assignment Problem, and has long been studied in economics and operations research. Gale and Shapley – mathematicians originally, though Shapley in particular is well known to economic theorists for his work on cooperative games – introduced the “deferred acceptance algorithm” in a 1962 paper about college admissions (though this paper is most famous for an example within it about stable marriages). Today, much of the work on these types of indivisible allocation problems is associated with the Cambridge schools.

Deferred acceptance works as follows: everyone ranks the objects (strictly), possibly including “empty set”, which is some outside option, such as “marry no one”, and each object “ranks” the individuals, by which we mean something like “each public school has a priority list for who they wish to accept, ranked based on test scores”. In the first round, every individual “applies” to their top choice, and each object accepts, for the time being, the individuals highest in that object’s priority list until that object’s quota is filled (e.g., in the marriage market, the quota is 1), and rejects everyone else. In the second round, everyone not yet assigned applies to their second choice, and objects select any individuals who apply and are higher in that object’s priority list than somebody who was selected in the first round. This continues until all agents are accepted by an object or by “empty set”, which has no quota. It has long been known that DA is Pareto-optimal among all stable matchings, where stable means that no individual and object mutually prefer each other to their assigned match, and that DA induces truthful revelation of preferences as a dominant strategy for all agents (this is pretty easy to see given how the algorithm works, though note that DA is strategy-proof only for one side of the market; when you have an assignment problem where incentive problems crop up for both the individuals and the “objects”, you will only have strategy-proofness for the side that proposes matches, which is the individuals in the description above. For instance, Al Roth has pointed out in a series of papers that hospitals, when dealing with kidney transplants, and schools, when dealing with student allocation, also will game the system, and DA will not prevent this).
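
For concreteness, here is a bare-bones student-proposing DA in code (my own sketch of the textbook algorithm, processing one applicant at a time; the students, schools, and priorities are hypothetical):

```python
def deferred_acceptance(student_prefs, priorities, quotas):
    """Student-proposing deferred acceptance. Students apply down their
    preference lists; each school tentatively holds its best applicants
    (by priority) up to quota and rejects the rest."""
    rank = {s: {i: k for k, i in enumerate(lst)} for s, lst in priorities.items()}
    next_choice = {i: 0 for i in student_prefs}   # pointer into each pref list
    held = {s: [] for s in priorities}            # tentatively accepted
    free = list(student_prefs)                    # students still applying
    while free:
        i = free.pop()
        if next_choice[i] >= len(student_prefs[i]):
            continue                              # list exhausted: unmatched
        s = student_prefs[i][next_choice[i]]
        next_choice[i] += 1
        held[s].append(i)
        held[s].sort(key=lambda j: rank[s][j])    # best priority first
        if len(held[s]) > quotas[s]:
            free.append(held[s].pop())            # bump the lowest priority
    return {i: s for s, accepted in held.items() for i in accepted}

students = {'s1': ['A', 'B', 'C'], 's2': ['A', 'C', 'B'], 's3': ['B', 'A', 'C']}
priorities = {'A': ['s2', 's1', 's3'], 'B': ['s1', 's3', 's2'], 'C': ['s1', 's2', 's3']}
quotas = {'A': 1, 'B': 1, 'C': 1}
match = deferred_acceptance(students, priorities, quotas)
# s1 is bumped from A by s2 and lands at B; s3, bumped from B by s1 and
# rejected at A, ends up at C: match == {'s2': 'A', 's1': 'B', 's3': 'C'}
```

The “deferred” part is visible in `held`: acceptances are only tentative until the loop terminates, which is exactly what makes the outcome stable.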

Kojima and Manea, unquestionably two of the biggest stars in the theory job market over the last few years, note in this recent Econometrica that despite the importance of DA, no one had yet given a representation theorem for DA in terms of more basic conditions on an allocation rule. They show that, even if we have no idea what the priority rule is (technically, as long as the priority rule is acceptant and substitutable, but these conditions are not worrisome), any allocation rule which is nonwasteful, meaning that if an agent prefers an object to “empty set” and that object has not filled its quota then the agent is assigned the object, and IR monotonic, meaning that if everyone’s preferences change such that some agents have fewer objects preferred to the empty set then no agent becomes worse off after the change, must be DA. It’s indeed surprising that you can generate this result without reference to the structure of the objects’ priority rules. An alternative characterization replaces IR monotonicity with “weak Maskin monotonicity” (which should sound familiar to you auction theorists) and a condition called population monotonicity and derives the same result. Finally, if you specify a priority structure C and an allocation rule, then as long as standard Maskin monotonicity is satisfied, the allocation rule will be Pareto efficient and strategy-proof even when groups can correlate their strategies.

(Sidenote: You may be wondering, at this point, why we need Maskin monotonicity to guarantee Pareto efficiency, since DA appears to always gives a “Pareto efficient” match. It is not terribly clear in the paper – perhaps this point is too obvious for theorists in this area – but DA only gives Pareto efficient allocations within the class of stable allocations. It says nothing about Pareto efficiency more generally.)

This paper also suggests that there are many, many other algorithms, or simplifications, or what have you, for which it would be nice to see representation theorems. I can think of one example in particular concerning knowledge diffusion, which I will write about further here when I have some better developed results. Less interesting to me, but perhaps to the behavioralists here, would be representation theorems for various heuristics. That is, if firms or individuals are using some rule of thumb, what exactly does that rule of thumb mean in terms of more general beliefs/behaviors/actions?

## “Quantile Maximization in Decision Theory,” M. Rostek (2010)

Savage famously provided a representation theorem for subjective expected utility maximization, where both a utility over outcomes and a subjective probability measure are derived from preferences. The problem with his technique is that we are often interested in whether agents have probability distributions in mind – that is, whether they are “probabilistically sophisticated” – even if they are not EU maximizers. In this recent ReStud, Rostek provides a representation for quantile maximizers, such as those that maximize the median outcome, or the minimum outcome (“maximin”), etc. It turns out such maximizers are probabilistically sophisticated when the quantile being maximized is not extreme – that is, when the agent is maximizing any percentile except 0 or 1.

Non-theorists have considered quantile maximization (particularly its treatment of downside risk), and this paper provides a useful behavioral characterization. I would only note that, in a normative sense, I still can’t think of any reason an individual would maximize quantiles. As the author notes, maximizing the median means ignoring all information (such as the spread) outside the median. For instance, a median maximizer (with u(x)=x) prefers the lottery (.49,\$0,.51,\$10000) to (1,\$9999). As has been mentioned on this site before, counterexamples are not “refutations” of economic theories – we know theories in social science are always wrong. Nonetheless, it’s tough for me to see applications to applied work here. The examples given in the paper are interesting (for instance, a social planner should use the median rather than the mean when maximizing, in a utilitarian sense, public goods provision, since this helps overcome information problems), but don’t seem to necessitate a representation theorem for quantile maximization.
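
The lottery comparison above is easy to verify directly (a trivial sketch of my own, using the convention that the q-quantile is the smallest outcome whose cumulative probability reaches q):

```python
def quantile(lottery, q):
    """The q-quantile of a discrete lottery [(prob, outcome), ...]: the
    smallest outcome whose cumulative probability reaches q."""
    cum = 0.0
    for p, x in sorted(lottery, key=lambda t: t[1]):
        cum += p
        if cum >= q:
            return x

def expected_value(lottery):
    return sum(p * x for p, x in lottery)

risky = [(0.49, 0), (0.51, 10_000)]
safe = [(1.0, 9_999)]
# The median maximizer takes the gamble (median 10000 beats 9999), even
# though the gamble pays nothing with probability .49 -- everything away
# from the median is simply ignored.
```

An expected-value maximizer, by contrast, ranks these the other way around, which is exactly the spread-blindness complained about above.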

In any case, the proof technique is quite different from Savage (in particular, likelihood relations are not used), so the paper is worth a look for that alone.

http://www.ssc.wisc.edu/~mrostek/Quantile_Maximization.pdf

## “Expected Utility Theory without the Completeness Axiom,” J. Dubra, F. Maccheroni & E. Ok (2004)

(Apologies for the light posting recently. Over the next week, I plan on posting a number of (relatively) recent results from decision theory, then returning to standard fare thereafter. The decision theory results tend to be technical, so I’m going to assume some working knowledge of von Neumann and Savage results.)

The classic von Neumann/Morgenstern representation theorem for expected utility is powerful in that it makes only five assumptions about preferences and their domain: preferences are complete (every pair of objects can be compared), they are transitive, they are continuous, the domain is a Polish space, and they have the independence property (if A is preferred to B, then .8A plus .2C is preferred to .8B plus .2C). Deviations from the final property are well-studied (and will be discussed on this page shortly). The third property is a technical assumption, and does rule out things like lexicographic preferences on lotteries, but is generally not considered problematic. The Polish space (a separable, completely metrizable space) is broad enough to encompass R^n and compact metric spaces, so assumption 4 is not worrying. The second property, transitivity, is normatively compelling (the “money pump”) and empirically justifiable depending on how the preference domain is specified; economists tend not to worry about transitivity, though my philosopher friends generally do not find such an assumption normatively appealing. This paper discusses deviations from the first assumption, completeness.

I think even hardened economists would agree that completeness is neither true of real-world preferences nor normatively appealing – there are many pairs of objects, particularly obscure objects like two-stage AA lotteries, on which I have no preference relation. More commonly, when the agent is a firm or a household, Arrow-style results tell us that we often have no way of aggregating preferences to choose whether A or B is preferred. The authors show that if the domain of preferences is restricted to compact metric spaces, and if completeness is replaced by reflexivity (A is weakly preferred to itself), then a unique “multiutility” representation theorem still exists. That is, there is a unique (in the sense of “biggest”) set U of utility functions such that lottery a is weakly preferred to lottery b iff the expectation of u under a is at least the expectation of u under b for every continuous function u in U. The proof involves cleverly defining a convex cone, applying an infinite-dimensional version of the separating hyperplane theorem, and noting that the polar cone of the convex cone defined earlier gives the set U; details are in the paper.
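
A toy version of the multiutility idea (my own illustration, not the authors’ construction): with a set of just two utility functions, some lotteries are ranked unanimously while others are simply incomparable, which is exactly what dropping completeness buys you.

```python
import math

def weakly_preferred(a, b, utilities):
    """a is weakly preferred to b iff every utility function in the set
    gives a at least as high an expected utility as b."""
    def eu(lottery, u):
        return sum(p * u(x) for p, x in lottery)
    return all(eu(a, u) >= eu(b, u) for u in utilities)

U = [lambda x: x, lambda x: math.sqrt(x)]   # the agent's set of utilities
a = [(0.5, 0), (0.5, 100)]   # expected utilities: 50 and 5
b = [(1.0, 49)]              # expected utilities: 49 and 7
c = [(1.0, 16)]              # expected utilities: 16 and 4
# a vs. b: the two utilities disagree, so the pair is incomparable.
# a vs. c: unanimous agreement, so a is weakly preferred to c.
```

The Dubra-Maccheroni-Ok theorem says this unanimity structure is not just one possibility: any preorder satisfying their axioms looks like this for some (essentially unique) set U.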

http://cowles.econ.yale.edu/P/cd/d12b/d1294.pdf (WP version – final version in JET 2004)

## “The Ambiguity Aversion Literature: A Critical Assessment,” N. al-Najjar & J. Weinstein (2009)

A wide class of decision theoretic models have recently attempted to model “ambiguity”, or the lack of firm knowledge about a probability. This is important because, in the traditional Bayesian literature, there is zero distinction between believing p(a)=p(b)=.5 because I don’t know what the true probability is and believing p(a)=p(b)=.5 because I am certain, after updating on many pieces of data, that .5 is about right for each. The best-known model of ambiguity aversion is the multiple priors model of Gilboa & Schmeidler, in which agents have a set of priors over some events and, given a decision, will assume that the “worst-case” prior is the true one. This model essentially comes from dropping the Sure Thing principle in Savage, and it allows Ellsberg-type paradoxes to be rationalized.
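
A quick sketch of how multiple priors rationalizes Ellsberg (my own toy numbers, using the classic urn with 30 red balls and 60 balls that are black or yellow in unknown proportion):

```python
def maxmin_value(payoff, priors):
    """Worst-case expected payoff of an act across a set of priors, as in
    the Gilboa-Schmeidler multiple-priors evaluation."""
    return min(sum(p * x for p, x in zip(prior, payoff)) for prior in priors)

# States (red, black, yellow): 30 of 90 balls are red; the black/yellow
# split is unknown, giving one prior per possible number of black balls.
priors = [(30 / 90, k / 90, (60 - k) / 90) for k in range(61)]
bet_red          = (1, 0, 0)
bet_black        = (0, 1, 0)
bet_red_yellow   = (1, 0, 1)
bet_black_yellow = (0, 1, 1)
# Maxmin yields the Ellsberg pattern: red over black, yet black-or-yellow
# over red-or-yellow -- a ranking no single prior can produce.
```

The worst-case prior shifts with the act being evaluated (all yellow when betting black, all black when betting red-or-yellow), which is precisely the feature that breaks the Sure Thing principle.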

Al-Najjar and Weinstein find such models wanting. First, models like multiple priors, in addition to rationalizing the Ellsberg paradox, also rationalize behavior like the sunk cost fallacy that nearly all economists would agree is normatively irrational. Second, these sorts of models lead to problems with dynamic updating. In particular, new information can lead an agent to change not just his probabilities about acts, but his preferences over those acts themselves. If the decisionmaker is “sophisticated” and does not accept such reversals, then the decisionmaker may ignore new information altogether even when that information is useful. Each of these behaviors is normatively unappealing.

The authors suggest that the “simple” answer for Ellsberg is the most sensible – that people use heuristics in unfamiliar situations such as laboratory experiments, and that people who make Ellsberg-type mistakes would not make them if the consequences of such behavior were explained. Of course, there are many explanations for Ellsberg aside from loosening the Sure Thing principle; I particularly like a version of multiple priors where, from the point of view of the analyst, the decisionmaker is using multiple priors, updating each element of that set according to Bayes, but then using any possible capacity (a mathematical generalization of a probability) consistent with those priors at a given decision. This model is lacking in that it can’t pin down behavior, but it also allows the decisionmaker to, for instance, avoid the sunk cost fallacy while still being susceptible to Ellsberg behavior in a lab.

http://www.kellogg.northwestern.edu/~/media/Files/Faculty/Research/ArticlesBookChaptersWorkingPapers/AmbiguityFinal.ashx (Final WP – final version published in Economics and Philosophy 25 (2009))

## “Information, Trade, and Common Knowledge,” P. Milgrom & N. Stokey (1982)

(An aside: While sitting in a lecture today – Al Roth was giving a talk on his organ donation chains – I was working through a result from Myerson’s famous optimal auction paper, and rather stumped on a technical point, I was rudely surprised to see Myerson himself sit down behind me. This inspired me to look back through a few of the famous early 80s papers on mechanisms and information transfer so as to avoid any embarrassment should one of these guys catch me fumbling through their proofs!)

This Milgrom/Stokey paper is the source of the famous “no-trade theorem”. Before Aumann and his followers showed just how strong the assumption of common knowledge is – and therefore how strong the informational requirements of a rational expectations model with public prices are – there was an assumption that traders could profit on their inside information if they were “small” relative to a market. This turns out to be false. Assume that beliefs are concordant; that is, if we agreed on the state of the world, we would agree on what outcome will occur. If traders are (even slightly) risk averse and we are at an equilibrium with known prices, then none of us will trade: anyone willing to accept my bet must have private information of his own, since all trades for insurance reasons have already occurred by the assumption that we are in equilibrium, and no one wants to be the “sucker”. If there are markets before and after private information is revealed, then there exists a fully revealing ex-post equilibrium; that is, we all learn everyone’s private signal by Aumann’s common knowledge result applied to the change in prices, and therefore update our beliefs about the true state identically, leading to a new price vector that reveals all the private information. Indeed, with some technical assumptions, any ex-post equilibrium, even one that is not fully revealing, will have prices that incorporate all of the private information available to an agent; he could forget his private signal and will nonetheless be willing to trade in exactly the same way.

The no-trade theorem is a bit worrying, since we do in fact see people trying to trade on private information all the time, even in markets (like a stock market) where the prices are surely common knowledge. A great amount of work has tried to escape this conclusion – a particularly successful argument involves a small subset of ignorant traders (“noise traders”) whose existence suffices to break common knowledge of rationality and allow for trade to resume even among the non-noise traders.