I’m helping run a reading group this semester on a subfield of economics called “expert testing,” so I’ll be posting a number of notes on the topic over the next couple of months, as well as an annotated reading list at the end of March. The subfield in many ways develops out of the calibration literature in statistics, for which this article by Dawid in JASA is the classic reference.
Consider a forecaster who tries to predict a stream of draws by nature. For instance, let nature choose a probability of rain every period, and let a weatherman likewise try to predict this probability, where each forecast is measurable with respect to all weather observed through yesterday (that is, it can depend only on past weather). How should we judge the accuracy of such forecasts? This turns out to be a highly nontrivial problem. “How close you are to being right about the true distribution” doesn’t work, since we only see the draw from nature’s mixed strategy (rain or no rain) and not the underlying strategy.
Dawid proposes calibration as a criterion. Let nature play and the forecaster predict over an arbitrarily long sequence of days. Consider all the days where the forecaster projects rain with probability x, say 30 percent. Then a forecaster is well-calibrated if, in fact, it rained on 30 percent of those days. Calibration is, at best, a minimal property for good forecasting. For instance, just predicting the long-run probability of rain, every day, will ensure a forecaster is well-calibrated.
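To make the definition concrete, here is a minimal sketch in Python. The function name `calibration_table` and the ten-day weather record are made up for illustration, and grouping days by exact forecast value is a simplification (real forecasts would usually be binned). It groups days by the announced probability and compares that probability with the share of those days on which it rained.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Map each distinct forecast p to (number of days, share of those days with rain)."""
    days_by_forecast = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        days_by_forecast[p].append(rained)
    return {p: (len(days), sum(days) / len(days)) for p, days in days_by_forecast.items()}


# Made-up record: 1 = rain, 0 = no rain; it rains on 3 of 10 days.
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

# Forecaster A announces the long-run frequency every single day.
base_rate = sum(outcomes) / len(outcomes)
constant_table = calibration_table([base_rate] * len(outcomes), outcomes)

# Forecaster B happens to call every day exactly right.
perfect_table = calibration_table([float(o) for o in outcomes], outcomes)

print(constant_table)
print(perfect_table)
```

In both tables the empirical frequency equals the forecast in every group, so both forecasters are well-calibrated on this record, even though the second is far more informative than the first. That is the sense in which calibration is only a minimal property.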
Dawid proves that Bayesian agents cannot be subjectively miscalibrated, assuming that forecasters sequentially make predictions from a fixed joint distribution, conditioning on all past data (i.e., on when it has and has not rained in the past), and assuming that forecasters are coherent, a term due to de Finetti that essentially means the forecaster’s subjective probabilities obey the usual rules of probability. That is, after making sufficiently many predictions, a forecaster must believe that rain will occur on a fraction p of the days on which rain was predicted with probability p. Forecasters cannot believe themselves miscalibrated, no matter what evidence they see to the contrary. The basic reason is that, at time zero, the forecaster had already computed in a coherent way what he will predict conditional on seeing the history that he in fact sees. If he wants to “change his mind” when later making his sequential forecasts (say, after predicting snow over and over when it had in fact not snowed), he would essentially need to have two different subjective probabilities in his head, the original conditional one and the new conditional one. This would violate coherence.
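For readers who prefer symbols, the claim can be sketched roughly as follows. The notation is chosen for this note and may not match the paper's exactly: Y_i is 1 if it rains on day i, Π is the forecaster's subjective joint distribution, p_i is the day-i forecast, and ξ_i is 1 if day i belongs to the group being checked (for example, the days with p_i = 0.3), where the choice of ξ_i may depend only on the past.

```latex
% Forecast for day i, given the observed past:
p_i = \Pi\left(Y_i = 1 \mid Y_1, \dots, Y_{i-1}\right)

% Number of selected days, empirical rain frequency, and average forecast,
% all computed over the selected days among the first k:
\nu_k = \sum_{i=1}^{k} \xi_i, \qquad
\rho_k = \frac{1}{\nu_k} \sum_{i=1}^{k} \xi_i Y_i, \qquad
\pi_k = \frac{1}{\nu_k} \sum_{i=1}^{k} \xi_i p_i

% Calibration claim: if the selected group grows without bound, then
\nu_k \to \infty \;\Longrightarrow\; \rho_k - \pi_k \to 0
\quad \text{with } \Pi\text{-probability one.}
```

The "probability one" here is under the forecaster's own Π, which is why the result is about what the forecaster believes and not about what nature actually does.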
Now, this might not be a problem: perhaps there is a joint distribution over histories that an agent can use to forecast and that can never become miscalibrated. That is, my forecasting technique is so good that whatever stream of weather nature throws my way, my predictions are calibrated with that reality. Unfortunately, it is very easy to construct a “malevolent” nature (that is, a nature playing minimax against an agent trying to predict it) who will cause any forecasting system to miscalibrate. Dawid and Oakes, both in 1985 articles in JASA, produce simple examples. Basically, if an agent’s probability distribution forecast conditional on history A says to predict rain with probability more than .5, then nature plays sun, and vice versa. In this way, the agent is always poorly calibrated. The implication, essentially, is that no forecasting system (that is, no econometric technique, no matter how sophisticated) will always be able to model natural processes. The implication for learning in games is particularly devastating, because even if we think nature doesn’t try to fool us, we certainly can believe that opposing players will try to fool our forecasting algorithm in competitive games of learning.
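The malevolent-nature construction is simple enough to simulate. The sketch below assumes a deterministic forecaster whose prediction is a function of the past history only. The names `adversarial_nature`, `run`, and `running_frequency` are made up for illustration, and the smoothed running-frequency rule is just one stand-in forecaster, not one taken from the papers.

```python
def adversarial_nature(forecast):
    """Play sun (0) when rain is forecast with probability above 0.5, else rain (1)."""
    return 0 if forecast > 0.5 else 1


def run(forecaster, n_days):
    """Let the forecaster predict from past history; nature responds after seeing the forecast."""
    history, forecasts = [], []
    for _ in range(n_days):
        p = forecaster(history)
        forecasts.append(p)
        history.append(adversarial_nature(p))
    return forecasts, history


def running_frequency(history):
    """Stand-in forecaster: smoothed share of past rainy days (starts at 0.5)."""
    return (sum(history) + 1) / (len(history) + 2)


forecasts, history = run(running_frequency, 20)

# Smallest distance between any day's forecast and what actually happened.
worst_case_gap = min(abs(outcome - p) for p, outcome in zip(forecasts, history))
print(worst_case_gap)
```

On every day the forecast is at least 0.5 away from the realized outcome. So in any group of days sharing a forecast p, the empirical rain frequency is 1 when p is at most 0.5 and 0 otherwise, and the calibration gap for that group never falls below one half. Swapping in any other deterministic rule for `running_frequency` gives the same result, since nature only needs to see the forecast before moving.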
http://fitelson.org/seminar/dawid.pdf (Final version – published in JASA 1982)