In the last post, I wrote about Dawid’s result that no forecasting technique, no matter how clever, will be able to calibrate itself against nature in a coherent way. Is there a way to save calibration? Foster and Vohra claim yes: let forecasters play mixed strategies. That is, rather than predict a 40 percent chance of rain given the history observed and beliefs about what nature will do this period, instead play a strategy that predicts a 60 percent chance of rain with .5 probability and a 20 percent chance of rain with .5 probability. Though in expectation (I’m abusing the term slightly) this strategy still predicts a 40 percent chance of rain, the prediction now follows a distribution rather than being a single point.
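To make the idea concrete, here is a minimal sketch of such a mixed forecasting strategy; the function name and the particular numbers (0.6 and 0.2, the example from above) are just illustrations, not anything from the paper.

```python
import random

# A mixed forecasting strategy: rather than announcing 0.4 outright,
# randomize between announcing 0.6 and 0.2 with equal probability.
def mixed_forecast(rng=random.random):
    return 0.6 if rng() < 0.5 else 0.2

draws = [mixed_forecast() for _ in range(100_000)]
print(sum(draws) / len(draws))  # close to 0.4 on average
```

The point is that nature, even knowing this rule, cannot know which announcement will actually be made on any given day.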
Foster and Vohra let nature choose its joint distribution after seeing the forecaster’s joint distribution. Nature tries to make the agent as poorly calibrated as possible, and is even allowed to condition its time t strategy on the history of forecasts made by the agent up through time t-1. In particular, let p be a forecast (from a finite set A of arbitrary fineness) of how often rain arrives, let q(p) be the fraction of days it actually rains when p is forecast, and let n(p,t) be the number of days p is forecast up through time t. The calibration score at time t is then (q(p)-p)^2 times the proportion of days p is forecast, summed over all possible predictions p — that is, the sum over p of (n(p,t)/t)(q(p)-p)^2 — and an agent is well calibrated with respect to nature’s strategy if her forecasts drive this score to zero as time goes to infinity.
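The score is easy to compute from a history of forecasts and outcomes. Here is a minimal sketch, assuming forecasts come from a finite set as above (the function name is mine, not the paper’s):

```python
from collections import defaultdict

def calibration_score(forecasts, outcomes):
    """Sum over forecasts p of (n(p,t)/t) * (q(p) - p)^2, where
    q(p) is the empirical rain frequency on days p was forecast."""
    t = len(forecasts)
    counts = defaultdict(int)  # n(p, t)
    rains = defaultdict(int)   # rainy days among those on which p was forecast
    for p, rained in zip(forecasts, outcomes):
        counts[p] += 1
        rains[p] += rained
    return sum((counts[p] / t) * (rains[p] / counts[p] - p) ** 2
               for p in counts)

# Saying 0.4 every day while it rains on 4 of 10 days is perfectly
# calibrated; saying 0.9 on the same days is not.
history = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
print(calibration_score([0.4] * 10, history))
print(calibration_score([0.9] * 10, history))
```

Note that calibration says nothing about informativeness: the constant 0.4 forecaster above scores perfectly without ever distinguishing one day from another.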
Incredibly, there is a mixed strategy that is sufficient to defeat this malevolent nature. The proof relies on the minimax theorem. You can think of nature and the forecaster as playing a two-player zero-sum game. Since the set of forecasts is assumed finite (say, forecasts and nature only produce rain up to the ability to measure it, in increments of .01 inches, with a max of 50 inches in one day), von Neumann’s famous theorem applies, and I can just look for the value of the game.
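For a feel of what “look for the value of the game” means in the simplest case, here is a sketch for 2x2 zero-sum games using the classic closed form — an illustration of the minimax logic, not the construction in the paper, and it assumes the game has no pure-strategy saddle point (so both players genuinely mix):

```python
def value_2x2(a, b, c, d):
    """Value of the zero-sum game [[a, b], [c, d]], row player maximizing.
    Assumes no pure-strategy saddle point, so both players randomize;
    the standard closed form for the value then applies."""
    return (a * d - b * c) / (a + d - b - c)

# Matching pennies: each side mixes 50/50 and the value is zero --
# neither player can do better against an opponent who randomizes.
print(value_2x2(1, -1, -1, 1))
```

The forecasting game is far bigger than 2x2, of course, but finiteness is exactly what lets von Neumann’s theorem guarantee such a value exists there too.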
The proof in Foster-Vohra is algebraically tedious, but Fudenberg and Levine give a very simple technique for calibrating in their short followup published in GEB. Essentially, the agent should play each strategy K times, where K is sufficiently large (this is the “initialization stage”). After initialization (which is finite, and during which nature can only beat the agent by a total amount that is itself finite), every period can be considered a zero-sum stage game. Applying minimax again, nature will choose the strategy that worsens the agent’s calibration by the greatest amount, under the assumption that nature correctly forecasts the strategy the agent will use. The agent’s calibration score increases the most when nature plays the strategy the agent has used the least. But since every action has already been used many times, the amount by which nature can damage the agent’s calibration in any one period is bounded by an amount that is decreasing in K. From here, it is easy to show that the average increase in the forecaster’s error in any given period is bounded by an arbitrarily small number, and therefore that asymptotically the deviation from perfect calibration is bounded by an arbitrarily small number. That is, agents playing mixed strategies make it difficult for even the most malevolent nature to throw off their calibration.
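The key step — that nature’s one-period damage shrinks in K — can be illustrated numerically for a single forecast bin. This is my own toy illustration of the intuition, not Fudenberg and Levine’s actual bound: once a forecast p has been used K times, one more adversarial outcome can move the bin’s empirical frequency q(p) by at most 1/(K+1), so the worst one-step increase in its error shrinks as K grows.

```python
def bin_error(p, rains, n):
    """Squared calibration error of one forecast bin: (q(p) - p)^2."""
    return (rains / n - p) ** 2

def worst_one_step_damage(p, rains, n):
    """Largest increase nature can cause in this bin's error with a single
    extra outcome, choosing rain (1) or no rain (0) adversarially."""
    return max(bin_error(p, rains + r, n + 1) - bin_error(p, rains, n)
               for r in (0, 1))

# A perfectly calibrated bin (q = p = 0.5) after K plays: nature's best
# one-period damage shrinks as K grows.
for K in (10, 100, 1000):
    print(K, worst_one_step_damage(0.5, K // 2, K))
```

Summed over bins and weighted by n(p,t)/t as in the calibration score, this is why a long enough initialization stage leaves nature with only vanishing per-period damage.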
You may be wondering: why is calibration a good criterion for forecasters anyway? The proof here essentially says that I can be well calibrated even when I make completely uninformed forecasts for KN periods, where N is the number of possible predictions. Perhaps an asymptotic definition of good forecasting is not the most sensible? But, then, what rule should we use to judge forecasts? It turns out the answer to that question is not at all obvious; more importantly, economists have a very nice solution to the problem of what rule to use. Instead of choosing a test arbitrarily, why not show that every test in a class so big that it contains nearly any reasonable test will have the same problems as calibration? More on this forthcoming.
http://www.dklevine.com/archive/refs4468.pdf (Final published version, in Biometrika)