Here’s a paper that really deserves to be better known. Last week, I mentioned a result by Foster and Vohra that says that completely uninformed people can pass calibration tests if they are allowed to make predictions that are mixed strategies. Recall that being calibrated means that, among the periods when a forecaster predicts, say, rain with probability .3, it actually rains roughly 30 percent of the time. An application of the minimax theorem says that I can “fool” a calibration test by playing a suitably complex mixed strategy even when I, as a forecaster, actually have no idea what probability distribution nature is playing.
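To fix ideas, here is a minimal sketch of what a calibration check computes (my own toy illustration in Python, not the formal test from the literature): group the periods by the forecast issued, then compare each group’s empirical rain frequency to the forecast itself.

```python
# Minimal calibration check: for each distinct forecast value, compare the
# empirical frequency of rain among the periods carrying that forecast to
# the forecast itself. Small gaps for every value = "calibrated".
from collections import defaultdict

def calibration_gaps(forecasts, outcomes):
    """Map each forecast value p to |empirical rain frequency - p|."""
    buckets = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        buckets[p].append(y)          # y is 1 if it rained that period, else 0
    return {p: abs(sum(ys) / len(ys) - p) for p, ys in buckets.items()}
```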
Now you might think that this result implies calibration tests are too “weak” in some sense. That is, if nature is playing {0,1,0,1,…} and you predict .5 every period, then you are calibrated, but in no real sense are you making good predictions. A series of papers following Foster-Vohra (Hart and Mas-Colell have a couple, as do Kalai, Lehrer and Smorodinsky) looked at stronger and stronger versions of calibration, such as tests requiring predictions to be calibrated on subsequences as well, but kept coming up with the result that a clever mixed strategy could still fool the tests.
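Running a check like the one above on the alternating sequence makes the weakness concrete (again, just my own illustration):

```python
# Nature plays the deterministic sequence 0,1,0,1,...; the forecaster predicts
# a .5 chance of rain every period. The empirical rain frequency among the
# ".5" forecasts is exactly .5, so the forecaster is perfectly calibrated
# while knowing nothing useful about nature.
n = 1000
outcomes = [t % 2 for t in range(n)]      # 0,1,0,1,... (1 = rain)
forecasts = [0.5] * n                     # the uninformed forecaster: always .5
empirical = sum(outcomes) / n             # fraction of rainy periods
print(abs(empirical - 0.5))               # 0.0 -- passes the calibration check
```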
In the present paper, Sandroni shows that any test can be manipulated. That is, let a tester choose some test that, no matter what probability distribution nature is playing, will “pass” someone who actually knows that distribution with probability arbitrarily close to 1. Another fairly simple application of the minimax theorem (Fan’s theorem for infinite games, in this case) shows that a fake forecaster who does not know the true distribution can still pass that test! That is a devastating result, as far as I’m concerned, for our ability to judge science.
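Loosely, the structure of the argument runs as follows (my paraphrase and notation, not the paper’s):

```latex
% Fix a horizon n and a test T that, for every distribution P over outcome
% sequences, passes a forecaster who announces P with probability at least
% 1 - epsilon. View manipulation as a zero-sum game between a (possibly
% randomizing) forecaster, choosing a mixed forecasting scheme zeta, and
% nature, choosing P; the payoff is the probability of passing T:
\[
  \sup_{\zeta}\,\inf_{P}\;\Pr_{\zeta,P}[\mathrm{pass}]
  \;=\;
  \inf_{P}\,\sup_{\zeta}\;\Pr_{\zeta,P}[\mathrm{pass}]
  \;\ge\; 1-\varepsilon .
\]
% The equality is the minimax (Fan's) theorem; the inequality holds because,
% against any fixed P, one admissible scheme is simply to announce P itself,
% which passes with probability at least 1 - epsilon by assumption on T. The
% left-hand side then delivers a single mixed scheme, constructed without
% knowing P, that passes the test with probability at least 1 - epsilon
% against every P.
```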
It may not be obvious what this result means. If nature is playing, say, “rain with p=1 every period”, why not just use the test “your prediction every period must be exactly what nature does, or you fail”? In that case, someone who knew what nature was doing would pass, but the fake predictor surely wouldn’t. The problem with that (false) counterexample is that the proposed test does not pass a true forecaster with probability close to 1 no matter what distribution nature is playing, and there is no way for the tester to know in advance that nature is playing something so deterministic. If nature were actually playing “rain with p=.99 every period”, the proposed test would fail the knowledgeable forecaster because, simply by the draw of the probability distribution, he would sometimes predict rain when nature draws sun.
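As a quick back-of-the-envelope check (my own simulation, not anything from the paper), here is how badly that naive exact-match test treats a genuinely informed forecaster when nature plays rain with p=.99:

```python
# The true expert knows nature rains with probability .99 each period, so the
# best she can do under the exact-match test is predict rain every period.
# Over 500 periods she passes only if it rains every single time, which
# happens with probability 0.99**500, roughly 0.0066.
import random

random.seed(0)
n, p, trials = 500, 0.99, 10_000
passes = sum(all(random.random() < p for _ in range(n)) for _ in range(trials))
print(passes / trials)   # around 0.0066 -- the informed forecaster almost always fails
```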
A couple of caveats are in order. First, tests here are restricted to finite tests in the sense that we test at period n whether the test has been passed, using the predictions and draws of nature up to period n. I will discuss next week results by Dekel and Feinberg, as well as by Eran Shmaya, which deal with “infinite” predictions; measure-theoretic issues make those proofs much less intuitive than the one presented here (though perhaps this is only on the surface, since the present proof uses Fan’s theorem, which uses Hahn-Banach, which uses the Axiom of Choice…). Second, you might wonder why I care about testing experts in and of itself. There doesn’t seem to be any decision theory here. Might I not be fine with accidentally letting through a “fake” expert if her false prediction is not terribly harmful to whatever decision I’m going to make? For instance, as a classmate of mine pointed out, even if I as the tester don’t know the true distribution of nature, perhaps I only need to know (for utility purposes) who the expert is if nature is playing a deterministic strategy. In that case, I might be fine with a test that “rejects” true predictions when nature is playing a strategy that is not deterministic, as long as I can tell who the real expert is when nature is deterministic. Is such a test possible? Results like these, at the intersection of decision theory, mechanism design, and the statistical literature, really look like the present frontier of this line of research, and I’ll discuss some of them in the next few weeks.
http://www.springerlink.com/content/9vn5dw3xnun9j0dm/fulltext.pdf (Final version – it appears to be ungated at the official IJGT website, but let me know in comments if this doesn’t open for you…)
First, forgive my ignorance in this comment because the topics you cover in this blog are usually well beyond my expertise. (I follow it to broaden my horizons.)
So my question: Does this basically mean that we can never determine expertise with certainty? (Can we always be fooled by non-experts?) I ask because I’ve been reading up on various aspects of expertise (cognitive, behavioral, etc.) and it seems that expertise is harder to obtain than we realize, more fragile than we realize, and harder to identify than we realize.
2nd question: If it’s impossible to determine expertise with certainty, what is the optimal strategy for maximizing the benefits of expertise that does exist?
Always glad to hear from readers – the papers are beyond my expertise until I read them, too!
I actually have answers to both your questions coming up on the site shortly, but I’ll give you the brief answer now. First, you cannot test experts *in isolation*, but you can, for instance, test potential experts against each other and learn who is “better”. The heart of the matter is that we never see the actual process nature is using to generate outcomes (like whether it rains or not), but only the realization (it rained today). The realization (rain) is not the same as the underlying process (nature is actually playing a 30% chance of rain today, and just happens to draw “rain”). That leaves a lot of scope for fooling tests. But I think the right way to put it is that finding the expert with certainty is hard, not that it is impossible.
For the second, I will address that directly over the next month, but it turns out there are mechanisms which allow us to ensure that, even when we are fooled by a fake expert, we are only harmed in a very minor way. So for decision-making, the inability to find experts might not be too problematic.