“The Law of Genius and Home Runs Refuted,” J. Dinardo & J. Winfree (2010)

Power laws have been invoked to describe any number of distributions, from the length of words to the size of cities and ant colonies to the number of home runs hit in a season by professional ballplayers. It is often assumed that power-law data are generated by self-organized critical processes. For example, sand dripping onto the center of a sand pile will, for chaotic reasons, sometimes cause no movement in the pile and sometimes cause “sand avalanches”. The problem is that looking at the distribution of the data does not allow one to infer the dynamic process that generated it. The existence of a process in which there is “no reason” but randomness for a distribution to appear as a power law does not imply that the data were in fact generated by that process.

Further, the fact that data appear to follow a power law does not mean that they in fact do. The reason is that the “special characteristic” of a power law, its fat tail, is by definition of “tail” not observed frequently in the data. For many datasets where power laws are claimed, simple likelihood ratio comparisons against other, non-power-law distributions will often reject the power law as the correct distribution. For these reasons, Dinardo and Winfree, along with Perling (2005) and Solow et al. (2003), among others, urge skepticism when facing particular claims about the data generating process of data that look like a power law.
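
To make the likelihood ratio idea concrete, here is a minimal sketch in Python. It is not the procedure used by Dinardo and Winfree; it only illustrates the general technique under some simplifying assumptions: the power law is continuous, its lower cutoff is taken to be the sample minimum, the alternative is an ordinary (untruncated) lognormal, and the two fits are compared with a Vuong-style normalized log-likelihood ratio. The function names and the simulated samples are made up for illustration.

```python
import math
import random
import statistics


def pareto_loglik(xs):
    """Pointwise log-likelihoods under a continuous power law fitted by MLE.

    Assumption: the lower cutoff xmin is set to the sample minimum.
    """
    xmin = min(xs)
    logs = [math.log(x / xmin) for x in xs]
    alpha = 1.0 + len(xs) / sum(logs)  # MLE of the exponent
    const = math.log(alpha - 1.0) - math.log(xmin)
    return [const - alpha * v for v in logs]


def lognormal_loglik(xs):
    """Pointwise log-likelihoods under a lognormal fitted by MLE."""
    logs = [math.log(x) for x in xs]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)  # MLE uses the population std. dev.
    const = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
    return [const - v - (v - mu) ** 2 / (2.0 * sigma ** 2) for v in logs]


def vuong_z(xs):
    """Normalized log-likelihood ratio, power law versus lognormal.

    Positive values favor the power law, negative values the lognormal.
    """
    d = [a - b for a, b in zip(pareto_loglik(xs), lognormal_loglik(xs))]
    return sum(d) / (math.sqrt(len(d)) * statistics.pstdev(d))


rng = random.Random(0)

# Simulated lognormal data: heavy-tailed, but not a power law.
lognormal_sample = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
z_lognormal = vuong_z(lognormal_sample)

# Simulated power-law data; paretovariate(1.5) has density exponent 2.5.
pareto_sample = [rng.paretovariate(1.5) for _ in range(5000)]
z_pareto = vuong_z(pareto_sample)

print("lognormal data:", z_lognormal)
print("power-law data:", z_pareto)
```

The sign of the statistic indicates which distribution the data favor, and its magnitude can be read roughly like a standard normal score. A careful analysis would also estimate the lower cutoff rather than fix it at the minimum, and would truncate the alternative distribution at that same cutoff.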

