If you’ve read this site at all, you know that I see little value in “testing” economic theories, but if we’re going to do it, we ought at least to do it in a way that makes a bit of sense. There are a ton of studies testing whether agents (here meaning not just humans; Chen and coauthors have a series of papers about revealed preference and other forms of maximizing behavior in capuchin monkeys!) have preferences that can be described by the standard model: a concave, monotonic, continuous utility function that is time-invariant. Generally, the studies do find such maximizing behavior. But this may mean nothing: a test that is trivially easy to pass will never reveal a violation of utility maximization, and indeed lots of experiments and empirical datasets contain so little variation in prices that nearly any set of choices can be rationalized.
Beatty and Crawford propose a simple fix. Consider an experiment with only two goods and two price/income bundles. Each price/income bundle defines a budget line of feasible mixtures of the two goods, and since a maximizer with monotonic preferences spends all of their income, the share of income spent on Good A pins down the whole choice. If, say, 75% of income is spent on Good A under price/income bundle 1, then utility maximization might be consistent with spending anywhere between 0 and 89% of income on Good A under price/income bundle 2. Imagine drawing a unit square with “income share spent on Good A under price/income bundle 1” on the x-axis and “income share spent on Good A under price/income bundle 2” on the y-axis. Some pairs of choices will lie in a part of that square that is incompatible with utility maximization. The greater the proportion of the square’s area that is incompatible with utility maximization, the more demanding the test of utility-maximizing behavior. The idea extends in a straightforward way to tests with N goods and M choices.
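To make the square concrete, here is a minimal Python sketch of the two-good, two-budget case. It assumes the consumer spends all income, uses the Generalized Axiom of Revealed Preference (GARP) as the test of utility maximization, and estimates the incompatible area by drawing both income shares uniformly at random. The prices, incomes and function names (`bundle_from_share`, `violates_garp`, `rejection_area`) are hypothetical illustrations, not taken from the paper.

```python
import random


def bundle_from_share(share_a, prices, income):
    """Quantities (A, B) bought when a fraction share_a of income goes to
    good A and the rest to good B, so the whole budget is spent."""
    p_a, p_b = prices
    return (share_a * income / p_a, (1.0 - share_a) * income / p_b)


def cost(prices, bundle):
    """Expenditure needed to buy `bundle` at `prices`."""
    return prices[0] * bundle[0] + prices[1] * bundle[1]


def violates_garp(obs1, obs2):
    """GARP check for two observations, each a (prices, bundle) pair.

    With only two observations the revealed-preference relation needs no
    transitive closure: GARP fails exactly when each chosen bundle was
    affordable when the other was bought, and at least one of the two was
    strictly cheaper.
    """
    (p1, x1), (p2, x2) = obs1, obs2
    spent1, x2_at_p1 = cost(p1, x1), cost(p1, x2)
    spent2, x1_at_p2 = cost(p2, x2), cost(p2, x1)
    # x1 is revealed preferred to x2 if x2 was affordable when x1 was chosen
    x1_r_x2 = spent1 >= x2_at_p1
    x2_r_x1 = spent2 >= x1_at_p2
    # strict versions: the other bundle was strictly cheaper
    x1_p_x2 = spent1 > x2_at_p1
    x2_p_x1 = spent2 > x1_at_p2
    return (x1_r_x2 and x2_p_x1) or (x2_r_x1 and x1_p_x2)


def rejection_area(budget1, budget2, draws=100_000, seed=0):
    """Monte Carlo estimate of the share of the unit square of
    (share on A under budget 1, share on A under budget 2) that violates
    GARP. Each budget is a ((p_a, p_b), income) pair; the hypothetical
    random chooser draws each income share uniformly from [0, 1]."""
    rng = random.Random(seed)
    (p1, m1), (p2, m2) = budget1, budget2
    violations = 0
    for _ in range(draws):
        x1 = bundle_from_share(rng.random(), p1, m1)
        x2 = bundle_from_share(rng.random(), p2, m2)
        if violates_garp((p1, x1), (p2, x2)):
            violations += 1
    return violations / draws


if __name__ == "__main__":
    # Hypothetical crossing budgets: A costs 1 and B costs 2 under bundle 1,
    # the prices swap under bundle 2, and income is 2 both times.
    area = rejection_area(((1.0, 2.0), 2.0), ((2.0, 1.0), 2.0))
    print(f"rejection area = {area:.3f} (exact value 1/9 = {1/9:.3f})")
```

With these made-up budgets the incompatible region is the corner where less than a third of income goes to A under bundle 1 and more than two-thirds under bundle 2, so its exact area is 1/9 and the estimate should land close to that. Budgets that do not cross (one nested inside the other) give an area of zero, which is exactly the “trivially satisfied” case.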
Beatty and Crawford assume you want a measure of “how well” agents do on a test of revealed preference that depends on both the pass rate (the proportion of the sample whose choices do not reject utility-maximizing behavior) and the test’s difficulty (how often a random number generator selecting bundles would pass); if this all sounds like redefining the concept of statistical power, it should. It turns out that r minus a, where r is the pass rate and a is the rate at which random choices would pass (so a higher a means an easier test), has some nice axiomatic properties; I’m not totally convinced this part of the paper is that important, so I’ll leave it for you to read. The authors then apply the idea to Spanish consumption data in which households were tracked for eight quarters. They find that about 96% of households in the sample pass: their purchases show no violation of utility-maximizing behavior. But the variation in prices and quarterly income is so small that utility maximization imposes almost no constraints: 91% of random number generators would “pass” given the same variation in prices and incomes.
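The headline measure is just a difference of two rates. Here is a minimal sketch with a hypothetical function name, `predictive_success`; it only reproduces the arithmetic on the two sample-wide figures quoted above, not the paper’s own estimation procedure.

```python
def predictive_success(pass_rate, random_pass_rate):
    """Return r - a: the share of real agents who pass a revealed-preference
    test minus the share of random choosers who would also pass."""
    for name, value in (("pass_rate", pass_rate),
                        ("random_pass_rate", random_pass_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return pass_rate - random_pass_rate


# Spanish panel figures quoted above: 96% of households pass, and 91% of
# random choosers facing the same prices and incomes would pass too.
print(f"{predictive_success(0.96, 0.91):.2f}")  # prints 0.05

# A test nobody can fail scores zero no matter how people behave.
print(predictive_success(1.0, 1.0))  # prints 0.0
```

An impressive-sounding 96% pass rate thus shrinks to a margin of about five percentage points once the weakness of the test is netted out.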
What do we learn from an exercise like this? There is definitely some benefit: if you want to design experiments concerning revealed preference, the measure in the present paper is genuinely useful for choosing precisely which variation in incomes and prices will subject revealed preference to a “tough” test. But this assumes you want to test at all. “Science is underdetermined,” they shout from the rooftops! Even if people showed behavior that “rejected” utility maximization, we would surely ask: first, by how much? Second, are you sure “budget” and “price” are measured correctly (there is Varian’s work on measurement error in prices, and no one uses lifetime income adjusted for credit constraints when talking about “budgets”)? Third, are you just rejecting concavity rather than maximizing behavior? Fourth, are there not preference shocks over a two-year period, such as my newfound desire to buy diapers after a newborn arrives? And so on. I think essentially any economist would accept such critiques. Those of the philosophical school that I like to discuss on this site would further note that the model of utility maximization is not necessarily meant to be predictive, that we know it is “wrong” in that people clearly do not always act as if they are maximizers, and that the Max U model is nonetheless useful as an epistemic device for social science researchers.
http://www.tc.umn.edu/~tbeatty/working_papers/revisedpowerpaper.pdf (Final working paper – final version published in AER October 2011)