“The Institutional Causes of China’s Great Famine, 1959-1961,” X. Meng, N. Qian & P. Yared (2011)

Nancy Qian, along with a big group of coauthors, has done a great amount of interesting empirical work in recent years on the economics of modern China; among other things, she has shown that local elections actually do cause policy changes in line with local preferences and that the state remains surprisingly powerful in the Chinese economy. In this paper with Xin Meng and Pierre Yared, she considers what is likely the worst famine in the history of mankind, China’s famous famine following the Great Leap Forward. After an agricultural production shock in 1959, a series of misguided policy experiments in the late 1950s (like “backyard steel” production, which produced worthless metal), and an anti-Rightist purge which ended a brief period of less rigid bureaucracy, 30 million or so people would die from hunger over the next two years, with most deaths among the young and the very old. To put this in relative context, in the worst-hit counties, the birth-cohorts that should have been born or very young in 1960 and 1961 are today missing more than 80% of their projected members.

What is interesting, and what we have known since Sen, is that famines generally result from problems of food distribution rather than food production. And, indeed, the authors show that total grain production in caloric terms across rural parts of China is a multiple of what is necessary to hold off starvation during the height of the productivity shock. What is interesting and novel, though, is that provinces with higher historic per-capita grain production had the highest mortality, and likewise counties with the highest per-capita production as measured by a proxy based on climate also have the largest number of “missing” members in their birth year cohort in the 1990 census. This is strange – you might think that places that are living on the edge in normal times are most susceptible to famine.

This is where politics comes into play. The Chinese government “sent down” many competent bureaucrats during the anti-Rightist purges in the late 1950s, limiting the ability of the government to use flexible mechanisms for food procurement. The food system at the time involved the central government collecting a set amount of grain from each region, then returning stocks to communal kitchens. Now, local leaders had a strong incentive to understate how much was produced in a given year so that they could use the remainder for local power purposes. Because of limited communication technology and ineffective bureaucracy, the optimal mechanism (not specified formally, but apparently done so in an earlier version) for the central government involved pre-setting fixed production goals for every region. Here is the problem: imagine you wish the city, rural area 1 and rural area 2 to have the same expected consumption, with the city producing no food, and rural area 1 producing 1 ton per capita per year and rural area 2 producing 1.4 tons per capita. This gives total consumption of .8 tons per capita if the government sets in advance a fixed “tax” of .2 tons per capita from 1 and .6 tons per capita from region 2. Now a productivity shock lowers production everywhere by 10 percent. The city still gets its .8 tons per capita (since the “tax” is fixed), but area 1 now gets .9*1-.2=.7 tons per capita, and area 2 gets 1.4*.9-.6=.66 tons per capita. That is, the lack of flexibility in the system is more likely to push the productive regions into famine than other regions.
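The arithmetic of the fixed-quota example can be checked with a few lines of Python. This is only a sketch of the illustrative numbers above, not anything from the paper; it assumes the city and the two rural areas have equal populations, and the function name is made up for the illustration.

```python
# Illustrative numbers from the example above. Equal populations in the
# city and in both rural areas are an assumption made for this sketch.
def rural_consumption(output, quota, shock=0.0):
    """Per-capita grain left in a rural area once the fixed quota is taken."""
    return output * (1.0 - shock) - quota

outputs = {"area 1": 1.0, "area 2": 1.4}   # tons per capita in a normal year
quotas = {"area 1": 0.2, "area 2": 0.6}    # set in advance by the center

for shock in (0.0, 0.10):
    # The city eats whatever is collected, so its share ignores the shock.
    city = sum(quotas.values())
    rural = {k: round(rural_consumption(outputs[k], quotas[k], shock), 2)
             for k in outputs}
    print(f"shock={shock:.0%}  city={city:.2f}  rural={rural}")
```

Because the quota is fixed in levels, the same proportional shock costs the more productive area more grain in absolute terms, which is why area 2 ends up below area 1.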

Now, this is not the whole story. Alternative explanations, already suggested in the literature, also are quantitatively important. Places with more anti-Rightist purges before the famine saw higher mortality (see this 2011 APSR by Kung and Chen), as did places with earlier adoption of communal dining halls or larger increases in backyard steel production, both proxies for “zealous” adherence to the Great Leap Forward. I would really like to see some attempt at a decomposition here: if you buy that local political leadership, the central government quota system, and political punishment of counterrevolutionary areas were all important, and that weather shocks alone were not, how many of the deaths should we ascribe to each of those factors? This seems an important question for preventing future famines. It seems that a further fleshing out of how these results relate to the old theory of the firm debates about flexibility of local managers under imperfect and partially unverifiable reporting can help us understand what was going on with the CCP policy choices; I’m thinking, for instance, of explicitly showing whether it is true that loss of members of the bureaucracy (i.e., an increase in the cost of monitoring) necessarily incentivizes more rigid allocation rules. Theory here could help to quantify how important this mechanism might be.

2011 working paper (IDEAS version). This paper is R&R at ReStud currently. Qian has a couple other working papers that caught my eye. First, a paper with Duflo and Banerjee on Chinese transportation infrastructure finds very little impact on relative incomes of (quasi-random) access to a good transportation network, and suggests in a short model (which is less convincing…) that relative immobility of capital might be causing this. The techniques in the paper are similar to those used by Ben Faber in his very nice paper showing Krugman’s home market effect: if you are small and poor, being connected with a big productive place may not be good for you due to increasing returns to scale. Qian also has a 2013 paper with Nathan Nunn on food aid which suggests, pretty convincingly, that food aid in civil war zones prolongs conflicts; the mechanism, roughly, is that local armies can easily steal the aid and hence have less reason to sue for peace. The identification strategy here is quite nice: the US government buys wheat for price stabilization reasons, then gives much of this away to impoverished countries. The higher the price of wheat, the less the government surplus is, hence the less is given away.

“Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks,” B.A. Prakash et al (2011)

No need to separate economics from the rest of the social sciences, and no need to separate social science from the rest of science: we often learn quite a bit from our compatriot fields. Here’s a great example. Consider any epidemic diffusion, where a population (of nodes) is connected to each other (along, in this case, unweighted edges, equal to 1 if and only if there is a link between the nodes). Consider the case where nodes can become “infected” – in economics, we may think of nodes as people or cities adopting a new technology, or purchasing a new product. Does a given seeding on the network lead to an “infection” that spreads across the network, or is the network fairly impervious to infections?

This seems like it must be a tricky question, for nodes can be connected to other nodes in an arbitrary fashion. Let’s make it even more challenging for the analyst: allow there to be m “susceptible” states, an “exposed” state, an “infected” state, and N “vaccinated” states, who cannot be infected. Only exposed or infected agents can propagate an infection, and do so to each of their neighbors in any given period according to probabilities a and b, independently across neighbors. Parameters tell me the probability each agent transitions from susceptible or vaccinated states to other such states.

You may know the simple SIR model – susceptible, infected, recovered. In these models, all agents begin as susceptible or infected. If my neighbor is infected and I am susceptible, he gives me the disease with probability a. If I am infected, I recover with probability c. This system spreads across the population if the first eigenvalue of the adjacency matrix (whose entries equal 1 if two people are connected, and 0 otherwise) is greater than c/a, the recovery probability relative to the transmission probability. (Incredibly, I believe this proof dates back to Kermack and McKendrick in 1927). That is, the only way the network topology matters is in a single-valued summary statistic, the first eigenvalue. Pretty incredible.
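To make the threshold concrete, here is a minimal sketch in plain Python. It assumes the usual form of the condition, namely that the infection spreads when the first eigenvalue times a/c exceeds one; the function names and the two example graphs are mine, not the paper's.

```python
def largest_eigenvalue(adj, iters=500):
    """First eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration on A + I; the shift by the identity avoids the
    oscillation plain power iteration shows on bipartite graphs.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    # Rayleigh quotient x'Ax of the unit vector x gives the eigenvalue of A.
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))

def epidemic_spreads(adj, a, c):
    """True if lambda_1 * a / c > 1 (a: transmission prob., c: recovery prob.)."""
    return largest_eigenvalue(adj) * a / c > 1.0

def star(leaves):
    """Node 0 linked to every other node."""
    n = leaves + 1
    return [[1 if (i == 0) != (j == 0) else 0 for j in range(n)] for i in range(n)]

def ring(n):
    """Each node linked to its two neighbors on a circle."""
    return [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)] for i in range(n)]

# Hypothetical parameters: transmission 0.2, recovery 0.5, so c/a = 2.5.
for name, graph in (("star", star(9)), ("ring", ring(10))):
    print(name, round(largest_eigenvalue(graph), 3), epidemic_spreads(graph, 0.2, 0.5))
```

Both graphs have ten nodes, and the ring actually has one more edge than the star, yet only the star sits above the threshold: the count of links is not what matters, the first eigenvalue is.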

The authors of the present paper show that this is a general property. For any epidemic model in which disease spreads over a network such that, first, transmissions are independent across neighbors, and second, one can only enter the exposed or infected state from an exposed or infected neighbor, the general property is the same: the disease spreads through the population if the first eigenvalue of the adjacency matrix is larger than a constant which depends only on model parameters and not on the topology of the network (and, in fact, these parameters are easy to characterize). It is a particularly nice proof. First we compute the probabilities of transitioning from each state to any other. This gives us a discrete-time nonlinear dynamic system. Such a system is asymptotically stable at a fixed point if all eigenvalues of its Jacobian at that point are less than one in absolute value. If there are no infections at all, the steady state is just the steady state of a Markov chain: only infected or exposed people can infect me, so the graph structure doesn’t matter if we assume no infections, and transitions between the susceptible and vaccinated states are just Markov by assumption. We then note that the Jacobian has a nice block structure which limits the eigenvalues to being one of two types, show that eigenvalues of the first type are always less than one in absolute value, then show that eigenvalues of the second type are less than one if and only if the first eigenvalue of the adjacency matrix lies below a constant depending on model parameters only; that constant has nothing to do with the network topology.

The result tells you some interesting things as well. For example, say you wish to stop the spread of an epidemic. Should you immunize people with many friends? No – you should immunize the person who lowers the first eigenvalue of the adjacency matrix the most. This rule holds whatever the actual network topology or the properties of the disease (how long it incubates, how fast it transmits, how long people stay sick, how likely they are to develop natural immunity, etc.). Likewise, in the opposite problem, if you wish an innovation to diffuse through a society, how should you organize conferences or otherwise create a network? Create links between people or locations such that the first eigenvalue of the adjacency matrix increases by the highest amount. Again, this rule holds whatever the current network topology or the properties of the particular invention you wish to diffuse. Nice.
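A small sketch of the immunization rule, again in plain Python. The graph is a made-up example (a five-person clique, one of whose members knows a "popular" person with five otherwise isolated friends), and immunizing a node is modeled simply as deleting it; neither choice comes from the paper.

```python
def largest_eigenvalue(adj, iters=500):
    """First eigenvalue of a symmetric 0/1 matrix, by power iteration on A + I."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return sum(x[i] * adj[i][j] * x[j] for i in range(n) for j in range(n))

def without_node(adj, k):
    """Adjacency matrix after deleting node k (our stand-in for immunizing k)."""
    n = len(adj)
    return [[adj[i][j] for j in range(n) if j != k] for i in range(n) if i != k]

def best_node_to_immunize(adj):
    """Node whose removal lowers the first eigenvalue the most."""
    return min(range(len(adj)),
               key=lambda k: largest_eigenvalue(without_node(adj, k)))

def example_graph():
    """Nodes 0-4 form a clique; node 5 knows node 0 and five leaves (6-10)."""
    n = 11
    adj = [[0] * n for _ in range(n)]
    def link(i, j):
        adj[i][j] = adj[j][i] = 1
    for i in range(5):
        for j in range(i + 1, 5):
            link(i, j)
    link(0, 5)
    for leaf in range(6, 11):
        link(5, leaf)
    return adj

adj = example_graph()
most_friends = max(range(len(adj)), key=lambda k: sum(adj[k]))
print("most friends:", most_friends, "best to immunize:", best_node_to_immunize(adj))
```

In this example the person with the most friends is node 5, but deleting node 5 leaves the clique intact, while deleting node 0 both breaks up the clique and cuts it off from the rest of the network.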

Final conference paper from ICDM2011. (No IDEAS version).

“Railroads of the Raj: Estimating the Impact of Transportation Infrastructure,” D. Donaldson (2013)

Somehow I’ve never written about Dave Donaldson’s incredible Indian railroad paper before; as it has a fair claim on being the best job market paper in the past few years, it’s time to rectify that. I believe Donaldson spent eight years at LSE working on his PhD, largely made up of this paper. And that time led to a well-received result: in addition to conferences, a note on the title page mentions that the paper has been presented at Berkeley, BU, Brown, Chicago, Harvard, the IMF, LSE, MIT, the Minneapolis Fed, Northwestern, Nottingham, NYU, Oxford, Penn, Penn State, the Philly Fed, Princeton, Stanford, Toronto, Toulouse, UCL, UCLA, Warwick, the World Bank and Yale! So we can safely say, this is careful and well-vetted work.

Donaldson’s study considers the importance of infrastructure to development; it is, in many ways, the opposite of the “small changes”, RCT-based development literature that was particularly en vogue in the 2000s. Intuitively, we all think infrastructure is important, both for improving total factor productivity and for improving market access. The World Bank, for instance, spends 20 percent of its funds on infrastructure, more than “education, health, and social services combined.” But how important is infrastructure spending anyway? That’s a pretty hard question to define, let alone answer.

So let’s go back to one of the great infrastructure projects in human history: the Indian railroad during the British Raj. The British built over 67,000 km of rail in a country with few navigable rivers. They also, luckily for the economist, were typically British in the enormous number of price, weather, and rail shipment statistics they collected. Problematically for the economist, these statistics tended to be hand-written in weathered documents hidden away in the back rooms of India’s bureaucratic state. Donaldson nonetheless collected almost 1.5 million individual pieces of data from these weathered tomes. Now, you might think, let’s just regress new rail access on average incomes, use some IV to make sure that rail lines weren’t endogenous, and be done with it. Not so fast! First, there’s no district-level income per capita data for India in the 1800s! And second, we can use some theory to really tease out why infrastructure matters.

Let’s use four steps. First, try to estimate how much rail access lowered trade costs per kilometer; if a good is made in only one region, then theory suggests that the trade cost between regions is just the price difference of that commodity across regions. Even if we had shipping receipts, this wouldn’t be sufficient; bandits, and spoilage, and all the rest of Samuelson’s famous “iceberg” raise trade costs as well. Second, check whether lowered trade costs actually increased trade volume, and at what elasticity, using rainfall as a proxy for local productivity shocks. Third, note that even though we don’t have income, theory tells us that for agricultural workers, percentage changes in total production per unit of land deflated by a local price index are equivalent to percentage changes in real income per unit of land. Therefore, we can check in a reduced form way whether new rail access increases real incomes, though we can’t say why. Fourth, in Donaldson’s theoretical model (an extension, more or less, of Eaton and Kortum’s Ricardian model), trade costs and differences in region sizes and productivity shocks in all regions all interact to affect local incomes, but they all act through a sufficient statistic: the share of consumption that consists of local products. That is, if we do our regression testing for the impact of rail access on real income changes, but control for changes in the share of consumption from within the district, we should see no effect from rail access.

Now, these stages are tough. Donaldson constructs a network of rail, road and river routes using 19th century sources linked on GIS, and traces out the least-cost paths from any one district to another. He then non-linearly estimates the relative cost per kilometer of rail, sea, river and road transport using the prices of eight types of salt, each of which was sold across British India but only produced in a single location. He then finds that lowered trade costs do appear to raise trade volumes with quite high elasticity. The reduced form regression suggests that access to the Indian railway increased local incomes by an average of 16 percent (Indian real incomes per capita increased only 22 percent during the entire period 1870 to 1930, so 16 percent locally is substantial). Using the “trade share” sufficient statistic described above, Donaldson shows that almost all of that increase was due to lowered trade costs rather than internal migration or other effects. Wonderful.

This paper is a great exercise in the value of theory for empiricists. Theory is meant to be used, not tested. Here, fairly high-level trade theory – literally the cutting edge – was deployed to coax an answer to a super important question even though atheoretical data could have provided us nothing (remember, there isn’t even any data on income per capita to use!). The same theory also allowed Donaldson to explain the effect, rather than just state it, a feat far more interesting to those who care about external validity. Two more exercises would be nice, though; first, and Donaldson notes this in the conclusion, trade can also improve welfare by lowering volatility of income, particularly in agricultural areas. Is this so in the Indian data? Second, rail, like lots of infrastructure, is a network – what did the time trend in income effects look like?

September 2012 Working Paper (IDEAS version). No surprise, Donaldson’s website mentions this is forthcoming in the AER. (There is a bit of a mystery – Donaldson was on the market with this paper over four years ago. If we need four years to get even a paper of this quality through the review process, something has surely gone wrong with the review process in our field.)

“The Axiomatic Structure of Empirical Content,” C. Chambers, F. Echenique & E. Shmaya (2013)

Here’s a particularly interesting article at the intersection of philosophy of science and economic theory. Economic theorists have, for much of the twentieth century, linked high theory to observable data using the technique of axiomatization. Many axiomatizations operate by proving that if an agent has such-and-such behavioral properties, their observed actions will encompass certain other properties, and vice versa. For example, demand functions over convex budget sets satisfy the strong axiom of revealed preference if and only if they are generated by the usual restrictions on preference.

You may wonder, however: to what extent is the axiomatization interesting when you care about falsification (not that you should care, necessarily, but if you did)? Note first that we only observe partial data about the world. I can observe that you choose apples when apples and oranges are available (A>=B or B>=A, perhaps strictly if I offer you a bit of money as well) but not whether you prefer apples or bananas when those are the only two options. This shows that a theory may be falsifiable in principle (I may observe that you prefer strictly A to B, B to C and C to A, violating transitivity, falsifying rational preferences) yet still make nonfalsifiable statements (rational preferences also require completeness, yet with only partial data, I can’t observe that you either weakly prefer apples to bananas, or weakly prefer bananas to apples).

Note something interesting here, if you know your Popper. The theory of rational preferences (complete and transitive, with strict preferences defined as the strict part of the >= relation) is universal in Popper’s sense: these axioms can be written using the “for all” quantifier only. So universality under partial observation cannot be all we mean if we wish to consider only the empirical content of a theory. And partial observability is yet harsher on Popper. Consider the classic falsifiable statement, “All swans are white.” If I can in principle only observe a subset of all of the swans in the world, then that statement is not, in fact, falsifiable, since any of the unobserved swans may actually be black.

What Chambers et al do is show that you can take any theory (a set of data generating processes which can be examined with your empirical data) and reduce it to stricter and stricter theories, in the sense that any data which would reject the original theory still reject the restricted theory. The strongest restriction has the following property: every axiom is UNCAF, meaning it can be written using only “for all” operators which negate a conjunction of atomic formulas. So “for all swans s, the swan is white” is not UNCAF (since it lacks a negation). In economics, the strict preference transitivity axiom “for all x,y,z, not (x>y and y>z and z>x)” is UNCAF and the completeness axiom “for all x,y, x>=y or y>=x” is not, since it is an “or” statement and cannot be reduced to the negation of a conjunction. It is straightforward to extend this to checking for empirical content relative to a technical axiom like continuity.
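One way to see why the UNCAF form matters under partial observation: an axiom that negates a conjunction of atomic statements is refuted by finding a single pattern in the data you do have, whereas completeness can never be refuted, because a missing comparison is simply unobserved. A minimal sketch, with a made-up data format (a set of observed strict choices) that is my assumption rather than the paper's setup:

```python
from itertools import permutations

def violates_no_cycle_axiom(observed):
    """Check 'for all x,y,z, not (x>y and y>z and z>x)' on partial data.

    observed is a set of (x, y) pairs meaning 'x was strictly chosen over y'.
    A violation is a witnessed triple; absent pairs are just unobserved.
    """
    items = {z for pair in observed for z in pair}
    return any((x, y) in observed and (y, z) in observed and (z, x) in observed
               for x, y, z in permutations(items, 3))

# Hypothetical observations.
partial = {("apple", "orange")}   # nothing at all observed about bananas
cyclic = {("apple", "orange"), ("orange", "banana"), ("banana", "apple")}

print(violates_no_cycle_axiom(partial))  # no data can contradict completeness here
print(violates_no_cycle_axiom(cyclic))
```

There is deliberately no analogous `violates_completeness` function: with only partial data there is nothing it could ever return but "not refuted".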

Proving this result requires some technical complexity, but the result itself is very easy to use for consumers and creators of axiomatizations. Very nice. The authors also note that Samuelson, in his rejoinder to Friedman’s awful ’53 methodology paper, more or less got things right. Friedman claimed that the truth of axioms is not terribly important. Samuelson pointed out that either all of a theory can be falsified, in which case since the axioms themselves are always implied by a theory Friedman’s arguments are in trouble, or the theory makes some non-falsifiable claims, in which case attempts to test the theory as a whole are uninformative. Either way, if you care about predictive theories, you ought to choose the weakest theory that generates some given empirical content. In Chambers et al’s result, this means you better be choosing theories whose axioms are UNCAF with respect to technical assumptions. (And of course, if you are writing a theory for explanation, or lucidity, or simplicity, or whatever non-predictive goal you have in mind, continue not to worry about any of this!)

Dec 2012 Working Paper (no IDEAS version).

“An Elementary Theory of Comparative Advantage,” A. Costinot (2009)

Arnaud Costinot is one of many young economists doing interesting work in trade theory. In this 2009 Econometrica, he uses a mathematical technique familiar to any auction theorist – log-supermodularity – to derive a number of general results about trade which have long been seen as intractable, using few assumptions other than free trade and immobile factors of production.

Take two standard reasons for the existence of trade. First is differences in factor productivity. Country A ought produce good 1 and Country B good 2 if A has higher relative productivity in good 1 than B, f(1,A)/f(2,A) > f(1,B)/f(2,B). This is simply Ricardo’s law of comparative advantage. Ricardo showed that comparative advantage in good 1 by country A means that under (efficient) free trade, country A will actually produce more of good 1 than country B. The problem is when you have a large number of countries and a large number of goods; the simple algebra of Ricardo is no longer sufficient. Here’s the trick, then. Note that the 2-country, 2-good condition just says that the production function f is log-supermodular in countries and goods; “higher” countries are relatively more productive producing “higher” goods, under an appropriate ranking (for instance, more educated workforce countries might be “higher” and more complicated products might be “higher”; all that matters is that such an order exists). If the production function is log-supermodular, then aggregate production is also log-supermodular in goods and countries. Why? In this elementary model, each country specializes in producing only one good. If aggregate production is not log-supermodular, then maximizing behavior by countries means the marginal return to factors of production for a “low” good must be high in the “high” countries and low in the “low” countries. This cannot happen if countries are maximizing their incomes since each country can move factors of production around to different goods as they like and the production function is log-supermodular. What does this theorem tell me? It tells me that under trade with any number of countries and goods, there is a technology ladder, where “higher” countries produce “higher” goods. The proof is literally one paragraph, but it is impossible without the use of mathematics of lattices and supermodularity. Nice!
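For a finite table of productivities, log-supermodularity is just a cross-product inequality on every pair of countries and pair of goods, which is easy to check directly. A minimal sketch, assuming strictly positive productivities and that rows (countries) and columns (goods) are already listed from “low” to “high”; the tables are invented numbers, not data from the paper.

```python
from itertools import combinations

def is_log_supermodular(f):
    """f[c][g]: productivity of country c in good g, both ordered low to high.

    Log-supermodular means f[hi_c][hi_g] * f[lo_c][lo_g] >=
    f[hi_c][lo_g] * f[lo_c][hi_g] for every pair of countries and goods.
    """
    countries = range(len(f))
    goods = range(len(f[0]))
    return all(f[c2][g2] * f[c1][g1] >= f[c2][g1] * f[c1][g2]
               for c1, c2 in combinations(countries, 2)
               for g1, g2 in combinations(goods, 2))

# Hypothetical productivity tables.
ladder = [[1, 1, 1],
          [1, 2, 4],
          [1, 4, 16]]      # f(c, g) = 2 ** (c * g)
reversed_2x2 = [[1, 4],
                [4, 1]]    # the "high" country is relatively better at the "low" good

print(is_log_supermodular(ladder), is_log_supermodular(reversed_2x2))
```

With two countries and two goods the inequality is Ricardo's relative-productivity comparison written as a cross product, which is why the same definition scales to any number of countries and goods.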

Consider an alternative model, Heckscher-Ohlin’s trade model which suggests that differences in factor endowments, not differences in technological or institutional capabilities which generate Ricardian comparative advantage, are what drives trade. Let the set of factors of production be distributed across countries according to F, and let technology vary across countries but only in a Hicks-neutral way (i.e., “technology” is just a parameter that scales aggregate production up or down, regardless of how that production is created or what that production happens to be). Let the production function, then, be A(c)h(g,p); that is, a country-specific technology parameter A(c) times a log-supermodular function of the goods produced g and the factors of production p. Assume further that factors are distributed such that “high” countries are relatively more-endowed with “high” factors of production, according to some order; many common distribution functions will give you this property. Under these assumptions, again, “high” countries produce “high” goods in a technology ladder. Why? Efficiency requires that each country assign “high” factors of production to “high” goods. The distributional assumption tells me that “high” factors are more likely to appear in “high” countries. Hence it can be proven using some simple results from lattice theory that “high” countries produce more “high” goods.

There are many further extensions, the most interesting one being that even though the extensions of Ricardo and Heckscher-Ohlin both suggest a ladder of “higher” and “lower” goods, these ladders might not be the same, and hence if both effects are important, we need more restrictive assumptions on the production function to generate interesting results about the worldwide distribution of trade. Costinot also points out that the basic three type (country, good, factor of production) model with log-supermodularity assumptions fits many other fields, since all it roughly says is that heterogeneous agents (countries) with some density of characteristics (goods and factors of production) then sort into outcomes according to some payoff function of the three types; e.g., heterogeneous firms may be choosing different financial instruments depending on heterogeneous productivity. Ordinal discussions of which types of productivity lead firms to choose which types of financial instruments (or any similar problem) are often far, far easier using log-supermodularity arguments than using functional forms plus derivatives.

Final 2009 ECTA (IDEAS version). Big thumbs up to Costinot for putting the final, published version of his papers on his website.

“What Determines Productivity,” C. Syverson (2011)

Chad Syverson, along with Nick Bloom, John van Reenen, Pete Klenow and many others, has been at the forefront of a really interesting new strand of the economics literature: persistent differences in productivity. Syverson looked at productivity differences within 4-digit SIC industries in the US (quite narrow industries like “Greeting Cards” or “Industrial Sealants”) a number of years back, and found that in the average industry, the 90-10 ratio of total factor productivity across plants was almost 2. That is, the top decile plant in the average industry produced twice as much output as the bottom decile plant, using exactly the same inputs! Hsieh and Klenow did a similar exercise in China and India and found even starker productivity differences, largely due to a big left tail of very low productivity firms. This basic result is robust to different measures of productivity, and to different techniques for identifying differences; you can make assumptions which let you recover a Solow residual directly, or run a regression (adjusting for differences in labor and capital quality, or not), or look at deviations like firms having higher marginal productivity of labor than the wage rate, etc. In the paper discussed in the post, Syverson summarizes the theoretical and empirical literature on persistent productivity differences.

Why aren’t low productivity firms swept from the market? We know from theory that if entry is allowed, potentially infinite and instantaneous, then no firm can remain which is less productive than the entrants. This suggests that persistence of inefficient firms must result from either limits on entry, limits on expansion by efficient firms, or non-immediate efficiency because of learning-by-doing or similar (a famous study by Benkard of a Lockheed airplane showed that a plant could produce a plane with half the labor hours after producing 30, and half again after producing 100). Why don’t inefficient firms already in the market adopt best practices? This is related to the long literature on diffusion, which Syverson doesn’t cover in much detail, but essentially it is not obvious to a firm whether a “good” management practice at another firm is actually good or not. Everett Rogers, in his famous “Diffusion of Innovations” book, refers to a great example of this from Peru in the 1950s. A public health consultant was sent for two years to a small village, and tried to convince the locals to boil their water before drinking it. The water was terribly polluted and the health consequences of not boiling were incredible. After two years, only five percent of the town adopted the “innovation” of boiling. Some didn’t adopt because it was too hard, many didn’t adopt because of a local belief system that suggested only the already-sick ought drink boiled water, some didn’t adopt because they didn’t trust the experience of the advisor, et cetera. Diffusion is difficult.

Ok, so given that we have inefficient firms, what is the source of the inefficiency? It is difficult to decompose all of the effects. Learning-by-doing is absolutely relevant in many industries – we have plenty of evidence on this count. Nick Bloom and coauthors seem to suggest that management practices play a huge role. They have shown clear correlation between “best practice” management and high TFP across firms, and a recent randomized field experiment in India (discussed before on this site) showed massive impacts on productivity from management improvements. Regulation and labor/capital distortions also appear to play quite a big role. On this topic, James Schmitz wrote a very interesting paper, published in 2005 in the JPE, on iron ore producers. TFP in Great Lakes ore had been more or less constant for many decades, with very little entry or foreign competition until the 1980s. Once Brazil began exporting ore to the US, labor productivity doubled within a handful of years, and capital and total factor productivity also soared. A main driver of the change was more flexible workplace rules.

Final version in 2011 JEP (IDEAS version). Syverson was at Kellogg recently presenting a new paper of his, with an all-star cast of coauthors, on the medical market. It’s well worth reading. Medical productivity is similarly heterogeneous, and since the medical sector is coming up on 20% of GDP, the sources of inefficiency in medicine are particularly important!

“Paul Samuelson’s Contributions to Welfare Economics,” K. Arrow (1983)

I happened to come across a copy of a book entitled “Paul Samuelson and Modern Economic Theory” when browsing the library stacks recently. Clear evidence of his incredible breadth is in the section titles: Arrow writes about his work on social welfare, Houthakker on consumption theory, Patinkin on money, Tobin on fiscal policy, Merton on financial economics, and so on. Arrow’s chapter on welfare economics was particularly interesting. This book comes from the early 80s, which is roughly the end of social welfare as a major field of study in economics. I was never totally clear on the reason for this – is it simply that Arrow’s Possibility Theorem, Sen’s Liberal Paradox, and the Gibbard-Satterthwaite Theorem were so devastating to any hope of “general” social choice rules?

In any case, social welfare is today little studied, but Arrow mentions a number of interesting results which really ought be better known. Bergson-Samuelson, conceived when the two were in graduate school together, is rightfully famous. After a long interlude of confused utilitarianism, Pareto had us all convinced that we should dismiss cardinal utility and interpersonal utility comparisons. This seems to suggest that all we can say about social welfare is that we should select a Pareto-optimal state. Bergson and Samuelson were unhappy with this – we suggest individuals should have preferences which represent an order (complete and transitive) over states, and the old utilitarians had a rule which imposed a real number for society’s value of any state (hence an order). Being able to order states from a social point of view seems necessary if we are to make decisions. Some attempts to extend Pareto did not give us an order. (Why is an order important? Arrow does not discuss this, but consider earlier attempts at extending Pareto like Kaldor-Hicks efficiency: going from state s to state s’ is KH-efficient if there exist ex-post transfers under which the change is Paretian. Let person a value the bundle (1,1)>(2,0)>(1,0)>all else, and person b value the bundle (1,1)>(0,2)>(0,1)>all else. In state s, person a is allocated (2,0) and person b (0,1). In state s’, person a is allocated (1,0) and person b is allocated (0,2). Note that going from s to s’ is a Kaldor-Hicks improvement, but going from s’ to s is also a Kaldor-Hicks improvement!)
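The Kaldor-Hicks cycle in the parenthetical can be verified by brute force. The sketch below enumerates every way of reallocating each state's total endowment and asks whether some reallocation Pareto-improves on the other state. Two things are my assumptions: bundles lumped under “all else” are treated as tied at the bottom, and goods are indivisible.

```python
from itertools import product

# Ordinal ranks from the example; unlisted bundles ("all else") rank 0,
# which treats them as tied at the bottom (an assumption of this sketch).
RANK_A = {(1, 1): 3, (2, 0): 2, (1, 0): 1}
RANK_B = {(1, 1): 3, (0, 2): 2, (0, 1): 1}

def ranks(state):
    """Ranks of (a's bundle, b's bundle) in the given state."""
    bundle_a, bundle_b = state
    return RANK_A.get(bundle_a, 0), RANK_B.get(bundle_b, 0)

def reallocations(state):
    """Every split of the state's total endowment between a and b."""
    (a1, a2), (b1, b2) = state
    t1, t2 = a1 + b1, a2 + b2
    for x1, x2 in product(range(t1 + 1), range(t2 + 1)):
        yield (x1, x2), (t1 - x1, t2 - x2)

def pareto_improves(new, old):
    """Nobody worse off in new than in old, and somebody strictly better off."""
    (na, nb), (oa, ob) = ranks(new), ranks(old)
    return na >= oa and nb >= ob and (na > oa or nb > ob)

def kaldor_hicks_improvement(old, new):
    """Some ex-post transfer in the new state Pareto-improves on the old one."""
    return any(pareto_improves(r, old) for r in reallocations(new))

s = ((2, 0), (0, 1))
s_prime = ((1, 0), (0, 2))
print(kaldor_hicks_improvement(s, s_prime), kaldor_hicks_improvement(s_prime, s))
```

Both directions pass the test, so the Kaldor-Hicks criterion cannot be an order over states, even though neither state Pareto-dominates the other without transfers.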

Bergson and Samuelson wanted to respect individual preferences – society can’t prefer s to s’ if s’ is a Pareto improvement on s in the individual preference relations. Take the relation RU. We will say that sRUs’ if all individuals weakly prefer s to s’. Note that though RU is not complete, it is transitive. Here’s the great, and non-obvious, trick. The Polish mathematician Szpilrajn has a great 1930 theorem which says that if R is a transitive relation, then there exists a complete relation R2 which extends R; that is, if sRs’ then sR2s’, plus we complete the relation by adding some more elements. The proof, it turns out, is not terribly easy. The upshot is that there exist social welfare orders which are entirely ordinal and which respect Pareto dominance. Of course, there may be lots of them, and which you pick is a problem of philosophy more than economics, but they exist nonetheless. Note why Arrow’s theorem doesn’t apply: we are starting with given sets of preferences and constructing a social preference, rather than attempting to find a rule that maps any individual preferences into a social rule. There have been many papers arguing that this difference doesn’t matter, so all I can say is that Arrow himself, in this very essay, accepts that difference completely. (One more sidenote here: if you wish to start with individual utility functions, we can still do everything in an ordinal way. It is not obvious that every indifference map can be mapped to a utility function, and not even true without some type of continuity assumption, especially if we want the utility functions to themselves be continuous. A nice proof of how we can do so using a trick from probability theory is in Neuefeind’s 1972 paper, which was followed up in more generality by Mount and Reiter here at MEDS then by Chichilnisky in a series of papers. Now just sum up these mapped individual utilities, and I have a Paretian social utility function which was constructed entirely in an ordinal fashion.)
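On a finite set of states the extension step is easy to see, since completing a partial order is just a topological sort; the hard part of Szpilrajn's theorem is the infinite case. Here is a minimal sketch, assuming four hypothetical states with made-up ordinal ranks for two people, and assuming no two states are ranked identically by everyone. It needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical ordinal ranks (higher = better) of four states for two people.
# Assumption: no two states share the same rank profile.
RANKS = {
    "w": (1, 1),
    "x": (2, 1),
    "y": (1, 2),
    "z": (3, 3),
}


def pareto_dominates(s, t):
    """Everyone weakly prefers s to t, and the two profiles differ."""
    rs, rt = RANKS[s], RANKS[t]
    return all(a >= b for a, b in zip(rs, rt)) and rs != rt


# Each state lists the states it dominates as its predecessors,
# so dominated states come out of the sorter first.
graph = {s: {t for t in RANKS if pareto_dominates(s, t)} for s in RANKS}
worst_to_best = list(TopologicalSorter(graph).static_order())
social_rank = {s: i for i, s in enumerate(worst_to_best)}

# The complete order never reverses a Pareto comparison.
assert all(
    social_rank[s] > social_rank[t]
    for s in RANKS
    for t in RANKS
    if pareto_dominates(s, t)
)
print(worst_to_best)
```

States x and y are not Pareto-comparable, so the sorter is free to put either one first. That freedom is exactly the "there may be lots of them" point: each admissible tie-break gives a different Paretian social order.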

Now, this Bergson-Samuelson seems pretty unusable. What do we learn that we don’t know from a naive Pareto property? Here are two great insights. First, choose any social welfare function from the set we have constructed above. Let individuals have non-identical utility functions. In general, there is no social welfare function which is maximized by always keeping every individual’s income identical in all states of the world! The proof of this is very easy if we use Harsanyi’s extension of Bergson-Samuelson: if agents are Expected Utility maximizers, then any B-S social welfare function can be written as a weighted linear combination of individual utility functions. As relative prices or the social production possibilities frontier changes, the weights are constant, but the individual marginal utilities are (generically) not. Hence if it was socially optimal to endow everybody with equal income before the relative price change, it (generically) is not later, no matter which Pareto-respecting measure of social welfare your society chooses to use! That is, I think, an astounding result for naive egalitarianism.
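A small numerical sketch of the equal-income result. The functional form is my assumption, not anything in Arrow's chapter: each person i has indirect utility v_i = -P_i / m_i (CRRA with coefficient 2 over real income), where m_i is income and P_i = p_x^{a_i} p_y^{1-a_i} is a personal price index with budget share a_i. Maximizing the weighted sum of v_i subject to incomes adding up to a fixed total gives m_i proportional to sqrt(w_i P_i). The weights, shares and prices below are hypothetical.

```python
import math


def price_index(p_x, p_y, share_x):
    """Cobb-Douglas price index for a person spending share_x on good x."""
    return p_x ** share_x * p_y ** (1 - share_x)


def optimal_incomes(total_income, prices, shares, weights):
    """Maximise sum_i w_i * v_i with v_i = -P_i / m_i; FOC gives m_i ~ sqrt(w_i P_i)."""
    scores = [
        math.sqrt(w * price_index(*prices, a)) for w, a in zip(weights, shares)
    ]
    return [total_income * s / sum(scores) for s in scores]


def welfare(incomes, prices, shares, weights):
    return sum(
        -w * price_index(*prices, a) / m
        for w, a, m in zip(weights, shares, incomes)
    )


shares = (0.8, 0.2)   # hypothetical: person 1 spends mostly on good x
weights = (1.0, 1.0)  # fixed Harsanyi-style weights

before = optimal_incomes(100.0, (1.0, 1.0), shares, weights)
after = optimal_incomes(100.0, (2.0, 1.0), shares, weights)
print(before)  # equal split at the initial prices
print(after)   # unequal once the price of x doubles
```

With the same weights, the equal split is optimal at prices (1, 1) but not after the price of x doubles: the person who consumes more x now has the higher marginal utility of income, so the optimum shifts income toward them.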

Here’s a second one. Surely any good economist knows policies should be evaluated according to cost-benefit analysis. If, for instance, the summed willingness-to-pay for a public good exceeds the cost of the public good, then society should buy it. When, however, does a B-S social welfare function allow us to make such an inference? Generically, such an inference is only possible if the distribution of income is itself socially optimal, since willingness-to-pay depends on the individual budget constraints. Indeed, even if demand estimation or survey evidence suggests that there is very little willingness-to-pay for a public good, society may wish to purchase the good. This is true even if the underlying basis for choosing the particular social welfare function we use has nothing at all to do with equity. Further, since the B-S social welfare function respects individual preferences via the Paretian criterion, the reason we build the public good also has nothing to do with paternalism. Results of this type are just absolutely fundamental to policy analysis, and are not at all made irrelevant by the impossibility results which followed Arrow’s theorem.

This is a book chapter, so I’m afraid I don’t have an online version. The book is here. Arrow is amazingly still publishing at the age of 91; he had an interesting article with the underrated Partha Dasgupta in the EJ a couple years back. People claim that relative consumption a la Veblen matters in surveys. Yet it is hard to find such effects in the data. Why is this? Assume I wish to keep up with the Joneses when I move to a richer place. If I increase consumption today, I am decreasing savings, which decreases consumption even more tomorrow. Whether I actually change my consumption today when I have richer peers then depends on that dynamic tradeoff, which Arrow and Dasgupta completely characterize.

“Incentives for Unaware Agents,” E.L. von Thadden & X. Zhao (2012)

There is a paradox that troubles a lot of applications of mechanism design: complete contracts (or, indeed, conditional contracts of any kind!) appear to be quite rare in the real world. One reason for this may be that agents are simply unaware of what they can do, an idea explored by von Thadden and Zhao in this article as well as by Rubinstein and Glazer in a separate 2012 paper in the JPE. I like the opening example in Rubinstein and Glazer:

“I went to a bar and was told it was full. I asked the bar hostess by what time one should arrive in order to get in. She said by 12 PM and that once the bar is full you can only get in if you are meeting a friend who is already inside. So I lied and said that my friend was already inside. Without having been told, I would not have known which of the possible lies to tell in order to get in.”

The contract itself gave the agent the necessary information. If I don’t specify the rule that patrons whose friend is inside are allowed entry, then only those who are aware of that possibility will ask. Of course, some patrons who I do wish to allow in, because their friend actually is inside, won’t know to ask unless I tell them. If the harm to the bar from previously unaware people learning and then lying overwhelms the gain from allowing unaware friends in, then the bar is better off not giving an explicit “contract”. Similar problems occur all the time. There are lots of behavioral explanations (recall the famous Israeli daycare which was said to have primed people into an “economic relationship” state of mind by setting a fine for picking kids up late, leading to more lateness, not less). But the bar story above relies on no behavioral action aside from agents having a default (ask about the friend clause if aware, or don’t ask if unaware) which can be removed if agents are informed about their real possible actions when given a contract.

When all agents are unaware, the tradeoff is simple, as above: I make everyone aware of their true actions if the cost of providing incentive rents is exceeded by the benefit of agents switching to actions I prefer more. Imagine that agents can choose not to clean, to partially clean, or to fully clean their tools at the end of the workday (giving some stochastic output of cleanliness). They get no direct utility out of cleaning, and indeed get disutility the more time they spend cleaning. If there is no contract, they default to partially cleaning. If there is a contract, then if all cleaning pays the same the agent will exert zero effort and not clean. The only reason I might offer high-powered incentives, then, is if the benefit of getting agents to fully clean their tools exceeds the IC rents I will have to pay them once the contract is in place.

More interesting is the case with aware and unaware agents, when I don’t know which agent is which. The unaware agents get contracts that pay the same wage no matter what their output, and the aware agents can get high-powered incentives. Solving the contracting problem involves a number of technical difficulties (standard envelope theorem arguments won’t work), but the solution is fairly intuitive. Offer two incomplete wage contracts w(x) and v(x). Let v(x) just fully insure: no matter what the output, the wage is the same. Let w(x) increase with better outputs. Choose the full insurance wage v low enough that the unaware agents’ participation constraint just binds. Then offer just enough rents in w(x) that the aware agents, who can take any action they want, actually take the planner-preferred action. Unlike in a standard screening problem, I can manipulate this problem by just telling unaware agents about their possible actions: it turns out that profits only increase by making these agents aware if there are sufficiently few unaware agents in the population.

Some interesting sidenotes. Unawareness is “stable” in the sense that unaware agents will never be told they are unaware, and hence if we played this game for two periods, they would remain unaware. It is not optimal for aware agents to make unaware agents aware, since the aware earn information rents as a result of that unawareness. It is not optimal for the planner to make unaware agents aware: the firm is maximizing total profit, announcements strictly decrease wages of aware agents (by taking their information rents), and don’t change unaware agents’ rents (they get zero since their wage is always chosen to make their PC bind, as is usual for “low types” in screening problems). Interesting.

2009 working paper (IDEAS). Final version in REStud 2012. The Rubinstein/Glazer paper takes a slightly different tack. Roughly, it says that contract designers can write a codex of rules, where you are accepted if you satisfy all the rules. An agent made aware of the rules can figure out how to lie if it involves only lying about one rule. A patient, for instance, may want a painkiller prescription. He can lie about any (unverifiable) condition, but he is only smart enough to lie once. The question is, which codices are not manipulable?

Talk: Carnegie Mellon, April 12 2013

A quick housekeeping note: if you are at Carnegie Mellon, I will be presenting a new paper (joint with Jorge Lemus, also here at Northwestern) in Posner Hall, room 388, from 12 to 1 this Friday. Come by and say hi!

The paper I’m presenting is actually quite cool, and we have pretty high hopes for it. Take the standard models of patent races or sequential innovation. The problem in those models is, do firms exert the socially optimal amount of effort on R&D? But this is not the only problem, as the title of the classic 1962 “Rate and Direction of Inventive Activity” NBER volume makes clear. The amount of effort may be fine, but firms may be working on the wrong projects. The intuition here we know from statements like “Firms do not do enough basic research.” What we’ve done is write out a totally general model of research direction which, for applied folks, is usable as long as you are familiar with standard sequential innovation models. We shut down all the already-known sources of inefficiency, and find that the interaction of direction choice and sequential innovation creates three qualitatively novel sources of inefficiency. Fixing these inefficiencies can be difficult; for example, broad patents for early inventors can actually make inefficiency worse.

Both Jorge and I would love your comments, so come on by!

“The Economic Benefits of Pharmaceutical Innovations: The Case of Cox-2 Inhibitors,” C. Garthwaite (2012)

Cost-benefit analysis and comparative effectiveness are the big buzzwords in medical policy these days. If we are going to see 5% annual real per-capita increases in medical spending, we better be getting something for all that effort. The usual way to study cost effectiveness is with QALYs, Quality-Adjusted Life Years. The idea is that a medicine which makes you live longer, with less pain, is worth more, and we can use alternative sources (such as willingness to accept jobs with higher injury risk) to get numerical values on each component of the QALY.

But medicine has other economic effects, as Craig Garthwaite (from here at Kellogg) reminds us in a recent paper of his. One major impact is through the labor market: the disabled or those with chronic pain choose to work less. Garthwaite considers the case of Vioxx. Vioxx was a very effective remedy for long-term pain, which (it was thought) could be used without the gastrointestinal side effects of ibuprofen or naproxen. It rapidly became very widely prescribed. However, evidence began to accumulate which suggested that Vioxx also caused serious heart problems, and the pill was taken off the market in 2004. Alternative joint pain medications for long term use weren’t really comparable (though, having taken naproxen briefly for a joint injury, I assure you it is basically a miracle drug).

We have a great panel on medical spending called MEPS which includes age, medical history, prescriptions, income, and labor supply decisions. That is, we have everything we need for a quick diff-in-diff. Take those with joint pain and those without, before Vioxx leaves the market and after. We see parallel trends in labor supply before Vioxx is removed (though of course, those with joint pain are on average older, more female, and less educated, hence much less likely to work). The year Vioxx is removed, labor supply drops 10 percent among those with joint pain, and even more if we look ahead a few periods after Vioxx is taken off the market.
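The diff-in-diff comparison described above boils down to four group means. A minimal sketch, where every number is hypothetical and chosen only for illustration; none of them comes from the paper or from MEPS, and the real estimate would come from a regression with controls rather than raw means.

```python
# Hypothetical employment rates, NOT values from the paper or from MEPS.
employment = {
    ("joint_pain", "before"): 0.40,
    ("joint_pain", "after"): 0.36,
    ("no_joint_pain", "before"): 0.70,
    ("no_joint_pain", "after"): 0.70,
}


def diff_in_diff(rates, treated, control):
    """Change for the treated group minus change for the control group."""
    treated_change = rates[(treated, "after")] - rates[(treated, "before")]
    control_change = rates[(control, "after")] - rates[(control, "before")]
    return treated_change - control_change


effect = diff_in_diff(employment, "joint_pain", "no_joint_pain")
print(f"{effect:+.3f}")
```

The control group's change nets out anything that hit everyone's labor supply in the year of the withdrawal, which is why the parallel pre-trends matter: they are the evidence that the control group's change is a fair stand-in for what would have happened to those with joint pain.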

For more precision, let’s do a two-stage IV on the panel data, first estimating use of any joint pain drug conditioning on the Vioxx removal and the presence of joint pain, then labor supply conditional on use of a joint pain drug. Use of any joint pain drug fell about 50% in the panel following the removal of Vioxx. Labor supply of those with joint pain is about 22 percentage points higher when Vioxx is available in the individual fixed effects IV, meaning a 54% decline in probability of working for those who were taking chronic joint pain drugs before Vioxx was removed. How big an economic effect is this? About 3% of the work force are elderly folks reporting some kind of joint pain, and 20% of them found the pain serious enough to have prescription joint pain medication. If 54% of that group leaves the labor force, this means overall labor supply changed by .35 percentage points because of Vioxx (accounting for spillovers to related drugs), or $19 billion of labor income disappeared when Vioxx was taken off the market. This is a lot, though of course these estimates are not too precise. The point is that medical cost effectiveness studies, in cases like the one studied here, can miss quite a lot if they fail to account for impacts beyond QALYs.
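The back-of-envelope numbers above can be retraced in a few lines. The inputs are the figures quoted in this post; reading the 54% as 22 percentage points relative to a baseline probability of working is my interpretation, not something stated in the paper.

```python
share_elderly_joint_pain = 0.03  # share of the work force, from the text
share_on_prescription = 0.20     # of that group, from the text
relative_decline = 0.54          # fall in probability of working, from the text
iv_effect = 0.22                 # IV estimate (22 percentage points), from the text

# Assumption: 54% = 22 points / baseline, so the implied baseline is 0.22 / 0.54.
implied_baseline = iv_effect / relative_decline

# Direct effect on aggregate labor supply, in percentage points.
direct_effect_pp = (
    100 * share_elderly_joint_pain * share_on_prescription * relative_decline
)

print(round(implied_baseline, 3), round(direct_effect_pp, 3))
```

The direct calculation gives about 0.32 percentage points, a little under the .35 quoted above; the gap is what the spillovers to related drugs would have to account for.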

Final working paper (IDEAS page). Paper published in AEJ: Applied 2012.
