Category Archives: Political Economy

“On the Origin of States: Stationary Bandits and Taxation in Eastern Congo,” R. S. de la Sierra (2013)

The job market is yet again in full swing. I won’t be able to catch as many talks this year as I would like to, but I still want to point out a handful of papers that I consider particularly elucidating. This article, by Columbia’s de la Sierra, absolutely fits that category.

The essential question is, why do states form? Would that all young economists interested in development put their effort toward such grand questions! The old Rousseauian idea you learned your first year of college, where individuals come together voluntarily for mutual benefit, seems contrary to lots of historical evidence. Instead, war appears to be a prime mover for state formation; armed groups establish a so-called "monopoly on violence" in an area for a variety of reasons, and proto-state institutions evolve. This basic idea is widespread in the literature, but it is still not clear which conditions within an area lead armed groups to settle rather than to pillage. Further, examining these ideas empirically seems quite problematic for two reasons: first, because states themselves are the ones who collect data, so we rarely observe anything before states have formed; and second, because most of the planet has long since been under the rule of a state (with apologies to James Scott!).

De la Sierra brings some economics to this problem. What is the difference between pillaging and sustained state-like forms? The pillager can only extract assets on its way through, while the proto-state can establish "taxes". What taxes will it establish? If the goal is long-run revenue maximization, Ramsey long ago told us that it is optimal to tax inelastic factors. If labor can flee but the output of the mine cannot, then you ought to tax the output of the mine heavily and set a low poll tax. If labor supply is inelastic but output can be hidden from the taxman, then use a high poll tax. Thus, when will bandits form a state instead of just pillaging? When there is a factor which can be dynamically taxed at such a rate that the discounted tax revenue exceeds what can be pillaged today. Note that the ability to, say, restrict movement along roads, or to expand output through state-owned capital, changes the relevant tax elasticities, so at a more fundamental level, rebel capacities along these margins are also important (and I imagine that extending de la Sierra's paper will involve the evolutionary development of these types of capacities).
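To fix ideas, here is a minimal numerical sketch of the settle-or-pillage margin. To be clear, every number and functional form below is my own toy choice, not de la Sierra's estimated model; the point is only that a highly elastic tax base (gold) or a short expected lifespan for the proto-state (a crackdown) tips the bandit toward pillage.

```python
# Toy settle-or-pillage comparison; all parameters are hypothetical
# illustrations, not estimates from the paper.

def per_period_revenue(base, elasticity, rate):
    # Crude Ramsey-flavored reduced form: revenue shrinks as the taxed
    # factor flees or hides (higher elasticity).
    return rate * base * max(0.0, 1.0 - elasticity * rate)

def discounted_revenue(base, elasticity, rate, delta, survival):
    # Geometric sum, discounted by delta and by the per-period chance the
    # proto-state survives (a government crackdown lowers `survival`).
    return per_period_revenue(base, elasticity, rate) / (1.0 - delta * survival)

def settles(pillage_value, base, elasticity, rate, delta, survival):
    # Settle and tax only if the discounted stream beats pillaging today.
    return discounted_revenue(base, elasticity, rate, delta, survival) > pillage_value

# Coltan-like: heavy, hard to hide, inelastic tax base -> form a state.
print(settles(100, base=50, elasticity=0.2, rate=0.5, delta=0.9, survival=0.9))  # True
# Gold-like: light, easy to hide, elastic tax base -> just pillage.
print(settles(100, base=50, elasticity=1.8, rate=0.5, delta=0.9, survival=0.9))  # False
```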

This is really an important idea. It is not that there is a tradeoff between producing and pillaging. Instead, there is a three-way tradeoff between producing in your home village, joining an armed group to pillage, and joining an armed group that taxes like a state! The armed group that taxes will, as a result of its desire to increase tax revenue, perhaps introduce institutions that increase production in the area under its control. And to the extent that institutions persist, short-run changes that cause potential bandits to form taxing relationships may actually lead to long-run increases in productivity in a region.

De la Sierra goes a step beyond theory, investigating these ideas empirically in the Congo. Eastern Congo during and after the Second Congo War was characterized by a number of rebel groups that sometimes just pillaged, but sometimes formed stable tax relationships with villages that could last for years. That is, the rebels occasionally implemented something that looked like a state. The theory above suggests that exogenous changes in the ability to extract tax revenue (over a discounted horizon) will shift the rebels from pillagers to proto-states. And, incredibly, there were a number of interesting exogenous changes that had exactly that effect.

The prices of coltan and gold both suffered price shocks during the war. Coltan is heavy, hard to hide, and must be shipped by plane in the absence of roads. Gold is light, easy to hide, and can simply be carried from the mine on jungle footpaths. When the price of coltan rises, the maximal tax revenue of a state increases since taxable coltan production is relatively inelastic. This is particularly true near airstrips, where the coltan can actually be sold. When the price of gold increases, the maximal tax revenue does not change much, since gold is easy to hide, and hence the optimal tax is on labor rather than on output. An exogenous rise in coltan prices should encourage proto-state formation in areas with coltan, then, while an exogenous rise in gold prices should have little impact on the pillage vs. state tradeoff. Likewise, a government initiative to root out rebels (be they stationary or pillaging) decreases the expected number of years a proto-state can extract rents, hence makes pillaging relatively more lucrative.

How to confirm these ideas, though, when there was no data collected on income, taxes, labor supply, or proto-state existence? Here is the crazy bit – 11 locals were hired in Eastern Congo to travel to a large number of villages, spend a week there querying families and village elders about their experiences during the war, the existence of mines, etc. The "state formation" in these parts of Congo is only a few years in the past, so it is at least conceivable that memories, suitably combined, might actually be reliable. And indeed, the data do seem to match aggregate trends known to monitors of the war. What of the model predictions? They all seem to hold, and quite strongly: the ability to extract more tax revenue is important for proto-state formation, and areas where proto-states existed do appear to have retained higher productive capacity years later, perhaps as a result of the proto-institutions those states developed. Fascinating. Even better, because there is a proposed mechanism rather than an identified treatment effect, we can have some confidence that this result is, to some extent, externally valid!

December 2013 working paper (No IDEAS page). You may wonder what a study like this costs (particularly if you are, like me, a theorist using little more than chalk and a chalkboard); I have no idea, but de la Sierra’s CV lists something like a half million dollars of grants, an incredible total for a graduate student. On a personal level, I spent a bit of time in Burundi a number of years ago, including visiting a jungle camp where rebels from the Second Congo War were still hiding. It was pretty amazing how organized even these small groups were in the areas they controlled; there was nothing anarchic about it.

“The Institutional Causes of China’s Great Famine, 1959-1961,” X. Meng, N. Qian & P. Yared (2011)

Nancy Qian, along with a big group of coauthors, has done a great amount of interesting empirical work in recent years on the economics of modern China; among other things, she has shown that local elections actually do cause policy changes in line with local preferences and that the state remains surprisingly powerful in the Chinese economy. In this paper with Xin Meng and Pierre Yared, she considers what is likely the worst famine in the history of mankind, China's famous famine following the Great Leap Forward. After an agricultural production shock in 1959, a series of misguided policy experiments begun in the mid-1950s (like "backyard steel" production, which produced worthless metal), and an anti-Rightist purge which ended a brief period of less rigid bureaucracy, 30 million or so people would die from hunger over the next two years, with most deaths among the young and the very old. To put this in relative context, in the worst-hit counties, the birth cohorts that were born or should have been born around 1960 and 1961 are today missing more than 80% of their projected members.

What is interesting, and what we have known since Sen, is that famines generally result from problems of food distribution rather than food production. And, indeed, the authors show that total grain production in caloric terms across rural parts of China was a multiple of what was necessary to hold off starvation even at the height of the productivity shock. What is interesting and novel, though, is that provinces with higher historic per-capita grain production had the highest mortality, and likewise counties with the highest per-capita production, as measured by a proxy based on climate, also have the largest number of "missing" members in their birth-year cohort in the 1990 census. This is strange – you might think that places that are living on the edge in normal times are most susceptible to famine.

This is where politics comes into play. The Chinese government "sent down" many competent bureaucrats during the anti-Rightist purges in the late 1950s, limiting the ability of the government to use flexible mechanisms for food procurement. The food system at the time involved the central government collecting a set amount of grain from each region, then returning stocks to communal kitchens. Now, local leaders had a strong incentive to understate how much was produced in a given year so that they could use the remainder for local power purposes. Because of limited communication technology and ineffective bureaucracy, the optimal mechanism (not specified formally here, but apparently done so in an earlier version) for the central government involved pre-setting fixed production goals for every region. Here is the problem: imagine you wish the city, rural area 1 and rural area 2 to have the same expected consumption, with the city producing no food, rural area 1 producing 1 ton per capita per year, and rural area 2 producing 1.4 tons per capita. This gives total consumption of .8 tons per capita if the government sets in advance a fixed "tax" of .2 tons per capita from region 1 and .6 tons per capita from region 2. Now a productivity shock lowers production everywhere by 10 percent. The city still gets its .8 tons per capita (since the "tax" is fixed), but area 1 now gets .9*1-.2=.7 tons per capita, and area 2 gets 1.4*.9-.6=.66 tons per capita. That is, the lack of flexibility in the system is more likely to push the productive regions into famine than other regions.
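The quota arithmetic is worth verifying, since it drives the whole perverse result; here it is, verbatim, in code:

```python
# The paragraph's arithmetic: a fixed ex-ante quota shields the city and
# pushes the historically *more* productive region closest to famine.
output = {"area 1": 1.0, "area 2": 1.4}   # tons per capita per year
quota  = {"area 1": 0.2, "area 2": 0.6}   # fixed procurement "tax"

print("city:", sum(quota.values()))        # 0.8, shock or no shock
for shock in (1.0, 0.9):                   # no shock, then a 10% shock
    for r in output:
        print(shock, r, round(output[r] * shock - quota[r], 2))
# shock 1.0: both areas consume 0.8, equal to the city
# shock 0.9: area 1 falls to 0.70, area 2 (the more productive!) to 0.66
```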

Now, this is not the whole story. Alternative explanations, already suggested in the literature, are also quantitatively important. Places with more anti-Rightist purges before the famine saw higher mortality (see this 2011 APSR by Kung and Chen), as did places with earlier adoption of communal dining halls or larger increases in backyard steel production, both proxies for "zealous" adherence to the Great Leap Forward. I would really like to see some attempt at a decomposition here: if you buy that local political leadership, the central government quota system, and political punishment of counterrevolutionary areas were all important, and that weather shocks alone were not, how many of the deaths should we ascribe to each of those factors? This seems an important question for preventing future famines. A further fleshing out of how these results relate to the old theory-of-the-firm debates about the flexibility of local managers under imperfect and partially unverifiable reporting could also help us understand the CCP's policy choices; I'm thinking, for instance, of explicitly showing whether it is true that a loss of members of the bureaucracy (i.e., an increase in the cost of monitoring) necessarily incentivizes more rigid allocation rules. Theory here could help to quantify how important this mechanism might be.

2011 working paper (IDEAS version). This paper is R&R at ReStud currently. Qian has a couple other working papers that caught my eye. First, a paper with Duflo and Banerjee on Chinese transportation infrastructure finds very little impact on relative incomes of (quasi-random) access to a good transportation network, and suggests in a short model (which is less convincing…) that relative immobility of capital might be causing this. The techniques in the paper are similar to those used by Ben Faber in his very nice paper showing Krugman’s home market effect: if you are small and poor, being connected with a big productive place may not be good for you due to increasing returns to scale. Qian also has a 2013 paper with Nathan Nunn on food aid which suggests, pretty convincingly, that food aid in civil war zones prolongs conflicts; the mechanism, roughly, is that local armies can easily steal the aid and hence have less reason to sue for peace. The identification strategy here is quite nice: the US government buys wheat for price stabilization reasons, then gives much of this away to impoverished countries. The higher the price of wheat, the less the government surplus is, hence the less is given away.

Paul Samuelson’s Contributions to Welfare Economics, K. Arrow (1983)

I happened to come across a copy of a book entitled "Paul Samuelson and Modern Economic Theory" when browsing the library stacks recently. Clear evidence of his incredible breadth is in the section titles: Arrow writes about his work on social welfare, Houthakker on consumption theory, Patinkin on money, Tobin on fiscal policy, Merton on financial economics, and so on. Arrow's chapter on welfare economics was particularly interesting. This book comes from the early 80s, which is roughly the end of social welfare as a major field of study in economics. I was never totally clear on the reason for this – is it simply that Arrow's Possibility Theorem, Sen's Liberal Paradox, and the Gibbard-Satterthwaite Theorem were so devastating to any hope of "general" social choice rules?

In any case, social welfare is today little studied, but Arrow mentions a number of interesting results which really ought to be better known. Bergson-Samuelson, conceived when the two were in graduate school together, is rightfully famous. After a long interlude of confused utilitarianism, Pareto had us all convinced that we should dismiss cardinal utility and interpersonal utility comparisons. This seems to suggest that all we can say about social welfare is that we should select a Pareto-optimal state. Bergson and Samuelson were unhappy with this – individuals, we usually suppose, have preferences which form an order (complete and transitive) over states, and the old utilitarians had a rule which assigned a real number to society's value of any state (hence an order). Being able to order states from a social point of view seems necessary if we are to make decisions, yet some attempts to extend Pareto did not give us an order. (Why is an order important? Arrow does not discuss this, but consider earlier attempts at extending Pareto like Kaldor-Hicks efficiency: going from state s to state s' is KH-efficient if there exist ex-post transfers under which the change is Paretian. Let person a value the bundle (1,1)>(2,0)>(1,0)>all else, and person b value the bundle (1,1)>(0,2)>(0,1)>all else. In state s, person a is allocated (2,0) and person b (0,1). In state s', person a is allocated (1,0) and person b is allocated (0,2). Note that going from s to s' is a Kaldor-Hicks improvement, but going from s' to s is also a Kaldor-Hicks improvement!)
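That cycling is easy to verify mechanically. Here is a small sketch – just my encoding of the example above, nothing more – that enumerates all ex-post transfers and checks the Kaldor-Hicks criterion in both directions:

```python
from itertools import product

# Rankings from the example, best first; anything unlisted is tied-worst.
rank_a = [(1, 1), (2, 0), (1, 0)]
rank_b = [(1, 1), (0, 2), (0, 1)]

def prefers(rank, x, y):
    idx = {b: i for i, b in enumerate(rank)}
    return idx.get(x, len(rank)) <= idx.get(y, len(rank))

def strictly(rank, x, y):
    idx = {b: i for i, b in enumerate(rank)}
    return idx.get(x, len(rank)) < idx.get(y, len(rank))

def redistributions(alloc_a, alloc_b):
    # Every way to reallocate the two-good endowment between a and b.
    total = tuple(alloc_a[g] + alloc_b[g] for g in (0, 1))
    for a0, a1 in product(range(total[0] + 1), range(total[1] + 1)):
        yield (a0, a1), (total[0] - a0, total[1] - a1)

def kaldor_hicks(start_a, start_b, end_a, end_b):
    # Is there an ex-post transfer of the end allocation that makes the
    # move a Pareto improvement over the start?
    for xa, xb in redistributions(end_a, end_b):
        weak = prefers(rank_a, xa, start_a) and prefers(rank_b, xb, start_b)
        strict = strictly(rank_a, xa, start_a) or strictly(rank_b, xb, start_b)
        if weak and strict:
            return True
    return False

s  = ((2, 0), (0, 1))   # state s:  a holds (2,0), b holds (0,1)
sp = ((1, 0), (0, 2))   # state s': a holds (1,0), b holds (0,2)
print(kaldor_hicks(*s, *sp))   # True: s -> s' is a KH improvement
print(kaldor_hicks(*sp, *s))   # True: and so is s' -> s. Not an order!
```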

Bergson and Samuelson wanted to respect individual preferences – society can't prefer s to s' if s' is a Pareto improvement on s in the individual preference relations. Take the unanimity relation RU: we will say that sRUs' if all individuals weakly prefer s to s'. Note that though RU is not complete, it is transitive. Here's the great, and non-obvious, trick. The Polish mathematician Szpilrajn has a great 1930 theorem which says that if R is a transitive relation, then there exists a complete relation R2 which extends R; that is, if sRs' then sR2s', plus we complete the relation by adding some more comparisons. This is not a terribly easy proof, it turns out. That is, there exist social welfare orders which are entirely ordinal and which respect Pareto dominance. Of course, there may be lots of them, and which you pick is a problem of philosophy more than economics, but they exist nonetheless. Note why Arrow's theorem doesn't apply: we are starting with a given profile of preferences and constructing a social preference, rather than attempting to find a rule that maps any profile of individual preferences into a social order. There have been many papers arguing that this difference doesn't matter, so all I can say is that Arrow himself, in this very essay, accepts that difference completely. (One more sidenote here: if you wish to start with individual utility functions, we can still do everything in an ordinal way. It is not obvious that every indifference map can be mapped to a utility function – indeed it is not even true without some type of continuity assumption, especially if we want the utility functions themselves to be continuous. A nice proof of how we can do so using a trick from probability theory is in Neuefeind's 1972 paper, which was followed up in more generality by Mount and Reiter (here at MEDS), then by Chichilnisky in a series of papers. Now just sum up these mapped individual utilities, and I have a Paretian social utility function which was constructed entirely in an ordinal fashion.)
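For a finite set of states, by the way, the Szpilrajn extension is constructive: a topological sort of the unanimity relation produces a complete, transitive order extending it. A quick sketch, with invented preferences (the 1930 theorem itself covers arbitrary sets and needs heavier set-theoretic machinery):

```python
# Finite-state sketch of a Szpilrajn extension: take the (incomplete but
# transitive) unanimity/Pareto relation and extend it to a complete order
# by topological sorting. The preference scores are ordinal content only.
states = ["s1", "s2", "s3", "s4"]
scores = {"ann": {"s1": 3, "s2": 2, "s3": 1, "s4": 0},
          "bob": {"s1": 3, "s2": 1, "s3": 2, "s4": 0}}

def pareto_dominates(x, y):
    return all(s[x] >= s[y] for s in scores.values()) and \
           any(s[x] > s[y] for s in scores.values())

# Kahn-style topological sort: repeatedly pull out a state that nothing
# remaining Pareto-dominates. Ties between Pareto-incomparable states
# (s2 vs s3 here) are broken arbitrarily -- that arbitrariness is exactly
# the "problem of philosophy" in choosing among the many extensions.
remaining, social_order = set(states), []
while remaining:
    top = next(x for x in remaining
               if not any(pareto_dominates(y, x) for y in remaining if y != x))
    social_order.append(top)
    remaining.remove(top)
print(social_order)   # s1 first, s4 last, s2/s3 in arbitrary order
```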

Now, this Bergson-Samuelson seems pretty unusable. What do we learn that we don't know from a naive Pareto property? Here are two great insights. First, choose any social welfare function from the set we have constructed above. Let individuals have non-identical utility functions. In general, there is no social welfare function which is maximized by always keeping every individual's income identical in all states of the world! The proof of this is very easy if we use Harsanyi's extension of Bergson-Samuelson: if agents are Expected Utility maximizers, then any B-S social welfare function can be written as a weighted linear combination of individual utility functions. As relative prices or the social production possibilities frontier change, the weights are constant, but the individual marginal utilities are (generically) not. Hence if it was socially optimal to endow everybody with equal income before the relative price change, it (generically) is not afterward, no matter which Pareto-respecting measure of social welfare your society chooses to use! That is, I think, an astounding result for naive egalitarianism.
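A toy version of this computation, with functional forms entirely of my own choosing rather than anything in Arrow's essay: two Cobb-Douglas agents with different expenditure shares, welfare weights fixed once and for all, and a grid search over income splits. The optimum sits at 50/50 only while the agents' price indices coincide; change relative prices and the same welfare function demands an unequal split.

```python
# Fix the welfare weights, then watch the optimal income division move
# away from 50/50 after a relative price change. All forms are my own.

def price_index(alpha, px, py):
    # Ideal price index for Cobb-Douglas preferences x^alpha * y^(1-alpha).
    return (px / alpha) ** alpha * (py / (1 - alpha)) ** (1 - alpha)

def indirect_utility(m, alpha, px, py):
    # Concave in income, so the welfare optimum is interior.
    return (m / price_index(alpha, px, py)) ** 0.5

def optimal_split(px, py, total=100.0, w_a=0.5, w_b=0.5):
    # Agent a loves good x (share 0.8), agent b loves good y (share 0.2).
    grid = [i / 1000 for i in range(1, 1000)]
    return max(grid, key=lambda s: w_a * indirect_utility(s * total, 0.8, px, py)
                                 + w_b * indirect_utility((1 - s) * total, 0.2, px, py))

print(optimal_split(px=1.0, py=1.0))  # 0.5: symmetric price indices, equal split
print(optimal_split(px=4.0, py=1.0))  # ~0.30: same weights, unequal optimum
```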

Here’s a second one. Surely any good economist knows policies should be evaluated according to cost-benefit analysis. If, for instance, the summed willingness-to-pay for a public good exceeds the cost of the public good, then society should buy it. When, however, does a B-S social welfare function allow us to make such an inference? Generically, such an inference is only possible if the distribution of income is itself socially optimal, since willingness-to-pay depends on the individual budget constraints. Indeed, even if demand estimation or survey evidence suggests that there is very little willingness-to-pay for a public good, society may wish to purchase the good. This is true even if the underlying basis for choosing the particular social welfare function we use has nothing at all to do with equity, and further since the B-S social welfare function respects individual preferences via the Paretian criterion, the reason we build the public good also has nothing to do with paternalism. Results of this type are just absolutely fundamental to policy analysis, and are not at all made irrelevant by the impossibility results which followed Arrow’s theorem.

This is a book chapter, so I'm afraid I don't have an online version. The book is here. Arrow is amazingly still publishing at the age of 91; he had an interesting article with the underrated Partha Dasgupta in the EJ a couple of years back. People claim in surveys that relative consumption a la Veblen matters to them. Yet it is hard to find such effects in the data. Why is this? Assume I wish to keep up with the Joneses when I move to a richer place. If I increase consumption today, I am decreasing savings, which decreases consumption even more tomorrow. How much I wish to change consumption today when I have richer peers thus depends on that dynamic tradeoff, which Arrow and Dasgupta completely characterize.

“Pollution for Promotion,” R. Jia (2012)

Ruixue Jia is on the job market from IIES in Stockholm, and she has the good fortune to have a job market topic which is very much au courant. In China, government promotions often depend both on the inherent quality of the politician and on his connections to current leaders; indeed, a separate paper by Jia finds that promotion probability in China depends only on the interaction of economic growth and personal connections, rather than on either factor by itself. Assume that a mayor can choose how much costly effort to exert. The mayor chooses how much dirty and clean technology – complements in production – to use, with the total amount of technology available an increasing function of the mayor's effort. The mayor may personally dislike dirty technology. For any given bundle of technology, observed economic output is higher the higher the mayor's inherent quality (which he does not know). The central government, when deciding on promotions, only observes economic output.

Since mayors with good connections have a higher probability of being promoted for any level of output in their city, the marginal return to effort and the marginal return to dirty technology are increasing in the connectedness of the mayor. For any given distaste for pollution, a more connected mayor will mechanically want to substitute dirty for clean technology, since higher output is more valuable to him for career concerns while the marginal cost of his distaste for pollution has not changed. Further, by a Le Chatelier argument, higher marginal returns to output increase the optimal effort choice, which allows a higher budget to purchase technology, dirty tech included. To the extent that the government cares about limiting the (unobserved) use of dirty tech, this is "almost" the standard multitasking concern: the folly of rewarding A and hoping for B. Although in this case, empirically there is no evidence that the central government cares about promoting local politicians who are good for the environment!
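A toy version of the comparative static, under my own parameterization rather than Jia's model: output requires both technologies, effort buys the technology budget, and the promotion payoff scales with connections times output. Both margins – effort and dirty tech – rise with connections, which is the Le Chatelier flavor of the argument.

```python
# Toy comparative static: a more connected mayor exerts more effort AND
# uses more dirty tech. Every number here is mine, not Jia's.
def best_choice(conn, distaste=0.3, grid=50):
    best = None
    for e_i in range(1, grid + 1):
        effort = e_i / 10                    # effort buys budget = effort
        for d_i in range(0, e_i + 1):
            dirty = d_i / 10
            clean = effort - dirty
            output = (clean * dirty) ** 0.5  # complements in production
            payoff = conn * output - effort ** 2 - distaste * dirty
            if best is None or payoff > best[0]:
                best = (payoff, effort, dirty)
    return best

for conn in (1.0, 2.0, 4.0):
    _, effort, dirty = best_choice(conn)
    print(f"connections {conn}: effort {effort:.1f}, dirty tech {dirty:.1f}")
```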

How much do local leaders increase pollution (and simultaneously speed up economic growth!) in exchange for a shot at a better job? The theory above gives us some help. We see that the same politician will substitute in dirty technology if, in some year, his old friends get on the committee that assigns promotions (the Politburo Standing Committee, or PSC, in China's case). This allows us to see the effect of the Chinese incentive system on pollution even if we know nothing about the quality of each individual politician or whether highly-connected politicians get plum jobs in low-pollution regions, since every effect we find is at the within-politician level. Using a diff-in-diff, Jia finds that in the year after a politician's old friend makes the PSC, sulfur dioxide goes up 25%, a measure of river pollution goes up by a similar amount, industrial GDP rises by 15%, and non-industrial GDP does not change. So it appears that China's governance institutions do incentivize local leaders, although whether those incentives are good or bad for welfare depends on how you trade off pollution and growth in your utility function.

Good stuff. A quick aside, since what I like about Jia’s work is that she makes an attempt to more than simply find a clever strategy for getting internal validity. Many other recent job market stars – Dave Donaldson and Melissa Dell, for instance – have been equally good when it comes to caring about more than just nice identification. But such care is rare indeed! It has been three decades since we, supposedly, “took the ‘con’ out of Econometrics”. And yet an unbearable number of papers are still floating around which quite nicely identify a relationship of interest in a particular dataset, then go on to give only the vaguest and most unsatisfying remarks concerning external validity. That’s a much worse con than bad identification! Identification, by definition, can only hold ceteris paribus. Even perfect identification of some marginal effect tells me absolutely nothing about the magnitude of that effect when I go to a different time, or a different country, or a more general scenario. The only way – the only way! – to generalize an internally valid result, and the only way to explain why that result is the way it is, is to use theory. A good paper puts the theoretical explanation and the specific empirical case examined in context with other empirical papers on the same general topic, rather than stopping after the identification is cleanly done. And a good empirical paper needs to explain, and needs to generalize, because we care about unemployment (not unemployment in border counties of New Jersey in the 1990s) and we care about the effect of military training on labor supply (not the effect of the Vietnam War on labor supply in the few years following), etc. If we really want the credibility revolution in empirical economics to continue, let’s spend less seminar and referee time worrying only about internal validity, and more time shutting down the BS that is often passed off as “explanation”.

November 2012 working paper. Jia also has an interesting paper about the legacy of China’s treaty ports, as well as a nice paper (a la Nunn and Qian) on the importance of the potato in world history (really! I may be a biased Dorchester-born Mick, but still, the potato has been fabulously important).

“How Does Family Income Affect Enrollment?,” N. Hilger (2012)

Nate Hilger is on the market from Harvard this year. His job market paper continues a long line of inference that is probably at odds with mainstream political intuition. Roughly, economists generally support cash rather than in-kind transfers because people tend to be the best judges of the optimal use of money they receive; food stamps are not so useful if you really need to pay the heat bill that week. That said, if the goal is to cause some behavior change among the recipient, in-kind transfers can be more beneficial, especially when the cash transfer would go to a family while the in-kind transfer would go to a child or a wife.

Hilger managed to get his hands on the full universe of IRS data. I'm told by my empirically-minded friends that this data is something of a holy grail, with the IRS really limiting who can use the data after Saez proved its usefulness. IRS data is great because of the 1098-T: colleges are required to file information about their students' college attendance so that the government can appropriately dole out aid and tax credits. Even better, firms that fire or lay off workers file a 1099-G. Finally, claimed dependents on the individual tax form let us link parents and children. That's quite a trove of data!

Here’s a question we can answer with it: does low household income lower college attendance, and would income transfers to poor families help reduce the college attendance gap? In a world with perfect credit markets, it shouldn’t matter, since any student could pledge the human capital she would gain as collateral for a college attendance loan. Of course, pledging one’s human capital turns out to be quite difficult. Even if the loans aren’t there, a well-functioning and comprehensive university aid program should insulate the poor from this type of liquidity problem. Now, we know from previous studies that increased financial aid has a pretty big effect on college attendance among the poor and lower middle class. Is this because the aid is helping loosen the family liquidity constraint?

Hilger uses the following trick. Consider a worker who is laid off. This is only a temporary shock, but this paper and others estimate that a layoff lowers discounted lifetime earnings by an average of nearly $100,000. So can we just propensity-match laid-off and employed workers when the child is college age, and see if the income shock lowers attendance? Not so fast. It turns out that, matching on whatever observables we have, children whose fathers are laid off when the child is 19 are also much less likely to attend college than children whose fathers are not laid off, even though age 19 would be after the attendance decision is made. Roughly, a father who is ever laid off is correlated with some nonobservables that lower the college attendance of children. So let's compare children whose dads are laid off when the child is 17 to children whose dads are laid off from a similar firm when the child is 19, matching on all other observables. The IRS data has so many data points that this is actually possible.
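Schematically, the design looks something like the following simulation – my own toy rendering with made-up magnitudes, not Hilger's code:

```python
# Toy simulation of the timing design: dads laid off when the child is 17
# (before the college decision) vs. 19 (after). Both groups carry the
# "ever laid off" unobservable, so their difference isolates the income
# shock. All magnitudes are invented for illustration.
import random
random.seed(0)

def child_attends(laid_off_at):
    grit = random.gauss(0, 1)                     # unobserved family factor
    ever_laid_off = -0.10                         # loads on layoffs at ANY age
    shock = -0.005 if laid_off_at == 17 else 0.0  # binds only pre-decision
    p_attend = 0.55 + 0.05 * grit + ever_laid_off + shock
    return random.random() < p_attend

n = 1_000_000
rate_17 = sum(child_attends(17) for _ in range(n)) / n
rate_19 = sum(child_attends(19) for _ in range(n)) / n
print(f"difference: {rate_17 - rate_19:.4f}")     # about -0.005, plus noise
# Naively comparing laid-off to never-laid-off dads would instead pick up
# the -0.10 selection term and wildly overstate the liquidity effect.
```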

What do we learn? First, consumption spending (in this case, on housing) declines roughly in line with the lifetime income hypothesis after the income shock. Second, there is hardly any effect on college attendance and quality: attendance for children whose dads suffer the large income shock falls by half a percentage point. Further, the decline is almost entirely borne by middle-class children, not the very poor or the rich: this makes sense, since poor students rely very little on parental funding to pay for college, and the rich have enough assets to overcome any liquidity shock. The quality of college chosen also declines after a layoff, but only by a very small amount. That is, the Engel curve for college spending is very flat: families with more income tend to spend roughly similar amounts on college.

Policy-wise, what does this mean? Other authors have estimated that a $1000 increase in annual financial aid increases college enrollment by approximately three percentage points (a particularly strong effect is found among students from impoverished families); the Kalamazoo experiment shows positive feedback loops that may make the efficacy of such aid even higher, since students will exert more effort in high school knowing that college is a realistic financial possibility. Hilger's paper shows that a $1000 cash grant to poor families will likely improve college attendance by .007 to .04 percentage points, depending on whether the layoff lowers college attendance through a transitory or a permanent income shock. That is, financial aid is orders of magnitude more useful in raising college attendance than cash transfers, especially among the poor.
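The back-of-envelope scaling behind that comparison, using the headline numbers above (the arithmetic is mine, so treat it as a rough consistency check rather than the paper's calculation):

```python
# A layoff destroys roughly $100,000 of discounted lifetime earnings and
# moves attendance by about 0.5 percentage points. Scaling linearly,
# $1,000 of family cash moves attendance by about:
print(0.5 * 1_000 / 100_000)   # 0.005 percentage points
# versus roughly 3 percentage points per $1,000 of financial aid -- about
# two to three orders of magnitude apart, consistent with Hilger's
# 0.007-0.04 range once transitory vs. permanent shocks are handled.
```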

November 2012 working paper (No IDEAS version). My old Federal Reserve coworker Christopher Herrington is also on the job market, and has a result suggesting the importance of Hilger’s finding. He computes a DSGE model of lifetime human capital formation, and considers the counterfactual where the US has more equal education funding (that is, schools that centrally funded rather than well-funded in rich areas and poorly-funded in poor areas). Around 15% of eventual earnings inequality – again taking into account many general equilibrium effects – can be explained by the high variance of US education funding. As in Hilger, directly altering the requirement that parents pay for school (either through direct payments at the university level, or by purchasing housing in rich areas at the primary level) can cure a good portion of our growing inequality.

“Paternalism, Libertarianism and the Nature of Disagreement,” U. Loginova & P. Persson (2012)

Petra Persson is on the job market this year from Columbia. Her CV is pretty incredible – there's pure theory, cutting-edge empirical techniques, policy work, networks, behavioral and more. Her job market paper is about the impact of social insurance policy on seemingly unrelated markets like marriage, and I'll discuss it briefly at the end of the post, but I want to focus on another paper of hers which struck me as quite interesting.

Imagine a benevolent ruler who has private information about some policy, such as the relative safety of wearing seatbelts. This ruler can tell citizens the information, lie about it, or coerce them into taking some action. Naive libertarianism suggests that an altruistic ruler should always be truthful: citizens can then weigh the information according to their preferences and choose the action that is optimal for them.

But note something interesting. On some issues, one subset of politicians has libertarian leanings, while on others, a different subset has those leanings. For instance, a politician may favor legal assisted suicide but insist on mandatory seatbelt rules, while another politician may be against the mandatory belt and also against legal assisted suicide. Politicians can even vary in how libertarian they wish to be depending on who the policy affects. Witness that many politicians favor legalizing marijuana, but very few favor legalizing it for 16-year-olds. What explains this behavior?

Loginova and Persson examine this theoretically. Take a population of citizens. There are two possible states, 0 and 1. Citizens can either think each state equally likely yet have heterogeneous preferences that differ from the politician's (measured with a Crawford-Sobel style quadratic loss, though this modeling choice isn't critical), or they can have preferences identical to the politician's yet heterogeneous (prior) beliefs about the probability of each state. The politician can be altruistic to varying degrees – more altruism means he, according to his own prior, puts more and more weight on the utility of the agent. The politician gets a noisy signal about the true state. To limit the extent of opposing beliefs, the politician is restricted to having the same prior as the median citizen.

If the politician can only advise or not advise, when does he make a truthful public announcement? If he disagrees with the citizens on preferences, then the more altruistic he is, the more likely he is to announce truthfully, for the standard libertarian reason: the citizens know their own preferences, and the better informed they are, the better they can maximize their own welfare. If, however, he disagrees with the citizens on priors, then the more altruistic he is, the less likely he is to announce truthfully. Altruism means I care about the citizens' welfare, but since they have priors that are in my eyes wrong, the citizens know that even when I am altruistic I have an incentive to lie so that they take actions that are optimal according to my prior; truthful communication therefore cannot be sustained.
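To see concretely why prior disagreement unravels truth-telling, here is a bare-bones sketch with a single citizen and a binary signal – my simplification for illustration, not the paper's model:

```python
# Both parties share the same quadratic loss but update from different
# priors. The politician wants the citizen's action at the politician's
# posterior; when the citizen's possible posteriors all sit far away,
# lying can pull the action closer, so truthful messaging breaks.
def posterior(prior, signal, accuracy=0.6):
    like1 = accuracy if signal == 1 else 1 - accuracy        # P(signal | state 1)
    like0 = (1 - accuracy) if signal == 1 else accuracy      # P(signal | state 0)
    return prior * like1 / (prior * like1 + (1 - prior) * like0)

politician_prior, citizen_prior = 0.2, 0.8
for s in (0, 1):
    target = posterior(politician_prior, s)            # action he wants taken
    action_if_truth = posterior(citizen_prior, s)      # citizen believes him
    action_if_lie   = posterior(citizen_prior, 1 - s)
    honest = abs(action_if_truth - target) <= abs(action_if_lie - target)
    print(f"signal {s}: target {target:.2f}, tells truth? {honest}")
# signal 1: target 0.27, but reporting 0 moves the citizen to 0.73 rather
# than 0.86 -- the politician prefers to lie, so truth-telling unravels.
```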

Now what if the politician could (at a cost to him) force all individuals to take an individual action? With preference disagreement, an altruistic politician would never do this, both because he can send all the information to citizens with a free message and also because a mandate does not respect heterogeneity of preferences. Even if action 0 is better than action 1 for 90% of the population, an altruistic principal also cares about the other 10%. With disagreement about priors, however, a politician is more likely to impose a mandate the more altruistic he is. Even though citizens have heterogeneous priors, the principal thinks all of them are wrong, and hence is not worried about heterogeneity when imposing a mandate. Since we noted in the last paragraph that altruistic politicians who have different priors from citizens will not be able to credibly send their information, the mandate allows the politician's private information to be used in the citizens' actions.

Finally, what if the politician can send individual-level messages or enforce individual mandates? A politician with preference disagreement needs to be fairly altruistic for his public message to be credible; in fact, he must be able to credibly persuade the individual with the average disagreement. If he is not altruistic enough, he can still credibly persuade those agents who have only a limited amount of preference disagreement with him. If mandates are possible, the politician with limited altruism will force individuals whose preferences are very different from his own to take his desired action, but since the preferences of the politician and the agents are more aligned when altruism is higher, the share of citizens who face a mandate declines as the politician's altruism increases. Likewise, a politician with disagreement about priors can only truthfully send information when his altruism is low. If the politician is very altruistic, even though his public signal will not be believed, he can still credibly send information to those whose priors are similar to his. The politician with low levels of altruism will only mandate the actions of agents with extreme beliefs, but as altruism increases, more and more citizens will face a mandate.

Very good – the use of paternalistic policies, and the extent to which they are targeted at individuals, depends qualitatively on whether the politician disagrees with the agents about their preferences or about their knowledge, and the extent to which mandates are applied on certain groups depends on how extreme their preferences or beliefs are. There is nothing inherently contradictory in an altruistic politician taking the libertarian side on one issue and the paternalistic side on another.

July 2012 working paper (No IDEAS version). Petra has many other interesting papers. In her job market paper, presented here last week, she shows that social insurance, in this case a widow's benefit in Sweden, can have major effects in other markets. In particular, a really nice regression discontinuity shows that the benefit was leading to a huge number of extra marriages, that these were more likely to end in divorce, that intrahousehold bargaining was affected, and much more (Jeff at Cheap Talk has a longer description). Her paper Circles of Trust notes a reason for cliquish behavior in some labor markets. If I have information whose value declines with use (such as a stock tip) and I am altruistic, I may wish to tell my friends the info. But I worry that they will tell their friends, whom I don't know and hence don't really care about. If my friend could commit not to tell his friends, I would give him the info. How can we commit ex ante? Make our social network a clique. I would bet that this phenomenon explains hiring in, say, small hedge funds to a great extent.

Game Theory and History, A. Greif & Friends (1993, 1994)

(This post refers to A. Greif, “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, AER 1993, and A. Greif, P. Milgrom & B. Weingast, “Coordination, Commitment and Enforcement: The Case of the Merchant Guild,” JPE 1994.)

Game theory, after a rough start, may actually be fulfilling its role as proposed by Herbert Gintis: unifier of the sciences. It goes without saying that game theoretic analysis is widespread in economics, political science (e.g., voter behavior), sociology (network games), law (antitrust), computer science (defending networks against attacks), biology (evolutionary strategies), pure philosophy (more on this in a post tomorrow!), with occasional appearances in psychology, religion (recall Aumann’s Talmud paper), physics (quantum games), etc. But history? Surely game theory, particularly the more complex recent results, has no place there? Yet Avner Greif, an economic historian at Stanford, has shown that games can play a very interesting role indeed in understanding historical events.

Consider first his Maghribi traders paper. In the 11th and 12th centuries, a group of Judeo-Arabic traders called the Maghribis traded across the Mediterranean. Two institutional aspects of their trade are interesting. First, they all hired agents in foreign cities to carry out their trade, and second, they generally used other Maghribi merchants as their agents. This is quite different from, for instance, Italy, where merchants tended to hire agents in foreign cities who were not themselves merchants. What explains that difference, and more generally, how can long-distance merchants ensure that their agents do not rip them off? For instance, how do I keep an agent from claiming he sold at a low price when actually he sold at a high one?

To a theorist, this looks like a repeated reputational game with imperfect monitoring. Greif doesn't go the easy route and just assume there are trustworthy and untrustworthy types. Rather, he assumes that there are a set of potential agents who can be hired in each period, that agents are exogenously separated from merchants with probability p in each period, and that merchants can choose to hire and fire at any wage they choose. You probably know from the economics of reputation or from the efficiency wage literature that I need to offer wages higher than the agent's outside option to keep him from stealing; the value of the continuation game, then, is more than the value of stealing now. Imagine that I fire anyone who steals and never hire him again. How do I ensure that other merchants do not then hire that same agent (perhaps the agent will say, "Look, give me a second chance and I will work at a lower wage")? Well, an agent who has cheated one merchant will never be hired by that merchant again. This means that when he is in the unemployed pool, even if other merchants are willing to hire him, his probability of getting hired is lower, since one merchant will definitely not hire him. That means that the continuation value of the game if he doesn't steal from me is lower. Therefore, the efficiency wage I must pay him to keep him from stealing is higher than the efficiency wage I can pay someone who hasn't ever stolen, so I strictly prefer to hire agents who have never stolen. This allows the whole coalition to coordinate. Note that the fewer agents there are, the higher the continuation value from not stealing, and hence the lower the efficiency wage I can pay: it is optimal to keep the set of potential agents small.
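Here is a stylized rendering of that wage calculation – simplified value functions in the spirit of Greif's argument, not his exact model (for one thing, I assume rehired cheaters behave honestly ever after, and all parameter values are invented):

```python
# An employed agent can stay honest (wage w, exogenous separation with
# prob P_SEP) or cheat once (grab GAIN, get fired). Cheaters are rehired
# with probability h_cheat < h_honest: the coalition's collective
# punishment. Find the lowest honesty-inducing wage by bisection.
DELTA, P_SEP, GAIN = 0.9, 0.1, 10.0

def values(w, h_honest, h_cheat, iters=2000):
    ve = vu = vc = 0.0   # employed / honest-unemployed / fired-cheater values
    for _ in range(iters):          # value iteration on the linear system
        ve = w + DELTA * ((1 - P_SEP) * ve + P_SEP * vu)
        vu = DELTA * (h_honest * ve + (1 - h_honest) * vu)
        vc = DELTA * (h_cheat * ve + (1 - h_cheat) * vc)
    return ve, vu, vc

def min_honest_wage(h_honest, h_cheat):
    lo, hi = 0.0, 100.0
    for _ in range(60):             # bisect on the no-cheat condition
        w = (lo + hi) / 2
        ve, _, vc = values(w, h_honest, h_cheat)
        lo, hi = (lo, w) if ve >= GAIN + DELTA * vc else (w, hi)
    return hi

print(min_honest_wage(h_honest=0.5, h_cheat=0.5))  # no collective punishment
print(min_honest_wage(h_honest=0.5, h_cheat=0.0))  # cheaters never rehired:
                                                   # a much lower wage suffices
```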

What of the Italian merchants? Why do they not hire only each other? Maghribi merchants tended to be involved only in long distance trade, while Italian merchants were also involved in real estate and other pursuits. This means the outside option (continuation value after cheating if no one hires me again) is higher for Italian merchants than Maghribi merchants, which means that hiring merchants at the necessary efficiency wage will be relatively more expensive for Italians than Maghribis.

A followup by Greif, with Milgrom and Weingast, considers the problem of long distance trade from the perspective of cities. Forget about keeping your agent from ripping you off: how do you keep the city from ripping you off? For instance, Genoans in Constantinople had their district overrun by a mob at one point, with no compensation offered. Sicilians raised taxes on sales by Jews at one point after they had brought their goods for sale. You may naively think that reputation alone will be enough; I won’t rip anyone off because I want a reputation of being a safe and fair city for trade.

But again, the literature on repeated games tells us this will not work. Generally, I need to punish deviations from the efficient set of strategies, and punish those who themselves do not punish deviators. In terms of medieval trade, to keep a city from ripping me off, I need not only to punish the city by bringing it less trade, but also to make sure the city doesn't make up for my lost trade by offering a special deal to some other trader. That is, I need to get information about a violation against a single trader to other traders, and I need to make sure they are willing to punish the deviating city.

The merchant guild was the institution that solved this problem. Merchant guilds were able to punish their own members by, for example, keeping them from earning rents from special privileges in their own city. In the most general setting, when a guild orders a boycott, cities may be able to attract some trade, but less than the efficient amount, because only by offering a particularly good deal can the city entice merchants to come during a boycott and convince them that it will not steal.

This is all to say that strong guilds may be in the best interest of cities, since they allow the city to solve its commitment problem. The historical record contains many examples of cities encouraging guilds to come trade, and encouraging the strengthening of guilds. Only a reputational model like the above one can explain such city behavior; if guilds were merely extracting rents with monopoly privilege, cities would not encourage them at all. Both of these papers, I think, are quite brilliant.

1993 AER (IDEAS version) and 1994 JPE (IDEAS version). Big thumbs up to Avner for having the final published versions of these papers on his website.

The Well-Calibrated Pundit

I’ll get back to the usual posting regimen on new research, but the recent election is a great time to popularize some ideas that are well known in the theory community, though perhaps not generally. Consider the problem of punditry. Nature is going to draw an election winner, perhaps in a correlated way, from 51 distributions representing each state plus DC. An “expert” is someone who knows the true distribution, e.g., “With .7 probability, independent from all other states, Obama will win in New Hampshire.” We wish to identify the true experts. You can see the problem: the true expert knows distributions, yet we who are evaluating the expert can only see one realization from each distribution.

When forecasts are made sequentially – imagine a weather forecaster declaring whether it will rain or not every day – there is a nice literature (done principally here at MEDS) about the problem of divining experts. Essentially, as first pointed out by Foster and Vohra in a 1998 Biometrika, imagine that you set a rule such that a true expert, who knows the underlying distribution each period, “passes” the rule with very high probability. It then turns out (this can be proven using a version of the minmax theorem) that a complete ignoramus who knows nothing of the underlying distribution can also pass your test. This is true no matter what the test is.

Now, the testing literature is interesting, but more interesting are the properties of what a good test for a forecaster might be. In an idea I first saw in a famous 1982 paper in JASA, one minimally sensible rule might be called "calibration". I am well-calibrated if, on the days when I predict rain with probability .4, it actually rains 40 percent of the time. Clearly this is not sufficient – I am well-calibrated if I simply predict the long-run empirical frequency of rain every day – but it seems a good minimum necessary condition. A law of large numbers argument shows that a true expert will pass a calibration test with arbitrarily high probability. With a lot of data points, we could simply bin predictions (say, days where the prediction is between 40 and 45%), and graph those points against the actual empirical realization on the predicted days; a well-calibrated forecast would generate all data points along the 45-degree line.
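Computing such a calibration table is only a few lines; here is a sketch, with simulated forecasts standing in for real ones:

```python
# Bare-bones calibration check: bin forecasts, then compare each bin's
# mean forecast with the empirical frequency of the event in that bin.
from collections import defaultdict
import random

def calibration_table(forecasts, outcomes, n_bins=10):
    bins = defaultdict(list)
    for p, y in zip(forecasts, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    table = []
    for b in sorted(bins):
        ps, ys = zip(*bins[b])
        table.append((sum(ps) / len(ps), sum(ys) / len(ys), len(ys)))
    return table   # (mean forecast, empirical frequency, count) per bin

# A "true expert" who forecasts the actual probabilities hugs the
# 45-degree line; a pundit who doesn't will show systematic gaps.
random.seed(1)
truth = [random.random() for _ in range(100_000)]
outcomes = [random.random() < p for p in truth]
for f, freq, n in calibration_table(truth, outcomes):
    print(f"forecast {f:.2f}  frequency {freq:.2f}  n={n}")
```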

Here is where we come to punditry. The recent election looks like a validation for data-using pundits like Nate Silver, and in many ways it is; people were calling people like him idiots literally a week ago, yet Silver, Sam Wang and the rest have more or less correctly named the winner in every state. But, you might say, what good is that? After all, aside from Florida, Intrade also correctly predicted the winner in every state. Here is where we can use calibration tests to figure out who is the better pundit.

Across the competitive states, Intrade had Virginia and Colorado as tossups; Ohio, Iowa, New Hampshire and Florida as 2/3 favorites for the frontrunner (Obama in the first three, Romney in FL); and Wisconsin, Pennsylvania, Michigan, Nevada, Ohio and North Carolina as 75 to 85% chances for the frontrunner (Obama in the first five, Romney in NC). A well-calibrated prediction would have had half the tossup states go to each candidate, 67% of the second group of states to the frontrunner, and 80% or so of the third group to the frontrunner. That is, a well-calibrated Intrade forecast should have "missed" on .5*2+.33*4+.2*6, or roughly 3.5, states. Intrade actually missed only one.

Doing the same exercise with Silver’s predictions, he had Florida as a tossup, Colorado as a .8 Obama favorite, NC a .8 Romney favorite, Iowa and NH about .85 Obama favorites, Ohio and Nevada about .9 Obama favorites, and the rest of the swing states a .95 or higher Obama favorite. A well-calibrated Silver forecast, then, should have been wrong on about 1.5 states. With Florida going to Obama, Silver will have correctly called every state. There is a very reasonable argument that Silver’s prediction would have been better had Florida gone to Romney! He would have called fewer states correctly in their binary outcomes, but his percentages would have been better calibrated.

That is, Silver is 1.5 states "off" the well-calibrated prediction, and Intrade 2.5 states "off" the well-calibrated prediction; we say, then, that Silver's predictions were better. A similar calculation could be made for other pundits. Such an exercise is far better than the "how many states did you call correctly?" reckoning that you see, for example, here.

Two caveats. First, the events here are both simultaneous and correlated. If we had an even number of Romney-leaning and Obama-leaning swing states, the correlation would be much less important, but given how many of the close states were predicted to go for Obama, you might worry that even true experts will either get 0 states wrong, or a whole bunch of states wrong. This is a fair point, but absent total correlation it is tangential to the general argument: a forecaster of probabilistic binary events should not expect to be correct on every event. Second, you may note that forecasters like Silver also made predictions about vote shares and other factors. I agree that it is much more useful to distinguish good and bad forecasters using that more detailed, non-binary data, but even there, nature is only giving us one realization from a true underlying distribution in each state, so the point about calibration still applies. (EDIT: I should note here that if you're interested in calibrated forecasts, there are much more sophisticated ways of doing the type of analysis that I did above, though with the same qualitative point. Google "Brier score" for a particularly well-known way to evaluate binary outcomes; the Brier score can be decomposed in a way that extracts something very similar to the more basic analysis above. In general, scoring rules are in a branch of statistics that we economists very much like; unlike pure frequentism or Bayesianism, scoring rules and other Wald-style statistics implicitly set out a decision problem with a maximand before doing any analysis. Very satisfying.)
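For concreteness, here are the expected-miss reckoning and the Brier scores in a few lines – with the caveat that the three 0.95 entries standing in for Silver's remaining swing states are a count of my own choosing, since the text above doesn't pin it down:

```python
# Expected misses and Brier scores for the two forecasters. Probabilities
# are the frontrunner chances summarized above; outcomes are 1 when the
# frontrunner won. Intrade's lone miss was Florida.
def expected_misses(probs):
    # A calibrated forecaster expects to blow each call with prob 1 - p.
    return sum(1 - p for p in probs)

def brier(probs, outcomes):
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

intrade = [0.5] * 2 + [2 / 3] * 4 + [0.8] * 6
silver = [0.5, 0.8, 0.8, 0.85, 0.85, 0.9, 0.9] + [0.95] * 3  # count assumed

print(expected_misses(intrade))   # ~3.5 expected misses; Intrade had 1
print(expected_misses(silver))    # ~1.5 expected misses; Silver had 0

intrade_outcomes = [1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1]      # 0 = the FL miss
silver_outcomes = [1] * len(silver)
print(brier(intrade, intrade_outcomes))   # ~0.127
print(brier(silver, silver_outcomes))     # ~0.040: Silver again better
```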

Back to our regular style of post tomorrow.

“Trafficking Networks and the Mexican Drug War,” M. Dell (2011)

Job market talks for 2012 have concluded at many schools, and therefore this is my last post on a job candidate paper. This is also the only paper I didn't have a chance to see presented live, and for good reason: Melissa Dell is clearly this year's superstar, and I think it's safe to assume she can have any job she wants, and at a salary she names. I have previously discussed another paper of hers – the Mining Mita paper – which would also have been a mindblowing job market paper; essentially, she gives a cleanly identified and historically important example of long-run effects of institutions a la Acemoglu and Robinson, but the effect she finds is that "bad" institutions in the colonial era led to "good" outcomes today. The mechanism by which historical institutions persist is not obvious and must be examined on a case-by-case basis.

Today’s paper is about another critical issue: the Mexican drug war. Over 40,000 people have been killed in drug-related violence in Mexico in the past half-decade, and that murder rate has been increasing over time. Nearly all of Mexico’s domestic drug production, principally pot and heroin, is destined for the US. There have been suggestions, quite controversial, that the increase in violence is a result of Mexican government policies aimed at shutting down drug gangs. Roughly, some have claimed that when a city arrests leaders of a powerful gang, the power vacuum leads to a violent contest among new gangs attempting to move into that city; in terms of the most economics-laden gang drama, removing relatively non-violent Barksdale only makes it easier for violent Marlo.

But is this true? And if so, when is it true? How ought Mexico to deploy scarce drugfighting resources? Dell answers all three questions. First, she notes that the Partido Acción Nacional is, for a number of reasons, associated with greater crackdowns on drug trafficking in local areas. She then runs a regression discontinuity on municipal elections – which vary nicely over time in Mexico – where PAN barely wins versus barely loses. These samples appear balanced according to a huge range of regressors, including the probability that PAN has won elections in the area previously, a control for potential corruption at the local level favoring PAN candidates. In a given municipality-month, the probability of a drug-related homicide rises from 6 percent to 15 percent following a PAN inauguration after such a close election. There does not appear to be any effect during the lame-duck period before PAN takes office, so the violence appears correlated with anti-trafficking policies that occur after PAN takes control. There is also no such increase in cases where PAN barely loses. The effect is greatest in municipalities on the border of two large drug gang territories. The effect is also greatest in municipalities where detouring around that city on the Mexican road network heading toward the US is particularly arduous.

These estimates are interesting, and do suggest that Mexican government policy is causally related to increasing drug violence, but the more intriguing question is what we should do about this. Here, the work is particularly fascinating. Dell constructs a graph where the Mexican road network forms edges and municipalities form vertices. She identifies regions which are historical sources of pot and poppyseed production, and identifies ports and border checkpoints. Two models on this graph are considered. In the first model, drug traffickers seek to reach a US port of entry along the shortest possible route. When PAN wins a close election, that municipality is assumed closed to drug traffic and gangs reoptimize routes. We can then identify which cities are likely to receive diverted drug traffic. Using data on drug possession arrests above $1000 – traffickers, basically – she finds that drug confiscations in the cities the model expects to get traffic post-election indeed rise 18 to 25 percent, depending on your measure. This is true even when the predicted new trafficking routes do not have a change in local government party: the rise in confiscations is not simply PAN arresting more people, but actually does seem to reflect more traffic along the route.
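The first model is, at heart, shortest-path rerouting. A sketch on a hypothetical toy network (Dell's graph is the actual Mexican road network, of course):

```python
# Traffickers take shortest paths to the border; closing a node (a close
# PAN win) diverts flow to predictable neighbors. Toy network, not data.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("source", "A", 1), ("source", "B", 2),
    ("A", "port", 2), ("B", "port", 2), ("A", "B", 1),
])

before = nx.shortest_path(G, "source", "port", weight="weight")
G_closed = G.copy()
G_closed.remove_node("A")                     # crackdown closes city A
after = nx.shortest_path(G_closed, "source", "port", weight="weight")
print(before, after)   # ['source','A','port'] -> ['source','B','port']
# Cities on `after` but not `before` are predicted to see confiscations
# (and violence) rise post-election, which is what the data show.
```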

A second model is even nicer. She considers the equilibrium where traffickers try to avoid congestion. That is, if all gangs go to the same US port of entry, trafficking is very expensive. She estimates a cost function using pre-election trafficking data – the estimates are fairly robust to differing assumptions about the nature of congestion costs – and solves for the Wardrop equilibrium, a concept allowing for relatively straightforward computational solutions to congestion games on a network. The model, with cost parameters estimated on pre-election data, very closely matches actual data on known drug trafficking at that time – congestion at US ports appears to be really important, whereas congestion on internal Mexican roads doesn't matter too much. Now again, she considers the period after close PAN elections, assuming that these close PAN victories increase the cost of trafficking by some amount (results are robust to the exact amount), and resolves the congestion game from the perspective of the gangs. As in the simpler model, drug trafficking rises by 20 percent or so in municipalities that gain a drug trafficking route after the elections. The probability of drug-related homicide similarly increases. A really nice sensitivity check is performed by examining cocaine interdictions in the same cities: they do not increase at all, as the model predicts, since the model maps trafficking routes from pot and poppy production sites to the US, and cocaine is only transshipped through Mexico via entry points unknown to the researcher.
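The Wardrop condition itself – every used route has equalized cost, so no trafficker gains by switching – is easy to solve on a toy network; the cost functions below are hypothetical stand-ins for Dell's estimated ones:

```python
# Wardrop equilibrium on two routes to the border: flow splits so that
# the costs of the used routes are equal. Found here by bisection.
TOTAL = 1.0                                   # total drug flow to ship

def cost_route1(x):   # e.g., a route into a congested US port of entry
    return 1.0 + 3.0 * x

def cost_route2(x):   # a longer route through a less congested crossing
    return 2.0 + 1.0 * x

lo, hi = 0.0, TOTAL
for _ in range(60):
    x = (lo + hi) / 2                         # flow on route 1
    if cost_route1(x) > cost_route2(TOTAL - x):
        hi = x                                # route 1 too crowded
    else:
        lo = x
print(f"route 1 flow {x:.3f}, common cost {cost_route1(x):.3f}")  # 0.500, 2.500
# Raising route 1's cost (a crackdown) and re-solving shifts flow toward
# route 2, predicting where trafficking and violence move next.
```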

So we know now that, particularly when a territory is on a predicted trafficking route near the boundary of multiple gang territories, violence will likely increase after a crackdown. And we can use the network model to estimate what will happen to trafficking costs if we set checkpoints to make some roads harder to use. Now, given that the government has resources to set checkpoints on N roads, with the goal of increasing trafficking costs and decreasing violence, where ought checkpoints be set? Exact solutions turn out to be impossible – this "vital edges" problem is NP-hard and the number of edges is in the tens of thousands – but approximate algorithms can be used, and Dell shows which areas will benefit most from greater police presence. The same model, as long as data is good enough, can be applied to many other countries. Choosing trafficking routes is a problem played often enough by gangs that if you buy the 1980s arguments about how learning converges to Nash play, then you may believe (I do!) that the problem of selecting where to spend government counter-drug money is amenable to game theory using the techniques Dell describes. Great stuff. Now, between the lines, and understand this is my reading and not Dell's claim, I get the feeling that she also thinks that the violence spillovers of interdiction are so large that the Mexican government may want to consider giving up altogether on fighting drug gangs.
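A greedy heuristic conveys the flavor of the checkpoint-placement exercise, though I should stress this is a generic interdiction sketch on a made-up network, not Dell's algorithm:

```python
# With a budget of N checkpoints, repeatedly place one on the edge whose
# cost increase most raises traffickers' total shortest-path cost.
import networkx as nx

def greedy_checkpoints(G, origins, dest, budget, penalty=5.0):
    def total_cost(graph):
        return sum(nx.shortest_path_length(graph, o, dest, weight="weight")
                   for o in origins)
    chosen = []
    for _ in range(budget):
        best_edge, best_gain = None, 0.0
        for u, v in G.edges():
            trial = G.copy()
            trial[u][v]["weight"] += penalty   # a checkpoint raises edge cost
            gain = total_cost(trial) - total_cost(G)
            if gain > best_gain:
                best_edge, best_gain = (u, v), gain
        if best_edge is None:
            break                              # no edge raises costs further
        u, v = best_edge
        G[u][v]["weight"] += penalty           # commit the checkpoint
        chosen.append(best_edge)
    return chosen

G = nx.Graph()
G.add_weighted_edges_from([("p1", "m", 1), ("p2", "m", 1), ("m", "port", 1),
                           ("p1", "port", 4), ("p2", "port", 4)])
print(greedy_checkpoints(G, origins=["p1", "p2"], dest="port", budget=1))
# [('m', 'port')]: the shared bottleneck edge is the best checkpoint.
```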

http://econ-www.mit.edu/files/7484 (Nov 2011 Working Paper. I should note that this year is another example of strong female presence at the top of the economics job market. The lack of gender diversity in economics is problematic for a number of reasons, but it does appear things are getting better: Heidi Williams, Alessandra Voena, Melissa Dell, and Aislinn Bohren, among others, have done great work. The lack of socioeconomic diversity continues to be worrying, however; the field does much worse than fellow social sciences at developing researchers hailing from the developing world, or from blue-collar family backgrounds. Perhaps next year.)

“Buy Coal!: A Case for Supply Side Environmental Policy,” B. Harstad (2012)

The vast majority of world nations are not currently participating in agreements to limit global warming. Many countries cut down their rainforests in a way harmful to global social welfare. Even worse, attempts to improve things by the countries that do care can be self-defeating because of the problem of “leakage”, or what we economists just call supply and demand. Imagine Sweden cuts emissions by lowering domestic demand for oil. That lowers world demand for oil, lowering world price, hence increasing quantity demanded elsewhere. Boycotts may work in a similar way: when consumers in Canada stop buying some rare wood, the price of that wood falls, increasing the quantity of wood consumed in other countries.

What to do? Well, Coase tells us that externalities are in many cases not a worry when property rights are properly defined. Instead of trying to limit demand-side consumption, why not limit supply? In particular, suppose that one set of countries (call them Scandinavia) perceives some harm from consumption of oil, and another set doesn't care (let's call them Tartary, after my favorite long-lost empire). Oil is costly to produce, and there is no entry, which isn't a bad assumption for something like oil. Let there be a market for oil deposits – and you may have noticed from the number of Chinese currently laying pipe in Africa that such a market exists!

Let (q*,p*) be the quantity and price that clear the world market. Let h be the marginal harm to Scandinavia from global oil consumption. Let qopt be the socially optimal level of consumption from the perspective of Scandinavia, and popt the price that clears the market at that quantity. The Scandinavians just need to buy all the oil deposits whose cost of extraction is higher than popt minus h and lower than popt. Once they own the rights, they place an extraction tax on those deposits equal to the harm, h. With such a policy, no one exploits these marginal oil fields because of the tax, and no one exploits any more costly-to-extract fields because the cost of extraction is higher than the world oil price. There are many well-known mechanisms for buying the marginal oil fields at a cost lower than the harm inflicted on Scandinavia if the oil were exploited: the exact cost is particularly low if a few countries own all the world oil, since those countries will benefit from Scandinavia's policy as the world oil price rises following Scandinavia's purchase of the marginal fields. Note that this policy is also nice in that oil, after the policy, costs exactly the same in Tartary and Scandinavia, so there is no worry about firms moving to the country with lax environmental policies. Another benefit is that it avoids the time inconsistency of related dynamic problems, such as using subsidies for green technology until they are invented, then getting rid of the subsidies.
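A numerical sketch of the deposit-purchase logic, with a linear demand curve and a continuum of extraction costs as my own simplifying assumptions, not Harstad's calibration:

```python
# Assumptions (mine): inverse world demand p = 2 - q, and deposits indexed
# by extraction cost c on [0, 2], so competitive supply at price p is q = p.
h = 0.3                       # marginal harm to Scandinavia per unit burned

p_star = 1.0                  # laissez-faire: 2 - q = p with q = p gives p* = 1
# Scandinavia's optimum: extract only while demand price covers c + h.
# The marginal deposit extracted has cost q, so solve 2 - q = q + h:
q_opt = (2.0 - h) / 2         # 0.85
p_opt = 2.0 - q_opt           # 1.15; note q_opt = p_opt - h
print(p_star, q_opt, p_opt)

# The policy: buy every deposit with cost in (p_opt - h, p_opt], here
# (0.85, 1.15], and tax its extraction by h. Those fields sit idle
# (c + h > p_opt); cheaper fields still pump (c <= 0.85 < p_opt); fields
# costlier than 1.15 were never profitable anyway. Demand at 1.15 is
# exactly 0.85, so the market clears with no leakage to Tartary.
```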

There are some policies like this currently in place: for example, Norway’s environmental agency buys the rights to forest tracts and keeps them unexploited. But note that you have to buy the right tract to avoid leakage: you want to buy the tract that is worth exploiting, but just barely. This is great for you as the environmentalist, though, since this will be the cheapest tract to buy given the limited profit to be made if it is cut down!

This paper should also suggest to you other ways to enact environmental policy when facing leakage: political recalcitrance doesn’t mean we are completely out of options. The problem is that you want to decrease quantity consumed in your country – whose policies you control – without causing quantity consumed to rise elsewhere as price falls. The Pigouvian solution is to make the marginal suppliers unprofitable, or make the marginal demanders lower their willingness to pay. One way to do this without tax policy is to introduce products that are substitutes for the polluting good: clean energy, for instance. Or introduce complements for products which are substitutes for the polluting product. There are many more!

http://www.kellogg.northwestern.edu/faculty/harstad/htm/deposits.pdf (January 2012 draft – forthcoming in the Journal of Political Economy)
