Category Archives: Political Economy

“International Trade and Institutional Change: Medieval Venice’s Response to Globalization,” D. Puga & D. Trefler

(Before discussing the paper today, I should forward a couple of great remembrances of Stanley Reiter, who passed away this summer, by Michael Chwe (whose interests at the intersection of theory and history are close to my heart) and Rakesh Vohra. After leaving Stanford – Chwe mentions this was partly due to a nasty letter written by Reiter’s advisor Milton Friedman! – Reiter established an incredible theory group at Purdue which included Afriat, Vernon Smith and PhD students like Sonnenschein and Ledyard. He then moved to Northwestern, where he helped build up the great group in MEDS, whose roster is too long to list but which already includes one Nobel winner in Myerson and, by my reckoning, two more who are favorites to win the prize next Monday.

I wonder if we may be at the end of an era for topic-diverse theory departments. Business schools are all a bit worried about “Peak MBA”, and theorists are surely the first ones out the door when enrollment falls. Economics departments, journals and funders seem to have shifted, in the large, toward more empirical work, for better or worse. Our knowledge both of how economic and social interactions operate in their most platonic form, and our ability to interpret empirical results when considering novel or counterfactual policies, have greatly benefited from the theoretical developments following Samuelson and Hicks’ mathematization of primitives in the 1930s and 40s, and the development of modern game theory and mechanism design in the 1970s and 80s. Would that a new Cowles and a 21st century Reiter appear to help create a critical mass of theorists again!)

On to today’s paper, a really interesting theory-driven piece of economic history. Venice was one of the most important centers of Europe’s “commercial revolution” between the 10th and 15th centuries; anyone who read Marco Polo as a schoolkid knows of Venice’s prowess in long-distance trade. Among historians, Venice is also well-known for the inclusive political institutions that developed in the 12th century, and the rise of oligarchy following the “Serrata” at the end of the 13th century. The Serrata was followed by a gradual decrease in Venice’s power in long-distance trade and a shift toward manufacturing, including the Murano glass it is still famous for today. This is a fairly worrying history from our vantage point today: as the middle class grew wealthier, democratic forms of government and free markets did not follow. Indeed, quite the opposite: the oligarchs seized political power, and within a few decades of the Serrata restricted access to the types of trade that previously drove wealth mobility. Explaining what happened here is both a challenge due to limited data, and of great importance given the public prominence of worries about the intersection of growing inequality and corruption of the levers of democracy.

Dan Trefler, an economic historian here at U. Toronto, and Diego Puga, an economist at CEMFI who has done some great work in economic geography, provide a great explanation of this history. Here’s the model. Venice begins with lots of low-wealth individuals, a small middle and upper class, and political power granted to anyone in the upper class. Parents in each dynasty can choose to follow a risky project – becoming a merchant in a long-distance trading mission a la Niccolo and Maffeo Polo – or work locally in a job with lower expected pay. Some of these low and middle class families will succeed on their trade mission and become middle and upper class in the next generation. Those with wealth can sponsor ships via the colleganza, a type of early joint-stock company with limited liability, and potentially join the upper class. Since long-distance trade is high variance, there is a lot of churn across classes. Those with political power also gather rents from their political office. As the number of wealthy rises in the 11th and 12th centuries, the returns to sponsoring ships fall due to competition across sponsors in the labor and export markets. At any point, the upper class can vote to restrict future entry into the political class by making political power hereditary. They need to include sufficiently many powerful people in this hereditary class or there will be a revolt. As the number of wealthy increases, the wealthy eventually find it worthwhile to restrict political power so they can keep political rents within their dynasty forever. Though political power is restricted, the economy is still free, and the number of wealthy without power continues to grow, lowering the return to wealth for those with political power due to competition in factor and product markets. At some point, the return is so low that it is worth risking revolt from the lower classes by restricting entry of non-nobles into lucrative industries. To prevent revolt, a portion of the middle classes are brought into the hereditary political regime, such that the regime is powerful enough to halt a revolt. Under these new restrictions, lower classes stop engaging in long-distance trade and instead work in local industry. These outcomes can all be generated with a reasonable-looking model of dynastic occupation choice.
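
To make the churn piece of the mechanism concrete, here is a minimal simulation sketch. The functional forms and parameters (success probabilities, the rate at which voyage returns decline in the number of wealthy dynasties) are my own illustrative choices, not Puga and Trefler’s model, and the elites’ endogenous closure decision is left out entirely.

```python
# Illustrative sketch of the churn mechanism only: dynasties attempt the risky
# long-distance voyage when its expected return beats safe local work, and the
# expected return falls as the wealthy class grows. All parameters are made up;
# the elites' closure decision in the actual model is not included.
import random

random.seed(1)

N = 1000
# wealth class: 0 = poor, 1 = middle, 2 = rich
wealth = [2] * 50 + [1] * 100 + [0] * (N - 150)

SAFE_PAY = 1.0        # certain payoff of working locally
P_SUCCESS = 0.25      # probability a voyage succeeds

def expected_voyage_return(n_rich):
    """Expected payoff of a voyage; falls with competition among sponsors."""
    return 3.0 / (1.0 + 0.02 * n_rich)

for generation in range(12):
    n_rich = sum(1 for w in wealth if w == 2)
    ev = expected_voyage_return(n_rich)
    for i in range(N):
        if wealth[i] < 2 and ev > SAFE_PAY:
            if random.random() < P_SUCCESS:
                wealth[i] += 1                       # the family moves up a class
            elif random.random() < 0.10:
                wealth[i] = max(0, wealth[i] - 1)    # a ruinous failure
    print(f"generation {generation:2d}: rich dynasties = {n_rich:4d}, expected voyage return = {ev:.2f}")
```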

What historical data would be consistent with this theoretical mechanism? We should expect lots of turnover in political power and wealth in the 10th through 13th centuries. We should find examples in the literature of families beginning as long-distance traders and rising to voyage sponsors and political agents. We should see a period of political autocracy develop, followed later by the expansion of hereditary political power and restrictions on lucrative industry entry to those with such power. Because economic success will depend on being able to activate large amounts of capital from within the nobility class, inter-family connections will be more important in the 14th and 15th centuries than before. Political power and participation in lucrative economic ventures will be limited to a smaller number of families after this political and economic closure than before. Those left out of the hereditary regime will shift to local agriculture and small-scale manufacturing.

Indeed, we see all of these outcomes in Venetian history. Trefler and Puga use some nice techniques to get around limited data availability. Since we don’t have data on family incomes, they use the correlation in eigenvector centrality within family marriage networks as a measure of the stability of the upper classes. They code colleganza records – a non-trivial task involving searching thousands of scanned documents for particular Latin phrases – to investigate how often new families appear in these records, and how concentration in the funding of long-distance trade changes over time. They show that all of the families with high eigenvector centrality in the noble marriage market after political closure – a measure of economic importance, remember – were families that were in the top quartile of seat-share in the pre-closure Venetian legislature, and that those families which had lots of political power pre-closure but little commercial success thereafter tended to be unsuccessful in marrying into lucrative alliances.
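
For readers unfamiliar with the network measure, here is a toy eigenvector centrality computation on an invented marriage network using networkx; the family links below are made up for illustration and are not from the paper’s data.

```python
# Toy eigenvector centrality computation on an invented marriage network.
# A family that marries into well-connected families scores high even with few
# marriages of its own; Puga and Trefler track how stable this measure is across
# generations as a proxy for the persistence of the elite.
import networkx as nx

marriages = [
    ("Dandolo", "Contarini"), ("Contarini", "Morosini"),
    ("Morosini", "Dandolo"), ("Dandolo", "Gradenigo"),
    ("Gradenigo", "Tiepolo"), ("Tiepolo", "NewFamily"),
]

G = nx.Graph()
G.add_edges_from(marriages)

centrality = nx.eigenvector_centrality(G)
for family, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{family:10s} {score:.3f}")
```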

There is a lot more historical detail in the paper, but as a matter of theory useful to the present day, the Venetian experience ought to throw cold water on the idea that political inclusiveness and economic development always form a virtuous circle. Institutions are endogenous, and changes in the nature of inequality within a society following economic development alter the potential for political and economic crackdowns to survive popular revolt.

Final published version in QJE 2014 (RePEc IDEAS). A big thumbs up to Diego for having the single best research website I have come across in five years of discussing papers in this blog. Every paper has an abstract, well-organized replication data, and a link to a locally-hosted version of the final published paper. You may know his paper with Nathan Nunn on how rugged terrain in Africa is associated with good economic outcomes today because slave traders like the infamous Tippu Tip couldn’t easily exploit mountainous areas, but it’s also worth checking out his really clever theoretical disambiguation of why firms in cities are more productive, as well as his crazy yet canonical satellite-based investigation of the causes of sprawl. There is a really cool graphic on the growth of U.S. sprawl at that last link!

“The Tragedy of the Commons in a Violent World,” P. Sekeris (2014)

The prisoner’s dilemma is one of the great insights in the history of the social sciences. Why would people ever take actions that make everyone worse off? Because we all realize that if everyone took the socially optimal action, we would each be better off individually by cheating and doing something else. Even if we interact many times, that incentive to cheat will remain in our final interaction, hence cooperation will unravel all the way back to the present. In the absence of some ability to commit or contract, then, it is no surprise we see things like oligopolies that sell more than the quantity which maximizes industry profit, or countries that exhaust common fisheries faster than they would if the fishery were wholly within national waters, and so on.

But there is a wrinkle: the dreaded folk theorem. As is well known, if we play frequently enough, and the probability that any given game is the last is low enough, then any feasible outcome which is better than what players can guarantee themselves regardless of the other players’ actions can be sustained as an equilibrium; this, of course, includes the socially optimal outcome. And the punishment strategies necessary to get to that social optimum are often fairly straightforward. Consider oligopoly: if your firm produces more than half the monopoly output, then I produce the Cournot duopoly quantity in the next period. If you think I will produce Cournot, your best response is also to produce Cournot, and we will do so forever. Therefore, if we are interacting frequently enough, the benefit to you of cheating today is not enough to overcome the lower profits you will earn in every future period, and hence we are able to collude at the monopoly level of output.
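
To see how much patience that argument actually requires, here is a back-of-the-envelope computation of the critical discount factor for sustaining the monopoly outcome in a symmetric linear Cournot duopoly under the grim-trigger strategy described above; the demand and cost numbers are arbitrary, and the resulting 9/17 cutoff is a standard textbook result rather than anything specific to Sekeris’ paper.

```python
# Critical discount factor for sustaining the monopoly outcome in a symmetric
# linear Cournot duopoly with grim-trigger punishment (revert to Cournot forever
# after any deviation). Demand is P = a - b*Q with constant marginal cost c; the
# parameter values are arbitrary and the 9/17 cutoff is a standard textbook result.

a, b, c = 10.0, 1.0, 2.0

pi_collude = (a - c) ** 2 / (8 * b)    # each firm's half of monopoly profit
pi_cournot = (a - c) ** 2 / (9 * b)    # per-firm Cournot profit (the punishment payoff)

# Best one-period deviation while the rival still produces its half of monopoly output
q_rival = (a - c) / (4 * b)
q_deviate = (a - c - b * q_rival) / (2 * b)
pi_deviate = (a - b * (q_deviate + q_rival) - c) * q_deviate

# Collusion is sustainable iff pi_collude/(1-d) >= pi_deviate + d*pi_cournot/(1-d)
delta_star = (pi_deviate - pi_collude) / (pi_deviate - pi_cournot)
print(f"critical discount factor: {delta_star:.4f}")   # 0.5294 = 9/17
```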

Folk theorems are really robust. What if we only observe some random public signal of what each of us did in the last period? The folk theorem holds. What if we only privately observe some random signal of what the other people did last period? No problem, the folk theorem holds. There are many more generalizations. Any applied theorist has surely run into the folk theorem problem – how do I let players use “reasonable” strategies in a repeated game but disallow crazy strategies which might permit tacit collusion?

This is Sekeris’ problem in the present paper. Consider two nations sharing a common pool of resources like fish. We know from Hotelling how to solve the optimal resource extraction problem if there is only one nation. With more than one nation, each party has an incentive to overfish today because they don’t take sufficient account of the fact that their fishing today lowers the amount of fish left for the opponent tomorrow, but the folk theorem tells us that we can still sustain cooperation if we interact frequently enough. Indeed, Ostrom won the Nobel a few years ago for showing how such punishments operate in many real world situations. But, but! – why then do we see fisheries and other common pool resources overdepleted so often?

There are a few ways to get around the folk theorem. First, it may just be that players do not interact forever, at least probabilistically; some firms may last longer than others, for instance. Second, it may be that firms cannot change their strategies frequently enough, so that you will not be punished so harshly if you deviate from the cooperative optimum. Third, Mallesh Pai and coauthors show in a recent paper that with a large number of players and sufficient differential obfuscation of signals, it becomes too difficult to “catch cheaters” and hence the stage game equilibrium is retained. Sekeris proposes an alternative to all of these: allow players to take actions which change the form of the stage game in the future. In particular, he allows players to fight for control of a bigger share of the common pool if they wish. Fighting requires expending resources from the pool building arms, and the fight itself also diminishes the size of the pool by destroying resources.

As the remaining resource pool gets smaller and smaller, each player is willing to expend fewer resources arming for a fight over that smaller pool. This means that if conflict does break out, fewer resources will be destroyed in the “low intensity” fight. Because fighting is less costly when the pool is small, as the pool is depleted through cooperative extraction, eventually the players will fight over what remains. Since players will have asymmetric access to the pool following the outcome of the fight, there are fewer ways for the “smaller” player to harm the bigger one after the fight, and hence less ability to use threats of such harm to maintain folk-theorem cooperation before the fight. Therefore, the cooperative equilibrium partially unravels and players do not fully cooperate even at the start of the game when the common pool is big.

That’s a nice methodological trick, but also somewhat reasonable in the context of common resource pool management. If you don’t overfish today, it must be because you fear I will punish you by overfishing myself tomorrow. If you know I will enact such punishment, then you will just invade me tomorrow (perhaps metaphorically via trade agreements or similar) before I can do so. This possibility limits the type of credible threats that can be made off the equilibrium path.

Final working paper (RePEc IDEAS). Paper published in the Fall 2014 RAND Journal of Economics.

“On the Origin of States: Stationary Bandits and Taxation in Eastern Congo,” R. S. de la Sierra (2013)

The job market is yet again in full swing. I won’t be able to catch as many talks this year as I would like to, but I still want to point out a handful of papers that I consider particularly elucidating. This article, by Columbia’s de la Sierra, absolutely fits that category.

The essential question is, why do states form? Would that all young economists interested in development put their effort toward such grand questions! The old Rousseauian idea you learned your first year of college, where individuals come together voluntarily for mutual benefit, seems contrary to lots of historical evidence. Instead, war appears to be a prime mover for state formation; armed groups establish a so-called “monopoly on violence” in an area for a variety of reasons, and proto-state institutions evolve. This basic idea is widespread in the literature, but it is still not clear which conditions within an area lead armed groups to settle rather than to pillage. Further, examining these ideas empirically seems quite problematic for two reasons: first, because states themselves are the ones who collect data, hence we rarely observe anything before states have formed, and second, because most of the planet has long since been under the rule of a state (with apologies to James Scott!).

De la Sierra brings some economics to this problem. What is the difference between pillaging and sustained state-like forms? The pillager can only extract assets on its way through, while the proto-state can establish “taxes”. What taxes will it establish? If the goal is long-run revenue maximization, Ramsey long ago told us that it is optimal to tax elements that are inelastic. If labor can flee, but the output of the mine can not, then you ought to tax the output of the mine highly and set a low poll tax. If labor supply is inelastic but output can be hidden from the taxman, then use a high poll tax. Thus, when will bandits form a state instead of just pillaging? When there is a factor which can be dynamically taxed at such a rate that the discounted tax revenue exceeds what can be pillaged today. Note that the ability to, say, restrict movement along roads, or to expand output through state-owned capital, changes relevant tax elasticities, so at a more fundamental level, capacity by rebels along these margins is also important (and I imagine that extending de la Sierra’s paper will involve the evolutionary development of these types of capacities).
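
The stationary-bandit condition above is just a present-value comparison, and a toy version is easy to write down. In the sketch below all numbers are invented: the tax base shrinks with the tax rate (more so when the taxed activity is easy to hide), the armed group survives government pressure with some probability each period, and it taxes rather than pillages whenever the discounted revenue stream beats the one-time grab.

```python
# Sketch of the "stationary bandit" calculation with made-up numbers: an armed
# group compares a one-time pillage against the discounted value of taxing a
# village. The tax base shrinks as the tax rate rises (more so when the taxed
# activity is easy to hide), and the group survives government pressure with
# probability `survival` each period.

def tax_revenue(rate, base, elasticity):
    """Per-period revenue when the taxed activity contracts as the rate rises."""
    return rate * base * max(0.0, 1.0 - elasticity * rate)

def best_tax_revenue(base, elasticity, grid=200):
    return max(tax_revenue(i / grid, base, elasticity) for i in range(grid + 1))

def state_value(base, elasticity, delta=0.9, survival=0.9):
    """Discounted revenue from settling down and taxing rather than pillaging."""
    return best_tax_revenue(base, elasticity) / (1 - delta * survival)

pillage_value = 60.0   # what can be grabbed once, destroying future production

for elasticity in (0.2, 1.0, 3.0):   # coltan-like (hard to hide) to gold-like
    v = state_value(base=100.0, elasticity=elasticity)
    choice = "tax like a state" if v > pillage_value else "pillage"
    print(f"elasticity {elasticity:3.1f}: value of taxing = {v:6.1f} -> {choice}")
```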

This is really an important idea. It is not that there is a tradeoff between producing and pillaging. Instead, there is a three-way tradeoff between producing in your home village, joining an armed group to pillage, and joining an armed group that taxes like a state! The armed group that taxes will, as a result of its desire to increase tax revenue, perhaps introduce institutions that increase production in the area under its control. And to the extent that institutions persist, short-run changes that cause potential bandits to form taxing relationships may actually lead to long-run increases in productivity in a region.

De la Sierra goes a step beyond theory, investigating these ideas empirically in the Congo. Eastern Congo during and after the Second Congo War was characterized by a number of rebel groups that occasionally just pillaged, but occasionally formed stable tax relationships with villages that could last for years. That is, the rebels occasionally implemented something looking like states. The theory above suggests that exogenous changes in the ability to extract tax revenue (over a discounted horizon) will shift the rebels from pillagers to proto-states. And, incredibly, there were a number of interesting exogenous changes that had exactly that effect.

The prices of coltan and gold both suffered price shocks during the war. Coltan is heavy, hard to hide, and must be shipped by plane in the absence of roads. Gold is light, easy to hide, and can simply be carried from the mine on jungle footpaths. When the price of coltan rises, the maximal tax revenue of a state increases since taxable coltan production is relatively inelastic. This is particularly true near airstrips, where the coltan can actually be sold. When the price of gold increases, the maximal tax revenue does not change much, since gold is easy to hide, and hence the optimal tax is on labor rather than on output. An exogenous rise in coltan prices should encourage proto-state formation in areas with coltan, then, while an exogenous rise in gold prices should have little impact on the pillage vs. state tradeoff. Likewise, a government initiative to root out rebels (be they stationary or pillaging) decreases the expected number of years a proto-state can extract rents, hence makes pillaging relatively more lucrative.

How to confirm these ideas, though, when there was no data collected on income, taxes, labor supply, or proto-state existence? Here is the crazy bit – 11 locals were hired in Eastern Congo to travel to a large number of villages, spend a week there querying families and village elders about their experiences during the war, the existence of mines, etc. The “state formation” in these parts of Congo is only a few years in the past, so it is at least conceivable that memories, suitably combined, might actually be reliable. And indeed, the data do seem to match aggregate trends known to monitors of the war. What of the model predictions? They all seem to hold, and quite strongly: the ability to extract more tax revenue is important for proto-state formation, and areas where proto-states existed do appear to have retained higher productive capacity years later, perhaps as a result of the proto-institutions those states developed. Fascinating. Even better, because there is a proposed mechanism rather than an identified treatment effect, we can have some confidence that this result is, to some extent, externally valid!

December 2013 working paper (No IDEAS page). You may wonder what a study like this costs (particularly if you are, like me, a theorist using little more than chalk and a chalkboard); I have no idea, but de la Sierra’s CV lists something like a half million dollars of grants, an incredible total for a graduate student. On a personal level, I spent a bit of time in Burundi a number of years ago, including visiting a jungle camp where rebels from the Second Congo War were still hiding. It was pretty amazing how organized even these small groups were in the areas they controlled; there was nothing anarchic about it.

“The Institutional Causes of China’s Great Famine, 1959-1961,” X. Meng, N. Qian & P. Yared (2011)

Nancy Qian, along with a big group of coauthors, has done a great amount of interesting empirical work in recent years on the economics of modern China; among other things, she has shown that local elections actually do cause policy changes in line with local preferences and that the state remains surprisingly powerful in the Chinese economy. In this paper with Xin Meng and Pierre Yared, she considers what is likely the worst famine in the history of mankind, China’s famous famine following the Great Leap Forward. After a series of misguided policy experiments in the mid-1950s (like “backyard steel” production, which produced worthless metal), an anti-Rightist purge which ended a brief period of less rigid bureaucracy, and an agricultural production shock in 1959, 30 million or so people would die from hunger over the next two years, with most deaths among the young and the very old. To put this in relative context, in the worst-hit counties, the birth cohorts that should have been born or very young in 1960 and 1961 are today missing more than 80% of their projected members.

What is interesting, and what we have known since Sen, is that famines generally result from problems of food distribution rather than food production. And, indeed, the authors show that total grain production in caloric terms across rural parts of China is a multiple of what is necessary to hold off starvation during the height of the productivity shock. What is interesting and novel, though, is that provinces with higher historic per-capita grain production had the highest mortality, and likewise counties with the highest per-capita production as measured by a proxy based on climate also have the largest number of “missing” members in their birth year cohort in the 1990 census. This is strange – you might think that places that are living on the edge in normal times are most susceptible to famine.

This is where politics comes into play. The Chinese government “sent down” many competent bureaucrats during the anti-Rightist purges in the late 1950s, limiting the ability of the government to use flexible mechanisms for food procurement. The food system at the time involved the central government collecting a set amount of grain from each region, then returning stocks to communal kitchens. Now, local leaders had a strong incentive to understate how much was produced in a given year so that they could use the remainder for local power purposes. Because of limited communication technology and ineffective bureaucracy, the optimal mechanism (not specified formally, but apparently done so in an earlier version) for the central government involved pre-setting fixed production goals for every region. Here is the problem: imagine you wish the city, rural area 1 and rural area 2 to have the same expected consumption, with the city producing no food, rural area 1 producing 1 ton per capita per year, rural area 2 producing 1.4 tons per capita, and all three regions of equal population. This gives consumption of .8 tons per capita everywhere if the government sets in advance a fixed “tax” of .2 tons per capita from region 1 and .6 tons per capita from region 2. Now a productivity shock lowers production everywhere by 10 percent. The city still gets its .8 tons per capita (since the “tax” is fixed), but area 1 now gets .9*1-.2=.7 tons per capita, and area 2 gets 1.4*.9-.6=.66 tons per capita. That is, the lack of flexibility in the system is more likely to push the productive regions into famine than other regions.
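
The example’s arithmetic is easy to check; here is the same calculation in a few lines, using exactly the numbers from the paragraph above (and the same equal-populations assumption).

```python
# The inflexible-procurement example from the paragraph above: quotas are fixed
# in advance so that the city and both rural areas consume .8 tons per capita
# in a normal year (all three regions assumed to have equal populations), then
# a uniform 10% production shock hits.

production = {"city": 0.0, "area1": 1.0, "area2": 1.4}   # tons per capita
quota = {"city": 0.0, "area1": 0.2, "area2": 0.6}        # fixed "tax", also per capita

def consumption(shock):
    output = {region: p * (1 - shock) for region, p in production.items()}
    collected = sum(quota.values())                 # grain shipped to the city
    cons = {region: output[region] - quota[region] for region in output}
    cons["city"] += collected
    return {region: round(c, 2) for region, c in cons.items()}

print("normal year:", consumption(0.0))    # everyone consumes 0.8
print("10% shock:  ", consumption(0.10))   # city 0.8, area1 0.7, area2 0.66
```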

Now, this is not the whole story. Alternative explanations, already suggested in the literature, also are quantitatively important. Places with more anti-Rightist purges before the famine saw higher mortality (see this 2011 APSR by Kung and Chen), as did places with earlier adoption of communal dining halls or larger increases in backyard steel production, both proxies for “zealous” adherence to the Great Leap Forward. I would really like to see some attempt at a decomposition here: if you buy that local political leadership, the central government quota system, and political punishment of counterrevolutionary areas were all important, and that weather shocks alone were not, how many of the deaths should we ascribe to each of those factors? This seems an important question for preventing future famines. Further fleshing out how these results relate to the old theory-of-the-firm debates about the flexibility of local managers under imperfect and partially unverifiable reporting could also help us understand the CCP’s policy choices; I’m thinking, for instance, of explicitly showing whether it is true that a loss of members of the bureaucracy (i.e., an increase in the cost of monitoring) necessarily incentivizes more rigid allocation rules. Theory here could help to quantify how important this mechanism might be.

2011 working paper (IDEAS version). This paper is currently R&R at ReStud. Qian has a couple other working papers that caught my eye. First, a paper with Duflo and Banerjee on Chinese transportation infrastructure finds very little impact of (quasi-random) access to a good transportation network on relative incomes, and suggests in a short model (which is less convincing…) that relative immobility of capital might be causing this. The techniques in the paper are similar to those used by Ben Faber in his very nice paper showing Krugman’s home market effect: if you are small and poor, being connected with a big productive place may not be good for you due to increasing returns to scale. Qian also has a 2013 paper with Nathan Nunn on food aid which suggests, pretty convincingly, that food aid in civil war zones prolongs conflicts; the mechanism, roughly, is that local armies can easily steal the aid and hence have less reason to sue for peace. The identification strategy here is quite nice: the US government buys wheat for price stabilization reasons, then gives much of this away to impoverished countries. The higher the price of wheat, the less the government surplus is, hence the less is given away.

Paul Samuelson’s Contributions to Welfare Economics, K. Arrow (1983)

I happened to come across a copy of a book entitled “Paul Samuelson and Modern Economic Theory” when browsing the library stacks recently. Clear evidence of his incredible breadth is in the section titles: Arrow writes about his work on social welfare, Houthakker on consumption theory, Patinkin on money, Tobin on fiscal policy, Merton on financial economics, and so on. Arrow’s chapter on welfare economics was particularly interesting. This book comes from the early 80s, which is roughly the end of social welfare as a major field of study in economics. I was never totally clear on the reason for this – is it simply that Arrow’s Possibility Theorem, Sen’s Liberal Paradox, and the Gibbard-Satterthwaite Theorem were so devastating to any hope of “general” social choice rules?

In any case, social welfare is today little studied, but Arrow mentions a number of interesting results which really ought to be better known. Bergson-Samuelson, conceived when the two were in graduate school together, is rightfully famous. After a long interlude of confused utilitarianism, Pareto had us all convinced that we should dismiss cardinal utility and interpersonal utility comparisons. This seems to suggest that all we can say about social welfare is that we should select a Pareto-optimal state. Bergson and Samuelson were unhappy with this – we suppose individuals have preferences which represent an order (complete and transitive) over states, and the old utilitarians had a rule which imposed a real number for society’s value of any state (hence an order). Being able to order states from a social point of view seems necessary if we are to make decisions. Some attempts to extend Pareto did not give us an order. (Why is an order important? Arrow does not discuss this, but consider earlier attempts at extending Pareto like Kaldor-Hicks efficiency: going from state s to state s’ is KH-efficient if there exist ex-post transfers under which the change is Paretian. Let person a value the bundle (1,1)>(2,0)>(1,0)>all else, and person b value the bundle (1,1)>(0,2)>(0,1)>all else. In state s, person a is allocated (2,0) and person b (0,1). In state s’, person a is allocated (1,0) and person b is allocated (0,2). Note that going from s to s’ is a Kaldor-Hicks improvement, but going from s’ to s is also a Kaldor-Hicks improvement!)
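
The reversal in that parenthetical example can be verified by brute force; here is a sketch that enumerates reallocations of each state’s total endowment and checks whether some reallocation Pareto-dominates the other state, using exactly the preference rankings above (any bundle not listed is treated as worse than the listed ones).

```python
# Brute-force check of the Kaldor-Hicks reversal in the parenthetical example:
# moving from s to s' is a KH improvement, and so is moving from s' back to s.
# Bundles not in an agent's ranked list are treated as strictly worse than
# every listed bundle.
from itertools import product

rank_a = [(1, 1), (2, 0), (1, 0)]   # person a's bundles, best to worst
rank_b = [(1, 1), (0, 2), (0, 1)]   # person b's bundles, best to worst

def utility(rank, bundle):
    return len(rank) - rank.index(bundle) if bundle in rank else 0

def reallocations(total):
    # every integer split of the total endowment between the two people
    for x, y in product(range(total[0] + 1), range(total[1] + 1)):
        yield (x, y), (total[0] - x, total[1] - y)

def kaldor_hicks_improvement(from_alloc, to_alloc):
    """Does some reallocation of to_alloc's total endowment Pareto-dominate from_alloc?"""
    total = tuple(to_alloc[0][i] + to_alloc[1][i] for i in (0, 1))
    ua0, ub0 = utility(rank_a, from_alloc[0]), utility(rank_b, from_alloc[1])
    for bundle_a, bundle_b in reallocations(total):
        ua, ub = utility(rank_a, bundle_a), utility(rank_b, bundle_b)
        if ua >= ua0 and ub >= ub0 and (ua > ua0 or ub > ub0):
            return True
    return False

s      = ((2, 0), (0, 1))   # (person a's bundle, person b's bundle)
sprime = ((1, 0), (0, 2))

print("s  -> s' is a KH improvement:", kaldor_hicks_improvement(s, sprime))   # True
print("s' -> s  is a KH improvement:", kaldor_hicks_improvement(sprime, s))   # True
```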

Bergson and Samuelson wanted to respect individual preferences – society can’t prefer s to s’ if s’ is a Pareto improvement on s in the individual preference relations. Take the relation RU. We will say that sRUs’ if all individuals weakly prefer s to s’. Note that though RU is not complete, it is transitive. Here’s the great, and non-obvious, trick. The Polish mathematician Szpilrajn has a great 1930 theorem which says that if R is a transitive relation, then there exists a complete relation R2 which extends R; that is, if sRs’ then sR2s’, plus we complete the relation by adding some more elements. This is not a terribly easy proof, it turns out. That is, there exist social welfare orders which are entirely ordinal and which respect Pareto dominance. Of course, there may be lots of them, and which you pick is a problem of philosophy more than economics, but they exist nonetheless. Note why Arrow’s theorem doesn’t apply: we are starting with given sets of preferences and constructing a social preference, rather than attempting to find a rule that maps any individual preferences into a social rule. There have been many papers arguing that this difference doesn’t matter, so all I can say is that Arrow himself, in this very essay, accepts that difference completely. (One more sidenote here: if you wish to start with individual utility functions, we can still do everything in an ordinal way. It is not obvious that every indifference map can be mapped to a utility function, and this is not even true without some type of continuity assumption, especially if we want the utility functions to themselves be continuous. A nice proof of how we can do so using a trick from probability theory is in Neuefeind’s 1972 paper, which was followed up in more generality by Mount and Reiter here at MEDS, and then by Chichilnisky in a series of papers. Now just sum up these mapped individual utilities, and I have a Paretian social utility function which was constructed entirely in an ordinal fashion.)

Now, this Bergson-Samuelson seems pretty unusable. What do we learn that we don’t know from a naive Pareto property? Here are two great insights. First, choose any social welfare function from the set we have constructed above. Let individuals have non-identical utility functions. In general, there is no social welfare function which is maximized by always keeping every individual’s income identical in all states of the world! The proof of this is very easy if we use Harsanyi’s extension of Bergson-Samuelson: if agents are Expected Utility maximizers, then any B-S social welfare function can be written as a weighted linear combination of individual utility functions. As relative prices or the social production possibilities frontier changes, the weights are constant, but the individual marginal utilities are (generically) not. Hence if it was socially optimal to endow everybody with equal income before the relative price change, it (generically) is not later, no matter which Pareto-respecting measure of social welfare your society chooses to use! That is, I think, an astounding result for naive egalitarianism.
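
A tiny numerical version of this point (my own made-up example, not Arrow’s): fix the social welfare function as the unweighted sum of two concave indirect utilities, where the agents differ only in their budget shares, and search over income splits under two price vectors. Equal incomes are optimal when the two goods have equal prices, but not after a relative price change.

```python
# Made-up numerical illustration: the social welfare function is fixed as the
# unweighted sum of two concave indirect utilities. Agent 1 spends mostly on
# good x, agent 2 mostly on good y. We grid-search the income split that
# maximizes welfare under two price vectors.

def price_index(prices, share_x):
    px, py = prices
    return px ** share_x * py ** (1 - share_x)

def indirect_utility(income, prices, share_x):
    # concave (square-root) utility of income deflated by the agent's own price index
    return 2.0 * (income / price_index(prices, share_x)) ** 0.5

def best_income_for_agent1(total_income, prices, steps=10000):
    best = max(range(1, steps),
               key=lambda i: indirect_utility(total_income * i / steps, prices, 0.8)
                             + indirect_utility(total_income * (1 - i / steps), prices, 0.2))
    return total_income * best / steps

M = 100.0
print("agent 1's optimal income at prices (1,1):", round(best_income_for_agent1(M, (1.0, 1.0)), 1))  # 50.0
print("agent 1's optimal income at prices (2,1):", round(best_income_for_agent1(M, (2.0, 1.0)), 1))  # roughly 40
```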

Here’s a second one. Surely any good economist knows policies should be evaluated according to cost-benefit analysis. If, for instance, the summed willingness-to-pay for a public good exceeds the cost of the public good, then society should buy it. When, however, does a B-S social welfare function allow us to make such an inference? Generically, such an inference is only possible if the distribution of income is itself socially optimal, since willingness-to-pay depends on the individual budget constraints. Indeed, even if demand estimation or survey evidence suggests that there is very little willingness-to-pay for a public good, society may wish to purchase the good. This is true even if the underlying basis for choosing the particular social welfare function we use has nothing at all to do with equity, and further since the B-S social welfare function respects individual preferences via the Paretian criterion, the reason we build the public good also has nothing to do with paternalism. Results of this type are just absolutely fundamental to policy analysis, and are not at all made irrelevant by the impossibility results which followed Arrow’s theorem.

This is a book chapter, so I’m afraid I don’t have an online version. The book is here. Arrow is amazingly still publishing at the age of 91; he had an interesting article with the underrated Partha Dasgupta in the EJ a couple years back. People claim that relative consumption a la Veblen matters in surveys. Yet it is hard to find such effects in the data. Why is this? Assume I wish to keep up with the Joneses when I move to a richer place. If I increase consumption today, I am decreasing savings, which decreases consumption even more tomorrow. How much I wish to change consumption today when I have richer peers then depends on that dynamic tradeoff, which Arrow and Dasgupta completely characterize.

“Pollution for Promotion,” R. Jia (2012)

Ruixue Jia is on the job market from IIES in Stockholm, and she has the good fortune to have a job market topic which is very much au courant. In China, government promotions often depend both on the inherent quality of the politician and on how connected you are to current leaders; indeed, a separate paper by Jia finds that promotion probability in China depends only on the interaction of economic growth and personal connections rather than either factor by itself. Assume that a mayor can choose how much costly effort to exert. The mayor chooses how much dirty and clean technology – complements in production – to use, with the total amount of technology available an increasing function of the mayor’s effort. The mayor may personally dislike dirty technology. For any given bundle of technology, the observed economic output is higher the higher the mayor’s inherent quality (which he does not know). The central government, when deciding on promotions, only observes economic output.

Since mayors with good connections have a higher probability of being promoted for any level of output in their city, the marginal return to effort and the marginal return to dirty technology are increasing in the connectedness of the mayor. For any given distaste for pollution among the mayor, a more connected mayor will mechanically want to substitute dirty for clean technology, since higher output is more valuable to him for career concerns while the marginal cost of distaste for pollution has not changed. Further, by a Le Chatelier argument, higher marginal returns to output increase the optimal effort choice, which allows a higher budget to purchase technology, dirty tech included. To the extent that the government cares about limiting the (unobserved) use of dirty tech, this is “almost” the standard multitasking concern: the folly of rewarding A and hoping for B. Although in this case, empirically there is no evidence that the central government cares about promoting local politicians who are good for the environment!

How much do local leaders increase pollution (and simultaneously speed up economic growth!) in exchange for a shot at a better job? The theory above gives us some help. We see that the same politician will substitute in dirty technology if, in some year, his old friends get on the committee that assigns promotions (the Politburo Standing Committee, or PSC, in China’s case). This allows us to see the effect of the Chinese incentive system on pollution even if we know nothing about the quality of each individual politician or whether highly-connected politicians get plum jobs in low pollution regions, since every effect we find is at the within-politician level. Using a diff-in-diff, Jia finds that in the year after a politician’s old friend makes the PSC, sulfur dioxide goes up 25%, a measure of river pollution goes up by a similar amount, industrial GDP rises by 15%, and non-industrial GDP does not change. So it appears that China’s governance institution does incentivize governors, although whether those incentives are good or bad for welfare depends on how you trade off pollution and growth in your utility function.
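
As a rough sketch of what the within-politician design amounts to (with simulated data, not Jia’s, and a deliberately simplified specification), one can regress log pollution on an indicator for whether the politician’s old friend currently sits on the PSC, with politician and year fixed effects; the hard-coded 0.25 “effect” below is there only to show that the regression recovers it.

```python
# Rough sketch of the within-politician diff-in-diff on simulated data (not
# Jia's): log pollution is regressed on an indicator for whether the local
# politician's old friend currently sits on the PSC, with politician and year
# fixed effects. The 0.25 "effect" is hard-coded so we can see the regression
# recover it.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_pol, n_years = 200, 10
rows = []
for p in range(n_pol):
    friend_year = rng.integers(2, n_years) if rng.random() < 0.5 else None
    pol_effect = rng.normal(0, 0.5)
    for t in range(n_years):
        connected = int(friend_year is not None and t >= friend_year)
        log_so2 = 1.0 + pol_effect + 0.05 * t + 0.25 * connected + rng.normal(0, 0.3)
        rows.append({"politician": p, "year": t, "connected": connected, "log_so2": log_so2})

df = pd.DataFrame(rows)
fit = smf.ols("log_so2 ~ connected + C(politician) + C(year)", data=df).fit()
print(round(fit.params["connected"], 3))   # should be close to the simulated 0.25
```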

Good stuff. A quick aside, since what I like about Jia’s work is that she makes an attempt to more than simply find a clever strategy for getting internal validity. Many other recent job market stars – Dave Donaldson and Melissa Dell, for instance – have been equally good when it comes to caring about more than just nice identification. But such care is rare indeed! It has been three decades since we, supposedly, “took the ‘con’ out of Econometrics”. And yet an unbearable number of papers are still floating around which quite nicely identify a relationship of interest in a particular dataset, then go on to give only the vaguest and most unsatisfying remarks concerning external validity. That’s a much worse con than bad identification! Identification, by definition, can only hold ceteris paribus. Even perfect identification of some marginal effect tells me absolutely nothing about the magnitude of that effect when I go to a different time, or a different country, or a more general scenario. The only way – the only way! – to generalize an internally valid result, and the only way to explain why that result is the way it is, is to use theory. A good paper puts the theoretical explanation and the specific empirical case examined in context with other empirical papers on the same general topic, rather than stopping after the identification is cleanly done. And a good empirical paper needs to explain, and needs to generalize, because we care about unemployment (not unemployment in border counties of New Jersey in the 1990s) and we care about the effect of military training on labor supply (not the effect of the Vietnam War on labor supply in the few years following), etc. If we really want the credibility revolution in empirical economics to continue, let’s spend less seminar and referee time worrying only about internal validity, and more time shutting down the BS that is often passed off as “explanation”.

November 2012 working paper. Jia also has an interesting paper about the legacy of China’s treaty ports, as well as a nice paper (a la Nunn and Qian) on the importance of the potato in world history (really! I may be a biased Dorchester-born Mick, but still, the potato has been fabulously important).

“How Does Family Income Affect Enrollment?,” N. Hilger (2012)

Nate Hilger is on the market from Harvard this year. His job market paper continues a long line of inference that is probably at odds with mainstream political intuition. Roughly, economists generally support cash rather than in-kind transfers because people tend to be the best judges of the optimal use of money they receive; food stamps are not so useful if you really need to pay the heat bill that week. That said, if the goal is to cause some behavior change among the recipient, in-kind transfers can be more beneficial, especially when the cash transfer would go to a family while the in-kind transfer would go to a child or a wife.

Hilger managed to get his hands on the full universe of IRS data. I’m told by my empirically-minded friends that this data is something of a holy grail, with the IRS really limiting who can use the data after Saez proved its usefulness. IRS data is great because of the 1098-T: colleges are required to file information about their students’ college attendance so that the government can appropriately dole out aid and tax credits. Even better, firms that fire or lay off workers file a 1099-G. Finally, claimed dependents on the individual tax form let us link parents and children. That’s quite a trove of data!

Here’s a question we can answer with it: does low household income lower college attendance, and would income transfers to poor families help reduce the college attendance gap? In a world with perfect credit markets, it shouldn’t matter, since any student could pledge the human capital she would gain as collateral for a college attendance loan. Of course, pledging one’s human capital turns out to be quite difficult. Even if the loans aren’t there, a well-functioning and comprehensive university aid program should insulate the poor from this type of liquidity problem. Now, we know from previous studies that increased financial aid has a pretty big effect on college attendance among the poor and lower middle class. Is this because the aid is helping loosen the family liquidity constraint?

Hilger uses the following trick. Consider a worker who is laid off. This is only a temporary shock, but this paper and others estimate a layoff lowers discounted lifetime earnings by an average of nearly $100,000. So can we just propensity-match laid-off and employed workers when the child is college age, and see if the income shock lowers attendance? Not so fast. It turns out that matching on whatever observables we have, children whose fathers are laid off when the child is 19 are also much less likely to attend college than children whose fathers are not laid off, even though age 19 would be after the attendance decision is made. Roughly, a father who is ever laid off is correlated with some nonobservables that lower college attendance of children. So let’s compare children whose dads are laid off when the child is 17 to children whose dads are laid off from a similar firm when the child is 19, matching on all other observables. The IRS data has so many data points that this is actually possible.

What do we learn? First, consumption spending (in this case, on housing) declines roughly in line with the lifetime income hypothesis after the income shock. Second, there is hardly any effect on college attendance and quality: attendance for children whose dads suffer the large income shock falls by half a percentage point. Further, the decline is almost entirely borne by middle-class children, not the very poor or the rich: this makes sense since poor students rely very little on parental funding to pay for college, and the rich have enough assets to overcome any liquidity shock. The quality of college chosen also declines after a layoff, but only by a very small amount. That is, the Engel curve for college spending is very flat: families with more income tend to spend roughly similar amounts on college.

Policy-wise, what does this mean? Other authors have estimated that a $1000 increase in annual financial aid increases college enrollment by approximately three percentage points (a particularly strong effect is found among students from impoverished families); the Kalamazoo experiment shows positive feedback loops that may make the efficacy of such aid even higher, since students will exert more effort in high school knowing that college is a realistic financial possibility. Hilger’s paper shows that a $1000 cash grant to poor families will likely improve college attendance by .007 to .04 percentage points depending on whether the layoff is lowering college attendance due to a transitory or a permanent income shock. That is, financial aid is orders of magnitude more useful in raising college attendance than cash transfers, especially among the poor.

November 2012 working paper (No IDEAS version). My old Federal Reserve coworker Christopher Herrington is also on the job market, and has a result suggesting the importance of Hilger’s finding. He computes a DSGE model of lifetime human capital formation, and considers the counterfactual where the US has more equal education funding (that is, schools that are centrally funded rather than well funded in rich areas and poorly funded in poor areas). Around 15% of eventual earnings inequality – again taking into account many general equilibrium effects – can be explained by the high variance of US education funding. As in Hilger, directly altering the requirement that parents pay for school (either through direct payments at the university level, or by purchasing housing in rich areas at the primary level) can cure a good portion of our growing inequality.

“Paternalism, Libertarianism and the Nature of Disagreement,” U. Loginova & P. Persson (2012)

Petra Persson is on the job market this year from Columbia. Her CV is pretty incredible – there’s pure theory, cutting edge empirical techniques, policy work, networks, behavioral and more. Her job market paper is about the impact of social insurance policy on seemingly unrelated markets like marriage, and I’ll discuss it briefly at the end of the post, but I want to focus on another paper of hers which struck me as quite interesting.

Imagine a benevolent ruler who has private information about some policy, such as the relative safety of wearing seatbelts. This ruler can either tell citizens the information, or lie, or coerce them to take some action. Naive libertarianism suggests that an altruistic ruler should always be truthful; consumers can then weigh the information according to their preferences and choose the policy optimal for them.

But note something interesting. On some issues, one subset of politicians has libertarian leanings, while on others, a different subset has those leanings. For instance, a politician may favor legal assisted suicide but insist on mandatory seatbelt rules, while another politician may be against the mandatory belt and also against legal assisted suicide. Politicians can even vary in how libertarian they wish to be depending on who the policy affects. Witness that many politicians favor legalizing marijuana but very few favor legalizing it for 16-year-olds. What explains this behavior?

Loginova and Persson examine this theoretically. Take a population of citizens. There are two possible states, 0 and 1. Citizens can either think each state equally likely yet have heterogeneous preferences which differ from the politician’s (measured with a Crawford-Sobel style quadratic loss, though this particular functional form isn’t critical), or they can have preferences identical to the politician’s yet heterogeneous (prior) beliefs about the probability of each state. The politician can be altruistic to varying degrees – more altruism means he, according to his own prior, puts more and more weight on the utility of the agent. The politician gets a noisy signal about the true state. To limit the extent of opposing beliefs, the politician is restricted to having the same prior as the median citizen.

If the politician can only advise or not advise, when does he make a truthful public announcement? If he disagrees on preferences with the citizens, then the more altruistic, the more likely he is to announce truthfully, for the standard libertarian reason: the citizens know their own preferences, and the better informed they are, the better they can maximize their own welfare. If, however, he disagrees on priors with the citizens, then the more altruistic, the less likely he is to announce truthfully: altruism means the politician cares about the citizens’ welfare, but since their priors are in his eyes wrong, the citizens know that even an altruistic politician has an incentive to lie so that they take actions which are optimal according to his prior; therefore truthful communication cannot be sustained.

Now what if the politician could (at a cost to him) force all individuals to take an individual action? With preference disagreement, an altruistic politician would never do this, both because he can send all the information to citizens with a free message and also because a mandate does not respect heterogeneity of preferences. Even if action 0 is better than action 1 for 90% of the population, an altruistic principal also cares about the other 10%. With disagreement about priors, however, an altruistic politician is more likely to impose a mandate the more altruistic he is. Even though citizens have heterogeneous priors, the principal thinks all of them are wrong, and hence is not worried about heterogeneity when imposing a mandate. Since we noted in the last paragraph that altruistic politicians who have different priors from citizens will not be able to credibly send their information, the mandate allows the politician’s private information to be used in the citizens’ actions.

Finally, what if the politician can send individual-level messages or enforce individual mandates? A politician with preference disagreement needs to be fairly altruistic before his public message is credible; in fact, he needs to be able to credibly persuade the individual with the average disagreement in order for his public signal to be credible. If he is not altruistic enough, he can still credibly persuade those agents who have only a limited amount of preference disagreement with him. If mandates are possible, the politician with limited altruism will force individuals whose preferences are very different from the politician’s to take the politician’s desired action, but since preferences of the politician and the agents are more aligned when altruism is higher, the share of citizens who face a mandate declines as the politician’s altruism increases. Likewise, a politician with disagreement about priors can only truthfully send information when his altruism is low. If the politician is very altruistic, even though the public signal will not be believed, a politician can still credibly send information to those whose priors are similar to the politician’s. The politician with low levels of altruism will only mandate the action of agents with extreme beliefs, but as altruism increases, more and more citizens will face a mandate.

Very good – the use of paternalistic policies, and the extent to which they are targeted at individuals, depends qualitatively on whether the politician disagrees with the agents about their preferences or about their knowledge, and the extent to which mandates are applied on certain groups depends on how extreme their preferences or beliefs are. There is nothing inherently contradictory in an altruistic politician taking the libertarian side on one issue and the paternalistic side on another.

July 2012 working paper (No IDEAS version). Petra has many other interesting papers. In her job market paper, presented here last week, she shows that social insurance, in this case a widow’s benefit in Sweden, can have major effects in other markets. In particular, a really nice regression discontinuity shows that the benefit was leading to a huge number of extra marriages, that these were more likely to end in divorce, that intrahousehold bargaining was affected, and much more (Jeff at Cheap Talk has a longer description). Her paper Circles of Trust notes a reason for cliquish behavior in some labor markets. If I have information whose value declines with use (such as a stock tip) and I am altruistic, I may wish to tell my friends the info. But I worry that they will tell their friends, who I don’t know and hence don’t really care about. If my friend could commit not to tell his friends, I would give him the info. How can we commit ex-ante? Make our social networks a clique. I would bet that this phenomenon explains hiring in, say, small hedge funds to a great extent.

Game Theory and History, A. Greif & Friends (1993, 1994)

(This post refers to A. Greif, “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, AER 1993, and A. Greif, P. Milgrom & B. Weingast, “Coordination, Commitment and Enforcement: The Case of the Merchant Guild,” JPE 1994.)

Game theory, after a rough start, may actually be fulfilling its role as proposed by Herbert Gintis: unifier of the sciences. It goes without saying that game theoretic analysis is widespread in economics, political science (e.g., voter behavior), sociology (network games), law (antitrust), computer science (defending networks against attacks), biology (evolutionary strategies), pure philosophy (more on this in a post tomorrow!), with occasional appearances in psychology, religion (recall Aumann’s Talmud paper), physics (quantum games), etc. But history? Surely game theory, particularly the more complex recent results, has no place there? Yet Avner Greif, an economic historian at Stanford, has shown that games can play a very interesting role indeed in understanding historical events.

Consider first his Maghribi traders paper. In the 11th and 12th centuries, a group of Judeo-Arabic traders called the Maghribis traded across the Mediterranean. Two institutional aspects of their trade are interesting. First, they all hired agents in foreign cities to carry out their trade, and second, they generally used other Maghribi merchants as their agents. This is quite different from, for instance, Italy, where merchants tended to hire agents in foreign cities who were not themselves merchants. What explains that difference, and more generally, how can long-distance merchants ensure that their agents do not rip them off? For instance, how do I keep them from claiming they sold at a low price when actually they sold at a high one?

To a theorist, this looks like a repeated reputational game with imperfect monitoring. Greif doesn’t go the easy route and just assume there are trustworthy and untrustworthy types. Rather, he assumes that there are a set of potential agents who can be hired in each period, that agents are exogenously separated from merchants with probability p in each period, and that merchants can choose to hire and fire at any wage they choose. You probably know from economics of reputation or from the efficiency wage literature that I need to offer wages higher than the agent’s outside option to keep him from stealing; the value of the continuation game, then, is more than the value of stealing now. Imagine that I fire anyone who steals and never hire him again. How do I ensure that other firms do not then hire that same agent (perhaps the agent will say, “Look, give me a second chance and I will work at a lower wage”)? Well, an agent who has cheated one merchant will never be hired by that merchant again. This means that when he is in the unemployed pool, even if other merchants are willing to hire him, his probability of getting hired is lower, since one merchant will definitely not hire him. That means that the continuation value of the game if he doesn’t steal from me is lower. Therefore, the efficiency wage I must pay him to keep him from stealing is higher than the efficiency wage I can pay someone who hasn’t ever stolen, so I strictly prefer to hire agents who have never stolen. This allows the whole coalition to coordinate. Note that the fewer agents there are, the higher the continuation value from not stealing, and hence the lower the efficiency wage I can pay: it is optimal to keep the set of potential agents small.
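
A minimal sketch of the wage calculation (my own stylized functional forms and parameters, not Greif’s exact model): solve the employed and unemployed value functions for a given wage, then search for the lowest wage at which an agent prefers honesty to a one-shot gain from cheating. Making past cheaters harder to rehire lowers the wage merchants must pay, which is exactly the coalition’s advantage.

```python
# Stylized version of Greif's efficiency-wage logic with made-up parameters.
# An employed agent paid wage w either stays honest or steals a one-shot gain
# and is fired; honest separated agents and cheaters differ only in how likely
# they are to be rehired out of the unemployment pool.

def values(w, delta=0.9, sep=0.1, outside=0.5, rehire_honest=0.5, rehire_cheat=0.1):
    """Solve the employed / honest-unemployed / cheater-unemployed value functions."""
    Ve = Vu_h = Vu_c = 0.0
    for _ in range(5000):   # simple fixed-point iteration
        Ve = w + delta * ((1 - sep) * Ve + sep * Vu_h)
        Vu_h = outside + delta * (rehire_honest * Ve + (1 - rehire_honest) * Vu_h)
        Vu_c = outside + delta * (rehire_cheat * Ve + (1 - rehire_cheat) * Vu_c)
    return Ve, Vu_h, Vu_c

def honesty_is_best(w, steal_gain=2.0, **kwargs):
    Ve, Vu_h, Vu_c = values(w, **kwargs)
    # honesty: keep the job (up to exogenous separation); cheating: pocket the
    # wage plus the stolen gain today, then fall into the cheaters' pool
    return Ve >= w + steal_gain + kwargs.get("delta", 0.9) * Vu_c

def efficiency_wage(**kwargs):
    w = 0.5
    while not honesty_is_best(w, **kwargs):
        w += 0.01
    return w

print("wage needed when cheaters are rarely rehired:  ", round(efficiency_wage(rehire_cheat=0.1), 2))
print("wage needed when cheaters are rehired as usual:", round(efficiency_wage(rehire_cheat=0.5), 2))
```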

What of the Italian merchants? Why do they not hire only each other? Maghribi merchants tended to be involved only in long distance trade, while Italian merchants were also involved in real estate and other pursuits. This means the outside option (continuation value after cheating if no one hires me again) is higher for Italian merchants than Maghribi merchants, which means that hiring merchants at the necessary efficiency wage will be relatively more expensive for Italians than Maghribis.

A followup by Greif, with Milgrom and Weingast, considers the problem of long distance trade from the perspective of cities. Forget about keeping your agent from ripping you off: how do you keep the city from ripping you off? For instance, Genoans in Constantinople had their district overrun by a mob at one point, with no compensation offered. Sicilians raised taxes on sales by Jews at one point after they had brought their goods for sale. You may naively think that reputation alone will be enough; I won’t rip anyone off because I want a reputation as a safe and fair city for trade.

But again, the literature on repeated games tells us this will not work. Generally, I need to punish deviations from the efficient set of strategies, and also punish those who themselves fail to punish deviators. In terms of medieval trade, to keep a city from ripping me off, I not only need to punish the city by bringing it less trade, I also need to make sure the city doesn’t make up for my lost trade by offering a special deal to some other trader. That is, I need information about a violation against any single trader to reach other traders, and I need to make sure they are willing to punish the deviating city.

The merchant guild was the institution that solved this problem. Merchant guilds were able to punish their own members by, for example, keeping them from earning rents from special privileges in their home city. In the most general setting, when a guild orders a boycott, a city may still be able to attract some trade, but less than the efficient amount, because only a particularly good deal will entice merchants to come during a boycott and to credibly believe the city will not steal from them.
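To see why a credible boycott threat helps the city itself, here is a stylized sketch of the commitment problem; the parameterization and the names (city_honors, tau, boycott_tau, G) are my own illustration, not the Greif-Milgrom-Weingast model.

```python
# A stylized sketch of the city's commitment problem (illustrative numbers only).
# The city earns trade revenue tau per period if it honors merchants' property,
# or grabs a one-time gain G and then earns only boycott_tau per period afterward.

def city_honors(tau, G, boycott_tau, delta=0.95):
    """True if the discounted value of honest dealing beats expropriating once."""
    honest = tau / (1 - delta)
    deviate = tau + G + delta * boycott_tau / (1 - delta)
    return honest >= deviate

# With only bilateral punishment, other merchants keep coming and post-deviation
# trade stays high, so the constraint fails; a guild-enforced boycott cuts
# post-deviation trade enough to make the city's promise credible.
print(city_honors(tau=10, G=60, boycott_tau=9))   # False: bilateral punishment too weak
print(city_honors(tau=10, G=60, boycott_tau=2))   # True: guild boycott sustains trade
```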

This is all to say that strong guilds may be in the best interest of cities, since they allow a city to solve its commitment problem. The historical record contains many examples of cities encouraging guilds to come trade, and even encouraging the strengthening of guilds. Only a reputational model like the one above can explain such behavior by cities; if guilds were merely extracting rents through monopoly privilege, cities would not encourage them at all. Both of these papers, I think, are quite brilliant.

1993 AER (IDEAS version) and 1994 JPE (IDEAS version). Big thumbs up to Avner for having the final published versions of these papers on his website.

The Well-Calibrated Pundit

I’ll get back to the usual posting regimen on new research, but the recent election is a great time to popularize some ideas that are well known in the theory community, though perhaps not more widely. Consider the problem of punditry. Nature is going to draw an election winner, perhaps in a correlated way, from 51 distributions, one for each state plus DC. An “expert” is someone who knows the true distribution, e.g., “With .7 probability, independent from all other states, Obama will win in New Hampshire.” We wish to identify the true experts. You can see the problem: the true expert knows distributions, yet we who are evaluating the expert can only see one realization from each distribution.

When forecasts are made sequentially – imagine a weather forecaster declaring whether it will rain or not every day – there is a nice literature (done principally here at MEDS) about the problem of identifying true experts. Essentially, as first pointed out by Foster and Vohra in a 1998 Biometrika paper, imagine that you set a rule such that a true expert, who knows the underlying distribution each period, “passes” the rule with very high probability. It then turns out (this can be proven using a version of the minmax theorem) that a complete ignoramus who knows nothing of the underlying distribution can also pass your test. This is true no matter what the test is.

Now, the testing literature is interesting, but more interesting are the properties a good test for a forecaster might have. In an idea I first saw in a famous 1982 JASA paper, one minimally sensible requirement might be called “calibration”. I am well-calibrated if, on the days when I predict rain with probability .4, it actually rains 40 percent of the time. Clearly this is not sufficient – I am well-calibrated if I simply predict the long-run empirical frequency of rain every day – but it seems a good minimum necessary condition. A law of large numbers argument shows that a true expert will pass a calibration test with arbitrarily high probability. With a lot of data points, we could simply bin predictions (say, here are the days where the prediction is between 40 and 45%) and graph those bins against the actual empirical frequency on the predicted days; a well-calibrated forecaster would generate points along the 45-degree line.
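Here is a minimal sketch of that binned calibration check; the function name calibration_table and the simulated data are my own illustrative assumptions.

```python
# A minimal sketch of a binned calibration check for probability forecasts of 0/1 events.
import numpy as np

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group predictions into probability bins and compare the average prediction
    in each bin with the empirical frequency of the event in that bin.
    A well-calibrated forecaster's points lie near the 45-degree line."""
    forecasts, outcomes = np.asarray(forecasts, float), np.asarray(outcomes, float)
    bins = np.linspace(0, 1, n_bins + 1)
    rows = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (forecasts >= lo) & (forecasts < hi)
        if mask.any():
            rows.append((forecasts[mask].mean(), outcomes[mask].mean(), int(mask.sum())))
    return rows  # (mean prediction, empirical frequency, count) per bin

# Made-up data: nature draws a daily rain probability, then rain is realized from it.
rng = np.random.default_rng(0)
truth_p = rng.uniform(0, 1, 5000)
rain = rng.random(5000) < truth_p
print(calibration_table(truth_p, rain))                          # the true expert is well calibrated
print(calibration_table(np.full(5000, truth_p.mean()), rain))    # so is the base-rate forecaster,
                                                                 # which is why calibration is only necessary
```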

Here is where we come to punditry. The recent election looks like a validation of data-driven pundits like Nate Silver, and in many ways it is; critics were calling people like him idiots literally a week ago, yet Silver, Sam Wang and the rest have more or less correctly named the winner in every state. But, you might say, what good is that? After all, aside from Florida, Intrade also correctly predicted the winner in every state. Here is where we can use calibration to figure out who is the better pundit.

Of the swing states, Intrade had Virginia and Colorado as tossups; Ohio, Iowa, New Hampshire and Florida as 2/3 favorites for the frontrunner (Obama in the first three, Romney in FL); and Wisconsin, Pennsylvania, Michigan, Nevada, Ohio and North Carolina as 75 to 85% chances for the frontrunner (Obama in the first five, Romney in NC). A well-calibrated prediction would have had half the tossup states go to each candidate, 67% of the second group go to the frontrunner, and 80% or so of the third group go to the frontrunner. That is, a well-calibrated Intrade forecast should have “missed” on .5*2 + .33*4 + .2*6, or roughly 3.5, states. Intrade actually missed only one.

Doing the same exercise with Silver’s predictions, he had Florida as a tossup, Colorado as a .8 Obama favorite, NC a .8 Romney favorite, Iowa and NH about .85 Obama favorites, Ohio and Nevada about .9 Obama favorites, and the rest of the swing states a .95 or higher Obama favorite. A well-calibrated Silver forecast, then, should have been wrong on about 1.5 states. With Florida going to Obama, Silver will have correctly called every state. There is a very reasonable argument that Silver’s prediction would have been better had Florida gone to Romney! He would have called fewer states correctly in their binary outcomes, but his percentages would have been better calibrated.
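For concreteness, here is a minimal sketch of that back-of-the-envelope calculation; the probability lists are just the rough figures quoted above (not the exact Intrade or FiveThirtyEight numbers), and the function name expected_misses is my own.

```python
# Expected number of "missed" states for a well-calibrated forecaster, using the
# approximate frontrunner probabilities quoted above.

def expected_misses(frontrunner_probs):
    """Each state is 'missed' with probability 1 - p, where p is the probability
    placed on the frontrunner, so the expected miss count is the sum of 1 - p."""
    return sum(1 - p for p in frontrunner_probs)

intrade = [0.5] * 2 + [0.67] * 4 + [0.80] * 6    # tossups; 2/3 favorites; 75-85% group
silver = [0.5, 0.8, 0.8, 0.85, 0.85, 0.9, 0.9]   # FL, CO, NC, IA, NH, OH, NV; the remaining
                                                 # safe (.95+) states add only a little more

print(round(expected_misses(intrade), 1))  # about 3.5
print(round(expected_misses(silver), 1))   # about 1.4, roughly 1.5 once the .95+ states are added
```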

That is, Silver is about 1.5 states “off” the well-calibrated prediction, and Intrade about 2.5 states “off” it; we say, then, that Silver’s predictions were better. A similar calculation could be made for any other pundit. Such an exercise is far better than the “how many states did you call correctly?” reckoning that you see, for example, here.

Two caveats. First, the events here are both simultaneous and correlated. If we had an even number of Romney-leaning and Obama-leaning swing states, the correlation would matter much less, but given how many of the close states were predicted to go for Obama, you might worry that even a true expert will either get zero states wrong or a whole bunch of states wrong. This is a fair point, but in the absence of total correlation it is tangential to the general argument that a well-calibrated forecaster of probabilistic binary events should not be expected to be correct on every single event. Second, you may note that forecasters like Silver also made predictions about vote shares and other quantities. I agree that it is much more useful to distinguish good and bad forecasters using that more detailed, non-binary data, but even there, nature only gives us one realization from the true underlying distribution in each state, so the point about calibration still applies. (EDIT: I should note here that if you’re interested in calibrated forecasts, there are much more sophisticated ways of doing the type of analysis above, though with the same qualitative point. Google “Brier score” for a particularly well-known way to evaluate binary outcomes; the Brier score can be decomposed in a way that extracts something very similar to the more basic analysis above. In general, scoring rules sit in a branch of statistics that we economists very much like; unlike pure frequentism or Bayesianism, scoring rules and other Wald-style statistics set out a decision problem with a maximand before doing any analysis. Very satisfying.)
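For reference, here is a minimal sketch of the Brier score itself, which is just the mean squared error of probability forecasts against 0/1 outcomes; the example numbers are made up.

```python
# Brier score for binary events: mean squared error of forecasts against 0/1 outcomes.
# Lower is better; it decomposes into calibration (reliability) and refinement terms.

def brier_score(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# A forecaster who said .9 for two events that happened and .6 for one that did not:
print(round(brier_score([0.9, 0.9, 0.6], [1, 1, 0]), 3))  # 0.127
```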

Back to our regular style of post tomorrow.
