Category Archives: Political Economy

“How Does Family Income Affect Enrollment?,” N. Hilger (2012)

Nate Hilger is on the market from Harvard this year. His job market paper continues a long line of inference that is probably at odds with mainstream political intuition. Roughly, economists generally support cash rather than in-kind transfers because people tend to be the best judges of the optimal use of money they receive; food stamps are not so useful if you really need to pay the heat bill that week. That said, if the goal is to cause some behavior change among the recipient, in-kind transfers can be more beneficial, especially when the cash transfer would go to a family while the in-kind transfer would go to a child or a wife.

Hilger managed to get his hands on the full universe of IRS data. I’m told by my empirically-minded friends that this data is something of a holy grail, with the IRS really limiting who can use the data after Saez proved its usefulness. IRS data is great because of the 1098-T: colleges are required to file information about their students’ college attendance so that the government can appropriately dole out aid and tax credits. Even better, firms that fire or lay off workers file a 1099-G. Finally, claimed dependents on the individual tax form let us link parents and children. That’s quite a trove of data!

Here’s a question we can answer with it: does low household income lower college attendance, and would income transfers to poor families help reduce the college attendance gap? In a world with perfect credit markets, it shouldn’t matter, since any student could pledge the human capital she would gain as collateral for a college attendance loan. Of course, pledging one’s human capital turns out to be quite difficult. Even if the loans aren’t there, a well-functioning and comprehensive university aid program should insulate the poor from this type of liquidity problem. Now, we know from previous studies that increased financial aid has a pretty big effect on college attendance among the poor and lower middle class. Is this because the aid is helping loosen the family liquidity constraint?

Hilger uses the following trick. Consider a worker who is laid off. A layoff may be a temporary shock, but this paper and others estimate that it lowers discounted lifetime earnings by an average of nearly $100,000. So can we just propensity-match laid-off and employed workers when the child is college age, and see if the income shock lowers attendance? Not so fast. It turns out that, matching on whatever observables we have, children whose fathers are laid off when the child is 19 are also much less likely to attend college than children whose fathers are not laid off, even though age 19 is after the attendance decision has been made. Roughly, ever being laid off is correlated with unobservables that lower children’s college attendance. So let’s compare children whose dads are laid off at age 17 to children whose dads are laid off from a similar firm at age 19, matching on all other observables. The IRS data has so many data points that this is actually possible.
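
To make the identification idea concrete, here is a simulated sketch (all numbers are invented, and this is my toy version rather than Hilger’s actual specification): fathers who are ever laid off differ in an unobservable that lowers attendance, so the naive laid-off versus never-laid-off comparison is badly biased, while the laid-off-at-17 versus laid-off-at-19 comparison differences out the unobservable.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2_000_000
ever_laid_off = rng.random(N) < 0.3
laid_off_at_17 = ever_laid_off & (rng.random(N) < 0.5)   # otherwise laid off at 19
unobservable = -0.10 * ever_laid_off                     # disadvantage correlated with ever being laid off
true_effect = -0.005                                     # causal effect of the pre-college income shock
p_attend = 0.5 + unobservable + true_effect * laid_off_at_17
attend = rng.random(N) < p_attend

naive = attend[laid_off_at_17].mean() - attend[~ever_laid_off].mean()
timing = attend[laid_off_at_17].mean() - attend[ever_laid_off & ~laid_off_at_17].mean()
print(f"naive laid-off vs never-laid-off gap: {naive:.4f}")   # roughly -0.105: badly biased
print(f"laid off at 17 vs laid off at 19 gap: {timing:.4f}")  # close to the true -0.005
```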

What do we learn? First, consumption spending (in this case, on housing) declines roughly in line with the lifetime income hypothesis after the income shock. Second, there is hardly any effect on college attendance and quality: attendance for children whose dads suffer the large income shock falls by half a percentage point. Further, the decline is almost entirely borne by middle-class children, not the very poor or the rich: this makes sense, since poor students rely very little on parental funding to pay for college, and the rich have enough assets to overcome any liquidity shock. The quality of college chosen also declines after a layoff, but only by a very small amount. That is, the Engel curve for college spending is very flat: families with more income tend to spend roughly similar amounts on college.

Policy-wise, what does this mean? Other authors have estimated that a $1000 increase in annual financial aid increases college enrollment by approximately three percentage points (a particularly strong effect is found among students from impoverished families); the Kalamazoo experiment shows positive feedback loops that may make the efficacy of such aid even higher, since students will exert more effort in high school knowing that college is a realistic financial possibility. Hilger’s paper shows that a $1000 cash grant to poor families will likely improve college attendance by .007 to .04 percentage points, depending on whether the layoff is lowering college attendance through a transitory or a permanent income shock. That is, financial aid is orders of magnitude more useful in raising college attendance than cash transfers, especially among the poor.
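
Taking the post’s numbers at face value, the “orders of magnitude” claim is easy to quantify:

```python
aid_effect = 3.0                       # percentage points of enrollment per $1000 of financial aid
cash_low, cash_high = 0.007, 0.04      # percentage points per $1000 of unrestricted cash
print(f"aid is roughly {aid_effect / cash_high:.0f}x to {aid_effect / cash_low:.0f}x "
      f"as effective per dollar as a cash grant")
```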

November 2012 working paper (No IDEAS version). My old Federal Reserve coworker Christopher Herrington is also on the job market, and has a result suggesting the importance of Hilger’s finding. He computes a DSGE model of lifetime human capital formation, and considers the counterfactual where the US has more equal education funding (that is, schools that are centrally funded rather than well funded in rich areas and poorly funded in poor areas). Around 15% of eventual earnings inequality – again taking into account many general equilibrium effects – can be explained by the high variance of US education funding. As in Hilger, directly altering the requirement that parents pay for school (either through direct payments at the university level, or by purchasing housing in rich areas at the primary level) can cure a good portion of our growing inequality.

“Paternalism, Libertarianism and the Nature of Disagreement,” U. Loginova & P. Persson (2012)

Petra Persson is on the job market this year from Columbia. Her CV is pretty incredible – there’s pure theory, cutting-edge empirical techniques, policy work, networks, behavioral and more. Her job market paper is about the impact of social insurance policy on seemingly unrelated markets like marriage, and I’ll discuss it briefly at the end of the post, but I want to focus on another paper of hers which struck me as quite interesting.

Imagine a benevolent ruler who has private information about some policy, such as the relative safety of wearing seatbelts. This ruler can either tell citizens the information, or lie, or coerce them to take some action. Naive libertarianism suggests that an altruistic ruler should always be truthful; consumers can then weigh the information according to their preferences and choose the policy optimal for them.

But note something interesting. On some issues, one subset of politicians has libertarian leanings, while on others, a different subset has those leanings. For instance, a politician may favor legal assisted suicide but insist on mandatory seatbelt rules, while another politician may be against the mandatory belt and also against legal assisted suicide. Politicians can even vary in how libertarian they wish to be depending on who the policy affects. Witness that many politicians favor legalizing marijuana but very few favor legalizing it for 16 year olds. What explains this behavior?

Loginova and Persson examine this theoretically. Take a population of citizens. There are two possible states, 0 and 1. Citizens can either think each state equally likely yet have heterogeneous preferences that differ from the politician’s (measured with a Crawford-Sobel style quadratic loss, though this isn’t critical to the model), or they can have the same preferences as the politician yet heterogeneous (prior) beliefs about the probability of each state. The politician can be altruistic to varying degrees – more altruism means he, according to his own prior, puts more and more weight on the utility of the agent. The politician gets a noisy signal about the true state. To limit the extent of opposing beliefs, the politician is restricted to having the same prior as the median citizen.

If the politician can only advise or not advise, when does he make a truthful public announcement? If he disagrees with the citizens on preferences, then the more altruistic he is, the more likely he is to announce truthfully, for the standard libertarian reason: the citizens know their own preferences, and the better informed they are, the better they can maximize their own welfare. If, however, he disagrees with the citizens on priors, then the more altruistic he is, the less likely he is to announce truthfully. Altruism means caring about the citizens’ welfare, but since their priors are, in his eyes, wrong, an altruistic politician has an incentive to lie so that citizens take the actions that are optimal according to his own prior; the citizens know this, and so truthful communication cannot be sustained.
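
To see the prior-disagreement logic in numbers, here is a stripped-down illustration (my own toy parameters, not the paper’s model): a politician whose prior on state 1 is 0.9 faces a citizen whose prior is 0.5; both have quadratic loss. If the citizen naively believed the politician’s report and acted on her own posterior, the politician would want to misreport whenever his signal points away from his prior – which is exactly why a credulous equilibrium cannot survive.

```python
def posterior(prior, signal, acc):
    """P(state = 1 | signal), where P(signal = state | state) = acc."""
    like1 = acc if signal == 1 else 1 - acc
    like0 = (1 - acc) if signal == 1 else acc
    return prior * like1 / (prior * like1 + (1 - prior) * like0)

pi_politician, pi_citizen, acc = 0.9, 0.5, 0.8

for s in (0, 1):
    p_pol = posterior(pi_politician, s, acc)       # politician's belief after seeing his signal
    losses = {}
    for m in (0, 1):                               # message the politician could send
        a = posterior(pi_citizen, m, acc)          # citizen's action if she believes the message
        losses[m] = p_pol * (a - 1) ** 2 + (1 - p_pol) * a ** 2   # politician's expected loss
    best = min(losses, key=losses.get)
    print(f"signal={s}: preferred report is {best} (truthful: {best == s})")
```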

Now what if the politician could (at a cost to him) force all individuals to take an individual action? With preference disagreement, an altruistic politician would never do this, both because he can send all the information to citizens with a free message and also because a mandate does not respect heterogeneity of preferences. Even if action 0 is better than action 1 for 90% of the population, an altruistic principal also cares about the other 10%. With disagreement about priors, however, an altruistic politician is more likely to impose a mandate the more altruistic he is. Even though citizens have heterogeneous priors, the principal thinks all of them are wrong, and hence is not worried about heterogeneity when imposing a mandate. Since we noted in the last paragraph that altruistic politicians who have different priors from citizens will not be able to credibly send their information, the mandate allows the politician’s private information to be used in the citizens’ actions.

Finally, what if the politician can send individual-level messages or enforce individual mandates? A politician with preference disagreement needs to be fairly altruistic before his public message is credible; in fact, he needs to be able to credibly persuade the individual with the average level of disagreement. If he is not altruistic enough, he can still credibly persuade those agents who have only a limited amount of preference disagreement with him. If mandates are possible, the politician with limited altruism will force individuals whose preferences are very different from his own to take his desired action, but since the preferences of the politician and the agents are more aligned when altruism is higher, the share of citizens who face a mandate declines as the politician’s altruism increases. Likewise, a politician with disagreement about priors can only truthfully send information when his altruism is low. If the politician is very altruistic, even though the public signal will not be believed, he can still credibly send information to those whose priors are similar to his own. The politician with low levels of altruism will only mandate the action of agents with extreme beliefs, but as altruism increases, more and more citizens will face a mandate.

Very good – the use of paternalistic policies, and the extent to which they are targeted at individuals, depends qualitatively on whether the politician disagrees with the agents about their preferences or about their knowledge, and the extent to which mandates are applied on certain groups depends on how extreme their preferences or beliefs are. There is nothing inherently contradictory in an altruistic politician taking the libertarian side on one issue and the paternalistic side on another.

July 2012 working paper (No IDEAS version). Petra has many other interesting papers. In her job market paper, presented here last week, she shows that social insurance, in this case a widow’s benefit in Sweden, can have major effects in other markets. In particular, a really nice regression discontinuity shows that the benefit was leading to a huge number of extra marriages, that these were more likely to end in divorce, that intrahousehold bargaining was affected, and much more (Jeff at Cheap Talk has a longer description). Her paper Circles of Trust notes a reason for cliquish behavior in some labor markets. If I have information whose value declines with use (such as a stock tip) and I am altruistic, I may wish to tell my friends the info. But I worry that they will tell their friends, whom I don’t know and hence don’t really care about. If my friend could commit not to tell his friends, I would give him the info. How can we commit ex-ante? Make our social networks a clique. I would bet that this phenomenon explains hiring in, say, small hedge funds to a great extent.

Game Theory and History, A. Greif & Friends (1993, 1994)

(This post refers to A. Greif, “Contract Enforceability and Economic Institutions in Early Trade: The Maghribi Traders’ Coalition”, AER 1993, and A. Greif, P. Milgrom & B. Weingast, “Coordination, Commitment and Enforcement: The Case of the Merchant Guild,” JPE 1994.)

Game theory, after a rough start, may actually be fulfilling its role as proposed by Herbert Gintis: unifier of the sciences. It goes without saying that game theoretic analysis is widespread in economics, political science (e.g., voter behavior), sociology (network games), law (antitrust), computer science (defending networks against attacks), biology (evolutionary strategies), pure philosophy (more on this in a post tomorrow!), with occasional appearances in psychology, religion (recall Aumann’s Talmud paper), physics (quantum games), etc. But history? Surely game theory, particularly the more complex recent results, has no place there? Yet Avner Greif, an economic historian at Stanford, has shown that games can play a very interesting role indeed in understanding historical events.

Consider first his Maghribi traders paper. In the 11th and 12th centuries, a group of Judeo-Arabic traders called the Maghribis traded across the Mediterranean. Two institutional aspects of their trade are interesting. First, they all hired agents in foreign cities to carry out their trade, and second, they generally used other Maghribi merchants as their agents. This is quite different from, for instance, Italy, where merchants tended to hire agents in foreign cities who were not themselves merchants. What explains that difference, and more generally, how can a long-distance merchant ensure that his agents do not rip him off? For instance, how do I keep an agent from claiming he sold at a low price when he actually sold at a high one?

To a theorist, this looks like a repeated reputational game with imperfect monitoring. Greif doesn’t go the easy route and just assume there are trustworthy and untrustworthy types. Rather, he assumes that there is a set of potential agents who can be hired in each period, that agents are exogenously separated from merchants with probability p in each period, and that merchants can hire and fire at any wage they choose. You probably know from the economics of reputation or from the efficiency wage literature that I need to offer a wage above the agent’s outside option, high enough that the value of the continuation game exceeds the one-time gain from stealing. Imagine that I fire anyone who steals and never hire him again. How do I ensure that other merchants do not then hire that same agent (perhaps the agent will say, “Look, give me a second chance and I will work at a lower wage”)? Well, an agent who has cheated one merchant will never be hired by that merchant again. This means that when he is in the unemployed pool, even if other merchants are willing to hire him, his probability of getting hired is lower, since one merchant will definitely not hire him. That means that the continuation value of the game if he doesn’t steal from me is lower. Therefore, the efficiency wage I must pay him to keep him from stealing is higher than the efficiency wage I can pay someone who hasn’t ever stolen, so I strictly prefer to hire agents who have never stolen. This allows the whole coalition to coordinate. Note that the fewer agents there are, the higher the continuation value from not stealing, and hence the lower the efficiency wage I can pay: it is optimal to keep the set of potential agents small.
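
The efficiency-wage arithmetic can be sketched numerically. This is my own simplification of Greif’s model (zero income while unemployed, cheaters paid the same wage if ever rehired), but it shows both comparative statics in the paragraph above: the minimal incentive-compatible wage falls when cheaters are rehired less often, and falls further when honest agents are rehired quickly because the agent pool is small.

```python
DELTA, P_SEP, GAIN = 0.95, 0.1, 1.0    # discount factor, separation probability, one-time gain from stealing

def values(wage, rehire_prob):
    """Employment and unemployment values for an honest agent, by value iteration."""
    v_emp = v_unemp = 0.0
    for _ in range(5000):
        v_emp_new = wage + DELTA * ((1 - P_SEP) * v_emp + P_SEP * v_unemp)
        v_unemp_new = DELTA * (rehire_prob * v_emp + (1 - rehire_prob) * v_unemp)
        if abs(v_emp_new - v_emp) + abs(v_unemp_new - v_unemp) < 1e-12:
            break
        v_emp, v_unemp = v_emp_new, v_unemp_new
    return v_emp, v_unemp

def min_wage(rehire_honest, rehire_cheat):
    """Bisect for the smallest wage with V_employed >= wage + gain + delta * V_unemployed(cheater)."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        w = (lo + hi) / 2
        v_emp, _ = values(w, rehire_honest)
        _, v_unemp_cheat = values(w, rehire_cheat)
        if v_emp >= w + GAIN + DELTA * v_unemp_cheat:
            hi = w
        else:
            lo = w
    return hi

print(min_wage(0.5, 0.5))   # cheaters rehired as easily as anyone: highest wage needed
print(min_wage(0.5, 0.1))   # the coalition shuns cheaters: a lower wage suffices
print(min_wage(0.8, 0.1))   # small agent pool, honest agents rehired fast: lower still
```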

What of the Italian merchants? Why do they not hire only each other? Maghribi merchants tended to be involved only in long distance trade, while Italian merchants were also involved in real estate and other pursuits. This means the outside option (continuation value after cheating if no one hires me again) is higher for Italian merchants than Maghribi merchants, which means that hiring merchants at the necessary efficiency wage will be relatively more expensive for Italians than Maghribis.

A follow-up by Greif, with Milgrom and Weingast, considers the problem of long-distance trade from the perspective of cities. Forget about keeping your agent from ripping you off: how do you keep the city from ripping you off? For instance, Genoans in Constantinople had their district overrun by a mob at one point, with no compensation offered. Sicilians raised taxes on sales by Jews at one point after they had brought their goods for sale. You may naively think that reputation alone will be enough; I won’t rip anyone off because I want a reputation as a safe and fair city for trade.

But again, the literature on repeated games tells us this will not work. Generally, I need to punish deviations from the efficient set of strategies, and punish those who themselves do not punish deviators. In terms of medieval trade, to keep a city from ripping me off, I need not only to punish the city by bringing it less trade, but I also need to make sure the city doesn’t make up for my lost trade by offering a special deal to some other trader. That is, I need to get information about a violation against a single trader to other traders, and I need to make sure they are willing to punish the deviating city.

The merchant guild was the institution that solved this problem. Merchant guilds were able to punish their own members by, for example, keeping them from earning rents from special privileges in their own city. In the most general setting, when a guild orders a boycott, cities may be able to attract some trade, but less than the efficient amount, because only a particularly good deal will entice merchants to come during a boycott and convince them that the city will not steal from them.

This is all to say that strong guilds may be in the best interest of cities since they allow the city to solve its commitment problem. The historical record confirms many examples of cities encouraging guilds to come trade, and encouraging the strengthening of guilds. Only a reputational model like the one above can explain such city behavior; if guilds are merely extracting rents with monopoly privilege, cities would not encourage them at all. Both of these papers, I think, are quite brilliant.

1993 AER (IDEAS version) and 1994 JPE (IDEAS version). Big thumbs up to Avner for having the final published versions of these papers on his website.

The Well-Calibrated Pundit

I’ll get back to the usual posting regimen on new research, but the recent election is a great time to popularize some ideas that are well known in the theory community, though perhaps not generally. Consider the problem of punditry. Nature is going to draw an election winner, perhaps in a correlated way, from 51 distributions representing each state plus DC. An “expert” is someone who knows the true distribution, e.g., “With .7 probability, independent from all other states, Obama will win in New Hampshire.” We wish to identify the true experts. You can see the problem: the true expert knows distributions, yet we who are evaluating the expert can only see one realization from each distribution.

When forecasts are made sequentially – imagine a weather forecaster declaring whether it will rain or not every day – there is a nice literature (done principally here at MEDS) about the problem of divining experts. Essentially, as first pointed out by Foster and Vohra in a 1998 Biometrika, imagine that you set a rule such that a true expert, who knows the underlying distribution each period, “passes” the rule with very high probability. It then turns out (this can be proven using a version of the minmax theorem) that a complete ignoramus who knows nothing of the underlying distribution can also pass your test. This is true no matter what the test is.

Now, the testing literature is interesting, but more interesting are the properties of a good test for a forecaster. In an idea I first saw in a famous 1982 paper in JASA, one minimally sensible rule might be called “calibration”. I am well-calibrated if, on the days when I predict rain with probability .4, it actually rains 40 percent of the time. Clearly this is not sufficient – I am well-calibrated if I simply predict the long-run empirical average frequency of rain every day – but it seems a good minimum necessary condition. A law of large numbers argument shows that a true expert will pass a calibration test with arbitrarily high probability. With a lot of data points, we could simply bin predictions (say, here are the days where the prediction is between 40 and 45%) and graph those bins against the actual empirical frequency on the predicted days; a well-calibrated forecast would generate data points along the 45-degree line.
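
Here is what that binning exercise looks like in code, on simulated data (an honest but noisy forecaster facing probabilities I made up); within each bin the empirical frequency should sit close to the average stated probability.

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = rng.uniform(0.1, 0.9, size=2000)                        # nature's true probabilities
outcomes = rng.random(2000) < true_p                             # realized binary events
forecasts = np.clip(true_p + rng.normal(0, 0.05, 2000), 0, 1)    # honest forecasts plus noise

bins = np.linspace(0, 1, 11)
which_bin = np.digitize(forecasts, bins) - 1
for b in range(10):
    mask = which_bin == b
    if not mask.any():
        continue
    print(f"forecasts in [{bins[b]:.1f}, {bins[b+1]:.1f}): "
          f"mean forecast {forecasts[mask].mean():.2f}, "
          f"empirical frequency {outcomes[mask].mean():.2f}, n={mask.sum()}")
```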

Here is where we come to punditry. The recent election looks like a validation for data-using pundits like Nate Silver, and in many ways it is; people were calling people like him idiots literally a week ago, yet Silver, Sam Wang and the rest have more or less correctly named the winner in every state. But, you might say, what good is that? After all, aside from Florida, Intrade also correctly predicted the winner in every state. Here is where we can use calibration tests to figure out who is the better pundit.

Of the swing states, Intrade had Virginia and Colorado as tossups; Ohio, Iowa, New Hampshire and Florida as 2/3 favorites for the frontrunner (Obama in the first three, Romney in FL); and Wisconsin, Pennsylvania, Michigan, Nevada, Ohio and North Carolina as 75 to 85% chances for the frontrunner (Obama in the first five, Romney in NC). A well-calibrated prediction would have had half the tossup states go to each candidate, 67% of the second group of states to the frontrunner, and 80% or so of the third group to the frontrunner. That is, a well-calibrated Intrade forecast should have “missed” on .5*2+.33*4+.2*6, or roughly 3.5, states. Intrade actually missed only one.

Doing the same exercise with Silver’s predictions, he had Florida as a tossup, Colorado as a .8 Obama favorite, NC a .8 Romney favorite, Iowa and NH about .85 Obama favorites, Ohio and Nevada about .9 Obama favorites, and the rest of the swing states a .95 or higher Obama favorite. A well-calibrated Silver forecast, then, should have been wrong on about 1.5 states. With Florida going to Obama, Silver will have correctly called every state. There is a very reasonable argument that Silver’s prediction would have been better had Florida gone to Romney! He would have called fewer states correctly in their binary outcomes, but his percentages would have been better calibrated.
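
Using the (rounded) probabilities quoted above and treating states as independent, the expected number of misses for a well-calibrated forecaster is just the sum of the underdog probabilities; the quick calculation below roughly reproduces the 3.5 and 1.5 figures.

```python
intrade = [0.5, 0.5] + [2 / 3] * 4 + [0.8] * 6                     # favorite's probability, by state
silver = [0.5, 0.8, 0.8, 0.85, 0.85, 0.9, 0.9] + [0.95] * 4

for name, probs in [("Intrade", intrade), ("Silver", silver)]:
    expected_misses = sum(1 - p for p in probs)
    print(f"{name}: expected misses if well calibrated = {expected_misses:.1f}")
```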

That is, Silver is 1.5 states “off” the well-calibrated prediction, and Intrade 2.5 states “off” the well-calibrated prediction; we say, then, that Silver’s predictions were better. A similar calculation could be made for other pundits. Such an exercise is far better than the “how many states did you call correctly?” reckoning that you see, for example, here.

Two caveats. First, the events here are both simultaneous and correlated. If we had an even number of Romney-leaning and Obama-leaning swing states, the correlation would be much less important, but given how many of the close states were predicted to go for Obama, you might worry that even true experts will either get 0 states wrong, or a whole bunch of states wrong. This is a fair point, but in the absence of total correlation it is tangential to the general argument: forecasters of probabilistic binary events who get “everything right” should not, in fact, be correct on every event. Second, you may note that forecasters like Silver also made predictions about vote shares and other factors. I agree that it is much more useful to distinguish good and bad forecasters using that more detailed, non-binary data, but even there, nature is only giving us one realization from a true underlying distribution in each state, so the point about calibration still applies. (EDIT: I should note here that if you’re interested in calibrated forecasts, there are much more sophisticated ways of doing the type of analysis I did above, though with the same qualitative point. Google “Brier score” for a particularly well-known way to evaluate binary outcomes; the Brier score can be decomposed in a way that extracts something very similar to the more basic analysis above. In general, scoring rules are in a branch of statistics that we economists very much like; unlike pure frequentism or Bayesianism, scoring rules and other Wald-style statistics implicitly set out a decision problem with a maximand before doing any analysis. Very satisfying.)
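
For completeness, the Brier score mentioned in the edit is just the mean squared error of the probability forecasts against the 0/1 outcomes (lower is better); with the same rounded swing-state numbers as above, a Silver-style forecaster handily beats a know-nothing coin flip.

```python
import numpy as np

probs = np.array([0.5, 0.8, 0.8, 0.85, 0.85, 0.9, 0.9, 0.95, 0.95, 0.95, 0.95])
outcomes = np.ones_like(probs)                     # suppose the favorite won each state
print("forecaster's Brier score:", np.mean((probs - outcomes) ** 2))
print("coin-flip Brier score:   ", np.mean((0.5 - outcomes) ** 2))
```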

Back to our regular style of post tomorrow.

“Trafficking Networks and the Mexican Drug War,” M. Dell (2011)

Job market talks for 2012 have concluded at many schools, and therefore this is my last post on a job candidate paper. This is also the only paper I didn’t have a chance to see presented live, and for good reason: Melissa Dell is clearly this year’s superstar, and I think it’s safe to assume she can have any job she wants, and at a salary she names. I have previously discussed another paper of hers – the Mining Mita paper – which would also have been a mindblowing job market paper; essentially, she gives a cleanly identified and historically important example of long-run effects of institutions a la Acemoglu and Robinson, but the effect she finds is that “bad” institutions in the colonial era led to “good” outcomes today. The mechanism by which historical institutions persist is not obvious and must be examined on a case-by-case basis.

Today’s paper is about another critical issue: the Mexican drug war. Over 40,000 people have been killed in drug-related violence in Mexico in the past half-decade, and that murder rate has been increasing over time. Nearly all of Mexico’s domestic drug production, principally pot and heroin, is destined for the US. There have been suggestions, quite controversial, that the increase in violence is a result of Mexican government policies aimed at shutting down drug gangs. Roughly, some have claimed that when a city arrests leaders of a powerful gang, the power vacuum leads to a violent contest among new gangs attempting to move into that city; in terms of the most economics-laden gang drama, removing relatively non-violent Barksdale only makes it easier for violent Marlo.

But is this true? And if so, when is it true? How ought Mexico deploy scarce drugfighting resources? Dell answers all three questions. First, she notes that the Partido Acción Nacional is, for a number of reasons, associated with greater crackdowns on drug trafficking in local areas. She then runs a regression discontinuity on municipal elections – which vary nicely over time in Mexico – where PAN barely wins versus barely loses. These samples appear balanced according to a huge range of regressors, including the probability that PAN has won elections in the area previously, a control for potential corruption at the local level favoring PAN candidates. In a given municipality-month, the probability of a drug-related homicide rises from 6 percent to 15 percent following a PAN inauguration after such a close election. There does not appear to be any effect during the lame duck period before PAN takes office, so the violence appears correlated to anti-trafficking policies that occur after PAN takes control. There is also no such increase in cases where PAN barely loses. The effect is greatest in municipalities on the border of two large drug gang territories. The effect is also greatest in municipalities where detouring around that city on the Mexican road network heading toward the US is particularly arduous.

These estimates are interesting, and do suggest that Mexican government policy is causally related to increasing drug violence, but the more intriguing question is what we should do about this. Here, the work is particularly fascinating. Dell constructs a graph where the Mexican road network forms edges and municipalities form vertices. She identifies regions which are historical sources of pot and poppyseed production, and identifies ports and border checkpoints. Two models on this graph are considered. In the first model, drug traffickers seek to reach a US port according to the shortest possible route. When PAN wins a close election, that municipality is assumed closed to drug traffic and gangs reoptimize routes. We can then identify which cities are likely to receive diverted drug traffic. Using data on drug possession arrests above $1000 – traffickers, basically – she finds that drug confiscations in the cities the model expects to get traffic post-election indeed rise 18 to 25 percent, depending on your measure. This is true even when the predicted new trafficking routes do not have a change in local government party: the change in drug confiscation is not simply PAN arresting more people, but actually does seem to reflect more traffic along the route.
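
The first routing exercise is easy to mimic on a toy graph (this is an illustration of the idea, not Dell’s road network or code): traffickers take the cheapest path to the border, a close PAN win closes a municipality on that path, and the diverted route tells you which other municipalities should see confiscations rise.

```python
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([("producer", "A", 1), ("A", "B", 1), ("B", "border1", 1),
                           ("A", "C", 2), ("C", "border1", 2), ("C", "border2", 3)])

before = nx.shortest_path(G, "producer", "border1", weight="weight")
G_closed = G.copy()
G_closed.remove_node("B")                                  # crackdown closes municipality B
after = nx.shortest_path(G_closed, "producer", "border1", weight="weight")

print("route before crackdown:", before)
print("route after crackdown: ", after)
print("municipalities gaining traffic:", set(after) - set(before))
```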

A second model is even nicer. She considers the equilibrium where traffickers try to avoid congestion. That is, if all gangs go to the same US port of entry, trafficking is very expensive. She estimates a cost function using pre-election trafficking data that is fairly robust to differing assumptions about the nature of the cost of congestion, and solves for the Wardrop equilibrium, a concept allowing for relatively straightforward computational solutions to congestion games on a network. The model in the pre-election period, for which the cost parameters are estimated, very closely matches actual data on known drug trafficking at that time – congestion at US ports appears to be really important, whereas congestion on internal Mexican roads doesn’t matter too much. Now again, she considers the period after close PAN elections, assuming that these close PAN victories increase the cost of trafficking by some amount (results are robust to the exact amount), and re-solves the congestion game from the perspective of the gangs. As in the simpler model, drug trafficking rises by 20 percent or so in municipalities that gain a drug trafficking route after the elections. The probability of drug-related homicides similarly increases. A really nice sensitivity check is performed by checking cocaine interdictions in the same cities: they do not increase at all, as expected by the model, since the model maps trafficking routes from pot and poppy production sites to the US, and cocaine is only transshipped to Mexico via ports unknown to the researcher.
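
The congestion logic is also easy to see in a two-route toy version (my numbers, with linear congestion costs): in a Wardrop equilibrium the flows on the two border crossings equalize their per-unit costs, and raising the base cost of one route after a crackdown shifts traffic – and with it confiscations and violence – onto the other.

```python
def wardrop_split(c1, c2, a1, a2, total=1.0):
    """Flow x on route 1 solving c1 + a1*x = c2 + a2*(total - x), clipped to [0, total]."""
    x = (c2 - c1 + a2 * total) / (a1 + a2)
    return min(max(x, 0.0), total)

print(wardrop_split(c1=1.0, c2=1.2, a1=2.0, a2=2.0))   # pre-crackdown share of traffic on route 1
print(wardrop_split(c1=1.6, c2=1.2, a1=2.0, a2=2.0))   # crackdown raises route 1's base cost; traffic shifts to route 2
```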

So we know now that, particularly when a territory is on a predicted trafficking route near the boundary of multiple gang territories, violence will likely increase after a crackdown. And we can use the network model to estimate what will happen to trafficking costs if we set checkpoints to make some roads harder to use. Now, given that the government has resources to set checkpoints on N roads, with the goal of increasing trafficking costs and decreasing violence, where ought checkpoints be set? Exact solutions turn out to be impossible – this “vital edges” problem is NP-hard and the number of edges is in the tens of thousands – but approximate algorithms can be used, and Dell shows which areas will benefit most from greater police presence. The same model, as long as the data is good enough, can be applied to many other countries. Choosing trafficking routes is a problem played often enough by gangs that if you buy the 1980s arguments about how learning converges to Nash play, then you may believe (I do!) that the problem of selecting where to spend government counter-drug money is amenable to game theory using the techniques Dell describes. Great stuff. Now, between the lines, and understand this is my reading and not Dell’s claim, I get the feeling that she also thinks that the violence spillovers of interdiction are so large that the Mexican government may want to consider giving up altogether on fighting drug gangs.
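
A greedy heuristic gives the flavor of the checkpoint problem (again a toy graph, and greedy is only an approximation to the NP-hard exact problem): repeatedly place a checkpoint on whichever road most raises the traffickers’ cheapest route.

```python
import networkx as nx

def route_cost(graph):
    return nx.shortest_path_length(graph, "producer", "border", weight="weight")

G = nx.Graph()
G.add_weighted_edges_from([("producer", "A", 1), ("A", "border", 2),
                           ("producer", "B", 2), ("B", "border", 2)])
CHECKPOINT_COST, N_CHECKPOINTS = 3, 2

for _ in range(N_CHECKPOINTS):
    best_edge, best_cost = None, route_cost(G)
    for u, v in G.edges():
        trial = G.copy()
        trial[u][v]["weight"] += CHECKPOINT_COST       # a checkpoint makes this road costlier to use
        cost = route_cost(trial)
        if cost > best_cost:
            best_edge, best_cost = (u, v), cost
    if best_edge is None:
        break
    G[best_edge[0]][best_edge[1]]["weight"] += CHECKPOINT_COST
    print("checkpoint placed on", best_edge, "-> traffickers' route cost is now", best_cost)
```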

http://econ-www.mit.edu/files/7484 (Nov 2011 Working Paper. I should note that this year is another example of strong female presence at the top of the economics job market. The lack of gender diversity in economics is problematic for a number of reasons, but it does appear things are getting better: Heidi Williams, Alessandra Voena, Melissa Dell, and Aislinn Bohren, among others, have done great work. The lack of socioeconomic diversity continues to be worrying, however; the field does much worse than fellow social sciences at developing researchers hailing from the developing world, or from blue-collar family backgrounds. Perhaps next year.)

“Buy Coal!: A Case for Supply Side Environmental Policy,” B. Harstad (2012)

The vast majority of world nations are not currently participating in agreements to limit global warming. Many countries cut down their rainforests in a way harmful to global social welfare. Even worse, attempts to improve things by the countries that do care can be self-defeating because of the problem of “leakage”, or what we economists just call supply and demand. Imagine Sweden cuts emissions by lowering domestic demand for oil. That lowers world demand for oil, lowering world price, hence increasing quantity demanded elsewhere. Boycotts may work in a similar way: when consumers in Canada stop buying some rare wood, the price of that wood falls, increasing the quantity of wood consumed in other countries.

What to do? Well, Coase tells us that externalities are in many cases not a worry when property rights are properly defined. Instead of trying to limit demand side consumption, why not limit supply? In particular, imagine that one set of countries (call them Scandinavia) imagines some harm from consumption of oil, and another set doesn’t care (let’s call them Tartary, after my favorite long-lost empire). Oil is costly to produce, and there is no entry, which isn’t a bad assumption for something like oil. Let there be a market for oil deposits – and you may have noticed from the number of Chinese currently laying pipe in Africa that such a market exists!

Let (q*,p*) be the quantity and price that clear the world market. Let h be the marginal harm to Scandinavia from global oil consumption. Let qopt be the socially optimal level of consumption from the perspective of Scandinavia, and popt the price that clears the market at that quantity. The Scandinavians just need to buy all the oil deposits whose cost of extraction is higher than popt minus h and lower than popt. Once they own the rights, they place an extraction tax on those deposits equal to the harm, h. With such a policy, no one exploits these marginal oil fields because of the tax, and no one exploits any more costly-to-extract fields because their cost of extraction is higher than the world oil price. There are many well-known mechanisms for buying the marginal oil fields at a cost lower than the harm inflicted on Scandinavia if the oil were exploited: the exact cost is particularly low if a few countries own all the world’s oil, since those countries will benefit from Scandinavia’s policy as the world oil price rises following Scandinavia’s purchase of the marginal fields. Note that this policy is also nice in that oil, after the policy, costs exactly the same in Tartary and Scandinavia, so there is no worry about firms moving to the country with lax environmental policies. Another benefit is that it avoids the time inconsistency of related dynamic problems, such as subsidizing green technology until it is invented and then getting rid of the subsidies.
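
A numerical sketch of the deposit-purchase policy with invented numbers (not Harstad’s): deposits with heterogeneous extraction costs meet a linear world demand curve, Scandinavia perceives harm h per unit consumed, and buying just the deposits with extraction costs between popt − h and popt moves world consumption to Scandinavia’s preferred level with no leakage.

```python
import numpy as np

costs = np.linspace(0, 10, 1001)                 # extraction cost of each one-unit deposit
def demand(p):
    return np.clip(800 - 60 * p, 0, None)        # world quantity demanded at price p

def equilibrium(active_costs):
    """Price where quantity demanded equals the number of active deposits with cost <= price."""
    prices = np.linspace(0, 20, 20001)
    supply = np.searchsorted(np.sort(active_costs), prices, side="right")
    i = np.argmin(np.abs(supply - demand(prices)))
    return prices[i], supply[i]

h = 2.0                                          # Scandinavia's marginal harm from consumption
p_star, q_star = equilibrium(costs)              # laissez-faire world market
p_opt, q_opt = equilibrium(costs + h)            # Scandinavia's target: consume only if value covers cost + h
bought = (costs > p_opt - h) & (costs <= p_opt)  # the marginal deposits Scandinavia buys and idles
p_new, q_new = equilibrium(costs[~bought])

print(f"laissez-faire:        p = {p_star:.2f}, q = {q_star}")
print(f"Scandinavia's target: p = {p_opt:.2f}, q = {q_opt}")
print(f"after buying {bought.sum()} marginal deposits: p = {p_new:.2f}, q = {q_new}")
```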

There are some policies like this currently in place: for example, Norway’s environmental agency buys the rights to forest tracts and keeps them unexploited. But note that you have to buy the right tract to avoid leakage: you want to buy the tract that is worth exploiting, but just barely. This is great for you as the environmentalist, though, since this will be the cheapest tract to buy given the limited profit to be made if it is cut down!

This paper should also suggest to you other ways to enact environmental policy when facing leakage: political recalcitrance doesn’t mean we are completely out of options. The problem is that you want to decrease quantity consumed in your country – whose policies you control – without causing quantity consumed to rise elsewhere as price falls. The Pigouvian solution is to make the marginal suppliers unprofitable, or make the marginal demanders lower their willingness to pay. One way to do this without tax policy is to introduce products that are substitutes for the polluting good: clean energy, for instance. Or introduce complements for products which are substitutes for the polluting product. There are many more!

http://www.kellogg.northwestern.edu/faculty/harstad/htm/deposits.pdf (January 2012 draft – forthcoming in the Journal of Political Economy)

“Passion Over Reason? Mixed Motives and the Optimal Size of Voting Bodies,” J. Morgan & F. Vardi (2010)

Why do people vote? Voting has, as Condorcet noted centuries ago, great information aggregation properties. The classic mystery, best described in Downs’ famous 1957 paper, is basically the following: as the size of the voting polity increases, the probability of being a pivotal voter decreases. Therefore, the balance tilts for all of us toward freeriding if voting is at all costly and your benefit from determining who wins the election is not enormous (though see fellow blogger Andrew Gelman for his rebuttal based on the social benefits of voting). One way around this is in Sandroni and Feddersen’s 2006 AER. There, voters derive utility from fulfilling a duty, and that utility is endogenous to the strategies other agents choose. That is, I get utility from voting if you vote. This is a form of Harsanyi’s rule utilitarianism. You may also know this in game theory as a form of “procedural concern” – your utility in the game depends not only on the ends, but also on the means. A number of papers have taken a similar tack to try to explain stylized facts about voting.


In a new paper, Morgan and Vardy propose using similar procedural utility to investigate whether Condorcet is still right when people get “expressive” utility. Their jumping off point is Madison’s argument in Federalist Paper #58 that “passion, not reason” in votes tends to prevail when legislatures get too large (we’ll ignore for now that Madison’s “passion” works very differently than the theory in the present paper – Madison was worried that in large legislatures you just get more dumb, easily-swayed people, not that individual worries about expressing themselves to their constituency would lead to suboptimal votes).


In the present paper, there are two states, each ex ante equally likely to be the true one. Each potential voter is given a signal positively correlated with the actual state. Voters all have an ex-ante expressive bias toward one of the states, and each voter is either “malleable” or “stubborn”: with some probability a voter is willing to ignore her bias when she receives the informative signal. When a voter is unwilling to ignore her bias, she receives a fixed (potentially arbitrarily small) utility payoff from voting her bias instead of using her signal. For example, the evidence may suggest voting for one economic policy, but the home constituency may not like that particular vote. “Stubborn” voters are those who get some small amount of extra utility from voting against the evidence in a way that makes constituents back home happy. “Malleable” voters are those who, when they see the evidence, completely ignore their original bias. Note that stubborn voters may still vote for the economic policy they’re biased against: they’re just weighing the epsilon amount of utility from expressing their bias against the (potentially large) utility they will get from seeing the “good” policy enacted, multiplied, of course, by the probability that their vote is the critical one.

Even a small amount of extra utility from expressiveness can lead to inefficient information aggregation, particularly when there are a large number of stubborn voters. The intuition here is fairly straightforward: the probability of being pivotal decreases in the number of voters, but the benefit of voting expressively if you are stubborn is fixed regardless of how many voters there are. When there is a relatively large number of stubborn voters, an additional voter in expectation (perhaps stubborn, perhaps not) is more likely to break a tie correctly when the true state is the state with bias than to break a tie incorrectly when the true state is the state without bias. Tiebreaking is the only situation that concerns voters who care only about enacting the right policy. Though information aggregation would still occur if ties were equally likely in either state, as the number of voters increases the ties that do occur become more and more concentrated in the state without bias. The marginal voter is stubborn with some probability, and since her probability of being pivotal is tiny when the number of voters is large, she will vote expressively with high probability. Voting expressively means, in expectation, voting for the state with bias, which breaks ties incorrectly much more often than it does so correctly. Information aggregation is destroyed. Indeed, as the number of voters goes to infinity, the vote is no more likely than a coin flip to choose the “correct” policy in these cases.
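
A crude simulation shows the large-electorate intuition (this is the limiting behavior described above, not the paper’s full equilibrium, in which stubborn voters only abandon their signal once pivot probabilities are negligible): stubborn voters vote the biased state, malleable voters vote their signal, and once the stubborn share is large enough the outcome tracks the bias rather than the truth no matter how many voters there are.

```python
import numpy as np

rng = np.random.default_rng(1)

def prob_correct(n_voters, share_stubborn, accuracy=0.7, trials=2000):
    correct = 0
    for _ in range(trials):
        state = rng.integers(2)                              # true state; the expressive bias is toward state 1
        stubborn = rng.random(n_voters) < share_stubborn
        signals = np.where(rng.random(n_voters) < accuracy, state, 1 - state)
        votes = np.where(stubborn, 1, signals)               # stubborn voters vote the bias, others their signal
        correct += (votes.mean() > 0.5) == (state == 1)
    return correct / trials

for share in (0.1, 0.3, 0.5):
    print(f"stubborn share {share}:",
          [round(prob_correct(n, share), 2) for n in (11, 101, 1001)])
```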

There are a couple of other minor results of note. First, even when the level of stubbornness is low, it is not always beneficial to increase the number of voters. That is, even though the probability of the vote aggregating information correctly goes to 1, it does not do so monotonically. In particular, when the number of voters is small, no matter what the other parameters are, everybody simply votes according to the signal they receive about the state. When the number of voters is large, even when voting is purely expressive for stubborn voters, their expressive desires are outweighed by the information-aggregating votes of the malleable voters, and the Condorcet result holds. However, there is a range of polity sizes where voters whose signal is contrary to their bias play a mixed strategy, and this particular voting behavior is bad for information aggregation; I wish I could tell you why, but unfortunately the authors give very little intuition for their proofs in the current version of the paper. In any case, the results are an interesting and novel argument for smaller voting polities when information is dispersed.

http://econ.la.psu.edu/papers/JMorgan110210.pdf (Nov 2010 Working Paper)

“Secrets,” D. Ellsberg (2002)

Generally, the public won’t know even the most famous economists – mention Paul Samuelson to your non-economist friends and watch the blank stares – but a select few manage to enter the zeitgeist through something other than their research. Friedman had a weekly column and a TV series, Krugman is regularly in the New York Times, and Greenspan, Summers and Romer, among many others, are famous for their governmental work. These folks at least have their fame attributable to their economics, if not their economic research. The real rare trick is being both a famous economist and famous in another way. I can think of two.

First is Paul Douglas, of the Cobb-Douglas production function. Douglas was a Chicago economist who went on to become a long-time U.S. Senator. MLK Jr. called Douglas “the greatest of all Senators” for his work on civil rights. In ’52, with Truman’s popularity at a nadir, Douglas was considered a prohibitive favorite for the Democratic nomination had he chosen to run. I think modern-day economists would very much like Douglas’ policies: he was a fiscally conservative, socially liberal reformist who supported Socialists, Democrats and Republicans at various times, generally preferring the least-corrupt technocrat.

The other famous-for-non-economics-economist, of course, is Daniel Ellsberg. Ellsberg is known to us for the Ellsberg Paradox, which in many ways is more important than the work of Tversky and Kahneman for encouraging non-expected utility derivations by decision theorists. Ellsberg would have been a massive star had he stayed in econ: he got his PhD in just a couple years, published his undergrad thesis (“the Theory of the Reluctant Duelist”) in the AER, his PhD thesis in the QJE, and was elected to the Harvard Society of Fellows, joining Samuelson and Tobin in that still-elite group.

As with many of the “whiz kids” of the Kennedy and Johnson era, he consulted for the US government, both at RAND and as an assistant to the Undersecretary of Defense. Government was filled with theorists at the time – Ellsberg recounts meetings with Schelling and various cabinet members where game theoretic analyses were discussed. None of this made Ellsberg famous, however: he entered popular culture when he leaked the “Pentagon Papers” early in the Nixon presidency. These documents were a top secret, internal government report on presidential decisionmaking in Vietnam going back to Eisenhower, and showed a continuous pattern of deceit and overconfidence by presidents and their advisors.

Ellsberg’s description of why he leaked the data, and the consequences thereof, are interesting in and of themselves. But what interests me in this book – from the perspective of economic theory – is what the Pentagon Papers tell us about secrecy within organizations. Governments and firms regularly make decisions, as an entity, where optimal decisionmaking depends on correctly aggregating information held by various employees and contractors. Standard mechanism design is actually very bad at dealing with desires for secrecy within this context. That is, imagine that I want to aggregate information but I don’t want to tell my contractors what I’m going to use it for. A paper I’m working on currently says this goal is basically hopeless. A more complicated structure is one where a firm has multiple levels (in a hierarchy, let’s say), and the bosses want some group of low-level employees to take an action, but don’t want anyone outside the branch of the organizational tree containing those employees to know that such an action was requested. How can the boss send the signal to the low-level employees without those employees thinking their immediate boss is undermining the CEO? Indeed, something like this problem is described in Ellsberg’s book: Nixon and Kissinger were having low-level soldiers fake flight reports so that it would appear that American planes were not bombing Laos. The Secretary of Defense, Laird, did not support this policy, so Nixon and Kissinger wanted to keep this secret from him. The jig was up when some soldier on the ground contacted the Pentagon because he thought that his immediate supervisors were bombing Laos against the wishes of Nixon!

In general, secrecy concerns make mechanism problems harder because they can undermine the use of the revelation principle – we want the information transmitted without revealing our type. More on this to come. Also, if you can think of any other economists who are most famous for their non-economic work, like Douglas and Ellsberg, please post in the comments.

(No link – Secrets is a book and I don’t see it online. Amazon has a copy for just over 6 bucks right now, though).

“Information and Voting: The Wisdom of the Experts versus the Wisdom of the Masses,” J. McMurray (2011)

Consider elections where there is no question of preferences: the sole point of the election is to aggregate information about which candidate is “better” objectively, or whether a criminal facing a jury is guilty, etc. Let nature choose which of two candidates is better objectively. Let each agent get a signal about which candidate is better, and a signal of the quality of that signal. For instance, a signal “A is better” and a quality .5 means that A is truly better fifty percent of the time, and B fifty percent of the time. The signal “A is better” and a quality 1 means that A is truly better one hundred percent of the time.

There are two main, contradictory intuitions. First, Condorcet in the 18th century showed in his famous “jury theorem” that, as a matter of statistics, aggregating partially informative signals always improves outcomes (the proof is simple and can be found on Wikipedia, among other places). On the other hand, Condorcet didn’t have any equilibrium concept, and it turns out that the aggregation in his jury theorem is not a Nash equilibrium. In particular, let there be two signals, one completely uninformative (quality .5) and one totally informative (quality 1). Feddersen and Pesendorfer famously proved that, in equilibrium, the low-quality voters do not vote. The reason is that, when you write out the relevant binomial formulae and condition on casting the pivotal vote that swings the election, the probability of swinging the election toward the wrong candidate by mistake is greater than the probability of swinging it toward the “better” candidate. In particular, the unanimity rule in jury voting is not optimal when we have this division of information: more voters are not necessarily better.
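
The statistical half of the story – Condorcet’s jury theorem with sincere voting and a common signal quality q, a number I picked for illustration – takes a few lines to verify; the game-theoretic point above is precisely that sincere voting need not survive as an equilibrium.

```python
from scipy.stats import binom

q = 0.6                       # each voter is independently correct with probability q
for n in (1, 11, 101, 1001):  # odd electorate sizes, so there are no ties
    p_majority_correct = 1 - binom.cdf(n // 2, n, q)
    print(n, round(p_majority_correct, 3))
```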

McMurray extends Feddersen and Pesendorfer to the case of continuous signal quality on [.5,1], with majority rule voting. This seems the obvious next step, but the problem had proved relatively intractable. McMurray is able to get around some technical difficulties by letting the number of voters be drawn from a Poisson distribution and using some results from Myerson’s Poisson population games papers. In particular, given a (symmetric) strategy profile s and a distribution of signal quality F, the Poisson number of voters means that the numbers of votes for candidates A and B, given that nature chooses candidate A as better, are independent random variables with means np(AA) and np(AB), where n is the Poisson parameter and p(Ax) is just the probability of voting for candidate x given that the true state is A and the strategy s is used, integrating over the distribution of signal quality types. That independence means that the probability of a voting outcome is the product of independent Poisson probabilities, and is therefore tractable.

With this trick in hand, McMurray shows that a symmetric cutoff rule is the only symmetric Bayesian equilibrium, meaning that if your signal quality is above a threshold, you vote for whichever candidate your signal says is better. More importantly, this cutoff is, for any distribution F, bounded below 1 even as the number of voters goes to infinity. The intuition is the following: as n goes to infinity, the expected margin of victory for the better candidate grows unboundedly. However, an individual voting decision only depends on what happens conditional on the victory margin being between -1 and 1 votes. As long as the variance of the margin of victory is also increasing relatively rapidly, an agent with arbitrarily high signal quality will continue to believe that, conditional on his being pivotal, a number of other agents must be making mistakes. In particular, the numbers of votes for and against a candidate, n+ and n-, are independent Poisson variables, so as the number of potential voters grows large, the margin of victory (n+)-(n-) is the difference of two independent Poissons and hence asymptotically normal. This normal variable, the limiting margin of victory, turns out to have a constant ratio of expected value to variance. Therefore, the ratio of the probability of a one-vote defeat to a one-vote victory for the better candidate converges to the corresponding ratio computed from the limiting normal distribution, which is strictly less than one but bounded away from zero. Further, the cutoff turns out to be exactly the cutoff a social planner would choose were she to try to maximize the chance of choosing the right candidate, so there are some nice welfare properties. Of course, if a social planner can design any mechanism, and not just majority vote with a cutoff decision rule on whether to vote or not, she would do better: everyone has identical preferences, so the optimal mechanism would just ask each agent for their signal quality and what signal was received, then compute the likelihood function.
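
The pivot-probability claim is easy to check with made-up vote shares: under the Poisson population model the margin of victory is Skellam-distributed, and the relative likelihood that the better candidate is one vote behind rather than one vote ahead is constant in the size of the electorate (it equals the ratio of the two Poisson means) and strictly between zero and one – so even a very well-informed voter, conditioning on being pivotal, never fully discounts the possibility that others have erred.

```python
from scipy.stats import skellam

p_better, p_worse = 0.52, 0.48     # probability a random voter votes for the better vs worse candidate
for n in (100, 1_000, 10_000):     # expected number of voters
    margin = skellam(n * p_better, n * p_worse)
    print(n, margin.pmf(-1) / margin.pmf(1))
```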

There are some numerical examples that attempt to convince the reader that this model has some nice empirical properties: for instance, if the distribution of quality types is uniform on [.5,1], then 59% of voters will vote, which sounds about right for most democracies. I don’t find these examples terribly convincing, though. The uniform type distribution gives an expected margin of victory of 70% for the winning candidate. You can manipulate the distribution of types to get around 50% participation and close elections, of course, but the necessary distributions are pretty ad hoc, and equally reasonable distributions can give you roughly any proportion of voters and margin of victory that you want. Certainly the more important part of the paper is showing that with continuous signals, equilibria can be computed and that they never imply that full, compulsory voting is optimal.

One last brief word: when talking about elections, and not jury trials or similar, identical preferences and asymmetric information may not be the most satisfying model. It’s tough to come up with a story where information aggregation is the most compelling reason for aggregating votes. Some friends here were discussing what happens in this model if you let there be two groups of agents, with identical preferences within each group. The hypothesis is that voting is still not 100%, but is higher than in the McMurray paper because you sometimes want to flip a result that is being driven by the aggregation of the opposing group’s preferences.

http://econ.byu.edu/Faculty/Joseph%20McMurray/Assets/Research/Turnout.pdf (Current working paper, March 2011)

“Advocates,” M. Dewatripont & J. Tirole (1999)

We often see institutional design that deliberately encourages partisan advocates when it comes to information aggregation: lawyers defend and prosecutors try to convict, defense ministries angle for defense spending and education ministries for school spending. This is the case even though ex-ante the information collectors are indifferent about the outcome – until he’s given the case, surely the lawyer doesn’t care what the verdict on some random criminal proceeding will be. Why not just have a judge collect information and render a verdict, or have all government ministers collect information on what’s best for government as a whole?

Dewatripont and Tirole offer a nice little model of where we might optimally see advocates. Consider a policy status quo for which there may be good arguments to expand or contract the policy, or perhaps neither or both. Any agent can look for one of those pieces of news at cost K, or both simultaneously at cost 2K; with probability x, when an argument is looked for, it will be found. From a decisionmaking standpoint, getting arguments for both expansion and contraction is equivalent to receiving no new arguments at all. Regardless of who collects the information, it is ex-post verifiable for free. For now, also assume that information cannot be concealed; if the agent’s research program finds both expansionary and contractionary arguments, she must present both. Assume that benefits from good information are such that the principal always wants the agent to look for both arguments. Also assume that payments must be conditional on the decision, and not on the precise information presented: that is, payments if two contradicting arguments are offered must be the same as if no arguments are offered. Tirole and Dewatripont give some arguments, not terribly convincing, for why this restriction is often seen in practice, but for our purposes, just note that it is empirically true that lawyers, for instance, are often paid based on what verdict is given, and not on the inherent quality of their rhetoric.

The problem with nonpartisan information gathering is the following: if an agent is incentivized to exert effort on two causes rather than one, he must be paid more than twice what he is paid to look for one piece of information. This is because if the first search finds an argument (say, to expand), then there is a possibility the second search will contradict that information. Since rewards are based on final decisions only, and the final decision with contradicting arguments is just to retain the status quo, the agent will be paid zero if he presents contradicting arguments. In particular, if the probability of discovering a given argument, x, is greater than .5, then there is no incentive compatible way to induce the agent to search for both pieces of information since the probability of contradicting oneself is too high; if x is less than .5, there are incentive compatible payments, but the agent is able to extract a rent.

What of partisan advocates? If I hire two agents, and pay each only if the final decision supports “their side”, then a payment of K/(x(1-x)) induces each agent to search and exhausts all their rents. In equilibrium the other agent is searching, so my probability of finding an argument and getting paid for it is x(1-x), and incentive compatibility just requires x(1-x)w - K >= 0.
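
The incentive arithmetic in the last two paragraphs is worth checking with a few lines (my notation, with K normalized to 1): the single nonpartisan agent is paid w only when the decision moves away from the status quo, which, if he searches both directions, happens only when exactly one argument is found.

```python
K = 1.0

def single_agent_slope(x):
    """'Search both' beats 'search one' only if (2x(1-x) - x) * w >= K for some wage w,
    i.e. only if the coefficient x(1 - 2x) is positive, which requires x < 1/2."""
    return 2 * x * (1 - x) - x

for x in (0.3, 0.5, 0.7):
    advocate_wage = K / (x * (1 - x))                 # the two-advocate payment from the text
    advocate_rent = x * (1 - x) * advocate_wage - K   # binds at zero: no rents left to the advocates
    print(f"x = {x}: nonpartisan agent incentivizable? {single_agent_slope(x) > 0}; "
          f"advocate wage = {advocate_wage:.2f}, advocate rent = {advocate_rent:.2f}")
```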

In standard Tirole fashion, a number of simple extensions are offered. If an agent can, for free, sometimes conceal “bad news”, then a single nonpartisan advocate can be optimal if there is a high cost of maintaining the status quo when evidence favors a change in policy. If the decision-maker – a judge, for instance – is potentially biased, advocacy can also be optimal. Consider a world where very costly appeals courts can be used if some party does not like a verdict. If appeals are very costly, then they should only be used as an off-the-equilibrium path threat. The IC payment scheme mentioned earlier pays the nonpartisan advocate only if a decision moves away from the status quo, and further pays the advocate identically no matter which way the decision goes. So if judges are biased one way or the other, the nonpartisan advocate will never appeal since she wants the decision to shift policy from the status quo. On the other hand, if there are two advocates, and the decisionmaker picks Expand when the evidence favors Contract, the advocate for Contract will appeal. Since appeals are costly, this bias in judgment will never appear in equilibrium.

(An aside: how does Tirole still not have his Nobel? Some old Fed buddies and I have a Econ Nobel Draft every year, and Tirole has gone in the top 3 each of the last four years. It seems to me the only no-question micro prizes yet to be given are Tirole and Holmstrom – perhaps adding Laffont – and a Milgrom/Roth market design prize. You may like something like Alesina-Tabellini for political theory as well. In any case, Tirole is right at the top of this list. Give him the prize!)

http://www.nyu.edu/econ/user/bisina/advocates.pdf (Final JPE version)
