Category Archives: Econometrics

“How Robust Standard Errors Expose Methodological Problems They Do Not Fix,” G. King and M. Roberts (2012)

Essentially every economist reports Huber-White robust standard errors, rather than traditional standard errors, in their work these days, and for good reason: heteroskedasticity, or heterogeneity in error variance across observations, can lead to incorrect standard error calculations. Generally, robust standard errors are used only to ensure that the parameter of interest is “real” and not an artifact of random statistical variation; the parameter estimate itself is unbiased under heteroskedasticity in many models, as long as the model is correctly specified. For example, if data are generated by the linear process y=Xb+e, then the OLS estimate of b is unbiased even if the assumption that e is homoskedastic is violated. Many researchers just tack a “,robust” onto their Stata code and hope this inoculates them from criticism about the validity of their statistical inference.
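
To see what the “,robust” flag is actually buying you, here is a minimal sketch using Python’s statsmodels instead of Stata; the data-generating process is simulated and every number is purely illustrative.

```python
# Minimal sketch: classical vs. Huber-White (HC1) standard errors on simulated
# heteroskedastic data. The slope estimate is unbiased either way; only the
# standard errors differ.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1_000
x = rng.uniform(0, 10, n)
e = rng.normal(0, 1 + 0.5 * x)       # error variance grows with x
y = 1.0 + 2.0 * x + e                # true slope is 2

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()
print("classical SE on slope:   ", ols.bse[1])
print("robust (HC1) SE on slope:", ols.HC1_se[1])
```

A large gap between those two numbers is exactly the red flag King and Roberts want researchers to investigate rather than paper over.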

King and Roberts point out, using three very convincing examples from published papers, that robust standard errors have another, much more important use. If robust and traditional errors are very different, then researchers ought to try to figure out what is causing the heteroskedasticity in their data, since, in general, tests like Breusch-Pagan or White’s test cannot distinguish between model misspecification and fundamental heteroskedasticity. Heteroskedasticity is common in improperly specified models, e.g., when OLS is estimated on data truncated at zero.

Nothing here should be surprising. If you are a structural economist, then surely you find the idea of estimating any function other than the precise form suggested by theory (which is rarely OLS) to be quite strange; why would anyone estimate anything other than the function directly suggested by the model, where indeed the model gives you the overall variance structure? But King and Roberts show that such advice is not often heeded.

They first look at a paper in a top International Relations journal, which suggested that small countries receive excess foreign aid (which seems believable at first glance; I spent some time in East Timor a few years ago, a tiny country which seemed to have five IGO workers for every resident). The robust and traditional standard errors diverged enormously. Foreign aid flow amounts are super skewed. Taking a Box-Cox transformation gets the data looking relatively normal again, and rerunning the estimation on the transformed data shows little difference between robust and traditional standard errors. In addition to fixing the heteroskedasticity, transforming the specified model flips the estimated parameter: small countries receive less foreign aid than other covariates might suggest.
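
Here is a sketch of that transformation step using scipy’s Box-Cox routine on made-up, heavily skewed “aid” data; the variable names and the log-population regressor are my own invention, not the original paper’s specification.

```python
# Sketch: Box-Cox transform a skewed outcome, then compare classical and robust
# standard errors on the transformed regression. All data are simulated.
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(1)
aid = rng.lognormal(mean=3.0, sigma=1.5, size=500)         # skewed, positive
population = rng.lognormal(mean=10.0, sigma=1.0, size=500)

aid_bc, lam = stats.boxcox(aid)                            # lambda chosen by MLE
print("estimated Box-Cox lambda:", lam)

X = sm.add_constant(np.log(population))
fit = sm.OLS(aid_bc, X).fit()
print("classical SE:", fit.bse[1], "  robust SE:", fit.HC1_se[1])
```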

King and Roberts then examine a top political science publication (on trade agreements and foreign aid), where again robust and traditional errors diverge. Some diagnostic work finds that a particular detrending technique assumed to be homogeneous across countries fits much better if done heterogeneously across countries; otherwise, spurious variation over time is introduced. Changing the detrending method causes robust and traditional errors to converge again, and as in the small-country aid paper above, the modified model specification completely flips the sign on the parameter of interest. A third example comes from a paper using a Poisson model to estimate overdispersed (variance exceeding the mean) count data; replacing Poisson with the more general truncated negative binomial model again causes robust and traditional errors to converge, and again completely reverses the sign on the parameter of interest. Interesting. If you insist on estimating models that are not fully specified theoretically, then at least use the information that divergent robust standard errors give you about whether your model is sensible.
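
A sketch of that count-data comparison on simulated overdispersed counts appears below; note I use a plain negative binomial rather than the truncated variant the paper calls for, and the data-generating process is invented.

```python
# Sketch: Poisson vs. negative binomial on overdispersed counts (simulated).
# Divergent classical vs. robust SEs under Poisson hint that the variance
# assumption, not just the SE formula, is wrong.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 2_000
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.3 * x)
counts = rng.negative_binomial(2, 2 / (2 + mu))   # variance exceeds the mean

X = sm.add_constant(x)
pois = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
pois_rob = sm.GLM(counts, X, family=sm.families.Poisson()).fit(cov_type="HC1")
negbin = sm.NegativeBinomial(counts, X).fit(disp=0)

print("Poisson classical SE:", pois.bse[1], " robust SE:", pois_rob.bse[1])
print("Negative binomial SE:", negbin.bse[1])
```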

September 2012 Working Paper (No IDEAS version)

“Recruiting for Ideas: How Firms Exploit the Prior Inventions of New Hires,” J. Singh & A. Agrawal (2011)

Firms poach engineers and researchers from each other all the time. One important reason to do so is to gain access to the individual’s knowledge. A strain of theory going back to Becker, however, suggests that if, after the poaching, the knowledge remains embodied solely in the new hire, it will be difficult for the firm to profit: surely the new employee will have an enormous amount of bargaining power over wages if she actually possesses unique and valuable information. (As part of my own current research project, I learned recently that Charles Martin Hall, co-inventor of the Hall-Héroult process for aluminum smelting, was able to amass a fortune of around $300 million after he brought his idea to the company that would become Alcoa.)

In a resource-based view of the firm, then, you may hope not only to access the new employee’s knowledge, but also to spread it to other employees at your firm. By doing this, you limit the wage bargaining power of the new hire, and hence can scrape off some rents. Singh and Agrawal break open the patent database to investigate this. First, they use name and industry data to match patentees who have a patent with one firm at time t and another patent at a separate firm some time later; such an employee has “moved”. We can’t simply check whether the receiving firm cites this new employee’s old patents more often, as there is an obvious endogeneity problem. First, firms may recruit good scientists more aggressively. Second, they may recruit more aggressively in technology fields where they are already planning to do work in the future. This suggests that matching plus diff-in-diff may work. Match every patent to another patent held by an inventor who never switches firms, attempting to find a second patent with very similar citation behavior, inventor age, inventor experience, technology class, etc. Using the matched sample, check how much the propensity to cite the mover’s patent changes compared to the propensity to cite the stayer’s patent. That is, let Joe move to General Electric. Joe had a patent while working at Intel. GE researchers were citing that Intel patent once per year before Joe moved. They were citing a “matched” patent once per year as well. After the move, they cite the Intel patent 2 times per year, and the “matched” patent 1.1 times per year. The diff-in-diff then suggests that moving increases the propensity to cite the Intel patent at GE by (2-1)-(1.1-1)=0.9 citations per year, where the first difference helps account for the first type of endogeneity we discussed above, and the second difference for the second type.
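
The arithmetic in that hypothetical is simple enough to write down directly; the numbers below are the made-up GE/Intel figures from the paragraph above, not anything from the paper.

```python
# The diff-in-diff from the hypothetical GE/Intel example above.
def diff_in_diff(mover_pre, mover_post, matched_pre, matched_post):
    """Change in citations to the mover's patent, net of the matched patent's change."""
    return (mover_post - mover_pre) - (matched_post - matched_pre)

# GE cited Joe's Intel patent once per year before the move and twice per year
# after; the matched stayer's patent went from 1 to 1.1 citations per year.
effect = diff_in_diff(mover_pre=1.0, mover_post=2.0, matched_pre=1.0, matched_post=1.1)
print(round(effect, 3))   # 0.9 extra citations per year attributed to the move
```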

What do we find? It is true that, after a move, the average patent held by a mover is cited more often at the receiving firm, especially in the first couple years after a move. Unfortunately, about half of new patents which cite the new employee’s old patent after she moves are made by the new employee herself, and another fifteen percent or so are made by previous patent collaborators of the poached employee. What’s worse, if you examine these citations by year, even five years after the move, citations to the pre-move patent are still highly likely to come from the poached employee. That is, to the extent that the poached employee had some special knowledge, the firm appears to have simply bought that knowledge embodied in the new employee, rather than gained access to useful techniques that quickly spread through the firm.

Three quick comments. First, applied econometrician friends: is there any reason these days to do diff-in-diff linearly rather than using the nonparametric “changes-in-changes” of Athey and Imbens 2006, which allows recovery of the entire distribution of effects of treatment on the treated? Second, we learn from this paper that the mean poached research employee doesn’t see her knowledge spread through the new firm, which immediately suggests the question of whether there are certain circumstances in which such knowledge spreads. Third, this same exercise could be done using all patents held by the moving employee’s old firm – I may be buying access to general techniques owned by the employee’s old firm rather than the specific knowledge represented in that employee’s own pre-move patents. I wonder if there’s any difference.

Final Management Science version (IDEAS version). Big thumbs up to Jasjit Singh for putting final published versions of his papers up on his site.

“How Does Family Income Affect Enrollment?,” N. Hilger (2012)

Nate Hilger is on the market from Harvard this year. His job market paper continues a long line of inference that is probably at odds with mainstream political intuition. Roughly, economists generally support cash rather than in-kind transfers because people tend to be the best judges of the optimal use of money they receive; food stamps are not so useful if you really need to pay the heat bill that week. That said, if the goal is to cause some behavior change among the recipient, in-kind transfers can be more beneficial, especially when the cash transfer would go to a family while the in-kind transfer would go to a child or a wife.

Hilger managed to get his hands on the full universe of IRS data. I’m told by my empirically-minded friends that this data is something of a holy grail, with the IRS really limiting who can use it after Saez proved its usefulness. IRS data is great because of the 1098-T: colleges are required to file information about their students’ attendance so that the government can appropriately dole out aid and tax credits. Even better, workers who are laid off and collect unemployment benefits show up on a 1099-G. Finally, claimed dependents on the individual tax form let us link parents and children. That’s quite a trove of data!

Here’s a question we can answer with it: does low household income lower college attendance, and would income transfers to poor families help reduce the college attendance gap? In a world with perfect credit markets, it shouldn’t matter, since any student could pledge the human capital she would gain as collateral for a college attendance loan. Of course, pledging one’s human capital turns out to be quite difficult. Even if the loans aren’t there, a well-functioning and comprehensive university aid program should insulate the poor from this type of liquidity problem. Now, we know from previous studies that increased financial aid has a pretty big effect on college attendance among the poor and lower middle class. Is this because the aid is helping loosen the family liquidity constraint?

Hilger uses the following trick. Consider a worker who is laid off. This is only a temporary shock, but this paper and others estimate that a layoff lowers discounted lifetime earnings by an average of nearly $100,000. So can we just propensity-match laid-off and employed workers when the child is college age, and see if the income shock lowers attendance? Not so fast. It turns out that, matching on whatever observables we have, children whose fathers are laid off when the child is 19 are also much less likely to attend college than children whose fathers are not laid off, even though age 19 is after the attendance decision is made. Roughly, a father who is ever laid off is correlated with some nonobservables that lower the college attendance of his children. So let’s compare children whose dads are laid off when the child is 17 to children whose dads are laid off from a similar firm when the child is 19, matching on all other observables. The IRS data has so many data points that this is actually possible.
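
A toy version of that comparison is sketched below with invented column names and effect sizes; the real exercise also matches on firm and family observables and uses far richer data.

```python
# Toy sketch of Hilger's design: compare college attendance when dad is laid off
# at child age 17 (before the enrollment decision) vs. age 19 (after). The
# shared "layoff-prone family" trait differences out; only the causal penalty
# from a pre-decision layoff remains. All numbers are invented.
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 50_000
df = pd.DataFrame({
    "child_age_at_layoff": rng.choice([17, 19], size=n),
    "family_trait": rng.normal(size=n),   # unobserved trait common to laid-off families
})
latent = 0.6 + 0.1 * df["family_trait"] - 0.005 * (df["child_age_at_layoff"] == 17)
df["attends_college"] = latent > rng.uniform(size=n)

by_group = df.groupby("child_age_at_layoff")["attends_college"].mean()
print(by_group)
print("implied effect (pp):", 100 * (by_group[17] - by_group[19]))  # about -0.5
```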

What do we learn? First, consumption spending (proxied here by housing) declines roughly in line with the permanent income hypothesis after the income shock. Second, there is hardly any effect on college attendance and quality: attendance for children whose dads suffer the large income shock falls by half a percentage point. Further, the decline is almost entirely borne by middle-class children, not the very poor or the rich: this makes sense, since poor students rely very little on parental funding to pay for college, and the rich have enough assets to overcome any liquidity shock. The quality of college chosen also declines after a layoff, but only by a very small amount. That is, the Engel curve for college spending is very flat: families with more income tend to spend roughly similar amounts on college.

Policy-wise, what does this mean? Other authors have estimated that a $1000 increase in annual financial aid increases college enrollment by approximately three percentage points (a particularly strong effect is found among students from impoverished families); the Kalamazoo experiment shows positive feedback loops that may make the efficacy of such aid even higher, since students will exert more effort in high school knowing that college is a realistic financial possibility. Hilger’s paper shows that a $1000 cash grant to poor families will likely improve college attendance by .007 to .04 percentage points, depending on whether the layoff is lowering college attendance through a transitory or a permanent income shock. That is, financial aid is orders of magnitude more useful in raising college attendance than cash transfers, especially among the poor.

November 2012 working paper (No IDEAS version). My old Federal Reserve coworker Christopher Herrington is also on the job market, and has a result suggesting the importance of Hilger’s finding. He computes a DSGE model of lifetime human capital formation, and considers the counterfactual where the US has more equal education funding (that is, schools that are centrally funded rather than well funded in rich areas and poorly funded in poor areas). Around 15% of eventual earnings inequality – again taking into account many general equilibrium effects – can be explained by the high variance of US education funding. As in Hilger, directly altering the requirement that parents pay for school (either through direct payments at the university level, or by purchasing housing in rich areas at the primary level) can cure a good portion of our growing inequality.

“The Determinants and Welfare Implications of US Workers’ Diverging Location Choices by Skill: 1980-2000,” R. Diamond (2012)

Rebecca Diamond, on the market from Harvard, presented this interesting paper on inequality here on Friday. As is well known, wage inequality increased enormously from the 1970s until today, with the divergence fairly evenly split between rising wages at the very top of the income distribution and a rising premium for more educated workers. There was simultaneously a great amount of locational sorting: the percentage of a city’s population which is college educated ranges from 15% in the Bakersfield MSA to around 45% in Boston, San Francisco and Washington, DC. Those cities that have attracted the highly educated have also seen huge increases in rent and housing prices. So perhaps the increase in wage inequality is overstated: these lawyers and high-flying tech employees are getting paid a ton, but also living in places where a 2,000 square foot house costs a million dollars.

Diamond notes that this logic is not complete. New York City has become much more expensive, yes, but its crime rate has gone way down, the streets are cleaner, the number of restaurants per capita has boomed, and the presence of highly educated neighbors and coworkers is good for your own productivity in the standard urban spillover models. It may be that wage inequality is underestimated using wages alone, if better amenities in cities with lots of educated workers more than compensate for the higher rents.

How to sort this out? If you read this blog, you know the answer: data alone cannot tell you. What we need is a theory of high and low education workers’ location choices and a theory of wage determination. One such theory lets you do the following. First, find a way to identify exogenous changes in labor demand for some industry in cities, which ceteris paribus will increase the wages of workers employed in that industry. Second, note that workers can choose where to work, and that in equilibrium they must receive the same utility from all cities where they could be employed. Every city has a housing supply whose elasticity differs; cities with less land available for development because of water or mountains, and cities with stricter building regulations, have less elastic housing supply. Third, the amenities of a city are endogenous to who lives there; cities with more high education workers tend to have less crime, better symphonies, more restaurants, etc., which may be valued differently by high and low education workers.

Estimating the equilibrium distribution of high and low skill workers takes care. Using an idea from a 1991 paper by Bartik, Diamond notes that some shocks hit industries nationally. For instance, a shock may hit oil production, or hit the semiconductor industry. The first shock would increase low skill labor demand in Houston or Tulsa, and the second would increase high skill labor demand in San Jose and Boston. This tells us what happens to the labor demand curve. As always, to identify the intersection of demand and supply, we also need to identify changes in labor supply. Here, different housing supply elasticity helps us. A labor demand shock in a city with elastic housing supply will cause a lot of workers to move there (since rents won’t skyrocket), with fewer workers moving if housing supply is inelastic.
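
The shift-share construction behind that identification idea is mechanical enough to sketch; the city names, industry shares and national growth rates below are all invented for illustration.

```python
# Sketch of a Bartik (shift-share) labor demand instrument: interact each city's
# base-period industry employment shares with national industry growth rates.
# In practice the national rates are usually computed leaving out the city itself.
import pandas as pd

shares = pd.DataFrame(
    {"oil": [0.30, 0.02, 0.05], "semiconductors": [0.03, 0.35, 0.10]},
    index=["Houston", "San Jose", "Columbus"],
)
national_growth = pd.Series({"oil": 0.08, "semiconductors": 0.20})

# Predicted local labor demand growth = sum over industries of share * national shift.
bartik = shares.mul(national_growth, axis=1).sum(axis=1)
print(bartik)
```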

Estimating the full BLP-style model shows that, in fact, we are underestimating the change in well-being inequality between high and low education workers. The model suggests, no surprise, that both types of workers prefer higher wages, lower rents, and better amenities. However, the elasticity of college workers’ labor supply with respect to amenities is much higher than that of less educated workers. This means that highly educated workers are more willing than less educated workers to accept lower after-rent wages in exchange for a city with better amenities. Also, the only way to rationalize the city choices of highly educated workers over the period examined is with endogenous amenities; if well-being depended only on wages and rents, then highly educated workers would only have moved where they ended up moving if they didn’t care at all about housing prices. Looking at smaller slices of the data, immigrant workers are much more sensitive to wages: they spend less of their income on housing, and hence care much more about wages when deciding where to live. In terms of spillovers, a 1% increase in the ratio of college educated workers to other workers increases college worker productivity by half a percentage point, and less educated worker productivity by about .2 percentage points.

Backing out the implied value of amenities in each MSA, the MSAs with the best amenities for both high and low education workers include places like Los Angeles and Boston; the least desirable for both types include high-crime Rust Belt cities. Inferred productivity by worker type is very different, however. While both types of workers appear to agree on which cities have the best and worst amenities, the productivity of high skill workers is highest in places like San Jose, San Francisco and New York, whereas productivity for low skill workers is particularly high in San Bernardino, Detroit and Las Vegas. The differential changes in productivity across cities led to re-sorting of different types of workers, which led to differential changes in amenities across cities. The observed pattern of location choices by different types of workers is consistent with a larger increase in the well-being gap between high and low education workers, even taking into account changes in housing costs, than that implied by wages alone!

The data requirements and econometric skill involved in this model are considerable, but it should allow a lot of other interesting questions in urban policy to be answered. I asked Rebecca whether she looked at the welfare impacts of housing supply restrictions. Many cities that have experienced shocks to high education labor demand are also cities with very restrictive housing policies: LA, San Francisco, Boston, DC. In the counterfactual world where DC allowed higher density building, with the same labor demand shocks we actually observed, what would have happened to wages? Or inequality? She told me she is working on a similar idea, but that the welfare impacts are actually nontrivial. More elastic housing supply will cause more workers to move to high productivity cities, which is good. On the other hand, there are spillovers: housing supply restrictions form a fence that makes a city undesirable to low education workers, and all types of workers appear to prefer both highly educated neighbors and the amenities they bring. Weighing the differential impact of these two effects is an interesting next step.

November 2012 working paper (No IDEAS version). Fittingly on the week James Buchanan died, Diamond also has an interesting paper on rent extraction by government workers on her website. Roughly, government workers like to pay themselves higher salaries. If they raise taxes, private sector workers move away. But when some workers move out, the remaining population gets higher wages and pays lower rents as long as labor demand slopes down and housing supply slopes up. If housing supply is very inelastic, then higher taxes lead to workers leaving, which leads to a large decrease in housing costs, which stops the outflow of migration. So if extractive governments are trading off higher taxes against a lower population after the increase, they will ceteris paribus set higher taxes when housing supply is less elastic. And indeed this is true in the data. Interesting!

“Trafficking Networks and the Mexican Drug War,” M. Dell (2011)

Job market talks for 2012 have concluded at many schools, and therefore this is my last post on a job candidate paper. This is also the only paper I didn’t have a chance to see presented live, and for good reason: Melissa Dell is clearly this year’s superstar, and I think it’s safe to assume she can have any job she wants, and at a salary she names. I have previously discussed another paper of hers – the Mining Mita paper – which would also have been a mindblowing job market paper; essentially, she gives a cleanly identified and historically important example of the long-run effects of institutions a la Acemoglu and Robinson, but the effect she finds is that “bad” institutions in the colonial era led to “good” outcomes today. The mechanism by which historical institutions persist is not obvious and must be examined on a case-by-case basis.

Today’s paper is about another critical issue: the Mexican drug war. Over 40,000 people have been killed in drug-related violence in Mexico in the past half-decade, and that murder rate has been increasing over time. Nearly all of Mexico’s domestic drug production, principally pot and heroin, is destined for the US. There have been suggestions, quite controversial, that the increase in violence is a result of Mexican government policies aimed at shutting down drug gangs. Roughly, some have claimed that when a city arrests leaders of a powerful gang, the power vacuum leads to a violent contest among new gangs attempting to move into that city; in terms of the most economics-laden gang drama, removing relatively non-violent Barksdale only makes it easier for violent Marlo.

But is this true? And if so, when is it true? How ought Mexico to deploy scarce drug-fighting resources? Dell answers all three questions. First, she notes that the Partido Acción Nacional is, for a number of reasons, associated with greater crackdowns on drug trafficking in local areas. She then runs a regression discontinuity on municipal elections – which vary nicely over time in Mexico – where PAN barely wins versus barely loses. These samples appear balanced according to a huge range of regressors, including the probability that PAN has won elections in the area previously, a control for potential corruption at the local level favoring PAN candidates. In a given municipality-month, the probability of a drug-related homicide rises from 6 percent to 15 percent following a PAN inauguration after such a close election. There does not appear to be any effect during the lame-duck period before PAN takes office, so the violence appears tied to anti-trafficking policies that occur after PAN takes control. There is also no such increase in cases where PAN barely loses. The effect is greatest in municipalities on the border of two large drug gang territories. The effect is also greatest in municipalities where detouring around that city on the Mexican road network heading toward the US is particularly arduous.
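
For readers who have not run one, a minimal local-linear regression discontinuity around a zero vote-margin cutoff looks roughly like the sketch below; the data, the size of the jump, and the bandwidth are simulated or ad hoc rather than taken from Dell’s estimation.

```python
# Minimal RD sketch: jump in drug-related homicide probability at a zero PAN
# vote margin, using a local linear fit on either side of the cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 5_000
margin = rng.uniform(-0.5, 0.5, n)             # PAN vote share minus 50%
pan_win = (margin > 0).astype(int)
p = 0.06 + 0.09 * pan_win + 0.05 * margin      # simulated homicide probability
homicide = (rng.uniform(size=n) < p).astype(int)

df = pd.DataFrame({"homicide": homicide, "margin": margin, "pan_win": pan_win})
bw = 0.10                                      # ad hoc bandwidth for the sketch
local = df[df["margin"].abs() < bw]
rd = smf.ols("homicide ~ pan_win + margin + pan_win:margin", data=local).fit()
print("estimated jump at the cutoff:", rd.params["pan_win"])
```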

These estimates are interesting, and do suggest that Mexican government policy is causally related to increasing drug violence, but the more intriguing question is what we should do about this. Here, the work is particularly fascinating. Dell constructs a graph where the Mexican road network forms the edges and municipalities form the vertices. She identifies regions which are historical sources of pot and poppy production, and identifies ports and border checkpoints. Two models on this graph are considered. In the first model, drug traffickers seek to reach a US port of entry along the shortest possible route. When PAN wins a close election, that municipality is assumed closed to drug traffic and gangs reoptimize routes. We can then identify which cities are likely to receive diverted drug traffic. Using data on drug possession arrests above $1000 – traffickers, basically – she finds that drug confiscations in the cities the model expects to get traffic post-election indeed rise 18 to 25 percent, depending on your measure. This is true even when the predicted new trafficking routes do not have a change in local government party: the change in drug confiscation is not simply PAN arresting more people, but actually does seem like more traffic along the route.
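
The first routing model is easy to caricature with a toy network; the graph below is entirely invented, and a “crackdown” is modeled crudely as deleting the municipality’s node.

```python
# Toy version of the shortest-route model: traffic flows from a production
# region to a US border crossing; closing a municipality after a PAN win
# diverts the route through other towns. The network is invented.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("producer", "A", 1), ("A", "B", 1), ("B", "border", 1),
    ("A", "C", 2), ("C", "border", 2), ("producer", "C", 3),
])

before = nx.shortest_path(G, "producer", "border", weight="weight")

H = G.copy()
H.remove_node("B")          # PAN victory shuts down trafficking through B
after = nx.shortest_path(H, "producer", "border", weight="weight")

print("route before crackdown:", before)   # runs through B
print("route after crackdown: ", after)    # diverted through C
```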

A second model is even nicer. She considers the equilibrium where traffickers try to avoid congestion. That is, if all gangs go to the same US port of entry, trafficking is very expensive. She estimates a cost function using pre-election trafficking data that is fairly robust to differing assumptions about the nature of the cost of congestion, and solves for the Wardrop equilibrium, a concept allowing for relatively straightforward computational solutions to congestion games on a network. The pre-election model, for which the cost parameters are estimated, very closely matches actual data on known drug trafficking at that time – congestion at US ports appears to be really important, whereas congestion on internal Mexican roads doesn’t matter too much. Now again, she considers the period after close PAN elections, assuming that these close PAN victories increase the cost of trafficking by some amount (results are robust to the exact amount), and re-solves the congestion game from the perspective of the gangs. As in the simpler model, drug trafficking rises by 20 percent or so in municipalities that gain a drug trafficking route after the elections. The probability of drug-related homicides similarly increases. A really nice sensitivity check is performed by checking cocaine interdictions in the same cities: they do not increase at all, as expected by the model, since the model maps trafficking routes from pot and poppy production sites to the US, and cocaine is only transshipped to Mexico via ports unknown to the researcher.

So we know now that, particularly when a territory is on a predicted trafficking route near the boundary of multiple gang territories, violence will likely increase after a crackdown. And we can use the network model to estimate what will happen to trafficking costs if we set checkpoints to make some roads harder to use. Now, given that the government has resources to set checkpoints on N roads, with the goal of increasing trafficking costs and decreasing violence, where ought checkpoints be set? Exact solutions turn out to be impossible – this “vital edges” problem is NP-hard and the number of edges is in the tens of thousands – but approximate algorithms can be used, and Dell shows which areas will benefit most from greater police presence. The same model, as long as data is good enough, can be applied to many other countries. Choosing trafficking routes is a problem played often enough by gangs that, if you buy the 1980s arguments about how learning converges to Nash play, then you may believe (I do!) that the problem of selecting where to spend government counter-drug money is amenable to game theory using the techniques Dell describes. Great stuff. Now, between the lines, and understand this is my reading and not Dell’s claim, I get the feeling that she also thinks that the violence spillovers of interdiction are so large that the Mexican government may want to consider giving up altogether on fighting drug gangs.
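
To convey the flavor of such an approximation, here is a greedy, entirely illustrative scoring of candidate checkpoints on the same invented network as above: each road is ranked by how much a checkpoint (modeled as a large cost increase) raises the traffickers’ cheapest route. The real problem chooses edges jointly and accounts for congestion, which is what makes it hard.

```python
# Greedy sketch of checkpoint placement on an invented network: score each edge
# by the increase in the cheapest producer-to-border cost when that edge gets a
# checkpoint, then take the top N. Only a caricature of the NP-hard problem.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    ("producer", "A", 1), ("A", "B", 1), ("B", "border", 1),
    ("A", "C", 2), ("C", "border", 2), ("producer", "C", 3),
])
base = nx.shortest_path_length(G, "producer", "border", weight="weight")

scores = {}
for u, v, w in G.edges(data="weight"):
    H = G.copy()
    H[u][v]["weight"] = w + 10        # a checkpoint makes this road expensive
    cost = nx.shortest_path_length(H, "producer", "border", weight="weight")
    scores[(u, v)] = cost - base

N = 2
checkpoints = sorted(scores, key=scores.get, reverse=True)[:N]
print("highest-value checkpoints:", checkpoints)
```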

http://econ-www.mit.edu/files/7484 (Nov 2011 Working Paper. I should note that this year is another example of strong female presence at the top of the economics job market. The lack of gender diversity in economics is problematic for a number of reasons, but it does appear things are getting better: Heidi Williams, Alessandra Voena, Melissa Dell, and Aislinn Bohren, among others, have done great work. The lack of socioeconomic diversity continues to be worrying, however; the field does much worse than fellow social sciences at developing researchers hailing from the developing world, or from blue-collar family backgrounds. Perhaps next year.)

“Trygve Haavelmo and the Emergence of Causal Calculus,” J. Pearl (2011)

Ignore the title of this article; it is simply a nice rhetorical trick to get economists to start using the modern tools for discussing causality that Judea Pearl has developed. Economists know Haavelmo (winner of the ’89 Nobel) for his “simultaneous equations” paper, in which he notes that regression cannot identify supply and demand simultaneously from a series of (price, quantity) bundles, for the simple reason that a regression on the observed intersections of supply and demand cannot tell you whether the supply curve or the demand curve has shifted. Theoretical assumptions about which changes to the economy affect demand and which affect supply – that is, economics, not statistics – solve the identification problem. (A side note: there is some interesting history on why econometrics comes about as late as it does. Economists until the 40s or so, including Keynes, essentially rejected statistical work in social science. They may have done so with good reason, though! Theories of stochastic processes that were needed to make sense of inference on non-IID variables like an economic time series weren’t yet developed, and economists rightly noted the non-IIDness of their data.)

Haavelmo’s other famous paper is his 1944 work on the probabilistic approach to economics. He notes that a system of theoretical equations is of interest not because of the regression estimates themselves, but because of the counterfactual where we vary one parameter while keeping the others the same. That is, if we have in our data a joint distribution of X and Y, we are interested in more than simply that joint distribution; rather, we are interested in the counterfactual world where we could control one of those two variables. This is explicitly outside the statistical relationship between X and Y.

With Haavelmo 1944 as a suitable primer, Pearl presents the basic idea of his Structural Causal Models (SCM). This consists of a model M (usually a set of structural equations), a set of assumptions A (omitted factors, exclusion restrictions, correlations, etc.), a set of queries for the model to answer, and some data which is perhaps generated in accordance with A. The outputs are the logical implications of A, a set of data-dependent claims concerning model-dependent magnitudes or likelihoods of each of the queries, and a set of testable implications of the model itself answering questions like “to what extent do the model assumptions match the data?” I’ll ignore in this post, and Pearl generally ignores in the present paper, the much broader question of when failures of that match matter, and further what the word “match” even means.

What’s cool about SCM and causal calculus more generally is that you can answer a bunch of questions without assuming anything about the functional form of relationships between variables – all you need are the causal arrows. Take a model of observed variables plus unobserved exogenous variables. Assume the latter to be independent. The model might be that X is a function of Y, W and an unobserved variable U1, Y is a function of V, W and U2, V is a function of U3 and W is a function of U4. You can draw a graph of causal arrows relating any of these concepts. With that graph in hand, you can answer a huge number of questions of interest to the econometrician. For instance: what are the testable implications of the model if only X and W are measured? Which variables can be used together to get an unbiased estimate of the effect of any one variable on another? Which variables must be measured if we wish to measure the direct effect of any variable on any other? There are many more, with answers found in Pearl’s 2009 textbook. Pearl also comes down pretty harshly on experimentalists of the Angrist type. He notes correctly that experimental potential-outcome studies also rely on a ton of underlying assumptions – concerning external validity, in particular – and at heart structural models just involve stating those assumptions clearly.
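
A quick simulation makes the backdoor logic concrete for the graph just described: since W causes both Y and X, regressing X on Y alone is biased, while conditioning on W recovers the true effect. The linear functional forms and coefficients below are my own assumptions; the graphical argument itself needs neither.

```python
# Simulate the graph above: X <- {Y, W, U1}, Y <- {V, W, U2}, V <- U3, W <- U4,
# with independent U's. True causal effect of Y on X is set to 1.0.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 100_000
U1, U2, U3, U4 = (rng.normal(size=n) for _ in range(4))
W = U4
V = U3
Y = 0.8 * V + 1.5 * W + U2
X = 1.0 * Y + 2.0 * W + U1

naive = sm.OLS(X, sm.add_constant(Y)).fit()
adjusted = sm.OLS(X, sm.add_constant(np.column_stack([Y, W]))).fit()
print("X ~ Y     :", naive.params[1])      # biased by the backdoor path Y <- W -> X
print("X ~ Y + W :", adjusted.params[1])   # close to the true 1.0
```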

Worth a look – and if you find the paper interesting, grab the 2009 book as well.

http://ftp.cs.ucla.edu/pub/stat_ser/r391.pdf (December 2011 working paper)

“But Economics is Not an Experimental Science,” C. Sims (2010)

Christopher Sims, a winner of yesterday’s Nobel, wrote this great little comment in the JEP last year that has been making the blog rounds recently (hat tip to Andrew Gelman and Dan Hirschman). It’s basically a broadside against the “Identification Mafia”/quasinatural experiment type of economics that is particularly prevalent these days in applied micro and development.

The article is short enough that you should read it yourself, but the basic point is that a well-identified causal effect is, in and of itself, insufficient to give policy advice. For instance, if smaller class sizes lead to better outcomes in a quasinatural experiment, we might reasonably wonder why this happens. That is, if I were a principal and I created some small classes and some very large classes – and this is what universities do with the lecture hall/seminar model – am I better off than if I used equal classes all around? A well-estimated structural model can tell you. A simple identified quasinatural experiment cannot. And this problem does not even rely on expectations feedback and other phenomena that make many “experiments” in macro less than plausible.

Two final notes. First, let’s not go overboard: well-identified models and RCTs are good things! But good internal validity is not an excuse to ignore external validity. Well-identified empirics that, through their structural setup, allow counterfactuals to be discussed and allow comparison with the rest of the literature are quite clearly the future of empirical economics. Second, as Sims notes, computers are very powerful now and growing more so. There is little excuse in the year 2011 for avoiding nonlinear/nonparametric structures if we believe them to be at all important.

http://sims.princeton.edu/yftp/AngristPischkeJEP/AngristPischkeComment.pdf (Final working paper – published in the Spring JEP as a comment on a review of quasinatural experiments by Angrist and Pischke of “Mostly Harmless Econometrics” fame)

“Homophily and Contagion are Generically Confounded in Observational Social Network Studies,” C. Shalizi & A. Thomas (2011)

Stories of “social contagion” are fairly common in a number of literatures now, most famously in the work of Christakis & Fowler. Those two authors have won media fame for claims that, for example, obesity is contagious the same way a cold is contagious. The basic point is uncontroversial – surely everyone believes in peer effects – but showing the size of the contagion in a rigorous statistical way is controversial indeed. In the present paper, Shalizi and Thomas point out, in what is really a one-line proof once everything is written down properly, that contagion cannot be distinguished from latent homophily in real world data.

Consider the authors’ bridge-jumping example. Joey jumps off a bridge, then Ian does. What might be going on? It may be peer pressure, in which case breaking the social tie between Golden Child Ian and Bad Apple Joey would break the contagion and keep Ian from jumping. It might be, though, that both are members of a thrill-seeking club, whose membership roll is public and can therefore be conditioned on; call this manifest homophily. But it may be that Joey and Ian met on a rollercoaster, and happen to have a shared taste for thrillseeking which is not observable by the analyst; call this latent homophily. More generally, social networks form endogenously based on shared interests: how do I know whether obesity is contagious or whether people who like going out for steak and potatoes are more likely to be friends?

A nice way to analyze this problem is in terms of graphical causal models. For some reason, economists (and as far as I know, other social scientists) generally are unaware of the causal model literature, but it is terribly useful whenever you want to reason about when and in what direction causality can flow, given some basic assumptions. Judea Pearl has a great book which will get you up to speed. The homophily/contagion problem is simple. If past outcomes are informative about current period outcomes, and unobserved traits are informative about each other when two people share a social tie, and those unobserved traits are informative about current period outcomes, then when two people share a social tie, Joey’s period t-1 outcome will be statistically linked to Ian’s period t outcome, even if there is no actual contagion. That is, no matter what observable data we condition on, we cannot separate a direct effect of the social tie on outcomes from the indirect path identified in the previous sentence.
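
A toy simulation of that argument, with my own invented functional forms: friends share a latent trait, each person’s outcome depends only on their own trait, yet the friend’s lagged outcome still “predicts” one’s current outcome.

```python
# Latent homophily masquerading as contagion. Friend pairs share an unobserved
# trait; outcomes depend only on one's own trait, never on the friend's
# behavior, yet a regression finds an apparent "contagion" effect.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n_pairs = 20_000
shared = rng.normal(size=n_pairs)                    # latent taste shared by friends
trait_i = shared + 0.3 * rng.normal(size=n_pairs)
trait_j = shared + 0.3 * rng.normal(size=n_pairs)

y_j_lag = trait_j + rng.normal(size=n_pairs)         # friend's outcome at t-1
y_i_now = trait_i + rng.normal(size=n_pairs)         # my outcome at t (no contagion)

fit = sm.OLS(y_i_now, sm.add_constant(y_j_lag)).fit()
print("apparent contagion coefficient:", fit.params[1])   # clearly nonzero
```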

Christakis and Fowler, in a 2007 paper, offered a way around this problem: take advantage of asymmetry. If Joey reports Ian as his best friend, but Ian does not report Joey as his best friend, then the influence from Ian to Joey should be stronger than the influence from Joey to Ian. Shalizi and Thomas show that the asymmetry trick requires fairly strict assumptions about how latent homophily affects outcomes.

So what can be done? If all variables of interest for predicting outcomes and social ties were known to the researcher, then certainly we can distinguish between contagion and latent homophily, since there would then be no latent homophily. Even if not all relevant latent variables are known to the analyst, it still may be possible to make some progress by constructing Manski-style bounds using well-known properties of, for example, linear causal models. If the social network did not possess homophily – for instance, if the relevant network was randomly assigned in an experiment – then we are also OK. One way to approximate this is to control for latent variables by using statistical techniques to identify clusters of friends who seem like they may share unobservable interests; work in this area is far from complete, but interesting.

http://arxiv.org/abs/1004.4704 (Working paper; final version in Sociological Methods and Research 2011. Shalizi has also discussed this paper in more depth on his website.)

“Valuation of New Goods Under Perfect and Imperfect Competition,” J. Hausman (1996)

If you know this paper, you know it as the “Apple-Cinnamon Cheerios” paper. The question is the following: how valuable are seemingly incremental introductions of new products, such as new cereal brands? And if they are valuable, is the CPI missing something significant by aggregating prices at too high a level and therefore missing welfare gains from new goods?

Consider Apple-Cinnamon Cheerios. To calculate the welfare effects of ACC, we can follow Hicks and find the “virtual price”, or the price at which demand would be zero. With that in hand, we can perform the usual price-index calculations as demand increases and price falls. That is, consider a product introduced in 1990. Give me the demand in 2000 at 2000 prices and incomes. Use Hausman (1981), or Hausman-Newey (1995) if you’re really good at differential equations, to find the expenditure function giving the minimum income necessary to reach 2000 utility at 2000 prices. With that expenditure function in hand, calculate the spending necessary to get 2000 utility at 1990 prices, where the price of the new good is its virtual price.
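
A back-of-the-envelope version of the virtual-price idea, under a linear demand curve that I am assuming purely for illustration (Hausman’s actual demand system is the AIDS setup discussed a couple of paragraphs below), with invented numbers:

```python
# Virtual price and consumer surplus under an assumed linear demand q = a - b*p.
observed_p, observed_q = 3.0, 100.0   # price and weekly quantity of the new cereal
slope_b = 40.0                        # assumed |dq/dp|

virtual_price = observed_p + observed_q / slope_b   # price at which demand hits zero
# With linear demand, consumer surplus is the triangle between the virtual
# price and the observed price.
consumer_surplus = 0.5 * (virtual_price - observed_p) * observed_q

print("virtual price:", virtual_price)                 # 5.5
print("consumer surplus per week:", consumer_surplus)  # 125.0
```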

That’s all well and good, but three big issues arise. First, calculating elasticities of substitution across hundreds of cereal brands, for instance, is asking a lot of the data, and some utility forms like Dixit-Stiglitz do not allow arbitrary patterns of substitutability like we might want. Second, we generally don’t see any price in the data near the virtual price, so we’re going to have to make some assumptions about the shape of the demand curve up there; Hausman uses the nice trick of also showing lower bounds for the virtual price assuming only that demand is convex. Third, pricing strategies under imperfect competition with multiproduct firms are very different from competitive pricing, since new goods can either cannibalize the demand for other products (say, regular Cheerios) or lead to increased prices of other products if they make those products’ demand curves steeper (say, Honey Nut Cheerios). Assuming competitive market pricing for products other than the new goods might overstate welfare gains for this reason. The section on imperfect competition is rather brief and not totally convincing, so I won’t discuss it further, but it’s definitely an important problem!

Hausman, unsurprisingly given his background, breaks out some nice econometric wizardry to try to get around these problems. He essentially assumes Gorman’s multi-stage budgeting on the part of the consumer, and therefore assumes substitutability occurs within known classes of products – in the case of Apple-Cinnamon Cheerios, substitution occurs directly only among other “family” cereals, though of course family cereals may substitute for children’s cereals, and cereals as a whole are still allowed to substitute for other products. He assumes lowest-level demand follows Deaton and Muellbauer’s Almost Ideal Demand System, which allows arbitrary cross-price elasticities.

Using 137 weeks of cash register data from 7 cities, he identifies the elasticities. The obvious problem here – indeed, the reason the term “econometrics” exists – is the simultaneity problem of supply and demand. That is, prices are endogenous. Hausman’s identification assumption is that though shocks to prices can occur over time within cities (say, ad campaigns) or across cities (say, differences in transport cost), there are no nationwide shocks to demand at a given time. The argument is something like “cereal ad campaigns are generally not national.” In the response by Tim Bresnahan that follows the pdf I link to below, Tim notes that if this assumption is wrong, the estimates in the paper are biased toward too-steep demand curves, and hence too big an effect. I heard through the grapevine that Bresnahan was not, and I’ll understate, terribly enthused with this paper when it was first presented.

In any case, if you buy the methodology, the introduction of Apple-Cinnamon Cheerios increased consumer welfare by 78 million dollars per year. The virtual price was approximately double the sales price (note that this is huge: this means for many consumers there was very little substitutability between Apple-Cinnamon Cheerios and pre-existing products). Since 25% of cereal demand when Hausman wrote was from new brands introduced in the prior ten years, if the virtual price was generally about twice the sales price, the price index for cereals should have gone down by roughly 25%. That is, for new cereal brands, the “price” fell by half, and for old brands there was no change.

Finally, a couple notes. First, Hausman has a great footnote on the common technique of estimating demands for individual attributes of a product independently and then summing them to find demand for a new product: “I realized the limitation of these models when I tried applying them to the choices among French champagnes. Somehow, the bubble content could never be made to come in significant.” Personally, I have no idea why people do probit-attribute estimation: what could possibly be the economic theory justification for doing so, particularly when linearity among attribute demands is assumed, as it usually is? Second, if you like this paper, you will definitely like Trajtenberg’s seminal 1990 book on CT scanners, as well as Aviv Nevo’s follow-up Econometrica on the cereal industry. Third, Greenwood and Kopecky have a nice new working paper on the value of the personal computer. They estimate parameters of the demand function through a calibration technique, which obviously is going to paper over the endogeneity of prices problem. That said, I’m working on a paper now where I’m calculating welfare effects from a product innovation, and the effects involve too many moving parts for economic theory to guide me when it comes to identification. What do you guys think of calibration exercises of the type done in Greenwood and Kopecky? I think my paper is a little easier because the supply side is more or less exogenous and because I’m doing a very-limited-assumption “increased/decreased consumer welfare” exercise in addition to a point estimate, so missing standard errors on the point estimate is probably not a big deal. That said, any general comments on calibration?

http://bpp.wharton.upenn.edu/ma….0Comment.pdf (Final NBER New Goods volume version, including Bresnahan’s comments. In dark parts of the internet, you can find some overly-personal back-and-forths by Hausman and Bresnahan that followed the publication of this volume.)

“The Persistent Effects of Peru’s Mining Mita,” M. Dell (2010)

We’ve had quite a few papers on this site recently by job market candidates, so let’s up the ante with a paper written by a PhD student mostly before she even started her doctoral degree, yet nonetheless published in Econometrica.

Institutions and their effect on long-run growth have been one of the most productive areas of economic research in the past decade or so. There are a number of results that discuss broad trends – colonies under the English legal system tend to have done better than those under the Spanish, for instance. The exact mechanism by which an institution from 200 years ago can still affect economic outcomes is less well understood. Dell discusses Engerman and Sokoloff’s contention that high inequality in Latin America in the colonial era led to bad economic outcomes today. Rather than compare across countries, she examines a particular colonial policy, Peru’s mita system of forced labor, shows large modern differences across the mita region boundary, and traces what historical processes may have led the mita to have effects persisting hundreds of years into the future.

The mita was a colonial system, begun in the 16th century, whereby villages in some areas were required to send a fraction of their working-age men to work in the state’s silver and mercury mines (how the colonial government avoided the agency problem here and wound up with anything but the most feeble workers, I don’t know…). Regions were sometimes included in the mita for geographical reasons, but often were included solely because of their proximity to a colonial-era path leading to the mines. There was (and is) no significant difference in language, percentage indigenous, etc. along the mita border. The mita boundary has had no official meaning in 200 years.

Running a regression discontinuity (in two dimensions, since the boundary is located in geographic space) shows that health and consumption outcomes are quite a bit worse for villages within the old mita boundary even today. For instance, people are nine percentage points more likely to have stunted growth, a sign of poverty. There are a number of potential explanations, but most come down to the fact that large haciendas did not develop in the colonial period within the mita region, since the state didn’t want competition for labor. Those haciendas later used their political power to ensure road networks and other inputs to production were built in their regions. Further, when the hacienda system was dismantled in the 1960s, the hacienda land was distributed to peasants, giving them properly titled land. That is, a case can be made that, at least in Peru, the particularly unequal regions, with large-scale landowners, were in some sense good for growth; this is the opposite conclusion of Engerman and Sokoloff, and a suggestion that idiosyncratic features can overwhelm more obvious theoretical insights when we talk about processes lasting hundreds of years.

I am confused a bit here, though this may be simply because I’m a terrible econometrician. Doesn’t the use of regression discontinuity require that the effect of the treatment is discontinuous at the boundary? Consider regional roads. A village on one side of the boundary is x kilometers from the nearest road. A village right on the other side is x+1 kilometers away. How is this a discontinuity? This has implications for interpreting the results as well. When the paper says the mita lowers household consumption by 25%, RD implies that household consumption falls by 25% at the boundary of the Mita region. If roads and network infrastructure are the reason, it’s tough to see first why you would have such a large effect at the boundary, and second, why I care particularly about the effect at the boundary vis-a-vis the average effect within the mita region. Perhaps someone can explain to me why RD is appropriate here.

http://econ-www.mit.edu/files/5645 (Final WP – published in Econometrica 2010)
