Stories of “social contagion” are now fairly common in a number of literatures, most famously in the work of Christakis & Fowler. Those two authors have won media fame for claims that, for example, obesity is contagious in the same way a cold is. The basic point is uncontroversial – surely everyone believes in peer effects – but measuring the size of the contagion in a statistically rigorous way is controversial indeed. In the present paper, Shalizi and Thomas point out, in what is really a one-line proof once everything is written down properly, that in observational social network data contagion generically cannot be distinguished from latent homophily, the tendency of people with similar unobserved traits to form ties with one another.
Consider the authors’ bridge-jumping example. Joey jumps off a bridge, then Ian does. What might be going on? It may be peer pressure, in which case breaking the social tie between Golden Child Ian and Bad Apple Joey would break the contagion and keep Ian from jumping. It might be, though, that both are members of a thrill-seeking club whose membership roll is public and can therefore be conditioned on; call this manifest homophily. Or it may be that Joey and Ian met on a rollercoaster and happen to share a taste for thrill-seeking that the analyst cannot observe; call this latent homophily. More generally, social networks form endogenously based on shared interests: how do I know whether obesity is contagious, or whether people who like going out for steak and potatoes are simply more likely to be friends?
A nice way to analyze this problem is in terms of graphical causal models. For some reason, economists (and, as far as I know, other social scientists) are generally unaware of the causal model literature, but it is terribly useful whenever you want to reason about when, and in what direction, causality can flow given some basic assumptions. Judea Pearl has a great book, Causality, that will get you up to speed. In these terms the homophily/contagion problem is simple. Suppose unobserved traits affect each person’s outcomes in every period, so that Joey’s period t-1 outcome carries information about his latent traits, and suppose those same traits affect who forms social ties, so that two people who share a tie tend to have similar latent traits. Then, among pairs who share a tie, Joey’s period t-1 outcome will be statistically linked to Ian’s period t outcome even if there is no actual contagion, via the path Joey’s outcome at t-1 ← Joey’s traits → tie ← Ian’s traits → Ian’s outcome at t. Studying only pairs of friends means conditioning on the tie, and conditioning on the tie opens this path. That is, no matter what observable data we condition on, including Ian’s own past outcomes, we cannot separate a direct effect of the social tie on outcomes from this indirect path through the latent traits.
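To make the confounding concrete, here is a toy simulation. It is an illustrative sketch, not Shalizi and Thomas’s own proof, and it rests on assumptions of my choosing: each person has a single Gaussian latent trait, a tie forms only when two traits lie within a hypothetical `tolerance` of each other, and each period’s outcome is the person’s latent trait plus fresh noise. There is no contagion anywhere in the model, and the function names are hypothetical.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate_tied_pairs(n_candidates=60_000, tolerance=0.5, seed=0):
    """Hypothetical model with latent homophily and NO contagion.

    Each candidate pair (Joey, Ian) draws independent latent traits.
    A tie forms only if the traits are within `tolerance` of each other.
    Outcomes are own latent trait plus fresh noise; nobody's outcome
    depends on a friend's outcome.
    """
    rng = random.Random(seed)
    joey_prev, ian_prev, ian_now = [], [], []
    for _ in range(n_candidates):
        x_joey, x_ian = rng.gauss(0, 1), rng.gauss(0, 1)
        if abs(x_joey - x_ian) >= tolerance:
            continue  # no tie, so the analyst never sees this pair as friends
        joey_prev.append(x_joey + rng.gauss(0, 1))  # Joey's outcome at t-1
        ian_prev.append(x_ian + rng.gauss(0, 1))    # Ian's outcome at t-1
        ian_now.append(x_ian + rng.gauss(0, 1))     # Ian's outcome at t
    return joey_prev, ian_prev, ian_now


joey_prev, ian_prev, ian_now = simulate_tied_pairs()
raw = corr(joey_prev, ian_now)
controlled = partial_corr(joey_prev, ian_now, ian_prev)
print(f"corr(Joey t-1, Ian t):              {raw:.3f}")
print(f"same, controlling for Ian's t-1:    {controlled:.3f}")
```

Both numbers come out clearly positive even though Joey never influences Ian. Controlling for Ian’s own t-1 outcome shrinks the association but does not remove it, because that outcome is only a noisy proxy for Ian’s latent trait.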
Christakis and Fowler, in a 2007 paper, offered a way around this problem: take advantage of asymmetry. If Joey reports Ian as his best friend, but Ian does not report Joey as his best friend, then the influence from Ian to Joey should be stronger than the influence from Joey to Ian. Homophily alone, the argument goes, should produce a symmetric association, so a stronger association in the direction of the friendship nomination points to contagion. Shalizi and Thomas show that the asymmetry trick is valid only under fairly strict assumptions about how latent homophily affects outcomes.
So what can be done? If all the variables relevant to predicting outcomes and social ties were known to the researcher, then we could certainly distinguish contagion from latent homophily, since there would then be no latent homophily. Even if not all relevant latent variables are known to the analyst, it may still be possible to make some progress by constructing Manski-style bounds using well-known properties of, for example, linear causal models. If the social network did not exhibit homophily – for instance, if the relevant network were randomly assigned in an experiment – then we would also be fine. Short of an experiment, one can try to control for the latent variables directly, using statistical techniques to identify clusters of friends who seem to share unobservable interests; work in this area is far from complete, but interesting.
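Two of the remedies above can be checked numerically in a toy model with no contagion. This is a hypothetical sketch, not the paper’s model: one Gaussian latent trait per person, ties formed only when traits lie within `tolerance` of each other, and outcomes equal to the latent trait plus noise. The names `simulate`, `tolerance`, and `random_ties` are illustrative. The sketch compares the naive lagged correlation with (1) the same correlation after conditioning on the latent trait, as if it were observed, and (2) the correlation when ties are randomly assigned.

```python
import math
import random


def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after linearly controlling for zs."""
    rxy, rxz, ryz = corr(xs, ys), corr(xs, zs), corr(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n_pairs=20_000, tolerance=0.5, random_ties=False, seed=1):
    """Hypothetical no-contagion model; returns data for tied (Joey, Ian) pairs.

    With random_ties=False, a tie forms only when latent traits are within
    `tolerance` (latent homophily). With random_ties=True, every drawn pair
    is tied regardless of traits, as if an experimenter assigned the network.
    """
    rng = random.Random(seed)
    x_ian, joey_prev, ian_now = [], [], []
    while len(ian_now) < n_pairs:
        xj, xi = rng.gauss(0, 1), rng.gauss(0, 1)
        if not random_ties and abs(xj - xi) >= tolerance:
            continue  # this pair never becomes friends
        x_ian.append(xi)
        joey_prev.append(xj + rng.gauss(0, 1))  # Joey's outcome at t-1
        ian_now.append(xi + rng.gauss(0, 1))    # Ian's outcome at t
    return x_ian, joey_prev, ian_now


x_ian, joey_prev, ian_now = simulate()
naive = corr(joey_prev, ian_now)  # spurious "contagion" from homophily
# Remedy 1: the latent trait is observed, so the analyst conditions on it.
with_latent = partial_corr(joey_prev, ian_now, x_ian)
# Remedy 2: ties are randomly assigned, so there is no homophily at all.
_, joey_prev_r, ian_now_r = simulate(random_ties=True)
randomized = corr(joey_prev_r, ian_now_r)

print(f"homophilous ties, naive:        {naive:.3f}")
print(f"homophilous ties, latent known: {with_latent:.3f}")
print(f"randomly assigned ties:         {randomized:.3f}")
```

The naive figure is clearly positive, while both remedies bring the association down to roughly zero, up to sampling noise. In this model conditioning on Ian’s trait alone suffices, because Ian’s period t outcome depends on nothing else.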