Category Archives: Information Econ

“Wall Street and Silicon Valley: A Delicate Interaction,” G.-M. Angeletos, G. Lorenzoni & A. Pavan (2012)

The Keynesian Beauty Contest – is there any better example of an “old” concept in economics that, when read in its original form, is just screaming out for a modern analysis? You’ve got coordination problems, higher-order beliefs, signal extraction about underlying fundamentals, optimal policy response by a planner herself informationally constrained: all of these, of course, are problems that have consumed micro theorists over the past few decades. The general problem with irrational exuberance, once we start to model things formally, is that it turns out to be very difficult to generate “irrational” actions by rational, forward-looking agents. Angeletos et al. have a very nice model that can generate irrational-looking asset price movements even when all agents are perfectly rational, based on the idea of information frictions between the real and financial sectors.

Here is the basic plot. Entrepreneurs get an individual signal and a correlated signal about the “real” state of the economy (the correlation in error about fundamentals may be a reduced-form measure of previous herding, for instance). The entrepreneurs then make a costly investment. In the next period, some percentage of the entrepreneurs have to sell their asset on a competitive market. This may represent, say, idiosyncratic liquidity shocks, but really it is just in the model to abstract away from the finance sector learning about entrepreneur signals based on the extensive margin choice of whether to sell or not. The price paid for the asset depends on the financial sector’s beliefs about the real state of the economy, which come from a public noisy signal and the traders’ observations of how much investment was made by entrepreneurs. Note that the price traders pay is partially a function of trader beliefs about the state of the economy derived from the total investment made by entrepreneurs, and the total investment made is partially a function of the price at which entrepreneurs expect to be able to sell capital should a liquidity crisis hit a given firm. That is, higher order beliefs of both the traders and entrepreneurs about what the other aggregate class will do determine equilibrium investment and prices.

What does this imply? Capital investment is higher in the first stage if either the state of the world is believed to be good by entrepreneurs, or if the price paid in the following period for assets is expected to be high. Traders will pay a high price for an asset if the state of the world is believed to be good. These traders look at capital investment and essentially see another noisy signal about the state of the world. When an entrepreneur sees a correlated signal that is higher than his private signal, he increases investment due to a rational belief that the state of the world is better, but then increases it even more because of an endogenous strategic complementarity among the entrepreneurs, all of whom prefer higher investment by the class as a whole since that leads to more positive beliefs by traders and hence higher asset prices tomorrow. Of course, traders understand this effect, but a fixed point argument shows that even accounting for the aggregate strategic increase in investment when the correlated signal is high, aggregate capital can be read by traders precisely as a noisy signal of the actual state of the world. This means that when entrepreneurs invest partially on the basis of a signal correlated among their class (i.e., there are information spillovers), investment is based too heavily on noise. An overweighting of public signals in a type of coordination game is right along the lines of the lesson in Morris and Shin (2002). Note that the individual signals for entrepreneurs are necessary to keep the traders from being able to completely invert the information contained in capital production.
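
The overweighting of the correlated signal can be illustrated with a minimal linear-normal sketch in the spirit of Morris and Shin (2002), not the actual Angeletos-Lorenzoni-Pavan model; the complementarity parameter r standing in for the effect of trader inference is my own reduced-form simplification:

```python
# Stylized beauty contest: entrepreneur i chooses
#   a_i = (1 - r) * E_i[theta] + r * E_i[abar],
# where abar is the average action, x_i = theta + eps_i is the private signal
# (precision tau_x) and y = theta + eta is the correlated signal (precision
# tau_y). We look for a linear equilibrium a_i = kx * x_i + ky * y.

def equilibrium_weights(tau_x, tau_y, r, iters=500):
    """Best-response iteration on the linear strategy coefficients."""
    kx, ky = 0.5, 0.5  # arbitrary initial guess
    for _ in range(iters):
        # If others play (kx, ky), then abar = kx * theta + ky * y, so
        # E_i[abar] = kx * E_i[theta] + ky * y, and the best response uses
        # E_i[theta] = (tau_x * x_i + tau_y * y) / (tau_x + tau_y).
        coef = (1 - r) + r * kx
        kx, ky = (coef * tau_x / (tau_x + tau_y),
                  coef * tau_y / (tau_x + tau_y) + r * ky)
    return kx, ky

tau_x, tau_y, r = 4.0, 1.0, 0.6
kx, ky = equilibrium_weights(tau_x, tau_y, r)
bayes_weight_y = tau_y / (tau_x + tau_y)  # purely statistical weight on y
print(ky > bayes_weight_y)  # equilibrium puts extra weight on the correlated signal
```

The fixed point is ky = τy/((1−r)τx+τy), strictly larger than the Bayesian weight τy/(τx+τy) whenever r > 0: actions load on correlated noise more than pure signal extraction would justify.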

What can a planner who doesn’t observe these signals do? Consider taxing investment as a function of asset prices, where high taxes appear when the market gets particularly frothy. This is good on the one hand: entrepreneurs build too much capital following a high correlated signal because other entrepreneurs will be doing the same and therefore traders will infer the state of the world is high and pay high prices for the asset. Taxing high asset prices lowers the incentive for entrepreneurs to shade capital production up when the correlated signal is good. But this tax will also lower the incentive to produce more capital when the actual state of the world, and not just the correlated signal, is good. The authors discuss how taxing capital and the financial sector separately can help alleviate that concern.

Proving all of this formally, it should be noted, is quite a challenge. And the formality is really a blessing, because we can see what is necessary and what is not if a beauty contest story is to explain excess aggregate volatility. First, we require some correlation in signals in the real sector to get the Morris-Shin effect operating. Second, we do not require the correlation to be on a signal about the real world; it could instead be correlation about a higher order belief held by the financial sector! The correlation merely allows entrepreneurs to figure something out about how much capital they as a class will produce, and hence about what traders in the next period will infer about the state of the world from that aggregate capital production. Instead of a signal that correlates entrepreneur beliefs about the state of the world, then, we could have a correlated signal about higher-order beliefs, say, how traders will interpret how entrepreneurs interpret how traders interpret capital production. The basic mechanism will remain: traders essentially read from aggregate actions of entrepreneurs a noisy signal about the true state of the world. And all this beauty contest logic holds in an otherwise perfectly standard New Keynesian rational expectations model!

2012 working paper (IDEAS version). This paper used to go by the title “Beauty Contests and Irrational Exuberance”; I prefer the old name!

“The Limits of Price Discrimination,” D. Bergemann, B. Brooks and S. Morris (2013)

Rakesh Vohra, who much to the regret of many of us at MEDS has recently moved on to a new and prestigious position, pointed out a clever paper today by Bergemann, Brooks and Morris (the first and third names you surely know, the second is a theorist on this year’s market). Beyond some clever uses of linear algebra in the proofs, the results of the paper are in and of themselves very interesting. The question is the following: if a regulator, or a third party, can segment consumers by willingness-to-pay and provide that information to a monopolist, what are the effects on welfare and profits?

In a limited sense, this is an old question. Monopolies generate deadweight loss as they sell at a price above marginal cost. Monopolies that can perfectly price discriminate remove that deadweight loss but also steal all of the consumer surplus. Depending on your social welfare function, this may be a good or bad thing. When markets can be segmented (i.e., third degree price discrimination) with no chance of arbitrage, we know that monopolist profits are weakly higher since the uniform monopoly price could be maintained in both markets, but the effect on consumer surplus is ambiguous.

Bergemann et al provide two really interesting results. First, if you can choose the segmentation, it is always possible to segment consumers such that monopoly profits are just the profits gained under the uniform price, but quantity sold is nonetheless efficient. Further, there exist segmentations such that producer surplus P is anything between the uniform price profit P* and the perfect price discrimination profit P**, and such that producer plus consumer surplus P+C is anything between P* and P**! This seems like magic, but the method is actually pretty intuitive.

Let’s generate the first case, where producer profit is the uniform price profit P* and consumer surplus is maximal, C=P**-P*. In any segmentation, the monopolist can always charge P* to every segment. So if we want consumers to capture all of the surplus, there can’t be “too many” high-value consumers in a segment, since otherwise the monopolist would raise their price above P*. Let there be 3 consumer types, with the total market uniformly distributed across the three, such that valuations are 1, 2 and 3. Let marginal cost be constant at zero. The profit-maximizing price is 2, earning the monopolist 2*(2/3)=4/3. But what if we tell the monopolist that consumers can either be Class A or Class B. Class A consists of all consumers with willingness-to-pay 1 and exactly enough consumers with WTP 2 and 3 that the monopolist is just indifferent between choosing price 1 or price 2 for Class A. Class B consists of the rest of the types 2 and 3 (and since the relative proportion of type 2 and 3 in this Class is the same as in the market as a whole, where we already know the profit maximizing price is 2 with only types 2 and 3 buying, the profit maximizing price remains 2 here). Some quick algebra shows that if Class A consists of all of the WTP 1 consumers and exactly 1/2 of the WTP 2 and 3 consumers, then the monopolist is indifferent between charging 1 and 2 to Class A, and charges 2 to Class B. Therefore, it is an equilibrium for all consumers to buy the good, the monopolist to earn uniform price profits P*, and consumer surplus to be maximized. The paper formally proves that this intuition holds for general assumptions about (possibly continuous) consumer valuations.
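
The “quick algebra” is easy to verify mechanically; here is a short check of the segmentation above, in exact rational arithmetic (the mass-by-valuation dictionary encoding is just my own bookkeeping):

```python
from fractions import Fraction as F

# Three-type example: valuations 1, 2, 3 with equal mass 1/3, marginal cost 0.
def best_price_revenue(segment):
    """Revenue at each candidate posted price; all consumers with v >= p buy."""
    return {p: p * sum(m for v, m in segment.items() if v >= p)
            for p in segment}

market = {1: F(1, 3), 2: F(1, 3), 3: F(1, 3)}
uniform = best_price_revenue(market)
assert max(uniform.values()) == uniform[2] == F(4, 3)  # P* = 4/3 at price 2

# Class A: all of type 1 plus half the types 2 and 3; Class B: the rest.
A = {1: F(1, 3), 2: F(1, 6), 3: F(1, 6)}
B = {2: F(1, 6), 3: F(1, 6)}
revA, revB = best_price_revenue(A), best_price_revenue(B)

assert revA[1] == revA[2] == F(2, 3)   # indifferent between prices 1 and 2 in A
assert revB[2] == max(revB.values())   # price 2 remains optimal in Class B

# Charging 1 in A and 2 in B: everyone buys, profit is still P*, and consumers
# capture the rest of the full surplus (1 + 2 + 3)/3 = 2.
profit = revA[1] + revB[2]
total_surplus = sum(v * m for v, m in market.items())
print(profit, total_surplus - profit)  # P* = 4/3, C = 2/3 = P** - P*
```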

The other two “corner cases” for bundles of consumer and producer surplus are also easy to construct. Maximal producer surplus P** with consumer surplus 0 is simply the case of perfect price discrimination: the producer knows every consumer’s exact willingness-to-pay. Uniform price producer surplus P* and consumer surplus 0 is constructed by mixing the very low WTP consumers with all of the very high types (along with some subset of consumers with less extreme valuations), such that the monopolist is indifferent between charging the monopolist price or just charging the high type price so that everyone below the high type does not buy. Then mix the next highest WTP types with low but not quite as low WTP types, and continue iteratively. A simple argument based on a property of convex sets allows mixtures of P and C outside the corner cases; Rakesh has provided an even more intuitive proof than that given in the paper.

Now how do we use this result in policy? At a first pass, since information is always good for the seller (weakly) and ambiguous for the consumer, a policymaker should be particularly worried about bundlers providing information about willingness-to-pay that is expected to drastically lower consumer surplus while only improving rent extraction by sellers a small bit. More work needs to be done in specific cases, but the mathematical setup in this paper provides a very straightforward path for such applied analysis. It seems intuitive that precise information about consumers with willingness-to-pay below the monopoly price is unambiguously good for welfare, whereas information bundles that contain a lot of high WTP consumers but also a relatively large number of lower WTP consumers will lower total quantity sold and hence social surplus.

I am also curious about the limits of price discrimination in the oligopoly case. In general, the ability to price discriminate (even perfectly!) can be very good for consumers under oligopoly. The intuition is that under uniform pricing, I trade-off stealing your buyers by lowering prices against earning less from my current buyers; the ability to price discriminate allows me to target your buyers without worrying about the effect on my own current buyers, hence the reaction curves are steeper, hence consumer surplus tends to increase (see Section 7 of Mark Armstrong’s review of the price discrimination literature). With arbitrary third degree price discrimination, however, I imagine mathematics similar to that in the present paper could prove similarly elucidating.

2013 Working Paper (IDEAS version).

“Long Cheap Talk,” R. Aumann & S. Hart (2003)

I wonder if Crawford and Sobel knew just what they were starting when they wrote their canonical cheap talk paper – it is incredible how much more we know about the value of cheap communication even when agents are biased. Most importantly, it is not true that bias or self-interest means we must always require people to “put skin in the game” or perform some costly action in order to prove the true state of their private information. A colleague passed along this paper by Aumann and Hart which addresses a question that has long bedeviled students of repeated games: why don’t they end right away? (And fair notice: we once had a full office shrine, complete with votive candles, to Aumann, he of the antediluvian beard and two volume tome, so you could say we’re fans!)

Take a really simple cheap talk game, where only one agent has any useful information. Row knows what game we are playing, and Column only knows the probability distribution of such games. In the absence of conflict (say, where there are two symmetric games, each of which has one Pareto optimal equilibrium), Row first tells Column which game is the true one, this is credible, and so Column plays the Pareto optimal action. In other cases, we know from Crawford-Sobel logic that partial revelation may be useful even when there are conflicts of interest: Row tells Column with some probability what the true game is. We can also create new equilibria by using talk to reach “compromise”. Take a Battle of the Sexes, with LL payoff (6,2), RR (2,6) and LR=RL=(0,0). The equilibria of the simultaneous game without cheap talk are LL, RR, or randomize 3/4 on your preferred location and 1/4 on the opponent’s preferred location. But a new equilibrium is possible if we can use talk to create a public randomization device. We both write down 1 or 2 on a piece of paper, then show the papers to each other. If the sum is even, we both go LL. If the sum is odd, we both go RR. This gives ex-ante payoff (4,4), which is not an equilibrium payoff without the cheap talk.
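
The parity device is easy to verify by enumeration; a quick sketch (the dictionary encoding of the payoff matrix is just for illustration):

```python
import itertools

# Battle of the Sexes: LL=(6,2), RR=(2,6), miscoordination pays (0,0).
# Jointly controlled lottery: each player writes 1 or 2; even sum -> LL,
# odd sum -> RR. All four note combinations are equally likely.
payoff = {('L', 'L'): (6, 2), ('R', 'R'): (2, 6),
          ('L', 'R'): (0, 0), ('R', 'L'): (0, 0)}

outcomes = []
for a, b in itertools.product([1, 2], repeat=2):
    play = ('L', 'L') if (a + b) % 2 == 0 else ('R', 'R')
    outcomes.append(payoff[play])

ex_ante = tuple(sum(p[i] for p in outcomes) / len(outcomes) for i in (0, 1))
print(ex_ante)  # (4.0, 4.0)

# Compare the mixed equilibrium, where each player puts 3/4 on his own
# favorite: P(LL) = P(RR) = 3/16, so each player expects 6*(3/16) + 2*(3/16).
mixed = 6 * (3 / 4) * (1 / 4) + 2 * (1 / 4) * (3 / 4)
print(mixed)  # 1.5, so the lottery payoff (4,4) is better for both
```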

So how do multiple rounds help us? They allow us to combine these motives for cheap talk. Take an extended Battle of the Sexes, with a third action A available to Column. LL still pays off (6,2), RR still (2,6) and LR=RL=(0,0). RA or LA pays off (3,0). Before we begin play, we may be playing extended Battle of the Sexes, or we may be playing a game Last Option that pays off 0 to both players unless Column plays A, in which case both players get 4; both games are equally probable ex-ante, and only Row learns which game we are actually in. Here, we can enforce a payoff of (4,4) if, when the game is actually extended Battle of the Sexes, we randomize between L and R as in the previous paragraph, but if the game is Last Option, Column always plays A. But the order in which we publicly randomize and reveal information matters! If we first randomize, then reveal which game we are playing, then whenever the public randomization causes us to play RR (giving row player a payoff of 2 in Battle of the Sexes), Row will afterwards have the incentive to claim we are actually playing Last Option. But if Row first reveals which game we are playing, and then we randomize if we are playing extended Battle of the Sexes, we indeed enforce ex-ante expected payoff (4,4).

Aumann and Hart show precisely what can be achieved with arbitrarily long strings of cheap talk, using a clever geometric proof which is far too complex to even summarize here. But a nice example of how really long cheap talk of this fashion can be used is in a paper by Krishna and Morgan called The Art of Conversation. Take a standard Crawford-Sobel model. The true state of the world is drawn uniformly from [0,1]. I know the true state, and get utility which is maximized when you take action on [0,1] as close as possible to the true state of the world plus .1. Your utility is maximized when you take action as close as possible to the true state of the world. With this “bias”, there is a partially informative one-shot cheap talk equilibrium: I tell you whether we are in [0,.3] or [.3,1] and you in turn take action either .15 or .65. How might we do better with a string of cheap talk? Try the following: first I tell you whether we are in [0,.2] or [.2,1]. If I say we are in the low interval, you take action .1. If I say we are in the high interval, we perform a public randomization which ends the game (with you taking action .6) with probability 4/9 and continues the game with probability 5/9; for example, to publicly randomize we might both shout out a number between 1 and 3, continuing if the sum is even (which happens with probability 5/9). If we continue, I tell you whether we are in [.2,.4] or [.4,1]. If I say [.2,.4], you take action .3, else you take action .7. It is easy to calculate that both players are better off ex-ante than in the one-shot cheap talk game. The probabilities 4/9 and 5/9 were chosen so as to make each player indifferent between following the proposed equilibrium after the randomization or not.
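
The “easy to calculate” claim checks out on the back of an envelope. Every prescribed action is the midpoint of its interval, so expected quadratic loss on an interval of length L is L²/12 for the receiver, and the sender’s loss is the receiver’s plus the squared bias b² = .01:

```python
# Ex-ante expected losses in the Krishna-Morgan style example: state uniform
# on [0,1], receiver loss (theta - a)^2, sender loss (theta + 0.1 - a)^2.
# Since each action is the midpoint of its interval, expected loss over an
# interval of length L is L^2/12 (plus b^2 for the sender).

def receiver_loss(cells):
    """cells: list of (probability, interval_length) pairs."""
    return sum(p * L ** 2 / 12 for p, L in cells)

b = 0.1
# One-shot equilibrium: split at .3, actions .15 and .65.
one_shot = receiver_loss([(0.3, 0.3), (0.7, 0.7)])

# Two-round scheme: [0,.2] -> action .1; on [.2,1], stop at action .6 with
# probability 4/9, else split again at .4 into actions .3 and .7.
p_high = 0.8
two_round = receiver_loss([
    (0.2, 0.2),
    (p_high * 4 / 9, 0.8),         # lottery ends: action .6 on [.2,1]
    (p_high * 5 / 9 * 0.25, 0.2),  # continue, state in [.2,.4]
    (p_high * 5 / 9 * 0.75, 0.6),  # continue, state in [.4,1]
])

print(one_shot, two_round)  # receiver strictly prefers the two-round scheme
print(one_shot + b ** 2, two_round + b ** 2)  # sender losses: same ranking
```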

The usefulness of the lotteries interspersed with the partial revelation is to let the sender credibly reveal more information. If there were no lottery, but instead we always continued with probability 1, look at what happens when the true state of nature is .19. The sender knows he can say in the first revelation that, actually, we are on [.2,1], then in the second revelation that, actually, we are on [.2,.4], in which case the receiver plays .3 (which is almost exactly sender’s ideal point .29). Hence without the lotteries, the sender has an incentive to lie at the first revelation stage. That is, cheap talk can serve to give us jointly controlled lotteries in between successive revelation of information, and in so doing, improve our payoffs.

Final published Econometrica 2003 copy (IDEAS). Sequential cheap talk has had many interesting uses. I particularly enjoyed this 2008 AER by Alonso, Dessein and Matouschek. The gist is the following: it is often thought that the tradeoff between decentralized firms and centralized firms is more local control in exchange for more difficult coordination. But think hard about what information will be transmitted by regional managers who only care about their own division’s profits. As coordination becomes more important, the optimal strategy in my division is more closely linked to the optimal decision in other divisions. Hence I, the regional manager, have a greater incentive to freely share information with other regional managers than in the situation where coordination is less important. You may prefer centralized decision-making when cooperation is least important because this is when individual managers are least likely to freely share useful information with each other.

“Welfare Gains from Optimal Pollution Regulation,” J. M. Abito (2012)

Mechanism design isn’t just some theoretical curiosity or a trick for examining auctions. It has, in the hands of skilled practitioners like David Baron, Jean-Jacques Laffont, Jean Tirole and David Besanko (an advisor of mine!), had a huge impact on economic regulation. Consider regulating a natural monopoly that has private information about its costs. In the standard sorting problem, I am going to have to pay information rents to firms that have low costs, since otherwise they will claim to have high costs and thus get to charge higher prices. If funds are costly – and the standard estimate in US public finance is that the marginal dollar of taxation imposes 30 cents of deadweight loss on society – then those information rents are a welfare loss and not just a transfer. Hence I may be willing to sacrifice some allocative efficiency by, for example, randomizing over all firms who claim to be at least somewhat efficient rather than paying a large information rent to learn exactly who the efficient firm is. Laffont’s 1994 Econometric Society address covers this basic point quite nicely.

Mike Abito, on the job market here at Northwestern, notes that few real-world policies actually take account of this tradeoff. Consider a regulator who wants polluting firms to abate their pollution when economically feasible. If the distribution of abatement costs is widely dispersed, then low cost firms have a large incentive to claim high costs and therefore avoid paying for abatement. Especially in this case, it may be worthwhile to sacrifice some allocative efficiency in an optimal pollution abatement scheme, having low cost firms not abate as much as they would if the regulator wanted all information about each firm’s costs. In order to design the optimal pollution regulation scheme, then, we need to know the distribution of marginal abatement costs, which is not something we know immediately from data. In particular, consider regulating SO2 among power plants. Hence, to the world of theory, my friends! (And, briefly, why not just sell pollution permits? If you give away the permits to each plant, then the same informational issue arises, and you do not earn any tax revenue that could offset distortionary taxes elsewhere in the economy.)

Let a power plant, at some cost and effort level, produce some bundle of electricity and SO2. Observed costs alone are not enough to tell whether firms have inherently high costs, since firms may appear to have high costs when in fact they are simply exerting low effort. Abito notices that power plants are both rate regulated – meaning that they are natural monopolies whose rates are set by a government agency that estimates their costs – and regulated for pollution reasons. By writing down an auditing game, you see that in the periods the firm is being watched for rate-setting purposes, they exert low effort. They do exert effort in future periods, since any cost reduction comes to them as profits. Indeed, if you look at, for instance, heat generation during years when the plants are being watched, the amount of heat generated declines by roughly the same amount as effort is estimated to decline in the model, so the hypothesized equilibrium of the auditing game is not totally out of line with the data.

What this wedge between cost efficiency in years when the plant is being watched and in other years gives us is an estimate of the cost function, including disutility of effort, which generates some bundle of SO2 and electricity. In fact, it gives us just enough of an exclusion restriction to estimate the distribution of marginal abatement costs of SO2 using techniques from dynamic structural IO. Once we have estimated that distribution, we can solve for numerical estimates of the welfare gain from various abatement policies. Laffont long ago showed that the optimal pollution regulation under this private information, assuming we know the distribution of marginal abatement costs, involves a bundle of type-dependent emission taxes and type-dependent transfers which give the least efficient firm zero profits, but which also lead to less effort and less pollution abatement for more efficient firms than you would get with full information; again, this is just the tradeoff between information rents and allocative efficiency. Such a heterogeneous policy might be tough to implement in practice, however. Welfare gains from the optimal policy instead of a uniform emissions standard, given the estimated distribution of marginal abatement costs, are equal to about 10% of the entire variable cost of the average plant. A uniform emissions tax (rather than a standard which imposes a maximum amount of emissions) captures something like 60-70% of this improvement, and is easier to implement.

More generally, the gain to society of using regulatory regimes that condition on the underlying properties of each firm really depends on properties like the distribution of marginal abatement costs which atheoretically can never be known, but which with the use of proper structure can actually be estimated. What is particularly cool here is that, unlike most earlier work, the underlying firm properties are estimated without assuming that the regulator is already optimizing, an assumption that is simply false in the case of pollution regulation. Good stuff.

November 2012 Working Paper (Not available on IDEAS). There are a number of interesting papers in environmental economics on the job market this year. Lint Barrage at Yale discusses how carbon taxes and other taxes should interact in optimal fiscal policy. In particular, since carbon in the atmosphere lowers the productive capacity of assets (like agricultural land) in the future, not taxing carbon is identical to taxing capital, producing the same distortion. When the economy already has distortionary taxation, the optimal rate of carbon taxation will need to be adjusted. Joseph Shapiro from MIT estimates the environmental damage from CO2 produced in international trade. It is two orders of magnitude smaller than the gains from that trade, and a small carbon tax on international shipping is optimal. In a separate paper, Shapiro and coauthors find that US mortality during heat waves declined massively over the twentieth century, that all of the decline appears to be linked to adoption of air conditioning, and hence that mitigation of some negative health impacts of climate change in poor countries will likely be handled by A/C. Since A/C uses electricity, non-carbon methods of generating that electricity are critical if we want to avoid making climate change worse while we mitigate these impacts.

“How Better Information Can Garble Experts’ Advice,” M. Elliott, B. Golub & A. Kirilenko (2012)

Ben Golub from Stanford GSB is on the market this year following a postdoc. This paper, which I hear is currently under submission, is a simple and straightforward theoretical point, but it does have some worrying implications for public policy. Consider a set of experts that society queries about the chance of some probabilistic event; the authors mention the severity of a flu, or the risk of a financial crisis, as examples. These experts all have different private information. Given their private information, and the (unmodeled) payoff they receive from a prediction, they weigh the risk of type I and type II errors.

Now imagine that information improves for each expert (restricting to two experts as in the paper). With the new information, any previously feasible pair of type I and type II errors remains feasible, and predictions with strictly fewer type I and type II errors become possible. This means that the “error frontier” expands outward for each expert. To be precise, suppose each expert i gets a signal in [0,1] whose cdf is G(i) if the event will actually occur; a new signal that generates a second cdf G2(i) which first-order stochastically dominates G(i) is an information improvement. Imagine both experts receive information improvements. Is this socially useful? It turns out that it is not necessarily a good thing.
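
A minimal sketch of why an FOSD improvement expands the error frontier, using a Gaussian mean-shift signal structure of my own choosing (the paper itself works with general cdfs):

```python
import math

# The expert sees s = theta + standard normal noise, where the event adds a
# mean shift mu when it occurs, and predicts "event" iff s > t. Then:
#   type I error (false alarm) = P(s > t | no event) = 1 - Phi(t)
#   type II error (miss)       = P(s <= t | event)   = Phi(t - mu)
# Raising mu makes the signal cdf under the event FOSD-dominate the old one,
# so at every threshold the miss rate falls while the false-alarm rate is
# unchanged: the frontier expands outward.

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def frontier(mu, thresholds):
    return [(1 - Phi(t), Phi(t - mu)) for t in thresholds]

ts = [i / 10 for i in range(-20, 21)]
old = frontier(mu=1.0, thresholds=ts)  # original diagnostic technology
new = frontier(mu=2.0, thresholds=ts)  # improved (FOSD-dominating) technology

# Same false-alarm rate at each threshold, strictly fewer misses.
assert all(abs(o[0] - n[0]) < 1e-12 for o, n in zip(old, new))
assert all(n[1] < o[1] for o, n in zip(old, new))
print("frontier expands outward at every threshold")
```

The perverse result in the paper is about where each expert re-optimizes on the expanded frontier, not about the expansion itself, which is always (weakly) good in isolation.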

How? Imagine that expert 1 is optimizing by making x1 type I errors and y1 type II errors given his signal, and expert 2 is optimizing by making x2 type I errors and y2 type II errors. Initially expert 1 is making very few type I errors, and expert 2 is making very few type II errors. Information improves for both, pushing out the “error frontier”. At the new optimum for expert 1, he makes more type I errors, but many fewer type II errors. Likewise, at the new optimum, expert 2 makes more type II errors and fewer type I errors. Indeed, it can be the case that expert 1, after the information improvement, is making more type I and type II errors than expert 2 did in her original prediction, and that expert 2 is now making more type I and type II errors than expert 1 did in his original prediction. That is, the new set of predictions is a Blackwell garbling of the original set of predictions, and hence less useful to society no matter what decision rule society uses when applying the information to some problem. Note that this result does not depend on experts trying to out-guess each other or anything similar.

Is such a perverse outcome unusual? Not necessarily. Let both experts be “envious” before new information arrives, meaning that each expert prefers the other’s bundle of type I and type II errors to any such bundle he can choose himself. Let the agents’ payoffs not depend on the prediction of the other agent. Finally, let the new information be a “technology transfer”, meaning a sharing of some knowledge already known to one or both agents. That is, after the new information arrives, the error frontier of both agents lies within the convex hull of their original combined error frontiers. With envious agents, there is always a technology transfer that makes society worse off. All of the above holds even when experts are not required to make discrete {0,1} predictions.

This is all to say that, as the authors note, “better diagnostic technology need not lead to better diagnoses”. But let’s not go too far: there is no principal-agent game here. You may wonder if society can design payment rules to experts to avoid such perversity. We have a literature, now large, on expert testing, where you want to avoid paying “fake experts” for information. Though you can’t generally tell experts and charlatans apart, Shmaya and Echenique have a paper showing that there do exist mechanisms to ensure that, at least, I am not harmed “too much” by the charlatans’ advice. It is not clear whether a mechanism exists for paying experts which ensures that information improvements are strictly better for society. By Blackwell’s theorem, more information is strictly better for the principal, so incentivizing the experts to express their entire type I-type II error frontier (which is equivalent to expressing their prior and their signal) would work. How to do that is a job for another paper.

July 2012 working paper (unavailable on Repec IDEAS).

“Paternalism, Libertarianism and the Nature of Disagreement,” U. Loginova & P. Persson (2012)

Petra Persson is on the job market this year from Columbia. Her CV is pretty incredible – there’s pure theory, cutting edge empirical techniques, policy work, networks, behavioral and more. Her job market paper is about the impact of social insurance policy on seemingly unrelated markets like marriage, and I’ll discuss it briefly at the end of the post, but I want to focus on another paper of hers which struck me as quite interesting.

Imagine a benevolent ruler who has private information about some policy, such as the relative safety of wearing seatbelts. This ruler can either tell citizens the information, lie about it, or coerce them into some action. Naive libertarianism suggests that an altruistic ruler should always be truthful: consumers can then weigh the information according to their preferences and choose the policy optimal for them.

But note something interesting. On some issues, one subset of politicians has libertarian leanings, while on others, a different subset has those leanings. For instance, a politician may favor legal assisted suicide but insist on mandatory seatbelt rules, while another politician may be against the mandatory belt and also against legal assisted suicide. Politicians can even vary in how libertarian they wish to be depending on who the policy affects. Witness that many politicians favor legalizing marijuana but very few favor legalizing it for 16 year olds. What explains this behavior?

Loginova and Persson examine this theoretically. Take a population of citizens. There are two possible states, 0 and 1. Citizens can either think each state equally likely yet have heterogeneous preferences differing from the politician’s (measured with a Crawford-Sobel style quadratic loss, though this modeling choice isn’t critical), or they can have identical preferences to the politician yet heterogeneous (prior) beliefs about the probability of each state. The politician can be altruistic to varying degrees – more altruism means he, according to his own prior, puts more and more weight on the utility of the agent. The politician gets a noisy signal about the true state. To limit the extent of opposing beliefs, the politician is restricted to having the same prior as the median citizen.

If the politician can only advise or stay silent, when does he make a truthful public announcement? If he disagrees with the citizens on preferences, then the more altruistic he is, the more likely he is to announce truthfully, for the standard libertarian reason: the citizens know their own preferences, and the better informed they are, the better they can maximize their own welfare. If, however, he disagrees with the citizens on priors, then the more altruistic he is, the less likely he is to announce truthfully. Altruism means the politician cares about the citizens’ welfare, but he evaluates that welfare under his own prior, which in his eyes is the correct one. Citizens know that even an altruistic politician therefore has an incentive to lie so that they take actions which are optimal according to his prior, and truthful communication cannot be sustained.
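The unraveling logic can be made concrete with a minimal numeric sketch (mine, not the paper's model): binary states, quadratic loss, and the standard result that the optimal action for someone with belief q equals q.

```python
# Sketch of why prior disagreement undermines truth-telling (an
# illustration, not the paper's exact model): states are 0 or 1, loss is
# quadratic, so the optimal action for belief q (prob. of state 1) is a = q.

def expected_loss(belief, action):
    """Expected quadratic loss E[(a - theta)^2] under P(theta = 1) = belief."""
    return belief * (action - 1) ** 2 + (1 - belief) * action ** 2

politician_belief = 0.7  # politician's posterior after his signal
citizen_belief = 0.3     # citizen's posterior under her own (different) prior

# Evaluated under the politician's belief, a citizen acting on her own
# posterior does worse than a citizen acting on the politician's belief...
loss_citizen_acts_alone = expected_loss(politician_belief, citizen_belief)
loss_citizen_obeys = expected_loss(politician_belief, politician_belief)
assert loss_citizen_obeys < loss_citizen_acts_alone

# ...so an altruistic politician, judging welfare by his own prior, wants
# to distort his message to pull the citizen's action toward his belief,
# which is exactly why truthful communication unravels.
print(round(loss_citizen_acts_alone, 3), round(loss_citizen_obeys, 3))
```

The key design choice is that altruism is evaluated under the politician's prior; under the citizen's own prior, her action is already optimal, so the conflict comes entirely from the belief disagreement.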

Now what if the politician could (at a cost to himself) force all individuals to take a given action? With preference disagreement, an altruistic politician would never do this, both because he can send all the information to citizens with a free message and because a mandate does not respect heterogeneity of preferences. Even if action 0 is better than action 1 for 90% of the population, an altruistic principal also cares about the other 10%. With disagreement about priors, however, the politician is more likely to impose a mandate the more altruistic he is. Even though citizens have heterogeneous priors, the principal thinks all of them are wrong, and hence is not worried about heterogeneity when imposing a mandate. Since we noted in the last paragraph that altruistic politicians who have different priors from citizens will not be able to credibly send their information, the mandate allows the politician’s private information to be used in the citizens’ actions.

Finally, what if the politician can send individual-level messages or enforce individual mandates? A politician with preference disagreement needs to be fairly altruistic before his public message is credible; in fact, he needs to be able to credibly persuade the individual with the average disagreement in order for his public signal to be credible. If he is not altruistic enough, he can still credibly persuade those agents who have only a limited amount of preference disagreement with him. If mandates are possible, the politician with limited altruism will force individuals whose preferences are very different from his own to take his desired action, but since the preferences of the politician and the agents are more aligned when altruism is higher, the share of citizens who face a mandate declines as the politician’s altruism increases. Likewise, a politician with disagreement about priors can only truthfully send information when his altruism is low. If the politician is very altruistic, even though the public signal will not be believed, he can still credibly send information to those whose priors are similar to his own. The politician with low levels of altruism will only mandate the action of agents with extreme beliefs, but as altruism increases, more and more citizens will face a mandate.

Very good – the use of paternalistic policies, and the extent to which they are targeted at individuals, depends qualitatively on whether the politician disagrees with the agents about their preferences or about their knowledge, and the extent to which mandates are applied on certain groups depends on how extreme their preferences or beliefs are. There is nothing inherently contradictory in an altruistic politician taking the libertarian side on one issue and the paternalistic side on another.

July 2012 working paper (No IDEAS version). Petra has many other interesting papers. In her job market paper, presented here last week, she shows that social insurance, in this case a widow’s benefit in Sweden, can have major effects in other markets. In particular, a really nice regression discontinuity shows that the benefit was leading to a huge number of extra marriages, that these were more likely to end in divorce, that intrahousehold bargaining was affected, and much more (Jeff at Cheap Talk has a longer description). Her paper Circles of Trust notes a reason for cliquish behavior in some labor markets. If I have information whose value declines with use (such as a stock tip) and I am altruistic, I may wish to tell my friends the info. But I worry that they will tell their friends, whom I don’t know and hence don’t really care about. If my friend could commit not to tell his friends, I would give him the info. How can we commit ex-ante? Make our social network a clique. I would bet that this phenomenon explains hiring in, say, small hedge funds to a great extent.

“Bayesian Persuasion,” E. Kamenica & M. Gentzkow (2011)

Kamenica and Gentzkow recently published this gloriously-titled extension of the cheap talk literature in AER. Recall that in Crawford-Sobel cheap talk, a sender and receiver differ in their preferred action, and the sender holds better information about the true state. The receiver cannot commit to an action conditional on the message, so this is not a principal-agent problem. Even though both people know the sender is biased, there are equilibria that are partially informative, where the sender credibly just tells you what set of states the true state is in, and the receiver updates based on that knowledge. These equilibria are not always good for the sender, nor for the receiver – both may be better off in the “babbling equilibrium” where the signal is totally ignored.

Kamenica and Gentzkow consider a problem where the sender picks the signal structure, and then sends a verifiable signal; this avoids worries about the babbling equilibrium. Their example is a prosecutor, who legally cannot lie, but can choose how to collect evidence. Both judge and prosecutor have a prior of .3 that the defendant is guilty. The prosecutor earns payoff 1 from a conviction, and 0 from an acquittal. The judge earns payoff 1 for a correct conviction or acquittal, and 0 for an incorrect one. The prosecutor’s bias is known. Nonetheless, the prosecutor can get the defendant convicted 60% of the time! How? Design the investigation so that the evidence generates either a posterior that the defendant is guilty with probability exactly .5 (in which case the expected-utility-maximizing judge convicts), or a posterior that the defendant is innocent with probability 1. That is, half of the time the evidence says “guilty”, the defendant is innocent, and whenever it says “innocent”, the defendant is innocent. This means the evidence says guilty 60% of the time and innocent 40% of the time, and the judge’s posteriors of .5 and 0 are Bayesian rational given the prior of .3.
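The arithmetic can be replicated directly; the only derived quantity is the rate at which innocent defendants receive a "guilty" report (3/7, calibrated so the posterior lands at exactly .5):

```python
# The prosecutor's optimal investigation, checked numerically: calibrate
# the evidence so a "guilty" report carries posterior exactly .5 and an
# "innocent" report carries posterior 0.

prior = 0.3                      # shared prior that the defendant is guilty
p_report_guilty_if_guilty = 1.0  # every guilty defendant gets a guilty report
p_report_guilty_if_innocent = 3 / 7  # calibrated so the posterior is exactly .5

# Total probability of a "guilty" report
p_guilty_report = (prior * p_report_guilty_if_guilty
                   + (1 - prior) * p_report_guilty_if_innocent)

# Bayesian posteriors after each report
posterior_after_guilty = prior * p_report_guilty_if_guilty / p_guilty_report
posterior_after_innocent = 0.0  # guilty defendants never get an "innocent" report

print(round(p_guilty_report, 6))         # conviction rate: double the prior
print(round(posterior_after_guilty, 6))  # just enough for the judge to convict
```

Note the calibration: 3/7 of the .7 mass of innocent defendants equals .3, exactly matching the mass of guilty ones, which is what pins the "guilty"-report posterior at one half and the conviction rate at 60%.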

Note what’s going on. From the prosecutor’s perspective, it doesn’t matter when the judge thinks the defendant is guilty with probability 1 or p=.8 or p=.55. As long as p(guilty)>=.5, the judge will convict, and the prosecutor will get payoff 1. So collecting evidence that is really strong is worthless. Better to collect evidence that is just strong enough to get a conviction; by doing so, the prosecutor can give the same evidence for lots of innocent people as well as the truly guilty! This principle applies broadly; the authors show that a car dealership will want a buyer to just barely believe the car is a good match for her if she should buy, and tell all others that the car is a terrible match.

The math here is interesting. Subgame perfection, plus a short proposition, means that the receiver’s action will be a deterministic function of his posterior. The sender’s payoff is a function of the receiver’s action, hence the sender’s payoff is a function of the induced posterior. Whenever there are convexities in the sender’s payoff as a function of the induced posterior, the sender should choose signals (or randomize) to “smooth out” that convexity. In the example, the prosecutor’s payoff is 0 if the judge’s posterior of guilt is below .5, and 1 if it is at or above .5. That function has a convexity between 0 and .5. We can choose signals in a verifiable way such that if the prior is some p in (0, .5), the overall probability of conviction rises to 2p: just choose evidence such that the mass p of guilty defendants, plus an equal mass p of innocent defendants, all have the same just-strong-enough evidence against them.
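The concavification logic can be sketched in a few lines (the step payoff and the 2p formula are from the discussion above; the Bayes-plausibility accounting in the comment is the only added step):

```python
# Concavification sketch: the prosecutor's value as a function of the
# judge's posterior is a step (0 below .5, 1 at or above .5). For a prior
# p < .5, splitting the prior into posteriors {0, .5} with the right
# weights yields conviction probability 2p -- the concave closure.

def conviction_probability(prior):
    """Sender's value under the optimal signal: min(2 * prior, 1)."""
    if prior >= 0.5:
        return 1.0  # the judge convicts with no extra information at all
    # Induce posterior .5 with probability w and posterior 0 otherwise.
    # Bayes plausibility requires w * 0.5 + (1 - w) * 0 = prior, so w = 2p.
    return 2 * prior

for p in (0.1, 0.3, 0.49, 0.6):
    print(p, conviction_probability(p))
```

The constraint doing all the work is Bayes plausibility: the induced posteriors must average back to the prior, which is exactly why the weight on the just-convict posterior cannot exceed 2p.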

Final AER version (IDEAS page)

“Dynamic Lemons Problem,” I-K Cho & A. Matsui (2012)

Here’s a result so fresh that the paper doesn’t even exist yet; Cho (yes, the Cho from Cho-Kreps) presented a version of the following at a seminar here recently and I wanted to jot some notes down while it’s fresh in my mind.

Take the standard Akerlof problem. There is a unit mass of buyers, a unit mass of high quality sellers, and a unit mass of low quality sellers. Quality is known only to the individual seller. All buyers value high quality goods at H and low quality goods at L. Low quality sellers have a reservation value of 0 and high quality sellers a reservation value of C, where H-C>L. Thus, efficiency is maximized by each high quality seller selling at some price to a buyer. Car quality is unobservable, so assume there is a pooling equilibrium where high and low quality cars are sold at the same price. Then consumers value these cars at (H+L)/2. So if (H+L)/2 is less than C, high quality sellers will be unwilling to sell, only low quality sellers will remain in the market, and we say the market has collapsed. It is a pretty robust result in these types of asymmetric information models that the lowest quality types always remain in the market, and the high quality types are often pushed completely out.
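The static collapse condition is easy to check with concrete numbers; H = 10, L = 2, C = 7 below are my own illustrative choices, not values from the paper:

```python
# Numeric check of the static Akerlof collapse, with hypothetical numbers
# (H, L, C are assumptions for illustration, not from the paper).

H, L, C = 10.0, 2.0, 7.0
assert H - C > L  # gains from trade: high-quality sales are efficient

pooling_value = (H + L) / 2  # buyer's valuation if both types sell equally
market_collapses = pooling_value < C

print(pooling_value)     # buyers' willingness to pay in a pooling candidate
print(market_collapses)  # True: high-quality sellers exit, only lemons trade
```

With these numbers the pooling valuation is 6, below the high type's reservation value of 7, so the market unravels even though trading high-quality cars would create surplus of 3 per car versus 2 for lemons.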

What if this market, however, were dynamic? What we mean here is that, in period one, buyers and sellers match randomly with each other; there are twice as many sellers as buyers, so only half the sellers match. In any given match, a price is offered for the transaction. If either party rejects the price, no match is made, payoffs are zero this period, and both buyer and seller rejoin the unmatched pool in the next period. If it is accepted, the good is sold, its value is realized, and the seller receives payoff of the price minus his cost, while the buyer receives either H or L minus the price, depending on what quality the good ended up being. This relationship is maintained, with precisely the same payoffs, every period into the future, except that with probability (1-delta) the relationship ends and both buyer and seller rejoin the unmatched pool. We then move to the next period and everyone who is currently unmatched is randomly matched again. We are interested in the existence of price(s) that form an undominated stationary equilibrium as the period length (and hence discounting between periods) goes to zero.

Will the static logic – all buyers match with low quality types at a price somewhere between 0 and L – hold up? It will not. If that were an equilibrium, then every unmatched seller would be a high quality seller. So an individual buyer who rejoins the unmatched pool will match almost certainly with a high quality seller, and if the price offered is C+epsilon, the seller will accept and the buyer will improve his payoff. In any equilibrium, then, there must be at least two prices being offered, and at least some high quality sellers must match.
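The profitable deviation is just arithmetic; continuing with my hypothetical numbers from above (H = 10, L = 2, C = 7, not the paper's), per-period payoffs compare as follows:

```python
# Why "all buyers trade with low types at a price in [0, L]" cannot be an
# equilibrium: a buyer who waits and offers an (all-high-quality) unmatched
# seller C + epsilon does strictly better. Numbers are illustrative.

H, L, C, eps = 10.0, 2.0, 7.0, 0.1

# Best case in the candidate equilibrium: buy a lemon at price 0,
# per-period payoff L - 0 = L.
payoff_candidate_best = L - 0.0

# Deviation: the unmatched pool is all high types, so offer C + eps.
payoff_deviation = H - (C + eps)

# H - C > L guarantees this deviation profits for small enough eps.
assert payoff_deviation > payoff_candidate_best
print(payoff_deviation)
```

This is the sense in which H - C > L drives the dynamic result: it makes waiting for a high type dominate even the most favorable lemon trade, so the static pooling-on-lemons outcome cannot survive.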

Note, now, the incentive for low quality sellers. By pretending to be a high quality seller (accepting only the price C+epsilon, and not the price at which only low types would accept), a low quality seller improves his payoff. So, again, in equilibrium, low quality sellers will mix between accepting any offer and accepting only the high offer at which they pool with high quality types. Cho shows that this deception leads to a couple of worrying results: first, in the limit, buyers’ expected payoffs go to zero in any equilibrium, and second, some buyers do not match for arbitrarily long stretches of periods. The second result obtains because buyers get arbitrarily close to zero payoff, so the harm from waiting is very low, while accepting a low quality seller masquerading as a high quality one yields a negative payoff.

Note that all of the intuition above appears (at least to me) robust to allowing a continuum of seller types or to allowing sellers and buyers to break existing relationships the period after they begin. What remains to be seen, though, is what economic problem looks like Mortensen-Pissarides plus Akerlof lemons. I’d be surprised, given the importance of relational contracts in the world of finance, if we couldn’t find something suitable in that venue. (Extended abstract – I’m fairly certain no draft of this paper is out yet)

“Agreeing to Disagree: The Non-Probabilistic Case,” D. Samet (2010)

Aumann famously showed that two Bayesian agents with common priors cannot “agree to disagree” about a posterior that is common knowledge. One might wonder: does this generalize to decision functions other than Bayesian updating? In the early 80s, Cave (1983) and Bacharach (1985) did precisely that, showing that like-mindedness (we take the same decision if we have the same information) plus a sure thing principle that only implicitly uses the knowledge operator suffice for an agreement theorem. This recent paper in GEB by Dov Samet shows that the sure thing principle they use is problematic, and rederives conditions under which agreement holds.

The problem essentially is this. Give me two agents with two information partitions. I want to say that A is more knowledgeable than B if B’s knowledge is given by a set E and A’s knowledge by E+F, where + represents the union operator – A knows everything B knows, plus possibly more. The problem is that this is impossible with the standard partitional formulation of knowledge that philosophers and economists use. If two agents do not have exactly the same information, then each one knows something the other does not. This is true even if one agent’s information partition is a strict refinement of the other’s. Why? Let A know event G at state w, and let B not know G. The knowledge operator itself defines events, and by a property of knowledge, A does not know the event “A does not know G” at w, while B does know the event “B does not know G”.
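A toy partitional model makes the knowledge operator and the introspection property concrete (this is a generic textbook construction, not Samet's own example; the four-state space and partitions are mine):

```python
# A minimal partitional knowledge operator: K(E) is the set of states
# where the agent's partition cell lies inside event E. Toy example only.

from itertools import chain

STATES = frozenset({1, 2, 3, 4})

def know(partition, event):
    """K(E): states at which the agent knows event E."""
    return frozenset(chain.from_iterable(
        cell for cell in partition if cell <= event))

# A's partition strictly refines B's, so A is the better-informed agent.
A = [frozenset({1}), frozenset({2}), frozenset({3, 4})]
B = [frozenset({1, 2}), frozenset({3, 4})]

G = frozenset({1})  # some event, e.g. "the suspect is guilty"
assert know(A, G) == {1}          # A knows G exactly at state 1
assert know(B, G) == frozenset()  # B never knows G

# Negative introspection: wherever B does not know G, B knows the event
# "B does not know G" -- knowledge operators themselves define events.
not_know_B_G = STATES - know(B, G)
assert know(B, not_know_B_G) == not_know_B_G
```

The last assertion is the introspection property the argument leans on: "I do not know G" is itself an event, and in the partitional model an agent always knows it when it is true.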

The intuition from Cave and Bacharach can still work, though. Let [j>=i] be the set of states where, no matter what event E occurs, j knows E whenever i knows it. Assume that if i knows j is at least as knowledgeable as he is in state w, then i takes the same decision as j. Finally, assume that if we add a third agent who knows less than i or j at w, then all three agents make the same decision. When these assumptions hold, agents cannot agree to disagree.

Samet quotes a story from Aumann that sums up how the theory works. Alice and Bob are detectives. Bob collects data until 5 with his partner Alice. Alice then stays at work until late at night collecting more information. Both were trained in the same academy, and therefore make the same decisions if they have the same information. Intuitively, if Bob knows that Alice has every bit of information he has plus more, then he should just make the same decision in the end as Alice about the guilt of the suspect. The conditions in the prior paragraph capture this intuition. (Final GEB version)

“Organizations as Information Processing Systems,” R. Daft & R. Lengel (1983)

I don’t believe this paper is well-known by economists, but it has been hugely influential for management and media studies. The theory in this paper is qualitative in the same way economic theory is, but is not mathematical. In this post, I’ll try to reinterpret the main ideas mathematically.

Firms face two primary types of uncertainty. First, the outside environment is uncertain. Second, the internal environment is uncertain. When speech is vague, a manager may misinterpret what the true state of the world is, or subordinates may misinterpret the goals of the organization. When speech is precise, it can be very costly to interpret. Indeed, precise speech about unclear goals is basically worthless: two subordinates may precisely state the answer to two different problems, both of which are different from what the manager wanted to know.

Choice of media, then, can vary. Sometimes speech within an organization is very formal: quantitative models, memos, etc. Sometimes it is informal: face-to-face meetings, informal legends, company lore. The informal speech is able to discuss a broader set of ideas, but with greater ambiguity. The formal speech can present specific ideas exactly, but nothing more. This tradeoff roughly implies the following: when the purpose of a discussion is equivocal or unclear, informal speech should be used to “get us on the same page”. When a discussion involves something routine, precise speech can be used. This has a number of implications: for example, informal communication will be most common at the goal setting stage, or when two different departments are beginning to work together on a task, but formal communication will be most common within a division or after goals have been agreed upon by all parties or when the external environment has less uncertainty.

Clearly, the intersection of language and economics is far more general. For example, equivocality is often introduced on purpose: people speak vaguely in order that common knowledge does not develop. An example, after a first date: “Would you like to come up to my apartment for some coffee?” Further, vague and precise speech are more than simply vague or precise; rather, they are vague and precise in particular ways. Poetry is quoted rather than a meaningless stream of words, for example. Neither the authors nor I have much to say on these extensions, but the field is definitely open right now for an interested researcher.

How might you model the ideas of the present paper mathematically? (Of course, you might ask why these ideas should be modeled mathematically at all, but I have discussed many times here why social science theory ought to be formal, and to the extent that it’s formal, the tools of mathematical logic allow the cleanest possible transmission of ideas and derivation of unexpected consequences, so I won’t rehash those arguments here. Indeed, the whole “should we be formal” discussion seems a bit too meta in the context of this post…) Let the relevant true state be a number in [0,1]^n. Let the cost of transmitting the exact state be increasing in its dimension, perhaps linearly. Let the cost of transmitting imprecise information increase less than linearly, perhaps logarithmically, with imprecise states interpreted by the receiver with error (something like a truncated normal distribution to ensure we stay in [0,1]^n). Loss functions of the final decision made by the receiver depend on distance from the true state. What should a manager do? On simple decisions, where the relevant state is only a point on the line segment [0,1], getting the exact state is cheap, so subordinates should send the manager fairly precise information like a statistical estimate in a memo. On complex decisions, where the relevant state is a point in the 100-dimension [0,1] hypercube, learning the true state will be very expensive (it may require the manager to read a 1000 page quantitative report, for instance), but learning an approximate state will be relatively cheap (it may involve some face-to-face conversations). Once the model is formalized like this, we can answer questions like “Should management communicate via a hierarchy or not?” I have some plans for work along these lines, using some ideas about transmitting counterfactuals given a set of information partitions, and would definitely appreciate comments concerning how to model this type of media richness.
(Working paper)
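The proposed formalization can be simulated in a few lines; every number below (cost scale, noise standard deviation, trial count) is an illustrative assumption of mine, chosen only to show the crossover between formal and informal media:

```python
# Rough simulation of the sketch above: precise transmission costs grow
# linearly in the state's dimension n; informal transmission costs grow
# logarithmically but each coordinate arrives garbled. All parameters
# are illustrative assumptions, not calibrated values.

import math
import random

random.seed(0)

def total_cost(n, mode, cost_scale=0.05, noise_sd=0.2, trials=2000):
    """Transmission cost plus simulated expected quadratic decision loss."""
    if mode == "precise":
        return cost_scale * n  # exact state transmitted, zero decision loss
    # informal: cheap to send, but each coordinate is received with error
    comm_cost = cost_scale * math.log(1 + n)
    loss = 0.0
    for _ in range(trials):
        for _ in range(n):
            theta = random.random()  # true coordinate, uniform on [0, 1]
            received = min(1.0, max(0.0, random.gauss(theta, noise_sd)))
            loss += (received - theta) ** 2
    return comm_cost + loss / trials

for n in (1, 100):
    precise, informal = total_cost(n, "precise"), total_cost(n, "informal")
    print(n, "memo" if precise < informal else "face-to-face")
```

With these parameters the simulation reproduces the paper's qualitative prediction: the one-dimensional problem favors the precise memo, while the 100-dimensional one favors cheap-but-noisy face-to-face communication.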
