“Competition in Persuasion,” M. Gentzkow & E. Kamenica (2012)

How’s this for fortuitous timing: I’d literally just gone through this paper by Gentzkow and Kamenica yesterday, and this morning it was announced that Gentzkow is the winner of the 2014 Clark Medal! More on the Clark in a bit, but first, let’s do some theory.

This paper is essentially the multiple-sender version of the great Bayesian Persuasion paper by the same authors (discussed on this site a couple years ago). There is a group of experts who, under commitment to a signaling policy (so they cannot misreport what their signal says), send costless signals about the realization of the state. Given the information received, the agent makes a decision, and each expert gets some utility depending on that decision. For example, the senders might be a prosecutor and a defense attorney who know the guilt of a suspect, and the agent a judge. The judge convicts if p(guilty)>=.5, the prosecutor wants to maximize convictions regardless of underlying guilt, and vice versa for the defense attorney. Here’s the question: if we have more experts, or less collusive experts, or experts with less aligned interests, is more information revealed?

A lot of our political philosophy is predicated on more competition in information revelation leading to more information actually being revealed, but this is actually a fairly subtle theoretical question! For one, John Stuart Mill and others of his persuasion would need some way of describing how people competing to reveal information strategically interact, and, to the extent that the equilibria of that strategic interaction are not unique, they would need a way of “ordering” sets of potentially revealed information. We are lucky in 2014, thanks to our friends Nash and Topkis, to be able to deal nicely with each of those concerns.

The trick to solving this model (basically every proof in the paper comes down to algebra and some simple results from set theory; they are clever but not technically challenging) is the main result from the Bayesian Persuasion paper. Draw a graph with the agent’s posterior belief on the x-axis, and the utility (call this u) the sender gets from the action taken at each posterior on the y-axis. Now draw the smallest concave function (call it V) that is everywhere weakly greater than u. If V is strictly greater than u at the prior p, then a sender can improve her payoff by revealing information. Take the case of the judge and the prosecutor. If the judge has the prior that everyone brought before them is guilty with probability .6, then the prosecutor never reveals information about any suspect, and the judge always convicts (giving the prosecutor utility 1 rather than the 0 she gets from an acquittal). If, however, the judge’s prior is that everyone is guilty with probability .4, then the prosecutor can judiciously reveal information such that 80 percent of suspects are convicted. How? Pool all of the guilty people with two-thirds of the innocent people and send a signal that each person in this pool is guilty with probability .5, while sending a signal that each of the remaining one-third of innocent people is innocent with probability 1. This signal structure is Bayes-plausible. The judge convicts everyone for whom p(guilty)=.5, meaning 80 percent of all suspects are convicted. If you draw the graph described above with u=1 when the judge convicts and u=0 otherwise, it is clear that V>u exactly when the prior lies strictly between 0 and .5, hence information is only revealed in that case.
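
For concreteness, here is a minimal computational sketch of that binary-state example (my own illustration, not code from the paper): the judge convicts iff the posterior on guilt is at least .5, and the prosecutor’s value is the concavification of u evaluated at the prior.

```python
# A minimal sketch of the prosecutor's problem (binary state, binary action);
# my own illustration, not code from the paper. The judge convicts iff the
# posterior probability of guilt is at least `threshold`; the prosecutor gets
# u = 1 on conviction and 0 otherwise.
def optimal_persuasion(prior, threshold=0.5):
    """Return (prosecutor's value, optimal two-signal policy) at this prior."""
    if prior >= threshold:
        # The judge already convicts at the prior: reveal nothing (V = u = 1).
        return 1.0, "reveal nothing"
    # Concavified value V(prior): the chord from (0, 0) to (threshold, 1).
    value = prior / threshold
    # Optimal policy: one signal induces posterior exactly `threshold` (judge
    # convicts), the other induces posterior 0; Bayes-plausibility pins down
    # the signal probabilities.
    return value, ("send the 'looks guilty' signal with probability "
                   f"{value:.2f}, pooling all guilty types with just enough "
                   "innocents to leave the posterior at the threshold")

print(optimal_persuasion(0.6))  # value 1.0: always convicted, nothing revealed
print(optimal_persuasion(0.4))  # value 0.8: an 80 percent conviction rate
```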

What about when there are multiple senders with different utilities u? It is somewhat intuitive: more information is always, almost by definition, weakly valuable for the agent (remember Blackwell!). If any sender can improve her payoff by revealing more information given what has already been revealed, then we are not in equilibrium, since that sender has an incentive to deviate. Therefore, adding more senders increases the amount of information revealed and “shrinks” the set of beliefs that the agent might wind up holding (and, further, the authors show that any Bayes-plausible distribution of beliefs at which no sender can profitably reveal further information is an equilibrium outcome). We still have a number of technical details concerning multiplicity of equilibria to deal with, but the authors show that these results hold in a set-order sense as well. This theorem is actually great: to check equilibrium information revelation, I only need to check where V and u diverge sender by sender, without worrying about complex strategic interactions. Because of that simplicity, it ends up being very easy to show that removing collusion among senders, or increasing the number of senders, will improve information revelation in equilibrium.

September 2012 working paper (IDEAS version). A brief word on the Clark medal. Gentzkow is a fine choice, particularly for his Bayesian persuasion papers, which are already very influential. I have no doubt that 30 years from now, you will still see the 2011 paper on many PhD syllabi. That said, the Clark medal announcement is very strange. It focuses very heavily on his empirical work on newspapers and TV, and mentions his hugely influential theory as a small aside! This means that five of the last six Clark medal winners, everyone but Levin and his relational incentive contracts, have been cited primarily for MIT/QJE-style theory-light empirical microeconomics. Even though I personally am primarily an applied microeconomist, I still see this as a very odd trend: no prizes for Chernozhukov or Tamer in metrics, or Sannikov in theory, or Farhi and Werning in macro, or Melitz and Costinot in trade, or Donaldson and Nunn in history? I understand these papers are harder to explain to the media, but it is not a good thing when the second most prominent prize in our profession is essentially ignoring 90% of what economists actually do.

“Finite Additivity, Another Lottery Paradox, and Conditionalisation,” C. Howson (2014)

If you know the probability theorist Bruno de Finetti, you know him either for his work on exchangeable processes, or for his legendary defense of finite additivity. Finite additivity essentially replaces the Kolmogorov assumption of countable additivity of probabilities. If Pr(i), for i=1 to N, is the probability of each of N mutually exclusive events, then the probability of their union is just the sum of the individual probabilities under either countable or finite additivity; countable additivity requires that property to hold for countably infinite collections of disjoint events as well.

What is objectionable about countable additivity? There are three classic problems. First, countable additivity rules out some very reasonable subjective beliefs. For instance, I might imagine that a Devil is going to pick one of the integers, and that he is equally likely to pick any given number. That is, my prior is uniform over the integers. Countable additivity does not allow this: if the probability of any given number being picked is greater than zero, then the sum diverges, and if the probability that any given number is picked is zero, then by countable additivity the probability of the grand set is also zero, violating the usual axiom that the grand set has probability 1. The second problem, loosely related to the first, is that I literally cannot assign probabilities to some objects, such as a nonmeasurable set.

The third problem, though, is the really worrying one. To the extent that a theory of probability has epistemological meaning and is not simply a mathematical abstraction, we might want to require that it not contradict well-known philosophical premises. Imagine that every day, nature selects either 0 or 1. Let us observe 1 every day until the present (call this day N). Let H be the hypothesis that nature will select 1 every day from now until infinity. It is straightforward to show that countable additivity requires that as N grows large, continued observation of 1 implies that Pr(H)->1. But this is just saying that induction works! And if there is any great philosophical advance in the modern era, it is Hume’s (and Goodman’s, among others) demolition of the idea that induction is sensible. My own introduction to finite additivity comes from a friend’s work on consensus formation and belief updating in economics: we certainly don’t want to bake in ridiculous conclusions about beliefs that rely entirely on countable additivity, given how strongly that assumption militates for induction. Aumann was always very careful on this point.

It turns out that if you simply replace countable additivity with finite additivity, all of these problems (among others) go away. Howson, in a paper in the newest issue of Synthese, asks why, given that clear benefit, anyone still finds countable additivity justifiable. Surely there are lots of pretty theorems, from Radon-Nikodym on down, that require countable additivity, but if a theorem critically hinges on an unjustifiable assumption, then what exactly are we to infer about the justifiability of the theorem itself?

Two serious objections are tougher to deal with for de Finetti acolytes: coherence and conditionalization. Coherence, a principle closely associated with de Finetti himself, says that there should not be “fair bets” given your beliefs where you are guaranteed to lose money. It is sometimes claimed that a uniform prior over the naturals is not coherent: you are willing to take a bet that any given natural number will not be drawn, but the conjunction of such bets for all natural numbers means you will lose money with certainty. This isn’t too worrying, though; if we reject countable additivity, then why should we define coherence to apply to non-finite conjunctions of bets?

Conditionalization is more problematic. It means that given prior P(i), your posterior P(f) of event S after observing event E must satisfy P(f)(S)=P(i)(S|E). This is just “Bayesian updating” off of a prior. Lester Dubins pointed out the following. Let A and B be two mutually exclusive hypotheses, such that P(A)=P(B)=.5. Let the random quantity X take positive integer values such that P(X=n|A)=0 for every n (conditional on A, you hold a uniform prior over the naturals, which finite additivity allows), and P(X=n|B)=2^(-n). By the law of total probability, P(X=n)>0 for all n, and therefore by Bayes’ Theorem, P(B|X=n)=1 and P(A|X=n)=0, no matter which n obtains! Something is odd here. Before seeing the realization of X, you would take a fair bet on A obtaining. But once n obtains (no matter which n!), you are guaranteed to lose money by betting on A.
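
To see the arithmetic, here is a quick numerical check (my own sketch: the finitely additive uniform prior conditional on A is approximated by a uniform distribution on {1,…,N}, and we let N grow):

```python
# A numerical check of the Dubins example. Assumption: the finitely additive
# uniform "prior over the naturals" conditional on A is approximated by a
# uniform distribution on {1,...,N}; conditional on B, P(X=n|B) = 2^(-n).
def posterior_A(n, N):
    pA = pB = 0.5
    like_A = 1.0 / N if n <= N else 0.0   # approximate uniform on {1,...,N}
    like_B = 2.0 ** (-n)
    return pA * like_A / (pA * like_A + pB * like_B)

for N in (10**2, 10**4, 10**6):
    print(N, posterior_A(n=3, N=N))
# For any fixed n, the posterior on A vanishes as N grows, mirroring the
# finitely additive conclusion that P(A|X=n) = 0 no matter which n obtains.
```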

Here is where Howson tries to save de Finetti with an unexpected tack. The problem in Dubins’ example is not finite additivity, but conditionalization – Bayesian updating from priors – itself! Here’s why. By a principle called “reflection”, if, using a suitable updating rule, your future probability of event A is p with certainty, then your current probability of event A must also be p. By Dubins’ argument, then, P(A)=0 must hold before X realizes. But that means your prior must be 0, which means that whatever independent reasons you had for the prior being .5 must be rejected. If we are to give up one of Reflection, Finite Additivity, Conditionalization, Bayes’ Theorem or the Existence of Priors, Howson says we ought to give up conditionalization. Now, there are lots of good reasons why conditionalization is sensible within a utility framework, so at this point I will simply point you toward the full paper and let you decide for yourself whether Howson’s conclusion is sensible. In any case, the problems with countable additivity should be better known by economists.

Final version in Synthese, March 2014 [gated]. Incidentally, de Finetti was very tightly linked to the early econometricians. His philosophy – that probability is a form of logic and hence non-ampliative (“That which is logical is exact, but tells us nothing”) – simply oozes out of Savage/Aumann/Selten methods of dealing with reasoning under uncertainty. Read, for example, what Keynes had to say about what a probability is, and you will see just how radical de Finetti really was.

“At Least Do No Harm: The Use of Scarce Data,” A. Sandroni (2014)

This paper by Alvaro Sandroni in the new issue of AEJ:Micro is only four pages long, and has only one theorem whose proof is completely straightforward. Nonetheless, you might find it surprising if you don’t know the literature on expert testing.

Here’s the problem. I have some belief p about which events (perhaps only one, perhaps many) will occur in the future, but this belief is relatively uninformed. You come up to me and say, hey, I actually *know* the distribution, and it is p*. How should I incentivize you to truthfully reveal your knowledge? This step is actually an old one: all we need is something called a proper scoring rule, the Brier Score being the most famous. If someone makes N predictions f(i) about the probability of binary events i occurring, then the Brier Score is the sum of the squared differences between each prediction and its outcome {0,1}, divided by N. So, for example, if there are three events, you say all three will independently happen with p=.5, and the actual outcomes are {0,1,0}, your score is 1/3*[(.5-1)^2+2*(.5-0)^2], or .25. The Brier Score being a proper scoring rule means that your expected score is lowest if you actually predict the true probability distribution. That being the case, all I need to do is pay you more the lower your Brier Score is; if you are risk-neutral, you, the expert, will then truthfully reveal your knowledge. There are more complicated scoring rules that can handle general non-binary outcomes, of course. (If you don’t know what a scoring rule is, it might be worthwhile to convince yourself why a rule equal to the summed absolute value of deviations between prediction and outcome is not proper.)
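
If you want to check the properness claim numerically rather than with calculus, a few lines suffice (my own sketch, for a single binary event with true probability q):

```python
import numpy as np

# Expected loss from reporting r when the true probability of a binary event
# is q, under the Brier (squared error) rule and under absolute deviations.
def expected_brier(r, q):
    return q * (r - 1) ** 2 + (1 - q) * r ** 2

def expected_abs(r, q):
    return q * np.abs(r - 1) + (1 - q) * np.abs(r)

q = 0.7
grid = np.linspace(0, 1, 1001)
print("Brier loss minimized at r =", grid[np.argmin(expected_brier(grid, q))])
print("Absolute loss minimized at r =", grid[np.argmin(expected_abs(grid, q))])
# The Brier loss is minimized at r = q = 0.7 (proper); the absolute-deviation
# loss is minimized at r = 1, i.e., it rewards exaggerating toward certainty.
```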

That’s all well and good, but a literature over the past decade or so called “expert testing” has dealt with the more general problem of knowing who is actually an expert at all. It turns out that it is incredibly challenging to screen experts from charlatans when it comes to probabilistic forecasts. The basic (too basic, I’m afraid) reason is that your screening rule can only condition on realizations, but the expert is expected to know a much more complicated object, the probability distributions of each event. Imagine you want to use the following rule, called calibration, to test weathermen: on days where rain was predicted p=.4, it actually does rain close to 40 percent of those days. A charlatan has no idea whether it will rain today or tomorrow, but after making a year of predictions, notices that most of his predictions are “too low”. When rain was predicted with .6, it rained 80 percent of the time, and when predicted with .7, it rained 72 percent of the time, etc. What should the charlatan do? Start predicting rain every day, to become “better calibrated”. As the number of days grows large, this trick gets the charlatan closer and closer to calibration.

But, you say, surely I can notice such an obviously tricky strategy. That implicitly means you want to use a more complicated test to screen the charlatans from the experts. A famous result of Foster and Vohra (which apparently was very hard to publish because so many referees simply didn’t believe the proof!) shows that a suitably clever, randomizing charlatan can pass the calibration test with high probability no matter what sequence nature produces, and later work in this literature (including by Sandroni himself) extends the point: essentially any test which passes true experts with high probability for every realization of nature can, as the number of predictions gets large, be passed with high probability by a strategic charlatan. And, indeed, the proof of the general result turns out to be a straightforward application of an abstract minimax theorem proven by Fan in the early 1950s.

Back, now, to the original problem of this post. If I know you are an expert, I can get your information with a payment that is maximized when a proper scoring rule is minimized. But what if, in addition to wanting info when it is good, I don’t want to be harmed when you are a charlatan? And further, what if only a single prediction is being made? The expert testing results mean that screening good from bad is going to be a challenge no matter how much data I have. If the charlatan is always incentivized to report my prior, then I am not hurt. But if you actually know the true probabilities, I want to pay you according to a proper scoring rule. Try this payment scheme: if you predict my prior p, then you get a payment ε which does not depend on the realization of the data. If you predict anything else, you get a payment based on a proper scoring rule, with the expected payment exceeding ε whenever the distribution you report is in fact the true one. So the informed expert is incentivized to report truthfully (there is a straightforward modification of the above if the informed expert is not risk-neutral). How can we get the charlatan to always report p? Suppose the charlatan has maxmin (worst-case) preferences as in Gilboa-Schmeidler. Reporting p pays ε no matter how the data realizes. If, however, the true distribution happens to be p and the charlatan reports anything other than p, then by construction of the scoring-rule payment her expected payoff in that “worst-case scenario” is less than ε, hence she will never report anything other than p given her maxmin preferences. I wouldn’t worry too much about the maxmin assumption, since it makes quite a bit of sense as a utility function for a charlatan who must decide what to announce under a complete veil of ignorance about nature’s true distribution.
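
Here is a stylized single-event version of that scheme (my own construction with made-up constants, not Sandroni’s exact payment function; the prior is fixed at p = .5 so that simple quadratic-rule constants work out):

```python
import numpy as np

# A stylized version of the payment scheme for one binary event. My own
# illustrative constants: prior p = 0.5, flat payment eps for reporting the
# prior, and a shifted Brier-type payment for any other report.
p, eps, bonus = 0.5, 0.1, 0.25

def payment(report, outcome):
    if np.isclose(report, p):
        return eps                                   # flat payment for reporting the prior
    return eps + bonus - (outcome - report) ** 2     # proper-scoring-rule payment

def expected_payment(report, true_q):
    return true_q * payment(report, 1) + (1 - true_q) * payment(report, 0)

# An informed expert who knows true_q = 0.9 strictly prefers truth-telling:
print(expected_payment(0.9, 0.9), ">", expected_payment(p, 0.9))   # about 0.26 > 0.1
# A maxmin charlatan considers the worst case; if the truth happens to be p,
# any report other than p pays strictly less than eps, so she reports p:
print(expected_payment(0.8, p), "<", eps)                          # about 0.01 < 0.1
```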

Final AEJ:Micro version, which is unfortunately behind a paywall (IDEAS page). I can’t find an ungated version of this article. It remains a mystery why the AEA is still gating articles in the AEJ journals. This is especially true of AEJ:Micro, a society-run journal whose main competitor, Theoretical Economics, is completely open access.

“Immigration and the Diffusion of Technology: The Huguenot Diaspora in Prussia,” E. Hornung (2014)

Is immigration good for natives of the recipient country? This is a tough question to answer, particularly once we think about the short versus long run. Large-scale immigration might have bad short-run effects simply because more L plus fixed K means lower average incomes in essentially any economic specification, but even given that fact, immigrants bring with them tacit knowledge of techniques, ideas, and plans which might be relatively uncommon in the recipient country. Indeed, world history is filled with wise leaders who imported foreigners, occasionally by force, in order to access their knowledge. As that knowledge spreads among the domestic population, productivity increases and immigrants are in the long-run a net positive for native incomes.

How substantial can those long-run benefits be? History provides a nice experiment, described by Erik Hornung in a just-published paper. The Huguenots, French protestants, were largely expelled from France after the Edict of Nantes was revoked by the Sun King, Louis XIV. The Huguenots were generally in the skilled trades, and their expulsion to the UK, the Netherlands and modern Germany (primarily) led to a great deal of tacit technology transfer. And, no surprise, in the late 17th century, there was very little knowledge transfer aside from face-to-face contact.

In particular, Frederick William, Grand Elector of Brandenburg, offered his estates as a refuge for the fleeing Huguenots. Much of his land had been depopulated by the plagues that followed the Thirty Years’ War. The centralized textile production facilities sponsored by nobles and run by Huguenots soon after their arrival tended to fail quickly – there simply wasn’t enough demand in a place as poor as Prussia. Nonetheless, a contemporary mentions 46 professions brought to Prussia by the Huguenots, as well as new techniques in silk production, dyeing fabrics and cotton printing. When the initial factories failed, the knowledge passed to hired apprentices and the capital that had been purchased remained. Technology transfer to natives became more common as later generations integrated more tightly with natives, moving out of Huguenot settlements and intermarrying.

What’s particularly interesting with this history is that the quantitative importance of such technology transfer can be measured. In 1802, incredibly, the Prussians took a census of manufactories, or factories producing stock for a wide region, including capital and worker input data. Also, all immigrants were required to register yearly, and include their profession, in 18th century censuses. Further, Huguenots did not simply move to places with existing textile industries where their skills were most needed; indeed, they tended to be placed by the Prussians in areas which had suffered large population losses following the Thirty Years’ War. These population losses were highly localized (and don’t worry, before using population loss as an IV, Hornung makes sure that population loss from plague is not simply tracing out existing transportation highways). Using input data to estimate a Cobb-Douglas textile production function, an additional percentage point of the population with Huguenot origins in 1700 is associated with a 1.5 percent increase in textile productivity in 1802. This result is robust in the IV regression using wartime population loss to proxy for the percentage of Huguenot immigrants, as well as to many other robustness checks. 1.5% is huge given the slow rate of growth in this era.
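
For readers who think in regressions, the estimating equation is roughly a log-linear Cobb-Douglas with the 1700 Huguenot share added as a shifter. Below is a schematic version on simulated data (column names and numbers are mine, chosen only to mimic the headline magnitude; this is not Hornung’s code, and the IV step with wartime population losses is only noted in a comment):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Schematic Cobb-Douglas estimation on made-up data. The coefficient on the
# Huguenot share is set to 1.5 so that a one-percentage-point higher share
# raises log output by about 1.5 percent, mimicking the headline estimate.
rng = np.random.default_rng(0)
n = 150
df = pd.DataFrame({
    "log_workers": rng.normal(3, 1, n),
    "log_looms": rng.normal(2, 1, n),
    "huguenot_share_1700": rng.uniform(0, 0.15, n),
})
df["log_output"] = (0.6 * df["log_workers"] + 0.3 * df["log_looms"]
                    + 1.5 * df["huguenot_share_1700"] + rng.normal(0, 0.2, n))

ols = smf.ols("log_output ~ log_workers + log_looms + huguenot_share_1700",
              data=df).fit()
print(ols.params)
# The paper's IV specification instruments huguenot_share_1700 with localized
# population losses from the Thirty Years' War; that step is omitted here.
```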

An interesting historical case. It is not obvious to me how relevant this estimation is to modern immigration debates; clearly it must depend on the extent to which knowledge can be written down or communicated at a distance. I would posit that the strong complementarity of factors of production (including VC funding, etc.) is much more important than tacit knowledge spread in modern agglomeration economies of scale, but that is surely a very difficult claim to investigate empirically using modern data.

2011 Working Paper (IDEAS version). Final paper published in the January 2014 AER.

“Wall Street and Silicon Valley: A Delicate Interaction,” G.-M. Angeletos, G. Lorenzoni & A. Pavan (2012)

The Keynesian Beauty Contest – is there any better example of an “old” concept in economics that, when read in its original form, is just screaming out for a modern analysis? You’ve got coordination problems, higher-order beliefs, signal extraction about underlying fundamentals, and optimal policy response by a planner who is herself informationally constrained: all of these are, of course, problems that have consumed micro theorists over the past few decades. The general problem with modeling irrational exuberance formally, though, is that it turns out to be very difficult to generate “irrational” actions by rational, forward-looking agents. Angeletos et al have a very nice model that can generate irrational-looking asset price movements even when all agents are perfectly rational, based on the idea of information frictions between the real and financial sectors.

Here is the basic plot. Entrepreneurs get an individual signal and a correlated signal about the “real” state of the economy (the correlation in error about fundamentals may be a reduced-form measure of previous herding, for instance). The entrepreneurs then make a costly investment. In the next period, some percentage of the entrepreneurs have to sell their asset on a competitive market. This may represent, say, idiosyncratic liquidity shocks, but really it is just in the model to abstract away from the finance sector learning about entrepreneur signals based on the extensive margin choice of whether to sell or not. The price paid for the asset depends on the financial sector’s beliefs about the real state of the economy, which come from a public noisy signal and the trader’s observations about how much investment was made by entrepreneurs. Note that the price traders pay is partially a function of trader beliefs about the state of the economy derived from the total investment made by entrepreneurs, and the total investment made is partially a function of the price at which entrepreneurs expect to be able to sell capital should a liquidity crisis hit a given firm. That is, higher order beliefs of both the traders and entrepreneurs about what the other aggregate class will do determine equilibrium investment and prices.

What does this imply? Capital investment is higher in the first stage if either the state of the world is believed to be good by entrepreneurs, or if the price paid in the following period for assets is expected to be high. Traders will pay a high price for an asset if the state of the world is believed to be good. These traders look at capital investment and essentially see another noisy signal about the state of the world. When an entrepreneur sees a correlated signal that is higher than his private signal, he increases investment due to a rational belief that the state of the world is better, but then increases it even more because of an endogenous strategic complementarity among the entrepreneurs, all of whom prefer higher investment by the class as a whole since that leads to more positive beliefs by traders and hence higher asset prices tomorrow. Of course, traders understand this effect, but a fixed point argument shows that even accounting for the aggregate strategic increase in investment when the correlated signal is high, aggregate capital can be read by traders precisely as a noisy signal of the actual state of the world. This means that when entrepreneurs invest partially on the basis of a signal correlated among their class (i.e., there are information spillovers), investment is based too heavily on noise. An overweighting of public signals in a type of coordination game is right along the lines of the lesson in Morris and Shin (2002). Note that the individual signals for entrepreneurs are necessary to keep the traders from being able to completely invert the information contained in capital production.
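
A toy linear-normal simulation makes the mechanism concrete (this is my stripped-down stand-in with arbitrary weights, not the authors’ model): aggregate investment loads on the common noise in the entrepreneurs’ correlated signal, so traders who read investment as a signal of fundamentals end up partly pricing in noise.

```python
import numpy as np

# Entrepreneurs see a private signal x_i = theta + e_i and a correlated signal
# z = theta + u; they invest k_i = a*x_i + b*z (the linear weights a, b are an
# arbitrary assumption here). Traders only observe aggregate investment K, so
# the common noise u moves K, and hence prices, even though it carries no news.
rng = np.random.default_rng(1)
sims, n_entrepreneurs = 5_000, 500
a, b = 0.5, 0.5

theta = rng.normal(0, 1, sims)     # true fundamental
u = rng.normal(0, 1, sims)         # common (correlated) signal noise
K = np.empty(sims)
for s in range(sims):
    e = rng.normal(0, 1, n_entrepreneurs)      # idiosyncratic signal noise
    K[s] = np.mean(a * (theta[s] + e) + b * (theta[s] + u[s]))

print("corr(K, theta):", np.corrcoef(K, theta)[0, 1])   # informative about theta...
print("corr(K, u):    ", np.corrcoef(K, u)[0, 1])       # ...but also loads on noise
```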

What can a planner who doesn’t observe these signals do? Consider taxing investment as a function of asset prices, where high taxes appear when the market gets particularly frothy. This is good on the one hand: entrepreneurs build too much capital following a high correlated signal because other entrepreneurs will be doing the same and therefore traders will infer the state of the world is high and pay high prices for the asset. Taxing high asset prices lowers the incentive for entrepreneurs to shade capital production up when the correlated signal is good. But this tax will also lower the incentive to produce more capital when the actual state of the world, and not just the correlated signal, is good. The authors discuss how taxing capital and the financial sector separately can help alleviate that concern.

Proving all of this formally, it should be noted, is quite a challenge. And the formality is really a blessing, because we can see what is necessary and what is not if a beauty contest story is to explain excess aggregate volatility. First, we require some correlation in signals in the real sector to get the Morris-Shin effect operating. Second, we do not require the correlation to be on a signal about the real world; it could instead be correlation about a higher order belief held by the financial sector! The correlation merely allows entrepreneurs to figure something out about how much capital they as a class will produce, and hence about what traders in the next period will infer about the state of the world from that aggregate capital production. Instead of a signal that correlates entrepreneur beliefs about the state of the world, then, we could have a correlated signal about higher-order beliefs, say, how traders will interpret how entrepreneurs interpret how traders interpret capital production. The basic mechanism will remain: traders essentially read from aggregate actions of entrepreneurs a noisy signal about the true state of the world. And all this beauty contest logic holds in an otherwise perfectly standard Neokeynesian rational expectations model!

2012 working paper (IDEAS version). This paper used to go by the title “Beauty Contests and Irrational Exuberance”; I prefer the old name!

Personal Note: Moving to Toronto

Before discussing a lovely application of High Micro Theory to a long-standing debate in macro in a post coming right behind this one, a personal note: starting this summer, I am joining the Strategy group at the University of Toronto Rotman School of Management as an Assistant Professor. I am, of course, very excited about the opportunity, and am glad that Rotman was willing to give me a shot even though I have a fairly unusual set of interests. Some friends asked recently if I have any job market advice, and I told them that I basically just spent five years reading interesting papers, trying to develop a strong toolkit, and using that knowledge base to attack questions I am curious about as precisely as I could, with essentially no concern about how the market might view this. Even if you want to be strategic, though, this type of idiosyncrasy might not be a bad strategy.

Consider the following model: any school evaluates you according to v+e(s), where v is a common signal of your quality and e(s) is a school-specific taste shock. The best job you are offered is the one where v+e(s) is maximized; your outcome is essentially a first-order statistic. What this means is that increasing v (by being smarter, or harder-working, or in a hotter field) and increasing the variance of e (by, e.g., working on very specific topics even if they are not “hot”, or by developing an unusual set of talents) both raise the expected quality of your best offer. And, at least in my case, increasing v provides disutility whereas increasing the variance of e can be quite enjoyable! If you do not want to play such a high-variance strategy, though, my friend James Bailey (heading from Temple’s PhD program to work at Creighton) has posted some more sober yet still excellent job market advice. I should also note that writing a research-oriented blog seemed to be weakly beneficial as far as interviews were concerned; in perhaps a third of my interviews, someone mentioned this site, and I didn’t receive any negative feedback. Moving from personal anecdote to the minimal sense of the word data, Jonathan Dingel of Trade Diversion also seems to have had a great deal of success. Given this, I would suggest that there isn’t much need to worry that writing publicly about economics, especially if restricted to technical content, will torpedo a future job search.
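
If you want to see the order-statistic logic in numbers, here is a toy simulation (stylized, with made-up parameters): raising the common quality v and raising the dispersion of the school-specific shock both increase the expected value of your best evaluation.

```python
import numpy as np

# Schools evaluate a candidate at v + e(s); the candidate cares about the
# best draw across S schools. All parameters below are arbitrary illustrations.
rng = np.random.default_rng(42)
S, sims = 50, 100_000

def expected_best(v, sigma):
    draws = v + sigma * rng.normal(size=(sims, S))
    return draws.max(axis=1).mean()

print(expected_best(v=0.0, sigma=1.0))   # baseline expected best evaluation
print(expected_best(v=0.5, sigma=1.0))   # raise common quality v
print(expected_best(v=0.0, sigma=1.5))   # raise school-specific dispersion instead
```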

“The Explanatory Relevance of Nash Equilibrium: One-Dimensional Chaos in Boundedly Rational Learning,” E. Wagner (2013)

The top analytic philosophy journals publish a surprising amount of interesting game and decision theory; the present article, by Wagner in the journal Philosophy of Science, caught my eye recently.

Nash equilibria are stable in a static sense, we have long known; no player wishes to deviate given what others do. Nash equilibria also require fairly weak epistemic conditions: if all players are rational and believe the other players will play the actual strategies they play with probability 1, then the set of outcomes is the Nash equilibrium set. A huge amount of work in the 80s and 90s considered whether players would “learn” to play Nash outcomes, and the answer is by and large positive, at least if we expand from Nash equilibria to correlated equilibria: fictitious play (each player best responds to the empirical frequency of the actions her opponents have played in the past) works pretty well, regret-based rules keyed to the relative past payoffs of various strategies converge to the correlated equilibrium set with certainty, and a type of Bayesian learning given initial beliefs about the strategy paths that might be used generates Nash in the limit, though note the important followup on that paper by Nachbar in Econometrica 2005. (Incidentally, a fellow student pointed out that the Nachbar essay is a great example of how poor citation measures are for theory. The paper has 26 citations on Google Scholar mainly because it helped kill a literature; the number of citations drastically underestimates how well-known the paper is among the theory community.)

A caution, though! It is not the case that every reasonable evolutionary or learning rule leads to an equilibrium outcome. Consider the “continuous time imitative-logit dynamic”. A continuum of agents exists. At an exponentially distributed random time for each agent, a buzzer rings, at which point she is randomly matched to play another agent. The agent imitates the other agent in the future with probability proportional to exp(beta*pi(j)), where beta is some positive number and pi(j) is the payoff to the opponent; if imitation doesn’t occur, a new strategy is chosen at random from all available strategies. A paper by Hofbauer and Weibull shows that as beta grows large, this dynamic approximates a best-response dynamic, in which strictly dominated strategies are driven out; as beta grows small, it looks a lot like a replicator dynamic, where imitation depends on the myopic relative fitness of a strategy. A discrete-time version of the continuous dynamic above can be generated (all agents update simultaneously rather than individually), which similarly “ranges” from something like the myopic replicator to something like a best-response dynamic as beta grows. Note that strictly dominated strategies are driven out for any beta in both the continuous and discrete time i-logit dynamics.

Now consider a simple two strategy game with the following payoffs:

        Left    Right
Left   (1,1)   (a,2)
Right  (2,a)   (1,1)

Let X denote the population share playing Right; the unique symmetric Nash equilibrium is mixed, with X=1/a. Let, say, a=3. When beta is very low (say, beta=1), so players are “relatively myopic”, and the initial condition is X=.1, the discrete-time i-logit dynamic converges to X=1/a. But if beta gets higher, say beta=5, then players are “more rational” yet the dynamic neither converges nor cycles: indeed, the share of the population playing each strategy follows a chaotic orbit! This property can be generated for many initial points X and values of a.
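
To play with this yourself, here is one plausible discrete-time imitative-logit map for the game above (my guess at a functional form in the Hofbauer-Weibull spirit, not necessarily Wagner’s exact specification). Its interior fixed point is X=1/a, which is stable for small beta and loses stability as beta grows, so the tail of the orbit stops settling down:

```python
import numpy as np

# X is the population share playing Right; payoffs come from the 2x2 game
# above with parameter a. Each period the whole population updates via a
# logit-weighted imitation rule (an assumed functional form).
def step(X, a=3.0, beta=1.0):
    pi_left = (1 - X) * 1 + X * a       # expected payoff to playing Left
    pi_right = (1 - X) * 2 + X * 1      # expected payoff to playing Right
    w_right = X * np.exp(beta * pi_right)
    w_left = (1 - X) * np.exp(beta * pi_left)
    return w_right / (w_right + w_left)

def orbit(X0, beta, a=3.0, T=60):
    xs = [X0]
    for _ in range(T):
        xs.append(step(xs[-1], a, beta))
    return np.array(xs)

print(orbit(0.1, beta=1.0)[-5:])    # settles near X = 1/3
print(orbit(0.1, beta=5.0)[-10:])   # inspect the tail: no settling near 1/3
```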

The dynamic here doesn’t seem crazy, and making agents “more rational” in a particular sense makes convergence properties worse, not better. And since play is chaotic, a player hoping to infer what the population will play next would need to know the initial conditions to arbitrary precision. Nash or correlated equilibria may have some nice dynamic properties for wide classes of reasonable learning rules, but the point that some care is needed concerning what “reasonable learning rules” might look like is well taken.

Final 2013 preprint. Big thumbs up to Wagner for putting all of his papers on his website, a real rarity among philosophers. Actually, a number of his papers look quite interesting: Do cooperation and fair bargaining evolve in tandem? How do small-world networks help the evolution of meaning in Lewis-style sender-receiver games? How do cooperative “stag hunt” equilibria evolve when 2-player stag hunts have such terrible evolutionary properties? I think this guy, though a recent philosophy PhD in a standard philosophy department, would be a very good fit in many quite good economic theory programs…

“Information Frictions and the Law of One Price,” C. Steinwender (2014)

Well, I suppose there is no surprise that I really enjoyed this paper by Claudia Steinwender, a PhD candidate from LSE. The paper’s characteristics are basically my catnip: one of the great inventions in history, a policy question relevant to the present day, and a nice model to explain what is going on. The question she asks is how informational differences affect the welfare gains from trade. In the present day, the topic comes up over and over again, from the importance of cell phones to village farmers to the welfare impact of public versus closed financial exchanges.

Steinwender examines the completion of the transatlantic telegraph in July 1866. A number of attempts at constructing this link had been made over the previous decade; the fact that the 1866 line was stable was something of a surprise. Its completion lowered the time necessary to transmit information about local cotton prices between New York (from which much of the supply was sent) and Liverpool (where much of the cotton was bought; see Chapter 15 of Das Kapital for a nice empirical description of the cotton industry at this time). Before the telegraph, steam ships took 7 to 21 days, depending on weather conditions, to traverse the Pond. In reduced form estimates, the mean price difference between the two ports, and the volatility of that difference, fell; price shocks in Liverpool saw immediate responses in shipments from America and in New York prices; exports increased and became more volatile; and similar effects were seen from shocks to ship speed before the telegraph, or temporary technical problems with the line after July 1866. These facts come from amazingly well documented data in New York and UK newspapers.

Those facts are all well and good, but how to explain them, and how to interpret them? It is not at all obvious that information in trade with a durable good should matter. If you ship too much one day, then just store it and ship less in the next period, right? But note the reduced form evidence: it is not just that prices harmonize, but that total shipments increase. What is going on? Without the telegraph, the expected price tomorrow in Liverpool from the perspective of New York sellers is less variable (the conditional expectation conditions on less information about the underlying demand shock, since only the two-week-old autocorrelated demand shock data brought by steamship is available). When high demand in Liverpool is underestimated, then, exports are lower in the era before the telegraph. On the other hand, a low demand shock and a very low demand shock in Liverpool both lead to zero exports, since exporting is unprofitable. Hence, ignoring storage, better information increases the variance of perceived demand, with asymmetric effects from high and low demand shocks, leading to higher overall exports. Storage should moderate the volatility of exports, but not entirely, since a period of many consecutive high demand shocks will eventually exhaust the storage in Liverpool. That is, the lower bound on stored cotton at zero means that even optimal cotton storage does not fully harmonize prices in the presence of information frictions.
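
The censoring-at-zero logic is easy to see in a stylized simulation (mine, not Steinwender’s model): exports respond to the forecast of Liverpool demand, the export rule max(0, forecast - cost) is convex in that forecast, and so the more variable, better-informed forecast yields higher average exports.

```python
import numpy as np

# Demand shocks in Liverpool; New York exporters choose shipments based on
# their forecast of demand, censored at zero. All numbers are illustrative.
rng = np.random.default_rng(0)
sims = 200_000
demand = rng.normal(1.0, 1.0, sims)    # realized Liverpool demand shock
cost = 1.0                             # assumed trade-cost threshold

# Telegraph era: the forecast tracks realized demand.
exports_telegraph = np.maximum(0.0, demand - cost)
# Steamship era: stale information, so the forecast shrinks toward the mean
# (here, crudely, halfway toward it).
forecast_stale = 1.0 + 0.5 * (demand - 1.0)
exports_steamship = np.maximum(0.0, forecast_stale - cost)

print("mean exports, telegraph era:", exports_telegraph.mean())   # about 0.40
print("mean exports, steamship era:", exports_steamship.mean())   # about 0.20
```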

Steinwender confirms that intuition by solving for the equilibrium with storage numerically; this is actually a pretty gutsy move, since the numerical estimates are quantitatively quite different than what was observed in the data. Nonetheless, I think she is correct that we are fine interpreting these as qualitative comparative statics from an abstract model rather than trying to interpret their magnitude in any way. (Although I should note, it is not clear to me that we cannot sign the relevant comparative statics just because the model with storage cannot be solved analytically in its entirety…)

The welfare impact of information frictions with storage can be bounded below in a very simple way. If demand is overestimated in New York, then too much is exported, and though some of this cotton is stored, the lower bound at zero for storage means that the price in Liverpool is still too high. If demand is underestimated in New York, then too little is exported, and though some stored cotton might be sold, the lower bound on storage means that the price in Liverpool is still too low. A lower bound on the deadweight loss from those effects can be computed simply by knowing the price difference between the UK and the US and the slopes of the demand and supply curves; in the case of the telegraph, this deadweight loss is on the order of 8% of the value of US cotton exports to the UK, or equivalent to the DWL from a 6% tax on cotton. That is large. I am curious about the impact of this telegraph on US vis-a-vis Indian or Egyptian cotton imports, the main Civil War substitutes; information differences must distort the direction of trade in addition to its magnitude.
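
The flavor of such a back-of-the-envelope bound is a Harberger triangle: a price wedge plus the slopes of (linear) demand and supply pin down the quantity distortion and hence the lost surplus. The sketch below is my schematic version with toy numbers, not the paper’s exact computation.

```python
# Harberger-triangle style deadweight loss from a price wedge, assuming linear
# demand and supply with slopes (in absolute value) slope_demand and
# slope_supply. The wedge distorts quantity by gap/(slope_demand+slope_supply),
# so the lost surplus is roughly half of (wedge x quantity distortion).
def dwl_lower_bound(gap, slope_demand, slope_supply):
    quantity_distortion = gap / (slope_demand + slope_supply)
    return 0.5 * gap * quantity_distortion

print(dwl_lower_bound(gap=2.0, slope_demand=1.0, slope_supply=1.0))  # 1.0, in toy units
```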

January 2014 working paper (No IDEAS version).

“Dynamic Constraints on the Distribution of Stochastic Choice: Drift Diffusion Implies Random Utility,” R. Webb (2013)

Neuroeconomics is a slightly odd field. It seems promising to “open up the black box” of choice using evidence from neuroscience, but despite this promise, I don’t see very many terribly interesting economic results. And perhaps this isn’t surprising – in general, economic models are deliberately abstract and do not hinge on the precise reason why decisions are made, so unsurprisingly neuro appears most successful in, e.g., selecting among behavioral models in specific circumstances.

Ryan Webb, a post-doc on the market this year, shows another really powerful use of neuroeconomic evidence: guiding our choices of the supposedly arbitrary parts of our models. Consider empirical models of random utility. Consumers make a discrete choice, such that the object chosen i is that which maximizes utility v(i). In the data, even the same consumer does not always make the same choice (I love my Chipotle burrito bowl, but I nonetheless will have a different lunch from time to time!). How, then, can we use the standard choice setup in empirical work? Add a random variable n(i) to the decision function, letting agents choose i which maximizes v(i)+n(i). As n will take different realizations, choice patterns can vary somewhat.

The question, though, is what distribution n(i) should take? Note that the probability i is chosen is just

P(v(i)+n(i)>=v(j)+n(j)) for all j

or

P(v(i)-v(j)>=n(j)-n(i)) for all j

If n are distributed independent normal, then the difference n(j)-n(i) is normal. If n are extreme value type I, the difference is logistic. Does either of those assumptions, or some alternative, make sense?

Webb shows that random utility is really just a reduced form of a well-established class of models in psychology called bounded accumulation models. Essentially, you receive a series of sensory inputs stochastically, the data adds up in your brain, and you make a decision according to some sort of stopping rule as the data accumulates in a drift diffusion. In a choice model, you might think for a bit, accumulating reasons to choose A or B, then stop at a fixed time T* and choose the object that, after the random drift, has the highest perceived “utility”. Alternatively, you might stop once the gap between the perceived utilities of different alternatives is high enough, or once one alternative has a sufficiently high perceived utility. It is fairly straightforward to show that this class of models all collapses to max v(i)+n(i), with differing implications for the distribution of n. Thus, neuroscience evidence about which types of bounded accumulation models appear most realistic can help choose among distributions of n for empirical random utility work.

How, exactly? Well, for any stopping rule, there is an implied distribution of stopping times T*. The reduced form errors n are then essentially the sample mean of random draws from a finite accretion process, and hence if the rule implies relatively short stopping times, n will be fat-tailed rather than normal. Also, consider letting the difference in underlying utility v(i)-v(j) be large. Then the stopping time under the accumulation models is relatively short, and hence the variance in the distribution of reduced form errors (again, essentially the sample mean of random draws) is relatively large. Hence, errors are heteroskedastic in the underlying v(i)-v(j). Webb gives additional results relating to the skew and correlation of n. He further shows that assuming independent normality or independent extreme value type I for the error terms can lead to mistaken inference, using a recent AER paper that tries to infer risk aversion parameters from choices among monetary lotteries. Quite interesting, even for a neuroecon skeptic!
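
A small simulation of a threshold-stopping accumulator shows the mechanism behind that heteroskedasticity (my generic sketch of the idea, not Webb’s exact model): as the utility gap v(i)-v(j) grows, decisions stop sooner, so the reduced-form error averages over fewer noisy draws.

```python
import numpy as np

# Evidence on the gap between two options drifts at rate dv = v(i) - v(j)
# plus Gaussian noise; the agent stops once the gap hits a threshold.
rng = np.random.default_rng(3)

def simulate_choice(dv, threshold=2.0, dt=0.01, sigma=1.0, max_steps=10_000):
    """Return (chose i?, stopping time) for one decision."""
    gap = 0.0
    for t in range(1, max_steps + 1):
        gap += dv * dt + sigma * np.sqrt(dt) * rng.normal()
        if abs(gap) >= threshold:
            return gap > 0, t * dt
    return gap > 0, max_steps * dt

for dv in (0.1, 1.0, 3.0):
    draws = [simulate_choice(dv) for _ in range(2_000)]
    p_i = np.mean([choice for choice, _ in draws])
    mean_T = np.mean([time for _, time in draws])
    print(f"v(i)-v(j) = {dv}: P(choose i) = {p_i:.2f}, mean stopping time = {mean_T:.2f}")
```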

2013 Working Paper (No IDEAS version).

Is Theory Really Dead? 2013-14 Job Market Stars in Economics

Solely out of curiosity, I have been collecting data on the characteristics of economics job market “stars” over the past few years. In order to receive a tenure-track offer, an economist must first be “flown out” to a university to give a talk presenting their best research. I define a star using a somewhat arbitrary cutoff based on flyouts reported publicly online – roughly, the minimum cutoff would be a candidate who is flown out to, e.g., Chicago Booth, UCLA, Cornell and Toronto. 95%+ of the job candidates from prior years above that cutoff have been hired into what I would consider a highly prestigious job, and hence are in a good place to influence the direction of the profession in the years to come.

It is widely recognized that the topics and methodologies of interest to young economists are a leading indicator of where economics might be heading. Overall, as Hamermesh pointed out in a JEL article last year, there has been an enormous shift over the past couple of decades towards empirical work, particularly work where the parameters of interest are simple treatment effects estimated from observational or experimental data; this work is often called “reduced-form”, though that term traditionally had a very different meaning.

This trend does not hold among the top candidates this year. I find 42 candidates above the “star” cutoff, from 21 universities, 6 of them outside the United States (LSE, CEMFI, EUI, Toulouse, Sciences Po, UCL); this list omits junior candidates coming off extended (> 2 year) post-docs. I generally use self-reported field in the table below.

Job Market Stars by Field
Macro 8
Labor 6
Micro theory 5
IO 4
Econometrics 4
Applied Micro 4
Intl./Trade 3
Finance 3
Public 2
Development 1
History 1
Political Economy 1

In the following tables, I split papers up into pure theory and empirics, and then split the empirical papers into structural models (where the estimates of interest are parameters in a choice and/or equilibrium-based economic model), “light theory” (where the main estimates are treatment effects whose interest is derived from a light model), and pure treatment effect estimation (where the work is purely experimental, either in the lab or the field, or a reduced form estimate of some economic parameter).

Theory versus Empirics
Pure Theory 11
Empirical 31
  of which
Structural 25
“Light theory” 4
Experimental/Reduced-form 2

Data Source if Empirical
Custom Data 11
Public Data 20

Finally, there seems to be a widespread belief that publications are necessary in order to be a top job market candidate. In the table below, “Top 5” means AER, Econometrica, ReStud, QJE or JPE, and R&R denotes a publicly divulged Revise & Resubmit. I include AER Papers & Proceedings as a publication, but omit all non-peer-reviewed publications such as Fed or think tank journal articles. Categories refer to the “best” publication should a candidate have more than one.

Publication History Among Stars
No Pubs or R&Rs 20
Sole-authored top five 1
Coauthored top five 4
Sole-authored top five R&R 1
Coauthored top five R&R 6
Sole-authored other pub 2
Coauthored other pub 5
Sole-authored other R&R 3

What is the takeaway here? I see three major ones. First, the market is fairly efficient: students from many schools beyond the Harvards and MITs of the world are able to get looks from top departments. Second, publications are nice but far from necessary: less than 20% of the stars even have a sole-authored revise & resubmit, let alone an AER on their CV.

Third, and most importantly, theory is far from dead; indeed, purely applied economics appears to be the method going out of favor! Of the 42 star job market papers, 11 are pure theory, and 25 estimate structural models; in many of those papers, the theoretical mechanisms identified clearly trump the data work. Only 6 of the 42 could by any stretch be identified as reduced form or experimental economics, and of those 6, 4 nonetheless include a non-trivial economic model to guide the empirical estimation. Given Hamermesh’s data, this is a major change (and indeed, it seems quite striking even compared with the market five years ago!).
