“Stochastic Games in Continuous Time: Persistent Actions in Long-Run Relationships,” A. Bohren (2011)

A straw poll of my theory-inclined classmates gave UCSD’s Aislinn Bohren the win as this year’s most impressive job market candidate in theory. The present paper is a particularly interesting take on the question of reputation. Imagine two of us are playing a repeated game with imperfect monitoring: I can lie, or cheat, or deviate in some other way in one stage, and you won’t be able to perfectly detect that deviation. I then have a very strong desire to deviate in the short term. But many games would give me higher total discounted payoffs if only I could commit to not deviating. For instance, a politician may benefit from taking lobbying money now, but will get a higher vote share in the long run if voters think he didn’t take lobbying money. If lobbying is only imperfectly observable, the politician has a lot of incentive to deviate since it will be difficult to catch him.

The traditional way to model reputation-building is with behavioral types – this approach begins with the “Gang of Four” 1982 paper by Kreps, Milgrom, Roberts and Wilson. In their model, there is some percentage of politicians who never take lobbying money. By giving off some good signals by not taking money early, I can convince the voter that I am the “good” behavioral type and not the self-interested type. There are two problems with this method, though. First, though I’m sure many will disagree, I think behavioral types go completely against the spirit of game theoretic reasoning, and we ought to use them as infrequently as possible. Second, Faingold and Sannikov’s Book-of-Genesis-length 2011 Econometrica on infinitely-repeated continuous time games with imperfect monitoring showed that in games between a long-run player (who is perhaps a behavioral type) and a series of short-run players, the long-run player can achieve no higher than the stage game payoff in each period. Even assuming behavioral types, in continuous time the short-run players eventually learn whether the long-run player is “really” a behavioral type, hence any benefit from mimicking a behavioral type is temporary. There is a third reason you might not like the behavioral type model: it is not at all obvious that when we say “reputation” in everyday speech, we mean the Gang of Four meaning. If an auto body shop has a reputation for doing good work on cars, mightn’t this mean the shop has taken good past actions that imply it has no incentive to cheat you today, rather than that the shop has merely taken good past actions?

Bohren’s model uses a persistent state variable to model reputation in precisely the first manner. The action the long run player can take today is a function of a state variable X determined by past actions, with the most recent actions having the largest effect. Imagine X as the stock of knowledge a surgeon has, with X higher if the surgeon exerted high effort in past periods. Or X might be the “moral potential” of a politician, increasing in past “good” actions. Formally, let Y be a stochastic public signal evolving in continuous time such that the change in Y over time is simply a drift term determined by the actions (a(t), b(t)) of the long and short run players, plus some Brownian motion; if you’re not familiar with Brownian motion, just imagine normally-distributed noise. A publicly observable state variable X evolves as a function of the public signal. This implies that the state variable gives no information about past actions by either type of player which is not already learned by knowing Y. Payoffs for the long run player are just discounted expected payoffs, which is the integral from 0 to infinity of a function depending on his action a(t), opponent action b(t) and the state X(t); payoffs for the short-run player are just myopic flow payoffs that are also a function of a(t), b(t) and X(t). We restrict to perfect public equilibria.
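To make the setup above concrete, here is a minimal simulation sketch. The functional forms are my own assumptions for illustration and are not taken from the paper: the drift of Y is a*b, the state X is driven only by the public signal and decays at a hypothetical rate theta (so recent signals matter most), the flow payoff is b*X minus a quadratic effort cost, and the strategies in the usage lines are made-up threshold rules rather than the equilibrium.

```python
import numpy as np

def flow_payoff(a, b, x):
    # Hypothetical flow payoff for the long-run player (an assumption):
    # gains from the short-run action scaled by the state, minus effort cost.
    return b * x - 0.5 * a ** 2

def simulate(a_fn, b_fn, T=10.0, dt=0.01, sigma=1.0, theta=0.5, r=0.1,
             x0=0.0, seed=0):
    """Euler-Maruyama discretization of an assumed version of the model.

    dY = a*b dt + sigma dZ        (public signal: drift from actions plus noise)
    dX = -theta*X dt + dY         (state moves only through the public signal)
    Strategies are Markov: a_fn and b_fn map the current state X to an action.
    Returns the paths of X and Y and a truncated, unnormalized discounted payoff.
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    X = np.empty(n + 1)
    Y = np.empty(n + 1)
    X[0], Y[0] = x0, 0.0
    payoff = 0.0
    for k in range(n):
        a, b = a_fn(X[k]), b_fn(X[k])
        dZ = rng.normal(0.0, np.sqrt(dt))       # Brownian increment
        dY = a * b * dt + sigma * dZ
        Y[k + 1] = Y[k] + dY
        X[k + 1] = X[k] - theta * X[k] * dt + dY
        payoff += np.exp(-r * k * dt) * flow_payoff(a, b, X[k]) * dt
    return X, Y, payoff

# Usage with made-up threshold strategies (illustrative only).
X, Y, value = simulate(
    a_fn=lambda x: 1.0 if x < 1.0 else 0.0,   # exert effort when the state is low
    b_fn=lambda x: 1.0 if x > -0.5 else 0.0,  # short-run player engages unless the state is very low
)
```

Note that X is computed from dY alone, which is the sense in which the state reveals nothing about past actions beyond what Y already shows.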

Here’s the cool result: there is a unique PPE in such a game, and it is Markovian! That is, strategies will only depend on the state variable X and not on the public signal of actions Y. Here’s an interpretation: because monitoring is imperfect, the short-run player only imperfectly knows the past actions of the long-run player. The normal punishment strategies used to get payoffs higher than stage-game payoffs in repeated games with perfect monitoring involve threatening to destroy some value for both players if either player deviates. With imperfect monitoring, such threats will need to be acted upon sometimes, simply because we only imperfectly detect deviations. But a number of earlier papers (Sannikov and Skrzypacz 2010 AER, for instance) have proven that in continuous time games where signals have Brownian distortion, such value burning cannot be an effective incentive scheme. So long-run incentives must come from somewhere other than Y – in this case, X. The state variable works where the public signal does not because the state variable directly affects current payoffs, rather than merely reflecting past actions and in so doing giving some potential for punishment. X is the carrot and Y is the stick, and the stick is simply not useful in continuous time. The Markovian nature of equilibria is super-useful for applied work, since Markov solutions are far easier to solve for than general PPE, and this result allows a quick retort to that one referee who always wants a justification of the Markov restriction! The fact that continuous time games with imperfect monitoring generally allow for closed-form solutions of optimal actions for a given discount rate, something not always true in discrete time, is also nice.

What do equilibrium strategies look like for the long-run player? He will follow cycles of reputation building and destruction, and such cycles will continue out to infinity. Basically, the long-run player has a strong incentive to build up reputation if by doing so he can get the short-run player to change to the “good”/“trusting”/etc. action relatively soon. Because monitoring and the evolution of the state variable are subject to uncertainty, the long-run player may build up the state variable even when he takes the bad action, and likewise may lower the state variable even when taking the good action.
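The last point, that noise can push the state the “wrong” way, is easy to see numerically. Here is a rough sketch under my own simplifying assumptions (not the paper’s specification): each increment of the state is a*dt plus Brownian noise, with any decay of the state ignored, and a is 0 for the bad action and 1 for the good one.

```python
import numpy as np

def share_of_upward_moves(a, n_steps=100_000, dt=0.01, sigma=1.0, seed=0):
    """Fraction of state increments that are positive under a constant action a.

    Assumed increment: dX = a*dt + sigma*dZ, with dZ ~ Normal(0, dt).
    Decay of the state is deliberately left out to isolate the noise effect.
    """
    rng = np.random.default_rng(seed)
    dX = a * dt + sigma * rng.normal(0.0, np.sqrt(dt), size=n_steps)
    return float(np.mean(dX > 0))

bad_action_up = share_of_upward_moves(a=0.0)    # state rises despite the bad action
good_action_up = share_of_upward_moves(a=1.0)   # state rises under the good action
good_action_down = 1.0 - good_action_up         # state falls despite the good action
```

With these assumed parameters the state rises on roughly half of all short intervals under the bad action, and the theoretical share of upward moves under the good action is only about 54%, so over short horizons the noise swamps the drift. Over longer horizons the drift accumulates and dominates.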

An interesting model – it wouldn’t surprise me if this asset-like model of reputation usurped the Gang of Four model in the future. I’d also like to see some discussion of what this model means in terms of persistence of action: the framework is almost, but not quite, amenable to a setting where actions in a repeated game can only be changed slowly. Is there a way to fit that idea into the present formulation?

http://econ.ucsd.edu/~abohren/pdfs/Bohren.StochasticGamesContTime.pdf (Nov. 2011 Working Paper)
