Category Archives: Theory of the Firm

“Dynamic Commercialization Strategies for Disruptive Technologies: Evidence from the Speech Recognition Industry,” M. Marx, J. Gans & D. Hsu (2014)

Disruption. You can’t read a book about the tech industry without Clayton Christensen’s Innovator’s Dilemma coming up. Jobs loved it. Bezos loved it. Economists – well, they were a bit more confused. Here’s the story at its most elemental: in many industries, radical technologies are introduced. They perform very poorly initially, and so are ignored by the incumbent. These technologies rapidly improve, however, and the previously ignored entrants go on to dominate the industry. The lesson many tech industry folks take from this is that you ought to “disrupt yourself”. If there is a technology that can harm your most profitable business, then you should be the one to develop it; take Amazon’s “Lab126” Kindle skunkworks as an example.

There are a couple problems with this strategy, however (well, many problems actually, but I’ll save the rest for Jill Lepore’s harsh but lucid takedown of the disruption concept which recently made waves in the New Yorker). First, it simply isn’t true that all innovative industries are swept by “gales of creative destruction” – consider automobiles or pharma or oil, where the major players are essentially all quite old. Gans, Hsu and Scott Stern pointed out in a RAND article many years ago that if the market for ideas worked well, you would expect entrants with good ideas to just sell to incumbents, since the total surplus would be higher (less duplication of sales assets and the like) and since rents captured by the incumbent would be higher (less product market competition). That is, there’s no particular reason that highly innovative industries require constant churn of industry leaders.

The second problem concerns disrupting oneself or waiting to see which technologies will last. Imagine it is costly to investigate potentially disruptive technologies for the incumbent. For instance, selling mp3s in 2002 would have cannibalized existing CD sales at a retailer with a large existing CD business. Early on, the potentially disruptive technology isn’t “that good”, hence it is not in and of itself that profitable. Eventually, some of these potentially disruptive technologies will reveal themselves to actually be great improvements on the status quo. If that is the case, then, why not just let the entrant make these improvements/drive down costs/learn about market demand, and then buy them once they reveal that the potentially disruptive product is actually great? Presumably the incumbent even by this time still retains its initial advantage in logistics, sales, brand, etc. By waiting and buying instead of disrupting yourself, you can still earn those high profits on the CD business in 2002 even if mp3s had turned out to be a flash in the pan.

This is roughly the intuition in a new paper by Matt Marx (you may know his work on non-compete agreements), Gans and Hsu. Matt has also collected a great dataset from industry journals on every firm that ever operated in automated speech recognition. Using this data, the authors show that a policy by entrants of initial competition followed by licensing or acquisition is particularly common when the entrants come in with a “disruptive technology”. You should see these strategies, where the entrant proves the value of its technology and the incumbent waits to acquire, in industries where ideas are not terribly appropriable (why buy if you can steal?) and entry is not terribly expensive (in an area like biotech, clinical trials and the like are too expensive for very small firms). I would add that you also need complementary assets to be relatively hard to replicate; if they aren’t, the incumbent may well wind up being acquired rather than the entrant should the new technology prove successful!

Final July 2014 working paper (RePEc IDEAS). The paper is forthcoming in Management Science.

On Coase’s Two Famous Theorems

Sad news today that Ronald Coase has passed away; he was still working, often on the Chinese economy, at the incredible age of 102. Coase is best known to economists for two statements: that transaction costs explain many puzzles in the organization of society, and that pricing for durable goods presents a particular worry since even a monopolist selling a durable good needs to “compete” with its future and past selves. Both of these statements are horribly, horribly misunderstood, particularly the first.

Let’s talk first about transaction costs, as in “The Nature of the Firm” and “The Problem of Social Cost”, which are to my knowledge the most cited and the second most cited papers in economics. The Problem of Social Cost leads with its famous cattle versus crops example. A farmer wishes to grow crops, and a rancher wishes his cattle to roam where the crops grow. Should we make the rancher liable for damage to the crops (or restrain the rancher from letting his cattle roam at all!), or indeed ought we restrain the farmer from building a fence where the cattle wish to roam? Coase points out that in some sense both parties are causally responsible for the externality, that there is some socially efficient amount of cattle grazing and crop planting, and that if a bargain can be reached costlessly, then there is some set of side payments where the rancher and the farmer are both better off than having the crops eaten or the cattle fenced. Further, it doesn’t matter whether you give grazing rights to the cattle and force the farmer to pay for the “right” to fence and grow crops, or whether you give farming rights and force the rancher to pay for the right to roam his cattle.

This basic principle applies widely in law, where Coase had his largest impact. He cites a case where confectioner machines shake a doctor’s office, making it impossible for the doctor to perform certain examinations. The court restricts the ability of the confectioner to use the machine. But Coase points out that if the value of the machine to the confectioner exceeds the harm of shaking to the doctor, then there is scope for a mutually beneficial side payment whereby the machine is used (at some level) and one or the other is compensated. A very powerful idea indeed.
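To make the bargaining logic concrete, here is a minimal numerical sketch of the confectioner-and-doctor case, with made-up numbers for the machine’s value and the doctor’s harm (nothing below comes from Coase’s article itself): whichever party holds the right, the machine runs exactly when its value exceeds the harm, and there is always a range of side payments that both sides prefer to the alternative.

```python
# Toy Coasean bargaining: the machine is worth `value` to the confectioner,
# and running it imposes `harm` on the doctor. With costless bargaining the
# machine runs iff value > harm, regardless of who is assigned the right.

def efficient_outcome(value, harm):
    """Return whether the machine runs and the range of acceptable side payments."""
    if value > harm:
        # Machine runs. If the doctor holds the right, the confectioner must pay
        # more than `harm` (the doctor's loss) and less than `value` (his own gain).
        return True, (harm, value)
    # Machine stays off. If the confectioner holds the right, the doctor must pay
    # more than `value` and less than `harm` to keep it off.
    return False, (value, harm)

for value, harm in [(100, 40), (30, 80)]:
    runs, (lo, hi) = efficient_outcome(value, harm)
    print(f"value={value}, harm={harm}: machine runs -> {runs}; "
          f"if the party preferring this outcome lacks the right, "
          f"any payment in ({lo}, {hi}) secures it")
```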

Powerful, but widely misunderstood. I deliberately did not mention property rights above. Coase is often misunderstood (and, to be fair, he does at points in the essay imply this misunderstanding) as saying that property rights are important, because once we have property rights, we have something that can “be priced” when bargaining. Hence property rights + externalities + no transaction costs should lead to no inefficiency if side payments can be made. Dan Usher famously argued that this is “either tautological, incoherent, or wrong”. Costless bargaining is efficient tautologically; if I assume people can agree on socially efficient bargains, then of course they will. The fact that side payments can be agreed upon is true even when there are no property rights at all. Coase says that “[i]t is necessary to know whether the damaging business is liable or not for damage since without the establishment of this initial delimitation of rights there can be no market transactions to transfer and recombine them.” Usher is correct: that statement is wrong. In the absence of property rights, a bargain establishes a contract between parties with novel rights that needn’t exist ex-ante.

But all is not lost for Coase. Because the real point of his paper begins with Section VI, not before, when he notes that the case without transaction costs isn’t the interesting one. The interesting case is when transaction costs make bargaining difficult. What you should take from Coase is that social efficiency can be enhanced by institutions (including the firm!) which allow socially efficient bargains to be reached by removing restrictive transaction costs, and particularly that the assignment of property rights to different parties can either help or hinder those institutions. One more thing to keep in mind about the Coase Theorem (which Samuelson famously argued was not a theorem at all…): Coase is implicitly referring to Pareto efficiency in his theorem, but since property rights are an endowment, we know from the Welfare Theorems that “benefits exceed costs” is not sufficient for maximizing social welfare.

Let’s now consider the Coase Conjecture: this conjecture comes, I believe, from a very short 1972 paper, Durability and Monopoly. The idea is simple and clever. Let a monopolist own all of the land in the US. If there was a competitive market in land, the price per unit would be P and all Q units will be sold. Surely a monopolist will sell a reduced quantity Q2 less than Q at price P2 greater than P? But once those are sold, we are in trouble, since the monopolist still has Q-Q2 units of land. Unless the monopolist can commit to never sell that additional land, we all realize he will try to sell it sometime later, at a new maximizing price P3 which is greater than P but less than P2. He then still has some land left over, which he will sell even cheaper in the next period. Hence, why should anyone buy in the first period, knowing the price will fall (and note that the seller who discounts the future has the incentive to make the length between periods of price cutting arbitrarily short)? The monopolist with a durable good is thus unable to make rents. Now, Coase essentially never uses mathematical theorems in his papers, and you game theorists surely can see that there are many auxiliary assumptions about beliefs and the like running in the background here.
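A two-period sketch makes the commitment problem explicit. Everything below is my own stylized parameterization rather than anything in Coase’s note: a unit mass of buyers with valuations uniform on [0,1], a common discount factor, and a good that delivers its valuation in every period it is owned. Solving the second period first and backward inducting, the first-period price and total profit both fall short of what the monopolist could earn if it could commit never to cut the price.

```python
import numpy as np

delta = 0.9   # common discount factor (illustrative)
grid = np.linspace(0.0, 1.0, 10001)   # candidate period-1 valuation cutoffs

def profit(v1):
    """Monopolist profit when buyers with valuation >= v1 buy in period 1.
    Period 2: residual buyers are uniform on [0, v1], so the optimal price is v1/2.
    Period 1: the marginal buyer v1 is indifferent between buying now at p1
    (and enjoying the good in both periods) and waiting to buy at v1/2 tomorrow."""
    p2 = v1 / 2.0
    p1 = v1 + delta * p2          # from (1+delta)*v1 - p1 = delta*(v1 - p2)
    return p1 * (1.0 - v1) + delta * p2 * (v1 - p2)

v1 = grid[np.argmax([profit(v) for v in grid])]
p1, p2 = v1 + delta * v1 / 2.0, v1 / 2.0

# Commitment benchmark: sell once to every buyer above a cutoff v*; profit
# (1+delta)*v*(1-v*) is maximized at v* = 1/2, i.e. a price of (1+delta)/2.
commit_price, commit_profit = (1.0 + delta) / 2.0, (1.0 + delta) / 4.0

print(f"no commitment: p1={p1:.3f}, p2={p2:.3f}, profit={profit(v1):.3f}")
print(f"commitment:    price={commit_price:.3f}, profit={commit_profit:.3f}")
```

With these illustrative numbers the no-commitment monopolist opens below the commitment price and earns strictly less, and the gap widens as the discount factor rises toward one, which is exactly the direction the conjecture pushes.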

Luckily, given the importance of this conjecture to pricing strategies, antitrust, auctions, etc., there has been a ton of work on the problem since 1972. Nancy Stokey (article gated) has a famous paper written here at MEDS showing that the conjecture only holds strictly when the seller is capable of selling in continuous time and the buyers are updating beliefs continuously, though approximate versions of the conjecture hold when periods are discrete. Gul, Sonnenschein and Wilson flesh out the model more completely, generally showing the conjecture to hold in well-defined stationary equilibrium across various assumptions about the demand curve. McAfee and Wiseman show in a recent ReStud that even the tiniest amount of “capacity cost”, or a fee that must be paid in any period for X amount of capacity (i.e., the need to hire sales agents for the land), destroys the Coase reasoning. The idea is that in the final few periods, when I am selling to very few people, even a small capacity cost is large relative to the size of the market, so I won’t pay it; backward inducting, then, agents in previous periods know it is not necessarily worthwhile to wait, and hence they buy earlier at the higher price. It goes without saying that there are many more papers in the formal literature.

(Some final notes: Coase’s Nobel lecture is well worth reading, as it summarizes the most important thread in his work: “there [are] costs of using the pricing mechanism.” It is these costs that explain why, though markets in general have such amazing features, even in capitalist countries there are large firms run internally as something resembling a command state. McCloskey has a nice brief article which generally blames Stigler for the misunderstanding of Coase’s work. Also, while gathering some PDFs for this article, I was shocked to see that Ithaka, who run JSTOR, is now filing DMCA takedowns with Google against people who host some of these legendary papers (like “Problem of Social Cost”) on their academic websites. What ridiculousness from a non-profit that claims its mission is to “help the academic community use digital technologies to preserve the scholarly record.”)

“What Determines Productivity,” C. Syverson (2011)

Chad Syverson, along with Nick Bloom, John van Reenen, Pete Klenow and many others, has been at the forefront of a really interesting new strand of the economics literature: persistent differences in productivity. Syverson looked at productivity differences within 4-digit SIC industries in the US (quite narrow industries like “Greeting Cards” or “Industrial Sealants”) a number of years back, and found that in the average industry, the 90-10 ratio of total factor productivity across plants was almost 2. That is, the top decile plant in the average industry produced twice as much output as the bottom decile plant, using exactly the same inputs! Hsieh and Klenow did a similar exercise in China and India and found even starker productivity differences, largely due to a big left-tail of very low productivity firms. This basic result is robust to different measures of productivity, and to different techniques for identifying differences; you can make assumptions which let you recover a Solow residual directly, or run a regression (adjusting for differences in labor and capital quality, or not), or look at deviations like firms having higher marginal productivity of labor than the wage rate, etc. In the paper discussed in this post, Syverson summarizes the theoretical and empirical literature on persistent productivity differences.
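For readers who want the mechanics of the regression approach, here is a generic sketch on simulated data (it is not the procedure in any of the papers above, and the production-function coefficients and dispersion are made up to roughly reproduce a 90-10 ratio near 2): estimate a Cobb-Douglas production function within an industry by OLS, treat the residual as log TFP, and compare the 90th and 10th percentile plants.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500   # plants in one narrowly defined industry (simulated)

# Simulated plant data: log capital, log labor, and a log-TFP draw whose
# dispersion is chosen to roughly match a 90-10 ratio of about 2.
log_k = rng.normal(5.0, 1.0, n)
log_l = rng.normal(4.0, 1.0, n)
log_tfp_true = rng.normal(0.0, 0.27, n)
log_y = 0.3 * log_k + 0.7 * log_l + log_tfp_true

# OLS estimate of the production function; the residual is measured log TFP.
X = np.column_stack([np.ones(n), log_k, log_l])
beta, *_ = np.linalg.lstsq(X, log_y, rcond=None)
log_tfp_hat = log_y - X @ beta

# The 90-10 ratio in levels is exp of the difference in log-TFP percentiles.
p90, p10 = np.percentile(log_tfp_hat, [90, 10])
print(f"estimated 90-10 TFP ratio: {np.exp(p90 - p10):.2f}")
```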

Why aren’t low productivity firms swept from the market? We know from theory that if entry is allowed, potentially infinite and instantaneous, then no firm can remain which is less productive than the entrants. This suggests that persistence of inefficient firms must result from either limits on entry, limits on expansion by efficient firms, or non-immediate efficiency because of learning-by-doing or similar (a famous study by Benkard of a Lockheed airplane showed that a plant could produce a plane with half the labor hours after producing 30, and half again after producing 100). Why don’t inefficient firms already in the market adopt best practices? This is related to the long literature on diffusion, which Syverson doesn’t cover in much detail, but essentially it is not obvious to a firm whether a “good” management practice at another firm is actually good or not. Everett Rogers, in his famous “Diffusion of Innovations” book, refers to a great example of this from Peru in the 1950s. A public health consultant was sent for two years to a small village, and tried to convince the locals to boil their water before drinking it. The water was terribly polluted and the health consequences of not boiling were incredible. After two years, only five percent of the town adopted the “innovation” of boiling. Some didn’t adopt because it was too hard, many didn’t adopt because of a local belief system that suggested only the already-sick ought drink boiled water, some didn’t adopt because they didn’t trust the experience of the advisor, et cetera. Diffusion is difficult.

Ok, so given that we have inefficient firms, what is the source of the inefficiency? It is difficult to decompose all of the effects. Learning-by-doing is absolutely relevant in many industries – we have plenty of evidence on this count. Nick Bloom and coauthors seem to suggest that management practices play a huge role. They have shown clear correlation between “best practice” management and high TFP across firms, and a recent randomized field experiment in India (discussed before on this site) showed massive impacts on productivity from management improvements. Regulation and labor/capital distortions also appear to play quite a big role. On this topic, James Schmitz wrote a very interesting paper, published in 2005 in the JPE, on iron ore producers. TFP in Great Lakes ore had been more or less constant for many decades, with very little entry or foreign competition until the 1980s. Once Brazil began exporting ore to the US, labor productivity doubled within a handful of years, and capital and total factor productivity also soared. A main driver of the change was more flexible workplace rules.

Final version in 2011 JEP (IDEAS version). Syverson was at Kellogg recently presenting a new paper of his, with an all-star cast of coauthors, on the medical market. It’s well worth reading. Medical productivity is similarly heterogeneous, and since the medical sector is coming up on 20% of GDP, the sources of inefficiency in medicine are particularly important!

“Decentralization, Hierarchies and Incentives: A Mechanism Design Perspective,” D. Mookherjee (2006)

Lerner, Hayek, Lange and many others in the middle of the 20th century wrote exhaustively about the possibility for centralized systems like communism to perform better than decentralized systems like capitalism. The basic tradeoff is straightforward: in a centralized system, we can account for distributional concerns, negative externalities, etc., while a decentralized system can more effectively use local information. This type of abstract discussion about ideal worlds actually has great applications even to the noncommunist world: we often have to decide between centralization or decentralization within the firm, or within the set of regulators. I am continually amazed by how often the important Hayekian argument is misunderstood. The benefit of capitalism can’t have much to do with profit incentives per se, since (almost) every employee of a modern firm is not an owner, and hence is incentivized to work hard only by her labor contract. A government agency could conceivably use precisely the same set of contracts and get precisely the same outcome as the private firm (the principal-agent problem is identical in the two cases). The big difference is thus not profit incentive but the use of dispersed information.

Mookherjee, in a recent JEL survey, considers decentralization from the perspective of mechanism design. What is interesting here is that, if the revelation principle applies, there is no reason to use any decentralized decisionmaking system over a centralized one where the boss tells everyone exactly what they should do. That is, any contract where I could subcontract to A who then subsubcontracts to B is weakly dominated by a contract where I get both A and B to truthfully reveal their types and then contract with each myself. The same logic applies, for example, to whether a firm should have middle management or not. This suggests that if we want to explain decentralization in firms, we have only two roads to go down: first, show conditions where decentralization is just as good as centralization, or second, investigate cases where the revelation principle does not apply. In the context of recent discussions on this site of what “good theory” is, I would suggest that this is a great example of a totally nonpredictive theorem (revelation) being quite useful (in narrowing down potential explanations of decentralization) to a specific set of users (applied economic theorists).

(I am assuming most readers of a site like this are familiar with the revelation principle, but if not, it is just a couple lines of math to prove. Assume agents have information or types a in a set A. If I write them a contract F, they will tell me their type is G(a)=a’ where G is just a function that, for all a in A, chooses a’ to maximize u(F(a’)), where u is the utility the agent gets from the contract F by reporting a’. The contract given to an agent of type a, then, leads to outcome F(G(a)). If this contract exists, then just let H be the composition of F and G, H(a)=F(G(a)). H is now a “truthful” contract, since it is in each agent’s interest just to reveal their true type. That is, the revelation principle guarantees that any outcome from a mechanism, no matter how complicated or involving how many side payments or whatever, can be replicated by a contract where each agent just states what they know truthfully to the principal.)
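A tiny numerical check of this construction may help. The example below is entirely my own (two hypothetical types, three messages, made-up payoffs): compute each type’s optimal report G(a) under an indirect mechanism F, define the direct mechanism H = F∘G, and verify that no type gains by reporting the other type under H.

```python
# Toy check of the revelation principle (hypothetical types, messages, and payoffs).
types = ["low", "high"]
messages = ["m1", "m2", "m3"]
F = {"m1": "small job", "m2": "big job", "m3": "no job"}   # indirect mechanism

u = {  # utility of each outcome to each type
    ("small job", "low"): 3, ("big job", "low"): 1, ("no job", "low"): 0,
    ("small job", "high"): 2, ("big job", "high"): 5, ("no job", "high"): 0,
}

# G(a): the message type a optimally sends when facing the indirect mechanism F.
G = {a: max(messages, key=lambda m, a=a: u[(F[m], a)]) for a in types}

# Direct mechanism H = F o G: report a type, receive the outcome that type
# would have obtained by playing its best message under F.
H = {a: F[G[a]] for a in types}

# Truthfulness check: no type prefers to report the other type under H.
for a in types:
    assert all(u[(H[a], a)] >= u[(H[r], a)] for r in types), f"type {a} wants to lie"
print("direct mechanism:", H, "- truthful reporting is optimal for every type")
```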

First, when can decentralization do just as well as centralization even when the revelation principle applies? Consider choosing whether to (case 1) hire A who then subcontracts some work to B, or (case 2) just hire both A and B directly. If A is the only one who knows B’s production costs, then A will need to get informational rents in case 1 unless A and B produce perfectly complementary goods: without such rents, A has an incentive to take on a larger share of production by reporting that B is a high cost producer. Indeed, A is essentially “extracting” information rents both from B and from the principal by virtue of holding information that the principal cannot access. A number of papers have shown that this problem can be eliminated if A is risk-neutral and faces no limited liability constraint (so I can tax away ex-ante information rents), contracting is top-down (I contract with A before she learns B’s costs), and A’s production quantity is known (so I can optimally subsidize or tax this production).

More interesting is to consider when revelation fails. Mookherjee notes that the proof of the revelation principle requires 1) noncollusion among agents, 2) absence of communication costs, information processing costs, or contract complexity costs, and 3) no possibility of ex-post contract renegotiation by the principal. I note here that both the present paper and the hierarchy literature in general tend to shy away from ongoing relationships, but these are obviously relevant in many cases, and we know that in dynamic mechanism design, the revelation principle will not hold. The restricted message space literature is still rather limited, mainly because mechanism design theory at this point does not give any simple results like the revelation principle when the message space is restricted. It’s impossible to go over every result Mookherjee describes – this is a survey paper after all – but here is a brief summary. Limited message spaces are not a panacea, since the restrictions required for limited message spaces to motivate decentralization, and particularly middle management, are quite strong. Collusion among agents does offer some promise, though. Imagine A and B are next to each other on an assembly line, and B can see A’s effort. The principal just sees whether the joint production is successful or not. For a wide range of parameters, Baliga and Sjostrom (1998) proved that delegation is optimal: for example, pay B a wage conditional on output, and let him and A negotiate on the side how to divvy up that payment.

Much more work on the design of organizations is needed, that is for sure.

http://people.bu.edu/dilipm/publications/jeldecsurvrev.pdf (Final working paper – published in June 2006 JEL)

“Collaborating,” A. Bonatti & J. Horner (2011)

(Apologies for the long delay since the last post. I’ve been in that tiniest of Southeast Asian backwaters, East Timor, talking to UN and NGO folks about how the new democracy is coming along. The old rule of thumb is that you need 25 years of free and fair elections before society consolidates a democracy, but we still have a lot to learn about how that process takes place. I have some theoretical ideas about how to avoid cozy/corrupt links between government ministers and the private sector in these unconsolidated democracies, and I wanted to get some anecdotes which might guide that theory. And in case you’re wondering: I would give pretty high odds that, for a variety of reasons, the Timorese economy is going absolutely nowhere fast. Now back to the usual new research summaries…)

Teamwork is essential, you’re told from kindergarten on. But teamwork presents a massive moral hazard problem: how do I make sure the other guy does his share? In the static setting, Alchian and Demsetz (1972) and a series of papers by Holmstrom (May He Win His Deserved Nobel) long ago discussed why people will free ride when their effort is hidden, and what contracts can be written to avoid this problem. Bonatti and Horner make the problem dynamic, and with a few pretty standard tricks from optimal control develop some truly counterintuitive results.

The problem is the following. N agents are engaged in working on a project which is “good” with probability p. Agents exert costly effort continuously over time. Depending on the effort exerted by agents at any given time, a breakthrough occurs with some probability if the project is good, but never occurs if the project is bad. Over time, given effort along the equilibrium path, agents become more and more pessimistic about the project being good if no breakthrough occurs. The future is discounted. Agents only observe their own effort choice (but have correct beliefs about the effort of others in equilibrium). This means that off-path, beliefs of effort exertion are not common knowledge: if I deviate and work harder now, and no breakthrough occurs, then I am more pessimistic than others about the goodness of the project since I know, and they don’t, that a higher level of effort was put in.
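To see why pessimism builds up, here is a small sketch of the belief dynamics under assumed functional forms that are mine, not the paper’s exact specification (a breakthrough arrives at an exponential rate proportional to total team effort if the project is good, never if it is bad; the prior, hazard rate, and effort path are all illustrative): conditional on no breakthrough by time t, the posterior that the project is good declines in the cumulative effort supplied so far.

```python
import numpy as np

p0 = 0.7       # prior probability that the project is good (illustrative)
lam = 1.0      # breakthrough hazard per unit of cumulative effort, if the project is good
effort = 0.5   # constant total team effort per unit of time (assumed)

def posterior_good(t):
    """P(project is good | no breakthrough by time t), by Bayes' rule.
    If the project is good, no breakthrough by t has probability exp(-lam*effort*t);
    if it is bad, a breakthrough never arrives."""
    quiet_if_good = np.exp(-lam * effort * t)
    return p0 * quiet_if_good / (p0 * quiet_if_good + (1.0 - p0))

for t in [0.0, 1.0, 2.0, 5.0, 10.0]:
    print(f"t = {t:4.1f}: belief the project is good = {posterior_good(t):.3f}")
```

The off-path observation in the paper then follows directly: an agent who secretly supplied extra effort plugs a larger cumulative effort into the same Bayes' rule calculation, and so ends up strictly more pessimistic than teammates who believe the equilibrium effort path was followed.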

In this setting, not only do agents shirk (hoping the other agents will pick up the slack), but they also procrastinate. Imagine a two-period world, in which I can shift some effort to period 2 in the hope that the other agent’s period 1 effort will lead to a success. I don’t want to work extremely hard in period 1 when all that this leads to is wasted effort because my teammate has already solved the problem in that period. Note that this procrastination motive is absent when the team is of size 1: you need a coauthor to justify your slacking! Better monitoring here does not help, surprisingly. If I can see how much effort my teammate puts in each period, then what happens? If I decrease my period 1 effort, and this is observable by both agents, then my teammate will not be so pessimistic about the success of the project in period 2. Hence, she will work harder in period 2. Hence, each agent has an incentive to work less in period 1 vis-a-vis the hidden action case. (Of course, you may wonder why this is an equilibrium; that is, why doesn’t the teammate play grim trigger and punish me for shirking? It turns out there are a number of reasonable equilibria in the case with observable actions, some of which give higher welfare and some of which give lower welfare than under hidden action. The point is just that allowing observability doesn’t necessarily help things.)

So what have we learned? Three things in particular. First, work in teams gives extra incentive to procrastinate compared to solo work. Second, this means that setting binding deadlines can be welfare improving; the authors further show that the larger the team, the tighter the deadline necessary. Third, letting teams observe how hard the other is working is not necessarily optimal. Surely observability by a principal would be welfare-enhancing – the contract could be designed to look like dynamic Holmstrom – but observability between the agents is not necessarily so. Interesting stuff.

http://cowles.econ.yale.edu/P/cd/d16b/d1695.pdf (Final Cowles Foundation WP – paper published in April 2011 AER)

“Organizations as Information Processing Systems,” R. Daft & R. Lengel (1983)

I don’t believe this paper is well-known by economists, but it has been hugely influential for management and media studies. The theory in this paper is qualitative in the same way economic theory is, but is not mathematical. In this post, I’ll try to reinterpret the main ideas mathematically.

Firms face two primary types of uncertainty. First, the outside environment is uncertain. Second, the internal environment is uncertain. When speech is vague, a manager may misinterpret what the true state of the world is, or subordinates may misinterpret the goals of the organization. When speech is precise, it can be very costly to interpret. Indeed, precise speech about unclear goals is basically worthless: two subordinates may precisely state the answer to two different problems, both of which are different from what the manager wanted to know.

Choice of media, then, can vary. Sometimes speech within an organization is very formal: quantitative models, memos, etc. Sometimes it is informal: face-to-face meetings, informal legends, company lore. The informal speech is able to discuss a broader set of ideas, but with greater ambiguity. The formal speech can present specific ideas exactly, but nothing more. This tradeoff roughly implies the following: when the purpose of a discussion is equivocal or unclear, informal speech should be used to “get us on the same page”. When a discussion involves something routine, precise speech can be used. This has a number of implications: for example, informal communication will be most common at the goal setting stage, or when two different departments are beginning to work together on a task, but formal communication will be most common within a division or after goals have been agreed upon by all parties or when the external environment has less uncertainty.

Clearly, the intersection of language and economics is far more general. For example, equivocality is often introduced on purpose: people speak vaguely precisely so that common knowledge does not develop. An example, after a first date: “Would you like to come up to my apartment for some coffee?” Further, vague and precise speech are more than simply vague or precise, but rather are vague and precise in particular ways. Poetry is quoted rather than a meaningless stream of words, for example. Neither the authors nor I have much to say on these extensions, but it is definitely an open field right now for some interested researcher.

How might you model the ideas of the present paper mathematically? (Of course, you might ask why these ideas should be modeled mathematically anyway, but I have discussed many times here why social science theory ought be formal, and to the extent that it’s formal, the tools of mathematical logic allow the cleanest possible transmission of ideas and derivation of unexpected consequences, so I won’t rehash those arguments here. Indeed, the whole “should we be formal” discussion seems a bit too meta in the context of this post…) Let the relevant true state be a point in [0,1]^n. Let the cost of transmitting the exact state be increasing in its dimension, perhaps linearly, and let the cost of transmitting imprecise information increase less than linearly, perhaps logarithmically. Imprecise states are interpreted by the receiver with error (something like the truncated exponential version of a normal distribution to ensure we stay in [0,1]^n). Loss functions of the final decision made by the receiver depend on distance from the true state. What should a manager do? Well, on simple decisions where the relevant state is only a point on the line segment [0,1], getting the exact state is cheap, so subordinates should send the manager fairly precise information like a statistical estimate in a memo. On complex decisions, where the relevant state is a point in the 100-dimensional [0,1] hypercube, learning the true state will be very expensive (it may require the manager to read a 1000-page quantitative report, for instance), but learning an approximate state will be relatively cheap (it may involve some face-to-face conversations). Once the model is formalized like this, we can answer questions like “Should management communicate via a hierarchy or not?” I have some plans for work along these lines, using some ideas about transmitting counterfactuals given a set of information partitions, and would definitely appreciate comments concerning how to model this type of media richness.
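Here is one very rough way to put numbers on that sketch. All of the functional forms and parameters below are my own choices (linear cost for precise media, logarithmic cost for rich-but-noisy informal media, normal interpretation noise rather than the truncated version mentioned above), so treat it as a sketch of the tradeoff, not the model itself: as the dimension of the state grows, the precise channel loses to the informal one.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_cost(n, precise, trials=2000):
    """Communication cost plus expected decision loss for an n-dimensional state.
    Precise media: cost grows linearly in n, but the receiver learns the state exactly.
    Informal media: cost grows logarithmically in n, but each coordinate is
    interpreted with noise, and the loss is the expected squared distance."""
    if precise:
        return 1.0 * n
    comm_cost = 2.0 * np.log(1.0 + n)
    errors = rng.normal(0.0, 0.15, size=(trials, n))
    return comm_cost + np.mean(np.sum(errors ** 2, axis=1))

for n in [1, 5, 20, 100]:
    cost_precise = expected_cost(n, precise=True)
    cost_informal = expected_cost(n, precise=False)
    better = "precise memo" if cost_precise < cost_informal else "informal conversation"
    print(f"n = {n:>3}: precise {cost_precise:6.2f}, informal {cost_informal:6.2f} -> {better}")
```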

http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA128980&Location=U2&doc=GetTRDoc.pdf (Working paper)

“Hierarchies and the Organization of Knowledge in Production,” L. Garicano (2000)

The organization of firms, principally but not always in a hierarchy, is related to incentive constraint problems. But this is not all. Informational problems, particularly the transmission of relevant information across the firm to specific individuals who need it, and the collection of knowledge at specific areas in the firm, are also important. Focusing only on that issue, and ignoring incentives, what shape does the firm take?

Garicano considers the following model. Let a firm hire a group of homogeneous workers. There is a continuum of “problems” which workers will need to solve. If a worker knows the answer, he solves the problem himself; otherwise he asks someone else for the answer. Workers, after hiring but before production starts, are trained to solve certain problems, at a cost linear in the measure of solutions learned. When a solution is not evident and must be asked for, the receiver of the question bears a cost, even if she does not know the answer to that question; time spent answering queries is time not spent producing output. A given worker only has a unit interval of time to spend working or answering questions. Problems arrive at the firm every period according to a known distribution F(Z), where Z is reordered such that the most common problems occur on the left side of the distribution.
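A stripped-down two-layer version of this model already produces the pyramid shape. Everything below is my own simplification and parameterization (exponential problem difficulty, a fixed total amount of knowledge acquired by the firm, a single managerial layer, and illustrative learning and helping costs), so it is a sketch of the mechanism rather than Garicano’s general result: choosing how much knowledge to push down to the front line trades off learning costs against the number of managers needed to field the questions passed up.

```python
import numpy as np

lam = 1.0   # problem difficulty ~ exponential: F(z) = 1 - exp(-lam*z), common problems first
Z = 5.0     # total knowledge the firm acquires (covers ~99.3% of problems)
c = 0.1     # learning cost per unit of knowledge, in units of output
h = 0.5     # managerial time absorbed by each question passed up from below

def F(z):
    return 1.0 - np.exp(-lam * z)

def output_per_employee(z_p):
    """Production workers learn [0, z_p]; a single managerial layer learns (z_p, Z].
    Each production worker passes up a fraction 1 - F(z_p) of problems, and each
    question absorbs h units of managerial time, which pins down the ratio of
    managers to production workers."""
    managers_per_worker = h * (1.0 - F(z_p))
    learning = c * z_p + c * (Z - z_p) * managers_per_worker
    return (F(Z) - learning) / (1.0 + managers_per_worker)

grid = np.linspace(0.0, Z, 5001)
z_star = grid[np.argmax([output_per_employee(z) for z in grid])]
span = 1.0 / (h * (1.0 - F(z_star)))
print(f"optimal front-line knowledge: {z_star:.2f} (solves {F(z_star):.1%} of problems on the spot)")
print(f"production workers per manager: {span:.1f}")
```

Even in this crude version, the optimum has the front line learning the common problems and a thin managerial layer handling the rare ones, with roughly an order of magnitude more production workers than managers.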

In this simple model, one homogeneous group of production workers is trained to solve only the easiest problems, and other groups of workers are trained in successively harder problems. Production workers first try to solve the problem themselves, then ask the first level of non-production workers for a solution to problems they can’t solve, then ask the second level, etc. This can be proven in four parts; there is nothing tricky. Essentially, divide all the workers into classes, where a class specifies what knowledge is learned and in what order other classes are asked about problems. First, show that only one class ever produces: if two groups produce output, and the output they produce is not the same, then it is better to have workers in the less productive group specialize in knowledge and support the more productive group’s production. Second, knowledge never overlaps across classes, since overlapping knowledge is costly and never used. Third, production workers solve the easiest problems, the first level of management solves the slightly harder problems, and so on, with production workers asking levels of management in order until they learn the answer to a problem. If this weren’t true, we could swap an interval of the knowledge held by any two levels, keeping learning costs constant, but now letting easier questions be answered “earlier”, hence reducing communication costs. Fourth, the organization is a hierarchy with fewer workers at higher levels. That final result essentially comes from the fact that only really uncommon problems are solved by workers at the top, hence not many of them are needed.

Making assumptions about the distribution of problem difficulty, Garicano also solves for a number of comparative statics when learning costs and communication costs change, or when the production process becomes less predictable in the sense that the distribution of problems shifts rightward. The exact results here rely partly on strong assumptions about the size of the organization, so I omit them here (essentially, everything is proven conditional on a very large firm and exponentially distributed problem difficulty, though the intuition behind most of the results probably wouldn’t change were these assumptions to be loosened).

Three final thoughts, related to my current research. First, is getting knowledge to production workers really the most salient informational issue in firms? This seems backward. One might think that, to the extent firms are organized for knowledge aggregation and transmission reasons, the most important decisions are the ones faced by the boss/president/lead prosecutor, and it is she who uses information held by others in the firm in order to inform her decision. Second, is not contingency-related decisionmaking a relevant concern? That is, often, firms do not even know what problems will arise. Generally, managers specialize in making decisions under those circumstances, solving problems whose existence is probably unknown when the firm begins operation. Third, is intrafirm training, particularly at the manager level, that important in real firms? Again, one might imagine that potential workers arrive at the firm endowed with certain knowledge, and then are placed into a role in the firm conditional on that knowledge. This isn’t to say that training doesn’t happen, but surely training paid for by the firm is not always the most common way relevant knowledge is acquired.

http://www2.wiwi.hu-berlin.de/institute/organisation/Lehre/WS_2004_05/Adv_Topic_04-05/material/gar_00.pdf (Final JPE version)

“How to Count to One Thousand,” J. Sobel (1992)

You have a stack of money, supposedly containing one thousand coins. You want to make sure that count is accurate. However, with probability p, you will make a mistake at each step of the counting, and will know you’ve made the mistake (“five hundred and twelve, five hundred and thirteen, five hundred and… wait, how many was I at?”). What is the optimal way to count the coins? And what does this have to do with economics?

The optimal way to count to one thousand turns out to be precisely what intuition tells you. Count a stack of coins, perhaps forty of them, set that stack aside, count another forty, set that aside, and so on, then count at the end to make sure you have twenty-five stacks. If your probability of making a mistake is very high, you may wish only to count ten coins at a time, set them aside, then count ten stacks of ten, setting those superstacks aside, then counting at the end to make sure you have ten stacks of one hundred. The higher the number of coins, and the higher your probability of making a mistake, the more “levels” you will need to build. Proving this is a rather straightforward dynamic programming exercise.
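A quick dynamic-programming sketch reproduces this intuition. The reading of the setup below is mine (each coin counted is fumbled with probability p, the fumble is noticed immediately, and only the current stack must be recounted; the parameters are illustrative), so take it as a sketch rather than Sobel’s exact formulation: counting a stack of s items is the classic problem of waiting for s consecutive successes, and the counter then chooses the stack size, recursively, to minimize expected total counting operations.

```python
from functools import lru_cache
from math import ceil

p = 0.01        # probability of fumbling any single count (illustrative)
q = 1.0 - p

def expected_passes(s):
    """Expected individual count operations to get through a stack of s items when a
    noticed error forces a restart of that stack: the expected number of trials
    needed to observe s consecutive successes, (q**-s - 1) / (1 - q)."""
    return (q ** (-s) - 1.0) / (1.0 - q)

@lru_cache(maxsize=None)
def best_cost(n):
    """Minimum expected count operations to verify n items, allowing subdivision into
    stacks of size s that are themselves counted recursively (approximate when s
    does not divide n evenly)."""
    if n <= 1:
        return 0.0
    best = expected_passes(n)                 # count everything in one long pass
    for s in range(2, n):
        stacks = ceil(n / s)
        best = min(best, stacks * expected_passes(s) + best_cost(stacks))
    return best

print(f"count 1000 coins in one pass: {expected_passes(1000):,.0f} expected operations")
print(f"with optimal subdivision:     {best_cost(1000):,.0f} expected operations")
```

With these illustrative numbers the one-pass count takes millions of operations in expectation, while subdividing brings it down to a bit over a thousand; raising p makes the optimal stacks smaller and adds layers, exactly as described above.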

Imagine you’ve hired workers to perform these tasks. If tasks cannot be subdivided, the fastest workers should be assigned to count the first layer of stacks (since they will be repeating the task most often after mistakes are made) and the most accurate are assigned to do the later counts (since they “destroy more value” when a mistake is made, as in Kremer’s O-Ring paper). The counting process will suffer from decreasing returns to scale – the more coins to count, the more value is destroyed on average by a mistake. With optimal subdivision, the number of extra counts needed to make sure the number of stacks is accurate grows slower than the number of coins to be counted, and the optimal stack size is independent of the total number of coins, so counting technology has almost-constant returns to scale.

The basic idea here tells us something about the boundary and optimal organization of a firm, but in a very stylized way. If workers only imperfectly know when mistakes are made, the problem is more difficult, and is not solved by Sobel. If workers definitely do not know when a mistake is made, there still can be gains to subdividing. Sobel mentions a parable about prisoners told by Rubinstein. There are two prisoners who want to coordinate an escape 89 days from now. Both prisoners can see the sun out their window. The odds of one of the two mistaking the day count after that long are quite high, causing a lack of coordination. If both prisoners can also see the moon, though, they need only count three full moons plus five days.

http://www.jstor.org/stable/pdfplus/2234847.pdf?acceptTC=true (JSTOR gated version – I couldn’t find an ungated copy. Prof. Sobel, hire one of your students to put all of your old papers up on your website!)

“Secrets,” D. Ellsberg (2002)

Generally, the public won’t know even the most famous economists – mention Paul Samuelson to your non-economist friends and watch the blank stares – but a select few manage to enter the zeitgeist through something other than their research. Friedman had a weekly column and a TV series, Krugman is regularly in the New York Times, and Greenspan, Summers and Romer, among many others, are famous for their governmental work. These folks at least have their fame attributable to their economics, if not their economic research. The real rare trick is being both a famous economist and famous in another way. I can think of two.

First is Paul Douglas, of the Cobb-Douglas production function. Douglas was a Chicago economist who went on to become a long-time U.S. Senator. MLK Jr. called Douglas “the greatest of all Senators” for his work on civil rights. In ’52, with Truman’s popularity at a nadir, Douglas was considered a prohibitive favorite for the Democratic nomination had he run. I think modern-day economists would very much like Douglas’ policies: he was a fiscally conservative, socially liberal reformist who supported Socialists, Democrats and Republicans at various times, generally preferring the least-corrupt technocrat.

The other famous-for-non-economics-economist, of course, is Daniel Ellsberg. Ellsberg is known to us for the Ellsberg Paradox, which in many ways is more important than the work of Tversky and Kahneman for encouraging non-expected utility derivations by decision theorists. Ellsberg would have been a massive star had he stayed in econ: he got his PhD in just a couple years, published his undergrad thesis (“the Theory of the Reluctant Duelist”) in the AER, his PhD thesis in the QJE, and was elected to the Harvard Society of Fellows, joining Samuelson and Tobin in that still-elite group.

As with many of the “whiz kids” of the Kennedy and Johnson era, he consulted for the US government, both at RAND and as an assistant to the Undersecretary of Defense. Government was filled with theorists at the time – Ellsberg recounts meetings with Schelling and various cabinet members where game theoretic analyses were discussed. None of this made Ellsberg famous, however: he entered popular culture when he leaked the “Pentagon Papers” early in the Nixon presidency. These documents were a top secret, internal government report on presidential decisionmaking in Vietnam going back to Eisenhower, and showed a continuous pattern of deceit and overconfidence by presidents and their advisors.

Ellsberg’s description of why he leaked the data, and the consequences thereof, are interesting in and of themselves. But what interests me in this book – from the perspective of economic theory – is what the Pentagon Papers tell us about secrecy within organizations. Governments and firms regularly make decisions, as an entity, where optimal decisionmaking depends on correctly aggregating information held by various employees and contractors. Standard mechanism design is actually very bad at dealing with desires for secrecy within this context. That is, imagine that I want to aggregate information but I don’t want to tell my contractors what I’m going to use it for. A paper I’m working on currently says this goal is basically hopeless. A more complicated structure is one where a firm has multiple levels (in a hierarchy, let’s say), and the bosses want some group of low-level employees to take an action, but don’t want anyone outside the branch of the organizational tree containing those employees to know that such an action was requested. How can the boss send the signal to the low-level employees without those employees thinking their immediate boss is undermining the CEO? Indeed, something like this problem is described in Ellsberg’s book: Nixon and Kissinger were having low-level soldiers fake flight reports so that it would appear that American planes were not bombing Laos. The Secretary of Defense, Laird, did not support this policy, so Nixon and Kissinger wanted to keep this secret from him. The jig was up when some soldier on the ground contacted the Pentagon because he thought that his immediate supervisors were bombing Laos against the wishes of Nixon!

In general, secrecy concerns make mechanism problems harder because they can undermine the use of the revelation principle – we want the information transmitted without revealing our type. More on this to come. Also, if you can think of any other economists who are most famous for their non-economic work, like Douglas and Ellsberg, please post in the comments.

(No link – Secrets is a book and I don’t see it online. Amazon has a copy for just over 6 bucks right now, though).

“Who Will Monitor the Monitor?,” D. Rahman (2010)

In any organization, individuals can shirk by taking advantage of the fact that their actions are private; only a stochastic signal of effort can be observed, for instance. Because of this, firms and governments hire monitors to watch, imperfectly, what workers are doing, and to punish the workers if it is believed that the workers are taking actions contrary to what the bosses desire. Even if the monitor observes signals that are not available to the bosses, as long as that observation is free, the monitor has no incentive to lie. But what if monitoring is costly? How can we ensure the monitor has the right incentives to do his job? That is, who shall monitor the monitor? The answer, clearly, isn’t a third level of monitors, since this just pushes the problem back one more level.

In a very interesting new paper, David Rahman extends Holmstrom’s (who should share the next Nobel with Milgrom; it’s nuts they both haven’t won yet!) group incentives. The idea of group incentives is simple, and it works when the monitor’s statements are verifiable. Say it costs 1 to monitor and the agent’s disutility from work is also 1. The principal doesn’t mind an equilibrium of (monitor, work), but better would be the equilibrium (don’t monitor, work), since then I don’t need to pay a monitor to watch my workers. The worker will just shirk if no one watches him, though. Group penalties fix this. Tell the monitor to check only one percent of the time. If he reports (verifiably) that the worker shirked, nobody gets paid. If he reports (verifiably) that the worker worked, the monitor gets $1.02 and the worker gets $100. By increasing the payment to the worker for “good news”, the firm can get arbitrarily close to the payoffs from the “never monitor, work” equilibrium.

That’s all well and good, but what about when the monitor’s reports are not verifiable? In that case, the monitor would never actually check but would just report that the worker worked, and the worker would always shirk. We can use the same idea as in Holmstrom, though, and sometimes ask the worker to shirk. Make payments still have group penalties, but pay the workers only when the report matches the recommended action – that is, pay for “monitor/shirk” and “monitor/work”. For the same reason as in the above example, the frequency of monitoring and shirking can both be made arbitrarily small, with the contract still incentive compatible (assuming risk neutrality, of course).
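The arithmetic behind that claim can be checked directly. The sketch below is a stylized version in my own notation with made-up numbers, not the paper’s exact contract: the principal privately asks the monitor to inspect with probability q and privately tells the worker to shirk with probability eps, and pays the worker W and the monitor B only when a filed report matches the worker’s instruction.

```python
e, c = 1.0, 1.0      # worker's effort cost and monitor's cost per inspection
eps = 0.05           # probability the worker is (privately) told to shirk
q = eps ** 2         # probability the monitor is (privately) asked to inspect

W = 1.01 * e / q     # worker's pay when a filed report matches his instruction
B = 1.01 * c / eps   # monitor's pay when his report matches the instruction

# Worker told to "work": comply -> paid W only if inspected, minus effort cost;
# deviate (shirk) -> any filed report reads "shirk", mismatching the instruction,
# so he is never paid. (When told to "shirk", complying is trivially better.)
worker_comply, worker_deviate = q * W - e, 0.0

# Monitor asked to inspect: inspect -> his report always matches a compliant
# worker's instruction; skip the inspection and guess "work" -> the guess only
# matches when the (unseen) instruction actually was "work".
monitor_comply, monitor_deviate = B - c, B * (1.0 - eps)

print("worker IC holds: ", worker_comply >= worker_deviate)
print("monitor IC holds:", monitor_comply >= monitor_deviate)
print(f"expected wage bill per period: {q * W:.3f} (effort cost is {e})")
print(f"expected monitor pay: {q * B:.4f} vs expected inspection cost {q * c:.4f}")
print(f"output lost to recommended shirking: {eps:.0%} of periods")
```

Shrinking eps and q together, with q vanishing faster, drives the frequency of monitoring, the monitor’s rents, and the recommended shirking toward zero while both inequalities continue to hold; risk neutrality matters because W explodes even though the expected wage bill stays pinned near the effort cost.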

More generally, a nice use of the Minimax theorem shows that we can check for deviations from the bosses’ recommended actions by the monitor and the agent one by one – that is, we needn’t check for all deviations simultaneously. So-called “detectable” deviations are shut down by contracts like the one in the example above. Undetectable deviations by the monitor still fulfill the monitoring role – by virtue of being undetectable, the agent won’t notice the deviation either – but it turns out that finiteness of the action space is enough to save us from an infinite regress of profitable undetectable deviations, and therefore a strategy like the one in the example above does allow for “almost” optimal costly and unverifiable monitoring.

Two quick notes: First, collusion, as Rahman notes, can clearly take place in this model (each agent just tells the other when he is told to monitor or to shirk), so it really speaks only to situations where we don’t expect such collusion. Second, this model is quite nice because it clarifies, again, that monitoring power needn’t be vested in a principal. That is, the monitor here collects no residual profits or anything of that sort – he is merely a “security guard”. Separating the monitoring role of agents in a firm from the management role is particularly important when we talk about more complex organizational forms, and I think it’s clear that the question of how to do so is far from being completely answered.

http://www.econ.umn.edu/~dmr/monitor.pdf (WP – currently R&R at AER and presumably will wind up there…)
