
“Minimal Model Explanations,” R.W. Batterman & C.C. Rice (2014)

I unfortunately was overseas and wasn’t able to attend the recent Stanford conference on Causality in the Social Sciences; a friend organized the event and was able to put together a really incredible set of speakers: Nancy Cartwright, Chuck Manski, Joshua Angrist, Garth Saloner and many others. Coincidentally, a recent issue of the journal Philosophy of Science had an interesting article quite relevant to economists interested in methodology: how is it that we learn anything about the world when we use a model that is based on false assumptions?

You might think of nearly every paper published in the best economics journals as falling into one of five classes. First are pure theoretical exercises, or “tool building”, such as investigations of the properties of equilibria or the development of a new econometric technique. Second are abstract models which are meant to speak to an applied problem. Third are empirical papers whose primary quantities of interest are the parameters of an economic model (broadly, “structural papers”, although this isn’t quite the historic use of the term). Fourth are empirical papers whose primary quantities of interest are causal treatment effects (broadly, “reduced form papers”, although again this is not the historic meaning of that term). Fifth are descriptive works or historical summaries. Lab and field experiments, and old-fashioned correlation analysis, all fit into that framework fairly naturally as well. It is the second and third classes which seem very strange to many non-economists. We write a model which is deliberately abstract and which is based on counterfactual assumptions about human or firm behavior, but nonetheless we feel that these types of models are “useful” or “explanatory” in some sense. Why?

Let’s say that in the actual world, conditions A imply outcome B via implication C (perhaps causal, perhaps as part of a simultaneous equilibrium, or whatever). The old Friedman 1953 idea is that a good model predicts B well across all questions with which we are concerned, and the unreality of the assumptions (or implicitly of the logical process C) is unimportant. Earlier literature in the philosophy of science has suggested that “minimal models” explain because A’, a subset of A, are sufficient to drive B via C; that is, the abstraction merely strips away any assumptions that are not what the philosopher Weisberg calls “explanatorily privileged causal factors.” Pincock, another philosopher, suggests that models track causes, yes, but also isolate factors and connect phenomena via mathematical similarity. That is, the model focuses on causes A’, subset of A, and on implications C’, subset of C, which are of special interest because they help us see how the particular situation we are analyzing is similar to ones we have analyzed before.

Batterman and Rice argue that these reasons are not why minimal models “work”. For instance, if we are to say that a model explains because it abstracts only to the relevant causal factors, the question is how we know what those factors are in advance of examining them. Consider Fisher’s sex ratio model: why do we so frequently see 1:1 sex ratios in nature? He argues that there is a fitness advantage for those whose offspring tend toward the less common sex, since they find it easier to procreate. In the model, parents choose the sex of offspring, reproduction is asexual (does not involve matching), no genetic recombination occurs, there are no random changes to genes, etc: many of the assumptions are completely contrary to reality. Why, then, do we think the model explains? It explains because there is a story about why the omitted factors are irrelevant to the behavior being explained. That is, in the model assumptions D generate E via causal explanation C, and there is a story about why D->E via C and A->B via C operate in similar ways. Instead of simply assuming that certain factors are “explanatorily privileged”, we show that the model’s factors affect outcomes in ways similar to how the more complicated real-world objects operate.

Interesting, but I feel that this still isn’t what’s going on in economics. Itzhak Gilboa, the theorist, in a review of Mary Morgan’s delightful book The World in the Model, writes that “being an economic theorist, I have been conditioned to prefer elegance over accuracy, insight over detail.” I take that to mean that what economic theorists care about are explanatory factors or implications C’, subset of C. That is, the deduction is the theory. Think of Arrow’s possibility theorem. There is nothing “testable” about it; certainly the theory does not make any claim about real world outcomes. It merely shows the impossibility of preference aggregation satisfying certain axioms, full stop. How is this “useful”? Well, the usefulness of this type of abstract model depends entirely on the user. Some readers may find such insight trivial, or uninteresting, or whatever, whereas others may find such an exploration of theoretical space helps clarify their thinking about some real world phenomenon. The whole question of “Why do minimal models explain/work/predict?” is less interesting to me than the question “Why do minimal models prove useful for a given reader?”

The closest philosophical position to this idea is some form of Peirce-style pragmatism – he actually uses a minimal model himself in exactly this way in his Note on the Economy of the Theory of Research! I also find it useful to think about the usefulness of abstract models via Economic Models as Analogies, an idea pushed by Gilboa and three other well-known theorists. Essentially, a model is a case fully examined. Examining a number of cases in the theoretical world, and thinking formally through those cases, can prove useful when critiquing new policy ideas or historical explanations about the world. The theory is not a rule – and how could it be given the abstractness of the model – but an element in your mental toolkit. In physics, for example, if your engineer proposes spending money building a machine that implies perpetual motion, you have models of the physical world in your toolkit which, while not being about exactly that machine, are useful when analyzing how such a machine would or would not work. Likewise, if Russia wants to think about how it should respond to a “sudden stop” in investment and a currency outflow, the logical consequences of any real world policy are so complex that it is useful to have thought through the equilibrium implications of policies within the context of toy models, even if such models are only qualitatively useful or only useful in certain cases. When students complain, “but the assumptions are so unrealistic” or “but the model can’t predict anything”, you ought respond that the model can predict perfectly within the context of the model, and it is your job as the student, as the reader, to consider how understanding the mechanisms in the model helps you think more clearly about related problems in the real world.

Final version in Philosophy of Science, which is gated, I’m afraid; I couldn’t find an ungated draft. Of related interest in the philosophy journals recently is Kevin Davey’s Can Good Science Be Logically Inconsistent? in Synthese. Note that economists use logically inconsistent reasoning all the time, in that we use a model with assumption A in context B, and a model with assumption Not A in context C. If “accepting a model” means thinking of the model as “justified belief”, then Davey provides very good reasons to think that science cannot be logically inconsistent. If, however, “accepting a model” means “finding it useful as a case” or “finding the deduction in the model of inherent interest”, then of course logically inconsistent models can still prove useful. So here’s to inconsistent economics!

“What Do Small Businesses Do?,” E. Hurst & B. Pugsley (2011)

There are a huge number of policies devoted to increasing the number of small businesses. The assumption, it seems, is that small businesses are generating more spillovers than large businesses, in terms of innovation, increases in the labor match rate, or indirect welfare benefits from creative destruction. Indeed, politicians like to think of these “Joe the Plumber” types as heroic job creators, although I’m not sure what that could possibly mean since the long run level of unemployment is constant and unrelated to the amount of entrepreneurial churn in whatever economic model or empirical data you wish to investigate.

These policies raise the question: are new firms actually quick-growing, innovative concerns, or are they mainly small restaurants, doctor’s offices and convenience stores? The question is important since it is tough to see why the tax code should privilege, say, an independent convenience store over a new corporate-run branch – if anything, the independent is less innovative and less likely to grow in the future. Erik Hurst and Ben Pugsley do a nice job of generating stylized facts on these issues using a handful of recent surveys of firm outcomes and the stated goals of the owners of new firms.

The evidence is pretty overwhelming that most new firms are not heroic, job-creating innovators. Among firms with fewer than 20 employees, most are concentrated in a very small number of industries like construction, retail, restaurants, etc, and this concentration is much more evident than among larger firms. Most small firms never hire more than a couple of employees, and this is true even among firms that survive five or ten years. Among new firms, only 2.7% file for a patent within four years, and only 6-8% develop any proprietary product or technique at all.

It is not only in outcomes but also in expectations that small businesses appear not to be rapidly-growing, innovative firms. At their origin, 75% of small business owners report no desire to grow their business, nonpecuniary reasons (such as “to be my own boss”) are the most common reason given to start a business, and only 10% plan to develop any new product or process. That is, most small businesses are like the corner doctor’s office or small plumbing shop. Starting a business for nonpecuniary reasons is also correlated with not wanting to grow, not wanting to innovate, and not actually doing so. They are small and non-innovative because they don’t want to be big, not because they fail at trying to become big. It’s also worth mentioning that hardly any small business owners in the U.S. sample report starting a business because they couldn’t find a job; the opposite is true in developing countries.

These facts make it really hard to justify a lot of policy. For instance, consider subsidies that only accrue to businesses below a certain size. This essentially raises the de facto marginal tax rate on growing firms (since the subsidy disappears once the firm grows above a certain size), even though rapidly growing small businesses are exactly the type we presumably are trying to subsidize. If liquidity constraints or other factors limiting firm entry were important, then the subsidies might still be justified, but it seems from Hurst and Pugsley’s survey that all these policies will do is increase entry among business owners who want to be their own boss and who never plan to hire or innovate in any economically important way. A lot more work here, especially on the structural/theoretical side, is needed to develop better entrepreneurial policies (I have a few thoughts myself, so watch this space).

Final Working Paper (RePEc IDEAS) which was eventually published in the Brookings series. Also see Haltiwanger et al’s paper showing that it’s not small firms but young firms which are engines of growth. I posted on a similar topic a few weeks ago, which may be of interest.

Labor Unions and the Rust Belt

I’ve got two nice papers for you today, both exploring a really vexing question: why is it that union-heavy regions of the US have fared so disastrously over the past few decades? In principle, it shouldn’t matter: absent any frictions, a rational union and a profit-maximizing employer ought both desire to take whatever actions generate the most total surplus for the firm, with union power simply affecting how those rents are shared between management, labor and owners. Nonetheless, we notice empirically a couple of particularly odd facts. First, especially in the US, union-dominated firms tend to limit adoption of new, productivity-enhancing technology; the late adoption of the radial tire among U.S. firms is a nice example. Second, unions often negotiate not only about wages but about “work rules”, insisting upon conditions like inflexible employee roles. A great example here is a California longshoremen contract which insisted upon a crew whose sole job was to stand and watch while another crew did the job. Note that preference for leisure can’t explain this, since surely taking that leisure at home rather than standing around the worksite would be preferable for the employees!

What, then, might drive unions to push so hard for seemingly “irrational” contract terms, and how might union bargaining power under various informational frictions or limited commitment affect the dynamic productivity of firms? “Competition, Work Rules and Productivity” by the BEA’s Benjamin Bridgman discusses the first issue, and a new NBER working paper, “Competitive Pressure and the Decline of the Rust Belt: A Macroeconomic Analysis” by Alder, Lagakos and Ohanian covers the second; let’s examine these in turn.

First, work rules. Let a union care first about keeping all members employed, and about keeping the wage as high as possible given full employment. Assume that the union cannot negotiate the price at which products are sold. Abstractly, work rules are most like a fixed cost that is a complete waste: no matter how much we produce, we have to incur some bureaucratic cost of guys standing around and the like. Firms will set marginal revenue equal to marginal cost when deciding how much to produce, and at what price that production should be sold. Why would the union like these wasteful costs?

Let firm output given n workers just be n-F, where n is the number of employees, and F is how many of them are essentially doing nothing because of work rules. The firm chooses price p and the number of employees n given demand D(p) and wage w to maximize p*D(p)-w*n, subject to total production being feasible D(p)=n-F. Note that, as long as total firm profits under optimal pricing exceed the cost w*F of the idle workers, the firm stays in business and its pricing decision, letting marginal revenue equal marginal cost, is unaffected by F. That is, the optimal production quantity does not depend on F. However, the total amount of employment does depend on F, since to produce quantity D(p) you need to employ n=D(p)+F workers. Hence there is a tradeoff if the union only negotiates wages: to employ more people, you need a lower wage, but using wasteful work rules, employment can be kept high even when wages are raised. Note also that F is limited by the total rents earned by the firm, since if work rules are particularly onerous, firms that are barely breaking even without work rules will simply shut down. Hence in more competitive industries (formally, when the demand the firm faces is more elastic), work rules are less likely to be imposed by unions. Bridgman also notes that if firms can choose technology (output is An-F, where A is the level of technology), then unions will resist new technology unless they can impose more onerous work rules, since more productive technology lowers the number of employees needed to produce a given amount of output.
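
To make the mechanics concrete, here is a minimal numerical sketch of this pricing logic, with an assumed linear demand curve and made-up parameters (none of these numbers come from Bridgman’s paper):

```python
# Minimal sketch of the work-rules model above (assumed linear demand, made-up numbers).
import numpy as np

a, b, w = 100.0, 1.0, 20.0              # assumed demand intercept/slope and wage

def firm_problem(F):
    """Choose price to maximize (p - w) * D(p) - w * F, with D(p) = a - b*p."""
    prices = np.linspace(w, a / b, 10_000)
    demand = a - b * prices
    profit = (prices - w) * demand - w * F
    i = profit.argmax()
    return prices[i], demand[i], demand[i] + F, profit[i]   # p*, output, employment, profit

for F in [0, 5, 10]:
    p, q, n, pi = firm_problem(F)
    print(f"F={F:2d}  p*={p:6.2f}  output={q:6.2f}  employment={n:6.2f}  profit={pi:8.2f}")
# Price and output are unchanged by F; only employment rises and profit falls, which is
# exactly the margin the union exploits when it can only bargain over wages and work rules.
```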

This is a nice result. Note that the work rule requirements have nothing to do with employees not wanting to work hard, since work rules in the above model are a pure waste and generate no additional leisure time for workers. Of course, this result really hinges on limiting what unions can bargain over: if they can select the level of output, or can impose the level of employment directly, or can permit lump-sum transfers from management to labor, then unionized firms will produce at the same productivity as non-unionized firms. Information frictions, among other worries, might be a reason why we don’t see these types of contracts at some unionized firms. With this caveat in mind, let’s turn to the experience of the Rust Belt.

The U.S. Rust Belt, roughly made up of states surrounding the Great Lakes, saw a precipitous decline from the 1950s to today. Alder et al present the following stylized facts: the share of manufacturing employment in the U.S. located in the Rust Belt fell from the 1950s to the mid-1980s, there was a large wage gap between Rust Belt and other U.S. manufacturing workers during this period, Rust Belt firms were less likely to adopt new innovations, and labor productivity growth in Rust Belt states was lower than the U.S. average. After the mid-1980s, Rust Belt manufacturing firms begin to look a lot more like manufacturing firms in the rest of the U.S.: the wage gap is essentially gone, the employment share stabilizes, strikes become much less common, and productivity growth is similar. What happened?

In a nice little model, the authors point out that output competition (do I have lots of market power?) and labor market bargaining power (are my workers powerful enough to extract a lot of my rents?) interact in an interesting way when firms invest in productivity-increasing technology and when unions cannot commit to avoid a hold-up problem by striking for a better deal after the technology investment cost is sunk. Without commitment, stronger unions will optimally bargain away some of the additional rents created by adopting an innovation, hence unions function as a type of tax on innovation. With sustained market power, firms have an ambiguous incentive to adopt new technology – on the one hand, they already have a lot of market power and hence better technology will not win them many more sales, but on the other hand, having market power in the future makes investments today more valuable. Calibrating the model with reasonable parameters for market power, union strength, and various elasticities, the authors find that roughly 2/3 of the decline in the Rust Belt’s manufacturing share can be explained by strong unions and little output market competition decreasing the incentive to invest in upgrading technology. After the 1980s, declining union power and more foreign competition limited both disincentives, and the Rust Belt saw little further decline.
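
The hold-up channel is simple enough to write down directly. Here is a toy calculation with made-up numbers (not the authors’ calibration) in which a union that cannot commit captures a share s of the gains only after the adoption cost is sunk:

```python
# Toy hold-up calculation (made-up numbers, not the Alder-Lagakos-Ohanian calibration).
def adopts(dV, I, s, periods=10):
    """Adopt iff the firm's share of the new rents covers the sunk adoption cost I."""
    firm_gain = (1 - s) * dV * periods   # firm keeps only (1 - s) of the extra profit dV per period
    return firm_gain > I

dV, I = 10.0, 60.0                       # total gain 100 over ten periods vs. cost 60
for s in [0.0, 0.3, 0.5, 0.7]:
    print(f"union share s={s:.1f}: adopt? {adopts(dV, I, s)}")
# Once the union's ex-post share is large enough, a socially valuable upgrade is not adopted:
# union power without commitment acts like a tax on innovation.
```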

Note again that unions and firms rationally took actions that lowered the total surplus generated in their industry, and that if the union could have committed not to hold up the firm after an innovation was adopted, optimal technology adoption would have been restored. Alder et al cite some interesting quotes from union heads suggesting that the confrontational nature of U.S. management-union relations led to a belief that management figures out profits, and unions figure out how to secure part of that profit for their members. Both papers discussed here show that this type of division, by limiting the nature of bargains which can be struck, can have calamitous effects for both workers and firms.

Bridgman’s latest working paper version is here (RePEc IDEAS page); the latest version of Alder, Lagakos and Ohanian is here (RePEc IDEAS). David Lagakos in particular has a very nice set of recent papers about why services and agriculture tend to have such low productivity, particularly in the developing world; despite his macro background, I think he might be a closet microeconomist!

Nobel Prize 2014: Jean Tirole

A Nobel Prize for applied theory – now this is something I can get behind! Jean Tirole’s prize announcement credits him for his work on market power and regulation, and there is no question that he is among the leaders, if not the world leader, in the application of mechanism design theory to industrial organization; indeed, the idea of doing IO in the absence of this theoretical toolbox seems so strange to me that it’s hard to imagine anyone had ever done it! Economics is sometimes defined by a core principle that agents – people or firms – respond to incentives. Incentives are endogenous; how my bank or my payment processor or my lawyer wants to act depends on how other banks or other processors or other lawyers act. Regulation is therefore a game. Optimal regulation is therefore a problem of mechanism design, and we now have mathematical tools that allow investigation across the entire space of potential regulating mechanisms, even those that are counterfactual. That is an incredibly powerful methodological advance, so powerful that there will be at least one more Nobel (Milgrom and Holmstrom?) based on this literature.

Because Tirole’s toolbox is theoretical, he has written an enormous amount of “high theory” on the implications of the types of models modern IO economists use. I want to focus in this post on a particular problem where Tirole has stood on both sides of the divide: that of the seemingly obscure question of what can be contracted on.

This literature goes back to a very simple question: what is a firm, and why do they exist? And when they exist, why don’t they grow so large that they become one giant firm a la Schumpeter’s belief in Capitalism, Socialism, and Democracy? One answer is that given by Coase and updated by Williamson, among many others: transaction costs. There are some costs of haggling or similar involved in getting things done with suppliers or independent contractors. When these costs are high, we integrate that factor into the firm. When they are low, we avoid the bureaucratic costs needed to manage all those factors.

For a theorist trained in mechanism design, this is a really strange idea. For one, what exactly are these haggling or transaction costs? Without specifying what precisely is meant, it is very tough to write a model incorporating them and exploring the implications of them. But worse, why would we think these costs are higher outside the firm than inside? A series of papers by Sandy Grossman, Oliver Hart and John Moore points out, quite rightly, that firms cannot make their employees do anything. They can tell them to do something, but the employees will respond to incentives like anyone else. Given that, why would we think the problem of incentivizing employees within an organization is any easier or harder than incentivizing them outside the organization? The solution they propose is the famous Property Rights Theory of the firm (which could fairly be considered the most important paper ever published in the illustrious JPE). This theory says that firms are defined by the assets they control. If we can contract on every future state of the world, then this control shouldn’t matter, but when unforeseen contingencies arise, the firm still has “residual control” of its capital. Therefore, efficiency depends on the allocation of scarce residual control rights, and hence the allocation of these rights inside or outside of a firm is important. Now that is a theory of the firm – one well-specified and based on incentives – that I can understand. (An interesting sidenote: when people think economists don’t really understand the economy because, hey, they’re not rich, we can at least point to Sandy Grossman. Sandy, a very good theorist, left academia to start his own firm, and as far as I know, he is now a billionaire!)

Now you may notice one problem with Grossman, Hart and Moore’s papers. As there was an assumption of nebulous transaction costs in Coase and his followers, there is a nebulous assumption of “incomplete contracts” in GHM. This seems reasonable at first glance: there is no way we could possibly write a contract that covers every possible contingency or future state of the world. I have to imagine everyone that has ever rented an apartment or leased a car or ran a small business has first-hand experience with the nature of residual control rights when some contingency arises. Here is where Tirole comes in. Throughout the 80s and 90s, Tirole wrote many papers using incomplete contracts: his 1994 paper with Aghion on contracts for R&D is right within this literature. In complete contracting, the courts can verify and enforce any contract that relies on observable information, though adverse selection (hidden information by agents) or moral hazard (unverifiable action by agents) may still exist. Incomplete contracting further restricts the set of contracts to a generally simple set of possibilities. In the late 1990s, however, Tirole, along with his fellow Nobel winner Eric Maskin, realized in an absolute blockbuster of a paper that there is a serious problem with these incomplete contracts as usually modeled.

Here is why: even if we can’t ex-ante describe all the future states of the world, we may still ex-post be able to elicit information about the payoffs we each get. As Tirole has noted, firms do not care about indescribable contingencies per se; they only care about how those contingencies affect their payoffs. That means that, at an absolute minimum, the optimal “incomplete contract” better be at least as good as the optimal contract which conditions on elicited payoffs. These payoffs may be stochastic realizations of all of our actions, of course, and hence this insight might not actually mean we can achieve first-best efficiency when the future is really hard to describe. Maskin and Tirole’s 1999 paper shows, incredibly, that indescribability of states is irrelevant, and that even if we can’t write down a contract on states of the world, we can contract on payoff realizations in a way that is just as good as if we could actually write the complete contract.

How could this be? Imagine (here via a simple example of Maskin’s) two firms contracting for R&D. Firm 1 exerts effort e1 and produces a good with value v(e1). Firm 2 invests in some process that will lower the production cost of firm 1’s new good, investing e2 to make production cost equal to c(e2). Payoffs, then, are u1(p-c(e2)-e1) and u2(v(e1)-p-e2). If we knew u1 and u2 and could contract upon them, then the usual Nash implementation literature tells us how to generate efficient levels of e1 and e2 (call them e1*, e2*) by writing a contract: if the product doesn’t have the characteristics of v(e1*) and the production process doesn’t have the characteristics of c(e2*), then we fine the person who cheated. If effort generated stochastic values rather than deterministic ones, the standard mechanism design literature tells us exactly when we can still get the first best.
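
To fix ideas, here is a small sketch of the efficient benchmark the contract is trying to implement, under one assumed parametrization of v and c (the functional forms are mine, not Maskin’s):

```python
# Efficient efforts in the R&D example, under assumed functional forms:
# v(e1) = 20*sqrt(e1) and c(e2) = max(50 - 10*sqrt(e2), 0).
# Total surplus is v(e1) - c(e2) - e1 - e2, and it is separable, so each effort
# can be found by a simple grid search.
import numpy as np

e = np.linspace(0.01, 200, 20_000)
v = 20 * np.sqrt(e)                          # value created by firm 1's effort
c = np.maximum(50 - 10 * np.sqrt(e), 0.0)    # firm 1's production cost after firm 2's effort

e1_star = e[np.argmax(v - e)]                # first-order condition 10/sqrt(e1) = 1  =>  e1* = 100
e2_star = e[np.argmax(-c - e)]               # first-order condition  5/sqrt(e2) = 1  =>  e2* = 25
print(f"e1* ~ {e1_star:.1f}, e2* ~ {e2_star:.1f}")
```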

Now, what if v and c are state-dependent, and there are a huge number of states of the world? That is, efficient e1* and e2* are now functions of the state of the world realized after we write the initial contract. Incomplete contracting assumed that we cannot foresee all the possible v and c, and hence won’t write a contract incorporating all of them. But, aha!, we can still write a contract that says, look, whatever happens tomorrow, we are going to play a game tomorrow where I say what my v is and you say what your c is. It turns out that there exists such a game which generates truthful revelation of v and c (Maskin and Tirole do this using an idea similar to that of the subgame implementation literature, but the exact features are not terribly important for our purposes). Since the only part of the indescribable state I care about is the part that affects my payoffs, we are essentially done: no matter how many v and c’s there could be in the future, as long as I can write a contract specifying how we each get the other to truthfully say what those parameters are, this indescribability doesn’t matter.

Whoa. That is a very, very, very clever insight. Frankly, it is convincing enough that the only role left for property rights theories of the firm is some kind of behavioral theory which restricts even contracts in the Maskin-Tirole sense – and since these contracts are quite a bit simpler in some way than the hundreds of pages of legalese which we see in a lot of real-world contracts on important issues, it’s not clear that bounded rationality or similar theories will get us far.

Where to go from here? Firms, and organizations more generally, exist. I am sure the reason has to do with incentives. But exactly why – well, we still have a lot of work to do on that question, and Tirole has played a major role in the progress made so far.

Tirole’s Walras-Bowley lecture, published in Econometrica in 1999, is a fairly accessible introduction to his current view of incomplete contracts. He has many other fantastic papers, across a wide variety of topics. I particularly like his behavioral theory written mainly with Roland Benabou; see, for instance, their 2003 ReStud on when monetary rewards are bad for incentives.

“International Trade and Institutional Change: Medieval Venice’s Response to Globalization,” D. Puga & D. Trefler

(Before discussing the paper today, I should forward a couple of great remembrances of Stanley Reiter, who passed away this summer, by Michael Chwe (whose interests at the intersection of theory and history are close to my heart) and Rakesh Vohra. After leaving Stanford – Chwe mentions this was partly due to a nasty letter written by Reiter’s advisor Milton Friedman! – Reiter established an incredible theory group at Purdue which included Afriat, Vernon Smith and PhD students like Sonnenschein and Ledyard. He then moved to Northwestern where he helped build up the great group in MEDS which is too long to list, but which includes one Nobel winner already in Myerson and, by my reckoning, two more which are favorites to win the prize next Monday.

I wonder if we may be at the end of an era for topic-diverse theory departments. Business schools are all a bit worried about “Peak MBA”, and theorists are surely the first ones out the door when enrollment falls. Economic departments, journals and funders seem to have shifted, in the large, toward more empirical work, for better or worse. Our knowledge both of how economic and social interactions operate in their most platonic form, and our ability to interpret empirical results when considering novel or counterfactual policies, have greatly benefited from the theoretical developments following Samuelson and Hicks’ mathematization of primitives in the 1930s and 40s, and the development of modern game theory and mechanism design in the 1970s and 80s. Would that a new Cowles and a 21st century Reiter appear to help create a critical mass of theorists again!)

On to today’s paper, a really interesting theory-driven piece of economic history. Venice was one of the most important centers of Europe’s “commercial revolution” between the 10th and 15th centuries; anyone who read Marco Polo as a schoolkid knows of Venice’s prowess in long-distance trade. Among historians, Venice is also well-known for the inclusive political institutions that developed in the 12th century, and the rise of oligarchy following the “Serrata” at the end of the 13th century. The Serrata was followed by a gradual decrease in Venice’s power in long-distance trade and a shift toward manufacturing, including the Murano glass it is still famous for today. This is a fairly worrying history from our vantage point today: as the middle class grew wealthier, democratic forms of government and free markets did not follow. Indeed, quite the opposite: the oligarchs seized political power, and within a few decades of the Serrata restricted access to the types of trade that previously drove wealth mobility. Explaining what happened here is both a challenge due to limited data, and of great importance given the public prominence of worries about the intersection of growing inequality and corruption of the levers of democracy.

Dan Trefler, an economic historian here at U. Toronto, and Diego Puga, an economist at CEMFI who has done some great work in economic geography, provide a great explanation of this history. Here’s the model. Venice begins with lots of low-wealth individuals, a small middle and upper class, and political power granted to anyone in the upper class. Parents in each dynasty can choose to follow a risky project – becoming a merchant in a long-distance trading mission a la Niccolo and Maffeo Polo – or work locally in a job with lower expected pay. Some of these low and middle class families will succeed on their trade mission and become middle and upper class in the next generation. Those with wealth can sponsor ships via the colleganza, a type of early joint-stock company with limited liability, and potentially join the upper class. Since long-distance trade is high variance, there is a lot of churn across classes. Those with political power also gather rents from their political office. As the number of wealthy rise in the 11th and 12th century, the returns to sponsoring ships falls due to competition across sponsors in the labor and export markets. At any point, the upper class can vote to restrict future entry into the political class by making political power hereditary. They need to include sufficiently many powerful people in this hereditary class or there will be a revolt. As the number of wealthy increase, eventually the wealthy find it worthwhile to restrict political power so they can keep political rents within their dynasty forever. Though political power is restricted, the economy is still free, and the number of wealthy without power continue to grow, lowering the return to wealth for those with political power due to competition in factor and product markets. At some point, the return is so low that it is worth risking revolt from the lower classes by restricting entry of non-nobles into lucrative industries. To prevent revolt, a portion of the middle classes are brought in to the hereditary political regime, such that the regime is powerful enough to halt a revolt. Under these new restrictions, lower classes stop engaging in long-distance trade and instead work in local industry. These outcomes can all be generated with a reasonable looking model of dynastic occupation choice.

What historical data would be consistent with this theoretical mechanism? We should expect lots of turnover in political power and wealth in the 10th through 13th centuries. We should find examples in the literature of families beginning as long-distance traders and rising to voyage sponsors and political agents. We should see a period of political autocracy develop, followed later by the expansion of hereditary political power and restrictions on lucrative industry entry to those with such power. Economic success based on being able to activate large amounts of capital from within the nobility class will make inter-family connections more important in the 14th and 15th centuries than before. Political power and participation in lucrative economic ventures will be limited to a smaller number of families after this political and economic closure than before. Those left out of the hereditary regime will shift to local agriculture and small-scale manufacturing.

Indeed, we see all of these outcomes in Venetian history. Trefler and Puga use some nice techniques to get around limited data availability. Since we don’t have data on family incomes, they use the correlation in eigenvector centrality within family marriage networks as a measure of the stability of the upper classes. They code colleganza records – a non-trivial task involving searching thousands of scanned documents for particular Latin phrases – to investigate how often new families appear in these records, and how concentration in the funding of long-distance trade changes over time. They show that all of the families with high eigenvector centrality in the noble marriage market after political closure – a measure of economic importance, remember – were families that were in the top quartile of seat-share in the pre-closure Venetian legislature, and that those families which had lots of political power pre-closure but little commercial success thereafter tended to be unsuccessful in marrying into lucrative alliances.
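
For readers who have not met the network measure before, here is a tiny illustration of eigenvector centrality on an invented marriage network (the family names are real Venetian clans, but the links are made up for the example):

```python
# Eigenvector centrality on a made-up marriage network: a family is central if it
# marries into families that are themselves central.
import networkx as nx

marriages = [("Dandolo", "Morosini"), ("Dandolo", "Contarini"),
             ("Morosini", "Contarini"), ("Contarini", "Gradenigo"),
             ("Gradenigo", "Tiepolo")]
G = nx.Graph(marriages)
centrality = nx.eigenvector_centrality(G)
for family, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{family:10s} {score:.3f}")
# Contarini scores highest because it is tied to the other well-connected families,
# which is the sense in which the measure proxies for a family's importance.
```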

There is a lot more historical detail in the paper, but as a matter of theory useful to the present day, the Venetian experience ought throw cold water on the idea that political inclusiveness and economic development always form a virtuous circle. Institutions are endogenous, and changes in the nature of inequality within a society following economic development alter the potential for political and economic crackdowns to survive popular revolt.

Final published version in QJE 2014 (RePEc IDEAS). A big thumbs up to Diego for having the single best research website I have come across in five years of discussing papers in this blog. Every paper has an abstract, well-organized replication data, and a link to a locally-hosted version of the final published paper. You may know his paper with Nathan Nunn on how rugged terrain in Africa is associated with good economic outcomes today because slave traders like the infamous Tippu Tip couldn’t easily exploit mountainous areas, but it’s also worth checking out his really clever theoretical disambiguation of why firms in cities are more productive, as well as his crazy yet canonical satellite-based investigation of the causes of sprawl. There is a really cool graphic on the growth of U.S. sprawl at that last link!

“Organizing Venture Capital: The Rise and Demise of American Research and Development Corporation, 1946-1973,” D. Hsu & M. Kenney (2005)

Venture capital financing of innovative firms feels like a new phenomenon, and is clearly of great importance to high tech companies as well as cities that hope to attract these companies. The basic principle involves relatively small numbers of wealthy individuals providing long-term financing to a group of managers who seek out early-stage, unprofitable firms, make an investment (generally equity), and occasionally help actively manage the company.

There are many other ways firms can fund themselves: issuance of equity, investment from friends or family, investment from an existing firm in a spinoff, investment from the saved funds of an individual, or debt loans from a bank, among others. Two questions, then, are immediate: why does anyone fund with VC in the first place, and how did this institutional form come about? VC is strange at first glance: in a stage in which entrepreneur effort is particularly important, why would I write a financing contract which takes away some of the upside of working hard on the part of the entrepreneur by diluting her ownership share? Two things are worth noting. VC rather than debt finance is particularly common when returns are highly skewed – a bank loan can only be repaid with interest, hence will have trouble capturing that upside. Second, early-stage equity finance and active managerial assistance appear to come bundled, hence some finance folks have argued that the moral hazard problem lies both with the entrepreneur, who must be incentivized to work hard, and with the VC firm and their employees, who need the same incentive.
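
A back-of-the-envelope calculation, with entirely made-up numbers, shows why skewed returns push early-stage financing toward equity rather than debt:

```python
# Why debt struggles with skewed returns: the startup is worth 0 with prob 0.9
# and 50x the investment with prob 0.1 (made-up numbers).
invest = 1.0
p_hit, payoff_hit = 0.1, 50.0

# Debt at interest rate r: the lender gets at most 1 + r, and with limited liability
# the downside is a total loss, so even very high rates cannot break even.
for r in [0.1, 0.5, 1.0]:
    expected_debt = p_hit * min(payoff_hit, 1 + r)
    print(f"debt at {r:.0%}: expected repayment {expected_debt:.2f} on an outlay of {invest}")

# Equity with share s participates in the upside.
s = 0.3
print(f"equity with {s:.0%} stake: expected value {p_hit * s * payoff_hit:.2f}")
```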

Let’s set aside the question of entrepreneurial finance, and look into history. Though something like venture capital appeared to be important in the Second Industrial Revolution (see, e.g., Lamoreaux et al (2006) on that hub of high-tech, Cleveland!), and it may have existed in a proto-form as early as the 1700s with the English country banks (though I am not totally convinced of that equivalence), the earliest modern VC firm was Boston’s American Research and Development Corporation. The decline of textiles hit New England hard in the 1920s and 1930s. A group of prominent blue bloods, including the President of MIT and the future founder of INSEAD, had discussed the social need for an organization that would fund firms which could potentially lead to new industries, and they believed that despite this social goal, the organization ought be a profit-making concern if it were to be successful in the long run.

After a few false starts, the ARD formed in 1946, a time of widespread belief in the power of R&D following World War II and Vannevar Bush’s famous “Science: the Endless Frontier”. ARD was organized as a closed-end investment trust, which permitted institutional investors to contribute. Investments tended to be solicited, were very likely to be made to New England firms, and were, especially in the first few years, concentrated in R&D intensive companies; local, solicited, R&D heavy investment is even today the most common type of VC. Management was often active, and there are reports of entire management teams being replaced by ARD if they felt the firm was not growing quickly enough.

So why have you never heard of ARD, then? Two reasons: returns, and organizational structure. ARD’s returns over the 50s and 60s were barely higher, even before fees, than the S&P 500 as a whole. And this overstates things: an investment in Digital Equipment, the pioneering minicomputer company, was responsible for the vast majority of profits. No surprise, then, that even early VCs had highly skewed returns. More problematic was competition. A 1958 law permitted Small Business Investment Corporations (SBICs) to make VC-style investments at favorable tax rates, and the organizational form of limited partnership VC was less constrained by the SEC than a closed-end investment fund. In particular, the partnership’s “2 and 20” structure meant that top investment managers could earn much more money at that type of firm than at ARD. One investment manager at ARD put a huge amount of effort into developing a company called Optical Scanning, whose IPO made the founder $10 million. The ARD employee, partially because of SEC regulations, earned a $2000 bonus. By 1973, ARD had been absorbed into another company, and was for all practical purposes defunct.
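
Some rough arithmetic with an assumed fund size and gain (not ARD’s actual numbers) shows how stark the compensation gap between the two organizational forms could be:

```python
# "2 and 20" compensation versus ARD's closed-end structure (assumed fund size and gain).
fund_size, annual_gain = 100e6, 10e6
mgmt_fee = 0.02 * fund_size      # the "2": 2% of assets under management per year
carry = 0.20 * annual_gain       # the "20": 20% of the fund's profits
print(f"management fee ${mgmt_fee:,.0f}, carried interest ${carry:,.0f}")
# Compare with the $2,000 bonus the ARD employee received on the Optical Scanning deal;
# the pull toward the limited-partnership form is not hard to understand.
```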

It’s particularly interesting, though, that the Boston Brahmins were right: VC has been critical in two straight resurgences in the New England economy, the minicomputer cluster of the 1960s, and the more recent Route 128 biotech cluster, both of which were the world’s largest. New England, despite the collapse of textiles, has not gone the way of the rust belt – were it a country, it would be wealthier per capita than all but a couple of microstates. And yet, ARD as a profitmaking enterprise went kaput rather quickly. Yet more evidence of the danger of being a market leader – not only can other firms avoid your mistakes, but they can also take advantage of better organizational forms and laws that are permitted or created in response to your early success!

Final published version, in Industrial and Corporate Change 2005 (RePEc IDEAS).

“Optimal Contracts for Experimentation,” M. Halac, N. Kartik & Q. Liu (2013)

Innovative activities have features not possessed by more standard modes of production. The eventual output, and its value, are subject to a lot of uncertainty. Effort can be difficult to monitor – it is often the case that the researcher knows more than management about what good science should look like. The inherent skill of the scientist is hard to observe. Output is generally only observed in discrete bunches.

These features make contracting for researchers inherently challenging. The classic reference here is Holmstrom’s 1989 JEBO, which just applies his great 1980s incentive contract papers to innovative activities. Take a risk-neutral firm. They should just work on the highest expected value project, right? Well, if workers are risk averse and supply unobserved effort, the optimal contract balances moral hazard (I would love to just pay you based on your output) and risk insurance (I would have to pay you to bear risk about the eventual output of the project). It turns out that the more uncertainty a project has, the more inefficient the information-constrained optimal contract becomes, so that even risk-neutral firms are biased toward relatively safe, lower expected value projects. Incentives within the firm matter in many other ways, as Holmstrom also points out: giving an employee multiple tasks when effort is unobserved makes it harder to provide proper incentives because the opportunity cost of a given project goes up, firms with a good reputation in capital markets will be reluctant to pursue risky projects since the option value of variance in reputation is lower (a la Doug Diamond’s 1989 JPE), and so on. Nonetheless, the first order problem of providing incentives for a single researcher on a single project is hard enough!
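
The moral hazard/risk insurance tradeoff is easy to see in the textbook linear-contract case (a CARA agent, normal output noise, quadratic effort cost; this is a standard illustration in the spirit of those papers, not the exact model in the 1989 piece):

```python
# Linear contract w = alpha + beta*x with output x = e + noise, noise variance sigma2,
# agent risk aversion r, and effort cost e^2/2.  The standard result: beta* = 1/(1 + r*sigma2).
def optimal_linear_contract(r, sigma2):
    beta = 1.0 / (1.0 + r * sigma2)      # optimal piece rate
    effort = beta                        # agent equates marginal effort cost e to beta
    surplus = effort - 0.5 * effort**2 - 0.5 * r * beta**2 * sigma2   # certainty-equivalent surplus
    return beta, surplus

for sigma2 in [0.0, 1.0, 5.0]:
    beta, s = optimal_linear_contract(r=2.0, sigma2=sigma2)
    print(f"output variance {sigma2}: piece rate {beta:.2f}, surplus {s:.2f}")
# Noisier projects get weaker incentives and generate less surplus, which is the bias
# toward relatively safe projects described above.
```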

Holmstrom’s model doesn’t have any adverse selection, however: both employer and employee know what expected output will result from a given amount of effort. Nor is Holmstrom’s problem dynamic. Marina Halac, Navin Kartik and Qingmin Liu have taken up the unenviable task of solving the dynamic researcher contracting problem under adverse selection and moral hazard. Let a researcher be either a high type or a low type. In every period, the researcher can work on a risky project at cost c, or shirk at no cost. The project is either feasible or not, with probability b. If the employee shirks, or the project is bad, there will be no invention this period. If the employee works, the project is feasible, and the employee is a high type, the project succeeds with probability L1, and if the employee is low type, with probability L2<L1. Note that as time goes on, if the employee works on the risky project, they continually update their beliefs about b. If enough time passes without an invention, belief about b becomes low enough that everyone (efficiently) stops working on the risky project. The firm’s goal is to get employees to exert optimal effort for the optimal number of periods given their type.
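
Here is a sketch of the learning dynamics driving the model, with made-up parameters (this is only the belief-updating and stopping logic, not the paper’s optimal contract):

```python
# Belief updating about project feasibility b, with per-period success probability lam
# (conditional on feasibility and effort), effort cost c, and prize V on success.
# Working one more period is worthwhile while b*lam*V >= c; since failures only push
# the belief down, this cutoff is also the efficient stopping rule in this simple setting.
def efficient_stopping(b0, lam, c, V, T=50):
    b = b0
    for t in range(1, T + 1):
        if b * lam * V < c:              # not worth working this period (or ever again)
            return t - 1, b
        b = b * (1 - lam) / (b * (1 - lam) + (1 - b))   # Bayes update after a failure
    return T, b

for lam, label in [(0.4, "high type"), (0.2, "low type")]:
    periods, belief = efficient_stopping(b0=0.5, lam=lam, c=1.0, V=10.0)
    print(f"{label}: works {periods} periods if no success; final belief {belief:.3f}")
# In this run the high type works longer before giving up, but the high type also learns
# faster, so with other parameters the ranking flips -- the ambiguity discussed next.
```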

Here’s where things really get tricky. Who, in expectation and assuming efficient behavior, stops working on the risky project earlier conditional on not having finished the invention, the high type or the low type? On the one hand, for any belief about b, the high type is more likely to invent, hence since costs are identical for both types, the high type should expect to keep working longer. On the other hand, the high type learns more quickly whether the project is bad, and hence his belief about b declines more rapidly, so he ought expect to work for less time. That either case is possible makes solving for the optimal contract a real challenge, because I need to write the contracts for each type such that the low type does not ever prefer the high type’s payoffs and vice versa. To know whether these contracts are incentive compatible, I have to know what agents will do if they deviate to the “wrong” contract. The usual trick here is to use a single crossing result along the lines of “for any contract with properties P, action Y is more likely for higher types”. In the dynamic researcher problem, since efficient stopping times can vary nonmonotonically with researcher type, the single crossing trick doesn’t look so useful.

The “simple” (where simple means a 30 page proof) case is when the higher types efficiently work longer in expectation. The information-constrained optimum involves inducing the high type to work efficiently, while providing the low type too little incentive to work for the efficient amount of time. Essentially, the high type is willing to work for less money per period if only you knew who he was. Asymmetric information means the high type can extract information rents. By reducing the incentive for the low type to work in later periods, the high type’s information rent is reduced, and hence the optimal mechanism trades off lower total surplus generated by the low type against lower information rents paid to the high type.

This constrained-optimal outcome can be implemented by paying scientists up front, and then letting them choose either a contract with progressively increasing penalties for lack of success each period, or a contract with a single large penalty if no success is achieved by the socially efficient high type stopping time. Also, “penalty contracts” are nice because they remain optimal even if scientists can keep their results secret: since secrecy just means paying more penalties, everyone has an incentive to reveal their invention as soon as they create it. The proof is worth going through if you’re into dynamic mechanism design; essentially, the authors are using a clever set of relaxed problems where a form of single crossing will hold, then showing that the mechanism is feasible even under the actual problem constraints.

Finally, note that if there is only moral hazard (scientist type is observable) or only adverse selection (effort is observable), the efficient outcome is easy. With moral hazard, just make the agent pay the expected surplus up front, and then provide a bonus to him each period equal to the firm’s profit from an invention occurring then; we usually say in this case that “the firm is sold to the employee”. With adverse selection, we can contract on optimal effort, using total surplus to screen types as in the correlated information mechanism design literature. Even though the “distortion only at the bottom” result looks familiar from static adverse selection, the rationale here is different.

Sept 2013 working paper (No RePEc IDEAS version). The article appears to be under R&R at ReStud.

“Aggregation in Production Functions: What Applied Economists Should Know,” J. Felipe & F. Fisher (2003)

Consider a firm that takes heterogeneous labor and capital inputs L1, L2… and K1, K2…, using these to produce some output Y. Define a firm production function Y=F(K1, K2…, L1, L2…) as the maximal output that can be produced using the given vector of inputs – and note the implicit optimization condition in that definition, which means that production functions are not simply technical relationships. What conditions are required to construct an aggregated production function Y=F(K,L), or, more broadly, to aggregate across firms into an economy-wide production function Y=F(K,L)? Note that the question is not about the definition of capital per se, since defining “labor” is equally problematic when man-hours are clearly heterogeneous, and this question is also not about the more general capital controversy worries, like reswitching (see Samuelson’s champagne example) or the dependence of the return to capital on the distribution of income which, itself, depends on the return to capital.

(A brief aside: on that last worry, why the Cambridge UK types and their modern day followers are so worried about the circularity of the definition of the interest rate, yet so unconcerned about the exact same property of the object we call “wage”, is quite strange to me, since surely if wages equal marginal product, and marginal product in dollars is a function of aggregate demand, and aggregate demand is a function of the budget constraint determined by wages, we are in an identical philosophical situation. I think it’s pretty clear that the focus on “r” rather than “w” is because of the moral implications of capitalists “earning their marginal product” which are less than desirable for people of a certain political persuasion. But I digress; let’s return to more technical concerns.)

It turns out, and this should be fairly well-known, that the conditions under which factors can be aggregated are ridiculously stringent. If we literally want to add up K or L when firms use different production functions, the condition (due to Leontief) is that the marginal rate of substitution between different types of factors in one aggregation, e.g. capital, does not depend on the level of factors not in that aggregation, e.g. labor. Surely this is a condition that rarely holds: how much I want to use, in an example due to Solow, different types of trucks will depend on how much labor I have at hand. A follow-up by Nataf in the 1940s is even more discouraging. Assume every firm uses homogenous labor, every firm uses capital which though homogenous within each firm differs across firms, and every firm has identical constant returns to scale production technology. When can I now write an aggregate production function Y=F(K,L) summing up the capital in each firm K1, K2…? That aggregate function exists if and only if every firm’s production function is additively separable in capital and labor (in which case, the aggregation function is pretty obvious)! Pretty stringent, indeed.
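
A quick finite-difference check makes the Leontief condition concrete: pick a technology and ask whether the marginal rate of substitution between the two capital goods moves when labor moves (the functional forms below are my own examples, not from the paper):

```python
# Check whether the MRS between K1 and K2 depends on L, via central finite differences.
def mrs_k1_k2(f, K1, K2, L, h=1e-5):
    fk1 = (f(K1 + h, K2, L) - f(K1 - h, K2, L)) / (2 * h)
    fk2 = (f(K1, K2 + h, L) - f(K1, K2 - h, L)) / (2 * h)
    return fk1 / fk2

# Weakly separable in (K1, K2) vs L: the capitals can be aggregated into a single K.
f_ok = lambda K1, K2, L: (K1**0.5) * (K2**0.5) + L**0.7
# Non-separable (Solow's trucks story): how the capitals trade off depends on labor.
f_bad = lambda K1, K2, L: K1**0.3 * K2**0.3 * L**0.4 + K1 * L

for name, f in [("separable", f_ok), ("non-separable", f_bad)]:
    print(name, [round(mrs_k1_k2(f, 4.0, 9.0, L), 3) for L in (1.0, 4.0, 16.0)])
# The separable technology gives the same MRS at every L; the non-separable one does not,
# so its two capital stocks cannot be added up into a single aggregate K.
```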

Fisher helps things just a bit in a pair of papers from the 1960s. Essentially, he points out that we don’t want to aggregate for all vectors K and L, but rather we need to remember that production functions measure the maximum output possible when all inputs are used most efficiently. Competitive factor markets guarantee that this assumption will hold in equilibrium. That said, even assuming only one type of labor, efficient factor markets, and a constant returns to scale production function, aggregation is possible if and only if every firm has the same production function Y=F(b(v)K(v),L), where v denotes a given firm and b(v) is a measure of how efficiently capital is employed in that firm. That is, aside from capital efficiency, every firm’s production function must be identical if we want to construct an aggregate production function. This is somewhat better than Nataf’s result, but still seems highly unlikely across a sector (to say nothing of an economy!).

Why, then, do empirical exercises using, say, aggregate Cobb-Douglas seem to give such reasonable parameters, even though the above theoretical results suggest that parameters like “aggregate elasticity of substitution between labor and capital” don’t even exist? That is, when we estimate elasticities or total factor productivities from Y=AK^a*L^b, using some measure of aggregated capital, what are we even estimating? Two things. First, Nelson and Winter in their seminal book generate aggregate data which can almost perfectly be fitted using Cobb-Douglas even though their model is completely evolutionary and does not even involve maximizing behavior by firms, so the existence of a “good fit” alone is, and this should go without saying, not great evidence in support of a model. Second, since ex-post production Y must equal the wage bill plus the capital payments plus profits, Felipe notes that this identity can be algebraically manipulated to Y=AF(K,L) where the form of F depends on the nature of the factor shares. That is, the good fit of Cobb-Douglas or CES can simply reflect an accounting identity even when nothing is known about micro-level elasticities or similar.
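
Felipe’s accounting-identity point is easy to see in a simulation. The sketch below (my own toy data-generating process, not anything from the paper) builds aggregate output purely from the identity Y = wL + rK, with no Cobb-Douglas technology anywhere, and then fits a log-linear “production function” to the result:

```python
# Fit log Y on log K and log L when Y is generated only by the accounting identity.
import numpy as np

rng = np.random.default_rng(0)
T = 200
K = np.exp(np.cumsum(rng.normal(0.02, 0.05, T)))       # arbitrary capital series
L = np.exp(np.cumsum(rng.normal(0.01, 0.03, T)))       # arbitrary labor series
w = np.exp(np.cumsum(rng.normal(0.01, 0.01, T)))       # wage drifts upward
r = 0.3 + rng.normal(0.0, 0.01, T)                     # return to capital roughly constant

Y = w * L + r * K                                      # the identity: wage bill + capital payments

X = np.column_stack([np.ones(T), np.log(K), np.log(L)])
beta, *_ = np.linalg.lstsq(X, np.log(Y), rcond=None)
resid = np.log(Y) - X @ beta
r2 = 1 - resid.var() / np.log(Y).var()
print(f"estimated 'capital elasticity' {beta[1]:.2f}, 'labor elasticity' {beta[2]:.2f}, R^2 {r2:.3f}")
# The fit is excellent even though no firm in this fake economy has a Cobb-Douglas
# technology; with roughly stable factor shares, the regression is tracing the identity.
```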

So what to do? I am not totally convinced we should throw out aggregate production functions – it surely isn’t a coincidence that Solow residual estimates of TFP are high in places where our intuition says technological change has been rapid. Because of results like this, it doesn’t strike me that aggregate production functions are measuring arbitrary things. However, if we are using parameters from these functions to do counterfactual analysis, we really ought know better exactly what approximations or assumptions are being baked into the cake, and it doesn’t seem that we are quite there yet. Until we are, a great deal of care should be taken in assigning interpretations to estimates based on aggregate production models. I’d be grateful for any pointers in the comments to recent work on this problem.

Final published version (RePEc IDEAS). The “F. Fisher” on this paper is the former Clark Medal winner and well-known IO economist Franklin Fisher; rare is it to find a nice discussion of capital issues written by someone who is firmly part of the economics mainstream and completely aware of the major theoretical results from “both Cambridges”. Tip of the cap to Cosma Shalizi for pointing out this paper.

Some Results Related to Arrow’s Theorem

Arrow’s (Im)possibility Theorem is, and I think this is universally acknowledged, one of the great social science theorems of all time. I particularly love it because of its value when arguing with Popperians and other anti-theory types: the theorem is “untestable” in that it quite literally does not make any predictions, yet surely all would consider it a valuable scientific insight.

In this post, I want to talk about a couple of new papers using Arrow’s result in unusual ways. First, a philosopher has shown exactly how Arrow’s result is related to the general philosophical problem of choosing which scientific theory to accept. Second, a pair of computer scientists have used AI techniques to generate an interesting new method for proving Arrow.

The philosophical problem is the following. A good theory should satisfy a number of criteria; for Kuhn, these included accuracy, consistency, breadth, simplicity and fruitfulness. Imagine now that there is a group of theories (about, e.g., how galaxies form, why birds have wings, etc.), that we rank them ordinally on each of these criteria, and that we all agree on those rankings. Which theory ought we accept? Arrow applied to theory choice gives us the worrying result that not only is there no unique method of choosing among theories, but there may not exist any such method at all, at least if we want to satisfy unanimity, non-dictatorship and independence of irrelevant alternatives (IIA). That is, even if you and I agree about how each theory ranks according to the different desirability criteria, we still don’t have a good, general method of aggregating across criteria.
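
To see how the impossibility can bite even in this rarefied setting, here is a toy example of my own (the theories T1, T2, T3 and the rankings are entirely made up): rank three theories on three of Kuhn’s criteria and try to aggregate by pairwise majority among the criteria. A few lines of Python make the failure explicit.

from itertools import combinations

# Hypothetical ordinal rankings of three theories by three Kuhnian criteria,
# each listed from best to worst.
criteria = {
    "accuracy":   ["T1", "T2", "T3"],
    "simplicity": ["T2", "T3", "T1"],
    "breadth":    ["T3", "T1", "T2"],
}

def prefers(ranking, x, y):
    # True if x is ranked above y on this criterion
    return ranking.index(x) < ranking.index(y)

# Aggregate by pairwise majority: x beats y if most criteria rank x above y.
for x, y in combinations(["T1", "T2", "T3"], 2):
    wins_x = sum(prefers(r, x, y) for r in criteria.values())
    wins_y = len(criteria) - wins_x
    winner, loser = (x, y) if wins_x > wins_y else (y, x)
    print(f"{winner} beats {loser}, {max(wins_x, wins_y)} criteria to {min(wins_x, wins_y)}")

# Prints that T1 beats T2, T3 beats T1, and T2 beats T3: a Condorcet-style cycle,
# so pairwise majority over criteria fails to deliver any ranking of theories at all.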

So what to do? Davide Rizza, in a new paper in Synthese (gated, I’m afraid), discusses a number of solutions. Of course, if we have more than just ordinal information about each criterion, then we can construct aggregate orders. For instance, if we assigned a number to the relative rankings on each criterion, we could just add these up for each theory and hence have an order. Note that this theory choice rule can be applied even if we only have ordinal data: if there are N theories, then on criterion C give the best theory on that criterion N points, the second best N-1, and so on, then add up the scores across criteria. This is the famous Borda count, as in the sketch below.
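
A minimal sketch of that scoring rule in code, again with made-up rankings, now on Kuhn’s five criteria (everything here is purely illustrative):

# Borda count over ordinal rankings of three theories on five hypothetical criteria,
# each listed from best to worst.
criteria = {
    "accuracy":     ["T1", "T2", "T3"],
    "consistency":  ["T1", "T3", "T2"],
    "simplicity":   ["T2", "T1", "T3"],
    "breadth":      ["T3", "T1", "T2"],
    "fruitfulness": ["T2", "T3", "T1"],
}
theories = ["T1", "T2", "T3"]
N = len(theories)

# On each criterion the best theory gets N points, the next N-1, and so on.
scores = {t: 0 for t in theories}
for ranking in criteria.values():
    for position, theory in enumerate(ranking):
        scores[theory] += N - position

print(sorted(scores.items(), key=lambda kv: -kv[1]))
# [('T1', 11), ('T2', 10), ('T3', 9)]: purely ordinal inputs, yet a complete aggregate order.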

Why can’t we choose theories by the Borda count or something similar, then? Well, Borda (and any other rule that constructs an aggregate order while satisfying unanimity and non-dictatorship) must violate the IIA assumption in Arrow. Unanimity, which insists a rule accept a theory if it is considered best along every criterion, and non-dictatorship, under which more than one criterion can at least in principle matter, seem totally unobjectionable. So maybe we ought just toss IIA from our theory choice rule, as perhaps Donald Saari would wish us to do. And IIA is a bit strange indeed. If I rank A>B>C, and if you require me to have transitive preferences, then just knowing the binary rankings A>B and B>C is enough to tell you that I prefer A>C even if I never reported that particular binary relation. In this case, the “irrelevant” alternative B isn’t irrelevant at all; there is information in the binary pairs generated by transitivity which IIA does not allow me to take advantage of. Some people call the IIA assumption “binary independence” since it aggregates using only binary relations, an odd thing given that the individual orders contain, by virtue of being orders, more than just binary relations. It turns out that there are aggregation rules which do generate an order if we loosen IIA to an alternative restriction on how information in the full rankings may be used. IIA, rather than the use of merely ordinal rankings across criteria, is where Arrow poses a problem for theory choice. Now, Rizza points out that these aggregation rules needn’t be unique, so we can still have situations where we all agree about how different theories rank according to each criterion, and agree on the axiomatic properties we want in an aggregation rule, yet nonetheless disagree about which theory to accept. Still worrying, though not for Kuhn, and certainly not for us crazier Feyerabend and Latour fans!
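
Here is the stock demonstration (a textbook-style example of my own, not one from Rizza’s paper) that Borda escapes Arrow only by violating IIA: delete an alternative without changing anyone’s relative ranking of A and B, and the aggregate ranking of A and B flips.

def restrict(ranking, candidates):
    # Keep only the listed candidates, preserving the voter's relative order.
    return [c for c in ranking if c in candidates]

def borda(profile, candidates):
    # Borda scores: with m candidates, first place gets m-1 points, last gets 0.
    m = len(candidates)
    scores = {c: 0 for c in candidates}
    for ranking in profile:
        for position, c in enumerate(restrict(ranking, candidates)):
            scores[c] += (m - 1) - position
    return scores

# Five voters: three rank A > B > C, two rank B > C > A.
profile = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2

print(borda(profile, ["A", "B", "C"]))  # {'A': 6, 'B': 7, 'C': 2} -> B ranked above A
print(borda(profile, ["A", "B"]))       # {'A': 3, 'B': 2}         -> A ranked above B

# No voter's ranking of A versus B changed, yet removing C flips the aggregate ranking
# of A and B: exactly the kind of dependence that IIA rules out.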

(A quick aside: how strange is it that Arrow’s Theorem is so heavily associated with voting? That every voting rule is subject to tactical behavior is Gibbard-Satterthwaite, not Arrow, and that result about strategic voting imposes nothing like an IIA assumption. Arrow’s result is about the far more general problem of aggregating orders, a problem which fundamentally has nothing to do with individual behavior. Indeed, I seem to recall that Arrow came up with his theorem while working one summer as a grad student at RAND on the problem of what, if anything, it could mean for a country to have preferences when voting on behalf of its citizens in bodies like the UN. The story also goes that when he showed his advisor – perhaps Hotelling? – what he had been working on over the summer, he was basically told the result was so good that he might as well just graduate right away!)

The second paper today comes from two computer scientists. There are lots of proofs of Arrow’s theorem – the original proof in Arrow’s 1951 book is actually incorrect! – but the CS guys use a technique I hadn’t seen before. Essentially, they first prove by a simple induction that a social welfare function satisfying the Arrow axioms exists for N>=2 voters and M>=3 alternatives if and only if one exists in the base case of 2 voters and 3 alternatives. This doesn’t narrow the problem all that much: there are still 3! = 6 ways to order 3 alternatives, hence 6^2 = 36 possible preference profiles for the 2 voters, hence 6^36 functions mapping voter profiles to a social order. Nonetheless, the problem is small enough to be tackled by a constraint satisfaction algorithm which checks IIA and unanimity and finds only two social welfare functions satisfying both constraints: the one where Agent 1 is a dictator and the one where Agent 2 is. Their algorithm took one second to run on a standard computer (clearly they are better algorithm writers than the average economist!). Sen’s theorem and Muller-Satterthwaite can also be proven using a similar reduction to a base case followed by algorithmic search.
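
The search itself needs a real constraint solver, but the constraint-checking core is easy to state. Here is my own toy sketch (not the authors’ code) of what “satisfies unanimity and IIA” means in the 2-voter, 3-alternative base case, verified on the surviving rule in which voter 1 dictates:

from itertools import permutations, product

ALTS = ("A", "B", "C")
RANKINGS = list(permutations(ALTS))           # the 6 strict orders over 3 alternatives
PROFILES = list(product(RANKINGS, RANKINGS))  # the 36 possible 2-voter profiles

def above(ranking, x, y):
    return ranking.index(x) < ranking.index(y)

def satisfies_unanimity(swf):
    # If both voters rank x above y, society must rank x above y.
    return all(above(swf[p], x, y)
               for p in PROFILES for x in ALTS for y in ALTS
               if x != y and above(p[0], x, y) and above(p[1], x, y))

def satisfies_iia(swf):
    # The social ranking of x vs y may depend only on the voters' rankings of x vs y.
    for p, q in product(PROFILES, repeat=2):
        for x in ALTS:
            for y in ALTS:
                if x != y and all(above(p[i], x, y) == above(q[i], x, y) for i in range(2)):
                    if above(swf[p], x, y) != above(swf[q], x, y):
                        return False
    return True

# One of the two social welfare functions that survive the search: voter 1 is a dictator.
dictator_1 = {p: p[0] for p in PROFILES}
print(satisfies_unanimity(dictator_1), satisfies_iia(dictator_1))  # True True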

Of course, algorithmic proofs tend to lack the insight and elegance of standard proofs. But they have benefits as well. Just as you can show that only 2 social welfare functions with N=2 voters and M=3 alternatives satisfy both IIA and unanimity, you can also show that only 94 (out of all 6^36 of them!) satisfy IIA alone. That is, it is IIA rather than the other assumptions which is doing most of the work in Arrow. Inspecting those 94 remaining social welfare functions by hand can help elucidate alternative sets of axioms which also generate aggregation possibility or impossibility results.

(And a third paper, just for fun: it turns out that Kiribati and Nauru actually use Borda counts in their elections, and that there does appear to be strategic candidate nomination behavior designed to take advantage of the non-IIA nature of Borda! IIA looks in many ways like a restriction on tactical behavior by candidates or those nominating issues, rather than a restriction on tactical behavior by voters. If you happen to teach Borda counts, this is a great case to give students.)

“Seeking the Roots of Entrepreneurship: Insights from Behavioral Economics,” T. Astebro, H. Herz, R. Nanda & R. Weber (2014)

Entrepreneurship is a strange thing. Entrepreneurs work longer hours, make less money in expectation, and have higher variance earnings than those working for firms; if anyone knows of solid evidence to the contrary, I would love to see the reference. The social value of entrepreneurship through greater product market competition, new goods, and so on is very high, so from society’s point of view the seemingly strange choice to become an entrepreneur may well be a net benefit. We even encourage it here at UT! Given these facts, why does anyone start a company anyway?

Astebro and coauthors, as part of a new JEP symposium on entrepreneurship, look at the evidence from behavioral economics. The evidence isn’t totally conclusive, but it appears entrepreneurs are not any more risk-loving or ambiguity-loving than the average person. They are overoptimistic, but overoptimism alone can’t be the whole story: you still see entrepreneurs sticking with high-risk, low-performance firms even ten years after founding, by which point surely any overoptimism has long since been beaten out of them.

It is, however, true that entrepreneurship is much more common among the well-off. If risk aversion can’t explain things, then perhaps entrepreneurship is in some sense consumption: founders value independence and control. Experimental evidence provides fairly strong support for this hypothesis. For many entrepreneurs, it is more about not having a boss than about the small chance of becoming very rich.

This leads to a couple of questions: why are there so many immigrant entrepreneurs, and what are we to make of the declining rate of firm formation in the US? Pardon me if I speculate a bit here. The immigrant story may just be selection: almost by definition, those who move across borders, especially those who move for graduate school, tend to be quite independent! The declining rate of firm formation may be tied to changes in inequality; to the extent that entrepreneurship involves consumption of a luxury good (control over one’s working life) in addition to a standard risk-adjusted cost-benefit calculation, changes in the income distribution will change that consumption pattern. More work is needed on these questions.

Summer 2014 JEP (RePEc IDEAS). As always, a big thumbs up to the JEP for being free to read! It is also worth checking out the companion articles by Bill Kerr and coauthors on experimentation, with some amazing stats using internal VC project evaluation data for which ex-ante projections were basically identical for ex-post failures and ex-post huge successes, and one by Haltiwanger and coauthors documenting the important role played by startups in job creation, the collapse in startup formation and job churn which began well before 2008, and the utter mystery about what is causing this collapse (which we can see across regions and across industries).
