No Free Lunch: Natural Experiments and the Construction of Instrumental Variables


Thad Dunning
Department of Political Science
Yale University
June 28,

Abstract

Social scientists have increasingly exploited sources of random or quasi-random variation, including natural experiments, to construct instrumental variables for use in regression analysis. In many applications, researchers seek to defend the plausibility of a key assumption: namely, while an instrument or set of instruments is empirically correlated with an endogenous regressor in a linear regression model, it is independent of the error term in that model. I argue here that while fulfilling this exogeneity criterion may be necessary for a valid application of the instrumental variables approach, it is far from sufficient. In particular, in the regression context the identification of causal effects depends not just on the exogeneity of the instrument(s) but also on the validity of the underlying model. In this paper, I focus attention on the implications of one feature of characteristic models: the assumption of common effects across exogenous and endogenous portions of the problematic regressor(s). In many applications, this assumption may be quite strong, but relaxing it can limit our ability to estimate parameters of greatest theoretical interest. After discussing two substantive examples, I discuss analytic results (simulations are reported elsewhere). I also present a specification test that may be useful for determining the relevance of these issues in a given application.

1 Introduction

Social scientists increasingly exploit natural experiments as sources of instrumental variables for use in regression analysis. Unlike true experiments, in a natural experiment the manipulation of a treatment variable is not under the control of an experimental researcher; instead, analysts take advantage of interventions they observe in the social and political world. Akin to randomized controlled experiments, however, and unlike other observational studies, a researcher exploiting a natural experiment can make a credible claim that the observed assignment of non-experimental subjects to treatment and control conditions is done at random or "as if" at random. In any given study, it may happen that units of analysis are "as if" randomized to the treatment of theoretical interest. In this case, a natural experiment may be very close to a true experiment, in which the researchers have planned and introduced the randomized intervention.

Perhaps more often, however, nature randomizes units of analysis to levels of some variable $Z$ that is different from the treatment variable $X$. Under further assumptions to be discussed below, randomization of subjects to such a variable $Z$ may nonetheless allow identification of the causal effect of the non-randomly assigned treatment $X$, so long as $Z$ is correlated with treatment. The well-known idea is as follows. Consider the regression equation

$$Y = X\beta + \varepsilon, \tag{1}$$

where $Y$ is a column vector of observations on a dependent variable, $X$ is a matrix of observations on treatment variables and covariates, $\beta$ is a vector of parameters, and $\varepsilon$ is a vector of unobserved, mean-zero error terms. Unlike the classical regression model, here at least some columns of $X$ may be dependent on the error term, that is, endogenous. The Ordinary Least Squares (OLS) estimator of $\beta$ will therefore be biased by a quantity related to the expected value of the error term, given $X$.
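For reference, the bias can be written out with a standard textbook decomposition. It is not spelled out in the text, and it assumes that $X'X$ is invertible:

```latex
\hat{\beta}_{OLS} = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'\varepsilon,
\qquad
E\big[\hat{\beta}_{OLS} \mid X\big] = \beta + (X'X)^{-1}X'\,E[\varepsilon \mid X].
```

The second term vanishes when $E[\varepsilon \mid X] = 0$, as in the classical model; with endogenous columns in $X$ it generally does not.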
However, under additional assumptions, Instrumental Variables Least Squares (IVLS) regression provides a way to obtain consistent estimates of $\beta$. To use IVLS, we must find a matrix of instrumental variables $Z$, with at least as many columns as $X$ (exogenous columns of $X$ may be included in $Z$), for which the following conditions hold: $Z'Z$ and $Z'X$ have full rank, and $Z$ is independent of the unobserved error term (Greene 2003: 74-80; Freedman 2005: 175). The latter requirement is the hard one, and it is the one for which natural experiments are often exploited. For instance, Miguel, Satyanath, and Sergenti (2004) take advantage of the "as if" random assignment of African countries to inclement weather to instrument for GDP growth, in a study of the influence of growth on civil conflict. Acemoglu, Johnson, and Robinson (2001) use variation in historic settler mortality rates across former European colonies to instrument for current institutional quality, in a regression of measures of economic development on institutions. Angrist and Lavy (1999) exploit "as if" random variation in the size of Israeli classes to estimate the effect of class size on educational attainment.

In these and other applications, researchers tend to devote substantial attention to defending the plausibility that a set of instrumental variables $Z$ is independent of the error term in an equation like (1), as required for consistent estimation of $\beta$. In the context of an equation like (1), however, it is not merely the exogeneity of the instrument(s) that allows for estimation of the causal impact of $X$ on $Y$. Instead, $Z$ is tied to inferences about the impact of $X$ on $Y$ through a particular causal model. This underlying causal model in turn lends itself to a regression equation like (1), which is used to estimate the effect of treatment. Exogeneity is therefore necessary but not sufficient for valid application of the instrumental variables approach: the validity of the model is always at issue as well.
Though this observation is in itself unremarkable, I argue here that an under-appreciated aspect of a statistical model like equation (1) plays an important role in sustaining causal inferences about the impact of $X$ on $Y$. In brief, the statistical model in equation (1) assumes common effects across exogenous and endogenous portions of the treatment variable. While this assumption may be innocuous in some settings, it is far from clear that it holds in other contexts in which we would commonly use IVLS. Below, I discuss at length two substantive examples in which a compelling natural experiment provides plausible "as if" randomization and thus supplies an instrumental variable that is credibly exogenous. Because of the exogeneity of the instruments, these examples lend themselves to particularly credible applications of the IVLS approach. However, as these examples will also suggest, the assumption that endogenous and exogenous portions of a problematic regressor have the same effects on the outcome of interest may be quite strong.

Suppose that $X$ is a (mean-zero) scalar random variable, and we have $X = X_1 + X_2$, with $X_1$ endogenous and $X_2$ exogenous; this partition of $X$ into endogenous and exogenous portions emerges in a natural application-specific way in one of the examples discussed below. One alternative is that the true data-generating process is such that we should estimate

$$Y = \beta(X_1 + X_2) + \varepsilon. \tag{2}$$

Another alternative, however, is instead

$$Y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon, \tag{3}$$

with $\beta_1 \neq \beta_2$. We can think of the rows of equation (1) as i.i.d. realizations of the data-generating process implied by equation (2) or (3). The simple point I make here is that in many applications, as the substantive examples discussed below suggest, equation (3) may be more natural than equation (2). Since the point of using IVLS is often to recover estimates of the coefficient on an endogenous variable such as $X = X_1 + X_2$, positing a model like equation (2) is an important part of the technique. However, erroneously assuming constant coefficients can also produce IVLS estimates that are misleading.

The point is not that there is a general failure in IVLS applications. Rather, the point is that in the regression context, the identification of causal effects depends not just on the exogeneity of the instrument(s) but also on the validity of the underlying model. This is, of course, a general point that goes beyond applications of IVLS, yet it is one we tend to forget in focusing our attention only on the exogeneity of the instruments.

The spirit of the discussion might also be put as follows. In a typical randomized controlled experiment, questions are sometimes raised about the extent to which the effect of one treatment can be generalized to another (perhaps similar) treatment. The standard response to such questions would be, "we need to conduct another experiment to find out." Yet the natural-experiment-cum-instrumental-variables approach seems to offer another alternative: although natural experiments often randomize nations or other units of analysis to treatments (weather patterns, settler mortality rates, and so on) other than the treatment of theoretical interest (such as GDP growth or institutional quality), it appears that we may use IVLS to recover the effect of these latter treatments. This approach can substitute for conducting a new experiment (or for finding a different natural experiment in which units are in fact randomized to the treatments of primary interest) only under assumptions that may be quite strong. In these contexts, the IVLS approach may not provide a free lunch, that is, a way to surmount the unfortunate fact that nature has randomized our units of analysis not to the treatments about which we care the most but rather to some other, related treatment.

Whether these issues are germane in any given application is mostly a matter for a priori reflection, though at the end of this article I sketch a statistical specification test that might be of some use. The specification test requires an additional instrument, however, and therefore may be of limited practical utility. The main goal of the paper is thus to underscore the general relevance of the issues and to encourage their discussion in applications.
Indeed, specification of the model, especially the assumption of constant effects, should perhaps be defended with the same energy with which we often defend exogeneity.

This paper relates to but is distinct from several strands of literature in econometrics, political science, statistics, and program evaluation. There is a large literature that discusses the relative merits of IVLS (Kennedy 1985: 115; Hanushek and Jackson 1977: 238). Bartels (1991) uses simulations to study a bias-variance trade-off under the assumption that the instrument itself is (weakly) endogenous. On a different topic, there is also a literature on instruments that may be exogenous but that are only weakly correlated with an endogenous regressor or set of regressors (e.g., Bound et al. 1995). The focus of the current paper differs from this previous work, in that it considers instruments that are strictly exogenous and that are also well-correlated with endogenous regressors. In this case, IVLS estimates are consistent, and the efficiency loss won't be too great because of the high correlation between the instrument and the endogenous regressors. Nonetheless, IVLS may fail to produce accurate estimates, due to a particular form of model misspecification. The paper thus focuses on inferential difficulties that arise even when the standard requirements for a valid instrument are met.

More related to the present article is an important recent literature on understanding instrumental variables in the presence of causal heterogeneity or essential heterogeneity. Many recent papers have clarified what instrumental variables can estimate in such settings. In some settings, for example, the instrumental variables approach estimates treatment effects for individuals whose behavior is modified by instruments (Heckman and Robb 1985, 1986); instrumental variables can identify what Imbens and Angrist (1994) call local average treatment effects (see also Angrist, Imbens, and Rubin 1996; for discussion, Heckman, Urzua and Vytlacil 2006). Rosenzweig and Wolpin (2000) also show that what IVLS estimates depends on the underlying behavioral models that are posited. In these papers, however, which are often formulated in the context of the Neyman-Holland-Rubin potential outcomes model, the heterogeneity of interest comes across units (individuals, countries, etc.), that is, across $i$.
In the present paper, we suppose that coefficients are constant across $i$ and instead investigate the consequences of heterogeneity across variables, that is, heterogeneity across exogenous and endogenous portions of the regressors in $X$.

2 An example on lottery winnings

In a recent paper, Doherty, Green and Gerber (2005) are interested in assessing the relationship between income and political attitudes. [1] They surveyed 342 people who had won a lottery in an Eastern state between 1983 and 2000 and asked a variety of questions about attitudes towards estate taxes, government redistribution, and social and economic policies more generally. Given the number and kinds of lottery tickets that individuals buy, the levels of lottery winnings are randomly assigned among lottery players. [2] Abstracting from sample non-response and other issues that might threaten the validity of the inferences, [3] Doherty, Green, and Gerber can exploit the lottery to make compelling claims about the causal impact of winnings on political beliefs. It turns out that winning large amounts in a lottery has an effect on some relatively narrow political attitudes (e.g., those who win more in the lottery favor the estate tax less), but lottery winnings have relatively little impact on broader political attitudes, for instance, towards the proper role of government in the economy writ large.

[1] Portions of the material in this section are based on Dunning (forthcoming).
[2] Lottery winners are paid a large range of dollar amounts. In Doherty et al.'s sample, the minimum total prize was \$47,581, while the maximum was \$15.1 million, both awarded in annual installments.
[3] See Doherty et al. (2005) for further details.

However, the question of perhaps greater social-scientific interest concerns the political effects of overall income: while relatively few people have lottery winnings, many have incomes. Does the natural experiment also allow us to generalize from the impact of lottery winnings to the effect of overall income on attitudes? It does not, without assumptions that may be quite strong in this context. As Doherty et al. (2005) carefully point out, the effect on political attitudes of windfall lottery winnings may be very different from other kinds of income, for example, income earned through work, interest on wealth inherited from a rich parent, and so on. These kinds of concerns may also limit our ability to use IVLS to estimate the causal effect of overall income on political attitudes.

Consider the regression equation

$$\text{ATTITUDES}_i = \beta\,\text{INCOME}_i + \varepsilon_i, \tag{4}$$

where $\text{ATTITUDES}_i$ measures the political attitudes of respondent $i$, $\text{INCOME}_i$ is the self-reported income (from all sources) of respondent $i$, and $\beta$ is a regression coefficient common to all respondents. [4] For ease of exposition, the variables are mean-deviated, and covariates are not included. [5] The error term $\varepsilon_i$ is a random variable, independently and identically distributed across respondents with $E(\varepsilon_i) = 0$. The goal is to estimate the value of the parameter $\beta$, defined here as the impact of overall income on political attitudes.

[4] Note that according to equation (4), subject $i$'s response depends on the values of $i$'s right-hand side variables; values for other subjects are irrelevant. The analog in Rubin's formulation of the Neyman model is the stable unit treatment value assumption (SUTVA) (Neyman 1923, Dabrowska and Speed 1990; Holland 1986).
[5] In a similar regression model, Doherty et al. (2005) include various covariates, including a vector of variables to control for the kind of lottery tickets bought, and another vector of demographic variables to boost statistical efficiency by adjusting for slight imbalances due to the randomization. Doherty et al. (2005) also estimate a series of ordered probit models to estimate the impact of lottery winnings per se on attitudes.

Equation (4) is the standard linear regression set-up, except for one catch: the error term is not independent of income, because unobserved (unmeasured) variables may be associated with both overall income and political attitudes. For instance, rich parents may teach their children how to play the stock market and also influence their attitudes towards government intervention. Peer-group networks may influence both economic success and political values. Ideology may itself shape economic returns, perhaps through the channel of beliefs about the returns to hard work. Even if some of these variables could be measured and controlled, clearly there are many unobserved variables that could conceivably confound inferences about the causal impact of overall income on political attitudes.

From the perspective of standard approaches to instrumental variables regression, however, the innovative research design of Doherty et al. (2005) supplies the perfect instrument, namely, a variable that is both correlated with overall income and independent of the error term in equation (4). This variable is the level of lottery winnings of respondent $i$. An accounting identity is

$$\text{INCOME}_i \equiv \text{EARNED INCOME}_i + \text{WINNINGS}_i, \tag{5}$$

where $\text{WINNINGS}_i$ are the lottery winnings of survey respondent $i$ and $\text{EARNED INCOME}_i$ is shorthand for all other income sources of respondent $i$, net of lottery winnings. (We call this earned income, though it could of course include any additional income source beyond lottery winnings.) Equation (5) implies that

$$\text{Cov}(\text{INCOME}_i, \text{WINNINGS}_i) \neq 0, \tag{6}$$

since the variable $\text{WINNINGS}_i$ is a component of $\text{INCOME}_i$. [6] Moreover, since levels of lottery winnings are randomly assigned to the lottery-playing survey respondents, winnings should be statistically independent of other characteristics of the respondents, including characteristics that might influence political attitudes. Thus:

$$\text{WINNINGS}_i \perp \varepsilon_i, \tag{7}$$

where $A \perp B$ means $A$ is independent of $B$. Viewed in the context of equation (4), equation (7) is an exclusion restriction (Greene 2003: 74-80). Together with equation (6), it says that there exists a variable that is correlated with the endogenous regressor $\text{INCOME}_i$ in equation (4) but that is independent of the error term in that equation. The Instrumental Variables Least Squares (IVLS) estimator is

$$\hat{\beta}_{IVLS} = \frac{\widehat{\text{Cov}}(\text{WINNINGS}, \text{ATTITUDES})}{\widehat{\text{Cov}}(\text{WINNINGS}, \text{INCOME})}, \tag{8}$$

that is, the sample covariance of lottery winnings and attitudes divided by the sample covariance of lottery winnings and overall income. [7] Under the assumptions of the model, equation (8) will provide a consistent estimator for $\beta$ in equation (4).

[6] This assumes (eminently plausibly) that $\text{Cov}(\text{EARNED INCOME}_i, \text{WINNINGS}_i) \neq -\text{Var}(\text{WINNINGS}_i)$.
[7] Equivalently, we can use Two-Stage Least Squares (IISLS); see Freedman (2005: ) on the equivalence of these estimators.

Note, however, that our ability to generalize from the effect of one treatment (lottery winnings) to the effect of another treatment (total income) is ensured only by the model in equation (4). Given the model, we can use the instrumental variables technique to obtain a consistent estimator of $\beta$, since lottery winnings are correlated with income but independent of the error term in equation (4). Yet the model itself does an important portion of the inferential work. To see this, note that by equation (5), the total income of individual $i$ will be the sum of income from lottery winnings and income from all other sources (or "earned income"). It is useful to rewrite equation (4) as

$$\text{ATTITUDES}_i = \beta(\text{EARNED INCOME}_i + \text{WINNINGS}_i) + \varepsilon_i. \tag{9}$$

Writing the equation this way makes the assumptions of the model clearer. Among the most important assumptions is that $\beta$ is assumed to be constant for all $i$, and, especially, constant for all forms of income. According to the model, it does not matter whether income comes from lottery winnings or from other sources: a marginal increment in either lottery winnings or in earned income will be associated with the same expected marginal increment in political attitudes. Put differently, the slope coefficient is assumed to be the same across endogenous and exogenous portions of the regressor $\text{INCOME}_i$. The model therefore assumes away the important inferential issue that Doherty et al. (2005) point out.

Suppose instead that we allowed lottery winnings to have its own slope parameter in equation (9), thus assuming that political attitudes are a linear combination of earned income, lottery winnings, and an error term. Then we can rewrite equation (9) as

$$\text{ATTITUDES}_i = \beta_1\,\text{EARNED INCOME}_i + \beta_2\,\text{WINNINGS}_i + \varepsilon_i. \tag{10}$$

The variable $\text{WINNINGS}_i$ is independent of the error term among lottery winners, due to the randomization provided by the natural experiment. However, $\text{EARNED INCOME}_i$ remains endogenous, perhaps because factors such as education or parental attitudes influence both earned income and political attitudes. We could again resort to the instrumental variables approach, but since we need as many instruments as there are regressors in (10), we will need some new instrument in addition to $\text{WINNINGS}_i$. Even if we could find one, we would need to assume a constant coefficient $\beta_1$ across the exogenous and endogenous portions of $\text{EARNED INCOME}_i$.

Suppose the data were generated according to equation (10), with $\beta_1 \neq \beta_2$, and we (erroneously) assume equation (9). Now we estimate the model with IVLS, using WINNINGS as an instrument for INCOME. As I show analytically in Section 4, given the independence of EARNED INCOME and WINNINGS, IVLS will asymptotically estimate the coefficient $\beta_2$ on WINNINGS. Yet the coefficient on the endogenous variable EARNED INCOME may be of theoretical interest. (After all, if we only cared about $\beta_2$, we could simply regress ATTITUDES on the exogenous variable WINNINGS.)

In this context, IVLS provides no free lunch, despite the fact that we have an ideal instrumental variable for the endogenous regressor in equation (4). The identification provided by the natural experiment can help us recover the impact of lottery winnings, but not the impact of overall income, unless the data were indeed generated according to (4). Again, the point is not that there is a general flaw in the IVLS approach. The point is that the model is misspecified: to use the IVLS approach effectively, the data should have been generated according to (9), not (10).
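The contrast between equations (9) and (10) can be illustrated with a small simulation. This is a minimal sketch, not the simulations the paper reports elsewhere: the distributions, the unobserved `confounder`, the coefficient values, and the helper name `iv_cov_ratio` are all hypothetical choices made for illustration.

```python
import numpy as np

def iv_cov_ratio(instrument, regressor, outcome):
    """Sample analogue of equation (8): Cov(z, y) / Cov(z, x)."""
    cov_zy = np.cov(instrument, outcome)[0, 1]
    cov_zx = np.cov(instrument, regressor)[0, 1]
    return cov_zy / cov_zx

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data: an unobserved confounder drives both earned income and attitudes.
confounder = rng.normal(size=n)
earned = confounder + rng.normal(size=n)        # endogenous component
winnings = rng.exponential(scale=1.0, size=n)   # randomized component, independent of the rest
income = earned + winnings                      # accounting identity (5)
eps = confounder + rng.normal(size=n)           # error term, correlated with earned income

# Case A: data follow equation (9), one common slope (assumed value).
beta = 0.5
att_common = beta * income + eps
iv_common = iv_cov_ratio(winnings, income, att_common)

# Case B: data follow equation (10), separate slopes (assumed values).
beta1, beta2 = 1.0, 0.2
att_split = beta1 * earned + beta2 * winnings + eps
iv_split = iv_cov_ratio(winnings, income, att_split)

# Plain regression slope of attitudes on winnings: Cov(w, y) / Var(w).
slope_on_winnings = iv_cov_ratio(winnings, winnings, att_split)

print(f"(9):  IVLS = {iv_common:.3f}  (beta = {beta})")
print(f"(10): IVLS = {iv_split:.3f}  (beta1 = {beta1}, beta2 = {beta2})")
print(f"(10): slope of attitudes on winnings = {slope_on_winnings:.3f}")
```

Under these assumptions, the IVLS estimate in the first case should settle near the common slope, while in the second case it should settle near the coefficient on winnings and match the simple regression on winnings, leaving the coefficient on earned income unrecovered.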

3 An example on economic growth and civil conflict in Africa

Before turning to the analytic results, however, I explore a different example drawn from the growing literature that applies the IVLS approach in comparative political economy and comparative politics. As we will see, there are issues that are parallel to those raised above, in the context of the study of lottery winnings. Just as in that study, the point here is neither that IVLS is necessarily the wrong approach nor that there are fundamental flaws in the study: indeed, the study reviewed here is one of the most creative and compelling of recent applications of IVLS in the field. The point is simply that purging estimates of their endogeneity through an application of IVLS depends on an important assumption about homogeneous effects, and this is an assumption that it may be useful to discuss.

Miguel, Satyanath, and Sergenti (2004) are interested in the effects of economic recession on the likelihood of civil conflict in Africa. Although the idea that adverse economic conditions may incite conflict is an old one, recent studies have posited specific mechanisms through which economic recessions may increase the likelihood of civil war. For instance, according to the influential models advanced by Paul Collier and Anke Hoeffler of the World Bank (1998, 2001, 2002), economic factors help to explain the incidence of civil war because of the important role they play in rebel recruitment. Miguel et al. (2004: 727) summarize the approach as follows: "Collier and Hoeffler stress the gap between the returns from taking up arms relative to those from conventional economic activities, such as farming, as the causal mechanism linking low income to the incidence of civil war." [8] Indeed, Collier and Hoeffler suggest that the economic motives of potential rebels far outweigh other factors, such as indicators of social injustice, in explaining the incidence of rebellion. In their well-known formulation, it is greed, not grievance, that mainly explains variation in the occurrence of civil wars.

[8] Miguel et al. (2004) also discuss an alternative, though possibly complementary, approach, that of Fearon and Laitin (2003), who emphasize the importance of state capacity and road coverage in explaining the outbreak and duration of civil war.

However, there is an important problem for purposes of testing such theories about the influence of economic conditions on civil conflict: civil conflict may influence economic conditions, and there may be confounding, too. As Miguel et al. (2004) put it, "the existing literature does not adequately address the endogeneity of economic variables to civil war and thus does not convincingly establish a causal relationship. In addition to endogeneity, omitted variables, for example, government institutional quality, may drive both economic outcomes and conflict, producing misleading cross-country estimates" (2004: 726). In other words, in a regression of civil conflict on economic growth, the latter may be dependent on the error term in the underlying regression model.

The solution to this problem, Miguel et al. (2004) suggest, is instrumental variables regression. The instrument for economic growth they propose is weather shocks stemming from variation in rainfall. In sub-Saharan Africa, as these authors demonstrate empirically, there is a strong positive correlation between percentage change in rainfall over the previous year and economic growth (and the correlation holds up for both lagged and contemporaneous annual change in rainfall). Drought hinders economic growth. Rainfall thus passes one key requirement for a potential instrument, that it be correlated with the endogenous regressor. The other key requirement, and always the harder one to fulfill, is that rainfall is exogenous, that is, independent of the error term in the underlying regression model. This assertion is, of course, essentially untestable; but Miguel et al. (convincingly) probe its plausibility at length. Although establishing the plausibility of an instrument's exogeneity is always an important component of the instrumental variables approach, it is not the issue here. For purposes of this discussion, we will therefore assume rainfall is exogenous, which seems very sensible.
The IVLS estimates presented by Miguel et al. suggest a strong negative relationship between economic growth and civil conflict: a five-percentage-point drop in annual economic growth increases the likelihood of a civil conflict (at least 25 deaths per year) in the following year by over 12 percentage points, which amounts to an increase of more than one-half in the likelihood of civil war (2004: 727). [9] Miguel et al. also find, perhaps surprisingly, that the impact of income shocks on civil conflict is not significantly different in richer, more democratic, more ethnically diverse, or more mountainous African countries. This appears to be compelling evidence of a causal relationship, and Miguel et al. also have a plausible mechanism to explain the effect, namely, the impact of drought on the recruitment of rebel soldiers.

[9] The dependent variable is dichotomous; it measures the incidence of civil conflict in which there are more than 25 (alternatively, more than 1,000) battle deaths in a given year. The main equation thus appears to be specified as a linear probability model.

But have Miguel et al. really estimated the effect of economic growth on conflict? This is not so clear. Making this assertion, as we will see, depends on specific assumptions about the way growth produces conflict. In particular, it depends on positing a model in which economic growth has a constant effect on civil conflict: constant, that is, across the components of growth. As with the example on lottery winnings, using the IVLS machinery to identify causal effects depends not just on the validity of the exclusion restriction; it also depends on the validity of this model.

The point might be elaborated as follows. Suppose for purposes of this argument that we can model economic growth in country $i$ in year $t$ as a function of growth in two sectors, agriculture and industry. We then want to consider two alternative ways in which civil conflict could be a function of economic variables. On the one hand, it might be the case that the probability of civil conflict is given by

$$\text{Prob}\{C_{it} = 1\} = \gamma Y_{it} + \varepsilon_{it}, \tag{11}$$

where $C_{it}$ is a binary variable for conflict in country $i$ in year $t$ (with $C_{it} = 1$ indicating conflict), $Y_{it}$ is the economic growth rate of country $i$ in year $t$, and $\varepsilon_{it}$ is a latent mean-zero random variable meant to capture unmeasured characteristics that affect the probability of civil war. [10] According to the model, if we intervene to increase the economic growth rate in country $i$ and year $t$ by one unit, the probability of conflict in that country-year is expected to increase by $\gamma$ units (or to decrease, if $\gamma$ is negative). However, the model is agnostic about the source of this increase in economic growth. Indeed, if we want to influence the probability of conflict we might consider different interventions to boost growth: for example, we might target foreign aid with an eye to increasing industrial productivity, or we might hope that more rainfall will boost agricultural productivity.

[10] Equation (11) resembles the main equation found in Miguel et al. (2004: 737), though we abstract from control variables as well as lagged growth values here for ease of presentation. Miguel et al. in fact specify $C_{it} = \gamma(Y_{it}) + X_{it}\beta + \varepsilon_{it}$, so the dichotomous $C_{it}$ is assumed to be a linear combination of continuous right-hand side covariates and a continuous error term; yet the authors clearly have in mind a linear probability model, so in the text I write equation (11) instead.

Suppose, on the other hand, that growth in agriculture and growth in industry (which both influence overall economic growth) have different effects on conflict, as in the following model:

$$\text{Prob}\{C_{it} = 1\} = \alpha I_t + \beta A_t + \varepsilon_{it}, \tag{12}$$

where $I_t$ and $A_t$ are the annual growth rates in industry and agriculture, respectively. What might motivate such an alternative model? As the causal mechanism posited by Collier and Hoeffler suggests, decreases in agricultural productivity may increase the difference in returns to taking up arms and farming, making it more likely that the rebel force will grow and civil conflict will increase; yet in a context in which many rebels are recruited from the countryside, changes in (urban) industrial productivity may have no, or at least different, effects on the probability of conflict. In this context, heterogeneous effects on the probability of conflict across components of growth may be the conservative assumption.

Moving from either equation (11) or equation (12) to data, and thus to estimation of $\gamma$ or $\alpha$ and $\beta$, requires further statistical assumptions. In a standard linear probability model, the $Y_{it}$ in equation (11) would be independent of the $\varepsilon_{it}$, and OLS would give unbiased estimates of $\gamma$. The problem is that $Y_{it}$ is dependent on the $\varepsilon_{it}$, that is, endogenous. The solution that Miguel et al. propose is IVLS, with rainfall growth as an instrument for the $Y_{it}$. Given the response schedule in equation (11), this seems like a very plausible solution. Rainfall growth is correlated with economic growth in Africa. If rainfall growth is also exogenous, as Miguel et al. argue, IVLS delivers the goods.

If the true data-generating process is equation (12), however, another approach is needed. It seems reasonable that industrial growth and agricultural growth will both be dependent on the error term in equation (12), for the same reasons as Miguel et al. (2004) suggest that overall economic growth is endogenous. For instance, conflict may depress agricultural growth, and harm urban productivity as well. If rainfall growth is correlated with agricultural growth but not with industrial growth (as seems plausible intuitively), we have a good instrument for $A_t$. But we cannot estimate $\alpha$ without an additional instrument for industrial productivity.

The point here is not that equation (12) is the right response schedule; indeed, it is as stylized as equation (11), and there may well be contexts in which it is more appropriate to estimate (12). There are important policy implications, of course: if growth reduces conflict no matter what the source, we might counsel more foreign aid for the urban industrial sector, while if only agricultural productivity matters, the policy recommendations would be quite different. The objective here, however, is merely to point out that what IVLS estimates in this context (or any context) depends importantly on the assumed model, and not just on the plausible exogeneity of the instrument.

4 Analysis

If the data-generating process involves heterogeneous effects across endogenous and exogenous portions of $X$, and we instead assume homogeneous effects, what does IVLS estimate? In this section, I analyze a case akin to the example on lottery winnings, where an endogenous regressor breaks down into the sum of exogenous and endogenous portions.

For each observation i, the true data-generating process is

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i, \qquad (13)$$

where $\beta_1$ and $\beta_2$ are parameters. When $\beta_1 \neq \beta_2$, equation (13) is identical to equation (10) in the example on lottery winnings; $x_1$ is analogous to earned income and $x_2$ is analogous to lottery winnings. The subjects are i.i.d., and $E(\epsilon_i) = E(x_{1i}) = E(x_{2i}) = 0$, with $\epsilon \perp\!\!\!\perp x_2$ but $\mathrm{Cov}(\epsilon, x_1) \neq 0$. Also,

$$x_1 \perp\!\!\!\perp x_2, \qquad (14)$$

as in the example on lottery winnings: subjects are randomized by the natural experiment to levels of $x_2$. Suppose we (erroneously) assume that the data were generated according to

$$y_i = \beta (x_{1i} + x_{2i}) + \epsilon_i. \qquad (15)$$

Equation (15) is the usual regression model, with one exception: the regressor $x_{Ti} \equiv x_{1i} + x_{2i}$ (with T for "total") is endogenous, because $x_1$ and $\epsilon$ are dependent. However, we have available to us (by construction) a valid instrument, since $x_2$ is correlated with the endogenous regressor but also independent of the error term. The instrumental variables estimator is

$$\hat{\beta}_{IVLS} = \frac{\mathrm{Cov}(x_2, y)}{\mathrm{Cov}(x_2, x_T)}, \qquad (16)$$

where the covariances are taken over the data (footnote 11). Now, substituting for $x_T$ and distributing covariances, we have

$$\hat{\beta}_{IVLS} = \frac{\mathrm{Cov}(x_2, y)}{\mathrm{Cov}(x_2, x_1) + \mathrm{Var}(x_2)}. \qquad (17)$$

Footnote 11. Equation (16) is valid because all of the x's have expectation 0.

By assumption, $x_1$ and $x_2$ are independent, so $\mathrm{Cov}(x_2, x_1)$ should be near zero, and

$$\lim_{n \to \infty} \frac{\mathrm{Cov}(x_2, y)}{\mathrm{Var}(x_2)} = \beta_2. \qquad (18)$$

IVLS here delivers an estimate of the impact of the exogenous portion of treatment, to which subjects have been randomized, but not of the endogenous portion of the aggregate variable of interest, $x_T$. Yet we are ultimately interested in the effect of $x_1$, or at least $x_T$; otherwise, we could simply regress the dependent variable on $x_2$. This is the sense in which there is no free lunch provided by the exogeneity of the instrument, given that the data-generating process is equation (13) and not (15).

In other cases, the situation may be somewhat more complicated. For instance, when $\mathrm{Cov}(x_1, x_2) \neq 0$, the IVLS estimate of β in equation (15) will converge to a mixture of $\beta_1$ and $\beta_2$, the weights being $w = \mathrm{Cov}(x_2, x_1) / [\mathrm{Cov}(x_2, x_1) + \mathrm{Var}(x_2)]$ and $1 - w$. In simulations reported on the author's website, I investigate what IVLS estimates under a range of other assumptions about the true data-generating process.

Footnote 12. In the formula for w, Var and Cov operate on random variables, and w could be negative.

A specification test

The discussion above suggests a natural specification test, which requires the availability of an additional instrument, $z_1$, with the following properties:

$$z_1 \perp\!\!\!\perp \epsilon \qquad (19)$$

and

$$\mathrm{Cov}(z_1, x_1) \neq 0, \qquad (20)$$

where the notation follows the section above. We will then use IVLS to estimate the model in equation (13) above, that is,

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + \epsilon_i, \qquad (21)$$

using $z_1$ and $x_2$ (which is exogenous) as the instruments. Let Σ be the estimated variance-covariance matrix for the coefficient estimates:

$$\widehat{\mathrm{Cov}}(\hat{\beta}_1, \hat{\beta}_2 \mid x_1, z_1) = \Sigma. \qquad (22)$$

Using the diagonal and off-diagonal elements of this 2 × 2 matrix, we can calculate

$$\widehat{\mathrm{Var}}(\hat{\beta}_1 - \hat{\beta}_2) = \widehat{\mathrm{Var}}(\hat{\beta}_1) + \widehat{\mathrm{Var}}(\hat{\beta}_2) - 2\, \widehat{\mathrm{Cov}}(\hat{\beta}_1, \hat{\beta}_2). \qquad (23)$$

The coefficient estimates are asymptotically normal, and z-tests for the difference can be applied (see Greene 2003 for details). If pooling is appropriate, then the estimated coefficient on $x_1$ should be the same as the estimated coefficient on $x_2$, up to random error. Statistical tests should therefore fail to reject the null hypothesis that $\beta_1$ and $\beta_2$ are equal. This adaptation of a standard test compares a pooling estimator to a splitting estimator; it could be viewed as a Hausman test, in which an additional instrument is needed to test the pooling restriction because $x_1$ is endogenous.

In simulations, the specification test is able to detect model specification failures with a high degree of accuracy. Of course, like most specification tests, this one is robust only against a limited class of alternatives: we stipulate that the data are generated according to equation (13), and the alternatives are that $\beta_1 = \beta_2$ or $\beta_1 \neq \beta_2$. Moreover, since the test requires the availability of an additional instrument, it may only be useful in certain classes of

applications. For instance, we do not attempt to key the test to data from the examples discussed in this paper because we do not see an available additional instrument.

Conclusion

In a given natural experiment, nature may assign the units of analysis not to levels of the treatment variable X that is of greatest interest but to levels of another variable Z. Nonetheless, the causal effect of X may be recovered through instrumental variables regression analysis, provided that the assumptions of the IVLS model are valid. In most applications, analysts tend to focus attention on two canonical requirements for a valid instrumental variable: the variable Z must be correlated with the endogenous regressor X, and it must itself be exogenous, that is, independent of the error term. The first assumption can be checked from the data. The second, essentially untestable, assumption is generally the more difficult one, and it is the one for which a good natural experiment can be particularly useful.

However, satisfying these requirements is not enough for valid application of the instrumental variables approach. In particular, the regression model linking X to Y must also be valid. While this may seem obvious, in this article I have drawn attention to a too-infrequently remarked feature of the canonical IVLS regression model: the assumption of constant effects across exogenous and endogenous portions of the problematic regressor X. Violations of this assumption can limit the ability of the natural-experiment-cum-instrumental-variables approach to recover causal parameters.

For example, in order to use lottery income to estimate the effect of overall income on political attitudes, we must assume that the effects of lottery income and earned income are the same. To use rainfall changes to estimate the effect of economic growth on civil conflict, we must assume that growth in the agricultural sector has the same effect as growth in the industrial sector.
These are strong assumptions, and they should be defended with the same kind of energy that is used to defend exogeneity.

Footnote 13. For simulations, see the author's website.

If the assumption of constant effects across endogenous and exogenous portions of X is wrong, then IVLS estimates can be quite misleading. When heterogeneity takes the simple form I have investigated here (that is, the treatment variable is a sum of independent exogenous and endogenous portions), instrumental variables regression simply estimates the coefficient of the exogenous portion. In more complicated settings, IVLS may estimate a mixture of the true coefficients of interest. Thus, if the model is incorrectly specified, exogeneity may not be much help.

Ultimately, of course, the question of model specification is a theoretical and not a technical one. Whether it is proper to specify constant coefficients across exogenous and endogenous portions of a treatment variable, in examples like those discussed in this paper, is a matter for a priori reflection. This is not unique to applications of IVLS; indeed, similar issues may arise even if there is no endogeneity. Yet special issues are raised with IVLS because we often hope to use the technique to recover the causal impact of endogenous treatments.

What about the potential problem of infinite regress? In the lottery example, for instance, it might well be that different kinds of earned income have different impacts on political attitudes; in the Africa example, different sorts of agricultural income could have different effects on conflict. To test many permutations, given the endogeneity of the variables, we would need many instruments, and these are not usually available. This is exactly the point. Deciding when it is appropriate to assume constant effects is a crucial theoretical issue. That issue tends to be given short shrift in current applications of the natural-experiment-cum-instrumental-variables approach, where the focus is on exogeneity.
The basic point emphasized here is therefore not that data analysis or regression diagnostics are the key. Rather, in any particular application, a priori and theoretical reasoning should be brought to bear to justify the crucial assumption of constant effects across endogenous and exogenous portions of the problematic regressor. In some settings, the constancy assumption may be innocuous; the point here is not that the IVLS approach is necessarily flawed, but that the validity of the underlying regression model should be carefully considered. The "no free lunch" principle suggests that randomization to Z instead of X may not be enough to recover the causal impact of X.
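The analytic results of Section 4 and the specification test can be checked numerically. The following is a minimal sketch in Python, not the simulations reported on the author's website. Everything about the data-generating process beyond equation (13) is an assumption made for illustration: the variables are standard normal, an unobserved confounder `u` makes `x1` endogenous, `rho` controls Cov(x1, x2), and the function names (`simulate`, `pooled_ivls`, `split_ivls_test`) are hypothetical. The variance matrix uses the conventional homoskedastic formula for a just-identified IV estimator, which is also an assumption.

```python
import numpy as np


def simulate(n, beta1, beta2, rho=0.0, seed=0):
    """Draw data from equation (13); rho sets Cov(x1, x2) (assumed DGP)."""
    rng = np.random.default_rng(seed)
    x2 = rng.normal(size=n)        # exogenous portion, as if randomized
    z1 = rng.normal(size=n)        # additional instrument for x1
    u = rng.normal(size=n)         # unobserved confounder
    x1 = z1 + rho * x2 + u         # endogenous portion
    eps = u + rng.normal(size=n)   # error: correlated with x1, not with x2 or z1
    y = beta1 * x1 + beta2 * x2 + eps
    return y, x1, x2, z1


def pooled_ivls(y, x1, x2):
    """Equation (16): instrument the total x_T = x1 + x2 with x2."""
    x_t = x1 + x2
    return np.cov(x2, y)[0, 1] / np.cov(x2, x_t)[0, 1]


def split_ivls_test(y, x1, x2, z1):
    """Estimate equation (21) with instruments (z1, x2); z-test of beta1 = beta2."""
    X = np.column_stack([x1, x2])
    Z = np.column_stack([z1, x2])
    zx_inv = np.linalg.inv(Z.T @ X)
    beta = zx_inv @ (Z.T @ y)                 # just-identified IV estimator
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov_beta = sigma2 * zx_inv @ (Z.T @ Z) @ zx_inv.T   # plays the role of Sigma in (22)
    var_diff = cov_beta[0, 0] + cov_beta[1, 1] - 2 * cov_beta[0, 1]   # equation (23)
    return beta, (beta[0] - beta[1]) / np.sqrt(var_diff)


n = 200_000

# Independent portions: pooled IVLS should be close to beta2, not beta1.
y, x1, x2, z1 = simulate(n, beta1=2.0, beta2=0.5)
est_indep = pooled_ivls(y, x1, x2)

# Correlated portions: pooled IVLS should be close to w*beta1 + (1 - w)*beta2.
rho = 0.5
w = rho / (rho + 1.0)   # Cov(x2, x1) / [Cov(x2, x1) + Var(x2)] under this DGP
y, x1, x2, z1 = simulate(n, beta1=2.0, beta2=0.5, rho=rho)
est_mix = pooled_ivls(y, x1, x2)
beta_split, z_het = split_ivls_test(y, x1, x2, z1)

# Pooling is appropriate when beta1 == beta2; the z-statistic should be small.
y0, x10, x20, z10 = simulate(n, beta1=1.0, beta2=1.0, seed=1)
_, z_null = split_ivls_test(y0, x10, x20, z10)

print("pooled IVLS, independent portions:", est_indep)
print("pooled IVLS, correlated portions:", est_mix, "mixture:", w * 2.0 + (1 - w) * 0.5)
print("split estimates:", beta_split, "z (heterogeneous):", z_het, "z (null):", z_null)
```

Under these assumptions the pooled estimate tracks the coefficient on the exogenous portion when the portions are independent and the weighted mixture otherwise, while the split estimator with the extra instrument recovers both coefficients. The sketch omits an intercept because all variables have expectation zero by construction; with real data one would normally include a constant in both the regressors and the instruments.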

References

[1] Acemoglu, Daron, Simon Johnson, and James Robinson. The Colonial Origins of Comparative Development: An Empirical Investigation. The American Economic Review 91 (5).
[2] Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association 91 (434).
[3] Angrist, Joshua D. and Victor Lavy. Using Maimonides' Rule to Estimate the Effect of Class Size on Student Achievement. Quarterly Journal of Economics 114.
[4] Bartels, Larry M. Instrumental and Quasi-Instrumental Variables. American Journal of Political Science 35 (3).
[5] Bound, John, David Jaeger, and Regina Baker. Problems with Instrumental Variables Estimation when the Correlation between the Instruments and the Endogenous Explanatory Variables is Weak. Journal of the American Statistical Association 90 (430).
[6] Doherty, Daniel, Donald Green, and Alan Gerber. Personal Income and Attitudes toward Redistribution: A Study of Lottery Winners. Political Psychology 27 (3).
[7] Dunning, Thad. Forthcoming. Improving Causal Inference: Strengths and Limitations of Natural Experiments. Political Research Quarterly. Previous version presented at the meetings of the American Political Science Association, August 31-September 5, Washington, D.C.
[8] Freedman, David. Statistical Models: Theory and Practice. Cambridge: Cambridge University Press.
[9] Freedman, David, Robert Pisani, and Roger Purves. Statistics. 3rd ed. New York: W.W. Norton, Inc.
[10] Greene, William H. 2003. Econometric Analysis. Fifth Edition. Upper Saddle River, NJ: Prentice Hall.
[11] Hanushek, Eric A. and John E. Jackson. Statistical Methods for Social Scientists. San Diego, CA: Academic Press, Harcourt Brace Company.
[12] Heckman, James J. Randomization as an Instrumental Variable. The Review of Economics and Statistics 78 (2).
[13] Heckman, James J. and R. Robb. Alternative Methods for Evaluating the Impact of Interventions. In James J. Heckman and Burton Singer, eds., Longitudinal Analysis of Labor Market Data, Volume 10. New York: Cambridge University Press.
[14] Heckman, James J. and R. Robb. Alternative Methods for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes. In Howard Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag.
[15] Heckman, James J., Sergio Urzua, and Edward Vytlacil. Understanding Instrumental Variables in Models with Essential Heterogeneity. Paper given by Heckman as the Tjalling C. Koopmans Lecture, Cowles Foundation, Yale University, September 26-27.
[16] Holland, Paul W. Statistics and Causal Inference. Journal of the American Statistical Association 81 (with discussion).
[17] Kennedy, Peter. A Guide to Econometrics. 2d ed. Cambridge: MIT Press.
[18] Krasno, Jonathan S. and Donald P. Green. Do Televised Presidential Ads Increase Voter Turnout? Evidence from a Natural Experiment. Manuscript, Department of Political Science, Yale University.
[19] Neyman, Jerzy. Sur les applications de la théorie des probabilités aux expériences agricoles: Essai des principes. Roczniki Nauk Rolniczych 10: 1-51, in Polish. English translation by D. M. Dabrowska and T. P. Speed (1990), Statistical Science 5 (with discussion).
[20] Rosenzweig, Mark R. and Kenneth I. Wolpin. Natural "Natural Experiments" in Economics. Journal of Economic Literature 38 (4).
[21] Rubin, Donald. Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies. Journal of Educational Psychology 66.


More information

STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical Methods Course code: EC40 Examiner: Lena Nekby Number of credits: 7,5 credits Date of exam: Friday, June 5, 009 Examination time: 3 hours

More information

Ec1123 Section 7 Instrumental Variables

Ec1123 Section 7 Instrumental Variables Ec1123 Section 7 Instrumental Variables Andrea Passalacqua Harvard University andreapassalacqua@g.harvard.edu November 16th, 2017 Andrea Passalacqua (Harvard) Ec1123 Section 7 Instrumental Variables November

More information

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents Longitudinal and Panel Data Preface / i Longitudinal and Panel Data: Analysis and Applications for the Social Sciences Table of Contents August, 2003 Table of Contents Preface i vi 1. Introduction 1.1

More information

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008

A Course in Applied Econometrics Lecture 7: Cluster Sampling. Jeff Wooldridge IRP Lectures, UW Madison, August 2008 A Course in Applied Econometrics Lecture 7: Cluster Sampling Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of roups and

More information

Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D.

Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D. Comments on The Role of Large Scale Assessments in Research on Educational Effectiveness and School Development by Eckhard Klieme, Ph.D. David Kaplan Department of Educational Psychology The General Theme

More information

Sixty years later, is Kuznets still right? Evidence from Sub-Saharan Africa

Sixty years later, is Kuznets still right? Evidence from Sub-Saharan Africa Quest Journals Journal of Research in Humanities and Social Science Volume 3 ~ Issue 6 (2015) pp:37-41 ISSN(Online) : 2321-9467 www.questjournals.org Research Paper Sixty years later, is Kuznets still

More information

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case

Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College December 2016 Abstract Lewbel (2012) provides an estimator

More information

Part VII. Accounting for the Endogeneity of Schooling. Endogeneity of schooling Mean growth rate of earnings Mean growth rate Selection bias Summary

Part VII. Accounting for the Endogeneity of Schooling. Endogeneity of schooling Mean growth rate of earnings Mean growth rate Selection bias Summary Part VII Accounting for the Endogeneity of Schooling 327 / 785 Much of the CPS-Census literature on the returns to schooling ignores the choice of schooling and its consequences for estimating the rate

More information

Assessing Studies Based on Multiple Regression

Assessing Studies Based on Multiple Regression Assessing Studies Based on Multiple Regression Outline 1. Internal and External Validity 2. Threats to Internal Validity a. Omitted variable bias b. Functional form misspecification c. Errors-in-variables

More information

Policy-Relevant Treatment Effects

Policy-Relevant Treatment Effects Policy-Relevant Treatment Effects By JAMES J. HECKMAN AND EDWARD VYTLACIL* Accounting for individual-level heterogeneity in the response to treatment is a major development in the econometric literature

More information

Research Note: A more powerful test statistic for reasoning about interference between units

Research Note: A more powerful test statistic for reasoning about interference between units Research Note: A more powerful test statistic for reasoning about interference between units Jake Bowers Mark Fredrickson Peter M. Aronow August 26, 2015 Abstract Bowers, Fredrickson and Panagopoulos (2012)

More information

ECON 402: Advanced Macroeconomics 1. Advanced Macroeconomics, ECON 402. New Growth Theories

ECON 402: Advanced Macroeconomics 1. Advanced Macroeconomics, ECON 402. New Growth Theories ECON 402: Advanced Macroeconomics 1 Advanced Macroeconomics, ECON 402 New Growth Theories The conclusions derived from the growth theories we have considered thus far assumes that economic growth is tied

More information

What s New in Econometrics. Lecture 13

What s New in Econometrics. Lecture 13 What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

More information

Big Data, Machine Learning, and Causal Inference

Big Data, Machine Learning, and Causal Inference Big Data, Machine Learning, and Causal Inference I C T 4 E v a l I n t e r n a t i o n a l C o n f e r e n c e I F A D H e a d q u a r t e r s, R o m e, I t a l y P a u l J a s p e r p a u l. j a s p e

More information

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems

Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Wooldridge, Introductory Econometrics, 3d ed. Chapter 9: More on specification and data problems Functional form misspecification We may have a model that is correctly specified, in terms of including

More information

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from

Lecture Notes 1: Decisions and Data. In these notes, I describe some basic ideas in decision theory. theory is constructed from Topics in Data Analysis Steven N. Durlauf University of Wisconsin Lecture Notes : Decisions and Data In these notes, I describe some basic ideas in decision theory. theory is constructed from The Data:

More information

Multiple Linear Regression CIVL 7012/8012

Multiple Linear Regression CIVL 7012/8012 Multiple Linear Regression CIVL 7012/8012 2 Multiple Regression Analysis (MLR) Allows us to explicitly control for many factors those simultaneously affect the dependent variable This is important for

More information

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i,

A Course in Applied Econometrics Lecture 18: Missing Data. Jeff Wooldridge IRP Lectures, UW Madison, August Linear model with IVs: y i x i u i, A Course in Applied Econometrics Lecture 18: Missing Data Jeff Wooldridge IRP Lectures, UW Madison, August 2008 1. When Can Missing Data be Ignored? 2. Inverse Probability Weighting 3. Imputation 4. Heckman-Type

More information

Chapter 2: simple regression model

Chapter 2: simple regression model Chapter 2: simple regression model Goal: understand how to estimate and more importantly interpret the simple regression Reading: chapter 2 of the textbook Advice: this chapter is foundation of econometrics.

More information

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague

Econometrics. Week 8. Fall Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Econometrics Week 8 Institute of Economic Studies Faculty of Social Sciences Charles University in Prague Fall 2012 1 / 25 Recommended Reading For the today Instrumental Variables Estimation and Two Stage

More information

ECON Introductory Econometrics. Lecture 17: Experiments

ECON Introductory Econometrics. Lecture 17: Experiments ECON4150 - Introductory Econometrics Lecture 17: Experiments Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 13 Lecture outline 2 Why study experiments? The potential outcome framework.

More information

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case

Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Econ 2148, fall 2017 Instrumental variables I, origins and binary treatment case Maximilian Kasy Department of Economics, Harvard University 1 / 40 Agenda instrumental variables part I Origins of instrumental

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

Club Convergence: Some Empirical Issues

Club Convergence: Some Empirical Issues Club Convergence: Some Empirical Issues Carl-Johan Dalgaard Institute of Economics University of Copenhagen Abstract This note discusses issues related to testing for club-convergence. Specifically some

More information

Technical Track Session I: Causal Inference

Technical Track Session I: Causal Inference Impact Evaluation Technical Track Session I: Causal Inference Human Development Human Network Development Network Middle East and North Africa Region World Bank Institute Spanish Impact Evaluation Fund

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Has the Family Planning Policy Improved the Quality of the Chinese New. Generation? Yingyao Hu University of Texas at Austin

Has the Family Planning Policy Improved the Quality of the Chinese New. Generation? Yingyao Hu University of Texas at Austin Very preliminary and incomplete Has the Family Planning Policy Improved the Quality of the Chinese New Generation? Yingyao Hu University of Texas at Austin Zhong Zhao Institute for the Study of Labor (IZA)

More information

Analysis of Panel Data: Introduction and Causal Inference with Panel Data

Analysis of Panel Data: Introduction and Causal Inference with Panel Data Analysis of Panel Data: Introduction and Causal Inference with Panel Data Session 1: 15 June 2015 Steven Finkel, PhD Daniel Wallace Professor of Political Science University of Pittsburgh USA Course presents

More information

PhD/MA Econometrics Examination January 2012 PART A

PhD/MA Econometrics Examination January 2012 PART A PhD/MA Econometrics Examination January 2012 PART A ANSWER ANY TWO QUESTIONS IN THIS SECTION NOTE: (1) The indicator function has the properties: (2) Question 1 Let, [defined as if using the indicator

More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Online Appendix: The Role of Theory in Instrument-Variables Strategies

Online Appendix: The Role of Theory in Instrument-Variables Strategies Journal of Economic Perspectives Volume 24, Number 3 Summer 2010 Pages 1 6 Online Appendix: The Role of Theory in Instrument-Variables Strategies In this appendix, I illustrate the role of theory further

More information

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data

Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data Panel data Repeated observations on the same cross-section of individual units. Important advantages relative to pure cross-section data - possible to control for some unobserved heterogeneity - possible

More information

CEPA Working Paper No

CEPA Working Paper No CEPA Working Paper No. 15-06 Identification based on Difference-in-Differences Approaches with Multiple Treatments AUTHORS Hans Fricke Stanford University ABSTRACT This paper discusses identification based

More information

Applied Quantitative Methods II

Applied Quantitative Methods II Applied Quantitative Methods II Lecture 4: OLS and Statistics revision Klára Kaĺıšková Klára Kaĺıšková AQM II - Lecture 4 VŠE, SS 2016/17 1 / 68 Outline 1 Econometric analysis Properties of an estimator

More information

Making sense of Econometrics: Basics

Making sense of Econometrics: Basics Making sense of Econometrics: Basics Lecture 4: Qualitative influences and Heteroskedasticity Egypt Scholars Economic Society November 1, 2014 Assignment & feedback enter classroom at http://b.socrative.com/login/student/

More information

Regression Discontinuity Designs

Regression Discontinuity Designs Regression Discontinuity Designs Kosuke Imai Harvard University STAT186/GOV2002 CAUSAL INFERENCE Fall 2018 Kosuke Imai (Harvard) Regression Discontinuity Design Stat186/Gov2002 Fall 2018 1 / 1 Observational

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Quantitative Empirical Methods Exam

Quantitative Empirical Methods Exam Quantitative Empirical Methods Exam Yale Department of Political Science, August 2016 You have seven hours to complete the exam. This exam consists of three parts. Back up your assertions with mathematics

More information

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors

IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors IV Estimation and its Limitations: Weak Instruments and Weakly Endogeneous Regressors Laura Mayoral IAE, Barcelona GSE and University of Gothenburg Gothenburg, May 2015 Roadmap of the course Introduction.

More information

Robustness of Logit Analysis: Unobserved Heterogeneity and Misspecified Disturbances

Robustness of Logit Analysis: Unobserved Heterogeneity and Misspecified Disturbances Discussion Paper: 2006/07 Robustness of Logit Analysis: Unobserved Heterogeneity and Misspecified Disturbances J.S. Cramer www.fee.uva.nl/ke/uva-econometrics Amsterdam School of Economics Department of

More information

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis,

08 Endogenous Right-Hand-Side Variables. Andrius Buteikis, 08 Endogenous Right-Hand-Side Variables Andrius Buteikis, andrius.buteikis@mif.vu.lt http://web.vu.lt/mif/a.buteikis/ Introduction Consider a simple regression model: Y t = α + βx t + u t Under the classical

More information

What s New in Econometrics? Lecture 14 Quantile Methods

What s New in Econometrics? Lecture 14 Quantile Methods What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression

More information