4 Dr. StrangeLOVE, or How I Learned to Stop Worrying and Love Omitted Variables

Adam W. Meade, Tara S. Behrend and Charles E. Lance

A well-known problem in path analysis and structural equation modeling (SEM) is that even the largest and most comprehensive models cannot contain all of the causes of the models' endogenous variables. This violation of one of the underlying assumptions of path analysis and SEM gives rise to a commonly held belief that failure to include all relevant causes of endogenous variables may invalidate study results. This problem has been referred to variously as the unmeasured variables problem (Duncan, 1975; James, 1980), the omitted variables problem (James, 1980; Kenny, 1979; Sackett, Laczo, & Lippe, 2003), left out variables error (LOVE; Mauro, 1990), a lack of perfect isolation (i.e., pseudo-isolation; Bollen, 1989), and lack of self-containment (James, Mulaik, & Brett, 1982). It has also been discussed as a particular type of model specification error (Hanushek & Jackson, 1977; Kenny, 1979). The omitted variables problem arises when the assumption that all relevant variables that influence the dependent (endogenous) variables are included in the model is violated. However, in the social sciences, this assumption is rarely, if ever, fulfilled.

Although there is no shortage of scholarly discussion and writing related to omitted variables, it is less clear how often this issue arises in substantive academic and applied research. This is because discussion of omitted variables usually takes place behind the scenes, for example during the manuscript review process. In response to a post to the RMNET message board on June 11, 2007, several authors

indicated that omitted variable discussions have arisen during the review process. In one example, an anonymous reviewer commented on a paper related to sources of work absenteeism:

However, omitted variables that are tied to absenteeism still remain a concern as family size, number of children, and being single head of household are also related to race/ethnicity. The issue is not that perceived value of diversity and children, etc. are related (as the authors contend), it is that race is correlated with both reports of value of diversity and number of children etc., and then with absenteeism. Hence, absenteeism is potentially being driven by factors other than what the author(s) allege. Simply acknowledging the lack of critical data (pages 26 & 27) does not eliminate the concern that major confounds were not adequately controlled. (S. Tonidandel, personal communication, June 12, 2007).

This comment is undoubtedly typical of those that researchers regularly encounter.

In order to provide some index of the extent to which researchers consider omitted variable issues in their work, we conducted a cited reference search. Specifically, we used the Social Science Citation Index to identify works that cited two seminal papers on omitted variables, James (1980) and Mauro (1990), on the assumption that authors dealing with omitted variables issues in their research would be likely to cite these works. A total of 63 sources were found that cited these studies. We then coded each of these sources into one of four categories based on the context in which they discussed omitted variables.
Of the 63 sources, 12 actually took steps to assess risk from omitted variables or acted to minimize the impact of omitted variables in some way (e.g., including relevant variables not of central focus to the model [Prussia, Kinicki, & Bracker, 1993], testing alternative models with and without potential additional determinant variables [Colquitt, LePine, & Noe, 2000; Prussia & Kinicki, 1996]). An additional 21 articles cited James (1980) or Mauro (1990) when discussing the potential biasing effect of omitted variables but did not attempt to account for such variables in any way. Twenty-six sources cited these works as part of a methodological review of path analysis or SEM. Finally, four sources mentioned the potential of omitted variables as a limitation of previous research in order to help justify their current study. In sum, it seems that reviewers and others critically evaluating organizational research are aware of the omitted variables issue and voice concerns over LOVE, perhaps even in contexts in which there is minimal risk of omitted variables compromising research

conclusions. On the other hand, authors seem to address omitted variables in a meaningful way less frequently than would be desired. This is not surprising given that authors may not want to call attention to methodological issues that could call into question the validity of their study conclusions. However, there are some instances in which omitted variables do pose a considerable threat to the conclusions of path analyses and SEM. In order to provide a better understanding of when omitted variables may or may not jeopardize the validity of path analysis and SEM, this chapter has three goals: (a) review the relevant assumptions in path analysis and SEM and present a mathematical explanation of the omitted variables problem, (b) discuss the conditions under which omitted variables are likely to be problematic and those under which the effects of omitted variables are negligible, and (c) provide recommendations for minimizing the risk of LOVE.

Theoretical and Mathematical Definition of the Omitted Variables Problem

Conceptually, the problems that may be caused by omitted variables are not difficult to understand. When researchers specify path or structural equation models in order to evaluate a theory, path coefficients are estimated based on the correlations among the measured variables in the model and the pattern of structural relations specified. If an endogenous (dependent) variable is affected by a variable that is unmeasured, and the unmeasured variable correlates to a moderate degree with other causal determinants in the model, the effects of the unmeasured variable can be incorrectly attributed to the measured causal determinants in the model. While the effect of the omitted variable could serve to decrease the magnitude of the path coefficient of the measured variable (i.e., a suppressor effect), it is more often assumed that the effect would cause a positive bias in the path coefficient of the measured variable.
This positive bias could also result in the determination that a determinant has a statistically significant effect on an endogenous variable, when such a finding would not have been the case if the unmeasured variable had been included in the path model. This error is referred to as LOVE. The omitted variables problem is perhaps best understood by first looking at the basic mathematics supporting path modeling. In

order to clearly demonstrate this issue, we outline a series of progressively more complex path models based on standardized variables (i.e., $\beta$ will be used as the symbol for path coefficients and regression weights). These models may then be generalized to the case of latent variables in SEM, as the underlying conceptual issues are the same.

The simplest linear causal model includes one exogenous variable (X) and a single endogenous variable (Y). Assuming that both are expressed in standard score form, the relationship between them can be expressed as

$$Y = \beta_{yx} X + d \tag{4.1}$$

where $\beta_{yx}$ is the standardized regression coefficient, and $d$ is a disturbance term composed of (a) random shocks, (b) nonsystematic measurement error, (c) unmeasured relevant causes, and (d) unmeasured nonrelevant causes (James et al., 1982). Random shocks can be thought of as unstable causal influences, measurement error refers to nonsystematic error, and unmeasured causes are omitted variables (see James et al., 1982). Whether or not a cause is relevant depends on the nature of its relationship with other variables in the model and is illustrated below.

Figure 4.1 illustrates the path model for the case of a single causal exogenous variable and a single endogenous variable. In Figure 4.1a, the disturbance term ($d$) consists exclusively of random shocks (RS), measurement error (ME), and unmeasured nonrelevant causes (NRC). For this model, the expected relationship between X and Y is obtained by multiplying Equation 4.1 by X and taking expectations:

$$E(XY) = \beta_{yx} E(XX) + E(Xd) \tag{4.2}$$

For Figure 4.1a, $E(XY)$ reduces to $\beta_{yx}$ because $E(XX) = 1.0$ for standardized variables and $E(Xd) = 0$, as the expected relationship between each of the three components of $d$ (random shocks, measurement error, nonrelevant causes) and X equals zero. In this case $r_{xy}$ is an unbiased estimate of the causal parameter $\beta_{yx}$.
In Figure 4.1b, however, an additional component is present in the disturbance term, an omitted relevant cause (O). As before, the expected relationship between the random shocks, measurement error, and nonrelevant causes and X equals zero. However, the

expected relationship between X and $d$ equals $r_{xo}\beta_{yo}$, as there is an indirect effect of X on $d$ due to the omitted variable that is present in $d$.

[Figure 4.1 Path model for one exogenous and one endogenous variable. Panel (a): X affects Y through $\beta_{yx}$, with disturbance $d$ = RS + ME + NRC. Panel (b): the same model with $d$ = RS + ME + NRC + O, where O correlates $r_{xo}$ with X.]

An important concept to highlight is that the relevance of an omitted determinant of the endogenous variable is based entirely on the omitted variable's relationship with other variables in the model. That is, if the omitted causal variable correlates with other determinants of Y, the omitted variable is by definition a relevant omitted variable. Conversely, if the omitted variable does not correlate with other determinants of Y, it is by definition a nonrelevant cause of Y.

Consider now the case of a path model in which one of two exogenous variables is erroneously omitted from the path model (O in Figure 4.2). Assume further that O correlates significantly with both X and Y. In this case, the measured correlation between X and Y reflects not only the direct effect of X on Y, but also the indirect effect of X on Y via the shared correlation both variables have with O. In other words, the observed correlation is determined by the equation

$$r_{xy} = \beta_{yx} + r_{xo}\beta_{yo} \tag{4.3}$$

[Figure 4.2 Path model for two exogenous variables (one omitted). Panel (a): correlated exogenous variables $X_1$ and $X_2$ (correlation $r_{x_1x_2}$) affect Y through $\beta_{yx_1}$ and $\beta_{yx_2}$, with disturbance $d$. Panel (b): X and the omitted variable O (correlation $r_{xo}$) affect Y through $\beta_{yx}$ and $\beta_{yo}$, with disturbance $d$.]

However, because O is omitted from the path model, the (naively) estimated path between X and Y ($\hat{\beta}_{yx}$) will be equal to $r_{xy}$, though $r_{xy}$ is actually determined by both $\beta_{yx}$ and $r_{xo}\beta_{yo}$. As a result, $r_{xy}$ as an estimate of $\beta_{yx}$ will be biased by an amount equal to $r_{xo}\beta_{yo}$.

The effect of $r_{xo}$ is obvious. If X were not correlated with O, then $r_{xy}$ is not affected by O and $r_{xy}$ is an unbiased estimate of $\beta_{yx}$. In this case, O is a nonrelevant omitted cause of Y. That is, its omission from the path equation has minimal effect on the estimated path coefficient of the included exogenous variables or on their associated tests of statistical significance. Conversely, if X were nontrivially correlated with O, $r_{xy}$ would differ from $\beta_{yx}$ by an amount equal to $r_{xo}\beta_{yo}$, so that $r_{xy}$ would be a biased estimate of $\beta_{yx}$. This bias can affect tests of statistical significance and lead to erroneous conclusions regarding the model. In this case, O is a relevant omitted cause of Y.

Although the potential biasing effect of $r_{xo}$ on the estimate of $\beta_{yx}$ is obvious, the effect of $\beta_{yo}$ is less transparent. The equation for the path coefficient $\beta_{yo}$ is

$$\beta_{yo} = \frac{r_{yo} - r_{yx} r_{xo}}{1 - r_{xo}^{2}} \tag{4.4}$$

so that in order for $\beta_{yo}$ to have a biasing effect on $r_{xy}$, which could be taken as the estimate of $\beta_{yx}$, the correlation between X and O must be nonzero. If the correlation between X and O is nontrivially positive, bias in the estimate of $\beta_{yx}$ will be greater when the correlation between Y and O is large and the correlation between Y and X is small. In order to provide some context for illustration, Table 4.1 includes several hypothetical values for $r_{xy}$, $r_{xo}$, and $r_{yo}$. Note that no values of $r_{xo} = 0$ are presented because there is no bias in $r_{xy}$ as an estimate of $\beta_{yx}$ when there is no correlation between the exogenous variable and the omitted variable (i.e., O is a nonrelevant cause of Y).
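The arithmetic behind Equations 4.3 and 4.4 can be sketched in a few lines of Python. This is only an illustration: the function name `two_determinant_bias` and the input correlations are hypothetical (they are not values taken from Table 4.1), and the true path of X is computed with the standard two-predictor formula for standardized regression weights.

```python
def two_determinant_bias(r_xy: float, r_xo: float, r_yo: float) -> dict:
    """Bias from omitting O in a model with determinants X and O of Y.

    Hypothetical helper for illustration; all inputs are correlations
    among standardized variables.
    """
    if abs(r_xo) >= 1.0:
        raise ValueError("r_xo must lie strictly between -1 and 1")
    denom = 1.0 - r_xo ** 2
    beta_yo = (r_yo - r_xy * r_xo) / denom  # Equation 4.4
    beta_yx = (r_xy - r_yo * r_xo) / denom  # true path of X when O is included
    bias = r_xy - beta_yx                   # naive estimate minus true path
    return {"beta_yo": beta_yo, "beta_yx": beta_yx, "bias": bias}


# Hypothetical "third variable" pattern: low r_xy, O related to both X and Y.
result = two_determinant_bias(r_xy=0.20, r_xo=0.50, r_yo=0.50)

# Hypothetical suppressor pattern: O unrelated to Y but related to X.
suppressed = two_determinant_bias(r_xy=0.30, r_xo=0.50, r_yo=0.00)

for label, out in (("third variable", result), ("suppressor", suppressed)):
    print(label, {name: round(value, 3) for name, value in out.items()})
```

In both cases the computed bias equals $r_{xo}\beta_{yo}$, which is the second term of Equation 4.3; the sign of that product determines whether the naive estimate is too large or too small.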
As can be seen in Table 4.1, bias is greatest when the correlation between X and Y is somewhat low (.20) yet the omitted variable correlates highly with both X and Y. This is the classic third variable problem (e.g., the spurious correlation between ice cream sales and drowning deaths) and a primary reason that correlation cannot be interpreted as causation. In this case, much of the effect attributed to the relationship between X and Y is actually due to their mutual correlation with and/or dependence on O.

Table 4.1 Biasing Effects of an Omitted Variable in a Two-Determinant Model
Columns: $\hat{\beta}_{yx} = r_{xy}$, $r_{xo}$, $r_{yo}$, $\beta_{yo}$, $\beta_{yx}$, Bias. [Table values not recoverable from the source.]
Note. Bias is the estimated path coefficient ($\hat{\beta}_{yx} = r_{xy}$) minus the true path coefficient $\beta_{yx}$. This value is equal to $r_{xo}\beta_{yo}$. Conditions in which $r_{xo} = 0$ are not displayed, as there is no bias under these conditions.

Note that when the correlation between the endogenous variable (Y) and the omitted variable (O) is close to zero, $\beta_{yo}$ can take on negative values. When $\beta_{yo}$ is negative and $r_{xo}$ is positive, $r_{xy}$ (which is used to estimate $\beta_{yx}$ but is mathematically equal to $\beta_{yx} + r_{xo}\beta_{yo}$) will actually be less than $\beta_{yx}$. In this case, the omission of O causes an underestimate of the path coefficient between X and Y, and variable O is said to have a suppressor effect such that its inclusion in the model serves to increase the estimated path coefficient between X and Y. Examples of such negative bias are present in Table 4.1. Suppressor effects are most readily manifested when the omitted variable has a very low correlation with

the endogenous variable but a moderate or large correlation with the exogenous variable in question. In such cases, the true path coefficient for the observed exogenous variable is considerably larger than the zero-order correlation between the exogenous variable and endogenous variable that is used as an estimate of the path coefficient.

In sum, several important points result from the discussion of a model with one observed determinant (X) and one omitted determinant (O) of a single endogenous variable:

1. $r_{xy}$ will be a biased estimate of $\beta_{yx}$ to the extent that there exist omitted relevant causes of Y.
2. This bias will be upward (i.e., $r_{xy} > \beta_{yx}$) to the extent that $r_{xo}\beta_{yo} > 0$.
3. By extension, both $r_{xo}$ and $\beta_{yo}$ must be nonzero for bias to occur. If either $r_{xo} = 0$ (O is unrelated to X and thus is a nonrelevant cause) or $\beta_{yo} = 0$ (there is no unique effect of O on Y; it is not a determinant of Y), no bias occurs.
4. If one of the terms, $r_{xo}$ or $\beta_{yo}$, is negative and the other is positive, a suppression situation occurs (i.e., $r_{xy} < \beta_{yx}$).
5. If $r_{xo}$ and $\beta_{yo}$ are both negative, there will be upward bias in the estimation of $\beta_{yx}$ from $r_{xy}$.

Violated Assumptions

An omitted relevant variable represents a violation of the assumption of self-containment in causal modeling (James et al., 1982; Simon, 1977) and is but one type of model misspecification. We cannot isolate an endogenous variable from all potential causal explanatory variables in the social sciences. Instead, we replace the assumption of isolation with one of pseudo-isolation by assuming that the disturbance term, that is, the variance in the endogenous variable not accounted for by its modeled causes, is uncorrelated with exogenous variables (Bollen, 1989), or with endogenous variables that precede the variable in question in the causal path (Duncan, 1975; James, 1980). This can be seen by again examining Figure 4.2b.
In Figure 4.2b, the disturbance term, $d$, would now include the effect of the standardized omitted variable ($\beta_{yo}$). Clearly, the self-containment assumption is violated, as X will correlate with the disturbance term by a magnitude of $r_{xo}\beta_{yo}$.
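As a brief sketch of why this correlation equals the bias term, the disturbance can be split into the omitted cause and a remainder. The symbol $e$ is introduced here only for illustration; it stands for the random shocks, measurement error, and nonrelevant causes, all assumed uncorrelated with X.

```latex
\begin{align*}
d &= \beta_{yo} O + e, \qquad E(Xe) = 0 \\
E(Xd) &= \beta_{yo} E(XO) + E(Xe) = r_{xo}\beta_{yo} \\
r_{xy} = E(XY) &= \beta_{yx} E(XX) + E(Xd) = \beta_{yx} + r_{xo}\beta_{yo}
\end{align*}
```

The last line reproduces Equation 4.3, so the violation of pseudo-isolation and the bias in $r_{xy}$ are the same quantity viewed from two sides.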

[Figure 4.3 Partially mediated path model with omitted variable. X and the omitted variable O (correlation $r_{xo}$) affect the mediator M through $\beta_{mx}$ and $\beta_{mo}$; X, M, and O affect Y through $\beta_{yx}$, $\beta_{ym}$, and $\beta_{yo}$, with disturbance $d$.]

More Complex Models

Although the effects of the omitted variable are clearly visible in a model with two exogenous variables, things rapidly become more complex when more variables are added to the model. Figure 4.3 depicts a path model in which a mediator (M) partially mediates the effects of an exogenous variable, X, and an omitted relevant causal variable, O, on the endogenous variable (Y). The path model for M is identical to that of a two exogenous variable model. As in the previous example, if O is omitted, then the expected path coefficients and potential for bias are identical to those of a model with two determinants. There are three causes of Y, yet one of these is omitted. The true population path equation for this model is

$$Y = \beta_{yx} X + \beta_{ym} M + \beta_{yo} O + d \tag{4.5}$$

and the path coefficient $\beta_{yx}$ in the true model is given as

$$\beta_{yx} = \frac{r_{yx}\left(1 - r_{mo}^{2}\right) + r_{ym}\left(r_{xo} r_{mo} - r_{xm}\right) + r_{yo}\left(r_{xm} r_{mo} - r_{xo}\right)}{1 + 2 r_{xm} r_{mo} r_{xo} - r_{xo}^{2} - r_{mo}^{2} - r_{xm}^{2}} \tag{4.6}$$

More complicated models are obviously possible as well, though algebraic expressions for the path coefficients rapidly become unwieldy. In the current example, if variable O were omitted, the estimated path coefficient for the direct effect of X on Y would be

that of a two-determinant model, in which the effect of the omitted variable is ignored:

$$\hat{\beta}_{yx} = \frac{r_{yx} - r_{ym} r_{xm}}{1 - r_{xm}^{2}} \tag{4.7}$$

In order to further illustrate the effects of an omitted variable in this model, data were simulated for several levels of correlation between variables O and Y. Table 4.2 contains the level of bias observed in the path coefficient of X for different levels of correlation between the omitted variable and the other causal variables in the model.

Readily apparent from Table 4.2 is that the magnitude of bias is not large in any of the conditions when the correlation between O and Y is .20. Results are more mixed for those conditions in which the correlation between the omitted variable and Y is .60. In these conditions, the magnitude of the bias of the path coefficient of X can be large, but only when the correlation between X and O is also quite large. Also, the magnitude of the bias is mitigated somewhat by the correlation between the omitted variable and M, though the bias is still sizable. Note the values presented in Table 4.2 that represent the case in which there is a relatively small correlation between X and Y, and large correlations between O and both Y and X. Under these circumstances, bias can be sizable.

We set the correlations in Tables 4.1 and 4.2 to arbitrary values in order to demonstrate their effects, but in practice correlation coefficients may not plausibly vary independently of one another (Mauro, 1990). In other words, a situation in which two variables correlate very highly, and one of those two correlates highly with a third variable while the other correlates negatively with the third variable, is mathematically improbable.
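Equations 4.6 and 4.7 can be compared directly in code. The following is a minimal sketch, not the procedure used to build Table 4.2: the function name `three_determinant_bias` and the input correlations are hypothetical, and the admissibility check covers only the three predictors (it does not verify that the correlations with Y are jointly possible).

```python
def three_determinant_bias(r_yx, r_ym, r_yo, r_xm, r_xo, r_mo):
    """Return (true beta_yx, estimated beta_yx with O omitted, bias).

    Hypothetical helper for illustration; inputs are correlations among
    standardized variables X, M, O, and Y.
    """
    # Determinant of the predictor correlation matrix (denominator of Eq. 4.6).
    det = 1 + 2 * r_xm * r_mo * r_xo - r_xo ** 2 - r_mo ** 2 - r_xm ** 2
    if abs(r_xm) >= 1 or det <= 0:
        # Such a pattern cannot occur among three real variables.
        raise ValueError("predictor correlations are not jointly admissible")
    beta_true = (
        r_yx * (1 - r_mo ** 2)
        + r_ym * (r_xo * r_mo - r_xm)
        + r_yo * (r_xm * r_mo - r_xo)
    ) / det                                            # Equation 4.6
    beta_hat = (r_yx - r_ym * r_xm) / (1 - r_xm ** 2)  # Equation 4.7
    return beta_true, beta_hat, beta_hat - beta_true


# Hypothetical pattern: low r_yx, O strongly related to both X and Y.
beta_true, beta_hat, bias = three_determinant_bias(
    r_yx=0.20, r_ym=0.30, r_yo=0.60, r_xm=0.30, r_xo=0.60, r_mo=0.30
)
print(round(beta_true, 3), round(beta_hat, 3), round(bias, 3))
```

The determinant check gives a concrete form to the point about correlations not varying independently: a pattern such as $r_{xm} = .90$, $r_{xo} = .90$, $r_{mo} = -.90$ yields a negative determinant and is rejected.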
The patterns of correlations that result in the most bias are those in which there is a very low correlation between the measured determinants and the endogenous variable, and high correlations between both the measured determinants and omitted variables and the omitted and endogenous variables (refer to Tables 4.1 and 4.2). While such patterns of correlations are mathematically possible, they may be unlikely in some domains of study given what is known from previous research. To summarize, omitted variables can introduce bias in estimated path coefficients and this bias may be positive or negative in

direction.

Table 4.2 Biasing Effects of an Omitted Variable in a Three-Determinant Model
Columns: $r_{yx}$, $r_{ym}$, $r_{yo}$, $r_{xm}$, $r_{xo}$, $r_{mo}$, $\beta_{yx}$, $\hat{\beta}_{yx}$, Bias. [Table values not recoverable from the source.]
Note. $\beta_{yx}$ represents the true path coefficient of the exogenous variable X in the completely specified model. $\hat{\beta}_{yx}$ represents the estimated path coefficient of X in the omitted variable model. Bias is the difference between these two.

The issue is then, under what conditions is it possible for an omitted variable to bias path coefficients? Below is a summary for a model with one observed exogenous variable and one relevant omitted variable:

If O is uncorrelated with the exogenous variable, $r_{xy}$ is an unbiased estimator of $\beta_{yx}$ and the omitted variable has no effect.

If the variance in Y accounted for by O is completely redundant with the variables in the model, its unique effect ($\beta_{yo}$) will be near zero and it will have little biasing effect.

If O is uncorrelated with the endogenous variable but strongly correlated with the exogenous variable, $r_{xy}$ may underestimate $\beta_{yx}$ (i.e., a suppressor effect).

Thus, there are three conditions that must be present in order for an omitted variable to cause positive bias in estimated path coefficients; that variable must (a) correlate at a nonzero level with other determinants of Y, (b) not be completely redundant with other variables included in the path model, and (c) correlate with the endogenous variable. If (a) and (b) are true, but (c) is not, the omitted variable may serve to artificially deflate the estimate of the path coefficient of the variables included in the model. In sum, the potential for LOVE is greatest when the omitted variable correlates highly with the outcome variable and moderately with other determinants in the model.

Path Coefficient Bias Versus Significance Testing

It is important to make a distinction between the biasing effect of omitted variables on the magnitude of path coefficients and the effect of omitted variables on the significance tests of those path coefficients. Generally speaking, in theory building via path analysis and SEM, there are two important outcomes of interest to the researcher: the magnitudes of the estimates of the path coefficients themselves and the associated significance tests. Often in early stages of research, the primary outcome of interest in path analyses is the significance test associated with the path coefficient. In other words, the answer to the question "does the variable have a unique effect on the outcome?" would seem more important than the question "what is the precise magnitude of the unique effect of the variable on the outcome?"
If early forays into model testing with a given set of variables indicate that the effect of a determinant on an endogenous variable is nonsignificant, it is less likely that future researchers would include this variable as a measured cause as frequently as if the variable did have a significant effect on the outcome. In this context, the magnitude of the path coefficient per se is less important than the decision as to the presence or absence of an effect of X on Y. If there does appear to be an effect (i.e., the test is significant), then future use and, importantly, replication of this effect is much more likely. While the rough magnitude of the effect

is undoubtedly important, small bias in the path coefficients would likely be of little concern so long as the conclusion of the significance test is not affected at this stage of investigation.

The second outcome of path analysis is the magnitude of the path coefficients themselves. Estimates of path coefficients are important in that standardized coefficients are one index of the unique variance in the endogenous variable accounted for by the determinant. Additionally, unstandardized coefficients can be compared over time, and cumulative evidence can be collected such that the relative effect of a determinant on an outcome can be estimated. As research cumulates over time, the precision of estimated paths becomes important to future meta-analysts such that an accurate estimate of the effect of a determinant on an endogenous variable can be calculated. Thus, even though precise estimates of effects may not be of primary interest to a researcher in early stages of research on a topic, these estimates take on additional importance over time as research accumulates and meta-analyses are conducted.

Recall that if the omitted variable does not correlate with the endogenous variable but correlates with other variables in the model, it may act as a suppressor variable. This was shown in Tables 4.1 and 4.2, where the exclusion of an omitted variable resulted in negative bias of the estimated path coefficient. That is, its inclusion in the model could serve to increase the estimated path coefficients of the observed variables. In regard to significance testing, omitted variables that do not correlate with the endogenous variable are potentially problematic in that they may result in Type II errors (i.e., failure to detect an effect that truly exists).
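A small numerical sketch can make the Type II error scenario concrete. It assumes ordinary least squares with normal-theory t tests on standardized weights, uses 1.96 as an approximate two-tailed .05 critical value, and relies on hypothetical correlations and sample size; the helper name `t_for_path` is likewise an assumption made for this example.

```python
import math


def t_for_path(beta, r_squared, r2_with_other_predictors, n, k):
    """t statistic for a standardized OLS weight (hypothetical helper).

    Uses SE = sqrt((1 - R^2) / ((n - k - 1) * (1 - R_j^2))), where R^2 is the
    model's squared multiple correlation, R_j^2 is the squared multiple
    correlation of predictor j with the other predictors, and k is the
    number of predictors.
    """
    se = math.sqrt(
        (1.0 - r_squared) / ((n - k - 1) * (1.0 - r2_with_other_predictors))
    )
    return beta / se


# Hypothetical suppressor pattern: O unrelated to Y, strongly related to X.
r_xy, r_xo, r_yo, n = 0.10, 0.60, 0.00, 300

beta_yx = (r_xy - r_yo * r_xo) / (1 - r_xo ** 2)  # true path of X
beta_yo = (r_yo - r_xy * r_xo) / (1 - r_xo ** 2)  # true path of O
r2_full = beta_yx * r_xy + beta_yo * r_yo         # R^2 with X and O included

# O omitted: the path estimate is r_xy and the model has a single predictor.
t_omitted = t_for_path(r_xy, r_xy ** 2, 0.0, n, k=1)
# O included: X shares r_xo^2 of its variance with the other predictor.
t_full = t_for_path(beta_yx, r2_full, r_xo ** 2, n, k=2)

print(round(t_omitted, 2), round(t_full, 2))
```

Under these assumed values the test of X falls short of the critical value when O is omitted but exceeds it when O is included, which is the pattern the Type II error argument describes. Other values can reverse this, because adding a correlated predictor also inflates the standard error.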
However, reviewer criticisms of a lack of comprehensive path models typically center more on the potential upward biasing effects of omitted variables and the associated Type I error (i.e., wrongly identifying an effect that does not exist). The focus on Type I errors is understandable, as such errors may translate to immediate implications for practice and use of a determinant variable, whereas Type II errors are less likely to be published and likely will be rectified in future studies. If Type II error is seen as less problematic than Type I error, the requirement of a significant correlation between the omitted variable and the outcome may be added to the list of conditions that must be met before the possibility of an omitted variable becomes a concern in path models. Omitted variables that do not correlate with the outcome cannot cause

upward bias in path coefficient estimates, which is typically the focus of LOVE concerns.

Minimizing the Risk of LOVE

There are specific conditions under which omitted variables can be problematic, and it is true that no matter how comprehensive a path model, there are always omitted relevant variables in organizational research. We have also illustrated that there can be substantial bias under some conditions; thus, there is a kernel of truth relating to LOVE in organizational research. To this extent, educating researchers on the ways in which to minimize the risk of omitted variable problems is of paramount importance. There are several ways in which organizational researchers can minimize the risk of omitted variables biasing path coefficients, discussed below.

Experimental Control

First, one could incorporate design characteristics that minimize the correlation between measured exogenous variables and omitted variables. Random assignment of participants is extremely successful in controlling for a wide range of known or unknown omitted individual difference variables. As we have emphasized, there can be no possible biasing effect of an omitted variable if that variable does not correlate with the observed variables in the path model (given sufficient sample size). As such, random assignment is highly effective for controlling for almost any individual difference variable in a path model. Although random assignment may not be possible in many instances of organizational research, there are some cases in which it may be employed. For example, participants may be randomly assigned to different types of training courses, reward systems, equipment and other environmental factors, or organizational interventions for which the effectiveness may be evaluated.
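The effect of random assignment can be illustrated with a small simulation using only the Python standard library. This is a sketch under stated assumptions: the function names, the population values ($\beta_{yx} = .20$, $\beta_{yo} = .50$, $r_{xo} = .50$), the sample size, and the seed are all hypothetical, and X is treated as a continuous score drawn independently of O rather than as a treatment indicator.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate_r_xy(randomized, n=20000, beta_yx=0.2, beta_yo=0.5, r_xo=0.5, seed=1):
    """Sample r_xy when O is omitted, with or without random assignment of X."""
    rng = random.Random(seed)
    rho = 0.0 if randomized else r_xo  # random assignment breaks the X-O link
    # Disturbance variance chosen so that Y has unit variance (standardized).
    var_d = 1.0 - (beta_yx ** 2 + beta_yo ** 2 + 2 * beta_yx * beta_yo * rho)
    sd_d = math.sqrt(var_d)
    xs, ys = [], []
    for _ in range(n):
        o = rng.gauss(0, 1)
        x = rho * o + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        y = beta_yx * x + beta_yo * o + sd_d * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


print("observational r_xy:", round(simulate_r_xy(randomized=False), 3))
print("randomized r_xy:   ", round(simulate_r_xy(randomized=True), 3))
```

With these assumed values the observational correlation should fall near $\beta_{yx} + r_{xo}\beta_{yo} = .45$, while the randomized correlation should fall near the true path of .20, up to sampling error.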
In more mathematical terms, recall that in the case of one exogenous variable (X) and one omitted variable (O), the estimated effect of X on the endogenous variable (Y) is the zero-order correlation between X and Y. However, the true effect of X on Y should be given as Equation 4.8:

$$\beta_{yx} = \frac{r_{xy} - r_{yo} r_{xo}}{1 - r_{xo}^{2}} \tag{4.8}$$

When random assignment is used, the correlation between X and O will be near zero (with sufficient sample size). Thus, Equation 4.8 reduces to $r_{xy}$ and there is no bias.

More Inclusive Models

Second, researchers should include as many known causes of the endogenous variable as is practically possible in the path model. The potential for bias in path coefficient estimates caused by omitted variables is much greater when they serve as unique causal agents of the endogenous variable. Recall that for a two-determinant model with one determinant omitted, the bias present is equal to $r_{xo}\beta_{yo}$. By incorporating more determinants of the outcome, the unique effects of omitted variables may be reduced as $\beta_{yo}$ approaches zero. Note, however, that there is a paradoxical side effect of including more variables. That is, each additional determinant that is included in the model is also prone to LOVE and is subject to the assumption of model self-containment.

Use Previous Research to Justify Assumptions

Researchers may also use what is already known from past research to demonstrate that omitted variables are not likely to be problematic. For example, when estimating the effects of ability determinants of job performance, one could legitimately leave out entire classes of other performance determinants such as personality and motivation, because these are likely to be uncorrelated with ability determinants and therefore are nonrelevant causes (Ackerman & Heggestad, 1997; Sackett, Gruys, & Ellingson, 1998; Salgado, Viswesvaran, & Ones, 2001; Schmidt & Hunter, 1998; see also Lance & James, 1999). On the other hand, if both verbal and quantitative aptitude were thought to be causes of employee job performance, it is unlikely that the omission of similar types of tests (e.g., mechanical ability) would

produce a strong biasing effect on path coefficients of those tests in the model, as mechanical ability is exceedingly likely to have a large correlation with (i.e., be redundant with) the measured ability test variables. As such, the plausibility of bias due to omitting mechanical ability tests is very low, as again $\beta_{yo}$ will be closer to zero. Put differently, in many instances nonrelevant causes can largely be ignored because they are either (a) not related to measured causes or (b) largely redundant with relevant causes that are already measured. To this extent, prior research on correlates of both the outcome and other determinants can provide guidance on what variables are essential to include in the model and which may be safely omitted.

Consideration of Research Purpose

If the goal is to provide a precise estimate of path coefficients, or to compare the relative variance accounted for by different determinants, omitted variables are considerably more problematic than if the goal is to test the statistical significance of the effect of a determinant on an outcome. Examining again the simple two-determinant case, influence due to omitted variables can result in bias in the estimated path coefficient ($r_{xy}$) with respect to its true value (Equation 4.8). However, with large sample sizes, even sizable bias in estimated path coefficients is less likely to change decisions drawn from the statistical significance test associated with those coefficients. With large sample sizes, power is such that even small estimated effects tend to be statistically significant.

In sum, omitted variables are a fact of life in organizational research and they can be problematic.
Researchers should be particularly vigilant in cases in which (a) there are a large number of determinants of the outcome variable, (b) the study in question includes only a small subset of those determinants, (c) it is likely that the omitted variables have moderate or large correlations with the measured determinants, and (d) it is likely that the omitted variables would account for unique variance in the outcome variables. However, the notion that omitted variables are always problematic is a myth as the threat to the inferences that we tend to draw may not be as serious as some have believed. RT2382X.indb 106 5/19/08 7:44:39 AM
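The two-determinant case discussed in this chapter can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the model is fully standardized, the function names are hypothetical, and the path values (.30 for X, .40 for O) are made up for the example rather than taken from any study. It builds the correlations implied by a true model, recovers the path from X to Y with Equation 4.8, and reports the bias that results when O is omitted and $r_{xy}$ is used instead.

```python
def implied_correlations(beta_yx, beta_yo, r_xo):
    """Correlations implied by the standardized model
    Y = beta_yx * X + beta_yo * O + e, where X and O correlate r_xo."""
    r_xy = beta_yx + r_xo * beta_yo
    r_yo = beta_yo + r_xo * beta_yx
    return r_xy, r_yo


def beta_yx_from_correlations(r_xy, r_yo, r_xo):
    """Equation 4.8: standardized path from X to Y when O is in the model."""
    return (r_xy - r_yo * r_xo) / (1.0 - r_xo ** 2)


def omitted_variable_bias(beta_yx, beta_yo, r_xo):
    """Bias in r_xy as an estimate of beta_yx when O is left out.
    Algebraically this equals r_xo * beta_yo."""
    r_xy, _ = implied_correlations(beta_yx, beta_yo, r_xo)
    return r_xy - beta_yx


if __name__ == "__main__":
    # Illustrative (assumed) true paths; not values from the chapter.
    true_beta_yx, true_beta_yo = 0.30, 0.40
    for r_xo in (0.0, 0.3, 0.6):
        r_xy, r_yo = implied_correlations(true_beta_yx, true_beta_yo, r_xo)
        recovered = beta_yx_from_correlations(r_xy, r_yo, r_xo)
        bias = omitted_variable_bias(true_beta_yx, true_beta_yo, r_xo)
        print(f"r_xo={r_xo:.1f}  r_xy={r_xy:.3f}  "
              f"beta_yx (Eq. 4.8)={recovered:.3f}  bias={bias:.3f}")
```

When `r_xo` is zero, as under random assignment, the bias vanishes; setting `beta_yo` closer to zero shrinks the bias in the same way, which mirrors the argument for more inclusive models.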

References

Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for overlapping traits. Psychological Bulletin, 121.
Bollen, K. A. (1989). Structural equations with latent variables. Oxford, England: John Wiley and Sons.
Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85.
Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press.
Hanushek, E. A., & Jackson, J. E. (1977). Statistical methods for social scientists. San Diego: Academic Press.
James, L. R. (1980). The unmeasured variables problem in path analysis. Journal of Applied Psychology, 65.
James, L. R., Mulaik, S. A., & Brett, J. M. (1982). Causal analysis: Assumptions, models and data. Beverly Hills: Sage.
Kenny, D. A. (1979). Correlation and causality. New York: Wiley-Interscience.
Lance, C. E., & James, L. R. (1999). ν²: A proportional variance-accounted-for index for some cross-level and person-situation research designs. Organizational Research Methods, 2.
Mauro, R. (1990). Understanding L.O.V.E. (left out variables error): A method for estimating the effects of omitted variables. Psychological Bulletin, 108.
Prussia, G. E., & Kinicki, A. J. (1996). A motivational investigation of group effectiveness using social-cognitive theory. Journal of Applied Psychology, 81.
Prussia, G. E., Kinicki, A. J., & Bracker, J. S. (1993). Psychological and behavioral consequences of job loss: A covariance structure analysis using Weiner's (1985) attribution model. Journal of Applied Psychology, 78.
Sackett, P. R., Gruys, M. L., & Ellingson, J. E. (1998). Ability-personality interactions when predicting job performance. Journal of Applied Psychology, 83.
Sackett, P. R., Laczo, R. M., & Lippe, Z. P. (2003). Differential prediction and the use of multiple predictors: The omitted variables problem. Journal of Applied Psychology, 88.
Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2001). Predictors used for personnel selection: An overview of constructs, methods and techniques. In D. S. Ones et al. (Eds.), Handbook of industrial, work and organizational psychology, Vol. 1: Personnel psychology (pp ). London, England: Sage Publications.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124.
Simon, H. A. (1977). Models of discovery: And other topics in the methods of science. Dordrecht, Holland: D. Reidel.


More information

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS

DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS DEALING WITH MULTIVARIATE OUTCOMES IN STUDIES FOR CAUSAL EFFECTS Donald B. Rubin Harvard University 1 Oxford Street, 7th Floor Cambridge, MA 02138 USA Tel: 617-495-5496; Fax: 617-496-8057 email: rubin@stat.harvard.edu

More information

Causal Inference. Prediction and causation are very different. Typical questions are:

Causal Inference. Prediction and causation are very different. Typical questions are: Causal Inference Prediction and causation are very different. Typical questions are: Prediction: Predict Y after observing X = x Causation: Predict Y after setting X = x. Causation involves predicting

More information

Using Mplus individual residual plots for. diagnostics and model evaluation in SEM

Using Mplus individual residual plots for. diagnostics and model evaluation in SEM Using Mplus individual residual plots for diagnostics and model evaluation in SEM Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 20 October 31, 2017 1 Introduction A variety of plots are available

More information

Causal inference in multilevel data structures:

Causal inference in multilevel data structures: Causal inference in multilevel data structures: Discussion of papers by Li and Imai Jennifer Hill May 19 th, 2008 Li paper Strengths Area that needs attention! With regard to propensity score strategies

More information

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded

Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded Controlling for latent confounding by confirmatory factor analysis (CFA) Blinded Blinded 1 Background Latent confounder is common in social and behavioral science in which most of cases the selection mechanism

More information

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i

B. Weaver (24-Mar-2005) Multiple Regression Chapter 5: Multiple Regression Y ) (5.1) Deviation score = (Y i B. Weaver (24-Mar-2005) Multiple Regression... 1 Chapter 5: Multiple Regression 5.1 Partial and semi-partial correlation Before starting on multiple regression per se, we need to consider the concepts

More information

Simpson s paradox, moderation, and the emergence of quadratic relationships in path models: An information systems illustration

Simpson s paradox, moderation, and the emergence of quadratic relationships in path models: An information systems illustration Simpson s paradox, moderation, and the emergence of quadratic relationships in path models: An information systems illustration Ned Kock Leebrian Gaskins Full reference: Kock, N., & Gaskins, L. (2016).

More information

Workshop on Statistical Applications in Meta-Analysis

Workshop on Statistical Applications in Meta-Analysis Workshop on Statistical Applications in Meta-Analysis Robert M. Bernard & Phil C. Abrami Centre for the Study of Learning and Performance and CanKnow Concordia University May 16, 2007 Two Main Purposes

More information

Prerequisite Material

Prerequisite Material Prerequisite Material Study Populations and Random Samples A study population is a clearly defined collection of people, animals, plants, or objects. In social and behavioral research, a study population

More information

Research Design - - Topic 19 Multiple regression: Applications 2009 R.C. Gardner, Ph.D.

Research Design - - Topic 19 Multiple regression: Applications 2009 R.C. Gardner, Ph.D. Research Design - - Topic 19 Multiple regression: Applications 2009 R.C. Gardner, Ph.D. Curve Fitting Mediation analysis Moderation Analysis 1 Curve Fitting The investigation of non-linear functions using

More information

A Guide to Proof-Writing

A Guide to Proof-Writing A Guide to Proof-Writing 437 A Guide to Proof-Writing by Ron Morash, University of Michigan Dearborn Toward the end of Section 1.5, the text states that there is no algorithm for proving theorems.... Such

More information

Variable Selection and Model Building

Variable Selection and Model Building LINEAR REGRESSION ANALYSIS MODULE XIII Lecture - 37 Variable Selection and Model Building Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur The complete regression

More information

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science

EXAMINATION: QUANTITATIVE EMPIRICAL METHODS. Yale University. Department of Political Science EXAMINATION: QUANTITATIVE EMPIRICAL METHODS Yale University Department of Political Science January 2014 You have seven hours (and fifteen minutes) to complete the exam. You can use the points assigned

More information

Moderation 調節 = 交互作用

Moderation 調節 = 交互作用 Moderation 調節 = 交互作用 Kit-Tai Hau 侯傑泰 JianFang Chang 常建芳 The Chinese University of Hong Kong Based on Marsh, H. W., Hau, K. T., Wen, Z., Nagengast, B., & Morin, A. J. S. (in press). Moderation. In Little,

More information

Implications of Direct and Indirect Range Restriction for Meta-Analysis Methods and Findings

Implications of Direct and Indirect Range Restriction for Meta-Analysis Methods and Findings Journal of Applied Psychology Copyright 006 by the American Psychological Association 006, Vol. 91, No. 3, 594 61 001-9010/06/$1.00 DOI: 10.1037/001-9010.91.3.594 Implications of Direct and Indirect Range

More information