Dr. StrangeLOVE, or. How I Learned to Stop Worrying and Love Omitted Variables. Adam W. Meade, Tara S. Behrend and Charles E.

Size: px

Start display at page:

Download "Dr. StrangeLOVE, or. How I Learned to Stop Worrying and Love Omitted Variables. Adam W. Meade, Tara S. Behrend and Charles E."

Jason Cobb
6 years ago
Views:

1 AU: Check that your name is presented correctly and consistently here against the TOC 4 Dr. StrangeLOVE, or How I Learned to Stop Worrying and Love Omitted Variables Adam W. Meade, Tara S. Behrend and Charles E. Lance A well-known problem in path analysis and structural equation modeling (SEM) is that even the largest and most comprehensive models cannot contain all of the causes of models endogenous variables. This violation of one of the underlying assumptions of path analysis and SEM gives rise to a commonly held belief that failure to include all relevant causes of endogenous variables may invalidate study results in path analysis and SEM. This problem has been referred to variously as the unmeasured variables problem (Duncan, 1975; James, 1980), the omitted variables problem (James, 1980; Kenny, 1979; Sackett, Laczo, & Lippe, 2003), left out variables error (LOVE; Mauro, 1990), a lack of perfect isolation (i.e., pseudo-isolation; Bollen, 1989), and lack of self-containment (James, Mulaik, & Brett, 1982). It has also been discussed as a particular type of model specification error (Hanushek & Jackson, 1977; Kenny, 1979). The omitted variables problem arises when the assumption that all relevant variables that influence the dependent (endogenous) variables are included in the model is violated. However, in the social sciences, this assumption is rarely, if ever, fulfilled. Although there is no shortage of scholarly discussion and writing related to omitted variables, it is less clear how often this issue arises in substantive academic and applied research. This is because discussion of omitted variables usually takes place behind the scenes, for example during the manuscript review process. In response to a post to the RMNET message board on June 11, 2007, several authors 91 RT2382X.indb 91 5/19/08 7:44:32 AM

2 92 Adam W. Meade, Tara S. Behrend and Charles E. Lance indicated that omitted variable discussions have arisen during the review process. In one example, an anonymous reviewer commented on a paper related to sources of work absenteeism: However, omitted variables that are tied to absenteeism still remain a concern as family size, number of children, and being single head of household are also related to race/ethnicity. The issue is not that perceived value of diversity and children, etc. are related (as the authors contend), it is that race is correlated with both reports of value of diversity and number of children etc., and then with absenteeism. Hence, absenteeism is potentially being driven by factors other than what the author(s) allege. Simply acknowledging the lack of critical data (pages 26 & 27) does not eliminate the concern that major confounds were not adequately controlled. (S. Tonidandel, personal communication, June 12, 2007). This comment is undoubtedly typical of those researchers regularly encounter. In order to provide some index of the extent to which researchers consider omitted variable issues in their work, we conducted a cited reference search. Specifically, we used the Social Science Citation Index to identify works that cited two seminal papers on omitted variables, James (1980) and Mauro (1990), on the assumption that authors dealing with omitted variables issues in their research would be likely to cite these works. A total of 63 sources were found that cited these studies. We then coded each of these sources into one of four categories based on the context in which they discussed omitted variables. Of the 63 sources, 12 actually took steps to assess risk from omitted variables or acted to minimize the impact of omitted variables in some way (e.g., including relevant variables not of central focus to the model [Prussia, Kinicki, & Bracker, 1993], testing alternative models with and without potential additional determinant variables [Colquitt, LePine, & Noe, 2000; Prussia & Kinicki, 1996]). An additional 21 articles cited James (1980) or Mauro (1990) when discussing the potential biasing effect of omitted variables but did not attempt to account for such variables in any way. Twenty-six sources cited these works as part of a methodological review of path analysis or SEM. Finally, four sources mentioned the potential of omitted variables as a limitation of previous research in order to help justify their current study. In sum, it seems that reviewers and others critically evaluating organizational research are aware of the omitted variables issue and voice concerns over LOVE, perhaps even in contexts in which there is minimal risk of omitted variables compromising research RT2382X.indb 92 5/19/08 7:44:32 AM

3 Dr. StrangeLOVE 93 conclusions. On the other hand, authors seem to address omitted variables in a meaningful way less frequently than would be desired. This is not surprising given that authors may not want to call attention to methodological issues that could question the validity of their study conclusions. However, there are some instances in which omitted variables do pose a considerable threat to the conclusions of path analyses and SEM. In order to provide a better understanding of when omitted variables may or may not jeopardize the validity of path analysis and SEM, this chapter has three goals: (a) review the relevant assumptions in path analysis and SEM and present a mathematical explanation of the omitted variables problem, (b) discuss the conditions under which omitted variables are likely to be problematic and those under which the effects of omitted variables are negligible, and (c) provide recommendations for minimizing the risk of LOVE. Theoretical and Mathematical Definition of the Omitted Variables Problem Conceptually, the problems that may be caused by omitted variables are not difficult to understand. When researchers specify path or structural equation models in order to evaluate a theory, path coefficients are estimated based on the correlations among the measured variables in the model and the pattern of structural relations specified. If an endogenous (dependent) variable is affected by a variable that is unmeasured, and the unmeasured variable correlates to a moderate degree with other causal determinants in the model, the effects of the unmeasured variable can be incorrectly attributed to the measured causal determinants in the model. While the effect of the omitted variable could serve to decrease the magnitude of the path coefficient of the measured variable (i.e., a suppressor effect), it is more often assumed that the effect would cause a positive bias in the path coefficient of the measured variable. This positive bias could also result in the determination that a determinant has a statistically significant effect on an endogenous variable, when such a finding would not have been the case if the unmeasured variable had been included in the path model. This error is referred to as LOVE. The omitted variables problem is perhaps best understood by first looking at the basic mathematics supporting path modeling. In RT2382X.indb 93 5/19/08 7:44:32 AM

4 94 Adam W. Meade, Tara S. Behrend and Charles E. Lance order to clearly demonstrate this issue, we outline a series of progressively more complex path models based on standardized variables (i.e., β will be used as the symbol for path coefficients and regression weights). These models may then be generalized to the case of latent variables in SEM as the underlying conceptual issues are the same. The simplest linear causal model includes one exogenous variable (X) and a single endogenous variable (Y). Assuming that both are expressed in standard score form, the relationship between them can be expressed as Y = β yx X + d (4.1) where β yx is the standardized regression coefficient, and d is a disturbance term composed of (a) random shocks, (b) nonsystematic measurement error, (c) unmeasured relevant causes, and (d) unmeasured nonrelevant causes (James et al., 1982). Random shocks can be thought of as unstable causal influences, measurement error refers to nonsystematic error, and unmeasured causes are omitted variables (see James et al., 1982). Whether or not a cause is relevant depends on the nature of its relationship with other variables in the model and is illustrated below. Figure 4.1 illustrates the path model for the case of a single causal exogenous variable and a single endogenous variable. In Figure 4.1a, the disturbance term (d) consists exclusively of random shocks (RS), measurement error (ME), and unmeasured nonrelevant causes (NRC). For this model, the expected relationship between X and Y is given by the equation E(X*Y) = β yx E(Y*Y) + E(X*d) (4.2) For Figure 4.1a, E(X*Y) reduces to β yx as E(Y*Y) = 1.0 for standardized variables and E(X*d) = 0 because the expected relationship between each of the three components of d (random shocks, measurement error, nonrelevant causes) and X equals zero. In this case r xy is an unbiased estimate of the causal parameter β yx. In Figure 4.1b, however, an additional component is present in the disturbance term, an omitted relevant cause (O). As before, the expected relationship between the random shocks, measurement error, and nonrelevant causes and X equals zero. However, the RT2382X.indb 94 5/19/08 7:44:32 AM

5 Dr. StrangeLOVE 95 (a) d (= RS + ME + NRC) X β yx Y (b) r xo d (= RS + ME + NRC + O) X β yx Y Figure 4.1 Path model for one exogenous and one endogenous variable. expected relationship between X and d = r xo b yo as there is an indirect effect of X on d due to the omitted variable that is present in d. An important concept to highlight is that the relevance of an omitted determinant of the endogenous variable is based entirely on the omitted variable s relationship with other variables in the model. That is, if the omitted causal variable correlates with other determinants of Y, the omitted variable is by definition a relevant omitted variable. Conversely, if the omitted variable does not correlate with other determinants of Y, it is by definition a nonrelevant cause of Y. Consider now the case of a path model in which one of two exogenous variables is erroneously omitted from the path model (O in Figure 4.2). Assume further that O correlates significantly with both X and Y. In this case, the measured correlation between X and Y reflects not only the direct effect of X on Y, but also the indirect effect of X on Y via the shared correlation both variables have with O. In other words, the observed correlation is determined by the equation r xy = β yx + r xo β yo (4.3) X 1 β yx1 d X β yx d r x1x2 Y r xo Y X 2 β yx2 O β yo (a) (b) Figure 4.2 Path model for two exogenous variables (one omitted). RT2382X.indb 95 5/19/08 7:44:33 AM

6 96 Adam W. Meade, Tara S. Behrend and Charles E. Lance However, because O is omitted from the path model, the (naively) estimated path between X and Y (β yx ) will be equal to r yx, though r yx is actually determined by the effect of both β yx and r xo β yo. As a result, r yx as an estimate of β yx will be biased by a factor of r xo β yo. The effect of r xo is obvious. If X were not correlated with O, then r yx is not affected by O and r yx is an unbiased estimate of β yx. In this case, O is a nonrelevant omitted cause of Y. That is, its omission from the path equation has minimal effect on the estimated path coefficient of the included exogenous variables or on their associated tests of statistical significance. Conversely, if X were nontrivially correlated with O, r xy would differ from β xy by a factor equal to r xo β yo so that r xy would be a biased estimate of β xy. This bias can affect tests of statistical significance and lead to erroneous conclusions regarding the model. In this case, O is a relevant omitted cause of Y. Although the potential biasing effect of r xo on β yx is obvious, the effect of β yo is less transparent. The equation for the path coefficient β yo is r r r (4.4) 2 1 r yo yx xo β yo = xo so that in order for β yo to have a biasing effect on r xy, which could be taken as the estimate of β yx, the correlation between X and O must be nonzero. If the correlation between X and O is nontrivially positive, bias in β yx will be greater when the correlation between Y and O is large and the correlation between Y and X is small. In order to provide some context for illustration, Table 4.1 includes several hypothetical values for r xy, r xo, and r yo. Note that no values of r xo = 0 are presented because there is no bias in r xy as an estimate of β yx when there is no correlation between the exogenous variable and the omitted variable (i.e., O is a nonrelevant cause of Y). As can be seen in Table 4.1, bias is greatest when the correlation between the X and Y is somewhat low (.20) yet the omitted variable correlates highly with both X and Y. This is the classic third variable problem (e.g., the spurious correlation between ice cream sales and drowning deaths) and a primary reason that correlation cannot be interpreted as causation. In this case, much of the effect attributed to the relationship between X and Y is actually due to their mutual correlation with and/or dependence on O. RT2382X.indb 96 5/19/08 7:44:34 AM

7 Dr. StrangeLOVE 97 Table 4.1 Biasing Effects of an Omitted Variable in a Two- Determinant Model ˆβ yx = r xy r xo r yo β yo β xy Bias Note. Bias is the estimated path coefficient ( ˆβyx = r xy ) minus the true path coefficient β yx. This value is equal to r xo β yo. Conditions in which r xo = 0 are not displayed, as there is no bias under these conditions. Note that when the correlation between the endogenous variable (Y) and the omitted variable (O) is close to zero, b yo can take on negative values. When b yo is negative, r xy (which is used to estimate b yx but is mathematically equal to b yx + r xo b yo ) will actually be greater than β yx. In this case, the omission of O causes an underestimate of the path coefficient between X and Y and variable O is said to have a suppressor effect such that its inclusion in the model serves to increase the estimated path coefficient between X and Y. Examples of such negative bias are present in Table 4.1. Suppressor effects are most readily manifested when the omitted variable has a very low correlation with RT2382X.indb 97 5/19/08 7:44:35 AM

8 98 Adam W. Meade, Tara S. Behrend and Charles E. Lance the endogenous variable but a moderate or large correlation with the exogenous variable in question. In such cases, the true path coefficient for the observed exogenous variable is considerably larger than the zero-order correlation between the exogenous variable and endogenous variable that is used as an estimate of the path coefficient. In sum, several important points result from the discussion of a model with one observed determinant (X) and one omitted determinant (O) of a single endogenous variable: 1. r xy will be a biased estimate of b yx to the extent that there exist omitted relevant causes of Y. 2. This bias will be upward (i.e., r xy > b yx ) to the extent that r xo b yo > By extension, both r xo and b yo must be nonzero for bias to occur. If either r xo 0 (O is unrelated to X and thus is a nonrelevant cause) or b yo 0 (there is not unique effect of O on Y; it is not a determinant of Y), no bias occurs. 4. If one of the terms, r xo or b yo, is negative and the other is positive, a suppression situation occurs (i.e., r xy < b yo ). 5. If r xo and b yo are both negative, there will be upward bias in the estimation of b yx from r xy. Violated Assumptions Omitted relevant variable represents a violation of the assumption of self-containment in causal modeling (James et al., 1982; Simon, 1977) and is but one type of model misspecification. We cannot isolate an endogenous variable from all potential causal explanatory variables in the social sciences. Instead, we replace the assumption of isolation with one of pseudo-isolation by assuming that the disturbance term, variance in the endogenous variable not accounted for by its modeled causes, is uncorrelated with exogenous variables (Bollen, 1989), or with endogenous variables that precede the variable in question in the causal path (Duncan, 1975; James, 1980). This can be seen by again examining Figure 4.2b. In Figure 4.2b, the disturbance term, d, would now include the effect of the standardized omitted variable (β yo ). Clearly, the self-containment assumption is violated, as X will correlate with the disturbance term by a magnitude of r xo β oy. RT2382X.indb 98 5/19/08 7:44:35 AM

9 Dr. StrangeLOVE 99 X r xo β mx M β yx β ym Y d β mo O β yo Figure 4.3 Partially mediated path model with omitted variable. More Complex Models Although the effects of the omitted variable are clearly visible in a model with two exogenous variables, things rapidly become more complex when more variables are added to the model. Figure 4.3 depicts a path model illustrating the partially mediating effect of a mediator (M) on the relationship between an exogenous variable, X, and an omitted relevant causal variable, O, with the endogenous variable (Y). The path model for M is identical to that of a two exogenous variable model. As in the previous example, if O is omitted, then the expected path coefficients and potential for bias are identical to those of a path coefficient with two determinants. There are three causes of Y, yet one of these is omitted. The true population path equation for this model is Y = β yx X + β ym M + β yo O + d (4.5) And the path coefficient β yx in the true model is given as ( ) + ( ) + ( ) r 1 r r r r r r r r rxo (4.6) yx mo ym xo mo xm yo xm mo β yx = 1+ 2rxmrmo rxo rxo rmo rxm More complicated models are obviously possible as well, though algebraic expressions for the path coefficients rapidly become unwieldy. In the current example, if variable O were omitted, the estimated path coefficient for the direct effect of X on Y would be RT2382X.indb 99 5/19/08 7:44:36 AM

10 100 Adam W. Meade, Tara S. Behrend and Charles E. Lance that of a two-determinant model, in which the effect of the omitted variable is ignored: r r r = (4.7) 2 1 r ˆβ yx yx ym xm xm In order to further illustrate the effects of an omitted variable in this model, data were simulated for several levels of correlation between variables O and Y. Table 4.2 contains the level of bias observed in the path coefficient of X for different levels of correlation between the omitted variable and the other causal variables in the model. Readily apparent from Table 4.2 is that the magnitude of bias is not large in any of the conditions when the correlation between O and Y is.20. Results are more mixed for those conditions in which the correlation between the omitted variable and Y is.60. In these conditions, the magnitude of the bias of path coefficient of X can be large, but only when the correlation between the X and O is also quite large. Also, the magnitude of the bias is mitigated somewhat by the correlation between the omitted variable and M, though the bias is still sizable. Note the values presented in Table 4.2 that represent the case in which there is a relatively small correlation between X and Y, and large correlations between O and both Y and X. Under these circumstances, bias can be sizable. We set the correlations in Tables 4.1 and 4.2 to arbitrary values in order to demonstrate their effects, but in practice correlation coefficients may not plausibly vary independently of one another (Mauro, 1990). In other words, a situation in which two variables correlate very highly, and one of those two correlates highly with a third variable while the other correlates negatively with the third variable, is mathematically improbable. The patterns of correlations that result in the most bias are those in which there is a very low correlation between the measured determinants and the endogenous variable, and high correlations between both the measured determinants and omitted variables and the omitted and endogenous variables (refer to Tables 4.1 and 4.2). While such patterns of correlations are mathematically possible, they may be unlikely in some domains of study given what is known from previous research. To summarize, omitted variables can introduce bias in estimated path coefficients and this bias may be positive or negative in RT2382X.indb 100 5/19/08 7:44:36 AM

11 Dr. StrangeLOVE 101 Table 4.2 Biasing Effects of an Omitted Variable in a Three- Determinant Model r yx r ym r yo r xm r xo r mo β yx ˆβyx Bias Note. β yx represents the true path coefficient of the exogenous variable X in the completely specified model. ˆβyx represents the estimated path coefficient of X in the omitted variable model. Bias is the difference between these two. direction. The issue is then, under what conditions is it possible for an omitted variable to bias path coefficients? Below is a summary for a model with one observed exogenous variable and one relevant omitted variable: If O is uncorrelated with the exogenous variable, r xy is an unbiased estimator of b yx and the omitted variable has no effect. If the variance in Y accounted for by O is completely redundant with the variables in the model, its unique effect (β yo ) will be near zero and it will have little biasing effect. RT2382X.indb 101 5/19/08 7:44:37 AM

12 102 Adam W. Meade, Tara S. Behrend and Charles E. Lance If O is uncorrelated with the endogenous variable but strongly correlated with the exogenous variable, r xy may underestimate b yx (i.e., a suppressor effect). Thus, there are three conditions which must be present in order for an omitted variable to cause positive bias in estimated path coefficients; that variable must (a) correlate at a nonzero level with other determinants of Y, (b) not be completely redundant with other variables included in the path model, and (c) correlate with the endogenous variable. If (a) and (b) are true, but (c) is not, the omitted variable may serve to artificially deflate the estimate of the path coefficient of the variables included in the model. In sum, the potential for LOVE is greatest when the omitted variable correlates highly with the outcome variable and moderately with other determinants in the model. Path Coefficient Bias Versus Significance Testing It is important to make a distinction between the biasing effect of omitted variables on the magnitude of path coefficients and the effect of omitted variables on the significance tests of those path coefficients. Generally speaking, in theory building via path analysis and SEM, there are two important outcomes of interest to the researcher: the magnitudes of the estimates of the path coefficients themselves and associated significance tests. Often in early stages of research, the primary outcome of interest in path analyses is the significance test associated with the path coefficient. In other words, the answer to the question does the variable have a unique effect on the outcome? would seem more important than the question what is the precise magnitude of the unique effect of the variable on the outcome? If early forays into model testing with a given set of variables indicate that the effect of a determinant on an endogenous variable is nonsignificant, it is less likely that future researchers would include this variable as a measured cause as frequently as if the variable did have a significant effect on the outcome. In this context, the magnitude of the path coefficient per se is less important than the decision as to the presence or absence of an effect of X on Y. If there does appear to be an effect (i.e., the test is significant), then future use and, importantly, replication of this effect is much more likely. While the rough magnitude of the effect RT2382X.indb 102 5/19/08 7:44:38 AM

13 Dr. StrangeLOVE 103 is undoubtedly important, small bias in the path coefficients would likely be of little concern so long as the conclusion of the significance test is not affected at this stage of investigation. The second outcome of path analysis is the magnitude of the path coefficients themselves. Estimates of path coefficients are important in that standardized coefficients are one index of the unique variance in the endogenous variable accounted for by the determinant. Additionally, unstandardized coefficients can be compared over time, and cumulative evidence can be collected such that the relative effect of a determinant on an outcome can be estimated. As research cumulates over time, the precision of estimated paths becomes important to future meta-analysts such that an accurate estimate of the effect of a determinant on an endogenous variable can be calculated. Thus, even though precise estimates of effects may not be of primary interest to a researcher in early stages of research on a topic, these estimates take on additional importance over time as research accumulates and meta-analyses are conducted. Recall that if the omitted variable does not correlate with the endogenous variable but correlates with other variables in the model, it may act as a suppressor variable. This was shown in Tables 4.1 and 4.2 where the exclusion of an omitted variable resulted in negative bias of the estimated path coefficient. That is, its inclusion in the model could serve to increase the estimated path coefficients of the observed variables. In regard to significance testing, omitted variables that do not correlate with the endogenous are potentially problematic in that they may result in Type II errors (i.e., failure to detect an effect that truly exists). However, reviewer criticisms of a lack of comprehensive path models typically center more on the potential upward biasing effects of omitted variables and associated Type I error (i.e., wrongly identifying an effect that does not exist). The focus on Type I errors is understandable as such errors may translate to immediate implications for practice and use of an determinant variable whereas Type II errors are less likely to be published and likely will be rectified in future studies. If Type II error is seen as less problematic as Type I error, the requirement of a significant correlation between the omitted variable and the outcome may be added to the list of conditions that must be met before the possibility of an omitted variable becomes a concern in path models. Omitted variables that do not correlate with the outcome cannot cause RT2382X.indb 103 5/19/08 7:44:38 AM

14 104 Adam W. Meade, Tara S. Behrend and Charles E. Lance upward bias in path coefficient estimates, which is typically the focus of LOVE concerns. Minimizing the Risk of LOVE There are specific conditions under which omitted variables can be problematic, and it is true that no matter how comprehensive a path model, there are always omitted relevant variables in organizational research. We have also illustrated that there can be substantial bias under some conditions; thus, there is a kernel of truth relating to LOVE in organizational research. To this extent, educating researchers on the ways in which to minimize the risk of omitted variable problems is of paramount importance. There are several ways in which organizational researchers can minimize the risk of omitted variables biasing path coefficients, discussed below. Experimental Control First, one could incorporate design characteristics that minimize the correlation between measured exogenous variables and omitted variables. Random assignment of participants is extremely successful in controlling for a wide range of known or unknown omitted individual difference variables. As we have emphasized, there can be no possible biasing effect of an omitted variable if that variable does not correlate with the observed variables in the path model (given sufficient sample size). As such, random assignment is highly effective for controlling for almost any individual difference variable in a path model. Although random assignment may not be possible in many instances of organizational research, there are some cases in which it may be employed. For example, participants may be randomly assigned to different types of training courses, reward systems, equipment and other environmental factors, or organizational interventions for which the effectiveness may be evaluated. In more mathematical terms, recall that in the case of one exogenous variable (X) and one omitted variable (O), the estimated effect of X on the endogenous variable (Y) is the zero-order correlation between X and Y. However, the true effect of X on Y should be given as Equation 4.8: RT2382X.indb 104 5/19/08 7:44:38 AM

15 Dr. StrangeLOVE 105 r r r (4.8) 2 1 r xy yo xo β yx = xo When random assignment is used, the correlation between X and O will be near zero (with sufficient sample size). Thus, Equation 4.8 reduces to r xy and there is no bias. More Inclusive Models Second, researchers should include as many known causes of the endogenous variable as is practically possible in the path model. The potential for bias in path coefficient estimates caused by omitted variables is much greater when they serve as unique causal agents of the endogenous variable. Recall that for a two determinant model with one determinant omitted, the bias present is equal to r xo β yo. By incorporating more determinants of the outcome, the unique effects of omitted variables may be reduced as β yo approaches zero. Note however, that there is a paradoxical side effect of including more variables. That is, each additional determinant that is included in the model is also prone to LOVE and is subject to the assumption of model self-containment. Use Previous Research to Justify Assumptions Researchers may also use what is already known from past research to demonstrate that omitted variables are not likely to be problematic. For example, when estimating the effects of ability determinants of job performance, one could legitimately leave out entire classes of other performance determinants such as personality and motivation, because these are likely to be uncorrelated with ability determinants and therefore are nonrelevant causes (Ackerman & Heggestad, 1997; Sackett, Gruys, & Ellingson, 1998; Salgado, Viswesvaran, & Ones, 2001; Schmidt & Hunter, 1998; see also Lance & James, 1999). On the other hand, if both verbal and quantitative aptitude were thought to be causes of employee job performance, it is unlikely that the omission of similar types of tests (e.g., mechanical ability) would RT2382X.indb 105 5/19/08 7:44:38 AM

16 106 Adam W. Meade, Tara S. Behrend and Charles E. Lance produce a strong biasing effect on path coefficients of those tests in the model as mechanical ability is exceedingly likely to have a large correlation (i.e., be redundant with) with the measured ability test variables. As such, the plausibility of bias due to omitting mechanical ability tests is very low as again β yo will be closer to zero. Put differently, in many instances nonrelevant causes can largely be ignored because they are either (a) not related to measured causes or (b) largely redundant with relevant causes that are already measured. To this extent, prior research on correlates of both the outcome and other determinants can provide guidance on what variables are essential to include in the model and which may be safely omitted. Consideration of Research Purpose If the goal is to provide a precise estimate of path coefficients, or to compare the relative variance accounted for by different determinants, omitted variables are considerably more problematic than if the goal is to test the statistical significance of the effect of a determinant on an outcome. Examining again the simple two determinant case, influence due to omitted variables can result in bias in the estimated path coefficient (r xy ) with respect to its true value (Equation 4.8). However, with large sample sizes, even sizable bias in estimated path coefficients are less likely to change decisions drawn from the statistical significance test associated with those coefficients. With large sample sizes, power is such that even small estimated effects tend to be statistically significant. In sum, omitted variables are a fact of life in organizational research and they can be problematic. Researchers should be particularly vigilant in cases in which (a) there are a large number of determinants of the outcome variable, (b) the study in question includes only a small subset of those determinants, (c) it is likely that the omitted variables have moderate or large correlations with the measured determinants, and (d) it is likely that the omitted variables would account for unique variance in the outcome variables. However, the notion that omitted variables are always problematic is a myth as the threat to the inferences that we tend to draw may not be as serious as some have believed. RT2382X.indb 106 5/19/08 7:44:39 AM

17 Dr. StrangeLOVE 107 References Ackerman, P. L., & Heggestad, E. D. (1997). Intelligence, personality, and interests: Evidence for overlapping traits. Psychological Bulletin, 121, Bollen, K. A. (1989). Structural equations with latent variables. Oxford, England: John Wiley and Sons. Colquitt, J. A., LePine, J. A., & Noe, R. A. (2000). Toward an integrative theory of training motivation: A meta-analytic path analysis of 20 years of research. Journal of Applied Psychology, 85, Duncan, O. D. (1975). Introduction to structural equation models. New York: Academic Press. Hanushek, E. A., & Jackson, J. E. (1977). Statistical methods for social scientists. San Diego: Academic Press. James, L. R. (1980). The unmeasured variables problem in path analysis. Journal of Applied Psychology, 65, James, L. R., Mulaik, S. A., & Brett, J. M. (1982). Causal analysis: Assumptions, models and data. Beverly Hills: Sage. Kenny, D. A. (1979). Correlation and causality. New York: Wiley-Interscience. Lance, C. E., & James, L. R. (1999). ν 2 : A proportional variance-accountedfor index for some cross-level and person-situation research designs. Organizational Research Methods, 2, Mauro, R. (1990). Understanding L.O.V.E. (left out variables error): A method for estimating the effects of omitted variables. Psychological Bulletin, 108, Prussia, G. E., & Kinicki, A. J. (1996). A motivational investigation of group effectiveness using social-cognitive theory. Journal of Applied Psychology, 81, Prussia, G. E., Kinicki, A. J., & Bracker, J. S. (1993). Psychological and behavioral consequences of job loss: A covariance structure analysis using Weiner s (1985) attribution model. Journal of Applied Psychology, 78, Sackett, P. R., Gruys, M. L., & Ellingson, J. E. (1998). Ability-personality interactions when predicting job performance. Journal of Applied Psychology, 83, Sackett, P. R., Laczo, R. M., & Lippe, Z. P. (2003). Differential prediction and the use of multiple predictors: The omitted variables problem. Journal of Applied Psychology, 88, Salgado, J. F., Viswesvaran, C., & Ones, D. S. (2001). Predictors used for personnel selection: An overview of constructs, methods and techniques. In D. S. Ones et al. (Eds.), Handbook of industrial, work and organizational psychology, Vol. 1: Personnel psychology (pp ). London, England: Sage Publications. RT2382X.indb 107 5/19/08 7:44:39 AM

18 108 Adam W. Meade, Tara S. Behrend and Charles E. Lance Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, Simon, H. A. (1977). Models of discovery: And other topics in the methods of science. Dordrecht, Holland: D. Reidel. RT2382X.indb 108 5/19/08 7:44:39 AM

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models

Chapter 5. Introduction to Path Analysis. Overview. Correlation and causation. Specification of path models. Types of path models Chapter 5 Introduction to Path Analysis Put simply, the basic dilemma in all sciences is that of how much to oversimplify reality. Overview H. M. Blalock Correlation and causation Specification of path