Unpublished manuscript. Power to Detect 1. Running head: THE POWER TO DETECT MEDIATED EFFECTS

The Power to Detect Mediated Effects in Experimental and Correlational Studies
David P. MacKinnon and Chondra M. Lockwood
Arizona State University
July 7, 1997

Abstract

Mediation effects provide important information regarding the hypothesized relationships among variables. Confidence intervals and significance tests for whether a mediated effect is larger than expected by chance variability are beginning to appear in the research literature. Mediation occurs when an independent variable causes changes in an intermediate variable which, in turn, causes changes in the dependent variable. Typically, mediation is tested for significance using Sobel's (1982) method, in which the product of two regression coefficients is divided by its standard error and the resulting Z statistic is compared with the normal curve. We show that Type 1 error rates and estimates of statistical power based on this method are too low when compared with empirical values. This method is inaccurate because the distribution of the mediated effect divided by its standard error is normal only in special cases. An alternative method for testing mediation is developed based on Craig's (1936) work on the distribution of the product of two standard normal deviates. Comparison of the results of the new method with empirical values showed that power calculations and Type 1 error rates were far more accurate.

The Power to Detect Mediated Effects in Experimental and Correlational Studies

A mediator is a variable that accounts for all or part of a relationship between an independent and dependent variable. Mediation implies a causal hypothesis whereby an independent variable causes a mediator, which, in turn, causes a dependent variable (Sobel, 1990). Hypotheses regarding mediated or indirect effects are common in social science research (Alwin & Hauser, 1975; Baron & Kenny, 1986; James & Brett, 1984; Judd & Kenny, 1981). Examples of mediational hypotheses include the prediction that attitudes cause intentions, which, in turn, cause behavior (Ajzen & Fishbein, 1980), and the prediction that knowledge leads to perceived risk of a health threat which, in turn, leads to health-related behaviors (Janz & Becker, 1984).

In addition to its application in cross-sectional and longitudinal studies, mediation analysis has been applied to randomized experimental studies. Examples in basic research in social psychology include the effect of attention on causal attribution, mediated by positive visual recall (Fiske, Taylor, & Kenny, 1982) and the effect of an agreeableness manipulation on dating desirability, mediated by perception of agreeableness (Jensen-Campbell, Graziano, & West, 1996). Examples in more applied contexts include the evaluation of a multi-component drug prevention program (Hansen & Graham, 1991) and a program to reduce symptoms of divorce (Wolchik, Ruehlman, Braver, & Sandler, 1993). In these contexts, the randomization of subjects to treatment conditions and the knowledge that the treatment precedes the mediating variable and the outcome in time greatly strengthen the causal inferences that may be drawn (Holland, 1988). Mediation analysis provides a check on whether the

program changed the intermediate variables it was designed to change, provides information on the process through which the treatment produced the outcome, and in applied settings generates information that may improve programs (MacKinnon, 1994; West & Aiken, 1997). Thus, the accuracy of statistical tests of mediation is critical for both basic and applied researchers in several substantive areas of psychology.

Statistical Power and Type 1 Error Rates

Statistical power is the probability of rejecting a false null hypothesis and is equal to 1 minus the Type 2 error rate. Power calculations are used before a study to determine the sample size necessary to have a reasonable probability (usually .80) of rejecting a false null hypothesis when the true effect has a specified size. Power calculations are also useful after a study to determine whether the study design had sufficient power to detect an effect. Power calculations also provide a framework for researchers to formalize their emphasis on reducing Type 1 or Type 2 errors. Forming the ratio of the Type 2 to the Type 1 error rate provides a gauge of the relative seriousness that a research area assigns to committing Type 1 or Type 2 errors (Sedlmeier & Gigerenzer, 1989). Current recommendations for Type 1 and Type 2 errors, for example, suggest that Type 1 errors are four (.20/.05) times more serious than Type 2 errors (Cohen, 1988).

Concern for statistical power has increased over the last thirty years since Cohen's original studies that demonstrated rather low statistical power for most psychological research (Cohen, 1962, 1973). Low statistical power was observed in other research areas as well as in psychology, including education (Cohen, 1973; Rossi, 1990) and clinical trials (Freiman, Chalmers, Smith, & Kuebler, 1978). There is evidence, however, that even though awareness of statistical power has increased, there has been little change in the power of studies conducted after Cohen's original

article, at least through the 1980s (Sedlmeier & Gigerenzer, 1989). These studies find that the actual power of psychological research is approximately .5, suggesting that Type 1 errors are actually treated as 10 times (.5/.05) more serious than Type 2 errors. No studies have specified the power to detect mediated effects or the sample size required to obtain a mediated effect of a certain size, with the exception of a technical report on some aspects of this topic (MacKinnon & Warsi, 1991) and an investigation of the detection of the proportion mediated with a binary outcome (Freedman & Schatzkin, 1992).

Estimation of Mediation Effects and Standard Errors

Before providing power formulas, it is necessary to define the estimation of mediated effects and their standard errors. The mediation model is shown in Figure 1 and is summarized in the three equations described below.

\[ Y_O = \tau X + \varepsilon_1 \tag{1} \]

\[ Y_O = \tau' X + \beta X_M + \varepsilon_2 \tag{2} \]

In these equations, Y_O is the outcome variable, X is the independent variable, X_M is the mediator, τ codes the relationship between the program and the outcome in the first equation, τ′ is the coefficient relating the program to the outcome adjusted for the effects of the mediator, ε_1 and ε_2 code residuals, and for ease of presentation, all predictor and outcome variables have been centered so there is no intercept.

---------------------------
Insert Figure 1 about here
---------------------------

In the first regression equation, the outcome variable (Y_O) is regressed on only the independent variable (X). In the second regression equation, the outcome (Y_O) is regressed on both the independent variable (X) and the mediator (X_M). The value of the mediated or indirect effect equals the difference in the program coefficients (τ − τ′) in the two regression models (Judd & Kenny, 1981). If the treatment coefficient (τ′) is zero when the mediator is included in the model, then the program effect is entirely mediated by the mediating variable.

A second method also involves estimation of two regression equations, and is illustrated in Figure 1. First, the coefficient relating the mediator to the outcome (β) is estimated in Model 2 above. Second, as shown in the equation below, the coefficient (α) relating the program to the mediating variable is estimated, where ε_3 is a residual and all variables have been centered so the intercept is zero.

\[ X_M = \alpha X + \varepsilon_3 \tag{3} \]

The product of these two parameters (αβ) is the mediated or indirect effect. The coefficient relating the treatment variable to the outcome adjusted for the mediator (τ′) is the nonmediated or direct effect. The rationale behind this method is that mediation depends on the extent to which the program changes the mediator (α) and the extent to which the mediator affects the outcome variable (β). The τ − τ′ and αβ

estimates of the mediated effect are equivalent (MacKinnon, Warsi, & Dwyer, 1995).

The variance of the mediated effect, σ²_αβ, can be found by finding the variance of the product of the α and β regression coefficients. The exact variance of the product of two independent random variables such as α and β is equal to

\[ \sigma^2_{\alpha\beta} = \alpha^2 \sigma^2_{\beta} + \beta^2 \sigma^2_{\alpha} + \sigma^2_{\alpha} \sigma^2_{\beta} \tag{4} \]

Sobel (1982) was the first to derive the approximate variance of the mediated effect using the multivariate delta method and show its application to research data. The multivariate delta method is used to derive the variance of functions of parameters (Bishop, Fienberg, & Holland, 1975). The method consists of pre- and post-multiplying the covariance matrix among the relevant parameters by the partial derivatives of a function of random variables with respect to each random variable. The approximate variance derived by Sobel (1982, 1986) is based on first derivatives, so it does not include the term σ²_α σ²_β, which is usually small compared to the other two terms. The variance estimators assume that the coefficient vector containing α and β is consistent, efficient, and asymptotically normal. As a result, the standard error is based on asymptotic theory and may not be accurate in small samples. These variances can be used to construct standard errors and confidence limits for the mediated effect. Both variance estimators have been shown to be unbiased at sample sizes as small as 50 for a three variable model (MacKinnon, Warsi, & Dwyer, 1995).

In many studies, the mediated effect is divided by its standard error and the resulting ratio is compared to a table of the normal distribution to test its significance (Bollen & Stine, 1990; Kim, Sandler, & Tein, 1997; MacKinnon et al., 1991; Scheier & Botvin, 1997; Wolchik et al., 1993). If the

value of the observed Z statistic is greater than 1.645, then it is concluded that an effect of that size would be observed less than 5% of the time by chance. Confidence limits for the mediated effect have also been used and lead to the same conclusion with regard to the null hypothesis.

Although the variance and standard error estimates of the mediated effect are unbiased even at small sample sizes, there is some evidence that hypothesis tests based on these values do not perform well. In the two major simulation studies conducted to date (MacKinnon, Warsi, & Dwyer, 1995; Stone & Sobel, 1990) there appears to be an imbalance in the number of times the true value falls to the left versus the right of the confidence limits. For positive values of the mediated effect, in which α and β are both positive or both negative, the confidence limits are more often to the left than to the right of the true value. The implication of this in a research study is that there is less power to detect a true mediated effect. The imbalance is due to the assumption that the distribution of the mediated effect is symmetric when, in fact, it is skewed for nonzero mediated effects, as will be shown.

Power Formulas for the Mediated Effect

Statistical Power. Statistical power is calculated using a distribution where there is a zero mediated effect, the central distribution, and a distribution based on a true nonzero mediated effect, the noncentral distribution. Typically, the central distribution is the Z or normal distribution for a null hypothesis of no effect. The noncentral distribution is typically more complicated because the shape of the distribution may differ for different noncentral mean values (e.g., the F and χ² distributions). The noncentral Z distribution, however, retains its original shape, and is merely shifted away from a zero mean by the value of the noncentrality parameter.
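The normal-theory test and the shifted-normal view of power described above can be sketched in a few lines. This is a minimal illustration, not the authors' program: the function names are hypothetical, the two estimates are assumed independent, and the Z statistic is assumed to be exactly normal with unit variance, which is the very assumption the paper goes on to question.

```python
import math
from statistics import NormalDist


def product_variance(alpha, beta, se_alpha, se_beta, exact=False):
    """Variance of the product alpha*beta for independent estimates.

    exact=False gives the first-order (delta method) variance;
    exact=True adds the product of the two squared standard errors.
    """
    var = alpha ** 2 * se_beta ** 2 + beta ** 2 * se_alpha ** 2
    if exact:
        var += se_alpha ** 2 * se_beta ** 2
    return var


def mediated_effect_z(alpha, beta, se_alpha, se_beta, exact=False):
    """Mediated effect divided by its standard error."""
    var = product_variance(alpha, beta, se_alpha, se_beta, exact)
    return alpha * beta / math.sqrt(var)


def normal_power(z_noncentral, critical=1.645):
    """One-tailed power if the statistic is N(z_noncentral, 1).

    The noncentral curve is the standard normal shifted by
    z_noncentral, so power is the area above the critical value.
    """
    return 1.0 - NormalDist().cdf(critical - z_noncentral)


if __name__ == "__main__":
    z = mediated_effect_z(0.39, 0.39, 0.07, 0.07)
    print("Z =", round(z, 3), "power =", round(normal_power(z), 3))
```

With a noncentral value of zero, `normal_power` returns the nominal one-tailed Type 1 error rate of about .05, and with a noncentral value equal to the critical value it returns .50.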
Assuming that Z = αβ/σ_αβ is normally distributed, the noncentral Z statistic is obtained by dividing the point estimate of the mediated effect by its standard

error, αβ/σ_αβ. For a one-tailed alpha = .05, the critical value on the central Z distribution is 1.645. An important step in power calculations is to find the point on the noncentral Z distribution corresponding to the value of 1.645 on the central distribution. This point is obtained by finding the difference between 1.645 and the mean of the noncentral Z distribution. Power = 1 − P(Z_central − Z_noncentral), where P(Z_central − Z_noncentral) is the cumulative probability of observing a Z value less than or equal to the difference between the central and noncentral Z values.

The population values of the α and β parameters and their population standard errors for a specified mediation model are used to calculate power. Given these values, power for any combination of α and β values can be easily computed. In Appendix A, we derive the covariances among X, X_M, and Y_O, assuming normally distributed error terms and variance of each error term equal to one. We also show the partial correlations between the mediator and Y_O with X held constant and between X and Y_O with X_M held constant. For all continuous measures, where the variance of α is equal to 1/(N−2) and the variance of β is equal to 1/(N−3), the formula is (5) For a binary independent variable corresponding to an experimental study, the formula is (6)

where the variance of α is equal to 4/(N−2), assuming an equal number of cases in each of two groups. The numerator of the variance is 4 rather than 1 in the binary independent variable case because the variance of the binary variable is equal to pq = (.5)(.5) = .25 when there are equal numbers of subjects in each group. The variance of α is thus multiplied by 1/.25 = 4 relative to the continuous case (Cohen, 1983). The noncentral Z equals αβ/σ_αβ. The power to detect the mediated effect is the probability of observing a difference as large or larger between the central and noncentral Z statistics. In the rest of this paper, we refer to this method to test significance as Z = αβ/σ_αβ. Methods to calculate the sample size required to detect a mediated effect given the values of α, β, and Type 1 and Type 2 errors are shown in Appendix B.

An Alternative Method Based on the Distribution of a Product

As will be shown below, the assumption that the mediated effect divided by its standard error is distributed normally is incorrect in some situations. As a result, the power formulas described above will be incorrect in certain situations. Up to now, the power to detect a mediated effect αβ/σ_αβ has been discussed. An alternative method for testing mediated effects can be developed based on distributional work by Aroian (1944), Craig (1936), and Springer (1979). These authors have done extensive work on the product of two normally distributed random variables. Consequently, instead of considering the product αβ, we consider the product Z_α Z_β, where Z_α = α/σ_α and Z_β = β/σ_β. This product of two standard normal variables is not normally distributed (Lomnicki, 1967; Springer & Thompson, 1966). In fact, in the null case where both Z_α and Z_β have means equal to zero, the distribution is symmetric but has kurtosis equal to 6 (Craig, 1936). When the product of the means, Z_α Z_β, is nonzero, the

distributions are skewed as well as having excess kurtosis (Craig, 1936). The four moments of the product of two correlated normal variables were given by Craig (1936) and a typographical error was corrected in Aroian, Taneja, and Cornwell (1978). Below are the moments when the variables are uncorrelated as in the case here. (7) (8) (9) (10)

The general analytical solution for the distribution of the product of two independent standard normal variables does not approximate any of the familiar distributions commonly used in statistics (some special cases do, however). Instead, the analytical solution for this product is a Bessel function of the second kind with a purely imaginary argument (Aroian, 1944; Craig, 1936). While computation of these values is complex, Springer and Thompson (1966) provide a table of the values of this function when Z_α = Z_β = 0. We also provide a program to compute these values in Appendix C using the Mathematica (1988) programming language when Z_α = Z_β = 0. The formula for the case when Z_α = Z_β = 0 is equal to (1/π)K_0 and the general formula for any value of Z_α and Z_β is (11)

(Hayya & Ferrara, 1972) where K is the Bessel function and where E is equal to (12) where (13)

The frequency distribution function of the product of two standardized normal variables can be used to test the significance of the mediated effect and therefore can be used to compute the statistical power to detect mediated effects. In this method the product of the two Z statistics (one for the α parameter, Z_α, and another for the β parameter, Z_β) is computed and compared to the tabled values for the frequency distribution produced by the Mathematica program in Appendix C and described by Craig (1936) and Springer and Thompson (1966). The method to test significance based on the product of Z statistics is called P = Z_α Z_β in this paper.

The analytical distribution function of the mediated effect divided by its standard error, αβ/σ_αβ, is more complicated than the product of noncentral Z statistics and does not appear to reduce to a straightforward analytical formula as for the product of two standard normal deviates (Springer, personal communication, 1997).
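For the null case Z_α = Z_β = 0, where the density of the product is (1/π)K_0, tail areas and critical values can be approximated numerically without a Bessel function library. The sketch below is not the Appendix C program. It assumes the standard integral representation K_0(x) = ∫_0^∞ exp(−x cosh t) dt, integrates the density over (c, ∞) analytically in x, and evaluates the remaining integral over t with the trapezoid rule. The function names, the integration limit, and the step count are illustrative choices.

```python
import math


def upper_tail(c, t_max=20.0, steps=20000):
    """P(Z1 * Z2 > c) for independent standard normals, c >= 0.

    Uses tail(c) = (1/pi) * integral_0^inf exp(-c*cosh t) / cosh t dt,
    which follows from integrating K_0(x)/pi over x in (c, inf).
    """
    h = t_max / steps

    def integrand(t):
        ch = math.cosh(t)
        return math.exp(-c * ch) / ch

    total = 0.5 * (integrand(0.0) + integrand(t_max))
    for i in range(1, steps):
        total += integrand(i * h)
    return total * h / math.pi


def critical_value(tail_prob, lo=0.0, hi=10.0, tol=1e-6):
    """Value c with upper_tail(c) = tail_prob, found by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if upper_tail(mid) > tail_prob:
            lo = mid  # tail area still too large, move right
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    print("P(P > 0)  =", round(upper_tail(0.0), 4))
    print("upper 5% critical value =", round(critical_value(0.05), 3))
```

Because the null distribution is symmetric about zero, `upper_tail(0.0)` should return one half, which is a quick check on the numerical integration.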

Because there is no analytical formula for this distribution, we took the alternative approach of empirically determining these distributions. Data for a large number of simulated cases are tabulated and the resulting cumulative probability distribution is used to compute statistical significance and power. For the αβ/σ_αβ case, a computer is used to randomly generate data with true values, such as α = 0 and β = 0, for a specific sample size. The data are generated 10,000 times and the probability distribution for the 10,000 cases is used as an approximation in place of a probability distribution derived analytically. The formulas used to calculate power for these new distributions are the same as described above except that values of the new empirical probability distribution are used in place of the Z values for a normal distribution.

These considerations lead to three different methods of computing statistical power of the mediated effect. We replace the "Z" in these formulas with Z′ for the empirical distribution of Z′ = αβ/σ_αβ and "P" for the product of standardized normal variables, P = Z_α Z_β, to signify that the distributions are different from the Z distribution, e.g., Power = 1 − P(P_central − P_noncentral).

Summary

The purpose of this paper is to demonstrate that methods used to test the significance and compute power for mediated effects based on the assumption of a normal Z statistic for αβ/σ_αβ are incorrect. Two alternatives to correct the problem, one based on the distribution of P = Z_α Z_β and the other based on Z′ = αβ/σ_αβ, are evaluated. These alternatives are based on empirical frequency distributions of functions of random variables. We also evaluated the Type 1 error rate for each method when the mediated effect was equal to zero.

Methods

There are two major parts to this research. First, we generate empirical distributions for P = Z_α Z_β and Z′ = αβ/σ_αβ because of the lack of analytical solutions for the Z′ = αβ/σ_αβ distribution and the lack of a program to integrate equation 11. Second, we use the results of these empirical distributions to compare the predicted and empirical statistical power and Type 1 error rates of the different methods to test the significance of mediated effects for sample sizes covering a range of values common in the social sciences.

Three methods are used to test significance. First, we test significance with formulas assuming that the mediated effect divided by its standard error is distributed normally and its significance can be evaluated with a Z statistic, Z = αβ/σ_αβ. Second, we test the significance of mediated effects by multiplying Z statistics for the α and β paths and looking up the probability of observing these values in tables based on the frequency distribution of the product of two standardized normal random variables (Springer & Thompson, 1966), P = Z_α Z_β. Finally, we use critical values determined by the empirical distribution of the mediated effect divided by its standard error, Z′ = αβ/σ_αβ.

Simulation Description. The SAS (Statistical Analysis System, 1989) programming language was used to conduct the statistical simulations. The data were generated from a normal distribution using the RANNOR function with the current time as the seed for each simulation. Five different sample sizes corresponding to sample sizes common in the social sciences were simulated: 50, 100, 200, 500, and 1000. For the case of a continuous independent variable, four different sets of parameter values were simulated, α=β=0, α=β=.14, α=β=.39, and α=β=.59, corresponding to partial correlations of α=0 and β=0, α=.14 and β=.14, α=.36 and β=.36, and α=.51 and β=.51, respectively.
The different effect sizes correspond to zero, small (2% of the variance), medium (13% of the

variance), and large (26% of the variance) effect sizes as described in Cohen (1988, pp. 412-414). In every model, the τ′ direct effect parameter was equal to zero. The independent variable was simulated to have one of two distributions: (a) a normally distributed continuous variable, or (b) a normally distributed variable that was dichotomized at the mean into a binary variable with equal numbers of subjects in each group. For the binary independent variable example, four different sets of parameter values were simulated, α=β=0, α=.28 and β=.14, α=.78 and β=.39, and α=1.18 and β=.59, corresponding to partial correlations of α=0 and β=0, α=.14 and β=.14, α=.36 and β=.36, and α=.51 and β=.51, respectively. Different parameter values were simulated for the case of a binary independent variable because of the reduced effect size when a normally distributed continuous variable is dichotomized (Cohen, 1983). The parameter values were chosen so that the effect sizes for the binary and continuous independent variable cases were the same. The four effect sizes (zero, small, medium, or large), five sample sizes (50, 100, 200, 500, or 1000), and two types of independent variables (binary or continuous) yield a total of 40 different combinations.

Empirical Distributions. Determination of empirical distributions was done prior to assessing Type 1 error rates and statistical power. The empirical distributions for the product of two Z statistics, P = Z_α Z_β, and for the mediated effect divided by its standard error, Z′ = αβ/σ_αβ, were obtained by generating 10,000 samples each for the 40 possible combinations of parameter values, sample size, and type of independent variable conditions. The empirical frequency distribution for each of the 40 combinations was constructed by tabulating the 10,000 replications into a cumulative frequency distribution.
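The simulation design for the continuous independent variable can be sketched as follows. This is an illustrative Python analogue, not the SAS program used in the study: it assumes unit-variance normal errors, a zero direct effect, ordinary least squares estimates with intercepts, and the first-order standard error of the mediated effect. All function names are hypothetical, and the demonstration uses far fewer replications than the 10,000 described above.

```python
import math
import random


def fit_mediation(x, m, y):
    """OLS estimates of alpha (m on x) and beta (y on x and m) with SEs."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    xc = [v - mx for v in x]
    mc = [v - mm for v in m]
    yc = [v - my for v in y]
    sxx = sum(a * a for a in xc)
    smm = sum(a * a for a in mc)
    syy = sum(a * a for a in yc)
    sxm = sum(a * b for a, b in zip(xc, mc))
    sxy = sum(a * b for a, b in zip(xc, yc))
    smy = sum(a * b for a, b in zip(mc, yc))
    alpha = sxm / sxx
    se_alpha = math.sqrt((smm - alpha * sxm) / (n - 2) / sxx)
    det = sxx * smm - sxm ** 2
    beta = (sxx * smy - sxm * sxy) / det
    direct = (smm * sxy - sxm * smy) / det
    rss = syy - direct * sxy - beta * smy
    se_beta = math.sqrt(rss / (n - 3) * sxx / det)
    return alpha, se_alpha, beta, se_beta


def simulate(alpha, beta, n, reps, seed=0):
    """Sorted replicate values of Z' = ab/se(ab) and P = z_a * z_b."""
    rng = random.Random(seed)
    z_primes, products = [], []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = [alpha * xi + rng.gauss(0.0, 1.0) for xi in x]
        y = [beta * mi + rng.gauss(0.0, 1.0) for mi in m]  # direct effect is 0
        a, sa, b, sb = fit_mediation(x, m, y)
        z_primes.append(a * b / math.sqrt(a * a * sb * sb + b * b * sa * sa))
        products.append((a / sa) * (b / sb))
    return sorted(z_primes), sorted(products)


def empirical_quantile(sorted_values, q):
    """Value at cumulative proportion q of a sorted replicate list."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


if __name__ == "__main__":
    zp, pr = simulate(0.0, 0.0, n=50, reps=1000, seed=1)
    print("95th percentile of Z':", round(empirical_quantile(zp, 0.95), 3))
    print("95th percentile of P: ", round(empirical_quantile(pr, 0.95), 3))
```

The sorted replicate lists play the role of the tabulated cumulative frequency distributions, and `empirical_quantile` reads a critical value from them.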
These empirical cumulative distributions were then used as the central

(for the no effect condition) and the noncentral (when the effect was nonzero) distribution to calculate Type 1 error rates and the power to detect the mediated effect.

Empirical Power, Predicted Power and Type 1 Error Rate Calculations. The empirical frequency distributions described above were used to calculate predicted power and Type 1 error rates. There were three predicted power values based on (1) Z = αβ/σ_αβ, under the assumption that it is normally distributed, (2) the product of independent standardized normal variables, P = Z_α Z_β, and (3) the empirical distribution of Z′ = αβ/σ_αβ.

Empirical power and Type 1 error rates were used to evaluate the three methods to test the significance of the mediated effect. The number of times that the mediated effect was statistically significant in 500 samples for each of the 40 combinations of parameter values, sample sizes, and type of independent variable was tabulated. For the case where α=β=0, the probabilities of rejecting the hypothesis of zero mediated effect, the Type 1 error rates, were calculated for each method. Because we use the 5% significance level, the mediated effect should be statistically significant in 25 of the 500 samples, or 5%, when the mediated effect equals zero.

When neither α nor β equals zero, concluding that the mediated effect is statistically significant is the correct decision. The number of times that each of the three methods led to the conclusion that the mediated effect was larger than expected by chance at the 5% level is the measure of statistical power. The most accurate method would have the highest number of times where it was concluded that the mediated effect was statistically significant. These empirical power values are based on the number of times that the effect was statistically significant out of the 500 samples. For each of the three methods, the predicted power was calculated using empirical distributions

for P = Z_α Z_β, Z′ = αβ/σ_αβ, and the normal distribution for Z = αβ/σ_αβ. The empirical power values are compared to the predicted power values for each of the three significance testing methods. An accurate method should have a close correspondence between the predicted and empirical power.

Check of simulation program and power calculations. We also include the Type 1 error rates and statistical power to detect the significance of a regression coefficient to demonstrate that these effects have Type 1 error rates and statistical power consistent with the normal distribution test for the regression estimate divided by its standard error. This procedure is a check on the accuracy of the simulation program and verifies the discrepancy between the predicted and empirical power to detect mediated effects. Power of a regression coefficient is calculated in the same way as for the mediated effect: it is based on the probability of observing a value in the distribution equal to the difference between the critical value for the selected distribution and the noncentral value. The noncentral Z is the Z value for the true value of the coefficient divided by its standard error. Power, then, equals 1 − P(Z_critical − Z_noncentral). Another way to calculate power is to convert the regression coefficient into a correlation, and calculate power of the correlation via a t-test (Cohen, 1988). The same formula for power holds, using the t distribution rather than the Z distribution. These methods were used to estimate power based on a variety of regression coefficient values and sample sizes.

Results

Empirical Distributions

The empirical distributions of the product of two standardized normal random variables, Z_α Z_β, and the empirical distributions of the mediated effect divided by its standard error are shown in Table 1. These frequency distributions are based on 10,000 replications for each of four cases corresponding

to α=β=0, α=β=.14, α=β=.39, and α=β=.59, corresponding to a zero, small, medium, and large effect (Cohen, 1988). The empirical distributions for sample sizes of 50, 100, 200, 500, and 1000 are presented.

---------------------------
Insert Table 1 about here
---------------------------

In one situation we are able to check the accuracy of the empirical distribution with an analytical distribution. For the case where α=0 and β=0, the distribution function of P = Z_α Z_β should be equal to 1/π times the Bessel function of zero order. A check on the methodology to generate empirical distributions is provided by comparing the critical values based on this formula to the actual empirical distribution function. The empirical values are very close to the values computed using the analytical formula (using the program in Appendix C), increasing confidence that both the formula and the simulation methodology are accurate.

The difference between the normal distribution and the empirical distributions of Z′ = αβ/σ_αβ is illustrated in the plot in Figure 2. Note the excess of values in the distribution of Z′ = αβ/σ_αβ around zero, as found in the distribution of P = Z_α Z_β in other studies (Craig, 1936). Note the skewness and kurtosis of the distributions in Figure 3, which compares the normal curve to Z′ = αβ/σ_αβ when αβ ≠ 0. The predicted moments using formulas 7, 8, 9, and 10 above and the actual values of these moments in the empirical distributions are shown in Table 2. There were few discrepancies between the predicted and empirical values for the moments. Predicted and empirical values will differ somewhat because the empirical distributions are sample realizations of the population values.
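A comparison of predicted and simulated moments of this kind can be sketched as below. The predicted values use the standard results for the product of two independent unit-variance normal variables with means Z_α and Z_β: mean Z_α Z_β, variance Z_α² + Z_β² + 1, skewness 6 Z_α Z_β / (Z_α² + Z_β² + 1)^(3/2), and excess kurtosis (12(Z_α² + Z_β²) + 6) / (Z_α² + Z_β² + 1)². It is an assumption here that these match the form of the manuscript's formulas 7 through 10. They do reproduce the excess kurtosis of 6 stated for the null case. Function names are hypothetical.

```python
import random


def product_moments(z_a, z_b):
    """Predicted mean, variance, skewness, and excess kurtosis of the
    product of independent normals with means z_a, z_b and unit variances."""
    s = z_a ** 2 + z_b ** 2
    mean = z_a * z_b
    var = s + 1
    skew = 6 * z_a * z_b / var ** 1.5
    kurt = (12 * s + 6) / var ** 2
    return mean, var, skew, kurt


def sample_moments(values):
    """Sample mean, variance, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return mean, m2, m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def simulate_product(z_a, z_b, reps, seed=0):
    """Draw replicate products of two independent unit-variance normals."""
    rng = random.Random(seed)
    return [rng.gauss(z_a, 1.0) * rng.gauss(z_b, 1.0) for _ in range(reps)]


if __name__ == "__main__":
    for z_a, z_b in [(0.0, 0.0), (1.0, 1.0), (2.0, 3.0)]:
        predicted = product_moments(z_a, z_b)
        observed = sample_moments(simulate_product(z_a, z_b, 100000, seed=7))
        print((z_a, z_b), [round(v, 3) for v in predicted],
              [round(v, 3) for v in observed])
```

Higher moments converge slowly, so modest discrepancies in the simulated skewness and kurtosis are expected even with many replications.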

---------------------------
Insert Table 2 and Figures 2 and 3 about here
---------------------------

Statistical Power and Type 1 Error Rates

As described in the Methods section, the empirical and predicted statistical power and Type 1 error rates for the regression coefficient were computed and are shown in Table 3. As expected, the predicted and empirical power and Type 1 error rates were very close, providing a check for the simulation procedure.

---------------------------
Insert Table 3 about here
---------------------------

Power and Type 1 error rate calculations using Z′ = αβ/σ_αβ and P = Z_α Z_β proceed in the same manner as for the Z distributions except that the critical values of the central and noncentral distributions are obtained from the empirical cumulative frequency distributions.

Type 1 error rates. The Type 1 error rates for each of the three methods are shown in Table 4 for an effect size of zero. The nominal Type 1 error rate is 5%. The results are quite consistent across continuous and dichotomous independent variables because we set the effect sizes to be equal. When the effect size is zero, the Type 1 error rates for the normality assumption method are essentially zero. The two empirical methods yield Type 1 error rates close to the nominal value of .05. These latter two methods are more accurate than the method based on the normal distribution assumption.
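The near-zero Type 1 error rate of the normal-theory test can be illustrated with a small Monte Carlo sketch. It rests on an idealization that is not part of the study's design: the two path Z statistics are treated as exactly independent standard normal draws under the null, so no data are generated. With the first-order standard error, the test statistic then reduces algebraically to Z_α Z_β / sqrt(Z_α² + Z_β²). The function name, replication count, and seed are illustrative.

```python
import math
import random


def null_rejection_rate(reps=20000, critical=1.645, seed=0):
    """Share of null replications in which the normal-theory statistic
    exceeds the one-tailed critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        z_a = rng.gauss(0.0, 1.0)
        z_b = rng.gauss(0.0, 1.0)
        # First-order test statistic written in terms of the two path Zs.
        z = z_a * z_b / math.sqrt(z_a ** 2 + z_b ** 2)
        if z > critical:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("nominal rate: 0.05")
    print("simulated rate:", null_rejection_rate())
```

Because the statistic can exceed 1.645 only when both path Z values are large and of the same sign, the simulated rate falls far below the nominal .05, in line with the pattern reported for the normality assumption method.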

---------------------------
Insert Table 4 about here
---------------------------

Statistical Power. Table 4 shows the predicted and actual power for each of the three methods. In every case the method assuming the normal distribution has lower power than the other methods and, importantly, even lower power than predicted from formulas assuming the normal distribution. The discrepancy exists because the assumption that αβ/σ_αβ is normally distributed is incorrect. The predicted power values and empirical power values are quite consistent for P = Z_α Z_β and Z′ = αβ/σ_αβ. These methods also have considerably more power than the method that assumes a normal distribution.

Conclusions

The most common method to test the statistical significance of a mediated effect has Type 1 error rates that are below the nominal value and has reduced power compared to other more accurate methods. As a result, the method is too conservative, dismissing true nonzero mediated effects as nonsignificant. If an error must be made, such a conservative approach may be better than one that is too liberal. However, it is likely that important, true mediated effects would be missed using this procedure. If a researcher is using mediation analysis to decide which components of a program to retain, it can be argued that she would rather keep an inert component (if she could afford to) than throw out an effective component due to lack of power.

The reason for the inaccuracy is that the assumption that αβ/σ_αβ follows a normal distribution is wrong. More accurate methods are available based on the product of random variables and the empirical distributions of αβ/σ_αβ. In this study, these

Unpublished manuscript. Power to Detect 21 methods had more statistical power along with accurate Type 1 error rates. It is not clear, however, whether these new methods will perform well in the case of model misspecifications such as omitted variables and nonnormal data. Research on the effects of misspecification is now underway. There are other methods to test the significance of mediated effects. These include the steps mentioned in Baron and Kenny (1986) and Judd and Kenny (1981), as well as confidence limits for the difference between the correlation between the independent variable and the dependent variable and the correlation between the independent and dependent variable with the mediator partialled out (Olkin and Finn, 1995). Our study of these methods however, indicates that these methods have even less power than the procedure that assumes a normal distribution for Z="$/F "$ (MacKinnon, Lockwood, Sheets, Braver & West, 1997). One interesting result of testing the significance of mediated effects is that either " or $ can be nonsignificant but Z " Z $ may be significant, indicating that the mediated effect is larger than expected by chance alone while the regression coefficients contributing to its effect are not. This article has been silent regarding important conceptual issues in interpreting mediation effects such as whether a nonsignificant regression coefficients should yield significant mediation effects. Here we have assumed that the mediation model is known. In practice, the hypothesized chain of effects in a mediation relationship may be wrong and there may be several equivalent models that will explain the relationships equally well. For example, the mediator may actually change the independent variable that may then affect the outcome. 
In the case of a randomized experiment, the independent variable improves interpretation because it must precede the mediator and the dependent variable, but even in this situation the interpretation of mediation effects is more complicated than might be expected (Holland, 1988). In the relation between the mediator and the dependent variable, for example, it is difficult to determine what aspect of the mediator is changed by the experimental manipulation that in turn changes the outcome variable, and what part is the existing relationship between the mediator and dependent variable. Attention to the specificity of the effect to one or a few of many mediators, and future experiments targeted at specific mediators, improve the interpretation of these effects (West & Aiken, 1997). None of these methods to test the statistical significance of mediated effects answers these critical conceptual questions, but when combined with careful replication studies these relationships should be clarified.

It will also be helpful to obtain accurate confidence limits for the mediated effect that incorporate the asymmetry of the nonzero distributions. The information included in the Tables in this article can be used to obtain asymmetric confidence limits, but only for the parameter values included in the empirical simulations. It should be possible to use the information on the four moments of the product distribution to adjust confidence limits. It is clear from this study that these distributions often contain substantial skewness and kurtosis.

We have empirical distribution results for all the possible combinations of four effect sizes (zero, small, medium, and large) for each of the α and β parameters for all the sample sizes described in this document. Tables containing the cumulative percentages by each percentage point for these distributions are available from the authors and are now being included in an internet web site. We are now working on an easy way to obtain the values for the general analytical solution (formula 11) for Z_α Z_β when Z_α and Z_β are nonzero. These results require the integration of formula 11, which is cumbersome, but the Mathematica program should do this.
The analytical distribution for Z′ = αβ/σ_αβ is more complicated but tractable. It is hoped that this article will lead to the analytical solutions for this distribution. Several references will be useful in this work (Aroian, Taneja, & Cornwell, 1978; Springer, 1979).

Examples

Here are several examples of how a researcher would use the information in this article to calculate power.

Case 1. A researcher conducted a study of the extent to which attitudes affect intentions, which in turn affect behavior, where attitudes, intentions, and behavior are all measured continuously. A total of 50 subjects were included in the study. The relationship between attitudes and intentions corresponded to a medium effect. The relationship between intentions and behavior was also a medium effect. The power to detect this effect is .92 from Table 4, using the empirical distribution of αβ/σ_αβ.

Case 2. A researcher has designed a study to evaluate a program to prevent symptoms in children of divorce. It was hypothesized that the program (binary variable) would have beneficial effects by improving mothers' positive discipline strategies (continuous variable). There were a total of 100 subjects, with 50 in each of two groups. The statistical test of this mediation effect was nonsignificant, leading the researchers to claim that positive discipline was not a likely mediator of the prevention program. The effect sizes for the α and β parameters were small. Reading from Table 4, it is seen that the power to detect this mediation effect is .45 using the empirical distribution, Z′ = αβ/σ_αβ. The power to detect the effect was therefore well below .8. In fact, the power of .45 is very close to the average power of most studies in the social sciences (Rossi, 1990).

Case 3. Half of 500 subjects were randomized to receive an intervention to increase healthy nutrition behaviors by changing perceptions of the norm regarding dietary behavior. The effect size for the program effect (binary) on perceptions of the norm regarding dietary behavior (continuous) and the effect size relating perceptions of the norm to healthy nutrition behaviors were both small. The power to detect this effect is .98, as seen in Table 4.

Case 4. A program to reduce intentions to use anabolic steroids among high school football players was designed to change knowledge of strength training alternatives to anabolic steroids. It was hypothesized that the program effect (binary) on knowledge of alternatives and the relationship between knowledge and intentions to use anabolic steroids (continuous) were both small. There were a total of 200 subjects, with half exposed to the prevention program. The power to detect the mediated effect was .73 using the empirical distribution, Z′ = αβ/σ_αβ.
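Power values of the kind read from Table 4 in the cases above can be approximated by simulation. The sketch below is a simplified illustration, not the procedure used to build Table 4. It assumes that the two z-ratios are independent normal deviates with unit variance and with means equal to the noncentrality values nc_a and nc_b, which are hypothetical inputs chosen only for demonstration. It compares the normal-theory test of αβ/σ_αβ (critical value 1.96) with a test of P = Z_α Z_β whose critical value is taken from the simulated null distribution.

```python
import math
import random

def draw_pairs(nc_a, nc_b, n_reps, rng):
    """Simulate (Z_alpha, Z_beta) pairs with the given noncentrality values."""
    return [(rng.gauss(nc_a, 1), rng.gauss(nc_b, 1)) for _ in range(n_reps)]

def sobel_z(z_a, z_b):
    """alpha*beta / sigma_alpha_beta with the first-order standard error."""
    return z_a * z_b / math.sqrt(z_a ** 2 + z_b ** 2)

def product(z_a, z_b):
    """P = Z_alpha * Z_beta."""
    return z_a * z_b

def upper_quantile(values, alpha=0.05):
    """Two-sided empirical critical value of a simulated null distribution."""
    ordered = sorted(abs(v) for v in values)
    return ordered[int((1 - alpha) * len(ordered))]

def power(stat, pairs, critical):
    """Share of simulated pairs for which |stat| exceeds the critical value."""
    return sum(abs(stat(a, b)) > critical for a, b in pairs) / len(pairs)

rng = random.Random(7)  # arbitrary seed for reproducibility

# Critical value for P from the case where both effects are zero.
null_pairs = draw_pairs(0.0, 0.0, 200_000, rng)
crit_p = upper_quantile([product(a, b) for a, b in null_pairs])

# Hypothetical noncentrality values, not taken from the article's tables.
alt_pairs = draw_pairs(2.5, 2.5, 200_000, rng)
power_normal = power(sobel_z, alt_pairs, 1.96)
power_product = power(product, alt_pairs, crit_p)

print(f"critical value for P: {crit_p:.2f}")
print(f"power, normal-theory test: {power_normal:.3f}")
print(f"power, product test: {power_product:.3f}")
```

Replacing the two noncentrality values with ones implied by a planned sample size and effect sizes gives a rough check on a design, subject to the independence and unit-variance assumptions stated above.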

References

Ajzen, I., & Fishbein, M. (1980). Understanding attitudes and predicting social behavior. Englewood Cliffs, NJ: Prentice Hall.
Alwin, D. F., & Hauser, R. M. (1975). The decomposition of effects in path analysis. American Sociological Review, 40, 37-47.
Aroian, L. A. (1944). The probability function of the product of two normally distributed variables. Annals of Mathematical Statistics, 18, 265-271.
Aroian, L. A., Taneja, V. S., & Cornwell, L. W. (1978). Mathematical forms of the distribution of the product of two normal variables. Communications in Statistics: Theory and Methods, A7(2), 165-172.
Baron, R. M., & Kenny, D. A. (1986). The moderator-mediator distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173-1182.
Bishop, Y. M., Fienberg, S. E., & Holland, P. W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Bollen, K. A., & Stine, R. (1990). Direct and indirect effects: Classical and bootstrap estimates of variability. In C. C. Clogg (Ed.), Sociological Methodology (pp. 115-140). Washington, DC: American Sociological Association.
Cohen, J. (1962). The statistical power of abnormal-social psychological research. Journal of Abnormal and Social Psychology, 65, 145-153.
Cohen, J. (1973). Statistical power analysis and research results. American Educational Research Journal, 10, 225-229.
Cohen, J. (1983). The cost of dichotomization. Applied Psychological Measurement, 7(3), 249-253.
Cohen, J. (1988). Statistical power for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45(12), 1304-1312.
Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.
Craig, C. C. (1936). On the frequency function of xy. Annals of Mathematical Statistics, 7, 1-15.
Fiske, S. T., Kenny, D. A., & Taylor, S. E. (1982). Structural models for the mediation of salience effects on attribution. Journal of Experimental Social Psychology, 18(2), 105-127.
Freedman, L. S., & Schatzkin, A. (1992). Sample size for studying intermediate endpoints within intervention trials or observational studies. American Journal of Epidemiology, 136(9), 1148-1159.
Frieman, J. A., Chalmers, T. C., Smith, H., & Kuebler, R. R. (1978). The importance of beta, the type II error and sample size in the design and interpretation of the randomized control trial: Survey of 71 negative trials. The New England Journal of Medicine, 299, 690-694.
Hansen, W. B., Graham, J. W., Wolkenstein, B. H., & Rohrbach, L. A. (1991). Program integrity as a moderator of prevention program effectiveness: Results for fifth-grade students in the adolescent alcohol prevention trial. Journal of Studies on Alcohol, 52(6), 568-579.
Hayya, J. C., & Ferrara, W. L. (1972). On normal approximations of the frequency functions of standard forms where the main variables are normally distributed. Management Science, 19(2), 173-186.
Holland, P. W. (1988). Causal inference, path analysis, and recursive structural equation models. Sociological Methodology, 18, 449-484.
James, L. R., & Brett, J. M. (1984). Mediators, moderators and tests for mediation. Journal of Applied Psychology, 69(2), 307-321.
Janz, N. K., & Becker, M. H. (1984). The health belief model: A decade later. Health Education Quarterly, 11, 1-47.
Jensen-Campbell, L. A., Graziano, W. G., & West, S. G. (1996). Dominance, prosocial orientation, and female preferences: Do nice guys really finish last? Journal of Personality and Social Psychology, 68(3), 427-440.
Judd, C. M., & Kenny, D. A. (1981). Process analysis: Estimating mediation in treatment evaluations. Evaluation Review, 5(5), 602-619.
Kim, L. S., Sandler, I. N., & Tein, J.-Y. (1997). Locus of control: A stress moderator and mediator in children of divorce. Journal of Abnormal Child Psychology, 25, 181-199.
Lomnicki, Z. A. (1967). On the distribution of products of random variables. Journal of the Royal Statistical Society, 29, 513-524.
MacKinnon, D. P. (1994). Analysis of mediating variables in prevention and intervention studies. National Institute on Drug Abuse Research Monograph Series, 139, 127-153.
MacKinnon, D. P., Johnson, C. A., Pentz, M. A., Dwyer, J. H., Hansen, W. B., Flay, B. R., & Wang, E. (1991). Mediating mechanisms in a school-based drug prevention program: First year effects of the Midwestern Prevention Project. Health Psychology, 10(3), 164-172.
MacKinnon, D. P., Lockwood, C. M., Sheets, V., Braver, S., & West, S. G. (1997). Comparison of methods to detect mediated effects. Manuscript in preparation.
MacKinnon, D. P., & Warsi, G. (1991). On the variance of measures of mediation. Technical Report. Available from the first author.
MacKinnon, D. P., Warsi, G., & Dwyer, J. H. (1995). A simulation study of mediated effect measures. Multivariate Behavioral Research, 30(1), 41-62.
Mathematica (Version 3.0) [Computer software]. Champaign, IL: Wolfram Research, Inc.
Olkin, I., & Finn, J. D. (1995). Correlations redux. Psychological Bulletin, 118(1), 155-164.
Pruett, J. M. (1972). The distribution of products of some independent, non-standardized random variables. Doctoral dissertation, The University of Arkansas.
Rossi, J. S. (1990). Statistical power of psychological research: What have we gained in 20 years? Journal of Consulting and Clinical Psychology, 58, 646-656.
SAS (Version 6.12) [Computer program]. (1989). Cary, NC: SAS Institute, Inc.
Satorra, A., & Saris, W. E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83-90.
Scheier, L. M., & Botvin, G. J. (1997). Expectancies as mediators of the effects of social influences and alcohol knowledge on adolescent alcohol use: A prospective analysis. Psychology of Addictive Behaviors, 11(1), 48-64.
Sedlmeier, P., & Gigerenzer, G. (1989). Do studies of statistical power have an effect on the power of studies? Psychological Bulletin, 105, 309-316.
Sobel, M. E. (1982). Asymptotic confidence intervals for indirect effects in structural equation models. In S. Leinhardt (Ed.), Sociological Methodology (pp. 290-312). Washington, DC: American Sociological Association.
Sobel, M. E. (1986). Some new results on indirect effects and their standard errors in covariance structure models. In N. Tuma (Ed.), Sociological Methodology (pp. 159-186). Washington, DC: American Sociological Association.
Sobel, M. E. (1990). Effect analysis and causation in linear structural equation models. Psychometrika, 55(3), 495-515.
Springer, M. D. (1979). The algebra of random variables. New York: John Wiley & Sons.
Springer, M. D., & Thompson, W. E. (1966). The distribution of independent random variables. SIAM Journal on Applied Mathematics, 14(3), 511-526.
Stone, C. A., & Sobel, M. E. (1990). The robustness of estimates of total indirect effects in covariance structure models estimated by maximum likelihood. Psychometrika, 55(2), 337-352.
West, S. G., & Aiken, L. S. (1997). Towards understanding individual effects in multiple component prevention programs: Design and analysis strategies. In K. Bryant, M. Windle, & S. West (Eds.), New methodological approaches to prevention research. Washington, DC: American Psychological Association.
West, S. G., Sandler, I., Baca, L., Pillow, D., & Gersten, J. C. (1991). The use of structural equation modeling in generative research: Toward the design of a preventive intervention for bereaved children. American Journal of Community Psychology, 21, 293-331.
Wolchik, S. A., Ruehlman, L. S., Braver, S. L., & Sandler, I. N. (1989). Social support of children of divorce: Direct and stress buffering effects. American Journal of Community Psychology, 17(4), 485-501.

Appendix A

Covariance algebra yields the following predicted covariances based on the equations among X, X_M, and Y_O described in the text and assuming independent, normally distributed error terms:

The correlations among X, X_M, and Y_O are:

The correlation between the mediator and Y_O partialled for X, r_ym.x, is:

The correlation between X and Y_O partialled for the mediator, r_yx.m, is:

The partial correlations squared provide measures of effect size (Cohen, 1988, p. 477) for the parameters in the mediation model.

Appendix B

Calculation of Sample Size

Each of the following formulas assumes normal distributions. To calculate sample size with the empirical distributions for P and Z′, replace z_type1 and z_1-type2 with the values from these new distributions. The formula for the sample size required to detect a mediated effect of a certain size for all continuous variables (when the parameters of the mediated effect are equal, α = β) is

and for a binary independent variable

where z_1-type2 is the Z value for the required power and z_type1 is the critical Z value for testing significance. When α does not equal β, the formula for continuous variables is

and for a binary independent variable is

For maximum likelihood estimation using covariance structure modeling and the multivariate delta method for the variance of the mediated effect, the formula for the noncentral Z-statistic equals

and the formula for sample size equals

The derivation for the last equation is given in more detail for illustration.

Square both sides and divide by α²β².

Divide each side by α² + β².

Add 1 to each side.

Appendix C

Mathematica (1988) program to compute the frequency distribution of the product of two standardized random variables when Z_α = Z_β = 0.

(* Plot the density of the product for x from 0 to 2 *)
Plot[1/Pi BesselK[0, x], {x, 0, 2}];
(* Print the cumulative probability from 0 to b for b = .01, .02, ..., 2 *)
Do[Print[{b}, {NIntegrate[1/Pi BesselK[0, x], {x, 0.0000000001, b}]}], {b, .01, 2, .01}];
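For readers without Mathematica, the same cumulative frequencies can be sketched in plain Python. The program above integrates the density K_0(x)/π, where K_0 is the modified Bessel function of the second kind. The sketch below avoids a Bessel-function library by using the integral representation K_0(x) = ∫ exp(−x cosh t) dt over t from 0 to infinity and carrying out the integration over x analytically, which leaves a single integral that is evaluated with Simpson's rule. The function name, the truncation point t_max, and the number of steps are assumptions made for this illustration.

```python
import math

def product_cdf_from_zero(b, t_max=40.0, steps=4000):
    """P(0 < Z1*Z2 < b) for independent standard normal Z1, Z2 and b >= 0.

    Evaluates (1/pi) * integral over t in [0, t_max] of
    (1 - exp(-b*cosh(t))) / cosh(t) with composite Simpson's rule.
    """
    def integrand(t):
        c = math.cosh(t)
        return (1.0 - math.exp(-b * c)) / c

    h = t_max / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(t_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0 / math.pi

# Same grid as the Mathematica loop: b = .01, .02, ..., 2.00
table = [(round(0.01 * k, 2), product_cdf_from_zero(0.01 * k)) for k in range(1, 201)]
for b, p in table[:5]:
    print(b, round(p, 4))
```

Because the distribution is symmetric about zero when Z_α = Z_β = 0, the two-sided tail probability beyond b is 1 − 2 × product_cdf_from_zero(b).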

Appendix D

Computer Program to Compute Statistical Power

The SAS (1989) computer program, POWMED, will compute statistical power for a model with one mediator. The input for the program is the values for α, β, and τ, the variances of each variable, the Z for the statistical test, and the Z corresponding to the required statistical power. The program then computes the power to detect the effect using both regression and structural equation programming. It reports the power to detect the mediation effect with the exact solution for the variance of the mediated effect, with the first derivative solution (Sobel, 1982, 1986), and with a statistical test of whether the α and β parameters are statistically significant. Tabled values of the distribution of P = Z_α Z_β and Z′ = αβ/σ_αβ are also provided. At this point only a few of these distributions are included in the program. The program is available by writing to the first author. It is straightforward to extend this program for the latent variable model (using SAS PROC CALIS) and for multiple mediator models.

The program uses the covariance algebra for the one mediator model to obtain the true values for the covariances and parameters as well as the true standard error of the mediated effect. A noncentral Z statistic is computed and used as input to equations that compute the power to detect the effects. In this way, the procedure is the same as that described in Satorra and Saris (1985). The partial correlations for each parameter are also output in this program, providing a measure of effect size.
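The power step performed under the normal-distribution assumption can be sketched as follows. This is an illustrative Python sketch, not the POWMED program itself. The function names are hypothetical, the standard error uses the first derivative solution (Sobel, 1982), σ_αβ = sqrt(α²σ_β² + β²σ_α²), and the input values in the usage lines are made up for demonstration.

```python
import math
from statistics import NormalDist

def noncentral_z(alpha, beta, se_alpha, se_beta):
    """True mediated effect divided by its first-order standard error."""
    se_ab = math.sqrt(alpha ** 2 * se_beta ** 2 + beta ** 2 * se_alpha ** 2)
    return alpha * beta / se_ab

def normal_theory_power(z_nc, z_crit=1.96):
    """Two-sided power if the statistic were normal with mean z_nc and unit variance."""
    std = NormalDist()
    return (1 - std.cdf(z_crit - z_nc)) + std.cdf(-z_crit - z_nc)

# Hypothetical inputs for illustration only.
z_nc = noncentral_z(alpha=0.4, beta=0.4, se_alpha=0.1, se_beta=0.1)
print(round(z_nc, 3), round(normal_theory_power(z_nc), 3))
```

As reported in the Results, the empirical power of the normal-theory test was lower than the power predicted in this way, so values from this calculation are best treated as predictions under the normality assumption rather than as achieved power.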