Non-parametric Mediation Analysis for direct effect with categorial outcomes


JM GALHARRET, A. PHILIPPE, P ROCHET

July 3

1 Introduction

Within the human sciences, mediation designates a particular causal phenomenon where the effect of a variable X on another variable Y passes (partially or entirely) through a third variable M (see Baron and Kenny (1986)). The study of mediation is particularly popular in psychology, sociology and marketing, as it allows the detection of variables that may trigger specific human behaviors. In the mediation model, the total effect of X on Y is divided into the influence of X on Y in the presence of M (the direct effect) and the part of this effect that is routed through M (the indirect effect). For instance, Schmader and Johns (2003) have shown that a reduction in working memory capacity mediates the negative effect of stereotype threat on women's mathematical performance. MacKinnon (2008) compares testing procedures for the indirect effect.

[Figure 1: path diagram with arrows X → M (coefficient a), M → Y (coefficient b) and X → Y (coefficient γ).]
Figure 1: Summary of the relations between Y, X, M. The direct and indirect effects are defined by γ and ab respectively, according to MacKinnon (2008).

The main objective in the mediation model is to quantify the added effect of X on Y in the presence of M. A natural first step in this direction is to detect the absence of a direct effect altogether, which would signify that X could (and should) be ignored when investigating Y. Detecting the direct effect is generally achieved via a statistical test on the significance of the coefficient γ in the model. If Y is a continuous variable, the mediation model typically follows a classical linear regression framework:

$$Y = \alpha + \gamma X + bM + \varepsilon,$$

where $\varepsilon$ is a random error uncorrelated with X and M, with zero mean and finite variance. In this model, testing whether there is a direct effect can be achieved by a Student significance test on the coefficient γ. A discrete analogue when Y is a categorical variable is given by the logistic regression model, in which the absence of a direct effect is tested via the likelihood ratio (LR) test (see e.g. Agresti, 2006) or via the Wald test (see e.g. Hauck and Donner, 1977).

In such linear mediation models, the study of a direct effect is well understood in both the discrete and continuous cases. However, assuming linear relations between the variables implicitly reduces causality to a correlation issue, which can be unrealistic in some practical situations. If so, a more general model must be adopted in order to account for possible non-linear relations.

In this paper, we propose a more general definition of the direct effect in a mediation model that investigates the conditional dependence between the variables instead of focusing on the correlation. Under this definition, no direct effect exists between X and Y if the conditional distribution of Y, given the variables X and M, is a function of M alone. In other words, the whole effect (linear or non-linear) of X on Y is entirely explained by M. Because the general mediation model encompasses the linear one, we argue that the absence of a direct effect should be detected by the non-parametric approach even if the linear assumptions hold. On the contrary, a linear mediation model may be ill-suited and fail to properly interpret the information in the data in a non-linear setting.

We present a non-parametric test procedure to infer the absence of a direct effect in the general mediation model (see Imai and Keele (2010)). The test statistics are obtained from kernel estimators of the densities (and conditional densities) of the variables of the model. Although the theoretical distribution of the test statistics under the null hypothesis is unknown, it is possible to estimate it by a bootstrap procedure, thus providing an approximation of the p-value.
A real data application to students' performance linked to well-being and self-efficacy is presented. We show that the conclusions regarding the existence of a direct effect may differ depending on whether the considered model is linear (in this case, the logistic regression model) or not. A comparative study of the two test procedures is carried out on simulated data in both a linear and a non-linear framework (for the parametric procedures, the Wald and LR tests were used). This study reveals that the logistic model may misread the causal effect in the data if the linearity assumption is not satisfied, particularly when the effect of X on Y is not monotonic. On the contrary, the performance of the non-parametric test procedure remains comparable to that of the parametric tests in the logistic regression setting. We also note that the comparison between the two parametric tests favors the LR test in terms of power for small samples, in agreement with the published literature (see e.g. Harrell, 2006).

The paper is organized as follows. In Section 2 we describe the mathematical formalism behind the non-linear mediation model, whose definition relies on the joint distribution of the variables. We show that this model effectively generalizes the linear mediation model, in the sense that a direct effect in a linear scenario results in a direct effect in the general setting, while the converse may not be true. The extension of the significance test for a direct effect to the non-linear setting is then developed in Section 3. Finally, the test procedure is applied to numerical examples in Section 4, both on simulated and real data.

2 A non-linear mediation model

The absence of direct effect means that the influence of X on Y is canceled out in the presence of M. In mathematical terms, the absence of direct effect can be interpreted as the distribution of Y given X, M being equal to its distribution given M or, equivalently, for all measurable sets A,

$$P(Y \in A \mid X, M) \overset{a.s.}{=} P(Y \in A \mid M) \qquad (1)$$

where a.s. stands for almost surely. Arguably, this condition is the most general possible when it comes to formalizing the absence of direct effect in a mediation model.

Remark 2.1. In these circumstances, we implicitly assume that X and M are dependent variables, or else searching for a direct effect is meaningless. If X and M are independent, any actual effect of X on Y cannot be canceled in the presence of M, although the above condition might still hold if Y and X are also independent. Thus, assuming that X and M are dependent rules out this trivial yet problematic situation.

Testing the equality of the two conditional distributions can be quite tedious in practice, especially for a continuous variable Y. However, for many statistical models this equality is equivalent to

$$(H_0): \quad E(Y \mid X, M) \overset{a.s.}{=} E(Y \mid M). \qquad (2)$$

Both conditional expectations can be easily estimated in a non-parametric way by using the well-known kernel estimators (see Härdle et al. (2004)). This allows us to construct test statistics in Section 3.

Remark 2.2. The null hypothesis is somewhat similar to the one considered in Hayes (2013), where the author proposes the condition

$$E(Y \mid X = x, M = m) = E(Y \mid X = x - 1, M = m)$$

as a non-linear characterization of the absence of direct effect. However, no test procedure is developed in the general case.

We will now describe some models in which the absence of direct effect can be reduced to $(H_0)$ as defined in (2).

Binary outcomes: If Y is a binary variable, then the conditional probability is given by $P(Y = 1 \mid X, M) = E(Y \mid X, M)$. Thus, the equivalence between (1) and (2) is immediate. The situation is slightly more complicated if Y is a categorical variable, for example with outcomes 1, ..., J. In this case, the general condition (1) reduces to the equality of the conditional probabilities

$$P(Y = j \mid X, M) \overset{a.s.}{=} P(Y = j \mid M), \quad j = 1, \dots, J,$$

which cannot be reduced to a condition on $E(Y \mid X, M)$ only.
To solve this issue, one may consider the vector $\mathbf{Y} = (1\{Y = 1\}, \dots, 1\{Y = J - 1\})$ as the variable of interest. The condition (1) is then equivalent to

$$(H_0): \quad E(\mathbf{Y} \mid X, M) \overset{a.s.}{=} E(\mathbf{Y} \mid M). \qquad (3)$$

Alternatively, this situation can be tackled by using multiple tests over the different outcomes, thus reducing to the binary case.

Non-parametric regression: The non-parametric regression model is of the form

$$Y = \rho(X, M) + \varepsilon,$$

where $\varepsilon$ is a residual error independent of X, M and $\rho$ is an unknown measurable function. In this model, the absence of direct effect can be investigated via $\rho(X, M)$, which is generally more accessible than the conditional distribution. The condition (1) is equivalent to the statement that $\rho(X, M)$ does not depend on X. Indeed, the conditional distribution of Y given X, M is the distribution of $\varepsilon$ translated by $\rho(X, M)$. If Y is integrable and $\varepsilon$ is centered, we obtain $\rho(X, M) = E(Y \mid X, M)$, and thus hypothesis (1) is equivalent to $(H_0)$ as defined in (2).
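
The equivalence stated for the non-parametric regression model can be spelled out in two short steps. The sketch below assumes, in addition to the independence of $\varepsilon$ and (X, M), that $\varepsilon$ is centered. The symbols $F_\varepsilon$ (the distribution function of $\varepsilon$) and $g$ (a generic measurable function) are introduced here for illustration only.

```latex
% Step 1: the conditional law of Y is the law of epsilon shifted by rho(X, M).
P(Y \le y \mid X, M)
  = P\bigl(\varepsilon \le y - \rho(X, M) \mid X, M\bigr)
  = F_\varepsilon\bigl(y - \rho(X, M)\bigr),
  \qquad y \in \mathbb{R}.
% If rho(X, M) = g(M), the right-hand side is a function of M alone,
% so it coincides with P(Y <= y | M) and condition (1) holds.

% Step 2: conversely, if (1) holds and E(epsilon) = 0, then
\rho(X, M) = E(Y \mid X, M) \overset{a.s.}{=} E(Y \mid M),
% so rho(X, M) is almost surely a function of M alone.
```
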

Remark 2.3. For the logistic and linear models most commonly used in the literature, we have the following parametric relation:

$$E(Y \mid X, M) = \begin{cases} f(\alpha + bM + \gamma X) & \text{logistic model}, \\ \alpha + bM + \gamma X & \text{linear model}, \end{cases}$$

where f is a known function, typically the logistic function $f(t) = 1/(1 + e^{-t})$. In this parametric framework, the testing hypotheses become γ = 0 vs γ ≠ 0 (see VanderWeele and Vansteelandt (2010) for the logistic model). In our framework, no such assumption is made: the non-parametric approach described below avoids this restriction on the form of the conditional expectation.

3 The non-parametric test procedure

Hereafter we assume that Y is an integrable random variable and that (X, M) has a density on $\mathbb{R}^2$ with respect to the Lebesgue measure. Let $\rho(X, M) = E(Y \mid X, M)$ and $\varphi(M) = E(Y \mid M)$. Testing the null hypothesis $H_0$ boils down to checking whether the parameter

$$\theta := E\,\bigl|\rho(X, M) - \varphi(M)\bigr|$$

is zero. As a result, a simple test procedure can be constructed from a consistent estimator $\hat\theta$. The functions $\rho$ and $\varphi$ can be estimated by the standard Nadaraya-Watson method:

$$\hat\rho(x, m) := \frac{\sum_{i=1}^n Y_i K_h(X_i - x) K_h(M_i - m)}{\sum_{i=1}^n K_h(X_i - x) K_h(M_i - m)} \quad \text{and} \quad \hat\varphi(m) := \frac{\sum_{i=1}^n Y_i K_h(M_i - m)}{\sum_{i=1}^n K_h(M_i - m)}.$$

Here, K is a symmetric kernel and $K_h = K(\cdot/h)/h$, with h > 0 the bandwidth. For simplicity's sake we chose a Gaussian kernel with the theoretically optimal bandwidths ($h = n^{-1/6}$ for $\rho$ and $h = n^{-1/5}$ for $\varphi$). This is sufficient to ensure the consistency of the kernel estimators under mild assumptions. As a matter of fact, calibrating the bandwidth adaptively in order to improve the estimation turns out to be unnecessary for our purposes, since we are mainly interested in the distribution of the test statistics. We then compute the empirical estimator:

$$\hat\theta := \frac{1}{n} \sum_{i=1}^n \bigl|\hat\rho(X_i, M_i) - \hat\varphi(M_i)\bigr|.$$

If there is no direct effect, the statistic $\hat\theta$ is expected to be close to zero, assuming that the regularity conditions for the consistency of the Nadaraya-Watson method are satisfied.
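
As a rough illustration of the estimators above, the following Python sketch computes the two Nadaraya-Watson estimates and the resulting statistic with a Gaussian kernel and the bandwidths $n^{-1/6}$ and $n^{-1/5}$. It is only a minimal sketch under stated assumptions: the function names (`k_h`, `phi_hat`, `rho_hat`, `theta_hat`) and the toy data are hypothetical, the variables are assumed to be on a comparable scale so that unscaled bandwidths make sense, and the statistic is taken as a mean absolute difference.

```python
import math
import random


def k_h(u, h):
    """Scaled Gaussian kernel K_h(u) = K(u / h) / h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def phi_hat(m, M, Y, h):
    """Nadaraya-Watson estimate of phi(m) = E(Y | M = m)."""
    w = [k_h(Mi - m, h) for Mi in M]
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)


def rho_hat(x, m, X, M, Y, h):
    """Nadaraya-Watson estimate of rho(x, m) = E(Y | X = x, M = m)."""
    w = [k_h(Xi - x, h) * k_h(Mi - m, h) for Xi, Mi in zip(X, M)]
    return sum(wi * yi for wi, yi in zip(w, Y)) / sum(w)


def theta_hat(X, M, Y):
    """Mean absolute gap between rho_hat and phi_hat at the sample points."""
    n = len(Y)
    h_rho, h_phi = n ** (-1.0 / 6.0), n ** (-1.0 / 5.0)
    gaps = [abs(rho_hat(x, m, X, M, Y, h_rho) - phi_hat(m, M, Y, h_phi))
            for x, m in zip(X, M)]
    return sum(gaps) / n


# Toy data (hypothetical): M depends on X, as the model requires.
rng = random.Random(0)
n = 200
X = [rng.gauss(0.0, 1.0) for _ in range(n)]
M = [0.5 * x + rng.gauss(0.0, 1.0) for x in X]
# No direct effect: Y depends on M only.
Y_null = [1 if rng.random() < 1.0 / (1.0 + math.exp(-m)) else 0 for m in M]
# Strong direct effect: Y is a deterministic function of X.
Y_direct = [1 if x > 0 else 0 for x in X]

theta_null = theta_hat(X, M, Y_null)
theta_direct = theta_hat(X, M, Y_direct)
print(theta_null, theta_direct)
```

On data of this kind the statistic should be visibly larger when Y depends on X directly than when it depends on M only, but its raw value is not interpretable without a reference distribution under the null hypothesis.
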
To build the test for the absence of a direct effect, we investigate the distribution of $\hat\theta$ under the null hypothesis. This problem is not easily tractable analytically, even asymptotically. However, for binary outcomes Y, a bootstrap procedure can be used to approximate this distribution, as follows.

1. We draw B samples $(X_i^{(b)}, M_i^{(b)})$, b = 1, ..., B, of size n with replacement from the original sample $(X_i, M_i)$, i = 1, ..., n.

2. For each b, we generate a Bernoulli variable $Y_i^{(b)}$ with probability $\hat\varphi(M_i^{(b)})$ for i = 1, ..., n. This aims to approximate the distribution of Y conditionally on M, X, which only depends on M under the null hypothesis.

3. We compute the statistic $\hat\theta_b$ on each bootstrap sample b = 1, ..., B.

Let t denote the value of $\hat\theta$ on the observed sample. The p-value of the test is then obtained as the proportion of bootstrap statistics $\hat\theta_1, \dots, \hat\theta_B$ that exceed t:

$$\text{p-value} = \frac{1}{B} \sum_{b=1}^{B} 1\{\hat\theta_b > t\}.$$

Since the absence of direct effect implies that $\hat\theta$ must be close to zero, the null hypothesis is rejected at a significance level α ∈ (0, 1) if p-value < α.

Remark 3.1. The bootstrap procedure is effective in this situation because we are able to approximate the distribution of $Y_i^{(b)}$ conditionally on $M_i^{(b)}$ under the null hypothesis. In the binary case, this is easily achieved by generating $Y_i^{(b)}$ as a Bernoulli random variable with parameter $\hat\varphi(M_i^{(b)})$. This can be extended to more than two values by generating $Y_i^{(b)}$ from a multinomial distribution with probabilities $\hat\varphi_j(M_i^{(b)})$ estimated for each value j. In the continuous case, the bootstrap step requires the approximation of the distribution of the residual term $\varepsilon$ in the non-linear relation $Y = \varphi(M) + \varepsilon$. Both parametric (e.g. normality assumptions) and non-parametric approaches are possible, although they may have a non-negligible impact on the performance of the test.

4 Non-parametric test against data

4.1 Application to students' well-being

To motivate the non-parametric approach, we investigate the mediation of students' self-efficacy (SEF) M in the relation between well-being X and academic performance in mathematics and in French Y. A total of 244 students from the Nantes region (France) participated in the experiment. The variables are measured by a test instrument (i.e. a questionnaire):

Variable              Number of items   Likert scale   Score
Well-Being                                             Mean
SEF in mathematics                                     Mean
SEF in French                                          Mean

Table 1: Multiple-item testing instruments used.

The teachers evaluate the academic performance as above average (Y = 1) or not (Y = 0).
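
The bootstrap procedure of Section 3 applies directly to a binary outcome such as this one. The Python sketch below walks through its three steps and the p-value computation. It is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the toy data and the choice B = 100 are hypothetical, and the statistic is computed as a mean absolute difference with a Gaussian kernel and bandwidths $n^{-1/6}$ and $n^{-1/5}$.

```python
import math
import random


def k_h(u, h):
    """Scaled Gaussian kernel K_h(u) = K(u / h) / h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def nw(weights, Y):
    """Nadaraya-Watson ratio: kernel-weighted average of Y."""
    return sum(w * y for w, y in zip(weights, Y)) / sum(weights)


def phi_hat(m, M, Y, h):
    """Estimate of E(Y | M = m)."""
    return nw([k_h(Mi - m, h) for Mi in M], Y)


def theta_hat(X, M, Y):
    """Mean absolute gap between the estimates of E(Y|X,M) and E(Y|M)."""
    n = len(Y)
    h_rho, h_phi = n ** (-1.0 / 6.0), n ** (-1.0 / 5.0)
    total = 0.0
    for x, m in zip(X, M):
        w = [k_h(Xi - x, h_rho) * k_h(Mi - m, h_rho) for Xi, Mi in zip(X, M)]
        total += abs(nw(w, Y) - phi_hat(m, M, Y, h_phi))
    return total / n


def bootstrap_pvalue(X, M, Y, B=100, rng=None):
    """Bootstrap p-value for the absence of a direct effect (binary Y)."""
    rng = rng or random.Random()
    n = len(Y)
    t = theta_hat(X, M, Y)          # observed statistic
    h_phi = n ** (-1.0 / 5.0)
    exceed = 0
    for _ in range(B):
        # Step 1: resample the pairs (X_i, M_i) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        Xb = [X[i] for i in idx]
        Mb = [M[i] for i in idx]
        # Step 2: draw Y under the null, using phi_hat from the original sample.
        Yb = [1 if rng.random() < phi_hat(m, M, Y, h_phi) else 0 for m in Mb]
        # Step 3: recompute the statistic and compare it with t.
        if theta_hat(Xb, Mb, Yb) > t:
            exceed += 1
    return exceed / B


# Toy data (hypothetical) with a strong direct effect of X on Y.
rng = random.Random(0)
n = 60
X = [rng.gauss(0.0, 1.0) for _ in range(n)]
M = [0.5 * x + rng.gauss(0.0, 1.0) for x in X]
Y_direct = [1 if x > 0 else 0 for x in X]
p_direct = bootstrap_pvalue(X, M, Y_direct, B=100, rng=rng)
print(p_direct)
```

The cost grows as B times n squared, so a vectorized implementation would be preferable for larger samples; plain Python is used here only to keep the sketch free of dependencies.
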
We compare the results of our non-parametric test with both the Wald test and the LR test, which are commonly applied in this kind of psychological study.

              p_W    p_LR    p_NP
Mathematics
French
Mathematics
French

Table 2: Comparison of results on the real dataset (Table 1). The p-values p_W, p_LR, p_NP refer respectively to the Wald, LR and non-parametric tests.

The p-values of the parametric and non-parametric tests are similar in all cases. We may note, however, that the results are ambiguous in the first case, at the typical significance level α = .05, where the non-parametric test does not detect a direct effect. A more thorough analysis reveals that the linearity assumption on which the parametric tests rely is dubious. Indeed, the Box and Tidwell test, which measures the significance of the added variable X log(X) in the logistic model, gives a p-value of .03, thus indicating a non-linear dependence on X.

4.2 Simulated data

We generate data from three distinct models corresponding to three different forms of the conditional probability $\rho_\gamma(x, m) := P(Y = 1 \mid X = x, M = m)$, indexed by a coefficient γ that quantifies the importance of the direct effect, varying from γ = 0 (i.e. no direct effect) to γ = 1 (i.e. only direct effect). For each model, we generate N = 10,000 samples of sizes n = 20, 30, 50 and 100. The observations $X_i$, $M_i$ are selected randomly from the actual dataset of the previous section, and $Y_i$ follows a Bernoulli distribution with parameter $\rho_\gamma(X_i, M_i)$. The three different scenarios are described below.

1. The first scenario is the logistic regression model with ρ_γ(x, m) = exp ( 3 + 2γx + (1 − γ)m ). This is the theoretical framework of the LR and Wald tests, although both are based on asymptotic approximations.

2. The second scenario is generated from ρ_γ(x, m) := γ 0.5 1_{x>3} + (1 − γ) 1_{m>5}, where 1 stands for the indicator function. In this case, the relation between Y and X, M is non-linear but monotonic in both x and m, so that it is expected not to deviate too much from the linear framework.

3. For the third model, we take ρ_γ(x, m) = γ 1.72x x (1 − γ)0.1m, where x_+ = max(x, 0). Due to its non-monotonic behavior in x, this setup is unfavorable for the parametric tests while it is still covered by the non-parametric one.
The coefficients of the polynomial function in x are chosen so that $\rho_\gamma(X_i, M_i)$ remains in the interval [0, 1] for all possible values of $X_i$, $M_i$ and γ.

Simulations and computations were performed using R (R Core Team, 2016). For a fixed significance level α = 5%, Figures 2, 3 and 4 show the evolution of the empirical probability of rejecting the null hypothesis as a function of γ. The value γ = 0 corresponds to the empirical significance level and the values γ > 0 give the simulated power function. All these probabilities are estimated from 10,000 independent replications.

Table 3 displays the simulated probabilities of rejecting the null hypothesis. The simulated power of the three tests is low for the sample size n = 20 and significance levels α = .01, .05, .10. Moreover, the NP-test has the best empirical significance level in 75 percent of cases, the LR test in 6 percent and the Wald test in 19 percent. Lastly, the LR test is not conservative for n = 20, 30 and α = .01, .05, .10: it rejects the null hypothesis with probability higher than α when the null hypothesis is true.
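
The simulation design described above (resampling the pairs $(X_i, M_i)$, drawing $Y_i$ from a Bernoulli distribution, and estimating a rejection probability over independent replications) can be sketched in Python as follows. Everything specific in this sketch is an assumption made for illustration: the `pool` of pairs is a synthetic stand-in for the real dataset, the coefficients inside `rho` are illustrative and not the paper's values, and `pvalue_fn` is a placeholder for whichever test (Wald, LR or non-parametric) is being evaluated.

```python
import math
import random


def rho(x, m, gamma):
    """Hypothetical success probability P(Y = 1 | X = x, M = m).

    Logistic link with illustrative coefficients: gamma = 0 depends on m
    only (no direct effect), gamma = 1 depends on x only.
    """
    lin = -3.0 + 2.0 * gamma * x + (1.0 - gamma) * m
    return 1.0 / (1.0 + math.exp(-lin))


def simulate_sample(pool, n, gamma, rng):
    """Draw n pairs (X_i, M_i) from the pool, then Y_i ~ Bernoulli(rho)."""
    pairs = [rng.choice(pool) for _ in range(n)]
    X = [x for x, _ in pairs]
    M = [m for _, m in pairs]
    Y = [1 if rng.random() < rho(x, m, gamma) else 0 for x, m in pairs]
    return X, M, Y


def rejection_rate(pvalue_fn, pool, n, gamma, alpha, n_rep, rng):
    """Empirical probability of rejecting H0 over n_rep replications.

    With gamma = 0 this estimates the significance level; with gamma > 0
    it estimates the power.
    """
    rejected = 0
    for _ in range(n_rep):
        X, M, Y = simulate_sample(pool, n, gamma, rng)
        if pvalue_fn(X, M, Y) < alpha:
            rejected += 1
    return rejected / n_rep


rng = random.Random(1)
# Synthetic stand-in for the observed pairs; M is made dependent on X.
pool = []
for _ in range(244):
    x = rng.uniform(1.0, 5.0)
    pool.append((x, 0.5 * x + rng.uniform(0.5, 2.5)))

X, M, Y = simulate_sample(pool, n=50, gamma=0.5, rng=rng)
print(sum(Y), "successes out of", len(Y))
```

A call such as `rejection_rate(my_test, pool, 50, 0.0, 0.05, 10000, rng)`, where `my_test` is a hypothetical function returning a p-value, would then estimate the empirical significance level at α = .05.
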

For the sample sizes n = 50, 100, in agreement with the published literature (see e.g. Agresti, 2006), the Wald and LR tests perform similarly in the three examples. The non-monotonic framework highlights the added value of the non-parametric method. Indeed, for all significance levels α = .01, .05, .10, the NP-test has a high simulated power as soon as n ≥ 30. In this case, neither parametric test is appropriate, even for large samples (e.g. for γ = 1 and α = .05, the Wald test and the LR test reject the null hypothesis in only 31 percent and 35 percent of cases respectively, while the NP-test always rejects it).

Figure 2: Comparison of the empirical significance level (γ = 0) and of the empirical power (γ > 0) in the logistic model with significance level α = .05. The parametric tests outperform the NP-test in the linear setup for large sample sizes (n = 50, n = 100), as one would anticipate. However, the results show that the NP-test has a better significance level than the Wald and LR tests for small samples (n = 20, 30, 50). Furthermore, the Wald test is the worst test for the sample sizes n = 20, n = 30.

Figure 3: Comparison of the empirical significance level and of the empirical power in the non-linear monotonic case with significance level α = .05. In the non-linear monotonic case, the power of the tests is similar, as one would anticipate. As in the logistic framework, the power of all the tests is low for small sample sizes (n = 20, 30), but the empirical significance level of the NP-test is better than the others for all α ∈ {.01, .05, .10}. Conversely, in the non-monotonic case, even if the sample size and the deviation from the null hypothesis are large, the parametric tests rarely detect the direct effect of X on $p = P(Y = 1 \mid X, M)$.

Figure 4: Comparison of the empirical significance level and of the empirical power in the non-linear non-monotonic case with significance level α = .05. The power of the parametric tests is low for all sample sizes. On the contrary, the power of the NP-test increases with γ. Moreover, the empirical significance level is close to the theoretical value α = .05.

Table 3: Summary of the estimated significance levels and powers in the three scenarios (logistic, monotonic relationship, non-monotonic relationship). Columns: sample size n, γ, and the simulated powers π_W, π_LR, π_NP at each significance level α = .01, α = .05 and α = .10.

5 Acknowledgements

This research was funded by several grants from France's Ministère de l'Éducation Nationale et de l'Enseignement Supérieur, the Défenseur des Droits, and the Agence pour la Cohésion Sociale et l'Égalité des Chances.

References

Agresti, A. (2006). Multicategory Logit Models. John Wiley & Sons, Inc.

Baron, R. and Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6):1173.

Box, G. E. and Tidwell, P. W. (1962). Transformation of the independent variables. Technometrics, 4(4).

Harrell, Jr., F. E. (2006). Regression Modeling Strategies. Springer-Verlag, Berlin, Heidelberg.

Hauck, Jr., W. W. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72(360a).

Hayes, A. (2013). Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. Methodology in the Social Sciences Series. Guilford Publications.

Härdle, W., Werwatz, A., Müller, M. and Sperlich, S. (2004). Nonparametric and Semiparametric Models. Springer.

Imai, K. and Keele, L. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4).

MacKinnon, D. P. (2008). Introduction to Statistical Mediation Analysis. Routledge.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Schmader, T. and Johns, M. (2003). Converging evidence that stereotype threat reduces working memory capacity. Journal of Personality and Social Psychology, 85(3).

VanderWeele, T. J. and Vansteelandt, S. (2010). Odds ratios for mediation analysis for a dichotomous outcome. American Journal of Epidemiology, 172(12).
