Non-parametric Mediation Analysis for direct effect with categorial outcomes
JM GALHARRET, A. PHILIPPE, P ROCHET
July 3

1 Introduction

Within the human sciences, mediation designates a particular causal phenomenon where the effect of a variable X on another variable Y passes (partially or entirely) through a third variable M (see Baron and Kenny (1986)). The study of mediation is particularly popular in psychology, sociology and marketing, as it allows the detection of variables that may trigger specific human behaviors. In the mediation model, the total effect of X on Y is divided into the influence of X over Y in presence of M (the direct effect) and the part of this effect that reroutes through M (the indirect effect). For instance, Schmader and Johns (2003) have shown that a reduction in working memory capacity mediates the negative effect caused by a stereotype threat on women's mathematical performance. MacKinnon (2008) compares testing procedures regarding the indirect effect.

[Path diagram: X → M (a), M → Y (b), X → Y (γ)]
Figure 1: Summary of the relations between Y, X and M. The direct and indirect effects are defined by γ and ab respectively, according to MacKinnon (2008).

The main objective in the mediation model is to quantify the added effect of X on Y in presence of M. A natural first step in this direction is to detect the absence of a direct effect altogether, which would signify that X could (and should) be ignored when investigating Y. Detecting the direct effect is generally achieved via a statistical test on the significance of the coefficient γ in the model. If Y is a continuous variable, the mediation model typically follows a classical linear regression framework:

$$Y = \alpha + \gamma X + bM + \varepsilon,$$

where ε is a random error uncorrelated with X and M, with zero mean and finite variance. In this model, testing whether there is a direct effect can be achieved by a Student significance test on the coefficient
γ. A discrete analogue when Y is a categorical variable is given by the logistic regression model, in which the absence of a direct effect is tested via the likelihood ratio test, also called the LR test (see e.g. Agresti, 2006), or via the Wald test (see Hauck and Donner, 1977, for example). In such linear mediation models, the study of a direct effect is well understood in both the discrete and continuous cases. However, assuming linear relations between the variables implicitly reduces causality to a correlation issue, which can be unrealistic in some practical situations. If so, a more general model must be adopted in order to account for possible non-linear relations.

In this paper, we propose a more general definition of the direct effect in a mediation model that investigates the conditional dependence between the variables instead of focusing on the correlation. This definition conveys that no direct effect exists between X and Y if the conditional distribution of Y, given the variables X and M, is a function of M alone. In other words, the whole effect (linear or non-linear) of X on Y is entirely explained by M. Because the general mediation model encompasses the linear one, we argue that the absence of a direct effect should be detected by the non-parametric approach even if the linear assumptions hold. On the contrary, a linear mediation model may be ill-suited to a non-linear setting and fail to properly interpret the information in the data.

We present a non-parametric test procedure to test for the absence of a direct effect in the general mediation model (see Imai and Keele (2010)). The test statistics are obtained from kernel estimators of the densities (and conditional densities) of the variables of the model. Although the theoretical distribution of the test statistics under the null hypothesis is unknown, it is possible to estimate it by a bootstrap procedure, thus providing an approximation of the p-value.
A real data application to students' performance linked to well-being and self-efficacy is presented. We show that the conclusions regarding the existence of a direct effect may differ depending on whether the considered model is linear (in this case, the logistic regression model) or not. A comparative study of the two test procedures is carried out on simulated data in both a linear and a non-linear framework (for the parametric procedures, the Wald and LR tests were used). This study reveals that the logistic model may misread the causal effect in the data if the linearity assumption is not satisfied, and more particularly in the absence of a monotonic effect of X on Y. On the contrary, the performance of the non-parametric test procedure remains comparable to that of the parametric tests in the logistic regression setting. We also note that the comparison between the two parametric tests is in favor of the LR test in terms of power for small samples, in agreement with the published literature (see e.g. Harrell, 2006).

The paper is organized as follows. In Section 2 we describe the mathematical formalism behind the non-linear mediation model, whose definition relies on the joint distribution of the variables. We show that this model effectively generalizes the linear mediation model, in the sense that a direct effect in a linear scenario results in a direct effect in the general setting, while the converse may not be true. The extension of the significance test for a direct effect to the non-linear setting is then developed in Section 3. Finally, the test procedure is applied to numerical examples in Section 4, both on simulated and real data.

2 A non-linear mediation model

The absence of direct effect means that the influence of X over Y is canceled out in the presence of M. In mathematical terms, the absence of direct effect can be interpreted as the distribution of Y given (X, M) being equal to its distribution given M or equivalently, for all measurable sets A,

$$P(Y \in A \mid X, M) \overset{a.s.}{=} P(Y \in A \mid M) \tag{1}$$
where a.s. stands for almost surely. Arguably, this condition is the most general possible when it comes to formalizing the absence of direct effect in a mediation model.

Remark 2.1. In these circumstances, we assume implicitly that X and M are dependent variables, or else searching for a direct effect is meaningless. If X and M are independent, any actual effect of X over Y cannot be canceled in presence of M, although the above condition might still hold if Y and X are also independent. Thus, assuming that X and M are dependent rules out this trivial yet problematic situation.

Testing the equality of the two conditional distributions can be quite tedious in practice, especially for a continuous variable Y. However, for many statistical models this equality is equivalent to

$$(H_0): \quad E(Y \mid X, M) \overset{a.s.}{=} E(Y \mid M). \tag{2}$$

Both conditional expectations can be easily estimated in a non-parametric way by using the well-known kernel density estimators (see Härdle et al. (2004)). This allows us to construct test statistics in Section 3.

Remark 2.2. The null hypothesis is somewhat similar to the one considered in Hayes (2013), where the author proposes the condition

$$E(Y \mid X = x, M = m) = E(Y \mid X = x - 1, M = m)$$

as a non-linear characterization of the absence of direct effect. However, no test procedure is developed in the general case.

We will now describe some models in which the absence of direct effect can be reduced to $(H_0)$ as defined in (2).

Binary outcomes: If Y is a binary variable, then the conditional probability is given by $P(Y = 1 \mid X, M) = E(Y \mid X, M)$. Thus, the equivalence between (1) and (2) is immediate. The situation is slightly more complicated if Y is a categorical variable, for example with outcomes 1, ..., J. In this case, the general condition (1) reduces to the equality of the conditional probabilities

$$P(Y = j \mid X, M) \overset{a.s.}{=} P(Y = j \mid M), \quad j = 1, \dots, J,$$

which cannot be reduced to a condition on $E(Y \mid X, M)$ only.
To solve this issue, one may consider the vector $\mathbf{Y} = (1\{Y = 1\}, \dots, 1\{Y = J - 1\})$ as the variable of interest. The condition (1) is then equivalent to

$$(H_0'): \quad E(\mathbf{Y} \mid X, M) \overset{a.s.}{=} E(\mathbf{Y} \mid M). \tag{3}$$

Alternatively, this situation can be tackled by using multiple tests over the different outcomes, thus reducing to the binary case.

Non-parametric regression: The non-parametric regression model is of the form

$$Y = \rho(X, M) + \varepsilon,$$

where ε is a residual error independent of (X, M) and ρ is an unknown measurable function. In this model, the absence of direct effect can be investigated via ρ(X, M), which is generally more accessible than the conditional distribution. Condition (1) is equivalent to ρ(X, M) not depending on X. Indeed, the conditional distribution of Y given (X, M) is the distribution of ε translated by ρ(X, M). If Y is integrable, we obtain $\rho(X, M) = E(Y \mid X, M)$, and thus hypothesis (1) is equivalent to $(H_0)$ as defined in (2).
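As a small illustration of the indicator-vector encoding used for categorical outcomes above, the following sketch maps a label in {1, ..., J} to the vector of its first J - 1 indicators. The function name `indicator_vector` and its arguments are hypothetical and not taken from the paper; this is only one straightforward way to write the encoding.

```python
def indicator_vector(y, n_categories):
    """Encode a label y in {1, ..., J} as (1{y=1}, ..., 1{y=J-1}).

    The last category J is the reference level: it maps to the zero vector,
    so the J-1 indicators carry the full information about y.
    """
    if not 1 <= y <= n_categories:
        raise ValueError("label must lie in 1..n_categories")
    return [1 if y == j else 0 for j in range(1, n_categories)]


if __name__ == "__main__":
    # With J = 4 categories, each label becomes a vector of length 3.
    for label in (1, 2, 3, 4):
        print(label, indicator_vector(label, 4))
```

Each component of the encoded vector is a binary variable, so the binary-outcome reasoning can be applied component by component.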
Remark 2.3. For the logistic and linear models, which are the most widely used in the literature, we have the following parametric relation

$$E(Y \mid X, M) = \begin{cases} f(\alpha + bM + \gamma X) & \text{logistic model,} \\ \alpha + bM + \gamma X & \text{linear model,} \end{cases}$$

where f is a known function, typically the logistic function $f(t) = 1/(1 + e^{-t})$. In this parametric framework, the testing hypotheses become γ = 0 vs γ ≠ 0 (see VanderWeele and Vansteelandt (2010) for the logistic model). Our non-parametric approach described below makes no such assumption and avoids this restriction on the form of the conditional expectation.

3 The non-parametric test procedure

Hereafter we assume that Y is an integrable random variable and that (X, M) has a density on $\mathbb{R}^2$ with respect to the Lebesgue measure. Let $\rho(X, M) = E(Y \mid X, M)$ and $\varphi(M) = E(Y \mid M)$. Testing the null hypothesis $H_0$ boils down to checking whether the parameter

$$\theta := E\,|\rho(X, M) - \varphi(M)|$$

is zero. As a result, a simple test procedure can be constructed from a consistent estimator $\hat\theta$. The functions ρ and φ can be estimated by the standard Nadaraya-Watson method:

$$\hat\rho(x, m) := \frac{\sum_{i=1}^n Y_i K_h(X_i - x) K_h(M_i - m)}{\sum_{i=1}^n K_h(X_i - x) K_h(M_i - m)} \quad \text{and} \quad \hat\varphi(m) := \frac{\sum_{i=1}^n Y_i K_h(M_i - m)}{\sum_{i=1}^n K_h(M_i - m)}.$$

Here, K is a symmetric kernel and $K_h = K(\cdot/h)/h$, with h > 0 the bandwidth. For simplicity, we chose a Gaussian kernel with the theoretically optimal bandwidths ($h = n^{-1/6}$ for ρ and $h = n^{-1/5}$ for φ). This is sufficient to ensure the consistency of the kernel estimators under mild assumptions. As a matter of fact, calibrating the bandwidth adaptively in order to improve the estimation turns out to be unnecessary for our purposes, since we are mainly interested in the distribution of the test statistics. We then compute the empirical estimator:

$$\hat\theta := \frac{1}{n} \sum_{i=1}^n |\hat\rho(X_i, M_i) - \hat\varphi(M_i)|.$$

If there is no direct effect, the statistic $\hat\theta$ is expected to be close to zero, assuming that the regularity conditions for the consistency of the Nadaraya-Watson method are verified.
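The estimators above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the authors' implementation (the paper reports using R): the function names are hypothetical, the Gaussian kernel is left unnormalized because the constants and the 1/h factors cancel in the Nadaraya-Watson ratios, the bandwidths are applied to the raw (unstandardized) variables, and the statistic is taken as the mean absolute difference between the two fitted values.

```python
import math


def gauss(u):
    """Unnormalized Gaussian kernel; constants cancel in the ratios below."""
    return math.exp(-0.5 * u * u)


def nw_rho(x, m, xs, ms, ys, h):
    """Nadaraya-Watson estimate of E(Y | X = x, M = m), product kernel."""
    w = [gauss((xi - x) / h) * gauss((mi - m) / h) for xi, mi in zip(xs, ms)]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def nw_phi(m, ms, ys, h):
    """Nadaraya-Watson estimate of E(Y | M = m)."""
    w = [gauss((mi - m) / h) for mi in ms]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def theta_hat(xs, ms, ys):
    """Mean absolute gap between the two fits, evaluated at the sample points."""
    n = len(ys)
    h_rho = n ** (-1.0 / 6.0)  # bandwidth used for rho (two covariates)
    h_phi = n ** (-1.0 / 5.0)  # bandwidth used for phi (one covariate)
    gaps = [abs(nw_rho(x, m, xs, ms, ys, h_rho) - nw_phi(m, ms, ys, h_phi))
            for x, m in zip(xs, ms)]
    return sum(gaps) / n


if __name__ == "__main__":
    # Toy data in which Y is driven by X only, so the statistic is far from 0.
    xs = [0.0, 0.0, 0.0, 5.0, 5.0, 5.0]
    ms = [0.0, 1.0, 2.0, 0.0, 1.0, 2.0]
    ys = [0, 0, 0, 1, 1, 1]
    print(theta_hat(xs, ms, ys))
```

Because the estimators are only evaluated at observed points, every denominator contains the weight of the point itself and cannot vanish; evaluating far from the data would require a guard against zero weights.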
To build the test for the absence of a direct effect, we investigate the distribution of $\hat\theta$ under the null hypothesis. This problem is not easily tractable analytically, even asymptotically. However, with binary outcomes Y, we show that this distribution can be approximated by a bootstrap. The distribution of $\hat\theta$ is estimated by a bootstrap procedure as follows.

1. We draw B samples $(X_i^{(b)}, M_i^{(b)})$, b = 1, ..., B, of size n with replacement from the original sample $(X_i, M_i)$, i = 1, ..., n.

2. For each b, we generate a Bernoulli variable $Y_i^{(b)}$ with probability $\hat\varphi(M_i^{(b)})$ for i = 1, ..., n. This aims to approximate the distribution of Y conditionally on (M, X), which only depends on M under the null hypothesis.
3. We compute the statistics $\hat\theta_b$ over all bootstrap samples b = 1, ..., B.

Let t denote the value of $\hat\theta$ on the observed sample. The p-value of the test is then obtained as the proportion of bootstrap statistics $\hat\theta_1, \dots, \hat\theta_B$ exceeding t:

$$\text{p-value} = \frac{1}{B} \sum_{b=1}^B 1\{\hat\theta_b > t\}.$$

Since the absence of direct effect conveys that θ must be close to zero, the null hypothesis is rejected at a significance level α ∈ (0, 1) if p-value < α.

Remark 3.1. The bootstrap procedure is effective in this situation because we are able to approximate the distribution of $Y_i^{(b)}$ conditionally on $M_i^{(b)}$ under the null hypothesis. In the binary case, this is easily achieved by generating $Y_i^{(b)}$ as a Bernoulli random variable with parameter $\hat\varphi(M_i^{(b)})$. This can be extended to more than two values by generating $Y_i^{(b)}$ from a multinomial distribution with probabilities $\hat\varphi_j(M_i^{(b)})$ estimated for each value j. In the continuous case, the bootstrap step requires the approximation of the distribution of the residual term ε in the non-linear relation Y = φ(M) + ε. Both parametric (e.g. normality assumptions) and non-parametric approaches are possible, although they may have a non-negligible impact on the performance of the test.

4 Non-parametric test against data

4.1 Application to students' well-being

To motivate the non-parametric approach, we investigate the mediation of students' self-efficacy (SEF) M in the relation between well-being X and academic performance in mathematics and in French Y. 244 students from the Nantes region (France) participated in the experiment. The variables are measured by a test instrument (i.e. a questionnaire):

Variable | Number of items | Likert scale | Score
Well-being | | | Mean
SEF in mathematics | | | Mean
SEF in French | | | Mean

Table 1: Multiple-item testing instruments used.

The teachers evaluate the academic performance as above average (Y = 1) or not (Y = 0).
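For a binary outcome such as this one, the three bootstrap steps of Section 3 and the resulting p-value can be sketched as follows. This is an illustrative reading of the procedure, not the authors' code: the function names, the default number of bootstrap samples and the seed handling are assumptions, the statistic is taken as the mean absolute gap between the two Nadaraya-Watson fits, and the Bernoulli probabilities are computed from the estimate of φ fitted on the original sample.

```python
import math
import random


def gauss(u):
    """Unnormalized Gaussian kernel; constants cancel in the ratios below."""
    return math.exp(-0.5 * u * u)


def nw_phi(m, ms, ys, h):
    """Nadaraya-Watson estimate of E(Y | M = m)."""
    w = [gauss((mi - m) / h) for mi in ms]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def nw_rho(x, m, xs, ms, ys, h):
    """Nadaraya-Watson estimate of E(Y | X = x, M = m), product kernel."""
    w = [gauss((xi - x) / h) * gauss((mi - m) / h) for xi, mi in zip(xs, ms)]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def theta_hat(xs, ms, ys):
    """Mean absolute gap between the two fits at the sample points."""
    n = len(ys)
    h_rho, h_phi = n ** (-1.0 / 6.0), n ** (-1.0 / 5.0)
    return sum(abs(nw_rho(x, m, xs, ms, ys, h_rho) - nw_phi(m, ms, ys, h_phi))
               for x, m in zip(xs, ms)) / n


def bootstrap_p_value(xs, ms, ys, n_boot=200, seed=0):
    """Approximate the p-value of the test of no direct effect (binary Y)."""
    rng = random.Random(seed)
    n = len(ys)
    t = theta_hat(xs, ms, ys)            # observed statistic
    h_phi = n ** (-1.0 / 5.0)
    exceed = 0
    for _ in range(n_boot):
        # Step 1: resample the pairs (X, M) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        xb = [xs[i] for i in idx]
        mb = [ms[i] for i in idx]
        # Step 2: draw Y under the null, using phi fitted on the original data.
        yb = [1 if rng.random() < nw_phi(m, ms, ys, h_phi) else 0 for m in mb]
        # Step 3: recompute the statistic on the bootstrap sample.
        if theta_hat(xb, mb, yb) > t:
            exceed += 1
    return exceed / n_boot


if __name__ == "__main__":
    # Toy data: Y is determined by X alone, M carries no information.
    xs = [0.0] * 20 + [5.0] * 20
    ms = [float(i % 4) for i in range(40)]
    ys = [0] * 20 + [1] * 20
    print(bootstrap_p_value(xs, ms, ys, n_boot=100, seed=1))
```

The cost grows like B times n squared, which is manageable for the sample sizes considered here but would call for vectorized code on larger datasets.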
We compare the results of our non-parametric test with both the Wald test and the LR test, which are commonly applied in this kind of psychological study.

 | p_W | p_LR | p_NP
Mathematics | | |
French | | |
Mathematics | | |
French | | |

Table 2: Comparison of results on the real dataset (Table 1). The p-values p_W, p_LR and p_NP refer respectively to the Wald, LR and non-parametric tests.
The p-values of the parametric and non-parametric tests are similar in all cases. We may note that the results are ambiguous in the first case, at the typical significance level α = .05, where the non-parametric test does not detect a direct effect. A more thorough analysis reveals that the linearity assumption on which the parametric tests rely is dubious. Indeed, the Box and Tidwell test, which measures the significance of the added variable X log(X) in the logistic model, gives a p-value of .03, thus indicating a non-linear dependence in X.

4.2 Simulated data

We generate data from three distinct models corresponding to three different forms of the conditional probability $\rho_\gamma(x, m) := P(Y = 1 \mid X = x, M = m)$, indexed by a coefficient γ that quantifies the importance of the direct effect, varying from γ = 0 (i.e. no direct effect) to γ = 1 (i.e. only direct effect). For each model, we generate N = 10,000 samples of sizes n = 20, 30, 50 and 100. The observations $X_i, M_i$ are selected randomly from the actual dataset of the previous section, and $Y_i$ follows a Bernoulli distribution with parameter $\rho_\gamma(X_i, M_i)$. The three different scenarios are described below.

1. The first scenario is the logistic regression model with ρ_γ(x, m) = exp ( 3 + 2γx + (1 − γ)m ). This is the theoretical framework of the LR and Wald tests, although both are based on asymptotic approximations.

2. The second scenario is generated from ρ_γ(x, m) := γ0.5 1 x>3 + (1 − γ) m>5, where 1 stands for the indicator function. In this case, the relation between Y and (X, M) is non-linear but monotonic in both x and m, so that it is expected not to deviate too much from the linear framework.

3. For the third model, we take ρ_γ(x, m) = γ 1.72x x (1 − γ)0.1m, where x_+ = max(x, 0). Due to its non-monotonic behavior in x, this setup is unfavorable for the parametric tests while it is still covered by the non-parametric one.
The coefficients of the polynomial function in x are chosen so that $\rho_\gamma(X_i, M_i)$ remains in the interval [0, 1] for all possible values of $X_i$, $M_i$ and γ. Simulations and computations were performed using R (R Core Team, 2016).

With the significance level fixed at α = 5%, Figures 2 to 4 show the evolution of the empirical probability of rejecting the null hypothesis as a function of γ. The value γ = 0 corresponds to the empirical significance level and the values γ > 0 give the simulated power function. All these probabilities are estimated from 10,000 independent replications.

Table 3 displays the simulated probabilities of rejecting the null hypothesis. The simulated power of the three tests is low for the sample size n = 20 at the significance levels α = .01, .05 and .10. Moreover, the NP test has the best empirical significance level in 75 percent of cases, the LR test in 6 percent and the Wald test in 19 percent. Lastly, the LR test is not conservative for n = 20, 30 and α = .01, .05, .10: it rejects the null hypothesis with probability higher than α when the null hypothesis is true.
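The empirical rejection probabilities discussed here are simple proportions over the replications, so their Monte Carlo uncertainty is easy to quantify. The sketch below is an illustration under assumptions: the function name and interface are hypothetical, and the standard error uses the usual binomial formula, which the paper does not report. With 10,000 replications and a true level of .05, this standard error is about 0.0022.

```python
import math


def rejection_rate(p_values, alpha):
    """Share of replications with p-value < alpha, and its binomial std. error."""
    n_rep = len(p_values)
    rate = sum(1 for p in p_values if p < alpha) / n_rep
    std_error = math.sqrt(rate * (1.0 - rate) / n_rep)
    return rate, std_error


if __name__ == "__main__":
    # Four hypothetical p-values, two of them below the 5% level.
    print(rejection_rate([0.01, 0.20, 0.03, 0.50], 0.05))
    # Monte Carlo standard error at a true level of .05 with 10,000 replications.
    print(math.sqrt(0.05 * 0.95 / 10000))
```

Applied to p-values simulated under γ = 0, the rate estimates the empirical significance level; under γ > 0 it estimates the power.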
For the sample sizes n = 50 and 100, in agreement with the published literature (see e.g. Agresti, 2006), the Wald and LR tests perform similarly in the three examples. The non-monotonic framework highlights the added value of the non-parametric method. Indeed, for all significance levels α = .01, .05, .10, the NP test has a high simulated power as soon as n ≥ 30. In this case, both parametric tests are inappropriate even for large samples (e.g. for γ = 1 and α = .05, the Wald test and the LR test reject the null hypothesis in only 31 percent and 35 percent of cases respectively, while the NP test always rejects it).

Figure 2: Comparison of the empirical significance level (γ = 0) and of the empirical power (γ > 0) in the logistic model with significance level α = .05. The NP-test outperforms the parametric tests in a linear setup, as one would anticipate for large sample sizes (n = 50, n = 100). However, results show that the NP test has a better significance level than the Wald and LR tests for small samples (n = 20, 30, 50). Furthermore, the Wald test is the worst test for the sample sizes n = 20 and n = 30.
Figure 3: Comparison of the empirical significance level and of the empirical power in the non-linear monotonic case with significance level α = .05. In the non-linear monotonic case, the power of the tests is similar, as one would anticipate. As in the logistic framework, the power of all the tests is low for small sample sizes (n = 20, 30), but the empirical significance level of the NP test is better than the others for all α ∈ {.01, .05, .10}. Conversely, even if the sample size and the deviation from the null hypothesis are large, the parametric tests have rarely detected the direct effect of X on p = P(Y = 1 | X, M).
Figure 4: Comparison of the empirical significance level and of the empirical power in the non-linear non-monotonic case with significance level α = .05. The power of the parametric tests is low for all sample sizes. On the contrary, the power of the NP test increases with γ. Moreover, the empirical significance level is close to the theoretical value α = .05.
Table 3: Summary of the estimated significance levels and powers in the three scenarios. Columns: sample size n, γ, and the simulated powers π_W, π_LR, π_NP at α = .01, α = .05 and α = .10. Row groups: logistic, monotonic relationship, non-monotonic relationship.
5 Acknowledgements

This research was funded by several grants from France's Ministère de l'Éducation Nationale et de l'Enseignement Supérieur, the Défenseur des Droits, and the Agence pour la Cohésion Sociale et l'Égalité des Chances.

References

Agresti, A. (2006). Multicategory Logit Models. John Wiley & Sons, Inc.

Baron, R. and Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6):1173.

Box, G. E. and Tidwell, P. W. (1962). Transformation of the independent variables. Technometrics, 4(4).

Harrell, Jr., F. E. (2006). Regression Modeling Strategies. Springer-Verlag, Berlin, Heidelberg.

Hauck, Jr., W. W. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72(360a).

Hayes, A. (2013). Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. Methodology in the Social Sciences Series. Guilford Publications.

Imai, K. and Keele, L. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4).

MacKinnon, D. P. (2008). Introduction to Statistical Mediation Analysis. Routledge.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Schmader, T. and Johns, M. (2003). Converging evidence that stereotype threat reduces working memory capacity. Journal of Personality and Social Psychology, 85(3).

VanderWeele, T. J. and Vansteelandt, S. (2010). Odds ratios for mediation analysis for a dichotomous outcome. American Journal of Epidemiology, 172(12).

Härdle, W., Werwatz, A., Müller, M. and Sperlich, S. (2004). Nonparametric and Semiparametric Models. Springer.
More informationHarvard University. Rigorous Research in Engineering Education
Statistical Inference Kari Lock Harvard University Department of Statistics Rigorous Research in Engineering Education 12/3/09 Statistical Inference You have a sample and want to use the data collected
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More informationHow likely is Simpson s paradox in path models?
How likely is Simpson s paradox in path models? Ned Kock Full reference: Kock, N. (2015). How likely is Simpson s paradox in path models? International Journal of e- Collaboration, 11(1), 1-7. Abstract
More informationStatistical Analysis of the Item Count Technique
Statistical Analysis of the Item Count Technique Kosuke Imai Department of Politics Princeton University Joint work with Graeme Blair May 4, 2011 Kosuke Imai (Princeton) Item Count Technique UCI (Statistics)
More information36-720: The Rasch Model
36-720: The Rasch Model Brian Junker October 15, 2007 Multivariate Binary Response Data Rasch Model Rasch Marginal Likelihood as a GLMM Rasch Marginal Likelihood as a Log-Linear Model Example For more
More informationGeneralized Linear Models for Non-Normal Data
Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture
More informationMarginal Screening and Post-Selection Inference
Marginal Screening and Post-Selection Inference Ian McKeague August 13, 2017 Ian McKeague (Columbia University) Marginal Screening August 13, 2017 1 / 29 Outline 1 Background on Marginal Screening 2 2
More informationTests for the Odds Ratio in Logistic Regression with One Binary X (Wald Test)
Chapter 861 Tests for the Odds Ratio in Logistic Regression with One Binary X (Wald Test) Introduction Logistic regression expresses the relationship between a binary response variable and one or more
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationEstimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?
Journal of Modern Applied Statistical Methods Volume 10 Issue Article 13 11-1-011 Estimation and Hypothesis Testing in LAV Regression with Autocorrelated Errors: Is Correction for Autocorrelation Helpful?
More informationLecture 2: Basic Concepts of Statistical Decision Theory
EE378A Statistical Signal Processing Lecture 2-03/31/2016 Lecture 2: Basic Concepts of Statistical Decision Theory Lecturer: Jiantao Jiao, Tsachy Weissman Scribe: John Miller and Aran Nayebi In this lecture
More informationStatistical Methods for Causal Mediation Analysis
Statistical Methods for Causal Mediation Analysis The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters. Citation Accessed Citable
More informationMeasures of Association and Variance Estimation
Measures of Association and Variance Estimation Dipankar Bandyopadhyay, Ph.D. Department of Biostatistics, Virginia Commonwealth University D. Bandyopadhyay (VCU) BIOS 625: Categorical Data & GLM 1 / 35
More informationLecture 21: October 19
36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 21: October 19 21.1 Likelihood Ratio Test (LRT) To test composite versus composite hypotheses the general method is to use
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationStat 5101 Lecture Notes
Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random
More informationA Model for Correlated Paired Comparison Data
Working Paper Series, N. 15, December 2010 A Model for Correlated Paired Comparison Data Manuela Cattelan Department of Statistical Sciences University of Padua Italy Cristiano Varin Department of Statistics
More informationCORRELATION, ASSOCIATION, CAUSATION, AND GRANGER CAUSATION IN ACCOUNTING RESEARCH
CORRELATION, ASSOCIATION, CAUSATION, AND GRANGER CAUSATION IN ACCOUNTING RESEARCH Alireza Dorestani, Northeastern Illinois University Sara Aliabadi, Northeastern Illinois University ABSTRACT In this paper
More informationCausal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies
Causal Inference Lecture Notes: Causal Inference with Repeated Measures in Observational Studies Kosuke Imai Department of Politics Princeton University November 13, 2013 So far, we have essentially assumed
More informationStatistical Analysis of List Experiments
Statistical Analysis of List Experiments Graeme Blair Kosuke Imai Princeton University December 17, 2010 Blair and Imai (Princeton) List Experiments Political Methodology Seminar 1 / 32 Motivation Surveys
More informationInterpreting and using heterogeneous choice & generalized ordered logit models
Interpreting and using heterogeneous choice & generalized ordered logit models Richard Williams Department of Sociology University of Notre Dame July 2006 http://www.nd.edu/~rwilliam/ The gologit/gologit2
More informationA nonparametric test for seasonal unit roots
Robert M. Kunst robert.kunst@univie.ac.at University of Vienna and Institute for Advanced Studies Vienna To be presented in Innsbruck November 7, 2007 Abstract We consider a nonparametric test for the
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationClinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.
Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,
More informationEstimating direct effects in cohort and case-control studies
Estimating direct effects in cohort and case-control studies, Ghent University Direct effects Introduction Motivation The problem of standard approaches Controlled direct effect models In many research
More informationIdentification, Inference, and Sensitivity Analysis for Causal Mediation Effects
Identification, Inference, and Sensitivity Analysis for Causal Mediation Effects Kosuke Imai Luke Keele Teppei Yamamoto First Draft: November 4, 2008 This Draft: January 15, 2009 Abstract Causal mediation
More informationLECTURE 5. Introduction to Econometrics. Hypothesis testing
LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will
More informationModeling Mediation: Causes, Markers, and Mechanisms
Modeling Mediation: Causes, Markers, and Mechanisms Stephen W. Raudenbush University of Chicago Address at the Society for Resesarch on Educational Effectiveness,Washington, DC, March 3, 2011. Many thanks
More informationBootstrapping Sensitivity Analysis
Bootstrapping Sensitivity Analysis Qingyuan Zhao Department of Statistics, The Wharton School University of Pennsylvania May 23, 2018 @ ACIC Based on: Qingyuan Zhao, Dylan S. Small, and Bhaswar B. Bhattacharya.
More informationLecture 12: Effect modification, and confounding in logistic regression
Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression
More informationNew developments in structural equation modeling
New developments in structural equation modeling Rex B Kline Concordia University Montréal Set B: Mediation A UNL Methodology Workshop A2 Topics o Mediation: Design requirements Conditional process modeling
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationUQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables
UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1
More informationUnpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies
Unpacking the Black-Box: Learning about Causal Mechanisms from Experimental and Observational Studies Kosuke Imai Princeton University Joint work with Keele (Ohio State), Tingley (Harvard), Yamamoto (Princeton)
More informationLogistic Regression: Regression with a Binary Dependent Variable
Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression
More informationChapter 1 Statistical Inference
Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations
More informationUSE OF R ENVIRONMENT FOR A FINDING PARAMETERS NONLINEAR REGRESSION MODELS
Session 1. Statistic Methods and Their Applications Proceedings of the 11th International Conference Reliability and Statistics in Transportation and Communication (RelStat 11), 19 22 October 2011, Riga,
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationSTA441: Spring Multiple Regression. This slide show is a free open source document. See the last slide for copyright information.
STA441: Spring 2018 Multiple Regression This slide show is a free open source document. See the last slide for copyright information. 1 Least Squares Plane 2 Statistical MODEL There are p-1 explanatory
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationSC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM)
SC705: Advanced Statistics Instructor: Natasha Sarkisian Class notes: Introduction to Structural Equation Modeling (SEM) SEM is a family of statistical techniques which builds upon multiple regression,
More informationCovariate Balancing Propensity Score for General Treatment Regimes
Covariate Balancing Propensity Score for General Treatment Regimes Kosuke Imai Princeton University October 14, 2014 Talk at the Department of Psychiatry, Columbia University Joint work with Christian
More informationDiscussion of Papers on the Extensions of Propensity Score
Discussion of Papers on the Extensions of Propensity Score Kosuke Imai Princeton University August 3, 2010 Kosuke Imai (Princeton) Generalized Propensity Score 2010 JSM (Vancouver) 1 / 11 The Theme and
More informationMediation for the 21st Century
Mediation for the 21st Century Ross Boylan ross@biostat.ucsf.edu Center for Aids Prevention Studies and Division of Biostatistics University of California, San Francisco Mediation for the 21st Century
More informationECON 5350 Class Notes Functional Form and Structural Change
ECON 5350 Class Notes Functional Form and Structural Change 1 Introduction Although OLS is considered a linear estimator, it does not mean that the relationship between Y and X needs to be linear. In this
More informationData Integration for Big Data Analysis for finite population inference
for Big Data Analysis for finite population inference Jae-kwang Kim ISU January 23, 2018 1 / 36 What is big data? 2 / 36 Data do not speak for themselves Knowledge Reproducibility Information Intepretation
More informationOn the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit
On the Use of Nonparametric ICC Estimation Techniques For Checking Parametric Model Fit March 27, 2004 Young-Sun Lee Teachers College, Columbia University James A.Wollack University of Wisconsin Madison
More informationTesting an Autoregressive Structure in Binary Time Series Models
ömmföäflsäafaäsflassflassflas ffffffffffffffffffffffffffffffffffff Discussion Papers Testing an Autoregressive Structure in Binary Time Series Models Henri Nyberg University of Helsinki and HECER Discussion
More informationStatistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018
Statistics Boot Camp Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018 March 21, 2018 Outline of boot camp Summarizing and simplifying data Point and interval estimation Foundations of statistical
More informationDefect Detection using Nonparametric Regression
Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare
More informationScore test for random changepoint in a mixed model
Score test for random changepoint in a mixed model Corentin Segalas and Hélène Jacqmin-Gadda INSERM U1219, Biostatistics team, Bordeaux GDR Statistiques et Santé October 6, 2017 Biostatistics 1 / 27 Introduction
More informationSample size determination for logistic regression: A simulation study
Sample size determination for logistic regression: A simulation study Stephen Bush School of Mathematical Sciences, University of Technology Sydney, PO Box 123 Broadway NSW 2007, Australia Abstract This
More informationNon-parametric Inference and Resampling
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update 3. Juni 2013) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend
More information