Non-parametric Mediation Analysis for direct effect with categorial outcomes


JM GALHARRET, A. PHILIPPE, P. ROCHET

July 3,

1 Introduction

Within the human sciences, mediation designates a particular causal phenomenon where the effect of a variable X on another variable Y passes (partially or entirely) through a third variable M (see Baron and Kenny (1986)). The study of mediation is particularly popular in psychology, sociology and marketing, as it allows the detection of variables that may trigger specific human behaviors. In the mediation model, the total effect of X on Y is divided into the influence of X on Y in the presence of M (the direct effect) and the part of this effect that is routed through M (the indirect effect). For instance, Schmader and Johns (2003) have shown that a reduction in working memory capacity mediates the negative effect of stereotype threat on women's mathematical performance. MacKinnon (2008) compares testing procedures for the indirect effect.

[Figure 1: path diagram with arrows X → M (coefficient a), M → Y (coefficient b) and X → Y (coefficient γ).]

Figure 1: Summary of the relations between Y, X, M. The direct and indirect effects are defined by γ and ab respectively, according to MacKinnon (2008).

The main objective in the mediation model is to quantify the added effect of X on Y in the presence of M. A natural first step in this direction is to detect the absence of a direct effect altogether, which would signify that X could (and should) be ignored when investigating Y. Detecting the direct effect is generally achieved via a statistical test on the significance of the coefficient γ in the model. If Y is a continuous variable, the mediation model typically follows a classical linear regression framework:

$$Y = \alpha + \gamma X + bM + \varepsilon,$$

where $\varepsilon$ is a random error uncorrelated with X and M, with zero mean and finite variance. In this model, testing whether there is a direct effect can be achieved by a Student's t-test on the coefficient γ. A discrete analogue when Y is a categorical variable is given by the logistic regression model, in which the absence of a direct effect is tested via the likelihood ratio test, also called the LR test (see e.g. Agresti, 2006), or via the Wald test (see Hauck and Donner, 1977, for example).

In such linear mediation models, the study of a direct effect is well understood in both the discrete and continuous cases. However, assuming linear relations between the variables implicitly reduces causality to a correlation issue, which can be unrealistic in some practical situations. If so, a more general model must be adopted in order to account for possible non-linear relations.

In this paper, we propose a more general definition of the direct effect in a mediation model that investigates the conditional dependence between the variables instead of focusing on the correlation. Under this definition, no direct effect exists between X and Y if the conditional distribution of Y, given the variables X and M, is a function of M alone. In other words, the whole effect (linear or non-linear) of X on Y is entirely explained by M. Because the general mediation model encompasses the linear one, we argue that the absence of a direct effect should be detected by the non-parametric approach even if the linear assumptions hold. Conversely, a linear mediation model may be ill-suited to a non-linear setting and fail to properly interpret the information in the data.

We present a non-parametric test procedure to infer the absence of a direct effect in the general mediation model (see Imai and Keele (2010)). The test statistics are obtained from kernel estimators of the densities (and conditional densities) of the variables of the model. Although the theoretical distribution of the test statistics under the null hypothesis is unknown, it is possible to estimate it by a bootstrap procedure, thus providing an approximation of the p-value.
A real data application to students' performance linked to well-being and self-efficacy is presented. We show that the conclusions regarding the existence of a direct effect may differ, depending on whether the considered model is linear (in this case, the logistic regression model) or not. A comparative study of the two test procedures is carried out on simulated data in both a linear and a non-linear framework (for the parametric procedures, the Wald and LR tests were used). This study reveals that the logistic model may misread the causal effect in the data if the linearity assumption is not satisfied, particularly when the effect of X on Y is not monotonic. In contrast, the performance of the non-linear test procedure remains comparable to that of the parametric tests in the logistic regression setting. We also note that the comparison between the two parametric tests favors the LR test in terms of power for small samples, in agreement with the published literature (see e.g. Harrell, 2006).

The paper is organized as follows. In Section 2 we describe the mathematical formalism behind the non-linear mediation model, whose definition relies on the joint distribution of the variables. We show that this model effectively generalizes the linear mediation model, in the sense that a direct effect in a linear scenario results in a direct effect in the general setting, while the converse may not be true. The extension of the significance test for a direct effect to the non-linear setting is then developed in Section 3. Finally, the test procedure is applied to numerical examples in Section 4, both on simulated and real data.

2 A non-linear mediation model

The absence of direct effect means that the influence of X on Y is canceled out in the presence of M. In mathematical terms, the absence of direct effect can be interpreted as the distribution of Y given X, M being equal to its distribution given M or, equivalently, for all measurable sets A,

$$P(Y \in A \mid X, M) \overset{a.s.}{=} P(Y \in A \mid M) \qquad (1)$$

where a.s. stands for almost surely. Arguably, this condition is the most general possible when it comes to formalizing the absence of direct effect in a mediation model.

Remark 2.1. In these circumstances, we implicitly assume that X and M are dependent variables, or else searching for a direct effect is meaningless. If X and M are independent, any actual effect of X on Y cannot be canceled in the presence of M, although the above condition might still hold if Y and X are also independent. Thus, assuming that X and M are dependent rules out this trivial yet problematic situation.

Testing the equality of the two conditional distributions can be quite tedious in practice, especially for a continuous variable Y. However, for many statistical models this equality is equivalent to

$$(H_0): \quad E(Y \mid X, M) \overset{a.s.}{=} E(Y \mid M). \qquad (2)$$

Both conditional expectations can be easily estimated in a non-parametric way by using the well-known kernel density estimators (see Härdle et al. (2004)). This allows us to construct test statistics in Section 3.

Remark 2.2. The null hypothesis is somewhat similar to the one considered in Hayes (2013), where the author proposes the condition

$$E(Y \mid X = x, M = m) = E(Y \mid X = x - 1, M = m)$$

as a non-linear characterization of the absence of direct effect. However, no test procedure is developed in the general case.

We will now describe some models in which the absence of direct effect can be reduced to $(H_0)$ as defined in (2).

Binary outcomes: If Y is a binary variable, then the conditional probability is given by $P(Y = 1 \mid X, M) = E(Y \mid X, M)$. Thus, the equivalence between (1) and (2) is immediate. The situation is slightly more complicated if Y is a categorical variable, for example with outcomes $1, \dots, J$. In this case, the general condition (1) reduces to the equality of the conditional probabilities

$$P(Y = j \mid X, M) \overset{a.s.}{=} P(Y = j \mid M), \quad j = 1, \dots, J,$$

which cannot be reduced to a condition on $E(Y \mid X, M)$ only.
To solve this issue, one may consider the vector $\mathbf{Y} = (\mathbb{1}\{Y = 1\}, \dots, \mathbb{1}\{Y = J - 1\})$ as the variable of interest. The condition (1) is then equivalent to

$$(H_0): \quad E(\mathbf{Y} \mid X, M) \overset{a.s.}{=} E(\mathbf{Y} \mid M). \qquad (3)$$

Alternatively, this situation can be tackled by using multiple tests over the different outcomes, thus reducing to the binary case.

Non-parametric regression: The non-parametric regression model is of the form

$$Y = \rho(X, M) + \varepsilon,$$

where $\varepsilon$ is a residual error independent of X, M and $\rho$ is an unknown measurable function. In this model, the absence of direct effect can be investigated via $\rho(X, M)$, which is generally more accessible than the conditional distribution. The condition (1) is equivalent to $\rho(X, M)$ not depending on X. Indeed, the conditional distribution of Y given X, M is the distribution of $\varepsilon$ translated by $\rho(X, M)$. If Y is integrable, we obtain $\rho(X, M) = E(Y \mid X, M)$, and thus hypothesis (1) is equivalent to $(H_0)$ as defined in (2).
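As an illustration of the indicator-vector encoding used for categorical outcomes above, here is a minimal sketch in Python with NumPy. The function name `indicator_vector` and the convention that categories are coded as the integers 1, ..., J are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def indicator_vector(y, J):
    """Map labels in {1, ..., J} to rows (1{Y=1}, ..., 1{Y=J-1}).

    Hypothetical helper: category J is the reference level and is
    encoded as a row of zeros.
    """
    y = np.asarray(y)
    levels = np.arange(1, J)               # 1, ..., J-1
    return (y[:, None] == levels[None, :]).astype(float)

# Four observations of a three-level outcome.
encoded = indicator_vector([1, 2, 3, 3], J=3)
```

Each column of `encoded` is a binary variable, so a test built for binary outcomes can be applied column by column, which corresponds to the multiple-testing alternative mentioned in the text.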

Remark 2.3. For the logistic and linear models, which are the most commonly used in the literature, we have the following parametric relation

$$E(Y \mid X, M) = \begin{cases} f(\alpha + bM + \gamma X) & \text{logistic model}, \\ \alpha + bM + \gamma X & \text{linear model}, \end{cases}$$

where $f$ is a known function, typically the logistic function $f(t) = 1/(1 + e^{-t})$. In this parametric framework, the testing hypotheses become $\gamma = 0$ vs $\gamma \neq 0$ (see VanderWeele and Vansteelandt (2010) for the logistic model). In our framework, no such assumption is made: the non-parametric approach described below avoids this restriction on the form of the conditional expectation.

3 The non-parametric test procedure

Hereafter we assume that Y is an integrable random variable and that (X, M) has a density on $\mathbb{R}^2$ with respect to the Lebesgue measure. Let $\rho(X, M) = E(Y \mid X, M)$ and $\phi(M) = E(Y \mid M)$. Testing the null hypothesis $H_0$ boils down to checking whether the parameter

$$\theta := E\,|\rho(X, M) - \phi(M)|$$

is zero. As a result, a simple test procedure can be constructed from a consistent estimator $\hat\theta$. The functions $\rho$ and $\phi$ can be estimated by the standard Nadaraya-Watson method:

$$\hat\rho(x, m) := \frac{\sum_{i=1}^{n} Y_i K_h(X_i - x) K_h(M_i - m)}{\sum_{i=1}^{n} K_h(X_i - x) K_h(M_i - m)} \quad \text{and} \quad \hat\phi(m) := \frac{\sum_{i=1}^{n} Y_i K_h(M_i - m)}{\sum_{i=1}^{n} K_h(M_i - m)}.$$

Here, $K$ is a symmetric kernel and $K_h = K(\cdot/h)/h$, with $h > 0$ the bandwidth. For simplicity's sake we chose a Gaussian kernel with theoretically optimal bandwidths ($h = n^{-1/6}$ for $\hat\rho$ and $h = n^{-1/5}$ for $\hat\phi$). This is sufficient to ensure the consistency of the kernel estimators under mild assumptions. As a matter of fact, calibrating the bandwidth adaptively in order to improve the estimation turns out to be unnecessary for our purposes, since we are mainly interested in the distribution of the test statistics. We then compute the empirical estimator:

$$\hat\theta := \frac{1}{n} \sum_{i=1}^{n} |\hat\rho(X_i, M_i) - \hat\phi(M_i)|.$$

If there is no direct effect, the statistic $\hat\theta$ is expected to be close to zero, assuming that the regularity conditions for the consistency of the Nadaraya-Watson method are satisfied.
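The Nadaraya-Watson estimators and the statistic built from them can be sketched as follows in Python with NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the discrepancy between the two regression estimates is taken as an absolute difference, and the bandwidths $n^{-1/6}$ and $n^{-1/5}$ are applied to the raw variables, which presumes X and M are on a scale of order one (for example after standardization).

```python
import numpy as np

def kernel_weights(points, targets, h):
    """Matrix of K_h(points_j - targets_i) for a Gaussian kernel K."""
    u = (points[None, :] - targets[:, None]) / h
    return np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)

def rho_hat(x, m, X, M, Y, h):
    """Nadaraya-Watson estimate of E(Y | X = x, M = m) with a product kernel."""
    w = kernel_weights(X, x, h) * kernel_weights(M, m, h)
    return (w @ Y) / w.sum(axis=1)

def phi_hat(m, M, Y, h):
    """Nadaraya-Watson estimate of E(Y | M = m)."""
    w = kernel_weights(M, m, h)
    return (w @ Y) / w.sum(axis=1)

def theta_hat(X, M, Y):
    """Mean absolute gap between the two estimates at the sample points.

    Assumption of this sketch: absolute difference as the discrepancy.
    """
    n = len(Y)
    rho = rho_hat(X, M, X, M, Y, n ** (-1.0 / 6.0))
    phi = phi_hat(M, M, Y, n ** (-1.0 / 5.0))
    return float(np.mean(np.abs(rho - phi)))

# Toy data with no direct effect: Y depends on M only.
rng = np.random.default_rng(0)
X = rng.normal(size=80)
M = 0.8 * X + rng.normal(size=80)
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-M))).astype(float)
theta = theta_hat(X, M, Y)
```

Evaluating the estimators at the sample points keeps every denominator strictly positive, because each observation contributes its own kernel weight. At evaluation points far from all data the weights can underflow, so a production implementation would need to guard against that.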
To build the test for the absence of a direct effect, we investigate the distribution of $\hat\theta$ under the null hypothesis. This problem is not easily tractable analytically, even asymptotically. However, for binary outcomes Y, this distribution can be approximated by a bootstrap procedure, as follows.

1. We draw B samples $(X_i^{(b)}, M_i^{(b)})$, $b = 1, \dots, B$, of size n with replacement from the original sample $(X_i, M_i)$, $i = 1, \dots, n$.

2. For each b, we generate a Bernoulli variable $Y_i^{(b)}$ with probability $\hat\phi(M_i^{(b)})$ for $i = 1, \dots, n$. This aims to approximate the distribution of Y conditionally on M, X, which only depends on M under the null hypothesis.

3. We compute the statistics $\hat\theta_b$ over all bootstrap samples $b = 1, \dots, B$.

Let $t$ denote the value of $\hat\theta$ on the observed sample. The p-value of the test is then obtained from the empirical distribution of $\hat\theta_1, \dots, \hat\theta_B$, as the proportion of bootstrap statistics exceeding $t$:

$$\text{p-value} = \frac{1}{B} \sum_{b=1}^{B} \mathbb{1}\{\hat\theta_b > t\}.$$

Since the absence of direct effect implies that $\theta$ must be close to zero, the null hypothesis is rejected at a significance level $\alpha \in (0, 1)$ if p-value $< \alpha$.

Remark 3.1. The bootstrap procedure is effective in this situation because we are able to approximate the distribution of $Y_i^{(b)}$ conditionally on $M_i^{(b)}$ under the null hypothesis. In the binary case, this is achieved by generating $Y_i^{(b)}$ as a Bernoulli random variable with parameter $\hat\phi(M_i^{(b)})$. This can easily be extended to more than two values by generating $Y_i^{(b)}$ from a multinomial distribution with probabilities $\hat\phi_j(M_i^{(b)})$ estimated for each value j. In the continuous case, the bootstrap step requires the approximation of the distribution of the residual term $\varepsilon$ in the non-linear relation $Y = \phi(M) + \varepsilon$. Both parametric (e.g. normality assumptions) and non-parametric approaches are possible, although they may have a non-negligible impact on the performance of the test.

4 Non-parametric test against data

4.1 Application to students' well-being

To motivate the non-parametric approach, we investigate the mediation of students' self-efficacy M in the relation between well-being X and academic performance in mathematics and in French Y. 244 students from the Nantes region (France) participated in the experiment. The variables are measured by a test instrument (i.e. a questionnaire):

Variable            | Number of items | Likert scale | Score
Well-being          |                 |              | Mean
SEF in mathematics  |                 |              | Mean
SEF in French       |                 |              | Mean

Table 1: Multiple-item testing instruments used.

The teachers evaluate the academic performance as above average (Y = 1) or not (Y = 0).
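Returning to the bootstrap test of Section 3 for a binary outcome, its three steps and the resulting p-value can be sketched as follows in Python with NumPy. This is an illustrative sketch rather than the authors' implementation (the paper reports using R): the function names, the default B = 200, the absolute-difference form of the statistic and the toy data are all assumptions of the example.

```python
import numpy as np

def _weights(points, targets, h):
    # Gaussian kernel weights; normalising constants cancel in the NW ratio.
    u = (points[None, :] - targets[:, None]) / h
    return np.exp(-0.5 * u**2)

def phi_hat(m, M, Y, h):
    """Nadaraya-Watson estimate of E(Y | M = m)."""
    w = _weights(M, m, h)
    return (w @ Y) / w.sum(axis=1)

def theta_hat(X, M, Y):
    """Mean absolute gap between the (X, M)- and M-only regressions."""
    n = len(Y)
    h2 = n ** (-1.0 / 6.0)
    w = _weights(X, X, h2) * _weights(M, M, h2)
    rho = (w @ Y) / w.sum(axis=1)
    phi = phi_hat(M, M, Y, n ** (-1.0 / 5.0))
    return float(np.mean(np.abs(rho - phi)))

def bootstrap_pvalue(X, M, Y, B=200, seed=0):
    """Bootstrap p-value for the absence of a direct effect (binary Y)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    t_obs = theta_hat(X, M, Y)
    h_phi = n ** (-1.0 / 5.0)
    theta_b = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)          # step 1: resample (X, M) pairs
        Xb, Mb = X[idx], M[idx]
        p = np.clip(phi_hat(Mb, M, Y, h_phi), 0.0, 1.0)
        Yb = rng.binomial(1, p).astype(float)     # step 2: Y drawn under H0
        theta_b[b] = theta_hat(Xb, Mb, Yb)        # step 3: statistic on sample b
    return float(np.mean(theta_b > t_obs)), t_obs

# Toy data in which Y depends on M only, so H0 holds.
rng = np.random.default_rng(1)
X = rng.normal(size=60)
M = X + rng.normal(size=60)
Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-M))).astype(float)
p_value, t_obs = bootstrap_pvalue(X, M, Y, B=50)
```

In step 2 the Bernoulli probabilities are estimated from the original sample and evaluated at the resampled mediator values, so the simulated outcomes depend on M alone, as required under the null hypothesis. The `np.clip` call only guards against rounding error pushing a probability marginally outside [0, 1].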
We compare the results of our non-parametric test with both the Wald test and the LR test, which are commonly applied in this kind of psychological study.

             | p_W | p_LR | p_NP
Mathematics  |     |      |
French       |     |      |
Mathematics  |     |      |
French       |     |      |

Table 2: Comparison of results on the real dataset (Table 1). The p-values p_W, p_LR, p_NP refer respectively to the Wald, LR and non-parametric tests.

The p-values of the parametric and non-parametric tests are similar in all cases. We may note that the results are ambiguous in the first case, at the typical significance level α = .05, where the non-parametric test does not detect a direct effect. A more thorough analysis reveals that the linearity assumption on which the parametric tests rely is dubious. Indeed, the Box and Tidwell test, which measures the significance of the added variable X log(X) in the logistic model, gives a p-value of .03, thus indicating a non-linear dependence in X.

4.2 Simulated data

We generate data from three distinct models corresponding to three different forms of the conditional probability $\rho_\gamma(x, m) := P(Y = 1 \mid X = x, M = m)$, indexed by a coefficient γ that quantifies the importance of the direct effect, varying from γ = 0 (i.e. no direct effect) to γ = 1 (i.e. only direct effect). For each model, we generate N = 10,000 samples of sizes n = 20, 30, 50 and 100. The observations $X_i, M_i$ are selected randomly from the actual dataset of the previous section, and $Y_i$ follows a Bernoulli distribution with parameter $\rho_\gamma(X_i, M_i)$. The three different scenarios are described below.

1. The first scenario is the logistic regression model with ρ_γ(x, m) = exp ( 3 + 2γx + (1 − γ)m ). This is the theoretical framework of the LR and Wald tests, although both are based on asymptotic approximations.

2. The second scenario is generated from $\rho_\gamma(x, m) := 0.5\,\gamma\,\mathbb{1}_{x > 3} + (1 - \gamma)\,\mathbb{1}_{m > 5}$, where $\mathbb{1}$ stands for the indicator function. In this case, the relation between Y and X, M is non-linear but monotonic in both x and m, so that it is expected not to deviate too much from the linear framework.

3. For the third model, we take ρ_γ(x, m) = γ 1.72x x (1 − γ)0.1m, where $x_+ = \max(x, 0)$. Due to its non-monotonic behavior in x, this setup is unfavorable for the parametric tests while it is still covered by the non-parametric one.
The coefficients of the polynomial function in x are chosen so that $\rho_\gamma(X_i, M_i)$ remains in the interval [0, 1] for all possible values of $X_i$, $M_i$ and γ.

Simulations and computations were performed using R (R Core Team, 2016). For a fixed significance level α = 5%, Figures 2 to 4 show the evolution of the empirical probability of rejecting the null hypothesis as a function of γ. The value γ = 0 corresponds to the empirical significance level and the values γ > 0 give the simulated power function. All these probabilities are estimated from 10,000 independent replications.

Table 3 displays the simulated probabilities of rejecting the null hypothesis. The simulated power of the three tests is low for sample size n = 20 and significance levels α = .01, .05, .10. Moreover, the NP-test has a better empirical significance level in 75 percent of cases, the LR test in 6 percent and the Wald test in 19 percent. Lastly, the LR test is not conservative for all n = 20, 30 and α = .01, .05, .10: it rejects the null hypothesis with probability higher than α when the null hypothesis is true.

For sample sizes n = 50, 100, in agreement with the published literature (see e.g. Agresti, 2006), the Wald and LR tests perform similarly in the three examples.

The non-monotonic framework highlights the added value of the non-parametric method. Indeed, for all significance levels α = .01, .05, .10, the NP-test has a high simulated power as soon as n ≥ 30. In this case, both parametric tests are inappropriate even for large samples (e.g. for γ = 1 and α = .05, the Wald test and the LR test reject the null hypothesis in only 31 percent and 35 percent of cases respectively, while the NP-test always rejects the null hypothesis).

Figure 2: Comparison of the empirical significance level (γ = 0) and of the empirical power (γ > 0) in the logistic model with significance level α = .05. The parametric tests outperform the NP-test in a linear setup, as one would anticipate for large sample sizes (n = 50, n = 100). However, results show that the NP-test has a better significance level than the Wald and LR tests for small samples (n = 20, 30, 50). Furthermore, the Wald test is the worst test for the sample sizes n = 20, n = 30.

Figure 3: Comparison of the empirical significance level and of the empirical power in the non-linear monotonic case with significance level α = .05. In the non-linear monotonic case, the power of the tests is similar, as one would anticipate. As in the logistic framework, the power of all the tests is low for small sample sizes (n = 20, 30), but the empirical significance level of the NP-test is better than the others for all α ∈ {.01, .05, .10}. Conversely, even if the sample size and the deviation from the null hypothesis are large, the parametric tests have rarely detected the direct effect of X on $p = P(Y = 1 \mid X, M)$.

Figure 4: Comparison of the empirical significance level and of the empirical power in the non-linear, non-monotonic case with significance level α = .05. The power of the parametric tests is low for all sample sizes. On the contrary, the power of the NP-test increases with γ. Moreover, the empirical significance level is close to the theoretical value α = .05.

[Table 3 layout: columns are the sample size n, the coefficient γ, and the simulated powers π_W, π_LR, π_NP at each of α = .01, α = .05 and α = .10; row blocks are Logistic, Monotonic relationship and Non-monotonic relationship.]

Table 3: Summary of the estimated significance levels and powers in the three scenarios.

5 Acknowledgements

This research was funded by several grants from France's Ministère de l'Éducation Nationale et de l'Enseignement Supérieur, the Défenseur des Droits, and the Agence pour la Cohésion Sociale et l'Égalité des Chances.

References

Agresti, A. (2006). Multicategory Logit Models. John Wiley & Sons, Inc.

Baron, R. and Kenny, D. A. (1986). The moderator-mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6):1173.

Box, G. E. and Tidwell, P. W. (1962). Transformation of the independent variables. Technometrics, 4(4):

Harrell, Jr., F. E. (2006). Regression Modeling Strategies. Springer-Verlag, Berlin, Heidelberg.

Hauck, Jr., W. W. and Donner, A. (1977). Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association, 72(360a):

Hayes, A. (2013). Introduction to Mediation, Moderation, and Conditional Process Analysis: A Regression-Based Approach. Methodology in the Social Sciences Series. Guilford Publications.

Härdle, W., Werwatz, A., Müller, M. and Sperlich, S. (2004). Nonparametric and Semiparametric Models. Springer.

Imai, K. and Keele, L. (2010). A general approach to causal mediation analysis. Psychological Methods, 15(4):

MacKinnon, D. P. (2008). Introduction to Statistical Mediation Analysis. Routledge.

R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.

Schmader, T. and Johns, M. (2003). Converging evidence that stereotype threat reduces working memory capacity. Journal of Personality and Social Psychology, 85(3):

VanderWeele, T. J. and Vansteelandt, S. (2010). Odds ratios for mediation analysis for a dichotomous outcome. American Journal of Epidemiology, 172(12).


More information

Math 494: Mathematical Statistics

Math 494: Mathematical Statistics Math 494: Mathematical Statistics Instructor: Jimin Ding jmding@wustl.edu Department of Mathematics Washington University in St. Louis Class materials are available on course website (www.math.wustl.edu/

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Help! Statistics! Mediation Analysis

Help! Statistics! Mediation Analysis Help! Statistics! Lunch time lectures Help! Statistics! Mediation Analysis What? Frequently used statistical methods and questions in a manageable timeframe for all researchers at the UMCG. No knowledge

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information
