Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010


1 Linear models

Y = Xβ + ε with ε ~ N(0, σe²), or equivalently Y ~ N(Xβ, σe²), where the model matrix X contains the information on the predictors and β collects all the coefficients (intercept, slope(s), etc.).

1. Number of coefficients and degrees of freedom
   - Main effect of a numerical predictor: 1 (slope)
   - Main effect of a factor with a categories: a-1 (adjustments to the intercept)
   - Interaction effect of two numerical predictors: 1
   - Interaction effect of a numerical predictor and a factor with a categories: a-1 (adjustments to the slope)
   - Interaction effect of two factors, with a and b categories each: (a-1)(b-1) (adjustments to group means)

2. Assumptions: homogeneous variance of the residuals, normal distribution of the residuals, and a correct average relationship E(Y) = Xβ. The last assumption may mean a straight-line relationship with a numerical predictor, and/or absence of interaction when interaction effects are not included, etc.

3. Transformations of numerical variables: used to meet the assumptions of the linear model, or to decrease the influence of points that have high leverage.
   Interpretation when the response is log-transformed:
   - Numerical predictor with slope β1: an increase of 1 unit in the predictor corresponds to an increase of about (β1 × 100)% in the average response (when β1 is small).
   - Factor, with difference β1 between two categories: switching from one category to the other corresponds to an increase of about (β1 × 100)% in the average response (when β1 is small).
   Interpretation when a numerical predictor is log-transformed:
   - Untransformed response: a 1% increase in the predictor corresponds to an increase of the log-predictor by about 0.01, and to an increase of β1 × 0.01 in the average response.
   - Log-transformed response: a 1% increase in the predictor corresponds to an increase of about β1% in the average response.

4. Sums of squares: SSTrt = RSS(model without Trt) - RSS(model with Trt).
   - Type I SS: formula above, with a model that progressively includes all predictors, added one by one.
   - Type III SS: formula above, with a model that uses all predictors, with/without Trt.
   In traditional designs with complete balance and all-categorical predictors, Type I SS = Type III SS. Otherwise, Type III SS and the associated ANOVA F-tests are preferred.
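The "small β1" approximations in item 3 rest on exp(β1) ≈ 1 + β1. A quick stdlib-only check, with illustrative slope values that are not from the handout:

```python
import math

# Log-transformed response: a slope of b1 multiplies the average
# response by exp(b1) per unit increase of the predictor.
b1 = 0.05
exact_pct = (math.exp(b1) - 1) * 100   # exact percent change: ~5.13%
approx_pct = b1 * 100                  # the (b1 * 100)% rule: 5%
assert abs(exact_pct - approx_pct) < 0.2

# Log-transformed predictor AND response: a 1% increase in the
# predictor multiplies the mean response by 1.01**b1, about b1 percent.
b1 = 2.0
exact_pct = (1.01 ** b1 - 1) * 100     # ~2.01%, vs the 2% rule
assert abs(exact_pct - b1) < 0.05
```

The agreement degrades as β1 grows, which is why the handout hedges with "when β1 is small".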

5. ANOVA F-test. To compare nested models, based on the reduction in RSS from the simple model to the complex model. Example: the complex model has the same list of predictors as the simple model, plus an extra predictor, Trt. H0: the simple model is correct, i.e. all coefficients for Trt are 0. Compare the F value to an F-distribution, with the degrees of freedom for Trt (numerator) and the residual df (denominator). Exact test, provided the model assumptions are met.

6. Likelihood ratio test (LRT). To compare nested models, based on the increase in likelihood from the simple model to the complex model. Same example and H0 as above. Compare the LR statistic X² = 2 log L(complex) - 2 log L(simple) to a χ² distribution with df k = number of coefficients for Trt. Approximate test; the approximation is good when the sample size is large.

7. Testing individual coefficients, for post-hoc comparisons. Wald t-tests: based on the SE of each coefficient. H0: the coefficient is 0. If a predictor uses a single coefficient, the Wald t-test for that coefficient and the F-test for the predictor provide the same p-value. If a predictor is associated with several coefficients, beware of the meaning of these coefficients (differences to which reference level? or differences to an overall mean?), and beware of multiple testing. The F-test is preferred for an overall test of the predictor.

8. Model selection. Beware of multiple testing: avoid starting with an over-complicated model. Search methods: stepwise backward, forward, or in both directions. Search criteria: AIC (includes more predictors), BIC (includes fewer), F-test or LRT (both provide p-values).
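The RSS-reduction formula behind SSTrt and the F-test can be sketched with a one-factor example. The data and group labels below are made up for illustration:

```python
# One treatment factor with a = 2 groups and n = 3 units per group.
# SSTrt = RSS(model without Trt) - RSS(model with Trt).
groups = {"ctrl": [4.0, 5.0, 6.0], "trt": [8.0, 9.0, 10.0]}
all_y = [y for ys in groups.values() for y in ys]

grand_mean = sum(all_y) / len(all_y)                  # 7.0
rss_null = sum((y - grand_mean) ** 2 for y in all_y)  # model without Trt: 28
rss_full = sum((y - sum(ys) / len(ys)) ** 2           # model with Trt fits
               for ys in groups.values() for y in ys)  # group means: RSS = 4
ss_trt = rss_null - rss_full                          # 24.0

df_trt = len(groups) - 1          # a - 1 = 1 (numerator df)
df_res = len(all_y) - len(groups)  # a(n - 1) = 4 (denominator df)
f_value = (ss_trt / df_trt) / (rss_full / df_res)     # 24.0
```

Because there is only one predictor here, Type I and Type III SS coincide; with several correlated predictors the "model without Trt" would differ between the two types.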

2 Logistic regression

Binary response (yes/no); a binomial distribution is assumed:

Yi ~ B(ni, pi) with pi = exp(ηi) / (1 + exp(ηi)) and ηi = log(pi / (1 - pi)) = (Xβ)i

1. Interpretation
   - The linear predictor η is the log of the odds of success; exp(ηi) = odds of success.
   - Coefficient β1 of a numerical predictor: at the 50:50 point, an increase of 1 unit in the predictor results in an increase of about β1/4 in the proportion of successes (approximation okay provided β1/4 is not too large).
   - Coefficient β1 for the difference between 2 categories of a treatment factor: when switching from one category to the other, the odds of success are multiplied by exp(β1), which corresponds to an increase of about (β1 × 100)% if β1 is small.

2. Likelihood ratio test. There is no F-test in logistic regression; only the LR test is available. H0: the simple model is correct, i.e. all extra coefficients in the complex model are 0. Compare X² = Deviance(simple) - Deviance(complex) to a χ² distribution with df k = number of extra parameters in the complex model. Approximate test.

3. Wald test, for post-hoc tests of individual coefficients. H0: the coefficient is 0. The Wald test is a z-test: the ratio of a coefficient to its SE is compared to a normal distribution. It is very approximate. For a predictor that corresponds to a single coefficient, the Wald test for this coefficient does not always agree with the LR test for the predictor; the LR test is to be preferred. Check the meaning of each coefficient: changing the reference level of a categorical predictor can drastically alter the p-values from the Wald tests, because the new coefficients take on new meanings.

4. Model selection. Available criteria: LR test, AIC, BIC. Only the LR test provides p-values.

5. Goodness-of-fit test. H0: the current model is good enough to explain the data. The alternative hypothesis includes any reason that may cause the model to not be good enough, such as: an important predictor is missing, lack of independence within groups causing over-dispersion, presence of one outlier that is not representative of the population, etc. Compare the residual deviance to a χ² distribution on df = residual df. Approximate.

Beware: the p-value obtained that way is a very poor approximation when all ni = 1. Alternative: compare the residual deviance to the null distribution obtained by parametric bootstrapping (simulate experiments under H0).

6. Residuals. Response, deviance, or Pearson residuals. Deviance residuals are preferred for residual plots; when all ni = 1, there is an expected pattern in residual plots. Pearson residuals are preferred for estimating the dispersion.

7. Dispersion. σ², under the hypothesis that the variance of the observations has the form var(Yi) = σ² ni pi (1 - pi). The binomial distribution implies σ² = 1. Estimate σ̂² from the Pearson residuals; Deviance/dfRes is the corresponding estimate from the deviance residuals. Over-dispersion can be invoked when σ̂² is significantly > 1 and when everything else fails.

8. Quasi-binomial analysis. Based on logistic regression with the extra σ² parameter (but does not correspond to a model with a well-defined distribution). An ad-hoc F-test replaces the LR test; ad-hoc Wald t-tests replace the Wald z-tests.

3 Random and mixed effect models

Normal distribution of the residuals, and some categorical predictors are given random effects:

Y = X_fixed β + X_random α + ε with ε ~ N(0, σe²) and the α's normally distributed

1. Parameters: the coefficients of the fixed effects (intercept and possibly other coefficients), the variance of the residuals σe², and one or more variance component(s).

2. ML, REML and MS
   - ML: maximum likelihood estimation of the parameters. Tends to underestimate variance components.
   - REML: restricted maximum likelihood. Less biased or unbiased variance components. Can only be compared across models that share the same fixed effects.
   - MS: mean squares approach. Closed formulas for estimating all parameters, but only available in balanced cases. Can result in negative estimated variance components. Same as REML in many cases.
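Two of the logistic-regression rules of thumb in Section 2 (the β1/4 rule and the Pearson estimate of dispersion) can be checked numerically. The coefficient and the four binomial counts below are made up for illustration:

```python
import math

def inv_logit(eta):
    """p = exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# beta1/4 rule: near the 50:50 point, a 1-unit increase in a predictor
# with coefficient b1 changes the success probability by about b1/4,
# because the slope of inv_logit at eta = 0 is exactly 1/4.
b1, h = 0.6, 1e-6
slope_at_0 = (inv_logit(h) - inv_logit(-h)) / (2 * h)  # ~0.25
assert abs(slope_at_0 - 0.25) < 1e-6
assert abs((inv_logit(b1) - inv_logit(0.0)) - b1 / 4) < 0.02

# Dispersion from Pearson residuals, for an intercept-only fit
# p-hat = 0.5 to four groups of n = 10 trials each:
y, n, p = [2, 8, 3, 7], 10, 0.5
pearson = [(yi - n * p) / math.sqrt(n * p * (1 - p)) for yi in y]
df_res = len(y) - 1                   # 4 groups minus 1 coefficient
sigma2_hat = sum(r * r for r in pearson) / df_res  # ~3.47
assert sigma2_hat > 1                 # suggests over-dispersion
```

With these counts the binomial assumption σ² = 1 looks doubtful, the situation where a quasi-binomial analysis would be considered.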

3. LR test for random effects. To compare nested models, when the complex model has extra random effects. H0: the simple model is correct. Based on the LR statistic X² = 2 log L(complex) - 2 log L(simple). The REML criterion is preferred, but the ML criterion can be used as well; use the same criterion for both models.
   - χ²-based p-value: compare X² to a χ² distribution with df = number of extra variance parameters in the complex model. This p-value tends to be conservative (too large); dividing it by 2 provides a more accurate p-value in some cases when a single variance parameter is tested.
   - Alternatively, get the exact p-value: compare X² to the true null distribution as obtained by parametric bootstrapping (simulate new data under H0).

4. LR test for fixed effects. To compare nested models, when the complex model has extra fixed effects. H0: the simple model is correct. Based on the LR statistic X² = 2 log L(complex) - 2 log L(simple). The ML criterion must be used.
   - χ²-based p-value: compare X² to a χ² distribution with df = number of extra coefficients in the complex model. This p-value tends to be anti-conservative (too small), sometimes very much so; other times it is accurate, when the data set is large compared to the number of coefficients.
   - Alternatively, get the exact p-value: compare X² to the true null distribution as obtained by parametric bootstrapping (simulate new data under H0).

5. F-test for fixed effects. To compare nested models, when the complex model has extra fixed effects. H0: the simple model is correct. F = a ratio of mean square (MS) values; based on closed formulas for the MS in balanced designs, and on linear algebra in general. Approximate test: it ignores the fact that the variance components are not known perfectly. Compare the F value to an F-distribution. Numerator df = number of extra coefficients in the complex model. Denominator df: can be calculated exactly in balanced designs only. Beware of random effects that are nested in other effects: they may cause the denominator df to differ from the residual df. Can be more powerful than the LR test.
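The "divide the p-value by 2" adjustment for testing a single variance parameter can be sketched with the χ² tail probability for 1 df, which the standard library provides via erfc. The statistic X² = 2.706 below is just an illustrative value, not from the handout:

```python
import math

def chi2_sf_df1(x):
    """P(chi-square with 1 df > x), via the normal tail:
    P(chi2_1 > x) = 2 * P(Z > sqrt(x)) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

x2 = 2.706                    # illustrative LR statistic
p_naive = chi2_sf_df1(x2)     # ~0.10: tends to be too large (conservative)
p_mixture = p_naive / 2.0     # ~0.05: halved, matching a 50:50 mixture
                              # of chi2_0 and chi2_1 at the boundary
assert abs(p_naive - 0.10) < 0.001
assert abs(p_mixture - 0.05) < 0.001
```

The halving reflects that the null value of a variance component sits on the boundary of the parameter space; parametric bootstrapping avoids relying on this approximation.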

4 Topics for all models

1. Two-way and higher-order interactions
   - Two-way interaction: the effect of one predictor depends on the level of the other predictor. Lack of parallelism in a profile plot of group means.
   - Three-way interaction: the two-way interaction effect of two predictors depends on the level of a third predictor.
   - Two predictors can be independent and still have an interaction effect on the response.

2. Correlation among predictors. Frequent scenarios:
   - Both predictors are significant when included alone in the model, but both are non-significant when included together. Their correlation prevents us from teasing out which predictor is really associated with the response. Either predictor can be chosen for inclusion in the model; the other predictor should be dropped, and the conclusion should state the uncertainty about which predictor is best.
   - One predictor is significant whether or not the other predictor is included in the model, while the significance of the other depends on the inclusion of the first. The first predictor should be included in the model. It can be a confounding variable for the second predictor (and would be lurking if ignored).

3. Causality versus prediction. In designed experiments, randomization aims to eliminate any known or unknown confounding variables, by forming treatment groups that are equivalent in everything except the treatment(s). Causality can be inferred from designed experiments. Otherwise, only predictions can be made, and more assumptions are needed to extend predictions to causality statements.

5 Particular experimental designs

1. Subsampling. Happens when measurements are repeated on the same experimental unit. Subsampling can be added on top of any design. Ignoring the dependence that comes from subsampling is called pseudo-replication, and usually results in false significance. Subsampling is accounted for by adding a random effect for the experimental unit from which subsamples are taken. Experimental units are nested within treatments (and maybe within blocks as well) because each unit receives only one treatment (or belongs to only one block). Therefore:

Beware: do not use the subsampling MS error for F-tests; use the MS error and df error at the experimental-unit level instead.

2. CRD: Completely Randomized Design. All levels of the treatment, or all combinations of treatment levels, are assigned to experimental units completely at random. Randomization is most complete, and the associated model is simplest: the only predictors are the treatment(s). For a treatment with a categories, n units (plots) per treatment, and possibly s subsamples per unit (plot):

   Source        df        F
   Treatment A   a-1       MSTrt/MSErr
   Error         a(n-1)
   Total         an-1

   or, with subsampling:

   Source          df        F
   Treatment A     a-1       MSTrt/MSPlotErr
   Plot Error      a(n-1)
   Subsamp. Err.   an(s-1)
   Total           ans-1

3. RCBD: Randomized Complete Block Design. Blocks are defined on biological grounds, so that variability within blocks is expected to be lower than variability between measurements in different blocks.

   Y = Treatment + Block + error

   No interaction in the usual case, when there is no replication within blocks. For a treatment with a categories, b blocks, 1 unit (plot) per treatment/block combination, and possibly s subsamples per unit (plot):

   Source   df            F
   Trt      a-1           MSTrt/MSErr
   Block    b-1           MSBlock/MSErr
   Error    (a-1)(b-1)
   Total    ab-1

   or, with subsampling:

   Source          df            F
   Trt             a-1           MSTrt/MSPlotErr
   Block           b-1           MSBlock/MSPlotErr
   Plot Error      (a-1)(b-1)
   Subsamp. Err.   ab(s-1)
   Total           abs-1

4. Latin Square Design, with double-blocking:

   Y = Treatment + Row + Column + error

   Source      df            F
   Treatment   a-1           MSTrt/MSErr
   Row         a-1           MSRow/MSErr
   Column      a-1           MSCol/MSErr
   Error       (a-1)(a-2)
   Total       a²-1

   Multiple Latin squares can be used, in which case a square effect is added. Row and column effects are usually considered nested within squares.
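A useful sanity check on these ANOVA tables is that the df column must sum to the Total df. A quick check of the identities with arbitrary made-up sizes:

```python
# df bookkeeping for the tables above, checked for several sizes:
# a treatment levels, b blocks, n units per treatment, s subsamples.
for a, b, n, s in [(3, 4, 5, 2), (2, 2, 3, 4), (6, 3, 2, 5)]:
    # CRD with subsampling: Trt + Plot Error + Subsamp. Err. = Total
    assert (a - 1) + a * (n - 1) + a * n * (s - 1) == a * n * s - 1
    # RCBD: Trt + Block + Error = Total
    assert (a - 1) + (b - 1) + (a - 1) * (b - 1) == a * b - 1
    # Latin square: Trt + Row + Column + Error = Total
    assert 3 * (a - 1) + (a - 1) * (a - 2) == a * a - 1
```

Algebraically the Latin-square line telescopes as (a-1)(3 + a - 2) = (a-1)(a+1) = a²-1.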

5. Factorial designs. For two or more factors. Full factorial: all combinations of factor levels are tested. When several replicates are included for each combination, the model includes the interaction between the two factors:

   Y = factor A + factor B + A:B + error

   If A has a levels, B has b levels, and there are n units per combination of treatment levels:

   Source   df            F
   A        a-1           MSA/MSErr
   B        b-1           MSB/MSErr
   A:B      (a-1)(b-1)    MSA:B/MSErr
   Error    ab(n-1)
   Total    abn-1

   This design can use either complete randomization (CRD, table above) or blocking (RCBD; modify the model and table with a blocking factor).

6. Split-plot designs. Happen when one factor (C) is randomized at a finer level than another factor (A): whole-plots are assigned to levels of A, and subplots within whole-plots are assigned to levels of C.

   Y = A + C + A:C + whole-plot + error

   Whole-plots must be given random effects, because they are nested in A (each whole-plot receives only 1 level of A). Therefore, use MS whole-plot and df whole-plot, not the subplot error, when testing the effect of A with the F-test. If A has a levels, C has c levels, b whole-plots per level of A, and 1 subplot per level of C within each whole-plot:

   Source                 df              F
   A                      a-1             MSA/MSWPErr
   Whole-plot (WP) Err.   a(b-1)
   C                      c-1             MSC/MSSPErr
   A:C                    (a-1)(c-1)      MSA:C/MSSPErr
   Subplot (SP) Error     a(b-1)(c-1)
   Total                  abc-1

   Can be combined with blocking, grouping whole-plots into blocks, in which case the model needs to account for blocks as well. With b blocks and 1 whole-plot per block per level of A:

   Source                 df              F
   A                      a-1             MSA/MSWPErr
   Block                  b-1             MSBlk/MSWPErr
   Whole-plot (WP) Err.   (a-1)(b-1)
   C                      c-1             MSC/MSSPErr
   A:C                    (a-1)(c-1)      MSA:C/MSSPErr
   Subplot (SP) Error     a(b-1)(c-1)
   Total                  abc-1
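The split-plot df also telescope to the Total abc-1, with the whole-plot stratum contributing ab-1 and the subplot stratum ab(c-1). A quick check, again with arbitrary made-up sizes:

```python
# Split-plot df bookkeeping: a levels of A, b whole-plots (or blocks)
# per level of A, c levels of C.
for a, b, c in [(3, 4, 2), (2, 5, 3), (4, 2, 6)]:
    wp = (a - 1) + a * (b - 1)          # A + WP error = ab - 1
    sp = (c - 1) + (a - 1) * (c - 1) + a * (b - 1) * (c - 1)
    assert sp == a * b * (c - 1)        # subplot stratum
    assert wp + sp == a * b * c - 1     # matches the Total row
    # Blocked version: A + Block + WP error still totals ab - 1.
    assert (a - 1) + (b - 1) + (a - 1) * (b - 1) == a * b - 1
```

The same decomposition explains why the F-test for A must use the whole-plot error: A is randomized only at the whole-plot stratum, which has its own, smaller df.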