Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Similar documents
Introduction to the R Statistical Computing Environment

Generalized Linear Models

Generalized Linear Models: An Introduction

Generalized linear models

The First Thing You Ever Do When Receive a Set of Data Is

Package HGLMMM for Hierarchical Generalized Linear Models

Sta s cal Models: Day 3

Lecture 9 STK3100/4100

11. Generalized Linear Models: An Introduction

Regression models. Generalized linear models in R. Normal regression models are not always appropriate. Generalized linear models. Examples.

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Statistics 572 Semester Review

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

1 gamma.mixed: Mixed effects gamma regression

Package blme. August 29, 2016

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Stat 209 Lab: Linear Mixed Models in R This lab covers the Linear Mixed Models tutorial by John Fox. Lab prepared by Karen Kapur. ɛ i Normal(0, σ 2 )

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

Generalized Linear Models

STAT5044: Regression and Anova

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

Overdispersion Workshop in generalized linear models Uppsala, June 11-12, Outline. Overdispersion

SPH 247 Statistical Analysis of Laboratory Data. April 28, 2015 SPH 247 Statistics for Laboratory Data 1

The GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next

Nonlinear Models. What do you do when you don t have a line? What do you do when you don t have a line? A Quadratic Adventure

Generalized Linear Models (GLZ)

High-Throughput Sequencing Course

Generalized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.

Categorical Predictor Variables

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Introduction to General and Generalized Linear Models

WU Weiterbildung. Linear Mixed Models

Mixed models in R using the lme4 package Part 7: Generalized linear mixed models

Model Assumptions; Predicting Heterogeneity of Variance

Outline. Mixed models in R using the lme4 package Part 5: Generalized linear mixed models. Parts of LMMs carried over to GLMMs

12 Generalized linear models

Generalized Linear Models 1

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Chapter 1. Modeling Basics

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Generalized linear models

Prerequisite: STATS 7 or STATS 8 or AP90 or (STATS 120A and STATS 120B and STATS 120C). AP90 with a minimum score of 3

GLM I An Introduction to Generalized Linear Models

Generalized Linear Models for Non-Normal Data

M. H. Gonçalves 1 M. S. Cabral 2 A. Azzalini 3

0.1 gamma.mixed: Mixed effects gamma regression

Generalized Linear Models I

Modeling Overdispersion

On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation

Package dispmod. March 17, 2018

Longitudinal and Panel Data: Analysis and Applications for the Social Sciences. Table of Contents

Correlation in Linear Regression

Statistical Methods III Statistics 212. Problem Set 2 - Answer Key

For Bonnie and Jesse (again)

Stat 579: Generalized Linear Models and Extensions

mgcv: GAMs in R Simon Wood Mathematical Sciences, University of Bath, U.K.

Title. Description. Remarks and examples. stata.com. stata.com. methods and formulas for gsem Methods and formulas for gsem

STAT 526 Advanced Statistical Methodology

Generalized Linear Models Introduction

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Semiparametric Generalized Linear Models

9 Generalized Linear Models

Chapter 4 Multi-factor Treatment Designs with Multiple Error Terms 93

Non-Gaussian Response Variables

Analysis of means: Examples using package ANOM

36-463/663: Multilevel & Hierarchical Models

Longitudinal Data Analysis. Michael L. Berbaum Institute for Health Research and Policy University of Illinois at Chicago

Stat 5101 Lecture Notes

Draft of an article prepared for the Encyclopedia of Social Science Research Methods, Sage Publications. Copyright by John Fox 2002

Chapter 22: Log-linear regression for Poisson counts

Contents. Part I: Fundamentals of Bayesian Inference 1

Multivariate Statistics in Ecology and Quantitative Genetics Summary

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

A Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn

The linear mixed model: introduction and the basic model

Statistics Boot Camp. Dr. Stephanie Lane Institute for Defense Analyses DATAWorks 2018

Sample Size and Power Considerations for Longitudinal Studies

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL

A Practitioner s Guide to Generalized Linear Models

Machine Learning Linear Classification. Prof. Matteo Matteucci

Product Held at Accelerated Stability Conditions. José G. Ramírez, PhD Amgen Global Quality Engineering 6/6/2013

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Poisson regression: Further topics

Experimental Design and Data Analysis for Biologists

Time-Series Regression and Generalized Least Squares in R*

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Approaches to Modeling Menstrual Cycle Function

Package gma. September 19, 2017

SAS/STAT 15.1 User s Guide Introduction to Mixed Modeling Procedures

Multinomial Logistic Regression Models

Introduction to mtm: An R Package for Marginalized Transition Models

12 Modelling Binomial Response Data

Faculty of Health Sciences. Regression models. Counts, Poisson regression, Lene Theil Skovgaard. Dept. of Biostatistics

Linear Regression With Special Variables

Poisson Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Introduction to GSEM in Stata

Introduction to General and Generalized Linear Models

Part 8: GLMs and Hierarchical LMs and GLMs

Transcription:

Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear Models in R Topics To be covered as time permits: Part 1 Part 2 Multiple linear regression Factors and dummy regression models Overview of the lm() function The structure of generalized linear models (GLMs) in R; the glm() function GLMs for binary/binomial data and count data Mixed-effects models for hierarchical and longitudinal data Visualizing statistical models Tests and confidence intervals for coefficients Diagnostics for linear and generalized linear models John Fox (McMaster University) Statistical Models in R ICPSR 2018 2 / 19

Linear Models in R Arguments of the lm function lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset,...) formula Expression Interpretation Example A + B include both A and B income + education A - B exclude B from A a*b*d - a:b:d A:B all interactions of A and B type:education A*B A + B + A:B type*education B %in% A B nested within A education %in% type A/B A + B %in% A type/education A^k effects crossed to order k (a + b + d)^2 John Fox (McMaster University) Statistical Models in R ICPSR 2018 3 / 19 Linear Models in R Arguments of the lm() function data: A data frame containing the data for the model. subset: a logical vector: subset = sex == "F" a numeric vector of observation indices: subset = 1:100 a negative numeric vector with observations to be omitted: subset = -c(6, 16) weights: for weighted-least-squares regression na.action: name of a function to handle missing data; default given by the na.action option, initially "na.omit" method, model, x, y, qr, singular.ok: technical arguments contrasts: specify list of contrasts for factors; e.g., contrasts=list(partner.status=contr.sum, fcategory=contr.poly)) offset: term added to the right-hand-side of the model with a fixed coefficient of 1. John Fox (McMaster University) Statistical Models in R ICPSR 2018 4 / 19

Generalized Linear Models in R Review of the Structure of GLMs A generalized linear model consists of three components: 1 A random component, specifying the conditional distribution of the response variable, y i, given the predictors. Traditionally, the random component is an exponential family the normal (Gaussian), binomial, Poisson, gamma, or inverse-gaussian. 2 A linear function of the regressors, called the linear predictor, η i = α + β 1 x i1 + + β k x ik on which the expected value µ i of y i depends. 3 A link function g(µ i ) = η i, which transforms the expectation of the response to the linear predictor. The inverse of the link function is called the mean function: g 1 (η i ) = µ i. John Fox (McMaster University) Statistical Models in R ICPSR 2018 5 / 19 Generalized Linear Models in R Review of the Structure of GLMs In the following table, the logit, probit and complementary log-log links are for binomial or binary data: Link η i = g(µ i ) µ i = g 1 (η i ) identity µ i η i log log e µ i e η i inverse inverse-square square-root µ i 1 ηi 1 µ i 2 η 1/2 i µi ηi 2 µ i 1 logit log e 1 µ i 1 + e η i probit Φ(µ i ) Φ 1 (η i ) complementary log-log log e [ log e (1 µ i )] 1 exp[ exp(η i )] John Fox (McMaster University) Statistical Models in R ICPSR 2018 6 / 19

Generalized Linear Models in R Implementation of GLMs in R Generalized linear models are fit with the glm() function. Most of the arguments of glm() are similar to those of lm(): The response variable and regressors are given in a model formula. data, subset, and na.action arguments determine the data on which the model is fit. The additional family argument is used to specify a family-generator function, which may take other arguments, such as a link function. John Fox (McMaster University) Statistical Models in R ICPSR 2018 7 / 19 Generalized Linear Models in R Implementation of GLMs in R The following table gives family generators and default links: Family Default Link Range of y i V (y i η i ) gaussian identity (, + ) φ binomial logit 0, 1,..., n i n i µ i (1 µ i ) poisson log 0, 1, 2,... µ i Gamma inverse (0, ) inverse.gaussian 1/mu^2 (0, ) φµ 2 i φµ 3 i For distributions in the exponential families, the variance is a function of the mean and a dispersion parameter φ (fixed to 1 for the binomial and Poisson distributions). John Fox (McMaster University) Statistical Models in R ICPSR 2018 8 / 19

Generalized Linear Models in R Implementation of GLMs in R The following table shows the links available for each family in R, with the default links as : link family identity inverse sqrt 1/mu^2 gaussian binomial poisson Gamma inverse.gaussian quasi quasibinomial quasipoisson John Fox (McMaster University) Statistical Models in R ICPSR 2018 9 / 19 Generalized Linear Models in R Implementation of GLMs in R link family log logit probit cloglog gaussian binomial poisson Gamma inverse.gaussian quasi quasibinomial quasipoisson The quasi, quasibinomial, and quasipoisson family generators do not correspond to exponential families. John Fox (McMaster University) Statistical Models in R ICPSR 2018 10 / 19

Generalized Linear Models in R GLMs for Binary/Binomial and Count Data The response for a binomial GLM may be specified in several forms: For binary data, the response may be a variable or an R expression that evaluates to 0 s ( failure ) and 1 s ( success ). a logical variable or expression (with TRUE representing success, and FALSE failure). a factor (in which case the first category is taken to represent failure and the others success). For binomial data, the response may be a two-column matrix, with the first column giving the count of successes and the second the count of failures for each binomial observation. a vector giving the proportion of successes, while the binomial denominators (total counts or numbers of trials) are given by the weights argument to glm. John Fox (McMaster University) Statistical Models in R ICPSR 2018 11 / 19 Generalized Linear Models in R GLMs for Binary/Binomial and Count Data Poisson generalized linear models are commonly used when the response variable is a count (Poisson regression) and for modeling associations in contingency tables (loglinear models). The two applications are formally equivalent. Poisson GLMs are fit in R using the poisson family generator with glm(). Overdispersed binomial and Poisson models may be fit via the quasibinomial and quasipoisson families. The glm.nb() function in the MASS package fits negative-binomial GLMs to count data. John Fox (McMaster University) Statistical Models in R ICPSR 2018 12 / 19

The Linear Mixed-Effects Model The Laird-Ware form of the linear mixed model: y ij = β 1 + β 2 x 2ij + + β p x pij + b 1i z 1ij + + b qi z qij + ε ij b ki N(0, ψ 2 k), Cov(b ki, b k i) = ψ kk b ki, b k i are independent for i = i ε ij N(0, σ 2 λ ijj ), Cov(ε ij, ε ij ) = σ 2 λ ijj ε ij, ε i j are independent for i = i John Fox (McMaster University) Statistical Models in R ICPSR 2018 13 / 19 The Linear Mixed-Effects Model where: y ij is the value of the response variable for the jth of n i observations in the ith of M groups or clusters. β 1, β 2,..., β p are the fixed-effect coefficients, which are identical for all groups. x 2ij,..., x pij are the fixed-effect regressors for observation j in group i; there is also implicitly a constant regressor, x 1ij = 1. b 1i,..., b qi are the random-effect coefficients for group i, assumed to be multivariately normally distributed, independent of the random effects of other groups. The random effects, therefore, vary by group. The b ik are thought of as random variables, not as parameters, and are similar in this respect to the errors ε ij. z 1ij,..., z qij are the random-effect regressors. The z s are almost always a subset of the x s (and may include all of the x s). When there is a random intercept term, z 1ij = 1. John Fox (McMaster University) Statistical Models in R ICPSR 2018 14 / 19

The Linear Mixed-Effects Model and: ψk 2 are the variances and ψ kk the covariances among the random effects, assumed to be constant across groups. In some applications, the ψ s are parametrized in terms of a smaller number of fundamental parameters. ε ij is the error for observation j in group i. The errors for group i are assumed to be multivariately normally distributed, and independent of errors in other groups. σ 2 λ ijj are the covariances between errors in group i. Generally, the λ ijj are parametrized in terms of a few basic parameters, and their specific form depends upon context. When observations are sampled independently within groups and are assumed to have constant error variance (as is typical in hierarchical models), λ ijj = 1, λ ijj = 0 (for j = j ), and thus the only free parameter to estimate is the common error variance, σ 2. If the observations in a group represent longitudinal data on a single individual, then the structure of the λ s may be specified to capture serial (i.e., over-time) dependencies among the errors. John Fox (McMaster University) Statistical Models in R ICPSR 2018 15 / 19 Fitting Mixed Models in R with the nlme and lme4 packages In the nlme package (Pinheiro, Bates, DebRoy, and Sarkar): lme(): linear mixed-effects models with nested random effects; can model serially correlated errors. nlme(): nonlinear mixed-effects models. In the lme4 package (Bates, Maechler, Bolker, and Walker): lmer(): linear mixed-effects models with nested or crossed random effects; no facility (yet) for serially correlated errors. glmer(): generalized-linear mixed-effects models. There are also Bayesian approaches to modeling hierarchical and longitudinal data that offer certain advantages; see in particular the rstan package that links R to the state-of-the-art STAN software for Bayesian modeling. John Fox (McMaster University) Statistical Models in R ICPSR 2018 16 / 19

A Mixed Model for the Exercise Data Longitudinal Model A level-1 model specifying a linear growth curve for log exercise for each subject: log -exercise ij = α 0i + α 1i (age ij 8) + ε ij Our interest in detecting differences in exercise histories between subjects and controls suggests the level-2 model α 0i = γ 00 + γ 01 group i + ω 0i α 1i = γ 10 + γ 11 group i + ω 1i where group is a dummy variable coded 1 for subjects and 0 for controls. John Fox (McMaster University) Statistical Models in R ICPSR 2018 17 / 19 A Mixed Model for the Exercise Data Laird-Ware form of the Model Substituting the level-2 model into the level-1 model produces log -exercise ij = (γ 00 + γ 01 group i + ω 0i ) in Laird-Ware form, + (γ 10 + γ 11 group i + ω 1i )(age ij 8) + ε ij = γ 00 + γ 01 group i + γ 10 (age ij 8) + γ 11 group i (age ij 8) + ω 0i + ω 1i (age ij 8) + ε ij Y ij = β 1 + β 2 x 2ij + β 3 x 3ij + β 4 x 4ij + δ 1i + δ 2i z 2ij + ε ij Continuous first-order autoregressive process for the errors: Cor(ε it, ε i,t+s ) = ρ(s) = φ s where the time-interval between observations, s, need not be an integer. John Fox (McMaster University) Statistical Models in R ICPSR 2018 18 / 19

A Mixed Model for the Exercise Data Specifying the Model in lme Using lme() in the nlme package: lme(log.exercise ~ I(age - 8)*group, random = ~ I(age - 8) subject, correlation = corcar1(form = ~ age subject) data=blackmoor) John Fox (McMaster University) Statistical Models in R ICPSR 2018 19 / 19