Generalized Linear Mixed Models: Modern Concepts, Methods and Applications. Walter W. Stroup. Texts in Statistical Science.


Texts in Statistical Science
Generalized Linear Mixed Models: Modern Concepts, Methods and Applications
Walter W. Stroup
CRC Press, Taylor & Francis Group
Boca Raton London New York
CRC Press is an imprint of the Taylor & Francis Group, an informa business
A CHAPMAN & HALL BOOK

Contents

Preface
Acknowledgments

Part I The Big Picture

1. Modeling Basics
1.1 What Is a Model?
1.2 Two Model Forms: Model Equation and Probability Distribution
1.2.1 Twist Illustrating the Weakness of the Model Equation Form
1.3 Types of Model Effects
1.3.1 Extension of the Linear Regression Example to Illustrate an Important Distinction between Types of Model Effects
1.4 Writing Models in Matrix Form
1.4.1 Fixed-Effects-Only Models
1.4.2 Mixed Models: Models with Fixed and Random Effects
1.5 Summary: Essential Elements for a Complete Statement of the Model
Exercises

2. Design Matters
2.1 Introductory Ideas for Translating Design and Objectives into Models
2.1.1 Chapter Organization
2.2 Describing "Data Architecture" to Facilitate Model Specification
2.2.1 Every Data Set Has a "Plot Plan"
2.2.2 Terminology for Treatment and Design Structure
2.2.3 Nested and Cross-Classification: Alternative Ways of Organizing Treatment and Design Structure
2.3 From Plot Plan to Linear Predictor
2.3.1 Unit of Replication Approach
2.3.2 "What Would Fisher Do?"
2.3.2.1 Complication
2.3.2.2 Linear Predictor for Nested Schools
2.3.3 Matching the Objective
2.4 Distribution Matters
2.4.1 Model Effects: Fixed or Random
2.4.2 Response Variable Distribution
2.4.3 Fixed or Random?: Tough Calls
2.5 More Complex Example: Multiple Factors with Different Units of Replication
2.5.1 Variations on the Multifactor, Multisize Unit of Replication Theme
Exercises
2.A Appendix A: Common Response Variable (y|b) Distributions
2.B Appendix B: Communicating Your Model to Software or "How SAS PROC GLIMMIX 'Thinks'"
2.B.1 General Principles

3. Setting the Stage
3.1 Goals for Inference with Models: Overview
3.2 Basic Tools of Inference
3.2.1 Estimable Functions
3.2.2 Linear Combinations of Fixed and Random Effects: Predictable Functions
3.2.3 Three Issues for Inference
3.2.3.1 Model Scale vs. Data Scale
3.2.3.2 Inference Space
3.2.3.3 Inference Based on Conditional and Marginal Models
3.3 Issue I: Data Scale vs. Model Scale
3.3.1 Model Scale Estimation
3.3.2 Data Scale
3.4 Issue II: Inference Space
3.4.1 Broad Inference
3.4.2 Narrow Inference
3.4.2.1 GLIMMIX Implementation
3.5 Issue III: Conditional and Marginal Models
3.5.1 Normal Approximation
3.5.2 Binomial GLMM
3.5.3 Conditional and Marginal Distribution
3.5.4 Visualizing the Marginal p.d.f.
3.5.5 What Do the Normal Approximation and the GLMM Estimate?
3.5.6 Gaussian Conditional and Marginal Models
3.5.7 Non-Gaussian Marginal vs. Conditional Model
3.5.8 One Last Aspect of the Conditional Model: The Role of "Residual" in Gaussian LMM vs. One-Parameter Non-Gaussian GLMM
3.6 Summary
Exercises

Part II Estimation and Inference Essentials

4. Estimation
4.1 Introduction
4.2 Essential Background
4.2.1 Exponential Family
4.2.1.1 Essential Terminology and Results
4.2.2 Maximum Likelihood Estimation
4.2.3 Newton-Raphson and Fisher Scoring
4.2.4 Quasi-Likelihood
4.3 Fixed Effects Only
4.3.1 Relation to Least Squares Estimation
4.3.1.1 Pseudo-Likelihood for GLM
4.3.1.2 Gaussian Linear Models and Ordinary Least Squares
4.4 Gaussian Mixed Models
4.4.1 Mixed Model Equations for β and b
4.4.2 Relation to Least Squares
4.4.3 Unknown G and R: ML and REML Variance-Covariance Component Estimation
4.4.3.1 ANOVA Estimator
4.4.3.2 Maximum Likelihood
4.4.3.3 Restricted Maximum Likelihood
4.5 Generalized Linear Mixed Models
4.5.1 Pseudo-Likelihood for GLMM
4.5.2 Variance-Covariance Estimation with Pseudo-Likelihood
4.5.3 Integral Approximation: Laplace and Quadrature
4.6 Summary
Exercises

5. Inference, Part I: Model Effects
5.1 Introduction
5.2 Essential Background
5.2.1 Estimable and Predictable Functions
5.2.1.1 Estimability and GLMMs
5.2.2 Basics of Interval Estimates and Test Statistics
5.2.3 Approximate Distribution of Estimable and Predictable Functions
5.2.3.1 Distribution of β in the LM with Known V
5.2.3.2 Distribution of the Quadratic Form Defined on β for the LM with Known V
5.2.3.3 LM with Unknown V
5.2.3.4 LM with Unknown V: Case 1 V = Iσ²
5.2.3.5 LM with Unknown V: Case 2 All Covariance Components Must Be Estimated
5.2.3.6 GLM
5.2.3.7 GLM: Case 1 No Scale Parameter to Estimate
5.2.3.8 GLM: Case 2 Estimated Scale Parameter(s)
5.2.3.9 Mixed Models
5.3 Approaches to Testing
5.3.1 Likelihood Ratio and Deviance
5.3.2 Wald and Approximate F-statistics
5.3.2.1 A Special Case: The Gaussian LM with V = Iσ²
5.3.3 Multiple Effect Models and Order of Testing
5.4 Inference Using Model-Based Statistics
5.4.1 Naive Statistics and Degrees of Freedom
5.4.2 Satterthwaite Degrees of Freedom Approximation
5.4.3 Bias Correction for Model-Based Standard Errors and Test Statistics
5.5 Inference Using Empirical Standard Error
5.5.1 Sandwich (a.k.a. Robust or Empirical) Estimator
5.5.2 Bias Correction for Sandwich Estimators
5.6 Summary of Main Ideas and General Guidelines for Implementation
Exercises

6. Inference, Part II: Covariance Components
6.1 Introduction
6.2 Formal Testing of Covariance Components
6.2.1 ANOVA-Based Tests for Variance-Component-Only LMMs
6.2.2 Wald Statistics for Covariance Component Testing and Why They Should Not Be Used
6.2.3 Likelihood Ratio Tests for Covariance Components
6.2.3.1 One-Way ANOVA: Test for Homogeneity of Variance
6.2.3.2 Repeated Measures Example: Selecting a Parsimonious Covariance Model
6.2.4 Consequences of PL versus Integral Approximation for GLMMs
6.2.4.1 R-Side or Working Correlation Model
6.2.4.2 "What Would Fisher Do?" The G-Side Approach
6.2.4.3 R-Side versus G-Side: Consequences for Covariance Model Selection
6.3 Fit Statistics to Compare Covariance Models
6.3.1 AIC and AICC
6.3.2 BIC
6.3.3 Application to Comparison of Covariance Models
6.4 Interval Estimation
6.4.1 Wald Approach Based on the χ²
6.4.2 Likelihood-Based Approach
6.5 Summary
Exercises

Part III Working with GLMMs

7. Treatment and Explanatory Variable Structure
7.1 Types of Treatment Structures
7.2 Types of Estimable Functions
7.2.1 Relation to Classical ANOVA Reduction Sums of Squares
7.2.2 How Do We Know What We Are Testing?
7.2.3 How to Decide What to Test Rather than Letting It Be Decided for Us
7.2.4 Multiplicity
7.3 Multiple Factor Models: Overview
7.4 Multifactor Models with All Factors Qualitative
7.4.1 Review of Options
7.4.2 Tools for Qualitative Factorial Inference: "SLICE," "SLICEDIFF," and Other Tools
7.4.3 Multiplicity Adjustments
7.5 Multifactor: Some Factors Qualitative, Some Factors Quantitative
7.5.1 Generic Form of the Linear Predictor
7.5.2 Many Uses of the Generic Linear Predictor
7.5.2.1 Latent Growth Curve Models
7.5.2.2 Analysis of Covariance
7.5.2.3 Factorial Treatment Design
7.6 Multifactor: All Factors Quantitative
7.6.1 Second-Order Polynomial, a.k.a. Classical Response Surface Linear Predictors
7.6.2 Other Quantitative-by-Quantitative Models
7.6.2.1 Nonlinear Mean Models
7.6.2.2 Spline or Segmented Regression
7.7 Summary

8. Multilevel Models
8.1 Types of Design Structure: Single- and Multilevel Models Defined
8.2 Types of Multilevel Models and How They Arise
8.2.1 Units of Replication: Not Just in Designed Experiments
8.2.2 "What Would Fisher Do?" Revisited: Topographical and Treatment Component
8.3 Role of Blocking in Multilevel Models
8.3.1 "Block Effects Fixed vs. Block Effects Random" Revisited
8.3.2 Fixed Blocks, Multilevel Designs, and Spurious Nonestimability
8.4 Working with Multilevel Designs
8.4.1 Examples of Multilevel Structures
8.4.2 Multifactor Treatment and Multilevel Design Structures: How They Fit Together
8.5 Marginal and Conditional Multilevel Models
8.5.1 Gaussian Data
8.5.2 Non-Gaussian Models
8.6 Summary
Exercises

9. Best Linear Unbiased Prediction
9.1 Review of Estimable and Predictable Functions
9.2 BLUP in Random-Effects-Only Models
9.2.1 One-Way Random Effects Model
9.2.2 Two-Way Random Effects Nested Model
9.2.2.1 Analysis: Balanced Case
9.2.2.2 Unbalanced Case
9.3 Gaussian Data with Fixed and Random Effects
9.3.1 Mixed-Model Analysis with BLUP to Modify the Inference Space
9.3.2 Relationship between BLUP and Fixed Effect Estimators
9.4 Advanced Applications with Complex Z Matrices
9.5 Summary

10. Rates and Proportions
10.1 Types of Rate and Proportion Data
10.2 Discrete Proportions: Binary and Binomial Data
10.2.1 Pseudo-Likelihood or Integral Approximation
10.2.2 Example of Explanatory-Response Models
10.2.3 Models for Contingency Tables
10.3 Alternative Link Functions for Binomial Data
10.3.1 Role of "Residual" in Binomial Models
10.4 Continuous Proportions
10.4.1 Beta Distribution
10.4.2 Continuous Proportion Example Using the Beta Distribution
10.5 Summary
Exercises

11. Counts
11.1 Introduction
11.1.1 Count Data and the Poisson Distribution
11.1.2 Example Comparing Pre-GLM ANOVA-Based Analysis to Poisson GLM
11.2 Overdispersion in Count Data
11.2.1 Overdispersion Defined
11.2.2 Detecting Overdispersion
11.2.3 Strategies
11.2.3.1 Scale Parameter
11.2.3.2 "What Would Fisher Do?" Revisited
11.2.3.3 Alternative Distributions
11.3 More on Alternative Distributions
11.3.1 Negative Binomial
11.3.2 Generalized Poisson
11.4 Conditional and Marginal
11.5 Too Many Zeroes
11.5.1 Formal Description of Zero-Inflated and Hurdle Models
11.5.2 GLMM for Poisson and Negative Binomial Zero-Inflated and Hurdle Models
11.6 Summary
Exercises

12. Time-to-Event Data
12.1 Introduction: Probability Concepts for Time-to-Event Data
12.2 Gamma GLMMs
12.2.1 Hierarchical (Split-Plot) Gamma GLMM
12.2.1.1 What Happens If We Fit This Model Using a Gaussian LMM?
12.2.1.2 Gamma Generalized Linear Model
12.2.2 Response Surface for Time-to-Event: An Example Using the Box-Behnken Design
12.2.2.1 Gaussian LMM
12.2.2.2 Gamma GLMM
12.3 GLMMs and Survival Analysis
12.3.1 Basic Concepts and Terminology
12.3.2 Exponential Survival GLMM for Uncensored Data
12.3.3 Exponential Survival GLMM for Censored Data
12.4 Summary

13. Multinomial Data
13.1 Overview
13.2 Multinomial Data with Ordered Categories
13.3 Nominal Categories: Generalized Logit Models
13.4 Model Comparison
13.5 Summary
Exercises

14. Correlated Errors, Part I: Repeated Measures
14.1 Overview
14.1.1 What Are Repeated Measures/Longitudinal Data
14.1.2 Pre-GLMM Methods
14.2 Gaussian Data: Correlation and Covariance Models for LMMs
14.3 Covariance Model Selection
14.3.1 Why Does It Matter?
14.3.2 Covariance Model Selection Methods
14.4 Non-Gaussian Case
14.4.1 GEE-Type Models
14.4.2 GLMMs
14.5 Issues for Non-Gaussian Repeated Measures
14.5.1 How Do Correlated Errors Arise? Deciding What We Are Modeling
14.5.2 Covariance Model Selection and Non-Gaussian Repeated Measures
14.5.3 Inference Space, Standard Errors, and Test Statistics
14.6 Summary
Exercises

15. Correlated Errors, Part II: Spatial Variability
15.1 Overview
15.1.1 Types of Spatial Variability
15.1.2 Pre-GLMM Methods
15.1.2.1 Nearest-Neighbor Adjustment
15.1.2.2 Blocking
15.2 Gaussian Case with Covariance Model
15.2.1 Covariance Model Selection
15.2.2 Impact of Spatial Variability on Inference
15.3 Spatial Covariance Modeling by Smoothing Spline
15.4 Non-Gaussian Case
15.4.1 Randomized Complete Block Model
15.4.2 Incomplete Block Model
15.4.3 GLIMMIX Statements
15.4.3.1 RCB
15.4.3.2 Lattice Incomplete Blocks
15.4.4 GEE-Type "R-Side" Spatial Correlation Model
15.4.5 "G-Side" Spatial Correlation Model
15.4.5.1 G-Side Spatial Radial Smoothing Model
15.4.5.2 Relevant Output
15.5 Summary
Exercise

16. Power, Sample Size, and Planning
16.1 Basics of GLMM-Based Power and Precision Analysis
16.1.1 Essential GLMM Theory for Power and Precision Analysis
16.1.2 Using SAS PROC GLIMMIX to Implement a Power Analysis
16.2 Gaussian Example
16.3 Power for Binomial GLMMs
16.4 GLMM-Based Power Analysis for Count Data
16.5 Power and Planning for Repeated Measures
16.5.1 Straightforward Cases: Gaussian and One-Parameter Exponential Family
16.5.2 On the Frontier: The Two-Parameter Exponential Family
16.6 Summary
Exercises

Appendices: Essential Matrix Operations and Results
Appendix A: Matrix Operations
Appendix B: Distribution Theory for Matrices

References
Index