Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis



Chapter 1: background
Nominal, ordinal, interval data. Distributions: Poisson, binomial, multinomial. Basic ideas on MLE theory, CIs and testing. Pneumonia in calves example. You should know basic MLE/CLT theory as applied to the binomial distribution, and be able to construct simple Wald, LRT, and score tests. You should be able to perform a simple hypothesis test and form a CI from an estimate β̂_j and associated standard error se(β̂_j). Recall z_0.025 = 1.96.
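The Wald test and CI mentioned above can be sketched in a few lines. This is only an illustration: the function name `wald_summary` and the numbers 0.5 and 0.2 are hypothetical, not taken from the course examples.

```python
import math

def wald_summary(beta_hat, se, z_crit=1.96):
    """Wald z statistic, two-sided p-value, and CI for a single coefficient."""
    z = beta_hat / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (beta_hat - z_crit * se, beta_hat + z_crit * se)
    return z, p_value, ci

# Hypothetical estimate and standard error, for illustration only
z, p, ci = wald_summary(0.5, 0.2)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

If β̂_j is a log odds ratio, exponentiating the CI endpoints gives a CI for the odds ratio itself.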

Chapter 2: I × J tables
2 × 2 tables: odds ratio θ. X ⊥ Y ⇔ θ = 1. Differences in proportions, relative risk. Estimability of odds ratio from case/control data: interpretation flips. Types of sampling: multinomial, product multinomial, Poisson. 2 × 2 × K tables. Simpson's paradox: marginal association has a different direction than conditional association. Homogeneous association θ_XY(1) = ··· = θ_XY(K); conditional independence X ⊥ Y | Z. Conditional independence does not imply marginal independence. Death penalty example; infection cream across 8 clinics example. On to I × J tables. Ordinal trends: γ measure of concordance and polychoric correlation ρ_P.
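As a rough sketch of the three 2 × 2 summaries, the snippet below computes the difference in proportions, relative risk, and odds ratio from cell counts. The counts are made up for illustration. It also shows why the odds ratio can be estimated from case/control data: transposing the table (swapping the roles of rows and columns) leaves θ̂ unchanged, while the relative risk changes.

```python
def two_by_two_measures(n11, n12, n21, n22):
    """Difference in proportions, relative risk, and odds ratio (rows = groups)."""
    p1 = n11 / (n11 + n12)  # P(column 1 | row 1)
    p2 = n21 / (n21 + n22)  # P(column 1 | row 2)
    return {
        "diff": p1 - p2,
        "rr": p1 / p2,
        "or": (n11 * n22) / (n12 * n21),
    }

# Hypothetical counts: rows are exposure groups, columns are outcome yes/no
original = two_by_two_measures(20, 80, 10, 90)
# Same counts with rows and columns interchanged (conditioning on outcome)
transposed = two_by_two_measures(20, 10, 80, 90)

print(original)
print(transposed["or"])  # same odds ratio as in the original orientation
```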

Chapter 3: Estimation, testing in I × J tables
Estimation: OR, RR, difference in proportions for 2 × 2 tables. Testing independence: Pearson and LRT. Large-sample χ² approximation with (I − 1)(J − 1) degrees of freedom. If we reject H_0: X ⊥ Y, find out why: residuals and partitioning χ². I × J tables with ordinal outcomes. Focusing on a measure of association: γ̂, polychoric correlation ρ̂, Pearson correlation ρ̂ based on replacing outcomes with scores; all in PROC FREQ. Exact (incl. Fisher) tests of H_0: X ⊥ Y by conditioning on sufficient statistics (marginal totals).
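The Pearson and LRT statistics for independence can be computed directly from the observed counts and the fitted values n_i+ n_+j / n. The sketch below uses a made-up 2 × 2 table. The course itself used PROC FREQ, so this is only meant to show the arithmetic.

```python
import math

def independence_stats(table):
    """Pearson X^2, likelihood-ratio G^2, and df for an I x J table of counts."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            fitted = row_tot[i] * col_tot[j] / n  # expected count under independence
            x2 += (obs - fitted) ** 2 / fitted
            if obs > 0:  # a zero cell contributes nothing to G^2
                g2 += 2.0 * obs * math.log(obs / fitted)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return x2, g2, df

# Hypothetical counts, for illustration only
x2, g2, df = independence_stats([[10, 20], [20, 10]])
print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}, df = {df}")
```

Both statistics are compared to a χ² distribution with (I − 1)(J − 1) degrees of freedom when the fitted counts are large enough.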

Chapter 4: How does categorical response or counts change with predictors? GLMs
GLMs: basic notation. Binomial and Poisson regression; identity and canonical links. Crab satellite data! Deviance G², saturated model. Negative binomial regression (just mentioned). Bit of GLM theory: moments, fitting procedures, residuals. Quasi-likelihood adds dispersion φ. MOM estimation from large-sample theory; under overdispersion, inflate MLE SEs via φ̂. SCALE=PEARSON or SCALE=DEVIANCE.
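A minimal sketch of the Pearson-based dispersion adjustment for a Poisson model, assuming the fitted means are already available from some fitted GLM. The counts, fitted means, and function names below are hypothetical. The moment estimate is φ̂ = X²/(n − p), and the usual adjustment multiplies each model-based standard error by √φ̂.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Moment estimate of dispersion for a Poisson model: X^2 / (n - p)."""
    # Poisson variance function V(mu) = mu, so X^2 = sum (y - mu)^2 / mu
    x2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return x2 / (len(y) - n_params)

def inflate_se(se, phi_hat):
    """Quasi-likelihood standard error: model-based SE times sqrt(phi_hat)."""
    return se * math.sqrt(phi_hat)

# Hypothetical counts and fitted means from a one-parameter model
y = [0, 4, 2, 10, 4]
mu = [2.0, 2.0, 4.0, 4.0, 8.0]
phi_hat = pearson_dispersion(y, mu, n_params=1)
print(phi_hat, inflate_se(0.1, phi_hat))
```

A value of φ̂ well above 1 suggests overdispersion relative to the Poisson assumption var(y_i) = μ_i.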

Chapter 5: logistic regression I
Logistic regression with one predictor. Parameter estimates give odds ratios. Case/control (retrospective) sampling does not change the slope estimates (only the intercept estimate). More crab analyses. GOF for logistic regression. Grouped data versus ungrouped. Hosmer and Lemeshow. Categorical predictors; interactions; quadratic effects. Multiple predictors; type 3 tests. A bit on fitting.

Chapter 6: Logistic regression II
Building models. Hierarchical models. Backwards elimination; stepwise procedures. AIC. Crab data (yet again). Diagnostics: residuals (Pearson and standardized Pearson r_i), Cook's distance-type influence statistic c_i, Dfbeta_ij. Logistic regression residuals: LOESS smooth plot. Predictive ability assessment: ROC curve, CTABLE, default.

Chapter 6: Logistic regression II
2 × 2 × K tables: CMH (Cochran-Mantel-Haenszel) versus logistic approach. Estimation of stratum (block) effects (useful for model checking in GLMM!). Testing X ⊥ Y | Z: additive versus interaction alternatives. Clinical trial data on infection cream. Additive = homogeneous association: one overall treatment effect. Finite β̂. Sample size and power in study design.

Chapter 7: Logistic regression III, adding flexibility
Alternate links: probit, complementary log-log, Cauchy. Left out: nonparametric estimation of link. Small-sample testing of β_j in logistic regression. Bayesian approach works for small samples. Use of Jeffreys prior asymptotically (FIRTH in LOGISTIC) and for small samples (BAYES COEFPRIOR=JEFFREYS in GENMOD). Generalized additive models.

Chapter 8: extending the logistic regression model to nominal and ordinal multinomial outcomes
Baseline-category logit models for nominal multinomial response. Alligator food!!! Know how to write down model and obtain probabilities. Cumulative logit (proportional odds) models for ordinal multinomial response. log{ [P(Y ≤ j | x_1)/P(Y > j | x_1)] / [P(Y ≤ j | x_2)/P(Y > j | x_2)] } = β′(x_1 − x_2) is the log cumulative odds ratio. Latent variable motivation. Mental impairment example. Skipped discrete survival. Discrete choice model.
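Obtaining probabilities from these two models can be sketched as follows. The code assumes the parameterization logit P(Y ≤ j | x) = α_j + β′x for the cumulative logit model and takes the last category as the baseline for the baseline-category model. Software may use other sign or baseline conventions, and all parameter values here are hypothetical.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit(p):
    return math.log(p / (1.0 - p))

def cumulative_logit_probs(alphas, beta, x):
    """Category probabilities under logit P(Y <= j | x) = alpha_j + beta'x."""
    eta = sum(b * xi for b, xi in zip(beta, x))
    cum = [expit(a + eta) for a in alphas] + [1.0]  # P(Y <= J) = 1
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

def baseline_category_probs(etas):
    """Probabilities when log(pi_j / pi_J) = eta_j for j < J (last category is baseline)."""
    expo = [math.exp(e) for e in etas]
    denom = 1.0 + sum(expo)
    return [e / denom for e in expo] + [1.0 / denom]

# Hypothetical cutpoints and slope for a three-category ordinal response
alphas, beta = [-1.0, 1.0], [0.5]
p_x1 = cumulative_logit_probs(alphas, beta, [2.0])
p_x2 = cumulative_logit_probs(alphas, beta, [0.0])
# Log cumulative odds ratio at j = 1; equals beta'(x_1 - x_2) for every j
print(logit(p_x1[0]) - logit(p_x2[0]))
print(baseline_category_probs([0.0, 0.0]))
```

The log cumulative odds ratio does not depend on j, which is the proportional odds property.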

Chapter 11: matched pairs & marginal versus conditional modeling
Marginal analysis of dependent proportions. Prime minister approval rating data!!! McNemar's test of marginal homogeneity for 2 × 2 table. Conditional logistic regression. Matched case/control studies: gives different conditional likelihood than unmatched case/control data. Introduces idea of subject-specific effects u_i. In PROC LOGISTIC add a STRATA statement. From text: Conditional ML is also appropriate with retrospective sampling. In that case, bias can occur with a random effects approach because the clusters are not randomly sampled.
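McNemar's test uses only the discordant pairs. A small sketch, with made-up discordant counts n12 and n21 (not the prime minister data), using the large-sample χ² approximation with 1 df:

```python
import math

def mcnemar_test(n12, n21):
    """McNemar chi-square statistic and large-sample p-value (1 df)."""
    stat = (n12 - n21) ** 2 / (n12 + n21)
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical discordant counts, for illustration only
stat, p = mcnemar_test(25, 15)
print(f"z^2 = {stat:.2f}, p = {p:.4f}")
```

The concordant cells n11 and n22 do not enter the statistic.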

I × I tables
Marginal homogeneity (Stuart-Maxwell) in I × I table. Symmetry. κ statistic for rater agreement.
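For reference, Cohen's κ compares observed agreement with the agreement expected if the two raters were independent: κ = (P_o − P_e)/(1 − P_e). A short sketch on a hypothetical 2 × 2 rater table:

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts (rows: rater A, columns: rater B)."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    p_obs = sum(table[i][i] for i in range(len(table))) / n  # observed agreement
    p_exp = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    return (p_obs - p_exp) / (1.0 - p_exp)

# Hypothetical ratings, for illustration only
print(cohen_kappa([[20, 5], [10, 15]]))
```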

Chapter 12: Marginal modeling of clustered data: GEE approach
GEE approach to marginal modeling. Focuses on estimation of population-averaged (marginal) effects. Working correlation structures: exchangeable, AR(1), etc. Sandwich estimator ĉov(β̂) uses estimated working covariance matrix as well as empirical estimate; requires proper specification of the mean E(Y_ij) = g⁻¹(x_ij′β) to be valid (as most models do). Longitudinal mental depression data. Interaction of time and treatment. QIC. Markov transitional modeling for time series type Bernoulli data.

Chapter 13: Conditional modeling of clustered data: GLMMs
GLMM used a lot, and widespread use of random effects and latent variables models in general. Basic idea: random effect u_i induces positive correlation among repeated measurements in cluster i: (Y_i1, ..., Y_in_i). Can represent latent, unmeasured covariates or predisposition toward the event being modeled, e.g. level of sleeplessness, tolerance for pain, clinic population effect. Only looked at univariate u_i. Logistic-normal model. Marginal from conditional: P(Y_ij = 1) = E(Y_ij) ≈ exp(c x_ij′β) / (1 + exp(c x_ij′β)), where c = 1/√(1 + 0.6σ²).
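The marginal-from-conditional approximation on this slide is easy to evaluate numerically. The sketch below simply codes the slide's formula with its constant 0.6; the linear predictor value and σ are hypothetical. It illustrates that the marginal probability is pulled toward 0.5 as σ grows.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def approx_marginal_prob(eta, sigma):
    """Approximate marginal P(Y = 1) in a logistic-normal model.

    eta is the conditional (subject-specific) linear predictor x'beta;
    c = 1 / sqrt(1 + 0.6 sigma^2) is the attenuation factor from the slide.
    """
    c = 1.0 / math.sqrt(1.0 + 0.6 * sigma ** 2)
    return expit(c * eta)

# Hypothetical linear predictor, evaluated at several random-effect SDs
for sigma in (0.0, 1.0, 2.0):
    print(sigma, round(approx_marginal_prob(1.0, sigma), 4))
```

With σ = 0 the marginal and conditional probabilities coincide, which is one way to see why GEE and GLMM estimates differ more when the random-effect variance is large.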

More on GLMMs
Longitudinal mental depression example again. Differences in interpretation between GEE approach and GLMM. Clinical trials example again. Checking normality of random effects. 2 × 2 × K tables where stratum effect u_i modeled explicitly via random effects (homework problem). Testing H_0: σ = 0 in logistic-normal model from fitting model with and without random effects. Is Wald test from table of coefficients okay here? Left out: nonparametric modeling of random effects u_1, ..., u_n. Diagnostics. Multilevel models with layers of random effects. Other correlation models, e.g. temporal, spatial. PQL approach (fast, easy, and inaccurate; similar to GLIMMIX).
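One common way to test H_0: σ = 0 is a likelihood-ratio test comparing the fits with and without the random effect. Because σ = 0 lies on the boundary of the parameter space, the standard large-sample reference distribution is an equal mixture of a point mass at 0 and χ² with 1 df, so the naive χ²_1 p-value is halved. The sketch below assumes that is the intended test; the −2 log-likelihood values are hypothetical, not course output.

```python
import math

def lrt_sigma_zero(neg2loglik_without, neg2loglik_with):
    """LRT of H0: sigma = 0 with the boundary (50:50 mixture) correction."""
    stat = max(neg2loglik_without - neg2loglik_with, 0.0)
    if stat == 0.0:
        return stat, 1.0
    # Naive chi-square(1) upper tail: P(X > x) = erfc(sqrt(x / 2))
    p_naive = math.erfc(math.sqrt(stat / 2.0))
    return stat, 0.5 * p_naive

# Hypothetical -2 log-likelihoods from fits without and with the random effect
stat, p = lrt_sigma_zero(210.0, 207.294)
print(f"LR statistic = {stat:.3f}, mixture p-value = {p:.4f}")
```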

Chapters 9 and 10: log-linear models
Back to tables, but higher order than I × J, e.g. I × J × K × L. Model the cell counts directly as outcome. Every model implies a conditional dependence structure. Shorthand [ABD][CBD] implies? Collapsibility theorem. Diagnostics: {r_ijkl} and G².

Omitted or only briefly mentioned...
Various models: quasi-symmetry, quasi-independence, Bradley-Terry model (Chapter 11); adjacent categories logits, etc. Additive models: various alternative fitting approaches, interaction surfaces, etc. Marginal approach to tables via MLE (12.1). Much, much more... we scratched the surface but covered a lot of ground.

You should be able to...
Briefly describe in words what the polychoric correlation ρ̂, gamma statistic γ̂, and Pearson statistic ρ̂ (based on scores) measure. For what type of data are these measures valid? Show how odds ratio interpretation flips in 2 × 2 table. Be able to interpret a logistic regression model with numerous categorical predictors involving interactions. Be able to coherently interpret the residual and partitioning approaches to following up tests of independence in I × J tables. Patterns of residual signs? Be able to describe in words what the quasi-likelihood approach to modeling overdispersion does for Poisson and binomial data. That is, what is var(y_i) modeled as under the actual (real) probability models versus the quasi-likelihood approaches?

You should be able to...
Be able to interpret output for logistic regression models with logit and identity links. Be able to interpret output for Poisson regression models with log and identity links. Have good working knowledge of what the deviance and Pearson GOF tests measure and when you can trust the p-values. Have an idea of what the Hosmer and Lemeshow GOF test measures. What are the null and alternative hypotheses in all of these tests? Have an idea of when asymptotic χ² tests for H_0: X ⊥ Y are valid.

You should be able to...
Know these models: (a) logistic regression with continuous and categorical predictors, (b) baseline category logit for nominal and ordinal, (c) proportional odds for ordinal, (d) marginal and conditional (i.e. random effects) versions of logistic regression. Be able to obtain odds and probabilities, relative risks, et cetera from these models for any covariate combination. Marginal approaches (Chapters 11 and 12). Course focused more on GEE approach of Chapter 12. Understand what a handful of working correlation structures imply about clusters of outcomes. Be able to interpret SAS output. Understand the difference in interpretation between marginal and conditional approaches.

Fixed vs. random blocks...
Conditional approaches (Chapters 11 and 13). Course focused more on maximum likelihood approach to fully specified model (u_1, ..., u_n iid N(0, σ²)). Correct test for H_0: σ = 0. Interpretation and comparison to fixed effects analogue. I think of random effects as a sample from some large (theoretically infinite) population; if iid they imply exchangeability. Fixed effects are used if you do not have exchangeability (e.g. age groups), or there are only a few of them. Either way, these effects are usually of secondary interest (they imply blocks) to treatment or population effects.

We're done!
Thanks for taking STAT 770 and wading through to the end. Thanks to Shiwen Shen for being an outstanding TA! Have a great break!