Matched Pair Data. Stat 557 Heike Hofmann

Outline
- Marginal homogeneity (review)
- Binary response with covariates
- Ordinal response
- Symmetric models
- Subject-specific vs. marginal models
- Conditional logistic regression

Matched Pair Data

                  2nd Rating
1st Rating         Approve  Disapprove
Approve                794         150
Disapprove              86         570

Assumptions:
- The diagonal is heavily loaded.
- Association is usually strongly positive (most people don't change their opinion).
- Distinguish between movers and stayers.
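The following is a quick numerical check of these observations. It is a minimal Python sketch using only the standard library, and the variable names are ours, not from the slides. It counts the stayers on the diagonal and computes the sample odds ratio between the two ratings.

```python
import math

# Approval ratings from the slide: rows = 1st rating, columns = 2nd rating
# (index 0 = Approve, index 1 = Disapprove)
table = [[794, 150],
         [86, 570]]

n = sum(sum(row) for row in table)      # number of respondents (pairs)
stayers = table[0][0] + table[1][1]     # same answer at both ratings
movers = table[0][1] + table[1][0]      # changed their answer

share_stayers = stayers / n
# Sample odds ratio between 1st and 2nd rating: a measure of the association
odds_ratio = (table[0][0] * table[1][1]) / (table[0][1] * table[1][0])

print(f"n = {n}, stayers = {stayers} ({share_stayers:.1%}), movers = {movers}")
print(f"odds ratio = {odds_ratio:.1f}, log odds ratio = {math.log(odds_ratio):.2f}")
```

For this table, roughly 85% of the 1600 respondents are stayers and the sample odds ratio is about 35. That is what a strongly positive association looks like here.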

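Marginal Homogeneity
$\operatorname{logit} P(Y_t = 1 \mid x_t) = \alpha + \beta x_t$, where $x_t$ is a dummy variable for the time points: $x_1 = 0$, $x_2 = 1$.
Then $\beta$ is the log odds ratio comparing the two time points in the overall population (a population-averaged effect); marginal homogeneity corresponds to $\beta = 0$.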
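For the approval table, the marginal model needs only the two margins. The sketch below (Python, standard library; names are ours) computes the sample estimate of $\beta$. It also computes McNemar's statistic, which is not on the slide. McNemar's statistic is the standard paired test of marginal homogeneity in a 2x2 table and uses only the discordant counts.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# Approval table from the slide: rows = 1st rating, columns = 2nd rating
n11, n12 = 794, 150   # 1st = Approve:    2nd = Approve, Disapprove
n21, n22 = 86, 570    # 1st = Disapprove: 2nd = Approve, Disapprove
n = n11 + n12 + n21 + n22

p1 = (n11 + n12) / n  # marginal P(Y_1 = 1): approval at the 1st rating
p2 = (n11 + n21) / n  # marginal P(Y_2 = 1): approval at the 2nd rating

# Estimate of beta in logit P(Y_t = 1 | x_t) = alpha + beta * x_t with x_1 = 0, x_2 = 1
beta_marginal = logit(p2) - logit(p1)

# McNemar's statistic: paired test of marginal homogeneity (p1 == p2);
# only the discordant cells n12 and n21 enter
mcnemar = (n12 - n21) ** 2 / (n12 + n21)

print(f"P(Y1=1) = {p1:.3f}, P(Y2=1) = {p2:.3f}")
print(f"marginal beta = {beta_marginal:.4f}, McNemar = {mcnemar:.2f} on 1 df")
```

Here $\hat\beta \approx -0.163$, because approval drops from 59% to 55%. McNemar's statistic is about 17.4 on 1 df.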

RAND American Life Panel
https://mmicdata.rand.org/alp/?page=election#electionforecast
- Panel of 3500 US citizens above 18, tracked since July.
- The data aren't published on an individual basis, but from the changes and the overall margins we can (almost) work out the pattern of change.

                     1 week after 1st debate
Before 1st debate        Obama    Romney
Obama                     1585       162
Romney                     121      1432
Total: 3300

> mswitch <- glm(I(candidate == "Obama") ~ time, data = votem, family = binomial(), weights = votes)
> summary(mswitch)
Call:
glm(formula = I(candidate == "Obama") ~ time, family = binomial(),
    data = votem, weights = votes)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-46.462  -22.929   -0.435   21.992   45.733
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.11771    0.03488   3.375 0.000738 ***
timevote2   -0.04981    0.04929  -1.010 0.312299
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 9135.4  on 7  degrees of freedom
Residual deviance: 9134.3  on 6  degrees of freedom
AIC: 9138.3
Number of Fisher Scoring iterations: 3
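The glm() fit above treats the 3300 answers at each wave as independent samples, so its estimates depend only on the two margins. The Python sketch below (standard library only; names are ours) recomputes them from the cross-classification on the RAND slide. It assumes that level vote1 of time is the wave before the first debate, which is the reading under which the numbers agree.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

# RAND table: rows = before 1st debate, columns = 1 week after
stay_obama, obama_to_romney = 1585, 162
romney_to_obama, stay_romney = 121, 1432
n = stay_obama + obama_to_romney + romney_to_obama + stay_romney  # 3300

obama_before = stay_obama + obama_to_romney   # Obama supporters at wave 1
obama_after = stay_obama + romney_to_obama    # Obama supporters at wave 2

intercept = logit(obama_before / n)
beta = logit(obama_after / n) - logit(obama_before / n)

# Standard error treating the two waves as independent binomial samples,
# which is what the glm() call assumes
se_beta = math.sqrt(1 / obama_before + 1 / (n - obama_before)
                    + 1 / obama_after + 1 / (n - obama_after))

print(f"intercept = {intercept:.5f}, beta = {beta:.5f}, se = {se_beta:.5f}")
```

The printed values match the Estimate and Std. Error columns above (0.11771, -0.04981, 0.04929). Because the same people answered at both waves, this standard error ignores the pairing.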

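Subject-Specific Model
$\operatorname{link} P(Y_{it} = 1) = \alpha_i + \beta x_t$, where $x_t$ is a dummy variable for the time points: $x_1 = 0$, $x_2 = 1$.
Then $\alpha_i = \operatorname{link} P(Y_{i1} = 1)$ and $\beta = \operatorname{link} P(Y_{i2} = 1) - \operatorname{link} P(Y_{i1} = 1)$.
Painful to fit: there is one parameter $\alpha_i$ per subject.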

Marginal vs. Subject-Specific Model
Estimates for $\beta$
- are identical for the marginal and the subject-specific model in the case of the identity link,
- are different for the logit link.
Marginal model: $\beta = \operatorname{logit} P(Y_2 = 1 \mid x_2) - \operatorname{logit} P(Y_1 = 1 \mid x_1)$
Subject-specific model, for all $i$: $\beta = \operatorname{logit} P(Y_{i2} = 1 \mid x_2) - \operatorname{logit} P(Y_{i1} = 1 \mid x_1)$
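Here is a small numerical illustration of the difference. The subject effects $\alpha_i$ below are hypothetical values chosen for illustration, not data. Every subject has the same effect $\beta$. Under the logit link, averaging the probabilities over subjects shrinks the log odds ratio towards 0. Under the identity link, the average difference equals $\beta$ exactly.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

# Logit link: every subject has log odds ratio beta between the two time points
beta = 1.0
alphas = [-3.0, 0.0, 3.0]   # hypothetical subject effects alpha_i
p1 = sum(expit(a) for a in alphas) / len(alphas)          # marginal P(Y_1 = 1)
p2 = sum(expit(a + beta) for a in alphas) / len(alphas)   # marginal P(Y_2 = 1)
beta_marginal_logit = logit(p2) - logit(p1)               # population-averaged effect

# Identity link: P(Y_it = 1) = alpha_i + beta_id * x_t, hypothetical alphas in [0, 1 - beta_id]
beta_id = 0.2
alphas_id = [0.1, 0.3, 0.5]
q1 = sum(alphas_id) / len(alphas_id)
q2 = sum(a + beta_id for a in alphas_id) / len(alphas_id)
beta_marginal_identity = q2 - q1   # averaging commutes with a linear link

print(f"logit link: subject-specific {beta}, marginal {beta_marginal_logit:.3f}")
print(f"identity link: subject-specific {beta_id}, marginal {beta_marginal_identity:.3f}")
```

With these values, the marginal log odds ratio is roughly 0.45, compared with a subject-specific $\beta = 1$.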

Subject-Specific Model
$\operatorname{logit} P(Y_{it} = 1) = \alpha_i + \beta x_t$
Assumptions (generally):
- responses from different subjects are independent;
- for all $i$, the responses at different time points are independent (given $\alpha_i$).

Subject-Specific Model
The violation of independence between the responses of one subject is taken care of by the model structure:
- Generally, $|\alpha_i| \gg |\beta|$.
- For large $|\alpha_i|$, the probability $P(Y_{it} = 1)$ is close to either 0 or 1 (largest dependence in the data).
- When $|\alpha_i|$ is small, we have the most variability between the responses of the same individual, i.e. the least dependence. These are the records on which the estimation of $\beta$ is based.

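Subject-Specific Model
$\operatorname{link} P(Y_{it} = 1) = \alpha_i + \beta x_t$
But: estimation of the $\alpha_i$ becomes problematic for large numbers of subjects, since the number of parameters grows with the number of subjects.
Idea: condition on the sufficient statistic for $\alpha_i$; this leads to conditional (logistic) regression.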

Likelihood for $\alpha_i$
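The slide's derivation did not survive extraction. Below is a standard reconstruction for the logit link with two time points, $x_1 = 0$ and $x_2 = 1$. It assumes the two responses of a subject are independent given $\alpha_i$; this is our sketch, not necessarily the slide's notation.

```latex
\begin{aligned}
P(Y_{i1} = y_{i1}, Y_{i2} = y_{i2})
  &= \frac{e^{\alpha_i y_{i1}}}{1 + e^{\alpha_i}} \cdot
     \frac{e^{(\alpha_i + \beta) y_{i2}}}{1 + e^{\alpha_i + \beta}}
   = \frac{\exp\{\alpha_i (y_{i1} + y_{i2}) + \beta y_{i2}\}}
          {(1 + e^{\alpha_i})(1 + e^{\alpha_i + \beta})}, \\[4pt]
P(Y_{i1} = 0, Y_{i2} = 1 \mid S_i = 1)
  &= \frac{e^{\alpha_i + \beta}}{e^{\alpha_i} + e^{\alpha_i + \beta}}
   = \frac{e^{\beta}}{1 + e^{\beta}} .
\end{aligned}
```

So $\alpha_i$ enters the likelihood only through $S_i = y_{i1} + y_{i2}$, and conditioning on $S_i$ removes it.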

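Fitting the Subject-Specific Model
Let $S_i = y_{i1} + y_{i2}$; then $S_i \in \{0, 1, 2\}$.
- The $S_i$ are sufficient statistics for the $\alpha_i$.
- Only pairs with $S_i = 1$ contribute to the estimation of $\beta$: given $S_i = 0$ or $S_i = 2$, both responses are determined.
- For the model $\operatorname{logit} P(Y_{it} = 1) = \alpha_i + \beta x_t$, conditioning on $S_i = 1$ removes $\alpha_i$: $\operatorname{logit} P(Y_{i2} = 1 \mid S_i = 1) = \beta$.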
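This simulation sketch (Python, standard library) shows that conditioning on $S_i = 1$ removes $\alpha_i$. It draws hypothetical subject effects $\alpha_i \sim N(0, 2^2)$ and generates the two responses independently given $\alpha_i$. It then keeps only the discordant pairs. The log ratio of the two discordant counts should be close to the true $\beta$ no matter how the $\alpha_i$ are distributed. The seed, sample size and effect sizes are arbitrary choices.

```python
import math
import random

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def simulate_discordant(n_subjects, beta, alpha_sd, seed=557):
    """Simulate logit P(Y_it = 1) = alpha_i + beta * x_t with x_1 = 0, x_2 = 1.

    alpha_i ~ Normal(0, alpha_sd^2) is a hypothetical choice; the two responses
    of a subject are drawn independently given alpha_i.
    Returns the discordant counts n12 = #(1, 0) and n21 = #(0, 1).
    """
    rng = random.Random(seed)
    n12 = n21 = 0
    for _ in range(n_subjects):
        alpha = rng.gauss(0.0, alpha_sd)
        y1 = rng.random() < expit(alpha)          # time 1: x_1 = 0
        y2 = rng.random() < expit(alpha + beta)   # time 2: x_2 = 1
        if y1 and not y2:
            n12 += 1
        elif y2 and not y1:
            n21 += 1
    return n12, n21

true_beta = 0.7
n12, n21 = simulate_discordant(200_000, true_beta, alpha_sd=2.0)
# Given S_i = 1, P(Y_i2 = 1) = expit(beta) for every alpha_i,
# so the log ratio of the discordant counts estimates beta
beta_hat = math.log(n21 / n12)
print(f"n12 = {n12}, n21 = {n21}, beta_hat = {beta_hat:.3f} (true {true_beta})")
```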

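Estimating $\beta$
- The conditional MLE for $\beta$ is $\hat\beta = \log(n_{21}/n_{12})$, where $n_{12}$ is the number of pairs with $(y_{i1}, y_{i2}) = (1, 0)$ and $n_{21}$ the number with $(y_{i1}, y_{i2}) = (0, 1)$.
- The standard error of the estimate is then $\sqrt{1/n_{12} + 1/n_{21}}$.
- Use clogit from the survival package to fit the model.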
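These two formulas are easy to code directly. Below is a minimal Python sketch; the function name `conditional_mle` is ours, not part of any package. It is applied to the approval data from the start of the section, with $Y = 1$ for Approve. In that table, 150 pairs changed from Approve to Disapprove ($n_{12}$) and 86 changed from Disapprove to Approve ($n_{21}$).

```python
import math

def conditional_mle(n12, n21):
    """Conditional MLE of beta and its standard error for a 2x2 matched-pairs table.

    n12: number of pairs with (y_i1, y_i2) = (1, 0)
    n21: number of pairs with (y_i1, y_i2) = (0, 1)
    Concordant pairs (S_i = 0 or 2) carry no information about beta.
    """
    if n12 == 0 or n21 == 0:
        raise ValueError("the estimate is infinite when a discordant count is zero")
    beta_hat = math.log(n21 / n12)
    se = math.sqrt(1.0 / n12 + 1.0 / n21)
    return beta_hat, se

# Approval data: Y = 1 means Approve
beta_hat, se = conditional_mle(n12=150, n21=86)
print(f"beta_hat = {beta_hat:.4f}, se = {se:.4f}, z = {beta_hat / se:.2f}")
```

This gives $\hat\beta = \log(86/150) \approx -0.556$ with a standard error of about 0.135. That is larger in magnitude than the marginal log odds ratio, $\operatorname{logit}(0.55) - \operatorname{logit}(0.59) \approx -0.163$. This matches what the logit-link comparison of marginal and subject-specific models leads one to expect.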

Navajo Indians
144 victims of myocardial infarction (MI cases) are matched with 144 disease-free control subjects according to gender and age. All participants were asked whether they had ever been diagnosed with diabetes:

                  Controls
Cases           Diabetes        No
Diabetes               9        37
No                    16        82

> myo.ml <- clogit(MI ~ diabetes + strata(pair), data = t103)
> summary(myo.ml)
Call:
coxph(formula = Surv(rep(1, 288L), MI) ~ diabetes + strata(pair),
    data = t103, method = "exact")
  n= 288
           coef exp(coef) se(coef)     z Pr(>|z|)
diabetes 0.8383    2.3125   0.2992 2.802  0.00508 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
         exp(coef) exp(-coef) lower .95 upper .95
diabetes     2.312     0.4324     1.286     4.157
Rsquare= 0.029   (max possible= 0.5 )
Likelihood ratio test= 8.55  on 1 df,   p=0.003449
Wald test            = 7.85  on 1 df,   p=0.005082
Score (logrank) test = 8.32  on 1 df,   p=0.003919
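With a single binary covariate, the clogit estimate depends only on the discordant pairs of the Navajo table. In 37 pairs only the case had diabetes, and in 16 pairs only the control had diabetes. The Python sketch below (standard library only; `statistics.NormalDist` needs Python 3.8+) reproduces the coefficient, standard error, Wald test and confidence interval. It uses the log of the ratio of the two discordant counts and the square root of the sum of their reciprocals.

```python
import math
from statistics import NormalDist

# Discordant pairs in the Navajo table
case_only = 37      # case had diabetes, matched control did not
control_only = 16   # control had diabetes, matched case did not

beta = math.log(case_only / control_only)      # conditional MLE of the log odds ratio
se = math.sqrt(1 / case_only + 1 / control_only)
z = beta / se
z975 = NormalDist().inv_cdf(0.975)             # standard normal quantile, about 1.96
lower = math.exp(beta - z975 * se)
upper = math.exp(beta + z975 * se)

print(f"coef = {beta:.4f}, exp(coef) = {math.exp(beta):.4f}, se = {se:.4f}, z = {z:.3f}")
print(f"95% CI for the odds ratio: {lower:.3f} to {upper:.3f}; Wald chi-square = {z * z:.2f}")
```

The results agree with the summary above: 0.8383, 0.2992, z = 2.802, and an interval of 1.286 to 4.157. The likelihood ratio and score tests in that output are not reproduced here.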

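Conditional Logistic Regression as GLM
Model: $\operatorname{logit} P(Y_{it} = 1) = \alpha_i + \beta_1 x_{1it} + \beta_2 x_{2it} + \dots + \beta_p x_{pit}$
Conditioned on one success ($S_i = 1$), with $\beta = (\beta_1, \dots, \beta_p)^\top$ and $x_{it} = (x_{1it}, \dots, x_{pit})^\top$:
$P(Y_{i1} = 1, Y_{i2} = 0 \mid S_i = 1) = \dfrac{1}{1 + \exp\big((x_{i2} - x_{i1})^\top \beta\big)}$
$P(Y_{i1} = 0, Y_{i2} = 1 \mid S_i = 1) = \dfrac{\exp\big((x_{i2} - x_{i1})^\top \beta\big)}{1 + \exp\big((x_{i2} - x_{i1})^\top \beta\big)}$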

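Conditional Logistic Regression as GLM
Rewrite: let
$Y_i^* = 1$ if $Y_{i1} = 0, Y_{i2} = 1$, and $Y_i^* = 0$ if $Y_{i1} = 1, Y_{i2} = 0$,
and $x_i^* = x_{i2} - x_{i1}$ for all $i$. Then
$\operatorname{logit} P(Y_i^* = 1) = \beta_1 x_{1i}^* + \beta_2 x_{2i}^* + \dots + \beta_p x_{pi}^*$,
i.e. a logistic regression without intercept.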
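The rewritten model can be fit with any logistic regression routine. The self-contained Python sketch below uses Newton-Raphson for the single-covariate, no-intercept case; the helper is ours, not an R or library function. It is applied to the Navajo data. There are 37 discordant pairs with $x^* = 1$, 16 with $x^* = -1$, and 9 + 82 = 91 concordant pairs with $x^* = 0$. All pairs have $y^* = 1$, as in the R code that follows.

```python
import math

def fit_logit_no_intercept(xs, ys, tol=1e-12, max_iter=50):
    """Newton-Raphson for logit P(y = 1) = beta * x (one covariate, no intercept).

    Returns the MLE of beta and its standard error from the observed information.
    """
    def score_info(beta):
        score = info = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-beta * x))
            score += x * (y - p)            # derivative of the log-likelihood
            info += x * x * p * (1.0 - p)   # minus its second derivative
        return score, info

    beta = 0.0
    for _ in range(max_iter):
        score, info = score_info(beta)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_info(beta)   # information at the final estimate
    return beta, math.sqrt(1.0 / info)

# x* = x_i2 - x_i1 for the 144 Navajo pairs; y* = 1 throughout
xstar = [-1] * 16 + [0] * 91 + [1] * 37
ystar = [1] * len(xstar)
beta_hat, se = fit_logit_no_intercept(xstar, ystar)
print(f"beta_hat = {beta_hat:.4f}, se = {se:.4f}")
```

Concordant pairs have $x^* = 0$ and contribute nothing to the score or the information, so the estimate is again $\log(37/16)$.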

> table(ystar)
ystar
  1
144
> table(xstar)
xstar
-1  0  1
16 91 37
Call:
glm(formula = ystar ~ xstar - 1, family = binomial(logit))
Deviance Residuals:
   Min      1Q  Median      3Q     Max
0.8478  0.8478  1.1774  1.1774  1.5477
Coefficients:
      Estimate Std. Error z value Pr(>|z|)
xstar   0.8383     0.2992   2.802  0.00508 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
    Null deviance: 199.63  on 144  degrees of freedom
Residual deviance: 191.07  on 143  degrees of freedom
AIC: 193.07
Number of Fisher Scoring iterations: 4
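The deviances in this output can also be checked by hand. The Python sketch below (standard library; names are ours) evaluates the log-likelihood of the no-intercept model at $\beta = 0$ and at $\hat\beta = \log(37/16)$. For 0/1 responses the saturated log-likelihood is 0, so each deviance is $-2$ times a log-likelihood.

```python
import math

# Distribution of x* over the 144 pairs, as in table(xstar); y* = 1 for every pair
counts = {-1: 16, 0: 91, 1: 37}

def loglik(beta):
    """Log-likelihood of logit P(y* = 1) = beta * x* when every y* equals 1."""
    # log P(y* = 1) = -log(1 + exp(-beta * x*))
    return sum(n * -math.log1p(math.exp(-beta * x)) for x, n in counts.items())

beta_hat = math.log(37 / 16)
null_deviance = -2 * loglik(0.0)           # linear predictor 0, i.e. p = 0.5 for every pair
residual_deviance = -2 * loglik(beta_hat)
aic = residual_deviance + 2 * 1            # one estimated parameter
lr_statistic = null_deviance - residual_deviance

print(f"null deviance = {null_deviance:.2f}, residual deviance = {residual_deviance:.2f}")
print(f"AIC = {aic:.2f}, deviance drop = {lr_statistic:.2f}")
```

The drop in deviance is about 8.55. That is the likelihood ratio statistic clogit reports for the same data.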

Matched Pairs: Ordinal
$Y_1$ and $Y_2$ are ordinal variables with $J > 2$ categories.
POLR model (marginal): $\operatorname{logit} P(Y_t \le j) = \alpha_j + \beta x_t$
Cumulative odds ratios are constant for all $j$:
$\log \theta_j = \log \dfrac{P(Y_2 \le j)/P(Y_2 > j)}{P(Y_1 \le j)/P(Y_1 > j)} = \beta (x_2 - x_1) = \beta$

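Marginal Homogeneity
Marginal homogeneity is equivalent to a zero log odds ratio:
$\beta = 0 \iff \operatorname{logit} P(Y_1 \le j) = \operatorname{logit} P(Y_2 \le j)$ for all $j$ $\iff P(Y_1 \le j) = P(Y_2 \le j)$ for all $j$ $\iff \pi_{j+} = \pi_{+j}$ for all $j$
Overall we have $2(J-1)$ degrees of freedom (two marginal distributions). The model is fit with $1 + (J-1)$ parameters ($\beta$ and $\alpha_1, \dots, \alpha_{J-1}$), so the model has $J-2$ degrees of freedom.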

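Models for Square Contingency Tables
For a nominal $Y$ with $J \ge 3$ categories, use category $J$ as the baseline.
Baseline logistic regression: $\log \dfrac{P(Y_t = j)}{P(Y_t = J)} = \alpha_j + \beta_j x_t$, $j = 1, \dots, J-1$
Testing $\beta_j = 0$ for all $j$ is a test of marginal homogeneity.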

Migration Data

                          Residence in 1985
Residence in 1980         NE      MW       S       W   Total
NE                     11607     100     366     124   12197
MW                        87   13677     515     302   14581
S                        172     225   17819     270   18486
W                         63     176     286   10192   10717
Total                  11929   14178   18986   10888   55981

95% of the data is on the diagonal, and marginal homogeneity seems given. Is the table even symmetric?
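The Python sketch below (standard library; names are ours) checks the margins and the share on the diagonal. It also computes sample versions of the baseline-logit effects $\beta_j$, with W as baseline and $x = 0$ for 1980 and $x = 1$ for 1985. These are point estimates only. A test of $\beta_j = 0$ would need standard errors that account for the pairing.

```python
import math

regions = ["NE", "MW", "S", "W"]
# rows = residence in 1980, columns = residence in 1985 (counts from the slide)
counts = [
    [11607,   100,   366,   124],
    [   87, 13677,   515,   302],
    [  172,   225, 17819,   270],
    [   63,   176,   286, 10192],
]

n = sum(map(sum, counts))
row_totals = [sum(row) for row in counts]          # 1980 margin
col_totals = [sum(col) for col in zip(*counts)]    # 1985 margin
diagonal_share = sum(counts[a][a] for a in range(len(regions))) / n

# beta_j = log[P(Y_2 = j)/P(Y_2 = W)] - log[P(Y_1 = j)/P(Y_1 = W)], baseline W
base = len(regions) - 1
beta_j = {
    regions[j]: math.log(col_totals[j] / col_totals[base])
                - math.log(row_totals[j] / row_totals[base])
    for j in range(base)
}

print(f"n = {n}, share on diagonal = {diagonal_share:.3f}")
print("1980 margin:", row_totals)
print("1985 margin:", col_totals)
print({k: round(v, 3) for k, v in beta_j.items()})
```

All three estimates are within 0.05 of zero, consistent with the impression of marginal homogeneity.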

Symmetry Model
$H_0: \pi_{ab} = \pi_{ba}$ for all $a, b$
- as logistic regression: $\log(\pi_{ab}/\pi_{ba}) = 0$ for all $a < b$
- as loglinear model: $\log m_{ab} = \mu + \mu_a^X + \mu_b^Y + \mu_{ab}^{XY}$ with $\mu_a^X = \mu_a^Y$ and $\mu_{ab}^{XY} = \mu_{ba}^{XY}$
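Under the symmetry model, the ML fitted values are $\hat m_{ab} = (n_{ab} + n_{ba})/2$ off the diagonal and $\hat m_{aa} = n_{aa}$. This is a standard result not stated on the slide. The Pearson statistic then reduces to Bowker's form $\sum_{a<b} (n_{ab} - n_{ba})^2/(n_{ab} + n_{ba})$ on $J(J-1)/2$ degrees of freedom. The Python sketch below (standard library; names are ours) computes it for the migration table and reports which pair of regions contributes most.

```python
regions = ["NE", "MW", "S", "W"]
# rows = residence in 1980, columns = residence in 1985 (counts from the slide)
counts = [
    [11607,   100,   366,   124],
    [   87, 13677,   515,   302],
    [  172,   225, 17819,   270],
    [   63,   176,   286, 10192],
]
k = len(regions)

# Fitted values under symmetry (the diagonal is fitted exactly)
fitted = [[(counts[a][b] + counts[b][a]) / 2 for b in range(k)] for a in range(k)]

# Pearson X^2 over the off-diagonal cells; diagonal cells contribute 0
pearson = sum((counts[a][b] - fitted[a][b]) ** 2 / fitted[a][b]
              for a in range(k) for b in range(k) if a != b)
df = k * (k - 1) // 2

# Contribution of each pair {a, b}: (n_ab - n_ba)^2 / (n_ab + n_ba)
contrib = {(regions[a], regions[b]):
           (counts[a][b] - counts[b][a]) ** 2 / (counts[a][b] + counts[b][a])
           for a in range(k) for b in range(a + 1, k)}
worst = max(contrib, key=contrib.get)

print(f"X^2 = {pearson:.1f} on {df} df")
print(f"largest contribution from {worst}: {contrib[worst]:.1f}")
```

The statistic is about 238 on 6 df, far larger than any usual chi-square critical value. The biggest single contribution comes from the MW and S pair.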

Migration Data
Symmetry seems to be violated: e.g. many more people moved MW -> S (515) than S -> MW (225).