Lecture 3.1 Basic Logistic LDA


-- Longitudinal Data Analysis --
The Johns Hopkins Graduate Summer Institute of Epidemiology and Biostatistics
- Michael Griswold -

Outline
- Quick refresher on ordinary logistic regression and Stata
- Women's employment example
- Cross-over trial LDA example

Logistic regression refresher: Women's employment status

Data on 263 married Canadian women in 1997, from the Women's Labor Force Participation Dataset (Fox 1997).
- workstat: employment status (0: not working; 1: working part-time; 2: working full time), recoded to binary (0: not working; 1: working)
- husbinc: husband's income in $1,000s
- chilpres: child present in the household (dummy variable: 0/1)

Women's employment status: Data

. list obs workstat husbinc chilpres in 1/10

     +----------------------------------------+
     | obs      workstat   husbinc   chilpres |
     |----------------------------------------|
  1. |   1   Not Working        15    present |
  2. |   2   Not Working         1    present |
  3. |   3   Not Working        45    present |
  4. |   4   Not Working         2    present |
  5. |   5   Not Working        19    present |
     |----------------------------------------|
  6. |   6   Not Working         7    present |
  7. |   7   Not Working        15    present |
  8. |   8       Working         7    present |
  9. |   9   Not Working        15    present |
 10. |  10   Not Working         2    present |
     +----------------------------------------+

Women's employment status: Data (numeric codes)

. list obs workstat husbinc chilpres in 1/10, nolabel

     +-------------------------------------+
     | obs   workstat   husbinc   chilpres |
     |-------------------------------------|
  1. |   1          0        15          1 |
  2. |   2          0         1          1 |
  3. |   3          0        45          1 |
  4. |   4          0         2          1 |
  5. |   5          0        19          1 |
     |-------------------------------------|
  6. |   6          0         7          1 |
  7. |   7          0        15          1 |
  8. |   8          1         7          1 |
  9. |   9          0        15          1 |
 10. |  10          0         2          1 |
     +-------------------------------------+

Logistic regression model

    logit{Pr(y_i = 1 | x_2i, x_3i)} = beta_1 + beta_2*x_2i + beta_3*x_3i

. logit workstat husbinc chilpres

Logistic regression                               Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

------------------------------------------------------------------------------
    workstat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |  -.0423084   .0197801    -2.14   0.032    -.0810767   -.0035401
    chilpres |  -1.575648   .2922629    -5.39   0.000    -2.148473   -1.002824
       _cons |    1.33583   .3837632     3.48   0.000     .5836677   2.087992
------------------------------------------------------------------------------

- beta_2: log-OR comparing the odds of working for each additional $1,000 of husband's income
- beta_3: log-OR comparing the odds of working for women who have children vs. those who don't
- beta_1: baseline log-odds of working, i.e. for women whose husbands make $0 and who have no children

. logit workstat husbinc chilpres, or

Logistic regression                               Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

------------------------------------------------------------------------------
    workstat | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |   .9585741   .0189607    -2.14   0.032     .9221229    .9964661
    chilpres |   .2068734   .0604614    -5.39   0.000     .1166622    .3668421
------------------------------------------------------------------------------

- exp(beta_2): OR comparing the odds of working for each additional $1,000 of husband's income
- exp(beta_3): OR comparing the odds of working for women who have children vs. those who don't

Parameter interpretations in logistic regression
- Comparing women with and without a child at home whose husbands have the same income, the odds of working are about 5 (1/0.21) times as high for the women who don't have a child at home.
- Within the two groups of women (those who do and don't have a child), each extra $1,000 of husband's income reduces the odds of working by about 4% [(1 - 0.96) x 100].
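The interpretations above follow directly from exponentiating the fitted coefficients. As a quick check, here is a minimal Python sketch (not part of the lecture's Stata code), using the estimates as printed:

```python
import math

# Logit coefficients as printed in the Stata output above
b_husbinc = -0.0423084    # log-OR per extra $1,000 of husband's income
b_chilpres = -1.575648    # log-OR, child present vs. not

or_husbinc = math.exp(b_husbinc)     # ~0.959: odds drop ~4% per $1,000
or_chilpres = math.exp(b_chilpres)   # ~0.207
inv_or_chilpres = 1 / or_chilpres    # ~4.8: odds ~5x as high without a child

print(round(or_husbinc, 3), round(or_chilpres, 3), round(inv_or_chilpres, 1))
```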

Standard errors

Exponentiating standard errors of regression coefficients is a no-no, whether for confidence intervals or for hypothesis tests. For instance, the 95% confidence intervals in the output above were computed as

    exp{beta-hat +/- 1.96*SE(beta-hat)}

and NOT as

    exp{beta-hat} +/- 1.96*exp{SE(beta-hat)}

Visualization of the predicted probabilities from logistic regression

    pi-hat_i = exp(beta-hat_1 + beta-hat_2*x_2i + beta-hat_3*x_3i) / [1 + exp(beta-hat_1 + beta-hat_2*x_2i + beta-hat_3*x_3i)]

[Figure: predicted probability that the wife works vs. husband's income (0-50, in $1,000s), with separate curves for women with and without a child at home.]

Predicted probabilities extrapolating outside the range of the data

[Figure: the same two curves plotted over husband's income from -100 to 100 (in $1,000s); outside the observed income range there are no data, so the predicted probabilities there are pure extrapolation.]

Reminder - extensions to linear regression:

Usual linear regression (OLS):
  1. Y_i = X_i*beta + eps_i
  2. eps_i ~ N(0, I*sigma^2)

Use a marginal model to estimate effects:
  1. Y_i = X_i*beta + eps_i
  2. eps_i ~ N(0, V), with V = R*sigma^2; R is a working correlation structure (independent, exchangeable, AR, ...)

Use a conditional model to estimate effects:
  1. Y_i | u_i = X_i*beta + Z_i*u_i + eps_i
  2. u_i ~ N(0, G), eps_i ~ N(0, sigma^2), with eps_i and u_i independent
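The correct confidence-interval recipe can be verified in a few lines (a Python sketch, using the husbinc estimates as printed above):

```python
import math

# 95% CI for the husbinc odds ratio: transform the ENDPOINTS of the
# coefficient's CI; never work with exp(SE) itself.
b, se = -0.0423084, 0.0197801   # estimates as printed in the output above

lo = math.exp(b - 1.96 * se)    # exponentiate the lower endpoint
hi = math.exp(b + 1.96 * se)    # exponentiate the upper endpoint
print(round(lo, 3), round(hi, 3))   # matches the OR output: about (0.922, 0.996)
```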

Extensions to logistic regression

Usual logistic regression:
  1. log{odds(Y_i = 1)} = X_i*beta, i.e. log[Pr(Y_i = 1) / (1 - Pr(Y_i = 1))] = X_i*beta

Use a marginal model to estimate effects:
  1. log{odds(Y_i = 1)} = X_i*beta
  2. Assoc(Y_i) = R; R is a working association structure (log(OR), Corr, etc.)

Use a conditional model to estimate effects:
  1. log{odds(Y_i = 1 | u_i)} = X_i*beta + Z_i*u_i
  2. u_i ~ N(0, G)

Logistic regression example: cross-over trial

Data from the 2x2 crossover trial on cerebrovascular deficiency adapted from Jones and Kenward, where treatments A and B are the active drug and placebo, respectively; the outcome indicates whether an electrocardiogram was judged abnormal (0) or normal (1).
Goal: to compare the effect of an active drug (A) and a placebo (B) on cerebrovascular deficiency.

Marginal model:
  1. log{odds(Y_ij)} = beta_0 + beta_1*Period_ij + beta_2*Trt_ij
  2. Corr(Y_i1, Y_i2) = alpha (exchangeable)

Conditional model:
  1. log{odds(Y_ij | u_i)} = beta_0 + beta_1*Period_ij + beta_2*Trt_ij + u_i
  2. u_i ~ N(0, sigma^2)

Ordinary logistic:

. logit res visit trt, or

Logistic regression                               Number of obs   =        134
                                                  LR chi2(2)      =       2.76
                                                  Prob > chi2     =     0.2514
Log likelihood = -81.94172                        Pseudo R2       =     0.0166

------------------------------------------------------------------------------
         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |    .760092   .2864164    -0.73   0.467     .3631771   1.590795
         trt |    1.74745    .661255     1.48   0.140      .832419   3.668676
------------------------------------------------------------------------------

Marginal logistic: exchangeable

. xtgee res visit trt, i(id) f(bin) l(logit) corr(exch) eform

GEE population-averaged model                     Number of obs      =     134
Link:                          logit              Obs per group: min =       2
Family:                     binomial                             avg =     2.0
Correlation:            exchangeable                             max =       2
                                                  Wald chi2(2)       =    7.51
Scale parameter:                   1              Prob > chi2        =  0.0234

------------------------------------------------------------------------------
         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |   .7445206   .1724982    -1.27   0.203     .4727826    1.17244
         trt |   1.766264   .4120557     2.44   0.015     1.118093   2.790194
------------------------------------------------------------------------------

. xtcorr

Estimated within-id correlation matrix R:

            c1        c2
  r1    1.0000
  r2    0.624     1.0000
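The exchangeable working structure estimated by xtcorr is fully determined by a single parameter alpha, shared by every pair of visits within a patient. A minimal Python illustration (the alpha value is the estimate as printed):

```python
# Build the n x n exchangeable working correlation matrix R(alpha):
# 1 on the diagonal, alpha everywhere off the diagonal.
def exchangeable(n, alpha):
    return [[1.0 if i == j else alpha for j in range(n)] for i in range(n)]

R = exchangeable(2, 0.624)   # two visits per patient, alpha from xtcorr
print(R)                     # [[1.0, 0.624], [0.624, 1.0]]
```

With only two observations per cluster, exchangeable and unstructured working correlation coincide; the structure matters more with longer series.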

Marginal logistic: exchangeable (same model via xtlogit)

. xtlogit res visit trt, or pa i(id) corr(exch)

GEE population-averaged model                     Number of obs      =     134
Link:                          logit              Obs per group: min =       2
Family:                     binomial                             avg =     2.0
Correlation:            exchangeable                             max =       2
                                                  Wald chi2(2)       =    7.51
Scale parameter:                   1              Prob > chi2        =  0.0234

------------------------------------------------------------------------------
         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |   .7445206   .1724982    -1.27   0.203     .4727826    1.17244
         trt |   1.766264   .4120557     2.44   0.015     1.118093   2.790194
------------------------------------------------------------------------------

Marginal logistic: exchangeable, robust standard errors

. xtgee res visit trt, i(id) f(bin) l(logit) corr(exch) eform robust

GEE population-averaged model                     Number of obs      =     134
Link:                          logit              Obs per group: min =       2
Family:                     binomial                             avg =     2.0
Correlation:            exchangeable                             max =       2
                                                  Wald chi2(2)       =    8.26
Scale parameter:                   1              Prob > chi2        =  0.0161

                                  (Std. Err. adjusted for clustering on id)
------------------------------------------------------------------------------
             |              Semi-robust
         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |   .7445206      .1776     -1.27   0.205      .471691   1.175157
         trt |   1.766264    .414156     2.43   0.015     1.115487   2.796704
------------------------------------------------------------------------------

RE logistic:

. xtlogit res visit trt, re or

Random-effects logistic regression                Number of obs      =     134
Random effects u_i ~ Gaussian                     Obs per group: min =       2
                                                                 avg =     2.0
                                                                 max =       2
                                                  Wald chi2(2)       =    4.69
Log likelihood = -68.1187                         Prob > chi2        =  0.0960

------------------------------------------------------------------------------
         res |         OR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |     .33155   .2752541    -1.33   0.184     .0650441   1.688029
         trt |    7.07957    6.58601     2.10   0.035       1.1439    43.8114
-------------+----------------------------------------------------------------
    /lnsig2u |    3.37476    .671748                        2.0558   4.694134
-------------+----------------------------------------------------------------
     sigma_u |    5.40525     1.8194                      2.794544   10.45486
         rho |     .89879    .061246                      .7035979   .9707811
------------------------------------------------------------------------------

Latent-response formulation:  ICC = sigma^2 / (sigma^2 + pi^2/3)

. display 5.4^2/(5.4^2 + 3.14^2/3)
.89870926

Marginal -vs- random intercept models; cross-over example
(cell entries: estimate (SE) [p-value])

  Variable    Ordinary logistic      Marginal (GEE) logistic   Random-effect logistic*
  Period      0.76 (0.29) [0.467]    0.74 (0.17) [0.203]       0.33 (0.28) [0.184]
  Treatment   1.75 (0.66) [0.140]    1.77 (0.41) [0.015]       7.08 (6.58) [0.035]
  Assoc.      --                     0.624                     5.4 (1.8)

  *RE model fit with random intercept, adaptive quadrature with 12 integration points
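The `display` check above can be reproduced exactly from the latent-response ICC formula (a Python sketch, with sigma_u from the xtlogit output):

```python
import math

# Residual intraclass correlation on the latent-response scale:
# ICC = sigma_u^2 / (sigma_u^2 + pi^2/3), since a standard logistic
# error has variance pi^2/3.
sigma_u = 5.40525   # random-intercept SD from the xtlogit fit
icc = sigma_u**2 / (sigma_u**2 + math.pi**2 / 3)
print(round(icc, 4))   # ~0.8988, matching rho in the output above
```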

Marginal -vs- random intercept model

    log{odds(Y_ij)} = beta_0 + beta_1*Trt                  (marginal: population prevalences)
vs.
    log{odds(Y_ij | u_i)} = beta_0 + beta_1*Trt + u_i      (conditional: cluster-specific comparisons)

[Figure: population prevalences under drug A vs. placebo, contrasted with cluster-specific (within-patient) comparisons between drug A and placebo. Source: DHLZ 2002 (pg 15).]

Note: in the cross-over trial we have observations on patients both on AND off the drug. Is that usually true?

Extras

Latent-response formulation: logit

Another way to think of these models is that underlying the observed dichotomous response (whether the woman works or not), there is an unobserved or latent continuous response y_i*, representing the propensity to work. If this latent response is greater than zero, then the observed response is 1:

    y_i* = beta_1 + beta_2*x_i + eps_i
    y_i = 1 if y_i* > 0
    y_i = 0 if y_i* <= 0

Logistic regression: eps_i has a standard logistic distribution:
    E(eps_i) = 0,  Var(eps_i) = pi^2/3
    Pr(eps_i <= a) = exp(a) / [1 + exp(a)]

Latent-response formulation: probit

The same latent-response setup applies, with the same threshold rule for y_i.

Probit regression: eps_i has a standard normal distribution:
    eps_i ~ N(0, 1), so E(eps_i) = 0 and Var(eps_i) = 1
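The gap between the conditional treatment OR (7.08) and the marginal one (1.77) in the comparison table is roughly what the standard attenuation approximation predicts: the marginal logit coefficient is approximately the conditional one divided by sqrt(1 + c^2*sigma_u^2), with c = 16*sqrt(3)/(15*pi). This formula is a textbook approximation, not part of the slides; a Python sketch using the printed estimates:

```python
import math

# Attenuation approximation linking conditional (random-intercept) and
# marginal logit coefficients:
#   beta_marginal ~ beta_conditional / sqrt(1 + c^2 * sigma_u^2)
c = 16 * math.sqrt(3) / (15 * math.pi)   # ~0.588
sigma_u = 5.40525                        # random-intercept SD (xtlogit)
b_cond = math.log(7.08)                  # conditional log-OR for treatment

b_marg = b_cond / math.sqrt(1 + c**2 * sigma_u**2)
print(round(math.exp(b_marg), 2))        # ~1.8, close to the GEE OR of 1.77
```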

Probit regression

    Phi^{-1}{Pr(Y_i = 1)} = beta_0 + beta_1*x_i

[Figure: the probit model as a latent normal response crossing a threshold. Note: figure borrowed from the MLMUS text.]

Women's employment status: probit

. glm workstat husbinc chilpres, link(probit) family(binom)

Generalized linear models                         No. of obs      =        263
Optimization     : ML                             Residual df     =        260
                                                  Scale parameter =          1
Deviance         =  319.9597291                   (1/df) Deviance =   1.230614
Pearson          =  265.3451854                   (1/df) Pearson  =   1.020558

Variance function : V(u) = u*(1-u)                [Bernoulli]
Link function     : g(u) = invnorm(u)             [Probit]

                                                  AIC             =   1.239391
Log likelihood    = -159.9798646                  BIC             =    -1128.8

------------------------------------------------------------------------------
             |        OIM
    workstat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |  -.0242081   .0114252    -2.12   0.034    -.0466011   -.0018151
    chilpres |  -.9706163   .1769051    -5.49   0.000    -1.317344   -.6238887
       _cons |   .7981507   .2240082     3.56   0.000     .3591027    1.237199
------------------------------------------------------------------------------

Women's employment status: logit

. glm workstat husbinc chilpres, link(logit) family(binom)

Generalized linear models                         No. of obs      =        263
Optimization     : ML                             Residual df     =        260
                                                  Scale parameter =          1
Deviance         =  319.7325378                   (1/df) Deviance =   1.229741
Pearson          =  265.961512                    (1/df) Pearson  =   1.022929

Variance function : V(u) = u*(1-u)                [Bernoulli]
Link function     : g(u) = ln(u/(1-u))            [Logit]

                                                  AIC             =   1.238527
Log likelihood    = -159.8662689                  BIC             =  -1129.028

------------------------------------------------------------------------------
             |        OIM
    workstat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |  -.0423084   .0197801    -2.14   0.032    -.0810768   -.0035401
    chilpres |  -1.575648   .2922629    -5.39   0.000    -2.148473   -1.002824
       _cons |    1.33583   .3837634     3.48   0.000     .5836674   2.087992
------------------------------------------------------------------------------

GLM: logistic vs. probit

    Probit:  Phi^{-1}{Pr(Y_i = 1)} = beta_0 + beta_1*x_i
    Logit:   log{odds(Y_i = 1)} = log[Pr(Y_i = 1) / (1 - Pr(Y_i = 1))] = beta_0 + beta_1*x_i

[Figure: fitted probability that the wife works vs. husband's income ($1,000s) under the logit and probit links; the two curves nearly coincide.]

Note: the only difference is the link. Here, both give similar results.
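That the two links give similar fitted probabilities can be checked directly (a Python sketch using the glm coefficients as printed, evaluated for a woman with a child at home):

```python
import math

# Fitted probability that the wife works, under each link, for a woman
# with a child at home (chilpres = 1); x is husband's income in $1,000s.
def p_logit(x):
    eta = 1.33583 - 0.0423084 * x - 1.575648
    return math.exp(eta) / (1 + math.exp(eta))      # inverse logit

def p_probit(x):
    eta = 0.7981507 - 0.0242081 * x - 0.9706163
    return 0.5 * (1 + math.erf(eta / math.sqrt(2)))  # standard normal CDF

for x in (10, 30, 50):
    print(round(p_logit(x), 3), round(p_probit(x), 3))
```

Over the observed income range the two fitted curves agree to within about a percentage point, which is why the choice of link rarely matters for data like these.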