Lecture 3.1: Basic Logistic LDA
Longitudinal Data Analysis
The Johns Hopkins Graduate Summer Institute of Epidemiology and Biostatistics
Michael Griswold

Outline
- Quick refresher on ordinary logistic regression and Stata
- Women's employment example
- Cross-over trial LDA example

Logistic Regression refresher: Women's Employment status
Data on married women from the Women's Labor Force Participation dataset (Fox 1997): 263 Canadian women.
- workstat: employment status (0 = not working, 1 = working part-time, 2 = working full-time), recoded to binary (0 = not working, 1 = working)
- husbinc: husband's income in $1000s
- chilpres: child present in the household (dummy variable: 0, 1)

Women's Employment status: Data

. list obs workstat husbinc chilpres in 1/10

     +----------------------------------------+
     | obs      workstat   husbinc   chilpres |
     |----------------------------------------|
  1. |   1   Not Working        15    present |
  2. |   2   Not Working         1    present |
  3. |   3   Not Working        45    present |
  4. |   4   Not Working         2    present |
  5. |   5   Not Working        19    present |
     |----------------------------------------|
  6. |   6   Not Working         7    present |
  7. |   7   Not Working        15    present |
  8. |   8       Working         7    present |
  9. |   9   Not Working        15    present |
 10. |  10   Not Working         2    present |
     +----------------------------------------+
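The binary recoding described above (collapsing part-time and full-time into a single "working" category) is done in Stata before modeling; a minimal sketch of the same step in Python, with an invented helper name (`recode_binary`) and made-up example values for illustration:

```python
# Recode the 3-level workstat (0 = not working, 1 = part-time, 2 = full-time)
# into the binary outcome used in the logistic model (0 = not working, 1 = working).
def recode_binary(workstat):
    """Collapse part-time (1) and full-time (2) into 'working' (1)."""
    return 0 if workstat == 0 else 1

raw = [0, 1, 2, 0, 2]                      # hypothetical raw workstat codes
binary = [recode_binary(w) for w in raw]
print(binary)  # [0, 1, 1, 0, 1]
```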
Women's Employment status: Data (numeric codes)

. list obs workstat husbinc chilpres in 1/10, nolabel

     +-------------------------------------+
     | obs   workstat   husbinc   chilpres |
     |-------------------------------------|
  1. |   1          0        15          1 |
  2. |   2          0         1          1 |
  3. |   3          0        45          1 |
  4. |   4          0         2          1 |
  5. |   5          0        19          1 |
     |-------------------------------------|
  6. |   6          0         7          1 |
  7. |   7          0        15          1 |
  8. |   8          1         7          1 |
  9. |   9          0        15          1 |
 10. |  10          0         2          1 |
     +-------------------------------------+

Logistic regression model
    logit{Pr(y_i = 1 | x_i)} = β1 + β2·husbinc_i + β3·chilpres_i

. logit workstat husbinc chilpres

Logistic regression                               Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

    workstat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |  -.0423084   .0197801    -2.14   0.032    -.0810767   -.0035401
    chilpres |  -1.575648   .2922629    -5.39   0.000    -2.148473   -1.002824
       _cons |    1.33583   .3837634     3.48   0.000     .5836674    2.087992

- β2: log OR comparing odds of success for an additional $1000 of husband's income
- β3: log OR comparing odds of success for those who have children vs. those who don't
- β1: baseline log odds of success, for women whose husbands make $0 and who have no children

Logistic regression model (odds-ratio scale)

. logit workstat husbinc chilpres, or

Logistic regression                               Number of obs   =        263
                                                  LR chi2(2)      =      36.42
                                                  Prob > chi2     =     0.0000
Log likelihood = -159.86627                       Pseudo R2       =     0.1023

    workstat | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |   .9585741   .0189607    -2.14   0.032     .9221229    .9964661
    chilpres |   .2068734   .0604614    -5.39   0.000     .1166622    .3668421

- exp(β2): OR comparing odds of success for an additional $1000 of husband's income
- exp(β3): OR comparing odds of success for those who have children vs. those who don't

Parameter interpretations in logistic regression
- Comparing women with and without a child at home whose husbands have the same income, the odds of working are about 5 times as high (1/0.21) for the women who don't have a child at home.
- Within the two groups of women (those with and without a child at home), each extra $1,000 of husband's income reduces the odds of working by about 4% [(1 - 0.96) x 100].
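The odds-ratio interpretations above come directly from exponentiating the fitted coefficients. A quick arithmetic check in Python, using the husbinc and chilpres coefficients retyped from the logit output above (so treat them as approximate):

```python
import math

# Fitted logit coefficients from the Stata output above
b_husbinc = -0.0423084   # per $1,000 of husband's income
b_chilpres = -1.575648   # child present vs. absent

or_husbinc = math.exp(b_husbinc)    # each extra $1,000 cuts the odds ~4%
or_chilpres = math.exp(b_chilpres)  # child at home cuts the odds ~5-fold

print(round(or_husbinc, 3))        # 0.959
print(round(1 / or_chilpres, 1))   # 4.8, i.e. odds ~5x higher without a child
```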
Standard errors
Exponentiating standard errors of regression coefficients is a no-no; use confidence intervals or hypothesis tests on the coefficient scale instead. For instance, the 95% confidence intervals in the output above were computed as
    exp{β̂ ± 1.96·SE(β̂)}
NOT:
    exp{β̂} ± 1.96·exp{SE(β̂)}

Visualization of the predicted probabilities from logistic regression
    π̂_i = exp(β̂1 + β̂2·husbinc_i + β̂3·chilpres_i) / [1 + exp(β̂1 + β̂2·husbinc_i + β̂3·chilpres_i)]

[Figure: predicted probability that the wife works vs. husband's income ($1000s, 0-50), separate curves for women with and without a child at home]

Predicted probabilities extrapolating outside the range of the data

[Figure: the same two curves extended over husband's income from -$100k to +$100k; observed data cover only a narrow central range, with "No Data" on either side]

Reminder - Extensions to Linear Regression
Usual Linear Regression (OLS)
1. Y_i = Xβ + ε_i
2. ε_i ~ N(0, σ²I)
Use a Marginal Model to estimate effects
1. Y_i = Xβ + ε_i
2. ε_i ~ N(0, V); V = σ²R, where R is a working correlation structure (ind, exch, ar, ...)
Use a Conditional Model to estimate effects
1. Y_i | u_i = Xβ + Z·u_i + ε_i
2. u_i ~ N(0, G); ε_i ~ N(0, σ²); ε_i & u_i independent
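The confidence-interval rule above can be checked numerically. Using the husbinc coefficient and standard error retyped from the logit fit (approximate values), the correct recipe reproduces the odds-ratio interval, while exponentiating the SE produces nonsense:

```python
import math

b, se = -0.0423084, 0.0197801  # husbinc coefficient and SE from the logit fit

# Correct: exponentiate the endpoints of the coefficient-scale CI
lo, hi = math.exp(b - 1.96 * se), math.exp(b + 1.96 * se)

# Wrong: exponentiating the SE itself gives a meaningless interval
wrong_lo = math.exp(b) - 1.96 * math.exp(se)

print(round(lo, 4), round(hi, 4))  # 0.9221 0.9965, matching the OR output
print(round(wrong_lo, 2))          # negative -- impossible for an odds ratio
```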
Extensions to Logistic Regression
Usual Logistic Regression
1. log{odds(Y_i = 1)} = Xβ, i.e. log[ Pr(Y_i = 1) / (1 - Pr(Y_i = 1)) ] = Xβ
Use a Marginal Model to estimate effects
1. log{odds(Y_i = 1)} = Xβ
2. Assoc(Y_i) = R, where R is a working association structure (log OR, corr, etc.)
Use a Conditional Model to estimate effects
1. log{odds(Y_i = 1 | u_i)} = Xβ + Z·u_i
2. u_i ~ N(0, G)

Logistic Regression Example: Cross-over trial
Data from a 2x2 crossover trial on cerebrovascular deficiency, adapted from Jones and Kenward, where treatments A and B are an active drug and placebo, respectively; the outcome indicates whether an electrocardiogram was judged abnormal (0) or normal (1).
Goal: to compare the effect of the active drug (A) and the placebo (B) on cerebrovascular deficiency.
Marginal Model:
1. log{odds(N_ij = 1)} = β0 + β1·Period_ij + β2·Trt_ij
2. Corr(N_i1, N_i2) = α (exch)
Conditional Model:
1. log{odds(N_ij = 1 | u_i)} = β0 + β1·Period_ij + β2·Trt_ij + u_i
2. u_i ~ N(0, σ²)

Ordinary Logistic

. logit res visit trt, or

Logistic regression                               Number of obs   =        134
                                                  LR chi2(2)      =       2.76
                                                  Prob > chi2     =     0.2514
Log likelihood = -81.94172                        Pseudo R2       =     0.0166

         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |    .760092   .2864164    -0.73   0.467      .361771    1.590795
         trt |    1.74745    .661255     1.48   0.140      .832419    3.668676

Marginal Logistic: exchangeable

. xtgee res visit trt, i(id) f(bin) l(logit) corr(exch) eform

GEE population-averaged model                     Number of obs      =     134
Link:           logit                             Obs per group: min =       2
Family:         binomial                                         avg =     2.0
Correlation:    exchangeable                                     max =       2
                                                  Wald chi2(2)       =    7.51
Scale parameter: 1                                Prob > chi2        =  0.0234

         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |   .7445206   .1724982    -1.27   0.203     .4727826    1.172443
         trt |   1.766264   .4120557     2.44   0.015      1.11809    2.790194

. xtcorr
Estimated within-id correlation matrix R:
        c1       c2
r1  1.0000
r2   0.624   1.0000
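The exchangeable correlation that xtcorr reports is essentially a moment estimator: average the cross-products of standardized (Pearson) residuals within each cluster. A stripped-down sketch on invented paired binary data, using the overall per-period success rate as a simple plug-in mean model (the real GEE uses the fitted regression means; this is illustration only):

```python
import math

# Toy paired binary outcomes: one (period-1, period-2) pair per subject.
pairs = [(1, 1), (0, 0), (1, 1), (0, 1), (1, 1), (0, 0), (1, 0), (1, 1)]

# Plug-in marginal means: overall success probability in each period.
n = len(pairs)
p1 = sum(y1 for y1, _ in pairs) / n
p2 = sum(y2 for _, y2 in pairs) / n

def std_resid(y, p):
    # Pearson residual for a Bernoulli outcome with mean p
    return (y - p) / math.sqrt(p * (1 - p))

# Exchangeable correlation: mean cross-product of within-pair residuals.
alpha = sum(std_resid(y1, p1) * std_resid(y2, p2) for y1, y2 in pairs) / n
print(round(alpha, 3))  # 0.467
```

Subjects who respond the same way in both periods push the estimate up; discordant pairs pull it down, exactly the intuition behind the within-id correlation matrix above.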
Marginal Logistic: exchangeable (same fit via xtlogit)

. xtlogit res visit trt, or pa i(id) corr(exch)

GEE population-averaged model                     Number of obs      =     134
Link:           logit                             Obs per group: min =       2
Family:         binomial                                         avg =     2.0
Correlation:    exchangeable                                     max =       2
                                                  Wald chi2(2)       =    7.51
Scale parameter: 1                                Prob > chi2        =  0.0234

         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |   .7445206   .1724982    -1.27   0.203     .4727826    1.172443
         trt |   1.766264   .4120557     2.44   0.015      1.11809    2.790194

Marginal Logistic: exchangeable, robust standard errors

. xtgee res visit trt, i(id) f(bin) l(logit) corr(exch) eform robust

GEE population-averaged model                     Number of obs      =     134
Link:           logit                             Obs per group: min =       2
Family:         binomial                                         avg =     2.0
Correlation:    exchangeable                                     max =       2
                                                  Wald chi2(2)       =    8.26
Scale parameter: 1                                Prob > chi2        =  0.0161
                                    (Std. Err. adjusted for clustering on id)

             |              Semi-robust
         res | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |   .7445206    .173376    -1.27   0.205      .471691    1.175157
         trt |   1.766264    .414156     2.43   0.015     1.115487    2.796704

RE Logistic

. xtlogit res visit trt, or

Random-effects logistic regression                Number of obs      =     134
Random effects u_i ~ Gaussian                     Obs per group: min =       2
                                                                 avg =     2.0
                                                                 max =       2
                                                  Wald chi2(2)       =    4.69
Log likelihood = -68.1187                         Prob > chi2        =  0.0960

         res |         OR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       visit |    .331355   .2752541    -1.33   0.184     .0650441    1.688029
         trt |    7.07957    6.58601     2.10   0.035       1.1439     43.8114
-------------+----------------------------------------------------------------
    /lnsig2u |    3.37476   .6731748                       2.05538     4.69414
-------------+----------------------------------------------------------------
     sigma_u |    5.40525    1.81934                      2.794544    10.45486
         rho |     .89879    .061246                      .7035979    .9707811

. display 5.4^2/(5.4^2 + 3.14^2/3)
.89870926

Latent Response Formulation: ICC = σ²_u / (σ²_u + π²/3)

Marginal -vs- Random Intercept Models: Cross-over Example

                Ordinary Logistic    Marginal (GEE)        Random-Effect
Variable        Regression           Logistic Regression   Logistic Regression
Period          0.76 (0.29) [0.467]  0.74 (0.17) [0.203]   0.33 (0.28) [0.184]
Treatment       1.75 (0.66) [0.140]  1.77 (0.41) [0.015]   7.08 (6.59) [0.035]
Assoc.          --                   0.624                 5.4 (1.8)

Entries are OR (SE) [p-value].
*RE model fit with random intercept, adaptive quadrature with 12 integration pts
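The rho reported by xtlogit is exactly the latent-response ICC, σ²_u / (σ²_u + π²/3), where π²/3 is the variance of a standard logistic error. Reproducing the display check at the bottom of that output in Python (sigma_u retyped from the output above):

```python
import math

sigma_u = 5.40525                    # random-intercept SD from the xtlogit output
latent_resid_var = math.pi ** 2 / 3  # variance of a standard logistic error

rho = sigma_u ** 2 / (sigma_u ** 2 + latent_resid_var)
print(round(rho, 4))  # 0.8988, matching the reported rho

# The slide's rounded-by-hand version: 5.4^2 / (5.4^2 + 3.14^2/3)
approx = 5.4 ** 2 / (5.4 ** 2 + 3.14 ** 2 / 3)
print(round(approx, 8))  # 0.89870926
```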
Marginal -vs- Random Intercept Model

    log{odds(Y_ij)} = β0 + β1·Trt    VS.    log{odds(Y_ij | u_i)} = β0 + β1·Trt + u_i

The marginal model compares population prevalences (Drug A vs. Placebo); the random-intercept model makes cluster-specific comparisons (each patient on Drug A vs. on Placebo).

[Figure: population prevalences vs. cluster-specific comparisons, Drug A vs. Placebo. Source: DHLZ 2002 (pg 15)]

Note: In the X-over trial we have observations on patients both on AND off the drug. Is that usually true?

Extras

Latent Response formulation: Logit
Another way to think of these models is to consider that underlying the observed dichotomous response (whether the woman works or not) there is an unobserved, or latent, continuous response y*_i representing the propensity to work. If this latent response is greater than zero, then the observed response is 1:

    y*_i = β1 + β2·x2i + β3·x3i + ε_i
    y_i = 1 if y*_i > 0
    y_i = 0 if y*_i ≤ 0

Logistic regression: ε_i has a standard logistic distribution:
    E(ε_i) = 0,  Var(ε_i) = π²/3
    Pr(ε_i < a) = exp(a) / [1 + exp(a)]

Latent Response formulation: Probit
Same latent-response setup, but ε_i has a standard normal distribution:
    ε_i ~ N(0, 1), so E(ε_i) = 0 and Var(ε_i) = 1
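The latent-response formulation can be verified by simulation: draw logistic errors, threshold the latent y*_i at zero, and compare the empirical Pr(y_i = 1) with the inverse logit of the linear predictor. The linear-predictor value here is an arbitrary illustration, not from the lecture's fit:

```python
import math
import random

random.seed(42)

def invlogit(a):
    return math.exp(a) / (1 + math.exp(a))

lp = 0.5  # arbitrary linear predictor beta1 + beta2*x2 + beta3*x3

# Draw standard logistic errors by inverse-CDF sampling: eps = log(u/(1-u))
n = 200_000
hits = 0
for _ in range(n):
    u = random.random()
    eps = math.log(u / (1 - u))   # standard logistic draw
    ystar = lp + eps              # latent propensity
    hits += 1 if ystar > 0 else 0

# Empirical Pr(y = 1) should land near invlogit(0.5) ~ 0.6225
print(round(hits / n, 3))
```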
Probit Regression
    Φ⁻¹{Pr(Y_i = 1)} = β0 + β1·x_i

[Figure: probit model for the probability that the wife works vs. husband's income. Note: I borrowed this figure from the MLMUS text]

Women's Employment status: probit

. glm workstat husbinc chilpres, link(probit) family(binom)

Generalized linear models                         No. of obs      =        263
Optimization : ML                                 Residual df     =        260
                                                  Scale parameter =          1
Deviance = 319.9597291                            (1/df) Deviance =   1.230614
Pearson  = 265.3451854                            (1/df) Pearson  =   1.020558

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = invnorm(u)              [Probit]

                                                  AIC             =   1.239391
Log likelihood = -159.9798646                     BIC             =    -1128.8

             |       OIM
    workstat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |  -.0242081   .0114252    -2.12   0.034    -.0466011   -.0018151
    chilpres |  -.9706164   .1769051    -5.49   0.000    -1.317444   -.6238887
       _cons |   .7981507   .2240082     3.56   0.000     .3591027    1.237199

Women's Employment status: Logit

. glm workstat husbinc chilpres, link(logit) family(binom)

Generalized linear models                         No. of obs      =        263
Optimization : ML                                 Residual df     =        260
                                                  Scale parameter =          1
Deviance = 319.7325378                            (1/df) Deviance =   1.229741
Pearson  = 265.961512                             (1/df) Pearson  =   1.022929

Variance function: V(u) = u*(1-u)                 [Bernoulli]
Link function    : g(u) = ln(u/(1-u))             [Logit]

                                                  AIC             =   1.238527
Log likelihood = -159.8662689                     BIC             =  -1129.028

             |       OIM
    workstat |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     husbinc |  -.0423084   .0197801    -2.14   0.032    -.0810768   -.0035401
    chilpres |  -1.575648   .2922629    -5.39   0.000    -2.148473   -1.002824
       _cons |    1.33583   .3837634     3.48   0.000     .5836674    2.087992

GLM: Logistic vs Probit
Probit:  Φ⁻¹{Pr(Y_i = 1)} = β0 + β1·x_i
Logit:   log{odds(Y_i = 1)} = log[ Pr(Y_i = 1) / (1 - Pr(Y_i = 1)) ] = β0 + β1·x_i

[Figure: predicted probability that the wife works vs. husband's income ($1000s, -100 to 100), logit link and probit link overlaid]

Note: the only difference is the link. Here, both give similar results.
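The closing point, that only the link differs and both give similar fitted probabilities, can be illustrated by pushing the two fitted linear predictors through their respective inverse links. Coefficients (constant, husbinc, chilpres) are retyped from the two glm outputs above, so treat them as approximate; Φ is computed via math.erf; the covariate values are arbitrary:

```python
import math

def invlogit(a):
    return math.exp(a) / (1 + math.exp(a))

def norm_cdf(a):
    # standard normal CDF (Phi) via the error function
    return 0.5 * (1 + math.erf(a / math.sqrt(2)))

# (constant, husbinc, chilpres) coefficients from the two glm fits above
logit_b = (1.33583, -0.0423084, -1.575648)
probit_b = (0.7981507, -0.0242081, -0.9706164)

covs = [(10, 0), (10, 1), (30, 1)]   # arbitrary (husbinc, chilpres) values
probs = []
for husbinc, chilpres in covs:
    lp_logit = logit_b[0] + logit_b[1] * husbinc + logit_b[2] * chilpres
    lp_probit = probit_b[0] + probit_b[1] * husbinc + probit_b[2] * chilpres
    probs.append((invlogit(lp_logit), norm_cdf(lp_probit)))

for (husbinc, chilpres), (p_logit, p_probit) in zip(covs, probs):
    print(husbinc, chilpres, round(p_logit, 3), round(p_probit, 3))
```

Across these covariate patterns the two fitted probabilities agree to within a few thousandths, which is the visual message of the overlaid curves.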