Model and Working Correlation Structure Selection in GEE Analyses of Longitudinal Data

The 3rd Australian and New Zealand Stata Users Group Meeting, Sydney, 5 November 2009

Dr Jisheng Cui
Public Health Research Cluster, Deakin University

1. Introduction

- GEE method: background information
- QIC: comparison with AIC
- Stata program: description of options
- Examples: longitudinal data analyses
- Summary

2. GEE method

Generalized Estimating Equations (GEE)
- Liang and Zeger 1986; Biometrika
- Zeger and Liang 1986; Biometrics
- Zeger, Liang, Albert 1988; Biometrics

Extension of GLM
- Nelder and Wedderburn 1972; JRSS A
- McCullagh and Nelder 1989; GLM (2nd Ed)

$$S(\beta; R, D) = \sum_{i=1}^{n} D_i' V_i^{-1} (Y_i - \mu_i) = 0$$

Part I: Working correlation structure R

$$V_i = A_i^{1/2} R(\alpha) A_i^{1/2}$$

- Estimates are consistent even if R is mis-specified
- Available structures: independent, exchangeable, autoregressive, stationary, non-stationary, unstructured
- Do we still need to specify R correctly?
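As a rough illustration of the formula for V_i, the sketch below builds two of the listed working correlation matrices and the resulting working covariance for one subject. It is written in plain Python rather than Stata and is not part of the talk; the function names, the three-visit subject, and the value alpha = 0.5 are all assumptions made for the example.

```python
import math

def exchangeable(n, alpha):
    """n x n exchangeable correlation: 1 on the diagonal, alpha elsewhere."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def ar1(n, alpha):
    """n x n first-order autoregressive correlation: alpha ** |j - k|."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]

def working_covariance(variances, corr):
    """V_i = A_i^{1/2} R(alpha) A_i^{1/2}, where A_i = diag(variances)."""
    sd = [math.sqrt(v) for v in variances]
    n = len(sd)
    return [[sd[j] * corr[j][k] * sd[k] for k in range(n)] for j in range(n)]

# Hypothetical subject with three visits and marginal variances 1, 4 and 9
# (for example Poisson means 1, 4, 9, since V(mu) = mu for that family).
variances = [1.0, 4.0, 9.0]
V_exc = working_covariance(variances, exchangeable(3, 0.5))
V_ar1 = working_covariance(variances, ar1(3, 0.5))

for row in V_exc:
    print(row)
```

Both matrices share the same diagonal (the marginal variances); only the off-diagonal terms change with the choice of R, which is the part QIC is later used to select.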

- Under the independence working model, the estimate of β is efficient if the correlations in R are not large (Zeger 1988; Biometrics)
- With time-varying covariates, the estimate of β may be inefficient (Fitzmaurice 1995; Biometrics)
- Population-averaged model

Part II: Mean and variance of the outcome

- Only the first two moments are specified
- The variance function differs by family

Table 1: Variance functions

Distribution         V(µ)
Bernoulli            µ(1 − µ)
Normal               1
Poisson              µ
Gamma                µ²
Negative binomial    µ + αµ²
Inverse Gaussian     µ³

Part III: Link function between the outcome and the covariates

- The link function differs by model: identity, log, logit, probit, power, reciprocal

Analysis of longitudinal data requires
- Subject ID
- Time variable
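For readers less familiar with link functions, here is a minimal Python sketch of five of the listed links together with their inverses. It is an illustration only and not taken from the talk: the dictionary name `LINKS` is an assumption, and the power link is left out because it needs an exponent that the slide does not specify.

```python
import math
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, used for the probit link

# name -> (g, g_inverse); g maps the mean mu to the linear predictor eta.
LINKS = {
    "identity":   (lambda mu: mu,
                   lambda eta: eta),
    "log":        (lambda mu: math.log(mu),
                   lambda eta: math.exp(eta)),
    "logit":      (lambda mu: math.log(mu / (1.0 - mu)),
                   lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit":     (lambda mu: _std_normal.inv_cdf(mu),
                   lambda eta: _std_normal.cdf(eta)),
    "reciprocal": (lambda mu: 1.0 / mu,
                   lambda eta: 1.0 / eta),
}

# Round trip: applying g and then its inverse should return the mean.
mu = 0.3
for name, (g, g_inv) in LINKS.items():
    print(name, g(mu), g_inv(g(mu)))
```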

Model selection in GEE

- No criterion was available between 1986 and 2001
- The Akaike Information Criterion (AIC) is not applicable: no distribution is assumed for GEE, so no likelihood is defined
- Quasi-likelihood:

$$Q(\mu) = \int_{y}^{\mu} \frac{y - t}{\phi V(t)} \, dt$$
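The quasi-likelihood integral can be checked numerically. The sketch below, in Python and not part of the talk, approximates Q(µ) with a composite Simpson rule for the Poisson variance function V(t) = t and compares it with the closed form of the same integral, y ln(µ/y) − (µ − y). The function name, the values y = 3 and µ = 5, and φ = 1 are assumptions chosen for illustration.

```python
import math

def quasi_likelihood(y, mu, variance, phi=1.0, steps=2000):
    """Composite Simpson approximation of
    Q(mu) = integral from y to mu of (y - t) / (phi * V(t)) dt."""
    if steps % 2:          # Simpson's rule needs an even number of intervals
        steps += 1
    h = (mu - y) / steps

    def integrand(t):
        return (y - t) / (phi * variance(t))

    total = integrand(y) + integrand(mu)
    for k in range(1, steps):
        weight = 4 if k % 2 else 2
        total += weight * integrand(y + k * h)
    return total * h / 3.0

# Hypothetical observation y = 3 with fitted mean mu = 5, Poisson variance.
y, mu = 3.0, 5.0
numeric = quasi_likelihood(y, mu, variance=lambda t: t)
closed = y * math.log(mu / y) - (mu - y)
print(numeric, closed)
```

The closed form differs from the tabulated Poisson expression y ln(µ) − µ only by terms in y alone, which do not affect estimation or model comparison.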

Under the independence correlation matrix:

$$Q(\beta, \phi; I, D) = \sum_{i=1}^{n} \sum_{j=1}^{n_i} Q(\beta, \phi; (Y_{ij}, X_{ij}))$$

- The GEE $S(\beta; I, D)$ is equivalent to $\partial Q(\beta, \phi; I, D) / \partial \beta$

Under a general correlation matrix:
- The quasi-likelihood may not exist (McCullagh and Nelder 1989)

- Pan (2001), in Biometrics, proposed QIC
- Hardin & Hilbe (2003), Generalized Estimating Equations
- Application was slow

3. QIC method

QIC: Quasi-likelihood under the Independence model Criterion

Table 2: Quasi-likelihood functions

Distribution         φQ(µ)
Bernoulli            $y \ln\left(\frac{\mu}{1-\mu}\right) + \ln(1-\mu)$
Normal               $-\frac{1}{2}(y-\mu)^2$
Poisson              $y \ln(\mu) - \mu$
Gamma                $-\left(y/\mu + \ln(\mu)\right)$
Negative binomial    $y \ln\left(\frac{\alpha\mu}{1+\alpha\mu}\right) - \alpha^{-1}\ln(1+\alpha\mu)$
Inverse Gaussian     $-\frac{y}{2\mu^2} + \frac{1}{\mu}$
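Each entry of the quasi-likelihood table should have derivative (y − µ)/V(µ) with respect to µ, where V(µ) is the matching variance function from Table 1. The Python sketch below checks this numerically with a central difference. It is an illustration rather than part of the talk; the dictionary names, the evaluation points, and the negative binomial parameter α = 0.5 are assumptions.

```python
import math

ALPHA = 0.5  # hypothetical negative binomial ancillary parameter

# Variance functions V(mu), as in Table 1.
VARIANCE = {
    "bernoulli": lambda mu: mu * (1.0 - mu),
    "normal":    lambda mu: 1.0,
    "poisson":   lambda mu: mu,
    "gamma":     lambda mu: mu ** 2,
    "negbin":    lambda mu: mu + ALPHA * mu ** 2,
    "invgauss":  lambda mu: mu ** 3,
}

# Quasi-likelihood contributions phi * Q(mu), as in Table 2.
QUASI = {
    "bernoulli": lambda y, mu: y * math.log(mu / (1.0 - mu)) + math.log(1.0 - mu),
    "normal":    lambda y, mu: -0.5 * (y - mu) ** 2,
    "poisson":   lambda y, mu: y * math.log(mu) - mu,
    "gamma":     lambda y, mu: -(y / mu + math.log(mu)),
    "negbin":    lambda y, mu: (y * math.log(ALPHA * mu / (1.0 + ALPHA * mu))
                                - math.log(1.0 + ALPHA * mu) / ALPHA),
    "invgauss":  lambda y, mu: -y / (2.0 * mu ** 2) + 1.0 / mu,
}

def score_gap(family, y, mu, h=1e-5):
    """|dQ/dmu - (y - mu)/V(mu)|, with dQ/dmu from a central difference."""
    q = QUASI[family]
    numeric = (q(y, mu + h) - q(y, mu - h)) / (2.0 * h)
    return abs(numeric - (y - mu) / VARIANCE[family](mu))

# Evaluation points: Bernoulli needs 0 < mu < 1, the others use mu = 1.5.
POINTS = {name: (2.0, 1.5) for name in QUASI}
POINTS["bernoulli"] = (1.0, 0.3)

gaps = {name: score_gap(name, *POINTS[name]) for name in QUASI}
for name, gap in gaps.items():
    print(f"{name:10s} gap = {gap:.2e}")
```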

QIC formula

$$\mathrm{QIC} = -2Q(\hat{\mu}; I) + 2\,\mathrm{trace}(\hat{\Omega}_I^{-1} \hat{V}_R)$$

- $\hat{\Omega}_I$: variance estimator under the independence correlation structure
- $\hat{V}_R$: robust variance estimator under the specified working correlation structure
- Used to select the correlation structure and the most parsimonious model

$$\mathrm{QIC}_u = -2Q(\hat{\mu}; I) + 2p$$

$$\mathrm{AIC} = -2LL + 2p$$
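To make the penalty term concrete, the following Python sketch evaluates QIC and QIC_u for a model with p = 2 coefficients. It is an illustration only: the quasi-likelihood value and both 2 x 2 matrices are made-up numbers, and the helper names are assumptions, not part of the qic program. It also shows that when the robust estimator coincides with the independence estimator, the trace equals p, so QIC and QIC_u agree.

```python
def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix stored as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def trace_of_product(a, b):
    """trace(a @ b) for square matrices stored as nested lists."""
    n = len(a)
    return sum(a[j][k] * b[k][j] for j in range(n) for k in range(n))

def qic(quasi_lik, omega_i, v_r):
    """QIC = -2 Q + 2 trace(Omega_I^{-1} V_R)."""
    return -2.0 * quasi_lik + 2.0 * trace_of_product(inverse_2x2(omega_i), v_r)

def qic_u(quasi_lik, p):
    """QIC_u = -2 Q + 2 p."""
    return -2.0 * quasi_lik + 2.0 * p

# Hypothetical values, chosen only to exercise the formulas.
quasi_lik = -10.0
omega_i = [[0.04, 0.01], [0.01, 0.09]]   # variance under independence
v_r = [[0.06, 0.01], [0.01, 0.12]]       # robust variance under working R

print(qic(quasi_lik, omega_i, v_r))      # penalty uses trace(Omega_I^{-1} V_R)
print(qic(quasi_lik, omega_i, omega_i))  # trace equals p when V_R = Omega_I
print(qic_u(quasi_lik, p=2))
```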

4. Stata program

    qic depvar [indepvars] [if] [in] [, i(varname_i) t(varname_t)
        family(familyname) link(linkname) corr(correlation)
        exposure(varname) offset(varname) noconstant force eform
        nodisplay nolog iterate(#) tolerance(#) trace]

5. An example

Longitudinal study of lung function decline
- Follow-up from 1995 until 2003
- 322 male participants (used for demonstration)

Variables of interest
- FEV1: Forced Expiratory Volume in one second
- Age at each interview
- Smoking status: current, former, nonsmoker
- Age at baseline
- Height at baseline

Two individuals

    +-----------------------------------------------------------+
    |      id   wave    fev    age   smoking   agebase   htbase |
    |-----------------------------------------------------------|
    | 1150024      0   4.58   26.0         0      26.0    1.806 |
    | 1150024      1   4.68   26.6         0      26.0    1.806 |
    | 1150024      2   4.35   27.5         0      26.0    1.806 |
    | 1150024      3   4.58   28.5         0      26.0    1.806 |
    | 1150024      4   4.74   29.4         0      26.0    1.806 |
    | 1150024      5   4.64   30.4         0      26.0    1.806 |
    | 1150024      6   4.68   31.3         0      26.0    1.806 |
    |-----------------------------------------------------------|
    | 1150032      0   4.49   17.3         0      17.3     1.72 |
    | 1150032      1   4.31   17.6         0      17.3     1.72 |
    | 1150032      2   4.36   18.6         1      17.3     1.72 |
    +-----------------------------------------------------------+

Stata command

    . qic fev age smoking2 smoking3 agebase htbase, i(id) t(wave) corr(exchangeable)

    GEE population-averaged model                   Number of obs      =      1141
    Group variable:                         id      Number of groups   =       322
    Link:                             identity      Obs per group: min =         1
    Family:                           Gaussian                     avg =       3.5
    Correlation:                   independent                     max =         9
                                                    Wald chi2(5)       =    701.01
    Scale parameter:                  .2505905      Prob > chi2        =    0.0000

    Pearson chi2(1141):                 285.92      Deviance           =    285.92
    Dispersion (Pearson):             .2505905      Dispersion         =  .2505905

    ------------------------------------------------------------------------------
             fev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.0344535   .0080028    -4.31   0.000    -.0501387   -.0187682
        smoking2 |  -.1772179   .0342687    -5.17   0.000    -.2443834   -.1100524
        smoking3 |   .0346093   .0540126     0.64   0.522    -.0712534     .140472
         agebase |   .0142697   .0082136     1.74   0.082    -.0018286     .030368
          htbase |   5.091886   .2154737    23.63   0.000     4.669566    5.514207
           _cons |  -3.888907   .3858109   -10.08   0.000    -4.645083   -3.132732
    ------------------------------------------------------------------------------

    GEE population-averaged model                   Number of obs      =      1141
    Group variable:                         id      Number of groups   =       322
    Link:                             identity      Obs per group: min =         1
    Family:                           Gaussian                     avg =       3.5
    Correlation:                  exchangeable                     max =         9
                                                    Wald chi2(5)       =    253.13
    Scale parameter:                  .2547384      Prob > chi2        =    0.0000

                                        (Std. Err. adjusted for clustering on id)
    ------------------------------------------------------------------------------
                 |             Semi-robust
             fev |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |  -.0354697   .0040787    -8.70   0.000    -.0434638   -.0274755
        smoking2 |  -.0483209     .02919    -1.66   0.098    -.1055323    .0088906
        smoking3 |  -.0098896    .022714    -0.44   0.663    -.0544083    .0346291
         agebase |   .0152248   .0054013     2.82   0.005     .0046385    .0258111
          htbase |   4.962204    .404878    12.26   0.000     4.168658     5.75575
           _cons |  -3.702856    .717312    -5.16   0.000    -5.108762    -2.29695
    ------------------------------------------------------------------------------

    QIC
    Corr   = exchangeable
    Family = gau
    Link   = iden
    p      = 6
    Trace  = 11.535
    QIC    = 313.727
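The reported Trace and QIC can be related through the QIC formula given earlier in the talk. The Python sketch below backs out the implied value of −2Q and from it a QIC_u for the same fit. This is arithmetic on the rounded figures printed above, so the derived numbers are approximate; the variable names are assumptions and the QIC_u value does not appear in the original output.

```python
# Figures reported for the exchangeable fit (rounded to three decimals).
reported_qic = 313.727
reported_trace = 11.535
p = 6  # number of regression parameters, including the constant

# QIC = -2Q + 2 * trace, so the quasi-likelihood term is recovered as:
neg_two_q = reported_qic - 2.0 * reported_trace

# QIC_u replaces the trace penalty with the parameter count p.
implied_qic_u = neg_two_q + 2.0 * p

print(f"-2Q   = {neg_two_q:.3f}")
print(f"QIC_u = {implied_qic_u:.3f}")
```

Here the trace (11.535) is larger than p (6), so the QIC penalty is heavier than the QIC_u penalty for this fit.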

    . qic fev age smoking2 smoking3 agebase htbase, i(id) t(wave) nodisplay

    QIC
    Corr   = exc
    Family = gau
    Link   = iden
    p      = 6
    Trace  = 11.535
    QIC    = 313.727

Table 2: QIC for selection of correlation structure

Correlation       Variable                         p    QIC
Independent       age, smoking, agebase, htbase    6    324.19
Exchangeable      age, smoking, agebase, htbase    6    313.73
Autoregressive    age, smoking, agebase, htbase    6    321.16
Stationary        age, smoking, agebase, htbase    6    322.66
Non-stationary    age, smoking, agebase, htbase    6    322.49
Unstructured      age, smoking, agebase, htbase    6    322.83
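The selection rule is to prefer the structure with the smallest QIC. A minimal Python sketch of that step, using the values from the table above (the dictionary name is an assumption):

```python
# QIC by working correlation structure, copied from the table above.
qic_by_structure = {
    "Independent": 324.19,
    "Exchangeable": 313.73,
    "Autoregressive": 321.16,
    "Stationary": 322.66,
    "Non-stationary": 322.49,
    "Unstructured": 322.83,
}

# Smaller QIC is better, so take the structure that minimises it.
best_structure = min(qic_by_structure, key=qic_by_structure.get)
print(best_structure, qic_by_structure[best_structure])
```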

    . qic fev age, i(id) t(wave) corr(exchangeable) nolog nodisplay

    QIC
    Corr   = exchangeable
    Family = gau
    Link   = iden
    p      = 2
    Trace  = 5.260
    QIC    = 454.081

Table 3: QIC for selection of the best model

Correlation     Variable                         p    QIC
Exchangeable    age                              2    454.08
Exchangeable    agebase                          2    455.19
Exchangeable    htbase                           2    340.47
Exchangeable    smoking                          3    469.37
Exchangeable    age, agebase                     3    449.41
Exchangeable    age, htbase                      3    314.94
Exchangeable    agebase, htbase                  3    317.44
Exchangeable    age, smoking                     4    452.72
Exchangeable    smoking, agebase                 4    455.58
Exchangeable    smoking, htbase                  4    342.20
Exchangeable    age, agebase, htbase             4    314.81
Exchangeable    age, smoking, agebase            5    447.22
Exchangeable    age, smoking, htbase             5    314.30
Exchangeable    smoking, agebase, htbase         5    318.51
Exchangeable    age, smoking, agebase, htbase    6    313.73
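To compare the candidate models, it can help to rank them by QIC and look at each model's gap from the best one. The Python sketch below does this for the fifteen exchangeable fits in the table above; the list layout and variable names are assumptions, and no cut-off for a "negligible" gap is implied by the talk.

```python
# (variables, p, QIC) for each exchangeable model in the table above.
models = [
    (("age",), 2, 454.08),
    (("agebase",), 2, 455.19),
    (("htbase",), 2, 340.47),
    (("smoking",), 3, 469.37),
    (("age", "agebase"), 3, 449.41),
    (("age", "htbase"), 3, 314.94),
    (("agebase", "htbase"), 3, 317.44),
    (("age", "smoking"), 4, 452.72),
    (("smoking", "agebase"), 4, 455.58),
    (("smoking", "htbase"), 4, 342.20),
    (("age", "agebase", "htbase"), 4, 314.81),
    (("age", "smoking", "agebase"), 5, 447.22),
    (("age", "smoking", "htbase"), 5, 314.30),
    (("smoking", "agebase", "htbase"), 5, 318.51),
    (("age", "smoking", "agebase", "htbase"), 6, 313.73),
]

# Rank by QIC (smaller is better) and report the gap to the best model.
ranked = sorted(models, key=lambda m: m[2])
best_qic = ranked[0][2]
for variables, p, value in ranked[:5]:
    label = ", ".join(variables)
    print(f"{label:32s} p={p}  QIC={value:7.2f}  delta={value - best_qic:5.2f}")
```

The full model has the smallest QIC, but the three-parameter model with age and htbase is only about 1.2 QIC units behind it, which is worth keeping in mind when parsimony matters.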

6. Summary

- Developed a new Stata program, qic
- Flexible and easy to use
- Includes all distribution families, link functions and correlation structures available in Stata