ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox's regression analysis. Time dependent explanatory variables

Henrik Ravn, Bandim Health Project, Statens Serum Institut. 4 November

Outline
- Survival Data
- Example: Malignant Melanoma Data
- The Cox Model
- Cox in SAS
- Choice of Time-Scale
- Example: Guinea-Bissau Data
- Delayed entries
- Time dependent explanatory variables

$$\prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$

Survival Data
Time to death or other event of interest. One time-scale with a well-defined starting time (time origin):
- Time from start of randomized clinical trial to death.
- Time from first employment to pension.
- Time from filling of a tooth until the filling falls out.
What is special about survival data?
- Right-skewed: no problem.
- CENSORING: for some individuals we only know a lower bound on the lifetime.

Simple data
[Figure: follow-up of each individual against time (months).]

Survival and hazard function
Let $T$ be the TIME to the event of interest:
- $S(t) = P(T > t)$ = probability of survival to time $t$ after entry at time 0
- $\lambda(t)$ = incidence, rate, or hazard
Relationship:
$$S(t) = \exp\left(-\int_0^t \lambda(s)\,ds\right) = \exp(-\Lambda(t))$$
$\Lambda(t)$ is called the integrated hazard function.
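The relationship between hazard and survival can be checked numerically. The sketch below is only an illustration: the function name `survival_from_hazard`, the midpoint-rule integration, and the hazard value of 0.1 per month are assumptions made for the example and are not part of the slides.

```python
import math

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral of hazard from 0 to t) with the midpoint rule."""
    width = t / steps
    cumulative = sum(hazard((k + 0.5) * width) for k in range(steps)) * width
    return math.exp(-cumulative)

# Hypothetical constant hazard of 0.1 per month, evaluated at 12 months.
# For a constant hazard the closed form is S(t) = exp(-0.1 * t).
approx = survival_from_hazard(lambda s: 0.1, 12.0)
exact = math.exp(-0.1 * 12.0)
print(approx, exact)
```

Passing any other non-negative function of time as `hazard` gives the corresponding survival probability, which is all the formula $S(t) = \exp(-\Lambda(t))$ says.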

Constant hazard
[Figure with three panels, each against time $t$: hazard rate $\lambda(t) = \lambda$; survival function $S(t) = e^{-\lambda t}$; integrated hazard $\Lambda(t) = \lambda t$.]

Kaplan-Meier estimate of survival function
Death times $t_1, \dots, t_d$ (ordered). $Y(t_i)$ = number alive just before $t_i$.
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{1}{Y(t_i)}\right)$$
[Figure: risk sets, individuals against time (months).]
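A minimal sketch of the Kaplan-Meier product follows. The function name `kaplan_meier` and the five follow-up times are hypothetical. The code uses the factor $1 - d_i / Y(t_i)$, where $d_i$ is the number of deaths at $t_i$; this reduces to the formula above when there are no tied death times.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] at each distinct death time.

    times:  follow-up time of each individual
    events: 1 = death observed, 0 = censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    curve = []
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)  # Y(t_i): alive just before t_i
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical data: five individuals, times in months.
km_curve = kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0])
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored individuals never trigger a factor of their own, but they stay in the risk set (the denominator) until their censoring time.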

Kaplan-Meier survival estimate
[Figure: survival probability against time (months), with the number at risk shown below the time axis.]

Malignant Melanoma Data
During the study period a total of 205 patients had their tumor removed and were followed until the end of 1977. At the end of 1977:
- 57 had died of malignant melanoma (status=1)
- 134 were still alive (status=2)
- 14 had died of causes unrelated to malignant melanoma (status=3): a competing risk
Purpose: Study the effect on survival of sex, age, thickness of tumor, ulceration, etc.

Malignant melanoma
Data listing with columns: N, time, status, sex, age, year, thickness, ulcer

The Cox Model
The Cox model assumes that the rate for the $i$th individual is
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip})$$
where $\beta_1, \beta_2, \dots, \beta_p$ are regression parameters, $X_{i1}$ is the value of covariate 1 for individual $i$, etc. Finally, $\lambda_0(t)$ is the baseline hazard.
Time $t$ is the time-scale of choice, e.g. age, time since randomization, or time since operation.
As formulated here, the only quantity on the right-hand side of the equal sign that depends on time is the baseline hazard $\lambda_0(t)$.
If all covariates (the $X$s) are zero we get $\lambda_i(t) = \lambda_0(t)$. The baseline hazard is thus interpreted as the hazard of an individual who has all covariates equal to zero.

The Cox model
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip})$$
can also be written on the log scale (natural log):
$$\log(\lambda_i(t)) = \log\bigl(\lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip})\bigr) = \log(\lambda_0(t)) + \beta_1 X_{i1} + \beta_2 X_{i2} + \dots + \beta_p X_{ip}.$$
The Cox model assumes that
- the effects of covariates are additive and linear on the log rate scale, just as in Poisson regression.
- the CORNER, i.e. the baseline hazard, is non-parametric and depends on time, and time is thus adjusted for.
We now turn to the interpretation of the regression parameters $\beta_1, \beta_2, \dots, \beta_p$.

One binary covariate
To keep things simple we study the effect of one single binary covariate, e.g. sex, on the risk of dying:
$$X_i = \begin{cases} 0 & \text{if individual } i \text{ is a female} \\ 1 & \text{if individual } i \text{ is a male} \end{cases}$$
The Cox model is $\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$. With $X_i$ defined as above we get
$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$

Mortality Rate Ratio (Hazard Ratio)
If
$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$
then the RATE RATIO (RR) between males and females is
$$RR = \frac{\lambda_0(t) \exp(\beta)}{\lambda_0(t)} = \exp(\beta).$$
Importantly, the ratio is independent of time, i.e. we have PROPORTIONAL HAZARDS over time. The Cox model is also called the proportional hazards model.
How to estimate $\beta$? And what about the baseline hazard $\lambda_0(t)$?

Likelihood Function
The baseline hazard is regarded as a nuisance and is not in general estimated, but it is possible to estimate it.
Let $t_1, \dots, t_d$ be the ordered death times. It can be shown that all we need is to find the $\beta$ that maximizes the following function, called Cox's partial likelihood function:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
where $R(t_i)$ is the RISK SET at death time $t_i$, i.e. the set of individuals at risk of dying (under observation) just before time $t_i$.
The resulting estimate $\hat{\beta}$ is called the MAXIMUM LIKELIHOOD ESTIMATE of $\beta$.

Likelihood Function: a closer look
Death times $t_1, \dots, t_d$, numbering individuals with deaths first: $i = 1, 2, \dots, d, d+1, \dots, n$, with times and covariates
$$t_1, t_2, \dots, t_d, t_{d+1}, \dots, t_n$$
$$X_1, X_2, \dots, X_d, X_{d+1}, \dots, X_n$$
At each death time we have the RISK SET, i.e. the individuals alive and at risk of dying just before the death time:
$$R(t_1), R(t_2), \dots, R(t_d)$$

Risk sets
[Figure: individuals against time (months), showing the risk set at each death time.]

For the Cox model $\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$ we use the Cox likelihood function to estimate $\beta$:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)} = \frac{\exp(\beta X_1)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \times \frac{\exp(\beta X_2)}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \times \dots \times \frac{\exp(\beta X_d)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}$$
We index individuals in the risk sets using the letter $j$. Writing $\sum_{j \in R(t_1)} \exp(\beta X_j)$ means summing over the individuals in the risk set for death time $t_1$.
If we assume that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is
$$\exp(\beta X_1) + \exp(\beta X_2) + \dots + \exp(\beta X_n).$$

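For example, for the Cox model $\lambda_i(t) = \lambda_0(t) \exp(\beta \cdot \text{sex})$ with sex coded 1 = male, 0 = female, the likelihood function has numerator $\exp(\beta)$ when the death is a male and 1 when it is a female:
$$\frac{\exp(\beta)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \times \frac{1}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \times \dots \times \frac{\exp(\beta)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}.$$
If we again assume that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is
$$\sum_{j \in R(t_1)} \exp(\beta X_j) = N_M \exp(\beta) + N_F,$$
where $N_M$ and $N_F$ are the numbers of males and females, respectively, in $R(t_1)$.
The risk sets also play a crucial role in nested case-control studies (more on this later in the course).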
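To make the role of the risk sets concrete, the sketch below evaluates the log of the partial likelihood for a single covariate and maximizes it with a simple ternary search. The six follow-up times, the helper names, and the search interval of -5 to 5 are all assumptions for illustration; tied death times are assumed absent, and this is not how SAS fits the model internally.

```python
import math

def cox_log_partial_likelihood(beta, times, events, x):
    """log L(beta) for one covariate, assuming no tied death times."""
    loglik = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i != 1:
            continue  # censored individuals only contribute through risk sets
        # Risk set R(t_i): everyone still under observation just before t_i.
        denom = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        loglik += beta * x_i - math.log(denom)
    return loglik

def maximize_concave(f, lo=-5.0, hi=5.0, iters=200):
    """Ternary search for the maximum of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Hypothetical data: times in months, events 1 = dead, sex 1 = male, 0 = female.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 0, 1]
sex = [1, 0, 1, 1, 0, 0]

beta_hat = maximize_concave(lambda b: cox_log_partial_likelihood(b, times, events, sex))
print("beta_hat:", beta_hat, "hazard ratio:", math.exp(beta_hat))
```

Note that the baseline hazard never appears in the code: only the ordering of the death times and the composition of each risk set matter.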

So far the following assumptions have been made for the Cox model:
- The baseline hazard is assumed non-parametric, i.e. assumed to vary freely.
- The effects of covariates are additive and linear on the log rate scale.
- The ratio of the hazard rates for two subjects is constant over time. In other words, there is no interaction between the covariates and the time variable.
Let us look at the Melanoma data using SAS.

Kaplan-Meier survival estimates, by sex
[Figure: survival probability against time (years), one curve for females and one for males.]
What is the estimate of the RR between males and females?

Cox in SAS
In SAS, proc phreg and proc tphreg can be used for estimation in the Cox model. We will use proc tphreg, as this procedure handles categorical variables much more easily than proc phreg.
Using proc tphreg we define the variable sex to be categorical with the class statement. For the variable sex, 1 is males and 0 is females.

    proc tphreg data=melanom;
      class sex;
      model time*status(2,3) = sex;
    run;

Please note that we have two censoring codes, namely 2 and 3.
NB: In SAS 9.2 proc phreg now handles class variables and proc tphreg is obsolete.

Part of output from proc tphreg:
Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio
Rows: sex
The column Parameter Estimate is the estimate of $\beta$.
For a class variable SAS automatically chooses the highest value (here 1) as the reference. Thus, the rate ratio or Hazard Ratio is for females compared to males.
There is no estimate statement in proc (t)phreg, but a similar so-called contrast statement exists. Instead we can use the ref option in the class statement.
Note also the option risklimits in the model statement, which calculates the confidence interval for the hazard ratio.

    proc tphreg data=melanom;
      class sex(ref="0");
      model time*status(2,3) = sex / risklimits;
    run;

Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: sex

Melanoma data, thickness of tumor given by variable gtyk
$$\text{gtyk} = \begin{cases} 1 & \text{if } < 2 \text{ mm} \\ 2 & \text{if } 2\text{-}5 \text{ mm} \\ 3 & \text{if } > 5 \text{ mm} \end{cases}$$

    proc tphreg data=melanom;
      class gtyk;
      model time*status(2,3) = gtyk / risklimits;
    run;

Type 3 Tests
Columns: Effect, DF, Wald Chi-Square, Pr > ChiSq
Rows: gtyk (Pr > ChiSq <.0001)
Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: two gtyk parameters (levels compared with the reference level)

Melanoma data, + age in years

    proc tphreg data=melanom;
      class gtyk sex;
      model time*status(2,3) = gtyk sex age / risklimits;
    run;

Type 3 Tests
Columns: Effect, DF, Wald Chi-Square, Pr > ChiSq
Rows: sex, gtyk (Pr > ChiSq <.0001), age
Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: sex, two gtyk parameters, age

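Likelihood Ratio Test

    proc tphreg data=melanom;
      class gtyk sex;
      model time*status(2,3) = gtyk sex;
    run;

Model Fit Statistics
Columns: Criterion, Without Covariates, With Covariates
Rows: -2 LOG L, AIC, SBC

    proc tphreg data=melanom;
      class sex;
      model time*status(2,3) = sex;
    run;

Model Fit Statistics
Columns: Criterion, Without Covariates, With Covariates
Rows: -2 LOG L, AIC, SBC

The likelihood ratio statistic is the difference between the two models in -2 LOG L (with covariates): $LR = 28.0$, to be compared with a $\chi^2_2$ distribution (2 degrees of freedom, because the three-level variable gtyk adds two parameters).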

SAS: p-value from chi-square test

    data temp;
      chisquare=28;
      df=2;
      p=1-probchi(chisquare,df);  /* upper tail probability */
    run;
    proc print data=temp;
    run;

Output columns: Obs, chisquare, df, p
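The same tail probability can be reproduced without SAS. For an even number of degrees of freedom the upper tail of the chi-square distribution has a closed form, and for 2 degrees of freedom it is simply $\exp(-x/2)$. The function name `chi2_upper_tail_even_df` is hypothetical, and the sketch deliberately refuses odd degrees of freedom, where no such finite sum exists.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    # exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

p_value = chi2_upper_tail_even_df(28.0, 2)
print(p_value)  # equals exp(-14), roughly 8.3e-07
```

A p-value this small indicates that tumor thickness (gtyk) improves the fit considerably beyond sex alone.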

Choice of Time-Scale
A study may be conducted over calendar time even though the natural time-scale is time since treatment (Melanoma study).
Cohort studies are often conducted by recruiting a random sample of the population at the start of the study, and these subjects are then followed for a number of years (Framingham). A natural time-scale may be age rather than time in study, which is most often an artificial time-scale constructed by the investigators.
What would the time origin be if age was chosen as time-scale?

Vaccinations in Guinea-Bissau
Rural Guinea-Bissau: 5274 children under 7 months of age were visited twice at home, with an interval of six months.
Information about vaccination (BCG, DTP, measles vaccine) was collected at each visit, and at the second visit death during follow-up was registered.
Children who moved away during follow-up are censored; children who survived until the next visit are also censored.
Below are some of the variable names from the bissau data.

    fuptime   Follow-up time in days
    dead      0 = censored, 1 = dead
    bcg       1 = Yes, 2 = No
    agem      Age at first visit in months

Is the risk of dying associated with vaccination?

    Exposure              Died
    BCG vaccinated        125 (3.8%)
    Not BCG vaccinated     97 (4.9%)
    Total                 222 (4.2%)

    proc tphreg data=bissau;
      class bcg;
      model fuptime*dead(0)=bcg / rl;
    run;

Testing Global Null Hypothesis: BETA=0
Columns: Test, Chi-Square, DF, Pr > ChiSq
Rows: Likelihood Ratio, Score, Wald
Type 3 Tests
Columns: Effect, DF, Wald Chi-Square, Pr > ChiSq
Rows: bcg
Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: bcg

    proc tphreg data=bissau;
      class bcg agem;
      model fuptime*dead(0)=bcg agem / rl;
    run;

Type 3 Tests
Columns: Effect, DF, Wald Chi-Square, Pr > ChiSq
Rows: bcg, agem
Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: bcg, six agem parameters (levels compared with the reference level)

Delayed entries
[Figure with two panels: "Time in study", individuals against time (months); "Age as time", individuals against age (months).]

Subjects are only at risk from their age of entry and onwards. They are not at risk in our world of analysis before the age of entry!
Delayed entries are easily handled by careful control of the RISK SET $R(t_i)$ at death time $t_i$ in the likelihood function:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
Only individuals at risk and under observation are included in the risk set $R(t_i)$ at time $t_i$.
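The effect of delayed entry on the risk sets can be sketched in a few lines. The function name `risk_set`, the three subjects, and the convention that a subject is at risk at death age $t$ when entry < $t$ <= exit are assumptions made for this illustration.

```python
def risk_set(death_age, subjects):
    """Indices of subjects under observation just before death_age.

    subjects: list of (entry_age, exit_age) pairs on the age time-scale.
    A subject counts as at risk when entry_age < death_age <= exit_age.
    """
    return [i for i, (entry, exit_) in enumerate(subjects)
            if entry < death_age <= exit_]

# Hypothetical ages in months: subject 1 enters at 5, subject 2 at 8.
subjects = [(0, 10), (5, 12), (8, 9)]
print(risk_set(6, subjects))   # subject 2 has not entered yet
print(risk_set(9, subjects))   # all three are under observation
print(risk_set(11, subjects))  # only subject 1 remains
```

Without the `entry < death_age` condition, subject 2 would wrongly be counted as at risk at age 6, before entering the study.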

Delayed entries in SAS

    data bissau2;
      set bissau;
      outage=age+fuptime;  /* age at end of follow-up */
    run;

    proc tphreg data=bissau2;
      class bcg;
      model (age,outage)*dead(0)=bcg / rl;  /* entry age, exit age */
    run;

Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: bcg

Time dependent explanatory variables
The Cox model can be expanded to include time-varying covariates:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta X_i(t)).$$
The likelihood function for death times $t_1, \dots, t_d$ becomes
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i(t_i))}{\sum_{j \in R(t_i)} \exp(\beta X_j(t_i))}.$$
From this we can see that we only need to know the values of the covariates at the death times: $X_i(t_1), X_i(t_2), \dots, X_i(t_d)$. Covariate values at any time other than a death time are not used in the likelihood function.

The simplest time-varying covariate is a binary variable that is allowed to change once during follow-up, e.g. new BCG vaccinations registered between visits in the Bissau data:
$$X_i(t) = \begin{cases} 0 & \text{if no BCG before time } t \\ 1 & \text{if BCG-time} < t \end{cases}$$

A child being BCG-vaccinated after 3 months of follow-up.
[Figure: BCG status against follow-up (months).]
The time-varying covariate is 0 in the time interval 0 to 3 months and 1 for the rest of follow-up.
For a child who was BCG vaccinated before the first visit, the time-varying covariate is 1 during the whole follow-up.

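Multi-state Model
[Diagram: state 0 (Unexposed) to state 1 (Exposed) with rate $\lambda_{01}(t)$; state 0 (Unexposed) to state 2 (Dead) with rate $\lambda_{02}(t)$; state 1 (Exposed) to state 2 (Dead) with rate $\lambda_{12}(t)$.]
We want to compare $\lambda_{02}(t)$ and $\lambda_{12}(t)$.
The transition $\lambda_{01}(t)$ is not modeled here.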

Instead of time of follow-up we will use age as time-scale to illustrate the use of BCG as a time-varying covariate in the Bissau data.
At visit 2 the vaccination cards were seen for the children at home, and an age at BCG vaccination (bcgage) was calculated.
Data listing with columns: id, fuptime, dead, age, bcg, bcgage, outage

Binary time-varying covariate in SAS (I)

    proc tphreg data=bcg;
      if . < bcgage < outage then bcg_t=1;  /* vaccinated before the current death time */
      else bcg_t=0;
      model (age,outage)*dead(0)=bcg_t / rl;
    run;

Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: bcg_t

The if-statement

    if . < bcgage < outage then bcg_t=1; else bcg_t=0;

is recalculated at each death time. The outage from the model statement refers to the death time currently being evaluated (i.e. a $t_i$ in the likelihood).
For the first death time, which is $t_1 = 23$ days of age, the if-statement becomes

    if . < bcgage < 23 then bcg_t=1; else bcg_t=0;

and is calculated for all children at risk at age 23 days (in $R(t_1 = 23)$) with their individual bcgage values.
This is a recalculation of the time-varying covariate at each death time, cf. the likelihood function.
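The recalculation at each death time can be mimicked outside SAS. In the sketch below `None` plays the role of the SAS missing value (.), and the four children with their entry ages, exit ages and BCG ages (in days) are invented for illustration; only the rule "bcg_t = 1 when bcgage is non-missing and below the current death age" comes from the if-statement above.

```python
def bcg_t(bcgage, death_age):
    """Value of the time-varying covariate at the death time being evaluated."""
    return 1 if bcgage is not None and bcgage < death_age else 0

def covariates_at(death_age, children):
    """bcg_t for every child in the risk set at death_age.

    children: list of (entry_age, exit_age, bcgage) tuples; a child is at
    risk when entry_age < death_age <= exit_age.
    """
    return [bcg_t(bcgage, death_age)
            for entry, exit_, bcgage in children
            if entry < death_age <= exit_]

# Hypothetical children: (entry age, exit age, age at BCG or None), in days.
children = [(5, 23, None), (10, 200, 15), (20, 180, 60), (30, 150, None)]

print(covariates_at(23, children))   # covariate values at the first death time
print(covariates_at(100, children))  # the same children re-evaluated at age 100
```

The third child contributes 0 at age 23 and 1 at age 100: the same individual sits in different exposure groups in different risk sets.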

Binary time-varying covariate in SAS (II)
Split persons with a changing time-varying covariate into two records:
- from age to bcgage: bcgvacc=0, status=0
- from bcgage to outage: bcgvacc=1, status=dead
and use delayed entries. Thus, we need to generate a new data set.

    data splitbcg;
      set bcg;
      /* never vaccinated during follow-up: one unvaccinated record */
      if bcgage=. or bcgage>outage then do;
        bcgvacc=0; entryage=age; exitage=outage; status=dead; output;
      end;
      /* vaccinated before entry: one vaccinated record */
      if . < bcgage <= age then do;
        bcgvacc=1; entryage=age; exitage=outage; status=dead; output;
      end;
      /* vaccinated during follow-up: split into two records */
      if age < bcgage <= outage then do;
        bcgvacc=0; entryage=age;    exitage=bcgage; status=0;    output;
        bcgvacc=1; entryage=bcgage; exitage=outage; status=dead; output;
      end;
    run;

Data listing with columns: id, fuptime, dead, age, bcg, bcgage, outage, bcgvacc, entryage, exitage, status
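The record-splitting logic of the data step can be sketched in Python to see what each branch produces. The function name `split_record`, the use of `None` for a missing bcgage, and the example ages are assumptions; the three branches follow the three if-blocks of the data step.

```python
def split_record(age, outage, dead, bcgage):
    """Return the one or two records that the splitbcg step would output."""
    def record(bcgvacc, entryage, exitage, status):
        return {"bcgvacc": bcgvacc, "entryage": entryage,
                "exitage": exitage, "status": status}

    if bcgage is None or bcgage > outage:
        # Never vaccinated during follow-up.
        return [record(0, age, outage, dead)]
    if bcgage <= age:
        # Vaccinated before entry.
        return [record(1, age, outage, dead)]
    # age < bcgage <= outage: unvaccinated until bcgage, vaccinated afterwards.
    return [record(0, age, bcgage, 0),
            record(1, bcgage, outage, dead)]

# Hypothetical child: enters at 30 days, vaccinated at 90, dies at 200.
for row in split_record(age=30, outage=200, dead=1, bcgage=90):
    print(row)
```

The first record always ends with status 0, because the child is by construction alive at the age of vaccination; the death, if any, belongs to the second record.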

    proc tphreg data=splitbcg;
      class bcgvacc(ref="0");
      model (entryage,exitage)*status(0)=bcgvacc / rl;
    run;

Analysis of Maximum Likelihood Estimates
Columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits
Rows: bcgvacc

Other time-varying covariates
Effect of binary $X$ (0,1) changes at $t_0$:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 X_i I(t \ge t_0)),$$
where
$$I(t \ge t_0) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}$$
Can be handled by methods I and II.
Effect of binary $X$ (0,1) decreases or increases with time:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 (X_i \cdot t)).$$
Can be handled by method I, by splitting at each failure time, or by special options.

Stanford Heart Transplant Data (p. 235)
In a report (Crowley and Hu, J. Amer. Statist. Assoc. 1977) on the Stanford Heart Transplantation Study, patients identified as being eligible (N=103) for a heart transplant were followed until death or censoring.
In total 65 received a transplant during follow-up, whereas 38 did not.
Assess whether transplanted patients survive better.
On the next slide you will find the variables in the transplant data set.
Here we will discuss how to analyse the data, and at the exercises we will do some of the analyses.

Stanford Heart Transplant Data variables

    age       age (in years) at entry into the study
    cens      0 = censoring, 1 = dead
    days      number of days from entry to death/censoring
    trans     1 = the person had a heart transplantation, 0 = otherwise
    wait      number of days from entry to transplantation
              NB: if trans = 0 then wait = -1
    mismatch  1 = mismatch between HLA type in donor and patient, 0 = no mismatch
              NB: if trans = 0 then mismatch = …

Data listing with columns: Obs, age, cens, days, trans, wait, mismatch

Piecewise Constant Hazard Rate = Poisson regression
Divide the time scale into $K$ pieces and assume piecewise constant but different hazard rates in each of the intervals. This may provide a sensible summary of many phenomena and is often used in epidemiology.
[Diagram: the age axis is cut at $c_0 = 0, c_1, c_2, c_3, \dots, c_{K-1}, c_K$, with hazard rates $\lambda_1, \lambda_2, \lambda_3, \dots, \lambda_K$ on the successive intervals.]
Thus
$$\lambda(t) = \lambda_k \quad \text{for } t \in (c_{k-1}, c_k], \quad k = 1, \dots, K$$
The intervals do not need to be of the same length.
We only need to keep a record of the total number of deaths and the exposure time in each group.
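The bookkeeping of deaths and exposure time per interval can be sketched as follows. The function name `piecewise_rates`, the cut points, and the three subjects are hypothetical. The rate in each interval is estimated as deaths divided by exposure time, which is the maximum likelihood estimate of $\lambda_k$ under the piecewise constant model.

```python
def piecewise_rates(cuts, subjects):
    """Deaths, exposure time and estimated rate in each interval (c_{k-1}, c_k].

    cuts:     [c_0, c_1, ..., c_K], increasing
    subjects: list of (entry, exit, died) with died = 1 for death at exit
    """
    n_intervals = len(cuts) - 1
    deaths = [0] * n_intervals
    exposure = [0.0] * n_intervals
    for entry, exit_, died in subjects:
        for k in range(n_intervals):
            lo, hi = cuts[k], cuts[k + 1]
            # Time this subject spends under observation inside (lo, hi].
            exposure[k] += max(0.0, min(exit_, hi) - max(entry, lo))
            if died and lo < exit_ <= hi:
                deaths[k] += 1
    rates = [d / e if e > 0 else float("nan") for d, e in zip(deaths, exposure)]
    return deaths, exposure, rates

# Hypothetical data: ages in years, two intervals (0, 1] and (1, 5].
deaths, exposure, rates = piecewise_rates(
    [0, 1, 5], [(0, 0.5, 1), (0, 3, 0), (0, 5, 1)])
print(deaths, exposure, rates)
```

Because `entry` may be larger than 0, the same function also handles delayed entries: a subject simply contributes no exposure time before entering.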

We can further divide each interval into categories of covariates, e.g. sex (F = females, M = males):
[Diagram: the same age axis cut at $c_0 = 0, c_1, c_2, c_3, \dots, c_{K-1}, c_K$, with rates $\lambda_{1F}, \lambda_{2F}, \lambda_{3F}, \dots, \lambda_{KF}$ for females and $\lambda_{1M}, \lambda_{2M}, \lambda_{3M}, \dots, \lambda_{KM}$ for males.]
Splitting the time-scale in software:
- SAS: not straightforward, but so-called user-written SAS macros exist.
- Stata: use the stsplit command.
- R: packages exist (e.g. the Epi package).
- SPSS?


BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY

BIAS OF MAXIMUM-LIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca Lenz-Tönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1

Lab 11. Multilevel Models. Description of Data

Lab 11 Multilevel Models Henian Chen, M.D., Ph.D. Description of Data MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. ID: 386 women, id from 1 to 386, individual level (level

Appendix: Computer Programs for Logistic Regression

Appendix: Computer Programs for Logistic Regression In this appendix, we provide examples of computer programs to carry out unconditional logistic regression, conditional logistic regression, polytomous

Analysing categorical data using logit models

Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

Linear Regression Models P8111

Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )

Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376-390) BIO656 2009 Goal: To see if a major health-care reform which took place in 1997 in Germany was

Meta-analysis of epidemiological dose-response studies

Meta-analysis of epidemiological dose-response studies Nicola Orsini 2nd Italian Stata Users Group meeting October 10-11, 2005 Institute Environmental Medicine, Karolinska Institutet Rino Bellocco Dept.

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

Analytic Methods for Applied Epidemiology: Framework and Contingency Table Analysis

Analytic Methods for Applied Epidemiology: Framework and Contingency Table Analysis 2014 Maternal and Child Health Epidemiology Training Pre-Training Webinar: Friday, May 16 2-4pm Eastern Kristin Rankin,

A Re-Introduction to General Linear Models

A Re-Introduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation

Dynamic Determination of Mixed Model Covariance Structures. in Double-blind Clinical Trials. Matthew Davis - Omnicare Clinical Research

PharmaSUG2010 - Paper SP12 Dynamic Determination of Mixed Model Covariance Structures in Double-blind Clinical Trials Matthew Davis - Omnicare Clinical Research Abstract With the computing power of SAS

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method

Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence

Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey

over Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */

CLP 944 Example 4 page 1 Within-Personn Fluctuation in Symptom Severity over Time These data come from a study of weekly fluctuation in psoriasis severity. There was no intervention and no real reason

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs

STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)

Unit 9: Inferences for Proportions and Count Data

Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9 - Stat 571 - Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)

Lecture 5: ANOVA and Correlation

Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD

Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable

Frailty Modeling for clustered survival data: a simulation study

Frailty Modeling for clustered survival data: a simulation study IAA Oslo 2015 Souad ROMDHANE LaREMFiQ - IHEC University of Sousse (Tunisia) souad_romdhane@yahoo.fr Lotfi BELKACEM LaREMFiQ - IHEC University

Statistical Methods in Clinical Trials Categorical Data

Statistical Methods in Clinical Trials Categorical Data Types of Data quantitative Continuous Blood pressure Time to event Categorical sex qualitative Discrete No of relapses Ordered Categorical Pain level

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

Poisson Regression. Ryan Godwin. ECON University of Manitoba

Poisson Regression Ryan Godwin ECON 7010 - University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots. March 8, 2015

STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and Q-Q plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis

In Class Review Exercises Vartanian: SW 540

In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison

Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs Christopher Jennison Department of Mathematical Sciences, University of Bath http://people.bath.ac.uk/mascj

Continuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21

Hazard functions Patrick Breheny August 27 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21 Introduction Continuous case Let T be a nonnegative random variable representing the time to an event

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.

Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk

Investigating Models with Two or Three Categories

Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

Exact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25,

Exact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25, 2016 1 McNemar s Original Test Consider paired binary response data. For example, suppose you have twins randomized to two

Categorical and Zero Inflated Growth Models

Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.

EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

Applied Linear Statistical Methods

Applied Linear Statistical Methods (short lecturenotes) Prof. Rozenn Dahyot School of Computer Science and Statistics Trinity College Dublin Ireland www.scss.tcd.ie/rozenn.dahyot Hilary Term 2016 1. Introduction

Lecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson

Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,

The Ef ciency of Simple and Countermatched Nested Case-control Sampling

Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA Vol 26: 493±509, 1999 The Ef ciency of Simple and Countermatched Nested Case-control

Introduction to the rstpm2 package

Introduction to the rstpm2 package Mark Clements Karolinska Institutet Abstract This vignette outlines the methods and provides some examples for link-based survival models as implemented in the R rstpm2

M3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction

M3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction Mike Stoolmiller: stoolmil@uoregon.edu University of Oregon 5/21/2013 Outline Example Research Questions

Known unknowns : using multiple imputation to fill in the blanks for missing data

Known unknowns : using multiple imputation to fill in the blanks for missing data James Stanley Department of Public Health University of Otago, Wellington james.stanley@otago.ac.nz Acknowledgments Cancer