ADVANCED STATISTICAL ANALYSIS OF EPIDEMIOLOGICAL STUDIES. Cox's regression analysis. Time-dependent explanatory variables


1 Henrik Ravn, Bandim Health Project, Statens Serum Institut. 4 November
2 Outline
- Survival Data
- Example: Malignant Melanoma Data
- The Cox Model
- Cox in SAS
- Choice of Time-Scale
- Example: Guinea-Bissau Data
- Delayed entries
- Time-dependent explanatory variables
3 $$\prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
4 Survival Data
Time to death or other event of interest. One time-scale, including a well-defined starting time (the time-origin):
- Time from start of a randomized clinical trial to death.
- Time from first employment to pension.
- Time from filling of a tooth until the filling falls out.
What is special about survival data?
- Right-skewed. No problem.
- CENSORING: for some individuals we will only know a lower bound of the lifetime.
5 Simple data [Figure: follow-up times by individual; x-axis: Times (months), y-axis: Individual]
6 Survival and hazard function
Let T be the TIME to the event of interest:
- $S(t) = P(T > t)$ = probability of survival to time t after entry at time 0
- $\lambda(t)$ = incidence, rate, or hazard
Relationship:
$$S(t) = \exp\left(-\int_0^t \lambda(s)\,ds\right) = \exp(-\Lambda(t))$$
$\Lambda(t)$ is called the integrated hazard function.
7 [Figure: three panels, each plotted against Time (t)]
- Hazard rate: $\lambda(t) = \lambda$
- Survival function: $S(t) = e^{-\lambda t}$
- Integrated hazard: $\Lambda(t) = \lambda t$
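The relationship $S(t) = \exp(-\Lambda(t))$ can be checked numerically. The sketch below is only an illustration: the function names and the hazard of 0.1 per month are assumptions, not taken from the slides. It integrates a hazard with the trapezoidal rule and compares the result with the closed form $e^{-\lambda t}$ for a constant hazard.

```python
import math

def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate Lambda(t), the integral of hazard(s) from 0 to t (trapezoidal rule)."""
    if t == 0:
        return 0.0
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    for k in range(1, steps):
        total += hazard(k * h)
    return total * h

def survival(hazard, t):
    """S(t) = exp(-Lambda(t))."""
    return math.exp(-cumulative_hazard(hazard, t))

# Hypothetical constant hazard of 0.1 events per month (an assumption for illustration).
rate = 0.1
s_numeric = survival(lambda s: rate, 12.0)   # numerical integration
s_closed = math.exp(-rate * 12.0)            # closed form e^{-lambda t}
print(s_numeric, s_closed)
```

Any other non-negative hazard function can be passed in place of the constant one; only the closed-form comparison is specific to the constant case.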
8 Kaplan-Meier estimate of survival function
Death times $t_1, \ldots, t_d$ (ordered). $Y(t_i)$ = number alive just before $t_i$.
$$\hat{S}(t) = \prod_{t_i \le t} \left(1 - \frac{1}{Y(t_i)}\right)$$
[Figure: risk sets; x-axis: Times (months), y-axis: Individual]
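A minimal sketch of the Kaplan-Meier product, assuming toy data that are not the data shown on the slides. The function name `kaplan_meier` is hypothetical. The factor used is $1 - d_i / Y(t_i)$, which reduces to the slide's $1 - 1/Y(t_i)$ when there is one death per death time, and individuals censored at a death time are counted as still at risk at that time.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_hat(t_i))] at each distinct death time.

    times  : follow-up time of each individual
    events : 1 if the individual died at that time, 0 if censored
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv = 1.0
    curve = []
    for t in death_times:
        # Y(t_i): number still under observation just before t_i
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data in months (assumption for illustration only).
times = [2, 3, 3, 5, 7, 8]
events = [1, 0, 1, 1, 0, 1]
km = kaplan_meier(times, events)
for t, s in km:
    print(t, round(s, 4))
```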
9 [Figure: Kaplan-Meier survival estimate; x-axis: Time (months), y-axis: Survival probability; number at risk shown below the axis]
10 Malignant Melanoma Data
A total of 205 patients had their tumor removed and were followed until the end of 1977. At the end of 1977:
- 57 had died of malignant melanoma (status=1)
- 134 were still alive (status=2)
- 14 had died of causes not related to malignant melanoma (status=3), a competing risk
Purpose: study the effect on survival of sex, age, thickness of tumor, ulceration, etc.
11 Malignant melanoma [Data listing with columns: N, time, status, sex, age, year, thickness, ulcer]
12 The Cox Model
The Cox model assumes that the rate for the ith individual is
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip})$$
where $\beta_1, \beta_2, \ldots, \beta_p$ are regression parameters, $X_{i1}$ is the value of covariate 1 for individual i, etc. Finally, $\lambda_0(t)$ is the baseline hazard.
Time t is the time-scale of choice, e.g. age, time since randomization, or time since operation.
As formulated here, the only quantity on the right-hand side that depends on time is the baseline hazard $\lambda_0(t)$.
If all covariates (X's) are zero we get $\lambda_i(t) = \lambda_0(t)$. The baseline hazard is thus the hazard of an individual who has all covariates equal to zero.
13 The Cox model
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip})$$
can also be written on the log scale (natural log):
$$\log(\lambda_i(t)) = \log\big(\lambda_0(t) \exp(\beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip})\big) = \log(\lambda_0(t)) + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip}.$$
The Cox model assumes that
- the effects of covariates are additive and linear on the log rate scale, just as in Poisson regression.
- the CORNER, i.e. the baseline hazard, is nonparametric and depends on time, and time is thus adjusted for.
We now turn to the interpretation of the regression parameters $\beta_1, \beta_2, \ldots, \beta_p$.
14 One binary covariate
To keep things simple we study the effect of one single binary covariate, e.g. sex, on the risk of dying:
$$X_i = \begin{cases} 0 & \text{if individual } i \text{ is a female} \\ 1 & \text{if individual } i \text{ is a male} \end{cases}$$
The Cox model is $\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$. With $X_i$ defined as above we get
$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$
15 Mortality Rate Ratio / Hazard Ratio
If
$$\lambda_i(t) = \begin{cases} \lambda_0(t) & \text{if individual } i \text{ is a female} \\ \lambda_0(t) \exp(\beta) & \text{if individual } i \text{ is a male} \end{cases}$$
then the RATE RATIO (RR) between males and females is
$$RR = \frac{\lambda_0(t) \exp(\beta)}{\lambda_0(t)} = \exp(\beta).$$
Importantly, the ratio is independent of time, i.e. we have PROPORTIONAL HAZARDS over time. The Cox model is also called the proportional hazards model.
How to estimate $\beta$? And what about the baseline hazard $\lambda_0(t)$?
16 Likelihood Function
The baseline hazard is regarded as a nuisance and is in general not estimated, although it is possible to do so.
Let $t_1, \ldots, t_d$ be the ordered death times. It can be shown that all we need is to find the $\beta$ that maximizes the following function, called Cox's partial likelihood function:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
where $R(t_i)$ is the RISK SET at death time $t_i$, i.e. the set of individuals at risk of dying (under observation) just before time $t_i$.
The resulting estimate $\hat{\beta}$ is called the MAXIMUM LIKELIHOOD ESTIMATE of $\beta$.
17 Likelihood Function: a closer look
Death times $t_1, \ldots, t_d$, numbering the individuals with deaths first: $i = 1, 2, \ldots, d, d+1, \ldots, n$,
with times $t_1, t_2, \ldots, t_d, t_{d+1}, \ldots, t_n$ and covariates $X_1, X_2, \ldots, X_d, X_{d+1}, \ldots, X_n$.
At each death time we have the RISK SET, the individuals alive and at risk of dying just before the death time: $R(t_1), R(t_2), \ldots, R(t_d)$.
18 Risk sets [Figure: follow-up times by individual with risk sets marked; x-axis: Times (months), y-axis: Individual]
19 For the Cox model $\lambda_i(t) = \lambda_0(t) \exp(\beta X_i)$ we use the Cox likelihood function to estimate $\beta$:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)} = \frac{\exp(\beta X_1)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \cdot \frac{\exp(\beta X_2)}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \cdots \frac{\exp(\beta X_d)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}$$
We index individuals in the risk sets using the letter j. Writing $\sum_{j \in R(t_1)} \exp(\beta X_j)$ means summing over the individuals in the risk set for death time $t_1$.
If we assume that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is $\exp(\beta X_1) + \exp(\beta X_2) + \cdots + \exp(\beta X_n)$.
20 For example, for the Cox model $\lambda_i(t) = \lambda_0(t) \exp(\beta \cdot \text{sex})$ with sex: 1=male, 0=female, the likelihood function is
$$\frac{\exp(\beta)}{\sum_{j \in R(t_1)} \exp(\beta X_j)} \cdot \frac{1}{\sum_{j \in R(t_2)} \exp(\beta X_j)} \cdots \frac{\exp(\beta)}{\sum_{j \in R(t_d)} \exp(\beta X_j)}$$
(here the first and the last deaths are males and the second death is a female).
If we again assume that no one was censored before the first death time, all individuals are in the risk set $R(t_1)$ and the sum is
$$\sum_{j \in R(t_1)} \exp(\beta X_j) = N_M \exp(\beta) + N_F,$$
where $N_M$ and $N_F$ are the numbers of males and females, respectively, in $R(t_1)$.
The risk sets also play a crucial role in nested case-control studies; more on this later in the course.
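To make the partial likelihood concrete, the following sketch evaluates its logarithm for one binary covariate and maximizes it with a simple golden-section search. Everything here is an assumption for illustration: the toy data, the function names and the search interval are hypothetical, death times are assumed untied, and SAS itself uses a different (Newton-type) algorithm.

```python
import math

def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood for one covariate, assuming no tied death times."""
    ll = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i != 1:
            continue  # censored individuals only appear in the risk sets
        # Risk set R(t_i): everyone still under observation just before t_i
        denom = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t_i)
        ll += beta * x_i - math.log(denom)
    return ll

def maximise(times, events, x, lo=-5.0, hi=5.0, iters=200):
    """Golden-section search; the log partial likelihood is concave in beta."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if log_partial_likelihood(c, times, events, x) < log_partial_likelihood(d, times, events, x):
            a = c
        else:
            b = d
    return (a + b) / 2

# Hypothetical toy data: sex 1=male, 0=female (not the melanoma data).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 1, 0, 1]
sex = [1, 0, 1, 1, 0, 1, 0, 0]

beta_hat = maximise(times, events, sex)
print("beta_hat:", beta_hat, "rate ratio:", math.exp(beta_hat))
```

At $\beta = 0$ every individual contributes 1 to each sum, so the log partial likelihood is minus the sum of the logs of the risk-set sizes, which gives a quick sanity check of the risk-set bookkeeping.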
21 So far the following assumptions have been made for the Cox model:
- The baseline hazard is assumed nonparametric, i.e. assumed to vary freely.
- The effects of covariates are additive and linear on the log rate scale.
- The ratio of the hazard rates for two subjects is constant over time. In other words, there is no interaction between the covariates and the time variable.
Let us look at the Melanoma data using SAS.
22 [Figure: Kaplan-Meier survival estimates, by sex (female, male); x-axis: Time (years)]
What is the estimate of the RR between males and females?
23 Cox in SAS
In SAS, proc phreg and proc tphreg can be used for estimation in the Cox model. We will use proc tphreg, as this procedure handles categorical variables much more easily than proc phreg.
Using proc tphreg we define the variable sex as categorical with the class statement. For the variable sex, 1 is males and 0 is females.
    proc tphreg data=melanom;
      class sex;
      model time*status(2,3) = sex;
    run;
Please note that we have two censoring codes, namely 2 and 3.
NB: From SAS 9.2, proc phreg handles class variables and proc tphreg is obsolete.
24 Part of output from proc tphreg:
Analysis of Maximum Likelihood Estimates [one row for sex; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio]
The column Parameter Estimate is $\hat{\beta}$.
For a class variable SAS will automatically choose the highest number (here 1) as the reference. Thus, the rate ratio or Hazard Ratio compares females to males.
There is no estimate statement in proc (t)phreg, but a similar so-called contrast statement exists. Instead we can use the ref option in the class statement.
Note also the option risklimits in the model statement, which calculates the confidence interval for the hazard ratio.
25
    proc tphreg data=melanom;
      class sex(ref="0");
      model time*status(2,3) = sex / risklimits;
    run;
Analysis of Maximum Likelihood Estimates [one row for sex; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
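As a rough illustration of how the Hazard Ratio column and its confidence limits relate to the Parameter Estimate and Standard Error, the sketch below assumes Wald-type limits, $\exp(\hat{\beta} \pm 1.96 \cdot SE)$. The numbers are hypothetical and are not the melanoma output; the function name is an assumption as well.

```python
import math

def hazard_ratio_ci(beta, se, z=1.959964):
    """Hazard ratio exp(beta) with Wald-type 95% limits exp(beta -/+ z*se)."""
    hr = math.exp(beta)
    return hr, math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical estimate and standard error (assumptions, not SAS output).
beta_hat, se = 0.5, 0.25
hr, lower, upper = hazard_ratio_ci(beta_hat, se)
print("males vs females:", hr, lower, upper)

# Switching the reference level changes the sign of beta, so the ratio is inverted.
hr_rev, lower_rev, upper_rev = hazard_ratio_ci(-beta_hat, se)
print("females vs males:", hr_rev, lower_rev, upper_rev)
```

The second call shows why changing the reference with the ref option simply replaces the hazard ratio and its limits by their reciprocals.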
26 Melanoma data, thickness of tumor given by variable gtyk
$$\text{gtyk} = \begin{cases} 1 & \text{if } < 2 \text{ mm} \\ 2 & \text{if } 2\text{-}5 \text{ mm} \\ 3 & \text{if } > 5 \text{ mm} \end{cases}$$
    proc tphreg data=melanom;
      class gtyk;
      model time*status(2,3) = gtyk / risklimits;
    run;
Type 3 Tests [Wald test for gtyk; columns: Effect, DF, Chi-Square, Pr > ChiSq; Pr > ChiSq <.0001]
Analysis of Maximum Likelihood Estimates [two rows for gtyk; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
27 Melanoma data, + age in years
    proc tphreg data=melanom;
      class gtyk sex;
      model time*status(2,3) = gtyk sex age / risklimits;
    run;
Type 3 Tests [Wald tests for sex, gtyk and age; columns: Effect, DF, Chi-Square, Pr > ChiSq; gtyk has Pr > ChiSq <.0001]
Analysis of Maximum Likelihood Estimates [rows: sex, gtyk (two rows), age; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
28 Likelihood Ratio Test
    proc tphreg data=melanom;
      class gtyk sex;
      model time*status(2,3) = gtyk sex;
    run;
Model Fit Statistics [columns: Criterion, Without Covariates, With Covariates; rows: -2 LOG L, AIC, SBC]
    proc tphreg data=melanom;
      class sex;
      model time*status(2,3) = sex;
    run;
Model Fit Statistics [columns: Criterion, Without Covariates, With Covariates; rows: -2 LOG L, AIC, SBC]
LR = difference in -2 LOG L (with covariates) between the two models = 28.0, which is compared with a $\chi^2_2$ distribution (2 degrees of freedom).
29 SAS: p-value from chi-square test
    data temp;
      chisquare=28;
      df=2;
      p=1-probchi(chisquare,df);
    run;
    proc print data=temp;
    run;
Output [columns: Obs, chisquare, df, p]
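The same tail probability can be checked by hand. For an even number of degrees of freedom the upper tail of the chi-square distribution has a closed form, $P(X > x) = e^{-x/2} \sum_{k=0}^{df/2-1} (x/2)^k / k!$, so for df=2 it is simply $e^{-x/2}$. The sketch below (with a hypothetical function name) uses that closed form rather than any SAS routine.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) for a chi-square variable with even, positive df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("the closed form used here needs even, positive df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k       # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

p = chi2_sf_even_df(28.0, 2)   # LR statistic 28.0 on 2 degrees of freedom
print(p)
```

For df=2 the loop is empty and the result is $e^{-14}$, i.e. a p-value below 0.000001.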
30 Choice of Time-Scale
- A study may be conducted over calendar time even though the natural time-scale is time since treatment (Melanoma study).
- Cohort studies are often conducted by recruiting a random sample of the population at the start of the study; these subjects are then followed for a number of years (Framingham).
- A natural time-scale may be age rather than time in study, which most often is an artificial time-scale constructed by the investigators.
What would the time-origin be if age was chosen as time-scale?
31 Vaccinations in Guinea-Bissau
Rural Guinea-Bissau: 5274 children under 7 months of age were visited two times at home, with an interval of six months. Information about vaccination (BCG, DTP, measles vaccine) was collected at each visit, and at the second visit death during follow-up was registered. Some children moved away during follow-up (censored); others survived until the next visit (also censored).
Below are some of the variable names from the bissau data.
- fuptime: follow-up time in days
- dead: 0 = censored, 1 = dead
- bcg: 1 = Yes, 2 = No
- agem: age at first visit in months
32 Is the risk of dying associated with vaccination?
    Exposure              Died          Survived   Total
    BCG vaccinated        125 (3.8%)
    not BCG vaccinated     97 (4.9%)
    Total                 222 (4.2%)
33
    proc tphreg data=bissau;
      class bcg;
      model fuptime*dead(0)=bcg / rl;
    run;
Testing Global Null Hypothesis: BETA=0 [rows: Likelihood Ratio, Score, Wald; columns: Test, Chi-Square, DF, Pr > ChiSq]
Type 3 Tests [Wald test for bcg; columns: Effect, DF, Chi-Square, Pr > ChiSq]
Analysis of Maximum Likelihood Estimates [one row for bcg; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
34
    proc tphreg data=bissau;
      class bcg agem;
      model fuptime*dead(0)=bcg agem / rl;
    run;
Type 3 Tests [Wald tests for bcg and agem; columns: Effect, DF, Chi-Square, Pr > ChiSq]
Analysis of Maximum Likelihood Estimates [one row for bcg and six rows for agem; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
35 Delayed entries [Figure: two panels showing follow-up by individual. Left: "Time in study", x-axis: Times (months). Right: "Age as time", x-axis: Age (months)]
36 Subjects are only at risk from their age of entry and onwards. They are not at risk in our world of analysis before the age of entry!
Delayed entries are easily handled by careful control of the RISK SET $R(t_i)$ at death time $t_i$ in the likelihood function:
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i)}{\sum_{j \in R(t_i)} \exp(\beta X_j)}$$
Only individuals at risk and under observation are included in the risk set $R(t_i)$ at time $t_i$.
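The control of the risk set under delayed entry can be sketched as follows. The ages and the helper name `risk_set` are hypothetical; the convention assumed here is that an individual is at risk at age t when entry < t <= exit.

```python
def risk_set(entry, exit_, t):
    """Indices of individuals under observation just before age t.

    With delayed entry an individual is at risk at t only if entry < t <= exit.
    """
    return [i for i, (a, b) in enumerate(zip(entry, exit_)) if a < t <= b]

# Hypothetical ages (months) at entry and at exit (death or censoring).
entry_age = [0, 2, 5, 1]
exit_age = [6, 4, 9, 3]

print(risk_set(entry_age, exit_age, 3))   # individual 2 has not yet entered
print(risk_set(entry_age, exit_age, 6))   # individuals 1 and 3 have already left

# Ignoring delayed entry (everyone treated as entering at age 0) would wrongly
# put individual 2 in the risk set at age 3.
print(risk_set([0, 0, 0, 0], exit_age, 3))
```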
37 Delayed entries in SAS
    data bissau2;
      set bissau;
      outage=age+fuptime;
    run;
    proc tphreg data=bissau2;
      class bcg;
      model (age,outage)*dead(0)= bcg / rl;
    run;
Analysis of Maximum Likelihood Estimates [one row for bcg; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
38 Time-dependent explanatory variables
The Cox model can be expanded to include time-varying covariates:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta X_i(t)).$$
The likelihood function for death times $t_1, \ldots, t_d$ becomes
$$L(\beta) = \prod_{i=1}^{d} \frac{\exp(\beta X_i(t_i))}{\sum_{j \in R(t_i)} \exp(\beta X_j(t_i))}.$$
From this we can see that we just need to know the value of the covariates at the death times: $X_i(t_1), X_i(t_2), \ldots, X_i(t_d)$.
The covariate values at any time different from a death time are not used in the likelihood function.
39 The simplest time-varying covariate is a binary variable that is allowed to change once during follow-up, e.g. new BCG vaccinations registered between visits in the Bissau data:
$$X_i(t) = \begin{cases} 0 & \text{if no BCG before time } t \\ 1 & \text{if BCG-time} \le t \end{cases}$$
40 A child being BCG-vaccinated after 3 months of follow-up.
[Figure: step function of BCG (0/1) against Follow-up (months)]
The time-varying covariate is 0 in the time interval 0 to 3 months and 1 for the rest of follow-up.
For a child who was BCG vaccinated before the first visit, the time-varying covariate is 1 during all of follow-up.
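41 Multi-state Model
[Diagram: states 0 = Unexposed, 1 = Exposed, 2 = Dead; transitions 0 to 1 with rate $\lambda_{01}(t)$, 0 to 2 with rate $\lambda_{02}(t)$, 1 to 2 with rate $\lambda_{12}(t)$]
We want to compare $\lambda_{02}(t)$ and $\lambda_{12}(t)$. The transition $\lambda_{01}(t)$ is not modeled here.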
42 Instead of time of follow-up we will use age as time-scale to illustrate the use of BCG as a time-varying covariate in the Bissau data.
At visit 2 the vaccination cards were seen for the children at home and an age of BCG vaccination (bcgage) was calculated:
[Data listing with columns: id, fuptime, dead, age, bcg, bcgage, outage]
43 Binary time-varying covariate in SAS (I)
    proc tphreg data=bcg;
      if .<bcgage<outage then bcg_t=1; else bcg_t=0;
      model (age,outage)*dead(0)=bcg_t / rl;
    run;
Analysis of Maximum Likelihood Estimates [one row for bcg_t; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
44 The if-statement
    if .<bcgage<outage then bcg_t=1; else bcg_t=0;
is recalculated at each death time. Inside the procedure, outage refers to the death time currently being evaluated (i.e. a $t_i$ in the likelihood).
For the first death time, which is $t_1 = 23$ days of age, the if-statement becomes
    if .<bcgage<23 then bcg_t=1; else bcg_t=0;
and is calculated for all children at risk at age 23 days (in $R(t_1 = 23)$) with their individual bcgage values.
This is a recalculation of the time-varying covariate at each death time, cf. the likelihood function.
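The recalculation at each death time can be mimicked outside SAS. This sketch is only an illustration under stated assumptions: the four children are hypothetical, `None` stands in for a missing bcgage, death times are untied, and the helper names are not from the slides. At every death age t it rebuilds the risk set and evaluates the indicator `bcgage < t` for each child at risk.

```python
import math

def bcg_t(bcgage, t):
    """Time-varying BCG indicator at age t (None means never vaccinated)."""
    return 1 if bcgage is not None and bcgage < t else 0

def log_partial_likelihood(beta, children):
    """Log partial likelihood with the covariate re-evaluated at each death age."""
    ll = 0.0
    for c in children:
        if c["dead"] != 1:
            continue
        t = c["outage"]  # the death age currently being evaluated
        # Risk set with delayed entry: entered before t and still observed at t
        at_risk = [k for k in children if k["age"] < t <= k["outage"]]
        denom = sum(math.exp(beta * bcg_t(k["bcgage"], t)) for k in at_risk)
        ll += beta * bcg_t(c["bcgage"], t) - math.log(denom)
    return ll

# Hypothetical children; ages in days (assumptions for illustration only).
children = [
    {"age": 10, "outage": 23, "dead": 1, "bcgage": None},
    {"age": 5,  "outage": 60, "dead": 0, "bcgage": 30},
    {"age": 15, "outage": 45, "dead": 1, "bcgage": 20},
    {"age": 25, "outage": 80, "dead": 0, "bcgage": None},
]

print(log_partial_likelihood(0.0, children))
print(log_partial_likelihood(1.0, children))
```

The second child counts as unvaccinated at the death age 23 and as vaccinated at the death age 45, which is exactly the switch the if-statement produces.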
45 Binary time-varying covariate in SAS (II)
Split each person with a changing time-varying covariate into two records:
- from age to bcgage: bcgvacc=0, status=0
- from bcgage to outage: bcgvacc=1, status=dead
and use delayed entries. Thus, we need to generate a new data set.
46
    data splitbcg;
      set bcg;
      if bcgage=. or bcgage>outage then do;
        bcgvacc=0; entryage=age; exitage=outage; status=dead; output;
      end;
      if .<bcgage<=age then do;
        bcgvacc=1; entryage=age; exitage=outage; status=dead; output;
      end;
      if age<bcgage<=outage then do;
        bcgvacc=0; entryage=age; exitage=bcgage; status=0; output;
        bcgvacc=1; entryage=bcgage; exitage=outage; status=dead; output;
      end;
    run;
[Data listing with columns: id, fuptime, dead, age, bcg, bcgage, outage, bcgvacc, entryage, exitage, status]
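The three branches of the data step can be mirrored in a few lines to see which records each child produces. This is a sketch only: `split_record` is a hypothetical helper, `None` plays the role of a missing bcgage, and the example ages are made up.

```python
def split_record(age, outage, dead, bcgage):
    """Return rows (bcgvacc, entryage, exitage, status) for one child."""
    if bcgage is None or bcgage > outage:
        # Never vaccinated during follow-up: one unvaccinated record
        return [(0, age, outage, dead)]
    if bcgage <= age:
        # Vaccinated before entry: one vaccinated record
        return [(1, age, outage, dead)]
    # age < bcgage <= outage: censor the unvaccinated record at bcgage,
    # then start a vaccinated record with delayed entry at bcgage
    return [(0, age, bcgage, 0), (1, bcgage, outage, dead)]

# Hypothetical children (ages in days).
print(split_record(10, 100, 1, None))
print(split_record(10, 100, 1, 5))
print(split_record(10, 100, 1, 40))
```

Only the last record of a split child can carry the death, so the number of events is unchanged by the splitting. When bcgage equals outage the second record has zero length.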
47
    proc tphreg data=splitbcg;
      class bcgvacc(ref="0");
      model (entryage,exitage)*status(0)=bcgvacc / rl;
    run;
Analysis of Maximum Likelihood Estimates [one row for bcgvacc; columns: Parameter, DF, Parameter Estimate, Standard Error, Chi-Square, Pr > ChiSq, Hazard Ratio, 95% Hazard Ratio Confidence Limits]
48 Other time-varying covariates
Effect of binary X (0,1) changes at $t_0$:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 X_i I(t \ge t_0)),$$
where
$$I(t \ge t_0) = \begin{cases} 1 & \text{if } t \ge t_0 \\ 0 & \text{if } t < t_0 \end{cases}$$
Can be handled by method I+II.
Effect of binary X (0,1) decreases or increases with time:
$$\lambda_i(t) = \lambda_0(t) \exp(\beta_1 X_i + \beta_2 (X_i \cdot t)).$$
Can be handled by method I, by splitting at each failure, or by special options.
49 Stanford Heart Transplant Data (p. 235)
In a report (Crowley and Hu, J Amer. Statist Assoc. 1977) on the Stanford Heart Transplantation Study, patients identified as being eligible (N=103) for a heart transplant were followed until death or censoring. In total 65 received a transplant during follow-up, whereas 38 did not.
Assess whether transplanted patients survive better.
On the next slide you will find the variables in the transplant data set. Here we will discuss how to analyse the data, and at the exercises we will do some of the analyses.
50 Stanford Heart Transplant Data variables
- age: age (in years) at entry into the study.
- cens: 0 = censoring, 1 = dead.
- days: number of days from entry to death/censoring.
- trans: 1 = if the person had a heart transplantation, 0 = otherwise.
- wait: number of days from entry to transplantation. NB: if trans = 0 then wait = 1
- mismatch: 1 = mismatch between HLA type in donor and patient, 0 = no mismatch. NB: if trans = 0 then mismatch = ...
51 [Data listing with columns: Obs, age, cens, days, trans, wait, mismatch]
52 Piecewise Constant Hazard Rate = Poisson regression
Divide the time-scale into K pieces and assume a constant hazard rate within each interval, with different rates in different intervals. This may provide a sensible summary of many phenomena and is often used in epidemiology.
[Figure: age axis with cut points $c_0 = 0, c_1, c_2, c_3, \ldots, c_{K-1}, c_K$ and rates $\lambda_1, \lambda_2, \lambda_3, \ldots, \lambda_K$ in the successive intervals]
Thus $\lambda(t) = \lambda_k$ for $t \in (c_{k-1}, c_k]$, $k = 1, \ldots, K$.
The intervals do not need to be of the same length.
We only need to keep a record of the total number of deaths and the exposure time in each interval.
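A small sketch of this bookkeeping, under stated assumptions: the cut points, the four subjects and the function names are hypothetical. Each person's follow-up is split over the intervals $(c_{k-1}, c_k]$, deaths and person-time are accumulated per interval, and each rate is estimated as deaths divided by person-time, which is the maximum likelihood estimate under a piecewise constant hazard.

```python
def split_follow_up(entry, exit_, died, cuts):
    """Split one individual's follow-up over the intervals (c_{k-1}, c_k].

    Returns rows (k, person_time, deaths) for every interval the person visits.
    """
    rows = []
    for k in range(1, len(cuts)):
        lo, hi = cuts[k - 1], cuts[k]
        start, stop = max(entry, lo), min(exit_, hi)
        if stop > start:
            # The death is counted in the interval that contains the exit time
            event = 1 if died and lo < exit_ <= hi else 0
            rows.append((k, stop - start, event))
    return rows

def piecewise_rates(subjects, cuts):
    """Estimate lambda_k = deaths_k / person_time_k for k = 1..K."""
    n_intervals = len(cuts) - 1
    deaths = [0] * n_intervals
    ptime = [0.0] * n_intervals
    for entry, exit_, died in subjects:
        for k, pt, ev in split_follow_up(entry, exit_, died, cuts):
            deaths[k - 1] += ev
            ptime[k - 1] += pt
    return [d / t if t > 0 else float("nan") for d, t in zip(deaths, ptime)]

# Hypothetical cut points (months of age) and subjects (entry, exit, died).
cuts = [0, 12, 24, 36]
subjects = [(0, 10, 1), (0, 30, 0), (6, 20, 1), (0, 36, 0)]
rates = piecewise_rates(subjects, cuts)
print(rates)
```

The rows returned by `split_follow_up` have the layout a Poisson regression needs: one record per person and interval, with the event count as outcome and the person-time as exposure.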
53 We can further divide each interval into categories of covariates, e.g. sex (F=females, M=males):
[Figure: age axis with cut points $c_0 = 0, c_1, c_2, c_3, \ldots, c_{K-1}, c_K$; rates $\lambda_{1F}, \lambda_{2F}, \lambda_{3F}, \ldots, \lambda_{KF}$ for females and $\lambda_{1M}, \lambda_{2M}, \lambda_{3M}, \ldots, \lambda_{KM}$ for males]
Splitting the time-scale is not straightforward in SAS, but user-written SAS macros exist.
- Stata: use the stsplit command.
- R: packages exist (e.g. the Epi package).
- SPSS?
More informationTests for Two Correlated Proportions in a Matched Case Control Design
Chapter 155 Tests for Two Correlated Proportions in a Matched Case Control Design Introduction A 2byM casecontrol study investigates a risk factor relevant to the development of a disease. A population
More informationDescription Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
Title stata.com stcrreg postestimation Postestimation tools for stcrreg Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
More informationBIAS OF MAXIMUMLIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY
BIAS OF MAXIMUMLIKELIHOOD ESTIMATES IN LOGISTIC AND COX REGRESSION MODELS: A COMPARATIVE SIMULATION STUDY Ingo Langner 1, Ralf Bender 2, Rebecca LenzTönjes 1, Helmut Küchenhoff 2, Maria Blettner 2 1
More informationLab 11. Multilevel Models. Description of Data
Lab 11 Multilevel Models Henian Chen, M.D., Ph.D. Description of Data MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. ID: 386 women, id from 1 to 386, individual level (level
More informationAppendix: Computer Programs for Logistic Regression
Appendix: Computer Programs for Logistic Regression In this appendix, we provide examples of computer programs to carry out unconditional logistic regression, conditional logistic regression, polytomous
More informationAnalysing categorical data using logit models
Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.researchtraining.net/manchester
More informationHypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)
Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Ztest χ 2 test Confidence Interval Sample size and power Relative effect
More informationLinear Regression Models P8111
Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started
More informationLab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p )
Lab 3: Two levels Poisson models (taken from Multilevel and Longitudinal Modeling Using Stata, p. 376390) BIO656 2009 Goal: To see if a major healthcare reform which took place in 1997 in Germany was
More informationMetaanalysis of epidemiological doseresponse studies
Metaanalysis of epidemiological doseresponse studies Nicola Orsini 2nd Italian Stata Users Group meeting October 1011, 2005 Institute Environmental Medicine, Karolinska Institutet Rino Bellocco Dept.
More informationA new strategy for metaanalysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston
A new strategy for metaanalysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials
More informationAnalytic Methods for Applied Epidemiology: Framework and Contingency Table Analysis
Analytic Methods for Applied Epidemiology: Framework and Contingency Table Analysis 2014 Maternal and Child Health Epidemiology Training PreTraining Webinar: Friday, May 16 24pm Eastern Kristin Rankin,
More informationA ReIntroduction to General Linear Models
A ReIntroduction to General Linear Models Today s Class: Big picture overview Why we are using restricted maximum likelihood within MIXED instead of least squares within GLM Linear model interpretation
More informationDynamic Determination of Mixed Model Covariance Structures. in Doubleblind Clinical Trials. Matthew Davis  Omnicare Clinical Research
PharmaSUG2010  Paper SP12 Dynamic Determination of Mixed Model Covariance Structures in Doubleblind Clinical Trials Matthew Davis  Omnicare Clinical Research Abstract With the computing power of SAS
More informationCompare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method
Compare Predicted Counts between Groups of Zero Truncated Poisson Regression Model based on Recycled Predictions Method Yan Wang 1, Michael Ong 2, Honghu Liu 1,2,3 1 Department of Biostatistics, UCLA School
More informationGeneralized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence
Generalized Linear Model under the Extended Negative Multinomial Model and Cancer Incidence Sunil Kumar Dhar Center for Applied Mathematics and Statistics, Department of Mathematical Sciences, New Jersey
More informationover Time line for the means). Specifically, & covariances) just a fixed variance instead. PROC MIXED: to 1000 is default) list models with TYPE=VC */
CLP 944 Example 4 page 1 WithinPersonn Fluctuation in Symptom Severity over Time These data come from a study of weekly fluctuation in psoriasis severity. There was no intervention and no real reason
More informationSTAT 5500/6500 Conditional Logistic Regression for Matched Pairs
STAT 5500/6500 Conditional Logistic Regression for Matched Pairs The data for the tutorial came from support.sas.com, The LOGISTIC Procedure: Conditional Logistic Regression for Matched Pairs Data :: SAS/STAT(R)
More informationUnit 9: Inferences for Proportions and Count Data
Unit 9: Inferences for Proportions and Count Data Statistics 571: Statistical Methods Ramón V. León 12/15/2008 Unit 9  Stat 571  Ramón V. León 1 Large Sample Confidence Interval for Proportion ( pˆ p)
More informationLecture 5: ANOVA and Correlation
Lecture 5: ANOVA and Correlation Ani Manichaikul amanicha@jhsph.edu 23 April 2007 1 / 62 Comparing Multiple Groups Continous data: comparing means Analysis of variance Binary data: comparing proportions
More informationCausal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification. Todd MacKenzie, PhD
Causal Hazard Ratio Estimation By Instrumental Variables or Principal Stratification Todd MacKenzie, PhD Collaborators A. James O Malley Tor Tosteson Therese Stukel 2 Overview 1. Instrumental variable
More informationFrailty Modeling for clustered survival data: a simulation study
Frailty Modeling for clustered survival data: a simulation study IAA Oslo 2015 Souad ROMDHANE LaREMFiQ  IHEC University of Sousse (Tunisia) souad_romdhane@yahoo.fr Lotfi BELKACEM LaREMFiQ  IHEC University
More informationStatistical Methods in Clinical Trials Categorical Data
Statistical Methods in Clinical Trials Categorical Data Types of Data quantitative Continuous Blood pressure Time to event Categorical sex qualitative Discrete No of relapses Ordered Categorical Pain level
More informationStat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010
1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of
More informationPoisson Regression. Ryan Godwin. ECON University of Manitoba
Poisson Regression Ryan Godwin ECON 7010  University of Manitoba Abstract. These lecture notes introduce Maximum Likelihood Estimation (MLE) of a Poisson regression model. 1 Motivating the Poisson Regression
More informationSTAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and QQ plots. March 8, 2015
STAT 135 Lab 6 Duality of Hypothesis Testing and Confidence Intervals, GLRT, Pearson χ 2 Tests and QQ plots March 8, 2015 The duality between CI and hypothesis testing The duality between CI and hypothesis
More informationIn Class Review Exercises Vartanian: SW 540
In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE
More informationMonitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs. Christopher Jennison
Monitoring clinical trial outcomes with delayed response: incorporating pipeline data in group sequential designs Christopher Jennison Department of Mathematical Sciences, University of Bath http://people.bath.ac.uk/mascj
More informationContinuous case Discrete case General case. Hazard functions. Patrick Breheny. August 27. Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21
Hazard functions Patrick Breheny August 27 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/21 Introduction Continuous case Let T be a nonnegative random variable representing the time to an event
More informationFrailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Mela. P.
Frailty Modeling for Spatially Correlated Survival Data, with Application to Infant Mortality in Minnesota By: Sudipto Banerjee, Melanie M. Wall, Bradley P. Carlin November 24, 2014 Outlines of the talk
More informationInvestigating Models with Two or Three Categories
Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might
More informationExact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25,
Exact McNemar s Test and Matching Confidence Intervals Michael P. Fay April 25, 2016 1 McNemar s Original Test Consider paired binary response data. For example, suppose you have twins randomized to two
More informationCategorical and Zero Inflated Growth Models
Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).
More informationEDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.
EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight
More informationMultilevel Statistical Models: 3 rd edition, 2003 Contents
Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction
More informationOptimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai
Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment
More informationApplied Linear Statistical Methods
Applied Linear Statistical Methods (short lecturenotes) Prof. Rozenn Dahyot School of Computer Science and Statistics Trinity College Dublin Ireland www.scss.tcd.ie/rozenn.dahyot Hilary Term 2016 1. Introduction
More informationLecture 10: Alternatives to OLS with limited dependent variables. PEA vs APE Logit/Probit Poisson
Lecture 10: Alternatives to OLS with limited dependent variables PEA vs APE Logit/Probit Poisson PEA vs APE PEA: partial effect at the average The effect of some x on y for a hypothetical case with sample
More informationStatistics 262: Intermediate Biostatistics Regression & Survival Analysis
Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,
More informationThe Ef ciency of Simple and Countermatched Nested Casecontrol Sampling
Published by Blackwell Publishers Ltd, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA Vol 26: 493±509, 1999 The Ef ciency of Simple and Countermatched Nested Casecontrol
More informationIntroduction to the rstpm2 package
Introduction to the rstpm2 package Mark Clements Karolinska Institutet Abstract This vignette outlines the methods and provides some examples for linkbased survival models as implemented in the R rstpm2
More informationM3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction
M3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction Mike Stoolmiller: stoolmil@uoregon.edu University of Oregon 5/21/2013 Outline Example Research Questions
More informationKnown unknowns : using multiple imputation to fill in the blanks for missing data
Known unknowns : using multiple imputation to fill in the blanks for missing data James Stanley Department of Public Health University of Otago, Wellington james.stanley@otago.ac.nz Acknowledgments Cancer
More information