# Logistic regression model for survival time analysis using time-varying coefficients


## Transcription

Accepted in American Journal of Mathematical and Management Sciences, 2016

Kenichi SATOH, Research Institute for Radiation Biology and Medicine, Hiroshima University, Kasumi, Minami-ku, Hiroshima

Tetsuji TONDA, Faculty of Management and Information Systems, Prefectural University of Hiroshima, Ujina-Higashi, Minami-ku, Hiroshima, JAPAN.

Shizue IZUMI, Center for Data Science Education and Research, Shiga University, Banbacho, Hikone, Shiga, JAPAN.

SYNOPTIC ABSTRACT

In epidemiological studies, odds ratios are widely used for quantifying the relative risk. The odds ratio can be estimated from background factors using logistic regression. In this paper, a logistic regression model for the survival time is proposed using time-varying coefficients, and statistical inference is conducted using the Newton-Raphson method and simultaneous confidence intervals. Numerical examples and simulation studies demonstrate that the proposed model can be used to obtain the odds ratio in survival time analysis.

Key words: Logistic regression model; Newton-Raphson method; Odds ratio; Survival time analysis; Time-varying coefficient.

### 1. Introduction

Odds ratios are widely used in epidemiology to measure the association between dichotomous outcome variables, such as case or control, normal or abnormal, dead or alive (see McCullagh and Nelder (1989)). The odds ratio can be interpreted as a relative risk when the probability of occurrence is very small.

Logistic regression models are often used to estimate the odds ratio when confounding factors require adjustment. On the other hand, time to death, or survival time, is frequently analyzed using the Cox proportional hazards model proposed in Cox (1972). However, that model is concerned with the hazard ratio rather than the odds ratio. Here, we apply the logistic regression model to survival time analysis and evaluate the odds ratio. In Section 2, we consider time-varying coefficients in the logistic regression model in order to describe survival time data. In Section 3 the proposed model is applied to a real dataset, and the stability of the estimation method is investigated in a simulation study in Section 4. In Section 5, we discuss our proposed method and the conclusions from our investigation.

### 2. Logistic regression model for survival time data

First, we define survival time as a random variable and explain the censoring time in 2.1. Then we connect the distribution function of survival time with time-varying coefficients. In 2.2 the regression coefficients are estimated by maximizing a log-likelihood under the logistic regression model, for which the Newton-Raphson method can be implemented. Since the estimated time-varying coefficients are functions of time, their confidence intervals are also functions of time; these are given in 2.3.

#### 2.1. Describing the distribution function of survival time data by using time-varying coefficients

Let $T$ be a continuous random variable denoting the time of death, whose cumulative distribution function (cdf) is given by $F(t) = \Pr(T \le t)$. The complement of the cdf is known as the survival function, $S(t) = 1 - F(t)$. It denotes the probability of being alive up until time $t$ or, more generally, the probability that the event of interest has not occurred by time $t$. When the event has not yet occurred at the last time a subject is observed, that time is called the censoring time.

Let the regression coefficients of the covariates $a = (a_1, \ldots, a_p)'$ be $\beta(t) = (\beta_1(t), \ldots, \beta_p(t))'$. The effects of covariates can be non-stationary, and such coefficients are referred to as time-varying coefficients (Hastie and Tibshirani (1993)). With the logit, or log-odds, transformation of $F(t)$, a logistic regression model can be obtained for survival time data as follows:

$$\log \frac{F(t)}{S(t)} = z(t \mid a) = \beta(t)'a. \tag{1}$$

Thus, the log-odds ratio for $a_j = 1$ relative to $a_j = 0$ at time $t$ can be expressed by

$$z(t \mid a_j = 1) - z(t \mid a_j = 0) = \beta_j(t), \tag{2}$$

so the odds ratio is given by $\exp\{\beta_j(t)\}$. The model in (1) can be regarded as an extension of the log-logistic model proposed by Bennet (1983), which uses the log-logistic distribution function for survival time and has a varying coefficient in $\log t$ only for the constant covariate $a_1$, i.e., $\log\{F(t)/S(t)\} = \phi \log t + \beta'a$. Here we propose a model to evaluate the time-varying coefficients for the covariates in equation (1).

We consider linear time-varying coefficients using the growth curve model presented in Satoh and Yanagihara (2010) for longitudinal data. Let $x(t)$ be a $(q-1)$th degree polynomial basis function for the varying coefficients $\beta(t)$, i.e.,

$$\beta(t)' = x(t)'\Theta. \tag{3}$$

Here, $x(t) = (1, t, t^2, \ldots, t^{q-1})'$ and $\Theta = (\theta_1, \ldots, \theta_p)$ is a $q \times p$ unknown regression coefficient matrix. Note that $x(t)$ does not need to be a polynomial basis function, but it must be a differentiable function of $t$.

#### 2.2. Deriving maximum likelihood estimators of regression coefficients

Assuming that the cdf $F(t)$ is differentiable, we obtain the probability density function (pdf)

$$f(t) = \frac{dF(t)}{dt} = F(t)S(t)\frac{dz(t)}{dt}. \tag{4}$$

From (1) and (3), it holds that

$$\frac{dz(t)}{dt} = \frac{d\beta(t)'}{dt}a = \dot{x}(t)'\Theta a. \tag{5}$$
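As a rough illustration of equations (1) to (3), the following Python sketch evaluates the time-varying coefficients, the cdf, and the odds ratio for a given coefficient matrix. The helper names (`basis`, `beta`, `cdf`, `odds_ratio`) and the numbers in `Theta` are hypothetical and chosen only for illustration; they are not estimates from the paper.

```python
import numpy as np

def basis(t, q):
    """Polynomial basis x(t) = (1, t, ..., t^(q-1))'."""
    return np.array([t ** k for k in range(q)], dtype=float)

def beta(t, Theta):
    """Time-varying coefficients from (3): beta(t)' = x(t)' Theta (Theta is q x p)."""
    return basis(t, Theta.shape[0]) @ Theta

def cdf(t, a, Theta):
    """F(t) under model (1): log{F(t)/S(t)} = beta(t)' a."""
    z = beta(t, Theta) @ np.asarray(a, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

def odds_ratio(t, j, Theta):
    """Odds ratio exp{beta_j(t)} from (2); j is a 0-based covariate index."""
    return float(np.exp(beta(t, Theta)[j]))

# Hypothetical q = 2, p = 2 coefficient matrix (illustrative values only).
# Column 0: baseline (covariate fixed at 1); column 1: a binary covariate.
Theta = np.array([[-3.0, -2.0],
                  [2.0, 0.0]])
t = np.log(10.0)                 # time on the log scale
F0 = cdf(t, [1, 0], Theta)       # binary covariate absent
F1 = cdf(t, [1, 1], Theta)       # binary covariate present
print(F0, F1, odds_ratio(t, 1, Theta))
```

Because the second row of the hypothetical `Theta` is zero in column 1, the odds ratio for that covariate does not change with `t`; a non-zero entry there would make it time-varying.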

In (5), $\dot{x}(t)$ denotes the derivative of the basis function,

$$\dot{x}(t) = \frac{dx(t)}{dt} = (0, 1, 2t, \ldots, (q-1)t^{q-2})'. \tag{6}$$

Note that the hazard function can be written as $f(t)/S(t) = F(t)\dot{x}(t)'\Theta a$. In most real situations, polynomial basis functions based on $t = \log(\tilde{t})$, where $\tilde{t}$ is the original survival time, can provide a better fit for survival data than those based on the original survival time itself, e.g., Bennet (1983).

Assume that every subject either experiences the event or is censored. That is, for subject $i$ we observe a time $t_i$ together with an indicator $\delta_i$, where $\delta_i = 1$ if $t_i$ is a time of death (uncensored) and $\delta_i = 0$ if $t_i$ is a censoring time, so that the data are $(t_i, \delta_i)$, $i = 1, \ldots, n$. Then the likelihood function for the regression coefficients $\Theta$ can be expressed as

$$L(\Theta) = \prod_{i=1}^{n} f_i^{\delta_i} S_i^{1-\delta_i} = \prod_{i=1}^{n} \{F_i S_i \dot{z}_i\}^{\delta_i} S_i^{1-\delta_i}, \tag{7}$$

where $a_i$ is the covariate vector for subject $i$, $\dot{z}_i = \dot{x}(t_i)'\Theta a_i$, $f_i = f(t_i)$, $F_i = F(t_i)$ and $S_i = S(t_i)$. By maximizing the log-likelihood function with respect to $\Theta$, the maximum likelihood estimator $\hat{\Theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_p)$ can be obtained. Let $\theta = \mathrm{vec}(\Theta) = (\theta_1', \ldots, \theta_p')'$ and $l(\theta) = \log L(\Theta)$. The estimator $\hat{\theta} = \mathrm{vec}(\hat{\Theta})$ then satisfies $dl(\hat{\theta})/d\theta = 0_{qp}$, where the score is

$$\frac{dl(\theta)}{d\theta} = \sum_{i=1}^{n} \left\{ \delta_i S_i w_i - F_i w_i + \delta_i \dot{z}_i^{-1} \dot{w}_i \right\}, \tag{8}$$

with $w_i = a_i \otimes x(t_i)$ and $\dot{w}_i = a_i \otimes \dot{x}(t_i)$. Its Hessian matrix is given by

$$\frac{d^2 l(\theta)}{d\theta^2} = -\sum_{i=1}^{n} \left\{ (1 + \delta_i) F_i S_i w_i w_i' + \delta_i \dot{z}_i^{-2} \dot{w}_i \dot{w}_i' \right\}. \tag{9}$$

Using the Newton-Raphson method, the maximum likelihood estimator $\hat{\theta}$ can be obtained from the following recurrence formula:

$$\theta_{m+1} = \theta_m - \left\{ \frac{d^2 l(\theta_m)}{d\theta^2} \right\}^{-1} \frac{dl(\theta_m)}{d\theta}, \quad m = 0, 1, 2, \ldots \tag{10}$$
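A minimal Python sketch of the Newton-Raphson recurrence (10) is given below. It uses the score (8) and the negative-definite second-derivative matrix of the log-likelihood as in (9), and it also returns the inverse of the negative Hessian as a covariance estimate. The design is an assumption made for the example: an intercept, the log-time $t$, and one binary covariate $a$, so that $w_i = (1, t_i, a_i)'$ and $\dot{w}_i = (0, 1, 0)'$. The function names, the synthetic data generator, and the parameter values are illustrative and are not the authors' implementation or data.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def score_and_hessian(theta, W, Wdot, delta):
    """Score (8) and Hessian (9); rows of W and Wdot are w_i' and dot-w_i'."""
    F = expit(W @ theta)
    S = 1.0 - F
    zdot = Wdot @ theta                  # dot-z_i, must stay positive
    score = W.T @ (delta * S - F) + Wdot.T @ (delta / zdot)
    hess = (-(W.T * ((1.0 + delta) * F * S)) @ W
            - (Wdot.T * (delta / zdot ** 2)) @ Wdot)
    return score, hess

def newton_raphson(theta0, W, Wdot, delta, n_iter=20):
    """Plain Newton-Raphson updates (10) for a fixed number of iterations."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        score, hess = score_and_hessian(theta, W, Wdot, delta)
        theta = theta - np.linalg.solve(hess, score)
    score, hess = score_and_hessian(theta, W, Wdot, delta)
    return theta, np.linalg.inv(-hess), score   # estimate, covariance, final score

def simulate(theta_true, n, rng):
    """Synthetic censored data from logit F(t) = b0 + b1 * t + b2 * a (assumed design)."""
    b0, b1, b2 = theta_true
    a = rng.integers(0, 2, size=n).astype(float)
    u = rng.uniform(size=n)
    t_event = (np.log(u / (1.0 - u)) - b0 - b2 * a) / b1   # inverse-cdf draw
    t_cens = rng.uniform(1.5, 4.5, size=n)                 # arbitrary censoring times
    t = np.minimum(t_event, t_cens)
    delta = (t_event <= t_cens).astype(float)
    W = np.column_stack([np.ones(n), t, a])
    Wdot = np.column_stack([np.zeros(n), np.ones(n), np.zeros(n)])
    return W, Wdot, delta

theta_true = np.array([-3.0, 2.0, -2.0])   # arbitrary illustrative values
W, Wdot, delta = simulate(theta_true, 400, np.random.default_rng(0))
theta_hat, omega, score = newton_raphson(theta_true, W, Wdot, delta)
print(theta_hat, np.sqrt(np.diag(omega)))
```

The sketch starts the iteration at the values used to generate the data, which is only possible in a simulation; with real data the starting value has to be chosen by other means, and the iteration is not guaranteed to converge from a poor start.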

In the recurrence (10), $\theta_0$ is an adequate initial value. Note that the inverse of the negative Hessian matrix can be used as an asymptotic covariance matrix of the maximum likelihood estimator $\hat{\theta}$, i.e.,

$$\Omega = \mathrm{Cov}(\hat{\theta}) \approx \left\{ -\frac{d^2 l(\hat{\theta})}{d\theta^2} \right\}^{-1}. \tag{11}$$

We then have estimators for the linear time-varying coefficients,

$$\hat{\beta}(t)' = x(t)'\hat{\Theta} \quad \text{or} \quad \hat{\beta}_j(t) = x(t)'\hat{\theta}_j, \quad j \in \{1, \ldots, p\}. \tag{12}$$

From the properties of the maximum likelihood estimator under regularity conditions, e.g., Philippou and Roussas (1975), the estimators are asymptotically normal: $\hat{\beta}_j(t) - \beta_j(t)$ is asymptotically distributed as $N(0, \sigma_j^2(t))$, where $\sigma_j^2(t) = x(t)'\Omega_j x(t)$ and $\mathrm{Cov}(\hat{\theta}_j) = \Omega_j$ is the corresponding $q \times q$ diagonal block of $\Omega = (\Omega_{uv})$, $u, v = 1, \ldots, pq$, i.e., $\Omega_j = (\Omega_{uv})$, $u, v = (j-1)q + 1, \ldots, jq$.

#### 2.3. Constructing simultaneous confidence intervals of time-varying coefficients

Here, we construct a confidence interval for the linear time-varying coefficients, given by

$$I_{j,\alpha}(t \mid u_\alpha) = \left[ \hat{\beta}_j(t) - u_\alpha \hat{\sigma}_j(t),\ \hat{\beta}_j(t) + u_\alpha \hat{\sigma}_j(t) \right]. \tag{13}$$

The covering probability of $I_{j,\alpha}(t \mid u_\alpha)$ depends on $u_\alpha$. For example, the pointwise confidence interval at a fixed time $t$ can be constructed by letting $u_\alpha = z_{\alpha/2}$, where $z_\alpha$ denotes the upper $100\alpha$ percentile of $N(0, 1)$. The covering probability $\Pr(\beta_j(t) \in I_{j,\alpha}(t \mid z_{\alpha/2}))$ is then asymptotically $1 - \alpha$ for a fixed time $t$. To construct a simultaneous confidence interval, we need to evaluate the distribution of the supremum of the Wald-type statistic $T_j(t) = \{\hat{\beta}_j(t) - \beta_j(t)\}/\sigma_j(t)$, but it is difficult to derive an explicit expression for the distribution of the supremum statistic in general. Here, we evaluate an upper bound of the supremum of $T_j(t)^2$ in the same manner as in Satoh and Yanagihara (2010). From the inequality in Rao (1973, p. 60), $\hat{\beta}_j(t)$ asymptotically satisfies

$$\sup_{t \in \mathbb{R}} T_j(t)^2 = \sup_{t \in \mathbb{R}} \frac{\{x(t)'(\hat{\theta}_j - \theta_j)\}^2}{x(t)'\Omega_j x(t)} \le \sup_{x \in \mathbb{R}^q} \frac{\{x'(\hat{\theta}_j - \theta_j)\}^2}{x'\Omega_j x} = (\hat{\theta}_j - \theta_j)'\Omega_j^{-1}(\hat{\theta}_j - \theta_j) \sim \chi^2_q. \tag{14}$$

Note that the asymptotic distribution of the upper bound is $\chi^2_q$ and does not depend on the time $t$. Let $u_\alpha = \sqrt{c_{q,\alpha}}$, where $c_{q,\alpha}$ is the upper $100\alpha$ percentile of $\chi^2_q$. Since (14) bounds the squared statistic, the covering probability of the confidence interval $I_{j,\alpha}(t \mid \sqrt{c_{q,\alpha}})$ satisfies

$$\Pr\left( \beta_j(t) \in I_{j,\alpha}(t \mid \sqrt{c_{q,\alpha}}),\ \forall t \in \mathbb{R} \right) \ge 1 - \alpha. \tag{15}$$

Based on equation (14), we can construct test statistics for the following null hypotheses on the time-varying coefficient $\beta_j(t)$:

$$\begin{aligned} \text{Uniformly zero:} \quad & H_0 : \beta_j(t) = 0 \ \text{for all } t \in \mathbb{R}, \\ \text{Uniformly constant:} \quad & H_0 : \beta_j(t) = \text{const.} \ \text{for all } t \in \mathbb{R}. \end{aligned} \tag{16}$$

The uniformly zero hypothesis is equivalent to $\theta_j = 0$. It implies that the corresponding covariate $a_j$ has no effect on the observations and that the corresponding odds ratio is 1, i.e., $\exp\{\beta_j(t)\} = 1$. Using equation (14) with $\theta_j = 0$, the upper bound of the supremum of $T_j(t)^2$ is $W_j = \hat{\theta}_j'\Omega_j^{-1}\hat{\theta}_j \sim \chi^2_q$. Hence, $W_j$ can be used as a test statistic for this null hypothesis: the uniformly zero hypothesis is rejected when $W_j > c_{q,\alpha}$, and the p-value is obtained as $\Pr(\chi^2_q > W_j)$.

The uniformly constant hypothesis is equivalent to $\theta_j^{(-1)} = 0$, where $\theta_j^{(-1)}$ is the $(q-1)$-dimensional vector obtained by excluding the first element of $\theta_j$, which corresponds to the constant basis term 1. Analogous to the test for the uniformly zero hypothesis, we can construct a test statistic and derive an asymptotic null distribution for the uniformly constant hypothesis.

### 3. Numerical example

In this section, we consider the dataset of remission lengths (weeks) for acute leukemia patients in Table 1, which was reported by Freireich et al. (1963) and explained in Kleinbaum (2012). The data consist of a placebo group and a treatment group, each containing 21 patients. Our main concern is comparing the survival rates of the two groups. We fitted the proposed model using the placebo group as the control group; the covariate of the $i$th individual is $a_i = 1$ for the treatment group and $a_i = 0$ for the placebo group, where $i = 1, \ldots, n$ with $n = 21 \times 2 = 42$. Assuming the time-varying coefficient for the treatment effect to be a linear curve, the design vector is $x(t) = (1, t)'$ with length $q = 2$. Note that the survival time $t$ is the logarithm of the original length of remission. The maximum likelihood estimates and the asymptotic standard errors were calculated using (10) and (11), respectively, and are listed in Table 2. Hence, the estimated logistic regression model in (1) can be expressed as $\hat{\beta}_1(t) + \hat{\beta}_2(t)a$, where $\hat{\beta}_1(t)$ is the fitted linear function of $t$ for the placebo group and $\hat{\beta}_2(t)$ is that for the treatment effect, i.e., $\hat{\beta}_1(t) + \hat{\beta}_2(t)$ for the treatment group.

Figure 1 shows the fitted survival curves for each group. The proposed model seems to provide a good fit to the Kaplan-Meier curves. Since the proposed model is based on logistic regression, the odds ratio for the treatment group relative to the placebo group can be expressed as $\exp\{\beta_2(t)\}$ (see Figure 2). The simultaneous confidence intervals were also derived using (15). The estimated time-varying odds ratio curve stays around 0.1 throughout the observation period in Figure 2. In fact, the regression coefficient of $t \times a$ in Table 2 is not statistically significant. The interaction term was therefore removed, and the corresponding estimates are given in Table 3. The treatment effect is now statistically significant, although it was not significant in Table 2. The estimated odds ratio in Table 3 is $\exp(-2.315) = 0.10$, and the curve in Figure 2 appears to be reasonably constant.

Applied to the remission time dataset, the proposed logistic regression model with time-varying coefficients provided a good fit to the data. Because the more flexible model allows for non-stationary odds ratios, it also let us confirm that the odds ratio was constant.
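The interval (13) and the Wald-type test for the uniformly zero hypothesis in Section 2.3 can be sketched in Python for the linear basis $x(t) = (1, t)'$, i.e., $q = 2$. The sketch takes the multiplier of the simultaneous band to be the square root of the upper-$\alpha$ point of $\chi^2_2$, which has the closed form $-2\log\alpha$, because the bound (14) applies to the squared statistic. The function names are hypothetical, and `theta_j` and `omega_j` are made-up values, not the estimates reported in Table 2.

```python
import math
from statistics import NormalDist

import numpy as np

def interval(t, theta_j, omega_j, alpha=0.05, simultaneous=True):
    """Interval (13) for beta_j(t) with basis x(t) = (1, t)' (q = 2 only)."""
    x = np.array([1.0, t])
    est = float(x @ theta_j)                  # beta_j(t) = x(t)' theta_j
    se = math.sqrt(float(x @ omega_j @ x))    # sigma_j(t)
    if simultaneous:
        # chi-square with 2 df: Pr(X > c) = exp(-c / 2), so c = -2 log(alpha)
        u = math.sqrt(-2.0 * math.log(alpha))
    else:
        u = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # pointwise z_{alpha/2}
    return est - u * se, est + u * se

def wald_uniformly_zero(theta_j, omega_j):
    """W_j = theta_j' Omega_j^{-1} theta_j and its chi-square (2 df) p-value."""
    w_stat = float(theta_j @ np.linalg.solve(omega_j, theta_j))
    return w_stat, math.exp(-w_stat / 2.0)

# Made-up estimate and covariance block for one coefficient function.
theta_j = np.array([-2.0, -0.1])
omega_j = np.array([[1.0, -0.3],
                    [-0.3, 0.15]])

lo, hi = interval(1.0, theta_j, omega_j)
print("log-odds-ratio band at t = 1:", (lo, hi))
print("odds-ratio band at t = 1:", (math.exp(lo), math.exp(hi)))
print("Wald statistic and p-value:", wald_uniformly_zero(theta_j, omega_j))
```

Exponentiating the endpoints maps the band for $\beta_j(t)$ to a band for the odds ratio $\exp\{\beta_j(t)\}$. For a basis of another length $q$, the closed-form quantile would have to be replaced by a general $\chi^2_q$ quantile routine.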

### 4. Simulation

We obtained our estimates for the model parameters using the Newton-Raphson method, as defined by the recurrence formula (10). The estimates will converge if the initial value $\theta_0$ is sufficiently close to the maximum likelihood estimator $\hat{\theta}$, since $dl(\hat{\theta})/d\theta = 0_{qp}$ (see McCullagh and Nelder (1989)). To elucidate the behavior of the estimator we investigated (1) how quickly the estimator converged as the number of iterations increased, and (2) the influence of the initial guess on the convergence.

For our simulations, we used the parameter estimates in Table 3, which were fitted to the example shown in Table 1. The initial values can therefore be expressed as $\theta_0 = (\theta_{01}, \theta_{02}, \theta_{03})'$. The regression coefficients $\theta_{01}$ and $\theta_{02}$ were fixed at their values in Table 3 (the latter being 1.830), and the coefficient $\theta_{03}$ was simulated from the uniform distribution $U(-4, 0)$, whose range is relatively close to the maximum likelihood estimate $\hat{\theta}_3$ given in Table 3. Thus, as shown in Figure 3, 1,000 initial values were simulated from the uniform distribution, and 20 Newton-Raphson iterations were applied to each initial value. All estimators converged successfully, and the converged values were almost the same as the true maximum likelihood estimate. As for the convergence rate, fewer than 5 iterations were needed until convergence.

The simulation results suggest that the Newton-Raphson method is suitable for obtaining the maximum likelihood estimators when the initial values are sufficiently close to the true values. It is therefore important to try different initial values and to check the likelihood value in (7) for the obtained estimators.

### 5. Conclusion

We proposed a logistic regression survival model with time-varying coefficients. The maximum likelihood estimators and their asymptotic covariance matrix were calculated iteratively by the Newton-Raphson method. In our model, the odds ratio can be expressed as a function of time, and its simultaneous confidence intervals were also considered. The simulation study indicated that the maximum likelihood estimator, and hence the odds ratio, can be obtained when the initial values are close to the true values. The model provided a good fit when applied to a real dataset, and it was confirmed that the odds ratio is constant in time. Besides providing a test of stationarity for the odds ratio, our proposed model might also be useful for modeling odds ratios which are non-stationary.

### References

Bennet, S. (1983). Log-logistic regression models for survival data. Journal of Applied Statistics, 32.

Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B, 34.

Freireich, E. O. et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia. Blood, 21.

Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journals of the Royal Statistical Society B, 55.

Kleinbaum, D. G. (2012). Survival Analysis 3rd ed., Springer, New York.

Philippou, A. N. and Roussas, G. G. (1975). Asymptotic normality of the maximum likelihood estimate in the independent not identically distributed case. Annals of the Institute of Statistical Mathematics, 27.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications. John Wiley, New York.

McCullagh, P. and Nelder, J. A. (1989). Generalized linear models 2nd ed., Chapman and Hall/CRC, London.

Satoh, K. and Yanagihara, H. (2010). Estimation of varying coefficients for a growth curve model. American Journal of Mathematical and Management Sciences, 30.

Satoh, K. and Tonda, T. (2016). Estimating regression coefficients for balanced growth curve model when time trend of baseline is not specified. American Journal of Mathematical and Management Sciences, in press.

Table 1. Length of remission dataset by Freireich et al. (1963). Columns: ID, Placebo, Treatment (repeated in two side-by-side blocks).

Table 2. Estimates of regression coefficients. Columns: Variables, Estimate, Std. Error, $\chi^2_1$, p-value. Rows: (Intercept), $t$, $a$, $t \times a$.

Table 3. Estimates of regression coefficients when the treatment effect is constant in time. Columns: Variables, Estimate, Std. Error, $\chi^2_1$, p-value. Rows: (Intercept), $t$, $a$.

Figure 1. Fitted survival curves based on the logistic regression model. (Vertical axis: survival probability; horizontal axis: weeks; curves: treatment, placebo, and Kaplan-Meier.)

Figure 2. The estimated time-varying odds ratio curve and its 95% simultaneous confidence intervals. (Vertical axis: odds ratio; horizontal axis: weeks; curves: estimated OR and 95% C.I.)

Figure 3. Convergence of the regression coefficients with different initial values, when using the Newton-Raphson method (vertical axis: estimates; horizontal axis: iterations of the Newton-Raphson method). The true value is


Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

### A Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn

A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women's Role in Society, Colonic

### A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

### Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today's class: 1. Examples of count and ordered

### Lecture 3. Truncation, length-bias and prevalence sampling

Lecture 3. Truncation, length-bias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in

### Confidence Bands for the Logistic and Probit Regression Models Over Intervals

Confidence Bands for the Logistic and Probit Regression Models Over Intervals arXiv:1604.01242v1 [math.ST] 5 Apr 2016 Lucy Kerns Department of Mathematics and Statistics Youngstown State University Youngstown

### Practice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:

Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation

### Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

### Confidence intervals for the variance component of random-effects linear models

The Stata Journal (2004) 4, Number 4, pp. 429-435 Confidence intervals for the variance component of random-effects linear models Matteo Bottai Arnold School of Public Health University of South Carolina

### GEE for Longitudinal Data - Chapter 8

GEE for Longitudinal Data - Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasi-likelihood estimation method

### Longitudinal + Reliability = Joint Modeling

Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTED-HAROSA International Workshop November 21-22, 2013 Barcelona Mainly from Rizopoulos,

### Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X's

Chapter 866 Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X's Introduction Logistic regression expresses the relationship between a binary response variable and one or

### Simple logistic regression

Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

### Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

### Statistics 262: Intermediate Biostatistics Regression & Survival Analysis

Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,

### Integrated likelihoods in survival models for highly stratified

Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highly stratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola

### Survival Analysis. STAT 526 Professor Olga Vitek

Survival Analysis STAT 526 Professor Olga Vitek May 4, 2011 9 Survival Data and Survival Functions Statistical analysis of time-to-event data Lifetime of machines and/or parts (called failure time analysis

### Likelihood Construction, Inference for Parametric Survival Distributions

Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively right-censored survival data and indicate how to make

### Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)

Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Z-test χ 2 -test Confidence Interval Sample size and power Relative effect

### Statistical Inference and Methods

Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and

### Model comparison and selection

BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models $M_1 = \{f(x; \theta),\ \theta \in \Theta_1\}$ and $M_2 = \{f(x; \theta),\ \theta \in \Theta_2\}$ for a sample (X = x)

### Generalized Linear Modeling - Logistic Regression

1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating

### Lecture 10: Introduction to Logistic Regression

Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial

### STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

### Lecture 4: Newton's method and gradient descent

Lecture 4: Newton's method and gradient descent Newton's method Functional iteration Fitting linear regression Fitting logistic regression Prof. Yao Xie, ISyE 6416, Computational Statistics, Georgia Tech

### Continuous Time Survival in Latent Variable Models

Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract

### Goodness-Of-Fit for Cox's Regression Model. Extensions of Cox's Regression Model. Survival Analysis Fall 2004, Copenhagen

Outline Cox's proportional hazards model. Goodness-of-fit tools More flexible models R-package timereg Forthcoming book, Martinussen and Scheike. 2/38 University of Copenhagen http://www.biostat.ku.dk

### A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

### A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR. Agnieszka Rossa

A SMOOTHED VERSION OF THE KAPLAN-MEIER ESTIMATOR Agnieszka Rossa Dept. of Stat. Methods, University of Lódź, Poland Rewolucji 1905, 41, Lódź e-mail: agrossa@krysia.uni.lodz.pl and Ryszard Zieliński Inst.

### LOGISTICS REGRESSION FOR SAMPLE SURVEYS

4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi-002 4. INTRODUCTION Researchers use sample survey methodology to obtain information

### Harvard University. Harvard University Biostatistics Working Paper Series. Survival Analysis with Change Point Hazard Functions

Harvard University Harvard University Biostatistics Working Paper Series Year 2006 Paper 40 Survival Analysis with Change Point Hazard Functions Melody S. Goodman Yi Li Ram C. Tiwari Harvard University,

### Müller: Goodness-of-fit criteria for survival data

Müller: Goodness-of-fit criteria for survival data Sonderforschungsbereich 386, Paper 382 (2004) Online unter: http://epub.ub.uni-muenchen.de/ Projektpartner Goodness of fit criteria for survival data

### COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

### Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

### Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

### Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see

### Investigation of goodness-of-fit test statistic distributions by random censored samples

Investigation of goodness-of-fit test statistic distributions by random censored samples Novosibirsk State Technical University November 22, 2010 Outline 1 Nonparametric goodness-of-fit