Logistic regression model for survival time analysis using time-varying coefficients


Accepted in American Journal of Mathematical and Management Sciences, 2016

Kenichi SATOH
Research Institute for Radiation Biology and Medicine, Hiroshima University, Kasumi, Minamiku, Hiroshima

Tetsuji TONDA
Faculty of Management and Information Systems, Prefectural University of Hiroshima, UjinaHigashi, Minamiku, Hiroshima, JAPAN.

Shizue IZUMI
Center for Data Science Education and Research, Shiga University, Banbacho, Hikone, Shiga, JAPAN.

SYNOPTIC ABSTRACT
In epidemiological studies, odds ratios are widely used for quantifying the relative risk. The odds ratio can be estimated from background factors using logistic regression. In this paper, a logistic regression model for the survival time is proposed using time-varying coefficients, and statistical inference is conducted using the Newton-Raphson method and simultaneous confidence intervals. Numerical examples and simulation studies demonstrate that the proposed model can be used to obtain the odds ratio in survival time analysis.

Key words: Logistic regression model; Newton-Raphson method; Odds ratio; Survival time analysis; Time-varying coefficient.

1. Introduction. Odds ratios are widely used in epidemiology to measure association with dichotomous outcome variables, such as case or control, normal or abnormal, dead or alive (see McCullagh and Nelder (1989)). An odds ratio can be interpreted as a relative risk when the probability of occurrence is very small.
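The closing remark of this paragraph can be checked with a few lines of arithmetic. The following Python sketch compares the relative risk with the odds ratio for three pairs of event probabilities; the probabilities and the function names `odds`, `odds_ratio` and `relative_risk` are illustrative assumptions, not values or notation from the paper.

```python
def odds(p):
    """Odds p / (1 - p) of an event with probability p."""
    return p / (1.0 - p)


def odds_ratio(p1, p0):
    """Odds ratio of probability p1 (exposed) against p0 (unexposed)."""
    return odds(p1) / odds(p0)


def relative_risk(p1, p0):
    """Relative risk p1 / p0."""
    return p1 / p0


# Hypothetical event probabilities (exposed, unexposed); illustrative only.
for p1, p0 in [(0.002, 0.001), (0.02, 0.01), (0.4, 0.2)]:
    print(p1, p0, round(relative_risk(p1, p0), 4), round(odds_ratio(p1, p0), 4))
```

In all three pairs the relative risk is 2, while the odds ratio stays close to 2 only for the rare events and drifts away from it as the probabilities grow.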
Logistic regression models are often used to estimate the odds ratio in situations where confounding factors require adjustment. On the other hand, time to death, or survival time, is frequently analyzed using the Cox proportional hazards model proposed in Cox (1972). However, that model is concerned not with the odds ratio but with the hazard ratio. Here, we apply the logistic regression model to survival time analysis and evaluate the odds ratio. In Section 2, we consider time-varying coefficients in a logistic regression model in order to describe survival time data. In Section 3, the proposed model is applied to a real dataset, and the stability of the estimation method is investigated in a simulation study in Section 4. In Section 5, we discuss our proposed method and the conclusions from our investigation.

2. Logistic regression model for survival time data. First, we define survival time as a random variable and explain the censoring time in 2.1. Then we connect the distribution function of survival time with time-varying coefficients. In 2.2, regression coefficients are estimated by maximizing a log-likelihood under the logistic regression model, for which the Newton-Raphson method can be implemented. Since the estimated time-varying coefficients are functions of time, their confidence intervals are also functions of time; these are given in 2.3.

2.1. Describing distribution function of survival time data by using time-varying coefficients. Let $T$ be a continuous random variable denoting the time of death, whose cumulative distribution function (cdf) is given by $F(t) = \Pr(T \le t)$. The complement of the cdf is known as the survival function, given by $S(t) = 1 - F(t)$. It denotes the probability of being alive up until time $t$ or, more generally, the probability that the event of interest has not occurred by time $t$. When observation of a subject ends before the event occurs, the time at which observation ends is called the censoring time. Let the regression coefficients of covariates $a = (a_1, \ldots, a_p)^\top$ be $\beta(t) = (\beta_1(t), \ldots, \beta_p(t))^\top$.
The effects of covariates can be nonstationary, in which case they are referred to as time-varying coefficients (Hastie and Tibshirani (1993)). With the logit, or log-odds, transformation of $F(t)$, a logistic regression model for survival time data is obtained as

$$\log\frac{F(t)}{S(t)} = z(t \mid a) = \beta(t)^\top a. \tag{1}$$

Thus, the log-odds ratio for $a_j = 1$ versus $a_j = 0$ at time $t$ can be expressed as

$$z(t \mid a_j = 1) - z(t \mid a_j = 0) = \beta_j(t), \tag{2}$$

and the odds ratio is given by $\exp\{\beta_j(t)\}$. The model in (1) can be regarded as an extension of the log-logistic model proposed by Bennet (1983), which uses the log-logistic distribution function for survival time and has a varying coefficient $\log t$ only for a constant covariate $a_1$, i.e., $\log\{F(t)/S(t)\} = \phi \log t + \beta^\top a$. Here we propose a model to evaluate the time-varying coefficients for the covariates in equation (1). We consider linear time-varying coefficients using the growth curve model presented in Satoh and Yanagihara (2010) for longitudinal data. Let $x(t)$ be a $(q-1)$th degree polynomial basis function for the varying coefficients $\beta(t)$, i.e.,

$$\beta(t)^\top = x(t)^\top \Theta. \tag{3}$$

Here, $x(t) = (1, t, t^2, \ldots, t^{q-1})^\top$ and $\Theta = (\theta_1, \ldots, \theta_p)$ is a $q \times p$ unknown regression coefficient matrix. Note that $x(t)$ does not need to be a polynomial basis function, but it must be a differentiable function of $t$.

2.2. Deriving maximum likelihood estimators of regression coefficients. Assuming that the cdf $F(t)$ is differentiable, we obtain the probability density function (pdf)

$$f(t) = \frac{dF(t)}{dt} = F(t)S(t)\frac{dz(t)}{dt}. \tag{4}$$

From (1) and (3), it holds that

$$\frac{dz(t)}{dt} = \frac{d\beta(t)^\top}{dt}\,a = \dot{x}(t)^\top \Theta a, \tag{5}$$
where

$$\dot{x}(t) = \frac{dx(t)}{dt} = (0, 1, 2t, \ldots, (q-1)t^{q-2})^\top. \tag{6}$$

Note that the hazard function can be written as $f(t)/S(t) = F(t)\dot{x}(t)^\top \Theta a$. In most real situations, polynomial basis functions based on $t = \log(\tilde{t})$ can provide a better fit for survival data than those based on the original survival time $\tilde{t}$, e.g., Bennet (1983).

Assume that each subject either experiences the event or is censored, so that for subject $i$ we observe $(t_i, \delta_i)$, $i = 1, \ldots, n$, where $t_i$ is the time of death or of censoring and $\delta_i$ indicates whether the subject is uncensored ($\delta_i = 1$) or censored ($\delta_i = 0$). Then the likelihood function for the regression coefficients $\Theta$ can be expressed as

$$L(\Theta) = \prod_{i=1}^{n} f_i^{\delta_i} S_i^{1-\delta_i} = \prod_{i=1}^{n} \{F_i S_i \dot{z}_i\}^{\delta_i} S_i^{1-\delta_i}, \tag{7}$$

where $a_i$ is the covariate vector for subject $i$, $\dot{z}_i = \dot{x}(t_i)^\top \Theta a_i$, $f_i = f(t_i)$, $F_i = F(t_i)$ and $S_i = S(t_i)$. By maximizing the log-likelihood function with respect to $\Theta$, the maximum likelihood estimator $\hat{\Theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_p)$ can be obtained. Let $\theta = \mathrm{vec}(\Theta) = (\theta_1^\top, \ldots, \theta_p^\top)^\top$ and $l(\theta) = \log L(\Theta)$. Then the estimator $\hat{\theta} = \mathrm{vec}(\hat{\Theta})$ satisfies $dl(\hat{\theta})/d\theta = 0_{qp}$, where the score is

$$\frac{dl(\theta)}{d\theta} = \sum_{i=1}^{n} \left\{ \delta_i S_i w_i - F_i w_i + \delta_i \dot{z}_i^{-1} \dot{w}_i \right\}, \tag{8}$$

with $w_i = a_i \otimes x(t_i)$ and $\dot{w}_i = a_i \otimes \dot{x}(t_i)$. Its Hessian matrix is given by

$$\frac{d^2 l(\theta)}{d\theta^2} = -\sum_{i=1}^{n} \left\{ (1 + \delta_i) F_i S_i w_i w_i^\top + \delta_i \dot{z}_i^{-2} \dot{w}_i \dot{w}_i^\top \right\}. \tag{9}$$

Using the Newton-Raphson method, the maximum likelihood estimator $\hat{\theta}$ can be obtained from the recurrence formula

$$\theta_{m+1} = \theta_m - \left\{ \frac{d^2 l(\theta_m)}{d\theta^2} \right\}^{-1} \frac{dl(\theta_m)}{d\theta}, \quad m = 0, 1, 2, \ldots, \tag{10}$$
where $\theta_0$ is an adequate initial value. Note that the inverse of the negative Hessian matrix can be used as an asymptotic covariance matrix of the maximum likelihood estimator $\hat{\theta}$, i.e.,

$$\Omega = \mathrm{Cov}(\hat{\theta}) \approx \left\{ -\frac{d^2 l(\hat{\theta})}{d\theta^2} \right\}^{-1}. \tag{11}$$

We then have estimators for the linear time-varying coefficients,

$$\hat{\beta}(t)^\top = x(t)^\top \hat{\Theta} \quad \text{or} \quad \hat{\beta}_j(t) = x(t)^\top \hat{\theta}_j, \quad j \in \{1, \ldots, p\}. \tag{12}$$

From the properties of the maximum likelihood estimator under regularity conditions, e.g., Philippou and Roussas (1975), the estimators are asymptotically normal, $\hat{\beta}_j(t) - \beta_j(t) \sim N(0, \sigma_j^2(t))$, where $\sigma_j^2(t) = x(t)^\top \Omega_j x(t)$ and $\mathrm{Cov}(\hat{\theta}_j) = \Omega_j$ is the corresponding $q \times q$ diagonal block of $\Omega = (\Omega_{uv})$, $u, v = 1, \ldots, pq$, i.e., $\Omega_j = (\Omega_{uv})$, $u, v = (j-1)q + 1, \ldots, jq$.

2.3. Constructing simultaneous confidence intervals of time-varying coefficients. Here, we construct a confidence interval for the linear time-varying coefficients, given by

$$I_{j,\alpha}(t \mid u_\alpha) = [\hat{\beta}_j(t) - u_\alpha \hat{\sigma}_j(t),\ \hat{\beta}_j(t) + u_\alpha \hat{\sigma}_j(t)]. \tag{13}$$

The covering probability of $I_{j,\alpha}(t \mid u_\alpha)$ depends on $u_\alpha$. For example, the pointwise confidence interval at a fixed time $t$ can be constructed by letting $u_\alpha = z_{\alpha/2}$, where $z_\alpha$ denotes the upper $100\alpha$ percentile of $N(0, 1)$. Note that the confidence interval $I_{j,\alpha}(t \mid z_{\alpha/2})$ asymptotically satisfies $\Pr(\beta_j(t) \in I_{j,\alpha}(t \mid z_{\alpha/2})) \approx 1 - \alpha$ for a fixed time $t$. To construct a simultaneous confidence interval, we need to evaluate the distribution of the supremum of the Wald-type statistic $T_j(t) = \{\hat{\beta}_j(t) - \beta_j(t)\}/\sigma_j(t)$, but it is difficult in general to derive an explicit expression for the distribution of this supremum. Here, we evaluate an upper bound of the supremum of $T_j(t)$ in the same manner as in Satoh and Yanagihara (2010). From the inequality in Rao (1973, p. 60), $\hat{\beta}_j(t)$ asymptotically
satisfies the following relation:

$$\sup_{t \in \mathbb{R}} T_j(t)^2 = \sup_{t \in \mathbb{R}} \frac{\{x(t)^\top (\hat{\theta}_j - \theta_j)\}^2}{x(t)^\top \Omega_j x(t)} \le \sup_{x \in \mathbb{R}^q} \frac{\{x^\top (\hat{\theta}_j - \theta_j)\}^2}{x^\top \Omega_j x} = (\hat{\theta}_j - \theta_j)^\top \Omega_j^{-1} (\hat{\theta}_j - \theta_j) \sim \chi^2_q. \tag{14}$$

Note that the asymptotic distribution of the upper bound is $\chi^2_q$ and does not depend on the time $t$. Let $u_\alpha = \sqrt{c_{q,\alpha}}$, where $c_{q,\alpha}$ is the upper $100\alpha$ percentile of $\chi^2_q$. Then the covering probability of the confidence interval $I_{j,\alpha}(t \mid \sqrt{c_{q,\alpha}})$ asymptotically satisfies

$$\Pr\left( \beta_j(t) \in I_{j,\alpha}(t \mid \sqrt{c_{q,\alpha}}),\ \forall t \in \mathbb{R} \right) \ge 1 - \alpha. \tag{15}$$

Based on equation (14), we can construct test statistics for the following null hypotheses on the time-varying coefficient $\beta_j(t)$:

$$\text{Uniformly zero.}\quad H_0: \beta_j(t) = 0 \text{ for } t \in \mathbb{R}; \qquad \text{Uniformly constant.}\quad H_0: \beta_j(t) = \text{const. for } t \in \mathbb{R}. \tag{16}$$

The uniformly zero hypothesis is equivalent to $\theta_j = 0$. It implies that the corresponding covariate $a_j$ has no effect on observations and that the corresponding odds ratio is 1, i.e., $\exp\{\beta_j(t)\} = 1$. Using equation (14) with $\theta_j = 0$, the upper bound of the supremum of $T_j(t)^2$ is $W_j = \hat{\theta}_j^\top \Omega_j^{-1} \hat{\theta}_j \sim \chi^2_q$. Hence, $W_j$ can be used as a test statistic for the null hypothesis $H_0$. The uniformly zero hypothesis is rejected when $W_j > c_{q,\alpha}$, and the p-value can be obtained as $\Pr(\chi^2_q > W_j)$. Note that the uniformly constant hypothesis is equivalent to $\theta_j^{(-1)} = 0$, where $\theta_j^{(-1)}$ is the $(q-1)$-dimensional vector obtained by excluding the first element of $\theta_j$, which corresponds to the constant term 1 of $x(t)$. Analogous to the test for the uniformly zero hypothesis, we can construct a test statistic and derive an asymptotic null distribution for the uniformly constant hypothesis.

3. Numerical example. In this section, we consider a dataset of remission lengths (weeks) for acute leukemia patients in Table 1, which was reported by Freireich et al. (1963) and was explained in Kleinbaum (2012). The data consist of a placebo
group and a treatment group, each containing 21 patients. Our main concern is comparing the survival rates of the two groups. We fitted the proposed model using the placebo group as the control group; the covariate of the $i$th individual is $a_i = 1$ for the treatment group and $a_i = 0$ for the placebo group, where $i = 1, \ldots, n$ with $n = 21 \times 2 = 42$. Assuming the time-varying coefficient for the treatment effect to be a linear curve, the design vector is $x(t) = (1, t)^\top$ with length $q = 2$. Note that the survival time $t$ is the logarithm of the original length of remission. The maximum likelihood estimators and the asymptotic standard errors were calculated using (10) and (11), respectively, and are listed in Table 2. Hence, the estimated log-odds in (1) can be expressed as $\hat{\beta}_1(t) + \hat{\beta}_2(t) a$, where $\hat{\beta}_1(t)$, for the placebo group, and $\hat{\beta}_2(t)$, for the treatment effect, are linear in $t$ with the coefficients listed in Table 2; the log-odds for the treatment group is $\hat{\beta}_1(t) + \hat{\beta}_2(t)$. Figure 1 shows the fitted survival curves for each group. The proposed model seems to provide a good fit to the Kaplan-Meier curves. Since the proposed model is based on logistic regression, the odds ratio of the treatment group to the placebo group can be expressed as $\exp\{\beta_2(t)\}$ (see Figure 2). The simultaneous confidence intervals were also derived using (15). The estimated time-varying odds ratio curve in Figure 2 stays around 0.1 throughout the observation period. In fact, the regression coefficient of $t \times a$ in Table 2 is not statistically significant (see the p-value in Table 2). The interaction term was therefore removed from the model of Table 2, and the corresponding estimates are given in Table 3. The treatment effect is now statistically significant, although it is not significant in Table 2. The estimated odds ratio in Table 3 is $\exp(-2.315) = 0.10$, and the curve in Figure 2 appears to be reasonably constant.
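The fitted survival curves and the odds ratio curve described above follow directly from equations (1) to (3). A minimal Python sketch is given here under stated assumptions: the coefficient values are hypothetical (they are not the estimates of Table 2), the covariate vector is taken to be $(1, a)$ with an intercept, and the helper names `basis`, `beta`, `cdf` and `odds_ratio` are introduced only for illustration.

```python
import math


def basis(t, q):
    """Polynomial basis x(t) = (1, t, ..., t^(q-1))."""
    return [t ** k for k in range(q)]


def beta(t, theta_j):
    """Time-varying coefficient beta_j(t) = x(t)' theta_j, as in (3)."""
    return sum(x * th for x, th in zip(basis(t, len(theta_j)), theta_j))


def cdf(t, a, Theta):
    """F(t | a) from the logit model (1): log{F/S} = sum_j beta_j(t) a_j.

    Theta is a list of p coefficient vectors theta_j, each of length q.
    """
    z = sum(beta(t, th) * aj for th, aj in zip(Theta, a))
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(t, theta_j):
    """Odds ratio exp{beta_j(t)} for a_j = 1 against a_j = 0, from (2)."""
    return math.exp(beta(t, theta_j))


# Hypothetical coefficients, NOT the estimates reported in Table 2.
Theta = [[-4.0, 2.0],   # baseline: beta_1(t) = -4 + 2 t
         [-2.0, 0.0]]   # treatment effect: beta_2(t) = -2, constant in t

for weeks in (2, 5, 10, 20):
    t = math.log(weeks)                       # t is the log of the remission length
    s_placebo = 1.0 - cdf(t, [1, 0], Theta)   # S(t | a = 0)
    s_treat = 1.0 - cdf(t, [1, 1], Theta)     # S(t | a = 1)
    print(weeks, round(s_placebo, 3), round(s_treat, 3),
          round(odds_ratio(t, Theta[1]), 3))
```

With a zero slope in the second coefficient vector the printed odds ratio is the same at every time point, which is the constant odds ratio situation discussed in the text; a nonzero slope would make it vary with $t$.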
Applied to the remission time dataset, the proposed logistic regression model with time-varying coefficients provides a good fit to the data. Moreover, because the more flexible model allows for nonstationary odds ratios, it could be used to confirm that the odds ratio was constant.
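The pointwise and simultaneous intervals in (13) and (15) and the Wald-type tests based on (14) can be sketched in Python for the case $q = 2$ used in this example. The sketch relies on the fact that the upper $100\alpha$ percentile of $\chi^2_2$ equals $-2\log\alpha$; the estimate vector and covariance block are hypothetical numbers, the function names are illustrative, and the form of the uniformly constant test (slope squared over its variance, referred to $\chi^2_1$) is an assumption about what "analogous" means in the text.

```python
import math
from statistics import NormalDist


def quad_form(x, M, y):
    """x' M y for plain Python lists."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bands(t, theta_hat, Omega, alpha=0.05):
    """Pointwise and simultaneous intervals (13) for beta_j(t) when q = 2."""
    x = [1.0, t]
    est = x[0] * theta_hat[0] + x[1] * theta_hat[1]
    se = math.sqrt(quad_form(x, Omega, x))               # sigma_j(t)
    u_point = NormalDist().inv_cdf(1.0 - alpha / 2.0)    # z_{alpha/2}
    u_simul = math.sqrt(-2.0 * math.log(alpha))          # sqrt(c_{2,alpha})
    return ((est - u_point * se, est + u_point * se),
            (est - u_simul * se, est + u_simul * se))


def wald_zero_test(theta_hat, Omega):
    """W_j and p-value for H0: beta_j(t) = 0 for all t (q = 2)."""
    W = quad_form(theta_hat, inv2(Omega), theta_hat)
    return W, math.exp(-W / 2.0)                         # Pr(chi2_2 > W)


def wald_constant_test(theta_hat, Omega):
    """Assumed analogue for H0: beta_j(t) constant, i.e. zero slope (q = 2)."""
    W = theta_hat[1] ** 2 / Omega[1][1]
    return W, math.erfc(math.sqrt(W / 2.0))              # Pr(chi2_1 > W)


# Hypothetical estimate and covariance block; illustrative values only.
theta_hat = [-2.3, 0.1]
Omega = [[0.25, -0.05], [-0.05, 0.04]]
point, simul = bands(1.5, theta_hat, Omega)
print("pointwise:", point, "simultaneous:", simul)
print("odds ratio band:", tuple(math.exp(v) for v in simul))
print(wald_zero_test(theta_hat, Omega), wald_constant_test(theta_hat, Omega))
```

Exponentiating the interval endpoints gives the band on the odds ratio scale, as plotted in Figure 2. For general $q$ the chi-square percentile would have to come from a statistics library rather than the closed form used here.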
4. Simulation. We obtained the estimates of the model parameters using the Newton-Raphson method, as defined by the recurrence formula (10). The estimates will converge if the initial value $\theta_0$ is sufficiently close to the maximum likelihood estimator $\hat{\theta}$, since $dl(\hat{\theta})/d\theta = 0_{qp}$ (see McCullagh and Nelder (1989)). To elucidate the behavior of the estimator we investigated: 1) how quickly the estimator converged as the number of iterations increased, and 2) the influence of the initial guess on the convergence. For our simulations, we used the parameter estimates in Table 3, which were fitted to the example shown in Table 1. The initial values can therefore be expressed as $\theta_0 = (\theta_{01}, \theta_{02}, \theta_{03})^\top$. The regression coefficients $\theta_{01}$ and $\theta_{02}$ were fixed at the values in Table 3 (the latter at 1.830), and the coefficient $\theta_{03}$ was simulated from the uniform distribution $U(-4, 0)$, whose range is relatively close to the true maximum likelihood estimator $\hat{\theta}_3$ given in Table 3. Thus, as shown in Figure 3, 1,000 initial values were simulated from the uniform distribution and 20 Newton-Raphson iterations were applied for each initial value. All estimators converged successfully, and the converged values were almost identical to the true maximum likelihood estimator. Regarding the convergence rate, fewer than 5 iterations were needed until convergence. The simulation results suggest that the Newton-Raphson method is suitable for obtaining the maximum likelihood estimators when the initial values are sufficiently close to the true values. It is therefore important to try different initial values and to check the likelihood value in (7) for the obtained estimators.

5. Conclusion. We proposed a logistic regression survival model with time-varying coefficients. The maximum likelihood estimators and their asymptotic covariance matrix were calculated iteratively by the Newton-Raphson method.
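The Newton-Raphson recurrence (10), with the score (8) and Hessian (9), and the multi-start check described in the simulation can be sketched as follows. This is a minimal illustration under several assumptions that are not from the paper: the data are simulated rather than taken from Table 1, the data-generating values and the uniform censoring range are hypothetical, the model is the restricted one with a constant treatment effect (so $w_i = (1, t_i, a_i)$ and $\dot{w}_i = (0, 1, 0)$), and a step-halving safeguard is added on top of the plain recurrence.

```python
import math
import random


def simulate(n, theta, rng):
    """Draw (t, delta, a) from log{F/S} = th1 + th2*t + th3*a with uniform censoring."""
    th1, th2, th3 = theta
    data = []
    for i in range(n):
        a = i % 2                                   # alternate control / treatment
        u = rng.uniform(0.001, 0.999)
        t = (math.log(u / (1.0 - u)) - th1 - th3 * a) / th2   # invert the cdf
        c = rng.uniform(1.0, 5.0)                   # hypothetical censoring time
        data.append((min(t, c), 1 if t <= c else 0, a))
    return data


def loglik_score_hessian(theta, data):
    """Log-likelihood from (7), score (8) and Hessian (9) for w = (1, t, a)."""
    ll, g, H = 0.0, [0.0] * 3, [[0.0] * 3 for _ in range(3)]
    zdot = theta[1]                                 # dz/dt = theta_2 for every subject
    for t, d, a in data:
        w = (1.0, t, float(a))
        z = sum(wi * th for wi, th in zip(w, theta))
        F = 1.0 / (1.0 + math.exp(-z))
        S = 1.0 - F
        log_F, log_S = -math.log1p(math.exp(-z)), -math.log1p(math.exp(z))
        ll += d * (log_F + math.log(zdot)) + log_S
        for r in range(3):
            g[r] += (d * S - F) * w[r]
            for c in range(3):
                H[r][c] -= (1 + d) * F * S * w[r] * w[c]
        g[1] += d / zdot
        H[1][1] -= d / zdot ** 2
    return ll, g, H


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[piv] = M[piv], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][c] * x[c] for c in range(k + 1, n))) / M[k][k]
    return x


def newton_raphson(theta0, data, iters=20):
    """Recurrence (10); the step halving is an added safeguard, not part of (10)."""
    theta = list(theta0)
    for _ in range(iters):
        ll, g, H = loglik_score_hessian(theta, data)
        step = solve(H, g)                          # Hessian^{-1} times score
        lam = 1.0
        while True:
            new = [th - lam * s for th, s in zip(theta, step)]
            if new[1] > 0 and loglik_score_hessian(new, data)[0] >= ll - 1e-9:
                break
            lam /= 2.0                              # shrink until the likelihood does not drop
        theta = new
    return theta


rng = random.Random(1)
true_theta = (-4.0, 2.0, -2.0)      # hypothetical values, not the Table 3 estimates
data = simulate(200, true_theta, rng)
fits = [newton_raphson((true_theta[0], true_theta[1], rng.uniform(-4.0, 0.0)), data)
        for _ in range(10)]
spread = max(abs(f[2] - fits[0][2]) for f in fits)
print([round(v, 3) for v in fits[0]], spread)
```

The printed spread is the largest disagreement in the third coefficient across the starting values, so a value near zero indicates that all starts reached the same maximizer, mirroring the check illustrated in Figure 3.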
In our model, the odds ratio can be expressed as a function of time, and its simultaneous confidence intervals were also considered. The simulation study indicates that the maximum likelihood estimator, and hence the odds ratio, can be obtained when the initial values are close to the true values. The model provided a good fit when applied to a real dataset, and it was confirmed that the odds ratio is constant in time. Besides providing a test of stationarity for the odds ratio, our proposed model might also be useful for modeling odds ratios which are nonstationary.

References

Bennet, S. (1983). Log-logistic regression models for survival data. Journal of Applied Statistics, 32.
Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B, 34.
Freireich, E. O. et al. (1963). The effect of 6-mercaptopurine on the duration of steroid-induced remissions in acute leukemia. Blood, 21.
Hastie, T. and Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society. Series B, 55.
Kleinbaum, D. G. (2012). Survival Analysis, 3rd ed., Springer, New York.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed., Chapman and Hall/CRC, London.
Philippou, A. N. and Roussas, G. G. (1975). Asymptotic normality of the maximum likelihood estimate in the independent not identically distributed case. Annals of the Institute of Statistical Mathematics, 27.
Rao, C. R. (1973). Linear Statistical Inference and Its Applications. John Wiley, New York.
Satoh, K. and Yanagihara, H. (2010). Estimation of varying coefficients for a growth curve model. American Journal of Mathematical and Management Sciences, 30.
Satoh, K. and Tonda, T. (2016). Estimating regression coefficients for balanced growth curve model when time trend of baseline is not specified. American Journal of Mathematical and Management Sciences, in press.

Table 1. Length of remission dataset by Freireich et al. (1963).
Columns: ID, Placebo, Treatment, ID, Placebo, Treatment

Table 2. Estimates of regression coefficients.
Columns: Variables, Estimate, Std. Error, $\chi^2_1$, p-value
Rows: (Intercept), $t$, $a$, $t \times a$

Table 3. Estimates of regression coefficients when the treatment effect is constant in time.
Columns: Variables, Estimate, Std. Error, $\chi^2_1$, p-value
Rows: (Intercept), $t$, $a$
Figure 1. Fitted survival curves based on the logistic regression model. (Survival probability against weeks for the treatment and placebo groups, together with the Kaplan-Meier curves.)

Figure 2. The estimated time-varying odds ratio curve and its 95% simultaneous confidence intervals. (Odds ratio against weeks.)

Figure 3. Convergence of the regression coefficients with different initial values, when using the Newton-Raphson method. (Estimates against the number of Newton-Raphson iterations.) The true value is
Chapter 3 A fast routine for fitting Cox models with time varying effects Abstract The Splus and R statistical packages have implemented a counting process setup to estimate Cox models with time varying
More informationLogistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20
Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)
More informationA Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random
More informationGeneralized Linear Models. Last time: Background & motivation for moving beyond linear
Generalized Linear Models Last time: Background & motivation for moving beyond linear regression  nonnormal/nonlinear cases, binary, categorical data Today s class: 1. Examples of count and ordered
More informationLecture 3. Truncation, lengthbias and prevalence sampling
Lecture 3. Truncation, lengthbias and prevalence sampling 3.1 Prevalent sampling Statistical techniques for truncated data have been integrated into survival analysis in last two decades. Truncation in
More informationConfidence Bands for the Logistic and Probit Regression Models Over Intervals
Confidence Bands for the Logistic and Probit Regression Models Over Intervals arxiv:1604.01242v1 [math.st] 5 Apr 2016 Lucy Kerns Department of Mathematics and Statistics Youngstown State University Youngstown
More informationPractice Exam 1. (A) (B) (C) (D) (E) You are given the following data on loss sizes:
Practice Exam 1 1. Losses for an insurance coverage have the following cumulative distribution function: F(0) = 0 F(1,000) = 0.2 F(5,000) = 0.4 F(10,000) = 0.9 F(100,000) = 1 with linear interpolation
More informationIntroduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017
Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent
More informationConfidence intervals for the variance component of randomeffects linear models
The Stata Journal (2004) 4, Number 4, pp. 429 435 Confidence intervals for the variance component of randomeffects linear models Matteo Bottai Arnold School of Public Health University of South Carolina
More informationGEE for Longitudinal Data  Chapter 8
GEE for Longitudinal Data  Chapter 8 GEE: generalized estimating equations (Liang & Zeger, 1986; Zeger & Liang, 1986) extension of GLM to longitudinal data analysis using quasilikelihood estimation method
More informationLongitudinal + Reliability = Joint Modeling
Longitudinal + Reliability = Joint Modeling Carles Serrat Institute of Statistics and Mathematics Applied to Building CYTEDHAROSA International Workshop November 2122, 2013 Barcelona Mainly from Rizopoulos,
More informationConfidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s
Chapter 866 Confidence Intervals for the Odds Ratio in Logistic Regression with Two Binary X s Introduction Logistic regression expresses the relationship between a binary response variable and one or
More informationSimple logistic regression
Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationStatistics 262: Intermediate Biostatistics Regression & Survival Analysis
Statistics 262: Intermediate Biostatistics Regression & Survival Analysis Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Introduction This course is an applied course,
More informationIntegrated likelihoods in survival models for highlystratified
Working Paper Series, N. 1, January 2014 Integrated likelihoods in survival models for highlystratified censored data Giuliana Cortese Department of Statistical Sciences University of Padua Italy Nicola
More informationSurvival Analysis. STAT 526 Professor Olga Vitek
Survival Analysis STAT 526 Professor Olga Vitek May 4, 2011 9 Survival Data and Survival Functions Statistical analysis of timetoevent data Lifetime of machines and/or parts (called failure time analysis
More informationLikelihood Construction, Inference for Parametric Survival Distributions
Week 1 Likelihood Construction, Inference for Parametric Survival Distributions In this section we obtain the likelihood function for noninformatively rightcensored survival data and indicate how to make
More informationHypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2)
Hypothesis Testing, Power, Sample Size and Confidence Intervals (Part 2) B.H. Robbins Scholars Series June 23, 2010 1 / 29 Outline Ztest χ 2 test Confidence Interval Sample size and power Relative effect
More informationStatistical Inference and Methods
Department of Mathematics Imperial College London d.stephens@imperial.ac.uk http://stats.ma.ic.ac.uk/ das01/ 31st January 2006 Part VI Session 6: Filtering and Time to Event Data Session 6: Filtering and
More informationModel comparison and selection
BS2 Statistical Inference, Lectures 9 and 10, Hilary Term 2008 March 2, 2008 Hypothesis testing Consider two alternative models M 1 = {f (x; θ), θ Θ 1 } and M 2 = {f (x; θ), θ Θ 2 } for a sample (X = x)
More informationGeneralized Linear Modeling  Logistic Regression
1 Generalized Linear Modeling  Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationSTA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random
STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression
More informationLecture 4: Newton s method and gradient descent
Lecture 4: Newton s method and gradient descent Newton s method Functional iteration Fitting linear regression Fitting logistic regression Prof. Yao Xie, ISyE 6416, Computational Statistics, Georgia Tech
More informationContinuous Time Survival in Latent Variable Models
Continuous Time Survival in Latent Variable Models Tihomir Asparouhov 1, Katherine Masyn 2, Bengt Muthen 3 Muthen & Muthen 1 University of California, Davis 2 University of California, Los Angeles 3 Abstract
More informationGoodnessOfFit for Cox s Regression Model. Extensions of Cox s Regression Model. Survival Analysis Fall 2004, Copenhagen
Outline Cox s proportional hazards model. Goodnessoffit tools More flexible models Rpackage timereg Forthcoming book, Martinussen and Scheike. 2/38 University of Copenhagen http://www.biostat.ku.dk
More informationA new strategy for metaanalysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston
A new strategy for metaanalysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials
More informationA SMOOTHED VERSION OF THE KAPLANMEIER ESTIMATOR. Agnieszka Rossa
A SMOOTHED VERSION OF THE KAPLANMEIER ESTIMATOR Agnieszka Rossa Dept. of Stat. Methods, University of Lódź, Poland Rewolucji 1905, 41, Lódź email: agrossa@krysia.uni.lodz.pl and Ryszard Zieliński Inst.
More informationLOGISTICS REGRESSION FOR SAMPLE SURVEYS
4 LOGISTICS REGRESSION FOR SAMPLE SURVEYS Hukum Chandra Indian Agricultural Statistics Research Institute, New Delhi002 4. INTRODUCTION Researchers use sample survey methodology to obtain information
More informationHarvard University. Harvard University Biostatistics Working Paper Series. Survival Analysis with Change Point Hazard Functions
Harvard University Harvard University Biostatistics Working Paper Series Year 2006 Paper 40 Survival Analysis with Change Point Hazard Functions Melody S. Goodman Yi Li Ram C. Tiwari Harvard University,
More informationMüller: Goodnessoffit criteria for survival data
Müller: Goodnessoffit criteria for survival data Sonderforschungsbereich 386, Paper 382 (2004) Online unter: http://epub.ub.unimuenchen.de/ Projektpartner Goodness of fit criteria for survival data
More informationCOMPLEMENTARY LOGLOG MODEL
COMPLEMENTARY LOGLOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementaryloglog model. They all follow the same form π ( x) =Φ ( α
More informationDiscussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon
Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics
More informationFrailty Models and Copulas: Similarities and Differences
Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt
More informationDescription Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
Title stata.com logistic postestimation Postestimation tools for logistic Description Syntax for predict Menu for predict Options for predict Remarks and examples Methods and formulas References Also see
More informationInvestigation of goodnessoffit test statistic distributions by random censored samples
d samples Investigation of goodnessoffit test statistic distributions by random censored samples Novosibirsk State Technical University November 22, 2010 d samples Outline 1 Nonparametric goodnessoffit
More informationApplied Statistics. J. Blanchet and J. Wadsworth. Institute of Mathematics, Analysis, and Applications EPF Lausanne
Applied Statistics J. Blanchet and J. Wadsworth Institute of Mathematics, Analysis, and Applications EPF Lausanne An MSc Course for Applied Mathematicians, Fall 2012 Outline 1 Motivation: Why Applied Statistics?
More information18.465, further revised November 27, 2012 Survival analysis and the Kaplan Meier estimator
18.465, further revised November 27, 2012 Survival analysis and the Kaplan Meier estimator 1. Definitions Ordinarily, an unknown distribution function F is estimated by an empirical distribution function
More information