The coxvc_1-1-1 package

Appendix A: The coxvc_1-1-1 package

A.1 Introduction

The coxvc_1-1-1 package is a set of functions for survival analysis that run under R 2.1.1 [81]. It contains routines to fit Cox models [24] with time-varying effects of the covariates, as well as reduced-rank models [77]. Both modelling approaches normally require an expanded data set to be created before fitting, which makes the task computationally demanding: even small data sets explode when all the possible risk sets are stacked together. With coxvc the models are fitted on the original data, using a fast and efficient algorithm described in [76]. The package also contains some small utility functions that the authors often use when fitting survival models.

coxvc requires the packages MASS, splines and survival [64], which are loaded automatically when the package is used. Please refer to the manuals of those packages for more information. The MASS package [102] is loaded for its ginv command, which is needed to compute the generalized inverse of the information matrix of a reduced-rank model. The splines package is loaded in order to transform some of the covariates when running the models. It is not essential (although the built-in examples of the coxvc package use splines), but it is useful in many applications. Finally, the survival package is the core dependency, since it is needed to create the survival objects used in our examples.

A.2 Statistical background

The Cox proportional hazards model is the most common method for analyzing survival data. However, its main assumption of proportionality (the hazard ratio of two different cases remains constant regardless of time) is often violated, especially in studies with long follow-up. The most straightforward way to extend the model is to include interactions of the covariates with time functions. A non-proportional Cox model may be written as:

$$h(t \mid X) = h_0(t) \exp(X \Theta F^{\top}) \qquad \text{(A.1)}$$

where $h_0(t)$ is the unspecified baseline hazard, $X$ is a $1 \times p$ matrix of $p$ covariates, $F$ is an $n \times q$ matrix of $q$ time functions, and $\Theta$ is a $p \times q$ matrix of estimable coefficients.

Perperoglou, le Cessie and van Houwelingen [77] introduced the idea of reduced-rank regression to survival analysis with time-varying coefficients. A reduced-rank model requires the matrix of regression coefficients $\Theta$ to be written as a product of two submatrices, $B$ of size $p \times r$ and $\Gamma$ of size $q \times r$, so that $\Theta = B\Gamma^{\top}$ is a matrix of reduced rank $r$, smaller than the number of covariates $p$ or the number of time functions $q$. To fit the full model, $r$ has to be chosen equal to $\min(p, q)$, in which case the structure matrix $\Theta$ is of full rank.

This package was created to meet the demand for fitting reduced-rank hazards models in a fast and efficient way. For the motivation behind the package refer to [76]. The new version of the package contains an additional set of small functions that the author found useful in several cases when analyzing survival data.

A.3 Examples

First load the coxvc library:

> library(coxvc)

The sample data within this library come from a study of ovarian cancer patients [104]. There are 358 patients in total, with information on the following variables:

time   The number of days from enrollment until death or censoring.
death  An indicator of death (1) or censoring (0).
karn   The Karnofsky index, measuring the ability of the patients to perform several tasks.
diam   The diameter of the residual tumor.
figo   The Figo index, denoting the site of the metastasis.
x      Patient id.
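The reduced-rank factorization of section A.2 can be made concrete with a small numerical sketch. It uses base R only and does not call coxvc. The matrices are filled with arbitrary random numbers, and the names B, Gamma, Theta, Ft_demo and beta_t are illustrative assumptions. Simple polynomial terms stand in for the time functions.

```r
# Illustrative dimensions: p = 3 covariates, q = 4 time functions, rank r = 2.
p <- 3; q <- 4; r <- 2

set.seed(1)  # arbitrary numbers, for illustration only
B     <- matrix(rnorm(p * r), nrow = p, ncol = r)  # p x r factor
Gamma <- matrix(rnorm(q * r), nrow = q, ncol = r)  # q x r factor

# Reduced-rank coefficient matrix Theta = B Gamma' (p x q).
Theta <- B %*% t(Gamma)

# Numerical rank of Theta, which equals r rather than min(p, q).
rank_Theta <- qr(Theta)$rank

# Time functions evaluated at n = 5 time points: a constant plus three
# polynomial terms (stand-ins for a spline basis).
tt      <- seq(0.1, 1, length.out = 5)
Ft_demo <- cbind(1, tt, tt^2, tt^3)                # n x q

# Time-varying effects: row i holds the p covariate effects at time tt[i].
beta_t <- Ft_demo %*% t(Theta)                     # n x p

dim(Theta)    # 3 4
rank_Theta    # 2
dim(beta_t)   # 5 3
```

Each row of beta_t is the vector of covariate effects at one time point, so for a covariate row X the linear predictor of (A.1) at that time is the inner product of X with that row.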

[Table A.1: Definitions of variables and patient frequencies. X_k: Karnofsky index (categories include < 70); X_f: Figo stage (0 = III, 1 = IV); X_d: diameter of the residual tumor (categories ranging from Micro to > 5). The category codes and frequencies n are not preserved in this transcription.]

For more information refer to Table A.1. First attach the data:

> data(ova)
> attach(ova)

A short summary of the data follows (values missing from the transcription are left blank):

> summary(ova)
      time            death            karn            figo
 Min.   : 7.0    Min.   :0.000   Min.   :0.000   Min.   :
 1st Qu.:        1st Qu.:        1st Qu.:        1st Qu.:
 Median :        Median :1.000   Median :1.000   Median :
 Mean   :        Mean   :0.743   Mean   :1.173   Mean   :
 3rd Qu.:        3rd Qu.:        3rd Qu.:        3rd Qu.:
 Max.   :        Max.   :1.000   Max.   :4.000   Max.   :
      diam             x
 Min.   :0.000   Min.   :
 1st Qu.:        1st Qu.:
 Median :3.000   Median :
 Mean   :2.651   Mean   :
 3rd Qu.:        3rd Qu.:
 Max.   :4.000   Max.   :

A simple Cox proportional hazards model can be fitted in the usual way using the coxph command from the survival library:

> fit.ph <- coxph(Surv(time, death) ~ karn + diam + figo)
> fit.ph
Call:
coxph(formula = Surv(time, death) ~ karn + diam + figo)

      coef exp(coef) se(coef) z p
karn                            e-03
diam                            e-05
figo                            e-05

Likelihood ratio test=64.1 on 3 df, p=7.68e-14  n= 358

A test of proportionality based on Schoenfeld residuals [92] reveals that there are in fact deviations from proportional hazards in the data,

> cox.zph(fit.ph)
        rho chisq p
karn
diam
figo
GLOBAL  NA

as indicated by the small global p-value given above. A graphical inspection is given by:

> par(mfrow = c(3, 1))
> plot(cox.zph(fit.ph))

The results are shown in Figure A.1 and suggest that there may be an interaction of time with the covariates.

[Figure A.1: Test of proportionality based on scaled Schoenfeld residuals along with a spline smooth with 90% confidence intervals. Panels: Beta(t) for karn, Beta(t) for diam and Beta(t) for figo, each plotted against Time.]

A first approach is to fit a full-rank model, which includes the full Θ matrix. We choose to transform time using B-splines, and so create the F matrix to contain a constant, F_1(t) = 1, and cubic B-spline functions on 3 degrees of freedom:

> Ft <- cbind(rep(1, nrow(ova)), bs(time, df = 3))

Then the full-rank model is given by:

> fit.r3 <- coxvc(Surv(time, death) ~ karn + diam + figo, Ft, rank = 3,
+     data = ova)
> fit.r3
call:
coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 3, data = ova)

            coef exp(coef) se(coef) z p
karn
diam
figo
karn:f1(t)
diam:f1(t)
figo:f1(t)
karn:f2(t)
diam:f2(t)
figo:f2(t)
karn:f3(t)
diam:f3(t)
figo:f3(t)

log-likelihood=
algorithm converged in 5 iterations

The class of the object fit.r3 is coxvc. The generic function printcoxvc is included in the package for printing results from the full model. The model has 21 parameters, and in practice the results are identical to those obtained by fitting a coxph model on the expanded data set. However, the fit here was done in 5 iterations on the original data set, which makes the routine much faster and more efficient.

There are 266 events in total in the ovarian data set. The object fit.r3 also contains the baseline hazard evaluated at these event time points. The function expand.haz can be used for expanding either the baseline or the cumulative baseline hazard.

> haz <- fit.r3$hazard
> length(haz)
[1] 266
> haz.exp <- expand.haz(haz, death, fun = "baseline")
> length(haz.exp)
[1] 358

When expanding the baseline hazard, the function assigns a value of zero at the censoring times. When expanding a cumulative baseline hazard, it assigns to each censored case the value of the cumulative baseline at the time of the previous event.

> cum.haz <- cumsum(haz)
> cum.haz.exp <- expand.haz(cum.haz, death, fun = "cumulative")

The function plotcoxvc is included in the package to draw figures of the time-varying behavior of the covariates:

> plotcoxvc(fit.r3, fun = "effects", xlab = "time in days")

The same function can also be used for plotting the survival function. Since the object fit.r3 is of class coxvc, using plot(survfit(...)) will not give the survival plot. Instead, the function plotcoxvc can be used:

> plotcoxvc(fit.r3, fun = "survival", xlab = "time in days")

[Figure A.2: Estimated effects of the covariates over time, for the full rank model. Curves: karn, diam, figo; x-axis: time in days.]

[Figure A.3: Survival function for the full rank model. x-axis: time in days.]

Figure A.2 shows that the time-varying behavior of the covariates is too flexible, especially in the last days of follow-up. We therefore fit a rank = 2 model to the data, to see whether the fit improves:

> fit.r2 <- coxvc(Surv(time, death) ~ karn + diam + figo, Ft, rank = 2,
+     data = ova)
> fit.r2
call:
coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 2, data = ova)

            coef exp(coef) se(coef)
karn
diam
figo
karn:f1(t)
diam:f1(t)
figo:f1(t)
karn:f2(t)
diam:f2(t)
figo:f2(t)
karn:f3(t)
diam:f3(t)
figo:f3(t)

log-likelihood=  , Rank= 2
algorithm converged in 12 iterations

Beta :              Gamma:
     [,1] [,2]           [,1] [,2]
[1,]                [1,]
[2,]                [2,]
[3,]                [3,]
                    [4,]

> summary(fit.r2)
call:
coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 2, data = ova)

Beta :              Gamma:
     [,1] [,2]           [,1] [,2]
[1,]                [1,]
[2,]                [2,]
[3,]                [3,]
                    [4,]

The class of fit.r2 is coxrr. For reduced-rank models the generic function print.coxrr prints the estimated coefficients of the model along with their standard errors, as well as the factors of the Θ matrix, B and Γ. In addition, the function summary.coxrr provides a summary of the B and Γ matrices. We see that the rank = 2 model, with 16 parameters in total, gives a more reasonable fit of the covariate effects,

> plotcoxvc(fit.r2, fun = "effects", xlab = "time in days")

while the rank = 1 model, with 9 free parameters, is much more rigid:

> fit.r1 <- coxvc(Surv(time, death) ~ karn + diam + figo, Ft, rank = 1,
+     data = ova)
> fit.r1
call:
coxvc(formula = Surv(time, death) ~ karn + diam + figo, Ft = Ft, rank = 1, data = ova)

            coef exp(coef) se(coef)
karn
diam
figo
karn:f1(t)
diam:f1(t)
figo:f1(t)
karn:f2(t)
diam:f2(t)
figo:f2(t)
karn:f3(t)
diam:f3(t)
figo:f3(t)

log-likelihood=  , Rank= 1
algorithm converged in 5 iterations

Beta :         Gamma:
     [,1]           [,1]
[1,]           [1,]
[2,]           [2,]
[3,]           [3,]
               [4,]

[Figure A.4: Estimated effects of the covariates over time, for the rank=2 model. Curves: karn, diam, figo; x-axis: time in days.]

> plotcoxvc(fit.r1, fun = "effects", xlab = "time in days")

[Figure A.5: Estimated effects of the covariates over time, for the rank=1 model. Curves: karn, diam, figo; x-axis: time in days.]

The package also contains a small function, calc.h0, to compute the baseline hazard from a Cox model, evaluated for a case with all covariate values equal to zero. For example, consider the simple proportional hazards model fit.ph. To get an estimate of the baseline hazard, the function coxph.detail can be used:

> haz.ph <- coxph.detail(fit.ph)$hazard
> haz.ph0 <- calc.h0(fit.ph)

The object haz.ph is the baseline hazard evaluated at the mean value of the covariates, while the object haz.ph0 is the baseline hazard evaluated for all covariate values equal to zero. This can be seen in Figure A.6:

> plot(time[death == 1], exp(-cumsum(haz.ph)), ylim = c(0, 1),
+     ylab = "", type = "l")
> lines(time[death == 1], exp(-cumsum(haz.ph0)), col = 2)

[Figure A.6: Survival for an average person (black line) and a person with covariates X = 0. x-axis: time[death == 1].]
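The link between the two baseline hazards follows from the form of the Cox model: a hazard evaluated at the covariate means differs from the hazard at X = 0 by a constant factor, the exponential of the linear predictor at the means. The sketch below illustrates this in base R with made-up numbers. The names haz_mean, beta and xbar and all values are illustrative assumptions, and this is not necessarily how calc.h0 is implemented.

```r
# Made-up hazard increments at five event times, evaluated at the covariate means.
haz_mean <- c(0.010, 0.012, 0.015, 0.020, 0.025)

# Made-up coefficients and covariate means for karn, diam and figo.
beta <- c(karn = 0.20, diam = 0.25, figo = 0.60)
xbar <- c(karn = 1.2,  diam = 2.6,  figo = 0.3)

# Under the Cox model h(t | x) = h0(t) * exp(sum(x * beta)), so the hazard at
# x = 0 is the hazard at the means divided by exp(sum(xbar * beta)).
haz_zero <- haz_mean * exp(-sum(xbar * beta))

# Corresponding survival curves, computed as in the plot above.
surv_mean <- exp(-cumsum(haz_mean))
surv_zero <- exp(-cumsum(haz_zero))

# The linear predictor at the means is positive here, so the X = 0 curve
# lies above the curve for the average person.
all(surv_zero > surv_mean)
```

With a negative linear predictor at the means the ordering of the two curves would be reversed.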


More information

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where

STAT 331. Accelerated Failure Time Models. Previously, we have focused on multiplicative intensity models, where STAT 331 Accelerated Failure Time Models Previously, we have focused on multiplicative intensity models, where h t z) = h 0 t) g z). These can also be expressed as H t z) = H 0 t) g z) or S t z) = e Ht

More information

Research Projects. Hanxiang Peng. March 4, Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis

Research Projects. Hanxiang Peng. March 4, Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis Hanxiang Department of Mathematical Sciences Indiana University-Purdue University at Indianapolis March 4, 2009 Outline Project I: Free Knot Spline Cox Model Project I: Free Knot Spline Cox Model Consider

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston

A new strategy for meta-analysis of continuous covariates in observational studies with IPD. Willi Sauerbrei & Patrick Royston A new strategy for meta-analysis of continuous covariates in observational studies with IPD Willi Sauerbrei & Patrick Royston Overview Motivation Continuous variables functional form Fractional polynomials

More information

R-companion to: Estimation of the Thurstonian model for the 2-AC protocol

R-companion to: Estimation of the Thurstonian model for the 2-AC protocol R-companion to: Estimation of the Thurstonian model for the 2-AC protocol Rune Haubo Bojesen Christensen, Hye-Seong Lee & Per Bruun Brockhoff August 24, 2017 This document describes how the examples in

More information

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA

CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Examples: Multilevel Modeling With Complex Survey Data CHAPTER 9 EXAMPLES: MULTILEVEL MODELING WITH COMPLEX SURVEY DATA Complex survey data refers to data obtained by stratification, cluster sampling and/or

More information

M3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction

M3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction M3 Symposium: Multilevel Multivariate Survival Models For Analysis of Dyadic Social Interaction Mike Stoolmiller: stoolmil@uoregon.edu University of Oregon 5/21/2013 Outline Example Research Questions

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

Multi-period credit default prediction with time-varying covariates. Walter Orth University of Cologne, Department of Statistics and Econometrics

Multi-period credit default prediction with time-varying covariates. Walter Orth University of Cologne, Department of Statistics and Econometrics with time-varying covariates Walter Orth University of Cologne, Department of Statistics and Econometrics 2 20 Overview Introduction Approaches in the literature The proposed models Empirical analysis

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

Treatment Effects with Normal Disturbances in sampleselection Package

Treatment Effects with Normal Disturbances in sampleselection Package Treatment Effects with Normal Disturbances in sampleselection Package Ott Toomet University of Washington December 7, 017 1 The Problem Recent decades have seen a surge in interest for evidence-based policy-making.

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA

Unit 2 Regression and Correlation Practice Problems. SOLUTIONS Version STATA PubHlth 640. Regression and Correlation Page 1 of 19 Unit Regression and Correlation Practice Problems SOLUTIONS Version STATA 1. A regression analysis of measurements of a dependent variable Y on an independent

More information

Frailty Models and Copulas: Similarities and Differences

Frailty Models and Copulas: Similarities and Differences Frailty Models and Copulas: Similarities and Differences KLARA GOETHALS, PAUL JANSSEN & LUC DUCHATEAU Department of Physiology and Biometrics, Ghent University, Belgium; Center for Statistics, Hasselt

More information

Gibbs Sampling in Linear Models #2

Gibbs Sampling in Linear Models #2 Gibbs Sampling in Linear Models #2 Econ 690 Purdue University Outline 1 Linear Regression Model with a Changepoint Example with Temperature Data 2 The Seemingly Unrelated Regressions Model 3 Gibbs sampling

More information

Attributable Risk Function in the Proportional Hazards Model

Attributable Risk Function in the Proportional Hazards Model UW Biostatistics Working Paper Series 5-31-2005 Attributable Risk Function in the Proportional Hazards Model Ying Qing Chen Fred Hutchinson Cancer Research Center, yqchen@u.washington.edu Chengcheng Hu

More information

COMPUTER SESSION: ARMA PROCESSES

COMPUTER SESSION: ARMA PROCESSES UPPSALA UNIVERSITY Department of Mathematics Jesper Rydén Stationary Stochastic Processes 1MS025 Autumn 2010 COMPUTER SESSION: ARMA PROCESSES 1 Introduction In this computer session, we work within the

More information

Model Fitting. Jean Yves Le Boudec

Model Fitting. Jean Yves Le Boudec Model Fitting Jean Yves Le Boudec 0 Contents 1. What is model fitting? 2. Linear Regression 3. Linear regression with norm minimization 4. Choosing a distribution 5. Heavy Tail 1 Virus Infection Data We

More information

The GLM really is different than OLS, even with a Normally distributed dependent variable, when the link function g is not the identity.

The GLM really is different than OLS, even with a Normally distributed dependent variable, when the link function g is not the identity. GLM with a Gamma-distributed Dependent Variable. 1 Introduction I started out to write about why the Gamma distribution in a GLM is useful. I ve found it difficult to find an example which proves that

More information

Introductory Statistics with R: Simple Inferences for continuous data

Introductory Statistics with R: Simple Inferences for continuous data Introductory Statistics with R: Simple Inferences for continuous data Statistical Packages STAT 1301 / 2300, Fall 2014 Sungkyu Jung Department of Statistics University of Pittsburgh E-mail: sungkyu@pitt.edu

More information

Lecture 12: Interactions and Splines

Lecture 12: Interactions and Splines Lecture 12: Interactions and Splines Sandy Eckel seckel@jhsph.edu 12 May 2007 1 Definition Effect Modification The phenomenon in which the relationship between the primary predictor and outcome varies

More information

Introduction to SAS proc mixed

Introduction to SAS proc mixed Faculty of Health Sciences Introduction to SAS proc mixed Analysis of repeated measurements, 2017 Julie Forman Department of Biostatistics, University of Copenhagen 2 / 28 Preparing data for analysis The

More information

Search for Blazar Flux-Correlated TeV Neutrinos in IceCube 40-String Data

Search for Blazar Flux-Correlated TeV Neutrinos in IceCube 40-String Data Search for Blazar Flux-Correlated TeV Neutrinos in IceCube 40-String Data Derek Fox, Colin Turley Fourth AMON Workshop 4 December 2015 Penn State 2 Background and Outline Two models for blazar emission:

More information

Lecture 2. October 21, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University.

Lecture 2. October 21, Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University. Lecture 2 Department of Biostatistics Johns Hopkins Bloomberg School of Public Health Johns Hopkins University October 21, 2007 1 2 3 4 5 6 Define probability calculus Basic axioms of probability Define

More information

3 Results. Part I. 3.1 Base/primary model

3 Results. Part I. 3.1 Base/primary model 3 Results Part I 3.1 Base/primary model For the development of the base/primary population model the development dataset (for data details see Table 5 and sections 2.1 and 2.2), which included 1256 serum

More information

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai

Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective. Anastasios (Butch) Tsiatis and Xiaofei Bai Optimal Treatment Regimes for Survival Endpoints from a Classification Perspective Anastasios (Butch) Tsiatis and Xiaofei Bai Department of Statistics North Carolina State University 1/35 Optimal Treatment

More information

Statistics 135 Fall 2008 Final Exam

Statistics 135 Fall 2008 Final Exam Name: SID: Statistics 135 Fall 2008 Final Exam Show your work. The number of points each question is worth is shown at the beginning of the question. There are 10 problems. 1. [2] The normal equations

More information

S The Over-Reliance on the Central Limit Theorem

S The Over-Reliance on the Central Limit Theorem S04-2008 The Over-Reliance on the Central Limit Theorem Abstract The objective is to demonstrate the theoretical and practical implication of the central limit theorem. The theorem states that as n approaches

More information

0.1 gamma.mixed: Mixed effects gamma regression

0.1 gamma.mixed: Mixed effects gamma regression 0. gamma.mixed: Mixed effects gamma regression Use generalized multi-level linear regression if you have covariates that are grouped according to one or more classification factors. Gamma regression models

More information

Hazards, Densities, Repeated Events for Predictive Marketing. Bruce Lund

Hazards, Densities, Repeated Events for Predictive Marketing. Bruce Lund Hazards, Densities, Repeated Events for Predictive Marketing Bruce Lund 1 A Proposal for Predicting Customer Behavior A Company wants to predict whether its customers will buy a product or obtain service

More information

Failure rate in the continuous sense. Figure. Exponential failure density functions [f(t)] 1

Failure rate in the continuous sense. Figure. Exponential failure density functions [f(t)] 1 Failure rate (Updated and Adapted from Notes by Dr. A.K. Nema) Part 1: Failure rate is the frequency with which an engineered system or component fails, expressed for example in failures per hour. It is

More information

Lab 11. Multilevel Models. Description of Data

Lab 11. Multilevel Models. Description of Data Lab 11 Multilevel Models Henian Chen, M.D., Ph.D. Description of Data MULTILEVEL.TXT is clustered data for 386 women distributed across 40 groups. ID: 386 women, id from 1 to 386, individual level (level

More information

High-dimensional Ordinary Least-squares Projection for Screening Variables

High-dimensional Ordinary Least-squares Projection for Screening Variables 1 / 38 High-dimensional Ordinary Least-squares Projection for Screening Variables Chenlei Leng Joint with Xiangyu Wang (Duke) Conference on Nonparametric Statistics for Big Data and Celebration to Honor

More information