Modelling Survival Data using Generalized Additive Models with Flexible Link
Ana L. Papoila¹ and Cristina S. Rocha²

¹ Faculdade de Ciências Médicas, Dep. de Bioestatística e Informática, Universidade Nova de Lisboa, Campo Mártires da Pátria 130, Lisboa, Portugal, CEAUL (apapoila@hotmail.com)
² Faculdade de Ciências, Universidade de Lisboa, Campo Grande, Edifício C6, Piso 4, Lisboa, Portugal, CEAUL (cmrocha@fc.ul.pt)

Abstract: When using Generalized Linear Models (GLMs), the link is very likely to be misspecified, because the information needed to choose this distribution function correctly is usually unavailable. To overcome this problem, new developments emerged which also gave rise to more flexible models, and survival analysis benefited from this line of research. In fact, the gamma-logit model may be viewed as a GLM with binary response and unknown link function belonging to the one-parameter family of transformations introduced by Aranda-Ordaz (1981). We suggest the use of flexible parametric link families in Generalized Additive Models (GAMs) with binary response and propose a generalization of the gamma-logit model, which we denote the additive gamma-logit model. Based on the local scoring algorithm, the estimation procedure minimizes the deviance through the use of a deviance profile plot. A simulation study was carried out and the proposed methodology was applied to a real current status data set.

Keywords: Generalized additive model; unknown link function; survival analysis; gamma-logit model; current status data.

1 Introduction

With the evolution of Statistics, there has been an emphasis on the development of models with greater flexibility. That is what happened with GLMs and, in particular, with the logistic model. In fact, several generalizations of this model were developed to minimize the errors resulting from a poor choice of the link.
Power transformation families were used to control symmetric and asymmetric departures from the logistic model, and many parametric link classes were proposed (e.g. Pregibon (1980) and Aranda-Ordaz (1981)). Survival analysis also benefited from these developments, owing to the correspondence that can be established between models in binary regression analysis and in survival analysis (Doksum and Gasko, 1990). For instance, the gamma-logit model is, from the inferential point of view, equivalent to a binary response GLM with unknown link function belonging to the Aranda-Ordaz (1981) family of transformations. However, considering a GAM instead of a GLM is the natural extension of the gamma-logit model, in the sense that smooth functions may be used to relate the covariates to the response variable, often in a more realistic way.

Some work has already been done to extend GAMs to a broader class of models with unknown nonparametric link function (Hastie and Tibshirani (1984) and Roca-Pardiñas et al. (2004)). In this paper we propose the introduction of parametric link families in GAMs. Although the developed procedures may be applied to any model whose response variable has a distribution belonging to the exponential family, our paper focuses on the binary response case. Our proposal lies somewhere between an additive model with a fixed link and an additive model with a fully non-parametric link.

When using GAMs with flexible link, it is necessary to calculate an odds ratio curve because, unlike in GLMs, the effect of a continuous covariate on the response depends not only on the shape of the partial function but also on the functional form of the link. In our case, we have derived an estimator of the odds ratio curve and also constructed pointwise confidence intervals for the odds ratios, following Figueiras and Cadarso-Suárez (2001) and Cadarso-Suárez et al. (2005). A simulation study was conducted and the new methodology was applied to a real current status data set.

2 GLMs with flexible parametric link and the gamma-logit model

The idea of using GLMs with flexible parametric link emerged as a natural consequence of the development of goodness of link tests for GLMs. In this context, Pregibon (1980) suggested a procedure to examine the adequacy of a particular hypothesized link function of a GLM, by embedding this function and the correct, but unknown, link in a family of link functions.
Let $Y$ be a response variable with a distribution belonging to the exponential family and $(X_1, \dots, X_p)$ a vector of $p$ covariates. A GLM with flexible parametric link is defined by

$$E(Y \mid X_1, \dots, X_p) = h\Bigl(\beta_0 + \sum_{j=1}^{p} \beta_j X_j,\ \psi\Bigr),$$

where $h$, known as the link function, is a strictly monotone differentiable function that belongs to the family $H = \{h(\cdot, \psi) : \psi \in \Psi\}$, $\psi$ represents the link parameter vector and $\beta_0, \beta_1, \dots, \beta_p$ are the regression coefficients, which must be estimated from the available data. This defines a broad class of models but, for now, we focus only on the particular case of a model with binary response and parametric link belonging to the family proposed by Aranda-Ordaz (1981), in order to obtain the existing gamma-logit model. In a survival analysis context, this family is defined by

$$\gamma\text{-logit}(u) = \begin{cases} \log\left\{\dfrac{(1-u)^{-\gamma} - 1}{\gamma}\right\} & \text{if } \gamma > 0, \\[6pt] \log[-\log(1-u)] & \text{if } \gamma = 0, \end{cases} \tag{1}$$
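As an illustration, the family in (1) and its inverse can be sketched in Python. The function names `gamma_logit` and `inv_gamma_logit` are hypothetical and are not taken from the authors' Fortran code. The inverse is obtained by solving (1) for $u$, which gives $u = 1 - (1 + \gamma e^{\eta})^{-1/\gamma}$ for $\gamma > 0$ and $u = 1 - \exp(-e^{\eta})$ for $\gamma = 0$.

```python
import math


def gamma_logit(u: float, gamma: float) -> float:
    """Aranda-Ordaz transform (1) of a probability u in (0, 1), for gamma >= 0."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if gamma < 0.0:
        raise ValueError("gamma must be non-negative")
    if gamma == 0.0:
        # Limiting case: complementary log-log transform, log[-log(1 - u)].
        return math.log(-math.log1p(-u))
    return math.log(((1.0 - u) ** (-gamma) - 1.0) / gamma)


def inv_gamma_logit(eta: float, gamma: float) -> float:
    """Inverse of (1): maps a linear predictor eta back to a probability."""
    if gamma < 0.0:
        raise ValueError("gamma must be non-negative")
    if gamma == 0.0:
        # 1 - exp(-exp(eta)), written with expm1 for numerical accuracy.
        return -math.expm1(-math.exp(eta))
    return 1.0 - (1.0 + gamma * math.exp(eta)) ** (-1.0 / gamma)
```

With $\gamma = 1$ the transform reduces to the ordinary logit, $\log\{u/(1-u)\}$, and $\gamma = 0$ gives the complementary log-log transform, so a single parameter moves the model between these two familiar links.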
Here, $h$ is the inverse of the function defined in (1).

3 GAMs with flexible parametric link and the additive gamma-logit model

In this paper, we propose the introduction of GAMs with a flexible parametric link, in order to obtain an extension of the gamma-logit model which we denote the additive gamma-logit model. Let $Y$ be a response variable with a distribution belonging to the exponential family and $(X_1, \dots, X_p)$ a vector of $p$ covariates. A GAM with flexible parametric link is defined by

$$\mu = E(Y \mid X_1, \dots, X_p) = h\Bigl(\beta_0 + \sum_{j=1}^{p} f_j(X_j),\ \psi\Bigr),$$

where $h$, the link function, is a strictly monotone differentiable function that belongs to the family $H = \{h(\cdot, \psi) : \psi \in \Psi\}$, and $\psi$ represents the link parameter vector. The partial functions $f_j(X_j)$, $j = 1, \dots, p$, are arbitrary univariate functions that must be estimated from the data and represent the effect of the covariates on the response. As mentioned above, we focus only on the particular case of a model with a binary response and parametric link belonging to the family proposed by Aranda-Ordaz (1981), in order to obtain the additive gamma-logit model defined by

$$F(t \mid x) = h\Bigl\{\gamma\text{-logit}[F_0(t)] + \sum_{j=1}^{p} f_j(x_j)\Bigr\},$$

where $F_0(t)$ represents the baseline distribution function.

For estimation, we added new routines to the Fortran program developed by Hastie and Tibshirani (1990). These routines estimate $\beta_0$ and the partial functions $f_1, \dots, f_p$ through the iterative modified backfitting (Buja et al., 1989) and local scoring (Hastie and Tibshirani, 1990) algorithms. Cubic smoothing splines were used to model individual predictors. The amount of smoothing was fixed before fitting the model by specifying the degrees of freedom. To estimate the parameter vector $\psi$, we used a deviance profile plot.

To estimate the odds ratio curve we followed Cadarso-Suárez et al. (2005), who proposed a generalization of the odds ratio curve suggested by Figueiras and Cadarso-Suárez (2001) for logistic GAMs. Cadarso-Suárez et al. (2005) defined the generalized odds ratio curve for a continuous covariate $X_j$ at point $x$, taking $x_0$ as the reference value, by

$$OR_j^{x_0}(x) = E_{(X_1, \dots, X_p)}\left[\frac{p(X_1, \dots, x, \dots, X_p)\,/\,\bigl(1 - p(X_1, \dots, x, \dots, X_p)\bigr)}{p(X_1, \dots, x_0, \dots, X_p)\,/\,\bigl(1 - p(X_1, \dots, x_0, \dots, X_p)\bigr)}\right],$$

where $p(X_1, \dots, X_p) = P(Y = 1 \mid X_1, \dots, X_p)$ and $E_{(X_1, \dots, X_p)}$ represents the mean operator over the covariates $\{X_k\}_{k \neq j}$. If we consider a GAM with a link belonging to the Aranda-Ordaz (1981) family of transformations, the odds at linear predictor $\eta$ are $p/(1-p) = (1 + \psi e^{\eta})^{1/\psi} - 1$, and we obtain the following estimator of the odds ratio:

$$\widehat{OR}_j^{x_0}(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{\bigl(1 + \hat\psi\, e^{\hat\beta_0 + \hat f_1(X_{i1}) + \dots + \hat f_j(x) + \dots + \hat f_p(X_{ip})}\bigr)^{1/\hat\psi} - 1}{\bigl(1 + \hat\psi\, e^{\hat\beta_0 + \hat f_1(X_{i1}) + \dots + \hat f_j(x_0) + \dots + \hat f_p(X_{ip})}\bigr)^{1/\hat\psi} - 1},$$

where $\hat\psi$, $\hat\beta_0$ and $\hat f_j$ are estimates obtained from fitting our GAM. To construct pointwise confidence intervals for the odds ratio curve, we used bootstrap techniques (Cadarso-Suárez et al., 2005).

A simulation study was carried out, not only to evaluate the quality of the link parameter estimates, but also to compare the performance of the proposed GAM with that of the GLM with the same parametric link. We concluded that the estimation process was satisfactory and that a substantial gain in deviance may be achieved with our model.

4 A real case study

To apply the proposed methodology, we studied the elapsed time from first injecting drug use until HIV infection, using a data set of 361 drug users who started using intravenous drugs between 1974 and 1997 and were admitted to the detoxification unit of the Hospital Universitari Germans Trias i Pujol in Badalona, Spain, between 1987 and […]. For these individuals the moment of HIV infection is unknown. In fact, for 15% of the cases, the only available information about this instant is limited to the interval [instant of last negative HIV test, instant of first positive HIV test]. For the rest of the individuals, we only know their status (infected or not infected) at the date of the last HIV test (monitoring instant). This means that the data are mainly case I interval censored, and so we decided to treat all the observations as current status data.
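The odds ratio estimator of Section 3 lends itself to a short numerical sketch. The code below is only an illustration under stated assumptions: the names `odds` and `odds_ratio_curve`, the argument `f_hat` (a list of already fitted partial functions) and the toy inputs at the end are all hypothetical, and no fitting or bootstrap step is included. It averages, over the sample, the ratio of the odds at $x$ to the odds at the reference value $x_0$, using the identity $p/(1-p) = (1 + \psi e^{\eta})^{1/\psi} - 1$ for the Aranda-Ordaz link.

```python
import math
from typing import Callable, Sequence


def odds(eta: float, psi: float) -> float:
    """Odds p/(1-p) implied by linear predictor eta under the Aranda-Ordaz link."""
    if psi < 0.0:
        raise ValueError("psi must be non-negative")
    if psi == 0.0:
        # Complementary log-log limit: p = 1 - exp(-e^eta), so odds = exp(e^eta) - 1.
        return math.expm1(math.exp(eta))
    return (1.0 + psi * math.exp(eta)) ** (1.0 / psi) - 1.0


def odds_ratio_curve(
    x: float,
    x0: float,
    j: int,
    X: Sequence[Sequence[float]],
    beta0: float,
    f_hat: Sequence[Callable[[float], float]],
    psi: float,
) -> float:
    """Sample average of odds(x) / odds(x0) for covariate j, other covariates as observed."""
    total = 0.0
    for row in X:
        # Intercept plus the partial functions of every covariate except the j-th.
        rest = beta0 + sum(
            f(v) for k, (f, v) in enumerate(zip(f_hat, row)) if k != j
        )
        total += odds(rest + f_hat[j](x), psi) / odds(rest + f_hat[j](x0), psi)
    return total / len(X)


# Toy illustration with invented covariate rows and invented partial functions.
X_toy = [[1.0, 0.0], [2.5, 1.0], [4.0, 1.0]]
f_toy = [lambda v: 0.3 * (v - 2.0) ** 2, lambda g: 0.5 * g]
print(odds_ratio_curve(x=3.0, x0=2.0, j=0, X=X_toy, beta0=-1.0, f_hat=f_toy, psi=5.0))
```

For $\psi = 1$ the odds reduce to $e^{\eta}$, so the ratio no longer depends on the other covariates and the curve equals $\exp\{\hat f_j(x) - \hat f_j(x_0)\}$, the familiar logistic GAM result. For other values of $\psi$ the averaging over the sample matters, which is why the estimator is written as a mean.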
From the available data we used the variables age at first intravenous drug use, gender, the elapsed time ($T$), in months, from first intravenous drug use until the date of the last HIV test (monitoring time), and the indicator variable $Y$ giving the result of the last HIV test (0 if the individual is seronegative, 1 if the individual is infected). We considered the model

$$\mu = h\{[\beta_0 + f(T)] + f_1(\mathrm{age}) + \beta_1\,\mathrm{gender}\}.$$

The estimate of the link parameter was obtained by minimizing the deviance, calculated for a grid of values of $\psi$, and we took $\hat\psi = 5$ as the best estimate, for a deviance of approximately […]. The resulting fitted model is given by

$$\hat\mu = 1 - \Bigl(1 + 5\, e^{[5.18 + \hat f(T)] + \hat f_1(\mathrm{age}) + 2.61\,\mathrm{gender}}\Bigr)^{-1/5}.$$

For the variable age, Figure 1 shows the odds ratio curve, estimated for both genders and taking the mean age (19 years) as the reference value. As the two panels show, the curves are very similar. There seems to be a lower risk of infection for individuals who began injecting drugs at approximately 26 years of age.

FIGURE 1. OR(age) estimates and corresponding 95% confidence intervals, female and male genders.

Survival curves for both females and males were obtained, and from Figure 2 we can see that time until HIV infection is longer for men. The curves also seem to level off, and the resulting plateau may indicate the existence of immune individuals in the population. In fact, it is plausible that some injecting drug users take adequate precautions, so that an HIV infection is unlikely to occur.

FIGURE 2. Estimates of the survival functions of time until HIV infection for females and males who initiated their drug addiction with a mean age of 19 years.

Finally, to evaluate the goodness-of-fit of the proposed model, the deviance residuals were examined and no serious trends, characteristic of a bad fit, were detected. To overcome the lack of global goodness-of-fit tests for this kind of model, we used bootstrap techniques and concluded that the model was reasonably adequate. The 95% bootstrap confidence interval obtained for the deviance was (352.86, […]). However, we are aware of the existence of unobserved heterogeneity among the individuals, so we believe that the introduction of a frailty term would certainly improve the fit of the model.

Acknowledgements: This research was supported by FCT/POCI […]. The authors would like to thank Drs. Klaus Langohr, Guadalupe Gómez and Robert Muga for making the data available.

References

Aranda-Ordaz, F.J. (1981). On two families of transformations to additivity for binary regression data. Biometrika 68, […].

Buja, A., Hastie, T.J. and Tibshirani, R.J. (1989). Linear smoothers and additive models (with discussion). Annals of Statistics 17, […].

Cadarso-Suárez, C., Roca-Pardiñas, J.R., Figueiras, A. and Manteiga, W. (2005). Non-parametric estimation of the odds ratios for continuous exposures using generalized additive models with an unknown link function. Statistics in Medicine 24, […].

Doksum, K.A. and Gasko, M. (1990). On a correspondence between models in binary regression and in survival analysis. International Statistical Review 58, […].

Figueiras, A. and Cadarso-Suárez, C. (2001). Application of nonparametric models for calculating odds ratios and their confidence intervals for continuous exposures. American Journal of Epidemiology 154, 3, […].

Hastie, T. and Tibshirani, R. (1984). Generalized additive models. Tech. Rep. 98, Dept. of Statistics, Stanford University.

Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman & Hall, New York.

Pregibon, D. (1980). Goodness of link tests for generalized linear models. Journal of the Royal Statistical Society, series C 29, […].

Roca-Pardiñas, J., Manteiga, W., Bande, M., Sánchez, J. and Cadarso-Suárez, C. (2004). Predicting binary time series of SO2 using generalized additive models with unknown link function. Environmetrics 15, 1-14.
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationGeneralized Additive Models (GAMs)
Generalized Additive Models (GAMs) Israel Borokini Advanced Analysis Methods in Natural Resources and Environmental Science (NRES 746) October 3, 2016 Outline Quick refresher on linear regression Generalized
More informationSemi-parametric estimation of non-stationary Pickands functions
Semi-parametric estimation of non-stationary Pickands functions Linda Mhalla 1 Joint work with: Valérie Chavez-Demoulin 2 and Philippe Naveau 3 1 Geneva School of Economics and Management, University of
More informationOn Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation
On Fitting Generalized Linear Mixed Effects Models for Longitudinal Binary Data Using Different Correlation Structures Authors: M. Salomé Cabral CEAUL and Departamento de Estatística e Investigação Operacional,
More informationOdds ratio estimation in Bernoulli smoothing spline analysis-ofvariance
The Statistician (1997) 46, No. 1, pp. 49 56 Odds ratio estimation in Bernoulli smoothing spline analysis-ofvariance models By YUEDONG WANG{ University of Michigan, Ann Arbor, USA [Received June 1995.
More informationA review of some semiparametric regression models with application to scoring
A review of some semiparametric regression models with application to scoring Jean-Loïc Berthet 1 and Valentin Patilea 2 1 ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France
More informationIntroduction to mtm: An R Package for Marginalized Transition Models
Introduction to mtm: An R Package for Marginalized Transition Models Bryan A. Comstock and Patrick J. Heagerty Department of Biostatistics University of Washington 1 Introduction Marginalized transition
More informationBayesian Nonparametric Regression for Diabetes Deaths
Bayesian Nonparametric Regression for Diabetes Deaths Brian M. Hartman PhD Student, 2010 Texas A&M University College Station, TX, USA David B. Dahl Assistant Professor Texas A&M University College Station,
More informationmboost - Componentwise Boosting for Generalised Regression Models
mboost - Componentwise Boosting for Generalised Regression Models Thomas Kneib & Torsten Hothorn Department of Statistics Ludwig-Maximilians-University Munich 13.8.2008 Boosting in a Nutshell Boosting
More informationTests of independence for censored bivariate failure time data
Tests of independence for censored bivariate failure time data Abstract Bivariate failure time data is widely used in survival analysis, for example, in twins study. This article presents a class of χ
More informationStatistics in medicine
Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu
More informationToday. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:
Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y
More informationLecture 5: Poisson and logistic regression
Dankmar Böhning Southampton Statistical Sciences Research Institute University of Southampton, UK S 3 RI, 3-5 March 2014 introduction to Poisson regression application to the BELCAP study introduction
More informationProteomics and Variable Selection
Proteomics and Variable Selection p. 1/55 Proteomics and Variable Selection Alex Lewin With thanks to Paul Kirk for some graphs Department of Epidemiology and Biostatistics, School of Public Health, Imperial
More informationLinear Regression With Special Variables
Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:
More informationA multi-state model for the prognosis of non-mild acute pancreatitis
A multi-state model for the prognosis of non-mild acute pancreatitis Lore Zumeta Olaskoaga 1, Felix Zubia Olaskoaga 2, Guadalupe Gómez Melis 1 1 Universitat Politècnica de Catalunya 2 Intensive Care Unit,
More informationStatistical Inference
Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park
More informationModel Selection in GLMs. (should be able to implement frequentist GLM analyses!) Today: standard frequentist methods for model selection
Model Selection in GLMs Last class: estimability/identifiability, analysis of deviance, standard errors & confidence intervals (should be able to implement frequentist GLM analyses!) Today: standard frequentist
More informationGeneralized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.
Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint
More informationA Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic
More informationBiost 518 Applied Biostatistics II. Purpose of Statistics. First Stage of Scientific Investigation. Further Stages of Scientific Investigation
Biost 58 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 5: Review Purpose of Statistics Statistics is about science (Science in the broadest
More informationData Mining Stat 588
Data Mining Stat 588 Lecture 9: Basis Expansions Department of Statistics & Biostatistics Rutgers University Nov 01, 2011 Regression and Classification Linear Regression. E(Y X) = f(x) We want to learn
More informationIntegrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University
Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y
More informationCensored partial regression
Biostatistics (2003), 4, 1,pp. 109 121 Printed in Great Britain Censored partial regression JESUS ORBE, EVA FERREIRA, VICENTE NÚÑEZ-ANTÓN Departamento de Econometría y Estadística, Facultad de Ciencias
More informationHow to Present Results of Regression Models to Clinicians
How to Present Results of Regression Models to Clinicians Frank E Harrell Jr Department of Biostatistics Vanderbilt University School of Medicine f.harrell@vanderbilt.edu biostat.mc.vanderbilt.edu/fhhandouts
More informationDuration of Unemployment - Analysis of Deviance Table for Nested Models
Duration of Unemployment - Analysis of Deviance Table for Nested Models February 8, 2012 The data unemployment is included as a contingency table. The response is the duration of unemployment, gender and
More informationLogistisk regression T.K.
Föreläsning 13: Logistisk regression T.K. 05.12.2017 Your Learning Outcomes Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition likelihood function: maximum likelihood estimate
More informationPartial Generalized Additive Models
Partial Generalized Additive Models An Information-theoretic Approach for Selecting Variables and Avoiding Concurvity Hong Gu 1 Mu Zhu 2 1 Department of Mathematics and Statistics Dalhousie University
More informationThe In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification p.1/27
The In-and-Out-of-Sample (IOS) Likelihood Ratio Test for Model Misspecification Brett Presnell Dennis Boos Department of Statistics University of Florida and Department of Statistics North Carolina State
More informationSTAT5044: Regression and Anova
STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation
More informationWhat s New in Econometrics? Lecture 14 Quantile Methods
What s New in Econometrics? Lecture 14 Quantile Methods Jeff Wooldridge NBER Summer Institute, 2007 1. Reminders About Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile Regression
More informationBinary Logistic Regression
The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b
More information