Beyond GLM and likelihood

1 Stat 6620: Applied Linear Models, Department of Statistics, Western Michigan University

2 Statistics curriculum
Core knowledge (modeling and estimation)
- Math stat 1 (probability, distributions, convergence theorems)
- Math stat 2 (likelihood estimation, testing)
- Linear models (correlation, regression, ANOVA)
- Generalized linear models (repeated measures, random effects)
- Bayesian data analysis
Data types
- Categorical data (logistic regression, odds ratios, CMH χ²)
- Nonparametric data analysis
- Survival data
- Multivariate data
Computing
- Stat computing 1 (SAS, R, SPSS, Python)
- Stat computing 2 (data mining, machine learning)

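3 Logistic regression: binary response
Data: one row per subject, with columns Subj, Group (the response Y: Stroke or Nonstroke), LogHcy (the predictor X, log homocysteine), Sex, Age, BMI, and SBP. Subject 1 is a female stroke case, and the listing includes subject 1919, a male stroke case.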

4 Logistic regression

5 Logistic regression
- Model 1: unadjusted
- Model 2: adjusted for age and sex
- Model 3: adjusted for age, sex, BMI, SBP, DBP, Gluc, TCh, Trigl, HDL, LDL, HoS, HoA

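6 Logistic regression
$Y = 1$ if Stroke, $Y = 0$ if Nonstroke; $X = \log \text{Hcy}$.
Logistic model: $P[Y = 1 \mid X] = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$
Probit model: $P[Y = 1 \mid X] = \Phi(\beta_0 + \beta_1 X)$, where $\Phi$ is the standard normal CDF.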
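The two response functions can be evaluated directly. The following is a minimal Python sketch using only the standard library. The function names `logistic_prob` and `probit_prob` are illustrative and do not come from the slides. $\Phi$ is computed from the error function via $\Phi(z) = [1 + \operatorname{erf}(z/\sqrt{2})]/2$.

```python
import math

def logistic_prob(b0, b1, x):
    """Logistic model: P[Y=1|X=x] = e^(b0+b1*x) / (1 + e^(b0+b1*x))."""
    eta = b0 + b1 * x
    # 1/(1+e^-eta) is algebraically the same as e^eta/(1+e^eta)
    return 1.0 / (1.0 + math.exp(-eta))

def probit_prob(b0, b1, x):
    """Probit model: P[Y=1|X=x] = Phi(b0+b1*x), Phi = standard normal CDF."""
    eta = b0 + b1 * x
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

if __name__ == "__main__":
    # Illustrative coefficients: (-7, 5) is one of the candidate pairs on the next slide
    for x in (0.89, 1.26, 1.45):
        print(x, round(logistic_prob(-7, 5, x), 3), round(probit_prob(-7, 5, x), 3))
```

The two curves have the same S shape but different scales, so the same $(\beta_0, \beta_1)$ gives different probabilities under the two models.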

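7 Q: What values of $(\beta_0, \beta_1)$ fit the data best?
Figure: Y versus X (log Hcy) with three candidate logistic curves, $(\beta_0, \beta_1) = (-2, 2)$, $(-7, 5)$, and $(-19, 15)$.
Table columns: Obs, loghcy, pred1, pred2, pred3 (the predicted probability under each candidate).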

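8 There are many criteria for choosing the best-fitting $(\hat\beta_0, \hat\beta_1)$. For example, let $Y = (1, 1, 0, 0, \ldots, 1, 0)$ and let $\hat Y$ be the vector of predicted values. Minimize
$D_1(Y, \hat Y) = \|Y - \hat Y\|_1 = \sum_i |Y_i - \hat Y_i|$, or
$D_2(Y, \hat Y) = \|Y - \hat Y\|_2^2 = \sum_i (Y_i - \hat Y_i)^2$, or
$D_3(Y, \hat Y) = \operatorname{Med}_i |Y_i - \hat Y_i|$, or
$D_4(Y, \hat Y) = \max_i |Y_i - \hat Y_i|$, or
$D_5$ = total misclassification rate.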
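To make the criteria concrete, the sketch below computes $D_1$ through $D_5$ for each candidate curve from the previous slide. The data are hypothetical, not the study data. The 0.5 cutoff used to turn predicted probabilities into classifications for $D_5$ is an assumption, because the slide does not state one.

```python
import math
import statistics

def predict(b0, b1, xs):
    # Logistic-model predicted probabilities P[Y=1|X=x]
    return [1.0 / (1.0 + math.exp(-(b0 + b1 * x))) for x in xs]

def criteria(y, yhat, cutoff=0.5):
    """Return D1..D5 for 0/1 responses y and predicted probabilities yhat."""
    resid = [abs(yi - pi) for yi, pi in zip(y, yhat)]
    # D5 classifies as 1 when yhat >= cutoff (cutoff = 0.5 is an assumption)
    wrong = sum((pi >= cutoff) != (yi == 1) for yi, pi in zip(y, yhat))
    return {
        "D1": sum(resid),                    # sum of absolute errors
        "D2": sum(r * r for r in resid),     # sum of squared errors
        "D3": statistics.median(resid),      # median absolute error
        "D4": max(resid),                    # maximum absolute error
        "D5": wrong / len(y),                # total misclassification rate
    }

if __name__ == "__main__":
    # Hypothetical data, not the study data
    xs = [0.89, 1.02, 1.10, 1.18, 1.26, 1.30, 1.33, 1.38, 1.45, 1.45]
    ys = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
    for b0, b1 in [(-2, 2), (-7, 5), (-19, 15)]:
        print((b0, b1), criteria(ys, predict(b0, b1, xs)))
```

Different criteria can rank the same candidates differently. This is the motivation for adopting a single principled criterion, the likelihood.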

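9 The likelihood principle
Choose $(\hat\beta_0, \hat\beta_1)$ to maximize $L(\beta_0, \beta_1) = \prod_i P[Y = y_i \mid X = x_i]$.
In the log Hcy example, maximize
$$P[Y = 1 \mid X = 1.45]\,P[Y = 1 \mid X = 1.33] \cdots P[Y = 0 \mid X = 0.89] = \left[\frac{e^{\beta_0 + 1.45\beta_1}}{1 + e^{\beta_0 + 1.45\beta_1}}\right]\left[\frac{e^{\beta_0 + 1.33\beta_1}}{1 + e^{\beta_0 + 1.33\beta_1}}\right] \cdots \left[1 - \frac{e^{\beta_0 + 0.89\beta_1}}{1 + e^{\beta_0 + 0.89\beta_1}}\right]$$
The maximum likelihood estimates are $(\hat\beta_0, \hat\beta_1) = (-3.37, 2.91)$.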
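One standard way to maximize this likelihood numerically is Newton–Raphson. Each step moves $\beta$ by $I^{-1}U$, where $U$ is the score vector and $I$ is the information matrix. The sketch below is minimal and uses hypothetical data that are deliberately not perfectly separated; with perfect separation the MLE does not exist. It shows the technique only and is not a claim about how PROC LOGISTIC is implemented.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """log L = sum_i [y_i*eta_i - log(1 + e^eta_i)] with eta_i = b0 + b1*x_i."""
    return sum(y * (b0 + b1 * x) - math.log1p(math.exp(b0 + b1 * x))
               for x, y in zip(xs, ys))

def fit_logistic(xs, ys, tol=1e-10, max_iter=50):
    """Maximize log L by Newton-Raphson, starting from (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0 = u1 = a = b = d = 0.0   # score (u0, u1); information [[a, b], [b, d]]
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            u0 += y - p
            u1 += (y - p) * x
            a += w
            b += w * x
            d += w * x * x
        det = a * d - b * b
        step0 = (d * u0 - b * u1) / det    # first component of I^{-1} U
        step1 = (a * u1 - b * u0) / det    # second component of I^{-1} U
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

if __name__ == "__main__":
    xs = [0.89, 1.02, 1.10, 1.18, 1.26, 1.30, 1.33, 1.38, 1.45, 1.45]  # hypothetical
    ys = [0, 1, 0, 0, 0, 0, 1, 1, 1, 1]
    print(fit_logistic(xs, ys))
```

At convergence the score is zero, so small changes in either coefficient can only lower $\log L$.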

10 The LOGISTIC Procedure
Maximum Likelihood Estimates
Parameter   DF   Estimate   Std Error   Wald Chi-Square   Pr > ChiSq
Intercept
loghcy
Odds Ratio Estimates
Effect   Point Estimate   95% Wald Confidence Limits
loghcy

11 SAS code
```sas
DATA strokedat;
  INPUT stroke $ loghcy;  /* slide shows only some records */
  DATALINES;
Yes 1.45
Yes 1.33
No 1.26
No 1.10
Yes 1.45
No 0.89
;
RUN;
PROC LOGISTIC DATA=strokedat;
  MODEL stroke (EVENT='Yes') = loghcy;  /* model P[stroke = Yes] */
RUN;
```

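12 Logistic regression: odds ratio
Define the odds of an event $E$ as $\text{Odds}(E) = \dfrac{P(E)}{1 - P(E)}$. Then
$P(E) = \dfrac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$ and $1 - P(E) = \dfrac{1}{1 + e^{\beta_0 + \beta_1 X}}$, so $\text{Odds}(E) = e^{\beta_0 + \beta_1 X}$, and
$$\frac{\text{Odds}(Y = 1 \mid x + \delta)}{\text{Odds}(Y = 1 \mid x)} = \frac{e^{\beta_0 + \beta_1 (x + \delta)}}{e^{\beta_0 + \beta_1 x}} = e^{\beta_1 \delta}.$$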
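A quick numerical check of the key property: the odds ratio for a shift of $\delta$ does not depend on the starting value $x$ or on $\beta_0$. The sketch plugs in the reported estimates $(-3.37, 2.91)$ with $\delta = 0.20$, the SD of log Hcy used on the next slide. The function names are illustrative.

```python
import math

def odds(b0, b1, x):
    """Odds(Y=1|x) = P/(1-P) under the logistic model; equals e^(b0 + b1*x)."""
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
    return p / (1.0 - p)

def odds_ratio(b0, b1, x, delta):
    """Odds(Y=1|x+delta) / Odds(Y=1|x); should not depend on x or b0."""
    return odds(b0, b1, x + delta) / odds(b0, b1, x)

if __name__ == "__main__":
    b0, b1 = -3.37, 2.91        # estimates reported in the log Hcy example
    for x in (0.9, 1.2, 1.4):
        print(x, odds_ratio(b0, b1, x, 0.20), math.exp(b1 * 0.20))
```

Every row prints the same ratio, $e^{(2.91)(0.20)} \approx 1.79$.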

13 The effect of having log Hcy one standard deviation (0.20) higher is
$$\frac{\text{Odds}(Y = 1 \mid x + 0.20)}{\text{Odds}(Y = 1 \mid x)} = e^{\hat\beta_1 (0.20)} = e^{(2.91)(0.20)} = 1.79.$$
The odds of having a stroke increase by 79% for every 1 SD increase in log Hcy.

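14 What about categorical predictors? For example, suppose log Hcy was categorized into three levels: Low (< 1.09), Normal (1.09 to 1.23), or High (> 1.23). Each subject's level is coded with two indicator variables: $X_1 = 1$ for High and $X_2 = 1$ for Normal. A Normal subject therefore has $(X_1, X_2) = (0, 1)$, and a Low subject (the reference level) has $(0, 0)$. The data table has columns Subj, Group, Y, LogHcy, Level, $X_1$, $X_2$. For example, subjects 1 and 1919 are stroke cases at the High level.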

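15 $$P[Y = 1 \mid \text{Level}] = \frac{e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}{1 + e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2}}$$
$$\text{Odds}[Y = 1 \mid \text{Level}] = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2} = \begin{cases} e^{\beta_0 + \beta_1}, & \text{if High} \\ e^{\beta_0 + \beta_2}, & \text{if Normal} \\ e^{\beta_0}, & \text{if Low} \end{cases}$$
$$\frac{\text{Odds}[Y = 1 \mid \text{Level}]}{\text{Odds}[Y = 1 \mid \text{Low}]} = \begin{cases} e^{\beta_1}, & \text{if High} \\ e^{\beta_2}, & \text{if Normal} \\ 1, & \text{if Low} \end{cases}$$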
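The indicator coding and the resulting odds ratios can be traced in a few lines. This Python sketch uses hypothetical coefficients $(\beta_0, \beta_1, \beta_2) = (-0.5, 0.9, 0.3)$; `CODING` mirrors the $(X_1, X_2)$ coding described above, with Low as the reference.

```python
import math

# Indicator coding from the slides; Low is the reference level
CODING = {"High": (1, 0), "Normal": (0, 1), "Low": (0, 0)}

def odds(level, b0, b1, b2):
    """Odds[Y=1|Level] = e^(b0 + b1*X1 + b2*X2)."""
    x1, x2 = CODING[level]
    return math.exp(b0 + b1 * x1 + b2 * x2)

def odds_ratio_vs_low(level, b0, b1, b2):
    """Odds[Y=1|Level] / Odds[Y=1|Low]: e^b1 (High), e^b2 (Normal), 1 (Low)."""
    return odds(level, b0, b1, b2) / odds("Low", b0, b1, b2)

if __name__ == "__main__":
    b0, b1, b2 = -0.5, 0.9, 0.3     # hypothetical coefficients
    for level in ("High", "Normal", "Low"):
        print(level, odds_ratio_vs_low(level, b0, b1, b2))
```

The intercept cancels in every ratio. That is why $\beta_1$ and $\beta_2$ alone carry the comparison with the Low group.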

16 Logistic regression

18 Likelihood Theory
Given a sample $y_1, \ldots, y_n$, the likelihood function is
$$L(\theta; y_1, \ldots, y_n) = \prod_{i=1}^n f(y_i; \theta).$$
(Note: $\theta$ may be a vector.) The value $\hat\theta$ that maximizes this is called the MLE.
Comment: It is often easier to maximize the log-likelihood
$$\log L(\theta) = \sum_{i=1}^n \log f(y_i; \theta).$$

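19 The shape of $f(\cdot)$ determines the properties of $\hat\theta$.
Normal: If $f(y_i; \theta) = \left(\frac{1}{2\pi}\right)^{1/2} e^{-(y_i - \theta)^2/2}$, then $L(\theta) = \left(\frac{1}{2\pi}\right)^{n/2} e^{-\sum (y_i - \theta)^2/2}$ and $\hat\theta = \frac{y_1 + \cdots + y_n}{n}$.
Laplace: If $f(y_i; \theta) = \frac{1}{2} e^{-|y_i - \theta|}$, then $L(\theta) = \left(\frac{1}{2}\right)^n e^{-\sum |y_i - \theta|}$ and $\hat\theta = \operatorname{med}\{y_1, \ldots, y_n\}$.
(Maximizing $L$ means minimizing $\sum (y_i - \theta)^2$ in the Normal case and $\sum |y_i - \theta|$ in the Laplace case.)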
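A crude grid search makes the contrast visible. The same hypothetical sample, which contains one large value, gives an MLE near the mean under the Normal log-likelihood and near the median under the Laplace log-likelihood. The grid search is only an illustration, not a recommended optimizer.

```python
import math

def normal_loglik(theta, ys):
    # log L = -(n/2) log(2*pi) - sum (y_i - theta)^2 / 2
    return -0.5 * len(ys) * math.log(2 * math.pi) - 0.5 * sum((y - theta) ** 2 for y in ys)

def laplace_loglik(theta, ys):
    # log L = -n log 2 - sum |y_i - theta|
    return -len(ys) * math.log(2) - sum(abs(y - theta) for y in ys)

def grid_argmax(loglik, ys, lo, hi, steps=15001):
    """Crude grid search for the theta in [lo, hi] that maximizes loglik."""
    grid = (lo + (hi - lo) * k / (steps - 1) for k in range(steps))
    return max(grid, key=lambda t: loglik(t, ys))

if __name__ == "__main__":
    ys = [1.2, -0.4, 3.1, 0.7, 8.5]      # hypothetical sample with one large value
    print(grid_argmax(normal_loglik, ys, -5, 10))   # close to the mean, 2.62
    print(grid_argmax(laplace_loglik, ys, -5, 10))  # close to the median, 1.2
```

The large value pulls the Normal MLE toward it but barely affects the Laplace MLE.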

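20 Theorem L1: Let $\theta_0$ denote the true value of $\theta$. Under regularity conditions, for large $n$,
$$\hat\theta \;\dot\sim\; N\!\left(\theta_0, \frac{1}{I(\theta_0)}\right), \qquad \text{where } I(\theta) = -E\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right] = -n\,E\left[\frac{\partial^2 \log f(Y; \theta)}{\partial \theta^2}\right].$$
Example: Poisson($\theta$).
$$L(\theta) = \prod_{i=1}^n \frac{e^{-\theta}\theta^{y_i}}{y_i!} = \frac{e^{-n\theta}\theta^{\sum y_i}}{\prod y_i!}$$
$$\log L(\theta) = -n\theta + \sum y_i \log\theta - \sum \log y_i!$$
$$\frac{\partial \log L(\theta)}{\partial \theta} = -n + \frac{\sum y_i}{\theta} = 0$$
The MLE is $\hat\theta = \sum y_i / n$.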

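21 Theorem L1 says $\operatorname{Var}(\hat\theta) \approx 1/I(\theta)$. For the Poisson example,
$$\log L(\theta) = -n\theta + \sum y_i \log\theta - \sum \log y_i!$$
$$\frac{\partial \log L(\theta)}{\partial \theta} = -n + \frac{\sum y_i}{\theta}, \qquad \frac{\partial^2 \log L(\theta)}{\partial \theta^2} = -\frac{\sum y_i}{\theta^2}$$
$$I(\theta) = -E\left[\frac{\partial^2 \log L(\theta)}{\partial \theta^2}\right] = E\left[\frac{\sum y_i}{\theta^2}\right] = \frac{n\theta}{\theta^2} = \frac{n}{\theta}$$
(using $E(\sum y_i) = n\theta$), so $\operatorname{Var}(\hat\theta) \approx \theta_0/n$. The standard error is $SE \approx \sqrt{\bar y/n}$.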
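The Poisson calculation can be reproduced in a few lines. This is a sketch with hypothetical counts. Evaluated at $\hat\theta$, the observed information $\sum y_i/\hat\theta^2$ equals $n/\hat\theta$, so plugging either into Theorem L1 gives the same SE.

```python
import math

def poisson_mle_se(ys):
    """MLE and large-sample SE for a Poisson(theta) sample."""
    n = len(ys)
    theta_hat = sum(ys) / n           # root of -n + sum(y)/theta = 0
    info = n / theta_hat              # I(theta) = n / theta, plugged in at theta_hat
    return theta_hat, math.sqrt(1.0 / info)   # SE = sqrt(ybar / n)

def observed_info(ys, theta):
    """-d^2 log L / d theta^2 = sum(y) / theta^2."""
    return sum(ys) / theta ** 2

if __name__ == "__main__":
    counts = [2, 0, 3, 1, 4, 2, 1, 3]   # hypothetical counts
    print(poisson_mle_se(counts))
```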

23 Theorem L2 (Rao–Cramér lower bound, RCLB): Let $y_1, \ldots, y_n$ be a random sample from $f(y; \theta_0)$, and let $U(y_1, \ldots, y_n)$ be a function such that $E(U) = \theta_0$. Then
$$\operatorname{Var}(U) \ge \frac{1}{I(\theta_0)}.$$
Implications:
1. The MLE is (asymptotically) efficient, in the sense of having the smallest possible variance: by Theorem L1, its variance approaches this bound.
2. The bound provides a framework for comparing estimators.
Q: When is the sample median better than the sample mean?
A: When the distribution is closer to the Laplace than to the Normal.

25 Definition: Let the efficiency of an estimator $U(y_1, \ldots, y_n)$ be the ratio of the RCLB to its variance, i.e.,
$$\text{eff} = \frac{1/I(\theta_0)}{\operatorname{Var}(U)}.$$
Table: Efficiency of estimators, with columns Normal and Laplace and rows Sample mean, Sample median, and $\operatorname{Med}\{(y_i + y_j)/2\}$.

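26 Confirm by simulation (R):
```r
# One sample of n = 21 from N(0, 3^2), with its mean and median
nsamp <- rnorm(n = 21, mean = 0, sd = 3)
nsamp
mean(nsamp)
median(nsamp)

# Repeat 10,000 times, storing each sample mean and median
stomean <- numeric(10000)
stomed <- numeric(10000)
for (i in 1:10000) {
  nsamp <- rnorm(n = 21, mean = 0, sd = 3)
  stomean[i] <- mean(nsamp)
  stomed[i] <- median(nsamp)
}
var(stomean) / var(stomed)  # estimated efficiency of the median relative to the mean
```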
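The R simulation checks only the Normal case. The Python sketch below (standard library only) runs the same experiment under both the Normal and the Laplace. The Laplace sampler (an exponential draw with a random sign), the seed, and the 5,000 replicates are illustrative choices. For reference, the standard large-sample values of Var(mean)/Var(median) are $2/\pi \approx 0.64$ under the Normal and 2 under the Laplace(0, 1). With $n = 21$ the simulated ratios will differ somewhat from these limits.

```python
import random
import statistics

def variance_ratio(draw, n=21, reps=5000, seed=1):
    """Monte Carlo estimate of Var(sample mean) / Var(sample median)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.variance(means) / statistics.variance(medians)

def normal_draw(rng):
    return rng.gauss(0.0, 3.0)          # N(0, 3^2), as in the R code

def laplace_draw(rng):
    # Laplace(0, 1): exponential magnitude with a random sign
    e = rng.expovariate(1.0)
    return e if rng.random() < 0.5 else -e

if __name__ == "__main__":
    print("Normal :", variance_ratio(normal_draw))   # median less efficient (< 1)
    print("Laplace:", variance_ratio(laplace_draw))  # median more efficient (> 1)
```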

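28 Multi-parameter likelihood estimation
Let $(\hat\theta_1, \hat\theta_2)$ maximize
$$\log L(\theta_1, \theta_2) = \sum_{i=1}^n \log f(y_i; x_i, \theta_1, \theta_2).$$
The $2 \times 2$ information matrix $\mathbf{I}(\theta_1, \theta_2)$ has $(j, k)$th element
$$-E\left[\frac{\partial^2 \log L(\theta_1, \theta_2)}{\partial \theta_j\, \partial \theta_k}\right] = -\sum_{i=1}^n E\left[\frac{\partial^2 \log f(Y_i; x_i, \theta_1, \theta_2)}{\partial \theta_j\, \partial \theta_k}\right].$$
Theorem L3: The variances of $\hat\theta_1$ and $\hat\theta_2$ are approximately the diagonal elements of $\mathbf{I}^{-1}(\theta_1, \theta_2)$.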
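For the logistic model the second derivatives of $\log L$ do not involve $y$, so expected and observed information coincide: $\mathbf{I}(\beta_0, \beta_1) = \sum_i p_i(1 - p_i)\begin{pmatrix} 1 & x_i \\ x_i & x_i^2 \end{pmatrix}$. The Python sketch below applies Theorem L3 to that matrix. The $x$ values are hypothetical, so the resulting SEs are not the study's.

```python
import math

def logistic_information(b0, b1, xs):
    """2x2 information matrix for logistic regression:
    I = sum_i p_i (1 - p_i) [[1, x_i], [x_i, x_i^2]]."""
    a = b = d = 0.0
    for x in xs:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        a += w
        b += w * x
        d += w * x * x
    return [[a, b], [b, d]]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def standard_errors(info):
    """Theorem L3: variances are the diagonal of I^{-1}; SEs are their square roots."""
    inv = inverse_2x2(info)
    return math.sqrt(inv[0][0]), math.sqrt(inv[1][1])

if __name__ == "__main__":
    xs = [0.89, 1.02, 1.10, 1.18, 1.26, 1.30, 1.33, 1.38, 1.45, 1.45]  # hypothetical
    info = logistic_information(-3.37, 2.91, xs)
    print(standard_errors(info))
```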

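30 Confidence interval for odds ratio
Recall:
$$\frac{\text{Odds}[Y = 1 \mid \text{Level}]}{\text{Odds}[Y = 1 \mid \text{Low}]} = \begin{cases} e^{\beta_1}, & \text{if High} \\ e^{\beta_2}, & \text{if Normal} \\ 1, & \text{if Low} \end{cases}$$
Confidence interval for $\beta_1$: $\hat\beta_1 \pm 2(1.0699) = [0.7739, \ldots]$
Confidence interval for $e^{\beta_1}$: $[e^{\ldots}, e^{\ldots}] = [2.263, \ldots]$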
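The two-step recipe (build an interval for $\beta_1$, then exponentiate its endpoints) is sketched below in Python. Pairing $\hat\beta_1 = 2.91$ (the loghcy slope reported earlier) with the SE of 1.0699 shown here is an assumption for illustration, because the slide's own point estimate is not shown. The multiplier $z = 2$ follows the slide. An exact 95% Wald interval uses $z \approx 1.96$, which gives a slightly narrower interval.

```python
import math

def wald_ci(beta_hat, se, z=2.0):
    """Wald interval beta_hat -/+ z*SE, and the odds-ratio interval from
    exponentiating its endpoints (exp is increasing, so the order is kept)."""
    lo, hi = beta_hat - z * se, beta_hat + z * se
    return (lo, hi), (math.exp(lo), math.exp(hi))

if __name__ == "__main__":
    # Illustrative inputs: slope 2.91 paired with SE 1.0699 (an assumption)
    for z in (2.0, 1.96):
        print(z, wald_ci(2.91, 1.0699, z))
```

Because the interval is built on the log-odds scale, the odds-ratio interval is not symmetric around $e^{\hat\beta_1}$.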

33 Beyond likelihood
Example: Survival times for heart transplant patients. Columns: Subj, Survive ($T$), Trans ($X_1$), Prior ($X_2$), Age ($X_3$).

34 Suppose that survival time $T$ has probability density function $f(t)$. The cumulative distribution function is
$$F(t) = P[T \le t] = \int_0^t f(u)\,du.$$
Define the survivor function as $S(t) = P[T > t] = 1 - F(t)$ and the hazard function as $h(t) = \dfrac{f(t)}{S(t)}$. Since $h(t) = -\frac{d}{dt}\log S(t)$, we have $S(t) = e^{-\int_0^t h(u)\,du}$, and therefore
$$f(t) = h(t)\, e^{-\int_0^t h(u)\,du}.$$
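These identities can be checked numerically. The Python sketch uses the hazard $h(t) = \lambda t^\alpha$ listed among the hazard models on the next slide, with hypothetical values $\lambda = 0.5$ and $\alpha = 1$. Its cumulative hazard is $H(t) = \int_0^t h(u)\,du = \lambda t^{\alpha + 1}/(\alpha + 1)$. The trapezoid rule serves only as a check that $\int_0^t f = 1 - S(t)$.

```python
import math

LAM, ALPHA = 0.5, 1.0   # hypothetical parameters for h(t) = lam * t**alpha

def hazard(t):
    return LAM * t ** ALPHA

def cum_hazard(t):
    # integral_0^t lam * u**alpha du
    return LAM * t ** (ALPHA + 1) / (ALPHA + 1)

def survivor(t):
    return math.exp(-cum_hazard(t))     # S(t) = exp(-H(t))

def density(t):
    return hazard(t) * survivor(t)      # f(t) = h(t) exp(-H(t))

def cdf_by_integration(t, steps=10000):
    # F(t) = integral_0^t f(u) du, by the trapezoid rule
    width = t / steps
    total = 0.5 * (density(0.0) + density(t))
    total += sum(density(k * width) for k in range(1, steps))
    return total * width

if __name__ == "__main__":
    print(cdf_by_integration(2.0), 1 - survivor(2.0))
```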

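35 Some hazard models:
$$h(t) = \begin{cases} \lambda, & \text{Exponential} \\ \lambda\gamma^t, & \text{Gompertz} \\ \lambda t^\alpha, & \text{Weibull} \end{cases}$$
Incorporating covariates like transplant, prior surgery, and age:
$$h(t) = \begin{cases} \lambda\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}, & \text{Exponential} \\ \lambda\gamma^t\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}, & \text{Gompertz} \\ \lambda t^\alpha\, e^{\beta_1 x_1 + \cdots + \beta_k x_k}, & \text{Weibull} \end{cases} \;=\; \lambda_0(t)\, e^{\beta' x} \quad \text{(proportional hazards)}$$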
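"Proportional hazards" means the ratio of hazards for two covariate vectors is the same at every $t$, whatever the baseline. The Python sketch checks this for all three baselines. The baseline parameters are hypothetical. The coefficients $(-1.708, 0.0586)$ for (Trans, Age) are borrowed from the heart-transplant fit for illustration, with Prior omitted.

```python
import math

# Baseline hazards lambda_0(t) from the slide, with hypothetical parameters
BASELINES = {
    "Exponential": lambda t: 0.3,             # lambda
    "Gompertz": lambda t: 0.3 * 1.1 ** t,     # lambda * gamma**t
    "Weibull": lambda t: 0.3 * t ** 0.5,      # lambda * t**alpha
}

def ph_hazard(baseline, t, beta, x):
    """h(t | x) = lambda_0(t) * exp(beta'x)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

if __name__ == "__main__":
    beta = [-1.708, 0.0586]          # illustrative (Trans, Age) coefficients
    x_a, x_b = [1, 50], [0, 50]      # same age, transplant vs. no transplant
    for name, base in BASELINES.items():
        ratios = [ph_hazard(base, t, beta, x_a) / ph_hazard(base, t, beta, x_b)
                  for t in (0.5, 1.0, 5.0)]
        print(name, ratios)          # same value at every t: exp(-1.708)
```

The baseline $\lambda_0(t)$ cancels in the ratio. That cancellation is what the partial likelihood exploits.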

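36 Estimate parameters by maximum likelihood:
$$L = \prod_{i=1}^n f_i(t_i), \qquad \text{where } f_i(t) = h_i(t)\, e^{-\int_0^t h_i(u)\,du} \text{ and } h_i(t) = \lambda_0(t)\, e^{\beta' x_i}.$$
Problem: How should $\lambda_0(t)$ be specified?
$$\lambda_0(t) = \begin{cases} \lambda, & \text{Exponential} \\ \lambda\gamma^t, & \text{Gompertz} \\ \lambda t^\alpha, & \text{Weibull} \end{cases}$$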

37 Partial Likelihood
Cox (1972) proposed a method for estimating the $\beta$s without needing to specify $\lambda_0(t)$. Maximize the partial likelihood
$$PL = \prod_{i=1}^n L_i,$$
where $L_i$ is the conditional probability that subject $i$ is the one who fails at time $t_i$, given the set of subjects at risk at time $t_i$ and that a failure occurs then.

39 A death occurred 5 days after enrollment. What is the probability that it happened to patient 1 instead of to one of the other at-risk patients?
$$L_1 = \frac{h_1(5)}{h_1(5) + \cdots + h_{75}(5)}$$
$$L_2 = \frac{h_2(15)}{h_2(15) + \cdots + h_{75}(15)}$$
$$\vdots$$
$$L_{74} = \frac{h_{74}(541)}{h_{74}(541) + h_{75}(541)}, \qquad L_{75} = 1$$

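40 $$L_1 = \frac{h_1(5)}{h_1(5) + \cdots + h_{75}(5)} = \frac{\lambda_0(5)\, e^{\beta' x_1}}{\lambda_0(5)\, e^{\beta' x_1} + \cdots + \lambda_0(5)\, e^{\beta' x_{75}}} = \frac{e^{\beta' x_1}}{e^{\beta' x_1} + \cdots + e^{\beta' x_{75}}}$$
$$L_2 = \frac{e^{\beta' x_2}}{e^{\beta' x_2} + \cdots + e^{\beta' x_{75}}}$$
The combination of the PH assumption and the partial likelihood $PL = \prod_{i=1}^n L_i$ allows estimation of $\beta$ without specifying the baseline hazard.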
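The partial likelihood can be computed and maximized directly. The Python sketch below handles a single covariate. As in the slide's example, where $L_{75} = 1$, it assumes distinct failure times and no censored observations. The data are hypothetical. Newton–Raphson is used here as one common way to maximize $\log PL$; the sketch makes no claim about PROC PHREG's internals. At $\beta = 0$ each $L_i$ is one over the size of its risk set, which gives an easy sanity check.

```python
import math

def cox_partial_loglik(beta, times, x):
    """log PL = sum_i [beta*x_i - log sum_{j: t_j >= t_i} exp(beta*x_j)].
    Assumes distinct failure times and no censoring."""
    total = 0.0
    for ti, xi in zip(times, x):
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]   # risk set at t_i
        total += beta * xi - math.log(sum(math.exp(beta * xj) for xj in risk))
    return total

def cox_fit(times, x, tol=1e-10, max_iter=50):
    """Newton-Raphson on the one-parameter partial log-likelihood."""
    beta = 0.0
    for _ in range(max_iter):
        score = info = 0.0
        for ti, xi in zip(times, x):
            w = [(xj, math.exp(beta * xj)) for tj, xj in zip(times, x) if tj >= ti]
            s0 = sum(e for _, e in w)
            s1 = sum(xj * e for xj, e in w)
            s2 = sum(xj * xj * e for xj, e in w)
            score += xi - s1 / s0                 # observed minus risk-set average
            info += s2 / s0 - (s1 / s0) ** 2      # risk-set variance of x
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta

if __name__ == "__main__":
    times = [5, 15, 20, 34, 50, 68, 90, 120, 200, 541]   # hypothetical
    x = [0, 1, 0, 0, 1, 0, 1, 1, 0, 1]                   # hypothetical binary covariate
    print(cox_fit(times, x))
```

Note that $\lambda_0(t)$ never appears in the code, which is the point of the slide.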

41 SAS output
The PHREG Procedure
Dependent variable: Survive
Maximum Likelihood Estimates
Var   DF   Estimate   Standard Error   Wald Chi-sq   Pr > Chi-sq   Risk Ratio
Trans
Prior
Age

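42 The hazard ratio for age is
$$\frac{h(t; \text{Age} = x_3 + 1)}{h(t; \text{Age} = x_3)} = \frac{\lambda_0(t)\, e^{-1.708 x_1 + \hat\beta_2 x_2 + 0.0586 (x_3 + 1)}}{\lambda_0(t)\, e^{-1.708 x_1 + \hat\beta_2 x_2 + 0.0586 x_3}} = e^{0.0586} \approx 1.06,$$
so every additional year of age increases the hazard of failure by 6%. The 95% confidence interval for the hazard ratio is
$$\left[e^{0.0586 - 2(0.0150)},\; e^{0.0586 + 2(0.0150)}\right] = [1.0289, \ldots].$$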
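The arithmetic on this slide is easy to reproduce. The Python sketch below exponentiates the age coefficient and the endpoints of $\hat\beta \pm 2\,SE$, using the slide's values 0.0586 and 0.0150. It also converts the Trans coefficient $-1.708$ from the expression above into a proportional reduction in hazard. The helper name `hazard_ratio_ci` is illustrative.

```python
import math

def hazard_ratio_ci(beta_hat, se, z=2.0):
    """Hazard ratio e^beta_hat and interval [e^(beta_hat - z*SE), e^(beta_hat + z*SE)]."""
    return math.exp(beta_hat), (math.exp(beta_hat - z * se), math.exp(beta_hat + z * se))

if __name__ == "__main__":
    hr, (lo, hi) = hazard_ratio_ci(0.0586, 0.0150)   # Age, per year
    print(round(hr, 3), round(lo, 4), round(hi, 4))
    print(round(1 - math.exp(-1.708), 2))           # Trans: proportional reduction in hazard
```

The last line reproduces the 82% reduction in hazard attributed to transplant.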

43 Example: Survival times for heart transplant patients
Var   DF   Estimate   Standard Error   Risk Ratio
Trans
Prior
Age
Age increases the hazard of death by 6% per year, and getting a transplant reduces the hazard of death by 82%. While the age effect may be real, the magnitude of the transplant effect is likely false.
Figure: Transplant → late death; No transplant → early death. Patients who die early have little chance to receive a transplant, so transplant status partly reflects how long a patient survived.

45 What does the data say?
Challenges:
1. Data integrity
2. Appropriate methodology
3. Building a correct narrative (significant relationships, effect size)


More information

STAT331. Cox s Proportional Hazards Model

STAT331. Cox s Proportional Hazards Model STAT331 Cox s Proportional Hazards Model In this unit we introduce Cox s proportional hazards (Cox s PH) model, give a heuristic development of the partial likelihood function, and discuss adaptations

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

General Linear Model (Chapter 4)

General Linear Model (Chapter 4) General Linear Model (Chapter 4) Outcome variable is considered continuous Simple linear regression Scatterplots OLS is BLUE under basic assumptions MSE estimates residual variance testing regression coefficients

More information

8. Parametric models in survival analysis General accelerated failure time models for parametric regression

8. Parametric models in survival analysis General accelerated failure time models for parametric regression 8. Parametric models in survival analysis 8.1. General accelerated failure time models for parametric regression The accelerated failure time model Let T be the time to event and x be a vector of covariates.

More information

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials

Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Two-stage Adaptive Randomization for Delayed Response in Clinical Trials Guosheng Yin Department of Statistics and Actuarial Science The University of Hong Kong Joint work with J. Xu PSI and RSS Journal

More information

ST745: Survival Analysis: Cox-PH!

ST745: Survival Analysis: Cox-PH! ST745: Survival Analysis: Cox-PH! Eric B. Laber Department of Statistics, North Carolina State University April 20, 2015 Rien n est plus dangereux qu une idee, quand on n a qu une idee. (Nothing is more

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Modern Methods of Statistical Learning sf2935 Lecture 5: Logistic Regression T.K

Modern Methods of Statistical Learning sf2935 Lecture 5: Logistic Regression T.K Lecture 5: Logistic Regression T.K. 10.11.2016 Overview of the Lecture Your Learning Outcomes Discriminative v.s. Generative Odds, Odds Ratio, Logit function, Logistic function Logistic regression definition

More information

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30

MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD. Copyright c 2012 (Iowa State University) Statistics / 30 MISCELLANEOUS TOPICS RELATED TO LIKELIHOOD Copyright c 2012 (Iowa State University) Statistics 511 1 / 30 INFORMATION CRITERIA Akaike s Information criterion is given by AIC = 2l(ˆθ) + 2k, where l(ˆθ)

More information

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies.

11 November 2011 Department of Biostatistics, University of Copengen. 9:15 10:00 Recap of case-control studies. Frequency-matched studies. Matched and nested case-control studies Bendix Carstensen Steno Diabetes Center, Gentofte, Denmark http://staff.pubhealth.ku.dk/~bxc/ Department of Biostatistics, University of Copengen 11 November 2011

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model

Other Survival Models. (1) Non-PH models. We briefly discussed the non-proportional hazards (non-ph) model Other Survival Models (1) Non-PH models We briefly discussed the non-proportional hazards (non-ph) model λ(t Z) = λ 0 (t) exp{β(t) Z}, where β(t) can be estimated by: piecewise constants (recall how);

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

F & B Approaches to a simple model

F & B Approaches to a simple model A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring 215 http://www.astro.cornell.edu/~cordes/a6523 Lecture 11 Applications: Model comparison Challenges in large-scale surveys

More information

Introduction to the Logistic Regression Model

Introduction to the Logistic Regression Model CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response

More information

The linear model is the most fundamental of all serious statistical models encompassing:

The linear model is the most fundamental of all serious statistical models encompassing: Linear Regression Models: A Bayesian perspective Ingredients of a linear model include an n 1 response vector y = (y 1,..., y n ) T and an n p design matrix (e.g. including regressors) X = [x 1,..., x

More information

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require

In contrast, parametric techniques (fitting exponential or Weibull, for example) are more focussed, can handle general covariates, but require Chapter 5 modelling Semi parametric We have considered parametric and nonparametric techniques for comparing survival distributions between different treatment groups. Nonparametric techniques, such as

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood

Regression Estimation - Least Squares and Maximum Likelihood. Dr. Frank Wood Regression Estimation - Least Squares and Maximum Likelihood Dr. Frank Wood Least Squares Max(min)imization Function to minimize w.r.t. β 0, β 1 Q = n (Y i (β 0 + β 1 X i )) 2 i=1 Minimize this by maximizing

More information

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing

STAT763: Applied Regression Analysis. Multiple linear regression. 4.4 Hypothesis testing STAT763: Applied Regression Analysis Multiple linear regression 4.4 Hypothesis testing Chunsheng Ma E-mail: cma@math.wichita.edu 4.4.1 Significance of regression Null hypothesis (Test whether all β j =

More information

Statistical aspects of prediction models with high-dimensional data

Statistical aspects of prediction models with high-dimensional data Statistical aspects of prediction models with high-dimensional data Anne Laure Boulesteix Institut für Medizinische Informationsverarbeitung, Biometrie und Epidemiologie February 15th, 2017 Typeset by

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided

More information

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

Answers to Problem Set #4

Answers to Problem Set #4 Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

More information

Chapter 11. Regression with a Binary Dependent Variable

Chapter 11. Regression with a Binary Dependent Variable Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

BIOS 312: Precision of Statistical Inference

BIOS 312: Precision of Statistical Inference and Power/Sample Size and Standard Errors BIOS 312: of Statistical Inference Chris Slaughter Department of Biostatistics, Vanderbilt University School of Medicine January 3, 2013 Outline Overview and Power/Sample

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs)

36-309/749 Experimental Design for Behavioral and Social Sciences. Dec 1, 2015 Lecture 11: Mixed Models (HLMs) 36-309/749 Experimental Design for Behavioral and Social Sciences Dec 1, 2015 Lecture 11: Mixed Models (HLMs) Independent Errors Assumption An error is the deviation of an individual observed outcome (DV)

More information