Multinomial Regression Models

Size: px
Start display at page:

Download "Multinomial Regression Models"

Transcription

1 Multinomial Regression Models Objectives: Multinomial distribution and likelihood Ordinal data: Cumulative link models (POM). Ordinal data: Continuation models (CRM). 84 Heagerty, Bio/Stat 571

2 Models for Multinomial Data Example Data: Wisconsin Study of Diabetic Retinopathy (WESDR). Diabetic retinopathy is one of the leading causes of blindness in people aged years in the US. Disease characterized by appearance of small hemorrhages in the retina which progress and lead to severe visual loss. 85 Heagerty, Bio/Stat 571

3 Disease severity: None Mild Moderate Proliferative Covariates: duration of diabetes glycosolated hemoglobin diastolic blood pressure age at diagnosis 86 Heagerty, Bio/Stat 571

4 Models for Ordinal Data Example Data: Q: How does the distribution of disease severity vary as a function of covariates? Q: Possible approaches to analysis? Ideas? 87 Heagerty, Bio/Stat 571

5 87-1 Heagerty, Bio/Stat 571 Score versus GlycHemo y glyc.hemo Score versus logduration y log.duration

6 age.diag y Score versus AgeDiag dia.bp y Score versus DiaBP 87-2 Heagerty, Bio/Stat 571

7 Multinomial Response Models Common categorical outcomes take more than two levels: Pain severity = low, medium, high Conception trials = 1, 2 if not 1, 3 if not 1-2 The basic probability model is the multi-category extension of the Bernoulli (Binomial) distribution multinomial. Univariate outcome with multivariate representation: Let O i [1, 2, 3,... C] be a categorical outcome. Let Y ij be the indicator Y ij = 1(O i = j), j = 1, 2,... (C 1). 88 Heagerty, Bio/Stat 571

8 O i = j Y ij = 1 and Y ik = 0 k j O i Y i = Y i1 Y i2.. Y ic 89 Heagerty, Bio/Stat 571

9 Probability given by P (O i ) = P (Y i ) = π Y i1 i1 πy i2 i2 C 1... (1 k=1 π ik ) (1 C 1 k=1 Y ik) E[Y ij ] = π ij cov[y ij, Y ik ] = π ij π ik π ij (1 π ij ) j k j = k 90 Heagerty, Bio/Stat 571

10 What is Ordinal Data? Categories (fixed number) that are ordered much like the ordinal numbers: first, second,... It doesn t make sense to talk about a distance between the categories: HIGH, MEDIUM, LOW NEVER, RARELY, SOMETIMES, ALWAYS e.g. Drachman Class 91 Heagerty, Bio/Stat 571

11 Ordinal Data Q: Is this type of data really that common? A: Lincoln Moses (Stanford) surveyed vol. 306 of NEJM and found that 32/168 articles contained ordered categorical data! We found no indication that any of the authors in vol. 306 used an analysis that takes account of the ordering. (p. 447) 92 Heagerty, Bio/Stat 571

12 Ordinal? Binary? Continuous? Q: Do we lose much information by GROUPING (categorizing) a measurement? Example: rather than analysis of BMI as a continuous measurement we recode into: non-obese; obese; severely obsese. Q: Do we gain much information by using the ordered categories over just DICHOTOMIZING? Example: rather than analysis of BMI categories we use the indicator of severely obsese. 93 Heagerty, Bio/Stat 571

13 Ordinal? Binary? Continuous? Evaluation: 1) Assume an underlying Gaussian: O i N (µ, 1). 2) Define CUTPOINTS α R C. Define: Y ij = 1(O i (α j 1, α j ]) E(Y ij ) = P (O i (α j 1, α j ]) = P (O i α j ) P (O i α j 1 ) π ij = Φ(α j µ) Φ(α j 1 µ) Y i represents a multinomial random variable with m = 1, π i. 94 Heagerty, Bio/Stat 571

14 94-1 Heagerty, Bio/Stat 571 Gaussian with cut-points (mu=0) fz alpha1 alpha2 alpha3 z Gaussian with cut-points (mu=3/4) fz alpha1 alpha2 alpha3 z

15 Ordinal versus Continuous? Given a sample of Y 1, Y 2,..., Y n we can obtain the MLE for µ (and we ll just assume α is known). We obtain (derive this yourself!): I n = i j π ij µ 1 π ij π ij µ If we assume that the Y i s are i.i.d then we can drop the subscript i and we can compare the information from the categorized, or grouped, data Y i to the information in the continuous O i : 95 Heagerty, Bio/Stat 571

16 Asymptotic relative efficiency: var( µ O ) var( µ Y ) = 1/n [ n j π j µ 1 π j π j µ ] 1 = I 1 Q: How does the efficiency depend on C, the number of categories? Q: How does the efficiency depend on the locations α j? 96 Heagerty, Bio/Stat 571

17 Efficiency versus C For this I actually used the MLE s variance based on estimating the location difference µ 1 µ 2 = and the cut-point parameters. For the cutpoints I used the 1/(C + 1)-tiles of N (0, 1) shifted by /2 (ie. cut-points are between the two locations). 97 Heagerty, Bio/Stat 571

18 C ARE Heagerty, Bio/Stat 571

19 98-1 Heagerty, Bio/Stat 571 ARE using different thresholds, C=5 ARE for delta cuts = c( -2, -1, 1, 2 ) cuts = c( -1, -0.5, 0.5, 1 ) cuts = ( -0.75, -0.25, 0.25, 0.75 ) delta

20 Some references on grouping data : Connor R.J. (1972), Grouping for testing trends in categorical data, Journal of the American Statistical Association, 67: binary & continuous 2 k Efficiency from 65% - 97% for testing (k=2,...,6) Armstrong B.G. and Sloan M. (1989), Ordinal regression models for epidemiologic data, American Journal of Epidemiology, 129: POM versus logistic regression. A smart dichotomization yields 75-80% efficiency. Sankey S.S and Weissfeld L.A. (1998), A study of the effect of dichotomizing ordinal data upon modeling, Communications in Statistics, Part B Sim & Comp, 27: Heagerty, Bio/Stat 571

21 Probability Model = Multinomial Fixed number of categories (C). Fixed sample size (m). Insect species 1, 2,..., C. Sites s 1, s 2,..., s m. Scheme 1 When first insect enters trap it shuts. All traps left until catch 1. Scheme 2 If insect enters it can t get out. All traps left for 24 hours. Y 1,i = (0, 0, 0, 1, 0) Y 2,i = (0, 4, 1, 0, 2) 100 Heagerty, Bio/Stat 571

22 Probability Model = Multinomial Given Y ij = number of type j out of m i ; j Y ij = m i P (Y i1 = y i1, Y i2 = y i2,..., Y ic = y ic ) = m i! y i1!y i2!... y ic! In this representation the parameter π ij can be defined via: C j=1 π y ij ij Given m i = 1: Given m i > 1: 101 Heagerty, Bio/Stat 571

23 Some properties of multinomial 1) w 1, w 2,..., w C independent w i P(λ i ). Then (w 1, w 2,..., w C j w j = m) mult(m, 2) If Y mult(m, π) λ 1 j λ, j λ 2 j λ,..., j = Y j binomial(m, π j ) 3) cov(y j, Y k ) = mπ j π k for j k. λ C j λ ) j 102 Heagerty, Bio/Stat 571

24 4) REPRODUCIBLE: Let Y = (Y 1, Y 2 ) then (Y 1, sum(y 2 ) ) mult(m, π 1, sum(π 2 ) ) 5) REPRODUCIBLE for CONDITIONALS: Condition on Y j : (Y 1, Y 2,..., ( j),..., Y C Y j = t) mult(m t, π 1 1 π j, π 2 1 π j,..., ( j),..., π C 1 π j ) 6) Define cumulative totals Z j = j k=1 Y k, γ j = j k=1 π k: Z j binomial(m, γ j ) Z i Z j = z j, i < j binomial(z j, γ i γ j ) 103 Heagerty, Bio/Stat 571

25 Multinomial as Vector Exponential Family P (Y i ) = C 1 j=1 π Y ij ij (1 C 1 C 1 = exp Y ij log(π ij ) + (1 j C 1 k k Y ik ) log(1 π ik ) (1 C 1 k Y ik ) C 1 k π ik ) ] 104 Heagerty, Bio/Stat 571

26 Multinomial as Vector Exponential Family C 1 P (Y i ) = exp Y ij log(π ij /π ic ) + log(π ic ) = exp j [ ] Y T i θ i b(θ i ) Note: we have dropped the last category. 105 Heagerty, Bio/Stat 571

27 where θ ij = log(π ij /π ic ) b(θ i ) = log(π ic ) = log[1 + b(θ i ) = π ij (verify!) θ ij 2 π ij π ik k j b(θ i ) = θ ij θ ik π ij (1 π ij ) k = j C 1 k exp(θ ik )] (verify!) 106 Heagerty, Bio/Stat 571

28 Models for Ordinal Data dia.bp80 y RowTotl ColTotl Test for independence of all factors Chi^2 = d.f.= 3 (p= e-10) 107 Heagerty, Bio/Stat 571

29 107-1 Heagerty, Bio/Stat 571 Call: crosstabs( ~ dia.bp80 + (y > 1)) dia.bp80 (y > 1) FALSE TRUE RowTotl ColTotl Test for independence of all factors Chi^2 = d.f.= 1 (p= e-10) odds ratio = 2.74, ( 1.983, ) log-odds ratio = 1.008, s.e.= 0.165

30 107-2 Heagerty, Bio/Stat 571 Call: crosstabs( ~ dia.bp80 + (y > 2)) dia.bp80 (y > 2) FALSE TRUE RowTotl ColTotl Test for independence of all factors Chi^2 = d.f.= 1 (p= e-06) Yates correction not used odds ratio = 2.276, ( 1.611, ) log-odds ratio = 0.822, s.e.= 0.176

31 107-3 Heagerty, Bio/Stat 571 Call: crosstabs( ~ dia.bp80 + (y > 3)) dia.bp80 (y > 3) FALSE TRUE RowTotl ColTotl Test for independence of all factors Chi^2 = d.f.= 1 (p= ) odds ratio = 2.848, ( 1.539, ) log-odds ratio = 1.047, s.e.= 0.314

32 Proportional Odds Model model the cumulative indicators: Z ij = 1(O i j) Define γ ij = P (O i j) = E(Z ij ) GLM g(γ ij ) = β (0,j) X i β For a single j this is equivalent to logistic regression when we use a logit link. 108 Heagerty, Bio/Stat 571

33 The model with the logit link is called the Proportional odds model. Another common link is the complementary log-log link, g(γ) = log( log(1 γ)), which is the discrete time proportional hazards model. (more on this later) There is an ordering constraint: β (0,1) β (0,2)... β (0,C 1) 109 Heagerty, Bio/Stat 571

34 Score equations The underlying model is the multinomial which yields log L β Note: L = = i ( πi β ) T Σ 1 i (Y i π i ) Heagerty, Bio/Stat 571

35 Score equations Properties of L : LY i = Z i Lπ i = γ i log L β log L β = i = i ( ) T Lπi ( LΣi L T ) 1 L (Y i π i ) β ( γi β ) T [cov(z i )] 1 (Z i γ i ) 111 Heagerty, Bio/Stat 571

36 Reparameterization of POM There are several ways this model can be parameterized. We have used indicators for O ij j and: logit(γ ij ) = β (0,j) X i β Alternatively we may model Zij = 1(O ij > j) and: E(Z ij) = 1 γ ij = µ ij logit(µ ij) = β (0,j) + X i β = β (0,j) + X iβ 112 Heagerty, Bio/Stat 571

37 112-1 Heagerty, Bio/Stat 571 Ordinal Regression Cumulative Link GLM link = logit -2*maxL = Regression Coefficients and S.E. s estimate S.E. z c c c dia.bp

38 112-2 Heagerty, Bio/Stat 571 Ordinal Regression Cumulative Link GLM link = logit -2*maxL = Regression Coefficients and S.E. s estimate S.E. z c c c dia.bp

39 112-3 Heagerty, Bio/Stat 571 Ordinal Regression Cumulative Link GLM link = logit -2*maxL = Regression Coefficients and S.E. s estimate S.E. z c c c glyc.hemo log.duration age.diag dia.bp

40 Ordinal Data Regression Models for ordinal data can be viewed as simultaneous regressions for the cumulative indicators Z ij = 1(O i j). Probability model is the multinomial. Models for ordinal data use the order information but do not assign numbers to the outcome levels. Models are also invariant to collapsing the categories (ie. combine 4th and 5th levels). 113 Heagerty, Bio/Stat 571

41 Ordinal Data Regression Models for ordinal data are useful as something intermediate to just dichotomizing the data, or to assigning scores and using linear regression. Fact For a single binary covariate X = 0/1 the score test using the POM model is the Wilcoxon! (see MN exercise 5.10). Thus, these models can be thought of as models for the rank of the data. 114 Heagerty, Bio/Stat 571

42 Models for Multinomial Data Another approach to the analysis of ordered categorical data is to model conditional expectations. Infection in 0-6 months. Infection in 6-12 months given uninfected at 6. Infection in months given uninfected at 12. Salmon dies before point #1. Salmon dies before point #2 given past #1. Salmon dies before point #3 given past # Heagerty, Bio/Stat 571

43 Multinomial Framework Define: Y ij = 1(O i = j) Y i = vec(y ij ) H ij = 1 j 1 k=1 Y ik H i1 = Heagerty, Bio/Stat 571

44 Multinomial Framework Model: E[Y ij H ij = 1] = µ ij = π ij 1 γ i(j 1) g(µ ij ) = α j + X ij β j j = 1, 2,..., C 1 Note : E[Y ij H ij ] = µ ij H ij 117 Heagerty, Bio/Stat 571

45 Example: j = 1, 2,..., Y ij H ij µ ij µ i1 µ i2 µ i3 µ i4 µ i5 E[Y ij H ij ] µ i1 µ i2 µ i Heagerty, Bio/Stat 571

46 Multinomial CRM Models Models for E(Y ij H ij ) are sometimes called continuation ratio models (Agresti 1990). These conditional models are also known as discrete-time survival models, in particular, if we assume that the ordered categories are time intervals, G j = (t j 1, t j ], and O i records the interval that the time, T i falls in: O i = j T i G j 119 Heagerty, Bio/Stat 571

47 Multinomial CRM Models PH Assumption : P (T i > t) = exp[ Λ(t)] = exp[ Λ 0 (t) exp(x i β)] then E(Y ij H ij = 1) = µ ij log( log(1 µ ij )) = α j + X i β α j = log[λ 0 (t j ) Λ 0 (t j 1 )] Note: The PH model uses a common β rather than a coefficient specific to each time interval, β j. 120 Heagerty, Bio/Stat 571

48 CRM Model The multinomial probability factors: P (Y i = y i ) = C j=1 π y ij ij = P (y i1 )P (y i2 y i1 )P (y i3 y i1, y i2 ) P (y ic y ik k < C) 121 Heagerty, Bio/Stat 571

49 CRM Model P (Y i = y i ) = = C 1 j=1 C 1 j=1 ( ( π ij 1 γ i(j 1) π ij 1 γ i(j 1) ) yij ( 1 ) yij ( 1 γij π ij 1 γ i(j 1) 1 γ i(j 1) ) hij y ij ) hij y ij Note: π ij π ij logit( ) = log( ) 1 γ i(j 1) 1 γ ij 122 Heagerty, Bio/Stat 571

50 Comments: This likelihood factorization implies that we obtain the MLE for this model using the pairs (Y ij, H ij = 1) and treating them as independent Bernoulli random variables. For any (j, k) E [(Y ij µ ij H ij )(Y ik µ ik H ik )] = 0 Q: justification for this? 123 Heagerty, Bio/Stat 571

51 123-1 Heagerty, Bio/Stat 571 WESDR Example: # nsubjects <- nrow( data ) # wisc.data <- data.frame( y = wisc.all.data[,"r.level"], glyc.hemo = wisc.all.data[,"glyc.hemo"], log.duration = log( wisc.all.data[,"duration"] ), age.diag = wisc.all.data[,"age.diag"], dia.bp = wisc.all.data[,"dia.bp"] ) print( wisc.data[1:5,] ) # ### ### Construct CRM data... ### # ##### (1) construct the pairs (Y,H) #

52 ncuts <- max(wisc.data$y) - 1 print( paste("ncuts =", ncuts, "\n\n") ) y.crm <- NULL h.crm <- NULL id <- NULL for( j in 1:nsubjects ){ yj <- rep( 0, ncuts ) if( wisc.data$y[j] <= ncuts ) yj[ wisc.data$y[j] ] <- 1 hj <- 1 - c(0,cumsum(yj)[1:(ncuts-1)]) y.crm <- c( y.crm, yj ) h.crm <- c( h.crm, hj ) id <- c( id, rep(j,ncuts) ) } Heagerty, Bio/Stat 571

53 ##### (2) construct the intercepts # level <- factor( rep(1:ncuts, nsubjects ) ) int.mat <- NULL for( j in 1:ncuts ){ intj <- rep( 0, ncuts ) intj[ j ] <- 1 int.mat <- cbind( int.mat, rep( intj, nsubjects) ) } dimnames(int.mat) <- list( NULL, paste("int",c(1:ncuts),sep="") ) # ##### (3) expand the X s # glyc.hemo <- rep( wisc.data$glyc.hemo, rep(ncuts,nsubjects) ) log.duration <- rep( wisc.data$log.duration, rep(ncuts,nsubjects) ) age.diag <- rep( wisc.data$age.diag, rep(ncuts,nsubjects) ) dia.bp <- rep( wisc.data$dia.bp, rep(ncuts,nsubjects) ) # print( cbind( y.crm, h.crm, level, int.mat, glyc.hemo, log.duration, age.diag, dia.bp)[1:15,] ) # Heagerty, Bio/Stat 571

54 ##### (4) drop the H=0 and build dataframe # keep <- h.crm==1 wisc.crm.data <- data.frame( id = id[keep], y = y.crm[keep], level = level[keep], glyc.hemo = glyc.hemo[keep], log.duration = log.duration[keep], age.diag = age.diag[keep], dia.bp = dia.bp[keep] ) # print( wisc.crm.data[1:15,] ) Heagerty, Bio/Stat 571

55 123-5 Heagerty, Bio/Stat 571 Original Data: y glyc.hemo log.duration age.diag dia.bp

56 123-6 Heagerty, Bio/Stat 571 Expanded Data: id y.crm h.crm level Int1 Int2 Int3 glyc.hemo [1,] [2,] [3,] [4,] [5,] [6,] [7,] [8,] [9,] [10,] [11,] [12,] [13,] [14,] [15,]

57 id y level glyc.hemo log.duration age.diag dia.bp Heagerty, Bio/Stat 571

58 WESDR Models ##### ##### Model fitting... ##### # options( contasts="contr.treatment" ) # table( wisc.crm.data$level ) # fit1 <- glm( y ~ glyc.hemo, family=binomial, subset=(as.integer(level)==1), data = wisc.crm.data ) summary( fit1, cor=f ) fit2 <- glm( y ~ glyc.hemo, family=binomial, subset=(as.integer(level)==2), data = wisc.crm.data ) summary(fit2, cor=f ) fit3 <- glm( y ~ glyc.hemo, family=binomial, 124 Heagerty, Bio/Stat 571

59 subset=(as.integer(level)==3), data = wisc.crm.data ) summary(fit3, cor=f ) # fit4 <- glm( y ~ level + glyc.hemo, family=binomial, data = wisc.crm.data ) summary( fit4, cor=f ) fit5 <- glm( y ~ level * glyc.hemo, family=binomial, data = wisc.crm.data ) # anova( fit4, fit5 ) # 125 Heagerty, Bio/Stat 571

60 125-1 Heagerty, Bio/Stat 571 Separate fits: table(level) : 1 : 2 : ***LEVEL=1 Coefficients: Value Std. Error t value (Intercept) glyc.hemo

61 125-2 Heagerty, Bio/Stat 571 ***LEVEL=2 Coefficients: Value Std. Error t value (Intercept) glyc.hemo ***LEVEL=3 Value Std. Error t value (Intercept) glyc.hemo

62 125-3 Heagerty, Bio/Stat 571 CRM fits: Value Std. Error t value (Intercept) level: level: glyc.hemo (Dispersion Parameter for Binomial family taken to be 1 ) Null Deviance: on 1339 degrees of freedom Residual Deviance: on 1336 degrees of freedom Number of Fisher Scoring Iterations: 3

63 125-4 Heagerty, Bio/Stat 571 CRM fits: Analysis of Deviance Table Response: y Terms Resid. Df Resid. Dev Test Df Deviance 1 level + glyc.hemo level * glyc.hemo level:glyc.hemo

64 Summary Multinomial models use a multivariate response vector to represent the outcome. For these models the mean (π i ) determines the covariance. Proportional odds models consider all possible dichotomizations. Continuation ratio models consider an ordered sequence of conditional expectations. 126 Heagerty, Bio/Stat 571

65 References : McCullagh & Nelder, Chapter 5 Polytomous Regression Ananth C.V. and Kleinbaum D.G. (1997), Regression models for ordinal responses: A review of methods and applications, Int J of Epi, 26: Fahrmeir & Tutz, Chapter 3 Multicategorical responses Agresti A. (1999), Modelling ordered categorical data: recent advances and future challenges, Statistics in Medicine, 18: Heagerty, Bio/Stat 571

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016

Lecture 11. Interval Censored and. Discrete-Time Data. Statistics Survival Analysis. Presented March 3, 2016 Statistics 255 - Survival Analysis Presented March 3, 2016 Motivating Dan Gillen Department of Statistics University of California, Irvine 11.1 First question: Are the data truly discrete? : Number of

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random

STA 216: GENERALIZED LINEAR MODELS. Lecture 1. Review and Introduction. Much of statistics is based on the assumption that random STA 216: GENERALIZED LINEAR MODELS Lecture 1. Review and Introduction Much of statistics is based on the assumption that random variables are continuous & normally distributed. Normal linear regression

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

Statistics in medicine

Statistics in medicine Statistics in medicine Lecture 4: and multivariable regression Fatma Shebl, MD, MS, MPH, PhD Assistant Professor Chronic Disease Epidemiology Department Yale School of Public Health Fatma.shebl@yale.edu

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels

Prediction of ordinal outcomes when the association between predictors and outcome diers between outcome levels STATISTICS IN MEDICINE Statist. Med. 2005; 24:1357 1369 Published online 26 November 2004 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/sim.2009 Prediction of ordinal outcomes when the

More information

,..., θ(2),..., θ(n)

,..., θ(2),..., θ(n) Likelihoods for Multivariate Binary Data Log-Linear Model We have 2 n 1 distinct probabilities, but we wish to consider formulations that allow more parsimonious descriptions as a function of covariates.

More information

Matched Pair Data. Stat 557 Heike Hofmann

Matched Pair Data. Stat 557 Heike Hofmann Matched Pair Data Stat 557 Heike Hofmann Outline Marginal Homogeneity - review Binary Response with covariates Ordinal response Symmetric Models Subject-specific vs Marginal Model conditional logistic

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Logistic Regressions. Stat 430

Logistic Regressions. Stat 430 Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

LOGISTIC REGRESSION Joseph M. Hilbe

LOGISTIC REGRESSION Joseph M. Hilbe LOGISTIC REGRESSION Joseph M. Hilbe Arizona State University Logistic regression is the most common method used to model binary response data. When the response is binary, it typically takes the form of

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Generalized linear models

Generalized linear models Generalized linear models Søren Højsgaard Department of Mathematical Sciences Aalborg University, Denmark October 29, 202 Contents Densities for generalized linear models. Mean and variance...............................

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis

7/28/15. Review Homework. Overview. Lecture 6: Logistic Regression Analysis Lecture 6: Logistic Regression Analysis Christopher S. Hollenbeak, PhD Jane R. Schubart, PhD The Outcomes Research Toolbox Review Homework 2 Overview Logistic regression model conceptually Logistic regression

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013

Analysis of Categorical Data. Nick Jackson University of Southern California Department of Psychology 10/11/2013 Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/2013 1 Overview Data Types Contingency Tables Logit Models Binomial Ordinal Nominal 2 Things not

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

R Hints for Chapter 10

R Hints for Chapter 10 R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Comparing IRT with Other Models

Comparing IRT with Other Models Comparing IRT with Other Models Lecture #14 ICPSR Item Response Theory Workshop Lecture #14: 1of 45 Lecture Overview The final set of slides will describe a parallel between IRT and another commonly used

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links

Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Communications of the Korean Statistical Society 2009, Vol 16, No 4, 697 705 Goodness-of-Fit Tests for the Ordinal Response Models with Misspecified Links Kwang Mo Jeong a, Hyun Yung Lee 1, a a Department

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2

Multilevel Models in Matrix Form. Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Multilevel Models in Matrix Form Lecture 7 July 27, 2011 Advanced Multivariate Statistical Methods ICPSR Summer Session #2 Today s Lecture Linear models from a matrix perspective An example of how to do

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

STA 450/4000 S: January

STA 450/4000 S: January STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have

More information

Generalized Linear Models (GLZ)

Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) Generalized Linear Models (GLZ) are an extension of the linear modeling process that allows models to be fit to data that follow probability distributions other than the

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Methods@Manchester Summer School Manchester University July 2 6, 2018 Generalized Linear Models: a generic approach to statistical modelling www.research-training.net/manchester2018

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Investigating Models with Two or Three Categories

Investigating Models with Two or Three Categories Ronald H. Heck and Lynn N. Tabata 1 Investigating Models with Two or Three Categories For the past few weeks we have been working with discriminant analysis. Let s now see what the same sort of model might

More information

Logistic Regression - problem 6.14

Logistic Regression - problem 6.14 Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Logistic Regression. Continued Psy 524 Ainsworth

Logistic Regression. Continued Psy 524 Ainsworth Logistic Regression Continued Psy 524 Ainsworth Equations Regression Equation Y e = 1 + A+ B X + B X + B X 1 1 2 2 3 3 i A+ B X + B X + B X e 1 1 2 2 3 3 Equations The linear part of the logistic regression

More information

Outline of GLMs. Definitions

Outline of GLMs. Definitions Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density

More information

Generalized Models: Part 1

Generalized Models: Part 1 Generalized Models: Part 1 Topics: Introduction to generalized models Introduction to maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical outcomes

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Stat 642, Lecture notes for 04/12/05 96

Stat 642, Lecture notes for 04/12/05 96 Stat 642, Lecture notes for 04/12/05 96 Hosmer-Lemeshow Statistic The Hosmer-Lemeshow Statistic is another measure of lack of fit. Hosmer and Lemeshow recommend partitioning the observations into 10 equal

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW

ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved

More information

Repeated ordinal measurements: a generalised estimating equation approach

Repeated ordinal measurements: a generalised estimating equation approach Repeated ordinal measurements: a generalised estimating equation approach David Clayton MRC Biostatistics Unit 5, Shaftesbury Road Cambridge CB2 2BW April 7, 1992 Abstract Cumulative logit and related

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

Likelihoods for Generalized Linear Models

Likelihoods for Generalized Linear Models 1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)

More information

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017 Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which

More information

MIT Spring 2016

MIT Spring 2016 Generalized Linear Models MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Generalized Linear Models 1 Generalized Linear Models 2 Generalized Linear Model Data: (y i, x i ), i = 1,..., n where y i : response

More information

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis

ESP 178 Applied Research Methods. 2/23: Quantitative Analysis ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and

More information

Semiparametric Generalized Linear Models

Semiparametric Generalized Linear Models Semiparametric Generalized Linear Models North American Stata Users Group Meeting Chicago, Illinois Paul Rathouz Department of Health Studies University of Chicago prathouz@uchicago.edu Liping Gao MS Student

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 12: Logistic regression (v1) Ramesh Johari ramesh.johari@stanford.edu Fall 2015 1 / 30 Regression methods for binary outcomes 2 / 30 Binary outcomes For the duration of this

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Bayesian Multivariate Logistic Regression

Bayesian Multivariate Logistic Regression Bayesian Multivariate Logistic Regression Sean M. O Brien and David B. Dunson Biostatistics Branch National Institute of Environmental Health Sciences Research Triangle Park, NC 1 Goals Brief review of

More information

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression

Section IX. Introduction to Logistic Regression for binary outcomes. Poisson regression Section IX Introduction to Logistic Regression for binary outcomes Poisson regression 0 Sec 9 - Logistic regression In linear regression, we studied models where Y is a continuous variable. What about

More information

Analysing categorical data using logit models

Analysing categorical data using logit models Analysing categorical data using logit models Graeme Hutcheson, University of Manchester The lecture notes, exercises and data sets associated with this course are available for download from: www.research-training.net/manchester

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Statistical Modelling with Stata: Binary Outcomes

Statistical Modelling with Stata: Binary Outcomes Statistical Modelling with Stata: Binary Outcomes Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 21/11/2017 Cross-tabulation Exposed Unexposed Total Cases a b a + b Controls

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

COMPLEMENTARY LOG-LOG MODEL

COMPLEMENTARY LOG-LOG MODEL COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α

More information

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation

Truck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/

Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/ Tento projekt je spolufinancován Evropským sociálním fondem a Státním rozpočtem ČR InoBio CZ.1.07/2.2.00/28.0018 Statistical Analysis in Ecology using R Linear Models/GLM Ing. Daniel Volařík, Ph.D. 13.

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Introduction to Generalized Linear Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Introduction (motivation

More information

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla.

Experimental Design and Statistical Methods. Workshop LOGISTIC REGRESSION. Jesús Piedrafita Arilla. Experimental Design and Statistical Methods Workshop LOGISTIC REGRESSION Jesús Piedrafita Arilla jesus.piedrafita@uab.cat Departament de Ciència Animal i dels Aliments Items Logistic regression model Logit

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi

LOGISTIC REGRESSION. Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi LOGISTIC REGRESSION Lalmohan Bhar Indian Agricultural Statistics Research Institute, New Delhi- lmbhar@gmail.com. Introduction Regression analysis is a method for investigating functional relationships

More information

Logistic Regression 21/05

Logistic Regression 21/05 Logistic Regression 21/05 Recall that we are trying to solve a classification problem in which features x i can be continuous or discrete (coded as 0/1) and the response y is discrete (0/1). Logistic regression

More information

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102

Regression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102 Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical

More information

Introduction to Generalized Models

Introduction to Generalized Models Introduction to Generalized Models Today s topics: The big picture of generalized models Review of maximum likelihood estimation Models for binary outcomes Models for proportion outcomes Models for categorical

More information