De Jong and Heller, GLMs for Insurance Data: R code and output


R code and output of examples in text

Contents

1 Poisson regression
2 Negative binomial regression
3 Quasi likelihood regression
4 Logistic regression
5 Ordinal regression
6 Nominal regression
7 Gamma regression
8 Inverse Gaussian regression
9 Logistic regression GLMM
10 Logistic regression GEE
11 Logistic regression GAM

January 31,

1 Poisson regression

Number of children: log link

> birth <- read.table("birth.csv",sep=",",header=T)
> birth.log <- glm( formula = children ~ age, family = poisson(link = log),data=birth)
> summary(birth.log)

glm(formula = children ~ age, family = poisson(link = log), data = birth)

Estimate Std. Error z value Pr(>|z|)
(Intercept) e-08 ***
age e-07 ***

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 140 degrees of freedom
Residual deviance: on 139 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5

> anova(birth.log)
Analysis of Deviance Table
Model: poisson, link: log
Response: children
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL
age

Number of children: identity link

R produces the following error message. Notice also the error message in the SAS output. Clearly there is a problem with this model.

> birth.id <- glm( formula = children ~ age, family = poisson(link = identity),data=birth)
Error: no valid set of coefficients has been found: please supply starting values

Diabetes deaths, categorical age

In order to read the data into R, diabetes.xls must be saved as diabetes.csv. Gender and age are both character variables in the data file, so R treats them as categorical. The model is specified as deaths ~ gender + age. The default base level in R is the lowest level, which is female gender and age <25. To reproduce the SAS output, we control the base level using the C function. In the case of age, for example, the required base level is the fourth level of age, so the term is specified in the model as C(age,base=4).
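The base-level mechanism can be sketched on simulated data. The example below is only an illustration under stated assumptions: the data frame d, the factor g and the counts y are invented here and have nothing to do with the diabetes data. It fits the same Poisson model once with C(g, base = 2) and once with relevel, which is another common way of choosing the reference level, and checks that the two fits agree.

```r
## Illustrative sketch on simulated data (d, g and y are hypothetical names)
set.seed(1)
d <- data.frame(g = factor(rep(c("a", "b", "c"), each = 20)))
d$y <- rpois(nrow(d), lambda = c(a = 2, b = 4, c = 8)[as.character(d$g)])

## base level set through treatment contrasts, as in the text
fit_C <- glm(y ~ C(g, base = 2), family = poisson, data = d)

## base level set by reordering the factor levels
fit_relevel <- glm(y ~ relevel(g, ref = "b"), family = poisson, data = d)

## the coefficient names differ, the estimates should not
same_coef <- isTRUE(all.equal(unname(coef(fit_C)), unname(coef(fit_relevel))))
print(same_coef)
```

With C() the coefficients are labelled by level number rather than by level name, which is why the summaries in this document show terms such as C(age, base = 4) followed by a digit.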

> diabetes <- read.table("diabetes.csv",sep=",",header=T)
> attach(diabetes)
> ### categorical age
> model1 <- glm(deaths ~ C(gender,base=2) + C(age,base=4), family = poisson(link = log), offset = l_popn)
> summary(model1)

glm(formula = deaths ~ C(gender, base = 2) + C(age, base = 4), family = poisson(link = log), offset = l_popn)

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
C(gender, base = 2) e-15 ***
C(age, base = 4) e-09 ***
C(age, base = 4) ***
C(age, base = 4) **
C(age, base = 4) e-10 ***
C(age, base = 4) < 2e-16 ***
C(age, base = 4) < 2e-16 ***
C(age, base = 4) < 2e-16 ***

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 15 degrees of freedom
Residual deviance: on 7 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5

Diabetes deaths, cubic age

Polynomials are specified in R using the poly function.

> model2 <- glm(deaths ~ C(gender,base=2) + poly(agemidpt,3), family = poisson(link = log), offset = l_popn)
> summary(model2)

glm(formula = deaths ~ C(gender, base = 2) + poly(agemidpt, 3), family = poisson(link = log), offset = l_popn)

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
C(gender, base = 2) e-15 ***
poly(agemidpt, 3) < 2e-16 ***
poly(agemidpt, 3)
poly(agemidpt, 3)

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 15 degrees of freedom
Residual deviance: on 11 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5
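The poly function builds orthogonal polynomials by default, so its coefficients are not the coefficients of age, age squared and age cubed. A minimal sketch on simulated data (the vectors agemidpt, gender and deaths below are invented, and no offset is used) suggests that an orthogonal cubic and a cubic in a rescaled age variable describe the same fitted model even though their coefficients differ.

```r
## Illustrative sketch on simulated data (all values are hypothetical)
set.seed(2)
agemidpt <- rep(c(20, 30, 40, 50, 60, 70, 80, 90), times = 2)
gender <- factor(rep(c("F", "M"), each = 8))
deaths <- rpois(16, lambda = exp(-1 + 0.05 * agemidpt + 0.3 * (gender == "M")))

## orthogonal cubic polynomial in age
fit_orth <- glm(deaths ~ gender + poly(agemidpt, 3), family = poisson)

## ordinary cubic in age rescaled to the interval [-1, 1]
agestd <- (agemidpt - 0.5 * (min(agemidpt) + max(agemidpt))) /
  (0.5 * (max(agemidpt) - min(agemidpt)))
fit_raw <- glm(deaths ~ gender + agestd + I(agestd^2) + I(agestd^3),
               family = poisson)

## same column space, so fitted values and deviance should agree
same_fit <- isTRUE(all.equal(fitted(fit_orth), fitted(fit_raw), tolerance = 1e-6))
same_dev <- isTRUE(all.equal(deviance(fit_orth), deviance(fit_raw), tolerance = 1e-6))
print(c(same_fit = same_fit, same_dev = same_dev))
```

Only the parametrisation changes between the two fits, which is consistent with the equal residual degrees of freedom reported for the two cubic models in this document.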

The poly function fits orthogonal polynomials by default, so the coefficients for the agemidpt polynomial differ from those given by SAS. The SAS solution is reproduced as

> minage <- min(agemidpt)
> maxage <- max(agemidpt)
> agestd <- (agemidpt-0.5*(minage+maxage))/(0.5*(maxage-minage))
> model3 <- glm(deaths ~ C(gender,base=2) + agestd + I(agestd^2) + I(agestd^3),
+ family = poisson(link = log), offset = l_popn)
> summary(model3)

glm(formula = deaths ~ C(gender, base = 2) + agestd + I(agestd^2) + I(agestd^3), family = poisson(link = log), offset = l_popn)

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
C(gender, base = 2) e-15 ***
agestd < 2e-16 ***
I(agestd^2)
I(agestd^3)

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 15 degrees of freedom
Residual deviance: on 11 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5

Third party claims

> TP <- read.table("third party claims.csv",sep=",",header=T)
> attach(TP)
> model1 <- glm(claims ~ log(accidents), family=poisson, offset=log(population))
> summary(model1)

glm(formula = claims ~ log(accidents), family = poisson, offset = log(population))

Estimate Std. Error z value Pr(>|z|)
(Intercept) <2e-16 ***
log(accidents) <2e-16 ***

(Dispersion parameter for poisson family taken to be 1)

Null deviance: on 175 degrees of freedom
Residual deviance: on 174 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 4
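The Poisson fit above passes the exposure through the offset argument, while the negative binomial fit in the next section writes it as an offset() term inside the formula. The sketch below, on simulated data with invented values for population, accidents and claims, suggests that for glm the two spellings give the same estimates.

```r
## Illustrative sketch on simulated data (all values are hypothetical)
set.seed(3)
n <- 50
population <- round(runif(n, 1e4, 1e5))
accidents <- rpois(n, lambda = population / 100)
claims <- rpois(n, lambda = population * exp(-7 + 0.3 * log(accidents)))

## exposure supplied through the offset argument
fit_arg <- glm(claims ~ log(accidents), family = poisson,
               offset = log(population))

## exposure supplied as an offset() term in the formula
fit_formula <- glm(claims ~ log(accidents) + offset(log(population)),
                   family = poisson)

same_coef <- isTRUE(all.equal(coef(fit_arg), coef(fit_formula)))
print(same_coef)
```

In both cases the coefficient of log(population) is fixed at 1, so the model describes the claim rate per head of population.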

2 Negative binomial regression

Negative binomial regression is in the MASS library, which must be installed and loaded. The function is glm.nb.

Third party claims

> library(MASS)
> model2 <- glm.nb(claims ~ log(accidents) + offset(log(population)))
> summary(model2)

glm.nb(formula = claims ~ log(accidents) + offset(log(population)), init.theta = , link = log)

Estimate Std. Error z value Pr(>|z|)
(Intercept) <2e-16 ***
log(accidents) <2e-16 ***

(Dispersion parameter for Negative Binomial(5.8309) family taken to be 1)

Null deviance: on 175 degrees of freedom
Residual deviance: on 174 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 1

Correlation of
(Intercept)
log(accidents)

Theta:
Std. Err.:
2 x log-likelihood:

The dispersion parameter is Theta = 5.8309. SAS reports the dispersion parameter as the reciprocal, 1/5.8309 (approximately 0.1715).

Swedish mortality, categorical age and year

> mortality <- read.table("mortality.csv",header=T,sep=",")
> mortality <- mortality[,-c(3,5,7,9,11)]
> mortality <- na.omit(mortality)
> attach(mortality)
> library(MASS)
> model1 <- glm.nb(Male_death ~ factor(age) + factor(year) + offset(l_male_exp))
There were 50 or more warnings (use warnings() to see the first 50)
> summary(model1,corr=F)

glm.nb(formula = Male_death ~ factor(age) + factor(year) + offset(l_male_exp), init.theta = , link = log)

[parameter estimates table omitted]

(Dispersion parameter for Negative Binomial( ) family taken to be 1)

Null deviance: on 5867 degrees of freedom
Residual deviance: 7709 on 5704 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 1

Theta:
Std. Err.:
2 x log-likelihood:

3 Quasi likelihood regression

> model3 <- glm(claims ~ log(accidents), family=quasi(link="log",variance="mu"),
+ offset=log(population))
> summary(model3)

glm(formula = claims ~ log(accidents), family = quasi(link = "log", variance = "mu"), offset = log(population))

Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
log(accidents) e-12 ***

(Dispersion parameter for quasi family taken to be )

Null deviance: on 175 degrees of freedom
Residual deviance: on 174 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 4

4 Logistic regression

Vehicle insurance: quadratic vehicle value

> car <- read.table("car.csv",sep=",",header=T)
> model1 <- glm(clm ~ veh_value + I(veh_value^2), family=binomial, data=na.omit(car))
> summary(model1)

glm(formula = clm ~ veh_value + I(veh_value^2), family = binomial, data = na.omit(car))

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
veh_value e-10 ***
I(veh_value^2) e-05 ***

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on degrees of freedom
Residual deviance: on degrees of freedom
AIC:
Number of Fisher Scoring iterations: 6

Vehicle insurance: banded vehicle value

> ### create banded variable
> valuecat <- cut(car$veh_value, c(-1,2.5,5.0,7.5,10.0,12.5,100))
> table(valuecat)
valuecat
(-1,2.5] (2.5,5] (5,7.5] (7.5,10] (10,12.5] (12.5,100]

> car <- cbind(car,valuecat)
> model2 <- glm(clm ~ factor(valuecat), family=binomial, data=na.omit(car))
> summary(model2)

glm(formula = clm ~ factor(valuecat), family = binomial, data = na.omit(car))

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
factor(valuecat)(2.5,5] e-06 ***
factor(valuecat)(5,7.5]
factor(valuecat)(7.5,10]
factor(valuecat)(10,12.5]
factor(valuecat)(12.5,100]

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on degrees of freedom
Residual deviance: on degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5

Vehicle insurance: full model, adjusted for exposure

> source("logit-exposure-adjusted.r")
> attach(car)
> model3 <- glm(clm ~ C(factor(agecat),base=3)+ C(factor(area),base=3) +
+ C(factor(veh_body),base=10) + factor(valuecat), family=binomial(logitexp(exposure)))
> summary(model3)

glm(formula = clm ~ C(factor(agecat), base = 3) + C(factor(area), base = 3) + C(factor(veh_body), base = 10) + factor(valuecat), family = binomial(logitexp(exposure)))

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
C(factor(agecat), base = 3) e-06 ***
C(factor(agecat), base = 3)
C(factor(agecat), base = 3)
C(factor(agecat), base = 3) e-06 ***
C(factor(agecat), base = 3) ***
C(factor(area), base = 3)
C(factor(area), base = 3)
C(factor(area), base = 3) *
C(factor(area), base = 3)
C(factor(area), base = 3)
C(factor(veh_body), base = 10) *
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10) **
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10)
C(factor(veh_body), base = 10) **
factor(valuecat)(2.5,5] e-05 ***
factor(valuecat)(5,7.5]
factor(valuecat)(7.5,10]
factor(valuecat)(10,12.5]
factor(valuecat)(12.5,100]

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on degrees of freedom
Residual deviance: on degrees of freedom
AIC:
Number of Fisher Scoring iterations: 4

Vehicle insurance: logistic regression on grouped data

> ### grouped data
> car.group <- read.table("car_grouped.csv",sep=",",header=T)
> ### the response is a two-column matrix
> ### the first column is the number of successes (claims)
> ### the second column is the number of failures (number-claims)
> model4 <- glm(cbind(claims,number-claims) ~ C(factor(agecat),base=6)+ C(factor(area),base=6) +
+ C(factor(veh_body),base=13) + factor(valuecat),
+ family=binomial, data=car.group)
> summary(model4)

glm(formula = cbind(claims, number - claims) ~ C(factor(agecat), base = 6) + C(factor(area), base = 6) + C(factor(veh_body), base = 13) + factor(valuecat), family = binomial, data = car.group)

Estimate Std. Error z value Pr(>|z|)
(Intercept) < 2e-16 ***
C(factor(agecat), base = 6) e-05 ***
C(factor(agecat), base = 6)
C(factor(agecat), base = 6)
C(factor(agecat), base = 6) e-05 ***
C(factor(agecat), base = 6) ***

C(factor(area), base = 6)
C(factor(area), base = 6)
C(factor(area), base = 6) *
C(factor(area), base = 6)
C(factor(area), base = 6)
C(factor(veh_body), base = 13) **
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13)
C(factor(veh_body), base = 13) ***
factor(valuecat) ***
factor(valuecat)
factor(valuecat)
factor(valuecat)
factor(valuecat)

(Dispersion parameter for binomial family taken to be 1)

Null deviance: on 928 degrees of freedom
Residual deviance: on 901 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 5

ROC curves and AUC

The AUC is easily computed using the somers2 function in the Hmisc package, which needs to be downloaded from the CRAN website. A function ROC, for computing and plotting the ROC curve, is given on the book website in the file ROC-function.r.

> car <- read.table("car.csv",sep=",",header=T)
> valuecat <- cut(car$veh_value, c(-1,2.5,5.0,7.5,10.0,12.5,100))
> car <- cbind(car,valuecat)
> attach(car)
The following object(s) are masked _by_ .GlobalEnv :
valuecat
> library(Hmisc) ### need this for somers2 function to compute AUC
Attaching package: Hmisc
The following object(s) are masked from package:base :
format.pval
The following object(s) are masked from package:base :
round.POSIXt
The following object(s) are masked from package:base :
trunc.POSIXt
Warning message:
package Hmisc was built under R version
> source("ROC-function.r") ### from book website; for plotting ROC curve

> model5 <- glm(clm ~ C(factor(agecat),base=3)+ C(factor(area),base=3) +
+ C(factor(veh_body),base=10) + factor(valuecat), family=binomial)
> ## compute fitted values from logistic regression and store in fittedvalues
> fittedvalues <- predict(model5, type = "response", newdata = car)
> somers2(fittedvalues,clm)
C Dxy n Missing
e e e e+00
> ROC(fittedvalues,clm)

The AUC is given as the element C of the somers2 result.

[Figure: ROC curve; axis labels: Sensitivity, specificity]

5 Ordinal regression

Proportional odds model

A few functions for this model are available. We prefer vglm in the VGAM package. The VGAM manual is worth consulting before attempting to implement the next three models.

> injury <- read.table("injury.csv",sep=",",header=T)
> attach(injury)
> library(VGAM)
Loading required package: splines
Loading required package: stats4
Attaching package: VGAM
[warnings omitted]
> ## change base levels to those in the text
> ## (not necessary, this is just to demonstrate that the solution is the same
> ## as the SAS solution)
> road.x <- C(factor(roaduserclass),base=4)
> age.x <- C(factor(agecat),base=7)
> sex.x <- C(sex,base=2)
> model1 <- vglm(degree ~ road.x + age.x + sex.x + age.x*sex.x, cumulative(parallel=TRUE),
+ weights=number)
> summary(model1)

vglm(formula = degree ~ road.x + age.x + sex.x + age.x * sex.x, family = cumulative(parallel = TRUE), weights = number)

Pearson Residuals:
logit(P[Y<=1])
logit(P[Y<=2])

Value Std. Error t value
(Intercept):
(Intercept):
road.x
road.x
road.x
age.x
age.x
age.x
age.x
age.x
age.x
sex.x
age.x1:sex.x
age.x2:sex.x
age.x3:sex.x
age.x4:sex.x
age.x5:sex.x
age.x6:sex.x

Number of linear predictors: 2
Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])
Dispersion Parameter for cumulative family: 1
Residual Deviance: on 400 degrees of freedom
Log-likelihood: on 400 degrees of freedom
Number of Iterations: 7

Partial proportional odds model

We use vglm for this model. The partial proportional odds are specified via the parallel parameter.

> model2 <- vglm(degree ~ road.x + age.x + sex.x + age.x*sex.x,
+ cumulative(parallel=TRUE~age.x*sex.x-1),
+ weights=number)
> summary(model2)

vglm(formula = degree ~ road.x + age.x + sex.x + age.x * sex.x, family = cumulative(parallel = TRUE ~ age.x * sex.x - 1), weights = number)

Pearson Residuals:
logit(P[Y<=1])
logit(P[Y<=2])

Value Std. Error t value
(Intercept):
(Intercept):
road.x1:
road.x1:
road.x2:
road.x2:
road.x3:
road.x3:
age.x

age.x
age.x
age.x
age.x
age.x
sex.x
age.x1:sex.x
age.x2:sex.x
age.x3:sex.x
age.x4:sex.x
age.x5:sex.x
age.x6:sex.x

Number of linear predictors: 2
Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])
Dispersion Parameter for cumulative family: 1
Residual Deviance: on 397 degrees of freedom
Log-likelihood: on 397 degrees of freedom
Number of Iterations: 7

6 Nominal regression

As the private health insurance data are not publicly available, nominal regression is illustrated here on the degree of injury data. The vglm function in the VGAM package is used.

> injury <- read.table("injury.csv",sep=",",header=T)
> attach(injury)
> library(VGAM)
Loading required package: splines
Loading required package: stats4
Attaching package: VGAM
[warnings omitted]
> ## change base levels to those in the text
> road.x <- C(factor(roaduserclass),base=4)
> age.x <- C(factor(agecat),base=7)
> sex.x <- C(sex,base=2)
> ## nominal regression model
> model3 <- vglm(degree ~ road.x + age.x + sex.x + age.x*sex.x,
+ multinomial, weights=number)
> summary(model3)

vglm(formula = degree ~ road.x + age.x + sex.x + age.x * sex.x, family = multinomial, weights = number)

Pearson Residuals:
log(mu[,1]/mu[,3])
log(mu[,2]/mu[,3])

Value Std. Error t value
(Intercept):
(Intercept):
road.x1:
road.x1:
road.x2:
road.x2:
road.x3:
road.x3:

age.x1:
age.x1:
age.x2:
age.x2:
age.x3:
age.x3:
age.x4:
age.x4:
age.x5:
age.x5:
age.x6:
age.x6:
sex.x1:
sex.x1:
age.x1:sex.x1:
age.x1:sex.x1:
age.x2:sex.x1:
age.x2:sex.x1:
age.x3:sex.x1:
age.x3:sex.x1:
age.x4:sex.x1:
age.x4:sex.x1:
age.x5:sex.x1:
age.x5:sex.x1:
age.x6:sex.x1:
age.x6:sex.x1:

Number of linear predictors: 2
Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])
Dispersion Parameter for multinomial family: 1
Residual Deviance: on 384 degrees of freedom
Log-likelihood: on 384 degrees of freedom
Number of Iterations: 7

7 Gamma regression

Vehicle insurance

> car <- read.table("car.csv",sep=",",header=T)
> #### banded vehicle value
> valuecat <- cut(car$veh_value, c(-1,2.5,5.0,7.5,10.0,12.5,100))
> #### create variables with same base levels as in the text
> age.x <- C(factor(car$agecat),base=3) ## agecat=3 base level
> area.x <- C(factor(car$area),base=3) ## area C is 3rd level
> gender.x <- C(factor(car$gender),base=2) ## gender M is 2nd level
> veh_body.x <- C(factor(car$veh_body),base=10) ## SEDAN is 10th level
> car <- cbind(car,valuecat, age.x,area.x,gender.x,veh_body.x)
> model1 <- glm(claimcst0 ~ age.x + gender.x + age.x*gender.x + area.x + veh_body.x,
+ family=Gamma(link="log"),data=subset(car,clm==1))
> summary(model1)

glm(formula = claimcst0 ~ age.x + gender.x + age.x * gender.x + area.x + veh_body.x, family = Gamma(link = "log"), data = subset(car, clm == 1))

[output omitted]

(Dispersion parameter for Gamma family taken to be )

Null deviance: on 4623 degrees of freedom
Residual deviance: on 4595 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 7

Personal injury insurance, no adjustment for quickly settled claims

> persinj <- read.table("persinj.csv",sep=",",header=T)
> model3 <- glm(total ~ op_time + factor(legrep) + op_time*factor(legrep),
+ family=Gamma(link="log"), data=persinj)
> summary(model3)

glm(formula = total ~ op_time + factor(legrep) + op_time * factor(legrep), family = Gamma(link = "log"), data = persinj)

Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
op_time < 2e-16 ***
factor(legrep) < 2e-16 ***
op_time:factor(legrep) e-10 ***

(Dispersion parameter for Gamma family taken to be )

Null deviance: on degrees of freedom
Residual deviance: on degrees of freedom
AIC:
Number of Fisher Scoring iterations: 6

Runoff triangle

> runoff <- read.table("runoff triangle.csv",sep=",",header=T)
> runoff$Y[runoff$Y<0] <- 1 ### replace negative value by 1
> model4 <- glm(Y ~ factor(devyear) + factor(accyear), family=Gamma(link="log"), data=runoff)
> summary(model4)

glm(formula = Y ~ factor(devyear) + factor(accyear), family = Gamma(link = "log"), data = runoff)

Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
factor(devyear) *
factor(devyear) *
factor(devyear)
factor(devyear)
factor(devyear)

factor(devyear) *
factor(devyear) *
factor(devyear) **
factor(devyear) **
factor(accyear)
factor(accyear)
factor(accyear)
factor(accyear)
factor(accyear)
factor(accyear)
factor(accyear)
factor(accyear)
factor(accyear)

(Dispersion parameter for Gamma family taken to be )

Null deviance: on 54 degrees of freedom
Residual deviance: on 36 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 11

8 Inverse Gaussian regression

The data frame car used here is the one created for the vehicle insurance Gamma regression model.

> model2 <- glm(claimcst0 ~ age.x + gender.x + area.x,
+ family=inverse.gaussian(link="log"),data=subset(car,clm==1))
> summary(model2)

glm(formula = claimcst0 ~ age.x + gender.x + area.x, family = inverse.gaussian(link = "log"), data = subset(car, clm == 1))

Estimate Std. Error t value Pr(>|t|)
(Intercept) < 2e-16 ***
age.x *
age.x
age.x
age.x
age.x
gender.x **
area.x
area.x
area.x
area.x
area.x *

(Dispersion parameter for inverse.gaussian family taken to be )

Null deviance: on 4623 degrees of freedom
Residual deviance: on 4612 degrees of freedom
AIC:
Number of Fisher Scoring iterations: 11
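For the Gamma and inverse Gaussian families the dispersion parameter is estimated rather than fixed at 1, which is why the summaries above report a "taken to be" value and t rather than z statistics. As a rough sketch on simulated data (x, y and the true parameter values are invented), the value printed by summary can be reproduced as the Pearson chi-square statistic divided by the residual degrees of freedom.

```r
## Illustrative sketch on simulated data (all values are hypothetical)
set.seed(4)
n <- 200
x <- runif(n)
mu <- exp(1 + 0.5 * x)
y <- rgamma(n, shape = 2, rate = 2 / mu)   # mean mu, true dispersion 1/2

fit <- glm(y ~ x, family = Gamma(link = "log"))

## Pearson chi-square divided by residual degrees of freedom
pearson_disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

## dispersion as reported by summary()
reported_disp <- summary(fit)$dispersion

print(c(pearson = pearson_disp, reported = reported_disp))
```

This moment estimator is not the maximum likelihood estimate of the dispersion, so it need not match the scale or dispersion figure printed by other software.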

9 Logistic regression GLMM

The software in this area is developing very rapidly. We use here glmmPQL in the MASS package.

> claimslong <- read.table("claimslong.txt",header=T,sep=",")
> ## create binary variable for claim/no claim
> claimslong <- cbind(claimslong,clm=1*(claimslong$numclaims>0))
> #### create variables with same base levels as in the text, for comparability
> age.x <- C(factor(claimslong$agecat),base=6)
> value.x <- C(factor(claimslong$valuecat),base=6)
> period.x <- C(factor(claimslong$period),base=3)
> claimslong <- cbind(claimslong,age.x,value.x,period.x)
> library(MASS)
> model1 <- glmmPQL(clm ~ age.x + value.x + period.x,
+ random=~1 | policyid, family=binomial, data=claimslong)
Loading required package: nlme
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
iteration 6
iteration 7
iteration 8
> summary(model1)
Linear mixed-effects model fit by maximum likelihood
Data: claimslong
AIC BIC logLik
NA NA NA

Random effects:
Formula: ~1 | policyid
(Intercept) Residual
StdDev:

Variance function:
Structure: fixed weights
Formula: ~invwt
Fixed effects: clm ~ age.x + value.x + period.x
Value Std.Error DF t-value p-value
(Intercept)
age.x
age.x
age.x
age.x
age.x
value.x
value.x
value.x
value.x
value.x
period.x
period.x
Correlation:
[correlation matrix omitted]

Standardized Within-Group Residuals:
Min Q1 Med Q3 Max

Number of Observations:
Number of Groups:

Parameter estimates are similar to those produced by SAS. They are not identical because proc nlmixed and glmmPQL use different methods for finding the maximum likelihood solution.
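The glmmPQL call pattern above can be tried without the claims data. The following is a minimal sketch on simulated longitudinal data: the data frame sim, its columns and the true parameter values are all invented for illustration, and the output is not meant to resemble the results for claimslong.

```r
## Illustrative sketch on simulated data (sim and its columns are hypothetical)
library(MASS)   # glmmPQL; the nlme package must also be installed

set.seed(5)
n_pol <- 200
policyid <- rep(seq_len(n_pol), each = 3)        # three periods per policy
period <- factor(rep(1:3, times = n_pol))
u <- rnorm(n_pol, sd = 1)                        # policy-level random intercepts
eta <- -1 + 0.3 * (period == 2) + 0.6 * (period == 3) + u[policyid]
sim <- data.frame(clm = rbinom(length(eta), 1, plogis(eta)),
                  period = period, policyid = policyid)

## random intercept for each policy, as in the model above
fit <- glmmPQL(clm ~ period, random = ~ 1 | policyid,
               family = binomial, data = sim, verbose = FALSE)

fixed_effects <- nlme::fixef(fit)
print(fixed_effects)
```

Penalised quasi-likelihood is an approximate method, so estimates from this kind of fit may differ somewhat from those of software that integrates the likelihood over the random effects.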

10 Logistic regression GEE

As for GLMMs, software for these models is evolving constantly. We use geeglm in the geepack package, which gives identical parameter estimates to proc genmod.

> model2 <- geeglm(clm ~ age.x + value.x + period.x,
+ id=policyid, corstr="exchangeable", family=binomial, data=claimslong)
> summary(model2)

geeglm(formula = clm ~ age.x + value.x + period.x, family = binomial, data = claimslong, id = policyid, corstr = "exchangeable")

Estimate Std.err Wald p(>W)
(Intercept) e e+00
age.x e e-06
age.x e e-01
age.x e e-01
age.x e e-08
age.x e e-04
value.x e e-09
value.x e e-01
value.x e e-02
value.x e e-01
value.x e e-01
period.x e e+00
period.x e e-13

Estimated Scale Parameters:
Estimate Std.err
(Intercept)

Correlation: Structure = exchangeable Link = identity

Estimated Correlation Parameters:
Estimate Std.err
alpha
Number of clusters: Maximum cluster size: 3

11 Logistic regression GAM

GAMs can be fitted using either the special purpose gam package, or the more general gamlss package. We illustrate the use of both.

> ######## vehicle insurance data
> car <- read.table("car.csv",sep=",",header=T)
> #### banded vehicle value
> valuecat <- cut(car$veh_value, c(-1,2.5,5.0,7.5,10.0,12.5,100))
> #### create variables with same base levels as in the text
> age.x <- C(factor(car$agecat),base=3) ## agecat=3 base level
> area.x <- C(factor(car$area),base=3) ## area C is 3rd level
> gender.x <- C(factor(car$gender),base=2) ## gender M is 2nd level
> veh_body.x <- C(factor(car$veh_body),base=10) ## SEDAN is 10th level
> car <- cbind(car,valuecat, age.x,area.x,gender.x,veh_body.x)
> ### use gam in gam package:
> library(gam)
Loading required package: splines
> model1 <- gam(clm ~ age.x + area.x + veh_body.x + s(veh_value),
+ family=binomial, data=car)
> summary(model1)

gam(formula = clm ~ age.x + area.x + veh_body.x + s(veh_value), family = binomial, data = car)

(Dispersion Parameter for binomial family taken to be 1)

Null Deviance: on degrees of freedom
Residual Deviance: on degrees of freedom
AIC:
Number of Local Scoring Iterations: 7

DF for Terms and Chi-squares for Nonparametric Effects
Df Npar Df Npar Chisq P(Chi)
(Intercept) 1
age.x 5
area.x 5
veh_body.x 12
s(veh_value) e-06

> par(mfrow=c(2,2))
> plot(model1)

[Figure: four panels showing the partial effects for age.x, area.x and veh_body.x, and the smooth s(veh_value) against veh_value]

The highly nonlinear effect of vehicle value, with a peak around 4 ($40 000), is seen clearly.

The gamlss implementation gives parameter estimates for the parametric explanatory variables, which are similar to those given by proc gam.

> ### use gamlss:
> library(gamlss)
Loading required package: splines
********** GAMLSS Version **********
For more on GAMLSS look at
Type gamlssnews() to see new features/changes/bug fixes.

> model2 <- gamlss(clm ~ age.x + area.x + veh_body.x + cs(veh_value),
+ family=BI, data=car)
GAMLSS-RS iteration 1: Global Deviance =
GAMLSS-RS iteration 2: Global Deviance =
> summary(model2)
*******************************************************************
Family: c("BI", "Binomial")

gamlss(formula = clm ~ age.x + area.x + veh_body.x + cs(veh_value), family = BI, data = car)

Fitting method: RS()
-
Mu link function: logit
Mu
Estimate Std. Error t value Pr(>|t|)
(Intercept) e+00
age.x e-05
age.x e-01
age.x e-01
age.x e-05
age.x e-04
area.x e-01
area.x e-01
area.x e-02
area.x e-01
area.x e-01
veh_body.x e-03
veh_body.x e-01
veh_body.x e-02
veh_body.x e-01
veh_body.x e-01
veh_body.x e-02
veh_body.x e-01
veh_body.x e-01
veh_body.x e-01
veh_body.x e-01
veh_body.x e-01
veh_body.x e-04
cs(veh_value) e-07
-
No. of observations in the fit:
Degrees of Freedom for the fit:
Residual Deg. of Freedom:
at cycle: 2
Global Deviance:
AIC:
SBC:
*******************************************************************
Warning message:
addive terms exists in the mu formula results maybe are not appropriate in: vcov.gamlss(object, "all")
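Where neither gam nor gamlss is installed, the idea of a nonlinear effect of vehicle value can still be explored with a regression spline inside an ordinary glm. This is a substitute technique and not the smoother used by either package: the sketch below uses ns from the splines package with an arbitrarily chosen 4 degrees of freedom, on simulated data whose claim probability peaks near a vehicle value of 4. The variables veh_value and clm here are simulated, not read from car.csv.

```r
## Illustrative sketch on simulated data; ns() stands in for s() and cs()
library(splines)

set.seed(6)
n <- 2000
veh_value <- runif(n, 0, 10)
p <- plogis(-2 + 1.2 * exp(-(veh_value - 4)^2 / 4))   # peak near veh_value = 4
clm <- rbinom(n, 1, p)

## linear effect of vehicle value on the logit scale
fit_linear <- glm(clm ~ veh_value, family = binomial)

## natural cubic spline with 4 degrees of freedom (an arbitrary choice)
fit_spline <- glm(clm ~ ns(veh_value, df = 4), family = binomial)

## the linear model is nested in the spline model, so this is non-negative
deviance_drop <- deviance(fit_linear) - deviance(fit_spline)
print(deviance_drop)
```

A large drop in deviance relative to the 3 extra degrees of freedom would point to a nonlinear effect of the kind described above.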


More information
