De Jong and Heller, GLMs for Insurance Data: R code and output of examples in the text
Contents

1 Poisson regression
2 Negative binomial regression
3 Quasi likelihood regression
4 Logistic regression
5 Ordinal regression
6 Nominal regression
7 Gamma regression
8 Inverse Gaussian regression
9 Logistic regression GLMM
10 Logistic regression GEE
11 Logistic regression GAM

January 31
1 Poisson regression

Number of children: log link

birth <- read.table("birth.csv", sep = ",", header = TRUE)
birth.log <- glm(formula = children ~ age, family = poisson(link = log), data = birth)
summary(birth.log)

glm(formula = children ~ age, family = poisson(link = log), data = birth)

[coefficient table omitted; intercept and age both highly significant; numeric values lost in transcription]

(Dispersion parameter for poisson family taken to be 1)

[Null deviance on 140 degrees of freedom; residual deviance on 139 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 5

anova(birth.log)

Analysis of Deviance Table

Model: poisson, link: log

Response: children

Terms added sequentially (first to last)

[analysis-of-deviance table omitted; numeric values lost in transcription]

Number of children: identity link

R produces the following error message. Notice also the error message in the SAS output. Clearly there is a problem with this model.

birth.id <- glm(formula = children ~ age, family = poisson(link = identity), data = birth)
Error: no valid set of coefficients has been found: please supply starting values

Diabetes deaths, categorical age

In order to read the data into R, diabetes.xls must be saved as diabetes.csv. Gender and age are both character variables in the data file, so R will treat them as categorical. The model is specified as deaths ~ gender + age. The default base level in R is the lowest level, which is female gender and age <25. In order to reproduce the SAS output, we control the base level using the C function. In the case of age, for example, the required base level is the fourth level of age, so the term is specified in the model as C(age, base = 4).
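The effect of the C function can be checked on simulated data. The following sketch (made-up data, not from the book) shows that changing the base level only reparameterises the model: the coefficients change but the fitted values do not.

```r
# Sketch (simulated data): changing the base level via C() changes the
# coefficients but not the fitted values of a Poisson GLM.
set.seed(1)
g <- factor(rep(c("a", "b", "c"), each = 20))
y <- rpois(60, lambda = c(2, 4, 8)[as.integer(g)])

fit.default <- glm(y ~ g, family = poisson)               # base level "a" (default)
fit.base3   <- glm(y ~ C(g, base = 3), family = poisson)  # base level "c"

coef(fit.default)                # different parameterisations ...
coef(fit.base3)
all.equal(fitted(fit.default), fitted(fit.base3), tolerance = 1e-6)  # ... same fit
```

The same holds for every use of C(..., base = k) in the models below: it only controls which level the reported coefficients are contrasted against.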
diabetes <- read.table("diabetes.csv", sep = ",", header = TRUE)
attach(diabetes)

### categorical age
model1 <- glm(deaths ~ C(gender, base = 2) + C(age, base = 4),
    family = poisson(link = log), offset = l_popn)
summary(model1)

glm(formula = deaths ~ C(gender, base = 2) + C(age, base = 4),
    family = poisson(link = log), offset = l_popn)

[coefficient table omitted; the gender term and all age terms are significant; numeric values lost in transcription]

(Dispersion parameter for poisson family taken to be 1)

[Null deviance on 15 degrees of freedom; residual deviance on 7 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 5

Diabetes deaths, cubic age

Polynomials are specified in R using the poly function.

model2 <- glm(deaths ~ C(gender, base = 2) + poly(agemidpt, 3),
    family = poisson(link = log), offset = l_popn)
summary(model2)

glm(formula = deaths ~ C(gender, base = 2) + poly(agemidpt, 3),
    family = poisson(link = log), offset = l_popn)

[coefficient table omitted; numeric values lost in transcription]

(Dispersion parameter for poisson family taken to be 1)

[Null deviance on 15 degrees of freedom; residual deviance on 11 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 5
This gives different coefficients for the agemidpt polynomial to SAS, because poly constructs orthogonal polynomials. The SAS solution is reproduced as

minage <- min(agemidpt)
maxage <- max(agemidpt)
agestd <- (agemidpt - 0.5*(minage + maxage)) / (0.5*(maxage - minage))
model3 <- glm(deaths ~ C(gender, base = 2) + agestd + I(agestd^2) + I(agestd^3),
    family = poisson(link = log), offset = l_popn)
summary(model3)

glm(formula = deaths ~ C(gender, base = 2) + agestd + I(agestd^2) +
    I(agestd^3), family = poisson(link = log), offset = l_popn)

[coefficient table omitted; numeric values lost in transcription]

(Dispersion parameter for poisson family taken to be 1)

[Null deviance on 15 degrees of freedom; residual deviance on 11 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 5

Third party claims

TP <- read.table("third party claims.csv", sep = ",", header = TRUE)
attach(TP)
model1 <- glm(claims ~ log(accidents), family = poisson, offset = log(population))
summary(model1)

glm(formula = claims ~ log(accidents), family = poisson, offset = log(population))

[coefficient table omitted; both terms significant at <2e-16; numeric values lost in transcription]

(Dispersion parameter for poisson family taken to be 1)

[Null deviance on 175 degrees of freedom; residual deviance on 174 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 4
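That the orthogonal and raw standardised polynomials are merely two bases for the same cubic model can be checked on simulated data (a sketch, not the diabetes data): the coefficients differ but the fitted values agree.

```r
# Sketch (simulated data): poly() uses orthogonal polynomials, so its
# coefficients differ from raw powers of the standardised covariate,
# but both bases span the same cubic model and give identical fits.
set.seed(3)
age <- rep(seq(22, 87, by = 5), 2)
y   <- rpois(length(age), exp(1 + 0.02 * age))

# SAS-style standardisation of age to the interval [-1, 1]
agestd <- (age - 0.5*(min(age) + max(age))) / (0.5*(max(age) - min(age)))

fit.orth <- glm(y ~ poly(age, 3), family = poisson)
fit.raw  <- glm(y ~ agestd + I(agestd^2) + I(agestd^3), family = poisson)

all.equal(fitted(fit.orth), fitted(fit.raw), tolerance = 1e-6)
```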
2 Negative binomial regression

Negative binomial regression is in the MASS library, which must be installed and loaded. The function is glm.nb.

Third party claims

library(MASS)
model2 <- glm.nb(claims ~ log(accidents) + offset(log(population)))
summary(model2)

glm.nb(formula = claims ~ log(accidents) + offset(log(population)),
    init.theta = 5.8309, link = log)

[coefficient table omitted; both terms significant at <2e-16; numeric values lost in transcription]

(Dispersion parameter for Negative Binomial(5.8309) family taken to be 1)

[Null deviance on 175 degrees of freedom; residual deviance on 174 degrees of freedom; deviance, AIC, correlation of coefficients, theta standard error and log-likelihood values lost in transcription]

Number of Fisher Scoring iterations: 1

The dispersion parameter is Theta = 5.8309. In SAS the dispersion parameter is reported as the reciprocal, 1/Theta.

Swedish mortality, categorical age and year

mortality <- read.table("mortality.csv", header = TRUE, sep = ",")
mortality <- mortality[, -c(3, 5, 7, 9, 11)]
mortality <- na.omit(mortality)
attach(mortality)
library(MASS)
model1 <- glm.nb(male_death ~ factor(age) + factor(year) + offset(l_male_exp))
There were 50 or more warnings (use warnings() to see the first 50)
summary(model1, corr = FALSE)

glm.nb(formula = male_death ~ factor(age) + factor(year) + offset(l_male_exp),
    link = log)

[parameter estimates table omitted]
(Dispersion parameter for Negative Binomial family taken to be 1; theta estimate lost in transcription)

[Null deviance on 5867 degrees of freedom; residual deviance 7709 on 5704 degrees of freedom; AIC, theta standard error and log-likelihood values lost in transcription]

Number of Fisher Scoring iterations: 1

3 Quasi likelihood regression

model3 <- glm(claims ~ log(accidents), family = quasi(link = "log", variance = "mu"),
    offset = log(population))
summary(model3)

glm(formula = claims ~ log(accidents), family = quasi(link = "log",
    variance = "mu"), offset = log(population))

[coefficient table omitted; both terms highly significant; numeric values lost in transcription]

[Dispersion parameter for quasi family estimated from the data; value lost in transcription]

[Null deviance on 175 degrees of freedom; residual deviance on 174 degrees of freedom; deviance values lost in transcription]

AIC: NA

Number of Fisher Scoring iterations: 4

4 Logistic regression

Vehicle insurance: quadratic vehicle value

car <- read.table("car.csv", sep = ",", header = TRUE)
model1 <- glm(clm ~ veh_value + I(veh_value^2), family = binomial, data = na.omit(car))
summary(model1)

glm(formula = clm ~ veh_value + I(veh_value^2), family = binomial,
    data = na.omit(car))

[coefficient table omitted; intercept, veh_value and I(veh_value^2) all significant; numeric values lost in transcription]
(Dispersion parameter for binomial family taken to be 1)

[Null and residual deviances omitted; values and degrees of freedom lost in transcription]

Number of Fisher Scoring iterations: 6

Vehicle insurance: banded vehicle value

### create banded variable
valuecat <- cut(car$veh_value, c(-1, 2.5, 5.0, 7.5, 10.0, 12.5, 100))
table(valuecat)

valuecat
  (-1,2.5]    (2.5,5]    (5,7.5]   (7.5,10]  (10,12.5] (12.5,100]
[counts lost in transcription]

car <- cbind(car, valuecat)
model2 <- glm(clm ~ factor(valuecat), family = binomial, data = na.omit(car))
summary(model2)

glm(formula = clm ~ factor(valuecat), family = binomial, data = na.omit(car))

[coefficient table omitted; the intercept and the (2.5,5] band are significant; numeric values lost in transcription]

(Dispersion parameter for binomial family taken to be 1)

[Null and residual deviances omitted; values and degrees of freedom lost in transcription]

Number of Fisher Scoring iterations: 5

Vehicle insurance: full model, adjusted for exposure

source("logit-exposure-adjusted.r")
attach(car)
model3 <- glm(clm ~ C(factor(agecat), base = 3) + C(factor(area), base = 3) +
    C(factor(veh_body), base = 10) + factor(valuecat),
    family = binomial(logitexp(exposure)))
summary(model3)

glm(formula = clm ~ C(factor(agecat), base = 3) + C(factor(area),
    base = 3) + C(factor(veh_body), base = 10) + factor(valuecat),
    family = binomial(logitexp(exposure)))
[coefficient table omitted; several agecat, area, veh_body and valuecat terms significant; numeric values lost in transcription]

(Dispersion parameter for binomial family taken to be 1)

[Null and residual deviances omitted; values and degrees of freedom lost in transcription]

Number of Fisher Scoring iterations: 4

Vehicle insurance: logistic regression on grouped data

### grouped data
car.group <- read.table("car_grouped.csv", sep = ",", header = TRUE)
### the response is a two-column matrix
### the first column is the number of successes (claims)
### the second column is the number of failures (number-claims)
model4 <- glm(cbind(claims, number - claims) ~ C(factor(agecat), base = 6) +
    C(factor(area), base = 6) + C(factor(veh_body), base = 13) + factor(valuecat),
    family = binomial, data = car.group)
summary(model4)

glm(formula = cbind(claims, number - claims) ~ C(factor(agecat),
    base = 6) + C(factor(area), base = 6) + C(factor(veh_body),
    base = 13) + factor(valuecat), family = binomial, data = car.group)

[coefficient table omitted; numeric values lost in transcription]
[coefficient table continued; area, veh_body and valuecat rows omitted; numeric values lost in transcription]

(Dispersion parameter for binomial family taken to be 1)

[Null deviance on 928 degrees of freedom; residual deviance on 901 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 5

ROC curves and AUC

The AUC is easily computed using the somers2 function in the Hmisc package, which needs to be downloaded from the CRAN website. A function ROC for computing and plotting the ROC curve is given on the book website in the file ROC-function.r.

car <- read.table("car.csv", sep = ",", header = TRUE)
valuecat <- cut(car$veh_value, c(-1, 2.5, 5.0, 7.5, 10.0, 12.5, 100))
car <- cbind(car, valuecat)
attach(car)

The following object(s) are masked by .GlobalEnv: valuecat

library(Hmisc)   ### need this for somers2 function to compute AUC

[package startup and masking messages omitted]

source("ROC-function.r")   ### from book website; for plotting ROC curve
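The C statistic reported by somers2 is the concordance probability: the probability that a randomly chosen claim gets a higher fitted value than a randomly chosen non-claim. It can also be computed in base R from a rank sum, as in the following sketch (simulated data and a hypothetical helper auc, so Hmisc is not required).

```r
# Sketch: the AUC (the "C" reported by Hmisc::somers2) computed in base R
# from mid-ranks; ties among the fitted values are counted one half.
auc <- function(p, y) {
  r  <- rank(p)                        # mid-ranks handle ties
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

set.seed(6)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(x))
p <- fitted(glm(y ~ x, family = binomial))
auc(p, y)    # same quantity as somers2(p, y)["C"]
```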
model5 <- glm(clm ~ C(factor(agecat), base = 3) + C(factor(area), base = 3) +
    C(factor(veh_body), base = 10) + factor(valuecat), family = binomial)

## compute fitted values from logistic regression and store in fittedvalues
fittedvalues <- predict(model5, type = "response", newdata = car)
somers2(fittedvalues, clm)

[somers2 output omitted: C, Dxy, n and Missing; numeric values lost in transcription]

ROC(fittedvalues, clm)

The AUC is given as the element C of the somers2 result.

[ROC curve plot omitted: sensitivity plotted against specificity]

5 Ordinal regression

Proportional odds model

A few functions for this model are available. We prefer vglm in the VGAM package. The VGAM manual is worth consulting before attempting to implement the next three models.

injury <- read.table("injury.csv", sep = ",", header = TRUE)
attach(injury)
library(VGAM)

[package startup messages and warnings omitted]

## change base levels to those in the text
## (not necessary, this is just to demonstrate that the solution is the same
## as the SAS solution)
road.x <- C(factor(roaduserclass), base = 4)
age.x <- C(factor(agecat), base = 7)
sex.x <- C(sex, base = 2)
model1 <- vglm(degree ~ road.x + age.x + sex.x + age.x*sex.x,
    cumulative(parallel = TRUE), weights = number)
summary(model1)

vglm(formula = degree ~ road.x + age.x + sex.x + age.x * sex.x,
    family = cumulative(parallel = TRUE), weights = number)
[Pearson residuals and coefficient table omitted; numeric values lost in transcription]

Number of linear predictors: 2

Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])

Dispersion Parameter for cumulative family: 1

[Residual deviance on 400 degrees of freedom; log-likelihood on 400 degrees of freedom; values lost in transcription]

Number of Iterations: 7

Partial proportional odds model

We use vglm for this model. The partial proportional odds are specified via the parallel parameter.

model2 <- vglm(degree ~ road.x + age.x + sex.x + age.x*sex.x,
    cumulative(parallel = TRUE ~ age.x*sex.x - 1), weights = number)
summary(model2)

vglm(formula = degree ~ road.x + age.x + sex.x + age.x * sex.x,
    family = cumulative(parallel = TRUE ~ age.x * sex.x - 1),
    weights = number)

[Pearson residuals and coefficient table omitted; numeric values lost in transcription]
[coefficient table continued; remaining age, sex and interaction rows omitted; numeric values lost in transcription]

Number of linear predictors: 2

Names of linear predictors: logit(P[Y<=1]), logit(P[Y<=2])

Dispersion Parameter for cumulative family: 1

[Residual deviance on 397 degrees of freedom; log-likelihood on 397 degrees of freedom; values lost in transcription]

Number of Iterations: 7

6 Nominal regression

As the private health insurance data are not publicly available, nominal regression is illustrated here on the degree of injury data. The vglm function in the VGAM package is used.

injury <- read.table("injury.csv", sep = ",", header = TRUE)
attach(injury)
library(VGAM)

[package startup messages and warnings omitted]

## change base levels to those in the text
road.x <- C(factor(roaduserclass), base = 4)
age.x <- C(factor(agecat), base = 7)
sex.x <- C(sex, base = 2)

## nominal regression model
model3 <- vglm(degree ~ road.x + age.x + sex.x + age.x*sex.x,
    multinomial, weights = number)
summary(model3)

vglm(formula = degree ~ road.x + age.x + sex.x + age.x * sex.x,
    family = multinomial, weights = number)

[Pearson residuals and coefficient table omitted; numeric values lost in transcription]
[coefficient table continued; remaining rows omitted; numeric values lost in transcription]

Number of linear predictors: 2

Names of linear predictors: log(mu[,1]/mu[,3]), log(mu[,2]/mu[,3])

Dispersion Parameter for multinomial family: 1

[Residual deviance on 384 degrees of freedom; log-likelihood on 384 degrees of freedom; values lost in transcription]

Number of Iterations: 7

7 Gamma regression

Vehicle insurance

car <- read.table("car.csv", sep = ",", header = TRUE)

#### banded vehicle value
valuecat <- cut(car$veh_value, c(-1, 2.5, 5.0, 7.5, 10.0, 12.5, 100))

#### create variables with same base levels as in the text
age.x <- C(factor(car$agecat), base = 3)          ## agecat=3 base level
area.x <- C(factor(car$area), base = 3)           ## area C is 3rd level
gender.x <- C(factor(car$gender), base = 2)       ## gender M is 2nd level
veh_body.x <- C(factor(car$veh_body), base = 10)  ## SEDAN is 10th level
car <- cbind(car, valuecat, age.x, area.x, gender.x, veh_body.x)

model1 <- glm(claimcst0 ~ age.x + gender.x + age.x*gender.x + area.x + veh_body.x,
    family = Gamma(link = "log"), data = subset(car, clm == 1))
summary(model1)

glm(formula = claimcst0 ~ age.x + gender.x + age.x * gender.x +
    area.x + veh_body.x, family = Gamma(link = "log"), data = subset(car,
    clm == 1))
[output omitted]

[Dispersion parameter for Gamma family estimated from the data; value lost in transcription]

[Null deviance on 4623 degrees of freedom; residual deviance on 4595 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 7

Personal injury insurance, no adjustment for quickly settled claims

persinj <- read.table("persinj.csv", sep = ",", header = TRUE)
model3 <- glm(total ~ op_time + factor(legrep) + op_time*factor(legrep),
    family = Gamma(link = "log"), data = persinj)
summary(model3)

glm(formula = total ~ op_time + factor(legrep) + op_time * factor(legrep),
    family = Gamma(link = "log"), data = persinj)

[coefficient table omitted; all four terms highly significant; numeric values lost in transcription]

[Dispersion parameter for Gamma family estimated from the data; value lost in transcription]

[Null and residual deviances omitted; values and degrees of freedom lost in transcription]

Number of Fisher Scoring iterations: 6

Runoff triangle

runoff <- read.table("runoff triangle.csv", sep = ",", header = TRUE)
runoff$Y[runoff$Y < 0] <- 1   ### replace negative value by 1
model4 <- glm(Y ~ factor(devyear) + factor(accyear), family = Gamma(link = "log"),
    data = runoff)
summary(model4)

glm(formula = Y ~ factor(devyear) + factor(accyear), family = Gamma(link = "log"),
    data = runoff)

[coefficient table omitted; numeric values lost in transcription]
[coefficient table continued; devyear and accyear rows omitted; numeric values lost in transcription]

[Dispersion parameter for Gamma family estimated from the data; value lost in transcription]

[Null deviance on 54 degrees of freedom; residual deviance on 36 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 11

8 Inverse Gaussian regression

The data frame car used here is the one created for the vehicle insurance Gamma regression model.

model2 <- glm(claimcst0 ~ age.x + gender.x + area.x,
    family = inverse.gaussian(link = "log"), data = subset(car, clm == 1))
summary(model2)

glm(formula = claimcst0 ~ age.x + gender.x + area.x, family = inverse.gaussian(link = "log"),
    data = subset(car, clm == 1))

[coefficient table omitted; numeric values lost in transcription]

[Dispersion parameter for inverse.gaussian family estimated from the data; value lost in transcription]

[Null deviance on 4623 degrees of freedom; residual deviance on 4612 degrees of freedom; deviance and AIC values lost in transcription]

Number of Fisher Scoring iterations: 11
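The practical difference from the Gamma model of the previous section is the variance function: the inverse Gaussian family assumes the variance is proportional to mu^3 rather than mu^2, so it allows heavier right tails. This can be read directly off the family objects in base R, as the following small sketch shows.

```r
# Sketch: the inverse Gaussian family assumes Var(Y) proportional to mu^3,
# compared with mu^2 for the Gamma family. Both are in base R's stats.
fam.ig  <- inverse.gaussian(link = "log")
fam.gam <- Gamma(link = "log")

fam.ig$variance(2)    # 2^3 = 8
fam.gam$variance(2)   # 2^2 = 4
```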
9 Logistic regression GLMM

The software in this area is developing very rapidly. We use here glmmPQL in the MASS package.

claimslong <- read.table("claimslong.txt", header = TRUE, sep = ",")

## create binary variable for claim/no claim
claimslong <- cbind(claimslong, clm = 1*(claimslong$numclaims > 0))

#### create variables with same base levels as in the text, for comparability
age.x <- C(factor(claimslong$agecat), base = 6)
value.x <- C(factor(claimslong$valuecat), base = 6)
period.x <- C(factor(claimslong$period), base = 3)
claimslong <- cbind(claimslong, age.x, value.x, period.x)

library(MASS)
model1 <- glmmPQL(clm ~ age.x + value.x + period.x,
    random = ~1 | policyid, family = binomial, data = claimslong)

Loading required package: nlme
iteration 1
...
iteration 8

summary(model1)

Linear mixed-effects model fit by maximum likelihood
Data: claimslong
  AIC BIC logLik
   NA  NA     NA

Random effects:
Formula: ~1 | policyid
[intercept and residual standard deviations lost in transcription]

Variance function:
Structure: fixed weights
Formula: ~invwt

Fixed effects: clm ~ age.x + value.x + period.x
[fixed-effects table omitted; numeric values lost in transcription]

Correlation: [correlation matrix omitted]

[standardized within-group residual summary, number of observations and number of groups lost in transcription]

Parameter estimates are similar to those produced by SAS. They are not identical because proc nlmixed and glmmPQL use different methods for finding the maximum likelihood solution.
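When the claimslong data are not to hand, the mechanics of glmmPQL can be tried on simulated clustered binary data. The sketch below uses made-up variable names; MASS and nlme are recommended packages that ship with standard R installations.

```r
# Sketch (simulated data): a random-intercept logistic GLMM fitted by
# penalised quasi-likelihood with MASS::glmmPQL, as in the text.
library(MASS)
library(nlme)

set.seed(8)
id <- rep(1:100, each = 3)                   # 100 "policies", 3 periods each
b  <- rep(rnorm(100, sd = 1), each = 3)      # policy-level random intercept
x  <- rnorm(300)
y  <- rbinom(300, 1, plogis(-1 + 0.5 * x + b))
d  <- data.frame(y = y, x = x, id = factor(id))

fit <- glmmPQL(y ~ x, random = ~ 1 | id, family = binomial,
               data = d, verbose = FALSE)
fixef(fit)    # estimated fixed effects (intercept and slope)
```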
10 Logistic regression GEE

As for GLMMs, software for these models is evolving constantly. We use geeglm in the geepack package, which gives identical parameter estimates to proc genmod.

library(geepack)
model2 <- geeglm(clm ~ age.x + value.x + period.x,
    id = policyid, corstr = "exchangeable", family = binomial, data = claimslong)
summary(model2)

geeglm(formula = clm ~ age.x + value.x + period.x, family = binomial,
    data = claimslong, id = policyid, corstr = "exchangeable")

[coefficient table omitted; numeric values lost in transcription]

[Estimated scale and correlation parameters lost in transcription]

Correlation: Structure = exchangeable   Link = identity

[Number of clusters lost in transcription]; Maximum cluster size: 3

11 Logistic regression GAM

GAMs can be fitted using either the special purpose gam package, or the more general gamlss package. We illustrate the use of both.

######## vehicle insurance data
car <- read.table("car.csv", sep = ",", header = TRUE)

#### banded vehicle value
valuecat <- cut(car$veh_value, c(-1, 2.5, 5.0, 7.5, 10.0, 12.5, 100))

#### create variables with same base levels as in the text
age.x <- C(factor(car$agecat), base = 3)          ## agecat=3 base level
area.x <- C(factor(car$area), base = 3)           ## area C is 3rd level
gender.x <- C(factor(car$gender), base = 2)       ## gender M is 2nd level
veh_body.x <- C(factor(car$veh_body), base = 10)  ## SEDAN is 10th level
car <- cbind(car, valuecat, age.x, area.x, gender.x, veh_body.x)

### use gam in gam package:
library(gam)
Loading required package: splines
model1 <- gam(clm ~ age.x + area.x + veh_body.x + s(veh_value),
    family = binomial, data = car)
summary(model1)

gam(formula = clm ~ age.x + area.x + veh_body.x + s(veh_value),
    family = binomial, data = car)
(Dispersion Parameter for binomial family taken to be 1)

[Null and residual deviances omitted; values and degrees of freedom lost in transcription]

Number of Local Scoring Iterations: 7

DF for Terms and Chi-squares for Nonparametric Effects

[table omitted: (Intercept) 1 df, age.x 5 df, area.x 5 df, veh_body.x 12 df; the nonparametric effect of s(veh_value) is highly significant]

par(mfrow = c(2, 2))
plot(model1)

[plots omitted: partial effects for age.x, area.x and veh_body.x, and the smooth s(veh_value) against veh_value]

The highly nonlinear effect of vehicle value, with a peak around 4 ($40 000), is seen clearly. The gamlss implementation gives parameter estimates for the parametric explanatory variables, which are similar to those given by proc gam.

### use gamlss:
library(gamlss)

[package startup messages omitted]
model2 <- gamlss(clm ~ age.x + area.x + veh_body.x + cs(veh_value),
    family = BI, data = car)
GAMLSS-RS iteration 1: Global Deviance = [value lost in transcription]
GAMLSS-RS iteration 2: Global Deviance = [value lost in transcription]
summary(model2)

*******************************************************************
Family: c("BI", "Binomial")

gamlss(formula = clm ~ age.x + area.x + veh_body.x + cs(veh_value),
    family = BI, data = car)

Fitting method: RS()

Mu link function: logit

[Mu coefficient table omitted; numeric values lost in transcription]

[Number of observations, degrees of freedom for the fit, residual degrees of freedom, global deviance, AIC and SBC lost in transcription; converged at cycle 2]

*******************************************************************
Warning message:
addive terms exists in the mu formula results maybe are not appropriate
in: vcov.gamlss(object, "all")
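Besides the gam and gamlss packages, the recommended package mgcv (shipped with standard R installations) fits the same kind of logistic GAM with a smooth term. A sketch on simulated data (made-up variable names, not the vehicle insurance data):

```r
# Sketch (simulated data): a logistic GAM with one smooth term, fitted
# with mgcv rather than the gam package used above.
library(mgcv)

set.seed(9)
v <- runif(500, 0, 10)
y <- rbinom(500, 1, plogis(-2 + sin(v)))   # nonlinear effect of v

fit <- gam(y ~ s(v), family = binomial)
summary(fit)$s.table    # significance of the smooth term s(v)
```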
Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,
More informationExam Applied Statistical Regression. Good Luck!
Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.
More informationLog-linear Models for Contingency Tables
Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A
More informationSTAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 10 Analyzing Clustered/Repeated Categorical Data 0-0 Outline Clustered/Repeated Categorical Data Generalized Linear Mixed Models Generalized
More informationLogistic Regression - problem 6.14
Logistic Regression - problem 6.14 Let x 1, x 2,, x m be given values of an input variable x and let Y 1,, Y m be independent binomial random variables whose distributions depend on the corresponding values
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent
More informationLogistic Regressions. Stat 430
Logistic Regressions Stat 430 Final Project Final Project is, again, team based You will decide on a project - only constraint is: you are supposed to use techniques for a solution that are related to
More informationAge 55 (x = 1) Age < 55 (x = 0)
Logistic Regression with a Single Dichotomous Predictor EXAMPLE: Consider the data in the file CHDcsv Instead of examining the relationship between the continuous variable age and the presence or absence
More informationSTA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).
STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population
More informationLogistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University
Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction
More informationInteractions in Logistic Regression
Interactions in Logistic Regression > # UCBAdmissions is a 3-D table: Gender by Dept by Admit > # Same data in another format: > # One col for Yes counts, another for No counts. > Berkeley = read.table("http://www.utstat.toronto.edu/~brunner/312f12/
More informationGeneralized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model
Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example
More informationContrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:
Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.
More informationssh tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm
Kedem, STAT 430 SAS Examples: Logistic Regression ==================================== ssh abc@glue.umd.edu, tap sas913, sas https://www.statlab.umd.edu/sasdoc/sashtml/onldoc.htm a. Logistic regression.
More informationGeneralized Linear. Mixed Models. Methods and Applications. Modern Concepts, Walter W. Stroup. Texts in Statistical Science.
Texts in Statistical Science Generalized Linear Mixed Models Modern Concepts, Methods and Applications Walter W. Stroup CRC Press Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint
More informationLISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014
LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers
More informationMODULE 6 LOGISTIC REGRESSION. Module Objectives:
MODULE 6 LOGISTIC REGRESSION Module Objectives: 1. 147 6.1. LOGIT TRANSFORMATION MODULE 6. LOGISTIC REGRESSION Logistic regression models are used when a researcher is investigating the relationship between
More informationA Handbook of Statistical Analyses Using R. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R Brian S. Everitt and Torsten Hothorn CHAPTER 6 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, and Colonic Polyps
More informationUnit 5 Logistic Regression Practice Problems
Unit 5 Logistic Regression Practice Problems SOLUTIONS R Users Source: Afifi A., Clark VA and May S. Computer Aided Multivariate Analysis, Fourth Edition. Boca Raton: Chapman and Hall, 2004. Exercises
More informationStatistical Methods III Statistics 212. Problem Set 2 - Answer Key
Statistical Methods III Statistics 212 Problem Set 2 - Answer Key 1. (Analysis to be turned in and discussed on Tuesday, April 24th) The data for this problem are taken from long-term followup of 1423
More informationGeneralized Linear Models
York SPIDA John Fox Notes Generalized Linear Models Copyright 2010 by John Fox Generalized Linear Models 1 1. Topics I The structure of generalized linear models I Poisson and other generalized linear
More informationLecture 9 STK3100/4100
Lecture 9 STK3100/4100 27. October 2014 Plan for lecture: 1. Linear mixed models cont. Models accounting for time dependencies (Ch. 6.1) 2. Generalized linear mixed models (GLMM, Ch. 13.1-13.3) Examples
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks
More informationR Hints for Chapter 10
R Hints for Chapter 10 The multiple logistic regression model assumes that the success probability p for a binomial random variable depends on independent variables or design variables x 1, x 2,, x k.
More informationDuration of Unemployment - Analysis of Deviance Table for Nested Models
Duration of Unemployment - Analysis of Deviance Table for Nested Models February 8, 2012 The data unemployment is included as a contingency table. The response is the duration of unemployment, gender and
More informationModel Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 18.1 Logistic Regression (Dose - Response)
Model Based Statistics in Biology. Part V. The Generalized Linear Model. Logistic Regression ( - Response) ReCap. Part I (Chapters 1,2,3,4), Part II (Ch 5, 6, 7) ReCap Part III (Ch 9, 10, 11), Part IV
More informationRegression Methods for Survey Data
Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear
More informationIntroduction to the Generalized Linear Model: Logistic regression and Poisson regression
Introduction to the Generalized Linear Model: Logistic regression and Poisson regression Statistical modelling: Theory and practice Gilles Guillot gigu@dtu.dk November 4, 2013 Gilles Guillot (gigu@dtu.dk)
More informationPAPER 206 APPLIED STATISTICS
MATHEMATICAL TRIPOS Part III Thursday, 1 June, 2017 9:00 am to 12:00 pm PAPER 206 APPLIED STATISTICS Attempt no more than FOUR questions. There are SIX questions in total. The questions carry equal weight.
More informationGeneralized Linear Models
Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM
More informationChapter 3: Generalized Linear Models
92 Chapter 3: Generalized Linear Models 3.1 Components of a GLM 1. Random Component Identify response variable Y. Assume independent observations y 1,...,y n from particular family of distributions, e.g.,
More informationModels for Binary Outcomes
Models for Binary Outcomes Introduction The simple or binary response (for example, success or failure) analysis models the relationship between a binary response variable and one or more explanatory variables.
More informationThe GENMOD Procedure. Overview. Getting Started. Syntax. Details. Examples. References. SAS/STAT User's Guide. Book Contents Previous Next
Book Contents Previous Next SAS/STAT User's Guide Overview Getting Started Syntax Details Examples References Book Contents Previous Next Top http://v8doc.sas.com/sashtml/stat/chap29/index.htm29/10/2004
More informationGeneralized Linear Models
Generalized Linear Models Methods@Manchester Summer School Manchester University July 2 6, 2018 Generalized Linear Models: a generic approach to statistical modelling www.research-training.net/manchester2018
More informationNotes for week 4 (part 2)
Notes for week 4 (part 2) Ben Bolker October 3, 2013 Licensed under the Creative Commons attribution-noncommercial license (http: //creativecommons.org/licenses/by-nc/3.0/). Please share & remix noncommercially,
More informationSolutions to obligatorisk oppgave 2, STK2100
Solutions to obligatorisk oppgave 2, STK2100 Vinnie Ko May 14, 2018 Disclaimer: This document is made solely for my own personal use and can contain many errors. Oppgave 1 We load packages and read data
More informationSTAT 510 Final Exam Spring 2015
STAT 510 Final Exam Spring 2015 Instructions: The is a closed-notes, closed-book exam No calculator or electronic device of any kind may be used Use nothing but a pen or pencil Please write your name and
More informationA Handbook of Statistical Analyses Using R 2nd Edition. Brian S. Everitt and Torsten Hothorn
A Handbook of Statistical Analyses Using R 2nd Edition Brian S. Everitt and Torsten Hothorn CHAPTER 7 Logistic Regression and Generalised Linear Models: Blood Screening, Women s Role in Society, Colonic
More informationAnalysis of binary repeated measures data with R
Analysis of binary repeated measures data with R Right-handed basketball players take right and left-handed shots from 3 locations in a different random order for each player. Hit or miss is recorded.
More informationCOMPLEMENTARY LOG-LOG MODEL
COMPLEMENTARY LOG-LOG MODEL Under the assumption of binary response, there are two alternatives to logit model: probit model and complementary-log-log model. They all follow the same form π ( x) =Φ ( α
More informationEPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7
Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review
More informationHierarchical Generalized Linear Models. ERSH 8990 REMS Seminar on HLM Last Lecture!
Hierarchical Generalized Linear Models ERSH 8990 REMS Seminar on HLM Last Lecture! Hierarchical Generalized Linear Models Introduction to generalized models Models for binary outcomes Interpreting parameter
More informationSTA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3
STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae
More informationExplanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.
Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present
More informationTruck prices - linear model? Truck prices - log transform of the response variable. Interpreting models with log transformation
Background Regression so far... Lecture 23 - Sta 111 Colin Rundel June 17, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical or categorical
More informationZERO INFLATED POISSON REGRESSION
STAT 6500 ZERO INFLATED POISSON REGRESSION FINAL PROJECT DEC 6 th, 2013 SUN JEON DEPARTMENT OF SOCIOLOGY UTAH STATE UNIVERSITY POISSON REGRESSION REVIEW INTRODUCING - ZERO-INFLATED POISSON REGRESSION SAS
More informationMultinomial Logistic Regression Models
Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word
More informationESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS. The SURVEYFREQ Procedure
ESTIMATE PROP. IMPAIRED PRE- AND POST-INTERVENTION FOR THIN LIQUID SWALLOW TASKS 18:58 Sunday, July 26, 2015 1 The SURVEYFREQ Procedure Data Summary Number of Clusters 30 Number of Observations 360 time_cat
More informationSTA 450/4000 S: January
STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have
More informationBooklet of Code and Output for STAD29/STA 1007 Midterm Exam
Booklet of Code and Output for STAD29/STA 1007 Midterm Exam List of Figures in this document by page: List of Figures 1 NBA attendance data........................ 2 2 Regression model for NBA attendances...............
More informationA Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46
A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response
More informationIntroduction to logistic regression
Introduction to logistic regression Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia What we are going to learn
More informationClass Notes: Week 8. Probit versus Logit Link Functions and Count Data
Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While
More informationDISPLAYING THE POISSON REGRESSION ANALYSIS
Chapter 17 Poisson Regression Chapter Table of Contents DISPLAYING THE POISSON REGRESSION ANALYSIS...264 ModelInformation...269 SummaryofFit...269 AnalysisofDeviance...269 TypeIII(Wald)Tests...269 MODIFYING
More informationLinear Regression. Data Model. β, σ 2. Process Model. ,V β. ,s 2. s 1. Parameter Model
Regression: Part II Linear Regression y~n X, 2 X Y Data Model β, σ 2 Process Model Β 0,V β s 1,s 2 Parameter Model Assumptions of Linear Model Homoskedasticity No error in X variables Error in Y variables
More informationChapter 1. Modeling Basics
Chapter 1. Modeling Basics What is a model? Model equation and probability distribution Types of model effects Writing models in matrix form Summary 1 What is a statistical model? A model is a mathematical
More informationMixed models in R using the lme4 package Part 7: Generalized linear mixed models
Mixed models in R using the lme4 package Part 7: Generalized linear mixed models Douglas Bates University of Wisconsin - Madison and R Development Core Team University of
More informationSCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models
SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION
More informationRecap. HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis:
1 / 23 Recap HW due Thursday by 5 pm Next HW coming on Thursday Logistic regression: Pr(G = k X) linear on the logit scale Linear discriminant analysis: Pr(G = k X) Pr(X G = k)pr(g = k) Theory: LDA more
More informationRegression so far... Lecture 21 - Logistic Regression. Odds. Recap of what you should know how to do... At this point we have covered: Sta102 / BME102
Background Regression so far... Lecture 21 - Sta102 / BME102 Colin Rundel November 18, 2014 At this point we have covered: Simple linear regression Relationship between numerical response and a numerical
More informationVarious Issues in Fitting Contingency Tables
Various Issues in Fitting Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Complete Tables with Zero Entries In contingency tables, it is possible to have zero entries in a
More informationClassification. Chapter Introduction. 6.2 The Bayes classifier
Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode
More informationSTAT 7030: Categorical Data Analysis
STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012
More informationBMI 541/699 Lecture 22
BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based
More informationBooklet of Code and Output for STAD29/STA 1007 Midterm Exam
Booklet of Code and Output for STAD29/STA 1007 Midterm Exam List of Figures in this document by page: List of Figures 1 Packages................................ 2 2 Hospital infection risk data (some).................
More informationLogistic Regression. Interpretation of linear regression. Other types of outcomes. 0-1 response variable: Wound infection. Usual linear regression
Logistic Regression Usual linear regression (repetition) y i = b 0 + b 1 x 1i + b 2 x 2i + e i, e i N(0,σ 2 ) or: y i N(b 0 + b 1 x 1i + b 2 x 2i,σ 2 ) Example (DGA, p. 336): E(PEmax) = 47.355 + 1.024
More informationTable of Contents. Logistic Regression- Illustration Carol Bigelow March 21, 2017
Logistic Regression- Illustration Carol Bigelow March 21, 2017 Table of Contents Preliminary - Attach packages needed using command library( )... 2 Must have installed packages in console window first...
More informationESP 178 Applied Research Methods. 2/23: Quantitative Analysis
ESP 178 Applied Research Methods 2/23: Quantitative Analysis Data Preparation Data coding create codebook that defines each variable, its response scale, how it was coded Data entry for mail surveys and
More informationNATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )
NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3
More informationChapter 22: Log-linear regression for Poisson counts
Chapter 22: Log-linear regression for Poisson counts Exposure to ionizing radiation is recognized as a cancer risk. In the United States, EPA sets guidelines specifying upper limits on the amount of exposure
More informationReaction Days
Stat April 03 Week Fitting Individual Trajectories # Straight-line, constant rate of change fit > sdat = subset(sleepstudy, Subject == "37") > sdat Reaction Days Subject > lm.sdat = lm(reaction ~ Days)
More informationGeneralized Linear Modeling - Logistic Regression
1 Generalized Linear Modeling - Logistic Regression Binary outcomes The logit and inverse logit interpreting coefficients and odds ratios Maximum likelihood estimation Problem of separation Evaluating
More information