Regression modeling for categorical data. Part II : Model selection and prediction


1 Regression modeling for categorical data Part II : Model selection and prediction David Causeur Agrocampus Ouest IRMAR CNRS UMR 6625 http://math.agrocampus-ouest.fr/infogluedeliverlive/membres/david.causeur

2 Course outline
1 Interpreting model fit
2 Model comparison
3 Subset selection
4 Prediction
5 Cross-validation

3 Individual departures from the fit

Definition
The deviance residuals ε_i(β̂) are defined as follows:
ε_i(β̂) = y_i √( 2 log(1 + exp(−y_i(β̂_0 + β̂_1 x_i1 + … + β̂_p x_ip))) )
They are the individual contributions to the residual deviance:
D_x,y(β̂) = Σ_{i=1}^n ε_i²(β̂)
For large n, ε_i(β̂) ≈ N(0,1).

4 Individual departures from the fit

R script
> epsilon = residuals(maturity.logit,type="deviance") # Extracts the deviance residuals
> outlier = which(abs(epsilon)>2) # Selects the indices of the largest epsilons
> pi = predict(maturity.logit,type="response") # Calculates the estimated pi=P(Y=+1)
> cbind(scale(dta.12[,1:5]),dta.12[,6:7],pi)[outlier,] # Displays external information
      L    a    b Weight Diam Maturity Variety   pi
    ...  ...  ...    ...  ...      ...      go  ...
    ...  ...  ...    ...  ...      ...      mo  ...
    ...  ...  ...    ...  ...      ...      bl  ...
    ...  ...  ...    ...  ...      ...      bl  ...
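As a check on the definition above, the deviance residuals returned by R can be recomputed directly from the formula; a minimal sketch, assuming the maturity.logit fit and the dta.12 data of these slides, with the positive class Maturity = "2" recoded as y_i = +1:

R script (illustrative sketch)
y.pm = ifelse(dta.12$Maturity == "2", 1, -1)  # Response recoded as +1/-1
eta = predict(maturity.logit, type = "link")  # Linear predictor beta0 + beta1*x_i1 + ...
eps.manual = y.pm * sqrt(2 * log(1 + exp(-y.pm * eta)))
max(abs(eps.manual - residuals(maturity.logit, type = "deviance"))) # Should be ~0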

5 Plotting the model fit

Model for maturity from Variety and index a:
logit π_1(x) = µ + βx (for variety "37")
logit π_2(x) = µ + α_2 + (β + γ_2)x (for variety "bl")
logit π_3(x) = µ + α_3 + (β + γ_3)x (for variety "go")
logit π_4(x) = µ + α_4 + (β + γ_4)x (for variety "mo")

Correspondence with the estimated values:
Parameter    Symbol  Estimation
(Intercept)  µ       ...
Varietybl    α_2     ...
Varietygo    α_3     ...
Varietymo    α_4     ...
a            β       ...
Varietybl:a  γ_2     ...
Varietygo:a  γ_3     ...
Varietymo:a  γ_4     ...

6 Plotting the model fit

R script
> vec.a = seq(from=min(dta.12$a),to=max(dta.12$a),by=0.01)
> # vec.a is a high resolution sequence of a values
> varieties = levels(dta.12$Variety) # Vector of Variety levels
> pi = matrix(0,nrow=length(vec.a),ncol=4)
> # pi will be used to store the estimated P(Y=+1)
> # One row of pi for each value in vec.a, one column for each variety
> for (j in 1:4)
+   pi[,j] = predict(maturity.logit,type="response",
+     newdata=data.frame(Variety=varieties[j],a=vec.a)) # Estimated P(Y=+1) in matrix pi
> matplot(vec.a,pi,type="l",lwd=2,lty=1,xlab="a",ylab=expression(pi),
+   main="Maturity along index a") # Plots the 4 probability curves
> legend("bottomright",lwd=2,lty=1,col=1:4,
+   legend=c("37","bl","go","mo"),bty="n") # Adds a legend to the plot

7 Plotting the model fit

[Figure: "Maturity along index a", the four estimated probability curves π(a), one per variety (37, bl, go, mo)]

8 Confidence intervals for the regression parameters

Asymptotic distribution of the ML estimator of β
The ML estimator β̂ of β is approximately normally distributed, for large n, with mean β and variance matrix
V_β̂ = (X'VX)⁻¹,
where X is the design matrix and V is the diagonal matrix with entries π_i(1−π_i).

9 Confidence intervals for the regression parameters

R script
> X = model.matrix(~Variety*a,data=dta.12) # Extracts the design matrix of the model
> pi = predict(maturity.logit,type="response") # Fitted P(Y=+1)
> V = diag(pi*(1-pi)) # Diagonal matrix whose diagonal entries are pi*(1-pi)
> Var.beta = solve(t(X)%*%V%*%X) # Asymptotic variance of the ML estimator
> sqrt(diag(Var.beta)) # Estimated standard deviations of the regression parameters
(Intercept)   Varietybl   Varietygo   Varietymo           a Varietybl:a Varietygo:a Varietymo:a
        ...         ...         ...         ...         ...         ...         ...         ...

10 Confidence intervals for the regression parameters

Asymptotic distribution of the ML estimator of β
The ML estimator β̂ of β is approximately normally distributed, for large n, with mean β and variance matrix V_β̂ = (X'VX)⁻¹.

Confidence interval
The confidence interval CI_{1−α}(β_j) with confidence level 1−α of β_j is:
CI_{1−α}(β_j) = [ β̂_j − z_{1−α/2} σ̂_{β̂_j} ; β̂_j + z_{1−α/2} σ̂_{β̂_j} ],
where z_{1−α/2} = F⁻¹_{N(0,1)}(1−α/2) is the (1−α/2)-quantile of the standard normal distribution.
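This interval can be computed directly from the estimated standard deviations above; a small sketch, assuming the maturity.logit fit and the Var.beta matrix from the previous slides:

R script (illustrative sketch)
se = sqrt(diag(Var.beta))                  # Standard errors from the asymptotic variance
z = qnorm(0.975)                           # (1 - alpha/2)-quantile for alpha = 0.05
ci.manual = cbind(coef(maturity.logit) - z * se,
                  coef(maturity.logit) + z * se)
colnames(ci.manual) = c("2.5 %", "97.5 %")
ci.manual                                  # Matches confint.default(maturity.logit)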

11 Confidence intervals for the regression parameters

R script
> ci = cbind(Estimate=coef(maturity.logit),confint.default(maturity.logit,level=0.95))
> ci
            Estimate 2.5 % 97.5 %
(Intercept)      ...   ...    ...
Varietybl        ...   ...    ...
Varietygo        ...   ...    ...
Varietymo        ...   ...    ...
a                ...   ...    ...
Varietybl:a      ...   ...    ...
Varietygo:a      ...   ...    ...
Varietymo:a      ...   ...    ...

12 Confidence intervals for the regression parameters

R script
> exp(ci) # Confidence intervals for the odds-ratios
            Estimate 2.5 % 97.5 %
(Intercept)      ...   ...    ...
Varietybl        ...   ...    ...
Varietygo        ...   ...    ...
Varietymo        ...   ...    ...
a                ...   ...    ...
Varietybl:a      ...   ...    ...
Varietygo:a      ...   ...    ...
Varietymo:a      ...   ...    ...

13 Confidence intervals for the regression parameters

R script
> pi = lwr = upr = matrix(0,nrow=length(vec.a),ncol=4)
> # pi stores the estimated P(Y=+1): one row of pi for each value in vec.a
> # One column for each variety. Same for lwr (lower bound) and upr (upper bound)
> for (j in 1:4) {
+   predictions = predict(maturity.logit,type="response",
+     newdata=data.frame(Variety=varieties[j],a=vec.a),se.fit=TRUE)
+   pi[,j] = predictions$fit
+   lwr[,j] = predictions$fit-1.96*predictions$se.fit
+   lwr[lwr[,j]<0,j] = 0 # Lower bound of C.I. >= 0
+   upr[,j] = predictions$fit+1.96*predictions$se.fit
+   upr[upr[,j]>1,j] = 1 # Upper bound of C.I. <= 1
+ }

14 Confidence intervals for the regression parameters

R script
> par(mfrow=c(2,2)) # Splits the graphics in a 2x2 grid
> color = rgb(red=0,green=0,blue=0.9,alpha=0.5) # Code for a transparent blue
> for (j in 1:4) {
+   plot(vec.a,pi[,j],type="l",lwd=2,lty=1,xlab="a",ylim=c(0,1),
+     ylab=expression(pi),main="Maturity along index a")
+   polygon(c(vec.a,rev(vec.a)),c(lwr[,j],rev(upr[,j])),col=color)
+   # Adds a shaded confidence region around the curve
+   lines(vec.a,pi[,j],lwd=2)
+   mtext(paste("Variety",varieties[j]))
+ }
> par(mfrow=c(1,1)) # Restores the 1x1 organization for the next graphics device

15 Confidence intervals for the regression parameters

[Figure: four panels "Maturity along index a", one per variety (37, bl, go, mo), each showing the estimated probability curve with its shaded confidence band]

16 Wald tests

Based on the asymptotic normality of β̂,
Z_{β_j} = β̂_j / σ̂_{β̂_j}
is a Student-like test statistic for the test of H_0^(j): β_j = 0.
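These statistics and their two-sided p-values can be reproduced by hand; a short sketch, assuming the maturity.logit fit and the Var.beta matrix from the previous slides:

R script (illustrative sketch)
se = sqrt(diag(Var.beta))                # Asymptotic standard errors
z = coef(maturity.logit) / se            # Wald statistics Z_beta_j
cbind(z, p.value = 2 * pnorm(-abs(z)))   # Matches the "z value" and "Pr(>|z|)" columns of summary()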

17 Wald tests

R script
> maturity.logit = glm(Maturity~Variety*a,data=dta.12,family=binomial)
> summary(maturity.logit)

Call:
glm(formula = Maturity ~ Variety * a, family = binomial, data = dta.12)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)      ...        ...     ...      ...
Varietybl        ...        ...     ...  ...e-05
Varietygo        ...        ...     ...      ...
Varietymo        ...        ...     ...      ...
a                ...        ...     ...  ...e-06
Varietybl:a      ...        ...     ...      ...
Varietygo:a      ...        ...     ...      ...
Varietymo:a      ...        ...     ...      ...

18 Assessment of the fit

Residual deviance D_x,y(β̂): the lowest possible deviance among all the possible fits of the model.
Null deviance: the deviance of the null model (intercept only), which is among those possible fits.

19 Assessment of the fit

Residual deviance D_x,y(β̂): the lowest possible deviance among all the possible fits of the model. Null deviance: the deviance of the null model (intercept only), which is among those possible fits.

R script
> maturity.logit = glm(Maturity~Variety*a,data=dta.12,family=binomial)
> summary(maturity.logit)

Call:
glm(formula = Maturity ~ Variety * a, family = binomial, data = dta.12)

    Null deviance: ... on 306 degrees of freedom
Residual deviance: ... on 299 degrees of freedom
AIC: ...

20 Assessment of the fit

Residual deviance D_x,y(β̂): the lowest possible deviance among all the possible fits of the model. Null deviance: the deviance of the null model (intercept only), which is among those possible fits.

Model comparison has to account for the complexity of the model.

Definition
The Akaike Information Criterion AIC_x,y(β̂) is given by:
AIC_x,y(β̂) = D_x,y(β̂) + 2(p + 1).
AIC_x,y(β̂) estimates the information loss when using the model estimated with β̂ rather than the unknown model that is supposed to generate the data.

21 Assessment of the fit

Residual deviance D_x,y(β̂): the lowest possible deviance among all the possible fits of the model. Null deviance: the deviance of the null model (intercept only), which is among those possible fits.

Bayesian Information Criterion (BIC):
BIC_x,y(β̂) = D_x,y(β̂) + ln(n)(p + 1),
which measures the information loss when using the estimated model rather than the true model, within the scope of the parametric models considered.
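Both criteria can be recomputed from the residual deviance; a quick sketch, assuming the maturity.logit fit of these slides. For a binary response fitted by glm, the saturated log-likelihood is 0, so the residual deviance equals −2 times the maximized log-likelihood, and the identities below match R's built-in AIC() and BIC():

R script (illustrative sketch)
npar = length(coef(maturity.logit))              # p + 1 parameters
c(AIC = deviance(maturity.logit) + 2*npar,
  BIC = deviance(maturity.logit) + log(nrow(dta.12))*npar)
c(AIC(maturity.logit), BIC(maturity.logit))      # Same values from the built-in functions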

22 Assessment of the fit

Residual deviance D_x,y(β̂): the lowest possible deviance among all the possible fits of the model. Null deviance: the deviance of the null model (intercept only), which is among those possible fits.

AIC or BIC?
If the goal is to build a prediction rule, AIC is recommended.
If the goal is just to fit an explanatory model, BIC shall be favored.
The goodness-of-fit of a model is more heavily penalized for its complexity when evaluated by BIC than by AIC.

23 Course outline
1 Interpreting model fit
2 Model comparison
3 Subset selection
4 Prediction
5 Cross-validation

24 Significance of an effect

Illustration with the maturity study: suppose we aim at testing a Variety × index-a interaction effect.
It consists in comparing the full model M_full:
logit π_i(x) = µ + α_i + (β + γ_i)x,
to one of its possible submodels M_sub, obtained by setting the γ_i's to zero:
logit π_i(x) = µ + α_i + βx.

25 Significance of an effect

Illustration with the maturity study: suppose we aim at testing a Variety × index-a interaction effect.
Testing for the significance of the interaction effect is stated as:
H_0: there is no interaction effect,
H_1: there is an interaction effect,
or, in an equivalent model comparison perspective:
H_0: M_full does not explain the maturity better than M_sub,
H_1: M_full does explain the maturity better than M_sub.

26 Likelihood-ratio test

The residual deviances D_sub and D_full will be used to compare M_sub and M_full.

R script
> maturity.full = glm(Maturity~Variety*a,data=dta.12,family=binomial)
> maturity.sub = glm(Maturity~Variety+a,data=dta.12,family=binomial)
> deviance(maturity.full)
[1] ...
> deviance(maturity.sub)
[1] ...

27 Likelihood-ratio test

The residual deviances D_sub and D_full will be used to compare M_sub and M_full.
The difference D_full/sub = D_sub − D_full measures the fitting gain obtained by using model M_full rather than model M_sub.
Null distribution: χ²_3, only depending on the difference between the numbers of parameters in the two models, here 3 (the γ_i's).

28 Likelihood-ratio test

The residual deviances D_sub and D_full will be used to compare M_sub and M_full.

R script
> dev.diff = deviance(maturity.sub) - deviance(maturity.full)
> pchisq(dev.diff,df=3,lower.tail=FALSE)
> # Gives the probability that a chi-square variable exceeds dev.diff
[1] ...e-19

29 Likelihood-ratio test

Now summed up in a general framework:

Definition (χ² analysis of deviance, or LRT)
Suppose M_sub ⊂ M_full are two nested models. The so-called Likelihood-Ratio Test (LRT) statistic, or analysis-of-deviance test statistic, for the hypothesis testing issue:
H_0: M_full does not explain the response better than M_sub,
H_1: M_full does explain the response better than M_sub,
is defined as
D_full/sub = D_sub − D_full = −2 log(ℓ_sub / ℓ_full).
Under H_0, D_full/sub ≈ χ² with k_full − k_sub degrees of freedom.

30 Likelihood-ratio test

R script
> anova(maturity.sub,maturity.full,test="Chisq")
Analysis of Deviance Table
Model 1: Maturity ~ Variety + a
Model 2: Maturity ~ Variety * a
  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
1       302        ...
2       299        ...  3      ... < 2.2e-16

31 Analysis of Deviance Table

Complete analysis of deviance table:

R script
> Anova(maturity.logit) # Anova() with a capital A, from package car
Analysis of Deviance Table (Type II tests)

Response: Maturity
          LR Chisq Df Pr(>Chisq)
Variety        ...  3  < 2.2e-16
a              ...  1  < 2.2e-16
Variety:a      ...  3  < 2.2e-16

32 Analysis of Deviance Table

1st row, the main effect of Variety is tested:
H_0: logit π_i(x) = µ + βx,
H_1: logit π_i(x) = µ + α_i + βx.
2nd row, the main effect of the a index is tested similarly.
3rd row, the test for the interaction effect is handled differently:
H_0: logit π_i(x) = µ + α_i + βx,
H_1: logit π_i(x) = µ + α_i + (β + γ_i)x.
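Each row of the table can be reproduced by an explicit comparison of nested models; a sketch for the first row, assuming the dta.12 data (the helper names m.a and m.va are illustrative):

R script (illustrative sketch)
m.a  = glm(Maturity ~ a,           family=binomial, data=dta.12) # H0: no Variety effect
m.va = glm(Maturity ~ Variety + a, family=binomial, data=dta.12) # H1: additive Variety effect
anova(m.a, m.va, test="Chisq")  # LR chi-square for the main effect of Variety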

33 Analysis of Deviance Table

The Wald test for the significance of a single parameter is equivalent to the Student test in the usual linear model. Similarly, the LRT corresponds to the Fisher test for analysis of variance.
When an effect is measured by one single parameter:
in the usual linear model, the t-test statistic is just the signed square-root of the F-test statistic (their p-values are exactly the same);
in the logistic linear model, this coherence between the Wald and the LRT tests no longer holds.

34 Profile likelihood confidence intervals

Definition
The profile likelihood confidence interval for β_j, with confidence level 1−α, is the interval of values b_j such that the LRT of H_0: β_j = b_j at level α does not reject the null.
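A sketch of how such an interval can be obtained by inverting the LRT over a grid, for a hypothetical simple model glm(y ~ x, family=binomial) on a data frame dat; the names y, x, dat and prof.ci are illustrative and not from the slides (confint() does this profiling more efficiently):

R script (illustrative sketch)
prof.ci = function(dat, level = 0.95, grid = seq(-5, 5, by = 0.01)) {
  d.full = deviance(glm(y ~ x, family = binomial, data = dat))
  lrt = sapply(grid, function(b) { # LRT statistic of H0: beta = b, for each b in the grid
    # Fixing the slope at b via an offset, only the intercept is refitted
    deviance(glm(y ~ 1, family = binomial, data = dat, offset = b * dat$x)) - d.full
  })
  range(grid[lrt <= qchisq(level, df = 1)]) # Values of b that are not rejected
}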

35 Profile likelihood confidence intervals

R script
> confint(maturity.logit,level=0.95)
Waiting for profiling to be done...
            2.5 % 97.5 %
(Intercept)   ...    ...
Varietybl     ...    ...
Varietygo     ...    ...
Varietymo     ...    ...
a             ...    ...
Varietybl:a   ...    ...
Varietygo:a   ...    ...
Varietymo:a   ...    ...

36 Detailing a significant group effect

Once the effect of a factor is significant, what levels shall be pointed out as different?
Post-hoc tests: I(I−1)/2 pairwise comparisons of the effect parameters.
In the maturity study, for the interaction effect, 6 tests of H_0^(ii'): γ_i = γ_i', for 1 ≤ i < i' ≤ 4.
The probability of one or more erroneous rejections of any of the null hypotheses H_0^(ii'), when all of them are true (and assuming independent tests):
1 − P(H_0^(12) is not rejected, …, H_0^((I−1)I) is not rejected)
= 1 − P(H_0^(12) is not rejected) × … × P(H_0^((I−1)I) is not rejected)
= 1 − (1 − α)^{I(I−1)/2}.

37 Detailing a significant group effect

Once the effect of a factor is significant, what levels shall be pointed out as different?
Post-hoc tests: I(I−1)/2 pairwise comparisons of the effect parameters.

R script
> 1-(1-0.05)^6
[1] 0.2649081

If α = 0.05, then the probability of one or more erroneous declarations that two γ_i are significantly different is about 0.26!

38 Detailing a significant group effect

Once the effect of a factor is significant, what levels shall be pointed out as different?
Post-hoc tests: I(I−1)/2 pairwise comparisons of the effect parameters.

R script
> alpha = 1-(1-0.05)^(1/6)
> alpha
[1] 0.008512445
> 1-(1-alpha)^6
[1] 0.05

39 Detailing a significant group effect

R script
> # Initialization of empty matrices of confidence bounds for the pairwise differences
> estimate = lower = upper = matrix(0,nrow=4,ncol=4)
> varieties = levels(dta.12$Variety) # Extract variety names
> # Sets names to matrices upper and lower
> rownames(estimate) = rownames(upper) = rownames(lower) = varieties
> colnames(estimate) = colnames(upper) = colnames(lower) = varieties
> upper
   37 bl go mo
37  0  0  0  0
bl  0  0  0  0
go  0  0  0  0
mo  0  0  0  0

40 Detailing a significant group effect

R script
> for (j in 1:4) {
+   tmp = dta.12 # Temporary dataset similar to dta.12
+   tmp$Variety = relevel(dta.12$Variety,varieties[j]) # Makes the jth variety the reference level
+   tmp.logit = glm(formula(maturity.logit),family=binomial,data=tmp)
+   estimate[j,-j] = coef(tmp.logit)[6:8]
+   ci = confint(tmp.logit,level=1-alpha,parm=6:8)
+   upper[j,-j] = ci[,2] # Feeds the jth row of matrix upper
+   lower[j,-j] = ci[,1] # Feeds the jth row of matrix lower
+ }

41 Detailing a significant group effect

R script
> ci = data.frame(Estimate=estimate[col(estimate)>row(estimate)],
+   lower=lower[col(lower)>row(lower)],upper=upper[col(upper)>row(upper)])
> # Creates a 6 x 3 data frame with all the pairwise combinations in rows
> colnames(ci) = c("Estimate","2.5%","97.5%")
> cilabs = outer(varieties,varieties,paste,sep="-")
> # Creates names for combinations with all pairwise combinations of variety labels
> rownames(ci) = cilabs[col(cilabs)>row(cilabs)]
> ci
      Estimate 2.5% 97.5%
37-bl      ...  ...   ...
37-go      ...  ...   ...
bl-go      ...  ...   ...
37-mo      ...  ...   ...
bl-mo      ...  ...   ...
go-mo      ...  ...   ...

42 Detailing a significant group effect

Once the effect of a factor is significant, what levels shall be pointed out as different?
Post-hoc tests: I(I−1)/2 pairwise comparisons of the effect parameters.
Finally: the slopes of the probability curves are significantly different for all pairwise comparisons except for the comparison between varieties bl and mo.

43 Model for preference data

Exercise
We aim at studying the effect of age on the preference of women for a special type of perfume, denoted G2, rather than another one, denoted G3. It is suspected that the way the preference is affected by age may depend on consumer habits, especially the frequency of use of a perfume. Results of a consumer study are provided in file parfums.txt in order to address this issue.
Propose and fit an appropriate model for the above issue. Is the effect of age on preference different according to the frequency of use of a perfume?

44 Model for colouring of fat

Exercise
Experimental study to investigate the causes of yellow fat in lamb meat. The experiment focuses on two possible causes, the feeding and housing modes, in a balanced design: 20 lambs per possible combination of two feeding modes (1: two meals a day or 2: ad libitum) and two housing modes (1: individual and 2: collective).
Numbers of lambs with coloured fat and total numbers of lambs:
Feeding Housing Coloured Total
F1      H1        ...     20
F1      H2        ...     20
F2      H1        ...     20
F2      H2        ...     20
Propose and fit an appropriate model for the above issue. Is the effect of the feeding mode different according to the housing mode? Give the odds-ratio of the feeding mode.

45 Course outline
1 Interpreting model fit
2 Model comparison
3 Subset selection
4 Prediction
5 Cross-validation

46 Subset selection

Let us now model the maturity class of an apricot by L, a, b, Diam and Weight.
Subset selection issue: which subset of those 5 x's is sufficient to explain the maturity status?
Handled by comparison of the 2⁵ = 32 submodels.
The submodels are not all nested: the LRT is not appropriate.
For a number k of x's: the submodel M*_k with lowest residual deviance D*_k is the champion.
The overall champion M* is the M*_k with lowest BIC (or AIC).

47 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!
Stepwise search algorithms - forward starts from M_0:
First step: fit the p models with only one x and keep the model M_1 with lowest BIC.
kth step: 2 variants:
forward stepwise: fit the p−k+1 models obtained by adding one x to M_{k−1} and keep the model with lowest BIC;
forward/backward stepwise: fit the p models obtained by adding or removing one x to/from M_{k−1} and keep the model with lowest BIC.
Stop when adding an x increases BIC.
In forward/backward, at most about p² submodels are fitted.

48 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!

R script
> p=50 # For example, p=50 candidate variables
> p^2 # Maximum number of model fits in a stepwise search
[1] 2500
> 2^p # Number of possible submodels
[1] 1.1259e+15
> p^2/2^p # Proportion of submodels explored in a stepwise search
[1] 2.220446e-12

49 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!
forward or backward?
backward tends to keep more variables in the selection;
when p ≈ n or p > n, the estimation of the full model is not reliable (or just not possible).
forward or forward/backward?
forward/backward is less greedy than forward alone, since a variable can be removed again at a later step. Both algorithms are computationally equivalent: the forward/backward search can be viewed as a free improvement.

50 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!

R script
> maturity.logit = glm(Maturity~.,family=binomial,data=dta.12[,-7])
> # Fits the model with all explanatory variables except Variety
> stepwise(maturity.logit,direction="forward/backward",criterion="BIC")
Direction: forward/backward
Criterion: BIC

Start: AIC=...
Maturity ~ 1
         Df Deviance AIC
+ a       1      ... ...
+ Weight  1      ... ...
<none>           ... ...
+ b       1      ... ...
+ Diam    1      ... ...
+ L       1      ... ...

51 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!

R script
> maturity.logit = glm(Maturity~.,family=binomial,data=dta.12[,-7])
> # Fits the model with all explanatory variables except Variety
> stepwise(maturity.logit,direction="forward/backward",criterion="BIC")
Direction: forward/backward
Criterion: BIC

Step: AIC=...
Maturity ~ a
         Df Deviance AIC
+ L       1      ... ...
+ Weight  1      ... ...
+ Diam    1      ... ...
+ b       1      ... ...
<none>           ... ...
- a       1      ... ...

52 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!

R script
> maturity.logit = glm(Maturity~.,family=binomial,data=dta.12[,-7])
> # Fits the model with all explanatory variables except Variety
> stepwise(maturity.logit,direction="forward/backward",criterion="BIC")
Direction: forward/backward
Criterion: BIC

Step: AIC=...
Maturity ~ a + L
         Df Deviance AIC
+ b       1      ... ...
<none>           ... ...
+ Diam    1      ... ...
+ Weight  1      ... ...
- L       1      ... ...
- a       1      ... ...

53 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!

R script
Maturity ~ a + L + b
         Df Deviance AIC
<none>           ... ...
- b       1      ... ...
+ Diam    1      ... ...
+ Weight  1      ... ...
- L       1      ... ...
- a       1      ... ...

Call: glm(formula = Maturity ~ a + L + b, family = binomial, data = dta.12[,-7])

Coefficients:
(Intercept)    a    L    b
        ...  ...  ...  ...

Degrees of Freedom: 306 Total (i.e. Null); 303 Residual
Null Deviance: ...
Residual Deviance: ...  AIC: 241.3

54 Search algorithm

If p explanatory variables, then 2^p submodels... can be huge!
Since p is moderate, an exhaustive search is possible here:

R script
> maturity.select = bestglm(Xy=dta.12[,-7],family=binomial,method="exhaustive")
> maturity.select$Subsets[,8] = maturity.select$Subsets[,8]+log(nrow(dta.12))
> maturity.select$Subsets
   Intercept     L     a     b Weight  Diam BIC
0       TRUE FALSE FALSE FALSE  FALSE FALSE ...
1       TRUE FALSE  TRUE FALSE  FALSE FALSE ...
2       TRUE  TRUE  TRUE FALSE  FALSE FALSE ...
3       TRUE  TRUE  TRUE  TRUE  FALSE FALSE ...
4       TRUE  TRUE  TRUE FALSE   TRUE  TRUE ...
5*      TRUE  TRUE  TRUE  TRUE   TRUE  TRUE ...

55 Course outline
1 Interpreting model fit
2 Model comparison
3 Subset selection
4 Prediction
5 Cross-validation

56 Classification rule

In the usual linear model, predicting the unknown value of Y from x = (x_1, …, x_p) is unambiguous:
Ŷ = Ê(Y | X = x) = β̂_0 + β̂_1 x_1 + … + β̂_p x_p.

57 Classification rule

In the logistic linear model, the prediction issue is less clear:
Estimation of π(x) = P(Y = +1 | X = x)?
Assignment of a Y value, either +1 or −1, given x?

R script
> maturity.logit = glm(Maturity~.,family=binomial,data=dta.12[,-7])
> # Using the model to estimate the probability that Maturity = 2
> pi = predict(maturity.logit,type="response")
> # Plotting the predictions versus observed maturity status
> plot(pi~dta.12$Maturity,xlab="Observed maturity status",
+   ylab="Estimated probability of 'Maturity=2'",cex.lab=1.25,
+   main="Predictions versus observed maturity status",cex.axis=1.25,
+   cex.main=1.25)

58 Classification rule

[Figure: "Predictions versus observed maturity status", boxplots of the estimated probability of 'Maturity=2' against the observed maturity status]

59 Classification rule

Definition
A decision rule aiming at the prediction of the value of a categorical response variable Y from X = (X_1, …, X_p) is named a classification rule. It is an algorithm describing how to get Ŷ, starting from x.

Definition
The misclassification probability of an item with explanatory profile x_0 and unknown response value Y_0 is P(Ŷ_0 ≠ Y_0 | X = x_0).

60 Bayes classification rule

Definition
The logistic linear Bayes classification rule is derived as follows: if π̂(x_0) is the estimated probability that Y_0 = +1, then:
Ŷ_0 = +1 if π̂(x_0) ≥ 0.5,
Ŷ_0 = −1 if π̂(x_0) < 0.5.

61 Bayes classification rule

R script
> # Implementation of the Bayes logistic classification rule
> predicted = ifelse(pi>=0.5,"2","1")
> # Construction of the confusion matrix
> confusion = table(dta.12$Maturity,predicted,dnn=list("Obs.","Pred."))
> confusion
    Pred.
Obs.   1   2
  1  ... ...
  2  ... ...

62 Prediction performance

The global misclassification rate is often not suited.
Example: Y is the status, sick (Y = +1) or healthy (Y = −1), of a patient.
To be avoided: Ŷ = −1 whereas Y = +1.
Less importantly: Ŷ = +1 whereas Y = −1.

Definition
The sampling items with Ŷ = +1 are said positive and those with Ŷ = −1 are said negative.

63 Prediction performance

Probability of a true positive:
P(Ŷ = +1 | Y = +1) = P(Ŷ = +1, Y = +1) / P(Y = +1),
estimated by the sensitivity or true positive rate:
sensitivity = #{i = 1, …, n : Ŷ_i = +1, Y_i = +1} / #{i = 1, …, n : Y_i = +1}.

Probability of a true negative:
P(Ŷ = −1 | Y = −1) = P(Ŷ = −1, Y = −1) / P(Y = −1),
estimated by the specificity or true negative rate:
specificity = #{i = 1, …, n : Ŷ_i = −1, Y_i = −1} / #{i = 1, …, n : Y_i = −1}.
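These counting formulas translate directly into R; a small sketch (the helper name sens.spec is illustrative), assuming positives are coded "2" and negatives "1" as in the maturity data, and using the Bayes rule predictions from the previous slides:

R script (illustrative sketch)
sens.spec = function(obs, pred) {
  c(sensitivity = sum(pred == "2" & obs == "2") / sum(obs == "2"),
    specificity = sum(pred == "1" & obs == "1") / sum(obs == "1"))
}
sens.spec(dta.12$Maturity, predicted)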

64 Prediction performance

Exercise
Give the sensitivity and the specificity of the Bayes logistic linear classification rule of the maturity of an apricot by the 3 colorimetric and 2 biometric measurements.

65 A short case study

Biostatistical issue: predicting the status, healthy (Y = −1) or sick (Y = +1), of a patient.
The classification rule is highly sensitive (0.9) and highly specific (0.9).
The incidence p = P(Y = +1) is low, say p = ...

66 A short case study

Probability that someone predicted as sick is sick:
P(Y = +1 | Ŷ = +1) = P(Ŷ = +1 | Y = +1) P(Y = +1) / P(Ŷ = +1) = sensitivity × incidence / P(Ŷ = +1),
where
P(Ŷ = +1) = P(Ŷ = +1 | Y = +1) P(Y = +1) + P(Ŷ = +1 | Y = −1) P(Y = −1)
          = sensitivity × incidence + (1 − specificity) × (1 − incidence)
          = 0.9 × p + (1 − 0.9) × (1 − p).
Hence,
P(Y = +1 | Ŷ = +1) = 0.9 p / (0.9 p + 0.1 (1 − p)), which is small when the incidence p is small.
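The incidence value used in the original slide was lost in transcription, so here is the computation with an assumed illustrative value p = 0.01:

R script (illustrative sketch)
sens = 0.9; spec = 0.9
p = 0.01                                       # Assumed incidence, for illustration only
ppv = sens * p / (sens * p + (1 - spec) * (1 - p))
ppv                                            # About 0.083: most positives are false alarms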

67 A short case study

Biostatistical issue: predicting the status, healthy (Y = −1) or sick (Y = +1), of a patient.
The classification rule is highly sensitive (0.9) and highly specific (0.9).
The incidence p = P(Y = +1) is low.
Probability that someone predicted as sick is sick: P(Y = +1 | Ŷ = +1) is then small.
Conclusion: high sensitivity and high specificity do not guarantee a good prediction performance!

68 Precision of a classification rule

Probability that a positive is truly +1:
P(Y = +1 | Ŷ = +1) = P(Ŷ = +1, Y = +1) / P(Ŷ = +1),
estimated by the precision or Positive Predictive Value (PPV):
PPV = #{i = 1, …, n : Ŷ_i = +1, Y_i = +1} / #{i = 1, …, n : Ŷ_i = +1}.

Probability that a negative is truly −1:
P(Y = −1 | Ŷ = −1) = P(Ŷ = −1, Y = −1) / P(Ŷ = −1),
estimated by the Negative Predictive Value (NPV):
NPV = #{i = 1, …, n : Ŷ_i = −1, Y_i = −1} / #{i = 1, …, n : Ŷ_i = −1}.

69 Precision of a classification rule

R script
> perf = rep(0,5) # Creates a 5-vector with only 0 entries
> names(perf) = c("Nb. pos","Sens.","Spec.","PPV","NPV")
> colmargins = colSums(confusion) # Column totals
> rowmargins = rowSums(confusion) # Row totals
> perf[1] = colmargins[2] # Number of positives
> perf[2] = confusion[2,2]/rowmargins[2] # Sensitivity
> perf[3] = confusion[1,1]/rowmargins[1] # Specificity
> perf[4] = confusion[2,2]/colmargins[2] # PPV
> perf[5] = confusion[1,1]/colmargins[1] # NPV
> perf
Nb. pos   Sens.   Spec.     PPV     NPV
    ...     ...     ...     ...     ...

70 Classification ability of explanatory variables

The performance of a logistic classification rule depends on:
the relevance of the x's to predict the response Y = ±1;
the choice of the threshold on π(x) above which the prediction is Ŷ = +1.
In the Bayes classification rule, this threshold is 0.5:
a lower threshold leads to a larger TPR and a larger FPR;
a larger threshold leads to a lower TPR and a lower FPR.

71 Classification ability of explanatory variables

R script
> # Create a prediction object to be used by function performance (package ROCR)
> pred = prediction(predictions=pi,labels=dta.12$Maturity)
> tpr = performance(pred,measure="tpr") # Derive the TPR
> fpr = performance(pred,measure="fpr") # and the FPR
> # Plots TPR and FPR against the threshold
> plot(tpr,lwd=2,col="blue",ylab="TPR and FPR",xlab="Threshold")
> plot(fpr,lwd=2,col="orange",add=TRUE)
> legend("topright",bty="n",lwd=2,col=c("blue","orange"),legend=c("TPR","FPR"))

72 Classification ability of explanatory variables

[Figure: TPR (blue) and FPR (orange) plotted against the classification threshold]

73 Classification ability of explanatory variables

R script
> # Finds the minimal threshold for which TPR>=0.95
> choice = min(which(tpr@"y.values"[[1]]>=0.95))
> threshold = tpr@"x.values"[[1]][choice]
> threshold
[1] ...
> tpr@"y.values"[[1]][choice] # Corresponding TPR
[1] ...
> fpr@"y.values"[[1]][choice] # Corresponding FPR
[1] ...

74 Classification ability of explanatory variables

The performance of a logistic classification rule depends on:
the relevance of the x's to predict the response Y = ±1;
the choice of the threshold on π(x) above which the prediction is Ŷ = +1.
In the Bayes classification rule, this threshold is 0.5:
a lower threshold leads to a larger TPR and a larger FPR;
a larger threshold leads to a lower TPR and a lower FPR.
ROC curve: compromises between sensitivity (True Positive Rate) and specificity (True Negative Rate = 1 − False Positive Rate).

75 Classification ability of explanatory variables

R script
> # Derive performance criteria
> perf = performance(pred,measure="tpr",x.measure="fpr")
> plot(perf,lwd=2,col="blue")

76 Classification ability of explanatory variables

[Figure: ROC curve, true positive rate against false positive rate]

77 Classification ability of explanatory variables

ROC curve:
Starts from (0,0) for threshold = 1 (all predictions are −1).
Ends at (1,1) for threshold = 0 (all predictions are +1).
Measures the prediction ability by comparison with two reference ROC curves: the ideal classifier ROC curve, reaching (0,1); the worst classifier ROC curve, going along the line y = x.
Area Under the ROC Curve (AUC): measures the prediction ability of the x's:
AUC = 1 for the ideal classifier;
AUC = 0.5 for the worst classifier.

78 Classification ability of explanatory variables

R script
> ...
[1] ...
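The command on this slide was lost in transcription; given the preceding slide, it most plausibly computes the AUC from the ROCR prediction object, e.g.:

R script (illustrative sketch)
auc = performance(pred, measure = "auc")  # AUC of the ROC curve above
auc@"y.values"[[1]]                       # The AUC as a single number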

79 Classification ability of explanatory variables

R script
> # Finds the threshold whose (FPR, TPR) point is closest to the ideal point (0,1)
> choice = which.min(fpr@"y.values"[[1]]^2+(1-tpr@"y.values"[[1]])^2)
> threshold = tpr@"x.values"[[1]][choice]
> threshold
[1] ...
> tpr@"y.values"[[1]][choice] # Corresponding TPR
[1] ...
> fpr@"y.values"[[1]][choice] # Corresponding FPR
[1] ...

80 Course outline
1 Interpreting model fit
2 Model comparison
3 Subset selection
4 Prediction
5 Cross-validation

81 Assessment of a classification rule

Remark: in the previous prediction performance criteria, Y_i is explicitly used to derive Ŷ_i... a major deviation from the real conditions.
Recommendation: the classification rule, fitted on a learning sample, has to be applied to a completely separate test sample.

Definition
An assessment procedure involving a test sample, completely separated from the learning sample, is referred to as external cross-validation.
If n is moderate, then an internal K-fold CV procedure shall be preferred.

82 Assessment of a classification rule

[Diagram: a sample of n individuals]

83 Assessment of a classification rule

[Diagram] The sample is split into K balanced subsamples.

84 Assessment of a classification rule

[Diagram] Learning sample used to estimate the model.

85 Assessment of a classification rule

[Diagram] Testing sample to calculate prediction errors; learning sample to estimate the model.

86 Assessment of a classification rule

[Diagram] Testing sample to calculate prediction errors (next fold).

87 Assessment of a classification rule

[Diagram] Testing sample to calculate prediction errors (next fold).

88 Assessment of a classification rule

[Diagram] Testing sample to calculate prediction errors (last fold).

89 Assessment of a classification rule

The choice of K can affect the result of the CV procedure:
The CV procedure involves fitting the model K times: if fitting the model is computationally time-consuming, the values K = 3 and K = 10 are often chosen.
When n is small, K = n may be recommended. The resulting CV procedure is named leave-one-out cross-validation.
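For instance, leave-one-out CV for the selected model Maturity ~ a + L + b can be sketched as follows (a minimal illustration, assuming the dta.12 data of the previous slides):

R script (illustrative sketch)
n = nrow(dta.12)
loo.pi = numeric(n)
for (i in 1:n) { # Each item is left out once and predicted from the n-1 others
  fit = glm(Maturity ~ a + L + b, family = binomial, data = dta.12[-i, ])
  loo.pi[i] = predict(fit, newdata = dta.12[i, ], type = "response")
}
loo.pred = ifelse(loo.pi >= 0.5, "2", "1") # Bayes rule on the cross-validated probabilities
mean(loo.pred != dta.12$Maturity)          # Leave-one-out misclassification rate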

90 Assessment of a classification rule

R script
> # Step 1: segmentation of the dataset in 10 subsamples
> subsamples = cvsegments(N=307,k=10) # cvsegments() from package pls
> # List whose 10 components contain the indices of items in each of the 10 segments
> unlist(lapply(subsamples,length)) # Subsamples sizes
 V1  V2  V3  V4  V5  V6  V7  V8  V9 V10
 31  31  31  31  31  31  31  30  30  30

91 Assessment of a classification rule

R script
> # Step 2: cycling over the segments
> cvpredicted = rep("0",307) # Will contain the cross-validated predictions
> nbselected = rep(0,10) # Will contain the number of selected variables
> for (k in 1:10) { # Cycling over the 10 segments
+   learn = dta.12[-subsamples[[k]],-7] # The kth segment is excluded
+   test = dta.12[subsamples[[k]],-7] # The test sample is just the kth segment
+   maturity.select = bestglm(Xy=learn,family=binomial,method="exhaustive")
+   resselect = maturity.select$Subsets
+   selected = unlist(resselect[which.min(resselect[,8]),2:6]) # selected is a vector of boolean values
+   nbselected[k] = sum(selected)
+   # Fits the selected model
+   maturity.logit = glm(Maturity~.,family=binomial,data=learn[,c(selected,TRUE)])
+   pi = predict(maturity.logit,newdata=test[,c(selected,TRUE)],type="response")
+   cvpredicted[subsamples[[k]]] = ifelse(pi>=threshold,"2","1")
+ }
> nbselected
[1] ...

92 Assessment of a classification rule

R script
> # Step 3: cross-validated performance criteria
> # The confusion matrix is first created
> confusion = table(dta.12$Maturity,cvpredicted,dnn=list("Obs.","Pred."))
> confusion
    Pred.
Obs.   1   2
  1  ... ...
  2  ... ...

93 Assessment of a classification rule

R script
> # The performance criteria are deduced
> perf = rep(0,5) # Creates a 5-vector with only 0 entries
> names(perf) = c("Nb. pos","Sens.","Spec.","PPV","NPV")
> colmargins = colSums(confusion) # Column totals
> rowmargins = rowSums(confusion) # Row totals
> perf[1] = colmargins[2] # Number of positives
> perf[2] = confusion[2,2]/rowmargins[2] # Sensitivity
> perf[3] = confusion[1,1]/rowmargins[1] # Specificity
> perf[4] = confusion[2,2]/colmargins[2] # PPV
> perf[5] = confusion[1,1]/colmargins[1] # NPV
> perf
Nb. pos   Sens.   Spec.     PPV     NPV
    ...     ...     ...     ...     ...

94 Modeling defects in cheese production

Exercise
A dairy food industry wishes to build an objective classification rule to detect major defects in cheese. For that purpose, they collect daily data on the proportions of defective cheeses and the corresponding food process conditions, characterized by 2 sanitary variables (San1 and San2) and 3 milk quality variables (From1, From2, From3). Data are provided in file cheese.txt.
Propose and fit an appropriate model for the above issue using the possible explanatory variables San1, San2, From1, From2, From3, San1², From1², From3² and From1 × From3.
Suppose we want the classification rule to be able to detect 90% of the defective cheeses. Correspondingly, which proportion of false positives should we expect?


More information

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro

Methods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }

More information

Ch 3: Multiple Linear Regression

Ch 3: Multiple Linear Regression Ch 3: Multiple Linear Regression 1. Multiple Linear Regression Model Multiple regression model has more than one regressor. For example, we have one response variable and two regressor variables: 1. delivery

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00

Two Hours. Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER. 26 May :00 16:00 Two Hours MATH38052 Mathematical formula books and statistical tables are to be provided THE UNIVERSITY OF MANCHESTER GENERALISED LINEAR MODELS 26 May 2016 14:00 16:00 Answer ALL TWO questions in Section

More information

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model

Generalized linear models for binary data. A better graphical exploratory data analysis. The simple linear logistic regression model Stat 3302 (Spring 2017) Peter F. Craigmile Simple linear logistic regression (part 1) [Dobson and Barnett, 2008, Sections 7.1 7.3] Generalized linear models for binary data Beetles dose-response example

More information

y ˆ i = ˆ " T u i ( i th fitted value or i th fit)

y ˆ i = ˆ  T u i ( i th fitted value or i th fit) 1 2 INFERENCE FOR MULTIPLE LINEAR REGRESSION Recall Terminology: p predictors x 1, x 2,, x p Some might be indicator variables for categorical variables) k-1 non-constant terms u 1, u 2,, u k-1 Each u

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Analysis Mahinda Samarakoon April 6, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 25 Table of contents 1 Building and applying logistic regression models (Chap

More information

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam

ECLT 5810 Linear Regression and Logistic Regression for Classification. Prof. Wai Lam ECLT 5810 Linear Regression and Logistic Regression for Classification Prof. Wai Lam Linear Regression Models Least Squares Input vectors is an attribute / feature / predictor (independent variable) The

More information

Count data page 1. Count data. 1. Estimating, testing proportions

Count data page 1. Count data. 1. Estimating, testing proportions Count data page 1 Count data 1. Estimating, testing proportions 100 seeds, 45 germinate. We estimate probability p that a plant will germinate to be 0.45 for this population. Is a 50% germination rate

More information

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection

Model comparison. Patrick Breheny. March 28. Introduction Measures of predictive power Model selection Model comparison Patrick Breheny March 28 Patrick Breheny BST 760: Advanced Regression 1/25 Wells in Bangladesh In this lecture and the next, we will consider a data set involving modeling the decisions

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

STAT 4385 Topic 01: Introduction & Review

STAT 4385 Topic 01: Introduction & Review STAT 4385 Topic 01: Introduction & Review Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2016 Outline Welcome What is Regression Analysis? Basics

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

Generalised linear models. Response variable can take a number of different formats

Generalised linear models. Response variable can take a number of different formats Generalised linear models Response variable can take a number of different formats Structure Limitations of linear models and GLM theory GLM for count data GLM for presence \ absence data GLM for proportion

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

MS-C1620 Statistical inference

MS-C1620 Statistical inference MS-C1620 Statistical inference 10 Linear regression III Joni Virta Department of Mathematics and Systems Analysis School of Science Aalto University Academic year 2018 2019 Period III - IV 1 / 32 Contents

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information