NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis (Semester II: 2010-2011)

NATIONAL UNIVERSITY OF SINGAPORE
EXAMINATION (SOLUTIONS)
Categorical Data Analysis
(Semester II: 2010-2011)
April/May, 2011. Time Allowed: 2 Hours

Matriculation No:                Seat No:

Grade Table
    Question        1    2    3    4    5    6    Total
    Full marks     10   28   15   20   15   12    100
    Earned marks

INSTRUCTIONS TO CANDIDATES
1. This examination paper contains SIX (6) questions and comprises TWELVE (12) printed pages.
2. Answer ALL the questions for a TOTAL of 100 marks.
3. Read the questions CAREFULLY.
4. All NOTATIONS used here are the same as those used in the lecture notes.
5. Write your answers NEATLY following the associated questions.
6. This is a closed-textbook, closed-notes examination, but calculators are allowed.
7. Candidates may bring in TWO A4-size (210 × 297 mm) help sheets.

1. [10 pts, each 1 pt] Circle T or F for each of the statements.

(1) [F] To test for independence in two-way contingency tables, likelihood ratio tests and Pearson's $\chi^2$ tests are equivalent for small sample sizes.
(2) [F] Fisher's exact test uses the negative binomial distribution to compute p-values.
(3) [F] Diagnosis of type of mental illness (schizophrenia, neurosis, depression) is an ordinal variable.
(4) [F] If the odds of success in a binary response are 0.5, the probability of success is [...].
(5) [T] Suppose that $P(Y_i = 1) = 1 - P(Y_i = 0) = 0.2$, $i = 1, \ldots, n$, where the $Y_i$ are independent. Let $Y = \sum_{i=1}^{50} Y_i$. Then the distribution of $Y$ is Binomial with mean 10.
(6) [T] A test of independence against a linear trend alternative cannot be used for nominal categorical data.
(7) [F] In a logistic regression model $\mathrm{logit}[\pi(x)] = \alpha + \beta x$, $e^{\alpha}$ equals the odds of success when $x = 1$.
(8) [F] In a logit model $\mathrm{logit}[\pi(x)] = \alpha + \beta x$, the probability increases at the rate of $0.16\beta$ when $\pi(x) = 0.4$.
(9) [F] A classical linear regression model with normally distributed errors is a special case of a generalized linear model with probit link.
(10) [F] Fitting a saturated model often results in a nonzero residual deviance.
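Statements (4), (5) and (8) rest on three small computations: converting odds to a probability, the mean of a sum of independent Bernoulli variables, and the slope factor $\pi(1-\pi)$ of the logistic curve. The sketch below checks them numerically; the function names are illustrative and are not taken from the exam or the lecture notes.

```python
def odds_to_prob(odds: float) -> float:
    """Convert odds of success into a probability of success."""
    return odds / (1.0 + odds)


def logistic_slope_factor(p: float) -> float:
    """Factor multiplying beta in d(pi)/dx = beta * pi * (1 - pi)."""
    return p * (1.0 - p)


# Statement (4): odds of 0.5 correspond to a probability of 1/3.
prob_from_odds = odds_to_prob(0.5)

# Statement (5): the sum of 50 independent Bernoulli(0.2) variables has mean 50 * 0.2.
binomial_mean = 50 * 0.2

# Statement (8): at pi(x) = 0.4 the slope is 0.4 * 0.6 * beta, i.e. 0.24 * beta.
slope_factor = logistic_slope_factor(0.4)

print(prob_from_odds, binomial_mean, slope_factor)
```

Under these definitions the slope factor at $\pi(x) = 0.4$ is 0.24 rather than the 0.16 quoted in statement (8), which is consistent with that statement being marked false.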

2. [28 pts] For a study using logistic regression to examine data on rheumatoid arthritis, we consider the age of the patient as the predictor variable. The response Y records whether the patient showed any improvement at all (1 = yes). The following computer output is for a logistic regression model using age to predict the probability of improvement.

Model Fit Statistics
    Criterion    Intercept Only    Intercept and Covariates
    -2 Log L     [...]             [...]

    Parameter    DF    Estimate    Standard Error    Wald Chi-Square    Pr > ChiSq
    Intercept    [...]
    age          [...]

Odds Ratio Estimates
    Effect    Point Estimate    95% Wald Confidence Limits
    age       [...]             [...]    [...]

Estimated Covariance Matrix
    Parameter    Intercept    age
    Intercept    [...]        [...]
    age          [...]        [...]

The 90th, 95th, 97.5th and 99.5th percentiles of the standard normal distribution are 1.28, 1.645, 1.96 and 2.576, respectively.

(a) Find the rates of change in the predicted probability of improvement when age = 25 and when the estimated probability of improvement is 0.3, respectively. [8 pts]

Solution: The estimated probability of improvement at age = 25 is
$\hat\pi(25) = \frac{\exp(\hat\alpha + 25\hat\beta)}{1 + \exp(\hat\alpha + 25\hat\beta)} = \frac{\exp([...])}{1 + \exp([...])} = [...]$. (2 pts)
Then at age 25, the rate of change in the estimated probability is
$\hat\beta\,\hat\pi(25)\,(1 - \hat\pi(25)) = [...]$. (3 pts)
When the estimated probability of improvement is 0.3, the rate of change in the estimated probability is
$\hat\beta \times 0.3 \times (1 - 0.3) = 0.0492 \times 0.3 \times (1 - 0.3) \approx 0.0103$. (3 pts)
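The rate-of-change formula $\hat\beta\,\hat\pi(1-\hat\pi)$ used in part (a) can be checked against a numerical derivative of the fitted curve. In the sketch below, the slope 0.0492 is the value that appears in the solution to part (c); the intercept is a hypothetical placeholder, because the fitted intercept did not survive in this transcription.

```python
import math


def pi_hat(x: float, alpha: float, beta: float) -> float:
    """Fitted probability under logit[pi(x)] = alpha + beta * x."""
    return 1.0 / (1.0 + math.exp(-(alpha + beta * x)))


def rate_of_change(p: float, beta: float) -> float:
    """Slope of the logistic curve at the point where pi(x) = p."""
    return beta * p * (1.0 - p)


BETA_HAT = 0.0492            # slope quoted in the solution to part (c)
ALPHA_HYPOTHETICAL = -2.5    # placeholder only, NOT the exam's estimate

# Rate of change when the estimated probability is 0.3 (no intercept needed).
rate_at_p03 = rate_of_change(0.3, BETA_HAT)

# Rate of change at age 25: analytic formula versus a central difference.
p25 = pi_hat(25.0, ALPHA_HYPOTHETICAL, BETA_HAT)
analytic_rate_25 = rate_of_change(p25, BETA_HAT)
h = 1e-5
numeric_rate_25 = (pi_hat(25.0 + h, ALPHA_HYPOTHETICAL, BETA_HAT)
                   - pi_hat(25.0 - h, ALPHA_HYPOTHETICAL, BETA_HAT)) / (2.0 * h)

print(round(rate_at_p03, 4))  # 0.0103
```

The rate at $\hat\pi = 0.3$ depends only on $\hat\beta$, which is why that part of the question can be answered without knowing the age at which the probability equals 0.3.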

(b) Find the age at which the estimated probability of improvement is 0.3. [4 pts]

Solution: Setting $\hat\pi(\text{Age}) = \frac{\exp(\hat\alpha + \hat\beta x)}{1 + \exp(\hat\alpha + \hat\beta x)} = 0.3$ (2 pts) gives
$\text{Age} = (\log(0.3/0.7) - \hat\alpha)/\hat\beta$ (1 pt) $= [...]$. (1 pt)

(c) Obtain a 95% confidence interval for the true odds ratio of improvement for a half-year increase in age. [6 pts]

Solution: The 95% confidence interval for $0.5\beta$ (1 pt) is
$0.5\,(\hat\beta \pm z_{0.025}\,\mathrm{ASE}(\hat\beta)) = 0.5\,(0.0492 \pm 1.96 \times [...]) = ([...], [...])$. (3 pts)
Thus, the 95% confidence interval for the true odds ratio $\exp(0.5\beta)$ is
$(\exp([...]), \exp([...])) = ([...], [...])$. (2 pts)

(d) Obtain a 95% confidence interval for the probability of improvement at age = 25. [10 pts]

Solution: The estimated linear predictor at age 25 is $\hat\alpha + 25\hat\beta = [...]$ (1 pt) and its estimated variance is
$\widehat{\mathrm{Var}}(\hat\alpha) + 25^2\,\widehat{\mathrm{Var}}(\hat\beta) + 2 \times 25\,\widehat{\mathrm{Cov}}(\hat\alpha, \hat\beta)$ (1 pt) $= [...] = 0.3742$. (2 pts)
Therefore, the estimated ASE of the linear predictor is $\sqrt{0.3742} = 0.6117$ (1 pt). So a 95% confidence interval for the true linear predictor is
$[...] \pm 1.96 \times 0.6117 = ([...], [...])$. (2 pts)
Therefore, a 95% confidence interval for the true probability at age 25 is
$\left( \frac{\exp([...])}{1 + \exp([...])},\ \frac{\exp([...])}{1 + \exp([...])} \right) = (0.0684, [...])$. (3 pts)
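Parts (b) to (d) follow one recipe: work on the logit scale, build a Wald interval there, then map back with the inverse logit. The following is a minimal sketch of that recipe. Only the slope 0.0492 comes from the solution; the intercept and all variance and covariance entries are hypothetical placeholders, since the fitted values are missing from this transcription.

```python
import math


def expit(eta: float) -> float:
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))


def age_at_probability(p: float, alpha: float, beta: float) -> float:
    """Solve logit(p) = alpha + beta * x for x, as in part (b)."""
    return (math.log(p / (1.0 - p)) - alpha) / beta


def odds_ratio_ci(beta: float, se_beta: float, step: float, z: float = 1.96):
    """Wald CI for exp(step * beta), as in part (c)."""
    return (math.exp(step * (beta - z * se_beta)),
            math.exp(step * (beta + z * se_beta)))


def probability_ci(x, alpha, beta, var_a, var_b, cov_ab, z: float = 1.96):
    """Wald CI for pi(x): interval on the logit scale, then back-transformed (part (d))."""
    eta = alpha + beta * x
    se = math.sqrt(var_a + x * x * var_b + 2.0 * x * cov_ab)
    return expit(eta - z * se), expit(eta + z * se)


BETA = 0.0492                                  # slope quoted in the solution
ALPHA = -2.0                                   # hypothetical intercept
VAR_A, VAR_B, COV_AB = 1.0, 0.0004, -0.018     # hypothetical covariance entries

age_03 = age_at_probability(0.3, ALPHA, BETA)
or_lo, or_hi = odds_ratio_ci(BETA, math.sqrt(VAR_B), step=0.5)
p_lo, p_hi = probability_ci(25.0, ALPHA, BETA, VAR_A, VAR_B, COV_AB)
print(age_03, (or_lo, or_hi), (p_lo, p_hi))
```

Back-transforming the endpoints keeps the interval inside (0, 1), which a Wald interval built directly on the probability scale would not guarantee.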

3. [15 pts] The following table was taken from the 1991 General Social Survey.

                 Party Identification
    Race     Democrat    Independent    Republican    Total
    White      341          105            405         851
    Black      103           15             11         129
    Total      444          120            416         980

Final Examination Q3: R code and output

    Racew  <- c(1,1,1,0,0,0)   # White = 1; Black = 0
    PartyD <- c(1,0,0,1,0,0)   # Democrat = 1; others 0
    PartyI <- c(0,1,0,0,1,0)   # Independent = 1; others 0
    count  <- c(341,105,405,103,15,11)
    RacewPartyD <- Racew * PartyD
    RacewPartyI <- Racew * PartyI
    fit <- glm(count ~ Racew + PartyD + RacewPartyD + RacewPartyI,
               family = poisson(link = "log"))
    summary(fit)

    #### R output
    Call:
    glm(formula = count ~ Racew + PartyD + RacewPartyD + RacewPartyI,
        family = poisson(link = "log"))

    Coefficients:
                   Estimate   Std. Error   z value   Pr(>|z|)
    (Intercept)    [...]      [...]        [...]     <2e-16 ***
    Racew          [...]      [...]        [...]     <2e-16 ***
    PartyD         [...]      [...]        [...]     <2e-16 ***
    RacewPartyD    [...]      [...]        [...]     <2e-16 ***
    RacewPartyI    [...]      [...]        [...]     <2e-16 ***

    Null deviance: [...]
    Residual deviance: [...]

Let X and Y denote race and party, respectively. The 95th percentiles of the $\chi^2$ distribution with 1, 2, 3, 4, 5 and 6 degrees of freedom are 3.841, 5.99, 7.81, 9.49, 11.07 and 12.59, respectively. Based on the R code and R output:

(a) Write down the loglinear regression model and identify the associated estimates. [3 pts]

Solution: The loglinear regression model can be written as
$\log(\mu_{ij}) = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^{XY}_{ij}$, $i = 1, 2$; $j = 1, 2, 3$. (1 pt)
Based on the R code, the first scheme of constraints was used. The estimated parameters are
$\hat\lambda = 2.5649$, $\hat\lambda^X_1 = [...]$, $\hat\lambda^X_2 = 0$,
$\hat\lambda^Y_1 = [...]$, $\hat\lambda^Y_2 = 0$, $\hat\lambda^Y_3 = 0$,
$\hat\lambda^{XY}_{11} = [...]$, $\hat\lambda^{XY}_{12} = [...]$, $\hat\lambda^{XY}_{13} = \hat\lambda^{XY}_{21} = \hat\lambda^{XY}_{22} = \hat\lambda^{XY}_{23} = 0$. (2 pts)

(b) Compute all the estimated cell counts. [6 pts]

Solution:
$\hat\mu_{11} = \exp(\hat\lambda + \hat\lambda^X_1 + \hat\lambda^Y_1 + \hat\lambda^{XY}_{11}) = \exp([...]) = [...]$
$\hat\mu_{12} = \exp(\hat\lambda + \hat\lambda^X_1 + \hat\lambda^Y_2 + \hat\lambda^{XY}_{12}) = \exp([...]) = [...]$
$\hat\mu_{13} = \exp(\hat\lambda + \hat\lambda^X_1 + \hat\lambda^Y_3 + \hat\lambda^{XY}_{13}) = \exp([...]) = [...]$
$\hat\mu_{21} = \exp(\hat\lambda + \hat\lambda^X_2 + \hat\lambda^Y_1 + \hat\lambda^{XY}_{21}) = \exp([...]) = [...]$
$\hat\mu_{22} = \exp(\hat\lambda + \hat\lambda^X_2 + \hat\lambda^Y_2 + \hat\lambda^{XY}_{22}) = \exp(2.5649) \approx 13.00$
$\hat\mu_{23} = \exp(\hat\lambda + \hat\lambda^X_2 + \hat\lambda^Y_3 + \hat\lambda^{XY}_{23}) = \exp(2.5649) \approx 13.00$
(each 1 pt)

(c) Comment on whether the intercept-only model fits the data well. [3 pts]

Solution: From the R output, the non-intercept coefficients are highly significant and hence are unlikely to be 0 (2 pts). Thus, the intercept-only model, which sets all non-intercept coefficients to 0, cannot fit the data well. (1 pt)
OR
From the R output, the null deviance is [...] (1 pt). The null deviance follows a $\chi^2$ distribution with $6 - 1 = 5$ degrees of freedom (1 pt). The 95th percentile of $\chi^2_5$ is 11.07, which is much smaller than the null deviance. Thus, the intercept-only model does not fit the data well. (1 pt)

(d) Comment on whether the loglinear model fits the data well. [3 pts]

Solution: From the R output, the residual deviance is [...] (1 pt). The residual deviance follows a $\chi^2$ distribution with $6 - 5 = 1$ degree of freedom (1 pt). The 95th percentile of $\chi^2_1$ is 3.841, which is much larger than the residual deviance. Thus, the loglinear regression model does fit the data well. (1 pt)
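The fitted counts and the residual deviance for this model can be reproduced from the six cell counts without refitting the GLM. The sketch below relies on a standard property of Poisson loglinear fits: for each dummy column in the design, the fitted counts sum to the observed counts. With the five columns used here, that pins the three White cells and the Black Democrat cell at their observed values and forces the two remaining Black cells, which share the same linear predictor, to split their total equally. This is an illustrative reconstruction, not the R output of the exam.

```python
import math

# Observed counts from the R code: (race, party) -> count.
observed = {
    ("White", "Democrat"): 341, ("White", "Independent"): 105, ("White", "Republican"): 405,
    ("Black", "Democrat"): 103, ("Black", "Independent"): 15, ("Black", "Republican"): 11,
}


def fitted_counts(obs):
    """Closed-form Poisson MLE for count ~ Racew + PartyD + RacewPartyD + RacewPartyI."""
    fit = {cell: float(n) for cell, n in obs.items()}
    # The Black Independent and Black Republican cells share the intercept only.
    pooled = (obs[("Black", "Independent")] + obs[("Black", "Republican")]) / 2.0
    fit[("Black", "Independent")] = pooled
    fit[("Black", "Republican")] = pooled
    return fit


def poisson_deviance(obs, fit):
    """Residual deviance 2 * sum[o * log(o / m) - (o - m)]."""
    return 2.0 * sum(o * math.log(o / fit[c]) - (o - fit[c]) for c, o in obs.items())


fitted = fitted_counts(observed)
intercept = math.log(fitted[("Black", "Republican")])
deviance = poisson_deviance(observed, fitted)
residual_df = len(observed) - 5   # 6 cells minus 5 estimated coefficients

print(round(intercept, 4), round(deviance, 4), residual_df)
```

The recovered intercept agrees with the value 2.5649 used in part (b), and the deviance falls well below 3.841, in line with the conclusion of part (d).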

4. [20 pts] The following table is taken from Lecture 8.

Alcohol, Cigarette and Marijuana Use for High School Seniors

    Alcohol    Cigarette    Marijuana Use
    Use        Use          Yes      No
    Yes        Yes          911      538
               No            44      456
    No         Yes            3       43
               No             2      279

## Final Examination Q4: R code and output

    A <- c(1,1,1,1,0,0,0,0)   ## 1 = Alcohol use; 0 = otherwise
    C <- c(1,1,0,0,1,1,0,0)   ## 1 = Cigarette use; 0 = otherwise
    M <- c(1,0,1,0,1,0,1,0)   ## 1 = Marijuana use; 0 = otherwise
    count <- c(911,538,44,456,3,43,2,279)
    AC <- A*C; AM <- A*M; CM <- C*M; ACM <- A*C*M
    ## Model (AM, CM, AC) fit
    drug.log <- glm(count ~ A + C + M + AM + CM + AC,
                    family = poisson(link = "log"))
    summary(drug.log)

    ## Output
    Call:
    glm(formula = count ~ A + C + M + AM + CM + AC,
        family = poisson(link = "log"))

    Coefficients:
                   Estimate   Std. Error   z value   Pr(>|z|)
    (Intercept)    [...]      [...]        [...]     < 2e-16 ***
    A              [...]      [...]        [...]     [...]e-10 ***
    C              [...]      [...]        [...]     < 2e-16 ***
    M              [...]      [...]        [...]     < 2e-16 ***
    AM             [...]      [...]        [...]     <1.31e-10 ***
    CM             [...]      [...]        [...]     < 2e-16 ***
    AC             [...]      [...]        [...]     < 2e-16 ***

    Null deviance: [...]
    Residual deviance: [...]

    ## Estimated covariance matrix between AM and CM
           AM       CM
    AM     [...]    [...]
    CM     [...]    [...]

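Let X, Y and Z denote the variables Alcohol, Cigarette and Marijuana use, respectively. The 90th, 95th, 97.5th and 99.5th percentiles of the standard normal distribution are 1.28, 1.645, 1.96 and 2.576, respectively. Based on the R code and R output:

(a) Write down the loglinear regression model and identify the associated estimates. [5 pts]

Solution: The loglinear regression model can be written as
$\log(\mu_{ijk}) = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk}$, $i = 1, 2$; $j = 1, 2$; $k = 1, 2$. (1 pt)
Based on the R code, the first scheme of constraints was used. The estimated parameters are (4 pts)
$\hat\lambda = [...]$, $\hat\lambda^X_1 = [...]$, $\hat\lambda^X_2 = 0$, $\hat\lambda^Y_1 = [...]$, $\hat\lambda^Y_2 = 0$, $\hat\lambda^Z_1 = [...]$, $\hat\lambda^Z_2 = 0$,
$\hat\lambda^{XY}_{11} = [...]$, $\hat\lambda^{XZ}_{11} = 2.9860$, $\hat\lambda^{YZ}_{11} = [...]$,
$\hat\lambda^{XY}_{12} = \hat\lambda^{XY}_{21} = \hat\lambda^{XY}_{22} = 0$, $\hat\lambda^{XZ}_{12} = \hat\lambda^{XZ}_{21} = \hat\lambda^{XZ}_{22} = 0$, $\hat\lambda^{YZ}_{12} = \hat\lambda^{YZ}_{21} = \hat\lambda^{YZ}_{22} = 0$.

(b) Compute the estimated odds ratio between any two of the variables Alcohol, Cigarette and Marijuana use, controlling for the third variable. [3 pts]

Solution: The loglinear regression model (XY, XZ, YZ) has homogeneous association for any two variables controlling for the third, so each conditional odds ratio is the same at both levels of the third variable.
The estimated odds ratio between Alcohol and Cigarette use controlling for Marijuana use is
$\exp(\hat\lambda^{XY}_{11} + \hat\lambda^{XY}_{22} - \hat\lambda^{XY}_{12} - \hat\lambda^{XY}_{21}) = \exp(\hat\lambda^{XY}_{11}) = \exp([...]) = [...]$. (1 pt)
The estimated odds ratio between Alcohol and Marijuana use controlling for Cigarette use is
$\exp(\hat\lambda^{XZ}_{11} + \hat\lambda^{XZ}_{22} - \hat\lambda^{XZ}_{12} - \hat\lambda^{XZ}_{21}) = \exp(\hat\lambda^{XZ}_{11}) = \exp(2.9860) \approx 19.81$. (1 pt)
The estimated odds ratio between Cigarette and Marijuana use controlling for Alcohol use is
$\exp(\hat\lambda^{YZ}_{11} + \hat\lambda^{YZ}_{22} - \hat\lambda^{YZ}_{12} - \hat\lambda^{YZ}_{21}) = \exp(\hat\lambda^{YZ}_{11}) = \exp([...]) = [...]$. (1 pt)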
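The conditional odds ratios in part (b) can also be recovered directly from the eight cell counts. The sketch below swaps the exam's `glm` fit for iterative proportional fitting (IPF), a different algorithm that converges to the same maximum likelihood fitted values for the model (AC, AM, CM). The function names and the iteration count are illustrative choices.

```python
import math

# Cell counts keyed by (A, C, M), with 1 = use and 0 = no use, as in the R code.
counts = {
    (1, 1, 1): 911, (1, 1, 0): 538, (1, 0, 1): 44, (1, 0, 0): 456,
    (0, 1, 1): 3, (0, 1, 0): 43, (0, 0, 1): 2, (0, 0, 0): 279,
}


def fit_homogeneous_association(obs, iterations=500):
    """IPF: repeatedly rescale the fit to match each observed two-way margin."""
    fit = {cell: 1.0 for cell in obs}
    for _ in range(iterations):
        for a, b in ((0, 1), (0, 2), (1, 2)):
            obs_margin, fit_margin = {}, {}
            for cell, n in obs.items():
                key = (cell[a], cell[b])
                obs_margin[key] = obs_margin.get(key, 0.0) + n
                fit_margin[key] = fit_margin.get(key, 0.0) + fit[cell]
            for cell in fit:
                key = (cell[a], cell[b])
                fit[cell] *= obs_margin[key] / fit_margin[key]
    return fit


def conditional_log_or(fit, pair, third_level):
    """Log odds ratio between the two axes in `pair`, holding the third axis fixed."""
    def cell(u, v):
        idx = [third_level] * 3
        idx[pair[0]], idx[pair[1]] = u, v
        return fit[tuple(idx)]
    return math.log(cell(1, 1) * cell(0, 0) / (cell(1, 0) * cell(0, 1)))


fitted = fit_homogeneous_association(counts)
log_or_ac = conditional_log_or(fitted, (0, 1), 1)
log_or_am = conditional_log_or(fitted, (0, 2), 1)
log_or_cm = conditional_log_or(fitted, (1, 2), 1)
print(round(log_or_ac, 4), round(log_or_am, 4), round(log_or_cm, 4))
```

Changing `third_level` from 1 to 0 leaves each log odds ratio unchanged, which is the homogeneous association property the solution invokes.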

(c) Construct the 95% confidence interval for the true odds ratio between Alcohol and Cigarette use controlling for Marijuana use. [4 pts]

Solution: The 95% confidence interval for the true log odds ratio between Alcohol and Cigarette use controlling for Marijuana use is
$\hat\lambda^{XY}_{11} \pm 1.96 \times \mathrm{ASE} = [...] \pm 1.96 \times [...] = (1.7134, 2.3957)$. (2 pts)
Thus, the 95% confidence interval for the true odds ratio between Alcohol and Cigarette use controlling for Marijuana use is
$(\exp(1.7134), \exp(2.3957)) = (5.5476, [...])$. (2 pts)

(d) Test whether the true odds ratio between Alcohol and Marijuana use controlling for Cigarette use equals the true odds ratio between Cigarette and Marijuana use controlling for Alcohol use, at $\alpha = 5\%$. [8 pts]

Solution: Set $T = \lambda^{XZ}_{11} - \lambda^{YZ}_{11}$. The question is equivalent to testing $H_0: T = 0$ vs $H_1: T \neq 0$ (1 pt). The observed value is
$\hat T = \hat\lambda^{XZ}_{11} - \hat\lambda^{YZ}_{11} = 2.9860 - [...] = 0.13812$. (1 pt)
In addition,
$\widehat{\mathrm{Var}}(\hat T) = \widehat{\mathrm{Var}}(\hat\lambda^{XZ}_{11}) + \widehat{\mathrm{Var}}(\hat\lambda^{YZ}_{11}) - 2\,\widehat{\mathrm{Cov}}(\hat\lambda^{XZ}_{11}, \hat\lambda^{YZ}_{11})$ (1 pt) $= [...] = 0.2527$. (2 pts)
Therefore, the estimated ASE of $\hat T$ is $\sqrt{0.2527} = 0.5027$ (1 pt). It follows that $\hat T/\mathrm{ASE} = 0.13812/0.5027 = 0.2747$ (1 pt), which is smaller than 1.96, the 97.5th percentile of the standard normal distribution used for a two-sided test at $\alpha = 5\%$ (it is also smaller than the 95th percentile, 1.645). Hence we do not reject $H_0$: the data are consistent with the true odds ratio between Alcohol and Marijuana use controlling for Cigarette use being equal to the true odds ratio between Cigarette and Marijuana use controlling for Alcohol use. (1 pt)
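The arithmetic in parts (c) and (d) can be retraced with a few lines of code. The sketch below uses only the numbers that survive in the solution (the log odds ratio interval, $\hat T = 0.13812$ and the variance 0.2527); the individual variance and covariance entries are missing from the transcription, so `variance_of_difference` is shown but not called with real inputs. The two-sided p-value is an addition for illustration and is not part of the original solution.

```python
import math


def variance_of_difference(var_1: float, var_2: float, cov_12: float) -> float:
    """Var(a - b) = Var(a) + Var(b) - 2 Cov(a, b)."""
    return var_1 + var_2 - 2.0 * cov_12


def two_sided_p_value(z: float) -> float:
    """P(|Z| > |z|) for a standard normal Z, via the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Part (c): exponentiate the interval for the log odds ratio.
ci_log_or = (1.7134, 2.3957)
ci_or = tuple(math.exp(bound) for bound in ci_log_or)

# Part (d): Wald z statistic for T = lambda_XZ_11 - lambda_YZ_11.
t_hat = 0.13812
var_t = 0.2527
ase_t = math.sqrt(var_t)
z_stat = t_hat / ase_t
p_value = two_sided_p_value(z_stat)
reject_h0 = abs(z_stat) > 1.96   # two-sided test at the 5% level

print(ci_or, round(ase_t, 4), round(z_stat, 4), reject_h0)
```

Because the alternative is two-sided, the comparison uses 1.96; with a z statistic this small the conclusion would be the same against 1.645.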

5. [15 pts] Consider a three-way contingency table with categorical variables X having 2 categories, Y having 2 categories and Z having $K \ge 2$ categories. (Hint: To show "A if and only if B", you need to show both "A implies B" and "B implies A".)

(a) Show that the loglinear model (XY, XZ, YZ) holds if and only if X and Y have homogeneous association controlling for Z. [9 pts]

Proof: If the loglinear model (XY, XZ, YZ) holds, then we have
$\log(\theta_{XY(k)}) = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}$,
which does not depend on $k$, the level of Z. Thus, X and Y have homogeneous association controlling for Z. (3 pts)

Under the first scheme of constraints, the only possibly nonzero three-factor terms are $\lambda^{XYZ}_{11k}$, $k = 1, 2, \ldots, K-1$. The other three-factor terms are 0. Then under the saturated loglinear model (XYZ), we can show that
$\log(\theta_{XY(k)}) = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21} + \lambda^{XYZ}_{11k} + \lambda^{XYZ}_{22k} - \lambda^{XYZ}_{12k} - \lambda^{XYZ}_{21k}$
$= \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21} + \lambda^{XYZ}_{11k}$, $k = 1, 2, \ldots, K-1$,
and
$\log(\theta_{XY(K)}) = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}$. (3 pts)

If X and Y have homogeneous association controlling for Z, then
$\log(\theta_{XY(k)}) = \log(\theta_{XY(K)}) = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}$ for $k = 1, 2, \ldots, K-1$.
It follows that $\lambda^{XYZ}_{11k} = \lambda^{XYZ}_{11K} = 0$, $k = 1, 2, \ldots, K-1$. Therefore, in this case, the saturated model (XYZ) reduces to the homogeneous association model (XY, XZ, YZ). (3 pts)

(b) Show that the loglinear model (XZ, YZ) holds if and only if X and Y are conditionally independent controlling for Z. [6 pts]

Proof: If the loglinear model (XZ, YZ) holds, then all two-factor XY terms are 0 and we have
$\log(\theta_{XY(k)}) = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21} = 0$.
It follows that $\theta_{XY(k)} = 1$, $k = 1, 2, \ldots, K$. Thus, X and Y are conditionally independent controlling for Z. (3 pts)

If X and Y are conditionally independent controlling for Z, then the association is in particular homogeneous, so by Part (a) we have
$0 = \log(\theta_{XY(k)}) = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}$ for $k = 1, 2, \ldots, K$.
Under the first scheme of constraints, the only possibly nonzero two-factor XY term is $\lambda^{XY}_{11}$; the other two-factor XY terms are 0. Then we have $\lambda^{XY}_{11} = 0$. It follows that in this case, the homogeneous association model (XY, XZ, YZ) reduces to the conditional independence model (XZ, YZ). (3 pts)
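The two equivalences in Question 5 can be illustrated numerically by building the cell means of a 2 × 2 × K table from loglinear parameters and computing the conditional XY odds ratio at each level of Z. In the sketch below, the coding (1 for the first category, 0 for the baseline), the main-effect values and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def conditional_odds_ratios(xy, xz, yz, xyz):
    """XY odds ratio at each level k of Z for a 2 x 2 x K loglinear model.

    xy is the single free XY term; xz, yz and xyz are lists of length K whose
    last entry is 0, mimicking the constraint that baseline-level terms vanish.
    """
    ratios = []
    for k in range(len(xz)):
        mu = {}
        for i in (1, 0):
            for j in (1, 0):
                # Main effects (0.5, 0.3, -0.2, 0.1 * k) are arbitrary placeholders.
                log_mu = (0.5 + 0.3 * i - 0.2 * j + 0.1 * k
                          + xy * i * j + xz[k] * i + yz[k] * j + xyz[k] * i * j)
                mu[(i, j)] = math.exp(log_mu)
        ratios.append(mu[(1, 1)] * mu[(0, 0)] / (mu[(1, 0)] * mu[(0, 1)]))
    return ratios


XZ = [0.4, -0.3, 0.0]   # hypothetical XZ terms, K = 3
YZ = [0.7, 0.2, 0.0]    # hypothetical YZ terms

homogeneous = conditional_odds_ratios(0.8, XZ, YZ, [0.0, 0.0, 0.0])   # model (XY, XZ, YZ)
independent = conditional_odds_ratios(0.0, XZ, YZ, [0.0, 0.0, 0.0])   # model (XZ, YZ)
saturated = conditional_odds_ratios(0.8, XZ, YZ, [0.5, -0.4, 0.0])    # model (XYZ)
print(homogeneous, independent, saturated)
```

With no three-factor term every level of Z gives the same odds ratio $e^{0.8}$; setting the XY term to 0 as well gives odds ratios of 1; a nonzero three-factor term makes the odds ratio vary with k.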

6. [12 pts]

(a) Let $P(Y = 1) = 1 - P(Y = 0) = p$. For the population of subjects having $Y = j$, X has probability density function $f_j(x) = \lambda_j \exp(-\lambda_j x)$, $x \ge 0$, $j = 0, 1$. Show that $\pi(x) = P(Y = 1 \mid x)$ satisfies the logistic regression model with some $\alpha$ and $\beta$. [7 pts]

Proof: Since $P(Y = 1) = 1 - P(Y = 0) = p$ and the conditional probability density functions of X given $Y = 0$ and $Y = 1$ are
$f_0(x) = \lambda_0 \exp(-\lambda_0 x)$, $x \ge 0$, (1 pt)
and
$f_1(x) = \lambda_1 \exp(-\lambda_1 x)$, $x \ge 0$, (1 pt)
by Bayes' theorem we have
$\pi(x) \equiv P(Y = 1 \mid x) = \frac{f_1(x) P(Y = 1)}{f_0(x) P(Y = 0) + f_1(x) P(Y = 1)}$. (1 pt)
Therefore,
$\mathrm{logit}(\pi(x)) = \log \frac{p f_1(x)}{(1 - p) f_0(x)} = \log \left\{ \frac{p \lambda_1}{(1 - p) \lambda_0} \exp[(\lambda_0 - \lambda_1) x] \right\} = \log \frac{p \lambda_1}{(1 - p) \lambda_0} + (\lambda_0 - \lambda_1) x = \alpha + \beta x$ (2 pts)
where
$\alpha = \log\left( \frac{p \lambda_1}{(1 - p) \lambda_0} \right)$ and $\beta = \lambda_0 - \lambda_1$. (2 pts)

(b) For known $n \ge 2$, show that the negative binomial distribution with probability mass function
$f(y \mid n, \mu) = \binom{y + n - 1}{n - 1} \left( \frac{n}{\mu + n} \right)^n \left( 1 - \frac{n}{\mu + n} \right)^y$, $y = 0, 1, 2, \ldots$
belongs to the exponential family of distributions. Find the natural parameter for this distribution. [5 pts]

Proof: The probability mass function of the negative binomial distribution can be written as
$f(y \mid n, \mu) = \binom{y + n - 1}{n - 1} \left( \frac{n}{\mu + n} \right)^n \left( 1 - \frac{n}{\mu + n} \right)^y = \exp\left[ y \log\left( \frac{\mu}{\mu + n} \right) + n \log\left( \frac{n}{\mu + n} \right) + \log \binom{y + n - 1}{n - 1} \right]$. (2 pts)
This belongs to the exponential family of distributions with $\theta = \log(\mu/(\mu + n))$ (1 pt) and $b(\theta) = -n \log(1 - e^{\theta})$ (1 pt). Here $\phi = 1$, $a(\phi) = 1$ and $c(y; \phi) = \log \binom{y + n - 1}{n - 1}$. The natural parameter for this distribution is $\theta = \log(\mu/(\mu + n))$. (1 pt)

-End of the Paper-
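Both derivations in Question 6 lend themselves to a quick numerical check. The sketch below evaluates the Bayes posterior from part (a) and compares its logit with $\alpha + \beta x$, then compares the negative binomial mass function from part (b) with its exponential-family form. The parameter values (p, the two rates, n and the mean) are hypothetical and chosen only for illustration.

```python
import math


def posterior_prob(x: float, p: float, lam0: float, lam1: float) -> float:
    """P(Y = 1 | x) by Bayes' theorem with exponential class densities."""
    f0 = lam0 * math.exp(-lam0 * x)
    f1 = lam1 * math.exp(-lam1 * x)
    return p * f1 / ((1.0 - p) * f0 + p * f1)


def logit(q: float) -> float:
    return math.log(q / (1.0 - q))


def nb_pmf(y: int, n: int, mu: float) -> float:
    """Negative binomial pmf in the mean parameterization used in part (b)."""
    return math.comb(y + n - 1, n - 1) * (n / (mu + n)) ** n * (mu / (mu + n)) ** y


def nb_pmf_expfam(y: int, n: int, mu: float) -> float:
    """Same pmf written as exp[y * theta - b(theta) + c(y)]."""
    theta = math.log(mu / (mu + n))
    b = -n * math.log(1.0 - math.exp(theta))
    c = math.log(math.comb(y + n - 1, n - 1))
    return math.exp(y * theta - b + c)


# Hypothetical values for part (a).
P, LAM0, LAM1 = 0.4, 2.0, 0.5
ALPHA = math.log(P * LAM1 / ((1.0 - P) * LAM0))
BETA = LAM0 - LAM1
logit_gaps = [abs(logit(posterior_prob(x, P, LAM0, LAM1)) - (ALPHA + BETA * x))
              for x in (0.0, 0.7, 3.0)]

# Hypothetical values for part (b).
N, MU = 3, 4.5
pmf_gaps = [abs(nb_pmf(y, N, MU) - nb_pmf_expfam(y, N, MU)) for y in range(10)]

print(max(logit_gaps), max(pmf_gaps))
```

Both maxima should be at the level of floating-point rounding, which is what the algebra in the two proofs predicts.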


More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Likelihoods for Generalized Linear Models

Likelihoods for Generalized Linear Models 1 Likelihoods for Generalized Linear Models 1.1 Some General Theory We assume that Y i has the p.d.f. that is a member of the exponential family. That is, f(y i ; θ i, φ) = exp{(y i θ i b(θ i ))/a i (φ)

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Generalized linear models

Generalized linear models Generalized linear models Outline for today What is a generalized linear model Linear predictors and link functions Example: estimate a proportion Analysis of deviance Example: fit dose- response data

More information

Matched Pair Data. Stat 557 Heike Hofmann

Matched Pair Data. Stat 557 Heike Hofmann Matched Pair Data Stat 557 Heike Hofmann Outline Marginal Homogeneity - review Binary Response with covariates Ordinal response Symmetric Models Subject-specific vs Marginal Model conditional logistic

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Short Course Introduction to Categorical Data Analysis

Short Course Introduction to Categorical Data Analysis Short Course Introduction to Categorical Data Analysis Alan Agresti Distinguished Professor Emeritus University of Florida, USA Presented for ESALQ/USP, Piracicaba Brazil March 8-10, 2016 c Alan Agresti,

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

Simple logistic regression

Simple logistic regression Simple logistic regression Biometry 755 Spring 2009 Simple logistic regression p. 1/47 Model assumptions 1. The observed data are independent realizations of a binary response variable Y that follows a

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy

Logistic Regression. Some slides from Craig Burkett. STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided

More information

Chapter 4: Generalized Linear Models-II

Chapter 4: Generalized Linear Models-II : Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

STAT 526 Advanced Statistical Methodology

STAT 526 Advanced Statistical Methodology STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 7 Contingency Table 0-0 Outline Introduction to Contingency Tables Testing Independence in Two-Way Contingency Tables Modeling Ordinal Associations

More information

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper

McGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016

UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the

More information

Generalized logit models for nominal multinomial responses. Local odds ratios

Generalized logit models for nominal multinomial responses. Local odds ratios Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π

More information

Regression Methods for Survey Data

Regression Methods for Survey Data Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Generalized Linear Models. Kurt Hornik

Generalized Linear Models. Kurt Hornik Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Beyond GLM and likelihood

Beyond GLM and likelihood Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence

More information

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017 Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which

More information

Homework 10 - Solution

Homework 10 - Solution STAT 526 - Spring 2011 Homework 10 - Solution Olga Vitek Each part of the problems 5 points 1. Faraway Ch. 4 problem 1 (page 93) : The dataset parstum contains cross-classified data on marijuana usage

More information

MSH3 Generalized linear model

MSH3 Generalized linear model Contents MSH3 Generalized linear model 5 Logit Models for Binary Data 173 5.1 The Bernoulli and binomial distributions......... 173 5.1.1 Mean, variance and higher order moments.... 173 5.1.2 Normal limit....................

More information

Ch 6: Multicategory Logit Models

Ch 6: Multicategory Logit Models 293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM

More information

Generalized Estimating Equations

Generalized Estimating Equations Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

Introducing Generalized Linear Models: Logistic Regression

Introducing Generalized Linear Models: Logistic Regression Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and

More information

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti

Good Confidence Intervals for Categorical Data Analyses. Alan Agresti Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline

More information

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:

Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

Homework 5 - Solution

Homework 5 - Solution STAT 526 - Spring 2011 Homework 5 - Solution Olga Vitek Each part of the problems 5 points 1. Agresti 10.1 (a) and (b). Let Patient Die Suicide Yes No sum Yes 1097 90 1187 No 203 435 638 sum 1300 525 1825

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)

Ron Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011) Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October

More information

Lecture 10: Introduction to Logistic Regression

Lecture 10: Introduction to Logistic Regression Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial

More information

STA 450/4000 S: January

STA 450/4000 S: January STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have

More information