NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis (Semester II: 2010-2011)
April/May, 2011    Time Allowed: 2 Hours

Matriculation No:            Seat No:

Grade Table

| Question     | 1  | 2  | 3  | 4  | 5  | 6  | Total |
|--------------|----|----|----|----|----|----|-------|
| Full marks   | 10 | 28 | 15 | 20 | 15 | 12 | 100   |
| Earned marks |    |    |    |    |    |    |       |

INSTRUCTIONS TO CANDIDATES

1. This examination paper contains SIX (6) questions and comprises TWELVE (12) printed pages.
2. Answer ALL the questions for a TOTAL of 100 marks.
3. Read the questions CAREFULLY.
4. All NOTATIONS used here are the same as those used in the lecture notes.
5. Write your answers NEATLY following the associated questions.
6. This is a closed-textbook, closed-notes examination, but calculators are allowed.
7. Candidates may bring in TWO A4-size (210 × 297 mm) help sheets.
1. [10 pts, 1 pt each] Circle T or F for each of the statements.

(1) [F] To test for independence in two-way contingency tables, likelihood-ratio tests and Pearson's $\chi^2$ tests are equivalent for small sample sizes.

(2) [F] Fisher's exact test uses the negative binomial distribution to compute p-values.

(3) [F] Diagnosis of type of mental illness (schizophrenia, neurosis, depression) is an ordinal variable.

(4) [F] If the odds of success in a binary response is 0.5, the probability of success is …

(5) [T] Suppose that $P(Y_i = 1) = 1 - P(Y_i = 0) = 0.2$, $i = 1, \ldots, n$, where the $Y_i$ are independent. Let $Y = \sum_{i=1}^{50} Y_i$. Then the distribution of $Y$ is binomial with mean 10.

(6) [T] The test of independence for a linear trend alternative cannot be used for nominal categorical data.

(7) [F] In a logistic regression model $\text{logit}[\pi(x)] = \alpha + \beta x$, $e^{\alpha}$ equals the odds of success when $x = 1$.

(8) [F] In a logit model $\text{logit}[\pi(x)] = \alpha + \beta x$, the probability increases at the rate of $0.16\beta$ when $\pi(x) = 0.4$.

(9) [F] A classical linear regression model with normally distributed errors is a special case of the generalized linear model with the probit link.

(10) [F] Fitting a saturated model often results in nonzero residual deviance.
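The arithmetic behind statements (4), (5) and (8) can be checked with a few lines of base R. This is an illustrative sketch; the helper names `odds_to_prob` and `slope_multiplier` are our own, not from the lecture notes.

```r
# (4) Converting odds to a probability: pi = odds / (1 + odds).
odds_to_prob <- function(odds) odds / (1 + odds)
prob_q4 <- odds_to_prob(0.5)                 # 1/3

# (5) Y = sum of 50 independent Bernoulli(0.2) variables is Binomial(50, 0.2).
n_trials  <- 50
p_success <- 0.2
mean_q5 <- n_trials * p_success              # binomial mean n * p
pmf <- dbinom(0:n_trials, size = n_trials, prob = p_success)
mean_from_pmf <- sum((0:n_trials) * pmf)     # agrees with n * p

# (8) For logit[pi(x)] = alpha + beta * x, d pi / dx = beta * pi * (1 - pi),
# so the rate at pi = 0.4 is slope_multiplier(0.4) * beta.
slope_multiplier <- function(p) p * (1 - p)
mult_q8 <- slope_multiplier(0.4)             # 0.24, not 0.16
```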
2. [28 pts] In a study using logistic regression to examine data on rheumatoid arthritis, we consider the age of the patient as the predictor variable. The response Y records whether the patient showed any improvement at all (1 = yes). The following computer output reports a logistic regression model using age to predict the probability of improvement.

    Model Fit Statistics
                    Intercept     Intercept and
    Criterion       Only          Covariates
    -2 Log L

                                    Standard     Wald
    Parameter     DF    Estimate    Error        Chi-Square    Pr > ChiSq
    Intercept
    age

    Odds Ratio Estimates
                  Point         95% Wald
    Effect        Estimate      Confidence Limits
    age

    Estimated Covariance Matrix
    Parameter     Intercept     age
    Intercept
    age

The 90th, 95th, 97.5th and 99.5th percentiles of the standard normal distribution are 1.28, 1.645, 1.96 and 2.576, respectively.

(a) Find the rates of change in the predicted probability of improvement when age = 25 and when the estimated probability of improvement is 0.3, respectively. [8 pts]

Solution: The estimated probability of improvement at age 25 is

$\hat\pi(25) = \dfrac{\exp(\hat\alpha + 25\hat\beta)}{1 + \exp(\hat\alpha + 25\hat\beta)}$. (2 pts)

Then at age 25, the rate of change in the estimated probability is $\hat\beta\,\hat\pi(25)\,[1 - \hat\pi(25)]$. (3 pts)

When the estimated probability of improvement is 0.3, the rate of change in the estimated probability is $\hat\beta \times 0.3 \times (1 - 0.3) = 0.0492 \times 0.21 \approx 0.0103$. (3 pts)
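Part (a) rests on the derivative $d\hat\pi/dx = \hat\beta\,\hat\pi(x)[1 - \hat\pi(x)]$. A minimal base-R sketch of both calculations follows; `beta_hat = 0.0492` is the age estimate used in part (c), while `alpha_hat` is a hypothetical placeholder, so the age-25 value it produces is illustrative only.

```r
# Rate of change of a fitted logistic curve: d pi / dx = beta * pi * (1 - pi).
rate_of_change <- function(beta, prob) beta * prob * (1 - prob)

# Fitted probability from the linear predictor alpha + beta * x.
# plogis() is the inverse logit exp(eta) / (1 + exp(eta)).
fitted_prob <- function(x, alpha, beta) plogis(alpha + beta * x)

beta_hat  <- 0.0492   # age coefficient quoted in part (c)
alpha_hat <- -2       # HYPOTHETICAL intercept, for illustration only

# Rate of change when the estimated probability is 0.3: 0.0492 * 0.21
rate_at_0.3 <- rate_of_change(beta_hat, 0.3)

# Rate of change at age 25 (depends on the hypothetical alpha_hat above)
p25 <- fitted_prob(25, alpha_hat, beta_hat)
rate_at_25 <- rate_of_change(beta_hat, p25)
```

The rate is largest (equal to $\beta/4$) where $\pi = 0.5$, which is why the slope depends on where on the curve it is evaluated.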
(b) Find the age at which the estimated probability of improvement is 0.3. [4 pts]

Solution: Setting $\hat\pi(x) = \dfrac{\exp(\hat\alpha + \hat\beta x)}{1 + \exp(\hat\alpha + \hat\beta x)} = 0.3$ (2 pts) and solving for $x$ gives

$\text{Age} = \dfrac{\log(0.3/0.7) - \hat\alpha}{\hat\beta}$ (1 pt) $= \dfrac{-0.8473 - \hat\alpha}{0.0492}$. (1 pt)

(c) Obtain a 95% confidence interval for the true odds ratio of improvement for a half-year increase in age. [6 pts]

Solution: The 95% confidence interval for $0.5\beta$ (1 pt) is $0.5\,(\hat\beta \pm 1.96\,\text{ASE}(\hat\beta)) = 0.5\,(0.0492 \pm 1.96\,\text{ASE}(\hat\beta))$. (3 pts) Thus, the 95% confidence interval for the true odds ratio $\exp(0.5\beta)$ is obtained by exponentiating both endpoints of this interval. (2 pts)

(d) Obtain a 95% confidence interval for the probability of improvement at age = 25. [10 pts]

Solution: The estimated linear predictor at age 25 is $\hat\alpha + 25\hat\beta$ (1 pt), and its estimated variance is

$\widehat{\text{Var}}(\hat\alpha + 25\hat\beta) = \widehat{\text{Var}}(\hat\alpha) + 625\,\widehat{\text{Var}}(\hat\beta) + 50\,\widehat{\text{Cov}}(\hat\alpha, \hat\beta)$ (1 pt) $= 0.3742$. (2 pts)

Therefore, the estimated ASE of the linear predictor is $\sqrt{0.3742} \approx 0.6117$ (1 pt). So a 95% confidence interval for the true linear predictor is $(\hat\alpha + 25\hat\beta) \pm 1.96 \times 0.6117 = (L, U)$. (2 pts) Therefore, a 95% confidence interval for the true probability at age 25 is

$\left(\dfrac{\exp(L)}{1 + \exp(L)},\ \dfrac{\exp(U)}{1 + \exp(U)}\right) = (0.0684, \ldots)$. (3 pts)
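Parts (b) to (d) can be wrapped in small helper functions. The sketch below is illustrative: apart from `beta_hat = 0.0492`, every input (`alpha_hat`, `se_beta` and the covariance matrix `V`) is a hypothetical placeholder. It also shows that the variance expansion in part (d) is the quadratic form $g^\top V g$ with $g = (1, 25)$, which is where the coefficients 625 and 50 come from.

```r
# (b) Age at which the fitted probability equals p0: (logit(p0) - alpha) / beta
age_at_prob <- function(p0, alpha, beta) (qlogis(p0) - alpha) / beta

# (c) Wald CI for the odds ratio for a `delta`-unit increase in x:
#     exp(delta * (beta -/+ z * se))
or_ci <- function(beta, se, delta = 0.5, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(delta * (beta + c(-1, 1) * z * se))
}

# (d) Wald CI for pi(x0): build the CI on the logit scale, then back-transform.
prob_ci <- function(x0, alpha, beta, vcov_mat, level = 0.95) {
  g <- c(1, x0)                                   # gradient of alpha + beta * x0
  eta <- alpha + beta * x0
  # g' V g = Var(alpha) + x0^2 Var(beta) + 2 x0 Cov(alpha, beta)
  se_eta <- sqrt(drop(t(g) %*% vcov_mat %*% g))
  z <- qnorm(1 - (1 - level) / 2)
  plogis(eta + c(-1, 1) * z * se_eta)
}

beta_hat  <- 0.0492                               # from part (c)
alpha_hat <- -2                                   # HYPOTHETICAL
se_beta   <- 0.02                                 # HYPOTHETICAL
V <- matrix(c(0.9, -0.015, -0.015, 0.0004), 2, 2) # HYPOTHETICAL Cov of (alpha, beta)

age_03          <- age_at_prob(0.3, alpha_hat, beta_hat)
ci_or_half_year <- or_ci(beta_hat, se_beta, delta = 0.5)
ci_p25          <- prob_ci(25, alpha_hat, beta_hat, V)
```

Substituting the actual output values for the placeholders should reproduce the intervals in (c) and (d). Back-transforming the endpoints of the logit-scale interval, rather than working on the probability scale, keeps the interval inside (0, 1).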
3. [15 pts] The following table was taken from the 1991 General Social Survey.

Party identification by race:

| Race  | Democrat | Independent | Republican | Total |
|-------|----------|-------------|------------|-------|
| White | 341      | 105         | 405        | 851   |
| Black | 103      | 15          | 11         | 129   |
| Total | 444      | 120         | 416        | 980   |

Final Examination Q3: R code and output

    Racew  <- c(1, 1, 1, 0, 0, 0)    # White = 1; Black = 0
    PartyD <- c(1, 0, 0, 1, 0, 0)    # Democrat = 1; others 0
    PartyI <- c(0, 1, 0, 0, 1, 0)    # Independent = 1; others 0
    count  <- c(341, 105, 405, 103, 15, 11)
    RacewPartyD <- Racew * PartyD
    RacewPartyI <- Racew * PartyI
    # PartyI enters the model only through the interaction RacewPartyI
    fit <- glm(count ~ Racew + PartyD + RacewPartyD + RacewPartyI,
               family = poisson(link = "log"))
    summary(fit)

    #### R output
    Call:
    glm(formula = count ~ Racew + PartyD + RacewPartyD + RacewPartyI,
        family = poisson(link = "log"))

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                              <2e-16 ***
    Racew                                    <2e-16 ***
    PartyD                                   <2e-16 ***
    RacewPartyD                              <2e-16 ***
    RacewPartyI                              <2e-16 ***

    Null deviance:
    Residual deviance:
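Let X and Y denote race and party, respectively. The 95th percentiles of the $\chi^2$ distribution with 1, 2, 3, 4, 5, 6 degrees of freedom are 3.841, 5.99, 7.81, 9.49, 11.07 and 12.59, respectively. Based on the R code and R output:

(a) Write down the loglinear regression model and identify the associated estimates. [3 pts]

Solution: The loglinear regression model can be written as

$\log(\mu_{ij}) = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^{XY}_{ij}, \quad i = 1, 2;\ j = 1, 2, 3.$ (1 pt)

Based on the R code, the first scheme of constraints was used. The estimated parameters are $\hat\lambda = 2.5649$ (the Intercept estimate), $\hat\lambda^X_1$ = Racew estimate, $\hat\lambda^X_2 = 0$, $\hat\lambda^Y_1$ = PartyD estimate, $\hat\lambda^Y_2 = 0$, $\hat\lambda^Y_3 = 0$, $\hat\lambda^{XY}_{11}$ = RacewPartyD estimate, $\hat\lambda^{XY}_{12}$ = RacewPartyI estimate, and $\hat\lambda^{XY}_{13} = \hat\lambda^{XY}_{21} = \hat\lambda^{XY}_{22} = \hat\lambda^{XY}_{23} = 0$. (2 pts) Note that $\hat\lambda^Y_2 = 0$ because PartyI enters the model only through the interaction RacewPartyI.

(b) Compute all the estimated cell counts. [6 pts]

Solution:

$\hat\mu_{11} = \exp(\hat\lambda + \hat\lambda^X_1 + \hat\lambda^Y_1 + \hat\lambda^{XY}_{11})$
$\hat\mu_{12} = \exp(\hat\lambda + \hat\lambda^X_1 + \hat\lambda^Y_2 + \hat\lambda^{XY}_{12})$
$\hat\mu_{13} = \exp(\hat\lambda + \hat\lambda^X_1 + \hat\lambda^Y_3 + \hat\lambda^{XY}_{13})$
$\hat\mu_{21} = \exp(\hat\lambda + \hat\lambda^X_2 + \hat\lambda^Y_1 + \hat\lambda^{XY}_{21})$
$\hat\mu_{22} = \exp(\hat\lambda + \hat\lambda^X_2 + \hat\lambda^Y_2 + \hat\lambda^{XY}_{22}) = \exp(2.5649) \approx 13.00$
$\hat\mu_{23} = \exp(\hat\lambda + \hat\lambda^X_2 + \hat\lambda^Y_3 + \hat\lambda^{XY}_{23}) = \exp(2.5649) \approx 13.00$

(1 pt each)

(c) Comment on whether the intercept-only model fits the data well. [3 pts]

Solution: From the R output, the non-intercept coefficients are highly significant, so they are unlikely to be 0 (2 pts). Thus, the intercept-only model, which sets all non-intercept coefficients to 0, cannot fit the data well. (1 pt)

OR: The null deviance is reported in the R output (1 pt). Under the intercept-only model, the null deviance approximately follows a $\chi^2$ distribution with $6 - 1 = 5$ degrees of freedom (1 pt). The 95th percentile of $\chi^2_5$ is 11.07, which is much smaller than the null deviance. Thus, the intercept-only model does not fit the data well. (1 pt)

(d) Comment on whether the loglinear model fits the data well. [3 pts]

Solution: The residual deviance is reported in the R output (1 pt). Under the fitted model, the residual deviance approximately follows a $\chi^2$ distribution with $6 - 5 = 1$ degree of freedom (1 pt). The 95th percentile of $\chi^2_1$ is 3.841, which is much larger than the residual deviance. Thus, the loglinear regression model fits the data well. (1 pt)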
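The fitted counts and deviance comparisons in parts (b) to (d) can be reproduced by rerunning the exam's model. This sketch uses base R `glm()` with the same codings; like the exam code, it has no PartyI main effect, which is why the model has 5 parameters and 1 residual degree of freedom.

```r
Racew  <- c(1, 1, 1, 0, 0, 0)           # White = 1; Black = 0
PartyD <- c(1, 0, 0, 1, 0, 0)           # Democrat = 1
PartyI <- c(0, 1, 0, 0, 1, 0)           # Independent = 1
count  <- c(341, 105, 405, 103, 15, 11)
RacewPartyD <- Racew * PartyD
RacewPartyI <- Racew * PartyI

fit <- glm(count ~ Racew + PartyD + RacewPartyD + RacewPartyI,
           family = poisson(link = "log"))

mu_hat     <- fitted(fit)                            # estimated cell counts, same order as `count`
lambda_hat <- unname(coef(fit)["(Intercept)"])       # log of the fitted Black/Ind. and Black/Rep. counts

# Goodness-of-fit comparisons from parts (c) and (d)
null_dev  <- fit$null.deviance                       # df = 6 - 1 = 5
resid_dev <- deviance(fit)                           # df = 6 - 5 = 1
c(null_exceeds_crit = null_dev  > qchisq(0.95, df = fit$df.null),
  resid_below_crit  = resid_dev < qchisq(0.95, df = fit$df.residual))
```

For this model the three White cells and the Black/Democrat cell are fitted exactly, and the two remaining Black cells share the fitted value $\exp(\hat\lambda) = (15 + 11)/2 = 13$, consistent with $\exp(2.5649)$ in part (b).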
4. [20 pts] The following table is taken from Lecture 8.

Alcohol, Cigarette and Marijuana Use for High School Seniors

| Alcohol Use | Cigarette Use | Marijuana Use: Yes | Marijuana Use: No |
|-------------|---------------|--------------------|-------------------|
| Yes         | Yes           | 911                | 538               |
| Yes         | No            | 44                 | 456               |
| No          | Yes           | 3                  | 43                |
| No          | No            | 2                  | 279               |

Final Examination Q4: R code and output

    A <- c(1, 1, 1, 1, 0, 0, 0, 0)   ## 1 = alcohol use, 0 = otherwise
    C <- c(1, 1, 0, 0, 1, 1, 0, 0)   ## 1 = cigarette use, 0 = otherwise
    M <- c(1, 0, 1, 0, 1, 0, 1, 0)   ## 1 = marijuana use, 0 = otherwise
    count <- c(911, 538, 44, 456, 3, 43, 2, 279)
    AC <- A * C; AM <- A * M; CM <- C * M; ACM <- A * C * M
    ## Model (AM, CM, AC) fit
    drug.log <- glm(count ~ A + C + M + AM + CM + AC, family = poisson(link = "log"))
    summary(drug.log)

    ## Output
    Call:
    glm(formula = count ~ A + C + M + AM + CM + AC, family = poisson(link = "log"))

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept)                              < 2e-16 ***
    A                                                ***
    C                                        < 2e-16 ***
    M                                        < 2e-16 ***
    AM                                      1.31e-10 ***
    CM                                       < 2e-16 ***
    AC                                       < 2e-16 ***

    Null deviance:
    Residual deviance:

    ## Estimated covariance matrix between AM and CM
         AM    CM
    AM
    CM
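Let X, Y and Z denote alcohol, cigarette and marijuana use, respectively. The 90th, 95th, 97.5th and 99.5th percentiles of the standard normal distribution are 1.28, 1.645, 1.96 and 2.576, respectively. Based on the R code and R output:

(a) Write down the loglinear regression model and identify the associated estimates. [5 pts]

Solution: The loglinear regression model can be written as

$\log(\mu_{ijk}) = \lambda + \lambda^X_i + \lambda^Y_j + \lambda^Z_k + \lambda^{XY}_{ij} + \lambda^{XZ}_{ik} + \lambda^{YZ}_{jk}, \quad i = 1, 2;\ j = 1, 2;\ k = 1, 2.$ (1 pt)

Based on the R code, the first scheme of constraints was used. The estimated parameters are (4 pts): $\hat\lambda$ = Intercept estimate; $\hat\lambda^X_1$ = A estimate, $\hat\lambda^X_2 = 0$; $\hat\lambda^Y_1$ = C estimate, $\hat\lambda^Y_2 = 0$; $\hat\lambda^Z_1$ = M estimate, $\hat\lambda^Z_2 = 0$; $\hat\lambda^{XY}_{11}$ = AC estimate, $\hat\lambda^{XZ}_{11}$ = AM estimate $= 2.9860$, $\hat\lambda^{YZ}_{11}$ = CM estimate; $\hat\lambda^{XY}_{12} = \hat\lambda^{XY}_{21} = \hat\lambda^{XY}_{22} = 0$, $\hat\lambda^{XZ}_{12} = \hat\lambda^{XZ}_{21} = \hat\lambda^{XZ}_{22} = 0$, $\hat\lambda^{YZ}_{12} = \hat\lambda^{YZ}_{21} = \hat\lambda^{YZ}_{22} = 0$.

(b) Compute the estimated odds ratio between any two of alcohol, cigarette and marijuana use, controlling for the third variable. [3 pts]

Solution: Since (XY, XZ, YZ) is the homogeneous association model, the conditional odds ratio between any two variables is the same at each level of the third variable.

The estimated odds ratio between alcohol and cigarette use controlling for marijuana use is
$\exp(\hat\lambda^{XY}_{11} + \hat\lambda^{XY}_{22} - \hat\lambda^{XY}_{12} - \hat\lambda^{XY}_{21}) = \exp(\hat\lambda^{XY}_{11})$. (1 pt)

The estimated odds ratio between alcohol and marijuana use controlling for cigarette use is
$\exp(\hat\lambda^{XZ}_{11} + \hat\lambda^{XZ}_{22} - \hat\lambda^{XZ}_{12} - \hat\lambda^{XZ}_{21}) = \exp(\hat\lambda^{XZ}_{11}) = \exp(2.9860) \approx 19.81$. (1 pt)

The estimated odds ratio between cigarette and marijuana use controlling for alcohol use is
$\exp(\hat\lambda^{YZ}_{11} + \hat\lambda^{YZ}_{22} - \hat\lambda^{YZ}_{12} - \hat\lambda^{YZ}_{21}) = \exp(\hat\lambda^{YZ}_{11})$. (1 pt)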
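Parts (a) and (b) can be checked by refitting the model and exponentiating the interaction coefficients. This is a base-R sketch using the exam's data and coding; it also computes the fitted AC odds ratio separately at each level of marijuana use, to show numerically that the two layers agree under homogeneous association.

```r
A <- c(1, 1, 1, 1, 0, 0, 0, 0)   # alcohol use
C <- c(1, 1, 0, 0, 1, 1, 0, 0)   # cigarette use
M <- c(1, 0, 1, 0, 1, 0, 1, 0)   # marijuana use
count <- c(911, 538, 44, 456, 3, 43, 2, 279)
AC <- A * C; AM <- A * M; CM <- C * M

drug.log <- glm(count ~ A + C + M + AM + CM + AC, family = poisson(link = "log"))

# Each conditional log odds ratio reduces to one interaction coefficient.
cond_or <- exp(coef(drug.log)[c("AC", "AM", "CM")])

# Fitted AC odds ratio within each marijuana layer (cell order as in `count`):
# cells 1, 3, 5, 7 have M = 1; cells 2, 4, 6, 8 have M = 0.
mu <- fitted(drug.log)
or_AC_M1 <- unname(mu[1] * mu[7] / (mu[3] * mu[5]))
or_AC_M0 <- unname(mu[2] * mu[8] / (mu[4] * mu[6]))
cond_or
c(or_AC_M1, or_AC_M0)            # equal, and equal to cond_or["AC"]
```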
(c) Construct the 95% confidence interval for the true odds ratio between alcohol and cigarette use controlling for marijuana use. [4 pts]

Solution: The 95% confidence interval for the true log odds ratio between alcohol and cigarette use controlling for marijuana use is

$\hat\lambda^{XY}_{11} \pm 1.96\,\text{ASE}(\hat\lambda^{XY}_{11}) = (1.7134, 2.3957)$. (2 pts)

Thus, the 95% confidence interval for the true odds ratio between alcohol and cigarette use controlling for marijuana use is $(\exp(1.7134), \exp(2.3957)) = (5.5476, 10.976)$. (2 pts)

(d) Test whether the true odds ratio between alcohol and marijuana use controlling for cigarette use equals the true odds ratio between cigarette and marijuana use controlling for alcohol use, at $\alpha = 5\%$. [8 pts]

Solution: Set $T = \lambda^{XZ}_{11} - \lambda^{YZ}_{11}$. The problem is equivalent to testing $H_0: T = 0$ vs $H_1: T \neq 0$ (1 pt). The observed value is $\hat T = \hat\lambda^{XZ}_{11} - \hat\lambda^{YZ}_{11} = 2.9860 - \hat\lambda^{YZ}_{11} = 0.13812$ (1 pt). In addition,

$\widehat{\text{Var}}(\hat T) = \widehat{\text{Var}}(\hat\lambda^{XZ}_{11}) + \widehat{\text{Var}}(\hat\lambda^{YZ}_{11}) - 2\,\widehat{\text{Cov}}(\hat\lambda^{XZ}_{11}, \hat\lambda^{YZ}_{11})$ (1 pt) $= 0.2527$. (2 pts)

Therefore, the estimated ASE of $\hat T$ is $\sqrt{0.2527} = 0.5027$ (1 pt). It follows that $|\hat T|/\text{ASE} = 0.13812/0.5027 = 0.2747$ (1 pt), which is smaller than 1.96, the 97.5th percentile of the standard normal distribution (the critical value for a two-sided test at $\alpha = 5\%$). Hence, at $\alpha = 5\%$, we do not reject $H_0$: there is no significant evidence that the true odds ratio between alcohol and marijuana use controlling for cigarette use differs from the true odds ratio between cigarette and marijuana use controlling for alcohol use. (1 pt)
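The interval in part (c) and the Wald statistic in part (d) can be computed directly from `coef()` and `vcov()` of the fitted model. This is a sketch under the same data and coding as the exam code; it compares $|z|$ with `qnorm(0.975)` (about 1.96), as a two-sided test at $\alpha = 5\%$ requires.

```r
A <- c(1, 1, 1, 1, 0, 0, 0, 0)   # alcohol use
C <- c(1, 1, 0, 0, 1, 1, 0, 0)   # cigarette use
M <- c(1, 0, 1, 0, 1, 0, 1, 0)   # marijuana use
count <- c(911, 538, 44, 456, 3, 43, 2, 279)
AC <- A * C; AM <- A * M; CM <- C * M
drug.log <- glm(count ~ A + C + M + AM + CM + AC, family = poisson(link = "log"))

V <- vcov(drug.log)              # estimated covariance matrix of the coefficients
z_crit <- qnorm(0.975)

# (c) 95% Wald CI for the conditional AC log odds ratio, then the odds ratio
se_AC     <- sqrt(V["AC", "AC"])
ci_log_AC <- coef(drug.log)[["AC"]] + c(-1, 1) * z_crit * se_AC
ci_OR_AC  <- exp(ci_log_AC)

# (d) Wald test of H0: lambda^XZ_11 - lambda^YZ_11 = 0 (two-sided)
T_hat  <- coef(drug.log)[["AM"]] - coef(drug.log)[["CM"]]
var_T  <- V["AM", "AM"] + V["CM", "CM"] - 2 * V["AM", "CM"]
z_stat <- T_hat / sqrt(var_T)
reject_H0 <- abs(z_stat) > z_crit
```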
5. [15 pts] Consider a three-way contingency table with categorical variables X having 2 categories, Y having 2 categories and Z having $K \geq 2$ categories. (Hint: to show "A if and only if B", you need to show both that A implies B and that B implies A.)

(a) Show that the loglinear model (XY, XZ, YZ) holds if and only if X and Y have homogeneous association controlling for Z. [9 pts]

Proof: If the loglinear model (XY, XZ, YZ) holds, then

$\log\theta_{XY(k)} = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21},$

which does not depend on $k$, the level of Z. Thus, X and Y have homogeneous association controlling for Z. (3 pts)

Conversely, under the first scheme of constraints, the only possibly nonzero three-factor terms are $\lambda^{XYZ}_{11k}$, $k = 1, 2, \ldots, K - 1$; all other three-factor terms are 0. Then under the saturated loglinear model (XYZ), we can show that

$\log\theta_{XY(k)} = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21} + \lambda^{XYZ}_{11k} + \lambda^{XYZ}_{22k} - \lambda^{XYZ}_{12k} - \lambda^{XYZ}_{21k} = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21} + \lambda^{XYZ}_{11k}, \quad k = 1, 2, \ldots, K - 1,$

and

$\log\theta_{XY(K)} = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}.$ (3 pts)

If X and Y have homogeneous association controlling for Z, then $\log\theta_{XY(k)} = \log\theta_{XY(K)} = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}$ for $k = 1, 2, \ldots, K - 1$. It follows that $\lambda^{XYZ}_{11k} = \lambda^{XYZ}_{11K} = 0$, $k = 1, 2, \ldots, K - 1$. Therefore, in this case, the saturated model (XYZ) reduces to the homogeneous association model (XY, XZ, YZ). (3 pts)
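Part (a) can be illustrated numerically: fit the (XY, XZ, YZ) model to a 2 × 2 × K table and compute the fitted conditional XY odds ratio in each layer of Z. The counts below are hypothetical. R's default treatment contrasts set the first level, rather than the last, to zero, so the common log odds ratio appears as the `X2:Y2` coefficient; the odds ratios themselves do not depend on this choice.

```r
K <- 3
d <- expand.grid(X = factor(1:2), Y = factor(1:2), Z = factor(1:K))
d$count <- c(20, 15, 10, 25, 30, 12, 18, 22, 14, 16, 11, 27)   # HYPOTHETICAL counts

# Homogeneous association model (XY, XZ, YZ)
fit_h <- glm(count ~ X * Y + X * Z + Y * Z, family = poisson, data = d)

# expand.grid varies X fastest, then Y, then Z, matching R's array layout.
mu <- array(fitted(fit_h), dim = c(2, 2, K))

# Fitted conditional XY log odds ratio in each layer k of Z
log_theta <- sapply(seq_len(K), function(k)
  log(mu[1, 1, k] * mu[2, 2, k] / (mu[1, 2, k] * mu[2, 1, k])))
log_theta   # identical across k, and equal to coef(fit_h)["X2:Y2"]
```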
(b) Show that the loglinear model (XZ, YZ) holds if and only if X and Y are conditionally independent controlling for Z. [6 pts]

Proof: If the loglinear model (XZ, YZ) holds, it contains no XY or XYZ terms, so $\log\theta_{XY(k)} = 0$. It follows that $\theta_{XY(k)} = 1$, $k = 1, 2, \ldots, K$. Thus, X and Y are conditionally independent controlling for Z. (3 pts)

Conversely, if X and Y are conditionally independent controlling for Z, then $\theta_{XY(k)} = 1$ for all $k$, so X and Y have homogeneous association; by Part (a), the model (XY, XZ, YZ) holds and

$0 = \log\theta_{XY(k)} = \lambda^{XY}_{11} + \lambda^{XY}_{22} - \lambda^{XY}_{12} - \lambda^{XY}_{21}, \quad k = 1, 2, \ldots, K.$

Under the first scheme of constraints, the only possibly nonzero XY two-factor term is $\lambda^{XY}_{11}$; the other XY terms are 0. Hence $\lambda^{XY}_{11} = 0$, and in this case the homogeneous association model (XY, XZ, YZ) reduces to the conditional independence model (XZ, YZ). (3 pts)
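For part (b), fitting (XZ, YZ) to a hypothetical 2 × 2 × 3 table gives fitted conditional XY odds ratios equal to 1 in every layer, and the fitted counts match the closed form $\hat\mu_{ijk} = n_{i+k}\,n_{+jk}/n_{++k}$ for conditional independence. This is an illustrative base-R sketch; the counts are not from the exam.

```r
K <- 3
d <- expand.grid(X = factor(1:2), Y = factor(1:2), Z = factor(1:K))
d$count <- c(20, 15, 10, 25, 30, 12, 18, 22, 14, 16, 11, 27)   # HYPOTHETICAL counts

fit_ci <- glm(count ~ X * Z + Y * Z, family = poisson, data = d)   # model (XZ, YZ)
mu <- array(fitted(fit_ci), dim = c(2, 2, K))

# Fitted conditional XY odds ratio in each layer of Z
theta <- sapply(seq_len(K), function(k)
  mu[1, 1, k] * mu[2, 2, k] / (mu[1, 2, k] * mu[2, 1, k]))

# Closed-form MLE under conditional independence, layer by layer
n <- array(d$count, dim = c(2, 2, K))
closed <- array(0, dim = c(2, 2, K))
for (k in seq_len(K)) {
  closed[, , k] <- outer(rowSums(n[, , k]), colSums(n[, , k])) / sum(n[, , k])
}
theta   # every conditional XY odds ratio equals 1
```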
6. [12 pts]

(a) Let $P(Y = 1) = 1 - P(Y = 0) = p$. For the population of subjects having $Y = j$, X has probability density function $f_j(x) = \lambda_j \exp(-\lambda_j x)$, $x \geq 0$, $j = 0, 1$. Show that $\pi(x) = P(Y = 1 \mid x)$ satisfies the logistic regression model for some $\alpha$ and $\beta$. [7 pts]

Proof: Since $P(Y = 1) = 1 - P(Y = 0) = p$ and the conditional probability density functions of X given $Y = 0$ and $Y = 1$ are $f_0(x) = \lambda_0 \exp(-\lambda_0 x)$, $x \geq 0$ (1 pt), and $f_1(x) = \lambda_1 \exp(-\lambda_1 x)$, $x \geq 0$ (1 pt), by Bayes' theorem we have

$\pi(x) \equiv P(Y = 1 \mid x) = \dfrac{f_1(x)P(Y = 1)}{f_0(x)P(Y = 0) + f_1(x)P(Y = 1)}.$ (1 pt)

Therefore,

$\text{logit}(\pi(x)) = \log\dfrac{p f_1(x)}{(1 - p) f_0(x)} = \log\left\{\dfrac{p\lambda_1}{(1 - p)\lambda_0}\exp[(\lambda_0 - \lambda_1)x]\right\} = \log\dfrac{p\lambda_1}{(1 - p)\lambda_0} + (\lambda_0 - \lambda_1)x = \alpha + \beta x$ (2 pts)

where $\alpha = \log\dfrac{p\lambda_1}{(1 - p)\lambda_0}$ and $\beta = \lambda_0 - \lambda_1$. (2 pts)

(b) For known $n \geq 2$, show that the negative binomial distribution with probability mass function

$f(y \mid n, \mu) = \dbinom{y + n - 1}{n - 1}\left(\dfrac{n}{\mu + n}\right)^n\left(1 - \dfrac{n}{\mu + n}\right)^y, \quad y = 0, 1, 2, \ldots,$

belongs to the exponential family of distributions. Find the natural parameter for this distribution. [5 pts]

Proof: The probability mass function of the negative binomial distribution can be written as

$f(y \mid n, \mu) = \dbinom{y + n - 1}{n - 1}\left(\dfrac{n}{\mu + n}\right)^n\left(\dfrac{\mu}{\mu + n}\right)^y = \exp\left[y\log\dfrac{\mu}{\mu + n} + n\log\dfrac{n}{\mu + n} + \log\dbinom{y + n - 1}{n - 1}\right].$ (2 pts)

This belongs to the exponential family of distributions with $\theta = \log(\mu/(\mu + n))$ (1 pt) and $b(\theta) = -n\log(1 - e^{\theta})$ (1 pt). Here $\phi = 1$, $a(\phi) = 1$ and $c(y; \phi) = \log\dbinom{y + n - 1}{n - 1}$. The natural parameter for this distribution is $\theta = \log(\mu/(\mu + n))$. (1 pt)

-End of the Paper-
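Both results in Question 6 can be sanity-checked numerically. The parameter values `p`, `lambda0`, `lambda1`, `n` and `mu` in this sketch are hypothetical; `dexp()` and `dnbinom(size = , mu = )` are base R, and `dnbinom` with `mu` uses the parameterization prob = size/(size + mu), which matches the pmf in part (b).

```r
# (a) Bayes' theorem with exponential class densities gives a logistic curve.
p <- 0.4; lambda0 <- 2; lambda1 <- 0.5        # HYPOTHETICAL values
x <- seq(0, 5, by = 0.25)
post <- p * dexp(x, rate = lambda1) /
  ((1 - p) * dexp(x, rate = lambda0) + p * dexp(x, rate = lambda1))
alpha <- log(p * lambda1 / ((1 - p) * lambda0))
beta  <- lambda0 - lambda1
max(abs(post - plogis(alpha + beta * x)))     # ~ 0 up to rounding

# (b) Negative binomial pmf in exponential-family form exp(y*theta - b(theta) + c(y)).
n <- 3; mu <- 2.5; y <- 0:20                  # HYPOTHETICAL n and mu
theta   <- log(mu / (mu + n))                 # natural parameter
b_theta <- -n * log(1 - exp(theta))           # cumulant function b(theta)
c_y     <- lchoose(y + n - 1, n - 1)          # c(y; phi) with a(phi) = 1
pmf_expfam <- exp(y * theta - b_theta + c_y)
max(abs(pmf_expfam - dnbinom(y, size = n, mu = mu)))   # ~ 0 up to rounding
```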
Logistic Regression Some slides from Craig Burkett STA303/STA1002: Methods of Data Analysis II, Summer 2016 Michael Guerzhoy Titanic Survival Case Study The RMS Titanic A British passenger liner Collided
More informationChapter 4: Generalized Linear Models-II
: Generalized Linear Models-II Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationSTAT 526 Advanced Statistical Methodology
STAT 526 Advanced Statistical Methodology Fall 2017 Lecture Note 7 Contingency Table 0-0 Outline Introduction to Contingency Tables Testing Independence in Two-Way Contingency Tables Modeling Ordinal Associations
More informationMcGill University. Faculty of Science. Department of Mathematics and Statistics. Statistics Part A Comprehensive Exam Methodology Paper
Student Name: ID: McGill University Faculty of Science Department of Mathematics and Statistics Statistics Part A Comprehensive Exam Methodology Paper Date: Friday, May 13, 2016 Time: 13:00 17:00 Instructions
More informationSTA6938-Logistic Regression Model
Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of
More informationUNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016
UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Applied Statistics Friday, January 15, 2016 Work all problems. 60 points are needed to pass at the Masters Level and 75 to pass at the
More informationGeneralized logit models for nominal multinomial responses. Local odds ratios
Generalized logit models for nominal multinomial responses Categorical Data Analysis, Summer 2015 1/17 Local odds ratios Y 1 2 3 4 1 π 11 π 12 π 13 π 14 π 1+ X 2 π 21 π 22 π 23 π 24 π 2+ 3 π 31 π 32 π
More informationRegression Methods for Survey Data
Regression Methods for Survey Data Professor Ron Fricker! Naval Postgraduate School! Monterey, California! 3/26/13 Reading:! Lohr chapter 11! 1 Goals for this Lecture! Linear regression! Review of linear
More informationNormal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,
Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability
More informationCategorical Data Analysis Chapter 3
Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,
More informationChapter 4: Generalized Linear Models-I
: Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay
More informationGeneralized Linear Models. Kurt Hornik
Generalized Linear Models Kurt Hornik Motivation Assuming normality, the linear model y = Xβ + e has y = β + ε, ε N(0, σ 2 ) such that y N(μ, σ 2 ), E(y ) = μ = β. Various generalizations, including general
More informationAdministration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books
STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,
More informationModel Estimation Example
Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions
More informationStatistics & Data Sciences: First Year Prelim Exam May 2018
Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book
More informationBeyond GLM and likelihood
Stat 6620: Applied Linear Models Department of Statistics Western Michigan University Statistics curriculum Core knowledge (modeling and estimation) Math stat 1 (probability, distributions, convergence
More informationBinary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017
Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which
More informationHomework 10 - Solution
STAT 526 - Spring 2011 Homework 10 - Solution Olga Vitek Each part of the problems 5 points 1. Faraway Ch. 4 problem 1 (page 93) : The dataset parstum contains cross-classified data on marijuana usage
More informationMSH3 Generalized linear model
Contents MSH3 Generalized linear model 5 Logit Models for Binary Data 173 5.1 The Bernoulli and binomial distributions......... 173 5.1.1 Mean, variance and higher order moments.... 173 5.1.2 Normal limit....................
More informationCh 6: Multicategory Logit Models
293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the
More informationGeneralized Linear Models
Generalized Linear Models 1/37 The Kelp Data FRONDS 0 20 40 60 20 40 60 80 100 HLD_DIAM FRONDS are a count variable, cannot be < 0 2/37 Nonlinear Fits! FRONDS 0 20 40 60 log NLS 20 40 60 80 100 HLD_DIAM
More informationGeneralized Estimating Equations
Outline Review of Generalized Linear Models (GLM) Generalized Linear Model Exponential Family Components of GLM MLE for GLM, Iterative Weighted Least Squares Measuring Goodness of Fit - Deviance and Pearson
More information22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then
More informationIntroducing Generalized Linear Models: Logistic Regression
Ron Heck, Summer 2012 Seminars 1 Multilevel Regression Models and Their Applications Seminar Introducing Generalized Linear Models: Logistic Regression The generalized linear model (GLM) represents and
More informationGood Confidence Intervals for Categorical Data Analyses. Alan Agresti
Good Confidence Intervals for Categorical Data Analyses Alan Agresti Department of Statistics, University of Florida visiting Statistics Department, Harvard University LSHTM, July 22, 2011 p. 1/36 Outline
More informationContrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models:
Contrasting Marginal and Mixed Effects Models Recall: two approaches to handling dependence in Generalized Linear Models: Marginal models: based on the consequences of dependence on estimating model parameters.
More informationHomework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.
EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests
More informationHomework 5 - Solution
STAT 526 - Spring 2011 Homework 5 - Solution Olga Vitek Each part of the problems 5 points 1. Agresti 10.1 (a) and (b). Let Patient Die Suicide Yes No sum Yes 1097 90 1187 No 203 435 638 sum 1300 525 1825
More informationGeneralized Linear Models I
Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.
More informationRon Heck, Fall Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October 20, 2011)
Ron Heck, Fall 2011 1 EDEP 768E: Seminar in Multilevel Modeling rev. January 3, 2012 (see footnote) Week 8: Introducing Generalized Linear Models: Logistic Regression 1 (Replaces prior revision dated October
More informationLecture 10: Introduction to Logistic Regression
Lecture 10: Introduction to Logistic Regression Ani Manichaikul amanicha@jhsph.edu 2 May 2007 Logistic Regression Regression for a response variable that follows a binomial distribution Recall the binomial
More informationSTA 450/4000 S: January
STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have
More information