Homework 1 Solutions

Size: px
Start display at page:

Download "Homework 1 Solutions"

Transcription

1 Homework 1 Solutions Problem 3.4 (a) X and G We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports it to be 0. We therefore reject the null hypothesis of independence between party identification and race. (b) Under the null hypothesis, the standardized Pearson residuals will have approximately a standard normal distribution. The magnitude of the standardized residuals below is therefore evidence against the null hypothesis of independence. Democrat Independent Republican Black White (c) Using the subtable excluding Republicans, the test statistics for choice between Democrat and Independent are X and G Using the subtable for comparing Democrats and Independents as a group to Republicans, the the test statistics are X and G Note that G2 1 + G2 2 G2, but X1 2 + X2 2 X2, as we can partition G 2 exactly but not X 2. Although both sets of tests lead us to reject the corresponding null hypotheses, we can see that the relationship between race and party is greater when we compare Republicans to Democrats and Independents as a group than when we compare Democrats and Independents to each other. (d) The MLE for the log odds ratio is log( /(341 11)) By the delta method, its asymptotic standard error is (1/ / / /11) 1/ A 95% confidence interval for the log odds ratio is ± [1.7706,3.0470], and so a 95% confidence interval for the odds ratio is [5.87,21.05]. Because this interval does not contain 1, we reject the null hypothesis of independence between race and party affiliation. Problem 3.5 The table shows actual and expected counts in each cell. The statistics reported are X and G , each of which is compared to a χ 2 distribution with (2 1)(3 1) 2 degrees of 1

2 freedom, giving p-values in each case of about 0.03, which means that we reject, at the 0.05 level, the hypothesis of independence between gender and party identification. The residuals for each cell are given at the bottom of the output. The first column (Resraw) are the raw residuals: the observed minus the expected counts. The second two columns are the Pearson and standardized Pearson residuals. The magnitude of the standardized residuals is further evidence against the null hypothesis of independence, when we compare them to the null distribution (standard normal). Problem 3.6 The table of counts used in the analysis was the following. Advanced Local Alone Spouse Others This gives test statistics X and G Comparing each to a χ 2 distribution with (2 1)(2 1) 1 degree of freedom, we get p-values of and , respectively. Either could be rounded up to get the value of 0.02 reported in the article. Problem 3.11 (a) The statistics for testing the null hypothesis of independence between family income and educational aspirations are X and G Comparing each to a χ 2 distribution with (3 1)(4 1) 6 degrees of freedom, we get p-values of and 0.178, neither of which would lead us to reject the null hypothesis. However, both of the variables in the table are ordinal, and neither of these tests use this information. (b) The standardized Pearson residuals are below. Roughly speaking, as income increases, so do aspirations. Some high school High school grad Some college College grad Low Middle High (c) To use the information about the ordering, I assigned values of 1 through 4 to each of the aspirations and 1 through 3 to each of the incomes. Then the correlation between income and aspirations is r 0.132, so the M 2 statistic is (n 1)r Comparing this to a χ 2 distribution with 1 degree of freedom, we get a p-value of 0.029, which leads us to reject the null hypothesis of independence. 2

3 Problem 3.12 The point estimate for gamma is Using the delta method (Problem 3.27), the standard error is , giving a confidence interval of ± [0.315, 0.459]. Please see the Appendix for calculations in S-plus. Because this interval does not contain zero, we reject the null hypothesis of independence between schooling and attitude toward abortion. Problem 3.22 Suppose Y Bin(n,π). The MLE for π, ˆπ y/n AN(π,π(1 π)/n). We apply the delta method to g(x) log(x/(1 x)). The first derivative of g evaluated at π is g (π) 1 π(1 π), so using g(ˆπ) AN(g(π),g (π) 2 π(1 π) n ), we have log( ˆπ 1 ˆπ ) AN(log( π 1 π ), 1 nπ(1 π) ). This gives a Wald confidence interval of log( ˆπ 1 ˆπ ) ± z α/2/ nπ(1 π). Because g is monotone, applying g 1 (y) ey 1+e to each endpoint of this interval will give a confidence interval for π itself. y Problem 3.32 (a) There are many counterexamples here. For instance, for the 2 2 table j1 j2 i1 1 2 i2 3 4 the Pearson residuals are j1 j2 i i (b) Using the usual notation for 2 2 tables, each standardized residual is e ij n ij ˆµ ij. ˆµ ij (1 p i+ )(1 p +j ) Write n i C + for the sum of the row not equal to i and likewise n +j C for the sum of the column not equal to j. Then the denominator ˆµ ij (1 p i+ )(1 p +j ) n ( i+n +j 1 n ) ( i+ 1 n ) +j n n n ni+ n +j n i C + n +j C n n n n1+ n 2+ n +1 n +2 n 3, 3

4 which doesn t depend on i or j. So we need only show that the raw residuals n ij ˆµ ij have the same absolute value. Calculating them separately for each i and j, we have the following raw residuals, which all have the same absolute value. i j n ij ˆµ ij (nn ij n i+ n +j )/n 1 1 (n 11 n 22 n 12 n 21 )/n 1 2 (n 12 n 21 n 11 n 22 )/n 2 1 (n 12 n 21 n 11 n 22 )/n 2 2 (n 11 n 22 n 12 n 21 )/n (c) Using the fact given in the problem, that X 2 n(n 11 n 22 n 12 n 21 ) 2 /(n 1+ n 2+ n +1 n +2 ), it s clear that (e ij) 2 (n 11n 22 n 12 n 21 ) 2 /n 2 (n 1+ n 2+ n +1 n +2 )/n 3 X 2 Problem 4.9 (a) When creating my dummy variables, I chose to make light medium the default color category. The model is then Y i Poisson(λ i ), where log(λ i ) β 0 +β 1 weight+β 2 I {colormedium} +β 3 I {colordark medium} +β 4 I {colordark}. The parameter estimates are ˆβ , ˆβ , ˆβ , ˆβ , and ˆβ (If your dummy variables correspond to the first 3 color categories, your parameter estimates should be , , , , and ) The intercept doesn t have a meaningful interpretation; it corresponds to a light medium colored crab weighing zero kilograms. The coefficient for weight is interpreted as follows: for every increase of one kilogram, log(λ) increases by , which means that the mean number of satellites is multiplied by exp(0.5462) The coefficients for color should all be interpreted with reference to the baseline case. Medium colored crabs have on average exp( ) times as many satellites as medium light crabs, dark medium crabs have exp( ) times as many, and dark crabs have exp( ) times as many. (b) For the medium light crab, the estimate of E(Y ) is For the dark crab, it is (c) The difference in deviances for the full model and the model without color is Comparing this to a χ 2 distribution with 3 degrees of freedom (the difference in the number of parameters for each model) gives a p-value of This is evidence against the null hypothesis that a model without color is sufficient. 4

5 (d) Treating color as quantitative, I assign values 1 through 4 to each of the color categories. The systematic component of the model is now log(λ i ) β 0 + β 1 weight + β 2 color. The parameter estimates are ˆβ , ˆβ , and ˆβ Now the interpretation of the color effect is that for each increase of one color category away from light medium, the number of satellites is multiplied by exp( ) Under this model, the estimate of E(Y ) for the medium light crab is For the dark crab, it is Comparing this model to the one without a color effect, the difference in deviances is 8.073, which has a p-value of when compared to a χ 2 distribution with 1 degree of freedom, which is again evidence against the null hypothesis of no color effect. Finally, comparing the fit of this model to the model using dummy variables for color (a more flexible model), the difference in deviances is 0.989, which has a p-value of 0.61 when compared to a χ 2 distribution with 2 degrees of freedom. So we fail to reject the null hypothesis that the simpler, linear model provides an adequate fit. (Note that this model is a special case of the previous model where the differences between adjacent categories are constrained to be equal. Therefore the null χ 2 distribution of the test statistic is valid.) (e) The output for the coefficients in this model is given below. The effect of width does not appear to be significant when weight is already in the model (small t value). Value Std. Error t value (Intercept) weight color width The likelihood ratio test confirms this: the difference in deviances for this model and the model without width is 0.442, which has a p-value of about 0.5 when compared to a χ 2 distribution with 1 degree of freedom. Therefore we fail to reject the null hypothesis that a model with weight and color only is adequate. Problem 4.19 Suppose Y i Bernoulli(π(x i )) for x i {0,1}, with log π(x i ) α + βx i. Then log π(0) α and log π(1) α + β, so solving for β we have β log π(1) log π(0) log ( ) π(1), π(0) and π(1)/π(0) is the relative risk. This link function is not often used because if α + β > 0,π(1) exp(α + β) > 1 and therefore is not a valid probability. Problem

6 Suppose n i Y i Binomial(n i,π i ), and π i Φ( j β jx ij ). Define η i j β jx ij and µ i E[n i Y i ] π i Φ(η i ). Then µ i η i φ(η i ), where φ(u) Φ(u) u. The variance of Y i is the variance of n i Y i divided by n 2 i, which gives w i n i φ 2 (η i ) Φ(η i )[1 Φ(η i )]. So Cov(ˆβ) ˆ (X ŴX) 1, where Ŵ is a diagonal matrix whose diagonal elements are equal to the w i evaluated at ˆβ. In the case of logistic regression, Φ(u) eu 1+e, so φ(u) u eu (1+e u ) and 2 φ(η i ) eη i (1+e η i) eη i 2 1+e 1 η i 1+e π(1 π). Finally, η i Problem 4.27 w i n i[π i (1 π i )] 2 Φ(η i )[1 Φ(η i )] n i[π i (1 π i )] 2 π i (1 π i ) n i π i (1 π i ) (a) Suppose Y ij are independent Poisson with E(Y ij ) µ i for i 1,...,I and j 1,...,n i. Then the log-likelihood is L(µ 1,µ 2,...,µ I ) [ µ i + y ij log µ i log(y ij!)] i1 j1 I n i n i µ i + n i log µ i (y ij!), i1 j1 so L µ i ni/µ i j y ij, and setting this equal to zero we get ˆµ i ȳ i j y ij/n i as the MLE. (Note the second derivative evaluated at ˆµ i is negative, so this is indeed a maximum.) (b) Note that so L(ˆµ;y) i1 j1 j y ij /n i + y ij log( j y ij /n i ) log(y ij!) L(y;y) [ y ij + y ij log(y ij ) log(y ij!)] i1 j1 i1 j1 ni j1 y ij n i y ij, i1 j1 ( ) ȳi 2(L(ˆµ; y) L(y; y)) 2 y ij log y i1 j1 ij ( ) yij 2 y ij log ȳ i1 j1 i 6

7 Problem 4.29 Suppose π(x) Φ(α + βx), where the standard cdf Φ corresponds to a pdf φ that is symmetric around 0. (a) π(x) Φ(α + βx) Φ 1 (0.5) α + βx 0 α + βx x α/β (b) π (x) π(x)0.5 π (x) x α/β Φ(α + βx) x x α/β βφ(α + βx) x α/β βφ(0) For the logit link, Φ(x) ex 1+e x and φ(x) ex (1+e x ) 2, so βφ(0) β/4. For the probit link, φ(x) exp( x 2 /2)/ 2π, so βφ(0) β/ 2π. (c) The regression curve is π(x), a function of x given by π(x) Φ(α + βx) P(Z α + βx),where Z N(0,1) ( ) Z α P x β Since Z N(0,1), Z α β this distribution. N( α/β,1/β 2 ), and π(x) is therefore the cdf corresponding to Appendix: Splus code for problem 3.12 f.gamma <-function(x){ # x is a matrix of counts n <- nrow(x); m <- ncol(x) pihat.mat <- x/sum(x) pihat.mat.c <- pihat.mat.d <- matrix(na, nrown, ncolm) for(i in 1:n){ for(j in 1:m){ pihat.mat.c[i,j] <- sum(pihat.mat[row(pihat.mat)<i & col(pihat.mat)<j]) + sum(pihat.mat[row(pihat.mat)>i & col(pihat.mat)>j]) pihat.mat.d[i,j] <- sum(pihat.mat[row(pihat.mat)<i & col(pihat.mat)>j]) + 7

8 sum(pihat.mat[row(pihat.mat)>i & col(pihat.mat)<j]) } } pihat.c <- sum(pihat.mat * pihat.mat.c) C <- pihat.c*sum(x)^2/2 pihat.d <- sum(pihat.mat * pihat.mat.d) D <- pihat.d*sum(x)^2/2 gamma <- (C-D)/(C+D) psi.mat <- 4*(pihat.d*pihat.mat.c - pihat.c*pihat.mat.d)/(pihat.c+pihat.d)^2 sigma2 <- 16 * sum(pihat.mat*(pihat.d*pihat.mat.c - pihat.c*pihat.mat.d)^2)/ (pihat.c+pihat.d)^4 / sum(x) list(gammagamma, sigma2sigma2, CIgamma+c(-1,1)*1.96*sqrt(sigma2)) } counts <- c(209,101,237,151,126,426,16,21,138) table.3.12 <- cbind(expand.grid(list(attitudec("gd", "MP", "GA"), Schoolc("<HS", "HS", ">HS"))), countcounts) table.3.12.array <- t(design.table(table.3.12)) f.gamma(table.3.12.array) #$gamma: #[1] #$sigma2: #[1] #$CI: #[1]

BIOS 625 Fall 2015 Homework Set 3 Solutions

BIOS 625 Fall 2015 Homework Set 3 Solutions BIOS 65 Fall 015 Homework Set 3 Solutions 1. Agresti.0 Table.1 is from an early study on the death penalty in Florida. Analyze these data and show that Simpson s Paradox occurs. Death Penalty Victim's

More information

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples

ST3241 Categorical Data Analysis I Generalized Linear Models. Introduction and Some Examples ST3241 Categorical Data Analysis I Generalized Linear Models Introduction and Some Examples 1 Introduction We have discussed methods for analyzing associations in two-way and three-way tables. Now we will

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

The material for categorical data follows Agresti closely.

The material for categorical data follows Agresti closely. Exam 2 is Wednesday March 8 4 sheets of notes The material for categorical data follows Agresti closely A categorical variable is one for which the measurement scale consists of a set of categories Categorical

More information

Sections 4.1, 4.2, 4.3

Sections 4.1, 4.2, 4.3 Sections 4.1, 4.2, 4.3 Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1/ 32 Chapter 4: Introduction to Generalized Linear Models Generalized linear

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: )

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) ST3241 Categorical Data Analysis. (Semester II: ) NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION (SOLUTIONS) Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3

More information

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses

ST3241 Categorical Data Analysis I Multicategory Logit Models. Logit Models For Nominal Responses ST3241 Categorical Data Analysis I Multicategory Logit Models Logit Models For Nominal Responses 1 Models For Nominal Responses Y is nominal with J categories. Let {π 1,, π J } denote the response probabilities

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

(c) Interpret the estimated effect of temperature on the odds of thermal distress.

(c) Interpret the estimated effect of temperature on the odds of thermal distress. STA 4504/5503 Sample questions for exam 2 1. For the 23 space shuttle flights that occurred before the Challenger mission in 1986, Table 1 shows the temperature ( F) at the time of the flight and whether

More information

Chapter 4: Generalized Linear Models-I

Chapter 4: Generalized Linear Models-I : Generalized Linear Models-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models

Review of Multinomial Distribution If n trials are performed: in each trial there are J > 2 possible outcomes (categories) Multicategory Logit Models Chapter 6 Multicategory Logit Models Response Y has J > 2 categories. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. 6.1 Logit Models for Nominal Responses

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Categorical Data Analysis Chapter 3

Categorical Data Analysis Chapter 3 Categorical Data Analysis Chapter 3 The actual coverage probability is usually a bit higher than the nominal level. Confidence intervals for association parameteres Consider the odds ratio in the 2x2 table,

More information

Ch 6: Multicategory Logit Models

Ch 6: Multicategory Logit Models 293 Ch 6: Multicategory Logit Models Y has J categories, J>2. Extensions of logistic regression for nominal and ordinal Y assume a multinomial distribution for Y. In R, we will fit these models using the

More information

Answer Key for STAT 200B HW No. 8

Answer Key for STAT 200B HW No. 8 Answer Key for STAT 200B HW No. 8 May 8, 2007 Problem 3.42 p. 708 The values of Ȳ for x 00, 0, 20, 30 are 5/40, 0, 20/50, and, respectively. From Corollary 3.5 it follows that MLE exists i G is identiable

More information

Advanced Quantitative Methods: limited dependent variables

Advanced Quantitative Methods: limited dependent variables Advanced Quantitative Methods: Limited Dependent Variables I University College Dublin 2 April 2013 1 2 3 4 5 Outline Model Measurement levels 1 2 3 4 5 Components Model Measurement levels Two components

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine.

Explanatory variables are: weight, width of shell, color (medium light, medium, medium dark, dark), and condition of spine. Horseshoe crab example: There are 173 female crabs for which we wish to model the presence or absence of male satellites dependant upon characteristics of the female horseshoe crabs. 1 satellite present

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Poisson regression 1/15

Poisson regression 1/15 Poisson regression 1/15 2/15 Counts data Examples of counts data: Number of hospitalizations over a period of time Number of passengers in a bus station Blood cells number in a blood sample Number of typos

More information

Answer Key for STAT 200B HW No. 7

Answer Key for STAT 200B HW No. 7 Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models

SCHOOL OF MATHEMATICS AND STATISTICS. Linear and Generalised Linear Models SCHOOL OF MATHEMATICS AND STATISTICS Linear and Generalised Linear Models Autumn Semester 2017 18 2 hours Attempt all the questions. The allocation of marks is shown in brackets. RESTRICTED OPEN BOOK EXAMINATION

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017

Poisson Regression. Gelman & Hill Chapter 6. February 6, 2017 Poisson Regression Gelman & Hill Chapter 6 February 6, 2017 Military Coups Background: Sub-Sahara Africa has experienced a high proportion of regime changes due to military takeover of governments for

More information

STA6938-Logistic Regression Model

STA6938-Logistic Regression Model Dr. Ying Zhang STA6938-Logistic Regression Model Topic 2-Multiple Logistic Regression Model Outlines:. Model Fitting 2. Statistical Inference for Multiple Logistic Regression Model 3. Interpretation of

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

Linear Regression With Special Variables

Linear Regression With Special Variables Linear Regression With Special Variables Junhui Qian December 21, 2014 Outline Standardized Scores Quadratic Terms Interaction Terms Binary Explanatory Variables Binary Choice Models Standardized Scores:

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models

Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models Chapter 14 Logistic Regression, Poisson Regression, and Generalized Linear Models 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 29 14.1 Regression Models

More information

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game.

Homework 5: Answer Key. Plausible Model: E(y) = µt. The expected number of arrests arrests equals a constant times the number who attend the game. EdPsych/Psych/Soc 589 C.J. Anderson Homework 5: Answer Key 1. Probelm 3.18 (page 96 of Agresti). (a) Y assume Poisson random variable. Plausible Model: E(y) = µt. The expected number of arrests arrests

More information

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples

ST3241 Categorical Data Analysis I Logistic Regression. An Introduction and Some Examples ST3241 Categorical Data Analysis I Logistic Regression An Introduction and Some Examples 1 Business Applications Example Applications The probability that a subject pays a bill on time may use predictors

More information

Intermediate Econometrics

Intermediate Econometrics Intermediate Econometrics Heteroskedasticity Text: Wooldridge, 8 July 17, 2011 Heteroskedasticity Assumption of homoskedasticity, Var(u i x i1,..., x ik ) = E(u 2 i x i1,..., x ik ) = σ 2. That is, the

More information

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic.

I i=1 1 I(J 1) j=1 (Y ij Ȳi ) 2. j=1 (Y j Ȳ )2 ] = 2n( is the two-sample t-test statistic. Serik Sagitov, Chalmers and GU, February, 08 Solutions chapter Matlab commands: x = data matrix boxplot(x) anova(x) anova(x) Problem.3 Consider one-way ANOVA test statistic For I = and = n, put F = MS

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Yan Lu Jan, 2018, week 3 1 / 67 Hypothesis tests Likelihood ratio tests Wald tests Score tests 2 / 67 Generalized Likelihood ratio tests Let Y = (Y 1,

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Lecture 15. Hypothesis testing in the linear model

Lecture 15. Hypothesis testing in the linear model 14. Lecture 15. Hypothesis testing in the linear model Lecture 15. Hypothesis testing in the linear model 1 (1 1) Preliminary lemma 15. Hypothesis testing in the linear model 15.1. Preliminary lemma Lemma

More information

Generalized Linear Models. Last time: Background & motivation for moving beyond linear

Generalized Linear Models. Last time: Background & motivation for moving beyond linear Generalized Linear Models Last time: Background & motivation for moving beyond linear regression - non-normal/non-linear cases, binary, categorical data Today s class: 1. Examples of count and ordered

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

n y π y (1 π) n y +ylogπ +(n y)log(1 π).

n y π y (1 π) n y +ylogπ +(n y)log(1 π). Tests for a binomial probability π Let Y bin(n,π). The likelihood is L(π) = n y π y (1 π) n y and the log-likelihood is L(π) = log n y +ylogπ +(n y)log(1 π). So L (π) = y π n y 1 π. 1 Solving for π gives

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Linear Regression Models P8111

Linear Regression Models P8111 Linear Regression Models P8111 Lecture 25 Jeff Goldsmith April 26, 2016 1 of 37 Today s Lecture Logistic regression / GLMs Model framework Interpretation Estimation 2 of 37 Linear regression Course started

More information

STA216: Generalized Linear Models. Lecture 1. Review and Introduction

STA216: Generalized Linear Models. Lecture 1. Review and Introduction STA216: Generalized Linear Models Lecture 1. Review and Introduction Let y 1,..., y n denote n independent observations on a response Treat y i as a realization of a random variable Y i In the general

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved.

This manual is Copyright 1997 Gary W. Oehlert and Christopher Bingham, all rights reserved. This file consists of Chapter 4 of MacAnova User s Guide by Gary W. Oehlert and Christopher Bingham, issued as Technical Report Number 617, School of Statistics, University of Minnesota, March 1997, describing

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part II Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

12 Modelling Binomial Response Data

12 Modelling Binomial Response Data c 2005, Anthony C. Brooms Statistical Modelling and Data Analysis 12 Modelling Binomial Response Data 12.1 Examples of Binary Response Data Binary response data arise when an observation on an individual

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

Generalized linear models

Generalized linear models Generalized linear models Douglas Bates November 01, 2010 Contents 1 Definition 1 2 Links 2 3 Estimating parameters 5 4 Example 6 5 Model building 8 6 Conclusions 8 7 Summary 9 1 Generalized Linear Models

More information

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20

Logistic regression. 11 Nov Logistic regression (EPFL) Applied Statistics 11 Nov / 20 Logistic regression 11 Nov 2010 Logistic regression (EPFL) Applied Statistics 11 Nov 2010 1 / 20 Modeling overview Want to capture important features of the relationship between a (set of) variable(s)

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Advanced Methods for Data Analysis (36-402/36-608 Spring 2014 1 Generalized linear models 1.1 Introduction: two regressions So far we ve seen two canonical settings for regression.

More information

Generalized Linear Models 1

Generalized Linear Models 1 Generalized Linear Models 1 STA 2101/442: Fall 2012 1 See last slide for copyright information. 1 / 24 Suggested Reading: Davison s Statistical models Exponential families of distributions Sec. 5.2 Chapter

More information

Solutions for Examination Categorical Data Analysis, March 21, 2013

Solutions for Examination Categorical Data Analysis, March 21, 2013 STOCKHOLMS UNIVERSITET MATEMATISKA INSTITUTIONEN Avd. Matematisk statistik, Frank Miller MT 5006 LÖSNINGAR 21 mars 2013 Solutions for Examination Categorical Data Analysis, March 21, 2013 Problem 1 a.

More information

Categorical Variables and Contingency Tables: Description and Inference

Categorical Variables and Contingency Tables: Description and Inference Categorical Variables and Contingency Tables: Description and Inference STAT 526 Professor Olga Vitek March 3, 2011 Reading: Agresti Ch. 1, 2 and 3 Faraway Ch. 4 3 Univariate Binomial and Multinomial Measurements

More information

STAT 705: Analysis of Contingency Tables

STAT 705: Analysis of Contingency Tables STAT 705: Analysis of Contingency Tables Timothy Hanson Department of Statistics, University of South Carolina Stat 705: Analysis of Contingency Tables 1 / 45 Outline of Part I: models and parameters Basic

More information

Lecture 12: Effect modification, and confounding in logistic regression

Lecture 12: Effect modification, and confounding in logistic regression Lecture 12: Effect modification, and confounding in logistic regression Ani Manichaikul amanicha@jhsph.edu 4 May 2007 Today Categorical predictor create dummy variables just like for linear regression

More information

STAT 525 Fall Final exam. Tuesday December 14, 2010

STAT 525 Fall Final exam. Tuesday December 14, 2010 STAT 525 Fall 2010 Final exam Tuesday December 14, 2010 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 18 Outline 1 Logistic regression for Binary data 2 Poisson regression for Count data 2 / 18 GLM Let Y denote a binary response variable. Each observation

More information

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46

A Generalized Linear Model for Binomial Response Data. Copyright c 2017 Dan Nettleton (Iowa State University) Statistics / 46 A Generalized Linear Model for Binomial Response Data Copyright c 2017 Dan Nettleton (Iowa State University) Statistics 510 1 / 46 Now suppose that instead of a Bernoulli response, we have a binomial response

More information

Some General Types of Tests

Some General Types of Tests Some General Types of Tests We may not be able to find a UMP or UMPU test in a given situation. In that case, we may use test of some general class of tests that often have good asymptotic properties.

More information

STAC51: Categorical data Analysis

STAC51: Categorical data Analysis STAC51: Categorical data Analysis Mahinda Samarakoon April 6, 2016 Mahinda Samarakoon STAC51: Categorical data Analysis 1 / 25 Table of contents 1 Building and applying logistic regression models (Chap

More information

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017 Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which

More information

Discrete Multivariate Statistics

Discrete Multivariate Statistics Discrete Multivariate Statistics Univariate Discrete Random variables Let X be a discrete random variable which, in this module, will be assumed to take a finite number of t different values which are

More information

Cohen s s Kappa and Log-linear Models

Cohen s s Kappa and Log-linear Models Cohen s s Kappa and Log-linear Models HRP 261 03/03/03 10-11 11 am 1. Cohen s Kappa Actual agreement = sum of the proportions found on the diagonals. π ii Cohen: Compare the actual agreement with the chance

More information

Homework 10 - Solution

Homework 10 - Solution STAT 526 - Spring 2011 Homework 10 - Solution Olga Vitek Each part of the problems 5 points 1. Faraway Ch. 4 problem 1 (page 93) : The dataset parstum contains cross-classified data on marijuana usage

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 6 SOLUTIONS 1a. Under the null hypothesis X has the binomial (100,.5) distribution with E(X) = 50 and SE(X) = 5. So P ( X 50 > 10) is (approximately) two tails

More information

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014

LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R. Liang (Sally) Shan Nov. 4, 2014 LISA Short Course Series Generalized Linear Models (GLMs) & Categorical Data Analysis (CDA) in R Liang (Sally) Shan Nov. 4, 2014 L Laboratory for Interdisciplinary Statistical Analysis LISA helps VT researchers

More information

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z).

Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). Table of z values and probabilities for the standard normal distribution. z is the first column plus the top row. Each cell shows P(X z). For example P(X.04) =.8508. For z < 0 subtract the value from,

More information

STA Outlines of Solutions to Selected Homework Problems

STA Outlines of Solutions to Selected Homework Problems 1 STA 6505 CATEGORICAL DATA ANALYSIS Outlines of Solutions to Selected Homework Problems Alan Agresti January 5, 2004, c Alan Agresti 2004 This handout contains solutions and hints to solutions for many

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Hypothesis testing. Data to decisions

Hypothesis testing. Data to decisions Hypothesis testing Data to decisions The idea Null hypothesis: H 0 : the DGP/population has property P Under the null, a sample statistic has a known distribution If, under that that distribution, the

More information

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression

22s:152 Applied Linear Regression. Example: Study on lead levels in children. Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression 22s:52 Applied Linear Regression Ch. 4 (sec. and Ch. 5 (sec. & 4: Logistic Regression Logistic Regression When the response variable is a binary variable, such as 0 or live or die fail or succeed then

More information

TMA 4275 Lifetime Analysis June 2004 Solution

TMA 4275 Lifetime Analysis June 2004 Solution TMA 4275 Lifetime Analysis June 2004 Solution Problem 1 a) Observation of the outcome is censored, if the time of the outcome is not known exactly and only the last time when it was observed being intact,

More information

Data-analysis and Retrieval Ordinal Classification

Data-analysis and Retrieval Ordinal Classification Data-analysis and Retrieval Ordinal Classification Ad Feelders Universiteit Utrecht Data-analysis and Retrieval 1 / 30 Strongly disagree Ordinal Classification 1 2 3 4 5 0% (0) 10.5% (2) 21.1% (4) 42.1%

More information

Generalized Linear Models I

Generalized Linear Models I Statistics 203: Introduction to Regression and Analysis of Variance Generalized Linear Models I Jonathan Taylor - p. 1/16 Today s class Poisson regression. Residuals for diagnostics. Exponential families.

More information

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data

Class Notes: Week 8. Probit versus Logit Link Functions and Count Data Ronald Heck Class Notes: Week 8 1 Class Notes: Week 8 Probit versus Logit Link Functions and Count Data This week we ll take up a couple of issues. The first is working with a probit link function. While

More information

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided

Let us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or

More information

9 Generalized Linear Models

9 Generalized Linear Models 9 Generalized Linear Models The Generalized Linear Model (GLM) is a model which has been built to include a wide range of different models you already know, e.g. ANOVA and multiple linear regression models

More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Lecture notes to Chapter 11, Regression with binary dependent variables - probit and logit regression

Lecture notes to Chapter 11, Regression with binary dependent variables - probit and logit regression Lecture notes to Chapter 11, Regression with binary dependent variables - probit and logit regression Tore Schweder October 28, 2011 Outline Examples of binary respons variables Probit and logit - examples

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

When is MLE appropriate

When is MLE appropriate When is MLE appropriate As a rule of thumb the following to assumptions need to be fulfilled to make MLE the appropriate method for estimation: The model is adequate. That is, we trust that one of the

More information

Lecture 13: More on Binary Data

Lecture 13: More on Binary Data Lecture 1: More on Binary Data Link functions for Binomial models Link η = g(π) π = g 1 (η) identity π η logarithmic log π e η logistic log ( π 1 π probit Φ 1 (π) Φ(η) log-log log( log π) exp( e η ) complementary

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Regression with Qualitative Information. Part VI. Regression with Qualitative Information

Regression with Qualitative Information. Part VI. Regression with Qualitative Information Part VI Regression with Qualitative Information As of Oct 17, 2017 1 Regression with Qualitative Information Single Dummy Independent Variable Multiple Categories Ordinal Information Interaction Involving

More information