Likelihoods for Generalized Linear Models

1 Likelihoods for Generalized Linear Models

1.1 Some General Theory

We assume that $Y_i$ has a p.d.f. that is a member of the exponential family. That is,

$$f(y_i; \theta_i, \phi) = \exp\{(y_i \theta_i - b(\theta_i))/a_i(\phi) + c(y_i; \phi)\}$$

for some specific functions $a_i(\cdot)$, $b(\cdot)$ and $c(\cdot)$, with $\phi$ known. The parameter $\theta$ is called the canonical parameter. The parameter $\phi$, termed the scale parameter, is constant for all $i = 1, \ldots, n$. For now consider $\phi$ to be specified.

Consider the log-likelihood for a single observation from the exponential family,

$$\ell(\theta, \phi; y) = \log f(y; \theta, \phi) = (y\theta - b(\theta))/a(\phi) + c(y; \phi),$$

with score and information

$$S(\theta) = \frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{a(\phi)}, \qquad I(\theta) = -\frac{\partial^2 \ell}{\partial \theta^2} = \frac{b''(\theta)}{a(\phi)}.$$

Consider the following very general results.

Since $f$ is a density, $\int f(y; \theta, \phi)\,dy = 1$, so

$$\frac{\partial}{\partial \theta} \int f(y; \theta, \phi)\,dy = \int \frac{\partial}{\partial \theta} f(y; \theta, \phi)\,dy = 0.$$

But

$$\frac{\partial}{\partial \theta} f(y; \theta, \phi) = \left(\frac{\partial}{\partial \theta} \log f(y; \theta, \phi)\right) f(y; \theta, \phi).$$

Therefore,

$$\int \frac{\partial \log f(y; \theta, \phi)}{\partial \theta}\, f(y; \theta, \phi)\,dy = 0,$$

which in turn implies $E(S(\theta)) = 0$. Differentiating again, we find

$$\int \frac{\partial^2 \log f(y; \theta, \phi)}{\partial \theta^2}\, f(y; \theta, \phi)\,dy + \int \left(\frac{\partial \log f(y; \theta, \phi)}{\partial \theta}\right)^2 f(y; \theta, \phi)\,dy = 0,$$

which implies

$$-E\left(\frac{\partial^2 \log f(y; \theta, \phi)}{\partial \theta^2}\right) = E\left(\left(\frac{\partial \log f(y; \theta, \phi)}{\partial \theta}\right)^2\right).$$

These facts are useful to us for deriving the following results.

Mean: $E(S(\theta)) = 0$ gives $E((Y - b'(\theta))/a(\phi)) = 0$, which in turn implies that

$$E(Y) = b'(\theta) = \mu.$$

The key result here is that $\mu = b'(\theta)$.

Variance: $-E(\partial^2 \log f/\partial \theta^2) = E((\partial \log f/\partial \theta)^2)$ implies

$$E(b''(\theta)/a(\phi)) = E\left(\left(\frac{Y - b'(\theta)}{a(\phi)}\right)^2\right),$$

which in turn implies $b''(\theta)/a(\phi) = \mathrm{Var}(Y)/a(\phi)^2$, and hence

$$\mathrm{Var}(Y) = b''(\theta)\, a(\phi).$$

The variance can thus be written as a product of

- $b''(\theta)$, a function of the canonical parameter, and hence of the mean of the distribution; it is sometimes called the variance function and denoted by $V(\mu)$ when considered as a function of the mean $\mu$. Note that this is not in general the variance of $Y$.
- $a(\phi)$, a function only of $\phi$.
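
These two identities are easy to check by simulation. Below is a minimal R sketch (R is syntax-compatible with the Splus used later in these notes; the variable names here are our own) for the Poisson family, where $b(\theta) = e^\theta$ and $a(\phi) = 1$.

set.seed(1)
theta <- log(4)                       # canonical parameter; lambda = exp(theta) = 4
y <- rpois(1e6, lambda = exp(theta))  # a large Poisson sample
mean(y)   # approximately b'(theta)  = exp(theta) = 4
var(y)    # approximately b''(theta) * a(phi) = exp(theta) = 4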

The function $a(\phi)$ can commonly be written as $a(\phi) = \phi/w$, where $\phi$ is called a dispersion parameter and $w$ is a prior weight for the observation.

Link Functions

The link function relates $\eta$ (the linear predictor) to $\mu$, the expected value of the random variable $Y$.

Canonical Links

Consider the log-likelihood function from the exponential family,

$$\ell(\theta, \phi; y) = (y\theta - b(\theta))/a(\phi) + c(y; \phi).$$

If $y_1, y_2, \ldots, y_n$ is a random sample of $n$ observations we introduce a subscript $i$ and write

$$\ell(\theta, \phi; y) = \sum_{i=1}^n \left[(y_i \theta_i - b(\theta_i))/a_i(\phi) + c(y_i; \phi)\right].$$

If we set

$$\theta_i = \eta_i = x_i'\beta$$

we say we have a canonical link (i.e. canonical parameter = linear predictor), and then

$$\ell(\beta, \phi; y) = \sum_{i=1}^n \left[(y_i x_i'\beta - b(x_i'\beta))/a_i(\phi) + c(y_i; \phi)\right]
= \sum_{j=1}^p \beta_j \left(\sum_{i=1}^n y_i x_{ij}/a_i(\phi)\right) - \sum_{i=1}^n b(x_i'\beta)/a_i(\phi) + \sum_{i=1}^n c(y_i; \phi).$$

If $\phi$ is known, $\sum_{i=1}^n y_i x_{ij}$ is a sufficient statistic for $\beta_j$, or equivalently, $Xy$ is a sufficient statistic for $\beta$. Canonical links are useful in terms of their statistical properties, but context and goodness of fit should motivate the choice of the link. It often turns out that the canonical links are in fact appropriate (e.g. linear regression with normally distributed observations, logistic regression).

a. Normal Distribution

Dropping the subscript $i$, consider a single observation from the $N(\mu, \sigma^2)$ distribution. It has density

$$f(y; \theta, \phi) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\{-(y - \mu)^2/2\sigma^2\}
= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\{-(y^2 - 2\mu y + \mu^2)/2\sigma^2\}
= \exp\{(y\mu - \mu^2/2)/\sigma^2 - y^2/2\sigma^2 - \log(2\pi\sigma^2)/2\}.$$

Matching this up with the general expression for the exponential family, we see $\theta = \mu$, $\phi = \sigma^2$, and

$$a(\phi) = \phi, \qquad b(\theta) = \tfrac{1}{2}\theta^2 = \tfrac{1}{2}\mu^2, \qquad c(y; \phi) = -\tfrac{1}{2}(y^2/\phi + \log(2\pi\phi)) = -\tfrac{1}{2}(y^2/\sigma^2 + \log(2\pi\sigma^2)),$$

so that $f(y; \theta, \phi) = \exp\{(y\mu - \mu^2/2)/\sigma^2 - (y^2/\sigma^2 + \log(2\pi\sigma^2))/2\}$. Hence:

mean: $b'(\theta) = \theta = \mu$
variance: $b''(\theta)\, a(\phi) = 1 \cdot \phi = \sigma^2$
variance function: $b''(\theta) = 1 \ \{= V(\mu)\}$
canonical link: $\theta = \mu = \eta$ (identity)

b. Poisson Distribution

$$f(y; \lambda) = \lambda^y e^{-\lambda}/y!, \qquad \text{so} \qquad f(y; \theta, \phi) = \exp\{(y \log \lambda - \lambda) - \log y!\}.$$

Matching these expressions up with the expression for the general exponential family, we see $\theta = \log \lambda$ and $\phi = 1$. Furthermore,

$$a(\phi) = 1, \qquad b(\theta) = \exp\{\theta\} = \lambda, \qquad c(y; \phi) = -\log y!.$$

These give

mean: $b'(\theta) = \exp\{\theta\} = \mu = \lambda$

variance: $b''(\theta)\, a(\phi) = \exp\{\theta\} \cdot 1 = \mu$
variance function: $b''(\theta) = \exp\{\theta\} = \mu \ \{= V(\mu)\}$
canonical link: $\theta = \log \mu = \eta$ (log link)

c. Binomial Distribution

If $V \sim \mathrm{Bin}(m, \pi)$, then

$$f(v; \pi) = \binom{m}{v} \pi^v (1 - \pi)^{m - v}, \qquad v = 0, 1, \ldots, m.$$

Now consider the transformation to $Y = V/m$. Then

$$f(y; \pi) = \binom{m}{my} \pi^{my} (1 - \pi)^{m - my}, \qquad y = 0, 1/m, \ldots, 1.$$

We can write this as

$$f(y; \pi) = \exp\left\{my \log \pi + (m - my) \log(1 - \pi) + \log \binom{m}{my}\right\}
= \exp\left\{my \log(\pi/(1 - \pi)) + m \log(1 - \pi) + \log \binom{m}{my}\right\}
= \exp\left\{\left[y \log(\pi/(1 - \pi)) + \log(1 - \pi)\right]/m^{-1} + \log \binom{m}{my}\right\}.$$

Here we have $\theta = \log(\pi/(1 - \pi))$, $\phi = 1$, $w = m$, and

$$a(\phi) = \phi/w = 1/m, \qquad b(\theta) = \log(1 + \exp\{\theta\}) = -\log(1 - \pi), \qquad c(y; \phi) = \log \binom{m}{my}.$$

mean: $b'(\theta) = \exp\{\theta\}/(1 + \exp\{\theta\}) = \mu = \pi$

variance:

$$b''(\theta)\, a(\phi) = \frac{e^\theta(1 + e^\theta) - e^\theta \cdot e^\theta}{(1 + e^\theta)^2} \cdot \frac{1}{m}
= \left(\frac{e^\theta}{1 + e^\theta}\right)\left(\frac{1}{1 + e^\theta}\right)\left(\frac{1}{m}\right)
= \mu(1 - \mu)/m = \pi(1 - \pi)/m$$

variance function: $b''(\theta) = \mu(1 - \mu) = V(\mu)$
canonical link: $\theta = \log(\mu/(1 - \mu)) = \eta$ (logit link)

1.2 Iteratively Reweighted Least Squares

The Score Vector

This is a way of interpreting the Newton-Raphson algorithm for maximization of the likelihood function. Consider the log-likelihood for a single observation from the exponential family,

$$\ell(\theta, \phi; y) = (y\theta - b(\theta))/a(\phi) + c(y; \phi).$$

Recall:

- $\ell$ is a function of $\theta$ (we initially assume that $\phi$ is known);
- $\mu$ can be expressed in terms of $\theta$ through $\mu = b'(\theta)$;
- $\eta$ can be expressed in terms of $\mu$ through the link function;
- $\eta$ can be expressed in terms of $\beta$ through $\eta = x'\beta$.

To find the MLE of $\beta$, we want to solve $S(\beta) = \partial \ell/\partial \beta = 0$. Consider differentiating with respect to a scalar $\beta_j$. By the chain rule

$$\frac{\partial \ell}{\partial \beta_j} = \frac{\partial \ell}{\partial \theta} \cdot \frac{\partial \theta}{\partial \mu} \cdot \frac{\partial \mu}{\partial \eta} \cdot \frac{\partial \eta}{\partial \beta_j},$$

where

$$\frac{\partial \ell}{\partial \theta} = \frac{y - b'(\theta)}{a(\phi)}, \qquad \frac{\partial \theta}{\partial \mu} = \left(\frac{\partial \mu}{\partial \theta}\right)^{-1} = \frac{1}{b''(\theta)}, \qquad \frac{\partial \mu}{\partial \eta} \text{ is determined by the link}, \qquad \frac{\partial \eta}{\partial \beta_j} = x_j.$$

Since $\mu = b'(\theta)$ and $V = b''(\theta)$,

$$\frac{\partial \ell}{\partial \beta_j} = \frac{y - b'(\theta)}{a(\phi)} \cdot \frac{1}{b''(\theta)} \cdot \frac{\partial \mu}{\partial \eta} \cdot x_j
= \frac{y - \mu}{a(\phi)\, V} \cdot \frac{\partial \mu}{\partial \eta} \cdot x_j
= \frac{y - \mu}{\mathrm{Var}(Y)} \cdot \frac{\partial \mu}{\partial \eta} \cdot x_j
= \frac{y - \mu}{\mathrm{Var}(Y)} \left(\frac{\partial \mu}{\partial \eta}\right)^2 \frac{\partial \eta}{\partial \mu}\, x_j
= (y - \mu)\, W\, \frac{\partial \eta}{\partial \mu}\, x_j,$$

where $W^{-1} = \mathrm{Var}(Y)\,(\partial \eta/\partial \mu)^2$. With $n$ observations,

$$\partial \ell/\partial \beta_j = \sum_{i=1}^n \partial \ell_i/\partial \beta_j,$$

where $W_i^{-1} = \mathrm{Var}(y_i)(\partial \eta_i/\partial \mu_i)^2$ and $\partial \eta_i/\partial \mu_i$ means $\partial \eta/\partial \mu$ evaluated at the covariate vector $x_i$. The score vector then takes the form $S(\beta) = (\partial \ell/\partial \beta_0, \partial \ell/\partial \beta_1, \ldots, \partial \ell/\partial \beta_{p-1})'$. In vector form we can write $S(\beta)$ as

$$S(\beta) = XW\left[(y - \mu) \circ \partial \eta/\partial \mu\right],$$

where $y = (y_1, \ldots, y_n)'$ and $\mu = (\mu_1, \ldots, \mu_n)'$ are $n \times 1$ vectors, $X = (x_1, \ldots, x_n)$ is a $p \times n$ matrix, $W$ denotes the diagonal matrix $W = \mathrm{diag}(W_1, W_2, \ldots, W_n)$, and $\circ$ denotes an elementwise product.

Newton-Raphson and Fisher Scoring

Newton-Raphson:

$$\hat{\beta}^{(r+1)} = \hat{\beta}^{(r)} + I^{-1}(\hat{\beta}^{(r)})\, S(\hat{\beta}^{(r)}),$$

where $I$ is the observed information matrix.

Fisher Scoring Method: Fisher suggested using the expected information matrix rather than the observed information matrix. In general this simplifies the computations, as we shall see. Consider, for a single observation (one observation of many, but with the subscript omitted for convenience):

$$I_{jk} = -\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} = -\frac{\partial}{\partial \beta_k}\left\{(y - \mu)\, W \left(\frac{\partial \eta}{\partial \mu}\right) x_j\right\}
= -(y - \mu) \frac{\partial}{\partial \beta_k}\left\{W \left(\frac{\partial \eta}{\partial \mu}\right) x_j\right\} + \frac{\partial \mu}{\partial \beta_k}\, W \left(\frac{\partial \eta}{\partial \mu}\right) x_j.$$

But

$$\frac{\partial \mu}{\partial \beta_k} = \frac{\partial \mu}{\partial \eta} \cdot \frac{\partial \eta}{\partial \beta_k} = \frac{\partial \mu}{\partial \eta}\, x_k.$$

Therefore,

$$I_{jk} = -(y - \mu) \frac{\partial}{\partial \beta_k}\left\{W \left(\frac{\partial \eta}{\partial \mu}\right) x_j\right\} + x_j W x_k.$$

Taking expectations we get

$$E(I_{jk}) = E\left(-\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k}\right)
= -\frac{\partial}{\partial \beta_k}\left\{W \left(\frac{\partial \eta}{\partial \mu}\right) x_j\right\} E(y - \mu) + x_j W x_k.$$

Notice that the first term vanishes since $E(y - \mu) = 0$ by definition. Then, for $n$ observations we can write the expected information as

$$I_{jk} = \sum_{i=1}^n x_{ij} W_i x_{ik} = (XWX')_{jk},$$

where again $W$ is the diagonal matrix $W = \mathrm{diag}(W_1, W_2, \ldots, W_n)$ and $W_i^{-1} = \mathrm{Var}(Y_i)(\partial \eta_i/\partial \mu_i)^2$. The Fisher scoring method operates by utilizing

$$\hat{\beta}^{(r+1)} = \hat{\beta}^{(r)} + I^{-1}(\hat{\beta}^{(r)})\, S(\hat{\beta}^{(r)}),$$

where $I$ is the expected information matrix given above, as opposed to the observed information matrix which is used in the Newton-Raphson algorithm.

1.3 Iteratively Re-weighted Least Squares

Overview

Why is this called iteratively re-weighted least squares? It is because of the following manipulation:

$$\hat{\beta}^{(r+1)} = \hat{\beta}^{(r)} + I^{-1}(\hat{\beta}^{(r)}) S(\hat{\beta}^{(r)})$$
$$I(\hat{\beta}^{(r)})\hat{\beta}^{(r+1)} = I(\hat{\beta}^{(r)})\hat{\beta}^{(r)} + S(\hat{\beta}^{(r)})$$
$$(XW(\hat{\beta}^{(r)})X')\hat{\beta}^{(r+1)} = (XW(\hat{\beta}^{(r)})X')\hat{\beta}^{(r)} + XW(\hat{\beta}^{(r)})\left[(y - \mu(\hat{\beta}^{(r)})) \circ (\partial \eta(\hat{\beta}^{(r)})/\partial \mu)\right]$$
$$(XW(\hat{\beta}^{(r)})X')\hat{\beta}^{(r+1)} = XW(\hat{\beta}^{(r)})\left(X'\hat{\beta}^{(r)} + (y - \mu(\hat{\beta}^{(r)})) \circ (\partial \eta(\hat{\beta}^{(r)})/\partial \mu)\right)$$

Let $z = \eta + (y - \mu) \circ \partial \eta/\partial \mu$. Then

$$\hat{\beta}^{(r+1)} = (XW(\hat{\beta}^{(r)})X')^{-1} XW(\hat{\beta}^{(r)})\, z(\hat{\beta}^{(r)}).$$

This is the same as the weighted LS estimate of $\beta$ with dependent variable $z(\hat{\beta}^{(r)})$ and weight matrix $W(\hat{\beta}^{(r)})$. Since we are updating $z$ and $W$ with each iteration, it is called re-weighted least squares, and since we have to repeat this estimation procedure until convergence, it is called iteratively re-weighted least squares.

Note:

1. Consider a Taylor series approximation of $g(y)$ about $\mu$,

$$g(y) = g(\mu) + g'(\mu)(y - \mu) + g''(\mu)(y - \mu)^2/2 + \cdots.$$

Since $g(\mu) = \eta$ and $g'(\mu) = \partial \eta/\partial \mu$, the working response $z = \eta + (y - \mu)\,\partial \eta/\partial \mu$ can be thought of as a linearized form of the link function. That is, it provides a linear approximation to the functional relationship between the mean of the distribution and the linear predictor.

2. This motivates the choice of $W$:

$$\mathrm{Var}(Z) = \left(\frac{\partial \eta}{\partial \mu}\right)^2 \mathrm{Var}(Y) = a(\phi)\, b''(\theta) \left(\frac{\partial \eta}{\partial \mu}\right)^2,$$

so the inverse variance is

$$W = (\mathrm{Var}(Z))^{-1} = \left(\frac{\partial \mu}{\partial \eta}\right)^2 (\mathrm{Var}(Y))^{-1} = \frac{1}{a(\phi)\, b''(\theta)} \left(\frac{\partial \mu}{\partial \eta}\right)^2.$$
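
The update above is short enough to implement directly. The following R sketch (the function and variable names are our own, and $X$ is stored in the usual row-per-observation form, so the formulas use its transpose) carries out the IRLS iteration for binomial responses with the logit link, for which $\partial\eta/\partial\mu = 1/\{m_i\pi_i(1-\pi_i)\}$, so that $W_i = m_i\pi_i(1-\pi_i)$ and $z_i = \eta_i + (y_i - m_i\pi_i)/W_i$.

irls_logit <- function(X, y, m, maxit = 25, tol = 1e-10) {
  beta <- rep(0, ncol(X))
  for (r in seq_len(maxit)) {
    eta <- drop(X %*% beta)
    pi  <- 1 / (1 + exp(-eta))     # inverse of the logit link
    W   <- m * pi * (1 - pi)       # W_i = m_i pi_i (1 - pi_i); here also Var(Y_i)
    z   <- eta + (y - m * pi) / W  # working response z = eta + (y - mu) deta/dmu
    beta_new <- solve(t(X) %*% (W * X), t(X) %*% (W * z))  # weighted LS step
    if (max(abs(beta_new - beta)) < tol) return(drop(beta_new))
    beta <- beta_new
  }
  drop(beta)
}

On binomial data such as the prenatal care example of Chapter 2, with X = cbind(1, clinic, loc), this converges in a handful of iterations to the same estimates reported by the glm function.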

When is the Fisher Scoring Method Equivalent to Newton-Raphson?

This question can be equivalently re-phrased as: when is the expected information matrix the same as the observed information matrix? Recall

$$I_{jk} = -\frac{\partial^2 \ell}{\partial \beta_j \partial \beta_k} = -(y - \mu)\frac{\partial}{\partial \beta_k}\left\{W \left(\frac{\partial \eta}{\partial \mu}\right) x_j\right\} + x_j W x_k.$$

Consider the first term of the above expression for the observed information matrix. Recall that

$$V = b''(\theta) = \partial b'(\theta)/\partial \theta = \partial \mu/\partial \theta.$$

Then

$$W = \frac{1}{a(\phi)\, V}\left(\frac{\partial \mu}{\partial \eta}\right)^2
= \frac{1}{a(\phi)} \cdot \frac{1}{\partial \mu/\partial \theta}\left(\frac{\partial \mu}{\partial \eta}\right)\left(\frac{\partial \mu}{\partial \eta}\right)
= \frac{1}{a(\phi)} \cdot \frac{\partial \theta}{\partial \mu}\left(\frac{\partial \mu}{\partial \eta}\right)\left(\frac{\partial \mu}{\partial \eta}\right)
= \frac{1}{a(\phi)}\left(\frac{\partial \mu}{\partial \eta}\right) \quad \text{with the canonical link } \theta = \eta.$$

Therefore, with the canonical link,

$$W\left(\frac{\partial \eta}{\partial \mu}\right) x_j = x_j/a(\phi),$$

and since

$$\frac{\partial}{\partial \beta_k}\left\{x_j/a(\phi)\right\} = 0,$$

the expected information matrix equals the observed information matrix. Hence, there is no difference between the Newton-Raphson algorithm and the Fisher scoring algorithm. The difference arises when using other (non-canonical) link functions!

Questions

Problem 1.1.

Suppose a population is divided into $n$ strata, and we sample $m_i$ subjects from the $i$th stratum, $i = 1, \ldots, n$. Let $Y_{ij} \sim N(\mu_i, \phi)$ (independently distributed) denote the response for the $j$th subject in the $i$th stratum of the sample, $j = 1, \ldots, m_i$, $i = 1, \ldots, n$. Suppose all that is available are the sample means for each stratum, $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$, and an associated $p \times 1$ covariate vector $x_i = (x_{i0}, x_{i1}, \ldots, x_{i,p-1})'$, where $x_{i0} = 1$, $i = 1, \ldots, n$. Then

$$f(\bar{y}_i; \mu_i, \phi) = \sqrt{\frac{m_i}{2\pi\phi}} \exp\{-m_i(\bar{y}_i - \mu_i)^2/(2\phi)\},$$

where $-\infty < \bar{y}_i < \infty$, $-\infty < \mu_i < \infty$, and $\phi > 0$.

a. Show that the distribution of $\bar{Y}_i$ belongs to the exponential family and find the functions $a_i(\cdot)$, $b(\cdot)$, $c(\cdot\,;\cdot)$, $E(\bar{Y}_i)$, $\mathrm{Var}(\bar{Y}_i)$, and the canonical link function $g(\mu) = \eta$.

b. Given the data $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$ and the linear predictor $\eta_i = x_i'\beta$, find the specific form of the score vector and information matrix for $\beta$ and explain how you would obtain maximum likelihood estimates of $\beta_0, \beta_1, \ldots, \beta_{p-1}$.

c. Relate the Newton-Raphson algorithm to any other method of model fitting you may have seen before.

Problem 1.2.

Consider a setting where $Y_{i1}, \ldots, Y_{im_i}$ are $m_i$ independently distributed Poisson random variables with $Y_{ij} \sim \mathrm{Poisson}(\mu_{ij})$, $j = 1, \ldots, m_i$, $i = 1, \ldots, n$. Moreover, assume that the Poisson counts are generated by a time-homogeneous Poisson process with $\mu_{ij} = \lambda_i t_{ij}$, where $\lambda_i$ is an underlying rate assumed to be common to $Y_{i1}, \ldots, Y_{im_i}$ and $t_{ij}$ is the duration of observation leading to the count $y_{ij}$, $j = 1, \ldots, m_i$, $i = 1, \ldots, n$. Finally, assume that associated with $Y_{i1}, \ldots, Y_{im_i}$ is a $p \times 1$ covariate vector $x_i = (x_{i0}, x_{i1}, \ldots, x_{i,p-1})'$, where $x_{i0} = 1$, $i = 1, \ldots, n$, and let $\beta = (\beta_0, \ldots, \beta_{p-1})'$.

a. Write down the likelihood for the rates $\lambda = (\lambda_1, \ldots, \lambda_n)'$.

b. Show that the distribution of $Y_i = \sum_{j=1}^{m_i} Y_{ij}$ belongs to the exponential family and hence find the functions $a(\cdot)$, $b(\cdot)$, $c(\cdot\,;\cdot)$, $E(Y_i)$, $\mathrm{Var}(Y_i)$, and the canonical link function $g(\mu) = \eta$.

c. Given summary data $y_1, y_2, \ldots, y_n$ and the linear predictors $\eta_i = x_i'\beta$, find the specific form of an entry of the score vector (i.e. $\partial \ell/\partial \beta_j$) and the expected information matrix (i.e. $-E(\partial^2 \ell/\partial \beta_j \partial \beta_k)$) under the canonical link.

d. Briefly describe how to obtain maximum likelihood estimates of $\beta_0, \beta_1, \ldots, \beta_{p-1}$ using a Fisher scoring algorithm.

Problem 1.3.

Let $Y_1, Y_2, \ldots, Y_n$ be independent Poisson random variables with means $\mu_1, \mu_2, \ldots, \mu_n$ respectively, and let $Y = (Y_1, \ldots, Y_n)'$. Associated with each $Y_i$ is a covariate vector $x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$ of length $p$.

a. Show that $\eta_i = \log \mu_i$ is the natural parameter of the Poisson distribution.

b. Find the score vector for $\beta$.

c. Find the observed and expected information matrix for $\beta$ and hence show how to obtain the MLE for $\beta$.

d. FOR STAT 831 ONLY: Show that $T = XY$ is a vector of sufficient statistics for $\beta$, where $X$ is the $p \times n$ matrix with columns comprised of the covariate vectors.

Problem 1.4.

Suppose you observe a sample $y_1, y_2, \ldots, y_n$ consisting of realizations of $n$ independent Poisson random variables where $E(Y_i) = \mu_i$. Suppose that associated with $y_i$ is a $p \times 1$ vector of explanatory variables $(1, x_{i1}, x_{i2}, \ldots, x_{i,p-1})'$. A Poisson regression model with the canonical link takes the form

$$\log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1}.$$

The likelihood for the vector of regression coefficients $\beta$ is constructed by making the substitution above into the likelihood for the $n$ means $\mu_1, \ldots, \mu_n$. To answer the following questions, you may either calculate the derivatives using standard methods, or use the general results given in class for the exponential family.

a. Write down the score vector for the regression coefficients $\beta_i$, $i = 0, \ldots, p-1$.

b. Write down the observed and expected information matrix for the regression coefficients. Are they the same or different? Why?

c. What is the form of the weight function? What types of observations will have the largest and smallest weights?

Problem 1.5.

Suppose $y_{i1}, y_{i2}, \ldots, y_{im_i}$ are observations from a Gaussian distribution with mean $\mu_i$ and variance $\sigma^2$, $i = 1, 2, \ldots, I$. Associated with each $y_i = (y_{i1}, y_{i2}, \ldots, y_{im_i})'$ is a vector of explanatory variables $x_i = (x_{i0}, x_{i1}, \ldots, x_{i,p-1})'$.

a. Show that $\bar{y}_i = \sum_{j=1}^{m_i} y_{ij}/m_i$ is sufficient for $\mu_i$.

b. Show that the distribution of $\bar{y}_i$ is in the exponential family and identify the parameters $\theta_i$ and $\phi$, and the functions $a_i(\phi)$, $b(\theta_i)$, and $c(y_i; \phi)$.

c. If we want to set up a regression model, what is the canonical link?

d. Write down the score and information function and indicate connections between the Fisher scoring/Newton-Raphson iterations and another method for estimating regression parameters that you've encountered before.

Problem 1.6.

Consider the table below, which summarizes data from two samples with independent binomial responses for each.

                     Outcome
          Present      Absent             Total
Group 1   y            m_1 - y            m_1
Group 2   t - y        m_2 - t + y        m_2
Total     t            m_. - t            m_.

The conditional distribution of $y$ given $t$, the first column total (and $m_1$ and $m_2$), is

$$f(y \mid t, m_1, m_2) = \frac{\binom{m_1}{y}\binom{m_2}{t - y} \exp\{y\alpha\}}{\sum_{v \in S} \binom{m_1}{v}\binom{m_2}{t - v} \exp\{v\alpha\}},$$

where $S = \{v : \max(0, t - m_2) \le v \le \min(m_1, t)\}$.

a. Show that this distribution belongs to the exponential family and hence find the canonical parameter, the functions $a(\cdot)$, $b(\cdot)$, $c(\cdot\,;\cdot)$, $E(Y)$, $\mathrm{Var}(Y)$ and the canonical link function.

b. Suppose now that we have a series of $n$ independent $2 \times 2$ tables of the sort above and let $x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$ denote a $p \times 1$ vector of explanatory variables for the $i$th table. Introducing subscripts to distinguish data from different tables, we then summarize the data from table $i$ as

                       Outcome
          Present        Absent                 Total
Group 1   y_i            m_{1i} - y_i           m_{1i}
Group 2   t_i - y_i      m_{2i} - t_i + y_i     m_{2i}
Total     t_i            m_{.i} - t_i           m_{.i}

The conditional distribution of $y_i$ given $t_i$, the first column total (and $m_{1i}$ and $m_{2i}$), is

$$f(y_i \mid t_i, m_{1i}, m_{2i}) = \frac{\binom{m_{1i}}{y_i}\binom{m_{2i}}{t_i - y_i} \exp\{y_i \alpha_i\}}{\sum_{v \in S_i} \binom{m_{1i}}{v}\binom{m_{2i}}{t_i - v} \exp\{v\alpha_i\}},$$

with $S_i = \{v : \max(0, t_i - m_{2i}) \le v \le \min(m_{1i}, t_i)\}$. Given the data $y_1, y_2, \ldots, y_n$ and the linear predictor $\eta_i = x_i'\beta$ where $\beta = (\beta_0, \ldots, \beta_{p-1})'$, find the specific form of the score and information function and explain how you would obtain maximum likelihood estimates of $\beta_0, \beta_1, \ldots, \beta_{p-1}$.

Problem 1.7.

Consider $n$ $2 \times 2$ tables where the $i$th table is given by

          Success        Failure                Total
Group 1   y_i            m_{i1} - y_i           m_{i1}
Group 2   t_i - y_i      m_{i2} - t_i + y_i     m_{i2}
Total     t_i            m_i - t_i              m_i

a. Derive the conditional distribution of $Y_i$ given $m_{i1}$, $m_{i2}$ and $t_i$, and show that this belongs to the exponential family of distributions. Find the canonical parameter and hence find the conditional mean and variance of $Y_i$.

b. Suppose that associated with the $i$th table is a $p \times 1$ covariate vector $x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$, $i = 1, \ldots, n$. Use the canonical link and the systematic component $\eta = X'\beta$, where $X$ is the $p \times n$ matrix with column vectors $x_1, \ldots, x_n$. Find the score vector and the observed and expected information matrix for $\beta$, and hence show how to obtain the MLE of $\beta$.

Problem 1.8.

Suppose a population is divided into $n$ strata, and we sample $m_i$ subjects from the $i$th stratum, $i = 1, \ldots, n$. Let $Y_{ij} \sim \mathrm{Bernoulli}(\pi_i)$ (independently distributed) denote the response for the $j$th subject in the $i$th stratum of the sample, so that $P(Y_{ij} = 1) = \pi_i$ and $P(Y_{ij} = 0) = 1 - \pi_i$, $j = 1, \ldots, m_i$, $i = 1, \ldots, n$. Suppose that the responses available from the strata are simply the means $\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n$ (where $\bar{y}_i = \sum_{j=1}^{m_i} y_{ij}/m_i$, $i = 1, \ldots, n$), and that the associated sample sizes, $m_1, \ldots, m_n$, are also available.

a. Show that the distribution of $\bar{y}_i$ belongs to the exponential family by identifying the functions $a(\cdot)$, $b(\cdot)$ and $c(\cdot\,;\cdot)$, obtain $E(\bar{Y}_i)$ and $\mathrm{Var}(\bar{Y}_i)$, and name the canonical link function.

b. Suppose that associated with $\bar{y}_i$ is a $p \times 1$ covariate vector $x_i = (1, x_{i1}, \ldots, x_{i,p-1})'$, $i = 1, \ldots, n$. If $\beta = (\beta_0, \ldots, \beta_{p-1})'$ is a $p \times 1$ vector of regression coefficients, let $\eta_i = x_i'\beta$ denote the linear predictor for stratum $i$, $i = 1, \ldots, n$. Find the specific form of the score vector and information matrix and explain how you would obtain the maximum likelihood estimate of $\beta$, which we denote as $\hat{\beta}$.

c. What are the mean vector and covariance matrix for the asymptotic distribution of the MLE $\hat{\beta}$?

d. STAT 831 ONLY: Is there any information lost about the vector of regression coefficients $\beta$ when only the sample means and the sample sizes are available from each stratum (as opposed to when the individual subjects' responses are available)? Explain.

2 Basic Methods for the Analysis of Binary Data

2.1 Introduction

Binary responses generally require different analysis techniques than those considered so far in regression courses. Examples of binary responses include disease status (diseased/not diseased) and survival status (dead/alive). In addition to having such a binary response, we often have a single binary covariate of interest. Examples include treatment (experimental/control) or exposure (exposed to radiation/not exposed to radiation) variables. To summarize the data, we might construct a $2 \times 2$ table as follows.

Table 2.1. A $2 \times 2$ Table

                     Disease
          Present      Absent           Total
Group 1   y_1          m_1 - y_1        m_1
Group 2   y_2          m_2 - y_2        m_2
Total     y_.          m_. - y_.        m_.

If we have $m_1$ and $m_2$ fixed, we typically assume we have two independent binomial samples with $Y_k \sim \mathrm{Bin}(m_k, \pi_k)$, $k = 1, 2$.

There are many measures of association one can consider for such tables, but here we will focus on the odds ratio. The odds of one event versus another is simply the ratio of their respective probabilities. Therefore, the odds of disease versus no disease in Group 1 is $\pi_1/(1 - \pi_1)$; we sometimes just refer to this as the odds of disease in Group 1. In this context, the odds is a 1-1 monotonically increasing function of $\pi_1$ which takes values on the non-negative real line. The odds of disease in Group 2 is $\pi_2/(1 - \pi_2)$. The odds ratio reflecting the relative odds of disease in Group 1 versus Group 2 is then

$$\psi = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}.$$

Note that in the case of a rare disease (i.e. when $\pi_1$ and $\pi_2$ are very small), $\psi$ is close to the relative risk, $\pi_1/\pi_2$. This can be seen by noting that

$$\psi = \frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)} = \frac{\pi_1}{\pi_2}\left(\frac{1 - \pi_2}{1 - \pi_1}\right).$$

When $\pi_1$ and $\pi_2$ are both small, the fraction in parentheses is close to 1.
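
As a quick numerical illustration (the probabilities are chosen arbitrarily), suppose $\pi_1 = 0.010$ and $\pi_2 = 0.005$; the R lines below show that the odds ratio and the relative risk nearly coincide.

p1 <- 0.010; p2 <- 0.005
(p1 / (1 - p1)) / (p2 / (1 - p2))   # odds ratio: about 2.01
p1 / p2                             # relative risk: 2.00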

2.2 Estimation of the Odds Ratio

We would like to use likelihood theory to estimate $\psi$ and therefore need to construct an appropriate likelihood function. Note that

$$\Pr(Y_1 = y_1, Y_2 = y_2) = \binom{m_1}{y_1} \pi_1^{y_1}(1 - \pi_1)^{m_1 - y_1} \binom{m_2}{y_2} \pi_2^{y_2}(1 - \pi_2)^{m_2 - y_2},$$

so

$$L(\pi_1, \pi_2) = \pi_1^{y_1}(1 - \pi_1)^{m_1 - y_1}\, \pi_2^{y_2}(1 - \pi_2)^{m_2 - y_2}
= \left(\frac{\pi_1}{1 - \pi_1}\right)^{y_1} (1 - \pi_1)^{m_1} \left(\frac{\pi_2}{1 - \pi_2}\right)^{y_2} (1 - \pi_2)^{m_2}
= \left(\frac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}\right)^{y_1} \left(\frac{\pi_2}{1 - \pi_2}\right)^{y_2 + y_1} (1 - \pi_1)^{m_1} (1 - \pi_2)^{m_2}.$$

We want to reparameterize to get rid of $\pi_1$, and so note that if $\psi = [\pi_1/(1 - \pi_1)]/[\pi_2/(1 - \pi_2)]$ then $\pi_1 = \psi\pi_2/[1 - \pi_2 + \psi\pi_2]$. Substituting into the above likelihood we get

$$L(\psi, \pi_2) = \psi^{y_1} \left(\frac{\pi_2}{1 - \pi_2}\right)^{y_2 + y_1} \left[\frac{1 - \pi_2}{1 - \pi_2 + \psi\pi_2}\right]^{m_1} (1 - \pi_2)^{m_2}.$$

Now that we have a likelihood involving the parameter of interest, we can consider further reparameterization to enable us to obtain Wald-type quantities for inference. Wald-type quantities are most appealing when the corresponding parameters are unrestricted (i.e. the parameter space is the real line). Therefore consider reparameterizing to $\beta = \log \psi$ and $\alpha = \log(\pi_2/(1 - \pi_2))$. Here we get

$$L(\alpha, \beta) = e^{y_1\beta} (1 + e^{\alpha + \beta})^{-m_1}\, e^{y_.\alpha} (1 + e^{\alpha})^{-m_2},$$

where $y_. = y_1 + y_2$. Then the log-likelihood is

$$\ell(\alpha, \beta) = y_1\beta - m_1 \log(1 + e^{\alpha + \beta}) + y_.\alpha - m_2 \log(1 + e^{\alpha}).$$

Differentiating with respect to $\alpha$ and $\beta$ we get

$$S_\alpha(\alpha, \beta) = \frac{\partial \ell(\alpha, \beta)}{\partial \alpha} = y_. - \frac{m_1 e^{\alpha + \beta}}{1 + e^{\alpha + \beta}} - \frac{m_2 e^{\alpha}}{1 + e^{\alpha}}$$

$$S_\beta(\alpha, \beta) = \frac{\partial \ell(\alpha, \beta)}{\partial \beta} = y_1 - \frac{m_1 e^{\alpha + \beta}}{1 + e^{\alpha + \beta}}$$

If $S(\alpha, \beta) = (S_\alpha(\alpha, \beta), S_\beta(\alpha, \beta))'$, solving $S(\alpha, \beta) = 0$ gives

$$\hat{\alpha} = \log\left(\frac{y_2}{m_2 - y_2}\right) \qquad \text{and} \qquad \hat{\beta} = \log\left(\frac{y_1/(m_1 - y_1)}{y_2/(m_2 - y_2)}\right).$$

These estimates are natural since they imply $\hat{\pi}_2 = e^{\hat{\alpha}}/(1 + e^{\hat{\alpha}}) = y_2/m_2$, $\hat{\pi}_1 = y_1/m_1$ and $\hat{\psi} = [\hat{\pi}_1/(1 - \hat{\pi}_1)]/[\hat{\pi}_2/(1 - \hat{\pi}_2)]$.

Note that

$$I_{\alpha\alpha} = m_1\frac{(1 + e^{\alpha+\beta})e^{\alpha+\beta} - e^{\alpha+\beta}e^{\alpha+\beta}}{(1 + e^{\alpha+\beta})^2} + m_2\frac{(1 + e^{\alpha})e^{\alpha} - e^{2\alpha}}{(1 + e^{\alpha})^2}
= \frac{m_1 e^{\alpha+\beta}}{(1 + e^{\alpha+\beta})^2} + \frac{m_2 e^{\alpha}}{(1 + e^{\alpha})^2},$$

$$I_{\alpha\beta} = m_1\frac{(1 + e^{\alpha+\beta})e^{\alpha+\beta} - e^{\alpha+\beta}e^{\alpha+\beta}}{(1 + e^{\alpha+\beta})^2} = \frac{m_1 e^{\alpha+\beta}}{(1 + e^{\alpha+\beta})^2},$$

$$I_{\beta\beta} = m_1\frac{(1 + e^{\alpha+\beta})e^{\alpha+\beta} - e^{\alpha+\beta}e^{\alpha+\beta}}{(1 + e^{\alpha+\beta})^2} = \frac{m_1 e^{\alpha+\beta}}{(1 + e^{\alpha+\beta})^2}.$$

We are interested in the $(\beta, \beta)$ entry of $I^{-1}$, which we denote by $I^{\beta\beta}$ as in Section 1.2. This is given by

$$I^{\beta\beta}(\alpha, \beta) = [I_{\beta\beta} - I_{\beta\alpha} I_{\alpha\alpha}^{-1} I_{\alpha\beta}]^{-1},$$

and here we obtain

$$I^{\beta\beta}(\alpha, \beta) = \frac{1}{E(Y_1)} + \frac{1}{E(m_1 - Y_1)} + \frac{1}{E(Y_2)} + \frac{1}{E(m_2 - Y_2)},$$

which we evaluate at $(\hat{\alpha}, \hat{\beta})$ to give

$$I^{\beta\beta}(\hat{\alpha}, \hat{\beta}) = \frac{1}{y_1} + \frac{1}{m_1 - y_1} + \frac{1}{y_2} + \frac{1}{m_2 - y_2}.$$

Proof: First note that we can write $I_{\alpha\alpha} = m_1\pi_1(1 - \pi_1) + m_2\pi_2(1 - \pi_2)$, $I_{\alpha\beta} = I_{\beta\alpha} = m_1\pi_1(1 - \pi_1)$ and $I_{\beta\beta} = m_1\pi_1(1 - \pi_1)$. Then

$$[I^{-1}]_{\beta\beta} = I^{\beta\beta} = [I_{\beta\beta} - I_{\beta\alpha} I_{\alpha\alpha}^{-1} I_{\alpha\beta}]^{-1}
= \left[m_1\pi_1(1 - \pi_1) - \frac{(m_1\pi_1(1 - \pi_1))^2}{m_1\pi_1(1 - \pi_1) + m_2\pi_2(1 - \pi_2)}\right]^{-1}$$
$$= \left[\frac{m_1\pi_1(1 - \pi_1)\, m_2\pi_2(1 - \pi_2)}{m_1\pi_1(1 - \pi_1) + m_2\pi_2(1 - \pi_2)}\right]^{-1}
= \frac{1}{m_1\pi_1(1 - \pi_1)} + \frac{1}{m_2\pi_2(1 - \pi_2)}$$
$$= \frac{1}{m_1\pi_1} + \frac{1}{m_1(1 - \pi_1)} + \frac{1}{m_2\pi_2} + \frac{1}{m_2(1 - \pi_2)}
= \frac{1}{E(Y_1)} + \frac{1}{E(m_1 - Y_1)} + \frac{1}{E(Y_2)} + \frac{1}{E(m_2 - Y_2)}.$$

Given this result we may obtain a Wald-type approximate 95% CI for $\beta$ as

$$\left(\hat{\beta} - 1.96\left[\frac{1}{y_1} + \frac{1}{m_1 - y_1} + \frac{1}{y_2} + \frac{1}{m_2 - y_2}\right]^{1/2},\ \hat{\beta} + 1.96\left[\frac{1}{y_1} + \frac{1}{m_1 - y_1} + \frac{1}{y_2} + \frac{1}{m_2 - y_2}\right]^{1/2}\right),$$

which we denote as $(\hat{\beta}_L, \hat{\beta}_U)$. An approximate 95% CI for $\psi$ is then given by $(e^{\hat{\beta}_L}, e^{\hat{\beta}_U})$. We are not typically that interested in $\alpha$ or $\pi_2$ in such $2 \times 2$ tables.
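
The point estimate and Wald interval above are simple enough to wrap in a few lines of R; in this small helper (the function name or_wald and its arguments are our own), the inputs are the entries of a table laid out as in Table 2.1.

or_wald <- function(y1, m1, y2, m2) {
  beta.hat <- log((y1 / (m1 - y1)) / (y2 / (m2 - y2)))      # log odds ratio
  se <- sqrt(1/y1 + 1/(m1 - y1) + 1/y2 + 1/(m2 - y2))       # Wald standard error
  exp(c(psi.hat = beta.hat,
        lower = beta.hat - 1.96 * se,
        upper = beta.hat + 1.96 * se))
}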

Note that, given the likelihood and the asymptotic (large sample) results in Section 1.2, we could use the likelihood itself to conduct inference about $\beta$ and hence $\psi$. The Wald-type pivotal used here is much more convenient, and since the range of values for $\beta$ is unrestricted the results will generally agree very closely.

2.3 Multiple Regression for Binary Responses

The results of the preceding section were directed at the case with a single factor variable with two levels and a binary response. This is a simple setting, but more often we need multiple regression methodology since we may

a. want to be able to control for confounding variables and hence want to examine the effect of several (possibly related, collinear) variables simultaneously,

b. want to examine the effect of categorical covariates (more than 2 levels) or continuous covariates,

c. want to develop sophisticated models that describe complex relationships.

Example: Consider the data in the following table, which describes the relationship between the level of prenatal care and fetal mortality. The data arose from two clinics, which we refer to as Clinic A and Clinic B (not their real names!).

Table 2.2. Prenatal Care Data from Two Clinics

            Died   Survived   Total
Intensive    20      316       336
Regular      46      373       419

Here we obtain $\hat{\psi} = [\hat{\pi}_1/(1 - \hat{\pi}_1)]/[\hat{\pi}_2/(1 - \hat{\pi}_2)] = (20/316)/(46/373) = 0.51$ and a 95% CI for $\psi$ of (0.30, 0.89). This suggests a strong association between level of prenatal care and fetal mortality.

However, if we consider data from just those subjects who are at Clinic A, we get the following table.

Table 2.3. Prenatal Care Data from Patients at Clinic A

            Died   Survived
Intensive    16      293
Regular      12      176

This gives $\hat{\psi} = (16 \times 176)/(12 \times 293) = 0.80$ with a 95% CI for $\psi$ of (0.37, 1.73). While the odds ratio estimate is in the direction of a protective effect with intensive prenatal care, the confidence interval is quite wide and includes values above one (which correspond to an increased risk of mortality).

Now we consider the corresponding data from Clinic B.

Table 2.4. Prenatal Care Data from Patients at Clinic B

            Died   Survived
Intensive     4       23
Regular      34      197

This gives $\hat{\psi} = (4 \times 197)/(34 \times 23) = 1.01$ with a 95% CI for $\psi$ of (0.33, 3.10). Note that the reduction in the odds of fetal mortality due to intensive prenatal care, which we observed in the pooled data, appears to have vanished for patients in Clinic B, and we found above that the estimate of benefit is considerably smaller (and no longer significant) among patients in Clinic A.

For further investigation, we examine the relationship between clinic and level of care.

Table 2.5. The Association Between Clinic and Level of Care

            Clinic A   Clinic B
Intensive     309         27
Regular       188        231

Here we get $\hat{\psi} = (309 \times 231)/(188 \times 27) = 14.06$ with a 95% CI for $\psi$ of (9.12, 21.76). This suggests a very strong and statistically significant relationship between clinic and the intensity of prenatal care. Specifically, we can see that the proportion of patients in Clinic A receiving intensive prenatal care is considerably higher than it is for patients in Clinic B.

The following table displays the relationship between clinic and mortality.

Table 2.6. The Association Between Clinic and Mortality Rate

            Clinic A   Clinic B
Died           28         38
Survived      469        220

Here we obtain $\hat{\psi} = 0.35$ with a 95% CI of (0.21, 0.58). This suggests that there is a statistically significantly (significant at the 5% level, since the 95% confidence interval does not include one) higher rate of mortality in Clinic B. Finally, we can tabulate the relationship between clinic and mortality stratified by level of care. We do that in Table 2.7 below, and we will return to this example shortly.

To summarize what we found here: there is an apparent strong association between level of prenatal care and fetal mortality. When stratifying by clinic, evidence of this apparent association is greatly reduced. When we stratify by level of prenatal care, there is a reduced risk of mortality for patients in Clinic A versus Clinic B. We aim to study how these findings might be reflected in a regression model.
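
The pooled and stratified odds ratios above can be reproduced with the or_wald() helper sketched in Section 2.2, with the counts taken from Tables 2.2-2.4:

or_wald(20, 336, 46, 419)   # pooled:   0.51 (0.30, 0.89)
or_wald(16, 309, 12, 188)   # Clinic A: 0.80 (0.37, 1.73)
or_wald( 4,  27, 34, 231)   # Clinic B: 1.01 (0.33, 3.10)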

Table 2.7. Stratified Tabulation of Clinic Effects

               Intensive           Regular
           Died   Survived     Died   Survived   Total
Clinic A    16      293         12      176       497
Clinic B     4       23         34      197       258

2.4 Setting Up a Binomial Regression Model

Introduction and Notation

Let $x_1, x_2, \ldots, x_{p-1}$ be a set of $p - 1$ explanatory variables and $x = (1, x_1, x_2, \ldots, x_{p-1})'$ be a $p \times 1$ vector of explanatory variables. Let $\beta = (\beta_0, \beta_1, \ldots, \beta_{p-1})'$ be a $p \times 1$ vector of parameters. The scalar quantity

$$\eta = x'\beta = \beta_0 + \beta_1 x_1 + \cdots + \beta_{p-1} x_{p-1}$$

is called the linear predictor. Let $x_i = (1, x_{i1}, x_{i2}, \ldots, x_{i,p-1})'$ be the vector of covariates for the $i$th subject, $i = 1, 2, \ldots, n$. Define the $p \times n$ matrix

$$X = (x_1, x_2, \ldots, x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_{11} & x_{21} & \cdots & x_{n1} \\ \vdots & \vdots & & \vdots \\ x_{1,p-1} & x_{2,p-1} & \cdots & x_{n,p-1} \end{pmatrix}.$$

Then the vector of linear predictors is given by

$$\eta = (\eta_1, \ldots, \eta_n)' = X'\beta.$$

Recall that in the context of the Gaussian linear model the $Y_i \sim N(\mu_i, \sigma^2)$ are independent and we set

$$E(Y_i) = \mu_i = \eta_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_{p-1} x_{i,p-1}.$$

Now consider binomial data with $Y_i \sim \mathrm{Bin}(m_i, \pi_i)$. We might think of a regression model of the form

$$E(Y_i/m_i) = \pi_i = \eta_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1}.$$

Is this a reasonable model? It is not particularly convenient to work with, since we have to impose constraints on the right-hand side because $0 \le \pi_i \le 1$, $i = 1, 2, \ldots, n$. Therefore, rather than working with $\pi_i$ directly, we work with a function of it. The so-called link function defines such a transformation, which typically maps $[0, 1] \to (-\infty, +\infty)$. We denote the link function by $g(\pi)$.

Name of Link Function    Expression
Identity                 $g(\pi) = \pi$
Log-log                  $g(\pi) = \log(-\log(\pi))$
Probit                   $g(\pi) = \Phi^{-1}(\pi)$
Logit                    $g(\pi) = \log(\pi/(1 - \pi))$

Here $\Phi$ is the c.d.f. of a standard normal random variable. Having selected the logit link, our regression model takes the form $g(\pi) = \eta$. Introducing the subscript $i$ to distinguish individuals, we have

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1}.$$

The Logit Link and Odds Ratios

Let $Y_1$ and $Y_2$ denote the number of individuals with the outcome in Groups 1 and 2 respectively, with $Y_k \sim \mathrm{Bin}(m_k, \pi_k)$, $k = 1, 2$. Let $x_i = 1$ if the $i$th individual is in Group 1 and $x_i = 0$ otherwise. We now consider a model for each individual's response (which is binary, not binomial) of the following form:

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i.$$

The log odds for a subject $i$ in Group 1 is

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1,$$

and the log odds for a subject $j$ in Group 2 is

$$\log\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0.$$

This implies that

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) - \log\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_1,$$

which means $\log \psi = \beta_1$, where $\psi$ is the odds ratio comparing the odds of an event for a subject in Group 1 versus a subject in Group 2. Therefore, the regression coefficient from this logistic model may be interpreted as a log odds ratio describing the association between group membership and the outcome.

Frequently we are interested in the parameter $\pi$ itself. In this case note that

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_i \quad \Longrightarrow \quad \frac{\pi_i}{1 - \pi_i} = e^{\beta_0 + \beta_1 x_i} \quad \Longrightarrow \quad \pi_i = \frac{e^{\beta_0 + \beta_1 x_i}}{1 + e^{\beta_0 + \beta_1 x_i}}.$$

In a Gaussian model, given $\hat{\beta}$, the fitted value for $E(Y_i)$ is $\hat{\mu}_i(x_i) = x_i'\hat{\beta}$. In this binomial regression model, the fitted value for $E(Y_i/m_i)$ is

$$\hat{\pi}_i = \hat{\pi}(x_i) = \frac{e^{\hat{\beta}_0 + \hat{\beta}_1 x_{i1}}}{1 + e^{\hat{\beta}_0 + \hat{\beta}_1 x_{i1}}}.$$

More generally, these fitted values may be written as

$$\hat{\pi}_i = \hat{\pi}(x_i) = \frac{\exp(x_i'\hat{\beta})}{1 + \exp(x_i'\hat{\beta})}.$$

Now consider the case with two binary explanatory variables. Let

$$x_{i1} = \begin{cases} 1 & \text{if factor A present} \\ 0 & \text{otherwise} \end{cases} \qquad
x_{i2} = \begin{cases} 1 & \text{if factor B present} \\ 0 & \text{otherwise} \end{cases} \qquad
x_{i3} = \begin{cases} 1 & \text{if A and B present} \\ 0 & \text{otherwise} \end{cases}$$

Consider the model

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$$

and interpret the effect of $x_{i1}$. First we compute the log odds when factors A and B are both present, where $x_i = (1, 1, 1)'$. Here we get

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 + \beta_2.$$

Then we get the log odds when factor A is absent but B is present by noting that $x_j = (1, 0, 1)'$ and

$$\log\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0 + \beta_2.$$

Taking the difference of these log odds gives

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) - \log\left(\frac{\pi_j}{1 - \pi_j}\right) = \log\left(\frac{\pi_i(1 - \pi_j)}{\pi_j(1 - \pi_i)}\right) = \beta_1.$$

Therefore $\beta_1$ is again the log odds ratio reflecting the effect of factor A, but this time we are controlling for factor B and are specifying that factor B is present. Note that the effect of factor A is the same regardless of the level of B. To see this we examine the log odds when factor A is present and B absent, where $x_i = (1, 1, 0)'$ and

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1,$$

and the log odds when factor A is absent and B is absent, where $x_j = (1, 0, 0)'$ and

$$\log\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0.$$

Again the difference gives

$$\log\left(\frac{\pi_i/(1 - \pi_i)}{\pi_j/(1 - \pi_j)}\right) = \beta_1.$$

Now consider the model

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3},$$

where we have introduced an interaction term. The log odds when factors A and B are both present is obtained by noting that $x_i = (1, 1, 1, 1)'$, so

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 + \beta_2 + \beta_3.$$

When factor A is absent and B is present, $x_j = (1, 0, 1, 0)'$, giving

$$\log\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0 + \beta_2.$$

Taking the difference again we find

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) - \log\left(\frac{\pi_j}{1 - \pi_j}\right) = \log\left(\frac{\pi_i(1 - \pi_j)}{\pi_j(1 - \pi_i)}\right) = \beta_1 + \beta_3.$$

The log odds when factor A is present and B is absent arises from $x_i = (1, 1, 0, 0)'$, giving

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1,$$

and the log odds when factor A is absent and B is absent arises from $x_j = (1, 0, 0, 0)'$, giving

$$\log\left(\frac{\pi_j}{1 - \pi_j}\right) = \beta_0, \qquad \text{and hence} \qquad \log\left(\frac{\pi_i(1 - \pi_j)}{\pi_j(1 - \pi_i)}\right) = \beta_1.$$

With the interaction term, then, the effect of factor A depends on the presence or absence of factor B. If factor B is absent the log odds ratio relating A to the outcome is $\beta_1$, but if factor B is present it is $\beta_1 + \beta_3$.

Now we consider the data from the prenatal care example from before. Here we set up a regression model for the analyses of interest. Again our response is $Y_i \sim \mathrm{Bin}(m_i, \pi_i)$, $i = 1, 2, \ldots, n$, and we have explanatory variables

$$x_{i1} = \begin{cases} 1 & \text{Clinic A} \\ 0 & \text{Clinic B} \end{cases} \qquad
x_{i2} = \begin{cases} 1 & \text{intensive level of care} \\ 0 & \text{regular level of care} \end{cases} \qquad
x_{i3} = \begin{cases} 1 & \text{intensive level of care and Clinic A} \\ 0 & \text{otherwise} \end{cases}$$

Before considering the data analysis, we note that the parameters of the model can often be interpreted quickly and more easily if we write down the following tables. For a model with

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2}$$

we can write

Clinic   Level of Care   x_i           π_i/(1 − π_i)
B        regular         (1, 0, 0)'    e^{β_0}
B        intensive       (1, 0, 1)'    e^{β_0 + β_2}
A        regular         (1, 1, 0)'    e^{β_0 + β_1}
A        intensive       (1, 1, 1)'    e^{β_0 + β_1 + β_2}

where we make use of the fact that $\pi_i/(1 - \pi_i) = \exp(x_i'\beta)$. The last column reports the odds of mortality for the four combinations of the risk factors. If we divide the corresponding terms we can obtain odds ratios. For example, among those patients in Clinic B the relative odds of mortality for those with intensive versus regular care is

$$\frac{e^{\beta_0 + \beta_2}}{e^{\beta_0}} = e^{\beta_2}.$$

Among those in Clinic A the relative odds of mortality for those with intensive versus regular care is

$$\frac{e^{\beta_0 + \beta_1 + \beta_2}}{e^{\beta_0 + \beta_1}} = e^{\beta_2},$$

the same expression we got before for those in Clinic B. By similar methods we can see that the relative odds of mortality for those in Clinic A versus Clinic B is $e^{\beta_1}$, regardless of the level of care they received.

Now consider the model

$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3},$$

where now $x_i = (1, x_{i1}, x_{i2}, x_{i3})'$ and $\beta = (\beta_0, \beta_1, \beta_2, \beta_3)'$.

Clinic   Level of Care   x_i              π_i/(1 − π_i)
B        regular         (1, 0, 0, 0)'    e^{β_0}
B        intensive       (1, 0, 1, 0)'    e^{β_0 + β_2}
A        regular         (1, 1, 0, 0)'    e^{β_0 + β_1}
A        intensive       (1, 1, 1, 1)'    e^{β_0 + β_1 + β_2 + β_3}

Here the odds ratio of mortality for those with intensive versus regular care among those in Clinic B is

$$\frac{e^{\beta_0 + \beta_2}}{e^{\beta_0}} = e^{\beta_2}.$$

However, the corresponding odds ratio among those in Clinic A is

$$\frac{e^{\beta_0 + \beta_1 + \beta_2 + \beta_3}}{e^{\beta_0 + \beta_1}} = e^{\beta_2 + \beta_3}.$$

If $\beta_3 = 0$ then the effect of level of prenatal care does not depend on the clinic, and vice versa. If $\beta_3 = 0$ and $\beta_2 = 0$ as well, then not only does the effect of level of care not depend on clinic, there is no such effect at all.

Logistic Regression Analysis of the Prenatal Care Data

What follows is the data file prenatal.dat, in which the first line contains the variable labels and the remaining four lines the data. As before we are using indicator variables for the explanatory variables and have binomial response data.

clinic loc  y   m
     1   1 16 309
     1   0 12 188
     0   1  4  27
     0   0 34 231

The program used to analyse the data is given below.

Splus program for analysis of prenatal care data

help.start()
prenatal.dat_read.table("prenatal.dat", header=T)
# here we construct the response variable for the logistic regression analysis
prenatal.dat$resp_cbind(prenatal.dat$y, prenatal.dat$m - prenatal.dat$y)
prenatal.dat
# now we fit the model using the glm function and store the result in "model1";
# we indicate that "resp" contains a binomial response and that we are using the
# logistic link function
model1_glm(resp ~ loc, family=binomial(link=logit), data=prenatal.dat)
summary(model1)
# the "names" function lists the contents of the object "model1" and following
# this statement we examine some of the contents of these objects (try it)
names(model1)
model1$family
model1$formula
model1$coefficients
model1$deviance
model1$fitted.values
model1$residuals
# now we fit a model to examine the relationship between level of care
# and mortality adjusting for clinic
model2_glm(resp ~ clinic + loc, family=binomial(link=logit), data=prenatal.dat)
summary(model2)
# here we examine whether the association between loc and mortality depends on
# the clinic
model3_glm(resp ~ loc + clinic + loc*clinic, family=binomial(link=logit), data=prenatal.dat)
summary(model3)
# now we examine the marginal relationship between mortality and clinic
model4_glm(resp ~ clinic, family=binomial(link=logit), data=prenatal.dat)
summary(model4)

A selection of the output printed by the commands summary(model1), summary(model2), summary(model3) and summary(model4) follows.

> prenatal.dat$resp_cbind(prenatal.dat$y,prenatal.dat$m-prenatal.dat$y)
> prenatal.dat
  clinic loc  y   m resp.1 resp.2
1      1   1 16 309     16    293
2      1   0 12 188     12    176
3      0   1  4  27      4     23
4      0   0 34 231     34    197

Here we print out the augmented data frame to see what the resp variable looks like. Next we fit the regression model examining the relationship between the level of care and mortality. A portion of the output is reported below.

> model1_glm(resp ~ loc, family=binomial(link=logit),data=prenatal.dat)
> summary(model1)

Coefficients:
                Value Std. Error  t value
(Intercept)   -2.0934     0.1563  -13.395
        loc   -0.6671     0.2785   -2.395

Null Deviance: 16.92 on 3 degrees of freedom
Residual Deviance: 10.81 on 2 degrees of freedom
Number of Fisher Scoring Iterations: 3

The numbers under the heading Value are the maximum likelihood estimates of the regression coefficients $\hat{\beta}_0$ and $\hat{\beta}_2$ (here we are using the convention that the subscripts of the regression coefficients coincide with the subscripts on the variables themselves). The numbers under Std. Error are estimated standard errors based on the inverse of the information matrix (more on this shortly). Finally, the numbers under t value are Wald-type test statistics for testing the hypothesis $H_0: \beta_k = 0$ vs. $H_A: \beta_k \ne 0$. Note that these are of the form $(\hat{\beta}_k - 0)/\mathrm{s.e.}(\hat{\beta}_k)$. These test statistics are approximately standard normal if the null hypothesis is true. To verify, note that for testing the effect of level of care based on model 1 we find $-0.6671/0.2785 = -2.395$. The p-values can be computed as

$$p\text{-value} = 2\,\Pr(U > |(\hat{\beta}_k - 0)/\mathrm{s.e.}(\hat{\beta}_k)|),$$

where $U \sim N(0, 1)$. Therefore, if we test the hypothesis that there is no relation between level of care and mortality ($H_0: \beta_2 = 0$) we get $2\,\Pr(U > 2.395) = 2(1 - \texttt{pnorm}(2.395)) = 0.017$. Here we conclude that those patients receiving more intensive care are at a significantly lower risk of mortality than those receiving the standard level of care, and that this evidence is rather strong.

To further characterize this dependence we need to conduct inference about the odds ratio. Recall that $\exp(\beta_2)$ is the odds ratio parameter, $\exp(\hat{\beta}_2)$ is the MLE, and $(\exp(\hat{\beta}_2 - 1.96\,\mathrm{s.e.}(\hat{\beta}_2)),\ \exp(\hat{\beta}_2 + 1.96\,\mathrm{s.e.}(\hat{\beta}_2)))$ is an approximate 95% confidence interval. Here we get a point estimate of $\exp(-0.6671) = 0.51$ and a 95% CI of $(\exp(-1.2130), \exp(-0.1211)) = (0.30, 0.89)$, as we did before. In fact the analysis before was exactly the same as this one, and the results will be identical apart from rounding. Finally, note that while the Wald statistic is computed for $\beta_0$, it is seldom of interest. Instead we tend to focus on coefficients of explanatory variables, which have useful interpretations.

Now we consider introducing clinic into the model. This generates a model in which we can examine the effect of level of care on mortality, adjusted for the clinic the patient attended.

> model2_glm(resp ~ clinic + loc, family=binomial(link=logit),data=prenatal.dat)
> summary(model2)

Coefficients:
                Value Std. Error  t value
(Intercept)   -1.7411     0.1785   -9.756
     clinic   -0.9863     0.3089   -3.193
        loc   -0.1503     0.3301   -0.455

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 16.92 on 3 degrees of freedom
Residual Deviance: 0.11 on 1 degrees of freedom
Number of Fisher Scoring Iterations: 3

Here we see that there is no longer any evidence of a relationship between level of care and mortality: the association that did exist has been explained away by the clinic the patients attended. To see this, note that a test of $H_0: \beta_2 = 0$, while controlling for clinic, gives a p-value of $2\,\Pr(U > 0.455) = 2(1 - \texttt{pnorm}(0.455)) = 0.65$. This is interesting, but it does not mean that there is no effect of level of care for any patient. There may be an interaction between level of care and clinic, for example, and the level of care variable may be significant in one of the clinics. To check this, we therefore fit the model with loc and clinic main effects as well as the loc*clinic interaction.

> model3_glm(resp~loc+clinic+loc*clinic,family=binomial(link=logit),data=prenatal.dat)
> summary(model3)
Call: glm(formula = resp ~ loc + clinic + loc * clinic, family = binomial(link = logit), data = prenatal.dat)

Coefficients:
                Value Std. Error  t value
(Intercept)   -1.7569     0.1857   -9.461
        loc    0.0076     0.5727    0.013
     clinic   -0.9288     0.3514   -2.643
 loc:clinic   -0.2296     0.6949   -0.330

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 16.92 on 3 degrees of freedom
Residual Deviance: 0 on 0 degrees of freedom
Number of Fisher Scoring Iterations: 3

It is not surprising that this interaction term is not significant, since when we tabulated the data we found that the odds ratios relating level of care with mortality within the two strata were quite similar. Finally, we fit the model involving just the clinic main effect.

> model4_glm(resp ~ clinic, family=binomial(link=logit),data=prenatal.dat)
> summary(model4)
Call: glm(formula = resp ~ clinic, family = binomial(link = logit), data = prenatal.dat)

Coefficients:
                Value Std. Error  t value
(Intercept)   -1.7561     0.1757   -9.997
     clinic   -1.0625     0.2621   -4.053

(Dispersion Parameter for Binomial family taken to be 1 )

Null Deviance: 16.92 on 3 degrees of freedom
Residual Deviance: 0.31 on 2 degrees of freedom
Number of Fisher Scoring Iterations: 3

When testing the hypothesis $H_0: \beta_1 = 0$ versus $H_A: \beta_1 \ne 0$ we get $2\,\Pr(U > 4.053) = 2(1 - \texttt{pnorm}(4.053)) < 0.0001$. Here we conclude that those patients at Clinic A are at a significantly lower risk of mortality than those at Clinic B, and that this evidence is very strong. We get a point estimate for the odds ratio reflecting the reduction in the odds of mortality in Clinic A compared to Clinic B of $\exp(-1.0625) = 0.35$, with a corresponding 95% CI of $(\exp(-1.5762), \exp(-0.5488)) = (0.21, 0.58)$.
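
In current R the same Wald quantities can be pulled directly from the fitted object. A minimal sketch, assuming model4 has been fit as above (note that R's summary() labels the estimate column Estimate where Splus prints Value):

est <- coef(summary(model4))["clinic", ]
exp(est["Estimate"])                                        # odds ratio, about 0.35
exp(est["Estimate"] + c(-1.96, 1.96) * est["Std. Error"])   # 95% CI, about (0.21, 0.58)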

2.5 Likelihood for Binary Regression

Regard the responses $y_1, y_2, \ldots, y_n$ as observations on $n$ random variables $Y_1, \ldots, Y_n$, where $Y_i \sim \mathrm{Bin}(m_i, \pi_i)$, $i = 1, 2, \ldots, n$. Then

$$\Pr(Y = y; \pi) = \prod_{i=1}^n \binom{m_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{m_i - y_i},$$

which gives

$$L(\pi; y) = \prod_{i=1}^n \pi_i^{y_i} (1 - \pi_i)^{m_i - y_i}$$

and

$$\ell(\pi; y) = \sum_{i=1}^n \left[y_i \log \pi_i + (m_i - y_i)\log(1 - \pi_i)\right] = \sum_{i=1}^n \left[y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) + m_i \log(1 - \pi_i)\right].$$

Modeling procedures are based on expressing $\pi_1, \pi_2, \ldots, \pi_n$ in terms of fewer parameters (with a view to data reduction). These new parameters take the form of regression coefficients, introduced through a link function. The logistic link

$$g(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) = x_i'\beta \qquad \text{implies} \qquad \pi_i = \frac{e^{x_i'\beta}}{1 + e^{x_i'\beta}}.$$

If the dimension of $\beta$ is less than $n$, we say we have an unsaturated model. In this case we take the dimension to be $p$, corresponding to a model with $p - 1$ covariates. Under some models with the dimension of $\beta$ equal to $n$, we have a saturated model. Returning to the log-likelihood,

$$\ell(\beta; y) = \sum_{i=1}^n \left[y_i (x_i'\beta) + m_i \log\left(\frac{1}{1 + e^{x_i'\beta}}\right)\right] = \sum_{i=1}^n \left[y_i (x_i'\beta) - m_i \log(1 + e^{x_i'\beta})\right].$$

Upon maximizing $\ell$ with respect to $\beta$, we obtain $\hat{\beta}$ and compute $\hat{\pi}_i = e^{x_i'\hat{\beta}}/(1 + e^{x_i'\hat{\beta}})$. The quality of the fit of these regression models will be judged by how well $\hat{\pi}_1, \hat{\pi}_2, \ldots, \hat{\pi}_n$ fit the data (or equivalently, how well $m_i\hat{\pi}_i$ approximates $y_i$, $i = 1, 2, \ldots, n$). We need a criterion to assess how much worse unsaturated models are than the saturated model. A convenient way of testing nested hypotheses is based on the likelihood ratio statistic.

Likelihood Ratio Tests: Suppose $L(\theta)$ is a likelihood for a $q$-dimensional parameter vector $\theta$. It may be maximized with no constraints on $\theta$, giving $\tilde{\theta}$, or subject to constraints, giving $\hat{\theta}$; in the latter case the effective dimension of $\theta$ will be denoted by $p$. Note that regression models may be interpreted as imposing constraints on the mean responses: we force relationships between the $\mu_i$ values in linear regression, or the $\pi_i$ values in binary regression. We may then formulate the hypothesis that the constraints are reasonable, and test it by seeing how consistent it is with the data. The likelihood ratio statistic

$$-2\log\left(L(\hat{\theta})/L(\tilde{\theta})\right)$$

has a $\chi^2$ distribution on $\nu = q - p$ degrees of freedom if the null hypothesis that the constraints are reasonable is true, where $\nu$ is the difference in the effective number of parameters with and without the constraints. Therefore, if $-2\log(L(\hat{\theta})/L(\tilde{\theta})) > \chi^2_\nu(\alpha)$ we would reject $H_0$ at the $\alpha$ significance level. It is more informative to examine the p-value, and so we compute

$$p = \Pr\left(\chi^2_{q-p} > -2\log(L(\hat{\theta})/L(\tilde{\theta}))\right).$$

Returning to the log-likelihood for binomial data, we have

$$\ell(\pi; y) = \sum_{i=1}^n \left[y_i \log\left(\frac{\pi_i}{1 - \pi_i}\right) + m_i \log(1 - \pi_i)\right].$$

Let $\tilde{\pi} = (\tilde{\pi}_1, \ldots, \tilde{\pi}_n)' = (y_1/m_1, \ldots, y_n/m_n)'$ represent the MLE under the saturated model and let $\hat{\pi} = (\hat{\pi}_1, \ldots, \hat{\pi}_n)'$ denote the MLE under the constrained model imposed by the regression equation. With a little algebra one can show that the LR statistic $-2\log(L(\hat{\pi})/L(\tilde{\pi})) = 2(\ell(\tilde{\pi}) - \ell(\hat{\pi}))$, obtained by substituting the appropriate MLEs into the log-likelihood, has the form

$$2(\ell(\tilde{\pi}) - \ell(\hat{\pi})) = 2\sum_{i=1}^n \left[y_i \log(y_i/m_i) + (m_i - y_i)\log\left(\frac{m_i - y_i}{m_i}\right)\right] - 2\sum_{i=1}^n \left[y_i \log \hat{\pi}_i + (m_i - y_i)\log(1 - \hat{\pi}_i)\right]$$
$$= 2\sum_{i=1}^n \left[y_i \log\left(\frac{y_i}{m_i\hat{\pi}_i}\right) + (m_i - y_i)\log\left(\frac{m_i - y_i}{m_i(1 - \hat{\pi}_i)}\right)\right].$$

This likelihood ratio statistic is central to the analysis of binary regression models and so has a special name: it is called the deviance statistic, represented by $D$ if we think of it as random and by $d$ if we refer to a realized value of it. Splus reports this as the residual deviance, and sometimes it will be called the scaled deviance, for reasons that will become clear shortly. Based on the general result above, we would expect it to have a $\chi^2$ distribution on $n - p$ degrees of freedom. Unfortunately, this distributional approximation for the deviance statistic is not as good as one might hope! It does perform very well, however, for testing nested unsaturated models, which we will consider in the next section.

We remark in passing that the deviance statistic has the form

$$2\sum O_{ij} \log\left(\frac{O_{ij}}{E_{ij}}\right),$$

where $O_{ij}$ is an observed quantity and $E_{ij}$ is an expected quantity. We use two subscripts here since we are summing over both the $y_i$ cells and the $(m_i - y_i)$ cells.

The Pearson statistic is another statistic one can use for assessing the overall fit of a model:

$$P = \sum_{i=1}^n \frac{(y_i - m_i\hat{\pi}_i)^2}{m_i\hat{\pi}_i(1 - \hat{\pi}_i)},$$

which has the form $\sum (O_i - E_i)^2/V_i$. As for the deviance statistic, $P \sim \chi^2_{n-p}$ approximately if the model provides a reasonable fit to the data (i.e. if the assumed model is "true"). The chi-square approximation is a little better than for the deviance statistic. Both, however, are poor if the sample sizes ($m_i$) are small. The deviance and Pearson statistics can be shown to be asymptotically equivalent by a Taylor series expansion.

2.6 Testing Nested Non-saturated Models

Suppose we have a model

$$\log(\pi_i/(1 - \pi_i)) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1}$$

and another model

$$\log(\pi_i/(1 - \pi_i)) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_{p-1} x_{i,p-1} + \beta_p x_{ip} + \cdots + \beta_{q-1} x_{i,q-1}.$$

We may be interested in testing whether the first model, which is a sub-model of the second, provides as good a fit to the data. This is equivalent to testing the significance of the covariates $x_p, \ldots, x_{q-1}$, or to testing $H_0: \beta_p = \cdots = \beta_{q-1} = 0$. Let $\hat{\pi}_i$ denote the MLE of $\pi_i$ under the reduced model with $p$ parameters and let $\tilde{\pi}_i$ denote the MLE of $\pi_i$ under the full model with $q$ parameters. Again with a little algebra one can show that the likelihood ratio statistic corresponding to this test is given by the difference in the deviances of the two models. That is, the appropriate likelihood ratio test statistic is $D = D_0 - D_A$, where $D_0$ is the deviance under the null (reduced) model and $D_A$ is the deviance under the alternative (fuller) model.
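
The deviance, the Pearson statistic and the nested-model comparison can all be computed by hand from a fitted binomial glm. A short R sketch, assuming model1 and model2 from the prenatal care analysis are in the workspace (the y_i log y_i terms would need care if some y_i = 0):

pi.hat <- fitted(model2)                   # fitted probabilities pi_i
y <- prenatal.dat$y; m <- prenatal.dat$m
D <- 2 * sum(y * log(y / (m * pi.hat)) +
             (m - y) * log((m - y) / (m * (1 - pi.hat))))    # matches deviance(model2)
P <- sum((y - m * pi.hat)^2 / (m * pi.hat * (1 - pi.hat)))   # Pearson statistic
# LR test of the reduced model (loc only) against the model that also has clinic:
1 - pchisq(deviance(model1) - deviance(model2), df = 1)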


More information

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University

Logistic Regression. James H. Steiger. Department of Psychology and Human Development Vanderbilt University Logistic Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) Logistic Regression 1 / 38 Logistic Regression 1 Introduction

More information

Introduction to Generalized Linear Models

Introduction to Generalized Linear Models Introduction to Generalized Linear Models Edps/Psych/Soc 589 Carolyn J. Anderson Department of Educational Psychology c Board of Trustees, University of Illinois Fall 2018 Outline Introduction (motivation

More information

Generalized Linear Models Introduction

Generalized Linear Models Introduction Generalized Linear Models Introduction Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Linear Models For many problems, standard linear regression approaches don t work. Sometimes,

More information

8 Nominal and Ordinal Logistic Regression

8 Nominal and Ordinal Logistic Regression 8 Nominal and Ordinal Logistic Regression 8.1 Introduction If the response variable is categorical, with more then two categories, then there are two options for generalized linear models. One relies on

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Answer Key for STAT 200B HW No. 7

Answer Key for STAT 200B HW No. 7 Answer Key for STAT 200B HW No. 7 May 5, 2007 Problem 2.2 p. 649 Assuming binomial 2-sample model ˆπ =.75, ˆπ 2 =.6. a ˆτ = ˆπ 2 ˆπ =.5. From Ex. 2.5a on page 644: ˆπ ˆπ + ˆπ 2 ˆπ 2.75.25.6.4 = + =.087;

More information

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T.

ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T. Exam 3 Review Suppose that X i = x =(x 1,, x k ) T is observed and that Y i X i = x i independent Binomial(n i,π(x i )) for i =1,, N where ˆπ(x) = exp(ˆα + ˆβ T x) 1 + exp(ˆα + ˆβ T x) This is called the

More information

Single-level Models for Binary Responses

Single-level Models for Binary Responses Single-level Models for Binary Responses Distribution of Binary Data y i response for individual i (i = 1,..., n), coded 0 or 1 Denote by r the number in the sample with y = 1 Mean and variance E(y) =

More information

Generalized Linear Models (1/29/13)

Generalized Linear Models (1/29/13) STA613/CBB540: Statistical methods in computational biology Generalized Linear Models (1/29/13) Lecturer: Barbara Engelhardt Scribe: Yangxiaolu Cao When processing discrete data, two commonly used probability

More information

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator

UNIVERSITY OF TORONTO. Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS. Duration - 3 hours. Aids Allowed: Calculator UNIVERSITY OF TORONTO Faculty of Arts and Science APRIL 2010 EXAMINATIONS STA 303 H1S / STA 1002 HS Duration - 3 hours Aids Allowed: Calculator LAST NAME: FIRST NAME: STUDENT NUMBER: There are 27 pages

More information

STAT 526 Spring Final Exam. Thursday May 5, 2011

STAT 526 Spring Final Exam. Thursday May 5, 2011 STAT 526 Spring 2011 Final Exam Thursday May 5, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points will

More information

Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek

Binary Response: Logistic Regression. STAT 526 Professor Olga Vitek Binary Response: Logistic Regression STAT 526 Professor Olga Vitek March 29, 2011 4 Model Specification and Interpretation 4-1 Probability Distribution of a Binary Outcome Y In many situations, the response

More information

Answer Key for STAT 200B HW No. 8

Answer Key for STAT 200B HW No. 8 Answer Key for STAT 200B HW No. 8 May 8, 2007 Problem 3.42 p. 708 The values of Ȳ for x 00, 0, 20, 30 are 5/40, 0, 20/50, and, respectively. From Corollary 3.5 it follows that MLE exists i G is identiable

More information

MIT Spring 2016

MIT Spring 2016 Generalized Linear Models MIT 18.655 Dr. Kempthorne Spring 2016 1 Outline Generalized Linear Models 1 Generalized Linear Models 2 Generalized Linear Model Data: (y i, x i ), i = 1,..., n where y i : response

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis

Review. Timothy Hanson. Department of Statistics, University of South Carolina. Stat 770: Categorical Data Analysis Review Timothy Hanson Department of Statistics, University of South Carolina Stat 770: Categorical Data Analysis 1 / 22 Chapter 1: background Nominal, ordinal, interval data. Distributions: Poisson, binomial,

More information

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models

Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Analysis of Time-to-Event Data: Chapter 4 - Parametric regression models Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/25 Right censored

More information

2018 2019 1 9 sei@mistiu-tokyoacjp http://wwwstattu-tokyoacjp/~sei/lec-jhtml 11 552 3 0 1 2 3 4 5 6 7 13 14 33 4 1 4 4 2 1 1 2 2 1 1 12 13 R?boxplot boxplotstats which does the computation?boxplotstats

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) T In 2 2 tables, statistical independence is equivalent to a population

More information

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches

1. Hypothesis testing through analysis of deviance. 3. Model & variable selection - stepwise aproaches Sta 216, Lecture 4 Last Time: Logistic regression example, existence/uniqueness of MLEs Today s Class: 1. Hypothesis testing through analysis of deviance 2. Standard errors & confidence intervals 3. Model

More information

MSH3 Generalized linear model

MSH3 Generalized linear model Contents MSH3 Generalized linear model 5 Logit Models for Binary Data 173 5.1 The Bernoulli and binomial distributions......... 173 5.1.1 Mean, variance and higher order moments.... 173 5.1.2 Normal limit....................

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto.

Clinical Trials. Olli Saarela. September 18, Dalla Lana School of Public Health University of Toronto. Introduction to Dalla Lana School of Public Health University of Toronto olli.saarela@utoronto.ca September 18, 2014 38-1 : a review 38-2 Evidence Ideal: to advance the knowledge-base of clinical medicine,

More information

Linear Methods for Prediction

Linear Methods for Prediction This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this

More information

Chapter 5: Logistic Regression-I

Chapter 5: Logistic Regression-I : Logistic Regression-I Dipankar Bandyopadhyay Department of Biostatistics, Virginia Commonwealth University BIOS 625: Categorical Data & GLM [Acknowledgements to Tim Hanson and Haitao Chu] D. Bandyopadhyay

More information

Generalized Linear Models. stat 557 Heike Hofmann

Generalized Linear Models. stat 557 Heike Hofmann Generalized Linear Models stat 557 Heike Hofmann Outline Intro to GLM Exponential Family Likelihood Equations GLM for Binomial Response Generalized Linear Models Three components: random, systematic, link

More information

STA 450/4000 S: January

STA 450/4000 S: January STA 450/4000 S: January 6 005 Notes Friday tutorial on R programming reminder office hours on - F; -4 R The book Modern Applied Statistics with S by Venables and Ripley is very useful. Make sure you have

More information

Multinomial Logistic Regression Models

Multinomial Logistic Regression Models Stat 544, Lecture 19 1 Multinomial Logistic Regression Models Polytomous responses. Logistic regression can be extended to handle responses that are polytomous, i.e. taking r>2 categories. (Note: The word

More information

Lecture 5: LDA and Logistic Regression

Lecture 5: LDA and Logistic Regression Lecture 5: and Logistic Regression Hao Helen Zhang Hao Helen Zhang Lecture 5: and Logistic Regression 1 / 39 Outline Linear Classification Methods Two Popular Linear Models for Classification Linear Discriminant

More information

Weighted Least Squares I

Weighted Least Squares I Weighted Least Squares I for i = 1, 2,..., n we have, see [1, Bradley], data: Y i x i i.n.i.d f(y i θ i ), where θ i = E(Y i x i ) co-variates: x i = (x i1, x i2,..., x ip ) T let X n p be the matrix of

More information

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News:

Today. HW 1: due February 4, pm. Aspects of Design CD Chapter 2. Continue with Chapter 2 of ELM. In the News: Today HW 1: due February 4, 11.59 pm. Aspects of Design CD Chapter 2 Continue with Chapter 2 of ELM In the News: STA 2201: Applied Statistics II January 14, 2015 1/35 Recap: data on proportions data: y

More information

MSH3 Generalized linear model Ch. 6 Count data models

MSH3 Generalized linear model Ch. 6 Count data models Contents MSH3 Generalized linear model Ch. 6 Count data models 6 Count data model 208 6.1 Introduction: The Children Ever Born Data....... 208 6.2 The Poisson Distribution................. 210 6.3 Log-Linear

More information

STAT 7030: Categorical Data Analysis

STAT 7030: Categorical Data Analysis STAT 7030: Categorical Data Analysis 5. Logistic Regression Peng Zeng Department of Mathematics and Statistics Auburn University Fall 2012 Peng Zeng (Auburn University) STAT 7030 Lecture Notes Fall 2012

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

simple if it completely specifies the density of x

simple if it completely specifies the density of x 3. Hypothesis Testing Pure significance tests Data x = (x 1,..., x n ) from f(x, θ) Hypothesis H 0 : restricts f(x, θ) Are the data consistent with H 0? H 0 is called the null hypothesis simple if it completely

More information

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2)

Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Lectures on Machine Learning (Fall 2017) Hyeong In Choi Seoul National University Lecture 4: Exponential family of distributions and generalized linear model (GLM) (Draft: version 0.9.2) Topics to be covered:

More information

Categorical data analysis Chapter 5

Categorical data analysis Chapter 5 Categorical data analysis Chapter 5 Interpreting parameters in logistic regression The sign of β determines whether π(x) is increasing or decreasing as x increases. The rate of climb or descent increases

More information

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification,

Normal distribution We have a random sample from N(m, υ). The sample mean is Ȳ and the corrected sum of squares is S yy. After some simplification, Likelihood Let P (D H) be the probability an experiment produces data D, given hypothesis H. Usually H is regarded as fixed and D variable. Before the experiment, the data D are unknown, and the probability

More information

Log-linear Models for Contingency Tables

Log-linear Models for Contingency Tables Log-linear Models for Contingency Tables Statistics 149 Spring 2006 Copyright 2006 by Mark E. Irwin Log-linear Models for Two-way Contingency Tables Example: Business Administration Majors and Gender A

More information

Machine Learning. Lecture 3: Logistic Regression. Feng Li.

Machine Learning. Lecture 3: Logistic Regression. Feng Li. Machine Learning Lecture 3: Logistic Regression Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2016 Logistic Regression Classification

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Statistics 203: Introduction to Regression and Analysis of Variance Course review

Statistics 203: Introduction to Regression and Analysis of Variance Course review Statistics 203: Introduction to Regression and Analysis of Variance Course review Jonathan Taylor - p. 1/?? Today Review / overview of what we learned. - p. 2/?? General themes in regression models Specifying

More information

STAT 526 Spring Midterm 1. Wednesday February 2, 2011

STAT 526 Spring Midterm 1. Wednesday February 2, 2011 STAT 526 Spring 2011 Midterm 1 Wednesday February 2, 2011 Time: 2 hours Name (please print): Show all your work and calculations. Partial credit will be given for work that is partially correct. Points

More information

Homework 1 Solutions

Homework 1 Solutions 36-720 Homework 1 Solutions Problem 3.4 (a) X 2 79.43 and G 2 90.33. We should compare each to a χ 2 distribution with (2 1)(3 1) 2 degrees of freedom. For each, the p-value is so small that S-plus reports

More information

Figure 36: Respiratory infection versus time for the first 49 children.

Figure 36: Respiratory infection versus time for the first 49 children. y BINARY DATA MODELS We devote an entire chapter to binary data since such data are challenging, both in terms of modeling the dependence, and parameter interpretation. We again consider mixed effects

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Introduction to the Logistic Regression Model

Introduction to the Logistic Regression Model CHAPTER 1 Introduction to the Logistic Regression Model 1.1 INTRODUCTION Regression methods have become an integral component of any data analysis concerned with describing the relationship between a response

More information

Proportional hazards regression

Proportional hazards regression Proportional hazards regression Patrick Breheny October 8 Patrick Breheny Survival Data Analysis (BIOS 7210) 1/28 Introduction The model Solving for the MLE Inference Today we will begin discussing regression

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models Generalized Linear Models - part III Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs.

More information

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books

Administration. Homework 1 on web page, due Feb 11 NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00 / 5 Administration Homework on web page, due Feb NSERC summer undergraduate award applications due Feb 5 Some helpful books STA 44/04 Jan 6, 00... administration / 5 STA 44/04 Jan 6,

More information

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F).

STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis. 1. Indicate whether each of the following is true (T) or false (F). STA 4504/5503 Sample Exam 1 Spring 2011 Categorical Data Analysis 1. Indicate whether each of the following is true (T) or false (F). (a) (b) (c) (d) (e) In 2 2 tables, statistical independence is equivalent

More information

Lecture 8. Poisson models for counts

Lecture 8. Poisson models for counts Lecture 8. Poisson models for counts Jesper Rydén Department of Mathematics, Uppsala University jesper.ryden@math.uu.se Statistical Risk Analysis Spring 2014 Absolute risks The failure intensity λ(t) describes

More information

BMI 541/699 Lecture 22

BMI 541/699 Lecture 22 BMI 541/699 Lecture 22 Where we are: 1. Introduction and Experimental Design 2. Exploratory Data Analysis 3. Probability 4. T-based methods for continous variables 5. Power and sample size for t-based

More information

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours

NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION. ST3241 Categorical Data Analysis. (Semester II: ) April/May, 2011 Time Allowed : 2 Hours NATIONAL UNIVERSITY OF SINGAPORE EXAMINATION Categorical Data Analysis (Semester II: 2010 2011) April/May, 2011 Time Allowed : 2 Hours Matriculation No: Seat No: Grade Table Question 1 2 3 4 5 6 Full marks

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Practical Econometrics. for. Finance and Economics. (Econometrics 2)

Practical Econometrics. for. Finance and Economics. (Econometrics 2) Practical Econometrics for Finance and Economics (Econometrics 2) Seppo Pynnönen and Bernd Pape Department of Mathematics and Statistics, University of Vaasa 1. Introduction 1.1 Econometrics Econometrics

More information

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.

Now consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown. Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)

More information

STAT5044: Regression and Anova

STAT5044: Regression and Anova STAT5044: Regression and Anova Inyoung Kim 1 / 15 Outline 1 Fitting GLMs 2 / 15 Fitting GLMS We study how to find the maxlimum likelihood estimator ˆβ of GLM parameters The likelihood equaions are usually

More information

STA102 Class Notes Chapter Logistic Regression

STA102 Class Notes Chapter Logistic Regression STA0 Class Notes Chapter 0 0. Logistic Regression We continue to study the relationship between a response variable and one or more eplanatory variables. For SLR and MLR (Chapters 8 and 9), our response

More information

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent

Latent Variable Models for Binary Data. Suppose that for a given vector of explanatory variables x, the latent Latent Variable Models for Binary Data Suppose that for a given vector of explanatory variables x, the latent variable, U, has a continuous cumulative distribution function F (u; x) and that the binary

More information

Lecture 01: Introduction

Lecture 01: Introduction Lecture 01: Introduction Dipankar Bandyopadhyay, Ph.D. BMTRY 711: Analysis of Categorical Data Spring 2011 Division of Biostatistics and Epidemiology Medical University of South Carolina Lecture 01: Introduction

More information

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence

Bayesian Inference in GLMs. Frequentists typically base inferences on MLEs, asymptotic confidence Bayesian Inference in GLMs Frequentists typically base inferences on MLEs, asymptotic confidence limits, and log-likelihood ratio tests Bayesians base inferences on the posterior distribution of the unknowns

More information

Poisson regression: Further topics

Poisson regression: Further topics Poisson regression: Further topics April 21 Overdispersion One of the defining characteristics of Poisson regression is its lack of a scale parameter: E(Y ) = Var(Y ), and no parameter is available to

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j

Standard Errors & Confidence Intervals. N(0, I( β) 1 ), I( β) = [ 2 l(β, φ; y) β i β β= β j Standard Errors & Confidence Intervals β β asy N(0, I( β) 1 ), where I( β) = [ 2 l(β, φ; y) ] β i β β= β j We can obtain asymptotic 100(1 α)% confidence intervals for β j using: β j ± Z 1 α/2 se( β j )

More information

POLI 8501 Introduction to Maximum Likelihood Estimation

POLI 8501 Introduction to Maximum Likelihood Estimation POLI 8501 Introduction to Maximum Likelihood Estimation Maximum Likelihood Intuition Consider a model that looks like this: Y i N(µ, σ 2 ) So: E(Y ) = µ V ar(y ) = σ 2 Suppose you have some data on Y,

More information

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017

Binary Regression. GH Chapter 5, ISL Chapter 4. January 31, 2017 Binary Regression GH Chapter 5, ISL Chapter 4 January 31, 2017 Seedling Survival Tropical rain forests have up to 300 species of trees per hectare, which leads to difficulties when studying processes which

More information