STAT5044: Regression and Anova


Inyoung Kim

Outline

1. Fitting GLMs

Fitting GLMs

We study how to find the maximum likelihood estimator $\hat\beta$ of the GLM parameters. The likelihood equations are usually nonlinear in $\hat\beta$, so we describe a general-purpose iterative method for solving nonlinear equations and apply it in two ways to determine the maximum of a likelihood function.

Fitting GLMs

1. Newton-Raphson method
2. Fisher scoring method
3. Iterative reweighted least squares

Newton-Raphson method

The Newton-Raphson method is an iterative method for solving nonlinear equations, such as the equations whose solution determines the point at which a function takes its maximum. It begins with an initial guess for the solution and obtains a second guess by approximating the function to be maximized in a neighborhood of the initial guess by a second-degree polynomial, then finding the location of that polynomial's maximum value. It then approximates the function in a neighborhood of the second guess by another second-degree polynomial, and the third guess is the location of its maximum. In this manner, the method generates a sequence of guesses that converge to the location of the maximum when the function is suitable and/or the initial guess is good.

Newton-Raphson method

Here is how Newton-Raphson determines the value $\hat\beta$ at which a function $L(\beta)$ is maximized. Let $u = (\partial L(\beta)/\partial\beta_1, \partial L(\beta)/\partial\beta_2, \ldots)^\top$ denote the score vector, and let $H$ denote the matrix with entries $h_{ab} = \partial^2 L(\beta)/\partial\beta_a\,\partial\beta_b$, called the Hessian matrix. Let $u^{(t)}$ and $H^{(t)}$ be $u$ and $H$ evaluated at $\beta^{(t)}$, guess $t$ for $\hat\beta$. Step $t$ of the iterative process ($t = 0, 1, 2, \ldots$) approximates $L(\beta)$ near $\beta^{(t)}$ by the terms up to second order in its Taylor series expansion,
$$L(\beta) \approx L(\beta^{(t)}) + u^{(t)\top}(\beta - \beta^{(t)}) + \tfrac{1}{2}(\beta - \beta^{(t)})^\top H^{(t)} (\beta - \beta^{(t)}).$$
Solving
$$\partial L(\beta)/\partial\beta \approx u^{(t)} + H^{(t)}(\beta - \beta^{(t)}) = 0$$
for $\beta$ yields the next guess. That guess can be expressed as
$$\beta^{(t+1)} = \beta^{(t)} - (H^{(t)})^{-1} u^{(t)},$$
assuming that $H^{(t)}$ is nonsingular.
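A minimal sketch of this update in Python; the callables `grad` and `hess` and the starting value `beta0` are placeholders to be supplied by the user:

```python
import numpy as np

def newton_raphson(grad, hess, beta0, tol=1e-8, max_iter=50):
    """Maximize L(beta) given callables for its gradient u and Hessian H."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess(beta), grad(beta))  # H^{-1} u without explicit inverse
        beta = beta - step                              # beta_{t+1} = beta_t - H_t^{-1} u_t
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Solving the linear system rather than inverting $H^{(t)}$ is the usual numerically stable way to apply the formula.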

Newton-Raphson method

The convergence of $\beta^{(t)}$ to $\hat\beta$ for the Newton-Raphson method is usually fast. For large $t$, the convergence satisfies, for each $j$,
$$|\beta_j^{(t+1)} - \hat\beta_j| \le c\,|\beta_j^{(t)} - \hat\beta_j|^2$$
for some $c > 0$. This is referred to as second-order convergence. It implies that, once $t$ is sufficiently large, the number of correct decimals in the approximation roughly doubles at each iteration.

Newton-Raphson method for logistic regression

For a binomial observation $y$ with index $n$ and success probability $\pi$,
$$L(\pi) = y\log(\pi) + (n - y)\log(1 - \pi),$$
$$u = \frac{y - n\pi}{\pi(1-\pi)}, \qquad H = -\left[\frac{y}{\pi^2} + \frac{n-y}{(1-\pi)^2}\right],$$
so the Newton-Raphson step is
$$\pi^{(t+1)} = \pi^{(t)} + \left[\frac{y}{(\pi^{(t)})^2} + \frac{n-y}{(1-\pi^{(t)})^2}\right]^{-1}\frac{y - n\pi^{(t)}}{\pi^{(t)}(1-\pi^{(t)})}.$$
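A short numerical sketch of this iteration, with hypothetical counts $y = 3$, $n = 10$ and a deliberately poor starting value; the printed error column also illustrates the second-order convergence described on the previous slide:

```python
y, n = 3, 10        # hypothetical data: 3 successes in 10 trials; MLE is y/n = 0.3
pi = 0.9            # deliberately poor starting value

for t in range(6):
    u = (y - n * pi) / (pi * (1 - pi))            # score dL/dpi
    H = -(y / pi**2 + (n - y) / (1 - pi)**2)      # Hessian d^2L/dpi^2
    pi = pi - u / H                               # Newton-Raphson step
    print(t, pi, abs(pi - y / n))                 # error shrinks quadratically near 0.3
```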

Fisher scoring method

Fisher scoring is an alternative iterative method for solving likelihood equations. It resembles the Newton-Raphson method, the distinction being the Hessian matrix: Fisher scoring uses the expected value of this matrix, called the expected information, whereas Newton-Raphson uses the matrix itself, called the observed information.

Fisher scoring method

Let $J^{(t)}$ denote approximation $t$ for the ML estimate of the expected information matrix; that is, $J^{(t)}$ has elements $-E\!\left(\partial^2 L(\beta)/\partial\beta_a\,\partial\beta_b\right)$, evaluated at $\beta^{(t)}$. The formula for Fisher scoring is
$$\beta^{(t+1)} = \beta^{(t)} + (J^{(t)})^{-1} u^{(t)}.$$

Fisher scoring method in logistic regression

For the single binomial observation, the expected information is $n/[\pi(1-\pi)]$. A step of Fisher scoring gives
$$\pi^{(t+1)} = \pi^{(t)} + \left[\frac{n}{\pi^{(t)}(1-\pi^{(t)})}\right]^{-1}\frac{y - n\pi^{(t)}}{\pi^{(t)}(1-\pi^{(t)})} = \pi^{(t)} + \frac{y - n\pi^{(t)}}{n} = \frac{y}{n},$$
so in this one-parameter case Fisher scoring reaches the ML estimate $y/n$ in a single step, regardless of the starting value.
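A quick check of the one-step property, using the same hypothetical counts as above:

```python
y, n = 3, 10
for pi0 in (0.1, 0.5, 0.9):                  # arbitrary starting values
    J = n / (pi0 * (1 - pi0))                # expected information
    u = (y - n * pi0) / (pi0 * (1 - pi0))    # score
    print(pi0, pi0 + u / J)                  # one step lands on y/n = 0.3 every time
```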

Fisher scoring method in logistic regression

In the regression setting, $J^{(t)} = X^\top W^{(t)} X$, where $W^{(t)}$ is $W$ evaluated at $\beta^{(t)}$. The estimated asymptotic covariance matrix $\hat{J}^{-1}$ of $\hat\beta$ occurs as a by-product of this algorithm, as $(J^{(t)})^{-1}$ for the $t$ at which convergence is adequate. For both Fisher scoring and Newton-Raphson, $u$ has elements
$$u_j = \frac{\partial L(\beta)}{\partial\beta_j} = \sum_{i=1}^{N}\frac{(y_i - \mu_i)x_{ij}}{\mathrm{var}(Y_i)}\,\frac{\partial\mu_i}{\partial\eta_i}.$$

ML as iterative reweighted least squares

A relation exists between weighted least squares estimation and using Fisher scoring to find ML estimates. We refer here to the general linear model of the form
$$z = X\beta + \varepsilon.$$
When the covariance matrix of $\varepsilon$ is $V$, the weighted least squares (WLS) estimator of $\beta$ is
$$(X^\top V^{-1} X)^{-1} X^\top V^{-1} z.$$
From $J = X^\top W X$, the expression for the elements of $u$, and the fact that the diagonal elements of $W$ are $w_i = (\partial\mu_i/\partial\eta_i)^2/\mathrm{Var}(Y_i)$, it follows that
$$J^{(t)}\beta^{(t)} + u^{(t)} = X^\top W^{(t)} z^{(t)},$$
where $z^{(t)}$ has elements
$$z_i^{(t)} = \sum_j x_{ij}\beta_j^{(t)} + (y_i - \mu_i^{(t)})\frac{\partial\eta_i^{(t)}}{\partial\mu_i^{(t)}} = \eta_i^{(t)} + (y_i - \mu_i^{(t)})\frac{\partial\eta_i^{(t)}}{\partial\mu_i^{(t)}}.$$

ML as iterative reweighted least squares

The equations for Fisher scoring then have the form
$$(X^\top W^{(t)} X)\beta^{(t+1)} = X^\top W^{(t)} z^{(t)}.$$
These are the normal equations for using weighted least squares to fit a linear model for a response variable $z^{(t)}$, when the model matrix is $X$ and the inverse of the covariance matrix is $W^{(t)}$. The equations have solution
$$\beta^{(t+1)} = (X^\top W^{(t)} X)^{-1} X^\top W^{(t)} z^{(t)}.$$

ML as iterative reweighted least squares

The vector $z$ in this formulation is a linearized form of the link function $g$, evaluated at $y$:
$$g(y_i) \approx g(\mu_i) + (y_i - \mu_i)g'(\mu_i) = \eta_i + (y_i - \mu_i)(\partial\eta_i/\partial\mu_i) = z_i.$$
This adjusted response variable $z$ has element $i$ approximated by $z_i^{(t)}$ for cycle $t$ of the iterative scheme. That cycle regresses $z^{(t)}$ on $X$ with weight (i.e., inverse covariance) $W^{(t)}$ to obtain a new estimate $\beta^{(t+1)}$. This estimate yields a new linear predictor value $\eta^{(t+1)} = X\beta^{(t+1)}$ and a new adjusted response value $z^{(t+1)}$ for the next cycle. The ML estimator results from iterative use of weighted least squares, in which the weight matrix changes at each cycle. The process is called iterative reweighted least squares.
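A self-contained sketch of the full cycle for binary logistic regression, assuming a logit link and 0/1 responses, so that $w_i = \pi_i(1-\pi_i)$ and $z_i = \eta_i + (y_i - \pi_i)/[\pi_i(1-\pi_i)]$; the data at the bottom are simulated purely to exercise the routine:

```python
import numpy as np

def irls_logistic(X, y, max_iter=25, tol=1e-8):
    """Fit a binary logistic regression by iterative reweighted least squares.

    X: (n, p) model matrix (include a column of ones for an intercept).
    y: (n,) vector of 0/1 responses.
    Returns the ML estimate of beta and the estimated asymptotic covariance.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta                           # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))          # inverse logit link
        w = mu * (1.0 - mu)                      # w_i = (dmu/deta)^2 / Var(Y_i)
        z = eta + (y - mu) / w                   # adjusted response
        beta_new = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (w * z))
        done = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if done:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    w = mu * (1.0 - mu)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))  # (J^(t))^{-1} at convergence
    return beta, cov

# Hypothetical simulated data to exercise the routine
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
beta_true = np.array([-0.5, 1.2])                # assumed "true" coefficients
y = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-(X @ beta_true)))).astype(float)
beta_hat, cov_hat = irls_logistic(X, y)
print(beta_hat, np.sqrt(np.diag(cov_hat)))       # estimates and standard errors
```

The square roots of the diagonal of the returned covariance are the usual asymptotic standard errors, obtained as a by-product of the final weighted least squares cycle.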
