Some explanations about the IWLS algorithm to fit generalized linear models


Some explanations about the IWLS algorithm to fit generalized linear models

Christophe Dutang. Some explanations about the IWLS algorithm to fit generalized linear models. 2017. <hal-0577698>

HAL Id: hal-0577698
https://hal.archives-ouvertes.fr/hal-0577698
Submitted on 27 Aug 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution - NonCommercial 4.0 International License.

Some explanations about the IWLS algorithm to fit generalized linear models

Christophe Dutang
Laboratoire Manceau de Mathématiques, Le Mans Université, France

August 2017

This short note focuses on the estimation procedure generally used for generalized linear models (GLMs), see e.g. McCullagh, P. (1984). Generalized linear models. European Journal of Operational Research, 16(3), 285-292.

1 Fitting GLMs

1.1 Definition of the log-likelihood and the score function

The parametrization of the exponential family generally used for GLMs is given by the following density or mass probability function
\[
f_Y(y; \theta, \phi) = \exp\left(\frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi)\right), \quad y \in S,
\]
where $S$ is the support of the distribution, typically $\mathbb{N}$ or $\mathbb{R}$, and $a$, $b$, $c$ are known smooth functions. Note that $E(Y) = b'(\theta) = \mu$ and $Var(Y) = \phi b''(\theta) = \phi V(\mu)$.

Let us start with the iid case, where the $Y_i$ are independent and identically distributed. In that case, the score is defined as
\[
S(\theta) = \frac{\partial \log f_Y(Y; \theta, \phi)}{\partial \theta} = \frac{Y - b'(\theta)}{a(\phi)}.
\]
It is well known that $E(S) = 0$ and $Var(S) = -E(S'(\theta)) = b''(\theta)/a(\phi)$.

Now, we focus on the GLM context. That is, $Y_i \sim F_{\exp}(\theta_i, \phi_i)$ for all $i = 1, \dots, n$, where the explanatory variables are linked to the expectation by
\[
g(b'(\theta_i)) = g(\mu_i) = \beta_1 x_{i1} + \dots + \beta_p x_{ip},
\]
with $p < n$ for identifiability reasons. Note that an intercept is generally included, so that $x_{i1} = 1$ for all $i$.

The log-density of $Y_i$ is
\[
l_i(\beta) = \log f_{Y_i}(y_i; \theta_i(\beta), \phi_i) = \frac{y_i \theta_i(\beta) - b(\theta_i(\beta))}{a(\phi_i)} + c(y_i, \phi_i).
\]
The log-likelihood of the GLM for observations $y_1, \dots, y_n$ is simply obtained by summing the $l_i$ contributions
\[
L(\beta) = \sum_{i=1}^n l_i(\beta) = \sum_{i=1}^n \left(\frac{y_i \theta_i(\beta) - b(\theta_i(\beta))}{a(\phi_i)} + c(y_i, \phi_i)\right).
\]
A common choice for the dispersion parameter is $\phi_i = \phi / w_i$ with $w_i$ a known weight.

The score function is the gradient of the log-likelihood. Using $\theta_i = (b')^{-1}(g^{-1}(\eta_i))$ with the linear predictor $\eta_i = \beta_1 x_{i1} + \dots + \beta_p x_{ip}$, together with the inverse-function derivatives $((b')^{-1})' = 1/(b'' \circ (b')^{-1})$ and $(g^{-1})' = 1/(g' \circ g^{-1})$, we derive the partial derivative

\[
\frac{\partial \theta_i}{\partial \beta_j} = ((b')^{-1})'(g^{-1}(\eta_i))\,(g^{-1})'(\eta_i)\,x_{ij} = \frac{x_{ij}}{b''(\theta_i)\,g'(\mu_i)}.
\]
Therefore, using this partial derivative w.r.t. $\beta_j$ leads to the following score
\[
S_j(\beta) = \frac{\partial L(\beta)}{\partial \beta_j} = \sum_{i=1}^n \frac{y_i - b'(\theta_i)}{a(\phi_i)} \frac{\partial \theta_i}{\partial \beta_j} = \sum_{i=1}^n \frac{y_i - b'(\theta_i)}{a(\phi_i)\,b''(\theta_i)\,g'(\mu_i)}\,x_{ij} = \sum_{i=1}^n \frac{y_i - \mu_i}{a(\phi_i)\,V(\mu_i)\,g'(\mu_i)}\,x_{ij},
\]
where $\mu_i = b'(\theta_i)$ and $V(\mu_i) = b''(\theta_i)$, for $j = 1, \dots, p$. The parameter $\beta$ is found by solving the score equations $S_j(\beta) = 0$, $j = 1, \dots, p$.

1.2 Objective of the optimization procedure

The question we may ask is whether it is equivalent to solve the score equations or to minimize the opposite of the log-likelihood by the exact Newton method. Consider $f : \mathbb{R}^n \to \mathbb{R}$ a twice differentiable function with gradient vector $g(x) = \nabla f(x)$ and Hessian matrix $H(x) = \nabla^2 f(x)$. Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be a differentiable function, whose Jacobian matrix is denoted by $\mathrm{Jac}\,F(x) \in \mathbb{R}^{n \times n}$. From classical optimization books, e.g. Nocedal, J. & Wright, S. J. (2006), Numerical Optimization, Springer Science+Business Media, a local optimization method consists in computing the sequence $x_{k+1} = x_k + d_k$, where $d_k$ is computed according to a scheme. In addition, a globalization technique such as a line search may be used in conjunction, but globalization is seldom used when fitting GLMs.

The exact Newton method (also called the Newton-Raphson method) to find the minimum of a function $f$ uses the direction $d_k = -H(x_k)^{-1} g(x_k)$. In comparison, the steepest descent method to find the minimum of $f$ considers $d_k = -g(x_k)$. Furthermore, the exact Newton method to find a root of $F$ uses the direction $d_k = -\mathrm{Jac}\,F(x_k)^{-1} F(x_k)$. Hence, the direction is exactly the same for the minimization problem and the root-finding problem when the root function $F$ is the gradient $\nabla f$ of the objective. Hence, finding the roots of the score equations is equivalent to maximizing the log-likelihood.

1.3 Derivation of the Newton method for the score equations

The Newton method to find the root of the score equations is
\[
\beta^{k+1} = \beta^k - \mathrm{Jac}\,S(\beta^k)^{-1}\,S(\beta^k).
\]
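As a minimal illustration of this iteration, the sketch below (in Python rather than R, to stay self-contained) applies Newton-Raphson to the score equation of an iid Poisson sample in the canonical parameter $\theta = \log \lambda$, where $S(\theta) = \sum_i (y_i - e^\theta)$ and $\mathrm{Jac}\,S(\theta) = -n e^\theta$; the root is the MLE $\hat\theta = \log \bar y$. The data are illustrative, not taken from the note.

```python
import math

# Sketch: Newton-Raphson on the score equation for an iid Poisson sample,
# in the canonical parameter theta = log(lambda).  Here
#   S(theta)     = sum_i (y_i - e^theta),
#   Jac S(theta) = -n e^theta,
# and the root is the MLE theta_hat = log(mean(y)).
# The data below are illustrative, not taken from the note.
y = [3, 1, 4, 1, 5, 9, 2, 6]
n = len(y)

theta = 0.0
for _ in range(50):
    S = sum(y) - n * math.exp(theta)   # score
    J = -n * math.exp(theta)           # Jacobian of the score
    theta = theta - S / J              # theta^{k+1} = theta^k - JacS^{-1} S

print(abs(theta - math.log(sum(y) / n)) < 1e-12)  # converges to the MLE
```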
The exponent $k$ is used to denote the $k$th iteration, since subscripts are used for indexing observations and/or components. Let us compute the Jacobian of the score, i.e. the Hessian of the log-likelihood. By the product rule,
\[
\frac{\partial^2 L(\beta)}{\partial \beta_j \partial \beta_l} = \sum_{i=1}^n x_{ij} \left[ \frac{\partial}{\partial \beta_l}\!\left(\frac{y_i - b'(\theta_i)}{a(\phi_i)}\right) \frac{1}{b''(\theta_i)\,g'(\mu_i)} + \frac{y_i - b'(\theta_i)}{a(\phi_i)\,g'(\mu_i)} \frac{\partial}{\partial \beta_l}\!\left(\frac{1}{b''(\theta_i)}\right) + \frac{y_i - b'(\theta_i)}{a(\phi_i)\,b''(\theta_i)} \frac{\partial}{\partial \beta_l}\!\left(\frac{1}{g'(\mu_i)}\right) \right].
\]
The first term is
\[
\frac{\partial}{\partial \beta_l}\!\left(\frac{y_i - b'(\theta_i)}{a(\phi_i)}\right) = -\frac{b''(\theta_i)}{a(\phi_i)} \frac{\partial \theta_i}{\partial \beta_l} = -\frac{b''(\theta_i)}{a(\phi_i)} \frac{x_{il}}{b''(\theta_i)\,g'(\mu_i)} = -\frac{x_{il}}{a(\phi_i)\,g'(\mu_i)}.
\]

The second term is
\[
\frac{\partial}{\partial \beta_l}\!\left(\frac{1}{b''(\theta_i)}\right) = -\frac{b'''(\theta_i)}{b''(\theta_i)^2} \frac{\partial \theta_i}{\partial \beta_l} = -\frac{b'''(\theta_i)}{b''(\theta_i)^3} \frac{x_{il}}{g'(\mu_i)} = -\frac{b'''(\theta_i)}{V(\mu_i)^3} \frac{x_{il}}{g'(\mu_i)}.
\]
The third term is
\[
\frac{\partial}{\partial \beta_l}\!\left(\frac{1}{g'(\mu_i)}\right) = -\frac{g''(\mu_i)}{g'(\mu_i)^2} \frac{\partial \mu_i}{\partial \beta_l} = -\frac{g''(\mu_i)\,x_{il}}{g'(\mu_i)^3},
\]
since
\[
\frac{\partial \mu_i}{\partial \beta_l} = \frac{\partial b'(\theta_i)}{\partial \beta_l} = b''(\theta_i) \frac{\partial \theta_i}{\partial \beta_l} = b''(\theta_i) \frac{x_{il}}{b''(\theta_i)\,g'(\mu_i)} = \frac{x_{il}}{g'(\mu_i)}.
\]
Recalling that the Hessian matrix is defined as
\[
H(\beta, y_1, \dots, y_n) = \left(\frac{\partial^2 L(\beta)}{\partial \beta_j \partial \beta_l}\right)_{j,l}
\]
and using $b''(\theta_i) = V(\mu_i)$, we get
\[
\frac{\partial^2 L(\beta)}{\partial \beta_j \partial \beta_l} = -\sum_{i=1}^n \frac{x_{ij}\,x_{il}}{a(\phi_i)\,g'(\mu_i)^2\,V(\mu_i)} - \sum_{i=1}^n \frac{(y_i - \mu_i)\,b'''(\theta_i)\,x_{ij}\,x_{il}}{a(\phi_i)\,V(\mu_i)^3\,g'(\mu_i)^2} - \sum_{i=1}^n \frac{(y_i - \mu_i)\,g''(\mu_i)\,x_{ij}\,x_{il}}{a(\phi_i)\,V(\mu_i)\,g'(\mu_i)^3}.
\]
In practice, we use the expectation of this matrix w.r.t. the random variables $Y_i$: this procedure is known as the Fisher scoring method. The last two terms cancel in expectation because $E(Y_i) = \mu_i$. So
\[
\bar H(\beta) = E\big(H(\beta, Y_1, \dots, Y_n)\big) = \left(-\sum_{i=1}^n \frac{x_{ij}\,x_{il}}{a(\phi_i)\,g'(\mu_i)^2\,V(\mu_i)}\right)_{j,l}.
\]
This matrix can be rewritten as the product of three matrices, $\bar H(\beta) = -X^T W(\beta) X$, where
\[
W(\beta) = \begin{pmatrix} \frac{1}{a(\phi_1) g'(\mu_1)^2 V(\mu_1)} & & \\ & \ddots & \\ & & \frac{1}{a(\phi_n) g'(\mu_n)^2 V(\mu_n)} \end{pmatrix}, \qquad X = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix}.
\]
The expected Newton method is
\[
\beta^{k+1} = \beta^k + \big(X^T W(\beta^k) X\big)^{-1} S(\beta^k).
\]
Let us write the score vector in matrix form:
\[
S_j(\beta) = \sum_{i=1}^n \frac{y_i - \mu_i}{a(\phi_i)\,V(\mu_i)\,g'(\mu_i)}\,x_{ij} = \sum_{i=1}^n \frac{x_{ij}}{a(\phi_i)\,g'(\mu_i)^2\,V(\mu_i)}\,(y_i - \mu_i)\,g'(\mu_i),
\]
so that $S(\beta) = X^T W(\beta) \tilde Y(\beta)$, where we define a new vector $\tilde Y(\beta) = \big((y_i - \mu_i)\,g'(\mu_i)\big)_i \in \mathbb{R}^n$. The expected Newton method can be reformulated as
\[
\beta^{k+1} = \beta^k + \big(X^T W(\beta^k) X\big)^{-1} X^T W(\beta^k)\,\tilde Y(\beta^k).
\]
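For the canonical link $g = (b')^{-1}$, one has $\theta_i = \eta_i$ and $g'(\mu_i) = 1/V(\mu_i)$, and the two $y$-dependent Hessian terms cancel each other exactly (not only in expectation), so Fisher scoring coincides with the plain Newton-Raphson method. Below is a minimal Python check of this for a Poisson model with log link, on illustrative data, comparing a finite-difference Jacobian of the score with $-X^T W(\beta) X$.

```python
import math

# Sketch (illustrative data): for the canonical link, the observed Hessian of
# the log-likelihood equals its expectation -X'WX, so Fisher scoring and
# Newton-Raphson coincide.  We check this for a Poisson/log-link model by
# comparing a finite-difference Jacobian of the score with -X'WX.
y = [1.0, 4.0, 2.0]
X = [[1.0, 0.2], [1.0, 1.1], [1.0, -0.7]]
beta = [0.3, 0.5]

def mu_i(beta, xi):
    return math.exp(sum(b * v for b, v in zip(beta, xi)))  # inverse log link

def score(beta):
    # Canonical link: V(mu) g'(mu) = 1, so S_j = sum_i (y_i - mu_i) x_ij
    s = [0.0, 0.0]
    for yi, xi in zip(y, X):
        m = mu_i(beta, xi)
        for j in range(2):
            s[j] += (yi - m) * xi[j]
    return s

def minus_XtWX(beta):
    # Expected Hessian -X'WX with w_i = 1/(g'(mu_i)^2 V(mu_i)) = mu_i
    H = [[0.0, 0.0], [0.0, 0.0]]
    for xi in X:
        m = mu_i(beta, xi)
        for j in range(2):
            for l in range(2):
                H[j][l] -= m * xi[j] * xi[l]
    return H

h = 1e-6
H_num = [[(score([b + h * (k == l) for k, b in enumerate(beta)])[j]
           - score(beta)[j]) / h for l in range(2)] for j in range(2)]
H_exp = minus_XtWX(beta)
print(all(abs(H_num[j][l] - H_exp[j][l]) < 1e-3
          for j in range(2) for l in range(2)))
```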

1.4 Reformulation as an iterative weighted least squares (IWLS) problem

Let us rewrite $\beta^k$ as a matrix product
\[
\beta^k = \big(X^T W(\beta^k) X\big)^{-1} X^T W(\beta^k)\,X \beta^k,
\]
where $X\beta$ is the vector of linear predictors $\eta_i$. In other words, the expected Newton method can be factorized as
\[
\beta^{k+1} = \big(X^T W(\beta^k) X\big)^{-1} X^T W(\beta^k)\,\big(X\beta^k + \tilde Y(\beta^k)\big) = \big(X^T W(\beta^k) X\big)^{-1} X^T W(\beta^k)\,Z(\beta^k),
\]
with a new vector $Z(\beta) = \big(\eta_i(\beta) + (y_i - \mu_i(\beta))\,g'(\mu_i(\beta))\big)_i$. That is, $\beta^{k+1}$ is the solution of a weighted least-squares problem with weights $W^k$, response vector $Z^k$ and explanatory variables $X$.

1.5 The IWLS algorithm

The iterative weighted least-squares algorithm used to fit GLMs is as follows.

1. Initialization:
(a) Use the original data with a small shift $\mu_i^0 = y_i + 0.1$ to compute $\eta_i^0 = g(\mu_i^0)$.
(b) Compute the working responses $Z^0 = \big(\eta_i^0 + (y_i - \mu_i^0)\,g'(\mu_i^0)\big)_i$.
(c) Compute the working weights $W^0 = \mathrm{diag}(w_1, \dots, w_n)$ with $w_i = \frac{1}{a(\phi_i)\,g'(\mu_i^0)^2\,V(\mu_i^0)}$.
(d) Solve the system $X^T W^0 X \beta^1 = X^T W^0 Z^0$ to get $\beta^1$.

2. Iteration: for $k = 1, \dots, m$, do
(a) Compute the working responses $Z^k = (z_i)_i$ with $z_i = \eta_i(\beta^k) + (y_i - \mu_i(\beta^k))\,g'(\mu_i(\beta^k))$.
(b) Compute the working weights $W^k = \mathrm{diag}(w_1, \dots, w_n)$ with $w_i = \frac{1}{a(\phi_i)\,g'(\mu_i(\beta^k))^2\,V(\mu_i(\beta^k))}$.
(c) Solve the system $X^T W^k X \beta^{k+1} = X^T W^k Z^k$ to get $\beta^{k+1}$.
(d) Verify convergence on the deviance: $|\mathrm{Dev}(\beta^{k+1}) - \mathrm{Dev}(\beta^k)| \le \epsilon$.

In practice, the linear system $X^T W^k X \beta^{k+1} = X^T W^k Z^k$ is solved via a QR decomposition, see e.g. Green (1984).

2 Numerical illustration

In this section, we carry out simple examples of GLMs on simulated datasets in the R statistical software; see R Core Team (2017), R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.r-project.org/.
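To make the algorithm concrete, here is a minimal sketch of IWLS (in Python rather than R, to stay self-contained) for a Poisson GLM with log link, where $g(\mu) = \log \mu$, $V(\mu) = \mu$ and $a(\phi) = 1$, so the working weights reduce to $w_i = \mu_i$ and the working responses to $z_i = \eta_i + (y_i - \mu_i)/\mu_i$. The data (an intercept plus one binary covariate) are illustrative; with a single binary covariate the MLE is explicit, which gives a check on the result.

```python
import math

# A minimal Python sketch of the IWLS algorithm above, for a Poisson GLM with
# log link: g(mu) = log(mu), V(mu) = mu, a(phi) = 1, hence w_i = mu_i and
# z_i = eta_i + (y_i - mu_i)/mu_i.  The data (an intercept plus one binary
# covariate) are illustrative; the 2x2 weighted least-squares system
# X'WX beta = X'WZ is solved by Cramer's rule.
y = [2, 3, 1, 4, 6, 7, 5, 8]
x = [0, 0, 0, 0, 1, 1, 1, 1]

def iwls_poisson(y, x, n_iter=25):
    # Initialization with a small shift, as in step 1(a): mu_i^0 = y_i + 0.1
    mu = [yi + 0.1 for yi in y]
    eta = [math.log(m) for m in mu]
    b0 = b1 = 0.0
    for _ in range(n_iter):
        w = mu[:]                                               # working weights
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working responses
        s00 = sum(w)
        s01 = sum(wi * xi for wi, xi in zip(w, x))
        s11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        t0 = sum(wi * zi for wi, zi in zip(w, z))
        t1 = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = s00 * s11 - s01 * s01
        b0 = (s11 * t0 - s01 * t1) / det
        b1 = (s00 * t1 - s01 * t0) / det
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
    return b0, b1

b0, b1 = iwls_poisson(y, x)
# With a single binary covariate the MLE is explicit: exp(b0) is the sample
# mean of y in the group x = 0, exp(b0 + b1) the mean in the group x = 1.
print(round(math.exp(b0), 6))       # -> 2.5
print(round(math.exp(b0 + b1), 6))  # -> 6.5
```

The same iteration is what `glm` performs in R, with the linear system solved by a QR decomposition instead of explicit normal equations.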

2.1 Poisson regression

A Poisson distribution has the following probability mass function $P(X = x) = \lambda^x e^{-\lambda}/x!$ for $x \in \mathbb{N}$. We rewrite its logarithm as
\[
\log f(x) = x \log(\lambda) - \lambda - \log(x!).
\]
So $\theta = \log(\lambda)$, i.e. $\lambda = e^\theta$, $b(x) = e^x$, $\phi = 1$, $a(x) = x$ and $c(x, \phi) = -\log(x!)$. In particular, $(b')^{-1}(x) = \log(x)$: the canonical link is the log link.

Below we make a simple Poisson regression with a single categorical variable, for which an explicit solution exists. We plot the absolute relative error of the GLM estimator.

[Figure: absolute relative error (between 0.00 and 0.06) of the GLM estimator against the sample size (0 to 5000).]

2.2 Gamma regression

A gamma distribution has the following density function $f(x) = \lambda^\alpha x^{\alpha-1} e^{-\lambda x} / \Gamma(\alpha)$ for $x \in \mathbb{R}_+$, with $\lambda, \alpha > 0$. We rewrite its logarithm as
\[
\log f(x) = \frac{x(-\lambda/\alpha) - \big(-\log(\lambda/\alpha)\big)}{1/\alpha} + (\alpha - 1)\log(x) + \alpha \log(\alpha) - \log \Gamma(\alpha).
\]
So $\theta = -\lambda/\alpha$ with $\Theta = \mathbb{R}_-^*$, $\phi = 1/\alpha$, $a(x) = x$, $b(x) = -\log(-x)$ and
\[
c(x, \phi) = \frac{1}{\phi}\log\frac{1}{\phi} + \Big(\frac{1}{\phi} - 1\Big)\log(x) - \log\Gamma\Big(\frac{1}{\phi}\Big).
\]
In particular, $b'(x) = -1/x$: the canonical link is $g(\mu) = -1/\mu$.

Below we make a simple gamma regression with a single categorical variable, for which an explicit solution exists. We plot the absolute relative error of the GLM estimator.

[Figure: absolute relative error (between 0.00 and 0.20) of the GLM estimator against the sample size (0 to 5000).]
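As a sanity check on the gamma parametrization, the sketch below numerically integrates the gamma density and verifies the exponential-family identities $E(X) = b'(\theta) = -1/\theta = \alpha/\lambda$ and $Var(X) = a(\phi)\,b''(\theta) = \phi/\theta^2 = \alpha/\lambda^2$; the values of $\alpha$ and $\lambda$ are illustrative.

```python
import math

# Sketch: numerically verify the exponential-family identities for the gamma
# parametrization above, E(X) = b'(theta) = -1/theta = alpha/lambda and
# Var(X) = a(phi) b''(theta) = phi/theta^2 = alpha/lambda^2.
# The values of alpha and lambda are illustrative.
alpha, lam = 3.0, 2.0
theta, phi = -lam / alpha, 1.0 / alpha

def density(x):
    return lam**alpha * x**(alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)

# Plain Riemann sum on a truncated support (the tail beyond 40 is negligible)
n, upper = 100000, 40.0
step = upper / n
xs = [step * i for i in range(1, n + 1)]
mean = sum(x * density(x) for x in xs) * step
second = sum(x * x * density(x) for x in xs) * step
var = second - mean**2

print(abs(mean - (-1.0 / theta)) < 1e-4)   # E(X) = alpha/lambda = 1.5
print(abs(var - phi / theta**2) < 1e-4)    # Var(X) = alpha/lambda^2 = 0.75
```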