Stat 710: Mathematical Statistics Lecture 12


Stat 710: Mathematical Statistics, Lecture 12
Jun Shao
Department of Statistics, University of Wisconsin, Madison, WI 53706, USA
Feb 18, 2009

Lecture 12: MLE in generalized linear models (GLM) and quasi-MLE

MLE in exponential families

Suppose that X has a distribution from a natural exponential family so that the likelihood function is

    \ell(\eta) = \exp\{\eta^\tau T(x) - \zeta(\eta)\} h(x),

where \eta \in \Xi is a vector of unknown parameters. The likelihood equation is then

    \frac{\partial \log \ell(\eta)}{\partial \eta} = T(x) - \frac{\partial \zeta(\eta)}{\partial \eta} = 0,

which reduces to T(x) = \partial\zeta(\eta)/\partial\eta and has a unique solution, assuming that T(x) is in the range of \partial\zeta(\eta)/\partial\eta. Note that

    \frac{\partial^2 \log \ell(\eta)}{\partial \eta \, \partial \eta^\tau} = -\frac{\partial^2 \zeta(\eta)}{\partial \eta \, \partial \eta^\tau} = -\mathrm{Var}(T)

(see the proof of Proposition 3.2).
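
As a quick check of these formulas (a worked example added here, not on the slide), take X_1,\dots,X_n i.i.d. Poisson(\lambda), so that w.r.t. counting measure

    \ell(\eta) = \exp\Big\{\eta \sum_{i=1}^n x_i - n e^{\eta}\Big\} \prod_{i=1}^n (x_i!)^{-1},
    \qquad \eta = \log\lambda,\quad T(x) = \sum_{i=1}^n x_i,\quad \zeta(\eta) = n e^{\eta}.

The likelihood equation T(x) = \zeta'(\eta) = n e^{\eta} has the unique solution \hat{\eta} = \log\bar{x}, provided T(x) > 0 so that T(x) lies in the range (0,\infty) of \zeta'; thus T(x) is the unique MLE of \mu(\eta) = n e^{\eta} = n\lambda, i.e., \bar{x} is the MLE of \lambda.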

MLE in exponential families

Since \mathrm{Var}(T) is positive definite, \log\ell(\eta) is strictly concave in \eta and T(x) is the unique MLE of the parameter \mu(\eta) = \partial\zeta(\eta)/\partial\eta.
Also, the function \mu(\eta) is one-to-one so that \mu^{-1} exists. By Definition 4.3, the MLE of \eta is \hat{\eta} = \mu^{-1}(T(x)).
If the distribution of X is in a general exponential family and the likelihood function is

    \ell(\theta) = \exp\{[\eta(\theta)]^\tau T(x) - \xi(\theta)\} h(x),

then the MLE of \theta is \hat{\theta} = \eta^{-1}(\hat{\eta}), if \eta^{-1} exists and \hat{\eta} is in the range of \eta(\theta). Of course, \hat{\theta} is also a solution of the likelihood equation

    \frac{\partial \log \ell(\theta)}{\partial \theta} = \frac{\partial [\eta(\theta)]^\tau}{\partial \theta} T(x) - \frac{\partial \xi(\theta)}{\partial \theta} = 0.

The results for exponential families lead to an estimation method in a class of models that have very wide applications.


GLM

The GLM is a generalization of the normal linear model discussed in §3.3.1-3.3.2. The GLM is useful since it covers situations where the relationship between E(X_i) and Z_i is nonlinear and/or the X_i's are discrete.

The structure of a GLM
The sample X = (X_1,\dots,X_n) has independent X_i's, and X_i has the p.d.f.

    \exp\left\{ \frac{\eta_i x_i - \zeta(\eta_i)}{\phi_i} \right\} h(x_i, \phi_i), \qquad i = 1,\dots,n,

w.r.t. a \sigma-finite measure \nu, where \eta_i and \phi_i are unknown, \phi_i > 0,

    \eta_i \in \Xi = \Big\{ \eta : 0 < \int h(x,\phi)\, e^{\eta x/\phi} \, d\nu(x) < \infty \Big\} \subset \mathbb{R}

for all i, \zeta and h are known functions, and \zeta''(\eta) > 0 is assumed for all \eta \in \Xi^\circ, the interior of \Xi.
Note that the p.d.f. belongs to an exponential family if \phi_i is known. As a consequence,

    E(X_i) = \zeta'(\eta_i) \quad \text{and} \quad \mathrm{Var}(X_i) = \phi_i \zeta''(\eta_i), \qquad i = 1,\dots,n.
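
For instance (an example added here for concreteness, not on the slide), the Bernoulli family fits this structure with \phi_i = 1:

    p^{x}(1-p)^{1-x} = \exp\{\eta x - \zeta(\eta)\}, \qquad \eta = \log\frac{p}{1-p},\quad \zeta(\eta) = \log(1+e^{\eta}),\quad h \equiv 1,

w.r.t. counting measure on \{0,1\}, giving E(X) = \zeta'(\eta) = e^{\eta}/(1+e^{\eta}) = p and \mathrm{Var}(X) = \zeta''(\eta) = p(1-p).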


The structure of a GLM

Define \mu(\eta) = \zeta'(\eta). It is assumed that \eta_i is related to Z_i, the ith value of a p-vector of covariates, through

    g(\mu(\eta_i)) = \beta^\tau Z_i, \qquad i = 1,\dots,n,

where \beta is a p-vector of unknown parameters and g, called a link function, is a known one-to-one, third-order continuously differentiable function on \{\mu(\eta) : \eta \in \Xi^\circ\}.
If \mu = g^{-1}, then \eta_i = \beta^\tau Z_i and g is called the canonical or natural link function.
If g is not canonical, we assume that \frac{d}{d\eta}(g \circ \mu)(\eta) \neq 0 for all \eta.
In a GLM, the parameter of interest is \beta. We assume that the range of \beta is B = \{\beta : (g \circ \mu)^{-1}(\beta^\tau z) \in \Xi^\circ \text{ for all } z \in \mathcal{Z}\}, where \mathcal{Z} is the range of the Z_i's.
The \phi_i's are called dispersion parameters and are considered to be nuisance parameters.
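
Two familiar canonical links, added here as examples: for the Bernoulli family above, \mu(\eta) = e^{\eta}/(1+e^{\eta}), so the canonical g is the logit, g(\mu) = \log[\mu/(1-\mu)]; for the Poisson family with \zeta(\eta) = e^{\eta} (Example 4.37 below), \mu(\eta) = e^{\eta}, so the canonical g is the log link, g(\mu) = \log\mu.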

MLE in GLM

An MLE of \beta in a GLM is considered under the assumption \phi_i = \phi/t_i, i = 1,\dots,n, with an unknown \phi > 0 and known positive t_i's.
Let \theta = (\beta, \phi) and \psi = (g \circ \mu)^{-1}. Then

    \log\ell(\theta) = \sum_{i=1}^n \left[ \log h(x_i, \phi/t_i) + \frac{\psi(\beta^\tau Z_i)\, x_i - \zeta(\psi(\beta^\tau Z_i))}{\phi/t_i} \right],

    \frac{\partial \log\ell(\theta)}{\partial \beta} = \frac{1}{\phi} \sum_{i=1}^n [x_i - \mu(\psi(\beta^\tau Z_i))]\, \psi'(\beta^\tau Z_i)\, t_i Z_i = 0,

    \frac{\partial \log\ell(\theta)}{\partial \phi} = \sum_{i=1}^n \left\{ \frac{\partial \log h(x_i, \phi/t_i)}{\partial \phi} - \frac{t_i [\psi(\beta^\tau Z_i)\, x_i - \zeta(\psi(\beta^\tau Z_i))]}{\phi^2} \right\} = 0.

From the first likelihood equation, an MLE of \beta, if it exists, can be obtained without estimating \phi. The second likelihood equation, however, is usually difficult to solve. Other estimators of \phi have been suggested by various researchers; see, for example, McCullagh and Nelder (1989).
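
One simplification worth recording (a step spelled out here, not on the slide): under a canonical link, \psi(t) = t and \psi'(t) \equiv 1, so the first likelihood equation reduces to

    \sum_{i=1}^n t_i [x_i - \mu(\beta^\tau Z_i)] Z_i = 0.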

MLE in GLM

Suppose that there is a solution \hat{\beta} \in B to the likelihood equation. Then

    \mathrm{Var}\!\left( \frac{\partial \log\ell(\theta)}{\partial \beta} \right) = M_n(\beta)/\phi, \qquad
    \frac{\partial^2 \log\ell(\theta)}{\partial \beta \, \partial \beta^\tau} = [R_n(\beta) - M_n(\beta)]/\phi,

where

    M_n(\beta) = \sum_{i=1}^n [\psi'(\beta^\tau Z_i)]^2\, \zeta''(\psi(\beta^\tau Z_i))\, t_i Z_i Z_i^\tau,

    R_n(\beta) = \sum_{i=1}^n [x_i - \mu(\psi(\beta^\tau Z_i))]\, \psi''(\beta^\tau Z_i)\, t_i Z_i Z_i^\tau.

Consider first the simple case of canonical g: then \psi'' \equiv 0 and R_n \equiv 0. If M_n(\beta) is positive definite for all \beta, then \log\ell(\theta) is strictly concave in \beta for any fixed \phi and, therefore, \hat{\beta} is the unique MLE of \beta.
For noncanonical g, R_n(\beta) \not\equiv 0 and \hat{\beta} is not necessarily an MLE. If R_n(\beta) is dominated by M_n(\beta), i.e.,

    [M_n(\beta)]^{-1/2} R_n(\beta) [M_n(\beta)]^{-1/2} \to 0

in some sense, then \log\ell(\theta) is asymptotically concave in \beta and \hat{\beta} is an MLE for large n. See more details in the proof of Theorem 4.18 in §4.5.2.

Computation of MLE in GLM

In a GLM, an MLE \hat{\beta} usually does not have an analytic form. A numerical method such as the Newton-Raphson or the Fisher-scoring method has to be applied.
Using the Newton-Raphson method, we have the following iteration procedure:

    \hat{\beta}^{(t+1)} = \hat{\beta}^{(t)} - [R_n(\hat{\beta}^{(t)}) - M_n(\hat{\beta}^{(t)})]^{-1} s_n(\hat{\beta}^{(t)}), \qquad t = 0, 1, \dots,

where s_n(\beta) = \phi\, \partial\log\ell(\theta)/\partial\beta.
Note that E[R_n(\beta)] = 0 if \beta is the true parameter value and x_i is replaced by X_i. This means that the Fisher-scoring method uses the following iteration procedure:

    \hat{\beta}^{(t+1)} = \hat{\beta}^{(t)} + [M_n(\hat{\beta}^{(t)})]^{-1} s_n(\hat{\beta}^{(t)}), \qquad t = 0, 1, \dots

If the canonical link is used, then the two methods are identical.
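
To make the iteration concrete, here is a minimal sketch (not part of the lecture; the names fisher_scoring, Z, and x are illustrative) of Fisher scoring for a Poisson GLM with the canonical log link, t_i \equiv 1 and \phi = 1, in which case it coincides with Newton-Raphson:

import numpy as np

def fisher_scoring(Z, x, n_iter=25, tol=1e-10):
    # Iterate beta <- beta + M_n(beta)^{-1} s_n(beta), where under the
    # canonical log link s_n(beta) = sum_i (x_i - e^{beta'Z_i}) Z_i and
    # M_n(beta) = sum_i e^{beta'Z_i} Z_i Z_i'.
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        mu = np.exp(Z @ beta)            # mu(psi(beta'Z_i)) = exp(beta'Z_i)
        s = Z.T @ (x - mu)               # score s_n(beta)
        M = Z.T @ (mu[:, None] * Z)      # M_n(beta)
        step = np.linalg.solve(M, s)
        beta = beta + step
        if np.max(np.abs(step)) < tol:   # stop once the update is negligible
            break
    return beta

# Sanity check on simulated Poisson data: the iteration should recover beta_true.
rng = np.random.default_rng(0)
n = 500
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -0.3])
x = rng.poisson(np.exp(Z @ beta_true))
print(fisher_scoring(Z, x))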

Example 4.36

Consider the GLM with \zeta(\eta) = \eta^2/2, \eta \in \mathbb{R}.
If g is the canonical link, then the model is the same as a linear model with independent \varepsilon_i's distributed as N(0, \phi_i). If \phi_i \equiv \phi, then the likelihood equation is exactly the same as the normal equation in §3.3.1. If Z is of full rank, then M_n(\beta) = Z^\tau Z is positive definite. Thus, the LSE \hat{\beta} in a normal linear model is the unique MLE of \beta.
Suppose now that g is noncanonical but \phi_i \equiv \phi. Then the model reduces to the one with independent X_i's and

    X_i \sim N\big(g^{-1}(\beta^\tau Z_i), \phi\big), \qquad i = 1,\dots,n.

This type of model is called a nonlinear regression model (with normal errors), and an MLE of \beta under this model is also called a nonlinear LSE, since maximizing the log-likelihood is equivalent to minimizing the sum of squares

    \sum_{i=1}^n [x_i - g^{-1}(\beta^\tau Z_i)]^2.

Under certain conditions the matrix R_n(\beta) is dominated by M_n(\beta) and an MLE of \beta exists.
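
The equivalence is immediate once the log-likelihood is written out (a step filled in here):

    \log\ell(\beta,\phi) = -\frac{n}{2}\log(2\pi\phi) - \frac{1}{2\phi} \sum_{i=1}^n [x_i - g^{-1}(\beta^\tau Z_i)]^2,

so for any fixed \phi, maximizing over \beta is the same as minimizing the sum of squares.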

Example 4.37 (The Poisson model)

Consider the GLM with \zeta(\eta) = e^{\eta}, \eta \in \mathbb{R}, \phi_i = \phi/t_i. If \phi_i = 1, then X_i has the Poisson distribution with mean e^{\eta_i}.
Under the canonical link g(t) = \log t,

    M_n(\beta) = \sum_{i=1}^n e^{\beta^\tau Z_i} t_i Z_i Z_i^\tau,

which is positive definite if \inf_i e^{\beta^\tau Z_i} > 0 and the matrix (\sqrt{t_1}\, Z_1, \dots, \sqrt{t_n}\, Z_n) is of full rank.
There is one noncanonical link that deserves attention. Suppose that we choose a link function so that [\psi'(t)]^2 \zeta''(\psi(t)) \equiv 1. Then M_n(\beta) = \sum_{i=1}^n t_i Z_i Z_i^\tau does not depend on \beta.
In §4.5.2 it is shown that the asymptotic variance of the MLE \hat{\beta} is \phi [M_n(\beta)]^{-1}. The fact that M_n(\beta) does not depend on \beta makes the estimation of the asymptotic variance (and, thus, statistical inference) easy.
Under the Poisson model, \zeta''(t) = e^{t} and, therefore, we need to solve the differential equation [\psi'(t)]^2 e^{\psi(t)} = 1. A solution is \psi(t) = 2\log(t/2), which corresponds to the link g(\mu) = 2\sqrt{\mu}.
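
To verify this solution (a check added here): with \psi(t) = 2\log(t/2), we have \psi'(t) = 2/t and e^{\psi(t)} = (t/2)^2, so [\psi'(t)]^2 e^{\psi(t)} = (4/t^2)(t^2/4) = 1; and since \mu(\eta) = e^{\eta}, the relation \beta^\tau Z_i = \psi^{-1}(\eta_i) = 2 e^{\eta_i/2} gives g(\mu) = 2\sqrt{\mu}.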

Quasi-MLE

Suppose that the assumption \phi_i = \phi/t_i fails (the \phi_i's are arbitrary), or the distributional assumption on X_i does not hold (e.g., the X_i's are longitudinal), but

    E(X_i) = \zeta'(\eta_i) \quad \text{and} \quad \mathrm{Var}(X_i) = \phi_i \zeta''(\eta_i), \qquad i = 1,\dots,n,

and g(\mu(\eta_i)) = \beta^\tau Z_i, i = 1,\dots,n, still hold. If we estimate \beta by solving the equation

    \sum_{i=1}^n [x_i - \mu(\psi(\beta^\tau Z_i))]\, \psi'(\beta^\tau Z_i)\, t_i Z_i = 0,

then the resulting estimator is called a quasi-MLE. This method is also called the method of generalized estimating equations (GEE).
The quasi-MLE or GEE has good asymptotic properties: it is efficient if the GEE is a likelihood equation, and robust if it is not.
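
For instance (a remark added here), in the canonical Poisson setting of Example 4.37 the GEE reduces to \sum_{i=1}^n t_i (x_i - e^{\beta^\tau Z_i}) Z_i = 0, which is exactly the likelihood score from the MLE slides; the same Fisher-scoring sketch given earlier therefore computes the quasi-MLE, while its validity now rests only on the two moment conditions above.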