
Weighting

We have seen that if $E(Y) = X\beta$ and $V(Y) = \sigma^2 G$, where $G$ is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if $G$ is diagonal with $\operatorname{trace}(G) = n$, as weighted least squares. Now consider the case where $E(Y) = \mu = X\beta$ and $V(Y) = \sigma^2 G$, where $G$ is diagonal but unknown. We typically assume a parametric form relating $g_i$ to $\mu_i$ or $x_i$, where $g_i$ is the $i$th diagonal element of $G$. For example, we might assume that $g_i = x_i^{\theta}$, where $\theta$ is an unknown parameter.

We usually fit such models using iterated weighted least squares (sketched in code below):

1. start with the ordinary least squares estimates;
2. use the residuals to estimate $\theta$ and hence the $g_i$;
3. use these estimates in a weighted least squares fit to re-estimate $\beta$;
4. iterate steps 2 and 3 until convergence to $\hat{\beta}_G$.

It can be shown that, asymptotically, $\hat{\beta}_G \sim N\{\beta, \sigma^2 (X^\top G^{-1} X)^{-1}\}$. Note, however, that the variance depends on $G$, which is a function of $\theta$, and not on $\hat{G}$, which is a function of $\hat{\theta}$. The quality of the asymptotic approximation depends on how well we estimate $\theta$; in particular, standard errors are usually underestimated if a simple plug-in estimator is used.
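A minimal sketch of this algorithm, assuming the hypothetical variance model $g_i = x_i^{\theta}$ with a single positive covariate `x_var` driving the variance (the function name and the log-scale regression estimator of $\theta$ are illustrative choices, not part of the notes):

```python
import numpy as np

def iterated_wls(X, y, x_var, tol=1e-8, max_iter=50):
    """Iterated weighted least squares for V(Y_i) = sigma^2 * x_var_i ** theta.

    X: (n, p) design matrix; y: (n,) response;
    x_var: (n,) positive covariate assumed to drive the variance.
    """
    # Step 1: start with the ordinary least squares estimate.
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    theta = 0.0
    for _ in range(max_iter):
        # Step 2: estimate theta from the residuals; under g_i = x_var_i**theta,
        # the log squared residuals are roughly linear in log(x_var).
        r2 = (y - X @ beta) ** 2
        theta = np.polyfit(np.log(x_var), np.log(r2 + 1e-12), 1)[0]
        # Step 3: weighted least squares with weights w_i = 1 / g_i.
        w = x_var ** (-theta)
        Xw = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ Xw, Xw.T @ y)
        # Step 4: iterate steps 2 and 3 to convergence.
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new, theta
        beta = beta_new
    return beta, theta
```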

Transformations

The famous Box–Cox family of transformations is given by
$$Y^{(\lambda)} = \begin{cases} (Y^{\lambda} - 1)/\lambda & \text{if } \lambda \neq 0; \\ \log Y & \text{if } \lambda = 0. \end{cases}$$
The assumed model is $Y_i^{(\lambda)} \sim N(x_i^\top \beta, \sigma^2)$, with the $Y_i^{(\lambda)}$ independent. The main purpose of this transformation is to ensure symmetry (normality) in the distribution of $Y$. In a randomization framework, it ensures additivity of unit and treatment effects.

The best-fitting $\lambda$ is chosen by maximum likelihood and should stabilise the variance. The MLEs are easily obtained by an iterative maximisation over $\lambda$, finding the least squares estimates of $\beta$ at each $\lambda$, i.e. we minimise
$$S(\lambda) = \sum_{i=1}^{n} \{Y_i^{(\lambda)} - x_i^\top \beta\}^2.$$
It can be shown that, for a fixed $\lambda$, the likelihood ratio statistic $n \log\{S(\lambda)/S(\hat{\lambda})\}$ is approximately $\chi^2_1$. This means that we can conduct all the usual inference, but with one degree of freedom lost for estimating $\lambda$.
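A sketch of the profile likelihood over a grid of $\lambda$ values, using the standard Jacobian-normalised transform so that $S(\lambda)$ is comparable across $\lambda$ (the function name and grid are illustrative):

```python
import numpy as np

def boxcox_profile_loglik(X, y, lambdas):
    """Profile log-likelihood over a grid of lambda values for the model
    Y^(lambda) ~ N(X beta, sigma^2); y must be positive."""
    n = len(y)
    gm = np.exp(np.mean(np.log(y)))  # geometric mean, for the Jacobian term
    out = []
    for lam in lambdas:
        # Jacobian-normalised Box-Cox transform.
        if abs(lam) < 1e-10:
            z = gm * np.log(y)
        else:
            z = (y ** lam - 1.0) / (lam * gm ** (lam - 1.0))
        beta = np.linalg.lstsq(X, z, rcond=None)[0]
        s = np.sum((z - X @ beta) ** 2)  # S(lambda)
        out.append(-0.5 * n * np.log(s / n))
    return np.array(out)

# lambda_hat maximises the profile; {lambda : 2 * (max - loglik) < 3.84}
# gives an approximate 95% confidence interval (chi-squared, 1 d.f.).
```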

Both weighting and transformation rely on the assumption that the variance increases (or decreases) with $\mu$ or with $x$. Whereas weighting assumes a symmetrical distribution, transformation assumes that the distribution becomes more asymmetrical the larger the variance is. It is also possible to combine transformations and weighting (if the data are sufficiently rich) to correct simultaneously for heteroskedasticity and asymmetry. Other families of transformations are possible.

It is also possible to transform both sides, i.e.
$$Y_i^{(\lambda)} \sim N\big((x_i^\top \beta)^{(\lambda)}, \sigma^2\big).$$
This corrects for asymmetry while maintaining the functional relationship between $\mu$ and the explanatory variables. It makes most sense when the functional form of $\mu$ is known (and so is much more common with nonlinear functional forms).

Components of Variance

In many studies there are several nested levels of variation, e.g. schools and pupils. It is natural to consider each of these as contributing a variance component, e.g. for group $i$, unit $j$, we have
$$Y_{ij} = x_{ij}^\top \beta + \delta_i + \epsilon_{ij},$$
where $\delta_i \sim N(0, \sigma_1^2)$, $\epsilon_{ij} \sim N(0, \sigma^2)$ and all random variables are independent. This can be rewritten as
$$Y \sim N_n(X\beta, V),$$
where
$$V = \begin{pmatrix} U & 0 & \cdots & 0 \\ 0 & U & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & U \end{pmatrix}, \qquad U = \begin{pmatrix} \sigma_1^2 + \sigma^2 & \sigma_1^2 & \cdots & \sigma_1^2 \\ \sigma_1^2 & \sigma_1^2 + \sigma^2 & & \vdots \\ \vdots & & \ddots & \sigma_1^2 \\ \sigma_1^2 & \cdots & \sigma_1^2 & \sigma_1^2 + \sigma^2 \end{pmatrix}.$$
This generalizes in an obvious way to unequal numbers of subgroups in each group, any number of variance components, and negative variance components.

If $\sigma_1^2/\sigma^2$ were known, $\beta$ could be estimated by generalized least squares. In practice, we usually estimate the variance components and plug the estimates into a generalized least squares fit, as in the sketch below. This underestimates the standard errors of $\hat{\beta}$, but various corrections are available. The variance components can be estimated by maximum likelihood or by residual maximum likelihood (REML) (or we can do a Bayesian analysis). REML applies an orthogonal transformation to $Y$ to obtain $n - p$ contrasts whose distribution is free of $\beta$, and maximises the resulting likelihood. In the general linear model, it gives $S^2$, the usual unbiased estimator, as the estimator of $\sigma^2$.
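A minimal sketch of the plug-in generalized least squares step for the one-way layout above, assuming (estimated) variance components are already in hand (the function and argument names are illustrative):

```python
import numpy as np
from scipy.linalg import block_diag

def gls_plugin(X, y, group_sizes, sigma1_sq, sigma_sq):
    """Plug-in GLS for the one-way components-of-variance model, given
    (estimated) variance components sigma1_sq and sigma_sq."""
    # Each group's covariance block is U = sigma1^2 * J + sigma^2 * I.
    blocks = [sigma1_sq * np.ones((m, m)) + sigma_sq * np.eye(m)
              for m in group_sizes]
    V = block_diag(*blocks)
    Vinv = np.linalg.inv(V)
    XtVinv = X.T @ Vinv
    beta_hat = np.linalg.solve(XtVinv @ X, XtVinv @ y)
    # Treats V as known, so these standard errors are understated,
    # exactly as noted above.
    cov_beta = np.linalg.inv(XtVinv @ X)
    return beta_hat, cov_beta
```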

Generalized Linear Models

Linear models are useful for response variables taking values on $\mathbb{R}$. Generalized linear models (GLMs) are used for other responses. GLMs assume that the responses come from a particular distribution in the exponential family, with mean $\mu$ depending on $x$ through a link function $g(\cdot)$:
$$g(\mu_i) = x_i^\top \beta,$$
and a constant dispersion parameter. They are most commonly used with discrete data, the assumed distribution being Poisson or binomial, but occasionally the gamma distribution is used for continuous data on $\mathbb{R}^+$. This is an alternative to transforming $Y$.

GLMs have a number of properties which make them simple to work with:

- Maximum likelihood estimates can be obtained using iteratively reweighted least squares (IRLS, sketched below), which typically converges quickly.
- Canonical link functions (e.g. $\log \lambda$ for the Poisson, $\log\{\pi/(1-\pi)\}$ for the binomial) can be used, for which $X^\top Y$ is a sufficient statistic.
- Extensions include generalized linear mixed models and hierarchical generalized linear models, which allow for additional random effects.
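A sketch of IRLS for a Poisson GLM with the canonical log link (this is the generic algorithm; the names are illustrative, not code from the notes):

```python
import numpy as np

def irls_poisson(X, y, tol=1e-10, max_iter=100):
    """Iteratively reweighted least squares for Poisson regression
    with the canonical log link: log mu_i = x_i' beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        # Working response and weights for the canonical log link:
        # z = eta + (y - mu) * d(eta)/d(mu), W = (d(mu)/d(eta))^2 / Var(Y) = mu.
        z = eta + (y - mu) / mu
        W = mu
        XW = X * W[:, None]
        beta_new = np.linalg.solve(X.T @ XW, XW.T @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```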

Quasi-likelihood models retain the same mean–variance relationship as the distributions used in GLMs, but drop the distributional assumption and allow for an extra dispersion parameter. For example, instead of assuming $Y \sim \text{Poisson}(\lambda)$, which implies $E(Y) = \lambda$ and $V(Y) = \lambda$, we assume $E(Y) = \lambda$ and $V(Y) = \sigma^2 \lambda$, with the same link function, but without assuming a specific distribution. This is a kind of semi-parametric model.
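In the quasi-Poisson case, a common estimator of the extra dispersion $\sigma^2$ is the Pearson statistic divided by the residual degrees of freedom. A sketch, reusing the illustrative `irls_poisson` fit above:

```python
import numpy as np

def quasipoisson_dispersion(X, y, beta_hat):
    """Pearson estimate of sigma^2 under V(Y_i) = sigma^2 * lambda_i,
    given fitted coefficients (here from the illustrative irls_poisson)."""
    mu = np.exp(X @ beta_hat)
    pearson = np.sum((y - mu) ** 2 / mu)
    return pearson / (len(y) - X.shape[1])  # divide by residual d.f.

# Standard errors from the Poisson fit are then inflated by sqrt(sigma^2).
```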

Proportional Hazards Models

For time-to-event data $T$, it is often appropriate to assume the model $T_i \sim \text{Weibull}(\alpha_i, \eta)$, where $\log \alpha_i = x_i^\top \beta$. This is not a GLM, since the Weibull distribution is not a member of the exponential family. The hazard function for this distribution is
$$h(t) = \frac{f(t)}{1 - F(t)} = \eta \alpha_i^{\eta} t^{\eta - 1} = \eta e^{\eta x_i^\top \beta} t^{\eta - 1} = h_0(t) e^{\eta x_i^\top \beta},$$
where $h_0(t) = \eta t^{\eta - 1}$ is the baseline hazard for an observation with $x = 0$.

The Weibull regression model has the property of proportional hazards. Very often, we use Cox's semi-parametric proportional hazards model, which assumes
$$h(t) = h_0(t) e^{x_i^\top \beta},$$
but does not make any distributional assumption. As usual, proportional hazards is just an assumption and can be relaxed. We can also include random effects (frailty) in these models.
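Cox's model is fitted by maximising the partial likelihood, which involves only $\beta$ and the risk sets at the observed event times. A minimal sketch, assuming no tied event times (names are illustrative):

```python
import numpy as np

def cox_partial_loglik(beta, X, time, event):
    """Cox partial log-likelihood, assuming no tied event times."""
    order = np.argsort(time)  # sort so that each risk set is a suffix
    X, event = X[order], event[order]
    eta = X @ beta
    # sum_{j in risk set at t_i} exp(eta_j), via a reverse cumulative sum
    log_risk = np.log(np.cumsum(np.exp(eta)[::-1])[::-1])
    return np.sum((eta - log_risk)[event == 1])

# Maximise with, e.g., scipy.optimize.minimize on the negative of this;
# beta_hat is obtained without ever estimating h_0(t).
```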

Semi-Parametric Models

Semi-parametric regression models assume that $Y \sim N(f(x), \sigma^2)$, but avoid assuming a specific functional form for $f(x)$. Instead, data-driven smoothers are used to approximate the unknown function. Typical examples are smoothed local polynomials (splines), which try to follow the data while penalising roughness; a sketch follows. These can be extended to other distributions using generalized additive models. Opinion: these models are most useful for nuisance effects, as they do not allow mechanistic understanding to be developed.
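A minimal sketch of such a penalised smoother, using scipy's smoothing spline (the simulated data and the smoothing factor are illustrative):

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Smoothing spline: the factor s trades fidelity to the data against roughness.
spline = UnivariateSpline(x, y, s=x.size * 0.04)
f_hat = spline(x)  # estimate of f(x)
```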

Nonlinear Models

The term nonlinear models is completely general, but is most often associated with models of the form $E(Y_i) = f(x_i; \theta)$ and $V(Y) = \sigma^2 I$. The parameters are usually estimated by nonlinear least squares (NLLS), i.e. we choose $\theta$ to minimise
$$\sum_{i=1}^{n} \{Y_i - f(x_i; \theta)\}^2.$$
Inference is usually carried out by assuming normality, in which case the NLLS estimates are equivalent to the MLEs, and invoking the asymptotic properties of MLEs.
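A small sketch of NLLS with scipy, for an illustrative exponential-decay mean function (the data here are simulated purely for the example):

```python
import numpy as np
from scipy.optimize import curve_fit

# Illustrative mean function f(x; theta) = theta0 * exp(-theta1 * x).
def f(x, theta0, theta1):
    return theta0 * np.exp(-theta1 * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = f(x, 2.0, 0.3) + rng.normal(scale=0.1, size=x.size)

# Nonlinear least squares; the starting value p0 matters, since the
# optimiser only finds a local minimum.
theta_hat, cov_theta = curve_fit(f, x, y, p0=[1.0, 0.1])
```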

Nonlinear least squares is a fundamentally difficult numerical problem:

- Convergence to the global optimum is not guaranteed and often fails in practice.
- Partially linear algorithms can help when there are separable linear parameters, i.e. the model can be written
$$E(Y) = \sum_{j=1}^{q} \beta_j f_j(x; \theta).$$
This reduces the numerical search by $q$ dimensions.
- Asymptotic inferences can be very poor approximations.
- Bayesian methods avoid these computational problems, but might create others.