Integrated Likelihood Estimation in Semiparametric Regression Models
Thomas A. Severini, Department of Statistics, Northwestern University
Joint work with Heping He, University of York

Introduction

Let Y_1, Y_2, ..., Y_n denote real-valued random variables of the form

Y_j = x_j^T β + γ(z_j) + ϵ_j,   j = 1, ..., n,

where
- x_1, ..., x_n are constants in R^p;
- z_1, ..., z_n are constants taking values in a set Z;
- ϵ_1, ..., ϵ_n are unobserved mean-zero random variables such that ϵ = (ϵ_1, ..., ϵ_n)^T has a multivariate normal distribution with covariance matrix Ω_ϕ;
- ϕ ∈ Φ and β ∈ R^p are unknown parameters;
- γ is an unknown real-valued function on Z, taking values in a set of functions Γ.

Our goal is inference about the parameter β in the presence of the nuisance parameters γ and ϕ.
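To make the setup concrete, here is a minimal simulation sketch of this model in Python; the particular β, γ, and error covariance below are illustrative assumptions, not values from the talk.

# A minimal sketch (not from the talk): simulating data from the
# partially linear model Y_j = x_j^T beta + gamma(z_j) + eps_j.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 2
beta = np.array([1.0, -0.5])            # assumed "true" regression coefficients
X = rng.normal(size=(n, p))             # x_1, ..., x_n in R^p
z = np.sort(rng.uniform(0.0, 1.0, n))   # z_1, ..., z_n in Z = [0, 1]

def gamma_true(z):
    # an arbitrary smooth function standing in for the unknown gamma
    return np.sin(2 * np.pi * z)

Omega = 0.25 * np.eye(n)                # error covariance Omega_phi (independent errors here)
eps = rng.multivariate_normal(np.zeros(n), Omega)
Y = X @ beta + gamma_true(z) + eps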

The likelihood function for this model is given by

|Ω_ϕ|^{-1/2} exp{ -(1/2) (Y - Xβ - g)^T Ω_ϕ^{-1} (Y - Xβ - g) },

where Y = (Y_1, ..., Y_n)^T, X is the n × p matrix with jth row x_j, and g = (γ(z_1), ..., γ(z_n))^T. Hence, in order to proceed with likelihood inference for β, some method of dealing with the nuisance parameters γ and ϕ is needed. Many methods of estimation have been proposed for this model: Engle, Granger, Rice, and Weiss (1986), Hastie and Tibshirani (1990), Heckman (1986), Ruppert, Wand, and Carroll (2003), Severini and Staniswalis (1994), and Speckman (1988). Most involve eliminating γ using some modification of the profile likelihood idea.

An alternative approach is to use an integrated likelihood, in which γ is removed by averaging with respect to some weight function. Suppose that Z ⊂ R and Γ is a set of differentiable functions on Z. Consider a weight function for γ corresponding to a mean-zero Gaussian stochastic process with covariance function K_λ(·, ·), where λ is a parameter. Under this distribution, the vector (γ(z_1), ..., γ(z_n))^T has a multivariate normal distribution with mean vector 0 and covariance matrix Σ_λ. The integrated likelihood is given by

|Ω_ϕ + Σ_λ|^{-1/2} exp{ -(1/2) (Y - Xβ)^T (Ω_ϕ + Σ_λ)^{-1} (Y - Xβ) }.
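As a sketch of how this quantity might be evaluated numerically (our own code, assuming independent errors Ω_ϕ = σ²I and the Gaussian covariance function discussed later; the function names are ours):

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_cov(z, tau, alpha):
    # Sigma_lambda: covariance matrix of (gamma(z_1), ..., gamma(z_n)) under a
    # mean-zero Gaussian-process weight function with
    # K_lambda(z, z~) = tau^2 * exp(-0.5 * ((z - z~) / alpha)^2)
    d = np.subtract.outer(z, z) / alpha
    return tau**2 * np.exp(-0.5 * d**2)

def integrated_loglik(beta, sigma2, tau, alpha, Y, X, z):
    # log of |Omega + Sigma|^{-1/2} exp{-(1/2)(Y - X beta)^T (Omega + Sigma)^{-1}(Y - X beta)},
    # up to an additive constant, with Omega_phi = sigma2 * I
    V = sigma2 * np.eye(len(Y)) + gaussian_cov(z, tau, alpha)
    return multivariate_normal(mean=X @ beta, cov=V).logpdf(Y)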

The integrated likelihood approach has several advantages:
- Restrictions on γ are often easy to impose by using a covariance function that respects the restrictions.
- More complicated models, in which the parameters of interest are intertwined with the unknown function, are often easier to handle through the covariance structure than through the mean function of the observations.
- It is straightforward to incorporate a parametric model for the covariance matrix of the errors.

Inference based on an integrated likelihood is related to Bayesian inference in nonparametric and semiparametric regression models. Much of the Bayesian work in this area has made use of the fact that smoothing splines have a Bayesian interpretation (Wahba, 1990), with the covariance function chosen so that spline estimation can be used (see below). Here the covariance function is chosen to reflect our assumptions about γ and the model. Also, we consider non-Bayesian methods of inference and standard frequentist properties such as consistency and asymptotic distribution theory. However, the basic approach could also be applied to Bayesian inference.

Estimation

The integrated likelihood is a normal likelihood with mean vector Xβ and covariance matrix V(θ) = Ω_ϕ + Σ_λ, θ = (ϕ, λ). Given the covariance parameter θ, β can be estimated by generalized least squares:

β̂(θ) = (X^T V^{-1} X)^{-1} X^T V^{-1} Y,   V ≡ V(θ).

When θ is unknown, it can be replaced by an estimator. To estimate θ, we can use the restricted maximum likelihood (REML) estimator, which maximizes

l_p(θ) - (1/2) log |X^T V(θ)^{-1} X|,

where l_p is the profile integrated likelihood. Given the REML estimator θ̂ of θ, an estimator of β is given by β̂(θ̂).
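Continuing the sketches above (reusing gaussian_cov and the simulated Y, X, z), a minimal REML-type fit under the same assumptions; it profiles β out by generalized least squares and maximizes the adjusted profile integrated likelihood over θ = (σ², τ, α) numerically. This is illustrative code, not the authors' implementation.

import numpy as np
from scipy.optimize import minimize

def gls_beta(V, Y, X):
    # generalized least squares: beta_hat(theta) = (X^T V^{-1} X)^{-1} X^T V^{-1} Y
    Vinv_X = np.linalg.solve(V, X)
    Vinv_Y = np.linalg.solve(V, Y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)

def neg_reml(log_theta, Y, X, z):
    # theta = (sigma2, tau, alpha), parameterized on the log scale to stay positive
    sigma2, tau, alpha = np.exp(log_theta)
    V = sigma2 * np.eye(len(Y)) + gaussian_cov(z, tau, alpha)
    beta_hat = gls_beta(V, Y, X)
    r = Y - X @ beta_hat
    Vinv_r = np.linalg.solve(V, r)
    Vinv_X = np.linalg.solve(V, X)
    _, logdet_V = np.linalg.slogdet(V)
    _, logdet_XVX = np.linalg.slogdet(X.T @ Vinv_X)
    # negative of [profile integrated log-likelihood - (1/2) log |X^T V^{-1} X|], up to a constant
    return 0.5 * (logdet_V + r @ Vinv_r + logdet_XVX)

fit = minimize(neg_reml, x0=np.log([0.1, 1.0, 0.2]), args=(Y, X, z), method="Nelder-Mead")
sigma2_hat, tau_hat, alpha_hat = np.exp(fit.x)
V_hat = sigma2_hat * np.eye(len(Y)) + gaussian_cov(z, tau_hat, alpha_hat)
beta_hat = gls_beta(V_hat, Y, X)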

Note that standard methods of computation for mixed models can be used. To estimate γ, we can use the best linear unbiased predictor (BLUP) based on the assumption that γ is a random function. Let z* denote an element of Z and consider estimation of γ(z*). The BLUP of γ(z*) is

Σ*(θ̂) V(θ̂)^{-1} (Y - X β̂(θ̂)),

where Σ*(θ) denotes the 1 × n vector of covariances of γ(z*) with (γ(z_1), ..., γ(z_n)). To use this approach, the covariance function K_λ must be chosen; to do this, we consider the properties of {γ(z) : z ∈ Z} as a random process.
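A sketch of the BLUP computation, continuing the previous snippets (our code; sigma_star collects the covariances K_λ̂(z*, z_j) under the Gaussian covariance function):

import numpy as np

def blup_gamma(z_star, z, Y, X, beta_hat, V_hat, tau_hat, alpha_hat):
    # BLUP of gamma(z_star): Sigma_*(theta_hat) V(theta_hat)^{-1} (Y - X beta_hat)
    d = (z_star - z) / alpha_hat
    sigma_star = tau_hat**2 * np.exp(-0.5 * d**2)   # Cov(gamma(z_star), gamma(z_j))
    return sigma_star @ np.linalg.solve(V_hat, Y - X @ beta_hat)

# e.g. evaluate the fitted curve on a grid
grid = np.linspace(z.min(), z.max(), 200)
gamma_hat = np.array([blup_gamma(s, z, Y, X, beta_hat, V_hat, tau_hat, alpha_hat) for s in grid])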

Models with an Unknown Continuous Function on the Real Line

Suppose Z ⊂ R and γ is a smooth function. It is often reasonable to assume that the covariance of γ(z) and γ(z̃) is a decreasing function of |z - z̃|, so that

K_λ(z, z̃) = τ² K_ν(|z - z̃| / α),

where K_ν is a decreasing, positive definite function on [0, ∞) with K_ν(0) = 1. Here τ > 0 is the standard deviation of γ(z), α > 0 is a scale parameter, and ν is a shape parameter (if present). One choice for K_ν is the Gaussian covariance function K(t) = exp(-t²/2); then {γ(z) : z ∈ Z} is a stationary, infinitely differentiable random process.

As noted earlier, the IL approach is related to spline estimation. There are at least two spline methods that can be used here: smoothing splines (e.g., Wahba, 1990) and penalized splines (e.g., Ruppert, Wand, and Carroll, 2003).

Smoothing splines: γ is a mean-zero Gaussian process with covariance function

(1 + z z̃) [(1 + z²)(1 + z̃²)]^{-1/2},   z, z̃ ∈ [0, 1].

This process is nonstationary and highly correlated.

Penalized splines: γ is a Gaussian stochastic process with mean δ_0 + δ_1 z + δ_2 z² and covariance function

K_P(z, z̃) = τ² Σ_{j=1}^{k} (z - d_j)² (z̃ - d_j)²   for d_k < z ≤ d_{k+1} and z ≤ z̃,

where 0 < d_1 < d_2 < ... < d_r < 1 are given. Under K_P, the correlation of γ(z) and γ(z̃) is generally small.
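As a sketch (our own code, with illustrative knots) of the penalized-spline weight function just described: K_P is the covariance obtained by placing independent N(0, τ²) coefficients on the truncated quadratic basis functions (z - d_j)_+².

import numpy as np

def penalized_spline_cov(z, knots, tau):
    # K_P(z, z~) = tau^2 * sum_j (z - d_j)_+^2 (z~ - d_j)_+^2,
    # i.e. the covariance induced by independent N(0, tau^2) coefficients
    # on the truncated quadratic basis (z - d_j)_+^2
    B = np.clip(np.subtract.outer(z, knots), 0.0, None) ** 2   # n x r basis matrix
    return tau**2 * B @ B.T

knots = np.linspace(0.1, 0.9, 9)     # illustrative knots 0 < d_1 < ... < d_r < 1
Sigma_P = penalized_spline_cov(z, knots, tau=1.0)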

Incorporating Assumptions about γ(·) in the Model

A main advantage of the IL approach arises in models with additional assumptions on γ.

Linear constraints on γ. Suppose γ is subject to a constraint of the form Tγ = 0, where T is a known, real-valued, affine function on L²(Z). In carrying out the IL approach, we need a distribution for {γ(z) : z ∈ Z} that respects the condition Tγ = 0. First consider a mean-zero Gaussian process {γ_0(z) : z ∈ Z} with Gaussian covariance function H_λ, and take {γ(z) : z ∈ Z} to have the conditional distribution of γ_0 given that Tγ_0 = 0. This conditional distribution is identical to the distribution of

γ_0(z) - (Cov[γ_0(z), Tγ_0] / Var(Tγ_0)) Tγ_0

(Janson, 1997).

It follows that {γ(z) : z ∈ Z} is a mean-zero Gaussian process with covariance function

K_λ(t, s) = H_λ(t, s) - Cov[γ_0(t), Tγ_0; λ] Cov[γ_0(s), Tγ_0; λ] / Var(Tγ_0; λ).

Thus, the restriction can be taken into account by simply modifying the covariance function of the process. For instance, suppose that

Tγ_0 = ∫_Z γ_0(t) w(t) dt - c,

where w is a given element of L²(Z) and c is a constant. Then

K_λ(t, s) = H_λ(t, s) - [∫_Z H_λ(t, u) w(u) du] [∫_Z H_λ(s, u) w(u) du] / ∫_Z ∫_Z H_λ(u, v) w(u) w(v) du dv.
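A sketch (our notation, a simple quadrature approximation) of forming the constrained covariance matrix for the integral constraint; H, w, and the grid below are placeholders supplied by the user.

import numpy as np

def constrained_cov(z, H, w, grid):
    # K(t, s) = H(t, s) - [int H(t,u)w(u)du][int H(s,u)w(u)du] / int int H(u,v)w(u)w(v)du dv,
    # approximated on an equally spaced quadrature grid; H must accept array arguments
    du = grid[1] - grid[0]
    Hzu = H(z[:, None], grid[None, :])                  # n x m matrix of H(z_i, u)
    Huv = H(grid[:, None], grid[None, :])               # m x m matrix of H(u, v)
    num = (Hzu * w(grid)) @ np.ones(len(grid)) * du     # int H(z_i, u) w(u) du
    den = w(grid) @ Huv @ w(grid) * du**2               # int int H(u, v) w(u) w(v) du dv
    return H(z[:, None], z[None, :]) - np.outer(num, num) / den

# example: gamma constrained to integrate to zero against w(u) = 1 on Z = [0, 1],
# with H a Gaussian covariance function
H = lambda a, b: np.exp(-0.5 * ((a - b) / 0.2) ** 2)
Sigma_constrained = constrained_cov(z, H, w=lambda u: np.ones_like(u), grid=np.linspace(0.0, 1.0, 201))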

Asymptotic Properties of the Estimator

Suppose that θ* satisfies θ̂ = θ* + O_p(1/√n). Recall that θ = (ϕ, λ), where ϕ is a parameter of the error covariance matrix and λ is a parameter of the covariance function of γ(·). Therefore ϕ* = ϕ_0, the true value of ϕ; however, there is no conventional true value of λ.

β̂ has the same asymptotic distribution as

β̂* ≡ (X^T (V*)^{-1} X)^{-1} X^T (V*)^{-1} Y,   V* = V(θ*).

Note that β̂* is normally distributed, but it has bias

(X^T (V*)^{-1} X)^{-1} X^T (V*)^{-1} g,   g = (γ(z_1), ..., γ(z_n))^T.

The key idea in showing that the bias is asymptotically negligible is that Σ_λ* has properties similar to a covariance function of g. For example, suppose that Ω_ϕ* = I and Σ_λ* = g g^T, the sample covariance function based on g. Then

(V*)^{-1} g = (I + g g^T)^{-1} g = (1 / (1 + g^T g)) g = O(n^{-1}).

Under fairly general conditions on γ, it can be shown that

√n (β̂ - β_0) →_D N(0, M*)   as n → ∞,

where

M* = lim_{n→∞} n (X^T V(θ*)^{-1} X)^{-1} X^T V(θ*)^{-1} Ω_{ϕ_0} V(θ*)^{-1} X (X^T V(θ*)^{-1} X)^{-1}.
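The step from (V*)^{-1} g to the stated bound is the Sherman–Morrison identity; written out,

\[
(V^*)^{-1} g
  = (I + g g^T)^{-1} g
  = \Bigl( I - \frac{g g^T}{1 + g^T g} \Bigr) g
  = g - \frac{(g^T g)\, g}{1 + g^T g}
  = \frac{1}{1 + g^T g}\, g ,
\]

and since g^T g grows with n under the stated conditions on γ, each component of this vector is O(n^{-1}).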

Examples

Example 1: Semiparametric regression model with independent errors. Bowman and Azzalini (1997) present data taken from a survey of the fauna on the sea bed lying between the coast of northern Queensland and the Great Barrier Reef. Let Y denote catch score 1, and let x and z denote the latitude and longitude, respectively, of the sampling position. Here we use the data from zone 1; the sample size is n = 42. An appropriate model for these data is

Y_j = β_0 + β_1 x_j + γ(z_j) + ϵ_j,   j = 1, ..., n,

where ϵ_1, ..., ϵ_n are independent error terms with mean 0 and constant variance.

This model was fit using the IL method with a Gaussian covariance function. For comparison, the model was also fit using the generalized additive model approach of Hastie & Tibshirani (smoothing splines), the penalized spline method described in Semiparametric Regression by Ruppert, Wand, & Carroll, and a kernel-based estimator (Speckman, 1988, and many others).

Estimates of β_1 (reported SE):
  IL: 1.020 (0.356)
  GAM: 1.153 (0.371)
  Pen Spline: 1.098 (0.368)
  Kernel: 1.203 (0.371)

The estimates of γ are also in close agreement.

[Figure: Estimates of γ in the reef example; γ̂(z) plotted against z (143.0 to 143.8) for the Int Like, SPM, GAM, and Kernel fits.]

A small simulation study was conducted in which data were simulated from the model described here, with the parameter values taken to be the estimates based on the integrated likelihood method. A Monte Carlo sample size of 5000 was used.

Comparison of Estimators in the Reef Example

  Method     Int Lik    GAM    Pen Spline   Kernel
  Bias       -0.067   -0.017     -0.007      0.030
  SD          0.365    0.364      0.368      0.423
  MSE         0.138    0.133      0.135      0.180
  Est SE      0.350    0.354      0.360      0.377
  Cov Prob    0.933    0.938      0.940      0.910

Example 2: A shape-invariant model. Hastie, Tibshirani, and Friedman (2001) describe data on bone mineral density (BMD) in adolescents. The response variable Y_j is relative change in spinal BMD, which is modeled as a function of age and gender. Preliminary analysis suggests that the relationship between Y_j and age is different for males and females, with the function relating Y_j and age for males being a scaled and shifted version of the corresponding function for females. This observation suggests a model in which the mean of Y_j is of the form

β_0 + β_1^{x_j} γ(z_j + β_2 x_j),

where z_j denotes age and x_j = 1 if subject j is male and 0 otherwise.

It follows that the mean function for males is β_0 + β_1 γ(z_j + β_2), while the mean function for females is β_0 + γ(z_j). To compute the IL, we use a weight function based on taking γ to be a mean-zero Gaussian process with a Gaussian covariance function. Then

Cov(β_1^{x_j} γ(z_j + β_2 x_j), β_1^{x_k} γ(z_k + β_2 x_k)) = β_1^{x_j + x_k} K_λ(|z_j - z_k + β_2 (x_j - x_k)|).

There is a further complication to this data set: some of the subjects are tested multiple times (485 observations on 261 subjects). To account for this, the model was modified to include subject-specific intercept terms, taken to be normally distributed random effects.
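A sketch (our code, with hypothetical parameter values) of how β_1 and β_2 enter the covariance matrix of the mean-zero part of the model under a Gaussian covariance function:

import numpy as np

def shape_invariant_cov(z, x, beta1, beta2, tau, alpha):
    # Cov(beta1^{x_j} gamma(z_j + beta2 x_j), beta1^{x_k} gamma(z_k + beta2 x_k))
    #   = beta1^{x_j + x_k} * K_lambda(|z_j - z_k + beta2 (x_j - x_k)|),
    # with K_lambda the Gaussian covariance function
    shift = z + beta2 * x                       # shifted ages
    d = np.subtract.outer(shift, shift) / alpha
    scale = beta1 ** np.add.outer(x, x)         # beta1^{x_j + x_k}
    return scale * tau**2 * np.exp(-0.5 * d**2)

# illustrative values only (not the fitted ones from the talk)
age = np.array([10.5, 12.0, 13.5, 15.0])
male = np.array([1, 0, 1, 0])
C = shape_invariant_cov(age, male, beta1=0.8, beta2=2.0, tau=1.0, alpha=3.0)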

Thus, the model has 7 parameters:
- β_0, the mean of the subject-specific intercepts;
- β_1 and β_2, which describe how males and females differ;
- the variances of the error term and of the random intercepts;
- two parameters for the Gaussian covariance function.

Note that the parameters of primary interest, β_1 and β_2, appear in the covariance matrix of Y rather than in the mean function. The estimate of the shift is 2.1 years (SE = 0.19); the estimate of the scaling factor is 0.79 (SE = 0.068). The plot of the estimated model illustrates how the relationship between change in BMD and age differs between males and females.

[Figure: Comparison of males and females in the BMD example; relative change in spinal BMD plotted against age (10 to 25) for the fitted male and female curves.]

Summary

- The IL method provides a conceptually easy approach to estimation in models with an unknown function.
- In simple models, the IL method works (nearly) as well as standard methods.
- In more complicated settings, it is often straightforward to modify the covariance function used to form the IL.
- Computation: standard methods work surprisingly well in the normal case; for non-normal errors, more sophisticated methods will be needed.
- Current proofs of asymptotic properties require stronger conditions than other methods; examples suggest that weaker conditions would suffice.