
Bayesian Backfitting
Trevor Hastie, Stanford University
Rob Tibshirani, University of Toronto
Email: trevor@stat.stanford.edu
Ftp: stat.stanford.edu: pub/hastie
WWW: http://stat.stanford.edu/~trevor
Stanford University, June 2, 1998. These transparencies are available via ftp: ftp://stat.stanford.edu/pub/hastie/bayes.ps

In a Nutshell

y = f_1 + f_2 + \cdots + f_p + \varepsilon

Backfitting cycles around and replaces each current function estimate f_j by f_j \leftarrow S_j(y - \sum_{k \ne j} f_k), where S_j is a smoothing operator.

Gibbs sampling cycles around and obtains a new realization of f_j via f_j \leftarrow S_j(y - \sum_{k \ne j} f_k) + \sigma S_j^{1/2} z, where z is a vector of N(0, 1) variates and \sigma is the standard deviation.
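A minimal NumPy sketch (not from the transparencies) contrasting the two coordinate updates. The helper name backfit_update, the stacked array f of current fits, and the list S of smoother matrices are illustrative assumptions:

```python
import numpy as np

def backfit_update(j, f, y, S, sigma=1.0, bayes=False, rng=None):
    """One coordinate update for component j.

    f : (p, n) array of current function estimates
    S : list of (n, n) smoother matrices
    If bayes=True, add the Gibbs-sampling noise term sigma * S_j^{1/2} z.
    """
    rng = rng or np.random.default_rng()
    partial_resid = y - (f.sum(axis=0) - f[j])          # y - sum_{k != j} f_k
    new_fj = S[j] @ partial_resid                        # backfitting step
    if bayes:
        # symmetric square root of S_j via an eigendecomposition
        w, V = np.linalg.eigh(S[j])
        S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
        new_fj = new_fj + sigma * S_half @ rng.standard_normal(len(y))
    return new_fj
```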

Example: Air Pollution

[Figure: four panels showing the fitted functions for Inversion Base Height, Daggot Pressure Gradient, Inversion Base Temperature, and Visibility (miles).] Four environmental variables in an air pollution study.

Outline
- Smoothing splines: a Bayesian version
- Additive models and backfitting
- Bayesian backfitting
- Example: bone mineral density
- Priors, variance components and DF
- GAMs and Metropolis-Hastings

Smoothing Splines

y_i = f(x_i) + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)

R(f, y) = \sum_i (y_i - f(x_i))^2 + \lambda \int [f''(x)]^2 \, dx

- The solution is a natural cubic spline, with knots at the unique values of x.
- Solutions vary between the linear fit (\lambda = \infty) and the interpolating fit (\lambda = 0).
- The finite-dimensional problem minimizes (y - f)^T (y - f) + \lambda f^T K f, with solution f = (I + \lambda K)^{-1} y = S(\lambda) y.
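As an illustration of f = S(\lambda) y, here is a hedged sketch that builds a smoother matrix from a simple second-difference penalty; this penalty is only a stand-in for the natural-spline penalty K on the slide, and the helper name smoother_matrix is hypothetical:

```python
import numpy as np

def smoother_matrix(n, lam):
    """S(lambda) = (I + lambda*K)^{-1}, using a discrete second-difference penalty K
    as a stand-in for the natural-spline penalty matrix."""
    D2 = np.diff(np.eye(n), n=2, axis=0)      # (n-2, n) second-difference operator
    K = D2.T @ D2
    return np.linalg.solve(np.eye(n) + lam * K, np.eye(n))

# usage: fit f = S(lambda) y on equally spaced x
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(100)
f_hat = smoother_matrix(100, lam=5.0) @ y
```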

Bayesian Smoothing Splines

y = f + \varepsilon
f \sim N(0, \tau^2 K^{-}), where K^{-} is partially improper (linear functions have infinite variances)
\varepsilon \sim N(0, \sigma^2 I)

-\log \pi(f \mid y) \propto (y - f)^T (y - f) / 2\sigma^2 + f^T K f / 2\tau^2

f \mid y \sim N(S(\lambda) y, S(\lambda) \sigma^2), where \lambda = \sigma^2 / \tau^2

A posterior realization is f = S(\lambda) y + S(\lambda)^{1/2} \sigma z, where z is an N-vector of N(0, 1) variates.
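A sketch of drawing from f \mid y \sim N(S(\lambda) y, S(\lambda)\sigma^2) via a symmetric square root of S; it assumes S is symmetric positive semidefinite and \sigma is known, and the commented usage reuses the smoother_matrix helper from the previous sketch:

```python
import numpy as np

def posterior_draws(y, S, sigma, n_draws=1000, rng=None):
    """Draw f ~ N(S y, sigma^2 S) using the symmetric square root of S."""
    rng = rng or np.random.default_rng()
    w, V = np.linalg.eigh(S)                              # S symmetric PSD
    S_half = V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T
    mean = S @ y
    z = rng.standard_normal((n_draws, len(y)))
    return mean + sigma * z @ S_half                      # each row is one draw

# pointwise 95% posterior bands from the draws:
# draws = posterior_draws(y, smoother_matrix(len(y), 5.0), sigma=0.3)
# lo, hi = np.percentile(draws, [2.5, 97.5], axis=0)
```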

Picture the Prior

f \sim N(0, \tau^2 K^{-}), or equivalently f = N\gamma with \gamma \sim N(0, \tau^2 D).

[Figure: eigenfunctions of the prior covariance, and the prior variance (eigenvalues) plotted against component number on a log scale.]

Backfitting Additive Models

y = f_1(x_1) + f_2(x_2) + \cdots + f_p(x_p) + \varepsilon

Estimating equations:
f_1(x_1) = S_1(y - f_2(x_2) - \cdots - f_p(x_p))
f_2(x_2) = S_2(y - f_1(x_1) - \cdots - f_p(x_p))
\vdots
f_p(x_p) = S_p(y - f_1(x_1) - f_2(x_2) - \cdots)

where the S_j are:
- univariate regression smoothers such as smoothing splines, lowess, kernels
- linear regression operators yielding polynomial fits, piecewise polynomials, ...
- more complicated operators: surface smoothers for second-order interactions, and random-effects shrinkage operators.

We use Gauss-Seidel iteration, or "backfitting", to solve these estimating equations (a sketch follows below).
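A hedged NumPy sketch of the backfitting loop, assuming each S_j is supplied as a callable that maps a partial residual to a fitted vector (each callable is presumed to close over its own x_j); the names are illustrative:

```python
import numpy as np

def backfit(y, smoothers, n_iter=20):
    """Gauss-Seidel / backfitting for an additive model.

    smoothers : list of callables, each mapping a partial residual (n,) to a fit (n,)
                (e.g. a univariate spline smoother in its own x_j)
    Returns the intercept and the fitted component functions at the data points.
    """
    p, n = len(smoothers), len(y)
    f = np.zeros((p, n))
    alpha = y.mean()                         # intercept
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - alpha - (f.sum(axis=0) - f[j])
            f[j] = smoothers[j](partial_resid)
            f[j] -= f[j].mean()              # center each component for identifiability
    return alpha, f
```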

Justification? Example: penalized least squares

Minimize
\sum_{i=1}^n \Big( y_i - \sum_{j=1}^p f_j(x_{ij}) \Big)^2 + \sum_{j=1}^p \lambda_j \int (f_j''(t))^2 \, dt

which is equivalent to solving
f_1 = S_1(\lambda_1)(y - \sum_{j \ne 1} f_j)
f_2 = S_2(\lambda_2)(y - \sum_{j \ne 2} f_j)
\vdots
f_p = S_p(\lambda_p)(y - \sum_{j \ne p} f_j)

where S_j(\lambda_j) denotes a smoothing spline using variable x_j and penalty coefficient \lambda_j. Each smoothing-spline fit is O(N) computations, hence so is the entire fit.

Bayesian Additive Splines

f_j \sim N(0, \tau_j^2 K_j^{-})
f_j \mid y \sim N(G_j y, C_j \sigma^2)

where G_j and C_j are ugly and require O(N^3) computations. You don't believe me? See Wahba (1990).

Backfitting computes \hat f_j = G_j y efficiently in O(N) computations, but not C_j. Current GAM software in Splus approximates C_j by C_j^0 + S_j, where C_j^0 is the exact posterior covariance operator for the linear part of f_j, but S_j is the exact posterior covariance operator for the nonlinear part of a univariate spline problem.

Gibbs sampling saves the day!

L(f_j \mid y, f_k, k \ne j) = L(f_j \mid y - \sum_{k \ne j} f_k, \{f_k, k \ne j\}) = N\big(S_j(y - \sum_{k \ne j} f_k), \, S_j \sigma^2\big)

By replacing the backfitting operator f_j \leftarrow S_j(y - \sum_{k \ne j} f_k) by the univariate Bayesian spline posterior sampler f_j \leftarrow S_j(y - \sum_{k \ne j} f_k) + \sigma S_j^{1/2} z, we generate a Markov chain whose stationary distribution coincides with \pi(f_1, f_2, \ldots, f_p \mid y).

Carter and Kohn (1994, Biometrika), and Kohn and Wong (1998, manuscript), propose a similar algorithm using a state-space approach.
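Putting the pieces together, a sketch of the resulting Gibbs sampler; it assumes fixed smoother matrices S_j, a known \sigma, and symmetric positive semidefinite smoothers so that S_j^{1/2} can be taken via an eigendecomposition (all names are illustrative):

```python
import numpy as np

def bayesian_backfitting(y, S, sigma, n_samples=2000, rng=None):
    """Gibbs sampler for an additive model with fixed smoother matrices S_j.

    S     : list of (n, n) smoother matrices, assumed symmetric PSD
    sigma : residual standard deviation, treated as known here for simplicity
    Returns posterior samples of shape (n_samples, p, n).
    """
    rng = rng or np.random.default_rng()
    p, n = len(S), len(y)
    S_half = []                               # precompute the S_j^{1/2}
    for Sj in S:
        w, V = np.linalg.eigh(Sj)
        S_half.append(V @ np.diag(np.sqrt(np.clip(w, 0, None))) @ V.T)
    f = np.zeros((p, n))
    samples = np.empty((n_samples, p, n))
    for t in range(n_samples):
        for j in range(p):
            partial_resid = y - (f.sum(axis=0) - f[j])
            z = rng.standard_normal(n)
            f[j] = S[j] @ partial_resid + sigma * S_half[j] @ z
        samples[t] = f
    return samples
```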

Other operators

We can use the Bayesian backfitting algorithm with operators other than smoothing splines S_j.

- General nonparametric smoother: S_j defines the smoothing operation, with implicit prior f_j \sim N(0, (S_j^{-1} - I)^{-1} \sigma^2). The operator S_j^{1/2} is found by a simple Taylor-series expansion.
- Fixed linear effects: S_j = X_j (X_j^T X_j)^{-1} X_j^T. This results from the model f_j = X_j \beta_j with \beta_j \sim N(0, \tau^2 D), D diagonal, and \tau \to \infty. Then S_j^{1/2} = S_j and is easily applied. For the intercept term, for example, we simply obtain \mu \sim N(\mathrm{ave}[y - \sum_j f_j], \sigma^2 / n).
- Random linear effects: S_j = X_j (X_j^T X_j + \sigma^2 \Sigma^{-1})^{-1} X_j^T. This results from f_j = X_j \beta_j with \beta_j \sim N(0, \Sigma).
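A small sketch of the two linear-effects operators; the function names are hypothetical, and Sigma and sigma2 stand for \Sigma and \sigma^2 from the slide. Since the projection operator is idempotent, its square root is itself, which is why the fixed-effects update needs no extra work:

```python
import numpy as np

def fixed_effect_smoother(X):
    """Projection S = X (X^T X)^{-1} X^T; idempotent, so S^{1/2} = S."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def random_effect_smoother(X, Sigma, sigma2):
    """Shrinkage operator S = X (X^T X + sigma^2 Sigma^{-1})^{-1} X^T."""
    A = X.T @ X + sigma2 * np.linalg.inv(Sigma)
    return X @ np.linalg.solve(A, X.T)
```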

Example: Growth Curves

y_{ij} = f(t_{ij}) + x_i^T \beta_E + V_i + \varepsilon_{ij}

[Figure: spinal bone mineral density versus age; the fitted mean age curve; ethnic-group effects (Asian, Black, Hispanic, White); and the random individual-level effects.]

Functionals of Posteriors

[Figure: posterior distributions of functionals; the derivative of the posterior mean plotted against age.]

The location of the maximum derivative (the center of the growth spurt) is not too convincing. We now attempt a more realistic model.

Computing S^{1/2} z

Smoothing splines: writing f(x) = \sum_{j=1}^M b_j(x) \gamma_j in terms of the natural spline basis, the posterior distribution for f has the form

f \mid y \sim N(Sy, S\sigma^2) = N(B\hat\gamma, \; B \hat\Sigma_\gamma B^T)
\hat\gamma = (B^T B + \lambda\Omega)^{-1} B^T y
\hat\Sigma_\gamma = \sigma^2 (B^T B + \lambda\Omega)^{-1}

The Cholesky square root of the last expression is computed routinely in the smoothing-spline computations, and so is available for our purposes.

General smoothers:
S^{1/2} = S - \tfrac{1}{2} S(S - I) + \tfrac{3}{8} S(S - I)^2 - \tfrac{5}{16} S(S - I)^3 + \cdots
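A sketch of the series approximation for a general symmetric smoother (the expansion above is the binomial series for S(I + (S - I))^{-1/2}); the truncation order and the helper name are assumptions:

```python
import numpy as np

def smoother_sqrt(S, order=3):
    """Approximate S^{1/2} by S [I - 1/2 U + 3/8 U^2 - 5/16 U^3 + ...], U = S - I."""
    n = S.shape[0]
    U = S - np.eye(n)
    coeffs = [1.0, -0.5, 3.0 / 8.0, -5.0 / 16.0][: order + 1]
    acc = np.zeros_like(S)
    U_pow = np.eye(n)
    for c in coeffs:
        acc += c * U_pow
        U_pow = U_pow @ U
    return S @ acc

# noise term sigma * S^{1/2} z for a general (symmetric) smoother S:
# noise = sigma * smoother_sqrt(S) @ np.random.default_rng().standard_normal(S.shape[0])
```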

y_{ij} = f(t_{ij} - \theta_i) + x_i^T \beta_E + V_i + \varepsilon_{ij}

[Figure: spinal bone mineral density versus age with the fitted principal curve; the mean age curve; and the random level shifts and random age shifts plotted against girl average age.]

[Figure: age-shifted mean curves, level-shifted mean curves, and shrunken level fits, plotted against age, adjusted age, and girl average age.]

[Figure: posterior draws of the random effects (Theta and V) for individual girls (subjects 107, 93, 124, 37, 109, 57, 68, 36), plotted against girl average age, together with their spinal BMD profiles versus age.]

Estimating \sigma^2, \tau_j^2, ...

- A fully hierarchical Bayesian approach puts priors on these hyperparameters, and generates them along with the other posterior realizations.
- Empirical Bayes procedures maximize the marginal likelihood of y to estimate the hyperparameters. Very similar to REML (Restricted Maximum Likelihood) estimation. This highlights the formal equivalence of additive spline models and mixed-effects models:
  Y = N_1 \gamma_1 + N_2 \gamma_2 + \cdots + N_p \gamma_p + \varepsilon
- Cross-validation, GCV and Cp use prediction error to guide selection.

Priors for \sigma^2, \tau_j^2, ...

p(\tau_j^2) \propto 1/\tau_j^2, or \propto (1/\tau_j^2) \exp(-\omega/\tau_j^2) with \omega = 10e-10.
p(\sigma^2) \propto 1/\sigma^2.
Wong and Kohn (1998), Carter and Kohn (1994).

These lead to inverse gamma (IG) posteriors:

p(\tau_j^2 \mid y, \sigma^2, \{f_j\}_1^p) = p(\tau_j^2 \mid f_j) = IG(n/2, \; \tfrac{1}{2} f_j^T K_j f_j + \omega)
p(\sigma^2 \mid \text{rest}) = IG(n/2, \; \tfrac{1}{2}\|e\|^2), where e = y - \sum_j f_j.

These are generated within each cycle of the Gibbs algorithm, along with the functions. O(N) computations.
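A sketch of the variance updates using the IG(n/2, ·) forms above; it assumes NumPy's shape/scale parametrization of the gamma distribution (an inverse gamma draw is the reciprocal of a gamma draw whose rate equals the IG scale), and the helper names are illustrative:

```python
import numpy as np

def draw_inverse_gamma(a, b, rng):
    """One draw from InvGamma(shape=a, scale=b): if X ~ InvGamma(a, b), 1/X ~ Gamma(a, rate=b)."""
    return 1.0 / rng.gamma(a, 1.0 / b)

def update_variances(y, f, K, omega, rng):
    """Gibbs updates for the tau_j^2 and sigma^2 given the current functions f_j.

    f : (p, n) current function estimates;  K : list of penalty matrices K_j
    """
    n = len(y)
    tau2 = np.array([
        draw_inverse_gamma(n / 2.0, 0.5 * f[j] @ K[j] @ f[j] + omega, rng)
        for j in range(len(K))
    ])
    e = y - f.sum(axis=0)
    sigma2 = draw_inverse_gamma(n / 2.0, 0.5 * e @ e, rng)
    return tau2, sigma2
```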

Posterior for df

The effective degrees of freedom are defined as df = \mathrm{tr}\, S(\lambda), where \lambda = \sigma^2 / \tau^2.

[Figure: traces of the posterior degrees of freedom over Gibbs iterations for Inversion Base Height, Daggot Pressure Gradient, Inversion Base Temperature, and Visibility (miles).]

Prior for df?

Using the priors for \sigma^2 and \tau^2, we can induce a prior for df for any sequence of x values.

[Figure: histogram (percent of total) of the induced prior on the degrees of freedom.]

Actually, this figure is based on \log_2 \lambda \sim U[-25, 25]. When 25 \to \infty, we get point masses of 1/2 at df = 2 and df = N.
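A simulation sketch of the induced prior on df, assuming a second-difference penalty as a stand-in for the spline penalty and using \log_2 \lambda \sim U[-25, 25] as on the slide; the helper name is hypothetical:

```python
import numpy as np

def df_prior_samples(x, n_sim=2000, rng=None):
    """Simulate the prior on effective df induced by log2(lambda) ~ U[-25, 25],
    with df = tr S(lambda) and S(lambda) = (I + lambda*K)^{-1} for a
    second-difference penalty K (a stand-in for the spline penalty)."""
    rng = rng or np.random.default_rng()
    n = len(x)
    D2 = np.diff(np.eye(n), n=2, axis=0)
    K = D2.T @ D2
    I = np.eye(n)
    dfs = []
    for _ in range(n_sim):
        lam = 2.0 ** rng.uniform(-25, 25)
        S = np.linalg.solve(I + lam * K, I)
        dfs.append(np.trace(S))        # ranges from about 2 (linear fit) to n (interpolation)
    return np.array(dfs)
```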

Bayes vs Bootstrap

Suppose \sigma^2 is known, and we add residuals r \sim N(0, I\sigma^2) to \hat f and refit.

For a single smoothing spline:
Bootstrap: f^* \sim N(S^2 y, S^2 \sigma^2)
Bayes: f \mid y \sim N(Sy, S\sigma^2)

S > S^2, and the Bayes posterior intervals are wider than the bootstrap intervals; they include an average bias component.

For an additive spline model:
Bootstrap: f_j^* \sim N(A_j A y, A_j^2 \sigma^2)
Bayes: f_j \mid y \sim N(A_j y, (I - A_j) S_j (I - S_j)^{-1} \sigma^2)

        Bayes   Bootstrap
0.0     .41     .45
0.5     .43     .47
0.9     .51     .64
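For the single-spline case, a sketch comparing the two pointwise standard errors, assuming a symmetric smoother S with eigenvalues in [0, 1] and a known \sigma (function name is illustrative):

```python
import numpy as np

def compare_interval_widths(S, sigma):
    """Pointwise Bayes vs bootstrap standard errors for a single smoother.

    Bayes:     cov = sigma^2 * S        Bootstrap: cov = sigma^2 * S @ S
    """
    se_bayes = sigma * np.sqrt(np.diag(S))
    se_boot = sigma * np.sqrt(np.diag(S @ S))
    return se_bayes, se_boot   # se_bayes >= se_boot pointwise, since S - S^2 is PSD
```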

Generalized Additive Models

Suppose instead we have a GAM, such as an additive logistic regression model:

\mathrm{logit}\, P(Y = 1 \mid x) = \sum_j f_j(x_j)

where the f_j can be functions, random effects or "fixed" effects, each with their (Gaussian) priors N(0, \tau_j^2 K_j^{-}) and hyperparameters \tau_j.

Similar to Zeger and Karim (1991, JASA), we propose a Metropolis-Hastings scheme for updating the functions:
- At the current state, approximate the likelihood by a Gaussian, thus creating a working response z_i and weights w_i.
- Generate a new realization f_j' to replace f_j from this Gaussian approximation, which we denote by q(f_j, f_j').

Move to f_j' with probability

\min\left( \frac{\pi(f')\, q(f', f)}{\pi(f)\, q(f, f')}, \; 1 \right)

where \pi(f) denotes the posterior. Again, all the computations can be performed in O(N) operations per update for smoothing splines, random effects and fixed effects.

This allows for estimation of mixed-effects GLMs, with both the usual random effects as well as nonparametric smoothers, in a seamless fashion.

The End
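A skeleton of the accept/reject step for one component of an additive logistic model; because the Gaussian working approximation is not spelled out on the slides, the proposal sampler, its density, and the component prior are passed in as callables, and all helper names are hypothetical:

```python
import numpy as np

def mh_update_component(j, f, y, propose, log_q, log_prior, rng):
    """One Metropolis-Hastings update of component j in an additive logistic model.

    propose(j, f, y)      -> draws f_j' from the Gaussian working approximation
    log_q(j, x, current)  -> log density of proposing x when the current state is `current`
    log_prior(j, fj)      -> log Gaussian prior for component j
    Priors of the other components cancel in the ratio, so only component j's enters.
    """
    def log_post(fj):
        eta = f.sum(axis=0) - f[j] + fj
        loglik = np.sum(y * eta - np.log1p(np.exp(eta)))   # Bernoulli log-likelihood
        return loglik + log_prior(j, fj)

    fj_new = propose(j, f, y)
    log_alpha = (log_post(fj_new) + log_q(j, f[j], fj_new)
                 - log_post(f[j]) - log_q(j, fj_new, f[j]))
    if np.log(rng.uniform()) < min(log_alpha, 0.0):
        f[j] = fj_new          # accept the proposal
    return f
```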