Spatially Adaptive Smoothing Splines


Spatially Adaptive Smoothing Splines
Paul Speckman, University of Missouri-Columbia (speckman@stat.missouri.edu)
Banff, September 11, 2003

Ordinary Spline Smoothing

Observe y_i = f(t_i) + ε_i, i = 1, …, n, with ε_i iid N(0, σ²) and t_i ∈ [0, 1] WLOG. f is only known to be smooth. Estimate it by

    min_f  Σ_{i=1}^n (y_i − f(t_i))² + λ ∫_0^1 [f^{(p)}(t)]² dt

Silverman's equivalent kernel:

    f̂(t) ≈ (1/n) Σ_{i=1}^n w((t − t_i) / λ^{1/(2p+1)}) y_i
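The discrete analogue of this penalized least-squares criterion has a closed-form solution. The sketch below (my illustration, not from the talk) replaces f^{(p)} with p-th order differences of f on an equispaced grid:

```python
import numpy as np

def difference_matrix(n, p):
    """p-th order difference matrix D of shape (n-p, n): (D @ f)[i] is
    proportional to the p-th derivative of f on an equispaced grid."""
    D = np.eye(n)
    for _ in range(p):
        D = np.diff(D, axis=0)
    return D

def smooth(y, lam, p=2):
    """Discrete smoothing-spline analogue: minimize
    ||y - f||^2 + lam * ||D f||^2, solved in closed form."""
    n = len(y)
    D = difference_matrix(n, p)
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

Because D annihilates polynomials of degree below p, data lying in that null space (constants for p = 1, lines for p = 2) pass through the smoother unchanged for any λ.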

Example: Heaviside function

[Figure: f_true(x) and noisy data f_noisy(x), n = 5, x ∈ [0, 1]]

[Figure: f_true(x) and best smoothing spline fit f_SS(x)]

Spatially Adaptive Splines

Idea: somehow adjust the penalty for the roughness of f:

    min_f  Σ_{i=1}^n (y_i − f(t_i))² + ∫_0^1 λ(t) [f^{(p)}(t)]² dt

for a good choice of λ(t) > 0.
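In the discretised analogue, the scalar λ becomes a vector of local penalties, one per p-th difference. A sketch (my illustration, under the same difference approximation as before):

```python
import numpy as np

def adaptive_smooth(y, lam, p=2):
    """Discretised spatially adaptive fit: minimize
    ||y - f||^2 + sum_i lam[i] * ((D f)[i])^2,
    where lam is a vector of n - p local penalties and D is the
    p-th order difference matrix."""
    n = len(y)
    D = np.eye(n)
    for _ in range(p):
        D = np.diff(D, axis=0)
    lam = np.asarray(lam, dtype=float)
    # weighted penalty D' diag(lam) D replaces the scalar lam * D'D
    return np.linalg.solve(np.eye(n) + D.T @ (lam[:, None] * D), y)
```

With a constant penalty vector this reduces to the ordinary smoother; making lam large only over a subinterval smooths aggressively there while leaving sharp features elsewhere alone.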

Problem: choose λ(t) from the data

Two proposals:
- Frequentist: Pintore, Speckman, Holmes (2003). Use GCV.
- Bayesian (with Dongchu Sun): use a discretized approximation to the smoothing spline; a type of stochastic volatility model. In progress.

Related work:
- Local bandwidth kernel smoothers
- Local GCV: Cummins, Filloon, Nychka (2001)
- Adaptive P-splines: Ruppert and Carroll

Toward a frequentist exact solution

    min_f  Σ_{i=1}^n (y_i − f(t_i))² + ∫_0^1 λ(t) [f^{(p)}(t)]² dt

This is a special case of the L-spline with Lf = λ^{1/2} D^p f:

    min_f  Σ_{i=1}^n (y_i − f(t_i))² + ∫_0^1 [Lf(t)]² dt

Reproducing kernel (e.g., Gu, or Heckman and Ramsay):

    K_λ(s, t) = ∫_0^1 G(s, u) G(t, u) / λ(u) du,   G(s, u) = (s − u)_+^{p−1} / (p − 1)!

The solution satisfies

    f̂(t) = Σ_{j=1}^n c_j K_λ(t, t_j) + Σ_{j=0}^{p−1} d_j φ_j(t)

so that f = Σ_λ c + T d with Σ_λ = [K_λ(t_i, t_j)]_{n×n}, and the penalty is ∫_0^1 λ(t) [f^{(p)}(t)]² dt = c′ Σ_λ c. The problem becomes

    min_{c,d}  ‖y − Σ_λ c − T d‖² + c′ Σ_λ c

One common solution: factorize T = [Q1 Q2] [R; 0] with Q = [Q1 Q2] orthogonal.

Solution (see Wahba or Gu):

    f̂ = A_λ y,   I − A_λ = n Q2 (Q2′ M_λ Q2)^{−1} Q2′,   M_λ = Σ_λ + n I

so

    f̂ = y − n Q2 (Q2′ M_λ Q2)^{−1} Q2′ y
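These matrix formulas can be checked numerically. A small Python sketch (mine, not from the talk), approximating the reproducing-kernel integral by a Riemann sum and taking T to be the polynomial null-space basis:

```python
import math
import numpy as np

def rk_matrix(t, lam_fun, p=2, ngrid=400):
    """Riemann-sum approximation to the reproducing kernel
    K_lam(s, t) = int_0^1 G(s, u) G(t, u) / lam(u) du,
    with G(s, u) = (s - u)_+^{p-1} / (p - 1)!."""
    u = (np.arange(ngrid) + 0.5) / ngrid
    G = np.maximum(t[:, None] - u[None, :], 0.0) ** (p - 1) / math.factorial(p - 1)
    return (G / lam_fun(u)) @ G.T / ngrid

def adaptive_spline_fit(y, t, lam_fun, p=2):
    """f_hat = y - n Q2 (Q2' M Q2)^{-1} Q2' y with M = Sigma_lam + n I,
    following the QR-based formula on the slide."""
    n = len(y)
    T = np.vander(t, p, increasing=True)      # columns 1, t, ..., t^{p-1}
    Sigma = rk_matrix(t, lam_fun, p)
    Q, _ = np.linalg.qr(T, mode='complete')   # Q = [Q1 Q2], T = Q1 R
    Q2 = Q[:, p:]
    M = Sigma + n * np.eye(n)
    return y - n * Q2 @ np.linalg.solve(Q2.T @ M @ Q2, Q2.T @ y)
```

A quick sanity check: data already in the null space of the penalty (a line, for p = 2) satisfies Q2′y = 0, so the fit reproduces it exactly for any λ(·).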

Special case: piecewise constant λ

Fix 0 = τ_0 < τ_1 < ⋯ < τ_K < τ_{K+1} = 1, and assume each τ_k = t_i for some i. The penalty is piecewise constant:

    λ(t) = λ_k = e^{γ_k},   t ∈ [τ_{k−1}, τ_k)

There is an explicit (but messy) formula for Σ_λ when p = 2, and f̂_λ is a polynomial spline with multiple knots at the τ_k.

Choosing an appropriate λ by GCV

We use generalized cross validation with an extra cost term κ ≥ 1:

    V(λ_1, …, λ_K) = n ‖(I − A_λ) y‖² / [tr(I − κ A_λ)]²

A_λ does not diagonalize, so we optimize by brute force: Matlab, Nelder-Mead optimization. Seems to work for K ≤ 20.
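A minimal Python version of this GCV search (the talk used Matlab; here scipy's Nelder-Mead, applied to the discretised smoother rather than the exact spline, so everything is an illustration rather than the talk's code):

```python
import numpy as np
from scipy.optimize import minimize

def adaptive_hat(n, gammas, breaks, p=2):
    """Hat matrix of the discretised smoother with piecewise constant
    log-penalties `gammas` over index blocks defined by `breaks`."""
    D = np.eye(n)
    for _ in range(p):
        D = np.diff(D, axis=0)
    lam = np.exp(np.repeat(gammas, np.diff(breaks)))  # one value per difference row
    return np.linalg.inv(np.eye(n) + D.T @ (lam[:, None] * D))

def gcv(y, A, kappa=1.4):
    """GCV with cost factor kappa: V = n ||(I - A) y||^2 / [tr(I - kappa A)]^2."""
    n = len(y)
    r = y - A @ y
    return n * (r @ r) / (n - kappa * np.trace(A)) ** 2

def fit_by_gcv(y, K=2, p=2, kappa=1.4):
    """Brute-force Nelder-Mead over the K log-penalties."""
    n = len(y)
    breaks = np.linspace(0, n - p, K + 1).astype(int)
    obj = lambda g: gcv(y, adaptive_hat(n, g, breaks, p), kappa)
    res = minimize(obj, np.zeros(K), method='Nelder-Mead')
    return adaptive_hat(n, res.x, breaks, p) @ y
```

The cost factor κ > 1 inflates the effective degrees of freedom in the denominator, penalizing wiggly fits; κ = 1 recovers ordinary GCV.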

Examples

Heaviside, K = 5: [Figure: f_true(x) and best spatially adaptive spline fit f_SAS(5)(x) with 5 jumps]

Bayesian interpretation (Wahba, 1978)

A reasonable prior: X(t) is a mean-zero Gaussian process with covariance K_λ(s, t), with a diffuse prior on X(0), …, D^{p−1} X(0). Then the posterior is a spatially adaptive spline: normal with mean A_λ y and covariance σ² A_λ (Wahba, 1978). Nychka showed that the average frequentist coverage tends to have the right level.
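Since the posterior is normal with mean A_λ y and covariance σ² A_λ, pointwise credibility intervals follow immediately from the diagonal of A_λ. A sketch (the function name is mine):

```python
import numpy as np

def bayes_interval(y, A, sigma2, z=1.96):
    """Pointwise 95% Bayesian credibility interval from the normal
    posterior with mean A y and covariance sigma2 * A (Wahba, 1978)."""
    fhat = A @ y
    # clip guards against tiny negative diagonal entries from round-off
    se = np.sqrt(sigma2 * np.clip(np.diag(A), 0.0, None))
    return fhat - z * se, fhat + z * se
```

In practice σ² is replaced by an estimate, giving the empirical Bayes intervals shown on the following slides.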

Bayes credibility interval

[Figure: f_SAS(5)(x) with 5 jumps and pointwise 95% confidence intervals]

Doppler function

    f(t) = [t(1 − t)]^{1/2} sin(2π(1 + a)/(t + a)),   a = 0.05, n = 128, signal-to-noise = 7

[Figure: f_true(x) and noisy data f_noisy(x)]
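This standard test function is easy to generate; a sketch with a = 0.05 (the usual choice for the Doppler benchmark) and noise scaled to the stated signal-to-noise ratio:

```python
import numpy as np

def doppler(t, a=0.05):
    """Doppler test function f(t) = sqrt(t(1-t)) * sin(2*pi*(1+a)/(t+a))."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(t * (1.0 - t)) * np.sin(2.0 * np.pi * (1.0 + a) / (t + a))

n = 128
t = (np.arange(n) + 0.5) / n            # equispaced design on (0, 1)
signal = doppler(t)
sigma = signal.std() / 7.0              # signal-to-noise ratio of 7
y = signal + sigma * np.random.default_rng(1).standard_normal(n)
```

The oscillation frequency blows up near t = 0, which is exactly why a single global λ cannot smooth both ends of the curve well.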

Doppler: ordinary smoothing

[Figure: f_true(x) and best smoothing spline fit f_SS(x)]

Doppler: K = 5

[Figure: f_true(x) and best spatially adaptive spline fit f_SAS(5)(x) with 5 jumps]

Doppler: K = 10

[Figure: f_true(x) and best spatially adaptive spline fit f_SAS(10)(x) with 10 jumps]

Doppler: K = 20

[Figure: f_true(x) and best spatially adaptive spline fit f_SAS(20)(x) with 20 jumps]

Doppler: K = 5, Bayesian interval

[Figure: f_SAS(5)(x) with 5 jumps and pointwise 95% confidence intervals]

Experiments with κ

Heaviside, K = 10, κ = 1: [Figure: f_true(x) and f_SAS(10)(x) with 10 jumps, κ = 1]

Heaviside, K = 10, κ = 1.2: [Figure: f_true(x) and f_SAS(10)(x) with 10 jumps, κ = 1.2]

Heaviside, K = 10, κ = 1.4: [Figure: f_true(x) and f_SAS(10)(x) with 10 jumps, κ = 1.4]

Conclusions for this part

- κ = 1.4 works pretty well in at least one example.
- K is another regularizing parameter; K = 5 or 10 seemed to work well.
- We can get empirical Bayes credibility intervals for f.
- Much more research is needed.

Bayesian model

Our simplified approach: discretize, then use MCMC to get posterior quantities.

Bayesian model for spline smoothing (Wahba, 1978)

Prior for f: D^p X(t) = b^{1/2} dW(t), with a diffuse prior on X(0), DX(0), …, D^{p−1} X(0). The posterior for f under this prior is normal, with mean equal to the smoothing spline with λ = σ²/b and variance σ² A_λ. Kohn and coauthors have championed this model; see also Hastie and Tibshirani (2002).

Discretize the prior (Whittaker); assume the t_i are equally spaced:

    ∇x_i = x_i − x_{i−1},   ∇²x_i = ∇x_i − ∇x_{i−1},   etc.

Assume ∇^p x_i iid N(0, b) for i = p + 1, …, n, with the initial values x_1, ∇x_2, …, ∇^{p−1} x_p diffuse. Equivalent forms:

    B x ∼ N(0, b I_{n−p}),   p(x) ∝ b^{−(n−p)/2} exp( −(1/(2b)) x′ B′ B x )
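One can simulate directly from this discretised prior by drawing the p-th differences iid N(0, b) and inverting the difference operator. A sketch (mine), with the diffuse initial values pinned at zero purely for display:

```python
import math
import numpy as np

def sample_prior_path(n, p, b, rng=None):
    """Draw one path whose p-th differences are iid N(0, b); the first p
    values (the diffuse part of the prior) are fixed at 0 here."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    d = np.sqrt(b) * rng.standard_normal(n - p)
    for i in range(p, n):
        # invert nabla^p x_i = sum_k (-1)^k C(p,k) x_{i-k} = d_{i-p}
        x[i] = d[i - p] - sum((-1) ** k * math.comb(p, k) * x[i - k]
                              for k in range(1, p + 1))
    return x
```

For p = 1 this is a random walk; for p = 2 an integrated random walk, whose paths look locally linear, which is the discrete counterpart of the cubic smoothing-spline prior.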

(This is a partially informative prior.) B′B is Beran's annihilator matrix. For moderate n, the fit is indistinguishable from the regular smoothing spline.

A rich class of priors on function space

Equivalent(?):

    min_{f,γ}  Σ_{i=1}^n (y_i − f(t_i))² + ∫_0^1 e^{γ(t)} [f^{(p)}(t)]² dt + η ∫_0^1 [γ^{(q)}(t)]² dt

Bayes solution

- Inverse gamma prior on σ²; inverse gamma prior on η.
- MCMC: Markov chain Monte Carlo simulation to estimate posterior quantities.
- Gibbs sampling is relatively simple:
  - x | y, γ, η, σ² has a multivariate normal smoothing-spline posterior
  - σ² | y, x, γ, η is inverse gamma
  - η | y, x, γ, σ² is inverse gamma
  - γ_i | γ_{−i}, y, x, σ², η does not have a nice form, and we now sample one component at a time
- Mixing can be slow, especially for p > 2.
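To illustrate the structure of these updates, here is a minimal Gibbs sampler for the simpler nonadaptive discretised model (a scalar smoothing variance b in place of the full γ(t) process, and illustrative IG(1, 1) priors of my choosing); the x and variance steps have the same form as the full sampler's:

```python
import numpy as np

def gibbs(y, p=2, iters=200, rng=None):
    """Gibbs sampler for y | x, sigma2 ~ N(x, sigma2 I), B x ~ N(0, b I),
    with IG(1, 1) priors on sigma2 and b. Returns the posterior mean of x
    (averaged over the second half of the chain) and the last variances."""
    rng = np.random.default_rng(rng)
    n = len(y)
    B = np.eye(n)
    for _ in range(p):
        B = np.diff(B, axis=0)          # p-th order difference matrix
    BtB = B.T @ B
    x, sigma2, b = y.copy(), 1.0, 1.0
    keep = []
    for it in range(iters):
        # x | rest ~ N(Q^{-1} y / sigma2, Q^{-1}) with Q = I/sigma2 + BtB/b
        Q = np.eye(n) / sigma2 + BtB / b
        L = np.linalg.cholesky(Q)
        mean = np.linalg.solve(Q, y / sigma2)
        x = mean + np.linalg.solve(L.T, rng.standard_normal(n))
        # sigma2 | rest ~ IG(1 + n/2, 1 + ||y - x||^2 / 2)
        r = y - x
        sigma2 = 1.0 / rng.gamma(1.0 + n / 2, 1.0 / (1.0 + r @ r / 2))
        # b | rest ~ IG(1 + (n-p)/2, 1 + ||B x||^2 / 2)
        d = B @ x
        b = 1.0 / rng.gamma(1.0 + (n - p) / 2, 1.0 / (1.0 + d @ d / 2))
        if it >= iters // 2:
            keep.append(x)
    return np.mean(keep, axis=0), sigma2, b
```

In the full adaptive model the single b is replaced by the vector e^{γ}, the η update is another inverse gamma draw, and each γ_i requires a univariate Metropolis-type step since its conditional is not a standard distribution.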

Example: Doppler, n = 5, nonadaptive

[Figure: nonadaptive fit on x ∈ [0, 1]]

Example: Doppler, n = 5, adaptive

[Figure: adaptive fit on x ∈ [0, 1], with the estimated local penalty delta_k in a lower panel]

- Computational issues: p = 3 would be better; MCMC convergence issues.
- What is the continuous version of this process?
- Ideas extend to other penalties such as L1.
- In principle this extends to higher dimensions.
- Data-dependent penalties open many new possibilities.