Spatially Adaptive Smoothing Splines

Paul Speckman
University of Missouri-Columbia
speckman@stat.missouri.edu

September 11, 2003
Banff 9/7/03
Ordinary Spline Smoothing

Observe y_i = f(t_i) + ε_i, i = 1, …, n, with ε_i iid N(0, σ²) and t_i ∈ [0, 1] WLOG. f is only known to be smooth.

$$\min_f \sum_{i=1}^n (y_i - f(t_i))^2 + \lambda \int_0^1 f^{(p)}(t)^2\, dt$$

Silverman's equivalent kernel:

$$\hat f(t) \approx \frac{1}{n} \sum_{i=1}^n w\!\left(\frac{t - t_i}{\lambda^{1/(2p+1)}}\right) y_i$$
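The penalized least-squares problem above has a simple discrete analogue (a Whittaker-style smoother on an equally spaced grid) that is convenient for experiments. A minimal numpy sketch, not the talk's actual code; the function name and arguments are illustrative:

```python
import numpy as np

def smooth_spline_discrete(y, lam, p=2):
    """Minimize ||y - f||^2 + lam * ||D f||^2 over f on a grid,
    where D is the p-th order difference matrix standing in for f^(p).
    The minimizer solves the linear system (I + lam * D'D) f = y."""
    n = len(y)
    D = np.diff(np.eye(n), n=p, axis=0)   # (n - p) x n difference matrix
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

With lam = 0 this returns y itself; as lam grows, the fit shrinks toward a polynomial of degree p − 1, mirroring the role of λ in the penalty above.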
Example: Heaviside function

[Figure: true data and noisy data, f_true(x) and f_noisy(x) vs. x; n = 5.]
[Figure: true data and the best-fit ordinary smoothing spline, f_true(x) and f_SS(x) vs. x.]
Spatially Adaptive

Idea: somehow adjust the penalty to the local roughness of f:

$$\min_f \sum_{i=1}^n (y_i - f(t_i))^2 + \int_0^1 \lambda(t) f^{(p)}(t)^2\, dt$$

for a good choice of λ(t) > 0.
Problem: choose λ(t) from the data

Two proposals:
- Frequentist: Pintore, Speckman, Holmes (2003). Use GCV.
- Bayesian (with Dongchu Sun): use a discretized approximation to the smoothing spline; a type of stochastic volatility model. In progress.

Related work:
- Local bandwidth kernel smoothers
- Local GCV: Cummins, Filloon, Nychka (2001)
- Adaptive P-splines: Ruppert and Carroll
Toward a frequentist exact solution

$$\min_f \sum_{i=1}^n (y_i - f(t_i))^2 + \int_0^1 \lambda(t) f^{(p)}(t)^2\, dt$$

Special case of an L-spline with Lf = λ^{1/2} D^p f:

$$\min_f \sum_{i=1}^n (y_i - f(t_i))^2 + \int_0^1 (Lf(t))^2\, dt$$

Reproducing kernel (e.g., Gu, or Heckman and Ramsay):

$$K_\lambda(s, t) = \int_0^1 \frac{1}{\lambda(u)}\, G(s, u)\, G(t, u)\, du, \qquad G(s, u) = \frac{(s - u)_+^{p-1}}{(p-1)!}$$
The solution satisfies

$$\hat f(t) = \sum_{j=1}^n c_j K_\lambda(t, t_j) + \sum_{j=0}^{p-1} d_j \phi_j(t)$$

In vector form, f = Σ_λ c + T d with Σ_λ = [K_λ(t_i, t_j)]_{n×n}, and the penalty is

$$\int_0^1 \lambda\, (f^{(p)})^2 = c' \Sigma_\lambda c$$

The problem becomes

$$\min_{c,\, d}\; \|y - \Sigma_\lambda c - T d\|^2 + n\, c' \Sigma_\lambda c$$

One common solution: factorize (QR decomposition)

$$T = [Q_1\; Q_2] \begin{bmatrix} R \\ 0 \end{bmatrix}, \qquad Q = [Q_1\; Q_2] \text{ orthogonal}$$
Solution (see Wahba or Gu):

$$\hat{\mathbf f} = A_\lambda y, \qquad I - A_\lambda = n Q_2 (Q_2' M_\lambda Q_2)^{-1} Q_2', \qquad M_\lambda = \Sigma_\lambda + nI$$

So

$$\hat{\mathbf f} = y - n Q_2 (Q_2' M_\lambda Q_2)^{-1} Q_2' y$$
Special case: piecewise constant λ

Fix 0 = τ_0 < τ_1 < … < τ_K < τ_{K+1} = 1, and assume each τ_k = t_i for some i. Piecewise constant:

$$\lambda(t) = \lambda_k = e^{\gamma_k}, \qquad t \in [\tau_{k-1}, \tau_k)$$

There is an explicit (but messy) formula for Σ_λ for p = 2. The estimate f̂_λ is a polynomial spline with multiple knots at the τ_k.
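In the discrete setting, the piecewise-constant penalty is easy to mimic: weight each squared p-th difference by the λ_k of the interval it falls in. A hedged sketch (the names and grid convention are my own, not the paper's):

```python
import numpy as np

def adaptive_smooth(y, lam_vals, breaks, p=2):
    """Minimize ||y - f||^2 + sum_i lam(t_i) * (D f)_i^2 on an equally
    spaced grid in [0, 1], with lam(t) piecewise constant on the
    intervals cut by `breaks` (the interior knots tau_1 < ... < tau_K).
    `lam_vals` holds the K + 1 interval values lambda_k = exp(gamma_k)."""
    n = len(y)
    t = np.linspace(0.0, 1.0, n)
    D = np.diff(np.eye(n), n=p, axis=0)                  # (n - p) x n
    idx = np.searchsorted(breaks, t[p:], side='right')   # interval per row
    Lam = np.diag(np.asarray(lam_vals, dtype=float)[idx])
    return np.linalg.solve(np.eye(n) + D.T @ Lam @ D, y)
```

With all lam_vals equal this collapses to the ordinary global-λ smoother; making λ_k small over a jump region lets the fit track a step such as the Heaviside example.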
Choosing an appropriate λ by GCV

We use generalized cross validation with an extra cost term κ ≥ 1:

$$V(\lambda_1, \dots, \lambda_K) = \frac{n\, \|(I - A_\lambda) y\|^2}{\left(\mathrm{tr}(I - \kappa A_\lambda)\right)^2}$$

A_λ does not diagonalize, so we use brute force: Matlab, Nelder-Mead optimization. Seems to work for K ≤ 20.
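For a concrete reading of the criterion, here is a numpy sketch of V for the discrete smoother, with the hat matrix A_λ formed explicitly; one would hand this score to a Nelder-Mead routine (e.g. scipy.optimize.minimize) over γ = log λ, as on the slide. Function and argument names are illustrative:

```python
import numpy as np

def gcv_score(gammas, y, breaks, p=2, kappa=1.4):
    """GCV with cost term kappa:
        V = n * ||(I - A) y||^2 / tr(I - kappa * A)^2,
    where A is the hat matrix of the piecewise-lambda discrete smoother
    and lambda_k = exp(gamma_k) on each interval cut by `breaks`."""
    n = len(y)
    t = np.linspace(0.0, 1.0, n)
    D = np.diff(np.eye(n), n=p, axis=0)
    idx = np.searchsorted(breaks, t[p:], side='right')
    Lam = np.diag(np.exp(np.asarray(gammas, dtype=float))[idx])
    A = np.linalg.inv(np.eye(n) + D.T @ Lam @ D)   # hat matrix A_lambda
    resid = y - A @ y
    return n * (resid @ resid) / np.trace(np.eye(n) - kappa * A) ** 2
```

The γ_k = log λ_k parametrization keeps the search unconstrained, matching λ_k = e^{γ_k} on the previous slide.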
Examples

Heaviside, K = 5: [Figure: true data and the best-fit spatially adaptive spline with 5 jumps, f_true(x) and f_SAS(5)(x) vs. x.]
Bayesian interpretation: Wahba, 1978

Reasonable prior: X(t) is a mean-zero Gaussian process with covariance K_λ(s, t); diffuse prior on X(0), …, D^{p-1} X(0).

Then the posterior is a spatially adaptive spline: normal with mean A_λ y and covariance σ² A_λ (Wahba, 1978).

Nychka showed that the average frequentist coverage tends to have the right level.
Bayes credibility interval

[Figure: spatially adaptive spline with 5 jumps and 95% confidence intervals.]
Doppler function

$$f(t) = (t(1 - t))^{1/2} \sin\frac{2\pi(1 + a)}{t + a}, \qquad a = 0.05,\quad n = 128,\quad \text{S/N} = 7$$

[Figure: true data and noisy data, f_true(x) and f_noisy(x) vs. x.]
Doppler: ordinary smoothing

[Figure: true data and the best-fit ordinary smoothing spline, f_true(x) and f_SS(x) vs. x.]
Doppler: K = 5

[Figure: true data and the best-fit spatially adaptive spline with 5 jumps.]
Doppler: K = 10

[Figure: true data and the best-fit spatially adaptive spline with 10 jumps.]
Doppler: K = 20

[Figure: true data and the best-fit spatially adaptive spline with 20 jumps.]
Doppler: K = 5, Bayesian interval

[Figure: spatially adaptive spline with 5 jumps and 95% confidence intervals.]
Experiments with κ: Heaviside, K = 10, κ = 1

[Figure: true data and the spatially adaptive spline with 10 jumps, κ = 1.]
Heaviside, K = 10, κ = 1.2

[Figure: true data and the spatially adaptive spline with 10 jumps, κ = 1.2.]
Heaviside, K = 10, κ = 1.4

[Figure: true data and the spatially adaptive spline with 10 jumps, κ = 1.4.]
Conclusions for this part

- κ = 1.4 works pretty well in at least one example.
- K is another regularizing parameter; K = 5 or 10 seemed to work well.
- We can get empirical Bayes credibility intervals for f.
- Much more research is needed.
Bayesian model

Our simplified approach:
- Discretize
- Use MCMC to get posterior quantities
Bayesian model for spline smoothing (Wahba, 1978)

Prior for f: D^p X(t) = √b dW(t); diffuse prior on X(0), DX(0), …, D^{p-1} X(0).

Posterior for f with this prior: normal, with
- mean = the smoothing spline with λ = σ²/b
- variance = σ² A_λ

Kohn and coauthors have championed this model; Hastie and Tibshirani (2002).
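To build intuition for this prior, one can simulate from its discrete analogue (next slide): draw iid increments and integrate p times. A small sketch under the assumption that the p diffuse starting values are pinned at 0; the function name is my own:

```python
import numpy as np

def prior_draw(n, b, p=2, rng=None):
    """Draw x_1, ..., x_n with Delta^p x_i iid N(0, b) for i > p and the
    first p values set to 0 (a stand-in for the diffuse starting values).
    For p = 2 this is a discretized integrated random walk."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.sqrt(b) * rng.standard_normal(n - p)
    x = np.concatenate([np.zeros(p), z])
    for _ in range(p):
        x = np.cumsum(x)      # invert one differencing per pass
    return x
```

Plotting a few such draws for p = 2 shows the wiggly-but-smooth sample paths this prior favors, and why a single global b cannot adapt to local roughness.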
Discretize the prior (Whittaker): assume the t_i are equally spaced

$$\Delta x_i = x_i - x_{i-1}, \qquad \Delta^2 x_i = \Delta x_i - \Delta x_{i-1}, \quad \text{etc.}$$

Assume Δ^p x_i iid N(0, b), i = p + 1, …, n, with x_1, Δx_2, …, Δ^{p-1} x_p diffuse.

Equivalent forms:

$$B x \sim N(0,\, b I_{n-p}), \qquad p(x) \propto b^{-(n-p)/2} \exp\left(-\frac{1}{2b}\, x' B' B x\right)$$
(This is a partially informative prior.) B′B is Beran's annihilator matrix. For moderate n, the fit is indistinguishable from the regular smoothing spline.
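The "annihilator" label can be checked directly: B kills every polynomial of degree below p, which is exactly the diffuse (unpenalized) part of the prior. A quick numpy check, with B built as the p-th difference matrix:

```python
import numpy as np

def diff_matrix(n, p):
    """p-th order difference matrix B, so that (B x)_i = Delta^p x_{i+p}."""
    return np.diff(np.eye(n), n=p, axis=0)

B = diff_matrix(8, 2)
t = np.arange(8.0)
trend = 3.0 + 2.0 * t   # degree p - 1 = 1; B @ trend is exactly zero
```

Since B has full row rank, B′B has rank n − p: the p-dimensional polynomial null space carries no prior information, which is what "partially informative" means here.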
Rich class of priors on function space

Equivalent?

$$\min_{f,\, \gamma} \sum_{i=1}^n (y_i - f(t_i))^2 + \int_0^1 e^{\gamma(t)} f^{(p)}(t)^2\, dt + \eta \int_0^1 \gamma^{(q)}(t)^2\, dt$$
Bayes solution

- Inverse gamma prior on σ²
- Inverse gamma prior on η
- MCMC: Markov chain Monte Carlo simulation to estimate posterior quantities

Gibbs sampling is relatively simple:
- x | y, γ, η, σ² has a multivariate normal smoothing spline posterior
- σ² | y, x, γ, η is inverse gamma
- η | y, x, γ, σ² is inverse gamma
- γ_i | γ_{-i}, y, x, σ², η does not have a nice form, and we sample one component at a time

Mixing can be slow, especially for p > 2.
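As a sketch of the first Gibbs step, here is the draw of x given everything else in the discretized model y = x + ε. This uses my own parametrization, assuming a prior density proportional to exp(−x′B′ΛB x / (2σ²)) with Λ = diag(e^γ); the authors' exact scaling of b and η may differ:

```python
import numpy as np

def draw_x(y, gammas, sigma2, p=2, rng=None):
    """One Gibbs update of x | y, gamma, sigma^2 in the discretized model
    y = x + eps, under the assumed prior with precision B' Lam B / sigma^2,
    Lam = diag(exp(gamma)).  The full conditional is then
    N(A y, sigma^2 * A) with A = (I + B' Lam B)^{-1}."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(y)
    B = np.diff(np.eye(n), n=p, axis=0)
    A = np.linalg.inv(np.eye(n) + B.T @ np.diag(np.exp(gammas)) @ B)
    return rng.multivariate_normal(A @ y, sigma2 * A)
```

The σ², η, and γ updates would follow in the same sweep; the γ_i step needs, e.g., a Metropolis move, since its full conditional has no standard form.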
Example: Doppler, n = 5, nonadaptive

[Figure: nonadaptive fit vs. x.]
Example: Doppler, n = 5, adaptive

[Figure: adaptive fit vs. x; lower panel shows delta_k.]
- Computational issues: p = 3 would be better
- MCMC convergence issues
- What is the continuous version of this process?
- Ideas extend to other penalties such as L1
- In principle extends to higher dimensions
- Data dependent penalties open many new possibilities