Bayesian Nonparametric Modelling with the Dirichlet Process Regression Smoother
J. E. Griffin and M. F. J. Steel
University of Warwick

Introduction
- Nonparametric regression offers the flexibility that many real applications require
  - Nonlinear relationships with minimal assumptions
  - Interest could be in various aspects (conditional location, conditional spread, ...)
- Existing Bayesian approaches: flexible location modelling (Gaussian processes, splines, wavelets) and local modelling (partial exchangeability, BPM)
- Here we attempt to combine Bayesian nonparametric (NP) function estimation and density estimation
- We also want to allow for centring over parametric models

Bayesian Nonparametric Modelling
- Usual hierarchical Bayesian model for $y_1, \dots, y_n$:
    $y_i \sim k(\psi_i)$, $\psi_i \sim F$, $F \sim \Pi$,
  where $k(\cdot)$ is a pdf and $\Pi$ is a flexible class
- Here we concentrate on the stick-breaking class
    $F \stackrel{d}{=} \sum_{i=1}^{\infty} p_i \delta_{\theta_i}$,
  where $\delta_\theta$ is the Dirac measure at $\theta$ and $p_i = V_i \prod_{j<i} (1 - V_j)$
  - $V_1, V_2, V_3, \dots$ independent with $V_i \sim \mathrm{Be}(a_i, b_i)$
  - $\theta_1, \theta_2, \theta_3, \dots$ i.i.d. from the centring distribution $H$
- Dirichlet process when $a_i = 1$ and $b_i = M$ for all $i$
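The stick-breaking weights $p_i = V_i \prod_{j<i}(1 - V_j)$ can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the infinite sum is truncated at a finite number of atoms, the Dirichlet process case $V_i \sim \mathrm{Be}(1, M)$ is used, and the centring distribution $H$ is taken to be a standard normal. The function names are hypothetical and do not come from the talk.

```python
import numpy as np

def stick_breaking_weights(v):
    """Map stick-breaking fractions V_1..V_N to p_i = V_i * prod_{j<i} (1 - V_j)."""
    v = np.asarray(v, dtype=float)
    # Stick length left before each break: 1, (1-V_1), (1-V_1)(1-V_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

def sample_truncated_dp(M, n_atoms, rng):
    """Draw weights and atoms of a truncated Dirichlet process (a_i = 1, b_i = M)."""
    v = rng.beta(1.0, M, size=n_atoms)      # V_i ~ Be(1, M)
    theta = rng.standard_normal(n_atoms)    # theta_i ~ H, with H = N(0, 1) as an assumption
    return stick_breaking_weights(v), theta

rng = np.random.default_rng(0)
weights, atoms = sample_truncated_dp(M=2.0, n_atoms=200, rng=rng)
print("total weight captured by the truncation:", weights.sum())
```

The truncation leaves a small amount of mass unassigned; the printed total shows how close to one the retained weights are for the chosen number of atoms.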

Bayesian Nonparametric Regression
- For pairs $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ a natural extension is
    $y_i \sim k(\psi_i)$, $\psi_i \sim F_{x_i}$, $F_x \stackrel{d}{=} \sum_{j=1}^{\infty} p_j(x)\, \delta_{\theta_j(x)}$
- Covers existing processes:
  - $p_i(x) = p_i$: single-$p$ DDP (MacEachern, 2001; De Iorio et al., 2004; Gelfand et al., 2005)
  - $\theta_i(x) = \theta_i$: e.g. $\pi$DDP (Griffin and Steel, 2006)

Bayesian Nonparametric Regression
- Here we focus on models with $\theta_i(x) = \theta_i$
- These often undersmooth the posterior mean (piecewise constant)
- So consider
    $y_i - g(x_i) - m(x_i) \sim k(\psi_i)$, $\psi_i \sim F_{x_i}$, $F_x \stackrel{d}{=} \sum_{i=1}^{\infty} p_i(x)\, \delta_{\theta_i}$
- Conditional regression function: a parametric part $g(x)$ and a nonparametric part $m(x)$
- For $m(x)$: Gaussian process prior with mean 0 and covariance $\sigma_0^2\, \rho(x_i, x_j)$, where $\rho(x_i, x_j)$ is a Matérn correlation function
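A prior draw of the nonparametric part $m(x)$ can be sketched as follows. The slides do not state the smoothness parameter or the range of the Matérn correlation, so the choice of smoothness 3/2 (which has a closed form) and the `length_scale` value are assumptions made purely for illustration, as are the function names.

```python
import numpy as np

def matern32_corr(x1, x2, length_scale):
    """Matern correlation with smoothness 3/2 (an assumed, illustrative choice)."""
    d = np.abs(np.subtract.outer(np.asarray(x1, float), np.asarray(x2, float)))
    s = np.sqrt(3.0) * d / length_scale
    return (1.0 + s) * np.exp(-s)

def draw_gp_prior(x, sigma0_sq, length_scale, rng, jitter=1e-8):
    """Draw m(x) from a zero-mean GP with covariance sigma0^2 * rho(x_i, x_j)."""
    cov = sigma0_sq * matern32_corr(x, x, length_scale)
    cov[np.diag_indices_from(cov)] += jitter   # small jitter for numerical stability
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x_grid = np.linspace(0.0, 1.0, 50)
m_draw = draw_gp_prior(x_grid, sigma0_sq=1.0, length_scale=0.3, rng=rng)
```

A smoother or rougher $m(x)$ would follow from a different smoothness parameter; the general Matérn form needs a modified Bessel function and is not shown here.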

Bayesian Density Smoother
- Stick-breaking prior: consider the atoms and their ordering at each $x \in \mathbb{R}^p$
- Define closed, convex sets in $\mathbb{R}^p$, say $I_1, I_2, \dots$, and construct $F(x)$ by only considering $\{(V_j, \theta_j) \mid x \in I_j\}$
- Ordering determined by the associated $t_j > 0$ (smallest first)
- So the prior is defined by $(V_1, \theta_1, I_1, t_1), (V_2, \theta_2, I_2, t_2), \dots$
- If $p_{s,w} = P(s, w \in I_k \mid s \in I_k \text{ or } w \in I_k)$ is given, then
    $\mathrm{Corr}(F_s, F_w) = \frac{2(M+1)\, p_{s,w}}{2 + M(1 + p_{s,w})}$
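The correlation formula is easy to explore numerically. The sketch below simply evaluates it for a few values of $p_{s,w}$ and $M$; the function name is hypothetical. At the extremes the formula gives correlation 0 when the two locations never share a set ($p_{s,w} = 0$) and correlation 1 when they always do ($p_{s,w} = 1$).

```python
def corr_from_shared_probability(p_sw, M):
    """Corr(F_s, F_w) = 2 (M + 1) p / (2 + M (1 + p)) for shared-set probability p."""
    if not 0.0 <= p_sw <= 1.0:
        raise ValueError("p_sw must lie in [0, 1]")
    if M <= 0.0:
        raise ValueError("M must be positive")
    return 2.0 * (M + 1.0) * p_sw / (2.0 + M * (1.0 + p_sw))

for M in (0.5, 1.0, 10.0):
    row = [round(corr_from_shared_probability(p, M), 3) for p in (0.0, 0.25, 0.5, 0.75, 1.0)]
    print(f"M = {M}: {row}")
```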

Bayesian Density Smoother
- For $I_k$ choose a ball of radius $r_k$ around $C_k$
- $(C_1, r_1, t_1), (C_2, r_2, t_2), \dots$: Poisson process on $\mathbb{R}^p \times \mathbb{R}_+^2$ with intensity $p(r)$ (a pdf on $\mathbb{R}_+$)
- Some results for the case where $x \in \mathbb{R}$:
    $p_{s,s+u} = \frac{2\mu_2 - |u| I}{4\mu - 2\mu_2 + |u| I}$,
  where $\mu = E[r]$, $I = \int_{|u|/2}^{\infty} p(r)\, dr$ and $\mu_2 = \int_{|u|/2}^{\infty} r\, p(r)\, dr$
- If $r \sim \mathrm{Ga}(\alpha, \beta)$, $F_x$ is mean square differentiable of order $q = 1, 2, \dots$ if and only if $\alpha \geq 2q - 1$
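For a gamma-distributed radius, the shared-set probability $p_{s,s+u}$ in the one-dimensional case can be evaluated by numerical integration. The sketch below is a rough illustration, assuming $r \sim \mathrm{Ga}(\alpha, \beta)$ with rate $\beta$, a simple trapezoid rule and a heuristic upper cut-off for the integrals; the function name and the grid settings are not from the talk. For $\alpha = 1$ the integrals have a closed form, giving $p_{s,s+u} = e^{-\beta |u|/2} / (2 - e^{-\beta |u|/2})$, which serves as a check.

```python
import math
import numpy as np

def shared_set_probability(u, alpha, beta, n_grid=200_001):
    """p_{s,s+u} = (2 mu_2 - |u| I) / (4 mu - 2 mu_2 + |u| I) for r ~ Ga(alpha, rate beta)."""
    u = abs(float(u))
    if u == 0.0:
        return 1.0                     # same location: the sets always coincide
    mu = alpha / beta                  # E[r]
    lo = u / 2.0
    # Heuristic cut-off far in the right tail of the gamma density (an assumption).
    hi = lo + (alpha + 40.0 * math.sqrt(alpha) + 40.0) / beta
    r = np.linspace(lo, hi, n_grid)
    dens = beta**alpha / math.gamma(alpha) * r**(alpha - 1.0) * np.exp(-beta * r)
    h = r[1] - r[0]

    def trapezoid(f):
        return h * (f.sum() - 0.5 * (f[0] + f[-1]))

    I = trapezoid(dens)                # int_{|u|/2}^inf p(r) dr
    mu2 = trapezoid(r * dens)          # int_{|u|/2}^inf r p(r) dr
    return (2.0 * mu2 - u * I) / (4.0 * mu - 2.0 * mu2 + u * I)

for u in (0.1, 0.5, 1.0):
    print(u, shared_set_probability(u, alpha=2.0, beta=4.0))
```

The probability decreases as the distance $|u|$ grows, so distributions at nearby covariate values share more atoms than those far apart.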

Dirichlet Process Regression Smoother
- Definition: Let $(t_i, C_i, r_i)$ be a Poisson process with intensity
    $\frac{\beta^\alpha}{\Gamma(\alpha)}\, r_i^{\alpha - 1} \exp\{-\beta r_i\}$
  on $\mathbb{R}_+ \times \mathbb{R}^p \times \mathbb{R}_+$, with associated marks $(V_i, \theta_i)$ which are i.i.d. from $\mathrm{Be}(1, M)$ and $H$. If
    $F_x = \sum_{\{i \mid x \in B_{r_i}(C_i)\}} V_i \prod_{\{j \mid x \in B_{r_j}(C_j),\ t_j < t_i\}} (1 - V_j)\, \delta_{\theta_i}$,
  then $\{F_x \mid x \in \mathbb{R}^p\}$ follows a Dirichlet Process Regression Smoother (DPRS), represented as $\mathrm{DPRS}(M, H, \alpha, \beta)$
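The definition can be made concrete with a prior simulation in one dimension ($p = 1$). The sketch below is not the sampling scheme used in the talk: it truncates the Poisson process to a bounded window $[0, t_{\max}] \times [c_{lo}, c_{hi}]$, which is an approximation, and takes $H = N(0, 1)$ as an assumption. All function and argument names are hypothetical.

```python
import numpy as np

def simulate_dprs_atoms(M, alpha, beta, t_max, c_lo, c_hi, rng):
    """Draw a truncated DPRS point pattern on [0, t_max] x [c_lo, c_hi] (p = 1)."""
    # Unit rate in (t, C); the radius density Ga(alpha, rate beta) integrates to one.
    n = rng.poisson(t_max * (c_hi - c_lo))
    return {
        "t": np.sort(rng.uniform(0.0, t_max, size=n)),          # sorted: smallest t first
        "c": rng.uniform(c_lo, c_hi, size=n),                   # ball centres C_i
        "r": rng.gamma(shape=alpha, scale=1.0 / beta, size=n),  # ball radii r_i
        "v": rng.beta(1.0, M, size=n),                          # V_i ~ Be(1, M)
        "theta": rng.standard_normal(n),                        # theta_i ~ H = N(0, 1), assumed
    }

def dprs_weights_at(x, atoms):
    """Atoms and stick-breaking weights of F_x, using only balls that contain x."""
    active = np.abs(x - atoms["c"]) <= atoms["r"]
    v = atoms["v"][active]                                      # already ordered by t
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return atoms["theta"][active], v * remaining

rng = np.random.default_rng(1)
atoms = simulate_dprs_atoms(M=1.0, alpha=2.0, beta=4.0, t_max=200.0,
                            c_lo=-2.0, c_hi=3.0, rng=rng)
theta_a, w_a = dprs_weights_at(0.50, atoms)
theta_b, w_b = dprs_weights_at(0.55, atoms)
print(len(theta_a), len(theta_b), len(np.intersect1d(theta_a, theta_b)))
```

The window for the centres is chosen wider than the range of $x$ values of interest so that balls centred outside $[0, 1]$ can still cover points inside it. Nearby locations share most of their atoms, which is what induces dependence between $F_x$ at neighbouring $x$.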

Centring over Models
- Centre the nonparametric model over a nontrivial parametric model:
  - the nonparametric model can indicate flaws in common parametric models
  - can aid interpretation and prior elicitation
- Regression errors: $\epsilon_i = y_i - g(x_i)$
- All models are centred over $\epsilon_i \sim N(0, \sigma^2)$

Centring over Models
- Model 1(a): $\epsilon_i \sim N(\mu_i, a\sigma^2)$, $\mu_i \sim F_{x_i}$, $F_x \sim \mathrm{DPRS}(M, H, \alpha, \beta)$, $H = N(0, (1 - a)\sigma^2)$, $0 < a < 1$
  - $a$ close to zero: nonparametric modelling crucial
- Model 1(b): $\epsilon_i - m(x_i) \sim N(\mu_i, a\sigma^2)$, $\mu_i \sim F_{x_i}$, $F_x \sim \mathrm{DPRS}(M, H, \alpha, \beta)$, $H = N(0, (1 - a)\sigma^2)$, with $\tilde{\sigma}^2 = \sigma^2 + \sigma_0^2$ and $b = \sigma_0^2 / \tilde{\sigma}^2$
  - $b$ indicates the relative importance of $m(x)$ (GP)
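The sense in which Model 1(a) is centred over normal errors can be spelled out with a short calculation. Because the atoms of $F_x$ are i.i.d. from $H$ and the weights sum to one, a draw $\mu_i \sim F_{x_i}$ has prior marginal distribution $H$; the sketch below integrates $\mu_i$ out under that marginal, using only the quantities in the model definition.

```latex
\begin{aligned}
\epsilon_i \mid \mu_i &\sim N(\mu_i,\, a\sigma^2), \qquad
  \mu_i \sim H = N\bigl(0,\, (1-a)\sigma^2\bigr) \\
E[\epsilon_i] &= E[\mu_i] = 0 \\
\mathrm{Var}(\epsilon_i) &= E\bigl[\mathrm{Var}(\epsilon_i \mid \mu_i)\bigr]
  + \mathrm{Var}\bigl(E[\epsilon_i \mid \mu_i]\bigr)
  = a\sigma^2 + (1-a)\sigma^2 = \sigma^2 \\
\Rightarrow\quad \epsilon_i &\sim N(0,\, \sigma^2)
  \quad \text{(a normal mixture of normals is normal)}
\end{aligned}
```

So $a$ splits the total variance $\sigma^2$ between the normal kernel (share $a$) and the nonparametric mixing distribution (share $1 - a$), which is why small $a$ signals that the nonparametric part is doing most of the work.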

Centring over Models
- Large $a$, small $b$: nonparametric modelling less critical, and $g(x)$ is a good parametric model
- Interpretation of $g(x)$ is nonstandard (given $F_x$), so consider fixing the median
- Model 2: $\epsilon_i - m(x_i) \sim U(-\sigma\sqrt{u_i},\, \sigma\sqrt{u_i})$, $u_i \sim F_{x_i}$, $F_x \sim \mathrm{DPRS}(M, H, \alpha, \beta)$, which leads to symmetric error distributions
- Choose $H = \mathrm{Ga}(3/2, 1/2)$
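The choice $H = \mathrm{Ga}(3/2, 1/2)$ is what centres Model 2 over normal errors: a uniform on $(-\sigma\sqrt{u}, \sigma\sqrt{u})$ mixed over $u \sim \mathrm{Ga}(3/2, 1/2)$ is $N(0, \sigma^2)$. The Monte Carlo check below is a sketch that assumes the second gamma parameter is a rate (so the NumPy scale is 2) and that the uniform limits involve $\sqrt{u}$; the function name is hypothetical.

```python
import numpy as np

def sample_model2_centre(sigma, n, rng):
    """Draw errors from the scale mixture of uniforms with u ~ Ga(3/2, rate 1/2)."""
    u = rng.gamma(shape=1.5, scale=2.0, size=n)   # rate 1/2 corresponds to scale 2
    half_width = sigma * np.sqrt(u)
    return rng.uniform(-half_width, half_width)   # elementwise U(-sigma sqrt(u), sigma sqrt(u))

rng = np.random.default_rng(0)
sigma = 0.7
eps = sample_model2_centre(sigma, n=200_000, rng=rng)
sample_var = eps.var()
sample_kurtosis = np.mean((eps - eps.mean()) ** 4) / sample_var ** 2
print("variance:", sample_var, "target:", sigma ** 2)
print("kurtosis:", sample_kurtosis, "normal value: 3")
```

Replacing the single gamma draw per observation with draws from $F_{x_i}$ gives symmetric but non-normal error distributions, since each component remains a uniform centred at zero.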

Computational Issues
- The DPRS allows for a much simpler MCMC sampling scheme than in Griffin and Steel (2006)
- Adapt retrospective sampling methods from Papaspiliopoulos and Roberts (2004) (no truncation)

Examples
- Example 1: Sine wave
- 100 observations from $y_i = \sin(2\pi x_i) + \epsilon_i$ with $x_i \sim U(0, 1)$ and
  - Error 1: $\epsilon_i$ is $t$ with 2.5 d.f. and conditional variance $\sigma^2(x) = (x - \tfrac{1}{2})^2$
  - Error 2: $p(\epsilon_i \mid x_i) = 0.3\, N(0.3, 0.01) + 0.7\, N(-0.3 + 0.6 x_i, 0.01)$
    Bimodal at $x_i = 0$ and unimodal (and normal) at $x_i = 1$
- Take $g(x) = 0$ throughout
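Data of this kind can be simulated as below. This is a sketch under two interpretive assumptions that the slide does not settle: for Error 1 a standard $t_{2.5}$ draw is rescaled so that its variance equals $(x - 1/2)^2$, and in Error 2 the second argument of $N(\cdot, 0.01)$ is read as a variance (standard deviation 0.1). The function name is hypothetical.

```python
import numpy as np

def simulate_sine_data(n, error, rng):
    """Simulate y_i = sin(2 pi x_i) + eps_i with one of the two error designs."""
    x = rng.uniform(0.0, 1.0, size=n)
    if error == 1:
        nu = 2.5
        # A t_nu variable has variance nu / (nu - 2); rescale to variance (x - 1/2)^2.
        scale = np.abs(x - 0.5) / np.sqrt(nu / (nu - 2.0))
        eps = scale * rng.standard_t(nu, size=n)
    elif error == 2:
        first = rng.uniform(size=n) < 0.3                 # component with weight 0.3
        mean = np.where(first, 0.3, -0.3 + 0.6 * x)
        eps = mean + 0.1 * rng.standard_normal(n)         # variance 0.01 read as sd 0.1
    else:
        raise ValueError("error must be 1 or 2")
    return x, np.sin(2.0 * np.pi * x) + eps

rng = np.random.default_rng(0)
x1, y1 = simulate_sine_data(100, error=1, rng=rng)
x2, y2 = simulate_sine_data(100, error=2, rng=rng)
```

Under Error 2 the two component means are $0.3$ and $-0.3 + 0.6x$, which are $0.6$ apart at $x = 0$ and coincide at $x = 1$, matching the bimodal-to-unimodal description.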

Example: Sine wave, error 1
[Figure 1: posterior predictive $p(y \mid x)$ (top row) and conditional variance $\sigma^2(x)$ (bottom row, true value dashed); columns: Model 1(a), Model 1(b), Model 2; horizontal axis from 0 to 1]

Example: Sine wave
- Results with Error 1:
  - Small values of $a$ indicate lack of normality
  - Model 1(a): blocky, as expected
  - Models with a GP regression function do better
  - Conditional variance reasonably captured by the latter models

Example: Sine wave, error 2
[Figure 2: posterior predictive density $p(y \mid x)$, posterior of $m(x)$ with data (dots), and posterior predictive error distribution; rows: Model 1(b), Model 2; horizontal axis from 0 to 1]

Example: Sine wave
- Results with Error 2:
  - Model 1(b) can deal with bimodality
  - Model 2 cannot, by construction
  - Small $a$: nonnormality
  - Large $b$: constant centring model is poor

Examples
- Example 2: Scale economies
- Cost function for electricity distribution:
    $\mathit{tc} = f(\mathit{cust}) + \beta_1 \mathit{wage} + \beta_2 \mathit{pcap} + \beta_3 \mathit{PUC} + \beta_4 \mathit{kwh} + \beta_5 \mathit{life} + \beta_6 \mathit{lf} + \beta_7 \mathit{kmwire} + \epsilon$
  - $\mathit{tc}$: log total cost per customer
  - $\mathit{cust}$: log number of customers
- Data: 81 municipal distributors in Ontario, Canada during 1993
- Interest: effect of $\mathit{cust}$ and other regressors on $\mathit{tc}$

Example: Scale economies
- DPRS model with $\mathit{cust}$ as the covariate for $\epsilon$ and the GP
- Centre the model over two parametric regression models by choosing $f(\mathit{cust})$ to be:
  - Parametric 1: $\gamma_1 + \gamma_2\, \mathit{cust}$
  - Parametric 2: $\gamma_1 + \gamma_2\, \mathit{cust} + \gamma_3\, \mathit{cust}^2$

Example: Scale economies
- Results with Parametric 1:
  - Inference on $\beta$, $\gamma$ quite different for parametric and nonparametric models
  - Small $a$ for Model 1(a), much larger for Model 1(b)
- Results with Parametric 2:
  - Inference on $\beta$, $\gamma$ similar for parametric and nonparametric models
  - Small $a$ for Model 1(a), much larger for Model 1(b)
  - Now $b$ smaller than with Parametric 1

Example: Scale economies
[Figure 3: posterior mean of the nonparametric component(s) of the model; columns: Model 1(a), Model 1(b), Model 2; rows: linear (Lin.) and quadratic (Quadr.) centring model; horizontal axis from 7 to 12]

Example: Scale economies
- Nonparametric correction to the parametric fit: the linear model suggests problems; the quadratic is better
- Conclusion: the quadratic parametric model is not bad; the linear one is inappropriate

Discussion
- Combine Bayesian NP density estimation and regression modelling
- Separate modelling of components: the NP smoother needs to do less
- Centring over parametric models:
  - More structured approach
  - Can identify specific problems of parametric models
- Modelling ideas can be used in combination with any NP prior that allows for dependence on covariates