Space-time modelling of air pollution with array methods Dae-Jin Lee Royal Statistical Society Conference Edinburgh 2009 D.-J. Lee (Uc3m) GLAM: Array methods in Statistics RSS 09 - Edinburgh # 1
Motivation
  Increasing research on modelling spatio-temporal data.
  Wide variety of approaches and research communities.
  E.g.: environmental data, epidemiologic studies, disease mapping applications, ...
  European Environment Agency (EEA): monitoring networks.
  EMEP project (European Monitoring and Evaluation Programme).
  Ozone ($O_3$) is currently one of the air pollutants of most concern in Europe.
Monitoring stations across Europe
  [Figure: map (longitude vs. latitude) of the sample of 45 monitoring stations; stations in Sweden, UK, Austria and Spain are labelled.]
$O_3$ time series plot for selected locations
  Seasonal pattern.
  [Figure: $O_3$ time series by year, 1999 to 2005, for the stations in Spain, Sweden, Austria and UK.]
Motivation
  Spatio-temporal data: response variable $y_{ijt}$ measured over geographical locations $s = (x_i, x_j)$, with $i, j = 1, \ldots, n$, and over time periods $x_t$, for $t = 1, \ldots, T$.
  ISSUE: huge amount of data available.
  Smoothing techniques: study spatial and temporal trends; space and time interactions.
  3-dimensional smoothing: P-splines and GLAM.
Outline
  1 P-splines in spatio-temporal smoothing context
  2 ANOVA-Type Interaction Models
  3 Application to air pollution data
  4 Concluding remarks
Penalized splines: GLAM in 3d
  3d case: $f(x_1, x_2, x_3) = B\theta$.
  $\theta$ can be expressed as a 3d-array $\Theta = \{\theta_{ijk}\}$ of dimension $c_1 \times c_2 \times c_3$.
  [Figure: the coefficient array $\Theta$ drawn as a cube with rows $1, \ldots, c_1$, columns $1, \ldots, c_2$ and layers $1, \ldots, c_3$; corners run from $\theta_{(1,1,1)}$ to $\theta_{(c_1,c_2,c_3)}$.]
3d penalty matrix
  Set penalties over the 3d-array $\Theta$:
  \[
  P = \underbrace{\lambda_1 D_1'D_1 \otimes I_{c_2} \otimes I_{c_3}}_{\text{row-wise}}
    + \underbrace{\lambda_2 I_{c_1} \otimes D_2'D_2 \otimes I_{c_3}}_{\text{column-wise}}
    + \underbrace{\lambda_t I_{c_1} \otimes I_{c_2} \otimes D_t'D_t}_{\text{layer-wise}}
  \]
  For spatio-temporal data: $f(\underbrace{\text{longitude}, \text{latitude}}_{\text{space}}, \text{time})$.
  Spatial anisotropy ($\lambda_1 \neq \lambda_2$): different amount of smoothing for latitude and longitude.
  Temporal smoothing ($\lambda_t$).
  Space-time interaction.
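The three-term penalty can be assembled directly from Kronecker products. The sketch below is illustrative only: the slides do not state the order of the difference matrices, so second-order differences are assumed, and the function names `diff_matrix` and `penalty_3d` are hypothetical.

```python
import numpy as np

def diff_matrix(c, order=2):
    """Difference matrix D of the given order acting on c coefficients.
    The order (2) is an assumption; the slides do not specify it."""
    return np.diff(np.eye(c), n=order, axis=0)

def penalty_3d(c1, c2, c3, lam1, lam2, lam_t, order=2):
    """P = lam1 D1'D1 (x) I (x) I + lam2 I (x) D2'D2 (x) I + lam_t I (x) I (x) Dt'Dt."""
    D1, D2, Dt = (diff_matrix(c, order) for c in (c1, c2, c3))
    I1, I2, I3 = np.eye(c1), np.eye(c2), np.eye(c3)
    row_wise = lam1 * np.kron(np.kron(D1.T @ D1, I2), I3)
    col_wise = lam2 * np.kron(np.kron(I1, D2.T @ D2), I3)
    layer_wise = lam_t * np.kron(np.kron(I1, I2), Dt.T @ Dt)
    return row_wise + col_wise + layer_wise

# Anisotropic example: three different smoothing parameters.
P = penalty_3d(4, 5, 6, lam1=1.0, lam2=10.0, lam_t=0.1)
```

Each term penalises differences along one dimension of $\Theta$ only, so a constant coefficient array is left unpenalised.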
Penalized splines: spatio-temporal data smoothing
  For spatio-temporal data, we propose:
  Spatio-temporal B-spline basis: $B = B_s \otimes B_t$, of dimension $nt \times c_1 c_2 c_3$,
  where $B_s$ is the spatial B-spline basis ($B_1 \,\Box\, B_2$, the row-wise Kronecker product) and $B_t$ is the B-spline basis for time, of dimension $t \times c_3$.
  Note that:
  GLAM framework: $E[Y] = B_t \Theta B_s'$.
  Mixed model.
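A minimal sketch of how the spatio-temporal basis could be built. The marginal bases here are random placeholders standing in for real B-spline bases, and `row_kron` is a hypothetical helper name for the row-wise Kronecker (box) product used for scattered spatial locations.

```python
import numpy as np

def row_kron(B1, B2):
    """Row-wise Kronecker (box) product: row i is kron(B1[i], B2[i])."""
    n = B1.shape[0]
    return (B1[:, :, None] * B2[:, None, :]).reshape(n, -1)

rng = np.random.default_rng(0)
n, t = 45, 84            # stations and months, as in the ozone application
c1, c2, c3 = 4, 5, 6     # illustrative basis sizes (assumed)

B1 = rng.random((n, c1))  # placeholder for the longitude B-spline basis
B2 = rng.random((n, c2))  # placeholder for the latitude B-spline basis
Bt = rng.random((t, c3))  # placeholder for the time B-spline basis

Bs = row_kron(B1, B2)     # n x (c1*c2) spatial basis
B = np.kron(Bs, Bt)       # (n*t) x (c1*c2*c3) full basis
```

With $B = B_s \otimes B_t$ the response vector is ordered with time running fastest within each station. In the array formulation the full matrix `B` never needs to be formed; it is built here only to show its dimensions.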
Penalized splines: spatio-temporal data smoothing
  Most space-time approaches consider the simplest additive model, which ignores space-time interaction:
    $f_s(x_1, x_2) + f_t(x_t)$
  We propose space-time interaction models:
    3d: $f_{st}(x_1, x_2, x_t)$
    Space-time ANOVA: $f_s(x_1, x_2) + f_t(x_t) + f_{st}(x_1, x_2, x_t)$
ANOVA-Type Interaction Models
  Chen (1993), Gu (2002): Smoothing-Spline ANOVA (SS-ANOVA).
  Interpretation as main effects and interactions.
  Models of type:
  \[
  \begin{aligned}
  \hat{y} ={}& f(x_1) + f(x_2) + f(x_t) && \text{main/additive effects} \\
           &+ f(x_1, x_2) + f(x_1, x_t) + f(x_2, x_t) && \text{2-way interactions} \\
           &+ f(x_1, x_2, x_t) && \text{3-way interactions}
  \end{aligned}
  \]
  PROBLEMS: identifiability, and basis dimension ("curse of dimensionality").
P-spline ANOVA model for spatio-temporal smoothing
  Lee and Durbán (2009b) consider:
  \[
  y = \gamma + f_s(x_1, x_2) + f_t(\text{time}) + f_{st}(x_1, x_2, \text{time}) + \epsilon,
  \]
  where
    $f_s(x_1, x_2)$: spatial 2d smooth surface
    $f_t(\text{time})$: smooth time trend
    $f_{st}(x_1, x_2, \text{time})$: space-time interaction
  We need to construct an identifiable model. Our approach is based on:
    a low-rank basis (P-splines),
    the mixed model representation and SVD properties.
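P-spline ANOVA model for spatio-temporal smoothing
  Model:
  \[
  \hat{y} = \text{Intercept} + f_s(x_1, x_2) + f_t(\text{time}) + f_{st}(x_1, x_2, \text{time})
  \]
  with basis and coefficients:
  \[
  B\theta = \left[\, 1_{nt} : B_s \otimes 1_t : 1_n \otimes B_t : B_s \otimes B_t \,\right]
  \begin{pmatrix} \gamma \\ \theta^{(s)} \\ \theta^{(t)} \\ \theta^{(st)} \end{pmatrix}
  \]
  and penalty:
  \[
  P = \begin{pmatrix}
  0 & & & \\
  & \tau_1 D_1'D_1 \otimes I_{c_2} + \tau_2 I_{c_1} \otimes D_2'D_2 & & \\
  & & \tau_t D_t'D_t & \\
  & & & P_{st}
  \end{pmatrix}
  \]
  where
  \[
  P_{st} = \lambda_1 D_1'D_1 \otimes I_{c_2} \otimes I_{c_3} + \lambda_2 I_{c_1} \otimes D_2'D_2 \otimes I_{c_3} + \lambda_t I_{c_1} \otimes I_{c_2} \otimes D_t'D_t .
  \]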
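The block structure of the ANOVA basis and penalty can be sketched as follows. This is an illustration under assumptions, not the authors' code: the marginal bases are random row-normalised placeholders (real B-spline bases also have rows summing to one), second-order differences are assumed, and `diff_penalty`, `block_diag` and `row_stochastic` are hypothetical helpers. The rank check at the end shows why the unconstrained model is not identifiable.

```python
import numpy as np

def diff_penalty(c, order=2):
    """D'D for a difference matrix of the given (assumed) order."""
    D = np.diff(np.eye(c), n=order, axis=0)
    return D.T @ D

def block_diag(*blocks):
    """Stack square or rectangular blocks along the diagonal."""
    out = np.zeros((sum(b.shape[0] for b in blocks),
                    sum(b.shape[1] for b in blocks)))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r, c = r + b.shape[0], c + b.shape[1]
    return out

def row_stochastic(rng, rows, cols):
    """Random placeholder basis whose rows sum to one, like B-splines."""
    M = rng.random((rows, cols))
    return M / M.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n, t, c1, c2, c3 = 12, 8, 3, 3, 4
Bs = row_stochastic(rng, n, c1 * c2)   # stand-in for the spatial basis
Bt = row_stochastic(rng, t, c3)        # stand-in for the time basis
one_n, one_t = np.ones((n, 1)), np.ones((t, 1))

# B = [1_nt : Bs (x) 1_t : 1_n (x) Bt : Bs (x) Bt]
B = np.hstack([np.ones((n * t, 1)), np.kron(Bs, one_t),
               np.kron(one_n, Bt), np.kron(Bs, Bt)])

tau1 = tau2 = tau_t = lam1 = lam2 = lam_t = 1.0   # illustrative values
P1, P2, Pt = diff_penalty(c1), diff_penalty(c2), diff_penalty(c3)
I1, I2, I3 = np.eye(c1), np.eye(c2), np.eye(c3)
P_s = tau1 * np.kron(P1, I2) + tau2 * np.kron(I1, P2)
P_st = (lam1 * np.kron(np.kron(P1, I2), I3)
        + lam2 * np.kron(np.kron(I1, P2), I3)
        + lam_t * np.kron(np.kron(I1, I2), Pt))
P = block_diag(np.zeros((1, 1)), P_s, tau_t * Pt, P_st)

# The intercept and main-effect columns lie in the span of Bs (x) Bt,
# so B has fewer independent columns than it has columns.
rank_B = np.linalg.matrix_rank(B)
```

Here `B` has 50 columns but the intercept and main-effect blocks add no new directions beyond those of the interaction block, which is the identifiability problem the constraints are meant to remove.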
In GLAM notation:
  \[
  \begin{aligned}
  (B_s \otimes 1_t)\,\theta^{(s)} &\equiv 1_t\,\Theta_s\,B_s' \\
  (1_n \otimes B_t)\,\theta^{(t)} &\equiv B_t\,\Theta_t\,1_n' \\
  (B_s \otimes B_t)\,\theta^{(st)} &\equiv B_t\,\Theta_{st}\,B_s'
  \end{aligned}
  \]
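These equivalences follow from the identity $\mathrm{vec}(A X C') = (C \otimes A)\,\mathrm{vec}(X)$ with column-major vectorisation. A small numerical check, using random placeholder matrices and illustrative dimensions (all assumed), might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, cs, c3 = 5, 4, 3, 2          # illustrative sizes; cs stands for c1*c2

Bs = rng.random((n, cs))           # placeholder spatial basis
Bt = rng.random((t, c3))           # placeholder time basis
one_n, one_t = np.ones((n, 1)), np.ones((t, 1))

Theta_s = rng.random((1, cs))      # spatial coefficients as a 1 x cs array
Theta_t = rng.random((c3, 1))      # time coefficients as a c3 x 1 array
Theta_st = rng.random((c3, cs))    # interaction coefficients, c3 x cs

def vec(A):
    """Column-major vectorisation, matching the Kronecker ordering."""
    return A.reshape(-1, order="F")

# Left-hand sides: large Kronecker bases times coefficient vectors.
lhs_s = np.kron(Bs, one_t) @ vec(Theta_s)
lhs_t = np.kron(one_n, Bt) @ vec(Theta_t)
lhs_st = np.kron(Bs, Bt) @ vec(Theta_st)

# Right-hand sides: small matrix products on the coefficient arrays.
rhs_s = vec(one_t @ Theta_s @ Bs.T)
rhs_t = vec(Bt @ Theta_t @ one_n.T)
rhs_st = vec(Bt @ Theta_st @ Bs.T)
```

The right-hand sides only ever multiply by the small marginal matrices, which is where the storage and speed gains of array methods come from.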
P-spline ANOVA model for spatio-temporal smoothing
  We avoid identifiability problems using the mixed model reparameterization and SVD properties.
  For each term we have a basis $[X : Z]$:
    $f_s(x_1, x_2)$: $x_1 : x_2$  (1)
    $f_t(x_t)$: $x_t$  (2)
    $f_{st}(x_1, x_2, x_t)$: $x_1 : x_2 : x_t$  (3)
  Some terms in (1) and (2) also appear in (3).
  Remove linearly dependent columns in the basis.
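The slides do not spell out the reparameterization. As a hedged illustration, the sketch below shows the commonly used one-dimensional version: the eigendecomposition of $D'D$ (equivalent to its SVD, since the matrix is symmetric positive semidefinite) splits the basis into an unpenalised part $X$ and a penalised part $Z$. The function name `mixed_model_split`, the second-order penalty and the placeholder basis are assumptions; the multi-dimensional construction used for the ANOVA model is not reproduced here.

```python
import numpy as np

def mixed_model_split(B, order=2):
    """Split B into X (null space of the penalty) and Z (penalised part)."""
    c = B.shape[1]
    D = np.diff(np.eye(c), n=order, axis=0)
    evals, U = np.linalg.eigh(D.T @ D)      # eigenvalues in ascending order
    Un, Us = U[:, :order], U[:, order:]     # null / non-null eigenvectors
    sig = evals[order:]                     # positive eigenvalues
    X = B @ Un                              # unpenalised (fixed) part
    Z = B @ (Us / np.sqrt(sig))             # penalised (random) part
    return X, Z, D, Un, Us, sig

rng = np.random.default_rng(0)
B = rng.random((30, 8))                     # placeholder for a B-spline basis
X, Z, D, Un, Us, sig = mixed_model_split(B)

# Map an arbitrary coefficient vector to the mixed model parameterization.
theta = rng.normal(size=8)
beta = Un.T @ theta
alpha = np.sqrt(sig) * (Us.T @ theta)

fit_original = B @ theta
fit_mixed = X @ beta + Z @ alpha
pen_original = theta @ D.T @ D @ theta
pen_mixed = alpha @ alpha
```

In this parameterization the fit is unchanged while the penalty becomes a plain ridge penalty on `alpha`, which is what allows the smoothing parameters to be treated as variance components.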
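P-spline ANOVA model for spatio-temporal smoothing
  This is equivalent to applying constraints over the coefficients:
  \[
  \sum_{t=1}^{c_3} \theta^{(t)}_t = 0 \quad \text{(time)}
  \]
  \[
  \sum_{i}^{c_1} \theta^{(st)}_{t,ij} = \sum_{j}^{c_2} \theta^{(st)}_{t,ij} = \sum_{i}^{c_1} \sum_{j}^{c_2} \theta^{(st)}_{t,ij} = 0 \quad \text{(space-time)}
  \]
  or in array form over $\Theta_{ijt}$.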
  \[
  \sum_{i}^{c_1} \theta^{(st)}_{t,ij} = \sum_{j}^{c_2} \theta^{(st)}_{t,ij} = \sum_{i}^{c_1} \sum_{j}^{c_2} \theta^{(st)}_{t,ij} = 0
  \]
  Centering and scaling matrix: $(I_c - 11'/c)$
  [Figure: the coefficient array $\Theta$ drawn as a cube with rows $1, \ldots, c_1$, columns $1, \ldots, c_2$ and layers $1, \ldots, c_3$.]
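One way to picture the array form of the constraints is to apply the centering matrix $(I_c - 11'/c)$ along the row and column dimensions of $\Theta$. The sketch below uses a random coefficient array and illustrative sizes; the helper name `centering` is hypothetical, and this shows only the effect of centering, not how the constraints are imposed during estimation.

```python
import numpy as np

def centering(c):
    """Centering matrix I_c - 11'/c."""
    return np.eye(c) - np.ones((c, c)) / c

rng = np.random.default_rng(1)
c1, c2, c3 = 3, 4, 5
Theta = rng.normal(size=(c1, c2, c3))   # indexed as Theta[i, j, t]

# Centre along rows (i) and columns (j), leaving the layers (t) untouched.
Theta_c = np.einsum("ia,jb,abt->ijt", centering(c1), centering(c2), Theta)

sum_over_i = Theta_c.sum(axis=0)        # c2 x c3, all entries ~ 0
sum_over_j = Theta_c.sum(axis=1)        # c1 x c3, all entries ~ 0
sum_over_ij = Theta_c.sum(axis=(0, 1))  # length c3, all entries ~ 0
```

After centering, every layer of the array sums to zero over rows, over columns, and over both, which matches the three space-time constraints.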
In practice
  We only need to construct the matrices $X$, $Z$ and the penalty $F$:

                        $f_s(x_1, x_2)$   $f_t(x_t)$   $f_{st}(x_1, x_2, x_t)$
    X (by columns)      $x_1 : x_2$       $x_t$        $(x_1, x_2, x_t)$
    Z (by blocks)
    F (block-diagonal)  $F_s$             $F_t$        $F_{st}$

  The blocks of $F$ depend on the smoothing parameters $(\lambda_1, \lambda_2, \lambda_t)$ and $(\tau_1, \tau_2, \tau_t)$.
  Estimation is by REML.
Ozone pollution in Europe
  Lee and Durbán (2009b)
  Sample of 45 monitoring stations.
  Monthly averages of $O_3$ levels (in µg/m$^3$) from January 1999 to December 2005 ($t = 1, \ldots, 84$).
  Models:
    Additive: $f_s(x_1, x_2) + f_t(x_t)$
    Spatio-temporal interaction:
      3d: $f_{st}(x_1, x_2, x_t)$
      ANOVA: $f_s(x_1, x_2) + f_t(x_t) + f_{st}(x_1, x_2, x_t)$
Spatial 2d + time: $f_s(x_1, x_2) + f_t(x_t)$
  [Figure: left, fitted spatial surface over longitude and latitude; right, fitted time trend f(time) by year, 1999 to 2005.]
  Space-time interaction is not considered: the smooth time trend is additive.
Spatio-temporal ANOVA model
  $\hat{y} = f(\text{space}) + f(\text{space}, \text{time}) + f(\text{time})$
  [Animation: fitted surface by month, starting at 1999:1.]
Comparison of fitted values: Additive vs. ANOVA
  Additive model fit: $f_s(x_1, x_2) + f_t(x_t)$
  ANOVA model fit: $f_s(x_1, x_2) + f_t(x_t) + f_{st}(x_1, x_2, x_t)$
  [Figure: two panels of fitted $O_3$ by year, 1999 to 2006, for the stations in Spain, Sweden, Austria and UK; left, additive model; right, ANOVA model.]
  The additive model assumes a smooth spatial surface over all monitoring stations that remains constant over time.
  The ANOVA model captures individual characteristics of the stations throughout time.
Comparison of models: ANOVA, 3d and Additive

    Model             AIC         df
    ANOVA             14280.73    366.03
    3d interaction    14537.22    765.05
    Additive          16506.28     65.98

  Observations:
  Best overall performance of ANOVA in terms of AIC, and with fewer d.f. than the 3d model.
  The ANOVA model is more realistic than the additive model, and easier to decompose and interpret in terms of the fit.
Concluding remarks
  Use of array methods in space-time data:
    speeds up calculations;
    data and model matrices are stored efficiently.
  Spatio-temporal ANOVA model:
    flexible formulation and interpretability of smoothing.
  Computational efficiency: nested B-spline bases.
    Consider a smaller basis $\breve{B}_t$ for the interaction, $B_s \otimes \breve{B}_t$, such that $\mathrm{rank}(\breve{B}_t) < \mathrm{rank}(B_t)$.
    The size of the full basis $B$ is reduced and the model is nested:
    $f_s(x_1, x_2) + f_t(x_t) + f_{st}(x_1, x_2, x_t)$
THANKS FOR YOUR ATTENTION!!!
Bibliography
  Spatial and spatio-temporal smoothing with P-splines:
    Lee, D.-J. and Durbán, M. (2009a). Smooth-CAR mixed models for spatial count data. CSDA, 53(8):2968-2979.
    Lee, D.-J. and Durbán, M. (2009b). P-spline ANOVA-type interaction models for spatio-temporal smoothing. Submitted.
  P-splines:
    Eilers, P.H.C. and Marx, B.D. (1996). Flexible smoothing with B-splines and penalties. Stat. Sci., 11:89-121.
    Currie, I.D., Durbán, M. and Eilers, P.H.C. (2006). Generalized linear array models with applications to multidimensional smoothing. JRSSB, 68:1-22.