1 Spatial Lasso with Application to GIS Model Selection
F. Jay Breidt, Colorado State University
with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald
September 25

The work reported here was developed under STAR Research Assistance Agreements CR-82995 and CR-82996 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University and Oregon State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the authors. EPA does not endorse any products or commercial services mentioned in this report.

2 Outline
Regression for sparse spatial sample
    layers and neighborhoods for GIS data
    linear model formulation with many parameters
Lasso
    shrinkage and model selection
    numerical experiment I
Spatial Lasso
    modifications for spatial smoothness
    numerical experiment II
Application: Prediction of soil moisture index

3 Responses and Covariates
Data $\{Y(s_i) : i = 1, \ldots, n\}$ observed at spatial locations $s_i \in D \subset \mathbb{Z}^2$
    $D$ is a regular grid
Covariate layers $\{x_k(s) : s \in D\}$; $k = 1, \ldots, p$
Goal: find an appropriate model for $Y(s)$ as a linear function of $\{x_k(s) : s \in D\}$; $k = 1, \ldots, p$

4 Model Formulation
Each layer has an associated neighborhood: $N_k \subset \mathbb{Z}^2$ is a neighborhood set of $0$

\[
Y(s_i) = \sum_{j=1}^{J} a_j \phi_j(s_i) + \sum_{k=1}^{p} \sum_{u \in N_k} b_k(s_i, u)\, x_k(s_i + u) + \varepsilon(s_i)
\]

$\phi_1(\cdot), \ldots, \phi_J(\cdot)$ are known functions
$\{\varepsilon(s_1), \ldots, \varepsilon(s_n)\}$ are iid $N(0, \sigma^2)$
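To make the formulation concrete, here is a minimal Python sketch of how one row of the design matrix could be assembled from layer values in a neighborhood. It rests on several assumptions that are not in the slides: a single known function $\phi_1 \equiv 1$ (an intercept), square block neighborhoods, white-noise layers as placeholder data, sites restricted to the grid interior so that no edge handling is needed, and the convention that the offset $(j, k)$ means (row + j, column + k). The function names are hypothetical.

```python
import random

def block_neighborhood(q):
    """Offsets of the (2q+1) x (2q+1) block centred at the origin."""
    return [(j, k) for j in range(-q, q + 1) for k in range(-q, q + 1)]

def design_row(layers, neighborhoods, site):
    """One row of X: an intercept, then x_k(site + u) for every layer k
    and every offset u in that layer's neighborhood N_k."""
    r, c = site
    row = [1.0]  # assumed phi_1(s) = 1
    for layer, offsets in zip(layers, neighborhoods):
        for dj, dk in offsets:
            row.append(layer[r + dj][c + dk])
    return row

# Placeholder data: p white-noise layers on a size x size grid (illustrative only).
random.seed(0)
size, p, q = 20, 2, 1
layers = [[[random.gauss(0, 1) for _ in range(size)] for _ in range(size)]
          for _ in range(p)]
neighborhoods = [block_neighborhood(q)] * p

# Sample sites from the interior so every site + u stays on the grid.
interior = [(r, c) for r in range(q, size - q) for c in range(q, size - q)]
sites = random.sample(interior, 30)

X = [design_row(layers, neighborhoods, s) for s in sites]
n_params = 1 + sum(len(nb) for nb in neighborhoods)  # J + sum_k |N_k| with J = 1
```

With two layers and 3 × 3 neighborhoods this already gives 19 columns, which illustrates how quickly the parameter count grows with the neighborhood size.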

5 Spatial Homogeneity
For each layer $k$ and pixel $u$, expect some spatial homogeneity in $b_k(\cdot, u)$:

\[
b_k(\cdot, u) = \sum_{l=1}^{L_k} c_{k,l}(u)\, \psi_{k,l}(\cdot); \quad k = 1, \ldots, p, \; u \in N_k,
\]

where the $\psi_{k,l}(\cdot)$'s are known functions possibly depending on some covariates.
    can adjust for spatial features like flow directions
    take $L_k \equiv 1$, $\psi_{k,1} \equiv 1$ in this discussion, so that $b_k(\cdot, u) = c_{k,1}(u)$

6 Linear Model and Least Squares Estimation
Rewrite as linear model:

\[
Y \equiv (Y(s_1), \ldots, Y(s_n))' = X\beta + \varepsilon
\]

Ordinary least squares estimates?
    large number of parameters: $J + \sum_{k=1}^{p} |N_k|$
    OLS has low bias, large variance: shrinkage
    OLS difficult to interpret: model selection

7 Lasso
Lasso (Tibshirani, 1996): least absolute shrinkage and selection operator
    standardize $X$ as $X^*$
Minimize RSS subject to an $L_1$ constraint:

\[
(Y - X^*\beta^*)'(Y - X^*\beta^*), \quad \text{subject to } \sum_j |\beta_j^*| \le t,
\]

where $t$ is a tuning parameter
Key feature: some estimated $\beta_j^*$ can be exactly zero

8 Lasso, Continued
Equivalently, minimize

\[
(Y - X^*\beta^*)'(Y - X^*\beta^*) + \lambda \sum_{j=1}^{m} |\beta_j^*|
\]

    if priors are independent Laplace, $\{\hat\beta_j^*\}$ are posterior modes
Least angle regression (LARS, Efron et al., 2004) provides a fast algorithm for Lasso
    same computational order as OLS applied to full set of covariates
    lars package in R
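The penalized criterion can be minimized in several ways. The sketch below uses cyclic coordinate descent with soft-thresholding, which is a different algorithm from LARS and is shown only because it fits in a few lines of plain Python. It assumes the columns of the design matrix are already standardized and the response is centered; the function names and the toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (y - X b)'(y - X b) + lam * sum_j |b_j| by cyclic coordinate descent.
    X is a list of rows; this is an illustrative sketch, not LARS."""
    n, m = len(X), len(X[0])
    beta = [0.0] * m
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(n_sweeps):
        for j in range(m):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            # The factor 1/2 appears because the RSS term is not halved.
            new = soft_threshold(rho, lam / 2.0) / col_ss[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

# Toy orthogonal design: a strong effect on column 0, a weak one on column 1.
X_toy = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y_toy = [3.1, -2.9, 2.9, -3.1]  # equals 3 * column 0 + 0.1 * column 1
beta_hat = lasso_coordinate_descent(X_toy, y_toy, lam=2.0)
```

In this toy case the weak coefficient is set exactly to zero while the strong one is shrunk, which is the selection behaviour the slide highlights.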

9 Numerical Experiment I: Mean Function
Consider six covariate layers simulated as Gaussian random fields
    exponential covariance, strong dependence
Construct true mean function:

\[
\mu(s) = \Big\{ x_1(s) + \sum_{j=-1}^{1} x_1(s + (j, 1)) + \sum_{j=-2}^{2} x_1(s + (j, 2)) \Big\} + \sum_{j=-1}^{1} \sum_{k=-1}^{1} x_2(s + (j, k))
\]

True neighborhoods:
    $x_1$: inverted pyramid
    $x_2$: centered 3 × 3 block
    $x_3, x_4, x_5, x_6$: empty

10 Simulated Covariate Layer
Strong spatial dependence in $x_1(\cdot)$

11 Estimation and Prediction
Sample 1 sites from 1 1 spatial domain
    observed response = mean function plus noise
Estimate and predict using OLS or Lasso
OLS models with correct layers:

\[
Y(s_i) = \beta_0 + \sum_{l=1}^{2} \beta_l\, x_l(s_i) + \varepsilon(s_i); \quad i = 1, \ldots, n,
\]
\[
Y(s_i) = \beta_0 + \sum_{l=1}^{2} \sum_{j=-1}^{1} \sum_{k=-1}^{1} \beta_{l,j,k}\, x_l(s_i + (j, k)) + \varepsilon(s_i); \quad i = 1, \ldots, n,
\]
\[
Y(s_i) = \beta_0 + \sum_{l=1}^{2} \sum_{j=-2}^{2} \sum_{k=-2}^{2} \beta_{l,j,k}\, x_l(s_i + (j, k)) + \varepsilon(s_i); \quad i = 1, \ldots, n,
\]

Lasso neighborhoods, $N^{(2q+1)}$: $(2q+1) \times (2q+1)$ blocks

12 True Mean Function and Absolute Prediction Errors
Qualitatively similar for weak dependence

13 Average Squared Error
ASE over 1 1 spatial domain:

\[
\mathrm{ASE} = \frac{1}{|D|} \sum_{s \in D} \big(\hat\mu(s) - \mu(s)\big)^2
\]

Conduct 1 simulation replicates
    get 1 ASE's for LS1, LS3, LS5, Lasso3, ..., Lasso11
    produce boxplots of ASE's
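As a small illustration of the error measure, the sketch below averages the squared difference between a predicted and a true mean surface over every pixel of a grid. The grids are stored as lists of rows, and the function name and the toy values are hypothetical.

```python
def average_squared_error(mu_hat, mu):
    """Mean of (mu_hat(s) - mu(s))^2 over every pixel s of the grid D."""
    total, count = 0.0, 0
    for row_hat, row_true in zip(mu_hat, mu):
        for predicted, truth in zip(row_hat, row_true):
            total += (predicted - truth) ** 2
            count += 1
    return total / count

# Toy 2 x 2 example: two pixels are off by one unit, two are exact.
mu_true = [[0.0, 0.0], [0.0, 0.0]]
mu_pred = [[1.0, 0.0], [0.0, -1.0]]
ase = average_squared_error(mu_pred, mu_true)
```

One such value per simulation replicate and per method is what the boxplots summarize.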

14 Average Squared Error Under Strong Dependence
Figure: ASE for various estimation methods (Lasso3, Lasso5, Lasso7, Lasso9, Lasso11, Lasso*, Spatial*, True, LS1, LS3, LS5) with sample size n = 1

15 Estimated Neighborhoods Using Lasso (Strong)
Figure 4: Proportion of $\{x_k(s) : s \in N^{(2q+1)}\}$ being selected for $\theta = 2$ under various neighboring structures, where the three columns correspond to $k = 1, 2, 3$ and the five rows correspond to $q = 1, \ldots, 5$, respectively.

16 Spatial Smoothness
Above model assumes spatial homogeneity of regression coefficients across neighborhoods
No assumption of spatial smoothness of regression coefficients within neighborhoods
    in many applications, reasonable to assume smoothness of $\{c_{k,l}(s) : s \in N_k\}$
Ordinary Lasso does not account for spatial smoothness

17 Spatial Lasso
Allow for smoothness of coefficients
Assume spatial dependence prior for $\beta$:
    $\Gamma$ is prior correlation matrix of $\beta$
    $\beta^* \equiv \Gamma^{-1/2}\beta$ independent Laplace
Spatial Lasso obtained by minimizing

\[
(Y - X^*\beta^*)'(Y - X^*\beta^*) + \lambda \sum_{j=1}^{m} |\beta_j^*|,
\]

where $X^* \equiv X\Gamma^{1/2}$
    computation via modification of LARS: equi-projection regression
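The change of variables behind this criterion can be written out in one line. The short derivation below assumes that $\Gamma$ is positive definite with a symmetric square root $\Gamma^{1/2}$, which the slide does not state explicitly.

```latex
\begin{align*}
\beta &= \Gamma^{1/2}\beta^*
  && \text{(inverting } \beta^* \equiv \Gamma^{-1/2}\beta\text{)} \\
X\beta &= X\Gamma^{1/2}\beta^* = X^*\beta^*
  && \text{(with } X^* \equiv X\Gamma^{1/2}\text{)} \\
(Y - X\beta)'(Y - X\beta) + \lambda \sum_{j=1}^{m} |\beta_j^*|
  &= (Y - X^*\beta^*)'(Y - X^*\beta^*) + \lambda \sum_{j=1}^{m} |\beta_j^*| \\
\hat\beta &= \Gamma^{1/2}\hat\beta^*
  && \text{(back-transform the fitted coefficients)}
\end{align*}
```

Under this reading the penalty acts on the decorrelated coefficients $\beta^*$, so sparsity in $\hat\beta^*$ maps to spatially smooth patterns in $\hat\beta$, and taking $\Gamma = I$ recovers the ordinary Lasso.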

18 Numerical Experiment II
Mean function: for $x(s)$ iid $N(0, 1)$,

\[
\mu(s) = \sum_{j=-2}^{2} \sum_{k=-2}^{2} w_{j,k}\, x(s + (j, k)); \quad s \in D,
\]

and Gaussian weight function

\[
w_{j,k} \equiv \frac{\exp\!\big(-(j^2 + k^2)/4\big)}{\sum_{j'=-2}^{2} \sum_{k'=-2}^{2} \exp\!\big(-(j'^2 + k'^2)/4\big)}
\]

Sample n = 1 sites and generate $\{Y(s_1), \ldots, Y(s_n)\}$ by adding noise
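For illustration, the normalized Gaussian weights on the 5 × 5 neighborhood can be computed directly from the formula. The sketch below is plain Python; the function name and its `radius` and `scale` parameters are hypothetical conveniences, with defaults chosen to match the offsets $-2, \ldots, 2$ and the divisor 4 in the exponent.

```python
import math

def gaussian_weights(radius=2, scale=4.0):
    """Weights w_{j,k} proportional to exp(-(j^2 + k^2) / scale) on the
    (2*radius+1) x (2*radius+1) block, normalised to sum to one."""
    raw = {(j, k): math.exp(-(j * j + k * k) / scale)
           for j in range(-radius, radius + 1)
           for k in range(-radius, radius + 1)}
    total = sum(raw.values())
    return {offset: value / total for offset, value in raw.items()}

def mean_at(x, site, weights):
    """mu(s) = sum over offsets (j, k) of w_{j,k} * x(s + (j, k)); x is a list of rows."""
    r, c = site
    return sum(w * x[r + j][c + k] for (j, k), w in weights.items())

weights = gaussian_weights()

# On a constant layer the weighted mean returns the constant, since the weights sum to one.
flat_layer = [[2.0] * 7 for _ in range(7)]
mu_center = mean_at(flat_layer, (3, 3), weights)
```

The weights decay smoothly away from the center pixel, which is the kind of within-neighborhood smoothness the Spatial Lasso is meant to exploit.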

19 Spatial Lasso for Numerical Experiment II
Compare OLS, Lasso, Spatial Lasso
Apply Spatial Lasso with exponential covariance structure
Choose smoothness parameter and neighborhood size via grid search:
    neighborhood sets $\{N^{(2q+1)} : q = 1, \ldots, 5\}$
    smoothness parameters $\gamma = 0, 1, \ldots, 5$
    choose combination with best ten-fold cross-validation
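The grid search over neighborhood size and smoothness parameter can be sketched as a pair of nested loops around a ten-fold split. The code below is only a skeleton under stated assumptions: `cv_error` is a hypothetical user-supplied callable that fits a model on the training indices for a given $(q, \gamma)$ and returns the summed squared prediction error on the held-out indices, and `toy_cv_error` is a stand-in with a known minimum, not a real model fit.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[f::k] for f in range(k)]

def grid_search(n, q_values, gamma_values, cv_error, k=10):
    """Return (score, q, gamma) with the smallest k-fold cross-validation score.
    cv_error(q, gamma, train_idx, test_idx) must return the summed squared
    error on test_idx (hypothetical interface)."""
    folds = k_fold_indices(n, k)
    best = None
    for q in q_values:
        for gamma in gamma_values:
            total = 0.0
            for f, test_idx in enumerate(folds):
                train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
                total += cv_error(q, gamma, train_idx, test_idx)
            score = total / n  # average held-out squared error per site
            if best is None or score < best[0]:
                best = (score, q, gamma)
    return best

def toy_cv_error(q, gamma, train_idx, test_idx):
    """Stand-in for a real fit: error surface minimised at q = 3, gamma = 2."""
    return len(test_idx) * ((q - 3) ** 2 + (gamma - 2) ** 2 + 1.0)

best_score, best_q, best_gamma = grid_search(
    n=50, q_values=range(1, 6), gamma_values=range(0, 6), cv_error=toy_cv_error)
```

In a real run `cv_error` would build the design matrix for $N^{(2q+1)}$, form the correlation matrix for the given $\gamma$, fit the Spatial Lasso on the training sites, and score it on the held-out sites.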

20 Average Squared Error for Experiment II
Figure: ASE for various estimation methods (Spatial(3), Spatial(5), Spatial(7), Spatial(9), Spatial(11), Lasso*, Spatial*, LS1, LS3, LS5) with sample size n = 1

21 Estimated Coefficients Using Spatial Lasso
Figure 6: (a) Image of true regression coefficients; (b)-(f) images of average estimated coefficients under $N^{(3)}, N^{(5)}, \ldots, N^{(11)}$, respectively

22 Application: Prediction of Soil Moisture Index
Select n = 1 sites via simple random sampling without replacement

23 Covariates for Soil Moisture Prediction
Aspect, hill shade, elevation, slope, precipitation
Figure: panels (a) to (e)

24 Covariates for Soil Moisture Prediction
Response: soil moisture index
Basic covariate layers: aspect, hill shade, elevation, slope, precipitation
    1 1 regular grid
Expanded covariate layers (14 total):
    average elevation, slope, precipitation on 3 × 3 blocks
    average elevation, slope, precipitation on 9 × 9 blocks
    aspect*elevation, aspect*slope, aspect*precipitation

25 Neighborhoods for Soil Moisture Prediction
Neighborhoods:
    use single pixel for elevation, 3 × 3 elevation, 9 × 9 elevation
    same for slope and precipitation (up to 9 × 9 neighborhood, but highly constrained parameters)
    for remaining variables, use $N^{(2q+1)} = \{0, \pm 1, \pm 2, \ldots, \pm(2q+1)\}^2$ with $q = 0, 1, \ldots, 5$

26 Spatial Lasso for Soil Moisture Prediction
Apply Spatial Lasso with exponential covariance structure
    grid search on $\gamma = 0, 1, \ldots, 5$, crossed with neighborhoods above
    36 possible combinations
Smallest ten-fold cross-validation value at $\gamma = 2$ and $q = 5$

27 True and Predicted Soil Moisture Index
Figure 8: panels (a) to (c)

28 Summary
Lasso approach to GIS model selection and estimation
    simultaneous layer selection, neighborhood selection, and estimation
    dominates OLS for unknown layers/neighborhoods
    extensible to neighborhood transformation
Spatial Lasso accounts for spatial smoothness within neighborhoods
    dominates Lasso if smoothness is present
Promising results in prediction of soil moisture index


Spatial Lasso with Applications to GIS Model Selection Hsin-Cheng Huang Institute of Statistical Science, Academia Sinica Nan-Jung Hsu National Tsing-Hua University David Theobald Colorado State University