Geostatistical Modeling for Large Data Sets: Low-rank Methods
Whitney Huang, Kelly-Ann Dixon Hamil, and Zizhuang Wu
Department of Statistics, Purdue University
February 22, 2016
Outline
- Motivation
- Low-rank methods
Gaussian process (GP) geostatistics

Model:
\[ Y(s) = \mu(s) + \eta(s) + \varepsilon(s), \quad s \in S \subseteq \mathbb{R}^d, \]
where
- $\mu(s) = E[Y(s)] = X(s)^T \beta$ (mean trend)
- $\{\eta(s)\}_{s \in S} \sim GP(0, C(\cdot, \cdot))$ (spatial process)
- $\varepsilon(s) \overset{iid}{\sim} N(0, \tau^2)$, $s \in S$ (measurement error)
Likelihood

When the process $Y(s)$ is observed at $n$ locations $s_1, \dots, s_n$, i.e., $Y = (Y(s_1), \dots, Y(s_n))^T$, the log-likelihood is
\[ \ell_n \propto -\frac{1}{2} \log |\Sigma_Y| - \frac{1}{2} (Y - X\beta)^T \Sigma_Y^{-1} (Y - X\beta), \]
where
\[ \Sigma_Y = \left[ \mathrm{Cov}\{Y(s_i), Y(s_j)\} \right]_{i,j=1,\dots,n} = \left[ C(s_i, s_j) + \tau^2 \mathbf{1}_{\{s_i = s_j\}} \right]_{i,j=1,\dots,n}. \]
Big $n$ problem in geostatistics

- Modern environmental instruments have produced a wealth of space-time data: $n$ is big.
- Evaluating the likelihood function involves factorizing large covariance matrices, which generally requires $O(n^3)$ operations and $O(n^2)$ memory.
- Modeling strategies are needed to handle large spatial data sets in:
  - parameter estimation (MLE, Bayesian)
  - spatial interpolation (kriging)
  - multivariate spatial data ($np \times np$ covariance) and spatio-temporal data ($nt \times nt$ covariance)
Low-rank approximation

Goal: approximate $Y = X^T \beta + \eta + \varepsilon$ with a lower-dimensional structure.

Hierarchical representation (assume zero mean):
\[ Y = \eta + \varepsilon, \quad \varepsilon \sim N_n(0, \Sigma_\varepsilon), \]
\[ \eta = AZ + \xi, \quad \xi \sim N_n(0, \Sigma_\xi), \]
\[ Z \sim N_r(0, \Sigma_Z), \quad Z = (Z_1, \dots, Z_r)^T, \quad r \ll n. \]
- $A$ is the $(n \times r)$ expansion matrix that maps $Z$ to $\eta$.
- $\Sigma_\varepsilon$ and $\Sigma_\xi$ are (usually assumed to be) diagonal.

At a prediction location $s_0$: $\eta_0 = A_0 Z + \xi_0$, i.e., $\eta(s_0) = a(s_0)^T Z + \xi(s_0) = \sum_{j=1}^r a_j(s_0) Z_j + \xi(s_0)$.
Computational savings using low-rank approximation

To carry out parameter estimation and spatial interpolation one needs to compute
\[ \left( A \Sigma_Z A^T + V \right)^{-1}, \quad \text{where } V = \Sigma_\varepsilon + \Sigma_\xi. \]
Sherman-Morrison-Woodbury formula:
\[ (A + BCD)^{-1} = A^{-1} - A^{-1} B \left( C^{-1} + D A^{-1} B \right)^{-1} D A^{-1}. \]
Applied to the low-rank approximation:
\[ \left( A \Sigma_Z A^T + V \right)^{-1} = V^{-1} - V^{-1} A \left( \Sigma_Z^{-1} + A^T V^{-1} A \right)^{-1} A^T V^{-1}, \]
which requires inverting only the diagonal matrix $V$ and an $r \times r$ matrix.
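The savings can be checked numerically. The following is a minimal numpy sketch (with arbitrary simulated $A$, $\Sigma_Z$, and diagonal $V$) verifying that the Woodbury identity reproduces the direct $n \times n$ inverse while only ever inverting an $r \times r$ matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 10

# Simulated low-rank covariance pieces: A Sigma_Z A^T + V, with V diagonal
A = rng.standard_normal((n, r))
Sigma_Z = np.diag(rng.uniform(0.5, 2.0, r))
v = rng.uniform(0.1, 0.3, n)          # diagonal of V = Sigma_eps + Sigma_xi

# Direct inverse: O(n^3) operations
direct = np.linalg.inv(A @ Sigma_Z @ A.T + np.diag(v))

# Woodbury inverse: only an r x r inverse, O(n r^2) overall
Vinv_A = A / v[:, None]               # V^{-1} A (cheap since V is diagonal)
small = np.linalg.inv(np.linalg.inv(Sigma_Z) + A.T @ Vinv_A)   # r x r
woodbury = np.diag(1.0 / v) - Vinv_A @ small @ Vinv_A.T

print(np.allclose(direct, woodbury))  # True
```

In practice one would never form the $n \times n$ `direct` inverse; it appears here only to validate the identity.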
Low-rank approximation: expansion matrix $A$
\[ \underbrace{\begin{pmatrix} \eta(s_1) \\ \eta(s_2) \\ \vdots \\ \eta(s_n) \end{pmatrix}}_{\eta} \approx \underbrace{\begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1r} \\ A_{21} & A_{22} & \cdots & A_{2r} \\ \vdots & & & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nr} \end{pmatrix}}_{A} \underbrace{\begin{pmatrix} Z_1 \\ Z_2 \\ \vdots \\ Z_r \end{pmatrix}}_{Z} \]

Questions: How do we choose $A$? How do we determine $Z = (Z_1, \dots, Z_r)^T$? A common approach is to choose $r$ knots $s_1^*, \dots, s_r^* \in S$ and let $Z = (Z(s_1^*), \dots, Z(s_r^*))^T$.
Choice of $A$

Orthogonal basis functions: e.g., Fourier, orthogonal polynomials, etc.
- Karhunen-Loève expansion: $\mathrm{Cov}(\eta(s_1), \eta(s_2)) = \sum_{j=1}^{\infty} \lambda_j \phi_j(s_1) \phi_j(s_2)$, where the eigenpairs $(\lambda_j, \phi_j)$ satisfy
\[ \int_S \mathrm{Cov}(\eta(s_1), \eta(s_2)) \, \phi_j(s_1) \, ds_1 = \lambda_j \phi_j(s_2). \]
- Empirical orthogonal functions (EOFs)

Nonorthogonal basis functions: represent the spatial process as a combination of kernel functions,
\[ \eta(s) = \sum_{j=1}^r a(s, s_j^*) Z_j + \xi(s). \]
Fixed Rank Kriging (Cressie & Johannesson '08)

Fixed-rank kriging shows how kriging (i.e., the spatial best linear unbiased predictor) can be applied under the low-rank parameterization of a spatial process.
- $A$: (multiresolutional) nonorthogonal basis functions, e.g., bisquare functions
\[ a(s)^T = \left( \{1 - (\|s - s_{1(l)}^*\| / \rho_{1(l)})^2\}_+^2, \; \dots, \; \{1 - (\|s - s_{r(l)}^*\| / \rho_{r(l)})^2\}_+^2 \right) \]
- $Z = (Z(s_1^*), \dots, Z(s_r^*))^T$, with $\Sigma_Z$ estimated from the data.

Fixed rank kriging is equivalent to the low-rank model
\[ Y(s) = X(s)^T \beta + \sum_{j=1}^r a(s, s_{j(l)}^*) Z_j + \xi(s) + \varepsilon(s). \]
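The bisquare basis above is simple to evaluate. A minimal one-dimensional sketch (with hypothetical knot locations and a common bandwidth $\rho$, standing in for one resolution level $l$):

```python
import numpy as np

def bisquare(s, knots, rho):
    """Bisquare basis {1 - (|s - s*_j| / rho)^2}_+^2, one column per knot."""
    d = np.abs(s[:, None] - knots[None, :])   # pairwise 1-D distances, (n x r)
    u = 1.0 - (d / rho) ** 2
    return np.where(u > 0, u, 0.0) ** 2       # compact support: zero beyond rho

# Hypothetical setup: 100 locations on [0, 10], 8 evenly spaced knots
s = np.linspace(0, 10, 100)
knots = np.linspace(0, 10, 8)
A = bisquare(s, knots, rho=2.5)               # the (n x r) expansion matrix
print(A.shape)  # (100, 8)
```

A multiresolutional $A$ would stack such blocks column-wise, one per resolution level, with coarser levels using wider $\rho$.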
Multiresolutional basis functions Figure: courtesy of Cressie & Johannesson 08
Nonstationary covariance function Figure: courtesy of Cressie & Johannesson 08
Gaussian Predictive Process (Banerjee et al. '08)
\[ Y(s) = X(s)^T \beta + \eta(s) + \varepsilon(s), \quad \eta \sim GP(0, C(\cdot, \cdot; \theta)). \]
Reasoning: given $Z$, we seek the linear combination $a(s)^T Z$ that minimizes
\[ \int_S E\left[ \eta(s) - a(s)^T Z \right]^2 ds. \]
In the predictive process one lets $Z = \eta^* = \{\eta(s_i^*)\}_{i=1}^p \sim N_p(0, C^*(\theta))$, where $C^*(\theta) = \left[ C(s_i^*, s_j^*; \theta) \right]_{i,j=1}^p$ and
\[ A = \left[ C(s_i, s_j^*; \theta) \right]_{\substack{i=1,\dots,n \\ j=1,\dots,p}} \left[ C^*(\theta) \right]^{-1}. \]
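The induced covariance of the predictive process is $A C^*(\theta) A^T = c(\cdot)^T C^{*-1} c(\cdot)$. A minimal one-dimensional sketch, assuming an exponential covariance kernel and hypothetical knot locations, illustrating the well-known property that the predictive process underestimates the marginal variance away from the knots:

```python
import numpy as np

def expcov(x, y, phi=1.0):
    """Exponential covariance C(s, s') = exp(-|s - s'| / phi) (assumed kernel)."""
    return np.exp(-np.abs(x[:, None] - y[None, :]) / phi)

rng = np.random.default_rng(2)
s = np.sort(rng.uniform(0, 10, 200))      # n observation locations
knots = np.linspace(0, 10, 15)            # p knots s*_1, ..., s*_p

C_star = expcov(knots, knots)             # C*(theta), p x p
C_cross = expcov(s, knots)                # [C(s_i, s*_j; theta)], n x p

# Predictive process covariance: C_cross C*^{-1} C_cross^T (rank p)
C_pp = C_cross @ np.linalg.solve(C_star, C_cross.T)

# Marginal variances are shrunk relative to C(s, s) = 1
full_var = np.ones(len(s))
print(np.all(np.diag(C_pp) <= full_var + 1e-10))  # True
```

This variance deflation motivated later corrections (e.g., adding a heteroskedastic $\xi$ term), which is one role of $\Sigma_\xi$ in the general hierarchical form.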
Covariance function approximation Figure: courtesy of Banerjee et al. '08
Covariance function approximation, cont'd Figure: courtesy of Banerjee et al. '08
Discussion
References
- S. Banerjee, A. E. Gelfand, A. O. Finley, and H. Sang. Gaussian Predictive Process Models for Large Spatial Data Sets. JRSSB, 70:825-848, 2008.
- N. A. C. Cressie and G. Johannesson. Fixed Rank Kriging for Very Large Spatial Data Sets. JRSSB, 70:209-226, 2008.
- C. K. Wikle. Low-Rank Representations for Spatial Processes. Chapter 8 of Handbook of Spatial Statistics, Chapman & Hall/CRC, 2010.