Estimation of spatiotemporal effects by the fused lasso for densely sampled spatial data using body condition data set from common minke whales
Mariko Yamamura 1, Hirokazu Yanagihara 2, Keisuke Fukui 3, Hiroko Solvang 4, Nils Øien 4, Tore Haug 4

1 Graduate School of Education, Hiroshima University; 2 Graduate School of Science, Hiroshima University; 3 Research & Development Center, Osaka Medical College; 4 Institute of Marine Research, Norway

1 Introduction

Samples evenly distributed over the whole population are not always available in real data analysis. As an example of spatial data, a data set from common minke whales in Norwegian waters provides body condition measurements together with the whaling locations (longitude and latitude). Although the whales are distributed throughout Norwegian waters, the whaling locations are almost the same every year, so samples are dense at particular locations. Solvang et al. (2017) and Yamamura et al. (2016) modeled the spatial effects in such densely sampled spatial data by polynomials, to keep the estimation results from being strongly influenced by the dense sampling. However, a polynomial is not flexible enough. For a more flexible fit, nonparametric smoothing with basis functions, such as the spline method, is available for estimating spatial effects. However, with densely sampled spatial data, nonparametric smoothing may not give precise estimates of the spatial effects in regions containing few samples. We therefore propose an estimation method for the spatial effect that is hardly affected by the dense samples. In the proposed method, the space to be analyzed is subdivided into
several subregions, and the spatial effect is estimated by the fused lasso, which combines the spatial effects of the subdivided spaces.

2 Estimation Method

2.1 Additive Model with Spatial Effect

Let $y_{ij}$ be a response variable of the $i$-th sample in the $j$-th space for $i = 1, \ldots, n_j$ and $j = 1, \ldots, m$, where $n = \sum_{j=1}^{m} n_j$ and $m$ are the sample size and the number of spaces, respectively, and let $x_{l,ij}$ be the $l$-th explanatory variable of the $i$-th sample in the $j$-th space for $i = 1, \ldots, n_j$, $j = 1, \ldots, m$ and $l = 1, \ldots, p$. The additive model with spatial effect is defined by

  $y_{ij} = \sum_{l=1}^{p} f_l(x_{l,ij}) + \mu_j + \varepsilon_{ij}, \quad (i = 1, \ldots, n_j;\ j = 1, \ldots, m),$

where $f_l$ is a function expressing the influence of the $l$-th explanatory variable, $\mu_j$ is the spatial effect of the $j$-th space, and $\varepsilon_{ij}$ is an error variable of the $i$-th sample in the $j$-th space. Here, we assume that $\varepsilon_{11}, \ldots, \varepsilon_{n_1 1}, \ldots, \varepsilon_{1m}, \ldots, \varepsilon_{n_m m}$ are independently and identically distributed according to a distribution with mean 0 and variance $\sigma^2$. We estimate the spatial effects $\mu_1, \ldots, \mu_m$ and the non-spatial effects $f_1, \ldots, f_p$ by the backfitting algorithm; see Hastie and Tibshirani (1990) for details. Each $f_l$ can be any function, but it should be one that fits the analyzed data properly. For the common minke whale data we estimate $f_l$ nonparametrically, because some explanatory variables showed non-linear shapes in the previous study of Yamamura et al. (2016). We assume that $f_1, \ldots, f_{p_1}$ are linear functions and $f_{p_1+1}, \ldots, f_p$ are nonlinear functions approximated by truncated cubic basis functions, i.e.,

  $\sum_{l=1}^{p} f_l(x_{l,ij}) = \sum_{l=1}^{p_1} \beta_l x_{l,ij} + \sum_{l=p_1+1}^{p} f_l(x_{l,ij}),$
where, for $l = p_1 + 1, \ldots, p$,

  $f_l(x_{l,ij}) = \beta_{l,1} x_{l,ij} + \beta_{l,2} x_{l,ij}^2 + \beta_{l,3} x_{l,ij}^3 + \sum_{g=1}^{b_0} \alpha_{l,g} (x_{l,ij} - \tau_{l,g})_+^3. \quad (1)$

Here, $\tau_{l,g}$ is the knot of the basis function given by the $100g/(b_0+1)\%$ $(g = 1, \ldots, b_0)$ point of the $l$-th explanatory variable, and $(x - \tau)_+^3 = I(x > \tau)(x - \tau)^3$ is the truncated cubic basis function, where $I(A)$ is the indicator function, i.e., $I(A) = 1$ if $A$ is true and $I(A) = 0$ if $A$ is false. Let $\beta_l = (\beta_{l,1}, \beta_{l,2}, \beta_{l,3})'$, $x_{l,ij} = (x_{l,ij}, x_{l,ij}^2, x_{l,ij}^3)'$, $\alpha_l = (\alpha_{l,1}, \ldots, \alpha_{l,b_0})'$ and $b_{l,ij} = ((x_{l,ij} - \tau_{l,1})_+^3, \ldots, (x_{l,ij} - \tau_{l,b_0})_+^3)'$. Then $f_l$ in (1) is written as $f_l(x_{l,ij}) = \beta_l' x_{l,ij} + \alpha_l' b_{l,ij}$. Let $y_j$ and $\varepsilon_j$ be the $n_j$-dimensional vectors obtained by stacking the response variables and error variables of the $j$-th space, respectively, i.e., $y_j = (y_{1j}, \ldots, y_{n_j j})'$ and $\varepsilon_j = (\varepsilon_{1j}, \ldots, \varepsilon_{n_j j})'$. Focusing on one space, the additive model for the $j$-th space is written as

  $y_j = X_j \beta + B_j \alpha + \mu_j 1_{n_j} + \varepsilon_j, \quad (j = 1, \ldots, m), \quad (2)$

where $1_{n_j}$ is the $n_j$-dimensional vector of ones, and the other vectors and matrices are as follows: $\beta = (\beta_1, \ldots, \beta_{p_1}, \beta_{p_1+1}', \ldots, \beta_p')'$ and $\alpha = (\alpha_{p_1+1}', \ldots, \alpha_p')'$; $X_j$ is the $n_j \times k$ matrix ($k = 3p - 2p_1$) whose $i$-th row is $(x_{1,ij}, \ldots, x_{p_1,ij}, x_{p_1+1,ij}', \ldots, x_{p,ij}')$; and $B_j$ is the $n_j \times b$ matrix ($b = b_0(p - p_1)$) whose $i$-th row is $(b_{p_1+1,ij}', \ldots, b_{p,ij}')$. Let $y$ and $\varepsilon$ be the $n$-dimensional vectors obtained by stacking the vectors of response variables and error variables over the spaces, respectively, i.e., $y = (y_1', \ldots, y_m')'$ and $\varepsilon = (\varepsilon_1', \ldots, \varepsilon_m')'$, and let $X$ and $B$ be the $n \times k$ and $n \times b$ matrices obtained by stacking the
matrices of explanatory variables and basis functions of each space, respectively, i.e., $X = (X_1', \ldots, X_m')'$ and $B = (B_1', \ldots, B_m')'$. For the whole space, the additive model in (2) is written as

  $y = X\beta + B\alpha + R\mu + \varepsilon,$

where $\mu = (\mu_1, \ldots, \mu_m)'$ and $R$ is the $n \times m$ matrix defined by

  $R = \begin{pmatrix} 1_{n_1} \otimes e_1' \\ \vdots \\ 1_{n_m} \otimes e_m' \end{pmatrix}.$

Here, $e_j$ is the $m$-dimensional vector whose $j$-th element is 1 while all the other elements are 0, and $\otimes$ denotes the Kronecker product of two matrices.

2.2 Estimation of $\alpha$ and $\beta$

We first explain the estimation method for $\alpha$ and $\beta$, even though $\mu$ also has to be estimated; here we regard the estimator of $\mu$ as given and denote it by $\hat{\mu}$. We use the penalized spline regression introduced by Yanagihara (2012, 2018) to estimate $\alpha$ and $\beta$. Yanagihara (2012) showed that choosing the smoothing parameters in the penalized smoothing spline is equivalent to choosing the ridge parameters in a generalized ridge regression that uses the matrix of transformed basis function values as the matrix of explanatory variables, and Yanagihara (2018) optimized the ridge parameters in generalized ridge regression by minimizing a model selection criterion, the generalized cross-validation (GCV) criterion. Hence, we estimate $\alpha$ by the penalized smoothing spline optimized by minimizing GCV. Let $Q$ be the $b \times b$ orthogonal matrix which diagonalizes $B'(I_n - P_X)B$, where $P_X = X(X'X)^{-1}X'$, such that

  $Q' B'(I_n - P_X) B Q = D = \mathrm{diag}(d_1, \ldots, d_b), \quad (d_1 \ge \cdots \ge d_b). \quad (3)$

By using $Q$ and $D$, we define the following $n \times b$ matrix:

  $H = (I_n - P_X) B Q D^{-1/2}. \quad (4)$
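As a numerical illustration of (3) and (4), the following sketch builds $Q$, $D$, and $H$ from generic matrices $X$ and $B$. The function name make_H and the synthetic data are our own illustrative assumptions, not part of the paper; the sketch also assumes $B'(I_n - P_X)B$ has full rank (truncating near-zero eigenvalues is the role of $\gamma$ in the text).

```python
import numpy as np

def make_H(X, B):
    """Diagonalize B'(I - P_X)B as in (3) and build H = (I - P_X) B Q D^{-1/2} as in (4).

    Returns Q, the eigenvalues d (in descending order), and H."""
    n = X.shape[0]
    P_X = X @ np.linalg.solve(X.T @ X, X.T)   # projection onto the column space of X
    M = np.eye(n) - P_X                       # I_n - P_X (symmetric, idempotent)
    S = B.T @ M @ B                           # B'(I_n - P_X)B
    d, Q = np.linalg.eigh(S)                  # eigh returns ascending eigenvalues
    d, Q = d[::-1], Q[:, ::-1]                # reorder so that d_1 >= ... >= d_b
    H = M @ B @ Q @ np.diag(d ** -0.5)
    return Q, d, H

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
B = rng.standard_normal((50, 5))
Q, d, H = make_H(X, B)
# By construction H'H = I_b and H'X = 0 (up to rounding), which is what
# makes the transformed-basis ridge formulation work.
print(np.allclose(H.T @ H, np.eye(5)), np.allclose(H.T @ X, 0))
```

The two printed checks follow directly from $M$ being a symmetric idempotent matrix with $MX = 0$.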
From $H$, a $b$-dimensional vector $z$ is defined by

  $z = (z_1, \ldots, z_b)' = H'(y - R\hat{\mu}). \quad (5)$

In the penalized spline, some singular values of the matrix of basis functions may become very small, in which case the estimates of $\alpha$ and $\beta$ become unstable. Removing the very small singular values eliminates this weakness of the estimation. Hence we consider using only $d_1, \ldots, d_\gamma$ $(\gamma = 1, \ldots, b)$, i.e., removing $d_{\gamma+1}, \ldots, d_b$, when estimating $\alpha$ and $\beta$. Let $Q_\gamma$ and $H_\gamma$ be the $b \times \gamma$ and $n \times \gamma$ matrices consisting of the 1st to the $\gamma$-th columns of $Q$ in (3) and $H$ in (4), respectively, and let $D_\gamma$ be the $\gamma \times \gamma$ diagonal matrix $D_\gamma = \mathrm{diag}(d_1, \ldots, d_\gamma)$. Moreover, let $t_{\gamma,1} \le \cdots \le t_{\gamma,\gamma}$ be the order statistics of $z_1^2, \ldots, z_\gamma^2$, where $z_j$ is given by (5). By using the order statistics, we define the following dispersion statistics $s_{\gamma,a}^2$ and regions $\pi_{\gamma,a}$:

  $s_{\gamma,a}^2 = \begin{cases} \dfrac{(y - R\hat{\mu})'(I_n - P_X - H_\gamma H_\gamma')(y - R\hat{\mu})}{n - k - \gamma} & (a = 0) \\[2mm] \dfrac{(n - k - \gamma)\, s_{\gamma,0}^2 + \sum_{j=1}^{a} t_{\gamma,j}}{n - k - \gamma + a} & (a = 1, \ldots, \gamma) \end{cases}$

  $\pi_{\gamma,a} = \begin{cases} (0, t_{\gamma,1}] & (a = 0) \\ (t_{\gamma,a}, t_{\gamma,a+1}] & (a = 1, \ldots, \gamma - 1) \\ (t_{\gamma,\gamma}, \infty) & (a = \gamma) \end{cases}$

Let $A_\gamma$ be the set of integers defined by $A_\gamma = \{a \in \{0, 1, \ldots, \gamma\} \mid s_{\gamma,a}^2 \in \pi_{\gamma,a}\}$. From Yanagihara (2018), we can see that $\#(A_\gamma) = 1$. Hence, we write the only element of the set $A_\gamma$ as $a_\gamma$. Let $V_\gamma$ be the $\gamma \times \gamma$ diagonal matrix

  $V_\gamma = \mathrm{diag}(\nu_{\gamma,1}, \ldots, \nu_{\gamma,\gamma}), \quad \nu_{\gamma,j} = I(z_j^2 > s_{\gamma,a_\gamma}^2)\left(1 - \frac{s_{\gamma,a_\gamma}^2}{z_j^2}\right) \quad (j = 1, \ldots, \gamma).$

From Yanagihara (2012, 2018), the estimates of $\alpha$ and $\beta$ based on $d_1, \ldots, d_\gamma$, after optimizing the smoothing parameters by GCV, are given by

  $\hat{\alpha}_\gamma = Q_\gamma V_\gamma D_\gamma^{-1/2} z_\gamma, \quad \hat{\beta}_\gamma = (X'X)^{-1} X'(y - R\hat{\mu} - B\hat{\alpha}_\gamma),$
respectively, where $z_\gamma = (z_1, \ldots, z_\gamma)'$. Since $\gamma$ itself still has to be optimized, we choose $\gamma$ by minimizing GCV:

  $\hat{\gamma} = \arg\min_{\gamma \in \{1, \ldots, b\}} \frac{(y - R\hat{\mu})'(I_n - P_X - H_\gamma V_\gamma H_\gamma')(y - R\hat{\mu})}{\{1 - (k + \gamma)/n\}^2}.$

Therefore, the final estimates of $\alpha$ and $\beta$ optimized by GCV are

  $\hat{\alpha} = \hat{\alpha}_{\hat{\gamma}} = Q_{\hat{\gamma}} V_{\hat{\gamma}} D_{\hat{\gamma}}^{-1/2} z_{\hat{\gamma}}, \quad \hat{\beta} = \hat{\beta}_{\hat{\gamma}} = (X'X)^{-1} X'(y - R\hat{\mu} - B\hat{\alpha}_{\hat{\gamma}}). \quad (6)$

2.3 Estimation of $\mu$

We now explain the estimation method for $\mu$; the estimators of $\alpha$ and $\beta$ are given as $\hat{\alpha}$ and $\hat{\beta}$ in (6). The penalized residual sum of squares ($\mathrm{PRSS}_\lambda$) for the adaptive fused lasso is given by

  $\mathrm{PRSS}_\lambda(\mu \mid \hat{f}) = \|y - X\hat{\beta} - B\hat{\alpha} - R\mu\|^2 + \lambda \sum_{j=1}^{m} \sum_{l \in D_j} w_{jl}\,|\mu_j - \mu_l|, \quad (7)$

where $\lambda$ is the non-negative regularization parameter, i.e., $\lambda \ge 0$. Here, $D_j$ is a set indicating the spaces adjacent to the $j$-th space; for example, $D_1 = \{2, 3, 5\}$ expresses that the 2nd, 3rd, and 5th spaces are adjacent to the 1st space. The number of elements of the set $D_j$ is denoted by $m_j$, i.e., $m_j = \#(D_j)$. The weight of the adaptive fused lasso is $w_{jl} = 1/|\tilde{\mu}_j - \tilde{\mu}_l|$, where $\tilde{\mu}_j$ is the $j$-th element of $(M'M)^{-1}M'y$ and $M = (R, X, B)$. The spatial effect $\mu$ is estimated by minimizing $\mathrm{PRSS}_\lambda$ as

  $\hat{\mu}_\lambda = \arg\min_{\mu \in \mathbb{R}^m} \mathrm{PRSS}_\lambda(\mu \mid \hat{f}).$

This minimization problem can be solved by the coordinate descent algorithm of Friedman et al. (2007). Suppose that all of the values $\mu_l$ other than $\mu_\gamma$ $(\gamma = 1, \ldots, m)$ are given. Then equation (7) can be expressed as a function of $\mu_\gamma$, say $\phi_\gamma(\mu_\gamma)$, such that

  $\mathrm{PRSS}_\lambda(\mu \mid \hat{f}) = \phi_\gamma(\mu_\gamma) + (\text{a term not depending on } \mu_\gamma).$
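To make (7) concrete, the sketch below evaluates the adaptive weights and the PRSS for a toy configuration of three spaces. The helper names (adaptive_weights, prss), the 0-based indexing, and the toy adjacency are our own illustrative assumptions.

```python
import numpy as np

def adaptive_weights(mu_tilde, D):
    """Adaptive fused lasso weights w_jl = 1/|mu~_j - mu~_l| from an initial fit.

    D maps each space j to the list of its adjacent spaces (0-based here)."""
    return {(j, l): 1.0 / abs(mu_tilde[j] - mu_tilde[l])
            for j in D for l in D[j]}

def prss(mu, resid_fixed, R, lam, w, D):
    """PRSS_lambda(mu | f-hat) = ||y - X b - B a - R mu||^2
       + lam * sum_j sum_{l in D_j} w_jl |mu_j - mu_l|,
    where resid_fixed stands for y - X b - B a."""
    fit = resid_fixed - R @ mu
    penalty = sum(w[j, l] * abs(mu[j] - mu[l]) for j in D for l in D[j])
    return float(fit @ fit) + lam * penalty

# toy example: 3 spaces, space 0 adjacent to spaces 1 and 2, 2 samples per space
D = {0: [1, 2], 1: [0], 2: [0]}
mu_tilde = np.array([1.0, 2.0, 4.0])
w = adaptive_weights(mu_tilde, D)
R = np.repeat(np.eye(3), [2, 2, 2], axis=0)
resid = R @ mu_tilde                    # pretend residuals, so the fit term is 0
print(prss(mu_tilde, resid, R, lam=0.5, w=w, D=D))  # -> 2.0 (0.5 * penalty of 4)
```

Note that because adjacency is symmetric, each pair of neighbors contributes two penalty terms, one from each direction.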
Let $t_{\gamma,1} \le \cdots \le t_{\gamma,m_\gamma}$ be the order statistics of the spatial effects of the spaces adjacent to the $\gamma$-th space, i.e., the order statistics of the sequence $\{\hat{\mu}_l\}_{l \in D_\gamma}$, and define the regions

  $\pi_{\gamma,a} = \begin{cases} (-\infty, t_{\gamma,1}] & (a = 0) \\ (t_{\gamma,a}, t_{\gamma,a+1}] & (a = 1, \ldots, m_\gamma - 1) \\ (t_{\gamma,m_\gamma}, \infty) & (a = m_\gamma) \end{cases}$

By using these regions and the set $D_\gamma$, we define

  $D_{\gamma,a} = \Big\{ l \in D_\gamma \ \Big|\ \hat{\mu}_l \in \bigcup_{h=0}^{a} \pi_{\gamma,h} \Big\}, \quad \nu_{\gamma,a} = u_\gamma + \frac{\lambda}{n_\gamma}\Big( \sum_{l \in D_\gamma \setminus D_{\gamma,a}} w_{\gamma l} - \sum_{l \in D_{\gamma,a}} w_{\gamma l} \Big),$

where $u_j$ is given by

  $u_j = \frac{1}{n_j}\, 1_{n_j}'(y_j - X_j\hat{\beta} - B_j\hat{\alpha}). \quad (8)$

Then $\phi_\gamma$ is given by a piecewise quadratic function,

  $\phi_\gamma(\mu_\gamma) = \phi_{\gamma,a}(\mu_\gamma) = n_\gamma(\mu_\gamma^2 - 2\nu_{\gamma,a}\mu_\gamma), \quad (\mu_\gamma \in \pi_{\gamma,a},\ a = 0, 1, \ldots, m_\gamma). \quad (9)$

Let $A_{\gamma,1}$ and $A_{\gamma,2}$ be the sets of integers defined by

  $A_{\gamma,1} = \{ a \in \{0, 1, \ldots, m_\gamma\} \mid \nu_{\gamma,a} \in \pi_{\gamma,a} \},$
  $A_{\gamma,2} = \{ a \in \{0, 1, \ldots, m_\gamma - 1\} \mid \nu_{\gamma,a} > t_{\gamma,a+1} \text{ and } \nu_{\gamma,a+1} \le t_{\gamma,a+1} \}.$

Notice that $\#(A_{\gamma,1}) = 1$ and $\#(A_{\gamma,2}) = 1$ if $A_{\gamma,1} \neq \emptyset$ and $A_{\gamma,2} \neq \emptyset$, respectively. Hence, we write the only elements of $A_{\gamma,1}$ and $A_{\gamma,2}$ as $a_{\gamma,1}$ and $a_{\gamma,2}$, respectively. Moreover, we can see that $\#(A_{\gamma,1} \cup A_{\gamma,2}) = 1$ when $A_{\gamma,1} \cup A_{\gamma,2} \neq \emptyset$. By using the above equations, we obtain the minimizer of $\phi_\gamma(\mu_\gamma)$ in (9) as

  $\hat{\mu}_\gamma = \arg\min_{\mu_\gamma \in \mathbb{R}} \phi_\gamma(\mu_\gamma) = \begin{cases} \nu_{\gamma,a_{\gamma,1}} & (A_{\gamma,1} \neq \emptyset) \\ t_{\gamma,a_{\gamma,2}+1} & (A_{\gamma,1} = \emptyset,\ A_{\gamma,2} \neq \emptyset) \end{cases}$
The regularization parameter $\lambda$ is optimized by minimizing GCV. Let us define

  $\bar{\mu} = \frac{1}{n}\, 1_m' R'(y - X\hat{\beta} - B\hat{\alpha}).$

By using this and $u_j$ in (8), we define

  $\lambda_{\max} = \max_{j \in \{1, \ldots, m\}} \frac{|u_j - \bar{\mu}|}{m_j / n_j}.$

Moreover, we prepare a set of candidate values $\Lambda = \{\lambda_1, \ldots, \lambda_{100}\}$, where $\lambda_j = \lambda_{\max}(0.75)^{100-j}$. The optimization of $\lambda$ is performed by minimizing GCV as

  $\hat{\lambda} = \arg\min_{\lambda \in \Lambda} \frac{\|y - X\hat{\beta} - B\hat{\alpha} - R\hat{\mu}_\lambda\|^2}{(1 - \mathrm{df}_\lambda/n)^2},$

where $\mathrm{df}_\lambda$ is the number of non-zero elements of $\hat{\mu}_\lambda$.

3 Data

Over the study period, body condition data were obtained from common minke whales taken in Norwegian scientific and commercial whaling operations in the Northeast Atlantic during the months April to September. The data are basically the same as those used in Solvang et al. (2017) and Yamamura et al. (2016), but more recent samples have been added here. Immediately after death, the whales were taken onboard and hauled across the foredeck of the boat. Total body length was measured in a straight line from the tip of the upper jaw to the apex of the tail fluke notch; girth was measured right behind the flipper; and blubber thickness was measured at three sites (Fig. 1): dorsally behind the blowhole (BT1), dorsally behind the dorsal fin (BT2), and laterally just above the center of the flipper (BT3). Blubber measurements were made perpendicular from the skin surface to the muscle-connective tissue interface. Length and girth measurements were made to the nearest centimeter, while blubber measurements were made to the nearest millimeter. For all whales, the year, month, day, latitude, and longitude were recorded. After removing data from certain periods and data with missing values, the final number of individuals included
in the analysis is 11,505. We use $y_{ij}$ as BT1, $x_{1,ij}$ as sex, $x_{2,ij}$ as year, $x_{3,ij}$ as calendar day, $x_{4,ij}$ as body length, $\mu_j$ as the spatial effect, and $\varepsilon_{ij}$ as the error term of the $i$-th sample $(i = 1, \ldots, n_j)$ at the $j$-th space $(j = 1, \ldots, m)$.

Figure 1: Measurement sites.

The study region covers the geographic distribution of the five International Whaling Commission (IWC) management areas: ES (Svalbard-Bear Island area), EB (Eastern Barents Sea), EW (Norwegian Sea and coastal zones off North Norway, including the Lofoten area), EN (North Sea), and CM (Western Norwegian Sea-Jan Mayen area). We subdivide each area so that each subdivided area contains about 300 samples, and estimate $\hat{\mu}$ for the subdivided areas. If the $\hat{\mu}$ of a subdivided area is equal to that of a neighboring area, the two areas are united.

4 Estimation Result

In the fused lasso estimation result, the subdivided areas are narrowed down to 11 spaces, and the whales have the thickest BT1 in the northernmost space. Further detailed results are presented at the seminar.

References

[1] Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Annals of Applied Statistics, 1.

[2] Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman & Hall Ltd., London, New York.

[3] Solvang, H. K., Yanagihara, H., Øien, N. and Haug, T. (2017). Temporal and geographical variation in body condition of common minke whales (Balaenoptera acutorostrata acutorostrata) in the northeast Atlantic. Polar Biology, 40.
[4] Yamamura, M., Yanagihara, H., Solvang, H. K., Øien, N. and Haug, T. (2016). Canonical correlation analysis for geographical and chronological responses. Procedia Computer Science, 96.

[5] Yanagihara, H. (2012). A non-iterative optimization method for smoothness in penalized spline regression. Statistics and Computing, 22.

[6] Yanagihara, H. (2018). Explicit solution to the minimization problem of generalized cross-validation criterion for selecting ridge parameters in generalized ridge regression. Hiroshima Mathematical Journal, 48.
A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical
More informationNow consider the case where E(Y) = µ = Xβ and V (Y) = σ 2 G, where G is diagonal, but unknown.
Weighting We have seen that if E(Y) = Xβ and V (Y) = σ 2 G, where G is known, the model can be rewritten as a linear model. This is known as generalized least squares or, if G is diagonal, with trace(g)
More informationAdditive Terms. Flexible Regression and Smoothing. Mikis Stasinopoulos 1 Bob Rigby 1
1 Flexible Regression and Smoothing Mikis Stasinopoulos 1 Bob Rigby 1 1 STORM, London Metropolitan University XXV SIMPOSIO INTERNACIONAL DE ESTADêSTICA, Armenia, Colombia, August 2015 2 Outline 1 Linear
More informationThe Algebra of the Kronecker Product. Consider the matrix equation Y = AXB where
21 : CHAPTER Seemingly-Unrelated Regressions The Algebra of the Kronecker Product Consider the matrix equation Y = AXB where Y =[y kl ]; k =1,,r,l =1,,s, (1) X =[x ij ]; i =1,,m,j =1,,n, A=[a ki ]; k =1,,r,i=1,,m,
More informationAnalysis Methods for Supersaturated Design: Some Comparisons
Journal of Data Science 1(2003), 249-260 Analysis Methods for Supersaturated Design: Some Comparisons Runze Li 1 and Dennis K. J. Lin 2 The Pennsylvania State University Abstract: Supersaturated designs
More informationarxiv: v3 [stat.ml] 14 Apr 2016
arxiv:1307.0048v3 [stat.ml] 14 Apr 2016 Simple one-pass algorithm for penalized linear regression with cross-validation on MapReduce Kun Yang April 15, 2016 Abstract In this paper, we propose a one-pass
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationUsing P-splines to smooth two-dimensional Poisson data
1 Using P-splines to smooth two-dimensional Poisson data Maria Durbán 1, Iain Currie 2, Paul Eilers 3 17th IWSM, July 2002. 1 Dept. Statistics and Econometrics, Universidad Carlos III de Madrid, Spain.
More informationSpatially Adaptive Smoothing Splines
Spatially Adaptive Smoothing Splines Paul Speckman University of Missouri-Columbia speckman@statmissouriedu September 11, 23 Banff 9/7/3 Ordinary Simple Spline Smoothing Observe y i = f(t i ) + ε i, =
More informationComputational Physics
Interpolation, Extrapolation & Polynomial Approximation Lectures based on course notes by Pablo Laguna and Kostas Kokkotas revamped by Deirdre Shoemaker Spring 2014 Introduction In many cases, a function
More informationA General Framework for Variable Selection in Linear Mixed Models with Applications to Genetic Studies with Structured Populations
A General Framework for Variable Selection in Linear Mixed Models with Applications to Genetic Studies with Structured Populations Joint work with Karim Oualkacha (UQÀM), Yi Yang (McGill), Celia Greenwood
More information20.1. Balanced One-Way Classification Cell means parametrization: ε 1. ε I. + ˆɛ 2 ij =
20. ONE-WAY ANALYSIS OF VARIANCE 1 20.1. Balanced One-Way Classification Cell means parametrization: Y ij = µ i + ε ij, i = 1,..., I; j = 1,..., J, ε ij N(0, σ 2 ), In matrix form, Y = Xβ + ε, or 1 Y J
More informationLocal Polynomial Modelling and Its Applications
Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,
More informationDirect Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina
Direct Learning: Linear Classification Logistic regression models for classification problem We consider two class problem: Y {0, 1}. The Bayes rule for the classification is I(P(Y = 1 X = x) > 1/2) so
More informationLeast Angle Regression, Forward Stagewise and the Lasso
January 2005 Rob Tibshirani, Stanford 1 Least Angle Regression, Forward Stagewise and the Lasso Brad Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani Stanford University Annals of Statistics,
More informationGradient Descent. Ryan Tibshirani Convex Optimization /36-725
Gradient Descent Ryan Tibshirani Convex Optimization 10-725/36-725 Last time: canonical convex programs Linear program (LP): takes the form min x subject to c T x Gx h Ax = b Quadratic program (QP): like
More informationCompressed Sensing in Cancer Biology? (A Work in Progress)
Compressed Sensing in Cancer Biology? (A Work in Progress) M. Vidyasagar FRS Cecil & Ida Green Chair The University of Texas at Dallas M.Vidyasagar@utdallas.edu www.utdallas.edu/ m.vidyasagar University
More informationSimultaneous Coefficient Penalization and Model Selection in Geographically Weighted Regression: The Geographically Weighted Lasso
Simultaneous Coefficient Penalization and Model Selection in Geographically Weighted Regression: The Geographically Weighted Lasso by David C. Wheeler Technical Report 07-08 October 2007 Department of
More informationLinear Regression Linear Regression with Shrinkage
Linear Regression Linear Regression ith Shrinkage Introduction Regression means predicting a continuous (usually scalar) output y from a vector of continuous inputs (features) x. Example: Predicting vehicle
More informationA note on the group lasso and a sparse group lasso
A note on the group lasso and a sparse group lasso arxiv:1001.0736v1 [math.st] 5 Jan 2010 Jerome Friedman Trevor Hastie and Robert Tibshirani January 5, 2010 Abstract We consider the group lasso penalty
More informationORIE 4741: Learning with Big Messy Data. Regularization
ORIE 4741: Learning with Big Messy Data Regularization Professor Udell Operations Research and Information Engineering Cornell October 26, 2017 1 / 24 Regularized empirical risk minimization choose model
More informationModel Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model
Model Selection Tutorial 2: Problems With Using AIC to Select a Subset of Exposures in a Regression Model Centre for Molecular, Environmental, Genetic & Analytic (MEGA) Epidemiology School of Population
More informationPackage NonpModelCheck
Type Package Package NonpModelCheck April 27, 2017 Title Model Checking and Variable Selection in Nonparametric Regression Version 3.0 Date 2017-04-27 Author Adriano Zanin Zambom Maintainer Adriano Zanin
More informationConvex Optimization / Homework 1, due September 19
Convex Optimization 1-725/36-725 Homework 1, due September 19 Instructions: You must complete Problems 1 3 and either Problem 4 or Problem 5 (your choice between the two). When you submit the homework,
More informationSupport Vector Machines for Classification: A Statistical Portrait
Support Vector Machines for Classification: A Statistical Portrait Yoonkyung Lee Department of Statistics The Ohio State University May 27, 2011 The Spring Conference of Korean Statistical Society KAIST,
More informationOn the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models
On the Behavior of Marginal and Conditional Akaike Information Criteria in Linear Mixed Models Thomas Kneib Institute of Statistics and Econometrics Georg-August-University Göttingen Department of Statistics
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationFunctional Mixed Effects Spectral Analysis
Joint with Robert Krafty and Martica Hall June 4, 2014 Outline Introduction Motivating example Brief review Functional mixed effects spectral analysis Estimation Procedure Application Remarks Introduction
More informationRecap from previous lecture
Recap from previous lecture Learning is using past experience to improve future performance. Different types of learning: supervised unsupervised reinforcement active online... For a machine, experience
More informationThreshold Autoregressions and NonLinear Autoregressions
Threshold Autoregressions and NonLinear Autoregressions Original Presentation: Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Threshold Regression 1 / 47 Threshold Models
More informationRobust estimators for additive models using backfitting
Robust estimators for additive models using backfitting Graciela Boente Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires and CONICET, Argentina Alejandra Martínez Facultad de Ciencias
More informationFunctional SVD for Big Data
Functional SVD for Big Data Pan Chao April 23, 2014 Pan Chao Functional SVD for Big Data April 23, 2014 1 / 24 Outline 1 One-Way Functional SVD a) Interpretation b) Robustness c) CV/GCV 2 Two-Way Problem
More information