Supplemental materials to Reduced Rank Mixed Effects Models for Spatially Correlated Hierarchical Functional Data


Lan Zhou, Jianhua Z. Huang, Josue G. Martinez, Arnab Maity, Veerabhadran Baladandayuthapani and Raymond J. Carroll

1 Outline

Section 2 details the computational methods for implementing the proposed methodology, including the steps of the EM algorithm and techniques for avoiding the storage and inversion of large matrices. Section 3 provides additional simulation results. Section 4 contains residual plots after model fitting for the real data example.

2 Computational methods

2.1 Computation of the Conditional Moments in the E-step of the EM Algorithm

In the following, the conditional expectations are calculated given the current parameter values; for simplicity of presentation, the dependence on the current parameter values is suppressed in our notation. To calculate the conditional moments, we use standard results for multivariate normal distributions. The covariances between $\alpha_{ab}$, $\beta_{ab}$ and $Y_{ab}$ are $\mathrm{cov}(\alpha_{ab}, Y_{ab}) = D_{\alpha,a}\Gamma_\xi^T B_{ab}^T$ and $\mathrm{cov}(\beta_{ab}, Y_{ab}) = V_{ab}\Gamma_{\eta,ab}^T B_{ab}^T$. Since $\mathrm{cov}(\alpha_{ab}, \beta_{ab}) = 0$, we have

$$\mathrm{cov}(Y_{ab}) = B_{ab}\Gamma_\xi D_{\alpha,a}\Gamma_\xi^T B_{ab}^T + B_{ab}\Gamma_{\eta,ab} V_{ab}\Gamma_{\eta,ab}^T B_{ab}^T + \sigma^2 I_{ab},$$

where $I_{ab}$ is the $N_{ab} \times N_{ab}$ identity matrix. The conditional distribution of $(\alpha_{ab}, \beta_{ab})$ given $Y_{ab}$ is normal and is denoted as

$$\begin{pmatrix}\alpha_{ab}\\ \beta_{ab}\end{pmatrix} \Big|\, Y_{ab} \sim N\left\{\begin{pmatrix} m_{\alpha,ab}\\ m_{\beta,ab}\end{pmatrix},\ \Sigma_{ab} = \begin{pmatrix}\Sigma_{\alpha\alpha,ab} & \Sigma_{\alpha\beta,ab}\\ \Sigma_{\alpha\beta,ab}^T & \Sigma_{\beta\beta,ab}\end{pmatrix}\right\}. \tag{1}$$

The conditional means are

$$m_{\alpha,ab} = E(\alpha_{ab} \mid Y_{ab}) = D_{\alpha,a}\Gamma_\xi^T B_{ab}^T\{\mathrm{cov}(Y_{ab})\}^{-1}(Y_{ab} - B_{ab}\gamma_{\mu,a}) \tag{2}$$

and

$$m_{\beta,ab} = E(\beta_{ab} \mid Y_{ab}) = V_{ab}\Gamma_{\eta,ab}^T B_{ab}^T\{\mathrm{cov}(Y_{ab})\}^{-1}(Y_{ab} - B_{ab}\gamma_{\mu,a}). \tag{3}$$
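For small units, equations (2) and (3) can be evaluated literally. The following is a minimal NumPy sketch, not the authors' code: the function `conditional_means` and its argument names are hypothetical, with `B` standing for $B_{ab}$, `Gamma_eta` for $\Gamma_{\eta,ab}$, `D` for $D_{\alpha,a}$, `V` for $V_{ab}$ and `sigma2` for $\sigma^2$. It forms the full $N_{ab} \times N_{ab}$ matrix $\mathrm{cov}(Y_{ab})$, which is exactly the cost the rest of this section is designed to avoid, so it is mainly useful as a reference for checking faster implementations.

```python
import numpy as np


def conditional_means(Y, B, gamma_mu, Gamma_xi, Gamma_eta, D, V, sigma2):
    """Direct evaluation of (2) and (3) for a single unit (a, b).

    Hypothetical helper: it forms cov(Y_ab) explicitly, so it is only
    sensible when N_ab is small.
    """
    E = B @ Gamma_xi                       # B_ab Gamma_xi
    F = B @ Gamma_eta                      # B_ab Gamma_{eta,ab}
    S = E @ D @ E.T + F @ V @ F.T + sigma2 * np.eye(len(Y))   # cov(Y_ab)
    w = np.linalg.solve(S, Y - B @ gamma_mu)  # cov(Y_ab)^{-1} (Y_ab - B_ab gamma_{mu,a})
    m_alpha = D @ E.T @ w                  # equation (2)
    m_beta = V @ F.T @ w                   # equation (3)
    return m_alpha, m_beta
```

As a check, the stacked result $(m_{\alpha,ab}^T, m_{\beta,ab}^T)^T$ should agree with the information form $(\Sigma_0^{-1} + G^TG/\sigma^2)^{-1}G^T(Y_{ab} - B_{ab}\gamma_{\mu,a})/\sigma^2$, where $G = (B_{ab}\Gamma_\xi, B_{ab}\Gamma_{\eta,ab})$ and $\Sigma_0 = \mathrm{diag}(D_{\alpha,a}, V_{ab})$; the two are equal by identity (9).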

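The conditional covariance matrix is

$$\mathrm{cov}\left\{\begin{pmatrix}\alpha_{ab}\\ \beta_{ab}\end{pmatrix} \Big|\, Y_{ab}\right\} = \begin{pmatrix} D_{\alpha,a} & 0\\ 0 & V_{ab}\end{pmatrix} - \begin{pmatrix} D_{\alpha,a}\Gamma_\xi^T B_{ab}^T\\ V_{ab}\Gamma_{\eta,ab}^T B_{ab}^T\end{pmatrix}\{\mathrm{cov}(Y_{ab})\}^{-1}\left(B_{ab}\Gamma_\xi D_{\alpha,a},\ B_{ab}\Gamma_{\eta,ab} V_{ab}\right).$$

Therefore, we have

$$\Sigma_{\alpha\alpha,ab} = D_{\alpha,a} - D_{\alpha,a}\Gamma_\xi^T B_{ab}^T\{\mathrm{cov}(Y_{ab})\}^{-1} B_{ab}\Gamma_\xi D_{\alpha,a}, \tag{4}$$

$$\Sigma_{\beta\beta,ab} = V_{ab} - V_{ab}\Gamma_{\eta,ab}^T B_{ab}^T\{\mathrm{cov}(Y_{ab})\}^{-1} B_{ab}\Gamma_{\eta,ab} V_{ab}, \tag{5}$$

and

$$\Sigma_{\alpha\beta,ab} = -D_{\alpha,a}\Gamma_\xi^T B_{ab}^T\{\mathrm{cov}(Y_{ab})\}^{-1} B_{ab}\Gamma_{\eta,ab} V_{ab}. \tag{6}$$

The predictions required by the EM algorithm are

$$\begin{aligned}
\hat\alpha_{ab} &= E(\alpha_{ab} \mid Y_{ab}) = m_{\alpha,ab}, \qquad \hat\beta_{ab} = E(\beta_{ab} \mid Y_{ab}) = m_{\beta,ab},\\
\widehat{\alpha_{ab}\alpha_{ab}^T} &= E(\alpha_{ab}\alpha_{ab}^T \mid Y_{ab}) = \hat\alpha_{ab}\hat\alpha_{ab}^T + \Sigma_{\alpha\alpha,ab},\\
\widehat{\beta_{ab}\beta_{ab}^T} &= E(\beta_{ab}\beta_{ab}^T \mid Y_{ab}) = \hat\beta_{ab}\hat\beta_{ab}^T + \Sigma_{\beta\beta,ab},\\
\widehat{\alpha_{ab}\beta_{ab}^T} &= E(\alpha_{ab}\beta_{ab}^T \mid Y_{ab}) = \hat\alpha_{ab}\hat\beta_{ab}^T + \Sigma_{\alpha\beta,ab}.
\end{aligned} \tag{7}$$

The first two equalities also give the best linear unbiased predictors (BLUPs) of the random effects $\alpha_{ab}$ and $\beta_{ab}$.

Computation of $V_{ab}$. It is convenient to regroup the elements of $\beta_{ab}$ according to the principal components. Denote $\beta_{ab,j} = (\beta_{ab1j}, \ldots, \beta_{abC_{ab}j})^T$, $j = 1, \ldots, K_\eta$, and $\tilde\beta_{ab} = (\beta_{ab,1}^T, \ldots, \beta_{ab,K_\eta}^T)^T$. Let $P_{abj}$ be the $C_{ab} \times C_{ab}$ matrix with elements $P_{abj,cc'} = \rho(\|x_{abc} - x_{abc'}\|; \theta_{aj})$, $c, c' = 1, \ldots, C_{ab}$. Note that $P_{abj} = P_{abj}^T$. Then $\Sigma_{abj} = \mathrm{cov}(\beta_{ab,j}) = \sigma^2_{\beta,aj} P_{abj}$. Since $\mathrm{cov}(\beta_{ab,j}, \beta_{ab,j'}) = 0$ for $j \ne j'$, the covariance matrix of $\tilde\beta_{ab}$ is block diagonal: $\tilde V_{ab} = \mathrm{cov}(\tilde\beta_{ab}) = \mathrm{diag}(\Sigma_{ab1}, \ldots, \Sigma_{abK_\eta})$. Because $\tilde\beta_{ab}$ is just a reordering of $\beta_{ab}$, we can write $\beta_{ab} = O\tilde\beta_{ab}$ for a permutation matrix $O$. It follows that the covariance matrix of $\beta_{ab}$ is $V_{ab} = O\tilde V_{ab}O^T$.

Circumventing inversion of $\mathrm{cov}(Y_{ab})$. We suppress the subscripts of $D_{\alpha,a}$ and $V_{ab}$ in the following discussion. Denote $E = B_{ab}\Gamma_\xi$, $F = B_{ab}\Gamma_{\eta,ab}$, and $S = \mathrm{cov}(Y_{ab}) = EDE^T + FVF^T + \sigma^2 I$. To calculate the conditional moments given in (2)–(6), we need to compute $DE^TS^{-1}$ in (2), $VF^TS^{-1}$ in (3), $D - DE^TS^{-1}ED$ in (4), $V - VF^TS^{-1}FV$ in (5), and $DE^TS^{-1}FV$ in (6). Denote $J = (EDE^T + \sigma^2 I)^{-1}$ and $K = (FVF^T + \sigma^2 I)^{-1}$.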
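The block-diagonal construction of $V_{ab}$ can be made concrete with a short sketch. It assumes (our assumption, not stated in the text) that $\beta_{ab}$ stacks subunits first, $\beta_{ab} = (\beta_{ab1}^T, \ldots, \beta_{abC_{ab}}^T)^T$ with $\beta_{abc} = (\beta_{abc1}, \ldots, \beta_{abcK_\eta})^T$; the function `build_V` and the correlation callables passed to it are hypothetical. It builds $\tilde V_{ab}$ with `scipy.linalg.block_diag` and then applies an explicit permutation matrix $O$ with $\beta_{ab} = O\tilde\beta_{ab}$.

```python
import numpy as np
from scipy.linalg import block_diag


def build_V(x, sigma2_beta, corr):
    """Assemble V_ab = O Vtilde_ab O^T for one unit (a, b).

    x           : (C,) or (C, d) array of subunit locations x_abc
    sigma2_beta : length-K sequence of variances sigma^2_{beta,aj}
    corr        : length-K list of callables, corr[j](dist) -> rho(dist; theta_aj)
    Assumed ordering: entry (c, j) of beta_ab is at position c*K + j.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    C, K = x.shape[0], len(sigma2_beta)
    dist = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    # Vtilde = diag(Sigma_ab1, ..., Sigma_abK) with Sigma_abj = sigma^2_{beta,aj} P_abj
    Vtilde = block_diag(*[sigma2_beta[j] * corr[j](dist) for j in range(K)])
    # Permutation: beta_ab[c*K + j] = betatilde_ab[j*C + c]
    O = np.zeros((C * K, C * K))
    for c in range(C):
        for j in range(K):
            O[c * K + j, j * C + c] = 1.0
    return O @ Vtilde @ O.T
```

Under this ordering, $V_{ab} = \sum_j \sigma^2_{\beta,aj} P_{abj} \otimes e_je_j^T$, where $e_j$ is the $j$th unit vector of length $K_\eta$, which gives a simple way to test the permutation. In practice one would index directly rather than multiply by a dense $O$.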
Repeatedly using the identities

$$(A^{-1} + C^TB^{-1}C)^{-1} = A - AC^T(CAC^T + B)^{-1}CA, \tag{8}$$

$$(A^{-1} + C^TB^{-1}C)^{-1}C^TB^{-1} = AC^T(CAC^T + B)^{-1}, \tag{9}$$

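we obtain

$$J = \sigma^{-2}I - \sigma^{-2}E(E^TE + \sigma^2D^{-1})^{-1}E^T, \tag{10}$$

$$K = \sigma^{-2}I - \sigma^{-2}F(F^TF + \sigma^2V^{-1})^{-1}F^T, \tag{11}$$

and

$$\begin{aligned}
DE^TS^{-1} &= (D^{-1} + E^TKE)^{-1}E^TK,\\
VF^TS^{-1} &= (V^{-1} + F^TJF)^{-1}F^TJ,\\
D - DE^TS^{-1}ED &= (D^{-1} + E^TKE)^{-1},\\
V - VF^TS^{-1}FV &= (V^{-1} + F^TJF)^{-1},\\
DE^TS^{-1}FV &= DE^TJF(V^{-1} + F^TJF)^{-1}.
\end{aligned}$$

The matrices that need to be inverted here have the same size as $D$ or $V$, that is, $K_\xi \times K_\xi$ or $(C_{ab}K_\eta) \times (C_{ab}K_\eta)$, which is much smaller than the $N_{ab} \times N_{ab}$ matrix $\mathrm{cov}(Y_{ab})$.

2.2 Updating Formulas in the M-step of the EM Algorithm

In the updating formulas given below, the parameters appearing on the right-hand side of the equations are all fixed at their values from the previous iteration of the algorithm.

1. Update the estimate of $\sigma^2$. Let $N = \sum_{a=1}^A\sum_{b=1}^{B_a} N_{ab}$. The updating formula is
$$\sigma^2 = \frac{1}{N}\sum_{a=1}^A\sum_{b=1}^{B_a} E(\epsilon_{ab}^T\epsilon_{ab} \mid Y_{ab}),$$
where $\epsilon_{ab} = Y_{ab} - B_{ab}\gamma_{\mu,a} - B_{ab}\Gamma_\xi\alpha_{ab} - B_{ab}\Gamma_{\eta,ab}\beta_{ab}$. Using (7), $E(\epsilon_{ab}^T\epsilon_{ab} \mid Y_{ab})$ can be expressed as
$$\begin{aligned}
&(Y_{ab} - B_{ab}\gamma_{\mu,a} - B_{ab}\Gamma_\xi\hat\alpha_{ab} - B_{ab}\Gamma_{\eta,ab}\hat\beta_{ab})^T(Y_{ab} - B_{ab}\gamma_{\mu,a} - B_{ab}\Gamma_\xi\hat\alpha_{ab} - B_{ab}\Gamma_{\eta,ab}\hat\beta_{ab})\\
&\quad + \mathrm{trace}\left(B_{ab}\Gamma_\xi\Sigma_{\alpha\alpha,ab}\Gamma_\xi^TB_{ab}^T + B_{ab}\Gamma_{\eta,ab}\Sigma_{\beta\beta,ab}\Gamma_{\eta,ab}^TB_{ab}^T + 2B_{ab}\Gamma_\xi\Sigma_{\alpha\beta,ab}\Gamma_{\eta,ab}^TB_{ab}^T\right).
\end{aligned}$$

2. Update $D_{\alpha,a} = \mathrm{diag}(\sigma^2_{\alpha,aj})$ and $D_{\beta,a} = \mathrm{diag}(\sigma^2_{\beta,aj})$. The updating formula for $D_{\alpha,a}$ is
$$D_{\alpha,a} = \frac{1}{B_a}\mathrm{diag}\left\{\sum_{b=1}^{B_a}\widehat{\alpha_{ab}\alpha_{ab}^T}\right\}. \tag{12}$$
To update $D_{\beta,a}$, we need to minimize the conditional expectation of
$$\sum_{b=1}^{B_a}\left\{\log|V_{ab}| + \beta_{ab}^TV_{ab}^{-1}\beta_{ab}\right\} = \sum_{b=1}^{B_a}\sum_{j=1}^{K_\eta}\left\{C_{ab}\log(\sigma^2_{\beta,aj}) + \log|P_{abj}| + \sigma^{-2}_{\beta,aj}\beta_{ab,j}^TP_{abj}^{-1}\beta_{ab,j}\right\}.$$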

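Thus the updating formulas are
$$\sigma^2_{\beta,aj} = \frac{1}{\sum_{b=1}^{B_a}C_{ab}}\,\mathrm{trace}\left(\sum_{b=1}^{B_a}P_{abj}^{-1}\widehat{\beta_{ab,j}\beta_{ab,j}^T}\right), \qquad j = 1, \ldots, K_\eta. \tag{13}$$
If the variances of the random effects do not depend on the treatment level $a$, we simply average over $a$ in (12) and (13).

3. Update $\gamma_{\mu,a}$, $a = 1, \ldots, A$. The updating formulas are
$$\gamma_{\mu,a} = \left\{\sum_{b=1}^{B_a}B_{ab}^TB_{ab} + \sigma^2\lambda_\mu\int b''(t)b''(t)^T\,dt\right\}^{-1}\sum_{b=1}^{B_a}B_{ab}^T\left(Y_{ab} - B_{ab}\Gamma_\xi\hat\alpha_{ab} - B_{ab}\Gamma_{\eta,ab}\hat\beta_{ab}\right).$$

4. Update the columns of $\Gamma_\xi$ sequentially. For $j = 1, \ldots, K_\xi$, the updating formula is
$$\gamma_{\xi,j} = \left\{\sum_{a=1}^A\sum_{b=1}^{B_a}\widehat{\alpha_{abj}^2}\,B_{ab}^TB_{ab} + \sigma^2\lambda_\xi\int b''(t)b''(t)^T\,dt\right\}^{-1}\sum_{a=1}^A\sum_{b=1}^{B_a}B_{ab}^T\left\{(Y_{ab} - B_{ab}\gamma_{\mu,a})\hat\alpha_{abj} - B_{ab}\sum_{l\ne j}\gamma_{\xi,l}\widehat{\alpha_{abj}\alpha_{abl}} - B_{ab}\Gamma_{\eta,ab}\widehat{\alpha_{abj}\beta_{ab}}\right\}.$$

5. Update the columns of $\Gamma_\eta$ sequentially. For $j = 1, \ldots, K_\eta$, the updating formula is
$$\gamma_{\eta,j} = \left\{\sum_{a=1}^A\sum_{b=1}^{B_a}\sum_{c=1}^{C_{ab}}\widehat{\beta_{abcj}^2}\,B_{abc}^TB_{abc} + \sigma^2\lambda_\eta\int b''(t)b''(t)^T\,dt\right\}^{-1}\sum_{a=1}^A\sum_{b=1}^{B_a}\sum_{c=1}^{C_{ab}}B_{abc}^T\left\{(Y_{abc} - B_{abc}\gamma_{\mu,a})\hat\beta_{abcj} - B_{abc}\Gamma_\xi\widehat{\beta_{abcj}\alpha_{ab}} - B_{abc}\sum_{l\ne j}\gamma_{\eta,l}\widehat{\beta_{abcj}\beta_{abcl}}\right\}.$$

6. Orthogonalize $\Gamma_\xi$ and $\Gamma_\eta$. The matrices $\Gamma_\xi$ and $\Gamma_\eta$ obtained in Steps 4 and 5 need not have orthonormal columns, so we orthogonalize them in this step. Compute the QR decomposition $\Gamma_\xi = Q_\xi R_\xi$, where $Q_\xi$ has orthonormal columns, and set $\Gamma_\xi \leftarrow Q_\xi$. Orthogonalize $\Gamma_\eta$ similarly.

7. Update the correlation parameters $\theta_{aj}$. Given the current estimates of the other parameters, for each $a$ and $j$ we minimize the conditional expectation of
$$\sum_{b=1}^{B_a}\left\{\log|P_{abj}| + \sigma^{-2}_{\beta,aj}\beta_{ab,j}^TP_{abj}^{-1}\beta_{ab,j}\right\},$$
which is
$$\sum_{b=1}^{B_a}\left\{\log|P_{abj}| + \sigma^{-2}_{\beta,aj}\,\mathrm{trace}\left(P_{abj}^{-1}\widehat{\beta_{ab,j}\beta_{ab,j}^T}\right)\right\}. \tag{14}$$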

The minimizer can be found by a gradient-based method. Denote the components of $\theta_{aj}$ by $\theta_{ajk}$. The gradient with respect to $\theta_{aj}$ is a vector with elements
$$\sum_{b=1}^{B_a}\left\{\mathrm{trace}\left(P_{abj}^{-1}\frac{\partial P_{abj}}{\partial\theta_{ajk}}\right) - \sigma^{-2}_{\beta,aj}\,\mathrm{trace}\left(P_{abj}^{-1}\frac{\partial P_{abj}}{\partial\theta_{ajk}}P_{abj}^{-1}\widehat{\beta_{ab,j}\beta_{ab,j}^T}\right)\right\}.$$
Alternatively, a direct search algorithm that does not require the gradient, such as the downhill simplex method of Nelder and Mead (1965), can be applied to minimize (14). If the correlation parameters do not depend on $j$, then for each $a$ we minimize a new objective function obtained by summing (14) over $j$. If the correlation parameters do not depend on $a$, we instead sum (14) over $a$ for each $j$ to define the new objective function.

Note that the orthogonalization in Step 6 is used to ensure identifiability. This approach was adapted from Zhou et al. (2008) and worked well in our simulation study and real data analysis. An alternative approach is direct optimization within a restricted space (Peng and Paul, 2009), but its implementation is beyond the scope of this paper.

2.3 Computing the Observed Data Log Likelihood

When we do cross-validation, we need to compute the log likelihood of the observed data, which depends on the determinant and inverse of the possibly very large matrix $S$ defined in Section 2.1. This section gives a method for computing $|S|$ and $S^{-1}$ without constructing the matrix $S$. Using the identity
$$|A + BC^T| = |A|\,|I + C^TA^{-1}B|, \tag{15}$$
we obtain
$$|S| = |\sigma^2I + EDE^T + FVF^T| = |\sigma^2I + EDE^T|\,\left|I + V^{1/2}F^T(\sigma^2I + EDE^T)^{-1}FV^{1/2}\right|.$$
The two factors on the right-hand side of the above equation can be computed as follows. Using the identity (15), we have
$$|\sigma^2I + EDE^T| = \sigma^{2N_{ab}}\,|I + \sigma^{-2}D^{1/2}E^TED^{1/2}| = \sigma^{2N_{ab}}\,|D|\,|D^{-1} + \sigma^{-2}E^TE|,$$
where $N_{ab}$ is the length of $Y_{ab}$. Using $(\sigma^2I + EDE^T)^{-1} = \sigma^{-2}I - \sigma^{-2}E(E^TE + \sigma^2D^{-1})^{-1}E^T$, we obtain
$$\left|I + V^{1/2}F^T(\sigma^2I + EDE^T)^{-1}FV^{1/2}\right| = |V|\,\left|V^{-1} + \sigma^{-2}F^TF - \sigma^{-2}F^TE(E^TE + \sigma^2D^{-1})^{-1}E^TF\right|.$$
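The small-matrix formulas of Section 2.1 and the determinant factorization above can be checked together numerically. This is a minimal NumPy sketch, not the authors' implementation: the function name `small_matrix_quantities` is hypothetical, `E`, `F`, `D`, `V` and `sigma2` are the quantities of Section 2.1 for one unit, and, for readability, the $N_{ab} \times N_{ab}$ matrices $J$ and $K$ of (10) and (11) are still formed explicitly, although every matrix that is inverted is only $K_\xi \times K_\xi$ or $(C_{ab}K_\eta) \times (C_{ab}K_\eta)$.

```python
import numpy as np


def small_matrix_quantities(E, F, D, V, sigma2):
    """E-step blocks of Section 2.1 and log|S| of Section 2.3 for one unit.

    S = E D E^T + F V F^T + sigma2 * I is never formed or inverted. For
    readability the N x N matrices J and K of (10) and (11) are formed, but
    every matrix that is inverted is K_xi x K_xi or (C K_eta) x (C K_eta).
    """
    inv = np.linalg.inv
    N = E.shape[0]
    I = np.eye(N)
    Dinv, Vinv = inv(D), inv(V)
    EtE, EtF, FtF = E.T @ E, E.T @ F, F.T @ F
    G = inv(EtE + sigma2 * Dinv)                           # (E^T E + sigma2 D^{-1})^{-1}
    J = (I - E @ G @ E.T) / sigma2                         # (10)
    K = (I - F @ inv(FtF + sigma2 * Vinv) @ F.T) / sigma2  # (11)
    FtJF = (FtF - EtF.T @ G @ EtF) / sigma2                # F^T J F from small products only
    Saa = inv(Dinv + E.T @ K @ E)                          # D - D E^T S^{-1} E D, eq. (4)
    Sbb = inv(Vinv + FtJF)                                 # V - V F^T S^{-1} F V, eq. (5)

    def logdet(M):
        return np.linalg.slogdet(M)[1]                     # all matrices here are positive definite

    return {
        "DEtSinv": Saa @ E.T @ K,                          # D E^T S^{-1}, used in (2)
        "VFtSinv": Sbb @ F.T @ J,                          # V F^T S^{-1}, used in (3)
        "Sigma_aa": Saa,
        "Sigma_bb": Sbb,
        "Sigma_ab": -D @ E.T @ J @ F @ Sbb,                # eq. (6)
        # |S| = sigma2^N |D| |D^{-1} + E^T E / sigma2| * |V| |V^{-1} + F^T J F|
        "logdetS": (N * np.log(sigma2) + logdet(D) + logdet(Dinv + EtE / sigma2)
                    + logdet(V) + logdet(Vinv + FtJF)),
    }
```

Comparing each entry with the dense computation based on `np.linalg.inv(S)` and `np.linalg.slogdet(S)` on a small random problem is a convenient unit test; for realistic $N_{ab}$ one would avoid forming $J$ and $K$ as well, by applying (10) and (11) to vectors.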

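Using the definition of $J$ in Section 2.1 and the identity (8), we have
$$S^{-1} = (J^{-1} + FVF^T)^{-1} = J - JF(F^TJF + V^{-1})^{-1}F^TJ,$$
where $J$ can be computed using (10).

2.4 Calculating the partial derivatives of $P_{abj}$

Consider the Matérn family of autocorrelation functions
$$\rho(d; \phi, \nu) = \frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{2d\nu^{1/2}}{\phi}\right)^{\nu}K_\nu\left(\frac{2d\nu^{1/2}}{\phi}\right), \qquad \phi > 0,\ \nu > 0,$$
where $K_\nu(\cdot)$ is the modified Bessel function of the second kind of order $\nu$, which for non-integer $\nu$ takes the form
$$K_\nu(u) = \frac{\pi}{2\sin(\nu\pi)}\{I_{-\nu}(u) - I_\nu(u)\}, \qquad I_\nu(u) = \sum_{m=0}^\infty\frac{(\frac12u)^{2m+\nu}}{m!\,\Gamma(m+\nu+1)}$$
(integer orders are obtained as limits). Denote $u = 2x\nu^{1/2}/\phi$; then the partial derivatives of $\rho(x; \phi, \nu)$ are
$$\frac{\partial\rho(x;\phi,\nu)}{\partial\phi} = \frac{1}{\phi\,\Gamma(\nu)}\left(\frac u2\right)^{\nu}\left[u\{K_{\nu-1}(u) + K_{\nu+1}(u)\} - 2\nu K_\nu(u)\right]$$
and
$$\begin{aligned}
\frac{\partial\rho(x;\phi,\nu)}{\partial\nu} = {}& \left\{-\psi(\nu) - \pi\cot(\nu\pi) + \log\left(\frac u2\right) + \frac12\right\}\rho(x;\phi,\nu)\\
&+ \frac{\pi}{\Gamma(\nu)\sin(\nu\pi)}\left(\frac u2\right)^{\nu}\Bigg[\frac{u}{2\nu}\{I_{-\nu+1}(u) - I_{\nu+1}(u)\} - \frac12\{I_{-\nu}(u) + I_\nu(u)\} - \log\left(\frac u2\right)\{I_{-\nu}(u) + I_\nu(u)\}\\
&\qquad + \sum_{m=0}^\infty\frac{(\frac12u)^{2m+\nu}\,\psi(m+\nu+1)}{m!\,\Gamma(m+\nu+1)} + \sum_{m=0}^\infty\frac{(\frac12u)^{2m-\nu}\,\psi(m-\nu+1)}{m!\,\Gamma(m-\nu+1)}\Bigg],
\end{aligned}$$
where
$$\psi(z) = -C + \sum_{l=0}^\infty\left(\frac{1}{1+l} - \frac{1}{z+l}\right), \qquad C = \lim_{k\to\infty}\left\{\sum_{l=1}^k\frac1l - \ln(k)\right\} \approx 0.5772,$$
and $\Gamma(\cdot)$ is the Gamma function.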
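The $\phi$-derivative above is easy to verify numerically. The following sketch uses `scipy.special.kv` (modified Bessel function of the second kind) and `scipy.special.gamma`; the function names `matern` and `dmatern_dphi` are hypothetical, and only the $\phi$-derivative is coded. The $\nu$-derivative could be checked against finite differences in the same way.

```python
import numpy as np
from scipy.special import gamma, kv


def matern(d, phi, nu):
    """Matern correlation rho(d; phi, nu) in the parameterization above,
    with u = 2 d nu^{1/2} / phi; rho is set to 1 at d = 0 (its limit)."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    u = 2.0 * d * np.sqrt(nu) / phi
    rho = np.ones_like(u)
    pos = u > 0
    rho[pos] = 2.0 ** (1.0 - nu) / gamma(nu) * u[pos] ** nu * kv(nu, u[pos])
    return rho


def dmatern_dphi(d, phi, nu):
    """d rho / d phi = (u/2)^nu [u {K_{nu-1}(u) + K_{nu+1}(u)} - 2 nu K_nu(u)] / (phi Gamma(nu)),
    valid for d > 0."""
    d = np.atleast_1d(np.asarray(d, dtype=float))
    u = 2.0 * d * np.sqrt(nu) / phi
    bracket = u * (kv(nu - 1.0, u) + kv(nu + 1.0, u)) - 2.0 * nu * kv(nu, u)
    return (u / 2.0) ** nu * bracket / (phi * gamma(nu))
```

For example, with $\nu = 1/2$ the function reduces to $\exp(-u)$, and a central finite difference in $\phi$ should match `dmatern_dphi`. If $\theta_{aj} = (\phi, \nu)$, applying `dmatern_dphi` to the off-diagonal distances $\|x_{abc} - x_{abc'}\|$ (the diagonal of $P_{abj}$ is identically 1, so its derivative is 0) gives $\partial P_{abj}/\partial\phi$ for the gradient in Step 7.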

3 Additional simulation results

Two other setups from the Bayesian hierarchical model in Baladandayuthapani et al. (2008) were also considered, but the software for the Bayesian method encountered serious numerical problems and could run on only a small proportion of the simulated data sets. We ran the simulation 100 times for each setup. The results of our method are based on three unit-level and subunit-level principal components. Our method ran successfully on all data sets.

Setup 6. Same as Setup 5 except that $\Sigma_{21} = \Sigma_{22} = \Sigma_{31} = \Sigma_{32} = \ldots$. For this setup, the Bayesian method ran on only 10 of the 100 simulated data sets.

Setup 7. Same as Setup 6 except that $\sigma_2^2 = \sigma_3^2 = 0.4$. For this setup, the Bayesian method ran on only 3 of the 100 simulated data sets.

We used the measures defined at the beginning of Section 5 of the paper to assess and compare the performance of the two methods. For our reduced rank method, the measures were computed using all 100 data sets as well as using only those data sets on which the Bayesian method ran. The results are summarized in Table 1. Our reduced rank method is comparable to the Bayesian method for estimating the mean functions and predicting the random effects.

Next we report simulation results on parameter estimation for the setups that follow our model. A comparison with the Bayesian method is not available, since the two methods use different parameters. Table 2 shows summary statistics of the correlation parameter estimates for the first three simulation setups, in which the data were generated from our model. The parameter estimates are approximately unbiased. Figure 1 presents the pointwise sample mean of the estimated treatment effects and of the unit-level and subunit-level PCs, along with 90% pointwise coverage intervals, for simulation Setup 2. It is not surprising that the subunit-level PCs are estimated better than the unit-level PCs, since there are more subunit-level data than unit-level data.

Table 1: Comparison of the two methods based on 100 simulation runs. Mean (SD) of the integrated absolute errors of estimating the mean functions and predicting the unit-level and subunit-level random effects. Numbers shown are the actual numbers multiplied by 10. "Reduced rank" refers to our method; "Bayesian" refers to the Bayesian method of Baladandayuthapani et al. (2008). "100 sims", "10 sims" and "3 sims" indicate the number of data sets used in the calculation.

Setup  Method                    Mean         Unit          Subunit
6      Reduced rank (100 sims)   … (1.45)     … (1.088)     … (0.481)
6      Reduced rank (10 sims)    … (0.963)    … (1.044)     … (0.305)
6      Bayesian (10 sims)        … (0.963)    6.55 (0.899)  … (0.93)
7      Reduced rank (100 sims)   … (1.515)    … (1.187)     … (1.064)
7      Reduced rank (3 sims)     … (1.34)     7.86 (0.890)  … (0.845)
7      Bayesian (3 sims)         … (1.344)    7.84 (0.99)   … (0.567)

Table 2: Sample mean (SD) of correlation parameter estimates, based on 100 simulation runs. Columns: PC 1 (φ = 8, ν = 0.1) and PC 2 (φ = 4, ν = …); rows: Setups 1–3. Entries: (4.98) (0.00) (5.67) (0.018) 4.6 (1.306) (0.040) (.369) (0.019) (1.19) (0.036).

4 Colon Carcinogenesis Data

Figures 2 and 3 contain, respectively, residual plots after fitting our reduced rank model and the Bayesian model of Baladandayuthapani et al. (2008). The plots indicate that our model fits the data better: after fitting the Bayesian model, obvious patterns remain in the residual plots for treatment groups FO+B and FO−B.

References

Baladandayuthapani, V., Mallick, B., Turner, N., Hong, M., Chapkin, R., Lupton, J. and Carroll, R. J. (2008). Bayesian hierarchical spatially correlated functional data analysis with application to colon carcinogenesis. Biometrics 64.

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. Computer Journal 7.

Peng, J. and Paul, D. (2009). A geometric approach to maximum likelihood estimation of the functional principal components from sparse longitudinal data. Journal of Computational and Graphical Statistics, to appear.

Zhou, L., Huang, J. Z. and Carroll, R. J. (2008). Joint modelling of paired sparse functional data using principal components. Biometrika 95.

Figure 1: Simulation Setup 2. The pointwise sample mean of the estimated treatment effects and of the unit-level and subunit-level PCs, along with 90% pointwise coverage intervals. [Panels: treatment groups (Group 1, Group 2), unit-level PCs and subunit-level PCs; horizontal axis: t.]

Figure 2: Residual plots after fitting the reduced rank model. Separate plots are drawn for the four diet groups: CO is Corn Oil, FO is Fish Oil, and +B and −B denote with and without Butyrate supplement. [Panels: Diet CO+B, Diet CO−B, Diet FO+B, Diet FO−B; axes: Fitted, Residual.]

Figure 3: Residual plots after fitting the Bayesian model of Baladandayuthapani et al. (2008). Separate plots are drawn for the four diet groups: CO is Corn Oil, FO is Fish Oil, and +B and −B denote with and without Butyrate supplement. [Panels: Diet CO+B, Diet CO−B, Diet FO+B, Diet FO−B; axes: Fitted, Residual.]
