Fast iterative implementation of large-scale nonlinear geostatistical inverse modeling

WATER RESOURCES RESEARCH, VOL. 50, 198–207, doi:10.1002/2012wr013241, 2014

Xiaoyi Liu,1 Quanlin Zhou,1 Peter K. Kitanidis,2 and Jens T. Birkholzer1

1 Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA.
2 Department of Civil and Environmental Engineering, Stanford University, Stanford, California, USA.

Corresponding author: X. Liu, Earth Sciences Division, Lawrence Berkeley National Laboratory, One Cyclotron Rd., Berkeley, CA 94720, USA. (XiaoyiLiu@lbl.gov)

Received 6 November 2012; revised 3 December 2013; accepted 5 December 2013; published 8 January 2014.

©2013. American Geophysical Union. All Rights Reserved. 0043-1397/14/10.1002/2012WR013241

[1] In nonlinear geostatistical inverse problems, forming the linear geostatistical inversion systems by linearizing the forward model often incurs a significant computational cost. More specifically, the storage cost associated with the sensitivity matrix $H$ ($m \times n$, where $m$ and $n$ are the numbers of measurements and unknowns, respectively) is high, especially when both $m$ and $n$ are large, as in 3-D tomography problems. In this research, instead of explicitly forming and directly solving the linear geostatistical inversion system, we use MINRES, a Krylov subspace method, to solve it iteratively. During each iteration in MINRES, we only compute the products $Hx$ and $H^{T}x$ for any appropriately sized vectors $x$, for which we solve the forward problem twice. As a result, we reduce the memory requirement from $O(mn)$ to $O(m) + O(n)$. This iterative methodology is combined with the Bayesian inverse method in Kitanidis (1996) to solve large-scale inversion problems. The computational advantages of our methodology are demonstrated using a large-scale 3-D numerical hydraulic tomography problem with transient pressure measurements (250,000 unknowns and 100,000 measurements). In this case, 200 GB of memory would otherwise be required to fully compute and store the sensitivity matrix $H$ at each Newton step during optimization. The CPU cost can also be significantly reduced in terms of the total number of forward simulations. In the end, we discuss potential extension of the methodology to other geostatistical methods such as the Successive Linear Estimator.

Citation: Liu, X., Q. Zhou, P. K. Kitanidis, and J. T. Birkholzer (2014), Fast iterative implementation of large-scale nonlinear geostatistical inverse modeling, Water Resour. Res., 50, 198–207, doi:10.1002/2012wr013241.

1. Introduction

[2] Geostatistical inverse modeling has been widely used to characterize subsurface heterogeneity that is important for accurate prediction of subsurface fluid dynamics. It often involves a set of criteria to select the best parameters. For example, in Kriging, Cokriging, and the Successive Linear Estimator (SLE), the criteria are the unbiasedness and variance of the estimator, which are equivalent to the Bayesian posterior likelihood criterion under certain conditions. Indeed, using Bayes rule with Gaussian assumptions, the posterior distribution of the parameters $p$ has the following form:

$$p(p \mid \beta, y^{*}) \propto \exp\left[-\frac{1}{2}(p - X\beta)^{T} Q^{-1} (p - X\beta)\right] \exp\left[-\frac{1}{2}\left(y^{*} - y(p)\right)^{T} R^{-1} \left(y^{*} - y(p)\right)\right], \tag{1}$$

where $p$ $[n \times 1]$ contains the parameters to be estimated on a grid of $n$ cells; $\beta$ $[d \times 1]$ contains the drift coefficients; $Q$ $[n \times n]$ is the prior covariance matrix of the parameters; $y^{*}$ $[m \times 1]$ contains the measurements; $X$ $[n \times d]$ is a structure matrix; $y = Mh(p)$, where $M$ $[m \times nn_t]$ is the measurement operator; $h$ $[nn_t \times 1]$ is the forward model, which takes an $n \times 1$ vector as input and returns the $nn_t \times 1$ vector of heads at $n_t$ time steps, and often involves the solution of a PDE; and $R$ $[m \times m]$ is the covariance matrix of the measurement error.

[3] Following the process proposed in Kitanidis [1995], we eliminate the drift coefficients $\beta$ (assumed to have a flat prior) from the objective function by integrating them out from (1), and obtain

$$p(p \mid y^{*}) \propto \exp\left[-\frac{1}{2} p^{T} G p\right] \exp\left[-\frac{1}{2}\left(y^{*} - y(p)\right)^{T} R^{-1} \left(y^{*} - y(p)\right)\right], \tag{2}$$

where $G = Q^{-1} - Q^{-1}X\left(X^{T}Q^{-1}X\right)^{-1}X^{T}Q^{-1}$.

[4] With the Gauss-Newton method, the maximization of (2) can be done efficiently following the approach proposed in Kitanidis and Vomvoris [1983], Hoeksema and Kitanidis [1984], and Kitanidis [1995]. In this approach, the nonlinear forward model $h(p)$ in (2) is linearized and the parameters are updated successively until convergence:

$$\begin{bmatrix} W & U \\ U^{T} & 0 \end{bmatrix} \begin{bmatrix} \xi \\ \hat{\beta} \end{bmatrix} = \begin{bmatrix} y^{*} - y(p_i) + Hp_i \\ 0 \end{bmatrix} \tag{3}$$

and

$$p_{i+1} = X\hat{\beta} + QH^{T}\xi, \tag{4}$$

where $W = HQH^{T} + R$; $U = HX$; $H = MJ$ is the sensitivity matrix of $y$; $J$ $[nn_t \times n]$ is the sensitivity matrix of $h$ evaluated at $p_i$ (i.e., $J_{i,j} = \partial h_i / \partial p_j$); $p_{i+1}$ is the parameter estimate after the next Newton step; and $i$ is an index for the Gauss-Newton iteration. The beauty of this method is that it reduces the dimension of the parameter estimation problem from $n$, the number of unknowns, to $m$, the number of observations. In fact, as Kitanidis [1998] points out, equation (4) represents the solution of the linearized inverse problem as a linear combination of $m + 1$ spline functions defined by $X$ and $QH^{T}$. For simplicity, we denote the coefficient matrix in equation (3) as

$$K = \begin{bmatrix} W & U \\ U^{T} & 0 \end{bmatrix}.$$

[5] In most hydrological applications, $m$ is a small number and $m \ll n$; thus the system in equation (3) is usually solved with direct methods (elimination, decomposition, etc.) without much effort once formed. However, forming the matrices $W$ and $U$ for large problems is problematic because both $Q$ and $H$ are large and dense. For instance, with $m = 10^{5}$ and $n = 2.5 \times 10^{5}$, the memory requirement for $H$ (200 GB) and $Q$ (500 GB) is tremendous. The computational cost associated with $Q$ is eased by using a regular grid, on which $Q$ is a symmetric Toeplitz matrix: all of its information is stored in its first row, and its product with any vector can be computed with complexity $O(n \log n)$ [Nowak et al., 2003]. Nonetheless, the memory requirement for $H$ is still high in applications such as hydraulic tomography, an emerging subsurface imaging technology in which a large volume of pressure measurements (large $m$) is collected in multilevel wells and samplers.
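The storage figures quoted in paragraph [5] follow from simple arithmetic. The sketch below assumes 8-byte double-precision entries and decimal gigabytes ($10^{9}$ bytes); the helper name `dense_storage_gb` is hypothetical and only serves this illustration.

```python
def dense_storage_gb(rows, cols, bytes_per_entry=8):
    """Storage of a dense rows x cols matrix in decimal gigabytes (assumed convention)."""
    return rows * cols * bytes_per_entry / 1e9

m, n = 100_000, 250_000                    # measurements and unknowns from paragraph [5]

h_gb = dense_storage_gb(m, n)              # dense sensitivity matrix H (m x n)
q_gb = dense_storage_gb(n, n)              # dense prior covariance Q (n x n)
q_first_row_gb = dense_storage_gb(1, n)    # Toeplitz Q kept as its first row only

print(f"H: {h_gb:.0f} GB, dense Q: {q_gb:.0f} GB, Toeplitz first row: {q_first_row_gb * 1e3:.0f} MB")
```

Storing only the first row of a Toeplitz $Q$ removes the second figure entirely, which is why the remaining bottleneck discussed in the text is $H$.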
[6] In this paper, we propose the use of Krylov subspace methods to solve equation (3) iteratively, so that $H$ does not need to be explicitly computed and stored; only the products $Hx$ and $H^{T}x$ for any vector $x$ (assumed to be sized appropriately for each multiplication) are needed. Furthermore, computing $Hx$ or $H^{T}x$ needs only one run of the forward model $h$. This extra loop of iteration, combined with the use of a Toeplitz prior generalized covariance matrix $Q$, minimizes the memory requirement of finding the Newton steps for nonlinear geostatistical inverse problems without increasing the CPU cost when the number of iterations to solve equation (3) is relatively small. We note that similar strategies have been developed for geophysical inverse problems in Mackie and Madden [1993], Newman and Alumbaugh [1997], Haber et al. [2000], and Avdeev [2005]. However, their use in geostatistical inverse modeling has not been developed, especially for dense inversion systems.

[7] The rest of this paper is arranged as follows. We first develop the methodology for the use of Krylov subspace methods to solve the geostatistical inversion system in equation (3) (section 2). We start from the development of adjoint equations to compute $Hx$ and $H^{T}x$ for transient measurements in groundwater flow (subsections 2.1 and 2.2). We then discuss preconditioning of the linearized geostatistical inversion system (subsection 2.3) and assessment of computational efficiency (subsection 2.4). In section 3, we use two synthetic examples of 3-D transient hydraulic tomography to demonstrate the efficiency of the proposed methodology in solving large-scale inverse problems. In section 4, we discuss the generation of conditional realizations and the potential extension of the proposed methodology to other geostatistical inverse methods such as the SLE. Section 5 contains conclusions.

2. Methodology

[8] Krylov subspace methods such as GMRES, MINRES, and Conjugate Gradient have been widely used to solve linear equation systems [Saad, 2003]. For a general linear system $Ax = b$, at the $i$th iteration such a method finds an approximate solution $x_i$ from the affine space $x_0 + \mathcal{K}_i$, where $x_0$ is the initial guess; $\mathcal{K}_i = \mathrm{span}\{r_0, Ar_0, A^{2}r_0, \ldots, A^{i-1}r_0\}$ is the Krylov subspace; and $r_0 = b - Ax_0$ is the initial residual. Among the many Krylov subspace methods available, we use MINRES [Paige and Saunders, 1975] to solve equation (3) in the applications of this paper because MINRES is suitable for symmetric, indefinite linear systems such as equation (3). MINRES solves a sequence of least-squares problems $\arg\min_{x_i \in x_0 + \mathcal{K}_i} \|b - Ax_i\|$ to find $x_i$, and convergence can be claimed when $\|b - Ax_i\| / \|b\| \le \sigma$, where $\sigma$ is a user-given tolerance for the residual. Another reason why we choose MINRES is that this algorithm is well implemented in the MATLAB function minres. We choose $\sigma = 10^{-5}$ for all simulations of this paper, which is sufficiently small to provide accurate solutions of equation (3). For more explanation of these methods, readers are referred to Paige and Saunders [1975] and Saad [2003]. Note that the inversion algorithm used in this manuscript is the same as that of Kitanidis [1995], except that we use Krylov subspace methods to solve the linearized geostatistical inversion system (3) at each Newton step during maximization of (2). To use minres, we only need a function that returns the product $Kx$ for an appropriately sized arbitrary vector $x$ and a preconditioner to speed up convergence. Computing $Kx$ for an arbitrary vector $x$ can be broken down into two parts. The first part is the multiplication of an arbitrary vector $x$ with the covariance matrix $Q$, which is done in this paper with the Fast Fourier Transform (FFT) following Nowak et al. [2003].
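The FFT-based product with $Q$ can be sketched for the simplest case of a 1-D regular grid, where $Q$ is symmetric Toeplitz. The sketch is written in Python rather than MATLAB and uses circulant embedding with a small recursive radix-2 FFT; it is not the implementation of Nowak et al. [2003], and a 3-D grid would need the multilevel (block-Toeplitz) analogue. The names `fft` and `toeplitz_matvec` and the exponential covariance values are illustrative assumptions.

```python
import cmath
import math

def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def toeplitz_matvec(first_row, x):
    """Product of a symmetric Toeplitz matrix (given by its first row) with x."""
    n = len(x)
    size = 1
    while size < 2 * n:                     # pad to a power of two >= 2n
        size *= 2
    # First column of the embedding circulant: [t0..t_{n-1}, zeros, t_{n-1}..t1]
    c = list(first_row) + [0.0] * (size - 2 * n + 1) + list(first_row[:0:-1])
    xp = list(x) + [0.0] * (size - n)
    spectrum = [a * b for a, b in zip(fft(c), fft(xp))]
    prod = fft(spectrum, invert=True)       # circular convolution, still unnormalized
    return [prod[i].real / size for i in range(n)]

# Toy exponential covariance on 5 equally spaced cells (illustrative values)
first_row = [math.exp(-k / 2.0) for k in range(5)]
x = [1.0, -2.0, 0.5, 3.0, -1.0]

fast = toeplitz_matvec(first_row, x)
dense = [sum(first_row[abs(i - j)] * x[j] for j in range(5)) for i in range(5)]
print(max(abs(f - d) for f, d in zip(fast, dense)))
```

Only the first row of $Q$ is ever stored, so the memory cost is $O(n)$ and the product costs $O(n \log n)$ instead of $O(n^{2})$.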
The second part is the multiplications $Hx$ or $H^{T}x$ for an arbitrary vector $x$, for which we develop adjoint equations in the following section.

2.1. Discrete Adjoint Sensitivity Analysis

[9] Both continuous and discrete sensitivity analysis have been used for the adjoint-state method [Carter et al., 1974; Liu and Kitanidis, 2011]. We use discrete analysis here because the derivation is simpler and the physical meaning of the adjoint equation is straightforward.

[10] To proceed, we take a groundwater flow equation as an example and write the governing equation in domain $\Omega$:

$$\nabla \cdot (K \nabla h) + Q\,\delta(x - x_Q) = S \frac{\partial h}{\partial t} \tag{5}$$

subject to initial and boundary conditions:

$$h = h_0 \quad \text{at } t = 0 \tag{6}$$

$$h = h_1 \quad \text{on } \Gamma_1 \tag{7}$$

$$(K \nabla h) \cdot n = q \quad \text{on } \Gamma_2 \tag{8}$$

where $h$ is the hydraulic head; $K$ represents hydraulic conductivity; $S$ represents specific storage; $Q$ is the pumping ($-$)/injection ($+$) rate at location $x_Q$; $\delta$ is the Dirac delta function; $h_1$ is the specified head value at the Dirichlet boundary $\Gamma_1$; $n$ is a unit vector normal to the Neumann boundary $\Gamma_2$; and $q$ is the specified normal flux through $\Gamma_2$. We should note here that although equation (5) is a linear PDE for the state variable $h$, it leads to a nonlinear forward problem $h(K)$, as the relationship between $h$ and $K$ is nonlinear.

[11] After discretizing domain $\Omega$ into $n$ cells and the time axis into $n_t$ steps (implicit) to solve equation (5), we end up solving a linear equation system:

$$f(p, h) = b(p) - A(p)h = 0, \tag{9}$$

where $A$ is an $nn_t \times nn_t$ matrix; $b$ is an $nn_t \times 1$ vector; $p = \ln(K)$ $[n \times 1]$; and each row represents a linear equation that usually results from the application of the law of mass conservation in a given cell at a certain time step. In practice this system is never solved directly; instead, a smaller $n \times n$ system is solved for each time step.

[12] For convenience, let us write $h = A^{-1}b$, and let $e_i$ be a zero vector with the $i$th entry replaced by one. The sensitivity of the $i$th entry of $h$, $h_i = e_i^{T}h$, to $p_j$, the $j$th entry of $p$, is

$$J_{ij} = \frac{\partial h_i}{\partial p_j} = -e_i^{T} A^{-1} \frac{\partial A}{\partial p_j} A^{-1} b + e_i^{T} A^{-1} \frac{\partial b}{\partial p_j} = e_i^{T} A^{-1} \left( \frac{\partial b}{\partial p_j} - \frac{\partial A}{\partial p_j} h \right) = \lambda_i^{T} \xi_j, \tag{10}$$

where $\xi_j = \dfrac{\partial b}{\partial p_j} - \dfrac{\partial A}{\partial p_j} h$ and $\lambda_i^{T} = e_i^{T} A^{-1}$, such that

$$A^{T} \lambda_i = e_i, \tag{11}$$

and $\partial A / \partial p_j$ and $\partial b / \partial p_j$ for all $j = 1, \ldots, n$ are obtained while the matrix $A$ and the vector $b$ are formed during discretization. The sensitivity of $h$ w.r.t. $S$, the specific storage, can be similarly derived by computing $\xi_j$ for $S$ during discretization. Note that this form is equivalent to the adjoint equations derived in Carter et al. [1974], Sykes et al. [1985], and Sun and Yeh [1985].
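Equations (10) and (11) can be checked numerically on a toy problem. The sketch below is an illustration under strong assumptions, not the discretization used in the paper: it is steady state ($n_t = 1$), one-dimensional, and places the log-conductance parameters on the links of a three-node chain between two fixed heads, so that $\partial A/\partial p_j$ and $\partial b/\partial p_j$ are trivial to form. All function names (`link_terms`, `assemble`, `solve`, `adjoint_row`) are hypothetical.

```python
import math

N_NODES, H_LEFT, H_RIGHT = 3, 1.0, 0.0     # toy chain between two fixed heads

def link_terms(j, k):
    """Contribution of link j with conductance k to A and b in A h = b."""
    A = [[0.0] * N_NODES for _ in range(N_NODES)]
    b = [0.0] * N_NODES
    if j == 0:                             # left boundary to node 0
        A[0][0] += k
        b[0] += k * H_LEFT
    elif j == N_NODES:                     # last node to right boundary
        A[-1][-1] += k
        b[-1] += k * H_RIGHT
    else:                                  # interior link between nodes j-1 and j
        A[j - 1][j - 1] += k
        A[j][j] += k
        A[j - 1][j] -= k
        A[j][j - 1] -= k
    return A, b

def assemble(p):
    """Sum the link contributions; link conductances are exp(p_j)."""
    A = [[0.0] * N_NODES for _ in range(N_NODES)]
    b = [0.0] * N_NODES
    for j, pj in enumerate(p):
        Aj, bj = link_terms(j, math.exp(pj))
        for r in range(N_NODES):
            b[r] += bj[r]
            for c in range(N_NODES):
                A[r][c] += Aj[r][c]
    return A, b

def solve(A, rhs):
    """Gaussian elimination with partial pivoting (stand-in for the flow solver)."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def forward(p):
    A, b = assemble(p)
    return solve(A, b)

def adjoint_row(p, i):
    """Row i of J = dh/dp from one adjoint solve: J_ij = lambda_i^T xi_j (eq. 10)."""
    A, b = assemble(p)
    h = solve(A, b)
    At = [list(col) for col in zip(*A)]
    e_i = [1.0 if r == i else 0.0 for r in range(N_NODES)]
    lam = solve(At, e_i)                           # A^T lambda_i = e_i (eq. 11)
    row = []
    for j, pj in enumerate(p):
        dA, db = link_terms(j, math.exp(pj))       # dA/dp_j, db/dp_j (d exp(p)/dp = exp(p))
        xi = [db[r] - sum(dA[r][c] * h[c] for c in range(N_NODES)) for r in range(N_NODES)]
        row.append(sum(l * x for l, x in zip(lam, xi)))
    return row

p = [0.0, 0.3, -0.2, 0.1]                          # illustrative log-conductances
row = adjoint_row(p, 1)                            # sensitivity of the middle head

# Independent check by central finite differences (two forward runs per parameter)
eps = 1e-6
fd = []
for j in range(len(p)):
    up, dn = p[:], p[:]
    up[j] += eps
    dn[j] -= eps
    fd.append((forward(up)[1] - forward(dn)[1]) / (2 * eps))
print(row)
print(fd)
```

The point of the comparison is the cost: the adjoint route needs one extra solve per observation, whereas finite differences need solves proportional to the number of parameters.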
[13] We now show that $\lambda_i^{T}\xi_j$ actually acts as a chain rule for derivative evaluation. In fact, $\dfrac{\partial f}{\partial p_j} = \dfrac{\partial b}{\partial p_j} - \dfrac{\partial A}{\partial p_j} h = \xi_j$ returns the derivatives w.r.t. $p_j$ of mass flow out of cell $j$ to all other cells (or neighboring cells) and boundaries, given the current head configuration; $\lambda_i$ is simply the head response at time steps $n_t, \ldots, \lceil i/n \rceil, \ldots, 1$ to an imaginary pumping test at the observation cell at time step $n_t - \lceil i/n \rceil + 1$, with all initial head and constant boundary head values fixed at 0. In other words, it is the head change (by the law of superposition) if an additional sink term is enforced at the observation well. By the Law of Reciprocity [Morse and Feshbach, 1953, p. 858; Carter et al., 1974], it is also the head response at time step $\lceil i/n \rceil$ at the observation cell to an additional imaginary pumping test (mass outflow) at each of all cells at time steps $1, \ldots, \lceil i/n \rceil, \ldots, n_t$. As a result,

$$\lambda_i^{T}\xi_j = \sum_k \frac{\partial h_i}{\partial f_k} \frac{\partial f_k}{\partial p_j}$$

functions as a chain rule (or the total derivative rule) to calculate the sensitivity of the observation to $p_j$.

[14] We can also extend equation (10) to the whole sensitivity matrix $H$ as

$$H = \Lambda^{T} \Xi, \tag{12}$$

where $\Xi$ is an $nn_t \times n$ matrix whose $j$th column is $\xi_j$, and

$$A^{T} \Lambda = M^{T} \tag{13}$$

is the set of adjoint equations we need to solve for $\Lambda$ and hence $H$. Since $M$ has $m$ rows, solving equation (13) is equivalent to solving the forward model $m$ times with various source term conditions. In fact, it is straightforward to show that forming $H$ only requires running the forward model $m_l$ times, where $m_l$ is the number of unique measurement locations.

2.2. Adjoint Equations for $Hx$ and $H^{T}x$

[15] In this section, we derive the adjoint equations for $Hx$ and $H^{T}x$ based on the discussion in the previous section.

[16] From equations (12) and (13), we have

$$H^{T}x = \Xi^{T} \left(A^{-1}\right)^{T} M^{T} x = \Xi^{T} \lambda_{H^{T}}, \tag{14}$$

where

$$A^{T} \lambda_{H^{T}} = M^{T} x \tag{15}$$

is the adjoint equation for $\lambda_{H^{T}}$ and hence $H^{T}x$.

[17] Similarly for $Hx$, we have

$$Hx = M A^{-1} \Xi x = M \lambda_{H}, \tag{16}$$

where $A \lambda_{H} = \Xi x$ is the adjoint equation for $\lambda_{H}$ and hence $Hx$.
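To illustrate how the products $Hx$ and $H^{T}x$ feed into the Krylov solver, the following is a minimal matrix-free sketch in Python rather than MATLAB. It assumes tiny dense stand-ins for $H$, $Q$, $X$, and $R$ (all values and the names `H_times`, `HT_times`, `Q_times`, and `apply_K` are hypothetical); in the setting of this paper, `H_times` and `HT_times` would each wrap one solve of the equations above and `Q_times` an FFT-based product. The `minres` routine is a plain, unpreconditioned textbook recurrence (Lanczos process with Givens rotations), not the MATLAB implementation used by the authors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def minres(apply_K, b, tol=1e-5, max_iter=100):
    """Unpreconditioned MINRES for symmetric (possibly indefinite) K, given only z -> K z."""
    n = len(b)
    x = [0.0] * n
    v_prev, v = [0.0] * n, list(b)           # Lanczos vectors; r0 = b since x0 = 0
    w_prev, w = [0.0] * n, [0.0] * n         # search directions
    gamma = math.sqrt(dot(v, v))
    b_norm, eta = gamma, gamma               # |eta| tracks the residual norm
    s_prev, s, c_prev, c = 0.0, 0.0, 1.0, 1.0
    for it in range(1, max_iter + 1):
        if abs(eta) <= tol * b_norm:
            return x, it - 1
        v = [vi / gamma for vi in v]
        Kv = apply_K(v)
        delta = dot(Kv, v)
        v_next = [kv - delta * vi - gamma * vp for kv, vi, vp in zip(Kv, v, v_prev)]
        gamma_next = math.sqrt(dot(v_next, v_next))
        a0 = c * delta - c_prev * s * gamma  # Givens rotations on the new column
        a1 = math.sqrt(a0 * a0 + gamma_next * gamma_next)
        a2 = s * delta + c_prev * c * gamma
        a3 = s_prev * gamma
        c_prev, c = c, a0 / a1
        s_prev, s = s, gamma_next / a1
        w_next = [(vi - a3 * wp - a2 * wi) / a1 for vi, wp, wi in zip(v, w_prev, w)]
        x = [xi + c * eta * wn for xi, wn in zip(x, w_next)]
        eta = -s * eta
        v_prev, v, gamma = v, v_next, gamma_next
        w_prev, w = w, w_next
    return x, max_iter

# Toy stand-ins (m = 2 measurements, n = 3 unknowns, d = 1 drift coefficient)
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
Q = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]
X = [1.0, 1.0, 1.0]
R_DIAG = [0.1, 0.1]

def H_times(x):                              # would be one forward-type solve, eq. (16)
    return [dot(row, x) for row in H]

def HT_times(y):                             # would be one adjoint solve, eqs. (14)-(15)
    return [sum(H[i][j] * y[i] for i in range(2)) for j in range(3)]

def Q_times(x):                              # would be an FFT-based Toeplitz product
    return [dot(row, x) for row in Q]

def apply_K(z):
    """Product with K = [[H Q H^T + R, H X], [X^T H^T, 0]] without forming K."""
    xi, beta = z[:2], z[2]
    Ht_xi = HT_times(xi)
    W_xi = [a + r * b for a, r, b in zip(H_times(Q_times(Ht_xi)), R_DIAG, xi)]
    U_beta = H_times([beta * xk for xk in X])
    return [a + b for a, b in zip(W_xi, U_beta)] + [dot(X, Ht_xi)]

rhs = [0.7, -0.4, 0.0]                       # [y* - y(p_i) + H p_i ; 0], made-up values
solution, iterations = minres(apply_K, rhs, tol=1e-8)
xi, beta_hat = solution[:2], solution[2]
p_next = [beta_hat * xk + q for xk, q in zip(X, Q_times(HT_times(xi)))]   # eq. (4)
print(iterations, p_next)
```

Each call to `apply_K` uses one `H_times` on $QH^{T}\xi$, one on $X\beta$ (which can be precomputed since $U = HX$ is small), and one `HT_times`, which is the origin of the two forward solves per iteration mentioned in the abstract.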
[18] A similar analysis can be done for the moment equations of hydraulic head drawdown, and we show it for the first two moments in Appendix A.

2.3. Preconditioning

[19] Preconditioning does not change the solution to the linear system in equation (3) but can significantly speed up the convergence of MINRES in solving the system. Without adequate preconditioning, the iteration process can be extremely long, leading to a significant increase in CPU cost. A good preconditioner should be as close to $K^{-1}$ as possible while being as computationally cheap to acquire as possible. We use here the preconditioner recently developed in Saibaba and Kitanidis [2012] for geostatistical inverse systems. It uses a predefined low-rank representation of the prior covariance matrix $Q$ to establish a close approximation of $K^{-1}$. With a limited number of extra forward simulation runs, it significantly reduces the number of iterations needed to achieve convergence in solving equation (3). We give a brief introduction to the construction process; readers interested in more details are referred to Saibaba and Kitanidis [2012].

[20] 1. Compute the incomplete eigenvalue decomposition $Q \approx V_r \Lambda_r V_r^{T}$ by retaining the $r$ largest eigenvalues of $Q$. This step is done with the MATLAB function eigs, which is a matrix-free eigenvalue solver.

[21] 2. Form the matrix $W = R^{-1/2} H V_r \Lambda_r^{1/2}$ (in steps 2 to 4, this $W$ is distinct from the block $W = HQH^{T} + R$ of equation (3)). This step involves the multiplication $HV_r$, which is done with the adjoint method described in the previous section. This is the most costly step in constructing the preconditioner, and the total number of required forward simulations is $r$.

[22] 3. Compute the singular value decomposition $W = U \Sigma V^{T}$ (this $U$ is likewise distinct from $U = HX$). This step is done with the MATLAB function svd.

[23] 4. Approximate the inverse of the block $HQH^{T} + R$ by $\hat{W}^{-1} = R^{-1} - R^{-1/2} U D_r U^{T} R^{-1/2}$, where $D_r = \mathrm{diag}\left(\dfrac{\sigma_i^{2}}{1 + \sigma_i^{2}}\right)$ and $\sigma_i$, $i = 1, \ldots, \min\{n, r\}$, are the singular values of $W$.

[24] 5. We now have the approximate inverse

$$\hat{K}^{-1} = \begin{bmatrix} \hat{W} & U \\ U^{T} & 0 \end{bmatrix}^{-1},$$

with $U = HX$ as in equation (3). Note that we never store $\hat{K}^{-1}$ in its matrix form, as it is large and dense. Instead, we only compute the product $\hat{K}^{-1}x$ for a vector $x$ whenever needed.

2.4. Computational Efficiency

[25] We have now completely avoided computing and storing matrices of size $O(mn)$ and larger. The largest matrix in the whole inversion process is $V$, which is $n \times r$. As shown in the numerical examples in the next section, it is sufficient to pick an $r$ value of around 100–300 for $n$ up to 250,000. Hence, the memory requirement is now a few gigabytes, instead of the hundreds of gigabytes needed if direct methods are used to solve equation (3).
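The formula in step 4 of section 2.3 can be checked with a short calculation. The sketch below is not taken from Saibaba and Kitanidis [2012]; it assumes that the low-rank approximation $Q \approx V_r \Lambda_r V_r^{T}$ is used in place of $Q$, that $R$ is symmetric positive definite, and that $W = U \Sigma V^{T}$ is the thin SVD of the step-2 matrix, so that $U^{T}U = I$ and $V^{T}V = I$.

```latex
\begin{align*}
HQH^{T} + R &\approx H V_r \Lambda_r V_r^{T} H^{T} + R
  = R^{1/2}\left(I + W W^{T}\right) R^{1/2},
  \qquad W = R^{-1/2} H V_r \Lambda_r^{1/2} = U \Sigma V^{T}, \\
\left(I + W W^{T}\right)^{-1} &= \left(I + U \Sigma^{2} U^{T}\right)^{-1}
  = I - U D_r U^{T},
  \qquad D_r = \mathrm{diag}\!\left(\frac{\sigma_i^{2}}{1 + \sigma_i^{2}}\right), \\
\hat{W}^{-1} &= R^{-1/2}\left(I - U D_r U^{T}\right) R^{-1/2}
  = R^{-1} - R^{-1/2} U D_r U^{T} R^{-1/2}.
\end{align*}
```

The middle identity follows by multiplying out $(I + U\Sigma^{2}U^{T})(I - UD_rU^{T})$ and using $\Sigma^{2} - D_r(I + \Sigma^{2}) = 0$. Applying $\hat{W}^{-1}$ to a vector therefore needs only $U$, the $\sigma_i$, and products with $R^{-1/2}$, none of which is of size $O(mn)$.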
[26] The CPU cost of geostatistical inversion is often dominated by running the forward model multiple times. Note that with either MINRES or the traditional direct method, we are solving the same linear system in equation (3) to find the Newton direction during maximization of (2). With the direct method, the number of forward simulations for each Newton step is $m_l$, the number of adjoint equations to solve to acquire $H$ and hence to linearize the forward model $h(p)$; it is also the number of observation locations (assuming $m_l < n$). With the methodology proposed in this paper, the number of forward simulations during each Newton step equals the number of evaluations of $Hx$ or $H^{T}x$, each of which requires solving the adjoint equation once. Thus, the total number of forward simulations is the sum of the predefined rank ($r$) of the approximate prior covariance matrix and twice the number of MINRES iterations ($s$) needed to solve the linear system in equation (3). We therefore also achieve computational savings in CPU time with the proposed methodology if $r + 2s < m_l$.

3. Numerical Applications

[27] We apply the methodology developed above to two transient hydraulic tomography problems similar to those in Zhu and Yeh [2005], Liu et al. [2007], and Cardiff and Barrash [2011]. In each of these problems, a set of numerical pumping tests is conducted at different locations in the model domain with the generated true hydraulic conductivity field. The transient hydraulic pressure responses at a number of observation locations are simulated and used as measurements in the inverse model to characterize hydraulic property distributions in the domain. In the small test problem, the accuracy of our methodology is demonstrated by comparison with the traditional direct method. In the large-scale problem, the computational advantages of our methodology are demonstrated; such a large-scale problem is very difficult, if not impossible, to solve with the traditional direct method.
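The forward-run accounting of section 2.4 can be written out as a small calculation. The sketch below uses the rank, iteration counts, and number of measurement locations reported for the large-scale example in section 3.2; the function names are hypothetical, and the count assumes, as in that example, that the preconditioner is rebuilt at every Newton step.

```python
def forward_runs_iterative(rank, minres_iterations_per_step):
    """r runs for the preconditioner plus 2 runs per MINRES iteration, per Newton step."""
    return sum(rank + 2 * s for s in minres_iterations_per_step)

def forward_runs_direct(n_locations, n_newton_steps):
    """m_l adjoint runs per Newton step to form the full sensitivity matrix H."""
    return n_locations * n_newton_steps

rank = 300                                   # eigenpairs kept for the preconditioner
iterations = [118, 96, 94, 78, 88]           # MINRES iterations in the 5 Newton steps
m_l = 1539                                   # unique measurement locations

iterative_total = forward_runs_iterative(rank, iterations)
direct_total = forward_runs_direct(m_l, len(iterations))
saves_every_step = all(rank + 2 * s < m_l for s in iterations)   # criterion r + 2s < m_l

print(iterative_total, direct_total, saves_every_step)
```

The per-step criterion $r + 2s < m_l$ is what matters for the comparison; the totals simply add it up over the Newton steps.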
[28] We emphasize that the methodology proposed here is not limited to any specific type of measurements or inverse problems. We use the Bayesian inverse method in this application; however, as discussed in a later section, the methodology can also be applied to other inverse methods such as the well-known Successive Linear Estimator (SLE) [Yeh et al., 1996], which also involves the solution of linear systems containing $H$ and $H^{T}$.

3.1. A Small Test Problem

[29] We first check the accuracy of the proposed methodology using a synthetic problem of transient flow in a 3-D domain of 20 m × 20 m × 50 m in the x, y, and z directions, respectively. The domain is relatively coarsely discretized with equally sized 1 m × 1 m × 1 m cells. The true Gaussian ln(K) distribution (mean = ln(0.2 m/d)) is generated with a truncated Karhunen-Loève expansion by keeping the 100 largest eigenpairs of the exponential covariance matrix (correlation length = 20 m, 20 m, and 10 m in the x, y, and z directions, respectively). The generated 3-D ln(K) field is shown in Figure 1. A constant specific storage value is assumed to be known at $3 \times 10^{-4}$ m$^{-1}$. Twelve numerical pumping tests at 12 different locations (all combinations of x = [5, 15], y = [5, 15], and z = [10, 25, 40]) are conducted to produce synthetic data of transient pressure. The pressure is measured every 5 m in each direction, leading to $m_l = 3 \times 3 \times 9 = 81$ measurement locations. The pressure sampling interval at a measurement location is 1 day and each pumping test lasts 5 days. This leads to a total of 4860 (81 × 5 × 12) pressure measurements. The finite volume method is used to solve the forward model in equation (5), together with the implicit method in the time domain.

[30] We run our models on a single node of a Linux cluster with MATLAB R2011b, Intel Xeon X5650 2.67 GHz CPUs, and 24 GB total memory. The node has 12 cores, among which the forward runs of multiple pumping tests are evenly distributed.
[31] For the above transient hydraulic tomography test problem, we use the developed method with the MATLAB built-in function minres to solve equation (3) (convergence tolerance $\sigma = 10^{-5}$) and the Gauss-Newton method to maximize (2). In this case, we construct the preconditioner by keeping the 150 largest eigenvalues of $Q$. For accuracy comparison, we also use the direct method to solve equation (3). For both methods, it takes 5 Newton steps until the Gauss-Newton algorithm converges and reaches the final estimate. The estimated ln(K) fields using the two methods are plotted in Figure 2 as a function of the number of Newton iterations (1, 3, and 5 Newton steps). A visual inspection shows that the difference in ln(K) estimation between the two methods is very small at each Newton step. Indeed, the relative 2-norm of the difference in ln(K) estimation is 2.7%, 1.4%, and 0.09%, respectively, after 1, 3, and 5 Newton steps. Figure 3 shows very good fits between the true and the inverted ln(K) values, and between the measured and simulated pressure data, obtained by both methods. This demonstrates that the iterative method produces results almost identical to those produced by the direct method. This is expected, as both solve the same linear system to find the Newton step and the difference is only a function of the convergence tolerance $\sigma$, which is set small in this case.

Figure 1. True 3-D ln(K) field used to generate synthetic hydraulic head data for inversion. The field is generated with a truncated Karhunen-Loève expansion and the dimension is 20 m (X) × 20 m (Y) × 50 m (Z).

3.2. A Large-Scale Application

[32] To show the power of our method, we construct another 3-D ln(K) field in a model domain of 50 m × 50 m × 100 m in the x, y, and z directions, respectively (Figure 4a). A uniform discretization of 1 m × 1 m × 1 m is used, leading to 250,000 cells and hence 250,000 unknown hydraulic conductivity (ln(K)) parameters to estimate. The true Gaussian ln(K) distribution with a mean of ln(1 m/d) is generated and shown in Figure 4a.
The generation is performed using the truncated Karhunen-Loève expansion and keeping the 200 largest eigenvalues of the exponential covariance matrix (correlation length = 20 m, 20 m, and 10 m in the x, y, and z directions, respectively). A constant specific storage value is assumed to be known at $3 \times 10^{-4}$ m$^{-1}$. Twelve numerical pumping tests at 12 different locations (all combinations of x = [10, 40], y = [10, 40], and z = [20, 50, 80]) are conducted to produce synthetic transient pressure data measured every 5 m in each direction, with a total of $m_l = 9 \times 9 \times 19 = 1539$ measurement locations. Transient pressure measurements are also taken at 5 time steps, giving a total of 1539 × 5 × 12 = 92,340 measurements from the 12 pumping tests. We also add to the measurements Gaussian random noise with a standard deviation of 5% of the drawdown. In this case, the sensitivity matrix $H$ alone would have about 23 billion elements and hence require about 200 GB of memory to store fully in double precision. Moreover, it would be very expensive computationally to operate with such a large and dense matrix, and thus impossible to solve equation (3) with traditional direct methods and available computing resources. With the developed methodology and minres, however, only a few GB of memory is required and the use of any large and dense matrices is completely avoided during the inversion process.

[33] Figure 5 shows the 400 largest eigenvalues of the prior covariance matrix $Q$. We choose the 300 largest eigenpairs of $Q$ to construct the preconditioner, which significantly speeds up the convergence of minres. The inversion starts from the initial guess of a homogeneous ln(K) field equal to ln(0.5 m/d). The entire inversion process takes about 26 h for 5 Newton steps, which take 118, 96, 94, 78, and 88 minres iterations, respectively, totalling 474 × 2 = 948 forward model runs. Thus, the total number of forward runs for inversion is 300 × 5 + 948 = 2448.
This is actually significantly less than the number required by direct methods that compute the whole sensitivity matrix, which is 1539 × 5 = 7695 forward runs.

[34] The estimated ln(K) field is plotted in Figure 4b, and a visual inspection shows that it is very close to the true ln(K) field shown in Figure 4a. This is verified more quantitatively in Figure 6, which compares the true and estimated ln(K) values (left) and the measured and predicted hydraulic head values with the estimated ln(K) field (right).

Figure 2. Estimated 3-D ln(K) field (20 (X) × 20 (Y) × 50 (Z)) at the 1st (column 1), 3rd (column 2), and 5th (column 3) Newton steps with (top) the proposed methodology and (bottom) the direct method.

4. Discussion

4.1. Conditional Realizations

[35] Uncertainty quantification is always an important part of geostatistical inverse modeling, and it provides necessary inputs to risk analysis. For large problems with many unknowns, it is impractical and often impossible to compute and store the whole posterior covariance matrix of the estimated parameters. Instead, a more common practice is to generate equally probable conditional realizations of the parameters [Kitanidis, 1995; Hanna and Yeh, 1998; Liu and Kitanidis, 2011]. To generate a conditional realization, we start from an unconditional realization $p_u$ with zero mean and covariance $Q$, and a measurement error $\epsilon$ with zero mean and covariance $R$. We then get a conditional realization $p_c$ by solving the following problem:

$$\arg\min_{p_c} \left\{ (p_c - p_u)^{T} G (p_c - p_u) + \left(y^{*} + \epsilon - y(p_c)\right)^{T} R^{-1} \left(y^{*} + \epsilon - y(p_c)\right) \right\}, \tag{17}$$

which is equivalent in form to minimizing the negative logarithm of (2). The process above is actually solving another inverse problem with a different initial guess and synthetic noise-contaminated measurements. If we need to generate a number of conditional realizations, the total saving in CPU time increases proportionally.

Figure 3. (left) The fit between the true and the estimated ln(K) values and (right) the fit between the measured and the simulated y values.

4.2. Potential Extension to Other Geostatistical Inverse Methods

[36] We take the Successive Linear Estimator (SLE) as an example of the potential extension of the methodology discussed in section 2 to other inverse methods. The SLE iteratively linearizes the forward model and applies a Best Linear Unbiased Estimator (BLUE) to improve parameter estimation until convergence [Yeh et al., 1996]. Within each linearization of the forward model, it solves the following system of linear equations:

$$\left(HQH^{T} + \theta I\right) K = HQ, \tag{18}$$

and then updates the estimate

$$p_{i+1} = p_i + K^{T}\left(y^{*} - y(p_i)\right), \tag{19}$$

where $I$ is an identity matrix and $\theta I$ serves as a stabilization term that improves the condition number of $HQH^{T}$.

[37] Equation (18) is not amenable to iterative methods since its right-hand side has many columns; however, we can easily rewrite it in a form similar to equation (3):

$$\left(HQH^{T} + \theta I\right) \xi = y^{*} - y(p_i), \tag{20}$$

and

$$p_{i+1} = p_i + QH^{T}\xi. \tag{21}$$

[38] Equation (20) can now be easily solved iteratively with Krylov subspace methods that only require the products $Hx$ and $H^{T}x$ for an arbitrary, appropriately sized vector $x$.
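The step from equations (18) and (19) to equations (20) and (21) can be spelled out as follows. This is a sketch that assumes $Q$ is symmetric and $HQH^{T} + \theta I$ is invertible, writing $\theta$ for the stabilization coefficient, $y^{*}$ for the measurements, and $\xi$ for the solution of equation (20).

```latex
\begin{align*}
K &= \left(HQH^{T} + \theta I\right)^{-1} HQ
  \quad\Longrightarrow\quad
  K^{T} = Q H^{T} \left(HQH^{T} + \theta I\right)^{-1}, \\
K^{T}\left(y^{*} - y(p_i)\right)
  &= Q H^{T} \underbrace{\left(HQH^{T} + \theta I\right)^{-1}\left(y^{*} - y(p_i)\right)}_{\xi}
  = Q H^{T} \xi .
\end{align*}
```

Only one linear solve with a single right-hand side is then needed per linearization, instead of one solve per column of $HQ$.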

Figure 4. (a) True 3-D ln(K) field used to generate synthetic hydraulic head data for inversion. The field is generated with a truncated Karhunen-Loève expansion and the dimension is 50 (X) × 50 (Y) × 100 (Z). (b) Estimated ln(K) using simulated transient hydraulic head data and the developed iterative method.

5. Conclusions

[39] In this paper, we proposed an iterative methodology to solve the linear system resulting from large-scale geostatistical inverse modeling of nonlinear forward problems. Using this method requires on-the-fly computation of the products of the sensitivity matrix and its transpose with any appropriately sized vectors. We derived adjoint equations for these products for transient pressure and drawdown moment measurements of groundwater flow and showed that only one forward run is needed to compute each product. We applied the iterative method, together with a previously developed preconditioner for geostatistical inversion systems, to a large-scale numerical example of transient hydraulic tomography, and solved a Bayesian inverse problem with 250,000 unknowns and 100,000 measurements using only a few GB of memory. Using the traditional direct method would otherwise need 200 GB of memory and would be impossible without massive parallelization. The huge savings in memory were a result of not computing and storing the forward model's sensitivity matrix, a large dense matrix that is computationally expensive to store and operate with.

[40] We used the minres function included in MATLAB to solve the linearized geostatistical inversion system, for which the total number of forward runs was two times the number of iterations plus the rank of the approximate prior covariance matrix used to construct the preconditioner. We also showed that minres converges rather fast and that the total number of forward runs required during each Newton step is smaller than that required to construct the sensitivity matrix. However, we need to note here that the convergence rate is related to the spectrum of the system in equation (3) and is hence problem-specific. There may exist certain problems for which the CPU cost is higher with minres than with direct methods.

[41] In the end, we also discussed the extension of the methodology to other geostatistical inverse methods such as the SLE and showed that the proposed method could be easily applied to them.

Figure 5. Leading eigenvalues of the prior covariance matrix of the 3-D ln(K) field with 50 × 50 × 100 unknowns.

Figure 6. (left) Fit between the true and the estimated ln(K) values and (right) fit between the measured and the simulated y values.

Appendix A: Adjoint Method for Moment Equations

[42] Assuming full recovery of the drawdown curve and a hydrostatic initial condition, the zeroth moment ($M_0$) equation [Zhu and Yeh, 2006] for pressure drawdown induced by pumping is

$$\nabla \cdot (K \nabla M_0) + \tau Q\,\delta(x - x_Q) = 0 \tag{A1}$$

subject to boundary conditions:

$$M_0 = 0 \quad \text{on } \Gamma_1 \tag{A2}$$

$$(K \nabla M_0) \cdot n = \int_0^{\infty} q\,dt \quad \text{on } \Gamma_2, \tag{A3}$$

where $\tau$ is the duration of pumping. Similarly, the first moment ($M_1$) equation is

$$\nabla \cdot (K \nabla M_1) + \frac{\tau^{2}}{2} Q\,\delta(x - x_Q) + S M_0 = 0 \tag{A4}$$

subject to boundary conditions:

$$M_1 = 0 \quad \text{on } \Gamma_1 \tag{A5}$$

$$(K \nabla M_1) \cdot n = \int_0^{\infty} t\,q\,dt \quad \text{on } \Gamma_2. \tag{A6}$$

[43] We should note that after discretization, both equations have the same left-hand side (matrix $A$) as the hydraulic head equation. The right-hand sides (vector $b$) are different, so both can be solved with the steady-state flow simulator with appropriate source terms.

[44] Computation of the sensitivity of $M_0$ w.r.t. $p = \ln(K)$ is straightforward; it is similar to that of the steady-state head, and we have

$$\frac{\partial M_0}{\partial p_j} = -A^{-1} \frac{\partial A}{\partial p_j} M_0. \tag{A7}$$

[45] The sensitivity of $M_1$ w.r.t. $K$ and $S$ is a little trickier, since $M_0$ is in the right-hand side of the linear equation system (vector $b$). Similarly to equation (10), we have for $M_1$

$$\begin{aligned} J_{ij} = \frac{\partial M_{1,i}}{\partial p_j} &= e_i^{T} A^{-1} \left( -\mathrm{diag}(S) \frac{\partial M_0}{\partial p_j} - \frac{\partial A}{\partial p_j} M_1 \right) \\ &= e_i^{T} A^{-1} \mathrm{diag}(S) A^{-1} \frac{\partial A}{\partial p_j} M_0 - e_i^{T} A^{-1} \frac{\partial A}{\partial p_j} M_1 \\ &= \left[ \left(A^{-1}\right)^{T} \mathrm{diag}(S) \lambda_i \right]^{T} \xi_{0j} - \lambda_i^{T} \xi_{1j} \\ &= \lambda_i'^{\,T} \xi_{0j} - \lambda_i^{T} \xi_{1j}, \end{aligned} \tag{A8}$$

where, for the equalities to hold term by term, $\xi_{0j} = \dfrac{\partial A}{\partial p_j} M_0$ and $\xi_{1j} = \dfrac{\partial A}{\partial p_j} M_1$, and where $A^{T} \lambda_i' = \mathrm{diag}(S) \lambda_i$ is the first-moment adjoint equation. The solution $\lambda_i'$ is obtained by solving the steady-state forward problem one more time using $\mathrm{diag}(S) \lambda_i$ as the source terms. Similarly, adjoint equations for $Hx$ and $H^{T}x$ can be easily derived.

[46] Acknowledgments.
This work was funded by the Assistant Secretary for Fossil Energy, Office of Sequestration, Hydrogen, and Clean Coal Fuels, National Energy Technology Laboratory, of the U.S. Department of Energy under contract DE-AC02-05CH11231. Additional funding was provided by the Earth Sciences Division of Lawrence Berkeley National Laboratory through Early Career Development Grants. We also thank Michael Cardiff, Dmitry B. Avdeev, and the other anonymous reviewer for their valuable comments and suggestions.

References

Avdeev, D. (2005), Three-dimensional electromagnetic modelling and inversion from theory to application, Surv. Geophys., 26, 767–799, doi:10.1007/s10712-005-1836-x.
Cardiff, M., and W. Barrash (2011), 3-D transient hydraulic tomography in unconfined aquifers with fast drainage response, Water Resour. Res., 47, W12518, doi:10.1029/2010wr010367.
Carter, R., L. Kemp, A. Pierce, and D. Williams (1974), Performance matching with constraints, Soc. Pet. Eng. J., 14(2), 187–196.
Haber, E., U. M. Ascher, and D. Oldenburg (2000), On optimization techniques for solving nonlinear inverse problems, Inverse Probl., 16(5), 1263, doi:10.1088/0266-5611/16/5/309.
Hanna, S., and T.-C. Yeh (1998), Estimation of co-conditional moments of transmissivity, hydraulic head, and velocity fields, Adv. Water Resour., 22(1), 87–95, doi:10.1016/s0309-1708(97)00033-x.
Hoeksema, R. J., and P. K. Kitanidis (1984), An application of the geostatistical approach to the inverse problem in two-dimensional groundwater modeling, Water Resour. Res., 20(7), 1003–1020.
Kitanidis, P. K. (1995), Quasi-linear geostatistical theory for inversing, Water Resour. Res., 31(10), 2411–2419.
Kitanidis, P. K. (1996), On the geostatistical approach to the inverse problem, Adv. Water Resour., 19(6), 333–342.
Kitanidis, P. K. (1998), How observations and structure affect the geostatistical solution to the steady-state inverse problem, Ground Water, 36(5), 754–763.
Kitanidis, P. K., and E. G. Vomvoris (1983), A geostatistical approach to the inverse problem in groundwater modeling (steady-state) and one-dimensional simulations, Water Resour. Res., 19(3), 677–690.
Liu, X., W. A. Illman, A. J. Craig, J. Zhu, and T. C. J. Yeh (2007), Laboratory sandbox validation of transient hydraulic tomography, Water Resour. Res., 43, W05404, doi:10.1029/2006wr005144.
Liu, X., and P. K. Kitanidis (2011), Large-scale inverse modeling with an application in hydraulic tomography, Water Resour. Res., 47, W02501, doi:10.1029/2010wr009144.
Mackie, R. L., and T. R. Madden (1993), Three-dimensional magnetotelluric inversion using conjugate gradients, Geophys. J. Int., 115(1), 215–229, doi:10.1111/j.1365-246x.1993.tb05600.x.
Morse, P. M., and H. Feshbach (1953), Methods of Theoretical Physics, Part I, vol. 1, McGraw-Hill, N.Y.
Newman, G. A., and D. L. Alumbaugh (1997), Three-dimensional massively parallel electromagnetic inversion I. Theory, Geophys. J. Int., 128(2), 345–354, doi:10.1111/j.1365-246x.1997.tb01559.x.
Nowak, W., S. Tenkleve, and O. A. Cirpka (2003), Efficient computation of linearized cross-covariance and auto-covariance matrices of interdependent quantities, Math. Geol., 35(1), 53–66.
Paige, C. C., and M. A. Saunders (1975), Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12(4), 617–629, doi:10.1137/0712047.
Saad, Y. (2003), Iterative Methods for Sparse Linear Systems, 2nd ed., Society for Industrial and Applied Mathematics, Philadelphia, Penn.
Saibaba, A. K., and P. K. Kitanidis (2012), Efficient methods for large-scale linear inversion using a geostatistical approach, Water Resour. Res., 48, W05522, doi:10.1029/2011wr011778.
Sun, N.-Z., and W. W.-G. Yeh (1985), Identification of parameter structure in groundwater inverse problem, Water Resour. Res., 21(6), 869–883, doi:10.1029/wr021i006p00869.
Sykes, J. F., J. L. Wilson, and R. W. Andrews (1985), Sensitivity analysis for steady-state groundwater-flow using adjoint operators, Water Resour. Res., 21(3), 359–371.
Yeh, T. C. J., M. H. Jin, and S. Hanna (1996), An iterative stochastic inverse method: Conditional effective transmissivity and hydraulic head fields, Water Resour. Res., 32(1), 85–92.
Zhu, J. F., and T. C. J. Yeh (2005), Characterization of aquifer heterogeneity using transient hydraulic tomography, Water Resour. Res., 41, W07028, doi:10.1029/2004wr003790.
Zhu, J. F., and T. C. J. Yeh (2006), Analysis of hydraulic tomography using temporal moments of drawdown recovery data, Water Resour. Res., 42, W02403, doi:10.1029/2005wr004309.
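
Illustrative sketch for Appendix A. The two-solve adjoint computation of the moment sensitivities in equations (A7) and (A8) can be checked numerically on a small problem. The Python sketch below is not the implementation used in this study. It assumes a 1-D nodal discretization with Dirichlet conditions at both ends, one ln(K) value per element, dense solves, and the sign convention A m0 + b0 = 0 and A m1 + diag(S) m0 + b1 = 0. All function and variable names are hypothetical. The Jacobians are formed explicitly only so that they can be compared with central finite differences.

```python
import numpy as np

def element_matrices(n_nodes, dx):
    """Per-element stiffness blocks restricted to interior nodes (Dirichlet ends)."""
    G = []
    for e in range(n_nodes + 1):              # element e joins nodes e-1 and e
        Ge = np.zeros((n_nodes, n_nodes))
        for a in (e - 1, e):
            for b in (e - 1, e):
                if 0 <= a < n_nodes and 0 <= b < n_nodes:
                    Ge[a, b] = (1.0 if a == b else -1.0) / dx
        G.append(Ge)
    return G

def assemble(p, G):
    # A discretizes div(K grad .), hence the minus sign; K_e = exp(p_e)
    return -sum(np.exp(pe) * Ge for pe, Ge in zip(p, G))

def moments(p, G, S, b0, b1):
    A = assemble(p, G)
    m0 = np.linalg.solve(A, -b0)              # A m0 + b0 = 0
    m1 = np.linalg.solve(A, -b1 - S * m0)     # A m1 + diag(S) m0 + b1 = 0
    return A, m0, m1

def adjoint_jacobians(p, G, S, b0, b1, obs):
    """Sensitivities of m0 and m1 at observation nodes w.r.t. p = ln(K)."""
    A, m0, m1 = moments(p, G, S, b0, b1)
    J0 = np.zeros((len(obs), len(p)))
    J1 = np.zeros_like(J0)
    for r, i in enumerate(obs):
        e_i = np.zeros(len(m0))
        e_i[i] = 1.0
        lam = np.linalg.solve(A.T, e_i)       # zeroth-moment adjoint: A^T lam = e_i
        lam_p = np.linalg.solve(A.T, S * lam) # first-moment adjoint: A^T lam' = diag(S) lam
        for j, Gj in enumerate(G):
            dA = -np.exp(p[j]) * Gj           # dA/dp_j
            xi0, xi1 = dA @ m0, dA @ m1
            J0[r, j] = -lam @ xi0             # row i of (A7)
            J1[r, j] = lam_p @ xi0 - lam @ xi1  # last line of (A8)
    return J0, J1

def fd_jacobians(p, G, S, b0, b1, obs, h=1e-6):
    """Central finite-difference reference; costs two forward solves per unknown."""
    J0 = np.zeros((len(obs), len(p)))
    J1 = np.zeros_like(J0)
    for j in range(len(p)):
        dp = np.zeros(len(p))
        dp[j] = h
        _, m0p, m1p = moments(p + dp, G, S, b0, b1)
        _, m0m, m1m = moments(p - dp, G, S, b0, b1)
        J0[:, j] = (m0p[obs] - m0m[obs]) / (2 * h)
        J1[:, j] = (m1p[obs] - m1m[obs]) / (2 * h)
    return J0, J1

# Small synthetic test problem (all values are arbitrary assumptions)
rng = np.random.default_rng(0)
n_nodes, dx = 20, 1.0 / 21
G = element_matrices(n_nodes, dx)
p = 0.5 * rng.standard_normal(n_nodes + 1)    # ln(K) per element
S = rng.uniform(0.5, 1.5, n_nodes)            # storage per node
tau, Q, well = 1.0, 1.0, 10
b0 = np.zeros(n_nodes)
b0[well] = tau * Q                            # source term of (A1)
b1 = np.zeros(n_nodes)
b1[well] = 0.5 * tau**2 * Q                   # source term of (A4)
obs = [3, 8, 15]

J0, J1 = adjoint_jacobians(p, G, S, b0, b1, obs)
J0_fd, J1_fd = fd_jacobians(p, G, S, b0, b1, obs)
print("max |J0 - J0_fd| =", np.abs(J0 - J0_fd).max())
print("max |J1 - J1_fd| =", np.abs(J1 - J1_fd).max())
```

In this sketch each observation costs two adjoint solves with the same matrix A, regardless of the number of unknowns, whereas the finite-difference reference costs two forward solves per unknown.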