Fast Statistical Surrogates for Dynamical 3D Computer Models of Brain Tumor


Dorin Drignei
Department of Mathematics and Statistics, Oakland University, Rochester, MI 48309, USA

Abstract. Understanding how malignant brain tumors are formed and evolve has direct consequences for the development of efficient methods for their early detection and treatment. Adequate mathematical models of brain tumor growth and invasion can be helpful in clarifying some aspects of the mechanism responsible for the tumor. These mathematical models are typically implemented in computer models, which can be used for computer experimentation to study how changes in inputs, such as growth and diffusion parameters, affect the evolution of the virtual brain tumor. The computer model considered in this paper is defined on a three-dimensional (3D) anatomically accurate digital representation of the human brain, which includes white and grey matter, and on a time interval of hundreds of days, to realistically simulate the tumor development. Consequently, this computer model is very computationally intensive, and only small-size computer experiments, corresponding to a small sample of inputs, can be conducted. This paper presents a computationally efficient multidimensional kriging method to predict the evolution of the virtual brain tumor at new inputs, conditioned on the virtual brain tumor data available from the small-size computer experiment. The analysis shows that this prediction can be more accurate than a computationally competing model.

Keywords: Astrocytomas; BrainWeb; Computer experiments; Kriging; Numerical models.

1 Introduction

Despite recent advances in computerized tomography (CT) and magnetic resonance imaging (MRI), the chances of early detection and subsequent successful treatment of malignant brain tumors are still low. In general, cancerous tumors develop from one or several mutating cells which sustain rapid uncontrolled growth and invade the normal tissue.
This paper is concerned with the most common type of primary brain tumors, called gliomas, which begin in glial cells (the supportive tissue of the brain). The most common gliomas are known as astrocytomas, originating in connective tissue cells called astrocytes. The astrocytomas, in turn, can be low-grade (the least malignant), mid-grade (moderately malignant) or high-grade (glioblastomas, the most malignant). In children, astrocytomas are most commonly located in the cerebellum, whereas in adults the most common location is in the cerebral hemispheres. The exact causes of these brain tumors and their mechanism of development are still under intense scientific investigation. Mathematical models of brain tumors are helpful in understanding some aspects of the mechanism responsible for brain tumors, with implications for prognosis and treatment. The dynamical three-dimensional (3D) mathematical model considered here includes growth and diffusion parameters. The growth parameter is an unknown constant appearing in the growth term of the mathematical model. As in Swanson et al (2000) and Murray (2003, chapter 11), distinct parameters for diffusion of a brain tumor are specified for white and grey brain matter, with a larger diffusion parameter in the white matter, since the tumor diffuses faster in such tissue. These mathematical models are typically implemented in computer models, or codes. Different combinations of parameters (or inputs) lead to simulations of virtual brain tumors with different degrees of malignancy (the outputs). Experimentation with these computer models, in which the inputs are systematically changed, represents a useful method for understanding their effects on the outputs. For each input, these computer models are solved numerically over time and three spatial dimensions, and they are very computationally intensive, with a single run taking hours of computational time.
Therefore, only small-size experiments (corresponding to a small number of sampled inputs) can be conducted and analyzed.

The main goal of this paper is to illustrate a computationally efficient method for predicting virtual brain tumors at any input in an input space, conditioned on the virtual brain tumor data available from the small number of computer model runs. More precisely, a small number of inputs are sampled in the input space according to a specific design, the corresponding dynamical 3D computer model runs are performed and the output data are recorded. An appropriate statistical model for the sampled output data is estimated, and kriging methods are used to predict the output data at new inputs. This prediction can then be used as a substitute (or surrogate) for the computationally intensive dynamical 3D computer model, at any input in the input space. An important aspect of the output data sets analyzed here is that they are deterministic: running the computer model twice for the same input yields the same output data set. In this paper we take the empirical Bayesian approach of Currin et al (1991), where, before the computer model runs are performed, the joint distribution of a Gaussian process is assumed as a prior distribution for the unknown output, over the entire input space. The general statistical methodology of kriging for the design and analysis of computer experiments is discussed, in the context of univariate output, by Sacks et al (1989), Currin et al (1991) and in the books by Santner et al (2003) and Fang et al (2006). More recently, a Bayesian model that relies on this methodology has been used by Bayarri et al (2007a) to calibrate computer models. The output data sets presented in this paper are high-dimensional. Straightforward generalizations of the univariate case to accommodate analysis of such large output data sets have computational limitations, as pointed out in Drignei and Morris (2006).
More precisely, the general methodology would require the specification of a high-dimensional dense covariance matrix, which in turn would be difficult to estimate and use in kriging formulas. Drignei (2006) proposed a two-stage method for the analysis of high-dimensional computer output, which was illustrated with a geophysical example. Drignei and Morris (2006) developed methodology specifically for the analysis of finite difference computer models, in which the partial derivatives of the original mathematical model are approximated by finite differences on a grid. The form of the finite difference dynamical 3D computer model of a brain tumor considered here permits the development of a more computationally efficient modification of the prediction method presented in Drignei and Morris (2006). Other recent work in the area of multidimensional computer output analysis has been done by Bayarri et al (2007b) and Higdon et al (2007), who used basis representations for multidimensional output in a Bayesian framework and in the context of computer model calibration. This paper is organized as follows. Section 2 discusses the computer experiment with the dynamical 3D computer model of brain tumor growth and invasion, the input parameters and the output data sets. Section 3 presents the statistical methodology, including the statistical model, kriging prediction and validation. Section 4 presents the results, and some conclusions are given in Section 5.

2 An experiment with the computer model of brain tumor

2.1 The computer model

The computer model is a deterministic, iterative relationship in time and three spatial dimensions. It originates in a continuous mathematical model (a partial differential equation), which includes diffusion and growth terms (Murray 2003, p. 545). The mathematical model is defined on a rectangular spatial domain and in a finite time interval.
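In the notation used below for the discretization (Y for the tumor cell density, D for the spatially varying diffusion coefficient, \rho for the growth rate), the continuous growth-diffusion model of Swanson et al (2000) and Murray (2003) can be written as:

```latex
% Reaction-diffusion model for the tumor cell density Y(x, t):
% spatially varying diffusion plus exponential growth
\frac{\partial Y}{\partial t}
  = \nabla \cdot \big( D(\mathbf{x}) \, \nabla Y \big) + \rho \, Y
```

with D(x) piecewise constant: D_w in white matter, D_g in grey matter, and zero outside the brain.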
It does not have a closed-form, analytical solution, and therefore a numerical, approximate solution is obtained. The partial derivatives with respect to time and space are approximated on a discrete grid of points, generating an iterative relationship commonly known as a finite difference scheme. Finer grids produce more accurate but more computationally intensive numerical solutions. The analysis presented here concerns a hypothetical adult patient, who has developed a brain tumor initially localized at a spatial point x_0. The untreated brain tumor is then followed for about 600 days. Consider the time interval [0, L_T] and the rectangular spatial domain [0, L_{S1}] x [0, L_{S2}] x [0, L_{S3}]. This four-dimensional volume is discretized on a fine grid with M_F + 1 equally distanced time points and (S_{1F} + 2)(S_{2F} + 2)(S_{3F} + 2) equally distanced space points. Adjacent points are separated by the time increment \Delta_{tF} = L_T / M_F and the space increment \Delta = L_{Si} / (S_{iF} + 1), i = 1, 2, 3. While theoretically the space increments need not be equal, the brain spatial grid available for this analysis requires equally spaced increments for all three spatial dimensions. The development of BrainWeb allows users to obtain anatomically accurate digital data sets of the human brain (Collins et al, 1998). In particular, here the interest is in data sets that define the white and grey matter, and the brain boundaries, in an

18 x 21.6 x 18 cm^3 rectangular box. The fine grid has 91 x 109 x 91 space points, 2mm apart in each dimension. Figure 1 presents a smoothed image of this data set, sectioned at the center x_0 (approximately the same as in Murray 2003, p. 577) of the future virtual tumor. While the tumor starts from a single cell, by the time of the first computerized tomography (CT) scan it has evolved into a cluster of malignant cells, assumed normally distributed about a center point x_0, with a spread b and maximum cell density a:

Y(0, i, j, k) = a exp{ -[ (x_i - x_{0i})^2 + (y_j - y_{0j})^2 + (z_k - z_{0k})^2 ] / b },

where Y is the brain tumor cell density, x = (x_i, y_j, z_k), x_0 = (x_{0i}, y_{0j}, z_{0k}), i = 1, ..., S_{1F} + 2, j = 1, ..., S_{2F} + 2, k = 1, ..., S_{3F} + 2, with S_{1F} = 89, S_{2F} = 107, S_{3F} = 89, and parameters a and b similar to those in Murray (2003). Here 0 is the initial time for the analysis presented. The computer model is iteratively expressed as

Y(t+1, i, j, k) = Y(t, i, j, k) + \Delta_{tF} {
  [ D(i, j, k)(Y(t, i+1, j, k) - Y(t, i, j, k)) - D(i-1, j, k)(Y(t, i, j, k) - Y(t, i-1, j, k)) ] / \Delta^2
+ [ D(i, j, k)(Y(t, i, j+1, k) - Y(t, i, j, k)) - D(i, j-1, k)(Y(t, i, j, k) - Y(t, i, j-1, k)) ] / \Delta^2
+ [ D(i, j, k)(Y(t, i, j, k+1) - Y(t, i, j, k)) - D(i, j, k-1)(Y(t, i, j, k) - Y(t, i, j, k-1)) ] / \Delta^2
+ \rho Y(t, i, j, k) },

t = 0, ..., M_F - 1, i = 2, ..., S_{1F} + 1, j = 2, ..., S_{2F} + 1 and k = 2, ..., S_{3F} + 1. This finite difference relationship is said to be explicit because the numerical solution at time t + 1 is obtained explicitly from the numerical solution at previous time points. However, implicit finite differences may also be considered (e.g. Swanson et al 2004). In the above relationship, D(i, j, k) represents the array of non-homogeneous parameters (or coefficients) of diffusion of a tumor inside the brain, which are piecewise constant: D_g and D_w are the diffusion parameters in the grey and white matter respectively, and zero anywhere else in the rectangular domain (e.g. Tracqui et al 1995). As in Murray (2003), the tumor diffuses about 5 times faster in the white than in the grey matter, therefore we choose D_w = 5 D_g.
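The piecewise-constant array D(i, j, k) can be assembled directly from tissue masks. A minimal sketch, assuming hypothetical 0/1 arrays `white` and `grey` standing in for the BrainWeb masks (only the rule D_w in white matter, D_g in grey matter, zero elsewhere, with D_w = 5 D_g, comes from the text):

```python
import numpy as np

def diffusion_array(white, grey, D_g):
    """Piecewise-constant diffusion coefficients: 5*D_g in white matter,
    D_g in grey matter, 0 elsewhere (outside the brain)."""
    D = np.zeros(white.shape)
    D[grey.astype(bool)] = D_g
    D[white.astype(bool)] = 5.0 * D_g  # tumor diffuses ~5x faster in white matter
    return D
```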
Also, ρ is the tumor growth parameter. The unknown parameters D_g and ρ will be the inputs in the computer model and will be discussed below. To maintain numerical stability, a very small time step needs to be chosen; here M_F = 15,003 time steps (15,004 equally spaced time points) are considered. The numerical solution computed on such a fine grid, in a loop with respect to time and in vector format with respect to each spatial dimension, takes about 5 hours and 45 min per run in R on a computer with a 2.6 GHz Intel Core Duo processor and 2GB RAM. Note that BrainWeb allows the user to choose even finer grids, of 1 mm^3 (as in Murray 2003) or 0.5 mm^3 resolution. Such finely spaced grids require a smaller time increment (to maintain numerical stability), which in turn gives more accurate, but even more computationally intensive, numerical solutions.

Figure 1. Anatomically accurate model of the human brain containing white and grey matter, embedded in a rectangular domain and sectioned at the center of the future virtual tumor.
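The explicit update above can be sketched in vectorized form. This is an illustrative stand-in (toy grid and made-up parameter values, not the paper's implementation); it applies the update only at interior points, with the flux differences matching the scheme term by term:

```python
import numpy as np

def step(Y, D, dt, dx, rho):
    """One explicit time step of the growth-diffusion scheme on interior points."""
    out = Y.copy()
    c = (slice(1, -1),) * 3                 # interior points
    rhs = rho * Y[c]                        # growth term
    for axis in range(3):
        up = [slice(1, -1)] * 3; up[axis] = slice(2, None)   # index + 1
        dn = [slice(1, -1)] * 3; dn[axis] = slice(0, -2)     # index - 1
        up, dn = tuple(up), tuple(dn)
        # D(i,j,k)(Y(i+1)-Y(i)) - D(i-1,j,k)(Y(i)-Y(i-1)), divided by dx^2
        rhs = rhs + (D[c] * (Y[up] - Y[c]) - D[dn] * (Y[c] - Y[dn])) / dx**2
    out[c] = Y[c] + dt * rhs
    return out

# toy example: Gaussian initial cell bundle, constant diffusion inside the box,
# dt chosen small enough for stability of the explicit scheme
n = 17
ax = np.arange(n) - n // 2
X, Yg, Z = np.meshgrid(ax, ax, ax, indexing="ij")
Y0 = np.exp(-(X**2 + Yg**2 + Z**2) / 8.0)
D = np.full((n, n, n), 0.05)
Y1 = step(Y0, D, dt=0.1, dx=1.0, rho=0.01)
```

With a positive growth rate and negligible boundary outflow, one step increases the total cell mass on this toy configuration.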

Figure 2. Low grade virtual brain tumor (black contours) corresponding to input (D_g = 0.0003, ρ = 0.002), at four different times separated by 200 days (columns), in transversal (upper row), coronal (middle row) and sagittal (lower row) brain sections.

Figure 3. High grade virtual brain tumor (black contours) corresponding to input (D_g = 0.003, ρ = 0.02), at four different times separated by 200 days (columns), in transversal (upper row), coronal (middle row) and sagittal (lower row) brain sections.

2.2 The input parameters

Computer experiments have been conducted in Murray (2003), especially in two-dimensional coronal sections of the brain, to assess the effect of diffusion (invasiveness) and growth parameters on the evolution of a brain tumor. The input parameter vectors (D_g, ρ) have been constrained to the rectangular input space [0.0003, 0.003] x [0.002, 0.02], as in Murray (2003) p. 572, since values in this region lead to virtual brain tumors in agreement with clinical data. In this paper we consider computer experiments in the same input

region, but the output data sets have three spatial dimensions. Figures 2 and 3 show snapshots of the brain tumor at intervals of 200 days, for two corners of the input space: a low grade tumor (D_g = 0.0003, ρ = 0.002) and a high grade tumor (D_g = 0.003, ρ = 0.02). One can clearly see wide differences among simulated tumors across the input space. Figure 4 shows the input space, along with 15 input vectors (D_g, ρ) (plotted as 'o') sampled according to a maximin Latin hypercube design (Morris et al 1993). This design spreads out the sampled inputs, in the sense that no two sampled inputs are too close. The dynamical 3D brain tumor computer model discussed above is run for each of these sampled inputs and output data sets are obtained. In addition, to validate the prediction method (to be discussed later), 12 more inputs, plotted as '+' in Figure 4, are sampled according to a Latin hypercube design that maximizes the minimum distance among the points themselves and from the 15 inputs of the first design.

2.3 The output data sets

For each of the 15 sampled inputs there are (M_F + 1)(S_{1F} + 2)(S_{2F} + 2)(S_{3F} + 2), i.e. about 13.5 billion, data values, which would give an overall sample size of more than 200 billion. It was necessary to use a fine grid in order to obtain an accurate numerical solution. However, due to the smoothness of the numerical solution (as one can see from Figures 2 and 3), the fine grid data will be retained only on a coarser grid, to facilitate further analysis. Each spatial dimension has been coarsened three times and, for numerical stability, the time dimension nine times, the fine grid data being retained on the resulting coarse grid. Therefore, the coarse grid on which the fine grid data have been retained has M = 1,667 (i.e. 1,668 equally distanced time points) in the time interval [0, L_T] and (S_1 + 2)(S_2 + 2)(S_3 + 2) = 35,557 equally distanced space points in the rectangular spatial domain [0, L_{S1}] x [0, L_{S2}] x [0, L_{S3}].
While an even coarser grid may be considered, one needs to retain enough grid structure to preserve key geometrical properties of the tumor output data set. Denote by Y the fine grid data set retained on the above coarse grid. Now, for each of the 15 sampled inputs, there are 59,309,076 data values, giving a total of 889,636,140 values as the size of the final data set retained for further analysis. Note that the fine grid data Y retained on the coarse grid is not equal to the coarse numerical solution obtained by running the computer model above with coarse grid increments.

Figure 4. The input space, along with D = 15 sampled inputs (D_g, ρ) for analysis ('o') and P = 12 sampled inputs (D_g, ρ) for prediction validation ('+').

The goal in this paper is to develop a computationally efficient prediction method for the fine grid virtual brain tumor data set retained on the coarse grid, at any new input in the input space, therefore avoiding new runs of the computationally intensive dynamical 3D brain tumor model. This will be accomplished by developing a statistical model for the output data set at the D = 15 sampled inputs and then using kriging-type methodology to obtain the prediction at new inputs.
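The maximin Latin hypercube sampling behind the designs of Figure 4 can be sketched by drawing many random Latin hypercubes and keeping the one with the largest minimum pairwise distance. A simple sketch (the candidate-search approach and all tuning values are illustrative, not the construction actually used in the paper):

```python
import numpy as np

def latin_hypercube(n, rng):
    """One random Latin hypercube on [0,1]^2: each coordinate stratified into
    n equal bins, one point per bin, bins matched by random permutation."""
    cols = []
    for _ in range(2):
        cols.append((rng.permutation(n) + rng.random(n)) / n)
    return np.column_stack(cols)

def maximin_lhs(n, lo, hi, n_try=200, seed=0):
    """Best of n_try random LHS designs by the maximin criterion
    (largest minimal pairwise distance), rescaled to the rectangle [lo, hi]."""
    rng = np.random.default_rng(seed)
    best, best_d = None, -np.inf
    for _ in range(n_try):
        X = latin_hypercube(n, rng)
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        d = np.min(dists[np.triu_indices(n, k=1)])
        if d > best_d:
            best, best_d = X, d
    return np.asarray(lo) + best * (np.asarray(hi) - np.asarray(lo))

# the paper's input rectangle for (D_g, rho)
design = maximin_lhs(15, lo=[0.0003, 0.002], hi=[0.003, 0.02])
```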

3 Methodology

3.1 The statistical model

Modeling the large output data sets directly by dense covariance matrices is impractical, as demonstrated in Drignei and Morris (2006). Instead, a statistical model that closely follows the iterative finite difference relationship will be used, hence incorporating the output data generating mechanism. For each of the 15 sampled inputs compute

T(t, i, j, k) = Y(t+1, i, j, k) - Y(t, i, j, k) - \Delta_t {
  [ D'(i, j, k)(Y(t, i+1, j, k) - Y(t, i, j, k)) - D'(i-1, j, k)(Y(t, i, j, k) - Y(t, i-1, j, k)) ] / \Delta^2
+ [ D'(i, j, k)(Y(t, i, j+1, k) - Y(t, i, j, k)) - D'(i, j-1, k)(Y(t, i, j, k) - Y(t, i, j-1, k)) ] / \Delta^2
+ [ D'(i, j, k)(Y(t, i, j, k+1) - Y(t, i, j, k)) - D'(i, j, k-1)(Y(t, i, j, k) - Y(t, i, j, k-1)) ] / \Delta^2
+ \rho Y(t, i, j, k) }

for t = 0, ..., M - 1, i = 2, ..., S_1 + 1, j = 2, ..., S_2 + 1, k = 2, ..., S_3 + 1, with \Delta_t = L_T / M, \Delta = L_{Si} / (S_i + 1), i = 1, 2, 3, and D' being D retained on the coarse space grid. These are approximate local truncation errors, similar to those defined for example in Thomas (1995), but rescaled by \Delta_t to simplify the computation of the statistical prediction. Second order stationary distributions are suitable as prior distributions for the approximate local truncation errors, due to theoretical properties showing that these quantities have roughly constant magnitude across space-time, and spatiotemporal averages of approximately zero. Moreover, Taylor series arguments show that local truncation errors are a combination of higher order derivatives of the output Y, and are in general rougher than the outputs themselves. This leads to the assumption that the spatiotemporal correlations of local truncation errors are in general weak, and therefore spatiotemporal independence will be assumed for the approximate local truncation errors. Let T denote the complete vector of the M S_1 S_2 S_3 D approximate local truncation errors. The statistical model assumed is T ~ N(0, \sigma^2 C_D \otimes I_{M S_1 S_2 S_3}), where \otimes represents the Kronecker product.
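Estimating the parameters of this model reduces to a two-parameter optimization. A sketch of evaluating such a profile-likelihood objective, log(sigma2_hat) + log det(C_D)/D, using a precomputed Gram matrix G[i, j] = T_r[:, i]' T_r[:, j] so that the large truncation-error matrix is touched only once (function names and test values are illustrative):

```python
import numpy as np

def corr_matrix(d, theta):
    """Gaussian correlation over the D sampled inputs; d has shape (D, 2),
    columns rescaled to [0, 1], theta = (theta_Dg, theta_rho)."""
    C = np.ones((len(d), len(d)))
    for q in range(d.shape[1]):
        diff = d[:, q][:, None] - d[:, q][None, :]
        C = C * np.exp(-theta[q] * diff**2)
    return C

def objective(theta, d, G, n_per_input):
    """log(sigma2_hat) + log(det(C_D))/D, with
    sigma2_hat = sum_{i,j} C_D^{-1}[i,j] * G[i,j] / (n_per_input * D)."""
    D_inputs = len(d)
    C = corr_matrix(d, theta)
    Cinv = np.linalg.inv(C)
    sigma2 = np.sum(Cinv * G) / (n_per_input * D_inputs)
    _, logdet = np.linalg.slogdet(C)
    return np.log(sigma2) + logdet / D_inputs
```

G is computed once as `T_r.T @ T_r`; each optimizer iteration then costs only O(D^3), independent of the huge per-input sample size.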
Here C_D is the D x D correlation matrix for inputs, with (m, n) element

C_D(m, n) = exp{ -\theta_{Dg} [ d_{Dg}(m) - d_{Dg}(n) ]^2 } exp{ -\theta_{\rho} [ d_{\rho}(m) - d_{\rho}(n) ]^2 },

a product of one-dimensional correlations, where d_{Dg} and d_{\rho} are the coordinates of D_g and ρ respectively, rescaled to the interval [0, 1] for better numerical stability. The maximum likelihood estimates of the parameters \theta_{Dg} and \theta_{\rho} are obtained by minimizing iteratively the function

log(\hat{\sigma}^2) + log(det(C_D)) / D,

where

\hat{\sigma}^2 = T' (C_D^{-1} \otimes I_{M S_1 S_2 S_3}) T / (M S_1 S_2 S_3 D) = (1 / (M S_1 S_2 S_3 D)) \sum_{i,j=1}^{D} (C_D^{-1}[i, j]) T_r[., i]' T_r[., j],

T_r being the vector T reorganized as an (M S_1 S_2 S_3) x D matrix. Since \hat{\sigma}^2 is used in the iterative likelihood optimization and changes its value at each iteration, it is more computationally convenient to use the last form, with the scalars T_r[., i]' T_r[., j], i, j = 1, ..., D, computed before the iterative likelihood optimization begins.

3.2 Prediction

The main goal in this paper is to predict the fine grid virtual brain tumor, retained on the coarse grid, at a new input (D_g^p, \rho^p). The prediction distribution of the approximate local truncation errors at the new input is the conditional multivariate normal distribution with mean

\hat{T}_{p|D} = [ (\hat{C}_p' \hat{C}_D^{-1}) \otimes I_{M S_1 S_2 S_3} ] T = \sum_{i=1}^{D} (\hat{C}_p' \hat{C}_D^{-1})_i T_r[., i],

a weighted sum of the approximate local truncation error vectors at the sampled inputs (the true but unknown statistical parameters have been replaced by their maximum likelihood estimates). The prediction error covariance is \hat{\sigma}^2_{p|D} I_{M S_1 S_2 S_3}, where

\hat{\sigma}^2_{p|D} = \hat{\sigma}^2 ( 1 - \hat{C}_p' \hat{C}_D^{-1} \hat{C}_p ).

Here, \hat{C}_p(n) = exp{ -\hat{\theta}_{Dg} [ d_{D_g^p} - d_{Dg}(n) ]^2 } exp{ -\hat{\theta}_{\rho} [ d_{\rho^p} - d_{\rho}(n) ]^2 }, with n = 1, ..., D. The univariate version of this prediction method is sometimes called simple kriging (e.g. Cressie 1993). Note that kriging prediction is done over the input space, not over the spatial component of the data sets. The output data Y is a linear function of T, which implies that Y also has a normal distribution. From the Appendix, it follows that the point prediction \hat{Y}_{p|D} for the output data Y_p at the new input can be computed iteratively as

\hat{Y}_{p|D}(t+1, i, j, k) = \hat{Y}_{p|D}(t, i, j, k) + \Delta_t {
  [ D'_p(i, j, k)(\hat{Y}_{p|D}(t, i+1, j, k) - \hat{Y}_{p|D}(t, i, j, k)) - D'_p(i-1, j, k)(\hat{Y}_{p|D}(t, i, j, k) - \hat{Y}_{p|D}(t, i-1, j, k)) ] / \Delta^2
+ [ D'_p(i, j, k)(\hat{Y}_{p|D}(t, i, j+1, k) - \hat{Y}_{p|D}(t, i, j, k)) - D'_p(i, j-1, k)(\hat{Y}_{p|D}(t, i, j, k) - \hat{Y}_{p|D}(t, i, j-1, k)) ] / \Delta^2
+ [ D'_p(i, j, k)(\hat{Y}_{p|D}(t, i, j, k+1) - \hat{Y}_{p|D}(t, i, j, k)) - D'_p(i, j, k-1)(\hat{Y}_{p|D}(t, i, j, k) - \hat{Y}_{p|D}(t, i, j, k-1)) ] / \Delta^2
+ \rho^p \hat{Y}_{p|D}(t, i, j, k) } + \hat{T}_{p|D}(t, i, j, k)

for t = 0, ..., M - 1, i = 2, ..., S_1 + 1, j = 2, ..., S_2 + 1 and k = 2, ..., S_3 + 1, starting with the same initial values presented in Section 2.1, retained on the coarsely spaced grid; here D'_p denotes the coarse-grid diffusion array evaluated at D_g^p. While it is algebraically possible to write down the theoretical prediction error covariance for \hat{Y}_{p|D}, it is computationally impractical to do so, as it requires the multiplication of a matrix of size (M S_1 S_2 S_3) x (M S_1 S_2 S_3) = 49,068,145 x 49,068,145 with its transpose (see the Appendix). In practice, it often suffices to obtain individual prediction intervals, which involve only the main diagonal elements of the prediction error covariance. While computationally less demanding than the entire prediction error covariance, this is still too computationally intensive to be practically useful.
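The kriging weights \hat{C}_p' \hat{C}_D^{-1} and the scalar variance factor above are cheap D-dimensional computations. A sketch (all names and values illustrative):

```python
import numpy as np

def corr_vec(d_new, d, theta):
    """Gaussian correlations between a new input and the D sampled inputs."""
    r = np.ones(len(d))
    for q in range(d.shape[1]):
        r = r * np.exp(-theta[q] * (d_new[q] - d[:, q]) ** 2)
    return r

def kriging_weights_and_var(d_new, d, C_D, sigma2, theta):
    c_p = corr_vec(d_new, d, theta)
    w = np.linalg.solve(C_D, c_p)   # C_D^{-1} c_p: weights for the stored T_r columns
    var = sigma2 * (1.0 - c_p @ w)  # pointwise prediction-error variance
    return w, var
```

The predicted truncation-error vector is then `T_hat = T_r @ w`, which drives the iterative point prediction; at a sampled input the weights reduce to a unit vector and the variance to zero, so the surrogate interpolates the stored runs.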
Instead, a more computationally feasible way to obtain individual prediction intervals is by simulation. Consider the following iterative relationship that simulates R realizations \tilde{Y}_{p|D} from the conditional distribution of Y_p given Y:

\tilde{Y}_{p|D}(t+1, i, j, k, r) = \tilde{Y}_{p|D}(t, i, j, k, r) + \Delta_t {
  [ D'_p(i, j, k)(\tilde{Y}_{p|D}(t, i+1, j, k, r) - \tilde{Y}_{p|D}(t, i, j, k, r)) - D'_p(i-1, j, k)(\tilde{Y}_{p|D}(t, i, j, k, r) - \tilde{Y}_{p|D}(t, i-1, j, k, r)) ] / \Delta^2
+ [ D'_p(i, j, k)(\tilde{Y}_{p|D}(t, i, j+1, k, r) - \tilde{Y}_{p|D}(t, i, j, k, r)) - D'_p(i, j-1, k)(\tilde{Y}_{p|D}(t, i, j, k, r) - \tilde{Y}_{p|D}(t, i, j-1, k, r)) ] / \Delta^2
+ [ D'_p(i, j, k)(\tilde{Y}_{p|D}(t, i, j, k+1, r) - \tilde{Y}_{p|D}(t, i, j, k, r)) - D'_p(i, j, k-1)(\tilde{Y}_{p|D}(t, i, j, k, r) - \tilde{Y}_{p|D}(t, i, j, k-1, r)) ] / \Delta^2
+ \rho^p \tilde{Y}_{p|D}(t, i, j, k, r) } + \hat{T}_{p|D}(t, i, j, k) + \hat{\sigma}_{p|D} \epsilon_{t,i,j,k,r}

for t = 0, ..., M - 1, i = 2, ..., S_1 + 1, j = 2, ..., S_2 + 1, k = 2, ..., S_3 + 1, r = 1, ..., R, with \epsilon independent standard normal random variables. The initial values are the same as above. If \tau^2_{0|D}(t, i, j, k) represents the conditional variance of Y_p(t, i, j, k) given Y at each point (t, i, j, k), then

( Y_p(t, i, j, k) - \hat{Y}_{p|D}(t, i, j, k) ) / \tau_{0|D}(t, i, j, k)

is univariate standard normal and conditionally independent of

(1 / \tau^2_{0|D}(t, i, j, k)) \sum_{r=1}^{R} ( \tilde{Y}_{p|D}(t, i, j, k, r) - \hat{Y}_{p|D}(t, i, j, k) )^2,

which has a \chi^2_R distribution. Therefore, conditioned on Y, the distribution of

( Y_p(t, i, j, k) - \hat{Y}_{p|D}(t, i, j, k) ) / \sqrt{ \sum_{r=1}^{R} ( \tilde{Y}_{p|D}(t, i, j, k, r) - \hat{Y}_{p|D}(t, i, j, k) )^2 / R }

is t_R. The individual prediction interval of Y_p(t, i, j, k) is defined as

( \hat{Y}_{p|D}(t, i, j, k) - t_{R, 1-\alpha/2} SE, \hat{Y}_{p|D}(t, i, j, k) + t_{R, 1-\alpha/2} SE ),

where SE = \sqrt{ \sum_{r=1}^{R} ( \tilde{Y}_{p|D}(t, i, j, k, r) - \hat{Y}_{p|D}(t, i, j, k) )^2 / R }. In practice R will typically be small (e.g. R = 2), since a larger R would require more computational effort. The key to the computational efficiency of the prediction method developed in this paper is the parametric model for the predicted output, which permits the use of a relatively small number of simulations \tilde{Y}_{p|D} in conjunction with the point prediction \hat{Y}_{p|D}. The above point prediction (along with the prediction intervals) defines the fast statistical surrogate.

3.3 Validation

The following validation measures for prediction are considered here: the root mean square error

RMSE = \sqrt{ (1 / (P M S_1 S_2 S_3)) \sum_{p,t,i,j,k=1}^{P,M,S_1,S_2,S_3} ( Y_p(t, i, j, k) - \hat{Y}_{p|D}(t, i, j, k) )^2 },

the maximum absolute error

MaxErr = max_{p,t,i,j,k} | Y_p(t, i, j, k) - \hat{Y}_{p|D}(t, i, j, k) |,

and the actual coverage of prediction intervals with a certain nominal coverage (e.g. 95%)

COVER = (1 / (P M S_1 S_2 S_3)) \sum_{p,t,i,j,k=1}^{P,M,S_1,S_2,S_3} \delta[ Y_p(t, i, j, k) \in I_{p,t,i,j,k} ],

where I_{p,t,i,j,k} is the individual prediction interval of Y_p(t, i, j, k) at each point (p, t, i, j, k), and \delta is the indicator function. The coarse numerical solution computed on the coarse grid, denoted by X below, and the prediction of the fine grid output data use comparable computational resources (the computational times are given in the next section). Therefore the coarse numerical solution will be used as a computational competitor for the prediction, in order to compare their accuracy with respect to the validation measures defined above. This coarse numerical solution is faster but less accurate than the fine grid numerical solution, when compared on the coarse grid.
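The three validation measures can be computed directly from flattened arrays of true values, predictions, and interval endpoints. A minimal sketch (argument names illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over all (p, t, i, j, k) points."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def max_err(y_true, y_pred):
    """Maximum absolute prediction error."""
    return np.max(np.abs(y_true - y_pred))

def coverage(y_true, lower, upper):
    """Fraction of points whose individual prediction interval contains the truth."""
    return np.mean((y_true >= lower) & (y_true <= upper))
```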
The validation measures RMSE and MaxErr for the coarse numerical solution are defined similarly, with X instead of \hat{Y}_{p|D}. The coarse numerical solution on the coarse grid is computed as

X(t+1, i, j, k) = X(t, i, j, k) + \Delta_t {
  [ D'_p(i, j, k)(X(t, i+1, j, k) - X(t, i, j, k)) - D'_p(i-1, j, k)(X(t, i, j, k) - X(t, i-1, j, k)) ] / \Delta^2
+ [ D'_p(i, j, k)(X(t, i, j+1, k) - X(t, i, j, k)) - D'_p(i, j-1, k)(X(t, i, j, k) - X(t, i, j-1, k)) ] / \Delta^2
+ [ D'_p(i, j, k)(X(t, i, j, k+1) - X(t, i, j, k)) - D'_p(i, j, k-1)(X(t, i, j, k) - X(t, i, j, k-1)) ] / \Delta^2
+ \rho^p X(t, i, j, k) },

t = 0, ..., M - 1, i = 2, ..., S_1 + 1, j = 2, ..., S_2 + 1 and k = 2, ..., S_3 + 1.

4 Analysis of the brain tumor computer experiment

The computer model has been run for the D = 15 input vectors (D_g, ρ) shown in Figure 4 ('o') and the output data Y have been recorded. The approximate local truncation error data T have been obtained as in Section 3.1 and the statistical model presented there has been fitted. The variance parameters will be fixed at their maximum likelihood values \hat{\theta}_{Dg} = 0.77, \hat{\theta}_{\rho} = 6.76 and \hat{\sigma}^2 = 0.4 throughout the rest of the analysis. Figure 5 shows contour plots of the correlations for each input dimension, the correlation of the first input dimension decreasing much more slowly with distance than the correlation of the second input dimension.

To demonstrate the quality of the statistical prediction, the fine grid computer model has been run at the P = 12 new inputs (D_g^p, \rho^p) denoted by '+' in Figure 4, the output data Y retained on the coarse grid have been obtained, and their predicted values, along with the validation measures presented in Section 3.3, have been computed. (For applications where new runs cannot be obtained, one could use cross-validation methods.) To compare the results, the coarse numerical solution X along with the corresponding RMSE and MaxErr measures have also been computed. This coarse numerical solution takes about 2 minutes per run. The computation of \hat{T}_{p|D} takes about 3 minutes. The iterative computation of the point prediction \hat{Y}_{p|D} alone takes about 2 minutes, and the computation of the R = 2 statistical simulations \tilde{Y}_{p|D} takes about 4 minutes. The computational times for the coarse numerical solution X and for the prediction are comparable, given that the computational time of the fine grid numerical solution is measured in hours. However, as one can see from Table 1, the statistical prediction is much more accurate than the coarse numerical solution. (In Table 1, the index p of the accuracy measures stands for statistical prediction and c for coarse numerical solution.) As the number D of sampled input parameters increases, the accuracy of the statistical prediction is expected to increase even further; however, this requires more fine-grid computer model runs and therefore additional computational resources. The individual prediction intervals appear to over-cover the true output values for both the 95% and 99% nominal values, although the over-coverage seems to be smaller for the 99% nominal value. Increasing the number of simulations appears to improve slightly the actual coverage (e.g. for R = 10 the actual coverage was .972 and .920 for the 95% and 99% nominal coverage, respectively).
However, this requires increased computational effort, and in practice one needs to balance the accuracy and the computational efficiency of the statistical surrogate.

Figure 5. Contour plots of the correlation matrices for each input dimension (D_g left panel, ρ right panel).

Table 1. Results. RMSE_p/RMSE_c, MaxErr_p/MaxErr_c, COVER 95% (99%) (.947)

The above RMSE and MaxErr are overall measures with respect to all P = 12 new inputs. In order to assess their performance at individual new inputs, the RMSE and MaxErr measures at each of the P = 12 new inputs were also computed, by taking P = 1 in the formulas presented in Section 3.3. Figure 6 contains boxplots of the differences of the logs of these 12 values, showing that the statistical prediction validation measures RMSE_p and MaxErr_p are smaller than their coarse numerical solution counterparts RMSE_c and MaxErr_c at each of the 12 new inputs, although for one new input, (D_g, ρ) = (0.002, ·), the values of MaxErr_p and MaxErr_c are almost equal. The analysis shows that the statistical prediction has achieved greater accuracy than a computationally comparable numerical solution computed with the original computer model.

Figure 6. Boxplots of the differences of log RMSE (left) and of log MaxErr (right) at the individual P = 12 prediction inputs, for the point prediction and the coarse numerical solution.

5 Conclusions

This paper has presented a computationally efficient method for constructing fast statistical surrogates for a computationally intensive dynamical 3D computer model of brain tumor growth and invasion. The method consisted of sampling a small number of input parameters, running the computer code for each input and obtaining the output data sets. An appropriate statistical model for the output data sets was estimated, and a multidimensional kriging method was used to predict the output data sets at new inputs. The prediction defined the fast statistical surrogates. The statistical model followed closely the finite difference relationship underpinning the computer model. The method was tested at a new set of inputs, and the statistical surrogate was more accurate than a computationally comparable coarse numerical solution generated with the computer model. Ultimately, the interest is in developing efficient methods for early detection and treatment of brain tumors. Mathematical models for surgical resection and chemotherapy treatment of brain tumors have already been developed (Murray 2003, Sections 11.9-11.10). With these models, one can investigate the effects of surgical resection and/or chemotherapy on the subsequent evolution of the brain tumor. However, the resulting dynamical 3D computer models are even more complex and computationally intensive than the computer model discussed in this paper, therefore precluding scientists from conducting large-size computer experiments. The statistical methodology presented in this paper, with appropriate modifications, may be used to reduce the computational effort and increase the size of those experiments.
Appendix

This Appendix derives the prediction distribution for the virtual tumor data set, and outlines some computational issues and ways to overcome them. The output data in vector format at each of the sampled input vectors can be written as Y_i = \mu_i + A_i T_i, following from the iterative linear relationship between Y and T. The matrices A_i are lower triangular. In vector format, Y = \mu + A T, with Y = [Y_1', ..., Y_D']', \mu = [\mu_1', ..., \mu_D']', T = [T_1', ..., T_D']' and A = diag(A_1, ..., A_D) (A a block diagonal matrix). The covariance matrix of Y is cov(Y) = \sigma^2 A (C_D \otimes I_{M S_1 S_2 S_3}) A', with cov(Y_i, Y_j) = \sigma^2 C_D(i, j) A_i A_j'. Consider now a new input (D_g^p, \rho^p). For true but unknown statistical parameters, the covariance between the output data at the new input and the output data at the sampled inputs is

cov(Y_p, Y) = \sigma^2 [ C_p(1) A_p A_1', ..., C_p(D) A_p A_D' ] = \sigma^2 A_p (C_p' \otimes I_{M S_1 S_2 S_3}) A'.

Therefore, the point prediction of Y_p is the conditional mean

Y_{p|D} = \mu_p + cov(Y_p, Y) cov(Y)^{-1} (Y - \mu)
        = \mu_p + A_p (C_p' \otimes I_{M S_1 S_2 S_3}) A' [A']^{-1} (C_D^{-1} \otimes I_{M S_1 S_2 S_3}) A^{-1} A T
        = \mu_p + A_p (C_p' C_D^{-1} \otimes I_{M S_1 S_2 S_3}) T
        = \mu_p + A_p T_{p|D},

which justifies the use of

the iterative relationship to obtain Ŷ_{p|D} in Section 3.2. The prediction covariance matrix is the conditional covariance matrix

cov(Y_p, Y_p) - cov(Y_p, Y) cov(Y)^{-1} cov(Y_p, Y)'
  = σ^2 A_p A_p' - σ^2 A_p (C_p' ⊗ I_{M S_1 S_2 S_3}) A' [A'^{-1} (C_D^{-1} ⊗ I_{M S_1 S_2 S_3}) A^{-1}] A (C_p ⊗ I_{M S_1 S_2 S_3}) A_p'
  = σ^2 A_p A_p' - σ^2 A_p (C_p' C_D^{-1} C_p) A_p'
  = σ^2 (1 - C_p' C_D^{-1} C_p) A_p A_p'.

The computation of the prediction covariance matrix requires the product of two matrices of size (M S_1 S_2 S_3) × (M S_1 S_2 S_3) and is therefore impractical. The computation of theoretical individual prediction intervals, which involves only the main diagonal of the product A_p A_p', is also impractical. A more computationally practical way to obtain prediction intervals is through simulation from the prediction conditional distribution, which can be done iteratively as explained in Section 3.2.

Acknowledgments

This research has been supported in part by an Oakland University Provost's New Faculty Research Fellowship.

References

[1] Bayarri, M.J., Berger, J.O., Paulo, R., Sacks, J., Cafeo, J.A., Cavendish, J., Lin, C.H. and Tu, J. (2007), A Framework for Validation of Computer Models, Technometrics, 49.

[2] Bayarri, M.J., Walsh, D., Berger, J.O., Cafeo, J., Garcia-Donato, G., Liu, F., Palomo, J., Parthasarathy, R.J., Paulo, R. and Sacks, J. (2007), Computer Model Validation with Functional Output, Annals of Statistics, 35.

[3] Collins, D.L., Zijdenbos, A.P., Kollokian, V., Sled, J.G., Kabani, N.J., Holmes, C.J. and Evans, A.C. (1998), Design and Construction of a Realistic Digital Brain Phantom, IEEE Transactions on Medical Imaging, 17.

[4] Cressie, N.A. (1993), Statistics for Spatial Data, Wiley, New York.

[5] Currin, C., Mitchell, T., Morris, M. and Ylvisaker, D. (1991), Bayesian Prediction of Deterministic Functions, with Applications to the Design and Analysis of Computer Experiments, Journal of the American Statistical Association, 86.

[6] Drignei, D. (2006), Empirical Bayesian Analysis for High-Dimensional Computer Output, Technometrics, 48.

[7] Drignei, D. and Morris, M.D.
(2006), Empirical Bayesian Analysis for Computer Experiments Involving Finite-Difference Codes, Journal of the American Statistical Association, 101.

[8] Fang, K.T., Li, R. and Sudjianto, A. (2006), Design and Modeling for Computer Experiments, Chapman and Hall.

[9] Higdon, D., Gattiker, J., Williams, B. and Rightley, M. (2007), Computer Model Validation Using High-Dimensional Outputs, in Bayesian Statistics 8, eds. Bernardo, J., Bayarri, M.J., Dawid, A.P., Berger, J.O., Heckerman, D., Smith, A.F.M. and West, M., Oxford University Press, London.

[10] Johnson, M., Moore, L. and Ylvisaker, D. (1990), Minimax and Maximin Distance Designs, Journal of Statistical Planning and Inference, 26.

[11] Morris, M.D., Mitchell, T.J. and Ylvisaker, D. (1993), Bayesian Design and Analysis of Computer Experiments: Use of Derivatives in Surface Prediction, Technometrics, 35.

[12] Murray, J.D. (2003), Mathematical Biology II: Spatial Models and Biomedical Applications, Springer, New York.

[13] Sacks, J., Welch, W.J., Mitchell, T.J. and Wynn, H.P. (1989), Design and Analysis of Computer Experiments, Statistical Science, 4.

[14] Santner, T.J., Williams, B.J. and Notz, W.I. (2003), The Design and Analysis of Computer Experiments, Springer, New York.

[15] Swanson, K.R., Alvord, E.C. Jr. and Murray, J.D. (2000), A Quantitative Model for Differential Motility of Gliomas in Grey and White Matter, Cell Prolif, 33.

[16] Swanson, K.R., Alvord, E.C. Jr. and Murray, J.D. (2004), Dynamics of a Model for Brain Tumors Reveals a Small Window for Therapeutic Intervention, Discrete and Continuous Dynamical Systems - B, 4.

[17] Thomas, J.W. (1995), Numerical Partial Differential Equations: Finite Difference Methods, Springer, New York.

[18] Tracqui, P., Cruywagen, G.C., Woodward, D.E., Bartoo, G.T., Murray, J.D. and Alvord, E.C. Jr. (1995), A Mathematical Model of Glioma Growth: The Effect of Chemotherapy on Spatio-Temporal Growth, Cell Prolif, 28.


More information

Anale. Seria Informatică. Vol. XIII fasc Annals. Computer Science Series. 13 th Tome 1 st Fasc. 2015

Anale. Seria Informatică. Vol. XIII fasc Annals. Computer Science Series. 13 th Tome 1 st Fasc. 2015 24 CONSTRUCTION OF ORTHOGONAL ARRAY-BASED LATIN HYPERCUBE DESIGNS FOR DETERMINISTIC COMPUTER EXPERIMENTS Kazeem A. Osuolale, Waheed B. Yahya, Babatunde L. Adeleke Department of Statistics, University of

More information

Two Issues in Using Mixtures of Polynomials for Inference in Hybrid Bayesian Networks

Two Issues in Using Mixtures of Polynomials for Inference in Hybrid Bayesian Networks Accepted for publication in: International Journal of Approximate Reasoning, 2012, Two Issues in Using Mixtures of Polynomials for Inference in Hybrid Bayesian

More information

CHAPTER 3 Further properties of splines and B-splines

CHAPTER 3 Further properties of splines and B-splines CHAPTER 3 Further properties of splines and B-splines In Chapter 2 we established some of the most elementary properties of B-splines. In this chapter our focus is on the question What kind of functions

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Superiorized Inversion of the Radon Transform

Superiorized Inversion of the Radon Transform Superiorized Inversion of the Radon Transform Gabor T. Herman Graduate Center, City University of New York March 28, 2017 The Radon Transform in 2D For a function f of two real variables, a real number

More information

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets

Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets Parameter Estimation in the Spatio-Temporal Mixed Effects Model Analysis of Massive Spatio-Temporal Data Sets Matthias Katzfuß Advisor: Dr. Noel Cressie Department of Statistics The Ohio State University

More information

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation

Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation Choosing the Summary Statistics and the Acceptance Rate in Approximate Bayesian Computation COMPSTAT 2010 Revised version; August 13, 2010 Michael G.B. Blum 1 Laboratoire TIMC-IMAG, CNRS, UJF Grenoble

More information

1 Mixed effect models and longitudinal data analysis

1 Mixed effect models and longitudinal data analysis 1 Mixed effect models and longitudinal data analysis Mixed effects models provide a flexible approach to any situation where data have a grouping structure which introduces some kind of correlation between

More information

Calibrating Environmental Engineering Models and Uncertainty Analysis

Calibrating Environmental Engineering Models and Uncertainty Analysis Models and Cornell University Oct 14, 2008 Project Team Christine Shoemaker, co-pi, Professor of Civil and works in applied optimization, co-pi Nikolai Blizniouk, PhD student in Operations Research now

More information

Directed acyclic graphs and the use of linear mixed models

Directed acyclic graphs and the use of linear mixed models Directed acyclic graphs and the use of linear mixed models Siem H. Heisterkamp 1,2 1 Groningen Bioinformatics Centre, University of Groningen 2 Biostatistics and Research Decision Sciences (BARDS), MSD,

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information

Gaussian Processes 1. Schedule

Gaussian Processes 1. Schedule 1 Schedule 17 Jan: Gaussian processes (Jo Eidsvik) 24 Jan: Hands-on project on Gaussian processes (Team effort, work in groups) 31 Jan: Latent Gaussian models and INLA (Jo Eidsvik) 7 Feb: Hands-on project

More information

The Nature of Geographic Data

The Nature of Geographic Data 4 The Nature of Geographic Data OVERVIEW Elaborates on the spatial is special theme Focuses on how phenomena vary across space and the general nature of geographic variation Describes the main principles

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Minimum Error Rate Classification

Minimum Error Rate Classification Minimum Error Rate Classification Dr. K.Vijayarekha Associate Dean School of Electrical and Electronics Engineering SASTRA University, Thanjavur-613 401 Table of Contents 1.Minimum Error Rate Classification...

More information

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS

UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS UNIFORMLY MOST POWERFUL CYCLIC PERMUTATION INVARIANT DETECTION FOR DISCRETE-TIME SIGNALS F. C. Nicolls and G. de Jager Department of Electrical Engineering, University of Cape Town Rondebosch 77, South

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information