Bootstrap confidence intervals for reservoir model selection techniques


Céline Scheidt and Jef Caers
Department of Energy Resources Engineering, Stanford University

Abstract

Stochastic spatial simulation allows rapid generation of multiple, alternative realizations of spatial variables. Quantifying the uncertainty on a response resulting from those multiple realizations would require the evaluation of a transfer function on every realization. This is not possible in real applications, where a single transfer function evaluation may be very time consuming (several hours to several days). One must therefore select a few representative realizations for transfer function evaluation and then derive the production statistics of interest (typically the P10, P50 and P90 quantiles of the response). By selecting only a few realizations, one risks biasing the P10, P50 and P90 estimates compared to those of the original set of realizations. The principal objective of this study is to develop a methodology to quantify confidence intervals for the estimated P10, P50 and P90 quantiles when only a few models are retained for response evaluation. Our approach is to use the parametric bootstrap technique, which allows evaluation of the variability of the statistics obtained from uncertainty quantification and construction of confidence intervals. A second objective is to compare the confidence intervals obtained with two selection methods available to quantify uncertainty given a set of geostatistical realizations: the traditional ranking technique and the distance-based kernel clustering method (DKM). The DKM has been developed recently and has been shown to be effective in quantifying uncertainty. The methodology is demonstrated using two examples. The first is a synthetic example, which uses bi-normal variables and serves to demonstrate the technique. The second is from an oil field in West Africa, where the uncertain variable is the cumulative oil production coming from 20 wells.
The results show that, for the same number of transfer function evaluations, the DKM method yields equal or smaller errors and equal or narrower confidence intervals than ranking.

1. Introduction

Uncertainty quantification of subsurface spatial phenomena is done in the context of decision making, often by estimating low, mean and high quantile values (typically P10, P50 and P90) of the response of interest. Often, an exhaustive sampling of all uncertain parameters is unfeasible, and only a small subset of reservoir model realizations of the phenomena can be created. Due to high computational requirements, the transfer function must be evaluated on an even smaller subset of realizations. Therefore, any quantiles estimated from this subset are themselves subject to uncertainty, and may vary depending on the selection method, the number of transfer function evaluations, the initial set of realizations, the use of a proxy response, etc. The objective of this study is to quantify confidence intervals for the estimated P10, P50 and P90 quantiles when only a few models are retained for response evaluation. The magnitude of the confidence intervals can then be used to decide whether or not more flow simulations are required to establish a better quantification of response uncertainty. The methodology developed uses the parametric bootstrap technique, a statistical method for constructing confidence intervals of estimated statistics. Such confidence intervals provide an idea of the variability of the statistics inferred by selecting only a few models for evaluation. The workflow can be applied with any technique of reservoir model selection. In this paper, we compare the behavior of the estimated quantiles using three different selection techniques. The first method is the traditional ranking technique (Ballin et al., 1992), which selects realizations according to a ranking measure. The second method has been developed recently and is called the distance-kernel method (DKM; Scheidt and Caers, 2008). Finally, we use a random selection for comparison.
It should be noted that the proposed bootstrap technique applies to any model selection methodology. The paper is organized as follows. In the next section, we give a description of the two methods employed to quantify uncertainty in spatial parameters. Then, we give a brief overview of the basic ideas of the bootstrap methodology in the context of parametric inference, illustrated by a typical example. We then describe our workflow, which is applied to cases where we have a proxy response that can be evaluated rapidly for each realization, and a true response that cannot be evaluated for every realization. The subsequent section is devoted to the application of the workflow to two examples, the first being a synthetic example and the second an example from an oil field in West Africa. Finally, we discuss the results obtained and offer some concluding remarks.

2. Quantification of uncertainty methodologies

Uncertainty quantification of a spatial phenomenon aims at characterizing the

statistics (P10, P50 and P90) of the response(s) of interest. In real applications where one transfer function evaluation can be very time consuming, it may not be possible to perform a transfer function evaluation on every realization of the reservoir. This difficulty can be overcome by selecting a representative set of realizations from the initial set. In this paper, we consider two different ways of selecting realizations for transfer function evaluation. The first method is the traditional ranking technique, introduced by Ballin et al. in 1992. The second method, denoted the Distance-Kernel Method (DKM), is more recent; it was first presented in Scheidt and Caers (2008) and applied to a real case in Scheidt and Caers (2009).

2.1. Traditional ranking technique

The traditional ranking technique was introduced by Ballin in 1992 in the context of stochastic reservoir modeling. The basic idea behind ranking is to define a rapidly calculable ranking measure, which can be evaluated for each realization. Most of the time, the ranking measure is static (e.g. original oil-in-place); however, more recent studies employ more complex measures, such as connectivity-based (McLennan and Deutsch, 2005), streamline-based (Gilman et al., 2002) or tracer-based measures (Ballin et al., 1992; Saad et al., 1996). The ranking measure acts as a proxy of the response of interest for each realization. To be effective, therefore, ranking requires a good correlation between the ranking measure and the response. The realizations are ranked according to the measure, and the realizations corresponding typically to the P10, P50 and P90 quantiles are subsequently selected. Full flow simulation is then performed on these selected realizations, and the P10, P50 and P90 values are derived from the simulation results. In previous work (Scheidt and Caers, 2009), we showed that selecting only 3 realizations to derive the P10, P50 and P90 quantiles can result in very inaccurate estimations.
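As a sketch, the classical ranking selection just described can be written in a few lines. The proxy values below are synthetic stand-ins for a real ranking measure such as original oil-in-place, and `rank_select` is a hypothetical helper, not code from the paper:

```python
import random

def rank_select(proxies, quantiles=(0.10, 0.50, 0.90)):
    """Classical ranking: sort realizations by a cheap proxy measure and
    pick the realizations sitting at the requested quantile ranks."""
    order = sorted(range(len(proxies)), key=lambda i: proxies[i])
    n = len(order)
    # index of the realization closest to each quantile of the ranking measure
    return [order[min(n - 1, round(q * (n - 1)))] for q in quantiles]

# Toy illustration: 100 realizations with a random static proxy value each.
random.seed(42)
proxies = [random.gauss(50.0, 10.0) for _ in range(100)]
selected = rank_select(proxies)
# Only these 3 realizations would be sent to the (expensive) flow simulator;
# their simulated responses are then reported as the P10/P50/P90 estimates.
print(selected)
```

The key (and, as noted above, risky) simplification is that the quantiles of the proxy are assumed to map onto the quantiles of the true response.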
In this study, contrary to the standard ranking approach, we propose to select more than 3 realizations, and compare ranking with the Distance-Kernel Method presented below. The realizations are selected equally spaced according to the ranking measure, and we derive the P10, P50 and P90 quantiles by interpolation from the distribution of the selected points.

2.2. Distance-Kernel Method

In this section, we describe the main principle of the Distance-Kernel Method (DKM), illustrated in Figure 1. Starting from a large number of model realizations, the first step is to define a dissimilarity distance between the realizations. This distance is a measure of the dissimilarity between any two realizations, and should be tailored to the application and the response(s) of interest (just as in ranking), in order to make uncertainty quantification more efficient. The distance is evaluated between every pair of realizations, and a dissimilarity distance table (N_R x N_R) is derived. Multi-dimensional scaling (MDS) is then applied using the distance table (Borg and

Groenen, 1997). This results in a map (usually 2D or 3D) of the realizations, in which the Euclidean distance between any two realizations approximates the distance in the table. Note that only the distances between the realizations in the new space matter - the actual positions of the realizations are irrelevant. Once the realizations are in MDS space, one could classify realizations and select a subset using clustering techniques. However, the points in MDS space often do not vary linearly, and thus classical clustering methods would result in inaccurate classification. To overcome the nonlinear variation of the points, Schölkopf et al. (2002) introduced kernel methods to improve clustering results. The main idea behind kernel methods is to introduce a highly non-linear function Φ and map the realizations from the MDS space to a new space, called the feature space. The high dimensionality of that space makes the points behave more linearly, and thus standard classification tools, such as clustering, can be applied more successfully. In this paper, we employ kernel k-means to select realizations representative of the entire set. Transfer function evaluation is then applied to the realization closest to each centroid, and the statistics (P10, P50 and P90) are computed on the small subset of realizations.

Figure 1: DKM for uncertainty quantification: (a) distance between two models, (b) distance matrix, (c) models mapped in Euclidean space, (d) feature space, (e) pre-image construction, (f) P10, P50, P90 estimation
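A minimal sketch of the selection step, under the simplification used later in this paper: when the distance is the difference in proxy values, the MDS map is effectively one-dimensional, so plain k-means on the proxy axis stands in here for the kernel k-means of the actual method (the kernel mapping Φ is deliberately omitted):

```python
import random

def dkm_select(proxies, k, iters=50, seed=0):
    """Simplified DKM sketch: with d(i, j) = |y_i - y_j| the MDS map is just
    the proxy axis itself, so we cluster the 1-D coordinates with plain
    k-means (the paper uses kernel k-means in feature space) and keep the
    realization nearest to each cluster centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(proxies, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for y in proxies:
            j = min(range(k), key=lambda c: abs(y - centroids[c]))
            clusters[j].append(y)
        # recompute centroids; keep the old one if a cluster went empty
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    # the realization closest to each centroid gets the flow simulation
    return sorted(min(range(len(proxies)), key=lambda i: abs(proxies[i] - m))
                  for m in centroids)

random.seed(1)
proxies = [random.gauss(50.0, 10.0) for _ in range(100)]
picked = dkm_select(proxies, k=5)
print(picked)
```

Unlike ranking, which pins selections to fixed quantile ranks, this clustering view adapts the selected models to where the realizations actually concentrate.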

For more details about the methodology, we refer to Scheidt and Caers (2008).

3. Parametric Bootstrap Methodology

3.1. General introduction to the bootstrap

Bootstrap methods fall within the broader category of resampling methods. The concept of the bootstrap was first introduced by Efron (1979). In his paper, Efron considered two types of bootstrap procedures (nonparametric and parametric inference). The bootstrap is a Monte-Carlo simulation technique that uses sampling theory to estimate the standard error and the distribution of a statistic. In many recent statistical texts, bootstrap methods are recommended for estimating sampling distributions and finding standard errors and confidence intervals. A bootstrap procedure estimates properties of an estimator (such as its variance) by measuring those properties when sampling from an approximate distribution. In the parametric bootstrap, we consider the unknown distribution F to be a member of some prescribed parametric family and obtain the empirical distribution F̂_n by estimating the parameters of the family from the data. A new random sequence, called a resample, is then generated from the distribution F̂_n. The parametric bootstrap procedure works as follows. First, the statistics θ̂ of the distribution of the initial sample are computed (for example, the mean and variance). Then, the distribution F̂_n is estimated using those statistics. We assume that F̂_n is the true distribution and use Monte-Carlo simulation to generate B new samples from F̂_n. Next, we apply the same estimation technique to these bootstrapped data to obtain a total of B bootstrap estimates of θ̂, denoted θ̂*_b, b = 1, ..., B. Using these B bootstrap estimates, we can compute confidence intervals or any other statistical measure of error.

Simple illustrative example

A simple example illustrating the parametric bootstrap method is presented in Figure 2.
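The generic parametric bootstrap loop just described can be sketched as follows, here for the mean of a sample assumed Gaussian. The value of B, the seed and the 90% percentile levels are illustrative choices, not values from the paper:

```python
import random
import statistics

def parametric_bootstrap(sample, B=1000, seed=0):
    """Parametric bootstrap for a sample assumed Gaussian: fit (mu, sigma),
    resample B times from the fitted N(mu, sigma), and re-estimate the
    statistic (here, the mean) on every resample."""
    rng = random.Random(seed)
    mu_hat = statistics.fmean(sample)
    sigma_hat = statistics.pstdev(sample)   # MLE-style (1/n) estimate
    boot_means = []
    for _ in range(B):
        resample = [rng.gauss(mu_hat, sigma_hat) for _ in sample]
        boot_means.append(statistics.fmean(resample))
    return mu_hat, boot_means

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(50)]
mu_hat, boot_means = parametric_bootstrap(data)
# 90% bootstrap percentile interval of the mean
boot_means.sort()
lo = boot_means[int(0.05 * len(boot_means))]
hi = boot_means[int(0.95 * len(boot_means)) - 1]
print(round(mu_hat, 2), round(lo, 2), round(hi, 2))
```

The histogram of `boot_means` is exactly the kind of bootstrap distribution shown in Figure 2 for the mean.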
Suppose we have N_R = 5 values X = (x_1, ..., x_{N_R}) from a normal distribution N(µ, σ) and we are interested in the estimation of the unknown parameters µ and σ. The first step is to assume that X has a normal distribution F_n and then to estimate the mean and variance of the distribution:

\hat{\mu} = \bar{x} \quad \text{and} \quad \hat{\sigma}^2 = \frac{1}{N_R} \sum_{i=1}^{N_R} \left( x_i - \bar{x} \right)^2

We assume that µ̂ and σ̂ are the true parameters and we generate B new samples X*_b (b = 1, ..., B) from F̂_n = N(µ̂, σ̂) using Monte-Carlo simulation, each sample containing N_R = 5 values. For each sample, the bootstrap estimates of the mean and variance of the distribution can be calculated:

\hat{\mu}^*_b = \bar{x}^*_b \quad \text{and} \quad \hat{\sigma}^{*2}_b = \frac{1}{N_R} \sum_{i=1}^{N_R} \left( x^*_i - \bar{x}^*_b \right)^2

Having computed θ̂*_b = (µ̂*_b, σ̂*_b), one can construct histograms of the mean and the variance to display the probability distribution of the bootstrap estimator (Figure 2). From these distributions, one can obtain an idea of the statistical properties of the estimates µ̂ and σ̂. In Figure 2, the red line represents the estimates of the mean µ̂ and variance σ̂ of the initial sample.

Figure 2: Application of the parametric bootstrap procedure to a simple example

The histograms of the bootstrap estimates of the mean and the variance are informative about the variability of the statistics obtained. Confidence intervals of the estimated mean and variance (or any quantiles) can then be calculated from the B estimates of the mean and variance. The next section shows how to apply the bootstrap method in the context of uncertainty quantification, where a proxy value can be rapidly calculated for many realizations of a spatial phenomenon.

3.2. Workflow of the study

Contrary to the previous example, where the data are univariate, in the context of reservoir model selection techniques a proxy response is employed for the selection

using DKM or ranking, and thus two variables are necessary: the response of interest and the proxy response. Therefore, we consider a bivariate variable X = [X_1, X_2, ..., X_{N_R}], where:

X_i = [x_i, y_i], i = 1, ..., N_R, with N_R the total number of samples/realizations;
x_i represents the response of interest (e.g. cumulative oil production);
y_i represents the proxy response, which will serve as a ranking measure or be transformed into a distance.

Note that for ranking and DKM to be effective, the response and its proxy should be reasonably well correlated. In addition, for real applications, the values of the true response x_i for each realization are unknown. In traditional uncertainty quantification, the proxy response serves as a guide to select a few realizations, which are then evaluated using the transfer function. The response quantiles are then deduced from the evaluation of those realizations. Since the resulting quantiles are subject to uncertainty, the bootstrap method illustrated previously is well suited to the problem: it can inform us on the accuracy of the estimated quantiles and give an idea of the error resulting from the selection of a small subset of realizations. The workflow in the context of reservoir model selection is as follows; it is illustrated in Figure 3.

1. Evaluate a proxy response y_i for each of the i = 1, ..., N_R realizations.
2. Apply ranking or DKM using the proxy response in order to select N samples/realizations for evaluation (note that N << N_R). Compute the statistics of interest - P10, P50 and P90 in the case of uncertainty quantification: θ̂* = (x̂*_{P10}, x̂*_{P50}, x̂*_{P90}).
3. Assume that the distribution of X is a member of some parametric family and fit a bivariate parametric model F̂_n by estimating the family parameters from the selected data.
4. Assume that F̂_n is the true distribution and use Monte-Carlo simulation to generate B new samples from this parametric model F̂_n.
For each of the B samples generated, apply ranking or DKM to select N realizations and compute the statistics of interest: θ̂* = (x̂*_{P10}, x̂*_{P50}, x̂*_{P90}).
5. From the B samples of θ̂, compute the confidence intervals on any statistics of interest. One way to estimate confidence intervals from bootstrap samples is to take the α and 1−α quantiles of the estimated values (α = 0.1 in this study). These are called bootstrap percentile

intervals.
6. A single measure of accuracy of the quantile estimation is defined by computing the dimensionless bootstrap error of the estimated quantiles for each of the B new samples created (Eq. 1):

\text{error} = \frac{1}{3} \left( \frac{\left| \hat{x}_{P10} - \hat{x}^*_{P10} \right|}{\hat{x}_{P10}} + \frac{\left| \hat{x}_{P50} - \hat{x}^*_{P50} \right|}{\hat{x}_{P50}} + \frac{\left| \hat{x}_{P90} - \hat{x}^*_{P90} \right|}{\hat{x}_{P90}} \right) \quad (1)

The bootstrap error of the estimated quantiles is evaluated on each sample, and can thus be represented as a histogram to visualize the variability between the samples. From the histogram, we can quantify the variation of the bootstrap error of the estimated quantiles.

Figure 3: Workflow of the bootstrap method applied to uncertainty quantification

The workflow described previously and illustrated in Figure 3 is performed for several values of N, where N is the number of realizations selected for evaluation. This is done to evaluate the influence of the number of transfer function evaluations on the accuracy of the quantile estimation. For each value of N, the selected realizations are obtained using the DKM or ranking methods, and therefore the realizations are different for each value of N. Now that the basic idea and theory of the bootstrap method have been presented,

the next section shows some application examples of this technique in the context of uncertainty quantification.

4. Application of the methodology to uncertainty quantification

Two examples are presented in this section. The first one is illustrative and uses a bivariate Gaussian distribution. The second example is more complex: it is based on a real oil field reservoir in West Africa (the West Coast African reservoir) and uses real production data. In the case of the DKM, the definition of a distance between any two realizations is required. In this study, in order to compare the results of the DKM with those obtained by ranking using the exact same information, we simply use the difference in ranking measure (proxy response) as the distance between realizations. Note however that, as opposed to the ranking measure, the distance can be calculated using a combination of many different measures, and thus has more flexibility to be tailored to the application. We will discuss the consequences of this in more detail below.

4.1. Bivariate Gaussian distribution

In the first example, we consider a bivariate Gaussian distribution: X ~ N_2(µ, Σ), where µ represents the mean and Σ the covariance matrix. In this example, the mean of the sample is taken as µ = [5, 5], and the covariance is taken as:

\Sigma = \begin{pmatrix} 2 & 2\rho \\ 2\rho & 2 \end{pmatrix}

The parameter ρ defines the correlation coefficient between the target response and the proxy response. To set up an example, an initial sample X of N_R values is generated using the distribution given above. Note that for this illustrative example, we use the term sample instead of realization, since no geostatistical realization is associated with each bivariate value. Figure 4 shows an example of the probability density plot of the binormal sample X, where the correlation coefficient between the target and proxy responses was defined as ρ = 0.9.
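A sketch of how such a correlated bi-Gaussian sample can be generated from two independent standard normals (a Cholesky-type construction; the sample size and seed below are illustrative, not taken from the paper):

```python
import math
import random

def sample_bivariate(n, mu=(5.0, 5.0), var=2.0, rho=0.9, seed=0):
    """Draw n pairs (x, y) from the bi-Gaussian with mean mu and covariance
    [[var, rho*var], [rho*var, var]], i.e. correlation rho between the
    target response x and the proxy response y."""
    rng = random.Random(seed)
    s = math.sqrt(var)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mu[0] + s * z1                                        # response
        y = mu[1] + s * (rho * z1 + math.sqrt(1 - rho ** 2) * z2) # proxy
        out.append((x, y))
    return out

pairs = sample_bivariate(1000)
xs, ys = zip(*pairs)
# empirical correlation, which should sit close to the input rho = 0.9
n = len(pairs)
mx, my = sum(xs) / n, sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in pairs) / n
sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
r = cov / (sx * sy)
print(round(r, 3))
```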

Figure 4: Probability density of X, which has a bi-normal distribution

Now that the initial data are defined, we assume that we only know the type of distribution of X (bi-normal), but that we do not know the parameters defining the distribution (the mean µ and the covariance Σ). The bootstrap procedure illustrated in Figure 3 is applied by taking the sample X generated previously (Figure 4) and using the DKM to select N = 5 points. Estimates of the mean µ̂ and the covariance Σ̂ are then obtained using the response at the 5 selected points, and the estimated bivariate distribution of the data is assumed to be the true distribution: F̂_n = N_2(µ̂, Σ̂). B new samples of this distribution can then be easily generated, since the distribution is known. Uncertainty quantification is then performed on those B samples, and an estimation of the variability of the quantiles is possible. Examples of the bootstrap histograms of the P10, P50 and P90 quantiles are presented in Figure 5.

Figure 5: Histograms of the P10, P50 and P90 quantiles estimated by bootstrap. The red line represents the estimated quantiles x̂*_{P10}, x̂*_{P50}, x̂*_{P90}. The estimates are calculated using the DKM to select 5 points.
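Steps 3-5 of the workflow can be sketched as below. For brevity, the fit is done on the response coordinate only (a 1-D stand-in for the bivariate fit of the paper), and the resample quantiles are read off directly rather than through a fresh DKM selection; B, n and the seeds are illustrative:

```python
import random
import statistics

def fit_and_resample(selected_xy, B, n, seed=0):
    """Fit a Gaussian to the responses of the selected points, resample B
    data sets of size n from the fit, and collect the bootstrap
    (P10, P50, P90) of each resample."""
    rng = random.Random(seed)
    xs = [x for x, _ in selected_xy]
    mu, sd = statistics.fmean(xs), statistics.pstdev(xs)
    boot = []
    for _ in range(B):
        sample = sorted(rng.gauss(mu, sd) for _ in range(n))
        boot.append(tuple(sample[int(q * (n - 1))] for q in (0.1, 0.5, 0.9)))
    return boot

random.seed(5)
# hypothetical (response, proxy) values of 5 selected realizations
selected = [(random.gauss(5, 1.4), random.gauss(5, 1.4)) for _ in range(5)]
boot = fit_and_resample(selected, B=1000, n=100)
p50s = sorted(t[1] for t in boot)
ci = (p50s[50], p50s[949])   # 90% percentile interval of the P50
print(ci)
```

Histogramming the first and third components of `boot` gives the P10 and P90 distributions of Figure 5.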

We observe in Figure 5 that the distribution of the bootstrap quantiles is Gaussian. In addition, there is a small bias in the estimation of the P10 and P90 quantiles for this example. Although this is not shown, ranking has the same effect. The result is that, on average, x̂_{P10} is overestimated and x̂_{P90} is underestimated. These biased estimates should not affect the determination of the confidence intervals. In our study, we have found that the estimated mean µ̂ and covariance Σ̂ from the initial sample had an impact on the confidence intervals. Since our goal in this first example is to understand the general behavior when varying the number of selected samples, we propose to do a Monte-Carlo bootstrap: we randomize the initial sample and use C sets of initial samples, then perform the workflow illustrated in Figure 3 on each of those C sets. The estimated statistics of each initial sample are averaged to obtain the final statistics. In this study, we take C = 5. In the next few examples, the workflow illustrated in Figure 3 has been performed by varying the number of selected samples (N = 5, 8, 10, 15 and 20, more precisely), in order to examine the effect of the number of transfer function evaluations on the bootstrap error. In addition, several correlation values between the proxy response and the target response were used to explore the influence of the correlation coefficient on the confidence intervals. Results are presented in Figure 6 for ρ = 1.0, 0.9, 0.8, 0.7, 0.6 and 0.5, respectively. Figure 6 shows the confidence intervals of the error of the bootstrap-estimated quantiles for the DKM (blue squares) and ranking (red dots) for different values of N, using B bootstrap samples. The symbols represent the P50 value of the error of the estimated quantiles; in other words, half of the estimated quantiles have an error below this value and half above.
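The bootstrap error of Eq. (1) is straightforward to compute once quantiles are interpolated from a selected subset. The data below are synthetic and the subset is drawn at random rather than by ranking or DKM, purely to illustrate the calculation; `quantile` and `eq1_error` are hypothetical helpers:

```python
import random

def quantile(vals, q):
    """Linear-interpolation quantile of a list of values."""
    s = sorted(vals)
    pos = q * (len(s) - 1)
    i = int(pos)
    frac = pos - i
    return s[i] if i + 1 == len(s) else s[i] * (1 - frac) + s[i + 1] * frac

def eq1_error(ref_q, est_q):
    """Dimensionless bootstrap error of Eq. (1): mean relative deviation of
    the estimated P10/P50/P90 from the reference quantiles."""
    return sum(abs(r - e) / abs(r) for r, e in zip(ref_q, est_q)) / 3.0

random.seed(3)
full = [random.gauss(5.0, 1.5) for _ in range(1000)]   # all realizations
subset = random.sample(full, 10)                       # N = 10 evaluated
ref_q = [quantile(full, q) for q in (0.1, 0.5, 0.9)]
est_q = [quantile(subset, q) for q in (0.1, 0.5, 0.9)]
err = eq1_error(ref_q, est_q)
print(round(err, 3))
```

Repeating this over the B bootstrap samples yields the error histograms whose percentile intervals are plotted in Figure 6.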

Figure 6: Confidence intervals (α = 0.1) of the bootstrap error of the estimated quantiles as a function of the number of transfer function evaluations, for ρ = 1.0, 0.9, 0.8, 0.7, 0.6 and 0.5. The symbols represent the P50 value of the bootstrap error.

We observe in Figure 6 that the error globally decreases as the number of transfer function evaluations increases. Also, the confidence intervals tend to narrow as the number of transfer function evaluations increases, meaning that the error in our estimates decreases. Both methods, DKM and ranking, provide similar results. However, the error obtained with the DKM is slightly smaller than that observed for ranking. The same remark is valid for the confidence intervals. Finally, the results

provided by the DKM vary more smoothly than those obtained by the ranking technique. Note that each method optimally selects N samples for evaluation independently for each N. Therefore, the N = 8 models selected do not necessarily include the N = 5 models; this is true for all N. The bootstrap method can also be used to compute an estimate of the correlation coefficient between the actual response and the proxy response. Figure 7 presents the confidence intervals for the correlation corresponding to the results obtained in Figure 6.

Figure 7: Bootstrap estimates (α = 0.1) of the correlation between the response and the proxy. The black line represents the input correlation (ρ = 0.9, 0.8, 0.7, 0.6 and 0.5) used to generate the first sample.

We observe on Figure 7 that the correlation coefficient tends to be overestimated, especially for small values of N. Figure 7 also shows that the correlation estimate becomes more accurate and less prone to error as the number of transfer function evaluations increases. The next section illustrates the workflow using a real oil reservoir, located in West Africa.

4.2. West Coast African reservoir

Reservoir Description

The West Coast African (WCA) reservoir is a deepwater turbidite offshore reservoir located in a slope valley. The reservoir is located offshore in 6 feet of water and is 46 feet below sea level. Four depositional facies were interpreted from the well logs: shale (Facies 1), poor-quality sand #1 (Facies 2), poor-quality sand #2 (Facies 3) and good-quality channels (Facies 4). The description of the facies filling the slope valley is subject to uncertainty; 2 TIs are used in this case study to represent the uncertainty in the facies representations. The reservoir is produced with 28 wells, of which 20 are production wells and 8 are water injection wells. The locations of the wells are displayed in Figure 8; wells colored in red are producers and wells colored in blue are injectors.

Figure 8: Location of the 28 wells. Red are production wells and blue are injection wells. Different colors in the grid represent different fluid regions.

Geostatistical realizations were created using the multiple-point geostatistical algorithm snesim (Strebelle, 2002). To include spatial uncertainty, two realizations were generated for each combination of TI and facies probability cube, leading to a total of 72 possible realizations of the WCA reservoir. Each flow simulation took 4.5 hours.

In a previous paper (Scheidt and Caers, 2009), uncertainty quantification on the WCA reservoir was performed using only a small number of flow simulations. It was shown that the statistics obtained by flow simulation on 7 realizations selected by the DKM are very similar to those obtained by simulation on the entire set of 72 realizations. A comparison with the traditional ranking method showed that the DKM easily outperforms the ranking technique without requiring any additional information. In reality, however, one would not have access to the results of those 72 flow simulations, hence one would not know how accurate the P10, P50 and P90 obtained from those 7 flow simulations are with respect to the entire set of 72 flow simulations. In this study, the response of interest is the cumulative oil production at 2 days. We have evaluated the response for each of the 72 realizations, as a reference. For the proxy response, we evaluated the cumulative oil production using streamline simulation (Batycky et al., 1997). The correlation coefficient between the response and the proxy is ρ = 0.92. In order to perform the parametric bootstrap procedure, we must estimate the joint distribution of the cumulative oil production and its ranking proxy, and be able to generate new samples from the bivariate density. Because we do not know a priori the distribution of the cumulative oil production and its proxy (contrary to the previous example), we propose to compute the bivariate density using a kernel smoothing technique (Bowman and Azzalini, 1997).

Generation of a sample from a kernel smoothing density

Kernel smoothing (Bowman and Azzalini, 1997) is a spatial method that generates a map of density values; the density at each location reflects the concentration of points in the surrounding area. Kernel smoothing does not require any parametric assumption about the probability density function (pdf).
The kernel smoothing density of a variable X = [x_1, ..., x_R], with x, x_i ∈ R^p, is defined as follows:

\hat{f}(x, h) = \frac{1}{R} \sum_{i=1}^{R} \frac{1}{h^p} K\left(\frac{x - x_i}{h}\right)

with K the kernel function and h the bandwidth. In the case of a Gaussian RBF kernel, the kernel smoothing density is:

\hat{f}(x, h) = \frac{1}{R h^p (2\pi)^{p/2}} \sum_{i=1}^{R} \exp\left(-\frac{\| x - x_i \|^2}{2 h^2}\right)

Choosing the bandwidth for kernel smoothing can be a difficult task, and is generally a compromise between acceptable smoothness of the curve and fidelity to the data. The choice of h has a much larger impact on the overall appearance of the resulting smooth curve than the choice of the kernel function, which is generally held to be of secondary importance. In this work, we use a bandwidth that is a function of the number of points in X.
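The Gaussian kernel density above can be implemented directly. A minimal sketch; the Silverman-style bandwidth rule is an assumption, since the text only states that h depends on the number of points:

```python
# Direct implementation of the Gaussian kernel smoothing density above.
# The bandwidth rule (~ R^(-1/(p+4))) is an assumed choice.
import numpy as np

def gaussian_kde(points, h):
    """points: (R, p) data array; h: scalar bandwidth. Returns f_hat."""
    R, p = points.shape
    norm = R * h**p * (2 * np.pi) ** (p / 2)
    def f_hat(x):
        sq = np.sum((points - x) ** 2, axis=1)      # ||x - x_i||^2
        return np.exp(-sq / (2 * h**2)).sum() / norm
    return f_hat

# Usage on synthetic bivariate data
rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))
h = len(data) ** (-1.0 / (data.shape[1] + 4))       # p = 2 here
f = gaussian_kde(data, h)
# the true N(0, I) density at the origin is 1/(2*pi) ~ 0.159
density_at_origin = f(np.zeros(2))
```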

For example, Figure 9 shows the density of the 72 data points from the WCA example, estimated by kernel smoothing with a Gaussian kernel.

Figure 9: Probability density of X for WCA.

Once the density of the bivariate variable has been defined, new samples from the same distribution can be generated using the Metropolis sampling technique.

Overview of the Metropolis sampling algorithm

The Metropolis-Hastings technique is a Markov-chain-based method for generating a random variable with a particular distribution (Metropolis and Ulam, 1949; Metropolis et al., 1953). The Metropolis algorithm generates a sequence of samples from a distribution f as follows:

1. Start with some initial value x_0.
2. Given the current value x_{t-1}, draw a candidate value x* from some proposal distribution (we choose a uniform distribution).
3. Compute the ratio α of the densities at the candidate point x* and the current point x_{t-1}:

α = f(x*) / f(x_{t-1})

and accept the candidate point with probability min(1, α).
4. Return to step 2 until the desired number of samples is obtained.
5. The resulting sample (x_1, ..., x_t) is distributed according to f.
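The steps above can be sketched for a one-dimensional density. With a uniform proposal that does not depend on the current point, the plain density ratio f(x*)/f(x_{t-1}) is the correct acceptance ratio; function and variable names are illustrative:

```python
# One-dimensional Metropolis sampler following steps 1-4 above, with a
# uniform (independence) proposal on [lo, hi].
import numpy as np

rng = np.random.default_rng(3)

def metropolis_sample(f, n_samples, lo=0.0, hi=1.0):
    x = rng.uniform(lo, hi)                  # step 1: initial value
    out = np.empty(n_samples)
    for t in range(n_samples):
        x_star = rng.uniform(lo, hi)         # step 2: candidate
        ratio = f(x_star) / max(f(x), 1e-300)
        if rng.random() < min(1.0, ratio):   # step 3: accept w.p. min(1, ratio)
            x = x_star                       # rejected candidates repeat x
        out[t] = x
    return out                               # step 5: sample distributed as f

# Usage: sample the triangular density f(x) = 2x on [0, 1], whose mean is 2/3
s = metropolis_sample(lambda x: 2 * x, 5000)
```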

An illustration of a sample generated by Metropolis sampling from the density provided by kernel smoothing is presented in Figure 10. The contours represent the probability density calculated from the values of the response selected by DKM; the red points show the values derived from this density by Metropolis sampling.

Figure 10: Generation of a new sample using Metropolis sampling. The contours represent the probability density obtained by kernel smoothing and the red dots represent the new sample generated by Metropolis sampling.

Application of the bootstrap technique to the WCA case

In the context of uncertainty quantification in cumulative oil production, the initial data are the flow simulations of the R = 72 realizations of the WCA reservoir:

x_1, ..., x_R: cumulative oil production obtained by full flow simulation (target response)
y_1, ..., y_R: cumulative oil production obtained by fast flow simulation (proxy response)

The distance employed for the DKM is computed as the absolute value of the difference in proxy response between any two realizations: d_ij = |y_i - y_j|. The bootstrap procedure, illustrated in Figure 3, is performed for different numbers of transfer function evaluations: in this case N = 3, 5, 8, 10, 15 and 20. For each value of N, the procedure to generate B bootstrap samples is as follows:

1. Select N realizations, using the proxy response as ranking measure or the distance measure d_ij, according to the method used.
2. Evaluate the response using the transfer function (flow simulation) on the N selected realizations.

3. Compute the bivariate density F̂_n of the target and proxy responses using kernel smoothing on the responses of the N selected realizations.
4. Use Metropolis sampling to generate B samples from the bivariate distribution F̂_n.
5. For each of the B samples, apply ranking or DKM to select N realizations and compute the statistics of interest: θ̂* = (x̂*_P10, x̂*_P50, x̂*_P90).

The workflow illustrated in Figure 11 gives more details than the general workflow in Figure 3, by including the estimation of F̂_n by kernel smoothing and the generation of new samples by Metropolis sampling.

Figure 11: Workflow for confidence interval calculation.

The next section shows an application of the workflow illustrated above. The workflow is performed using 3 different methods for selecting the realizations: DKM, ranking and random selection. Our objective is to see how each method behaves as the number of transfer function evaluations increases and how the methods compare to each other. First, we compare the 3 methods by looking at the histograms of the bootstrap error of the estimated quantiles for each method (Figure 12). The bootstrap error is computed using the definition of e given above. The blue, red and green bars represent the error obtained for DKM, ranking and random selection respectively.
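Putting the pieces together, the selection / kernel-smoothing / Metropolis loop above can be sketched end-to-end. The toy proxy and target responses, the evenly-spread ranking selection, the unnormalized kernel density (Metropolis only needs density ratios) and the sizes R, N, B are all assumptions for illustration:

```python
# End-to-end sketch of the selection + kernel-smoothing + Metropolis
# bootstrap workflow, with toy responses and assumed sizes.
import numpy as np

rng = np.random.default_rng(7)

def ranking_select(proxy, n):
    """Pick n realizations spread evenly over the sorted proxy values."""
    order = np.argsort(proxy)
    return order[np.linspace(0, len(proxy) - 1, n).round().astype(int)]

def kde_metropolis(points, h, n_samples):
    """Metropolis sampling from a Gaussian-kernel density fitted to points
    (unnormalized density is enough: Metropolis only uses ratios)."""
    def f(x):
        return np.exp(-np.sum((points - x) ** 2, axis=1) / (2 * h**2)).sum()
    lo, hi = points.min(axis=0) - 3 * h, points.max(axis=0) + 3 * h
    x = points[0].astype(float)
    out = np.empty((n_samples, points.shape[1]))
    for t in range(n_samples):
        cand = rng.uniform(lo, hi)
        if rng.random() < min(1.0, f(cand) / max(f(x), 1e-300)):
            x = cand
        out[t] = x
    return out

R, N, B = 720, 10, 20
proxy = rng.normal(size=R)                                       # fast y
target = 0.9 * proxy + np.sqrt(1 - 0.9**2) * rng.normal(size=R)  # toy x

sel = ranking_select(proxy, N)                           # step 1: select N
pairs = np.column_stack([target[sel], proxy[sel]])       # step 2: evaluate
h = N ** (-1.0 / 6.0)                                    # step 3: bandwidth
theta = np.empty((B, 3))
for b in range(B):                                       # steps 4-5
    boot = kde_metropolis(pairs, h, R)
    resel = ranking_select(boot[:, 1], N)                # reselect on proxy
    theta[b] = np.quantile(boot[resel, 0], [0.1, 0.5, 0.9])
# theta now holds B bootstrap estimates of (P10, P50, P90)
```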

Figure 12: Histograms of the bootstrap error of the estimated quantiles for different numbers of function evaluations and the 3 selection methods.

We observe that, in each case, the DKM performs better than the ranking technique. For all values of N, the errors are globally smaller for the DKM than for ranking or random selection. In addition, the error variance is reduced with more transfer function evaluations.

Figure 13 represents the bootstrap percentile intervals (α = 0.1) of the bootstrap error of the estimated quantiles; the symbol in each interval represents the P50 value of the error.

Figure 13: Confidence intervals (α = 0.1) of the bootstrap error of the estimated quantiles as a function of the number of function evaluations.

We observe on Figure 13 that the error tends to decrease as the number of function evaluations increases. As observed before on the histograms, DKM performs better than ranking, which performs better than random selection. This conclusion was also reached in Scheidt and Caers (2009). In this example, we observe that for the DKM the results stabilize for N > 8. We can therefore conclude that 8 or 10 flow simulations are necessary for the DKM-selected models to represent the same uncertainty as the total set of 72. In a previous paper (Scheidt and Caers, 2009), it was concluded that 7 simulations were satisfactory; note however that the distance in that work was slightly more correlated to the difference in response than the correlation in this study. Table 1 reports the mean of the bootstrap error, computed from the histograms presented in Figure 12.

N     DKM     Ranking   Random select.
3     0.356   0.495     0.56
5     0.333   0.348     0.47
8     0.28    0.293     0.324
10    0.253   0.27      0.322
15    0.25    0.325     0.367
20    0.27    0.36      0.34

Table 1: Mean of the dimensionless bootstrap error for each selection method.
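The kind of comparison such a table supports can be scripted directly. A small sketch, using the Table 1 values as printed (treat them as illustrative):

```python
# Relative reduction in mean bootstrap error when going from N = 5 to
# N = 8 transfer function evaluations, from the Table 1 values.
mean_error = {  # N: (DKM, ranking, random selection)
    3: (0.356, 0.495, 0.56),
    5: (0.333, 0.348, 0.47),
    8: (0.28, 0.293, 0.324),
    10: (0.253, 0.27, 0.322),
    15: (0.25, 0.325, 0.367),
    20: (0.27, 0.36, 0.34),
}

def improvement_pct(n_from, n_to, method_col):
    """Percent reduction in mean error between two values of N."""
    e0 = mean_error[n_from][method_col]
    e1 = mean_error[n_to][method_col]
    return 100.0 * (e0 - e1) / e0

dkm_gain = improvement_pct(5, 8, 0)    # about 16% for DKM
rank_gain = improvement_pct(5, 8, 1)   # about 15.8% for ranking
```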

This table, as well as the histograms and confidence intervals, can be very useful to indicate the error resulting from the quantile estimation of the response based on the selected realizations. For example, suppose we are limited in time and can only perform 5 transfer function evaluations, but we want to be confident in the uncertainty quantification results derived from those 5 simulations. From Table 1, we can see that the mean error for N = 5 is 0.333 for DKM and 0.348 for ranking. If we had a little more time and had performed N = 8 simulations, the errors would be 0.28 and 0.293, an improvement of 16% for DKM (15.8% for ranking) compared to the results for N = 5. Another way of looking at the results is to show the confidence intervals for each quantile individually, as illustrated in Figure 14.

Figure 14: Confidence intervals of the bootstrap estimates of the quantiles P10, P50 and P90 (BBL) as a function of the number of function evaluations. The line represents the quantiles derived from the entire set of realizations.

Figure 14 shows that DKM and ranking produce very accurate estimates of the P50 quantile of the target response, even for a small number of transfer function evaluations (medians are easier to estimate than extremes). In addition, the P10 quantiles tend to be slightly underestimated, but DKM is closer to the reference

value than the other techniques. The same conclusions hold for the P90, except that we observe an overestimation of the quantiles. The underestimation of P10 and overestimation of P90 are most likely due to the use of kernel smoothing to estimate the density, which increases the variability of the response compared to the original 72 realizations. As mentioned at the beginning of the paper, the proxy measure must be correlated with the response for DKM and ranking to be effective. However, the correlation coefficient between the two responses is not known a priori, since the target response for all realizations is unknown. Once a selection method is applied and the transfer function is evaluated on the selected realizations, an estimate of the correlation coefficient can be inferred. The quality of the estimated correlation coefficient can be studied in exactly the same way as the estimated quantiles, by parametric bootstrap. Figure 15 represents the confidence intervals obtained for different values of N, the correlations being estimated on the same samples used to estimate the quantile error. The symbols show the initial estimates ρ̂ of the correlation.

Figure 15: Bootstrap estimated correlation coefficient on the WCA test case.

Figure 15 shows that the first estimates ρ̂ of the correlation coefficient between the 2 responses are accurate, with a slight overestimation for small numbers of transfer function evaluations (N = 3 and 5). However, the bootstrap estimated correlation coefficients are largely underestimated. We believe that this is due to the kernel smoothing technique, which tends to smooth the density of the bivariate data and therefore allows Metropolis sampling to sample points in a wider area than it should. This was not the case for the bi-normal example in Section 4.1. However, we can still draw conclusions from the confidence intervals provided.
We observe that DKM tends to have less uncertainty in the correlation coefficient than ranking, except for N = 8.

5. Discussion on distances

The above examples demonstrate that using the same measure for ranking and distance provides similar accuracy in uncertainty quantification for the Gaussian case. We should emphasize, however, that the bootstrap method applied in the context of this paper is clearly unfavorable to the DKM. In order to compare ranking and the DKM, we calculated the distance between two realizations as the difference of the ranking measure between them. This leads to a representation of uncertainty in a one-dimensional MDS space, and therefore the use of kernel methods does not have the same impact as in a higher-dimensional MDS space. The distance in this study is very simple, whereas in many applications the distance can be much more complex and can take into account many influential factors on the response. For example, a distance can be a function of many parameters, such as the cumulative oil production at different times and the water cut of a group of wells (Scheidt and Caers, 2009). Using traditional ranking techniques may require multiple independent studies if one is interested in uncertainty in several responses; in the case of the DKM, a single study suffices if the distance is well chosen.

6. Conclusions

We have established a new workflow to construct confidence intervals on quantile estimates in model selection techniques. We would like to state explicitly that we do not treat the question of whether or not the uncertainty model, i.e. the possibly large set of reservoir models that can be generated by varying several input parameters, is realistic. Uncertainty quantification by itself is inherently subjective, and any confidence estimates of the uncertainty model itself are therefore useless. In this paper we assume that there is a larger set of model realizations and that this set provides a realistic representation of uncertainty.
Then, the proposed bootstrap allows quantifying the error on uncertainty intervals or quantiles when only a few models from the larger set are selected. The workflow uses model selection methods (in this work, DKM or ranking) and employs a parametric bootstrap procedure to construct confidence intervals on the quantiles retained by the model selection techniques. Examples show that DKM provides more robust results than ranking, especially for small numbers of transfer function evaluations. The study of the uncertainty resulting from model selection can be very informative: it shows whether or not we can be confident in the estimated statistics. The confidence interval is a function of the estimated variance of the response and the estimated correlation coefficient between the proxy measure and the response. Since the user does not know the correlation coefficient a priori, we propose performing a bootstrap procedure between the response and its proxy to estimate the quality of the distance. If the estimated correlation coefficient is high and its associated uncertainty low, then we can be confident in the uncertainty quantification results. If, after the N transfer function evaluations, the uncertainty is large

and a poor correlation is found, then the results should be improved either by using a better proxy response or by doing more transfer function evaluations.

Nomenclature

R = number of initial realizations
N = number of selected realizations for transfer function evaluation
X = [X_1, ..., X_R]
X_i = [x_i, y_i]
x_i = target response value for realization i
y_i = proxy response value for realization i
d_ij = distance between realizations i and j
ρ = correlation coefficient between the target and proxy responses
B = number of samples generated in the bootstrap procedure
e_b = bootstrap error of the estimated quantiles for sample b
x̂_P10, x̂_P50, x̂_P90 = estimated P10, P50 and P90 after the first selection method
x̂*_P10, x̂*_P50, x̂*_P90 = bootstrap estimated quantiles for the second selection method

References

Ballin, P.R., Journel, A.G., and Aziz, K. 1992. Prediction of Uncertainty in Reservoir Performance Forecast. JCPT, no. 4.
Batycky, R.P., Blunt, M.J., and Thiele, M.R. 1997. A 3D Field-Scale Streamline-Based Reservoir Simulator. SPERE 12(4): 246-254.
Borg, I., and Groenen, P. 1997. Modern Multidimensional Scaling: Theory and Applications. New York, Springer.
Bowman, A.W., and Azzalini, A. 1997. Applied Smoothing Techniques for Data Analysis. Oxford University Press.
Efron, B. 1979. Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics 7(1): 1-26.
Hastings, W.K. 1970. Monte Carlo Sampling Methods Using Markov Chains and Their Applications. Biometrika 57(1): 97-109.
McLennan, J.A., and Deutsch, C.V. 2005. Ranking Geostatistical Realizations by Measures of Connectivity. Paper SPE/PS-CIM/CHOA 98168 presented at the SPE International Thermal Operations and Heavy Oil Symposium, Calgary, Alberta, Canada, 1-3 November.
Metropolis, N., and Ulam, S. 1949. The Monte Carlo Method. J. Amer. Statist. Assoc. 44: 335-341.
Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E.

1953. Equations of State Calculations by Fast Computing Machines. Journal of Chemical Physics 21: 1087-1092.
Saad, N., Maroongroge, V., and Kalkomey, C.T. 1996. Ranking Geostatistical Models Using Tracer Production Data. Paper presented at the European 3-D Reservoir Modeling Conference, Stavanger, Norway, 16-17 April.
Scheidt, C., and Caers, J. 2008. Representing Spatial Uncertainty Using Distances and Kernels. Mathematical Geosciences, DOI: 10.1007/s11004-008-9186-0.
Scheidt, C., and Caers, J. 2009. A New Method for Uncertainty Quantification Using Distances and Kernel Methods: Application to a Deepwater Turbidite Reservoir. Accepted in SPEJ, to be published.
Schoelkopf, B., and Smola, A. 2002. Learning with Kernels. MIT Press, Cambridge, 664 p.
Strebelle, S. 2002. Conditional Simulation of Complex Geological Structures Using Multiple-Point Statistics. Mathematical Geology, 34(1): 1-22.