The Tau model for Data Redundancy: Part 2

Sunderrajan Krishnan

April 24, 2005

Abstract

Knowledge about earth properties arrives from diverse sources of information. Connectivity of permeability is one such earth property, essential for describing flow through porous materials. Information on the multiple-point connectivity of permeability arrives from core data, well-test data and seismic data, which are defined over varying supports and carry complex redundancies between the information sources. The tau model offers a framework to combine such diverse and partially redundant data. The tau weights in this model are a measure of data redundancy; they are a function of the data values and of the order in which the conditioning data are considered. In order to compute these tau weights, one needs a model of data redundancy, here expressed as a vectorial training image (Ti). A vectorial Ti can be constructed using prior conceptual knowledge of the geology and of the physics of the data measurements. From such a vectorial Ti, the tau weights can be computed exactly, then compared to those computed using any approximative calibration technique. In the case of estimating permeability connectivity, one observes significant deviations from data independence or conditional independence. Neglecting data redundancy leads to an over-compounding of the individual data information and a possible risk of making extreme decisions.

1 Introduction

Several recent studies have pointed out that advective fluid flow through permeable media is strongly affected by the connected patterns of extreme permeability values, both highs and lows. Srinivasan [7] studied the impact of higher-order pattern statistics on fluid flow and showed that complex flow-based reservoir performance can be represented in terms of the multiple-point properties of the underlying permeability field [8]. Several authors have attempted to develop measures of such multiple-point (mp) pattern characteristics of permeability.

One such measure is the rectilinear mp-connectivity of binary indicator values I(u) at location u along a direction h, given by:

K(h; n) = E{I(u) I(u + h) ... I(u + nh)} = Prob{I(u) = 1, I(u + h) = 1, ..., I(u + nh) = 1}    (1)

Note that K(h; n) = P(I(u) = 1) for n = 0. This multiple-point measure K(h; n) is an improvement over the traditional two-point transition probability P(I(u + nh) = 1 | I(u) = 1), which considers the interaction between locations taken only two at a time. Geostatistical algorithms have been developed to impose these mp-statistics on conditional realizations using simulated annealing [2]. However, it has been shown that these rectilinear measures of connectivity fail to capture the curvilinear nature of geological patterns. One striking example, demonstrated by Krishnan and Journel [5], is that of a highly continuous channel deposit which has the same rectilinear connectivity K(h; n) and the same indicator variogram γ_I(h) as a discontinuous lens structure which, by simple visual inspection, shows a significantly lesser spatial connectivity and hence a lesser effective permeability. Newly developed geostatistical simulation algorithms manage to capture such higher-order connectivity measures by considering all mp-statistics within a specified neighborhood template ([9], [1], [10]).

Information about these mp-characteristics of permeability can arrive from a variety of sources. These sources of information, denoted D_i, i = 1, ..., n, could be defined over different supports and could be derived from diverse data such as small-support core data, larger-support well-test data and seismic-derived data. Denoting the unknown connectivity of permeability as A, one can represent the information arriving from each individual data source D_i in terms of the conditional probability P(A|D_i). It is evident that there could be strong redundancies between these individual data information P(A|D_i). This data redundancy arises for a variety of reasons, one prime reason being the overlap of the volume supports over which the data are defined. There could also be overlap in the physical processes generating the data, for example two different well tests conducted at nearby wells using different pressure or flow controls. Very importantly, note that this overlap between information cannot be fully described by the common unknown A alone, rejecting therefore any possibility of conditional independence (CI) between the data, i.e., of P(D_1, ..., D_n | A) = P(D_1 | A) ... P(D_n | A) [6]. (Throughout this paper, we use the short notation P(A|D_i) instead of the exact P(A = a | D_i = d_i).)
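To make definition (1) concrete, here is a minimal sketch (not from the paper) that estimates K(h; n) by a moving product along the East-West axis of a binary indicator array; the synthetic lognormal test field and all names are illustrative assumptions.

```python
import numpy as np

def mp_connectivity(indicator: np.ndarray, n_max: int) -> np.ndarray:
    """K(h; n) for n = 0..n_max along the EW (column) axis, unit lag h."""
    K = np.empty(n_max + 1)
    prod = indicator.astype(float)              # running product I(u)...I(u+nh)
    K[0] = prod.mean()                          # K(h; 0) = P(I(u) = 1)
    for n in range(1, n_max + 1):
        prod = prod[:, :-1] * indicator[:, n:]  # multiply in I(u + nh)
        K[n] = prod.mean()
    return K

# Synthetic check: upper-quartile indicator of a lognormal field.
rng = np.random.default_rng(0)
z = rng.lognormal(size=(100, 100))
i_map = (z > np.quantile(z, 0.75)).astype(int)
print(mp_connectivity(i_map, n_max=10))         # K decreases with n
```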

When combining these elementary data information P(A|D_i) into a combined knowledge P(A|D_1, ..., D_n), one needs to account for the redundancies between the n data utilized. The question then is: how can one account for complex data redundancy, as related for example to the mp-connectivity of permeability?

This problem of combining probabilistic information under redundancy is addressed by the tau model described in detail in the accompanying paper [6]. That paper addresses the general problem of combining conditional probabilities P(A|D_i) accounting for data redundancy. Define the following data probability ratios x_0, x_1, ..., x_n and the target ratio x, all valued in [0, +∞], with Ã denoting the complement event non-A:

x_0 = P(Ã)/P(A),  x_1 = P(Ã|D_1)/P(A|D_1), ..., x_n = P(Ã|D_n)/P(A|D_n), and

x = P(Ã|D_1, ..., D_n)/P(A|D_1, ..., D_n) ∈ [0, +∞].

The tau model is stated as:

x/x_0 = ∏_{i=1}^{n} (x_i/x_0)^{τ_i},  τ_i ∈ (-∞, +∞)    (2)

The most important components in this expression are the weights τ_i. These weights, which can lie anywhere in (-∞, +∞), account for the redundancy between the information arriving from the different data. The accompanying theory paper [6] develops this expression in more detail and interprets the tau weights; further, a calibration technique was proposed to compute these weights in practice.

An exact representation of data redundancy would require the generation of joint realizations of the variables A, D_1, ..., D_n. In its turn, this requires knowledge of the physics of data generation accompanied by a conceptual knowledge of the underlying geology which relates A to each datum D_i. A conceptual depiction of geology can be represented in the form of a training image (Ti), which can be seen as a single realization of a spatial random function Z(u). Once we have this conceptual depiction of the geology, a knowledge of the data physics can be used to generate the unknown A and the related data D_1, ..., D_n. Such a generation gives rise to a vectorial Ti, {A^(l), D_1^(l), ..., D_n^(l)}, composed of a vector of (n + 1) variables. The superscript (l) refers to a specific vectorial Ti; there can be l = 1, ..., L different such Tis. This vectorial Ti can be used to determine the redundancy between the information arriving from the data D_i about the unknown A.

Examples of such data physics are seismic ray-tracing techniques and flow models for generating well-test data. Such forward models of data generation are now widely developed in different fields of expertise and, in fact, are commonly used to calibrate the individual data probabilities P(A|D_i). No rigorous evaluation of data redundancy is possible without such knowledge of the data physics; indeed, our evaluation of data redundancy, and consequently the accuracy of any data combination technique, can only be as good as our knowledge of these data physics. It is better to use some rough idea of the data physics than to ignore it totally and rely on typically poorly estimated correlation values.
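As a numeric sketch of combination rule (2) (illustrative names, not the paper's code), the function below turns a prior probability and the single-datum probabilities P(A|D_i) into a combined P(A|D_1, ..., D_n) for given tau weights; setting all τ_i = 1 reproduces the conditional-independence compounding.

```python
import numpy as np

def tau_combine(p_prior: float, p_given_d: list, taus: list) -> float:
    """Tau model (2): combine the P(A|D_i) into P(A|D_1,...,Dn) = 1/(1 + x)."""
    x0 = (1.0 - p_prior) / p_prior                       # prior distance x_0
    ratios = [((1.0 - p) / p) / x0 for p in p_given_d]   # x_i / x_0
    x = x0 * np.prod([r ** t for r, t in zip(ratios, taus)])
    return 1.0 / (1.0 + x)

# Two data, each individually raising P(A) from 0.25 to 0.6:
print(tau_combine(0.25, [0.6, 0.6], [1.0, 1.0]))  # ~0.87: full compounding
print(tau_combine(0.25, [0.6, 0.6], [1.0, 0.5]))  # ~0.76: damped redundant datum
```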

In this paper, we first generate a fine-scale Ti representing a heterogeneous distribution of permeability values. This fine-scale data set is averaged using different combinations of power averages, resulting in four other variables, all of which together constitute a 5-dimensional vectorial Ti. The connectivity of permeability at one intermediate support is then evaluated using data from smaller and larger supports of this vectorial Ti. The tau model is used for this purpose, with the tau weights computed both exactly and using an approximate calibration technique. Finally, the impact of ignoring data redundancy is demonstrated by assigning tau weights corresponding to a data conditional independence assumption.

2 Description of the data set

Consider the 500 × 500 permeability field shown in Figure 1a. This reference field has been constructed using a combination of Gaussian simulation (GSLIB program sgsim [3]) and a random drop (Poisson distribution) of rectangular low-permeability shales. Let this reference field represent a permeability variable Z(u) defined on a quasi-point support. Averaging this variable Z(u) over different supports with different averaging functions leads to the other variables displayed in Figure 1:

Z_v(u) is the geometric average of Z(u) defined over a constant volume of size v = 10 × 10 pixels. v is the support volume to be estimated for input to, say, a flow simulator. The variable Z_v(u) is to be evaluated using point-support data of type Z(u) and the following data:

Z_w1(u) is a linear average of harmonic averages, the latter defined over 8 radial strings of length 10 pixels each (Figure 2). This average is assumed to mimic radial flow from a central well, approximated by 1D parallel flow occurring along each of the 8 radial directions.

Z_w2(u) is the harmonic average of 3 geometric averages, the latter defined over 3 annular regions of radii 5, 10 and 20 pixels (Figure 3). This average mimics radial flow through these annular regions. Note that the w2 support is larger than that of w1.

Z_s(u) is the linear average of Z(u) defined over a constant large volume of size s = 50 × 50 pixels. Z_s(u) is a large-scale average of Z(u) obtained, possibly, from calibration of seismic data.
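The construction of such a multi-support vectorial Ti can be sketched with moving power averages; the window sizes below follow the supports described above, while the lognormal stand-in field and the single-stage averaging (rather than the two-stage w1 and w2 averages) are simplifying assumptions.

```python
import numpy as np

def power_average(z: np.ndarray, p: float) -> float:
    """Power average of order p; p -> 0 gives the geometric average."""
    if p == 0.0:
        return float(np.exp(np.log(z).mean()))
    return float((z ** p).mean() ** (1.0 / p))

def block_average(z: np.ndarray, size: int, p: float) -> np.ndarray:
    """Moving power average over size x size blocks (valid region only)."""
    rows = z.shape[0] - size + 1
    cols = z.shape[1] - size + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = power_average(z[i:i + size, j:j + size], p)
    return out

rng = np.random.default_rng(1)
z = rng.lognormal(size=(80, 80))        # stand-in for the fine-scale field
z_v = block_average(z, size=10, p=0.0)  # geometric average, block support v
z_s = block_average(z, size=50, p=1.0)  # linear average, large support s
```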

All the previous averaging types are power averages. However, Z_w1(u) and Z_w2(u) each call for a sequence of two power averages; they are thus non-linear and multiple-point averages, as opposed to Z_v(u) and Z_s(u), which are single-point averages. Taken together, the set of images (a) through (e) in Figure 1 constitutes a vectorial training image.

Figure 4 shows the histograms of these five variables. The impact of averaging can be seen clearly in the smoothing of the histograms: the coefficient of variation decreases from 1.83 for Z(u), through intermediate values for Z_v(u) and Z_w1(u), down to 0.95 and 0.70 for Z_w2(u) and Z_s(u), respectively.

Our focus here is on the relationship between the high values of these maps. Define the following upper-quartile indicator variables representing the high values: I(u), I_v(u), I_w1(u), I_w2(u) and I_s(u) are the indicators of exceeding, respectively, the upper quartiles z^{0.75}, z_v^{0.75}, z_w1^{0.75}, z_w2^{0.75} and z_s^{0.75} defined at each support. These upper quartiles can be read from the histograms in Figure 4. The corresponding indicator maps are shown in Figure 5.

Using this multiple-support data set, our target of estimation is the mp-connectivity of the variable Z_v(u) at the v block support of size 10 × 10. For the sake of illustration, we consider only the rectilinear measure of connectivity K(h; n) in the East-West direction, instead of the curvilinear measure K_C(h; n) [5]. Define the following data events:

A^(n): (I_v(u) = 1, ..., I_v(u + nh) = 1), the event of observing a set of n contiguous high values in direction h at the v support.

Similarly, define D_1^(n): (I(u) = 1, ..., I(u + nh) = 1), D_2^(n): (I_w1(u) = 1, ..., I_w1(u + nh) = 1), D_3^(n): (I_w2(u) = 1, ..., I_w2(u + nh) = 1), and D_4^(n): (I_s(u) = 1, ..., I_s(u + nh) = 1).

Note for reference: the prior probability P(A^(0)) = 0.25 corresponds to the distance x_0 = (1 - 0.25)/0.25 = 3. Similarly, P(D_k^(0)) = 0.25 for all k = 1, 2, 3, 4.

Next, consider the conditional connectivity function P(A^(n) | D_k^(n)), k = 1, ..., 4. This function gives the probability of observing a string of n connected high values at the support v, given observation of a colocated string of connected high values at another support k.
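These conditional connectivity functions can be estimated by scanning two colocated indicator maps, as in the following sketch (illustrative helper names; the maps are assumed to share one grid, e.g. after cropping the block averages to a common valid region):

```python
import numpy as np

def string_indicator(i_map: np.ndarray, n: int) -> np.ndarray:
    """True where I(u)=1, ..., I(u+nh)=1 along the EW axis, for each anchor u."""
    cols = i_map.shape[1] - n
    out = i_map[:, :cols] > 0
    for k in range(1, n + 1):
        out = out & (i_map[:, k:k + cols] > 0)
    return out

def conditional_connectivity(i_target: np.ndarray, i_data: np.ndarray, n: int) -> float:
    """P(A^(n) | D^(n)) = P(A^(n), D^(n)) / P(D^(n)), estimated by counting."""
    a = string_indicator(i_target, n)
    d = string_indicator(i_data, n)
    return float(a[d].mean()) if d.any() else float("nan")
```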

The availability of the various reference maps of Figure 1 allows us to compute exactly all four previous conditional connectivity functions, as well as the connectivities given two or three data events, P(A^(n) | D_i^(n), D_j^(n)) and P(A^(n) | D_i^(n), D_j^(n), D_k^(n)), and given all four data, P(A^(n) | D_1^(n), D_2^(n), D_3^(n), D_4^(n)).

Figure 6 shows all the single-data conditional connectivities P(A^(n) | D_k^(n)), k = 1, ..., 4. The marginal probability P(A^(n)) is the traditional rectilinear connectivity function, here defined at the support v = 10 × 10; it is a decreasing function of n. Note that P(A^(n)) at n = 0 is equal to 0.25, the global proportion of high values. The other curves in Figure 6 give the probability of connected v-support high values in the E-W direction given D_1^(n), D_2^(n), D_3^(n) or D_4^(n) taken one at a time. Note that the P(A^(n) | D_2^(n)) and P(A^(n) | D_3^(n)) curves both intersect the P(A^(n) | D_1^(n)) curve. At all lags, the datum D_4^(n) gives the conditional probability closest to the marginal, i.e., D_4^(n) is the least informative datum.

Figure 7 shows the two-data conditional connectivities. All conditioning data pairs which include datum D_1^(n) show higher conditional probability than those which do not include it. Presence of a string of high values at both the point support and support w2 (data D_1^(n) and D_3^(n)) results in almost sure occurrence of connected high values at the v support.

Figure 8 shows the three- and four-data conditional connectivities. Again, inclusion of datum D_1^(n) indicates strong connectivity at the v support. On the other hand, inclusion of the large-support datum D_4^(n) does not add much information; this can be consistently observed for all combinations including this datum.

3 Cross-support statistics

Conditional correlations

The conditional correlation Corr{D_i, D_j | A} is a measure of how two data D_i and D_j relate to each other with regard to a specific outcome of the unknown A, in other words a measure of data redundancy for informing that particular value of the unknown A. Here we compute these conditional correlations between data taken two at a time for the event A = 1. Figure 9 gives these correlation values for lags n = 1 through 150. Note that zero correlation implies conditional independence for A = 1; however, a zero correlation given A = 1 does not imply the same for A = 0, unless there is full independence.
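A minimal sketch of this statistic: the correlation of two binary data events restricted to the locations where the target event A = 1 occurs (the synthetic inputs are for illustration only).

```python
import numpy as np

def conditional_corr(a_event: np.ndarray, d_i: np.ndarray, d_j: np.ndarray) -> float:
    """Corr{D_i, D_j | A = 1}: correlation of two binary data events restricted
    to the locations where the target event a_event is True."""
    x = d_i[a_event].astype(float)
    y = d_j[a_event].astype(float)
    return float(np.corrcoef(x, y)[0, 1])

# Example with synthetic boolean event maps on a common 100x100 grid:
rng = np.random.default_rng(3)
a = rng.random((100, 100)) < 0.25
di = rng.random((100, 100)) < 0.25
dj = rng.random((100, 100)) < 0.25
print(conditional_corr(a, di, dj))   # ~0 for independent synthetic events
```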

Since we are interested in the data event A = 1, i.e., in the probability of pixels being connected, we only observe the correlation between the data given A = 1. The pair D_1^(n) and D_3^(n) have the least conditional correlation at all lags: connected strings of high values at the point support and at the w2 support are almost independent of each other given the connected values at support v. The maximum conditional correlation is observed between data D_3^(n) and D_4^(n). Note that the same datum D_3^(n) exhibits both the least and the greatest conditional correlation, with D_1^(n) and D_4^(n) respectively.

4 Combining data from different supports

4.1 Exact tau weights

In the accompanying paper [6], it was shown that the sequence-dependent exact tau weight τ_i can be expressed as a ratio of log-likelihood ratios of D_i^(s) given all previously utilized data D_1^(s), ..., D_{i-1}^(s) in a sequence s:

τ_i^(s)(d_1, ..., d_{i-1}, a) = ln[ P(D_i^(s) | Ã, D_1^(s), ..., D_{i-1}^(s)) / P(D_i^(s) | A, D_1^(s), ..., D_{i-1}^(s)) ] / ln[ P(D_i^(s) | Ã) / P(D_i^(s) | A) ]    (3)

The superscript (s) refers to a specific sequence of data conditioning starting with D_1 and ending with D_i. The lower-case notations d_1, ..., d_{i-1}, a represent the values taken by the data D_1, ..., D_{i-1} and by the unknown A, respectively. In the example here, the data and the unknown are all binary, thus d_i = 0, 1 and a = 0, 1. Knowledge of the joint probability distribution between the variables A, D_1, D_2, D_3 and D_4 allows computing these data likelihoods. One can average these sequence-dependent tau weights over all possible sequences s, resulting in sequence-averaged tau weights τ_i. For this example, we would have 1, 2, 3! = 6 and 4! = 24 sequences, respectively, for the cases with 1, 2, 3 and 4 data. Only the sequence-averaged tau weights τ_i will be discussed hereafter.

Note that the multivariate distribution of the variables A^(n), D_1^(n), D_2^(n), D_3^(n) and D_4^(n) changes with the lag n. Therefore, the relationships between the data are considered separately for each lag; that is, the tau weights are a function of the lag n.
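For binary A and data, the exact weights (3) can be evaluated from the full joint pmf inferred from the vectorial Ti. The sketch below (illustrative, not the paper's code) stores the pmf as an (n + 1)-dimensional array p[a, d1, ..., dn], evaluates (3) for one conditioning sequence, and averages over all orderings; it assumes the log-ratio in the denominator is nonzero, i.e. every datum is informative.

```python
import numpy as np
from itertools import permutations

def prob(p: np.ndarray, fixed: dict) -> float:
    """P of a joint event; `fixed` maps an axis (0 = A) to a value, rest summed out."""
    idx = tuple(fixed.get(ax, slice(None)) for ax in range(p.ndim))
    return float(p[idx].sum())

def tau_exact(p: np.ndarray, seq: tuple, i: int, d: tuple) -> float:
    """Sequence-dependent tau of eq. (3) for the (i+1)-th datum in `seq` (A=1, not-A=0)."""
    prev = {1 + seq[k]: d[k] for k in range(i)}   # earlier data in the sequence
    cur = {1 + seq[i]: d[i]}                      # the datum being weighted
    def lik(a):   # P(D_i = d_i | A = a, earlier data)
        return prob(p, {0: a, **prev, **cur}) / prob(p, {0: a, **prev})
    def marg(a):  # P(D_i = d_i | A = a)
        return prob(p, {0: a, **cur}) / prob(p, {0: a})
    return float(np.log(lik(0) / lik(1)) / np.log(marg(0) / marg(1)))

def tau_sequence_averaged(p: np.ndarray, datum: int, d_all: tuple) -> float:
    """Average the tau of `datum` over all orderings of the data."""
    taus = []
    for seq in permutations(range(p.ndim - 1)):
        taus.append(tau_exact(p, seq, seq.index(datum), tuple(d_all[j] for j in seq)))
    return float(np.mean(taus))

# Example: a random joint pmf for A and two binary data, both observed equal to 1.
rng = np.random.default_rng(4)
p = rng.random((2, 2, 2)); p /= p.sum()
print(tau_sequence_averaged(p, datum=0, d_all=(1, 1)))
```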

Figure 10 shows these sequence-averaged tau weights τ_i. As a general rule, deviation of these weights from the value 1 implies deviation from conditional independence. Initially, focus only on the first six cases in Figure 10, corresponding to conditioning to two data only. It is clear that data with poor correlation (D_1, D_3) have tau weights closer to 1 than those with stronger correlation (D_1, D_2). Note that even though D_1-D_3 have very poor conditional correlation (Figure 9), their tau weights are significantly different from 1 (τ_1, τ_3 ≈ 0.7). This indicates that even small correlations can result in significant deviations from conditional independence.

Next, look at the (D_1, D_2) pair. At lag 45, the two data D_1 and D_2 are equally informative and get an equal weight of 0.59. Next consider the case of D_1 and D_4. Datum D_4 gets a lower weight (0.6) than D_1 (0.8). For the (D_2, D_3) pair, the weights are almost similar and close to 0.6 at all lags; the correlation in this case is similar in magnitude to that of (D_1, D_2). The cases (D_2, D_4) and (D_3, D_4) show behavior similar to that seen in (D_1, D_4): datum D_4 receives much less weight than the other datum.

From these six cases, one can observe that the sequence-averaged tau weights are a function of the interactions between data redundancy, data information and data values. There is complex overlap between these concepts; for example, changing the data values changes the data information content (a form of heteroscedasticity). All of these concepts need to be studied jointly for their contribution to the tau weights.

The cases of conditioning to three data are considerably more complex. Consider first the triplet (D_1, D_2, D_3) and compare with the three previous cases (D_1, D_2), (D_1, D_3) and (D_2, D_3). τ_2 is now consistently less than τ_1; there is no switching between these two weights. The behavior of τ_1 and τ_3 is similar to their behavior in the two-data case, with the switch occurring at a greater lag than in the two-data D_1-D_3 case. As opposed to the two-data case, τ_3 is consistently greater than τ_2, suggesting that the presence of the datum D_1 has an impact on the interaction between the two data D_2, D_3. The effect of the joint distribution of the three data D_1, D_2 and D_3 is observed here.

Looking at the other cases of three and four data, τ_4 is always the lowest and τ_1 is consistently greater than the other weights. The other two weights, τ_2 and τ_3, show interesting behavior in that, depending on which other data are involved, one is less or greater than the other. This means that the weights of closely related data (Figure 9) which are similarly important depend on their interaction with the other data.

4.2 Calibration-based tau weights

Next, the tau weights are computed using the proposed approximate calibration technique [6]. Briefly described, this technique first involves a ranking of the data according to their information content about the unknown. Then one requires the conditional correlation of each datum with the most informative datum. The tau weight for the most informative first datum is set to τ_1 = 1. All other data then obtain a weight given by:

τ_i = (1 - ρ²_{D_i, D_1 | A})^{f(t)} ∈ [0, 1]    (4)

where t is a calibration parameter varying in [0, 1].

This technique requires knowledge of the conditional correlation (Figure 9) and of the calibration parameter t.

Calibrating the tau model

In expression (4), the weights τ_k are approximated in terms of the conditional correlation ρ²_{D_k^(n), D_1^(n) | A}, requiring the calibration of a single scaling parameter, here denoted t^(n); the superscript (n) denotes the lag. More precisely, for a series of values t^(n) ∈ [0, 1], expression (4) gives the tau values:

τ_k^(n) = (1 - ρ²_{D_k^(n), D_1^(n) | A})^{f(t^(n))},  k = 2, ..., K

Using this series of values for τ_k^(n), the estimated distance x*(t) is computed using equation (2). This estimated value x*(t) is then compared with the true value x computed from the training images of Figure 1. The value of the parameter t^(n) which minimizes the squared error (x - x*(t))² is then taken as the optimal value:

t^(n) = argmin_{t ∈ [0,1]} (x - x*(t))²    (5)

Combining two data

First consider the case of conditioning to data D_1 and D_2 only. Using the procedure described above, the rescaling parameter t^(n) is computed for each lag n; see Figure 11. There is small variation of this parameter over all string lengths n. This almost constant value of t^(n) over all n suggests some aspect unique to the data set or/and to the proposed heuristic calibration equation (4).

Figure 12 shows the τ weights computed using the optimal calibration values t^(n) and the conditional correlations shown in Figure 9. At lags less than 45, datum D_2^(n) is the most informative, but at larger lags it is datum D_1^(n) that is most informative. Since we assign a weight of 1 to the most informative datum, we observe in Figure 12 the switch in weights at lag 45.

Detailed studies have been performed on the sensitivity of the estimated combined probability to an incorrect calibration of the parameter t [4]. Such studies have revealed that estimation using this calibration technique is indeed highly sensitive to accurate estimation of t. A poor knowledge of the data physics, hence a poor vectorial Ti, will lead to an incorrect evaluation of data redundancy and inaccurate estimates of the conditional probability.
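A sketch of this calibration loop under stated assumptions: the exponent function f is not restated in this paper, so f(t) = t is used as a placeholder, and the inputs (the true distance x read off the Ti, the ratios x_i ordered with the most informative datum first, and the squared conditional correlations) are taken as given, with illustrative values.

```python
import numpy as np

def calibrated_taus(rho2, t, f=lambda t: t):
    """Eq. (4): tau_1 = 1 for the most informative datum, then (1 - rho^2)^f(t)."""
    return np.concatenate(([1.0], (1.0 - np.asarray(rho2, float)) ** f(t)))

def fit_t(x_true, x0, x_ratios, rho2):
    """Eq. (5): grid search for the t in [0, 1] minimizing the squared distance error."""
    def x_est(t):
        taus = calibrated_taus(rho2, t)
        return x0 * np.prod((np.asarray(x_ratios, float) / x0) ** taus)
    grid = np.linspace(0.0, 1.0, 101)
    return min(grid, key=lambda t: (x_true - x_est(t)) ** 2)

# Three data, most informative first; rho2 holds their squared conditional
# correlations with that first datum (all numbers here are illustrative).
print(fit_t(x_true=0.5, x0=3.0, x_ratios=[0.4, 0.7, 1.1], rho2=[0.25, 0.04]))
```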

The behavior for D_1-D_3 and D_2-D_3 is similar to that of D_1-D_2. In the case of the pair D_1-D_3, the switch in weights happens around lag 50, whereas for D_2-D_3 the switch occurs at a very small initial lag. The behavior of the tau weights is different for all cases involving datum D_4: since the D_4 datum is always less informative than any other datum, it receives a lower weight (< 0.5) in all such cases, and no switch in weights is observed.

Combining three data

Next, we consider conditioning to three data taken together. Consider the case of conditioning to D_1, D_2 and D_3 in Figure 12. Beyond lag 45, D_1 is the most informative datum, therefore it gets the maximum weight of 1. Beyond this lag, τ_3 is greater than τ_2. From Figure 9, observe that amongst the two data D_2 and D_3, the datum which is consistently better correlated with D_1 is D_2: the greater the correlation with the most informative datum, the lesser the tau weight given to that datum. This is the behavior observed in this case. One observes a similar behavior in Figure 10 with the exact tau weights, albeit in a less marked fashion.

Next, consider the case of conditioning to D_1, D_2 and D_4. Figure 12 shows that D_1 takes the maximum tau weight beyond lag 45, and τ_4 receives a greater tau weight than τ_2. This arises from the ordering of the respective correlations with datum D_1 (Figure 9). However, this ordering of tau weights is different from that observed in Figure 10: the two-point correlations used in computing the calibrated weights are insufficient to predict the behavior of the exact tau weights. Similar behavior is observed for the cases D_1-D_3-D_4 and D_2-D_3-D_4: the maximum weight is given to datum D_1 in the first case and to D_2 in the second, but the order of the weights given to the other data is reversed.

Considering all four data

Finally, we consider conditioning to all four data D_1, D_2, D_3 and D_4. The behavior of the computed weights in Figure 12 is similar to that of D_1-D_2-D_3, but more complex because of the introduction of the fourth datum D_4^(n). As before, one can distinguish two zones: n ∈ [5, 45] and n > 45.

In the first zone, n ∈ [5, 45], datum D_2 has maximum information about the unknown A and receives a weight of 1. The weight given to any other datum is greater if its conditional correlation with the maximally informative datum D_2^(n) is lesser (see Figure 9). Consequently: τ_4 > τ_1 > τ_3.

In the second zone, n > 45, datum D_1^(n) is maximally informative, receiving a weight of 1. Here τ_3 > τ_4 > τ_2. Comparing with the exact averaged weights of Figure 10, one observes that the ordering of the weights given to the less important data is different; those weights are also less different from each other.

4.3 Cost of ignoring data redundancy

A simple visual inspection of Figures 1 and 5 starkly conveys that there is strong dependence between the data coming from the different supports, and that those relationships cannot all be linked to the common estimation goal A, the connectivity of strings at the v support. Such data dependence implies that the information coming from these supports towards the estimation goal A bears considerable redundancy. The plots of conditional correlation in Figure 9 confirm this information redundancy.

Figure 13 shows the probabilities estimated using tau weights of 1, which correspond to a conditional independence (CI) assumption, compared with the exact probabilities. With CI, one obtains an almost sure probability of 1 for all cases. Conditional independence here gives too much importance to each individual datum, leading to an apparent sense of greater concordance of information and greater certainty. Such an assumption is hardly ever justified, except in circumstances of physically inferred full independence between the data, examples of which are few in the earth sciences. If an assumption of CI must be made, it should therefore be made with caution; conditional independence can never be deemed a safe assumption by default. The safety of a model is determined not by matters of analytical convenience, but by the consequences of the assumption on the physical quantities being estimated.
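A small numeric illustration of this over-compounding, reusing combination rule (2) with illustrative values (not taken from the case study): four concordant data under CI (all τ_i = 1) drive the estimate to near-certainty, whereas damped tau weights keep it moderate.

```python
import numpy as np

def tau_combine(p_prior, p_given_d, taus):
    """Tau model (2): combined P(A | D_1,...,Dn) = 1 / (1 + x)."""
    x0 = (1.0 - p_prior) / p_prior
    x = x0 * np.prod([(((1.0 - p) / p) / x0) ** t for p, t in zip(p_given_d, taus)])
    return 1.0 / (1.0 + x)

four = [0.6, 0.6, 0.6, 0.6]                           # each datum alone gives 0.6
print(tau_combine(0.25, four, [1.0] * 4))             # ~0.99: CI, almost sure
print(tau_combine(0.25, four, [1.0, 0.6, 0.5, 0.3]))  # ~0.93: redundancy damped
```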

5 Discussion and Conclusions

This paper illustrates a unique methodology to combine complex multiple-point information arriving from diverse sources and defined over varying supports. The information arriving from the individual data is represented in the form of conditional probabilities. The overlap between the different data information, i.e., the data redundancy, is accounted for by the tau weights of the tau model. The tau weights are a measure of multiple-data redundancy and therefore go much beyond traditional measures such as two-data correlations. These tau weights are a function of the data values, or of intervals of data values, and can therefore also account for heteroscedastic dependency between the data and the unknown.

The example presented in this paper illustrates the case of evaluating the connectivity of high permeability values, which determines to a great extent the paths of fluid flow and transport in porous media. Since information about permeability and its connectivity arrives from multiple sources, namely well cores, geophysical data and dynamic well-based data, one needs to synthesize all these different pieces of information in order to arrive at an estimate of the unknown permeability connectivity. Complex redundancies exist between these data. Most data combination procedures call on simplifying assumptions which eventually result in ignoring these redundancies. We have shown that ignoring such redundancies cannot be considered a safe assumption. In some cases, accounting for data redundancy can become as important as the individual data processing itself.

The important question that arises then is how to determine these data redundancies. Here, we have proposed the concept of a vectorial training image that represents a single, joint realization of the unknown and of all data variables. A conceptual knowledge of the geology is used along with an understanding of the data physics to create this vectorial Ti. Using this vectorial Ti, we infer all the statistics required to determine the tau weights. The exact tau weights computed using this Ti show complex interactions resulting from the redundancy between the different data. An approximate calibration technique is proposed to compute these tau weights in practice.

Application of the tau model framework to other problems may require novel strategies to evaluate data redundancy. For many problems in the earth sciences, it should be possible to construct an analog model or a vectorial training image. Conceptual, mathematical and numerical algorithms have been developed for many data measurement processes over the past few decades. These forward models, which operate on earth properties and model the data measurement procedure, are frequently used in inversion procedures that evaluate the data information about the unknown. In fact, it can be stated that any data measurement corresponds to an inversion procedure using an implicit forward model. Such forward models can be used to construct a vectorial Ti. A major challenge that lies ahead is identifying appropriate procedures for constructing these vectorial Tis and thereafter evaluating data redundancy. This will require novel techniques adapted to each individual problem, for example the combination of satellite and ground-truth information. Developing such procedures for evaluating data redundancy should be an area of focus in future work.

References

[1] B.A. Arpat. A multi-scale pattern-based approach to sequential simulation. In Proceedings of the Geostatistics Congress, Banff, 2004.

[2] C. Deutsch. Geostatistical Reservoir Modeling. Oxford University Press, 2002.

[3] C. Deutsch and A.G. Journel. GSLIB: Geostatistical Software Library and User's Guide. Oxford University Press, 1998.

[4] S. Krishnan. Combining diverse and partially redundant information in the Earth Sciences. PhD thesis, Stanford University, 2004.

[5] S. Krishnan and A.G. Journel. Spatial connectivity: from variograms to multiple-point measures. Mathematical Geology, 35(8):915-925, 2003.

[6] S. Krishnan and A.G. Journel. The tau model for data redundancy: Part 1. Mathematical Geology, this volume, 2005.

[7] S. Srinivasan. Is crisp modeling of geological objects important for flow - when is flow convective? In 12th Annual SCRF Meeting, Stanford University, 1999.

[8] S. Srinivasan. Integration of production data into reservoir models: a forward modeling perspective. PhD thesis, Stanford University, 2000.

[9] S. Strebelle. Sequential simulation of complex geological structures using multiple-point statistics. Mathematical Geology, 34(1):1-21, 2002.

[10] T. Zhang, P. Switzer, and A.G. Journel. Sequential conditional simulation using classification of local patterns. In Proceedings of the Geostatistics Congress, Banff, 2004.

Figure 1: Pixelmaps of (a) point-support Z(u) (fine scale), (b) Z_v(u) (geometric average, 10 × 10), (c) Z_w1(u) (string average, length 10), (d) Z_w2(u) (annular average, radii 5, 10, 20), (e) Z_s(u) (linear average, 50 × 50).

Figure 2: Radial directions: 8 radial directions of length 10 units defining support w1.

Figure 3: Annular regions: 3 annular regions of radii 5, 10 and 20 pixels defining support w2.

Figure 4: Histograms of (a) point-support Z(u), (b) Z_v(u), (c) Z_w1(u), (d) Z_w2(u), (e) Z_s(u).

Figure 5: Pixelmaps of (a) point-support I(u), (b) I_v(u), (c) I_w1(u), (d) I_w2(u), (e) I_s(u).

Figure 6: Single-data conditional probabilities: connectivity function in the EW direction of the high values at the v support.

Figure 7: Two-data conditional probabilities: connectivity function in the EW direction at the v support.

Figure 8: Three- and four-data conditional probabilities: connectivity function in the EW direction at the v support, conditioned to data from the other supports.

Figure 9: Conditional correlations: square of the conditional correlation between the data D_k^(n), k = 1, ..., 4, given the unknown A = 1.

Figure 10: Exact averaged tau weights τ_i: the sequence-averaged exact tau weights computed for each case of data conditioning using the reference vectorial Ti.

Figure 11: Calibration parameter t^(n) for all cases: all sets of combinations.

Figure 12: Tau weights using calibration: all sets of tau weights from the calibration approximation.

Figure 13: Impact of an incorrect conditional independence assumption: all sets of combinations.