Bayesian inference of random fields represented with the Karhunen-Loève expansion


Felipe Uribe (corresponding author, felipe.uribe@tum.de), Iason Papaioannou (iason.papaioannou@tum.de), Wolfgang Betz (wolfgang.betz@tum.de), Daniel Straub (straub@tum.de)
Engineering Risk Analysis Group, Technische Universität München, Arcisstraße 21, 80333 München, Germany.

Preprint submitted to Computer Methods in Applied Mechanics and Engineering, November 2018.

Abstract

The integration of data into engineering models involving uncertain and spatially varying parameters is oftentimes key to obtaining accurate predictions. Bayesian inference is effective in achieving such an integration. Uncertainties related to spatially varying parameters are typically represented through random fields discretized into a finite number of random variables. The prior correlation length and variance of the field, as well as the number of terms used in the random field discretization, have a considerable impact on the outcome of the Bayesian inference, an impact that has nevertheless received little attention in the literature. Here, we investigate the implications of different choices in the prior random field model on the outcome of the Bayesian inference. We employ the Karhunen-Loève expansion for the representation of the random fields. We show that a higher-order Karhunen-Loève discretization is required in Bayesian inverse problems, as compared to standard prior uncertainty propagation. Furthermore, the smoothing effect of the forward operator has a large influence on the posterior solution, especially when the quantity of interest is sensitive to local random fluctuations of the inverse quantity. This is also reflected in the magnitude of updated rare event probabilities. We illustrate these effects analytically through a 1D cantilever beam with spatially varying flexibility, and numerically using a 2D linear elasticity example where the Young's modulus is spatially variable.

Keywords: uncertainty quantification, inverse problems, Bayesian inference, random fields, Karhunen-Loève expansion, reliability updating.

1. Introduction

In science and engineering, physical systems are typically modeled by partial differential equations (PDEs). Such PDEs require a proper description of the underlying system inputs and parameters. In practice, there is often significant uncertainty about the actual value of these properties. The associated uncertainties can be reduced through measurements of the system response. If these observations are combined with the PDE model, information about the uncertain system inputs and parameters can be retrieved. This inference process is referred to as an inverse problem. The stability of the inverse problem is mainly controlled by the dimension of the parameter space, the structure of the PDE, and the observations, which are in most cases scarce and noisy. Hence, inverse problems are typically ill-posed. This means that different values of the model parameters are consistent with the data, or that the parameters cannot be identified at all. Bayesian statistical methods provide a tool to regularize the problem by incorporating a probabilistic description of the model parameters that combines prior information with observations [, ]. In this case, the objective is to estimate the posterior probability density function (PDF) of the model parameters. Closed-form expressions for the posterior density are only available for some particular cases.

Hence, Bayesian inverse problems are generally solved using sampling-based methods, such as Markov chain Monte Carlo (MCMC), importance sampling [3], sequential Monte Carlo [4], structural reliability methods [5], or approximation methods such as transport maps [6] and variational inference [7], among others.

An additional level of complexity is present when the unknown model parameters fluctuate randomly in space. A common example arises in continuum mechanics, where material parameters are spatially variable, such as the Young's modulus in elasticity theory [8], conduction and convection coefficients in heat transfer problems [9], or permeability fields in hydraulic tomography applications []. The uncertainty related to spatially varying properties is generally represented by random fields. This mathematical object implies an infinite-dimensional collection of random variables indexed by the spatial coordinates of the continuous domain of the system. For Bayesian inverse problems that are solved numerically, the infinite-dimensional parameter space needs to be projected onto a suitable finite-dimensional one. The Karhunen-Loève (K-L) expansion is a random field discretization approach that is optimal in the mean-squared-error sense as compared to any other spectral projection algorithm. This method employs the eigenvalues and eigenfunctions of the autocovariance operator describing the random field to construct a series expansion with random coefficients. In practice, it is common to truncate the K-L expansion after a finite number of terms. Thereafter, the uncertain parameters associated with the full random field are replaced by the coefficients of the truncated expansion, thereby reducing the dimensionality of the inverse problem. We remark that alternative approaches to dimensionality reduction are also used in the context of Bayesian inversion, as in the case of wavelet-based parametrization [], likelihood-informed subspaces [], active subspaces [3], or the approach recently proposed in [4].

A main challenge in Bayesian inference of random fields is the choice of the prior distribution for the parameters that generate the field. Commonly, the number of terms used in the random field discretization is fixed, as are the correlation length and variance of the field. These quantities have a considerable impact on the random field representation and, consequently, on the Bayesian inversion. The difficulty of selecting appropriate prior distributions for random fields has fostered research on hierarchical Bayesian approaches. In this regard, Marzouk and Najm [5] applied the K-L expansion with a hierarchical Gaussian process prior using the mean and variance of the field as hyperparameters; the full mathematical model is replaced by a polynomial chaos surrogate, yielding an efficient evaluation of the likelihood function. They also performed an error analysis on the K-L approximation of the posterior random fields. Tagade and Choi [6] extended the approach in [5] using a larger hierarchical structure, where the correlation length was also part of the inference process. Cotter et al. [7] generalized several MCMC algorithms to the realm of functions. In particular, they proposed a Metropolis-within-Gibbs algorithm to infer both the random coefficients of the K-L expansion and its truncation order. Mondal et al. [8] also performed inference on the number of terms in the K-L expansion by applying the reversible jump Markov chain Monte Carlo algorithm [9]. Sraj et al. [] included the correlation length of the field as a hyperparameter.
They proposed a parametrized autocovariance function to reduce the computational cost associated with the repeated solution of the eigenvalue problem required by the sample-based inference process. Moreover, Roininen et al. [] applied Cauchy and Gaussian hyperpriors to the correlation length of the field using non-homogeneous Matérn covariance kernels. They used a combined Gibbs and Metropolis-within-Gibbs algorithm for the solution of the hierarchical Bayesian inverse problem. Recently, Fuglstad et al. [] used the concept of penalized complexity priors proposed in [3] to derive a joint prior for the variance and correlation length of Gaussian random fields; they also provided guidelines for selecting the hyperparameters and priors for non-homogeneous random fields. Latz et al. [4] proposed a Metropolis-within-Gibbs algorithm to jointly infer a parameterized Gaussian random field and its correlation length; they applied a reduced basis algorithm to decrease the computational cost of the simulation.

Despite extensive research on the development of numerical methods for Bayesian inference of random fields, the influence of the random field discretization on the solution of the inverse problem has received little attention. Li [5] derived an error bound between the maximum a posteriori estimator and the truncated K-L representation in terms of the eigenvalues of the prior covariance. Spantini et al. [6] pointed out that, in order to avoid large truncation errors in the posterior solution associated with the K-L discretization, the prior distribution needs to impose significant smoothness on the parameters (i.e., the eigenvalues of the prior covariance decay fast).

In this paper, the effect of the K-L discretization on the Bayesian inverse problem solution is investigated. We extend the analysis of [7] by showing analytically and numerically the influence of different prior assumptions on the posterior solution. We perform two studies: (i) a one-dimensional example, for which closed-form expressions of the posterior random field can be derived. A parametric study is carried out to evaluate the influence of the prior correlation length, the autocovariance kernel of the field, and the number of terms in the K-L expansion on the posterior random field. Different sets of observations are also considered in order to assess the influence of the number of measurement points on the random field updating. The analytical expressions enable us to perform a systematic error analysis of the posterior mean and variance approximations. Furthermore, for a given parameter setting, we perform model selection on different truncation orders in the K-L expansion. This allows us to evaluate whether a larger number of K-L terms is required for the solution of the Bayesian inverse problem as compared to the forward problem solution; (ii) a two-dimensional numerical example is used to study the smoothing effect of the forward operator on different quantities of interest (QoI). In this case, the Bayesian inverse problem is solved using the BUS (Bayesian updating with structural reliability methods) approach proposed in [5, 8]. The identified random field is then employed to evaluate the influence of the random field discretization on the updating of rare event probabilities, using the approach discussed in [9].

The remainder of this work is structured as follows: in 2, a brief summary of random fields and the K-L expansion is presented; Whittle-Matérn covariance kernels and error measures used for the random field discretization are also introduced. In 3, the Bayesian approach to inverse problems in the context of random fields is formulated; furthermore, the principles of the BUS approach are described. Next, the influence of the random field discretization on the posterior random field is demonstrated by means of analytical and numerical experiments in 4. The main findings of the study are summarized in 5, and the paper closes with the conclusions.

2. Modeling and representation of random fields

Random fields provide an effective tool for the modeling of system inputs and parameters that fluctuate continuously through space. The following discussion follows the expositions of Adler [3] and Grigoriu [3].

2.1. Definition of a random field

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, $D \subset \mathbb{R}^d$ a bounded index set representing a physical domain, and $L^2(\Omega, \mathbb{P})$ the Hilbert space of second-order random variables (finite variance), with inner product $\langle X, Y \rangle = \mathbb{E}[XY]$ for $X, Y \in L^2(\Omega, \mathbb{P})$. A random field can be understood as a function $H(x, \omega): D \times \Omega \to \mathbb{R}$, with arguments $x \in D$ a spatial coordinate and $\omega \in \Omega$ a generic outcome of the sample space [3]. Intuitively, a random field is a collection of random variables representing uncertain values at each spatial coordinate of $D$. Notice that if $D$ is uncountable, it is not possible to specify the joint distribution of all random variables defining the random field. Hence, from a modeling perspective, a random field is characterized in terms of its finite-dimensional (fi-di) distributions (general definitions are given in [3, 3]).
Consider the finite set of points $\mathbf{x} = \{x_1, \ldots, x_n \mid x_i \in D,\ i = 1, \ldots, n\}$, associated with a set of random variables $H(\mathbf{x}, \omega) = \{H(x_1, \omega), \ldots, H(x_n, \omega) \mid H(x_i, \omega) \in L^2(\Omega, \mathbb{P})\}$, with joint distribution $F_H(\mathbf{y}) = \mathbb{P}[H(\mathbf{x}, \omega) \le \mathbf{y}]$, called the $n$-th order fi-di distribution of the random field [3]. A random field is defined by its family of fi-di distributions, provided they exist and satisfy Kolmogorov's conditions of consistency and symmetry (see [3, p.3]). A random field is Gaussian if its fi-di distributions are multivariate Gaussian for any $\mathbf{x} \in D$ [3]. Gaussian random fields are completely characterized by their first- and second-order moments, i.e., the mean function $\mu_H(x) = \mathbb{E}[H(x, \omega)]$ and the autocovariance function $C_{HH}(x, x') = \mathbb{E}[(H(x, \omega) - \mu_H(x))(H(x', \omega) - \mu_H(x'))] = \sigma_H(x)\,\sigma_H(x')\,R_{HH}(x, x')$, with $\sigma_H(x)$, $\sigma_H(x')$ and $R_{HH}(x, x')$ the standard deviation and autocorrelation functions of the field. Moreover, a random field is said to be homogeneous if the associated fi-di distributions of the field are invariant under arbitrary shifts in space $d = x - x'$; and the field is weakly homogeneous if the mean function $\mu_H(x) = \mu_H$ is space-invariant and the autocovariance function only depends on the shift, i.e., $C_{HH}(x, x') = C_{HH}(d)$ [3].

Further, if the autocovariance function is independent of the direction, i.e., if it is a function of the Euclidean norm $d = \|x - x'\|$, the random field is isotropic.

It is clear that a proper definition of a random field implies the construction of a fi-di distribution family with $n \to \infty$. Such a theoretical description is not commonly used in practice, since it is not feasible to collect sufficient data to verify the assumed probabilistic models. Hence, the process of representing a continuous-parameter random field in terms of a finite set of random variables requires the use of stochastic discretization schemes (e.g., [3]). Among these representation techniques, methods based on finite expansions of random variables and deterministic functions are popular. These include the Karhunen-Loève expansion [33, 34], which expresses a random field as a linear combination of orthogonal functions chosen as the eigenfunctions resulting from the spectral decomposition of the autocovariance function of the field.

2.2. Karhunen-Loève expansion

Let us consider a real-valued random field $H(x, \omega)$ with continuous mean $\mu_H(x): D \to \mathbb{R}$ and autocovariance function $C_{HH}(x, x'): D \times D \to \mathbb{R}$. Autocovariance functions belong to the class of Hilbert-Schmidt kernels (functions with finite $L^2$-norm), which are symmetric and positive-semidefinite [3]. These properties guarantee the existence of an orthonormal basis consisting of the eigenfunctions of the associated covariance operator, such that the sequence of corresponding eigenvalues is real and non-negative [35]. Following Mercer's theorem, the autocovariance kernel can be represented by a series expansion based on the spectral representation of the covariance operator [36, p.48],

$$C_{HH}(x, x') = \sum_{k=1}^{\infty} \lambda_k\, \phi_k(x)\, \phi_k(x') \qquad (1)$$

where $\lambda_k \in [0, \infty)$ (with $\lambda_k \ge \lambda_{k+1}$ and $\lim_{k \to \infty} \lambda_k = 0$) and $\phi_k(x): D \to \mathbb{R}$ are the eigenvalues and eigenfunctions of the covariance operator. A direct consequence of this result is the representation of a random field in terms of a series expansion. Hence, a second-order random field $H(x, \omega)$ can be approximated by $\hat{H}(x, \omega)$ using the Karhunen-Loève (K-L) expansion truncated after the $M$-th term as [37]

$$H(x, \omega) \approx \hat{H}(x, \omega) := \mu_H(x) + \sum_{k=1}^{M} \sqrt{\lambda_k}\, \phi_k(x)\, \theta_k(\omega), \qquad (2)$$

here, $\theta_k(\omega): \Omega \to \mathbb{R}$ is a set of mutually uncorrelated random variables with zero mean and unit variance (i.e., $\mathbb{E}[\theta_k(\omega)] = 0$ and $\mathbb{E}[\theta_k(\omega)\theta_l(\omega)] = \delta_{kl}$). If the random field is Gaussian, the random variables $\theta_k(\omega)$ are independent standard Gaussian. In any other case, the joint distribution of $\theta_k(\omega)$ is difficult to obtain. However, a class of non-Gaussian random fields, the so-called translation fields [38], can still be represented with the K-L expansion through a suitable isoprobabilistic transformation of an underlying Gaussian field. Notice that Eq. (2) separates the random field as $H(x, \omega) = \mu_H(x) + H_\sigma(x, \omega)$, that is, into the mean path of the field and a zero-mean (centered) random field that incorporates the covariance information.

The set of eigenpairs $\{\lambda_k, \phi_k\}$ is computed through the solution of a homogeneous Fredholm integral equation of the second kind [37],

$$\int_D C_{HH}(x, x')\, \phi_k(x')\, dx' = \lambda_k\, \phi_k(x), \qquad (3)$$

whose analytical solution exists only for specific cases of autocovariance functions [37]. In general, this equation is solved numerically using projection methods (e.g., collocation, Galerkin) [39], which express the eigenfunctions as a linear combination of complete basis functions.
Other approaches include degenerate kernel methods [4], which approximate the target kernel by a separable kernel given by the sum of a finite number of products of functions; Nyström methods [4], which solve the integral equation using Gaussian quadrature rules; and circulant embedding [4], which uses the fast Fourier transform to diagonalize a nested-block-circulant extension of a nested-block-Toeplitz covariance matrix; this construction provides a finite expansion of the field in terms of a deterministic basis.
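To make the truncated expansion of Eq. (2) concrete, the following sketch approximates the eigenpairs of Eq. (3) by an eigendecomposition of the covariance matrix evaluated on a fine grid (a simple Nyström-type scheme with uniform weights) and draws one realization of the discretized field. It is only an illustration of the mechanics; the domain, kernel, correlation length and truncation order are assumed values and do not correspond to the settings used later in the paper.

```python
import numpy as np

# Assumed setting: exponential kernel on D = [0, L], unit variance
L, n = 5.0, 500
x = np.linspace(0.0, L, n)
dx = L / (n - 1)
sigma_H, l_c = 1.0, 0.5
C = sigma_H**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / l_c)

# Nystrom-type approximation of the Fredholm problem, Eq. (3):
# the eigendecomposition of C*dx yields discrete eigenpairs (lambda_k, phi_k)
lam, phi = np.linalg.eigh(C * dx)
lam = np.clip(lam[::-1], 0.0, None)          # decreasing, non-negative eigenvalues
phi = phi[:, ::-1] / np.sqrt(dx)             # eigenvectors normalized in L2(D)

# Truncated K-L expansion, Eq. (2), with standard Gaussian coefficients theta_k
M, mu_H = 20, 0.0
theta = np.random.randn(M)
H_hat = mu_H + phi[:, :M] @ (np.sqrt(lam[:M]) * theta)

# Fraction of the total (integrated) variance captured by the first M terms
print("captured variance fraction:", lam[:M].sum() / (sigma_H**2 * L))
```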

2.3. Whittle-Matérn covariance kernels

Covariance kernels for random field modeling are empirical models used to define the particular correlation characteristics of a random field. A flexible class of isotropic Hilbert-Schmidt kernels used for the definition of random fields is the so-called Whittle-Matérn family, defined as [43]

$$C_\nu(d) = \sigma_H^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\, d}{l_c}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\, d}{l_c}\right) \qquad (4)$$

where $d = \|x - x'\|$, $\Gamma(\cdot)$ is the gamma function, $K_\nu(\cdot)$ is the modified Bessel function of the second kind, $l_c$ is a range parameter (correlation length), and $\nu > 0$ is a smoothing parameter. The parameters of the Matérn model, $l_c$ and $\nu$, can be fitted based on experimental measurements. The value of $\nu$ determines the smoothness of the random field; this is important when the field is used to make predictions. However, $\nu$ is typically fixed, since it is poorly identified in practical applications [44]. From Eq. (4), the special case $\nu = 1/2$ and the limiting case $\nu \to \infty$ are of particular interest,

$$C_{1/2}(d) = \sigma_H^2 \exp\!\left(-\frac{d}{l_c}\right) \qquad \text{and} \qquad C_{\infty}(d) = \sigma_H^2 \exp\!\left(-\frac{d^2}{2\, l_c^2}\right) \qquad (5)$$

which correspond to the non-differentiable exponential and the infinitely differentiable squared exponential (also called Gaussian) autocovariance kernels, respectively.

2.4. Error measures for random field discretization

For the K-L expansion, the number of terms to be included in the series is closely related to the magnitudes of the eigenvalues of the covariance operator, which in turn strongly depend on the correlation length of the field. Specifically, the quality of the discretization is quantified with respect to the level of accuracy in the estimation of the exact mean (bias) and variance (variability) functions of the random field. Local point-wise error measures for the mean and variance can be defined as the relative difference between the exact and approximated random fields:

$$\epsilon_\mu(x) = \frac{\mathbb{E}[H(x, \omega)] - \mathbb{E}[\hat{H}(x, \omega)]}{\mathbb{E}[H(x, \omega)]} \qquad \epsilon_\sigma(x) = \frac{\mathbb{V}[H(x, \omega)] - \mathbb{V}[\hat{H}(x, \omega)]}{\mathbb{V}[H(x, \omega)]} \qquad (6)$$

here, $\epsilon_\mu$ and $\epsilon_\sigma$ are the relative errors in the mean and variance, respectively. Global error measures can also be applied to quantify the overall quality of the random field representation. These measures are defined for the mean and the variance as their average values over the domain of definition $D$ of the random field [39],

$$\bar{\epsilon}_\mu = \frac{1}{|D|} \int_D \epsilon_\mu(x)\, dx \qquad \text{and} \qquad \bar{\epsilon}_\sigma = \frac{1}{|D|} \int_D \epsilon_\sigma(x)\, dx \qquad (7)$$

where $|D| = \int_D dx$. These error measures allow one to evaluate the quality of the prior and posterior random field estimates. For the prior random field, the mean function can be represented exactly with the K-L expansion, i.e., $\epsilon_\mu(x) = 0$; and the variance function is approximated as $\mathbb{V}[\hat{H}(x, \omega)] = \sum_{k=1}^{M} \lambda_k\, \phi_k^2(x)$, which yields $\epsilon_\sigma(x) = 1 - \frac{1}{\sigma_H^2} \sum_{k=1}^{M} \lambda_k\, \phi_k^2(x)$. For the posterior random field, these expressions are no longer valid and estimation based on posterior statistics is necessary. When data is available, other error measures are of relevance, such as the relative misfit between the approximated forward model and the observed data. In this study, only global error measures that average local point-errors over the domain are considered, to facilitate a comparison between the prior and posterior random field approximations.
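As a small numerical complement to Eqs. (4)-(7), the sketch below evaluates the Whittle-Matérn kernel with scipy and reports the global prior variance error of a truncated K-L discretization obtained as in the previous sketch. All parameter values are assumed for illustration only.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(d, sigma2=1.0, l_c=0.5, nu=0.5):
    """Whittle-Matern covariance of Eq. (4); nu = 0.5 recovers the exponential kernel."""
    d = np.maximum(np.asarray(d, dtype=float), 1e-12)   # avoid the singularity of kv at d = 0
    s = np.sqrt(2.0 * nu) * d / l_c
    return sigma2 * (2.0**(1.0 - nu) / gamma(nu)) * s**nu * kv(nu, s)

# Discrete K-L of a unit-variance field on D = [0, 5] (assumed setting)
L, n, M = 5.0, 400, 20
x = np.linspace(0.0, L, n)
dx = L / (n - 1)
C = matern(np.abs(x[:, None] - x[None, :]))
lam, phi = np.linalg.eigh(C * dx)
lam = np.clip(lam[::-1], 0.0, None)
phi = phi[:, ::-1] / np.sqrt(dx)

# Pointwise variance error of Eq. (6) (with sigma_H^2 = 1) and its global average, Eq. (7)
eps_sigma = 1.0 - (phi[:, :M]**2 * lam[:M]).sum(axis=1)
print("global variance error:", eps_sigma.mean())
```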

3. Bayesian inference for inverse problems

3.1. Inference of random fields

Consider the forward problem $y = \mathcal{G}(H(x, \omega))$, where $\mathcal{G}: L^2(D) \to L^2(D)$ is the forward response model expressing the relationship between the model output and the spatially varying parameters. The forward model $\mathcal{G}$ is generally masked by an observation operator, such that the model output is computed at $m$ specific measurement locations, with $m$ denoting the number of observations. Since the spatially varying parameters are modeled by random fields, they are parametrized in the physical and stochastic space by $x$ and $\omega$. As a result, $\mathcal{G}$ typically implies the solution of a stochastic partial differential equation (SPDE) projected onto the space of observations $\mathcal{Y} \subseteq \mathbb{R}^m$.

The observed data frequently contain noise. In classical inverse problems, this noise is usually modeled as additive and mutually independent of the uncertain parameters; this assumption yields

$$\tilde{y} = \mathcal{G}(H(x, \omega)) + \eta \qquad (8)$$

where $\tilde{y} \in \mathbb{R}^m$ is the vector of observations, and the noise random vector $\eta \in \mathbb{R}^m$ follows a Gaussian distribution with mean zero and non-singular covariance matrix $\Sigma_{\eta\eta} \in \mathbb{R}^{m \times m}$. Other noise models exist in the literature, e.g., multiplicative errors, convolution of measurement and model error distributions, among others (see [, 45]).

The inverse problem in Eq. (8) is difficult to solve since it is generally ill-posed. This is mainly because the outcome space of the random field is infinite-dimensional, while the dimension of the data space is finite. For this reason, the framework of Bayesian statistical theory is employed. The advantage of Bayesian inference for inverse problems lies in the fact that the prior information represents a mechanism of regularization [, 5]. Furthermore, Bayesian updating facilitates the assessment of the impact of the uncertain parameters on the solution of the forward problem, on the prediction of a given quantity of interest, and on the estimation of rare event probabilities.

Bayes' theorem in infinite dimensions is interpreted as the Radon-Nikodym derivative of the posterior probability measure with respect to the prior probability measure [46]. In practice, the random field $H(x, \omega)$ is substituted by its discrete representation $\hat{H}(x, \omega)$ in terms of a finite number of random variables. In most cases, the discretized random field lies in a high-dimensional parameter space. In particular, the K-L expansion can be used to reduce the dimensionality and parametrize the random field. Consider the square-integrable random vector $\theta(\omega) \in \Theta \subseteq \mathbb{R}^M$ resulting from a truncated K-L series expansion (Eq. (2)). Observe that since the parameter $\theta(\omega)$ characterizes the randomness of the field, performing inference on the random field $\hat{H}(x, \omega)$ is analogous to inferring directly the random vector $\theta(\omega)$; we henceforth denote the approximated random field as $\hat{H}(x, \theta)$, and consequently, the forward response operator is now a map $\mathcal{G}: \Theta \to \mathcal{Y}$.

In Bayesian inverse problems, it is assumed that the initial knowledge about the parameters before considering any measurement can be summarized by a probability density function (PDF) $\theta \sim f(\theta)$, called the prior distribution. The updated belief about $\theta$ after including the data $\tilde{y}$ represents the solution of the inverse problem, that is, the posterior distribution $f(\theta \mid \tilde{y})$. Following Bayes' theorem, this conditional PDF is []

$$f(\theta \mid \tilde{y}) = \frac{f(\theta)\, L(\theta \mid \tilde{y})}{Z(\tilde{y})} \qquad (9)$$

where the likelihood function $L(\theta \mid \tilde{y}) = f(\tilde{y} \mid \theta)$ provides a link between model and data, and the model evidence $Z(\tilde{y}) = \int_\Theta L(\theta \mid \tilde{y})\, f(\theta)\, d\theta$ is a normalization constant.
The value of $Z(\tilde{y})$ gives information about the plausibility of the assumed model, and it is used in the context of model selection and averaging [47]. As a result of the K-L representation, Gaussian or translation random fields are implicitly endowed with a multivariate Gaussian prior distribution (also known as a Gaussian process prior) whose second-order moment properties need to be defined, that is, the prior mean and autocovariance functions. Even for homogeneous Gaussian random fields controlled only by the correlation length and the variance, the choice of the prior distribution remains a challenge. This is due to the fact that the prior information about the autocovariance kernel is usually vague. Additionally, the observed data is often not sufficient to clearly identify the correlation structure. Therefore, the assumed prior probabilistic model has a large influence on the posterior random field solution and on the rare event updating.

A hierarchical Bayesian framework simplifies the prior modeling of the target random field by the inclusion of hyperparameters for the definition of the autocovariance kernel, such as the variance and correlation length of the field [5, 6, ]. This approach is not considered here, since the target is to directly study the implications of different parameter choices on the posterior solution.

Remark 1. Since the random vector $\theta \in \mathbb{R}^M$ of K-L expansion coefficients is standard Gaussian distributed, the prior density is fixed as $f(\theta) = \mathcal{N}(0, \mathbf{I})$, with $\mathbf{I} \in \mathbb{R}^{M \times M}$ the identity matrix. For a given modeling setting, the prior information about the second-order properties of the field enters directly in the definition of the likelihood function.

3.2. The BUS framework

In most cases, the solution of Bayesian inverse problems requires the application of numerical methods. MCMC-based algorithms are typically employed to generate samples from the target posterior distribution. A disadvantage of standard MCMC samplers is that they often incur a large computational cost, since the underlying PDE model needs to be solved many times to achieve convergence. Moreover, the convergence rate of such methods typically deteriorates when the dimension of the parameter space increases. Specialized sampling-based algorithms [4, 7, 3] alleviate some of the issues of standard MCMC.

A recently proposed framework for Bayesian inference is BUS (Bayesian Updating with Structural reliability methods) [5]. The BUS approach is based on the classical rejection sampling algorithm and expresses Bayesian inference as an equivalent rare event simulation problem. Let $\pi(\theta)$ be an unnormalized version of the posterior distribution in Eq. (9), i.e., $\pi(\theta) = f(\theta)\, L(\theta \mid \tilde{y})$. In BUS, the proposal distribution of rejection sampling $q(\theta)$ is set equal to the prior distribution $f(\theta)$ (provided that $f(\theta)$ has heavier tails than $\pi(\theta)$). The acceptance probability of rejection sampling becomes

$$\alpha = \frac{\pi(\theta)}{\hat{c}\, q(\theta)} = \frac{f(\theta)\, L(\theta \mid \tilde{y})}{\hat{c}\, f(\theta)} = c\, L(\theta \mid \tilde{y}) \qquad (10)$$

where $c = 1/\hat{c}$ is a positive constant satisfying $c\, L(\theta \mid \tilde{y}) \le 1$. Consequently, a proposed sample $\theta \sim f(\theta)$ is accepted if $\upsilon \le c\, L(\theta \mid \tilde{y})$, and rejected otherwise. The auxiliary parameter $\upsilon \in \Upsilon = [0, 1]$ is a standard uniform random variable ($\upsilon \sim \mathcal{U}[0, 1]$) that is included in the space of random variables ($\bar{\Theta} = [\Theta, \Upsilon]$). From this construction we can define the space

$$\mathcal{H} = \{[\theta, \upsilon] \in \bar{\Theta} : \upsilon \le c\, L(\theta \mid \tilde{y})\}. \qquad (11)$$

In reliability analysis, the space $\mathcal{H}$ can be seen as a failure domain with associated limit state function $h(\theta, \upsilon) = \upsilon - c\, L(\theta \mid \tilde{y})$. We refer to this space as the observation domain, since samples drawn from the prior distribution follow the posterior distribution if and only if they belong to $\mathcal{H}$. Observe also that if the samples belong to $\mathcal{H}$, they describe a failure event that represents a rare event estimation problem. This connection allows us to use existing methods from rare event simulation to perform Bayesian inference. For instance, the classical rejection sampling algorithm corresponds to employing standard Monte Carlo simulation in BUS. In order to perform Bayesian inference efficiently, BUS is typically combined with the subset simulation (SuS) method [48]. The main advantage of SuS lies in its ability to transform a rare event estimation problem into a sequence of problems involving more frequent events. Moreover, the performance of the method does not deteriorate with increasing dimension of the uncertain parameter space.
When using SuS in combination with BUS, the resulting posterior samples are unweighted but correlated (due to the adaptive choice of the intermediate levels and the MCMC steps) [8]. The implementation of BUS requires the choice of the constant $c = 1/\hat{c}$. In BUS, the parameter $\hat{c}$ is optimally chosen as the maximum of the likelihood function. However, in most cases this value is not known in advance. Therefore, an adaptive version of BUS, in which the constant $c$ is not required beforehand and is computed sequentially as the simulation evolves, is proposed in [49, 8]. Additionally, a method that does not require the scaling constant $\hat{c}$ to be equal to the maximum likelihood and incorporates a re-sampling step to draw samples from the posterior is introduced in [5].
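The basic BUS construction of Eqs. (10)-(11) can be illustrated with plain rejection sampling (i.e., standard Monte Carlo on the observation domain). The sketch below does this for a toy scalar problem; the prior, likelihood and constant c are assumed purely for illustration and are unrelated to the examples of Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (assumed): scalar parameter, standard Gaussian prior, Gaussian likelihood
y_obs, sigma_eps = 0.8, 0.3
def likelihood(theta):
    return np.exp(-0.5 * ((y_obs - theta) / sigma_eps) ** 2)

# BUS constant c = 1/c_hat with c_hat >= max L; here the maximum of L is 1 by construction
c = 1.0

# Rejection sampling on the observation domain H = {(theta, u) : u <= c * L(theta)}, Eq. (11)
N = 100_000
theta = rng.standard_normal(N)       # prior samples theta ~ f(theta)
u = rng.uniform(size=N)              # auxiliary variable u ~ U[0, 1]
accepted = u <= c * likelihood(theta)

posterior = theta[accepted]
print("acceptance rate:", accepted.mean())
print("posterior mean / std:", posterior.mean(), posterior.std())
```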

3.3. Updating of rare event probabilities

In the context of reliability analysis and rare event estimation, the performance of the system under consideration can be described by a limit state function (LSF) $g: \Theta \to \mathbb{R}$. The failure hypersurface defined by $g(\theta) = 0$ splits the space of uncertain variables into the safe domain $\mathcal{S} = \{\theta \in \Theta : g(\theta) > 0\}$ and the failure domain $\mathcal{F} = \{\theta \in \Theta : g(\theta) \le 0\}$. The probability of occurrence of $\mathcal{F} \subseteq \Theta$, referred to as the probability of failure, is defined by

$$\mathbb{P}[\mathcal{F}] = \int_\Theta \mathbb{I}_{\mathcal{F}}[\theta]\, f(\theta)\, d\theta \qquad (12)$$

where $f(\theta)$ is the prior PDF of the model parameters and $\mathbb{I}[\cdot]$ denotes the indicator function, which takes the value $\mathbb{I}_{\mathcal{F}}[\theta] = 1$ when $\theta \in \mathcal{F}$, and $\mathbb{I}_{\mathcal{F}}[\theta] = 0$ otherwise. A special challenge involves the analysis of rare events, that is, when Eq. (12) represents the solution of a potentially high-dimensional integral for which $\mathbb{P}[\mathcal{F}]$ is very small.

The information provided by measured or observed data can be incorporated into the analysis to improve the probability of failure estimate. This implies the computation of failure probabilities conditional on the observations $\tilde{y}$. The updated probability of failure $\mathbb{P}[\mathcal{F} \mid \tilde{y}]$ can be estimated using the posterior PDF of the model parameters as

$$\mathbb{P}[\mathcal{F} \mid \tilde{y}] = \int_\Theta \mathbb{I}_{\mathcal{F}}[\theta]\, f(\theta \mid \tilde{y})\, d\theta = \frac{1}{Z(\tilde{y})} \int_\Theta \mathbb{I}_{\mathcal{F}}[\theta]\, f(\theta)\, L(\theta \mid \tilde{y})\, d\theta. \qquad (13)$$

Advanced simulation methods can be employed for the estimation of the integrals in Eqs. (12) and (13), e.g., sequential importance sampling [5], the cross-entropy method [3], moving particles [5], or subset simulation [48]. However, the estimation of the integral (13) is a more challenging task than (12), since it requires sampling from the tails of the posterior distribution. Several strategies are proposed in the literature to estimate this posterior failure probability, e.g., [53, 54, 55, 56].

In the context of BUS, the reliability updating problem is approached as follows. The posterior distribution of the parameter vector $\theta$ is computed by conditioning the joint distribution of $[\theta, \upsilon]$ on the observation domain $\mathcal{H}$ and marginalizing over $\upsilon$, i.e., $f(\theta \mid \tilde{y}) = c_\theta^{-1} \int_0^1 \mathbb{I}_{\mathcal{H}}[\theta, \upsilon]\, f(\theta)\, d\upsilon$, with the normalizing constant $c_\theta = \int_\Theta \int_0^1 \mathbb{I}_{\mathcal{H}}[\theta, \upsilon]\, f(\theta)\, d\upsilon\, d\theta$. Hence, the posterior failure probability can be expressed in terms of two rare event estimation tasks [9],

$$\mathbb{P}[\mathcal{F} \mid \tilde{y}] = \frac{\int_{\bar{\Theta}} \mathbb{I}_{\mathcal{F}}[\theta]\, \mathbb{I}_{\mathcal{H}}[\theta, \upsilon]\, f(\theta)\, d\upsilon\, d\theta}{\int_{\bar{\Theta}} \mathbb{I}_{\mathcal{H}}[\theta, \upsilon]\, f(\theta)\, d\upsilon\, d\theta} = \frac{\int_{\{g(\theta) \le 0\, \cap\, h(\theta, \upsilon) \le 0\}} f(\theta)\, d\upsilon\, d\theta}{\int_{\{h(\theta, \upsilon) \le 0\}} f(\theta)\, d\upsilon\, d\theta} = \frac{\mathbb{P}[g(\theta) \le 0 \cap h(\theta, \upsilon) \le 0]}{\mathbb{P}[h(\theta, \upsilon) \le 0]}, \qquad (14)$$

which implies the computation of a system reliability problem for the numerator and a component reliability problem for the denominator. In the general case, both reliability estimation tasks in Eq. (14) need to be solved. However, if the Bayesian inverse problem has already been computed with the BUS approach (or any other method), samples from the posterior distribution are available and can be used to accelerate the estimation of the posterior probability of failure $\mathbb{P}[\mathcal{F} \mid \tilde{y}]$. This is because the posterior samples belong to the observation domain $\mathcal{H}$ associated with the LSF $h(\theta, \upsilon)$. Hence, only the estimation of the failure probability corresponding to the numerator is required, i.e., $\mathbb{P}[g(\theta) \le 0 \cap h(\theta, \upsilon) \le 0]$. This represents a conditional reliability problem that can be computed by any advanced rare event estimation algorithm.
In particular, if the SuS method combined with the BUS approach is employed for solving the Bayesian inverse problem, some minor modifications of the original algorithm are required: (i) limit state function of the observation domain: the LSF $h(\theta, \upsilon)$ is fixed at the beginning of the simulation for a suitable constant $c$ (if adaptive BUS is used [8], the posterior solution provides the constant $c$ at no additional cost); (ii) initial Monte Carlo samples: the samples $\theta$ at the first simulation level are the estimated posterior samples; and (iii) acceptance/rejection MCMC criterion: in the MCMC algorithm used within SuS, the candidate sample needs to satisfy the constraint $h(\theta, \upsilon) \le 0$ (in addition to the condition $g(\theta) \le 0$). This guarantees that the proposed samples are not only included in the failure domain $\mathcal{F}$, but also in the observation domain $\mathcal{H}$. Details of this approach are given in [9].
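A crude Monte Carlo version of this idea is sketched below: posterior samples are identified through the BUS acceptance condition, and the ratio in Eq. (14) is estimated by the fraction of those samples that also lie in the failure domain. The scalar likelihood and limit state function are assumed for illustration; for genuinely rare events, subset simulation would replace the crude estimator.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy posterior via the BUS construction (assumed scalar Gaussian likelihood, as above)
y_obs, sigma_eps, c = 0.8, 0.3, 1.0
theta = rng.standard_normal(500_000)                                  # prior samples
u = rng.uniform(size=theta.size)
in_H = u <= c * np.exp(-0.5 * ((y_obs - theta) / sigma_eps) ** 2)     # observation domain H

def g(t):
    """Assumed limit state function: failure when theta exceeds the threshold 2."""
    return 2.0 - t

# Prior failure probability, Eq. (12), and posterior failure probability, Eq. (14)
pf_prior = np.mean(g(theta) <= 0.0)
pf_post = np.mean((g(theta) <= 0.0) & in_H) / np.mean(in_H)
print("P[F] =", pf_prior, "  P[F | y] =", pf_post)
```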

4. Numerical investigations

The focus of this paper is the analysis of the implications of different parameter choices for the prior random field modeling on the solution of the Bayesian inverse problem. We aim at showing those effects by carrying out a parametric study on two examples, one for which it is possible to compute all the posterior quantities analytically, and a second one that requires the use of sampling-based approaches to estimate the posterior quantities.

4.1. 1D cantilever beam: analytical solution

In the following, an example for which it is possible to derive the posterior random field analytically is proposed. This enables a precise evaluation of the influence of the K-L discretization on the posterior solution.

4.1.1. Model description

We consider the second example in [5], the updating of the spatially variable flexibility $F(x)$ of a cantilever beam. The beam has length $L = 5$ m (i.e., the domain is the interval $D = [0, L]$) and is subjected to a deterministic point load $P$ (in kN) at the free end, as shown in Figure 1. The prior flexibility is described by a homogeneous Gaussian random field $F(x, \omega)$. The two Matérn kernels in Eq. (5) are considered as autocovariance functions $C_{FF}(x, x')$ for the flexibility; the prior mean $\mu_F$ and standard deviation $\sigma_F$ of the field are fixed (in units of $(\mathrm{kN\, m^2})^{-1}$). A parameter study on the correlation length $l_c$ is performed.

Figure 1: Cantilever beam: true values and two sets of deflection observations.

From the Euler-Bernoulli equation [57], the bending moment $M(x)$ in the beam can be computed from the differential equation

$$M(x) = E(x)\, I\, \frac{d^2 w(x)}{dx^2} \quad \Longleftrightarrow \quad M(x)\, F(x) = \frac{d^2 w(x)}{dx^2}, \qquad (15)$$

where $w(x)$ is the deflection, $E(x)$ is the elastic modulus, $I$ is the moment of inertia, and $F(x) = (E(x)\, I)^{-1}$ is the flexibility of the beam (the inverse of the bending stiffness).

Integrating Eq. (15) twice and noting that the bending moment of a cantilever beam can be calculated as $M(x) = (L - x)\, P$, the forward deflection response can be obtained by solving the following equation,

$$w(x, F(x)) = P \int_0^x\!\! \int_0^s (L - t)\, F(t)\, dt\, ds. \qquad (16)$$

The observation noise is modeled as additive and mutually independent of the uncertain flexibility. The noise is described by a joint Gaussian PDF with mean zero and covariance matrix $\Sigma_{\eta\eta}$. The noise covariance is computed by assuming that the measurements are correlated with an exponential kernel, with fixed standard deviation $\sigma_\eta$ and correlation length $l_\eta$. This results in the following likelihood function,

$$L(F(x) \mid \tilde{y}) = \frac{1}{\sqrt{(2\pi)^m \det(\Sigma_{\eta\eta})}} \exp\!\left(-\frac{1}{2}\, [\tilde{y} - w(\tilde{x}, F(x))]^{\mathrm{T}}\, \Sigma_{\eta\eta}^{-1}\, [\tilde{y} - w(\tilde{x}, F(x))]\right), \qquad (17)$$

here, $F(x)$ is a realization of the flexibility random field, and $\tilde{y}$ is a set of $m$ deflection observations measured at equally spaced points $\tilde{x}$ of the domain (Figure 1). The observations are generated by simulation assuming a true (but in real applications unknown) deflection of the beam. To avoid a so-called inverse crime [], the underlying true flexibility is generated at a much finer discretization than the one used during the inverse problem solution. Moreover, the full autocovariance information via Cholesky decomposition is used (assuming an exponential kernel with a fixed correlation length, and applying the same noise used in the likelihood).

4.1.2. Analytical solution for prior and posterior

The mean and autocovariance functions of the prior deflection can be evaluated using the prior information about the flexibility $F(x)$ and the forward operator. Since $F(x)$ is Gaussian and $w(x, F(x))$ is a linear function of $F(x)$, the prior distribution of the deflection is also Gaussian. Therefore, an expression for the mean of $w(x)$ can be obtained using $\mu_F$ in Eq. (16),

$$\mu_w(x) = P \int_0^x\!\! \int_0^s (L - t)\, \mu_F\, dt\, ds = \frac{P\, \mu_F}{6}\, x^2 (3L - x) \qquad (18)$$

and similarly, the autocovariance function of $w(x)$ can be deduced using $C_{FF}(x, x')$,

$$C_{ww}(x, x') = P^2 \int_0^x\!\! \int_0^{x'}\!\! \int_0^s\!\! \int_0^{s'} (L - t)(L - t')\, C_{FF}(t, t')\, dt\, dt'\, ds\, ds', \qquad (19)$$

which leads to different expressions depending on the choice of $C_{FF}(x, x')$. The mean, standard deviation and autocorrelation functions of the prior deflection random fields are shown in Figure 2. The autocorrelation functions of the prior flexibility are also plotted (the mean and standard deviation are not included since they are constant).

Closed-form expressions of the posterior random fields of the flexibility and deflection can also be derived in this example. Since the prior and likelihood are Gaussian, the posterior distribution is also Gaussian [47]. We introduce the random vector $\bar{F} = [F, \tilde{y}]$, which is comprised of the random vectors $F = F(x, \omega) \in \mathbb{R}^n$ and $\tilde{y} \in \mathbb{R}^m$, with $F$ representing the random field discretized at the spatial locations $x = [x_1, \ldots, x_n]$. The mean vector and covariance matrix of $\bar{F}$ can be partitioned accordingly in terms of individual and cross components [58]:

$$\mu_{\bar{F}} = \begin{bmatrix} \mu_F \\ \mu_{\tilde{y}} \end{bmatrix} \qquad \Sigma_{\bar{F}\bar{F}} = \begin{bmatrix} \Sigma_{FF} & \Sigma_{F\tilde{y}} \\ \Sigma_{F\tilde{y}}^{\mathrm{T}} & \Sigma_{\tilde{y}\tilde{y}} \end{bmatrix}. \qquad (20)$$

The $n$-th order fi-di posterior distribution of the flexibility random field $f(F \mid \tilde{y})$ can be obtained analytically from direct application of Bayes' theorem (see e.g., [, 3.4]); this conditional PDF is given by

$$f(F \mid \tilde{y}) = \frac{1}{\sqrt{(2\pi)^n \det(\Sigma_{FF \mid \tilde{y}})}} \exp\!\left(-\frac{1}{2}\, [F - \mu_{F \mid \tilde{y}}]^{\mathrm{T}}\, \Sigma_{FF \mid \tilde{y}}^{-1}\, [F - \mu_{F \mid \tilde{y}}]\right). \qquad (21)$$
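The double integral in Eq. (16) and the Gaussian log-likelihood of Eq. (17) are straightforward to evaluate numerically. The sketch below does so with cumulative trapezoidal integration for an arbitrary flexibility realization; the load, grid, noise covariance and flexibility values are assumed for illustration only and are not the values used in the example.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def deflection(F, x, P, L):
    """Forward model of Eq. (16): w(x) = P * int_0^x int_0^s (L - t) F(t) dt ds."""
    inner = cumulative_trapezoid((L - x) * F, x, initial=0.0)   # int_0^s (L - t) F(t) dt
    return P * cumulative_trapezoid(inner, x, initial=0.0)      # outer integration over s

def log_likelihood(F, x, obs_idx, y_obs, Sigma_eta, P, L):
    """Gaussian log-likelihood of Eq. (17) evaluated at the m observation locations."""
    r = y_obs - deflection(F, x, P, L)[obs_idx]
    _, logdet = np.linalg.slogdet(Sigma_eta)
    return -0.5 * (y_obs.size * np.log(2.0 * np.pi) + logdet
                   + r @ np.linalg.solve(Sigma_eta, r))

# Assumed usage: constant flexibility realization and 10 equally spaced observation points
P, L = 20.0, 5.0                              # load value assumed; length from the example
x = np.linspace(0.0, L, 200)
F = np.full_like(x, 1e-4)                     # flexibility realization (assumed value)
obs_idx = np.linspace(19, 199, 10, dtype=int)
Sigma_eta = 1e-6 * np.eye(obs_idx.size)       # assumed noise covariance
y_obs = deflection(F, x, P, L)[obs_idx]
print(log_likelihood(F, x, obs_idx, y_obs, Sigma_eta, P, L))
```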

Figure 2: Mean, standard deviation, and autocorrelation functions of the prior flexibility and deflection random fields, using the exponential and squared exponential kernels for $C_{FF}$ (with fixed $l_c$).

An analogous expression can be obtained for the posterior distribution of the deflection $f(w \mid \tilde{y})$. Those multivariate distributions are characterized by the conditional mean vectors $\mu_{F \mid \tilde{y}}$, $\mu_{w \mid \tilde{y}}$ and the conditional autocovariance matrices $\Sigma_{FF \mid \tilde{y}}$, $\Sigma_{ww \mid \tilde{y}}$, which are respectively given by [58]

$$\mu_{F \mid \tilde{y}} = \mu_F + \Sigma_{F\tilde{y}}\, \Sigma_{\tilde{y}\tilde{y}}^{-1} (\tilde{y} - \mu_{\tilde{y}}) \qquad \Sigma_{FF \mid \tilde{y}} = \Sigma_{FF} - \Sigma_{F\tilde{y}}\, \Sigma_{\tilde{y}\tilde{y}}^{-1}\, \Sigma_{F\tilde{y}}^{\mathrm{T}} \qquad (22a)$$

$$\mu_{w \mid \tilde{y}} = \mu_w + \Sigma_{w\tilde{y}}\, \Sigma_{\tilde{y}\tilde{y}}^{-1} (\tilde{y} - \mu_{\tilde{y}}) \qquad \Sigma_{ww \mid \tilde{y}} = \Sigma_{ww} - \Sigma_{w\tilde{y}}\, \Sigma_{\tilde{y}\tilde{y}}^{-1}\, \Sigma_{w\tilde{y}}^{\mathrm{T}} \qquad (22b)$$

these quantities are known from the prior random fields or can be computed analytically. The mean, standard deviation and autocorrelation functions of the posterior flexibility and deflection random fields are plotted in Figure 3.

4.1.3. Approximated solution for prior and posterior

When dealing with random fields, the Bayesian inference process involves analysis in high-dimensional spaces. In such cases, the uncertain function is typically represented by a suitable parametrization. The K-L expansion is employed here to discretize the prior flexibility random field. We express the forward operator in Eq. (16) in terms of the K-L expansion of the flexibility field as

$$\hat{w}(x, \theta) = P \int_0^x\!\! \int_0^s (L - t) \left[\mu_F(t) + \sum_{k=1}^{M} \sqrt{\lambda_k}\, \phi_k(t)\, \theta_k\right] dt\, ds \qquad (23a)$$

$$= P \int_0^x\!\! \int_0^s (L - t)\, \mu_F(t)\, dt\, ds + P \int_0^x\!\! \int_0^s (L - t) \sum_{k=1}^{M} \sqrt{\lambda_k}\, \phi_k(t)\, \theta_k\, dt\, ds \qquad (23b)$$

$$= \mu_w(x) + \sum_{k=1}^{M} \Phi_k(x)\, \sqrt{\lambda_k}\, \theta_k \qquad \text{where} \qquad \Phi_k(x) = P \int_0^x\!\! \int_0^s (L - t)\, \phi_k(t)\, dt\, ds. \qquad (23c)$$
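The conditioning formulas of Eq. (22) (and their analogue in Eq. (25) for the K-L coefficients) amount to a single linear solve once the joint Gaussian blocks are assembled. The sketch below illustrates this for a generic linear observation of a discretized Gaussian field; the kernel, observation map and noise level are assumed and serve only to show the mechanics.

```python
import numpy as np

def gaussian_condition(mu_p, Sigma_pp, Sigma_py, Sigma_yy, y_obs, mu_y):
    """Conditioning formulas of Eq. (22): posterior mean and covariance of a Gaussian block."""
    K = np.linalg.solve(Sigma_yy, Sigma_py.T).T      # Sigma_py @ inv(Sigma_yy)
    mu_post = mu_p + K @ (y_obs - mu_y)
    Sigma_post = Sigma_pp - K @ Sigma_py.T
    return mu_post, Sigma_post

# Assumed setting: field discretized at n points, observed through a linear map B plus noise
n, m = 100, 10
x = np.linspace(0.0, 5.0, n)
Sigma_FF = np.exp(-np.abs(x[:, None] - x[None, :]) / 1.0)    # exponential prior covariance
B = np.zeros((m, n))
B[np.arange(m), np.linspace(5, n - 1, m, dtype=int)] = 1.0   # pointwise observations
Sigma_eta = 0.05 * np.eye(m)                                 # assumed noise covariance

mu_F = np.zeros(n)
Sigma_Fy = Sigma_FF @ B.T                                    # cross-covariance field/data
Sigma_yy = B @ Sigma_FF @ B.T + Sigma_eta                    # marginal data covariance
y_obs = 0.5 * np.ones(m)                                     # assumed synthetic observations
mu_post, Sigma_post = gaussian_condition(mu_F, Sigma_FF, Sigma_Fy, Sigma_yy, y_obs, B @ mu_F)
print(mu_post[:5], np.sqrt(np.diag(Sigma_post))[:5])
```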

Figure 3: Mean, standard deviation, and autocorrelation functions of the posterior flexibility (rows 1-2) and deflection (rows 3-4) random fields, from Eqs. (22a) and (22b), using the exponential and squared exponential kernels for $C_{FF}$ (with fixed $l_c$ and $m$).

Alternatively, for a given discretization of the domain $x = [x_1, \ldots, x_n]$, Eq. (23c) can also be written in matrix form as

$$\hat{w} = \mu_w + \Phi \Lambda \theta = \mu_w + A\theta \qquad (24)$$

where $\mu_w \in \mathbb{R}^n$ is the prior mean deflection vector computed from Eq. (18), $\Phi \in \mathbb{R}^{n \times M}$ is a matrix obtained by evaluating $\Phi_k(x)$ in Eq. (23c) (that is, $\Phi_{(j,k)} = \Phi_k(x_j)$, for $j = 1, \ldots, n$), and $\Lambda = \mathrm{diag}(\sqrt{\lambda}) \in \mathbb{R}^{M \times M}$ is a diagonal matrix with the square roots of the eigenvalues. Observe that $\Phi_k(x)$ can be evaluated analytically, given that the eigenpairs of the target autocovariance kernel are available.

The approximated posterior random field can subsequently be computed following the same procedure as in 4.1.2, but in this case for the standard Gaussian random vector $\theta$. Hence, we assume a Gaussian random vector composed of $\theta$ and $\tilde{y}$. The posterior distribution can be calculated as the conditional PDF of $\theta$ given $\tilde{y}$, as in Eq. (21). This posterior random field can be represented as a multivariate Gaussian distribution with conditional mean vector $\mu_{\theta \mid \tilde{y}}$ and conditional covariance matrix $\Sigma_{\theta\theta \mid \tilde{y}}$ given by

$$\mu_{\theta \mid \tilde{y}} = \mu_\theta + \Sigma_{\theta\tilde{y}}\, \Sigma_{\tilde{y}\tilde{y}}^{-1} (\tilde{y} - \mu_{\tilde{y}}) \qquad \text{and} \qquad \Sigma_{\theta\theta \mid \tilde{y}} = \Sigma_{\theta\theta} - \Sigma_{\theta\tilde{y}}\, \Sigma_{\tilde{y}\tilde{y}}^{-1}\, \Sigma_{\theta\tilde{y}}^{\mathrm{T}}; \qquad (25)$$

here, $\mu_\theta = \mathbb{E}[\theta] = 0$, $\Sigma_{\theta\theta} = \mathbf{I}$ (where $\mathbf{I} \in \mathbb{R}^{M \times M}$ is the identity matrix), and the remaining covariance terms can be derived analytically from the approximated model in Eq. (24). Therefore, the mean vector and covariance matrix of the posterior distribution of $\theta$ can be computed respectively as

$$\mu_{\theta \mid \tilde{y}} = A^{\mathrm{T}} \left(A A^{\mathrm{T}} + \Sigma_{\eta\eta}\right)^{-1} (\tilde{y} - \mu_{\tilde{y}}) \qquad \text{and} \qquad \Sigma_{\theta\theta \mid \tilde{y}} = \mathbf{I} - A^{\mathrm{T}} \left(A A^{\mathrm{T}} + \Sigma_{\eta\eta}\right)^{-1} A. \qquad (26)$$

The posterior random fields of the flexibility and deflection after using the K-L approximation can be obtained from the posterior of $\theta$. In this case, both random fields are also represented by multivariate Gaussian distributions, described by the following approximated mean and autocovariance functions,

$$\hat{\mu}_{F \mid \tilde{y}}(x) = \mu_F(x) + \sum_{k=1}^{M} \sqrt{\lambda_k}\, \phi_k(x)\, \mu^{(k)}_{\theta \mid \tilde{y}} \qquad \hat{C}_{FF \mid \tilde{y}}(x, x') = \sum_{k=1}^{M} \sum_{l=1}^{M} \sqrt{\lambda_k \lambda_l}\, \phi_k(x)\, \phi_l(x')\, \Sigma^{(k,l)}_{\theta\theta \mid \tilde{y}} \qquad (27a)$$

$$\hat{\mu}_{w \mid \tilde{y}}(x) = \mu_w(x) + \sum_{k=1}^{M} \sqrt{\lambda_k}\, \Phi_k(x)\, \mu^{(k)}_{\theta \mid \tilde{y}} \qquad \hat{C}_{ww \mid \tilde{y}}(x, x') = \sum_{k=1}^{M} \sum_{l=1}^{M} \sqrt{\lambda_k \lambda_l}\, \Phi_k(x)\, \Phi_l(x')\, \Sigma^{(k,l)}_{\theta\theta \mid \tilde{y}} \qquad (27b)$$

where the superscripts in the vector $\mu^{(k)}_{\theta \mid \tilde{y}}$ and the matrix $\Sigma^{(k,l)}_{\theta\theta \mid \tilde{y}}$ refer to element indexing.

4.1.4. Analytical solution for the model evidence

Consider a finite collection of possible models $\{\mathcal{M}_1, \ldots, \mathcal{M}_M, \ldots, \mathcal{M}_{M_{\max}}\}$, where $M \in [1, M_{\max}]$ is a model indicator index. Each particular model $\mathcal{M}_M$ has an associated vector of uncertain parameters $\theta \in \mathbb{R}^M$, where the dimension $M$ varies between different models. In the context of the K-L discretization, these models correspond to the dimension of the stochastic space discretized by the truncated series, i.e., the number of terms in the K-L expansion. An analytical expression for the model evidence can be derived for this example. The process involves a marginalization of the likelihood function over the parameters (integration); alternatively, it can be computed as the product of prior and likelihood divided by the posterior. Following the latter approach, the natural logarithm of the model evidence is given by

$$\ln Z(\tilde{y} \mid \mathcal{M}) = \ln f(\theta \mid \mathcal{M}) + \ln L(\theta \mid \tilde{y}, \mathcal{M}) - \ln f(\theta \mid \tilde{y}, \mathcal{M}) \qquad (28)$$

where the log-prior, log-likelihood and log-posterior conditional on the dimension are

$$\ln f(\theta \mid \mathcal{M}) = -\frac{M}{2} \ln(2\pi) - \frac{1}{2}\, \theta^{\mathrm{T}} \theta \qquad (29a)$$

$$\ln L(\theta \mid \tilde{y}, \mathcal{M}) = -\frac{m}{2} \ln(2\pi) - \frac{1}{2} \ln\!\left(\det(\Sigma_{\eta\eta})\right) - \frac{1}{2}\, [\tilde{y} - (\mu_w + A\theta)]^{\mathrm{T}}\, \Sigma_{\eta\eta}^{-1}\, [\tilde{y} - (\mu_w + A\theta)] \qquad (29b)$$

$$\ln f(\theta \mid \tilde{y}, \mathcal{M}) = -\frac{M}{2} \ln(2\pi) - \frac{1}{2} \ln\!\left(\det(\Sigma_{\theta\theta \mid \tilde{y}})\right) - \frac{1}{2}\, [\theta - \mu_{\theta \mid \tilde{y}}]^{\mathrm{T}}\, \Sigma_{\theta\theta \mid \tilde{y}}^{-1}\, [\theta - \mu_{\theta \mid \tilde{y}}]. \qquad (29c)$$

After substituting Eqs. (29a)-(29c) into Eq. (28) and some algebra (a derivation is given in the Appendix), the analytical expression for the model evidence is found to be

$$\ln Z(\tilde{y} \mid \mathcal{M}) = -\frac{1}{2} \left( m \ln(2\pi) + \ln\!\left(\frac{\det(\Sigma_{\eta\eta})}{\det(\Sigma_{\theta\theta \mid \tilde{y}})}\right) + (\tilde{y} - \mu_w)^{\mathrm{T}}\, \Sigma_{\eta\eta}^{-1}\, (\tilde{y} - \mu_w) - \mu_{\theta \mid \tilde{y}}^{\mathrm{T}}\, \Sigma_{\theta\theta \mid \tilde{y}}^{-1}\, \mu_{\theta \mid \tilde{y}} \right). \qquad (30)$$

The model evidence is employed to assess whether a more complex model is required for the representation of the measurement data. In this case, the dimension with the highest value of the model evidence is regarded as the best model, meaning that it gives an optimum balance between predictability and quality of the data fit [59].

4.2. Parametric studies

We are now able to evaluate the influence of the K-L expansion on the prior as well as on the posterior flexibility and deflection random fields. The following settings are considered: the number of terms in the K-L expansion is chosen from three values (the smallest being $M = 5$); the correlation length of the prior flexibility is chosen from three values (the largest being $l_c = 4.5$ m); two different sets of measurements with $m$ points each are assumed (see Figure 1); and for each of these settings, the two autocovariance functions in Eq. (5) are used to represent the prior flexibility, namely, the exponential and the squared exponential kernels (the standard deviation is fixed and is specified in 4.1.1).

Posterior approximation: the analytical posterior random field expressions (Eqs. (22a) and (22b)) and the associated K-L approximations (Eqs. (27a) and (27b)) allow us to assess the influence of different prior random field assumptions on the posterior solution. In the following, 95% credible intervals (CI) are represented as the region between the 0.025 and 0.975 quantiles of the posterior. The approximation of the posterior flexibility random field using an exponential kernel as the underlying prior flexibility covariance is illustrated in Figure 4. We show the 95% CI of the analytical solution (shaded area) and the K-L approximations as a function of the number of terms in the expansion, for increasing correlation length. The full set of K-L representations is contained inside the analytical CI, and they converge to this solution as the number of terms increases. Thus, the K-L expansion under-represents the true variability in the posterior flexibility. For small correlation lengths, the posterior random field is more difficult to capture, since one is learning a random field that has larger variability. Nevertheless, a moderate number of terms in the expansion is already enough to obtain a good approximation of the flexibility random field for this example. Comparing the results from both sets of measurements, it can be seen that the number of data points controls the width of the CI bounds. The width of those bounds narrows when more information is available. Furthermore, the flexibility random field is no longer weakly homogeneous, since the posterior mean varies through the domain (the plots are omitted).

Figure 5 presents the approximation of the posterior deflection random field with underlying exponential autocovariance for the prior flexibility. In order to illustrate the distinction between solutions, the 95% CIs of a differential deflection are shown. They are computed as the difference between the prior mean of the random field and the 95% posterior CIs. In contrast to the posterior flexibility, the correlation length does not have a large influence on the K-L approximation of the posterior deflection.
For all cases, the K-L expansion represents the posterior deflection almost exactly, matching the analytical solution even when a small number of terms is used in the expansion. The reason for this is that the posterior deflection is computed by averaging the K-L expansion of the flexibility random field over the domain (see Eq. (23a)). As a result, the influence of the higher K-L eigenfunctions becomes negligible, and mainly the first modes contribute to the random field representation.

Figure 4: Posterior flexibility (using an exponential kernel for the prior): 95% CI for different numbers of terms in the K-L expansion, numbers of measurements (rows), and correlation lengths of the prior flexibility (columns). The shaded area corresponds to the analytical CI (Eq. (22a)).

Figure 5: Differential posterior deflection (using an exponential kernel for the prior): 95% CI for different numbers of terms in the K-L expansion, numbers of measurements (rows), and correlation lengths of the prior flexibility (columns). The shaded area corresponds to the analytical CI (Eq. (22b)).

Finally, the approximation of the posterior flexibility and deflection random fields assuming a squared exponential autocovariance function for the prior flexibility is shown in Figure 6. Here, only the results for the smaller set of measurements are shown. Even for small correlation lengths, K-L expansions with a sufficient number of terms make the difference between the posterior flexibility random field and the analytical solution negligible.

As the correlation length increases, the inverse problem solution can be computed accurately with an even smaller number of terms in the expansion ($M = 5$). We point out that the eigenvalue decay is stronger for the squared exponential kernel as compared to the exponential one, which allows a lower number of terms in the K-L representation. Recall also that the true underlying flexibility is generated assuming an exponential kernel. This is reflected in the inverse problem solution, since sample paths generated from a random field with a squared exponential covariance smooth out faster. The resulting posterior approximation is not able to capture the true underlying field with high confidence at all spatial points of the domain when the correlation length is large.

Figure 6: Posterior flexibility and differential posterior deflection (using a squared exponential kernel for the prior): 95% CI for different numbers of terms in the K-L expansion and correlation lengths of the prior flexibility (columns). The shaded area corresponds to the analytical CI.

Model comparison: the analytical expression of the model evidence in Eq. (30) is now used to perform model comparison. Figure 7 shows the model evidence for different numbers of K-L expansion terms, where the exponential kernel is used as the autocovariance of the prior flexibility. The best models are highlighted by a red solid line. Notice that different choices in the parameters of the prior random field lead to different optimal truncation orders in the K-L expansion. As also evident in the posterior approximation results, random fields described by small correlation lengths require a larger number of terms in the expansion for their discretization. In particular, the information gained by the inclusion of additional terms is negligible once the best dimension is achieved, and is lower than the penalty for the increased model complexity. Furthermore, more measurement data leads to a larger model evidence, which requires more K-L parameters for an optimal random field representation. The model evidence using a squared exponential autocovariance kernel for the prior flexibility is also evaluated (the plots are omitted). The results based on this assumption yield smaller model evidence values as compared to the exponential case. This agrees with the fact that the underlying true autocovariance is of the exponential type.

Since the solution of the inverse problem is typically affected by changes in the data, different measurements will yield different model evidence factors. In order to assess the overall contribution of the number of terms in the K-L discretization, it is relevant to compute the model evidence without considering the measurement data. Therefore, the model evidence can be marginalized with respect to the observational data as

$$\mathbb{E}_{\tilde{y}}\left[Z(\tilde{y} \mid \mathcal{M})\right] = \int_{\tilde{y}} Z(\tilde{y} \mid \mathcal{M})\, f_{\mathrm{data}}(\tilde{y})\, d\tilde{y}; \qquad (31)$$
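As a rough numerical counterpart of the closed-form evidence in Eq. (30), the sketch below evaluates the log-evidence of the linear-Gaussian model $\tilde{y} = \mu_w + A\theta + \eta$ for several truncation orders $M$. The matrix $A$ and all remaining quantities are synthetic stand-ins (not the beam quantities of the example); the decaying column scaling only mimics the decay of the K-L eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_evidence(A, Sigma_eta, mu_w, y):
    """Log-evidence of Eq. (30) for the linear-Gaussian model y = mu_w + A theta + eta."""
    m, M = A.shape
    S = A @ A.T + Sigma_eta                               # marginal data covariance
    r = y - mu_w
    mu_t = A.T @ np.linalg.solve(S, r)                    # posterior mean of theta, Eq. (26)
    Sig_t = np.eye(M) - A.T @ np.linalg.solve(S, A)       # posterior covariance, Eq. (26)
    _, ld_eta = np.linalg.slogdet(Sigma_eta)
    _, ld_t = np.linalg.slogdet(Sig_t)
    quad = r @ np.linalg.solve(Sigma_eta, r) - mu_t @ np.linalg.solve(Sig_t, mu_t)
    return -0.5 * (m * np.log(2.0 * np.pi) + ld_eta - ld_t + quad)

# Synthetic stand-in: data generated with a "true" truncation order of 8
m, M_true = 15, 8
A_full = rng.normal(size=(m, 20)) * 0.8 ** np.arange(20)   # decaying columns mimic sqrt(lambda_k)
Sigma_eta = 0.01 * np.eye(m)
mu_w = np.zeros(m)
y = mu_w + A_full[:, :M_true] @ rng.standard_normal(M_true) \
    + rng.multivariate_normal(np.zeros(m), Sigma_eta)

for M in (2, 5, 8, 12, 20):
    print("M =", M, " ln Z =", log_evidence(A_full[:, :M], Sigma_eta, mu_w, y))
```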


More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

c 2016 Society for Industrial and Applied Mathematics

c 2016 Society for Industrial and Applied Mathematics SIAM J. SCI. COMPUT. Vol. 8, No. 5, pp. A779 A85 c 6 Society for Industrial and Applied Mathematics ACCELERATING MARKOV CHAIN MONTE CARLO WITH ACTIVE SUBSPACES PAUL G. CONSTANTINE, CARSON KENT, AND TAN

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Least Squares Regression

Least Squares Regression E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute

More information

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix

Labor-Supply Shifts and Economic Fluctuations. Technical Appendix Labor-Supply Shifts and Economic Fluctuations Technical Appendix Yongsung Chang Department of Economics University of Pennsylvania Frank Schorfheide Department of Economics University of Pennsylvania January

More information

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01

STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 STAT 499/962 Topics in Statistics Bayesian Inference and Decision Theory Jan 2018, Handout 01 Nasser Sadeghkhani a.sadeghkhani@queensu.ca There are two main schools to statistical inference: 1-frequentist

More information

Introduction to Bayesian methods in inverse problems

Introduction to Bayesian methods in inverse problems Introduction to Bayesian methods in inverse problems Ville Kolehmainen 1 1 Department of Applied Physics, University of Eastern Finland, Kuopio, Finland March 4 2013 Manchester, UK. Contents Introduction

More information

Cross entropy-based importance sampling using Gaussian densities revisited

Cross entropy-based importance sampling using Gaussian densities revisited Cross entropy-based importance sampling using Gaussian densities revisited Sebastian Geyer a,, Iason Papaioannou a, Daniel Straub a a Engineering Ris Analysis Group, Technische Universität München, Arcisstraße

More information

Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems

Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems Dimensionality reduction and polynomial chaos acceleration of Bayesian inference in inverse problems The MIT Faculty has made this article openly available. Please share how this access benefits you. Your

More information

CS 7140: Advanced Machine Learning

CS 7140: Advanced Machine Learning Instructor CS 714: Advanced Machine Learning Lecture 3: Gaussian Processes (17 Jan, 218) Jan-Willem van de Meent (j.vandemeent@northeastern.edu) Scribes Mo Han (han.m@husky.neu.edu) Guillem Reus Muns (reusmuns.g@husky.neu.edu)

More information

Probabilistic Structural Dynamics: Parametric vs. Nonparametric Approach

Probabilistic Structural Dynamics: Parametric vs. Nonparametric Approach Probabilistic Structural Dynamics: Parametric vs. Nonparametric Approach S Adhikari School of Engineering, Swansea University, Swansea, UK Email: S.Adhikari@swansea.ac.uk URL: http://engweb.swan.ac.uk/

More information

Review. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Review. DS GA 1002 Statistical and Mathematical Models.   Carlos Fernandez-Granda Review DS GA 1002 Statistical and Mathematical Models http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall16 Carlos Fernandez-Granda Probability and statistics Probability: Framework for dealing with

More information

arxiv: v1 [stat.co] 23 Apr 2018

arxiv: v1 [stat.co] 23 Apr 2018 Bayesian Updating and Uncertainty Quantification using Sequential Tempered MCMC with the Rank-One Modified Metropolis Algorithm Thomas A. Catanach and James L. Beck arxiv:1804.08738v1 [stat.co] 23 Apr

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate

More information

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines

Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the

More information

ECE521 week 3: 23/26 January 2017

ECE521 week 3: 23/26 January 2017 ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear

More information

Numerical methods for the discretization of random fields by means of the Karhunen Loève expansion

Numerical methods for the discretization of random fields by means of the Karhunen Loève expansion Numerical methods for the discretization of random fields by means of the Karhunen Loève expansion Wolfgang Betz, Iason Papaioannou, Daniel Straub Engineering Risk Analysis Group, Technische Universität

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning MCMC and Non-Parametric Bayes Mark Schmidt University of British Columbia Winter 2016 Admin I went through project proposals: Some of you got a message on Piazza. No news is

More information

Statistical signal processing

Statistical signal processing Statistical signal processing Short overview of the fundamentals Outline Random variables Random processes Stationarity Ergodicity Spectral analysis Random variable and processes Intuition: A random variable

More information

Gaussian Process Regression

Gaussian Process Regression Gaussian Process Regression 4F1 Pattern Recognition, 21 Carl Edward Rasmussen Department of Engineering, University of Cambridge November 11th - 16th, 21 Rasmussen (Engineering, Cambridge) Gaussian Process

More information

Default Priors and Effcient Posterior Computation in Bayesian

Default Priors and Effcient Posterior Computation in Bayesian Default Priors and Effcient Posterior Computation in Bayesian Factor Analysis January 16, 2010 Presented by Eric Wang, Duke University Background and Motivation A Brief Review of Parameter Expansion Literature

More information

Gaussian Processes for Machine Learning

Gaussian Processes for Machine Learning Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of

More information

STA 294: Stochastic Processes & Bayesian Nonparametrics

STA 294: Stochastic Processes & Bayesian Nonparametrics MARKOV CHAINS AND CONVERGENCE CONCEPTS Markov chains are among the simplest stochastic processes, just one step beyond iid sequences of random variables. Traditionally they ve been used in modelling a

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Brown University CSCI 1950-F, Spring 2012 Prof. Erik Sudderth Lecture 25: Markov Chain Monte Carlo (MCMC) Course Review and Advanced Topics Many figures courtesy Kevin

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear

More information

Multivariate Distribution Models

Multivariate Distribution Models Multivariate Distribution Models Model Description While the probability distribution for an individual random variable is called marginal, the probability distribution for multiple random variables is

More information

A short introduction to INLA and R-INLA

A short introduction to INLA and R-INLA A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk

More information

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of

covariance function, 174 probability structure of; Yule-Walker equations, 174 Moving average process, fluctuations, 5-6, 175 probability structure of Index* The Statistical Analysis of Time Series by T. W. Anderson Copyright 1971 John Wiley & Sons, Inc. Aliasing, 387-388 Autoregressive {continued) Amplitude, 4, 94 case of first-order, 174 Associated

More information

Point spread function reconstruction from the image of a sharp edge

Point spread function reconstruction from the image of a sharp edge DOE/NV/5946--49 Point spread function reconstruction from the image of a sharp edge John Bardsley, Kevin Joyce, Aaron Luttman The University of Montana National Security Technologies LLC Montana Uncertainty

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Risk and Reliability Analysis: Theory and Applications: In Honor of Prof. Armen Der Kiureghian. Edited by P.

Risk and Reliability Analysis: Theory and Applications: In Honor of Prof. Armen Der Kiureghian. Edited by P. Appeared in: Risk and Reliability Analysis: Theory and Applications: In Honor of Prof. Armen Der Kiureghian. Edited by P. Gardoni, Springer, 2017 Reliability updating in the presence of spatial variability

More information

Polynomial chaos expansions for structural reliability analysis

Polynomial chaos expansions for structural reliability analysis DEPARTMENT OF CIVIL, ENVIRONMENTAL AND GEOMATIC ENGINEERING CHAIR OF RISK, SAFETY & UNCERTAINTY QUANTIFICATION Polynomial chaos expansions for structural reliability analysis B. Sudret & S. Marelli Incl.

More information

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods

Pattern Recognition and Machine Learning. Bishop Chapter 11: Sampling Methods Pattern Recognition and Machine Learning Chapter 11: Sampling Methods Elise Arnaud Jakob Verbeek May 22, 2008 Outline of the chapter 11.1 Basic Sampling Algorithms 11.2 Markov Chain Monte Carlo 11.3 Gibbs

More information

Polynomial chaos expansions for sensitivity analysis

Polynomial chaos expansions for sensitivity analysis c DEPARTMENT OF CIVIL, ENVIRONMENTAL AND GEOMATIC ENGINEERING CHAIR OF RISK, SAFETY & UNCERTAINTY QUANTIFICATION Polynomial chaos expansions for sensitivity analysis B. Sudret Chair of Risk, Safety & Uncertainty

More information

F denotes cumulative density. denotes probability density function; (.)

F denotes cumulative density. denotes probability density function; (.) BAYESIAN ANALYSIS: FOREWORDS Notation. System means the real thing and a model is an assumed mathematical form for the system.. he probability model class M contains the set of the all admissible models

More information

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA

BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA BAYESIAN METHODS FOR VARIABLE SELECTION WITH APPLICATIONS TO HIGH-DIMENSIONAL DATA Intro: Course Outline and Brief Intro to Marina Vannucci Rice University, USA PASI-CIMAT 04/28-30/2010 Marina Vannucci

More information

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix

Infinite-State Markov-switching for Dynamic. Volatility Models : Web Appendix Infinite-State Markov-switching for Dynamic Volatility Models : Web Appendix Arnaud Dufays 1 Centre de Recherche en Economie et Statistique March 19, 2014 1 Comparison of the two MS-GARCH approximations

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors

Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors Division of Engineering & Applied Science Robust MCMC Sampling with Non-Gaussian and Hierarchical Priors IPAM, UCLA, November 14, 2017 Matt Dunlop Victor Chen (Caltech) Omiros Papaspiliopoulos (ICREA,

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Music and Machine Learning (IFT68 Winter 8) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

More information

Least Squares Regression

Least Squares Regression CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the

More information

Variational Principal Components

Variational Principal Components Variational Principal Components Christopher M. Bishop Microsoft Research 7 J. J. Thomson Avenue, Cambridge, CB3 0FB, U.K. cmbishop@microsoft.com http://research.microsoft.com/ cmbishop In Proceedings

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Advanced Statistics and Data Mining Summer School

More information

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering Lecturer: Nikolay Atanasov: natanasov@ucsd.edu Teaching Assistants: Siwei Guo: s9guo@eng.ucsd.edu Anwesan Pal:

More information

LECTURE 15 Markov chain Monte Carlo

LECTURE 15 Markov chain Monte Carlo LECTURE 15 Markov chain Monte Carlo There are many settings when posterior computation is a challenge in that one does not have a closed form expression for the posterior distribution. Markov chain Monte

More information

CSC 2541: Bayesian Methods for Machine Learning

CSC 2541: Bayesian Methods for Machine Learning CSC 2541: Bayesian Methods for Machine Learning Radford M. Neal, University of Toronto, 2011 Lecture 10 Alternatives to Monte Carlo Computation Since about 1990, Markov chain Monte Carlo has been the dominant

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning

Lecture 3: More on regularization. Bayesian vs maximum likelihood learning Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

However, reliability analysis is not limited to calculation of the probability of failure.

However, reliability analysis is not limited to calculation of the probability of failure. Probabilistic Analysis probabilistic analysis methods, including the first and second-order reliability methods, Monte Carlo simulation, Importance sampling, Latin Hypercube sampling, and stochastic expansions

More information

Principles of Bayesian Inference

Principles of Bayesian Inference Principles of Bayesian Inference Sudipto Banerjee University of Minnesota July 20th, 2008 1 Bayesian Principles Classical statistics: model parameters are fixed and unknown. A Bayesian thinks of parameters

More information

Bayesian Regression Linear and Logistic Regression

Bayesian Regression Linear and Logistic Regression When we want more than point estimates Bayesian Regression Linear and Logistic Regression Nicole Beckage Ordinary Least Squares Regression and Lasso Regression return only point estimates But what if we

More information

Bayes Model Selection with Path Sampling: Factor Models

Bayes Model Selection with Path Sampling: Factor Models with Path Sampling: Factor Models Ritabrata Dutta and Jayanta K Ghosh Purdue University 07/02/11 Factor Models in Applications Factor Models in Applications Factor Models Factor Models and Factor analysis

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Development of Stochastic Artificial Neural Networks for Hydrological Prediction

Development of Stochastic Artificial Neural Networks for Hydrological Prediction Development of Stochastic Artificial Neural Networks for Hydrological Prediction G. B. Kingston, M. F. Lambert and H. R. Maier Centre for Applied Modelling in Water Engineering, School of Civil and Environmental

More information

Sequential Importance Sampling for Rare Event Estimation with Computer Experiments

Sequential Importance Sampling for Rare Event Estimation with Computer Experiments Sequential Importance Sampling for Rare Event Estimation with Computer Experiments Brian Williams and Rick Picard LA-UR-12-22467 Statistical Sciences Group, Los Alamos National Laboratory Abstract Importance

More information

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling 10-708: Probabilistic Graphical Models 10-708, Spring 2014 27 : Distributed Monte Carlo Markov Chain Lecturer: Eric P. Xing Scribes: Pengtao Xie, Khoa Luu In this scribe, we are going to review the Parallel

More information

Variational Methods in Bayesian Deconvolution

Variational Methods in Bayesian Deconvolution PHYSTAT, SLAC, Stanford, California, September 8-, Variational Methods in Bayesian Deconvolution K. Zarb Adami Cavendish Laboratory, University of Cambridge, UK This paper gives an introduction to the

More information

On prediction and density estimation Peter McCullagh University of Chicago December 2004

On prediction and density estimation Peter McCullagh University of Chicago December 2004 On prediction and density estimation Peter McCullagh University of Chicago December 2004 Summary Having observed the initial segment of a random sequence, subsequent values may be predicted by calculating

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

STA 4273H: Sta-s-cal Machine Learning

STA 4273H: Sta-s-cal Machine Learning STA 4273H: Sta-s-cal Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 2 In our

More information

The connection of dropout and Bayesian statistics

The connection of dropout and Bayesian statistics The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Dynamic response of structures with uncertain properties

Dynamic response of structures with uncertain properties Dynamic response of structures with uncertain properties S. Adhikari 1 1 Chair of Aerospace Engineering, College of Engineering, Swansea University, Bay Campus, Fabian Way, Swansea, SA1 8EN, UK International

More information

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt.

Kernel-based Approximation. Methods using MATLAB. Gregory Fasshauer. Interdisciplinary Mathematical Sciences. Michael McCourt. SINGAPORE SHANGHAI Vol TAIPEI - Interdisciplinary Mathematical Sciences 19 Kernel-based Approximation Methods using MATLAB Gregory Fasshauer Illinois Institute of Technology, USA Michael McCourt University

More information

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations

Linear Algebra in Computer Vision. Lecture2: Basic Linear Algebra & Probability. Vector. Vector Operations Linear Algebra in Computer Vision CSED441:Introduction to Computer Vision (2017F Lecture2: Basic Linear Algebra & Probability Bohyung Han CSE, POSTECH bhhan@postech.ac.kr Mathematics in vector space Linear

More information

Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE

Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE 103268 Subhash Kalla LSU Christopher D. White LSU James S. Gunning CSIRO Michael E. Glinsky BHP-Billiton Contents Method overview

More information

Bayesian Inference. Chapter 4: Regression and Hierarchical Models

Bayesian Inference. Chapter 4: Regression and Hierarchical Models Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative

More information

Organization. I MCMC discussion. I project talks. I Lecture.

Organization. I MCMC discussion. I project talks. I Lecture. Organization I MCMC discussion I project talks. I Lecture. Content I Uncertainty Propagation Overview I Forward-Backward with an Ensemble I Model Reduction (Intro) Uncertainty Propagation in Causal Systems

More information

1 Bayesian Linear Regression (BLR)

1 Bayesian Linear Regression (BLR) Statistical Techniques in Robotics (STR, S15) Lecture#10 (Wednesday, February 11) Lecturer: Byron Boots Gaussian Properties, Bayesian Linear Regression 1 Bayesian Linear Regression (BLR) In linear regression,

More information

A framework for global reliability sensitivity analysis in the presence of multi-uncertainty

A framework for global reliability sensitivity analysis in the presence of multi-uncertainty A framework for global reliability sensitivity analysis in the presence of multi-uncertainty Max Ehre, Iason Papaioannou, Daniel Straub Engineering Risk Analysis Group, Technische Universität München,

More information

MULTISCALE FINITE ELEMENT METHODS FOR STOCHASTIC POROUS MEDIA FLOW EQUATIONS AND APPLICATION TO UNCERTAINTY QUANTIFICATION

MULTISCALE FINITE ELEMENT METHODS FOR STOCHASTIC POROUS MEDIA FLOW EQUATIONS AND APPLICATION TO UNCERTAINTY QUANTIFICATION MULTISCALE FINITE ELEMENT METHODS FOR STOCHASTIC POROUS MEDIA FLOW EQUATIONS AND APPLICATION TO UNCERTAINTY QUANTIFICATION P. DOSTERT, Y. EFENDIEV, AND T.Y. HOU Abstract. In this paper, we study multiscale

More information

Part 1: Expectation Propagation

Part 1: Expectation Propagation Chalmers Machine Learning Summer School Approximate message passing and biomedicine Part 1: Expectation Propagation Tom Heskes Machine Learning Group, Institute for Computing and Information Sciences Radboud

More information

PRECONDITIONING MARKOV CHAIN MONTE CARLO SIMULATIONS USING COARSE-SCALE MODELS

PRECONDITIONING MARKOV CHAIN MONTE CARLO SIMULATIONS USING COARSE-SCALE MODELS PRECONDITIONING MARKOV CHAIN MONTE CARLO SIMULATIONS USING COARSE-SCALE MODELS Y. EFENDIEV, T. HOU, AND W. LUO Abstract. We study the preconditioning of Markov Chain Monte Carlo (MCMC) methods using coarse-scale

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information