Comparison study between MCMC-based and weight-based Bayesian methods for identification of joint distribution


Struct Multidisc Optim (2010) 42: DOI /s
RESEARCH PAPER

Yoojeong Noh · K. K. Choi · Ikjin Lee

Received: 24 January 2010 / Revised: 16 June 2010 / Accepted: 30 June 2010 / Published online: 27 July 2010
© Springer-Verlag 2010

Abstract The Bayesian method is widely used to identify a joint distribution, which is modeled by marginal distributions and a copula. The joint distribution can be identified by a one-step procedure, which directly tests all candidate joint distributions, or by a two-step procedure, which first identifies the marginal distributions and then the copula. The weight-based Bayesian method using the two-step procedure and the Markov chain Monte Carlo (MCMC)-based Bayesian method using one-step and two-step procedures were recently developed. In this paper, the one-step weight-based Bayesian method and the two-step MCMC-based Bayesian method using parametric marginal distributions are proposed. Comparison studies among the Bayesian methods have not been thoroughly carried out. In this paper, the weight-based and MCMC-based Bayesian methods using one-step and two-step procedures are compared through simulation studies to see which method most accurately and efficiently identifies a correct joint distribution. The simulation studies show that the two-step weight-based Bayesian method has the best performance.

Keywords Copula · Identification of joint distribution · Weight-based Bayesian method · MCMC-based Bayesian method · One-step procedure · Two-step procedure

Y. Noh · K. K. Choi (✉) · I. Lee
Department of Mechanical & Industrial Engineering, College of Engineering, The University of Iowa, Iowa City, IA 52242, USA
e-mail: kkchoi@engineering.uiowa.edu
Y. Noh, e-mail: noh@engineering.uiowa.edu
I. Lee, e-mail: ilee@engineering.uiowa.edu

1 Introduction

In many engineering applications, input random variables such as fatigue material properties have been found to be correlated (Socie 2003; Annis 2004; Efstratios et al. 2004; Pham 2006). When the input random variables are correlated, their joint distribution needs to be obtained. For example, reliability-based design optimization (RBDO) requires an accurate joint distribution of the correlated input variables to obtain an accurate optimum design (Noh et al. 2007, 2008). However, it is difficult to model the joint distribution from limited data in real engineering applications. For this purpose, a copula can be used to generate a joint distribution from the correlation parameter and the marginal distributions, which can be obtained from experimental data.

To identify the correct joint distribution, the Bayesian method or a goodness-of-fit (GOF) test can be used. Since the Bayesian method is known to be more efficient and accurate than the GOF test in identifying the copula and marginal distributions (Noh et al. 2010), only the Bayesian method is used in this paper.

The joint distribution can be obtained by a one-step or a two-step procedure. The one-step Bayesian method identifies a joint distribution by directly testing all candidate joint distributions, whereas the two-step Bayesian method identifies the marginal distributions first and then identifies a copula using the identified marginal distributions, which are used to construct the joint distribution (Huard et al. 2006; Genest et al. 1995; Hürliman 2004; Roch and Alegre 2006). The weight-based Bayesian method using the two-step procedure (Noh et al. 2010) and the Markov chain Monte Carlo (MCMC)-based Bayesian method using one-step and two-step procedures (Silva and Lopes 2008) were recently developed. Simulation test results showed that those Bayesian methods perform well in

identifying a correct joint distribution. However, the weight-based Bayesian method using the one-step procedure was not investigated. The MCMC-based method using the two-step procedure was developed (Silva and Lopes 2008), but only the empirical marginal distribution, which is not often used in engineering applications, was considered. Thus, the one-step weight-based Bayesian method and the two-step MCMC-based Bayesian method using parametric marginal distributions are developed in this paper.

Even though the Bayesian methods have been investigated in many studies, it has not been thoroughly tested which of them performs best in identifying the joint distribution. Through simulation tests, the weight-based and MCMC-based Bayesian methods using one-step and two-step procedures are compared in this paper to see how accurately and efficiently those methods identify a correct joint distribution for various types of joint distributions.

In Section 2, the basic concept of the copula is introduced. The weight-based and MCMC-based Bayesian methods using one-step and two-step procedures are illustrated in Sections 3 and 4, respectively. Section 5 shows the simulation results of the comparison studies in terms of accuracy and efficiency of identifying a joint distribution.

2 Copula to represent joint distribution

Consider a joint cumulative distribution function (CDF) $F_{X_1,\ldots,X_n}(x_1,\ldots,x_n)$ of random variables $X_i$ for $i = 1,\ldots,n$. According to Sklar's theorem (Nelsen 1999), there exists a copula $C$ (unique when the marginal CDFs are continuous) such that

$$F_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = C\left(F_{X_1}(x_1),\ldots,F_{X_n}(x_n)\mid\theta\right) \tag{1}$$

where $F_{X_i}(x_i)$ is the marginal CDF of $X_i$ for $i = 1,\ldots,n$, and $\theta$ is the matrix of correlation parameters between $X_1,\ldots,X_n$.
Taking the derivative of (1) with respect to $x_1,\ldots,x_n$, the joint probability density function (PDF) is obtained as

$$f_{X_1,\ldots,X_n}(x_1,\ldots,x_n) = c\left(F_{X_1}(x_1),\ldots,F_{X_n}(x_n)\mid\theta\right)\prod_{i=1}^{n} f_{X_i}(x_i) \tag{2}$$

where $c(u_1,\ldots,u_n) = \dfrac{\partial^n C(u_1,\ldots,u_n)}{\partial u_1\cdots\partial u_n}$ is the copula density function with $u_i = F_{X_i}(x_i)$, and $f_{X_i}(x_i)$ is the marginal PDF of $X_i$ for $i = 1,\ldots,n$. Thus, the joint CDF and PDF can be constructed by combining the marginal distributions and the copula function.

Most copula applications consider bivariate data because few copula families have an n-dimensional generalization. Some copula families, such as the Archimedean copulas, can represent the joint distribution of n correlated variables, but they have only one correlation coefficient for all n variables. In many cases it has been observed that two input variables are correlated (Socie 2003; Annis 2004; Efstratios et al. 2004; Pham 2006), so only bivariate copulas are considered in this paper.

To model the joint CDF using a bivariate copula, the correlation parameter $\theta$ needs to be obtained from experimental data. Since each type of copula has its own correlation parameter, it is desirable to have a common correlation measure from which the correlation parameter can be obtained. Two correlation coefficients are commonly used, Pearson's rho and Kendall's tau. Pearson's rho (Pearson 1896) measures the linear dependence between two variables, so if the two variables have a nonlinear dependence, their correlation cannot be accurately measured. Kendall's tau, on the other hand, measures the correspondence of rankings between random variables (Kendall 1938; Kruskal 1958), which is theoretically related to the definition of copulas, so it can be used for various copulas with both linear and nonlinear dependence. Thus, in this paper, Kendall's tau is used.
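As an illustration of how (2) combines a copula with marginals, the following is a minimal Python sketch, not the authors' implementation. It assumes a Clayton copula (for which the standard relation $\tau = \theta/(\theta+2)$ holds) and Gaussian marginals; the function names and the default mean and standard deviation are hypothetical choices made only for this example. The sample Kendall's tau is computed by counting concordant and discordant pairs.

```python
import numpy as np
from scipy.stats import norm


def sample_kendall_tau(x, y):
    """Sample Kendall's tau from concordant (c) and discordant (d) pair counts."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    c = d = 0
    for a in range(len(x) - 1):
        # Sign of (x_a - x_b)(y_a - y_b) for every later point b.
        s = np.sign((x[a] - x[a + 1:]) * (y[a] - y[a + 1:]))
        c += int(np.sum(s > 0))
        d += int(np.sum(s < 0))
    return (c - d) / (c + d)


def clayton_theta_from_tau(tau):
    """Clayton copula: tau = theta / (theta + 2), so theta = 2 tau / (1 - tau)."""
    return 2.0 * tau / (1.0 - tau)


def clayton_density(u, v, theta):
    """Clayton copula density c(u, v | theta) for theta > 0."""
    s = u ** (-theta) + v ** (-theta) - 1.0
    return (1.0 + theta) * (u * v) ** (-theta - 1.0) * s ** (-2.0 - 1.0 / theta)


def joint_pdf(x, y, theta, mu=(5.0, 5.0), sigma=(2.5, 2.5)):
    """Joint PDF as in (2): copula density at the marginal CDFs times the marginal PDFs.

    Gaussian marginals and the default mu/sigma are illustrative assumptions.
    """
    u = norm.cdf(x, mu[0], sigma[0])
    v = norm.cdf(y, mu[1], sigma[1])
    marginals = norm.pdf(x, mu[0], sigma[0]) * norm.pdf(y, mu[1], sigma[1])
    return clayton_density(u, v, theta) * marginals
```

With paired data `x` and `y`, a call such as `joint_pdf(x, y, clayton_theta_from_tau(sample_kendall_tau(x, y)))` evaluates the resulting joint PDF at the data points. Other copula families would need their own density and their own tau-to-theta relation.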
The population version of Kendall's tau ($\tau$) can be obtained using the copula function and the correlation parameter $\theta$ as

$$\tau = 4\iint_{I^2} C(u,v\mid\theta)\,dC(u,v) - 1 \tag{3}$$

where $I^2 = I\times I$ ($I = [0,1]$), $dC = \dfrac{\partial^2 C}{\partial u\,\partial v}\,du\,dv$, and $u = F_X(x)$ and $v = F_Y(y)$ are the marginal CDFs of $X$ and $Y$, respectively (Nelsen 1999). The sample version of Kendall's tau ($t$) is obtained as

$$t = \frac{c - d}{c + d} \tag{4}$$

where $c$ and $d$ are the numbers of concordant and discordant pairs in the given data, respectively. Using the estimated Kendall's tau, the correlation parameter $\theta$ of the copula can be calculated because Kendall's tau can be expressed as a function of the correlation parameter, as shown in (3). The explicit functions of (3) for some copulas that are used in this paper are presented in Noh et al. (2010). More theoretical explanations of copulas are presented in Nelsen (1999) and Joe (1997).

3 Weight-based Bayesian method

The weight-based Bayesian method calculates the normalized weights of candidate models such as marginal distributions, copulas, or joint distributions using a probability

of a hypothesis on each candidate model, and selects the model with the highest normalized weight as the correct one. Sections 3.1 and 3.2 illustrate the one-step and two-step weight-based Bayesian methods, respectively.

3.1 One-step weight-based Bayesian method

Consider a hypothesis $h_{ijk}$ that the given data $D$ come from a candidate joint distribution $M_{ijk}$, where $i$ and $j$ index the seven candidate marginal distributions of $X$ and $Y$ ($i = 1,\ldots,7$ and $j = 1,\ldots,7$) and $k$ indexes the nine candidate copulas ($k = 1,\ldots,9$). That is, $M_{ijk}$ is the candidate joint distribution modeled by the ith and jth candidate marginal distributions and the kth candidate copula. In this paper, the candidate marginal distributions for the two random variables $X$ and $Y$ are Gaussian, Weibull, Gamma, Lognormal, Gumbel, Extreme type-I, and Extreme type-II, whereas the candidate copulas are the Clayton, AMH, Gumbel, Frank, A12, A14, FGM, Gaussian, and independent copulas. Thus, there are 441 (7 × 7 × 9) candidate joint distributions to be tested.

To identify the joint distribution that best describes the given data among the candidates, the Bayesian method considers the probability of each hypothesis $h_{ijk}$ given data $D$ as (Noh et al. 2010; Huard et al. 2006)

$$\Pr\left(h_{ijk}\mid D, I\right) = \frac{\Pr\left(D\mid h_{ijk}, I\right)\Pr\left(h_{ijk}\mid I\right)}{\Pr\left(D\mid I\right)} \tag{5}$$

where $\Pr(D\mid h_{ijk}, I)$ is the conditional probability of drawing data $D$ under the hypothesis $h_{ijk}$, $\Pr(h_{ijk}\mid I)$ is the prior probability of the candidate, and $\Pr(D\mid I)$ is the normalization constant, with $I$ denoting any relevant additional knowledge.

Consider a parameter vector $\Theta = [\mu_X, \sigma_X, \mu_Y, \sigma_Y, \theta]^T$ consisting of the means and standard deviations of $X$ and $Y$ and the correlation parameter between $X$ and $Y$. Assume that the standard deviations of $X$ and $Y$ are fixed values obtained from the given data.
Selecting $\mu_X$, $\mu_Y$, and $\theta$ as the nuisance variables, (5) can be written as

$$\Pr\left(h_{ijk}\mid D, I\right) = \frac{1}{\Pr\left(D\mid I\right)}\iiint \Pr\left(D\mid h_{ijk},\Theta, I\right)\Pr\left(h_{ijk}\mid\Theta, I\right)\Pr\left(\Theta\mid I\right)\,d\mu_X\,d\mu_Y\,d\tau \tag{6}$$

Since each copula has its own correlation parameter $\theta$, Kendall's tau $\tau$, which is related to it by $\theta = r_k^{-1}(\tau)$, is used as the nuisance variable for the kth copula in (6). The explicit equations for some copulas are presented in Noh et al. (2010).

Equation (6) could be expressed in terms of five parameters, $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, and $\tau$. However, since the five-dimensional integration requires significantly more computational effort and its performance is similar to that of the triple integration over the two means $\mu_X$, $\mu_Y$ and the correlation coefficient $\tau$, the triple integration is used in this paper to calculate the probability of the hypothesis on each candidate, as shown in (6). The two standard deviations, $\sigma_X$ and $\sigma_Y$, could be used as the nuisance variables in (6) instead of $\mu_X$ and $\mu_Y$. However, using the means identifies the correct distribution better than using the standard deviations (Noh et al. 2010).

$\Pr(D\mid h_{ijk},\Theta, I)$ is the likelihood function of the parameter vector $\Theta$ for data $D$ under the hypothesis $h_{ijk}$, which is expressed as

$$\Pr\left(D\mid h_{ijk},\Theta, I\right) = L\left(\mathbf{x},\mathbf{y}\mid\Theta, M_{ijk}\right) = \prod_{m=1}^{ns} f_{XY}^{ijk}(x_m, y_m) \tag{7}$$

for given paired data $\mathbf{x} = [x_1,\ldots,x_{ns}]^T$ and $\mathbf{y} = [y_1,\ldots,y_{ns}]^T$, where $ns$ is the number of paired data. In (7), $f_{XY}^{ijk}(\cdot)$ is the candidate joint PDF of $X$ and $Y$, and $x_m$ and $y_m$ represent the mth sample point for $m = 1,\ldots,ns$. Using the copula density function, the joint PDF for $M_{ijk}$ can be written as

$$f_{XY}^{ijk}(x_m, y_m) = c_k\left(F_X^i(x_m\mid\mu_X), F_Y^j(y_m\mid\mu_Y)\mid r_k^{-1}(\tau)\right) f_X^i(x_m\mid\mu_X)\, f_Y^j(y_m\mid\mu_Y) \tag{8}$$

for the ith and jth candidate marginal distributions and the kth candidate copula.
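In (6), since all candidates are equally probable with respect to $\Theta$, $\Pr(h_{ijk}\mid\Theta, I)$ is obtained as

$$\Pr\left(h_{ijk}\mid\Theta, I\right) = \begin{cases} 1, & \Theta\in\Omega_{ijk} \\ 0, & \Theta\notin\Omega_{ijk} \end{cases} \tag{9}$$

where $\Omega_{ijk}$ is the domain of the parameter vector of the candidate $M_{ijk}$. In (6), $\Pr(\Theta\mid I)$ is the prior distribution on $\Theta$, given as

$$\Pr\left(\Theta\mid I\right) = \begin{cases} \dfrac{1}{\lambda(\Lambda)}, & \Theta\in\Lambda \\ 0, & \Theta\notin\Lambda \end{cases} \tag{10}$$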

where $\Lambda$ is the domain of the parameter vector that users might know, and $\lambda(\Lambda)$ is the width of the domain.

After substituting (7), (9), and (10) into (6) and integrating, the weight of each candidate joint distribution is defined as

$$W_{ijk} = \frac{1}{\lambda(\Lambda_{\mu_X})\,\lambda(\Lambda_{\mu_Y})\,\lambda(\Lambda_{\tau})}\iiint_{\Omega_{ijk}\cap\Lambda}\prod_{m=1}^{ns} c_k\left(F_X^i(x_m\mid\mu_X), F_Y^j(y_m\mid\mu_Y)\mid r_k^{-1}(\tau)\right) f_X^i(x_m\mid\mu_X)\, f_Y^j(y_m\mid\mu_Y)\,d\mu_X\,d\mu_Y\,d\tau \tag{11}$$

where the normalization constant $\Pr(D\mid I)$ is omitted for convenience and the domain of the prior is used as the domain of integration. The triple integration in (11) is calculated using the triplequad function of the commercial code Matlab, which uses adaptive Simpson quadrature. The normalized weight is calculated as

$$w_{ijk} = \frac{W_{ijk}}{\sum_{i=1}^{7}\sum_{j=1}^{7}\sum_{k=1}^{9} W_{ijk}} \tag{12}$$

After the normalized weights of the candidate joint distributions are calculated, the one with the highest normalized weight among the 441 candidates is selected as the correct joint distribution.

If the prior distribution is known, the correct model could be identified more often, especially when the number of samples is small. However, the prior distribution is usually unknown, and if a wrong prior distribution is used, a wrong model could be identified. Thus, in this paper, the uniform distribution is used as the prior distribution, as shown in (10), which makes the calculated weight depend more on the data. As the number of samples increases, the effect of the prior distribution on the weight becomes negligible.

3.2 Two-step weight-based Bayesian method

The two-step Bayesian method identifies the marginal distributions of X and Y first, and then a copula using the identified marginal distributions.
Using the same procedure as the one-step Bayesian method, the weight of each marginal distribution can be obtained as

$$W_i = \frac{1}{\lambda(\Lambda_\gamma)}\int_{\Omega_\gamma^i\cap\Lambda_\gamma}\prod_{m=1}^{ns} f_X^i\left(x_m\mid a(\gamma,\sigma), b(\gamma,\sigma)\right)d\gamma \tag{13}$$

by integrating the likelihood function of the ith candidate marginal distribution $M_i$ over the parameter $\gamma$ (the mean), where $f_X^i(x_m\mid a(\gamma,\sigma), b(\gamma,\sigma))$ is the ith marginal PDF evaluated at the mth sample point $x_m$ of $X$ for $m = 1,\ldots,ns$. In (13), $a$ and $b$ are the parameters of the ith marginal PDF, which are expressed in terms of the mean $\gamma$ and standard deviation $\sigma$ (Noh et al. 2010). Equation (13) can also be used to calculate the weight of the jth candidate marginal PDF of $Y$ for $j = 1,\ldots,7$.

For the copula, the weight of each candidate copula can be defined as

$$W_k = \frac{1}{\lambda(\Lambda_\gamma)}\int_{\Omega_\gamma^k\cap\Lambda_\gamma}\prod_{m=1}^{ns} c_k\left(u_m, v_m\mid r_k^{-1}(\gamma)\right)d\gamma \tag{14}$$

where the parameter $\gamma$ is Kendall's tau, and $c_k(u_m, v_m\mid r_k^{-1}(\gamma))$ is the copula density function value of the kth candidate for $k = 1,\ldots,9$ at the identified marginal CDF values $u_m = F_X(x_m)$ and $v_m = F_Y(y_m)$ for $m = 1,\ldots,ns$. The normalized weights of (13) and (14) are obtained as in (12), except that the denominator is the sum of the weights of the candidate marginal distributions or copulas. The one-dimensional integrations in (13) and (14) are calculated using the quad function in Matlab, which uses adaptive Simpson quadrature. In this process, there are 23 (7 + 7 + 9) candidates to be tested, far fewer than the 441 candidates that need to be tested in the one-step procedure. Thus, the two-step procedure is much more efficient.

4 Markov Chain Monte Carlo simulation-based Bayesian method

The MCMC-based Bayesian method identifies a correct model (marginal distribution, copula, or joint distribution) among candidates using a criterion such as the deviance information criterion (DIC).
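Returning briefly to the two-step weights of Section 3.2, the marginal weight in (13) and its normalization can be sketched in Python as follows. This is a hedged illustration under stated assumptions, not the Matlab implementation used in the paper: only three of the seven candidate marginals are included, the moment-matching helpers and the names `marginal_weight` and `normalized_weights` are hypothetical, SciPy's `quad` (adaptive Gauss-Kronrod) stands in for Matlab's adaptive Simpson `quad`, and the integration domain `[lo, hi]` for the mean is supplied by the user.

```python
import numpy as np
from scipy import integrate, stats


def gaussian(mean, std):
    return stats.norm(loc=mean, scale=std)


def lognormal(mean, std):
    # Moment matching: convert (mean, std) to the log-space parameters.
    s2 = np.log(1.0 + (std / mean) ** 2)
    return stats.lognorm(s=np.sqrt(s2), scale=mean * np.exp(-0.5 * s2))


def gamma(mean, std):
    # Shape a = (mean / std)^2 and scale b = std^2 / mean.
    return stats.gamma(a=(mean / std) ** 2, scale=std ** 2 / mean)


# Illustrative subset of the candidate marginal distributions.
CANDIDATES = {"Gaussian": gaussian, "Lognormal": lognormal, "Gamma": gamma}


def marginal_weight(x, make_dist, lo, hi):
    """Integrate the likelihood over the mean on [lo, hi] under a uniform prior."""
    x = np.asarray(x, float)
    std = x.std(ddof=1)  # standard deviation fixed at the sample value

    def likelihood(mean):
        return float(np.exp(np.sum(make_dist(mean, std).logpdf(x))))

    # The likelihood is tiny in absolute terms, so only a relative tolerance is used.
    value, _ = integrate.quad(likelihood, lo, hi, epsabs=0.0, epsrel=1e-8, limit=200)
    return value / (hi - lo)


def normalized_weights(x, lo, hi):
    """Normalize the raw weights so that they sum to one over the candidates."""
    raw = {name: marginal_weight(x, f, lo, hi) for name, f in CANDIDATES.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}
```

The candidate with the largest entry in the dictionary returned by `normalized_weights` would be selected. For large samples the product of densities can underflow, so a practical implementation would probably rescale the log-likelihood before exponentiating; that refinement is omitted here.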
Using MCMC, samples of the parameter vector consisting of the mean, standard deviation, and correlation parameter are randomly generated from the posterior distribution of the parameter vector, and those samples are used to calculate the DIC value for each candidate. The smaller the DIC value, the better the model fits the data. Thus, the candidate with the lowest DIC value is identified as the correct model.

4.1 One-step MCMC-based Bayesian method

Since the posterior distribution is proportional to the likelihood function $L(\mathbf{x},\mathbf{y}\mid\Theta, M_{ijk})$ in (7) and the prior distribution of the parameter vector $g(\Theta)$, it is written as

$$g_{ijk}\left(\Theta\mid\mathbf{x},\mathbf{y}\right) \propto L\left(\mathbf{x},\mathbf{y}\mid\Theta, M_{ijk}\right) g(\Theta) \tag{15}$$

The prior distribution of the parameter vector is usually unknown, so the uniform distribution is used as the prior distribution, as in the weight-based Bayesian method.

To identify a model based on the given data, the MCMC-based Bayesian method requires samples of the parameter vector drawn from its posterior distribution. Such samples are hard to obtain directly because the posterior distribution does not have a standard form such as Gaussian or lognormal. Thus, MCMC simulation is used to sample from the posterior distribution of the parameter vector.

MCMC simulation is a method of generating random samples from probability distributions via Markov chains (Gamerman and Lopes 2006; Gelman et al. 2004; Robert and Casella 2004). The objective of MCMC is to generate one or more values of a parameter vector $\Theta$. Rather than attempting to draw samples directly from the probability distribution of $\Theta$, $g_{ijk}(\Theta\mid\mathbf{x},\mathbf{y})$, a sequence $[\Theta^{(1)}, \Theta^{(2)},\ldots,\Theta^{(t)},\ldots]$ is generated in which each vector $\Theta^{(t)}$ depends on the preceding ones, $\Theta^{(1)}, \Theta^{(2)},\ldots,\Theta^{(t-1)}$. For a sufficiently large number $t$, e.g., 1,000, $\Theta^{(t)}$ is approximately generated from $g_{ijk}(\Theta\mid\mathbf{x},\mathbf{y})$.

Slice sampling (Neal 2003) and Metropolis-Hastings (Metropolis et al. 1953) are the most popular MCMC methods. However, the Metropolis-Hastings method requires a proposal distribution, which is used to draw samples, whereas the slice sampling method does not. To produce samples efficiently using the Metropolis-Hastings method, it is important to select a good proposal distribution.
Thus, the slice sampling method, which does not require a proposal distribution, is preferred.

Consider a single parameter, e.g., the mean. The first step of slice sampling is to assume an initial value $\mu^{(t)}$ within the domain of the posterior distribution $g(\mu\mid\mathbf{x})$ of the given data $\mathbf{x}$. Then, a value $y$ is drawn uniformly from $(0, g(\mu^{(t)}\mid\mathbf{x}))$, as shown in Fig. 1a. A horizontal slice can thus be defined as $S = \{\mu : y < g(\mu\mid\mathbf{x})\}$, and $\mu^{(t)}$ is always within $S$, which is indicated by bold lines in Fig. 1b. In the second step, an interval $I = (L, R)$ of length $w$ is placed around $\mu^{(t)}$ so that it contains all, or much, of the slice $S$; the circular dots in Fig. 1b indicate the left and right bounds $L$ and $R$ of the interval. The interval is expanded until both ends are outside the slice, as shown in Fig. 1b. In the third step, the sequential integer $t$ is increased to $t + 1$, and candidate points are drawn within the interval until a point inside the slice is found, which becomes the new point $\mu^{(t+1)}$. Points that are picked outside the slice, such as $\mu^*$ in Fig. 1c, are used to reduce the interval size, as indicated by rectangular dots in Fig. 1c. These steps are repeated until the desired number of samples, N = 1,000, is obtained.

Fig. 1 Slice sampling (Neal 2003)

Using slice sampling, N samples of the parameter vector $[\Theta^{(1)}, \Theta^{(2)},\ldots,\Theta^{(N)}]$ are obtained, and they can be used to estimate the parameter of the original data or its confidence interval. In this paper, the samples of the parameter vector are used to identify a correct model among the candidates using the DIC, as follows.

The DIC is used to select the joint distribution that best fits the given data $\mathbf{x}$ and $\mathbf{y}$. The DIC is defined as (Gelman et al. 2004; Spiegelhalter et al. 2002)

$$\mathrm{DIC} = \bar{D} + p_D = 2\bar{D} - D(\bar{\Theta}) \tag{16}$$

where $\bar{D}$ is the expectation of the deviance function, which is defined as

$$D(\Theta) = -2\log\left[L\left(\mathbf{x},\mathbf{y}\mid\Theta, M_{ijk}\right)\right] \tag{17}$$

and $p_D$ is the effective number of parameters of the model, which is computed as $p_D = \bar{D} - D(\bar{\Theta})$, where $\bar{\Theta}$ is the expectation of the parameter vector $\Theta$. In (16), since the expectation of the deviance function $\bar{D}$ indicates how well the model fits the data, the smaller $\bar{D}$ is, the better the model fits the data. On the other hand, $p_D$ indicates the complexity of the model, so a larger $p_D$ value means that it is easier for the model to fit the data. However, it does not necessarily mean that the complex


model, i.e., one with large $p_D$, represents the true model better than a less complex model. For example, a fifth-order polynomial can exactly fit six points. However, if those six points are not properly distributed, the higher-order polynomial is not useful for representing the true response. Thus, a moderate model with a good fit to the data (with small $\bar{D}$ and $p_D$), that is, the one with the smallest DIC, is selected as the correct model.

Using (16), the DIC of a candidate $M_{ijk}$ can be written as

$$\mathrm{DIC}\left(M_{ijk}\right) = 2E\left[D(\Theta)\mid\mathbf{x},\mathbf{y}, M_{ijk}\right] - D\left(E\left[\Theta\mid\mathbf{x},\mathbf{y}, M_{ijk}\right]\right) \tag{18}$$

where $E[D(\Theta)\mid\mathbf{x},\mathbf{y}, M_{ijk}]$ is approximated as

$$E\left[D(\Theta)\mid\mathbf{x},\mathbf{y}, M_{ijk}\right] \approx \frac{1}{N}\sum_{l=1}^{N} D\left(\Theta^{(l)}\right) \tag{19}$$

and $E[\Theta\mid\mathbf{x},\mathbf{y}, M_{ijk}]$ can be approximated as

$$E\left[\Theta\mid\mathbf{x},\mathbf{y}, M_{ijk}\right] \approx \frac{1}{N}\sum_{l=1}^{N}\Theta^{(l)} \tag{20}$$

Substituting (20) into the deviance function in (17) and calculating (19), the DIC value can be calculated for each candidate $M_{ijk}$.

4.2 Two-step MCMC-based Bayesian method

For the two-step MCMC-based Bayesian method, the likelihood function of the ith candidate marginal distribution $M_i$ with the mean and standard deviation of $X$ is defined as

$$L\left(\mathbf{x}\mid\mu_X,\sigma_X, M_i\right) = \prod_{m=1}^{ns} f_X^i\left(x_m\mid\mu_X,\sigma_X\right) \tag{21}$$

For $Y$, $L(\mathbf{y}\mid\mu_Y,\sigma_Y, M_j)$ is the likelihood function of the jth candidate marginal distribution $M_j$ with PDF $f_Y^j(y\mid\mu_Y,\sigma_Y)$. Likewise, the likelihood function of the kth candidate copula $M_k$ is defined as

$$L\left(\mathbf{x},\mathbf{y}\mid\theta, M_k\right) = \prod_{m=1}^{ns} c_k\left(u_m, v_m\mid\theta\right) \tag{22}$$

where $u_m = F_X(x_m)$ and $v_m = F_Y(y_m)$ are the identified marginal CDF values. Using the likelihood functions of the marginal distributions and the copula, the deviance functions can be obtained. Using the deviance functions and the samples of the parameter vectors generated by slice sampling, the DIC values of the candidate marginal distributions and copulas can be obtained using (18). As in the case of the weight-based Bayesian method, 441 candidates need to be tested for the one-step procedure, whereas 23 candidates need to be tested for the two-step procedure.
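The slice sampling steps and the DIC calculation in (18) to (20) can be sketched together in Python. This is a minimal single-parameter illustration and not the authors' code: it samples only the mean with the standard deviation fixed at the sample value, compares just two candidate marginals, works with the log posterior for numerical stability, and uses no burn-in. The function names, the interval width, and the moment-matched lognormal parameterization are assumptions made for the example.

```python
import numpy as np
from scipy import stats


def slice_sample(log_post, x0, n_samples, width=1.0, rng=None):
    """Univariate slice sampler with stepping-out and shrinkage (after Neal 2003)."""
    rng = np.random.default_rng() if rng is None else rng
    samples = np.empty(n_samples)
    x = x0
    for t in range(n_samples):
        # Step 1: draw the slice height uniformly below the posterior at x
        # (subtracting an exponential variate does this in the log domain).
        log_y = log_post(x) - rng.exponential()
        # Step 2: place an interval of length `width` around x and expand it
        # until both ends lie outside the slice.
        left = x - width * rng.uniform()
        right = left + width
        while log_post(left) > log_y:
            left -= width
        while log_post(right) > log_y:
            right += width
        # Step 3: sample within the interval, shrinking it on each rejection.
        while True:
            x_new = rng.uniform(left, right)
            if log_post(x_new) > log_y:
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        x = x_new
        samples[t] = x
    return samples


def dic(log_lik, samples):
    """DIC = 2 * (mean deviance) - (deviance at the posterior mean)."""
    deviances = np.array([-2.0 * log_lik(s) for s in samples])
    return 2.0 * deviances.mean() - (-2.0 * log_lik(np.mean(samples)))


def make_log_lik(x, family):
    """Log-likelihood of the mean for one candidate marginal (std held fixed)."""
    x = np.asarray(x, float)
    std = x.std(ddof=1)

    def log_lik(mean):
        if family == "gaussian":
            return float(np.sum(stats.norm.logpdf(x, mean, std)))
        # Lognormal, moment-matched to (mean, std).
        s2 = np.log(1.0 + (std / mean) ** 2)
        scale = mean * np.exp(-0.5 * s2)
        return float(np.sum(stats.lognorm.logpdf(x, np.sqrt(s2), scale=scale)))

    return log_lik


def dic_for_candidate(x, family, lo, hi, n_samples=1000, seed=0):
    """DIC of one candidate under a uniform prior for the mean on (lo, hi)."""
    log_lik = make_log_lik(x, family)

    def log_post(mean):
        # Uniform prior: log-likelihood inside the domain, -inf outside it.
        return log_lik(mean) if lo < mean < hi else -np.inf

    rng = np.random.default_rng(seed)
    samples = slice_sample(log_post, 0.5 * (lo + hi), n_samples, rng=rng)
    return dic(log_lik, samples)
```

The candidate with the smaller value returned by `dic_for_candidate` would be selected. Because the samples are random, repeated runs with different seeds can give slightly different DIC values for the same data.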
Table 1 Four cases

Case     Marginal X    Marginal Y    Copula     Kendall's tau
Case 1   Extreme-II    Extreme-I     Gumbel     0.7
Case 2   Gaussian      Weibull       Frank      0.7
Case 3   Weibull       Gumbel        A12
Case 4   Lognormal     Extreme-II    Clayton

5 Comparison of methods

In Sections 5.1 and 5.2, the weight-based and MCMC-based Bayesian methods using one-step and two-step procedures are compared in terms of accuracy and efficiency, respectively.

5.1 Accuracy test

Since it is impossible to show simulation results for all 441 candidate joint distributions in the paper, four joint distributions with various combinations of marginal distributions, copulas, and Kendall's tau are considered as true models, as shown in Table 1. Even though only bivariate joint distributions with positive correlation are tested in this paper, those with negative correlation will yield similar identification results because the joint PDFs with negative correlation are rotated versions of those with positive correlation. Moreover, multivariate distributions are not commonly used in practical applications, so they are not considered.

The means and standard deviations are given as $\mu_X = \mu_Y = 5.0$ and $\sigma_X = \sigma_Y = 2.5$ for X and Y, respectively. The parameters of the non-Gaussian candidate marginal distributions are calculated from the given mean and standard deviation using explicit functions (Noh et al. 2010). Likewise, the correlation parameters of the candidate copulas are obtained using (3) or the explicit functions in Noh et al. (2010). The joint PDF contours of the four cases are shown in Fig. 2.

Fig. 2 Joint PDF contours of four cases

For the identification of a correct joint distribution, a data set is randomly generated from a true joint distribution, and then the joint distribution that best fits the data among the candidates is selected as the correct one based on the estimated weight or DIC values. To test the performance of the weight-based and MCMC-based methods, this procedure is repeated 100 times for ns = 30, 100, and 300. Using the 100 randomly generated data sets for each number of samples, the averaged normalized weights and the number of correct identifications are calculated.

Fig. 3 Averaged normalized weights using one-step weight-based Bayesian method (panels: Clayton, AMH, Gumbel, Frank, A12, A14, FGM, Gaussian, Independent)

Fig. 4 Averaged normalized weights using two-step weight-based Bayesian method (marginals of X and Y: Gaussian, Weibull, Gamma, Lognormal, Gumbel, Extreme-I, Extreme-II; copulas: Clayton, AMH, Gumbel, Frank, A12, A14, FGM, Gaussian, Independent)

Table 2 Averaged normalized weights of correct joint CDFs using one-step weight-based method (columns: ns = 30, ns = 100, ns = 300; rows: Case 1, Case 3)

5.1.1 Comparison of one-step and two-step weight-based Bayesian methods

Using each data set of specified sample size ns that is randomly generated from the true joint distribution, the one-step weight-based Bayesian method calculates the normalized weights of all 441 candidate joint distributions. Figure 3 shows the averaged normalized weights of the 441 candidates over 100 trials using the one-step weight-based Bayesian method for sample data of size ns = 30 obtained from the true joint distribution of Case 3 in Table 1, which is modeled by Weibull and Gumbel distributions and the A12 copula. In Fig. 3, each candidate copula has a 7 × 7 matrix whose rows (X) and columns (Y) correspond to the seven candidate marginal distributions in the order Gaussian, Weibull, Gamma, Lognormal, Gumbel, Extreme-I, and Extreme-II. For example, the first row and second column indicates that X and Y have Gaussian and Weibull distributions, respectively. Since the normalized weights of all 441 candidates sum to one, the normalized weights of many candidates are very small, and some of them are zero, which are shown as blanks in Fig. 3. Even though A12, which is the correct copula, has higher normalized weights than the other copulas in Fig. 3, the normalized weight of the correct model, indicated by a shaded box in Fig. 3, is only 0.060, which could make the identification of the correct joint distribution difficult.
On the other hand, the two-step weight-based Bayesian method calculates the weights of the seven candidate marginal distributions of X and Y, and then calculates the weights of the nine candidate copulas using the identified marginal distributions. Thus, 23 (7 + 7 + 9) candidate marginal distributions and copulas are tested. Since the two-step Bayesian method separately calculates the normalized weights of the seven marginal distributions and nine copulas, the normalized weights of the correct marginal distributions and copula are easily distinguishable compared with the one-step Bayesian method, as shown in Fig. 4.

Next, the averaged normalized weights of the candidate models are calculated using the one-step and two-step weight-based Bayesian methods for two cases, Cases 1 and 3, using different numbers of samples, ns = 30, 100, and 300. For the one-step Bayesian method, since it would take too much space to show results like Fig. 3 for all cases and sample sizes, only the averaged normalized weights of the correct models are presented, as shown in Table 2. Since Case 1 has a more distinct PDF shape among the candidates than Case 3, the averaged normalized weights of the correct joint distribution for Case 1 are larger than those for Case 3. However, the normalized weights of Case 1 using the one-step weight-based method are still not larger than the normalized weights using the two-step weight-based method, especially for a small number of samples (Table 3). Accordingly, as shown in Table 4, the number of correct identifications using the one-step weight-based Bayesian method is smaller than that using the two-step weight-based Bayesian method.

Table 4 Number of correct identifications of joint CDFs using weight-based Bayesian method (columns: one-step and two-step, each for ns = 30, 100, 300; rows: Cases 1 to 4)
Table 3 Averaged normalized weights of correct marginal CDFs and copulas using two-step weight-based method (columns: ns = 30, ns = 100, ns = 300; rows: X, Y, and copula for Case 1 and for Case 3)

Table 5 Number of correct identifications of joint CDFs using MCMC-based Bayesian method (columns: one-step and two-step, each for ns = 30, 100, 300; rows: Cases 1 to 4)

As the number of samples increases, the performance of the one-step weight-based

method improves, but it is still not as good as the performance of the two-step weight-based method. When the joint distribution shapes are not distinct, as in Case 3, it becomes more challenging for the one-step Bayesian method to identify the correct joint distribution because the candidate joint distributions with PDF shapes similar to the correct one will have normalized weights similar to that of the correct one. Thus, the two-step weight-based method is preferred to the one-step weight-based method.

5.1.2 Comparison of one-step and two-step MCMC-based Bayesian methods

Table 5 shows the number of correct identifications for the four cases using the one-step and two-step MCMC-based Bayesian methods for ns = 30, 100, and 300. A fair number of candidate joint distributions have PDF shapes similar to the correct one, which leads to similar DIC values among the candidate joint distributions, especially for Case 3. The marginal PDF and copula shapes are more distinctive than the joint PDF shapes, which leads to rather different DIC values among the candidate marginal distributions and copulas. Thus, it is more difficult for the one-step Bayesian method to identify the correct joint distribution than for the two-step Bayesian method. As in Table 4, the number of correct identifications using the one-step MCMC-based method is smaller than that using the two-step MCMC-based method. As the number of samples increases, the performance of the one-step MCMC-based Bayesian method improves, but it is still not as effective as the two-step MCMC-based Bayesian method.

The one-step method might identify a correct joint distribution among candidate joint distributions that the two-step method does not even test. However, when the one-step method is used, the numbers of candidate marginal distributions and copulas should be determined cautiously.
For example, assume that the numbers of candidate marginal distributions and copulas are 8 and 10, respectively. Even though only one candidate marginal distribution and one candidate copula are added to the original candidates, the total number of candidate joint distributions increases from 7 × 7 × 9 (441) to 8 × 8 × 10 (640). In this case, it is even harder to identify a correct joint distribution among the 640 candidates.

5.1.3 Comparison of two-step weight-based and MCMC-based Bayesian methods

In Sections 5.1.1 and 5.1.2, the one-step and two-step procedures are compared for the weight-based and MCMC-based methods, respectively. From the simulation results, the two-step Bayesian methods are preferred to the one-step Bayesian methods. In this section, the two-step weight-based and MCMC-based Bayesian methods are compared.

As shown in Tables 4 and 5, the performances of the two methods are very similar. However, the MCMC-based Bayesian method depends on random samples of the parameter vector obtained from slice sampling. When the true distributions do not have distinct shapes, as in Case 3, the randomness of the samples of the parameter vector could affect the performance. For example, even though the same data are used to generate the random samples of the parameter vector, the identified distribution could differ according to the generated samples of the parameter vector. Suppose that the true model is Case 3 and two data sets with ns = 30 are generated. Table 6 shows the DIC values of the seven candidate marginal distributions for X and Y obtained from two different sets of samples of the parameter vector. In this case, both sets correctly identify A12, which is the correct copula, so the DIC values of the candidate copulas are not presented. However, the two-step MCMC-based method using Set 1 identifies Weibull and Gumbel (the true model) as the correct marginal distributions, whereas the one using Set 2 identifies Gamma as the correct marginal distribution for both X and Y.
The two sets disagree because the joint PDF shapes of Case 3 and of the identified model (Gamma, Gamma, and Clayton copula) are similar, as shown in Fig. 5. Thus, as shown in Table 6, the DIC values of the Weibull, Gamma, and Gumbel distributions are very similar, so different marginal distributions are identified for the two sets of samples of the parameter vector even though the same data are used. To avoid this problem, the number of samples of the parameter vector was increased up to 2,000, but the results were still inconsistent and the computational time increased rapidly. It could be interesting to test cases with different correlation coefficients, but the general trend would not change, because of the randomness of the slice samples used in the MCMC-based method. Thus, the two-step weight-based Bayesian method is preferred over the two-step MCMC-based Bayesian method.

Table 6 DIC values for identification of marginal CDFs for Case 3 (two-step approach, slice sampling). Columns: Gaussian, Weibull, Gamma, Lognormal, Gumbel, Ext. I, Ext. II. Rows: margins X and Y for Set 1 and for Set 2.

Fig. 5 PDF contours obtained from two different sets of slice sampling

5.2 Efficiency test

To test how efficiently the Bayesian methods identify a correct joint distribution among the candidates, the computational time is measured for one data set, which is randomly generated from a true distribution. Since the computational times are similar for the four cases, only Case 1 is considered. Table 7 shows the computational time when the two-step weight-based and MCMC-based Bayesian methods are used to identify the correct marginal distribution among seven candidates. The weight-based Bayesian method is more efficient than the MCMC-based Bayesian method because the MCMC-based Bayesian method takes more time to generate the random samples, i.e., N = 1,000. Likewise, Table 8 shows the computational time when the two-step weight-based and MCMC-based Bayesian methods are used to identify a copula among nine candidates. Again, the two-step weight-based method is more efficient than the two-step MCMC-based method in identifying the correct copula. Table 9 displays the computational time to identify a joint distribution using the weight-based and MCMC-based Bayesian methods with the one-step and two-step procedures. In the two-step Bayesian methods, the total computational time to identify a joint distribution is calculated by summing twice the computational time in Table 7 (once for X and once for Y) and the computational time for the copula in Table 8.

Table 7 Computational time using two-step weight-based and MCMC-based Bayesian methods (marginal CDF). Rows: MCMC, Weight. Columns: ns = 30, ns = 100, ns = 300. All times in seconds.

Table 8 Computational time using two-step weight-based and MCMC-based Bayesian methods (copula). Rows: MCMC, Weight. Columns: ns = 30, ns = 100, ns = 300. All times in seconds.
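The bookkeeping behind this comparison can be written out as a short sketch. The candidate counts (seven marginal distributions, nine copulas) come from the text. The two-step count assumes each margin is tested against every marginal candidate, and the timings in the last line are hypothetical placeholders rather than values from Tables 7 and 8. The function names are illustrative.

```python
def one_step_candidates(n_marginals, n_copulas):
    """Joint candidates tested at once: one marginal for X, one for Y, one copula."""
    return n_marginals * n_marginals * n_copulas


def two_step_candidates(n_marginals, n_copulas):
    """Candidates tested separately: marginals for X, marginals for Y, then copulas."""
    return 2 * n_marginals + n_copulas


def two_step_total_time(t_marginal, t_copula):
    """Total two-step time: marginal identification for X and Y plus the copula."""
    return 2.0 * t_marginal + t_copula


print(one_step_candidates(7, 9))   # 441
print(two_step_candidates(7, 9))   # 23
print(one_step_candidates(8, 10))  # 640
print(two_step_candidates(8, 10))  # 26

# Hypothetical timings in seconds, chosen only to show the arithmetic.
print(two_step_total_time(t_marginal=1.5, t_copula=4.0))  # 7.0
```

Adding one marginal candidate and one copula candidate raises the one-step count by 199 but the two-step count by only 3, which is one way to see why the one-step procedure becomes expensive as candidates are added.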
As shown in Table 9, since the one-step Bayesian method calculates the weights of 441 candidate joint distributions, the computational time of the one-step Bayesian methods is much larger than that of the two-step Bayesian methods. The weight-based Bayesian methods integrate the likelihood function of every candidate joint distribution over the means of X and Y and Kendall's tau, whereas the MCMC-based Bayesian methods generate random samples of the parameter vector for every candidate. Since calculating the integrals is more efficient than generating random samples and calculating DIC values, the weight-based method is more efficient than the MCMC-based method, as shown in Table 9.

In summary, the two-step Bayesian methods identify a correct joint distribution more accurately and efficiently than the one-step Bayesian methods. In terms of accuracy, the two-step weight-based Bayesian method is similar to the two-step MCMC-based Bayesian method, but it identifies the correct distribution more efficiently. Moreover, the normalized weights are calculated consistently for given data, whereas the DIC values can vary with the randomly generated samples of the parameter vector. Thus, the two-step weight-based Bayesian method is preferred to the one-step weight-based Bayesian method and to the one-step and two-step MCMC-based Bayesian methods.

Table 9 Computational time using weight-based and MCMC-based Bayesian methods with one-step and two-step procedures (joint CDF)

Methods               Weight      MCMC
ns = 30    One-step   s           1,649 s
           Two-step   s           s
ns = 100   One-step   1,218 s     2,008 s
           Two-step   s           s
ns = 300   One-step   3,045 s     4,956 s
           Two-step   s           s
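The consistency of the normalized weights can be illustrated with a minimal sketch. It is an assumption-laden simplification, not the formulation used in this study: only two marginal candidates (Gaussian and Gumbel) are considered, the standard deviation is fixed at the sample value, the likelihood is integrated over the mean with a trapezoidal rule on an assumed interval around the sample mean, and the data are hypothetical. All names are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649015329


def loglik_gaussian(mu, sigma, data):
    """Gaussian log-likelihood with mean mu and standard deviation sigma."""
    return sum(-0.5 * math.log(2.0 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2.0 * sigma ** 2) for x in data)


def loglik_gumbel(mu, sigma, data):
    """Gumbel (maximum) log-likelihood, parameterized by mean and std."""
    beta = sigma * math.sqrt(6.0) / math.pi   # scale from the std
    loc = mu - EULER_GAMMA * beta             # location from the mean
    total = 0.0
    for x in data:
        z = (x - loc) / beta
        total += -math.log(beta) - z - math.exp(-z)
    return total


def normalized_weights(data, candidates, n_grid=201):
    """Integrate each likelihood over the mean on a fixed grid, then normalize."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    half_width = 4.0 * sigma / math.sqrt(n)   # assumed integration range
    grid = [mean - half_width + 2.0 * half_width * i / (n_grid - 1)
            for i in range(n_grid)]
    logliks = {name: [f(mu, sigma, data) for mu in grid]
               for name, f in candidates.items()}
    shift = max(max(values) for values in logliks.values())  # avoids underflow
    step = grid[1] - grid[0]
    raw = {}
    for name, values in logliks.items():
        scaled = [math.exp(v - shift) for v in values]
        # Trapezoidal rule on the uniform grid.
        raw[name] = step * (sum(scaled) - 0.5 * (scaled[0] + scaled[-1]))
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


# Hypothetical data set, for illustration only.
data = [2.1, 3.4, 2.9, 4.8, 3.1, 2.5, 6.2, 3.7, 2.8, 4.1, 3.3, 5.0]
candidates = {"Gaussian": loglik_gaussian, "Gumbel": loglik_gumbel}

weights_first = normalized_weights(data, candidates)
weights_second = normalized_weights(data, candidates)
print(weights_first)
print(weights_first == weights_second)  # True: no random sampling is involved
```

Because the integration grid is fixed by the data, repeated evaluations return identical weights, so the identified candidate cannot change between runs on the same data.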

6 Conclusion

Different Bayesian methods have been proposed in the literature to identify correct models for the marginal distribution, copula, and joint distribution, but it has not been tested which Bayesian method identifies a correct model more accurately and efficiently. In this paper, two recently developed Bayesian methods, the weight-based and MCMC-based Bayesian methods, are compared using one-step and two-step procedures through simulation studies. For the comparison studies, a one-step weight-based method and a two-step MCMC-based method using parametric marginal distributions are developed in this paper. Through the simulation studies, it is demonstrated that the two-step approach identifies the correct joint distribution more accurately and efficiently than the one-step approach for both the weight-based and MCMC-based methods. The two-step weight-based and MCMC-based Bayesian methods show similar performance in identifying a correct joint distribution. However, because of the randomness of the generated samples of the parameter vector, the model identified by the MCMC-based method can differ even though the same data are used. On the other hand, the weight-based Bayesian method identifies the same model as long as the same data are used. In addition, the two-step weight-based method is far more efficient than the two-step MCMC-based method when calculating the weights of the candidate marginal distributions and copulas. Thus, the two-step weight-based Bayesian method is the preferred method.

Acknowledgments This research is supported by the Automotive Research Center, which is sponsored by the U.S. Army TARDEC, and ARO Project W911NF. This support is greatly appreciated.
