Relevance Vector Machines for Earthquake Response Spectra

American Transactions on Engineering & Applied Sciences
http://tuengr.com/ateas

Jale Tezcan (a,*), Qiang Cheng (b)

(a) Department of Civil and Environmental Engineering, Southern Illinois University Carbondale, Carbondale, IL 62901, USA
(b) Department of Computer Science, Southern Illinois University Carbondale, Carbondale, IL 62901, USA

ARTICLE INFO

Article history: Received 23 August 2011; received in revised form 23 September 2011; accepted 26 September 2011; available online 26 September 2011.

Keywords: Response spectrum; Ground motion; Supervised learning; Bayesian regression; Relevance Vector Machines

ABSTRACT

This study uses Relevance Vector Machine (RVM) regression to develop a probabilistic model for the average horizontal component of 5%-damped earthquake response spectra. Unlike conventional models, the proposed approach does not require a functional form, and constructs the model from a set of predictive variables and a set of representative ground motion records. The RVM uses Bayesian inference to determine the confidence intervals, instead of estimating them from the mean squared errors on the training set. An example application using three predictive variables (magnitude, distance and fault mechanism) is presented for sites with shear wave velocities ranging from 450 m/s to 900 m/s. The predictions from the proposed model are compared to those of an existing parametric model. The results demonstrate the validity of the proposed model, and suggest that it can be used as an alternative to the conventional ground motion models. Future studies will investigate the effect of additional predictive variables on the predictive performance of the model.

2012 American Transactions on Engineering & Applied Sciences.

1. Introduction

Reliable prediction of ground motions from future earthquakes is one of the primary challenges in seismic hazard assessment. Conventional ground motion models are based on parametric regression, which requires a fixed functional form for the predictive model. Because the mechanisms governing ground motion processes are not fully understood, identification of the mathematical form of the underlying function is a challenge. Once a functional form is selected, the model is fit to the data and the model coefficients minimizing the mean squared errors between the model and the data are determined. When the selected mathematical form does not accurately represent the actual input-output relationship, this approach is susceptible to overfitting. Indeed, using a sufficiently complex model, one can achieve a perfect fit to the training data, regardless of the selected mathematical form. However, a perfect fit to the training data does not indicate the predictive performance of the model for new data.

Kernel regression offers a convenient way to perform regression without a fixed parametric form, or any knowledge of the underlying probability distribution. A special form of kernel regression, called Support Vector Regression (SVR) (Drucker et al., 1997), is characterized by its compact representation and its high generalization performance. In SVR, the training data is first transformed into a high dimensional kernel space, and linear regression is performed on the transformed data. The resulting model is a linear combination of nonlinear kernel functions evaluated at a subset of the training input. Combination weights are determined by minimizing a penalized residual function. The SVR has proved successful in many studies since its introduction in 1997. The effectiveness of SVR in ground motion modeling has been recently demonstrated (Tezcan and Cheng, 2011; Tezcan et al., 2010).

A well-known weakness of the SVR is the lack of probabilistic outputs.
Although the confidence intervals can be constructed using the mean-squared errors, similar to the approach used in conventional ground motion models, the posterior probabilities, which produce the most reliable estimate of prediction intervals, are not given. The lack of probabilistic outputs in the SVR formulation has motivated the development of a new kernel regression model called the Relevance Vector Machine (RVM) (Tipping, 2000), which operates in a Bayesian framework.

To overcome the limitations of parametric regression while obtaining probabilistic predictions, this paper proposes a new ground motion model based on RVM regression. Unlike standard ground motion models, which make point estimates of the optimal value of the weights by minimizing the fitting error, the RVM model treats the model coefficients as random variables with independent variances and attempts to find the model that maximizes the likelihood of the observations. This approach offers two main advantages over the conventional ground motion models. First, the prediction uncertainty is explicitly determined using Bayesian inference, as opposed to being estimated from the mean squared errors. Second, the complexity of the RVM model is controlled by assigning suitable prior distributions over the model coefficients, which reduces the susceptibility of the model to overfitting.

The rest of the paper is organized as follows. In Section 2, the RVM regression algorithm is described. Section 3 is devoted to the construction of the ground motion model. Starting with the description of the ground motion data and the predictive and target variables, the training results are presented, and the prediction procedure for new data is described. Section 4 demonstrates computational results and compares the RVM predictions to an existing empirical parametric model. Section 5 presents the main conclusions of this study, and discusses the advantages and limitations of the proposed method.

2. The RVM Regression Algorithm

Given a set of input vectors $x_i$, $i = 1, \dots, N$, and corresponding real-valued targets $t_i$, the regression task is to estimate the underlying input-output relationship. Using kernel representation (Smola and Schölkopf, 2004), the regression function can be written as a linear combination of a set of nonlinear kernel functions:

$$f(x) = \sum_{i=1}^{N} w_i K(x, x_i) + w_0 \tag{1}$$

where $w_i$, $i = 1, \dots, N$, are the combination weights and $w_0$ is the bias term.

This study uses the radial basis function (RBF) kernel:

$$K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}, \quad \gamma > 0 \tag{2}$$

where $\gamma$ is the width parameter controlling the trade-off between model accuracy and complexity. In this study, the width parameter has been determined using cross-validation.

Assuming independent noise samples from a zero-mean Gaussian distribution, i.e., $n_i \sim \mathcal{N}(0, \sigma_n^2)$, the target values can be written as:

$$t_i = f(x_i) + n_i, \quad i = 1, \dots, N. \tag{3}$$

Recast in matrix form, Equation (3) becomes:

$$t = \Phi w + n, \tag{4}$$

where $t = (t_1, \dots, t_N)^T$, $w = (w_0, \dots, w_N)^T$, and $\Phi$ is an $N \times (N+1)$ basis matrix with $\Phi_{i1} = 1$ and $\Phi_{ij} = K(x_i, x_{j-1})$ for $j = 2, \dots, N+1$. The likelihood of the entire set, assuming independent observations, is given by:

$$p(t \mid w, \sigma_n^2) = (2\pi\sigma_n^2)^{-N/2} \exp\left(-\frac{1}{2\sigma_n^2}\|t - \Phi w\|^2\right). \tag{5}$$

In what follows, $\mu = (\mu_0, \dots, \mu_N)^T$ denotes the vector containing the mean values of the combination weights. To control the complexity of the model, a zero-mean Gaussian prior is used where each weight is assigned a different variance (MacKay, 1992):

$$p(w \mid \alpha) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, 1/\alpha_i). \tag{6}$$
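As an illustration of Eqs. (2) and (4), the following minimal Python sketch builds the RBF kernel and the $N \times (N+1)$ basis matrix with NumPy. The function names and the three toy input rows are hypothetical and are not taken from the paper; only the formulas follow the text.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """Eq. (2): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) for all row pairs."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=2))

def basis_matrix(X, gamma):
    """Eq. (4): N x (N+1) matrix whose first column of ones carries the bias w_0."""
    n = X.shape[0]
    return np.hstack([np.ones((n, 1)), rbf_kernel(X, X, gamma)])

# Three made-up, already scaled input rows [M, ln R, F] (illustrative only).
X_demo = np.array([[0.5, -0.2, 1.0],
                   [-0.3, 0.4, -1.0],
                   [0.9, 0.9, 1.0]])
Phi_demo = basis_matrix(X_demo, gamma=0.36)
print(Phi_demo.shape)
```

Each kernel column is centered on one training input, so the diagonal of the kernel part equals one and the matrix is symmetric apart from the bias column.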

In Eq. (6), $\alpha = (\alpha_0, \dots, \alpha_N)$, where $1/\alpha_i$ is the variance of $w_i$. The posterior distribution of the weights is obtained as:

$$p(w \mid t, \alpha, \sigma_n^2) = (2\pi)^{-(N+1)/2} |C|^{-1/2} \exp\left(-\frac{1}{2}(w - \mu)^T C^{-1} (w - \mu)\right), \tag{7}$$

where the mean vector $\mu$ and covariance matrix $C$ are:

$$\mu = \sigma_n^{-2} C \Phi^T t \tag{8}$$

$$C = \left[\sigma_n^{-2} \Phi^T \Phi + A\right]^{-1} \tag{9}$$

with

$$A = \mathrm{diag}(\alpha_0, \alpha_1, \dots, \alpha_N). \tag{10}$$

The marginal likelihood of the dataset can be determined by integrating out the weights (MacKay, 1992) as follows:

$$p(t \mid \alpha, \sigma_n^2) = (2\pi)^{-N/2} |H|^{-1/2} \exp\left(-\frac{1}{2} t^T H^{-1} t\right) \tag{11}$$

where $H = \sigma_n^2 I_N + \Phi A^{-1} \Phi^T$ and $I_N$ is the identity matrix of size $N$. Ideal Bayesian inference requires defining prior distributions over $\alpha$ and $\sigma_n^2$, followed by marginalization. This process, however, will not result in a closed form solution. Instead, the $\alpha_i$ and $\sigma_n^2$ values maximizing Eq. (11) can be found iteratively as follows (MacKay, 1992):

$$(\alpha_i)^{new} = \frac{1 - \alpha_i C_{ii}}{\mu_i^2} \tag{12}$$

$$(\sigma_n^2)^{new} = \frac{\|t - \Phi\mu\|^2}{N - \sum_{i=0}^{N} (1 - \alpha_i C_{ii})}. \tag{13}$$

Because the numerator in Eq. (12) is a positive number with a maximum value of 1, an $\alpha_i$ value tending to infinity implies that the posterior distribution of $w_i$ is infinitely peaked at zero, i.e. $w_i = 0$. As a consequence, the corresponding kernel function can be removed from the model. The procedure for determining the weights and the noise variance can be summarized as follows:

1) Select a width parameter of the kernel function and form the basis matrix $\Phi$.
2) Initialize $\alpha = (\alpha_0, \dots, \alpha_N)$ and $\sigma_n^2$.
3) Compute matrix $A$ using Eq. (10).
4) Compute the covariance matrix $C$ using Eq. (9).
5) Compute the mean vector $\mu$ using Eq. (8).
6) Update $\alpha$ and $\sigma_n^2$ using Eq. (12) and Eq. (13).
7) If $\alpha_i \to \infty$, set $w_i = 0$ and remove the corresponding column in $\Phi$.
8) Go back to step 3 until convergence.
9) Set the remaining weights equal to $\mu$.

The training input points corresponding to the remaining nonzero weights are called the relevance vectors. After the weights and the noise variance are determined, the predictive mean for a new input $x_*$ can be found as follows:

$$f(x_*) = w^T \Phi_*. \tag{14}$$

In Eq. (14), $\Phi_* = [1 \;\; K(x_*, r_1) \;\; K(x_*, r_2) \;\; \dots \;\; K(x_*, r_{N_r})]^T$, where $(r_1, r_2, \dots, r_{N_r})$ are the relevance vectors.
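The nine training steps and the predictive mean of Eq. (14) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the initial values of $\alpha$ and $\sigma_n^2$, the pruning threshold `alpha_max`, the convergence tolerance, the small numerical floors, and all function names are choices made here for the sketch. The predictive variance returned by `predict_rvm` adds the weight uncertainty $\Phi_*^T C \Phi_*$ to the noise variance.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """RBF kernel of Eq. (2) evaluated for all pairs of rows."""
    diff = X1[:, None, :] - X2[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=2))

def train_rvm(X, t, gamma, n_iter=1000, alpha_max=1e9, tol=1e-4):
    """Iterate Eqs. (8)-(13); alpha_max stands in for 'alpha_i tends to infinity'."""
    n = X.shape[0]
    Phi = np.hstack([np.ones((n, 1)), rbf_kernel(X, X, gamma)])  # step 1
    keep = np.arange(n + 1)           # columns of Phi still in the model
    alpha = np.ones(n + 1)            # step 2 (starting values are assumptions)
    sigma2 = 0.1 * np.var(t) + 1e-12
    for _ in range(n_iter):
        P = Phi[:, keep]
        A = np.diag(alpha[keep])                             # step 3, Eq. (10)
        C = np.linalg.inv(P.T @ P / sigma2 + A)              # step 4, Eq. (9)
        mu = C @ P.T @ t / sigma2                            # step 5, Eq. (8)
        g = np.clip(1.0 - alpha[keep] * np.diag(C), 1e-12, 1.0)
        alpha_new = g / (mu ** 2 + 1e-12)                    # step 6, Eq. (12)
        resid = np.sum((t - P @ mu) ** 2)
        sigma2 = max(resid / max(n - g.sum(), 1e-6), 1e-12)  # step 6, Eq. (13)
        change = np.max(np.abs(np.log(alpha_new) - np.log(alpha[keep])))
        alpha[keep] = alpha_new
        mask = alpha_new < alpha_max                         # step 7: prune
        if mask.any():
            keep = keep[mask]
        if change < tol:                                     # step 8
            break
    P = Phi[:, keep]
    C = np.linalg.inv(P.T @ P / sigma2 + np.diag(alpha[keep]))
    mu = C @ P.T @ t / sigma2                                # step 9
    return keep, mu, C, sigma2

def predict_rvm(x_new, X, keep, mu, C, sigma2, gamma):
    """Predictive mean (Eq. 14) and total variance (noise plus weight uncertainty)."""
    rv_rows = keep[keep > 0] - 1      # rows of X that are relevance vectors
    k = rbf_kernel(np.atleast_2d(x_new), X[rv_rows], gamma)[0]
    phi = np.concatenate(([1.0], k)) if 0 in keep else k
    return mu @ phi, sigma2 + phi @ C @ phi
```

A call such as `keep, mu, C, sigma2 = train_rvm(X, t, gamma=0.36)` followed by `predict_rvm(x_new, X, keep, mu, C, sigma2, 0.36)` returns the mean of the log-spectral value and its total variance; entries of `keep` greater than zero index the relevance vectors.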

The total predictive variance can be found by adding the noise variance to the uncertainty due to the variance of the weights, as follows:

$$\sigma_*^2 = \sigma_n^2 + \Phi_*^T C \Phi_*. \tag{15}$$

3. Construction of the Ground Motion Model

In this section, the RVM regression algorithm is used to construct a ground motion model. In Section 4, the resulting model is compared to an existing parametric model by Idriss (Idriss, 2008), which is referred to as the I08 model in this paper. To enable a fair comparison, the dataset and the predictive variables of the I08 model have been adopted in this study. The RVM algorithm is independent of the size of the predictive variable set; additional variables can be introduced, and the set of predictive variables can be customized to specific applications.

3.1 Ground Motion Data

The ground motion records used in the training have been obtained from the PEER-NGA database (PEER, 2007). Consistent with the I08 model, a total of 942 free-field records have been selected using the following criteria:

- Shear wave velocity in the top 30 m ($V_{s30}$) ranging from 450 m/s to 900 m/s,
- Magnitude larger than 4.5,
- Closest distance between the station and the rupture surface ($R$) less than 200 km.

Detailed information regarding these records can be found in the paper by Idriss (Idriss, 2008).

3.2 Predictive and Target Variables

The predictive variable set includes moment magnitude ($M$), natural logarithm of the closest distance between the station and the rupture surface in kilometers ($\ln R$) and fault mechanism ($F$). Idriss finds that with the shear wave velocity ($V_{s30}$) constrained to the 450 m/s to 900 m/s range, it has negligible effect on spectral values up to 1 second. Therefore, $V_{s30}$ was not used as a predictive variable. Following the convention used in the I08 model, earthquakes that have been assigned a fault mechanism type 0 and 1 in the PEER database were merged into a single, strike-slip group, while the rest were considered to be representative of reverse events. In the RVM model, strike-slip and reverse earthquakes are assigned $F = -1$ and $F = 1$, respectively. The input vector representing the $i$-th record has the following form:

$$x_i = [M_i \;\; \ln R_i \;\; F_i]. \tag{16}$$

A set of eight vibration periods ($n_t = 8$) ranging from 0.01 second to 4 seconds was used in the RVM model. The output for the $i$-th record for the vibration period $T_j$ is defined as:

$$y_i = \ln S(T_j), \quad j = 1, \dots, n_t. \tag{17}$$

In Equation (17), $\ln S$ is the natural logarithm of the average horizontal component of the 5%-damped pseudo-acceleration response spectrum. The spectral values ($S$) represent the median value of the geometric mean of the two horizontal components, computed using non-redundant rotations between 0 and 90 degrees (Boore et al., 2006).

3.3 Training of the RVM Regression Model

As a pre-processing step, $M$ and $\ln R$ values were linearly scaled to [-1, 1] to achieve uniformity between the ranges of the predictive variables. There is no need to scale the fault mechanism identifier ($F$) as it was already defined to take either -1 or 1. Because kernel functions use Euclidean distances between pairs of input vectors, such scaling helps prevent numerical problems due to large variations between the ranges of the values that variables can take. In the ground motion data used in this study, the ranges of the predictive variables are $4.53 \le M \le 7.68$ and $0.32 \text{ km} \le R \le 199.27 \text{ km}$. Therefore, input scaling takes the following form:

$$x = \left[\frac{2M - 12.21}{3.15}, \;\; \frac{2\ln R - 4.16}{6.44}, \;\; F\right]. \tag{18}$$

The optimal value of the kernel width parameter ($\gamma$) for each vibration period was determined using 10-fold cross validation (Webb, 2002). In 10-fold cross validation, the training data is randomly partitioned into 10 subsets of equal size; the model is trained using 9 subsets, and the remaining subset is used to compute the validation error. This process is repeated 10 times, each time with a different validation subset, and the average validation error for a particular $\gamma$ is computed. By computing the average validation error over a range of possible $\gamma$ values, the optimal $\gamma$ with the smallest average validation error is determined. The resulting $\gamma$ values for each period are listed in Table 1, along with the standard deviation of noise ($\sigma_n$), the mean value of the constant term ($w_0$) and the number of relevance vectors. The relevance vectors and the combination weights ($w_i$) are listed in Table 2.

After the RVM models, one for each vibration period, were trained, standardized residuals were computed. Figure 1 shows the distribution of the standardized residuals, corresponding to T = 1 second, with respect to $M$, $R$ and $V_{s30}$. The residual distribution patterns for other periods were similar, not indicating any systematic bias.

Table 1: Kernel width parameter (γ), logarithmic standard deviation of noise (σ_n), mean value of the bias term (w_0) and the number of relevance vectors (N_r), for each period.

T (sec)   γ      σ_n     w_0       N_r
0.01      0.23   0.633   -3.069    7
0.05      0.32   0.666   -0.664    7
0.10      0.13   0.718   0.002     7
0.20      0.15   0.661   -15.042   6
0.50      0.25   0.695   -8.359    7
1.00      0.36   0.748   -4.670    5
2.00      0.28   0.869   -6.0548   5
4.00      0.26   0.983   -7.794    5
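The input scaling of Eq. (18) and the random 10-fold partition used to select $\gamma$ can be sketched as below. The helper names `scale_input` and `kfold_indices` and the fixed random seed are hypothetical. The sketch computes the scaling constants from the reported data ranges rather than from the rounded values in Eq. (18), so the results may differ in the third decimal place. Because 942 records do not divide evenly by 10, the folds here hold 94 or 95 records each.

```python
import math
import random

M_MIN, M_MAX = 4.53, 7.68       # magnitude range of the training data
R_MIN, R_MAX = 0.32, 199.27     # closest-distance range in km

def scale_input(M, R, F):
    """Eq. (18): map M and ln R linearly onto [-1, 1]; F is already -1 or 1."""
    m = (2.0 * M - (M_MAX + M_MIN)) / (M_MAX - M_MIN)
    lo, hi = math.log(R_MIN), math.log(R_MAX)
    r = (2.0 * math.log(R) - (hi + lo)) / (hi - lo)
    return [m, r, float(F)]

def kfold_indices(n, k=10, seed=0):
    """Yield (training indices, validation indices) for a random k-fold split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[j::k] for j in range(k)]
    for j in range(k):
        train = [i for f in folds[:j] + folds[j + 1:] for i in f]
        yield train, folds[j]

print(scale_input(7.0, 20.0, 1))
```

For each candidate $\gamma$, a model would be trained on every training split and scored on the matching validation split; the $\gamma$ with the smallest average validation error is then retained.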

Figure 1: Standardized residuals for T=1 second.

Table 2: Mean values of the combination weights (w_i) and the relevance vectors (r_i).

T = 0.01 s
i   w_i       r_i
1   13.258    [-0.1937  0.2676 -1]
2   15.393    [ 0.5238 -0.2268  1]
3   0.4861    [ 0.8921  0.9414 -1]
4   -5.073    [ 0.9619 -1.0000  1]
5   -4.275    [ 0.9619 -0.6751  1]
6   -14.173   [-0.2889  0.7862 -1]
7   -8.086    [ 0.0603  0.9789  1]

T = 0.05 s
i   w_i       r_i
1   -6.177    [ 0.7905 -0.4227  1]
2   6.355     [-0.3841 -0.1783 -1]
3   28.555    [ 0.5238  0.5856  1]
4   -7.930    [-0.5111  0.7896 -1]
5   -0.402    [ 0.7460 -0.4021 -1]
6   -12.622   [ 0.9619  0.9545  1]
7   -16.194   [ 0.0603  0.9789  1]

T = 0.1 s
i   w_i       r_i
1   64.423    [ 0.4159 -0.1499  1]
2   -6.991    [ 0.9619  0.9545  1]
3   -36.297   [ 0.9619 -1.0000  1]
4   15.875    [ 1.0000  0.4559 -1]
5   -5.599    [-0.3143  0.0809  1]
6   -17.361   [ 0.6508  0.9961 -1]
7   -25.799   [-0.1302  0.9056  1]

T = 0.2 s
i   w_i       r_i
1   29.569    [-0.8921 -0.0837 -1]
2   2.293     [ 0.7905 -0.4227  1]
3   35.440    [ 0.8921  0.6543 -1]
4   5.7412    [ 0.9619 -1.0000  1]
5   3.5036    [-0.8222  0.1385  1]
6   -48.496   [ 0.0603  0.4955 -1]

Table 2 (continued).

T = 0.5 s
i   w_i       r_i
1   6.4551    [ 0.7905 -0.4227  1]
2   12.825    [-0.2317 -0.2931 -1]
3   0.0283    [-0.7714  0.1214  1]
4   -0.806    [ 0.8921 -0.0318 -1]
5   8.4335    [ 0.8921  0.9414 -1]
6   -0.089    [ 0.9619  0.9545  1]
7   -12.9     [ 0.0603  0.5786 -1]

T = 1.0 s
i   w_i       r_i
1   1.9699    [ 0.7905 -0.4227  1]
2   4.8873    [ 0.0540 -0.2785 -1]
3   -4.1425   [-0.7524  0.7892  1]
4   -3.9593   [-0.7651  0.8672 -1]
5   3.7352    [-0.1302 -0.0121  1]

T = 2.0 s
i   w_i       r_i
1   7.3574    [-0.2317 -0.2931 -1]
2   4.5548    [-0.0730  0.4691  1]
3   3.0086    [ 0.9619 -1.0000  1]
4   -6.4695   [-1.0000  0.5142 -1]
5   -5.3630   [-0.7524  0.7892  1]

T = 4.0 s
i   w_i       r_i
1   0.4747    [ 0.7460 -0.4021 -1]
2   11.936    [ 0.7460  0.5118 -1]
3   6.8109    [ 0.3714 -0.0296  1]
4   -5.6050   [-0.7524  0.7892  1]
5   -10.180   [ 0.3778  1.0000 -1]

3.4 Prediction Phase

After training, the spectral values for a new input vector $x_* = [M, \ln R, F]$ can be determined as follows:

1. Scale the input to the range [-1, 1] using Eq. (18);
2. Construct the basis vector $\Phi_* = [1 \;\; K(x_*, r_1) \;\; K(x_*, r_2) \;\; \dots \;\; K(x_*, r_{N_r})]^T$ using the relevance vectors from Table 2 and the kernel width parameter from Table 1;
3. Determine the median value of $\ln S$ using Eq. (14);
4. Obtain the standard deviation of the noise from Table 1. Total uncertainty, if needed, can be determined using Eq. (15).

4. Computational Results

The RVM model was tested using different magnitudes, distances and fault mechanisms, and the results were compared to the I08 model. Figure 2 shows the median spectral acceleration at T=1 second, along with the 16th and 84th percentile values (±σ_n bounds) for strike-slip faults, for M=5 (left) and M=7 (right). The circles in the figure show the spectral values from earthquakes with the same fault mechanism and within ±0.25 magnitude units. Figure 3 shows the same information for reverse faults. For periods of about 1 second and longer, it was observed that the median estimates from the RVM model were generally lower than those from the I08 model. At very short distances, within ~20 km of the source, RVM estimates were higher for M=7, for both strike-slip and reverse faulting earthquakes.

Figure 2: Median ±σ bounds for spectral acceleration at T=1 second, strike-slip faults.

Figure 3: Median ±σ bounds for spectral acceleration at T=1 second, reverse faults.
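The prediction steps of Section 3.4 can be illustrated for T = 1.0 s with a short Python sketch that uses the tabulated kernel width, bias term, weights and relevance vectors. The function names are hypothetical, and the example scenario (M = 7, R = 20 km) is chosen here for illustration only. The sketch assumes that strike-slip events are coded as F = -1, and it reports only the ±σ_n bounds, because the weight covariance matrix needed for the total uncertainty of Eq. (15) is not tabulated.

```python
import math

# Values for T = 1.0 s taken from Table 1 and Table 2.
GAMMA = 0.36
SIGMA_N = 0.748
W0 = -4.670
WEIGHTS = [1.9699, 4.8873, -4.1425, -3.9593, 3.7352]
RELEVANCE_VECTORS = [
    [0.7905, -0.4227, 1.0],
    [0.0540, -0.2785, -1.0],
    [-0.7524, 0.7892, 1.0],
    [-0.7651, 0.8672, -1.0],
    [-0.1302, -0.0121, 1.0],
]

def scale_input(M, R, F):
    """Step 1: scale the raw input with Eq. (18)."""
    return [(2.0 * M - 12.21) / 3.15,
            (2.0 * math.log(R) - 4.16) / 6.44,
            float(F)]

def ln_s_median(M, R, F):
    """Steps 2-3: kernel expansion of Eq. (14) over the relevance vectors."""
    x = scale_input(M, R, F)
    total = W0
    for w, r in zip(WEIGHTS, RELEVANCE_VECTORS):
        d2 = sum((a - b) ** 2 for a, b in zip(x, r))
        total += w * math.exp(-GAMMA * d2)
    return total

# Illustrative scenario: M = 7 at R = 20 km, F = -1 (assumed strike-slip code).
ln_s = ln_s_median(7.0, 20.0, -1)
median = math.exp(ln_s)
p16 = math.exp(ln_s - SIGMA_N)   # step 4: lower bound from the noise std. dev.
p84 = math.exp(ln_s + SIGMA_N)   # upper bound
print(median, p16, p84)
```

Other periods follow the same pattern after swapping in the corresponding rows of Table 1 and Table 2.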

Figure 4 presents the results for vibration period T=0.2 second, for strike-slip earthquakes. The results for the reverse faulting earthquakes were similar. For shorter vibration periods, and M=7, RVM estimates were lower than those from the I08 model. For M=5, however, RVM predictions equaled or exceeded the I08 predictions. Regarding the variation about the median (noise variance), the predictions from the two models were in general agreement for all vibration periods.

Figure 4: Median ±σ bounds for spectral acceleration at T=0.2 second, strike-slip faults.

5. Conclusion

This paper proposes an RVM-based model for the average horizontal component of earthquake response spectra. Given a set of predictive variables and a set of ground motion records, the RVM model predicts the most likely spectral values in addition to their variability. An example application has been presented where the predictions from the RVM model have been compared to an existing, parametric ground motion model. The results demonstrate the validity of the proposed model, and suggest that it can be used as an alternative to the conventional ground motion models.

The RVM model offers the following advantages over its conventional counterparts: (1) There is no need to select a fixed functional form. By determining the optimal variances associated with the weights, the RVM automatically detects the most plausible model. (2) The resulting RVM model has a simple mathematical structure (weighted average of exponential basis functions), and is based on a small number of samples that carry the most relevant information. Samples that are not well supported by the evidence (as measured by the increase in the marginal likelihood) are automatically pruned. (3) Because the model complexity is controlled during the training stage, the RVM has lower risk of over-fitting.

One limitation of the proposed approach is that the resulting model may be difficult to interpret. Because the RVM is not a physical model, it does not allow any user-defined, physical constraints, and therefore cannot be extended to scenarios not represented in the training data set. However, in our opinion, this does not constitute a shortcoming, considering that the reliability of such practice is questionable in any regression model. Another potential limitation is that the RVM requires a user-defined kernel width parameter, which does not have a very clear intuitive meaning, especially when working with high dimensional input vectors. However, the optimal value of the kernel width parameter can be determined using cross-validation, as has been done in this study. Future studies will investigate the effect of using additional predictive variables on the performance of the model.

6. Acknowledgements

This material is based in part upon work supported by the National Science Foundation under Grant Number CMMI-1100735.

7. References

Boore, D. M., J. Watson-Lamprey, and N. A. Abrahamson. (2006). Orientation-independent measures of ground motion. Bulletin of the Seismological Society of America, 96(4A), 1502-1511.

Bozorgnia, Y. and K. W. Campbell. (2004). The vertical-to-horizontal response spectral ratio and tentative procedures for developing simplified V/H and vertical design spectra. Journal of Earthquake Engineering, 8(2), 175-207.

Campbell, K. W. and Y. Bozorgnia. (2003). Updated Near-Source Ground-Motion (Attenuation) Relations for the Horizontal and Vertical Components of Peak Ground Acceleration and Acceleration Response Spectra. Bulletin of the Seismological Society of America, 93(1), 314-331.

Drucker, H., C. J. C. Burges, L. Kaufman, A. Smola and V. Vapnik. (1997). Support vector regression machines. Advances in Neural Information Processing Systems 9, MIT Press.

Idriss, I. M. (2008). An NGA empirical model for estimating the horizontal spectral values generated by shallow crustal earthquakes. Earthquake Spectra, 24(1), 217-242.

MacKay, D. J. C. (1992). Bayesian interpolation. Neural Computation, 4(3), 415-447.

MacKay, D. J. C. (1992). The evidence framework applied to classification networks. Neural Computation, 4(5), 720-736.

PEER. (2007). PEER-NGA Database. http://peer.berkeley.edu/nga/index.html.

Smola, A. J. and B. Schölkopf. (2004). A tutorial on support vector regression. Statistics and Computing, 14(3), 199-222.

Tezcan, J. and Q. Cheng. (2011). A Nonparametric Characterization of Vertical Ground Motion Effects. Earthquake Engineering and Structural Dynamics (in print).

Tezcan, J., Q. Cheng and L. Hill. (2010). Response Spectrum Estimation using Support Vector Machines. 5th International Conference on Recent Advances in Geotechnical Earthquake Engineering and Soil Dynamics, San Diego, CA.

Tipping, M. (2000). The relevance vector machine. Advances in Neural Information Processing Systems, MIT Press.

Webb, A. (2002). Statistical pattern recognition. New York, John Wiley and Sons.

Dr. Jale Tezcan is an Associate Professor in the Department of Civil and Environmental Engineering at Southern Illinois University Carbondale. She earned her Ph.D. from Rice University, Houston, TX in 2005. Dr. Tezcan's research interests include earthquake engineering, material characterization, and numerical methods.

Dr. Qiang Cheng is an Assistant Professor in the Department of Computer Science at Southern Illinois University Carbondale. He earned his Ph.D. from the University of Illinois at Urbana-Champaign, IL in 2002. Dr. Cheng's research interests include pattern recognition, machine learning and signal processing.

Peer Review: This article has been internationally peer-reviewed and accepted for publication according to the guidelines given at the journal's website.