# Extreme-Value Graphical Models with Multiple Covariates



Hang Yu, Student Member, IEEE, Justin Dauwels, Senior Member, IEEE, and Philip Jonathan

Abstract: To assess the risk of extreme events such as hurricanes, earthquakes, and floods, it is crucial to develop accurate extreme-value statistical models. Extreme events often display heterogeneity (i.e., non-stationarity), varying continuously with a number of covariates. Previous studies have suggested that models accounting for covariate effects lead to reliable estimates of the distributions of extreme events. In this paper, we develop a novel statistical model to incorporate the effects of multiple covariates. Specifically, we analyze as an example the extreme sea states in the Gulf of Mexico, where the distribution of extreme wave heights changes systematically with location and storm direction. In the proposed model, the block maxima at each location and sector of wind direction are assumed to follow the Generalized Extreme Value (GEV) distribution. The GEV parameters are coupled across the spatio-directional domain through a graphical model, in particular, a three-dimensional (3D) thin-membrane model. Efficient learning and inference algorithms are developed based on the special characteristics of the thin-membrane model. We further show how to extend the model to incorporate an arbitrary number of covariates in a straightforward manner. Numerical results for both synthetic and real data indicate that the proposed model can accurately describe the marginal behavior of extreme events.

Index Terms: extreme events modeling, Gaussian graphical models, covariates, Laplacian matrix, Kronecker product

I. INTRODUCTION

Extreme events, such as heat waves, cold snaps, tropical cyclones, hurricanes, heavy precipitation and floods, droughts and wildfires, can have a tremendous impact on people's lives and property.
For instance, China experienced massive flooding of parts of the Yangtze River in the summer of 1998, resulting in about 4,000 dead, millions homeless and 26 billion USD in economic loss. To make matters worse, both observational data and computer climate models suggest that the occurrence and sizes of such catastrophes will increase in the future [1]. It is therefore imperative to model such events, assess the risks, and take precautionary measures. Extreme-value theory governs the statistical behavior of extreme values of variables, such as extreme wave heights during hurricanes. The theory provides closed-form distribution functions for the extremes of single variables (marginals), such as block maxima (monthly or annual) and peaks over a sufficiently high threshold [2].

Part of the material in this paper was presented at ISIT, Cambridge, MA, July 2012, and will be presented at ICASSP, Florence, Italy, May 2014. H. Yu is with the School of Electrical and Electronic Engineering, Nanyang Technological University, Nanyang Avenue, Singapore. J. Dauwels is with the School of Electrical and Electronic Engineering and the School of Physical and Mathematical Sciences, Nanyang Technological University, Nanyang Avenue, Singapore. P. Jonathan is with Shell Research Ltd., Manchester, M22 0RR, UK and the Department of Mathematics and Statistics, Lancaster University, Lancaster, LA1 4YF, UK.

The main challenge in fitting such distributions to measurements is the lack of data, as extreme events are by definition very rare. The problem can be alleviated by assuming that all the collected data (e.g., extreme wave heights at different measuring sites [3]) are stationary and follow the same distribution. After pooling all the data, the resulting sample size is sufficiently large to yield apparently reliable estimates. However, there usually exists clear heterogeneity in extreme-value data, caused by the underlying mechanisms that drive the weather events.
Extreme temperature, for example, is greatly influenced by the altitude of the measuring site. The latter can be regarded as a covariate. Accommodating heterogeneity in the model is essential, since the estimated model will otherwise be unreliable [4]. In order to handle both heterogeneity and the problem of small sample size, the interactions among extreme events with different covariate values are often exploited. For instance, extreme temperatures at similar altitudes behave similarly, implying that the parameters of the corresponding extreme-value distributions vary smoothly with the covariate (i.e., altitude). Such prior knowledge may help to improve the fitting of extreme-value distributions. The large body of literature on extreme-value models with covariates can be divided into two categories: models with a single covariate [5]-[9] and with multiple covariates [10]-[13]. Approaches of the first group usually treat the parameters of the marginal extreme-value distributions as a function of the covariate. In [5]-[7], directional and seasonal effects are considered when describing the marginal behavior of extreme wave heights. The dependence of the parameters on the single covariate is captured by a Fourier expansion. Spatial effects are investigated in [8]: the parameters are assumed to be Legendre polynomials of the location, and the extreme-value threshold is determined through quantile regression. Although these parametric models offer a simple framework to capture covariate effects, they are prone to model misspecification. A more appealing approach is to incorporate the covariate in a non-parametric manner. The work in [9] employs a Markov random field, in particular a conditional autoregressive model, to induce spatial dependence among the parameters of extreme-value distributions across a spatial domain. This also enables the use of Markov Chain Monte Carlo (MCMC) algorithms to learn the model.
However, such procedures are computationally complex and can be prohibitive for large-scale systems. On the other hand, few attempts have been made to model

multiple covariates. A standard method is to predefine the distribution parameters as a function of all the covariates [2]. Unfortunately, the function can become quite complicated as the number of covariates increases, whereas only linear or log-linear models are used in [2] for simplicity. The resulting estimates may be biased due to misspecification of the functional form. As an alternative, Eastoe et al. [10] proposed to remove the heterogeneity of the entire data set, both extreme and non-extreme, through preprocessing, and then model the extremal part of the preprocessed data using the above-mentioned standard approach. They found that the preprocessing technique can indeed remove almost all the heterogeneity, and consequently, simple linear or log-linear models are capable of expressing the residual heterogeneity. Motivated by this success, Jonathan et al. [11] proposed to process two different covariates individually. They removed the effect of the first covariate by whitening the data using a linear location-scale model, and then employed the methods for a single covariate to accommodate the second one. A drawback of their method, however, is that the dependence on the first covariate may be nonlinear, so the whitening step cannot completely remove its effect. Furthermore, capturing the two covariates independently fails to account for possible correlation between them. To accommodate all the covariates at the same time, a spline-based generalized additive model is introduced in [12], where spline smoothers for each covariate are added to the original likelihood function and the penalized likelihood is then maximized. Similarly, Randell et al. [13] addressed the problem by means of penalized tensor products of B-splines, so as to obtain a smooth dependence of the distribution parameters w.r.t. all the covariates.
The spline-based methods have the virtue of extending the methods for single covariates to the case of multiple covariates. Unfortunately, directly maximizing the complex penalized likelihood can be computationally intensive. Moreover, choosing the smoothness parameters can be problematic, as pointed out in [14]. These shortcomings spark our interest in exploiting graphical models to incorporate multiple covariates in the extreme-value model. The interdependencies between extreme values with different covariates are often highly structured. This structure can in turn be leveraged by the graphical model framework to yield very efficient algorithms [15]-[18]. In this paper, we aim to model extreme events with multiple covariates using graphical models. Our model is theoretically significant because it is among the first approaches to exploit the framework of graphical models to analyze the marginal behavior of extreme events. As an example, we model the storm-wise maxima of significant wave heights in the Gulf of Mexico (see Fig. 7), where the covariates are longitude, latitude, and wind direction. Note that the significant wave height is a standard measure of sea surface roughness; it is defined as the mean of the highest one third of waves (typically in a three-hour period). Theoretically, significant wave heights are affected by the dominant wave direction (i.e., storm direction). However, we use wind direction as a surrogate, since wind and wave direction are generally fairly well correlated, especially for extreme events. Motivated by extreme-value theory [2], the extreme events are assumed to follow the heavy-tailed Generalized Extreme Value (GEV) distribution. The parameters of those GEV distributions are further assumed to depend smoothly on the covariates. To facilitate the use of graphical models, we discretize the continuous covariates within a finite range.
In the example of extreme wave heights in the Gulf of Mexico, space is discretized as a finite homogeneous two-dimensional lattice, and the wind direction is discretized into a finite number of equal-sized sectors. More generally, the GEV distributed variables (and hence also the GEV parameters) are defined on a finite number of points indexed by the (discretized) covariates. We characterize the dependence between the GEV parameters through a graphical model prior, in particular, a multidimensional thin-membrane model where edges are only present between pairs of neighboring points (see Fig. 1). We demonstrate that the multidimensional model can be constructed flexibly from one-dimensional thin-membrane models for each covariate. The proposed model can therefore easily be extended to an arbitrary number of covariates. We follow the empirical Bayes approach to learn the parameters and hyperparameters. Specifically, both the smoothed GEV parameters and the smoothness parameters are inferred via Expectation Maximization. A major challenge lies in the scalability of the algorithm, since the dimension of the model is usually quite large. In order to derive efficient algorithms, we take advantage of the special pattern of the eigenvalues and eigenvectors of the one-dimensional thin-membrane models. Our numerical results for both synthetic and real data suggest that the proposed model indeed accurately captures the effect of covariates on the statistics of extreme events. Moreover, the proposed model can flexibly accommodate a large variety of (smooth) dependencies on covariates, as the smoothness parameters of the thin-membrane model are inferred automatically from the data. The remainder of the paper is organized as follows. In Section II, we introduce thin-membrane models and their properties.
In Section III, we first construct the 3D thin-membrane model from simple 1D thin-membrane models to model extreme wave heights, and then illustrate how to generalize the model to incorporate any number of covariates. In Section IV, we discuss the efficient learning and inference algorithms at length and provide theoretical guarantees. Numerical results for both synthetic and real data are presented in Section V. Lastly, we offer concluding remarks in Section VI.

II. THIN-MEMBRANE MODELS

In this section, we first give a brief introduction to graphical models, and subsequently consider the special case of thin-membrane models. We then analyze two concrete examples of thin-membrane models that will be used as building blocks in the proposed extreme-value graphical model: chain and circle models. In an undirected graphical model (i.e., Markov random field), the probability distribution is represented by an undirected graph G which consists of nodes V and edges E. Each node i is associated with a random variable Z_i. An

edge (i, j) is absent if the corresponding two variables Z_i and Z_j are conditionally independent: P(Z_i, Z_j | Z_{V\{i,j}}) = P(Z_i | Z_{V\{i,j}}) P(Z_j | Z_{V\{i,j}}), where Z_{V\{i,j}} denotes all the variables except Z_i and Z_j. If the random variables Z corresponding to the nodes of the graph are jointly Gaussian, then the graphical model is called a Gaussian graphical model. Let Z ~ N(µ, Σ) with mean vector µ and positive-definite covariance matrix Σ. Since Z constitutes a Gaussian graphical model, the inverse of the covariance matrix (the precision matrix) K = Σ^{-1} is sparse with respect to the graph G, i.e., [K]_{i,j} ≠ 0 if and only if the edge (i, j) ∈ E [19]. The Gaussian graphical model can be written in an equivalent information form N(K^{-1}h, K^{-1}) with precision matrix K and potential vector h = Σ^{-1}µ. The corresponding probability density function (PDF) is:

P(X) ∝ exp(−(1/2) X^T K X + h^T X). (1)

The thin-membrane model [20] is a Gaussian graphical model that is commonly used as a smoothness prior, as it minimizes the difference between values at neighboring nodes:

P(Z) ∝ exp{−(1/2) α Σ_{i∈V} Σ_{j∈N(i)} (Z_i − Z_j)^2} (2)
∝ exp{−(1/2) α Z^T K_p Z}, (3)

where N(i) denotes the neighboring nodes of node i, K_p is a graph Laplacian matrix such that [K_p]_{i,i} is the number of neighbors of the i-th node while the off-diagonal elements [K_p]_{i,j} are equal to −1 if nodes i and j are adjacent and 0 otherwise, and α is the smoothness parameter which controls the smoothness across the domain defined by the thin-membrane model. By comparing (3) with (1), we can see that the precision matrix of the thin-membrane model is K = αK_p. Since K_p is a Laplacian matrix, K is rank deficient, i.e., det K = 0. As such, the thin-membrane model is classified as a partially informative normal prior [21] or intrinsic Gaussian Markov random field [22]. To make the distribution well-defined, the improper density function is usually applied in practice:

P(Z) ∝ |K|_+^{1/2} exp{−(1/2) Z^T K Z} (4)
= |αK_p|_+^{1/2} exp{−(1/2) α Z^T K_p Z}, (5)

where |K|_+ denotes the product of nonzero eigenvalues of K. Note that K_p 1 = 0, where 1 is a vector of all ones, indicating that the eigenvalue associated with the eigenvector 1 equals 0. Thus, the thin-membrane model is invariant to the addition of c1, where c is an arbitrary constant, and it allows deviation from any overall mean level without having to specify the overall mean level itself. As an illustration, we can easily find that the conditional mean of variable Z_i is [22]:

E(Z_i | Z_{V\i}) = −(1/[K_p]_{i,i}) Σ_{j∈N(i)} [K_p]_{i,j} Z_j, (6)

i.e., the mean of its neighbors, but it does not involve an overall level. This special behavior is often desirable in applications. We now turn our attention to the thin-membrane models of the chain and the circular graph, as shown in Fig. 1a and Fig. 1b respectively. The former can well characterize the dependence structure of nonperiodic covariates (e.g., longitude and latitude), while the latter is highly suitable for periodic covariates (e.g., directional and seasonal patterns). The corresponding Laplacian matrices are denoted by K_B and K_C respectively. It is easy to prove that the eigenvalues and eigenvectors of K_B and K_C have the following special pattern [23]:

λ_Bk = 2 − 2 cos(kπ/P),
v_Bk = [cos(kπ/2P), cos(3kπ/2P), ..., cos((2P−1)kπ/2P)]^T,
λ_Ck = 2 − 2 cos(2kπ/P),
v_Ck = [1, ω^k, ω^{2k}, ..., ω^{(P−1)k}]^T,

for k = 0, 1, ..., P−1, where P is the dimension of K_B and K_C, ω = exp(2πi/P) and i is the imaginary unit. An important property of the eigenvectors is as follows: let V_B be the eigenvector matrix of K_B, i.e., V_B = [v_B0, v_B1, ..., v_B(P−1)], and let x be a P × 1 column vector; then V_B x is identical to the discrete cosine transform of x [23, Ch. 1]. Similarly, let V_C denote the eigenvector matrix of K_C; then V_C x can be computed as the discrete Fourier transform of x [23, Ch. 1].

III. MODELING MULTIPLE COVARIATES

In this section, we present a novel extreme-value statistical model that incorporates multiple covariates.
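The closed-form eigen-structure of the chain and circle Laplacians K_B and K_C described in Section II can be checked numerically. The following sketch is our own illustration (assuming NumPy; not part of the paper): it builds both Laplacians for P = 8 and verifies each eigenpair.

```python
import numpy as np

P = 8
# Chain (path) graph Laplacian K_B: end nodes have one neighbor, inner nodes two.
KB = 2 * np.eye(P) - np.eye(P, k=1) - np.eye(P, k=-1)
KB[0, 0] = KB[-1, -1] = 1
# Circle graph Laplacian K_C: every node has exactly two neighbors.
KC = 2 * np.eye(P) - np.eye(P, k=1) - np.eye(P, k=-1)
KC[0, -1] = KC[-1, 0] = -1

idx = np.arange(P)
for k in range(P):
    lam_B = 2 - 2 * np.cos(k * np.pi / P)
    v_B = np.cos((2 * idx + 1) * k * np.pi / (2 * P))  # DCT-II basis vector
    assert np.allclose(KB @ v_B, lam_B * v_B)
    lam_C = 2 - 2 * np.cos(2 * k * np.pi / P)
    v_C = np.exp(2j * np.pi * k * idx / P)             # DFT basis vector
    assert np.allclose(KC @ v_C, lam_C * v_C)
```

The DCT/DFT connection is what matters computationally: multiplications by the eigenvector matrices never have to be carried out explicitly, since fast transforms produce the same result.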
We assume that the block maxima (e.g., monthly or annual maxima) associated with every possible set of values of the covariates follow the Generalized Extreme Value (GEV) distribution, in accordance with the Fisher-Tippett-Gnedenko theorem of extreme value theory [2]. The GEV parameters are assumed to vary smoothly with the covariates. The latter are discretized within a finite range. Therefore, the GEV parameters can be indexed as z_{i_1 i_2 ... i_m}, where m is the number of covariates, i_1 corresponds to a discretized value of the first covariate, and likewise for the other indices i_2, ..., i_m. In summary, the GEV parameters are defined on a finite number of points (i_1, i_2, ..., i_m), and those parameters are supposed to vary smoothly from one point to a nearby point. The resulting dependence structure can be well represented by a multidimensional thin-membrane model. Furthermore, we show that the multidimensional model can be constructed from one-dimensional thin-membrane models for each covariate, thereby making the proposed model generalizable to accommodate as many covariates as required. We employ the multidimensional thin-membrane model as the prior and estimate the GEV parameters through an empirical Bayes approach. As an illustration, we present in detail the spatio-directional model to quantify the extreme wave heights. Suppose that we have N samples x_{ijk}^{(n)} (block maxima) at each location, indexed by its longitude and latitude (i, j), and directional sector k, where n = 1, ..., N, i = 1, ..., P, j = 1, ..., Q,

Fig. 1: Thin-membrane models: (a) chain graph; (b) circle graph; (c) lattice; (d) spatio-directional model.

k = 1, ..., D, and P, Q, D are the number of longitude indices, latitude indices and directional sectors respectively. Consequently, the dimension M of the proposed model is given by M = PQD. Hence, our objective is to accurately infer the GEV parameters while accounting for both spatial and directional dependence. Specifically, we first locally fit the GEV distribution to the block maxima at each location and in each directional sector. The local estimates are then smoothed across the spatio-directional domain by means of 3D thin-membrane models.

A. Local estimates of GEV parameters

We assume that the block maxima x_{ijk} follow a Generalized Extreme Value (GEV) distribution [2], whose cumulative distribution function (CDF) equals:

F(x_{ijk}) = exp{−[1 + γ_{ijk}(x_{ijk} − µ_{ijk})/σ_{ijk}]^{−1/γ_{ijk}}}, γ_{ijk} ≠ 0,
F(x_{ijk}) = exp{−exp[−(x_{ijk} − µ_{ijk})/σ_{ijk}]}, γ_{ijk} = 0,

for 1 + γ_{ijk}(x_{ijk} − µ_{ijk})/σ_{ijk} > 0 if γ_{ijk} ≠ 0 and x_{ijk} ∈ R if γ_{ijk} = 0, where µ_{ijk} ∈ R is the location parameter, σ_{ijk} > 0 is the scale parameter and γ_{ijk} ∈ R is the shape parameter. The Probability-Weighted Moment (PWM) method [24] is employed here to obtain estimates of the GEV parameters µ̂_{ijk}, σ̂_{ijk} and γ̂_{ijk} locally at each location (i, j) and each direction k. The goal of the PWM method is to match the moments E[x_{ijk}^t (F(x_{ijk}))^r (1 − F(x_{ijk}))^s] with the empirical ones, where t, r and s are real numbers. For the GEV distribution, E[x_{ijk} (F(x_{ijk}))^r] (with t = 1 and s = 0) can be written as:

b_r = (1/(r+1)) {µ_{ijk} − (σ_{ijk}/γ_{ijk})[1 − (r+1)^{γ_{ijk}} Γ(1 − γ_{ijk})]}, (7)

where γ_{ijk} < 1 and γ_{ijk} ≠ 0, and Γ(·) is the gamma function. The resulting PWM estimates µ̂_{ijk}, σ̂_{ijk} and γ̂_{ijk} are the solution of the following system of equations:

b_0 = µ_{ijk} − (σ_{ijk}/γ_{ijk})(1 − Γ(1 − γ_{ijk})),
2b_1 − b_0 = (σ_{ijk}/γ_{ijk}) Γ(1 − γ_{ijk})(2^{γ_{ijk}} − 1), (8)
(3b_2 − b_0)/(2b_1 − b_0) = (3^{γ_{ijk}} − 1)/(2^{γ_{ijk}} − 1).

In practice, since solving the last equation in (8) is time-consuming, its solution can be approximated by [24]:

γ̂_{ijk}^{PWM} = −(7.8590c + 2.9554c^2), (9)

where c = (2b_1 − b_0)/(3b_2 − b_0) − log 2/log 3. The PWM method generates good estimates even when the sample size is small, as demonstrated in [24].
Therefore, the method is suitable for extreme-events modeling, especially in our case where the number of samples with the same values of the covariates is limited. The only restriction is that the PWM estimates are inaccurate and may not even exist when γ_{ijk} > 0.5. To address this concern, we utilize the two-stage procedure proposed by Castillo et al. [25], referred to as the median (MED) method, when the PWM estimate γ̂_{ijk} > 0.5 or is not a number (NaN). In the first stage of the MED method, the extreme-value samples are ordered as x_{ijk}^{(1)} ≤ x_{ijk}^{(2)} ≤ ... ≤ x_{ijk}^{(N)}. A set of GEV estimates is obtained by equating the GEV CDF F(x_{ijk}^{(n)}) to its corresponding empirical CDF P(x_{ijk}^{(n)}) = (n − 0.35)/N. More explicitly, for each n such that 2 ≤ n ≤ N − 1 in turn, we have:

x_{ijk}^{(1)} = (σ_{ijk}^{(n)}/γ_{ijk}^{(n)}) {[−log P(x_{ijk}^{(1)})]^{−γ_{ijk}^{(n)}} − 1} + µ_{ijk}^{(n)},
x_{ijk}^{(n)} = (σ_{ijk}^{(n)}/γ_{ijk}^{(n)}) {[−log P(x_{ijk}^{(n)})]^{−γ_{ijk}^{(n)}} − 1} + µ_{ijk}^{(n)}, (10)
x_{ijk}^{(N)} = (σ_{ijk}^{(n)}/γ_{ijk}^{(n)}) {[−log P(x_{ijk}^{(N)})]^{−γ_{ijk}^{(n)}} − 1} + µ_{ijk}^{(n)}.

Note that the equations associated with the first and last samples, i.e., x_{ijk}^{(1)} and x_{ijk}^{(N)}, are used multiple times. By solving the three equations in (10) w.r.t. the GEV parameters independently for each n (2 ≤ n ≤ N − 1) in turn, we obtain a set (µ_{ijk}^{(n)}, σ_{ijk}^{(n)}, γ_{ijk}^{(n)}) for each of the N − 2 values x_{ijk}^{(n)}. In the second stage, we take the median of these sets of GEV parameters as the final MED estimates (µ̂_{ijk}, σ̂_{ijk}, γ̂_{ijk}). As shown in [25], although the MED method is more computationally demanding than the PWM method, it yields reliable estimates when the shape parameter γ_{ijk} > 0.5.

B. Prior distribution

We assume that each of the three parameter vectors µ = [µ_{ijk}], γ = [γ_{ijk}] and σ = [σ_{ijk}] has a 3D thin-membrane model (see Fig. 1d) as its prior. Since the thin-membrane models of µ, γ, and σ share the same structure and inference methods, we present the three models in a unified form. Let z denote the true GEV parameters, that is, z is either µ, σ, or γ. We next illustrate how to construct the 3D thin-membrane model priors from the fundamental Markov chains and circular graphs. As a first step, we build a regular lattice (see Fig.
1c) from Markov chains. Since both the longitude and the latitude of the measurements are nonperiodic, either of them can be characterized by the chain graph shown in Fig. 1a. Let K_Bx and K_By denote the Laplacian matrices corresponding to the graph of one row (i.e., the longitude) and one column (i.e., the latitude) of sites respectively. We further assume that the smoothness across the longitude and the latitude is the same; thus, they can share one common smoothness parameter α. The resulting precision matrix of the regular lattice is given

by:

K_p = (αK_Bx) ⊕ (αK_By) = α(K_Bx ⊕ K_By) = αK_L,

where ⊕ and ⊗ represent the Kronecker sum and Kronecker product respectively, and I denotes an identity matrix of appropriate dimension. It is easy to show that K_L = K_Bx ⊕ K_By is the graph Laplacian matrix of the lattice. According to the properties of the Kronecker sum, the eigenvalue matrix is Λ_L = Λ_Bx ⊕ Λ_By and the eigenvector matrix is V_L = V_Bx ⊗ V_By. In the second step, we accommodate the effect of wind direction. We discretize the wind direction into D sectors and assume that the true GEV parameters are constant within each sector. As the GEV parameters of neighboring sectors are similar, the directional dependence can be encoded in a circular graph (see Fig. 1b), whose Laplacian matrix is K_C. Another smoothness parameter β is introduced to dictate the directional dependence, since the smoothness across wind direction and across space can differ. We can then seamlessly combine the lattice and the circle, leading to the 3D thin-membrane model (see Fig. 1d) with precision matrix:

K_prior = (αK_L) ⊕ (βK_C) = αK_s + βK_d. (11)

Interestingly, K_s = K_L ⊗ I_C and K_d = I_L ⊗ K_C correspond to the lattices and the circular graphs in the overall graph respectively, indicating that the former only characterizes the spatial dependence while the latter only characterizes the directional dependence. Based on the properties of the Kronecker sum, the eigenvalue matrix of K_prior equals:

Λ_prior = (αΛ_L) ⊕ (βΛ_C) = αΛ_s + βΛ_d, (12)

where Λ_s = Λ_L ⊗ I_C and Λ_d = I_L ⊗ Λ_C. The eigenvector matrix equals:

V_prior = V_L ⊗ V_C = V_Bx ⊗ V_By ⊗ V_C. (13)

By substituting (11) into (4), the density function of the 3D thin-membrane model becomes:

P(z) ∝ |K_prior|_+^{1/2} exp{−(1/2) z^T K_prior z} (14)
= |αK_s + βK_d|_+^{1/2} exp{−(1/2) z^T (αK_s + βK_d) z}. (15)

The specific structure of the model is extendable to any number of covariates.
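The Kronecker construction (11)-(13) is easy to sanity-check numerically: the eigenvalues of K_prior are exactly the pairwise sums of the (scaled) component eigenvalues. A small sketch (our own illustration, assuming NumPy; dimensions and smoothness parameters chosen arbitrarily):

```python
import numpy as np

def chain_laplacian(P):
    # Path-graph Laplacian: end nodes have one neighbor, inner nodes two.
    K = 2 * np.eye(P) - np.eye(P, k=1) - np.eye(P, k=-1)
    K[0, 0] = K[-1, -1] = 1
    return K

def circle_laplacian(D):
    # Circulant Laplacian: every node has exactly two neighbors.
    K = 2 * np.eye(D) - np.eye(D, k=1) - np.eye(D, k=-1)
    K[0, -1] = K[-1, 0] = -1
    return K

P, Q, D = 4, 3, 5          # longitude, latitude, direction (arbitrary)
alpha, beta = 1.5, 0.7     # smoothness parameters (arbitrary)
KBx, KBy, KC = chain_laplacian(P), chain_laplacian(Q), circle_laplacian(D)

# Lattice Laplacian as a Kronecker sum, then the spatio-directional precision (11)
KL = np.kron(KBx, np.eye(Q)) + np.kron(np.eye(P), KBy)
Ks = np.kron(KL, np.eye(D))        # K_s = K_L (x) I_C
Kd = np.kron(np.eye(P * Q), KC)    # K_d = I_L (x) K_C
Kprior = alpha * Ks + beta * Kd

# Kronecker-sum eigenvalues: all pairwise sums of component eigenvalues, cf. (12)
lam_L = np.add.outer(np.linalg.eigvalsh(KBx), np.linalg.eigvalsh(KBy)).ravel()
lam_prior = np.add.outer(alpha * lam_L, beta * np.linalg.eigvalsh(KC)).ravel()
assert np.allclose(np.sort(lam_prior), np.linalg.eigvalsh(Kprior))
```

Because the eigenvalues of the full M × M precision matrix follow directly from those of the small one-dimensional Laplacians, K_prior itself never has to be eigendecomposed.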
Specifically, the precision matrix can be generalized as:

K_prior = (αK_a) ⊕ (βK_b) ⊕ (γK_c) ⊕ ... (16)
= αK_a ⊗ I_b ⊗ I_c ⊗ ... + βI_a ⊗ K_b ⊗ I_c ⊗ ... + γI_a ⊗ I_b ⊗ K_c ⊗ ... + ..., (17)

where K_i ∈ {K_B, K_C} for i ∈ {a, b, c, ...} is the graph Laplacian matrix associated with the dependence structure of covariate i, and where α, β and γ are the corresponding smoothness parameters. The eigenvalue and eigenvector matrices of K_prior can be computed as:

Λ_prior = (αΛ_a) ⊕ (βΛ_b) ⊕ (γΛ_c) ⊕ ... (18)
= αΛ_a ⊗ I_b ⊗ I_c ⊗ ... + βI_a ⊗ Λ_b ⊗ I_c ⊗ ... + γI_a ⊗ I_b ⊗ Λ_c ⊗ ... + ... (19)
= αΛ̃_a + βΛ̃_b + γΛ̃_c + ..., (20)
V_prior = V_a ⊗ V_b ⊗ V_c ⊗ ..., (21)

where Λ_i ∈ {Λ_B, Λ_C} and V_i ∈ {V_B, V_C} are the eigenvalue and eigenvector matrices of K_i, and Λ̃_a = Λ_a ⊗ I_b ⊗ I_c ⊗ ... (and similarly for Λ̃_b, Λ̃_c, ...). The dependence structure of nonperiodic and periodic covariates can usually be described by chain and circle graphs respectively; thus it is easy to calculate Λ_i and V_i as discussed in Section II.

C. Posterior distribution

Let y denote the local estimates of z, where y is either µ̂, σ̂, or γ̂, and z denotes the true GEV parameters. We further assume that local estimates for some locations (i, j) and directions k are missing, typically for the two reasons listed below:

1) There are insufficient observations of extremes in some directional sectors or at some sites. Note that the PWM method needs at least three samples to solve (8). In practice, for data collected by satellites, there are always missing parts in satellite images due to the limited satellite path or the presence of clouds. For data collected by sensors, the sensors may fail during the extreme events, resulting in missing extreme-value samples. Moreover, the rate of occurrence of events with respect to direction in particular is nonuniform. For example, for locations sheltered by land, the number of extreme events emanating from the direction of the land will in general be smaller than from other directions.

2) There exist unmonitored sites where no observations are available.
For instance, measuring stations are often irregularly distributed across space, whereas the proposed method is more applicable to the case of a regular lattice (see Fig. 1c). As a result, we can introduce unmonitored sites such that all the sites, including both observed and unobserved ones, are located on a regular lattice. In addition, in the case of wave-height analysis, one may have particular interest in some unmonitored locations, for example because it is easy and convenient to build offshore facilities there. We therefore only have measurements (i.e., the local estimates y) at a subset of the variables in the 3D graphical model. Due to the limited number of samples available at each site, we model the local estimates as y = Cz + b, where b ~ N(0, R_z) is a zero-mean Gaussian random vector (Gaussian white noise) with diagonal covariance matrix R_z, and C is the selection matrix that selects the entries of z at which noisy observations y are available. C has a single non-zero value (equal to 1) in each row. If there were adequate observations available at all locations and directions, C would simply be an identity matrix. Note that the local estimates given by the PWM method are

asymptotically Gaussian distributed [24], thus motivating us to employ the Gaussian approximation. As a result of this approximation, the conditional distribution of the observed value y given the true value z is Gaussian:

P(y | z) ∝ exp{−(1/2) (y − Cz)^T R_z^{-1} (y − Cz)}. (22)

Since we assume that the prior distribution of z is the 3D thin-membrane model (15), the posterior distribution is given by:

P(z | y) ∝ P(z) P(y | z) (23)
∝ |K_prior|_+^{1/2} exp{−(1/2) z^T (K_prior + C^T R_z^{-1} C) z + z^T C^T R_z^{-1} y} (24)
∝ |K_post|^{1/2} exp{−(1/2) z^T K_post z + z^T C^T R_z^{-1} y}, (25)

where K_post = K_prior + C^T R_z^{-1} C is the precision matrix of the posterior distribution.

IV. LEARNING AND INFERENCE

We describe here the proposed learning and inference algorithm. Concretely, we discuss how the smoothed GEV parameters z, the noise covariance matrix R_z, and the smoothness parameters α and β are computed for each of the three parameter vectors µ, γ, and σ.

A. Inferring smoothed GEV parameters

Given the covariance matrix R_z and the smoothness parameters α and β, the maximum a posteriori (MAP) estimate of z is given by:

ẑ = argmax_z P(z | y) = K_post^{-1} C^T R_z^{-1} y. (26)

The dimension M of K_post is equal to M = PQD. In most practical scenarios, it is intractable to compute the inverse of K_post due to the O(M^3) complexity. In the following, we explain how we can significantly reduce the computational complexity by exploiting the special structure of the eigenvectors of K_prior. When the diagonal matrix C^T R_z^{-1} C can be well approximated by a scaled identity matrix cI, we have:

K_post ≈ K_prior + cI (27)
= V_prior^T (αΛ_s + βΛ_d + cI) V_prior. (28)

As a result, ẑ can be computed as:

ẑ = V_prior^T (αΛ_s + βΛ_d + cI)^{-1} V_prior C^T R_z^{-1} y. (29)

We propose a fast thin-membrane model (FTM) solver to evaluate (29) in three steps. 1) Let z_0 = C^T R_z^{-1} y; the first step computes z_1 = V_prior z_0. Note that V_prior = V_Bx ⊗ V_By ⊗ V_C (21); thus, z_1 = V_prior z_0 is a three-dimensional integration transform as defined in [28].
More explicitly, we can first reshape the vector z_0 into a P × Q × D array Z_0 such that z_0 = vec(Z_0), where vec(·) denotes the vectorization operation. In the spatio-directional model, the three dimensions represent longitude, latitude and direction respectively. Next, we perform the fast cosine transform (FCT) in the first and second dimensions (corresponding to V_Bx and V_By) and the fast Fourier transform (FFT) in the third one (corresponding to V_C). The detailed derivation is given in Appendix A. The resulting computational complexity is O(M log(M)). Moreover, the computation is amenable to parallelization. 2) In the second step, αΛ_s + βΛ_d + cI is a diagonal matrix, so the complexity of the operation z_2 = (αΛ_s + βΛ_d + cI)^{-1} z_1 is linear in M. 3) The operation V_prior^T z_2 in the final step amounts to performing the inverse FCT and FFT in the proper dimensions of the P × Q × D array reshaped from z_2, and the computational effort is the same as in the first step. In summary, the computational complexity of evaluating ẑ is O(M log(M)). A similar algorithm can easily be designed for the general case of multiple covariates. Since the generalized V_prior = V_a ⊗ V_b ⊗ V_c ⊗ ... for V_i ∈ {V_B, V_C} (i ∈ {a, b, c, ...}) (21), we can perform the FCT in the i-th dimension if V_i belongs to the V_B family, and the FFT otherwise. When cI is not a good approximation to C^T R_z^{-1} C, we decompose K_post into two parts K_1 and K_2 as follows:

K_1 = K_prior + cI, K_2 = K_post − K_1,

where c is chosen as the largest entry in C^T R_z^{-1} C. The Richardson iteration [27] is then used to solve ẑ^{(κ+1)} = K_1^{-1} (C^T R_z^{-1} y − K_2 ẑ^{(κ)}) until convergence. In each iteration, computing the inverse of K_1 can be circumvented by using the FTM solver mentioned above. The following two theorems guarantee the convergence of the proposed method.

Theorem 1.
Given K_post = K_prior + C^T R_z^{-1} C, K_1 = K_prior + cI and K_2 = K_post − K_1, where c equals the largest diagonal element of C^T R_z^{-1} C, then 1) K_post is strictly positive definite, and 2) the spectral radius ρ(K_1^{-1} K_2) < 1, so the resulting Richardson iterations ẑ^{(κ+1)} = K_1^{-1} (C^T R_z^{-1} y − K_2 ẑ^{(κ)}) are guaranteed to converge to K_post^{-1} C^T R_z^{-1} y.

Proof. See Appendix B.

Theorem 2. The convergence rate of the proposed algorithm ρ(K_1^{-1} K_2) is bounded below and above by:

ρ(K_1^{-1} K_2) ≥ [max(C^T R_z^{-1} C) − min(C^T R_z^{-1} C)] / [max(Λ_prior) + max(C^T R_z^{-1} C)], (30)
ρ(K_1^{-1} K_2) ≤ [max(C^T R_z^{-1} C) − min(C^T R_z^{-1} C)] / [min(Λ_prior) + max(C^T R_z^{-1} C)], (31)

where max(K) and min(K) denote the maximum and minimum diagonal elements of the matrix K respectively.

Proof. See Appendix C.

According to Theorem 2, decreasing the difference between the largest and smallest diagonal entries of the matrix C^T R_z^{-1} C reduces the upper bound on ρ(K_1^{-1} K_2), resulting in faster convergence. In the limit case where min(C^T R_z^{-1} C) =

7 7 maxc T R 1 C), that is, C T R 1 C = ci, the Richardson iteration converges in the step as in 29). Note that the MAP estimate of is dependent on the covariance matrix R and the smoothness parameters, the calculation of which is described in the sequel. B. Estimating Covariance Matrices R We use the parametric bootstrap approach to infer the diagonal) noise covariance matrices R µ, R γ and R σ ; this method is suitable when the number of available samples is small [26]. Concretely, we proceed as follows: 1) We generate the local GEV estimates ˆµ, ˆσ and ˆγ using the method discussed in Section III-A. 2) We draw M = 3 sample sets S 1,, S M, each with N GEV distributed samples based on the local estimates of GEV parameters ˆµ, ˆσ, ˆγ ). 3) For each S m, where m = 1,, M, we estimate the GEV parameters locally again using the method in Section III-A. The resulting GEV estimates are denoted as ˆµ [m], ˆσ[m], ˆγ[m] ). 4) The variance of ˆµ [m] m = 1,, M) at site i, j) and wind direction k is our estimate of the corresponding diagonal element in R µ. Similarly, we can obtain estimates of the diagonal covariance matrices R γ and R σ. C. Learning Smoothness Parameters Since α and β are usually unknown, we need to infer them based on the local estimates y. However, since is unknown, directly inferring α and β is impossible, and instead we solve 2) by Expectation Maximiation EM): ˆα κ), ˆβ κ) ) = argmax Qα, β ; ˆα κ 1) κ 1), ˆβ ). 32) In Appendix D, we derive the Q-function: Qα, β ; ˆα κ 1), tr K prior where c 1 = tr K s c 2 = tr K d κ 1) ˆβ ) K κ 1) post ) 1 ) κ)) T Kprior κ) + log K prior + + c, 33) = c 1 α c 2 β + log K prior + + c, 34) K κ 1) post K κ 1) post ) 1 ) ) 1 ) + κ)) T Ks κ), 3) + κ)) T Kd κ), 36) ˆβ κ 1) κ) is computed as in 26) with ˆα κ 1) and obtained from the previous iteration, and c stands for all the unrelated terms. In 33), we need to evaluate log K prior +. 
Since $K_{\mathrm{prior}}$ can be regarded as a generalized Laplacian matrix corresponding to the connected graph of the 3D thin-membrane model, according to the properties of Laplacian matrices, $|K_{\mathrm{prior}}|_+ = M \det(S(K_{\mathrm{prior}}))$, where $S(K_{\mathrm{prior}})$ denotes the first $M-1$ rows and columns of the $M \times M$ matrix $K_{\mathrm{prior}}$, and $S(K_{\mathrm{prior}})$ is positive definite [29]. As a result,

$$\log|K_{\mathrm{prior}}|_+ = \log\det(S(K_{\mathrm{prior}})) + \log M \quad (37)$$
$$= \log\det(\alpha S(K_s) + \beta S(K_d)) + c. \quad (38)$$

Taking the partial derivatives of the Q-function with respect to $\alpha$ and $\beta$, we obtain:

$$\frac{\partial Q}{\partial \alpha} = -c_1 + \mathrm{tr}\big(S(K_{\mathrm{prior}})^{-1} S(K_s)\big), \quad (39)$$
$$\frac{\partial Q}{\partial \beta} = -c_2 + \mathrm{tr}\big(S(K_{\mathrm{prior}})^{-1} S(K_d)\big), \quad (40)$$

where Jacobi's formula is applied, i.e.,

$$\frac{\partial \log\det K}{\partial x} = \mathrm{tr}\Big(K^{-1} \frac{\partial K}{\partial x}\Big). \quad (41)$$

By equating the partial derivatives (39) and (40) to zero, we have:

$$c_1 = \mathrm{tr}\big(S(K_{\mathrm{prior}})^{-1} S(K_s)\big), \quad (42)$$
$$c_2 = \mathrm{tr}\big(S(K_{\mathrm{prior}})^{-1} S(K_d)\big). \quad (43)$$

As a consequence, we obtain:

$$\alpha c_1 + \beta c_2 = \mathrm{tr}\{S(K_{\mathrm{prior}})^{-1} (\alpha S(K_s) + \beta S(K_d))\}. \quad (44)$$

Note that $\alpha S(K_s) + \beta S(K_d) = S(K_{\mathrm{prior}})$, and therefore,

$$\alpha c_1 + \beta c_2 = M - 1. \quad (45)$$

By substituting (45) into the Q-function (33), it follows that:

$$Q(\alpha, \beta; \hat\alpha^{(\kappa-1)}, \hat\beta^{(\kappa-1)}) = \log|K_{\mathrm{prior}}|_+ - (M - 1) + c. \quad (46)$$

At this point, the expression (32) can be succinctly formulated as:

$$(\alpha^{(\kappa)}, \beta^{(\kappa)}) = \operatorname*{argmax}_{\alpha, \beta} \log|K_{\mathrm{prior}}|_+, \quad \text{s.t. } c_1 \alpha + c_2 \beta = M - 1, \; \alpha, \beta \ge 0. \quad (47)$$

Since the eigenvalue matrix $\Lambda_{\mathrm{prior}}$ of $K_{\mathrm{prior}}$ equals $\Lambda_{\mathrm{prior}} = \alpha \Lambda_s + \beta \Lambda_d$, we can further simplify the objective function in (47):

$$\log|K_{\mathrm{prior}}|_+ = \log|\Lambda_{\mathrm{prior}}|_+ = \sum_{k:\, \lambda_{s,k} + \lambda_{d,k} > 0} \log(\alpha \lambda_{s,k} + \beta \lambda_{d,k}). \quad (48)$$

Consequently, the constrained convex optimization problem (47) can be solved efficiently via the bisection method [3]. In the scenario of multiple covariates, (47) can be extended as:

$$(\alpha^{(\kappa)}, \beta^{(\kappa)}, \gamma^{(\kappa)}, \ldots) = \operatorname*{argmax} \sum_k \log(\alpha \lambda_{a,k} + \beta \lambda_{b,k} + \gamma \lambda_{c,k} + \cdots), \quad (49)$$
$$\text{s.t. } c_1 \alpha + c_2 \beta + c_3 \gamma + \cdots = M - 1, \; \alpha, \beta, \gamma, \ldots \ge 0,$$

where $\lambda_{i,k}$ ($i \in \{a, b, c, \ldots\}$) is the $k$-th diagonal element of $\Lambda_i$. The overall learning and inference algorithm for the generalized model is summarized in Table I.
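Once $\beta$ is eliminated through the equality constraint, the problem (47) is a one-dimensional concave maximization, so any bracketed scalar search works. A minimal sketch (using a bounded scalar minimizer rather than the paper's bisection on the optimality conditions; the spectra and constants below are synthetic stand-ins, not the paper's values):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# One M-step update of (47): maximize sum_k log(alpha*lam_s + beta*lam_d)
# subject to c1*alpha + c2*beta = M - 1, by eliminating beta.
rng = np.random.default_rng(2)
M = 40
lam_s = np.sort(rng.uniform(0, 2, size=M)); lam_s[0] = 0.0   # Laplacian spectra:
lam_d = np.sort(rng.uniform(0, 2, size=M)); lam_d[0] = 0.0   # one zero eigenvalue
c1, c2 = 3.0, 5.0

def neg_obj(alpha):
    beta = (M - 1 - c1 * alpha) / c2           # enforce the equality constraint
    lam = alpha * lam_s + beta * lam_d
    lam = lam[lam > 1e-12]                     # sum over k with lam_sk+lam_dk > 0
    return -np.log(lam).sum()                  # negate: we minimize

res = minimize_scalar(neg_obj, bounds=(1e-9, (M - 1) / c1 - 1e-9),
                      method='bounded')
alpha = res.x
beta = (M - 1 - c1 * alpha) / c2
print(np.isclose(c1 * alpha + c2 * beta, M - 1))   # constraint holds
```

The multi-covariate version (49) is handled the same way, with one scalar search per extra smoothness parameter (e.g., by coordinate ascent).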

TABLE I: The learning and inference algorithm for modeling multiple covariates.

1) Estimate the GEV parameters locally using the combination of the PWM and MED estimators.
2) Approximate the relation between the locally fitted and true GEV parameters by a Gaussian model, that is, $y = Cz + b$. Estimate the diagonal noise covariance $R$ using the parametric bootstrap approach.
3) Initialize the smoothness parameters $\hat\alpha^{(0)}$ and $\hat\beta^{(0)}$. Iterate the following steps till convergence:
a) E-step: update the MAP estimates of the GEV parameters using the methods described in Section IV-A: $\hat z^{(\kappa)} = (K_{\mathrm{post}}^{(\kappa-1)})^{-1} C^T R^{-1} y$, where $K_{\mathrm{post}}^{(\kappa-1)} = K_{\mathrm{prior}}^{(\kappa-1)} + C^T R^{-1} C = \hat\alpha^{(\kappa-1)} K_a + \hat\beta^{(\kappa-1)} K_b + \hat\gamma^{(\kappa-1)} K_c + \cdots + C^T R^{-1} C$. The last expression follows from (16).
b) M-step: update the estimates of the smoothness parameters by solving (49).

D. Bootstrapping the Uncertainty of the GEV Estimates

In addition to the point estimates of the GEV parameters, we are also particularly interested in the uncertainty of the estimates. As demonstrated in [3], nonparametric bootstrapping provides a reliable tool to quantify the uncertainty associated with extreme-value models. The bootstrap procedure consists of the following steps: 1) Generate $M$ sample sets $S_1, \ldots, S_M$, each with $N$ occurrences, by resampling at random with replacement from the original $N$ observations. 2) For each $S_m$, where $m = 1, \ldots, M$, apply the algorithm in Table I to yield smoothed GEV parameters $z^{(m)}$, where $z$ denotes either $\gamma$, $\sigma$, or $\mu$. 3) The 95% confidence interval for the GEV estimates is computed as the values corresponding to the 2.5% and 97.5% quantiles of $z^{(m)}$, $m = 1, \ldots, M$.

V. NUMERICAL RESULTS

In this section, we test the proposed spatio-directional model on both synthetic and real data against the locally fit model, a spatial model (only considering the covariate of location), and a directional model (only considering the covariate of direction).

A. Synthetic Data

Here we draw samples from GEV distributions with parameters that depend on location and direction.
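The percentile-bootstrap recipe of Section IV-D above can be sketched for a single scalar statistic. A plain mean stands in for the smoothed GEV parameter of step 2 (which would require running the full Table I algorithm on each resample); the sample sizes are illustrative.

```python
import numpy as np

# Nonparametric percentile bootstrap: resample with replacement, recompute
# the statistic, read off the 2.5% / 97.5% quantiles for a 95% interval.
rng = np.random.default_rng(3)
obs = rng.gumbel(loc=5.0, scale=1.5, size=200)   # N original occurrences

M = 1000
stats = np.empty(M)
for m in range(M):
    resample = rng.choice(obs, size=obs.size, replace=True)   # step 1
    stats[m] = resample.mean()                                 # step 2 (stand-in)

lo, hi = np.percentile(stats, [2.5, 97.5])                     # step 3: 95% CI
print(lo < obs.mean() < hi)
```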
Concretely, we select 26 sites arranged in a two-dimensional lattice. Extreme events occurring at each site are further allocated to one of 8 directional sectors. We then predefine GEV parameters for each site and each directional sector. In particular, the shape parameter $\gamma$ is chosen to be constant across space while varying smoothly with direction. In contrast, the scale parameter $\sigma$ is chosen to be a quadratic polynomial function of the location while remaining constant with direction. The location parameter $\mu$ changes smoothly with regard to both location and direction. More explicitly, we characterize the spatial dependence and the directional dependence by a quadratic polynomial and a Fourier series expansion respectively, as suggested in [8]. We then randomly generate GEV distributed occurrences for each site and direction. Our results are summarized in Figs. 2 to 4. Figs. 2 and 3 show the scale and location parameters estimated by the aforementioned four models across space, while Fig. 4 shows the estimated GEV parameters with 95% confidence intervals across different wind directions. We can see that the estimates resulting from the proposed spatio-directional model follow the ground truth closely, across both space and wind direction. On the contrary, the local estimates fluctuate substantially and exhibit the largest uncertainty among the four models, since the limited number of occurrences at each location and direction is insufficient to yield reliable estimates of the GEV parameters. On the other hand, the spatial model is able to infer the varying trend of the GEV parameters across space, but generates biased results; specifically, it underestimates the location parameters (see Fig. 2c) and overestimates the scale parameters (see Fig. 3c). Moreover, this model mistakenly ignores the directional variation of the GEV parameters, as shown in Fig. 4. Similarly, the directional model can roughly capture the overall trend across different directions (cf. Fig.
4) but fails to model the spatial variation (cf. Fig. 2d and Fig. 3d). Note that the confidence intervals of the latter two models are narrower than those of the spatio-directional model, due to the larger sample size (obtained by combining the data from different directions or sites) after ignoring the possible variation. Table II summarizes the overall mean square error (MSE) for each of the three GEV parameters and the Akaike Information Criterion (AIC) of the model fitting. The proposed model yields the smallest MSE and AIC score. The latter implies that the proposed model fits the data best, notwithstanding the penalty on the effective number of parameters. Note that the effective number of parameters (i.e., the degrees of freedom) in the AIC score can be computed as $\mathrm{tr}\{(C^T R^{-1} C + \alpha K_s + \beta K_d)^{-1} C^T R^{-1} C\}$ in the proposed spatio-directional model, according to the definition given in [31] and [32].

TABLE II: Quantitative comparison of different models (MSE for $\gamma$, $\sigma$ and $\mu$, and AIC, for the locally fit, spatial, directional and spatio-directional models).

Interestingly, the estimated smoothness parameters $\alpha_\gamma$ and $\beta_\sigma$ converge to infinity in the proposed spatio-directional model, exactly consistent with the ground truth that $\gamma$ and $\sigma$ are constant w.r.t. location and direction respectively. This indicates that the EM algorithm can learn the appropriate

degree of smoothness in an automatic manner.

Fig. 2: Estimates of the location parameter $\mu$ across all sites in the directional sector [3, 31): (a) ground truth; (b) locally fit model; (c) spatial model; (d) directional model; (e) spatio-directional model.

Fig. 3: Estimates of the scale parameter $\sigma$ across all sites in the directional sector [3, 31): (a) ground truth; (b) locally fit model; (c) spatial model; (d) directional model; (e) spatio-directional model.

Fig. 4: Estimates of the GEV parameters (solid lines) with 95% confidence intervals (dashed lines) across different directions at a randomly selected site: (a) shape parameter $\gamma$; (b) scale parameter $\sigma$; (c) location parameter $\mu$.

Next, we investigate the impact of the sample size on the four models. In this set of experiments, we consider the MSE of the GEV estimates for varying sample size. Concretely, we consider a range of sample sizes per location per directional bin. For each sample size, we generate data sets with the same parameterization as before. We then compute the MSE of the GEV estimates averaged over the data sets, as shown in Fig. 5. The proposed spatio-directional model usually performs best. The only exception is the shape parameter when the number of samples is small. In this case, the directional model produces the most accurate results, because the number of occurrences for each direction in the directional model is 26 (i.e., the number of sites) times larger than in the spatio-directional model. More importantly, the shape parameter only varies with direction in our simulation, thus making the directional model a proper choice. The proposed model yields reasonably accurate estimates even for small sample sizes, which demonstrates its utility in the case of small sample size.
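Returning to the AIC comparison of Table II, the effective number of parameters (the trace of the smoother matrix) can be computed directly. A toy 1D analogue, with stand-in Laplacians and a diagonal noise precision rather than the paper's actual matrices:

```python
import numpy as np

# Effective degrees of freedom tr{(C^T R^-1 C + a K_s + b K_d)^-1 C^T R^-1 C}
# on a toy problem; the value shrinks from n (local fit) toward the number of
# smooth components as the smoothness parameters grow.
n = 8
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0               # path-graph Laplacian (stand-in)
K_s, K_d = L, L.copy()
alpha, beta = 1.0, 2.0
prec = np.diag(np.full(n, 4.0))         # C^T R^-1 C (diagonal, all observed)

df = np.trace(np.linalg.solve(prec + alpha * K_s + beta * K_d, prec))
print(0 < df < n)
```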
Finally, we test the performance of the proposed model when dealing with unmonitored sites and wind directions, i.e., sites and directional sectors for which no observations are available. The selection matrix $C$ is not an identity matrix in this case. We depict in Fig. 6 the MSE for each GEV parameter, for the observed and unobserved variables respectively, as a function of the percentage of missing variables, averaged across trials. We can see that the MSE decreases as the number of unmonitored sites and directional sectors decreases, in agreement with our expectations. However, the MSE remains small even when only a small percentage of the variables are observed in the

3D graphical model. In conclusion, the proposed model can successfully tackle missing data.

Fig. 5: Mean square error (MSE) of the GEV estimates as a function of the sample size (averaged over the data sets): (a) shape parameter $\gamma$; (b) scale parameter $\sigma$; (c) location parameter $\mu$.

Fig. 6: MSE for a varying proportion of observed sites (averaged over trials). The MSE decreases with an increasing number of observed sites, as expected.

B. Real Data

We now investigate the extreme wave heights in the Gulf of Mexico. Such an analysis can be of benefit when constructing oil platforms in the gulf to withstand extreme waves. The GOMOS (Gulf of Mexico Oceanographic Study) data [33] cover the period from September 1900 to September 2005 inclusive, at thirty-minute intervals. The data comprise hindcasts of significant wave height, produced by a physical model calibrated to observed hurricane data. We isolate 315 maximum peak wave height values; each corresponds to a hurricane event in the Gulf of Mexico. We also extract the corresponding vector mean direction of the wind at the time of the peak significant wave height. We then select 176 sites arranged on an 8 × 22 lattice with spacing 0.5° (approximately 56 km), which almost covers the entire U.S. Gulf of Mexico. An initial analysis (shown in Fig. 7) reveals the heterogeneity of the data w.r.t. location and direction. Additionally, we can see from Fig.
7 that the distribution of the storm-wise maxima at each site and direction has a heavy tail, supporting the use of the GEV marginals.

Fig. 7: Heterogeneity in the GOMOS data: (a) distribution of extreme wave heights at randomly selected sites; (b) scatter plot of wave height w.r.t. direction. The wave heights clearly depend on the location in the Gulf of Mexico and the wind direction.

We further depict in Fig. 8 the scatter plot of extreme wave heights from two neighboring sites and two distant sites in the lattice. Strong spatial dependence exists between two nearby sites; however, the dependence is clearly weaker between sites that are far apart. This observation supports the choice of thin-membrane models, since the latter only capture the dependence between neighbors. In summary, the preliminary study suggests that the proposed model is well suited for this data set.

Fig. 8: Scatter plots of wave height at pairs of sites: (a) two nearby sites; (b) two distant sites.

Fig. 9: AIC score as a function of the selected number of directional sectors.

We next apply the four models to the data: the locally fit model, the spatial model, the directional model and the spatio-directional model. To test the influence of the selected number of directional sectors on the results, we consider different numbers of sectors, ranging from 6 to 18. We then compute the AIC score of the four models for each number of directional sectors. The results are summarized in Fig. 9. Again, the spatio-directional model always achieves the best AIC score. Moreover, the performance of this model is not sensitive to the chosen number of directional sectors. In practice, we propose to choose the number that minimizes the AIC score, which is 12 for the GOMOS data. The directional model also performs well, but it ignores the spatial variation, which is essential to model the extreme wave heights in the Gulf of Mexico [8] (see Fig. 7a and 8b). The spatial model, on the other hand, does not properly account for the strong directional variation visible in Fig. 7b. Finally, the locally fit model overfits the data by introducing too many parameters; this can be concluded from the fact that its AIC increases with the number of directional sectors. Clearly, the proposed spatio-directional model is preferred for this GOMOS data set.

VI. CONCLUSION AND DISCUSSION

In this paper, we proposed a novel extreme-value model that accommodates the effects of multiple covariates. More explicitly, we assume that the marginal extreme values follow a GEV distribution.
The GEV parameters are then coupled together through a multidimensional thin-membrane model. The advantages of the proposed model can be summarized as follows. Firstly, the multidimensional thin-membrane model can be constructed flexibly from one-dimensional thin-membrane models, rendering the proposed model generalizable to any number of covariates. Secondly, the component eigenvector structures provide efficient inference of the smoothed GEV parameters with computational complexity $O(M \log M)$. Thirdly, the eigenvalues of the overall multidimensional model can be computed easily, and help to simplify the determinant-maximization problem in the learning of the smoothness parameters. As a result, the proposed method scales gracefully with the problem size. Numerical results for both synthetic and real data support the proposed model; it achieves the most accurate estimates of the GEV parameters, and models the data best. Therefore, the approach may prove to be a practical tool for modeling extreme events with covariates. Despite its utility, the present model has several limitations. Firstly, where there are fewer than three samples per site per directional sector, we treat the bin as empty. A more appealing approach, however, would be to utilize these samples in the model. One solution to this problem is reported in [37]: the authors first transform the distribution of directions per location into a uniform distribution for modeling, so that all directional bins have the same number of occurrences, and then transform it back after inference. Additionally, from the perspective of graphical models, we could couple the GEV marginals with the Gaussian prior directly, resulting in a graphical model with Gaussian pairwise potentials but non-Gaussian unary potentials.
Secondly, although the thin-membrane model greatly aids flexible model construction as well as efficient learning and inference, it penalizes the gradient and favors a flat surface (i.e., constant values for the GEV parameters). Therefore, the study of other smoothness priors would be of interest. One good example is the thin-plate model, which penalizes curvature and resembles the thin-plate splines in the spline-smoothing literature. We expect that this model would give a more natural description of marginal extreme-event behaviors. Thirdly, we only consider the dependence between the GEV parameters in the proposed model, whereas there also exists dependence between the extreme values themselves (e.g., the extreme wave heights), especially when considering spatial effects in extreme-value modeling [18]. Finally, another interesting area for future application is the estimation of joint extremes, for instance, the joint distribution of the significant wave height and the associated spectral peak period [37].

APPENDIX A
DERIVATION OF THE FIRST STEP IN THE FTM SOLVER

Due to the mixed-product property of the Kronecker product, $V_{\mathrm{prior}}$ in (21) can be expressed alternatively as the matrix product:

$$V_{\mathrm{prior}} = (I_{B_x} \otimes I_{B_y} \otimes V_C)(I_{B_x} \otimes V_{B_y} \otimes I_C)(V_{B_x} \otimes I_{B_y} \otimes I_C). \quad (50)$$

Therefore, the calculation of $z_1 = V_{\mathrm{prior}} z$ can be further divided into three substeps.

Before elaborating on the three substeps, we first introduce another property of the Kronecker product: for two arbitrary matrices $J$ and $K$ that can be multiplied together,

$$\mathrm{vec}(JK) = (I \otimes J)\,\mathrm{vec}(K) \quad (51)$$
$$= (K^T \otimes I)\,\mathrm{vec}(J). \quad (52)$$

Now let us focus on the first substep, that is, $z_1^1 = (V_{B_x} \otimes I_{B_y} \otimes I_C) z$. Given the above-mentioned property, $z_1^1$ can be computed as:

$$z_1^1 = \mathrm{vec}(\tilde Z_1 V_{B_x}^T) = \mathrm{vec}\{(V_{B_x} \tilde Z_1^T)^T\}, \quad (53)$$

where $\tilde Z_1^T$ is a $P \times QD$ matrix such that $\mathrm{vec}(\tilde Z_1^T) = z$, and $P$, $Q$ and $D$ are the dimensions of $V_{B_x}$, $V_{B_y}$ and $V_C$ respectively. In addition, we define $Z$ to be a 3D $P \times Q \times D$ array such that $\mathrm{vec}(Z) = z$, as in Section IV-A. It is easy to see that $\tilde Z_1^T$ has the following form:

$$\tilde Z_1^T = \begin{bmatrix} [Z]_{111} & [Z]_{121} & \cdots & [Z]_{1Q1} & \cdots & [Z]_{1QD} \\ [Z]_{211} & [Z]_{221} & \cdots & [Z]_{2Q1} & \cdots & [Z]_{2QD} \\ \vdots & \vdots & & \vdots & & \vdots \\ [Z]_{P11} & [Z]_{P21} & \cdots & [Z]_{PQ1} & \cdots & [Z]_{PQD} \end{bmatrix}.$$

As mentioned in Section II, $V_{B_x} \tilde Z_1^T$ is equivalent to applying the corresponding discrete transform (the cosine transform for the $V_B$ family) to each column of $\tilde Z_1^T$, and therefore it coincides with performing the FCT along the first dimension of the 3D array $Z$. Similarly, in the second substep, we can first apply the property in expression (51), and subsequently the one in expression (52). As a result, we find that $z_1^2 = (I_{B_x} \otimes V_{B_y} \otimes I_C) z_1^1$ is identical to the FCT in the second dimension of the 3D $P \times Q \times D$ array $\tilde Z_1^1$, which satisfies $\mathrm{vec}(\tilde Z_1^1) = z_1^1$. The operation in the third substep, i.e., $z_1 = (I_{B_x} \otimes I_{B_y} \otimes V_C) z_1^2$, can be proven likewise. Note that the ordering of the three transforms can be arbitrary, because the three matrices $(I_{B_x} \otimes I_{B_y} \otimes V_C)$, $(I_{B_x} \otimes V_{B_y} \otimes I_C)$, and $(V_{B_x} \otimes I_{B_y} \otimes I_C)$ commute with each other. This algorithm resembles the popular row-column algorithm from the literature on multidimensional Fourier transforms, cf. [34].

APPENDIX B
PROOF OF THEOREM 1

Since $K_{\mathrm{prior}}$ is a Laplacian matrix, it is positive semi-definite. In addition, $C^T R^{-1} C$ is a diagonal matrix with positive diagonal elements corresponding to the observed variables. According to the properties of Laplacian matrices, the resulting sum $K_{\mathrm{post}}$ is strictly positive definite.
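The vec/Kronecker identities underlying the substeps of Appendix A can be checked numerically. The sketch below uses column-major (Fortran) ordering for vec, under which the fastest-varying index corresponds to the rightmost Kronecker factor; the paper's own ordering convention may differ, and all dimensions are illustrative.

```python
import numpy as np

# Check vec(JK) = (I (x) J) vec(K), i.e., identity (51), for small J, K.
rng = np.random.default_rng(4)
P, Q, D = 3, 4, 2
J = rng.normal(size=(P, P))
K = rng.normal(size=(P, Q))

lhs = (J @ K).flatten(order='F')                     # vec(JK), column-major
rhs = np.kron(np.eye(Q), J) @ K.flatten(order='F')   # (I (x) J) vec(K)
print(np.allclose(lhs, rhs))

# Check that a Kronecker factor acting on one index of z = vec(Z) is the
# same as applying the matrix along the corresponding axis of the 3D array.
Z = rng.normal(size=(P, Q, D))
z = Z.flatten(order='F')                             # first index fastest
V = rng.normal(size=(P, P))
full = np.kron(np.eye(D), np.kron(np.eye(Q), V)) @ z
axiswise = np.einsum('ip,pqd->iqd', V, Z).flatten(order='F')
print(np.allclose(full, axiswise))
```

In the solver, `V` would be the (orthonormal) transform matrix of one covariate dimension, applied by a fast transform instead of a dense multiplication.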
The second conclusion follows from the following theorem on the standard Richardson iteration, proved by Adams [3].

Theorem 3. Let $K_{\mathrm{post}} = K_1 + K_2$ be a symmetric positive definite matrix and let $K_1$ be symmetric and nonsingular. Then $\rho(K_1^{-1} K_2) < 1$ if and only if $K_1 - K_2$ is positive definite.

Since $K_1 = K_{\mathrm{prior}} + cI$ and $c > 0$, $K_1$ is symmetric and nonsingular. Note that

$$K_2 = K_{\mathrm{post}} - K_1 \quad (54)$$
$$= C^T R^{-1} C - cI, \quad (55)$$

where $c$ equals the largest diagonal element in $C^T R^{-1} C$. Therefore, $K_2$ is a diagonal matrix with diagonal elements smaller than or equal to 0. As a result, $K_1 - K_2$ is positive definite.

APPENDIX C
PROOF OF THEOREM 2

Computing the eigenvalues of $K_1^{-1} K_2$ amounts to solving the generalized eigenvalue problem:

$$\lambda K_1 x = K_2 x. \quad (56)$$

It follows from Theorem 2.2 in [36] that

$$\frac{\lambda_{\max}(K_2)}{\lambda_{\max}(K_1)} \le \rho(K_1^{-1} K_2) \le \frac{\lambda_{\max}(K_2)}{\lambda_{\min}(K_1)}, \quad (57)$$

where $\lambda_{\max}(K)$ and $\lambda_{\min}(K)$ denote the largest and smallest absolute eigenvalues of $K$. By substituting $K_1 = K_{\mathrm{prior}} + cI$ and $K_2 = C^T R^{-1} C - cI$ into (57), we obtain the bounds specified for the proposed algorithm.

APPENDIX D
DERIVATION OF THE Q-FUNCTION

We aim to learn the smoothness parameters $\alpha$ and $\beta$ by maximum likelihood estimation, i.e., by maximizing

$$L(\alpha, \beta) = \log p(y \mid \alpha, \beta) \quad (58)$$
$$= \log \int p(y, z \mid \alpha, \beta)\, dz. \quad (59)$$

Since maximizing $\log p(y \mid \alpha, \beta)$ directly is infeasible, we apply Expectation Maximization (EM) instead. In the E-step, we compute the Q-function, which is defined as:

$$Q(\alpha, \beta; \hat\alpha^{(\kappa-1)}, \hat\beta^{(\kappa-1)}) = \int p(z \mid y, \hat\alpha^{(\kappa-1)}, \hat\beta^{(\kappa-1)}) \log p(y, z \mid \alpha, \beta)\, dz$$
$$= \mathrm{E}_{z \mid y, \hat\alpha^{(\kappa-1)}, \hat\beta^{(\kappa-1)}} \{\log p(y, z \mid \alpha, \beta)\}$$
$$= \mathrm{E}_{z \mid y, \hat\alpha^{(\kappa-1)}, \hat\beta^{(\kappa-1)}} \Big\{ -\tfrac{1}{2} \mathrm{tr}(z z^T K_{\mathrm{prior}}) + \tfrac{1}{2} \log|K_{\mathrm{prior}}|_+ \Big\} + c.$$

Note that the trace and the expectation operator commute. Consequently,

$$Q(\alpha, \beta; \hat\alpha^{(\kappa-1)}, \hat\beta^{(\kappa-1)}) = -\tfrac{1}{2} \mathrm{tr}\Big( K_{\mathrm{prior}} \big[ (K_{\mathrm{post}}^{(\kappa-1)})^{-1} + \hat z^{(\kappa)} (\hat z^{(\kappa)})^T \big] \Big) + \tfrac{1}{2} \log|K_{\mathrm{prior}}|_+ + c$$
$$= -\tfrac{1}{2} \Big[ \mathrm{tr}\big(K_{\mathrm{prior}} (K_{\mathrm{post}}^{(\kappa-1)})^{-1}\big) + (\hat z^{(\kappa)})^T K_{\mathrm{prior}} \hat z^{(\kappa)} \Big] + \tfrac{1}{2} \log|K_{\mathrm{prior}}|_+ + c.$$

If we ignore the common coefficient $1/2$, we obtain the Q-function in (33).
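Theorems 1–3 and the fast Richardson scheme of Section IV-A can be exercised on a toy problem. A 1D path-graph Laplacian stands in for the thin-membrane precision $K_{\mathrm{prior}}$ (it is diagonalized by the DCT-II, which plays the role of the FCT/FFT diagonalization in the paper); all sizes and values are illustrative.

```python
import numpy as np
from scipy.fft import dct, idct

# Toy check of Theorems 1-3: K_prior is a path-graph Laplacian, C^T R^-1 C
# is diagonal, K1 = K_prior + cI, K2 = C^T R^-1 C - cI.
n = 16
K_prior = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
K_prior[0, 0] = K_prior[-1, -1] = 1.0
lam = 2.0 - 2.0*np.cos(np.pi*np.arange(n)/n)    # eigenvalues of K_prior

d = np.linspace(0.5, 3.0, n)                    # diagonal of C^T R^-1 C
c = d.max()
K1 = K_prior + c*np.eye(n)
K2 = np.diag(d) - c*np.eye(n)

# Theorem 2 bounds on the spectral radius (Theorem 1 guarantees rho < 1)
rho = np.abs(np.linalg.eigvals(np.linalg.solve(K1, K2))).max()
lower = (d.max() - d.min()) / (lam.max() + d.max())
upper = (d.max() - d.min()) / (lam.min() + d.max())
print(lower <= rho <= upper < 1)

# Richardson iteration with the O(n log n) diagonalized solve for K1:
# K1^-1 v = idct(dct(v)/(lam + c)) since the DCT-II diagonalizes K_prior.
b = d * np.ones(n)                              # C^T R^-1 y for y = 1
z = np.zeros(n)
for _ in range(300):
    v = b - K2 @ z
    z = idct(dct(v, norm='ortho')/(lam + c), norm='ortho')

print(np.allclose(z, np.linalg.solve(K_prior + np.diag(d), b)))
```

The iterate converges to the MAP estimate $K_{\mathrm{post}}^{-1} C^T R^{-1} y$, and the measured spectral radius falls inside the Theorem 2 bounds.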


### Bayesian time series classification

Bayesian time series classification Peter Sykacek Department of Engineering Science University of Oxford Oxford, OX 3PJ, UK psyk@robots.ox.ac.uk Stephen Roberts Department of Engineering Science University

### Self Adaptive Particle Filter

Self Adaptive Particle Filter Alvaro Soto Pontificia Universidad Catolica de Chile Department of Computer Science Vicuna Mackenna 4860 (143), Santiago 22, Chile asoto@ing.puc.cl Abstract The particle filter

### Feature selection and extraction Spectral domain quality estimation Alternatives

Feature selection and extraction Error estimation Maa-57.3210 Data Classification and Modelling in Remote Sensing Markus Törmä markus.torma@tkk.fi Measurements Preprocessing: Remove random and systematic

### Petr Volf. Model for Difference of Two Series of Poisson-like Count Data

Petr Volf Institute of Information Theory and Automation Academy of Sciences of the Czech Republic Pod vodárenskou věží 4, 182 8 Praha 8 e-mail: volf@utia.cas.cz Model for Difference of Two Series of Poisson-like

### Statistics 203: Introduction to Regression and Analysis of Variance Penalized models

Statistics 203: Introduction to Regression and Analysis of Variance Penalized models Jonathan Taylor - p. 1/15 Today s class Bias-Variance tradeoff. Penalized regression. Cross-validation. - p. 2/15 Bias-variance

### Convex Optimization of Graph Laplacian Eigenvalues

Convex Optimization of Graph Laplacian Eigenvalues Stephen Boyd Abstract. We consider the problem of choosing the edge weights of an undirected graph so as to maximize or minimize some function of the

### What s New in Econometrics. Lecture 13

What s New in Econometrics Lecture 13 Weak Instruments and Many Instruments Guido Imbens NBER Summer Institute, 2007 Outline 1. Introduction 2. Motivation 3. Weak Instruments 4. Many Weak) Instruments

### Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen. Bayesian Learning. Tobias Scheffer, Niels Landwehr

Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning Tobias Scheffer, Niels Landwehr Remember: Normal Distribution Distribution over x. Density function with parameters

### 2D Image Processing. Bayes filter implementation: Kalman filter

2D Image Processing Bayes filter implementation: Kalman filter Prof. Didier Stricker Dr. Gabriele Bleser Kaiserlautern University http://ags.cs.uni-kl.de/ DFKI Deutsches Forschungszentrum für Künstliche

### Estimation of Operational Risk Capital Charge under Parameter Uncertainty

Estimation of Operational Risk Capital Charge under Parameter Uncertainty Pavel V. Shevchenko Principal Research Scientist, CSIRO Mathematical and Information Sciences, Sydney, Locked Bag 17, North Ryde,

### C4-304 STATISTICS OF LIGHTNING OCCURRENCE AND LIGHTNING CURRENT S PARAMETERS OBTAINED THROUGH LIGHTNING LOCATION SYSTEMS

2, rue d'artois, F-75008 Paris http://www.cigre.org C4-304 Session 2004 CIGRÉ STATISTICS OF LIGHTNING OCCURRENCE AND LIGHTNING CURRENT S PARAMETERS OBTAINED THROUGH LIGHTNING LOCATION SYSTEMS baran@el.poweng.pub.ro

### INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

Pak. J. Statist. 2017 Vol. 33(1), 37-61 INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION A. M. Abd AL-Fattah, A.A. EL-Helbawy G.R. AL-Dayian Statistics Department, Faculty of Commerce, AL-Azhar

### Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives

TR-No. 14-06, Hiroshima Statistical Research Group, 1 11 Illustration of the Varying Coefficient Model for Analyses the Tree Growth from the Age and Space Perspectives Mariko Yamamura 1, Keisuke Fukui

### COM336: Neural Computing

COM336: Neural Computing http://www.dcs.shef.ac.uk/ sjr/com336/ Lecture 2: Density Estimation Steve Renals Department of Computer Science University of Sheffield Sheffield S1 4DP UK email: s.renals@dcs.shef.ac.uk

### Introduction to Gaussian Process

Introduction to Gaussian Process CS 778 Chris Tensmeyer CS 478 INTRODUCTION 1 What Topic? Machine Learning Regression Bayesian ML Bayesian Regression Bayesian Non-parametric Gaussian Process (GP) GP Regression

### Machine Learning Techniques for Computer Vision

Machine Learning Techniques for Computer Vision Part 2: Unsupervised Learning Microsoft Research Cambridge x 3 1 0.5 0.2 0 0.5 0.3 0 0.5 1 ECCV 2004, Prague x 2 x 1 Overview of Part 2 Mixture models EM

### Lecture 4 January 23

STAT 263/363: Experimental Design Winter 2016/17 Lecture 4 January 23 Lecturer: Art B. Owen Scribe: Zachary del Rosario 4.1 Bandits Bandits are a form of online (adaptive) experiments; i.e. samples are

### Based on slides by Richard Zemel

CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 3: Directed Graphical Models and Latent Variables Based on slides by Richard Zemel Learning outcomes What aspects of a model can we

### PART I INTRODUCTION The meaning of probability Basic definitions for frequentist statistics and Bayesian inference Bayesian inference Combinatorics

Table of Preface page xi PART I INTRODUCTION 1 1 The meaning of probability 3 1.1 Classical definition of probability 3 1.2 Statistical definition of probability 9 1.3 Bayesian understanding of probability

### Bayesian Modeling and Classification of Neural Signals

Bayesian Modeling and Classification of Neural Signals Michael S. Lewicki Computation and Neural Systems Program California Institute of Technology 216-76 Pasadena, CA 91125 lewickiocns.caltech.edu Abstract

### Undirected graphical models

Undirected graphical models Semantics of probabilistic models over undirected graphs Parameters of undirected models Example applications COMP-652 and ECSE-608, February 16, 2017 1 Undirected graphical

### 1 Using standard errors when comparing estimated values

MLPR Assignment Part : General comments Below are comments on some recurring issues I came across when marking the second part of the assignment, which I thought it would help to explain in more detail

### Kernel expansions with unlabeled examples

Kernel expansions with unlabeled examples Martin Szummer MIT AI Lab & CBCL Cambridge, MA szummer@ai.mit.edu Tommi Jaakkola MIT AI Lab Cambridge, MA tommi@ai.mit.edu Abstract Modern classification applications

### TRAFFIC FLOW MODELING AND FORECASTING THROUGH VECTOR AUTOREGRESSIVE AND DYNAMIC SPACE TIME MODELS

TRAFFIC FLOW MODELING AND FORECASTING THROUGH VECTOR AUTOREGRESSIVE AND DYNAMIC SPACE TIME MODELS Kamarianakis Ioannis*, Prastacos Poulicos Foundation for Research and Technology, Institute of Applied

### Bayesian methods in economics and finance

1/26 Bayesian methods in economics and finance Linear regression: Bayesian model selection and sparsity priors Linear Regression 2/26 Linear regression Model for relationship between (several) independent

### Unsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto

Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian

### Machine Learning Lecture 2

Machine Perceptual Learning and Sensory Summer Augmented 15 Computing Many slides adapted from B. Schiele Machine Learning Lecture 2 Probability Density Estimation 16.04.2015 Bastian Leibe RWTH Aachen

### Uncertainty quantification and visualization for functional random variables

Uncertainty quantification and visualization for functional random variables MascotNum Workshop 2014 S. Nanty 1,3 C. Helbert 2 A. Marrel 1 N. Pérot 1 C. Prieur 3 1 CEA, DEN/DER/SESI/LSMR, F-13108, Saint-Paul-lez-Durance,

### DS-GA 1002 Lecture notes 12 Fall Linear regression

DS-GA Lecture notes 1 Fall 16 1 Linear models Linear regression In statistics, regression consists of learning a function relating a certain quantity of interest y, the response or dependent variable,

### Master s Written Examination

Master s Written Examination Option: Statistics and Probability Spring 05 Full points may be obtained for correct answers to eight questions Each numbered question (which may have several parts) is worth

### arxiv: v1 [physics.data-an] 2 Mar 2011

Incorporating Nuisance Parameters in Likelihoods for Multisource Spectra J. S. Conway University of California, Davis, USA arxiv:1103.0354v1 [physics.data-an] Mar 011 1 Overview Abstract We describe here

### Regularized Least Squares

Regularized Least Squares Ryan M. Rifkin Google, Inc. 2008 Basics: Data Data points S = {(X 1, Y 1 ),...,(X n, Y n )}. We let X simultaneously refer to the set {X 1,...,X n } and to the n by d matrix whose

### Covariance Matrix Simplification For Efficient Uncertainty Management

PASEO MaxEnt 2007 Covariance Matrix Simplification For Efficient Uncertainty Management André Jalobeanu, Jorge A. Gutiérrez PASEO Research Group LSIIT (CNRS/ Univ. Strasbourg) - Illkirch, France *part

### Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington

Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The

### Independent Component (IC) Models: New Extensions of the Multinormal Model

Independent Component (IC) Models: New Extensions of the Multinormal Model Davy Paindaveine (joint with Klaus Nordhausen, Hannu Oja, and Sara Taskinen) School of Public Health, ULB, April 2008 My research

### An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application

An Introduction to Reversible Jump MCMC for Bayesian Networks, with Application, CleverSet, Inc. STARMAP/DAMARS Conference Page 1 The research described in this presentation has been funded by the U.S.

### Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics

Test Code: STA/STB (Short Answer Type) 2013 Junior Research Fellowship for Research Course in Statistics The candidates for the research course in Statistics will have to take two shortanswer type tests

### The Impact of Distributed Generation on Power Transmission Grid Dynamics

The Impact of Distributed Generation on Power Transmission Grid Dynamics D. E. Newman B. A. Carreras M. Kirchner I. Dobson Physics Dept. University of Alaska Fairbanks AK 99775 Depart. Fisica Universidad

### Imperfect Data in an Uncertain World

Imperfect Data in an Uncertain World James B. Elsner Department of Geography, Florida State University Tallahassee, Florida Corresponding author address: Dept. of Geography, Florida State University Tallahassee,

### UNIVERSITY OF TRENTO A THEORETICAL FRAMEWORK FOR UNSUPERVISED CHANGE DETECTION BASED ON CHANGE VECTOR ANALYSIS IN POLAR DOMAIN. F. Bovolo, L.

UNIVERSITY OF TRENTO DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY 38050 Povo Trento (Italy), Via Sommarive 14 http://www.dit.unitn.it A THEORETICAL FRAMEWORK FOR UNSUPERVISED CHANGE DETECTION

### Models for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data

Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise

### 1 Introduction. 2 AIC versus SBIC. Erik Swanson Cori Saviano Li Zha Final Project

Erik Swanson Cori Saviano Li Zha Final Project 1 Introduction In analyzing time series data, we are posed with the question of how past events influences the current situation. In order to determine this,

### Introduction to graphical models: Lecture III

Introduction to graphical models: Lecture III Martin Wainwright UC Berkeley Departments of Statistics, and EECS Martin Wainwright (UC Berkeley) Some introductory lectures January 2013 1 / 25 Introduction

### DAG models and Markov Chain Monte Carlo methods a short overview

DAG models and Markov Chain Monte Carlo methods a short overview Søren Højsgaard Institute of Genetics and Biotechnology University of Aarhus August 18, 2008 Printed: August 18, 2008 File: DAGMC-Lecture.tex

### Latent Variable Models and EM algorithm

Latent Variable Models and EM algorithm SC4/SM4 Data Mining and Machine Learning, Hilary Term 2017 Dino Sejdinovic 3.1 Clustering and Mixture Modelling K-means and hierarchical clustering are non-probabilistic

### bound on the likelihood through the use of a simpler variational approximating distribution. A lower bound is particularly useful since maximization o

Category: Algorithms and Architectures. Address correspondence to rst author. Preferred Presentation: oral. Variational Belief Networks for Approximate Inference Wim Wiegerinck David Barber Stichting Neurale

### Ch3. TRENDS. Time Series Analysis

3.1 Deterministic Versus Stochastic Trends The simulated random walk in Exhibit 2.1 shows a upward trend. However, it is caused by a strong correlation between the series at nearby time points. The true

### On the Application of the Generalized Pareto Distribution for Statistical Extrapolation in the Assessment of Dynamic Stability in Irregular Waves

On the Application of the Generalized Pareto Distribution for Statistical Extrapolation in the Assessment of Dynamic Stability in Irregular Waves Bradley Campbell 1, Vadim Belenky 1, Vladas Pipiras 2 1.

### p L yi z n m x N n xi

y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

### Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities

Conservative variance estimation for sampling designs with zero pairwise inclusion probabilities Peter M. Aronow and Cyrus Samii Forthcoming at Survey Methodology Abstract We consider conservative variance

### MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized

### Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features Yangxin Huang Department of Epidemiology and Biostatistics, COPH, USF, Tampa, FL yhuang@health.usf.edu January

### Chapter 7: Model Assessment and Selection

Chapter 7: Model Assessment and Selection DD3364 April 20, 2012 Introduction Regression: Review of our problem Have target variable Y to estimate from a vector of inputs X. A prediction model ˆf(X) has

### The Caterpillar -SSA approach: automatic trend extraction and other applications. Theodore Alexandrov

The Caterpillar -SSA approach: automatic trend extraction and other applications Theodore Alexandrov theo@pdmi.ras.ru St.Petersburg State University, Russia Bremen University, 28 Feb 2006 AutoSSA: http://www.pdmi.ras.ru/

### The Generalized Likelihood Uncertainty Estimation methodology

CHAPTER 4 The Generalized Likelihood Uncertainty Estimation methodology Calibration and uncertainty estimation based upon a statistical framework is aimed at finding an optimal set of models, parameters

### Multi-resolution models for large data sets

Multi-resolution models for large data sets Douglas Nychka, National Center for Atmospheric Research National Science Foundation Iowa State March, 2013 Credits Steve Sain, Tamra Greasby, NCAR Tia LeRud,

### Faster Kriging on Graphs

Faster Kriging on Graphs Omkar Muralidharan Abstract [Xu et al. 2009] introduce a graph prediction method that is accurate but slow. My project investigates faster methods based on theirs that are nearly

### Outline Lecture 2 2(32)

Outline Lecture (3), Lecture Linear Regression and Classification it is our firm belief that an understanding of linear models is essential for understanding nonlinear ones Thomas Schön Division of Automatic