Gaussian Process Vine Copulas for Multivariate Dependence

José Miguel Hernández-Lobato¹² joint work with David López-Paz²³ and Zoubin Ghahramani¹

¹ Department of Engineering, Cambridge University, Cambridge, UK
³ Max Planck Institute for Intelligent Systems, Tübingen, Germany
² Both authors are equal contributors.

April 29, 2013
What is a Copula? Informal Definition

A copula is a function that links univariate marginal distributions into a joint multivariate one.

[Figure: univariate marginal densities combined by a copula into a joint density.]

The copula specifies the dependencies among the random variables.
What is a Copula? Formal Definition

A copula is a distribution function with marginals uniform in [0, 1]. Let U_1, ..., U_d be random variables uniformly distributed in [0, 1] with copula C. Then

C(u_1, ..., u_d) = P(U_1 ≤ u_1, ..., U_d ≤ u_d).

Sklar's theorem (connection between joints, marginals and copulas): any joint cdf F(x_1, ..., x_d) with marginal cdfs F_1(x_1), ..., F_d(x_d) satisfies

F(x_1, ..., x_d) = C(F_1(x_1), ..., F_d(x_d)),

where C is the copula of F. It is easy to show that the joint pdf f can be written as

f(x_1, ..., x_d) = c(F_1(x_1), ..., F_d(x_d)) ∏_{i=1}^d f_i(x_i),

where c(u_1, ..., u_d) and f_1(x_1), ..., f_d(x_d) are the copula and marginal densities.
Why are Copulas Useful in Machine Learning?

The converse of Sklar's theorem is also true: given a copula C : [0, 1]^d → [0, 1] and margins F_1(x_1), ..., F_d(x_d), then C(F_1(x_1), ..., F_d(x_d)) represents a valid joint cdf.

Copulas are a powerful tool for the modeling of multivariate data. We can easily extend univariate models to the multivariate regime, and copulas simplify the estimation process for multivariate models:

1 - Estimate the marginal distributions.
2 - Map the data to [0, 1]^d using the estimated marginals.
3 - Estimate a copula function given the mapped data.

Learning the marginals: easily done using standard univariate methods.
Learning the copula: difficult, requires copula models that i) can represent a broad range of dependencies and ii) are robust to overfitting.
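As a sketch of the three-step recipe above (not the authors' code; the function names are my own, and the choice of a Gaussian copula in step 3 is just one concrete example family):

```python
import numpy as np
from statistics import NormalDist

def ecdf_transform(data):
    """Step 2: map each column of `data` to (0, 1) via its empirical cdf.
    Ranks are scaled by (n + 1) so the output avoids the endpoints 0 and 1,
    which is standard practice in copula estimation."""
    n, d = data.shape
    u = np.empty((n, d))
    for j in range(d):
        ranks = data[:, j].argsort().argsort() + 1  # rank of each observation
        u[:, j] = ranks / (n + 1)
    return u

def fit_gaussian_copula(u):
    """Step 3 for one concrete family (Gaussian copula): the copula is
    parameterized by the correlation matrix of the normal scores."""
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    z = inv_cdf(u)  # map (0,1)^d data to normal scores
    return np.corrcoef(z, rowvar=False)
```

Because the copula is invariant to monotone transformations of the margins, the fitted correlation is unaffected by, say, exponentiating one of the variables before step 2.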
Parametric Copula Models

There are many parametric 2D copulas. Some examples are: Gaussian, Clayton, Frank, t Copula, Gumbel, Joe.

They usually depend on a single scalar parameter θ which is in a one-to-one relationship with Kendall's tau rank correlation coefficient, defined as

τ = P[(U_1 − U_1')(U_2 − U_2') > 0] − P[(U_1 − U_1')(U_2 − U_2') < 0] = P[concordance] − P[discordance],

where (U_1, U_2) and (U_1', U_2') are independent samples from the copula.

However, in higher dimensions, the number and expressiveness of parametric copulas is more limited.
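A small sketch of the τ–θ correspondence (my own helper names; the two closed-form inversions shown are standard results for the Gaussian and Clayton families, not taken from the slides):

```python
import numpy as np

def kendall_tau(u, v):
    """Empirical Kendall's tau: average sign of concordance over all pairs."""
    du = np.sign(u[:, None] - u[None, :])
    dv = np.sign(v[:, None] - v[None, :])
    n = len(u)
    # Diagonal terms are zero; each unordered pair is counted twice.
    return (du * dv).sum() / (n * (n - 1))

def gaussian_rho_from_tau(tau):
    """Invert tau = (2 / pi) * arcsin(rho) for the Gaussian copula."""
    return np.sin(np.pi * tau / 2.0)

def clayton_theta_from_tau(tau):
    """Invert tau = theta / (theta + 2) for the Clayton copula."""
    return 2.0 * tau / (1.0 - tau)
```

These one-to-one maps are what let a scalar-parameter copula be fit by matching an estimated rank correlation.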
Vine Copulas

Vines are hierarchical graphical models that factorize c(u_1, ..., u_d) into a product of d(d − 1)/2 bivariate conditional copula densities.

We can factorize c(u_1, u_2, u_3) using the product rule of probability as

c(u_1, u_2, u_3) = f_{3|12}(u_3 | u_1, u_2) f_{2|1}(u_2 | u_1)

(the remaining factor f_1(u_1) equals 1 because the marginals are uniform), and we can express each factor in terms of bivariate copula functions.
Computing Conditional cdfs

Computing c_{31|2}[F_{3|2}(u_3 | u_2), F_{1|2}(u_1 | u_2) | u_2] requires evaluating the conditional marginal cdfs F_{3|2}(u_3 | u_2) and F_{1|2}(u_1 | u_2). This can be done using the following recursive relationship:

F_{j|A}(u_j | A) = ∂ C_{jk|B}[F_{j|B}(u_j | B), x | B] / ∂x  evaluated at x = F_{k|B}(u_k | B),

where A is a set of variables different from u_j, k ∈ A and B = A \ {k}. For example,

F_{3|2}(u_3 | u_2) = ∂ C_{32}(u_3, x) / ∂x |_{x = u_2},   F_{1|2}(u_1 | u_2) = ∂ C_{21}(x, u_1) / ∂x |_{x = u_2}.
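For the Gaussian copula the partial derivative above has a well-known closed form (often called the h-function). A minimal sketch, with my own function name:

```python
import math
from statistics import NormalDist

_phi = NormalDist()  # standard normal cdf / quantile function

def h_gaussian(u, v, rho):
    """Conditional cdf F(u | v) = dC(u, v)/dv for a bivariate Gaussian
    copula with correlation rho: Phi((x - rho*y) / sqrt(1 - rho^2))
    with x = Phi^{-1}(u), y = Phi^{-1}(v)."""
    x, y = _phi.inv_cdf(u), _phi.inv_cdf(v)
    return _phi.cdf((x - rho * y) / math.sqrt(1.0 - rho * rho))
```

Under independence (rho = 0) the conditional cdf reduces to F(u | v) = u, which gives a quick sanity check.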
Regular Vines

A regular vine specifies a factorization of c(u_1, ..., u_d). It is formed by d − 1 trees T_1, ..., T_{d−1} with node and edge sets V_i and E_i. Each edge e in any tree has associated three sets of variables C(e), D(e), N(e) ⊆ {1, ..., d}, called the conditioned, conditioning and constraint sets:

V_1 = {1, ..., d} and E_1 forms a spanning tree over a complete graph G_1 over V_1. For any e ∈ E_1, C(e) = N(e) = e and D(e) = ∅.

For i > 1, V_i = E_{i−1} and E_i forms a spanning tree over a graph G_i with nodes V_i and edges e = {e_1, e_2} such that e_1, e_2 ∈ E_{i−1} and e_1 ∩ e_2 ≠ ∅.

For any e = {e_1, e_2} ∈ E_i, i > 1, we have that C(e) = N(e_1) Δ N(e_2) (the symmetric difference), D(e) = N(e_1) ∩ N(e_2) and N(e) = N(e_1) ∪ N(e_2).

The resulting factorization is c(u_1, ..., u_d) = ∏_{i=1}^{d−1} ∏_{e ∈ E_i} c_{C(e)|D(e)}.
Example of a Regular Vine

[Figure: an example regular vine and its sequence of trees.]
Using Regular Vines in Practice

Selecting a particular factorization: there are many possible factorizations, each one determined by the specific choice of spanning trees T_1, ..., T_{d−1}. In practice, each tree T_i is chosen by assigning a weight to each edge in G_i and then selecting the corresponding maximum spanning tree. The weight for edge e is usually related to the dependence level between the variables in C(e), often measured in terms of Kendall's tau. It is common to prune the vine and consider only a few of the first trees.

Dealing with conditional bivariate copulas: use the simplifying assumption: c_{C(e)|D(e)} does not depend on D(e).

Our main contribution: avoid making use of the simplifying assumption.
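The tree-selection heuristic above can be sketched as follows. This is my own illustrative code, not the authors' implementation: Prim's algorithm builds the maximum spanning tree with |Kendall's tau| edge weights for the first vine tree T_1:

```python
import numpy as np

def kendall_tau(u, v):
    """Empirical Kendall's tau via the average pairwise concordance sign."""
    du = np.sign(u[:, None] - u[None, :])
    dv = np.sign(v[:, None] - v[None, :])
    n = len(u)
    return (du * dv).sum() / (n * (n - 1))

def first_vine_tree(data):
    """Select T_1 as the maximum spanning tree of the complete graph on the
    d variables, weighting each edge by |Kendall's tau| (Prim's algorithm)."""
    d = data.shape[1]
    w = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            w[i, j] = w[j, i] = abs(kendall_tau(data[:, i], data[:, j]))
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        # Greedily add the heaviest edge crossing the current cut.
        cand = [(i, j) for i in in_tree for j in range(d) if j not in in_tree]
        best = max(cand, key=lambda e: w[e[0], e[1]])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```

On a chain of variables x0 → x1 → x2 with noise added at each step, the selected tree recovers the chain structure, since the direct neighbors have the strongest rank correlation.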
A Semi-parametric Model for Conditional Copulas

We describe c_{C(e)|D(e)} using a parametric model specified in terms of Kendall's tau τ ∈ [−1, 1]. Let z be a vector with the values of the variables in D(e). Then we assume

τ = σ[f(z)],

where f is an arbitrary non-linear function and σ(x) = 2Φ(x) − 1 is a sigmoid function.
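The link function and its inverse are simple to write down; a minimal sketch (function names are my own):

```python
from statistics import NormalDist

_phi = NormalDist()  # standard normal

def tau_link(f):
    """sigma(x) = 2 * Phi(x) - 1: maps an unconstrained latent value
    f(z) to a Kendall's tau in (-1, 1)."""
    return 2.0 * _phi.cdf(f) - 1.0

def tau_link_inverse(tau):
    """Phi^{-1}((tau + 1) / 2): maps a tau back to the latent scale
    (this inverse also appears later as the GP prior mean)."""
    return _phi.inv_cdf((tau + 1.0) / 2.0)
```

The link is monotone, maps 0 to τ = 0, and squashes large latent values toward ±1, so the GP over f can be unconstrained while τ stays valid.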
Bayesian Inference on f

We are given a sample D_UV = {U_i, V_i}_{i=1}^n from C_{C(e)|D(e)} with corresponding values for the variables in D(e) given by D_z = {z_i}_{i=1}^n. We want to identify the value of f that was used to generate the data. We assume that f follows a priori a Gaussian process.

[Figure: sample functions drawn from the Gaussian process prior on f.]
Posterior and Predictive Distributions

The posterior distribution for f = (f_1, ..., f_n)^T, where f_i = f(z_i), is

p(f | D_UV, D_z) = [ ∏_{i=1}^n c(U_i, V_i | τ = σ[f_i]) ] p(f | D_z) / p(D_UV | D_z),

where p(f | D_z) = N(f | m_0, K) is the Gaussian process prior on f.

Given z_{n+1}, the predictive distribution for U_{n+1} and V_{n+1} is

p(u_{n+1}, v_{n+1} | z_{n+1}, D_UV, D_z) = ∫∫ c(u_{n+1}, v_{n+1} | τ = σ[f_{n+1}]) p(f_{n+1} | f, z_{n+1}, D_z) p(f | D_UV, D_z) df_{n+1} df.

For efficient approximate inference, we use Expectation Propagation.
Expectation Propagation

EP approximates p(f | D_UV, D_z) by Q(f) = N(f | m, V), obtained by replacing each exact likelihood factor with an approximate Gaussian factor q̂_i(f_i) with parameters m̂_i and v̂_i. EP tunes m̂_i and v̂_i by minimizing KL[ q_i(f_i) Q(f) q̂_i(f_i)^{-1} ‖ Q(f) ], where q_i is the corresponding exact factor. We use numerical integration methods for this task.

The kernel parameters are fixed by maximizing the EP approximation of p(D_UV | D_z). The total cost is O(n³).
Implementation Details

We choose the following covariance function for the GP prior:

Cov[f(z_i), f(z_j)] = σ exp{ −(z_i − z_j)^T diag(λ) (z_i − z_j) } + σ_0.

The mean of the GP prior is constant and equal to Φ^{-1}((τ̂_MLE + 1)/2), where τ̂_MLE is the MLE of τ for an unconditional Gaussian copula.

We use the FITC approximation: K is approximated by K̃ = Q + diag(K − Q), where Q = K_{nn_0} K_{n_0 n_0}^{-1} K_{nn_0}^T. K_{n_0 n_0} is the n_0 × n_0 covariance matrix for n_0 ≪ n pseudo-inputs, and K_{nn_0} contains the covariances between training points and pseudo-inputs. The cost of EP is now O(n n_0²). We choose n_0 = 20.

The predictive distribution is approximated using sampling.
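A minimal sketch of the stated covariance function and the FITC construction (my own function names; a small jitter is added to the pseudo-input covariance for numerical stability):

```python
import numpy as np

def ard_kernel(Z1, Z2, sigma, lam, sigma0=0.0):
    """Cov[f(zi), f(zj)] = sigma * exp{-(zi-zj)^T diag(lam) (zi-zj)} + sigma0."""
    d2 = np.zeros((Z1.shape[0], Z2.shape[0]))
    for k in range(Z1.shape[1]):
        d2 += lam[k] * (Z1[:, k][:, None] - Z2[:, k][None, :]) ** 2
    return sigma * np.exp(-d2) + sigma0

def fitc_cov(Z, Z0, sigma, lam):
    """FITC: K_tilde = Q + diag(K - Q), Q = K_nn0 K_n0n0^{-1} K_nn0^T,
    where Z0 holds the n_0 pseudo-inputs."""
    Knn = ard_kernel(Z, Z, sigma, lam)
    Knm = ard_kernel(Z, Z0, sigma, lam)
    Kmm = ard_kernel(Z0, Z0, sigma, lam) + 1e-8 * np.eye(len(Z0))
    Q = Knm @ np.linalg.solve(Kmm, Knm.T)
    return Q + np.diag(np.diag(Knn - Q))
```

By construction the FITC covariance reproduces the exact prior variances on the diagonal; only the off-diagonal covariances are approximated through the pseudo-inputs, which is what reduces the EP cost from O(n³) to O(n n_0²).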
Experiments I

We compare the proposed method GPVINE with two baselines:

1 - SVINE, based on the simplifying assumption.
2 - MLLVINE, based on the maximization of the local likelihood. It can only capture dependencies on a single random variable and is limited to regular vines with at most two trees.

All the data are mapped to [0, 1]^d using the empirical cdfs.

Synthetic data: Z uniform in [−6, 6] and (U, V) Gaussian with correlation (3/4) sin(Z). Data set of size 50.

[Figure: true conditional Kendall's tau τ_{U,V|Z} as a function of P_Z(Z), together with the GPVINE and MLLVINE estimates.]
Experiments II

Real-world data: UCI datasets, meteorological data, mineral concentrations and financial data. Data split into training and test sets (50 times), with half of the data in each.

[Table: average test log likelihood when limited to two trees in the vine.]
Results for More than Two Trees

[Figure: test log likelihood results for GPVINE and SVINE beyond two trees.]
Conditional Dependencies in Weather Data

Conditional Kendall's tau for atmospheric pressure and cloud percentage cover when conditioning on latitude and longitude near Barcelona on 11/19/2012 at 8pm.

[Figure: map of the estimated conditional Kendall's tau over the region.]
Summary and Conclusions

Vine copulas are flexible models for multivariate dependencies which specify a factorization of the copula density into a product of conditional bivariate copulas.

In practical implementations of vines, some of the conditional dependencies in the bivariate copulas are usually ignored. To avoid this, we have proposed a method for the estimation of fully conditional vines using Gaussian processes (GPVINE).

GPVINE outperforms a baseline that ignores conditional dependencies (SVINE) and other alternatives based on maximum local-likelihood methods (MLLVINE).
References

López-Paz, D., Hernández-Lobato, J. M. and Ghahramani, Z. Gaussian process vine copulas for multivariate dependence. International Conference on Machine Learning (ICML), 2013.

Acar, E. F., Craiu, R. V. and Yao, F. Dependence calibration in conditional copulas: a nonparametric approach. Biometrics, 67(2):445-453, 2011.

Bedford, T. and Cooke, R. M. Vines - a new graphical model for dependent random variables. The Annals of Statistics, 30(4):1031-1068, 2002.

Minka, T. P. Expectation propagation for approximate Bayesian inference. Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 362-369, 2001.

Naish-Guzman, A. and Holden, S. B. The generalized FITC approximation. Advances in Neural Information Processing Systems 20, 2007.

Patton, A. J. Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527-556, 2006.
Thank you for your attention!