Gaussian Process Vine Copulas for Multivariate Dependence

Gaussian Process Vine Copulas for Multivariate Dependence. José Miguel Hernández-Lobato¹,², joint work with David López-Paz²,³ and Zoubin Ghahramani¹. ¹ Department of Engineering, Cambridge University, Cambridge, UK. ³ Max Planck Institute for Intelligent Systems, Tübingen, Germany. April 29, 2013. ² Both authors are equal contributors.

What is a Copula? Informal Definition. A copula is a function that links univariate marginal distributions into a joint multivariate one. [Figure: marginal densities combined through a copula into a joint density.] The copula specifies the dependencies among the random variables.

What is a Copula? Formal Definition. A copula is a distribution function with marginals uniform in $[0, 1]$. Let $U_1, \ldots, U_d$ be r.v. uniformly distributed in $[0, 1]$ with copula $C$; then
$$C(u_1, \ldots, u_d) = P(U_1 \le u_1, \ldots, U_d \le u_d).$$
Sklar's theorem (connection between joints, marginals and copulas): any joint cdf $F(x_1, \ldots, x_d)$ with marginal cdfs $F_1(x_1), \ldots, F_d(x_d)$ satisfies
$$F(x_1, \ldots, x_d) = C(F_1(x_1), \ldots, F_d(x_d)),$$
where $C$ is the copula of $F$. It is easy to show that the joint pdf $f$ can be written as
$$f(x_1, \ldots, x_d) = c(F_1(x_1), \ldots, F_d(x_d)) \prod_{i=1}^{d} f_i(x_i),$$
where $c(u_1, \ldots, u_d)$ and $f_1(x_1), \ldots, f_d(x_d)$ are the copula and marginal densities.

Why are Copulas Useful in Machine Learning? The converse of Sklar's theorem is also true: given a copula $C : [0,1]^d \to [0,1]$ and margins $F_1(x_1), \ldots, F_d(x_d)$, then $C(F_1(x_1), \ldots, F_d(x_d))$ is a valid joint cdf. Copulas are a powerful tool for the modeling of multivariate data: we can easily extend univariate models to the multivariate regime, and copulas simplify the estimation process for multivariate models (a sketch follows this list):
1 - Estimate the marginal distributions.
2 - Map the data to $[0,1]^d$ using the estimated marginals.
3 - Estimate a copula function given the mapped data.
Learning the marginals is easily done using standard univariate methods. Learning the copula is difficult: it requires copula models that i) can represent a broad range of dependencies and ii) are robust to overfitting.
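A minimal sketch of this three-step recipe, assuming a Gaussian copula in step 3; the function names and the synthetic data are illustrative, not from any particular library:

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_pseudo_observations(X):
    """Steps 1-2: map each column to (0, 1) with its empirical cdf."""
    n = X.shape[0]
    # rank/(n+1) keeps pseudo-observations strictly inside (0, 1)
    return np.apply_along_axis(rankdata, 0, X) / (n + 1)

def fit_gaussian_copula(U):
    """Step 3: estimate a Gaussian copula's correlation matrix
    from the normal scores of the pseudo-observations."""
    Z = norm.ppf(U)                      # map to standard-normal scores
    return np.corrcoef(Z, rowvar=False)

# Illustrative correlated data with arbitrary margins.
X = np.random.randn(500, 3) @ np.array([[1., .5, .2], [0., 1., .5], [0., 0., 1.]])
U = to_pseudo_observations(X)
R = fit_gaussian_copula(U)               # estimated copula correlation matrix
```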

Parametric Copula Models. There are many parametric 2D copulas. Some examples are: Gaussian, Clayton, Frank, Student's t, Gumbel, Joe. They usually depend on a single scalar parameter $\theta$ which is in a one-to-one relationship with Kendall's tau rank correlation coefficient, defined as
$$\tau = P[(U_1 - U_1')(U_2 - U_2') > 0] - P[(U_1 - U_1')(U_2 - U_2') < 0] = P[\text{concordance}] - P[\text{discordance}],$$
where $(U_1, U_2)$ and $(U_1', U_2')$ are independent samples from the copula. However, in higher dimensions, the number and expressiveness of parametric copulas is more limited.
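A small sketch of the one-to-one link between $\tau$ and $\theta$, using the Gaussian copula as an example, where $\theta = \rho$ and $\tau = \frac{2}{\pi}\arcsin(\rho)$; the pseudo-observations below are placeholders:

```python
import numpy as np
from scipy.stats import kendalltau

def tau_to_gaussian_rho(tau):
    """Invert tau = (2/pi) * arcsin(rho) for the Gaussian copula."""
    return np.sin(np.pi * tau / 2.0)

# Moment-style estimation: empirical tau -> copula parameter.
u1, u2 = np.random.rand(2, 1000)   # placeholder pseudo-observations
tau_hat, _ = kendalltau(u1, u2)
rho_hat = tau_to_gaussian_rho(tau_hat)
```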

Vine Copulas. Vine copulas are hierarchical graphical models that factorize $c(u_1, \ldots, u_d)$ into a product of $d(d-1)/2$ bivariate conditional copula densities. For example, we can factorize $c(u_1, u_2, u_3)$ using the product rule of probability as
$$c(u_1, u_2, u_3) = f_{3|12}(u_3 \mid u_1, u_2)\, f_{2|1}(u_2 \mid u_1),$$
and we can express each factor in terms of bivariate copula functions, as written out below.
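The slide leaves the expansion implicit; written out, the standard three-dimensional pair-copula construction (using the same factor names as the next slide) reads:

```latex
% Pair-copula expansion of the d = 3 factorization above:
\begin{align*}
f_{2|1}(u_2 \mid u_1) &= c_{12}(u_1, u_2),\\
f_{3|12}(u_3 \mid u_1, u_2)
  &= c_{31|2}\!\bigl(F_{3|2}(u_3 \mid u_2),\, F_{1|2}(u_1 \mid u_2) \mid u_2\bigr)\,
     c_{32}(u_3, u_2),\\
\intertext{so that}
c(u_1, u_2, u_3)
  &= c_{12}(u_1, u_2)\, c_{32}(u_3, u_2)\,
     c_{31|2}\!\bigl(F_{3|2}(u_3 \mid u_2),\, F_{1|2}(u_1 \mid u_2) \mid u_2\bigr).
\end{align*}
```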

Computing Conditional cdfs. Computing $c_{31|2}[F_{3|2}(u_3 \mid u_2), F_{1|2}(u_1 \mid u_2) \mid u_2]$ requires evaluating the conditional marginal cdfs $F_{3|2}(u_3 \mid u_2)$ and $F_{1|2}(u_1 \mid u_2)$. This can be done using the following recursive relationship:
$$F_{j|A}(u_j \mid A) = \left.\frac{\partial C_{jk|B}\bigl[F_{j|B}(u_j \mid B),\, x\bigr]}{\partial x}\right|_{x = F_{k|B}(u_k \mid B)},$$
where $A$ is a set of variables different from $u_j$, $k \in A$ and $B = A \setminus \{k\}$. For example,
$$F_{3|2}(u_3 \mid u_2) = \left.\frac{\partial C_{32}(u_3, x)}{\partial x}\right|_{x = u_2}, \qquad F_{1|2}(u_1 \mid u_2) = \left.\frac{\partial C_{21}(x, u_1)}{\partial x}\right|_{x = u_2}.$$
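A sketch of this recursion's building block for the bivariate Gaussian copula, where the partial derivative (the "h-function") has the closed form $h(u \mid v; \rho) = \Phi\bigl((\Phi^{-1}(u) - \rho\,\Phi^{-1}(v))/\sqrt{1-\rho^2}\bigr)$; the variable names are illustrative:

```python
import numpy as np
from scipy.stats import norm

def gaussian_h(u, v, rho):
    """Conditional cdf F(u | v) = dC(u, v; rho)/dv for a Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    return norm.cdf((x - rho * y) / np.sqrt(1.0 - rho**2))

# e.g. F_{3|2}(u3 | u2) for copula C_32 with (hypothetical) parameter rho_32:
# F32 = gaussian_h(u3, u2, rho_32)
```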

Regular Vines. A regular vine specifies a factorization of $c(u_1, \ldots, u_d)$. It is formed by $d-1$ trees $T_1, \ldots, T_{d-1}$ with node and edge sets $V_i$ and $E_i$. Each edge $e$ in any tree has three associated sets of variables $C(e), D(e), N(e) \subseteq \{1, \ldots, d\}$, called the conditioned, conditioning and constraint sets.
$V_1 = \{1, \ldots, d\}$ and $E_1$ forms a spanning tree over a complete graph $G_1$ over $V_1$. For any $e \in E_1$, $C(e) = N(e) = e$ and $D(e) = \emptyset$.
For $i > 1$, $V_i = E_{i-1}$ and $E_i$ forms a spanning tree over a graph $G_i$ with nodes $V_i$ and edges $e = \{e_1, e_2\}$ such that $e_1, e_2 \in E_{i-1}$ and $e_1 \cap e_2 \ne \emptyset$.
For any $e = \{e_1, e_2\} \in E_i$, $i > 1$, we have that $C(e) = N(e_1) \,\triangle\, N(e_2)$ (symmetric difference), $D(e) = N(e_1) \cap N(e_2)$ and $N(e) = N(e_1) \cup N(e_2)$.
$$c(u_1, \ldots, u_d) = \prod_{i=1}^{d-1} \prod_{e \in E_i} c_{C(e)|D(e)}.$$

Example of a Regular Vine. [Figure]

Using Regular Vines in Practice. Selecting a particular factorization: there are many possible factorizations, each determined by the specific choice of spanning trees $T_1, \ldots, T_{d-1}$. In practice, each tree $T_i$ is chosen by assigning a weight to each edge in $G_i$ and then selecting the corresponding maximum spanning tree (see the sketch below). The weight for edge $e$ is usually related to the level of dependence between the variables in $C(e)$, often measured in terms of Kendall's tau. It is common to prune the vine and consider only the first few trees.
Dealing with conditional bivariate copulas: use the simplifying assumption, i.e., that $c_{C(e)|D(e)}$ does not depend on $D(e)$.
Our main contribution: avoid making use of the simplifying assumption.
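A minimal sketch of the usual first-tree selection, assuming $|\tau|$ edge weights and using scipy's minimum spanning tree on negated weights; not the paper's exact procedure:

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.sparse.csgraph import minimum_spanning_tree

def select_first_tree(U):
    """U: n x d pseudo-observations. Returns the edge list of T_1."""
    d = U.shape[1]
    W = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            tau, _ = kendalltau(U[:, i], U[:, j])
            # Negate so the *minimum* spanning tree keeps the strongest
            # dependencies; note scipy treats exact zeros as missing edges.
            W[i, j] = -abs(tau)
    mst = minimum_spanning_tree(W)       # sparse matrix of selected edges
    return list(zip(*mst.nonzero()))
```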

A Semi-parametric Model for Conditional Copulas. We describe $c_{C(e)|D(e)}$ using a parametric model specified in terms of Kendall's tau $\tau \in [-1, 1]$. Let $\mathbf{z}$ be a vector with the values of the variables in $D(e)$. Then we assume $\tau = \sigma[f(\mathbf{z})]$, where $f$ is an arbitrary non-linear function and $\sigma(x) = 2\Phi(x) - 1$ is a sigmoid function.
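The link function and its inverse are straightforward to write down; a small sketch, directly from the formula above:

```python
import numpy as np
from scipy.stats import norm

def sigma(x):
    """Squash a real latent value f(z) to Kendall's tau in (-1, 1)."""
    return 2.0 * norm.cdf(x) - 1.0

def sigma_inv(tau):
    """Inverse link; this is also the constant GP prior mean used later,
    Phi^{-1}((tau_hat + 1) / 2)."""
    return norm.ppf((tau + 1.0) / 2.0)
```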

Bayesian Inference on f. We are given a sample $\mathcal{D}_{UV} = \{U_i, V_i\}_{i=1}^{n}$ from $C_{C(e)|D(e)}$ with corresponding values for the variables in $D(e)$ given by $\mathcal{D}_{\mathbf{z}} = \{\mathbf{z}_i\}_{i=1}^{n}$. We want to identify the value of $f$ that was used to generate the data. We assume that $f$ follows a priori a Gaussian process. [Figure: samples from the Gaussian process prior on $f$.]

Posterior and Predictive Distributions. The posterior distribution for $\mathbf{f} = (f_1, \ldots, f_n)^{\mathsf{T}}$, where $f_i = f(\mathbf{z}_i)$, is
$$p(\mathbf{f} \mid \mathcal{D}_{UV}, \mathcal{D}_{\mathbf{z}}) = \frac{\left[\prod_{i=1}^{n} c(U_i, V_i \mid \tau = \sigma[f_i])\right] p(\mathbf{f} \mid \mathcal{D}_{\mathbf{z}})}{p(\mathcal{D}_{UV} \mid \mathcal{D}_{\mathbf{z}})},$$
where $p(\mathbf{f} \mid \mathcal{D}_{\mathbf{z}}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}_0, \mathbf{K})$ is the Gaussian process prior on $\mathbf{f}$. Given $\mathbf{z}_{n+1}$, the predictive distribution for $U_{n+1}$ and $V_{n+1}$ is
$$p(u_{n+1}, v_{n+1} \mid \mathbf{z}_{n+1}, \mathcal{D}_{UV}, \mathcal{D}_{\mathbf{z}}) = \int c(u_{n+1}, v_{n+1} \mid \tau = \sigma[f_{n+1}])\, p(f_{n+1} \mid \mathbf{f}, \mathbf{z}_{n+1}, \mathcal{D}_{\mathbf{z}})\, p(\mathbf{f} \mid \mathcal{D}_{UV}, \mathcal{D}_{\mathbf{z}})\, d\mathbf{f}\, df_{n+1}.$$
For efficient approximate inference, we use Expectation Propagation.

Expectation Propagation. EP approximates $p(\mathbf{f} \mid \mathcal{D}_{UV}, \mathcal{D}_{\mathbf{z}})$ by $Q(\mathbf{f}) = \mathcal{N}(\mathbf{f} \mid \mathbf{m}, \mathbf{V})$, in which each exact likelihood factor $q_i(f_i) = c(U_i, V_i \mid \tau = \sigma[f_i])$ is replaced by an approximate Gaussian factor $\hat{q}_i(f_i)$ with parameters $\hat{m}_i$ and $\hat{v}_i$. EP tunes $\hat{m}_i$ and $\hat{v}_i$ by minimizing $\mathrm{KL}\!\left[\,q_i(f_i)\, Q(\mathbf{f})\, \hat{q}_i(f_i)^{-1} \,\big\|\, Q(\mathbf{f})\,\right]$. We use numerical integration methods for this task. Kernel parameters are fixed by maximizing the EP approximation of $p(\mathcal{D}_{UV} \mid \mathcal{D}_{\mathbf{z}})$. The total cost is $\mathcal{O}(n^3)$.

Implementation Details. We choose the following covariance function for the GP prior:
$$\mathrm{Cov}[f(\mathbf{z}_i), f(\mathbf{z}_j)] = \sigma \exp\left\{-(\mathbf{z}_i - \mathbf{z}_j)^{\mathsf{T}} \mathrm{diag}(\boldsymbol{\lambda})\, (\mathbf{z}_i - \mathbf{z}_j)\right\} + \sigma_0.$$
The mean of the GP prior is constant and equal to $\Phi^{-1}((\hat{\tau}_{\text{MLE}} + 1)/2)$, where $\hat{\tau}_{\text{MLE}}$ is the MLE of $\tau$ for an unconditional Gaussian copula.
We use the FITC approximation: $\mathbf{K}$ is approximated by $\tilde{\mathbf{K}} = \mathbf{Q} + \mathrm{diag}(\mathbf{K} - \mathbf{Q})$, where $\mathbf{Q} = \mathbf{K}_{n n_0} \mathbf{K}_{n_0 n_0}^{-1} \mathbf{K}_{n n_0}^{\mathsf{T}}$. $\mathbf{K}_{n_0 n_0}$ is the $n_0 \times n_0$ covariance matrix for $n_0 \ll n$ pseudo-inputs, and $\mathbf{K}_{n n_0}$ contains the covariances between training points and pseudo-inputs. The cost of EP is now $\mathcal{O}(n n_0^2)$. We choose $n_0 = 20$. The predictive distribution is approximated using sampling.
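A minimal numpy sketch of the FITC covariance described above, assuming the ARD kernel from this slide; the pseudo-input locations `Z0` are placeholders to be chosen separately:

```python
import numpy as np

def ard_kernel(A, B, sigma, lam, sigma0=0.0):
    """Cov[f(a), f(b)] = sigma * exp(-(a-b)^T diag(lam) (a-b)) + sigma0."""
    D2 = ((A[:, None, :] - B[None, :, :])**2 * lam).sum(-1)
    return sigma * np.exp(-D2) + sigma0

def fitc_covariance(Z, Z0, sigma, lam, sigma0=0.0):
    """K_tilde = Q + diag(K - Q), with Q = K_nn0 K_n0n0^{-1} K_nn0^T."""
    K     = ard_kernel(Z, Z, sigma, lam, sigma0)
    K_nn0 = ard_kernel(Z, Z0, sigma, lam, sigma0)
    K_00  = ard_kernel(Z0, Z0, sigma, lam, sigma0)
    Q = K_nn0 @ np.linalg.solve(K_00, K_nn0.T)
    # Keep the exact diagonal, low-rank everywhere else.
    return Q + np.diag(np.diag(K - Q))
```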

Experiments I. We compare the proposed method GPVINE with two baselines:
1 - SVINE, based on the simplifying assumption.
2 - MLLVINE, based on the maximization of the local likelihood. It can only capture dependencies on a single random variable and is limited to regular vines with at most two trees.
All the data are mapped to $[0,1]^d$ using the ecdfs.
Synthetic data: $Z$ uniform in $[-6, 6]$ and $(U, V)$ Gaussian with correlation $\frac{3}{4}\sin(Z)$. Data set of size 50 (see the sketch below).
[Figure: conditional Kendall's tau $\tau_{U,V|Z}$ as a function of $P_Z(Z)$, comparing GPVINE and MLLVINE against the true value.]
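A sketch reproducing the synthetic setup above ($Z$ uniform on $[-6, 6]$, correlation $\frac{3}{4}\sin(Z)$, 50 observations); for brevity it maps through the known Gaussian cdf rather than the ecdf used in the experiments:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 50
Z = rng.uniform(-6.0, 6.0, size=n)
rho = 0.75 * np.sin(Z)                    # conditional correlation

# Draw Gaussian pairs with Z-dependent correlation, then map to [0, 1].
X1 = rng.standard_normal(n)
X2 = rho * X1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
U, V = norm.cdf(X1), norm.cdf(X2)
```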

Experiments II. Real-world data: UCI datasets, meteorological data, mineral concentrations and financial data. The data are split 50 times into training and test sets, each containing half of the data. Average test log-likelihood when limited to two trees in the vine: [Table of results.]

Results for More than Two Trees. [Figure comparing GPVINE and SVINE.]

Conditional Dependencies in Weather Data. Conditional Kendall's tau for atmospheric pressure and cloud cover percentage when conditioned on latitude and longitude near Barcelona on 11/19/2012 at 8pm. [Figure]

Summary and Conclusions. Vine copulas are flexible models for multivariate dependencies which specify a factorization of the copula density into a product of conditional bivariate copulas. In practical implementations of vines, some of the conditional dependencies in the bivariate copulas are usually ignored. To avoid this, we have proposed a method for the estimation of fully conditional vines using Gaussian processes (GPVINE). GPVINE outperforms a baseline that ignores conditional dependencies (SVINE) and other alternatives based on maximum local-likelihood methods (MLLVINE).

References
Lopez-Paz, D., Hernandez-Lobato, J. M., and Ghahramani, Z. Gaussian process vine copulas for multivariate dependence. International Conference on Machine Learning (ICML), 2013.
Acar, E. F., Craiu, R. V., and Yao, F. Dependence calibration in conditional copulas: a nonparametric approach. Biometrics, 67(2):445-453, 2011.
Bedford, T. and Cooke, R. M. Vines: a new graphical model for dependent random variables. The Annals of Statistics, 30(4):1031-1068, 2002.
Minka, T. P. Expectation propagation for approximate Bayesian inference. Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pp. 362-369, 2001.
Naish-Guzman, A. and Holden, S. B. The generalized FITC approximation. Advances in Neural Information Processing Systems 20, 2007.
Patton, A. J. Modelling asymmetric exchange rate dependence. International Economic Review, 47(2):527-556, 2006.

Thank you for your attention!