Variational Inference with Copula Augmentation

1 Variational Inference with Copula Augmentation
Dustin Tran (1), David M. Blei (2), Edoardo M. Airoldi (1)
(1) Department of Statistics, Harvard University
(2) Department of Statistics & Computer Science, Columbia University
Presented by Shaobo Han, Duke University
September 4, 2015

2 Outline
1 Introduction
2 Background: Vine Pair Copulas
3 Copula Variational Inference
   Sampling from the copula-augmented variational distribution
   Calculating the gradients
4 Experiments
   Mixture of Gaussians
   Latent space model
D. Tran et al., 2015 Variational Inference with Copula Augmentation 1 / 15

3 Introduction
The authors aim for scalable, generic Bayesian inference, approximating the posterior with a variational distribution: $p(z \mid x) \approx q(z; \lambda)$.
- Mean-field VI is fast but highly biased: it underestimates the variance and is sensitive to local optima and hyperparameters.
- Structured VI incorporates dependency but requires explicit knowledge of the model and is difficult to construct.
- The proposed approach automatically learns the dependency structure within a black-box framework and generalizes both approaches.

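5 Variational inference
Variational inference minimizes $\mathrm{KL}(q \,\|\, p)$ by maximizing the ELBO
$$\mathcal{L}(\lambda) = \underbrace{\mathbb{E}_q[\log p(x, z)]}_{\text{energy}} - \underbrace{\mathbb{E}_q[\log q(z; \lambda)]}_{\text{entropy}} \tag{1}$$
Any random variable $z = \{z_1, \dots, z_d\} \sim q$ can be factorized as
$$q(z) = \Big[\prod_{i=1}^{d} q(z_i)\Big]\, c(Q(z_1), \dots, Q(z_d)) \tag{2}$$
where $Q(z_i)$ is the marginal CDF of $z_i$ and $c$ is a joint density on the unit hypercube known as the copula.
Bivariate Gaussian copula:
$$C_{\text{Gaussian}}(u_1, u_2; \rho) = \Phi_\rho(\Phi^{-1}(u_1), \Phi^{-1}(u_2))$$
where $\Phi^{-1}$ is the standard normal inverse CDF and $\Phi_\rho$ is the bivariate normal CDF with correlation $\rho$. This expression is the copula distribution function; the density $c$ is its mixed partial derivative in $u_1$ and $u_2$.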
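The factorization in (2) can be made concrete for d = 2 with a Gaussian pair copula. The sketch below is illustrative only and is not taken from the talk: the function names are hypothetical, the marginals are assumed to be `statistics.NormalDist` objects, and the closed-form density is the standard ratio of a bivariate normal density to the product of its standard normal margins.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal; inv_cdf plays the role of Phi^{-1}


def gaussian_copula_density(u1, u2, rho):
    """Density c(u1, u2; rho) of the bivariate Gaussian copula on (0, 1)^2."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Map the uniforms to standard normal scores.
    x1 = _STD_NORMAL.inv_cdf(u1)
    x2 = _STD_NORMAL.inv_cdf(u2)
    s = 1.0 - rho * rho
    # Bivariate normal density divided by the product of its two margins.
    exponent = -(rho * rho * (x1 * x1 + x2 * x2) - 2.0 * rho * x1 * x2) / (2.0 * s)
    return math.exp(exponent) / math.sqrt(s)


def copula_augmented_log_density(z1, z2, marg1, marg2, rho):
    """log q(z1, z2) = log q1(z1) + log q2(z2) + log c(Q1(z1), Q2(z2); rho)."""
    log_marginals = math.log(marg1.pdf(z1)) + math.log(marg2.pdf(z2))
    log_copula = math.log(
        gaussian_copula_density(marg1.cdf(z1), marg2.cdf(z2), rho)
    )
    return log_marginals + log_copula
```

With `rho = 0` the copula density is identically 1 and the mean-field factorization is recovered. At the center of the unit square, `gaussian_copula_density(0.5, 0.5, 0.6)` evaluates to 1 / sqrt(1 - 0.36) = 1.25.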

6 Vine copulas
Limitations of standard multivariate copulas:
- They can be inflexible in high dimensions.
- They do not allow for different pairwise dependency structures.
Vine copulas for higher-dimensional data:
- Bivariate copulas are the building blocks, selected from a wide range of (parametric) families.
- The dependency structure is determined by the bivariate copulas and a nested set of trees.

7 Preliminaries: bivariate copulas
Key basic identities.
Sklar's theorem (1959) [1]:
$$F(x_1, x_2) = C(F_1(x_1), F_2(x_2)) \tag{3}$$
Joint density:
$$f(x_1, x_2) = c_{12}(F_1(x_1), F_2(x_2))\, f_1(x_1)\, f_2(x_2) \tag{4}$$
Conditional density:
$$f(x_2 \mid x_1) = c_{12}(F_1(x_1), F_2(x_2))\, f_2(x_2) \tag{5}$$
Conditional distribution function:
$$F(x_2 \mid x_1) = \frac{\partial C_{12}(F_1(x_1), F_2(x_2))}{\partial F_1(x_1)} \tag{6}$$
[1] A. Sklar, Fonctions de Répartition à n Dimensions et Leurs Marges, 1959

8 Pair-copula construction (PCC)
Represent a density $f(x_1, \dots, x_d)$ as a product of pair-copula densities and marginal densities.
Example [2]: d = 3 dimensions. One possible decomposition of $f(x_1, x_2, x_3)$ is
$$f(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_2)\, f_3(x_3) \cdot c_{12}(F_1(x_1), F_2(x_2)) \cdot c_{23}(F_2(x_2), F_3(x_3)) \cdot c_{13|2}(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2))$$
For high-dimensional distributions, the number of possible pair-copula constructions grows very quickly with d.
[2] N. Krämer & U. Schepsmeier, Introduction to Vine Copulas, NIPS workshop, 2011
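For readers who want the intermediate steps, the d = 3 decomposition can be derived from the chain rule together with the bivariate conditional-density identity $f(x_2 \mid x_1) = c_{12}(F_1(x_1), F_2(x_2))\, f_2(x_2)$. The derivation below is a sketch; the remark about the simplifying assumption is standard in the vine copula literature but is not stated on the slide.

```latex
% Chain rule, conditioning on x_2 first:
f(x_1, x_2, x_3) = f_2(x_2)\, f(x_1 \mid x_2)\, f(x_3 \mid x_1, x_2)

% Bivariate identity applied to the pair (1, 2):
f(x_1 \mid x_2) = c_{12}\big(F_1(x_1), F_2(x_2)\big)\, f_1(x_1)

% Same identity applied conditionally on x_2 to the pair (1, 3),
% then once more to the pair (2, 3):
f(x_3 \mid x_1, x_2)
  = c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big)\, f(x_3 \mid x_2)
  = c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big)\,
    c_{23}\big(F_2(x_2), F_3(x_3)\big)\, f_3(x_3)

% Multiplying the three factors gives
f(x_1, x_2, x_3)
  = f_1(x_1)\, f_2(x_2)\, f_3(x_3)\;
    c_{12}\big(F_1(x_1), F_2(x_2)\big)\;
    c_{23}\big(F_2(x_2), F_3(x_3)\big)\;
    c_{13|2}\big(F_{1|2}(x_1 \mid x_2), F_{3|2}(x_3 \mid x_2)\big)
```

In general $c_{13|2}$ may itself depend on the value of $x_2$; pair-copula constructions usually assume it does not (the simplifying assumption), so each pair copula can be drawn from an ordinary bivariate family.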

9 Regular vine structure
Bedford and Cooke (2001) [3] introduce graphical models called regular vines (R-vines) to help organize the possible pair-copula constructions.
A regular vine is a sequence of $d - 1$ linked trees where:
- Tree $T_1$ is a tree on nodes 1 to d.
- Tree $T_j$ has $d + 1 - j$ nodes and $d - j$ edges.
- Edges in tree $T_j$ become nodes in tree $T_{j+1}$.
- Proximity condition: two nodes in tree $T_{j+1}$ can be joined by an edge only if the corresponding edges in tree $T_j$ share a node.
[3] T. Bedford & R. Cooke, Probability density decomposition for conditionally dependent random variables modeled by vines, 2001
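As an illustration of the tree counts above, the sketch below enumerates the edges of a D-vine, one particular regular vine in which every tree is a path. The function name and the edge representation (conditioned pair, conditioning set) are assumptions made for this example; a general R-vine needs a richer structure such as the matrix encoding used in the vine copula literature.

```python
def d_vine_trees(d):
    """Return the trees T_1, ..., T_{d-1} of a D-vine on variables 1..d.

    Each tree is a list of edges; each edge is a pair
    (conditioned, conditioning), for example ((1, 3), (2,)) for c_{13|2}.
    """
    if d < 2:
        raise ValueError("a vine needs at least two variables")
    trees = []
    for j in range(1, d):                # tree index j = 1, ..., d - 1
        edges = []
        for i in range(1, d - j + 1):    # tree T_j has d - j edges
            conditioned = (i, i + j)
            conditioning = tuple(range(i + 1, i + j))
            edges.append((conditioned, conditioning))
        trees.append(edges)
    return trees


def edge_label(edge):
    """Format an edge as a pair-copula subscript such as '13|2'."""
    (k, l), cond = edge
    label = f"{k}{l}"
    if cond:
        label += "|" + "".join(str(c) for c in cond)
    return label
```

For d = 3 this yields the labels 12, 23 and 13|2, matching the three-dimensional pair-copula decomposition. Consecutive edges of a path always share a node, so the proximity condition holds by construction in this special case.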

10 Example [4]
Density:
$$f = f_1\, f_2\, f_3\, f_4\, f_5 \cdot c_{14}\, c_{15}\, c_{24}\, c_{34} \cdot c_{12|4}\, c_{13|4}\, c_{45|1} \cdot c\; c\; c$$
Multivariate copula as a product of pair copulas:
$$c(u_1, \dots, u_d; \eta) = \prod_{j=1}^{d-1} \prod_{e(i,k) \in E_j} c_{ik|D(e)} \tag{7}$$
[4] C. Czado & K. Was, Pair-copula constructions - even more flexible than copulas, 2013

12 Methodology
λ: the original parameters (mean-field or structured).
η: the augmented parameters (copula).
$$q(z; \lambda, \eta) = \underbrace{\Big[\prod_{i=1}^{d} q(z_i; \lambda)\Big]}_{\text{mean-field}}\; \underbrace{c(Q(z_1; \lambda), \dots, Q(z_d; \lambda); \eta)}_{\text{copula}} \tag{8}$$
Gradients as expectations:
$$\nabla_{\{\lambda,\eta\}} \mathcal{L} = \mathbb{E}_q\big[\nabla_{\{\lambda,\eta\}} \log q(z; \lambda, \eta)\, \big(\log p(x, z) - \log q(z; \lambda, \eta)\big)\big] \tag{9}$$
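The expectation in (9) is typically estimated by Monte Carlo. The sketch below applies that score-function estimator to a deliberately small toy problem, not to the copula family itself: the unnormalized log joint, the unit-variance Gaussian variational family, and all names are assumptions chosen so that the exact gradient (3 minus the mean) is known in closed form.

```python
import math
import random


def log_joint(z):
    """Toy unnormalized log p(x, z): a Gaussian bump centered at 3 (assumption)."""
    return -0.5 * (z - 3.0) ** 2


def log_q(z, mean):
    """Log density of the variational family q(z; mean) = N(mean, 1)."""
    return -0.5 * (z - mean) ** 2 - 0.5 * math.log(2.0 * math.pi)


def score(z, mean):
    """Gradient of log q(z; mean) with respect to mean."""
    return z - mean


def elbo_gradient(mean, num_samples, rng):
    """Monte Carlo estimate of E_q[ score * (log p(x, z) - log q(z)) ]."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(mean, 1.0)  # draw z from q
        total += score(z, mean) * (log_joint(z) - log_q(z, mean))
    return total / num_samples


rng = random.Random(0)
estimate = elbo_gradient(mean=0.0, num_samples=20000, rng=rng)
exact = 3.0 - 0.0  # closed-form ELBO gradient for this toy model at mean = 0
```

The estimator is unbiased but noisy, which is why many samples per step and variance reduction matter in practice.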

13 Difficulties:
1. Sampling from $q$
2. Calculating the gradient $\nabla \log q$

14 Simulation from an R-vine copula model [5]
1. Generate $u = (u_1, \dots, u_d)$, where each $u_i \sim U(0, 1)$.
2. Calculate $v = (v_1, \dots, v_d)$, which follows a joint uniform distribution with dependencies given by the copula:
   $v_1 = u_1$
   $v_2 = Q^{-1}_{2|1}(u_2 \mid v_1)$
   $v_3 = Q^{-1}_{3|12}(u_3 \mid v_1, v_2)$
   ...
   $v_d = Q^{-1}_{d|12 \dots d-1}(u_d \mid v_1, v_2, \dots, v_{d-1})$
3. Calculate $z = (Q_1^{-1}(v_1), \dots, Q_d^{-1}(v_d))$, which is a sample from the copula-augmented distribution $q(z; \lambda, \eta)$.
The inverse conditional CDFs are computed recursively; refer to [5] for more details.
[5] J. Dissmann, Statistical Inference for Regular Vines and Application, 2010
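For d = 2 the three steps reduce to a single inverse conditional CDF. The sketch below assumes a Gaussian pair copula with correlation `rho` and Gaussian marginals given as `statistics.NormalDist` objects; these choices, the clamping constant, and the function names are illustrative assumptions, and the general recursive R-vine algorithm of [5] is not reproduced here.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()
EPS = 1e-12  # keeps arguments of inv_cdf strictly inside (0, 1)


def clamp(u):
    """Clip a probability into the open unit interval."""
    return min(max(u, EPS), 1.0 - EPS)


def inverse_h(u2, v1, rho):
    """Inverse conditional CDF of the Gaussian pair copula.

    Solves Phi((Phi^{-1}(v2) - rho * Phi^{-1}(v1)) / sqrt(1 - rho^2)) = u2 for v2.
    """
    x1 = STD_NORMAL.inv_cdf(clamp(v1))
    x2 = STD_NORMAL.inv_cdf(clamp(u2))
    return STD_NORMAL.cdf(x2 * math.sqrt(1.0 - rho * rho) + rho * x1)


def sample_copula_augmented(marg1, marg2, rho, rng):
    """Draw one (z1, z2) from the copula-augmented distribution."""
    u1, u2 = rng.random(), rng.random()  # step 1: independent uniforms
    v1 = u1                              # step 2: impose the dependence
    v2 = inverse_h(u2, v1, rho)
    z1 = marg1.inv_cdf(clamp(v1))        # step 3: apply the inverse marginal CDFs
    z2 = marg2.inv_cdf(clamp(v2))
    return z1, z2
```

With Gaussian marginals the draws are bivariate normal with correlation `rho`, which gives a simple sanity check. Any marginal that exposes an `inv_cdf` method could be substituted in step 3.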

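15 Calculating the gradients
The gradient of the log variational density stacks a block for λ and a block for η:
$$\nabla_{\{\lambda,\eta\}} \log q(z; \lambda, \eta) = \begin{bmatrix} \sum_{i=1}^{d} \nabla_{\lambda} \log q(z_i; \lambda_i) + \nabla_{\lambda} \log c(Q(z_1; \lambda), \dots, Q(z_d; \lambda); \eta) \\ \nabla_{\eta} \log c(Q(z_1; \lambda), \dots, Q(z_d; \lambda); \eta) \end{bmatrix} \tag{10}$$
For the mean-field parameters, by the chain rule:
$$\nabla_{\lambda_i} \log q(z; \lambda, \eta) = \nabla_{\lambda_i} \log q(z_i; \lambda_i) + \nabla_{Q(z_i; \lambda_i)} \log c(Q(z_1; \lambda), \dots, Q(z_d; \lambda); \eta)\; \nabla_{\lambda_i} Q(z_i; \lambda_i)$$
$$= \nabla_{\lambda_i} \log q(z_i; \lambda_i) + \nabla_{\lambda_i} Q(z_i; \lambda_i) \sum_{j=1}^{d-1} \sum_{e(k,l) \in E_j :\, i \in C(e)} \nabla_{Q(z_i; \lambda_i)} \log c_{kl|D(e)}$$
For the copula parameters:
$$\nabla_{\eta_i} \log c(Q(z_1; \lambda), \dots, Q(z_d; \lambda); \eta) = \sum_{j=1}^{d-1} \sum_{e(k,l) \in E_j :\, \eta_i \in \{C(e), D(e)\}} \nabla_{\eta_i} \log c_{kl|D(e)}$$
Here $e$ is an edge, $C(e) = \{k, l\}$ is its conditioned set, and $D(e)$ is its conditioning set, consistent with the subscript in $c_{kl|D(e)}$.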
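Each term $\nabla_{\eta_i} \log c_{kl|D(e)}$ is the gradient of a single pair-copula log density. As a worked instance, the sketch below takes the pair copula to be Gaussian with correlation parameter `rho` (an assumption; the talk does not fix a family) and gives the closed-form derivative of its log density with respect to `rho`. The function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def log_gaussian_copula(u1, u2, rho):
    """Log density of the bivariate Gaussian copula with correlation rho."""
    x1 = STD_NORMAL.inv_cdf(u1)
    x2 = STD_NORMAL.inv_cdf(u2)
    s = 1.0 - rho * rho
    quad = rho * rho * (x1 * x1 + x2 * x2) - 2.0 * rho * x1 * x2
    return -0.5 * math.log(s) - quad / (2.0 * s)


def grad_rho_log_gaussian_copula(u1, u2, rho):
    """Closed-form derivative of log_gaussian_copula with respect to rho."""
    x1 = STD_NORMAL.inv_cdf(u1)
    x2 = STD_NORMAL.inv_cdf(u2)
    s = 1.0 - rho * rho
    a = x1 * x1 + x2 * x2  # sum of squared normal scores
    b = x1 * x2            # product of normal scores
    return rho / s - (rho * a - (1.0 + rho * rho) * b) / (s * s)
```

At `rho = 0` the derivative reduces to the product of the two normal scores, so observations whose scores share a sign push the correlation upward. In practice an automatic differentiation tool would supply this gradient; the closed form is mainly useful for checking.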

17 Implementations
- Automatic differentiation tools [6]
- Variance reduction: number of samples m = 1024
- ADAM [7]: an adaptive learning-rate schedule that combines ideas from AdaGrad and RMSprop
[6] Stan Development Team. Stan: A C++ library for probability and sampling, 2014
[7] D. Kingma and J. Lei Ba. Adam: A method for stochastic optimization, ICLR, 2015
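For reference, the Adam update from [7] can be sketched for a scalar parameter as follows. The default hyperparameters are the commonly used ones except for the step size, and the function name and the quadratic test objective are assumptions for illustration; the settings used in the experiments are not given on the slide.

```python
import math


def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, num_steps=1000):
    """Minimize a scalar function given its gradient, using Adam updates."""
    x = float(x0)
    m = 0.0  # running average of gradients (first moment)
    v = 0.0  # running average of squared gradients (second moment)
    for t in range(1, num_steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)  # bias correction for zero initialization
        v_hat = v / (1.0 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3).
x_star = adam_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Because the step is normalized by the root of the second-moment estimate, the first update has magnitude close to `lr` regardless of the gradient scale. To maximize an ELBO rather than minimize a loss, pass the negated gradient estimate as `grad_fn`.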

18 Mixture of Gaussians
A classic example that stresses the difficulty of modeling dependency.

19 Latent space model
Dependency in the latent variables is crucial, and the mean-field approximation provides arbitrarily bad estimates.
$$z_n \sim \mathcal{N}(\mu, \Lambda^{-1}), \qquad \operatorname{logit}(p) = \theta - |z_i - z_j| \tag{11}$$
