Bayesian nonparametric latent feature models

Bayesian nonparametric latent feature models: Indian buffet process, beta process, and related models
François Caron, Department of Statistics, Oxford
Applied Bayesian Statistics Summer School, Como, Italy, starting June 16, 2014
F. Caron 1 / 62

Outline
Introduction
Indian buffet process
A parametric beta-Bernoulli model
Beta-Bernoulli process
Inference
Stable Indian buffet process
Beyond the Indian buffet process

Introduction: Clustering
Cluster/partition a set of items $i = 1, \ldots, n$ into clusters.

Introduction: Clustering
Random partition $\pi_n = \{A_{n,1}, \ldots, A_{n,K_n}\}$, where the $A_{n,j}$, $j = 1, \ldots, K_n$, are non-empty and non-overlapping subsets of $[n] := \{1, \ldots, n\}$ with $\bigcup_{j=1}^{K_n} A_{n,j} = [n]$.
The $A_{n,j}$ are the clusters; $K_n \le n$ is the number of clusters.
Example: $\pi_6 = \{\{1, 4, 5\}, \{2, 3\}, \{6\}\}$.

Nonparametric approach: $K_n$ can increase unboundedly with the number of items $n$.
Exchangeable random partition: the distribution is invariant w.r.t. any permutation of $[n]$, e.g.
$$P(\{\{1, 2\}, \{3\}\}) = P(\{\{2, 3\}, \{1\}\}) = P(\{\{1, 3\}, \{2\}\})$$
The labelling/ordering of the items is of no importance.
The Chinese restaurant process is an example of a generative process for an exchangeable partition.

Introduction: Latent feature models
Set of objects $i = 1, \ldots, n$.
Each object $i$ has a set of features/attributes, shared amongst objects.
Example:
Image 1: (no feature)
Image 2: Tree, Human
Image 3: Human
Image 4: Tree, Human
Image 5: Road, Animal

Dynamic state-space models: collection of time series with shared dynamical behaviors [Fox et al., 2009].

Introduction: Latent feature models
Collaborative filtering: predict missing entries in a user/item matrix from a subset of its entries.
Low-rank assumption: the matrix can be decomposed with a small number of latent features.
User/feature association matrix [Meeds et al., 2007].

Introduction: Latent feature models
Random feature allocation: representation as a multiset of subsets of $[n] = \{1, \ldots, n\}$,
$$f_n = \{A_{n,1}, \ldots, A_{n,K_n}\}$$
where the $A_{n,j}$, $j = 1, \ldots, K_n$, are non-empty (possibly overlapping) subsets of $[n]$.
$A_{n,j}$ is the set of objects sharing a given feature $j$.
Example: $f_5 = \{\{2, 3, 4\}, \{2, 4\}, \{5\}, \{5\}\}$ for
Image 1: (no feature)
Image 2: Tree, Human
Image 3: Human
Image 4: Tree, Human
Image 5: Road, Animal
[Broderick et al., 2013a]

Multisets are often represented graphically by a binary matrix of objects against features.
Beware that feature labelling does not matter!
[Figure: two binary matrices for objects 1 to 3 that differ only in the labelling of the features]
Both matrices represent the same multiset, $f_3 = \{\{1, 2, 3\}, \{1, 3\}, \{1, 2\}, \{2\}, \{2, 3\}, \{3\}, \{3\}\}$.
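The correspondence between binary matrices and multisets can be made concrete with a short sketch. The helper name `multiset_from_matrix` and the particular column ordering of the matrix are assumptions made for illustration; only the multiset $f_3$ itself comes from the slides.

```python
from collections import Counter

import numpy as np


def multiset_from_matrix(z):
    """Return the feature allocation of a binary matrix as a Counter of frozensets.

    Rows are objects (numbered from 1), columns are features.
    All-zero columns are dropped and the column order is ignored.
    """
    z = np.asarray(z)
    blocks = []
    for j in range(z.shape[1]):
        members = frozenset((np.flatnonzero(z[:, j]) + 1).tolist())
        if members:  # an empty set is not a valid feature
            blocks.append(members)
    return Counter(blocks)


# One possible binary matrix for f_3 = {{1,2,3},{1,3},{1,2},{2},{2,3},{3},{3}}
z = np.array([
    [1, 1, 1, 0, 0, 0, 0],  # object 1
    [1, 0, 1, 1, 1, 0, 0],  # object 2
    [1, 1, 0, 0, 1, 1, 1],  # object 3
])
f3 = multiset_from_matrix(z)

# Relabelling the features (permuting the columns) leaves the multiset unchanged.
z_permuted = z[:, [6, 2, 0, 4, 1, 5, 3]]
same = multiset_from_matrix(z_permuted) == f3
```

Storing the allocation as a `Counter` keeps the multiplicities of repeated subsets, such as the two copies of $\{3\}$, which a plain set would lose.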

Introduction: Latent feature models
Nonparametric approach: the number of features $K_n$ can increase unboundedly with $n$.
Exchangeable latent feature model: the distribution of $f_n$ is invariant w.r.t. any permutation $\sigma$ of $[n]$, e.g.
$$\Pr(\{\{2, 3, 4\}, \{2, 4\}, \{5\}, \{5\}\}) = \Pr(\{\{3, 4, 5\}, \{3, 5\}, \{1\}, \{1\}\}) = \Pr(\{\{\sigma(2), \sigma(3), \sigma(4)\}, \{\sigma(2), \sigma(4)\}, \{\sigma(5)\}, \{\sigma(5)\}\})$$
for any permutation $\sigma$ of $\{1, 2, 3, 4, 5\}$.

Indian buffet process
Generative model for multisets, with a single parameter $\alpha > 0$.
The first customer picks $K_1^+ \sim \text{Poisson}(\alpha)$ dishes.
Then each customer $i = 2, \ldots$
chooses each dish $j$ previously chosen $m_{i-1,j}$ times with probability $m_{i-1,j}/i$;
picks an additional set of $K_i^+ \sim \text{Poisson}(\alpha/i)$ dishes.
[Figure: binary matrix of customers 1 to 3 against dishes]
$f_3 = \{\{1, 2, 3\}, \{1, 3\}, \{1, 2\}, \{2\}, \{2, 3\}, \{3\}, \{3\}\}$
[Griffiths and Ghahramani, 2005; Griffiths and Ghahramani, 2011]

[Figure: sampled binary matrices of objects against features for alpha=1 and two other values of alpha]
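The generative scheme above translates almost line by line into a sampler. The following is a minimal sketch using NumPy; the function name `sample_ibp` and the list-based bookkeeping are illustrative choices, not part of the original slides.

```python
import numpy as np


def sample_ibp(n, alpha, rng):
    """Sample an n x K binary matrix from the one-parameter Indian buffet process."""
    counts = []  # counts[j] = number of customers who have chosen dish j so far
    rows = []    # rows[i] = indices of the dishes chosen by customer i + 1
    for i in range(1, n + 1):
        # Existing dish j is chosen with probability m_{i-1,j} / i.
        chosen = [j for j, m in enumerate(counts) if rng.random() < m / i]
        # New dishes: Poisson(alpha / i); for i = 1 this is Poisson(alpha).
        k_new = int(rng.poisson(alpha / i))
        chosen.extend(range(len(counts), len(counts) + k_new))
        counts.extend([0] * k_new)
        for j in chosen:
            counts[j] += 1
        rows.append(chosen)
    z = np.zeros((n, len(counts)), dtype=int)
    for i, chosen in enumerate(rows):
        if chosen:
            z[i, chosen] = 1
    return z


rng = np.random.default_rng(0)
z = sample_ibp(n=20, alpha=2.0, rng=rng)
print(z.shape)        # (20, number of dishes sampled)
print(z.sum(axis=1))  # number of dishes per customer
```

Columns appear in order of first selection, so the matrix is one particular labelling of the sampled multiset.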

Indian buffet process
Rich-get-richer process: more popular dishes are more likely to be chosen by new customers.
New dishes can always be picked as new customers arrive, but at a decreasing rate $\alpha/i$.
The number of features/dishes for $n$ customers follows a Poisson distribution with rate $\alpha \sum_{i=1}^{n} \frac{1}{i} \simeq \alpha \log(n)$.
The number of dishes picked by each customer (degree of a customer) follows Poisson($\alpha$).
The degree distribution of features is heavy-tailed.

[Figure: distribution of the number of occurrences; panels: degree of objects, degree of features]
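As a quick numerical illustration of the growth rate of the number of dishes, the exact Poisson rate $\alpha \sum_{i=1}^{n} 1/i$ can be compared with $\alpha \log(n)$. The function name `expected_num_features` is a hypothetical helper introduced here.

```python
import math


def expected_num_features(n, alpha):
    """Poisson rate alpha * H_n of the number of distinct dishes after n customers."""
    return alpha * sum(1.0 / i for i in range(1, n + 1))


for n in (10, 100, 1000):
    exact = expected_num_features(n, alpha=2.0)
    approx = 2.0 * math.log(n)
    print(n, round(exact, 3), round(approx, 3))
```

The gap between the two columns does not vanish: the harmonic number exceeds $\log(n)$ by a term converging to the Euler-Mascheroni constant (about 0.577), so the approximation is accurate in growth rate rather than in absolute value.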

Indian buffet process
Multiset $f_n = \{A_{n,1}, \ldots, A_{n,K_n}\}$ with $m_{n,j} = |A_{n,j}|$.
Let $\{\tilde{A}_{n,1}, \ldots, \tilde{A}_{n,\tilde{K}_n}\}$ be the set of unique values in $f_n$, and $\kappa_1, \ldots, \kappa_{\tilde{K}_n}$ be their multiplicities; then
$$\Pr(f_n) = \frac{\alpha^{K_n}}{\prod_{h=1}^{\tilde{K}_n} \kappa_h!} \, e^{-\alpha \sum_{i=1}^{n} \frac{1}{i}} \prod_{j=1}^{K_n} \frac{(m_{n,j} - 1)! \, (n - m_{n,j})!}{n!}$$
This does not depend on the ordering of the customers: exchangeable latent feature model.

How to derive the IBP?
Limit of a parametric beta-Bernoulli model
Completely random measures
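The probability of a multiset under the IBP can be evaluated directly from the block sizes and multiplicities. The sketch below works on the log scale with `math.lgamma`; the function name `ibp_log_prob` and the representation of $f_n$ as a list of Python sets are assumptions made for illustration.

```python
import math
from collections import Counter


def ibp_log_prob(blocks, n, alpha):
    """Log-probability of a feature allocation under the one-parameter IBP.

    blocks: iterable of non-empty sets of object labels in {1, ..., n},
            one per feature (repeated sets are allowed).
    """
    blocks = [frozenset(b) for b in blocks]
    harmonic = sum(1.0 / i for i in range(1, n + 1))
    logp = len(blocks) * math.log(alpha) - alpha * harmonic
    # Divide by the factorials of the multiplicities kappa_h of identical blocks.
    for kappa in Counter(blocks).values():
        logp -= math.lgamma(kappa + 1)
    # Per-feature term (m - 1)! (n - m)! / n!.
    for b in blocks:
        m = len(b)
        logp += math.lgamma(m) + math.lgamma(n - m + 1) - math.lgamma(n + 1)
    return logp


# Exchangeability check on the image example: relabel objects 2->3, 3->4, 4->5, 5->1.
f5 = [{2, 3, 4}, {2, 4}, {5}, {5}]
perm = {1: 2, 2: 3, 3: 4, 4: 5, 5: 1}
f5_perm = [{perm[i] for i in b} for b in f5]
print(ibp_log_prob(f5, 5, 1.5), ibp_log_prob(f5_perm, 5, 1.5))
```

Because the formula depends on $f_n$ only through the block sizes and the multiplicities, both calls return the same value.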

Parametric beta-Bernoulli model
Binary matrix $z = (z_{i,j})$ of size $n \times p$.
For $j = 1, \ldots, p$:
$$\pi_j \sim \text{Beta}\left(\frac{\alpha}{p}, 1\right)$$
For $i = 1, \ldots, n$ and $j = 1, \ldots, p$:
$$z_{i,j} \mid \pi_j \sim \text{Ber}(\pi_j)$$
[Figure: sampled binary matrices for two values of $p$, panels (a) and (b)]
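The parametric model is straightforward to simulate. This is a minimal sketch with NumPy; the function name `sample_beta_bernoulli` is hypothetical. Under the model, each row has on average $p \cdot \frac{\alpha/p}{\alpha/p + 1} = \frac{\alpha}{1 + \alpha/p}$ non-zero entries, which approaches $\alpha$ as $p$ grows.

```python
import numpy as np


def sample_beta_bernoulli(n, p, alpha, rng):
    """Sample the n x p binary matrix z and the feature probabilities pi."""
    pi = rng.beta(alpha / p, 1.0, size=p)      # pi_j ~ Beta(alpha / p, 1)
    z = (rng.random((n, p)) < pi).astype(int)  # z_ij | pi_j ~ Ber(pi_j)
    return z, pi


rng = np.random.default_rng(0)
z, pi = sample_beta_bernoulli(n=10, p=500, alpha=3.0, rng=rng)
print(z.sum(axis=1))                 # features per object
print((z.sum(axis=0) > 0).sum())     # number of non-empty columns
```

Most columns are all-zero for large $p$; only the non-empty ones contribute to the multiset.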

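Parametric beta-Bernoulli model
$$\Pr(z) = \prod_{j=1}^{p} \int_0^1 \prod_{i=1}^{n} \pi_j^{z_{ij}} (1 - \pi_j)^{1 - z_{ij}} \, \text{Beta}(\pi_j; \alpha/p, 1) \, d\pi_j$$
$$= \prod_{j=1}^{p} \int_0^1 \pi_j^{\sum_i z_{ij}} (1 - \pi_j)^{n - \sum_i z_{ij}} \, \text{Beta}(\pi_j; \alpha/p, 1) \, d\pi_j$$
$$= \prod_{j=1}^{p} \frac{B\left(\sum_i z_{ij} + \alpha/p, \; n - \sum_i z_{ij} + 1\right)}{B(\alpha/p, 1)}$$
$$= \prod_{j=1}^{p} \frac{\frac{\alpha}{p} \, \Gamma\left(\sum_i z_{ij} + \alpha/p\right) \Gamma\left(n - \sum_i z_{ij} + 1\right)}{\Gamma(n + 1 + \alpha/p)}$$
where $B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}$ is the beta function, using $\Gamma(a + 1) = a\Gamma(a)$.

Let $f_n = \text{multiset}(z)$ denote the multiset corresponding to $z$:
$$\text{multiset}(z) = \left\{ \{i \mid z_{ij} = 1\}, \; j = 1, \ldots, p \text{ s.t. } \sum_i z_{ij} > 0 \right\}$$
Many matrices $z$ correspond to the same multiset.
Let $E(f_n) = \{z \mid f_n = \text{multiset}(z)\}$ be the set of matrices corresponding to the same multiset $f_n$.
Cardinality of $E(f_n)$:
$$|E(f_n)| = \frac{p!}{\kappa_0! \prod_{h=1}^{\tilde{K}_n} \kappa_h!}$$
where $\kappa_0$ is the number of all-zero columns.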

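Parametric beta-Bernoulli model
Due to column exchangeability, all matrices $z \in E(f_n)$ have the same probability, so
$$\Pr(f_n) = \sum_{z \in E(f_n)} \Pr(z) = \frac{p!}{\kappa_0! \prod_{h=1}^{\tilde{K}_n} \kappa_h!} \prod_{j=1}^{K_n} \frac{\frac{\alpha}{p} \, \Gamma(m_{n,j} + \alpha/p) \, \Gamma(n - m_{n,j} + 1)}{\Gamma(n + 1 + \alpha/p)} \left( \frac{\frac{\alpha}{p} \, \Gamma(\alpha/p) \, \Gamma(n + 1)}{\Gamma(n + 1 + \alpha/p)} \right)^{\kappa_0}$$
Writing $\frac{\alpha}{p}\Gamma(\alpha/p) = \Gamma(1 + \alpha/p)$ and $\kappa_0 = p - K_n$,
$$\Pr(f_n) = \frac{\alpha^{K_n}}{\prod_{h=1}^{\tilde{K}_n} \kappa_h!} \cdot \frac{p!}{\kappa_0! \, p^{K_n}} \cdot \left( \frac{n! \, \Gamma(1 + \alpha/p)}{\Gamma(n + 1 + \alpha/p)} \right)^{p} \prod_{j=1}^{K_n} \frac{\Gamma(m_{n,j} + \alpha/p) \, (n - m_{n,j})!}{\Gamma(1 + \alpha/p) \, n!}$$

Taking the limit as $p \to \infty$, factor by factor:
$$\frac{p!}{\kappa_0! \, p^{K_n}} \to 1, \qquad \left( \frac{n! \, \Gamma(1 + \alpha/p)}{\Gamma(n + 1 + \alpha/p)} \right)^{p} \to e^{-\alpha \sum_{i=1}^{n} 1/i}, \qquad \frac{\Gamma(m_{n,j} + \alpha/p) \, (n - m_{n,j})!}{\Gamma(1 + \alpha/p) \, n!} \to \frac{(m_{n,j} - 1)! \, (n - m_{n,j})!}{n!}$$
so that
$$\Pr(f_n) \to \frac{\alpha^{K_n}}{\prod_{h=1}^{\tilde{K}_n} \kappa_h!} \, e^{-\alpha \sum_{i=1}^{n} 1/i} \prod_{j=1}^{K_n} \frac{(m_{n,j} - 1)! \, (n - m_{n,j})!}{n!}$$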
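The limit can be checked numerically by evaluating the finite-$p$ probability of a multiset for increasing $p$ and comparing it with the IBP formula. This is a sketch under stated assumptions: the function names are hypothetical, multisets are represented as lists of Python sets, and the all-zero-column factor is written as $\Gamma(1 + \alpha/p)\, n! / \Gamma(n + 1 + \alpha/p)$, which equals $\frac{\alpha}{p}\Gamma(\alpha/p)\Gamma(n+1)/\Gamma(n+1+\alpha/p)$.

```python
import math
from collections import Counter


def finite_log_prob(blocks, n, alpha, p):
    """Log Pr(f_n) under the parametric beta-Bernoulli model with p columns."""
    blocks = [frozenset(b) for b in blocks]
    a = alpha / p
    kappa0 = p - len(blocks)  # number of all-zero columns
    # log of p! / (kappa_0! * prod_h kappa_h!)
    logp = math.lgamma(p + 1) - math.lgamma(kappa0 + 1)
    for kappa in Counter(blocks).values():
        logp -= math.lgamma(kappa + 1)
    # Non-empty columns.
    for b in blocks:
        m = len(b)
        logp += (math.log(a) + math.lgamma(m + a)
                 + math.lgamma(n - m + 1) - math.lgamma(n + 1 + a))
    # All-zero columns (m = 0).
    logp += kappa0 * (math.lgamma(1 + a) + math.lgamma(n + 1)
                      - math.lgamma(n + 1 + a))
    return logp


def ibp_log_prob(blocks, n, alpha):
    """Log Pr(f_n) under the one-parameter IBP (the p -> infinity limit)."""
    blocks = [frozenset(b) for b in blocks]
    logp = len(blocks) * math.log(alpha)
    logp -= alpha * sum(1.0 / i for i in range(1, n + 1))
    for kappa in Counter(blocks).values():
        logp -= math.lgamma(kappa + 1)
    for b in blocks:
        m = len(b)
        logp += math.lgamma(m) + math.lgamma(n - m + 1) - math.lgamma(n + 1)
    return logp


f5 = [{2, 3, 4}, {2, 4}, {5}, {5}]
target = ibp_log_prob(f5, n=5, alpha=1.5)
for p in (10, 100, 10_000, 1_000_000):
    print(p, finite_log_prob(f5, n=5, alpha=1.5, p=p) - target)
```

The printed differences should shrink towards zero as $p$ grows, in line with the factor-by-factor limits.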

Beta-Bernoulli process
Now assume that each feature $j = 1, \ldots, K_n$ has some location $\theta_{n,j}$ in a feature space $\Theta$.
Feature locations are assumed to be i.i.d. from some distribution $G_0$ (density $g_0$).
Represent the feature model as a collection of point processes
$$Z_i = \sum_{j} z_{ij} \, \delta_{\theta_j}$$
where $\delta_a$ is the Dirac delta mass at $a$ and $z_{ij} = 1$ if object $i$ possesses feature $\theta_j$.
$$\{\theta_{n,j}\} = \{\theta_k \mid \exists \, i \in [n] \text{ s.t. } z_{ik} > 0\}$$

Beta-Bernoulli process
Let $f_n(Z_1, \ldots, Z_n)$ be the multiset induced by the point processes:
$$f_n(Z_1, \ldots, Z_n) = \{ \{i \mid Z_i(\{\theta_{n,j}\}) = 1\}, \; j = 1, \ldots, K_n \}$$
The distribution over $(Z_i)_{i=1,\ldots,n}$ is obtained by setting independent priors over the feature allocations and their locations:
$$p(Z_1, \ldots, Z_n) = \Pr(f_n(Z_1, \ldots, Z_n)) \prod_{j=1}^{K_n} g_0(\theta_{n,j})$$
Using the IBP prior for the feature allocations:
$$p(Z_1, \ldots, Z_n) = \frac{\alpha^{K_n}}{\prod_{h=1}^{\tilde{K}_n} \kappa_h!} \, e^{-\alpha \sum_{i=1}^{n} \frac{1}{i}} \prod_{j=1}^{K_n} g_0(\theta_{n,j}) \prod_{j=1}^{K_n} \frac{(m_{n,j} - 1)! \, (n - m_{n,j})!}{n!}$$

Exchangeability over the feature allocations $f_n$ carries over to $(Z_i)_{i=1,\ldots,n}$.
Infinite exchangeability: for any $n \ge 1$ and any permutation $\sigma$ of $[n]$,
$$p(Z_1, \ldots, Z_n) = p(Z_{\sigma(1)}, \ldots, Z_{\sigma(n)})$$
The de Finetti representation theorem implies
$$p(Z_1, \ldots, Z_n) = \int \prod_{i=1}^{n} p(Z_i \mid B) \, P(dB)$$
where $B$ is some latent process with distribution $P$.
de Finetti measure $P(dB)$: beta process [Hjort, 1990; Thibaux and Jordan, 2007].

Beta-Bernoulli process
Let $B = \sum_{j} \pi_j \, \delta_{\theta_j}$ be a completely random measure characterized by its Lévy measure
$$\nu(d\pi, d\theta) = \alpha \, \pi^{-1} (1 - \pi)^{\alpha - 1} \, d\pi \, G_0(d\theta)$$
defined on $[0, 1] \times \Theta$.
$B$ is called a beta process and we write $B \sim \text{BetaP}(\alpha, G_0)$.
A draw from a beta process is discrete a.s. with an infinite number of atoms [Hjort, 1990].

[Figure: beta process; Lévy intensity and stick weights over the feature space $\Theta$]

Beta-Bernoulli process
Conditional Bernoulli process:
$$Z_i \mid B \sim \text{BeP}(B), \qquad Z_i = \sum_{j} z_{ij} \, \delta_{\theta_j} \text{ where } z_{ij} \sim \text{Ber}(\pi_j)$$

[Figure: stick weights of $B$ and the Bernoulli process draws $Z$ for the objects, over the feature space $\Theta$]

Beta-Bernoulli process: Conjugacy
Let $\theta_{n,1}, \ldots, \theta_{n,K_n}$ be the support points of $Z_1, \ldots, Z_n$ and $m_{n,j}$ their numbers of occurrences.
Posterior:
$$B \mid Z_1, \ldots, Z_n \sim \text{BetaP}\left( \alpha + n, \; \frac{\alpha}{\alpha + n} G_0 + \sum_{j=1}^{K_n} \frac{m_{n,j}}{\alpha + n} \, \delta_{\theta_{n,j}} \right)$$
Predictive distribution:
$$Z_{n+1} \mid Z_1, \ldots, Z_n \sim \text{BeP}\left( \frac{\alpha}{\alpha + n} G_0 + \sum_{j=1}^{K_n} \frac{m_{n,j}}{\alpha + n} \, \delta_{\theta_{n,j}} \right)$$
[Hjort, 1990; Kim, 1999; Thibaux and Jordan, 2007]

Chinese restaurant vs Indian buffet
Application             Clustering                  Latent feature
Combinatorial object    Partition                   Multiset
Generative model        Chinese restaurant proc.    Indian buffet proc.
de Finetti measure      Dirichlet process           beta process
Stick-breaking          Yes                         Yes
Conjugacy               Yes                         Yes
Power-law extensions    Pitman-Yor                  stable beta process

Inference
Latent variable model: data $X$ of size $n \times d$.
(Marginal) likelihood:
$$\Pr(X \mid f_n) = \int_{\Theta} \Pr(X \mid f_n, \theta) \, P(\theta) \, d\theta$$
Prior: $\Pr(f_n)$
Posterior:
$$\Pr(f_n \mid X) \propto \Pr(X \mid f_n) \Pr(f_n)$$
Inference under the IBP prior can be carried out using
MCMC with Metropolis-Hastings within Gibbs updates
Sequential Monte Carlo
[Meeds et al., 2007; Wood and Griffiths, 2007]

Stable Indian buffet process
Three parameters: $\alpha > 0$, $\sigma \in [0, 1)$ and $c > -\sigma$.
The first customer picks $K_1^+ \sim \text{Poisson}(\alpha)$ dishes.
Then each customer $i = 2, \ldots$
chooses each dish $j$ previously chosen $m_{i-1,j}$ times with probability $\frac{m_{i-1,j} - \sigma}{c + i - 1}$;
picks an additional set of dishes
$$K_i^+ \sim \text{Poisson}\left( \alpha \, \frac{\Gamma(1 + c) \, \Gamma(i - 1 + c + \sigma)}{\Gamma(i + c) \, \Gamma(c + \sigma)} \right)$$
Reduces to the one-parameter IBP when $c = 1$ and $\sigma = 0$.
[Teh and Görür, 2009]
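The three-parameter scheme can be sketched in the same way as the one-parameter process. The names `new_dish_rate` and `sample_stable_ibp` are hypothetical; the rate is computed on the log scale with `math.lgamma` to avoid overflow for large $i$.

```python
import math

import numpy as np


def new_dish_rate(i, alpha, sigma, c):
    """Poisson rate of the number of new dishes picked by customer i."""
    log_ratio = (math.lgamma(1 + c) + math.lgamma(i - 1 + c + sigma)
                 - math.lgamma(i + c) - math.lgamma(c + sigma))
    return alpha * math.exp(log_ratio)


def sample_stable_ibp(n, alpha, sigma, c, rng):
    """Sample an n x K binary matrix from the stable (three-parameter) IBP."""
    counts, rows = [], []
    for i in range(1, n + 1):
        # Existing dish j is chosen with probability (m_{i-1,j} - sigma) / (c + i - 1).
        chosen = [j for j, m in enumerate(counts)
                  if rng.random() < (m - sigma) / (c + i - 1)]
        k_new = int(rng.poisson(new_dish_rate(i, alpha, sigma, c)))
        chosen.extend(range(len(counts), len(counts) + k_new))
        counts.extend([0] * k_new)
        for j in chosen:
            counts[j] += 1
        rows.append(chosen)
    z = np.zeros((n, len(counts)), dtype=int)
    for i, chosen in enumerate(rows):
        if chosen:
            z[i, chosen] = 1
    return z


rng = np.random.default_rng(0)
for sigma in (0.0, 0.5, 0.9):
    z = sample_stable_ibp(n=200, alpha=2.0, sigma=sigma, c=1.0, rng=rng)
    print(sigma, z.shape[1])  # number of dishes sampled for this value of sigma
```

With $c = 1$ and $\sigma = 0$, `new_dish_rate(i, alpha, 0, 1)` equals $\alpha / i$ and the selection probability becomes $m_{i-1,j}/i$, recovering the one-parameter process.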

Stable Indian buffet process
[Figure: sampled binary matrices of objects against features for three values of sigma between 0 and 0.9]

Power-law behavior for $\sigma > 0$:
The number of features grows in $O(n^{\sigma})$.
The proportion of features associated to $m$ objects is, for $n \gg m$ large, in $O\left(\frac{1}{m^{1+\sigma}}\right)$.
Similar to the Pitman-Yor process for mixture models.

Stable Indian buffet process
[Figure: distribution of the degree of features, and number of features against number of objects, for three values of $\sigma$ between 0 and 0.9]

Properties of the Indian buffet
              Nb of features per object                         Overall nb of features      Prop. of features associated to m objects
IBP           Poisson                                           O(log(n))
stable IBP    Poisson                                           O(n^σ)                      Power-law behavior (σ > 0)
latent IBP    Poisson (different rates) or mixture of Poisson   O(log(n)) or O(n^σ)         Power-law behavior

In the IBP, all objects have marginally Poisson($\alpha$) features.
One may want to:
Relax the exchangeability assumption: some objects are a priori likely to have more features than others.
Relax the Poisson assumption: the distribution of the number of features per object may have heavier tails than Poisson.

Shared pattern in time series [Fox et al., 2009]

Book-crossing community network
000 readers, 36 000 books, 0 000 edges

Degree distributions on log-log scale
[Figure: degree distributions; panels (c) Readers, (d) Books]

Hierarchical model
Collection of atomic measures $Z_i$, $i = 1, 2, \ldots$
$$Z_i = \sum_{j} z_{ij} \, \delta_{\theta_j}$$
$z_{ij} = 1$ if reader $i$ has read book $j$, 0 otherwise; $\{\theta_j\}$ is the set of books.
Each book $j$ is assigned a positive popularity parameter $w_j$.
Each reader $i$ is assigned a positive interest-in-reading parameter $\gamma_i$.
The probability that reader $i$ reads book $j$ is
$$P(z_{ij} = 1 \mid \gamma_i, w_j) = 1 - \exp(-w_j \gamma_i)$$
[Caron, 2012]

Data augmentation
Latent variable formulation: latent scores $s_{ij} \sim \text{Gumbel}(\log(w_j), 1)$.
All books with a score above $-\log(\gamma_i)$ are retained, others are discarded.
[Figure: book scores against book popularity, with threshold $-\log(\gamma_i)$]
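The equivalence between the Gumbel score formulation and the reading probability $1 - \exp(-w_j \gamma_i)$ can be checked by simulation. This sketch assumes the maximum-type Gumbel distribution with CDF $\exp(-e^{-(s - \mu)})$, which is the convention used by NumPy's `Generator.gumbel`; the function names are hypothetical.

```python
import numpy as np


def read_probability(w, gamma):
    """P(z_ij = 1 | gamma_i, w_j) = 1 - exp(-w_j * gamma_i)."""
    return 1.0 - np.exp(-w * gamma)


def simulate_reads(w, gamma, size, rng):
    """Draw latent scores s ~ Gumbel(log w, 1) and keep those above -log(gamma)."""
    scores = rng.gumbel(loc=np.log(w), scale=1.0, size=size)
    return scores > -np.log(gamma)


rng = np.random.default_rng(0)
w, gamma = 0.7, 2.0
empirical = simulate_reads(w, gamma, size=200_000, rng=rng).mean()
print(empirical, read_probability(w, gamma))
```

The check follows from $P(s_{ij} > -\log \gamma_i) = 1 - \exp(-e^{\log \gamma_i + \log w_j}) = 1 - \exp(-w_j \gamma_i)$, so the two printed values should agree up to Monte Carlo error.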

Model for the book popularity parameters
Random atomic measure
$$G = \sum_{j} w_j \, \delta_{\theta_j}$$
Construction: two-dimensional Poisson process $N = \{w_j, \theta_j\}_{j=1,\ldots}$
Generalized gamma process $G \sim \text{GGP}(\alpha, \sigma, \tau, h)$, characterized by a Lévy measure $\lambda(w) h(\theta) \, dw \, d\theta$ with
$$\lambda(w) = \frac{\alpha}{\Gamma(1 - \sigma)} \, w^{-\sigma - 1} e^{-\tau w}$$
$\int_0^{\infty} (1 - e^{-w}) \lambda(w) \, dw < \infty$, which gives a finite total $\sum_j z_{ij}$.
[Kingman, 1967; Brix, 1999; Regazzini et al., 2003; Lijoi and Prünster, 2010]

Posterior characterization
Observed $Z_1, \ldots, Z_n$: $K_n$ books at locations $\theta_{n,j}$, read $m_{n,j}$ times.
One cannot derive directly the conditional of $G$ given $Z_1, \ldots, Z_n$, nor the predictive of $Z_{n+1}$ given $Z_1, \ldots, Z_n$.
Let $X_i = \sum_j x_{ij} \, \delta_{\theta_j}$ where $x_{ij} = \max(0, s_{ij} + \log(\gamma_i)) \ge 0$ are latent positive scores.
[Figure: book scores and censored scores, with threshold $-\log(\gamma_i)$]

Posterior characterization
The conditional distribution of $G$ given $X_1, \ldots, X_n$ can be expressed as
$$G = G^* + \sum_{j=1}^{K_n} w_j \, \delta_{\theta_{n,j}}$$
where $G^*$ and $(w_j)$ are mutually independent with
$$G^* \sim \text{GGP}\left( \alpha, \sigma, \tau + \sum_{i=1}^{n} \gamma_i, h \right)$$
and the masses are
$$w_j \mid \text{other} \sim \text{Gamma}\left( m_{n,j} - \sigma, \; \tau + \sum_{i=1}^{n} \gamma_i e^{-x_{ij}} \right)$$
Characterization related to that for normalized random measures [Prünster, 2002; James, 2002; James et al., 2009].

Indian buffet process with latent scores
Predictive distribution of $Z_{n+1}$ given the latent process $X_1, \ldots, X_n$.
[Figure: books chosen by successive readers]
[Caron, 2012]

Prior draws
Generalized gamma process with $\tau = 1$, $\gamma_i = 2$.
[Figure: sampled readers-by-books matrices. Panels (e) to (g): $\sigma = 0$ and three values of $\alpha$, including $\alpha = 1$. Panels (h) to (j): $\alpha = 2$ and three values of $\sigma$ from 0.1 to 0.9]
[Brix, 1999; Lijoi et al., 2007]

Properties of the model
Power-law behavior for the generalized gamma process with $\sigma > 0$:
The total number of books read by $n$ readers is $O(n^{\sigma})$.
Asymptotically, the proportion of books read by $m$ readers is $O(m^{-1-\sigma})$.

Properties of the model
(Stable) beta-Bernoulli/Indian buffet process:
$$G \sim \text{stableBetaP}, \qquad Z_i \mid G \sim \text{BeP}(G)$$
Special case of the latent IBP model when $\gamma_i = \gamma$ and
$$\lambda(w) = \frac{\alpha \, \Gamma(1 + c)}{\Gamma(1 - \sigma) \, \Gamma(\sigma + c)} \, \gamma \, (1 - e^{-\gamma w})^{-\sigma - 1} e^{-\gamma w (c + \sigma)}$$
In this case one can marginalize out the latent variables in the predictive distribution to obtain the (stable) Indian buffet process.

Model for the interest in reading parameters
Fixed $\gamma_i$: Poisson degree distribution for readers, with different rates:
$$\text{Poisson}\left( \frac{\alpha}{\sigma} \left( (\tau + \gamma_i)^{\sigma} - \tau^{\sigma} \right) \right)$$
Random $\gamma_i$: conjugate gamma prior $\gamma_i \sim \text{Gamma}(a_\gamma, b_\gamma)$.
The degree of readers is then a mixture of Poissons (heavier tails).

Bibliography
Brix, A. (1999). Generalized gamma measures and shot-noise Cox processes. Advances in Applied Probability, 31(4).
Broderick, T., Jordan, M. I., and Pitman, J. (2013a). Cluster and feature modeling from combinatorial stochastic processes. Statistical Science, 28(3).
Broderick, T., Pitman, J., and Jordan, M. I. (2013b). Feature allocations, probability functions, and paintboxes. Bayesian Analysis, 8(4).
Caron, F. (2012). Bayesian nonparametric models for bipartite graphs. In NIPS.
Fox, E., Sudderth, E., Jordan, M., and Willsky, A. (2009). Sharing features among dynamical systems with beta processes. In NIPS, volume 22.
Griffiths, T. and Ghahramani, Z. (2005). Infinite latent feature models and the Indian buffet process. In NIPS.
Griffiths, T. and Ghahramani, Z. (2011). The Indian buffet process: an introduction and review. Journal of Machine Learning Research, 12(April).
Hjort, N. (1990). Nonparametric Bayes estimators based on beta processes in models for life history data. The Annals of Statistics, 18(3).
James, L., Lijoi, A., and Prünster, I. (2009). Posterior analysis for normalized random measures with independent increments. Scandinavian Journal of Statistics, 36(1).
James, L. F. (2002). Poisson process partition calculus with applications to exchangeable models and Bayesian nonparametrics. arXiv preprint math/0205093.
Kim, Y. (1999). Nonparametric Bayesian estimators for counting processes. The Annals of Statistics.
Kingman, J. (1967). Completely random measures. Pacific Journal of Mathematics, 21(1):59-78.

Bibliography (continued)
Lijoi, A., Mena, R. H., and Prünster, I. (2007). Controlling the reinforcement in Bayesian non-parametric mixture models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4).
Lijoi, A. and Prünster, I. (2010). Models beyond the Dirichlet process. In Hjort, N. L., Holmes, C., Müller, P., and Walker, S. G., editors, Bayesian Nonparametrics. Cambridge University Press.
Meeds, E., Ghahramani, Z., Neal, R., and Roweis, S. (2007). Modeling dyadic data with binary latent factors. In NIPS, volume 19, page 977. MIT Press.
Prünster, I. (2002). Random probability measures derived from increasing additive processes and their application to Bayesian statistics. PhD thesis, University of Pavia.
Regazzini, E., Lijoi, A., and Prünster, I. (2003). Distributional results for means of normalized random measures with independent increments. The Annals of Statistics, 31(2):560-585.
Teh, Y. and Görür, D. (2009). Indian buffet processes with power-law behavior. In NIPS.
Thibaux, R. and Jordan, M. (2007). Hierarchical beta processes and the Indian buffet process. In International Conference on Artificial Intelligence and Statistics, volume 11.
Wood, F. and Griffiths, T. L. (2007). Particle filtering for nonparametric Bayesian matrix factorization. In Advances in Neural Information Processing Systems, volume 19, page 1513. MIT Press.
