Infinite latent feature models and the Indian Buffet Process

Size: px

Start display at page:

Download "Infinite latent feature models and the Indian Buffet Process"

Collin Campbell
5 years ago
Views:

1 p.1 Infinite latent feature models and the Indian Buffet Process Tom Griffiths Cognitive and Linguistic Sciences Brown University Joint work with Zoubin Ghahramani

2 p.2 Beyond latent classes Unsupervised learning often uses latent classes Many domains require richer representations membership in multiple latent classes more generally, latent features How do we choose the dimensionality of representations? problem of model selection

3 p.3 Perspectives on model selection Compare multiple models of different dimensionality Bayes factors, cross-validation, etc. (usually) false assumption of fixed dimension hard to apply to large model spaces Define a single model of unbounded dimensionality posterior on dimensionality via posterior on parameters allows dimensionality to grow with new data pursused in nonparametric Bayesian density estimation (e.g., Antoniak, 1974; Escobar & West, 1995)

4 p.4 Latent class models Associate each datapoint x i with a latent class z i data matrix X = [x 1...x N ] T class vector z = [z 1...z N ] T Model defined by prior on class assignments P(z) likelihood p(x z) How do we choose the number of classes? Nonparametric Bayes: define P(z) to allow infinitely many classes, of which a finite subset are used

5 p.5 Chinese restaurant process (CRP) Chinese restaurant with infinitely many infinite tables N customers sit down the first customer sits at the first table the ith customer chooses a table at random P(occupied table k previous customers) = P(next unoccupied table previous customers) = m k α+i 1 α α+i 1

6 p.5 Chinese restaurant process (CRP) Defines a distribution over partitions e.g., ( ) (2 5 10) (6) (7 9) Customers are exchangeable (Aldous, 1985; Pitman, 2002) P(z) = α K + K + (m k 1)! k=1 Γ(α) Γ(N + α)

7 p.6 CRP and mixture modeling β1 β 10 β β β Each table k corresponds to a mixture component associated with a parameter β k drawn from a prior e.g., Gaussian CRP mixture model: z CRP(α) x i z i,β Gaussian(β zi,σ X ) β k Gaussian(0,σ β )

8 p.6 CRP and mixture modeling β1 β 10 β β β Given data x, posterior on z gives # of classes (# of occupied tables) which data are assigned to each class parameter for each class, p(β k data assigned to table k) Posterior inference via Gibbs sampling (e.g., Neal, 1998)

9 p.7 Gibbs sampling Sequentially sample class assignments (for each i) CRP provides P(z i z i ) P(z i x,z i ) p(x i x i,z)p(z i z i ) P(z i = occupied class k z i ) = m k, i α+n 1 P(z i = new class z i ) = α α+n 1 Allows datapoints to come from new classes Also split-merge algorithms (Jain & Neal, 2000; Dahl, 2003)

10 p.8 Beyond latent classes The CRP allows number of classes to be inferred a prior on class assignments of unbounded dimension distribution over partitions Can we apply a parallel strategy with other representations? Infinite latent feature models a prior on feature assignments of unbounded dimension distribution over binary matrices

11 p.9 Different feature representations K features N objects Binary features

12 p.9 Different feature representations K features objects N Binary features Factorial features

13 p.9 Different feature representations K features objects N Binary features Factorial features Continuous features

14 p.10 Latent feature models Associate each datapoint x i with a latent feature vector z i data matrix X = [x 1...x N ] T feature matrix Z = [z 1...z N ] T Model defined by prior on feature assignments P(Z) likelihood p(x Z) How do we choose the number of features? Nonparametric Bayes: define P(Z) to allow infinitely many features, of which a finite subset are used

15 p.11 Priors on binary matrices Start with priors on N K matrices, take K Two cases: class matrices : one 1 per row feature matrices : general binary matrices Two priors: the Chinese restaurant process the Indian buffet process

16 p.12 Class matrices lof z i θ Discrete(θ) θ Dirichlet( α K )

17 p.12 Class matrices lof P(Z) = N P(z i θ)p(θ) dθ i=1

18 p.13 Left-ordered form lof History h of each class: binary column vector lof orders columns by values of binary histories

19 p.14 lof equivalence classes lof X and Y are lof equivalent iff lof(x) = lof(y) Class matrices: lof equivalence classes are partitions

20 p.15 lof equivalence classes lof lim k P([Z]) = αk + K + (m k 1)! k=1 Γ(α) Γ(N + α) (see also Green & Richardson, 2001; Neal, 1992)

21 p.16 Feature matrices For general binary matrices z ik Bernoulli(θ k ) θ k Beta( α K,1)

22 p.16 Feature matrices For general binary matrices z ik Bernoulli(θ k ) θ k Beta( α K,1) For a finite matrix Z P(Z) = K P(Z θ 1,...,θ k ) p(θ k ) dθ k k=1

23 Feature matrices For general binary matrices z ik Bernoulli(θ k ) θ k Beta( α K,1) For a finite matrix Z P(Z) = K P(Z θ 1,...,θ k ) p(θ k ) dθ k k=1 Taking the limit as K... P([Z]) = exp{ α N i=1 1 i } α K+ h>0 K h! (N m k )!(m k 1)! N! k K + p.16

24 p.17 Indian buffet process (IBP) Indian restaurant with infinitely many infinite dishes N customers serve themselves the first customer samples Poisson(α) dishes the ith customer samples a previously sampled dish with probability m k i+1 then samples Poisson( α i ) new dishes

25 p.17 Indian buffet process (IBP) Indian restaurant with infinitely many infinite dishes N customers serve themselves the first customer samples Poisson(α) dishes the ith customer samples a previously sampled dish with probability m k i+1 then samples Poisson( α i ) new dishes

26 p.17 Indian buffet process (IBP) Indian restaurant with infinitely many infinite dishes N customers serve themselves the first customer samples Poisson(α) dishes the ith customer samples a previously sampled dish with probability m k i+1 then samples Poisson( α i ) new dishes

27 p.17 Indian buffet process (IBP) Indian restaurant with infinitely many infinite dishes N customers serve themselves the first customer samples Poisson(α) dishes the ith customer samples a previously sampled dish with probability m k i+1 then samples Poisson( α i ) new dishes

28 p.17 Indian buffet process (IBP) Indian restaurant with infinitely many infinite dishes N customers serve themselves the first customer samples Poisson(α) dishes the ith customer samples a previously sampled dish with probability m k i+1 then samples Poisson( α i ) new dishes

29 p.18 Properties of the IBP Customers are exchangeable Total number of dishes K + Poisson(α N i=1 1 i ) i.e. K + as N, as with the CRP Number of dishes sampled by each customer Poisson(α) sparsity is coupled to dimension Expected number of non-zero entries in Z is Nα

30 p.19 A linear-gaussian model Likelihood P(X Z) specified by x i Gaussian(z i A,σ X I) A Gaussian(0,σ A I) For Z CRP(α), spherical Gaussian mixture model For Z IBP(α), binary latent factor model Compute posterior distribution P(Z X)

31 p.20 Gibbs sampling Sequentially sample feature assignments P(z ik X,z ( i)k ) p(x i X i,z)p(z ik z ( i)k ) IBP provides P(z ik X,z ( i)k ) P(z ik z ( i)k ) = m k, i N Draw Poisson( α N ) new features

32 p.21 Coding for the presence of objects Photographs of everyday objects taken with a webcam 100 images, each pixels Each image contained from 1 to 4 (fixed position) objects

33 p.21 Coding for the presence of objects (Positive) (Negative) (Negative) (Negative) K

34 p.22 Conclusion Strategy for model selection from nonparametric Bayes: prior over combinatorial structures of variable dimension Structure Distribution Models partitions CRP infinite mixture models binary matrices IBP infinite binary latent factors ( Z) infinite CVQ ( R) infinite sparse PCA Other exchangeable distributions?

Infinite Latent Feature Models and the Indian Buffet Process

Infinite Latent Feature Models and the Indian Buffet Process Thomas L. Griffiths Cognitive and Linguistic Sciences Brown University, Providence RI 292 tom griffiths@brown.edu Zoubin Ghahramani Gatsby Computational