Collapsed Variational Inference for HDP

Size: px

Start display at page:

Download "Collapsed Variational Inference for HDP"

Alaina Cobb
5 years ago
Views:

1 Collapse Variational Inference for HDP Yee W. Teh Davi Newman an Max Welling Publishe on NIPS 2007 Discussion le by Iulian Pruteanu

2 Outline Introuction Hierarchical Bayesian moel for LDA Collapse VB inference for HDP = CV-HDP Experiments Discussion

3 Introuction For Dirichlet-multinomial moels such as LDA or HDP the inference metho of choice is typically collapse Gibbs sampling but it seems to be necessary to consier alternatives to sampling. Teh et. al. NIPS 2006 propose an improve VB approximation for LDA CV-LDA base on the iea of collapsing integrating out moel parameters while assuming other latent variables inepenent. Previous wor on collapse variational Latent Dirichlet Allocation LDA i not consier moel selection an inference for hyperparameters. Avantages of CV-HDP over CV-LDA: the optimal number of variational components is not finite the number of topics is unlimite; the posterior istribution over hyperparameters of Dirichlet variables is treate exactly. The algorithm is fully Bayesian; the only assumptions mae are inepenencies among latent topic variables an hyperparameters. CVB algorithm maing use of some approximations is easy to implement an more accurate than stanar VB by collapsing moel variables the uncertainty upon the moel is reuce.

4 Hierarchical Bayesian moel for LDA1/3 Stic π θ i φ a Dir 1... x i i 1...n 1...D

5 Hierarchical Bayesian moel for LDA2/3 θ φ x { x i } { i } { } { w } - observe wors - latent variables topic inices - mixing proportions - topic parameters D K - number of ocuments - number of topics φ θ j π ~ Dir π ~ Dir H base istribution x i i i θ ~ i ~ θ π Stic Dir a i π Stic θ i x i φ 1... a Dir i 1...n 1...D

6 Hierarchical Bayesian moel for LDA3/3 In the normal Dirichlet process notation we woul equivalently have: G G 0 ~ DP H ~ DP G H Dir 0 G G π Stic θ φ ~ Dir π ~ Dir ~ Dir a π Stic θ i φ 1... a Dir x i i 1...n 1...D

7 CV inference for HDP 1/3 In variational Bayesian approximation we assume a factorie form for the posterior approximating istribution. However it is not a goo assumption since changes in moel parameters on latent variables. θφ will have a consierable impact CVB is equivalent to marginaliing out the moel parameters approximating the posterior over the latent variable. θφ before The exact implementation of CVB has a close form an it seems to be computationally practical. The authors use the Gaussian approximation which wore very accurately in the CV-LDA paper as well.

8 CV inference for HDP 2/3 } : { w x i n i i w # W 1 K 1 D 1 K 1 w w w w n n n n n p π x K = inex such that K i The form of the variational posterior:

9 CV inference for HDP 3/3 Variational upates of the hyperparameters: Variational upates of auxiliary variables:

10 Experiments A: results for KOS. D=3430 ocuments; W=6906; N= wors B: results for Reuters ataset. D=8433 ocuments; W=4593; N= wors 10% for testing; 50 ranom runs C: results for 20 Newsgroups. D=4726 ocuments; W=8424; N= wors VLDA=VB-LDA GLDA=C-Gibbs-LDA GHDP=Gibbs-HDP K=truncation level #steps=#iterations*k

11 Conclusions CV-HDP presents an improvement over CV-LDA by taing the infinite topic limit in the generative moel an truncating the variational posterior an by inferring posterior istributions over the higher level variables.

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling

Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling Case Stuy : Document Retrieval Collapse Gibbs an Variational Methos for LDA Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 7 th, 0 Example