Diversifying Restricted Boltzmann Machine for Document Modeling. Pengtao Xie. Joint work with Yuntian Deng and Eric Xing Carnegie Mellon University

2 Document Modeling (example topics: Politics, Animal, Food)

3 Document Modeling

4 Restricted Boltzmann Machine

5 Popularity of Topics
Power-law distribution
Dominant topics: Politics, Economics, Sports
Long-tail topics: Garden, Animal, Furniture, Tour, Flower, Food

6 RBM is insufficient to capture long-tail topics

7 Long-tail topics are important
Their amount is large
They are arguably more interesting
Example: in advertising, a "lose weight" topic is more important than a "time" topic

8 Diversification

9 Diversity Regularized RBM
Goal: encourage the latent factors to spread out to improve the coverage of long-tail topics
Approach:
  Define a metric to measure the diversity of latent factors
  Use the diversity metric to regularize the learning of latent factors

10 Diversity Metric
Measure the dissimilarity between two vectors
Measure the diversity of a vector set

11 Dissimilarity between two vectors
Requirement: invariant to scale, translation, rotation and orientation of the two vectors
Euclidean distance, L1 distance: variant to scale
Cosine similarity: variant to orientation
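The two failure modes named on this slide can be illustrated with a small sketch in plain Python. The helper names `euclidean` and `cosine` and the example vectors are illustrative assumptions, not taken from the slides.

```python
import math

def euclidean(x, y):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine(x, y):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x, y = [1.0, 0.0], [1.0, 1.0]

# Scaling both vectors by 3 triples the Euclidean distance,
# although the directions of the vectors are unchanged.
dist_before = euclidean(x, y)
dist_after = euclidean([3 * a for a in x], [3 * b for b in y])

# Flipping the orientation of x flips the sign of the cosine similarity,
# although x and -x span the same line.
cos_before = cosine(x, y)
cos_after = cosine([-a for a in x], y)
```

Here `dist_after` is three times `dist_before`, and `cos_after` is the negative of `cos_before`, which is why neither measure satisfies the invariance requirement on its own.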

12 Dissimilarity between two vectors
Use the non-obtuse angle $\theta$ between the two vectors
Invariant to scale, translation, rotation and orientation of the two vectors
Definition: $\theta(x, y) = \arccos\left(\frac{|x \cdot y|}{\|x\|\,\|y\|}\right)$
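A minimal sketch of the non-obtuse angle in plain Python, assuming nonzero input vectors. The function name `nonobtuse_angle` is a hypothetical helper, not something named on the slides.

```python
import math

def nonobtuse_angle(x, y):
    """Angle in [0, pi/2] between the lines spanned by nonzero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # The absolute value makes the angle non-obtuse; min() guards against
    # floating-point values marginally above 1.
    return math.acos(min(1.0, abs(dot) / (norm_x * norm_y)))

x, y = [1.0, 0.0], [1.0, 1.0]
theta = nonobtuse_angle(x, y)
theta_scaled = nonobtuse_angle([5 * a for a in x], y)   # x rescaled
theta_flipped = nonobtuse_angle([-a for a in x], y)     # x with flipped orientation
```

All three angles agree (they equal pi/4 for these vectors), which illustrates the invariance to scale and orientation.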

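13 Measure the diversity of a vector set
Based on the pairwise dissimilarity measure between vectors
The diversity of a set of vectors $A = \{a_i\}_{i=1}^{K}$ is defined as
$\Omega(A) = \operatorname{mean}\left(\{\theta_{ij}\}_{i \neq j}\right) - \operatorname{var}\left(\{\theta_{ij}\}_{i \neq j}\right)$, where $\theta_{ij} = \arccos\left(\frac{|a_i \cdot a_j|}{\|a_i\|\,\|a_j\|}\right)$ for $i, j = 1, \dots, K$
Mean: summarizes how different these vectors are from each other on the whole
Variance: encourages the vectors to spread out evenly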
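The mean-minus-variance metric can be sketched as follows. This is an illustration under two assumptions that the slide does not state: the variance is the population variance, and each unordered pair of vectors is counted once (which gives the same mean and variance as counting ordered pairs). The names `nonobtuse_angle` and `diversity` are hypothetical.

```python
import math
from itertools import combinations

def nonobtuse_angle(x, y):
    """Angle in [0, pi/2] between the lines spanned by nonzero vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return math.acos(min(1.0, abs(dot) / (norm_x * norm_y)))

def diversity(vectors):
    """Mean minus population variance of the pairwise non-obtuse angles."""
    angles = [nonobtuse_angle(a, b) for a, b in combinations(vectors, 2)]
    mean = sum(angles) / len(angles)
    var = sum((t - mean) ** 2 for t in angles) / len(angles)
    return mean - var

# Mutually orthogonal vectors: every pairwise angle is pi/2, variance is 0.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Two nearly parallel vectors lower the mean and raise the variance.
clustered = [[1.0, 0.0, 0.0], [1.0, 0.1, 0.0], [0.0, 0.0, 1.0]]

score_orthogonal = diversity(orthogonal)
score_clustered = diversity(clustered)
```

The orthogonal set attains the largest possible score of pi/2, while the clustered set scores lower because one pair of vectors nearly coincides.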

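15 Optimization
Reparametrize: $A = \operatorname{diag}(g)\,\tilde{A}$, with $g_i = \|a_i\|$ and $\|\tilde{a}_i\| = 1$, so that $a_i = g_i \tilde{a}_i$
$\max_{\tilde{A},\, g}\; L(D; \operatorname{diag}(g)\tilde{A}) + \lambda\,\Omega(\tilde{A}) \quad \text{s.t.} \quad \|\tilde{a}_i\| = 1,\; g_i > 0,\; \forall i$
where $L$ is the training objective on the documents $D$, $\Omega$ is the diversity metric and $\lambda$ is the tradeoff parameter
Fix $g$, optimize $\tilde{A}$: $\max_{\tilde{A}}\; L(D; \operatorname{diag}(g)\tilde{A}) + \lambda\,\Omega(\tilde{A}) \quad \text{s.t.} \quad \|\tilde{a}_i\| = 1,\; \forall i$
Fix $\tilde{A}$, optimize $g$: $\max_{g}\; L(D; \operatorname{diag}(g)\tilde{A}) \quad \text{s.t.} \quad g_i > 0,\; \forall i$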
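The reparametrization step can be sketched with NumPy as below, assuming the latent factors are stored as the rows of a matrix and that no row is zero. The function name `reparametrize` and the example matrix are illustrative assumptions.

```python
import numpy as np

def reparametrize(A):
    """Split A (one latent factor per row) into row norms g and unit-norm rows A_tilde."""
    g = np.linalg.norm(A, axis=1)
    A_tilde = A / g[:, None]
    return g, A_tilde

A = np.array([[3.0, 4.0, 0.0],
              [0.0, 0.0, 2.0]])
g, A_tilde = reparametrize(A)

# Multiplying by diag(g) on the left rescales each row and recovers A.
A_rebuilt = np.diag(g) @ A_tilde
```

Because the angles between vectors depend only on their directions, the diversity term involves only the unit-norm rows, while the magnitudes in `g` are left to the data-fitting term.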

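16 Optimization
Fix $g$, optimize $\tilde{A}$: $\max_{\tilde{A}}\; L(D; \operatorname{diag}(g)\tilde{A}) + \lambda\,\Omega(\tilde{A}) \quad \text{s.t.} \quad \|\tilde{a}_i\| = 1,\; \forall i$
Lower bound: $\Omega(\tilde{A}) \ge \Gamma(\tilde{A}) = \arcsin\left(\sqrt{\det(\tilde{A}\tilde{A}^{T})}\right) - \left(\frac{\pi}{2} - \arcsin\left(\sqrt{\det(\tilde{A}\tilde{A}^{T})}\right)\right)^{2}$
where the rows $\tilde{a}_i$ of $\tilde{A}$ are the unit-normalized latent factors and $\Omega$ is the diversity metric
Maximize the lower bound instead: $\max_{\tilde{A}}\; L(D; \operatorname{diag}(g)\tilde{A}) + \lambda\,\Gamma(\tilde{A}) \quad \text{s.t.} \quad \|\tilde{a}_i\| = 1,\; \forall i$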
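The relation between the diversity metric and its determinant-based lower bound can be checked numerically. The sketch below assumes unit-norm rows, a population variance, and a square root inside the arcsin; the function names and the random test matrix are illustrative.

```python
import numpy as np
from itertools import combinations

def diversity(A_tilde):
    """Mean minus population variance of pairwise non-obtuse angles between unit rows."""
    cosines = np.abs(A_tilde @ A_tilde.T)
    pairs = combinations(range(len(A_tilde)), 2)
    angles = np.array([np.arccos(min(1.0, cosines[i, j])) for i, j in pairs])
    return angles.mean() - angles.var()

def lower_bound(A_tilde):
    """arcsin(sqrt(det(A A^T))) - (pi/2 - arcsin(sqrt(det(A A^T))))^2 for unit rows."""
    gram_det = np.linalg.det(A_tilde @ A_tilde.T)
    root = np.sqrt(np.clip(gram_det, 0.0, 1.0))   # clip guards against rounding error
    u = np.arcsin(root)
    return u - (np.pi / 2 - u) ** 2

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 10))                      # 4 hypothetical latent factors in R^10
A_tilde = A / np.linalg.norm(A, axis=1, keepdims=True)

metric_value = diversity(A_tilde)
bound_value = lower_bound(A_tilde)
```

For unit-norm rows, `bound_value` does not exceed `metric_value`: every pairwise angle is at least the arcsin term, so the mean is at least that term and the variance is at most the squared gap to pi/2.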

17 Theorem
Maximizing the lower bound with projected gradient ascent (PG) can increase the diversity metric:
  Maximizing the lower bound with PG can increase the mean of the angles
  Maximizing the lower bound with PG can reduce the variance of the angles
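A toy sketch of one projected gradient ascent step on the lower bound alone is given below. It leaves out the data-fitting term entirely, assumes the lower bound has the determinant-based form arcsin(sqrt(det)) minus the squared gap to pi/2, and uses a hand-picked step size, so it illustrates the mechanism rather than the authors' training procedure. All names are hypothetical.

```python
import numpy as np

EPS = 1e-9  # keeps det(A A^T) strictly inside (0, 1) to avoid division by zero

def lower_bound(A_tilde):
    s = np.clip(np.linalg.det(A_tilde @ A_tilde.T), EPS, 1.0 - EPS)
    u = np.arcsin(np.sqrt(s))
    return u - (np.pi / 2 - u) ** 2

def lower_bound_grad(A_tilde):
    """Gradient of the lower bound w.r.t. A_tilde (rows treated as free variables)."""
    M = A_tilde @ A_tilde.T
    s = np.clip(np.linalg.det(M), EPS, 1.0 - EPS)
    u = np.arcsin(np.sqrt(s))
    # Chain rule: d(bound)/du * du/ds * d(det)/dA
    dbound_du = 1.0 + 2.0 * (np.pi / 2 - u)
    du_ds = 1.0 / (2.0 * np.sqrt(s) * np.sqrt(1.0 - s))
    ddet_dA = 2.0 * s * np.linalg.solve(M, A_tilde)
    return dbound_du * du_ds * ddet_dA

def projected_gradient_step(A_tilde, step_size):
    """One ascent step, then projection of each row back onto the unit sphere."""
    moved = A_tilde + step_size * lower_bound_grad(A_tilde)
    return moved / np.linalg.norm(moved, axis=1, keepdims=True)

# Two unit vectors 45 degrees apart.
A0 = np.array([[1.0, 0.0],
               [np.sqrt(0.5), np.sqrt(0.5)]])
A1 = projected_gradient_step(A0, step_size=0.05)

angle_before = np.arccos(abs(A0[0] @ A0[1]))
angle_after = np.arccos(abs(A1[0] @ A1[1]))
```

In this two-vector example the step pushes the vectors apart, so `angle_after` exceeds `angle_before` and the lower bound increases, while the projection keeps both rows at unit norm. With only two vectors the variance of the angles is trivially zero, so the example only illustrates the effect on the mean.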

18 Geometry Interpretation
The gradient of the lower bound w.r.t. $a_i$ is in the orthogonal complement of the space spanned by $\{a_1, a_2, \dots, a_K\} \setminus \{a_i\}$
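This orthogonality can be checked numerically. The sketch assumes the lower bound is an increasing function of det(A A^T) for unit-norm rows, so its gradient is a positive multiple of (A A^T)^{-1} A; the random matrix is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 6))                       # 3 hypothetical factors in R^6
A = A / np.linalg.norm(A, axis=1, keepdims=True)  # unit-norm rows

# Up to a positive scalar, the gradient of the lower bound w.r.t. A is
# (A A^T)^{-1} A, because d det(A A^T) / dA = 2 det(A A^T) (A A^T)^{-1} A.
direction = np.linalg.solve(A @ A.T, A)

# Entry (i, j) is the inner product of the i-th gradient direction with a_j.
inner_products = direction @ A.T
```

The matrix `inner_products` is the identity up to rounding: the off-diagonal zeros say that the gradient direction for each vector is orthogonal to every other vector.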

20 Experiments
Datasets: TDT, 20-News, Reuters
[Table: #categories, #samples and vocabulary size of each dataset]
Baselines: Bag-of-Words (BOW); Latent Dirichlet Allocation (LDA); LDA regularized with Determinantal Point Process prior (DPP-LDA); Pitman-Yor Process Topic Model (PYTM); Latent IBP Compound Dirichlet Allocation (LIDA); Neural Autoregressive Topic Model (DocNADE); Paragraph Vector (PV); Restricted Boltzmann Machine (RBM)
Evaluation:
  Retrieval: precision@100
  Clustering: accuracy
  Perplexity
  Qualitative evaluation
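For readers unfamiliar with the retrieval metric, a minimal sketch of precision@k follows. The slides do not say how relevance is defined; the sketch assumes a retrieved document counts as relevant when it shares the query's category label. The function name and the example labels are hypothetical.

```python
def precision_at_k(query_label, ranked_labels, k=100):
    """Fraction of the top-k retrieved documents that share the query's category."""
    top_k = ranked_labels[:k]
    if not top_k:
        return 0.0
    return sum(label == query_label for label in top_k) / len(top_k)

# Hypothetical category labels of documents ranked by similarity to a "sports" query.
ranked = ["sports", "politics", "sports", "sports", "food"]
score = precision_at_k("sports", ranked, k=4)
```

Three of the top four documents match the query's category, so `score` is 0.75. Dividing by the number of documents actually retrieved, rather than by `k`, is a design choice that avoids penalizing collections with fewer than `k` documents.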

21 Retrieval Precision
[Figure: precision@100 (%) versus number of hidden units K for RBM and DRBM on the TDT, 20-News and Reuters datasets]

22 Retrieval Precision
[Table: retrieval precision on TDT, 20-News and Reuters for BOW, LDA, DPP-LDA, PYTM, LIDA, DocNADE, PV, RBM and DRBM]

23 Clustering Accuracy
[Figure: accuracy (%) versus number of hidden units K for RBM and DRBM on the TDT, 20-News and Reuters datasets]

24 Clustering Accuracy
[Table: clustering accuracy on TDT, 20-News and Reuters for BOW, LDA, DPP-LDA, PYTM, LIDA, DocNADE, PV, RBM and DRBM]

25 Perplexity
[Figure: perplexity versus number of hidden units K for RBM and DRBM on the TDT, 20-News and Reuters datasets]

26 Qualitative Evaluation
Exemplar Topics Learned by RBM and DRBM
RBM Topic 1: president, clinton, iraq, united, spkr, house, people, lewinsky, government, white
RBM Topic 2: iraq, united, un, weapons, iraqi, nuclear, india, minister, saddam, military
RBM Topic 3: iraq, un, iraqi, lewinsky, saddam, clinton, baghdad, inspectors, weapons, white
RBM Topic 4: olympic, games, nagano, olympics, game, team, gold, japan, medal, hockey
RBM Topic 5: spkr, voice, tobacco, olympic, games, people, olympics, nagano, game, gold
DRBM Topic 1: president, iraq, clinton, united, million, lewinsky, thailand, spkr, government, jones
DRBM Topic 2: olympic, games, olympics, nagano, team, gold, game, hockey, medal, winter
DRBM Topic 3: iraq, united, un, weapons, iraqi, baghdad, council, inspectors, nations, military
DRBM Topic 4: lawyers, kaczynski, ms, defense, trial, judge, people, prosecutor, kaczynskis, government
DRBM Topic 5: students, japanese, japan, school, ms, united, yen, gm, tokyo, south

27 Sensitivity to Tradeoff Parameter
[Figure: sensitivity of DRBM to the tradeoff parameter λ on (a) the TDT dataset, (b) the 20-News dataset and (c) the Reuters dataset]

28 Conclusions
Problem:
  The popularity of topics is distributed in a power-law fashion
  Standard RBM is insufficient to capture long-tail topics
Solution:
  Diversify the hidden units in RBM to improve the coverage of long-tail topics
  Define an angle-based diversity regularizer
  Optimization
Results:
  Experiments on document retrieval and clustering demonstrate the effectiveness of the diversity regularizer

29 Thank you! Questions?
