Additive Regularization for Hierarchical Multimodal Topic Modeling

1 Additive Regularization for Hierarchical Multimodal Topic Modeling
N. A. Chirkova (1, 2), K. V. Vorontsov (3)
(1) JSC Antiplagiat, (2) Lomonosov Moscow State University, (3) Federal Research Center Computer Science and Control of RAS
October 14, 2016
N. A. Chirkova, October 14, 2016, slide 1 / 31

2 Topic hierarchies for automatic text categorization
How can a large text collection be overviewed in a few minutes?
Topic hierarchy:
- soft hierarchical clustering of documents into topics;
- topics are described by specific terminology.
[Figure: a fragment of the English Wikipedia topic hierarchy]

4 Topic hierarchies for automatic text categorization
Topic articles: Toccata and Fugue, F major, E minor, Carl Friedrich Abel, List of compositions by Frédéric Chopin by genre, Piano quintet, F minor...

6 Topic hierarchies for automatic text categorization
Topic articles: Filmfare Award for Best Actor, Filmfare Award for Best Film, Karisma Kapoor, Rishi Kapoor, Arjun Rampal, Shammi Kapoor...

12 Topic hierarchies for automatic text categorization
Topic articles: Functional (C++), SQL/CLI, SQL/JRT, Constructor (object-oriented programming), Static cast, Copy constructor, C++/CX, Java Persistence Query Language...

14 Applications of topic hierarchies
- Navigation through a large text collection
- Harmonization of existing categorizations:
  - detection of duplicate categories
  - splitting of miscellaneous topics
- Search for semantically similar documents
- News filtering
Hence the need for automatic learning of topic hierarchies.

15 Applications of topic hierarchies: real-world tasks
- Navigation through a large multilingual, multisource, multimodal text collection
- Harmonization of existing categorizations:
  - detection of duplicate categories
  - splitting of miscellaneous categories
  - detection of relations between categories
- Personalized search for semantically similar documents
- News filtering with respect to geography and time
Hence the need for automatic learning of flexible topic hierarchies.

16 Topic hierarchies in ARTM
Additive Regularization of Topic Models (ARTM):
- Modeling a fixed number of topics from a set of multimodal documents (text, tags, authors, categories, geotags and timestamps, commenting users, etc.), which gives flexibility.
- Regularization to satisfy additional requirements (topic sparsity, decorrelation, interpretability, consistency with partial markup, etc.), which gives flexibility.
- Scalable open-source implementation: BigARTM.org
The goal of the research: to extend ARTM to learn topic hierarchies and to implement the approach in BigARTM.

17 Topic hierarchies in ARTM: key features
A topic hierarchy is a multipartite (multilevel) graph of topics.
Flexibility of the hierarchical structure:
- multiple inheritance (a topic may have several parent topics);
- control over hierarchy sparsity.
Automatic determination of the number of child topics.

18 Topic hierarchies in ARTM: approach
1. Each level (except the root) is a flat topic model with its own regularizers.
2. When learning the topics of the $l$-th level, we use a specific regularizer to find parent topics from the $(l-1)$-th level.
3. We propose a regularizer to control hierarchy sparsity.

19 ARTM: a flat topic model
Given: a document set $d \in D$, modalities $m \in M$, disjoint modality dictionaries $W = \bigsqcup_{m \in M} W^m$ of tokens $w \in W$, and a document-token counter matrix $n_{dw}$ used to estimate $p(w \mid d)$:
$$\hat{p}(w \mid d) = \frac{n_{dw}}{\sum_{w' \in W^m} n_{dw'}}$$
Flat topic model for each modality $m$:
$$p(w \mid d) \approx \sum_{t \in T} p(w \mid t)\, p(t \mid d) = \sum_{t \in T} \phi_{wt} \theta_{td}, \qquad d \in D,\ w \in W^m,$$
with topic set $T$ and model parameters $\Phi^m = \{\phi_{wt}\}_{W^m \times T}$ holding the values $p(w \mid t)$, $\Theta = \{\theta_{td}\}_{T \times D}$ holding the values $p(t \mid d)$, and $\Phi = \bigcup_{m \in M} \Phi^m$.
Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-bayesian additive regularization for multimodal topic modeling of large collections

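20 ARTM: flat model learning
Optimization task (log-likelihood plus regularizers):
$$\sum_{m \in M} \kappa_m \sum_{d \in D} \sum_{w \in W^m} n_{dw} \ln \sum_{t \in T} \phi_{wt} \theta_{td} + \sum_i \tau_i R_i(\Phi, \Theta) \to \max_{\Phi, \Theta}$$
subject to
$$\sum_{w \in W^m} \phi_{wt} = 1,\ \ \phi_{wt} \ge 0\ \ \forall m; \qquad \sum_{t \in T} \theta_{td} = 1,\ \ \theta_{td} \ge 0.$$
Normalization operator:
$$\operatorname*{norm}_{i \in I}[y_i] = \frac{\max\{y_i, 0\}}{\sum_{j \in I} \max\{y_j, 0\}}$$
EM-algorithm for topic model training, with $R = \sum_i \tau_i R_i$:
E-step:
$$p(t \mid d, w) = \operatorname*{norm}_{t \in T}\left[\phi_{wt} \theta_{td}\right]$$
M-step:
$$\phi_{wt} = \operatorname*{norm}_{w \in W^m}\left[n_{wt} + \phi_{wt} \frac{\partial R}{\partial \phi_{wt}}\right], \qquad n_{wt} = \sum_{d \in D} n_{dw}\, p(t \mid d, w)$$
$$\theta_{td} = \operatorname*{norm}_{t \in T}\left[n_{td} + \theta_{td} \frac{\partial R}{\partial \theta_{td}}\right], \qquad n_{td} = \sum_{w \in W} n_{dw}\, p(t \mid d, w)$$
Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-bayesian additive regularization for multimodal topic modeling of large collections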
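A minimal single-modality sketch of one EM pass from this slide, in plain Python. The function names, the `r_phi` / `r_theta` hooks (which return the terms $\phi_{wt}\,\partial R/\partial\phi_{wt}$ and $\theta_{td}\,\partial R/\partial\theta_{td}$), and the toy counts are illustrative assumptions; this is not BigARTM code, and modality weights $\kappa_m$ are left out.

```python
import math
import random


def norm(values):
    """norm operator: clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def em_iteration(n, phi, theta, r_phi=None, r_theta=None):
    """One EM pass. n[d][w] are counts, phi[w][t] = p(w|t), theta[t][d] = p(t|d).

    r_phi(w, t, value) and r_theta(t, d, value) are hypothetical hooks that
    return the regularizer terms; with both left as None this is plain PLSA.
    """
    num_docs, num_words, num_topics = len(n), len(phi), len(theta)
    n_wt = [[0.0] * num_topics for _ in range(num_words)]
    n_td = [[0.0] * num_docs for _ in range(num_topics)]
    for d in range(num_docs):
        for w in range(num_words):
            if n[d][w] == 0:
                continue
            # E-step: p(t|d,w) = norm_t[phi_wt * theta_td]
            p_tdw = norm([phi[w][t] * theta[t][d] for t in range(num_topics)])
            for t in range(num_topics):
                n_wt[w][t] += n[d][w] * p_tdw[t]
                n_td[t][d] += n[d][w] * p_tdw[t]
    # M-step: add the regularizer term, then normalize each column.
    new_phi = [[0.0] * num_topics for _ in range(num_words)]
    for t in range(num_topics):
        column = norm([
            n_wt[w][t] + (r_phi(w, t, phi[w][t]) if r_phi else 0.0)
            for w in range(num_words)
        ])
        for w in range(num_words):
            new_phi[w][t] = column[w]
    new_theta = [[0.0] * num_docs for _ in range(num_topics)]
    for d in range(num_docs):
        column = norm([
            n_td[t][d] + (r_theta(t, d, theta[t][d]) if r_theta else 0.0)
            for t in range(num_topics)
        ])
        for t in range(num_topics):
            new_theta[t][d] = column[t]
    return new_phi, new_theta


def log_likelihood(n, phi, theta):
    """Unregularized log-likelihood of the counts under phi and theta."""
    total = 0.0
    for d, row in enumerate(n):
        for w, count in enumerate(row):
            if count > 0:
                p_wd = sum(phi[w][t] * theta[t][d] for t in range(len(theta)))
                total += count * math.log(p_wd)
    return total


# Toy collection (made up for illustration): 4 documents, 4 tokens, 2 topics.
random.seed(0)
counts = [[4, 2, 0, 0], [3, 3, 0, 1], [0, 0, 5, 2], [0, 1, 3, 4]]
phi_columns = [norm([random.random() for _ in range(4)]) for _ in range(2)]
phi = [[phi_columns[t][w] for t in range(2)] for w in range(4)]
theta_columns = [norm([random.random() for _ in range(2)]) for _ in range(4)]
theta = [[theta_columns[d][t] for d in range(4)] for t in range(2)]

history = [log_likelihood(counts, phi, theta)]
for _ in range(20):
    phi, theta = em_iteration(counts, phi, theta)
    history.append(log_likelihood(counts, phi, theta))
```

Without regularizers the recorded log-likelihood should not decrease from one pass to the next, which is a convenient sanity check before plugging in any `r_phi` or `r_theta`.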

21 ARTM: regularizers example
The goal: the distributions $p(w \mid t)$ and $p(t \mid d)$ should be sparse.
$\Theta$ sparsing:
$$R_1(\Theta) = -\sum_{d \in D} \sum_{t \in T} \frac{1}{|T|} \ln \theta_{td}$$
Updated M-step:
$$\theta_{td} = \operatorname*{norm}_{t \in T}\left[n_{td} - \frac{\tau_1}{|T|}\right]$$
$\Phi$ sparsing:
$$R_2(\Phi^m) = -\sum_{t \in T} \sum_{w \in W^m} \frac{1}{|W^m|} \ln \phi_{wt}$$
Updated M-step:
$$\phi_{wt} = \operatorname*{norm}_{w \in W^m}\left[n_{wt} - \frac{\tau_2}{|W^m|}\right]$$
Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-bayesian additive regularization for multimodal topic modeling of large collections
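As a small worked illustration of the sparsing M-step, the sketch below applies the update $\theta_{td} = \operatorname{norm}_t[n_{td} - \tau/|T|]$ to one document. The counter values and the coefficient are made up; the helper name `sparse_theta_column` is hypothetical.

```python
def norm(values):
    """Clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def sparse_theta_column(n_td_column, tau):
    """Sparsing M-step for one document: subtract tau/|T| from every counter."""
    num_topics = len(n_td_column)
    return norm([n_td - tau / num_topics for n_td in n_td_column])


# Hypothetical counters n_td for one document over four topics.
example = sparse_theta_column([5.0, 0.3, 0.1, 4.6], tau=2.0)
```

Here the shift is $2.0 / 4 = 0.5$, so the two topics with counters below 0.5 are clipped to exactly zero and the remaining mass is renormalized over the other two.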

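22 hARTM: $\Phi$ interlevel regularizer
Already learned: levels $1, \dots, l$; the $l$-th level has topic set $a \in A$ and parameters $\Phi^l \in \mathbb{R}^{W \times A}$ and $\Theta^l \in \mathbb{R}^{A \times D}$.
Level to learn: topic set $t \in T$, parameters $\Phi \in \mathbb{R}^{W \times T}$ and $\Theta \in \mathbb{R}^{T \times D}$.
The goal: to establish parent-child relations "$t$ is a child of $a$".
Hypothesis: a parent topic is a mixture of its child topics:
$$p(w \mid a) = \sum_{t \in T} p(w \mid t)\, p(t \mid a), \qquad w \in W^m,\ a \in A.$$
$\Phi$ regularization criterion with new parameters $\Psi = \{\psi_{ta}\}_{T \times A}$, $\psi_{ta} = p(t \mid a)$, so that $\Phi^l \approx \Phi\Psi$:
$$R_3(\Phi, \Psi) = \sum_{m \in M} \sum_{a \in A} \sum_{w \in W^m} n_{wa} \ln \sum_{t \in T} \phi_{wt} \psi_{ta}$$
Implementation: $|A|$ pseudodocuments with counters $n_{wa}$ (counted on the M-step).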
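The pseudodocument implementation can be sketched as follows. This is a plain-Python illustration under an explicit assumption: each parent topic $a$ becomes one extra document whose counters are taken as `weight * p(w|a)`; the slide only says that $n_{wa}$ is counted on the M-step, so the exact scaling is a guess. The function names are hypothetical.

```python
def add_parent_pseudodocuments(counts, parent_phi, weight):
    """Append one pseudodocument per parent topic.

    counts[d][w] are document-token counters; parent_phi[w][a] = p(w|a) from
    the already learned level. The scaling by `weight` is an assumption.
    """
    num_words = len(parent_phi)
    num_parents = len(parent_phi[0])
    pseudo = [
        [weight * parent_phi[w][a] for w in range(num_words)]
        for a in range(num_parents)
    ]
    return [list(row) for row in counts] + pseudo


def extract_psi(theta, num_real_docs):
    """Split a fitted theta[t][d] into the real-document part and psi[t][a]."""
    theta_real = [row[:num_real_docs] for row in theta]
    psi = [row[num_real_docs:] for row in theta]
    return theta_real, psi


# Toy data: 2 documents, 3 tokens, 2 parent topics.
counts = [[2, 0, 1], [0, 3, 1]]
parent_phi = [[0.5, 0.1], [0.25, 0.1], [0.25, 0.8]]
augmented = add_parent_pseudodocuments(counts, parent_phi, weight=10.0)

# A made-up fitted theta over 2 real documents followed by 2 pseudodocuments.
fitted_theta = [[0.7, 0.2, 0.9, 0.1], [0.3, 0.8, 0.1, 0.9]]
theta_real, psi = extract_psi(fitted_theta, num_real_docs=2)
```

After fitting the child level on `augmented`, the columns of $\Theta$ that belong to the pseudodocuments play the role of $\Psi$, since each of them is a distribution $p(t \mid a)$ over child topics.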

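23 hARTM: $\Theta$ interlevel regularizer
Already learned: levels $1, \dots, l$; the $l$-th level has topic set $a \in A$ and parameters $\Phi^l \in \mathbb{R}^{W \times A}$ and $\Theta^l \in \mathbb{R}^{A \times D}$.
Level to learn: topic set $t \in T$, parameters $\Phi \in \mathbb{R}^{W \times T}$ and $\Theta \in \mathbb{R}^{T \times D}$.
The goal: to establish parent-child relations "$t$ is a child of $a$".
Hypothesis:
$$p(a \mid d) = \sum_{t \in T} p(a \mid t)\, p(t \mid d), \qquad a \in A,\ d \in D.$$
$\Theta$ regularization criterion with new parameters $\tilde\Psi = \{\tilde\psi_{at}\}_{A \times T}$, $\tilde\psi_{at} = p(a \mid t)$, so that $\Theta^l \approx \tilde\Psi\Theta$:
$$R_4(\Theta, \tilde\Psi) = \sum_{a \in A} \sum_{d \in D} n_{ad} \ln \sum_{t \in T} \tilde\psi_{at} \theta_{td}$$
Implementation: a new modality with tokens corresponding to $a \in A$.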
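The extra-modality implementation can be sketched in the same spirit. The assumption here, not stated on the slide, is that each document $d$ receives one token per parent topic with counter `weight * p(a|d)`; the function names are hypothetical. The new tokens must be normalized as their own modality, so that the corresponding rows of $\Phi$ form a distribution over $a$ for every child topic $t$.

```python
def add_parent_modality(counts, parent_theta, weight):
    """Extend every document with |A| parent-topic tokens.

    counts[d][w] are counters of the original tokens; parent_theta[a][d] =
    p(a|d) from the already learned level. The scaling is an assumption.
    """
    num_parents = len(parent_theta)
    return [
        list(row) + [weight * parent_theta[a][d] for a in range(num_parents)]
        for d, row in enumerate(counts)
    ]


def extract_psi_tilde(phi, num_real_tokens):
    """Split a fitted phi[w][t] into original-token rows and psi_tilde[a][t]."""
    return phi[:num_real_tokens], phi[num_real_tokens:]


# Toy data: 2 documents, 3 original tokens, 2 parent topics.
counts = [[2, 0, 1], [0, 3, 1]]
parent_theta = [[0.75, 0.25], [0.25, 0.75]]
augmented = add_parent_modality(counts, parent_theta, weight=4.0)

# A made-up fitted phi: 3 original-token rows followed by 2 parent-token rows.
fitted_phi = [[0.6, 0.1], [0.1, 0.7], [0.3, 0.2], [0.9, 0.2], [0.1, 0.8]]
phi_real, psi_tilde = extract_psi_tilde(fitted_phi, num_real_tokens=3)
```

In this toy layout the two parent-token rows of `fitted_phi` sum to one within each column, which is what separate per-modality normalization would produce.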

24 hARTM: interlevel regularizers illustration
[Figure: matrix factorization diagrams for PLSA, ARTM, hARTM with the $\Phi$ regularizer, and hARTM with the $\Theta$ regularizer]
$F = \bigcup_{m \in M} F^m$, $F^m = \{f_{dw}\}_{W^m \times D}$, $f_{dw} = \operatorname*{norm}_{w \in W^m}[n_{dw}]$

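25 hARTM: hierarchy sparsing with the $\Theta$ interlevel regularizer
The goal: each topic has a small number of parent topics, i.e. $p(a \mid t)$ is sparse.
Entropy sparsing regularizer:
$$R_5(\tilde\Psi) = -\sum_{t \in T} \sum_{a \in A} \frac{1}{|A|} \ln \tilde\psi_{at}$$
Updated M-step:
$$\tilde\psi_{at} = \operatorname*{norm}_{a \in A}\left[n_{at} - \frac{\tau_5}{|A|}\right]$$
Drawback: the possibility of $p(a \mid t) = 0$ for all $a$.
Power sparsing regularizer:
$$R_5(\tilde\Psi) = \sum_{t \in T} \sum_{a \in A} \frac{1}{q}\, \tilde\psi_{at}^{\,q}, \qquad q > 1$$
Updated M-step:
$$\tilde\psi_{at} = \operatorname*{norm}_{a \in A}\left[n_{at} + \tau_5\, \tilde\psi_{at}^{\,q}\right]$$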
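The difference between the two updates can be seen on a single child topic. The sketch below uses made-up counters and coefficients; `entropy_update` and `power_update` are hypothetical helper names that apply the two M-step formulas of this slide to one column of $\tilde\Psi$.

```python
def norm(values):
    """Clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def entropy_update(n_at_column, tau):
    """Entropy sparsing: norm_a[n_at - tau/|A|] for one child topic t."""
    num_parents = len(n_at_column)
    return norm([n_at - tau / num_parents for n_at in n_at_column])


def power_update(n_at_column, psi_column, tau, q):
    """Power sparsing: norm_a[n_at + tau * psi_at**q] for one child topic t."""
    return norm([
        n_at + tau * psi_at ** q
        for n_at, psi_at in zip(n_at_column, psi_column)
    ])


# Hypothetical counters for one child topic over three parent topics.
n_at = [0.4, 0.3, 0.2]
entropy_result = entropy_update(n_at, tau=1.5)
power_result = power_update(n_at, [0.5, 0.3, 0.2], tau=1.5, q=2)
```

With these numbers the entropy update subtracts 0.5 from every counter, so all three become negative and the whole column collapses to zero, which is the drawback named on the slide. The power update only adds non-negative terms, so the column stays a valid distribution while the largest entry gains relative weight.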

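26 hARTM: hierarchy sparsing with the $\Phi$ interlevel regularizer
The goal: each topic has a small number of parent topics, i.e. $p(a \mid t)$ is sparse.
Entropy sparsing regularizer:
$$R_5(\Psi) = -\sum_{t \in T} \sum_{a \in A} \frac{1}{|A|} \ln p(a \mid t) = -\frac{1}{|A|} \sum_{t \in T} \sum_{a \in A} \ln \frac{\psi_{ta}\, p(a)}{\sum_{a'} \psi_{ta'}\, p(a')}$$
Updated M-step:
$$\psi_{ta} = \operatorname*{norm}_{t \in T}\left[n_{ta} - \tau_5 \left(\frac{1}{|A|} - p(a \mid t)\right)\right]$$
At any time, $\forall t\ \exists a:\ p(a \mid t) \ge \frac{1}{|A|} > 0$.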
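The updated M-step can be checked by a short derivation. The sketch below assumes that $p(a)$ is treated as a constant with respect to $\psi_{ta}$ and uses Bayes' rule $p(a \mid t) = \psi_{ta}\, p(a) / \sum_{a'} \psi_{ta'}\, p(a')$ with $\psi_{ta} = p(t \mid a)$.

```latex
\begin{align*}
R_5(\Psi) &= -\frac{1}{|A|} \sum_{t \in T} \sum_{a \in A}
  \Bigl[ \ln \psi_{ta} + \ln p(a) - \ln \sum_{a'} \psi_{ta'}\, p(a') \Bigr], \\
\frac{\partial R_5}{\partial \psi_{ta}} &=
  -\frac{1}{|A|} \cdot \frac{1}{\psi_{ta}}
  + \frac{p(a)}{\sum_{a'} \psi_{ta'}\, p(a')}, \\
\psi_{ta} \frac{\partial R_5}{\partial \psi_{ta}} &=
  -\frac{1}{|A|} + \frac{\psi_{ta}\, p(a)}{\sum_{a'} \psi_{ta'}\, p(a')}
  = p(a \mid t) - \frac{1}{|A|}.
\end{align*}
```

The second term of the derivative appears without the factor $1/|A|$ because the logarithm of the denominator occurs once for every $a \in A$ in the inner sum. Substituting the last line into the general M-step with coefficient $\tau_5$ gives $\psi_{ta} = \operatorname{norm}_{t}\bigl[n_{ta} - \tau_5\,(1/|A| - p(a \mid t))\bigr]$.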

27 hARTM in BigARTM
Key BigARTM concepts:
- The document set is split into batches and stored on disk.
- One EM-step is a pass through the batches, iterating over each batch.
- $\Phi$ is stored permanently; $\Theta$ is retrained for any loaded batch.
$\Phi$ interlevel regularizer implementation:
1. Learn levels $l = 1, 2, \dots$
2. For levels $l > 1$, add one extra batch composed from the $(l-1)$-th level's $\Phi$.
3. Extract $\Psi$ as the $\Theta$ corresponding to the extra batch.
$\Theta$ interlevel regularizer implementation:
1. Learn levels $l = 1, 2, \dots$
2. For levels $l > 1$, modify all batches: add an extra modality composed from the $(l-1)$-th level's $\Theta$.
3. Extract $\tilde\Psi$ as the $\Phi$ corresponding to the extra modality.

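28 Experiments: comparison of $\Phi$ and $\Theta$ interlevel regularizers
Wikipedia: $|D|$ = [value missing], $|W|$ = [value missing].
Learning the 2nd level, $|A| = 50$, $|T| = 250$, varying the number of batches.
Measuring the quality of the approximations $\Phi^l \approx \Phi\Psi$ and $\Theta^l \approx \tilde\Psi\Theta$.
[Plots: $\rho(\Phi^l, \Phi\Psi)$ and $\rho(\Theta^l, \tilde\Psi\Theta)$ for the $\Phi$ regularizer and for the $\Theta$ regularizer]
The approximation quality is nearly the same with both regularizers; the $\Phi$ regularizer is better.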

29 Experiments: children number study
Postnauka: $|D| = 1728$, $|W|$ = [value missing].
Learning the 2nd level with the $\Phi$ regularizer, $|A| = 10$, $|T| = 30$, varying the hierarchy sparsing coefficient $\tau_5$.
Measuring the mean and standard deviation of the estimated subtopic count over 10 restarts; $t$ is a child of $a$ if $p(t \mid a) >$ threshold.
[Plots: two panels with $\log_{10} \tau$ on the horizontal axis]
The larger $\tau_5$, the sparser the hierarchy. For large $\tau_5$ the subtopic count estimate is robust (std < 1).
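The child-counting rule on this slide is simple enough to state in code. The sketch below counts, for every parent topic $a$, the child topics with $p(t \mid a)$ above a threshold; the matrix values and the threshold are made up, and `count_children` is a hypothetical helper name.

```python
def count_children(psi, threshold):
    """psi[t][a] = p(t|a). Return the number of children of each parent a."""
    num_parents = len(psi[0])
    return [
        sum(1 for row in psi if row[a] > threshold)
        for a in range(num_parents)
    ]


# Hypothetical psi for 4 child topics and 2 parent topics (columns sum to 1).
psi = [
    [0.50, 0.00],
    [0.30, 0.02],
    [0.15, 0.03],
    [0.05, 0.95],
]
children = count_children(psi, threshold=0.1)
```

Repeating this over several restarts and summarizing the counts, for example with `statistics.mean` and `statistics.stdev`, would give statistics of the kind the slide describes.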

30 Experiments: parent-child relations study
Postnauka: $|D| = 1728$, $|W|$ = [value missing].
Learning the topic hierarchy with the $\Phi$ regularizer. Generating 100 topic-subtopic pairs and asking an expert to mark each pair as "relation exists" or "no relation".
[Plots: $p(a \mid t)$ for the marked pairs, without sparsing and with $\Psi$ sparsing]
With hierarchy sparsing, a threshold on $p(a \mid t)$ can be chosen with a minimal number of errors.
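One way to pick such a threshold from expert-marked pairs is an exhaustive scan over candidate values, sketched below. The scores and labels are invented for illustration and are not the experiment's data; `best_threshold` is a hypothetical helper, and the slides do not say how the threshold was actually chosen.

```python
def best_threshold(scores, labels):
    """Return (threshold, errors) minimizing the number of misclassified pairs.

    A pair is predicted as "relation exists" when its score p(a|t) is strictly
    above the threshold. Candidates are 0.0 and every observed score; ties are
    resolved in favor of the smallest threshold.
    """
    best = None
    for threshold in [0.0] + sorted(set(scores)):
        errors = sum(
            (score > threshold) != label
            for score, label in zip(scores, labels)
        )
        if best is None or errors < best[1]:
            best = (threshold, errors)
    return best


# Made-up p(a|t) values and expert marks (True means "relation exists").
scores = [0.02, 0.05, 0.10, 0.40, 0.55, 0.90]
labels = [False, False, True, False, True, True]
threshold, errors = best_threshold(scores, labels)
```

On this toy input the scan settles on a threshold of 0.05 with a single misclassified pair, the one scored 0.40 and marked as unrelated.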

31 Summary
Contributions:
- An approach to learning topic hierarchies from multimodal data with additional requirements.
- A method to control hierarchy sparsity.
- An open-source implementation in BigARTM with a friendly interface.
Ongoing projects with hARTM:
- Creating a user-friendly navigator through Postnauka.ru materials.
- Developing a system for filtering an online news flow.
