Additive Regularization for Hierarchical Multimodal Topic Modeling

1 Additive Regularization for Hierarchical Multimodal Topic Modeling
N. A. Chirkova (1, 2), K. V. Vorontsov (3)
(1) JSC Antiplagiat, (2) Lomonosov Moscow State University, (3) Federal Research Center Computer Science and Control of RAS
October 14, 2016

2 Topic hierarchies for automatic text categorization
How can one overview a large text collection in a few minutes?
Topic hierarchy:
- soft hierarchical clustering of documents into topics;
- topics are described by specific terminology.
[Figure: a fragment of the English Wikipedia topic hierarchy]

4 Topic hierarchies for automatic text categorization
Topic articles: Toccata and Fugue, F major, E minor, Carl Friedrich Abel, List of compositions by Frédéric Chopin by genre, Piano quintet, F minor...

6 Topic hierarchies for automatic text categorization
Topic articles: Filmfare Award for Best Actor, Filmfare Award for Best Film, Karisma Kapoor, Rishi Kapoor, Arjun Rampal, Shammi Kapoor...

12 Topic hierarchies for automatic text categorization
Topic articles: Functional (C++), SQL/CLI, SQL/JRT, Constructor (object-oriented programming), Static cast, Copy constructor, C++/CX, Java Persistence Query Language...

14 Applications of topic hierarchies
- Navigation through a large text collection
- Harmonization of existing categorizations:
  - detection of duplicate categories
  - splitting of miscellaneous topics
- Search for semantically similar documents
- News filtering
Hence the need for automatic learning of topic hierarchies.

15 Applications of topic hierarchies: real world tasks
- Navigation through a large multilingual, multisource, multimodal text collection
- Harmonization of existing categorizations:
  - detection of duplicate categories
  - splitting of miscellaneous categories
  - detection of relations between categories
- Personalized search for semantically similar documents
- News filtering with respect to geography and time
Hence the need for automatic learning of flexible topic hierarchies.

16 Topic hierarchies in ARTM
Additive Regularization of Topic Models (ARTM):
- Modeling a fixed number of topics from a set of multimodal documents: text, tags, authors, categories, geotags and timestamps, commenting users, etc. (flexibility)
- Regularization to satisfy additional requirements: topic sparsity, decorrelation, interpretability, consistency with partial markup, etc. (flexibility)
- Scalable open-source implementation: BigARTM.org
The goal of the research: to extend ARTM to learn topic hierarchies and to implement the approach in BigARTM.

17 Topic hierarchies in ARTM: key features
A topic hierarchy is a multipartite (multilevel) graph of topics.
The flexibility of the hierarchical structure:
- multiple inheritance (a topic may have several parent topics);
- control over hierarchy sparsity.
Automatic determination of the number of child topics.

18 Topic hierarchies in ARTM: approach
1. Each level (except the root) is a flat topic model with its own regularizers.
2. When learning the topics of the $l$-th level, we use a specific regularizer to find parent topics from the $(l-1)$-th level.
3. We propose a regularizer to control hierarchy sparsity.

19 ARTM: a flat topic model
Given: a document set $d \in D$, modalities $m \in M$, disjoint modality dictionaries $W = \bigsqcup_{m \in M} W^m$ of tokens $w \in W$, and a document-token counter matrix $n_{dw}$ used to estimate $p(w \mid d)$:
$$p(w \mid d) = \frac{n_{dw}}{\sum_{w' \in W^m} n_{dw'}}$$
Flat topic model for each modality $m$:
$$p(w \mid d) \approx \sum_{t \in T} p(w \mid t)\, p(t \mid d) = \sum_{t \in T} \varphi_{wt} \theta_{td}, \quad d \in D,\ w \in W^m,$$
with topic set $T$ and model parameters $\Phi^m = \{\varphi_{wt}\}_{W^m \times T}$ holding the values $p(w \mid t)$, $\Theta = \{\theta_{td}\}_{T \times D}$ holding the values $p(t \mid d)$, and $\Phi = \bigcup_{m \in M} \Phi^m$.
Source: Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-Bayesian additive regularization for multimodal topic modeling of large collections.
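The two formulas of the flat model can be illustrated with a minimal sketch in plain Python. The toy counts and the matrices `phi` and `theta` below are hypothetical and were chosen so that the model reproduces the empirical estimate exactly; the function names are illustrative and not part of any library.

```python
def empirical_p_w_given_d(n_dw):
    """Row-normalize document-token counts: p(w|d) = n_dw / sum_w' n_dw'."""
    result = []
    for row in n_dw:
        total = sum(row)
        result.append([c / total if total > 0 else 0.0 for c in row])
    return result


def model_p_w_given_d(phi, theta):
    """Flat topic model: p(w|d) = sum_t phi[w][t] * theta[t][d].

    Returns a list of rows indexed by document d, then by token w.
    """
    n_tokens, n_topics, n_docs = len(phi), len(theta), len(theta[0])
    return [[sum(phi[w][t] * theta[t][d] for t in range(n_topics))
             for w in range(n_tokens)]
            for d in range(n_docs)]


# Hypothetical single-modality toy data: 2 documents, 3 tokens, 2 topics.
n_dw = [[4, 1, 0], [0, 2, 2]]
phi = [[0.8, 0.0], [0.2, 0.5], [0.0, 0.5]]  # each column is p(w|t)
theta = [[1.0, 0.0], [0.0, 1.0]]            # each column is p(t|d)

p_hat = empirical_p_w_given_d(n_dw)
p_model = model_p_w_given_d(phi, theta)
```

In this toy case each document uses exactly one topic, so `p_model` coincides with `p_hat`; on real data the factorization is only approximate.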

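20 ARTM: flat model learning
Optimization task:
$$\underbrace{\sum_{m \in M} \kappa_m \sum_{d \in D} \sum_{w \in W^m} n_{dw} \ln \sum_{t \in T} \varphi_{wt} \theta_{td}}_{\text{log-likelihood}} + \underbrace{\sum_i \tau_i R_i(\Phi, \Theta)}_{\text{regularizers}} \to \max_{\Phi, \Theta}$$
subject to
$$\sum_{w \in W^m} \varphi_{ws} = 1,\ \varphi_{ws} \ge 0\ \ \forall m; \qquad \sum_{s \in T} \theta_{sd} = 1,\ \theta_{sd} \ge 0.$$
EM algorithm for topic model training, where $R = \sum_i \tau_i R_i$:
E-step:
$$p(t \mid d, w) = \operatorname*{norm}_{t \in T} \left[ \varphi_{wt} \theta_{td} \right]$$
M-step:
$$\varphi_{wt} = \operatorname*{norm}_{w \in W^m} \left[ n_{wt} + \varphi_{wt} \frac{\partial R}{\partial \varphi_{wt}} \right], \quad n_{wt} = \sum_{d \in D} n_{dw}\, p(t \mid d, w)$$
$$\theta_{td} = \operatorname*{norm}_{t \in T} \left[ n_{td} + \theta_{td} \frac{\partial R}{\partial \theta_{td}} \right], \quad n_{td} = \sum_{w \in W} n_{dw}\, p(t \mid d, w)$$
Here the norm operator is
$$\operatorname*{norm}_{i \in I} [y_i] = \frac{\max\{y_i, 0\}}{\sum_{i' \in I} \max\{y_{i'}, 0\}}.$$
Source: Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-Bayesian additive regularization for multimodal topic modeling of large collections.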
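The EM iteration above can be sketched for a single modality in plain Python. This is an illustrative toy, not the BigARTM implementation: the names `norm`, `em_iteration`, `reg_phi` and `reg_theta` are hypothetical, the modality weight is taken as 1, and each optional regularizer callable is assumed to return the whole additive term (the parameter times the derivative of R) for one entry.

```python
import random


def norm(values):
    """norm operator: clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def em_iteration(n_dw, phi, theta, reg_phi=None, reg_theta=None):
    """One EM pass with phi[w][t] = p(w|t) and theta[t][d] = p(t|d)."""
    n_tok, n_top, n_doc = len(phi), len(theta), len(theta[0])
    n_wt = [[0.0] * n_top for _ in range(n_tok)]
    n_td = [[0.0] * n_doc for _ in range(n_top)]
    for d in range(n_doc):
        for w in range(n_tok):
            if n_dw[d][w] == 0:
                continue
            # E-step: p(t|d,w) = norm_t[phi_wt * theta_td]
            p_tdw = norm([phi[w][t] * theta[t][d] for t in range(n_top)])
            for t in range(n_top):
                n_wt[w][t] += n_dw[d][w] * p_tdw[t]
                n_td[t][d] += n_dw[d][w] * p_tdw[t]
    # M-step: add the regularizer term (zero if absent), then normalize.
    phi_cols = [norm([n_wt[w][t] + (reg_phi(phi[w][t]) if reg_phi else 0.0)
                      for w in range(n_tok)]) for t in range(n_top)]
    theta_cols = [norm([n_td[t][d] + (reg_theta(theta[t][d]) if reg_theta else 0.0)
                        for t in range(n_top)]) for d in range(n_doc)]
    new_phi = [[phi_cols[t][w] for t in range(n_top)] for w in range(n_tok)]
    new_theta = [[theta_cols[d][t] for d in range(n_doc)] for t in range(n_top)]
    return new_phi, new_theta


# Hypothetical toy collection: 3 documents, 4 tokens, 2 topics.
random.seed(0)
n_dw = [[5, 3, 0, 0], [0, 0, 4, 6], [2, 2, 2, 2]]
W, T, D = 4, 2, 3
init_phi_cols = [norm([random.random() for _ in range(W)]) for _ in range(T)]
phi = [[init_phi_cols[t][w] for t in range(T)] for w in range(W)]
init_theta_cols = [norm([random.random() for _ in range(T)]) for _ in range(D)]
theta = [[init_theta_cols[d][t] for d in range(D)] for t in range(T)]

for _ in range(20):
    phi, theta = em_iteration(n_dw, phi, theta)  # no regularizers: plain PLSA
```

Without regularizers the loop reduces to PLSA; passing, for example, `reg_theta=lambda value: -tau / T` for some hypothetical coefficient `tau` would subtract a constant from every counter before normalization.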

21 ARTM: regularizers example
The goal: the distributions $p(w \mid t)$ and $p(t \mid d)$ should be sparse.
Θ sparsing:
$$R_1(\Theta) = -\sum_{d \in D} \sum_{t \in T} \frac{1}{|T|} \ln \theta_{td}$$
Updated M-step:
$$\theta_{td} = \operatorname*{norm}_{t \in T} \left[ n_{td} - \frac{\tau_1}{|T|} \right]$$
Φ sparsing:
$$R_2(\Phi^m) = -\sum_{t \in T} \sum_{w \in W^m} \frac{1}{|W^m|} \ln \varphi_{wt}$$
Updated M-step:
$$\varphi_{wt} = \operatorname*{norm}_{w \in W^m} \left[ n_{wt} - \frac{\tau_2}{|W^m|} \right]$$
Source: Vorontsov K., Frei O., Apishev M., Romov P., Suvorova M., Yanina A. Non-Bayesian additive regularization for multimodal topic modeling of large collections.
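The updated M-steps follow from substituting each sparsing regularizer into the general M-step formula. A short derivation for the Θ case is sketched below, writing $\tau$ for the coefficient attached to the regularizer $R(\Theta)$; the Φ case is analogous, with $|W^m|$ in place of $|T|$.

```latex
\begin{align*}
R(\Theta) &= -\sum_{d \in D} \sum_{t \in T} \frac{1}{|T|} \ln \theta_{td}, \\
\theta_{td} \frac{\partial (\tau R)}{\partial \theta_{td}}
  &= \theta_{td} \cdot \left( -\frac{\tau}{|T|\,\theta_{td}} \right)
   = -\frac{\tau}{|T|}, \\
\theta_{td} &= \operatorname*{norm}_{t \in T} \left[ n_{td} - \frac{\tau}{|T|} \right].
\end{align*}
```

Because the norm operator clips negative values, every entry whose counter $n_{td}$ falls below $\tau / |T|$ becomes exactly zero, which is what makes the distribution sparse.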

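22 hARTM: Φ interlevel regularizer
Already learned: levels $1, \dots, l$; the $l$-th level has topic set $a \in A$ and parameters $\Phi^l \in \mathbb{R}^{W \times A}$ and $\Theta^l \in \mathbb{R}^{A \times D}$.
Level to learn: topic set $t \in T$, parameters $\Phi \in \mathbb{R}^{W \times T}$ and $\Theta \in \mathbb{R}^{T \times D}$.
The goal: to establish parent-child relations "$t$ is a child of $a$".
Hypothesis: a parent topic is a mixture of its child topics:
$$p(w \mid a) = \sum_{t \in T} p(w \mid t)\, p(t \mid a), \quad w \in W^m,\ a \in A.$$
Φ regularization criterion with new parameters $\Psi = \{\psi_{ta}\}_{T \times A}$, $\psi_{ta} = p(t \mid a)$:
$$\Phi^l \approx \Phi \Psi, \qquad R_3(\Phi, \Psi) = \sum_{m \in M} \sum_{a \in A} \sum_{w \in W^m} n_{wa} \ln \sum_{t \in T} \varphi_{wt} \psi_{ta}$$
Implementation: $|A|$ pseudodocuments with counters $n_{wa}$ (counted on the M-step).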
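The pseudodocument idea can be sketched as follows: each parent topic becomes a document with token counts $n_{wa}$, and Ψ plays the role of Θ for those documents. The sketch below is a simplification under stated assumptions: the child matrix `phi` is held fixed (in the full model it is learned jointly), the counts are taken as a hypothetical weight `n_a` times the parent column of $\Phi^l$, and `fit_psi` is an illustrative name, not a library function.

```python
def norm(values):
    """Clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def fit_psi(n_wa, phi, n_iter=20):
    """Estimate psi[t][a] = p(t|a) by EM with child phi[w][t] held fixed."""
    n_tok, n_child, n_parent = len(phi), len(phi[0]), len(n_wa[0])
    psi = [[1.0 / n_child] * n_parent for _ in range(n_child)]
    for _ in range(n_iter):
        n_ta = [[0.0] * n_parent for _ in range(n_child)]
        for a in range(n_parent):
            for w in range(n_tok):
                if n_wa[w][a] == 0:
                    continue
                # E-step for pseudodocument a: p(t|a,w) = norm_t[phi_wt * psi_ta]
                p_t = norm([phi[w][t] * psi[t][a] for t in range(n_child)])
                for t in range(n_child):
                    n_ta[t][a] += n_wa[w][a] * p_t[t]
        # M-step: normalize the counters over child topics for each parent.
        cols = [norm([n_ta[t][a] for t in range(n_child)]) for a in range(n_parent)]
        psi = [[cols[a][t] for a in range(n_parent)] for t in range(n_child)]
    return psi


# Hypothetical toy setup: 6 tokens, 3 child topics with disjoint vocabularies.
phi = [[0.5, 0.0, 0.0], [0.5, 0.0, 0.0],
       [0.0, 0.5, 0.0], [0.0, 0.5, 0.0],
       [0.0, 0.0, 0.5], [0.0, 0.0, 0.5]]
psi_true = [[0.5, 0.0], [0.5, 0.2], [0.0, 0.8]]  # p(t|a) for 2 parent topics
n_a = [100.0, 100.0]                             # assumed pseudodocument lengths

# Parent columns are exact mixtures of the children: Phi^l = Phi * Psi.
phi_parent = [[sum(phi[w][t] * psi_true[t][a] for t in range(3)) for a in range(2)]
              for w in range(6)]
n_wa = [[n_a[a] * phi_parent[w][a] for a in range(2)] for w in range(6)]

psi = fit_psi(n_wa, phi)
```

With disjoint child vocabularies the recovered `psi` matches `psi_true`; with overlapping topics the estimate would only approximate the mixture weights.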

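23 hARTM: Θ interlevel regularizer
Already learned: levels $1, \dots, l$; the $l$-th level has topic set $a \in A$ and parameters $\Phi^l \in \mathbb{R}^{W \times A}$ and $\Theta^l \in \mathbb{R}^{A \times D}$.
Level to learn: topic set $t \in T$, parameters $\Phi \in \mathbb{R}^{W \times T}$ and $\Theta \in \mathbb{R}^{T \times D}$.
The goal: to establish parent-child relations "$t$ is a child of $a$".
Hypothesis:
$$p(a \mid d) = \sum_{t \in T} p(a \mid t)\, p(t \mid d), \quad a \in A,\ d \in D.$$
Θ regularization criterion with new parameters $\tilde{\Psi} = \{\tilde{\psi}_{at}\}_{A \times T}$, $\tilde{\psi}_{at} = p(a \mid t)$:
$$\Theta^l \approx \tilde{\Psi} \Theta, \qquad R_4(\Theta, \tilde{\Psi}) = \sum_{a \in A} \sum_{d \in D} n_{ad} \ln \sum_{t \in T} \tilde{\psi}_{at} \theta_{td}$$
Implementation: a new modality with tokens corresponding to $a \in A$.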
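The "new modality" implementation can be pictured as attaching, to every document, pseudo-token counts for the parent topics. The sketch below assumes the counts are built as document length times the parent-level $p(a \mid d)$, optionally scaled by a weight; this scaling and the name `add_parent_modality` are illustrative assumptions, since the slides do not define $n_{ad}$ explicitly.

```python
def add_parent_modality(n_dw, theta_parent, weight=1.0):
    """Build counts for a hypothetical 'parent topic' modality.

    n_dw[d][w] are ordinary token counts, theta_parent[a][d] = p(a|d) from
    the already learned level. Returns rows indexed by document d with
    n_ad = weight * n_d * p(a|d), where n_d is the document length.
    """
    n_parents = len(theta_parent)
    parent_counts = []
    for d, row in enumerate(n_dw):
        n_d = sum(row)
        parent_counts.append([weight * n_d * theta_parent[a][d]
                              for a in range(n_parents)])
    return parent_counts


# Hypothetical toy data: 2 documents, 3 tokens, 2 parent topics.
n_dw = [[4, 1, 0], [0, 2, 2]]
theta_parent = [[0.75, 0.25], [0.25, 0.75]]  # each column is p(a|d)

n_ad = add_parent_modality(n_dw, theta_parent)
```

Training the child level on documents extended this way would make the Φ block of the new modality hold the values $p(a \mid t)$, which is how the parent-child matrix is read off.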

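24 hARTM: interlevel regularizers illustration
[Figure: matrix factorization diagrams for PLSA ($F \approx \Phi \Theta$), ARTM (modality blocks $F^1, F^2$ factorized by $\Phi^1, \Phi^2$ and a shared $\Theta$), hARTM with the Φ regularizer ($\Phi^l$ appended to $F$, with $\Psi$ appended to $\Theta$), and hARTM with the Θ regularizer ($\Theta^l$ appended to $F^1, F^2$, with $\tilde{\Psi}$ appended to $\Phi^1, \Phi^2$)]
$$F = \bigcup_{m \in M} F^m, \quad F^m = \{f_{dw}\}_{W^m \times D}, \quad f_{dw} = \operatorname*{norm}_{w \in W^m} [n_{dw}]$$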

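25 hARTM: hierarchy sparsing with the Θ interlevel regularizer
The goal: topics have a small number of parent topics, i.e. $p(a \mid t)$ is sparse.
Entropy sparsing regularizer:
$$R_5(\tilde{\Psi}) = -\sum_{t \in T} \sum_{a \in A} \frac{1}{|A|} \ln \tilde{\psi}_{at}$$
Updated M-step:
$$\tilde{\psi}_{at} = \operatorname*{norm}_{a \in A} \left[ n_{at} - \frac{\tau_5}{|A|} \right]$$
Drawback: the possibility of $p(a \mid t) = 0\ \forall a$.
Power sparsing regularizer:
$$R_5(\tilde{\Psi}) = \sum_{t \in T} \sum_{a \in A} \frac{1}{q} \tilde{\psi}_{at}^{\,q}, \quad q > 1$$
Updated M-step:
$$\tilde{\psi}_{at} = \operatorname*{norm}_{a \in A} \left[ n_{at} + \tau_5 \tilde{\psi}_{at}^{\,q} \right]$$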
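The difference between the two updates can be seen on a single child topic. The sketch below applies both M-step formulas to one hypothetical column of counters `n_at`; the function names and the values of `tau` and `q` are illustrative assumptions.

```python
def norm(values):
    """Clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def entropy_sparse_update(n_at, tau):
    """psi_at = norm_a[n_at - tau/|A|] for one child topic t."""
    n_parents = len(n_at)
    return norm([n - tau / n_parents for n in n_at])


def power_sparse_update(n_at, psi_at, tau, q=2.0):
    """psi_at = norm_a[n_at + tau * psi_at**q] for one child topic t."""
    return norm([n + tau * p ** q for n, p in zip(n_at, psi_at)])


# Hypothetical counters of one child topic over 3 parent topics.
n_at = [0.2, 0.1, 0.05]
psi_at = norm(n_at)  # current estimate of p(a|t)

psi_entropy = entropy_sparse_update(n_at, tau=3.0)
psi_power = power_sparse_update(n_at, psi_at, tau=3.0)
```

With these numbers every counter is below `tau / 3`, so the entropy update zeroes the whole column (the drawback noted above), while the power update only sharpens the distribution toward its largest entry.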

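26 hARTM: hierarchy sparsing with the Φ interlevel regularizer
The goal: topics have a small number of parent topics, i.e. $p(a \mid t)$ is sparse.
With the Φ regularizer the parameters are $\psi_{ta} = p(t \mid a)$, so $p(a \mid t)$ is obtained by Bayes' rule:
$$p(a \mid t) = \frac{\psi_{ta}\, p(a)}{\sum_{a' \in A} \psi_{ta'}\, p(a')}$$
Entropy sparsing regularizer:
$$R_5(\Psi) = -\sum_{t \in T} \sum_{a \in A} \frac{1}{|A|} \ln p(a \mid t) = -\frac{1}{|A|} \sum_{t \in T} \sum_{a \in A} \ln \frac{\psi_{ta}\, p(a)}{\sum_{a' \in A} \psi_{ta'}\, p(a')}$$
Updated M-step:
$$\psi_{ta} = \operatorname*{norm}_{t \in T} \left[ n_{ta} - \tau_5 \left( \frac{1}{|A|} - p(a \mid t) \right) \right]$$
At any time $\forall t\ \exists a:\ p(a \mid t) > 0$.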
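A minimal sketch of this update: first invert $\psi_{ta} = p(t \mid a)$ into $p(a \mid t)$ with Bayes' rule, then shift each counter by $-\tau (1/|A| - p(a \mid t))$ and normalize over child topics. The toy values of `psi`, `p_a`, `n_ta` and `tau` are hypothetical, and the function names are illustrative.

```python
def norm(values):
    """Clip negatives to zero, then rescale to sum to one."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    return [v / total for v in clipped] if total > 0 else clipped


def p_a_given_t(psi, p_a):
    """Bayes' rule: p(a|t) = psi[t][a] * p(a) / sum_a' psi[t][a'] * p(a')."""
    result = []
    for row in psi:
        joint = [row[a] * p_a[a] for a in range(len(p_a))]
        total = sum(joint)
        result.append([j / total if total > 0 else 0.0 for j in joint])
    return result


def sparse_psi_update(n_ta, psi, p_a, tau):
    """psi_ta = norm_t[n_ta - tau * (1/|A| - p(a|t))]."""
    n_child, n_parent = len(psi), len(p_a)
    post = p_a_given_t(psi, p_a)
    cols = [norm([n_ta[t][a] - tau * (1.0 / n_parent - post[t][a])
                  for t in range(n_child)])
            for a in range(n_parent)]
    return [[cols[a][t] for a in range(n_parent)] for t in range(n_child)]


# Hypothetical toy values: 2 child topics, 2 parent topics.
psi = [[0.9, 0.3], [0.1, 0.7]]   # psi[t][a] = p(t|a)
p_a = [0.5, 0.5]                 # parent topic probabilities
n_ta = [[9.0, 3.0], [1.0, 7.0]]  # counters from the E-step

posterior = p_a_given_t(psi, p_a)
new_psi = sparse_psi_update(n_ta, psi, p_a, tau=2.0)
```

Counters of pairs with $p(a \mid t)$ above $1/|A|$ are increased and the others decreased, so dominant parent-child links are reinforced; since the values $p(a \mid t)$ sum to one over $a$, each child keeps at least one parent that is not penalized.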

27 hARTM in BigARTM
Key BigARTM concepts:
- The document set is split into batches and stored on disk.
- One EM step is a pass through the batches, iterating over each batch.
- Φ is stored permanently; Θ is retrained for every loaded batch.
Φ interlevel regularizer implementation:
1. Learn levels $l = 1, 2, \dots$
2. For levels $l > 1$, add one extra batch composed from the $(l-1)$-th level's Φ.
3. Extract Ψ as the Θ corresponding to the extra batch.
Θ interlevel regularizer implementation:
1. Learn levels $l = 1, 2, \dots$
2. For levels $l > 1$, modify all batches: add an extra modality composed from the $(l-1)$-th level's Θ.
3. Extract $\tilde{\Psi}$ as the Φ corresponding to the extra modality.

28 Experiments: comparison of the Φ and Θ interlevel regularizers
Wikipedia: |D| = , |W| = 
Learning the 2nd level, $|A| = 50$, $|T| = 250$, varying the number of batches.
Measuring the quality of the approximations $\Phi^l \approx \Phi \Psi$ and $\Theta^l \approx \tilde{\Psi} \Theta$.
[Figure: values of $\rho(\Phi^l, \Phi\Psi)$ and $\rho(\Theta^l, \tilde{\Psi}\Theta)$ for models trained with the Φ regularizer and with the Θ regularizer]
The approximation quality is nearly the same with both regularizers; the Φ regularizer is better.

29 Experiments: children number study
Postnauka: |D| = 1728, |W| = 
Learning the 2nd level with the Φ regularizer, $|A| = 10$, $|T| = 30$, varying the hierarchy sparsing coefficient $\tau_5$.
Measuring the mean and standard deviation of the estimated subtopic count over 10 restarts. A topic $t$ is a child of $a$ if $p(t \mid a) >$ threshold.
[Figure: two plots against $\log_{10} \tau_5$]
The bigger $\tau_5$, the sparser the hierarchy. For large $\tau_5$ the subtopic count estimate is robust (std < 1).
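The measurement described here can be sketched as a thresholded count over the columns of Ψ, followed by a mean and standard deviation across restarts. The Ψ matrices and the threshold below are hypothetical toy values; the use of the population standard deviation is an assumption, since the slides do not say which variant was used.

```python
import statistics


def count_children(psi, threshold):
    """Count child topics per parent: t is a child of a if psi[t][a] > threshold."""
    n_parents = len(psi[0])
    return [sum(1 for row in psi if row[a] > threshold) for a in range(n_parents)]


def children_stats(psi_restarts, threshold):
    """Mean and population std of the children count per parent over restarts."""
    counts = [count_children(psi, threshold) for psi in psi_restarts]
    n_parents = len(counts[0])
    means = [statistics.mean([c[a] for c in counts]) for a in range(n_parents)]
    stds = [statistics.pstdev([c[a] for c in counts]) for a in range(n_parents)]
    return means, stds


# Three hypothetical restarts with 3 child topics and 2 parent topics;
# psi[t][a] = p(t|a), so every column sums to one.
psi_restarts = [
    [[0.6, 0.0], [0.4, 0.1], [0.0, 0.9]],
    [[0.7, 0.05], [0.3, 0.05], [0.0, 0.9]],
    [[0.5, 0.0], [0.02, 0.5], [0.48, 0.5]],
]

means, stds = children_stats(psi_restarts, threshold=0.1)
```

In this toy example the first parent has two children in every restart (zero deviation), while the second parent's count varies between one and two.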

30 Experiments: parent-child relations study
Postnauka: |D| = 1728, |W| = 
Learning a topic hierarchy with the Φ regularizer. Generating 100 topic-subtopic pairs and asking an expert to mark each pair as "relation exists" or "relation does not exist".
[Figure: two panels showing $p(a \mid t)$ for the marked pairs, without sparsing and with Ψ sparsing]
When hierarchy sparsing is used, we can choose a threshold with minimum errors.

31 Summary
Contributions:
- An approach to learning topic hierarchies from multimodal data with additional requirements.
- A method to control hierarchy sparsity.
- An open-source implementation in BigARTM with a friendly interface.
Ongoing projects with hARTM:
- Creating a user-friendly navigator through Postnauka.ru materials.
- Developing a system for online news flow filtering.


More information

Categorical and Zero Inflated Growth Models

Categorical and Zero Inflated Growth Models Categorical and Zero Inflated Growth Models Alan C. Acock* Summer, 2009 *Alan C. Acock, Department of Human Development and Family Sciences, Oregon State University, Corvallis OR 97331 (alan.acock@oregonstate.edu).

More information

10-701/15-781, Machine Learning: Homework 4

10-701/15-781, Machine Learning: Homework 4 10-701/15-781, Machine Learning: Homewor 4 Aarti Singh Carnegie Mellon University ˆ The assignment is due at 10:30 am beginning of class on Mon, Nov 15, 2010. ˆ Separate you answers into five parts, one

More information

Scaling Neighbourhood Methods

Scaling Neighbourhood Methods Quick Recap Scaling Neighbourhood Methods Collaborative Filtering m = #items n = #users Complexity : m * m * n Comparative Scale of Signals ~50 M users ~25 M items Explicit Ratings ~ O(1M) (1 per billion)

More information

Single-tree GMM training

Single-tree GMM training Single-tree GMM training Ryan R. Curtin May 27, 2015 1 Introduction In this short document, we derive a tree-independent single-tree algorithm for Gaussian mixture model training, based on a technique

More information

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT

MLCC 2018 Variable Selection and Sparsity. Lorenzo Rosasco UNIGE-MIT-IIT MLCC 2018 Variable Selection and Sparsity Lorenzo Rosasco UNIGE-MIT-IIT Outline Variable Selection Subset Selection Greedy Methods: (Orthogonal) Matching Pursuit Convex Relaxation: LASSO & Elastic Net

More information

Introduction to Probabilistic Machine Learning

Introduction to Probabilistic Machine Learning Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning

More information

PROBABILISTIC LATENT SEMANTIC ANALYSIS

PROBABILISTIC LATENT SEMANTIC ANALYSIS PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications

More information

Latent Dirichlet Allocation

Latent Dirichlet Allocation Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology

More information

4CitySemantics. GIS-Semantic Tool for Urban Intervention Areas

4CitySemantics. GIS-Semantic Tool for Urban Intervention Areas 4CitySemantics GIS-Semantic Tool for Urban Intervention Areas Nuno MONTENEGRO 1 ; Jorge GOMES 2 ; Paulo URBANO 2 José P. DUARTE 1 1 Faculdade de Arquitectura da Universidade Técnica de Lisboa, Rua Sá Nogueira,

More information

Entropy. Expected Surprise

Entropy. Expected Surprise Entropy Let X be a discrete random variable The surprise of observing X = x is defined as log 2 P(X=x) Surprise of probability 1 is zero. Surprise of probability 0 is (c) 200 Thomas G. Dietterich 1 Expected

More information

An overview of word2vec

An overview of word2vec An overview of word2vec Benjamin Wilson Berlin ML Meetup, July 8 2014 Benjamin Wilson word2vec Berlin ML Meetup 1 / 25 Outline 1 Introduction 2 Background & Significance 3 Architecture 4 CBOW word representations

More information

Note for plsa and LDA-Version 1.1

Note for plsa and LDA-Version 1.1 Note for plsa and LDA-Version 1.1 Wayne Xin Zhao March 2, 2011 1 Disclaimer In this part of PLSA, I refer to [4, 5, 1]. In LDA part, I refer to [3, 2]. Due to the limit of my English ability, in some place,

More information

Text Mining for Economics and Finance Latent Dirichlet Allocation

Text Mining for Economics and Finance Latent Dirichlet Allocation Text Mining for Economics and Finance Latent Dirichlet Allocation Stephen Hansen Text Mining Lecture 5 1 / 45 Introduction Recall we are interested in mixed-membership modeling, but that the plsi model

More information

Discriminative Learning of Sum-Product Networks. Robert Gens Pedro Domingos

Discriminative Learning of Sum-Product Networks. Robert Gens Pedro Domingos Discriminative Learning of Sum-Product Networks Robert Gens Pedro Domingos X1 X1 X1 X1 X2 X2 X2 X2 X3 X3 X3 X3 X4 X4 X4 X4 X5 X5 X5 X5 X6 X6 X6 X6 Distributions X 1 X 1 X 1 X 1 X 2 X 2 X 2 X 2 X 3 X 3

More information

Putting the Bayes update to sleep

Putting the Bayes update to sleep Putting the Bayes update to sleep Manfred Warmuth UCSC AMS seminar 4-13-15 Joint work with Wouter M. Koolen, Dmitry Adamskiy, Olivier Bousquet Menu How adding one line of code to the multiplicative update

More information

A Tutorial on Learning with Bayesian Networks

A Tutorial on Learning with Bayesian Networks A utorial on Learning with Bayesian Networks David Heckerman Presented by: Krishna V Chengavalli April 21 2003 Outline Introduction Different Approaches Bayesian Networks Learning Probabilities and Structure

More information

p L yi z n m x N n xi

p L yi z n m x N n xi y i z n x n N x i Overview Directed and undirected graphs Conditional independence Exact inference Latent variables and EM Variational inference Books statistical perspective Graphical Models, S. Lauritzen

More information

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan

Clustering. CSL465/603 - Fall 2016 Narayanan C Krishnan Clustering CSL465/603 - Fall 2016 Narayanan C Krishnan ckn@iitrpr.ac.in Supervised vs Unsupervised Learning Supervised learning Given x ", y " "%& ', learn a function f: X Y Categorical output classification

More information

Undirected Graphical Models

Undirected Graphical Models Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Properties Properties 3 Generative vs. Conditional

More information

Unsupervised machine learning

Unsupervised machine learning Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels

More information

Language Information Processing, Advanced. Topic Models

Language Information Processing, Advanced. Topic Models Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:

More information

DT2118 Speech and Speaker Recognition

DT2118 Speech and Speaker Recognition DT2118 Speech and Speaker Recognition Language Modelling Giampiero Salvi KTH/CSC/TMH giampi@kth.se VT 2015 1 / 56 Outline Introduction Formal Language Theory Stochastic Language Models (SLM) N-gram Language

More information

Logic-based probabilistic modeling language. Syntax: Prolog + msw/2 (random choice) Pragmatics:(very) high level modeling language

Logic-based probabilistic modeling language. Syntax: Prolog + msw/2 (random choice) Pragmatics:(very) high level modeling language 1 Logic-based probabilistic modeling language Turing machine with statistically learnable state transitions Syntax: Prolog + msw/2 (random choice) Variables, terms, predicates, etc available for p.-modeling

More information

Factor Analysis (10/2/13)

Factor Analysis (10/2/13) STA561: Probabilistic machine learning Factor Analysis (10/2/13) Lecturer: Barbara Engelhardt Scribes: Li Zhu, Fan Li, Ni Guan Factor Analysis Factor analysis is related to the mixture models we have studied.

More information

Matrix Factorization Techniques for Recommender Systems

Matrix Factorization Techniques for Recommender Systems Matrix Factorization Techniques for Recommender Systems Patrick Seemann, December 16 th, 2014 16.12.2014 Fachbereich Informatik Recommender Systems Seminar Patrick Seemann Topics Intro New-User / New-Item

More information

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister

A Syntax-based Statistical Machine Translation Model. Alexander Friedl, Georg Teichtmeister A Syntax-based Statistical Machine Translation Model Alexander Friedl, Georg Teichtmeister 4.12.2006 Introduction The model Experiment Conclusion Statistical Translation Model (STM): - mathematical model

More information

Sum-Product Networks: A New Deep Architecture

Sum-Product Networks: A New Deep Architecture Sum-Product Networks: A New Deep Architecture Pedro Domingos Dept. Computer Science & Eng. University of Washington Joint work with Hoifung Poon 1 Graphical Models: Challenges Bayesian Network Markov Network

More information

Naïve Bayes Classifiers

Naïve Bayes Classifiers Naïve Bayes Classifiers Example: PlayTennis (6.9.1) Given a new instance, e.g. (Outlook = sunny, Temperature = cool, Humidity = high, Wind = strong ), we want to compute the most likely hypothesis: v NB

More information

Adaptive Crowdsourcing via EM with Prior

Adaptive Crowdsourcing via EM with Prior Adaptive Crowdsourcing via EM with Prior Peter Maginnis and Tanmay Gupta May, 205 In this work, we make two primary contributions: derivation of the EM update for the shifted and rescaled beta prior and

More information