Topic Models. Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW. Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1
|
|
- Patience Sanders
- 5 years ago
- Views:
Transcription
1 Topic Models Advanced Machine Learning for NLP Jordan Boyd-Graber OVERVIEW Advanced Machine Learning for NLP Boyd-Graber Topic Models 1 of 1
2 Low-Dimensional Space for Documents Last time: embedding space for words This time: embedding space for documents Generative story New inference techniques Advanced Machine Learning for NLP Boyd-Graber Topic Models 2 of 1
3 Why topic models? Suppose you have a huge number of documents Want to know what s going on Can t read them all (e.g. every New York Times article from the 90 s) Topic models offer a way to get a corpus-level view of major themes Advanced Machine Learning for NLP Boyd-Graber Topic Models 3 of 1
4 Why topic models? Suppose you have a huge number of documents Want to know what s going on Can t read them all (e.g. every New York Times article from the 90 s) Topic models offer a way to get a corpus-level view of major themes Unsupervised Advanced Machine Learning for NLP Boyd-Graber Topic Models 3 of 1
5 Roadmap What are topic models How to go from raw data to topics Advanced Machine Learning for NLP Boyd-Graber Topic Models 4 of 1
6 Embedding Space From an input corpus and number of topics K words to topics Corpus Forget the Bootleg, Just Download Multiplex the Heralded Movie Legally As Linchpin The Shape To of Growth Cinema, Transformed A Peaceful At Crew the Click Puts of Muppets a Where Mouse Stock Trades: A Its Better Mouth Deal Is For The Investors three Isn't big Internet Simple portals Red Light, begin Green to distinguish Light: A among 2-Tone themselves L.E.D. to as shopping Simplify Screens malls Advanced Machine Learning for NLP Boyd-Graber Topic Models 5 of 1
7 Embedding Space From an input corpus and number of topics K words to topics TOPIC 1 TOPIC 2 TOPIC 3 computer, technology, system, service, site, phone, internet, machine sell, sale, store, product, business, advertising, market, consumer play, film, movie, theater, production, star, director, stage Advanced Machine Learning for NLP Boyd-Graber Topic Models 5 of 1
8 Conceptual Approach For each document, what topics are expressed by that document? Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens The three big Internet portals begin to distinguish among themselves as shopping malls Stock Trades: A Better Deal For Investors Isn't Simple TOPIC 1 Forget the Bootleg, Just Download the Movie Legally TOPIC 2 The Shape of Cinema, Transformed At the Click of a Mouse Multiplex Heralded As Linchpin To Growth TOPIC 3 A Peaceful Crew Puts Muppets Where Its Mouth Is Advanced Machine Learning for NLP Boyd-Graber Topic Models 6 of 1
9 Topics from Science Advanced Machine Learning for NLP Boyd-Graber Topic Models 7 of 1
10 Why should you care? Neat way to explore / understand corpus collections E-discovery Social media Scientific data NLP Applications Word Sense Disambiguation Discourse Segmentation Machine Translation Psychology: word meaning, polysemy Inference is (relatively) simple Advanced Machine Learning for NLP Boyd-Graber Topic Models 8 of 1
11 Matrix Factorization Approach M K K V M V Topics Topic Assignment Dataset K Number of topics M Number of documents V Size of vocabulary Advanced Machine Learning for NLP Boyd-Graber Topic Models 9 of 1
12 Matrix Factorization Approach M K K V M V Topics Topic Assignment Dataset K Number of topics M Number of documents V Size of vocabulary If you use singular value decomposition (SVD), this technique is called latent semantic analysis. Popular in information retrieval. Advanced Machine Learning for NLP Boyd-Graber Topic Models 9 of 1
13 Alternative: Generative Model How your data came to be Sequence of Probabilistic Steps Posterior Inference Advanced Machine Learning for NLP Boyd-Graber Topic Models 10 of 1
14 Alternative: Generative Model How your data came to be Sequence of Probabilistic Steps Posterior Inference Blei, Ng, Jordan. Latent Dirichlet Allocation. JMLR, Advanced Machine Learning for NLP Boyd-Graber Topic Models 10 of 1
15 Multinomial Distribution Distribution over discrete outcomes Represented by non-negative vector that sums to one Picture representation (1,0,0) (0,0,1) (0,1,0) (1/3,1/3,1/3) (1/4,1/4,1/2) (1/2,1/2,0) Advanced Machine Learning for NLP Boyd-Graber Topic Models 11 of 1
16 Multinomial Distribution Distribution over discrete outcomes Represented by non-negative vector that sums to one Picture representation (1,0,0) (0,0,1) (0,1,0) (1/3,1/3,1/3) (1/4,1/4,1/2) (1/2,1/2,0) Come from a Dirichlet distribution Advanced Machine Learning for NLP Boyd-Graber Topic Models 11 of 1
17 Dirichlet Distribution Advanced Machine Learning for NLP Boyd-Graber Topic Models 12 of 1
18 Dirichlet Distribution Advanced Machine Learning for NLP Boyd-Graber Topic Models 12 of 1
19 Dirichlet Distribution Advanced Machine Learning for NLP Boyd-Graber Topic Models 12 of 1
20 Dirichlet Distribution Advanced Machine Learning for NLP Boyd-Graber Topic Models 13 of 1
21 Dirichlet Distribution If φ Dir(()α), w Mult(()φ), and n k = {w i : w i = k} then p(φ α,w ) p(w φ)p(φ α) (1) φ n k (2) k k k φ α k +n k 1 φ α k 1 Conjugacy: this posterior has the same form as the prior (3)
22 Dirichlet Distribution If φ Dir(()α), w Mult(()φ), and n k = {w i : w i = k} then p(φ α,w ) p(w φ)p(φ α) (1) φ n k (2) k k k φ α k +n k 1 φ α k 1 Conjugacy: this posterior has the same form as the prior (3) Advanced Machine Learning for NLP Boyd-Graber Topic Models 14 of 1
23 Generative Model TOPIC 1 computer, technology, system, service, site, phone, internet, machine TOPIC 2 sell, sale, store, product, business, advertising, market, consumer TOPIC 3 play, film, movie, theater, production, star, director, stage Advanced Machine Learning for NLP Boyd-Graber Topic Models 15 of 1
24 Generative Model Red Light, Green Light: A 2-Tone L.E.D. to Simplify Screens The three big Internet portals begin to distinguish among themselves as shopping malls Stock Trades: A Better Deal For Investors Isn't Simple TOPIC 1 Forget the Bootleg, Just Download the Movie Legally TOPIC 2 The Shape of Cinema, Transformed At the Click of a Mouse Multiplex Heralded As Linchpin To Growth TOPIC 3 A Peaceful Crew Puts Muppets Where Its Mouth Is Advanced Machine Learning for NLP Boyd-Graber Topic Models 15 of 1
25 Generative Model computer, technology, system, service, site, phone, internet, machine sell, sale, store, product, business, advertising, market, consumer play, film, movie, theater, production, star, director, stage Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's itunes music store and other online services... Advanced Machine Learning for NLP Boyd-Graber Topic Models 15 of 1
26 Generative Model computer, technology, system, service, site, phone, internet, machine sell, sale, store, product, business, advertising, market, consumer play, film, movie, theater, production, star, director, stage Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's itunes music store and other online services... Advanced Machine Learning for NLP Boyd-Graber Topic Models 15 of 1
27 Generative Model computer, technology, system, service, site, phone, internet, machine sell, sale, store, product, business, advertising, market, consumer play, film, movie, theater, production, star, director, stage Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's itunes music store and other online services... Advanced Machine Learning for NLP Boyd-Graber Topic Models 15 of 1
28 Generative Model computer, technology, system, service, site, phone, internet, machine sell, sale, store, product, business, advertising, market, consumer play, film, movie, theater, production, star, director, stage Hollywood studios are preparing to let people download and buy electronic copies of movies over the Internet, much as record labels now sell songs for 99 cents through Apple Computer's itunes music store and other online services... Advanced Machine Learning for NLP Boyd-Graber Topic Models 15 of 1
29 Generative Model Approach λ βk K α θd zn wn N M For each topic k {1,..., K }, draw a multinomial distribution β k from a Dirichlet distribution with parameter λ Advanced Machine Learning for NLP Boyd-Graber Topic Models 16 of 1
30 Generative Model Approach λ βk K α θd zn wn N M For each topic k {1,..., K }, draw a multinomial distribution β k from a Dirichlet distribution with parameter λ For each document d {1,...,M }, draw a multinomial distribution θ d from a Dirichlet distribution with parameter α Advanced Machine Learning for NLP Boyd-Graber Topic Models 16 of 1
31 Generative Model Approach λ βk K α θd zn wn N M For each topic k {1,..., K }, draw a multinomial distribution β k from a Dirichlet distribution with parameter λ For each document d {1,...,M }, draw a multinomial distribution θ d from a Dirichlet distribution with parameter α For each word position n {1,...,N }, select a hidden topic z n from the multinomial distribution parameterized by θ. Advanced Machine Learning for NLP Boyd-Graber Topic Models 16 of 1
32 Generative Model Approach λ βk K α θd zn wn N M For each topic k {1,..., K }, draw a multinomial distribution β k from a Dirichlet distribution with parameter λ For each document d {1,...,M }, draw a multinomial distribution θ d from a Dirichlet distribution with parameter α For each word position n {1,...,N }, select a hidden topic z n from the multinomial distribution parameterized by θ. Choose the observed word w n from the distribution β zn. Advanced Machine Learning for NLP Boyd-Graber Topic Models 16 of 1
33 Generative Model Approach λ βk K α θd zn wn N M For each topic k {1,..., K }, draw a multinomial distribution β k from a Dirichlet distribution with parameter λ For each document d {1,...,M }, draw a multinomial distribution θ d from a Dirichlet distribution with parameter α For each word position n {1,...,N }, select a hidden topic z n from the multinomial distribution parameterized by θ. Choose the observed word w n from the distribution β zn. Advanced We Machine use Learning statistical for NLP Boyd-Graber inference to uncover the most likely unobserved Topic Models 16 of 1
34 Topic Models: What s Important Topic models Topics to word types multinomial distribution Documents to topics multinomial distribution Focus in this talk: statistical methods Model: story of how your data came to be Latent variables: missing pieces of your story Statistical inference: filling in those missing pieces We use latent Dirichlet allocation (LDA), a fully Bayesian version of plsi, probabilistic version of LSA Advanced Machine Learning for NLP Boyd-Graber Topic Models 17 of 1
35 Topic Models: What s Important Topic models (latent variables) Topics to word types multinomial distribution Documents to topics multinomial distribution Focus in this talk: statistical methods Model: story of how your data came to be Latent variables: missing pieces of your story Statistical inference: filling in those missing pieces We use latent Dirichlet allocation (LDA), a fully Bayesian version of plsi, probabilistic version of LSA Advanced Machine Learning for NLP Boyd-Graber Topic Models 17 of 1
36 Advanced Machine Learning for NLP Boyd-Graber Topic Models 18 of 1
Topic Models. Material adapted from David Mimno University of Maryland INTRODUCTION. Material adapted from David Mimno UMD Topic Models 1 / 51
Topic Models Material adapted from David Mimno University of Maryland INTRODUCTION Material adapted from David Mimno UMD Topic Models 1 / 51 Why topic models? Suppose you have a huge number of documents
More informationLSI, plsi, LDA and inference methods
LSI, plsi, LDA and inference methods Guillaume Obozinski INRIA - Ecole Normale Supérieure - Paris RussIR summer school Yaroslavl, August 6-10th 2012 Guillaume Obozinski LSI, plsi, LDA and inference methods
More informationMr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce
Mr. LDA: A Flexible Large Scale Topic Modeling Package using Variational Inference in MapReduce Ke Zhai, Jordan Boyd-Graber, Nima Asadi, and Mohamad Alkhouja Mr. LDA: A Flexible Large Scale Topic Modeling
More informationGibbs Sampling. Héctor Corrada Bravo. University of Maryland, College Park, USA CMSC 644:
Gibbs Sampling Héctor Corrada Bravo University of Maryland, College Park, USA CMSC 644: 2019 03 27 Latent semantic analysis Documents as mixtures of topics (Hoffman 1999) 1 / 60 Latent semantic analysis
More informationInformation retrieval LSI, plsi and LDA. Jian-Yun Nie
Information retrieval LSI, plsi and LDA Jian-Yun Nie Basics: Eigenvector, Eigenvalue Ref: http://en.wikipedia.org/wiki/eigenvector For a square matrix A: Ax = λx where x is a vector (eigenvector), and
More informationApplying LDA topic model to a corpus of Italian Supreme Court decisions
Applying LDA topic model to a corpus of Italian Supreme Court decisions Paolo Fantini Statistical Service of the Ministry of Justice - Italy CESS Conference - Rome - November 25, 2014 Our goal finding
More informationDistributed ML for DOSNs: giving power back to users
Distributed ML for DOSNs: giving power back to users Amira Soliman KTH isocial Marie Curie Initial Training Networks Part1 Agenda DOSNs and Machine Learning DIVa: Decentralized Identity Validation for
More informationLatent Dirichlet Allocation Introduction/Overview
Latent Dirichlet Allocation Introduction/Overview David Meyer 03.10.2016 David Meyer http://www.1-4-5.net/~dmm/ml/lda_intro.pdf 03.10.2016 Agenda What is Topic Modeling? Parametric vs. Non-Parametric Models
More informationLatent Dirichlet Allocation
Outlines Advanced Artificial Intelligence October 1, 2009 Outlines Part I: Theoretical Background Part II: Application and Results 1 Motive Previous Research Exchangeability 2 Notation and Terminology
More informationTopic Models. Brandon Malone. February 20, Latent Dirichlet Allocation Success Stories Wrap-up
Much of this material is adapted from Blei 2003. Many of the images were taken from the Internet February 20, 2014 Suppose we have a large number of books. Each is about several unknown topics. How can
More informationDocument and Topic Models: plsa and LDA
Document and Topic Models: plsa and LDA Andrew Levandoski and Jonathan Lobo CS 3750 Advanced Topics in Machine Learning 2 October 2018 Outline Topic Models plsa LSA Model Fitting via EM phits: link analysis
More informationTopic Models and Applications to Short Documents
Topic Models and Applications to Short Documents Dieu-Thu Le Email: dieuthu.le@unitn.it Trento University April 6, 2011 1 / 43 Outline Introduction Latent Dirichlet Allocation Gibbs Sampling Short Text
More informationTopic Modeling Using Latent Dirichlet Allocation (LDA)
Topic Modeling Using Latent Dirichlet Allocation (LDA) Porter Jenkins and Mimi Brinberg Penn State University prj3@psu.edu mjb6504@psu.edu October 23, 2017 Porter Jenkins and Mimi Brinberg (PSU) LDA October
More informationStatistical Debugging with Latent Topic Models
Statistical Debugging with Latent Topic Models David Andrzejewski, Anne Mulhern, Ben Liblit, Xiaojin Zhu Department of Computer Sciences University of Wisconsin Madison European Conference on Machine Learning,
More informationCS Lecture 18. Topic Models and LDA
CS 6347 Lecture 18 Topic Models and LDA (some slides by David Blei) Generative vs. Discriminative Models Recall that, in Bayesian networks, there could be many different, but equivalent models of the same
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Yuriy Sverchkov Intelligent Systems Program University of Pittsburgh October 6, 2011 Outline Latent Semantic Analysis (LSA) A quick review Probabilistic LSA (plsa)
More informationtopic modeling hanna m. wallach
university of massachusetts amherst wallach@cs.umass.edu Ramona Blei-Gantz Helen Moss (Dave's Grandma) The Next 30 Minutes Motivations and a brief history: Latent semantic analysis Probabilistic latent
More informationGaussian Mixture Model
Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) D. Blei, A. Ng, and M. Jordan. Journal of Machine Learning Research, 3:993-1022, January 2003. Following slides borrowed ant then heavily modified from: Jonathan Huang
More informationPROBABILISTIC LATENT SEMANTIC ANALYSIS
PROBABILISTIC LATENT SEMANTIC ANALYSIS Lingjia Deng Revised from slides of Shuguang Wang Outline Review of previous notes PCA/SVD HITS Latent Semantic Analysis Probabilistic Latent Semantic Analysis Applications
More informationAN INTRODUCTION TO TOPIC MODELS
AN INTRODUCTION TO TOPIC MODELS Michael Paul December 4, 2013 600.465 Natural Language Processing Johns Hopkins University Prof. Jason Eisner Making sense of text Suppose you want to learn something about
More informationLatent Dirichlet Allocation (LDA)
Latent Dirichlet Allocation (LDA) A review of topic modeling and customer interactions application 3/11/2015 1 Agenda Agenda Items 1 What is topic modeling? Intro Text Mining & Pre-Processing Natural Language
More informationIntroduction to Bayesian inference
Introduction to Bayesian inference Thomas Alexander Brouwer University of Cambridge tab43@cam.ac.uk 17 November 2015 Probabilistic models Describe how data was generated using probability distributions
More informationUnderstanding Comments Submitted to FCC on Net Neutrality. Kevin (Junhui) Mao, Jing Xia, Dennis (Woncheol) Jeong December 12, 2014
Understanding Comments Submitted to FCC on Net Neutrality Kevin (Junhui) Mao, Jing Xia, Dennis (Woncheol) Jeong December 12, 2014 Abstract We aim to understand and summarize themes in the 1.65 million
More informationEfficient Tree-Based Topic Modeling
Efficient Tree-Based Topic Modeling Yuening Hu Department of Computer Science University of Maryland, College Park ynhu@cs.umd.edu Abstract Topic modeling with a tree-based prior has been used for a variety
More informationContent-based Recommendation
Content-based Recommendation Suthee Chaidaroon June 13, 2016 Contents 1 Introduction 1 1.1 Matrix Factorization......................... 2 2 slda 2 2.1 Model................................. 3 3 flda 3
More informationLecture 19, November 19, 2012
Machine Learning 0-70/5-78, Fall 0 Latent Space Analysis SVD and Topic Models Eric Xing Lecture 9, November 9, 0 Reading: Tutorial on Topic Model @ ACL Eric Xing @ CMU, 006-0 We are inundated with data
More informationNote for plsa and LDA-Version 1.1
Note for plsa and LDA-Version 1.1 Wayne Xin Zhao March 2, 2011 1 Disclaimer In this part of PLSA, I refer to [4, 5, 1]. In LDA part, I refer to [3, 2]. Due to the limit of my English ability, in some place,
More informationTopic Models. Charles Elkan November 20, 2008
Topic Models Charles Elan elan@cs.ucsd.edu November 20, 2008 Suppose that we have a collection of documents, and we want to find an organization for these, i.e. we want to do unsupervised learning. One
More informationTopic Discovery Project Report
Topic Discovery Project Report Shunyu Yao and Xingjiang Yu IIIS, Tsinghua University {yao-sy15, yu-xj15}@mails.tsinghua.edu.cn Abstract In this report we present our implementations of topic discovery
More informationGenerative Clustering, Topic Modeling, & Bayesian Inference
Generative Clustering, Topic Modeling, & Bayesian Inference INFO-4604, Applied Machine Learning University of Colorado Boulder December 12-14, 2017 Prof. Michael Paul Unsupervised Naïve Bayes Last week
More informationKernel Density Topic Models: Visual Topics Without Visual Words
Kernel Density Topic Models: Visual Topics Without Visual Words Konstantinos Rematas K.U. Leuven ESAT-iMinds krematas@esat.kuleuven.be Mario Fritz Max Planck Institute for Informatics mfrtiz@mpi-inf.mpg.de
More informationOnline Bayesian Passive-Agressive Learning
Online Bayesian Passive-Agressive Learning International Conference on Machine Learning, 2014 Tianlin Shi Jun Zhu Tsinghua University, China 21 August 2015 Presented by: Kyle Ulrich Introduction Online
More informationWeb Search and Text Mining. Lecture 16: Topics and Communities
Web Search and Tet Mining Lecture 16: Topics and Communities Outline Latent Dirichlet Allocation (LDA) Graphical models for social netorks Eploration, discovery, and query-ansering in the contet of the
More informationMatrix Factorization & Latent Semantic Analysis Review. Yize Li, Lanbo Zhang
Matrix Factorization & Latent Semantic Analysis Review Yize Li, Lanbo Zhang Overview SVD in Latent Semantic Indexing Non-negative Matrix Factorization Probabilistic Latent Semantic Indexing Vector Space
More informationStudy Notes on the Latent Dirichlet Allocation
Study Notes on the Latent Dirichlet Allocation Xugang Ye 1. Model Framework A word is an element of dictionary {1,,}. A document is represented by a sequence of words: =(,, ), {1,,}. A corpus is a collection
More informationKnowledge Discovery and Data Mining 1 (VO) ( )
Knowledge Discovery and Data Mining 1 (VO) (707.003) Probabilistic Latent Semantic Analysis Denis Helic KTI, TU Graz Jan 16, 2014 Denis Helic (KTI, TU Graz) KDDM1 Jan 16, 2014 1 / 47 Big picture: KDDM
More informationA Continuous-Time Model of Topic Co-occurrence Trends
A Continuous-Time Model of Topic Co-occurrence Trends Wei Li, Xuerui Wang and Andrew McCallum Department of Computer Science University of Massachusetts 140 Governors Drive Amherst, MA 01003-9264 Abstract
More informationTopic Modelling and Latent Dirichlet Allocation
Topic Modelling and Latent Dirichlet Allocation Stephen Clark (with thanks to Mark Gales for some of the slides) Lent 2013 Machine Learning for Language Processing: Lecture 7 MPhil in Advanced Computer
More informationMeasuring Topic Quality in Latent Dirichlet Allocation
Measuring Topic Quality in Sergei Koltsov Olessia Koltsova Steklov Institute of Mathematics at St. Petersburg Laboratory for Internet Studies, National Research University Higher School of Economics, St.
More informationApplying Latent Dirichlet Allocation to Group Discovery in Large Graphs
Lawrence Livermore National Laboratory Applying Latent Dirichlet Allocation to Group Discovery in Large Graphs Keith Henderson and Tina Eliassi-Rad keith@llnl.gov and eliassi@llnl.gov This work was performed
More informationLatent Dirichlet Allocation
Latent Dirichlet Allocation 1 Directed Graphical Models William W. Cohen Machine Learning 10-601 2 DGMs: The Burglar Alarm example Node ~ random variable Burglar Earthquake Arcs define form of probability
More informationNon-parametric Clustering with Dirichlet Processes
Non-parametric Clustering with Dirichlet Processes Timothy Burns SUNY at Buffalo Mar. 31 2009 T. Burns (SUNY at Buffalo) Non-parametric Clustering with Dirichlet Processes Mar. 31 2009 1 / 24 Introduction
More informationCS 572: Information Retrieval
CS 572: Information Retrieval Lecture 11: Topic Models Acknowledgments: Some slides were adapted from Chris Manning, and from Thomas Hoffman 1 Plan for next few weeks Project 1: done (submit by Friday).
More informationMixtures of Multinomials
Mixtures of Multinomials Jason D. M. Rennie jrennie@gmail.com September, 25 Abstract We consider two different types of multinomial mixtures, () a wordlevel mixture, and (2) a document-level mixture. We
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationText Mining for Economics and Finance Latent Dirichlet Allocation
Text Mining for Economics and Finance Latent Dirichlet Allocation Stephen Hansen Text Mining Lecture 5 1 / 45 Introduction Recall we are interested in mixed-membership modeling, but that the plsi model
More informationEvaluation Methods for Topic Models
University of Massachusetts Amherst wallach@cs.umass.edu April 13, 2009 Joint work with Iain Murray, Ruslan Salakhutdinov and David Mimno Statistical Topic Models Useful for analyzing large, unstructured
More informationLecture 13 : Variational Inference: Mean Field Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 Lecture 13 : Variational Inference: Mean Field Approximation Lecturer: Willie Neiswanger Scribes: Xupeng Tong, Minxing Liu 1 Problem Setup 1.1
More informationApplying hlda to Practical Topic Modeling
Joseph Heng lengerfulluse@gmail.com CIST Lab of BUPT March 17, 2013 Outline 1 HLDA Discussion 2 the nested CRP GEM Distribution Dirichlet Distribution Posterior Inference Outline 1 HLDA Discussion 2 the
More informationhow to *do* computationally assisted research
how to *do* computationally assisted research digital literacy @ comwell Kristoffer L Nielbo knielbo@sdu.dk knielbo.github.io/ March 22, 2018 1/30 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 class Person(object):
More informationTopic Modeling: Beyond Bag-of-Words
University of Cambridge hmw26@cam.ac.uk June 26, 2006 Generative Probabilistic Models of Text Used in text compression, predictive text entry, information retrieval Estimate probability of a word in a
More informationNotes on Latent Semantic Analysis
Notes on Latent Semantic Analysis Costas Boulis 1 Introduction One of the most fundamental problems of information retrieval (IR) is to find all documents (and nothing but those) that are semantically
More informationFast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine
Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu
More informationGaussian Models
Gaussian Models ddebarr@uw.edu 2016-04-28 Agenda Introduction Gaussian Discriminant Analysis Inference Linear Gaussian Systems The Wishart Distribution Inferring Parameters Introduction Gaussian Density
More informationAUTOMATIC DETECTION OF WORDS NOT SIGNIFICANT TO TOPIC CLASSIFICATION IN LATENT DIRICHLET ALLOCATION
AUTOMATIC DETECTION OF WORDS NOT SIGNIFICANT TO TOPIC CLASSIFICATION IN LATENT DIRICHLET ALLOCATION By DEBARSHI ROY A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
More informationLanguage Information Processing, Advanced. Topic Models
Language Information Processing, Advanced Topic Models mcuturi@i.kyoto-u.ac.jp Kyoto University - LIP, Adv. - 2011 1 Today s talk Continue exploring the representation of text as histogram of words. Objective:
More informationInteractive Topic Modeling
The 49 th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies Interactive Topic Modeling Yuening Hu, Jordan Boyd-Graber, Brianna Satinoff {ynhu, bsonrisa}@cs.umd.edu,
More informationModeling Topics and Knowledge Bases with Embeddings
Modeling Topics and Knowledge Bases with Embeddings Dat Quoc Nguyen and Mark Johnson Department of Computing Macquarie University Sydney, Australia December 2016 1 / 15 Vector representations/embeddings
More informationMixture Models and Expectation-Maximization
Mixture Models and Expectation-Maximiation David M. Blei March 9, 2012 EM for mixtures of multinomials The graphical model for a mixture of multinomials π d x dn N D θ k K How should we fit the parameters?
More informationPachinko Allocation: DAG-Structured Mixture Models of Topic Correlations
: DAG-Structured Mixture Models of Topic Correlations Wei Li and Andrew McCallum University of Massachusetts, Dept. of Computer Science {weili,mccallum}@cs.umass.edu Abstract Latent Dirichlet allocation
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University April 5, 2011 Today: Latent Dirichlet Allocation topic models Social network analysis based on latent probabilistic
More informationLatent Dirichlet Alloca/on
Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which
More informationA Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank
A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,
More informationAn Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition
An Empirical Study on Dimensionality Optimization in Text Mining for Linguistic Knowledge Acquisition Yu-Seop Kim 1, Jeong-Ho Chang 2, and Byoung-Tak Zhang 2 1 Division of Information and Telecommunication
More informationProbabilistic Latent Semantic Analysis
Probabilistic Latent Semantic Analysis Dan Oneaţă 1 Introduction Probabilistic Latent Semantic Analysis (plsa) is a technique from the category of topic models. Its main goal is to model cooccurrence information
More informationLearning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text
Learning the Semantic Correlation: An Alternative Way to Gain from Unlabeled Text Yi Zhang Machine Learning Department Carnegie Mellon University yizhang1@cs.cmu.edu Jeff Schneider The Robotics Institute
More informationThe Role of Semantic History on Online Generative Topic Modeling
The Role of Semantic History on Online Generative Topic Modeling Loulwah AlSumait, Daniel Barbará, Carlotta Domeniconi Department of Computer Science George Mason University Fairfax - VA, USA lalsumai@gmu.edu,
More informationLatent variable models for discrete data
Latent variable models for discrete data Jianfei Chen Department of Computer Science and Technology Tsinghua University, Beijing 100084 chris.jianfei.chen@gmail.com Janurary 13, 2014 Murphy, Kevin P. Machine
More informationSparse Stochastic Inference for Latent Dirichlet Allocation
Sparse Stochastic Inference for Latent Dirichlet Allocation David Mimno 1, Matthew D. Hoffman 2, David M. Blei 1 1 Dept. of Computer Science, Princeton U. 2 Dept. of Statistics, Columbia U. Presentation
More informationInexact Search is Good Enough
Inexact Search is Good Enough Advanced Machine Learning for NLP Jordan Boyd-Graber MATHEMATICAL TREATMENT Advanced Machine Learning for NLP Boyd-Graber Inexact Search is Good Enough 1 of 1 Preliminaries:
More informationLecture 22 Exploratory Text Analysis & Topic Models
Lecture 22 Exploratory Text Analysis & Topic Models Intro to NLP, CS585, Fall 2014 http://people.cs.umass.edu/~brenocon/inlp2014/ Brendan O Connor [Some slides borrowed from Michael Paul] 1 Text Corpus
More informationWeb-Mining Agents Topic Analysis: plsi and LDA. Tanya Braun Ralf Möller Universität zu Lübeck Institut für Informationssysteme
Web-Mining Agents Topic Analysis: plsi and LDA Tanya Braun Ralf Möller Universität zu Lübeck Institut für Informationssysteme Acknowledgments Pilfered from: Ramesh M. Nallapati Machine Learning applied
More informationReplicated Softmax: an Undirected Topic Model. Stephen Turner
Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental
More informationNote on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing
Note on Algorithm Differences Between Nonnegative Matrix Factorization And Probabilistic Latent Semantic Indexing 1 Zhong-Yuan Zhang, 2 Chris Ding, 3 Jie Tang *1, Corresponding Author School of Statistics,
More informationParallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability
Parallelized Variational EM for Latent Dirichlet Allocation: An Experimental Evaluation of Speed and Scalability Ramesh Nallapati, William Cohen and John Lafferty Machine Learning Department Carnegie Mellon
More informationLatent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization
Latent Dirichlet Allocation and Singular Value Decomposition based Multi-Document Summarization Rachit Arora Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India.
More informationCS145: INTRODUCTION TO DATA MINING
CS145: INTRODUCTION TO DATA MINING Text Data: Topic Model Instructor: Yizhou Sun yzsun@cs.ucla.edu December 4, 2017 Methods to be Learnt Vector Data Set Data Sequence Data Text Data Classification Clustering
More informationRETRIEVAL MODELS. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
RETRIEVAL MODELS Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Retrieval models Boolean model Vector space model Probabilistic
More informationLDA with Amortized Inference
LDA with Amortied Inference Nanbo Sun Abstract This report describes how to frame Latent Dirichlet Allocation LDA as a Variational Auto- Encoder VAE and use the Amortied Variational Inference AVI to optimie
More informationModeling User Rating Profiles For Collaborative Filtering
Modeling User Rating Profiles For Collaborative Filtering Benjamin Marlin Department of Computer Science University of Toronto Toronto, ON, M5S 3H5, CANADA marlin@cs.toronto.edu Abstract In this paper
More informationText Mining: Basic Models and Applications
Introduction Basics Latent Dirichlet Allocation (LDA) Markov Chain Based Models Public Policy Applications Text Mining: Basic Models and Applications Alvaro J. Riascos Villegas University of los Andes
More information16 : Approximate Inference: Markov Chain Monte Carlo
10-708: Probabilistic Graphical Models 10-708, Spring 2017 16 : Approximate Inference: Markov Chain Monte Carlo Lecturer: Eric P. Xing Scribes: Yuan Yang, Chao-Ming Yen 1 Introduction As the target distribution
More informationDepartment of Computer Science, Guiyang University, Guiyang , GuiZhou, China
doi:10.21311/002.31.12.01 A Hybrid Recommendation Algorithm with LDA and SVD++ Considering the News Timeliness Junsong Luo 1*, Can Jiang 2, Peng Tian 2 and Wei Huang 2, 3 1 College of Information Science
More informationDirichlet Enhanced Latent Semantic Analysis
Dirichlet Enhanced Latent Semantic Analysis Kai Yu Siemens Corporate Technology D-81730 Munich, Germany Kai.Yu@siemens.com Shipeng Yu Institute for Computer Science University of Munich D-80538 Munich,
More informationModeling Environment
Topic Model Modeling Environment What does it mean to understand/ your environment? Ability to predict Two approaches to ing environment of words and text Latent Semantic Analysis (LSA) Topic Model LSA
More informationMixed Membership Matrix Factorization
Mixed Membership Matrix Factorization Lester Mackey 1 David Weiss 2 Michael I. Jordan 1 1 University of California, Berkeley 2 University of Pennsylvania International Conference on Machine Learning, 2010
More informationQuery-document Relevance Topic Models
Query-document Relevance Topic Models Meng-Sung Wu, Chia-Ping Chen and Hsin-Min Wang Industrial Technology Research Institute, Hsinchu, Taiwan National Sun Yat-Sen University, Kaohsiung, Taiwan Institute
More informationGraphical models and topic modeling. Ho Tu Bao Japan Advance Institute of Science and Technology John von Neumann Institute, VNU-HCM
Graphical models and topic modeling Ho Tu Bao Japan Advance Institute of Science and Technology John von Neumann Institute, VNU-HCM Content Brief overview of graphical models Introduction to topic models
More informationProbabilistic Topic Models Tutorial: COMAD 2011
Probabilistic Topic Models Tutorial: COMAD 2011 Indrajit Bhattacharya Assistant Professor Dept of Computer Sc. & Automation Indian Institute Of Science, Bangalore My Background Interests Topic Models Probabilistic
More informationRecent Advances in Bayesian Inference Techniques
Recent Advances in Bayesian Inference Techniques Christopher M. Bishop Microsoft Research, Cambridge, U.K. research.microsoft.com/~cmbishop SIAM Conference on Data Mining, April 2004 Abstract Bayesian
More informationMETHODS FOR IDENTIFYING PUBLIC HEALTH TRENDS. Mark Dredze Department of Computer Science Johns Hopkins University
METHODS FOR IDENTIFYING PUBLIC HEALTH TRENDS Mark Dredze Department of Computer Science Johns Hopkins University disease surveillance self medicating vaccination PUBLIC HEALTH The prevention of disease,
More informationLecture 8: Graphical models for Text
Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/
More informationHTM: A Topic Model for Hypertexts
HTM: A Topic Model for Hypertexts Congkai Sun Department of Computer Science Shanghai Jiaotong University Shanghai, P. R. China martinsck@hotmail.com Bin Gao Microsoft Research Asia No.49 Zhichun Road
More informationLatent Dirichlet Allocation Based Multi-Document Summarization
Latent Dirichlet Allocation Based Multi-Document Summarization Rachit Arora Department of Computer Science and Engineering Indian Institute of Technology Madras Chennai - 600 036, India. rachitar@cse.iitm.ernet.in
More informationProbabilistic Matrix Factorization
Probabilistic Matrix Factorization David M. Blei Columbia University November 25, 2015 1 Dyadic data One important type of modern data is dyadic data. Dyadic data are measurements on pairs. The idea is
More informationMixed-membership Models (and an introduction to variational inference)
Mixed-membership Models (and an introduction to variational inference) David M. Blei Columbia University November 24, 2015 Introduction We studied mixture models in detail, models that partition data into
More informationCollaborative Topic Modeling for Recommending Scientific Articles
Collaborative Topic Modeling for Recommending Scientific Articles Chong Wang and David M. Blei Best student paper award at KDD 2011 Computer Science Department, Princeton University Presented by Tian Cao
More informationUncovering News-Twitter Reciprocity via Interaction Patterns
Uncovering News-Twitter Reciprocity via Interaction Patterns Yue Ning 1 Sathappan Muthiah 1 Ravi Tandon 2 Naren Ramakrishnan 1 1 Discovery Analytics Center, Department of Computer Science, Virginia Tech
More informationLatent Dirichlet Conditional Naive-Bayes Models
Latent Dirichlet Conditional Naive-Bayes Models Arindam Banerjee Dept of Computer Science & Engineering University of Minnesota, Twin Cities banerjee@cs.umn.edu Hanhuai Shan Dept of Computer Science &
More information