Topic Models. Brandon Malone. February 20, 2014. Latent Dirichlet Allocation, Success Stories, Wrap-up

Much of this material is adapted from Blei (2003). Many of the images were taken from the Internet. February 20, 2014

Suppose we have a large number of books. Each is about several unknown topics. How can we tell which books are about similar (sets of) topics? This can be described by a topic model, in particular, the latent Dirichlet allocation model. [Plate diagram of the LDA model: variables α, β, T, P, z, w with plates K, N, D]

1 Latent Dirichlet Allocation 2 Success Stories 3 Wrap-up

Topic Model Notation. We will discuss everything in terms of text classification; other domains can often be mapped onto these concepts.
- T: the (multinomial) distribution over words for a particular topic
- w: a word selected to be in the document of interest
- N: the number of words in the document of interest
- K: the number of topics
- z: the topic for the word of interest
- P: the (multinomial) distribution over topics for a particular document
- D: the number of documents
- α: the parameter of the Dirichlet distribution that serves as a prior for T
- β: the parameter of the Dirichlet distribution that serves as a prior for P
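To make the notation concrete, here is a minimal NumPy sketch of the generative process the model assumes (my illustration, not from the slides; the sizes K, D, N, V and the prior values are arbitrary). Following the notation above, α parameterizes the Dirichlet prior on each T and β the prior on each P.

import numpy as np

# Illustrative sizes only: 3 topics, 5 documents of 50 words, 20-word vocabulary.
rng = np.random.default_rng(0)
K, D, N, V = 3, 5, 50, 20
alpha, beta = 0.5, 0.5

T = rng.dirichlet(alpha * np.ones(V), size=K)   # topic-word distributions, prior alpha
P = rng.dirichlet(beta * np.ones(K), size=D)    # document-topic distributions, prior beta

docs = []
for d in range(D):
    z = rng.choice(K, size=N, p=P[d])                 # a topic for each word slot
    w = np.array([rng.choice(V, p=T[k]) for k in z])  # a word drawn from that topic
    docs.append(w)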

Text classification with naive Bayes. [Plate diagram: T; w inside plate N] We used the naive Bayes model for a document about a single topic. Why shouldn't we use this for a document about multiple topics?

Mixtures of naive Bayes. [Plate diagram: T inside plate K; P; z and w inside plate N] Each topic has its own distribution over the words. Look familiar? All mixture models look basically the same.

Extending to multiple documents. [Plate diagram: T inside plate K; P, z, and w inside plates N and D] Each document now has its own distribution over topics. We are now interested in the joint probability of topics and words over all documents. Contrast this with how we trained naive Bayes.

Moving away from maximum likelihood. [Plate diagram of the full model: α and β added as priors on T and P] We previously saw problems with MLE parameters. They still exist here. We can add Dirichlet priors to the multinomial distributions. This is called the latent Dirichlet allocation (LDA) model.
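Written out (a reconstruction in the slides' notation; the formula does not appear on the slide itself), the joint distribution LDA defines is

p(T, P, z, w \mid \alpha, \beta) = \prod_{k=1}^{K} p(T_k \mid \alpha) \; \prod_{d=1}^{D} \Big[ p(P_d \mid \beta) \prod_{n=1}^{N_d} p(z_{d,n} \mid P_d) \, p(w_{d,n} \mid T_{z_{d,n}}) \Big]

where N_d is the number of words in document d.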

Typical goals in LDA. We are often interested in the posterior distributions related to z:
- High values of P(w | z), which show the words highly associated with each topic
- High values of P(z | P), which show the topics with which each document is associated
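As a concrete illustration of these two quantities (a sketch using scikit-learn, which the slides do not mention; the toy documents are invented), components_ plays the role of the topic-word distributions T and transform returns the per-document topic proportions:

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "dogs and cats make friendly pets",
    "stocks fell as the markets closed",
    "investors traded stocks and bonds",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)                        # document-word count matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

words = vec.get_feature_names_out()
for k, row in enumerate(lda.components_):          # row k: word weights for topic k
    top = row.argsort()[::-1][:5]
    print(f"topic {k}:", [words[i] for i in top])  # words most associated with topic k

print(lda.transform(X))                            # topic proportions for each document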

Parameter inference in LDA. Estimating the parameters of LDA (and similar models) is non-trivial. Two common approaches: Variational inference. The network is simplified by ignoring some edges, and an EM-like algorithm optimizes the parameters of the simplified model. Gibbs sampling. We repeatedly resample each latent variable from its conditional distribution given all the others, and use the collected samples to update the parameter estimates (a sketch follows below).
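For reference, here is a minimal collapsed Gibbs sampler for LDA (my sketch, not the lecture's code). It follows the slides' convention that alpha is the prior on the topic-word distributions T and beta the prior on the document-topic distributions P; docs is a list of word-id lists.

import numpy as np

def lda_gibbs(docs, K, V, alpha=0.1, beta=0.1, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    D = len(docs)
    n_dk = np.zeros((D, K))                         # topic counts per document
    n_kw = np.zeros((K, V))                         # word counts per topic
    n_k = np.zeros(K)                               # total words per topic
    z = [rng.integers(K, size=len(doc)) for doc in docs]  # random initial topics

    for d, doc in enumerate(docs):                  # tally the initial assignments
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                         # remove the current assignment
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # full conditional for this word's topic, given all other assignments
                p = (n_dk[d] + beta) * (n_kw[:, w] + alpha) / (n_k + V * alpha)
                k = rng.choice(K, p=p / p.sum())    # resample the topic
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1

    # smoothed posterior-mean estimates of the two distributions
    T = (n_kw + alpha) / (n_kw.sum(axis=1, keepdims=True) + V * alpha)
    P = (n_dk + beta) / (n_dk.sum(axis=1, keepdims=True) + K * beta)
    return T, P

After enough sweeps the counts stabilize, and the returned T and P are smoothed estimates of the topic-word and document-topic distributions.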

Extensions. Correlated topic models: some topics are more likely to occur together than others. Dynamic topic models: the word distributions may change over time. Discriminative topic models: we may want to make class-membership predictions. Nonparametric models: we may not know a priori how many topics there are. Many, many others...

Bayesian networks in environmental analysis. Bayesian networks have been used to model willow seed dispersal (as a proxy for species invasion) in the Florida wetlands. These results are used to help decide how much logging is allowed. Wilkinson et al., 2013

LDA in consumer behavior. LDA has been used to find groups of products relevant to particular consumers, as well as influencers who impact other consumers. These results are used to direct advertising for a group-deal website. Sun et al., 2013

Discriminative LDA in image recognition. Discriminative LDA has been used to effectively classify images based on small parts of the image ("words"). Fei-Fei et al., 2005

Correlated topic models for exploring Science. Correlated topic models have been used to identify related articles in Science. Blei and Lafferty, 2007

Dynamic topic models for exploring Science. Dynamic topic models have been used to analyze how the trends in Science have changed over time. [Figure: two example topics over time, labeled Topic 1 and Topic 2] Blei and Lafferty, 2009

Recap. During this part of the course, we have discussed:
- Topic models as a method for uncovering the underlying structure of objects
- Successful applications of graphical models in the wild

Final exam The final exam is Monday, February 24, 16:00-18:30 in B123. You can use a simple (non-graphing, not a cell phone, etc.) calculator and one A4 handwritten sheet of notes (front and back is okay). The exercises were worth 25 points. The exam is worth 35 points. You must score at least 20 points on the exam to receive a grade of 1 or more. See the syllabus for the grading scale. The renewal exam is April 15. The exercise points will count towards this exam.

Conclusions Thanks for participating in the course. I hope you have enjoyed it and will remember the problem-solving strategies we discussed when dealing with future challenges. Feel free to contact me if you have any questions about probabilistic models in the future.