Latent Dirichlet Allocation
1 Latent Dirichlet Allocation
2 Directed Graphical Models William W. Cohen Machine Learning
3 DGMs: The Burglar Alarm example. Node ~ random variable; arcs define the form of the probability distribution: Pr(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) = Pr(X_i | parents(X_i)). Graph: Burglar -> Alarm <- Earthquake, Alarm -> Phone Call. Your house has a twitchy burglar alarm that is also sometimes triggered by earthquakes. Earth arguably doesn't care whether your house is currently being burgled. While you are on vacation, one of your neighbors calls and tells you your home's burglar alarm is ringing. Uh oh!
4 DGMs: The Burglar Alarm example. Node ~ random variable; arcs define the form of the probability distribution: Pr(X_i | X_1, ..., X_{i-1}, X_{i+1}, ..., X_n) = Pr(X_i | parents(X_i)). Generative story (and joint distribution):
- Pick b ~ Pr(Burglar), a binomial
- Pick e ~ Pr(Earthquake), a binomial
- Pick a ~ Pr(Alarm | Burglar=b, Earthquake=e), four binomials
- Pick c ~ Pr(PhoneCall | Alarm=a)
5 DGMs: The Burglar Alarm example. You can also compute other quantities: e.g., Pr(Burglar=true | PhoneCall=true), or Pr(Burglar=true | PhoneCall=true and Earthquake=true). Generative story:
- Pick b ~ Pr(Burglar), a binomial
- Pick e ~ Pr(Earthquake), a binomial
- Pick a ~ Pr(Alarm | Burglar=b, Earthquake=e), four binomials
- Pick c ~ Pr(PhoneCall | Alarm=a)
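A quantity like Pr(Burglar=true | PhoneCall=true) can be computed from the joint by brute-force enumeration, which is feasible here because the network is tiny. A minimal sketch follows; all CPT numbers are made-up illustrative values, not taken from the slides.

```python
# A minimal sketch of inference by enumeration in the burglar-alarm network.
# All CPT numbers below are made-up illustrative values, not from the slides.
from itertools import product

P_b = {True: 0.001, False: 0.999}                  # Pr(Burglar)
P_e = {True: 0.002, False: 0.998}                  # Pr(Earthquake)
P_a = {(True, True): 0.95, (True, False): 0.94,    # Pr(Alarm=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_c = {True: 0.90, False: 0.05}                    # Pr(Call=true | Alarm)

def joint(b, e, a, c):
    """Pr(B=b, E=e, A=a, C=c) as the product of Pr(node | parents)."""
    pa = P_a[(b, e)] if a else 1 - P_a[(b, e)]
    pc = P_c[a] if c else 1 - P_c[a]
    return P_b[b] * P_e[e] * pa * pc

# Pr(Burglar=true | PhoneCall=true): sum out the hidden variables, then normalize
num = sum(joint(True, e, a, True) for e, a in product([True, False], repeat=2))
den = sum(joint(b, e, a, True) for b, e, a in product([True, False], repeat=3))
print(num / den)
```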
6 Inference in Bayes nets. General problem: given evidence E_1, ..., E_k, compute P(X | E_1, ..., E_k) for any node X. Big assumption: the graph is a polytree (<= 1 undirected path between any nodes X, Y). Notation (for a node X with parents U_1, U_2 and children Y_1, Y_2, whose other parents are Z_1, Z_2): E+_X = "causal support" for X; E-_X = "evidential support" for X.
7 Inference in Bayes nets: P(X | E). Split the evidence into causal support E+ (above X) and evidential support E- (below X); then
P(X | E) = P(X | E+, E-) = P(E- | X, E+) P(X | E+) / P(E- | E+) ∝ P(E- | X) P(X | E+)
Let's start with P(X | E+)...
8 Inference in Bayes nets: P(X | E+). For a node X with parents U_1, U_2:
P(X | E+) = Σ_{u1,u2} P(X | u1, u2, E+) P(u1, u2 | E+)
= Σ_{u1,u2} P(X | u1, u2) Π_j P(u_j | E+_j)    [CPT table lookup; recursive call to P(. | E+)]
= Σ_u P(X | u) Π_j P(u_j | E_{Uj \ X})    [E_{Uj \ X} = the evidence for U_j that doesn't go through X]
So far: a simple way of propagating requests for belief due to causal evidence up the tree; i.e., info on Pr(X | E+) flows down.
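The computation on this slide is just a sum over parent configurations, weighting X's CPT by the parents' forward messages. A minimal sketch, assuming two binary parents, a three-valued X, and made-up message values:

```python
# A minimal sketch (assumed toy numbers) of the causal-support computation
# P(X | E+) = sum_{u1,u2} P(X | u1, u2) * prod_j P(u_j | E+_j),
# given X's CPT and the forward messages already computed for its parents.
import numpy as np

rng = np.random.default_rng(0)
cpt = rng.dirichlet(np.ones(3), size=(2, 2))  # P(X | U1, U2): shape (2, 2, 3)
pi1 = np.array([0.7, 0.3])                    # forward message P(U1 | E+_1)
pi2 = np.array([0.4, 0.6])                    # forward message P(U2 | E+_2)

# weight each CPT row by the parent messages and sum out the parents
p_x_given_Eplus = np.einsum('uvx,u,v->x', cpt, pi1, pi2)
print(p_x_given_Eplus, p_x_given_Eplus.sum())  # a proper distribution over X
```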
9 Inference in Bayes nets: P(E- | X), simplified. By d-separation plus the polytree assumption, the evidential support factors over X's children Y_j:
P(E- | X) = Π_j P(E_{Yj} | X)
P(E_{Yj} | X) = Σ_{y_jk} P(E_{Yj} | y_jk, X) P(y_jk | X) = Σ_{y_jk} P(y_jk | X) P(E_{Yj} | y_jk)    [recursive call to P(E- | .)]
So far: a simple way of propagating requests for belief due to evidential support down the tree; i.e., info on Pr(E- | X) flows up. Notation: E-_{Yj} = evidential support for Y_j; E_{Yj \ X} = evidence for Y_j excluding support through X.
10 Recap: Message Passing for BP. We reduced P(X | E) to a product of two recursively calculated parts:
P(X=x | E+) = Σ_{u1,u2} P(x | u1, u2) Π_j P(u_j | E+_j), i.e., the CPT for X combined with the product of forward messages from the parents;
P(E- | X=x) = β Π_j Σ_{y_jk} Σ_{z_jk} P(E-_{Yj} | y_jk) P(y_jk | x, z_jk) P(z_jk | E_{Zjk \ Yj}), i.e., a combination of backward messages from the children, CPTs, and P(Z | E_{Z \ Yk}), a simpler instance of P(X | E).
This can also be implemented by message-passing (belief propagation). Messages are distributions, i.e., vectors.
11 Recap: Message Passing for BP. Top-level algorithm:
- Pick one vertex as the root (any node with only one edge is a leaf)
- Pass messages from the leaves to the root
- Pass messages from the root to the leaves
Now every node X has received P(X | E+) and P(E- | X) and can compute P(X | E).
12 NAÏVE BAYES AS A DGM
13 Naïve Bayes as a DGM. For each document d in the corpus (of size D):
- Pick a label y_d from Pr(Y)
- For each word in d (of length N_d): pick a word x_id from Pr(X | Y=y_d)
CPTs: Pr(Y=y), e.g. onion 0.3, economist 0.7; and, for every y, Pr(X=x | Y=y) over the whole vocabulary (aardvark ... zymurgy). Plate diagram: Y -> X, with X inside a plate of size N_d nested in a plate of size D.
14 Naïve Bayes as a DGM. For each document d in the corpus (of size D):
- Pick a label y_d from Pr(Y)
- For each word in d (of length N_d): pick a word x_id from Pr(X | Y=y_d)
Not described: how do we smooth for classes? For multinomials? How many classes are there? (Plate diagram as before.)
15 Recap: smoothing for a binomial. MLE: maximize Pr(D | θ). MAP: maximize Pr(D | θ) Pr(θ). Estimate θ = P(heads) for a binomial with MLE as
θ = #heads / (#heads + #tails)
and with MAP as
θ = (#heads + #imaginary heads) / (#heads + #tails + #imaginary heads + #imaginary tails)
Smoothing = prior over the parameter θ.
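A minimal sketch of the two estimates, with made-up counts; the imaginary counts encode the prior:

```python
# A minimal sketch of the MLE vs. MAP estimates of theta = P(heads) for a
# binomial; the observed and imaginary counts are made-up illustrative values.
heads, tails = 7, 3          # observed data
im_heads, im_tails = 1, 1    # imaginary counts encoding the prior

theta_mle = heads / (heads + tails)
theta_map = (heads + im_heads) / (heads + tails + im_heads + im_tails)
print(theta_mle, theta_map)  # 0.7 vs. 0.666...: the prior pulls toward 1/2
```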
16 Smoothing for a binomial as a DGM. MAP for dataset D with α_1 heads and α_2 tails: the prior γ = (#imaginary heads, #imaginary tails), with γ -> θ -> D. MAP is a Viterbi-style inference: we want to find the maximum-probability parameter θ according to the posterior distribution. Also: inference in even a simple graph can be intractable if the conditional distributions are complicated.
17 Smoothing for a binomial as a DGM. MAP for dataset D with α_1 heads and α_2 tails, γ = (#imaginary heads, #imaginary tails); generalizing, replace D=d with F=f_1, f_2, ..., so the Dirichlet prior has an imaginary count for each of K classes. Comment: the conjugate prior for a binomial is a Beta; the conjugate prior for a multinomial is called a Dirichlet.
18 Recap: Naïve Bayes as a DGM. Now: let's turn Bayes up to 11 for naïve Bayes. (Plate diagram.)
19 A more Bayesian Naïve Bayes. From a Dirichlet α: draw a multinomial π over the K classes. From a Dirichlet β: for each class y = 1...K, draw a multinomial γ[y] over the vocabulary. For each document d = 1...D:
- Pick a label y_d from π
- For each word in d (of length N_d): pick a word x_id from γ[y_d]
(Plate diagram: α -> π -> Y -> W <- γ <- β, with W in plate N_d, the document variables in plate D, and γ in plate K.)
20 Unsupervised Naïve Bayes. From a Dirichlet α: draw a multinomial π over the K classes. From a Dirichlet β: for each class y = 1...K, draw a multinomial γ[y] over the vocabulary. For each document d = 1...D:
- Pick a label y_d from π
- For each word in d (of length N_d): pick a word x_id from γ[y_d]
(Plate diagram as on the previous slide.)
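This generative story can be run forward directly. A minimal sketch; K, V, D, the hyperparameter values, and the document lengths are illustrative assumptions:

```python
# A minimal sketch of this generative story run forward. K, V, D, the
# hyperparameters, and document lengths are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
K, V, D = 2, 5, 3                                 # classes, vocab size, docs
alpha, beta = 1.0, 0.1

pi = rng.dirichlet(alpha * np.ones(K))            # class prior pi ~ Dirichlet(alpha)
gamma = rng.dirichlet(beta * np.ones(V), size=K)  # gamma[y]: word multinomial per class

docs, labels = [], []
for d in range(D):
    y = rng.choice(K, p=pi)                       # pick a label y_d from pi
    N_d = rng.integers(4, 8)                      # document length
    words = rng.choice(V, size=N_d, p=gamma[y])   # pick each word from gamma[y_d]
    labels.append(int(y)); docs.append(words)
print(labels, docs)
```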
21 Unsupervised Naïve Bayes. The generative story is the same; what we do is different. We want to both find values of γ and π and find values of Y for each example. EM is a natural algorithm.
22 LDA AS A DGM
23 The LDA Topic Model
25 Unsupervised NB vs LDA. Unsupervised NB: one class prior π (drawn from α) and one Y per doc. LDA: a different class distribution for each doc, and one Y per word. (Side-by-side plate diagrams; both have word plate N_d, document plate D, and per-class word multinomials γ drawn from β, one for each of the K classes.)
26 Unsupervised NB vs LDA. Unsupervised NB: one class prior π and one Y per doc. LDA: a different class distribution θ_d for each doc, and one topic Z_di per word W_di. (Plate diagrams as before, with topic-word multinomials γ_k, k = 1...K.)
27 LDA. Blei's motivation: start with the BOW assumption. Assumptions: 1) documents are i.i.d.; 2) within a document, words are i.i.d. (bag of words). For each document d = 1, ..., M: generate θ_d ~ D_1(.); for each word n = 1, ..., N_d, generate w_n ~ D_2(. | θ_d). Now pick your favorite distributions for D_1, D_2.
28 Unsupervised NB vs LDA. Unsupervised NB clusters documents into latent classes. LDA clusters word occurrences into latent classes (topics). The smoothness of β_k means the same word suggests the same topic; the smoothness of θ_d means the same document suggests the same topic. (Plate diagram: α -> θ_d -> Z_di -> W_di <- γ_k, with plates N_d, D, K.)
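Choosing Dirichlet/multinomial distributions everywhere gives LDA's full generative story, which can likewise be run forward. A minimal sketch; all sizes and hyperparameters are illustrative, and the topic-word multinomials are named `topics` here (the slides write β_k or γ_k):

```python
# A minimal sketch of LDA's generative story: a fresh topic mixture theta_d
# per document, one topic z per word position. Sizes and hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
K, V, D = 3, 10, 4
alpha, eta = 0.5, 0.1

topics = rng.dirichlet(eta * np.ones(V), size=K)  # one word distribution per topic
docs = []
for d in range(D):
    theta_d = rng.dirichlet(alpha * np.ones(K))   # doc-specific topic mixture
    N_d = rng.integers(5, 10)
    z = rng.choice(K, size=N_d, p=theta_d)        # one topic per word position
    w = np.array([rng.choice(V, p=topics[k]) for k in z])
    docs.append(w)
print(docs)
```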
29 LDA's view of a document: a mixed membership model.
30 LDA topics: top words w by Pr(w | Z=k), shown for example topics Z=13, Z=22, Z=27, Z=19.
31 SVM using 50 features: Pr(Z=k | θ_d). 50 topics vs all words as input features to an SVM.
32 Gibbs Sampling for LDA
33 LDA Latent Dirichlet Allocation. Parameter learning: (1) Variational EM; (2) Collapsed Gibbs Sampling. Wait, why is sampling called learning here? Here's the idea.
34 LDA. Gibbs sampling works for any directed model! It is applicable when the joint distribution is hard to evaluate but the conditional distributions are known. The sequence of samples comprises a Markov chain, and the stationary distribution of the chain is the joint distribution. Key capability: estimate the distribution of one latent variable given the other latent variables and the observed variables.
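In skeleton form the procedure is just a sweep over the latent variables, resampling each from its conditional given everything else. A minimal sketch; `conditional` is an assumed user-supplied function, not part of the slides:

```python
# A minimal sketch of the generic Gibbs loop described here. `conditional` is
# an assumed user-supplied function returning the distribution
# Pr(X_i = . | all other variables) as a probability vector.
import numpy as np

def gibbs(state, conditional, n_sweeps, seed=0):
    """Sweep over variables, resampling each from its full conditional."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_sweeps):
        for i in range(len(state)):
            p = conditional(i, state)            # the known conditional
            state[i] = rng.choice(len(p), p=p)   # resample variable i
        samples.append(list(state))              # one step of the Markov chain
    return samples  # approaches samples from the joint, after burn-in
```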
35 (Two diagrams: α -> θ -> Z on the left; on the right, α -> θ -> Z_1, Z_2, with observed X_i and Y_i hanging off each Z_i.) I'll assume we know the parameters for Pr(X | Z) and Pr(Y | X).
36 Initialize all the hidden variables randomly, then...
Pick Z_1 ~ Pr(Z | x_1, y_1, θ)
Pick θ ~ Pr(θ | z_1, z_2, α)    [pick from the posterior]
Pick Z_2 ~ Pr(Z | x_2, y_2, θ)
Pick Z_1 ~ Pr(Z | x_1, y_1, θ)
Pick θ ~ Pr(θ | z_1, z_2, α)    [pick from the posterior]
Pick Z_2 ~ Pr(Z | x_2, y_2, θ)
...
Eventually these will converge to samples from the true joint distribution, so we will have (a sample of) the true θ, in a broad range of cases.
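A minimal sketch of this alternating sampler for a simple mixture model: resample each Z_i given θ and its observation, then resample θ from its Dirichlet posterior. The emission table px is assumed known, and all numbers are illustrative:

```python
# A minimal sketch of the alternating (uncollapsed) Gibbs sampler above.
# The emission table px is assumed known; all numbers are illustrative.
import numpy as np

rng = np.random.default_rng(2)
K = 3
alpha = np.ones(K)
px = rng.dirichlet(np.ones(4), size=K)       # known Pr(X=x | Z=k)
x = rng.integers(0, 4, size=20)              # observed data
z = rng.integers(0, K, size=len(x))          # random initialization
theta = rng.dirichlet(alpha)

for sweep in range(100):
    for i in range(len(x)):
        p = theta * px[:, x[i]]              # Pr(Z_i = k | x_i, theta), unnormalized
        z[i] = rng.choice(K, p=p / p.sum())  # pick Z_i from its conditional
    counts = np.bincount(z, minlength=K)
    theta = rng.dirichlet(alpha + counts)    # pick theta from its posterior
print(theta)                                 # one (eventual) sample of theta
```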
37 Why does Gibbs sampling work? Basic claim: when you sample x ~ P(X | y_1, ..., y_k), then if y_1, ..., y_k were sampled from the true joint, x will be sampled from the true joint. So the true joint is a fixed point: you tend to stay there if you ever get there. How long does it take to get there? It depends on the structure of the space of samples: how well-connected are they by the sampling steps?
39 Mixing time depends on the eigenvalues of the state space!
40 LDA Latent Dirichlet Allocation. Parameter learning: (1) Variational EM (not covered in 601-B); (2) Collapsed Gibbs Sampling. What is collapsed Gibbs sampling?
41 Initialize all the Z's randomly, then...
Pick Z_1 ~ Pr(Z_1 | z_2, z_3, α)
Pick Z_2 ~ Pr(Z_2 | z_1, z_3, α)
Pick Z_3 ~ Pr(Z_3 | z_1, z_2, α)
Pick Z_1 ~ Pr(Z_1 | z_2, z_3, α)
Pick Z_2 ~ Pr(Z_2 | z_1, z_3, α)
Pick Z_3 ~ Pr(Z_3 | z_1, z_2, α)
...
Note θ no longer appears: it has been integrated out. This converges to samples from the true joint, and then we can estimate Pr(θ | α, sample of Z's).
42 Initialize all the Z's randomly, then... Pick Z_1 ~ Pr(Z_1 | z_2, z_3, α). What's this distribution?
43 Initialize all the Z's randomly, then... Pick Z_1 ~ Pr(Z_1 | z_2, z_3, α). Simpler case: what's this distribution? It's called a Dirichlet-multinomial, and it looks like this:
Pr(Z_1 = k_1, Z_2 = k_2, Z_3 = k_3 | α) = ∫_θ Pr(Z_1 = k_1, Z_2 = k_2, Z_3 = k_3 | θ) Pr(θ | α) dθ
If there are K values for the Z's and n_k = # Z's with value k, then it turns out:
Pr(Z | α) = ∫_θ Pr(Z | θ) Pr(θ | α) dθ = [Γ(Σ_k α_k) / Γ(Σ_k α_k + Σ_k n_k)] Π_k [Γ(n_k + α_k) / Γ(α_k)]
(recall Γ(n) = (n-1)! if n is a positive integer).
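This closed form is easy to sanity-check numerically. A minimal sketch comparing it against a Monte Carlo estimate of the integral over θ; the α values and the assignment z are illustrative:

```python
# A minimal sketch checking the Dirichlet-multinomial closed form against a
# Monte Carlo estimate of the integral over theta; alpha and z are illustrative.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(3)
alpha = np.array([0.5, 1.0, 2.0])
z = np.array([0, 0, 1, 2, 2, 2])             # a fixed assignment of the Z's
n = np.bincount(z, minlength=len(alpha))     # n_k = # Z's with value k

# Gamma(sum a)/Gamma(sum a + sum n) * prod_k Gamma(n_k + a_k)/Gamma(a_k)
closed = np.exp(gammaln(alpha.sum()) - gammaln(alpha.sum() + n.sum())
                + np.sum(gammaln(n + alpha) - gammaln(alpha)))

thetas = rng.dirichlet(alpha, size=200_000)  # estimate E_theta[prod_i theta_{z_i}]
mc = np.mean(np.prod(thetas[:, z], axis=1))
print(closed, mc)                            # the two should agree closely
```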
44 Initialize all the Z's randomly, then... Pick Z_1 ~ Pr(Z_1 | z_2, z_3, α). It turns out that sampling from a Dirichlet-multinomial is very easy! Notation: K values for the Z's; n_k = # Z's with value k; Z = (Z_1, ..., Z_m); Z^(-i) = (Z_1, ..., Z_{i-1}, Z_{i+1}, ..., Z_m); n_k^(-i) = # Z's with value k, excluding Z_i. Then:
Pr(Z_i = k | Z^(-i), α) ∝ n_k^(-i) + α_k
where Pr(Z | α) = ∫_θ Pr(Z | θ) Pr(θ | α) dθ is the Dirichlet-multinomial from the previous slide.
45 What about with downstream evidence? Pr(Z_i = k | Z^(-i), α) ∝ n_k^(-i) + α_k captures the constraints on Z_i via θ (from above, the causal direction); what about via β (the evidence below)? Recall Pr(Z) = (1/c) · Pr(Z | E+) · Pr(E- | Z). (Diagram: α -> θ -> Z_1, Z_2, ..., with each Z_i emitting an observed X_i through β[k], and η the prior on β[k].)
46 Sampling for LDA. Notation: K values for the Z_{d,i}'s; Z^(-d,i) = all the Z's but Z_{d,i}; n_{w,k} = # Z_{d,i}'s with value k paired with W_{d,i} = w; n_{*,k} = # Z_{d,i}'s with value k; n_{w,k}^(-d,i) = n_{w,k} excluding Z_{d,i}; n_{*,k}^(-d,i) = n_{*,k} excluding Z_{d,i}; n_{*,k}^{d,(-i)} = n_{*,k} from doc d, excluding Z_{d,i}. Then:
Pr(Z_{d,i} = k | Z^(-d,i), W_{d,i} = w, α) ∝ (n_{*,k}^{d,(-i)} + α_k) · (n_{w,k}^(-d,i) + η_w) / Σ_{w'} (n_{w',k}^(-d,i) + η_{w'})
The first factor, Pr(Z | E+), is the (smoothed) fraction of time Z = k in doc d; the second, Pr(E- | Z), is the (smoothed) fraction of time W = w in topic k. (Plate diagram: α -> θ_d -> Z_di -> W_di <- β_k <- η.)
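A minimal sketch of this conditional for a single token, assuming symmetric scalar α and η and made-up counts (with the current token already removed from them):

```python
# A minimal sketch of the collapsed conditional above for one token, assuming
# symmetric scalar alpha and eta; the count vectors are made-up and assumed
# to have the current token (d, i) already removed.
import numpy as np

def token_conditional(n_dk, n_wk_row, n_k, alpha, eta, V):
    """Pr(Z_{d,i} = k | Z^(-d,i), W_{d,i} = w) for all k, as a vector."""
    p = (n_dk + alpha) * (n_wk_row + eta) / (n_k + V * eta)
    return p / p.sum()

p = token_conditional(n_dk=np.array([2, 0, 1]),       # topic counts in doc d
                      n_wk_row=np.array([1, 3, 0]),   # counts of word w per topic
                      n_k=np.array([10, 12, 7]),      # total tokens per topic
                      alpha=0.5, eta=0.1, V=4)
print(p)
```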
47 IMPLEMENTING LDA
48 Way way more detail
49 More detail
52 [figure: sampling a topic z by dropping a random point on a unit-height bar split into segments for z=1, z=2, z=3]
53 What gets learned...
54 In A Math-ier Notation: N[*,*] = V, N[*,k], N[d,k], M[w,k] (the count tables used by the sampler).
55 for each document d and word position j in d:
  z[d,j] = k, a random topic
  N[d,k]++
  W[w,k]++, where w = id of the j-th word in d
56 for each pass t = 1, 2, ...:
  for each document d and word position j in d:
    z[d,j] = k, a new topic sampled from p(z[d,j] = k | ...) (next slide)
    update N, W to reflect the new assignment of z:
      N[d,k]++; N[d,k']-- where k' is the old z[d,j]
      W[w,k]++; W[w,k']-- where w is w[d,j]
57 p(z_{d,j} = k | ...) ∝ Pr(Z_{d,j} = k | d) · Pr(W_{d,j} = w | Z_{d,j} = k, ...)
= (N[d,k] - C_{d,j,k} + α) · (W[w,k] - C_{d,j,k} + β) / ((W[*,k] - C_{d,j,k}) + βN[*,*]), with N[*,*] = V
where C_{d,j,k} = 1 if z_{d,j} = k and 0 otherwise, i.e., the token's own current assignment is subtracted out of the counts before sampling.
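Putting the last three slides together: a runnable sketch of the collapsed Gibbs sampler, assuming symmetric scalar priors α and β and a made-up toy corpus.

```python
# A runnable sketch of the collapsed Gibbs sampler from the last three slides,
# with symmetric priors alpha and beta; the toy corpus and all sizes are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
K, V = 3, 8
alpha, beta = 0.5, 0.1
docs = [rng.integers(0, V, size=rng.integers(10, 20)) for _ in range(5)]

N = np.zeros((len(docs), K))           # N[d,k]: topic counts per document
W = np.zeros((V, K))                   # W[w,k]: topic counts per word
Wk = np.zeros(K)                       # W[*,k]: total tokens per topic
z = []
for d, doc in enumerate(docs):         # initialization pass
    zd = rng.integers(0, K, size=len(doc))
    z.append(zd)
    for w, k in zip(doc, zd):
        N[d, k] += 1; W[w, k] += 1; Wk[k] += 1

for t in range(200):                   # sampling passes
    for d, doc in enumerate(docs):
        for j, w in enumerate(doc):
            k = z[d][j]                # subtract the token's own assignment
            N[d, k] -= 1; W[w, k] -= 1; Wk[k] -= 1
            p = (N[d] + alpha) * (W[w] + beta) / (Wk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][j] = k                # record the new assignment
            N[d, k] += 1; W[w, k] += 1; Wk[k] += 1

theta = (N + alpha) / (N + alpha).sum(axis=1, keepdims=True)
print(np.round(theta, 2))              # estimated doc-topic mixtures
```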
58 [figure: the unit-height sampling bar again, split into segments for z=1, z=2, z=3] Two costs to notice: 1. You spend a lot of time sampling. 2. There's a loop over all topics here in the sampler.