Bayesian Structure Modeling. SPFLODD December 1, 2011
1 Bayesian Structure Modeling SPFLODD December 1, 2011
2 Outline. Defining "Bayesian". Parametric Bayesian models: latent Dirichlet allocation (Blei et al., 2003); Bayesian HMM (Goldwater and Griffiths, 2007); a little bit about inference. Nonparametric Bayesian models: the Dirichlet process and hierarchical DP; NP Bayes and grammars (Beal et al., 2001; Johnson et al., 2007; Cohn et al., 2009).
3 In Statistics. Better to assign "frequentist" and "Bayesian" to analyses, not people. Frequentist analysis (most of science today): parameters are fixed and unknown; we gain information by repeated experiments. Point estimates, standard errors, confidence intervals ("in P% of experiments, the interval will cover the true θ"), hypothesis tests with α fixed in advance; reason about p(data | H_0). Bayesian analysis: treat unknown parameters probabilistically; update beliefs as evidence arrives. Start with p(θ) and infer p(θ | data); means and quantiles of the posterior over θ; intervals corresponding to P% belief that θ is in the interval.
4 The Attraction of Bayesian Thinking. Write down your model declaratively; worry later about how to fit it from data. The prior encodes prior knowledge, and we have lots of this when it comes to language, or at least we think we do! Manage uncertainty about the model the same way we manage uncertainty about the data. Bayesian methods are strongly associated with unsupervised (and latent-variable) learning and with generative models.
5 Evolving Definitions.
MLE (not Bayesian): θ̂ = argmax_θ p(V | θ)
Maximum a posteriori estimation: θ̂ = argmax_θ p(V | θ) p(θ | α)
Computing the posterior over the parameters (fully Bayesian): p(θ | V, α) ∝ p(V | θ) p(θ | α)
Empirical Bayesian: α̂ = argmax_α ∫ p(V | θ) p(θ | α) dθ, then proceed with α̂ fixed.
6 MAP Learning as a Graphical Model. [Diagram: the model from one month ago plus a prior: parameters w with prior p(w); latent structure L drawn from p_w(L); visible data V drawn from p_w(V | L).] Combined inference (max over w, sum over L) is very hard. If w were fixed, getting the posterior over L wouldn't be so bad. If L were fixed, maximizing over w wouldn't be so bad. Standard EM doesn't have p(w); it's very simple to add and useful in practice.
7 [Diagram: four versions of the same graphical model over hyperparameters α, parameters θ, latent structure L, and visible data V. MLE: θ → L → V, with θ a point estimate and no α. MAP, fully Bayesian, and empirical Bayesian: α → θ → L → V, differing in whether θ is maximized (MAP), marginalized (fully Bayesian), or whether α itself is fit to data (empirical Bayesian).]
8 Multinomials. Let's assume discrete distributions that simply assign probabilities to finite sets of events: n-gram models, HMMs, PCFGs, ...
9 Distributions over Multinomials. You can think of a multinomial distribution over d events as a point in the (d−1)-simplex. [Figure: the simplex with vertices [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1].] To randomly pick a point in this space, we need a continuous distribution over the simplex.
10 Dirichlet Distribution. A distribution over the d-event probability simplex. Parameters: ρ, the mean of the Dirichlet, and α, the concentration around that mean (large α means smaller variance).
p(θ | α, ρ) = (1 / B(αρ)) Π_{i=1}^d θ_i^{αρ_i − 1}
Beta function: B(αρ) = Π_{i=1}^d Γ(αρ_i) / Γ(α)
Gamma function (generalized factorial): Γ(a) = ∫_0^∞ t^{a−1} e^{−t} dt
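A tiny numerical check of this density, as a sketch assuming NumPy/SciPy; note that with the slide's mean/concentration convention, the standard Dirichlet parameter vector is αρ:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import dirichlet

alpha, rho = 6.0, np.array([0.5, 0.3, 0.2])  # concentration and mean
a = alpha * rho                              # standard Dirichlet parameters
theta = np.array([0.4, 0.4, 0.2])            # a point on the simplex

# log density from the formula above: -log B(a) + sum_i (a_i - 1) log theta_i
log_B = gammaln(a).sum() - gammaln(a.sum())
log_p = -log_B + ((a - 1.0) * np.log(theta)).sum()

assert np.isclose(log_p, dirichlet.logpdf(theta, a))  # matches SciPy's Dirichlet
```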
11 Dirichlet, d = 3 (various parameter settings). [Figure, from answers.com]
12 Dirichlet, d = 3 (different means and variances). [Figure, from Liang and Klein, 2007]
13 Sampling from a Dirichlet. For i from 1 to d, sample v_i from a gamma distribution with shape αρ_i and scale 1. Renormalize the vector v to obtain θ.
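A minimal sketch of this recipe, assuming NumPy (the function name sample_dirichlet is ours; the result matches what np.random.Generator.dirichlet would give):

```python
import numpy as np

def sample_dirichlet(alpha, rho, rng=None):
    """Draw theta ~ Dirichlet(alpha * rho) via independent gamma draws."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.gamma(shape=alpha * rho, scale=1.0)  # one gamma draw per event
    return v / v.sum()                           # renormalize onto the simplex

theta = sample_dirichlet(6.0, np.array([0.5, 0.3, 0.2]))
print(theta, theta.sum())  # a random point on the 2-simplex; sums to 1
```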
14 MAP with a Dirichlet. Recall that we can use a prior to smooth an estimate. For a multinomial θ with a Dirichlet prior with all αρ_i > 1, this equates to adding pseudocounts to the vector of observed counts:
θ̂_i = (N_i + αρ_i − 1) / (N + α − d)
As counts become large, the prior matters less. Closed form!
Regularizer view: R(θ) = Σ_{i=1}^d (αρ_i − 1) log θ_i
Flat prior: α = d, all ρ_i = 1/d (equates to MLE). A sparse prior (αρ_i < 1) encourages most θ_i to go to zero, but then it's not closed form.
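The closed-form estimate in code (a sketch assuming NumPy; valid when every pseudocount αρ_i − 1 is nonnegative):

```python
import numpy as np

def map_multinomial(counts, alpha, rho):
    """MAP estimate of a multinomial under a Dirichlet(alpha * rho) prior."""
    d = len(counts)
    return (counts + alpha * rho - 1.0) / (counts.sum() + alpha - d)

counts = np.array([7, 2, 1, 0])
rho = np.full(4, 0.25)
print(map_multinomial(counts, alpha=8.0, rho=rho))  # smoothed toward the prior mean
print(map_multinomial(counts, alpha=4.0, rho=rho))  # flat prior: exactly the MLE
```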
15 Mixture of Unigrams. The generative story for a classical document-clustering model would be something like this (Nigam et al., 2000): For i = 1 ... M (number of documents): draw a document length N_i from some distribution; draw a topic z_i for the document from a multinomial over topics, θ; for j = 1 ... N_i, draw word w_ij from the multinomial β_{z_i}. Nigam et al. learned this using EM.
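That generative story, sketched in NumPy; the slide leaves the document-length distribution unspecified, so the Poisson here is an illustrative stand-in:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 3, 1000, 5                      # topics, vocabulary size, documents
theta = rng.dirichlet(np.ones(K))         # mixture coefficients over topics
beta = rng.dirichlet(np.ones(V), size=K)  # one unigram distribution per topic

docs = []
for i in range(M):
    N_i = rng.poisson(50)                 # document length (illustrative choice)
    z_i = rng.choice(K, p=theta)          # one topic for the whole document
    docs.append(rng.choice(V, size=N_i, p=beta[z_i]))  # all words share topic z_i
```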
16 Mixture of Unigrams. [Plate diagram: mixture coefficients θ; per-document topic Z_i; words W_{i,j}; mixture components β_z, one per topic.] Z_i is sometimes called the topic of document i. A topic z is defined by a unigram distribution β_z. M = number of documents; N_i = number of words in document i; K = number of mixture components.
17 A Word Clustering Model. [Plate diagram: as before, but a topic Z_{i,j} is drawn per word.] Problem: all words are treated the same; document information is irrelevant. (This is exactly a zeroth-order HMM.) M = number of documents; N_i = number of words in document i; K = number of mixture components.
18 Probabilistic LSI (Hofmann, 1996). [Plate diagram: per-document topic mixture θ_i; per-word topics Z_{i,j}; words W_{i,j}; topics β_z.] This has very little to do with latent semantic indexing, except that it's a probabilistic model trying to perform a similar task. Word clustering; documents correspond to distributions over topics. Problem: can't describe new documents! M = number of documents; N_i = number of words in document i; K = number of mixture components.
19 Latent Dirichlet Allocation (Blei et al., 2003). [Plate diagram: Dirichlet prior (α, ρ) on each per-document topic mixture θ_i; per-word topics Z_{i,j}; words W_{i,j}; topics β_z.] Documents are mixtures of topics, but a prior over those mixtures lets us reason about new documents, too. M = number of documents; N_i = number of words in document i; K = number of mixture components.
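For contrast with the mixture of unigrams, a sketch of LDA's generative story (topics β are held fixed here; the smoothed variant two slides ahead puts a prior on them too; Poisson lengths are again an illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, M = 3, 1000, 5
alpha = np.full(K, 0.1)                   # Dirichlet prior on topic mixtures
beta = rng.dirichlet(np.ones(V), size=K)  # topics

docs = []
for i in range(M):
    theta_i = rng.dirichlet(alpha)        # per-document mixture over topics
    N_i = rng.poisson(50)
    z = rng.choice(K, size=N_i, p=theta_i)  # one topic per word, not per document
    docs.append([int(rng.choice(V, p=beta[z_j])) for z_j in z])
```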
20 LDA on ACL Papers (Gimpel, 2006)
21 Smoothed Latent Dirichlet Allocation (Blei et al., 2003). [Plate diagram: as in LDA, but with a second Dirichlet prior (α, ρ) placed on the topic distributions β_z as well.] M = number of documents; N_i = number of words in document i; K = number of mixture components.
22 Topic Models Beyond LDA. There is a small industry in variations on topic models, usually adding more evidence to be explained by the topics. Examples: Supervised LDA (Blei and McAuliffe, 2007) adds an observed document category. Link LDA (Erosheva et al., 2004) adds citations to the document, explained by more draws from θ_i. The author-topic model (Rosen-Zvi et al., 2004) adds authors. The correlated topic model (Blei and Lafferty, 2006) lets different topics correlate more flexibly through a different prior. Comment LDA (Yano et al., 2009) generates comments from a different set of unigram models but the same topics.
23 Where is the Structure? Through the topics and θ_i, words in a document become interdependent. Kind of a joint document/word clustering. Not really discrete structure the way we've mostly discussed in this class, though. LDA is a Bayesian zeroth-order HMM.
24 Where is the Prediction? Topics are hard to evaluate; there is no gold standard. This is either an open problem or the nail in the coffin, depending on your point of view.
25 Hidden Markov Models. [Two plate diagrams: the Bayesian HMM (Goldwater and Griffiths, 2007) and the unsupervised HMM (Merialdo, 1994). Both have state sequences Y_{i,j−1} → Y_{i,j} emitting words X_{i,j}, with transition multinomials γ_y and emission multinomials η_y; the Bayesian version adds Dirichlet priors (α, ρ) over those multinomials.] M = number of sequences; N_i = number of words in sequence i; K = number of states.
26 The Engineering Part. Typically approximate inference is required: Markov chain Monte Carlo (e.g., Gibbs sampling) or variational inference (e.g., mean field). The graphical model view is really helpful when designing inference algorithms for your Bayesian model! Learning: approximate inference plus optimization of hyperparameters (for LDA, usually α and β; ρ is often assumed uniform); stochastic or variational EM, depending on your choice of approximate inference. Fully Bayesian: fix the prior and do inference on all your data. Implications for train/test methodology?
27 Mean Field Variational Inference in One Slide. Consider all hidden variables (structure, parameters, whatever); call these L, and everything else V. Define a parametric distribution q over each variable. Optimize this lower bound on the likelihood:
argmax_q E_{q(L)}[log p(L, V = v)] + H(q(L))
Use the collection of q's as a posterior. Interestingly, the block coordinate ascent algorithms used to do this for multinomial-based models look an awful lot like EM!
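To make the "looks like EM" point concrete, here is a minimal mean-field sketch for the Bayesian mixture of unigrams from the earlier slides (an assumed setup, not from the slides: symmetric Dirichlet priors a0 on θ and b0 on each β_k; requires NumPy/SciPy):

```python
import numpy as np
from scipy.special import digamma

def mean_field(C, K, a0=1.0, b0=0.5, iters=50, rng=None):
    """Block coordinate ascent on the bound above for a Bayesian mixture of
    unigrams. C is an (M, V) matrix of word counts; q(theta) = Dir(gamma),
    q(beta_k) = Dir(lam_k), q(z_i) = Categorical(phi_i)."""
    rng = np.random.default_rng(0) if rng is None else rng
    M, V = C.shape
    phi = rng.dirichlet(np.ones(K), size=M)   # responsibilities, like EM's E-step
    for _ in range(iters):
        gamma = a0 + phi.sum(axis=0)          # update q(theta), like an M-step
        lam = b0 + phi.T @ C                  # update q(beta_k), shape (K, V)
        e_log_theta = digamma(gamma) - digamma(gamma.sum())
        e_log_beta = digamma(lam) - digamma(lam.sum(axis=1, keepdims=True))
        s = e_log_theta + C @ e_log_beta.T    # (M, K) unnormalized log q(z)
        phi = np.exp(s - s.max(axis=1, keepdims=True))
        phi /= phi.sum(axis=1, keepdims=True)
    return phi, gamma, lam
```

For this model, essentially the only difference from EM is that expected log parameters (the digamma terms) replace log point estimates.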
28 Going Nonparametric. How many topics or states? Nonparametric: let the data decide. (Not necessarily Bayesian, or even probabilistic!) More data justify more parameters. The most common nonparametric and Bayesian tools in NLP are based on the Dirichlet process. The DP is not the same as the Dirichlet distribution.
29 Dirichlet Process (Key Property). Two parameters: α_0 (concentration) and G_0 (base distribution). If we draw G from DP(α_0, G_0), then G is an infinite mixture of point masses (Sethuraman, 1994):
G(θ) = Σ_{k=1}^∞ π_k · 1[θ = θ_k]
where each π_k is a mixture coefficient and each θ_k is an independent draw from G_0.
30 Dirichlet Process (Key Property). Let Ω denote the event space for G_0 and G. For any finite partition of Ω into N parts Ω_1, ..., Ω_N:
(G(Ω_1), ..., G(Ω_N)) ~ Dirichlet(α_0 G_0(Ω_1), ..., α_0 G_0(Ω_N))
Special case: Ω is finite; the result is a Dirichlet distribution.
31 Two Helpful Views. The stick-breaking view emphasizes the infinite mixture property; it can be understood as generating the model, then the data; useful for deriving a variational inference algorithm. The Chinese restaurant process view: data and model are generated together; useful for deriving an MCMC inference algorithm.
32 Stick-Breaking Process. 1. Draw an infinitely long sequence of mixture components θ_1, θ_2, ... from the base distribution G_0. 2. Draw an infinitely long sequence of mixture coefficients π_1, π_2, ... (next slide). 3. Draw θ from the mixture:
G(θ) = Σ_{k=1}^∞ π_k · 1[θ = θ_k]
33 Mixture Coefficients and GEM. For i = 1 to ∞: draw v_i from Beta(1, α_0). (Aside: a more general framework uses Beta(1 − b, α_0 + ib), giving a Pitman-Yor process.) Let:
π_i = v_i Π_{j=1}^{i−1} (1 − v_j)
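A truncated stick-breaking sketch, assuming NumPy (truncation at a finite number of atoms leaves a little stick mass unassigned):

```python
import numpy as np

def stick_breaking(alpha0, n_atoms, base_draw, rng=None):
    """Truncated draw of G ~ DP(alpha0, G0): GEM weights plus atoms from G0."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha0, size=n_atoms)  # stick proportions v_i
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))  # pi_i = v_i prod_{j<i}(1 - v_j)
    atoms = [base_draw(rng) for _ in range(n_atoms)]
    return pi, atoms

# e.g., G0 draws unigram distributions over a 10-word vocabulary
pi, atoms = stick_breaking(2.0, 100, lambda r: r.dirichlet(np.ones(10)))
print(pi[:5], pi.sum())  # weights decay; the sum is slightly below 1 due to truncation
```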
34 Infinite Language Model. [Plate diagram: stick-breaking with concentration α_0 yields mixture weights π; each component θ_z is a unigram language model drawn from the base distribution G_0; for each of the N words, draw a component z from π, then the word W_i from θ_z.]
35 Chinese Restaurant Process. A different way to get to the same distribution.
36 Chinese Restaurant Process. 1. Customer 1 walks into a restaurant and sits at the first table; draw X_1 from G_0; let t = 1. 2. For i = 2, 3, ...: A. Customer i chooses a table:
p(Y_i = j | y_1, ..., y_{i−1}) = N_j / (α_0 + i − 1) if j ∈ {1, ..., t}, or α_0 / (α_0 + i − 1) if j = t + 1
B. If Y_i = t + 1, then draw X_i from G_0 and increment t; otherwise X_i is copied from the other customers at table Y_i.
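The seating rule is only a few lines; a sketch assuming NumPy:

```python
import numpy as np

def crp(n_customers, alpha0, rng=None):
    """Simulate CRP table assignments Y_1..Y_n with concentration alpha0."""
    rng = np.random.default_rng() if rng is None else rng
    tables, seats = [], []            # tables[j] = N_j, customers at table j
    for i in range(1, n_customers + 1):
        probs = np.array(tables + [alpha0]) / (alpha0 + i - 1)
        j = int(rng.choice(len(probs), p=probs))
        if j == len(tables):
            tables.append(1)          # new table: here we would draw X_i from G0
        else:
            tables[j] += 1            # existing table: X_i copied from its occupants
        seats.append(j)
    return seats, tables

seats, tables = crp(20, alpha0=1.0, rng=np.random.default_rng(0))
print(tables)                         # occupancy counts N_j
```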
37 Why the CRP? No infinitely large random variables: θ and π are now encoded in Y. It highlights the exchangeability property: the probability of the data items doesn't depend on the order in which they are observed. This is weaker than IID assumptions; the data are, in fact, not IID, and that is why inference is much harder.
38 Dirichlet Process Prior. What kinds of distributions does it prefer? The expected number of tables grows with the number of customers N as α_0 log N.
39 α_0 = 1, 1000 customers [figure]
40 α_0 = 10, 1000 customers; α_0 = 100, 1000 customers [figures]
41 α_0 = 0.1, 1000 customers [figure]
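A quick empirical check of the α_0 log N growth, reusing crp() from the sketch above (the agreement is asymptotic, so it is looser for large α_0):

```python
import numpy as np

for alpha0 in (0.1, 1.0, 10.0):
    n_tables = [len(crp(1000, alpha0, np.random.default_rng(s))[1])
                for s in range(50)]
    print(alpha0, np.mean(n_tables), alpha0 * np.log(1000))
```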
42 Hierarchical DP. First draw from a DP; use that draw as the base distribution for a second DP, from which we draw multiple times. E.g., the β_z distributions in a topic model, or the η distributions in an HMM, which should range over the same vocabulary; or the γ distributions in an HMM, which should range over the same set of states. You can do the same thing with a Pitman-Yor process.
43 Infinite HMM (Beal et al., 2001). The states, transitions, and emissions are generated by an HDP. The base DP creates a set of states. Each transition distribution is generated by the secondary DP, all with the same base DP, so the states are shared. Infinite-state PCFGs were studied by Liang et al. (2007) and Finkel et al. (2007).
44 Adaptor Grammars (Goldwater et al., 2006). In a PCFG, when expanding nonterminal N, draw a rule from θ_N, then recur on the children. In adaptor grammars, that happens only when a customer in the N restaurant sits at a new table; i.e., the PCFG is a base distribution. Existing tables are associated with entire subtrees, down to the leaves.
45 Bayesian Tree Substitution Grammars (Cohn et al., 2009). A TSG is a context-free formalism in which rewrite rules can be whole chunks of a tree, with nonterminals at the frontier; the rules are elementary trees. The treebank does not show you where the boundaries between elementary trees are. A base distribution over elementary trees plus an HDP generate the grammar. Similar to an adaptor grammar, but elementary trees get cached, not entire subtrees.
47 Why Are These Models Called Infinite? The number of effective grammar rules is arbitrarily large (not infinite). That is, the posterior distribution will believe that the data came from only a finite number of rules. But with more observations, more rules might get used to explain the data.
48 Remember: "Bayesian" describes your model, not you! "Nonparametric" is one way to get around the how-many-clusters problem, and it is orthogonal to "Bayesian". It is not the only way to do it, but it is quite useful in structured settings where your latent variables aren't just clusters.