Collapsed Gibbs and Variational Methods for LDA. Example Collapsed MoG Sampling


Case Study: Document Retrieval
Collapsed Gibbs and Variational Methods for LDA
Machine Learning/Statistics for Big Data, CSE599C/STAT59, University of Washington
Emily Fox, February 7 th, 0

Example: Collapsed MoG Sampling
$\pi \sim \mathrm{Dir}(\alpha_1, \dots, \alpha_K)$, $z_i \sim \pi$
$\{\mu_k, \Sigma_k\} \sim F(\lambda)$
$x_i \mid z_i \sim N(x_i; \mu_{z_i}, \Sigma_{z_i})$
Collapsed sampler: marginalize the parameters and sample only the indicators $z_i$

Example: Collapsed MoG Sampling (derivation)
Important facts:
$p(z_{1:N} \mid \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \cdot \frac{\prod_k \Gamma(n_k + \alpha_k)}{\Gamma(\sum_k (n_k + \alpha_k))}$, where $n_k$ is the number of observations assigned to component $k$
$\Gamma(m + 1) = m\,\Gamma(m)$

Latent Dirichlet Allocation (LDA)
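The Dirichlet-multinomial marginal above, which the collapsed LDA derivation reuses, can be checked numerically. The sketch below is only an illustration: the function names, the counts, and the hyperparameter values are hypothetical. It evaluates the log marginal with log-gamma functions and confirms that adding one more assignment to component k changes the marginal by the factor $(n_k + \alpha_k) / (N + \sum_j \alpha_j)$, which is what the identity $\Gamma(m + 1) = m\,\Gamma(m)$ predicts.

```python
from math import exp, lgamma


def log_marginal_assignments(counts, alpha):
    """log p(z_{1:N} | alpha) for component counts n_k under a Dirichlet prior."""
    total_alpha = sum(alpha)
    total_n = sum(counts)
    log_p = lgamma(total_alpha) - lgamma(total_alpha + total_n)
    for n_k, a_k in zip(counts, alpha):
        log_p += lgamma(n_k + a_k) - lgamma(a_k)
    return log_p


def predictive(counts, alpha, k):
    """p(z_new = k | z_{1:N}, alpha), the ratio of two marginals."""
    return (counts[k] + alpha[k]) / (sum(counts) + sum(alpha))


# Hypothetical counts and hyperparameters, chosen only for illustration.
counts = [3, 0, 2]
alpha = [0.5, 0.5, 0.5]
bumped = [4, 0, 2]  # same assignments plus one more item in component 0
ratio = exp(log_marginal_assignments(bumped, alpha)
            - log_marginal_assignments(counts, alpha))
```

Here `ratio` and `predictive(counts, alpha, 0)` should agree up to floating-point error, which is the cancellation that makes the collapsed conditional cheap to evaluate.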

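LDA Generative Model
Observations: $w^d_1, \dots, w^d_{N_d}$
Associated topics: $z^d_1, \dots, z^d_{N_d}$
Parameters: $\theta = \{\{\pi^d\}, \{\beta_k\}\}$
Generative model:
$p(\pi, \beta, z, w \mid \alpha, \lambda) = \prod_{k=1}^{K} p(\beta_k \mid \lambda) \prod_{d=1}^{D} \left[ p(\pi^d \mid \alpha) \prod_{i=1}^{N_d} p(z^d_i \mid \pi^d)\, p(w^d_i \mid z^d_i, \beta) \right]$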
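A minimal sketch of sampling from this generative model, assuming symmetric Dirichlet priors on both the topic-word distributions and the document-topic weights, and a fixed document length. Those choices, along with all function and parameter names, are assumptions made for illustration. Dirichlet draws are built from normalized Gamma variates so that only the standard library is needed.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw from a Dirichlet by normalizing independent Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [g / total for g in draws]


def generate_corpus(num_docs, doc_length, num_topics, vocab_size,
                    alpha=0.5, lam=0.5, seed=0):
    """Sample topics, then per-document weights, indicators, and words."""
    rng = random.Random(seed)
    # Corpus-wide topic-specific word distributions beta_k.
    topics = [sample_dirichlet([lam] * vocab_size, rng)
              for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Document-specific topic weights pi^d.
        pi_d = sample_dirichlet([alpha] * num_topics, rng)
        # One topic indicator per word position, then one word per indicator.
        z_d = rng.choices(range(num_topics), weights=pi_d, k=doc_length)
        w_d = [rng.choices(range(vocab_size), weights=topics[k])[0]
               for k in z_d]
        corpus.append((z_d, w_d))
    return topics, corpus


topics, corpus = generate_corpus(num_docs=3, doc_length=5,
                                 num_topics=2, vocab_size=4)
```

Each entry of `corpus` pairs the topic indicators with the word ids they generated, mirroring the $z^d_i$ and $w^d_i$ in the joint.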

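Collapsed LDA Sampling
Marginalize parameters:
- Document-specific topic weights
- Corpus-wide topic-specific word distributions
Sample topic indicators for each word
Derivation:
$p(z^d_{1:N_d} \mid \alpha) = \frac{\Gamma(\sum_k \alpha_k)}{\prod_k \Gamma(\alpha_k)} \cdot \frac{\prod_k \Gamma(n^d_k + \alpha_k)}{\Gamma(\sum_k (n^d_k + \alpha_k))}$
$p(z \mid \alpha) = \prod_{d=1}^{D} p(z^d_{1:N_d} \mid \alpha)$
$p(\{w^d_i : z^d_i = k\} \mid \lambda) = \frac{\Gamma(\sum_j \lambda_j)}{\prod_j \Gamma(\lambda_j)} \cdot \frac{\prod_j \Gamma(v_{j,k} + \lambda_j)}{\Gamma(\sum_j (v_{j,k} + \lambda_j))}$
$p(w \mid z, \lambda) = \prod_{k=1}^{K} p(\{w^d_i : z^d_i = k\} \mid \lambda)$
Here $n^d_k$ counts the words in document $d$ assigned to topic $k$, and $v_{j,k}$ counts how often vocabulary word $j$ is assigned to topic $k$ across the corpus.
Algorithm: cycle through the words, resampling each topic indicator $z^d_i$ from its conditional given all other assignments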

Sample Document
[Figure: example document containing the words "Etruscan" and "trade"]

Randomly Assign Topics
[Figure: each word in the example document receives a random topic indicator $z^d_i$]

[Figure: random topic assignments repeated across the documents of the corpus; visible words include "Etruscan", "trade", "ship", and "Italy"]

Maintain Global Statistics
Total counts from all docs

Resample Assignments
[Figure: the example document with one topic indicator $z^d_i$ selected for resampling]

What is the conditional distribution for this topic?

Part I: How much does this document like each topic?
Part II: How much does each topic like this word?
[Figure: the example document beside three topics, with a row of per-topic counts for the word "trade"]

The conditional multiplies the two parts. For the word "trade":
$p(z^d_i = k \mid \text{all other assignments}, w) \propto \frac{n^d_k + \alpha_k}{\sum_{j=1}^{K} (n^d_j + \alpha_j)} \cdot \frac{v_{\text{trade},k} + \lambda_{\text{trade}}}{\sum_{j=1}^{V} (v_{j,k} + \lambda_j)}$
The counts $n^d_k$ and $v_{j,k}$ exclude the word currently being resampled.
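The two-part conditional can be sketched as a small function. This is an illustration under stated assumptions rather than the lecture's own code: it uses symmetric hyperparameters (`alpha`, `lam`), hypothetical argument names, and made-up counts. The document-side denominator is the same for every topic, so it is dropped and absorbed by the final normalization.

```python
def topic_conditional(doc_topic_counts, word_topic_counts, topic_totals,
                      alpha, lam, vocab_size):
    """Normalized p(z = k | rest) for one word.

    All counts must already exclude the word being resampled.
    """
    weights = []
    for k in range(len(doc_topic_counts)):
        # Part I: how much does this document like topic k?
        doc_part = doc_topic_counts[k] + alpha
        # Part II: how much does topic k like this word?
        word_part = ((word_topic_counts[k] + lam)
                     / (topic_totals[k] + vocab_size * lam))
        weights.append(doc_part * word_part)
    norm = sum(weights)
    return [w / norm for w in weights]


# Made-up counts for one word and three topics.
probs = topic_conditional(
    doc_topic_counts=[3, 1, 0],    # words in this document per topic
    word_topic_counts=[10, 7, 1],  # corpus-wide counts of this word per topic
    topic_totals=[50, 40, 30],     # corpus-wide words per topic
    alpha=0.5, lam=0.1, vocab_size=1000,
)
```

With these numbers the first topic gets the largest probability, because both the document and the topic favor it.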

Sample a New Topic Indicator
Draw $z^d_i$ from the normalized conditional over the topics.

Update Counts
Add the newly sampled assignment back into the document and corpus counts.
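Putting the walkthrough together (randomly assign topics, maintain global statistics, resample each assignment, update counts), one possible collapsed Gibbs sampler looks like the sketch below. It assumes symmetric hyperparameters and documents given as lists of integer word ids. The function name, defaults, and toy corpus are hypothetical.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.5, lam=0.1, sweeps=20, seed=0):
    """Collapsed Gibbs sampling for LDA with symmetric priors (a sketch)."""
    rng = random.Random(seed)
    # Randomly assign a topic to every word.
    z = [[rng.randrange(num_topics) for _ in doc] for doc in docs]
    # Maintain global statistics: per-document, per-word, per-topic counts.
    doc_topic = [[0] * num_topics for _ in docs]
    word_topic = [[0] * num_topics for _ in range(vocab_size)]
    topic_total = [0] * num_topics
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            doc_topic[d][k] += 1
            word_topic[w][k] += 1
            topic_total[k] += 1
    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment from all counts.
                old = z[d][i]
                doc_topic[d][old] -= 1
                word_topic[w][old] -= 1
                topic_total[old] -= 1
                # Part I (document likes topic) times Part II (topic likes word).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (word_topic[w][k] + lam)
                    / (topic_total[k] + vocab_size * lam)
                    for k in range(num_topics)
                ]
                # Sample a new topic indicator and update the counts.
                new = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = new
                doc_topic[d][new] += 1
                word_topic[w][new] += 1
                topic_total[new] += 1
    return z, doc_topic, word_topic, topic_total


# Toy corpus of word ids, for illustration only.
docs = [[0, 1, 0, 2], [3, 4, 3, 4, 5], [0, 2, 1, 1]]
z, doc_topic, word_topic, topic_total = collapsed_gibbs_lda(
    docs, num_topics=2, vocab_size=6)
```

Every sweep touches every word in every document, which is why a single iteration is expensive on a large corpus.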

Geometrically
[Figure: geometric illustration of the topic indicator $z^d_i$ for the example document and the three topics]

Issues with Generic LDA Sampling
Slow mixing rates → need many iterations
Each iteration cycles through sampling topic assignments for all words in all documents
Modern approaches:
- Large-scale LDA. For example, Mimno, David, Matthew D. Hoffman, and David M. Blei. "Sparse stochastic inference for latent Dirichlet allocation." International Conference on Machine Learning, 2012.
- Distributed LDA. For example, Ahmed, Amr, et al. "Scalable inference in latent variable models." Proceedings of the fifth ACM international conference on Web search and data mining (2012).
Alternative: variational methods instead of sampling
- Approximate the posterior with an optimized variational distribution

Variational Methods
Recall task: characterize the posterior
Turn posterior inference into an optimization task
Introduce a "tractable" family of distributions over parameters and latent variables
- Family is indexed by a set of "free parameters"
- Find the member of the family closest to the posterior $p(\theta, z \mid x)$
Questions:
- How do we measure "closeness"?
- If the posterior is intractable, how can we approximate something we do not have to begin with?

A Measure of Closeness
Kullback-Leibler (KL) divergence: $D(p \,\|\, q) = \int p(x) \log \frac{p(x)}{q(x)}\, dx$
Measures "distance" between two distributions p and q
Not symmetric; p determines where the difference is important:
- $p(x) = 0$ and $q(x) \neq 0$: the point contributes nothing
- $p(x) \neq 0$ and $q(x) = 0$: the divergence is infinite
Want $\hat{q} = \arg\min_q D(p \,\|\, q)$, with $p$ the posterior
Just as hard as the original problem!
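The asymmetry is easy to see on a small discrete example. The sketch below uses two hypothetical three-state distributions and a hypothetical helper name. It applies the usual conventions that a state with zero mass under the first argument contributes nothing, while a state with positive mass under the first argument and zero mass under the second makes the divergence infinite.

```python
from math import inf, log


def kl_divergence(p, q):
    """D(p || q) for discrete distributions given as lists of probabilities."""
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # 0 * log(0 / q) is taken to be 0
        if q_x == 0.0:
            return inf  # p puts mass where q has none
        total += p_x * log(p_x / q_x)
    return total


# Hypothetical distributions over three states.
p = [0.5, 0.5, 0.0]
q = [1.0, 0.0, 0.0]

forward = kl_divergence(p, q)  # infinite: q misses a state that p uses
reverse = kl_divergence(q, p)  # finite: equals log(2)
```

The narrow `q` is heavily penalized in one direction and only mildly in the other, which is the behavior behind the "overconfident" tendency of the reverse divergence.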

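Reverse KL Divergence
KL divergence $D(p \,\|\, q)$:
- true distribution p defines the support of the difference
- the "correct" direction
- will be intractable to compute
Reverse KL divergence $D(q \,\|\, p)$:
- approximate distribution q defines the support
- tends to give overconfident results
- will be tractable

Interpretations of Minimizing Reverse KL
Similarity measure: $D(q \,\|\, p) = E_q\left[\log \frac{q(\theta, z)}{p(\theta, z \mid x)}\right]$
Evidence lower bound (ELBO): $\log p(x) = D(q \,\|\, p) + L(q) \ge L(q)$, where $L(q) = E_q[\log p(\theta, z, x)] - E_q[\log q(\theta, z)]$
Therefore, minimizing KL is equivalent to maximizing a lower bound on the marginal likelihood:
$\max_q L(q) \iff \min_q D(q \,\|\, p)$, and $L(q)$ is a lower bound on $\log p(x)$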
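The relationship between the ELBO and the reverse KL divergence can be verified on a toy model with one discrete latent variable. In this sketch the joint values $p(z, x)$ for a fixed observation $x$ are made-up numbers and the function names are hypothetical. It checks that $\log p(x)$ minus the ELBO equals the reverse KL divergence to the posterior.

```python
from math import log

# Made-up values of p(z, x) for one fixed observation x and z in {0, 1, 2}.
joint = [0.10, 0.25, 0.05]
evidence = sum(joint)                      # p(x)
posterior = [j / evidence for j in joint]  # p(z | x)


def elbo(q):
    """L(q) = E_q[log p(z, x)] - E_q[log q(z)]."""
    return sum(q_z * (log(j) - log(q_z))
               for q_z, j in zip(q, joint) if q_z > 0.0)


def reverse_kl(q):
    """D(q || p(z | x))."""
    return sum(q_z * log(q_z / p_z)
               for q_z, p_z in zip(q, posterior) if q_z > 0.0)


q = [0.2, 0.5, 0.3]  # an arbitrary variational distribution
gap = log(evidence) - elbo(q)
```

The gap is never negative and closes only when `q` equals the posterior, so pushing the ELBO up is the same as pulling the reverse KL down.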

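Mean Field
How do we choose a family Q such that the following is tractable?
$L(q) = E_q[\log p(\theta, z, x)] - E_q[\log q(\theta, z)]$
Simplest case = mean field approximation
Assume each parameter and latent variable is conditionally independent given the set of free parameters: $q(\theta, z) = q(\theta) \prod_i q(z_i)$
Then, the entropy term decomposes as $-E_q[\log q(\theta, z)] = -E_q[\log q(\theta)] - \sum_i E_q[\log q(z_i)]$

Mean Field
Examine one free parameter, e.g., the one indexing $q(\theta)$
Can rewrite the joint as $E_q[\log p(\theta, z, x)] = E_q[\log p(\theta \mid z, x)] + E_q[\log p(z, x)]$
Look at the terms of the ELBO depending only on that parameter:
$L_\theta = E_q[\log p(\theta \mid z, x)] - E_q[\log q(\theta)] + \text{const}$
Likewise, for the free parameter of $q(z_n)$:
$L_{z_n} = E_q[\log p(z_n \mid \theta, z_{\setminus n}, x)] - E_q[\log q(z_n)] + \text{const}$
This motivates using a coordinate ascent algorithm for optimization
Iteratively optimize each free parameter holding all others fixed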

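Mean Field for LDA
In LDA, our parameters are $\theta = \{\pi^d\}, \{\beta_k\}$ and the latent variables are $z = \{z^d_i\}$
The variational distribution factorizes as
$q(\pi, \beta, z) = \prod_{k=1}^{K} q(\beta_k \mid \eta_k) \prod_{d=1}^{D} \left[ q(\pi^d \mid \gamma^d) \prod_{i=1}^{N_d} q(z^d_i \mid \phi^d_i) \right]$
The joint distribution factorizes as
$p(\pi, \beta, z, w) = \prod_{k=1}^{K} p(\beta_k \mid \lambda) \prod_{d=1}^{D} \left[ p(\pi^d \mid \alpha) \prod_{i=1}^{N_d} p(z^d_i \mid \pi^d)\, p(w^d_i \mid z^d_i, \beta) \right]$

Examine the ELBO:
$L(q) = \sum_{k=1}^{K} E_q[\log p(\beta_k \mid \lambda)] + \sum_{d=1}^{D} E_q[\log p(\pi^d \mid \alpha)]$
$\quad + \sum_{d=1}^{D} \sum_{i=1}^{N_d} \left( E_q[\log p(z^d_i \mid \pi^d)] + E_q[\log p(w^d_i \mid z^d_i, \beta)] \right)$
$\quad - \sum_{k=1}^{K} E_q[\log q(\beta_k \mid \eta_k)] - \sum_{d=1}^{D} E_q[\log q(\pi^d \mid \gamma^d)] - \sum_{d=1}^{D} \sum_{i=1}^{N_d} E_q[\log q(z^d_i \mid \phi^d_i)]$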

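Mean Field for LDA
Let's look at some of these terms, writing $\phi^d_{ik} = q(z^d_i = k)$:
$E_q[\log p(z^d_i \mid \pi^d)] = \sum_{k} \phi^d_{ik}\, E_q[\log \pi^d_k]$
$E_q[\log q(z^d_i \mid \phi^d_i)] = \sum_{k} \phi^d_{ik} \log \phi^d_{ik}$
Other terms follow similarly

Optimize via Coordinate Ascent
Algorithm: iteratively update each free parameter ($\phi^d_i$, $\gamma^d$, $\eta_k$) while holding the others fixed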
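One way the coordinate ascent could look in code is sketched below. It assumes Dirichlet variational factors for the document weights and the topics, symmetric priors `alpha` and `lam`, and the standard closed-form updates: word-level responsibilities proportional to exp{E[log pi] + E[log beta]}, document parameters equal to `alpha` plus summed responsibilities, and topic parameters equal to `lam` plus expected word counts. The names are hypothetical, and a fixed number of inner iterations stands in for a convergence check. The digamma helper is a simple series approximation so that only the standard library is required.

```python
import random
from math import exp, log


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + log(x) - 0.5 * inv
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def expected_log_dirichlet(params):
    """E[log x_j] for x ~ Dirichlet(params)."""
    total = digamma(sum(params))
    return [digamma(p) - total for p in params]


def local_step(doc, elog_beta, alpha, num_topics, iters=20):
    """Update the word responsibilities phi and the document parameter gamma."""
    gamma = [1.0] * num_topics  # starting value is an assumption
    phi = []
    for _ in range(iters):
        elog_pi = expected_log_dirichlet(gamma)
        phi = []
        for w in doc:
            logits = [elog_pi[k] + elog_beta[k][w] for k in range(num_topics)]
            top = max(logits)
            unnorm = [exp(v - top) for v in logits]
            norm = sum(unnorm)
            phi.append([u / norm for u in unnorm])
        gamma = [alpha + sum(p[k] for p in phi) for k in range(num_topics)]
    return phi, gamma


def mean_field_lda(docs, num_topics, vocab_size,
                   alpha=0.5, lam=0.1, sweeps=10, seed=0):
    """Batch coordinate ascent: local step for every document, then topics."""
    rng = random.Random(seed)
    eta = [[lam + rng.random() for _ in range(vocab_size)]
           for _ in range(num_topics)]
    gammas = []
    for _ in range(sweeps):
        elog_beta = [expected_log_dirichlet(row) for row in eta]
        new_eta = [[lam] * vocab_size for _ in range(num_topics)]
        gammas = []
        for doc in docs:
            phi, gamma = local_step(doc, elog_beta, alpha, num_topics)
            gammas.append(gamma)
            for w, phi_i in zip(doc, phi):
                for k in range(num_topics):
                    new_eta[k][w] += phi_i[k]
        eta = new_eta  # topics change only after a pass over the whole corpus
    return eta, gammas


docs = [[0, 1, 0, 2], [3, 4, 3, 4, 5], [0, 2, 1, 1]]
eta, gammas = mean_field_lda(docs, num_topics=2, vocab_size=6)
```

Note that the topic parameters are refreshed only once per full pass over the corpus.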

Alternative Optimization Schemes
Coordinate ascent is inefficient:
- Start from randomly initialized $\eta_k$ (topics)
- Analyze the whole corpus before updating $\eta_k$ again
- In a streaming data scenario, can't compute even one iteration!
Didn't have to do coordinate ascent. Could have used gradient ascent.

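Alternative Optimization Schemes
Recall stochastic gradient ascent: follow a gradient estimate computed from a random subsample of $M$ data points
Assume $M = 1$
Unbiased, but noisy
Here,
$L = E_q[\log p(\beta)] - E_q[\log q(\beta)] + \sum_{d=1}^{D} \left( E_q[\log p(\pi^d)] - E_q[\log q(\pi^d)] \right) + \sum_{d=1}^{D} \left( E_q[\log p(z^d, x^d \mid \pi^d, \beta)] - E_q[\log q(z^d)] \right)$
$L^t = E_q[\log p(\beta)] - E_q[\log q(\beta)] + D \left( E_q[\log p(\pi^{d_t})] - E_q[\log q(\pi^{d_t})] \right) + D \left( E_q[\log p(z^{d_t}, x^{d_t} \mid \pi^{d_t}, \beta)] - E_q[\log q(z^{d_t})] \right)$

Stochastic Variational Inference for LDA
Initialize $\eta^{(0)}$ randomly.
Repeat (indefinitely):
- Sample a document $d$ uniformly from the data set.
- For all $k$, initialize $\gamma^d_k$.
- Repeat until converged:
  - For $i = 1, \dots, N_d$: $\phi^d_{ik} \propto \exp\{E[\log \pi^d_k] + E[\log \beta_{k, w^d_i}]\}$
  - Set $\gamma^d = \alpha + \sum_{i=1}^{N_d} \phi^d_i$
- Take a stochastic gradient step $\eta^{(t)} = \eta^{(t-1)} + \rho_t \nabla_\eta L^d$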
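The loop above can be sketched as follows. Several pieces are assumptions that the slide text does not pin down: the gradient is taken to be the natural gradient, which turns the step into a weighted average of the current topic parameters and the estimate obtained by treating the sampled document as if it appeared D times; the step size follows a decaying schedule `(t + tau) ** (-kappa)`; the document parameter starts at 1; and a fixed number of inner iterations replaces the convergence test. All names are hypothetical, and the digamma helper is a simple series approximation.

```python
import random
from math import exp, log


def digamma(x):
    """Digamma for x > 0: recurrence up to x >= 6, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + log(x) - 0.5 * inv
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))


def expected_log_dirichlet(params):
    """E[log x_j] for x ~ Dirichlet(params)."""
    total = digamma(sum(params))
    return [digamma(p) - total for p in params]


def local_step(doc, elog_beta, alpha, num_topics, iters=20):
    """Fit phi and gamma for one document with the topics held fixed."""
    gamma = [1.0] * num_topics  # assumed starting value
    phi = []
    for _ in range(iters):
        elog_pi = expected_log_dirichlet(gamma)
        phi = []
        for w in doc:
            logits = [elog_pi[k] + elog_beta[k][w] for k in range(num_topics)]
            top = max(logits)
            unnorm = [exp(v - top) for v in logits]
            norm = sum(unnorm)
            phi.append([u / norm for u in unnorm])
        gamma = [alpha + sum(p[k] for p in phi) for k in range(num_topics)]
    return phi, gamma


def svi_lda(docs, num_topics, vocab_size, alpha=0.5, lam=0.1,
            steps=200, tau=1.0, kappa=0.7, seed=0):
    """Stochastic variational inference for LDA, one document per step."""
    rng = random.Random(seed)
    num_docs = len(docs)
    eta = [[lam + rng.random() for _ in range(vocab_size)]
           for _ in range(num_topics)]
    for t in range(steps):
        doc = docs[rng.randrange(num_docs)]  # sample a document uniformly
        elog_beta = [expected_log_dirichlet(row) for row in eta]
        phi, _ = local_step(doc, elog_beta, alpha, num_topics)
        # Topic estimate if the whole corpus looked like this document.
        target = [[lam] * vocab_size for _ in range(num_topics)]
        for w, phi_i in zip(doc, phi):
            for k in range(num_topics):
                target[k][w] += num_docs * phi_i[k]
        rho = (t + tau) ** (-kappa)  # assumed decaying step size
        # Natural-gradient step: eta + rho * (target - eta).
        eta = [[(1.0 - rho) * e + rho * g for e, g in zip(eta_k, target_k)]
               for eta_k, target_k in zip(eta, target)]
    return eta


docs = [[0, 1, 0, 2], [3, 4, 3, 4, 5], [0, 2, 1, 1]]
eta = svi_lda(docs, num_topics=2, vocab_size=6)
```

Unlike the batch scheme, the topics move after every sampled document, so the procedure also applies when documents arrive as a stream.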

Acknowledgements
Thanks to Dave Blei, David Mimno, and Jordan Boyd-Graber for some material in this lecture relating to LDA


More information

Applying LDA topic model to a corpus of Italian Supreme Court decisions

Applying LDA topic model to a corpus of Italian Supreme Court decisions Applying LDA topic model to a corpus of Italian Supreme Court decisions Paolo Fantini Statistical Service of the Ministry of Justice - Italy CESS Conference - Rome - November 25, 2014 Our goal finding

More information

Convergence of Random Walks

Convergence of Random Walks Chapter 16 Convergence of Ranom Walks This lecture examines the convergence of ranom walks to the Wiener process. This is very important both physically an statistically, an illustrates the utility of

More information

Lecture 8: Graphical models for Text

Lecture 8: Graphical models for Text Lecture 8: Graphical models for Text 4F13: Machine Learning Joaquin Quiñonero-Candela and Carl Edward Rasmussen Department of Engineering University of Cambridge http://mlg.eng.cam.ac.uk/teaching/4f13/

More information

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM

Pattern Recognition and Machine Learning. Bishop Chapter 9: Mixture Models and EM Pattern Recognition and Machine Learning Chapter 9: Mixture Models and EM Thomas Mensink Jakob Verbeek October 11, 27 Le Menu 9.1 K-means clustering Getting the idea with a simple example 9.2 Mixtures

More information

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine

Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Fast Inference and Learning for Modeling Documents with a Deep Boltzmann Machine Nitish Srivastava nitish@cs.toronto.edu Ruslan Salahutdinov rsalahu@cs.toronto.edu Geoffrey Hinton hinton@cs.toronto.edu

More information

Chapter 4. Electrostatics of Macroscopic Media

Chapter 4. Electrostatics of Macroscopic Media Chapter 4. Electrostatics of Macroscopic Meia 4.1 Multipole Expansion Approximate potentials at large istances 3 x' x' (x') x x' x x Fig 4.1 We consier the potential in the far-fiel region (see Fig. 4.1

More information

Multi-View Clustering via Canonical Correlation Analysis

Multi-View Clustering via Canonical Correlation Analysis Keywors: multi-view learning, clustering, canonical correlation analysis Abstract Clustering ata in high-imensions is believe to be a har problem in general. A number of efficient clustering algorithms

More information

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson

JUST THE MATHS UNIT NUMBER DIFFERENTIATION 2 (Rates of change) A.J.Hobson JUST THE MATHS UNIT NUMBER 10.2 DIFFERENTIATION 2 (Rates of change) by A.J.Hobson 10.2.1 Introuction 10.2.2 Average rates of change 10.2.3 Instantaneous rates of change 10.2.4 Derivatives 10.2.5 Exercises

More information

ARCH 614 Note Set 5 S2012abn. Moments & Supports

ARCH 614 Note Set 5 S2012abn. Moments & Supports RCH 614 Note Set 5 S2012abn Moments & Supports Notation: = perpenicular istance to a force from a point = name for force vectors or magnitue of a force, as is P, Q, R x = force component in the x irection

More information

Cascaded redundancy reduction

Cascaded redundancy reduction Network: Comput. Neural Syst. 9 (1998) 73 84. Printe in the UK PII: S0954-898X(98)88342-5 Cascae reunancy reuction Virginia R e Sa an Geoffrey E Hinton Department of Computer Science, University of Toronto,

More information

Topic Modeling Ensembles

Topic Modeling Ensembles Topic Moeling Ensembles Zhiyong Shen, Ping Luo, Shengen Yang, Xukun Shen HP Laboratories HPL-2-58 Keyor(s): Topic moel, Ensemble Abstract: In this paper e propose a frameork of topic moeling ensembles,

More information

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision

The Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that

More information

Lecture 6 : Dimensionality Reduction

Lecture 6 : Dimensionality Reduction CPS290: Algorithmic Founations of Data Science February 3, 207 Lecture 6 : Dimensionality Reuction Lecturer: Kamesh Munagala Scribe: Kamesh Munagala In this lecture, we will consier the roblem of maing

More information

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank

A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank A Unified Posterior Regularized Topic Model with Maximum Margin for Learning-to-Rank Shoaib Jameel Shoaib Jameel 1, Wai Lam 2, Steven Schockaert 1, and Lidong Bing 3 1 School of Computer Science and Informatics,

More information

7.1 Support Vector Machine

7.1 Support Vector Machine 67577 Intro. to Machine Learning Fall semester, 006/7 Lecture 7: Support Vector Machines an Kernel Functions II Lecturer: Amnon Shashua Scribe: Amnon Shashua 7. Support Vector Machine We return now to

More information

Euler equations for multiple integrals

Euler equations for multiple integrals Euler equations for multiple integrals January 22, 2013 Contents 1 Reminer of multivariable calculus 2 1.1 Vector ifferentiation......................... 2 1.2 Matrix ifferentiation........................

More information

Replicated Softmax: an Undirected Topic Model. Stephen Turner

Replicated Softmax: an Undirected Topic Model. Stephen Turner Replicated Softmax: an Undirected Topic Model Stephen Turner 1. Introduction 2. Replicated Softmax: A Generative Model of Word Counts 3. Evaluating Replicated Softmax as a Generative Model 4. Experimental

More information

Bayesian Inference and MCMC

Bayesian Inference and MCMC Bayesian Inference and MCMC Aryan Arbabi Partly based on MCMC slides from CSC412 Fall 2018 1 / 18 Bayesian Inference - Motivation Consider we have a data set D = {x 1,..., x n }. E.g each x i can be the

More information

Introduction To Machine Learning

Introduction To Machine Learning Introduction To Machine Learning David Sontag New York University Lecture 21, April 14, 2016 David Sontag (NYU) Introduction To Machine Learning Lecture 21, April 14, 2016 1 / 14 Expectation maximization

More information

Smoothed Gradients for Stochastic Variational Inference

Smoothed Gradients for Stochastic Variational Inference Smoothed Gradients for Stochastic Variational Inference Stephan Mandt Department of Physics Princeton University smandt@princeton.edu David Blei Department of Computer Science Department of Statistics

More information

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems

Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Bayesian Learning and Inference in Recurrent Switching Linear Dynamical Systems Scott W. Linderman Matthew J. Johnson Andrew C. Miller Columbia University Harvard and Google Brain Harvard University Ryan

More information

Latent Dirichlet Alloca/on

Latent Dirichlet Alloca/on Latent Dirichlet Alloca/on Blei, Ng and Jordan ( 2002 ) Presented by Deepak Santhanam What is Latent Dirichlet Alloca/on? Genera/ve Model for collec/ons of discrete data Data generated by parameters which

More information

KNN Particle Filters for Dynamic Hybrid Bayesian Networks

KNN Particle Filters for Dynamic Hybrid Bayesian Networks KNN Particle Filters for Dynamic Hybri Bayesian Networs H. D. Chen an K. C. Chang Dept. of Systems Engineering an Operations Research George Mason University MS 4A6, 4400 University Dr. Fairfax, VA 22030

More information

Bayesian Machine Learning - Lecture 7

Bayesian Machine Learning - Lecture 7 Bayesian Machine Learning - Lecture 7 Guido Sanguinetti Institute for Adaptive and Neural Computation School of Informatics University of Edinburgh gsanguin@inf.ed.ac.uk March 4, 2015 Today s lecture 1

More information