Collaborative Place Models Supplement

Berk Kapicioglu, Foursquare Labs, berk.kapicioglu@gmail.com
Robert E. Schapire, Princeton University, schapire@cs.princeton.edu
David S. Rosenberg, YP Mobile Labs, david.davidr@gmail.com
Tony Jebara, Columbia University, jebara@cs.columbia.edu

1 Bayesian Gaussian Mixture Model (BGMM)

In this section, we provide a formal description of BGMM and describe its generative process. The model is depicted in Figure 1.

Figure 1: Graphical model representation of BGMM. The geographic coordinates, denoted by $\ell$, are the only observed variables.

Let $\mathcal{N}$ denote the normal distribution and $\mathrm{IW}$ denote the inverse-Wishart distribution. Let $\mathrm{Dirichlet}_K(\cdot)$ denote a symmetric Dirichlet distribution if its parameter is a scalar and a general Dirichlet distribution if its parameter is a vector. The generative process of BGMM is described in more formal terms below. Further technical details about the distributions we use in our model can be found in the appendix.

1. Draw a place distribution $\theta \sim \mathrm{Dirichlet}_K(\alpha)$.
2. For each place $k$,
   (a) Draw a place covariance $\Sigma_k \sim \mathrm{IW}(\Phi, \nu)$.
   (b) Draw a place mean $\mu_k \sim \mathcal{N}(\mu_0, \Sigma_k / \kappa_0)$.
3. For each observation index $n$,
   (a) Draw a place assignment $z_n \sim \mathrm{Categorical}(\theta)$.
   (b) Draw a location $\ell_n \sim \mathcal{N}(\mu_{z_n}, \Sigma_{z_n})$.
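For concreteness, the generative process can be simulated end to end. The sketch below is our illustration, not code from the paper; the hyperparameter values ($K$, $N$, $\alpha$, $\Phi$, $\nu$, $\mu_0$, $\kappa_0$) are arbitrary assumptions, and it relies on NumPy and SciPy's `invwishart`.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)

# Hyperparameters (illustrative values only, not taken from the paper).
K, N = 5, 1000                      # number of places, number of observations
alpha = 1.0                         # symmetric Dirichlet concentration
Phi, nu = np.eye(2), 4.0            # inverse-Wishart scale and degrees of freedom
mu0, kappa0 = np.zeros(2), 0.1      # prior mean and scaling of the place means

# 1. Draw the place distribution theta ~ Dirichlet_K(alpha).
theta = rng.dirichlet(alpha * np.ones(K))

# 2. Draw each place's covariance and mean.
Sigma = np.stack([invwishart.rvs(df=nu, scale=Phi, random_state=rng) for _ in range(K)])
mu = np.stack([rng.multivariate_normal(mu0, Sigma[k] / kappa0) for k in range(K)])

# 3. Draw place assignments and locations.
z = rng.choice(K, size=N, p=theta)                                     # z_n ~ Categorical(theta)
ell = np.stack([rng.multivariate_normal(mu[k], Sigma[k]) for k in z])  # l_n ~ N(mu_{z_n}, Sigma_{z_n})
```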
2 Inference

In this section, we derive our inference algorithm, and we present our derivation in multiple steps. First, we use a strategy popularized by Griffiths and Steyvers [1] and derive a collapsed Gibbs sampler to sample from the posterior distribution of the categorical random variables conditioned on the observed geographic coordinates. Second, we derive the conditional likelihood of the posterior samples, which we use to determine the sampler's convergence. Finally, we derive formulas for approximating the posterior expectations of the non-categorical random variables conditioned on the posterior samples.

2.1 Collapsed Gibbs Sampler

In Lemma 1, we derive the collapsed Gibbs sampler for variable $z$. Given a vector $x$ and an index $i$, let $x_{-i}$ indicate all the entries of the vector excluding the one at index $i$. For Lemma 1, assume $i$ denotes the index of the variable that will be sampled.

Lemma 1. The unnormalized probability of $z_i = k$ conditioned on the observed location data and the remaining categorical variables is
$$p(z_i = k \mid z_{-i}, \ell) \;\propto\; t_{\tilde{\nu}_k - 1}\!\left(\ell_i \,\Big|\, \tilde{\mu}_k,\; \frac{\tilde{\Phi}_k (\tilde{\kappa}_k + 1)}{\tilde{\kappa}_k (\tilde{\nu}_k - 1)}\right)\left(\alpha + m_{k,-i}\right).$$
The parameters $\tilde{\nu}_k$, $\tilde{\mu}_k$, $\tilde{\Phi}_k$, and $\tilde{\kappa}_k$ are defined in the proof. $t$ denotes the bivariate t-distribution and $m_{k,-i}$ denotes counts, both of which are defined in the appendix.

Proof. We decompose the probability into two components using Bayes' theorem:
$$p(z_i = k \mid z_{-i}, \ell) = \frac{p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})\, p(z_i = k \mid z_{-i}, \ell_{-i})}{p(\ell_i \mid z_{-i}, \ell_{-i})} = \frac{p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})\, p(z_i = k \mid z_{-i})}{p(\ell_i \mid z_{-i}, \ell_{-i})}$$
$$\propto\; p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i}) \quad (1)$$
$$\times\; p(z_i = k \mid z_{-i}). \quad (2)$$

In the first part of the derivation, we operate on (1). We augment it with $\mu_k$ and $\Sigma_k$:
$$p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i}) = \int\!\!\int p(\ell_i \mid \mu_k, \Sigma_k, z_i = k, z_{-i}, \ell_{-i})\, p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{-i})\, d\mu_k\, d\Sigma_k \quad (3)$$
$$= \int\!\!\int \mathcal{N}(\ell_i \mid \mu_k, \Sigma_k)\, p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{-i})\, d\mu_k\, d\Sigma_k. \quad (4)$$
We convert (4) into a more tractable form. Let $M_{k,-i}$ be a set of indices, which we define in the appendix, and let $\ell_{M_{k,-i}}$ denote the subset of observations whose indices are in $M_{k,-i}$. In the derivation below, we treat all variables other than $\mu_k$ and $\Sigma_k$ as a constant:
$$p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{-i}) = p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{M_{k,-i}}) \propto p(\ell_{M_{k,-i}} \mid \mu_k, \Sigma_k, z_i = k, z_{-i})\, p(\mu_k, \Sigma_k)$$
$$= \left(\prod_{j \in M_{k,-i}} \mathcal{N}(\ell_j \mid \mu_k, \Sigma_k)\right) \mathcal{N}(\mu_k \mid \mu_0, \Sigma_k / \kappa_0)\, \mathrm{IW}(\Sigma_k \mid \Phi, \nu).$$
Since the normal-inverse-Wishart distribution is the conjugate prior of the multivariate normal distribution, the posterior is also a normal-inverse-Wishart distribution,
$$p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{-i}) = \mathcal{N}(\mu_k \mid \tilde{\mu}_k, \Sigma_k / \tilde{\kappa}_k)\, \mathrm{IW}(\Sigma_k \mid \tilde{\Phi}_k, \tilde{\nu}_k), \quad (5)$$
whose parameters are defined as
$$\tilde{\kappa}_k = \kappa_0 + m_{k,-i}, \qquad \tilde{\nu}_k = \nu + m_{k,-i}, \qquad \bar{\ell}_k = \frac{1}{m_{k,-i}} \sum_{j \in M_{k,-i}} \ell_j,$$
$$\tilde{\mu}_k = \frac{\kappa_0 \mu_0 + m_{k,-i}\, \bar{\ell}_k}{\tilde{\kappa}_k}, \qquad S_k = \sum_{j \in M_{k,-i}} (\ell_j - \bar{\ell}_k)(\ell_j - \bar{\ell}_k)^T,$$
$$\tilde{\Phi}_k = \Phi + S_k + \frac{\kappa_0\, m_{k,-i}}{\tilde{\kappa}_k} (\bar{\ell}_k - \mu_0)(\bar{\ell}_k - \mu_0)^T.$$
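As a sanity check on the updates in (5), a minimal NumPy sketch is given below. The function name `niw_posterior` and its argument layout are our own assumptions; `points` is assumed to hold the observations currently assigned to place $k$ (with index $i$ excluded) as rows.

```python
import numpy as np

def niw_posterior(points, mu0, kappa0, Phi, nu):
    """Posterior NIW parameters for one place, following the updates stated in (5)."""
    m = len(points)                               # m_{k,-i}
    kappa_t = kappa0 + m                          # kappa-tilde
    nu_t = nu + m                                 # nu-tilde
    if m == 0:
        return mu0.copy(), kappa_t, Phi.copy(), nu_t
    lbar = points.mean(axis=0)                    # empirical mean of the cluster
    mu_t = (kappa0 * mu0 + m * lbar) / kappa_t    # mu-tilde
    diffs = points - lbar
    S = diffs.T @ diffs                           # scatter matrix around lbar
    d = (lbar - mu0).reshape(-1, 1)
    Phi_t = Phi + S + (kappa0 * m / kappa_t) * (d @ d.T)   # Phi-tilde
    return mu_t, kappa_t, Phi_t, nu_t
```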
The posterior parameters depicted above are derived based on the conjugacy properties of Gaussian distributions, as described in [2]. We rewrite (1) by combining (3), (4), and (5) to obtain
$$p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i}) = \int\!\!\int \mathcal{N}(\ell_i \mid \mu_k, \Sigma_k)\, p(\mu_k, \Sigma_k \mid z_i = k, z_{-i}, \ell_{-i})\, d\mu_k\, d\Sigma_k$$
$$= \int\!\!\int \mathcal{N}(\ell_i \mid \mu_k, \Sigma_k)\, \mathcal{N}(\mu_k \mid \tilde{\mu}_k, \Sigma_k / \tilde{\kappa}_k)\, \mathrm{IW}(\Sigma_k \mid \tilde{\Phi}_k, \tilde{\nu}_k)\, d\mu_k\, d\Sigma_k = t_{\tilde{\nu}_k - 1}\!\left(\ell_i \,\Big|\, \tilde{\mu}_k,\; \frac{\tilde{\Phi}_k (\tilde{\kappa}_k + 1)}{\tilde{\kappa}_k (\tilde{\nu}_k - 1)}\right), \quad (6)$$
where $t$ is the bivariate t-distribution. (6) is derived by applying Equation 58 from [2].

Now, we move on to the second part of the derivation. We operate on (2) and augment it with $\theta$:
$$p(z_i = k \mid z_{-i}) = \int p(z_i = k \mid z_{-i}, \theta)\, p(\theta \mid z_{-i})\, d\theta \quad (7)$$
$$= \int p(z_i = k \mid \theta)\, p(\theta \mid z_{-i})\, d\theta. \quad (8)$$
We convert (8) into a more tractable form. As before, let $M_{-i}$ be a set of indices, which we define in the appendix, and let $z_{M_{-i}}$ denote the subset of place assignments whose indices are in $M_{-i}$. In the derivation below, we treat all variables other than $\theta$ as a constant:
$$p(\theta \mid z_{-i}) = p(\theta \mid z_{M_{-i}}) \propto p(z_{M_{-i}} \mid \theta)\, p(\theta) = \left(\prod_{j \in M_{-i}} \mathrm{Categorical}(z_j \mid \theta)\right) \mathrm{Dirichlet}_K(\theta \mid \alpha)$$
$$\Rightarrow\; p(\theta \mid z_{-i}) = \mathrm{Dirichlet}_K(\theta \mid \alpha + m_{1,-i}, \ldots, \alpha + m_{K,-i}), \quad (9)$$
where the last step follows because the Dirichlet distribution is the conjugate prior of the categorical distribution. We rewrite (2) by combining (7), (8), and (9):
$$p(z_i = k \mid z_{-i}) = \int \theta_k\, \mathrm{Dirichlet}_K(\theta \mid \alpha + m_{1,-i}, \ldots, \alpha + m_{K,-i})\, d\theta = \frac{\alpha + m_{k,-i}}{K\alpha + m_{-i}}. \quad (10)$$
The last step follows because it is the expected value of the Dirichlet distribution. Finally, we combine (1), (2), (6), and (10) to obtain the unnormalized probability distribution:
$$p(z_i = k \mid z_{-i}, \ell) \;\propto\; p(\ell_i \mid z_i = k, z_{-i}, \ell_{-i})\, p(z_i = k \mid z_{-i}) = t_{\tilde{\nu}_k - 1}\!\left(\ell_i \,\Big|\, \tilde{\mu}_k,\; \frac{\tilde{\Phi}_k (\tilde{\kappa}_k + 1)}{\tilde{\kappa}_k (\tilde{\nu}_k - 1)}\right) \frac{\alpha + m_{k,-i}}{K\alpha + m_{-i}}.$$
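Lemma 1 translates directly into one Gibbs update per observation: score each place by the bivariate-t predictive of $\ell_i$ under the leave-one-out posterior, weight by $\alpha + m_{k,-i}$, and sample. The sketch below is illustrative only; it assumes the hypothetical `niw_posterior` helper defined above and SciPy's `multivariate_t`.

```python
import numpy as np
from scipy.stats import multivariate_t

def resample_assignment(i, ell, z, K, alpha, mu0, kappa0, Phi, nu, rng):
    """One collapsed Gibbs update of z_i, following Lemma 1."""
    log_p = np.empty(K)
    for k in range(K):
        # Points currently in place k, excluding observation i.
        members = ell[(z == k) & (np.arange(len(z)) != i)]
        mu_t, kappa_t, Phi_t, nu_t = niw_posterior(members, mu0, kappa0, Phi, nu)
        dof = nu_t - 1                                        # bivariate case: nu-tilde - d + 1
        scale = Phi_t * (kappa_t + 1) / (kappa_t * dof)
        log_pred = multivariate_t.logpdf(ell[i], loc=mu_t, shape=scale, df=dof)
        log_p[k] = log_pred + np.log(alpha + len(members))    # (alpha + m_{k,-i})
    p = np.exp(log_p - log_p.max())
    return rng.choice(K, p=p / p.sum())
```

Working in log space before normalizing avoids underflow when a place's posterior predictive is far from $\ell_i$.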
2.2 Likelihoods

In this subsection, we derive the conditional likelihoods of the posterior samples conditioned on the observed geographical coordinates. We use these conditional likelihoods to determine the sampler's convergence. We present the derivations in multiple lemmas and combine them in a theorem at the end of the subsection. Let $\Gamma$ denote the gamma function.

Lemma 2. The marginal probability of the categorical random variable $z$ is
$$p(z) = \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K} \frac{\prod_{k=1}^{K} \Gamma(\alpha + m_k)}{\Gamma(K\alpha + m)},$$
where the counts $m_k$ and $m$ are defined in the appendix.

Proof. Below, we will augment the marginal probability with $\theta$, and then factorize it based on the conditional independence assumptions made by our model:
$$p(z) = \int p(z \mid \theta)\, p(\theta)\, d\theta = \int p(\theta) \left(\prod_j p(z_j \mid \theta)\right) d\theta = \int \mathrm{Dirichlet}_K(\theta \mid \alpha) \left(\prod_j \mathrm{Categorical}(z_j \mid \theta)\right) d\theta. \quad (11)$$
Now, we substitute the probabilities in (11) with Dirichlet and categorical distributions, which are defined in more detail in the appendix:
$$\int \mathrm{Dirichlet}_K(\theta \mid \alpha) \left(\prod_j \mathrm{Categorical}(z_j \mid \theta)\right) d\theta = \frac{1}{B(\alpha)} \int \prod_{k=1}^{K} \theta_k^{\alpha - 1}\, \theta_k^{m_k}\, d\theta = \frac{B(\alpha + m_1, \ldots, \alpha + m_K)}{B(\alpha)} = \frac{\Gamma(K\alpha)}{\Gamma(\alpha)^K} \frac{\prod_{k=1}^{K} \Gamma(\alpha + m_k)}{\Gamma(K\alpha + m)}.$$
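Lemma 2 is the usual Dirichlet-multinomial marginal and is straightforward to evaluate with log-gamma functions. The sketch below is our illustration (the `log_p_z` name is hypothetical); it assumes a symmetric concentration $\alpha$ and integer place assignments in `z`.

```python
import numpy as np
from scipy.special import gammaln

def log_p_z(z, K, alpha):
    """log p(z) from Lemma 2 (Dirichlet-multinomial marginal)."""
    m_k = np.bincount(z, minlength=K)           # per-place counts m_k
    m = m_k.sum()                               # total count m
    return (gammaln(K * alpha) - K * gammaln(alpha)
            + gammaln(alpha + m_k).sum() - gammaln(K * alpha + m))
```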
For our final derivation, let $\Gamma_2$ denote the bivariate gamma function, and let $|\cdot|$ denote the determinant.

Lemma 3. The conditional probability of the observed locations $\ell$ conditioned on $z$ is
$$p(\ell \mid z) = \prod_{k=1}^{K} \frac{1}{\pi^{m_k}} \frac{\Gamma_2(\hat{v}_k / 2)}{\Gamma_2(\nu / 2)} \frac{|\Phi|^{\nu/2}}{|\hat{\Phi}_k|^{\hat{v}_k / 2}} \frac{\kappa_0}{\hat{\kappa}_k}.$$
The parameters $\hat{v}_k$, $\hat{\Phi}_k$, and $\hat{\kappa}_k$ are defined in the proof, and the counts $m_k$ are defined in the appendix.

Proof. We will factorize the probability using the conditional independence assumptions made by the model, and then simplify the resulting probabilities by integrating out the means and covariances associated with the place clusters:
$$p(\ell \mid z) = \prod_{k=1}^{K} p(\ell_{M_k} \mid z) = \prod_{k=1}^{K} \int\!\!\int p(\ell_{M_k} \mid \mu_k, \Sigma_k, z)\, p(\mu_k, \Sigma_k)\, d\mu_k\, d\Sigma_k$$
$$= \prod_{k=1}^{K} \int\!\!\int \left(\prod_{j \in M_k} \mathcal{N}(\ell_j \mid \mu_k, \Sigma_k)\right) \mathcal{N}(\mu_k \mid \mu_0, \Sigma_k / \kappa_0)\, \mathrm{IW}(\Sigma_k \mid \Phi, \nu)\, d\mu_k\, d\Sigma_k.$$
We apply Equation 66 from [2], which describes the conjugacy properties of Gaussian distributions, to reformulate the expression into its final form:
$$p(\ell \mid z) = \prod_{k=1}^{K} \frac{1}{\pi^{m_k}} \frac{\Gamma_2(\hat{v}_k / 2)}{\Gamma_2(\nu / 2)} \frac{|\Phi|^{\nu/2}}{|\hat{\Phi}_k|^{\hat{v}_k / 2}} \frac{\kappa_0}{\hat{\kappa}_k}.$$
The definitions for $\hat{v}_k$, $\hat{\Phi}_k$, and $\hat{\kappa}_k$ are provided in (5), with the full counts $m_k$ in place of the excluded counts $m_{k,-i}$.

Finally, we combine Lemmas 2 and 3 to provide the log-likelihood of the samples $z$ conditioned on the observations $\ell$.

Lemma 4. The log-likelihood of the samples $z$ conditioned on the observations $\ell$ is
$$\log p(z \mid \ell) = \sum_{k=1}^{K} \left[ \log \Gamma(\alpha + m_k) + \log \Gamma_2\!\left(\frac{\hat{v}_k}{2}\right) - m_k \log \pi - \frac{\hat{v}_k}{2} \log |\hat{\Phi}_k| - \log \hat{\kappa}_k \right] + C,$$
where $C$ denotes the constant terms.

Proof. The result follows by multiplying the probabilities stated in Lemmas 2 and 3, and applying the logarithm function.
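For monitoring convergence, the quantity in Lemma 4 can be evaluated once per Gibbs sweep. The sketch below is a non-authoritative illustration: it computes $\log p(z) + \log p(\ell \mid z)$ from Lemmas 2 and 3 up to the constant $C$, reusing the hypothetical `log_p_z` and `niw_posterior` helpers from the sketches above and SciPy's `multigammaln` for $\log \Gamma_2$.

```python
import numpy as np
from scipy.special import multigammaln

def log_likelihood(ell, z, K, alpha, mu0, kappa0, Phi, nu):
    """log p(z | ell) up to an additive constant, via Lemmas 2-4."""
    total = log_p_z(z, K, alpha)                              # Lemma 2
    _, logdet_Phi = np.linalg.slogdet(Phi)
    for k in range(K):
        members = ell[z == k]
        m_k = len(members)
        # Posterior parameters computed without excluding any index (Lemma 3).
        _, kappa_k, Phi_k, v_k = niw_posterior(members, mu0, kappa0, Phi, nu)
        _, logdet_Phi_k = np.linalg.slogdet(Phi_k)
        total += (-m_k * np.log(np.pi)
                  + multigammaln(v_k / 2.0, 2) - multigammaln(nu / 2.0, 2)
                  + (nu / 2.0) * logdet_Phi - (v_k / 2.0) * logdet_Phi_k
                  + np.log(kappa0 / kappa_k))                 # Lemma 3 terms
    return total
```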
2.3 Parameter estimation

In Subsection 2.1, we described a collapsed Gibbs sampler for sampling the posteriors of the categorical random variables. Below, Lemmas 5 and 6 show how these samples, denoted as $z$, can be used to approximate the posterior expectations of $\theta$, $\mu_k$, and $\Sigma_k$.

Lemma 5. The expectation of $\theta_k$ given the observed geographical coordinates and the posterior samples is
$$\mathbb{E}[\theta_k \mid z, \ell] = \frac{\alpha + m_k}{K\alpha + m},$$
where the counts $m_k$ and $m$ are defined in the appendix.

Proof.
$$p(\theta \mid z, \ell) = p(\theta \mid z) = \frac{p(z \mid \theta)\, p(\theta)}{p(z)} = \frac{\left(\prod_j p(z_j \mid \theta)\right) p(\theta)}{p(z)} \propto \mathrm{Dirichlet}_K(\theta \mid \alpha) \prod_j \mathrm{Categorical}(z_j \mid \theta)$$
$$\Rightarrow\; p(\theta \mid z, \ell) = \mathrm{Dirichlet}_K(\theta \mid \alpha + m_1, \ldots, \alpha + m_K) \;\Rightarrow\; \mathbb{E}[\theta_k \mid z, \ell] = \frac{\alpha + m_k}{K\alpha + m}.$$

Lemma 6. The expectations of $\mu_k$ and $\Sigma_k$ given the observed geographical coordinates and the posterior samples are
$$\mathbb{E}[\mu_k \mid z, \ell] = \hat{\mu}_k \qquad \text{and} \qquad \mathbb{E}[\Sigma_k \mid z, \ell] = \frac{\hat{\Phi}_k}{\hat{v}_k - 3}.$$
Parameters $\hat{\mu}_k$, $\hat{\Phi}_k$, and $\hat{v}_k$ are defined in the proof.
Proof.
$$p(\mu_k, \Sigma_k \mid z, \ell) \propto p(\mu_k, \Sigma_k, \ell_{M_k} \mid z) = p(\ell_{M_k} \mid \mu_k, \Sigma_k, z)\, p(\mu_k, \Sigma_k \mid z)$$
$$= \left(\prod_{j \in M_k} \mathcal{N}(\ell_j \mid \mu_k, \Sigma_k)\right) \mathcal{N}(\mu_k \mid \mu_0, \Sigma_k / \kappa_0)\, \mathrm{IW}(\Sigma_k \mid \Phi, \nu) \propto \mathcal{N}(\mu_k \mid \hat{\mu}_k, \Sigma_k / \hat{\kappa}_k)\, \mathrm{IW}(\Sigma_k \mid \hat{\Phi}_k, \hat{v}_k)$$
$$\Rightarrow\; p(\mu_k \mid z, \ell) = t_{\hat{v}_k - 1}\!\left(\mu_k \,\Big|\, \hat{\mu}_k,\; \frac{\hat{\Phi}_k}{\hat{\kappa}_k (\hat{v}_k - 1)}\right) \;\Rightarrow\; \mathbb{E}[\mu_k \mid z, \ell] = \hat{\mu}_k$$
$$\Rightarrow\; p(\Sigma_k \mid z, \ell) = \mathrm{IW}(\Sigma_k \mid \hat{\Phi}_k, \hat{v}_k) \;\Rightarrow\; \mathbb{E}[\Sigma_k \mid z, \ell] = \frac{\hat{\Phi}_k}{\hat{v}_k - 3},$$
where $\hat{\mu}_k$, $\hat{\kappa}_k$, $\hat{\Phi}_k$, and $\hat{v}_k$ are the posterior parameters of (5) computed with counts that do not exclude the current index.
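Given a posterior sample $z$, the estimates of Lemmas 5 and 6 reduce to per-place counts plus one posterior update per place. The sketch below reuses the hypothetical `niw_posterior` helper defined above and is our illustration rather than the authors' implementation.

```python
import numpy as np

def posterior_expectations(ell, z, K, alpha, mu0, kappa0, Phi, nu):
    """E[theta | z, ell], E[mu_k | z, ell], E[Sigma_k | z, ell] from Lemmas 5 and 6."""
    m_k = np.bincount(z, minlength=K)
    theta_hat = (alpha + m_k) / (K * alpha + m_k.sum())       # Lemma 5
    mu_hat, Sigma_hat = [], []
    for k in range(K):
        mu_k, _, Phi_k, v_k = niw_posterior(ell[z == k], mu0, kappa0, Phi, nu)
        mu_hat.append(mu_k)                                   # E[mu_k | z, ell]
        Sigma_hat.append(Phi_k / (v_k - 3))                   # inverse-Wishart mean for d = 2
    return theta_hat, np.stack(mu_hat), np.stack(Sigma_hat)
```

In practice these estimates would be averaged over several posterior samples of $z$ rather than computed from a single sweep.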
3 Appendix

3.1 Miscellaneous notation

Throughout the paper, we use various notations to represent sets of indices and their cardinalities. Vectors $y$ and $z$ denote the component and place assignments in CPM, respectively. Each vector entry is identified by a tuple index $(u, w, n)$, where $u \in \{1, \ldots, U\}$ is a user, $w \in \{1, \ldots, W\}$ is a weekhour, and $n \in \{1, \ldots, N_{u,w}\}$ is an iteration index. For the subsequent notations, we assume that the random variables $y$ and $z$ are already sampled. We refer to a subset of indices using
$$M^{k_0, f_0}_{u_0, w_0} = \{(\dot{u}, \dot{w}, \dot{n}) : z_{\dot{u}, \dot{w}, \dot{n}} = k_0,\; y_{\dot{u}, \dot{w}, \dot{n}} = f_0,\; \dot{u} = u_0,\; \dot{w} = w_0\},$$
where $u_0$ denotes the user, $w_0$ denotes the weekhour, $k_0$ denotes the place, and $f_0$ denotes the component. If we want the subset of indices to be unrestricted with respect to a category, we use the placeholder $*$. For example, $M^{*, f_0}_{u_0, w_0}$ has no constraints with respect to places:
$$M^{*, f_0}_{u_0, w_0} = \{(\dot{u}, \dot{w}, \dot{n}) : y_{\dot{u}, \dot{w}, \dot{n}} = f_0,\; \dot{u} = u_0,\; \dot{w} = w_0\}.$$

Given a subset of indices denoted by $M$, the lowercase $m = |M|$ denotes its cardinality. For example, given a set of indices
$$M^{k_0, f_0}_{u_0, *} = \{(\dot{u}, \dot{w}, \dot{n}) : z_{\dot{u}, \dot{w}, \dot{n}} = k_0,\; y_{\dot{u}, \dot{w}, \dot{n}} = f_0,\; \dot{u} = u_0\},$$
its cardinality is $m^{k_0, f_0}_{u_0, *} = |M^{k_0, f_0}_{u_0, *}|$.

For the collapsed Gibbs sampler, the sets of indices and cardinalities used in the derivations exclude the index that will be sampled. We use $-$ to modify sets or cardinalities for this exclusion. Let $(u, w, n)$ denote the index that will be sampled; then, given an index set $M$, let $M^- = M \setminus \{(u, w, n)\}$ represent the excluding set and let $m^- = |M^-|$ represent the corresponding cardinality. For example,
$$M^{k_0, f_0, -}_{u_0, *} = M^{k_0, f_0}_{u_0, *} \setminus \{(u, w, n)\} \qquad \text{and} \qquad m^{k_0, f_0, -}_{u_0, *} = |M^{k_0, f_0, -}_{u_0, *}|.$$
In the proof of Lemma 1, the parameters $\tilde{\nu}_k$, $\tilde{\mu}_k$, $\tilde{\Phi}_k$, and $\tilde{\kappa}_k$ are defined using cardinalities that exclude the current index $(u, w, n)$. Similarly, in the proof of Lemma 6, the parameters $\hat{\mu}_k$, $\hat{\Phi}_k$, and $\hat{v}_k$ are defined like their tilde counterparts, but the counts used in their definitions do not exclude the current index.

3.2 Probability distributions

Let $\Gamma_2$ denote a bivariate gamma function, defined as
$$\Gamma_2(a) = \pi^{1/2} \prod_{j=1}^{2} \Gamma\!\left(a + \frac{1 - j}{2}\right).$$
Let $\nu > 1$ and let $\Phi \in \mathbb{R}^{2 \times 2}$ be a positive definite scale matrix. The inverse-Wishart distribution, which is the conjugate prior to the multivariate normal distribution, is defined as
$$\mathrm{IW}(\Sigma \mid \Phi, \nu) = \frac{|\Phi|^{\nu/2}}{2^{\nu}\, \Gamma_2(\nu/2)}\, |\Sigma|^{-(\nu + 3)/2} \exp\!\left(-\frac{1}{2} \operatorname{tr}\!\left(\Phi \Sigma^{-1}\right)\right).$$
Let $\Sigma \in \mathbb{R}^{2 \times 2}$ be a positive definite covariance matrix and let $\mu \in \mathbb{R}^2$ denote a mean vector. The multivariate normal distribution is defined as
$$\mathcal{N}(\ell \mid \mu, \Sigma) = \frac{1}{2\pi\, |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2} (\ell - \mu)^T \Sigma^{-1} (\ell - \mu)\right).$$
Let $v > 0$ and let $\Sigma \in \mathbb{R}^{2 \times 2}$; then the $2$-dimensional t-distribution is defined as
$$t_v(x \mid \mu, \Sigma) = \frac{\Gamma(v/2 + 1)}{\Gamma(v/2)\, v \pi\, |\Sigma|^{1/2}} \left(1 + \frac{1}{v} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)^{-(v + 2)/2}.$$
Let $K > 1$ be the number of categories and let $\alpha = (\alpha_1, \ldots, \alpha_K)$ be the concentration parameters, where $\alpha_k > 0$ for all $k \in \{1, \ldots, K\}$. Then, the $K$-dimensional Dirichlet distribution, which is the conjugate prior to the categorical distribution, is defined as
$$\mathrm{Dirichlet}_K(x \mid \alpha) = \frac{1}{B(\alpha)} \prod_{k=1}^{K} x_k^{\alpha_k - 1}, \qquad \text{where} \quad B(\alpha) = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}.$$
We abuse the Dirichlet notation slightly and use it to define the $K$-dimensional symmetric Dirichlet distribution as well. Let $\alpha > 0$ be a scalar concentration parameter. Then, the symmetric Dirichlet distribution is defined as $\mathrm{Dirichlet}_K(x \mid \alpha) = \mathrm{Dirichlet}_K(x \mid \alpha_1, \ldots, \alpha_K)$, where $\alpha_k = \alpha$ for all $k \in \{1, \ldots, K\}$.
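The densities above can also be checked numerically. The sketch below implements $\log \Gamma_2$ and the $2$-dimensional t density exactly as stated; the helper names are our own.

```python
import numpy as np
from scipy.special import gammaln

def log_bivariate_gamma(a):
    """log Gamma_2(a) = 0.5*log(pi) + log Gamma(a) + log Gamma(a - 1/2)."""
    return 0.5 * np.log(np.pi) + gammaln(a) + gammaln(a - 0.5)

def log_bivariate_t(x, mu, Sigma, v):
    """log t_v(x | mu, Sigma) for the 2-dimensional t-distribution of the appendix."""
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)        # (x - mu)^T Sigma^{-1} (x - mu)
    _, logdet = np.linalg.slogdet(Sigma)
    return (gammaln(v / 2.0 + 1.0) - gammaln(v / 2.0)
            - np.log(v) - np.log(np.pi) - 0.5 * logdet
            - (v / 2.0 + 1.0) * np.log1p(quad / v))
```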
References

[1] Thomas L. Griffiths and Mark Steyvers. Finding scientific topics. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl 1):5228-5235, April 2004.

[2] Kevin Murphy. Conjugate Bayesian analysis of the Gaussian distribution. October 2007.