Non-Parametric Bayes


Non-Parametric Bayes. Mark Schmidt, UBC Machine Learning Reading Group, January 2016.

Current Hot Topics in Machine Learning. Bayesian learning includes: Gaussian processes, approximate inference, and Bayesian nonparametrics.

Motivation: Choosing Number of Mixture Components. Consider density estimation with a mixture of Gaussians: how many clusters should we use? Standard approach: (1) try out a bunch of different values for the number of clusters; (2) use a model selection criterion to decide (BIC, cross-validation, etc.). Bayesian non-parametric approach: fit a single model where the number of clusters adapts to the data; the number of clusters increases with dataset size.

Finite Mixture Models. Standard Gaussian mixture model with k components:

x_i | z_i = c ∼ N(µ_c, Σ_c),   z_i | θ ∼ Cat(θ_1, θ_2, ..., θ_k).

The conjugate prior to the categorical distribution is the Dirichlet distribution,

p(z_i = c | θ) = θ_c,   p(θ | α) ∝ θ_1^{α_1 − 1} θ_2^{α_2 − 1} · · · θ_k^{α_k − 1}.

We can think of the Dirichlet as a distribution over the probabilities of k categories. With this and MCMC/variational inference, we can do the usual Bayesian stuff. However, this model requires us to pre-specify k.
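To make the finite model concrete, here is a minimal sketch (not from the talk) of the generative process above, assuming NumPy and hypothetical choices of k, α, the component means, and a shared covariance:

```python
import numpy as np

rng = np.random.default_rng(0)

k, d, n = 3, 2, 500                        # components, dimension, sample size (illustrative)
alpha = np.ones(k)                         # symmetric Dirichlet prior parameters
mus = rng.normal(scale=5.0, size=(k, d))   # component means, drawn here just for illustration
Sigma = np.eye(d)                          # shared, fixed covariance as in the slides

# theta ~ Dirichlet(alpha): prior over the k mixture probabilities.
theta = rng.dirichlet(alpha)

# z_i ~ Cat(theta_1, ..., theta_k), then x_i | z_i = c ~ N(mu_c, Sigma).
z = rng.choice(k, size=n, p=theta)
x = np.array([rng.multivariate_normal(mus[c], Sigma) for c in z])

print("mixture weights:", np.round(theta, 3))
print("empirical cluster counts:", np.bincount(z, minlength=k))
```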

Infinite Mixture Models. We don't want to pre-specify k. Naive approach: put a prior over k, and work with the posterior over k, θ, and the mixture parameters. Challenges: do we have to fit a model for every k? For two different values k ≠ k′, the posteriors are defined over different spaces (this requires reversible-jump MCMC). Non-parametric Bayesian approach: assume k = ∞, but that only a finite number of clusters were used to generate the data. The posterior will contain assignments of points to these clusters, and the posterior predictive can assign a point to a new cluster.

Stochastic Processes and the Dirichlet Process. Recall that a stochastic process is an infinite collection of random variables. Gaussian process: an infinite-dimensional Gaussian. The process is defined by a mean function and a covariance function, and is a useful non-parametric prior for continuous functions. Dirichlet process: an infinite-dimensional Dirichlet. The process is defined by a concentration parameter α (and a base distribution), and is a useful non-parametric prior for categorical distributions. Its predictive distribution over cluster assignments is known as the Chinese restaurant process.
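One way to build intuition for the "infinite-dimensional Dirichlet" idea (my own illustration, not from the slides) is to take a finite symmetric Dirichlet prior with parameters α/k and let k grow: the number of clusters actually used by n categorical draws stays small and is governed by α, even as k becomes huge. A small sketch, assuming NumPy and arbitrary values of α, k, and n:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 2.0, 1000   # concentration and number of draws (illustrative)

for k in [10, 100, 1000, 10000]:
    # Finite approximation: theta ~ Dirichlet(alpha/k, ..., alpha/k).
    theta = rng.dirichlet(np.full(k, alpha / k))
    z = rng.choice(k, size=n, p=theta)
    print(f"k = {k:>6}: clusters used by {n} draws = {len(np.unique(z))}")
```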

Chinese Restaurant Process. The first customer sits at their own table. The second customer: sits at a new table with probability α/(1+α), or sits at the first table with probability 1/(1+α). The (n+1)-th customer: sits at a new table with probability α/(n+α), or sits at table c with probability n_c/(n+α).

Chinese Restaurant Process. At time n, this defines probabilities over the k occupied tables and all other (new) tables:

( n_1/(n+α), n_2/(n+α), ..., n_k/(n+α), α/(n+α) ).

A higher concentration α means more occupied tables; for large n, the number of tables is O(α log n). We can put a hyper-prior on α. A subtle issue is that the CRP is exchangeable: up to label switching, the probabilities are unchanged if the order of the customers is changed. An equivalent view of the Dirichlet/Chinese-restaurant process is the stick-breaking process.
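A small simulation of the seating rule above (a sketch of my own, assuming NumPy; α and n are arbitrary) makes the O(α log n) growth in the number of tables easy to check empirically:

```python
import numpy as np

def crp(n, alpha, rng):
    """Simulate table assignments for n customers under a CRP with concentration alpha."""
    counts = []                      # counts[c] = number of customers at table c
    assignments = []
    for i in range(n):               # customer i arrives after i customers are seated
        probs = np.array(counts + [alpha], dtype=float) / (i + alpha)
        table = rng.choice(len(probs), p=probs)
        if table == len(counts):     # chose the "new table" option
            counts.append(1)
        else:
            counts[table] += 1
        assignments.append(table)
    return assignments, counts

rng = np.random.default_rng(2)
alpha, n = 2.0, 5000
_, counts = crp(n, alpha, rng)
print("tables occupied:", len(counts))
print("alpha * log(n) :", round(alpha * np.log(n), 1))
```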

Dirichlet Process Mixture Models. Standard finite Gaussian mixture likelihood (fixed covariance Σ):

p(x | Σ, θ, µ_1, µ_2, ..., µ_k) = ∑_{c=1}^{k} θ_c p(x | µ_c, Σ),

where we might assume θ comes from a Dirichlet distribution. Infinite Gaussian mixture likelihood:

p(x | Σ, θ, µ_1, µ_2, ...) = ∑_{c=1}^{∞} θ_c p(x | µ_c, Σ),

where we might assume θ comes from a Dirichlet process. So the DP gives us the non-zero θ_c values. In practice, variational/MCMC inference methods are used. https://www.youtube.com/watch?v=0vh7qzy9sps
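As one concrete way to try this (not covered in the talk), scikit-learn's BayesianGaussianMixture can be fit with a truncated Dirichlet-process prior on the weights using variational inference. A sketch assuming scikit-learn and NumPy, with arbitrary synthetic data; n_components acts only as an upper bound on the number of clusters:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(3)
# Synthetic data from 3 well-separated Gaussians (the model is not told this).
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(200, 2)) for m in (-4, 0, 4)])

# Truncated DP mixture fit by variational inference.
dpgmm = BayesianGaussianMixture(
    n_components=10,                                     # truncation level, not the true k
    weight_concentration_prior_type="dirichlet_process",
    weight_concentration_prior=1.0,                      # the concentration parameter alpha
    max_iter=500,
    random_state=0,
).fit(X)

print("mixture weights:", np.round(dpgmm.weights_, 3))
print("effective clusters:", np.sum(dpgmm.weights_ > 0.01))
```

Only a few of the weights end up non-negligible, which is the "number of clusters adapts to the data" behaviour motivated earlier.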

Summary. Non-parametric Bayesian methods place priors over infinite-dimensional objects, so the complexity of the model grows with the data. Gaussian processes define a prior over infinite-dimensional functions. Dirichlet processes define a prior over infinite-dimensional probabilities, with an interpretation in terms of the Chinese restaurant process; they allow us to fit mixture models without pre-specifying the number of mixture components. Various extensions exist (some will be discussed next time): latent Dirichlet allocation (topic models); the beta (Indian buffet) process (PCA and factor analysis); the hierarchical Dirichlet process; Pólya trees (generating trees); infinite hidden Markov models (an infinite number of hidden states).