Bayesian nonparametric latent feature models

Similar documents
Haupthseminar: Machine Learning. Chinese Restaurant Process, Indian Buffet Process

Bayesian Nonparametric Models on Decomposable Graphs

Bayesian nonparametric latent feature models

Infinite latent feature models and the Indian Buffet Process

Lecture 3a: Dirichlet processes

Non-Parametric Bayes

A Brief Overview of Nonparametric Bayesian Models

The Indian Buffet Process: An Introduction and Review

19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process

Dirichlet Processes and other non-parametric Bayesian models

Factor Analysis and Indian Buffet Process

Probabilistic Graphical Models

Bayesian Nonparametrics: Dirichlet Process

Non-parametric Clustering with Dirichlet Processes

Bayesian Nonparametrics

Dirichlet Processes: Tutorial and Practical Course

Non-parametric Bayesian Methods

Bayesian Nonparametrics for Speech and Signal Processing

Infinite Latent Feature Models and the Indian Buffet Process

Nonparametric Bayesian Models for Sparse Matrices and Covariances

Bayesian non parametric approaches: an introduction

Bayesian nonparametric models for bipartite graphs

An Infinite Product of Sparse Chinese Restaurant Processes

MAD-Bayes: MAP-based Asymptotic Derivations from Bayes

Nonparametric Probabilistic Modelling

Hierarchical Models, Nested Models and Completely Random Measures

Priors for Random Count Matrices with Random or Fixed Row Sums

Lecture 16-17: Bayesian Nonparametrics I. STAT 6474 Instructor: Hongxiao Zhu

Applied Nonparametric Bayes

Bayesian Hidden Markov Models and Extensions

Dirichlet Process. Yee Whye Teh, University College London

An Introduction to Bayesian Nonparametric Modelling

Bayesian Nonparametrics: Models Based on the Dirichlet Process

Bayesian Nonparametric Models

Feature Allocations, Probability Functions, and Paintboxes

STAT Advanced Bayesian Inference

CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I

Collapsed Variational Dirichlet Process Mixture Models

A Stick-Breaking Construction of the Beta Process

Bayesian nonparametrics

Stochastic Processes, Kernel Regression, Infinite Mixture Models

Bayesian Nonparametrics

Spatial Bayesian Nonparametrics for Natural Image Segmentation

Image segmentation combining Markov Random Fields and Dirichlet Processes

Sharing Clusters Among Related Groups: Hierarchical Dirichlet Processes

Outline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution

The Infinite Factorial Hidden Markov Model

arxiv: v1 [stat.ml] 20 Nov 2012

Hierarchical Dirichlet Processes

Nonparametric Mixed Membership Models

Bayesian Nonparametric Learning of Complex Dynamical Phenomena

Sparsity? A Bayesian view

Bayesian nonparametric models for bipartite graphs

arxiv: v2 [stat.ml] 4 Aug 2011

Properties of Bayesian nonparametric models and priors over trees

Distance dependent Chinese restaurant processes

Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING. Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign

Nonparametric Bayesian Methods: Models, Algorithms, and Applications (Day 5)

Nonparmeteric Bayes & Gaussian Processes. Baback Moghaddam Machine Learning Group

Dependent hierarchical processes for multi armed bandits

Dirichlet Enhanced Latent Semantic Analysis

IPSJ SIG Technical Report Vol.2014-MPS-100 No /9/25 1,a) 1 1 SNS / / / / / / Time Series Topic Model Considering Dependence to Multiple Topics S

Applied Bayesian Nonparametrics 3. Infinite Hidden Markov Models

Colouring and breaking sticks, pairwise coincidence losses, and clustering expression profiles

CSCI 5822 Probabilistic Model of Human and Machine Learning. Mike Mozer University of Colorado

Bayesian Statistics. Debdeep Pati Florida State University. April 3, 2017

Tree-Based Inference for Dirichlet Process Mixtures

Bayesian non-parametric model to longitudinally predict churn

An Alternative Prior Process for Nonparametric Bayesian Clustering

Advanced Machine Learning

Construction of Dependent Dirichlet Processes based on Poisson Processes

Bayesian Nonparametrics: some contributions to construction and properties of prior distributions

Probabilistic modeling of NLP

Gentle Introduction to Infinite Gaussian Mixture Modeling

arxiv: v1 [stat.ml] 8 Jan 2012

Part IV: Monte Carlo and nonparametric Bayes

arxiv: v2 [stat.ml] 10 Sep 2012

Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling

Variational Bayesian Dirichlet-Multinomial Allocation for Exponential Family Mixtures

28 : Approximate Inference - Distributed MCMC

A Simple Proof of the Stick-Breaking Construction of the Dirichlet Process

Variational Inference for the Nested Chinese Restaurant Process

Parallel Markov Chain Monte Carlo for Pitman-Yor Mixture Models

Foundations of Nonparametric Bayesian Methods

CSci 8980: Advanced Topics in Graphical Models Analysis of Genetic Variation

Evolutionary Clustering by Hierarchical Dirichlet Process with Hidden Markov State

Hierarchical Bayesian Nonparametrics

Spectral Chinese Restaurant Processes: Nonparametric Clustering Based on Similarities

Hierarchical Bayesian Languge Model Based on Pitman-Yor Processes. Yee Whye Teh

Bayesian Mixtures of Bernoulli Distributions

Hierarchical Bayesian Nonparametric Models of Language and Text

27 : Distributed Monte Carlo Markov Chain. 1 Recap of MCMC and Naive Parallel Gibbs Sampling

Bayesian Methods for Machine Learning

arxiv: v1 [cs.lg] 16 Sep 2014

Truncation-free Stochastic Variational Inference for Bayesian Nonparametric Models

Truncation error of a superposed gamma process in a decreasing order representation

Hierarchical Dirichlet Processes with Random Effects

CS6220: DATA MINING TECHNIQUES

The Infinite Latent Events Model

TWO MODELS INVOLVING BAYESIAN NONPARAMETRIC TECHNIQUES

Transcription:

Bayesian nonparametric latent feature models François Caron UBC October 2, 2007 / MLRG François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 1 / 29

Overview 1 Introduction 2 Dirichlet Process Finite mixture model Equivalent classes Dirichlet Process model Chinese Restaurant Stick-breaking construction 3 Nonparametric latent feature models Finite feature model Equivalent classes Nonparametric latent feature model Indian buffet Stick-breaking construction 4 Applications François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 2 / 29

Overview 1 Introduction 2 Dirichlet Process Finite mixture model Equivalent classes Dirichlet Process model Chinese Restaurant Stick-breaking construction 3 Nonparametric latent feature models Finite feature model Equivalent classes Nonparametric latent feature model Indian buffet Stick-breaking construction 4 Applications François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 3 / 29

Finite mixture model n objects and K clusters Each object k = 1,..., n can belong to only one cluster j {1,..., K} Represented by an allocation variable c k {1,..., K} François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 4 / 29

Finite mixture model Bayesian approach: Dirichlet-multinomial model π 1:K D( α K,..., α K ) and for k = 1,..., n, K : Dirichlet Process c k π 1:K François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 5 / 29

Finite latent feature model n objects and K features Each object k can have a finite number of latent features Represented by a binary vector c k {0, 1} K François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 6 / 29

Finite latent feature model Bayesian approach: Prior distribution over a binary matrix of size n K K : Nonparametric latent feature model François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 7 / 29

Overview 1 Introduction 2 Dirichlet Process Finite mixture model Equivalent classes Dirichlet Process model Chinese Restaurant Stick-breaking construction 3 Nonparametric latent feature models Finite feature model Equivalent classes Nonparametric latent feature model Indian buffet Stick-breaking construction 4 Applications François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 8 / 29

Finite mixture model Hierarchical model For j = 1,..., K, For k = 1,..., n, π 1:K D( α K,..., α K ) U j G 0 c k π 1:K z k c k, U 1:K f ( U ck ) François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 9 / 29

Finite mixture model Prior distribution Pr(π 1:K, U 1:K, c 1:n ) = Pr(π 1:K, c 1:n ) Pr(U 1:K ) Can integrate out analytically π 1:K (Dirichlet-multinomial distribution) Pr(c 1:n ) = Pr(π 1:K, c 1:n )dπ 1:K = Γ(α) K j=1 Γ(n j + α K ) Γ(α + n)γ( α K )K François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 10 / 29

Distribution over partitions Let Π n = {A 1,..., A n(πn)} be a random partition of {1,..., n} and P n the set of partitions of {1,..., n} Ex: Π 5 = {{1, 2, 5}, {3}, {4}} Several different values of c 1:n may induce the same partition Ex: c 1:5 = (1, 1, 2, 1, 3) and c 1:5 = (2, 2, 3, 2, 1) both induce the same partition Π 5 = {{1, 2, 4}, {3}, {5}} Let Π n (c 1:n ) be the partition of {1,..., n} induced by the equivalence relationship i j c i = c j François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 11 / 29

Distribution over partitions We have Pr(Π n (c 1:n )) = K! (K n(π n ))! Pr(c 1:n) for all Π n P K where P K = {Π n P n n(π n ) K} and thus for the Dirichlet-multinomial distribution Pr(Π n ) = K! Γ(α) K j=1 Γ(n j + α K ) (K n(π n ))! Γ(α + n)γ( α K )K François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 12 / 29

Dirichlet Process Limit when K of the finite model Pr(Π n ) = K! Γ(α) K j=1 Γ(n j + α K ) (K n(π n ))! Γ(α + n)γ( α K )K is given by Pr(Π n ) = αn(πn) n(π n) j=1 Γ(n j ) (α + i 1) n i=1 François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 13 / 29

Dirichlet Process Connexion to the usual formulation for k = 1,..., n G DP(α, G 0 ) θ k G G Let Π n (θ 1...., θ n ) be the partition of {1,..., n} induced by the equivalence relationship i j θ i = θ j and U 1,..., U n(πn) be the different values taken by θ 1,..., θ n When integrating out the unknown distribution G n(π n) Pr(θ 1,..., θ n ) = Pr(Π n (θ 1...., θ n )) G 0 (U j ) j=1 François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 14 / 29

Chinese Restaurant Let Π k be the partition obtained by removing item k from Π n Conditional association probability of item k is Pr(Π n ) Pr(Π k ) Item k is associated to an existing cluster j = 1,..., n(π k ) with probability n j, k n 1 + α and to a new cluster with probability α n 1 + α François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 15 / 29

Stick-breaking representation Let Π 1, Π 2,... be the successive partitions obtained with sequential CRP updates Ex: Π 1 = {{1}}, Π 2 = {{1}, {2}}, Π 3 = {{1, 3}, {2}}, Π 4 = {{1, 3}, {2}, {4}},... Let n j,t the size of cluster j in Π t, where the clusters are numbered in order of appearance Let lim t n j,t t where = π j. We have j 1 π j = β j (1 β i ) i=1 β j B(1, α) Stick-breaking construction or residual allocation model François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 16 / 29

Overview 1 Introduction 2 Dirichlet Process Finite mixture model Equivalent classes Dirichlet Process model Chinese Restaurant Stick-breaking construction 3 Nonparametric latent feature models Finite feature model Equivalent classes Nonparametric latent feature model Indian buffet Stick-breaking construction 4 Applications François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 17 / 29

Finite feature model Assume the following model For each feature j = 1,..., K, π j B( α K, 1) For each object k and each feature j c k,j Ber(π j ) François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 18 / 29

Finite feature model Integrating out π 1:K Pr(c 1:n ) = K j=1 α K Γ(m j + α K )Γ(n m j + 1) Γ(n + 1 + α K ) Invariant to permutation of objects/features François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 19 / 29

Equivalent classes Distribution over binary matrix is invariant with respect to permutations of columns Left-ordered form Pr(ξ(c 1:n )) = K! 2 n 1 h=0 K h! K j=1 α K Γ(m j + α K )Γ(n m j + 1) Γ(n + 1 + α K ) François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 20 / 29

Nonparametric latent feature model Limit when K of the finite model is given by Pr(ξ(c 1:n )) = Pr(ξ(c 1:n )) = K! 2 n 1 h=0 K h! K + 2 n 1 K j=1 h=1 K h! exp( αh n) α K Γ(m j + α K )Γ(n m j + 1) Γ(n + 1 + α K ) K + j=1 (n m j )!(m j 1)! n! François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 21 / 29

Indian Buffet Buffet with infinite number of dishes First customer samples Poisson(α) dishes k eme customer takes each dish with probability m j /k and tries Poisson(α/k) new dishes Caution: Not in left-ordered form! François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 22 / 29

Indian Buffet François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 23 / 29

Properties Total number of different features K + follows Poisson(α( n k=1 1 k )) Number of features possessed by each object follows Poisson(α) Total number of entries in the matrix follows Poisson(nα) François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 24 / 29

Stick-breaking construction Let π (1) > π (2) >... > π (K) be a decreasing ordering of π 1:K where π k B( α K, 1) Limit when K : stick-breaking contruction β (k) B(α, 1) and π (k) = π (k 1) β (k) DP: stick lengths sum to one and are not decreasing (only in average) IBP: stick lengths does not sum to one and are decreasing François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 25 / 29

Overview 1 Introduction 2 Dirichlet Process Finite mixture model Equivalent classes Dirichlet Process model Chinese Restaurant Stick-breaking construction 3 Nonparametric latent feature models Finite feature model Equivalent classes Nonparametric latent feature model Indian buffet Stick-breaking construction 4 Applications François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 26 / 29

Applications Choice behaviour Protein interaction screens Structure of causal graphs François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 27 / 29

Bibliography Random partitions and Dirichlet Processes J. Pitman. Exchangeable and partially exchangeable random partitions Probability Theory and Related Fields, 1995. J. Pitman. Some developments of the Blackwell-MacQueen urn scheme In Statistics, probability and game theory; paper in honour of David Blackwell, 1996. Indian buffet Z. Ghahramani, T.L. Griffiths and P. Sollich. Bayesian nonparametric latent feature models Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics, 2006. T.L. Griffiths and Z. Ghahramani. Infinite latent feature models and the Indian Buffet Process Technical Report, University College, London, 2005. Y.W. Teh, D. Gorur and Z. Ghahramani. Stick-breaking construction for the Indian Buffet Process AISTATS, 2007. François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 28 / 29

Bibliography Applications of Indian buffet D. Gorur, F. Jakel and C. Rasmussen. A choice model with infinitely many latent features. ICML 2006. W. Chu, Z. Ghahramani, R. Krause and D.L. Wild. Identifying protein complexes in high-throughput protein interactions screens using an infinite latent feature model. Biocomputing 2006. F. Wood, T.L. Griffiths and Z. Ghahramani. A nonparametric Bayesian method for inferring hidden causes. UAI 2006. Still hungry? M. McGlohon. Fried Chicken Bucket Processes SIGBOVIK, Pittsburgh, 2007. François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 29 / 29