Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC. Shane T. Jensen


1 Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC
Shane T. Jensen
Department of Statistics, The Wharton School, University of Pennsylvania
Collaborative work with J. Liu, L. Dicker, and G. Tuteja
May 13, 2006

2 Introduction
- Bayesian non-parametric or semi-parametric models are very useful in many applications
- Non-parametric: random variables are realizations from an unspecified probability distribution, e.g., X_i ~ F(·), i = 1, ..., n
- The X_i's can be observed data, latent variables, or unknown parameters (often in a hierarchical setting)
- Prior distributions for F(·) play an important role in non-parametric modeling

3 Dirichlet Process Priors
- A commonly-used prior distribution for an unknown probability distribution is the Dirichlet process: F(·) ~ DP(θ, F_0)
- F_0 is a probability measure that can represent prior belief about the form of F
- θ is a weight parameter that can represent the degree of belief in the prior form F_0
- Ferguson (1973, 1974); Antoniak (1974); many others
- An important consequence of the Dirichlet process is that it induces a discretized posterior distribution
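
A DP draw can be made concrete with the stick-breaking construction. Below is a minimal Python sketch (not from the talk): the standard-normal choice of F_0, the truncation level K, and the function names are all illustrative assumptions.

import numpy as np

def stick_breaking_dp(theta, f0_sampler, K=1000, rng=None):
    """Atoms and weights of a truncated draw F ~ DP(theta, F0)."""
    rng = rng or np.random.default_rng()
    betas = rng.beta(1.0, theta, size=K)                # stick-breaking proportions
    weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    atoms = f0_sampler(K, rng)                          # i.i.d. draws from F0
    return atoms, weights

atoms, weights = stick_breaking_dp(
    theta=5.0, f0_sampler=lambda k, rng: rng.normal(0.0, 1.0, size=k))
# F is the discrete measure sum_k weights[k] * delta(atoms[k]):
# small theta concentrates mass on a few atoms, large theta spreads it out.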

4 Consequence of DP priors
- Ferguson (1974): using a Dirichlet process DP(θ, F_0) prior for F(·) results in a posterior mixture of F_0 and point masses at the observations X_i:
  F(·) | X_1, ..., X_n ~ DP( θ + n, F_0 + Σ_{i=1}^{n} δ(X_i) )
- For density estimation, this discreteness may be a problem: convolutions with kernel functions can be used to produce a continuous density estimate
- In other applications, discreteness is not a disadvantage!
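
As a hedged illustration of the kernel-convolution remark, the sketch below draws from the DP posterior predictive (its Pólya-urn form: a new value comes from F_0 with probability θ/(θ+n) and otherwise copies an observed X_i) and smooths the resulting point masses with a Gaussian kernel. The toy data, the N(0, 1) base measure, and the bandwidth h are assumptions, not choices from the talk.

import numpy as np

def posterior_predictive_draws(X, theta, f0_sampler, m, rng):
    """Draw m values from the DP posterior predictive given data X."""
    draws = np.empty(m)
    for j in range(m):
        if rng.random() < theta / (theta + len(X)):
            draws[j] = f0_sampler(rng)      # fresh draw from F0
        else:
            draws[j] = rng.choice(X)        # point mass at an observation
    return draws

def smoothed_density(x_grid, draws, h=0.3):
    """Gaussian-kernel convolution of the point masses in `draws`."""
    diffs = (x_grid[:, None] - draws[None, :]) / h
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (len(draws) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
X = rng.normal(2.0, 1.0, size=50)           # toy data
draws = posterior_predictive_draws(X, theta=1.0,
                                   f0_sampler=lambda r: r.normal(), m=5000, rng=rng)
density = smoothed_density(np.linspace(-2.0, 6.0, 200), draws)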

5 Clustering with a DP prior
- The point-mass component of the posterior leads to a random partition of our variables
- Consider a new variable X_{n+1} and let X*_1, ..., X*_C be the unique values of X_{1:n} = (X_1, ..., X_n). Then,
  P(X_{n+1} = X*_c | X_{1:n}) = N_c / (θ + n),  c = 1, ..., C
  P(X_{n+1} = new | X_{1:n}) = θ / (θ + n)
- N_c = size of cluster c: the number of values in X_{1:n} that equal X*_c
- "Rich get richer": we will return to this...
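
This predictive rule is easy to simulate directly. A minimal sketch, assuming nothing beyond the probabilities on this slide (the function name and defaults are illustrative):

import numpy as np

def crp_partition(n, theta, rng=None):
    """Sequentially assign n items to clusters under the DP predictive rule."""
    rng = rng or np.random.default_rng()
    z = [0]                                     # first item starts cluster 0
    for i in range(1, n):
        counts = np.bincount(z)                 # N_c for each current cluster
        probs = np.append(counts, theta) / (theta + i)
        z.append(rng.choice(len(probs), p=probs))   # index len(counts) = new cluster
    return z

z = crp_partition(20, theta=1.0)
# large clusters attract new members: the "rich get richer" property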

6 Motivating Application: TF motifs
- Genes are regulated by transcription factor (TF) proteins that bind to the DNA sequence near the gene
- A TF protein can selectively control only certain target genes because it binds only to a specific short sequence pattern, called a motif
- Motif sites are highly conserved but not identical, so we use a matrix description of the motif's appearance
[Figure: a frequency matrix X_i, with rows A, C, G, T, and the corresponding sequence logo]

7 Collections of TF motifs
- Large databases contain motif information on many TFs, but with a large amount of redundancy
- TRANSFAC and JASPAR are the largest (100s of motifs in each)
- We want to cluster motifs together, either to reduce redundancy in the databases or to match new motifs to a database
- Nucleotide conservation varies both within a single motif (between positions) and between different motifs
[Figure: sequence logos of the motifs Tal1beta-E47S and AGL3]

8 Motif Clustering with DP prior
- A hierarchical model with levels for both within-unit and between-unit variability in discovered motifs
- The observed count matrix Y_i is a product-multinomial realization of the frequency matrix X_i
- The unknown X_i's share an unknown distribution F(·)
- A Dirichlet process DP(θ, F_0) prior for F(·) leads to a posterior mixture of F_0 and point masses at each X_i
- Our prior measure F_0 in this application is a product Dirichlet distribution
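
The conjugacy behind this choice can be sketched in a few lines: with a product-Dirichlet F_0 and product-multinomial counts, each motif position has a closed-form Dirichlet-multinomial marginal likelihood once X_i is integrated out. The pseudo-count vector alpha and the toy count matrix below are illustrative assumptions, not values from the talk.

import numpy as np
from scipy.special import gammaln

def log_marginal_column(y, alpha):
    """log p(y | alpha) for one motif position (counts over A, C, G, T),
    up to the multinomial coefficient, which is constant across clusterings."""
    n = y.sum()
    return (gammaln(alpha.sum()) - gammaln(alpha.sum() + n)
            + np.sum(gammaln(alpha + y) - gammaln(alpha)))

def log_marginal_motif(Y, alpha):
    """Product over the positions (columns) of a 4 x W count matrix Y."""
    return sum(log_marginal_column(Y[:, w], alpha) for w in range(Y.shape[1]))

alpha = np.ones(4)                      # uniform Dirichlet pseudo-counts
Y = np.array([[8, 1, 0, 9],             # A
              [1, 7, 1, 0],             # C
              [0, 1, 8, 0],             # G
              [1, 1, 1, 1]])            # T
print(log_marginal_motif(Y, alpha))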

9 Benefits and Issues with DP prior
- Allows an unknown number of clusters without the need to model the number of clusters directly
- We have no real prior knowledge about the number of clusters in our application
- However, the DP carries implicit assumptions about the number of clusters (and their size distribution)
- The "rich get richer" property influences the prior predictive number of clusters and the cluster size distribution
- How influential is this property in an application?

10 Benefits and Issues with MCMC
- The DP-based model is easy to implement via Gibbs sampling
- p(X_i | X_{-i}) has the same choice structure as p(X_{n+1} | X_{1:n}): X_i is either sampled into one of the current clusters defined by X_{-i}, or sampled from F_0 to form a new cluster
- The alternative is a direct model on the number of clusters, which then requires something like reversible jump MCMC
- Mixing can be an issue with the Gibbs sampler:
  - collapsed Gibbs sampler: integrate out the X_i's and deal directly with the clustering indicators (see the sketch below)
  - split/merge moves to speed up mixing: lots of great work by R. Neal, D. Dahl, and others
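
A minimal sketch of one sweep of the collapsed sampler just described. It assumes a conjugate F_0, and log_marginal(data, members) is a hypothetical helper returning the log marginal likelihood of a cluster's data under F_0 (e.g., the Dirichlet-multinomial above); the common factor 1/(θ + n - 1) cancels under normalization.

import numpy as np

def gibbs_sweep(z, data, theta, log_marginal, rng):
    """Resample each cluster indicator z[i] (a Python list) given the rest."""
    n = len(z)
    for i in range(n):
        z[i] = -1                                    # remove item i from its cluster
        clusters = sorted(set(z) - {-1})             # labels currently in use
        logp = []
        for c in clusters:                           # weight for joining cluster c
            members = [j for j in range(n) if z[j] == c]
            logp.append(np.log(len(members))
                        + log_marginal(data, members + [i])
                        - log_marginal(data, members))
        logp.append(np.log(theta) + log_marginal(data, [i]))   # new cluster
        logp = np.asarray(logp)
        p = np.exp(logp - logp.max())                # normalize on the log scale
        p /= p.sum()
        k = rng.choice(len(p), p=p)
        z[i] = clusters[k] if k < len(clusters) else max(clusters, default=-1) + 1
    return z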

11 Main Issue 1: Posterior Inference from MCMC
- Posterior inference based on the Gibbs sampling output has its own issues
- We need to infer a single set of clusters from the sampled partitions, but we face a label switching problem (Stephens, 1999):
  - cluster labels are exchangeable for a particular partition
  - usual summaries such as the posterior mean can be misleading mixtures over these exchangeable labelings
  - we need summaries that are uninfluenced by the labeling

12 Posterior Inference Options
- Option 1: clusters defined by the last partition visited
  - the sampled partition produced at the end of the Gibbs chain
  - surprisingly popular, e.g. in Latent Dirichlet Allocation models
- Option 2: clusters defined by the MAP partition
  - the sampled partition with the highest posterior density
  - simple and popular
- Option 3: clusters defined by a threshold on the pairwise posterior probabilities P_ij
  - P_ij = frequency of iterations with motifs i and j in the same cluster (see the sketch below)
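
A hedged sketch of Option 3: estimate P_ij from S sampled label vectors, then cluster by thresholding. Since a thresholded P_ij need not be transitive, taking connected components of the thresholded graph is one simple resolution; that choice, and the 0.5 cutoff, are assumptions of this sketch rather than details from the talk.

import numpy as np

def pairwise_probs(samples):
    """samples: (S, n) array of cluster labels; returns the (n, n) matrix P
    of co-clustering frequencies across the S sampled partitions."""
    S, n = samples.shape
    P = np.zeros((n, n))
    for s in range(S):
        P += samples[s][:, None] == samples[s][None, :]
    return P / S

def threshold_clusters(P, cut=0.5):
    """Connected components of the graph with edges {(i, j): P_ij > cut}."""
    n = P.shape[0]
    labels, current = -np.ones(n, dtype=int), 0
    for i in range(n):
        if labels[i] < 0:
            stack = [i]
            while stack:
                j = stack.pop()
                if labels[j] < 0:
                    labels[j] = current
                    stack.extend(np.flatnonzero((P[j] > cut) & (labels < 0)))
            current += 1
    return labels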

13 Main Issue 2: Implicit DP Assumptions
- The DP has an implicit "rich get richer" property, which is easy to see from the predictive distribution:
  P(X_{n+1} joins cluster c) = N_c / (θ + n),  c = 1, ..., C
  P(X_{n+1} forms new cluster) = θ / (θ + n)
- Chinese restaurant process: a new customer chooses a table
  - sits at a current table with probability proportional to N_c, the number of customers already sitting there
  - sits at an entirely new table with probability proportional to θ

14 Alternative Priors for Clustering
- Uniform Prior: socialism, no one gets rich
  P(X_{n+1} joins cluster c) = 1 / (θ + C),  c = 1, ..., C
  P(X_{n+1} forms new cluster) = θ / (θ + C)
- Pitman-Yor Prior: rich get richer, but charitable
  P(X_{n+1} joins cluster c) = (N_c - α) / (θ + n),  c = 1, ..., C
  P(X_{n+1} forms new cluster) = (θ + Cα) / (θ + n)
- 0 ≤ α < 1 is often called the discount factor
(A sampler implementing all three rules follows below.)
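
Generalizing the CRP sketch from slide 5, the following sequential sampler implements exactly the predictive rules above; the function name, the default α, and the return value (cluster sizes) are illustrative choices.

import numpy as np

def sample_partition(n, theta, rule="dp", alpha=0.5, rng=None):
    """Grow a partition of n items under the DP, uniform, or Pitman-Yor rule."""
    rng = rng or np.random.default_rng()
    sizes = []                                  # N_c for the current clusters
    for i in range(n):                          # item i arrives; C = len(sizes)
        C = len(sizes)
        if rule == "dp":
            w = sizes + [theta]                 # N_c each, theta for a new cluster
        elif rule == "uniform":
            w = [1.0] * C + [theta]             # 1 each, theta for a new cluster
        else:                                   # "pitman_yor"
            w = [N - alpha for N in sizes] + [theta + C * alpha]
        w = np.asarray(w, dtype=float)
        c = rng.choice(C + 1, p=w / w.sum())    # normalizing constant cancels
        if c == C:
            sizes.append(1)                     # open a new cluster
        else:
            sizes[c] += 1
    return sizes                                # len(sizes) is C_n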

15 Asymptotic Comparison of Priors
- The number of clusters C_n is clearly a function of the sample size n. How does C_n grow as n → ∞?
  DP prior:          E(C_n) ≈ θ log(n)
  Pitman-Yor prior:  E(C_n) ≈ K(θ, α) n^α
  Uniform prior:     E(C_n) ≈ K(θ) n^{1/2}
- The DP prior shows the slowest growth in the number of clusters C_n
- Interestingly, Pitman-Yor can lead to either faster or slower growth than the Uniform prior, depending on α
- We are also working on results for the distribution of cluster sizes
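
The DP rate is easy to check by Monte Carlo (a sketch, not a result from the talk): under the DP rule, C_n is the number of "new cluster" events, i.e., a sum of independent Bernoulli(θ / (θ + i)) draws for i = 0, ..., n - 1.

import numpy as np

rng = np.random.default_rng(1)
theta = 1.0
for n in [100, 1000, 10000]:
    cn = np.mean([(rng.random(n) < theta / (theta + np.arange(n))).sum()
                  for _ in range(200)])
    print(n, round(cn, 2), round(theta * np.log(n), 2))   # C_n vs theta*log(n)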

16 Finite Sample Comparison of Priors
[Figure: expected number of clusters C_n versus n (n from 100 to 50,000, log scale), one panel each for θ = 1, 10, 100, comparing the DP, Uniform, and Pitman-Yor (α = 0.25, 0.5, 0.75) priors]

17 Simulation Study of Motif Clustering
- Evaluation of the different priors and modes of inference in the context of the motif clustering application
- Simulated realistic collections of motifs (with known partitions)
- Different simulation conditions vary the clustering difficulty:
  - high to low within-cluster similarity
  - high to low between-cluster similarity
- Success is measured by the Jaccard similarity between the true partition z and the inferred partition ẑ, counting pairs of motifs:
  J(z, ẑ) = TP / (TP + FP + FN)
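
Evaluated over all pairs of motifs, the Jaccard measure takes only a few lines; the toy partitions at the end are purely illustrative.

from itertools import combinations

def jaccard(z_true, z_hat):
    """Pair-counting Jaccard similarity between two partitions (label lists)."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(z_true)), 2):
        same_true = z_true[i] == z_true[j]
        same_hat = z_hat[i] == z_hat[j]
        tp += same_true and same_hat            # pair together in both
        fp += (not same_true) and same_hat      # together only in z_hat
        fn += same_true and (not same_hat)      # together only in z
    return tp / (tp + fp + fn) if (tp + fp + fn) else 1.0

print(jaccard([0, 0, 1, 1, 2], [0, 0, 0, 1, 2]))   # 0.25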

18 Simulation Comparison of Inference Alternatives
[Figure: boxplots of the Jaccard index under increasing clustering difficulty for the MAP partition versus thresholded pairwise posterior probabilities (e.g., Prob > 0.5)]
- The MAP partition is consistently inferior to the pairwise posterior probabilities
- The posterior probabilities incorporate uncertainty across iterations

19 Simulation Comparison of Prior Alternatives
[Figure: boxplots of the Jaccard index under increasing clustering difficulty for the Uniform, Pitman-Yor (α = 0.25, 0.5, 0.75), and DP priors]
- Not much difference between the priors in general
- The Uniform prior does a little worse in most situations

20 Real Data Results: Clustering JASPAR database
- Tree based on the pairwise posterior probabilities (distance = 1 - Prob(Clustering)):
[Figure: hierarchical tree of the JASPAR motifs, each labeled by species, TF family, and MA accession number]
- The MAP partition, post-processed to remove weak relationships, is then very similar to the thresholded posterior probabilities

21 Comparing Priors: Clustering JASPAR database
[Figure: histograms of the posterior number of clusters and the average cluster size under the Uniform and DP priors]
- Very little difference between using the DP and the uniform prior
- The likelihood is dominating any prior assumption on the partition

22 Summary
- Non-parametric Bayesian approaches based on the Dirichlet process can be very useful for clustering applications
- Issues with MCMC inference: the popular MAP partitions seem inferior to partitions based on pairwise posterior probabilities
- Issues with implicit DP assumptions: the alternative priors give quite different prior partitions
- Posterior differences between the priors are small in our motif application, but can be larger in other applications
- Jensen and Liu, JASA (forthcoming), plus other manuscripts soon available on my website (stjensen)
