Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC. Shane T. Jensen
1 Bayesian Clustering with the Dirichlet Process: Issues with priors and interpreting MCMC
Shane T. Jensen, Department of Statistics, The Wharton School, University of Pennsylvania
Collaborative work with J. Liu, L. Dicker, and G. Tuteja
May 13, 2006
2 Introduction
- Bayesian non-parametric or semi-parametric models are very useful in many applications
- Non-parametric: random variables are realizations from an unspecified probability distribution, e.g., X_i ~ F(·), i = 1, ..., n
- The X_i's can be observed data, latent variables, or unknown parameters (often in a hierarchical setting)
- Prior distributions for F(·) play an important role in non-parametric modeling
3 Dirichlet Process Priors
- A commonly-used prior distribution for an unknown probability distribution is the Dirichlet process: F(·) ~ DP(θ, F_0)
- F_0 is a probability measure: it represents prior belief about the form of F
- θ is a weight parameter: it represents the degree of belief in the prior form F_0
- Ferguson (1973, 1974); Antoniak (1974); many others
- An important consequence of the Dirichlet process is that it induces a discretized posterior distribution
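A DP(θ, F_0) draw can be made concrete via the stick-breaking construction (Sethuraman, 1994), which is not on the slide but helps intuition: the random F is discrete with weights obtained by repeatedly breaking off Beta(1, θ) fractions of a unit stick. A minimal sketch (function name and truncation rule are illustrative choices, not the talk's code):

```python
import random

def stick_breaking_weights(theta, eps=1e-6, rng=random):
    """Truncated stick-breaking draw of DP mixture weights:
    V_k ~ Beta(1, theta), w_k = V_k * prod_{j<k} (1 - V_j).
    Stops once the leftover stick mass falls below eps."""
    weights, stick = [], 1.0
    while stick > eps:
        v = rng.betavariate(1.0, theta)  # fraction of the remaining stick
        weights.append(v * stick)
        stick *= 1.0 - v                 # what is left to break
    return weights
```

Pairing each weight with an independent draw from F_0 gives the discrete random measure; larger θ spreads mass over more atoms.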
4 Consequence of DP Priors
- Ferguson (1974): using a Dirichlet process DP(θ, F_0) prior for F(·) results in a posterior that mixes F_0 with point masses at the observations X_i:
  F(·) | X_1, ..., X_n ~ DP( θ + n, (θ F_0 + Σ_{i=1}^n δ(X_i)) / (θ + n) )
- For density estimation, this discreteness may be a problem: convolutions with kernel functions can be used to produce a continuous density estimate
- In other applications, discreteness is not a disadvantage!
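The posterior mixture above translates directly into a Polya-urn predictive draw: with probability θ/(θ+n) sample a fresh value from F_0, otherwise reuse one of the existing observations. A minimal sketch, assuming a generic base-measure sampler `draw_from_f0` (the function name is hypothetical):

```python
import random

def posterior_predictive_draw(observations, theta, draw_from_f0, rng=random):
    """One draw from the DP(theta, F0) posterior predictive (Polya urn):
    a new value from F0 with probability theta/(theta+n), otherwise one
    of the n existing observations, each with probability 1/(theta+n)."""
    n = len(observations)
    if rng.random() < theta / (theta + n):
        return draw_from_f0()
    return rng.choice(observations)
```

Repeated reuse of past values is exactly what produces ties, and hence clusters, among the X_i's.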
5 Clustering with a DP Prior
- The point-mass component of the posterior leads to a random partition of our variables
- Consider a new variable X_{n+1} and let X*_1, ..., X*_C be the unique values of X_{1:n} = (X_1, ..., X_n). Then,
  P(X_{n+1} = X*_c | X_{1:n}) = N_c / (θ + n),  c = 1, ..., C
  P(X_{n+1} = new | X_{1:n}) = θ / (θ + n)
- N_c = size of cluster c: the number of variables in X_{1:n} that equal X*_c
- "Rich get richer": we will return to this...
6 Motivating Application: TF Motifs
- Genes are regulated by transcription factor (TF) proteins that bind to the DNA sequence near a gene
- TF proteins can selectively control only certain target genes by binding only to a particular short sequence pattern, called a motif
- The motif sites are highly conserved but not identical, so we use a matrix description of the motif's appearance
- [Figure: frequency matrix X_i over the bases A, C, G, T and the corresponding sequence logo]
7 Collections of TF Motifs
- Large databases contain motif information for many TFs, but with a large amount of redundancy
- TRANSFAC and JASPAR are the largest (100s of motifs in each)
- We want to cluster motifs together, either to reduce redundancy in the databases or to match new motifs to the database
- Nucleotide conservation varies both within a single motif (between positions) and between different motifs
- [Figure: sequence logos for the motifs Tal1beta-E47S and AGL3]
8 Motif Clustering with a DP Prior
- Hierarchical model with levels for both within-unit and between-unit variability in discovered motifs
- The observed count matrix Y_i is a product-multinomial realization of the frequency matrix X_i
- The unknown X_i's share an unknown distribution F(·)
- A Dirichlet process DP(θ, F_0) prior for F(·) leads to a posterior mixture of F_0 and point masses at each X_i
- Our prior measure F_0 in this application is a product Dirichlet distribution
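The product-multinomial likelihood treats each motif position as its own multinomial over A, C, G, T. A minimal sketch of that likelihood (function name hypothetical; the talk's actual model further integrates X_i against the product Dirichlet F_0, which is omitted here):

```python
import math

def log_product_multinomial(Y, X):
    """Log-likelihood of a count matrix Y (positions x 4 bases) given a
    frequency matrix X of the same shape: each row of counts is an
    independent multinomial with its own probability vector.
    Assumes strictly positive probabilities wherever counts are positive."""
    ll = 0.0
    for counts, probs in zip(Y, X):
        n = sum(counts)
        ll += math.lgamma(n + 1)                     # log n!
        for c, p in zip(counts, probs):
            ll += c * math.log(p) - math.lgamma(c + 1)
    return ll
```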
9 Benefits and Issues with the DP Prior
- Allows an unknown number of clusters without the need to model the number of clusters directly
- We have no real prior knowledge about the number of clusters in our application
- However, with the DP there are implicit assumptions about the number of clusters (and their size distribution)
- The "rich get richer" property influences the prior predictive number of clusters and the cluster size distribution
- How influential is this property in an application?
10 Benefits and Issues with MCMC
- The DP-based model is easy to implement via Gibbs sampling
- p(X_i | X_{-i}) has the same choice structure as p(X_{n+1} | X_{1:n})
- X_i is either sampled into one of the current clusters defined by X_{-i}, or sampled from F_0 to form a new cluster
- The alternative is a direct model on the number of clusters, fit with something like reversible jump MCMC
- Mixing can be an issue with the Gibbs sampler:
  - collapsed Gibbs sampler: integrate out the X_i's and deal directly with clustering indicators
  - split/merge moves to speed up mixing: lots of great work by R. Neal, D. Dahl, and others
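One collapsed-Gibbs update can be sketched as follows: remove item i from its cluster, then reassign it with weight N_c times a marginal likelihood for each existing cluster, or weight θ times a prior-predictive likelihood for a brand-new cluster. This is a generic sketch, not the talk's implementation; the two likelihood callbacks and all names are placeholders, and a real sampler would work in log space (log-sum-exp) for stability:

```python
import math
import random

def gibbs_update(i, z, theta, loglik_join, loglik_new, rng=random):
    """One collapsed-Gibbs update of item i's cluster indicator z[i].
    Existing cluster c gets weight N_c * exp(loglik_join(i, members_c));
    a new cluster gets weight theta * exp(loglik_new(i)), mirroring the
    DP predictive rule. Returns the updated indicator vector."""
    z = list(z)
    clusters = {}
    for j, c in enumerate(z):
        if j != i:                       # remove item i before reassigning
            clusters.setdefault(c, []).append(j)
    labels = sorted(clusters)
    weights = [len(clusters[c]) * math.exp(loglik_join(i, clusters[c]))
               for c in labels]
    labels.append(max(z) + 1)            # fresh label for a new cluster
    weights.append(theta * math.exp(loglik_new(i)))
    r = rng.random() * sum(weights)      # sample proportional to weights
    acc = 0.0
    for c, w in zip(labels, weights):
        acc += w
        if r < acc or c == labels[-1]:
            z[i] = c
            return z
```

Sweeping this update over all i yields sampled partitions whose labels are exchangeable, which is exactly the label-switching issue discussed next.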
11 Main Issue 1: Posterior Inference from MCMC
- Posterior inference based on Gibbs sampling output also has issues
- We need to infer a set of clusters from the sampled partitions, but we have a label-switching problem (Stephens, 1999)
- Cluster labels are exchangeable for a particular partition
- Usual summaries such as the posterior mean can be misleading mixtures of these exchangeable labelings
- We need summaries that are uninfluenced by the labeling
12 Posterior Inference Options
- Option 1: clusters defined by the last partition visited
  - the sampled partition produced at the end of the Gibbs chain
  - surprisingly popular, e.g., in Latent Dirichlet Allocation models
- Option 2: clusters defined by the MAP partition
  - the sampled partition with the highest posterior density
  - simple and popular
- Option 3: clusters defined by a threshold on the pairwise posterior probabilities P_ij
  - P_ij = frequency of iterations with motifs i and j in the same cluster
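Option 3's pairwise probabilities are label-invariant by construction: they only ask whether two items share a cluster, never which label that cluster carries. A minimal sketch of computing P_ij from a list of sampled partitions (function name hypothetical):

```python
from itertools import combinations

def pairwise_coclustering(partitions):
    """P[i][j] = fraction of sampled partitions that place items i and j
    in the same cluster. Each partition is a list of cluster labels."""
    n = len(partitions[0])
    P = [[0.0] * n for _ in range(n)]
    for z in partitions:
        for i, j in combinations(range(n), 2):
            if z[i] == z[j]:
                P[i][j] += 1
                P[j][i] += 1
    m = len(partitions)
    return [[P[i][j] / m if i != j else 1.0 for j in range(n)]
            for i in range(n)]
```

Thresholding this matrix (e.g., link i and j when P_ij > 0.5) then defines the inferred clusters.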
13 Main Issue 2: Implicit DP Assumptions
- The DP has an implicit "rich get richer" property, easy to see from the predictive distribution:
  P(X_{n+1} joins cluster c) = N_c / (θ + n),  c = 1, ..., C
  P(X_{n+1} forms new cluster) = θ / (θ + n)
- Chinese restaurant process: a new customer chooses a table
  - sits at a current table with probability proportional to N_c, the number of customers already sitting there
  - sits at an entirely new table with probability proportional to θ
14 Alternative Priors for Clustering
- Uniform prior: socialism, no one gets rich
  P(X_{n+1} joins cluster c) = 1 / (θ + C),  c = 1, ..., C
  P(X_{n+1} forms new cluster) = θ / (θ + C)
- Pitman-Yor prior: rich get richer, but charitable
  P(X_{n+1} joins cluster c) = (N_c - α) / (θ + n),  c = 1, ..., C
  P(X_{n+1} forms new cluster) = (θ + Cα) / (θ + n)
- 0 ≤ α ≤ 1 is often called the discount factor
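The three predictive rules above differ only in the unnormalized weights they assign to "join cluster c" versus "start a new cluster". A small sketch making that explicit (function name and `prior` argument are illustrative):

```python
def predictive_weights(cluster_sizes, theta, alpha=0.0, prior="dp"):
    """Unnormalized probabilities that the next item joins each existing
    cluster (first C entries) or starts a new one (last entry).
    prior = "dp" | "uniform" | "py"; alpha is the Pitman-Yor discount,
    and alpha = 0 under "py" recovers the plain DP weights."""
    C = len(cluster_sizes)
    if prior == "uniform":
        return [1.0] * C + [theta]                    # normalizer theta + C
    if prior == "py":
        return [n - alpha for n in cluster_sizes] + [theta + C * alpha]
    return list(cluster_sizes) + [theta]              # DP; normalizer theta + n
```

Note how the Pitman-Yor discount α shaves mass off large clusters and donates it to the new-cluster option, which is the "charitable" behavior on the slide.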
15 Asymptotic Comparison of Priors
- The number of clusters C_n is clearly a function of the sample size n. How does C_n grow as n → ∞?
  DP prior:         E(C_n) ≈ θ log(n)
  Pitman-Yor prior: E(C_n) ≈ K(θ, α) n^α
  Uniform prior:    E(C_n) ≈ K(θ) n^{1/2}
- The DP prior shows the slowest growth in the number of clusters C_n
- Interestingly, Pitman-Yor can lead to either faster or slower growth than Uniform, depending on α (n^α vs. n^{1/2})
- We are also working on results for the distribution of cluster sizes
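These growth rates are easy to check empirically by simulating the seating process. A sketch of a Pitman-Yor Chinese restaurant (α = 0 recovers the plain DP); the function name is illustrative:

```python
import random

def crp_num_clusters(n, theta, alpha=0.0, rng=random):
    """Number of occupied tables after seating n customers in a
    Pitman-Yor Chinese restaurant. Customer i+1 joins table c with
    probability (N_c - alpha)/(theta + i), else opens a new table."""
    sizes = []                               # current table occupancies N_c
    for i in range(n):
        r = rng.random() * (theta + i)       # total predictive mass is theta + i
        acc, joined = 0.0, False
        for c, s in enumerate(sizes):
            acc += s - alpha
            if r < acc:
                sizes[c] += 1
                joined = True
                break
        if not joined:                       # leftover mass theta + alpha*C
            sizes.append(1)
    return len(sizes)
```

Averaging over many seeded runs, the cluster count grows roughly like θ log n when α = 0 and like n^α when α > 0, matching the rates on the slide.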
16 Finite-Sample Comparison of Priors
- Expected number of clusters C_n vs. n for different values of θ
- [Figure: three panels (θ = 1, 10, 100) plotting the expected number of clusters against n = number of observations (log scale, roughly 1e+02 to 5e+04) for the DP, Uniform, and Pitman-Yor (α = 0.25, 0.5, 0.75) priors]
17 Simulation Study of Motif Clustering
- Evaluation of the different priors and modes of inference in the context of the motif clustering application
- Simulated realistic collections of motifs (known partitions)
- Different simulation conditions to vary clustering difficulty:
  - high to low within-cluster similarity
  - high to low between-cluster similarity
- Success is measured by the Jaccard similarity between the true partition z and the inferred partition ẑ, counted over pairs of items:
  J(z, ẑ) = TP / (TP + FP + FN)
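The Jaccard index here is computed over item pairs: a pair is a "positive" when a partition places both items in the same cluster. A minimal sketch (function name hypothetical):

```python
from itertools import combinations

def jaccard_partition(z_true, z_hat):
    """Jaccard similarity TP/(TP+FP+FN) between two partitions, where a
    pair (i, j) is TP if both partitions co-cluster it, FP if only the
    inferred partition does, and FN if only the true partition does."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(z_true)), 2):
        same_true = z_true[i] == z_true[j]
        same_hat = z_hat[i] == z_hat[j]
        if same_true and same_hat:
            tp += 1
        elif same_hat:
            fp += 1
        elif same_true:
            fn += 1
    return tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
```

Because it only compares co-clustering decisions, this score is unaffected by the label-switching problem discussed earlier.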
18 Simulation Comparison of Inference Alternatives
- [Figure: boxplots of the Jaccard index for the MAP partition and for pairwise posterior probability thresholds (Prob > 0.5 and one other threshold), across increasing clustering difficulty]
- The MAP partition is consistently inferior to pairwise posterior probabilities
- Posterior probabilities incorporate uncertainty across iterations
19 Simulation Comparison of Prior Alternatives
- [Figure: boxplots of the Jaccard index for the Uniform, PY (α = 0.25, 0.5, 0.75), and DP priors, across increasing clustering difficulty]
- Not much difference in general between the priors
- Uniform does a little worse in most situations
20 Real Data Results: Clustering the JASPAR Database
- [Figure: tree based on the pairwise posterior probabilities, with height 1 - Prob(clustering); leaves are the JASPAR motifs, labeled by species, TF family, and matrix ID (e.g., Homo.sapiens NUCLEAR MA0065)]
- Post-processing the MAP partition to remove weak relationships makes it very similar to the thresholded posterior probabilities
21 Comparing Priors: Clustering the JASPAR Database
- [Figure: histograms of the number of clusters and the average cluster size under the Uniform and DP priors]
- Very little difference between using the DP and the uniform prior
- The likelihood is dominating any prior assumption on the partition
22 Summary
- Non-parametric Bayesian approaches based on the Dirichlet process can be very useful for clustering applications
- Issues with MCMC inference: popular MAP partitions seem inferior to partitions based on posterior probabilities
- Issues with implicit DP assumptions: alternative priors give quite different prior partitions
- Posterior differences between priors are small in our motif application, but can be larger in other applications
- Jensen and Liu, JASA (forthcoming), plus other manuscripts soon available on my website (stjensen)
Bayesian Nonparametric Models on Decomposable Graphs François Caron INRIA Bordeaux Sud Ouest Institut de Mathématiques de Bordeaux University of Bordeaux, France francois.caron@inria.fr Arnaud Doucet Departments
More informationBayesian nonparametric latent feature models
Bayesian nonparametric latent feature models Indian Buffet process, beta process, and related models François Caron Department of Statistics, Oxford Applied Bayesian Statistics Summer School Como, Italy
More informationShared Segmentation of Natural Scenes. Dependent Pitman-Yor Processes
Shared Segmentation of Natural Scenes using Dependent Pitman-Yor Processes Erik Sudderth & Michael Jordan University of California, Berkeley Parsing Visual Scenes sky skyscraper sky dome buildings trees
More informationDirichlet Processes and other non-parametric Bayesian models
Dirichlet Processes and other non-parametric Bayesian models Zoubin Ghahramani http://learning.eng.cam.ac.uk/zoubin/ zoubin@cs.cmu.edu Statistical Machine Learning CMU 10-702 / 36-702 Spring 2008 Model
More informationProbabilistic Graphical Models. Guest Lecture by Narges Razavian Machine Learning Class April
Probabilistic Graphical Models Guest Lecture by Narges Razavian Machine Learning Class April 14 2017 Today What is probabilistic graphical model and why it is useful? Bayesian Networks Basic Inference
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More information19 : Bayesian Nonparametrics: The Indian Buffet Process. 1 Latent Variable Models and the Indian Buffet Process
10-708: Probabilistic Graphical Models, Spring 2015 19 : Bayesian Nonparametrics: The Indian Buffet Process Lecturer: Avinava Dubey Scribes: Rishav Das, Adam Brodie, and Hemank Lamba 1 Latent Variable
More informationChapter 8 PROBABILISTIC MODELS FOR TEXT MINING. Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign
Chapter 8 PROBABILISTIC MODELS FOR TEXT MINING Yizhou Sun Department of Computer Science University of Illinois at Urbana-Champaign sun22@illinois.edu Hongbo Deng Department of Computer Science University
More informationA permutation-augmented sampler for DP mixture models
Percy Liang University of California, Berkeley Michael Jordan University of California, Berkeley Ben Taskar University of Pennsylvania Abstract We introduce a new inference algorithm for Dirichlet process
More informationAcoustic Unit Discovery (AUD) Models. Leda Sarı
Acoustic Unit Discovery (AUD) Models Leda Sarı Lucas Ondel and Lukáš Burget A summary of AUD experiments from JHU Frederick Jelinek Summer Workshop 2016 lsari2@illinois.edu November 07, 2016 1 / 23 The
More informationIntroduction to Probabilistic Machine Learning
Introduction to Probabilistic Machine Learning Piyush Rai Dept. of CSE, IIT Kanpur (Mini-course 1) Nov 03, 2015 Piyush Rai (IIT Kanpur) Introduction to Probabilistic Machine Learning 1 Machine Learning
More informationBayesian Inference for Dirichlet-Multinomials
Bayesian Inference for Dirichlet-Multinomials Mark Johnson Macquarie University Sydney, Australia MLSS Summer School 1 / 50 Random variables and distributed according to notation A probability distribution
More informationBayesian inference for multivariate extreme value distributions
Bayesian inference for multivariate extreme value distributions Sebastian Engelke Clément Dombry, Marco Oesting Toronto, Fields Institute, May 4th, 2016 Main motivation For a parametric model Z F θ of
More informationRandom Partition Distribution Indexed by Pairwise Information
Random Partition Distribution Indexed by Pairwise Information David B. Dahl 1, Ryan Day 2, and Jerry W. Tsai 3 1 Department of Statistics, Brigham Young University, Provo, UT 84602, U.S.A. 2 Livermore
More informationHierarchical Dirichlet Processes
Hierarchical Dirichlet Processes Yee Whye Teh, Michael I. Jordan, Matthew J. Beal and David M. Blei Computer Science Div., Dept. of Statistics Dept. of Computer Science University of California at Berkeley
More informationDiscovering molecular pathways from protein interaction and ge
Discovering molecular pathways from protein interaction and gene expression data 9-4-2008 Aim To have a mechanism for inferring pathways from gene expression and protein interaction data. Motivation Why
More informationProbabilistic Time Series Classification
Probabilistic Time Series Classification Y. Cem Sübakan Boğaziçi University 25.06.2013 Y. Cem Sübakan (Boğaziçi University) M.Sc. Thesis Defense 25.06.2013 1 / 54 Problem Statement The goal is to assign
More informationBayesian nonparametric latent feature models
Bayesian nonparametric latent feature models François Caron UBC October 2, 2007 / MLRG François Caron (UBC) Bayes. nonparametric latent feature models October 2, 2007 / MLRG 1 / 29 Overview 1 Introduction
More informationStat 451 Lecture Notes Markov Chain Monte Carlo. Ryan Martin UIC
Stat 451 Lecture Notes 07 12 Markov Chain Monte Carlo Ryan Martin UIC www.math.uic.edu/~rgmartin 1 Based on Chapters 8 9 in Givens & Hoeting, Chapters 25 27 in Lange 2 Updated: April 4, 2016 1 / 42 Outline
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Matrix Data: Clustering: Part 2 Instructor: Yizhou Sun yzsun@ccs.neu.edu October 19, 2014 Methods to Learn Matrix Data Set Data Sequence Data Time Series Graph & Network
More informationDepartment of Statistics, The Wharton School, University of Pennsylvania
Submitted to the Annals of Applied Statistics BAYESIAN TESTING OF MANY HYPOTHESIS MANY GENES: A STUDY OF SLEEP APNEA BY SHANE T. JENSEN Department of Statistics, The Wharton School, University of Pennsylvania
More informationPriors for Random Count Matrices with Random or Fixed Row Sums
Priors for Random Count Matrices with Random or Fixed Row Sums Mingyuan Zhou Joint work with Oscar Madrid and James Scott IROM Department, McCombs School of Business Department of Statistics and Data Sciences
More informationComputational statistics
Computational statistics Markov Chain Monte Carlo methods Thierry Denœux March 2017 Thierry Denœux Computational statistics March 2017 1 / 71 Contents of this chapter When a target density f can be evaluated
More information