Beta processes, stick-breaking, and power laws
1 Beta processes, stick-breaking, and power laws. T. Broderick, M. Jordan, J. Pitman. Presented by Jixiong Wang & J. Li, November 17, 2011.
2-5 DP vs. BP

Dirichlet Process, $G \sim \mathrm{DP}(\alpha B_0)$:
$$G = \sum_{i=1}^{\infty} \pi_i \delta_{\psi_i}, \qquad \sum_{i=1}^{\infty} \pi_i = 1.$$
CRP: marginalize out the $\pi_i$. A clustering framework.

Beta Process, $G \sim \mathrm{BP}(\theta, \gamma B_0)$:
$$G = \sum_{i=1}^{\infty} q_i \delta_{\psi_i}, \qquad q_i \in (0, 1).$$
IBP: marginalize out the $q_i$. A featural framework.
6-8 Poisson Point Process (PPP)

PPP: a counting measure $N$ such that for all $A \subseteq S$, $N(A) \sim \mathrm{Pois}(\mu(A))$.

Figure: PPP realizations with different rate measures $\mu$.

The PPP is a completely random measure (c.r.m.): for all disjoint subsets $A_1, \ldots, A_n \subseteq S$, the counts $N(A_1), \ldots, N(A_n)$ are independent. Note: the DP is not a c.r.m.
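The defining property can be simulated directly: draw the total count from a Poisson and scatter the points. A small sketch with a homogeneous rate measure on $[0,1]$ (the rate value and seed are choices of this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 50.0  # homogeneous rate: mu(A) = rate * |A| on S = [0, 1]

def draw_ppp():
    # N([0,1]) ~ Pois(rate); given the count, points fall uniformly on [0,1]
    n = rng.poisson(rate)
    return np.sort(rng.uniform(0.0, 1.0, size=n))

# Empirically check N(A) ~ Pois(mu(A)) for A = [0, 0.3], mu(A) = 15
counts = np.array([np.sum(draw_ppp() < 0.3) for _ in range(5000)])
print(counts.mean())  # near 15
print(counts.var())   # Poisson: variance equals the mean
```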
9-11 Beta Process: $B \sim \mathrm{BP}(\theta, \gamma B_0)$

The BP is defined by a PPP that lives on $\Psi \times [0, 1]$, with rate measure
$$\nu(d\psi, du) = \theta(\psi)\, u^{-1} (1 - u)^{\theta(\psi) - 1}\, du\, \gamma B_0(d\psi).$$

To draw $B \sim \mathrm{BP}(\theta, \gamma B_0)$: draw $\Pi = \{(\psi_i, U_i)\}_i$ from the PPP, then set $B = \sum_{i=1}^{\infty} U_i \delta_{\psi_i}$.
12-15 Bernoulli process & binary feature matrix

$Y \sim \mathrm{BeP}(B)$: $Y = \sum_{i=1}^{\infty} b_i \delta_{\psi_i}$, where $b_i \sim \mathrm{Bern}(U_i)$.

Draw $Y_1, \ldots, Y_N \overset{iid}{\sim} \mathrm{BeP}(B)$.

Figure 2: Upper left: a draw $B$ from the beta process. Lower left: 50 draws from the Bernoulli process $\mathrm{BeP}(B)$; the vertical axis indexes the draw number among the 50 exchangeable draws, and a point indicates a one at the corresponding location $\psi \in \Psi$ on the horizontal axis. Right: a matrix formed from the lower-left plot by including only those $\psi$ values with a non-zero number of Bernoulli successes among the 50 draws; the number of columns $K$ is the number of such $\psi$, and the number of rows $N$ is the number of draws made. A black square indicates a one at the corresponding matrix position; a white square indicates a zero.

This forms a Bayesian nonparametric binary latent feature model: $Z \sim \mathrm{BP\text{-}BeP}(N, \gamma, \theta)$.

Figure [Ghahramani et al. 06]: binary matrices and the left-ordered form. The binary matrix on the left is transformed into the left-ordered binary matrix on the right by the function $\mathrm{lof}(\cdot)$. This left-ordered matrix was generated from the exchangeable Indian buffet process with $\alpha = 10$; empty columns are omitted from both matrices.
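Given the atom weights $U_i$ of a draw $B$, the binary feature matrix is a matter of independent Bernoulli draws per row. A sketch (the fixed weights $U$ here are an illustrative stand-in for a true BP draw):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50                       # number of BeP draws (rows)
U = rng.beta(1.0, 4.0, 20)   # illustrative atom weights of B

# Y_n ~ BeP(B): row n has entry b_{n,i} ~ Bern(U_i) at atom psi_i
Z = (rng.uniform(size=(N, len(U))) < U).astype(int)

# Keep only features with at least one success among the N draws,
# as in the matrix on the right of Figure 2
Z = Z[:, Z.sum(axis=0) > 0]
print(Z.shape)  # (N, K) with K = number of active features
```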
16-18 Stick-breaking construction of BP

The stick-breaking representation of the beta process can be derived by a variety of procedures, including size-biased sampling [Thibaux and Jordan, 2007] and inverse Lévy measure methods [Wolpert and Ickstadt, 2004; Teh et al., 2007]. The procedures closest to the stick-breaking representation for the Dirichlet process are those of Paisley et al. [2010] and Teh et al. [2007]. Our point of departure (Paisley et al. 2010) has the following form:
$$B = \sum_{i=1}^{\infty} \sum_{j=1}^{C_i} V_{i,j}^{(i)} \prod_{l=1}^{i-1} \left(1 - V_{i,j}^{(l)}\right) \delta_{\psi_{i,j}}, \qquad C_i \overset{iid}{\sim} \mathrm{Pois}(\gamma), \quad V_{i,j}^{(l)} \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \quad \psi_{i,j} \overset{iid}{\sim} \tfrac{1}{\gamma} B_0. \tag{13}$$

Expanded round by round:
$$B = \sum_{j=1}^{C_1} V_{1,j}^{(1)} \delta_{\psi_{1,j}} + \sum_{j=1}^{C_2} V_{2,j}^{(2)} \left(1 - V_{2,j}^{(1)}\right) \delta_{\psi_{2,j}} + \sum_{j=1}^{C_3} V_{3,j}^{(3)} \left(1 - V_{3,j}^{(2)}\right) \left(1 - V_{3,j}^{(1)}\right) \delta_{\psi_{3,j}} + \cdots$$

This construction is analogous to the stick-breaking representation of the DP in that it represents a draw from the beta process as a sum of randomly drawn atoms, with the weights obtained by a recursive procedure. It is worth noting, though, that for every $(i, j)$ tuple subscript of $V_{i,j}^{(l)}$, a separate stick exists and is broken across the superscript $l$. Thus there are no summation properties across the weights in the sum in Eq. (13); by contrast, the weights in Eq. (12) sum to one almost surely. Think of each $i$ as a round: the construction is a multiple of the DP's stick-breaking.

The generalization of the one-parameter Dirichlet process to the two-parameter Pitman-Yor process suggests that we might consider generalizing this stick-breaking representation as well.
19-20 Three-parameter generalization

Three-parameter stick-breaking ("a multiple of Pitman-Yor"): generalize the stick-breaking representation of the beta process in Eq. (13) as follows:
$$B = \sum_{i=1}^{\infty} \sum_{j=1}^{C_i} V_{i,j}^{(i)} \prod_{l=1}^{i-1} \left(1 - V_{i,j}^{(l)}\right) \delta_{\psi_{i,j}}, \qquad C_i \overset{iid}{\sim} \mathrm{Pois}(\gamma), \quad V_{i,j}^{(l)} \overset{indep}{\sim} \mathrm{Beta}(1 - \alpha, \theta + i\alpha), \quad \psi_{i,j} \overset{iid}{\sim} \tfrac{1}{\gamma} B_0. \tag{14}$$

The three-parameter $\mathrm{BP}(\theta, \alpha, B_0)$ has rate measure
$$\nu_{\mathrm{BP}}(d\psi, du) = B_0(d\psi)\, \mu_{\mathrm{BP}}(du) = B_0(d\psi)\, \frac{\Gamma(1 + \theta)}{\Gamma(1 - \alpha)\, \Gamma(\theta + \alpha)}\, u^{-1-\alpha} (1 - u)^{\theta + \alpha - 1}\, du.$$

In Section 6 we will show that introducing the additional parameter $\alpha$ indeed yields Type I and II power-law behavior (but not Type III). In the remainder of this section we present a proof that these stick-breaking representations arise from the beta process. In contradistinction to the proof of Eq. (13) by Paisley et al. [2010], which used a limiting process defined on sequences of finite binary matrices, our approach makes a direct connection to the Poisson process characterization of the beta process. Our proof has several virtues: (1) it relies on no asymptotic arguments and instead comes entirely from ...
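Eq. (14) translates directly into a truncated sampler, and setting $\alpha = 0$ recovers the two-parameter construction of Eq. (13). A sketch truncated to a fixed number of rounds (the truncation level and seed are choices of this sketch; the Beta parameters follow the slides' statement of Eq. (14)):

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_break_bp(gamma=3.0, theta=1.0, alpha=0.0, rounds=200):
    """Truncated draw of the BP atom weights via the stick-breaking of Eq. (14).

    Round i contributes C_i ~ Pois(gamma) atoms, each with weight
    V^{(i)}_{i,j} * prod_{l=1}^{i-1} (1 - V^{(l)}_{i,j}),
    where V^{(l)}_{i,j} ~ Beta(1 - alpha, theta + i*alpha) (per the slides).
    alpha = 0 reduces to the two-parameter construction of Eq. (13).
    """
    weights = []
    for i in range(1, rounds + 1):
        c = rng.poisson(gamma)
        for _ in range(c):
            v = rng.beta(1.0 - alpha, theta + i * alpha, size=i)
            # terminal stick V^{(i)} times the surviving stick lengths
            weights.append(v[-1] * np.prod(1.0 - v[:-1]))
    return np.array(weights)

w = stick_break_bp(alpha=0.3)
print(len(w), w.sum())  # weights lie in (0,1); their sum is B(Psi), not 1
```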
21-23 Equivalence: an elegant proof

Proposition 1: $B$ as presented in the stick-breaking construction is equivalent to $B \sim \mathrm{BP}(\theta, \alpha, B_0)$.

Idea of proof: the stick-breaking representation is also a PPP and induces a rate measure $\nu$; it therefore suffices to show that $\nu = \nu_{\mathrm{BP}}$.
24 Power law behavior

Power laws in clustering models, where $K_{N,j} = \sum_i \mathbb{I}(N_i = j)$ and $K_N = \sum_i \mathbb{I}(N_i > 0)$:

Type I: $K_N \sim c N^a$ as $N \to \infty$.

Type II: $K_{N,j} \sim \dfrac{a\, \Gamma(j - a)}{j!\, \Gamma(1 - a)}\, c N^a$ as $N \to \infty$.

Power laws in featural models:

Type III: $P(k_n > M) \sim c M^{-a}$.
25 Power law derivations: Type 1 and 2
26 Power law derivations: Poissonization

$K(t)$: the number of such Poisson processes with points in the interval $[0, t]$:
$$K(t) = \sum_i \mathbb{I}\left(\Pi_i[0, t] > 0\right).$$

$K_j(t)$: the number of such Poisson processes with exactly $j$ points in the interval $[0, t]$:
$$K_j(t) = \sum_i \mathbb{I}\left(\Pi_i[0, t] = j\right).$$
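Under Poissonization each feature $i$ carries its own Poisson process $\Pi_i$, and $K(t)$, $K_j(t)$ are simple tallies of the per-process counts. A sketch (the per-process rates $U_i$ are an illustrative stand-in for the feature weights):

```python
import numpy as np

rng = np.random.default_rng(4)
U = rng.beta(1.0, 4.0, size=500)  # illustrative per-feature rates
t = 2.0

# Pi_i[0, t] ~ Pois(U_i * t): number of points of process i in [0, t]
counts = rng.poisson(U * t)

K_t = int(np.sum(counts > 0))  # K(t): processes with at least one point

def K_j(j):
    # K_j(t): processes with exactly j points in [0, t]
    return int(np.sum(counts == j))

print(K_t, K_j(1))
```

Note the identity $K(t) = \sum_{j \ge 1} K_j(t)$, which holds exactly for any realization.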
27 Power law derivations
28 Power law derivations
29 Power law derivations: Type 1 and 2
30 Power law derivations: Type 3
31 Simulation: $\alpha = 0$ (classic), $\alpha = 0.3$, and $\alpha = 0.6$; $\gamma = 3$, $\theta = 1$. Generate 2000 random variables $C_i$, and hence $\sum_{i=1}^{2000} C_i$ feature probabilities. With these probabilities, generate $N = 1000$ data points, i.e., 1000 vectors of independent Bernoulli random variables, one entry per feature.
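The simulation pipeline described above can be sketched end to end for one setting; here $\alpha = 0.3$ is used, and the seed is a choice of this sketch (the Beta parameters follow the slides' Eq. (14)):

```python
import numpy as np

rng = np.random.default_rng(5)
gamma, theta, alpha = 3.0, 1.0, 0.3
rounds, N = 2000, 1000

# Draw C_1, ..., C_2000 and the corresponding feature probabilities
# via the three-parameter stick-breaking construction
probs = []
for i in range(1, rounds + 1):
    for _ in range(rng.poisson(gamma)):
        v = rng.beta(1.0 - alpha, theta + i * alpha, size=i)
        probs.append(v[-1] * np.prod(1.0 - v[:-1]))
probs = np.array(probs)

# N = 1000 data points: vectors of independent Bernoulli draws, one per feature
Z = rng.uniform(size=(N, len(probs))) < probs

K_N = int(np.sum(Z.any(axis=0)))  # number of features seen at least once
print(len(probs), K_N)
```

Repeating this for each $\alpha$ and plotting $K_N$ against $N$ on log-log axes exhibits the Type I behavior discussed above.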
32 Simulation: Type 1 & 2
33 Simulation: Type 3
34 Experimental results: beta process coupled with a discrete factor analysis model. Handwritten digits: 28×28 pixels projected into 50 dimensions with PCA.
35 Experimental results
36 Experimental results
37 Conclusions: (BP, stick-breaking, IBP) is analogous to (DP, stick-breaking, CRP). The three-parameter generalization of the BP parallels the Pitman-Yor generalization of the DP. Type I & II power laws follow from the three-parameter model. Type III remains an open problem: it calls for discovering a new class of stochastic process.
Tamara Broderick, Michael Jordan, Jim Pitman. Beta processes, stick-breaking, and power laws. Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-211-125.
More informationBayesian Nonparametrics
Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationProperties of Bayesian nonparametric models and priors over trees
Properties of Bayesian nonparametric models and priors over trees David A. Knowles Computer Science Department Stanford University July 24, 2013 Introduction Theory: what characteristics might we want?
More informationNon-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources
th International Conference on Information Fusion Chicago, Illinois, USA, July -8, Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources Priyadip Ray Department of Electrical
More informationDirichlet Processes and other non-parametric Bayesian models
Dirichlet Processes and other non-parametric Bayesian models Zoubin Ghahramani http://learning.eng.cam.ac.uk/zoubin/ zoubin@cs.cmu.edu Statistical Machine Learning CMU 10-702 / 36-702 Spring 2008 Model
More informationBayesian Nonparametrics
Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet
More informationScribe Notes 11/30/07
Scribe Notes /3/7 Lauren Hannah reviewed salient points from Some Developments of the Blackwell-MacQueen Urn Scheme by Jim Pitman. The paper links existing ideas on Dirichlet Processes (Chinese Restaurant
More informationFlexible Martingale Priors for Deep Hierarchies
Flexible Martingale Priors for Deep Hierarchies Jacob Steinhardt Massachusetts Institute of Technology Zoubin Ghahramani University of Cambridge Abstract We present an infinitely exchangeable prior over
More informationA Bayesian Review of the Poisson-Dirichlet Process
A Bayesian Review of the Poisson-Dirichlet Process Wray Buntine wray.buntine@nicta.com.au NICTA and Australian National University Locked Bag 8001, Canberra ACT 2601, Australia Marcus Hutter marcus.hutter@anu.edu.au
More informationNon-parametric Bayesian Methods
Non-parametric Bayesian Methods Uncertainty in Artificial Intelligence Tutorial July 25 Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK Center for Automated Learning
More informationTruncation error of a superposed gamma process in a decreasing order representation
Truncation error of a superposed gamma process in a decreasing order representation Julyan Arbel Inria Grenoble, Université Grenoble Alpes julyan.arbel@inria.fr Igor Prünster Bocconi University, Milan
More informationA Simple Proof of the Stick-Breaking Construction of the Dirichlet Process
A Simple Proof of the Stick-Breaking Construction of the Dirichlet Process John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We give a simple
More informationTHE POISSON DIRICHLET DISTRIBUTION AND ITS RELATIVES REVISITED
THE POISSON DIRICHLET DISTRIBUTION AND ITS RELATIVES REVISITED LARS HOLST Department of Mathematics, Royal Institute of Technology SE 1 44 Stockholm, Sweden E-mail: lholst@math.kth.se December 17, 21 Abstract
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationCS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I
X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet
More informationThe IBP Compound Dirichlet Process and its Application to Focused Topic Modeling
The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling Sinead Williamson SAW56@CAM.AC.UK Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, UK
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationA permutation-augmented sampler for DP mixture models
Percy Liang University of California, Berkeley Michael Jordan University of California, Berkeley Ben Taskar University of Pennsylvania Abstract We introduce a new inference algorithm for Dirichlet process
More informationNonparametric Probabilistic Modelling
Nonparametric Probabilistic Modelling Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Signal processing and inference
More informationVariational Inference for the Nested Chinese Restaurant Process
Variational Inference for the Nested Chinese Restaurant Process Chong Wang Computer Science Department Princeton University chongw@cs.princeton.edu David M. Blei Computer Science Department Princeton University
More informationSpatial Bayesian Nonparametrics for Natural Image Segmentation
Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationOn a coverage model in communications and its relations to a Poisson-Dirichlet process
On a coverage model in communications and its relations to a Poisson-Dirichlet process B. Błaszczyszyn Inria/ENS Simons Conference on Networks and Stochastic Geometry UT Austin Austin, 17 21 May 2015 p.
More informationConstruction of Dependent Dirichlet Processes based on Poisson Processes
1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen Outline
More informationAn Alternative Prior Process for Nonparametric Bayesian Clustering
An Alternative Prior Process for Nonparametric Bayesian Clustering Hanna Wallach (UMass Amherst) Shane Jensen (UPenn) Lee Dicker (Harvard) Katherine Heller (Cambridge) Nonparametric Bayesian Clustering
More informationHomework 6: Image Completion using Mixture of Bernoullis
Homework 6: Image Completion using Mixture of Bernoullis Deadline: Wednesday, Nov. 21, at 11:59pm Submission: You must submit two files through MarkUs 1 : 1. a PDF file containing your writeup, titled
More informationTwo Useful Bounds for Variational Inference
Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the
More informationChapter 3: Element sampling design: Part 1
Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014 Simple random sampling 1 Simple random sampling 2 SRS with replacement 3 Systematic sampling Kim Ch. 3: Element sampling design: Part
More informationNonparametric Bayesian Models for Sparse Matrices and Covariances
Nonparametric Bayesian Models for Sparse Matrices and Covariances Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Bayes
More informationBayesian nonparametric model for sparse dynamic networks
Bayesian nonparametric model for sparse dynamic networks Konstantina Palla, François Caron and Yee Whye Teh 23rd of February, 2018 University of Glasgow Konstantina Palla 1 / 38 BIG PICTURE Interested
More information