Beta processes, stick-breaking, and power laws
1 Beta processes, stick-breaking, and power laws. T. Broderick, M. Jordan, J. Pitman. Presented by Jixiong Wang & J. Li, November 17, 2011.
2-5 DP vs. BP

Dirichlet Process, $G \sim \mathrm{DP}(\alpha B_0)$:
$$G = \sum_{i=1}^{\infty} \pi_i \delta_{\psi_i}, \qquad \sum_{i=1}^{\infty} \pi_i = 1.$$
CRP: marginalize out the $\pi_i$. A clustering framework.

Beta Process, $G \sim \mathrm{BP}(\theta, \gamma B_0)$:
$$G = \sum_{i=1}^{\infty} q_i \delta_{\psi_i}, \qquad q_i \in (0, 1).$$
IBP: marginalize out the $q_i$. A featural framework.
6-8 Poisson Point Process (PPP)

PPP: a counting measure $N$ such that for all $A \subseteq S$, $N(A) \sim \mathrm{Pois}(\mu(A))$.

Figure: PPP realizations with different rate measures $\mu$.

The PPP is a completely random measure (c.r.m.): for all disjoint subsets $A_1, \ldots, A_n \subseteq S$, the counts $N(A_1), \ldots, N(A_n)$ are independent. Note: the DP is not a c.r.m.
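The defining property can be simulated directly: draw the total count from a Poisson and scatter the points. A small sketch with a homogeneous rate measure on $[0,1]$ (the rate value and seed are choices of this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
rate = 50.0  # homogeneous rate: mu(A) = rate * |A| on S = [0, 1]

def draw_ppp():
    # N([0,1]) ~ Pois(rate); given the count, points fall uniformly on [0,1]
    n = rng.poisson(rate)
    return np.sort(rng.uniform(0.0, 1.0, size=n))

# Empirically check N(A) ~ Pois(mu(A)) for A = [0, 0.3], mu(A) = 15
counts = np.array([np.sum(draw_ppp() < 0.3) for _ in range(5000)])
print(counts.mean())  # near 15
print(counts.var())   # Poisson: variance equals the mean
```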
9-11 Beta Process: $B \sim \mathrm{BP}(\theta, \gamma B_0)$

The BP is defined by a PPP that lives on $\Psi \times [0, 1]$, with rate measure
$$\nu(d\psi, du) = \theta(\psi)\, u^{-1} (1 - u)^{\theta(\psi) - 1}\, du\, \gamma B_0(d\psi).$$

To draw $B \sim \mathrm{BP}(\theta, \gamma B_0)$: draw $\Pi = \{(\psi_i, U_i)\}_i$ from the PPP, then set $B = \sum_{i=1}^{\infty} U_i \delta_{\psi_i}$.
12-15 Bernoulli process & binary feature matrix

$Y \sim \mathrm{BeP}(B)$: $Y = \sum_{i=1}^{\infty} b_i \delta_{\psi_i}$, where $b_i \sim \mathrm{Bern}(U_i)$.

Draw $Y_1, \ldots, Y_N \overset{iid}{\sim} \mathrm{BeP}(B)$.

Figure 2: Upper left: a draw $B$ from the beta process. Lower left: 50 draws from the Bernoulli process $\mathrm{BeP}(B)$; the vertical axis indexes the draw number among the 50 exchangeable draws, and a point indicates a one at the corresponding location $\psi \in \Psi$ on the horizontal axis. Right: a matrix formed from the lower-left plot by including only those $\psi$ values with a non-zero number of Bernoulli successes among the 50 draws; the number of columns $K$ is the number of such $\psi$, and the number of rows $N$ is the number of draws made. A black square indicates a one at the corresponding matrix position; a white square indicates a zero.

This forms a Bayesian nonparametric binary latent feature model: $Z \sim \mathrm{BP\text{-}BeP}(N, \gamma, \theta)$.

Figure [Ghahramani et al. 06]: binary matrices and the left-ordered form. The binary matrix on the left is transformed into the left-ordered binary matrix on the right by the function $\mathrm{lof}(\cdot)$. This left-ordered matrix was generated from the exchangeable Indian buffet process with $\alpha = 10$; empty columns are omitted from both matrices.
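Given the atom weights $U_i$ of a draw $B$, the binary feature matrix is a matter of independent Bernoulli draws per row. A sketch (the fixed weights $U$ here are an illustrative stand-in for a true BP draw):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 50                       # number of BeP draws (rows)
U = rng.beta(1.0, 4.0, 20)   # illustrative atom weights of B

# Y_n ~ BeP(B): row n has entry b_{n,i} ~ Bern(U_i) at atom psi_i
Z = (rng.uniform(size=(N, len(U))) < U).astype(int)

# Keep only features with at least one success among the N draws,
# as in the matrix on the right of Figure 2
Z = Z[:, Z.sum(axis=0) > 0]
print(Z.shape)  # (N, K) with K = number of active features
```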
16-18 Stick-breaking construction of BP

The stick-breaking representation of the beta process can be derived by a variety of procedures, including size-biased sampling [Thibaux and Jordan, 2007] and inverse Lévy measure methods [Wolpert and Ickstadt, 2004; Teh et al., 2007]. The procedures closest to the stick-breaking representation for the Dirichlet process are those of Paisley et al. [2010] and Teh et al. [2007]. Our point of departure (Paisley et al. 2010) has the following form:
$$B = \sum_{i=1}^{\infty} \sum_{j=1}^{C_i} V_{i,j}^{(i)} \prod_{l=1}^{i-1} \left(1 - V_{i,j}^{(l)}\right) \delta_{\psi_{i,j}}, \qquad C_i \overset{iid}{\sim} \mathrm{Pois}(\gamma), \quad V_{i,j}^{(l)} \overset{iid}{\sim} \mathrm{Beta}(1, \theta), \quad \psi_{i,j} \overset{iid}{\sim} \tfrac{1}{\gamma} B_0. \tag{13}$$

Expanded round by round:
$$B = \sum_{j=1}^{C_1} V_{1,j}^{(1)} \delta_{\psi_{1,j}} + \sum_{j=1}^{C_2} V_{2,j}^{(2)} \left(1 - V_{2,j}^{(1)}\right) \delta_{\psi_{2,j}} + \sum_{j=1}^{C_3} V_{3,j}^{(3)} \left(1 - V_{3,j}^{(2)}\right) \left(1 - V_{3,j}^{(1)}\right) \delta_{\psi_{3,j}} + \cdots$$

This construction is analogous to the stick-breaking representation of the DP in that it represents a draw from the beta process as a sum of randomly drawn atoms, with the weights obtained by a recursive procedure. It is worth noting, though, that for every $(i, j)$ tuple subscript of $V_{i,j}^{(l)}$, a separate stick exists and is broken across the superscript $l$. Thus there are no summation properties across the weights in the sum in Eq. (13); by contrast, the weights in Eq. (12) sum to one almost surely. Think of each $i$ as a round: the construction is a multiple of the DP's stick-breaking.

The generalization of the one-parameter Dirichlet process to the two-parameter Pitman-Yor process suggests that we might consider generalizing this stick-breaking representation as well.
19-20 Three-parameter generalization

Three-parameter stick-breaking ("a multiple of Pitman-Yor"): generalize the stick-breaking representation of the beta process in Eq. (13) as follows:
$$B = \sum_{i=1}^{\infty} \sum_{j=1}^{C_i} V_{i,j}^{(i)} \prod_{l=1}^{i-1} \left(1 - V_{i,j}^{(l)}\right) \delta_{\psi_{i,j}}, \qquad C_i \overset{iid}{\sim} \mathrm{Pois}(\gamma), \quad V_{i,j}^{(l)} \overset{indep}{\sim} \mathrm{Beta}(1 - \alpha, \theta + i\alpha), \quad \psi_{i,j} \overset{iid}{\sim} \tfrac{1}{\gamma} B_0. \tag{14}$$

The three-parameter $\mathrm{BP}(\theta, \alpha, B_0)$ has rate measure
$$\nu_{\mathrm{BP}}(d\psi, du) = B_0(d\psi)\, \mu_{\mathrm{BP}}(du) = B_0(d\psi)\, \frac{\Gamma(1 + \theta)}{\Gamma(1 - \alpha)\, \Gamma(\theta + \alpha)}\, u^{-1-\alpha} (1 - u)^{\theta + \alpha - 1}\, du.$$

In Section 6 we will show that introducing the additional parameter $\alpha$ indeed yields Type I and II power-law behavior (but not Type III). In the remainder of this section we present a proof that these stick-breaking representations arise from the beta process. In contradistinction to the proof of Eq. (13) by Paisley et al. [2010], which used a limiting process defined on sequences of finite binary matrices, our approach makes a direct connection to the Poisson process characterization of the beta process. Our proof has several virtues: (1) it relies on no asymptotic arguments and instead comes entirely from ...
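Eq. (14) translates directly into a truncated sampler, and setting $\alpha = 0$ recovers the two-parameter construction of Eq. (13). A sketch truncated to a fixed number of rounds (the truncation level and seed are choices of this sketch; the Beta parameters follow the slides' statement of Eq. (14)):

```python
import numpy as np

rng = np.random.default_rng(3)

def stick_break_bp(gamma=3.0, theta=1.0, alpha=0.0, rounds=200):
    """Truncated draw of the BP atom weights via the stick-breaking of Eq. (14).

    Round i contributes C_i ~ Pois(gamma) atoms, each with weight
    V^{(i)}_{i,j} * prod_{l=1}^{i-1} (1 - V^{(l)}_{i,j}),
    where V^{(l)}_{i,j} ~ Beta(1 - alpha, theta + i*alpha) (per the slides).
    alpha = 0 reduces to the two-parameter construction of Eq. (13).
    """
    weights = []
    for i in range(1, rounds + 1):
        c = rng.poisson(gamma)
        for _ in range(c):
            v = rng.beta(1.0 - alpha, theta + i * alpha, size=i)
            # terminal stick V^{(i)} times the surviving stick lengths
            weights.append(v[-1] * np.prod(1.0 - v[:-1]))
    return np.array(weights)

w = stick_break_bp(alpha=0.3)
print(len(w), w.sum())  # weights lie in (0,1); their sum is B(Psi), not 1
```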
21-23 Equivalence: an elegant proof

Proposition 1: $B$ as presented in the stick-breaking construction is equivalent to $B \sim \mathrm{BP}(\theta, \alpha, B_0)$.

Idea of proof: the stick-breaking representation is also a PPP and induces a rate measure $\nu$; it therefore suffices to show that $\nu = \nu_{\mathrm{BP}}$.
24 Power law behavior

Power laws in clustering models, where $K_{N,j} = \sum_i \mathbb{I}(N_i = j)$ and $K_N = \sum_i \mathbb{I}(N_i > 0)$:

Type I: $K_N \sim c N^a$ as $N \to \infty$.

Type II: $K_{N,j} \sim \dfrac{a\, \Gamma(j - a)}{j!\, \Gamma(1 - a)}\, c N^a$ as $N \to \infty$.

Power laws in featural models:

Type III: $P(k_n > M) \sim c M^{-a}$.
25 Power law derivations: Type 1 and 2
26 Power law derivations: Poissonization

$K(t)$: the number of such Poisson processes with points in the interval $[0, t]$:
$$K(t) = \sum_i \mathbb{I}\left(\Pi_i[0, t] > 0\right).$$

$K_j(t)$: the number of such Poisson processes with exactly $j$ points in the interval $[0, t]$:
$$K_j(t) = \sum_i \mathbb{I}\left(\Pi_i[0, t] = j\right).$$
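Under Poissonization each feature $i$ carries its own Poisson process $\Pi_i$, and $K(t)$, $K_j(t)$ are simple tallies of the per-process counts. A sketch (the per-process rates $U_i$ are an illustrative stand-in for the feature weights):

```python
import numpy as np

rng = np.random.default_rng(4)
U = rng.beta(1.0, 4.0, size=500)  # illustrative per-feature rates
t = 2.0

# Pi_i[0, t] ~ Pois(U_i * t): number of points of process i in [0, t]
counts = rng.poisson(U * t)

K_t = int(np.sum(counts > 0))  # K(t): processes with at least one point

def K_j(j):
    # K_j(t): processes with exactly j points in [0, t]
    return int(np.sum(counts == j))

print(K_t, K_j(1))
```

Note the identity $K(t) = \sum_{j \ge 1} K_j(t)$, which holds exactly for any realization.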
27 Power law derivations
28 Power law derivations
29 Power law derivations: Type 1 and 2
30 Power law derivations: Type 3
31 Simulation: $\alpha = 0$ (classic), $\alpha = 0.3$, and $\alpha = 0.6$; $\gamma = 3$, $\theta = 1$. Generate 2000 random variables $C_i$, and hence $\sum_{i=1}^{2000} C_i$ feature probabilities. With these probabilities, generate $N = 1000$ data points, i.e., 1000 vectors of independent Bernoulli random variables, one entry per feature.
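The simulation pipeline described above can be sketched end to end for one setting; here $\alpha = 0.3$ is used, and the seed is a choice of this sketch (the Beta parameters follow the slides' Eq. (14)):

```python
import numpy as np

rng = np.random.default_rng(5)
gamma, theta, alpha = 3.0, 1.0, 0.3
rounds, N = 2000, 1000

# Draw C_1, ..., C_2000 and the corresponding feature probabilities
# via the three-parameter stick-breaking construction
probs = []
for i in range(1, rounds + 1):
    for _ in range(rng.poisson(gamma)):
        v = rng.beta(1.0 - alpha, theta + i * alpha, size=i)
        probs.append(v[-1] * np.prod(1.0 - v[:-1]))
probs = np.array(probs)

# N = 1000 data points: vectors of independent Bernoulli draws, one per feature
Z = rng.uniform(size=(N, len(probs))) < probs

K_N = int(np.sum(Z.any(axis=0)))  # number of features seen at least once
print(len(probs), K_N)
```

Repeating this for each $\alpha$ and plotting $K_N$ against $N$ on log-log axes exhibits the Type I behavior discussed above.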
32 Simulation: Type 1 & 2
33 Simulation: Type 3
34 Experimental results: beta process coupled with a discrete factor analysis model. Handwritten digits: 28×28 pixels projected into 50 dimensions with PCA.
35 Experimental results
36 Experimental results
37 Conclusions: (BP, stick-breaking, IBP) is analogous to (DP, stick-breaking, CRP). The three-parameter generalization of the BP parallels the Pitman-Yor generalization of the DP. Type I & II power laws follow from the three-parameter model. Type III remains an open problem: it calls for discovering a new class of stochastic process.
Tamara Broderick, Michael Jordan, Jim Pitman. Beta processes, stick-breaking, and power laws. Electrical Engineering and Computer Sciences, University of California at Berkeley, Technical Report No. UCB/EECS-211-125.
More informationBayesian Nonparametrics
Bayesian Nonparametrics Peter Orbanz Columbia University PARAMETERS AND PATTERNS Parameters P(X θ) = Probability[data pattern] 3 2 1 0 1 2 3 5 0 5 Inference idea data = underlying pattern + independent
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationProperties of Bayesian nonparametric models and priors over trees
Properties of Bayesian nonparametric models and priors over trees David A. Knowles Computer Science Department Stanford University July 24, 2013 Introduction Theory: what characteristics might we want?
More informationNon-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources
th International Conference on Information Fusion Chicago, Illinois, USA, July -8, Non-parametric Bayesian Modeling and Fusion of Spatio-temporal Information Sources Priyadip Ray Department of Electrical
More informationDirichlet Processes and other non-parametric Bayesian models
Dirichlet Processes and other non-parametric Bayesian models Zoubin Ghahramani http://learning.eng.cam.ac.uk/zoubin/ zoubin@cs.cmu.edu Statistical Machine Learning CMU 10-702 / 36-702 Spring 2008 Model
More informationBayesian Nonparametrics
Bayesian Nonparametrics Lorenzo Rosasco 9.520 Class 18 April 11, 2011 About this class Goal To give an overview of some of the basic concepts in Bayesian Nonparametrics. In particular, to discuss Dirichelet
More informationScribe Notes 11/30/07
Scribe Notes /3/7 Lauren Hannah reviewed salient points from Some Developments of the Blackwell-MacQueen Urn Scheme by Jim Pitman. The paper links existing ideas on Dirichlet Processes (Chinese Restaurant
More informationFlexible Martingale Priors for Deep Hierarchies
Flexible Martingale Priors for Deep Hierarchies Jacob Steinhardt Massachusetts Institute of Technology Zoubin Ghahramani University of Cambridge Abstract We present an infinitely exchangeable prior over
More informationA Bayesian Review of the Poisson-Dirichlet Process
A Bayesian Review of the Poisson-Dirichlet Process Wray Buntine wray.buntine@nicta.com.au NICTA and Australian National University Locked Bag 8001, Canberra ACT 2601, Australia Marcus Hutter marcus.hutter@anu.edu.au
More informationNon-parametric Bayesian Methods
Non-parametric Bayesian Methods Uncertainty in Artificial Intelligence Tutorial July 25 Zoubin Ghahramani Gatsby Computational Neuroscience Unit University College London, UK Center for Automated Learning
More informationTruncation error of a superposed gamma process in a decreasing order representation
Truncation error of a superposed gamma process in a decreasing order representation Julyan Arbel Inria Grenoble, Université Grenoble Alpes julyan.arbel@inria.fr Igor Prünster Bocconi University, Milan
More informationA Simple Proof of the Stick-Breaking Construction of the Dirichlet Process
A Simple Proof of the Stick-Breaking Construction of the Dirichlet Process John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We give a simple
More informationTHE POISSON DIRICHLET DISTRIBUTION AND ITS RELATIVES REVISITED
THE POISSON DIRICHLET DISTRIBUTION AND ITS RELATIVES REVISITED LARS HOLST Department of Mathematics, Royal Institute of Technology SE 1 44 Stockholm, Sweden E-mail: lholst@math.kth.se December 17, 21 Abstract
More informationOutline. Binomial, Multinomial, Normal, Beta, Dirichlet. Posterior mean, MAP, credible interval, posterior distribution
Outline A short review on Bayesian analysis. Binomial, Multinomial, Normal, Beta, Dirichlet Posterior mean, MAP, credible interval, posterior distribution Gibbs sampling Revisit the Gaussian mixture model
More informationCS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr Dirichlet Process I
X i Ν CS281B / Stat 241B : Statistical Learning Theory Lecture: #22 on 19 Apr 2004 Dirichlet Process I Lecturer: Prof. Michael Jordan Scribe: Daniel Schonberg dschonbe@eecs.berkeley.edu 22.1 Dirichlet
More informationThe IBP Compound Dirichlet Process and its Application to Focused Topic Modeling
The IBP Compound Dirichlet Process and its Application to Focused Topic Modeling Sinead Williamson SAW56@CAM.AC.UK Department of Engineering, University of Cambridge, Trumpington Street, Cambridge, UK
More informationPCA with random noise. Van Ha Vu. Department of Mathematics Yale University
PCA with random noise Van Ha Vu Department of Mathematics Yale University An important problem that appears in various areas of applied mathematics (in particular statistics, computer science and numerical
More informationA permutation-augmented sampler for DP mixture models
Percy Liang University of California, Berkeley Michael Jordan University of California, Berkeley Ben Taskar University of Pennsylvania Abstract We introduce a new inference algorithm for Dirichlet process
More informationNonparametric Probabilistic Modelling
Nonparametric Probabilistic Modelling Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Signal processing and inference
More informationVariational Inference for the Nested Chinese Restaurant Process
Variational Inference for the Nested Chinese Restaurant Process Chong Wang Computer Science Department Princeton University chongw@cs.princeton.edu David M. Blei Computer Science Department Princeton University
More informationSpatial Bayesian Nonparametrics for Natural Image Segmentation
Spatial Bayesian Nonparametrics for Natural Image Segmentation Erik Sudderth Brown University Joint work with Michael Jordan University of California Soumya Ghosh Brown University Parsing Visual Scenes
More informationBayesian non-parametric model to longitudinally predict churn
Bayesian non-parametric model to longitudinally predict churn Bruno Scarpa Università di Padova Conference of European Statistics Stakeholders Methodologists, Producers and Users of European Statistics
More informationOn a coverage model in communications and its relations to a Poisson-Dirichlet process
On a coverage model in communications and its relations to a Poisson-Dirichlet process B. Błaszczyszyn Inria/ENS Simons Conference on Networks and Stochastic Geometry UT Austin Austin, 17 21 May 2015 p.
More informationConstruction of Dependent Dirichlet Processes based on Poisson Processes
1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen Outline
More informationAn Alternative Prior Process for Nonparametric Bayesian Clustering
An Alternative Prior Process for Nonparametric Bayesian Clustering Hanna Wallach (UMass Amherst) Shane Jensen (UPenn) Lee Dicker (Harvard) Katherine Heller (Cambridge) Nonparametric Bayesian Clustering
More informationHomework 6: Image Completion using Mixture of Bernoullis
Homework 6: Image Completion using Mixture of Bernoullis Deadline: Wednesday, Nov. 21, at 11:59pm Submission: You must submit two files through MarkUs 1 : 1. a PDF file containing your writeup, titled
More informationTwo Useful Bounds for Variational Inference
Two Useful Bounds for Variational Inference John Paisley Department of Computer Science Princeton University, Princeton, NJ jpaisley@princeton.edu Abstract We review and derive two lower bounds on the
More informationChapter 3: Element sampling design: Part 1
Chapter 3: Element sampling design: Part 1 Jae-Kwang Kim Fall, 2014 Simple random sampling 1 Simple random sampling 2 SRS with replacement 3 Systematic sampling Kim Ch. 3: Element sampling design: Part
More informationNonparametric Bayesian Models for Sparse Matrices and Covariances
Nonparametric Bayesian Models for Sparse Matrices and Covariances Zoubin Ghahramani Department of Engineering University of Cambridge, UK zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Bayes
More informationBayesian nonparametric model for sparse dynamic networks
Bayesian nonparametric model for sparse dynamic networks Konstantina Palla, François Caron and Yee Whye Teh 23rd of February, 2018 University of Glasgow Konstantina Palla 1 / 38 BIG PICTURE Interested
More information