Construction of Dependent Dirichlet Processes based on Poisson Processes

Size: px

Start display at page:

Download "Construction of Dependent Dirichlet Processes based on Poisson Processes"

Wendy Bond
5 years ago
Views:

1 1 / 31 Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin Eric Grimson John Fisher CSAIL MIT NIPS 2010 Outstanding Student Paper Award Presented by Shouyuan Chen

2 Outline 2 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

3 Outline 3 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

4 Mixture Models: From Static to Dynamic 4 / 31 Evolutionary clustering add/remove clusters movement of clusters Document modeling add/remove topics evolution of topics Other applications image modeling location base services financial analysis

5 5 / 31 Dynamic Mixture Models Model the behavior of latent components overtime Creation of new components. Removal of existing components. Variation of component parameters. Components can be Clusters Dynamic Gaussian Mixture Model Topics Dynamic Topic Model

6 Outline 6 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

7 Dirichlet Process 7 / 31 Dirichlet process (DP) infinite limit of Dirichlet distribution. Finite mixture models. Prior: Dir( α): k-dimensional Dirichlet distribution Pre-specified number of components k. Dirichlet process mixture models (DPMM). Prior: DP(α, H): "infinite dimensional Dirichlet distribution" Learn hidden k automatically.

8 Extending DP to Dependent DPs 8 / 31 A Single DP D A Markov chain of t DPs. D 1 D 2 D t X X (1) X (2) X (t) Key problem How to design the Markov chain to support 3 key dependencies between D t 1 D t : Creation Add a new component Removal Remove an existing component Transition Varying component parameters

9 Outline 9 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

10 10 / 31 Several ways to Dirichlet Process Equivalent constructions for DP Random measure Basic definition Posterior Chinese restaurant process Atomic construction Stick breaking process Construct DP(μ) by ΓP and PP Generate compound poisson process PP(μ γ) Gamma process ΓP(μ) is transformed from compound poisson process Dirichlet process DP(μ) is normalized Gamma process

11 Poisson, Gamma and Dirichlet Process 11 / 31 Given a measurable space (Ω, Σ, μ) Compound Poisson Process Π PP(μ γ), γ(dw) = w 1 e w dw Π is a point process (collection of infinite random points) on product space μ γ Π = i=0 δ (θ,ωθ ) Gamma Process: Transformed from compound poisson process G ω θ δ θ ΓP(μ) (θ,ω θ ) Π Dirichlet Process: Normalized gamma process D G/G(μ) DP(μ)

12 Key Idea for Transforming DPs DP New DP PP Operations? New PP Complete randomness A random measure of which the measure values of disjoint subsets are independent. Complete Randomness Preserving Operations Applying any operations that preserve complete randomness to Poisson processes results in a new Poisson process. Superposition two PP Subsampling a PP Mapping a PP point by point 12 / 31

13 Constructing a Chain of DPs 13 / 31 D t 1 Remove existing components Vary remaining components Add new components D t Π t 1 Subsampling Transition Superposition Π t new components H t

14 Outline 14 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

15 Subsampling 15 / 31 Subsampling via Independent Bernoulli Trail η = (θ, p θ ), z η Bernoulli(q), D = η p θδ θ DP(μ) S q (D) 1 z p p θ δ θ η=1 θ z η=1 Theorem (Subsampling) S q (D) DP(qμ) Proof sketch: DP PP: D Π PP(μγ). Subsampling PP: S q (Π) = {η Π : z η = 1} PP(qμγ). PP DP: S q (Π) S q (D) DP(qμ)

16 16 / 31 Transition Independent movement of each point T (, ): probabilistic transition kernel D = η p θδ θ DP(μ) T (D) p θ δ T (θ) Theorem (Transition) Proof sketch: T (D) DP(T μ) DP PP: D Π PP(μ γ). Mapping PP: T (Π) = {(T (θ), ω θ ) : (θ, ω θ ) Π} PP(T μ γ). PP DP: T (Π) T (D) DP(T μ)

17 Superposition 17 / 31 Sum of independent DPs D k DP(μ k ), k = 1,..., m be independent, (c 1,..., c m ) Dir(μ 1 (Ω),..., μ m (Ω)) Theorem (Superposition) Proof sketch: c k D k DP(μ μ m ) k DP PP: D k Π k PP(μ k γ). Mapping PP: k g kπ k PP( k g kμ k γ). 1 PP DP: k g gk D k k = c k D k DP( k μ k)

18 Outline 18 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

19 Are All Poisson Things Necessary? Basic definition of DP D DP(μ) is a DP if for any partition A 1,..., A n of space Ω (D(A 1 ),..., D(A n )) Dir(μ(A 1 ),..., μ(a n )) Alternate proof of superposition theorem Let D = k c kd k, consider any partition A 1,..., A n of space Ω, ( ) (D(A 1 ),..., D(A n )) = c k D k (A 1 ),..., k k Dir( μ k (A 1 ),..., k k c k D k (A n ) μ k (A n )) The second step is from the property of Dirichlet distribution, and it concludes that D DP( k μ k). 19 / 31

20 20 / 31 Proof by basic definition By defining DP on an extended space over functions, we can directly model all three operations: subsampling, transition and superposition without appealing to Poisson process. Such construction also allows DDP to be constructed over any measurable space. This paper is exactly a special case if the space is fixed to be a discrete Markov chain.

21 Outline 21 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

22 22 / 31 Inference Scheme Gibbs sampling. Sample one latent variable from posterior at each step. Consider time 1,..., t sequentially. Update labels. Samples survived components (with probability q) and component assignments Update parameters. Samples component parameter from T (.) Iterates between step 2 and 3. Then move on to next time t + 1, and never estimate earlier distributions. Sequential sampling This paper doesn t derive a batch sampling algorithm. Earlier samples would likely be less accurate.

23 23 / 31 Posterior computation For simplicity of notation, and without loss of generality, assume the expectation of new components equals with removed components. Given a set of samples Φ D t : φ i appears c i times (By DP posterior) D t Φ DP(μ + k c kδ φk ) (This paper) D t+1 Φ DP(μ + k qc kδ T (φk )) Is that true?

24 24 / 31 Argument against it Let D t = k p 1 kδ θk, D t+1 = z k =1 p k z k =1 p kδ θk. φ D t is one sample from D t. Fact: D t+1 φ is not DP. Mixture of DP is not DP Consider z φ Bernoulli(q). There are two different cases for D t+1 φ: z φ = 1. Thus φ is not removed. Thus φ is equivalently observed in D t+1. D t+1 φ, z φ = 1 DP(μ + δ φ ) z φ = 0. In this case φ is removed. D t+1 φ, z φ = 0 DP(μ) Hence D t+1 φ is a mixture of DPs: D t+1 φ = qdp(μ + δ φ ) + (1 q)dp(μ) It is proved NOT a DP. [1]

25 25 / 31 A deeper argument The observation is censored. Only knows φ is not removed at now. The complete lifespan of a component φ is not observed. Posterior of DP under censored observations is a mixture of DP.

26 Outline 26 / 31 Motivations Dynamic Mixture Models Dependent Dirichlet Process Model Construction Key idea Three operations Discussions Inference and Experiments Inference Experiments Conclusions

27 27 / 31 Synset dataset Setup Simulated over 80 phases. Gaussian mixture models with 2 components initially. The speed of introducing new components (one new component per 20 phases in average) and removing existing components is equal. Mean of component has a Brownian motion samples per components at each phase. Baselines Finite mixture models with K = 3, 5, 10. DPM is not compared with.

28 Results 28 / 31

29 29 / 31 Real World Applications Evolutionary Topic Model Model topic evolution of research paper Data: all NIPS papers over years Method: feature extraction to generate 12 dimensions feature per document. Then use Gaussian mixture model. People Flow The motion of people in New York Grand Central station. Data: 90,000 frames in one hour, divided into 60 phases. Try to group people tracks into flows depending on their motion patterns

30 30 / 31 Summary Propose a principled methodology to construct dependent Dirichlet processes based on the theoretical connections between Poisson, Gamma and Dirichlet processes. Develop a framework of evolving mixture model, which allows creation and removal of mixture components, as well as variation of parameters. Derive a Gibbs sampling algorithm for inferring mixture model parameters from observations. Test the approach on both synthetic data and real applications.

31 31 / 31 My Summary Poisson process is not essential for constructing DDP. Sequential sampling may damage the performance. Posterior of this model should be MDP rather than DP.

32 For Further Reading I 32 / 31 C. E. Antoniak Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems Annals of Statistics, 2(6): , 1974.

33 Dirichlet Process Stick-breaking representation 33 / 31 A sample from DP is almost surely discrete. Stick-breaking representation Let D DP(α, H) is a Dirichlet process. Then, almost surely D = p i δ θi i=1 p 1... GEM(α) i, θ i H, i.i.d The GEM distribution is called "stick-breaking" distribution.

34 34 / 31 My Experiments Setup Simulated over 30 phases. Gaussian mixture models with 2 components initially. The speed of introducing new components (0.4 new component per phase in average) and removing existing components is equal. Mean of component has a Brownian motion. 200 samples per components at each phase. Bias in posterior is fixed Baselines DPM, Sequential sampling (Markov-DPM), Batch algorithm (F-DPM)

35 My Experiment Results 35 / 31 variation of information number of clusters F-DPM Markov-DPM DPM time 30 F-DPM 25 Markov-DPM DPM 20 Ground Truth time

Construction of Dependent Dirichlet Processes based on Poisson Processes

Construction of Dependent Dirichlet Processes based on Poisson Processes Dahua Lin CSAIL, MIT dhlin@mit.edu Eric Grimson CSAIL, MIT welg@csail.mit.edu John Fisher CSAIL, MIT fisher@csail.mit.edu Abstract