Phase Diagram Study on Variational Bayes Learning of Bernoulli Mixture

Size: px

Start display at page:

Download "Phase Diagram Study on Variational Bayes Learning of Bernoulli Mixture"

Matthew Hood
5 years ago
Views:

1 009 Technical Report on Information-Based Induction Sciences 009 (IBIS009) Phase Diagram Study on Variational Bayes Learning of Bernoulli Mixture Daisue Kaji Sumio Watanabe Abstract: Variational Bayes learning is widely used in statistical models that contain hidden variables, for example, normal mixtures, binomial mixtures, and hidden Marov models. Although it is reported that variational Bayes learning of mixtute model has a phase transition structure which depends on hyperparameters of mixture ratio, the detail behavior and relation of hyperparameters concerning the phase transition have not yet nown. In the present paper, we experimentally investigate the phase diagram concerning the hyperparameters by using the Bernoulli mixture and show the guidance to set the hyperparameters. Keywords: variational Bayes, phase transition, phase diagram, hyperparameter, Bernoulli mixture 1 EM [3, 8] [1, 1] /, , tel , aji@cs.pi.titech.ac.jp Toyo Inistitute of Technology, 459 Nagatsuda, Midoriu,Yoohama, Japan ( ), , tel , daisue.aji@onicaminolta.jp Konicaminolta Medical & Graphic,INC, 970 Ishiawa-machi, Hachioji-shi,Toyo, , Japan, , swatanab@cs.pi.titech.ac.jp, PI Lab., Toyo Institute of Technology, 459 Nagatsuta Midoriu, Yoohama, , Japan [3, 10] 1 M B(x µ) = µ x i (1 µ) (1 xi), i=1 x = (x 1,, x M ) T µ = (µ 1,, µ M ) T M K p(x π, µ) = π B(x µ ), =1 π B(x µ ) K x

2 z x z = (0,, 1,, 0) z x Z = (z 1,, z N ), X = (x 1,, x N ), π θ p(x Z, θ) = p(θ) = N p(z π) = n=1 =1 m N K n=1 =1 π z n, ( K M zn θ x nm m (1 θ m) (1 xnm)), p(π) = Γ(Ka) Γ(a) K K M =1 m=1 K π a 1, ( ) Γ(b) Γ(b) θb 1 m (1 θ m) b 1 (a, b) p(π) p(θ) Y X q(y ) p(y X) F (X) = F [q(y )] + KL(q(Y ) p(y X)), F, F Kullbac-Leibler KL F (X) = log p(x, Y )dy = log p(x), q(y ) F [q(y )] = q(y ) log p(x, Y ) dy, q(y ) KL(q(Y ) p(y X)) = q(y ) log p(y X)) dy. q(y ) F [q(y )] q(y ) p(y X) Kullbac- Leibler w q(y ) q(y X) = q(z, w X) = q 1 (Z X)q (w X) F [q(y X)] q 1 q Z q 1(Z X) = 1, q (w X)dw = 1 log q 1 (Z X) = E q [log P (X, Z, w)] + C 1, (1) log q (w X) = E q1 [log P (X, Z, w)] + C, () C 1, C (1) () 3. (1) ( ) VB e-step ( K ) log ρ n = ψ(α ) ψ α + r n = VB m-step ρ n K =1 ρ n N = n=1 N r n, n=1 M G(η m, η m) m=1 a = a + N N N η m = b + r n x nm, η m = b + r n (1 x nm ) G(η m, η m) n=1 = x nm ψ(η m ) x nm ψ(η m)+ψ(η m) ψ(η m + η m) ψ ψ(a) d da log Γ(a) = Γ (a) Γ(a) q 1 (π) = Dir(π α), q (θ) = K M Beta(θ m η m, η m) =1 m=1

4 [Theorem1:K.Watanabe,S.Watanabe] K 0 K ˆF n 5 5.1 M = 3 p (x) = 0.8 (0.9 x1 0.1 1 x ) 1 + 0. (0.1 x 0.

3 4 [Theorem1:K.Watanabe,S.Watanabe] K 0 K ˆF n M = 3 p (x) = 0.8 (0.9 x x ) (0.1 x x ) 1 λ 1 log n + nk n (ŵ) + c 1 < ˆF n S n < λ log n + c S n K n (ŵ) ŵ c 1, c λ 1 λ M = M+1 λ 1 = { (K 1)a + M, (a M ) MK+K 1, (a > M ) λ = { (K K 0 )a + MK0+K0 1, (a M ) MK+K 1, (a > M ) a a = M a M 0 a > M a = M [1] [1] 1: ( ) 1 0 x i (i = 1, ) a = 3+1 = 1 8 N N = S i (i = 1,, 8) t S (t) i (t = 1,, 3) S 1 S 8 P 1 P 8

4 VB e-step log ρ Si = Ψ α () + S (1) i Ψ 1 () + (1 S (1) i )Ψ 1() + Ψ () + S () i Ψ 1 () + (1 S () i )Ψ 1() + Ψ () + S (3) i Ψ 1 () + (1 S (3) i )Ψ 1() + Ψ () ρ Si r Si = 4 =1 ρ S i VB m-step N = 4 NP i r Si, i=1 a = a + N η 1 = b + r S1 NP 1 + r S NP + r S3 NP 3 + r S4 NP 4 η = b + r S1NP 1 + r SNP + r S5NP 5 + r S6NP 6 η 3 = b + r S1 NP 1 + r S4 NP 4 + r S5 NP 5 + r S7 NP 7 η 1 = b + r S5 NP 5 + r S6 NP 6 + r S7 NP 7 + r S8 NP 8 η = b + r S3NP 3 + r S4NP 4 + r S7NP 7 + r S8NP 8 η 3 = b + r SNP + r S3NP 3 + r S6NP 6 + r S8NP 8 ( K ) Ψ α () = ψ(α ) ψ α, Ψ 1 () = ψ(η 1 ) ψ(η 1) + ψ(η 1) ψ(η 1 + η 1), Ψ () = ψ(η ) ψ(η ) + ψ(η ) ψ(η + η ), Ψ 1() = ψ(η 1) ψ(η 1 + η 1), Ψ () = ψ(η ) ψ(η + η ) 5. K = 4 a, b (log ) () π 1,, π 4 z = π π 0. z 0 ( ) a = M = 3+1 = b : a b z = π π A: B:A C C: () a b 0.5 b = 0.5 b = 1 a b 1 0 [11] a >.0 b B A

19, No. 4, pp. 111-1153, 007. [3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 006. [4] S. Watanabe. Algebraic analysis for singular statistical estimation. Proc.

5 C b = 0.5 a <.0 C A 4 M = ( ) / 6 M =, M = 3 3: ( : ), ( : ), ( ) A,B,C [1] K. Watanabe and S. Watanabe. Stochastic complexities of general mixture models in Variational Bayesian Approximation. Neural Computation, Vol. 18, No. 5, pp , 006. [] S. Naajima and S. Watanabe. Variational Bayes Solution of Linear Neural Networs and its Generalization Performance. Neural Computation, Vol. 19, No. 4, pp , 007. [3] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 006. [4] S. Watanabe. Algebraic analysis for singular statistical estimation. Proc. of International Journal of AlgorithmicLearningTheory Lecture Notes on Computer Sciences,170, pp.39-50, : M = ( ) ( ) a, b [5] S. Watanabe. Algebraic Analysis for Nonidentifiable LearningMachines. Neural Computation, Vol.13, No.4, pp , 001 [6] S. Watanabe. Learning efficiency of redundant neural networsin Bayesian estimation. IEEE Transactions on NeuralNetwors, Vol.1, No.6, pp , 001.

6 [7] H. Attias. Inferring parameters and structure of latent variable models by variational Bayes, In Proc. of Uncertainty in Artificial Intelligence(UAI 99),1999. [8] M. J. Beal. Variational Algorithms for approximate Bayesian inference. PhD thesis, University College London, 003. [9] Z. Ghahramani and M. J. Beal. Graphical Models and Variational Methods. In Advanced Mean Field. Methods. MIT Press, 000 [10] P. F. Lazarsfeld and N. W. Henry. Latent structure analysis. Houghton Mifflin, 1968 [11] D. Kaji and S. Watanabe. Optimal Hyperparameters for Generalized Learning and Knowledge Discovery in Variational Bayes. To appear in Proc. of ICONIP, 009 [1],.. NC, January 009.

Stochastic Complexity of Variational Bayesian Hidden Markov Models

Stochastic Complexity of Variational Bayesian Hidden Markov Models Tikara Hosino Department of Computational Intelligence and System Science, Tokyo Institute of Technology Mailbox R-5, 459 Nagatsuta, Midori-ku,