A Theory of Mean Field Approximation

T. Tanaka
Department of Electronics and Information Engineering
Tokyo Metropolitan University, Minami-Osawa, Hachioji, Tokyo, Japan

Abstract

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them.

1 INTRODUCTION

Many problems of neural networks, such as learning and pattern recognition, can be cast into the framework of statistical estimation. How difficult it is to solve a particular problem depends on the statistical model one employs in solving it. For Boltzmann machines [1], for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation [2], which originated in statistical physics, has been frequently used in practical situations to circumvent this difficulty. In the context of statistical physics several advanced theories are known, such as the TAP approach [3], the linear response theorem [4], and so on. For neural networks, application of mean field approximation has mostly been confined to the so-called naive mean field approximation, but there are also attempts to utilize those advanced theories [5, 6, 7, 8].

In this paper I present an information-theoretic formulation of mean field approximation. It is based on information geometry [9], which has been successfully applied to several problems in neural networks [10]. This formulation includes the naive mean field approximation as well as the advanced theories in a consistent way. I give the formulation for Boltzmann machines, but its extension to wider classes of statistical models is possible, as described elsewhere [11].

2 BOLTZMANN MACHINES

A Boltzmann machine is a statistical model with binary random variables $s_i \in \{-1, +1\}$, $i = 1, \dots, N$. The vector $s = (s_1, \dots, s_N)$ is called the state of the Boltzmann machine. The state $s$ is also a random variable, and its probability law is given by the Boltzmann-Gibbs distribution

$$p(s) = e^{-E(s) - \psi}, \qquad (1)$$

where $E(s)$ is the energy defined by

$$E(s) = -\sum_i h_i s_i - \sum_{(ij)} w_{ij} s_i s_j \qquad (2)$$

with $h_i$ and $w_{ij}$ the parameters, and $\psi$ is determined by the normalization condition $\sum_s p(s) = 1$; it is called the Helmholtz free energy of $p$. The notation $(ij)$ means that the summation should be taken over all distinct pairs. Let $m_i \equiv \langle s_i \rangle$ and $v_{ij} \equiv \langle s_i s_j \rangle$, where $\langle \cdot \rangle$ means the expectation with respect to $p$. The following problem is essential for Boltzmann machines:

Problem 1: Evaluate the expectations $m_i$ and $v_{ij}$ from the parameters $(h, w)$ of the Boltzmann machine.
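Problem 1 can be solved exactly by enumerating all $2^N$ states when $N$ is small, which also makes the exponential cost of exact evaluation concrete. The following is a minimal brute-force sketch; it is not from the paper, and the function name, the NumPy-based implementation, and the random test parameters are my own choices.

```python
import itertools
import numpy as np

def exact_expectations(h, W):
    """Exact <s_i> and <s_i s_j> of a Boltzmann machine by enumerating
    all 2^N states (tractable only for small N)."""
    N = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=N)))  # (2^N, N)
    # Energy E(s) = -sum_i h_i s_i - sum_{(ij)} w_ij s_i s_j, W upper-triangular
    E = -states @ h - np.einsum('ki,ij,kj->k', states, np.triu(W, 1), states)
    logZ = np.logaddexp.reduce(-E)   # Helmholtz free energy psi = log Z
    p = np.exp(-E - logZ)            # Boltzmann-Gibbs probabilities, eq. (1)
    m = p @ states                   # expectations m_i = <s_i>
    v = np.einsum('k,ki,kj->ij', p, states, states)  # v_ij = <s_i s_j>
    return m, v

rng = np.random.default_rng(0)
N = 6
h = rng.normal(size=N)
W = np.triu(rng.normal(scale=0.3, size=(N, N)), 1)
m, v = exact_expectations(h, W)
```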

3 INFORMATION GEOMETRY

3.1 ORTHOGONAL DUAL FOLIATIONS

The whole set $S$ of Boltzmann-Gibbs distributions (1) realizable by a Boltzmann machine is regarded as an exponential family. Let us use shorthand notations $I$, $J$, ... to represent distinct pairs of indices, such as $I = (ij)$. The parameters $(h, w) = (h_i, w_I)$ constitute a coordinate system of $S$, called the canonical parameters of $S$. The expectations $(m, v) = (m_i, v_I)$ constitute another coordinate system of $S$, called the expectation parameters of $S$.

Let $F_0$ be the subset of $S$ on which the $w_I$ are all equal to zero. We call $F_0$ the factorizable submodel of $S$, since $p \in F_0$ can be factorized with respect to $s_i$. On $F_0$ the problem is easy: since the $w_I$ are all zero, each $s_i$ is statistically independent of the others, and therefore $m_i = \tanh h_i$ and $v_I = \langle s_i s_j \rangle = m_i m_j$ hold.

Mean field approximation systematically reduces the problem onto the factorizable submodel $F_0$. For this reduction, I introduce dual foliations $F$ and $A$ of $S$. The foliation $F = \{F_w\}$ is parametrized by $w$; each leaf $F_w$ is defined as

$$F_w = \{\, p(s; h, w) \mid h \in \mathbb{R}^N \,\}. \qquad (3)$$

The leaf $F_{w=0}$ is the same as $F_0$, the factorizable submodel. Each leaf $F_w$ is again an exponential family with $h$ and $m$ the canonical and the expectation parameters, respectively. A pair of dual potentials is defined on each leaf: one is the Helmholtz free energy $\psi(h)$, and the other is its Legendre transform, or the Gibbs free energy, $\phi(m) = \max_h [\sum_i h_i m_i - \psi(h)]$, and the parameters of $p \in F_w$ are given by

$$m_i = \frac{\partial \psi}{\partial h_i}, \qquad (4)$$

$$h_i = \frac{\partial \phi}{\partial m_i}. \qquad (5)$$

Another foliation $A = \{A_m\}$ is parametrized by $m$; each leaf $A_m$ is defined as

$$A_m = \{\, p(s) \mid \langle s_i \rangle_p = m_i \,\}. \qquad (6)$$

Each leaf $A_m$ is not an exponential family, but again a pair of dual potentials is defined on each leaf: the former is given by the Gibbs free energy $\phi(w) \equiv \phi(m, w)$ restricted to the leaf and regarded as a function of $w$, and the latter by its Legendre transform,

$$\lambda(v) = \min_w \Bigl[\, \phi(w) + \sum_I w_I v_I \,\Bigr], \qquad (7)$$

and the parameters of $p \in A_m$ are given by

$$w_I = \frac{\partial \lambda}{\partial v_I}, \qquad (8)$$

$$v_I = -\frac{\partial \phi}{\partial w_I}. \qquad (9)$$

These two foliations form orthogonal dual foliations, since the leaves $F_w$ and $A_m$ are orthogonal at their intersecting point. I introduce still another coordinate system on $S$, called the mixed coordinate system, on the basis of the orthogonal dual foliations. It uses a pair $(m, w)$ of the expectation and the canonical parameters to specify a single element $p \in S$: the $m$ part specifies the leaf $A_m$ on which $p$ resides, and the $w$ part specifies the leaf $F_w$.
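The dual-potential relation (4) and the factorizable-submodel formulas can be checked numerically on a toy machine. The sketch below is my own construction, not from the paper: it approximates $m_i = \partial\psi/\partial h_i$ by central finite differences on a brute-force $\psi$, and confirms that this reduces to $m_i = \tanh h_i$ on $F_0$.

```python
import itertools
import numpy as np

def psi(h, W):
    """Helmholtz free energy psi(h, w) = log Z, by brute-force enumeration."""
    states = np.array(list(itertools.product([-1, 1], repeat=len(h))))
    E = -states @ h - np.einsum('ki,ij,kj->k', states, np.triu(W, 1), states)
    return np.logaddexp.reduce(-E)

def grad_h_psi(h, W, eps=1e-5):
    """Central finite-difference approximation of m_i = d(psi)/d(h_i), eq. (4)."""
    I = np.eye(len(h))
    return np.array([(psi(h + eps * I[i], W) - psi(h - eps * I[i], W)) / (2 * eps)
                     for i in range(len(h))])

h = np.array([0.3, -0.5, 0.1])
W = np.zeros((3, 3))
W[0, 1] = 0.4

m = grad_h_psi(h, W)                           # expectations on the leaf F_w
m0 = grad_h_psi(h, np.zeros((3, 3)))           # on the factorizable submodel F_0
assert np.allclose(m0, np.tanh(h), atol=1e-6)  # m_i = tanh(h_i) holds on F_0
```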

3.2 REFORMULATION OF PROBLEM

Assume that a target Boltzmann machine $q$ is given by specifying its canonical parameters $(\bar h, \bar w)$. Problem 1 is restated as follows: evaluate its expectation parameters $(\bar m, \bar v)$ from those parameters. To evaluate $\bar m$, mean field approximation translates the problem into the following one:

Problem 2: Let $A_{\bar m}$ be the leaf on which $q$ resides. Find the $p_0 \in F_0$ which is closest to $A_{\bar m}$.

At first sight this problem is trivial, since one immediately finds the solution: by the orthogonality of the foliations it is the intersecting point $p_0 \in F_0 \cap A_{\bar m}$, whose expectation parameters are exactly $\bar m$. However, solving this problem with respect to $\bar m$ is nontrivial, and it is the key to the understanding of mean field approximation, including the advanced theories.

Let us measure the proximity of $p$ to $q$ by the Kullback divergence

$$D(p \,\|\, q) = \sum_s p(s) \log \frac{p(s)}{q(s)}; \qquad (10)$$

then solving Problem 2 reduces to finding the minimizer $p \in F_{\bar w}$ of $D(p\|q)$ for the given $q$. For $p \in F_{\bar w}$, $D(p\|q)$ is expressed in terms of the dual potentials of the leaf $F_{\bar w}$ as

$$D(p \,\|\, q) = \phi(m) + \psi(\bar h) - \sum_i \bar h_i m_i, \qquad (11)$$

where $m$ is the expectation parameter of $p$ on $F_{\bar w}$. The minimization problem is thus equivalent to minimizing

$$F(m) = \phi(m) - \sum_i \bar h_i m_i, \qquad (12)$$

since $\psi(\bar h)$ in eq. (11) does not depend on $m$. Solving the stationary condition $\partial F/\partial m_i = 0$ with respect to $m$ will give the correct expectations $\bar m$, since the true minimizer is $p = q$; the $m$ thus obtained identifies the leaf $A_{\bar m}$ and hence the solution $p_0$ of Problem 2. However, this scenario is in general intractable, since $\phi$ cannot be given explicitly as a function of $m$.
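To spell out the last claim, the stationary condition of eq. (12) can be rewritten using eq. (5); this one-line check is my rearrangement of the above formulas:

$$\frac{\partial F}{\partial m_i} = \frac{\partial \phi}{\partial m_i} - \bar h_i = h_i(m) - \bar h_i = 0,$$

so the stationary point is the distribution on $F_{\bar w}$ whose canonical parameters equal $\bar h$, that is, $q$ itself, whose expectation parameters are $\bar m$.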

3.3 PLEFKA EXPANSION

The problem is easy if $\bar w = 0$: in this case $\phi$ is given explicitly as a function of $m$,

$$\phi(m, 0) = \sum_i \Bigl[ \frac{1+m_i}{2} \log \frac{1+m_i}{2} + \frac{1-m_i}{2} \log \frac{1-m_i}{2} \Bigr].$$

Minimization of $F(m)$ with respect to $m$ then gives the solution

$$m_i = \tanh \bar h_i, \qquad (13)$$

as expected. When $\bar w \ne 0$ the expression (13) is no longer exact, but to compensate the error one may use, leaving the convergence problem aside, the Taylor expansion of $\phi(m, \bar w)$ with respect to $\bar w$:

$$\phi(m, \bar w) = \phi(m, 0) + \sum_I \frac{\partial \phi}{\partial w_I}\Big|_{w=0} \bar w_I + \frac{1}{2} \sum_{I,J} \frac{\partial^2 \phi}{\partial w_I \partial w_J}\Big|_{w=0} \bar w_I \bar w_J + \cdots. \qquad (14)$$

This expansion has been called the Plefka expansion [12] in the literature of spin glasses. Note that in considering the expansion one should temporarily assume that $m$ is fixed: one can rely on the solution evaluated from the stationary condition only if the expansion does not change the value of $m$.

The coefficients in the expansion can be efficiently computed by fully utilizing the orthogonal dual structure of the foliations. First, we have the following theorem:

Theorem 1: The coefficients of the expansion (14) are given by the cumulant tensors of the corresponding orders, defined on $A_m$.

Because eq. (9) holds, one can consider derivatives of $v_I$ instead of those of $\phi$. The first-order derivatives are immediately given by the property of the potential of the leaf (eq. (9)), yielding

$$\frac{\partial \phi}{\partial w_I} = -v_I(w), \qquad (15)$$

where $v_I(w) = \langle s_I \rangle_{p(w)}$ and $p(w)$ denotes the distribution on $A_m$ corresponding to $w$. The coefficients of the lowest orders, including the first-order one, are given by the following theorem.

Theorem 2: The first-, second-, and third-order coefficients of the expansion (14) are given by

$$\frac{\partial \phi}{\partial w_I} = -v_I, \quad \frac{\partial^2 \phi}{\partial w_I \partial w_J} = -\langle (\delta s)_I (\delta s)_J \rangle, \quad \frac{\partial^3 \phi}{\partial w_I \partial w_J \partial w_K} = -\langle (\delta s)_I (\delta s)_J (\delta s)_K \rangle, \qquad (16)$$

where $(\delta s)_I \equiv s_I - v_I - \sum_i (\partial v_I / \partial m_i)(s_i - m_i)$ and the expectations are taken with respect to $p(w)$.

The proofs will be found in [11]. It should be noted that, although these results happen to be the same as the ones which would be obtained by regarding $A_m$ as an exponential family, they are not the same in general, since $A_m$ actually is not an exponential family; for example, they are different for the fourth-order coefficients.

The explicit formulas for these coefficients for Boltzmann machines, evaluated at $w = 0$, are given as follows. For the first order,

$$\frac{\partial \phi}{\partial w_I} = -m_i m_j; \qquad (17)$$

for the second order,

$$\frac{\partial^2 \phi}{\partial w_I^2} = -(1 - m_i^2)(1 - m_j^2), \qquad (18)$$

and, for other combinations of $I$ and $J$,

$$\frac{\partial^2 \phi}{\partial w_I \partial w_J} = 0; \qquad (19)$$

for the third order,

$$\frac{\partial^3 \phi}{\partial w_I^3} = -4 m_i m_j (1 - m_i^2)(1 - m_j^2), \qquad (20)$$

$$\frac{\partial^3 \phi}{\partial w_I \partial w_J \partial w_K} = -(1 - m_i^2)(1 - m_j^2)(1 - m_k^2) \quad \text{for } I = (ij),\ J = (jk),\ K = (ki) \text{ with three distinct indices } i, j, k, \qquad (21)$$

and, for other combinations of $I$, $J$, and $K$,

$$\frac{\partial^3 \phi}{\partial w_I \partial w_J \partial w_K} = 0. \qquad (22)$$
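Assembling the expansion (14) with $\phi(m, 0)$ and the coefficients (17) and (18) gives the second-order truncation in closed form; this is my assembly of the above formulas, used in Section 4.1 below:

$$\phi_2(m) = \sum_i \Bigl[ \frac{1+m_i}{2} \log \frac{1+m_i}{2} + \frac{1-m_i}{2} \log \frac{1-m_i}{2} \Bigr] - \sum_{(ij)} \bar w_{ij} m_i m_j - \frac{1}{2} \sum_{(ij)} \bar w_{ij}^2 (1 - m_i^2)(1 - m_j^2).$$

The first two terms together constitute the Weiss free energy $\phi_1$, and the last term produces the Onsager reaction term in eq. (24) below.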

4 MEAN FIELD APPROXIMATION

4.1 MEAN FIELD EQUATION

Truncating the Plefka expansion (14) up to the $n$th-order term gives the $n$th-order approximation $\phi_n$ of $\phi$. The Weiss free energy, which is used in the naive mean field approximation, is given by $\phi_1$. The TAP approach picks up all relevant terms of the Plefka expansion [12]; for the SK model it gives the second-order approximation $\phi_2$.

The stationary condition $\partial \phi_n / \partial m_i = \bar h_i$ gives the so-called mean field equation, from which a solution of the approximate minimization problem is to be determined. For $n = 1$ it takes the following familiar form,

$$m_i = \tanh\Bigl( \bar h_i + \sum_j \bar w_{ij} m_j \Bigr), \qquad (23)$$

and for $n = 2$ it includes the so-called Onsager reaction term,

$$m_i = \tanh\Bigl( \bar h_i + \sum_j \bar w_{ij} m_j - \sum_j \bar w_{ij}^2 (1 - m_j^2)\, m_i \Bigr). \qquad (24)$$

Note that all of these are expressed as functions of $m$. Geometrically, the mean field equation approximately represents the leaf $A_m$ in terms of the mixed coordinate system of $S$, since for the exact Gibbs free energy the stationary condition $h_i = \partial \phi / \partial m_i$ gives the exact leaf. Accordingly, the approximate relation $h_i = \partial \phi_n / \partial m_i$, for fixed $m$, represents the $n$th-order approximate expression of the leaf $A_m$ in the canonical coordinate system. The fit of this expression to the true leaf $A_m$ around the point $w = 0$ becomes better as the order of approximation gets higher, as seen in Fig. 1. Such behavior is well expected, since the Plefka expansion is essentially a Taylor expansion.
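Equations (23) and (24) are fixed-point equations in $m$ and are commonly solved by damped iteration. The following is a minimal sketch; the solver structure, damping factor, and iteration count are my own choices, as the paper prescribes no particular iteration scheme.

```python
import numpy as np

def mean_field(h, W, order=2, n_iter=1000, damping=0.5):
    """Solve the nth-order mean field equation by damped fixed-point
    iteration: order=1 is the naive equation (23), order=2 adds the
    Onsager reaction term of eq. (24)."""
    Ws = W + W.T                     # symmetrize upper-triangular couplings
    m = np.zeros_like(h)
    for _ in range(n_iter):
        field = h + Ws @ m           # naive mean field
        if order >= 2:
            field -= (Ws**2 @ (1.0 - m**2)) * m   # Onsager reaction term
        m = (1.0 - damping) * m + damping * np.tanh(field)
    return m
```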

[Figure 1: Approximate expressions of the leaf $A_m$ by mean field approximations of several orders (0th through 4th) for a 2-unit Boltzmann machine (left), and their magnified view (right).]

[Figure 2: Relation between the naive approximation and the present theory: the points $q$, $p$, and $p_0$, the leaves $F_0$ and $A_m$, and the divergences $D(p_0\|q)$, $D(p\|q)$, and $D(p_0\|p)$.]

4.2 LINEAR RESPONSE

For estimating $v_I$ one can utilize the linear response theorem. In the information geometrical framework it is represented as a trivial identity relation for the Fisher information on the leaf $F_{\bar w}$. The Fisher information matrix, or the Riemannian metric tensor, on the leaf $F_{\bar w}$ and its inverse are given by

$$g_{ij} = \frac{\partial^2 \psi}{\partial h_i \partial h_j} = \langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle \qquad (25)$$

and

$$g^{ij} = \frac{\partial^2 \phi}{\partial m_i \partial m_j}, \qquad (26)$$

respectively. In the framework presented here, the linear response theorem states the trivial fact that these are the inverses of each other. In mean field approximation, one substitutes an approximation $\phi_n$ in place of $\phi$ in eq. (26) to get an approximate inverse $(g^{ij})_n$ of the metric. The derivatives in eq. (26) can be analytically calculated, and therefore $(g^{ij})_n$ can be numerically evaluated by substituting into it a solution $m$ of the mean field equation. Equating its inverse to $g_{ij}$ then gives an estimate of $v_I$ by using eq. (25). So far, Problem 1 has been solved within the framework of mean field approximation, with $m_i$ and $v_I$ obtained by the mean field equation and the linear response theorem, respectively.
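For the second-order approximation, the Hessian in eq. (26) can be written down from $\phi_2$ given after Section 3.3: differentiating it twice yields $\partial^2 \phi_2 / \partial m_i \partial m_j = -\bar w_{ij} - 2 \bar w_{ij}^2 m_i m_j$ for $i \ne j$, and $1/(1 - m_i^2) + \sum_j \bar w_{ij}^2 (1 - m_j^2)$ on the diagonal (my differentiation, not spelled out in the paper). The sketch below inverts this approximate $g^{ij}$ to estimate the correlations, as the text describes.

```python
import numpy as np

def linear_response_v(W, m):
    """Estimate v_ij = <s_i s_j> via linear response: build the Hessian of
    phi_2 (approximate g^ij, eq. (26)), invert it to obtain the metric
    g_ij = <s_i s_j> - m_i m_j (eq. (25)), and add back m_i m_j."""
    Ws = W + W.T                                  # symmetrized couplings
    ginv = -Ws - 2.0 * Ws**2 * np.outer(m, m)     # off-diagonal Hessian entries
    np.fill_diagonal(ginv, 1.0 / (1.0 - m**2) + Ws**2 @ (1.0 - m**2))
    g = np.linalg.inv(ginv)                       # approximate metric, eq. (25)
    return g + np.outer(m, m)                     # off-diagonals estimate v_ij

# e.g., with m from the damped iteration above:
# m = mean_field(h, W, order=2)
# v = linear_response_v(W, m)
```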

5 DISCUSSION

Following the framework presented so far, one can in principle construct algorithms of mean field approximation of desired orders. The first-order algorithm with linear response was first proposed and examined by Kappen and Rodríguez [7, 8]. Tanaka [13] has formulated second- and third-order algorithms and explored them by computer simulations. It is also possible to extend the present formulation so that it is applicable to higher-order Boltzmann machines. Tanaka [14] discusses an extension of the present formulation to third-order Boltzmann machines: it is possible to extend the linear response theorem to higher orders, and this allows us to treat higher-order correlations within the framework of mean field approximation.

The common understanding about the naive mean field approximation is that it minimizes the Kullback divergence $D(p_0 \| q)$ with respect to $p_0 \in F_0$ for a given $q$. It can be shown that this view is consistent with the theory presented in this paper. Assume $p_0 \in F_0$, and let $p$ be the distribution corresponding to the intersecting point of the leaves $F_{\bar w}$ and $A_m$, where $m$ is the expectation parameter of $p_0$. Because of the orthogonality of the two foliations $F$ and $A$, the following Pythagorean law [9] holds (Fig. 2):

$$D(p_0 \,\|\, q) = D(p_0 \,\|\, p) + D(p \,\|\, q). \qquad (27)$$

Intuitively, $D(p_0 \| p)$ measures the squared distance between $F_0$ and $F_{\bar w}$, and it is a second-order quantity in $\bar w$. It should be ignored in the first-order approximation, and thus $D(p_0\|q) \approx D(p\|q)$ holds. Under this approximation, minimization of the former with respect to $p_0 \in F_0$ is equivalent to that of the latter with respect to $m$, which establishes the relation between the naive approximation and the present theory. It can also be checked directly that the first-order approximation of $\phi$ exactly gives the Weiss free energy.

The present theory provides an alternative view about the validity of mean field approximation: as opposed to the common belief that mean field approximation is a good one when the system size $N$ is sufficiently large, one can state from the present formulation that it is good whenever the higher-order contributions of the Plefka expansion vanish, regardless of whether $N$ is large or not. This provides a theoretical basis for the observation that mean field approximation often works well for small networks.

Acknowledgment

The author would like to thank the Telecommunications Advancement Foundation for financial support.

References

[1] Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985) A learning algorithm for Boltzmann machines. Cognitive Science 9: 147-169.
[2] Peterson, C., & Anderson, J. R. (1987) A mean field theory learning algorithm for neural networks. Complex Systems 1: 995-1019.
[3] Thouless, D. J., Anderson, P. W., & Palmer, R. G. (1977) Solution of 'Solvable model of a spin glass'. Phil. Mag. 35 (3): 593-601.
[4] Parisi, G. (1988) Statistical Field Theory. Addison-Wesley.
[5] Galland, C. C. (1993) The limitations of deterministic Boltzmann machine learning. Network 4 (3): 355-379.
[6] Hofmann, T., & Buhmann, J. M. (1997) Pairwise data clustering by deterministic annealing. IEEE Trans. Patt. Anal. & Machine Intell. 19 (1): 1-14; Errata, ibid. 19 (2): 197 (1997).
[7] Kappen, H. J., & Rodríguez, F. B. (1998) Efficient learning in Boltzmann machines using linear response theory. Neural Computation 10 (5): 1137-1156.
[8] Kappen, H. J., & Rodríguez, F. B. (1998) Boltzmann machine learning using mean field theory and linear response correction. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in Neural Information Processing Systems 10. The MIT Press.
[9] Amari, S.-I. (1985) Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics 28, Springer-Verlag.
[10] Amari, S.-I., Kurata, K., & Nagaoka, H. (1992) Information geometry of Boltzmann machines. IEEE Trans. Neural Networks 3 (2): 260-271.
[11] Tanaka, T. Information geometry of mean field approximation. Preprint.
[12] Plefka, T. (1982) Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model. J. Phys. A: Math. Gen. 15 (6): 1971-1978.
[13] Tanaka, T. (1998) Mean-field theory of Boltzmann machine learning. Phys. Rev. E 58 (2): 2302-2310.
[14] Tanaka, T. (1998) Estimation of third-order correlations within mean field approximation. In S. Usui & T. Omori (Eds.), Proc. Fifth International Conference on Neural Information Processing.
