A Theory of Mean Field Approximation

T. Tanaka
Department of Electronics and Information Engineering
Tokyo Metropolitan University, Minami-Osawa, Hachioji, Tokyo, Japan

Abstract

I present a theory of mean field approximation based on information geometry. This theory includes in a consistent way the naive mean field approximation, as well as the TAP approach and the linear response theorem in statistical physics, giving clear information-theoretic interpretations to them.

1 INTRODUCTION

Many problems of neural networks, such as learning and pattern recognition, can be cast into the framework of statistical estimation. How difficult it is to solve a particular problem depends on the statistical model one employs in solving it. For Boltzmann machines [1], for example, it is computationally very hard to evaluate expectations of state variables from the model parameters. Mean field approximation [2], which originated in statistical physics, has been frequently used in practical situations to circumvent this difficulty. In the context of statistical physics several advanced theories are known, such as the TAP approach [3], the linear response theorem [4], and so on. For neural networks, application of mean field approximation has mostly been confined to the so-called naive mean field approximation, but there are also attempts to utilize those advanced theories [5, 6, 7, 8].

In this paper I present an information-theoretic formulation of mean field approximation. It is based on information geometry [9], which has been successfully applied to several problems in neural networks [10]. This formulation includes the naive mean field approximation as well as the advanced theories in a consistent way. I give the formulation for Boltzmann machines, but its extension to wider classes of statistical models is possible, as described elsewhere [11].

2 BOLTZMANN MACHINES

A Boltzmann machine is a statistical model with binary random variables $s_i \in \{-1, +1\}$, $i = 1, \dots, N$. The vector $s = (s_1, \dots, s_N)$ is called the state of the Boltzmann machine. The state $s$ is also a random variable, and its probability law is given by the Boltzmann-Gibbs distribution

$$p(s) = e^{-E(s) - \psi}, \qquad (1)$$

where $E(s)$ is the energy defined by

$$E(s) = -\sum_i h_i s_i - \sum_{(ij)} w_{ij} s_i s_j \qquad (2)$$

with $h_i$ and $w_{ij}$ the parameters, and $\psi$ is determined by the normalization condition $\sum_s p(s) = 1$; it is called the Helmholtz free energy of $p$. The notation $(ij)$ means that the summation should be taken over all distinct pairs. Let $m_i \equiv \langle s_i \rangle$ and $v_{ij} \equiv \langle s_i s_j \rangle$, where $\langle \cdot \rangle$ means the expectation with respect to $p$. The following problem is essential for Boltzmann machines:

Problem 1: Evaluate the expectations $m_i$ and $v_{ij}$ from the parameters $(h, w)$ of the Boltzmann machine.
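Problem 1 can be solved exactly by enumerating all $2^N$ states when $N$ is small, which also makes the exponential cost of exact evaluation concrete. The following is a minimal brute-force sketch; it is not from the paper, and the function name, the NumPy-based implementation, and the random test parameters are my own choices.

```python
import itertools
import numpy as np

def exact_expectations(h, W):
    """Exact <s_i> and <s_i s_j> of a Boltzmann machine by enumerating
    all 2^N states (tractable only for small N)."""
    N = len(h)
    states = np.array(list(itertools.product([-1, 1], repeat=N)))  # (2^N, N)
    # Energy E(s) = -sum_i h_i s_i - sum_{(ij)} w_ij s_i s_j, W upper-triangular
    E = -states @ h - np.einsum('ki,ij,kj->k', states, np.triu(W, 1), states)
    logZ = np.logaddexp.reduce(-E)   # Helmholtz free energy psi = log Z
    p = np.exp(-E - logZ)            # Boltzmann-Gibbs probabilities, eq. (1)
    m = p @ states                   # expectations m_i = <s_i>
    v = np.einsum('k,ki,kj->ij', p, states, states)  # v_ij = <s_i s_j>
    return m, v

rng = np.random.default_rng(0)
N = 6
h = rng.normal(size=N)
W = np.triu(rng.normal(scale=0.3, size=(N, N)), 1)
m, v = exact_expectations(h, W)
```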

3 INFORMATION GEOMETRY

3.1 ORTHOGONAL DUAL FOLIATIONS

The whole set $S$ of Boltzmann-Gibbs distributions (1) realizable by a Boltzmann machine is regarded as an exponential family. Let us use shorthand notations $I$, $J$, ... to represent distinct pairs of indices, such as $I = (ij)$. The parameters $(h, w) = (h_i, w_I)$ constitute a coordinate system of $S$, called the canonical parameters of $S$. The expectations $(m, v) = (m_i, v_I)$ constitute another coordinate system of $S$, called the expectation parameters of $S$.

Let $F_0$ be the subset of $S$ on which the $w_I$ are all equal to zero. We call $F_0$ the factorizable submodel of $S$, since $p \in F_0$ can be factorized with respect to $s_i$. On $F_0$ the problem is easy: since the $w_I$ are all zero, each $s_i$ is statistically independent of the others, and therefore $m_i = \tanh h_i$ and $v_I = \langle s_i s_j \rangle = m_i m_j$ hold.

Mean field approximation systematically reduces the problem onto the factorizable submodel $F_0$. For this reduction, I introduce dual foliations $F$ and $A$ of $S$. The foliation $F = \{F_w\}$ is parametrized by $w$; each leaf $F_w$ is defined as

$$F_w = \{\, p(s; h, w) \mid h \in \mathbb{R}^N \,\}. \qquad (3)$$

The leaf $F_{w=0}$ is the same as $F_0$, the factorizable submodel. Each leaf $F_w$ is again an exponential family with $h$ and $m$ the canonical and the expectation parameters, respectively. A pair of dual potentials is defined on each leaf: one is the Helmholtz free energy $\psi(h)$, and the other is its Legendre transform, or the Gibbs free energy, $\phi(m) = \max_h [\sum_i h_i m_i - \psi(h)]$, and the parameters of $p \in F_w$ are given by

$$m_i = \frac{\partial \psi}{\partial h_i}, \qquad (4)$$

$$h_i = \frac{\partial \phi}{\partial m_i}. \qquad (5)$$

Another foliation $A = \{A_m\}$ is parametrized by $m$; each leaf $A_m$ is defined as

$$A_m = \{\, p(s) \mid \langle s_i \rangle_p = m_i \,\}. \qquad (6)$$

Each leaf $A_m$ is not an exponential family, but again a pair of dual potentials is defined on each leaf: the former is given by the Gibbs free energy $\phi(w) \equiv \phi(m, w)$ restricted to the leaf and regarded as a function of $w$, and the latter by its Legendre transform,

$$\lambda(v) = \min_w \Bigl[\, \phi(w) + \sum_I w_I v_I \,\Bigr], \qquad (7)$$

and the parameters of $p \in A_m$ are given by

$$w_I = \frac{\partial \lambda}{\partial v_I}, \qquad (8)$$

$$v_I = -\frac{\partial \phi}{\partial w_I}. \qquad (9)$$

These two foliations form orthogonal dual foliations, since the leaves $F_w$ and $A_m$ are orthogonal at their intersecting point. I introduce still another coordinate system on $S$, called the mixed coordinate system, on the basis of the orthogonal dual foliations. It uses a pair $(m, w)$ of the expectation and the canonical parameters to specify a single element $p \in S$: the $m$ part specifies the leaf $A_m$ on which $p$ resides, and the $w$ part specifies the leaf $F_w$.
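The dual-potential relation (4) and the factorizable-submodel formulas can be checked numerically on a toy machine. The sketch below is my own construction, not from the paper: it approximates $m_i = \partial\psi/\partial h_i$ by central finite differences on a brute-force $\psi$, and confirms that this reduces to $m_i = \tanh h_i$ on $F_0$.

```python
import itertools
import numpy as np

def psi(h, W):
    """Helmholtz free energy psi(h, w) = log Z, by brute-force enumeration."""
    states = np.array(list(itertools.product([-1, 1], repeat=len(h))))
    E = -states @ h - np.einsum('ki,ij,kj->k', states, np.triu(W, 1), states)
    return np.logaddexp.reduce(-E)

def grad_h_psi(h, W, eps=1e-5):
    """Central finite-difference approximation of m_i = d(psi)/d(h_i), eq. (4)."""
    I = np.eye(len(h))
    return np.array([(psi(h + eps * I[i], W) - psi(h - eps * I[i], W)) / (2 * eps)
                     for i in range(len(h))])

h = np.array([0.3, -0.5, 0.1])
W = np.zeros((3, 3))
W[0, 1] = 0.4

m = grad_h_psi(h, W)                           # expectations on the leaf F_w
m0 = grad_h_psi(h, np.zeros((3, 3)))           # on the factorizable submodel F_0
assert np.allclose(m0, np.tanh(h), atol=1e-6)  # m_i = tanh(h_i) holds on F_0
```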

3.2 REFORMULATION OF PROBLEM

Assume that a target Boltzmann machine $q$ is given by specifying its canonical parameters $(\bar h, \bar w)$. Problem 1 is restated as follows: evaluate its expectation parameters $(\bar m, \bar v)$ from those parameters. To evaluate $\bar m$, mean field approximation translates the problem into the following one:

Problem 2: Let $A_{\bar m}$ be the leaf on which $q$ resides. Find the $p_0 \in F_0$ which is closest to $A_{\bar m}$.

At first sight this problem is trivial, since one immediately finds the solution: by the orthogonality of the foliations it is the intersecting point $p_0 \in F_0 \cap A_{\bar m}$, whose expectation parameters are exactly $\bar m$. However, solving this problem with respect to $\bar m$ is nontrivial, and it is the key to the understanding of mean field approximation, including the advanced theories.

Let us measure the proximity of $p$ to $q$ by the Kullback divergence

$$D(p \,\|\, q) = \sum_s p(s) \log \frac{p(s)}{q(s)}; \qquad (10)$$

then solving Problem 2 reduces to finding the minimizer $p \in F_{\bar w}$ of $D(p\|q)$ for the given $q$. For $p \in F_{\bar w}$, $D(p\|q)$ is expressed in terms of the dual potentials of the leaf $F_{\bar w}$ as

$$D(p \,\|\, q) = \phi(m) + \psi(\bar h) - \sum_i \bar h_i m_i, \qquad (11)$$

where $m$ is the expectation parameter of $p$ on $F_{\bar w}$. The minimization problem is thus equivalent to minimizing

$$F(m) = \phi(m) - \sum_i \bar h_i m_i, \qquad (12)$$

since $\psi(\bar h)$ in eq. (11) does not depend on $m$. Solving the stationary condition $\partial F/\partial m_i = 0$ with respect to $m$ will give the correct expectations $\bar m$, since the true minimizer is $p = q$; the $m$ thus obtained identifies the leaf $A_{\bar m}$ and hence the solution $p_0$ of Problem 2. However, this scenario is in general intractable, since $\phi$ cannot be given explicitly as a function of $m$.
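To spell out the last claim, the stationary condition of eq. (12) can be rewritten using eq. (5); this one-line check is my rearrangement of the above formulas:

$$\frac{\partial F}{\partial m_i} = \frac{\partial \phi}{\partial m_i} - \bar h_i = h_i(m) - \bar h_i = 0,$$

so the stationary point is the distribution on $F_{\bar w}$ whose canonical parameters equal $\bar h$, that is, $q$ itself, whose expectation parameters are $\bar m$.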

3.3 PLEFKA EXPANSION

The problem is easy if $\bar w = 0$: in this case $\phi$ is given explicitly as a function of $m$,

$$\phi(m, 0) = \sum_i \Bigl[ \frac{1+m_i}{2} \log \frac{1+m_i}{2} + \frac{1-m_i}{2} \log \frac{1-m_i}{2} \Bigr].$$

Minimization of $F(m)$ with respect to $m$ then gives the solution

$$m_i = \tanh \bar h_i, \qquad (13)$$

as expected. When $\bar w \ne 0$ the expression (13) is no longer exact, but to compensate the error one may use, leaving the convergence problem aside, the Taylor expansion of $\phi(m, \bar w)$ with respect to $\bar w$:

$$\phi(m, \bar w) = \phi(m, 0) + \sum_I \frac{\partial \phi}{\partial w_I}\Big|_{w=0} \bar w_I + \frac{1}{2} \sum_{I,J} \frac{\partial^2 \phi}{\partial w_I \partial w_J}\Big|_{w=0} \bar w_I \bar w_J + \cdots. \qquad (14)$$

This expansion has been called the Plefka expansion [12] in the literature of spin glasses. Note that in considering the expansion one should temporarily assume that $m$ is fixed: one can rely on the solution evaluated from the stationary condition only if the expansion does not change the value of $m$.

The coefficients in the expansion can be efficiently computed by fully utilizing the orthogonal dual structure of the foliations. First, we have the following theorem:

Theorem 1: The coefficients of the expansion (14) are given by the cumulant tensors of the corresponding orders, defined on $A_m$.

Because eq. (9) holds, one can consider derivatives of $v_I$ instead of those of $\phi$. The first-order derivatives are immediately given by the property of the potential of the leaf (eq. (9)), yielding

$$\frac{\partial \phi}{\partial w_I} = -v_I(w), \qquad (15)$$

where $v_I(w) = \langle s_I \rangle_{p(w)}$ and $p(w)$ denotes the distribution on $A_m$ corresponding to $w$. The coefficients of the lowest orders, including the first-order one, are given by the following theorem.

Theorem 2: The first-, second-, and third-order coefficients of the expansion (14) are given by

$$\frac{\partial \phi}{\partial w_I} = -v_I, \quad \frac{\partial^2 \phi}{\partial w_I \partial w_J} = -\langle (\delta s)_I (\delta s)_J \rangle, \quad \frac{\partial^3 \phi}{\partial w_I \partial w_J \partial w_K} = -\langle (\delta s)_I (\delta s)_J (\delta s)_K \rangle, \qquad (16)$$

where $(\delta s)_I \equiv s_I - v_I - \sum_i (\partial v_I / \partial m_i)(s_i - m_i)$ and the expectations are taken with respect to $p(w)$.

The proofs will be found in [11]. It should be noted that, although these results happen to be the same as the ones which would be obtained by regarding $A_m$ as an exponential family, they are not the same in general, since $A_m$ actually is not an exponential family; for example, they are different for the fourth-order coefficients.

The explicit formulas for these coefficients for Boltzmann machines, evaluated at $w = 0$, are given as follows. For the first order,

$$\frac{\partial \phi}{\partial w_I} = -m_i m_j; \qquad (17)$$

for the second order,

$$\frac{\partial^2 \phi}{\partial w_I^2} = -(1 - m_i^2)(1 - m_j^2), \qquad (18)$$

and, for other combinations of $I$ and $J$,

$$\frac{\partial^2 \phi}{\partial w_I \partial w_J} = 0; \qquad (19)$$

for the third order,

$$\frac{\partial^3 \phi}{\partial w_I^3} = -4 m_i m_j (1 - m_i^2)(1 - m_j^2), \qquad (20)$$

$$\frac{\partial^3 \phi}{\partial w_I \partial w_J \partial w_K} = -(1 - m_i^2)(1 - m_j^2)(1 - m_k^2) \quad \text{for } I = (ij),\ J = (jk),\ K = (ki) \text{ with three distinct indices } i, j, k, \qquad (21)$$

and, for other combinations of $I$, $J$, and $K$,

$$\frac{\partial^3 \phi}{\partial w_I \partial w_J \partial w_K} = 0. \qquad (22)$$
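Assembling the expansion (14) with $\phi(m, 0)$ and the coefficients (17) and (18) gives the second-order truncation in closed form; this is my assembly of the above formulas, used in Section 4.1 below:

$$\phi_2(m) = \sum_i \Bigl[ \frac{1+m_i}{2} \log \frac{1+m_i}{2} + \frac{1-m_i}{2} \log \frac{1-m_i}{2} \Bigr] - \sum_{(ij)} \bar w_{ij} m_i m_j - \frac{1}{2} \sum_{(ij)} \bar w_{ij}^2 (1 - m_i^2)(1 - m_j^2).$$

The first two terms together constitute the Weiss free energy $\phi_1$, and the last term produces the Onsager reaction term in eq. (24) below.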

4 MEAN FIELD APPROXIMATION

4.1 MEAN FIELD EQUATION

Truncating the Plefka expansion (14) up to the $n$th-order term gives the $n$th-order approximation $\phi_n$ of $\phi$. The Weiss free energy, which is used in the naive mean field approximation, is given by $\phi_1$. The TAP approach picks up all relevant terms of the Plefka expansion [12]; for the SK model it gives the second-order approximation $\phi_2$.

The stationary condition $\partial \phi_n / \partial m_i = \bar h_i$ gives the so-called mean field equation, from which a solution of the approximate minimization problem is to be determined. For $n = 1$ it takes the following familiar form,

$$m_i = \tanh\Bigl( \bar h_i + \sum_j \bar w_{ij} m_j \Bigr), \qquad (23)$$

and for $n = 2$ it includes the so-called Onsager reaction term,

$$m_i = \tanh\Bigl( \bar h_i + \sum_j \bar w_{ij} m_j - \sum_j \bar w_{ij}^2 (1 - m_j^2)\, m_i \Bigr). \qquad (24)$$

Note that all of these are expressed as functions of $m$. Geometrically, the mean field equation approximately represents the leaf $A_m$ in terms of the mixed coordinate system of $S$, since for the exact Gibbs free energy the stationary condition $h_i = \partial \phi / \partial m_i$ gives the exact leaf. Accordingly, the approximate relation $h_i = \partial \phi_n / \partial m_i$, for fixed $m$, represents the $n$th-order approximate expression of the leaf $A_m$ in the canonical coordinate system. The fit of this expression to the true leaf $A_m$ around the point $w = 0$ becomes better as the order of approximation gets higher, as seen in Fig. 1. Such behavior is well expected, since the Plefka expansion is essentially a Taylor expansion.
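Equations (23) and (24) are fixed-point equations in $m$ and are commonly solved by damped iteration. The following is a minimal sketch; the solver structure, damping factor, and iteration count are my own choices, as the paper prescribes no particular iteration scheme.

```python
import numpy as np

def mean_field(h, W, order=2, n_iter=1000, damping=0.5):
    """Solve the nth-order mean field equation by damped fixed-point
    iteration: order=1 is the naive equation (23), order=2 adds the
    Onsager reaction term of eq. (24)."""
    Ws = W + W.T                     # symmetrize upper-triangular couplings
    m = np.zeros_like(h)
    for _ in range(n_iter):
        field = h + Ws @ m           # naive mean field
        if order >= 2:
            field -= (Ws**2 @ (1.0 - m**2)) * m   # Onsager reaction term
        m = (1.0 - damping) * m + damping * np.tanh(field)
    return m
```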

[Figure 1: Approximate expressions of the leaf $A_m$ by mean field approximations of several orders (0th through 4th) for a 2-unit Boltzmann machine (left), and their magnified view (right).]

[Figure 2: Relation between the naive approximation and the present theory: the points $q$, $p$, and $p_0$, the leaves $F_0$ and $A_m$, and the divergences $D(p_0\|q)$, $D(p\|q)$, and $D(p_0\|p)$.]

4.2 LINEAR RESPONSE

For estimating $v_I$ one can utilize the linear response theorem. In the information geometrical framework it is represented as a trivial identity relation for the Fisher information on the leaf $F_{\bar w}$. The Fisher information matrix, or the Riemannian metric tensor, on the leaf $F_{\bar w}$ and its inverse are given by

$$g_{ij} = \frac{\partial^2 \psi}{\partial h_i \partial h_j} = \langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle \qquad (25)$$

and

$$g^{ij} = \frac{\partial^2 \phi}{\partial m_i \partial m_j}, \qquad (26)$$

respectively. In the framework presented here, the linear response theorem states the trivial fact that these are the inverses of each other. In mean field approximation, one substitutes an approximation $\phi_n$ in place of $\phi$ in eq. (26) to get an approximate inverse $(g^{ij})_n$ of the metric. The derivatives in eq. (26) can be analytically calculated, and therefore $(g^{ij})_n$ can be numerically evaluated by substituting into it a solution $m$ of the mean field equation. Equating its inverse to $g_{ij}$ then gives an estimate of $v_I$ by using eq. (25). So far, Problem 1 has been solved within the framework of mean field approximation, with $m_i$ and $v_I$ obtained by the mean field equation and the linear response theorem, respectively.
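For the second-order approximation, the Hessian in eq. (26) can be written down from $\phi_2$ given after Section 3.3: differentiating it twice yields $\partial^2 \phi_2 / \partial m_i \partial m_j = -\bar w_{ij} - 2 \bar w_{ij}^2 m_i m_j$ for $i \ne j$, and $1/(1 - m_i^2) + \sum_j \bar w_{ij}^2 (1 - m_j^2)$ on the diagonal (my differentiation, not spelled out in the paper). The sketch below inverts this approximate $g^{ij}$ to estimate the correlations, as the text describes.

```python
import numpy as np

def linear_response_v(W, m):
    """Estimate v_ij = <s_i s_j> via linear response: build the Hessian of
    phi_2 (approximate g^ij, eq. (26)), invert it to obtain the metric
    g_ij = <s_i s_j> - m_i m_j (eq. (25)), and add back m_i m_j."""
    Ws = W + W.T                                  # symmetrized couplings
    ginv = -Ws - 2.0 * Ws**2 * np.outer(m, m)     # off-diagonal Hessian entries
    np.fill_diagonal(ginv, 1.0 / (1.0 - m**2) + Ws**2 @ (1.0 - m**2))
    g = np.linalg.inv(ginv)                       # approximate metric, eq. (25)
    return g + np.outer(m, m)                     # off-diagonals estimate v_ij

# e.g., with m from the damped iteration above:
# m = mean_field(h, W, order=2)
# v = linear_response_v(W, m)
```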

5 DISCUSSION

Following the framework presented so far, one can in principle construct algorithms of mean field approximation of desired orders. The first-order algorithm with linear response was first proposed and examined by Kappen and Rodríguez [7, 8]. Tanaka [13] has formulated second- and third-order algorithms and explored them by computer simulations. It is also possible to extend the present formulation so that it is applicable to higher-order Boltzmann machines. Tanaka [14] discusses an extension of the present formulation to third-order Boltzmann machines: it is possible to extend the linear response theorem to higher orders, and this allows us to treat higher-order correlations within the framework of mean field approximation.

The common understanding about the naive mean field approximation is that it minimizes the Kullback divergence $D(p_0 \| q)$ with respect to $p_0 \in F_0$ for a given $q$. It can be shown that this view is consistent with the theory presented in this paper. Assume $p_0 \in F_0$, and let $p$ be the distribution corresponding to the intersecting point of the leaves $F_{\bar w}$ and $A_m$, where $m$ is the expectation parameter of $p_0$. Because of the orthogonality of the two foliations $F$ and $A$, the following Pythagorean law [9] holds (Fig. 2):

$$D(p_0 \,\|\, q) = D(p_0 \,\|\, p) + D(p \,\|\, q). \qquad (27)$$

Intuitively, $D(p_0 \| p)$ measures the squared distance between $F_0$ and $F_{\bar w}$, and it is a second-order quantity in $\bar w$. It should be ignored in the first-order approximation, and thus $D(p_0\|q) \approx D(p\|q)$ holds. Under this approximation, minimization of the former with respect to $p_0 \in F_0$ is equivalent to that of the latter with respect to $m$, which establishes the relation between the naive approximation and the present theory. It can also be checked directly that the first-order approximation of $\phi$ exactly gives the Weiss free energy.

The present theory provides an alternative view about the validity of mean field approximation: as opposed to the common belief that mean field approximation is a good one when the system size $N$ is sufficiently large, one can state from the present formulation that it is good whenever the higher-order contributions of the Plefka expansion vanish, regardless of whether $N$ is large or not. This provides a theoretical basis for the observation that mean field approximation often works well for small networks.

Acknowledgment

The author would like to thank the Telecommunications Advancement Foundation for financial support.

References

[1] Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985) A learning algorithm for Boltzmann machines. Cognitive Science 9: 147-169.
[2] Peterson, C., & Anderson, J. R. (1987) A mean field theory learning algorithm for neural networks. Complex Systems 1: 995-1019.
[3] Thouless, D. J., Anderson, P. W., & Palmer, R. G. (1977) Solution of 'Solvable model of a spin glass'. Phil. Mag. 35 (3): 593-601.
[4] Parisi, G. (1988) Statistical Field Theory. Addison-Wesley.
[5] Galland, C. C. (1993) The limitations of deterministic Boltzmann machine learning. Network 4 (3): 355-379.
[6] Hofmann, T., & Buhmann, J. M. (1997) Pairwise data clustering by deterministic annealing. IEEE Trans. Patt. Anal. & Machine Intell. 19 (1): 1-14; Errata, ibid. 19 (2): 197 (1997).
[7] Kappen, H. J., & Rodríguez, F. B. (1998) Efficient learning in Boltzmann machines using linear response theory. Neural Computation 10 (5): 1137-1156.
[8] Kappen, H. J., & Rodríguez, F. B. (1998) Boltzmann machine learning using mean field theory and linear response correction. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in Neural Information Processing Systems 10. The MIT Press.
[9] Amari, S.-I. (1985) Differential-Geometrical Methods in Statistics. Lecture Notes in Statistics 28, Springer-Verlag.
[10] Amari, S.-I., Kurata, K., & Nagaoka, H. (1992) Information geometry of Boltzmann machines. IEEE Trans. Neural Networks 3 (2): 260-271.
[11] Tanaka, T. Information geometry of mean field approximation. Preprint.
[12] Plefka, T. (1982) Convergence condition of the TAP equation for the infinite-ranged Ising spin glass model. J. Phys. A: Math. Gen. 15 (6): 1971-1978.
[13] Tanaka, T. (1998) Mean-field theory of Boltzmann machine learning. Phys. Rev. E 58 (2): 2302-2310.
[14] Tanaka, T. (1998) Estimation of third-order correlations within mean field approximation. In S. Usui & T. Omori (Eds.), Proc. Fifth International Conference on Neural Information Processing.
