Information Geometric view of Belief Propagation
1 Information Geometric view of Belief Propagation Yunshu Liu
2 References: [1] Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic Reasoning, Free Energy and Information Geometry, Neural Computation, vol. 16, 2004. [2] Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Information Geometry of Turbo and Low-Density Parity-Check Codes, IEEE Transactions on Information Theory, vol. 50, no. 6, June 2004. [3] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012.
3 Outline Information Geometry Examples of probabilistic manifold Important Submanifold Information projection Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation
4 Probability distribution viewed as manifold Outline for Information Geometry Examples of probabilistic manifold Important Submanifold Information projection
5 Manifold Manifold: a set S with a coordinate system, a one-to-one mapping from S to R^n, so that S locally looks like an open subset of R^n. Elements of the set (points): points in R^n, probability distributions, linear systems. Figure: A coordinate system ξ for a manifold S
6 Example of manifold Parametrization of color models Parametrization: map of colors into R^3. Examples:
7 Example of manifold Parametrization of the Gaussian distribution S = {p(x; µ, σ)}, where

p(x; µ, σ) = 1/(√(2π) σ) · exp{ −(x − µ)² / (2σ²) }

Figure: the manifold S with coordinates (µ, σ)
8 Example of manifold Exponential family

p(x; θ) = exp[ c(x) + Σ_{i=1}^n θ_i F_i(x) − ψ(θ) ]

[θ_i] are called the natural parameters (coordinates), and ψ is the potential function for [θ_i], which can be calculated as

ψ(θ) = log ∫ exp[ c(x) + Σ_{i=1}^n θ_i F_i(x) ] dx

The exponential families include many of the most common distributions, including the normal, exponential, gamma, beta, Dirichlet, Bernoulli, binomial, multinomial, Poisson, and so on.
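As a concrete instance of the formulas above, the following sketch (illustrative code, not from the slides; `psi` and `p` are assumed helper names) writes the Bernoulli distribution over x ∈ {0, 1} in exponential-family form, with F(x) = x, c(x) = 0 and potential ψ(θ) = log(1 + e^θ):

```python
import math

# Bernoulli over x in {0, 1} in exponential-family form:
#   p(x; theta) = exp(theta * x - psi(theta)),  F(x) = x, c(x) = 0

def psi(theta):
    # psi(theta) = log sum over x in {0, 1} of exp(theta * F(x))
    return math.log(1.0 + math.exp(theta))

def p(x, theta):
    return math.exp(theta * x - psi(theta))

theta = 0.8
# The probabilities sum to one, and the success probability is the
# logistic function of the natural parameter theta.
print(abs(p(0, theta) + p(1, theta) - 1.0) < 1e-12)                 # True
print(abs(p(1, theta) - 1.0 / (1.0 + math.exp(-theta))) < 1e-12)    # True
```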
9 Example of two binary discrete random variables as Exponential families Let x = {x_1, x_2}, x_i = −1 or 1, and consider the family of all probability distributions

S = {p(x) | p(x) > 0, Σ_i p(x = C_i) = 1}

Outcomes: C_0 = {−1, −1}, C_1 = {−1, 1}, C_2 = {1, −1}, C_3 = {1, 1}
Parameterization: p(x = C_i) = η_i for i = 1, 2, 3, and p(x = C_0) = 1 − Σ_{j=1}^3 η_j
η = (η_1, η_2, η_3) is called the m-coordinate of S.
Figure: the simplex of distributions in η-coordinates, with vertices (1 0 0), (0 1 0), (0 0 1)
10 Example of two binary discrete random variables as Exponential families A new coordinate {θ_i} is defined as

θ_i = ln( η_i / (1 − Σ_{j=1}^3 η_j) )   for i = 1, 2, 3.   (1)

θ = (θ_1, θ_2, θ_3) is called the e-coordinate of S.
Figure: the simplex in η-coordinates maps to the whole space in θ-coordinates
11 Example of two binary discrete random variables as Exponential families Introduce new random variables

δ_{C_i}(x) = 1 if x = C_i, 0 otherwise

and let ψ = −ln p(x = C_0).   (2)

Then S is an exponential family with natural parameter {θ_i}:

p(x, θ) = exp( Σ_{i=1}^3 θ_i δ_{C_i}(x) − ψ(θ) )   (3)
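The m- and e-coordinates of this family can be converted into each other directly from equation (1). A minimal sketch (the helper names `m_to_e` and `e_to_m` are illustrative):

```python
import math

# Coordinate conversion for the family above: m-coordinates
# eta = (eta1, eta2, eta3) are the probabilities of C1, C2, C3, and the
# e-coordinates are theta_i = ln(eta_i / (1 - eta1 - eta2 - eta3)).

def m_to_e(eta):
    p0 = 1.0 - sum(eta)                    # p(x = C0)
    return [math.log(e / p0) for e in eta]

def e_to_m(theta):
    w = [math.exp(t) for t in theta]       # unnormalized p(Ci) / p(C0)
    z = 1.0 + sum(w)                       # the 1 accounts for C0
    return [wi / z for wi in w]

eta = [0.1, 0.2, 0.3]
theta = m_to_e(eta)
# Round-tripping recovers the original m-coordinates.
print(all(abs(a - b) < 1e-12 for a, b in zip(e_to_m(theta), eta)))  # True
```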
12 Probability distribution viewed as manifold Outline for Information Geometry Examples of probabilistic manifold Important Submanifolds Information projection
13 Submanifolds Definition: a submanifold M of a manifold S is a subset of S which itself has the structure of a manifold. An open subset of an n-dimensional manifold forms an n-dimensional submanifold. One way to construct an m (< n)-dimensional submanifold: fix n − m of the coordinates. Examples:
14 e-autoparallel submanifold and m-autoparallel submanifold e-autoparallel submanifold A subfamily E is said to be an e-autoparallel submanifold of S if for some coordinate λ there exist a matrix A and a vector b such that for the e-coordinate θ of S

θ(p) = Aλ(p) + b   (∀p ∈ E)   (4)

If λ is one-dimensional, E is called an e-geodesic. An e-autoparallel submanifold of S is a hyperplane in e-coordinates; an e-geodesic of S is a straight line in e-coordinates. Figure: E is a hyperplane in the θ coordinate
15 e-autoparallel submanifold and m-autoparallel submanifold m-autoparallel submanifold A subfamily M is said to be an m-autoparallel submanifold of S if for some coordinate γ there exist a matrix A and a vector b such that for the m-coordinate η of S

η(p) = Aγ(p) + b   (∀p ∈ M)   (5)

If γ is one-dimensional, M is called an m-geodesic.
16 e-autoparallel submanifold and m-autoparallel submanifold Two independent bits The set of all product distributions is defined as

E = {p(x) | p(x) = p_{X1}(x_1) p_{X2}(x_2)}   (6)

With p_{X1}(x_1 = 1) = p_1 and p_{X2}(x_2 = 1) = p_2, we have p_{X1}(x_1 = −1) = 1 − p_1 and p_{X2}(x_2 = −1) = 1 − p_2.
Parameterization Define λ_i = (1/2) ln( p_i / (1 − p_i) ) for i = 1, 2, and ψ(λ) = Σ_{i=1,2} ln(e^{λ_i} + e^{−λ_i}), then

p(x, λ) = e^{λ_1 x_1 − ψ(λ_1)} e^{λ_2 x_2 − ψ(λ_2)} = e^{λ·x − ψ(λ)}   (7)

Thus E is an exponential family with natural parameter {λ_i}.
17 e-autoparallel submanifold and m-autoparallel submanifold Dual parameters: expectations

γ = (γ_1, γ_2)ᵀ = (∂ψ/∂λ_1, ∂ψ/∂λ_2)ᵀ = (2p_1 − 1, 2p_2 − 1)ᵀ = E_p(x)

Properties E is an e-autoparallel submanifold (a plane in θ-coordinates) of S, but not an m-autoparallel submanifold (not a plane in η-coordinates): with the outcome ordering C_0, ..., C_3 above,

θ(p) = (θ_1, θ_2, θ_3)ᵀ = [0 2; 2 0; 2 2] (λ_1, λ_2)ᵀ = Aλ(p)
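These relations can be checked numerically. The sketch below assumes the outcome ordering C_0 = (−1,−1), C_1 = (−1,1), C_2 = (1,−1), C_3 = (1,1) used earlier; the matrix A = [[0,2],[2,0],[2,2]] follows from that ordering and is an assumption of this sketch:

```python
import math

# Check that the product family E sits in S as the plane theta = A @ lam,
# and that the dual parameter gamma_i = d psi / d lam_i = E[x_i] = tanh(lam_i).

def product_dist(lam):
    outcomes = [(-1, -1), (-1, 1), (1, -1), (1, 1)]   # C0, C1, C2, C3
    w = [math.exp(lam[0] * x1 + lam[1] * x2) for x1, x2 in outcomes]
    z = sum(w)
    return [wi / z for wi in w], outcomes

lam = (0.3, -0.7)
probs, outcomes = product_dist(lam)

# e-coordinates of S: theta_i = ln( p(Ci) / p(C0) )
theta = [math.log(probs[i] / probs[0]) for i in (1, 2, 3)]
expected = [2 * lam[1], 2 * lam[0], 2 * lam[0] + 2 * lam[1]]   # A @ lam
print(all(abs(a - b) < 1e-12 for a, b in zip(theta, expected)))  # True

# gamma_1 = E[x_1] = tanh(lam_1) = 2*p_1 - 1
mean_x1 = sum(p * x1 for p, (x1, x2) in zip(probs, outcomes))
print(abs(mean_x1 - math.tanh(lam[0])) < 1e-12)  # True
```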
18 e-autoparallel submanifold and m-autoparallel submanifold Two independent bits: e-autoparallel submanifold but not m-autoparallel submanifold. Figure: E in θ coordinate (a plane); E in η coordinate (curved)
19 Probability distribution viewed as manifold Outline for Information Geometry Examples of probabilistic manifold Important Submanifolds Information projection
20 Projections and Pythagorean Theorem KL-divergence On the manifold of probability mass functions for random variables taking values in the set X, we can also define the Kullback-Leibler divergence, or relative entropy, measured in bits, according to

D(p_X ‖ q_X) = Σ_{x ∈ X} p_X(x) log( p_X(x) / q_X(x) )   (8)
21 Projections and Pythagorean Theorem Information Projection Let p_X be a point in S and let E be an e-autoparallel submanifold of S. We want to find the point Π_E(p_X) in E that minimizes the KL-divergence:

D(p_X ‖ Π_E(p_X)) = min_{q_X ∈ E} D(p_X ‖ q_X)   (9)

The m-geodesic connecting p_X and Π_E(p_X) is orthogonal to E at Π_E(p_X). In geometry, orthogonal usually means a right angle; in information geometry, orthogonal usually means certain values are uncorrelated. Figure: the m-geodesic from p_X meets the e-autoparallel submanifold E orthogonally at Π_E(p_X)
22 Projections and Pythagorean Theorem Pythagorean relation For q_X ∈ E, we have the following Pythagorean relation:

D(p_X ‖ q_X) = D(p_X ‖ Π_E(p_X)) + D(Π_E(p_X) ‖ q_X)   ∀ q_X ∈ E   (10)

Figure: the m-geodesic from p_X to Π_E(p_X) and the decomposition of D(p_X ‖ q_X) for q_X ∈ E
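The Pythagorean relation can be verified numerically for E = the product distributions on two binary variables, whose m-projection is the product of the marginals (a standard fact; the helper names below are illustrative, not from the slides):

```python
import math, itertools, random

# Numerical check of the Pythagorean relation (10) with E = independent
# (product) distributions on two binary variables. The m-projection of p
# onto E is the product of its marginals.

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

outcomes = list(itertools.product([0, 1], repeat=2))

def project_to_E(p):
    # marginal probabilities of x1 = 1 and x2 = 1
    m1 = sum(pi for pi, (x1, _) in zip(p, outcomes) if x1 == 1)
    m2 = sum(pi for pi, (_, x2) in zip(p, outcomes) if x2 == 1)
    marg1, marg2 = (1 - m1, m1), (1 - m2, m2)
    return [marg1[x1] * marg2[x2] for x1, x2 in outcomes]

random.seed(0)
w = [random.random() for _ in outcomes]
p = [wi / sum(w) for wi in w]          # a generic (dependent) distribution
proj = project_to_E(p)                 # its m-projection onto E

# any product distribution q in E
q = [a * b for a in (0.3, 0.7) for b in (0.6, 0.4)]
lhs = kl(p, q)
rhs = kl(p, proj) + kl(proj, q)
print(abs(lhs - rhs) < 1e-12)  # True
```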
23 Projection in θ coordinate
24 Projection in η coordinate
25 Information Geometry of Belief Propagation Outline for Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation
26 Belief Propagation Belief Propagation/Sum-product algorithm The Belief Propagation algorithm uses a message passing scheme to exploit conditional independence and uses these properties to change the order of sums and products, e.g. xy + xz = x(y + z). Figure: expression trees of xy + xz and x(y + z), where the leaf nodes represent variables and internal nodes represent arithmetic operators (addition, multiplication, negation, etc.)
27 Belief Propagation Message passing on a factor graph Variable x to factor f:

m_{x→f}(x) = Π_{k ∈ nb(x)\f} m_{f_k→x}(x)

nb(x) denotes the neighboring factor nodes of x. If nb(x)\f = ∅ then m_{x→f}(x) = 1. Figure: messages m_{f_1→x}, ..., m_{f_K→x} arriving at x and the outgoing message m_{x→f}
28 Belief Propagation Message passing on a factor graph Factor f to variable x:

m_{f→x}(x) = Σ_{x_1, ..., x_M} f(x, x_1, ..., x_M) Π_{m ∈ nb(f)\x} m_{x_m→f}(x_m)

nb(f) denotes the neighboring variable nodes of f. If nb(f)\x = ∅ then m_{f→x}(x) = f(x). Figure: messages m_{x_1→f}, ..., m_{x_M→f} arriving at f and the outgoing message m_{f→x}
29 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Serial protocol: Send messages up to the root and back. Figure: a tree factor graph with variables x_1, x_2, x_3, x_4 and factors f_a(x_1, x_2), f_b(x_2, x_3), f_c(x_2, x_4), with x_3 as the root
30 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Down - up:

m_{x_1→f_a}(x_1) = 1
m_{f_a→x_2}(x_2) = Σ_{x_1} f_a(x_1, x_2)
m_{x_4→f_c}(x_4) = 1
m_{f_c→x_2}(x_2) = Σ_{x_4} f_c(x_2, x_4)
m_{x_2→f_b}(x_2) = m_{f_a→x_2}(x_2) m_{f_c→x_2}(x_2)
m_{f_b→x_3}(x_3) = Σ_{x_2} f_b(x_2, x_3) m_{x_2→f_b}(x_2)
31 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Up - down:

m_{x_3→f_b}(x_3) = 1
m_{f_b→x_2}(x_2) = Σ_{x_3} f_b(x_2, x_3)
m_{x_2→f_a}(x_2) = m_{f_b→x_2}(x_2) m_{f_c→x_2}(x_2)
m_{f_a→x_1}(x_1) = Σ_{x_2} f_a(x_1, x_2) m_{x_2→f_a}(x_2)
m_{x_2→f_c}(x_2) = m_{f_a→x_2}(x_2) m_{f_b→x_2}(x_2)
m_{f_c→x_4}(x_4) = Σ_{x_2} f_c(x_2, x_4) m_{x_2→f_c}(x_2)
32 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Calculating p(x_2):

p(x_2) = (1/Z) m_{f_a→x_2}(x_2) m_{f_b→x_2}(x_2) m_{f_c→x_2}(x_2)
       = (1/Z) { Σ_{x_1} f_a(x_1, x_2) } { Σ_{x_3} f_b(x_2, x_3) } { Σ_{x_4} f_c(x_2, x_4) }
       = (1/Z) Σ_{x_1} Σ_{x_3} Σ_{x_4} f_a(x_1, x_2) f_b(x_2, x_3) f_c(x_2, x_4)
       = Σ_{x_1} Σ_{x_3} Σ_{x_4} p(x)
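The serial schedule above can be run directly on the three-factor tree. The sketch below (binary variables, random factor tables; names mirror the slides but the code itself is illustrative) checks the belief at x_2 against the brute-force marginal:

```python
import itertools, random

# Sum-product on the tree with factors fa(x1,x2), fb(x2,x3), fc(x2,x4),
# all variables binary. Messages are length-2 vectors indexed by the value.

random.seed(1)
fa = {(a, b): random.random() for a in (0, 1) for b in (0, 1)}
fb = {(b, c): random.random() for b in (0, 1) for c in (0, 1)}
fc = {(b, d): random.random() for b in (0, 1) for d in (0, 1)}

# Leaf-variable-to-factor messages are identically 1, so each
# factor-to-x2 message is a single sum over the leaf variable.
m_fa_x2 = [sum(fa[(x1, x2)] for x1 in (0, 1)) for x2 in (0, 1)]
m_fb_x2 = [sum(fb[(x2, x3)] for x3 in (0, 1)) for x2 in (0, 1)]
m_fc_x2 = [sum(fc[(x2, x4)] for x4 in (0, 1)) for x2 in (0, 1)]

# Belief at x2 = normalized product of incoming messages.
belief = [m_fa_x2[v] * m_fb_x2[v] * m_fc_x2[v] for v in (0, 1)]
z = sum(belief)
belief = [b / z for b in belief]

# Brute-force marginal from the joint p(x) ∝ fa * fb * fc.
joint = {x: fa[(x[0], x[1])] * fb[(x[1], x[2])] * fc[(x[1], x[3])]
         for x in itertools.product((0, 1), repeat=4)}
zj = sum(joint.values())
marg = [sum(v for x, v in joint.items() if x[1] == val) / zj for val in (0, 1)]
print(all(abs(a - b) < 1e-9 for a, b in zip(belief, marg)))  # True
```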
33 Parallel protocol of Belief Propagation Message passing on general graphs with loops Parallel protocol: Initialize all messages. All nodes receive messages from their neighbors in parallel and update their belief states. Send new messages back out to their neighbors. Repeat the last two steps until convergence.
34 Information Geometry of Belief Propagation Outline for Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation
35 Information Geometry of Graphical structure Important Graphs (Undirected) for Belief Propagation A: Belief graph with no edges (all n nodes are independent) B: Graph with a single edge r C: Graph of the true distribution with n nodes and L edges
36 Information Geometry of Graphical structure Important Graphs (Undirected) for Belief Propagation

A: M_0 = { p_0(x, θ) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i − ψ_0(θ) ] | θ ∈ R^n }
B: M_r = { p_r(x, ζ_r) = exp[ Σ_{i=1}^n h_i x_i + s_r c_r(x) + Σ_{i=1}^n ζ_{ri} x_i − ψ_r(ζ_r) ] | ζ_r ∈ R^n }
C: S = { p(x, θ, s) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i + Σ_{r=1}^L s_r c_r(x) − ψ(θ, s) ] | θ ∈ R^n, s ∈ R^L }
37 Information Geometry of Graphical structure Properties of M_0, M_r M_0 is an e-autoparallel submanifold of S, with e-affine coordinate θ. M_r for r = 1, ..., L are all e-autoparallel submanifolds of S, with e-affine coordinate ζ_r. A distribution in M_r includes only one nonlinear interaction term c_r(x), corresponding to the single edge in B.
38 Information Geometry of Belief Propagation Outline for Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation
39 Information-Geometrical view of Belief Propagation Problem setup for belief propagation Suppose we have the true distribution q(x) ∈ S and want to calculate q_0(x) = Π_{M_0} q(x), the m-projection of q(x) onto M_0. The goal of belief propagation is to find a good approximation q̂(x) ∈ M_0 of q(x).

M_0 = { p_0(x, θ) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i − ψ_0(θ) ] | θ ∈ R^n }
M_r = { p_r(x, ζ_r) = exp[ Σ_{i=1}^n h_i x_i + s_r c_r(x) + Σ_{i=1}^n ζ_{ri} x_i − ψ_r(ζ_r) ] | ζ_r ∈ R^n }
S = { p(x, θ, s) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i + Σ_{r=1}^L s_r c_r(x) − ψ(θ, s) ] | θ ∈ R^n, s ∈ R^L }
40 Geometrical Belief Propagation Algorithm
1) Put t = 0, and start with initial guesses ζ_r^0, for example ζ_r^0 = 0.
2) For t = 0, 1, 2, ..., m-project p_r(x, ζ_r^t) to M_0 and obtain the linearized version of s_r c_r(x):
   ξ_r^{t+1} = Π_{M_0} p_r(x, ζ_r^t) − ζ_r^t
3) Summarize all the effects of s_r c_r(x), to give
   θ^{t+1} = Σ_{r=1}^L ξ_r^{t+1}
4) Update ζ_r by
   ζ_r^{t+1} = Σ_{r'≠r} ξ_{r'}^{t+1} = θ^{t+1} − ξ_r^{t+1}
5) Repeat 2)-4) until convergence.
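A runnable sketch of steps 1)-5) on a toy binary chain model with pairwise interactions c_r(x) = x_i x_j, x_i ∈ {−1, +1}. The toy model, the helper names, and the enumeration-based m-projection are assumptions of this sketch, not from the slides; since the m-projection onto M_0 matches the mean of x, the projected θ-coordinate is atanh(E[x_i]) − h_i. The chain is a tree, so the fixed point recovers the exact marginals:

```python
import itertools, math

# Geometrical BP for q(x) ∝ exp(sum_i h_i x_i + sum_r s_r c_r(x)),
# with c_r(x) = x_i * x_j for edge r, computed by exact enumeration.

n = 3
h = [0.1, -0.2, 0.15]
edges = [(0, 1), (1, 2)]          # a chain, i.e. a tree, so BP is exact
s = [0.3, -0.4]
L = len(edges)
X = list(itertools.product((-1, 1), repeat=n))

def means(logp):
    # expectation of each x_i under p ∝ exp(logp)
    w = [math.exp(lp) for lp in logp]
    z = sum(w)
    return [sum(wi * x[i] for wi, x in zip(w, X)) / z for i in range(n)]

def project_to_M0(zeta_r, r):
    # theta-coordinate of the m-projection of p_r(x, zeta_r) onto M0
    (i, j), sr = edges[r], s[r]
    logp = [sum(h[k] * x[k] for k in range(n)) + sr * x[i] * x[j]
            + sum(zeta_r[k] * x[k] for k in range(n)) for x in X]
    return [math.atanh(m) - h[k] for k, m in enumerate(means(logp))]

zeta = [[0.0] * n for _ in range(L)]       # step 1: zeta_r^0 = 0
for t in range(50):
    xi = [[p - z for p, z in zip(project_to_M0(zeta[r], r), zeta[r])]
          for r in range(L)]                              # step 2
    theta = [sum(xi[r][k] for r in range(L)) for k in range(n)]  # step 3
    zeta = [[theta[k] - xi[r][k] for k in range(n)] for r in range(L)]  # step 4

# On a tree the fixed point is exact: p_0(x, theta) has the true marginals,
# and E[x_k] under p_0 is tanh(h_k + theta_k).
logq = [sum(h[k] * x[k] for k in range(n))
        + sum(s[r] * x[i] * x[j] for r, (i, j) in enumerate(edges)) for x in X]
true_means = means(logq)
bp_means = [math.tanh(h[k] + theta[k]) for k in range(n)]
print(all(abs(a - b) < 1e-9 for a, b in zip(bp_means, true_means)))  # True
```

On a loopy graph the same loop runs unchanged, but the final equality generally becomes an approximation, matching the discussion of the m- and e-conditions below.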
41 Geometrical Belief Propagation Algorithm Analysis Given q(x), let ζ_r be the current solution which M_r believes gives a good approximation of q(x). We project it to M_0, giving the solution θ_r in M_0 having the same expectation: p(x, θ_r) = Π_{M_0} p_r(x, ζ_r). Since θ_r includes both the effect of the single nonlinear term s_r c_r(x) specific to M_r and that of the linear term ζ_r, we can use ξ_r = θ_r − ζ_r to represent the effect of the single nonlinear term s_r c_r(x); we consider it the linearized version of s_r c_r(x).
42 Geometrical Belief Propagation Algorithm Analysis (continued) Once M_r knows the linearized version of s_r c_r(x) is ξ_r, it broadcasts ξ_r to all the other models M_0 and M_{r'} (r' ≠ r). Receiving these messages from all M_r, M_0 guesses that the equivalent linear term of Σ_{r=1}^L s_r c_r(x) is θ = Σ_{r=1}^L ξ_r. Since M_r in turn receives messages ξ_{r'} from all other M_{r'} (r' ≠ r), M_r uses them to form a new ζ_r = Σ_{r'≠r} ξ_{r'} = θ − ξ_r. The whole process is repeated until convergence.
43 Geometrical Belief Propagation Algorithm Analysis (continued) Assume the algorithm has converged to {ζ_r*} and θ*. The estimate of q(x) ∈ S in M_0 is p_0(x, θ*) (which may not be the exact solution). We write p_0(x, θ*), p_r(x, ζ_r*) and q(x) as:

p_0(x, θ*) = exp[ Σ_{i=1}^n h_i x_i + Σ_{r=1}^L ξ_r*·x − ψ_0 ]
p_r(x, ζ_r*) = exp[ Σ_{i=1}^n h_i x_i + s_r c_r(x) + Σ_{r'≠r} ξ_{r'}*·x − ψ_r ]
q(x) = exp[ Σ_{i=1}^n h_i x_i + Σ_{r=1}^L s_r c_r(x) − ψ ]

The whole idea of the algorithm is to approximate s_r c_r(x) by ξ_r*·x in M_r; the independent distribution p_0(x, θ*) ∈ M_0 then integrates all the information.
44 Information-Geometrical view of Belief Propagation Convergence Analysis The convergence point satisfies the two conditions:
1. m-condition: θ* = Π_{M_0} p_r(x, ζ_r*)
2. e-condition: θ* = (1/(L−1)) Σ_{r=1}^L ζ_r*
The m-condition and e-condition of the algorithm: the e-condition is satisfied at each iteration of the BP algorithm; the m-condition is satisfied at the equilibrium of the BP algorithm.
45 Information-Geometrical view of Belief Propagation Convergence Analysis Let us consider the m-autoparallel submanifold M* connecting p_0(x, θ*) and p_r(x, ζ_r*), r = 1, ..., L:

M* = { p(x) | p(x) = d_0 p_0(x, θ*) + Σ_{r=1}^L d_r p_r(x, ζ_r*);  d_0 + Σ_{r=1}^L d_r = 1 }

We also consider the e-autoparallel submanifold E* connecting p_0(x, θ*) and p_r(x, ζ_r*):

E* = { p(x) | log p(x) = v_0 log p_0(x, θ*) + Σ_{r=1}^L v_r log p_r(x, ζ_r*) − ψ;  v_0 + Σ_{r=1}^L v_r = 1 }
46 Information-Geometrical view of Belief Propagation Convergence Analysis M* holds the expectation of x unchanged; thus an equivalent definition is

M* = { p(x) | p(x) ∈ S, Σ_x x p(x) = Σ_x x p_0(x, θ*) = η_0(θ*) }

Thus the m-condition implies that M* intersects all of M_0 and M_r orthogonally. The e-condition implies that E* includes the true distribution q(x): if we set v_0 = −(L−1) and v_r = 1 for all r, then using the e-condition we get

q(x) = C p_0(x, θ*)^{−(L−1)} Π_{r=1}^L p_r(x, ζ_r*) ∈ E*

where C is the normalization factor.
47 Information-Geometrical view of Belief Propagation Convergence Analysis The proposed BP algorithm searches for θ* and ζ_r* until both the m-condition and the e-condition are satisfied. We know E* includes q(x). If M* also includes q(x), its m-projection onto M_0 is p_0(x, θ*), and θ* gives the true solution; if M* does not include q(x), the algorithm gives only an approximation of the true distribution. Thus we have the following theorem:
Theorem: When the underlying graph is a tree, both E* and M* include q(x), and the algorithm gives the exact solution.
48 Information-Geometrical view of Belief Propagation Example of L = 2, where θ* = ζ_1* + ζ_2* = ξ_1* + ξ_2*, ζ_1* = ξ_2* and ζ_2* = ξ_1*:
1) Put t = 0, and start with an initial guess ζ_2^0, for example ζ_2^0 = 0.
2) For t = 0, 1, 2, ..., m-project p_r(x, ζ_r^t) to M_0 and obtain the linearized version of s_r c_r(x):
   ζ_1^{t+1} = Π_{M_0} p_2(x, ζ_2^t) − ζ_2^t,  ζ_2^{t+1} = Π_{M_0} p_1(x, ζ_1^{t+1}) − ζ_1^{t+1}
3) Summarize all the effects of s_r c_r(x), to give θ^{t+1} = ζ_1^{t+1} + ζ_2^{t+1}
4) Repeat 2)-3) until convergence, and we get θ* = ζ_1* + ζ_2*.
49 Information-Geometrical view of Belief Propagation Figure: Information-geometrical view of the BP algorithm with L = 2 (cf. [2], Fig. 4, the information-geometrical view of turbo decoding)
50 Information-Geometrical view of Belief Propagation Analysis of the example with L = 2 The convergence point satisfies the two conditions:
1. m-condition: θ* = Π_{M_0} p_1(x, ζ_1*) = Π_{M_0} p_2(x, ζ_2*)
2. e-condition: θ* = ζ_1* + ζ_2* = ξ_1* + ξ_2*
Figure: Equimarginal submanifold M*
52 Information-Geometrical view of CCCP-Bethe Information Geometric Convex Concave Computational Procedure (CCCP-Bethe)
Inner loop: given θ^t, calculate {ζ_r^{t+1}} by solving
   Π_{M_0} p_r(x, ζ_r^{t+1}) = Lθ^t − Σ_{r=1}^L ζ_r^{t+1},   r = 1, ..., L
Outer loop: given the set {ζ_r^{t+1}} resulting from the inner loop, update
   θ^{t+1} = Lθ^t − Σ_{r=1}^L ζ_r^{t+1}
Notes: CCCP enforces the m-condition at each iteration; the e-condition is satisfied only at the convergence point, which means we search for a new θ^{t+1} until the e-condition is satisfied.
53 A New e-constraint Algorithm Define a cost function F_e under the e-constraint θ = Σ_r ζ_r / (L − 1). We minimize F_e by the gradient descent algorithm. The gradient involves I_0(θ) and I_r(ζ_r), the Fisher information matrices of p_0(x, θ) and p_r(x, ζ_r), respectively.
54 A New e-constraint Algorithm Under the gradient descent algorithm, ζ_r and θ are updated as shown, where δ is a small positive learning rate.
55 A New e-constraint Algorithm
56 Thanks!
More informationMessage Passing Algorithm with MAP Decoding on Zigzag Cycles for Non-binary LDPC Codes
Message Passing Algorithm with MAP Decoding on Zigzag Cycles for Non-binary LDPC Codes Takayuki Nozaki 1, Kenta Kasai 2, Kohichi Sakaniwa 2 1 Kanagawa University 2 Tokyo Institute of Technology July 12th,
More informationExpectation propagation for symbol detection in large-scale MIMO communications
Expectation propagation for symbol detection in large-scale MIMO communications Pablo M. Olmos olmos@tsc.uc3m.es Joint work with Javier Céspedes (UC3M) Matilde Sánchez-Fernández (UC3M) and Fernando Pérez-Cruz
More informationCS Lecture 13. More Maximum Likelihood
CS 6347 Lecture 13 More Maxiu Likelihood Recap Last tie: Introduction to axiu likelihood estiation MLE for Bayesian networks Optial CPTs correspond to epirical counts Today: MLE for CRFs 2 Maxiu Likelihood
More informationInformation geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem
https://doi.org/10.1007/s41884-018-0002-8 RESEARCH PAPER Information geometry connecting Wasserstein distance and Kullback Leibler divergence via the entropy-relaxed transportation problem Shun-ichi Amari
More informationInformation Geometric Structure on Positive Definite Matrices and its Applications
Information Geometric Structure on Positive Definite Matrices and its Applications Atsumi Ohara Osaka University 2010 Feb. 21 at Osaka City University 大阪市立大学数学研究所情報幾何関連分野研究会 2010 情報工学への幾何学的アプローチ 1 Outline
More informationAn Introduction to Bayesian Machine Learning
1 An Introduction to Bayesian Machine Learning José Miguel Hernández-Lobato Department of Engineering, Cambridge University April 8, 2013 2 What is Machine Learning? The design of computational systems
More informationLinear Response for Approximate Inference
Linear Response for Approximate Inference Max Welling Department of Computer Science University of Toronto Toronto M5S 3G4 Canada welling@cs.utoronto.ca Yee Whye Teh Computer Science Division University
More informationLower Bounds on the Graphical Complexity of Finite-Length LDPC Codes
Lower Bounds on the Graphical Complexity of Finite-Length LDPC Codes Igal Sason Department of Electrical Engineering Technion - Israel Institute of Technology Haifa 32000, Israel 2009 IEEE International
More informationDiscrete Markov Random Fields the Inference story. Pradeep Ravikumar
Discrete Markov Random Fields the Inference story Pradeep Ravikumar Graphical Models, The History How to model stochastic processes of the world? I want to model the world, and I like graphs... 2 Mid to
More informationDeep Poisson Factorization Machines: a factor analysis model for mapping behaviors in journalist ecosystem
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationOn the Convergence of the Convex Relaxation Method and Distributed Optimization of Bethe Free Energy
On the Convergence of the Convex Relaxation Method and Distributed Optimization of Bethe Free Energy Ming Su Department of Electrical Engineering and Department of Statistics University of Washington,
More informationLearning unbelievable marginal probabilities
Learning unbelievable marginal probabilities Xaq Pitkow Department of Brain and Cognitive Science University of Rochester Rochester, NY 14607 xaq@post.harvard.edu Yashar Ahmadian Center for Theoretical
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More information5. Density evolution. Density evolution 5-1
5. Density evolution Density evolution 5-1 Probabilistic analysis of message passing algorithms variable nodes factor nodes x1 a x i x2 a(x i ; x j ; x k ) x3 b x4 consider factor graph model G = (V ;
More informationBregman Divergences for Data Mining Meta-Algorithms
p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,
More informationGenerative and Discriminative Approaches to Graphical Models CMSC Topics in AI
Generative and Discriminative Approaches to Graphical Models CMSC 35900 Topics in AI Lecture 2 Yasemin Altun January 26, 2007 Review of Inference on Graphical Models Elimination algorithm finds single
More informationMachine Learning Basics III
Machine Learning Basics III Benjamin Roth CIS LMU München Benjamin Roth (CIS LMU München) Machine Learning Basics III 1 / 62 Outline 1 Classification Logistic Regression 2 Gradient Based Optimization Gradient
More informationPerformance Analysis and Code Optimization of Low Density Parity-Check Codes on Rayleigh Fading Channels
Performance Analysis and Code Optimization of Low Density Parity-Check Codes on Rayleigh Fading Channels Jilei Hou, Paul H. Siegel and Laurence B. Milstein Department of Electrical and Computer Engineering
More informationProbability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014
Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions
More informationDEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY
DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain
More informationSelf-Organization by Optimizing Free-Energy
Self-Organization by Optimizing Free-Energy J.J. Verbeek, N. Vlassis, B.J.A. Kröse University of Amsterdam, Informatics Institute Kruislaan 403, 1098 SJ Amsterdam, The Netherlands Abstract. We present
More informationWalk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation Jason K. Johnson, Dmitry M. Malioutov and Alan S. Willsky Department of Electrical Engineering and Computer Science Massachusetts Institute
More informationInformation Geometry on Hierarchy of Probability Distributions
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 5, JULY 2001 1701 Information Geometry on Hierarchy of Probability Distributions Shun-ichi Amari, Fellow, IEEE Abstract An exponential family or mixture
More informationStatistical Approaches to Learning and Discovery
Statistical Approaches to Learning and Discovery Graphical Models Zoubin Ghahramani & Teddy Seidenfeld zoubin@cs.cmu.edu & teddy@stat.cmu.edu CALD / CS / Statistics / Philosophy Carnegie Mellon University
More informationExpectation Maximization
Expectation Maximization Bishop PRML Ch. 9 Alireza Ghane c Ghane/Mori 4 6 8 4 6 8 4 6 8 4 6 8 5 5 5 5 5 5 4 6 8 4 4 6 8 4 5 5 5 5 5 5 µ, Σ) α f Learningscale is slightly Parameters is slightly larger larger
More informationLearning Binary Classifiers for Multi-Class Problem
Research Memorandum No. 1010 September 28, 2006 Learning Binary Classifiers for Multi-Class Problem Shiro Ikeda The Institute of Statistical Mathematics 4-6-7 Minami-Azabu, Minato-ku, Tokyo, 106-8569,
More informationMachine Learning 4771
Machine Learning 4771 Instructor: Tony Jebara Topic 16 Undirected Graphs Undirected Separation Inferring Marginals & Conditionals Moralization Junction Trees Triangulation Undirected Graphs Separation
More informationIntroduction to Low-Density Parity Check Codes. Brian Kurkoski
Introduction to Low-Density Parity Check Codes Brian Kurkoski kurkoski@ice.uec.ac.jp Outline: Low Density Parity Check Codes Review block codes History Low Density Parity Check Codes Gallager s LDPC code
More informationArtificial Intelligence
Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:
More informationFactor Graphs and Message Passing Algorithms Part 1: Introduction
Factor Graphs and Message Passing Algorithms Part 1: Introduction Hans-Andrea Loeliger December 2007 1 The Two Basic Problems 1. Marginalization: Compute f k (x k ) f(x 1,..., x n ) x 1,..., x n except
More informationVariational Message Passing. By John Winn, Christopher M. Bishop Presented by Andy Miller
Variational Message Passing By John Winn, Christopher M. Bishop Presented by Andy Miller Overview Background Variational Inference Conjugate-Exponential Models Variational Message Passing Messages Univariate
More informationMaximization of the information divergence from the multinomial distributions 1
aximization of the information divergence from the multinomial distributions Jozef Juríček Charles University in Prague Faculty of athematics and Physics Department of Probability and athematical Statistics
More informationInference. Data. Model. Variates
Data Inference Variates Model ˆθ (,..., ) mˆθn(d) m θ2 M m θ1 (,, ) (,,, ) (,, ) α = :=: (, ) F( ) = = {(, ),, } F( ) X( ) = Γ( ) = Σ = ( ) = ( ) ( ) = { = } :=: (U, ) , = { = } = { = } x 2 e i, e j
More informationMaximum likelihood estimation
Maximum likelihood estimation Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Maximum likelihood estimation 1/26 Outline 1 Statistical concepts 2 A short review of convex analysis and optimization
More informationCS 591, Lecture 2 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.
More informationSeries 7, May 22, 2018 (EM Convergence)
Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18
More informationVariational Autoencoder
Variational Autoencoder Göker Erdo gan August 8, 2017 The variational autoencoder (VA) [1] is a nonlinear latent variable model with an efficient gradient-based training procedure based on variational
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationNishant Gurnani. GAN Reading Group. April 14th, / 107
Nishant Gurnani GAN Reading Group April 14th, 2017 1 / 107 Why are these Papers Important? 2 / 107 Why are these Papers Important? Recently a large number of GAN frameworks have been proposed - BGAN, LSGAN,
More informationLow-Density Parity-Check Codes
Department of Computer Sciences Applied Algorithms Lab. July 24, 2011 Outline 1 Introduction 2 Algorithms for LDPC 3 Properties 4 Iterative Learning in Crowds 5 Algorithm 6 Results 7 Conclusion PART I
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More information