Information Geometric view of Belief Propagation


1 Information Geometric view of Belief Propagation Yunshu Liu

2 References: [1] Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Stochastic Reasoning, Free Energy, and Information Geometry, Neural Computation, vol. 16, 2004. [2] Shiro Ikeda, Toshiyuki Tanaka and Shun-ichi Amari, Information Geometry of Turbo and Low-Density Parity-Check Codes, IEEE Transactions on Information Theory, vol. 50, no. 6, June 2004. [3] Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, The MIT Press, 2012.

3 Outline Information Geometry Examples of probabilistic manifold Important Submanifolds Information projection Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation

4 Probability distribution viewed as manifold Outline for Information Geometry Examples of probabilistic manifold Important Submanifolds Information projection

5 Manifold Manifold S Manifold: a set with a coordinate system, a one-to-one mapping from S to R^n, which is assumed to look locally like an open subset of R^n. Elements of the set (points): points in R^n, probability distributions, linear systems. Figure: A coordinate system ξ for a manifold S

6 Example of manifold Parametrization of color models. Parametrization: a map of colors into R^3. Examples:

7 Example of manifold Parametrization of Gaussian distribution S = {p(x; µ, σ)}, p(x; µ, σ) = (1/(√(2π) σ)) exp{ −(x − µ)² / (2σ²) }. Figure: densities p(x) in S as µ and σ vary.

8 Example of manifold Exponential family p(x; θ) = exp[ c(x) + Σ_{i=1}^n θ_i F_i(x) − ψ(θ) ]. [θ_i] are called the natural parameters (coordinates), and ψ is the potential function for [θ_i], which can be calculated as ψ(θ) = log ∫ exp[ c(x) + Σ_{i=1}^n θ_i F_i(x) ] dx. The exponential families include many of the most common distributions, including the normal, exponential, gamma, beta, Dirichlet, Bernoulli, binomial, multinomial, Poisson, and so on.
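To make the definition concrete, here is a minimal numerical sketch (not from the slides) for a finite sample space, where the integral in ψ(θ) becomes a sum; the sample space, the carrier term c(x) and the features F_i(x) are illustrative choices.

```python
import numpy as np

# Sketch of a discrete exponential family p(x; theta) = exp[c(x) + sum_i theta_i F_i(x) - psi(theta)].
xs = np.array([-1.0, 0.0, 1.0])          # finite sample space, so the integral is a sum
c = np.zeros_like(xs)                    # carrier term c(x) = 0 (illustrative)
F = np.stack([xs, xs**2])                # two features: F_1(x) = x, F_2(x) = x^2

def psi(theta):
    """Potential function psi(theta) = log sum_x exp[c(x) + theta . F(x)]."""
    return np.log(np.sum(np.exp(c + theta @ F)))

def p(theta):
    """Normalized probabilities p(x; theta) over the finite sample space."""
    return np.exp(c + theta @ F - psi(theta))

theta = np.array([0.3, -0.5])
print(p(theta), p(theta).sum())          # the probabilities sum to 1
```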

9 Example of two binary discrete random variables as Exponential families Let x = {x_1, x_2}, x_i = −1 or 1, and consider the family of all the probability distributions S = {p(x) | p(x) > 0, Σ_i p(x = C_i) = 1}. Outcomes: C_0 = {−1, −1}, C_1 = {−1, 1}, C_2 = {1, −1}, C_3 = {1, 1}. Parameterization: p(x = C_i) = η_i for i = 1, 2, 3, and p(x = C_0) = 1 − Σ_{j=1}^3 η_j. η = (η_1, η_2, η_3) is called the m-coordinate of S. Figure: the simplex of η with vertices (0 0 0), (1 0 0), (0 1 0), (0 0 1).

10 Example of two binary discrete random variables as Exponential families A new coordinate {θ_i}, which is defined as θ_i = ln[ η_i / (1 − Σ_{j=1}^3 η_j) ] for i = 1, 2, 3. (1) θ = (θ_1, θ_2, θ_3) is called the e-coordinate of S. Figure: the simplex in the η coordinate maps onto the whole space in the θ coordinate.

11 Example of two binary discrete random variables as Exponential families Introduce new random variables: δ_{C_i}(x) = 1 if x = C_i, and 0 otherwise, and let ψ = −ln p(x = C_0). (2) Then S is an exponential family with natural parameters {θ_i}: p(x, θ) = exp( Σ_{i=1}^3 θ_i δ_{C_i}(x) − ψ(θ) ). (3)
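A small sketch (assuming NumPy; the numbers are arbitrary) of the two coordinate systems of this example: it converts m-coordinates η to e-coordinates θ via Eq. (1) and recovers the four probabilities via Eq. (3).

```python
import numpy as np

# m- and e-coordinates for two binary variables x = (x1, x2), x_i in {-1, +1},
# with outcomes ordered C0, C1, C2, C3 as on the slide and eta_i = p(x = C_i), i = 1, 2, 3.

def eta_to_theta(eta):
    """e-coordinates theta_i = ln( eta_i / (1 - sum_j eta_j) ), cf. Eq. (1)."""
    eta = np.asarray(eta, dtype=float)
    return np.log(eta / (1.0 - eta.sum()))

def theta_to_p(theta):
    """Probabilities (p(C0), ..., p(C3)) from Eq. (3): p(x) propto exp(sum_i theta_i delta_{C_i}(x))."""
    unnorm = np.concatenate(([1.0], np.exp(theta)))   # exp(0) for C0, exp(theta_i) for C_i
    return unnorm / unnorm.sum()

eta = np.array([0.2, 0.3, 0.1])                       # an interior point of the simplex
print(theta_to_p(eta_to_theta(eta)))                  # recovers (0.4, 0.2, 0.3, 0.1)
```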

12 Probability distribution viewed as manifold Outline for Information Geometry Examples of probabilistic manifold Important Submanifolds Information projection

13 Submanifolds Submanifolds Definition: a submanifold M of a manifold S is a subset of S which itself has the structure of a manifold. An open subset of an n-dimensional manifold forms an n-dimensional submanifold. One way to construct an m (< n)-dimensional submanifold: fix n − m coordinates. Examples:

14 e-autoparallel submanifold and m-autoparallel submanifold e-autoparallel submanifold A subfamily E is said to be an e-autoparallel submanifold of S if for some coordinate λ there exist a matrix A and a vector b such that, for the e-coordinate θ of S, θ(p) = Aλ(p) + b (∀ p ∈ E). (4) If λ is one-dimensional, it is called an e-geodesic. An e-autoparallel submanifold of S is a hyperplane in the e-coordinate; an e-geodesic of S is a straight line in the e-coordinate. Figure: E as a hyperplane in the θ coordinate.

15 e-autoparallel submanifold and m-autoparallel submanifold m-autoparallel submanifold A subfamily M is said to be an m-autoparallel submanifold of S if for some coordinate γ there exist a matrix A and a vector b such that, for the m-coordinate η of S, η(p) = Aγ(p) + b (∀ p ∈ M). (5) If γ is one-dimensional, it is called an m-geodesic.

16 e-autoparallel submanifold and m-autoparallel submanifold Two independent bits The set of all product distributions, which is defined as E = {p(x) | p(x) = p_{X1}(x_1) p_{X2}(x_2)}. (6) Let p_{X1}(x_1 = 1) = p_1 and p_{X2}(x_2 = 1) = p_2; then p_{X1}(x_1 = −1) = 1 − p_1 and p_{X2}(x_2 = −1) = 1 − p_2. Parameterization: define λ_i = (1/2) ln[ p_i / (1 − p_i) ] for i = 1, 2, and ψ(λ) = Σ_{i=1,2} ln(e^{λ_i} + e^{−λ_i}), so that p(x, λ) = e^{λ_1 x_1 − ψ(λ_1)} e^{λ_2 x_2 − ψ(λ_2)} = e^{λ·x − ψ(λ)}. (7) Thus E is an exponential family with natural parameters {λ_i}.

17 e-autoparallel submanifold and m-autoparallel submanifold Dual parameters: expectations γ = (γ_1, γ_2)ᵀ = (∂ψ(λ)/∂λ_1, ∂ψ(λ)/∂λ_2)ᵀ = (2p_1 − 1, 2p_2 − 1)ᵀ = E_p(x). Properties E is an e-autoparallel submanifold (a plane in the θ coordinate) of S, but not an m-autoparallel submanifold (not a plane in the η coordinate): θ(p) = (θ_1, θ_2, θ_3)ᵀ = (2λ_2, 2λ_1, 2λ_1 + 2λ_2)ᵀ = A λ(p).
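A quick numerical check (not from the slides; assumes NumPy and an arbitrary λ) that the dual parameters are the expectations: build p(x, λ) over the four outcomes and compare E_p[x_i] with tanh(λ_i) = 2p_i − 1 = ∂ψ/∂λ_i.

```python
import numpy as np
from itertools import product

# The product family E of two +/-1 bits: p(x, lambda) = exp(lambda . x - psi(lambda)).
X = np.array(list(product([-1, 1], repeat=2)))   # the four outcomes (x1, x2)

def p_lambda(lam):
    unnorm = np.exp(X @ lam)
    return unnorm / unnorm.sum()

lam = np.array([0.7, -0.3])
gamma = p_lambda(lam) @ X                        # E_p[x] = sum_x x p(x)
print(gamma)                                     # the dual parameters gamma ...
print(np.tanh(lam))                              # ... equal d psi / d lambda = tanh(lambda) = 2 p_i - 1
```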

18 e-autoparallel submanifold and m-autoparallel submanifold Two independent bits: e-autoparallel submanifold but not m-autoparallel submanifold. Figures: E in the θ coordinate; E in the η coordinate.

19 Probability distribution viewed as manifold Outline for Information Geometry Examples of probabilistic manifold Important Submanifolds Information projection

20 Projections and Pythagorean Theorem KL-divergence On the manifold of probability mass functions for random variables taking values in the set X, we can also define the Kullback–Leibler divergence, or relative entropy, measured in bits, according to D(p_X ‖ q_X) = Σ_{x∈X} p_X(x) log( p_X(x) / q_X(x) ). (8) Figure: two points p_X and q_X in S.
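A direct implementation of Eq. (8) (a sketch assuming NumPy; base-2 logs give the divergence in bits, and terms with p_X(x) = 0 contribute zero):

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits between two pmfs on the same finite set."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0                              # 0 * log(0/q) = 0 by convention
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

print(kl([0.4, 0.2, 0.3, 0.1], [0.25, 0.25, 0.25, 0.25]))
```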

21 Projections and Pythagorean Theorem Information Projection Let p_X be a point in S and let E be an e-autoparallel submanifold of S. We want to find the point Π_E(p_X) in E that minimizes the KL-divergence: D(p_X ‖ Π_E(p_X)) = min_{q_X ∈ E} D(p_X ‖ q_X). (9) The m-geodesic connecting p_X and Π_E(p_X) is orthogonal to E at Π_E(p_X). In geometry, orthogonal usually means a right angle; in information geometry, orthogonal usually means certain values are uncorrelated. Figure: the m-geodesic from p_X meets the e-autoparallel submanifold E orthogonally at Π_E(p_X).

22 Projections and Pythagorean Theorem Pythagorean relation For q_X ∈ E, we have the following Pythagorean relation: D(p_X ‖ q_X) = D(p_X ‖ Π_E(p_X)) + D(Π_E(p_X) ‖ q_X), ∀ q_X ∈ E. (10) Figure: p_X, its m-projection Π_E(p_X) along the m-geodesic, and q_X on the e-autoparallel submanifold E.
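The relation can be checked numerically. In the sketch below (assuming NumPy; the joint table p and the point q ∈ E are arbitrary choices), E is the product family of two binary variables, for which the m-projection Π_E(p) is the product of the marginals of p; the two sides of Eq. (10) then agree for any product distribution q.

```python
import numpy as np

def kl(p, q):
    return np.sum(p * np.log(p / q))

p = np.array([[0.30, 0.10],                      # joint pmf p(x1, x2); rows: x1, columns: x2
              [0.15, 0.45]])
proj = np.outer(p.sum(axis=1), p.sum(axis=0))    # Pi_E(p): product of the marginals of p

q = np.outer([0.6, 0.4], [0.3, 0.7])             # an arbitrary member of E
print(kl(p, q))                                  # D(p || q)
print(kl(p, proj) + kl(proj, q))                 # D(p || Pi_E(p)) + D(Pi_E(p) || q): same value
```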

23 Projection in θ coordinate

24 Projection in η coordinate

25 Information Geometry of Belief Propagation Outline for Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation

26 Belief Propagation Belief Propagation/Sum-product algorithm The Belief Propagation algorithm uses a message passing scheme that exploits conditional independence properties to change the order of sums and products: xy + xz = x(y + z). Figure: expression trees of xy + xz and x(y + z), where the leaf nodes represent variables and internal nodes represent arithmetic operators (addition, multiplication, negation, etc.).

27 Belief Propagation Message passing on a factor graph: Variable x to factor f: m_{x→f}(x) = Π_{k ∈ nb(x)\f} m_{f_k→x}(x). nb(x) denotes the neighboring factor nodes of x. If nb(x)\f = ∅ then m_{x→f} = 1. Figure: messages m_{f_1→x}, …, m_{f_K→x} arriving at x and the outgoing message m_{x→f}.

28 Belief Propagation Message passing on a factor graph: Factor f to variable x: m_{f→x}(x) = Σ_{x_1, …, x_M} f(x, x_1, …, x_M) Π_{m ∈ nb(f)\x} m_{x_m→f}(x_m). nb(f) denotes the neighboring variable nodes of f. If nb(f)\x = ∅ then m_{f→x} = f(x). Figure: messages m_{x_1→f}, …, m_{x_M→f} arriving at f and the outgoing message m_{f→x}.
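The two update rules can be written compactly for pairwise factors. The following sketch (an illustrative restriction to factors given as 2-D tables F[x, y], assuming NumPy; none of it is from the slides) implements both rules:

```python
import numpy as np

def msg_var_to_factor(incoming, n_states):
    """Variable-to-factor message: product of all incoming factor messages except the target's;
    the empty product is the all-ones message."""
    out = np.ones(n_states)
    for m in incoming:
        out = out * m
    return out

def msg_factor_to_var(F, msg_from_other_var):
    """Factor-to-variable message for a pairwise table F[x, y]: sum_y F[x, y] m_{y->f}(y)."""
    return F @ msg_from_other_var

F = np.array([[1.2, 0.3],
              [0.3, 1.2]])
m_y_to_f = msg_var_to_factor([], n_states=2)   # a leaf variable sends the all-ones message
print(msg_factor_to_var(F, m_y_to_f))          # -> [1.5, 1.5]
```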

29 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Serial protocol: Send messages up to the root and back. Figure: a tree factor graph with variables x_1, x_2, x_3, x_4 and factors f_a (between x_1 and x_2), f_b (between x_2 and x_3), f_c (between x_2 and x_4).

30 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Down - up:
m_{x_1→f_a}(x_1) = 1
m_{f_a→x_2}(x_2) = Σ_{x_1} f_a(x_1, x_2)
m_{x_4→f_c}(x_4) = 1
m_{f_c→x_2}(x_2) = Σ_{x_4} f_c(x_2, x_4)
m_{x_2→f_b}(x_2) = m_{f_a→x_2}(x_2) m_{f_c→x_2}(x_2)
m_{f_b→x_3}(x_3) = Σ_{x_2} f_b(x_2, x_3) m_{x_2→f_b}(x_2)

31 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Up - down:
m_{x_3→f_b}(x_3) = 1
m_{f_b→x_2}(x_2) = Σ_{x_3} f_b(x_2, x_3)
m_{x_2→f_a}(x_2) = m_{f_b→x_2}(x_2) m_{f_c→x_2}(x_2)
m_{f_a→x_1}(x_1) = Σ_{x_2} f_a(x_1, x_2) m_{x_2→f_a}(x_2)
m_{x_2→f_c}(x_2) = m_{f_a→x_2}(x_2) m_{f_b→x_2}(x_2)
m_{f_c→x_4}(x_4) = Σ_{x_2} f_c(x_2, x_4) m_{x_2→f_c}(x_2)

32 Serial protocol of Belief Propagation Message passing on a tree structured factor graph Calculating p(x_2):
p(x_2) = (1/Z) m_{f_a→x_2}(x_2) m_{f_b→x_2}(x_2) m_{f_c→x_2}(x_2)
       = (1/Z) {Σ_{x_1} f_a(x_1, x_2)} {Σ_{x_3} f_b(x_2, x_3)} {Σ_{x_4} f_c(x_2, x_4)}
       = (1/Z) Σ_{x_1} Σ_{x_3} Σ_{x_4} f_a(x_1, x_2) f_b(x_2, x_3) f_c(x_2, x_4)
       = Σ_{x_1} Σ_{x_3} Σ_{x_4} p(x)
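Below is a sketch of this serial schedule on exactly this tree (assuming NumPy; the variables are taken binary and the factor tables f_a, f_b, f_c are random illustrative choices), with the resulting belief at x_2 compared against the brute-force marginal of the joint:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
fa, fb, fc = rng.uniform(0.1, 1.0, (3, 2, 2))   # fa[x1,x2], fb[x2,x3], fc[x2,x4]

# Down-up sweep toward the root x3 (leaf-variable messages are all ones).
m_fa_x2 = fa.sum(axis=0)                        # sum_{x1} f_a(x1, x2)
m_fc_x2 = fc.sum(axis=1)                        # sum_{x4} f_c(x2, x4)
m_x2_fb = m_fa_x2 * m_fc_x2
m_fb_x3 = fb.T @ m_x2_fb                        # sum_{x2} f_b(x2, x3) m_{x2->f_b}(x2)

# Up-down sweep (only the message needed for p(x2)).
m_fb_x2 = fb.sum(axis=1)                        # sum_{x3} f_b(x2, x3)

# Belief at x2: product of incoming factor-to-variable messages, normalized.
belief_x2 = m_fa_x2 * m_fb_x2 * m_fc_x2
belief_x2 /= belief_x2.sum()

# Brute-force marginal of x2 from the joint p(x) propto f_a f_b f_c, for comparison.
joint = np.zeros((2, 2, 2, 2))
for x1, x2, x3, x4 in product(range(2), repeat=4):
    joint[x1, x2, x3, x4] = fa[x1, x2] * fb[x2, x3] * fc[x2, x4]
marg_x2 = joint.sum(axis=(0, 2, 3))
print(belief_x2, marg_x2 / marg_x2.sum())       # the two agree on a tree
```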

33 Parallel protocol of Belief Propagation Message passing on general graphs with loops Parallel protocol: Initialize all messages. All nodes receive messages from their neighbors in parallel and update their belief states. Send new messages back out to their neighbors. Repeat the above two steps until convergence.
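A compact sketch of this flooding schedule for a pairwise model (unary tables phi[i] and pairwise tables psi[(i, j)]; the 3-cycle, the potentials, and the fixed iteration count are illustrative assumptions, not from the slides):

```python
import numpy as np

def loopy_bp(phi, psi, edges, n_iter=50):
    n, k = phi.shape
    # messages m[(i, j)]: message sent from variable i to variable j along edge {i, j}
    msgs = {(i, j): np.ones(k) for (a, b) in edges for (i, j) in [(a, b), (b, a)]}
    for _ in range(n_iter):
        new = {}
        for (i, j) in msgs:
            # product of the unary term at i and all incoming messages except the one from j
            prod = phi[i].copy()
            for (s, t) in msgs:
                if t == i and s != j:
                    prod = prod * msgs[(s, i)]
            pair = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T
            m = pair.T @ prod                    # sum over x_i of psi(x_i, x_j) * prod(x_i)
            new[(i, j)] = m / m.sum()            # normalize for numerical stability
        msgs = new                               # synchronous (parallel) update
    beliefs = phi.copy()
    for (i, j) in msgs:
        beliefs[j] = beliefs[j] * msgs[(i, j)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

# A 3-cycle of binary variables, so the graph has a loop.
phi = np.array([[0.6, 0.4], [0.5, 0.5], [0.3, 0.7]])
edges = [(0, 1), (1, 2), (0, 2)]
psi = {e: np.array([[1.2, 0.8], [0.8, 1.2]]) for e in edges}
print(loopy_bp(phi, psi, edges))
```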

34 Information Geometry of Belief Propagation Outline for Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation

35 Information Geometry of Graphical structure Important Graphs (Undirected) for Belief Propagation A: Belief graph with no edges (all n nodes are independent) B: Graph with a single edge r C: Graph of the true distribution with n nodes and L edges

36 Information Geometry of Graphical structure Important Graphs (Undirected) for Belief Propagation
A: M_0 = { p_0(x, θ) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i − ψ_0(θ) ] | θ ∈ R^n }
B: M_r = { p_r(x, ζ_r) = exp[ Σ_{i=1}^n h_i x_i + s_r c_r(x) + Σ_{i=1}^n ζ_{ri} x_i − ψ_r(ζ_r) ] | ζ_r ∈ R^n }
C: S = { p(x, θ, s) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i + Σ_{r=1}^L s_r c_r(x) − ψ(θ, s) ] | θ ∈ R^n, s ∈ R^L }

37 Information Geometry of Graphical structure Properties of M_0, M_r M_0 is an e-autoparallel submanifold of S, with e-affine coordinate θ. M_r for r = 1, …, L are all e-autoparallel submanifolds of S, with e-affine coordinate ζ_r. A distribution in M_r includes only one nonlinear interaction term s_r c_r(x), corresponding to the single edge in B.

38 Information Geometry of Belief Propagation Outline for Information Geometry of Belief Propagation Belief Propagation/Sum-product algorithms Information Geometry of Graphical structure Information-Geometrical view of Belief Propagation

39 Information-Geometrical view of Belief Propagation Problem setup for belief propagation Suppose we have the true distribution q(x) ∈ S and want to calculate q_0(x) = π_{M_0}∘q(x), the m-projection of q(x) onto M_0; the goal of belief propagation is to find a good approximation q̂(x) ∈ M_0 of q(x).
M_0 = { p_0(x, θ) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i − ψ_0(θ) ] | θ ∈ R^n }
M_r = { p_r(x, ζ_r) = exp[ Σ_{i=1}^n h_i x_i + s_r c_r(x) + Σ_{i=1}^n ζ_{ri} x_i − ψ_r(ζ_r) ] | ζ_r ∈ R^n }
S = { p(x, θ, s) = exp[ Σ_{i=1}^n h_i x_i + Σ_{i=1}^n θ_i x_i + Σ_{r=1}^L s_r c_r(x) − ψ(θ, s) ] | θ ∈ R^n, s ∈ R^L }

40 Geometrical Belief Propagation Algorithm
1) Put t = 0, and start with initial guesses ζ_r^0, for example, ζ_r^0 = 0.
2) For t = 0, 1, 2, …, m-project p_r(x, ζ_r^t) to M_0 and obtain the linearized version of s_r c_r(x): ξ_r^{t+1} = π_{M_0}∘p_r(x, ζ_r^t) − ζ_r^t.
3) Summarize all the effects of s_r c_r(x), to give θ^{t+1} = Σ_{r=1}^L ξ_r^{t+1}.
4) Update ζ_r by ζ_r^{t+1} = Σ_{r'≠r} ξ_{r'}^{t+1} = θ^{t+1} − ξ_r^{t+1}.
5) Repeat 2-4 until convergence.
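A brute-force sketch of steps 1)-5) for a small binary model (assuming NumPy; the graph, h and s below are illustrative, and all projections are computed by enumerating x ∈ {−1, +1}^n, so this illustrates the geometry rather than an efficient implementation). The m-projection onto M_0 matches the expectation of x, so its θ-coordinate is arctanh(E[x]) − h.

```python
import numpy as np
from itertools import product

n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]         # a 4-cycle, so the graph has a loop
L = len(edges)
rng = np.random.default_rng(1)
h = rng.normal(0, 0.3, n)
s = rng.normal(0, 0.4, L)                        # q(x) propto exp(h.x + sum_r s_r x_i x_j)

X = np.array(list(product([-1, 1], repeat=n)))   # all 2^n configurations

def expectations(logp):
    """E[x] under the distribution proportional to exp(logp)."""
    p = np.exp(logp - logp.max())
    p /= p.sum()
    return p @ X

def project_to_M0(logp):
    """theta-coordinate of the m-projection onto M_0 (match E[x]): arctanh(E[x]) - h."""
    return np.arctanh(expectations(logp)) - h

def log_pr(zeta_r, r):
    """log p_r(x, zeta_r) up to the normalizer: h.x + s_r c_r(x) + zeta_r.x."""
    i, j = edges[r]
    return X @ (h + zeta_r) + s[r] * X[:, i] * X[:, j]

zeta = np.zeros((L, n))                          # step 1: zeta_r^0 = 0
for t in range(200):                             # steps 2)-5)
    xi = np.array([project_to_M0(log_pr(zeta[r], r)) - zeta[r] for r in range(L)])
    theta = xi.sum(axis=0)                       # step 3
    zeta = theta - xi                            # step 4: zeta_r = theta - xi_r
print("BP marginals E[x_i]   :", np.tanh(h + theta))
print("exact marginals E[x_i]:", expectations(X @ h + sum(s[r] * X[:, i] * X[:, j]
                                                          for r, (i, j) in enumerate(edges))))
```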

41 Geometrical Belief Propagation Algorithm Analysis Given q(x), let ζ_r be the current solution which M_r believes to give a good approximation of q(x). We then project it to M_0, giving the solution θ_r in M_0 having the same expectation: p_0(x, θ_r) = π_{M_0}∘p_r(x, ζ_r). Since θ_r includes both the effect of the single nonlinear term s_r c_r(x) specific to M_r and that of the linear term ζ_r, we can use ξ_r = θ_r − ζ_r to represent the effect of the single nonlinear term s_r c_r(x); we consider it the linearized version of s_r c_r(x).

42 Geometrical Belief Propagation Algorithm Analysis: continued Once M_r knows the linearized version of s_r c_r(x) is ξ_r, it broadcasts ξ_r to all the other models M_0 and M_{r'} (r' ≠ r). Receiving these messages from all M_r, M_0 guesses that the equivalent linear term of Σ_{r=1}^L s_r c_r(x) is θ = Σ_{r=1}^L ξ_r. Since M_r in turn receives messages ξ_{r'} from all other M_{r'} (r' ≠ r), M_r uses them to form a new ζ_r = Σ_{r'≠r} ξ_{r'} = θ − ξ_r. The whole process is repeated until convergence.

43 Geometrical Belief Propagation Algorithm Analysis: continued Assume the algorithm has converged to {ζ_r*} and θ*. The estimate of q(x) ∈ S in M_0 is p_0(x, θ*) (which may not be the exact solution). We write p_0(x, θ*), p_r(x, ζ_r*) and q(x) as:
p_0(x, θ*) = exp[ Σ_{i=1}^n h_i x_i + Σ_{r=1}^L ξ_r*·x − ψ_0 ]
p_r(x, ζ_r*) = exp[ Σ_{i=1}^n h_i x_i + s_r c_r(x) + Σ_{r'≠r} ξ_{r'}*·x − ψ_r ]
q(x) = exp[ Σ_{i=1}^n h_i x_i + Σ_{r=1}^L s_r c_r(x) − ψ ]
The whole idea of the algorithm is to approximate s_r c_r(x) by ξ_r*·x in M_r; then the independent distribution p_0(x, θ*) ∈ M_0 integrates all the information.

44 Information-Geometrical view of Belief Propagation Convergence Analysis The convergence point satisfies the two conditions: 1. m-condition: θ* = π_{M_0}∘p_r(x, ζ_r*), for all r. 2. e-condition: θ* = (1/(L−1)) Σ_{r=1}^L ζ_r*. The m-condition and e-condition of the algorithm: the e-condition is satisfied at each iteration of the BP algorithm; the m-condition is satisfied at the equilibrium of the BP algorithm.

45 Information-Geometrical view of Belief Propagation Convergence Analysis Let us consider the m-autoparallel submanifold M connecting p_0(x, θ*) and p_r(x, ζ_r*), r = 1, …, L:
M = { p(x) | p(x) = d_0 p_0(x, θ*) + Σ_{r=1}^L d_r p_r(x, ζ_r*); d_0 + Σ_{r=1}^L d_r = 1 }
We also consider the e-autoparallel submanifold E connecting p_0(x, θ*) and p_r(x, ζ_r*):
E = { p(x) | log p(x) = v_0 log p_0(x, θ*) + Σ_{r=1}^L v_r log p_r(x, ζ_r*) − ψ; v_0 + Σ_{r=1}^L v_r = 1 }

46 Information-Geometrical view of Belief Propagation Convergence Analysis M holds the expectation of x unchanged, thus an equivalent definition is given by: M = { p(x) | p(x) ∈ S, Σ_x x p(x) = Σ_x x p_0(x, θ*) = η_0(θ*) }. Thus the m-condition implies that M intersects all of M_0 and M_r orthogonally. The e-condition implies that E includes the true distribution q(x), since if we set v_0 = −(L − 1) and v_r = 1 for all r, using the e-condition we get q(x) = C p_0(x, θ*)^{−(L−1)} Π_{r=1}^L p_r(x, ζ_r*) ∈ E, where C is the normalization factor.

47 Information-Geometrical view of Belief Propagation Convergence Analysis The proposed BP algorithm searches for θ* and ζ_r* until both the m-condition and e-condition are satisfied. We know E includes q(x); if M includes q(x), its m-projection onto M_0 is p_0(x, θ*), and θ* gives the true solution; if M does not include q(x), the algorithm only gives an approximation of the true distribution. Thus we have the following theorem: Theorem: When the underlying graph is a tree, both E and M include q(x), and the algorithm gives the exact solution.

48 Information-Geometrical view of Belief Propagation Example of L = 2, where θ = ζ_1 + ζ_2 = ξ_1 + ξ_2, ζ_1 = ξ_2 and ζ_2 = ξ_1.
1) Put t = 0, and start with an initial guess ζ_2^0, for example, ζ_2^0 = 0.
2) For t = 0, 1, 2, …, m-project p_r(x, ζ_r^t) to M_0 and obtain the linearized version of s_r c_r(x): ζ_1^{t+1} = π_{M_0}∘p_2(x, ζ_2^t) − ζ_2^t, ζ_2^{t+1} = π_{M_0}∘p_1(x, ζ_1^{t+1}) − ζ_1^{t+1}.
3) Summarize all the effects of s_r c_r(x), to give θ^{t+1} = ζ_1^{t+1} + ζ_2^{t+1}.
4) Repeat 2)-3) until convergence, and we get θ* = ζ_1* + ζ_2*.

49 Information-Geometrical view of Belief Propagation Figure: Information-geometrical view of the BP algorithm with L = 2 (from Ikeda et al., Information Geometry of Turbo and LDPC Codes, Fig. 4).

50 Information-Geometrical view of Belief Propagation Analysis of the example with L = 2 The convergence point satisfies the two conditions: 1. m-condition: θ* = π_{M_0}∘p_1(x, ζ_1*) = π_{M_0}∘p_2(x, ζ_2*). 2. e-condition: θ* = ζ_1* + ζ_2* = ξ_1* + ξ_2*. Figure: Equimarginal submanifold M*.

51 Information-Geometrical view of Belief Propagation

52 Information-Geometrical view of CCCP-Bethe Information Geometric Convex Concave Computational Procedure (CCCP-Bethe)
Inner loop: Given θ^t, calculate {ζ_r^{t+1}} by solving π_{M_0}∘p_r(x, ζ_r^{t+1}) = Lθ^t − Σ_{r=1}^L ζ_r^{t+1}, r = 1, …, L, i.e., every p_r(x, ζ_r^{t+1}) has the same m-projection onto M_0 as p_0(x, θ^{t+1}).
Outer loop: Given the set {ζ_r^{t+1}} from the inner loop, update θ^{t+1} = Lθ^t − Σ_{r=1}^L ζ_r^{t+1}.
Notes: CCCP enforces the m-condition at each iteration; the e-condition is satisfied only at the convergent point, which means we search for a new θ^{t+1} until the e-condition is satisfied.
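A rough sketch of these two loops on the same kind of toy binary model as before (assuming NumPy and SciPy; the graph and parameters are illustrative, the m-projections are brute-forced over {−1, +1}^n, and the inner-loop system is simply handed to a generic root finder, which is not how one would implement CCCP at scale):

```python
import numpy as np
from itertools import product
from scipy.optimize import root

n, edges = 3, [(0, 1), (1, 2), (0, 2)]
L = len(edges)
h = np.array([0.2, -0.1, 0.3])
s = np.array([0.5, -0.4, 0.3])                   # q(x) propto exp(h.x + sum_r s_r x_i x_j)
X = np.array(list(product([-1, 1], repeat=n)))

def proj_M0(logp):
    """theta-coordinate of the m-projection onto M_0 (match E[x])."""
    p = np.exp(logp - logp.max()); p /= p.sum()
    return np.arctanh(p @ X) - h

def proj_pr(zeta_r, r):
    i, j = edges[r]
    return proj_M0(X @ (h + zeta_r) + s[r] * X[:, i] * X[:, j])

theta = np.zeros(n)
for t in range(30):                              # outer loop
    def inner(z_flat):                           # inner loop: residuals of the L conditions
        z = z_flat.reshape(L, n)
        rhs = L * theta - z.sum(axis=0)
        return np.concatenate([proj_pr(z[r], r) - rhs for r in range(L)])
    zeta = root(inner, np.zeros(L * n)).x.reshape(L, n)
    theta = L * theta - zeta.sum(axis=0)         # outer update
print("CCCP-Bethe marginals E[x_i]:", np.tanh(h + theta))
```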

53 A New e-constraint Algorithm Define a cost function F_e under the e-constraint θ = Σ_r ζ_r / (L − 1). We minimize F_e by the gradient descent algorithm. The gradient involves I_0(θ) and I_r(ζ_r), the Fisher information matrices of p_0(x, θ) and p_r(x, ζ_r), respectively.

54 A New e-constraint Algorithm Under the gradient descent algorithm, ζ_r and θ are updated with a small positive learning rate δ.

55 A New e-constraint Algorithm

56 Thanks!
