Information Projection Algorithms and Belief Propagation

1  Information Projection Algorithms and Belief Propagation
   Phil Regalia
   Department of Electrical Engineering and Computer Science
   Catholic University of America, Washington, DC
   with J. M. Walsh, Drexel University
   P. Dewilde Workshop, Wassenaar, June 2008

2  Outline
   1. Introduction
   2. Bregman Distance
   3. Iterative Projection Algorithms
   4. Belief Propagation
   5. Conclusion

3-5  Belief Propagation / Iterative Convex Projection Algorithms
   Belief propagation (Pearl, 1986) has met with success in:
   - error correction decoding (low-density parity-check codes, turbo codes, ...);
   - network diagnostics and link monitoring;
   - sensor self-localization;
   - distributed estimation in sensor networks;
   - lossy source quantization;
   - multi-user communications;
   - et cetera.
   Iterative convex projection algorithms have convergence proofs, unlike BP, for which proofs assume either:
   - a tree or forest dependency graph; or
   - arbitrarily large factor-graph girth.

6-8  Basic Query
   Can iterative convex projection algorithms lend insight into belief propagation?
   A priori yes, and the role of information projections and information geometry in BP is nothing new:
   - Moher and Gulliver (Trans. IT-98) rephrased iterative decoding as information projections in the sense of Csiszár;
   - Grant (ISIT-99) examined turbo decoding via information projections;
   - Richardson (Trans. IT-03) viewed the nonlinear dynamics of iterative decoding via (tacitly) information geometry;
   - Ikeda, Tanaka and Amari (Trans. IT-04) rephrased all of the above in terms of information geometry.
   Key obstacle: extrinsic information extraction goes against projections onto invariant sets.

9  Bregman Distance
   The graph of a convex function f(·) is lower bounded by any tangent hyperplane. This induces the gradient inequality
      f(r) \ge f(q) + \langle \nabla f(q),\, r - q \rangle,
   whose discrepancy in turn induces the Bregman distance
      D_f(r, q) = f(r) - f(q) - \langle \nabla f(q),\, r - q \rangle \ge 0.
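A minimal numerical sketch of this definition (an illustration added here, not from the slides; Python with NumPy assumed). It evaluates D_f for a user-supplied convex f and its gradient; the energy-function example anticipates slide 15.

```python
# Sketch: a generic Bregman distance from a convex f and its gradient,
# following D_f(r, q) = f(r) - f(q) - <grad f(q), r - q>.
import numpy as np

def bregman(f, grad_f, r, q):
    """Bregman distance D_f(r, q); nonnegative whenever f is convex."""
    return f(r) - f(q) - np.dot(grad_f(q), r - q)

# Example: the energy function f(q) = 0.5 ||q||^2 (see slide 15) gives
# half the squared Euclidean distance.
f = lambda q: 0.5 * np.dot(q, q)
grad_f = lambda q: q
r, q = np.array([1.0, 2.0]), np.array([0.0, 1.0])
print(bregman(f, grad_f, r, q))  # 1.0, i.e. 0.5 * ||r - q||^2
```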

10  Common examples
   Probability mass functions defined on M outcomes: q_i = \Pr(x = x_i), i = 0, 1, 2, \ldots, M-1. Introduce the convex domain
      D = \{ q = [q_1, \ldots, q_{M-1}]^T : q_1 \ge 0, \ldots, q_{M-1} \ge 0,\ q_1 + \cdots + q_{M-1} \le 1 \}.
   Setting q_0 = 1 - \sum_{i=1}^{M-1} q_i, the negative Shannon entropy
      f(q) = \Big( 1 - \sum_{i=1}^{M-1} q_i \Big) \log \Big( 1 - \sum_{i=1}^{M-1} q_i \Big) + \sum_{i=1}^{M-1} q_i \log q_i
   is convex over D.

11-12  The induced Bregman distance becomes
      D_f(r, q) = \sum_{i=0}^{M-1} r_i \log \frac{r_i}{q_i}
   and is recognized as the Kullback-Leibler divergence. Closely related is the Fenchel conjugate function
      f^*(\theta) = \sup_q \big( \langle q, \theta \rangle - f(q) \big) = \log \Big( \sum_{i=0}^{M-1} \exp(\theta_i) \Big), \quad with \theta_i = \log \frac{q_i}{q_0}.
   This is convex over a domain D^* \subset \mathbb{R}^M of vectors having first component zero ("log probability ratios").
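As a quick check of the claim, a sketch (added illustration, not from the slides) verifying numerically that the Bregman distance of the negative Shannon entropy coincides with the Kullback-Leibler divergence; for simplicity it uses the full-coordinate form f(q) = \sum_i q_i \log q_i on the simplex rather than the reduced (M-1)-coordinate parametrization above.

```python
# Sketch: Bregman distance of negative entropy vs. KL divergence.
import numpy as np

def neg_entropy(q):                 # f(q) = sum_i q_i log q_i
    return np.sum(q * np.log(q))

def grad_neg_entropy(q):            # [grad f(q)]_i = 1 + log q_i
    return 1.0 + np.log(q)

def bregman(f, grad_f, r, q):
    return f(r) - f(q) - np.dot(grad_f(q), r - q)

def kl(r, q):                       # D(r || q)
    return np.sum(r * np.log(r / q))

r = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])
print(np.isclose(bregman(neg_entropy, grad_neg_entropy, r, q), kl(r, q)))  # True
```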

13  The induced Bregman distance is now
      D_{f^*}(\rho, \theta) = \log \frac{\sum_{i=0}^{M-1} \exp(\rho_i)}{\sum_{i=0}^{M-1} \exp(\theta_i)} - \sum_{i=0}^{M-1} \frac{\exp(\theta_i)}{\sum_{j=0}^{M-1} \exp(\theta_j)} (\rho_i - \theta_i).
   By associating
      q_i = \frac{\exp(\theta_i)}{\sum_{j=0}^{M-1} \exp(\theta_j)}, \qquad r_i = \frac{\exp(\rho_i)}{\sum_{j=0}^{M-1} \exp(\rho_j)},
   this assumes the more familiar form
      D_{f^*}(\rho, \theta) = \sum_{i=0}^{M-1} q_i \log \frac{q_i}{r_i}.

14  Observe that
      [\nabla f(q)]_i = \frac{\partial f(q)}{\partial q_i} = \log \frac{q_i}{q_0} = \theta_i, \qquad [\nabla f^*(\theta)]_i = \frac{\partial f^*(\theta)}{\partial \theta_i} = \frac{\exp(\theta_i)}{\sum_j \exp(\theta_j)} = q_i,
   giving inverse maps, as f(q) and f^*(\theta) are Legendre transforms of each other. In general, for convex f and its conjugate f^* in the Legendre class, their induced Bregman distances satisfy
      D_f(r, q) = D_{f^*}\big( \nabla f(q), \nabla f(r) \big).
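A short numerical check of the inverse-map property (added illustration), in the slide's coordinates where \theta_i = \log(q_i / q_0), so the first component of \theta is pinned to zero:

```python
# Sketch: grad f and grad f* invert each other (negative entropy / log-sum-exp pair).
import numpy as np

q = np.array([0.2, 0.5, 0.3])
theta = np.log(q / q[0])                        # grad f(q); first component is 0
q_back = np.exp(theta) / np.exp(theta).sum()    # grad f*(theta), a softmax
print(np.allclose(q, q_back))                   # True
```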

15  Consider finally the energy function
      f(q) = \frac{1}{2} \sum_{i=0}^{M-1} q_i^2, \qquad q \in \mathbb{R}^M;
   the Bregman distance becomes
      D_f(r, q) = \frac{1}{2} \sum_{i=0}^{M-1} (r_i - q_i)^2.
   As \nabla f(q) = q and f^* = f, this gives a symmetric Bregman distance: D_f(r, q) = D_f(q, r).

16  Bregman Projections
   If f(q) is a convex function over a domain D, and C is a convex subset of D, the Bregman projection of q onto C is
      \pi_C(q) = \arg\min_{r \in C} D_f(r, q),
   and is characterized by the inequality
      D_f(r, q) \ge D_f\big(r, \pi_C(q)\big) + D_f\big(\pi_C(q), q\big) \quad for all r \in C,
   or, equivalently,
      \langle \nabla f(q) - \nabla f\big(\pi_C(q)\big),\, r - \pi_C(q) \rangle \le 0 \quad for all r \in C.
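For intuition, a sketch of this characterization in the Euclidean case (added illustration; with f the energy function, \nabla f is the identity, and the projection onto a halfspace has a closed form):

```python
# Sketch: Euclidean Bregman projection onto the halfspace C = {r : <a, r> <= b},
# checking <grad f(q) - grad f(pi_C(q)), r - pi_C(q)> <= 0 with grad f = identity.
import numpy as np

def project_halfspace(q, a, b):
    """Closest point to q in {r : <a, r> <= b} under the energy-function distance."""
    slack = np.dot(a, q) - b
    return q if slack <= 0 else q - slack * a / np.dot(a, a)

a, b = np.array([1.0, 1.0]), 1.0
q = np.array([2.0, 2.0])
p = project_halfspace(q, a, b)              # pi_C(q) = [0.5, 0.5]
r = np.array([0.0, 1.0])                    # an arbitrary point of C
print(np.dot(q - p, r - p) <= 1e-12)        # True: the inequality holds
```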

17  Dykstra's Cyclic Projection Algorithm
   Seek the minimization
      \pi_C(q) = \arg\min_{r \in C} D_f(r, q),
   where C is the intersection of convex sets: C = \bigcap_{n=1}^{N} C_n.
   First stab: a sometimes-convergent algorithm,
      r_n = \pi_{C_n}(r_{n-1}) = \pi_{C_n}\big( \nabla f^*( \nabla f(r_{n-1}) ) \big),
   using C_{n+N} = C_n and initialization r_0 = q.

18  Improved algorithm, r_n \to \pi_C(q):
      r_n = \pi_{C_n}\big( \nabla f^*( \nabla f(r_{n-1}) + s_{n-N} ) \big)
      s_n = \nabla f(r_{n-1}) + s_{n-N} - \nabla f(r_n)
   using C_{n+N} = C_n and initialization r_0 = q, s_{-(N-1)} = \cdots = s_{-1} = s_0 = 0.
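A compact sketch of Dykstra's algorithm in the Euclidean case (added illustration), where \nabla f and \nabla f^* are both the identity, so the updates above read r_n = \pi_{C_n}(r_{n-1} + s_{n-N}) and s_n = r_{n-1} + s_{n-N} - r_n:

```python
# Sketch: Dykstra's cyclic projection algorithm, Euclidean case.
import numpy as np

def dykstra(q, projectors, sweeps=200):
    """Approximate the projection of q onto the intersection of the given sets."""
    N = len(projectors)
    r = q.copy()
    s = [np.zeros_like(q) for _ in range(N)]    # one correction term per set
    for _ in range(sweeps):
        for n, P in enumerate(projectors):
            y = r + s[n]                        # grad f(r_{n-1}) + s_{n-N}
            r_next = P(y)
            s[n] = y - r_next                   # s_n = r_{n-1} + s_{n-N} - r_n
            r = r_next
    return r

# Example: intersect the unit ball with the halfspace {x : x[0] >= 0.8}.
ball = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
slab = lambda x: np.array([max(x[0], 0.8), x[1]])
print(dykstra(np.array([2.0, 2.0]), [ball, slab]))   # -> [0.8, 0.6]
```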

19  Minimum Distance Algorithm
   Find the closest members:
      D_f(r^*, q^*) = \inf_{r \in C_1,\, q \in C_2} D_f(r, q), \quad where C_1 \cap C_2 = \emptyset.
   The cyclic projection algorithm becomes
      r_n = \pi_{C_1}\big( \nabla f^*( \nabla f(q_{n-1}) + v_{n-1} ) \big)
      v_n = \nabla f(q_{n-1}) + v_{n-1} - \nabla f(r_n)
      q_n = \pi_{C_2}\big( \nabla f^*( \nabla f(r_n) + w_{n-1} ) \big)
      w_n = \nabla f(r_n) + w_{n-1} - \nabla f(q_n)
   Convergent in the Euclidean case D_f(r, q) = \frac{1}{2} \sum_k (r_k - q_k)^2.
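A sketch of this recursion in the (convergent) Euclidean case for two disjoint sets (added illustration; two parallel lines, so the closest pair sits at distance 1):

```python
# Sketch: minimum-distance recursion, Euclidean case, C1 = {y = 0}, C2 = {y = 1}.
import numpy as np

pi_C1 = lambda x: np.array([x[0], 0.0])
pi_C2 = lambda x: np.array([x[0], 1.0])

q = np.array([5.0, 5.0])
v = np.zeros(2)
w = np.zeros(2)
for _ in range(20):
    r = pi_C1(q + v); v = q + v - r
    q = pi_C2(r + w); w = r + w - q
print(r, q)   # a closest pair, e.g. [5, 0] and [5, 1]
```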

20  Belief Propagation
   Iterative (sometimes "fast") algorithm to calculate marginal probability functions. Given a binary vector x = [x_1, \ldots, x_M] \in \{0,1\}^M and a probability function G(x), seek the marginals
      f_k(x_k) = \sum_{x_1=0}^{1} \cdots \sum_{x_{k-1}=0}^{1} \sum_{x_{k+1}=0}^{1} \cdots \sum_{x_M=0}^{1} G(x).
   Direct calculation is hard, since there are 2^M evaluations of x. BP is applicable when G(x) splits into simpler factors:
      G(x) = \prod_{k=1}^{K} g_k(x).
   Usually, each factor g_k depends only on a subset of the variables in x.
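The brute-force sum below makes the 2^M cost concrete (added illustration with hypothetical toy factors, not the slides' example):

```python
# Sketch: brute-force marginals of a factored G(x) over binary x (2^M terms).
import itertools
import numpy as np

M = 4
g1 = lambda x: 1.0 if (x[0] ^ x[1]) == 0 else 0.1         # soft parity on x1, x2
g2 = lambda x: 1.0 if (x[1] ^ x[2] ^ x[3]) == 0 else 0.1  # soft parity on x2..x4
G = lambda x: g1(x) * g2(x)                                # G(x) = g1(x) g2(x)

def marginal(k):
    f = np.zeros(2)
    for x in itertools.product([0, 1], repeat=M):          # all 2^M configurations
        f[x[k]] += G(x)
    return f / f.sum()

print([marginal(k) for k in range(M)])
```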

21  Message passing algorithm
      m_{x_i \to g_k}(x_i) = (variable x_i to factor g_k) = \beta_k \prod_{l \ne k} m_{g_l \to x_i}(x_i)
      m_{g_k \to x_i}(x_i) = (factor g_k to variable x_i) = \alpha_i \sum_{x_n : n \ne i} g_k(x) \prod_{l \ne i} m_{x_l \to g_k}(x_l)
   (Figure: factor graph with variable nodes x_1, \ldots, x_5 and factor nodes g_1(x), \ldots, g_6(x), annotated with the messages m_{x_1 \to g_1}(x_1) and m_{g_6 \to x_5}(x_5).)
   The outgoing message on an edge depends only on the incoming messages from the other edges at that node ("extrinsic information"). If convergent, the beliefs
      b_i(x_i) = \beta_i \prod_l m_{g_l \to x_i}(x_i)
   are thresholded: b_i(0) \gtrless b_i(1). Ideally, the beliefs would be the marginals.
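A self-contained sketch of these two update rules (added illustration: binary variables, a small hypothetical factor graph, flooding schedule, with the normalizations \alpha, \beta realized by rescaling each message to sum to one):

```python
# Sketch: sum-product message passing on a tiny factor graph.
import itertools
import numpy as np

scope = {0: (0, 1), 1: (1, 2)}                   # factor k acts on variables scope[k]
tables = {k: {x: 1.0 if sum(x) % 2 == 0 else 0.1
              for x in itertools.product([0, 1], repeat=len(v))}
          for k, v in scope.items()}             # soft parity factors
nbrs = {i: [k for k, v in scope.items() if i in v] for i in range(3)}

m_vf = {(i, k): np.ones(2) / 2 for i in nbrs for k in nbrs[i]}    # variable -> factor
m_fv = {(k, i): np.ones(2) / 2 for k in scope for i in scope[k]}  # factor -> variable

for _ in range(10):
    for (i, k) in m_vf:                          # product of other incoming messages
        msg = np.ones(2)
        for l in nbrs[i]:
            if l != k:
                msg *= m_fv[(l, i)]
        m_vf[(i, k)] = msg / msg.sum()
    for (k, i) in m_fv:                          # marginalize factor times other messages
        msg = np.zeros(2)
        pos_i = scope[k].index(i)
        for x in itertools.product([0, 1], repeat=len(scope[k])):
            w = tables[k][x]
            for pos, j in enumerate(scope[k]):
                if j != i:
                    w *= m_vf[(j, k)][x[pos]]
            msg[x[pos_i]] += w
        m_fv[(k, i)] = msg / msg.sum()

beliefs = {i: np.prod([m_fv[(k, i)] for k in nbrs[i]], axis=0) for i in nbrs}
print({i: b / b.sum() for i, b in beliefs.items()})   # exact marginals here (tree)
```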

22  For the projection interpretation, let x^1, \ldots, x^K be K copies of x, and consider the extended likelihood function
      G(x^1, \ldots, x^K) = \prod_{k=1}^{K} g_k(x^k).
   The original function is obtained under the constraint x^1 = \cdots = x^K = x. Two convex sets:
   Product distributions over the MK binary variables (2^{MK} outcomes):
      P = \Big\{ \theta : \nabla f^*(\theta) = \prod_{k=1}^{K} \prod_{m=1}^{M} q_{k,m}(x^k_m) \Big\}
   Constraint (or "sparse") distributions:
      Q = \{ r : r(x^1, \ldots, x^K) = 0 \ if \ x^i \ne x^j \ for \ any \ i \ne j \}

23-24  Information geometric view
   At factor nodes: given a pmf q, if r \in P is the product distribution built from the marginals of q, then
      D_f(q, s) = D_f(q, r) + \underbrace{D_f(r, s)}_{\ge 0} \quad for all s \in P,
   so that r is the Bregman projection of q onto P.
   At variable nodes: given a pmf q, if Q denotes the set of sparse distributions, then
      b_i = \begin{cases} \beta\, q_i, & i \in index set for Q; \\ 0, & otherwise; \end{cases}
   satisfies
      D_f(s, q) = D_f(s, b) + \underbrace{D_f(b, q)}_{\ge 0} \quad for all s \in Q,
   so that b (containing the beliefs) is the Bregman projection of q onto Q.
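A numerical check of the factor-node identity (added illustration: f the negative entropy, so D_f is the KL divergence; q a joint pmf on two bits, r the product of its marginals, s an arbitrary product pmf):

```python
# Sketch: Pythagorean identity D_f(q, s) = D_f(q, r) + D_f(r, s) at a factor node.
import numpy as np

kl = lambda a, b: np.sum(a * np.log(a / b))

q = np.array([[0.3, 0.1],
              [0.2, 0.4]])                       # joint pmf of (x1, x2)
r = np.outer(q.sum(axis=1), q.sum(axis=0))       # product of the marginals of q
s = np.outer([0.6, 0.4], [0.25, 0.75])           # any product distribution in P
print(np.isclose(kl(q, s), kl(q, r) + kl(r, s))) # True
```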

25  Calculation of marginal pmfs
   The ideal solution is
      p = \pi_Q\big( \nabla f^*(\chi_{-1}) \big)   (constrain to sparse distributions)
      \chi_0 = \pi_P\big( \nabla f(p) \big)   (calculate marginal distributions)
   with initialization
      \chi_{-1} = \nabla f\Big( \prod_k g_k(x^k) \Big).

26  Belief propagation is the cyclic projection algorithm
      \xi_n = \pi_P(\chi_{n-1} + \sigma_{n-1})
      \sigma_n = \chi_{n-1} + \sigma_{n-1} - \xi_n
      p_n = \pi_Q\big( \nabla f^*(\xi_n + \tau_{n-1}) \big)
      \chi_n = \pi_P\big( \nabla f(p_n) \big)
      \tau_n = \xi_n + \tau_{n-1} - \chi_n
   with initialization
      \chi_{-1} = \nabla f\Big( \prod_k g_k(x^k) \Big), \qquad \sigma_{-1} = \tau_{-1} = 0.
   (Figure: the iterates shuttle between the product-distribution set P and the sparse set Q through the maps \pi_P, \pi_Q, \nabla f, and \nabla f^*.)

27-28  Special Case
   Replace the negative Shannon entropy by the energy function to get "Euclidean belief propagation":
      \xi_n = \pi_P(\chi_{n-1} + \sigma_{n-1})
      \sigma_n = \chi_{n-1} + \sigma_{n-1} - \xi_n
      p_n = \pi_Q(\xi_n + \tau_{n-1})
      \chi_n = \pi_P(p_n)
      \tau_n = \xi_n + \tau_{n-1} - \chi_n
   Given arbitrary convex sets P and Q, one can show that \{\chi_n\} converges for any initial condition \chi_{-1}. Conventional belief propagation, by contrast, is convergent when either of the following applies:
   - the factor graph is a tree/forest (or nearly so: large girth);
   - the initialization \chi_{-1} is a product distribution (or nearly so: \pi_P(\chi_{-1}) \approx \chi_{-1}).
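A sketch of this Euclidean recursion with two hypothetical convex sets standing in for the product and sparse distribution sets (added illustration; P a line through the origin, Q a box):

```python
# Sketch: "Euclidean belief propagation" recursion with toy convex sets.
import numpy as np

pi_P = lambda x: np.full(2, x.mean())            # project onto the line {x0 = x1}
pi_Q = lambda x: np.minimum(x, 1.0)              # project onto the box {x <= 1}

chi = np.array([3.0, -1.0])                      # chi_{-1}: arbitrary initialization
sigma = np.zeros(2)
tau = np.zeros(2)
for n in range(50):
    xi = pi_P(chi + sigma)
    sigma = chi + sigma - xi
    p = pi_Q(xi + tau)
    chi = pi_P(p)
    tau = xi + tau - chi
print(chi)   # the sequence {chi_n} settles regardless of the initialization
```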

29  Further perspectives
   - Rephrasing belief propagation in terms of cyclic convex projection algorithms suggests that convergence studies of the latter might extend to the former.
   - Euclidean belief propagation is always convergent, whereas conventional (information projection-based) belief propagation can diverge in some settings.
   - Continuity of projectors can be invoked to extend the cases where belief propagation is proved to converge to good solutions.
   - Can some modification to belief propagation give a more faithful transcription of Dykstra's algorithm?
