Factor Graphs and Message Passing Algorithms Part 1: Introduction


Factor Graphs and Message Passing Algorithms, Part 1: Introduction. Hans-Andrea Loeliger, December 2007. 1

The Two Basic Problems

1. Marginalization: Compute
   f_k(x_k) ≜ ∑_{x_1, ..., x_n except x_k} f(x_1, ..., x_n)

2. Maximization: Compute the max-marginal
   f̂_k(x_k) ≜ max_{x_1, ..., x_n except x_k} f(x_1, ..., x_n)

assuming that f is real-valued and nonnegative and has a maximum. Note that

   argmax f(x_1, ..., x_n) = ( argmax f̂_1(x_1), ..., argmax f̂_n(x_n) ).

For large n, both problems are in general intractable (even for x_1, ..., x_n ∈ {0, 1}). 2
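As a concrete illustration (not part of the original slides), here is a minimal brute-force computation of a marginal and a max-marginal for a small function of binary variables; the function f below is an arbitrary made-up example, and the loop over all 2^n configurations is exactly what makes both problems intractable for large n.

```python
import itertools
import numpy as np

n = 4  # number of binary variables

def f(x):
    # Arbitrary nonnegative toy function of x = (x_1, ..., x_n); purely illustrative.
    return np.exp(-(sum(x) - 1.5) ** 2) + 0.1 * x[0] * x[-1]

def marginal(k):
    # f_k(x_k) = sum over all variables except x_k
    out = {0: 0.0, 1: 0.0}
    for x in itertools.product((0, 1), repeat=n):   # 2^n configurations
        out[x[k]] += f(x)
    return out

def max_marginal(k):
    # f̂_k(x_k) = max over all variables except x_k
    out = {0: 0.0, 1: 0.0}
    for x in itertools.product((0, 1), repeat=n):
        out[x[k]] = max(out[x[k]], f(x))
    return out

print(marginal(2), max_marginal(2))
```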

Factorization Helps

For example, if f(x_1, ..., x_n) = f_1(x_1) f_2(x_2) ··· f_n(x_n), then

   f_k(x_k) = ( ∑_{x_1} f_1(x_1) ) ··· ( ∑_{x_{k-1}} f_{k-1}(x_{k-1}) ) f_k(x_k) ( ∑_{x_{k+1}} f_{k+1}(x_{k+1}) ) ··· ( ∑_{x_n} f_n(x_n) )

and

   f̂_k(x_k) = ( max_{x_1} f_1(x_1) ) ··· ( max_{x_{k-1}} f_{k-1}(x_{k-1}) ) f_k(x_k) ( max_{x_{k+1}} f_{k+1}(x_{k+1}) ) ··· ( max_{x_n} f_n(x_n) ).

Factorization helps also beyond this trivial example: factor graphs and message passing algorithms. 3
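Continuing the toy illustration above (again not from the slides), if f happens to factor completely, the marginal of x_k reduces to a product of one-dimensional sums; here is a minimal sketch with made-up unary factors:

```python
import numpy as np

# Made-up unary factors f_1, ..., f_n over a binary alphabet, as an illustration.
factors = [np.array([0.2, 0.8]), np.array([0.5, 0.5]),
           np.array([0.9, 0.1]), np.array([0.3, 0.7])]

def marginal_factored(k):
    # f_k(x_k) * product over j != k of (sum_{x_j} f_j(x_j))
    scale = np.prod([fj.sum() for j, fj in enumerate(factors) if j != k])
    return factors[k] * scale

def max_marginal_factored(k):
    # f_k(x_k) * product over j != k of (max_{x_j} f_j(x_j))
    scale = np.prod([fj.max() for j, fj in enumerate(factors) if j != k])
    return factors[k] * scale

print(marginal_factored(2), max_marginal_factored(2))  # cost linear in n, not 2^n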

Roots

Statistical physics:
- Markov random fields (Ising 1925?)

Signal processing:
- linear state-space models and Kalman filtering: Kalman 1960...
- recursive least-squares adaptive filters
- hidden Markov models and the forward-backward algorithm: Baum et al. 1966...

Error correcting codes:
- low-density parity-check codes: Gallager 1962; Tanner 1981; MacKay 1996; Luby et al. 1998...
- convolutional codes and Viterbi decoding: Forney 1973...
- turbo codes: Berrou et al. 1993...

Machine learning, statistics:
- Bayesian networks: Pearl 1988; Shachter 1988; Lauritzen and Spiegelhalter 1988; Shafer and Shenoy 1990... 4

Outline of this talk

1. Factor graphs (slide 6)
2. The sum-product and max-product algorithms (slide 19)
3. On factor graphs with cycles (slide 48)
4. Factor graphs and error correcting codes (slide 51) 5

Factor Graphs

A factor graph represents the factorization of a function of several variables. We use Forney-style factor graphs (Forney, 2001).

Example: f(x_1, x_2, x_3, x_4, x_5) = f_A(x_1, x_2, x_3) f_B(x_3, x_4, x_5) f_C(x_4).

[Figure: factor graph with nodes f_A, f_B, f_C, edges x_3 and x_4, and half-edges x_1, x_2, x_5.]

Rules:
- A node for every factor.
- An edge or half-edge for every variable.
- Node g is connected to edge x iff variable x appears in factor g.

(What if some variable appears in more than two factors?) 6

A main application of factor graphs is stochastic models.

Example: Markov chain
   p_{XYZ}(x, y, z) = p_X(x) p_{Y|X}(y|x) p_{Z|Y}(z|y).

[Figure: factor graph with nodes p_X, p_{Y|X}, p_{Z|Y}, edges X and Y, and half-edge Z.] 7

Other Notation Systems for Graphical Models

Example: p(u, w, x, y, z) = p(u) p(w) p(x|u, w) p(y|x) p(z|x).

[Figure: the same model drawn in four styles: a Forney-style factor graph, the original factor graph [FKLW 1997], a Bayesian network, and a Markov random field, each with variables U, W, X, Y, Z.] 8

Terminology

[Figure: the factor graph with nodes f_A, f_B, f_C and variables x_1, ..., x_5 from slide 6.]

Local function = factor (such as f_A, f_B, f_C).
Global function f = product of all local functions; usually (but not always!) real and nonnegative.

A configuration is an assignment of values to all variables.
The configuration space is the set of all configurations, which is the domain of the global function.
A configuration ω = (x_1, ..., x_5) is valid iff f(ω) ≠ 0. 9

Invalid Configurations Do Not Affect Marginals

A configuration ω = (x_1, ..., x_n) is valid iff f(ω) ≠ 0.

Recall:

1. Marginalization: Compute
   f_k(x_k) ≜ ∑_{x_1, ..., x_n except x_k} f(x_1, ..., x_n)

2. Maximization: Compute the max-marginal
   f̂_k(x_k) ≜ max_{x_1, ..., x_n except x_k} f(x_1, ..., x_n)

assuming that f is real-valued and nonnegative and has a maximum. 10

Auxiliary Variables

Example: Let Y_1 and Y_2 be two independent observations of X:
   p(x, y_1, y_2) = p(x) p(y_1|x) p(y_2|x).

[Figure: factor graph with node p_X, an equality constraint node replicating X into X' and X'', nodes p_{Y_1|X}, p_{Y_2|X}, and half-edges Y_1, Y_2.]

Literally, the factor graph represents an extended model
   p(x, x', x'', y_1, y_2) = p(x) p(y_1|x') p(y_2|x'') f_=(x, x', x'')
where f_=(x, x', x'') ≜ δ(x − x') δ(x − x'') enforces X = X' = X'' for every valid configuration. 11

Branching Points

Equality constraint nodes may be viewed as branching points:

[Figure: a variable X branching at an "=" node into copies X' and X''.]

The factor f_=(x, x', x'') ≜ δ(x − x') δ(x − x'') enforces X = X' = X'' for every valid configuration. 12

Modularity, Special Symbols, Arrows

As a refinement of the previous example, let
   Y_1 = X + Z_1   (1)
   Y_2 = X + Z_2   (2)
with Z_1 and Z_2 independent of each other and of X.

[Figure: factor graph with nodes p_X, p_{Z_1}, p_{Z_2}, an equality node on X, and two "+"-nodes producing Y_1 and Y_2.]

The "+"-nodes represent the factors δ(x + z_1 − y_1) and δ(x + z_2 − y_2), which enforce (1) and (2) for every valid configuration. 13

Known Variables vs. Unknown Variables

Known variables (observations, known parameters, ...) may be plugged into the corresponding factors.

Example: Y_2 = y_2 observed.

[Figure: factor graph with nodes p_X(·), p_{Y_1|X}(·|·), and p_{Y_2|X}(y_2|·); the half-edge Y_1 remains, while y_2 has been absorbed into its factor.]

Known variables will be denoted by small letters; unknown variables will usually be denoted by capital letters. 14

From A Priori to A Posteriori Probability

Example (cont'd): Let Y_1 = y_1 and Y_2 = y_2 be two independent observations of X. For fixed y_1 and y_2, we have
   p(x | y_1, y_2) = p(x, y_1, y_2) / p(y_1, y_2) ∝ p(x, y_1, y_2).

[Figure: factor graph with nodes p_X, p_{Y_1|X}, p_{Y_2|X} and the observed values y_1, y_2 plugged in.]

The factorization is unchanged (except for a scale factor). 15

Example: Hidden Markov Model

   p(x_0, x_1, x_2, ..., x_n, y_1, y_2, ..., y_n) = p(x_0) ∏_{k=1}^{n} p(x_k | x_{k-1}) p(y_k | x_{k-1})

[Figure: chain-structured factor graph with nodes p(x_0), p(x_1|x_0), p(x_2|x_1), ..., state edges X_0, X_1, X_2, ..., observation factors p(y_1|x_0), p(y_2|x_1), ..., and half-edges Y_1, Y_2, ...] 16

Example: Hidden Markov Model with Parameter(s)

   p(x_0, x_1, x_2, ..., x_n, y_1, y_2, ..., y_n | θ) = p(x_0) ∏_{k=1}^{n} p(x_k | x_{k-1}, θ) p(y_k | x_{k-1})

[Figure: the same chain as before, with an additional parameter edge Θ entering every transition factor p(x_k | x_{k-1}, θ).] 17

A non-stochastic example: Least-Squares Problems

Minimizing ∑_{k=1}^{n} x_k^2 subject to (linear or nonlinear) constraints is equivalent to maximizing
   e^{−∑_{k=1}^{n} x_k^2} = ∏_{k=1}^{n} e^{−x_k^2}
subject to the given constraints.

[Figure: factor graph with a "constraints" node and unary factors e^{−x_1^2}, ..., e^{−x_n^2} attached to edges X_1, ..., X_n.]

Here, the factor graph represents a nonnegative real-valued function that we wish to maximize. 18

Outline

1. Factor graphs (slide 6)
2. The sum-product and max-product algorithms (slide 19)
3. On factor graphs with cycles (slide 48)
4. Factor graphs and error correcting codes (slide 51) 19

Recall: The Two Basic Problems

1. Marginalization: Compute
   f_k(x_k) ≜ ∑_{x_1, ..., x_n except x_k} f(x_1, ..., x_n)

2. Maximization: Compute the max-marginal
   f̂_k(x_k) ≜ max_{x_1, ..., x_n except x_k} f(x_1, ..., x_n)

assuming that f is real-valued and nonnegative and has a maximum. 20

Message Passing Algorithms operate by passing messages along the edges of a factor graph:... 21

Towards the sum-product algorithm: Computing Marginals (A Generic Example)

Assume we wish to compute
   f_3(x_3) = ∑_{x_1, ..., x_7 except x_3} f(x_1, ..., x_7)
and assume that f can be factored as follows:

[Figure: tree-structured factor graph of f(x_1, ..., x_7) = f_1(x_1) f_2(x_2) f_3(x_1, x_2, x_3) f_4(x_4) f_5(x_3, x_4, x_5) f_6(x_5, x_6, x_7) f_7(x_7).] 22

Example cont'd: Closing Boxes by the Distributive Law

[Figure: the same factor graph with boxes enclosing the subgraphs that are summed out.]

   f_3(x_3) = ( ∑_{x_1, x_2} f_1(x_1) f_2(x_2) f_3(x_1, x_2, x_3) )
            · ( ∑_{x_4, x_5} f_4(x_4) f_5(x_3, x_4, x_5) ( ∑_{x_6, x_7} f_6(x_5, x_6, x_7) f_7(x_7) ) ) 23

Example cont'd: Message Passing View

[Figure: the same factor graph, with the closed boxes now interpreted as messages along the edges.]

   f_3(x_3) = →µ_{X_3}(x_3) · ←µ_{X_3}(x_3), where
   →µ_{X_3}(x_3) = ∑_{x_1, x_2} f_1(x_1) f_2(x_2) f_3(x_1, x_2, x_3)
   ←µ_{X_3}(x_3) = ∑_{x_4, x_5} f_4(x_4) f_5(x_3, x_4, x_5) ←µ_{X_5}(x_5)
   ←µ_{X_5}(x_5) = ∑_{x_6, x_7} f_6(x_5, x_6, x_7) f_7(x_7) 24

Example cont'd: Messages Everywhere

[Figure: the same factor graph with a message on every edge.]

With →µ_{X_1}(x_1) ≜ f_1(x_1), →µ_{X_2}(x_2) ≜ f_2(x_2), etc., we have
   →µ_{X_3}(x_3) = ∑_{x_1, x_2} →µ_{X_1}(x_1) →µ_{X_2}(x_2) f_3(x_1, x_2, x_3)
   ←µ_{X_5}(x_5) = ∑_{x_6, x_7} ←µ_{X_7}(x_7) f_6(x_5, x_6, x_7)
   ←µ_{X_3}(x_3) = ∑_{x_4, x_5} →µ_{X_4}(x_4) ←µ_{X_5}(x_5) f_5(x_3, x_4, x_5) 25

The Sum-Product Algorithm (Belief Propagation)

[Figure: generic node g with incoming edges Y_1, ..., Y_n and outgoing edge X.]

Sum-product message computation rule:
   →µ_X(x) = ∑_{y_1, ..., y_n} g(x, y_1, ..., y_n) →µ_{Y_1}(y_1) ··· →µ_{Y_n}(y_n)

Sum-product theorem: If the factor graph for some global function f has no cycles, then
   f_X(x) = →µ_X(x) ←µ_X(x). 26
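A minimal sketch (not from the slides) of the sum-product rule at a single node: the local factor g is given as a table over the finite alphabets of its variables, and the outgoing message is obtained by multiplying the incoming messages onto g and summing out everything except x. Names and shapes are illustrative assumptions.

```python
import numpy as np

def sum_product_message(g, incoming):
    """Outgoing message along the first axis of g.

    g        : ndarray of shape (|X|, |Y_1|, ..., |Y_n|), the local factor.
    incoming : list of ndarrays [mu_Y1, ..., mu_Yn], one message per remaining axis.
    returns  : mu_X(x) = sum_{y_1..y_n} g(x, y_1, ..., y_n) * prod_k mu_Yk(y_k)
    """
    out = g.copy()
    for axis, mu in enumerate(incoming, start=1):
        # Multiply the incoming message onto the corresponding axis of g.
        shape = [1] * out.ndim
        shape[axis] = len(mu)
        out = out * mu.reshape(shape)
    # Sum out all variables except x.
    return out.sum(axis=tuple(range(1, out.ndim)))

# Toy usage with a made-up factor g(x, y1, y2) over binary variables:
g = np.random.rand(2, 2, 2)
mu_y1, mu_y2 = np.array([0.3, 0.7]), np.array([0.6, 0.4])
print(sum_product_message(g, [mu_y1, mu_y2]))
```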

Arrows and Notation for Messages

[Figure: an edge X drawn with an arrow.]

For edges drawn with arrows:
   →µ_X denotes the message in the direction of the arrow;
   ←µ_X denotes the message in the opposite direction.
Edges may be drawn with arrows just for the sake of this notation. 27

Sum-Product Algorithm: a Simple Example

[Figure: the two-observation factor graph with nodes p_X, p_{Z_1}, p_{Z_2}, an equality node on X (with copies X' and X''), two "+"-nodes, and the observed values y_1, y_2.]

   →µ_X(x) = p_X(x)
   →µ_{Z_1}(z_1) = p_{Z_1}(z_1)
   →µ_{Z_2}(z_2) = p_{Z_2}(z_2) 28

Sum-Product Example cont'd

[Figure: the same factor graph.]

   ←µ_{X'}(x') = ∫_{z_1} →µ_{Z_1}(z_1) δ(x' + z_1 − y_1) dz_1 = p_{Z_1}(y_1 − x')

   ←µ_X(x) = ∫_{x'} ∫_{x''} ←µ_{X'}(x') ←µ_{X''}(x'') δ(x − x') δ(x − x'') dx' dx''
           = p_{Z_1}(y_1 − x) p_{Z_2}(y_2 − x) 29

Sum-Product Example cont'd

[Figure: the same factor graph.]

Marginal of the global function at X:
   →µ_X(x) ←µ_X(x) = p_X(x) p_{Z_1}(y_1 − x) p_{Z_2}(y_2 − x)
                    = p_X(x) p(y_1, y_2 | x)
                    ∝ p(x | y_1, y_2). 30

Messages for Finite-Alphabet Variables

may be represented by a list of function values. Assume, for example, that X takes values in {+1, −1}:

[Figure: the same two-observation factor graph.]

   →µ_X ≜ ( →µ_X(+1), →µ_X(−1) ) = ( p_X(+1), p_X(−1) )
   ←µ_{X'} ≜ ( ←µ_{X'}(+1), ←µ_{X'}(−1) ) = ( p_{Z_1}(y_1 − 1), p_{Z_1}(y_1 + 1) )
   etc. 31
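As an illustration (assumed numbers, not from the slides), finite-alphabet messages can simply be stored as vectors indexed by the alphabet; here the alphabet is {+1, −1} and the pointwise product of forward and backward messages gives the unnormalized marginal:

```python
import numpy as np

alphabet = (+1, -1)              # index 0 <-> +1, index 1 <-> -1
mu_fwd = np.array([0.7, 0.3])    # e.g. ( p_X(+1), p_X(-1) ), made-up numbers
mu_bwd = np.array([0.4, 0.9])    # e.g. backward message from the observations

marginal = mu_fwd * mu_bwd       # unnormalized marginal at X
marginal /= marginal.sum()       # normalize to a probability vector
print(dict(zip(alphabet, marginal)))
```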

Applying the sum-product algorithm to Hidden Markov Models yields recursive algorithms for many things.

Recall the definition of a hidden Markov model (HMM):
   p(x_0, x_1, x_2, ..., x_n, y_1, y_2, ..., y_n) = p(x_0) ∏_{k=1}^{n} p(x_k | x_{k-1}) p(y_k | x_{k-1})

[Figure: the chain-structured factor graph of the HMM with observation factors p(y_1|x_0), p(y_2|x_1), p(y_3|x_2), ...]

Assume that Y_1 = y_1, ..., Y_n = y_n are observed (known). 32

Sum-product algorithm applied to HMM: Estimation of Current State

   p(x_n | y_1, ..., y_n) = p(x_n, y_1, ..., y_n) / p(y_1, ..., y_n)
                          ∝ p(x_n, y_1, ..., y_n)
                          = ∑_{x_0} ··· ∑_{x_{n-1}} p(x_0, x_1, ..., x_n, y_1, y_2, ..., y_n)
                          = →µ_{X_n}(x_n).

[Figure: the HMM factor graph for n = 2, with forward messages flowing toward X_n and trivial backward messages equal to 1.] 33

Backward Message in Chain Rule Model

[Figure: a single factor p_{Y|X} with edge X and half-edge Y.]

If Y = y is known (observed):
   ←µ_X(x) = p_{Y|X}(y|x), the likelihood function.
If Y is unknown:
   ←µ_X(x) = ∑_y p_{Y|X}(y|x) = 1. 34

Sum-product algorithm applied to HMM: Prediction of Next Output Symbol

   p(y_{n+1} | y_1, ..., y_n) = p(y_1, ..., y_{n+1}) / p(y_1, ..., y_n)
                              ∝ p(y_1, ..., y_{n+1})
                              = ∑_{x_0, x_1, ..., x_n} p(x_0, x_1, ..., x_n, y_1, y_2, ..., y_n, y_{n+1})
                              = →µ_{Y_{n+1}}(y_{n+1}).

[Figure: the HMM factor graph for n = 2, with the message flowing out of the half-edge Y_3.] 35

Sum-product algorithm applied to HMM: Estimation of Time-k State

   p(x_k | y_1, y_2, ..., y_n) = p(x_k, y_1, y_2, ..., y_n) / p(y_1, y_2, ..., y_n)
                               ∝ p(x_k, y_1, y_2, ..., y_n)
                               = ∑_{x_0, ..., x_n except x_k} p(x_0, x_1, ..., x_n, y_1, y_2, ..., y_n)
                               = →µ_{X_k}(x_k) ←µ_{X_k}(x_k).

For k = 1: [Figure: the HMM factor graph with forward and backward messages meeting at X_1.] 36

Sum-product algorithm applied to HMM: All States Simultaneously

p(x_k | y_1, ..., y_n) for all k:

[Figure: the HMM factor graph with forward and backward messages on every state edge.]

In this application, the sum-product algorithm coincides with the Baum-Welch / BCJR forward-backward algorithm. 37

Scaling of Messages

In all the examples so far:
- The final result (such as →µ_{X_k}(x_k) ←µ_{X_k}(x_k)) equals the desired quantity (such as p(x_k | y_1, ..., y_n)) only up to a scale factor.
- The missing scale factor γ may be recovered at the end from the condition
     ∑_{x_k} γ →µ_{X_k}(x_k) ←µ_{X_k}(x_k) = 1.
- It follows that messages may be scaled freely along the way.
- Such message scaling is often mandatory to avoid numerical problems. 38

Sum-product algorithm applied to HMM: Probability of the Observation

   p(y_1, ..., y_n) = ∑_{x_0} ··· ∑_{x_n} p(x_0, x_1, ..., x_n, y_1, y_2, ..., y_n)
                    = ∑_{x_n} →µ_{X_n}(x_n).

This is a number. Scale factors cannot be neglected in this case.

[Figure: the HMM factor graph for n = 2, with the forward messages and trivial backward messages equal to 1.] 39
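The following sketch (an illustration, not the slides' own code) runs the forward-backward sweeps of slides 33-39 on a small discrete HMM, following the slides' convention that y_k depends on x_{k-1}: messages are normalized at every step to avoid underflow, the log of the normalizers is accumulated so that p(y_1, ..., y_n) can still be recovered, and the per-state posteriors p(x_k | y_1, ..., y_n) come from the pointwise product of forward and backward messages. All matrices below are made-up toy parameters.

```python
import numpy as np

# Toy HMM: transition A[i, j] = p(x_k = j | x_{k-1} = i),
# emission B[i, y] = p(y_k = y | x_{k-1} = i), prior pi[i] = p(x_0 = i).
A = np.array([[0.9, 0.1], [0.2, 0.8]])
B = np.array([[0.7, 0.3], [0.1, 0.9]])
pi = np.array([0.5, 0.5])
y = [0, 1, 1, 0]                       # observed sequence y_1, ..., y_n
n, S = len(y), len(pi)

# Forward messages; each one is rescaled (slide 38), log-normalizers accumulated.
mu_fwd = np.zeros((n + 1, S))
mu_fwd[0] = pi
logZ = 0.0
for k in range(1, n + 1):
    m = (mu_fwd[k - 1] * B[:, y[k - 1]]) @ A   # absorb p(y_k|x_{k-1}), then transition
    logZ += np.log(m.sum())
    mu_fwd[k] = m / m.sum()

# Backward messages.
mu_bwd = np.zeros((n + 1, S))
mu_bwd[n] = 1.0
for k in range(n - 1, -1, -1):
    m = (A @ mu_bwd[k + 1]) * B[:, y[k]]
    mu_bwd[k] = m / m.sum()

# Posterior p(x_k | y_1..y_n): recover the scale factor by normalizing each row.
posterior = mu_fwd * mu_bwd
posterior /= posterior.sum(axis=1, keepdims=True)

print("p(x_k | y):", posterior)
print("log p(y_1..y_n):", logZ)        # scale factors must be kept for this number
```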

Towards the max-product algorithm: Computing Max-Marginals (A Generic Example)

Assume we wish to compute
   f̂_3(x_3) = max_{x_1, ..., x_7 except x_3} f(x_1, ..., x_7)
and assume that f can be factored as follows:

[Figure: the same tree-structured factor graph with factors f_1, ..., f_7 as before.] 40

Example: Closing Boxes by the Distributive Law

[Figure: the same factor graph with boxes enclosing the subgraphs that are maximized out.]

   f̂_3(x_3) = ( max_{x_1, x_2} f_1(x_1) f_2(x_2) f_3(x_1, x_2, x_3) )
            · ( max_{x_4, x_5} f_4(x_4) f_5(x_3, x_4, x_5) ( max_{x_6, x_7} f_6(x_5, x_6, x_7) f_7(x_7) ) ) 41

Example cont'd: Message Passing View

[Figure: the same factor graph with messages along the edges.]

   f̂_3(x_3) = →µ_{X_3}(x_3) · ←µ_{X_3}(x_3), where
   →µ_{X_3}(x_3) = max_{x_1, x_2} f_1(x_1) f_2(x_2) f_3(x_1, x_2, x_3)
   ←µ_{X_3}(x_3) = max_{x_4, x_5} f_4(x_4) f_5(x_3, x_4, x_5) ←µ_{X_5}(x_5)
   ←µ_{X_5}(x_5) = max_{x_6, x_7} f_6(x_5, x_6, x_7) f_7(x_7) 42

Example cont'd: Messages Everywhere

[Figure: the same factor graph with a message on every edge.]

With →µ_{X_1}(x_1) ≜ f_1(x_1), →µ_{X_2}(x_2) ≜ f_2(x_2), etc., we have
   →µ_{X_3}(x_3) = max_{x_1, x_2} →µ_{X_1}(x_1) →µ_{X_2}(x_2) f_3(x_1, x_2, x_3)
   ←µ_{X_5}(x_5) = max_{x_6, x_7} ←µ_{X_7}(x_7) f_6(x_5, x_6, x_7)
   ←µ_{X_3}(x_3) = max_{x_4, x_5} →µ_{X_4}(x_4) ←µ_{X_5}(x_5) f_5(x_3, x_4, x_5) 43

The Max-Product Algorithm

[Figure: generic node g with incoming edges Y_1, ..., Y_n and outgoing edge X.]

Max-product message computation rule:
   →µ_X(x) = max_{y_1, ..., y_n} g(x, y_1, ..., y_n) →µ_{Y_1}(y_1) ··· →µ_{Y_n}(y_n)

Max-product theorem: If the factor graph for some global function f has no cycles, then
   f̂_X(x) = →µ_X(x) ←µ_X(x). 44

Max-product algorithm applied to HMM: MAP Estimate of the State Trajectory

The estimate
   (x̂_0, ..., x̂_n)_MAP = argmax_{x_0, ..., x_n} p(x_0, ..., x_n | y_1, ..., y_n)
                        = argmax_{x_0, ..., x_n} p(x_0, ..., x_n, y_1, ..., y_n)
may be obtained by computing
   p̂_k(x_k) ≜ max_{x_0, ..., x_n except x_k} p(x_0, ..., x_n, y_1, ..., y_n) = →µ_{X_k}(x_k) ←µ_{X_k}(x_k)
for all k by forward-backward max-product sweeps.

In this example, the max-product algorithm is a time-symmetric version of the Viterbi algorithm with soft output. 45

Max-product algorithm applied to HMM: MAP Estimate of the State Trajectory cont'd

Computing
   p̂_k(x_k) ≜ max_{x_0, ..., x_n except x_k} p(x_0, ..., x_n, y_1, ..., y_n) = →µ_{X_k}(x_k) ←µ_{X_k}(x_k)
simultaneously for all k:

[Figure: the HMM factor graph with forward and backward max-product messages on every state edge.] 46
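A sketch (illustrative, not from the slides) of the forward-backward max-product sweeps on the same toy HMM as before: every sum in the forward-backward recursions is replaced by a max, which yields the max-marginals p̂_k(x_k); taking the argmax at every k recovers a MAP state trajectory (assuming the maximizer is unique).

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # p(x_k = j | x_{k-1} = i), toy numbers
B = np.array([[0.7, 0.3], [0.1, 0.9]])   # p(y_k = y | x_{k-1} = i)
pi = np.array([0.5, 0.5])
y = [0, 1, 1, 0]
n, S = len(y), len(pi)

fwd = np.zeros((n + 1, S))
bwd = np.zeros((n + 1, S))
fwd[0] = pi
bwd[n] = 1.0

# Forward max-product sweep: max over x_{k-1} instead of a sum.
for k in range(1, n + 1):
    m = np.max(fwd[k - 1, :, None] * B[:, y[k - 1], None] * A, axis=0)
    fwd[k] = m / m.max()                  # rescaling is allowed, as with sum-product

# Backward max-product sweep: max over x_{k+1}.
for k in range(n - 1, -1, -1):
    m = np.max(A * bwd[k + 1][None, :], axis=1) * B[:, y[k]]
    bwd[k] = m / m.max()

max_marginals = fwd * bwd                 # proportional to the max-marginals
map_trajectory = max_marginals.argmax(axis=1)
print(map_trajectory)                     # one MAP estimate of (x_0, ..., x_n)
```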

Marginals and Output Edges

Marginals such as →µ_X(x) ←µ_X(x) may be viewed as messages out of an output half-edge (without incoming message):

[Figure: an edge X passing through an "=" node that also has an output half-edge X''.]

   →µ_{X''}(x'') = ∫_x ∫_{x'} →µ_X(x) ←µ_{X'}(x') δ(x'' − x) δ(x'' − x') dx dx'
                 = →µ_X(x'') ←µ_X(x'')

Marginals are computed like messages out of "="-nodes. 47

Outline

1. Factor graphs (slide 6)
2. The sum-product and max-product algorithms (slide 19)
3. On factor graphs with cycles (slide 48)
4. Factor graphs and error correcting codes (slide 51) 48

What About Factor Graphs with Cycles? 49

What About Factor Graphs with Cycles?

- Generally iterative algorithms. For example, alternating maximization
     x̂_new = argmax_x f(x, ŷ)  and  ŷ_new = argmax_y f(x̂, y),
  using the max-product algorithm in each iteration.
- Iterative sum-product message passing gives excellent results for maximization(!) in some applications (e.g., the decoding of error correcting codes).
- Many other useful algorithms can be formulated in message passing form (e.g., gradient ascent, Gibbs sampling, expectation maximization, variational methods, ...).
- Rich and vast research area... 50

Outline

1. Factor graphs (slide 6)
2. The sum-product and max-product algorithms (slide 19)
3. On factor graphs with cycles (slide 48)
4. Factor graphs and error correcting codes (slide 51) 51

Factor Graph of an Error Correcting Code (= Tanner graph of the code, Tanner 1981)

A factor graph of a code C ⊆ F^n represents (a factorization of) the membership indicator function of the code:
   I_C : F^n → {0, 1} : x ↦ 1 if x ∈ C, and 0 else. 52

Factor Graph from Parity Check Matrix

Example: (7, 4, 3) binary Hamming code. (F = GF(2).)

   C = {x ∈ F^n : H x^T = 0}  with  H = [ 1 1 1 0 1 0 0
                                           0 1 1 1 0 1 0
                                           0 0 1 1 1 0 1 ]

The membership indicator function
   I_C : F^n → {0, 1} : x ↦ 1 if x ∈ C, and 0 else
of this code may be written as
   I_C(x_1, ..., x_n) = δ(x_1 ⊕ x_2 ⊕ x_3 ⊕ x_5) δ(x_2 ⊕ x_3 ⊕ x_4 ⊕ x_6) δ(x_3 ⊕ x_4 ⊕ x_5 ⊕ x_7),
where ⊕ denotes addition modulo 2. Each factor corresponds to one row of the parity check matrix. 53
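A small sketch (illustrative, not from the slides) that evaluates this membership indicator both via the parity check matrix H and via the three δ-factors, confirming that they agree on every vector in F^n:

```python
import itertools
import numpy as np

H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 0, 1]])

def indicator_matrix(x):
    # I_C(x) = 1 iff H x^T = 0 over GF(2).
    return int(np.all((H @ x) % 2 == 0))

def indicator_factors(x):
    # Product of the three parity-check factors; delta(s) = 1 iff s = 0 (mod 2).
    delta = lambda s: int(s % 2 == 0)
    x1, x2, x3, x4, x5, x6, x7 = x
    return (delta(x1 + x2 + x3 + x5) *
            delta(x2 + x3 + x4 + x6) *
            delta(x3 + x4 + x5 + x7))

codewords = [x for x in itertools.product((0, 1), repeat=7)
             if indicator_matrix(np.array(x))]
assert all(indicator_matrix(np.array(x)) == indicator_factors(x)
           for x in itertools.product((0, 1), repeat=7))
print(len(codewords))   # 16 = 2^4 codewords, as expected for a (7, 4) code
```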

Factor Graph from Parity Check Matrix (cont'd)

Example: (7, 4, 3) binary Hamming code.

   C = {x ∈ F^n : H x^T = 0}  with  H = [ 1 1 1 0 1 0 0
                                           0 1 1 1 0 1 0
                                           0 0 1 1 1 0 1 ]

[Figure: Forney-style factor graph with half-edges X_1, ..., X_7, equality nodes, and three parity check (⊕) nodes, one per row of H.] 54

Factor Graph for Joint Code / Channel Model

[Figure: a "code" block with code symbols X_1, X_2, ..., X_n feeding into a "channel model" block with outputs Y_1, Y_2, ..., Y_n.]

Example: memoryless channel:

[Figure: the code block with each symbol X_k connected to its own channel factor producing Y_k.] 55

Factor Graph from Generator Matrix

Example: the (7, 4, 3) binary Hamming code is the image of the map F^k → F^n : u ↦ uG with

   G = [ 1 0 1 1 0 0 0
         0 1 0 1 1 0 0
         0 0 1 0 1 1 0
         0 0 0 1 0 1 1 ]

[Figure: factor graph with information symbols U_1, ..., U_4 and code symbols X_1, ..., X_7 realizing the map u ↦ uG.] 56
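As a quick consistency check (an illustration using the matrices above), every row-space element uG satisfies the parity checks of H, i.e. G H^T = 0 over GF(2):

```python
import itertools
import numpy as np

G = np.array([[1, 0, 1, 1, 0, 0, 0],
              [0, 1, 0, 1, 1, 0, 0],
              [0, 0, 1, 0, 1, 1, 0],
              [0, 0, 0, 1, 0, 1, 1]])
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 0, 1]])

assert np.all((G @ H.T) % 2 == 0)        # every codeword x = uG satisfies H x^T = 0

# Encode all 2^4 information words u and list the resulting codewords.
codewords = {tuple((np.array(u) @ G) % 2) for u in itertools.product((0, 1), repeat=4)}
print(len(codewords))                     # 16 distinct codewords
```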

Factor Graph of Dual Code

is obtained by interchanging parity check nodes and equality constraint nodes (Kschischang, Forney). Works only for Forney-style factor graphs where all code symbols are external (half-edge) variables.

Example: dual of the (7, 4, 3) binary Hamming code

[Figure: the factor graph of slide 54 with ⊕-nodes and =-nodes interchanged, code symbols X_1, ..., X_7.] 57

Factor Graph Corresponding to Trellis

Example: (7, 4, 3) binary Hamming code

[Figure: a trellis of the code with branches labeled by code bits, and the corresponding cycle-free factor graph with code symbols X_1, ..., X_7 and state variables S_1, ..., S_6.] 58

Factor Graph of Low-Density Parity-Check Codes

[Figure: bipartite factor graph with equality nodes attached to the code symbols X_1, X_2, ..., X_n, sparse random connections, and a layer of parity check (⊕) nodes.]

Standard decoder: iterative sum-product message passing. Convergence is not guaranteed!

Much recent / ongoing research on improved decoding: Yedidia, Freeman, Weiss; Feldman; Wainwright; Chertkov... 59
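Below is a compact sketch (illustrative, with a made-up tiny parity check matrix and channel LLRs) of the standard iterative sum-product decoder in the log-likelihood-ratio domain; it uses exactly the "="-node rule (L_Z = L_X + L_Y, generalized to more neighbors) and the tanh rule for check nodes that are given on the following slides.

```python
import numpy as np

H = np.array([[1, 1, 0, 1, 0, 0],        # toy parity check matrix, not a real LDPC code
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0, 1]])
L_channel = np.array([-0.5, 2.1, 1.3, -1.9, 0.4, 1.6])   # made-up channel LLRs, L = log p(0)/p(1)

m, n = H.shape
msg_v2c = H * L_channel            # variable-to-check messages, initialized to channel LLRs

for _ in range(20):                # fixed number of iterations; convergence not guaranteed
    # Check-node update (tanh rule): tanh(L_out/2) = product of tanh(L_in/2) of the other neighbors.
    t = np.where(H == 1, np.tanh(msg_v2c / 2), 1.0)
    prod = t.prod(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        extrinsic = np.where(H == 1, prod / t, 0.0)
    msg_c2v = 2 * np.arctanh(np.clip(extrinsic, -0.999999, 0.999999)) * H
    # Variable-node update ("="-node rule): channel LLR plus the other incoming check LLRs.
    total = L_channel + msg_c2v.sum(axis=0)
    msg_v2c = (total - msg_c2v) * H
    x_hat = (total < 0).astype(int)          # decide bit 1 where log p(0)/p(1) < 0
    if np.all((H @ x_hat) % 2 == 0):         # all parity checks satisfied: stop
        break

print(x_hat)
```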

Single-Number Parameterizations of Soft-Bit Messages

Difference: m ≜ (µ(0) − µ(1)) / (µ(0) + µ(1))   (= mean of the {+1, −1}-representation)
Ratio: Λ ≜ µ(0)/µ(1)
Logarithm of ratio: L ≜ log( µ(0)/µ(1) ) 60

Conversions among Parameterizations

( µ(0), µ(1) ) ∝ ( (1 + m)/2, (1 − m)/2 ) = ( Λ/(Λ+1), 1/(Λ+1) ) = ( e^L/(e^L + 1), 1/(e^L + 1) )

m = (µ(0) − µ(1)) / (µ(0) + µ(1)) = (Λ − 1)/(Λ + 1) = tanh(L/2)

Λ = µ(0)/µ(1) = (1 + m)/(1 − m) = e^L

L = ln( µ(0)/µ(1) ) = 2 tanh⁻¹(m) = ln Λ

σ² = 4 µ(0) µ(1) / (µ(0) + µ(1))² = 1 − m² = 4Λ/(Λ + 1)² = 4/(e^L + e^{−L} + 2) 61
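A small sketch (not from the slides) implementing these conversions and checking them for consistency on an arbitrary unnormalized message (µ(0), µ(1)):

```python
import numpy as np

mu0, mu1 = 0.3, 0.1                      # arbitrary unnormalized soft-bit message

m   = (mu0 - mu1) / (mu0 + mu1)          # "difference" parameterization
Lam = mu0 / mu1                          # ratio
L   = np.log(mu0 / mu1)                  # log-likelihood ratio

# Cross-checks of the conversion table.
assert np.isclose(m, (Lam - 1) / (Lam + 1))
assert np.isclose(m, np.tanh(L / 2))
assert np.isclose(Lam, (1 + m) / (1 - m))
assert np.isclose(Lam, np.exp(L))
assert np.isclose(L, 2 * np.arctanh(m))

sigma2 = 4 * mu0 * mu1 / (mu0 + mu1) ** 2
assert np.isclose(sigma2, 1 - m ** 2)
assert np.isclose(sigma2, 4 * Lam / (Lam + 1) ** 2)
assert np.isclose(sigma2, 4 / (np.exp(L) + np.exp(-L) + 2))
print(m, Lam, L, sigma2)
```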

Sum-Product Rules for Binary Parity Check Codes

Equality node, δ[x = y] δ[x = z]:
   ( µ_Z(0), µ_Z(1) ) = ( µ_X(0) µ_Y(0), µ_X(1) µ_Y(1) )
   m_Z = (m_X + m_Y) / (1 + m_X m_Y)
   Λ_Z = Λ_X Λ_Y
   L_Z = L_X + L_Y

Parity check node, δ[x ⊕ y ⊕ z]:
   ( µ_Z(0), µ_Z(1) ) = ( µ_X(0) µ_Y(0) + µ_X(1) µ_Y(1), µ_X(0) µ_Y(1) + µ_X(1) µ_Y(0) )
   m_Z = m_X m_Y
   Λ_Z = (1 + Λ_X Λ_Y) / (Λ_X + Λ_Y)
   tanh(L_Z/2) = tanh(L_X/2) tanh(L_Y/2) 62
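A minimal sketch (illustrative) of these two node rules, once in the probability domain and once via the LLR parameterization, with a numerical check that the two forms agree:

```python
import numpy as np

def equality_node(mu_x, mu_y):
    # (µ_Z(0), µ_Z(1)) = (µ_X(0)µ_Y(0), µ_X(1)µ_Y(1))
    return np.array([mu_x[0] * mu_y[0], mu_x[1] * mu_y[1]])

def check_node(mu_x, mu_y):
    # (µ_Z(0), µ_Z(1)) = (µ_X(0)µ_Y(0) + µ_X(1)µ_Y(1), µ_X(0)µ_Y(1) + µ_X(1)µ_Y(0))
    return np.array([mu_x[0] * mu_y[0] + mu_x[1] * mu_y[1],
                     mu_x[0] * mu_y[1] + mu_x[1] * mu_y[0]])

llr = lambda mu: np.log(mu[0] / mu[1])

mu_x, mu_y = np.array([0.8, 0.2]), np.array([0.4, 0.6])   # made-up incoming messages
L_x, L_y = llr(mu_x), llr(mu_y)

# LLR-domain versions of the same rules:
assert np.isclose(llr(equality_node(mu_x, mu_y)), L_x + L_y)
assert np.isclose(np.tanh(llr(check_node(mu_x, mu_y)) / 2),
                  np.tanh(L_x / 2) * np.tanh(L_y / 2))
print(equality_node(mu_x, mu_y), check_node(mu_x, mu_y))
```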

Max-Product Rules for Binary Parity Check Codes

Equality node, δ[x = y] δ[x = z]:
   µ_Z(0)/µ_Z(1) = ( µ_X(0) µ_Y(0) ) / ( µ_X(1) µ_Y(1) )
   L_Z = L_X + L_Y

Parity check node, δ[x ⊕ y ⊕ z]:
   µ_Z(0)/µ_Z(1) = max{ µ_X(0) µ_Y(0), µ_X(1) µ_Y(1) } / max{ µ_X(0) µ_Y(1), µ_X(1) µ_Y(0) }
   |L_Z| = min{ |L_X|, |L_Y| },  sgn(L_Z) = sgn(L_X) sgn(L_Y) 63
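A short numerical check (illustrative) that the LLR form of the max-product check-node rule really is the min-sum rule implied by the probability-domain expression above:

```python
import numpy as np

def maxprod_check_llr(L_x, L_y):
    # Probability-domain max-product rule for the parity check node, expressed via LLRs.
    mu = lambda L: (np.exp(L / 2), np.exp(-L / 2))      # (µ(0), µ(1)) up to scale
    (x0, x1), (y0, y1) = mu(L_x), mu(L_y)
    return np.log(max(x0 * y0, x1 * y1) / max(x0 * y1, x1 * y0))

for L_x, L_y in [(1.3, 0.4), (-2.0, 0.7), (0.5, -0.5), (-1.1, -3.0)]:
    min_sum = np.sign(L_x) * np.sign(L_y) * min(abs(L_x), abs(L_y))
    assert np.isclose(maxprod_check_llr(L_x, L_y), min_sum)
print("max-product check rule == min-sum rule")
```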

Decomposition of Multi-Bit Checks 64

Encoder of a Turbo Code

[Figure: block diagram of the encoder producing the outputs X_{1,k}, X_{2,k}, and, after an interleaver, X_{3,k}.] 65

Factor Graph of a Turbo Code

[Figure: factor graph consisting of two trellises whose symbols X_{1,k}, X_{2,k}, X_{3,k} (for k−1, k, k+1, ...) are coupled through random connections (the interleaver).] 66

Turbo Decoder: Conventional View

[Figure: block diagram with Decoder A and Decoder B exchanging the extrinsic quantities L_outA,k and L_outB,k through an interleaver and the inverse interleaver; the channel inputs L_in1,k, L_in2,k, L_in3,k feed the two decoders, and L_out,k is the final output.] 67

Messages in a Turbo Decoder

[Figure: the turbo-code factor graph with the channel messages L_in1,k, L_in2,k, L_in3,k entering, the extrinsic messages L_outA,k and L_outB,k exchanged between the two trellises through the random connections, and the combined quantity L_in1,k + L_outA,k + L_outB,k at the systematic bits.] 68