Factor Graphs and Message Passing Algorithms Part 1: Introduction


1 Factor Graphs and Message Passing Algorithms
Part 1: Introduction
Hans-Andrea Loeliger, December

2 The Two Basic Problems

1. Marginalization: Compute
$$f_k(x_k) = \sum_{x_1,\dots,x_n \text{ except } x_k} f(x_1,\dots,x_n)$$

2. Maximization: Compute the max-marginal
$$\hat f_k(x_k) = \max_{x_1,\dots,x_n \text{ except } x_k} f(x_1,\dots,x_n)$$

assuming that f is real-valued and nonnegative and has a maximum. Note that
$$\operatorname{argmax} f(x_1,\dots,x_n) = \bigl( \operatorname{argmax} \hat f_1(x_1),\ \dots,\ \operatorname{argmax} \hat f_n(x_n) \bigr).$$

For large n, both problems are in general intractable (even for $x_1,\dots,x_n \in \{0,1\}$).
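To make the intractability concrete, here is a minimal brute-force sketch for binary variables; the toy function f and the value n = 4 are illustrative assumptions, not from the slides. Both computations enumerate all 2^n configurations.

```python
from itertools import product

n = 4

def f(x):
    # toy nonnegative function of n binary variables (illustrative assumption)
    return 1.0 + sum(x) + x[0] * x[-1]

def marginal(k):
    # f_k(x_k): sum over all variables except x_k
    out = {0: 0.0, 1: 0.0}
    for x in product((0, 1), repeat=n):
        out[x[k]] += f(x)
    return out

def max_marginal(k):
    # f_hat_k(x_k): max over all variables except x_k
    out = {0: 0.0, 1: 0.0}
    for x in product((0, 1), repeat=n):
        out[x[k]] = max(out[x[k]], f(x))
    return out

print(marginal(2))       # enumerates all 2**n configurations
print(max_marginal(2))   # -> intractable for large n
```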

3 Factorization Helps

For example, if $f(x_1,\dots,x_n) = f_1(x_1)\, f_2(x_2) \cdots f_n(x_n)$, then
$$f_k(x_k) = \Bigl(\sum_{x_1} f_1(x_1)\Bigr) \cdots \Bigl(\sum_{x_{k-1}} f_{k-1}(x_{k-1})\Bigr)\, f_k(x_k)\, \Bigl(\sum_{x_{k+1}} f_{k+1}(x_{k+1})\Bigr) \cdots \Bigl(\sum_{x_n} f_n(x_n)\Bigr)$$
and
$$\hat f_k(x_k) = \Bigl(\max_{x_1} f_1(x_1)\Bigr) \cdots \Bigl(\max_{x_{k-1}} f_{k-1}(x_{k-1})\Bigr)\, f_k(x_k)\, \Bigl(\max_{x_{k+1}} f_{k+1}(x_{k+1})\Bigr) \cdots \Bigl(\max_{x_n} f_n(x_n)\Bigr).$$

Factorization helps also beyond this trivial example: factor graphs and message passing algorithms.
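A small sketch of how a complete factorization reduces the cost: each variable is summed out independently, so the marginal costs O(n · |X|) instead of |X|^n. The three toy factors below are illustrative assumptions.

```python
# toy factors f_1, f_2, f_3 of one binary variable each (illustrative assumption)
factors = [lambda x: 1.0 + x, lambda x: 2.0 - x, lambda x: 0.5 + x]

def marginal(k, alphabet=(0, 1)):
    # f_k(x_k) times one independent sum per other variable: O(n * |alphabet|)
    result = {a: factors[k](a) for a in alphabet}
    for i, fi in enumerate(factors):
        if i != k:
            s = sum(fi(a) for a in alphabet)   # sum out x_i on its own
            result = {a: v * s for a, v in result.items()}
    return result

print(marginal(1))   # {0: 12.0, 1: 6.0}
```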

4 Roots

Statistical physics:
- Markov random fields (Ising 1925?)

Signal processing:
- linear state-space models and Kalman filtering: Kalman; recursive least-squares adaptive filters
- hidden Markov models and the forward-backward algorithm: Baum et al.

Error correcting codes:
- low-density parity check codes: Gallager 1962; Tanner 1981; MacKay 1996; Luby et al.
- convolutional codes and Viterbi decoding: Forney
- turbo codes: Berrou et al.

Machine learning, statistics:
- Bayesian networks: Pearl 1988; Shachter 1988; Lauritzen and Spiegelhalter 1988; Shafer and Shenoy

5 Outline of this talk

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes

6 Factor Graphs

A factor graph represents the factorization of a function of several variables. We use Forney-style factor graphs (Forney, 2001).

Example: $f(x_1, x_2, x_3, x_4, x_5) = f_A(x_1, x_2, x_3)\, f_B(x_3, x_4, x_5)\, f_C(x_4)$.

[Figure: factor graph with nodes f_A, f_B, f_C and edges/half-edges x_1, ..., x_5.]

Rules:
- A node for every factor.
- An edge or half-edge for every variable.
- Node g is connected to edge x iff variable x appears in factor g.

(What if some variable appears in more than 2 factors?)

7 A main application of factor graphs is stochastic models.

Example: Markov chain
$$p_{XYZ}(x, y, z) = p_X(x)\, p_{Y|X}(y|x)\, p_{Z|Y}(z|y).$$

[Figure: chain factor graph with nodes p_X, p_{Y|X}, p_{Z|Y} and edges X, Y, Z.]

8 Other Notation Systems for Graphical Models

Example: $p(u, w, x, y, z) = p(u)\, p(w)\, p(x|u, w)\, p(y|x)\, p(z|x)$.

[Figure: the same model over U, W, X, Y, Z drawn four ways: as a Forney-style factor graph, as an original factor graph [FKLW 1997], as a Bayesian network, and as a Markov random field.]

9 Terminology

[Figure: factor graph with nodes f_A, f_B, f_C and edges x_1, ..., x_5.]

- Local function = factor (such as f_A, f_B, f_C).
- Global function f = product of all local functions; usually (but not always!) real and nonnegative.
- A configuration is an assignment of values to all variables.
- The configuration space is the set of all configurations, which is the domain of the global function.
- A configuration $\omega = (x_1, \dots, x_5)$ is valid iff $f(\omega) \neq 0$.

10 Invalid Configurations Do Not Affect Marginals

A configuration $\omega = (x_1, \dots, x_n)$ is valid iff $f(\omega) \neq 0$.

Recall:
1. Marginalization: Compute
$$f_k(x_k) = \sum_{x_1,\dots,x_n \text{ except } x_k} f(x_1,\dots,x_n)$$
2. Maximization: Compute the max-marginal
$$\hat f_k(x_k) = \max_{x_1,\dots,x_n \text{ except } x_k} f(x_1,\dots,x_n)$$
assuming that f is real-valued and nonnegative and has a maximum.

11 Auxiliary Variables

Example: Let Y_1 and Y_2 be two independent observations of X:
$$p(x, y_1, y_2) = p(x)\, p(y_1|x)\, p(y_2|x).$$

[Figure: factor graph with node p_X, an equality constraint node connecting X, X', X'', and nodes p_{Y_1|X}, p_{Y_2|X} with half-edges Y_1, Y_2.]

Literally, the factor graph represents an extended model
$$p(x, x', x'', y_1, y_2) = p(x)\, p(y_1|x')\, p(y_2|x'')\, f_=(x, x', x'')$$
where $f_=(x, x', x'') \triangleq \delta(x - x')\, \delta(x - x'')$ enforces $X = X' = X''$ for every valid configuration.

12 Branching Points

Equality constraint nodes may be viewed as branching points:

[Figure: an edge X branching via an "=" node into edges X' and X''.]

The factor $f_=(x, x', x'') = \delta(x - x')\, \delta(x - x'')$ enforces $X = X' = X''$ for every valid configuration.

13 Modularity, Special Symbols, Arrows

As a refinement of the previous example, let
$$Y_1 = X + Z_1 \qquad (1)$$
$$Y_2 = X + Z_2 \qquad (2)$$
with Z_1 and Z_2 independent of each other and of X:

[Figure: factor graph with node p_X, an equality constraint node on X, nodes p_{Z_1} and p_{Z_2}, and two "+" nodes producing Y_1 and Y_2; dashed boxes mark the sub-graphs corresponding to p_{Y_1|X} and p_{Y_2|X}.]

The "+"-nodes represent the factors $\delta(x + z_1 - y_1)$ and $\delta(x + z_2 - y_2)$, which enforce (1) and (2) for every valid configuration.

14 Known Variables vs. Unknown Variables

Known variables (observations, known parameters, ...) may be plugged into the corresponding factors.

Example: Y_2 = y_2 observed.

[Figure: factor graph with nodes p_X(·), p_{Y_1|X}(·|·), and p_{Y_2|X}(y_2|·); the half-edge Y_1 remains, while y_2 has been absorbed into its factor.]

Known variables will be denoted by small letters; unknown variables will usually be denoted by capital letters.

15 From A Priori to A Posteriori Probability

Example (cont'd): Let Y_1 = y_1 and Y_2 = y_2 be two independent observations of X. For fixed y_1 and y_2, we have
$$p(x | y_1, y_2) = \frac{p(x, y_1, y_2)}{p(y_1, y_2)} \propto p(x, y_1, y_2).$$

[Figure: factor graph with nodes p_X, p_{Y_1|X}, p_{Y_2|X} and the observed values y_1, y_2 plugged in.]

The factorization is unchanged (except for a scale factor).

16 Example: Hidden Markov Model
$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n) = p(x_0) \prod_{k=1}^{n} p(x_k | x_{k-1})\, p(y_k | x_{k-1})$$

[Figure: chain factor graph with nodes p(x_0), p(x_1|x_0), p(x_2|x_1), ... along the state edges X_0, X_1, X_2, ..., and nodes p(y_1|x_0), p(y_2|x_1), ... hanging off with half-edges Y_1, Y_2, ...]

17 Example: Hidden Markov Model with Parameter(s)
$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n | \theta) = p(x_0) \prod_{k=1}^{n} p(x_k | x_{k-1}, \theta)\, p(y_k | x_{k-1})$$

[Figure: the same chain factor graph, with an additional parameter edge Θ entering the transition factors p(x_1|x_0, θ), p(x_2|x_1, θ), ...]

18 A non-stochastic example: Least-Squares Problems

Minimizing $\sum_{k=1}^{n} x_k^2$ subject to (linear or nonlinear) constraints is equivalent to maximizing
$$e^{-\sum_{k=1}^{n} x_k^2} = \prod_{k=1}^{n} e^{-x_k^2}$$
subject to the given constraints.

[Figure: a "constraints" node connected to edges X_1, ..., X_n, each terminated by a node $e^{-x_k^2}$.]

Here, the factor graph represents a nonnegative real-valued function that we wish to maximize.

19 Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes

20 Recall: The Two Basic Problems

1. Marginalization: Compute
$$f_k(x_k) = \sum_{x_1,\dots,x_n \text{ except } x_k} f(x_1,\dots,x_n)$$
2. Maximization: Compute the max-marginal
$$\hat f_k(x_k) = \max_{x_1,\dots,x_n \text{ except } x_k} f(x_1,\dots,x_n)$$
assuming that f is real-valued and nonnegative and has a maximum.

21 Message Passing Algorithms

Message passing algorithms operate by passing messages along the edges of a factor graph.

[Figure: a factor graph with messages indicated by arrows along its edges.]

22 Towards the sum-product algorithm: Computing Marginals — A Generic Example

Assume we wish to compute
$$f_3(x_3) = \sum_{x_1,\dots,x_7 \text{ except } x_3} f(x_1, \dots, x_7)$$
and assume that f can be factored as follows:
$$f(x_1, \dots, x_7) = f_1(x_1)\, f_2(x_2)\, f_3(x_1, x_2, x_3)\, f_4(x_4)\, f_5(x_3, x_4, x_5)\, f_6(x_5, x_6, x_7)\, f_7(x_7).$$

[Figure: the corresponding factor graph with nodes f_1, ..., f_7 and edges X_1, ..., X_7.]

23 Example cont'd: Closing Boxes by the Distributive Law

[Figure: the same factor graph with dashed boxes drawn around the sub-graphs that are summed out.]

$$f_3(x_3) = \Bigl( \sum_{x_1, x_2} f_1(x_1)\, f_2(x_2)\, f_3(x_1, x_2, x_3) \Bigr) \cdot \Bigl( \sum_{x_4, x_5} f_4(x_4)\, f_5(x_3, x_4, x_5) \Bigl( \sum_{x_6, x_7} f_6(x_5, x_6, x_7)\, f_7(x_7) \Bigr) \Bigr)$$
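The bracketed grouping can be checked numerically. The sketch below uses seven toy factors with the same argument structure (the particular functions are illustrative assumptions, not from the slides) and verifies that the distributive-law grouping agrees with the brute-force sum.

```python
from itertools import product

# seven toy binary factors with the argument structure of the example (illustrative)
f1 = lambda x1: 0.6 + x1
f2 = lambda x2: 1.0 + 0.5 * x2
f3 = lambda x1, x2, x3: 1.0 + x1 * x2 * x3
f4 = lambda x4: 0.3 + x4
f5 = lambda x3, x4, x5: 1.0 + x3 * (x4 ^ x5)
f6 = lambda x5, x6, x7: 0.5 + x5 + x6 * x7
f7 = lambda x7: 1.0 + x7

def closed_boxes_f3(x3):
    # the bracketed expression above: one term per dashed "box"
    left = sum(f1(x1) * f2(x2) * f3(x1, x2, x3)
               for x1, x2 in product((0, 1), repeat=2))
    inner = lambda x5: sum(f6(x5, x6, x7) * f7(x7)
                           for x6, x7 in product((0, 1), repeat=2))
    right = sum(f4(x4) * f5(x3, x4, x5) * inner(x5)
                for x4, x5 in product((0, 1), repeat=2))
    return left * right

def brute_force_f3(x3):
    # direct sum over all variables except x3
    return sum(f1(x1) * f2(x2) * f3(x1, x2, x3) * f4(x4) * f5(x3, x4, x5)
               * f6(x5, x6, x7) * f7(x7)
               for x1, x2, x4, x5, x6, x7 in product((0, 1), repeat=6))

print(closed_boxes_f3(0), brute_force_f3(0))   # agree up to float rounding
```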

24 Example cont'd: Message Passing View

[Figure: the same factor graph with the messages indicated as arrows along the edges X_3 and X_5.]

$$f_3(x_3) = \underbrace{\Bigl( \sum_{x_1, x_2} f_1(x_1)\, f_2(x_2)\, f_3(x_1, x_2, x_3) \Bigr)}_{\overrightarrow{\mu}_{X_3}(x_3)} \cdot \underbrace{\Bigl( \sum_{x_4, x_5} f_4(x_4)\, f_5(x_3, x_4, x_5) \underbrace{\Bigl( \sum_{x_6, x_7} f_6(x_5, x_6, x_7)\, f_7(x_7) \Bigr)}_{\overleftarrow{\mu}_{X_5}(x_5)} \Bigr)}_{\overleftarrow{\mu}_{X_3}(x_3)}$$

25 Example cont'd: Messages Everywhere

[Figure: the same factor graph with messages on all edges.]

With $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_1(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_2(x_2)$, etc., we have
$$\overrightarrow{\mu}_{X_3}(x_3) = \sum_{x_1, x_2} \overrightarrow{\mu}_{X_1}(x_1)\, \overrightarrow{\mu}_{X_2}(x_2)\, f_3(x_1, x_2, x_3)$$
$$\overleftarrow{\mu}_{X_5}(x_5) = \sum_{x_6, x_7} \overrightarrow{\mu}_{X_7}(x_7)\, f_6(x_5, x_6, x_7)$$
$$\overleftarrow{\mu}_{X_3}(x_3) = \sum_{x_4, x_5} \overrightarrow{\mu}_{X_4}(x_4)\, \overleftarrow{\mu}_{X_5}(x_5)\, f_5(x_3, x_4, x_5)$$

26 The Sum-Product Algorithm (Belief Propagation)

[Figure: a node g with incoming edges Y_1, ..., Y_n and outgoing edge X.]

Sum-product message computation rule:
$$\overrightarrow{\mu}_X(x) \triangleq \sum_{y_1, \dots, y_n} g(x, y_1, \dots, y_n)\, \overrightarrow{\mu}_{Y_1}(y_1) \cdots \overrightarrow{\mu}_{Y_n}(y_n)$$

Sum-product theorem: If the factor graph for some global function f has no cycles, then
$$f_X(x) = \overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_X(x).$$
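A minimal sketch of this message computation rule for finite alphabets; representing the factor g as a dictionary over configurations is an implementation assumption, not part of the slides.

```python
from itertools import product

def sum_product_message(g, x_alphabet, incoming):
    """Sum-product rule at node g: mu_X(x) = sum_{y1..yn} g(x, y1..yn) * prod_i mu_Yi(yi).

    g: dict mapping tuples (x, y1, ..., yn) to factor values.
    incoming: list of messages, each a dict over one neighbour's alphabet.
    """
    msg = {x: 0.0 for x in x_alphabet}
    alphabets = [list(m.keys()) for m in incoming]
    for x in x_alphabet:
        for ys in product(*alphabets):
            term = g[(x,) + ys]
            for m, y in zip(incoming, ys):
                term *= m[y]
            msg[x] += term
    return msg

# toy usage: an equality-constraint node over {0, 1} (illustrative)
eq = {(x, y, z): 1.0 if x == y == z else 0.0
      for x in (0, 1) for y in (0, 1) for z in (0, 1)}
print(sum_product_message(eq, (0, 1), [{0: 0.6, 1: 0.4}, {0: 0.9, 1: 0.1}]))
# -> {0: 0.54, 1: 0.04}
```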

27 Arrows and Notation for Messages

For edges drawn with arrows:
- $\overrightarrow{\mu}_X$ denotes the message in the direction of the arrow.
- $\overleftarrow{\mu}_X$ denotes the message in the opposite direction.

Edges may be drawn with arrows just for the sake of this notation.

28 Sum-Product Algorithm: a Simple Example

[Figure: factor graph with node p_X, an equality constraint node on X, and two "+" nodes with inputs Z_1 ~ p_{Z_1} and Z_2 ~ p_{Z_2} and observed outputs y_1, y_2.]

$$\overrightarrow{\mu}_X(x) = p_X(x), \qquad \overrightarrow{\mu}_{Z_1}(z_1) = p_{Z_1}(z_1), \qquad \overrightarrow{\mu}_{Z_2}(z_2) = p_{Z_2}(z_2)$$

29 Sum-Product Example cont'd

[Figure: the same factor graph, now with the backward messages indicated.]

$$\overleftarrow{\mu}_{X'}(x') = \int_{z_1} \overrightarrow{\mu}_{Z_1}(z_1)\, \delta(x' + z_1 - y_1)\, dz_1 = p_{Z_1}(y_1 - x')$$
$$\overleftarrow{\mu}_X(x) = \int_{x'} \int_{x''} \overleftarrow{\mu}_{X'}(x')\, \overleftarrow{\mu}_{X''}(x'')\, \delta(x - x')\, \delta(x - x'')\, dx'\, dx'' = p_{Z_1}(y_1 - x)\, p_{Z_2}(y_2 - x)$$

30 Sum-Product Example cont'd

[Figure: the same factor graph.]

Marginal of the global function at X:
$$\overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_X(x) = p_X(x)\, \underbrace{p_{Z_1}(y_1 - x)\, p_{Z_2}(y_2 - x)}_{p(y_1, y_2 | x)} \propto p(x | y_1, y_2).$$

31 Messages for Finite-Alphabet Variables

Messages for finite-alphabet variables may be represented by a list of function values. Assume, for example, that X takes values in {+1, -1}:

[Figure: the same factor graph as in the simple example.]

$$\overrightarrow{\mu}_X \triangleq \bigl( \overrightarrow{\mu}_X(+1),\ \overrightarrow{\mu}_X(-1) \bigr) = \bigl( p_X(+1),\ p_X(-1) \bigr)$$
$$\overleftarrow{\mu}_{X'} \triangleq \bigl( \overleftarrow{\mu}_{X'}(+1),\ \overleftarrow{\mu}_{X'}(-1) \bigr) = \bigl( p_{Z_1}(y_1 - 1),\ p_{Z_1}(y_1 + 1) \bigr)$$
etc.

32 Applying the sum-product algorithm to hidden Markov models yields recursive algorithms for many things.

Recall the definition of a hidden Markov model (HMM):
$$p(x_0, x_1, x_2, \dots, x_n, y_1, y_2, \dots, y_n) = p(x_0) \prod_{k=1}^{n} p(x_k | x_{k-1})\, p(y_k | x_{k-1})$$

[Figure: the chain factor graph of the HMM with the observed values y_1, y_2, y_3 plugged in.]

Assume that Y_1 = y_1, ..., Y_n = y_n are observed (known).

33 Sum-product algorithm applied to HMM: Estimation of Current State

$$p(x_n | y_1, \dots, y_n) = \frac{p(x_n, y_1, \dots, y_n)}{p(y_1, \dots, y_n)} \propto p(x_n, y_1, \dots, y_n)$$
$$= \sum_{x_0} \cdots \sum_{x_{n-1}} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \overrightarrow{\mu}_{X_n}(x_n).$$

For n = 2:

[Figure: the HMM factor graph with y_1 and y_2 plugged in and the forward messages toward X_2 indicated.]

34 Backward Message in Chain Rule Model

[Figure: factor graph with nodes p_X and p_{Y|X} and edges X and Y.]

If Y = y is known (observed):
$$\overleftarrow{\mu}_X(x) = p_{Y|X}(y|x), \text{ the likelihood function.}$$
If Y is unknown:
$$\overleftarrow{\mu}_X(x) = \sum_{y} p_{Y|X}(y|x) = 1.$$

35 Sum-product algorithm applied to HMM: Prediction of Next Output Symbol

$$p(y_{n+1} | y_1, \dots, y_n) = \frac{p(y_1, \dots, y_{n+1})}{p(y_1, \dots, y_n)} \propto p(y_1, \dots, y_{n+1})$$
$$= \sum_{x_0, x_1, \dots, x_n} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n, y_{n+1}) = \overrightarrow{\mu}_{Y_{n+1}}(y_{n+1}).$$

For n = 2:

[Figure: the HMM factor graph with y_1 and y_2 plugged in and the forward message toward the half-edge Y_3 indicated.]

36 Sum-product algorithm applied to HMM: Estimation of Time-k State

$$p(x_k | y_1, y_2, \dots, y_n) = \frac{p(x_k, y_1, y_2, \dots, y_n)}{p(y_1, y_2, \dots, y_n)} \propto p(x_k, y_1, y_2, \dots, y_n)$$
$$= \sum_{x_0,\dots,x_n \text{ except } x_k} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\, \overleftarrow{\mu}_{X_k}(x_k)$$

For k = 1:

[Figure: the HMM factor graph with y_1, y_2, y_3 plugged in and the forward and backward messages meeting at X_1.]

37 Sum-product algorithm applied to HMM: All States Simultaneously

Compute $p(x_k | y_1, \dots, y_n)$ for all k:

[Figure: the HMM factor graph with forward and backward messages on all state edges.]

In this application, the sum-product algorithm coincides with the Baum-Welch / BCJR forward-backward algorithm.
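A hedged sketch of these forward-backward sweeps on a toy two-state HMM with the factorization above (observation factors p(y_k | x_{k-1})). The model arrays p0, A, B and the observed sequence are illustrative assumptions, not from the slides.

```python
import numpy as np

# illustrative two-state model: p(x0) * prod_k p(x_k | x_{k-1}) * p(y_k | x_{k-1})
p0 = np.array([0.5, 0.5])                    # p(x0)
A = np.array([[0.9, 0.1], [0.2, 0.8]])       # A[i, j] = p(x_k = j | x_{k-1} = i)
B = np.array([[0.8, 0.2], [0.3, 0.7]])       # B[i, y] = p(y_k = y | x_{k-1} = i)
y = [0, 1, 1]                                # observed y_1, ..., y_n (assumed)
n = len(y)

# forward messages: mu_fwd[k](x_k) proportional to p(x_k, y_1..y_k)
mu_fwd = [p0]
for k in range(n):
    mu_fwd.append((mu_fwd[-1] * B[:, y[k]]) @ A)

# backward messages: all-ones at X_n (no observations beyond y_n)
mu_bwd = [np.ones(2) for _ in range(n + 1)]
for k in range(n - 1, -1, -1):
    mu_bwd[k] = (A @ mu_bwd[k + 1]) * B[:, y[k]]

for k in range(n + 1):
    post = mu_fwd[k] * mu_bwd[k]             # forward message times backward message
    print(k, post / post.sum())              # p(x_k | y_1..y_n) after rescaling
```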

38 Scaling of Messages

In all the examples so far:
- The final result (such as $\overrightarrow{\mu}_{X_k}(x_k)\, \overleftarrow{\mu}_{X_k}(x_k)$) equals the desired quantity (such as $p(x_k | y_1, \dots, y_n)$) only up to a scale factor.
- The missing scale factor $\gamma$ may be recovered at the end from the condition
$$\sum_{x_k} \gamma\, \overrightarrow{\mu}_{X_k}(x_k)\, \overleftarrow{\mu}_{X_k}(x_k) = 1.$$
- It follows that messages may be scaled freely along the way.
- Such message scaling is often mandatory to avoid numerical problems.

39 Sum-product algorithm applied to HMM: Probability of the Observation

$$p(y_1, \dots, y_n) = \sum_{x_0} \cdots \sum_{x_n} p(x_0, x_1, \dots, x_n, y_1, y_2, \dots, y_n) = \sum_{x_n} \overrightarrow{\mu}_{X_n}(x_n).$$

This is a number. Scale factors cannot be neglected in this case.

For n = 2:

[Figure: the HMM factor graph with the forward messages toward X_2.]

40 Towards the max-product algorithm: Computing Max-Marginals — A Generic Example

Assume we wish to compute
$$\hat f_3(x_3) = \max_{x_1,\dots,x_7 \text{ except } x_3} f(x_1, \dots, x_7)$$
and assume that f can be factored as in the sum-product example:
$$f(x_1, \dots, x_7) = f_1(x_1)\, f_2(x_2)\, f_3(x_1, x_2, x_3)\, f_4(x_4)\, f_5(x_3, x_4, x_5)\, f_6(x_5, x_6, x_7)\, f_7(x_7).$$

[Figure: the corresponding factor graph with nodes f_1, ..., f_7 and edges X_1, ..., X_7.]

41 Example: Closing Boxes by the Distributive Law

[Figure: the same factor graph with dashed boxes around the sub-graphs that are maximized out.]

$$\hat f_3(x_3) = \Bigl( \max_{x_1, x_2} f_1(x_1)\, f_2(x_2)\, f_3(x_1, x_2, x_3) \Bigr) \cdot \Bigl( \max_{x_4, x_5} f_4(x_4)\, f_5(x_3, x_4, x_5) \Bigl( \max_{x_6, x_7} f_6(x_5, x_6, x_7)\, f_7(x_7) \Bigr) \Bigr)$$

42 Example cont'd: Message Passing View

[Figure: the same factor graph with messages along the edges X_3 and X_5.]

$$\hat f_3(x_3) = \underbrace{\Bigl( \max_{x_1, x_2} f_1(x_1)\, f_2(x_2)\, f_3(x_1, x_2, x_3) \Bigr)}_{\overrightarrow{\mu}_{X_3}(x_3)} \cdot \underbrace{\Bigl( \max_{x_4, x_5} f_4(x_4)\, f_5(x_3, x_4, x_5) \underbrace{\Bigl( \max_{x_6, x_7} f_6(x_5, x_6, x_7)\, f_7(x_7) \Bigr)}_{\overleftarrow{\mu}_{X_5}(x_5)} \Bigr)}_{\overleftarrow{\mu}_{X_3}(x_3)}$$

43 Example cont'd: Messages Everywhere

[Figure: the same factor graph with messages on all edges.]

With $\overrightarrow{\mu}_{X_1}(x_1) \triangleq f_1(x_1)$, $\overrightarrow{\mu}_{X_2}(x_2) \triangleq f_2(x_2)$, etc., we have
$$\overrightarrow{\mu}_{X_3}(x_3) = \max_{x_1, x_2} \overrightarrow{\mu}_{X_1}(x_1)\, \overrightarrow{\mu}_{X_2}(x_2)\, f_3(x_1, x_2, x_3)$$
$$\overleftarrow{\mu}_{X_5}(x_5) = \max_{x_6, x_7} \overrightarrow{\mu}_{X_7}(x_7)\, f_6(x_5, x_6, x_7)$$
$$\overleftarrow{\mu}_{X_3}(x_3) = \max_{x_4, x_5} \overrightarrow{\mu}_{X_4}(x_4)\, \overleftarrow{\mu}_{X_5}(x_5)\, f_5(x_3, x_4, x_5)$$

44 The Max-Product Algorithm

[Figure: a node g with incoming edges Y_1, ..., Y_n and outgoing edge X.]

Max-product message computation rule:
$$\overrightarrow{\mu}_X(x) \triangleq \max_{y_1, \dots, y_n} g(x, y_1, \dots, y_n)\, \overrightarrow{\mu}_{Y_1}(y_1) \cdots \overrightarrow{\mu}_{Y_n}(y_n)$$

Max-product theorem: If the factor graph for some global function f has no cycles, then
$$\hat f_X(x) = \overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_X(x).$$
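The same sketch as for the sum-product rule, with the sum over configurations replaced by a maximum; again, the dictionary representation of the factor g is an implementation assumption.

```python
from itertools import product

def max_product_message(g, x_alphabet, incoming):
    """Max-product rule: mu_X(x) = max_{y1..yn} g(x, y1..yn) * prod_i mu_Yi(yi)."""
    msg = {}
    alphabets = [list(m.keys()) for m in incoming]
    for x in x_alphabet:
        best = 0.0                            # factors are assumed nonnegative
        for ys in product(*alphabets):
            term = g[(x,) + ys]
            for m, y in zip(incoming, ys):
                term *= m[y]
            best = max(best, term)
        msg[x] = best
    return msg
```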

45 Max-product algorithm applied to HMM: MAP Estimate of the State Trajectory

The estimate
$$(\hat x_0, \dots, \hat x_n)_{\mathrm{MAP}} = \operatorname*{argmax}_{x_0, \dots, x_n} p(x_0, \dots, x_n | y_1, \dots, y_n) = \operatorname*{argmax}_{x_0, \dots, x_n} p(x_0, \dots, x_n, y_1, \dots, y_n)$$
may be obtained by computing
$$\hat p_k(x_k) \triangleq \max_{x_0, \dots, x_n \text{ except } x_k} p(x_0, \dots, x_n, y_1, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\, \overleftarrow{\mu}_{X_k}(x_k)$$
for all k by forward-backward max-product sweeps.

In this example, the max-product algorithm is a time-symmetric version of the Viterbi algorithm with soft output.
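A hedged sketch of these forward-backward max-product sweeps on the same toy HMM used in the sum-product sketch above; the max-marginals $\hat p_k$ and their component-wise argmax are computed as on the slide. All model numbers are illustrative assumptions.

```python
import numpy as np

# same illustrative two-state model as in the sum-product sketch (an assumption)
p0 = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.2, 0.8]])       # p(x_k = j | x_{k-1} = i)
B = np.array([[0.8, 0.2], [0.3, 0.7]])       # p(y_k = y | x_{k-1} = i)
y = [0, 1, 1]
n = len(y)

# forward max-product sweep
fwd = [p0]
for k in range(n):
    fwd.append(np.max(fwd[-1][:, None] * B[:, y[k]][:, None] * A, axis=0))

# backward max-product sweep
bwd = [np.ones(2) for _ in range(n + 1)]
for k in range(n - 1, -1, -1):
    bwd[k] = np.max(A * bwd[k + 1][None, :], axis=1) * B[:, y[k]]

for k in range(n + 1):
    p_hat = fwd[k] * bwd[k]                  # max-marginal, up to a scale factor
    print(k, int(np.argmax(p_hat)))          # component-wise MAP state estimate
```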

46 Max-product algorithm applied to HMM: MAP Estimate of the State Trajectory cont'd

Computing
$$\hat p_k(x_k) \triangleq \max_{x_0, \dots, x_n \text{ except } x_k} p(x_0, \dots, x_n, y_1, \dots, y_n) = \overrightarrow{\mu}_{X_k}(x_k)\, \overleftarrow{\mu}_{X_k}(x_k)$$
simultaneously for all k:

[Figure: the HMM factor graph with forward and backward max-product messages on all state edges.]

47 Marginals and Output Edges

Marginals such as $\overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_X(x)$ may be viewed as messages out of an output half-edge (without incoming message):

[Figure: an edge X redrawn as an equality constraint node with an additional half-edge X carrying the marginal.]

$$\mu_X(x) = \int_{x'} \int_{x''} \overrightarrow{\mu}_{X'}(x')\, \overleftarrow{\mu}_{X''}(x'')\, \delta(x - x')\, \delta(x - x'')\, dx'\, dx'' = \overrightarrow{\mu}_X(x)\, \overleftarrow{\mu}_X(x)$$

Marginals are computed like messages out of "="-nodes.

48 Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes

49 What About Factor Graphs with Cycles?

50 What About Factor Graphs with Cycles?

Generally iterative algorithms. For example, alternating maximization
$$\hat x_{\text{new}} = \operatorname*{argmax}_x f(x, \hat y) \qquad \text{and} \qquad \hat y_{\text{new}} = \operatorname*{argmax}_y f(\hat x, y),$$
using the max-product algorithm in each iteration.

Iterative sum-product message passing gives excellent results for maximization(!) in some applications (e.g., the decoding of error correcting codes).

Many other useful algorithms can be formulated in message passing form (e.g., gradient ascent, Gibbs sampling, expectation maximization, variational methods, ...).

Rich and vast research area...

51 Outline

1. Factor graphs
2. The sum-product and max-product algorithms
3. On factor graphs with cycles
4. Factor graphs and error correcting codes

52 Factor Graph of an Error Correcting Code

= Tanner graph of the code (Tanner 1981).

A factor graph of a code $C \subseteq F^n$ represents (a factorization of) the membership indicator function of the code:
$$I_C : F^n \to \{0, 1\} : x \mapsto \begin{cases} 1, & \text{if } x \in C \\ 0, & \text{else.} \end{cases}$$

53 Factor Graph from Parity Check Matrix

Example: (7, 4, 3) binary Hamming code. (F = GF(2).)
$$C = \{ x \in F^n : H x^{\mathsf{T}} = 0 \} \quad \text{with} \quad H = \begin{pmatrix} 1 & 1 & 1 & 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 & 0 & 1 \end{pmatrix}.$$

The membership indicator function $I_C$ of this code may be written as
$$I_C(x_1, \dots, x_n) = \delta(x_1 \oplus x_2 \oplus x_3 \oplus x_5)\, \delta(x_2 \oplus x_3 \oplus x_4 \oplus x_6)\, \delta(x_3 \oplus x_4 \oplus x_5 \oplus x_7),$$
where $\oplus$ denotes addition modulo 2. Each factor corresponds to one row of the parity check matrix.
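A minimal sketch of this membership indicator as a product of the three parity-check factors; the helper names parity_is_zero and I_C are just for illustration.

```python
def parity_is_zero(bit_sum):
    # the factor delta(.): 1 if the mod-2 sum is 0, else 0
    return 1 if bit_sum % 2 == 0 else 0

def I_C(x):
    # x = (x1, ..., x7) as a tuple of bits; one factor per parity check
    return (parity_is_zero(x[0] + x[1] + x[2] + x[4])
            * parity_is_zero(x[1] + x[2] + x[3] + x[5])
            * parity_is_zero(x[2] + x[3] + x[4] + x[6]))

print(I_C((0, 0, 0, 0, 0, 0, 0)))   # 1: the all-zero word is a codeword
print(I_C((1, 0, 0, 0, 0, 0, 0)))   # 0: violates the first check
print(sum(I_C(tuple(map(int, f"{i:07b}"))) for i in range(128)))  # 16 codewords = 2**4
```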

54 Factor Graph from Parity Check Matrix (cont'd)

Example: (7, 4, 3) binary Hamming code, $C = \{ x \in F^n : H x^{\mathsf{T}} = 0 \}$ with H as above.

[Figure: Forney-style factor graph with equality constraint nodes on the code symbols X_1, ..., X_7 connected to three parity check nodes.]

55 Factor Graph for Joint Code / Channel Model

[Figure: a "code" block with symbol edges X_1, X_2, ..., X_n connected to a "channel model" block with output edges Y_1, Y_2, ..., Y_n.]

Example: memoryless channel.

[Figure: each code symbol edge X_k connected to its own channel factor with output edge Y_k.]

56 Factor Graph from Generator Matrix

Example: the (7, 4, 3) binary Hamming code is the image of $F^k \to F^n : u \mapsto uG$ with generator matrix G.

[Figure: factor graph with information symbols U_1, ..., U_4 and code symbols X_1, ..., X_7 connected according to the generator matrix.]

57 Factor Graph of Dual Code

The factor graph of the dual code is obtained by interchanging parity check nodes and equality constraint nodes (Kschischang, Forney). This works only for Forney-style factor graphs where all code symbols are external (half-edge) variables.

Example: dual of the (7, 4, 3) binary Hamming code.

[Figure: the Hamming-code factor graph with parity check nodes and equality constraint nodes interchanged, over the symbols X_1, ..., X_7.]

58 Factor Graph Corresponding to Trellis

Example: (7, 4, 3) binary Hamming code.

[Figure: trellis-structured factor graph with code symbols X_1, ..., X_7 and internal state variables S_k.]

59 Factor Graph of Low-Density Parity-Check Codes

[Figure: equality constraint nodes on the code symbols X_1, X_2, ..., X_n connected through random connections to parity check nodes.]

Standard decoder: iterative sum-product message passing. Convergence is not guaranteed!

Much recent / ongoing research on improved decoding: Yedidia, Freeman, Weiss; Feldman; Wainwright; Chertkov; ...

60 Single-Number Parameterizations of Soft-Bit Messages

Difference: $m = \dfrac{\mu(0) - \mu(1)}{\mu(0) + \mu(1)}$, the mean of the {+1, -1}-representation.

Ratio: $\Lambda = \mu(0) / \mu(1)$.

Logarithm of ratio: $L = \log\bigl( \mu(0) / \mu(1) \bigr)$.

61 Conversions among Parameterizations

$$\bigl( \mu(0),\ \mu(1) \bigr) = \Bigl( \tfrac{1+m}{2},\ \tfrac{1-m}{2} \Bigr) = \Bigl( \tfrac{\Lambda}{\Lambda+1},\ \tfrac{1}{\Lambda+1} \Bigr) = \Bigl( \tfrac{e^L}{e^L+1},\ \tfrac{1}{e^L+1} \Bigr)$$
$$m = \frac{\mu(0) - \mu(1)}{\mu(0) + \mu(1)} = \frac{\Lambda - 1}{\Lambda + 1} = \tanh(L/2)$$
$$\Lambda = \frac{\mu(0)}{\mu(1)} = \frac{1+m}{1-m} = e^L$$
$$L = \ln \frac{\mu(0)}{\mu(1)} = 2 \tanh^{-1}(m) = \ln \Lambda$$
$$\sigma^2 = \frac{4\, \mu(0)\, \mu(1)}{\bigl( \mu(0) + \mu(1) \bigr)^2} = 1 - m^2 = \frac{4\Lambda}{(\Lambda+1)^2} = \frac{4}{e^L + e^{-L} + 2}$$
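A small sketch of these conversions, starting from a log-ratio L and recovering the other parameterizations; the function name from_L is an illustrative assumption.

```python
import math

def from_L(L):
    """Given the log-ratio L, recover (mu(0), mu(1)), the mean m, and the ratio Lambda."""
    mu0 = math.exp(L) / (math.exp(L) + 1)
    mu1 = 1.0 / (math.exp(L) + 1)
    m = math.tanh(L / 2)
    Lam = math.exp(L)
    return (mu0, mu1), m, Lam

(mu0, mu1), m, Lam = from_L(1.5)
print(mu0 - mu1, m)                                             # equal: m = tanh(L/2)
print(math.log(mu0 / mu1), 2 * math.atanh(m), math.log(Lam))    # each recovers L = 1.5
print(4 * mu0 * mu1, 1 - m * m)                                 # sigma^2, two ways
```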

62 Sum-Product Rules for Binary Parity Check Codes

Equality constraint node, $\delta[x - y]\, \delta[x - z]$:
$$\bigl( \mu_Z(0),\ \mu_Z(1) \bigr) = \bigl( \mu_X(0)\, \mu_Y(0),\ \mu_X(1)\, \mu_Y(1) \bigr)$$
$$m_Z = \frac{m_X + m_Y}{1 + m_X m_Y}, \qquad \Lambda_Z = \Lambda_X \Lambda_Y, \qquad L_Z = L_X + L_Y$$

Parity check node, $\delta[x \oplus y \oplus z]$:
$$\bigl( \mu_Z(0),\ \mu_Z(1) \bigr) = \bigl( \mu_X(0)\, \mu_Y(0) + \mu_X(1)\, \mu_Y(1),\ \mu_X(0)\, \mu_Y(1) + \mu_X(1)\, \mu_Y(0) \bigr)$$
$$m_Z = m_X m_Y, \qquad \Lambda_Z = \frac{1 + \Lambda_X \Lambda_Y}{\Lambda_X + \Lambda_Y}, \qquad \tanh(L_Z/2) = \tanh(L_X/2)\, \tanh(L_Y/2)$$
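A sketch of these two rules in the log-ratio (L) parameterization; the function names are illustrative.

```python
import math

def equality_node_L(L_x, L_y):
    # "=" node: L_Z = L_X + L_Y
    return L_x + L_y

def parity_node_L(L_x, L_y):
    # parity check node: tanh(L_Z / 2) = tanh(L_X / 2) * tanh(L_Y / 2)
    return 2 * math.atanh(math.tanh(L_x / 2) * math.tanh(L_y / 2))

print(equality_node_L(1.0, -0.5))   # 0.5
print(parity_node_L(1.0, -0.5))     # about -0.23
```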

63 Max-Product Rules for Binary Parity Check Codes

Equality constraint node, $\delta[x - y]\, \delta[x - z]$:
$$\frac{\mu_Z(0)}{\mu_Z(1)} = \frac{\mu_X(0)\, \mu_Y(0)}{\mu_X(1)\, \mu_Y(1)}, \qquad L_Z = L_X + L_Y$$

Parity check node, $\delta[x \oplus y \oplus z]$:
$$\frac{\mu_Z(0)}{\mu_Z(1)} = \frac{\max\{ \mu_X(0)\, \mu_Y(0),\ \mu_X(1)\, \mu_Y(1) \}}{\max\{ \mu_X(0)\, \mu_Y(1),\ \mu_X(1)\, \mu_Y(0) \}}$$
$$|L_Z| = \min\{ |L_X|, |L_Y| \}, \qquad \operatorname{sgn}(L_Z) = \operatorname{sgn}(L_X)\, \operatorname{sgn}(L_Y)$$
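The corresponding max-product ("min-sum") parity rule in the log-ratio parameterization, again with an illustrative function name; the equality-node rule is the same L_Z = L_X + L_Y as above.

```python
import math

def parity_node_minsum(L_x, L_y):
    # |L_Z| = min(|L_X|, |L_Y|), sgn(L_Z) = sgn(L_X) * sgn(L_Y)
    magnitude = min(abs(L_x), abs(L_y))
    return math.copysign(magnitude, L_x * L_y)

print(parity_node_minsum(1.0, -0.5))   # -0.5 (compare ~-0.23 from the exact sum-product rule)
```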

64 Decomposition of Multi-Bit Checks

[Figure.]

65 Encoder of a Turbo Code

[Figure: encoder block diagram with outputs X_{1,k}, X_{2,k}, X_{3,k} and an interleaver.]

66 Factor Graph of a Turbo Code

[Figure: factor graph with two trellises over the symbols X_{1,k-1}, X_{1,k}, X_{1,k+1} and X_{3,k-1}, X_{3,k}, X_{3,k+1}, coupled to X_{2,k-1}, X_{2,k}, X_{2,k+1} through random connections (the interleaver).]

67 Turbo Decoder: Conventional View

[Figure: block diagram with Decoder A and Decoder B exchanging the extrinsic values L_{outA,k} and L_{outB,k} through an interleaver and inverse interleaver; channel inputs L_{in1,k}, L_{in2,k}, L_{in3,k}; output L_{out,k}.]

68 Messages in a Turbo Decoder

[Figure: the turbo-code factor graph annotated with the messages L_{in1,k}, L_{in2,k}, L_{in3,k}, L_{outA,k}, L_{outB,k}, and L_{outA,k} + L_{outB,k}, passed across the random connections.]
