Learning and message-passing in graphical models

Martin Wainwright, UC Berkeley, Departments of Statistics and EECS

Introduction
A graphical model combines a graph $G = (V, E)$ with $p$ vertices and a random vector $(X_1, X_2, \dots, X_p)$.
Example graph structures: (a) Markov chain, (b) multiscale quadtree, (c) two-dimensional grid.
Useful in many statistical and computational fields: statistical signal processing, statistical image processing, statistical machine learning, computational biology and bioinformatics, statistical physics, communication and information theory.

Graphs and factorization
A clique $C$ is a fully connected subset of vertices, and a compatibility function $\psi_C$ is defined on the variables $x_C = \{x_s,\; s \in C\}$. The distribution factorizes over all cliques:
$$Q(x_1, \dots, x_p) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C).$$
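As a concrete illustration of this factorization, the sketch below builds the joint distribution of a small three-variable chain by multiplying pairwise compatibility functions and normalizing; the particular potentials are made up for the example and are not taken from the talk.

```python
import numpy as np

# Hypothetical pairwise potentials for a 3-node chain 1 - 2 - 3,
# each variable taking 2 states.
psi_12 = np.array([[2.0, 1.0], [1.0, 2.0]])  # psi_{12}(x_1, x_2)
psi_23 = np.array([[2.0, 1.0], [1.0, 2.0]])  # psi_{23}(x_2, x_3)

# Unnormalized joint: product of the clique compatibility functions.
joint = np.einsum('ij,jk->ijk', psi_12, psi_23)

# Partition function Z and normalized distribution Q.
Z = joint.sum()
Q = joint / Z
print(Q.sum())  # 1.0
```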

Example: Optical digit/character recognition
Goal: correctly label digits/characters based on noisy versions, e.g., mail sorting, document scanning, handwriting recognition systems.
Strong sequential dependencies are captured by a (hidden) Markov chain; message-passing spreads information along the chain (Baum & Petrie, 1966; Viterbi, 1967, and many others).

Example: Image denoising with grid models
An 8-bit digital image is a matrix of intensity values in $\{0, 1, \dots, 255\}$. A pairwise interaction of the form
$$\psi_{uv}(x_u, x_v) = \begin{pmatrix} 1 & \gamma & \gamma \\ \gamma & 1 & \gamma \\ \gamma & \gamma & 1 \end{pmatrix}, \qquad \gamma \in (0,1)$$
(shown here for three intensity levels) enforces smoothness: $\{x_u = x_v\}$ is more likely than $\{x_u \neq x_v\}$.
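A small sketch of how such a smoothing potential might be constructed in code; the number of intensity levels and the value of gamma below are illustrative choices, not values from the talk.

```python
import numpy as np

def smoothing_potential(num_levels: int, gamma: float) -> np.ndarray:
    """Pairwise potential with 1 on the diagonal (x_u == x_v) and
    gamma < 1 off the diagonal, favouring equal neighbouring labels."""
    psi = np.full((num_levels, num_levels), gamma)
    np.fill_diagonal(psi, 1.0)
    return psi

psi_uv = smoothing_potential(num_levels=256, gamma=0.4)  # 8-bit intensities
```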

Example: Multi-scale modeling
It is often useful to move beyond pixel-based representations in image modeling: compute a multi-scale transform (e.g., wavelets, Gabor filters, etc.) and model the coefficients using a multi-scale autoregressive process on a tree (e.g., Willsky, 2002).

Example: Depth estimation in computer vision
Stereo pairs: two images taken from horizontally offset cameras.

Modeling depth with a graphical model
Introduce a variable $x_{ab}$ at each pixel location $(a, b)$, representing the offset (disparity) between the left and right images at that position; the model combines node potentials $\psi_{ab}(x_{ab})$ with pairwise potentials $\psi_{ab,cd}(x_{ab}, x_{cd})$ between neighbouring pixels.
Use message-passing algorithms to estimate the most likely offset/depth map (Szeliski et al., 2005).

Many other examples
natural language processing (e.g., parsing, translation)
computational biology (gene sequences, protein folding, phylogenetic reconstruction)
sensor network deployments (e.g., distributed detection, estimation, fault monitoring)
social network analysis (e.g., politics, Facebook, terrorism)
communication theory and error-control decoding (e.g., turbo codes, LDPC codes)
satisfiability problems (3-SAT, MAX-XORSAT, graph colouring)
robotics (path planning, tracking, navigation)
...

Remainder of talk
1. Message-passing in graphical models: marginalization and applications; basic form of the sum-product algorithm (belief propagation); stochastic belief propagation.
2. Learning graph structure from data: applications; neighborhood-based regression; application to social network data.

1. Marginalization and message-passing
Define the marginal distribution at vertex $u \in V$:
$$Q(x_u \mid y) = \sum_{x_v,\, v \in V \setminus \{u\}} Q(x_1, x_2, \dots, x_N \mid y).$$
Useful in many applications:
Optical character recognition.
Computer vision: disparity computation $Q(x_u \mid y_1, \dots, y_N)$, where $x_u$ is the depth at pixel $u$ and $y_1, \dots, y_N$ are multiple images.

Divide and conquer
Goal: compute the marginal distribution at node 1 of a three-node chain $1 - 2 - 3$:
$$\sum_{x_2, x_3} Q(x_1, x_2, x_3) \propto \sum_{x_2, x_3} \Big\{ \psi_1(x_1)\,\psi_2(x_2)\,\psi_3(x_3)\,\psi_{12}(x_1, x_2)\,\psi_{23}(x_2, x_3) \Big\}.$$
Pushing the sums inside the product gives
$$\sum_{x_2, x_3} Q(x_1, x_2, x_3) \propto \psi_1(x_1) \underbrace{\sum_{x_2} \Big\{ \psi_2(x_2)\,\psi_{12}(x_1, x_2) \underbrace{\sum_{x_3} \psi_3(x_3)\,\psi_{23}(x_2, x_3)}_{M_{32}(x_2)} \Big\}}_{M_{21}(x_1)}.$$
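The sketch below carries out this elimination numerically for a three-node chain with binary variables; the potentials are illustrative placeholders rather than values from the talk.

```python
import numpy as np

d = 2
psi1, psi2, psi3 = np.ones(d), np.ones(d), np.ones(d)   # node potentials
psi12 = np.array([[2.0, 1.0], [1.0, 2.0]])              # psi_{12}(x1, x2)
psi23 = np.array([[2.0, 1.0], [1.0, 2.0]])              # psi_{23}(x2, x3)

# Eliminate x3 first: M_{32}(x2) = sum_{x3} psi3(x3) psi23(x2, x3)
M32 = psi23 @ psi3

# Then eliminate x2: M_{21}(x1) = sum_{x2} psi2(x2) psi12(x1, x2) M_{32}(x2)
M21 = psi12 @ (psi2 * M32)

# Marginal at node 1, up to normalization.
marginal1 = psi1 * M21
marginal1 /= marginal1.sum()
print(marginal1)
```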

Iterating the procedure
Decompose the marginalization over a larger tree (here a five-node tree, with messages $M_{21}$, $M_{32}$, $M_{42}$, $M_{53}$):
$$\sum_{x_2, x_3, x_4, x_5} Q(x) \propto \psi_1(x_1) \sum_{x_2} \Big[ \psi_{12}(x_1, x_2) \prod_{u \in N(2) \setminus \{1\}} M_{u2}(x_2) \Big].$$
Update messages along each edge, for example
$$M_{32}(x_2) = \sum_{x_3} \psi_3(x_3)\,\psi_{23}(x_2, x_3) \prod_{v \in N(3) \setminus \{2\}} M_{v3}(x_3).$$

Ordinary belief propagation
1. At time $t = 0$, for all edges $(v, u) \in E$, initialize the message vectors: $M^0_{vu}(k) = 1$ for all states $k \in \{1, \dots, d\}$.
2. At times $t = 0, 1, 2, \dots$, for each directed edge $(v \to u)$, update the message:
$$M^{t+1}_{vu}(k) \;\propto\; \sum_{j=1}^{d} \psi_v(j)\,\psi_{uv}(k, j) \prod_{w \in N(v) \setminus \{u\}} M^t_{wv}(j).$$
3. Upon convergence, compute (approximate) marginal distributions:
$$Q[X_u = k] \;\propto\; \psi_u(k) \prod_{w \in N(u)} M_{wu}(k).$$
Computational complexity: $\Theta(d^2)$ operations per edge per round.
Communication complexity: requires passing $d - 1$ real numbers per edge.
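A minimal sketch of these sum-product updates for a pairwise model on an undirected graph; the parallel update schedule, per-round normalization of messages, and fixed iteration count are implementation choices for the illustration, not details specified on the slide.

```python
import numpy as np

def belief_propagation(edges, node_pot, edge_pot, d, num_iters=50):
    """Parallel sum-product updates on a pairwise model.

    edges:    list of undirected edges (u, v)
    node_pot: dict u -> psi_u, array of shape (d,)
    edge_pot: dict (u, v) -> psi_uv, array of shape (d, d), indexed [x_u, x_v]
    """
    nbrs = {u: set() for u in node_pot}
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)

    def pot(u, v):  # psi_uv viewed with index order [x_u, x_v]
        return edge_pot[(u, v)] if (u, v) in edge_pot else edge_pot[(v, u)].T

    # msgs[(v, u)] is the message from v to u, initialized to all ones
    msgs = {(v, u): np.ones(d) for u in nbrs for v in nbrs[u]}

    for _ in range(num_iters):
        new = {}
        for (v, u) in msgs:
            incoming = np.ones(d)
            for w in nbrs[v] - {u}:          # product over N(v) \ {u}
                incoming *= msgs[(w, v)]
            m = pot(u, v) @ (node_pot[v] * incoming)   # sum over states of v
            new[(v, u)] = m / m.sum()                  # normalize for stability
        msgs = new

    marginals = {}
    for u in nbrs:
        b = node_pot[u].copy()
        for w in nbrs[u]:
            b *= msgs[(w, u)]
        marginals[u] = b / b.sum()
    return marginals
```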

Some reduced-complexity methods
Faster methods for problems with algebraic structure: low-density parity-check codes (e.g., Gallager, 1963; Richardson & Urbanke, 2008); Toeplitz matrices in computer vision (Felzenszwalb & Huttenlocher, 2006); exploiting symmetries (Kersten et al., 2009; McAuley & Caetano, 2011).
Quantized forms of belief propagation (Gallager, 1963; Coughlan & Shen, 2007): lower complexity (computation and communication), but do not converge to the exact BP fixed point.
Particle filtering and non-parametric belief propagation (Doucet et al., 2001; Sudderth et al., 2003): designed for continuous state spaces; high fidelity requires taking the number of particles to infinity.

Stochastic belief propagation
Off-line phase: pre-compute the normalized columns
$$\Gamma_{uv}(:, j) = \frac{\psi_{uv}(:, j)\,\psi_v(j)}{\beta_{uv}(j)}, \qquad \text{where } \beta_{uv}(j) = \sum_{k=1}^{d} \psi_{uv}(k, j)\,\psi_v(j).$$
Stochastic belief propagation (SBP):
1. Initialize the message vectors $M^0_{vu}(k) = 1$ for all $k \in \{1, \dots, d\}$.
2. At times $t = 0, 1, 2, \dots$, for each directed edge $(v \to u)$:
(A) Compute the product of incoming messages: $M_{v \setminus u}(j) := \prod_{w \in N(v) \setminus \{u\}} M^t_{wv}(j)$.
(B) Pick a random index $J^t_{vu}$ according to the probability distribution $p^t_{vu}(j) \propto M_{v \setminus u}(j)\,\beta_{uv}(j)$ for $j \in \{1, \dots, d\}$.
(C) Update the message vector $M^{t+1}_{vu}(:) \in \mathbb{R}^d$ with stepsize $\lambda^t \in (0, 1)$:
$$M^{t+1}_{vu}(:) = (1 - \lambda^t)\,M^t_{vu}(:) + \lambda^t\,\Gamma_{uv}(:, J^t_{vu}).$$
Computational complexity: $\Theta(d)$ operations per edge per round.
Communication complexity: requires passing $\log(d)$ bits per edge.
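A sketch of the SBP update for a single directed edge, assuming the Gamma columns and beta weights have been pre-computed as above; the function name, random seed, and stepsize schedule are illustrative choices rather than details from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def sbp_edge_update(M_in_product, M_vu, Gamma_uv, beta_uv, step):
    """One stochastic BP update for the directed edge (v -> u).

    M_in_product: product of incoming messages M^t_{wv}, w in N(v)\{u}, shape (d,)
    M_vu:         current message M^t_{vu}, shape (d,)
    Gamma_uv:     pre-computed normalized columns, shape (d, d)
    beta_uv:      pre-computed column weights, shape (d,)
    step:         stepsize lambda_t in (0, 1)
    """
    # (B) sample a column index J with probability prop. to M_{v\u}(j) * beta_uv(j)
    p = M_in_product * beta_uv
    p /= p.sum()
    J = rng.choice(len(p), p=p)
    # (C) convex combination of the old message and the sampled column
    return (1.0 - step) * M_vu + step * Gamma_uv[:, J]

# A Robbins-Monro style schedule (sum of lambda_t diverges, sum of squares converges),
# e.g. lambda_t = 1/(t+2); this particular choice is illustrative.
steps = [1.0 / (t + 2) for t in range(10)]
```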

Sample paths on a grid-structured graph: normalized squared error versus number of iterations.

Some intuition
The stochastic BP update is unbiased at each round:
$$\mathbb{E}\big[M^{t+1}_{vu} \mid M^t_{vu}\big] = (1 - \lambda^t)\,M^t_{vu}(:) + \lambda^t \sum_{j=1}^{d} p_{uv}(j)\,\Gamma_{uv}(:, j) \;\propto\; (1 - \lambda^t)\,M^t_{vu}(:) + \lambda^t \sum_{j=1}^{d} \psi_{uv}(:, j)\,\psi_v(j) \prod_{w \in N(v) \setminus \{u\}} M^t_{wv}(j).$$
The variance depends on the step sizes $\{\lambda^t\}_{t \ge 1}$: we need $\lambda^t \to 0$ to smooth out fluctuations, and $\sum_{t=1}^{\infty} \lambda^t = +\infty$ for infinite travel.
Challenges:
1. A stability condition is needed, so that the average path is well behaved.
2. Control on the spread of noise through the network.

SBP on graphs with cycles
Assume that ordinary BP is $\mu$-contractive: $\|F(M) - F(M')\|_2 \le (1 - \mu/2)\,\|M - M'\|_2$.
Theorem. With step size $\lambda^t = \frac{1}{\mu(t+1)}$, the SBP iterates have the following guarantees:
(a) The SBP messages $\{M^t\}_{t=0}^{\infty}$ converge almost surely to the unique BP fixed point $M^*$ as $t \to +\infty$.
(b) The expected squared error is bounded as
$$\frac{\mathbb{E}\big[\|M^t - M^*\|_2^2\big]}{\|M^*\|_2^2} \;\le\; \frac{c_0\,K(\psi)}{\mu^2\,t} + c_1\,\frac{\|M^0 - M^*\|_2^2}{\|M^*\|_2^2}\,\Big(\frac{1}{t}\Big)^2, \tag{1}$$
where $K(\psi)$ is a constant depending on the potential functions $\psi$.
(c) For every $q \ge 1$, with probability at least $1 - \frac{2}{t^q}$, we have
$$\frac{\|M^t - M^*\|_2^2}{\|M^*\|_2^2} \;\le\; (1 + 2q)\,\frac{32\,K(\psi)}{\mu^2}\,\frac{\log t}{t^{3/4}}.$$

Complexity comparison

Algorithm       Communication per edge   Computation per edge/round   Iterations to accuracy δ
Ordinary BP     d − 1 real numbers       Θ(d²)                        log(1/δ)
Stochastic BP   log(d) bits              Θ(d)                         1/δ

Denoising example: side-by-side denoised images from ordinary BP and SBP.

Comparison of running times: mean squared error versus running time (sec) for BP and SBP.

Application to optical flow computation.

2. Learning graph structure from data
Up to now, we have been discussing a forward problem: computing marginals in a graphical model. Inverse problems concern learning the parameters and structure of graphs from data.
Many instances of such graph learning problems: fitting graphs to politicians' voting behavior; modeling diseases with epidemiological networks; traffic flow modeling; interactions between different genes; and so on.

Example: US Senate network (voting) (Banerjee et al., 2008; Ravikumar, Wainwright & Lafferty, 2010)

Example: Biological networks
Gene networks during the Drosophila life cycle (Ahmed & Xing, PNAS, 2009).
Many other examples: protein networks, phylogenetic trees.

Learning for pairwise models
Suppose we have drawn $n$ samples from
$$Q(x_1, \dots, x_p; \Theta) = \frac{1}{Z(\Theta)} \exp\Big\{ \sum_{s \in V} \theta_s x_s^2 + \sum_{(s,t) \in E} \theta_{st} x_s x_t \Big\},$$
where the graph $G$ and the matrix $[\Theta]_{st} = \theta_{st}$ of edge weights are unknown.
Data matrix: Ising model (binary variables), $X_1^n \in \{0,1\}^{n \times p}$; Gaussian model, $X_1^n \in \mathbb{R}^{n \times p}$. An estimator maps $X_1^n \mapsto \widehat{\Theta}$.
Various loss functions are possible: graph selection, i.e. is $\mathrm{supp}[\widehat{\Theta}] = \mathrm{supp}[\Theta]$?; bounds on the Kullback-Leibler divergence $D(Q_{\widehat{\Theta}} \,\|\, Q_{\Theta})$; bounds on $\|\widehat{\Theta} - \Theta\|_{\mathrm{op}}$.
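To make the sampling setup concrete, here is a minimal Gibbs-sampler sketch for drawing approximate samples from the binary (Ising) case of this model, where $\theta_s x_s^2 = \theta_s x_s$ for $x_s \in \{0,1\}$; the symmetric weight matrix, chain length, and thinning are illustrative assumptions, not part of the talk.

```python
import numpy as np

rng = np.random.default_rng(1)

def gibbs_sample_ising(Theta, theta_node, num_samples, burn_in=1000, thin=10):
    """Approximate samples from Q(x; Theta) with x in {0,1}^p via single-site
    Gibbs updates; Theta is symmetric with zero diagonal."""
    p = len(theta_node)
    x = rng.integers(0, 2, size=p)
    samples = []
    total_iters = burn_in + num_samples * thin
    for it in range(total_iters):
        for s in range(p):
            # conditional log-odds of x_s = 1 given the rest (its Markov blanket)
            logit = theta_node[s] + Theta[s] @ x - Theta[s, s] * x[s]
            x[s] = rng.random() < 1.0 / (1.0 + np.exp(-logit))
        if it >= burn_in and (it - burn_in) % thin == 0:
            samples.append(x.copy())
    return np.array(samples)
```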

Challenges in graph selection
For pairwise models, the negative log-likelihood takes the form
$$\ell(\Theta; X_1^n) := -\frac{1}{n} \sum_{i=1}^{n} \log Q(x_{i1}, \dots, x_{ip}; \Theta).$$
For Gaussian graphical models, this is a log-determinant program; for discrete graphical models, exact computation of the likelihood is intractable (curse of dimensionality).
Various work-arounds are possible: Markov chain Monte Carlo and stochastic gradient methods; variational approximations to the likelihood; pseudo-likelihoods or neighborhood-based approaches.
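As one illustration of the pseudo-likelihood work-around mentioned above, the joint likelihood is replaced by a product of node-wise conditionals, each of which needs only a local normalization; the display below is the standard form (Besag, 1974) written out for reference rather than a formula taken verbatim from the slides.

```latex
% Pseudo-likelihood: each conditional Q(x_{is} | x_{i, V \ {s}}) involves only the
% Markov blanket of s, so no global partition function is needed.
\ell_{\mathrm{PL}}(\Theta; X_1^n)
  := -\frac{1}{n} \sum_{i=1}^{n} \sum_{s \in V}
     \log Q\big(x_{is} \,\big|\, x_{i, V \setminus \{s\}}; \Theta\big)
```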

Methods for graph selection
For Gaussian graphical models: $\ell_1$-regularized neighborhood regression for Gaussian MRFs (e.g., Meinshausen & Buhlmann, 2005; Wainwright, 2006; Zhao & Yu, 2006); $\ell_1$-regularized log-determinant programs (e.g., Yuan & Lin, 2006; d'Aspremont et al., 2007; Friedman, 2008; Rothman et al., 2008; Ravikumar et al., 2008).
Methods for discrete MRFs: exact solution for trees (Chow & Liu, 1967); local testing (e.g., Spirtes et al., 2000; Kalisch & Buhlmann, 2008); various other methods, including distribution fits by KL-divergence (Abeel et al., 2005), $\ell_1$-regularized logistic regression (Ravikumar, Wainwright & Lafferty, 2008, 2010), an approximate maximum-entropy approach and thinned graphical models (Johnson et al., 2007), and a neighborhood-based thresholding method (Bresler, Mossel & Sly, 2008).
Information-theoretic analysis: pseudolikelihood and the BIC criterion (Csiszar & Talata, 2006); information-theoretic limitations (Santhanam & Wainwright, 2008, 2012).

Markov property and neighborhood structure
Markov properties encode neighborhood structure:
$$\underbrace{(X_s \mid X_{V \setminus \{s\}})}_{\text{condition on full graph}} \;\overset{d}{=}\; \underbrace{(X_s \mid X_{N(s)})}_{\text{condition on Markov blanket } N(s)}$$
This is the basis of the pseudolikelihood method (Besag, 1974) and of many graph learning algorithms (Friedman et al., 1999; Csiszar & Talata, 2005; Abeel et al., 2006; Meinshausen & Buhlmann, 2006).
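For the binary pairwise model introduced earlier (where $\theta_s x_s^2 = \theta_s x_s$ for $x_s \in \{0,1\}$), this Markov property takes an explicit logistic form, which is what makes neighborhood regression natural; the display below is a standard identity spelled out for completeness rather than a formula from the slides.

```latex
% Conditional distribution of X_s given the rest, for the {0,1}-valued pairwise model:
% only the Markov blanket N(s) enters, and it has logistic-regression form.
\mathbb{Q}\big(X_s = 1 \,\big|\, x_{V \setminus \{s\}}; \Theta\big)
  = \frac{\exp\big(\theta_s + \sum_{t \in N(s)} \theta_{st} x_t\big)}
         {1 + \exp\big(\theta_s + \sum_{t \in N(s)} \theta_{st} x_t\big)}
```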

Graph selection via neighborhood regression
Predict $X_s$ based on $X_{\setminus s} := \{X_t,\; t \neq s\}$.
1. For each node $s \in V$, compute the (regularized) maximum likelihood estimate
$$\widehat{\theta}[s] := \arg\min_{\theta \in \mathbb{R}^{p-1}} \Big\{ \underbrace{\frac{1}{n} \sum_{i=1}^{n} \mathcal{L}(\theta; X_{i, \setminus s})}_{\text{local log. likelihood}} + \underbrace{\lambda_n \|\theta\|_1}_{\text{regularization}} \Big\}.$$
2. Estimate the local neighborhood $\widehat{N}(s)$ as the support of the regression vector $\widehat{\theta}[s] \in \mathbb{R}^{p-1}$.
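A sketch of this neighborhood-regression recipe for binary data using an off-the-shelf l1-penalized logistic regression; the data matrix, regularization level, and support threshold are illustrative assumptions, not the tuned choices from the analysis.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_neighborhoods(X, lam):
    """X: n-by-p binary data matrix; lam: l1 regularization weight lambda_n.
    Returns a dict mapping each node s to its estimated neighborhood N_hat(s)."""
    n, p = X.shape
    neighborhoods = {}
    for s in range(p):
        y = X[:, s]
        X_rest = np.delete(X, s, axis=1)
        # scikit-learn parameterizes the l1 penalty through C = 1 / (n * lambda_n)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0 / (n * lam))
        clf.fit(X_rest, y)
        support = np.flatnonzero(np.abs(clf.coef_.ravel()) > 1e-6)
        # map support indices back to original node labels (column s was removed)
        others = [t for t in range(p) if t != s]
        neighborhoods[s] = {others[j] for j in support}
    return neighborhoods
```

In practice the node-wise estimates are then combined across nodes, for example with an AND or OR rule on edge inclusion, to produce a single undirected graph estimate.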

High-dimensional analysis
Classical analysis: graph size $p$ fixed, sample size $n \to +\infty$.
High-dimensional analysis: allow the dimension $p$, sample size $n$, and maximum degree $d$ to increase at arbitrary rates.
Take $n$ i.i.d. samples from an MRF defined by $G_{p,d}$ and study the probability of success as a function of the three parameters:
$$\mathrm{Success}(n, p, d) = \mathbb{Q}[\text{Method recovers graph } G_{p,d} \text{ from } n \text{ samples}].$$
The theory is non-asymptotic: explicit probabilities for finite $(n, p, d)$.

Empirical behavior (unrescaled plots): probability of success versus the raw number of samples for a star graph with a linear fraction of neighbors, for p = 64, 100, 225.
Empirical behavior (appropriately rescaled): probability of success versus the rescaled control parameter for the same star graphs, p = 64, 100, 225.
Rescaled plots for 2-D lattice graphs: probability of success versus the control parameter for a 4-nearest-neighbor grid (attractive couplings), p = 64, 100, 225.

Sufficient conditions for consistent Ising selection
Consider graph sequences $G_{p,d} = (V, E)$ with $p$ vertices and maximum degree $d$, with edge weights $|\theta_{st}| \ge \theta_{\min}$ for all $(s,t) \in E$; draw $n$ i.i.d. samples and analyze the probability of success indexed by $(n, p, d)$.
Theorem (Ravikumar, Wainwright & Lafferty, 2006, 2010). Under incoherence conditions, if the rescaled sample size satisfies
$$\gamma_{\mathrm{LR}}(n, p, d) := \frac{n}{d^3 \log p} > \gamma_{\mathrm{crit}}$$
and the regularization parameter is chosen as $\lambda_n \ge c_1 \sqrt{\frac{\log p}{n}}$, then with probability greater than $1 - 2\exp(-c_2 \lambda_n^2 n)$:
(a) Correct exclusion: the estimated sign neighborhood $\widehat{N}(s)$ correctly excludes all edges not in the true neighborhood.
(b) Correct inclusion: for $\theta_{\min} \ge c_3 \lambda_n$, the method selects the correct signed neighborhood.

Some related work
A thresholding estimator (polynomial-time for bounded degree) works with $n \gtrsim 2^d \log p$ samples (Bresler et al., 2008).
Information-theoretic lower bound over the family $\mathcal{G}_{p,d}$: any method requires at least $n = \Omega(d^2 \log p)$ samples (Santhanam & Wainwright, 2008, 2012).
$\ell_1$-based method: sharper achievable rates, but also failure for $\theta$ large enough to violate incoherence (Bento & Montanari, 2009).
Empirical study: the $\ell_1$-based method can succeed beyond the phase transition on the Ising model (Aurell & Ekeberg, 2011).
Simpler neighborhood-based methods: thresholding, mutual information, greedy methods (Anandkumar, Tan & Willsky, 2010a, 2010b; Netrapalli et al., 2010).

Info. theory: graph selection as channel coding
Graphical model selection is an unorthodox channel coding problem:
codewords/codebook: a graph $G$ in some graph class $\mathcal{G}$;
channel use: draw a sample $X_i = (X_{i1}, \dots, X_{ip})$ from the Markov random field $Q_{\theta(G)}$;
decoding problem: use the $n$ samples $\{X_1, \dots, X_n\}$ to correctly distinguish the codeword $G$.
$$G \;\longrightarrow\; Q(X \mid G) \;\longrightarrow\; X_1, \dots, X_n$$
Channel capacity for graph decoding is determined by the balance between the log number of models and the relative distinguishability of different models.

Necessary conditions for $\mathcal{G}_{d,p}$
Consider graphs $G \in \mathcal{G}_{d,p}$ with $p$ nodes and maximum degree $d$, and Ising models with minimum edge weight $|\theta_{st}| \ge \theta_{\min}$ for all edges and maximum neighborhood weight $\omega(\theta) := \max_{s \in V} \sum_{t \in N(s)} |\theta_{st}|$.
Theorem (Santhanam & Wainwright, 2008). If the sample size $n$ is upper bounded by
$$n < \max\Big\{ \frac{d}{8} \log\frac{p}{8d},\;\; \frac{\exp(\omega(\theta)/4)\,d\,\theta_{\min}}{128\,\exp(3\theta_{\min}/2)} \log\frac{pd}{8},\;\; \frac{\log p}{2\,\theta_{\min} \tanh(\theta_{\min})} \Big\},$$
then the probability of error of any algorithm over $\mathcal{G}_{d,p}$ is at least $1/2$.
Interpretation:
Naive bulk effect: arises from the log cardinality $\log |\mathcal{G}_{d,p}|$.
$d$-clique effect: difficulty of separating models that contain a near $d$-clique.
Small weight effect: difficulty of detecting edges with small weights.

Some consequences
Corollary. For asymptotically reliable recovery over $\mathcal{G}_{d,p}$, any algorithm requires at least $n = \Omega(d^2 \log p)$ samples.
Note that the maximum neighborhood weight satisfies $\omega(\theta^*) \ge d\,\theta_{\min}$, which forces $\theta_{\min} = O(1/d)$.
From the small weight effect,
$$n = \Omega\Big( \frac{\log p}{\theta_{\min} \tanh(\theta_{\min})} \Big) = \Omega\Big( \frac{\log p}{\theta_{\min}^2} \Big).$$
We conclude that $\ell_1$-regularized logistic regression (LR), which succeeds with $n \gtrsim d^3 \log p$ samples, is optimal up to a factor of $O(d)$ (Ravikumar, Wainwright & Lafferty, 2010).

Summary
1. Stochastic belief propagation: an adaptively randomized version of ordinary message-passing, with a finite-length convergence analysis.
2. Learning the structure of graphical models: a polynomial-time method for learning neighborhood structure; natural extensions (using block regularization) to higher-order models; matches information-theoretic limits.

Some papers:
Noorshams & Wainwright (2011). Stochastic belief propagation: Low-complexity message-passing with guarantees. Proceedings of the Allerton Conference.
Ravikumar, Wainwright & Lafferty (2010). High-dimensional Ising model selection using $\ell_1$-regularized logistic regression. Annals of Statistics.
Santhanam & Wainwright (2012). Information-theoretic limits of selecting binary graphical models in high dimensions. IEEE Transactions on Information Theory.
